European Urology, Volume 62, issue 4, pages e69-e82, October 2012
Comparison of Three Different Tools for Prediction of Seminal Vesicle Invasion at Radical Prostatectomy
Accepted 3 April 2012, Published online 2 May 2012, pages 590 - 596
Statistical prediction tools are increasingly common, but there is considerable disagreement about how they should be evaluated. Three tools—Partin tables, the European Society for Urological Oncology (ESUO) criteria, and the Gallina nomogram—have been proposed for the prediction of seminal vesicle invasion (SVI) in patients with clinically localized prostate cancer who are candidates for a radical prostatectomy.
Using different statistical methods, we aimed to determine which of these tools should be used to predict SVI.
Design, settings, and participants
The independent validation cohort consisted of 2584 patients treated surgically for clinically localized prostate cancer at four North American tertiary care centers between 2002 and 2007.
Robot-assisted laparoscopic radical prostatectomy.
Outcome measurements and statistical analysis
Primary outcome was the presence of SVI. Traditional (area under the receiver operating characteristic [ROC] curve, calibration plots, the Brier score, sensitivity and specificity, positive and negative predictive value) and novel (decision curve analysis and predictiveness curves) statistical methods quantified the predictive abilities of the three models.
Results and limitations
Traditional statistical methods (ie, ROC plots and Brier scores) could not clearly determine which one of the three SVI prediction tools should be preferred. For example, ROC plots and Brier scores seemed biased against the binary decision tool (ESUO criteria) and gave discordant results for the continuous predictions of the Partin tables and the Gallina nomogram. The results of the calibration plots were discordant with those of the ROC plots. Conversely, the decision curve indicated that the Partin tables represent the best strategy for stratifying the risk of SVI, resulting in the highest net benefit within the whole range of threshold probabilities.
When predicting SVI, surgeons should prefer the Partin tables over the ESUO criteria and the Gallina nomogram because this tool provided the highest net benefit. In contrast to traditional statistical methods, decision curve analysis gave an unambiguous result applicable to both continuous and binary models, providing an insight into clinical utility.
Statistical prediction tools are increasingly common in contemporary medicine. As just one example, a recent literature review identified >100 models in prostate cancer alone. This profusion of models raises the question of model evaluation: How do we know whether a model is a good one? Typical recommendations from the methodological literature emphasize external validation, that is, testing a model on a data set other than that used to generate the model, and that “both calibration and discrimination should be evaluated”. However, there is often little guidance as to how calibration or discrimination should be assessed or the results evaluated. For example, in the words of one group of well-regarded experts, researchers should “pre-specify acceptable performance of a model in terms of calibration and discrimination… it is, however, unclear how to determine what is acceptable”.
We evaluated three published models for seminal vesicle invasion (SVI) in prostate cancer patients: the 2007 update of the Partin tables, the European Society for Urological Oncology (ESUO) criteria, and the nomogram developed by Gallina et al. Complete removal of the seminal vesicles is commonly performed during radical prostatectomy for prostate cancer. The tips of the seminal vesicles are close to the arterial supply of the bladder base and to the proximal neurovascular bundles. Some investigators have suggested that sparing the tips of these structures would decrease erectile and urinary dysfunction.
Three small retrospective studies have demonstrated improved functional outcomes following seminal vesicle-sparing surgery. In a fourth study, Albers et al. randomly assigned patients with localized prostate cancer either to a seminal vesicle-sparing approach or to a radical prostatectomy with complete seminal vesicle removal. The authors observed better urinary, but not erectile, outcomes in patients who underwent a seminal vesicle-sparing radical prostatectomy.
Moreover, SVI is associated with poor prognosis after radical prostatectomy. It is reasonable to assume that omission of seminal vesicle removal in patients who have cancer in the seminal vesicles would result in worse cancer control outcomes. As such, seminal vesicle-sparing surgery should be restricted to men at low risk of SVI. Furthermore, a higher likelihood of SVI may favor a wider resection of the prostate and performance of pelvic lymphadenectomy.
Several published tools have been devised to determine SVI risk. We have previously published a comparison of the Gallina and Partin prediction models. Our results were rather indeterminate, with one model (Gallina nomogram) showing better discrimination and the other (Partin tables) better calibration. Similarly, the ESUO criteria were recently evaluated using a decision analytic approach. In the current study, we performed an evaluation of the three tools for SVI prediction by using a broad range of statistical methods.
2. Materials and methods
2.1. Study population and clinical and pathologic assessment
The study population consisted of 2606 consecutive patients treated with robot-assisted radical prostatectomy for clinically localized prostate cancer at four North American tertiary care centers between 2002 and 2007. None of these patients was involved in the creation of any of the three prediction tools, and, as such, this constitutes an entirely independent external validation. To comply with the inclusion criteria of the Gallina nomogram and of the 2007 Partin tables, 7 patients were excluded for clinical stage T3-T4, 11 patients for unknown clinical stage, and 4 patients for prostate-specific antigen (PSA) level >45 ng/ml, resulting in a study population of 2584 patients. The three prediction tools are based on similar routinely collected data: clinical stage, biopsy grade, and PSA level; the Gallina nomogram and the ESUO criteria require, in addition, percentage of positive cores. The data were prospectively collected at each of the centers and then retrospectively reviewed. Operative and postoperative pathologic data were all present as part of the inclusion criteria.
The clinical stage was assigned by the attending urologist according to the 1992–2002 American Joint Committee on Cancer TNM guidelines. In all men, pretreatment PSA was measured before digital rectal examination and transrectal ultrasound (TRUS). All patients underwent multicore (≥10) TRUS-guided prostate biopsy. Biopsy Gleason grades were assigned by dedicated genitourinary pathologists at each institution. All prostate specimens were processed according to the Amin et al. protocol and graded according to the Gleason system. In all patients, complete removal of the seminal vesicles was performed.
2.2. Statistical analysis
For each model we calculated the area under the receiver operating characteristic (ROC) curve, calibration plots, the Brier score (mean square error), sensitivity and specificity, and positive and negative predictive value. For the continuous predictors, namely the Partin tables and the Gallina nomogram, we used a predicted probability of 2.5% as the cutoff to dichotomize results into positive and negative for the calculation of binary statistics, such as sensitivity. We considered several novel techniques, including risk stratification tables, the net reclassification index, decision curve analysis, and predictiveness curves.
Decision curve analysis incorporates the clinical consequences of using a prediction rule by applying a different weight to false-negative and false-positive results. A false-negative result (not removing cancerous seminal vesicles) has more serious consequences than a false-positive result (removal of seminal vesicles free of cancer). However, the weighting of false-negative and false-positive results can be varied according to patient preferences or differences in opinion about the risks of the procedure. These preferences are expressed as the threshold probability for action (ie, seminal vesicle removal). For example, an individual who had a threshold probability of 3% would choose complete seminal vesicle removal when risk of SVI is ≥3% but seminal vesicle sparing when risk of SVI is <3%. For each threshold probability, decision curve analysis quantifies the net benefit of using an SVI predictive model relative to preserving the seminal vesicles in all men. The optimal strategy is the one with the highest net benefit across the complete range of reasonable threshold probabilities. Net benefit can be interpreted as the number of true-positive instances of SVI treated with seminal vesicle removal at surgery, if no individual without SVI was subjected to seminal vesicle removal.
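The weighting described above can be written compactly. The expression below is the standard net benefit formulation from the decision curve literature, not a formula quoted from this article; the symbols $n$ (cohort size), $\mathrm{TP}$ and $\mathrm{FP}$ (true- and false-positive counts when men with predicted risk at or above the threshold are treated), and $p_t$ (threshold probability) are notation introduced here for illustration.

```latex
\[
\text{Net benefit}(p_t) \;=\; \frac{\mathrm{TP}(p_t)}{n} \;-\; \frac{\mathrm{FP}(p_t)}{n}\cdot\frac{p_t}{1-p_t}
\]
```

With a threshold probability of 3%, the weight $p_t/(1-p_t)$ is $0.03/0.97 \approx 0.031$, so one unnecessary seminal vesicle removal counts for about one thirty-second of one correctly treated case of SVI.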
A second novel method is the predictiveness curve. This plots cumulative proportion of predictions against absolute risk. The proportion of predictions thought to be in an indeterminate region, where risk is not obviously too low to warrant seminal vesicle removal or too high to warrant sparing, is indicative of a model's value: Better models have fewer patients classified as intermediate risk. The two other novel methods we considered, risk stratification and net reclassification, have been explicitly deemed inappropriate for comparison of two models and so were not considered further. All analyses were conducted using Stata v.11 (Stata Corp., College Station, TX, USA).
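As a rough illustration of the predictiveness-curve summary described above, the following Python sketch computes the curve's coordinates and the share of patients whose predicted risk falls in an intermediate band. The function names and the ten example risks are hypothetical and are not study data; the study's own analyses were run in Stata.

```python
def predictiveness_curve(risks):
    """Return (cumulative proportion, risk) pairs with risks sorted ascending."""
    ordered = sorted(risks)
    n = len(ordered)
    return [((i + 1) / n, r) for i, r in enumerate(ordered)]


def proportion_intermediate(risks, lower, upper):
    """Share of predicted risks that fall inside the band [lower, upper]."""
    if not risks:
        raise ValueError("risks must be non-empty")
    inside = sum(1 for r in risks if lower <= r <= upper)
    return inside / len(risks)


# Hypothetical predicted SVI risks for ten patients (illustration only).
example_risks = [0.005, 0.01, 0.015, 0.02, 0.03, 0.04, 0.06, 0.10, 0.20, 0.35]

curve = predictiveness_curve(example_risks)
share_2_to_5 = proportion_intermediate(example_risks, 0.02, 0.05)
share_1_to_4 = proportion_intermediate(example_risks, 0.01, 0.04)

print(f"Intermediate (2-5%): {share_2_to_5:.0%}")
print(f"Intermediate (1-4%): {share_1_to_4:.0%}")
```

Even in this toy example the proportion classified as intermediate changes with the chosen band, which is why the definition of intermediate risk matters when two models are compared this way.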
Table 1 shows the characteristics of the study cohort (n = 2584). The patients’ characteristics are typical of a stage-shifted US radical prostatectomy population: most individuals had low-grade, early-stage disease. SVI was observed in 109 (4.2%) patients.
|Variable||Overall population||Patients with seminal vesicle invasion|
|Median PSA, ng/ml (quartiles)||5.2 (4.0, 7.0)||5.3 (7.8, 13.5)|
|Clinical stage, n (%)|
|T1||2100 (81)||73 (67)|
|T2a||381 (15)||22 (20)|
|T2b+||103 (4)||14 (13)|
|Biopsy Gleason score, n (%)|
|≤6||1387 (54)||16 (15)|
|7||1010 (39)||60 (55)|
|≥8||187 (7)||33 (30)|
|Year of surgery, n (%)|
|2002||61 (2)||2 (1.8)|
|2003||123 (5)||3 (2.8)|
|2004||410 (16)||8 (7.3)|
|2005||459 (18)||16 (14.8)|
|2006||788 (30)||39 (35.8)|
|2007||743 (29)||41 (37.6)|
|Percentage of positive biopsy cores (quartiles)||20 (10, 39)||33 (16, 63)|
|Biopsy cores: ≥60% positive, n (%)||244 (9)||31 (28.4)|
|Pathologic stage, n (%)||–|
|Presence of seminal vesicle invasion||109 (4)||–|
PSA = prostate-specific antigen.
Table 2 shows statistics for each model. ROC curves are given in Figure 1; calibration plots for the Gallina and Partin models are shown in Figure 2A and 2B, respectively. The Gallina nomogram had the highest area under the curve (AUC): 0.805 versus 0.792 for Partin and 0.692 for ESUO. The Partin tables had better calibration and very slightly better Brier score than the Gallina nomogram (0.0376 vs 0.0382). No calibration plot could be provided for ESUO because it only provides binary results. In addition, the Brier score for the ESUO criteria was far inferior (0.506).
|Model||AUC||Brier score||Sensitivity, %||Specificity, %||PPV, %||NPV, %|
|Gallina nomogram||0.805 (95% CI, 0.763–0.848)||0.0382 (95% CI, 0.0330–0.0440)||92.7 (95% CI, 86.0–96.8)||33.1 (95% CI, 31.2–35.0)||5.7 (95% CI, 4.7–6.9)||99.0 (95% CI, 98.1–99.6)|
|Partin tables||0.792 (95% CI, 0.751–0.833)||0.0376 (95% CI, 0.0315–0.0443)||89.0 (95% CI, 81.6–94.2)||56.3 (95% CI, 54.3–58.3)||8.2 (95% CI, 6.7–10.0)||99.1 (95% CI, 98.5–99.6)|
|ESUO criteria||0.692 (95% CI, 0.663–0.721)||0.5062 (95% CI, 0.4876–0.5255)||90.8 (95% CI, 83.8–95.5)||47.6 (95% CI, 45.6–49.5)||7.1 (95% CI, 5.8–8.6)||99.1 (95% CI, 98.5–99.6)|
AUC = area under the curve; PPV = positive predictive value; NPV = negative predictive value; CI = confidence interval; ESUO = European Society for Urological Oncology.
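The predictive values in Table 2 follow from sensitivity, specificity, and the 4.2% prevalence of SVI (109 of 2584). The Python sketch below applies the standard Bayes relationship to the rounded sensitivities and specificities reported above; the function and variable names are illustrative, and because the inputs are rounded, the results agree with the table only to within about 0.1 percentage point.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from sensitivity, specificity, and prevalence."""
    tp = sensitivity * prevalence              # true positives, as a cohort fraction
    fn = (1 - sensitivity) * prevalence        # false negatives
    tn = specificity * (1 - prevalence)        # true negatives
    fp = (1 - specificity) * (1 - prevalence)  # false positives
    return tp / (tp + fp), tn / (tn + fn)


PREVALENCE = 109 / 2584  # SVI cases / cohort size, from the Results

# Rounded sensitivity and specificity as reported in Table 2.
REPORTED = {
    "Gallina nomogram": (0.927, 0.331),
    "Partin tables": (0.890, 0.563),
    "ESUO criteria": (0.908, 0.476),
}

values = {
    name: predictive_values(sens, spec, PREVALENCE)
    for name, (sens, spec) in REPORTED.items()
}

for name, (ppv, npv) in values.items():
    print(f"{name}: PPV {100 * ppv:.1f}%, NPV {100 * npv:.1f}%")
```

Because SVI is rare in this cohort, every tool has a low positive predictive value and a very high negative predictive value, which is one reason these statistics discriminate poorly between the tools.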
Figure 3 shows the predictiveness curves. If risks of 2–5% are considered intermediate, then the proportion of patients at intermediate risk is 14% for the Partin tables but 45% for the Gallina nomogram; if the range is 1–4%, the proportions are 55% and 43%, respectively.
Figure 4 shows the decision curves. Use of the Partin tables to determine treatment approach has the highest net benefit across the whole range of threshold probabilities, even for very low threshold probabilities likely associated with surgery. In particular, use of the Partin tables has a higher net benefit than the current clinical strategy of seminal vesicle removal in all men. Table 3 shows the values for net benefit plotted in the figures. Table 3 also shows the advantage of using Partin tables to determine treatment rather than the current strategy of treating all men. The table gives net reduction in interventions. A difference of 31 for a threshold probability of 2% can be interpreted as follows: Using the Partin tables to determine seminal vesicle resection is equivalent to a strategy that led to 31 fewer patients per 100 undergoing unnecessary seminal vesicle resection but did not fail to treat any man with affected seminal vesicles. This does not imply that the Partin tables have a zero false-negative rate. Instead, consider a new prediction model that had no false negatives and reduced false positives by 31 per 100: The Partin tables would have equivalent net benefit to this new prediction model.
|Threshold probability, %||Net benefit||Net reduction in unnecessary seminal vesicle resections per 100 patients|
|Treat all||Gallina nomogram||Partin tables*||ESUO criteria||Gallina nomogram||Partin tables||ESUO criteria|
* The 2007 Partin tables have the highest net benefit for all threshold probabilities >1.5% where they are equal to ESUO.
ESUO = European Society for Urological Oncology.
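The relationship between net benefit and the net reduction in unnecessary resections can be sketched in Python. The code uses the standard decision curve formulas rather than anything quoted from this article. The counts are approximations back-calculated here from the sensitivity and specificity reported for the Partin tables at the 2.5% cutoff, so the output is illustrative and is not a value taken from Table 3.

```python
def net_benefit(tp, fp, n, threshold):
    """Net benefit of treating the patients flagged at a given threshold probability."""
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_reduction_per_100(nb_model, nb_treat_all, threshold):
    """Net reduction in unnecessary interventions per 100 patients versus treating all."""
    return (nb_model - nb_treat_all) / (threshold / (1 - threshold)) * 100


N = 2584          # cohort size, from the Results
SVI = 109         # patients with seminal vesicle invasion, from the Results
THRESHOLD = 0.025

# Approximate counts (assumption): 89.0% sensitivity and 56.3% specificity
# applied to 109 men with SVI and 2475 men without SVI.
TP = 97
FP = 1082

nb_model = net_benefit(TP, FP, N, THRESHOLD)
nb_all = net_benefit(SVI, N - SVI, N, THRESHOLD)  # resect seminal vesicles in everyone
reduction = net_reduction_per_100(nb_model, nb_all, THRESHOLD)

print(f"Net benefit, model:     {nb_model:.4f}")
print(f"Net benefit, treat all: {nb_all:.4f}")
print(f"Net reduction per 100:  {reduction:.1f}")
```

Dividing the difference in net benefit by the odds at the threshold converts it from units of true positives into units of avoided false positives, which is why the result can be read as fewer unnecessary resections without any additional missed case.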
We have evaluated three prediction tools (the 2007 Partin tables, the ESUO criteria, and the Gallina nomogram) that have been proposed to inform clinical decisions about the removal of seminal vesicles at radical prostatectomy. We found that the traditional statistical methods were not of value for distinguishing among the three tools. Using sensitivity and specificity required us to dichotomize two continuous predictors (Partin and Gallina models), and it was not entirely clear whether increases in sensitivity were worth corresponding decreases in specificity. AUC and Brier score seemed biased against our binary decision tool (the ESUO criteria) and gave discordant results regarding which of the two continuous prediction models was optimal. The results of the calibration plot seemed to favor the Partin tables, although no calibration plot was possible for the binary predictor. The predictiveness curves were similarly restricted to comparison of Partin tables and the Gallina nomogram and gave inconsistent results depending on how intermediate risk was defined. The two other novel evaluation tools, risk stratification tables and the net reclassification index, were also found to be inappropriate for a comparison of published models.
In contrast, the decision curve analysis gave an unambiguous result applicable to both the continuous and binary models. With respect to ambiguity, the decision curve result stands by itself: there is no need to trade off sensitivity against specificity or to weigh calibration against discrimination.
This was also true of the Brier score, although the Brier score appeared biased against the binary predictor (the ESUO criteria). To further explore Brier scores, we manipulated the ESUO predictor to artificially increase its sensitivity and specificity. This did not markedly improve the Brier score. Even when we modified the ESUO tool until it was virtually perfect (100% sensitivity, 90% specificity), its Brier score was still far inferior to that of the Gallina nomogram and Partin tables. This suggests that the Brier score is a questionable metric for tools that provide binary-coded predictions, such as the presence or absence of SVI.
Conversely, the Brier score is applicable to prediction tools that provide a probability that can be quantified on a continuous scale, such as the Gallina nomogram and Partin tables.
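The behavior of the Brier score for a yes/no rule can be illustrated with a short Python sketch. It assumes the binary tool's output is coded as a prediction of exactly 0 or 1, which is consistent with the reported value but is not stated in the text; under that coding, the Brier score reduces to the misclassification rate. Function names are illustrative.

```python
def brier_score(predictions, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if len(predictions) != len(outcomes) or not outcomes:
        raise ValueError("predictions and outcomes must be non-empty and equal in length")
    return sum((p - y) ** 2 for p, y in zip(predictions, outcomes)) / len(outcomes)


def brier_binary_rule(sensitivity, specificity, prevalence):
    """Brier score of a rule that outputs only 0 or 1 (assumed coding).

    Each misclassified patient contributes a squared error of 1, so the
    score equals the overall misclassification rate.
    """
    return prevalence * (1 - sensitivity) + (1 - prevalence) * (1 - specificity)


PREVALENCE = 109 / 2584  # SVI prevalence in the study cohort

# Reported ESUO sensitivity and specificity from Table 2.
esuo = brier_binary_rule(0.908, 0.476, PREVALENCE)

# The hypothetical near-perfect binary tool described in the text.
near_perfect = brier_binary_rule(1.00, 0.90, PREVALENCE)

# Reference point: predicting the prevalence for every patient.
constant = PREVALENCE * (1 - PREVALENCE)

print(f"ESUO as a 0/1 rule:        {esuo:.3f}")
print(f"100% sens, 90% spec rule:  {near_perfect:.3f}")
print(f"Constant prevalence guess: {constant:.3f}")
```

Under this coding, even a rule with 100% sensitivity and 90% specificity scores worse than assigning every patient the cohort prevalence, because in a low-prevalence cohort each false positive costs a full squared error of 1 while a small continuous probability costs very little.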
It is highly noteworthy that of all tested methods, only the decision curve analysis provided information enabling us to identify the most clinically useful model. Other methods, such as the Brier score or AUC analysis, failed to provide such information. For example, the AUC analysis found the Gallina nomogram superior to Partin tables. Conversely, the Brier score analysis reversed this preference order, suggesting the use of Partin tables to predict SVI. Moreover, imagine that new data were published indicating that the benefits of seminal vesicle preservation were much lower than reported and the risks much greater. In this case, the threshold probability at which a surgeon might consider seminal vesicle resection would be much less than 1%. Statistics such as AUC would be unaffected, leading us to conclude, for example, that the Gallina nomogram had superior discrimination. That neither model should be used to determine surgical approach, and instead that all men should be treated, would not be apparent. Decision curve analysis provides a clear answer to the question of comparative effectiveness: Instead of having to wonder whether discrimination is high enough to warrant use of a model, or whether miscalibration is sufficiently severe to prevent its use, the approach with the highest net benefit is chosen.
The main advantage of the decision curve analysis is that it evaluates the consequences of using a prediction tool in clinical terms. Instead of providing abstract and potentially conceptually challenging numeric benchmarks, such as the concordance index (the probability of correct classification for a randomly selected discordant pair) or the Brier score (the mean square error between predictions and outcome), decision curve analysis is based on the numbers of cancers treated versus unnecessarily performed surgeries. Such a concept is clearly reflective of daily clinical decision making. In addition, although differences between strategies are sometimes small, decision theory would hold that the strategy likely to lead to the best outcome should be chosen, irrespective of the size of the difference. Note that decision curve analysis is most appropriate for single decisions and single end points; complex decisions involving multiple decision points and end points are best analyzed using more sophisticated forms of decision analysis.
It is noteworthy that although decision curve analysis unequivocally demonstrates the superior net benefit of the Partin tables relative to the other models for predicting SVI in the current population, other tools may yield the highest net benefit in different cohorts. Consequently, our comparative analyses should be repeated in different study populations to provide further evidence about the use of Partin tables as a tool for predicting SVI in radical prostatectomy candidates. Similarly, since the study population consisted of patients treated in four tertiary-care North American centers, our results may not be generalizable to low-volume centers. It is also noteworthy that all patients were subjected to robot-assisted radical prostatectomy. Therefore, our findings may need to be further validated in a population of patients with clinically localized prostate cancer treated with an open approach.
Lack of pathologic review of the surgical specimens represents an additional limitation of the current study. In addition, it must be kept in mind that in the past few years, the parameters for clinical decisions have improved, with progress of imaging techniques (eg, magnetic resonance) and new treatment options.
Finally, it may be that decision curve analysis is not informative for evaluating prediction models in other contexts. For example, some models are used only for patient counseling and no explicit decisions are made depending on predicted risk. As such, it is not entirely clear how a decision analytic method should be interpreted. That said, clearly a key aim for prediction models is to aid medical decision making. Decision curve analysis can complement traditional statistical metrics for prediction tools, providing an insight into clinical utility.
In the current population, the 2007 Partin tables are better than alternative tools for predicting the presence of SVI. Unless a clinician would consider seminal vesicle resection to be necessary, even in patients at a very low risk (<1%) of harboring SVI, using the Partin tables to determine resection would improve clinical results compared with the current strategy of seminal vesicle resection in all men. Moreover, our analyses showed that one methodological approach, namely decision curve analysis, appears best suited to identify the SVI prediction tool that provides the optimal prediction characteristics. Decision curve analysis has the ability to provide clinically meaningful comparisons between predictive models and can readily determine whether use of a model would lead to better clinical decisions. Other statistical methods for evaluation of prediction models gave inconsistent results that were difficult to interpret.
Author contributions: Giovanni Lughezzani had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: Lughezzani, Zorn, Vickers, Karakiewicz.
Acquisition of data: Zorn, Lee, Shalhav, Zagaya, Shikanov, Gofrit, Thong, Albala, L. Sun.
Analysis and interpretation of data: Lughezzani, Budaus, M. Sun, Cronin, Vickers, Karakiewicz.
Drafting of the manuscript: Lughezzani, Zorn, Budaus, M. Sun, Vickers, Karakiewicz.
Critical revision of the manuscript for important intellectual content: Zorn, Vickers, Karakiewicz.
Statistical analysis: Lughezzani, Cronin, Vickers, Karakiewicz.
Obtaining funding: None.
Administrative, technical, or material support: None.
Supervision: Vickers, Karakiewicz.
Other (specify): None.
Financial disclosures: Giovanni Lughezzani certifies that all conflicts of interest, including specific financial interests and relationships and affiliations relevant to the subject matter or materials discussed in the manuscript (eg, employment/affiliation, grants or funding, consultancies, honoraria, stock ownership or options, expert testimony, royalties, or patents filed, received, or pending), are the following: Pierre I. Karakiewicz is partially supported by the University of Montreal Health Center Urology Specialists, Fonds de la Recherche en Santé du Quebec, the University of Montreal Department of Surgery, and the University of Montreal Health Center (CHUM) Foundation. Andrew Vickers is supported in part by funds from David H. Koch provided through the Prostate Cancer Foundation, the Sidney Kimmel Center for Prostate and Urologic Cancers, and P50-CA92629 SPORE grant from the US National Cancer Institute to P.T. Scardino.
Funding/Support and role of the sponsor: None.
- A.J. Vickers. Prediction models in cancer care. CA Cancer J Clin. 2011;61:315-326
- S.F. Shariat, M.W. Kattan, A.J. Vickers, P.I. Karakiewicz, P.T. Scardino. Critical review of prostate cancer predictive tools. Future Oncol. 2009;5:1555-1584
- D.G. Altman, Y. Vergouwe, P. Royston, K.G. Moons. Prognosis and prognostic research: validating a prognostic model. BMJ. 2009;338:b605
- D.V. Makarov, B.J. Trock, E.B. Humphreys, et al. Updated nomogram to predict pathologic stage of prostate cancer given prostate-specific antigen level, clinical stage, and biopsy Gleason score (Partin tables) based on cases from 2000 to 2005. Urology. 2007;69:1095-1101
- A.R. Zlotta, T. Roumeguere, V. Ravery, et al. Is seminal vesicle ablation mandatory for all patients undergoing radical prostatectomy? A multivariate analysis on 1283 patients. Eur Urol. 2004;46:42-49
- A. Gallina, F.K. Chun, A. Briganti, et al. Development and split-sample validation of a nomogram predicting the probability of seminal vesicle invasion at radical prostatectomy. Eur Urol. 2007;52:98-105
- H. John, D. Hauri. Seminal vesicle-sparing radical prostatectomy: a novel concept to restore early urinary continence. Urology. 2000;55:820-824
- M. Sanda, R. Dunn, J. Wei, J. Resh, G. Montie. Seminal vesicle sparing technique is associated with improved sexual HRQOL outcome after radical prostatectomy [abstract 606]. J Urol. 2002;167(Suppl 4):151
- M. Bellina, M. Mari, A. Ambu, S. Guercio, L. Rolle, M. Tampellini. Seminal monolateral nerve-sparing radical prostatectomy in selected patients. Urol Int. 2005;75:175-180
- P. Albers, S. Schafers, H. Lohmer, P. de Geeter. Seminal vesicle-sparing perineal radical prostatectomy improves early functional results in patients with low-risk prostate cancer. BJU Int. 2007;100:1050-1054
- J.I. Epstein, M. Carmichael, P.C. Walsh. Adenocarcinoma of the prostate invading the seminal vesicle: definition and relation of tumor volume, grade and margins of resection to prognosis. J Urol. 1993;149:1040-1045
- H.J. Jewett, R.W. Bridge, G.F. Gray, W.M. Shelley. The palpable nodule of prostatic cancer. Results 15 years after radical excision. JAMA. 1968;203:403-406
- A.A. Villers, J.E. McNeal, E.A. Redwine, F.S. Freiha, T.A. Stamey. Pathogenesis and biological significance of seminal vesicle invasion in prostatic adenocarcinoma. J Urol. 1990;143:1183-1187
- P.M. Pierorazio, A.E. Ross, E.M. Schaeffer, et al. A contemporary analysis of outcomes of adenocarcinoma of the prostate with seminal vesicle invasion (pT3b) after radical prostatectomy. J Urol. 2011;185:1691-1697
- F.P. Secin, F.J. Bianco, A. Cronin, et al. Is it necessary to remove the seminal vesicles completely at radical prostatectomy? Decision curve analysis of European Society of Urologic Oncology criteria. J Urol. 2009;181:609-613; discussion 14
- D.F. Gleason. Histologic grading of prostate cancer: a perspective. Hum Pathol. 1992;23:273-279
- M.B. Amin, D. Grignon, D. Bostwick, V. Reuter, P. Troncoso, A.G. Ayala. Recommendations for the reporting of resected prostate carcinomas. Association of Directors of Anatomic and Surgical Pathology. Am J Clin Pathol. 1996;105:667-670
- H. Janes, M.S. Pepe, W. Gu. Assessing the value of risk predictions by using risk stratification tables. Ann Intern Med. 2008;149:751-760
- M.J. Pencina, R.B. D’Agostino Sr., R.B. D’Agostino Jr., R.S. Vasan. Evaluating the added predictive ability of a new marker: from area under the ROC curve to reclassification and beyond. Stat Med. 2008;27:157-172; discussion 207–12
- A.J. Vickers, E.B. Elkin. Decision curve analysis: a novel method for evaluating prediction models. Med Decis Making. 2006;26:565-574
- Y. Huang, M. Sullivan Pepe, Z. Feng. Evaluating the predictiveness of a continuous marker. Biometrics. 2007;63:1181-1188
a Cancer Prognostics and Health Outcomes Unit, University of Montreal Health Center, Montreal, QC, Canada
b Vita-Salute San Raffaele University, Milan, Italy
c University of Chicago Medical Center, Chicago, IL, USA
d University of Pennsylvania, Philadelphia, PA, USA
e Duke University, Durham, NC, USA
f Harvard University, Boston, MA, USA
g Memorial Sloan-Kettering Cancer Center, New York, NY, USA
Equal study contribution.
© 2012 European Association of Urology, Published by Elsevier B.V.