EDITORIAL

Judging New Markers by Their Ability to Improve Predictive Accuracy

Michael W. Kattan

Correspondence to: Michael W. Kattan, Ph.D., Health Outcomes Research Group, Department of Epidemiology and Biostatistics and Department of Urology, Memorial Sloan-Kettering Cancer Center, 1275 York Ave., New York, NY 10021 (e-mail: kattanm@mskcc.org).

The man who has recently undergone radical prostatectomy for clinically localized prostate cancer faces an important decision: whether adjuvant therapy would be beneficial. Clearly, a major factor in this decision is the likelihood of his disease recurring in the absence of additional therapy. There are at least three well-documented prognostic models for use in this setting, and each predicts the likelihood of biochemical progression (i.e., prostate-specific antigen [PSA]-defined recurrence of prostate cancer). Partin et al. (1) developed an equation they called "Rw"; Blute et al. (2) devised the "GPSM" score (which includes the Gleason score, PSA level, seminal vesicle status, and margin status); and Kattan et al. (3) derived a postoperative nomogram, which was later validated by Graefen et al. (4). Which of these models predicts best for the individual patient? The GPSM score and the postoperative nomogram have been evaluated by the concordance index, with values of 0.76 and 0.80, respectively, suggesting relatively similar performance. The concordance index is the probability that, of two randomly selected patients, the patient with the worse outcome is, in fact, the one predicted to have the worse outcome (5). This measure, analogous to the area under the receiver operating characteristic curve, ranges from 0.5 (i.e., chance, a coin flip) to 1.0 (perfect ability to rank patients).
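The pairwise definition of the concordance index can be made concrete with a minimal sketch. The code below uses hypothetical binary outcomes and ignores the censoring that Harrell's full formulation handles for survival data; it is an illustration of the definition, not the method used by the cited models.

```python
# Minimal sketch of the concordance index for uncensored binary outcomes.
# A usable pair is two patients with different outcomes; the pair is
# concordant when the patient with the worse outcome also has the higher
# predicted risk. Ties in predicted risk receive half credit.
def concordance_index(predicted_risk, outcome):
    """Fraction of usable pairs ranked correctly (higher outcome = worse)."""
    concordant, usable = 0.0, 0
    n = len(outcome)
    for i in range(n):
        for j in range(i + 1, n):
            if outcome[i] == outcome[j]:
                continue  # no worse/better ordering -> pair not usable
            usable += 1
            worse = i if outcome[i] > outcome[j] else j
            better = j if worse == i else i
            if predicted_risk[worse] > predicted_risk[better]:
                concordant += 1.0
            elif predicted_risk[worse] == predicted_risk[better]:
                concordant += 0.5
    return concordant / usable

# Hypothetical example: 1 = recurrence, 0 = no recurrence.
risk = [0.9, 0.2, 0.7, 0.4]
recurred = [1, 0, 1, 0]
print(concordance_index(risk, recurred))  # 1.0: every usable pair ranked correctly
```

A model that assigns every patient the same risk scores 0.5 here, matching the coin-flip floor described above.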

In this issue of the Journal, Rhodes et al. (6) have found that a novel marker, the combined E-cadherin and enhancer of zeste homolog 2 (EZH2) status, may provide additional prognostic ability in the postoperative prostate cancer setting. They found that the interaction of E-cadherin and EZH2 is statistically significant in multivariable analysis (P = .003), with a hazard ratio of 3.19. This association may prove to have important biologic implications. From a prediction perspective, however, an important question should be asked of any new marker: How accurate is the best prediction model that contains the new marker relative to the best model that lacks it? That is, how much does the concordance index improve with knowledge of the patient's novel marker? This increment is a direct gauge of the progress being made in our ability to predict patient outcome.

Analyses that characterize markers by their impact on a model's predictive accuracy (e.g., as measured by a change in the concordance index) are rare but beneficial. Begg et al. (7) effectively did this when they compared three rival staging systems in thymoma. As Begg et al. point out, many prognostic factors contain little or no relevant information beyond what is already available when standard prognostic factors are combined optimally. For this reason, it is important to compare the best (i.e., most accurately predicting) models with and without the marker of interest. When judging the value of a model containing a new marker, the key question is whether an equivalent concordance index can be achieved by optimal modeling of all predictors besides the novel marker. If so, the new marker has not improved our ability to predict patient outcome.
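The comparison advocated here can be sketched directly: compute the concordance index of the best model without the marker and of the best model with it, on the same patients, and report the increment. The risk scores and outcomes below are hypothetical, and the compact helper uses the same pairwise definition of concordance given earlier, again ignoring censoring for simplicity.

```python
# Sketch of the incremental-accuracy comparison on hypothetical data:
# the quantity of interest is the change in concordance index from
# adding the new marker, not the marker's P value or hazard ratio.
def c_index(risk, outcome):
    """Harrell-style c for uncensored binary outcomes; risk ties count 0.5."""
    conc, usable = 0.0, 0
    for i in range(len(outcome)):
        for j in range(i + 1, len(outcome)):
            if outcome[i] == outcome[j]:
                continue
            usable += 1
            hi = i if outcome[i] > outcome[j] else j
            lo = j if hi == i else i
            if risk[hi] > risk[lo]:
                conc += 1.0
            elif risk[hi] == risk[lo]:
                conc += 0.5
    return conc / usable

recurred     = [1, 1, 0, 0, 1, 0]              # observed outcomes (hypothetical)
risk_without = [0.6, 0.3, 0.4, 0.2, 0.5, 0.5]  # best model lacking the marker
risk_with    = [0.8, 0.6, 0.3, 0.1, 0.7, 0.4]  # best model containing the marker

delta = c_index(risk_with, recurred) - c_index(risk_without, recurred)
print(round(delta, 3))  # the increment in predictive accuracy
```

A delta near zero, as in the percentage-of-positive-cores example discussed later, means the marker has not advanced our ability to predict, regardless of its statistical significance.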

Why should we change the way we ordinarily look at markers and instead compare the accuracies of two models? The reasons that the comparison must be model based, and that traditional reporting of P values and hazard ratios from multivariable analysis is inadequate, are several. First, an individual patient's optimal prediction will, in most cases, come from a multivariable model; rarely would a single marker, absent any modeling, be ideal for prediction. If a model of markers provides the most accurate prediction, then we should be evaluating models of markers. Second, the P value tests whether the association with the marker is 0, which is not the question of direct interest: whether a new marker improves our ability to predict. As Simon (8) points out, these are different questions. Third, the P value for a novel marker may depend on how the other variables are handled in the multivariable model. For example, the use of cutoffs or transformations for the established marker(s) can affect the P value of the novel marker. Comparing the best models with and without the marker of interest provides a more objective alternative, because the emphasis shifts to the predictive accuracies of the models: whatever modeling provides the most accurate predictions (e.g., maximizes the concordance index) should be used, an objective goal. This model comparison conveniently alleviates another problem, that of automated variable selection. Procedures such as backwards elimination tend to reduce the P values of variables that survive elimination (i.e., the P values of remaining variables tend to shrink as other variables are eliminated) (9). Thus, the concern that a marker has a small P value only after variable selection, and not when judged in the full model before selection, is largely resolved, because 1) automated variable selection procedures would be used only when they improve a model's predictive ability [which is very rare (9)] and 2) the P value of the marker after variable selection would not be of direct interest.

Interpreting a novel marker's hazard ratio to judge its prognostic value has similar drawbacks. The hazard ratio depends on the measurement scale of the marker, the cutoff(s) used for it, and the manner in which the established variables are modeled.
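The scale dependence is simple arithmetic: in a Cox model, the hazard ratio for a k-unit increase is the per-unit hazard ratio raised to the power k, so the same fitted association can look modest or impressive depending only on the units chosen. The coefficient below is hypothetical.

```python
import math

# Illustration with a hypothetical Cox coefficient: the identical fitted
# association yields different-looking hazard ratios depending only on
# the units in which the marker is reported.
beta = 0.05                        # hypothetical log-hazard per 1 ng/mL
hr_per_unit = math.exp(beta)       # HR for a 1 ng/mL increase
hr_per_10 = math.exp(10 * beta)    # same model, reported per 10 ng/mL

print(round(hr_per_unit, 3))  # 1.051: looks modest
print(round(hr_per_10, 3))    # 1.649: looks large, yet nothing has changed
```

Dichotomizing the marker at a chosen cutoff distorts the hazard ratio in the same way, which is why the model's predictive accuracy, not the hazard ratio, is the fairer yardstick.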

The following case study illustrates why incremental model predictive accuracy is a valuable metric. A new marker, the percentage of biopsy cores positive for prostate cancer, was recently analyzed for its ability to improve preoperative prediction of prostate cancer recurrence after radical prostatectomy (10). When added to a model containing the established markers (pretreatment PSA level, clinical tumor stage, and biopsy Gleason score), the percentage of positive cores was highly statistically significant (P<.001). However, the concordance index of this model (0.75) was identical to that of the best model lacking this predictor. Thus, the percentage of positive cores, despite being statistically significant on multivariable analysis, did not appreciably advance our ability to predict individual patient outcomes. However, when levels of interleukin 6 and transforming growth factor β1 were added to the model containing the established markers, these new markers were also each highly statistically significant (each P<.001), and the concordance index indeed improved, to 0.83. Thus, the concordance index provided the bottom line in a situation where inspection of P values alone could not distinguish the useful markers from the unhelpful one.

In conclusion, more emphasis on the predictive accuracy of prognostic models is needed. Markers should be judged by their ability to improve an already optimized prediction model, rather than by their P values in a multivariable analysis, and this incremental predictive accuracy should be quantified. As a measure of a model's predictive ability, the concordance index may not be perfect, and methods for comparing concordance indices need further development, but it is perhaps the best alternative presently available (11). Measurement of predictive accuracy remains an active area of research. Nonetheless, continually measuring the improvement in our ability to predict outcomes for cancer patients tells us when prognostic progress has been made and keeps our focus on that important goal.

REFERENCES

1 Partin AW, Piantadosi S, Sanda MG, Epstein JI, Marshall FF, Mohler JL, et al. Selection of men at high risk for disease recurrence for experimental adjuvant therapy following radical prostatectomy. Urology 1995;45:831–8.

2 Blute ML, Bergstralh EJ, Iocca A, Scherer B, Zincke H. Use of Gleason score, prostate specific antigen, seminal vesicle and margin status to predict biochemical failure after radical prostatectomy. J Urol 2001;165:119–25.

3 Kattan MW, Wheeler TM, Scardino PT. Postoperative nomogram for disease recurrence after radical prostatectomy for prostate cancer. J Clin Oncol 1999;17:1499–507.

4 Graefen M, Karakiewicz P, Cagiannos I, Klein EA, Kupelian PA, Quinn D, et al. A validation study of the accuracy of a postoperative nomogram for recurrence after radical prostatectomy for localized prostate cancer. J Clin Oncol 2002;20:951–6.

5 Harrell FE Jr, Califf RM, Pryor DB, Lee KL, Rosati RA. Evaluating the yield of medical tests. JAMA 1982;247:2543–6.

6 Rhodes DR, Sanda MG, Otte AP, Chinnaiyan AM, Rubin MA. Multiplex biomarker approach for determining risk of prostate-specific antigen-defined recurrence of prostate cancer. J Natl Cancer Inst 2003;95:661–9.

7 Begg CB, Cramer LD, Venkatraman ES, Rosai J. Comparing tumor staging and grading systems: a case study and a review of the issues, using thymoma as a model. Stat Med 2000;19:1997–2014.

8 Simon R. Evaluating prognostic factor studies. In: Gospodarowicz MK, editor. Prognostic factors in cancer. 2nd ed. New York (NY): Wiley-Liss, Inc.; 2001. p. 49–56.

9 Harrell FE Jr, Lee KL, Mark DB. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med 1996;15:361–87.

10 Kattan MW, Scardino P. Prediction of progression: nomograms of clinical utility. Clinical Prostate Cancer 2002;1:90–6.

11 Harrell FE Jr. Regression modeling strategies with applications to linear models, logistic regression, and survival analysis. New York (NY): Springer-Verlag; 2001.

Copyright © 2003 Oxford University Press (unless otherwise stated)