1 Unità di Statistica Medica e Biometria, 3 Unità Operativa Determinanti Biomolecolari nella Prognosi e Terapia, Istituto Nazionale per lo Studio e la Cura dei Tumori, Milan; 2 Istituto di Statistica Medica e Biometria, Università degli Studi di Milano, Milan, Italy
Received 6 February 2003; revised 30 May 2003; accepted 18 July 2003
Abstract
The present study investigated complex time-dependent effects of routinely assessed factors on the risk of breast cancer recurrence over follow-up time, with a partial logistic artificial neural network (PLANN) model.
Patients and methods:
PLANN was applied to data from 1793 patients with node-negative breast cancer, not submitted to any adjuvant treatment and with a minimum potential follow-up of 10 years.
Results:
The shape of the hazard function changed according to histology, which showed a time-dependent effect, partly modulated by estrogen receptors (ERs). Age and progesterone receptors (PgR) showed protective effects; the latter was more evident for short follow-up and high ER values. Tumour size and ER content showed time-dependent unfavourable effects at early and long follow-up times, respectively. Predicted disease recurrence probabilities at 2 years of follow-up showed that low steroid-receptor content, young age and large tumour size were associated with the highest risk of relapse. Although the oldest patients with high ER content appear to be the most protected overall, with increasing follow-up time high-risk predictions also extend to higher steroid-receptor contents, intermediate ages and small tumour size.
Conclusion:
PLANN with suitable visualisation techniques provided thorough insights into the dynamics of breast cancer recurrence for improving individual risk staging of node-negative breast cancer patients.
Key words: artificial neural networks, breast cancer, prognostic factors, survival analysis
Introduction
For example, the time-dependent prognostic relevance of steroid receptors has long been debated [4–6], and several studies [7–10], including our own in a large series of node-negative breast cancer patients who received no adjuvant systemic therapy after surgery [11], have shown that the impact of estrogen receptors (ERs) on the risk of recurrence changes during the course of follow-up. However, non-additive effects of steroid receptors and other routinely assessed prognostic variables were not investigated, and the Cox model provided no information on the shape of the hazard as a function of time.
The few studies performed so far on the relationship between the hazard function and routinely assessed prognostic factors are generally based on dichotomising these factors at predefined cut-off values [8–10]. A more informative analysis of the dynamics of breast cancer recurrence should therefore evaluate the joint effect of steroid-receptor content and other prognostic factors on their original scale of measurement. It would thus be useful to adopt a non-linear modelling approach that can identify previously unknown prognostic relationships underlying the data while avoiding overly restrictive a priori assumptions, and that can integrate and verify the consistency of results obtained with conventional modelling strategies. Methods based on artificial neural networks (ANNs) appear to be suited to this task.
ANNs are statistical models whose mathematical structure reproduces the biological organisation of neural cells to simulate the learning dynamics of the brain [12]. Considerable attention has been paid in recent years to the application of ANN-based regression methods to the development of prognostic models in oncology. Although some doubts have been raised about the true advantages of ANNs over traditional techniques [13], a recent review highlights their benefits for outcome prediction [14]. Beyond their use as black-box predictors, a desirable improvement is the recognition of the underlying prognostic relationships and risk profiles, which may well be non-linear and difficult to analyse using standard statistical methodology [15].
The possible use of ANNs as exploratory tools for the study of disease dynamics has been neglected because no ANN models suitable for the hazard function of censored survival data were available. The partial logistic ANN (PLANN) approach, based on the extension of the well-known logistic regression model, has been developed for the analysis of the hazard as a function of multiple variables and time [16]. A challenging issue in PLANN is the representation of complex results for their clinical interpretation. With this approach, visualisation techniques are exploited for evaluating covariate effects on the hazard function and model-predicted disease recurrence probability at different follow-up times.
The aim of the present study was to investigate the joint role of ERs and progesterone receptors (PgR), and of other clinically relevant patient and tumour characteristics (age, tumour size, histology), on the risk of cancer recurrence during follow-up after surgery in node-negative patients who received no adjuvant therapy, so as to avoid any possible confounding effect of systemic treatment. ER, PgR, age and tumour size were analysed on their original scale of measurement in order to obtain a more accurate picture of their effects on the hazard function and on individual risk profiles. The results obtained with the PLANN model complemented previous findings [11] from the time-dependent Cox model by outlining the shape of the hazard function, and provided additional details on the prognostic role of steroid receptors as a function of time, and of other factors useful for improving follow-up and/or therapy planning.
Patients and methods
Receptor determination
ER and PgR concentrations were determined using the dextran-coated charcoal method, quantified by multiple-point Scatchard analysis, and expressed as fmol/mg of cytosolic protein as described previously according to the European Organisation for Research and Treatment of Cancer (EORTC) standard assay protocol [17, 18]. ER and PgR levels were determined within the Quality Control Program activated by the Italian Committee for Hormone Receptor Assay Standardization [19].
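For readers less familiar with the receptor assay, the multiple-point Scatchard step can be sketched as follows. This is a minimal illustration that assumes idealised single-site binding, B/F = (Bmax - B)/Kd, with bound (B) and free (F) ligand already corrected for non-specific binding; the function names are hypothetical and the sketch does not reproduce the EORTC protocol itself. Receptor content is then the binding capacity Bmax divided by the cytosolic protein concentration.

```python
import numpy as np

def scatchard_fit(bound, free):
    """Fit the Scatchard line B/F = (Bmax - B) / Kd by ordinary least squares.

    bound, free: bound and free ligand at several tracer concentrations
    (multiple points), assumed corrected for non-specific binding.
    Returns (kd, bmax): kd in the units of `free`, bmax in the units of `bound`.
    """
    bound = np.asarray(bound, dtype=float)
    free = np.asarray(free, dtype=float)
    ratio = bound / free
    # Regress B/F on B: slope = -1/Kd, intercept = Bmax/Kd
    slope, intercept = np.polyfit(bound, ratio, 1)
    kd = -1.0 / slope
    bmax = intercept * kd  # x-intercept of the Scatchard plot
    return kd, bmax

def receptor_content(bmax_fmol_per_ml, protein_mg_per_ml):
    """Express binding capacity as fmol per mg of cytosolic protein."""
    return bmax_fmol_per_ml / protein_mg_per_ml
```

With exactly single-site data the Scatchard plot is a straight line, so the fit recovers Kd and Bmax; real assays show scatter and sometimes curvature, which this sketch ignores.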
Statistical analysis
ANNs [12, 20] have recently been introduced to model censored survival data, accounting for complex prognostic patterns. A general framework for neural network models on censored survival data has recently been proposed [21]. Within this framework, PLANN is an ANN generalisation of partial logistic regression, closely related to conventional logistic regression [16, 22]; it is suitable for smoothed hazard estimation as a function of time and covariates, and allows for non-linear, non-proportional and non-additive effects. A description and a graphical representation of the model are provided in the Appendix (Figure A1). Since the distributions of steroid receptors were positively skewed and proportional increments of their values were considered relevant for their prognostic impact, the natural logarithmic transformation of these variables was adopted. Models were applied to the 1715 (95.6%) cases with complete information on the considered variables. The analysis was based on follow-up information curtailed at 10 years.
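The partial logistic formulation can be made concrete with a small sketch. It assumes that follow-up is split into yearly intervals and that each patient contributes one record per interval at risk, with the interval midpoint appended as an extra input and a 0/1 response equal to 1 only in the interval where recurrence occurs. The yearly grid, keeping censored patients with response 0 in their last interval, and the function names are all assumptions for illustration. Discrete hazards estimated on these records give the survival curve as S(t_l) = (1 - h_1)(1 - h_2)...(1 - h_l), and the disease recurrence probability as 1 - S(t_l).

```python
import numpy as np

def expand_person_period(times, events, covariates, cut_points):
    """Expand survival data into one record per patient per interval at risk.

    times: observed follow-up times (years); events: 1 = recurrence, 0 = censored;
    covariates: (n, K) array; cut_points: interval upper bounds, e.g. 1, 2, ..., 10.
    Returns X (covariates plus interval midpoint m_l), y (0/1), subject index.
    """
    cut_points = np.asarray(cut_points, dtype=float)
    lower = np.concatenate(([0.0], cut_points[:-1]))
    midpoints = (lower + cut_points) / 2.0
    rows, targets, subject = [], [], []
    for i, (t, d) in enumerate(zip(times, events)):
        if t > cut_points[-1]:
            # Curtail follow-up: anything after the last cut point is censored
            t, d = cut_points[-1], 0
        # Index of the interval (lower, upper] that contains t
        last = int(np.searchsorted(cut_points, t, side="left"))
        for l in range(last + 1):
            rows.append(np.append(covariates[i], midpoints[l]))
            targets.append(1 if (l == last and d == 1) else 0)
            subject.append(i)
    return np.array(rows), np.array(targets), np.array(subject)

def survival_from_hazard(hazards):
    """S(t_l) = prod_{j <= l} (1 - h_j) for a sequence of discrete hazards."""
    return np.cumprod(1.0 - np.asarray(hazards, dtype=float))
```

For example, a recurrence at 2.5 years yields three records (midpoints 0.5, 1.5, 2.5) with responses 0, 0, 1, and any logistic-type model fitted to these records estimates a discrete hazard that may depend jointly on covariates and time.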
To visualise the effects of the continuous covariates (ER, PgR, age, tumour size) and of the categorical covariate (histology), model results were represented by three-dimensional surface plots of the estimated conditional discrete hazard as a function of time and of each continuous covariate, for each histological type, with the other continuous covariates fixed at their median values. To investigate possible higher-order interactions, multipanel conditioning plots were adopted [23]. These plots display the joint effect of two covariates and follow-up time on the logarithm of the relative hazard (log HR), with the other continuous covariates fixed at their median values. Confidence intervals were estimated by means of a non-parametric bootstrap procedure [24].
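The log HR shown in conditioning plots, and a bootstrap interval for it, can be sketched as below. The sketch assumes that log HR at time t is the difference between the log discrete hazard for a given covariate profile and that for a reference profile (for example, the medians of the continuous covariates), and it uses a plain percentile bootstrap over patients. `hazard_fn` stands for any fitted hazard model and, like the other names, is hypothetical; in a real analysis `statistic` would refit the network on each resample, the costly step omitted here.

```python
import numpy as np

def log_relative_hazard(hazard_fn, x, x_ref, times):
    """log HR(t) = log h(t, x) - log h(t, x_ref) evaluated on a grid of times."""
    return np.array([np.log(hazard_fn(t, x)) - np.log(hazard_fn(t, x_ref)) for t in times])

def percentile_bootstrap(data, statistic, n_boot=1000, alpha=0.05, seed=0):
    """Percentile confidence interval from resampling patients with replacement."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = len(data)
    # Each replicate draws n patients (rows) with replacement and recomputes the statistic
    replicates = np.array([statistic(data[rng.integers(0, n, size=n)]) for _ in range(n_boot)])
    lower, upper = np.quantile(replicates, [alpha / 2, 1 - alpha / 2])
    return lower, upper
```

Because the hazard in PLANN is not constrained to be proportional, the log HR computed this way can change with t, which is what the conditioning plots are meant to reveal.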
Multiple correspondence analysis (MCA) was adopted to visualise the relationship between model predictions and covariate patterns [25]. For this purpose, the 2-, 5- and 8-year disease recurrence probabilities predicted by the PLANN model for each subject were represented jointly with covariate levels. The details of MCA have been described elsewhere [11]. Briefly, MCA is a suitable approach for investigating the association among categorical variables, including continuous variables once they have been categorised. Variables and subjects can be plotted onto a plane defined by the first two factorial axes (those contributing most to explaining the total variability of the original data) according to their new coordinates (factorial scores). Such a plot approximates the association patterns underlying the data since, with respect to the origin of the factorial axes, the angular distance between categories and subjects is related to their mutual association (smaller angles indicate stronger association). For MCA, the categories of the continuous variables defined for the PLANN model implementation (see Appendix) and histological type were plotted jointly with subjects according to their factorial scores. In addition, the size and colour intensity of each dot are proportional to the predicted probability of disease recurrence: large dark dots and small light dots indicate high- and low-risk subjects, respectively. Therefore, the relative position of the covariate categories indicates the association between model-predicted values and covariate patterns.
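A compact sketch of MCA as correspondence analysis of the complete disjunctive (indicator) matrix is given below, using the singular value decomposition of the standardised residuals. It assumes that continuous covariates have already been cut into categories; the function names and any categorisation are illustrative. Subjects (rows) and categories (columns) are returned in principal coordinates on the first factorial axes, after which each subject's predicted recurrence probability would be mapped to dot size and colour.

```python
import numpy as np

def indicator_matrix(columns):
    """Complete disjunctive (one-hot) coding of several categorical variables."""
    blocks, labels = [], []
    for name, values in columns.items():
        values = np.asarray(values)
        levels = np.unique(values)
        blocks.append((values[:, None] == levels[None, :]).astype(float))
        labels += [f"{name}={lev}" for lev in levels]
    return np.hstack(blocks), labels

def mca(Z, n_axes=2):
    """Correspondence analysis of an indicator matrix Z via SVD.

    Returns principal coordinates of rows (subjects) and columns (categories)
    on the first n_axes factorial axes, plus each axis's share of inertia.
    """
    P = Z / Z.sum()
    r = P.sum(axis=1)  # row masses
    c = P.sum(axis=0)  # column masses
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)  # standardised residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    rows = (U[:, :n_axes] * sv[:n_axes]) / np.sqrt(r)[:, None]
    cols = (Vt.T[:, :n_axes] * sv[:n_axes]) / np.sqrt(c)[:, None]
    inertia_share = sv**2 / np.sum(sv**2)
    return rows, cols, inertia_share[:n_axes]
```

Inertia shares computed directly from the indicator matrix tend to look low; adjusted inertias (for example, Greenacre's adjustment) are often reported instead.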
Results
Figure 1D shows the time-dependent prognostic role of tumour size: the risk of recurrence is highest at early follow-up times and for large tumours (increasing with tumour size), but this effect weakens after 5 years of follow-up as the hazard decreases.
To obtain an overall view of the association between covariate levels and model-predicted risks, a multivariate visualisation approach (MCA) was adopted [25]. Predicted disease recurrence probabilities for each subject were projected jointly with covariate categories (see Appendix) onto a plane defined by the first two factorial axes. For a dynamic view of the evolution of the risk of disease recurrence, plots of 2-, 5- and 8-year predictions are reported in Figure 3A, B and C, respectively. According to the general trend, high-risk predictions at 2 years are mainly concentrated in the lower right corner of the graph and are associated with intermediate to large tumour size, low ER and PgR levels, and young age. Conversely, low-risk predictions appear well separated, being mainly concentrated in the upper left corner and associated with older age and high ER and PgR levels. The separation tends to diminish at 5–8 years. Although the oldest patients with high ER content appear to be the most protected overall, with increasing follow-up time high-risk predictions progressively spread towards the upper left corner of the graph and become associated with higher steroid-receptor content, intermediate ages and smaller tumour size. The increasing effects of ER and histology, together with the decreasing effect of tumour size over time, probably contribute to this evolving pattern.
Discussion
Prediction of breast cancer recurrence is a difficult task: although a considerable number of variables have been investigated, only a few have proved to have a clinically relevant prognostic role, also considering whether they can be reliably assessed in the whole breast cancer population [1–3]. Therefore, clinical practice is still based largely on information provided by a few clinical and pathological factors, including patient age, tumour size and histology, together with routinely measured biological markers such as steroid hormone receptors, which are considered more useful for predicting the clinical response to hormone therapy than for discriminating between patients at different risk of relapse. An important question is whether the prognostic information provided by these variables has been fully exploited, particularly for those measurable on a quantitative scale. Flexible statistical models play a key role in discovering and evaluating complex, previously unknown prognostic relationships over prolonged follow-up (at least 10 years) in breast cancer. In this context, Coradini et al. [11] applied a flexible time-dependent extension of the Cox model to a large series of node-negative breast cancer patients, focusing on the relative effects of covariates without exploring the shape of the hazard function over time. The PLANN approach, an ANN suitable for analysing the hazard as a function of time and covariates in censored survival data [16], was adopted in the present work to investigate the joint effects of routinely measured prognostic factors on the dynamics of tumour recurrence. The present results therefore complement those obtained with the Cox regression models.
ANN-based models rely on inductive inference: they attempt to discover biologically and clinically relevant prognostic relationships directly from patient and tumour data, thereby avoiding overly restrictive a priori assumptions [15]. In their conventional application to outcome prediction, ANNs have been adopted according to this principle, but essentially as black boxes: although they may improve prediction accuracy, the underlying statistical model is not immediately accessible to clinicians and investigators. Given the need to increase knowledge of the dynamics of breast cancer recurrence, the PLANN model is not used here as a black box. Instead, multivariate visualisation techniques are adopted to explore covariate effects on the hazard function over time, and patient profiles are displayed with the corresponding model-predicted disease recurrence probabilities at different follow-up times. With this strategy, the prognostic relationships underlying the data are identified, in order to understand how the model predicts outcomes before any further clinical application. The identification of time-dependent risk profiles can, in turn, be useful in improving follow-up and/or therapy planning.
The substantial consistency of the PLANN results with previous ones confirmed the time-dependent prognostic relevance of histology, ER and pathological tumour size, and revealed that PgR may partly share such behaviour. Moreover, the present analysis, which focused on the shape of the hazard function of disease recurrence, provided complementary information on disease dynamics. The increase during follow-up in the relative risk of recurrence for IDC plus ILC with respect to IDC was related to the different shapes of the hazard function according to tumour histology, suggesting different recurrence dynamics as a function of histology. This behaviour is also partly influenced by ER and PgR. Unlike previous results [11], in the PLANN analysis the protective role of PgR was much more evident at early follow-up. Overall, the present results suggest a different impact of hormonal regulation on ductal and lobular structures. As expected, tumour size showed a time-dependent effect that was similar in all histologies, supporting the evidence for an unfavourable impact on early recurrence that decreases over time. Taken together, these effects could be relevant for improving the accuracy of the prediction of disease evolution in each patient.
The joint representation of patient profiles with the corresponding PLANN-predicted disease-free survival probabilities after 2, 5 and 8 years of follow-up provides a comprehensive picture of the evolution of patients' risk of relapse and clarifies the interrelationships between variables. Patients with recognised unfavourable prognostic features, such as large tumour size, young age and low steroid-receptor content, appeared to be at particular risk of relapse at the earliest follow-up time. However, patients with small tumours associated with high ER levels and IDC plus ILC histology should also be carefully monitored, even after prolonged follow-up, because they could be at high risk of disease recurrence in the medium to long term. It should be considered that the probabilities of disease recurrence according to the studied prognostic factors may be substantially different following adjuvant systemic therapy. Given the response to such therapy, it would be useful to assess whether the increase in long-term risk for the above-mentioned patients can be overcome. Partial evidence against this possibility is the presence of time-dependent ER effects also in case series that include treated patients [8]. Although the present results need to be confirmed in other case series, they show the possible advantages of applying flexible statistical techniques, such as ANNs, to survival analysis in an exploratory framework.
In conclusion, thorough insights into the dynamics of breast cancer recurrence have been obtained by the application of an ANN model to survival time data. The present findings allow us to refine the evidence previously acquired using Cox regression analysis on the same data set and using the same covariates. Moreover, they constitute the basis for the identification of the prognostic contribution of new variables and the improvement of biological hypotheses about the dynamics of breast cancer recurrence. This step is critical for the generation of future prognostic classification schemes that should provide tailored prediction for individual patients, and thus contribute to the improvement in the clinical management of early breast cancer.
Appendix
In the corresponding graphical representation (Figure A1), input nodes are assigned to the explanatory variables $x_{1i}, \ldots, x_{Ki}$ for the $i$th observation, namely age (years), tumour size (mm), ER and PgR content (natural logarithm of fmol/mg cytosolic protein) and histology (lobular, ductal, mixed lobular plus ductal, other), whereas time, grouped into $l = 1, \ldots, L$ intervals with midpoints $m_l$, is treated as an additional input. The input values, weighted by $\beta_{hk}$ and $\beta_{hm}$, are summed and transformed by the hidden nodes with activation functions $g_h(\cdot)$. The weighted sum of the hidden-node values, with weights $\beta_h$, is transformed by a single output node with activation function $f(\cdot)$ to estimate the discrete hazard function $h(m_l, x_{1i}, \ldots, x_{Ki})$.
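The architecture just described can be sketched as a forward pass. Logistic activations for both g_h(·) and f(·) are assumed here, and histology is assumed to enter as three indicator variables against a reference category, so that the network has 8 inputs (age, tumour size, log ER, log PgR, three histology indicators and m_l). Under these assumptions, counting one bias per hidden node and one for the output node gives 8 × (8 + 1) + (8 + 1) = 81 weights, which agrees with the theoretical number given in the model-selection paragraph for eight hidden nodes. The argument names are hypothetical.

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def plann_hazard(x, m, W_hidden, b_hidden, w_out, b_out):
    """Discrete hazard h(m_l, x_1, ..., x_K) from a one-hidden-layer network.

    x: (n, K) covariates; m: (n,) interval midpoints used as an extra input.
    W_hidden: (H, K + 1) weights (beta_hk for covariates, beta_hm for time);
    b_hidden: (H,) hidden biases; w_out: (H,) weights beta_h; b_out: output bias.
    """
    inputs = np.column_stack([x, m])
    hidden = logistic(inputs @ W_hidden.T + b_hidden)  # g_h(.), assumed logistic
    return logistic(hidden @ w_out + b_out)            # f(.), assumed logistic

def n_weights(n_inputs, n_hidden):
    """Theoretical number of weights, biases included."""
    return n_hidden * (n_inputs + 1) + (n_hidden + 1)
```

Because m_l enters the hidden nodes alongside the covariates, the fitted hazard can change shape over time differently for different covariate patterns, which is what allows non-proportional and non-additive effects.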
Models with 6–12 hidden nodes and penalty values of 0.01–1, each with 10 different weight initialisations, were evaluated. The selected model had eight hidden nodes and a penalty value of 0.18, with 20 free estimated parameters. The number of effectively estimated parameters is much smaller than the theoretical number (81) for a network with eight hidden nodes, but such a result is expected owing to the effect of the penalty term. In fact, for penalty values >0.10 the choice of the number of hidden nodes was found to have less influence than weight decay in modulating the number of parameters actually estimated. Results on model selection obtained with NIC were further confirmed by a leave-five-out, non-linear cross-validation procedure [26]. Confidence intervals for the PLANN model results were estimated by means of a non-linear version [26] of the non-parametric bootstrap procedure [24].
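The role of the penalty value can be illustrated with the objective commonly minimised by weight-decay networks: the cross-entropy of the 0/1 subject-interval responses plus the penalty times the sum of squared weights. This is a sketch under assumptions: the exact scaling of the penalty term and whether bias terms are penalised are not specified in the text, and the function name is hypothetical. Larger penalty values shrink weights towards zero, which is consistent with an effective number of parameters (20) far below the theoretical 81.

```python
import numpy as np

def penalised_cross_entropy(y, h, weights, penalty):
    """Binary cross-entropy over subject-interval records plus weight decay.

    y: 0/1 responses from the expanded data; h: predicted discrete hazards;
    weights: iterable of weight arrays to penalise; penalty: decay constant.
    """
    y = np.asarray(y, dtype=float)
    # Clip to avoid log(0) for hazards numerically equal to 0 or 1
    h = np.clip(np.asarray(h, dtype=float), 1e-12, 1 - 1e-12)
    nll = -np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))
    decay = penalty * sum(np.sum(np.asarray(w) ** 2) for w in weights)
    return nll + decay
```

For instance, two records with responses 1 and 0, both predicted at h = 0.5, contribute 2 ln 2 to the cross-entropy, and a weight vector (1, 2) with penalty 0.1 adds 0.5.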
References
2. Goldhirsch A, Glick JH, Gelber RD et al. Meeting highlights: International Consensus Panel on the Treatment of Primary Breast Cancer. Seventh International Conference on Adjuvant Therapy of Primary Breast Cancer. J Clin Oncol 2001; 19: 3817–3827.
3. National Institutes of Health Consensus Development Panel. Adjuvant therapy for breast cancer, 1–3 November 2000. J Natl Cancer Inst 2001; 93: 979–989.
4. Hahnel R, Woodings T, Vivian AB. Prognostic value of estrogen receptors in primary breast cancer. Cancer 1979; 44: 671–675.
5. Aamdal S, Bormer O, Jorgensen O et al. Estrogen receptors and long-term prognosis in breast cancer. Cancer 1984; 53: 2525–2529.
6. Osborne CK. Steroid hormone receptors in breast cancer management. Breast Cancer Res Treat 1998; 51: 227–238.
7. Gray RJ. Flexible methods for analyzing survival data using splines with applications to breast cancer prognosis. J Am Stat Assoc 1992; 87: 942–951.
8. Hilsenbeck SG, Ravdin PM, de Moor CA et al. Time-dependence of hazard ratios for prognostic factors in primary breast cancer. Breast Cancer Res Treat 1998; 52: 227–237.
9. Saphner T, Tormey DC, Gray R. Annual hazard rates of recurrence for breast cancer after primary therapy. J Clin Oncol 1996; 14: 2738–2746.
10. Gasparini G, Fanelli M, Boracchi P et al. Behaviour of metastasis in relation to vascular index in patients with node-positive breast cancer treated with adjuvant tamoxifen. Clin Exp Metastasis 2000; 18: 15–20.
11. Coradini D, Daidone MG, Boracchi P et al. Time-dependent relevance of steroid receptors in breast cancer. J Clin Oncol 2000; 18: 2702–2709.
12. Bishop CM. Neural networks for pattern recognition. New York, NY: Oxford University Press 1995.
13. Schwarzer G, Vach W, Schumacher M. On the misuses of artificial neural networks for prognostic and diagnostic classification in oncology. Stat Med 2000; 19: 541–561.
14. Lisboa PJG. A review of evidence of health benefit from artificial neural networks in medical intervention. Neural Netw 2002; 15: 11–39.
15. Grumett SA, Snow PB. Artificial neural networks: a new model for assessing prognostic factors. Ann Oncol 2000; 11: 383–384.
16. Biganzoli E, Boracchi P, Mariani L et al. Feed forward neural networks for the analysis of censored survival data: a partial logistic regression approach. Stat Med 1998; 17: 1169–1186.
17. Ronchi E, Granata G, Brivio M et al. A double-labeling assay for simultaneous estimation and characterization of estrogen and progesterone receptors using radioiodinated estradiol and tritiated Org 2058. Tumori 1986; 72: 251–257.
18. Revision of the standards for the assessment of hormone receptors in human breast cancer; report of the second E.O.R.T.C. Workshop, held on 16–17 March, 1979, in The Netherlands Cancer Institute. Eur J Cancer 1980; 16: 1513–1515.
19. Piffanelli A, Pelizzola D, Giovannini G et al. Characterization of laboratory working standard for quality control of immunometric and radiometric estrogen receptor assays. Clinical evaluation on breast cancer biopsies. Italian Committee for Hormone Receptor Assays Standardization. Tumori 1989; 75: 550–556.
20. Ripley BD. Pattern recognition and neural networks. Cambridge, UK: Cambridge University Press 1996.
21. Biganzoli E, Boracchi P, Marubini E. A general framework for neural network models on censored survival data. Neural Netw 2002; 15: 209–218.
22. Efron B. Logistic regression, survival analysis, and the Kaplan–Meier curve. J Am Stat Assoc 1988; 83: 414–425.
23. Gray RJ. Hazard rate regression using ordinary nonparametric regression smoother. J Comput Graph Stat 1996; 5: 190–207.
24. Boracchi P, Biganzoli E, Marubini E. Joint modelling of cause specific hazard functions with cubic splines: an application to a large series of breast cancer patients. Comput Stat Data Anal 2003; 42: 243–262.
25. Greenacre MJ. Correspondence analysis in practice. London, UK: Academic Press 1996.
26. Moody J. Prediction risk and architecture selection for neural networks. In Cherkassky V, Friedman JH, Wechsler H (eds): From Statistics to Neural Networks: Theory and Pattern Recognition Applications, NATO ASI Series F. New York, NY: Springer-Verlag 1994.