University of Bristol, Department of Social Medicine, Canynge Hall, Whiteladies Road, Bristol BS8 2PR, UK.
Coronary heart disease (CHD) remains the major cause of premature death in developed countries. Many of the risk factors for CHD are well known, and interventions exist for some of them, e.g. statins for reducing cholesterol and drugs for hypertension. In the paper by Reinhard Voss and colleagues, a new approach to predicting those at high absolute risk (>20%) of a coronary event in the next 10 years using neural network models is compared with a method using a standard logistic regression model.1 The success of the multi-layer perceptron neural network (MLP-NN) is quite remarkable: 74.5% of coronary events were predicted, compared with 45.8% by the logistic regression model. Furthermore, this was achieved while classifying a smaller proportion of men as high risk (7.9% versus 8.4% for the logistic regression model). Fewer men would therefore be eligible for treatment overall, yet the proportion of coronary events potentially preventable by treatment would rise substantially, from 15% with the logistic regression model to 25% with the MLP-NN model. This commentary looks at two issues: first, how might prediction of CHD be further improved using neural networks; and second, how transportable are these models and can they be used as clinical tools?
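To make the comparison concrete, the sketch below fits a logistic regression and a small multi-layer perceptron to simulated data and flags individuals whose predicted event probability exceeds the 20% absolute-risk threshold. It is a minimal illustration only: the data, the 13-variable count, the network architecture and the event rate are assumptions chosen for the example and bear no relation to the authors' actual cohort or models.

```python
# Minimal sketch (simulated data, not the authors' models): flag individuals
# whose predicted event probability exceeds 20%, once with logistic
# regression and once with a small multi-layer perceptron.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Simulated cohort: 13 candidate predictors, roughly 8% event rate (assumed).
X, y = make_classification(n_samples=5000, n_features=13, n_informative=8,
                           weights=[0.92, 0.08], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y,
                                                    random_state=0)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "MLP neural network": MLPClassifier(hidden_layer_sizes=(10,),
                                        max_iter=2000, random_state=0),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    risk = model.predict_proba(X_test)[:, 1]   # predicted event probability
    high_risk = risk > 0.20                    # "high absolute risk" threshold
    sensitivity = (high_risk & (y_test == 1)).sum() / (y_test == 1).sum()
    print(f"{name}: {high_risk.mean():.1%} flagged as high risk, "
          f"{sensitivity:.1%} of events detected")
```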
Modelling coronary risk is a complex task that involves several stages and many decisions. An important part of the modelling process that is often not given enough attention is the selection of prognostic variables to be included in the model. The selection may be done a priori, based on previous knowledge. More often than not it is done in a data-driven way that leads to a different set of variables being selected each time. Some risk factors, such as cholesterol, have such an important influence that they are invariably selected. Others are more problematic: alcohol consumption and family history of CHD may or may not appear in prognostic models for CHD. This paper is interesting in that it uses two completely different methods of selecting variables: stepwise variable selection with backward elimination within the logistic regression framework, and a genetic algorithm supplied as part of the neural network software package. Not surprisingly, there is a large overlap between the sets of variables selected by the two methods, but the two sets do not match completely. What is the explanation for this, and which is the better set of variables?
Stepwise variable selection as a method of choosing prognostic variables has been widely criticized.2 Two problems with stepwise selection are that the number of candidate variables affects the number of noise variables that get included in the model, and the degree of correlation between the predictor variables affects the frequency with which authentic predictor variables get selected.3 In their paper, Voss et al. used univariable screening to reduce 57 candidate variables to 26 likely prognostic variables. This can miss variables that are important only after adjustment for other variables.4 The next stage was to reduce these 26 variables to 13 using multivariable stepwise selection. The results of this process are very dependent on the particular dataset used for selection. If the selection process were repeated on bootstrapped replicates of the dataset, it is likely that slightly different sets of variables would be chosen. In other words, the selection process overfits to the data and is not optimal. Just as it is important to split data into training and test sets (or use cross-validation or bootstrapping techniques) in order to ensure a model fits data other than that on which it was estimated, so it is also important to choose prognostic variables that generalize well to data other than that on which they were chosen. Sauerbrei and Schumacher give a method that uses bootstrapped samples to select prognostic variables.5 This method retains in the model those variables that are selected most often across all the bootstrapped samples.
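The sketch below illustrates the general idea behind such bootstrap-based selection: repeat a selection procedure on many bootstrap resamples and retain the variables chosen most often. Recursive feature elimination with a logistic model stands in here for the stepwise backward elimination that would be used in practice, and the simulated data, the number of resamples and the 60% inclusion threshold are all illustrative assumptions rather than values taken from reference 5.

```python
# Sketch of bootstrap inclusion frequencies for variable selection
# (illustrative only; RFE stands in for stepwise backward elimination).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.utils import resample

# Simulated data: 26 candidate variables, echoing the screening step above.
X, y = make_classification(n_samples=2000, n_features=26, n_informative=10,
                           random_state=0)
n_boot = 100
counts = np.zeros(X.shape[1])

for b in range(n_boot):
    Xb, yb = resample(X, y, random_state=b)              # bootstrap resample
    selector = RFE(LogisticRegression(max_iter=1000),
                   n_features_to_select=13).fit(Xb, yb)  # select 13 variables
    counts += selector.support_                          # tally which were chosen

inclusion_freq = counts / n_boot
stable = np.where(inclusion_freq >= 0.60)[0]  # keep if chosen in >=60% of resamples
print("Inclusion frequencies:", np.round(inclusion_freq, 2))
print("Variables retained (indices):", stable)
```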
In this paper a genetic algorithm was also used to select variables, but the decision was made not to use the set of variables chosen by this method in the comparison of the different models. If we agree with the authors of this paper that there are complex non-linear relationships between the prognostic variables and the CHD outcome variable, which is why the MLP-NN model has greater predictive power than the less complex linear logistic regression model, then it makes little sense to use a set of variables chosen by the inferior method. The genetic algorithm, which is also non-linear, efficiently compares huge numbers of different combinations of variables on replicated datasets. It is likely, therefore, that it will result in a better choice of prognostic variables, although this is yet to be tested. Voss and colleagues opted to ignore the results from the genetic algorithm. Their rationale was that their aim was not to test the absolute performance of neural networks but rather to compare them to logistic regression, and so they used the 13 variables chosen by the logistic regression method. This seems rather misguided, as surely the aim ought to be to find the optimal model. Forcing the neural networks to use the same set of variables selected by an inferior model is like forcing all Formula-1 racing drivers to drive the same car: it might be a fairer test of the drivers, but that is not the point of the race, which is to find the fastest team. An alternative fair comparison would be to use the set of variables selected by the genetic algorithm in both the logistic regression model and the neural networks. Neural networks are often compared to other techniques in a way that does not allow them to perform optimally as models. The stakes are high here, as better models may lead to the prevention of more deaths from CHD. It is to be hoped that this research group will address in future work the issue of whether variable selection by genetic algorithms leads to better prognostic models for CHD.
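For readers unfamiliar with the technique, the toy example below shows how a genetic algorithm can search over subsets of candidate variables: each candidate subset is a binary chromosome, fitness is a cross-validated measure of predictive performance, and fitter subsets are recombined and mutated over successive generations. This is a generic sketch with arbitrary settings, not the algorithm supplied with the authors' software, and it scores subsets with a logistic model purely to keep the example fast.

```python
# Toy genetic algorithm for variable (feature subset) selection.
# Generic sketch with arbitrary settings; not the authors' implementation.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=2000, n_features=26, n_informative=10,
                           random_state=0)
n_vars, pop_size, n_gen = X.shape[1], 20, 15

def fitness(mask):
    """Cross-validated AUC of a logistic model using only the masked variables."""
    if mask.sum() == 0:
        return 0.0
    return cross_val_score(LogisticRegression(max_iter=1000),
                           X[:, mask.astype(bool)], y,
                           cv=3, scoring="roc_auc").mean()

pop = rng.integers(0, 2, size=(pop_size, n_vars))        # random initial subsets
for gen in range(n_gen):
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)[-pop_size // 2:]]   # keep the fitter half
    children = []
    while len(children) < pop_size - len(parents):
        a, b = parents[rng.integers(len(parents), size=2)]
        cut = rng.integers(1, n_vars)                    # single-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        flip = rng.random(n_vars) < 0.05                 # low mutation rate
        child[flip] = 1 - child[flip]
        children.append(child)
    pop = np.vstack([parents, np.array(children)])

best = pop[np.argmax([fitness(ind) for ind in pop])]
print("Selected variable indices:", np.where(best == 1)[0])
```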
We now consider the question of the potential for using this type of neural network model in a wide range of clinical settings. At present, many clinical tools for CHD risk prediction are derived from the Framingham data using either logistic regression models or Cox or Weibull survival models. Using a neural network model for this purpose is no more difficult than using any other type of model. Whilst the model itself may have a more complicated form with more parameters and may be harder to fit, once it has been fitted a website could be set up, similar to the Framingham CHD prediction website, which could calculate an individual's risk from the patient's data input by the clinician. However, at this point there has to be the usual caveat about transportability of the model to settings other than men working in North Germany. Whilst some care has been taken to validate the models, the standard of validation in this paper is very weak.6 Cross-validation using a random split of the data is a technique that will not address the serious problem of statistical over-optimism in assessing the performance of a model, because the performance is being tested on a statistically homogeneous dataset.7 Whilst there is no dispute that the MLP-NN performs much better than the logistic regression model on this dataset, it would be naïve to expect any of the models to perform as well on completely independent data. In order to obtain a clinically useful prediction tool, prognostic models should be fitted across a range of heterogeneous datasets so that generalizability is built into the modelling process and is used as one of the criteria for choosing the prognostic model from a set of candidate models. Techniques for fitting, testing and selecting prognostic models across multiple datasets have been developed by Royston et al.8,9 Several papers have compared Framingham predictions with locally produced estimates in diverse populations and have found that the Framingham model either over- or underestimates CHD risk in these populations.10–12 An improvement on the current situation should be possible if neural networks are used to provide more accurate predictions and the models are fitted across multiple cohorts to ensure that they are applicable to diverse populations.
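The sketch below illustrates the distinction being drawn here: a model is fitted on one simulated 'cohort' and then checked for discrimination and calibration on a second cohort with a deliberately different case mix, rather than on a random split of the same data. Every detail, from the covariate used to split the cohorts to the decile-based calibration check, is an illustrative assumption and not a description of the authors' validation or of the Framingham tools.

```python
# Sketch of external (cross-cohort) validation on simulated data:
# fit on one cohort, check discrimination and calibration on another.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=6000, n_features=13, n_informative=8,
                           weights=[0.9, 0.1], random_state=0)
order = np.argsort(X[:, 0])            # split on one covariate to mimic two
dev, ext = order[:4000], order[4000:]  # cohorts with different case mix

model = LogisticRegression(max_iter=1000).fit(X[dev], y[dev])
risk_ext = model.predict_proba(X[ext])[:, 1]

# Discrimination in the external cohort
print("External AUC:", round(roc_auc_score(y[ext], risk_ext), 3))

# Simple calibration check: observed vs predicted event rates by risk decile
cuts = np.quantile(risk_ext, np.linspace(0.1, 0.9, 9))
deciles = np.digitize(risk_ext, cuts)
for d in range(10):
    in_d = deciles == d
    if in_d.any():
        print(f"decile {d}: predicted {risk_ext[in_d].mean():.2f}, "
              f"observed {y[ext][in_d].mean():.2f}")
```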
References
2 Harrell FE Jr. Regression Modeling Strategies. New York: Springer-Verlag, 2001.
3 Derksen S, Keselman HJ. Backward, forward and stepwise automated subset selection algorithms: Frequency of obtaining authentic and noise variables. British Journal of Mathematical and Statistical Psychology 1992;45:265–82.
4 Sun G, Shook TL, Kay GL. Inappropriate use of bivariable analysis to screen risk factors for use in multivariable analysis. J Clin Epidemiol 1996;49:907–16.
5 Sauerbrei W, Schumacher M. A bootstrap resampling procedure for model building: Application to the Cox regression model. Stat Med 1992;11:2093–109.
6 Justice AC, Covinsky KE, Berlin JA. Assessing the generalizability of prognostic information. Ann Intern Med 1999;130:515–24.
7 Efron B. How biased is the apparent error rate of a prediction rule? J Am Statist Assoc 1986;81:461–70.
8 Royston P, Parmar MKB, Sylvester R. Construction and validation of a prognostic model across several studies, with an application in superficial bladder cancer. Stat Med 2002; in press.
9 Egger M, May M, Chêne G et al. Prognosis of HIV-1-infected patients starting highly active antiretroviral therapy: a collaborative analysis of prospective studies. Lancet 2002;360:119–29.
10 Haq IU, Ramsay LE, Yeo WW, Jackson PR, Wallis EJ. Is the Framingham risk function valid for northern European populations? A comparison of methods for estimating absolute coronary risk in high risk men. Heart 1999;81:40–46.
11 Ramachandran S, French JM, Vanderpump MP, Croft P, Neary RH. Using the Framingham model to predict heart disease in the United Kingdom: retrospective study. BMJ 2000;320:676–77.
12 Menotti A, Puddu PE, Lanti M. Comparison of the Framingham risk function-based coronary chart with risk function from an Italian population study. Eur Heart J 2000;21:365–70.