Prediction of risk of coronary events in middle-aged men in the Prospective Cardiovascular Münster Study (PROCAM) using neural networks

Reinhard Voss (a), Paul Cullen (a,b), Helmut Schulte (b) and Gerd Assmann (a,b)

a Institute of Clinical Chemistry and Laboratory Medicine,
b Institute of Arteriosclerosis Research, University of Münster, Germany.

Correspondence: Dr Paul Cullen, Institut für Arterioskleroseforschung an der Universität Münster, Domagkstraße 3, 48149 Münster, Germany. E-mail: cullen@uni-muenster.de


Abstract
Background Logistic regression (LR) is commonly used to estimate risk of coronary heart disease. We investigated whether neural networks improved on the risk estimates of LR by analysing data from the Prospective Cardiovascular Münster Study (PROCAM), a large prospective epidemiological study of risk factors for coronary heart disease among men and women at work in northern Germany.

Methods We used a multi-layer perceptron (MLP) and probabilistic neural networks (PNN) to estimate the risk of myocardial infarction or acute coronary death (coronary events) during 10 years’ follow-up among 5159 men aged 35–65 years at recruitment into PROCAM. In all, 325 coronary events occurred in this group. We assessed the performance of each procedure by measuring the area under the receiver-operating characteristics curve (AUROC).

Results The AUROC of the MLP was greater than that of the PNN (0.897 versus 0.872), and both exceeded the AUROC for LR of 0.840. If ‘high risk’ is defined as an event risk >20% in 10 years, LR classified 8.4% of men as high risk, 36.7% of whom suffered an event in 10 years (45.8% of all events). The MLP classified 7.9% as high risk, 64.0% of whom suffered an event (74.5% of all events), while with the PNN, only 3.9% were at high risk, 58.6% of whom suffered an event (33.5% of all events).

Conclusion Intervention trials indicate that about one in three coronary events can be prevented by 5 years of lipid-lowering treatment. Our analysis suggests that use of the MLP to identify high-risk individuals as candidates for drug treatment would allow prevention of 25% of coronary events in middle-aged men, compared to 15% and 11% with LR and the PNN, respectively.

Keywords Coronary heart disease, risk factors, neural networks, logistic regression

Accepted 31 July 2002


Introduction
The risk of developing coronary heart disease (CHD) depends on a large number of determinants—some of which are associated with lifestyle—operating from early childhood on. Many of these determinants interact in a complex non-linear fashion, complicating risk assessment in the individual patient. Prediction rules developed using clinical judgement include a large subjective component and are therefore difficult to standardize.1,2 Prediction rules have also been developed using statistical procedures such as stepwise logistic regression in large prospective epidemiological studies such as the Framingham study in the US3 and the Prospective Cardiovascular Münster Study (PROCAM) in Europe.4 These provide a truer picture of a person’s overall or global risk of CHD than clinical classification schemes (e.g. ref. 5) that suggest the simple addition of a small number of variables.

Logistic regression assumes that a variable is related to risk in a particular continuous fashion. Because the number of coefficients present in logistic regression functions is limited, it is usually not difficult to detect when such a function is producing results that are biologically implausible and to identify the source of that implausibility. By contrast, neural networks are more complex, utilize larger numbers of coefficients, and take into account complex non-linear relationships that exist within the data. Moreover, as recently noted by Dayhoff and DeLeo, in approximating a multifactorial function, neural networks create the functional form and fit the function at the same time, a capability that gives them a decided advantage over traditional statistical multivariate regression techniques.6

As a result, neural networks may produce a model of greater discrimination and thus a more accurate estimation of risk than logistic regression. Several good introductions to the neural network approach have been published.7–9 Our aim here was to use neural networks to calculate the posterior probability of an individual suffering a fatal or non-fatal myocardial infarction if a particular constellation of risk factors is present, and to compare the results to those obtained with logistic regression analysis. Neural networks have been shown to provide good estimates of classical Bayesian probabilities.10,11 As reviewed by Tu, neural networks are also able to implicitly detect complex non-linear relationships between dependent and independent variables and to uncover all possible interactions between predictors, in our case risk factors for CHD.12 For these reasons, neural networks may provide better generalization than conventional regression techniques.12

The PROCAM study is a large prospective epidemiological investigation performed among the working population in Westphalia and the northern Ruhr regions of Germany. We developed a logistic regression model and two kinds of neural network to identify those individuals at risk of CHD in the cohort of middle-aged men with 10 years of follow-up.


Subjects and Methods
Study design
Recruitment to the PROCAM study13,14 was started in 1979 and completed in 1985. During this time 20 060 employees of 52 companies and local government authorities were examined. Details of the examination procedure are reported elsewhere.4

Follow-up
Follow-up was by questionnaire every 2 years, with a response rate of 96%. In each case in which evidence of morbidity or mortality was entered in the questionnaire, hospital records and records of the attending physician were obtained and, in the case of deceased study participants, an eyewitness account of death was sought. A Critical Event Committee reviewed these data, together with data from death certificates, in order to verify the diagnosis or the cause of death. A ‘major coronary event’ was defined as the presence of sudden cardiac death or of a definite fatal or non-fatal myocardial infarction on the basis of ECG and/or cardiac enzyme changes.15

Cohort analysis
In the cohort of 5159 men aged 35–65 years recruited before the end of 1985, a sufficient number of major coronary events occurred within 10 years to allow statistically valid longitudinal analysis.4 Among women and younger men, numbers were insufficient to permit such analysis. In this group of middle-aged men, 325 major coronary events occurred. In 63 men who did not suffer an acute coronary event as defined in this study, CHD was diagnosed by angiography or other means during the period of follow-up. These 63 men were excluded from the analysis presented here. Fourteen suspected coronary deaths were observed, and these individuals were also excluded from the analysis. In all, 218 men died from other causes within the 10-year follow-up period. Non-fatal stroke occurred in 46 men. Thus, 4493 middle-aged men survived 10 years after examination without a major coronary event and without the exclusion factors listed above. For all analyses in this report, these 4493 men were compared to the 325 men (prevalence 6.7%) who suffered a major coronary event. The establishment of the cohort of middle-aged men studied in this report is shown in Table 1.


Table 1 Study population in the Prospective Cardiovascular Münster Study (PROCAM) and procedure by which the sub-population reported in this paper was selected
Identification of variables
Fifty-seven clinical and laboratory variables were measured in PROCAM.16 To identify the variables used in the present study, the following procedure was used:17 univariate analysis was performed and those variables with a P-value > 0.25 were excluded. The remaining 26 variables were analysed by logistic regression using forward selection with a test of backward elimination, with a log-likelihood ratio test at a P-value threshold of 0.25 as the selection criterion. Following this step, 13 variables (age, high density lipoprotein [HDL] cholesterol, low density lipoprotein [LDL] cholesterol, triglycerides, systolic blood pressure, smoking, diabetes mellitus, family history of myocardial infarction, height, gamma glutamyl transferase, body mass index, and personal or family history of hypertension) remained. The assumption of linearity of the log-likelihood estimates of the continuous variables within this group of 13 was then tested. To do this, quadratic and logarithmic transformations of each variable were generated and examined for significance. This led to the inclusion of the terms ln(triglycerides) and HDL cholesterol × ln(HDL cholesterol). The use of cubic splines produced numerically unstable results with improbable terms for the linear coefficients, did not improve the log-likelihood ratios, and was therefore not pursued further. Finally, interactions between the 15 terms identified were tested using as a cut-off a log-likelihood ratio test with a P-value of 0.1 for each interaction; terms with a P-value > 0.05 were then eliminated by forward selection with a test of backward elimination. The final model contained the variables age, triglycerides, ln(triglycerides), HDL cholesterol, LDL cholesterol, systolic blood pressure, gamma glutamyl transferase, smoking, diabetes, family history of myocardial infarction, and the terms HDL cholesterol × ln(HDL cholesterol), triglycerides × HDL cholesterol and triglycerides × gamma glutamyl transferase.
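
To make the selection procedure concrete, the sketch below implements the forward-selection step with a likelihood-ratio entry test in Python (statsmodels). This is our illustration, not the authors' SPSS code: the backward-elimination check is omitted for brevity, and the DataFrame layout and column names are assumptions.

    import numpy as np
    import statsmodels.api as sm
    from scipy.stats import chi2

    def forward_select(df, candidates, outcome, p_enter=0.25):
        """Add variables one at a time while the likelihood-ratio test
        comparing the enlarged model with the current one has P < p_enter."""
        selected = []
        # Start from the intercept-only model.
        current_ll = sm.Logit(df[outcome], np.ones(len(df))).fit(disp=0).llf
        while True:
            best = None
            for var in candidates:
                if var in selected:
                    continue
                X = sm.add_constant(df[selected + [var]])
                ll = sm.Logit(df[outcome], X).fit(disp=0).llf
                p = chi2.sf(2 * (ll - current_ll), df=1)  # LR test, 1 df
                if p < p_enter and (best is None or ll > best[1]):
                    best = (var, ll)
            if best is None:
                return selected
            selected.append(best[0])
            current_ll = best[1]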

Neural networks
Two types of neural network were used, a multi-layer perceptron network with one hidden layer (MLP) and a probabilistic neural network (PNN). Both are supervised networks that are used for prediction and classification problems. The networks were designed using Statistica Neural Networks (SNN) software release 4.1 (StatSoft Inc., Tulsa, Oklahoma, USA) running under Microsoft Windows 2000®. The architecture of these networks is shown in Figures 1 and 2, respectively, and is described in detail in the Appendix. To select the variables for use with the neural networks, we first used forward and backward stepwise feature selection and a genetic input selection algorithm,18 all part of the SNN software, to identify variables that contributed significantly to the performance of the networks and to remove those that did not. Algorithms for forward and backward selection work by adding or removing variables one at a time. Genetic algorithms, by contrast, generate optimal sets of input variables by constructing binary masks using artificial mutation, crossover and selection operators. Each mask is used to construct a new training set for testing with probabilistic neural networks; an increase in the network error indicates an irrelevant input variable.18
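
For orientation, an MLP of the kind described can be reproduced in a few lines with scikit-learn. This is a sketch under our assumptions; the study itself used the SNN software, not this library.

    from sklearn.neural_network import MLPClassifier

    # 13 inputs, one hidden layer of 4 logistic-sigmoid units, one output
    # whose value is read as the probability of a major coronary event.
    mlp = MLPClassifier(hidden_layer_sizes=(4,), activation='logistic',
                        early_stopping=True,       # stop when hold-out error rises
                        validation_fraction=0.25,  # internal hold-out, cf. Methods
                        max_iter=2000, random_state=0)
    # Usage (X: (n, 13) array of risk factors, y: 0/1 event indicator):
    # mlp.fit(X, y); ten_year_risk = mlp.predict_proba(X_new)[:, 1]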



Figure 1 Architecture of the feed-forward supervised 3-layer perceptron neural network used in the present study. There are 13 inputs corresponding to the 8 continuous and 5 categorical risk factors as described in the text, 4 hidden units with logistic sigmoid activation functions, and one linear output corresponding to presence or absence of a major coronary event. The bias parameters in the first layer are shown as weights (w0) from an extra input having a fixed value of x0 = 1. Details of the theoretical background to this network may be obtained from the authors on request.


Figure 2 Architecture of the probabilistic neural network used in the present study showing the input layer, the radial or pattern layer, the summation layer, and the output layer. Details of the theoretical background to this network are available from the authors on request.
The sets of variables selected with this approach showed a large degree of overlap with the 13 variables chosen by logistic regression analysis as outlined above. For this reason, and because the aim of the present work was not to test the absolute performance of neural networks, but rather to compare them to logistic regression, we constructed our networks using the 13 variables chosen by logistic regression.

Use of validation sets to adjust performance of logistic regression and the neural networks
In order to derive the logistic regression equations and the neural networks in this study, we divided our data into five equal and distinct sets. Four of these five sets were combined and used for training; the remaining fifth was used for testing the performance of the logistic regression and neural network models on ‘unknown’ data. This cross-validation procedure was repeated for every possible 4 + 1 combination. As training of an MLP is an iterative adaptive task, each of the 4/5 datasets described above was randomly partitioned into an internal training set comprising half of the cases, a verification set comprising a quarter of the cases, and a validation set comprising the remaining quarter. The MLP was then trained iteratively through a number of epochs. In each epoch, the entire training set was presented to the network, case by case; errors were calculated and used to adjust the weights in the network, whose performance was then measured against the verification dataset. We used this verification error as an early stopping condition to prevent over-fitting, a phenomenon in which the training error falls but the verification error rises as the network learns to produce the desired outputs for the training set but fails to generalize to new data.
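
The sketch below reproduces the shape of this 5-fold procedure with scikit-learn; it is an illustration only, with synthetic stand-in data, and it approximates the internal 50/25/25 split by holding out a quarter of each training block for early stopping.

    import numpy as np
    from sklearn.model_selection import StratifiedKFold
    from sklearn.neural_network import MLPClassifier
    from sklearn.metrics import roc_auc_score

    # Synthetic stand-in for the PROCAM data (assumption, for illustration):
    # X is an (n, 13) array of risk factors, y a 0/1 event indicator.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 13))
    y = (X[:, 0] + rng.normal(size=1000) > 1.5).astype(int)

    aucs = []
    for train_idx, test_idx in StratifiedKFold(n_splits=5, shuffle=True,
                                               random_state=0).split(X, y):
        # Each 4/5 training block internally holds out cases for early
        # stopping, mirroring the verification set described in the text.
        mlp = MLPClassifier(hidden_layer_sizes=(4,), activation='logistic',
                            early_stopping=True, validation_fraction=0.25,
                            max_iter=2000, random_state=0)
        mlp.fit(X[train_idx], y[train_idx])
        aucs.append(roc_auc_score(y[test_idx],
                                  mlp.predict_proba(X[test_idx])[:, 1]))
    print(f"Mean AUROC over the five held-out fifths: {np.mean(aucs):.3f}")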

The PNN, by contrast, was trained in a single epoch (Appendix). The 4/5 training sets were passed in toto through the network and a set of weights was calculated for each class of outcome (coronary event, no coronary event). The performance of each PNN was optimized by varying the smoothing factor (i.e. the radial deviation of the Gaussian kernel functions) in the range from 0.1 to 0.3.

Neural networks and over-fitting
A major problem of neural networks is their tendency to over-fit the data, i.e. to generate networks that are too closely adapted to the training data (the so-called bias-variance problem [Appendix]).19

When applied to unseen data, such over-fitted networks often produce biologically implausible results. One way to deal with over-fitting is to divide the data into training and verification sets as described above and to stop training the network at the point at which the error in the verification set is at a minimum (early stopping procedure).12 We tried this approach but found that errors still occurred when the networks were presented with unseen data. To address this problem, we generated datasets in which all variables but one were held constant at their average value in our population while the remaining variable was varied throughout the entire biologically plausible range. For example, risk profiles were generated in which all variables but systolic blood pressure were held constant at their average value and outcomes were computed for systolic blood pressures varying in steps of 1 mmHg from 70 mmHg to 250 mmHg. These simulations showed that MLP networks containing more than four nodes in the hidden layer and probabilistic networks using smoothing factors lower than 0.12 were liable to produce implausible results when presented with unseen constellations of risk factors. For these reasons, an MLP with four nodes in one hidden layer and a PNN with a smoothing factor of 0.14 were used.
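
A plausibility sweep of this kind is easy to script. The sketch below is our illustration (not the authors' code); the trained `model` is assumed to expose a scikit-learn-style predict_proba, and the column index of systolic blood pressure is an assumption.

    import numpy as np

    def risk_sweep(model, X, column, lo, hi, step=1.0):
        grid = np.arange(lo, hi + step, step)
        profile = np.tile(X.mean(axis=0), (len(grid), 1))  # average-risk profile
        profile[:, column] = grid                          # vary one factor only
        return grid, model.predict_proba(profile)[:, 1]

    # e.g. systolic blood pressure from 70 to 250 mmHg in 1 mmHg steps:
    # bp, risk = risk_sweep(mlp, X, column=5, lo=70, hi=250)
    # A plausible model should show risk rising smoothly with pressure.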

Statistics
The performances of logistic regression, the MLP and the PNN were compared using receiver-operating characteristics (ROC) analysis.20 In a ROC analysis, the sensitivity (the true positive fraction of class 2 [presence of a major coronary event]) is plotted against 1 minus the specificity (the false positive fraction of class 1 [absence of a major coronary event]) for each possible decision threshold. Performance in this case refers to the ability of the various procedures to discriminate between men who developed a coronary event and men who did not. We compared the areas under the ROC curves according to the approach of Hanley and McNeil.21 Statistical analyses were performed using the Statistical Package for the Social Sciences (SPSS-X).22
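
With modern tooling, the ROC comparison can be sketched as follows; the fitted models (logreg, mlp, pnn) and the arrays X and y are assumptions carried over from the sketches above.

    from sklearn.metrics import roc_curve, roc_auc_score

    for name, model in [("Log Reg", logreg), ("MLP", mlp), ("PNN", pnn)]:
        scores = model.predict_proba(X)[:, 1]          # predicted 10-year risk
        fpr, tpr, thresholds = roc_curve(y, scores)    # full ROC curve
        print(name, "AUROC =", round(roc_auc_score(y, scores), 3))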


Results
Coronary heart disease risk factors examined in this study
The risk factors examined by logistic regression and neural networks in this study are listed in Table 2, which shows the mean levels of the 26 potential coronary heart disease risk factors among 325 men with coronary events and 4493 controls who survived 10 years without a major coronary event, as selected by univariate analysis with a cut-off of P < 0.25. The last two columns in Table 2 indicate whether each variable was used in the neural networks or in the logistic regression model.


Table 2 List of 26 coronary heart disease risk factors among men aged 35–65 years with and without coronary events within 10 years of follow-up in the Prospective Cardiovascular Münster Study (PROCAM) as selected by univariate analysis with a cut-off of P < 0.25 (see text for details), showing which variables were used for logistic regression and neural networks. Upper section: Age-adjusteda mean values (± standard deviation) of continuous risk variables. Lower section: Percentages of men with and without coronary events showing presence of categorical variables
Logistic regression analysis
Stepwise logistic regression was used to calculate a risk function of the form:

probability of a major coronary event in 10 years $= \dfrac{1}{1 + e^{-z}}$, where $z = \beta_0 + \sum_{i=1}^{13} \beta_i x_i$,

$\beta_0$ denotes the intercept of the model, and the variables $x_i$ and the coefficients $\beta_i$ were as follows:
     Variable                                            Units             β
x1   Age                                                 years              0.0979
x2   Triglycerides                                       mg/dl             -0.0226
x3   ln(triglycerides)                                   mg/dl              5.7525
x4   HDL cholesterol                                     mg/dl             -0.4106
x5   LDL cholesterol                                     mg/dl              0.0151
x6   Systolic blood pressure                             mmHg               0.0140
x7   Gamma glutamyl transferase                          units/l           -0.0257
x8   Smoking                                             no = 0, yes = 1    0.7547
x9   Diabetes mellitus                                   no = 0, yes = 1    0.5827
x10  Positive family history of myocardial infarction    no = 0, yes = 1    0.4154
x11  HDL cholesterol × ln(HDL cholesterol)               mg/dl              0.0899
x12  Triglycerides × HDL cholesterol                     mg/dl             -0.0004
x13  Triglycerides × gamma glutamyl transferase          mg/dl, units/l     0.0001
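
Evaluating this function is simple arithmetic, as the Python sketch below shows. Note that the intercept $\beta_0$ is not reported in this extract, so a placeholder of 0 is used and the absolute risks returned are illustrative only; the derived terms otherwise follow the table above.

    import math

    BETA = [0.0979, -0.0226, 5.7525, -0.4106, 0.0151, 0.0140, -0.0257,
            0.7547, 0.5827, 0.4154, 0.0899, -0.0004, 0.0001]
    BETA0 = 0.0  # placeholder: the intercept is not given in this extract

    def procam_risk(age, tg, hdl, ldl, sbp, ggt, smoker, diabetic, fam_history):
        """10-year event probability; units as in the table (years, mg/dl,
        mmHg, units/l). Categorical inputs are True/False."""
        x = [age, tg, math.log(tg), hdl, ldl, sbp, ggt,
             float(smoker), float(diabetic), float(fam_history),
             hdl * math.log(hdl), tg * hdl, tg * ggt]
        z = BETA0 + sum(b * xi for b, xi in zip(BETA, x))
        return 1.0 / (1.0 + math.exp(-z))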

Receiver-operating characteristics curve analysis of neural networks and logistic regression analysis
Analysis of ROC curves showed the superior performance of neural network analysis over logistic regression in predicting coronary events among middle-aged men in PROCAM (Figure 3, Table 3). The area under the ROC curve of the MLP was greater than that of the PNN (0.897 versus 0.872, respectively), and both values exceeded the area under the ROC curve for logistic regression of 0.840. As can be seen from Table 3, the 95% CI of the three ROC curves showed no overlap, indicating that the differences between the curves were unlikely to be due to chance. In prevention of CHD, the value of a predictive test lies in its ability to accurately identify those people who require intervention while excluding those who do not. For this reason, the left-hand sections of the ROC curves, where the false positive rate is low, are of particular relevance in our context (Figure 3). Thus, even though the areas under the ROC curves were similar for both networks and only moderately greater than the area under the ROC curve for logistic regression, in this region the performance of the MLP greatly exceeded that of the other two models. The ROC curve of the MLP rose much more steeply than that of the PNN or of logistic regression, indicating that this network achieved a true positive rate of greater than 75% at a false positive rate of less than 5%.



Figure 3 Receiver-operating characteristics (ROC) curves showing the performance of logistic regression analysis, a four-node multi-layer perceptron (MLP) neural network, and a probabilistic network (PNN) in predicting the occurrence of major coronary events among 4818 men aged 35–65 years at recruitment into the Prospective Cardiovascular Münster Study (PROCAM). Such ROC curves provide a pure measure of the diagnostic accuracy of a test.20 The curve of a test with no discriminatory power does not deviate from the dotted line running at 45° to the x-axis; the curve of a test with perfect discrimination between healthy and diseased states passes through the upper left-hand corner of the frame. The greater the area under the curve, the more accurate the test. For each curve, the closed circles indicate the threshold value corresponding to an estimated absolute risk of a coronary event of 20% in 10 years of follow-up. At this threshold the MLP showed the greatest sensitivity (greatest true positive rate) while also displaying a high degree of specificity (low false positive rate), and was therefore best able to identify men at high risk

Table 3 Performance characteristics of the models used to predict the risk of coronary events among middle-aged men in the Prospective Cardiovascular Münster Study (PROCAM) using a multi-layer perceptron neural network with four nodes in one hidden layer (MLP), a probabilistic neural network (PNN) with a smoothing factor of 0.14, and logistic regression analysis (Log Reg). The area under the receiver-operating characteristics (ROC) curve, the sensitivity, the specificity, the false positive rate, the false negative rate, the positive predictive value and the positive likelihood ratios (PLR = true positive rate/false positive rate) were calculated for the standard high risk threshold of 20% coronary event risk in 10 years
In recent years, a consensus has emerged that patients with an absolute risk of a coronary event exceeding 20% in 10 years should be regarded as ‘high risk’ and should receive special attention.23 We applied this threshold to our analyses in order to calculate the diagnostic sensitivity and specificity, the false negative and false positive rates, and the positive predictive values shown in Table 3. This risk threshold is also indicated by a closed circle on each of the three ROC curves shown in Figure 3. Using logistic regression, 8.4% of men were classified as high risk, with an event incidence of 36.7% in this high-risk group, which contained 45.8% of all coronary events seen. Using the probabilistic neural network, only 3.9% of men were classified as high risk, and although 58.6% of these men experienced a coronary event, the high-risk group defined using the probabilistic neural network contained only 33.5% of all the events occurring in the study. By contrast, the multilayer perceptron neural network classified 7.9% of men as being at high risk, no fewer than 64.0% of whom suffered a major coronary event. Moreover, the high-risk group as defined by the multilayer perceptron contained fully 74.5% of all the coronary events occurring within this cohort of men in 10 years in PROCAM.


Discussion
Risk prediction for CHD requires consideration of the totality of a person’s risk factors. This is a two-stage process. First, all variables thought to play a possible role on the basis of observational studies and biological information are measured at the start of the study. In PROCAM, for example, 57 variables were assessed in each participant. Following a period of observation, those variables showing a significant association with the outcome of interest are isolated by a mathematical selection process. Up to now, the latter step has mainly been achieved by means of logistic regression. Such logistic regression equations have been derived both from the PROCAM study in Europe4 and from the Framingham study in the US.24

As noted in the Introduction, neural networks may produce a model of greater discrimination and thus a more accurate estimation of risk than logistic regression. However, as described by Tu,12 neural networks are prone to over-fitting, which results in implausible outputs when the network is presented with unseen data. In this study, we used three methods to deal with this problem. First, we divided the data into five distinct sets and used cross-validation to improve the performance of logistic regression and of both types of neural network. Second, we stopped training the MLP neural network when the error in the verification sets was at a minimum (early stopping procedure). Third, we presented the networks with large sets of synthetic data that spanned the biologically meaningful range in small steps. We then progressively modified the networks until biologically implausible results were no longer obtained with these synthetic datasets. This reduced predictive power but produced networks that were robust predictors when presented with unseen data.

Because there is no a priori way of knowing which type of network will perform best in a given clinical situation,12 we tested two of the most commonly used network types on the PROCAM dataset. In our data and under the constraint of biological plausibility, the multilayer perceptron was clearly superior in identifying individuals at high risk of developing a coronary event.

In preventive medicine, the value of a test lies in its ability to identify those individuals who are at high risk of an illness and who therefore require intervention while excluding those who do not require such intervention. In our case, we took ‘high’ to mean a risk of developing an acute coronary event that exceeds 20% in 10 years. Accuracy of risk classification is of particular relevance in the case of coronary artery disease. Because of the high prevalence of this condition, inaccurate risk prediction will lead to over-treatment of a large number of people and under-treatment of many more. Moreover, it has become clear in recent years that the risk of an acute coronary event in people who have not yet suffered a myocardial infarction but have a number of risk factors (the so-called ‘pre-symptomatic’ patient) may equal or even exceed that of people who have already suffered a myocardial infarction. Thus the distinction between primary and secondary prevention has become blurred.

In recent years, several large scale intervention studies have shown that, in people at high risk of CHD, treatment of risk factors with diet and drugs may be expected to prevent about a third of all coronary events.25–28 It is instructive to apply this experience to our data. Thus, using logistic regression, 843 of every 10 000 middle-aged men are classified as high risk, 309 (incidence 36.7%, Table 3) of whom will suffer a myocardial infarction within 10 years. Treatment of all 843 high-risk men may thus be expected to prevent 103 events (one-third of 309); to put it another way, 8.2 men need to be treated to prevent one event (843 divided by 103 equals 8.2). With the probabilistic neural network, only 386 men are classified as high risk, 226 (58.6%) of whom will suffer a myocardial infarction. Treatment may be expected to prevent 75 events (one-third of 226), and 5.1 men need to be treated to prevent one event. With the multilayer perceptron, 785 men are classified as high risk, 502 (64.0%) of whom will suffer myocardial infarctions, 167 of which can be prevented by treatment. Thus 4.7 men need to be treated to prevent one event. Since in the overall group of 10 000 men, 675 events may be expected to occur in 10 years (overall event prevalence in PROCAM), identification of high-risk individuals by means of the multilayer perceptron may allow us to prevent no fewer than 25% of all myocardial infarctions among middle-aged men (167 x 100/675 = 25). By contrast, logistic regression and the probabilistic network would allow prevention of only 15% and 11% of myocardial infarctions, respectively.
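
The number-needed-to-treat arithmetic above can be checked directly; the figures per 10 000 men are taken from Table 3 and the text, and the one-third prevention rate is the assumption drawn from the cited trials.

    models = {  # name: (men classified high risk, 10-year events among them)
        "Log Reg": (843, 309),
        "PNN":     (386, 226),
        "MLP":     (785, 502),
    }
    TOTAL_EVENTS = 675  # expected events per 10 000 men in 10 years (PROCAM)

    for name, (treated, events) in models.items():
        prevented = events / 3                   # one-third of events prevented
        nnt = treated / prevented                # men treated per event avoided
        share = 100 * prevented / TOTAL_EVENTS   # % of all events prevented
        print(f"{name}: NNT = {nnt:.1f}, prevents {share:.0f}% of all events")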

In medicine, neural networks have seen increasing use, with about 60 new papers on the topic appearing in the medical literature every year.6 To our knowledge, however, this is the first report in which neural networks have been applied to calculating the risk of a coronary event in a prospective epidemiological study. Moreover, as far as we can ascertain, none of the published studies has directly addressed the issues of biological plausibility and over-fitting using the methods described in the present report. We therefore believe that the approach we have taken may be of use not only in improving risk assessment in CHD but also in other areas where outcomes are determined by a large number of variables that interact in a complex non-linear fashion.


Appendix
A more detailed version of this appendix is available from the authors on request.

The multi-layer perceptron (MLP)
The final MLP used in this paper was a three-layer network (Figure 1) with 13 input units (nodes) corresponding to our 13 risk factors (x1, …, x13), 4 hidden units, and one output unit modelling the dichotomous risk outcome.1

Pre-scaling of input values for the MLP and the probabilistic network
The input values (except the values from nominal variables) are pre-scaled to a range between 0 and 1 using

$$\hat{x}_{ij} = \frac{x_{ij} - \min_i x_{ij}}{\max_i x_{ij} - \min_i x_{ij}} \qquad (1)$$

where the input vectors are arranged in rows, $i$ and $j$ are the row and column indices of the complete data matrix, and the minimum and maximum are taken over all cases $i$ for each variable $j$.
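
A one-function NumPy sketch of this pre-scaling (our illustration; the skipping of nominal columns is left out):

    import numpy as np

    def prescale(X):
        """Column-wise min-max scaling of Equation (1); rows are cases,
        columns are risk variables."""
        lo, hi = X.min(axis=0), X.max(axis=0)
        return (X - lo) / (hi - lo)  # each column mapped onto [0, 1]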

Optimizing MLP-weights, cross-entropy error and sigmoid activation
Typically, the weights in an MLP are adjusted using least squares fitting together with a suitable back propagation algorithm2,3 to minimize a root mean square (RMS) error function. In order to interpret the network outputs as probabilities and to make them comparable to the results of the logistic regression, we instead used a cross-entropy error function to adjust the weights (Table 1) and to minimize the network fit criterion. This cross-entropy function can be derived from the likelihood of the underlying Bernoulli distribution of the entire training set. The cross-entropy error is

$$E = -\sum_{n} \left[ t_n \ln p(c_2 \mid x_n) + (1 - t_n) \ln\bigl(1 - p(c_2 \mid x_n)\bigr) \right] \qquad (2)$$

where $t$ denotes the target value, equal to 1 if class 2 is true and 0 otherwise, and $p(c_2 \mid x)$ is the probability that $x$ belongs to class 2. The cross-entropy error function is specially designed for classification problems, where it is used in combination with the logistic activation function

$$f(a) = \frac{1}{1 + e^{-a}} \qquad (3)$$

which maps all its arguments to values between 0 and 1, in the output layer of the network. The mean value of the cross-entropy function is called the average negative log-likelihood (ANLL),4 which indicates how accurately the network model predicts the probability of an adverse outcome. An optimal network is one in which the ANLL is minimized in both the training and test datasets.
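
As a minimal illustration (not the SNN software's internals), Equations (2) and (3) and the ANLL can be written directly; y and p are assumed arrays of observed 0/1 outcomes and predicted event probabilities.

    import numpy as np

    def logistic(a):
        return 1.0 / (1.0 + np.exp(-a))                           # Equation (3)

    def cross_entropy(y, p):
        return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))   # Equation (2)

    def anll(y, p):
        return cross_entropy(y, p) / len(y)   # average negative log-likelihood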

Probabilistic Neural Networks (PNN)
We used a probabilistic network5 with four layers: an input layer, a radial or pattern layer, a summation layer and an output layer (Figure 2). The radial units are copied directly from the normalized training data, one per case. Each models a Gaussian kernel function, $\exp\bigl(-\lVert x - w \rVert^2 / 2\sigma^2\bigr)$, centred at the training case, where $x - w$ denotes the difference between the input vector and the stored weight vector of the case (one weight matrix per class), and $\sigma$, the radial deviation of the Gaussian kernel, is used as an adjustable smoothing factor. There was one summation unit for each class (c1 and c2), each connected to all radial units belonging to its own class. The outputs were each proportional to the kernel-based estimates of the probability density function of the classes c1 and c2; the normalized outputs are estimates of the underlying class probabilities.
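
The PNN is simple enough to sketch directly. Below is a minimal NumPy implementation in the spirit of Specht's network; the class interface is our assumption, and the constant $(2\pi)^{d/2}\sigma^d$ factors are omitted because they cancel when the outputs are normalized.

    import numpy as np

    class PNN:
        def __init__(self, sigma=0.14):       # smoothing factor, as in the text
            self.sigma = sigma

        def fit(self, X, y):                  # one "epoch": store the cases
            self.classes = np.unique(y)
            self.patterns = {c: X[y == c] for c in self.classes}
            self.priors = {c: np.mean(y == c) for c in self.classes}
            return self

        def predict_proba(self, X):
            dens = []
            for c in self.classes:
                d2 = ((X[:, None, :] - self.patterns[c][None, :, :]) ** 2).sum(-1)
                # mean Gaussian kernel = class-conditional density estimate
                dens.append(self.priors[c] *
                            np.exp(-d2 / (2 * self.sigma ** 2)).mean(axis=1))
            dens = np.stack(dens, axis=1)
            return dens / dens.sum(axis=1, keepdims=True)  # Bayes posteriors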

Preprocessing
The learning sets were randomly partitioned into a training set and a verification or test set (70/30%) and normalized to the range [0, 1] using Equation (1).

Training
Training was accomplished by adding sets of risk variables from individual study participants (pattern units) with the appropriate weight vectors in place.

Classification and post-processing
Summing the outputs of all the pattern units belonging to a single class (e.g. class 2) enabled us to compute the quantity $(2\pi)^{d/2} \sigma^{d} n_{c_2}\, p(x \mid c_2)$, which is proportional to the probability density function of that class evaluated at the input point $x$, where $n_{c_2}$ is the number of cases in class 2 and $d$ is the dimension of the input space. Using Bayes’ theorem and the prior probabilities of classes 1 and 2, we were able to compute the probability of class membership for each individual.

Bias, variance, ANLL and over-fitting
The aim in epidemiology is not to fit the training data to any specified accuracy, but to predict the probability of a disease outcome for individual cases with as few false positive and false negative results as possible. Any model used, such as a neural network, should therefore be flexible enough to learn the topography described by the training data. However, the network should not contain too many nodes or hidden layers, and should not have smoothing parameters that are too small, because this may cause the model to fit even the ‘noise’ that is inherent to every dataset. This phenomenon is known as ‘over-fitting’. An over-fitted network fails to generalize and gives poor results when predicting the probability of disease in previously unseen individuals. Bias,6 variance and average negative log likelihood are statistical parameters that allow us to obtain more insight into this problem and can be computed both from the network outputs and from the outputs of the logistic regression.

Table 2 shows the root-mean-square (RMS) errors, the ANLL, the bias values, and the mean-square errors that we derived for the logistic regression model, the probabilistic neural network and the multilayer perceptron in our test and training datasets.


KEY MESSAGES

  • In the Prospective Cardiovascular Münster Study (PROCAM), a multilayer perceptron neural network substantially improved risk prediction compared to standard logistic regression.
  • Based on intervention trials that indicate that about one in three coronary events can be prevented by 5 years of lipid-lowering treatment, our analysis suggests that use of the multilayer perceptron might allow prevention of 25% of coronary events in middle-aged men, compared to 15% with logistic regression.

 


Table 1 Weights derived from the trained multi-layer perceptron with 13 input units (nodes), 4 hidden units and one output unit. The weights in this set represent the connections between the four nodes (h1.1…h1.4) in the hidden layer and all input nodes. Weights connecting the hidden layer and the output node are shown in Table 1b. The threshold or bias weights are arranged in the first row of each table

Table 1b Weights connecting the hidden layer and the output node

Table 2 Performance of the final models (logistic regression [Log Reg], probabilistic network [PNN] and multi-layer perceptron [MLP] network) comparing the root-mean-square error (RMS), the average negative log-likelihood (ANLL), the bias and the mean-square error (MSE) using the training and test data. The models were trained using a random partition comprising about half (2400 cases) of the complete data. A further 1204 cases were used for testing generalization on unknown data, and the remaining 1204 cases were used as an internal verification set to minimize the network error of the MLP network. The parameters derived from the verification set are not shown

Acknowledgments
 
The authors express their gratitude to the Landesversicherungsanstalt Westfalen and the Landesversicherungsanstalt Rheinland for support of this project.


References
1 Tu JV, Naylor CD. Clinical prediction rules. J Clin Epidemiol 1997;50:743–44.

2 Wasson JH, Sox HC, Neff RK et al. Clinical prediction rules—applications and methodological standards. New Engl J Med 1985;313:793–99.

3 Anderson KM, Wilson PWF, Odell PM et al. An updated coronary risk profile—a statement for health professionals. Circulation 1991;83:356–62.

4 Assmann G, Schulte H, von Eckardstein A. Hypertriglyceridemia and elevated levels of lipoprotein (a) are risk factors for major coronary events in middle-aged men. Am J Cardiol 1996;77:1179–84.

5 Califf RM, Armstrong PW, Carver JR et al. Task force 5—stratification of patients into high, medium and low-risk subgroups for purposes of risk factor management. J Am Coll Cardiol 1996;27:1007–19.

6 Dayhoff JE, DeLeo JM. Artificial neural networks. Cancer 2001;91:1615–35.

7 Guerriere MJ, Detsky AS. Neural networks—what are they? Ann Intern Med 1991;115:906–07.

8 Hinton GE. How neural networks learn from experience. Scientific American 1992;267:145–51.

9 Bishop CM. Neural Networks for Pattern Recognition. Oxford: Oxford University Press, 1995.

10 Richard MD, Lippmann RP. Neural network classifiers estimate Bayesian a posteriori probabilities. Neural Computation 1991;3:461–83.

11 Funahashi T. Multilayer neural networks and Bayes decision theory. Neural Networks 1998;11:209–13.

12 Tu JV. Advantages and disadvantages of using artificial neural networks versus logistic regression for predicting medical outcomes. J Clin Epidemiol 1996;49:1225–31.

13 Assmann G, Cullen P, Schulte H. The Münster Heart Study (PROCAM)—results of follow-up at 8 years. Eur Heart J 1998;19:A2–A11.

14 Cullen P, Schulte H, Assmann G. The Münster Heart Study (PROCAM). Total mortality in middle-aged men is increased at low total and LDL cholesterol concentrations in smokers but not in nonsmokers. Circulation 1997;96:2128–36.

15 Assmann G, Schulte H. Relation of high density lipoprotein cholesterol and triglycerides to incidence of atherosclerotic coronary artery disease (the PROCAM experience). Am J Cardiol 1992;70:733–37.

16 Assmann G, Schulte H. Results and conclusions of the Prospective Cardiovascular Münster (PROCAM) Study. In: Assmann G (ed.). Lipid Metabolism Disorders and Coronary Heart Disease. München: MMV Medizin Verlag, 1993, pp. 19–68.

17 Hosmer DW, Lemeshow S. Applied Logistic Regression. New York: John Wiley & Sons, 1989, pp. 82–91.

18 Goldberg DE. Genetic Algorithms in Search, Optimization, and Machine Learning. Reading, MA: Addison-Wesley, 1989.

19 Geman S, Bienenstock E, Doursat R. Neural networks and the bias/variance dilemma. Neural Computation 1992;4:1–58.

20 Zweig MH, Campbell G. Receiver-operating characteristic (ROC) plots: a fundamental evaluation tool in clinical medicine. Clin Chem 1993;39:561–77.

21 Hanley JA, McNeil BJ. A method of comparing the areas under receiver operating characteristic curves derived from the same cases. Radiology 1983;148:839–43.

22 Nie NH. SPSS-X User's Guide. New York: McGraw-Hill, 1983.

23 Wood D, De Backer G, Faergeman O et al. Prevention of coronary heart disease in clinical practice. Recommendations of the Second Joint Task Force of European and other Societies on Coronary Prevention. Eur Heart J 1998;19:1434–503.

24 Anderson KM, Wilson PWF, Odell PM et al. An updated coronary risk profile—a statement for health professionals. Circulation 1991;83:356–62.

25 Scandinavian Simvastatin Survival Study Group. Randomised trial of cholesterol lowering in 4444 patients with coronary heart disease: the Scandinavian Simvastatin Survival Study (4S). Lancet 1994;344:1383–89.

26 Shepherd J, Cobbe SM, Ford I et al. Prevention of coronary heart disease with pravastatin in men with hypercholesterolemia. New Engl J Med 1995;333:1301–07.

27 Sacks FM, Pfeffer MA, Moye LA et al. The effect of pravastatin on coronary events after myocardial infarction in patients with average cholesterol levels. New Engl J Med 1996;335:1001–09.

28 Downs JR, Clearfield M, Weis S et al. Primary prevention of acute coronary events with lovastatin in men and women with average cholesterol levels—results of AFCAPS/TexCAPS. JAMA 1998;279:1615–22.


References for Appendix
1 Ripley BD. Pattern Recognition and Neural Networks. Cambridge: Cambridge University Press, 1996.

2 Patterson D. Artificial Neural Networks. Singapore: Prentice Hall, 1996.

3 Bishop CM. Neural Networks for Pattern Recognition. Oxford: Oxford University Press, 1995.

4 Kullback S, Leibler RA. On information and sufficiency. Annals of Mathematical Statistics 1951;22:79–86.

5 Specht DF. Probabilistic neural networks. Neural Networks 1990;3:109–18.

6 Geman S, Bienenstock E, Doursat R. Neural networks and the bias/variance dilemma. Neural Computation 1992;4:1–58.