Observations, predictions and decisions—assessing cardiovascular risk assessment

Hans-Werner Hense

Institute of Epidemiology and Social Medicine, University of Münster, Germany. E-mail: hense@uni-muenster.de

‘All policy (including treatment) decisions should be based on absolute measures of risk; relative risk is strictly for researchers only.’

Geoffrey Rose1

Within the last decade, cardiovascular medicine has seen a proliferation of algorithms, functions, and scores that have been generated with the aim of accurately predicting the probability of a subsequent cardiovascular event in individuals free of symptomatic disease of the heart and the blood vessels. Most national and international guidelines for the management of patients with cardiovascular risk factors presently contain some formal assessment of the absolute risk of coronary heart disease (CHD) or cardiovascular disease (CVD).2–5 The magnitude of this risk is commonly expressed as the per cent chance of suffering a fatal or non-fatal event over the next 5 or 10 years. The data required for calculating risk involve basic demographic information such as age and sex, and varying biochemical, physiological, clinical, and lifestyle factors. Customarily, individuals under assessment should not yet have received medical interventions, such as medication, to lower their risk factor levels. Without exception, all risk prediction charts and risk scores are based on evidence obtained in prospective observational studies, thus constituting one of the rare examples in present day medicine where results of epidemiological observation are to be employed directly in the daily clinical care of individual patients.

The first attempts to assess the absolute risk of CHD date back to 1973 when the Committee on Reduction of Risk of Heart Attack and Stroke of the American Heart Association (AHA) published the Coronary Risk Handbook.6 In this booklet, the AHA presented for the first time an estimate of the 6-year risk of fatal and non-fatal CHD that was associated with certain values of the risk factors age, sex, systolic blood pressure, and total serum cholesterol, and, in addition, the presence of current smoking, diabetes mellitus, and ECG signs of left ventricular hypertrophy. These estimates were derived from an analysis of 16 years of follow-up in the Framingham Heart Study (FHS). The FHS invited participants for regular biennial re-examinations, thus providing an unprecedentedly close and systematic follow-up of its cohort. Initially, the risk functions used to calculate cumulative risk were based on multivariate logistic regression. Methods were subsequently refined and became statistically more sophisticated when Anderson introduced a Weibull model7,8 to accommodate rate density and different periods of follow-up in the computations. Of note, separate models were fitted for distinct clinical endpoints including cerebrovascular and all cardiovascular events. The flexibility of this approach raised its acceptability and applicability for various purposes. It coincided with a debate about the utility of absolute, rather than relative, risk for public health and clinical decision making.1,9 Proponents of absolute risk eagerly adopted the new options of risk assessment and integrated them into innovative approaches to the management of patients in primary cardiovascular prevention.10 Simplifications11 of the initially rather complex computational procedures7,8 were subsequently suggested by the FHS investigators. The presently preferred coronary risk functions are based on coefficients derived from Cox regression models and an altered set of predictor variables.2 Indeed, until recently, use of the FHS risk functions was virtually synonymous with cardiovascular risk assessment in clinical guidelines.

However, alternative approaches have emerged. Algorithms derived from the pooled Danish Glostrup Population Studies and the Copenhagen City Heart Study were used to develop a coronary risk score in combination with an interactive management tool (PRECARD) that has been translated into several European languages.12,13 Likewise, the German PROCAM study developed another coronary risk score for men including risk factors such as low density lipoprotein instead of total cholesterol, triglycerides, and family history of premature coronary disease.14 The PROCAM score has gained access to the recommendations of the International Task Force for Prevention of Coronary Heart Disease (http://www.chd-taskforce.com). Concurrently, the European SCORE project (Systematic Coronary Risk Evaluation) pooled the data from 12 European cohorts with over 2.7 million years of follow-up to address the problem of regional variation of risk.15 As there was a lack of sufficient data on morbidity endpoints, the SCORE investigators had to use CHD and CVD mortality as an endpoint for risk estimations. They provided two separate risk charts to display the probability of death due to cardiovascular disease in European populations with a high and low background risk, respectively. SCORE was included in the most recent European Guidelines for the Prevention of CHD published by the European Society of Cardiology3 where it replaced the FHS charts.

Recently, a methodologically unique approach was chosen by Voss et al. who, using data from the PROCAM study, employed neural network techniques to predict the risk of coronary events.16 They concluded that neural network prediction was superior to conventional logistic regression; however, external validation of this new method is still lacking.

In the future, cardiovascular risk assessment is likely to influence treatment allocation in primary prevention, since clinical guidelines tend to employ predicted absolute risk as a tool for clinical decision making. To this end, threshold values of absolute risk are postulated to identify ‘high risk’ individuals who are deemed eligible for intensified management and drug treatment. Irrespective of the threshold value adopted, if absolute risk is systematically misclassified then part of the population will be either inappropriately selected for medical treatment or inappropriately denied it. As CHD and CVD incidence and mortality rates vary substantially between populations,17 the accurate assessment of the level of ‘background risk’ in a population is of utmost importance, and the portability of predictions from one population to another must be rigorously evaluated.

One way of accomplishing this is to compare the overall proportion of cases of CHD or CVD that risk functions predict will occur in a specific population over a certain period with the proportion actually observed. There are numerous reports indicating that FHS-derived risk charts systematically overestimate risk, in particular for CHD, in different population settings. This was initially thought to be confined to Mediterranean populations with their established low cardiovascular risk levels,18–20 but there is now ample evidence that it is also the case for many populations from Western and Northern Europe.13,21–23 Interestingly, US-based population studies found better agreement between FHS-derived predictions and observed coronary risk. This was, however, confined to white and black men and women and could not be confirmed in studies of individuals with a Native American, Japanese, or Hispanic ethnic background.24 A recent report from New Zealand confirms that in an ethnically mixed cohort (10% Maori, 5% Pacific Islanders, 85% European or other) the risk of any cardiovascular event was accurately predicted by FHS algorithms.25
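The comparison of predicted with observed cases can be sketched in a few lines of Python. This is only an illustration: the function name and the ten-person cohort are hypothetical and are not taken from any of the studies cited. A ratio above 1 indicates that the risk function predicts more events than were observed.

```python
def predicted_to_observed_ratio(predicted_risks, observed_events):
    """Return expected events (sum of predicted risks) divided by observed events."""
    if len(predicted_risks) != len(observed_events):
        raise ValueError("one predicted risk is needed per participant")
    expected = sum(predicted_risks)
    observed = sum(observed_events)
    if observed == 0:
        raise ValueError("ratio is undefined when no events were observed")
    return expected / observed


# Hypothetical cohort of ten people: predicted risks and observed outcomes (1 = event).
predicted = [0.05, 0.10, 0.20, 0.30, 0.15, 0.10, 0.25, 0.05, 0.40, 0.40]
events = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]

ratio = predicted_to_observed_ratio(predicted, events)
print(f"predicted/observed ratio: {ratio:.2f}")
```

In this invented cohort about two events are predicted and one is observed, so the function overestimates risk by a factor of about two.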

Several factors may contribute to the observed inaccuracy and to the overestimation of risk in particular. First, the rate of cardiovascular events varies remarkably across populations and this population-specific ‘background risk’ is only partly explained by the traditional risk factors that are represented in most prediction models.26 Second, CHD and CVD incidence and mortality are currently declining in most industrialized countries.27 In this situation, risk estimates based on observational periods that started some 20 or more years ago are inherently prone to overestimation. Furthermore, there is also diversity between populations in terms of the prevalence and distribution of risk factors.28 Of note, identical risk factor levels are associated with similar relative risks but with varied absolute risks in different populations.29,30 Therefore, rather than entering an individual's measured risk factor value directly into the risk function, its position within the risk factor distribution of the population should be considered, for example by using its deviation from the population mean. Recently published work indicates that such methods of recalibration are effective when applying risk functions to new populations. FHS-derived predictions were modified by introducing population-specific event rates and risk factor means while maintaining the original regression coefficients derived from the Cox models in the FHS. This procedure, which requires that data on event rates and risk profiles are locally available, appeared to work reasonably well in various populations.19,24
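A minimal sketch of such a recalibration is given below. It assumes the form commonly used for Cox-based risk functions, risk = 1 − S0(t)^exp(Σ β(x − mean)), where S0(t) is the survival at the population means. The coefficients, means, and baseline survival values are invented for illustration and are not the published FHS values.

```python
import math


def absolute_risk(values, coefficients, means, baseline_survival):
    """Cox-type absolute risk: 1 - S0(t) ** exp(sum(beta * (x - mean)))."""
    linear_predictor = sum(
        coefficients[name] * (values[name] - means[name]) for name in coefficients
    )
    return 1.0 - baseline_survival ** math.exp(linear_predictor)


# Hypothetical coefficients, kept unchanged when the function is moved to a new population.
coefficients = {"age": 0.05, "sbp": 0.02, "smoker": 0.5}

# Hypothetical source population (where the function was derived).
source_means = {"age": 50.0, "sbp": 130.0, "smoker": 0.4}
source_survival = 0.90  # event-free survival at the means over the prediction period

# Hypothetical target population with a lower background risk.
target_means = {"age": 50.0, "sbp": 135.0, "smoker": 0.3}
target_survival = 0.95

person = {"age": 60.0, "sbp": 150.0, "smoker": 1.0}

original = absolute_risk(person, coefficients, source_means, source_survival)
recalibrated = absolute_risk(person, coefficients, target_means, target_survival)
print(f"original: {original:.3f}, recalibrated: {recalibrated:.3f}")
```

Only the means and the baseline survival change between the two calls, which is why locally available data on event rates and risk profiles are a prerequisite.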

There may be other reasons why risk predictions are inaccurate. One argument frequently raised by the clinical community relates to the restricted number of risk factors included in prediction algorithms. This argument assumes that by inclusion of further or different risk factors the prediction becomes more accurate. In the PROCAM study 57 clinical and laboratory variables were investigated but only 8 of them were eventually included in the PROCAM risk score.14 However, when the prospective PRIME study evaluated the performance of the PROCAM and the FHS scores in population-based cohorts from Belfast and France it could not demonstrate a lower magnitude of overestimation by this approach.23

The SCORE project15 attempted to respond to the problem of population diversity by pooling data from 12 European cohorts with varying background risks. This approach must be considered a step in the right direction. Apart from making more efficient use of age through improved statistical techniques, it gains attractiveness by encompassing all fatal cardiovascular events, rather than only fatal coronary events, as endpoints. The SCORE investigators are presently about to implement an interactive tool on the internet where local mortality figures can be plugged in to obtain ‘calibrated’ regional estimates of the risk of fatal events. Unfortunately, acceptance may be hampered by the fact that fatal endpoints represent only part of the population risk experience. Not only are absolute risks for fatal events considerably lower than those for morbid events, which compromises, for example, risk communication for patients and clinicians alike, but it is also evident that the case fatality of myocardial infarction and stroke is declining rapidly in many populations, starting from different absolute levels and at different rates. Moreover, the speed of change in case fatality differs between populations.27 Evaluations of the SCORE predictions of current rates in differing populations are therefore awaited.

Furthermore, the assumption of equivalence of the relative hazard estimates derived for single risk factors in the risk function and in the population where this function is to be applied needs confirmation. There is fairly consistent evidence from numerous studies to conclude that relative risk estimates are similar in most populations even when ethnicity varies,24 although a recent analysis of the Atherosclerosis Risk in Communities (ARIC) study cautions against generalizing FHS coefficients, particularly in women.31 Related to this is a problem that is inherent in most prospective studies and has been termed regression dilution bias. It refers to the fact that risk factor measurements taken on a single occasion (here, the baseline examination of a cohort) underestimate relative risk, in particular for risk factors with a high intra-individual variability.32 As a consequence, the relative hazard associated with clinically obtained ‘usual’ risk factor levels, that is, levels based on repeated measurements on several occasions, is in principle higher than that reflected by the coefficients of risk functions. To date, however, the impact of regression dilution bias has not been systematically evaluated as epidemiological studies are usually unable to supply the necessary data. In theory, underestimation may be as high as one-third for factors such as blood pressure,32 hypothetically rendering prognosis particularly inaccurate in those having high values of risk factors with great within-person variability. Of note, the SCORE investigators reported that the impact of computationally extrapolated ‘usual’ values on absolute risk was generally negligible.15
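The size of this effect can be illustrated with a short sketch. It assumes the usual correction in which the log hazard ratio estimated from a single baseline measurement is divided by a regression dilution ratio between 0 and 1. The function name and the hazard ratio of 1.5 are hypothetical; the ratio of two-thirds corresponds to the one-third underestimation mentioned above.

```python
import math


def corrected_hazard_ratio(observed_hazard_ratio, dilution_ratio):
    """Hazard ratio for 'usual' levels: exp(log(observed HR) / dilution ratio)."""
    if not 0.0 < dilution_ratio <= 1.0:
        raise ValueError("dilution ratio must lie in (0, 1]")
    return math.exp(math.log(observed_hazard_ratio) / dilution_ratio)


# Hypothetical example: a hazard ratio of 1.5 estimated from one baseline reading,
# with the coefficient assumed to be underestimated by one-third.
observed = 1.5
corrected = corrected_hazard_ratio(observed, dilution_ratio=2.0 / 3.0)
print(f"observed HR: {observed:.2f}, corrected HR: {corrected:.2f}")
```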

Another frequently disregarded point should be given attention. In the context of evaluation of risk functions, one should bear in mind that incidence and mortality rates currently observed in cohort studies are probably not unbiased endpoints for the evaluation of risk prediction systems. The point has been raised that the natural course, without intervention, from risk factor to cardiovascular event is increasingly difficult to observe33 because event rates in most populations get contaminated by the rising prevalence of people with varying intensity of medical interventions. For the purpose of risk assessment in primary prevention, however, it is necessary to obtain valid estimates of disease occurrence among those who remain untreated over the entire period of prediction. As one way of dealing with this problem, indicator variables, for example for the use of antihypertensive medication, have been included in recent prediction models.2 It should be noted that evaluations that do not take the amount of contamination by intervention into account may make predictions look worse than they actually are.

Epidemiological point estimates are by necessity derived with an imprecision that is commonly expressed as the 95% CI of the parameter estimates. The computation of the variance estimators of absolute risk is particularly complex as it involves estimates of population-specific hazard ratios and average event rates as well as their covariance.34 Customarily, CI are not reported in risk charts or with risk scores thus potentially conferring an inappropriate sense of precision that is evidently unfounded. In an earlier report from the FHS,7 95% CI for the 10-year predicted CHD risk ranged from about 6% to more than 30% in width, demonstrating particularly imprecise predictions for individuals with extreme risk factor constellations. In addition, precision may be further compromised by factors peculiar to the individual's examination, for example, specific measurement errors.

It has become common practice to evaluate the predictivity of risk scores in the original as well as in external population samples by assessing the area under a receiver operating characteristic (ROC) curve (or c statistic). For a binary decision rule, the ROC curve plots the proportion of true-positive against the proportion of false-positive results observed at each point across the entire range of predicted risks. The area under the curve (AUC) can be interpreted as an estimate of the probability that the risk function assigns a higher risk to those who develop the endpoint of interest over the prediction period than to those who do not. Values of the AUC range from 0.5 (mere chance) to 1 (perfect prediction). For example, the AUC for CHD risk prediction in men and women from the FHS was 0.79 and 0.83, respectively, when assessed internally and ranged from 0.67 to 0.75 in men and from 0.66 to 0.83 in women when evaluated externally against six multi-ethnic studies.24 Likewise, in the SCORE project predicting risk of fatal CVD, the AUC in the cohorts not used to derive the risk function was between 0.70 and 0.72 in high-risk and between 0.71 and 0.84 in low-risk populations.15 In general, the externally assessed AUC rarely exceeded 0.80 in men and 0.85 in women, and was often much lower,21–23 a finding apparently also true for the PROCAM score involving a new set of risk predictors.23 Hence, aside from the problems of availability and cost effectiveness, AUC improvements observed by inclusion of new biochemical and subclinical variables may have to stand the test of external evaluation before being accepted as the way forward.31
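The probabilistic interpretation of the AUC can be made concrete by comparing every person who develops the endpoint with every person who does not. The sketch below does this directly; the function name and the five-person example are hypothetical, and ties in predicted risk are counted as one half, which is a common convention rather than anything stated in the studies cited.

```python
def concordance_auc(predicted_risks, observed_events):
    """AUC as the share of case/non-case pairs in which the case has the higher risk."""
    cases = [r for r, e in zip(predicted_risks, observed_events) if e == 1]
    non_cases = [r for r, e in zip(predicted_risks, observed_events) if e == 0]
    if not cases or not non_cases:
        raise ValueError("both cases and non-cases are required")
    score = 0.0
    for case_risk in cases:
        for non_case_risk in non_cases:
            if case_risk > non_case_risk:
                score += 1.0  # concordant pair
            elif case_risk == non_case_risk:
                score += 0.5  # tie counts as half
    return score / (len(cases) * len(non_cases))


# Hypothetical predicted risks and outcomes (1 = event during follow-up).
risks = [0.05, 0.10, 0.20, 0.30, 0.40]
events = [0, 0, 1, 0, 1]

auc = concordance_auc(risks, events)
print(f"AUC: {auc:.3f}")
```

Here five of the six case/non-case pairs are ranked correctly. Note that the AUC depends only on the ranking of risks, so a function that overestimates every risk by the same factor would score identically.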

Generally, however, one may also have to question the utility of using AUC for the evaluation of a risk function from the perspective of its clinical application. This method is most useful when comparing the overall predictive performance of competing approaches or when assessing the impacts of different potential threshold values. Thus, each point on a ROC plot can be represented by a 2 × 2 table that contains information on the proportion of true positives (sensitivity) and false positives (1 − specificity) that arise from using this particular point as a threshold value in a clinical decision rule. However, such threshold values have already been set in many guidelines and, as appropriately discussed by the SCORE authors,15 they vary considerably between guidelines. Therefore, rather than evaluating performance across the entire range of possible predicted risks it seems clinically more meaningful to evaluate the performance of risk functions only at these prespecified cut-offs by reporting sensitivity, specificity, and positive clinical likelihood ratios: these indicators of the accuracy of classification of future cases will help to confer a clearer understanding of the foundation of the recommended clinical decisions. Few studies have chosen to explicitly report these figures. Milne et al. have very recently provided such information when applying the New Zealand National Heart Foundation risk charts,35 derived from an FHS risk function,7 to a cohort of 6354 men and women, aged 35–74 years. They show that at the recommended 15% threshold of 5-year CVD risk the specificity of over 90% in both men and women is paired with a sensitivity of only 20% in men and 27% in women. Likewise, the SCORE investigators present data to show that the risk threshold of 5% of fatal CVD over 10 years (the recommended intervention level according to the latest ESC prevention guideline3) is associated with sensitivities ranging from 59% to 83% and specificities of 46% to 73% in high-risk populations, and 20% to 43% and 90% to 96%, respectively, in low-risk populations.15 The respective positive clinical likelihood ratios, which measure how strongly a predicted risk above threshold raises the probability of an event in a person, ranged from around 1.5 to 4.5 in both studies, indicating only moderate predictivity associated with the risk scores in individuals. Interestingly, using a threshold value of predicted coronary risk of 20% over 10 years in the PROCAM neural network approach,16 a multiple-layer perceptron (MLP) technique performed with a sensitivity of 74.5%, specificity of 97%, and a positive likelihood ratio of 24.7 in the data set from which it was derived. Here again, due external evaluation may dampen overoptimistic expectations.36 These characteristics of a clinical decision rule based on thresholds of predicted absolute risk shed a fairly sobering light on the utility of this approach: it misses large proportions of those who experience the event over the prediction period and need intensified care (the false negatives), while at the same time giving such care to many of those who remain free of the event, at least in the short run (the false positives).
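The relation between a 2 × 2 table, the positive likelihood ratio, and the resulting probability of an event can be sketched as follows. The counts are hypothetical, chosen only to give a sensitivity of 20% and a specificity of 90%; they are not data from any of the studies cited, and the function names are illustrative.

```python
def classification_measures(true_pos, false_pos, false_neg, true_neg):
    """Return sensitivity, specificity and positive likelihood ratio of a decision rule."""
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    if specificity == 1.0:
        raise ValueError("likelihood ratio is undefined without false positives")
    lr_positive = sensitivity / (1.0 - specificity)
    return sensitivity, specificity, lr_positive


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert a prior probability to a posterior probability via odds."""
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Hypothetical cohort of 1000 people with 100 events over the prediction period.
sens, spec, lr_pos = classification_measures(
    true_pos=20, false_pos=90, false_neg=80, true_neg=810
)
posterior = post_test_probability(pre_test_probability=0.10, likelihood_ratio=lr_pos)
print(f"sensitivity {sens:.2f}, specificity {spec:.2f}, LR+ {lr_pos:.1f}")
print(f"probability of an event if above threshold: {posterior:.3f}")
```

With a likelihood ratio of about 2, a background risk of 10% rises to only about 18% in those classified as high risk, and 80 of the 100 events occur in people below the threshold.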

As for the predictivity in the individual patient, it is crucial to bear in mind that the decision rule by itself, that is, one using predicted probabilities, cannot be expected to be a good, let alone a near perfect, classifier of future events, even if high accuracy of the predicted probabilities is assumed. Indeed, specifying, for example, a 5-year risk of CVD of 15% as the threshold for a clinical intervention means that up to 85% of those exceeding this cut-off are predicted not to experience an event over the prediction period. Conversely, among those remaining below threshold, for example, 1 in 10 or 1 in 20 (5-year CVD risks of 0.10 or 0.05, respectively) are expected to have an event during the next 5 years. In other words, probability-based thresholds implicitly contain the information of the expected, fairly modest, predictive performance in individuals. This information is essentially valid given appropriate population calibration and measurement technique, whereas it is difficult to anticipate the size and direction of the bias brought about by the various methodological problems outlined above.
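The arithmetic behind this argument can be sketched for a small population, assuming that the predicted probabilities are perfectly calibrated so that the expected number of events in a group equals the sum of its predicted risks. The risk distribution of 100 people and the function name are hypothetical.

```python
def expected_events_by_threshold(predicted_risks, threshold):
    """Expected events above and below a treatment threshold, assuming perfect calibration."""
    above = [p for p in predicted_risks if p >= threshold]
    below = [p for p in predicted_risks if p < threshold]
    return {
        "n_above": len(above),
        "events_above": sum(above),
        "n_below": len(below),
        "events_below": sum(below),
    }


# Hypothetical population: 20 people at 20% risk, 30 at 10% and 50 at 5% (5-year CVD risk).
risks = [0.20] * 20 + [0.10] * 30 + [0.05] * 50

summary = expected_events_by_threshold(risks, threshold=0.15)
print(summary)
```

In this invented population about 4 events are expected among the 20 people above the 15% threshold, so 16 of them remain event-free, while about 5.5 events are expected among the 80 people below it.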

What then are we to conclude from all this? First, calibration of risk functions to current regional or local event rates and risk profiles is indispensable.37 With some caution, relative hazard estimates appear portable between populations. Second, endpoints should be unified to encompass many cardiovascular endpoints in order to attain total risk assessment and to avoid heterogeneity between guidelines. Pooling of cohort studies seems the best way to achieve this goal, giving proper consideration, aside from all technicalities, also to aspects of regression dilution bias and measurement error. Third, thresholds should be defined in a uniform and comparable way, explicitly describing the rationale behind them, such as, for example, the expected benefit of treatment.38 After appropriate calibration the regional implications in terms of predictivity for the community at large and for individual patients should be made clear. Some measure of uncertainty must accompany individual risk estimates. Clinical decision rules essentially based on specified thresholds of absolute risk should explicitly contain information on all of the above to allow informed discussions between clinicians and patients. Fourth, cardiovascular risk assessment is a valid and excellent tool for risk communication, in particular when presented visually, e.g. as a risk chart. It allows the instructive demonstration of absolute and relative risk, of attributable fractions (or rather: excess risk), and of risk advancement periods, thus helping medical staff and patients to understand the various aspects of risk. This may well turn out to be the most valuable and least debatable aspect of all.

References

1 Rose G. Environmental health: problems and prospects. J R Coll Physicians Lond 1991;25:48–52.

2 Expert Panel on Detection, Evaluation, and Treatment of High Blood Cholesterol in Adults. Executive Summary of the 3rd Report of the National Cholesterol Education Program (NCEP) Expert Panel on Detection, Evaluation, and Treatment of High Blood Cholesterol in Adults (ATP III). JAMA 2001;285:2486–97.

3 Third Joint Task Force of European and other Societies on Cardiovascular Disease Prevention. European Guidelines on Cardiovascular Disease Prevention in Clinical Practice. Eur Heart J 2003;24:1601–10.

4 Chobanian A, Bakris GL, Black HR et al. The 7th Report of the Joint National Committee on Detection, Prevention, Evaluation and Treatment of High Blood Pressure. JAMA 2003;289:2560–72.

5 Ramsay LE, Williams B, Johnston GD et al. British Hypertension Society guidelines for hypertension management 1999: summary. BMJ 1999;319:630–35.

6 AHA Committee on Reduction of Risk of Heart Attack and Stroke. American Heart Association Coronary Risk Handbook. 1973.

7 Anderson KM, Odell PM, Wilson PW, Kannel WB. Cardiovascular disease risk profiles. Am Heart J 1991;121:293–98.

8 Anderson KM, Wilson PW, Odell PM, Kannel WB. An updated coronary risk profile. A statement for health professionals. Circulation 1991;83:356–62.

9 Sackett D, Haynes R, Guyatt G, Tugwell P. Clinical Epidemiology, 2nd Edn. Boston, Toronto, London: Little, Brown and Company, 1991.

10 Jackson R, Barham P, Bills J et al. Management of raised blood pressure in New Zealand: a discussion document. BMJ 1993;307:107–10.

11 Wilson P, D'Agostino R, Levy D, Belanger A, Silbershatz H, Kannel WB. Prediction of coronary heart disease using risk factor categories. Circulation 1998;97:1837–47.

12 Thomsen TF, Davidsen M, Ibsen H, Jorgensen T, Jensen G, Borch-Johnsen K. A new method for CHD prediction and prevention based on regional risk scores and randomized clinical trials. J Cardiovasc Risk 2001;8:291–97.

13 Thomsen TF, McGee DM, Davidsen M, Jorgensen T. A cross-validation of risk scores for coronary heart disease mortality based on data from the Glostrup Population Studies and Framingham Heart Study. Int J Epidemiol 2002;31:817–22.

14 Assmann G, Cullen P, Schulte H. Simple scoring scheme for calculating the risk of acute coronary events based on the 10-year follow-up of the Prospective Cardiovascular Münster (PROCAM) Study. Circulation 2002;105:310–15.

15 Conroy RM, Pyorala K, Fitzgerald AP et al. Estimation of ten-year risk of fatal cardiovascular disease in Europe: the SCORE project. Eur Heart J 2003;24:987–1003.

16 Voss R, Cullen P, Schulte H, Assmann G. Prediction of risk of coronary events in middle-aged men in the Prospective Cardiovascular Münster Study (PROCAM) using neural networks. Int J Epidemiol 2002;31:1253–62.

17 WHO MONICA Project. Myocardial infarction and coronary deaths in the World Health Organization MONICA Project. Registration procedures, event rates and case fatality rates in 38 populations from 21 countries in four continents. Circulation 1994;90:583–612.

18 Menotti A, Puddu PE, Lanti M. Comparison of the Framingham risk function-based coronary chart with risk function from an Italian population study. Eur Heart J 2000;21:365–70.

19 Marrugat J, D'Agostino R, Sullivan L et al. An adaptation of the Framingham coronary heart disease risk function to European Mediterranean areas. J Epidemiol Community Health 2003;57:634–38.

20 Laurier D, Nguyen PC, Cazelles B, Segond P. Estimation of CHD risk in a French working population using a modified Framingham model. The PCV-METRA Group. J Clin Epidemiol 1994;47:1353–64.

21 Brindle P, Emberson J, Lampe F et al. The Framingham score overestimates coronary risk in British men. A prospective study. BMJ 2003;327:1267–73.

22 Hense HW, Schulte H, Lowel H, Assmann G, Keil U. Framingham risk function overestimates risk of coronary heart disease in men and women from Germany—results from the MONICA Augsburg and the PROCAM cohorts. Eur Heart J 2003;24:937–45.

23 Empana JP, Ducimetiere P, Arveiler D et al. Are the Framingham and PROCAM coronary heart disease risk functions applicable to different European populations? Eur Heart J 2003;24:1903–11.

24 D'Agostino RB, Grundy S, Sullivan LM, Wilson PW. Validation of the Framingham coronary heart disease prediction scores. Results from a multiple ethnic groups investigation. JAMA 2001;286:180–87.

25 Milne R, Gamble G, Whitlock G, Jackson PR. Framingham Heart Study risk equation predicts first cardiovascular event rates in New Zealanders at the population level. N Z Med J 2003;116:U662.

26 Stewart AW, Kuulasmaa K, Beaglehole R, for the WHO MONICA Project. Ecological analysis of the association between mortality and major risk factors of cardiovascular disease. Int J Epidemiol 1994;23:505–16.

27 Tunstall-Pedoe H, Kuulasmaa K, Mähönen M et al. Contribution of trends in survival and coronary-event rates to changes in coronary heart disease mortality: 10-year results from 37 WHO MONICA Project populations. Lancet 1999;353:1547–57.

28 Kuulasmaa K, Tunstall-Pedoe H, Dobson A et al. Estimation of contribution of changes in classic risk factors to trends in coronary event rates across the WHO MONICA Project populations. Lancet 2000;355:675–87.

29 Verschuren WM, Jacobs DR, Bloemberg BP et al. Serum total cholesterol and long-term coronary heart disease mortality in different countries. Twenty-five-year follow-up of the Seven Countries Study. JAMA 1995;274:131–36.

30 van den Hoogen PC, Feskens EJ, Nagelkerke NJ, Menotti A, Nissinen A, Kuulasmaa K. The relation between blood pressure and mortality due to coronary heart disease among men in different parts of the world. Seven Countries Study Research Group. New Engl J Med 2000;342:1–8.

31 Chambless L, Folsom A, Sharrett A et al. Coronary heart disease risk prediction in the ARIC study. J Clin Epidemiol 2003;56:880–90.

32 Clarke R, Shipley M, Lewington S et al. Underestimation of risk associations due to regression dilution in long-term follow-up of prospective studies. Am J Epidemiol 1999;150:341–53.

33 Hennekens CH, D'Agostino RB. Global risk assessment for cardiovascular disease and astute clinical judgement. Eur Heart J 2003;24:1899–900.

34 Benichou J. Absolute risk. In: Gail MH, Benichou J (eds). Encyclopedia of Epidemiologic Methods. Chichester: Wiley & Sons, 2000, pp. 1–17.

35 Milne R, Gamble G, Whitlock G, Jackson PR. Discriminative ability of a risk-prediction tool derived from the Framingham Heart Study compared with single risk factors. N Z Med J 2003;116:U663.

36 May M. Commentary: Improved coronary risk prediction using neural networks. Int J Epidemiol 2002;31:1262–64.

37 Hense HW. Risk factor scoring for coronary heart disease. Prediction algorithms need regular updating. BMJ 2003;327:1238–39.

38 Jackson R. Updated New Zealand cardiovascular disease risk-benefit prediction guide. BMJ 2000;320:709–10.