a President, International Epidemiological Association 1974–1977, 250 Partops Mountain Road, #35 Charlottesville, Virginia 22911-8680, USA.
Sir—The International Journal of Epidemiology published a superb set of papers to celebrate Jerry Morris's enduring contributions to the ever-broadening applications of epidemiological concepts. To the authors' thoughtful comments may be added his role in initiating and shaping Health Services Research in the US and beyond. George Davey Smith1 has already mentioned the 1952 conference at the University of North Carolina in Chapel Hill on research requirements for health and medical care. That conference, and especially Jerry's keynote address, helped to establish a major research agenda at what was the first new medical school in the US after World War II. I was attracted from McGill University to join the founding faculty of the Department of Internal Medicine in 1953, as were Frank Williams, Bob Huntley, and Dan Martin. John Cassell and Sydney Kark from South Africa and Bernie Greenberg were similarly attracted to the also new School of Public Health. We were friends, taught in each other's classes in the two schools, and initiated studies under the rubric of Medical Care Research. Jerry's landmark 1955 article in the British Medical Journal on the Uses of Epidemiology, the prelude to his classic volume, intrigued me further. The result was a sabbatical year (1959–1960) in London spent jointly with Jerry at the London Hospital, where he was then based, and at the London School of Hygiene and Tropical Medicine. Whatever I was able to pass on later to students and colleagues was learned from Jerry Morris.
Here are several examples.
Jerry had introduced me to the UK's Hospital In-Patient Enquiry (HIPE) and the Registrar General's fascinating but little analysed reports. Indeed this enterprise embodied an idea that Florence Nightingale had urged upon the medical fraternity a century earlier. Jerry also emphasized the still startling message in Table 3 of Davey Smith's article summarizing the Glover Phenomenon.1 On return to Chapel Hill, I had hoped to use Glover's model to examine hospital and clinical performance both locally and eventually more extensively. Accordingly, I attempted to introduce into the North Carolina Memorial Hospital a discharge abstracting system, the Professional Activities System (PAS), which Vergil Slee was developing with support from the Kellogg Foundation. My biomedical colleagues at the University of North Carolina informed me that such ideas were unwelcome and that I obviously did not know I was working in a university hospital whose activities required no monitoring but rather set the gold standard for all hospitals in its region.
Two years later in 1962, the University of Vermont invited me to head what we called the Department of Epidemiology and Community Medicine (the first US medical school to use the term 'epidemiology' as a department label). One major attraction to Vermont was its relatively small population (then about 400 000), compact geography, and the prospect of installing a hospital discharge abstract system (PAS) in all that state's hospitals. The goal, as in North Carolina, was to examine the Glover and related phenomena by comparing population-based rates for hospital admissions (or discharges), procedures, and outcomes. With the help of John Last, whom I had recruited from Australia to our department in Vermont, we did just that. We went about installing PAS in all Vermont hospitals for what would be the first population-based hospital discharge abstract system in the US. It was not easy! There were endless meetings with hospital administrators, their boards and lawyers, as well as with health officers and local politicians; most required intense persuasion. Even the Vermont Medical Society paraded me before it on the charge of mounting a communist plot, and so forth.
Before we could generate adequate data or attempt analyses, I was offered the job of founding a new department at Johns Hopkins (initially the Department of Medical Care and Hospitals, later Health Care Organization, and now the Department of Health Policy and Management). Somewhat reluctantly, I left Vermont for Baltimore with the unwholesome thought that a message from Hopkins might attract more attention than the same one from a provincial medical school. In addition, I was to have expansive real estate within the hospital and a large budget. Osler once remarked to his three fellow horsemen: 'It's lucky we have jobs in this place because we would never get in as students!' It is also said of Hopkins that it would not matter if the faculty were all idiots because the students are so good. One of our first such students in 1965, Jack Wennberg, was interested in what we still called Medical Care Research. I introduced him to Jerry's work, the Glover Phenomenon, and the prospects of small area comparisons, and told him about the wealth of unmined hospital discharge abstract data being generated by Vermont hospitals. We put him in touch with authorities in Vermont and he took it from there. His Center for Evaluative Clinical Science at Dartmouth is now a global pioneer and leader in pursuing Glover's and Jerry's visions with, of course, creative and greatly enhanced data and methods. We now have national atlases depicting the alarming variations in medical procedures and outcomes. Jerry should be delighted!
Convinced in 1968 that Florence Nightingale and the UK's more recent HIPE reports were the way to go, we organized from Hopkins an international conference on Hospital Discharge Abstract Systems, a number of which were evolving by this time, but none of which contained the same core data elements and none were population-based. From this conclave, Nightingale's notion of Minimum Uniform Data Sets re-emerged. Their adoption by the US National Center for Health Statistics (NCHS) followed, albeit gradually; 17 years to be exact! This set was followed by two additional conferences and promulgation of minimum data sets for ambulatory care and for long-term care. Jerry Morris had introduced me to this relatively simple idea for generating the essential population-based data required to assess the use of a population's health services.
Although we had been barred from monitoring clinical performance for inpatients at the University of North Carolina Hospital, Bob Huntley, Frank Williams, and I pursued the same goals in a new educational venue we called a General Clinic, for which we were responsible. Among the several investigations we mounted was a systematic review of all our medical charts. We thought we ran an unusually tight ship but found to our dismay that we were experiencing unacceptably large error rates. There were unexamined abnormal laboratory results and X-rays, broken return appointments, and failures to notify referring physicians of the results of their patients' studies. We published what seems to have been the first assessment of the quality of medical care in an ambulatory setting. But that was not the end! When I arrived at Hopkins in 1965 I again tried to introduce a system of hospital discharge abstracts and met the same negative reaction; the gold standard, I was told, was of even higher quality than that in North Carolina. From the Johns Hopkins Hospital, we switched to the outpatient clinics of the Baltimore City Hospital, a teaching affiliate of the Hopkins Medical School. Here I persuaded Julie Krevans, the Chief of Medicine, to allow Bob Brook, recently the Chief Resident on the Osler Medical Service in the Johns Hopkins Hospital and now a graduate student in our department, to replicate the study we had done in the General Clinic at the University of North Carolina. Imbued with Jerry's thinking, especially in relation to outcome comparisons of teaching and non-teaching hospitals (Table 2 in Davey Smith's paper), Bob worked with John Williamson in our department to complete an investigation with results that were even worse than our original ones at Chapel Hill. His paper was published in the New England Journal of Medicine. Eventually the Johns Hopkins Hospital allowed us to replicate the study and again, the results were unacceptable. 
Bob went on from there to his current role as Vice-president of the Rand Corporation and Professor of Medicine at the University of California, Los Angeles. His work on outcomes and the appropriateness of medical procedures has set new standards and constituted tangible testimony of Jerry's ubiquitous influence on Health Services Research in the US.
At the University of North Carolina, as I indicated above, our group included colleagues in both the new School of Medicine and the new School of Public Health (housed in the basement of the medical school building). At the time I questioned the wisdom of separating these two academic entities but accepted the emerging realities since we had other fish to fry. When I arrived at Hopkins our new department was based organizationally in the School of Hygiene and Public Health but housed in the Johns Hopkins Hospital. Subsequently we were required to move to fancier quarters in a new addition to the old School of Public Health building. Here I realized that two vastly different cultures were separated by Wolfe Street. Again, I was reminded that Jerry Morris argued that there are three venues for studying health and disease: the bedside or clinic, the laboratory, and the population. Why should the third dimension be separated academically from the first two? Given the 1965 advent of new publicly funded Medicare and Medicaid legislation in the US, it seemed to me, again reflecting Jerry's vision, that the country's health departments would eventually have responsibility for spending and/or monitoring the disbursement of these funds and for assessing their impact on the health of the populations served. This would require a new kind of physician and a new approach to managing health care arrangements.
It is true that US schools of public health had something called Departments of Public Health Administration but I could never find out what they administered, let alone what difference it made, and to whom. How could we stimulate discussion of these matters? Again, I turned to Jerry and invited him to be visiting professor at Hopkins. Here he gave his seminal talk on Tomorrow's Community Physician. It was a great local success and drew a much larger crowd than Archie Cochrane's talk a few years later. We debated whether he would submit it to the New England Journal of Medicine as Bradford Hill and Will Pickles had done with similar original messages several years previously. Jerry's thoughts seemed too stark for US readers and he elected to publish in The Lancet. Liam Donaldson has reviewed the impact of Jerry's paper in the UK. Sadly, in the US there is still great confusion about the structure, responsibilities, and staffing of what we now call the public health infrastructure. Nowhere is this more apparent than in American schools of public health where currently only about seven per cent of the students are physicians, and of those probably less than half are US citizens. Jerry's message of 30 years ago has yet to be embraced in the US.
Here is another example. Jerry introduced me to the reports that WPD Logan of the Registrar General's Office was publishing on the content of General Practice in the UK and to Tev Eimerl's E-book for recording encounter data. It seemed to me incredible that we were operating a system of medical education with little knowledge about the tasks physicians were expected to undertake for their ambulatory patients, the vast bulk of medical practice. Accordingly, John Last and I undertook to replicate and publish E-book recordings we persuaded a group of Vermont General Practitioners to generate. I believe this was the first of its kind in the US. Several years later, again, based on Jerry's tutoring, I persuaded Ted Woolsey, newly appointed Director of the NCHS, that medical schools required a survey of the content of ambulatory care as a guide for constructing their curricula and guiding the composition of the country's physician workforce. Woolsey gave our Hopkins department a grant to design the National Ambulatory Medical Care Survey (NAMCS). When rolled out nationally it became an immediate best seller and in great demand by many medical school faculties. Although Jerry might use a different term, some might call it essential marketing information!
I firmly believe that none of these and other Health Services Research developments in the US would have happened without Jerry Morris's inspired scientific creativity. Thank you Jerry!
Reference
1 Davey Smith G. The uses of 'Uses of Epidemiology'. Int J Epidemiol 2001;30:1146–55.
b The University of Cambridge, The Institute of Public Health, Strangeways Research Laboratory, Wort's Causeway, Cambridge CB1 8RN.
Professor Willett, in his editorial,1 dismisses the relevance of our paper,2 in which we compare the performance of a food frequency questionnaire (FFQ) with a 7-day diet diary in estimating nitrogen and potassium intake, using nitrogen and potassium measurements from six 24-h urine collections as independent biomarkers of intake. He claims we have been unfair to our FFQ. His main criticism is that we have not adjusted for energy intake. The principal reason for focussing on energy-adjusted intakes is that both in the epidemiological and the experimental setting, interest is primarily in comparing isocaloric diets. We would agree. Energy adjustment is highly desirable. The question is whether it can be done. Although Professor Willett has argued repeatedly for isocaloric comparisons, he has never, to our knowledge, produced any evidence on the accuracy with which the FFQ can estimate energy intake, apart from one small (n = 20) study with doubly-labelled water in which it was concluded that 'none of the methods (including the Willett FFQ) gave accurate estimates of the usual energy requirements of individual subjects'.3 As part of the programme of dietary assessment validation associated with our EPIC-Norfolk cohort, we have estimated energy expenditure, which in the absence of changes in body weight, we take to be equivalent to energy intake. Using 4-day, individually calibrated, heart rate monitoring (HRM), it has been shown that this method agrees well with estimates of energy expenditure using doubly-labelled water.4 In the figure, we show the relationship between the FFQ estimate of energy intake and the HRM estimate of energy expenditure in 100 individuals. For comparison, we give the 7-day diary estimates of energy intake and also weight. As can be seen, the FFQ estimate of energy intake is almost independent of the HRM values. The diet diary does better, and weight is the most strongly associated with energy expenditure.
Results from a larger study (n = 448) in the US using doubly-labelled water, the OPEN study, corroborate our finding, that a FFQ provides a very poor measure of energy intake (Kipnis, personal communication). Thus Willett's principal reason for energy adjustment is void. Using FFQ, energy adjustment cannot be achieved because energy intake is not adequately measured. If one wanted to adjust for true energy intake, then one would do best by adjusting for weight. This of course is often done, but its effect on diet-disease associations is usually minimal.
Our paper may well be one of the first where such estimates, using independent biomarkers, are given.7 It is clear from these papers that these error correlations can be large, and that their effect on fitting multivariate dietary models for disease risk will also be large, and difficult to predict. The first step in understanding the role of correlated error, and hence of controlling it, is to understand in full what the multivariate error structure actually is. Our paper is a move in this direction: Biomarkers do not exist for energy adjusted intakes, they exist for absolute intakes. Hence to understand error structure, one has to start with absolute intakes, as we do. The error structure of derived measures, such as energy adjusted intakes, is then a secondary calculation. We are proceeding, with the data from the study we reported, in just this direction, as are our American colleagues with the OPEN data.
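The kind of error-correlation estimate at issue can be sketched with simulated data. Everything below is invented for illustration: the variance values, the person-specific `shared_bias` term, and the use of `biomarker` as a stand-in for an unbiased reference such as 24-h urinary nitrogen.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# True intake (log scale) plus a person-specific reporting bias that
# contaminates both self-report instruments but not the biomarker.
truth = rng.normal(0.0, 1.0, n)
shared_bias = rng.normal(0.0, 0.6, n)

biomarker = truth + rng.normal(0.0, 0.3, n)            # independent error
ffq = truth + shared_bias + rng.normal(0.0, 1.0, n)    # food frequency questionnaire
diary = truth + shared_bias + rng.normal(0.0, 0.5, n)  # 7-day diary

# Instrument errors measured against the biomarker reference.
# (Biomarker error itself induces a small spurious correlation here;
# formal analyses model that component rather than ignoring it.)
err_ffq = ffq - biomarker
err_diary = diary - biomarker
r = np.corrcoef(err_ffq, err_diary)[0, 1]
print(f"correlation between FFQ and diary errors: {r:.2f}")
```

With a shared reporting bias of this size the error correlation comes out well above zero, which is precisely the situation in which validating one self-report instrument against another overstates its accuracy.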
The second, subsidiary, criticism of our paper is that we did not adjust for age and sex. If the aim of our paper had been to provide regression dilution parameters for general epidemiological use, this criticism would be valid. One would always perform age-sex adjusted disease association analyses. However, our purpose was more limited, namely to examine how the error variance compared with the full true variation in the population. With colleagues (Kipnis, personal communication) we have in fact done a parallel analysis adjusted for age and sex. The results will bring no comfort to Professor Willett.
The effect on both the FFQ and the 7-day diaries is approximately to double the regression correction factors. For the three nutrients we consider, the correction factors for the FFQ now lie between 10 and 20, i.e. the observed regression coefficients will underestimate the true values by a factor of between 10 and 20. Real effects would have to be very large for the message to escape from the noise. It has become clear over the last few years that using other record type instruments to attempt to validate the FFQ leads to an underestimation of the error variance of the FFQ. Correlations of about 0.3 between errors in the FFQ and in a diet record, as we report, can lead to substantial underestimation of this kind. Statements, often repeated, that the FFQ is a validated instrument are seriously misleading. The underestimation of relative risk estimates consequent on using the FFQ will be substantially greater than suggested by the partial validation usually reported. If our age-sex adjusted analyses are replicated in other data sets, it would seem that little weight should be given to negative findings using a traditional FFQ, especially in cohort studies where the variation in true intake across cohort members may be rather limited. Professor Willett has done much over the past two decades to popularize the FFQ. Perhaps he should join the efforts being made to validate it rigorously.
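The attenuation arithmetic behind such correction factors can be checked with a small regression-dilution simulation. All numbers below are invented; the error variance is simply chosen so that the reliability ratio yields a correction factor near the top of the 10 to 20 range quoted above.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

true_intake = rng.normal(70.0, 4.0, n)   # narrow true variation, as in a cohort
beta_true = 0.05                         # hypothetical true diet-disease slope
outcome = beta_true * true_intake + rng.normal(0.0, 2.0, n)

# Self-reported intake with error large relative to the true
# between-person variation.
observed = true_intake + rng.normal(0.0, 17.0, n)

# Classical measurement error attenuates the slope by the reliability
# ratio lambda = var(true) / (var(true) + var(error)).
lam = 4.0**2 / (4.0**2 + 17.0**2)
beta_obs = np.polyfit(observed, outcome, 1)[0]

print(f"reliability ratio: {lam:.3f}  (correction factor {1/lam:.1f})")
print(f"observed slope {beta_obs:.4f} vs true slope {beta_true}")
```

With a correction factor of roughly 19, an observed coefficient must be multiplied by about 19 to recover the true one, so only very large real effects remain distinguishable from noise.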
References
1 Willett W. Commentary: Dietary diaries versus food frequency questionnaires—a case of undigestible data. Int J Epidemiol 2001;30:317–19.
2 Day NE, McKeown N, Wong MY, Welch A, Bingham S. Epidemiological assessment of diet: a comparison of a 7-day diary with a food frequency questionnaire using urinary markers of nitrogen, potassium and sodium. Int J Epidemiol 2001;30:309–17.
3 Sawaya AL, Tucker K, Tsay R et al. Evaluation of four methods for determining energy intake in young and older women: comparison with doubly labelled water measurements of total energy expenditure. Am J Clin Nutr 1996;63:491–99.
4 Spurr GB, Prentice AM, Murgatroyd PR, Goldberg GR, Reina JC, Christman NT. Energy expenditure from minute-by-minute heart-rate recording: comparison with indirect calorimetry. Am J Clin Nutr 1988;48:552–59.
5 Willett W. Nutritional Epidemiology. New York: Oxford University Press, 1998.
6 Margetts BM, Nelson M. Design Concepts in Nutritional Epidemiology. Oxford: Oxford University Press, 1997.
7 Kipnis V, Midthune D, Freedman LS, Bingham SA, Schatzkin A, Carroll RJ. Empirical evidence of correlated biases in dietary assessment instruments and its implications. Am J Epidemiol 2001;153:394–403.
c Department of Nutrition, Harvard School of Public Health, 655 Huntington Avenue, Boston, MA 02115, USA.
The agreement by Day et al. that isocaloric diets, i.e. energy-adjusted intakes, are of primary interest in experimental and epidemiological studies is welcome. This concept is fundamentally important in the analysis and interpretation of studies in nutritional epidemiology.1 For this reason, the validity of energy-adjusted intakes should be the primary focus of validation studies of dietary assessment methods. Because Day et al.2 had cast doubt on the performance of their food frequency questionnaire (FFQ), and FFQs in general, based only on a measure of crude protein intake, and did not adjust for age and sex, I have pointed out in a commentary that their conclusions could be misleading.3
In their letter, Day et al.2 say that they have now adjusted their results for age and sex, as would be necessary to draw any conclusions applicable to use of their methods in epidemiological studies, and that both FFQ and diet record perform less well. This would be expected because these adjustments reduce the true between-person variation in protein intake without reducing measurement error. However, they still have not provided data useful for epidemiological applications because their results remain unadjusted for total energy intake, which they agree is of primary importance.
Day et al. claim that nutrient intakes cannot be adjusted for total energy intake without highly valid measures of total energy intake, and that there has not been a single paper indicating how adjustment for total energy intake might control for errors in measurement. As described on page 113 of Nutritional Epidemiology,4 adjustment for total energy intake can reduce errors in energy-adjusted nutrient intakes due to cancellation of correlated errors in measurements of the specific nutrient and energy intake. These errors will inevitably be strongly correlated because nutrients and energy intake are calculated from the same foods. Thus, for example, if red meat is over-reported, both fat intake and energy intake will be proportionally overestimated. The reduction in error by adjustment for total energy intake can be illustrated by considering the nutrient density (nutrient divided by total energy intake) as the energy-adjusted nutrient, and defining error as the difference between intake measured by diet records and by FFQ. As shown in the Table, the correlations between errors for energy intake and specific nutrients are high, even for nutrients that do not provide energy; for total fat intake this correlation was 0.92. The reduction in measurement error can be appreciated by considering that observed nutrient intake (No) for an individual is equal to true nutrient intake (Nt) multiplied by an over/underestimation error (eN), and observed energy intake (Eo) is equal to true energy intake (Et) multiplied by another over/underestimation error (eE). The observed nutrient intake adjusted for energy intake, expressed as the nutrient density, is thus:
No/Eo = (Nt × eN)/(Et × eE) = (Nt/Et) × (eN/eE)

When eN and eE are strongly correlated, the error ratio eN/eE is close to one, so much of the error cancels in the density.
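The cancellation argument can be illustrated with a toy simulation; the distributions, variance values, and the shared error term below are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000

# True nutrient and energy intakes (arbitrary units).
Nt = rng.lognormal(4.0, 0.20, n)
Et = rng.lognormal(7.7, 0.15, n)

# Multiplicative reporting errors: 'shared' makes eN and eE strongly
# correlated, as when the same foods drive both totals.
shared = rng.normal(0.0, 0.25, n)
eN = np.exp(shared + rng.normal(0.0, 0.08, n))
eE = np.exp(shared + rng.normal(0.0, 0.08, n))

N_obs = Nt * eN   # observed nutrient intake
E_obs = Et * eE   # observed energy intake

# Crude intake vs energy-adjusted (nutrient density): the shared error
# largely cancels in the ratio N_obs/E_obs = (Nt/Et) * (eN/eE).
r_crude = np.corrcoef(N_obs, Nt)[0, 1]
r_density = np.corrcoef(N_obs / E_obs, Nt / Et)[0, 1]
print(f"crude intake vs truth:   r = {r_crude:.2f}")
print(f"density vs true density: r = {r_density:.2f}")
```

If eN and eE are instead generated independently (dropping the shared term), their variances add in the ratio and the density becomes a worse measure than the crude intake, which is the situation for reference methods with independent errors.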
When total energy and nutrient intakes are measured by different methods with independent errors, such as when protein intake is measured by urinary nitrogen excretion and energy intake is measured by doubly-labelled water (DLW), adjustment for energy intake will tend to increase, rather than reduce, measurement error in nutrient intake. Day et al. claim that only the doubly labelled water technique for energy intake and the 24-h urine for nitrogen and potassium are acknowledged biochemical indicators that are sufficiently accurate in themselves to act as independent reference biomarkers. Were this to be true, opportunities for validation studies would be severely limited because the 24-h urine methods provide only measures of crude intakes, which Day et al. agree are not of primary interest, and the DLW method is extremely expensive, costing roughly $1000 per analysis. Also, substantial between- and within-laboratory errors exist in DLW measurements5 and, to my knowledge, appropriate studies of reproducibility over time in the same individuals have not been published.
Fortunately, other biomarkers, such as nutrient levels measured in blood or adipose, can be informative for assessing validity or relative validity among methods, even though they may not provide quantitative measures of intake. These concentration biomarkers are generally more biologically relevant than measures of urinary excretion because they more directly reflect the nutrient levels to which internal organs are exposed. As an example of their application, using data from the study described by Day et al.,2 Michels et al.6 have shown that the energy-adjusted vitamin C intakes from their FFQ and diet records have similar correlations with long-term plasma vitamin C levels. We have also shown that energy-adjusted intakes of carotenoids and polyunsaturated fat from a 1-week diet record and from an FFQ provide similar correlations, respectively, with plasma carotenoids and adipose polyunsaturated fatty acid concentrations.7,8 These observations have important practical implications because the cost of collecting dietary data by FFQ is several orders of magnitude lower than by diet records, which enables the conduct of much larger studies and collection of repeated measures of intake over time. Because repeated measures over time can account for true changes in intake and also reduce random errors, they can substantially improve the assessment of long term intake compared to a single measure.9 We have also shown that even a biological measurement that is not uniquely responsive to a dietary factor, such as fasting blood triglyceride level which is reduced by higher intake of dietary total fat, can be useful in assessing the validity of a dietary assessment.10 In this example, the regression coefficient relating fasting triglyceride level to percentage of energy from fat was at least as strong as predicted by controlled feeding studies. 
This provides objective support for the validity of the questionnaire for measuring dietary fat, and also indicates that the validity of the questionnaire for measuring fat intake has not been seriously overestimated in comparisons with diet records due to correlated errors between methods.
In summary, Day et al.2 express an overly narrow view of methods for assessing validity of dietary methods. I welcome future work by their group that focuses on the validity of energy-adjusted nutrients, which we agree are most directly relevant to epidemiological applications.
References
1 Willett WC, Howe GR, Kushi LH. Adjustment for total energy intake in epidemiologic studies. Am J Clin Nutr 1997;65(Suppl):1220S–28S.
2 Day NE, McKeown N, Wong MY, Welch A, Bingham S. Epidemiological assessment of diet: a comparison of a 7-day diary with a food frequency questionnaire using urinary markers of nitrogen, potassium and sodium. Int J Epidemiol 2001;30:309–17.
3 Willett W. Commentary: Dietary diaries versus food frequency questionnaires—a case of undigestible data. Int J Epidemiol 2001;30:317–19.
4 Willett W. Nutritional Epidemiology. 2nd Edn. New York: Oxford University Press, 1998.
5 Roberts SB, Dietz W, Sharp T, Dallal GE, Hill JO. Multiple laboratory comparison of the doubly labeled water method. Obes Res 1995;3(Suppl.):3–14.
6 Michels KB. Multivariate Diet Models and the Effect of Correlated Errors. Presented at the Congress of Epidemiology, Toronto, Canada, 2001.
7 Hunter DJ, Rimm EB, Sacks FM et al. Comparison of measures of fatty acid intake by subcutaneous fat aspirate, food frequency questionnaire, and diet records in a free-living population of US men. Am J Epidemiol 1992;135:418–27.
8 Ascherio A, Stampfer MJ, Colditz GA, Rimm EB, Litin L, Willett WC. Correlations of vitamin A and E intakes with the plasma concentrations of carotenoids and tocopherols among American men and women. J Nutr 1992;122:1792–801.
9 Hu FB, Sampson LA, Stampfer MJ et al. A Validation Study of Repeated Measurement of Diet Through Food Frequency Questionnaire in Assessing Long-Term Diet Among Female Nurses. Presented at the Fourth International Conference on Dietary Assessment Methods, Arizona, USA, 2000.
10 Willett WC, Stampfer M, Chu N-F, Spiegelman D, Holmes M, Rimm E. Assessment of questionnaire validity for measuring total fat intake using plasma lipid levels as criteria. Am J Epidemiol 2001;154:1107–12.