Measurement of Fruit and Vegetable Consumption with Diet Questionnaires and Implications for Analyses and Interpretation

Karin B. Michels1,2,3, Ailsa A. Welch3, Robert Luben3, Sheila A. Bingham3,4 and Nicholas E. Day3

1 Obstetrics and Gynecology Epidemiology Center, Brigham and Women's Hospital, Harvard Medical School, Boston, MA
2 Department of Epidemiology, Harvard School of Public Health, Boston, MA
3 Strangeways Research Laboratory, Institute of Public Health, University of Cambridge, Cambridge, United Kingdom
4 Medical Research Council Dunn Human Nutrition Unit, Cambridge, United Kingdom

Correspondence to Dr. Karin Michels at the Strangeways Research Laboratory, Institute of Public Health, University of Cambridge, Worts Causeway, Cambridge CB1 8RN, United Kingdom, or the Obstetrics and Gynecology Epidemiology Center, Brigham and Women's Hospital, 221 Longwood Avenue, Boston, MA 02115 (kmichels@rics.bwh.harvard.edu).

Received for publication August 13, 2004. Accepted for publication January 11, 2005.


    ABSTRACT
 
Measurement error can have an important impact on the estimation of the true relation between diet and disease. The authors examined the performance of models regressing plasma vitamin C level on fruit and vegetable consumption and the effect of categorization of fruit and vegetable consumption on the association with plasma vitamin C. They used diet information reported by 4,487 participants in the Norfolk, United Kingdom, portion of the European Prospective Investigation into Cancer and Nutrition by means of a 7-day diet diary and a food frequency questionnaire (FFQ) (1993–1998). The authors found substantial differences in mean fruit and vegetable consumption assessed by the two diet instruments. Consumption estimated with the FFQ was about twice as high as that obtained with the 7-day diary, and the ranking of individuals according to estimates of fruit and vegetable consumption from the 7-day diary and the FFQ differed substantially. When fruit and vegetable consumption were categorized into quintiles, the two questionnaires produced similar associations of relative intake with plasma vitamin C, but estimation of the association of absolute intake with plasma vitamin C differed.

bias (epidemiology); data collection; diet; food; nutrition assessment; questionnaires; regression analysis


Abbreviations: EPIC, European Prospective Investigation into Cancer and Nutrition; FFQ, food frequency questionnaire


    INTRODUCTION
 
We recently reported the distorting impact of correlated measurement error on multivariate models of diet (1). Self-reported dietary intake is affected by measurement error (2). Errors in reporting of individual food items are correlated within a diet assessment instrument, particularly on an instrument with prespecified food lists. We observed that this correlated error may lead to distorted estimates in regression models that include several dietary predictors (1). We found that spurious associations can be introduced, resulting in a misleading interpretation of the true diet-disease relation.

Our previous observations were based on nutrients, which are additionally correlated by being derived in part from the same foods. In the current paper, we examine the behavior of two food groups, fruit and vegetables, whose intake—and hence the error in their assessment—is correlated in most people. We were interested in exploring whether inclusion of these food groups plus total caloric intake in a model adequately captured their association with plasma levels of vitamin C.

Vitamin C is an essential nutrient that is circulated in the bloodstream. Vitamin C intake is the strongest predictor of plasma levels of vitamin C, and about 90 percent of dietary vitamin C in Western diets comes from consumption of fruits and vegetables, mainly citrus fruits and juices, green vegetables, tomatoes, and potatoes (3). Hence, plasma vitamin C is a biomarker of both fruit and vegetable consumption and vitamin C intake (4, 5). Absorption and clearance of vitamin C, as well as smoking habits, infections, and inflammation, affect plasma levels of vitamin C.

It is customary in analyses of epidemiologic data to compare outcomes among individuals with extreme consumption of specific food items of interest (i.e., comparing the highest and lowest categories of intake). It has been assumed that the ranking of individuals according to their levels of intake is preserved despite measurement error in diet assessment. We considered the effect of categorization of fruit, vegetables, and energy intake on the association with plasma vitamin C.

The European Prospective Investigation into Cancer and Nutrition (EPIC) in Norfolk, United Kingdom (EPIC-Norfolk), provides unique data on self-reported diet assessed with both a 7-day diary diet record and a food frequency questionnaire (FFQ) and plasma levels of vitamin C obtained from 4,487 women and men. This allowed us to compare the performance of these two assessment instruments in relation to the biomarker.


    MATERIALS AND METHODS
 
EPIC-Norfolk
EPIC-Norfolk is a prospective population-based study of 30,445 women and men aged 45–74 years in 1993 and residing in Norfolk, United Kingdom (6). Participants completed a baseline health and lifestyle questionnaire (n = 30,414), a 24-hour dietary recall (n = 30,414), and an FFQ (n = 25,351), and 25,637 attended a clinic visit between 1993 and 1998. The FFQ was sent to participants with their health-check invitation. They were asked to complete the FFQ and bring it to the clinic visit. At this baseline visit, 24,146 women and men provided a blood sample. Participants were instructed in how to maintain a 7-day diary, and 24,983 women and men returned a completed 7-day diary. Thus, the plasma measurement was close to the time of completion of the 7-day diary.

Diet assessment
7-day diet diary.
At the clinic visit, trained nurses, using the participant's diet of the previous day as an example, taught participants how to fill in the diary. Participants completed the second day and the subsequent 5 days of the diary at home, recording in as much detail as possible all foods and beverages they had consumed. The 7-day diary booklets included colored photographs of 17 foods, each with three different portion sizes, to help participants estimate the portion size consumed. The diaries were mailed back to the coordinating center at the University of Cambridge. Diary data were coded and analyzed with a specially developed program for extraction of average daily nutrient intakes (7, 8).

Food frequency questionnaire.
The self-administered semiquantitative FFQ was designed to measure the average consumption of 130 food items during the year preceding the baseline health check. The questionnaire was based on the FFQ developed by Willett et al. (2, 9) and adapted as previously described (7, 10). For each food item, participants were asked to indicate their usual consumption from one of nine frequency categories ranging from "never or less than once per month" to "six or more times per day." The FFQ did not include specific questions on portion size but rather specified average portions and unit sizes (e.g., piece, slice) or household units (e.g., glass, cup, spoon). Nutrient intake was calculated with a specially developed program (11).

Biomarker
Plasma vitamin C level was measured from blood samples taken in citrate-covered tubes by venipuncture. Plasma was stabilized in a standardized volume of metaphosphoric acid and stored at –70°C. The plasma concentration of vitamin C was measured with a fluorometric assay within 1 week of sampling (12). The coefficient of variation ranged from 4.6 percent to 5.6 percent across the distribution of plasma vitamin C concentrations.

Statistical analysis
EPIC-Norfolk participants were included in this analysis if they had plasma measures for vitamin C and if information on fruit and vegetable consumption from both dietary instruments was computerized (n = 5,067). Regular users of vitamin supplements containing vitamin C were excluded from this analysis (n = 446). We also excluded participants whose measured plasma levels of vitamin C were below the first percentile or above the 99th percentile (n = 86) and participants whose total caloric intake as assessed by either instrument was below 500 kcal or above 4,200 kcal (n = 14).

The distribution of individuals in quintiles of fruit consumption, vegetable consumption, and total caloric intake according to the 7-day diary and FFQ values was cross-tabulated. Agreement was evaluated using a weighted kappa statistic, which penalizes disagreement close to the diagonal less heavily than disagreement further from the diagonal.
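The weighted kappa described above can be sketched as follows. This is an illustration only: the text does not state which weighting scheme was used, so linear weights are assumed, and the function name and the example tables are hypothetical rather than study data.

```python
def weighted_kappa(table):
    """Linearly weighted kappa for a square table of counts.

    Rows are categories from one instrument, columns are categories
    from the other. Linear weights are an assumption of this sketch.
    """
    k = len(table)
    if any(len(row) != k for row in table):
        raise ValueError("table must be square")
    total = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    observed = 0.0
    expected = 0.0
    for i in range(k):
        for j in range(k):
            # Disagreement weight: 0 on the diagonal, 1 in the far corners.
            w = abs(i - j) / (k - 1)
            observed += w * table[i][j] / total
            expected += w * row_tot[i] * col_tot[j] / total ** 2
    return 1.0 - observed / expected


# Hypothetical 5 x 5 quintile cross-tabulations (not study data).
perfect = [[10 if i == j else 0 for j in range(5)] for i in range(5)]
uniform = [[1] * 5 for _ in range(5)]
kappa_perfect = weighted_kappa(perfect)
kappa_uniform = weighted_kappa(uniform)
```

With perfect agreement the statistic equals 1, and when the cells are filled as chance alone would predict it equals 0; a one-quintile disagreement costs a quarter as much as a disagreement between the lowest and highest quintiles.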

Linear regression was used to model the association between plasma vitamin C levels and self-reported consumption of fruits and vegetables as assessed by 7-day diary and FFQ. We considered fruit consumption and vegetable consumption separately in our analyses and created a combined variable summing data across the consumption of fruits and vegetables. To permit comparability between the two diet-assessment instruments, we standardized fruit and vegetable consumption and energy intake by dividing them by their respective standard deviations. Regression coefficients for standardized dietary variables resulting from the linear regression models are interpretable as the µmol/liter change in vitamin C plasma levels per one-standard-deviation change in fruit, vegetable, or energy intake. Fruit and vegetable consumption was also modeled per 100 g of consumption of the food.
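A minimal sketch of how the two scalings relate in an unadjusted model: dividing a predictor by its standard deviation or by 100 g only rescales the slope, so the per-standard-deviation coefficient equals the per-100-g coefficient multiplied by SD/100. The intake and plasma values below are invented for illustration, the helper `ols_slope` is hypothetical, and the covariate adjustment used in the paper is omitted.

```python
import statistics


def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


# Hypothetical fruit intakes (g/day) and plasma vitamin C (umol/liter).
fruit_g = [40.0, 90.0, 150.0, 210.0, 320.0, 95.0, 180.0, 260.0]
vit_c = [38.0, 45.0, 52.0, 55.0, 63.0, 47.0, 50.0, 61.0]

sd = statistics.stdev(fruit_g)

# Slope per one-standard-deviation change in intake.
slope_per_sd = ols_slope([g / sd for g in fruit_g], vit_c)

# Slope per 100 g/day change in intake.
slope_per_100g = ols_slope([g / 100.0 for g in fruit_g], vit_c)

print(f"SD = {sd:.1f} g/day")
print(f"per SD: {slope_per_sd:.2f}, per 100 g: {slope_per_100g:.2f}")
```

Because each instrument has its own standard deviation, the same per-100-g slope translates into different per-standard-deviation slopes for the diary and the FFQ, which is why both scalings are worth reporting.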

We also included quintiles of the fruit, vegetable, and energy intake variables in our models to explore how categorization affected the association with the dependent variable given the differences in classification of fruit and vegetable consumption obtained with the two diet assessment instruments. Note that creating quintiles is a form of standardization, since we compare the second, third, fourth, and highest quintiles of estimated intake with the lowest quintile as estimated with either instrument.
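The quintile coding described above can be sketched as follows, assuming rank-based quintiles with the lowest quintile as the reference category. The function names and data are hypothetical, and ties are broken by input order, which a real analysis would handle more carefully.

```python
def quintile_labels(values):
    """Assign each value to a quintile by rank (1 = lowest fifth, 5 = highest)."""
    n = len(values)
    order = sorted(range(n), key=lambda idx: values[idx])
    labels = [0] * n
    for rank, idx in enumerate(order):
        labels[idx] = rank * 5 // n + 1
    return labels


def indicator_columns(labels):
    """Dummy-code quintiles 2 to 5, leaving quintile 1 as the reference."""
    return [[1 if lab == q else 0 for q in (2, 3, 4, 5)] for lab in labels]


# Hypothetical vegetable intakes (g/day) reported by ten people.
diary = [20, 35, 60, 75, 90, 110, 130, 160, 190, 240]
# The same ranking reported at twice the absolute level.
doubled = [2 * g for g in diary]

labels_diary = quintile_labels(diary)
labels_doubled = quintile_labels(doubled)
design = indicator_columns(labels_diary)
```

Doubling every reported intake leaves the quintile labels unchanged, which illustrates why categorization acts as a form of standardization: only the ranking enters the model, not the absolute level.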

Regression models were adjusted for potential confounders assessed concurrently: age, sex, body mass index (weight (kg)/height (m)²), height, and current smoking. Participants with missing values for any of the covariates were excluded from this analysis (n = 34). This left a study population of 4,487 EPIC-Norfolk participants.
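The exclusion counts reported in this section can be tallied as a quick arithmetic check. The sketch assumes that the exclusions were applied one after another to non-overlapping groups, which the text implies but does not state outright.

```python
eligible = 5067  # plasma vitamin C measured and both instruments computerized

# Exclusion counts as reported in the text.
exclusions = {
    "regular users of vitamin C supplements": 446,
    "plasma vitamin C below 1st or above 99th percentile": 86,
    "energy intake below 500 kcal or above 4,200 kcal": 14,
    "missing covariate values": 34,
}

remaining = eligible
for reason, n in exclusions.items():
    remaining -= n
    print(f"after excluding {reason}: {remaining}")
```

The running total ends at 4,487, matching the stated study population.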


    RESULTS
 
Of the 4,487 participants in EPIC-Norfolk included in this analysis, 2,337 were women and 2,150 were men. At baseline in 1993, participants were, on average, age 62.2 years and had an average body mass index of 26.4; 9.6 percent were current smokers (table 1). The mean plasma concentration of vitamin C was 51.4 µmol/liter (standard deviation, 18.1). Vegetable consumption reported on the FFQ was, on average, more than double that reported in the 7-day diary (207.5 g/day vs. 96.5 g/day); fruit consumption reported on the FFQ was also considerably higher than that reported in the 7-day diary (241.8 g/day vs. 150.0 g/day). The combined consumption of fruits and vegetables was reported as 249.1 g/day in the 7-day diary and as 453.0 g/day on the FFQ. The quintile means of fruit consumption and vegetable consumption differed considerably for the 7-day diary and the FFQ.


 
TABLE 1. Baseline characteristics of 4,487 participants (2,337 women and 2,150 men) in the European Prospective Investigation into Cancer and Nutrition, Norfolk, United Kingdom, 1993–1998

 
The Pearson coefficient for correlation between self-reported fruit consumption and vegetable consumption in the 7-day diary was 0.24; the respective correlation coefficient for the FFQ foods was 0.32 (table 2). The correlation between the two methods of estimating fruit consumption was 0.57, and it was 0.33 for vegetable consumption. Correlations of fruit consumption and vegetable consumption with total energy intake were very low and were slightly inverse for FFQ values. Partial correlations adjusted for energy intake, age, sex, body mass index, height, and smoking status were marginally lower than simple correlations.


 
TABLE 2. Pearson correlation coefficients and partial correlation coefficients for daily fruit and vegetable consumption (not standardized) as calculated from a 7-day diet diary and a food frequency questionnaire among 4,487 women and men, European Prospective Investigation into Cancer and Nutrition, Norfolk, United Kingdom, 1993–1998

 
A cross-tabulation of individuals allocated to quintiles of fruit consumption and vegetable consumption assessed with the two instruments provides a more detailed comparison of their distribution (table 3). For fruit consumption, 39 percent of participants were in the same categories for both instruments; for vegetable consumption, 29 percent were in the same categories; and for energy intake, 32 percent were in the same categories. For fruit consumption, 61 percent of participants were misclassified by one or more quintiles; for vegetable consumption, 71 percent; and for total caloric intake, 68 percent. For fruit consumption, 22 percent were misclassified by two or more quintiles; for vegetable consumption, 35 percent; and for total caloric intake, 32 percent. The weighted kappa coefficient was 0.44 for fruit consumption, 0.23 for vegetable consumption, and 0.29 for energy intake.


 
TABLE 3. Cross-tabulation of classification of 4,487 participants according to their daily fruit (not standardized), vegetable (not standardized), and energy intake as calculated from a 7-day diet diary and a food frequency questionnaire, European Prospective Investigation into Cancer and Nutrition, Norfolk, United Kingdom, 1993–1998

 
Results from linear regression models regressing plasma vitamin C level on self-reported fruit and vegetable consumption are provided in table 4. Standardized fruit consumption and standardized vegetable consumption assessed with both instruments were strongly related to plasma vitamin C levels, with marginally stronger associations being seen with 7-day diary values (table 4, models 2 and 8). Including both fruit consumption and vegetable consumption in the same model changed regression coefficients substantially, indicating mutual confounding (model 14).


 
TABLE 4. Results from univariate and multivariate linear regression models regressing plasma vitamin C level on daily fruit and vegetable consumption (standardized or per 100 g) as calculated from a 7-day diary and a food frequency questionnaire among 4,487 women and men, European Prospective Investigation into Cancer and Nutrition, Norfolk, United Kingdom, 1993–1998

 
When plasma vitamin C level was regressed on absolute consumption of fruit and of vegetables measured in grams per day, differences in measurement of food assessment had a more pronounced impact on the regression coefficient (models 3 and 9). An increase in fruit consumption of 100 g/day was associated with an increase in plasma vitamin C levels of 4.9 µmol/liter according to assessments from the 7-day diary and with an increase of 2.6 µmol/liter according to assessments from the FFQ (model 3). Similarly, an increase in vegetable consumption of 100 g/day was associated with an increase in plasma vitamin C levels of 5.7 µmol/liter according to assessments from the 7-day diary and with an increase of 3.0 µmol/liter according to assessments from the FFQ (model 9). When the combined intake of fruit and vegetables was considered, results were similar (models 20 and 21).

Adding total caloric intake to the model did not affect estimates for dietary variables obtained with the 7-day diary, while regression coefficients for foods from the FFQ were marginally altered (models 4, 10, 16, and 17). Adding energy intake had a larger impact on coefficients of vegetable consumption (models 10, 16, and 17) than on coefficients of fruit consumption (models 4, 16, and 17). While energy intake calculated from either the 7-day diary or the FFQ was not related to plasma vitamin C levels in a univariate model (model 1), significant inverse associations emerged when energy was added to models with fruit and/or vegetable intake reported on the FFQ, but not for those reported in the 7-day diary (models 4, 5, 10, 11, 16, 17, and 22).

When fruit and vegetable consumption was categorized into quintiles of intake, the association with plasma vitamin C persisted (models 6, 12, and 18). Inclusion of quintiles of fruit consumption, vegetable consumption, and caloric intake in one model produced similar associations between fruit and vegetable consumption and plasma vitamin C for both the 7-day diary and the FFQ, but caloric intake was significantly related to plasma vitamin C only if caloric intake was estimated from the FFQ (model 19).


    DISCUSSION
 
Using data obtained from 2,337 women and 2,150 men participating in EPIC-Norfolk, we explored empirically in an analytic regression model the behavior of two food groups that share measurement error components. We also compared the performance of the two food groups as estimated with a 7-day diary and an FFQ in an analytic model.

Our analytic model regressed plasma vitamin C level on fruit consumption and vegetable consumption. Plasma vitamin C has been found to be the biomarker with the strongest relation to fruit and vegetable consumption (13, 14).

We found an approximate twofold difference in fruit consumption and vegetable consumption estimated from the two diet assessment instruments. Fruit and vegetable consumption reported in the 7-day diary was approximately consistent with reports of an average intake of three servings per day by the United Kingdom population (15). A 24-hour dietary recall administered in a subgroup of the United Kingdom EPIC population found a mean fruit consumption of about 160 g/day (172.9 g among women and 148.7 g among men) and a mean vegetable consumption of about 160 g/day (165.4 g for women and 157.4 g for men) (16). Hence, it is possible that fruit and vegetable consumption is overreported on the FFQ and captured more accurately by the 7-day diary, but we cannot be certain which instrument provides the more accurate assessment. Furthermore, since the 7-day diary was proximal to the time of blood drawing and recorded fruit and vegetable consumption during a 1-week period while the FFQ asked the respondent to recall habitual diet during the year prior to blood drawing, the 7-day diary intake levels would be expected to be more closely correlated with plasma vitamin C levels than the FFQ intake levels. It is unlikely, however, that the population mean in fruit and vegetable consumption decreased considerably during this 1-year time interval.

A few studies have attempted to validate self-reported intake of foods. The validity of food intake measurements obtained by means of an FFQ was evaluated among 173 participants in the Nurses' Health Study (17) and among 127 participants in the Health Professionals' Follow-up Study (18) by comparison with reports from 7-day diaries. In both studies, self-reported consumption of fruits and vegetables was higher on the FFQ than in the 7-day diary.

In the present study, correlations between fruit and vegetable consumption from the FFQ were higher than those from the 7-day diary, probably indicating a higher degree of correlated error in the FFQ values. The change in regression coefficients for FFQ-derived foods when energy was included in the model also suggests a higher error correlation for FFQ foods.

The measurement error in the assessment of fruit and vegetable consumption also introduced substantial differences in classification of individuals into categories of intake. When individuals were grouped in quintiles of intake, their ranking differed according to whether intake values from the 7-day diary or the FFQ were considered. For a substantial proportion of the population, classification differed by more than one quintile.

When quintiles of fruit and vegetable consumption were included in a linear regression model, regression coefficients were similar for intakes estimated with the two dietary assessment instruments. However, quintile mean values differed, leading to different interpretations of the comparisons made. Comparing individuals in the highest quintile of vegetable consumption with those in the lowest quintile indicated an increase in plasma vitamin C levels of approximately 10 µmol/liter with both dietary instruments, but the 7-day diary required an increase from an average of 25 g/day to 195 g/day, whereas the FFQ required an increase from an average of 86 g/day to 369 g/day. Respective dietary recommendations based on the FFQ would prescribe twice the level of consumption of fruit and vegetables as the 7-day diary to achieve a comparable health benefit (e.g., a change in plasma vitamin C level translating into reduced mortality and/or morbidity).
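As a back-of-the-envelope illustration, the quintile means quoted above imply different absolute slopes for the same contrast of roughly 10 µmol/liter. The calculation below simply divides that contrast by the gap between the highest and lowest quintile means; it assumes a linear relation across the range and is not a model reported in the paper.

```python
contrast_umol = 10.0  # approximate top-vs-bottom quintile difference, umol/liter

# Lowest and highest quintile means of vegetable intake (g/day) from the text.
quintile_means = {
    "7-day diary": (25.0, 195.0),
    "FFQ": (86.0, 369.0),
}

implied = {}
for instrument, (low, high) in quintile_means.items():
    gap = high - low
    # Implied change in plasma vitamin C per 100 g/day of vegetables.
    implied[instrument] = contrast_umol / gap * 100.0
    print(f"{instrument}: gap {gap:.0f} g/day, "
          f"about {implied[instrument]:.1f} umol/liter per 100 g")
```

The implied values, about 5.9 and 3.5 µmol/liter per 100 g, point in the same direction as the per-100-g vegetable coefficients reported in the Results (5.7 for the 7-day diary and 3.0 for the FFQ).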

In our previous analyses, error correlations between nutrients derived from the same questionnaire distorted estimates in a statistical model (1). This distortion was particularly pronounced if nutrient values were untransformed and energy was entered into the model as a separate term. In the food model presented here, error correlations between foods on a questionnaire did not seem to result in distorted estimates in analytic models, even if energy was introduced as a separate term. Nutrients may be more affected than foods by correlated measurement error, since nutrients are additionally correlated through their shared food sources.

In summary, we found substantial differences in classification of fruit consumption and vegetable consumption assessed with a 7-day diary and an FFQ, with differences in the ranking of individuals according to their intakes estimated from the 7-day diary and the FFQ. The difference in ranking did not have a substantial impact on estimation of the effect of fruit and vegetable consumption when intake was categorized into quintiles, but the errors in the assessment of fruit and vegetable consumption with the 7-day diary and the FFQ resulted in different estimates of their effect size. We did not find distortions of effect estimates due to correlated errors in the estimation of fruit consumption, vegetable consumption, and energy intake.


    ACKNOWLEDGMENTS
 
Dr. Karin Michels was supported by Senior International Fogarty Fellowship F06 TW05568 from the US National Institutes of Health and by grant R01 DK 54900 from the US National Institute of Diabetes and Digestive and Kidney Diseases.

EPIC-Norfolk is supported by program grants from the Cancer Research Campaign and the Medical Research Council, with additional support from the British Heart Foundation, the Stroke Association, the United Kingdom Department of Health, the United Kingdom Food Standards Agency, the Europe Against Cancer Program, the World Health Organization, and the Wellcome Trust.


    References
 

  1. Michels KB, Bingham SA, Luben R, et al. The effect of correlated measurement error in multivariate models of diet. Am J Epidemiol 2004;160:59–67.
  2. Willett WC. Nutritional epidemiology. 2nd ed. New York, NY: Oxford University Press, 1998.
  3. Jacob RA. Vitamin C. In: Shils ME, Olson JA, Shike M, et al, eds. Modern nutrition in health and disease. 9th ed. Philadelphia, PA: J B Lippincott Company, 1999:A-131–2.
  4. Kaaks RJ. Biochemical markers as additional measurements in studies of the accuracy of dietary questionnaire measurements: conceptual issues. Am J Clin Nutr 1997;65(suppl):1232S–9S.
  5. Bates CJ, Thurnham DI, Bingham SA, et al. Biochemical markers of nutrient intake. In: Margetts BM, Nelson M. Design concepts in nutritional epidemiology. 2nd ed. New York, NY: Oxford University Press, 1997:170–240.
  6. Day NE, Oakes S, Luben R, et al. EPIC-Norfolk: study design and characteristics of the cohort. Br J Cancer 1999;80(suppl 1):95–103.
  7. Bingham SA, Welch AA, McTaggart A, et al. Nutritional methods in the European Prospective Investigation of Cancer in Norfolk. Public Health Nutr 2001;4:847–58.
  8. Welch AA, McTaggart A, Mulligan AA, et al. DINER (Data into Nutrients for Epidemiological Research) - a new data-entry program for nutritional analysis in the EPIC-Norfolk cohort and the 7-day diary method. Public Health Nutr 2001;4:1253–65.
  9. Willett WC, Sampson L, Browne ML, et al. The use of a self-administered questionnaire to assess diet four years in the past. Am J Epidemiol 1988;127:188–99.
  10. Bingham SA, Gill C, Welch A, et al. Comparison of dietary assessment methods in nutritional epidemiology. Br J Nutr 1994;72:619–42.
  11. Welch AA, Luben R, Khaw KT, et al. The CAFÉ (Compositional Analyses from Frequency Estimates) program for nutritional analysis of the EPIC-Norfolk food frequency questionnaire: development and data issues. J Hum Nutr Diet 2005;18:1–18.
  12. Vuilleumier J, Keck E. Fluorometric assay of vitamin C in biological materials using a centrifugal analyser with fluorescence attachment. J Micronutr Anal 1989;5:25–34.
  13. Block G, Norkus E, Hudes M, et al. Which plasma antioxidants are most related to fruit and vegetable consumption? Am J Epidemiol 2001;154:1113–18.
  14. John JH, Ziebland S, Yudkin P, et al. Effects of fruit and vegetable consumption on plasma antioxidant concentration and blood pressure: a randomised controlled trial. Lancet 2002;359:1969–74.
  15. Ministry of Agriculture, Fisheries and Food. The Dietary and Nutritional Survey of British Adults—further analysis. London, United Kingdom: HMSO, 1994.
  16. Agudo A, Slimani N, Ocke MC, et al. Consumption of vegetables, fruit and other plant foods in the European Prospective Investigation into Cancer and Nutrition (EPIC) cohorts from 10 European countries. Public Health Nutr 2002;5:1179–96.
  17. Salvini S, Hunter DJ, Sampson L, et al. Food-based validation of a dietary questionnaire: the effects of week-to-week variation in food consumption. Int J Epidemiol 1989;18:858–67.
  18. Feskanich D, Rimm EB, Giovannucci EL, et al. Reproducibility and validity of food intake measurements from a semiquantitative food frequency questionnaire. J Am Diet Assoc 1993;93:790–6.