1 Unit of Nutrition and Cancer, International Agency for Research on Cancer, Lyon, France
2 National Institute of Public Health and the Environment (RIVM), Bilthoven, The Netherlands
3 Cancer Research UK, Radcliffe Infirmary, University of Oxford, Oxford, United Kingdom
4 University of Athens Medical School, Athens, Greece
5 Institute of Community Medicine, University of Tromsø, Tromsø, Norway
6 Epidemiology Department, Murcia Health Council, Murcia, Spain
7 INSERM, XR 251, Institut Gustave Roussy, Villejuif, France
8 Department of Nutritional Research, University of Umeå, Umeå, Sweden
9 Molecular and Nutritional Epidemiology Unit, Centro per lo Studio e la Prevenzione Oncologica, Scientific Institute of Tuscany, Florence, Italy
10 German Institute of Human Nutrition, Potsdam-Rehbrücke, Germany
11 Department of Clinical Epidemiology, Aalborg Hospital, Aarhus University Hospital, Aalborg, Denmark
Correspondence to Dr. Pietro Ferrari, Unit of Nutrition and Cancer, International Agency for Research on Cancer, 150 cours Albert-Thomas, 69372 Lyon Cedex 08, France (e-mail: Ferrari{at}iarc.fr).
Received for publication August 24, 2004. Accepted for publication April 27, 2005.
ABSTRACT
carotenoids; diet; epidemiologic methods; multicenter studies; random-effects model
INTRODUCTION
In the present paper, we take advantage of the uniqueness of the European Prospective Investigation into Cancer and Nutrition (EPIC) data to estimate the individual and the aggregate correlation coefficients through a random-effects (multilevel) regression model. This approach investigates in detail the sources of variability of the observed variables. The associations are explored between plasma levels of selected carotenoids (beta-cryptoxanthin, lycopene, and alpha-carotene) and intakes of specific fruits and vegetables (total fruits, tomatoes, and carrots) in a large cross-sectional subsample of the EPIC cohort with plasma measurements. These carotenoids were chosen because they have been reported to be more closely correlated with specific fruit and vegetable intakes than other carotenoids in the EPIC study (4).
Dietary intake was assessed by using a dietary questionnaire (DQ), collected for all study participants to assess individual habitual, long-term intake levels, and a single, highly standardized 24-hour dietary recall (24-HDR) of actual food consumption during the previous day. Blood samples were taken from most subjects when the DQ was collected.
Here, we report estimates of within- and between-region variance regarding intakes of dietary items and plasma carotenoid levels, as well as the between- and within-group regression coefficients from a random-effects model, with plasma level as the dependent variable regressed on the dietary measurements. We also present estimates of between- and within-group correlation coefficients, as well as total correlation between plasma carotenoid levels and dietary measurements. Moreover, since the variables examined are prone to measurement error, we discuss the effects of measurement error on individual- and group-level correlation coefficients when using the two types of dietary assessments.
MATERIALS AND METHODS
Twenty-three research centers in 10 European countries are participating in the study, which is coordinated by the International Agency for Research on Cancer in Lyon, France. Collection of data and blood samples started in 1992. A subsample of 37,000 participants also provided dietary information by means of 24-HDR records for calibration purposes (6). EPIC is unique in that it is one of the largest studies to combine information on diet and lifestyle obtained by means of questionnaires with a biorepository of blood samples collected from 386,080 subjects.
Study population
The study population has been described elsewhere (4). In brief, a subsample of subjects from the following 16 geographic areas (regions) was designated: France (Paris and the surrounding area), Florence (central Italy), Varese/Turin (northern Italy), Ragusa/Naples (southern Italy), northern Spain (San Sebastian, Pamplona, Oviedo), Granada (southern Spain), Murcia (southeastern Spain), Cambridge (subjects living in Norfolk, United Kingdom), Oxford (vegetarians living throughout the United Kingdom), the Netherlands (subjects from Utrecht, Amsterdam, Maastricht, and Doetinchem), Athens (Greece), Heidelberg (southwest Germany), Potsdam (in the former East Germany), Malmö (southern Sweden), Umeå (northern Sweden), and Denmark (including Aarhus and Copenhagen). In each of these regions, 100 women and 100 men (except in the French cohort, which included only women) were randomly selected from the subjects participating in the EPIC calibration substudy, which involved 5-12 percent of the EPIC cohort (6).
In total, 3,089 subjects were selected to participate in the study. After we excluded those affected by one or more of the following problems (missing plasma aliquots, laboratory and technical issues, missing carotenoid levels, and missing dietary information), data on 2,910 subjects (1,397 men and 1,513 women) for whom DQ and 24-HDR measurements were available were retained for statistical analyses.
Further details of the sampling scheme, the characteristics of the study subjects, and the exclusion criteria have been described elsewhere (4).
Dietary variables
Information on individual dietary intake is described in detail elsewhere (6, 7). In brief, usual dietary intake was assessed by DQ at baseline for each subject entering the EPIC cohort. Countries differed in the type of validated questionnaire they used; some used extensive DQs, others semiquantitative questionnaires, and others a mixed method combining a semiquantitative food frequency questionnaire and a 14-day record. Some of these were self-administered, while others were obtained through interview. The variables expressing fruit and vegetable intakes were calculated as continuous variables by multiplying the frequency of intake by the portion size for each dietary item, using a common classification and food definition across countries (7).
A single 24-HDR measurement, with highly standardized interviews, was used across all EPIC countries (6). Trained dietitians conducted all interviews face-to-face. The concept of standardization and the structure of EPIC-SOFT (8), as well as the distribution of fruit and vegetable intake among participating countries, are described in detail elsewhere (8-10).
Of the study subjects, 89 percent of the men and 85 percent of the women selected for the present analyses had their blood collected in the same season as the DQ assessment. However, lower agreement was registered regarding season of the 24-HDR (44 percent for the men and 47 percent for the women).
Statistical model
To compute the between- and within-region correlation coefficient between biomarker plasma carotenoid levels (Y) and dietary intake (X), estimated from 24-HDR or DQ measurements, while adjusting for the effect of a confounding factor Z, we considered the following random-effects (multilevel) regression model:
[Equation 1]
This model has fixed-effect parameters (the intercept, βB, βW, and the between- and within-group coefficients of Z) and three random effects (u0j, u1j, and εij), which give rise to four variance components reflecting, respectively, variation between groups (the variance of u0j), variability in the coefficients across groups (the variance of u1j), their covariance, and within-group variation among study subjects (σ², the variance of εij). Multilevel models provide the opportunity to model within- and between-group effects simultaneously (2, 3, 12).
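Based on the parameters listed above, one plausible form of model 1 can be sketched as follows. This is a reconstruction under assumptions, not the typeset equation from the paper: the symbols α, γB, and γW, the region-mean notation with a bar, and the centering of X and Z on their region means are all assumed here, with i indexing subjects and j indexing regions.

```latex
Y_{ij} = \alpha
       + \beta_B \, \bar{X}_{\cdot j} + \beta_W \, (X_{ij} - \bar{X}_{\cdot j})
       + \gamma_B \, \bar{Z}_{\cdot j} + \gamma_W \, (Z_{ij} - \bar{Z}_{\cdot j})
       + u_{0j} + u_{1j} \, (X_{ij} - \bar{X}_{\cdot j}) + \varepsilon_{ij}
```

In this form, u0j shifts the intercept of region j, u1j lets the within-region slope vary around βW, and εij is the subject-level residual.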
In this context, it is possible to partition the overall correlation coefficient into its between- and within-group components, making use of group-specific and individual data, respectively, as extensively discussed by Snijders and Bosker (3). The between- and within-group correlation coefficients can be estimated (3, 13) as follows:
[Equation 2]
[Equation 3]
[Equation 4]
Expression 3 assumes that the within-group relation between Y and X is homogeneous across groups. If this condition does not hold, the parameter that expresses the within-group relation cannot be summarized by a fixed-effects term. In multilevel models, the heterogeneity of effects across groups can be modeled by using a random effect for the coefficient βW. In this way, estimation of βW in model 1 takes into account the variability of effects across groups.
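The link between a regression coefficient and a correlation coefficient that underlies this partition can be sketched in generic notation. The symbols below are assumptions made for illustration (τ² for the between-group and σ² for the within-group variances of X and Y, after the effect of Z has been taken into account) and may differ from those used in the original expressions.

```latex
\begin{aligned}
\rho_B &= \beta_B \sqrt{\frac{\tau_X^2}{\tau_Y^2}} \\
\rho_W &= \beta_W \sqrt{\frac{\sigma_X^2}{\sigma_Y^2}}
\end{aligned}
```

Each coefficient rescales a slope by the ratio of the standard deviations of X and Y at the corresponding level, in the same way that a simple regression slope relates to a Pearson correlation.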
With a random-effects model, efficient point estimates for βW and βB can be obtained, and the associated standard errors are unbiased estimates because the variation among groups is explicitly modeled in model 1 (12). In addition, by following this approach, it is possible to estimate the variance components of Y and X once the effect of Z has been taken into account, through model 4. Similar to expression 2, the overall correlation coefficient can be expressed as
[Equation 5]
Moreover, it is well known that the overall regression coefficient is a function of the between- and within-group components of model 1, through the expression
[Equation 6]
The total correlation coefficient can be estimated, given expressions 5 and 6 (refer to the Appendix for computational details), as
[Equation 7]
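For observed data, the way the two components combine can be illustrated with a standard algebraic identity for between- and within-group quantities. The notation is an assumption for illustration: η²X and η²Y denote the proportions of the total variance of X and Y that lie between groups, and the identity may not match expressions 6 and 7 term for term.

```latex
\begin{aligned}
\beta_T &= \eta_X^2 \, \beta_B + (1 - \eta_X^2) \, \beta_W \\
\rho_T  &= \eta_X \eta_Y \, \rho_B + \sqrt{(1 - \eta_X^2)(1 - \eta_Y^2)} \, \rho_W
\end{aligned}
```

When the between-group shares of variance are small, the second term dominates and the total correlation stays close to the within-group value.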
In this paper, we present results on the association between alpha-carotene and carrot intake, lycopene and total tomato intake (tomato and tomato products), and beta-cryptoxanthin and total fruit intake. Adjustments for energy intake, age, and body mass index were performed, but results were rather similar. In addition, in consideration of the relatively low agreement observed between season of blood collection and season of the 24-HDR interviews, we adjusted for season, but this did not change the results. To take into account the specificity of dietary measurement variability and accuracy across genders (13), analyses for men and women were performed separately. Given that results were rather similar across genders, gender-adjusted findings are presented throughout. All biomarker and dietary measurements were log-transformed to improve normality of the observed distributions. Random-effects models were fitted by using the SAS PROC MIXED procedure (14). Parameter estimates were obtained by restricted maximum likelihood estimation. Significance of the variance components for the within-region coefficient in model 1 was tested by comparing the difference in -2 restricted maximum likelihood log-likelihood between a model with the random effect and a model without it with a chi-square distribution with two degrees of freedom (15). A "sandwich" estimator was used to compute corrected standard errors for the fixed-effects parameters, which are robust against model misspecification (16-18). Statistical significance was assessed at the 5 percent level.
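The likelihood ratio comparison described above can be sketched in a few lines. This is a minimal illustration, not the SAS code used in the study: the function name and the log-likelihood values are hypothetical, and the only fact relied on is that the chi-square survival function with two degrees of freedom equals exp(-x/2).

```python
import math


def lrt_pvalue_2df(neg2ll_without, neg2ll_with):
    """P value for a likelihood ratio test referred to chi-square with 2 df.

    neg2ll_without: -2 REML log-likelihood of the model without the random effect.
    neg2ll_with:    -2 REML log-likelihood of the model with the random effect.
    """
    statistic = neg2ll_without - neg2ll_with
    if statistic < 0:
        raise ValueError("the larger model should not fit worse than the smaller one")
    # For 2 degrees of freedom, P(chi-square > x) = exp(-x / 2).
    return math.exp(-statistic / 2.0)


# Hypothetical -2 log-likelihood values, for illustration only.
p_value = lrt_pvalue_2df(neg2ll_without=5012.4, neg2ll_with=5003.1)
print(f"p = {p_value:.4f}")
```

A p value below 0.05 would indicate, at the level used in this paper, that the within-region coefficient varies across regions.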
RESULTS
Overall, the between-region regression coefficients were consistently higher than their within-region counterparts.
The variance components associated with the within-region regression coefficients were statistically significantly different from zero for beta-cryptoxanthin and fruits for both the DQ and the 24-HDR, and for alpha-carotene and carrots in the DQ. In all other instances examined, results suggest that the power of dietary variables in predicting the variability of biomarker levels is rather homogeneous across regions.
The between-region correlation coefficients were higher than the within-region values for both the DQ and the 24-HDR for all three carotenoids. Between-region values were generally higher for the 24-HDR than for the DQ, whereas within-region values were lower for the 24-HDR. For beta-cryptoxanthin and fruit intake, the respective between-group correlations versus the within-group correlations were 0.783 and 0.256 for the DQ and 0.852 and 0.194 for the 24-HDR. The total correlation coefficient values were closer to the within-region coefficient values. For beta-cryptoxanthin and fruits, and for lycopene and tomato intake in the DQ measurements, intermediate values were observed for the total correlation as a result of appreciable variations at the aggregate levels in both plasma levels and dietary intakes.
DISCUSSION
In the EPIC study, country-specific DQs (5, 7), not standardized across countries, were used. They therefore very likely contain measurement errors varying in both nature and magnitude across countries. Systematic between-country measurement errors are minimized by using a calibration method (6, 20, 21) to express dietary measurements on a common scale and to correct for bias in the diet-disease relation due to measurement errors in DQ measurements. In the EPIC study, this process is achieved by comparing the DQ with a single 24-HDR measurement used as a reference in a calibration subsample. It is assumed that errors in 24-HDR measurements are strictly random, distributed around individuals' true mean intake. With one replicate per subject available, 24-HDR measurements are expected to provide correct estimates of intake at the aggregate level, while within-person, day-to-day variability is very likely to inflate the variance of the observed distributions.
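The inflation described above can be made concrete under the classical measurement error model, in which a recall equals true intake plus independent random error. The sketch below is an illustration under that assumption only; the function names and the variance values are hypothetical and are not taken from the EPIC data.

```python
import math


def observed_variance(true_var, day_to_day_var, n_recalls=1):
    """Variance of the mean of n_recalls recalls under classical random error."""
    return true_var + day_to_day_var / n_recalls


def reliability(true_var, day_to_day_var, n_recalls=1):
    """Share of the observed variance that reflects true between-person variation."""
    return true_var / observed_variance(true_var, day_to_day_var, n_recalls)


def attenuated_correlation(true_corr, rel):
    """Correlation observed when only one of the two variables carries random error."""
    return true_corr * math.sqrt(rel)


# Hypothetical variances on the log scale, for illustration only.
single = reliability(true_var=1.0, day_to_day_var=3.0, n_recalls=1)
repeated = reliability(true_var=1.0, day_to_day_var=3.0, n_recalls=4)
print(single, repeated, attenuated_correlation(0.5, single))
```

With a single recall, day-to-day variation enters the observed variance in full, which is why individual-level correlations based on one 24-HDR are expected to be attenuated while group means remain unbiased.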
In our analysis, the within-region variance components of the 24-HDR showed greater variation than their DQ counterparts, and the resulting ICC values were therefore consistently lower for 24-HDR than for DQ measurements, particularly for tomato intake. In addition, within-region variability, for both the 24-HDR and DQ, overall was higher than the aggregate counterparts. This finding suggests that, although some of the within-region variability is likely to be the result of measurement errors, a sizable part of the overall variability comes from the individual component.
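The effect of a larger within-region variance on the ICC can be shown with a short calculation. The sketch assumes that ICC denotes the intraclass correlation coefficient, defined as the between-region variance divided by the sum of the between- and within-region variances; the function name and the variance values are hypothetical.

```python
def icc(between_var, within_var):
    """Intraclass correlation: share of total variance lying between regions."""
    total = between_var + within_var
    if total <= 0:
        raise ValueError("total variance must be positive")
    return between_var / total


# Hypothetical variance components: same between-region variance,
# larger within-region variance for the single 24-HDR than for the DQ.
icc_dq = icc(between_var=0.05, within_var=0.50)
icc_recall = icc(between_var=0.05, within_var=1.50)
print(f"ICC (DQ) = {icc_dq:.3f}, ICC (24-HDR) = {icc_recall:.3f}")
```

For a fixed between-region variance, any increase in the within-region component lowers the ICC, which matches the pattern described for the 24-HDR.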
This analysis showed that within-region correlations were consistently lower for 24-HDR compared with DQ measurements. However, especially for total fruit and tomato intake, correlation coefficients at the individual level for the 24-HDR were rather close to within-region DQ values, thus suggesting that a single 24-HDR record for assessment of intake of some specific fruits and vegetables in Europe can be relatively valid for actual average intake. As expected, between-region correlation coefficients were higher for standardized 24-HDR than for DQ measurements, with the exception of carrot intake, where DQ measurements showed better agreement with plasma alpha-carotene levels at the ecologic level. Exclusion of nonconsumers of carrots, defined according to DQ measurements, as well as adjustment for season, did not change these findings. The high correlation values observed from the between-region component suggest that biomarker levels, as well as dietary measurements, perform fairly accurately in discriminating population-level consumption. Given the relatively large sample size in each region, the reliability of aggregated information is higher than the reliability of individual measurements (3, 12), which are subject to random and systematic measurement error.
Overall, the total correlation coefficients were closer to the within-region coefficients, especially for 24-HDR measurements, as a result of moderate ICC values observed for plasma biomarker levels and dietary intakes. The total correlations were consistently higher for DQ than for 24-HDR measurements. These results are partially driven by the higher ICC values observed for the DQ; ICC values for the 24-HDR were lowered by the great variation at the individual level that arises from using a single 24-HDR measurement per subject.
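Why the total correlation stays close to the within-region value when between-region variance shares are moderate can be illustrated numerically. The sketch uses a standard identity for observed data, in which the total correlation is a weighted sum of the between- and within-group correlations; the weights depend on the between-group shares of variance of X and Y. The function name and the variance shares are hypothetical; only the two correlations (0.783 and 0.256, DQ, beta-cryptoxanthin and fruit) come from the results above.

```python
import math


def total_correlation(rho_between, rho_within, eta2_x, eta2_y):
    """Combine between- and within-group correlations into a total correlation.

    eta2_x, eta2_y: between-group shares of the total variance of X and Y.
    """
    for share in (eta2_x, eta2_y):
        if not 0.0 <= share <= 1.0:
            raise ValueError("variance shares must lie in [0, 1]")
    between_weight = math.sqrt(eta2_x * eta2_y)
    within_weight = math.sqrt((1.0 - eta2_x) * (1.0 - eta2_y))
    return between_weight * rho_between + within_weight * rho_within


# Reported DQ correlations; the variance shares are hypothetical.
rho_total = total_correlation(0.783, 0.256, eta2_x=0.10, eta2_y=0.20)
print(f"total correlation = {rho_total:.3f}")
```

With shares of this size, the within-group weight is several times larger than the between-group weight, so the total falls between the two components but much nearer the within-region value.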
For a comprehensive understanding of these results, note that it is not realistic to assume that between- and within-region variation in dietary intake of specific fruits and vegetables can account for all of the variability observed in plasma carotenoid levels because the sources of these carotenoids overlap widely between different fruits and vegetables. Throughout our study, it was assumed that the differences in bioavailability and absorption of carotenoids are random between subjects and did not bias the levels of carotenoids from the different countries in one direction or the other. However, we cannot exclude the possibility of bias introduced by difference in the bioavailability of carotenoids.
The approach used in these analyses is of great importance in investigating the association between plasma levels and specific dietary variables by disentangling the within- and between-region components of the correlation coefficient and evaluating the contribution of each component to the overall coefficient. In addition, with this approach, it is possible to investigate the sources of variability in a quantity of interest and to explore the association between two variables by evaluating the predictive power of one (independent) variable on the total variability of another (dependent) variable.
Individual and aggregate associations have different interpretations (19, 21, 22). At the individual level, the association between dietary and biomarker measurements reflects the metabolic absorption, transport, and excretion of fruit and vegetable components related to the quantification of subjects' carotenoid levels in blood. At the aggregate level, this association is the result of ecologic agreement, without inference about the possible biologic mechanism that drives the individual-level association. Although the association between two variables is supposed to be rather similar at the aggregate and individual levels, this is often not the case because several intervening factors, particularly related to exposure assessment, can make the two associations rather heterogeneous (23).
Taking advantage of the uniqueness of the EPIC data, where the variability of data has a within- and a between-populations component and where the three main dietary assessment methods were used (DQ, 24-HDR, and plasma carotenoid level), we were able to highlight how these different correlation components can influence the overall correlation. In our analyses, by evaluating aggregate and individual variability of exposure and deeply investigating the characteristics of the dietary instruments used as well as their differing accuracy in providing aggregate versus individual-level estimates of intake, we attempted to take into account multiple operating forces to better understand the observed heterogeneity.
In our example, we presented the components of variability and the associations between two variables at different levels of evidence, and we showed that the correlation between carotenoids and intake of certain fruits and vegetables is reasonably good. As observed previously (21, 24), results from the present analysis suggest that the between-region variability in exposure provides a reliable amount of information that could be exploited in EPIC when evaluating the association between dietary exposure and disease, in a framework in which both individual and aggregate levels of evidence are estimated simultaneously.
To provide more accurate estimates of the association of fruit and vegetables and/or carotenoid intake with disease outcomes, ongoing research is focusing on the use of biomarkers to complement questionnaires (25). The accuracy and reliability of using objective biomarkers of diet, as well as the definition of appropriate statistical methodology to make use of these measurements in a disease model (26), are likely to improve our understanding of diet-disease associations.
APPENDIX
[Appendix equations]
ACKNOWLEDGMENTS
References