1 Unit of Nutrition and Cancer, International Agency for Research on Cancer, Lyon, France.
2 Hormones and Cancer Group, International Agency for Research on Cancer, Lyon, France.
3 Institute of Public Health, University of Cambridge, Strangeways Research Laboratory, Cambridge, United Kingdom.
4 SERC, Institut Català d'Oncologia, Barcelona, Spain.
5 Centre for Information Technology and Methodology, National Institute for Public Health and the Environment, Bilthoven, the Netherlands.
6 Cancer Research UK Epidemiology Unit, Oxford University, Oxford, United Kingdom.
7 Division of Clinical Epidemiology, German Cancer Research Centre, Heidelberg, Germany.
8 German Institute of Human Nutrition, Potsdam-Rehbrücke, Germany.
9 E3N-EPIC, INSERM Z521, Institut Gustave Roussy, Villejuif, France.
10 Department of Hygiene and Epidemiology, University of Athens Medical School, Athens, Greece.
11 Epidemiology Unit, Istituto Nazionale dei Tumori, Milan, Italy.
12 Institute of Community Medicine, University of Tromsø, Tromsø, Norway.
Received for publication October 2, 2003; accepted for publication April 8, 2004.
ABSTRACT
calibration; diet; epidemiologic methods; multicenter studies
Abbreviations: EPIC, European Prospective Investigation into Cancer and Nutrition; ICC, intraclass correlation coefficient.
INTRODUCTION
Compared with ecologic studies, investigations that use individual-level information on dietary exposures and disease outcome have the advantage of allowing more complex analyses, with adjustments for potential confounding factors and evaluations of possible interactions between multiple exposures. However, a possible limitation is that random errors in assessing individuals' habitual dietary intake can be so large as to virtually mask existing associations with disease risk (3, 4). The impact of random measurement error on estimated measures of diet-disease associations depends on the variance of errors relative to the between-person variance of true intake levels (5).
For practical and logistic reasons, the increase in statistical power that may be obtained by increasing study size or reducing the average magnitude of random dietary assessment errors will generally be limited. However, an additional possibility for improving the statistical power is to increase the true overall heterogeneity of the dietary exposures studied, combining data from multiple studies conducted in populations with different dietary habits. This rationale was used to set up the European Prospective Investigation into Cancer and Nutrition (EPIC), a multicenter cohort study on diet and cancer conducted in 28 regional centers nested in 10 Western European countries with widely varying dietary practices and cancer risks (6).
In a multicohort study, the overall association between diet and cancer risk can be broken down into 1) within-cohort relations, which reflect the associations at the individual level in each of the cohorts; and 2) a between-cohort relation, which captures the association between exposure and disease risk at the aggregate level. For proper comparison, it is crucial that corrections be made for any between-cohort differences that may exist in systematic exposure measurement error as well as for attenuation biases in relative risk estimates within cohorts (7, 8). For this purpose, the EPIC project combined two types of dietary intake assessment: 1) for all 520,000 cohort participants, a food frequency questionnaire or modified dietary history questionnaire to assess individuals' habitual, long-term intake levels; and 2) a single, highly standardized, 24-hour recall of actual food consumption during the previous day for a large representative subsample (about 37,000 subjects) of the entire EPIC cohort (9).
To adjust for between-cohort differences in systematic over- or underestimation in dietary intake measurements, and to correct for attenuation bias in relative risk estimates, a regression calibration approach can be used (7, 8, 10). With this approach, an (approximate) estimate can be obtained of within-cohort variation in true intake levels predicted by the individual-level questionnaire measurements (11). Furthermore, between-cohort variation in mean intake levels can be estimated from the population mean values of the 24-hour recalls.
In this paper, we propose the use of a multilevel calibration model that takes into account the two levels of evidence, within-cohort and between-cohort, and uses information at the individual and aggregate levels. Moreover, we report estimates of within- and between-cohort variation in intakes of total energy and macronutrients before and after regression calibration. We show that, after considering within-cohort attenuation effects, between-cohort differences provide a considerable amount of variation in intake levels. This observation may have important implications for the analysis of the EPIC study as well as for the design of other prospective studies.
MATERIALS AND METHODS
Calibration model
We assumed the following measurement error model for the baseline dietary assessment measurements (Q) and for the 24-hour dietary recall measurement (R) in relation to the unknown true intake (T):
$$Q = \alpha_Q + \beta_Q T + \varepsilon_Q \qquad \text{(1a)}$$

$$R = T + \varepsilon_R, \qquad \text{(1b)}$$

where $E(\varepsilon_R \mid T) = 0$, $E(\varepsilon_Q \mid T) = 0$, $\mathrm{var}(\varepsilon_R) = \sigma_R^2$, and $\mathrm{var}(\varepsilon_Q) = \sigma_Q^2$. The association shown in model 1b for reference measurement R assumes that errors are strictly random and that variation around individuals' unknown true intake is totally due to within-person random variability or to random measurement errors in reporting individuals' diet (i.e., $\mathrm{cov}(\varepsilon_R, T) = 0$). For the questionnaire measurements Q, the coefficients $\alpha_Q$ and $\beta_Q$ in model 1a express constant and proportional scaling biases (14), whereas the residual term $\varepsilon_Q$ models the random part of measurement errors in Q and is assumed to be uncorrelated with true intake level T (15).
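The consequence of models 1a and 1b for a regression of R on Q can be illustrated with a small simulation. This is only a sketch under stated assumptions: the parameter values (intercept, slope, standard deviations, sample size) are hypothetical and not taken from the EPIC data, the errors are drawn as independent normal variables, and the function names are illustrative. Under these assumptions the slope of R on Q should approach $\beta_Q \sigma_T^2 / (\beta_Q^2 \sigma_T^2 + \sigma_Q^2)$, where $\sigma_T^2$ denotes the variance of true intake.

```python
import random


def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate(n=20000, alpha_q=200.0, beta_q=0.8,
             sd_t=300.0, sd_q=400.0, sd_r=500.0, seed=1):
    """Draw true intake T, questionnaire Q (model 1a) and recall R (model 1b).

    All numeric defaults are hypothetical illustration values.
    """
    rng = random.Random(seed)
    t = [rng.gauss(2000.0, sd_t) for _ in range(n)]
    q = [alpha_q + beta_q * ti + rng.gauss(0.0, sd_q) for ti in t]
    r = [ti + rng.gauss(0.0, sd_r) for ti in t]
    return t, q, r


def theoretical_lambda(beta_q, sd_t, sd_q):
    """Expected slope of R (or T) on Q when errors are independent."""
    return beta_q * sd_t ** 2 / (beta_q ** 2 * sd_t ** 2 + sd_q ** 2)


t, q, r = simulate()
lam_hat = ols_slope(q, r)    # slope estimated from the noisy reference R
lam_on_t = ols_slope(q, t)   # slope that would be obtained with true intake
lam_true = theoretical_lambda(0.8, 300.0, 400.0)
print(round(lam_hat, 3), round(lam_on_t, 3), round(lam_true, 3))
```

The random error in R widens the scatter around the regression line but does not change the expected slope, which is why a single recall per subject can suffice for estimating the attenuation factor when the errors in R and Q are independent.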
To extend the calibration methodology introduced by Rosner et al. (10), a very general form of the EPIC calibration model can be defined as

[equation 2]

where

$$\alpha_j = \alpha + u_{0j}, \qquad \lambda_{Wj} = \lambda_W + u_{1j},$$

and where j = 1 ... G indexes the group variable, be it the recruitment region, center, or country; i = 1 ... $n_j$ are individual measurements within group j; and $\bar{Q}_j$ and $\bar{Z}_j$ are the group means. The coefficients $\gamma$ model the effect of those variables (Z) included as confounding factors, for example, age, body mass index, physical activity, and smoking status. Models 1a and 1b and model 2 assume independence between random errors in R and Q, which is unlikely to occur in self-reported measurements as a consequence of study subjects' tendency to consistently under- or overestimate dietary intake (14, 16–19). However, in the absence of a third objective measurement of intake, such as a biomarker, the assumption of independence between the errors in R and Q cannot be assessed. One single replicate of R allows a correct estimate of the attenuation factors (11) to be obtained in model 2. This model has fixed-effect parameters ($\alpha$, $\lambda$s, and $\gamma$s) and three random effects, $u_{0j}$, $u_{1j}$, and $\varepsilon_{ij}$, which give rise to four variance components reflecting, respectively, variation between groups ($\sigma_B^2$), variability in the calibration coefficients across groups ($\sigma_\lambda^2$), their covariance ($\sigma_{B\lambda}$), and within-group variation among study subjects ($\sigma^2$). In model 2, $\lambda_{Wj}$ captures the within-group component of the calibration model, while $\lambda_B$ reflects the between-group component. Multilevel models provide a unique opportunity to simultaneously model within- and between-group effects (20–22).
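The display equation for model 2 did not survive in the text above. A form consistent with the surrounding definitions might look like the following sketch. It is an assumption, not the authors' exact equation: the Greek symbol names ($\alpha$, $\lambda$, $\gamma$, $\sigma$, $\varepsilon$), the centering of Q on its group mean, and the way the covariates Z enter are all inferred from the description of the random effects and variance components.

```latex
\begin{aligned}
R_{ij} &= \alpha_j + \lambda_{Wj}\,(Q_{ij} - \bar{Q}_j) + \lambda_B\,\bar{Q}_j
          + \sum_s \gamma_s Z_{sij} + \varepsilon_{ij}, \\
\alpha_j &= \alpha + u_{0j}, \qquad \lambda_{Wj} = \lambda_W + u_{1j}, \\
\begin{pmatrix} u_{0j} \\ u_{1j} \end{pmatrix}
  &\sim N\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},
     \begin{pmatrix} \sigma_B^2 & \sigma_{B\lambda} \\
                     \sigma_{B\lambda} & \sigma_\lambda^2 \end{pmatrix} \right),
  \qquad \varepsilon_{ij} \sim N(0, \sigma^2).
\end{aligned}
```

Written this way, the within-group slope $\lambda_{Wj}$ acts only on deviations of Q from the group mean, while $\lambda_B$ acts only on the group means, which is what allows the two levels of evidence to be estimated simultaneously.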
Use of a random-effects model allows the heterogeneity of attenuation factors to be estimated parsimoniously. It also enables evaluation of the agreement between R and Q measurements at the aggregate level in model 2 (20, 23).
Within center, the calibration factor $\lambda_{Wj}$ can be used to correct the observed ("naïve") parameter that estimates the association between dietary exposure and the outcome of interest, for example, cancer incidence, by

$$\hat{\beta}^{*}_j = \hat{\beta}_j / \lambda_{Wj},$$

where $\hat{\beta}_j$ is the naïve estimate that expresses the increase in risk for a unit increase in the exposure and is obtained by relating cancer incidence to the Q measurements, following a "disease model." A mathematically equivalent way to correct for measurement error in the diet/disease association is to estimate E(T|Q) as the predicted values of the calibration model and use these predicted values as the imputed exposures in the disease model.
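The equivalence between the two correction routes described above can be checked numerically. The following is a minimal sketch under simplifying assumptions: a single center, no covariates, and a linear "disease model" fitted by ordinary least squares in place of the survival or logistic model a real analysis would use. The data values and function names are hypothetical.

```python
def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical toy data: questionnaire Q, 24-hour recall R, outcome score Y.
q = [1500.0, 1800.0, 2100.0, 2400.0, 2700.0, 3000.0]
r = [1900.0, 1700.0, 2300.0, 2100.0, 2600.0, 2400.0]
y = [0.8, 1.1, 0.9, 1.3, 1.2, 1.5]

# Calibration step: regress R on Q to obtain the calibration factor.
intercept, lam = ols_fit(q, r)

# Route 1: fit the disease model on Q, then divide the naive slope by lambda.
_, beta_naive = ols_fit(q, y)
beta_corrected = beta_naive / lam

# Route 2: impute E(T|Q) from the calibration model and refit the disease model.
predicted = [intercept + lam * qi for qi in q]
_, beta_imputed = ols_fit(predicted, y)

print(beta_corrected, beta_imputed)
```

Because the predicted values are a linear rescaling of Q, the two routes give the same slope; the same reparameterization argument applies to any disease model that is linear in the exposure on the scale of its linear predictor.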
Variability of exposure in the EPIC study
To evaluate the effect of calibration in a multicenter setting, we focused on the distribution of predicted exposure values, E(T|Q). As a result of the calibration model considered, the variance of predicted within-group values, compared with Q, will shrink by a factor proportional to $\lambda_{Wj}^2$ (11). Using the calibration sample in the EPIC study (13), we calculated the predicted values according to calibration model 2. To estimate the between- and within-group variance components in the "calibrated" exposure, a mixed model was used. Here, we refer to this model as the "evaluation" model, where the observed Q and R measurements and the predicted intake values are each used as response variables in separate analyses, and the variance components are estimated by modeling the group effect with a random effect as

[evaluation model equation]

$$\alpha^E_j = \alpha^E + u^E_{0j}, \qquad u^E_{0j} \sim N(0, \sigma_1^2) \text{ and } \varepsilon^E_{ij} \sim N(0, \sigma_2^2),$$

with $Y_{ij}$ being either $Q_{ij}$, $R_{ij}$, or $E(T|Q)_{ij}$ estimated in model 2. This model enables estimation of the proportion of total variation in the exposure attributable to the between-group component, expressed by intraclass correlation coefficient (ICC) = $\sigma_1^2 / (\sigma_1^2 + \sigma_2^2)$. By symmetry, 1 − ICC expresses the variability attributable to the within-group component.
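To make the ICC definition concrete, the sketch below estimates the two variance components for a balanced one-way layout. It swaps in a simple analysis-of-variance (method-of-moments) estimator rather than the restricted maximum likelihood fit used in the paper, it ignores covariates, and the function name and example values are hypothetical.

```python
def icc_oneway(groups):
    """ICC for a balanced one-way random-effects layout.

    groups: list of equally sized lists, one list of values per center.
    Uses ANOVA (method-of-moments) variance components, not REML.
    """
    g = len(groups)
    n = len(groups[0])
    if any(len(values) != n for values in groups):
        raise ValueError("this sketch assumes equal group sizes")
    grand_mean = sum(sum(values) for values in groups) / (g * n)
    means = [sum(values) / n for values in groups]
    # Between-group and within-group mean squares.
    msb = n * sum((m - grand_mean) ** 2 for m in means) / (g - 1)
    msw = sum((v - m) ** 2
              for values, m in zip(groups, means)
              for v in values) / (g * (n - 1))
    var_between = max((msb - msw) / n, 0.0)  # truncate negative estimates at 0
    var_within = msw
    return var_between / (var_between + var_within)


# Two hypothetical centers with two subjects each.
example = [[1.0, 2.0], [3.0, 4.0]]
print(icc_oneway(example))
```

For unbalanced data or models with covariates and random slopes, a likelihood-based mixed-model routine would be needed instead; the point here is only how the between- and within-group components combine into the ICC.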
Energy and macronutrient variability
We considered statistical models for intakes of protein, lipids, carbohydrate, alcohol, and total energy. In the absence of a standardized European nutrient database, country-specific food composition tables were used to calculate nutrient intakes for both Q and R. Results from a recent validation study (24) showed higher correlations between dietary protein from R measurements and protein estimated from urinary nitrogen, compared with Q. This finding suggests that the dietary assessment method used probably has a higher impact on measurement errors than use of local nutrient databases.
To adjust for total energy, macronutrient densities (i.e., macronutrient intakes divided by total energy intake) were also considered. ICC estimates were computed for nutrient intake in Q and R measurements, as well as for predicted values, according to model 2 for absolute intakes and density values.
To model the group effect in the present analysis, 28 centers were considered. The macronutrient distributions approximated a normal distribution, whereas the distribution of alcohol intake was rather skewed. However, the present analyses focused primarily on point estimates of fixed effects and variance components. These quantities are fairly robust to departures from normality in mixed models (25). Results using log-transformed values were very similar to those obtained by using untransformed variables.
The calibration and the evaluation models were all adjusted for age and body mass index at the individual and at the group levels. These variables were chosen because previous studies (26–29) have suggested that they are significant determinants of the accuracy of Q measurements. To allow the effect of age and body mass index to be heterogeneous across centers, random effects for these variables were included in model 2, but the associated variance components were not significantly different from zero, thus suggesting homogeneity of effects. Overall, linear fixed effects for age and body mass index, compared with models with higher-order terms, best fit the data.
To explore the role of confounding factors in explaining the heterogeneity of the attenuation factors, we also considered a calibration model with a cross-level interaction between Q measurements and mean values of age and body mass index. Doing so enables exploration of the dependency of attenuation factors on predictors.
Gender-specific analyses were performed. Statistical analyses were carried out by using the MIXED procedure of SAS software (30). Parameter estimates were obtained by restricted maximum likelihood estimation. A "sandwich" estimator was used to compute corrected standard errors involving fixed-effects parameters, which are robust against model misspecification (25, 31, 32) in model 2. The likelihood ratio statistic based on restricted maximum likelihood estimation was used to test the significance of random effects.
RESULTS
The square root of the variance component of $\lambda_{Wj}$, $\sigma_\lambda$, which estimates the variability of attenuation factors across EPIC centers, was consistently and significantly different from zero for both men and women. This finding suggests that the accuracy of Q was rather heterogeneous across the EPIC centers.
Variance components for Q measurements showed very similar patterns for between-center variability in men and women. Respectively, we found low ICC values for alcohol (0.11 and 0.06), moderately low ICCs for energy intake (0.14 and 0.13) and lipids (0.10 and 0.15), and slightly higher ICCs for protein (0.25 and 0.21) and carbohydrates (0.17 for both genders).
The patterns of ICC values for R measurements were similar, but the values were consistently lower than 0.10. The between-center variance components were overall lower in R than in Q, except regarding lipids in men. On the other hand, within-center variation was considerably higher in R, a result that very likely reflected day-to-day variability in the single dietary measurements available for each study participant in the calibration subsample. For energy intake, within-center variability was 1.8 and 1.6 times larger in R than in Q measurements for men and women, respectively. For men and women, the same ratios were 2.0 and 1.5 for protein, 1.6 and 1.4 for carbohydrates, 2.5 and 2.1 for lipids, and 2.1 and 2.4 for alcohol.
For the predicted intake E(T|Q), the between-center variability was very close, but not identical, to the variation in the R measurements as a consequence of including a random effect ($u_{0j}$) for the center variable in model 2. The within-center variability of E(T|Q) shrank when compared with Q measurements.
Adjustment for energy intake had a variable impact on the estimated variability of macronutrients. For alcohol and protein in men and women, ICC values for predicted intakes were basically unaffected. Adjustment for energy had a moderate effect for carbohydrates in men (ICC from 0.41 to 0.52). Decidedly higher differences in ICC values were observed for macronutrient densities regarding carbohydrates in women (ICC from 0.26 to 0.51) and lipids in men (ICC from 0.35 to 0.61) and women (ICC from 0.39 to 0.60). Very little change in the variance components for Q and R measurements was observed for macronutrient density values.
For both absolute intake and density values, ICC values overall were between 1.5- and 3.0-fold higher for E(T|Q) than for Q, thus indicating strong shrinkage in the within-center variance component after calibration. For lipid density in men, the ICC ranged from 0.20 for Q measurements to 0.61 for predicted values.
Overall, negative effects were observed in calibration model 2 for age and body mass index. For alcohol intake in men, the effects of body mass index were positive (data not shown). Results from the model with a cross-level interaction to explore the heterogeneity of attenuation factors revealed overall nonsignificant associations. However, in the case of carbohydrate and lipid density, of alcohol intake in men, and of protein density in women, a significant negative interaction with body mass index was observed, thus suggesting that centers in which mean body mass index levels were higher tended to have lower Q accuracy (data not shown).
DISCUSSION
In the context of the European EPIC study on diet and cancer, we have described the use of calibration substudies to correct for between-cohort differences in effects of dietary assessment errors, putting the measurements on a more standardized scale. The calibration was based on collection, within a large, stratified subsample of the EPIC cohorts, of standardized R measurements. Regression of these "reference" measurements on the individual-level questionnaire data enabled estimation of predicted values E(T|Q), indicating average true intake levels for given Q measurements that, in a second phase, can be used as predictors in a disease model. A two-level, random-effects model was used to estimate within- and between-center calibration effects. The calibration model included adjustments for effects of potential confounders, particularly total energy intake, body mass index, and age.
Random effects allowed parsimonious modeling of the heterogeneity of effects across EPIC centers without having to include a large number of interaction terms. Random-effects models also allow the inclusion of aggregate (contextual) variables to be evaluated and the exploration of possible sources of between-center differences in dietary questionnaire accuracy, by making such differences dependent on aggregate-level predictors.
We decided to focus on the effects of calibration on within- and between-center variance components of macronutrient intakes. The macronutrient composition of diet is still a major factor of interest in the nutritional epidemiology of cancer. In a relatively recent review (33), the macronutrients were considered comparable (e.g., in the case of alcohol) or readily comparable after correction for different methods of calculation and mode of expression (as for energy, protein, and carbohydrates). In addition, a validation study by Slimani et al. (24), recently conducted in the EPIC cohort, supported this finding, where relatively high correlations were observed in an ecologic comparison of protein intake estimated from urinary nitrogen and dietary protein, estimated by R and Q measurements. A more extensive, fully standardized European food composition database, also comprising all detailed micronutrients, is currently being prepared for the EPIC study (34).
The lack of common food composition tables might lead to a larger between-center variability, as a consequence of systematic errors specific to each food composition table, randomly distributed across countries, but it is less likely to affect the sizable change in ICC values observed for predicted intakes. In the present analyses, the calibration model took into account the geographic specificity of the EPIC entities, by performing center-specific comparisons between R and Q measurements, for which the same national nutrient databases were used. After correction for measurement error, we believe that the shrinkage of within-center macronutrient variability rather than the inflation of between-center variability was responsible for the relative increase in ICC values for E(T|Q) compared with Q. As expected, between EPIC centers, mean values of E(T|Q) were practically identical to mean levels of R. The term $\lambda_B$ provides an estimate of the between-center (linear) relations between mean R and Q measurements. Rather low values of the coefficient were observed for lipids in men and total energy intake in women, but values approximating 1 were observed for alcohol, particularly in men, thus suggesting very good agreement between R and Q at the aggregate level. Because special efforts have been made to standardize the R measurements to estimate absolute, mean dietary intake levels, we assume that the mean calibrated values reflect more accurately the between-cohort variations in mean true intake than the mean Q values do (23).
Within EPIC centers, the distributions of predicted intake values (i.e., after calibration) generally showed strong shrinkage toward center-specific mean values. The shrinkage in within-center variance was proportional to the square of within-center calibration coefficients $\lambda_{Wj}$. The shrinkage depended only very moderately on the effects of other individual-level variables used to explain R variability, such as age and body mass index.
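The link between this shrinkage and the rise in ICC values can be written out in a simplified form. The expressions below are a sketch under stated assumptions: covariates are ignored, a single common within-center calibration coefficient $\lambda_W$ is used instead of center-specific values, and the between-center variance of the predicted values is taken to equal that of the mean R measurements. The subscripted variance symbols are introduced here for illustration only.

```latex
\begin{aligned}
\mathrm{ICC}_{Q} &= \frac{\sigma_{B,Q}^2}{\sigma_{B,Q}^2 + \sigma_{W,Q}^2}, \\[4pt]
\mathrm{var}_{\text{within}}\{E(T \mid Q)\} &\approx \lambda_W^2\, \sigma_{W,Q}^2, \\[4pt]
\mathrm{ICC}_{E(T \mid Q)} &\approx
   \frac{\sigma_{B,R}^2}{\sigma_{B,R}^2 + \lambda_W^2\, \sigma_{W,Q}^2}.
\end{aligned}
```

With $\lambda_W$ well below 1, the within-center term in the denominator falls sharply, so the ICC for predicted intake can exceed that for Q even when the between-center variance itself is no larger.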
The association between exposure and disease can be decomposed as $\beta_{Total} = \beta_B \cdot \mathrm{ICC} + \beta_W \cdot (1 - \mathrm{ICC})$, where the $\beta$s express the ecologic (B) and individual (W) association between exposure and disease, provided that the within-centers relations are homogeneous. After calibration, at least for macronutrients, a sizable weight is given to the between-cohort component of the multicenter EPIC analysis, and the weights of the two levels of evidence of the diet/disease association are similar.
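The weighting in this decomposition can be illustrated with a few lines of code. The two ICC values are those reported above for lipid density in men (0.20 for Q measurements, 0.61 for predicted values); the between- and within-center coefficients are hypothetical numbers chosen only to show how the weights shift, and the function name is illustrative.

```python
def total_association(beta_between, beta_within, icc):
    """Combine ecologic and individual-level slopes using the ICC as weight."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie between 0 and 1")
    return beta_between * icc + beta_within * (1.0 - icc)


# Hypothetical slopes for the between-center and within-center associations.
beta_b, beta_w = 0.5, 0.1

# ICC values reported in the text for lipid density in men.
before = total_association(beta_b, beta_w, 0.20)  # weights 0.20 / 0.80
after = total_association(beta_b, beta_w, 0.61)   # weights 0.61 / 0.39
print(before, after)
```

With these illustrative inputs the between-center slope moves from carrying one fifth of the weight to carrying more than half, which is the sense in which calibration shifts the balance between the two levels of evidence.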
Results from the present analyses suggest that the between-center variability of exposure provides a sizable amount of information and should therefore be exploited. Analysis of the diet/disease association in the EPIC study should provide estimates of the within- and between-center components in a framework very similar to that of calibration model 2, where both levels of effect were estimated simultaneously. In this type of disease model, the between-center component has the great advantage over a "classical" ecologic study that it can adjust for confounding at the individual level and over an aggregate design (35) and that the populations under study provide estimates of cancer incidence and dietary exposure.
The impact of energy adjustment on observed macronutrient variability appeared to be considerable in the calibration model. Adjustment for energy is motivated both by the requirement to consider isocaloric models and by the partial control of measurement error that it can bring about (36). Within-center variances were affected more than between-center variance, particularly for carbohydrates and lipids, reflecting a reduction in the error variance. Thus, the ICC values for the macronutrient densities were considerably larger than those for the absolute intakes.
The calibration model relies on some essential statistical assumptions. First, the errors in R and Q measurements must be uncorrelated. In practice, this assumption may not be fully met, and some positive correlation between errors may occur, for instance, because of subjects' tendencies to under- or overreport intake when different dietary assessment methods are used (14, 16, 37, 38). A positive correlation between errors leads to overestimation of the within-center calibration coefficient $\lambda_W$ and thus to an underestimation of shrinkage of the intake distributions within centers. The ICC values in the present analyses may thus underestimate the relative weight that the between-center component should have in a full multilevel analysis. There is some evidence that adjustment for total energy intake, either through a linear model or by calculating nutrient densities, can reduce the magnitude of correlation between random measurement errors of Q measurements and R or food consumption records (36, 39). The effective error variance for the macronutrients is also probably reduced by energy adjustment. This seems to happen because the error in energy and the error in estimated macronutrient intake are highly correlated and, after adjustment, partly "cancel" each other (18). Thus, although the ICC values for the absolute intakes may be considerably underestimated, the ICC values for the energy-adjusted macronutrients can be less affected. The latter are of greater relevance, since some form of energy adjustment of macronutrient intakes is standard practice in epidemiology.
A second major assumption is that the R measurements provide unbiased estimates of dietary intake at a group level; that is, $\alpha_R = 0$ and $\beta_R = 1.0$ in measurement-error model 1b (absence of constant and proportional scaling biases). This assumption may be relaxed, however, to that of homogeneous scaling biases (i.e., homogeneous $\alpha_R$s and $\beta_R$s) in all study centers, which in practice may be reached approximately. Again, energy adjustment may help to standardize dietary measurements and meet at least the relaxed assumption. Under this assumption, the measurement error correction will still provide a correct relative weighting of within- versus between-center evidence for diet-disease associations. In addition, $\beta_R$s of less than unity lead to underestimation of within-center calibration factors. In this context, it is possible that the correlation between errors and the reference measurement bias may counterbalance their effects, thus leading to correct estimates of $\lambda_{Wj}$.
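The direction of both biases discussed above can be seen from the slope of a regression of R on Q when the two assumptions are relaxed. The derivation below is a sketch for a single center without covariates. It assumes that both error terms remain uncorrelated with true intake T, writes $\sigma_T^2$ for the variance of T, and allows the reference measurement its own scaling coefficients $\alpha_R$ and $\beta_R$.

```latex
\begin{aligned}
Q &= \alpha_Q + \beta_Q T + \varepsilon_Q, \qquad
R = \alpha_R + \beta_R T + \varepsilon_R, \\[4pt]
\mathrm{cov}(R, Q) &= \beta_R \beta_Q \sigma_T^2 + \mathrm{cov}(\varepsilon_R, \varepsilon_Q), \qquad
\mathrm{var}(Q) = \beta_Q^2 \sigma_T^2 + \sigma_Q^2, \\[4pt]
\frac{\mathrm{cov}(R, Q)}{\mathrm{var}(Q)}
  &= \frac{\beta_R \beta_Q \sigma_T^2 + \mathrm{cov}(\varepsilon_R, \varepsilon_Q)}
          {\beta_Q^2 \sigma_T^2 + \sigma_Q^2}.
\end{aligned}
```

Setting $\beta_R = 1$ and $\mathrm{cov}(\varepsilon_R, \varepsilon_Q) = 0$ recovers the slope of T on Q. A positive error covariance raises the numerator and inflates the estimated calibration factor, whereas $\beta_R < 1$ lowers it, so the two departures act in opposite directions.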
The calibration model in model 2 assumes fully linear associations between response and explanatory variables. Nonlinear calibration models are advisable solutions to capture the lack of linearity between R and Q. The shrinkage of the within-center variance component would characterize the predicted macronutrient distribution, also in nonlinear calibration models.
Measurement errors bias the association between any two quantities, for example, the relative risk estimate in a disease model, and cause power loss. Within single study centers, calibration may correct for bias but does not recuperate power losses due to random errors. Indeed, the effect of shrinkage of exposure distributions generally outweighs the effect of bias correction (i.e., increase) in relative risk estimates. In a multicenter study, however, calibration in principle can reduce between-center heterogeneity in diet-disease associations caused by differential impact of measurement errors and, probably more importantly, will lead to a less biased estimate of the overall relative risk (7, 11).
Moreover, in estimating associated variability of corrected relative risk estimates, it is necessary to take into account the variability in the estimated $\lambda$s (10, 11). In the EPIC study, this issue is minor because the large size of the calibration study allows the calibration factor to be estimated with high precision.
ACKNOWLEDGMENTS

NOTES

REFERENCES