Commentary: Correlated errors and energy adjustment—where are the data?

Donna Spiegelman

Departments of Epidemiology and Biostatistics, Harvard School of Public Health, Boston, MA, USA. E-mail: stdls@channing.harvard.edu

In this issue of the International Journal of Epidemiology (IJE), Day et al.1 conduct a sensitivity analysis of the effects of correlated errors in mis-measured covariates in linear regression models, when the data needed to adjust directly for those errors are not available. They show yet again what has been documented many times before, although not cited by Day et al.:2–6 when more than one model covariate is measured with error, and both the underlying true values and the errors may be correlated, the well-known effects of confounding become conflated with the well-known effects of measurement error. When the correlations among the underlying variables, the correlations among the errors, and the magnitudes of the measurement errors are large, the results can be misleading and unpredictable. It is not entirely clear what is new here, although the detailed example is certainly useful for further emphasizing the point to those who did not get it the first few times around.
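To make the conflation concrete, here is a minimal simulation sketch; it is not taken from Day et al., and the sample size, variable names, and all parameter values are illustrative assumptions. Two correlated true covariates, only the first of which affects the outcome, are observed with correlated errors; the naive regression attenuates the real effect and manufactures a spurious one for the second covariate.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000

    # True covariates X1, X2 (e.g. two nutrient intakes) are correlated.
    x = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=n)

    # The outcome depends only on X1; X2 has no independent effect.
    y = x @ np.array([1.0, 0.0]) + rng.normal(size=n)

    # The observed covariates W1, W2 carry correlated measurement errors.
    err = rng.multivariate_normal([0.0, 0.0], [[0.5, 0.25], [0.25, 0.5]], size=n)
    w = x + err

    # Naive regression of Y on the error-prone W1, W2: the coefficient for W1 is
    # attenuated and W2 acquires a spurious non-zero coefficient.
    design = np.column_stack([np.ones(n), w])
    beta_naive = np.linalg.lstsq(design, y, rcond=None)[0]
    print(beta_naive[1:])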

There are many instances in nutritional epidemiology in which it is of interest to evaluate the independent effects of several nutrients on a health endpoint. The sensitivity analysis approach given by Day et al. is of limited use, since, as they and others before them have shown, the regression coefficients relating the nutrients to the health endpoint could take on virtually any value. With the standard study design, it is impossible to validly estimate, or even to place bounds of any informative width on, the regression coefficients of interest.

So how can we remove the bias in multiple regression coefficients for mis-measured nutrients that are themselves correlated and may also have correlated errors? Are there study designs, and corresponding methods of data analysis, available to correct for these biases? Of course! Day et al.'s revisitation of the problem without reviewing the proposed solutions does not highlight the potential for progress in nutritional epidemiology.

Very briefly, Carroll & Stefanski,7 Rosner et al.,8,9 Spiegelman et al.,10,11 and Xie et al.12 presented the regression calibration method for main study/validation study designs, which permits approximately unbiased point and interval estimation of effects in linear, Cox, and logistic regression models when the usual method of exposure assessment can be validated against an unbiased, perhaps 'alloyed', gold standard in a sub-sample, possibly drawn from outside the original main study. The method is inherently multivariate and naturally accommodates multiple correlated nutrients with correlated errors, as discussed in Day et al.'s paper in this issue of the IJE. Fraser & Stram2 discuss study design in this setting and show that validation studies of 1000 participants or more are needed when the correlations of either the errors or the underlying true values of the nutrients are high.
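As a rough illustration of the multivariate regression calibration idea in the linear case, here is a minimal sketch under assumed parameter values, with made-up variable names and a simulated external validation sample; it is not the published SAS macro, and it omits the variance corrections needed for valid confidence intervals.

    import numpy as np

    rng = np.random.default_rng(1)
    beta_true = np.array([0.4, 0.2])

    def simulate(n):
        # Correlated true intakes X, correlated errors, surrogates W, outcome Y.
        x = rng.multivariate_normal([0, 0], [[1.0, 0.6], [0.6, 1.0]], size=n)
        e = rng.multivariate_normal([0, 0], [[0.5, 0.2], [0.2, 0.5]], size=n)
        return x, x + e, x @ beta_true + rng.normal(size=n)

    x_main, w_main, y_main = simulate(50_000)   # main study: only W and Y are used
    x_val, w_val, _ = simulate(2_000)           # validation study: X and W both observed

    # Step 1: fit the calibration model E[X | W] in the validation study.
    Wv = np.column_stack([np.ones(len(w_val)), w_val])
    calib = np.linalg.lstsq(Wv, x_val, rcond=None)[0]

    # Step 2: in the main study, replace W by its predicted true value and refit.
    x_hat = np.column_stack([np.ones(len(w_main)), w_main]) @ calib
    Xh = np.column_stack([np.ones(len(x_hat)), x_hat])
    beta_rc = np.linalg.lstsq(Xh, y_main, rcond=None)[0]
    print(beta_rc[1:])   # approximately recovers (0.4, 0.2); a naive fit on W would not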

Concerns have been raised about the validity of the assumptions made in the original regression calibration method. First, it was suggested that when multiple nutrients are validated simultaneously against 'alloyed' gold standards, the errors in the reference instruments may be correlated with the errors in the surrogates. When this is the case, regression calibration will fail to remove all of the bias due to measurement error: the surrogates will appear to be better measured than they are. Spiegelman et al.13 developed an augmented validation study design that includes a third method of exposure assessment (e.g. a biomarker), the errors of which can reasonably be assumed to be uncorrelated with the errors in both the reference instrument (e.g. diet records) and the surrogate (e.g. food frequency questionnaires). With these measurements in hand, they extended the regression calibration procedure to allow for correlated errors between the reference and surrogate measures; in examples involving vitamin E and physical activity, the error correlations were small. Next, Kipnis et al.14 suggested that, in addition to correlated random within-person error in the reference and surrogate exposure measures, there may also be systematic correlated within-person error between these two instruments. In response, Spiegelman et al.15 developed a class of augmented main study/validation study designs that permit valid regression calibration estimation of relative risks and other regression slopes, requiring either replicated biomarkers or an additional 'instrumental variable' that has some correlation with the underlying exposure of interest and whose error is jointly uncorrelated with all of the other errors.
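A small simulation sketch of that first failure mode, with purely illustrative numbers not drawn from any of the cited validation studies: when the error in the 'alloyed' reference is positively correlated with the error in the surrogate, the calibration slope estimated in the validation study is too close to 1, the surrogate appears better measured than it is, and the subsequent regression calibration correction is incomplete.

    import numpy as np

    rng = np.random.default_rng(2)
    n = 200_000

    x = rng.normal(size=n)                                   # true long-term intake
    e = rng.multivariate_normal([0, 0], [[1.0, 0.4], [0.4, 1.0]], size=n)
    w = x + e[:, 0]                                          # surrogate, e.g. FFQ
    r = x + e[:, 1]                                          # 'alloyed' reference, e.g. diet record

    def slope(num, den):
        c = np.cov(num, den)
        return c[0, 1] / c[1, 1]

    print(slope(x, w))   # true calibration slope E[X | W], about 0.5 here
    print(slope(r, w))   # slope estimated from the validation study, about 0.7:
                         # the correlated errors make the surrogate look too good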

It should be pointed out that these extended regression calibration methods have not yet been generalized to handle multiple regression covariates measured with error, the situation Day et al. focus upon here.1 Should there be sufficient interest, it is straightforward to derive these further generalizations and to produce user-friendly SAS macros to facilitate their use, as has already been done for standard multivariate regression calibration (www.hsph.harvard.edu/faculty/spiegelman/software.html). The truly limiting issue in practice is the one underscored by Fraser & Stram: we nutritional epidemiologists have not been conducting validation studies of sufficient size to make the necessary bias corrections with meaningful precision. To my knowledge, there is not yet a single validation study available with which it would be informative to investigate fully the independent effects of several nutrients together, allowing for correlated random and possibly systematic errors of the reference instrument (diaries) with the surrogate (food frequency questionnaire or diet recall). Let's do it, and do the job right!

A few brief comments on the Jakes et al.16 paper. The unanswered question here is the extent to which total energy expenditure (TEE) measures the underlying quantity of interest, long-term average total energy intake (TEI). What we need, again, are data: data on the association of TEE, measured by heart rate monitoring (HRM) and perhaps by other methods as well, with long-term average TEI. Otherwise, although body weight and physical activity may be more highly correlated with TEE than is TEI self-reported by either food frequency questionnaire or 7-day diary, it is impossible to determine from Jakes et al.'s data whether weight and activity are better predictors of long-term average TEI. If it turns out that Jakes et al. are correct about the empirical equivalence of long-term average intake and daily expenditure, then the best solution, when the desired goal is emulation of the isocaloric experiment, is simply to adjust nutrients for self-reported energy intake, weight, and activity! There is no need to choose between them; let the data in the given study decide. Which variables work best in any given situation may depend upon a complex set of circumstances that would be impossible, and not worthwhile, to pin down.

Moreover, as noted by Willett17 and as confirmed in the OPEN study (Kipnis et al.18), adjustment for TEI by the same method used to assess the specific nutrient (e.g. protein or fat intake) has the advantage of 'cancelling' correlated error, and thus substantially improving the validity of the energy-adjusted nutrient intake. This will not be accomplished solely by using an independent surrogate or measure of TEI such as weight or physical activity assessed by HRM. Among 51 290 male adults in the Health Professionals' Follow-up Study (1986),19 the correlations of total fat intake (kg/day) with total energy intake (kcal/day), activity (metabolic equivalents [METs]/day), and weight (kg) were 0.85, 0.03, and 0.11, respectively. The R² for the regression of total fat intake on all three was 0.74, and on total energy intake alone was 0.73. Results for the regressions of total protein intake (g/day) and total carbohydrate intake (g/day) were virtually identical. Thus, it appears that in large cohort studies, where measurement of TEE by HRM is a practical impossibility except in a small validation study, energy adjustment remains best accomplished by the standard methods suggested by Willett and Stampfer,20 combined with measurement error correction for TEI through the estimation of a regression calibration deattenuation factor that depends on HRM, weight, and the usual measure of TEI.
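To fix ideas, here is a minimal sketch of the standard residual method of energy adjustment together with a simple univariate deattenuation factor, using entirely made-up numbers and variable names; it is a simplification of, not a substitute for, the multivariable calibration involving HRM, weight, and reported TEI described above.

    import numpy as np

    rng = np.random.default_rng(3)
    n = 10_000

    energy = rng.normal(2200, 500, size=n)                 # reported TEI (kcal/day)
    fat = 0.035 * energy + rng.normal(0, 10, size=n)       # reported fat intake (g/day)

    # Residual method of energy adjustment: the energy-adjusted nutrient is the residual
    # from regressing nutrient intake on total energy, plus the mean for interpretability.
    E = np.column_stack([np.ones(n), energy])
    fitted = E @ np.linalg.lstsq(E, fat, rcond=None)[0]
    fat_adj = fat - fitted + fat.mean()

    # Univariate regression calibration deattenuation: with a validation sub-study that
    # provides an unbiased reference measurement, the naive coefficient is divided by the
    # calibration slope lambda = cov(reference, surrogate) / var(surrogate).
    reference = fat_adj + rng.normal(0, 5, size=n)         # stand-in unbiased reference
    surrogate = fat_adj + rng.normal(0, 15, size=n)        # noisier surrogate measurement
    c = np.cov(reference, surrogate)
    lam = c[0, 1] / c[1, 1]
    beta_naive = 0.010                                     # hypothetical main-study slope per g/day
    print(beta_naive / lam)                                # deattenuated estimate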


References
1 Day NE, Wong MY, Bingham S et al. Correlated measurement error—implications for nutritional epidemiology. Int J Epidemiol 2004; 33:1373–81.

2 Fraser GE, Stram DO. Regression calibration in studies with correlated variables measured with error. Am J Epidemiol 2001; 154:836–44.

3 Armstrong BG, Whittemore AS, Howe GR. Analysis of case-control data with covariate measurement error: application to diet and colon cancer. Stat Med 1989; 8:1151–63.

4 Zidek JV, Wong H, Le ND et al. Causality, measurement error and multicollinearity in epidemiology. Environmetrics 1996; 7:441–51.

5 Fung KY, Howe GR. Methodological issues in case-control studies. III. The effect of joint misclassification of risk factors and confounding factors upon estimation and power. Int J Epidemiol 1984; 13:366–70.

6 Elmstahl S, Gullberg S. Bias in diet assessment methods—consequences of collinearity and measurement errors on power and observed relative risks. Int J Epidemiol 1997; 26:1071–79.

7 Carroll RJ, Stefanski LA. Approximate quasi-likelihood estimation in models with surrogate predictors. J Am Statist Assoc 1990; 85:652–63.

8 Rosner B, Spiegelman D, Willett WC. Correction of logistic regression relative risk estimates and confidence intervals for measurement error: the case of multiple covariates measured with error. Am J Epidemiol 1990; 132:734–45.

9 Rosner B, Spiegelman D, Willett WC. Correction of logistic regression relative risk estimates and confidence intervals for random within-person measurement error. Am J Epidemiol 1992; 136:1400–13.

10 Spiegelman D, McDermott A, Rosner B. Regression calibration method for correcting measurement-error bias in nutritional epidemiology. Am J Clin Nutr 1997; 65(Suppl):1179S–86S.

11 Spiegelman D, Carroll RJ, Kipnis V. Efficient regression calibration in main study/internal validation study designs with an imperfect reference instrument. Stat Med 2001; 20:139–60.

12 Xie SX, Wang CY, Prentice RL. A risk set calibration method for failure time regression by using a covariate reliability sample. J R Statist Soc Ser B 2001; 63:855–70.

13 Spiegelman D, Schneeweiss S, McDermott A. Measurement error correction for logistic regression models with an 'alloyed gold standard'. Am J Epidemiol 1997; 145:184–96.

14 Kipnis V, Carroll RJ, Freedman LS et al. Implications of a new dietary measurement error model for estimation of relative risk: application to four calibration studies. Am J Epidemiol 1999; 150:642–51.

15 Spiegelman D, Zhao B, Kim J. Correlated error in biased surrogates: study designs and methods for measurement error correction. Stat Med (in press).

16 Jakes RW, Day NE, Luben R et al. Adjusting for energy intake—what measure to use in nutritional epidemiological studies? Int J Epidemiol 2004; 33:1382–86.

17 Willett W. Commentary: Dietary diaries versus food frequency questionnaires—a case of undigestible data. Int J Epidemiol 2001; 30:317–19.

18 Kipnis V, Subar AF, Midthune D et al. Structure of dietary measurement error: results of the OPEN Biomarker Study. Am J Epidemiol 2003; 158:14–21.

19 Ascherio A, Rimm EB, Giovannucci EL et al. Dietary fat and risk of coronary heart disease in men: cohort follow-up study in the United States. BMJ 1996; 313:84–90.

20 Willett WC, Stampfer MJ. Total energy intake: implications for epidemiologic analysis. Am J Epidemiol 1986; 124:17–27.