Correlated measurement error—implications for nutritional epidemiology

NE Day1, MY Wong2, S Bingham3, KT Khaw4, R Luben1, KB Michels5, A Welch1 and NJ Wareham6

1 Strangeways Research Laboratory, Institute of Public Health, University of Cambridge, Cambridge, CB1 8RN, UK
2 Department of Mathematics, The Hong Kong University of Science & Technology, Hong Kong
3 MRC Dunn Human Nutrition Unit, Cambridge, UK
4 Clinical Gerontology Unit, University of Cambridge School of Clinical Medicine, Addenbrooke's Hospital, Cambridge, UK
5 Ob/Gyn Epidemiology Center, Harvard Medical School, Brigham & Women's Hospital, Boston, MA, USA
6 MRC Epidemiology Unit, Strangeways Research Laboratory, Cambridge, CB1 8RN, UK.

Correspondence: Prof. NE Day, Strangeways Research Laboratory, Institute of Public Health, University of Cambridge, Cambridge CB1 8RN, UK. E-mail: nick.day{at}srl.cam.ac.uk


    Abstract
Background In nutritional epidemiology, it is common to fit models in which several dietary variables are included. However, with standard instruments for dietary assessment, not only are the intakes of many nutrients often highly correlated, but the errors in the estimation of the intake of different nutrients are also correlated. The effect of this error correlation on the results of observational studies has been little investigated. This paper describes the effect on multivariate regression coefficients of different levels of correlation, both between the variables themselves and between the errors of estimation of these variables.

Methods Using a simple model for the multivariate error structure, we examine the effect on the estimates of bivariate linear regression coefficients of (1) differential precision of measurement of the two independent variables, (2) differing levels of correlation between the true values of the two variables, and (3) differing levels of correlation between the errors of measurement of the two variables. As an example, the prediction of plasma vitamin C levels by dietary intake variables is considered, using data from the European Prospective Investigation into Cancer (EPIC) Norfolk study in which dietary intake was estimated using both a food frequency questionnaire (FFQ) and a 7-day diary (7DD). The dietary variables considered are vitamin C, fat, and energy, with different approaches taken to energy adjustment.

Results When the error correlation is zero, the estimates of the bivariate regression coefficients reflect the precision of measurement of the two variables and mutual confounding. The sum of the observed regression coefficients is biased towards the null as in univariate regression. When the error correlation is non-zero but below about 0.7, the effect is minor. However, as the error correlation increases beyond 0.8 the effect becomes large and highly dependent on the relative precision with which the two variables are measured. At the extreme, the bivariate estimates can become indefinitely large. In the example, the error correlation between fat and energy using the FFQ appears to be over 0.9, the corresponding value for the 7DD being approximately 0.85. The error correlation between vitamin C and fat, and vitamin C and energy, appears to be below 0.5 and smaller for the 7DD than for the FFQ. The impact of these error correlations on bivariate regression coefficients is large. The effect of energy adjustment differs widely between vitamin C and fat.

Conclusion High levels of error correlation can have a large effect on bivariate regression estimates, varying widely depending on which two variables are considered. In particular, the effect of energy adjustment will vary widely. For vitamin C, the effect of energy adjustment appears negligible, whereas for fat the effect is large, indicating that error correlation close to one can partially remove regression dilution due to measurement error. If, for fat intake, energy adjustment is performed by using energy density, the partial removal of regression dilution is achieved at the expense of a substantial reduction in the true variance.


Keywords Measurement error, correlated error, energy adjustment

Accepted 23 February 2004

The effect of measurement error on estimating disease–exposure relationships in epidemiology has been extensively investigated in the univariate case. Less attention has been paid to the multivariate situation. Although the algebraic formulation of this issue is straightforward,1 the numerical consequences have not been elaborated in a way that aids interpretation of epidemiological findings. Part of the problem is the inherent difficulty of estimating a multivariate error structure. The issue is particularly acute in nutritional epidemiology, where the errors of measurement, for example with a food frequency questionnaire (FFQ), may be large, and where the frequent need to adjust for total energy consumption automatically makes the analyses multivariate. A further complication in nutritional epidemiology is that high levels of correlation may exist both between the true intakes of different nutrients and between the measurement errors associated with the estimated intakes. In this situation, application of univariate correction factors (attenuation coefficients) to estimates derived from a multivariate analysis may be severely misleading. In the absence of credible estimates of the multivariate error structure, one approach is to do a sensitivity analysis, using a range of plausible multivariate error structures to delineate the range of true disease–exposure relationships that could have generated the observed results. Some estimates of multivariate error structure for an FFQ have been reported. Kipnis et al.2 and Subar et al.3 considered fat, non-fat energy, and total energy. Day et al.4 considered nitrogen, potassium, and sodium. The correlation between the true values of the nutrients ranged from 0.5 to 0.8 and the correlation between the errors for the different nutrients ranged from 0.7 to 0.9. Correlations of observed with true intakes range from 0.13 for total energy5 to between 0.3 and 0.5 for other nutrients.

The issue of correlated error is of particular relevance to the question of energy adjustment. Since typical dietary assessment questionnaires in use in epidemiology, such as versions of the FFQ, assess total energy intake poorly, the principal reason for energy adjustment is the potential of high error correlation to reduce the effects of measurement error, i.e. to reduce regression dilution.6,7 In the OPEN study,5 the correlation of intake estimates from the FFQ with true intake was considerably higher for protein expressed as energy density than for protein expressed as absolute intake. As part of our more general exploration of the effect of multivariate error structures on observed regression coefficients, we examine the conditions under which regression dilution may be reduced. For illustrative purposes, we consider the prediction of plasma vitamin C levels by dietary variables, specifically dietary vitamin C, fat, and total energy. We consider the effect of energy adjustment on the regression coefficients for dietary vitamin C and fat, both in terms of standard regression models and in terms of vitamin C and fat intake expressed as energy densities.


    Method
Measurement error model
In this paper, we concentrate on the regression of a continuous variable y on two exposure covariates. The true exposures are represented by T1 and T2, and y is assumed to be a function of T1 and T2, that is, y = β0 + β1T1 + β2T2 + ε. A realistic model for measurement error in nutritional epidemiology is likely to be complex.1 Here, we concentrate on a relatively simple model to illustrate some key points. In place of the true covariates, T1 and T2, we observe surrogates, X1 and X2. We assume that error is non-differential with regard to the outcome variable y, that is, X1 and X2 contribute no information about y beyond what is available in T1 and T2. X1 and X2 are related to T1 and T2 by an additive error model as

\[ X_i = T_i + \varepsilon_{X_i} \]

for i = 1, 2. The true latent variable Ti is assumed to have mean µi and variance τi². We designate the correlation coefficient between T1 and T2 as ρT, and between the random errors εX1 and εX2 as ρε. The correlation coefficient between the true and observed exposures is defined as ρi, that is, ρi = corr(Ti, Xi). Appendix 1 gives detailed expressions for the expected values of the regression coefficients from the univariate and bivariate regression of y on X1 and X2, together with functional relationships between these expected values.
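As a rough illustration of this error model, the following Python sketch simulates it and fits the bivariate regression by ordinary least squares. It is not part of the original analysis: the function name `simulate_bivariate_slopes`, the normal distributions, the unit variances of T1, T2 and of the residual in y, and the sample size are all assumptions made for the example.

```python
import math
import random


def simulate_bivariate_slopes(beta1, beta2, rho1, rho2, rho_t, rho_e,
                              n=20000, seed=1):
    """Simulate X_i = T_i + error_i and return the two OLS slopes of y on (X1, X2).

    Assumptions (illustrative only): T1, T2 and the errors are normal,
    Var(T_i) = 1, and the residual in y has unit variance.
    """
    rng = random.Random(seed)
    # Error SDs chosen so that corr(T_i, X_i) = rho_i when Var(T_i) = 1
    s1 = math.sqrt(1 - rho1 ** 2) / rho1
    s2 = math.sqrt(1 - rho2 ** 2) / rho2
    x1, x2, y = [], [], []
    for _ in range(n):
        t1 = rng.gauss(0, 1)
        t2 = rho_t * t1 + math.sqrt(1 - rho_t ** 2) * rng.gauss(0, 1)
        e1 = rng.gauss(0, 1)
        e2 = rho_e * e1 + math.sqrt(1 - rho_e ** 2) * rng.gauss(0, 1)
        x1.append(t1 + s1 * e1)
        x2.append(t2 + s2 * e2)
        y.append(beta1 * t1 + beta2 * t2 + rng.gauss(0, 1))

    def cov(a, b):
        mean_a, mean_b = sum(a) / n, sum(b) / n
        return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / (n - 1)

    v1, v2, c12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    c1y, c2y = cov(x1, y), cov(x2, y)
    det = v1 * v2 - c12 ** 2
    # Two-covariate OLS slopes from the sample covariance matrix
    return (v2 * c1y - c12 * c2y) / det, (v1 * c2y - c12 * c1y) / det


print(simulate_bivariate_slopes(1.0, 0.0, 0.6, 0.5, 0.9, 0.0))
```

With β1 = 1, β2 = 0, ρ1 = 0.6, ρ2 = 0.5, ρT = 0.9 and uncorrelated errors, the fitted slopes should come out near 0.31 and 0.16 rather than 1 and 0: the first coefficient is attenuated and part of the effect is attributed to the second variable.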

Numerical results
Expressions for the univariate and bivariate estimates represent both mutual confounding and the effect of measurement error. The bivariate estimates correspond to the usual epidemiological practice of adjustment to remove the effect of confounding. When the error correlation is zero, i.e. ρε = 0, the situation is relatively straightforward, reflecting the combined effects of independent error and mutual confounding. It has been explored previously (e.g. Wong et al.8). The sum of the bivariate estimates is biased towards the null, a generalization of regression dilution of univariate estimates, the details of which are given in Appendix 2. It should be noted that an individual bivariate coefficient may exceed its true value, but the sum of the two coefficients cannot exceed the sum of the true values.

To illustrate how the effect of measurement error varies with the crucial parameters, ρ1, ρ2, ρT, and in particular ρε, we consider two representative situations: one where only T1 is related to disease, so that β1 = 1 and β2 = 0, and the other where both variables are equally related to disease, so that β1 = β2 = 1. We have put the relative scale parameter τ1/τ2 = 1. Figures 1a to 1c plot the two bivariate estimates, for ρε varying from 0 to 1, for various values of ρ1 and ρ2, and for ρT equal to 0.7, 0.8, 0.9, and 0.95. For all the values of ρ1, ρ2, and ρT considered, for values of ρε below approximately 0.8 there is a considerable degree of stability, in that the bivariate estimates of β1 and β2 do not differ much from the values with ρε = 0, i.e. the effect of correlated error is not large. For values of ρε greater than about 0.8, however, the picture is radically different. Figures 1a and 1b display two complementary situations, 1a where measurement error is slightly smaller for the first variable and 1b where the second variable has slightly smaller measurement error. For values of ρε below 0.7 or 0.8, the two Figures are similar. For values of ρε greater than 0.8, the two Figures diverge sharply. Figure 1a represents the partial adjustment for measurement error that can result from correlated error,9,10 in that the bivariate estimate of β1 increases as ρε increases. However, as the expected bivariate estimate of β1 (b1,bi) increases, that of β2 (b2,bi) decreases and can become substantially negative. In addition, in some circumstances, b1,bi can exceed its true value of 1 (it is in fact unbounded if ρ1 = ρ2 and ρT, ρε → 1), so that the effect of measurement error in the bivariate situation is not necessarily conservative.
In Figure 1b, in striking contrast to Figure 1a, for large values of ρT and ρε, b1,bi becomes substantially negative, and b2,bi becomes strongly positive. In Figure 1c, the true values of β1 and β2 are equal, but there is a slight difference in the accuracy of measurement of the two variables. The relatively minor impact of the different accuracy of measurement seen when ρε is zero becomes strongly exaggerated as ρε increases towards one, to the point that b2,bi can exceed one, the true parameter value, and b1,bi can become substantially negative. Table 1 presents the effects of varying ρ1 and ρ2, for a limited number of values of ρT and ρε. The effects of correlated error as ρε increases beyond 0.7 become exaggerated as the difference between ρ1 and ρ2 increases.
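The qualitative behaviour described for Figures 1a to 1c can be reproduced numerically. The Python sketch below is an illustration, not the code behind the published figures: it assumes the additive error model with τ1 = τ2, independence of errors and true values, and a closed form for the expected bivariate slopes derived from that model. The function name `expected_bivariate` is hypothetical.

```python
import math


def expected_bivariate(beta1, beta2, rho1, rho2, rho_t, rho_e):
    """Expected bivariate slopes (b1_bi, b2_bi), assuming tau1 = tau2."""
    # Correlation between the observed X1 and X2 implied by the model
    r12 = rho_t * rho1 * rho2 + rho_e * math.sqrt((1 - rho1 ** 2) * (1 - rho2 ** 2))
    denom = 1 - r12 ** 2
    b1 = (beta1 * rho1 * (rho1 - rho2 * rho_t * r12)
          + beta2 * rho1 * (rho1 * rho_t - rho2 * r12)) / denom
    b2 = (beta2 * rho2 * (rho2 - rho1 * rho_t * r12)
          + beta1 * rho2 * (rho2 * rho_t - rho1 * r12)) / denom
    return b1, b2


# Settings of Figures 1a and 1b (beta1 = 1, beta2 = 0) with rho_T = 0.9
for label, rho1, rho2 in (("1a", 0.6, 0.5), ("1b", 0.5, 0.6)):
    for rho_e in (0.0, 0.5, 0.8, 0.9, 0.95):
        b1, b2 = expected_bivariate(1.0, 0.0, rho1, rho2, 0.9, rho_e)
        print(f"Fig {label}  rho_e={rho_e:.2f}  b1_bi={b1:+.3f}  b2_bi={b2:+.3f}")
```

Under these assumptions, b1,bi rises and b2,bi turns negative as ρε approaches one in the Figure 1a setting, while calling the function with β1 = β2 = 1, ρ1 = 0.5 and ρ2 = 0.6 gives the Figure 1c pattern of one coefficient above its true value and the other below zero.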



Figure 1 Expected values, b1,bi and b2,bi, of the estimates of the bivariate regression coefficients of a continuous variable y on two continuous variables measured with error. ρ1 and ρ2 represent the accuracy of measurement of exposures, ρT is the correlation between true values of the two exposures and ρε is the correlation of the errors of measurement of the two variables. β1 and β2 are the true regression coefficients. (a) First variable related to outcome (β1 = 1), second variable not related to outcome (β2 = 0). First variable measured with slightly less error than the second (ρ1 = 0.6, ρ2 = 0.5). (b) As in (a) except the second variable measured with slightly less error (ρ1 = 0.5, ρ2 = 0.6). (c) Both variables equally related to outcome (β1 = 1, β2 = 1), second variable measured with slightly less error (ρ1 = 0.5, ρ2 = 0.6)

 

Table 1 Expected values of the estimates for the regression coefficients of two variables. τ1/τ2 is set equal to one. b1,bi and b2,bi are the expectations of the bivariate estimates of β1 and β2, respectively

 
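The values of the bivariate estimates in relation to the univariate estimates depend on the sign of ρ2ρT − ρ1r12, where r12 = ρTρ1ρ2 + ρε√((1 − ρ1²)(1 − ρ2²)) is the correlation between X1 and X2 implied by the model. If this expression is negative with β1 > 0 and β2 = 0, then one will have b1,bi > b1,uni and b2,bi < 0 (b1,uni and b1,bi being the expected univariate and bivariate estimates of β1, and b2,bi the expected bivariate estimate of β2), i.e. the appearance of negative confounding. If the two variables are measured with equal accuracy, with ρ1 = ρ2, then the condition reduces to ρε > ρT, i.e. the errors are more strongly correlated than the true values.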
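The equal-accuracy case can be checked with a short calculation. The sketch below assumes the additive error model with errors independent of the true values, and takes the governing quantity to be \rho_2\rho_T - \rho_1 r_{12}, with r_{12} the model-implied correlation of X1 and X2; this form is a reconstruction from the model rather than a quotation of the original expression.

```latex
\begin{aligned}
r_{12} &= \rho_T \rho_1 \rho_2 + \rho_\varepsilon \sqrt{(1-\rho_1^2)(1-\rho_2^2)} , \\
\text{with } \rho_1 = \rho_2 = \rho:\quad
r_{12} &= \rho_T \rho^2 + \rho_\varepsilon (1-\rho^2) , \\
\rho_2 \rho_T - \rho_1 r_{12}
  &= \rho \left[ \rho_T (1-\rho^2) - \rho_\varepsilon (1-\rho^2) \right]
   = \rho \, (1-\rho^2) \, (\rho_T - \rho_\varepsilon) .
\end{aligned}
```

Since ρ(1 − ρ²) is positive for 0 < ρ < 1, the quantity is negative exactly when ρε > ρT, which is the stated condition.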

Sensitivity analysis
Given the great range of estimates that can be obtained from varying multivariate error structure for a given true set of parameter values, and the likelihood in most situations that this multivariate error structure is only approximately known, it is clear that results obtained using standard nutritional epidemiology instruments need cautious interpretation. In particular, sensitivity analyses are indicated, to map out the range of plausible true parameter values.

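There are eight unknown parameters: β1, β2, σ1, σ2, τ1, τ2, ρT, and ρε, where τi² is the variance of Ti and σi² the variance of the error εXi. The observations include the univariate and bivariate regression estimates, and the sample variance–covariance matrix for X1 and X2. Using the method of moments, these observed values are equated to their expected values. Given the functional relationship between the four regression coefficients, these reduce to five equations, leaving three unknown parameters. In particular, we have

\[ S_{X_1,X_1} = \tau_1^2 + \sigma_1^2, \qquad S_{X_2,X_2} = \tau_2^2 + \sigma_2^2, \qquad S_{X_1,X_2} = \rho_T \tau_1 \tau_2 + \rho_\varepsilon \sigma_1 \sigma_2, \]

in which SX1,X1, SX2,X2, and SX1,X2 are the sample variances of X1 and X2 and their sample covariance, respectively. Solving the above three equations, we have

\[ \tau_i = \rho_i \sqrt{S_{X_i,X_i}}, \qquad \sigma_i = \sqrt{(1-\rho_i^2)\, S_{X_i,X_i}}, \qquad i = 1, 2, \]

and

\[ \rho_\varepsilon = \frac{r_{12} - \rho_T \rho_1 \rho_2}{\sqrt{(1-\rho_1^2)(1-\rho_2^2)}}, \]

where r12 is the sample correlation coefficient of X1 and X2.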

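As can be seen from Appendix 1, the two univariate and two bivariate estimates are related through two functional equations. There are therefore only two independent expressions, the simpler ones being the univariate estimates. So, for fixed ρ1, ρ2, and ρT, the estimates for β1 and β2 can be obtained by equating the observed univariate estimates to their expectations, i.e., by solving the following two equations: β̂1,uni = ρ1²[β1 + ρT(τ2/τ1)β2] and β̂2,uni = ρ2²[β2 + ρT(τ1/τ2)β1]. The true values of β1 and β2 are, thus, estimated by

\[ \hat\beta_1 = \frac{1}{1-\rho_T^2} \left( \frac{\hat\beta_{1,\mathrm{uni}}}{\rho_1^2} - \rho_T \frac{\tau_2}{\tau_1} \frac{\hat\beta_{2,\mathrm{uni}}}{\rho_2^2} \right) \]

and

\[ \hat\beta_2 = \frac{1}{1-\rho_T^2} \left( \frac{\hat\beta_{2,\mathrm{uni}}}{\rho_2^2} - \rho_T \frac{\tau_1}{\tau_2} \frac{\hat\beta_{1,\mathrm{uni}}}{\rho_1^2} \right), \]

with τ2/τ1 = (ρ2/ρ1)√(SX2,X2/SX1,X1).

We then calculate the values of β1 and β2 for reasonable ranges of ρ1, ρ2, and ρT, to establish the range of true values of β1 and β2 compatible with the data.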
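The sensitivity calculation just described can be sketched in a few lines of Python. This is an illustration under the additive error model with errors independent of the true values, not the code used for the paper; the function name `sensitivity_point` and the numerical inputs are hypothetical and are not EPIC-Norfolk values.

```python
import math


def sensitivity_point(b1_uni, b2_uni, s11, s22, r12, rho1, rho2, rho_t):
    """Return (beta1, beta2, rho_e) implied by one assumed (rho1, rho2, rho_t).

    b1_uni, b2_uni: observed univariate slopes; s11, s22: sample variances
    of X1 and X2; r12: sample correlation of X1 and X2.
    """
    # Method of moments: SDs of the true values from the observed variances
    tau1 = rho1 * math.sqrt(s11)
    tau2 = rho2 * math.sqrt(s22)
    # Error correlation implied by the observed correlation of X1 and X2
    rho_e = (r12 - rho_t * rho1 * rho2) / math.sqrt((1 - rho1 ** 2) * (1 - rho2 ** 2))
    # De-attenuate the univariate slopes, then undo the mutual confounding
    u1 = b1_uni / rho1 ** 2
    u2 = b2_uni / rho2 ** 2
    beta1 = (u1 - rho_t * (tau2 / tau1) * u2) / (1 - rho_t ** 2)
    beta2 = (u2 - rho_t * (tau1 / tau2) * u1) / (1 - rho_t ** 2)
    return beta1, beta2, rho_e


# Hypothetical inputs for standardized X1 and X2 (unit sample variances)
for rho_t in (0.5, 0.7, 0.9):
    print(rho_t, sensitivity_point(0.30, 0.10, 1.0, 1.0, 0.4, 0.6, 0.5, rho_t))
```

Looping over a grid of (ρ1, ρ2, ρT) and discarding combinations for which the implied ρε falls outside [−1, 1] is one simple way to map out the region of parameter values compatible with the data.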

Example
The EPIC-Norfolk study11 has collected extensive dietary and other data and blood samples on over 25 000 men and women aged 45–74 at entry to the study. Plasma vitamin C was measured on 8162 individuals. We have explored the relationship between plasma levels of vitamin C and estimates of dietary intake of vitamin C, fat, and energy, using two dietary instruments, a 7-day diary (7DD) and an FFQ.12 We used data from 5934 individuals for whom the 7DD data had been computerized, and for whom the FFQ and plasma vitamin C data were available. We assume that dietary vitamin C should be the only dietary determinant of plasma vitamin C levels, although confounding may induce associations between fat and energy and plasma vitamin C.

We have performed the analyses both with the original variables and with derived variables obtained from estimated vitamin C and fat intakes by dividing by estimated energy intake, as a method of adjusting for energy. Both the original variables and the derived variables were standardized to make their standard deviations equal to one. The sample correlations between the dietary variables are given in Table 2. For the derived variables vitamin C/energy and fat/energy, the correlation is –0.29 from the 7DD and –0.39 from the FFQ.


Table 2 Sample standard deviations (diagonal) and sample correlation coefficients (off-diagonal) for the original variables and the derived variables

 
Table 3a presents the results of regressing plasma vitamin C levels on the dietary variables, either alone or in combination. On the assumption that dietary vitamin C is the only dietary variable which directly affects plasma vitamin C levels appreciably, the univariate models indicate negative confounding with fat and to a lesser extent total energy. In the bivariate models including dietary vitamin C, all the coefficients are further from zero than in the corresponding univariate regressions. This is contrary to what one would expect if the correlations between the errors were zero (Appendix 1), and is thus indicative of the behaviour seen in Figure 1a (but with the correlation of the true values negative) with moderate values of correlation between the errors. The bivariate model including fat and energy exhibits similar behaviour, but more extreme particularly for the FFQ. The regression coefficients are markedly further from zero than the corresponding univariate coefficients, the coefficients for energy even changing sign. The behaviour thus resembles that seen in Figure 1c, with a value for the error correlation approaching unity.


Table 3 Estimates of regression coefficients for the regression of plasma vitamin C on vitamin C, fat, and energy, and on vitamin C/energy and fat/energy. All dietary factors are standardized by their corresponding standard deviation

 
Table 3b gives corresponding coefficients when dietary vitamin C and fat are replaced by the equivalent energy density variables, the energy density variables being rescaled to make their variances equal to one. The coefficients for vitamin C are very similar to the values in Table 3a. For fat and energy, the bivariate coefficients are all closer to zero than the corresponding values in Table 3a, although the coefficients for energy have changed sign. It is interesting to note that in the model including fat and energy, the coefficient for fat is considerably smaller in Table 3b than in Table 3a, but that the ratio of the coefficient to its standard error is virtually unchanged.

More extensive analyses of these data are presented in a separate paper.13 In that paper, where the emphasis is on the substantive determinants of plasma vitamin C, non-dietary variables (age, sex, smoking status, body mass index, and height) thought to affect plasma vitamin C levels are included in the regressions. Inclusion of the non-dietary variables has little effect on the regression coefficients for the dietary variables. The analyses presented in this paper omitted the non-dietary variables since the motivation was to illustrate the strictly bivariate analyses given earlier.

A sensitivity analysis for the univariate and bivariate regressions indicates the likely values of some of the underlying parameters. For each pair of standardized variables, there are six parameters: β1, β2, ρ1, ρ2, ρT, and ρε. For vitamin C and fat, and for vitamin C and energy, we assume that the true regression coefficient is non-zero only for vitamin C. An approximate estimate of the true value of βvitamin C can be derived from Bates et al.14 and should lie between 20 and 40 mmol/dl plasma vitamin C per standard deviation of dietary vitamin C. Thus, assuming these values of β1 and β2, given the observed sample covariance of the dietary variables and equating the univariate and bivariate coefficients from Table 3a to their expected values, fixing one further parameter then determines the other three parameters. We have chosen to fix ρT, since it is independent of the dietary instrument. We then calculate the values of the other three parameters for the two dietary instruments. Table 4a displays a range of plausible values when considering dietary vitamin C and fat, and Table 4b the corresponding values for the derived density variables. Focusing on the value of the correlation between the errors of estimating vitamin C and fat intake, one notes that this correlation is further from zero in both Tables 4a and 4b for the FFQ than for the 7DD, and that substantial negative correlations have been generated in the density model (not surprisingly, since energy is in the denominator of both variables). The values of ρ1 (dietary vitamin C) are slightly larger for the 7DD than for the FFQ. The values of ρ2 are slightly larger for the FFQ than for the 7DD.
Comparing the values of ρ1 and ρ2 between Tables 4a and 4b indicates that taking energy densities has no impact on the ratio of error to true variance for vitamin C, whereas for fat it substantially improves the ratio of true to error variance, the latter finding being in line with that reported for protein intake, as assessed by the FFQ, in the OPEN study.5 For fat and energy, where the correlation of the true intake is plausibly between 0.7 and 0.9, the error correlation for the diary would lie between 0.85 and 0.90, whereas for the FFQ the range is more plausibly 0.90 to 0.95. It is these high values for the error correlation that generate the regression coefficients for the bivariate fat-energy model in Table 3a, higher for the FFQ than for the diary.


Table 4 Sensitivity analysis. Implied values of ρ1, ρ2, and ρε for two instruments, given the observed univariate and bivariate estimates (a: T1 = vitamin C and T2 = fat; b: T1 = vitamin C/energy and T2 = fat/energy). *calculated. βfat and βfat/energy are set to be zero separately

 

    Discussion
We have explored the effect of measurement error in bivariate regression models when the errors of measurement of the two independent variables are correlated. For moderate values of this error correlation, less than 0.7 say, the effects are at most modest. For values increasing beyond 0.8 the effects can be large, and in extreme cases can be unbounded. Thus for large error correlations, the usual assumption that measurement error leads to biases towards the null can be incorrect. In many epidemiological situations it might be considered unlikely that large error correlations would be present, particularly when dealing with biochemical variables, or other biomarkers. One area, however, where such high correlations may be present is in nutritional epidemiology, with questionnaire methods such as an FFQ or a self-completed diary. It is this context that provides the motivation of the work developed in this paper.

Correlated error is of particular relevance to the question of adjustment for total caloric intake, now a standard procedure in nutritional epidemiology analysis. When using a standard dietary instrument such as an FFQ, however, caloric intake is poorly estimated, with correlations with true energy intake, as assessed by doubly labelled water or heart rate monitoring, between 0.1 and 0.3 being reported.5,15 Weight, or a combination of weight and a simple index of physical activity, is substantially more highly correlated with energy intake, as we report in our related second paper. If the aim of energy adjustment is to mimic an isocaloric experiment, then adjustment for weight, and perhaps additionally physical activity, would be preferable to adjusting for an FFQ derived estimate of energy intake.15 However an additional reason for energy adjustment has been proposed, namely that it provides some control of measurement error6,7 if errors are correlated. The results presented here allow this suggestion to be examined quantitatively. The results in Figure 1 demonstrate that the impact of measurement error on relative risk estimates can be radically modified by adjustment for a second variable with correlated error. The degree of modification, however, is highly sensitive to the value of the error correlation, especially for values above about 0.8, and is also strongly affected by the relative precision with which the two variables are measured. As can be seen from Figure 1, if energy were more precisely measured than the dietary variable being investigated, then the effect of energy adjustment might go in the direction opposite to that intended, even reversing the sign of the estimate of the relevant parameter. Fortuitously in this respect, the precision of estimation of energy intake is poor. 
The effect of different levels of error correlation is illustrated in Table 3a, where the regression coefficient for fat changes markedly when energy is included in the model (more for the FFQ than for the 7DD), reflecting the partial elimination of the effect of measurement error since the error correlation appears to be of the order of 0.9. The coefficient for dietary vitamin C, however, changes only slightly when energy is included, since the error correlation is around 0.2 to 0.3. Since the effect of adjusting for energy depends on a parameter which it is difficult to estimate, the effect will be to some extent unpredictable, and will vary widely depending on the variables under consideration.

Rather than adjust for energy in terms of a standard model as in Table 3a, energy adjustment is often undertaken by the creation of derived variables, using either the energy density method or residuals after regression on energy.9 In the example we have considered, the two approaches give very similar results.13 We present the results of using dietary vitamin C and fat in terms of energy density in Table 3b, the two energy density variables being standardized. Comparison between Tables 3a and 3b of the bivariate model with fat and energy is instructive. One notes that the regression parameters in the bivariate model are markedly smaller in Table 3b, as are the standard errors of the estimates. The change in plasma vitamin C for a standard deviation change in the independent variable is more than 50% smaller for the energy density variable than for untransformed fat intake, and this reduction would equally apply for changes in plasma vitamin C between categories of fat intake defined by percentiles of the distribution. The reason is that much of the variation in fat intake across the population is removed by considering fat as fat/energy, since much of the variation in energy intake comes from variation in fat intake. The effect of expressing fat in terms of energy density is thus twofold: firstly, the error variance is substantially reduced by the cancellation of correlated errors; secondly, the true variance in intake is also substantially reduced, since much of the true variation in energy comes from variation in fat. Adjustment methods that produced the first effect but not the second would clearly be preferable.

Our main conclusions are that:

  1. In bivariate regression, the estimated regression coefficients are strongly affected by correlation between the errors in the two independent variables once this correlation becomes large. This effect can potentially be exploited to reduce the regression dilution effects of measurement error, but the effect is heavily dependent on the relative precision with which the two variables are measured, and on the level of error correlation. These three parameters are difficult to measure with precision.
  2. Adjustment for energy in standard regression models as in Table 3a can reduce the effect of measurement error substantially, but the reduction will vary widely with the dietary variable in question, and with the dietary instrument used. Thus for dietary vitamin C in the example in this paper, energy adjustment had a negligible effect on regression dilution.
  3. For some dietary variables, although not all, the use of energy density (or energy residual models, results not shown) may lead to a reduced ratio of error variance to true variance, and so reduce the role of measurement error. However, taking energy densities may substantially reduce the true relevant variation in the population. When treating several dietary variables simultaneously, it is possible that such derived variables will have lower correlations between their errors and so give rise to more stable estimates, apart from the energy term.
  4. There is a role for energy adjustment using standard regression models to reduce the effect of measurement error, but more satisfactory mitigation of the effect of measurement error may be achieved by considering energy compartments, such as energy from fat or from carbohydrates.

This paper has considered analytically only the bivariate case. Many epidemiological studies report results of analyses of considerably higher dimension, incorporating three, four or even more dietary variables. The analytic complexities increase rapidly with increasing dimensionality, and the implications of different error structures for the regression coefficients appear difficult to predict. It is probable, however, that large error correlations will generate complex behaviour of the type seen in Figure 1.


    Appendix 1
Statistical method
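We denote the point estimates of β1 and β2 from univariate regressions of y on X1 and y on X2 as β̂1,uni and β̂2,uni, respectively, and the point estimates of β1 and β2 from bivariate regression of y on X1 and X2 as β̂1,bi and β̂2,bi, respectively. The expectations b1,uni and b2,uni of the estimates β̂1,uni and β̂2,uni are then given by

\[ b_{1,\mathrm{uni}} = \rho_1^2 \beta_1 + \rho_1^2 \rho_T \frac{\tau_2}{\tau_1} \beta_2 \]

and

\[ b_{2,\mathrm{uni}} = \rho_2^2 \beta_2 + \rho_2^2 \rho_T \frac{\tau_1}{\tau_2} \beta_1 . \]

On the right-hand side of the first of these equations, the term in β1 gives the univariate attenuation factor, and the term in β2 reflects confounding and attenuation. The converse holds for the second equation. The expectations b1,bi and b2,bi of the estimates β̂1,bi and β̂2,bi are given by

\[ b_{1,\mathrm{bi}} = \frac{\rho_1 (\rho_1 - \rho_2 \rho_T r_{12})\, \beta_1 + \rho_1 (\rho_1 \rho_T - \rho_2 r_{12})\, (\tau_2/\tau_1)\, \beta_2}{1 - r_{12}^2} \]

and

\[ b_{2,\mathrm{bi}} = \frac{\rho_2 (\rho_2 - \rho_1 \rho_T r_{12})\, \beta_2 + \rho_2 (\rho_2 \rho_T - \rho_1 r_{12})\, (\tau_1/\tau_2)\, \beta_1}{1 - r_{12}^2} , \]

where r12 = ρTρ1ρ2 + ρε√((1 − ρ1²)(1 − ρ2²)) is the correlation between X1 and X2. In these expressions, confounding and attenuation are ineradicably linked. It should be noted that if β2 = 0 and ρε = 0, then

\[ b_{1,\mathrm{bi}} = \frac{1 - \rho_2^2 \rho_T^2}{1 - \rho_1^2 \rho_2^2 \rho_T^2}\, b_{1,\mathrm{uni}} \]

and

\[ b_{2,\mathrm{bi}} = \frac{1 - \rho_1^2}{1 - \rho_1^2 \rho_2^2 \rho_T^2}\, b_{2,\mathrm{uni}} , \]

i.e., the bivariate estimates are closer to zero than the univariate estimates.

These four expectations are functionally related since we have

\[ b_{1,\mathrm{uni}} = b_{1,\mathrm{bi}} + r_{12} \frac{\rho_1 \tau_2}{\rho_2 \tau_1}\, b_{2,\mathrm{bi}} \]

and

\[ b_{2,\mathrm{uni}} = b_{2,\mathrm{bi}} + r_{12} \frac{\rho_2 \tau_1}{\rho_1 \tau_2}\, b_{1,\mathrm{bi}} . \]

There are then only two functionally independent expressions.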
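The functional relationship between the univariate and bivariate expectations can be verified numerically from the model covariances. The Python sketch below assumes the additive error model with errors independent of the true values and of the residual in y; the function name `expected_slopes` and the parameter values are illustrative choices, not taken from the paper.

```python
import math


def expected_slopes(beta1, beta2, rho1, rho2, rho_t, rho_e, tau1=1.0, tau2=1.0):
    """Expected univariate and bivariate slopes built from model covariances."""
    # Error SDs implied by rho_i = corr(T_i, X_i) and Var(T_i) = tau_i**2
    sigma1 = tau1 * math.sqrt(1 - rho1 ** 2) / rho1
    sigma2 = tau2 * math.sqrt(1 - rho2 ** 2) / rho2
    var1 = tau1 ** 2 + sigma1 ** 2
    var2 = tau2 ** 2 + sigma2 ** 2
    cov12 = rho_t * tau1 * tau2 + rho_e * sigma1 * sigma2
    # Covariances of the observed covariates with the outcome
    cov1y = beta1 * tau1 ** 2 + beta2 * rho_t * tau1 * tau2
    cov2y = beta2 * tau2 ** 2 + beta1 * rho_t * tau1 * tau2
    det = var1 * var2 - cov12 ** 2
    return {
        "b1_uni": cov1y / var1,
        "b2_uni": cov2y / var2,
        "b1_bi": (var2 * cov1y - cov12 * cov2y) / det,
        "b2_bi": (var1 * cov2y - cov12 * cov1y) / det,
        "var1": var1,
        "var2": var2,
        "cov12": cov12,
    }


res = expected_slopes(1.0, 0.0, 0.6, 0.5, 0.9, 0.9, tau1=1.0, tau2=2.0)
# Univariate slope = own bivariate slope + Cov(X1, X2)/Var(X1) * other bivariate slope
lhs = res["b1_uni"]
rhs = res["b1_bi"] + res["cov12"] / res["var1"] * res["b2_bi"]
print(lhs, rhs)
```

The two printed numbers agree up to rounding for any admissible parameter values, which is why only two of the four expectations carry independent information.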


    Appendix 2
Bounds on the bivariate regression coefficients when the errors are uncorrelated
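Without loss of generality, we can take τ1 = τ2 and β1 > β2 > 0. With ρε = 0, the expectations b1,bi and b2,bi of the bivariate estimates of β1 and β2 are given by

\[ b_{1,\mathrm{bi}} = \frac{\rho_1^2 \left[ (1 - \rho_2^2 \rho_T^2)\, \beta_1 + \rho_T (1 - \rho_2^2)\, \beta_2 \right]}{1 - \rho_1^2 \rho_2^2 \rho_T^2} \]

and

\[ b_{2,\mathrm{bi}} = \frac{\rho_2^2 \left[ (1 - \rho_1^2 \rho_T^2)\, \beta_2 + \rho_T (1 - \rho_1^2)\, \beta_1 \right]}{1 - \rho_1^2 \rho_2^2 \rho_T^2} . \]

If ρT ≥ 0, then clearly b1,bi ≥ 0 and b2,bi ≥ 0. In addition, for all ρT, writing b1,bi as

\[ b_{1,\mathrm{bi}} = \frac{\rho_1^2 \left[ (1 - \rho_2^2)(\beta_1 + \rho_T \beta_2) + \rho_2^2 (1 - \rho_T^2)\, \beta_1 \right]}{1 - \rho_1^2 \rho_2^2 \rho_T^2} \]

shows that b1,bi is always positive.

We then write b1,bi + b2,bi as

\[ b_{1,\mathrm{bi}} + b_{2,\mathrm{bi}} = \frac{\left[ \rho_1^2 (1 - \rho_2^2 \rho_T^2) + \rho_2^2 \rho_T (1 - \rho_1^2) \right] \beta_1 + \left[ \rho_2^2 (1 - \rho_1^2 \rho_T^2) + \rho_1^2 \rho_T (1 - \rho_2^2) \right] \beta_2}{1 - \rho_1^2 \rho_2^2 \rho_T^2} \]

= w1β1 + w2β2, say, where w1, w2 ≤ 1 for all ρ1, ρ2, and ρT so that

\[ b_{1,\mathrm{bi}} + b_{2,\mathrm{bi}} \le \beta_1 + \beta_2 . \]

If ρT < 0, then

\[ |b_{1,\mathrm{bi}}| + |b_{2,\mathrm{bi}}| \le w_1^* \beta_1 + w_2^* \beta_2 , \]

where w1* and w2* are given by the expression for w1 and w2 with ρT replaced by |ρT|. We then have 0 ≤ w1*, w2* ≤ 1.

The possible values of b1,bi and b2,bi are therefore bounded by

\[ b_{1,\mathrm{bi}} > 0 \]

and

\[ b_{1,\mathrm{bi}} + |b_{2,\mathrm{bi}}| \le \beta_1 + \beta_2 . \]

These bounds are displayed graphically in Figure 2.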



Figure 2 Bounds on the bivariate regression coefficients when the errors are uncorrelated, τ1 = τ2 and β1 > β2 > 0
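The bounds of this appendix can be checked by brute force over a grid of parameter values. The Python sketch below is an illustration under the stated conditions (uncorrelated errors, τ1 = τ2, β1 > β2 > 0), using expected bivariate slopes derived from the additive error model; the function names, the grid, and the choice β1 = 1, β2 = 0.5 are assumptions made for the example.

```python
def uncorrelated_error_slopes(beta1, beta2, rho1, rho2, rho_t):
    """Expected bivariate slopes when rho_e = 0 and tau1 = tau2."""
    den = 1 - (rho1 * rho2 * rho_t) ** 2
    b1 = rho1 ** 2 * ((1 - rho2 ** 2 * rho_t ** 2) * beta1
                      + rho_t * (1 - rho2 ** 2) * beta2) / den
    b2 = rho2 ** 2 * ((1 - rho1 ** 2 * rho_t ** 2) * beta2
                      + rho_t * (1 - rho1 ** 2) * beta1) / den
    return b1, b2


def check_bounds(beta1=1.0, beta2=0.5):
    """Return (largest excess of |b1| + |b2| over beta1 + beta2, smallest b1)."""
    rhos = [i / 10 for i in range(1, 10)]        # 0.1, ..., 0.9
    rho_ts = [j / 10 for j in range(-9, 10)]     # -0.9, ..., 0.9
    worst_excess = float("-inf")
    smallest_b1 = float("inf")
    for rho1 in rhos:
        for rho2 in rhos:
            for rho_t in rho_ts:
                b1, b2 = uncorrelated_error_slopes(beta1, beta2, rho1, rho2, rho_t)
                worst_excess = max(worst_excess, abs(b1) + abs(b2) - (beta1 + beta2))
                smallest_b1 = min(smallest_b1, b1)
    return worst_excess, smallest_b1


print(check_bounds())
```

A negative first value and a positive second value indicate that, on this grid, the combined magnitude of the two coefficients never exceeds β1 + β2 and that b1,bi stays positive, in line with the bounds derived above.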

 

    Acknowledgments
 
This work has been supported by a Yamagiwa-Yoshida Memorial UICC International Cancer Study Grant. The EPIC-Norfolk Study is funded by the Cancer Research Campaign, British Heart Foundation, Department of Health, Europe Against Cancer Programme of Commission of European Communities, Medical Research Council, Ministry of Agriculture, Fisheries and Food, Wellcome Trust, and the World Health Organisation. MYW was funded by the Wellcome Trust. NW is a Wellcome Trust Senior Clinical Fellow.


    References
1 Rosner B, Spiegelman D, Willett WC. Correction of logistic regression relative risk estimates and confidence intervals for measurement error: the case of multiple covariates measured with error. Am J Epidemiol 1990;132:734–45.

2 Kipnis V, Freedman LS, Brown CC, Hartman AM, Schatzkin A, Wacholder S. Effect of measurement error on energy-adjustment models in nutritional epidemiology. Am J Epidemiol 1997;146:842–55.

3 Subar AF, Thompson FE, Kipnis V et al. Comparative validation of the Block, Willett, and National Cancer Institute food frequency questionnaires. Am J Epidemiol 2001;154:1089–99.

4 Day NE, McKeown N, Wong MY, Welch A, Bingham S. Epidemiological assessment of diet: a comparison of a 7 day diary with a food frequency questionnaire using urinary markers of nitrogen, potassium and sodium. Int J Epidemiol 2001;30:309–17.

5 Kipnis V, Subar AF, Midthune D et al. Structure of dietary measurement error: results of the OPEN biomarker study. Am J Epidemiol 2003;158:14–21.

6 Willett WC. Dietary diaries versus food frequency questionnaires—a case of undigestible data. Int J Epidemiol 2001;30:317–19.

7 Day NE, Ferrari P. Some methodological issues in nutritional epidemiology. In: Riboli E, Lambert R. Nutrition and Lifestyle: Opportunities for Cancer Prevention. IARC Scientific Publications No. 156, 2002, pp. 5–10.

8 Wong MY, Day NE, Wareham NJ. Measurement error in epidemiology: the design of validation studies II: bivariate situation. Stat Med 1999;18:2830–45.

9 Willett W. Nutritional Epidemiology. New York: Oxford University Press, 1998.

10 Day NE. Author's response to the commentary on the paper 'Epidemiological assessment of diet: a comparison of a 7 day diary with a food frequency questionnaire using urinary markers of nitrogen, potassium and sodium'. Int J Epidemiol 2002;31:692–93.

11 Day NE, Oakes S, Luben R et al. EPIC in Norfolk: study design and characteristics of the cohort. Br J Cancer 1999;80(Suppl. 1):95–103.

12 Bingham S, Gill C, Welch A et al. Validation of dietary assessment methods in the UK arm of EPIC. Int J Epidemiol 1997;26:S137–151.

13 Michels KB, Bingham S, Luben R, Welch A, Day NE. The effect of correlated measurement error in multivariate models of diet. Am J Epidemiol. (in press).

14 Bates CJ, Thurnham DI, Bingham SA, Margetts BM, Nelson M. Biochemical markers of nutrient intake. In: Margetts BM, Nelson M. Design Concepts in Nutritional Epidemiology. 2nd Edn. Oxford: Oxford University Press, 1997.

15 Jakes RW, Day NE, Luben R et al. Adjusting for energy intake—what measure to use in nutritional epidemiological studies? Int J Epidemiol 2004;33:1382–86.