Regression Calibration in Studies with Correlated Variables Measured with Error

Gary E. Fraser1 and Daniel O. Stram2

1 Center for Health Research, School of Public Health, Loma Linda University, Loma Linda, CA.
2 Department of Preventive Medicine, School of Medicine, University of Southern California, Los Angeles, CA.


    ABSTRACT
 
Regression calibration is a technique that corrects biases in regression results in situations where exposure variables are measured with error. The existence of a calibration substudy, where accurate and crude measurement methods are related by a second regression analysis, is assumed. The cost of measurement error in multivariate analyses is loss of statistical power. In this paper, calibration data from California Seventh-day Adventists are used to simulate study populations and new calibration studies. Applying regression calibration logistic analyses, the authors estimate power for pairs of nutritional variables. The results demonstrate substantial loss of power if variables measured with error are strongly correlated. Biases in estimated effects in cases where regression calibration is not performed can be large and are corrected by regression calibration. When the true coefficient has zero value, the corresponding coefficient in a crude analysis will usually have a nonzero expected value. Then type I error probabilities are not nominal, and the erroneous appearance of statistical significance can readily occur, particularly in large studies. Major determinants of power with use of regression calibration are collinearity between the variables measured with error and the size of correlations between crude and corresponding true variables. Where there is important collinearity, useful gains in power accrue with calibration study size up to 1,000 subjects.

bias correction; bias (epidemiology); measurement error; models, statistical; regression analysis; regression calibration; statistical power; statistical significance

Abbreviations: FFQ, food frequency questionnaire; SE, standard error


    BACKGROUND
 
Because of cost and practical considerations, large epidemiologic studies of diet-disease relations are usually constrained to simple methods of dietary assessment, most commonly food frequency questionnaires (FFQs). Correlations between estimates from these questionnaires and those obtained from a reference method are often in the range 0.3–0.7, which suggests substantial error. The effect of errors is to bias both estimates of the effect of diet on disease and statistical significance, since events are dependent on true diet rather than crude estimates.

Recently, statistical techniques have been developed to help correct this bias, usually in the context of a calibration study (1–3). These methods include regression calibration and the analysis of structural models with latent variables (4). In both cases, the intent is to estimate the magnitude of disease associations with the unobservable true dietary intake.

The regression calibration method has been carefully described for multivariate logistic and proportional hazards models (5, 6), both of which have application to nutritional epidemiology. Thus, we focus here on regression calibration and illustrate the loss of statistical power caused by errors in exposure variables. This loss depends partly on the size of the calibration study, but the details of this relation are not well understood. Power also depends on the magnitude of the correlation between the data from the FFQ used and the true data (T), as well as the degree of collinearity with other variables in the model.


    REGRESSION CALIBRATION
 
If there is only one variable measured with error, and this error follows the classical error model (i.e., errors about the truth are random and have a mean of zero), estimates of the regression coefficient $\beta_s$ from traditional regression analyses are biased toward the null value. Where there are two or more such variables, biases may point in either direction, and there are important cases in which bias points away from the null. This is an example of residual confounding, i.e., loss of control of confounding due to errors in surrogate variables (1, 7–10). Moreover, estimates for variables that are not measured with error will also be biased (5).

Regression calibration for linear models will correct such biases if the assumptions are met. The concept is simple. Because true dietary data are not available for the regression, one substitutes the expected value of the true variable (conditional on observed data measured with error) (11). For nonlinear models, the estimates obtained are approximately unbiased (1, 5, 6). Specifically, for logistic regression, Rosner et al. (12) have shown good performance for odds ratios of 3.0 or less.
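The substitution idea can be illustrated with a minimal simulation sketch. It assumes a single exposure, a linear (not logistic) outcome model, and classical error; the function name, sample sizes, and parameter values are hypothetical and are not taken from the study.

```python
import numpy as np

def simulate_regression_calibration(n_cohort=20000, n_calib=2000,
                                    beta_true=0.5, error_sd=1.0, seed=0):
    """Return (crude_slope, corrected_slope) for one simulated data set."""
    rng = np.random.default_rng(seed)

    # Main cohort: true exposure, error-prone measurement, linear outcome.
    x_true = rng.normal(0.0, 1.0, n_cohort)
    x_ffq = x_true + rng.normal(0.0, error_sd, n_cohort)  # classical error
    y = beta_true * x_true + rng.normal(0.0, 1.0, n_cohort)

    # Separate calibration sample in which both measurements are observed.
    xc_true = rng.normal(0.0, 1.0, n_calib)
    xc_ffq = xc_true + rng.normal(0.0, error_sd, n_calib)

    # Calibration regression: E(X_T | X_FFQ) = a + b * X_FFQ.
    b, a = np.polyfit(xc_ffq, xc_true, 1)

    # Crude analysis uses X_FFQ directly; the corrected analysis
    # substitutes the predicted value of X_T from the calibration equation.
    crude_slope = np.polyfit(x_ffq, y, 1)[0]
    corrected_slope = np.polyfit(a + b * x_ffq, y, 1)[0]
    return crude_slope, corrected_slope

crude, corrected = simulate_regression_calibration()
print(f"crude slope: {crude:.3f}, corrected slope: {corrected:.3f}")
```

With these assumed settings the error variance equals the true variance, so the crude slope should be attenuated to roughly half of the true value, while the corrected slope should lie close to it. The standard error reported by an ordinary regression of Y on the substituted values would not reflect the uncertainty in a and b.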

The expected values, conditional on the observed food frequency and other data not measured with error, are derived from a calibration study conducted in a much smaller sample that represents the cohort. Then the calibration equation is


where there are D dietary variables denoted by subscript k (one of which is the dependent dietary variable of interest h), there are J other variables not measured with error, i = 1, 2, ... N subjects, and c indicates the calibration model.
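A form of the calibration equation that is consistent with the definitions above, and with the coefficient notation used later in this section (e.g., $\beta_{1c1}$, $\beta_{2c1}$), is sketched below. The intercept symbol $\alpha_{hc}$, the symbol $\gamma_{hcj}$ for the coefficients of the error-free variables, and the exact subscript layout are assumptions made for illustration.

```latex
E\left(X_{i,h,T} \mid X_{i,FFQ}, Z_i\right)
  = \alpha_{hc}
  + \sum_{k=1}^{D} \beta_{hck}\, X_{i,k,FFQ}
  + \sum_{j=1}^{J} \gamma_{hcj}\, Z_{i,j},
\qquad i = 1, 2, \ldots, N .
```

One such equation is fitted for each dietary variable h that is measured with error, using the calibration study data.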

A problem with this simple substitution algorithm is that estimates of standard errors (SEs) of the corrected coefficients produced by standard analytical packages will be too small, because no account is taken of sampling variability of the calibration study regression coefficients. Fortunately, there are methods that provide unbiased estimates of the standard errors (5, 6), but these methods are not yet included in standard analytical packages. Thus, a simple substitution strategy using standard linear, logistic, or proportional hazards analyses will produce a set of coefficients that are approximately unbiased estimates of true coefficients, $\beta_d$, but special routines are necessary to calculate their standard errors.

Why is there loss of statistical power when regression calibration is used, as compared with the case where $X_T$ values are known for the whole cohort? A primary reason is the reduced variability of the predictor variables in the new regression model. With regression calibration, we substitute $E(X_T \mid X_{FFQ})$ as the predictor variable in place of $X_T$, and $E(X_T \mid X_{FFQ})$ is always less variable than $X_T$, because (13)

$$\mathrm{Var}(X_T) = \mathrm{Var}\left[E(X_T \mid X_{FFQ})\right] + E\left[\mathrm{Var}(X_T \mid X_{FFQ})\right].$$

In general, the greater the variability of the predictor variables, the greater the power to detect a nonzero slope in the regression analysis. In addition, there may be more variability in Y given $X_{FFQ}$ than in Y given $X_T$, which also reduces power. However, for binary Y, this second issue can be neglected, since the variance of Y is uniquely determined by its mean value.

For a univariate linear calibration equation, $\mathrm{Var}\left[E(X_T \mid X_{FFQ})\right] = R^2\,\mathrm{Var}(X_T)$, where R is the correlation coefficient between $X_T$ and $X_{FFQ}$. Thus, to a first approximation, $1/R^2$ times as many subjects are required to detect a nonzero regression between Y and $X_{FFQ}$ using regression calibration than if $X_T$ were available. Since R is rarely above 0.7 in dietary assessments, the implication is that at least twice as many subjects are needed ($1/0.7^2 \approx 2.04$) to make up for errors in estimating $X_T$ using the FFQ. This loss of power is due to the presence of measurement errors, not to the use of the regression calibration method, and it would be present even if the regression calibration coefficients were known exactly.
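The first-approximation rule above is easy to tabulate. The following sketch computes the factor $1/R^2$ for a few validity correlations; the function name is hypothetical.

```python
def sample_size_inflation(r):
    """Approximate factor by which the number of subjects must grow, 1 / R^2."""
    if not 0.0 < abs(r) <= 1.0:
        raise ValueError("r must be a nonzero correlation in [-1, 1]")
    return 1.0 / (r * r)

# Validity correlations spanning the range often reported for FFQs.
for r in (0.3, 0.5, 0.7):
    factor = sample_size_inflation(r)
    print(f"R = {r:.1f}: about {factor:.1f} times as many subjects")
```

At the lower end of the range of validity correlations commonly seen for FFQs (R = 0.3), the factor exceeds 11, which shows how quickly the required study size grows as validity falls.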

In the univariate situation, it is well known (2) that $\hat{\beta}_d = \hat{\beta}_s/\hat{\beta}_c$. If $\hat{\beta}_c$ is treated as a constant, $SE(\hat{\beta}_d) = SE(\hat{\beta}_s)/\hat{\beta}_c$. The ratio $\beta/SE(\hat{\beta})$ determines power (see below), and it can be seen that this ratio is identical for $\hat{\beta}_s$ and $\hat{\beta}_d$ (with a very large calibration study). Thus, univariate analyses using uncorrected nutrient measurements will also require approximately $1/R^2$ times as many subjects for equivalent power as when $X_T$ is available.

For multivariate problems, the issues are similar but more complicated, since the correlations between $X_{1,T}$ and $X_{2,T}$ become potentially important determinants of study power and biases can point both toward and away from the null. In univariate analyses, uncorrected estimates and their standard errors can be used to perform appropriate tests for nonzero regression parameters $\beta$. However, in multivariate analyses, the uncorrected estimates will usually provide invalid tests of the hypothesis that $\beta_1 = 0$ (see below), in that the actual type I error rate is not nominal. This emphasizes the importance of correcting for measurement errors when assessing the independent effect of one exposure given the effect of another.

In summary, power may be adversely affected in regression calibration by a number of mechanisms: 1) a small study size, as expressed by the number of events, Y; 2) multicollinearity between the predictor variables in the disease regression (the typical collinearity problem); 3) poor validity, as measured by the correlation between truth and FFQ (this depends in part on the residual error in the calibration equations); and 4) imprecision in the estimate of $E(X_T \mid X_{FFQ}, Z)$ due to imprecision of the calibration equation coefficients, $\beta_c$, which is also adversely affected by collinearity between variables in the calibration equation.

Designing a calibration study of sufficient size will diminish the fourth type of problem by decreasing the variance of the estimated $\beta_{ck}$'s. The third type of problem will persist despite a large calibration study, but its effect on power is diminished by a larger cohort. If the validity correlation (that between $X_{FFQ}$ and $X_T$) is low, the expected values of $X_{i,k,T}$ will often differ substantially from the actual individual true values. However, it is these true values that determine the individual disease outcomes, so although $\hat{\beta}_{dk}$ is unbiased, its variance will be increased.

Referring to the work of Rosner et al. (5) and applying this to models with two dietary variables and two variables measured without error,

$$\hat{\beta}_d = \hat{\beta}_s \hat{\Lambda}^{-1}, \tag{1}$$

where $\hat{\beta}_s$ is a vector of coefficients obtained by using the food frequency variables in an uncorrected logistic model, $\hat{\beta}_d$ are the unbiased coefficients obtained by regression calibration, and $\Lambda$ is the matrix incorporating coefficients from the calibration equation.

As a consequence of equation 1, in a crude analysis a particular $\beta_{sk}$ represents a linear combination of the true $\beta_{dk}$'s. This confounding is particularly influential when variables are importantly correlated, because then the off-diagonal values of the upper rows of $\Lambda$ are greater. Since $X_{2,T}$ is poorly measured by $X_{2,FFQ}$, a correlation between $X_{1,T}$ and $X_{2,T}$ implies that the measurements $X_{1,FFQ}$ contain additional information about $X_{2,T}$, an example of residual confounding. Then the estimate of the effect of $X_1$ given $X_2$ is biased.

It is apparent that even when $\beta_{dk} = 0$, $\beta_{sk}$ in general is not zero. For instance, $\beta_{s1} = \beta_{d1}\beta_{1c1} + \beta_{d2}\beta_{2c1}$, where there are two dietary variables in the model, but only the first term becomes zero under the null for the first variable. Thus, type I error rates are distorted in a crude analysis when the true value of $X_1$ is correlated with the measured $X_2$, even after conditioning on the measured $X_1$, a problem also alluded to by Armstrong et al. (7).
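The way a null true coefficient turns into a nonnull crude coefficient can be checked numerically. In the sketch below, the matrix `lam` and the vector `beta_d` hold hypothetical values chosen for illustration (they are not estimates from the calibration study), and the row and column layout of the matrix is an assumption inferred from the example $\beta_{s1} = \beta_{d1}\beta_{1c1} + \beta_{d2}\beta_{2c1}$.

```python
import numpy as np

# Hypothetical calibration coefficients (assumed layout): row h, column k
# holds the coefficient of measured variable k in the calibration equation
# for true variable h. The last two rows and columns correspond to two
# covariates measured without error.
lam = np.array([
    [0.40, 0.15, 0.01, 0.02],   # equation for true dietary variable 1
    [0.20, 0.50, 0.00, 0.03],   # equation for true dietary variable 2
    [0.00, 0.00, 1.00, 0.00],   # error-free covariate Z1 predicts itself
    [0.00, 0.00, 0.00, 1.00],   # error-free covariate Z2 predicts itself
])

# True coefficients, with a null effect for the first dietary variable.
beta_d = np.array([0.0, -0.30, 0.06, 0.22])

# Expected crude coefficients: each is a linear combination of the
# true coefficients, weighted by calibration coefficients.
beta_s = beta_d @ lam

# Regression calibration inverts the relation to recover beta_d.
beta_d_recovered = np.linalg.solve(lam.T, beta_s)

print("crude coefficient for variable 1:", beta_s[0])
print("recovered true coefficients:", beta_d_recovered)
```

Here the crude coefficient for the first dietary variable equals $\beta_{d2}\beta_{2c1}$, which is nonzero although $\beta_{d1} = 0$, so a test based on the crude coefficient does not test the intended null hypothesis.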


    METHODS
 
The index calibration study
In a validity study of diet, 159 Seventh-day Adventist men and women were randomly selected from a random sample of 40 churches in California (14). These adult subjects completed an extensive FFQ (202 questions) between two blocks of 24-hour recalls. The four recalls in each block included a nonconsecutive Saturday and Sunday and two randomly selected weekdays from different weeks. Each recall was separated from the previous recall by at least 10 days, and the two blocks were separated by 3–4 months. Recall data were obtained by telephone using Nutrition Data System (15) software, which was also used to convert food consumption data to nutrient values.

Thus, we obtained estimates of nutrient intake from the FFQ and also from a "reference" method constructing a synthetic week by appropriate choices of weights for weekend and weekday 24-hour recall data. In the following simulation, these reference data are used as a model with which to provide realistic covariance structures and calibration $\beta$'s when generating hypothetical true dietary data. Both reference and food frequency data were energy-adjusted by the residual method (16), and the variables $X_T$ and $X_{FFQ}$ below refer to energy-adjusted values. For the purpose of this illustration, we shall ignore the fact that the subjects of the above calibration study were clustered by church.

Simulating a cohort population and a separate calibration study
The steps were as follows for each combination of factors (e.g., cohort size, calibration study size, dietary variables, etc.), programming in S-Plus (Insightful Corporation, Seattle, Washington).

  1. Generate full cohort populations of age and sex data (the distributions in our previous Adventist cohort (17) were used for this). The cohorts were of size 50,000, 200,000, and 400,000 when the numbers of simulated cases were 260, 1,040, and 2,080, respectively.
  2. Using calibration study data, regress the two "true" nutrients on age and sex (labeled $Z_1$ and $Z_2$ below). Then use this multivariate regression to generate a normally distributed set of "true" variables (labeled $X_T$ below) for the full cohort population, maintaining the observed variance-covariance residual structure of all variables.
  3. Use the hypothesized odds ratios to find the corresponding true vector $\beta_d$, calculated as the log odds ratios. Then $\log(\mathrm{odds}_i) = \alpha + \sum_k \beta_{dk} X_{ikT} + \sum_j \beta_{dj} Z_{ij}$ for subject i. Choosing $\alpha$ so as to produce the desired number of cases in the whole population, a binomial distribution is used with $\mathrm{odds}_i/(1 + \mathrm{odds}_i)$ as the parameter to produce simulated disease events, Y, in the whole population. An incidence similar to that seen in our previous cohort for colon cancer (18) was chosen for this illustration. All simulations were based on alternate hypothesis values equaling odds ratios of 1.06 per year of age, 1.25 for male sex, and 2.0 (or 0.5) when comparing midpoints of extreme quintiles of true dietary variables. Cases for a particular analysis were conditioned only on the two dietary variables relevant to that analysis and on Z.
  4. Using the original observed calibration data in a bivariate regression, regress $X_{1,FFQ}$ and $X_{2,FFQ}$ on $X_{1,T}$, $X_{2,T}$, $Z_1$, and $Z_2$.
  5. Use this fitted model based on the original calibration study to generate values of $X_{1,FFQ}$ and $X_{2,FFQ}$ for the cohort, conditional on the $X_{1,T}$ and $X_{2,T}$ (generated in step 2 above), and also $Z_1$ and $Z_2$, adding errors according to the estimated covariance matrix from step 4.
  6. Select a random n subjects from the generated total population for a new external calibration study that is excluded from the cohort for subsequent analyses of disease outcome.
  7. Choose a random sample of noncases, plus all disease cases, to use in an unmatched nested case-control analysis. We chose this type of analysis to reduce computation time. A 1:10 case:control ratio was chosen, which should only slightly affect standard errors as compared with a full cohort analysis (19).
  8. Perform a regression calibration analysis, using a logistic model, the generated nested case-control population, their $X_{1,FFQ}$, $X_{2,FFQ}$, $Z_1$, and $Z_2$ values, and the new calibration study data, according to the method of Rosner et al. (5). For the purpose of this illustration, we have assumed a short follow-up period and an unmatched nested case-control design, so that the use of an unconditional logistic analysis is appropriate (20).
  9. Repeat steps 3 and 5–8 p times and use the square root of the average of the variance estimates for $\hat{\beta}_d$ to estimate $SE(\hat{\beta}_d)$.
  10. Estimate power as


    , where F is the standardized normal distribution function.
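The arithmetic in steps 3, 9, and 10 can be sketched as follows. The power expression used here is the usual two-sided normal approximation, $F(|\beta_d|/SE(\hat{\beta}_d) - z_{1-\alpha/2})$, which is an assumption about the form of the formula in step 10. Taking the midpoints of the extreme quintiles to be the 10th and 90th percentiles of a normally distributed exposure is also an assumption, and all function names are hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist, mean

STD_NORMAL = NormalDist()  # standard normal distribution, F in the text

def log_odds_per_sd(odds_ratio, lower_pct=0.10, upper_pct=0.90):
    """Step 3: log odds ratio per SD of a normal exposure, given the odds
    ratio between the assumed midpoints of the extreme quintiles."""
    spread_in_sd = STD_NORMAL.inv_cdf(upper_pct) - STD_NORMAL.inv_cdf(lower_pct)
    return log(odds_ratio) / spread_in_sd

def pooled_se(variance_estimates):
    """Step 9: square root of the average of the p variance estimates."""
    return sqrt(mean(variance_estimates))

def approximate_power(beta, se, alpha=0.05):
    """Step 10 (assumed form): two-sided power under a normal approximation,
    ignoring the small chance of rejecting in the wrong direction."""
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    return STD_NORMAL.cdf(abs(beta) / se - z_crit)

# Hypothetical example: a twofold odds ratio between extreme quintiles and
# three made-up variance estimates from repeated simulations.
beta_d = log_odds_per_sd(2.0)
se_d = pooled_se([0.0081, 0.0100, 0.0121])
print(f"beta per SD = {beta_d:.4f}, SE = {se_d:.4f}, "
      f"power = {approximate_power(beta_d, se_d):.3f}")
```

Averaging variances before taking the square root, rather than averaging standard errors, follows the wording of step 9.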

Each repetition of steps 3 and 5–8 produces one estimate of $SE(\hat{\beta})$. We could have based our entire analysis upon a single set of simulated data for each configuration of parameter values considered below. However, for the finite data set sizes used, some sampling variability remains in the estimate of $SE(\hat{\beta})$ obtained from a single simulation. In order to reduce this sampling variability to an ignorable level, we chose to repeat these steps a total of 30 times, as we found that p = 20–30 provided very stable estimates for each parameter configuration. If, instead of using the Rosner et al. (5) formula, we had empirically attempted to estimate $SE(\hat{\beta})$ by simulation alone, we would have needed many more simulations.


    RESULTS
 
We have chosen three pairs of variables that are of biologic interest and may be included together in analyses to explain risk of colon cancer or other cancer. These are 1) dietary fiber and total fat, 2) calcium and total fat, and 3) vitamin C and total fat. The first pair of variables is highly correlated, as are their errors, whereas the second pair was selected because its members are essentially uncorrelated. The third pair was selected to be moderately correlated, but also because the measurement of vitamin C has relatively poor validity in our data.

Along with other correlations, those between the energy-adjusted food frequency nutrient estimates and the corresponding reference (used as a surrogate for true) data in the calibration study are shown in table 1. The correlation for vitamin C is noticeably lower than the other correlations at 0.34. Fat, a constituent of all three models, whether measured by repeated 24-hour recalls or the FFQ, is highly correlated with the corresponding dietary fiber variable, weakly correlated with calcium, and modestly correlated with vitamin C.


TABLE 1. Correlations between reference and food frequency questionnaire (FFQ) dietary intake data in a calibration study*

 
Power using regression calibration is estimated for six sizes of the calibration study: n = 50, 100, 250, 500, 1,000, and 2,000. Figure 1 shows power estimates for regressions containing fat and fiber as dietary variables, at different sizes of the calibration study and the cohort, the latter indicated by the number of disease cases. Clearly substantial gains in power result from calibration study sizes up to 1,000 subjects, although if the cohort is small, even large calibration studies are of little benefit. A similar analysis for the weakly correlated variables fat and calcium (figure 2) demonstrates greater power than that in the fat/fiber analyses for equivalent cohort sizes, but there is little gain beyond a calibration study size of 250–500 subjects. Note that power for fat is somewhat lower than that for fiber or calcium at a particular calibration study size, reflecting the greater validity of estimation of fiber and calcium intakes. The effect of poor validity of the FFQ nutrient estimator is shown with vitamin C (figure 3), where power suffers in comparison with fat, although larger cohort sizes can substantially overcome this problem.



FIGURE 1. Effect of calibration study size and cohort size* on statistical power† (α = 0.05) with regression calibration: fat and fiber. (* Mean of 30 simulations, except for n = 2,080, when a mean of 15 was used. † Alternative hypotheses are of twofold relative risks between extreme quintiles of true dietary data.)

 


FIGURE 2. Effect of calibration study size and cohort size* on statistical power† (α = 0.05) with regression calibration: fat and calcium. (* Mean of 30 simulations, except for n = 2,080, when a mean of 15 was used. † Alternative hypotheses are of twofold relative risks between extreme quintiles of true dietary data.)

 


FIGURE 3. Effect of calibration study size and cohort size* on statistical power† (α = 0.05) with regression calibration: fat and vitamin C. (* Mean of 30 simulations, except for n = 2,080, when a mean of 15 was used. † Alternative hypotheses are of twofold relative risks between extreme quintiles of true dietary data.)

 
Biased results are expected when disease events are simulated conditional on true data but FFQ data are used in the disease regression without correction (table 2). To ensure that the beta coefficients for true and FFQ data are comparable, both dietary variables are converted to standardized units, using two different kinds of units in turn. Because the range of values expressed in the same units (e.g., energy-adjusted grams per day) may be quite different in the true data and the FFQ data, we first standardized to the difference between extreme quintiles of the population as measured by either true data or FFQ data. The second method used units common to both FFQ data and true data, with no consideration of the different standard deviations.


TABLE 2. Bias that results when regression calibration is not used: a comparison of true and crude odds ratios*

 
As can be seen in table 2, odds ratios from uncorrected models are substantially biased. This is especially true when the standard is energy-adjusted units per day, reflecting a wider range of such values in the FFQ estimates. The poor validity of the vitamin C index appears to have resulted in particularly strong biases for this variable.

In contrast, when regression calibration is used (calibration study size = 1,000), the mean values of 30 simulated beta coefficients are 0.174 (fat) and -0.161 (fiber) for the fat/fiber model, 0.186 (fat) and -0.103 (calcium) for the fat/calcium model, and 0.181 (fat) and -0.382 (vitamin C) for the fat/vitamin C model. Apart from sampling errors, these results are close to expectation (see the "True β" columns on the right side of table 2) and a great improvement on the crude results. In addition, the means of estimated variances produced by the Rosner et al. (5) method were close to the empirical variances of $\hat{\beta}_d$, as expected.

A comparison of power under three different models is shown in table 3. The three models are the regression analysis in which true values are known, the regression calibration model (with a large calibration study used to give stability to expected values), and the uncorrected (crude) analysis. The quite dramatic loss of power in both of the latter models is clear, as predicted above.


TABLE 3. Estimated statistical power (α = 0.05) obtained using either underlying true regression, regression calibration, or crude regression

 
Power with a crude analysis is a good deal greater than that in a corrected analysis when there is collinearity (e.g., fat and fiber). When the nutrients are nearly uncorrelated (e.g., fat and calcium), power in a crude analysis approximates that in a corrected analysis with a large calibration study, similar to univariate results. Interestingly, for vitamin C and calcium, power is actually greater with regression calibration (with a large calibration study), and this seems to be due to an improvement in the precision of estimation of true vitamin C and calcium when fat, age, and sex are predictors in the multivariate calibration equations.

In table 4, each entry is the observed size of the test of the null hypothesis for that variable in a crude analysis. The true coefficient for that variable was set to zero in the simulation, other coefficients being at alternative-hypothesis values. As expected, the size with regression calibration (not shown) was always nominal at 0.05. The results demonstrate that a crude analysis can result in markedly supernominal type I error estimates. Events are hypothesized as depending on FFQ data, but in fact they are conditional on true data (as they were simulated in table 4). When the true effect is null, a nonnull apparent effect, often of substantial magnitude, persists. To reiterate, this is due to the correlation between the nutrients, as measured. Because of the distortions in type I error rates, power calculations based on crude estimates may be quite incorrect.


TABLE 4. Size of tests of statistical significance in multivariate analyses without correction for measurement error (nominal size = 0.05)*

 

    DISCUSSION
 
The above simulations demonstrate that biases present when crude food frequency data are used alone in a disease regression are essentially eliminated by regression calibration. It is instructive to observe the magnitude of the biases present when regression calibration is not used and to ponder their effects on both effect estimates and statistical significance in the large body of diet-disease literature that generally has not used any error correction techniques and has often produced conflicting or inconsistent results.

The reference data that we used to simulate "true" data in these analyses may not properly represent the actual errors between crude and (unobservable) true data. Thus, the calculated results may not represent true fat, calcium, fiber, and vitamin C intakes in this population, could they be measured. However, our purpose was to further explore some aspects of the regression calibration method, and we have used a variety of correlational structures for illustration.

A little-recognized fact is that a null true effect will generally not correspond to a null effect in a crude multivariate analysis, although this has been described by other investigators (8, 21, 22). A test of both dietary coefficients treated together would not have this problem, but separate tests are subject to residual confounding by the other dietary coefficient(s). This will result in either overestimates or underestimates of statistical significance, or even reversal of statistical significance between two variables (10). A larger study size will not change the biased nonzero "null" beta coefficients. In fact, the likelihood of spurious statistical significance is especially great then because of the smaller standard errors.

We have demonstrated that exposure measurement error can substantially reduce statistical power, even after bias reduction by regression calibration. This reduction in power is less when both food frequency variables have good validity and are not highly correlated. The effect of moderate collinearity between measured dietary variables in reducing power can be just as severe as that of relatively poor validity, a point also noted by Wong et al. (23). Many epidemiologists prefer confidence intervals to tests of statistical significance. Reduced power results from greater standard errors of the coefficients, and this, of course, is directly related to wider confidence intervals after bias correction.

Although a validation study size of 100–200 subjects has often been selected in practice (4, 23, 24), our results show that this may be inadequate when multivariate calibration is the goal. A validation study that produces an acceptably precise validity correlation may be suboptimal for regression calibration where there is collinearity between the variables. In a large cohort, the cost of increasing a calibration study from 200 individuals to 1,000 may be only a modest proportion of the total budget, and it can lead to important gains in statistical power. Other researchers have considered the optimal size of a validation study for the purpose of calibration and the number of recalls/diaries needed per person when costs are fixed (25, 26), but not in the context of multivariate analyses.

It is of interest that regression calibration with approximately uncorrelated variables and with a relatively large calibration study does not lose much power as compared with an uncorrected analysis using FFQ data only. This is similar to the result for a univariate regression calibration analysis. However, when variables in a regression calibration analysis are importantly correlated, this has a "double" effect, increasing the standard errors of both the calibration $\beta$'s and the disease regression $\beta$'s.

Of course, misspecification of the calibration model would be a serious problem and might lead to new biases in both beta coefficients and their standard errors. Many nutritional variables are not normally distributed. Hence, study of the goodness of fit of a linear regression for the calibration model, the use of transformations where necessary, and a search for influential outliers are all mandatory.

Although in practice we cannot observe true dietary values for individuals, if a reference method can be assumed to follow the classical error model $X_{ref,i} = X_{T,i} + \varepsilon_i$, $E(\varepsilon_i) = 0$ for all i, then calibrating to this reference method will also give unbiased estimates of $\beta_d$ (27). This is because, in such a situation, the calibration $\beta$'s, $\beta_c$, will be unbiased estimates of those produced when calibrating to $X_T$ and so will produce unbiased estimates of $E(X_T \mid X_{FFQ}, Z)$ for regression calibration. Expressed a little differently, in this situation, errors in estimating $X_T$ from $X_{ref}$ are independent both of errors in estimating $X_T$ using $X_{FFQ}$ and of the outcome Y, conditional on $X_T$ (27).
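The reasoning can be written out in one line. The sketch below assumes, slightly more strongly than $E(\varepsilon_i) = 0$, that the reference error has mean zero conditional on the FFQ values and the error-free covariates, which is what the independence statement above implies.

```latex
E\left(X_{ref} \mid X_{FFQ}, Z\right)
  = E\left(X_T + \varepsilon \mid X_{FFQ}, Z\right)
  = E\left(X_T \mid X_{FFQ}, Z\right) + E\left(\varepsilon \mid X_{FFQ}, Z\right)
  = E\left(X_T \mid X_{FFQ}, Z\right).
```

Under that assumption, a calibration regression of $X_{ref}$ on $X_{FFQ}$ and Z targets the same conditional expectation as a regression of $X_T$ would, although the residual variance is larger because it also contains the variance of $\varepsilon$.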

In conclusion, if a reference method is found such that the classical error model can be reasonably assumed, regression calibration produces unbiased estimates of the true $\beta$'s and thereby becomes a very useful tool. It is clear that the crude regression using only FFQ data may produce markedly biased estimates due to residual confounding, and spurious statistical significance when testing the null hypothesis in models with two or more correlated nutritional variables. The cost of measurement errors is loss of statistical power in a corrected analysis, which may be quite extreme in multivariate analyses if variables are highly correlated. Where correlations are modest, our examples show that important gains in power can accrue with calibration studies of up to 1,000 subjects.


    ACKNOWLEDGMENTS
 
The authors gratefully acknowledge the helpful advice of Drs. Sander Greenland, Theodore Holford, Bernard Rosner, Laurence Freedman, and Duncan Thomas.


    NOTES
 
Correspondence to Dr. Gary E. Fraser, Center for Health Research, Nichol Hall, Room 2008, School of Public Health, Loma Linda University, Loma Linda, CA 92350 (e-mail: gfraser@sph.llu.edu).


    REFERENCES
 

  1. Carroll RJ, Ruppert D, Stefanski LA. Measurement error in non-linear models. New York, NY: Chapman and Hall Ltd, 1995.
  2. Thomas D, Stram D, Dwyer J. Exposure measurement error: influence on exposure-disease relationships and methods of correction. Annu Rev Public Health 1993;14:69–93.
  3. Freedman L. Challenges for statistical approaches to dietary assessment. Eur J Clin Nutr 1998;52(suppl 2):S79.
  4. Kipnis V, Carroll RJ, Freedman LS, et al. Implications of a new dietary measurement error model for estimation of relative risk: application to four calibration studies. Am J Epidemiol 1999;150:642–51.
  5. Rosner B, Spiegelman D, Willett WC. Correction of logistic regression relative risk estimates and confidence intervals for measurement error: the case of multiple covariates measured with error. Am J Epidemiol 1990;132:734–45.
  6. Spiegelman D, McDermott A, Rosner B. Regression calibration method for correcting measurement-error bias in nutritional epidemiology. Am J Clin Nutr (suppl) 1997;65:1179S–86S.
  7. Armstrong BG, Whittemore AS, Howe GR. Analysis of case-control data with covariate measurement error: application to diet and colon cancer. Stat Med 1989;8:1151–63.
  8. Kipnis V, Freedman LS, Brown CC, et al. Effect of measurement error on energy-adjustment models in nutritional epidemiology. Am J Epidemiol 1997;146:842–55.
  9. Greenland S. The effect of misclassification in the presence of covariates. Am J Epidemiol 1980;112:564–9.
  10. Zidek JV, Wong H, Le ND, et al. Causality, measurement error and multicollinearity in epidemiology. Environmetrics 1996;7:441–51.
  11. Carroll RJ, Ruppert D, Stefanski LA. Measurement error in non-linear models. New York, NY: Chapman and Hall Ltd, 1995:17.
  12. Rosner B, Willett WC, Spiegelman D. Correction of logistic regression relative risk estimates and confidence intervals for systematic within-person measurement error. Stat Med 1989;8:105–69.
  13. Lindgren BW. Statistical theory. London, United Kingdom: MacMillan Company, 1969.
  14. Myint T, Fraser GE, Lindsted KD, et al. Urinary 1-methyl histidine is a marker of meat consumption in black and white California Seventh-day Adventists. Am J Epidemiol 2000;152:752–5.
  15. Nutrition Data System. Minneapolis, MN: Nutrition Coordinating Center, University of Minnesota, 1993.
  16. Willett W. Nutritional epidemiology. New York, NY: Oxford University Press, 1990:252–69.
  17. Beeson WL, Mills PK, Phillips RL, et al. Chronic disease among Seventh-day Adventists, a low risk group. Cancer 1989;64:570–81.
  18. Singh PN, Fraser GE. Dietary risk factors for colon cancer in a low-risk population. Am J Epidemiol 1998;148:761–74.
  19. Breslow N. Design and analysis of case-control studies. Annu Rev Public Health 1982;3:29–54.
  20. Prentice R, Pyke R. Logistic disease incidence models and case-control studies. Biometrika 1979;66:403–11.
  21. Fung KY, Howe GR. Methodological issues in case-control studies. III. The effect of joint misclassification of risk factors and confounding factors upon estimation and power. Int J Epidemiol 1984;13:366–70.
  22. Elmstahl S, Gullberg B. Bias in diet assessment methods—consequences of collinearity and measurement errors on power and observed relative risks. Int J Epidemiol 1997;26:1071–9.
  23. Wong MY, Day NE, Wareham NJ. Measurement error in epidemiology: the design of validation studies. II. Bivariate situation. Stat Med 1999;18:2831–45.
  24. Willett WC, Hunter DJ, Stampfer MJ, et al. Dietary fat and fiber in relation to risk of breast cancer. JAMA 1992;268:2037–44.
  25. Spiegelman D, Gray R. Cost-efficient study designs for binary response data with gaussian covariate measurement error. Biometrics 1991;47:851–69.
  26. Stram DO, Longnecker MP, Shames L, et al. Cost-efficient design of a validation study. Am J Epidemiol 1995;142:353–62.
  27. Spiegelman D, Schneeweis S, McDermott A. Measurement error correction for logistic regression models with an "alloyed gold standard." Am J Epidemiol 1997;145:184–96.
Received for publication March 9, 2000. Accepted for publication April 4, 2001.