From the Department of Epidemiology, University of Washington, and Cancer Prevention Research Program, Fred Hutchinson Cancer Research Center, Seattle, WA.
Received for publication April 26, 2002; accepted for publication November 13, 2002.
ABSTRACT
epidemiologic methods; measurement error; reliability; validity
Abbreviations: OR, odds ratio; FFQ, food frequency questionnaire.
INTRODUCTION
Differential exposure measurement error between subjects with and without the disease under study is a major source of bias in epidemiologic studies. Sources of differential measurement error include differential recall or knowledge of exposures between cases and controls, the data collector's knowledge of the subject's disease status, and the biologic effects of the disease or its preclinical phase. Although differential measurement error is a major concern in retrospective studies, because the subjects (and possibly data collectors) know both disease and exposure status, it can also occur in cohort studies, for example, if those with symptoms of disease change their "usual" diet before future diagnosis, or if those with a strong family history of the disease have both more accurate knowledge of their family history and substantially higher future disease risk. Although nondifferential error generally biases the exposure-disease association toward the null, differential measurement error can cause the observed association between the measured exposure and the outcome to appear stronger or weaker than the true association, or it can lead to an association in the opposite direction, thus completely invalidating the results of the study (6, 7).
Despite this major concern about differential measurement error, most reliability and validity studies of exposures are not conducted to assess differential error, or they assess differential measurement error incompletely. For dichotomous exposures, differential error can be assessed by the sensitivity and specificity of the exposure measure versus a criterion measure in a sample of cases and in a sample of controls, and simple formulas can be used to interpret these parameters in terms of bias in the odds ratio (6, 7). On the other hand, for continuous exposure measures, much of what has been written about differential error is mathematically complex (5) and therefore has not entered the mainstream of epidemiologic methods. However, if simplifying assumptions are made, then there is a simple approximate equation for the effect of differential measurement error on the odds ratio for a continuous exposure (6, 8). The primary aim of this paper is to show that the parameters in that equation are the key parameters that need to be estimated in a validity or reliability study to assess differential error in a continuous exposure and to discuss how one can design a validity/reliability study to estimate those parameters.
EFFECTS OF MEASUREMENT ERROR ON THE ODDS RATIO
A model of measurement error
A common model of measurement error in a population is the following:
X_i = T_i + b + E_i,
where
µ_E = 0
and
ρ_TE = 0.
This states that for a given individual i, his observed measure Xi differs from his true value Ti by two types of measurement error. The first is systematic bias, b, that would occur (on average) for all measured subjects in a population of interest. The second, Ei, is the additional error that varies by subject.
For the population of interest, X, T, and E are variables with expectations (means over an infinite population) denoted by µ_X, µ_T, and µ_E, respectively, and variances denoted by σ²_X, σ²_T, and σ²_E. Because the average measurement error in X in the population is expressed as a constant, b, it follows that µ_E, the population mean of the subject error, is zero. The model includes the assumption that the correlation coefficient of T with E, ρ_TE, is zero; that is, subjects with high true values do not have systematically larger (or smaller) errors than subjects with lower true values.
Two measures of measurement error are used to describe the validity of X, that is, the relationship of X to T in the population of interest, based on the above model and assumptions. One is the bias or the average measurement error in the population:
b = µ_X − µ_T.
The other parameter is a measure of the precision of X, which is a measure of the variation of the measurement error in the population. One measure of precision is the correlation of T with X, ρ_TX, termed the validity coefficient of X. Under the above model, it can be shown (9) that

ρ_TX = σ_T/σ_X.

ρ_TX is assumed to range between zero and one; that is, for X to be considered a measure of T, X must be positively correlated with T.
Bias and precision are independent concepts. A measure can be biased but perfectly precise; for example, an accurate scale that is calibrated to measure all subjects exactly 2 kg too light would have a bias of −2 kg and ρ_TX = 1.0. A measure can be unbiased, that is, yield the correct average in a population, but lack precision. For example, a scale could be correct on average while overestimating the weight of some subjects and underestimating the weight of others. In this situation, b = 0, while ρ_TX would be less than one.
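As a quick numeric check of this distinction, the following sketch simulates both kinds of scale; the population parameters (mean 70 kg, standard deviations) are invented for illustration:

```python
import math
import random
import statistics

random.seed(0)

# True weights (kg) for a simulated population (parameters invented).
T = [random.gauss(70, 10) for _ in range(50_000)]

# Scale 1: biased but perfectly precise -- reads every subject 2 kg too light.
X1 = [t - 2 for t in T]

# Scale 2: unbiased but imprecise -- adds random error with mean zero.
X2 = [t + random.gauss(0, 5) for t in T]

def bias(x, t):
    """Average measurement error: b = mean(X) - mean(T)."""
    return statistics.mean(x) - statistics.mean(t)

def corr(a, b):
    """Pearson correlation; corr(T, X) is the validity coefficient rho_TX."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)

print(bias(X1, T), corr(T, X1))  # b = -2, rho_TX = 1.0: biased but precise
print(bias(X2, T), corr(T, X2))  # b near 0, rho_TX near 0.89: unbiased but imprecise
```

The second scale's validity coefficient approaches σ_T/σ_X = 10/√125 ≈ 0.89, consistent with the relation above.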
Figure 1 demonstrates the effect of measurement error on the distribution of X in a population. The bias in X causes a shift in the distribution of X compared with T, such that the means differ by b:

µ_X = µ_T + b.

The imprecision of X (measured by ρ_TX) causes a greater variance or dispersion of the distribution of X compared with that of T (9):

σ_X = σ_T/ρ_TX.
Effects of differential measurement error on the odds ratio
Measurement error is not an inherent property of an instrument but, rather, is a property of the instrument administered in a particular way to a specific population. Therefore, measurement error can differ between those with the disease of interest and a control group. Differential measurement error would have effects on the observable means and variances of the exposure variable within the disease and control groups (as above) and, more importantly, would bias the measure of exposure-disease association, for example, the odds ratio function (the odds of disease at each level of exposure vs. the odds at a reference level r).
Extending the measurement error model above to the two groups (D for disease group and C for control group) and using X1 to represent the exposure measure to be used in the epidemiologic study:

X_i1D = T_iD + b_1D + E_i1D
X_i1C = T_iC + b_1C + E_i1C.

In this model, differential exposure measurement error occurs when b_1C, the bias in the exposure measure in the control group, differs from b_1D, the bias in the disease group, or when the precision of X_1C differs from that of X_1D.
Figures 2 and 3 give an example of differential measurement error, specifically differential bias between cases and controls. In the figures, the true mean exposure in the disease group, µ_TD, is greater than the true mean exposure in the control group, µ_TC, which leads to a positive slope in the true odds ratio function (OR = f(T)). In this example, exposure is overestimated in the control group (positive bias), so the distribution of X_1C is shifted to the right relative to T_C, and exposure is underestimated among those with disease (negative bias), so the distribution of X_1D is shifted to the left relative to T_D. This leads the observable odds ratio curve (OR = f(X)) to cross over the null value of one: it indicates less disease risk with increasing exposure, rather than the true increasing disease risk.
[Figures 2 and 3, showing the distributions of T and X1 in the control and disease groups and the true and observable odds ratio functions, appeared here.]

The degree of differential bias can be summarized by the factor A, the ratio of the observable case-control mean difference in exposure to the true mean difference:

A = (µ_X1D − µ_X1C)/(µ_TD − µ_TC) = 1 + (b_1D − b_1C)/(µ_TD − µ_TC).   (equation 1)
If A is positive, it gives the proportion over- or underestimation; for example, A = 1.5 means that X1, the exposure measured with error, overestimates the true case-control mean difference in exposure by 50 percent. If A is negative, then the mean difference in exposure between cases and controls has changed signs; that is, if the disease were truly associated with higher levels of exposure on average, it would appear, based on the use of measure X1, that the disease was associated with lower mean exposure.
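A small sketch, with invented case-control means, makes the interpretation of A concrete:

```python
def factor_A(mean_x1_cases, mean_x1_controls, mean_t_cases, mean_t_controls):
    """Equation 1: A = observed case-control mean difference / true mean difference."""
    return (mean_x1_cases - mean_x1_controls) / (mean_t_cases - mean_t_controls)

# True mean exposure: 30 in cases, 20 in controls (true difference +10).
# A measured difference of 15 overestimates the true difference by 50 percent:
print(factor_A(32.0, 17.0, 30.0, 20.0))  # 1.5

# A negative A means the measured difference has changed sign
# (cases appear less exposed than controls despite a true positive difference):
print(factor_A(19.0, 23.0, 30.0, 20.0))  # -0.4
```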
To make possible a simple equation for the effects of differential measurement error on the odds ratio, one needs to make certain assumptions. Results can be derived for unmatched case-control studies under the following assumptions: 1) X_1D and X_1C are modeled as above, with ρ_TE = 0 for each group; 2) T_D and T_C are normally distributed with means µ_TD and µ_TC, respectively, and the same variance, σ²_T; and 3) E_1D and E_1C are normally distributed with mean zero and common variance, σ²_E1. The last two assumptions mean that there is only differential bias, and the precision, ρ_TX1, is the same for cases and controls. (The equations below also are based on the assumption that the only source of error in the odds ratio is measurement error in the exposure.)
The above assumptions lead to a logistic model for the probability of disease (P) as a function of true exposure T, with a true logistic regression coefficient β_T (10):

logit(P) = α_T + β_T T,

where

β_T = (µ_TD − µ_TC)/σ²_T.

In this model, the odds ratio function can be expressed as a single parameter representing the true odds ratio for any u-unit increase in T:

OR_T = exp(u β_T).
With measurement error in the exposure variable X1, the assumptions also lead to a logistic model (8):

logit(P) = α_X + β_X X_1,

where

β_X = (µ_X1D − µ_X1C)/σ²_X1.

Then, the observable odds ratio (OR_O) for a u-unit increase in X1 can be expressed in terms of OR_T (if µ_TD ≠ µ_TC) as follows:

OR_O = OR_T^(A ρ²_TX1),   (equation 2)
where A is defined in equation 1 above. OR_O differs from OR_T because of the effect of differential bias (expressed by A) and lack of precision (expressed by ρ²_TX1). The effect of ρ²_TX1 is more predictable because it can only range from zero to one. However, as noted above, the factor A can be of any magnitude and either positive or negative. If A ρ²_TX1 is between zero and one, the observable odds ratio will be attenuated toward the null value of one compared with the true odds ratio; if A ρ²_TX1 is greater than one, the observable odds ratio will be biased away from the null; and if A is less than zero, the observable odds ratio crosses over the null value of one from the true odds ratio. For example, if the exposure measurement were perfectly precise (ρ_TX1 = 1), then values of A of −0.5, 0.0, 0.5, 1.0, and 1.5 would bias a true odds ratio of 4 to an observable odds ratio of 0.5, 1, 2, 4, and 8, respectively. When ρ_TX1 < 1, each of these observable odds ratios would be biased toward one.
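Equation 2 can be checked numerically; this sketch reproduces the example above (true OR of 4, ρ_TX1 = 1) and then shows the pull toward the null when precision is imperfect:

```python
def observable_or(true_or, A, rho_tx1):
    """Equation 2: OR_O = OR_T ** (A * rho_TX1**2)."""
    return true_or ** (A * rho_tx1 ** 2)

# Perfectly precise measure (rho = 1), true odds ratio of 4:
for A in (-0.5, 0.0, 0.5, 1.0, 1.5):
    print(A, observable_or(4.0, A, 1.0))
# -> 0.5 (crossover), 1.0 (null), 2.0 (attenuated), 4.0 (unbiased), 8.0 (away from null)

# With imprecision (rho = 0.8), each observable odds ratio moves toward one:
print(observable_or(4.0, 1.5, 0.8))  # about 3.78 rather than 8
```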
Note that, when there is nondifferential measurement error and the assumptions above hold, equation 2 can be simplified (10, 11) to the following:

OR_O = OR_T^(ρ²_TX1).   (equation 3)

This equation shows that, under nondifferential error, the observable odds ratio for any u-unit increase in X1 is attenuated toward the null value of one compared with the true odds ratio for a u-unit increase in T, by the power ρ²_TX1. The bias in the odds ratio (the attenuation) under nondifferential measurement error is not a function of the bias in X1 (because the same constant bias is added to exposure for both disease and control subjects). Thus, the focus of validity/reliability studies for nondifferential error is on estimation of ρ_TX1. This makes their design and interpretation substantially different from studies of differential measurement error, which need to assess A as well as ρ_TX1. Therefore, the remainder of this paper focuses on the measurement and interpretation of factor A.
DESIGN OF STUDIES TO MEASURE DIFFERENTIAL MEASUREMENT ERROR
For a given subject i, two (continuous) exposure measurements, X_i1 and X_i2, are obtained. The simple model of measurement error above can be extended to X2 measured in the two groups (disease and control):

X_i2D = T_iD + b_2D + E_i2D
X_i2C = T_iC + b_2C + E_i2C.

This model states that the second measure on the ith subject in the disease group, X_i2D, is equal to his true value, T_iD (the same true value as for the first measure), plus the bias of the second instrument in the disease group, b_2D, plus his error on the second measure, E_i2D. Similarly, b_2C is the bias of the comparison measure in the control group.
In a method comparison study, information is collected on X1 and X2 for each subject but not on T. Such a study can yield estimates of the means of X1 and X2 in the disease group (µ̂_X1D and µ̂_X2D) and in the control group (µ̂_X1C and µ̂_X2C) and of the correlation between X1 and X2 in each group. The primary question is: How can a study be designed so that these estimable parameters can be used to estimate the parameters in equation 2?
Selection of the comparison measure for estimation of differential bias in X1
Method comparison studies often cannot provide information on the bias in X1. Only if X2 is perfect, or if X2 is an unbiased measure of T (b_2 = 0), does the following hold:

b_1 = µ_X1 − µ_X2.

Thus, a comparison measure X2 could be selected if it is correct on average (unbiased), even if it is not perfect.
As described above, a meaningful measure of differential bias is factor A (equation 1). The difference in the bias in X1 between cases and controls, b_1D − b_1C, can be measured not only when X2 is perfect or X2 is unbiased (b_2D = b_2C = 0), as above, but also when there is bias in the comparison measure X2 but that bias is nondifferential, that is, if b_2D = b_2C. Then, under the simple measurement error model (6):

b_1D − b_1C = (µ_X1D − µ_X2D) − (µ_X1C − µ_X2C).   (equation 4)
Therefore, if the comparison measure is carefully selected, a method comparison study can assess the differential bias between cases and controls in the measure of interest, X1. Ideally, to determine the accuracy of an instrument, measurements from the instrument would be compared with those from a perfect measure of exposure in a validity study. This would yield estimates of both the bias and the validity coefficient among cases and among controls. Almost all techniques for measuring and adjusting for differential measurement error assume that a perfect comparison measure of exposure is available (5). However, as shown above, when a perfect measure is not available or feasible, a study can still assess differential bias in X1 if a comparison measure X2 is selected that is unbiased (among cases and among controls) or that can be assumed to have nondifferential bias (equal bias for cases and controls). In each case, the differential bias in X1 can be estimated by equation 4. For example, to assess differential bias in mothers' recall of a child's birth weight (X1) between mothers of children with leukemia and control mothers, X1 could be compared with medical records (X2) among cases, with a similar comparison among controls. Although there may be error in the medical records, any bias is unlikely to differ between cases and controls. Thus, a good choice for a comparison measure is prospectively collected information recorded before disease diagnosis (or, more accurately, before the period during which preclinical disease could influence exposures), if such information is available for at least a subset of cases and controls. Unfortunately, because such comparison measures are often not available, studies of differential measurement error are not always feasible.
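The birth weight example can be sketched in simulation. All numbers here (biases, standard deviations, sample sizes) are invented for illustration; the point is that the record's own bias, being identical in both groups, cancels out of equation 4:

```python
import random
import statistics

random.seed(1)

# True birth weights (g) for case and control children (parameters invented).
T_cases    = [random.gauss(3400, 450) for _ in range(5000)]
T_controls = [random.gauss(3400, 450) for _ in range(5000)]

# X2 = medical record: biased by -30 g in BOTH groups (nondifferential bias).
X2_cases    = [t - 30 + random.gauss(0, 50) for t in T_cases]
X2_controls = [t - 30 + random.gauss(0, 50) for t in T_controls]

# X1 = maternal recall: case mothers over-report by 100 g, controls by 20 g,
# so the true differential bias b1D - b1C is 80 g.
X1_cases    = [t + 100 + random.gauss(0, 200) for t in T_cases]
X1_controls = [t + 20 + random.gauss(0, 200) for t in T_controls]

# Equation 4: (mean X1 - mean X2) in cases minus the same quantity in controls.
est = ((statistics.mean(X1_cases) - statistics.mean(X2_cases))
       - (statistics.mean(X1_controls) - statistics.mean(X2_controls)))
print(est)  # close to 80; the record's own -30 g bias cancels out
```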
Analysis of differential bias
The value of t from a two-sample t test on the variable (X_i1 − X_i2), computed for each subject, can be used to compute a confidence interval for the difference in b_1 between the case and control groups. However, a judgment as to whether there is differential measurement error should not rely on statistical significance. A statistically significant difference between cases and controls would imply differential error, but a nonsignificant difference might still reflect an important degree of differential measurement error that simply was not statistically significant given the sample size of the validation study.
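A minimal sketch of such an interval, using a normal approximation to the two-sample t interval rather than exact t quantiles (reasonable for validation samples of moderate size); the data are invented:

```python
import math
import random
import statistics
from statistics import NormalDist

random.seed(2)

def diff_bias_ci(d_cases, d_controls, level=0.95):
    """Approximate CI for b1D - b1C from per-subject differences D = X1 - X2
    (normal approximation to the two-sample interval)."""
    est = statistics.mean(d_cases) - statistics.mean(d_controls)
    se = math.sqrt(statistics.variance(d_cases) / len(d_cases)
                   + statistics.variance(d_controls) / len(d_controls))
    z = NormalDist().inv_cdf((1 + level) / 2)
    return est - z * se, est + z * se

# Hypothetical validation data: true differential bias of 8 - 2 = 6 units.
d_cases    = [random.gauss(8, 10) for _ in range(200)]
d_controls = [random.gauss(2, 10) for _ in range(200)]
lo, hi = diff_bias_ci(d_cases, d_controls)
print(lo, hi)  # an interval around the true value of 6
```

Note that, in keeping with the caution above, the interval's width (not just whether it excludes zero) is what matters for judging how large the differential bias could plausibly be.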
Interpretation of differential bias
As outlined above, factor A is a useful parameter for describing the effect of differential bias on the mean exposure difference between cases and controls and, under certain assumptions, the effect of differential bias on the odds ratio (equation 2). Estimation of factor A (equation 1) requires an estimate of (µ_TD − µ_TC) as well as an estimate of (b_1D − b_1C) by equation 4. When X2 is perfect, unbiased, or has nondifferential bias, (µ_TD − µ_TC) can be estimated by (µ̂_X2D − µ̂_X2C), so that:

Â = 1 + (b̂_1D − b̂_1C)/(µ̂_X2D − µ̂_X2C) = (µ̂_X1D − µ̂_X1C)/(µ̂_X2D − µ̂_X2C).   (equation 5)

If the parent epidemiologic study has been completed, it may be more accurate to estimate (µ_TD − µ_TC) using the means of X1 in the disease group (µ̂′_X1D) and nondisease group (µ̂′_X1C) from the parent study, and the differential bias from the method comparison study:

Â = (µ̂′_X1D − µ̂′_X1C)/[(µ̂′_X1D − µ̂′_X1C) − (b̂_1D − b̂_1C)].   (equation 6)
So a method comparison study in which X2 is perfect, unbiased, or has nondifferential bias can be used to estimate A from equation 4 and equation 5 or 6. This can be used to understand the effect of differential measurement error on the odds ratio from equation 2, using an estimate of ρ_TX1 from the same or another reliability/validity study. An example of the design and interpretation of a study to measure differential measurement error is given in the Appendix.
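Equations 5 and 6 amount to two ways of estimating the denominator of A; a sketch with invented means (chosen so that both give the same answer):

```python
def A_eq5(mx1_d, mx1_c, mx2_d, mx2_c):
    """Equation 5: X2 (unbiased or with nondifferential bias) stands in
    for the true case-control difference in the denominator of A."""
    return (mx1_d - mx1_c) / (mx2_d - mx2_c)

def A_eq6(parent_mx1_d, parent_mx1_c, diff_bias):
    """Equation 6: the true difference is the parent study's observed X1
    difference minus the differential bias (b1D - b1C) from equation 4."""
    observed = parent_mx1_d - parent_mx1_c
    return observed / (observed - diff_bias)

# X1 shows a 2-unit case-control difference; X2 shows the "true" 10-unit
# difference, so only 20 percent of the true difference is observed:
print(A_eq5(22.0, 20.0, 30.0, 20.0))  # 0.2

# Same scenario via equation 6: observed difference 2, differential bias -8:
print(A_eq6(22.0, 20.0, -8.0))        # 0.2
```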
Another way to use equation 2 to interpret the effects of differential error is to estimate the true odds ratio after completion of the parent epidemiologic study. The true odds ratio (OR_T) could be estimated from the observed odds ratio (OR_O) by solving equation 2 for OR_T (8):

OR_T = OR_O^(1/(A ρ²_TX1)).   (equation 7)
Cautions about the use of such adjustment equations are discussed below.
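A round-trip check of equation 7, with arbitrary illustrative values:

```python
def adjusted_true_or(or_observed, A, rho_tx1):
    """Equation 7: OR_T = OR_O ** (1 / (A * rho_TX1**2))."""
    return or_observed ** (1.0 / (A * rho_tx1 ** 2))

# Distort a true OR of 3 by equation 2 with A = 0.5 and rho = 0.8 ...
or_observed = 3.0 ** (0.5 * 0.8 ** 2)
# ... and recover it with equation 7:
print(adjusted_true_or(or_observed, 0.5, 0.8))  # 3.0 (up to rounding)
```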
Estimation of ρ_TX1
To fully understand the effects of differential measurement error, one also needs to estimate ρ_TX1, preferably separately for the disease and control groups. Detailed discussions have been presented of the design and interpretation of validity/reliability studies to estimate ρ_TX1, and of the design of validation substudies used to correct for the effects of lack of precision of the exposure in the parent epidemiologic study (3, 6, 12–22). Briefly, ρ_TX1 can be estimated directly from a validity study in which the comparison measure X2 is perfect, or it can be calculated under certain assumptions when the errors in X1 and X2 are uncorrelated. In some situations, one could use the same study to estimate ρ_TX1 as well as A, even when a perfect comparison measure is not available. If X2 is not perfect but does not have differential bias and is more precise than X1, and if the errors in X1 and X2 are uncorrelated, then information about both A and ρ_TX1 can be gained (6).
DISCUSSION
To understand the effects of differential measurement error, one also needs to estimate ρ_TX1, separately for the disease and control groups. Differential precision also biases odds ratio estimates. If there were nondifferential bias in X1 but ρ_TX1 differed between cases and controls, the shape of the odds ratio function could change. For example, the observable odds ratio function could be U-shaped when, in reality, disease frequency increases monotonically with increasing exposure (23). However, studies that report only differential measurement error in terms of the validity coefficient for each of the case and control groups do not measure the important effects of differential bias.
There are several limitations to the results presented. The first is that the results are based on a simple additive error model, with a constant additive error (bias) within each group and an additive subject error. If part of the error is proportional to the true value (that is, if X1 or X2 captures only some fixed proportion of the exposure), or, similarly, if the scales of X1 and X2 are in different units, then the results do not hold. For example, one could not generally use a biochemical measure as a comparison to assess bias or differential bias in a food frequency measure of intake, because the two would be on different scales.
The adjustment equations (equations 2 and 7) are presented to give researchers tools to interpret the magnitude of factor A in terms of the magnitude of its effect on the odds ratio, under simplifying assumptions. These assumptions would not necessarily hold in real applications. Specifically, these equations were based on the assumption of equal variances of T and E1 across cases and controls, which implies equal ρ_TX1 for the two groups. If one has a perfect measure of exposure, T, in a validity study, one can test these assumptions directly. Otherwise, if the variance of X1 differs substantially between the two groups (in the full epidemiologic study), or if the correlation coefficients between X1 and X2 in a validity/reliability study differ substantially between the two groups, this suggests that these assumptions do not hold. The normality assumptions used in the derivation of equations 2 and 7 mean that these equations become poorer approximations as the exposure deviates further from a normal distribution.
The simple adjustment equations given also do not take into consideration the imprecision in the estimates of ρ_TX1 and A (and, for equation 7, the observed odds ratio). For example, applying equation 7 when the observed odds ratio is slightly less than one would lead to a very different estimate of the true odds ratio than if the observed odds ratio were slightly greater than one. Also, these equations ignore the effects of covariates in the model. The bias in the covariate-adjusted odds ratio would depend on the multivariate measurement error structure of the main exposure and covariates, and an accurate adjustment of the odds ratio would generally require information from a validity study with perfect measures of the main exposure and covariates. There are several statistical methods for adjustment for differential measurement error that make fewer distributional assumptions, provide confidence intervals, allow other measurement error models, and/or allow adjustment for measurement error in the covariates as well (8, 15, 24–28) (see Thürigen et al. (5) for a review). Unless these techniques are used, the emphasis should be on understanding the possible degree of bias in the observed odds ratio due to the exposure measurement error rather than on the estimated true odds ratio.
Despite these limitations and cautions, the approaches given in this paper may provide added insight into the design and interpretation of studies of differential measurement error. If more such studies are conducted, this will help epidemiologists to understand which exposure-disease associations or study designs are particularly vulnerable to differential error.
ACKNOWLEDGMENTS
APPENDIX
As an example, a reliability study was conducted using a nested case-control study within a cohort study to assess differential bias between breast cancer cases and controls in a retrospective food frequency questionnaire (FFQ) estimate of dietary fiber intake (X1) (29). Women in the cohort study completed an FFQ in 1986 covering their diet in the past year. This prospective (prediagnostic) FFQ estimate of fiber intake was used as the comparison measure (X2). Women who developed breast cancer over the next 2 years and selected controls completed another FFQ in 1989. That retrospective FFQ (X1) asked about their diet in 1985, so it covered approximately the same time period as the 1986 FFQ. The results for mean grams of fiber intake are shown in appendix table A1.
[Appendix table A1, giving mean grams of fiber intake on the retrospective (X1) and prospective (X2) FFQs by case-control status, appeared here.]

From these means, the differential bias (b_1D − b_1C) can be estimated by equation 4, and A (equation 1) can be estimated using equation 5 as approximately 0.2.
Thus, only 20 percent of the estimated true difference between cases and controls in fiber intake was observed on the retrospective questionnaire. This can also be interpreted to mean that, if the true odds ratio for dietary fiber and breast cancer were, for example, 0.25 for a 10-g increase in fiber intake, and if the validity coefficient of dietary fiber intake from the FFQ, ρ_TX1, were estimated to be 0.6 (for both cases and controls for the retrospective questionnaire), then the differential measurement error would lead the observable odds ratio to be (by equation 2): OR_O = 0.25^(0.2 × 0.6²) = 0.91.
This could be compared with the attenuation of the odds ratio due to nondifferential measurement error in the prospective study, which would lead to an observed odds ratio of 0.61 (from equation 3) if ρ_TX = 0.6 for the prospective FFQ. Thus, in this example, a strong protective relation of fiber to risk of breast cancer (OR = 0.25) would be attenuated to an observed OR = 0.61 in the cohort study, but the relation would be almost completely obscured because of the differential measurement error in the retrospective study (OR = 0.91).
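Both numbers can be reproduced directly from equations 2 and 3, using the values stated above (true OR of 0.25 per 10 g, ρ = 0.6, A = 0.2):

```python
def observable_or(true_or, A, rho):
    """Equation 2 (equation 3 is the special case A = 1)."""
    return true_or ** (A * rho ** 2)

retrospective = observable_or(0.25, 0.2, 0.6)  # differential error, A = 0.2
prospective   = observable_or(0.25, 1.0, 0.6)  # nondifferential error
print(round(retrospective, 2))  # 0.91
print(round(prospective, 2))    # 0.61
```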
It should be noted that, in this example, some of the assumptions used in the derivation of equations 2 and 3 do not hold. The correlation coefficients between the retrospective and prospective FFQs differed between cases and controls (r = 0.43 and 0.64, respectively), which suggests that ρ_TX1 was not equal for cases and controls, and dietary values are unlikely to be normally distributed. Nonetheless, parameter A did approximate the difference between the odds ratio observed in the actual retrospective study (1.08 for the highest quintile of fiber) and the risk ratio of 0.62 observed in the actual prospective study (29) (i.e., if OR_O from equation 3 were substituted into equation 2).
NOTES
REFERENCES