From the Epidemiology Unit, Institute for Medical Informatics, Biometry and Epidemiology, University Hospital, University of Duisburg-Essen, Essen, Germany.
Received for publication April 2, 2003; accepted for publication July 3, 2003.
ABSTRACT
bias (epidemiology); data collection; data interpretation, statistical; epidemiologic measurement; epidemiologic methods; measurement error; prevalence; response
Abbreviation: NDME, nondifferential misclassification error.
INTRODUCTION
Nonresponse may bias effect measures, a distortion usually referred to as "selection bias" or "nonresponse bias" (2), if the exposure of interest is associated with willingness to participate in a study. However, there is no logical connection between a low response proportion and the degree of potential nonresponse bias. For example, suppose that a case-control study with a response proportion of 15 percent among cases and controls examines the association between blood group and risk of gastric cancer. If willingness to participate is not associated with blood group, effect estimates will be unbiased, even though 85 percent of the eligible subjects did not participate. In addition, some types of nonresponse bias (pure risk factor bias, pure disease status bias) cancel each other out, so that the effect estimate is not biased (3).
Monitoring of the response proportion during the recruitment period of an epidemiologic study reveals that some contacted people react earlier than others (4–8). Often, several attempts are made to contact eligible persons and motivate them to participate. For example, people who do not respond in the first round of recruitment are usually contacted a second (or even third, fourth, or fifth) time. This results in consecutive recruitment waves of respondents. The cumulative response proportion is usually an asymptotic curve, with the late respondents contributing to the right side of the curve (6, 9).
Analyses of exposure prevalence according to cumulative response (often referred to as wave analyses) may provide insights into the association between ease of recruiting and exposure prevalence (10–14). The results of those analyses often reveal that the exposure prevalence of interest differs among early respondents, late respondents, and nonrespondents, because these subgroups represent different segments of the study base with regard to the exposure of interest (4–8, 15–17).
The association between the exposure of interest and the outcome may be further complicated if exposure misclassification error differs by recruitment wave. We could not find any studies in the literature that dealt with nondifferential exposure misclassification by recruitment wave. However, in their cross-sectional study, Helasoja et al. (15) found that the proportion of missing information was higher among late responders than among early responders, which suggests that exposure misclassification in late responders may be higher.
There is good reason to assume that subjects who are difficult to recruit for a study tend to be less motivated to participate than people who are easy to recruit. Participants who are less motivated may answer questionnaires on exposure less carefully. Therefore, misclassification error in studies with questionnaire-based exposure assessment is likely to be higher among participants from later waves, who are less motivated than participants from early waves.
We hypothesize that studies with low response proportions may be considered less biased than studies with high response proportions if the measurement error in a dichotomous exposure increases by recruitment wave. In this paper, we attempt to illustrate the effect of varying degrees of wave-specific exposure measurement error on the relative risk in a hypothetical cohort study by recruitment wave.
METHODS
We assign a specific exposure prevalence to each wave and then calculate cumulative exposure prevalences that express the exposure prevalence of all waves combined up to the wave of interest. For example, if the prevalence in wave 1 is 0.30 (450 truly exposed participants out of 1,500 participants) and the prevalence in wave 2 is 0.20 (100 truly exposed participants out of 500 participants), the cumulative prevalence for waves 1 and 2 is 0.28 (550 out of 2,000).
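The cumulative prevalence calculation can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above; the function name `cumulative_prevalence` is ours and does not come from the original analysis.

```python
from itertools import accumulate


def cumulative_prevalence(exposed_per_wave, size_per_wave):
    """Cumulative exposure prevalence after each recruitment wave."""
    cum_exposed = accumulate(exposed_per_wave)
    cum_size = accumulate(size_per_wave)
    return [e / n for e, n in zip(cum_exposed, cum_size)]


# Two-wave example from the text: 450 of 1,500, then 100 of 500.
prevalences = cumulative_prevalence([450, 100], [1500, 500])
print([round(p, 3) for p in prevalences])  # [0.3, 0.275]
```

The second value, 550/2,000 = 0.275, is the figure that the text reports rounded to 0.28.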
We assume three different patterns of nondifferential misclassification error (NDME) in exposure status. With the first pattern, the NDME remains the same in all waves (sensitivity = 0.85, specificity = 0.85). With the second, the NDME increases by wave (waves 1–5: sensitivity = 0.95, 0.90, 0.85, 0.80, and 0.75; specificity = 0.95, 0.90, 0.85, 0.80, and 0.75). This means that misclassification error among people recruited during the earlier waves (early respondents) is lower than that among people recruited in later waves, who are more difficult to recruit (late respondents). With the third pattern, the NDME decreases by wave (waves 1–5: sensitivity = 0.75, 0.80, 0.85, 0.90, and 0.95; specificity = 0.75, 0.80, 0.85, 0.90, and 0.95).
We investigate three different wave-specific true exposure prevalence patterns: 1) the true exposure prevalence remains the same in all waves (waves 1–5: 0.30); 2) the true exposure prevalence increases by wave (waves 1–5: 0.20, 0.25, 0.30, 0.35, and 0.40); and 3) the true exposure prevalence decreases by wave (waves 1–5: 0.40, 0.35, 0.30, 0.25, and 0.20). In all of these scenarios, the true exposure prevalence at the end of the fifth recruitment wave (i.e., the 100 percent response proportion) is 0.30.
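As a quick check of the final statement, the following sketch averages the wave-specific prevalences of each pattern. It assumes that every wave recruits the same number of participants; that equal-size assumption is ours, made for illustration, and is not stated in the text.

```python
# Wave-specific true exposure prevalences for the three patterns in the text.
patterns = {
    "constant": [0.30, 0.30, 0.30, 0.30, 0.30],
    "increasing": [0.20, 0.25, 0.30, 0.35, 0.40],
    "decreasing": [0.40, 0.35, 0.30, 0.25, 0.20],
}

# Assumption: equal wave sizes, so the overall prevalence after wave 5 is the
# unweighted mean of the wave-specific prevalences.
overall = {name: sum(p) / len(p) for name, p in patterns.items()}

for name, value in overall.items():
    print(f"{name}: {value:.2f}")
```

Under that assumption, all three patterns give an overall true prevalence of 0.30 after the fifth wave.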
We use the notation shown in table 1 for the number of correctly and incorrectly classified exposed and unexposed subjects. Applying simple misclassification algebra to the exposure measurement yields, for wave 1,

$$
\begin{aligned}
a_1 &= e_1 \times Sn_1 \\
b_1 &= e_1 \times (1 - Sn_1) \\
c_1 &= \bar{e}_1 \times (1 - Sp_1) \\
d_1 &= \bar{e}_1 \times Sp_1,
\end{aligned}
$$

where $e_1$ is the number of truly exposed subjects in wave 1, $\bar{e}_1$ is the number of truly unexposed subjects in wave 1, $Sn_1$ is the sensitivity of exposure measurement in wave 1, and $Sp_1$ is the specificity of exposure measurement in wave 1. By the same reasoning, the number of correctly and incorrectly classified subjects up to wave $i$ can be calculated iteratively according to

$$
\begin{aligned}
a_i &= a_{i-1} + (e_i - e_{i-1}) \times Sn_i \\
b_i &= b_{i-1} + (e_i - e_{i-1}) \times (1 - Sn_i) \\
c_i &= c_{i-1} + (\bar{e}_i - \bar{e}_{i-1}) \times (1 - Sp_i) \\
d_i &= d_{i-1} + (\bar{e}_i - \bar{e}_{i-1}) \times Sp_i,
\end{aligned}
$$

where $e_i$ is the number of truly exposed subjects up to wave $i$, $\bar{e}_i$ is the number of truly unexposed subjects up to wave $i$, $Sn_i$ is the sensitivity of exposure measurement in wave $i$, and $Sp_i$ is the specificity of exposure measurement in wave $i$.
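The iterative calculation can be sketched as a running sum over waves. In this illustration, the function name `cumulative_cells` and the wave size of 1,000 participants per wave are hypothetical choices of ours; the sensitivities and specificities are those of the increasing-NDME pattern.

```python
def cumulative_cells(new_exposed, new_unexposed, sensitivity, specificity):
    """Cumulative cell counts (a, b, c, d) after each wave.

    a: truly exposed, classified as exposed
    b: truly exposed, classified as unexposed
    c: truly unexposed, classified as exposed
    d: truly unexposed, classified as unexposed

    new_exposed and new_unexposed hold the subjects newly recruited in each
    wave, i.e. the increments between consecutive cumulative totals.
    """
    a = b = c = d = 0.0
    cells = []
    for e_new, u_new, sn, sp in zip(new_exposed, new_unexposed,
                                    sensitivity, specificity):
        a += e_new * sn
        b += e_new * (1 - sn)
        c += u_new * (1 - sp)
        d += u_new * sp
        cells.append((a, b, c, d))
    return cells


# Hypothetical cohort: 1,000 participants per wave, true prevalence 0.30.
accuracy = [0.95, 0.90, 0.85, 0.80, 0.75]  # increasing NDME by wave
cells = cumulative_cells([300] * 5, [700] * 5, accuracy, accuracy)
for wave, (a, b, c, d) in enumerate(cells, start=1):
    print(f"wave {wave}: a={a:.0f} b={b:.0f} c={c:.0f} d={d:.0f}")
```

At every wave, a + b equals the cumulative number of truly exposed subjects and c + d the cumulative number of truly unexposed subjects, which is a convenient sanity check on the bookkeeping.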
The disease rate in the truly exposed subjects is $I_1$, and the disease rate in the truly unexposed subjects is $I_0$. The relative risk (RR) in the absence of any exposure misclassification is $I_1/I_0$ (i.e., $RR_{true} = I_1/I_0$). The observed relative risk in the presence of the exposure misclassification up to wave $i$ is

$$
RR_{obs,i} = \frac{(a_i I_1 + c_i I_0)/(a_i + c_i)}{(b_i I_1 + d_i I_0)/(b_i + d_i)}.
$$
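A minimal sketch of the observed relative risk follows, assuming that disease risk depends only on true exposure status, so that each group classified as exposed or unexposed is a mixture of truly exposed and truly unexposed subjects. The function name `observed_rr`, the cell counts, and the disease rates of 0.02 and 0.01 (a true relative risk of 2) are hypothetical values chosen for illustration.

```python
def observed_rr(a, b, c, d, rate_exposed, rate_unexposed):
    """Relative risk comparing subjects classified as exposed vs. unexposed.

    a, c: truly exposed / truly unexposed subjects classified as exposed
    b, d: truly exposed / truly unexposed subjects classified as unexposed
    """
    risk_classified_exposed = (a * rate_exposed + c * rate_unexposed) / (a + c)
    risk_classified_unexposed = (b * rate_exposed + d * rate_unexposed) / (b + d)
    return risk_classified_exposed / risk_classified_unexposed


# Hypothetical wave of 1,000 subjects, true prevalence 0.30,
# sensitivity = specificity = 0.85.
rr_misclassified = observed_rr(255, 45, 105, 595, 0.02, 0.01)

# With perfect classification (b = c = 0) the true relative risk is recovered.
rr_perfect = observed_rr(300, 0, 0, 700, 0.02, 0.01)

print(round(rr_misclassified, 2), round(rr_perfect, 2))
```

With these inputs the misclassified estimate is about 1.60, which lies between the null value of 1 and the true relative risk of 2.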
RESULTS
Figures 1, 2, and 3 show the observed cumulative exposure prevalences by wave under different trends in the true wave-specific exposure prevalence and different wave-specific measurement errors. The true exposure prevalence in the absence of any NDME is 0.30 in each wave (dotted line).
Figure 2 displays the effect of the different wave-specific trends in the NDME in the presence of an increasing wave-specific true exposure prevalence. If the NDME decreases by wave, the bias of the observed cumulative prevalence estimate increases. If the NDME is constant or increasing, the observed cumulative exposure prevalence crosses the true exposure prevalence (0.30) and the direction of the bias in the prevalence estimate changes with additional recruitment waves.
Figure 3 shows the effect of the different wave-specific trends in the NDME in the presence of a decreasing wave-specific true exposure prevalence. The observed cumulative prevalences are overestimates of the true exposure prevalence for all three patterns of NDME. Up to wave 4, all observed cumulative prevalences decrease by wave, and the smallest bias in the cumulative prevalence occurs when the NDME increases by wave. The degrees of bias in the prevalence estimates are identical at the end of wave 4. After wave 4, the magnitudes of bias in the prevalence estimates change their order with regard to the different patterns of NDME.
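Curves of this kind can be approximated with a short calculation. The sketch below assumes equal wave sizes (our assumption, not stated in the text) and uses the three NDME patterns with a constant true prevalence of 0.30; other prevalence patterns can be passed in the same way. The function name `observed_cumulative_prevalence` is hypothetical.

```python
def observed_cumulative_prevalence(true_prev, sensitivity, specificity):
    """Observed cumulative exposure prevalence after each wave.

    Assumes equal wave sizes, so the cumulative value is the running mean of
    the wave-specific observed prevalences.
    """
    observed = []
    running_total = 0.0
    waves = zip(true_prev, sensitivity, specificity)
    for wave, (p, sn, sp) in enumerate(waves, start=1):
        # Classified as exposed: true positives plus false positives.
        running_total += p * sn + (1 - p) * (1 - sp)
        observed.append(running_total / wave)
    return observed


ndme_patterns = {
    "constant": [0.85, 0.85, 0.85, 0.85, 0.85],
    "increasing": [0.95, 0.90, 0.85, 0.80, 0.75],
    "decreasing": [0.75, 0.80, 0.85, 0.90, 0.95],
}
constant_prev = [0.30] * 5

results = {
    name: observed_cumulative_prevalence(constant_prev, acc, acc)
    for name, acc in ndme_patterns.items()
}
for name, curve in results.items():
    print(name, [round(v, 3) for v in curve])
```

Under these assumptions, increasing NDME gives an observed cumulative prevalence that rises from 0.32 after wave 1 to 0.36 after wave 5, so the overestimate of the true value of 0.30 grows with each additional wave.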
DISCUSSION
Increasing exposure misclassification error by wave may be most likely to occur in epidemiologic studies that use questionnaire techniques for dichotomous exposure assessments, including questions that deal, for example, with the medical history of index persons and their relatives ("have you ever had..."), occupational exposures ("have you ever worked with..."), and other exposures. The misclassification error of those exposure measures may be higher among participants who are recruited in later waves and are less motivated than among participants who are recruited in earlier waves and are highly motivated. In addition, study personnel may also contribute to increasing exposure misclassification error by wave if they accept sloppy answers more easily from difficult-to-recruit participants in later waves than from easy-to-recruit participants in earlier waves.
These results illustrate that if we observe an increased relative risk in a cohort study with a low response proportion, we have no good reason, provided that the exposure misclassification in the dichotomous variable is nondifferential, to assume that the observed relative risk is an overestimation of the true relative risk. This finding is the logical consequence of the fact that if the exposure misclassification is nondifferential by wave, it is so for all intermediate steps; hence, the relative risk estimate is biased towards the null up to any wave. On the contrary, if we observe an elevated relative risk in a cohort study with a low response proportion, this relative risk estimate may be less biased towards the null than if we had achieved a higher response proportion.
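The argument can be illustrated numerically. The sketch below tracks the observed relative risk after each wave under increasing NDME, assuming a hypothetical cohort of 1,000 participants per wave, a constant true exposure prevalence of 0.30, and disease rates of 0.02 and 0.01 (a true relative risk of 2). All of these inputs and the function name `observed_rr_by_wave` are ours, chosen only for illustration.

```python
def observed_rr_by_wave(new_exposed, new_unexposed, sensitivity, specificity,
                        rate_exposed, rate_unexposed):
    """Observed relative risk based on all subjects recruited up to each wave."""
    a = b = c = d = 0.0
    rr = []
    for e_new, u_new, sn, sp in zip(new_exposed, new_unexposed,
                                    sensitivity, specificity):
        a += e_new * sn          # truly exposed, classified exposed
        b += e_new * (1 - sn)    # truly exposed, classified unexposed
        c += u_new * (1 - sp)    # truly unexposed, classified exposed
        d += u_new * sp          # truly unexposed, classified unexposed
        risk_classified_exposed = (a * rate_exposed + c * rate_unexposed) / (a + c)
        risk_classified_unexposed = (b * rate_exposed + d * rate_unexposed) / (b + d)
        rr.append(risk_classified_exposed / risk_classified_unexposed)
    return rr


accuracy = [0.95, 0.90, 0.85, 0.80, 0.75]  # increasing NDME by wave
rr_curve = observed_rr_by_wave([300] * 5, [700] * 5, accuracy, accuracy,
                               rate_exposed=0.02, rate_unexposed=0.01)
print([round(v, 2) for v in rr_curve])
```

In this hypothetical setting the observed relative risk stays between 1 and 2 at every wave and moves closer to the null as later, more error-prone waves are added (roughly 1.85 after wave 1 and 1.60 after wave 5).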
In the presence of NDME in a dichotomous exposure variable, the degree of bias in the exposure prevalence estimates depends on the wave-specific trend of the misclassification error and on the wave-specific trend of the underlying true exposure prevalence. We found that if the true wave-specific exposure prevalence is constant and the NDME increases by wave, studies with low response proportions give less biased prevalence estimates than studies with high response proportions. This association is complicated when the true wave-specific exposure prevalence increases by recruitment wave. In that scenario, the degree and direction of bias in the prevalence estimate depend on the magnitude of change in the true wave-specific exposure prevalence and the nondifferential misclassification of the exposure. If the true wave-specific exposure prevalence decreases by wave, the prevalence estimates tend to be more biased the lower the response for all three patterns of wave-specific NDME that we presented. In this scenario, one may be right in assuming that the higher the response proportion the less biased is the prevalence estimate. However, even then the bias of the relative risk estimate is smaller for studies with lower response proportions, except if the NDME decreases by wave. Monitoring of the wave-specific exposure misclassification (sensitivity, specificity) could provide further insights into these interrelations. For example, in each wave, current smoking status as derived from interviews with study participants could be compared with serum cotinine concentration, which is frequently used to validate self-reported information on smoking (18).
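Such monitoring amounts to computing sensitivity and specificity from a validation cross-tabulation within each wave. The following sketch uses entirely hypothetical counts, with the reference measurement (for example, a biomarker) taken as the truth; the function name `sensitivity_specificity` is ours.

```python
def sensitivity_specificity(true_pos, false_neg, false_pos, true_neg):
    """Sensitivity and specificity of self-report against a reference measure."""
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Hypothetical validation counts per wave:
# (true positives, false negatives, false positives, true negatives)
validation_counts = {
    1: (95, 5, 10, 190),
    2: (45, 5, 20, 180),
    3: (30, 10, 15, 85),
}

by_wave = {
    wave: sensitivity_specificity(*counts)
    for wave, counts in validation_counts.items()
}
for wave, (sn, sp) in by_wave.items():
    print(f"wave {wave}: sensitivity={sn:.2f} specificity={sp:.2f}")
```

A downward trend in these wave-specific values would be consistent with the increasing-NDME pattern; in practice, the estimates would also carry sampling uncertainty that this sketch ignores.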
We have good reason to assume that for some risk factors (e.g., smoking), NDME increases by wave and true prevalence increases by wave (e.g., see the study by Barchielli and Balzi (17)), so that in practical situations a lower response proportion may be better in terms of bias in the relative risk estimate. In terms of bias in the prevalence estimate, our calculations give acceptable results between wave 3 and wave 4 (50–80 percent cumulative response).
There are several factors that limit our findings. First, we considered only misclassification of a dichotomous exposure. If the exposure measure is a polytomous or continuous variable, nondifferential exposure misclassification may bias the effect estimate away from the null (19). In addition, if the exposure misclassification is differential, bias in all directions is possible. However, this does not limit our findings with respect to the prevalence estimation problem as presented in figures 1–3. The relative risk estimation in this situation becomes even more complex than in the usual differential misclassification situation if we make the realistic assumption that the probability of disease depends not only on exposure status but also on willingness to participate in the study (wave). Further research is necessary to understand the interdependence of this mechanism and the differential misclassification assumption and relative risk estimation.
Second, we studied only a limited number of potential scenarios of wave-specific true exposure prevalence and misclassification trends and true relative risk estimates. Third, we did not focus on increases in missing data by wave, which may bias the estimates if the true exposure prevalence and missingness are both associated with recruitment wave. Fourth, our findings are not necessarily generalizable to case-control studies in which recruitment may vary by disease status.
In conclusion, studies with low response proportions may be less biased than studies with high response proportions if the NDME of a dichotomous exposure increases by recruitment wave. Epidemiologists need a better understanding of the interrelated effects of response proportions, wave-specific true prevalences, and exposure misclassification errors on observed exposure prevalences and effect estimates.