From the Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD.
Received for publication October 22, 2002; accepted for publication October 16, 2003.
ABSTRACT
bias (epidemiology); cross-over studies; epidemiologic methods; epidemiologic research design; length-biased distribution; mixture model; recurrence
Abbreviation: MH, Mantel-Haenszel.
INTRODUCTION
Information in the original version of the design, for each case, consists of the subject's past usual exposure experience and the gap time between the event and the last exposure before it. The task then is to assess how different the observed gap times are from those that would be expected on the basis of the usual exposure experience and if there were no true event-exposure association (the null hypothesis). This basic design is the one most often recommended (2) and is frequently used; for more recent applications, see, for example, these reports (3–6). Literature also exists on other versions of the design, such as relying on proportional hazards models for inference (7), using only short periods of potential exposure before the event (8, 9), or including periods of exposure after the event (10). Time trends in exposure have also been discussed by several author groups (11–15).
In this paper, we reveal and address two problems that are more generally present in the various versions of the case-crossover design. These problems exist even without time trends (stationarity of exposure experience); they can invalidate testing even when there is truly no exposure-event relation, and they are not specific to any single particular model such as the proportional hazards assumption. Moreover, to our knowledge, these two problems have not yet been appropriately addressed. To better focus on the main ideas, we present these problems in the original design for stationary exposure experience.
These problems are as follows: first, length bias even when the null hypothesis of no exposure-event relation is true; and second, low precision when such a relation exists. In particular, usual-frequency analyses discretize time into blocks of exposed and unexposed periods and then calculate the following: 1) the exposed time periods having an event, as a fraction of the total exposed periods; 2) the unexposed periods having an event, as a fraction of the total unexposed periods; and 3) a "risk ratio," using "1" and "2" across all subjects. This risk ratio is then compared with unity. We show that such analyses are length biased in the sense that, under the null, the event will more likely fall on a longer-than-average period between exposures. As a result, we show that, under the null, the standard risk ratios are generally lower than 1 when some subjects' average periods between exposures are comparable with the assumed effect period. We obtain a formula for the bias and adjust the usual-frequency analyses. The second problem arises because usual-frequency analyses do not use the full gap times between events and last exposures. We provide and demonstrate a new method of analysis that is valid under the null and also efficiently takes account of the full information on the gap times.
MATERIALS AND METHODS
Consider a sample of cases, i = 1, ..., n, who have the event, and assume that they are a random sample from the population of cases to which we wish to generalize. Assume that each case is experiencing the exposure in a systematically recurrent way; the times between successive exposures are called the exposure's interarrival times and are denoted by T (figure 1). To reflect the systematic pattern in the interarrivals within a person, when appropriate, assume that the successive interarrivals are independent and identically distributed samples from a cumulative distribution that is specific to the subject, denoted by Fi(t) = pr(T ≤ t | subject i). Periods during which the systematic pattern is interrupted, for example, by sleep, do not count (2). For extensions, see the Discussion.
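To make the setup concrete, the following is a minimal simulation sketch of one subject's recurrent exposures and the resulting gap time under the null hypothesis; the exponential interarrival distribution and all numbers are illustrative assumptions, not part of the original design:

```python
import random

def simulate_gap_time(mean_interarrival, horizon=4000.0):
    """Simulate one subject's recurrent exposures (exponential interarrivals,
    an illustrative choice) and, under the null hypothesis H0, place the
    event uniformly in time; return the gap time G back to the last
    exposure before the event."""
    times, t = [], 0.0
    while t < horizon:
        t += random.expovariate(1.0 / mean_interarrival)
        times.append(t)
    # Under H0 the event timing is unrelated to the exposure pattern.
    event = random.uniform(times[0], times[-1])
    last_exposure = max(x for x in times if x <= event)
    return event - last_exposure

random.seed(1)
gaps = [simulate_gap_time(8.0) for _ in range(2000)]
mean_gap = sum(gaps) / len(gaps)
# With exponential interarrivals the gap time is again exponential with
# the same mean (memorylessness), so mean_gap should be near 8.0.
print(round(mean_gap, 2))
```

For non-exponential interarrival distributions, the mean gap time exceeds half the mean interarrival whenever the interarrivals are variable, which is the length-bias phenomenon discussed later in this section.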
The general analytical task with these data is to compare the gap times Gi that are observed with those that would be expected under the null hypothesis, H0, of no association between exposure and timing of the event. Next, we review briefly the usual-frequency analysis for this comparison.
Review of usual-frequency analyses
The usual-frequency analysis tries to quantify the association between exposure and event by estimating a "rate ratio" calculated as follows.
First, a fixed period d, for instance, 1 hour, is chosen to reflect a time window after exposure and during which exposure can affect the event, if such an effect exists. This period is called the "assumed effect period." Then, a long time period of length L (e.g., 1 year), counting backwards from the event of each subject, is binned into "exposed" periods and "nonexposed" periods, each of length d. Exposed periods are defined to be those that start at an exposure and last for one effect period of length d, and nonexposed periods are the remaining periods of length d (figure 1).
The event for a subject i is classified to occur either during an exposed period or not, which is represented by the variable Xi as equation 1:

Xi = 1 if the gap time Gi ≤ d (event in an exposed period), and Xi = 0 otherwise. (1)
The two-way classification that results from categorizing a subject's periods based on exposure and event status and that is used in usual-frequency analyses is given in table 1.
Combining the per-subject tables with the usual Mantel-Haenszel weights gives the estimator (equation 2)

MH(n) = {Σi Xi (L/d − L/µi)/(L/d)} / {Σi (1 − Xi) (L/µi)/(L/d)}, (2)

where L/µi is the number of exposed times for subject i, and L/d is the maximum number of times that subject could have been exposed. Use of this estimator implicitly defines the target of estimation to be the ratio, say MH, to which MH(n) would converge with an increasing number of subjects from the case population. It is important, therefore, to gain insight into MH by interpreting it as a ratio of two fractions: the exposed time periods having an event (as a fraction of the total exposed periods) divided by the unexposed periods having an event (as a fraction of the total unexposed periods), combined across all the subjects with the usual Mantel-Haenszel weights. MH is then compared with 1 to assess the exposure-event association. For example, consider a hypothetical large study in which all subjects have a common period of µi = 8 hours as the average interarrival between injections of a drug, and where the effect period d for myocardial infarction is taken to be 5 hours. Then, the MH ratio will be 1 when, and only when, we observe that five of eight subjects have the myocardial infarction event in an exposed period, and values of MH that are larger (smaller) than 1 are interpreted by the usual-frequency analysis to indicate a positive (negative) exposure-event association. More generally, the usual-frequency analyses take the value of 1 for MH to be the reference value indicating no exposure-event association and around which they judge other degrees of association.
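The Mantel-Haenszel calculation described above can be sketched as follows; the per-subject 2 × 2 weighting used here is the standard MH form implied by table 1, and the data mirror the hypothetical 8-hour/5-hour example (the function name and the 1-year value of L are illustrative):

```python
def mh_estimate(x, mu, d, L):
    """Usual-frequency Mantel-Haenszel risk-ratio estimate.
    x[i]  : 1 if subject i's event fell in an exposed period, else 0
    mu[i] : subject i's average interarrival between exposures
    d     : assumed effect period; L : length of the look-back window.
    Per subject there are L/mu[i] exposed periods, L/d - L/mu[i]
    unexposed periods, and L/d periods in total (the MH weights)."""
    num = sum(xi * (L / d - L / mui) / (L / d) for xi, mui in zip(x, mu))
    den = sum((1 - xi) * (L / mui) / (L / d) for xi, mui in zip(x, mu))
    return num / den

# Hypothetical example from the text: all 8 subjects share mu = 8 hours,
# the effect period is d = 5 hours, and 5 of the 8 events fall in
# exposed periods; the MH ratio then equals exactly 1.
ratio = mh_estimate(x=[1] * 5 + [0] * 3, mu=[8.0] * 8, d=5.0, L=8760.0)
print(ratio)
```

Note that L cancels out of the ratio, consistent with the text's interpretation of MH as a comparison of event fractions between exposed and unexposed periods.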
Such analyses generally have two related problems. First, the classification of the gap times Gi based on a common effect period cannot address possible heterogeneous effect periods across subjects and so loses power for tests and efficiency for estimation in situations when exposure does affect the event. Consider, for example, subject 2 of Maclure's data on sexual activity (1). That subject has a frequency of exposure of 2/week, and the gap time between last exposure and the event is 90 minutes. The ratio of that gap time to the period between exposures, therefore, is 0.018, so the data from that person alone suggest that exposure has affected that person's event. This individual-level degree of information is lost, however, in the usual-frequency analyses, where the person's gap time is simply classified as either "having" or "not having" the event in an exposed period, and this problem exists regardless of how the common bin length d of the effect period is chosen for making that classification. To address this problem, one should use the full gap times Gi in an analysis that compares them with the usual periods of exposure. This task brings up a more fundamental problem, which is to understand the reference distribution of the gap times Gi in the case-crossover design if the null hypothesis, H0, of no exposure-event association is true. For this reason, in the next section we first discuss the more fundamental properties that H0 induces on the gap times Gi and demonstrate them on the MH estimator. The results of that section are then used to better address the heterogeneous effects of exposure on the timing of the event.
Revealing and addressing the length bias of gap times in the case-crossover design
In this section, we assume that the null hypothesis H0 is true. By definition, under H0, for a subject, the event is a random occurrence along the time pattern of exposures. It follows that the event is more likely to be found in an interarrival between exposures, for example, Te, that is longer than the average interarrival µi (figure 1). Therefore, the case-crossover design exhibits a phenomenon that is known in other applications in epidemiology and sampling as "waiting-time bias" or "length bias" (1618).
In particular, when subject i's timing of the event is unrelated to the subject's process of exposures, then, using standard results of renewal process theory (19), the distribution of the gap times Gi between the event and the last exposure is related to the interarrival distribution Fi by

H0i(t) = pr(Gi ≤ t | Fi, H0) = (1/µi) ∫0^t {1 − Fi(u)} du (3)

for times t > 0, where µi is the mean of Fi. The distribution function 3 is the reference distribution against which the gap times should be compared at a subject-specific level under H0. This means that no matter what method is used to model the effects of exposure on the event, it should be such that its properties under H0 reduce to those determined by equation 3.
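The reference distribution of equation 3 can be checked by simulation. The sketch below assumes Uniform(0, b) interarrivals (an illustrative choice that makes the length bias visible) and compares the empirical distribution of the gap times with the equilibrium form (1/µ) ∫0^t {1 − F(u)} du:

```python
import random

random.seed(2)
mu, b = 8.0, 16.0        # interarrivals Uniform(0, b), so the mean is b/2

def backward_gap(horizon=2000.0):
    """One draw of the gap time G under H0: run a renewal process with
    Uniform(0, b) interarrivals and place the event uniformly in time."""
    times, t = [], 0.0
    while t < horizon:
        t += random.uniform(0.0, b)
        times.append(t)
    event = random.uniform(times[0], times[-1])
    return event - max(x for x in times if x <= event)

gaps = [backward_gap() for _ in range(4000)]

# Equation 3 (the equilibrium distribution) for this F:
# pr(G <= t) = (1/mu) * integral_0^t (1 - u/b) du = (t - t*t/(2*b)) / mu
t0 = 4.0
theoretical = (t0 - t0 * t0 / (2 * b)) / mu
empirical = sum(g <= t0 for g in gaps) / len(gaps)
print(round(theoretical, 3), round(empirical, 3))
```

The same simulation also shows the bias itself: the event preferentially lands in longer-than-average interarrivals, so the mean observed gap exceeds µ/2.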
To demonstrate this argument, note that equation 3 also determines the properties of the MH estimator under the null H0. In particular, from equations 3 and 1, we find that the probability that the subject's event will fall in an "exposed" period of length d is

pr(Xi = 1 | Fi) = H0i(d) = (d/µi) ki, (4)

where

ki = (1/d) ∫0^d {1 − Fi(u)} du.

Using E(Xi | Fi) = pr(Xi = 1 | Fi) and the weak law of large numbers, we find, after some algebra, that the estimator MH(n) will, in large samples, converge to MH(null), where

MH(null) = E{ki ri (1 − ri)} / E{ri (1 − ki ri)}, (5)

where ri = d/µi, and where the expectation, E(.), in equation 5 denotes the average over all subjects in the case population.
Generally, to study transient effects, the effect period d is chosen to be smaller than all subject-specific average interarrivals, µi. It is evident, then, from equation 5 that MH(null) is smaller than 1 whenever some ki < 1, that is, whenever some actual interarrival times are smaller than the assumed effect period d. As noted by a referee, Maclure (1) deals with length bias by originally assuming in table 4 therein that such cases are not possible. However, when, more generally, such cases occur, if there is truly no exposure-event relation, the usual-frequency analyses will incorrectly indicate an inverse association, that is, a protective effect of exposure. By the same argument, whenever there is an exposure-event relation, the usual-frequency analysis will tend to pull that relation toward the negative association present due to length bias. In short, the usual-frequency analysis will tend to underestimate positive risk ratios and exaggerate true protective effects. The degree of bias, that is, how different MH(null) is from 1, depends on how far the quantities ki fall below 1 and, thus, on the relative mass of the distribution Fi of interarrivals that lies in (0, d). Therefore, in practice, the bias is expected to be large for exposures with relatively short interarrivals compared with the assumed effect periods, such as with injection of illegal drugs, and expected to be small for exposures with longer interarrivals, such as sexual activity.
Using mixture models to increase efficiency
We can now use the reference distribution 3 for the gap times to formulate a model that allows different subjects to be possibly susceptible to different effect periods as follows.
Each subject i's event-exposure gap time Gi may or may not have been affected by the exposure. We assume that Gi was not affected by exposure if, conditionally on that subject's usual exposure experience as characterized by Fi, the subject's gap time Gi was a random draw from the null cumulative distribution function of Gi, H0i(t), from equation 3; such a subject we indicate by Ai = 0. Otherwise, we assume that the subject's gap time was affected by exposure, and, in this case, we let Gi be a draw from some other cumulative distribution, for instance, Haff; such a subject we indicate by Ai = 1 (table 2).
The observed gap time Gi is then, given Fi, a draw from the mixture (equation 6)

pr(Gi ≤ t | Fi) = (1 − πaff) H0i(t) + πaff Haff(t), (6)

where πaff is the case population's fraction of subjects whose gap time is affected by exposure, which is assumed to be common across Fi.
Estimation in the above model here focuses on the fraction πaff, which is 0 in the special case when exposure does not affect the event. For stable estimation in small samples, it is preferable that Haff be set to a fixed distribution, the variability of which can encompass a reasonably wide range of actual effect periods. Model 6 then explicitly allows that 1) any affected subjects were affected by possibly different actual effect periods and 2) the null model 3 is true (when πaff = 0), so tests based on model 6 are valid. In larger samples, Haff can sometimes also be estimated and be allowed to depend on covariates (see Discussion). For identifiability of the above model, it is best that the chosen Haff be centered at a different time than the average of the centers of the null distributions H0i.
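As an illustrative sketch of fitting model 6, the code below computes the maximum likelihood estimate of πaff by a grid search, assuming (purely for illustration) exponential interarrivals, so that the null gap-time distribution for subject i is exponential with mean µi, and a fixed exponential Haff, as in the paper's demonstration:

```python
import math
import random

def pi_aff_mle(gaps, mus, mu_aff, grid=201):
    """Grid-search maximum likelihood estimate of pi_aff in mixture
    model 6.  Illustrative assumptions: exponential interarrivals, so the
    null gap-time density for subject i is exponential with mean mus[i],
    and H_aff fixed to an exponential with mean mu_aff."""
    def loglik(p):
        ll = 0.0
        for g, mu in zip(gaps, mus):
            null_dens = math.exp(-g / mu) / mu
            aff_dens = math.exp(-g / mu_aff) / mu_aff
            ll += math.log((1 - p) * null_dens + p * aff_dens)
        return ll
    candidates = [i / (grid - 1) for i in range(grid)]
    return max(candidates, key=loglik)

# Synthetic data: 30 percent of gap times drawn from H_aff (mean 1 hour),
# the rest from the null distribution (mean 8 hours).
random.seed(3)
mus = [8.0] * 400
gaps = [random.expovariate(1.0) if random.random() < 0.3
        else random.expovariate(1.0 / 8.0) for _ in mus]
est = pi_aff_mle(gaps, mus, mu_aff=1.0)
print(est)
```

The grid search keeps the estimate inside [0, 1] by construction, which matters here because the null value 0 lies on the boundary of the parameter space.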
The above model is useful for inference, such as constructing confidence intervals for πaff and testing the null πaff = 0. For testing, when πaff is expected to be relatively small, the score test using the full data (full score test) gives the largest power. Analogously, when we keep only the binned gap times based on the assumed effect period d, which we call the "reduced data," the corresponding reduced score test gives the largest power among all tests that use only the reduced data. Generally, a full score test is more efficient than a reduced score test in the sense that it can achieve the same power as that of the reduced score test using only a fraction of the reduced score test's number of subjects (20). The gain in efficiency, then, can be expressed as the percent saving in sample size, when designing the study, between the use of the full score test and the use of the reduced score test for testing, and it is calculated as 1 minus the ratio of the variances of the reduced score test statistic versus the full score test statistic. For the examples of Maclure (1), we report the gain in efficiency of the full score test versus the reduced score test in the next section.
To obtain 95 percent confidence intervals for πaff, we first find its maximum likelihood estimator. Because the null value πaff = 0 is at the boundary of the allowed values of πaff, the usual theory for constructing confidence intervals based on the standard error and using a normal approximation to the distribution of the estimator, or of a transformation of it, is not applicable here. For this reason, we use the approach described by Frangakis and Varadhan (21). With this approach, we find the small-sample distribution of the maximum likelihood estimator (using simulation) as a function of the different possible true values of πaff. Then, by inverting that relation, we obtain a 95 percent confidence interval for πaff that has, to any desired approximation, 95 percent coverage for the true fraction πaff, whether or not that fraction is on the boundary. For more details of this method, given in terms of a different model and data, see the article by Frangakis and Varadhan (21). Using this method, we report confidence intervals for πaff in examples in the next section.
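The simulation-inversion idea can be sketched schematically as follows; this is not the authors' implementation (21), only an illustration of keeping in the 95 percent confidence set every candidate true value of πaff whose simulated MLE distribution covers the observed estimate. The exponential components, grid sizes, and simulation counts are all illustrative assumptions:

```python
import math
import random

def mle(gaps, mu_null, mu_aff, grid=41):
    """Grid-search MLE of pi_aff in mixture model 6 (exponential null and
    affected components are illustrative assumptions)."""
    def loglik(p):
        return sum(math.log((1 - p) * math.exp(-g / mu_null) / mu_null
                            + p * math.exp(-g / mu_aff) / mu_aff)
                   for g in gaps)
    cand = [i / (grid - 1) for i in range(grid)]
    return max(cand, key=loglik)

def simulate_mle(p_true, n, mu_null, mu_aff, rng):
    """Draw one data set of size n under a candidate true pi_aff and
    return the resulting MLE."""
    gaps = [rng.expovariate(1.0 / (mu_aff if rng.random() < p_true
                                   else mu_null)) for _ in range(n)]
    return mle(gaps, mu_null, mu_aff)

def boundary_ci(est, n, mu_null, mu_aff, sims=40, seed=4):
    """Schematic version of the simulation-inversion idea: keep a candidate
    true value p in the 95 percent confidence set if the observed estimate
    falls inside the central 95 percent of the simulated distribution of
    the MLE under p.  Works whether or not p is on the boundary."""
    rng = random.Random(seed)
    keep = []
    for i in range(11):                    # coarse grid of true values
        p = i / 10
        draws = sorted(simulate_mle(p, n, mu_null, mu_aff, rng)
                       for _ in range(sims))
        lo, hi = draws[int(0.025 * sims)], draws[int(0.975 * sims) - 1]
        if lo <= est <= hi:
            keep.append(p)
    return (min(keep), max(keep)) if keep else (est, est)

rng = random.Random(5)
data = [rng.expovariate(1.0 / (1.0 if rng.random() < 0.3 else 8.0))
        for _ in range(100)]
est = mle(data, mu_null=8.0, mu_aff=1.0)
ci = boundary_ci(est, n=100, mu_null=8.0, mu_aff=1.0)
print(est, ci)
```

In practice one would refine the grid of candidate true values and raise the simulation count to reach the desired approximation of 95 percent coverage.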
DEMONSTRATION
Under the exponential model for Fi, the distribution of the gap time is the same exponential, H0i(t) = Fi(t). Moreover, when a bin of length d is used as the assumed effect period, the MH(null), at which the standard risk ratio is actually centered under the null, can be approximated by replacing the expectations in expression 5 with the sample averages:

MH(null) ≈ [n^(−1) Σi {1 − exp(−d/µi)}(1 − d/µi)] / [n^(−1) Σi (d/µi) exp(−d/µi)]. (7)
It can be seen from equation 7 that the null risk ratio depends only on the parameters d/µi, when the interarrival distribution is exponential. The smaller the d/µi are, the closer the null risk ratio is to 1, and the larger the d/µi are, the closer the null risk ratio is to 0 (note that d < µi, for all subjects).
Table 3 gives MH(null) (null risk ratio), along with the original MH(n) estimate (standard risk ratio) and the adjusted risk ratio MH(n)/MH(null). As the table demonstrates, for sexual activity the bin length as a percentage of the median of the periods µi is negligible, and therefore so is the length bias. For coffee drinking, the three smallest average interarrivals µi were 2.4, 3, and 4.8 hours, and the bin length as a percentage of the median of the periods µi is 8–16 percent. Then the null value MH(null) is estimated by the exponential model to be 20–40 percent smaller than the null value of 1 assumed by the usual-frequency analysis. In practical terms, the bias arises because the model predicts that some actual interarrivals for some subjects are smaller than the effect period if the latter is assumed to be 1 or 2 hours. This prediction can be tested using the data on previous actual interarrivals other than the last ones that have the events. Note, however, that it is generally not appropriate to remove from the analyses such subjects for which that prediction is actually true, because efficiency would be decreased and because the effect can be different between those subjects and the remaining subjects.
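A numerical sketch of the null-ratio calculation under exponential interarrivals follows; the closed form used here (averaging {1 − exp(−ri)}(1 − ri) over subjects and dividing by the average of ri exp(−ri), with ri = d/µi) is a reconstruction consistent with the sample-average approximation described above, and the interarrival values are illustrative:

```python
import math

def mh_null_exponential(mus, d):
    """Null value MH(null) under exponential interarrivals, replacing the
    expectations in expression 5 by sample averages over the subjects
    (equation 7), with r_i = d / mu_i; offered as a reconstruction/sketch."""
    r = [d / mu for mu in mus]
    num = sum((1 - math.exp(-ri)) * (1 - ri) for ri in r) / len(r)
    den = sum(ri * math.exp(-ri) for ri in r) / len(r)
    return num / den

# When d is tiny relative to every mu_i, the null ratio is close to 1
# (negligible length bias) ...
v1 = mh_null_exponential([100.0] * 5, d=1.0)
# ... while interarrivals comparable to d pull it well below 1
# (these coffee-like interarrival values are illustrative).
v2 = mh_null_exponential([2.4, 3.0, 4.8, 8.0, 12.0], d=2.0)
print(round(v1, 3), round(v2, 3))
```

This reproduces the qualitative behavior stated in the text: the null risk ratio depends only on the ratios d/µi and moves from 1 toward 0 as those ratios grow.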
In table 4, we calculate the efficiency gained when using the full gap times with model 6 versus when using binned gap times. We take Haff to be the exponential distribution with mean µaff set to three different effect periods: 0.5, 1, and 2 hours. The results show gains of 20–52 percent in efficiency. Moreover, the gains in both examples, coffee and sexual activity, are approximately the same function of the ratio of the true average effect period µaff to the length d used when binning the gap times. This suggests that the relative efficiency between the two methods does not depend much on the interarrival distribution. Note, however, that even when the true average effect period µaff equals the length used for binning the gap times Gi, there is an efficiency gain of 20 percent when using the full Gi versus the Gi binned into Xi. This is because the method that fully uses data on gap times better capitalizes on the available subject-specific information about possible effects than the usual-frequency analysis.
DISCUSSION
For example, a modified version of the frequency design is to replace using the usual past frequency with using discrete periods in the past (or future) as control periods to compare with the case period that has the event (8, 9). The usual frequency design is then a special case where the window of the control periods becomes contiguous and increases to a long period of, for example, 1 year. Versions of this design with small control window widths close to the case period can be more appropriate for the stationarity assumption when there are time trends in exposure for longer periods, and less prone to reporting biases, compared with the frequency elicited by the subjects (1). As also noted by a referee, when only one control period is used, length bias will not be a concern because it will be equally present in both the case and the control periods. As the control window width increases, however, length bias decreases for the control period, whereas it remains a problem for the case period and therefore still needs to be addressed. Addressing length bias in such intermediate designs is a topic for future work. In addition, in such versions of the design, the distribution Fi would have to be estimated from the few reference interarrivals. Such estimation can also allow that different interarrivals be dependent within a subject, through an unobserved frailty, for example, as discussed previously (22). To increase efficiency when using a mixture model 6, Fi could alternatively be estimated jointly with the components representing the affected subjects.
With regard to modeling exposure interarrival times using a probability distribution, it is useful to distinguish two cases: 1) assuming stationary interarrivals (i.e., absence of a temporal trend) and 2) not assuming stationary interarrivals. In case 1, the assumption of stationarity expedites the modeling of exposure interarrival times. For example, we can model the interarrival distribution using a semiparametric approach, where we model the null interarrival distribution nonparametrically and the distribution under the alternative, Haff(t), using a parametric model. This semiparametric approach, when used for testing, will have the correct α level (type I error rate) for any null distribution, since any null distribution can be obtained from equation 6 by letting πaff = 0. More importantly, this yields a more powerful procedure for detecting departures from the null with heterogeneous effect periods. In case 2, time trends or nonstationarity in the exposure interarrivals affect not only our approach but the very principle of the case-crossover design, in that the basic question of whether something unusual happened before the event needs to be more clearly formulated. Additional assumptions are needed to model such time trends in exposure.
The mixture modeling approach discussed here can incorporate a number of improvements in larger data sets, where more stable estimation is possible. First, we can allow for cofactor effects by simultaneously modeling other exposures, as done in existing approaches (8, 10). Second, as stated earlier, model 6 can be used to estimate the distribution of the gap time conditionally on affected individuals, even though affected status Ai is not directly observed. Third, the model can allow for covariates in both the proportion of affected subjects πaff, for example, using a logistic regression, and the distribution of affected subjects Haff. Such joint modeling, not explored by the usual-frequency analyses, can clarify the effect process, because it distinguishes between a relation of the covariate with πaff, which informs about the likelihood of being affected when exposed, and the relation of the covariate with Haff, which informs about the duration of the effect when affected. In such mixture modeling, the expectation-maximization algorithm (23) can facilitate estimation by maximum likelihood.
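The expectation-maximization iteration for πaff in model 6 can be sketched minimally as follows, again assuming exponential null and affected components purely for illustration:

```python
import math
import random

def em_pi_aff(gaps, mus, mu_aff, iters=200):
    """Expectation-maximization iterations for pi_aff in mixture model 6
    (exponential null and affected components, an illustrative choice).
    E-step: posterior probability that each subject was affected (Ai = 1).
    M-step: pi_aff <- average posterior probability."""
    p = 0.5
    for _ in range(iters):
        post = []
        for g, mu in zip(gaps, mus):
            aff = p * math.exp(-g / mu_aff) / mu_aff
            nul = (1 - p) * math.exp(-g / mu) / mu
            post.append(aff / (aff + nul))          # E-step
        p = sum(post) / len(post)                   # M-step
    return p

# Synthetic data: 25 percent of gap times affected (mean 1 hour),
# the rest null (mean 8 hours).
random.seed(6)
mus = [8.0] * 500
gaps = [random.expovariate(1.0) if random.random() < 0.25
        else random.expovariate(1.0 / 8.0) for _ in mus]
est = em_pi_aff(gaps, mus, mu_aff=1.0)
print(round(est, 2))
```

The E-step posterior probabilities are also exactly the quantities one would use to estimate the gap-time distribution conditionally on affected individuals, as mentioned in the second improvement above.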
We recommend our approach when there are accurate data on gap times and on past exposure experience, but not when such data are prone to large measurement error. We hope that, with increasing capability to accurately record processes in real time, for example, with modern medical monitoring devices, our approach will contribute to both addressing length bias when needed and improving the understanding of mechanisms in transient effects when using the case-crossover design.
ACKNOWLEDGMENTS
The authors thank Drs. Eleni Petridou, Dimitrios Trichopoulos, and Mei-Cheng Wang for valuable comments.