Methods for the analysis of incidence rates in cluster randomized trials

Steve Bennett^a, Tamiza Parpia^a,b, Richard Hayes^a and Simon Cousens^a

a MRC Tropical Epidemiology Group, Department of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, Keppel St, London WC1E 7HT, UK.


    Abstract
 
Background The published literature on cluster randomized trials focuses on outcomes that are either continuous or binary. In many trials, the outcome is an incidence rate, such as mortality, based on person-years data. In this paper we review methods for the analysis of such data in cluster randomized trials and present some simple approaches.

Methods We discuss the choice of the measure of intervention effect and present methods for confidence interval estimation and hypothesis testing which are conceptually simple and easy to perform using standard statistical software. The method proposed for hypothesis testing applies a t-test to cluster observations. To control confounding, a Poisson regression model is fitted to the data incorporating all covariates except intervention status, and the analysis is carried out on the residuals from this model. The methods are presented for unpaired data, and extensions to paired or stratified clusters are outlined.

Results The methods are evaluated by simulation and illustrated by application to data from a trial of the effect of insecticide-impregnated bednets on child mortality.

Conclusions The techniques provide a straightforward approach to the analysis of incidence rates in cluster randomized trials. Both the unadjusted analysis and the analysis adjusting for confounders are shown to be robust, even for very small numbers of clusters, in situations that are likely to arise in randomized trials.

Keywords Cluster randomized trials, incidence rate, t-test, Poisson regression, mortality rate

Accepted 1 February 2002


    Introduction
 
Clinical trials typically involve the randomization of individual subjects to the intervention or control groups. Cluster randomized trials are, however, characterized by the randomization of groups or clusters of individuals (such as schools, medical practices or communities) to treatment groups. Such designs are used increasingly frequently in trials of preventive interventions, for example of the effects of vitamin A supplementation1 and of insecticide-impregnated bednets2–4 on child mortality, or of control of sexually transmitted diseases5 on human immunodeficiency virus (HIV) incidence. Cluster randomized trials have a particularly important role in the study of behavioural interventions, which may necessarily be implemented at the cluster level, and of interventions against infectious diseases, where a cluster-wide intervention may have an additional effect of reducing transmission.

An important feature of cluster randomized trials is that the responses of individuals within a cluster may be correlated. Conventional approaches to the analysis of comparative trials, which assume that individuals’ responses are independent of each other, are thus invalid. This correlation within clusters results in the statistical efficiency of cluster randomized designs being lower than that of individually randomized designs, and so a larger sample size is necessary for equivalent power.6 In analysis, failure to take account of such intracluster correlation does not bias the point estimate of the intervention effect, but leads to a falsely narrow confidence interval (CI) and an inappropriately small P-value.

Several methods have been developed for the statistical analysis of cluster randomized trials, which take into account the intracluster correlation. Methods for analysis of continuous outcome data from such study designs are well established.7,8 Methods for the analysis of binary outcomes have been developed by Donner and Klar.9,10 These include simple techniques such as the t-test, the Wilcoxon rank sum test and Fisher’s permutation test, which use the proportion of individuals experiencing the event in each cluster as the observation, and methods using individual data based on corrections to the χ² test. Methods for paired cluster designs have also been described.11–13

More sophisticated approaches involve the use of hierarchical regression models which permit control of both individual- and cluster-level covariates, for example, generalized estimating equations14 or multilevel modelling.15 These methods, and their limitations, are not widely understood by non-statisticians, and it is not clear that they are more valid than the simple methods; for example, a test based on generalized estimating equations has been shown to be invalid when the number of clusters is small.16

Gail and co-workers17,18 propose a simpler approach to adjusting for individual- and cluster-level covariates, based on using a permutation test on the residuals from a logistic regression model, which includes all baseline covariates except the intervention term.

There has been little work done on methods for the analysis of cluster randomized trials where the outcome measure is an event rate (per person year), such as mortality, or incidence rate of disease, rather than a proportion. Brookmeyer and Chen19 evaluated significance tests for paired designs, while Lui et al.20 evaluated CI for estimates of the risk ratio (not rate ratio) from cluster sampling, although only for studies with at least 20 clusters. We describe practical methods for the analysis of event rate data from an unmatched design, including the choice of estimator, and methods for hypothesis testing and CI construction. We show how these methods can allow for confounding factors using residuals from Poisson regression. We evaluate these methods by Monte Carlo simulation over a wide range of scenarios, including very small numbers of clusters, which are commonly used in practice, and where asymptotic results do not hold. We further illustrate our preferred approach by applying it to data from a cluster randomized trial in West Africa of insecticide-treated bednets in which the outcome measure was all-cause child mortality. We emphasize a number of practical issues in the analysis, such as whether it is appropriate to transform the data, the choice of residual and the robustness of the technique, and describe the extension of the methods to a design in which clusters are paired or stratified.


    Methods
 
Unadjusted analysis
Let dij be the observed number of events and yij the number of person years of observation in cluster j of intervention group i, where i = 1 represents the intervention group and i = 2 the control group and j = 1...mi. The cluster event rate is rij = dij/yij.

Point estimate
There are two natural ratio estimates of the intervention effect. The first is obtained from the ratio of the overall event rates in the intervention and control groups, and gives equal weight to each person-year of observation in the study. The overall rate in intervention group i is given by


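$$ r_i = \frac{\sum_{j=1}^{m_i} d_{ij}}{\sum_{j=1}^{m_i} y_{ij}} \tag{1} $$

and the rate ratio is estimated by

$$ \mathrm{RR}_O = \frac{r_1}{r_2}. $$

An alternative estimate of the rate in each intervention group, and hence of the intervention effect, is based on the mean of the cluster event rates, and gives equal weight to each cluster. The mean of the cluster event rates in group i is given by

$$ \bar{r}_i = \frac{1}{m_i} \sum_{j=1}^{m_i} r_{ij} $$

and thus an estimate of the intervention effect is

$$ \mathrm{RR}_M = \frac{\bar{r}_1}{\bar{r}_2}. $$

Alternatively, if the distribution of mortality rates in each group is highly skewed, their geometric mean may be used. The average log(mortality rate) in each group i is given by

$$ u_i = \frac{1}{m_i} \sum_{j=1}^{m_i} \log(r_{ij}) $$

leading to exp(ui) as the geometric mean of the cluster event rates and

$$ \mathrm{RR}_{GM} = \frac{\exp(u_1)}{\exp(u_2)} = \exp(u_1 - u_2) $$

as an estimate of the intervention effect.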
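The three estimates described above can be computed directly from the cluster totals. The following is a minimal sketch rather than the authors' own code: the function name `point_estimates`, the 0-based arm ordering and the example data are all illustrative assumptions.

```python
import math

def point_estimates(deaths, pyears):
    """Return (RR_O, RR_M, RR_GM) for a two-arm trial.

    deaths[i][j] and pyears[i][j] hold the events and person-years of
    cluster j in arm i; arm 0 is intervention and arm 1 is control
    (0-based here, an illustrative convention).
    """
    overall, mean_rate, mean_log = [], [], []
    for d_arm, y_arm in zip(deaths, pyears):
        rates = [d / y for d, y in zip(d_arm, y_arm)]
        # Overall rate: equal weight to each person-year
        overall.append(sum(d_arm) / sum(y_arm))
        # Mean of cluster rates: equal weight to each cluster
        mean_rate.append(sum(rates) / len(rates))
        # Mean log rate: requires every cluster rate to be positive
        mean_log.append(sum(math.log(r) for r in rates) / len(rates))
    rr_o = overall[0] / overall[1]
    rr_m = mean_rate[0] / mean_rate[1]
    rr_gm = math.exp(mean_log[0] - mean_log[1])
    return rr_o, rr_m, rr_gm

# Made-up data for three clusters per arm (not from any trial)
deaths = [[4, 6, 3], [6, 9, 5]]
pyears = [[150.0, 180.0, 120.0], [140.0, 190.0, 130.0]]
rr_o, rr_m, rr_gm = point_estimates(deaths, pyears)
```

Note that the geometric-mean estimate is undefined if any cluster has zero events, which may matter when clusters are small.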

Confidence intervals
A CI may be obtained for the intervention effect RRM by using the standard deviation of the cluster event rates in each group. Although under this approach information about differences in sample sizes among individual clusters is not used explicitly, the variability of the observed cluster rates is reduced as cluster size increases (Ref. 21, pp. 220–1). Thus, increasing cluster sizes still carries a benefit in terms of precision and power.

The distribution of RRM is likely to be skewed, and so we calculate a CI on a logarithmic scale. Using a Taylor series approximation (Ref. 21, p. 92), the variance of log(RRM) is estimated by


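$$ V_M = \frac{s_1^2}{m_1 \bar{r}_1^2} + \frac{s_2^2}{m_2 \bar{r}_2^2} $$

where

$$ s_i^2 = \frac{1}{m_i - 1} \sum_{j=1}^{m_i} (r_{ij} - \bar{r}_i)^2 $$

is the estimated variance of the cluster rates in the ith group. The sampling distribution of RRM will be asymptotically normal, and a 95% CI for RRM may be given by

$$ \exp\left[\log \mathrm{RR}_M \pm 1.96\sqrt{V_M}\right]. $$

However, simulations which follow show the coverage of this to be too low when the number of clusters is small, due to uncertainty in the estimation of VM. To allow for this uncertainty we propose a heuristic 95% CI for RRM given by

$$ \exp\left[\log \mathrm{RR}_M \pm t_{0.025}\sqrt{V_M}\right] $$

where t0.025 is the 97.5% point of the t distribution on m1 + m2 − 2 degrees of freedom. A 95% CI for RRGM is given conventionally by

$$ \exp\left[\log \mathrm{RR}_{GM} \pm t_{0.025}\sqrt{V_{GM}}\right] $$

where

$$ V_{GM} = \frac{w_1^2}{m_1} + \frac{w_2^2}{m_2} $$

and $w_i^2$ is the observed variance of the cluster log(event rate)s for the ith group.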
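As an illustration of the t-based interval for RRM, the sketch below assumes the usual Taylor series (delta-method) form of VM, namely the sum over the two groups of the sample variance of the cluster rates divided by the number of clusters times the squared mean rate. The function name, the example rates and the practice of passing the t critical value in as an argument (taken from standard tables) are assumptions made for this example.

```python
import math
from statistics import mean, variance

def rrm_confidence_interval(rates_int, rates_ctl, t_crit):
    """Return (RR_M, lower, upper) using a CI built on the log scale.

    rates_int, rates_ctl: cluster event rates in each arm.
    t_crit: 97.5% point of t on m1 + m2 - 2 df (1.96 gives the normal CI).
    """
    m1, m2 = len(rates_int), len(rates_ctl)
    rbar1, rbar2 = mean(rates_int), mean(rates_ctl)
    # Assumed delta-method variance of log(RR_M); variance() is the
    # sample variance with divisor m - 1
    v_m = (variance(rates_int) / (m1 * rbar1 ** 2)
           + variance(rates_ctl) / (m2 * rbar2 ** 2))
    rr_m = rbar1 / rbar2
    half = t_crit * math.sqrt(v_m)
    return rr_m, rr_m * math.exp(-half), rr_m * math.exp(half)

# Made-up cluster rates, three clusters per arm
rates_int = [0.027, 0.033, 0.025]
rates_ctl = [0.043, 0.047, 0.038]
T_4DF = 2.776  # 97.5% point of t on 3 + 3 - 2 = 4 df, from tables
rr_m, lower, upper = rrm_confidence_interval(rates_int, rates_ctl, T_4DF)
```

The same function gives the interval for RRGM if it is rewritten to work with the means and variances of the log rates, in which case no division by the squared mean is needed.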

A CI centred on RRO may be obtained using an estimate of the variance from survey sampling theory22,23 which relies on the fact that ri is a ratio:


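$$ \widehat{\mathrm{Var}}(r_i) = \frac{m_i}{(m_i - 1)\, y_{i\cdot}^2} \sum_{j=1}^{m_i} (d_{ij} - r_i y_{ij})^2 \tag{2} $$

where

$$ y_{i\cdot} = \sum_{j=1}^{m_i} y_{ij}. $$

Then

$$ V_O = \frac{\widehat{\mathrm{Var}}(r_1)}{r_1^2} + \frac{\widehat{\mathrm{Var}}(r_2)}{r_2^2} $$

as above, and similarly a 95% CI for RRO based on normal theory is given by exp[log RRO ± 1.96√VO] but we shall show that better coverage is obtained by using the t distribution:

$$ \exp\left[\log \mathrm{RR}_O \pm t_{0.025}\sqrt{V_O}\right]. $$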
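A sketch of the interval centred on RRO follows. It assumes the standard ratio-estimator variance from survey sampling (sum of squared deviations of observed from expected events per cluster, scaled by m/((m − 1) times the squared total person-years)), with no finite population correction. Function names and data are illustrative, and the t critical value is again supplied from tables.

```python
import math

def ratio_variance(deaths, pyears):
    """Overall rate and its ratio-estimator variance for one arm."""
    m = len(deaths)
    y_tot = sum(pyears)
    r = sum(deaths) / y_tot
    # Squared deviations of observed events from r * person-years
    ss = sum((d - r * y) ** 2 for d, y in zip(deaths, pyears))
    return r, m / ((m - 1) * y_tot ** 2) * ss

def rro_confidence_interval(deaths, pyears, t_crit):
    """Return (RR_O, lower, upper); arm 0 = intervention, arm 1 = control."""
    (r1, var1), (r2, var2) = (ratio_variance(d, y)
                              for d, y in zip(deaths, pyears))
    # Variance of log(RR_O) by the same Taylor series argument
    v_o = var1 / r1 ** 2 + var2 / r2 ** 2
    rr_o = r1 / r2
    half = t_crit * math.sqrt(v_o)
    return rr_o, rr_o * math.exp(-half), rr_o * math.exp(half)

# Made-up data, three clusters per arm; 2.776 is t on 4 df from tables
deaths = [[4, 6, 3], [6, 9, 5]]
pyears = [[150.0, 180.0, 120.0], [140.0, 190.0, 130.0]]
rr_o, lower, upper = rro_confidence_interval(deaths, pyears, 2.776)
```

When every cluster in an arm has the same observed rate, the sum of squares is zero and the estimated variance vanishes, which is one reason to treat such intervals with caution for very small numbers of clusters.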

Hypothesis testing
Whether using RRM or RRO, an unpaired t-test on the cluster rates (or the log cluster rates) with m1 + m2 – 2 degrees of freedom may be used to test the effect of the intervention on the outcome, unadjusted for any covariates. Donner and Klar10 discuss the validity of the t-test. The logarithmic transformation of the cluster event rates may be advisable if the assumptions of the t-test (normality, equal variance) are not satisfied, or a Wilcoxon test may be used.
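The unpaired t-test on cluster-level observations can be written out in a few lines. This is a minimal sketch using the pooled-variance form of the test; the function name and the example rates are illustrative, and the P-value would be read from t tables or statistical software using the returned degrees of freedom.

```python
import math
from statistics import mean, variance

def unpaired_t(x1, x2):
    """Pooled-variance t statistic and degrees of freedom.

    x1, x2: one observation per cluster in each arm (cluster rates,
    or log cluster rates if the rates are skewed).
    """
    m1, m2 = len(x1), len(x2)
    df = m1 + m2 - 2
    pooled = ((m1 - 1) * variance(x1) + (m2 - 1) * variance(x2)) / df
    se = math.sqrt(pooled * (1 / m1 + 1 / m2))
    return (mean(x1) - mean(x2)) / se, df

# Made-up cluster rates
rates_int = [0.027, 0.033, 0.025]
rates_ctl = [0.043, 0.047, 0.038]
t_raw, df = unpaired_t(rates_int, rates_ctl)
t_log, _ = unpaired_t([math.log(r) for r in rates_int],
                      [math.log(r) for r in rates_ctl])
```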

Adjusted analysis
Allowing for covariates by fitting an ordinary Poisson regression model that includes an indicator for the intervention effect would pool all the deaths across the clusters and treat each individual response as independent. Although the point estimate of the intervention effect would be valid (and analogous to RRO above), the standard error associated with the estimate would be too small, giving a false impression of precision. In the proposed method,17,24 a Poisson regression model is fitted to the individual data, including all covariates thought, a priori, to be confounders, but excluding the term representing the intervention effect. The residuals from this model are aggregated for each cluster and the methods described above are then applied using these aggregated residuals, rather than the raw mortality rates, as the cluster observations. These residuals represent the effect of the intervention on the incidence rate, after adjustment for the confounding variables. The analysis is unaffected by whether the data are analysed as one observation per individual or as one summary observation per combination of covariates.

Residuals
Fitting a Poisson regression model excluding the intervention effect will give an expected value eijk for the kth individual in the (i,j)th cluster. The residual for this subject is a measure of the discrepancy between the observed value dijk (which = 1 for a death and 0 for a survivor) and the expected value eijk. This residual includes the effects of random variation, and of explanatory factors not included in the model, one of which is the intervention. There are a number of possible forms for the residual and also a number of ways in which they may be aggregated over each cluster.

By analogy with the standardized mortality ratio (SMR), the ratio of observed deaths to expected deaths, we define the SMR residual for the (i,j)th cluster as

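$$ \mathrm{SMR}_{ij} = \frac{d_{ij}}{e_{ij}} $$

where

$$ d_{ij} = \sum_k d_{ijk} $$

and

$$ e_{ij} = \sum_k e_{ijk}. $$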

In the presence of a beneficial intervention, the SMR residuals will tend to be less than 1 in clusters receiving intervention, and greater than 1 in control clusters.
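To make the construction of SMR residuals concrete, the sketch below handles the special case of a single categorical covariate. In that case the Poisson model without the intervention term is saturated in the covariate, so its fitted rates are simply the pooled events divided by the pooled person-years in each covariate stratum, and no iterative fitting is needed. With several or continuous covariates a regression routine would be required instead. The data layout, function name and numbers are assumptions for illustration.

```python
from collections import defaultdict

def observed_expected(records):
    """Observed and expected events per cluster under the null model.

    records: list of (arm, cluster, x, deaths, pyears) rows, one per
    cluster-by-covariate cell. The null model ignores arm entirely.
    """
    d_x, y_x = defaultdict(float), defaultdict(float)
    for _, _, x, d, y in records:
        d_x[x] += d
        y_x[x] += y
    # Fitted rate per covariate stratum, pooled over both arms
    rate = {x: d_x[x] / y_x[x] for x in d_x}
    obs, exp = defaultdict(float), defaultdict(float)
    for arm, cluster, x, d, y in records:
        obs[(arm, cluster)] += d
        exp[(arm, cluster)] += y * rate[x]
    return dict(obs), dict(exp)

# Made-up data: arm 1 = intervention, arm 2 = control, binary covariate x
records = [
    (1, 1, 0, 1, 90.0), (1, 1, 1, 3, 60.0),
    (1, 2, 0, 2, 100.0), (1, 2, 1, 4, 70.0),
    (2, 1, 0, 2, 60.0), (2, 1, 1, 8, 90.0),
    (2, 2, 0, 1, 50.0), (2, 2, 1, 9, 100.0),
]
obs, exp = observed_expected(records)
smr = {key: obs[key] / exp[key] for key in obs}
```

The resulting `smr` values, one per cluster, then take the place of the cluster rates in the t-test and CI calculations.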

Estimation and inference
The covariate-adjusted equivalent of the unadjusted estimate based on individual data, RRO, is obtained by fitting an ordinary Poisson model including an indicator variable for the intervention effect since no bias is induced in the point estimate by the correlation of observations.

A covariate-adjusted analogue of the mean of cluster rates in the ith group ($\bar{r}_i$) is given by the mean of the SMR residuals, and so a covariate-adjusted equivalent of the point estimate RRM can be obtained from the ratio of such means between the two groups.

Hypothesis testing and construction of CI may then be carried out as for unadjusted data, except that the SMR (or log SMR) residuals are used in place of cluster rates, so that yij is replaced throughout by eij.

Simulation
We simulated data from a cluster randomized trial with groups of m = 3, 5, 10 or 20 clusters allocated to each of intervention and control treatments. The cluster sizes and mortality rate were chosen to be typical of studies of child (under-5) mortality in rural Africa. The numbers of person-years per cluster (yij) were sampled either from a symmetric distribution (normal with mean 150 and standard deviation 30), or from a skewed distribution (log-normal, with mean 5 and standard deviation 0.5 on the log scale, equivalent to a geometric mean of 148 and a 95% reference range of 56 to 395). The event rate in the control group (µ) was set at 40 per 1000 person-years. The intervention was assumed to have a protective efficacy of 25%, implying a rate ratio of θ = 0.75. A null effect (θ = 1) was also simulated to determine the size of the test statistics. A random cluster effect λij was included which acted multiplicatively on the mortality rate, and whose logarithm was normally distributed with mean zero and standard deviation φ = 0, 0.125 or 0.25.18 Note (using the approximation in the formula for VM) that φ is approximately equal to the coefficient of variation (k) in underlying cluster rates used in our sample size formulae for cluster randomized trials.6 One thousand simulations were run for each combination of m and φ. The number of events in the (i,j)th cluster was generated from a Poisson distribution with mean yijµλij for the control clusters (i = 2) and yijµλijθ for the intervention clusters (i = 1).

For the adjusted analysis of data with a confounding variable we generated a binary covariate X such that 60% of person-years in the control group and 40% in the intervention group had value x = 1, with the remainder having x = 0. The multiplicative effect of the covariate on the event rate was e^β = 3. The number of events in the (i,j)th cluster with covariate x (= 0,1) was then generated from a Poisson distribution with mean yijµλij e^(βx) for the control clusters (i = 2) and yijµλijθ e^(βx) for the intervention clusters (i = 1).
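The data-generating process described above can be sketched as follows. This is not the original Stata code. It makes several labelled assumptions: the 60:40 covariate split is applied within every cluster, a simple Poisson generator is written out because the standard library has none, and normally distributed cluster sizes are floored at 1 person-year to guard against non-positive draws. Setting `exp_beta=1.0` gives the unadjusted scenario with no confounder effect.

```python
import math
import random

MU = 0.040  # control event rate: 40 per 1000 person-years

def poisson(rng, mean):
    """Poisson draw by Knuth's multiplication method (fine for small means)."""
    limit, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def simulate_trial(rng, m, phi, theta=0.75, exp_beta=3.0, skewed=False):
    """Return rows (arm, cluster, x, deaths, pyears); arm 1 = intervention."""
    rows = []
    for arm in (1, 2):
        effect = theta if arm == 1 else 1.0
        prev_x1 = 0.4 if arm == 1 else 0.6  # share of person-years with x = 1
        for j in range(1, m + 1):
            if skewed:
                y = rng.lognormvariate(5.0, 0.5)
            else:
                # Floor at 1 is our own guard, not part of the description
                y = max(rng.gauss(150.0, 30.0), 1.0)
            lam = math.exp(rng.gauss(0.0, phi))  # random cluster effect
            for x, share in ((0, 1.0 - prev_x1), (1, prev_x1)):
                mean = y * share * MU * lam * effect * exp_beta ** x
                rows.append((arm, j, x, poisson(rng, mean), y * share))
    return rows

rng = random.Random(2002)  # arbitrary seed for reproducibility
rows = simulate_trial(rng, m=5, phi=0.25)
```

Each simulated data set would then be analysed by the unadjusted and adjusted methods, and the results accumulated over the 1000 replicates.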

The number of simulations ensures that the size of the test and the coverage of the CI are estimated with a 95% CI of ±1.4%. The simulations were performed using Stata 6 (Stata Corporation, College Station, TX, USA).
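The quoted precision is consistent with a binomial calculation, assuming (the text does not say so explicitly) a normal approximation for a proportion near 0.05 (test size) or 0.95 (coverage) estimated from 1000 replicates:

```python
import math

n_sims = 1000
nominal = 0.05  # nominal size; coverage of 0.95 has the same variance
half_width = 1.96 * math.sqrt(nominal * (1 - nominal) / n_sims)
print(f"{100 * half_width:.1f}%")  # prints 1.4%
```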


    Results
 
Simulation
Table 1 shows the results of unadjusted analyses of simulated data with no confounding factor. As the sampling distributions of both point estimates of the rate ratio, RRM and RRO, are positively skewed, we present their geometric means, which indicate lack of bias on the log scale. Results for symmetric or skew distribution of cluster sizes were similar. The normal CI show sub-nominal coverage for small numbers of clusters, but those based on the t distribution give coverage extremely close to the nominal value of 0.95 for any number of clusters and any level of between-cluster variation φ. Results for the geometric mean estimate RRGM and its CI were similar (data not shown).


 
Table 1 Results of unadjusted analysis of simulated data with no confounding factors. Rate ratio θ = 0.75, except for size of test, where θ = 1
 
The size of the t-test is close to nominal for all parameter values, whether based on transformed or raw data, but the size of the Wilcoxon test is low when the number of clusters is small. The power of the tests was also studied (data not shown). As expected, the power decreases with increasing φ, and for small numbers of clusters the Wilcoxon test has less power than the t-test, being unable to give a significant result with only three clusters per group.

Table 2 shows the corresponding results for the adjusted analysis of data with a confounding variable. The results are very similar to those of the unadjusted analysis. The imbalance in the confounding factor considered here is greater than is likely to arise in most situations in a randomized trial, unless the covariate is a cluster-level one, and the number of clusters is very small. We also considered a more extreme scenario,19 in which 80% of person-years in the control group and 20% in the intervention group had value x = 1, and in this case found (Table 3) that the point estimate of RRM was biased upwards, the coverage of the t-based CI for RRO was too low, and the sizes of the tests (based on RRM) were all too low.


 
Table 2 Results of adjusted analysis of simulated data with confounding factor prevalence 60:40. Rate ratio θ = 0.75, except for size of test, where θ = 1
 

 
Table 3 Results of adjusted analysis of simulated data with confounding factor prevalence 80:20. Rate ratio θ = 0.75, except for size of test, where θ = 1
 
Case study
Bednets have long been used for protection against mosquitoes and malaria, and impregnation with insecticide markedly increases their effectiveness. We present here data from a cluster randomized trial held in the Kassena-Nankana district of northern Ghana to determine the ability of insecticide-impregnated bednets to reduce all-cause child mortality.4

A census was carried out and the area was divided into 96 clusters with an average of 120 compounds (approximately 1400 people) per cluster. In all, 48 clusters were randomly selected by an open lottery to receive the impregnated bednets, and a longitudinal surveillance system set up to monitor births, deaths and migrations. Follow-up continued for 2 years, at the end of which impregnated bednets were provided to all compounds in the control clusters.

Mortality rates in the bednet study were calculated using the person-years of follow-up from the demographic surveillance data (Table 4). The cluster mortality rates were reasonably normally distributed so no transformation was required and the crude rates were used as the cluster observations for unadjusted analysis (Table 5). The intervention effect estimated by the overall value, RRO = 0.84, was used since the sample was the whole population of a given area, and we wanted to give equal weight to each person. The test statistic using the t-test was 1.73 (P = 0.09), and the t-based 95% CI using equation (2) was 0.71–1.00.


 
Table 4 Mortality rates in Ghana trial of insecticide-impregnated bednets
 

 
Table 5 Results of analysis of the Ghana bednet trial
 
The intervention effect was adjusted for age using a Poisson regression model. Pre-intervention mortality was also included in the model initially but had no substantial effect on the estimated rate ratio and was therefore excluded. The cluster-specific SMR, derived from the Poisson model under the null hypothesis of no intervention effect, were reasonably normally distributed in the two groups (whereas the log SMR were negatively skewed) and were therefore used for hypothesis testing. Results are shown in Table 5. Analyses, both crude and adjusted, which ignored clustering, gave CI that were too narrow and P-values that were too small.


    Discussion
 
We have presented here some simple practical approaches to the analysis of cluster randomized trials whose outcome is an event rate (events per person time). We have evaluated both adjusted and unadjusted methods by simulation and shown them to be robust in scenarios typical of those seen in real data: this robustness extends to very small numbers of clusters and to a skew distribution of cluster sizes, although an extreme imbalance of covariates may lead to problems.

The choice of point estimate for the rate ratio depends on whether one wishes to give equal weight to each person-year of observation (RRO) or to each cluster (RRM), and on relative efficiency. If there is little or no intra-cluster correlation in responses (φ ≈ k ≈ 0) all clusters will have the same underlying rate, and RRO will be the most efficient estimator, as it weights each cluster according to its size. If the intra-cluster correlation is high, the underlying rates will be very different, and the most efficient estimate will be only partially weighted, by analogy with a random effects meta-analysis,13 so that a high k favours the equally-weighted RRM. The fact that the hypothesis tests are based implicitly on RRM indicates a possible disadvantage of RRO, in that there may be a mismatch in results when significance is borderline. This is shown by the adjusted results of the case study where the 95% CI (for RRO) just touches unity but the P-value is 0.09.

The CI that we propose are based on large sample theory, as are those given by more complex methods. Basing the intervals on a normal distribution led to less than nominal coverage but this was corrected for all cluster sizes by the use of a t distribution.

For testing, it is advisable to examine the distributions of cluster rates and of log(cluster rates) and to consider carefully whether to log-transform the rates, even though the t-test is reasonably robust to departures from the assumptions of normality and equal variances of cluster observations.10 If both treatment groups contain an equal, large number of clusters, as in the case study, the assumption of equal variances is not required. We have shown that the size (Type I error) of the t-test is close to its nominal value, although that of the Wilcoxon test is low when the number of clusters is small. An alternative to these tests is the permutation test,17,18 which makes no distributional assumptions and has been shown to be more robust to mis-specification of the underlying model,24 but is not as straightforward to compute. Gail and co-workers18 point out that these tests are asymptotically equivalent and show that their small-sample properties are similar, in that their power is close to nominal levels when the numbers of clusters in each treatment group are similar but not when the design is unbalanced. Brookmeyer and Chen also showed similar power for a paired design with small numbers of clusters.19

The point estimates of the intervention effect for a paired design are the same as those for the unpaired design. Hypothesis testing is carried out as for an unpaired design, except that a paired t-test is used. Calculation of CI for a paired design, taking account of the pairing, is not so straightforward. The ratio estimate of variance cannot be extended to paired data (since there is only one cluster per treatment group in each stratum) so there is no simple CI for RRO. Diehr et al.25 have shown that, unless the matching has had a major effect, there may be benefits in an unmatched analysis of a matched design. Alternatively, a bootstrap CI26 (resampling pairs) may be calculated. A CI for RRM may also be obtained by the bootstrap. However, if the geometric mean of cluster rates has been used rather than the mean, a simple CI is available5 by noting that the estimate of the rate ratio, RRGM, is equal to the geometric mean of the m cluster pair rate ratios RRj. A CI for RRGM is given by


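$$ \exp\left[\log \mathrm{RR}_{GM} \pm t_{0.025}\sqrt{w/m}\right] $$

where w is the estimated variance of the log(RRj)s and t0.025 is the 97.5% point of the t distribution on m − 1 degrees of freedom.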
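For a paired design, the interval for RRGM amounts to a one-sample t interval on the log rate ratios of the m pairs. The sketch below assumes that form, with the standard error taken as the square root of the variance of the log pair ratios divided by m, and a t critical value on m − 1 degrees of freedom supplied from tables. The function name and data are illustrative.

```python
import math
from statistics import mean, variance

def paired_rrgm_ci(rates_int, rates_ctl, t_crit):
    """Return (RR_GM, lower, upper) for m matched pairs of clusters.

    rates_int[j] and rates_ctl[j] are the event rates of the
    intervention and control cluster in pair j (all rates positive).
    """
    log_rr = [math.log(a / b) for a, b in zip(rates_int, rates_ctl)]
    m = len(log_rr)
    centre = mean(log_rr)  # log of the geometric mean of the pair ratios
    half = t_crit * math.sqrt(variance(log_rr) / m)
    return math.exp(centre), math.exp(centre - half), math.exp(centre + half)

# Made-up rates for four pairs; 3.182 is the 97.5% point of t on 3 df
rates_int = [0.020, 0.030, 0.025, 0.040]
rates_ctl = [0.030, 0.035, 0.040, 0.045]
rr_gm, lower, upper = paired_rrgm_ci(rates_int, rates_ctl, 3.182)
```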

Klar and Donner27 have shown advantages of stratified designs with more than two clusters per stratum over pair-matched designs. For such designs, the paired t-test outlined here may be replaced by a two-way analysis of variance (Ref. 21, p. 237), a stratified ratio estimate is available (Ref. 23, p. 164), and the bootstrap may be adapted.

For the adjusted analysis we estimated the intervention effect from a Poisson regression model. The residuals derived from the Poisson model under the null hypothesis of no intervention effect are based on an SMR as they provide a summary measure of the intervention effect that ties in with rate ratios. There are several other possible definitions of a residual that one could use, for example dij – eij, (dij – eij)/yij or (dij – eij)/√eij, and their properties require further research.

In the realistic scenario with moderate imbalance of a confounding factor with a large effect on outcome, the adjusted analysis worked well, with results similar to those of the unadjusted analysis in the absence of confounding. In the situation of extreme imbalance of such a covariate (as is only likely to occur with a cluster-level covariate and a very small number of clusters) we obtained a small bias in the results. Just as omitting a confounder from a regression would lead to a biased estimate of the treatment effect, so conversely, omitting treatment from the model will result in a biased estimate of the covariate effect, and hence of the SMR residual. The direction of the bias will depend on the direction of the confounding. The effect will not be important if either the treatment effect is small, or the covariate is balanced (or only moderately unbalanced) across the treatment groups, as should usually be true in a randomized trial.

By comparison with the simple approaches presented here, the use of multilevel modelling may be far from straightforward, with complexities in the specification of the model, the choice of fitting method, the process of fitting the model, and the interpretation of the results. The approach of generalized estimating equations is not robust to small numbers of clusters16 and is not straightforward to apply to paired cluster data. An alternative is to fit a standard Poisson or Cox proportional hazards regression model, adjusting for clustering using a robust variance estimator.28,29 The properties of this method require further investigation for small numbers of clusters.

The methods presented here provide a set of simple tools for the analysis of incidence rates in cluster randomized trials. It must be remembered that different data sets will have different properties and, as with any analysis, the data should first be explored thoroughly.


    Acknowledgments
 
These methods were developed for the analysis of four trials of insecticide-impregnated bednets and curtains organized by the UNDP/World Bank/WHO Special Programme for Research and Training on Tropical Diseases. We are extremely grateful to Fred Binka for allowing us to use the data from the trial conducted in Ghana, and to Christian Lengeler for co-ordinating the series of trials and, in particular, a workshop on analytical methods. We appreciate the input of Linda Morison, Laura New, Gilly Maude and other colleagues at the London School of Hygiene and Tropical Medicine, and of all the participants in the workshop. We thank Steven Mark, Neal Alexander and anonymous referees for their constructive comments on earlier versions of this paper.


    Notes
 
b Current address: Pfizer Biostatistics and Reporting, Pfizer Global Research and Development, Sandwich, Kent CT13 9NJ, UK.


    References
 
1 Ghana VAST Study Team. Vitamin A supplementation in northern Ghana: effects on clinic attendances, hospital admissions and child mortality. Lancet 1993;342:7–12.

2 D’Alessandro U, Olaleye B, McGuire W et al. Mortality and morbidity from malaria in Gambian children after introduction of an impregnated bednet programme. Lancet 1995;345:475–83.

3 Nevill CG, Some ES, Mung’ala VO et al. Insecticide-treated bednets reduce mortality and severe morbidity from malaria among children on the Kenyan coast. Trop Med Int Health 1996;1:139–46.

4 Binka FN, Kubaje A, Adjuik M et al. Impact of permethrin impregnated bednets on child mortality in Kassena-Nankana district, Ghana: a randomized controlled trial. Trop Med Int Health 1996;1:147–54.

5 Grosskurth H, Mosha F, Todd J et al. Impact of improved treatment of sexually transmitted diseases on HIV infection in rural Tanzania: randomized controlled trial. Lancet 1995;346:530–36.

6 Hayes RJ, Bennett S. Simple sample size calculation for cluster-randomized trials. Int J Epidemiol 1999;28:319–26.

7 Donner A. A regression approach to the analysis of data arising from cluster randomization. Int J Epidemiol 1985;14:322–26.

8 Koepsell TD, Martin DC, Diehr PH et al. Data analysis and sample size issues in evaluations of community-based health promotion and disease prevention programs: a mixed-model analysis of variance approach. J Clin Epidemiol 1991;44:701–13.

9 Donner A, Klar N. Confidence interval construction for effect measures arising from cluster randomization trials. J Clin Epidemiol 1993;46:123–31.

10 Donner A, Klar N. Methods for comparing event rates in intervention studies when the unit of allocation is a cluster. Am J Epidemiol 1994;140:279–89.

11 Donner A. Statistical methodology for paired cluster designs. Am J Epidemiol 1987;126:972–79.

12 Donner A, Hauck WW. Estimation of a common odds ratio in paired-cluster randomization designs. Stat Med 1989;8:599–607.

13 Thompson SG, Pyke SDM, Hardy RJ. The design and analysis of paired cluster randomized trials: an application of meta-analysis techniques. Stat Med 1997;16:2063–79.

14 Liang KY, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika 1986;73:13–22.

15 Goldstein H. Multilevel Models. 2nd Edn. London: Edward Arnold, 1995.

16 Donner A, Eliasziw M, Klar N. A comparison of methods for testing homogeneity of proportions in teratological studies. Stat Med 1994;13:1253–64.

17 Gail MH, Byar DP, Pechacek TF, Corle DK, for the COMMIT study group. Aspects of statistical design for the community intervention trial for smoking cessation (COMMIT). Controlled Clin Trials 1992;13:6–21.

18 Gail MH, Mark SD, Carroll RJ, Green SB, Pee D. On design considerations and randomization-based inference for community intervention trials. Stat Med 1996;15:1069–92.

19 Brookmeyer R, Chen Y-Q. Person-time analysis of paired community intervention trials when the number of communities is small. Stat Med 1998;17:2121–32.

20 Lui K-J, Mayer JA, Eckhardt L. Confidence intervals for the risk ratio under cluster sampling based on the beta-binomial model. Stat Med 2000;19:2933–42.

21 Armitage P, Berry G. Statistical Methods in Medical Research. 3rd Edn. Oxford: Blackwell, 1994.

22 Rao JNK, Scott AJ. A simple method for the analysis of clustered binary data. Biometrics 1992;48:577–85.

23 Cochran WG. Sampling Techniques. 3rd Edn. Chichester: John Wiley, 1977.

24 Gail MH, Tan WY, Piantadosi S. Tests for no treatment effect in randomized clinical trials. Biometrika 1988;75:57–64.

25 Diehr P, Martin DC, Koepsell T, Cheadle A. Breaking the matches in a paired t-test for community interventions when the number of pairs is small. Stat Med 1995;14:1491–504.

26 Efron B, Tibshirani RJ. An Introduction to the Bootstrap. London: Chapman and Hall, 1993.

27 Klar N, Donner A. The merits of matching in community intervention trials: a cautionary tale. Stat Med 1997;16:1753–64.

28 Moore DF, Tsiatis A. Robust estimation of the variance in moment methods for extra-binomial and extra-Poisson variation. Biometrics 1991;47:383–401.

29 Lin DY. Cox regression analysis of multivariate failure time data: the marginal approach. Stat Med 1994;13:2233–47.