Comparison of methods for analysing cluster randomized trials: an example involving a factorial design

TJ Peters1,2, SH Richards3,4, CR Bankhead5, AE Ades6 and JAC Sterne2

1 Division of Primary Health Care,
2 Department of Social Medicine, University of Bristol.
3 Department of Primary Care and General Practice, University of Birmingham.
4 Current affiliation: Department of Primary Care, Peninsula Medical School, Universities of Exeter and Plymouth.
5 Cancer Research UK Primary Care Education Research Group, University of Oxford.
6 MRC Health Services Research Collaboration, Department of Social Medicine, University of Bristol.

Professor Tim J Peters, Division of Primary Health Care, University of Bristol, Cotham House, Cotham Hill, Bristol BS6 6JL, UK. E-mail: tim.peters@bristol.ac.uk


Abstract
Background Studies involving clustering effects are common, but there is little consistency in their analysis. Various analytical methods were compared for a factorial cluster randomized trial (CRT) of two primary care-based interventions designed to increase breast screening attendance.

Methods Three cluster-level and five individual-level options were compared in respect of log odds ratios of attendance and their standard errors (SE), for the two intervention effects and their interaction. Cluster-level analyses comprised: (C1) unweighted regression of practice log odds; (C2) regression of log odds weighted by their inverse variance; (C3) random-effects meta-regression of log odds with practice as a random effect. Individual-level analyses comprised: (I1) standard logistic regression ignoring clustering; (I2) robust SE; (I3) generalized estimating equations; (I4) random-effects logistic regression; (I5) Bayesian random-effects logistic regression. Adjustments for stratification and baseline variables were investigated.

Results As expected, method I1 was highly anti-conservative. The other, valid, methods exhibited considerable differences in parameter estimates and standard errors, even between the various random-effects methods based on the same statistical model. Method I4 was particularly sensitive to between-cluster variation and was computationally stable only after controlling for baseline uptake.

Conclusions Commonly used methods for the analysis of CRT can give divergent results. Simulation studies are needed to compare results from different methods in situations typical of cluster trials but when the true model parameters are known.


Keywords Cluster randomized trial, factorial trial, data analysis, logistic regression

Accepted 7 May 2003

Cluster randomized trials (CRT) are increasingly being used in health services research.1 These are trials in which individuals are randomly allocated in groups, or clusters, so that all participants in a group are assigned to the same intervention.2 The ‘unit of randomization’ is commonly a health professional (or team of health professionals such as in a primary care centre or hospital clinic) responsible for the care of a group of patients. The cluster design is appropriate when the intervention operates at the group level or where there is a strong likelihood of contamination between interventions if they are conducted within the same cluster. The principles of the design and analysis of CRT are well advanced2 and have many parallels in meta-analysis,3 since the latter also involves combining information from different units (trials) of varying sizes.

These principles are increasingly being adhered to in cluster trial design, especially sample size planning. It is widely understood that analyses of CRT must allow for the clustering in the data. If individuals in a cluster tend to be more similar to each other than to individuals in other clusters then ignoring clustering will lead to standard errors for the intervention effects that are too small, confidence intervals that are too narrow, and P-values that are too small. There is, however, little consensus or consistency in the analysis of such trials. Analyses may be performed using individual-level data or data aggregated at the cluster level, and various methods are available for each of these options.2,4 Comparative data analyses for simple parallel group designs have been published, mostly for continuous outcome variables.1,5,6 A few have covered binary outcomes, but these have either been for equal sized clusters7 or focused primarily on individual-level analyses.6 Moreover, these studies have considered only simple parallel group trials;6,7 complex designs such as factorial trials have not been covered. There remains considerable uncertainty about the relative merits of the different methods, and further illustrations of the alternatives and their performances in different settings are required.

The aim of this paper is to describe different methods for the analysis of factorial CRT, and to compare their results using a trial of two primary care-based interventions designed to increase attendance for breast screening. This study took place between July 1997 and August 1998, and 24 general practices were (cluster) randomized in a 2 × 2 factorial design.8 One of the methods considered here corresponds to the primary analysis specified in the protocol (to accord with CONSORT9,10). While the alternatives in this paper therefore constitute a sensitivity analysis for the original trial,8 the principal purpose here is to provide an illustration of how the various methods compare with one another.


Methods
The study
The objectives and methods of the trial are described fully elsewhere.8 Briefly, this was a 2 × 2 factorial trial evaluating a systematic intervention (a letter from the general practitioner (GP) to individual women) and an opportunistic intervention (a flag in women’s notes to prompt discussion if women consulted for any reason), both designed to increase breast screening uptake. The 24 practices randomized to one of the four groups (neither intervention, letter only, flag only, both interventions) themselves had low previous uptake (below 60% in the second round of the UK National Health Service Breast Screening Programme) and were in two areas with generally low uptake. Randomization was by practice, and given the small number of practices it was stratified by area (London and the West Midlands; 12 practices from each) and number of GPs (single or multi-handed; 4 and 8 respectively in each area). The primary outcome was whether or not women attended for mammography in the subsequent screening round, and the trial was designed to detect differences of 8.5 to 10 percentage points in this uptake rate. Attendance data were obtained for 5732 (93%) of the 6133 women randomized.8

The trial was reported in accordance with CONSORT (Consolidated Standards of Reporting Trials).9 The pre-specified primary analysis was an intention-to-treat comparison of women allocated to each intervention with those allocated not to receive it. As detailed in option I4 below, this was a random-effects logistic regression with the individual as the unit of analysis and the practice as a random effect. Since the design was factorial, the effect of each intervention was always adjusted for the other. In addition, adjustment was planned for the two stratification variables and the practice uptake in the second round as a ‘baseline’ covariate, as fixed effects available at the practice level. Various secondary analyses were planned, including the interaction between the two interventions, although the study was not specifically powered to detect this effect. The target sample size of 6048 allowed for the effects of clustering by using an inflation factor of 6, based on variation across practice-based attendance rates in the second round of screening.8

Options for the analysis of cluster randomized trials
CRT may be analysed at the cluster level, by deriving summary statistics for each cluster, or at the individual level using the data for each patient in each cluster. Analyses adjusting for cluster characteristics—for example, a measure of social deprivation corresponding to the geographical location of each practice—may be performed using either cluster-level or individual-level analyses. Only the latter, however, enables adjustment for patient characteristics—for example, baseline blood pressure in a study of subsequent risk of coronary heart disease. While cluster-level analyses automatically allow for the increased similarity of individuals in the same cluster, individual-level analyses should use a statistical model that allows for the clustering.

Consistent with the original protocol, all eight methods considered here modelled the log odds11 of attendance for screening: three were explicitly at the cluster level and five were at the individual level.

Cluster-level analyses
The number of attenders and the total number of women in practice i (i = 1 to 24) are denoted by $r_i$ and $n_i$ respectively. With logs to base e, for each practice the log odds of attendance is estimated as $\log \text{odds}_i = \log\{r_i/(n_i - r_i)\}$ and its variance as $\text{var}_i = 1/r_i + 1/(n_i - r_i)$.11 The vector of covariates (the two interventions, stratification variables, and second-round practice uptake) in practice i is denoted by $\mathbf{x}_i$.
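As a concrete illustration, the following minimal Python sketch (with made-up counts and hypothetical column names, since the trial data are not reproduced here) computes these practice-level summaries:

```python
import numpy as np
import pandas as pd

# Hypothetical practice-level data: one row per practice, with the number of
# attenders (r), the number of women (n), and the two intervention indicators.
pr = pd.DataFrame({
    "r":      [140, 95, 180, 60],
    "n":      [220, 180, 260, 110],
    "letter": [1, 0, 1, 0],
    "flag":   [1, 1, 0, 0],
})

# Log odds of attendance and its within-practice variance, per practice
pr["logodds"] = np.log(pr["r"] / (pr["n"] - pr["r"]))
pr["var"] = 1.0 / pr["r"] + 1.0 / (pr["n"] - pr["r"])
```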

C1: Unweighted linear regression of log odds
This is a standard multiple linear regression:

$$\log \text{odds}_i = \mathbf{x}_i^{\mathsf{T}}\boldsymbol{\beta} + u_i$$

in which the vector of regression coefficients β represents differences in the log odds of attendance (i.e. log odds ratios) corresponding to the effects of the covariates $\mathbf{x}_i$, and $u_i$ (the random effect at the practice level) is assumed to be normally distributed with zero mean and variance usually denoted by σ². Since each practice is given equal weight, this analysis does not allow for the differing precision with which the log odds is estimated in each practice, resulting from between-practice variation in the number of women or number of attendances.
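A minimal sketch of C1, reusing the hypothetical practice-level frame pr from the snippet above (an illustration in Python via statsmodels, not the original Stata analysis):

```python
import statsmodels.formula.api as smf

# C1: ordinary least squares on the practice log odds; every practice
# receives equal weight, and the coefficients are log odds ratios.
c1 = smf.ols("logodds ~ letter + flag", data=pr).fit()
print(c1.params, c1.bse)
```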

C2: Empirically weighted linear regression of log odds
This is a modification of C1 in which the regression is weighted by the inverse variance of the log odds for each practice (weights $w_i = 1/\text{var}_i$). Such weights minimize the variance of the regression coefficients. The greater the precision with which the log odds in practice i is estimated, the greater is the influence of that practice on the estimated regression coefficients. This model assumes that the variance of the log odds in each practice is $\text{var}_i \times \varphi$, where the multiplicative ‘overdispersion’ parameter φ (estimated by the between-practice variance in the weighted regression) allows for between-cluster heterogeneity.12
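Method C2 differs from C1 only in the weights; a sketch on the same hypothetical frame follows. Note that statsmodels’ weighted least squares estimates a residual scale, which here plays the role of the multiplicative overdispersion parameter φ:

```python
import statsmodels.formula.api as smf

# C2: weight each practice by the inverse variance of its log odds.
c2 = smf.wls("logodds ~ letter + flag", data=pr,
             weights=1.0 / pr["var"]).fit()
print(c2.params, c2.bse)
```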

C3: Random-effects meta-regression
In meta-regression, the model similarly includes a random effect $u_i$ for each practice:

$$\log \text{odds}_i = \mathbf{x}_i^{\mathsf{T}}\boldsymbol{\beta} + u_i + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \text{var}_i)$$

The random effects $u_i$ are assumed to be normally distributed on the log odds scale with zero mean and variance denoted by τ² (the between-practice variance). The meta-regression procedure differs from method C1 in that both the regression coefficients β and the between-practice variance τ² are estimated allowing appropriately for the estimated variance of $\log \text{odds}_i$. Methods for meta-regression have been reviewed;12 here, restricted maximum likelihood estimation was used, as implemented in the Stata13 command ‘metareg’.14 In a situation with large between-practice variation in the log odds, the random-effects model would be expected to assign more equal weights to the log odds from the various practices than would a model employing a multiplicative overdispersion parameter, and thus to yield results more similar to C1 than C2.
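The paper used REML as implemented in Stata’s metareg; as a rough stand-in, the sketch below uses a method-of-moments (DerSimonian–Laird-type) estimate of τ² followed by weights based on the additive variance var_i + τ². This is an approximation for illustration only, not the REML procedure reported in the paper:

```python
import numpy as np
import statsmodels.api as sm

def moments_metareg(y, v, X):
    """Crude method-of-moments meta-regression (illustrative; not REML).

    y: practice log odds; v: their within-practice variances;
    X: design matrix including a constant.
    """
    w = 1.0 / v
    fixed = sm.WLS(y, X, weights=w).fit()              # inverse-variance fit
    q = np.sum(w * (y - fixed.fittedvalues) ** 2)      # heterogeneity statistic
    k, p = X.shape
    tau2 = max(0.0, (q - (k - p)) / (w.sum() - (w ** 2).sum() / w.sum()))
    # Random-effects weights: within-practice variance plus additive tau^2
    return sm.WLS(y, X, weights=1.0 / (v + tau2)).fit(), tau2

X = sm.add_constant(pr[["letter", "flag"]].to_numpy())
c3, tau2 = moments_metareg(pr["logodds"].to_numpy(), pr["var"].to_numpy(), X)
```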

Individual-level analyses
I1: Standard logistic regression
The standard logistic regression model for the intervention effects on the log odds of attendance is:

$$\log \text{odds}_{ij} = \mathbf{x}_{ij}^{\mathsf{T}}\boldsymbol{\beta}$$

where $\log \text{odds}_{ij}$ and $\mathbf{x}_{ij}$ are, respectively, the log odds of the outcome and the vector of covariates for the jth individual in the ith practice, and the vector of regression coefficients β again represents log odds ratios for the covariate effects. This model ignores clustering because it assumes independent outcomes for all individuals. As with all logistic regression models, there is no explicit error term in the model specification; the variability in the outcome event is assumed to arise from binomial variation around the expected outcome predicted by the regression model.11
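In Python, I1 might be sketched as follows, assuming a hypothetical individual-level data frame df with one row per woman (filled here with placeholder values purely so the code runs):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical individual-level data: one row per woman, with the practice
# identifier and the practice-level intervention indicators.
rng = np.random.default_rng(0)
df = pd.DataFrame({"practice": np.repeat(np.arange(24), 50)})
df["letter"] = np.repeat(rng.integers(0, 2, 24), 50)
df["flag"] = np.repeat(rng.integers(0, 2, 24), 50)
df["attended"] = rng.integers(0, 2, len(df))  # placeholder outcomes

# I1: standard logistic regression, ignoring clustering by practice
i1 = smf.logit("attended ~ letter + flag", data=df).fit()
```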

I2: Robust standard errors
This uses the same model as I1, but adjusts the standard errors to allow for clustering by practice. Such ‘robust’ standard errors are calculated using the ‘sandwich’ variance estimator attributed to Huber15 and White,16 modified to allow for clustering.17 The regression coefficients (log odds ratios) are unaffected by this procedure and are thus identical to those for I1.
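Using the same hypothetical df, the cluster-robust version is a one-line change: the point estimates are those of I1, and only the variance estimator differs:

```python
import statsmodels.formula.api as smf

# I2: identical model to I1, but with sandwich (cluster-robust) standard
# errors taking the practice as the clustering unit.
i2 = smf.logit("attended ~ letter + flag", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["practice"]}
)
```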

I3: Generalized estimating equations
Generalized estimating equations (GEE) extend the logistic regression model I1 to allow for clustering. This is achieved by specifying a correlation matrix that describes, in terms of additional parameters, the association between different individuals in the same cluster.18 For a CRT, it is assumed that all correlations between individuals in the same cluster are the same (‘exchangeable’ correlation matrix). If robust standard errors are specified, and the sample size is large enough, then both the regression coefficients and their standard errors will be correct (in the sense that they will be consistently estimated) even if the correlation matrix is incorrectly specified. GEE that assume outcomes for individuals in the same cluster are uncorrelated give identical results to I2.
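A GEE sketch with an exchangeable working correlation, again on the hypothetical df (robust standard errors are the statsmodels default for GEE):

```python
import statsmodels.api as sm
import statsmodels.formula.api as smf

# I3: GEE logistic regression; 'exchangeable' assumes a single common
# correlation between any two women in the same practice.
i3 = smf.gee("attended ~ letter + flag", groups="practice", data=df,
             family=sm.families.Binomial(),
             cov_struct=sm.cov_struct.Exchangeable()).fit()
```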

I4: Random-effects logistic regression
While many of the other options also involve random effects, the logistic regression model specified in the protocol allows the mean log odds to vary between practices by including a random effect $u_i$ for each practice as follows:

$$\log \text{odds}_{ij} = \mathbf{x}_{ij}^{\mathsf{T}}\boldsymbol{\beta} + u_i$$

and assumes normally distributed random effects with zero mean and variance τ² (the between-practice variance). Estimates of odds ratios from these models will tend to be further from 1 than those from GEE.19 Parameter estimation for method I4 is computationally intensive because the marginal likelihood (the likelihood ‘averaged’ over the values of the random effects) involves an integral that must be approximated numerically. The analyses presented here used Gauss-Hermite quadrature for this computation, and included the recommended check on the stability of the parameter estimates from such models by changing the number of ‘quadrature points’ employed in the approximation.13
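To make the quadrature explicit, here is a self-contained sketch of the marginal log likelihood of a random-intercept logistic model approximated by Gauss-Hermite quadrature; the ‘quadrature check’ described above amounts to re-maximizing this function with different values of n_points and confirming that the estimates barely move. This is illustrative only, not the Stata routine used in the paper:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

def neg_marginal_loglik(params, clusters, n_points=12):
    """Random-intercept logistic likelihood via Gauss-Hermite quadrature.

    params: regression coefficients followed by log(tau);
    clusters: list of (y, X) pairs, one per practice.
    """
    beta, tau = params[:-1], np.exp(params[-1])
    nodes, weights = np.polynomial.hermite.hermgauss(n_points)
    total = 0.0
    for y, X in clusters:
        eta = X @ beta                        # fixed part of linear predictor
        u = np.sqrt(2.0) * tau * nodes        # substitution for N(0, tau^2)
        p = expit(eta[:, None] + u[None, :])  # n_i x n_points probabilities
        lik = np.prod(np.where(y[:, None] == 1, p, 1.0 - p), axis=0)
        total += np.log(weights @ lik / np.sqrt(np.pi))
    return -total

# Quadrature check: maximize with, say, n_points=8 and then n_points=16, e.g.
# minimize(neg_marginal_loglik, x0, args=(clusters, 8), method="BFGS"),
# and compare the resulting estimates.
```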

I5: Bayesian random-effects logistic regression
This is the same model as I4, except that the uncertainty in the estimated between-cluster variance τ² is taken into account. Bayesian analysis was implemented using WinBUGS version 1.3,20 in which Markov chain Monte Carlo methods estimate posterior distributions of the parameters, given the data and assumed prior distributions for all parameters. The principal results presented here assume a common normal distribution for (practice) random effects on the logistic scale and vague gamma priors. A sensitivity analysis was conducted using a vague uniform prior, but since this had virtually no effect on the findings these results are not presented here. Convergence of the Markov chain was assessed using graphical methods,20,21 and by checking that similar posterior summaries were obtained with different starting values. Convergence appeared to occur within 3000–4000 iterations. The posterior summaries reported here were based on 110 000 iterations with the initial 10 000 discarded. The estimates and standard errors presented are the means and standard deviations of the respective posterior distributions, and the P-values are double the tail area of the posterior distribution under the null hypothesis.
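The original analysis used WinBUGS; a roughly equivalent modern sketch using PyMC is given below. The data arrays are synthetic placeholders, and the vague Gamma(0.001, 0.001) prior is placed on the random-effect precision 1/τ², a conventional BUGS-style choice that the paper does not spell out:

```python
import numpy as np
import pymc as pm

# Synthetic placeholder data: 24 practices, 50 women each
rng = np.random.default_rng(1)
practice_idx = np.repeat(np.arange(24), 50)
X = rng.integers(0, 2, size=(practice_idx.size, 2)).astype(float)  # letter, flag
y = rng.integers(0, 2, size=practice_idx.size)                     # outcomes

with pm.Model():
    beta = pm.Normal("beta", mu=0.0, sigma=10.0, shape=2)  # vague priors
    prec = pm.Gamma("prec", alpha=0.001, beta=0.001)       # precision 1/tau^2
    u = pm.Normal("u", mu=0.0, tau=prec, shape=24)         # practice effects
    pm.Bernoulli("y", logit_p=pm.math.dot(X, beta) + u[practice_idx],
                 observed=y)
    trace = pm.sample(draws=10_000, tune=1_000)            # MCMC posterior draws
```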

Models fitted using the various analytical options
The eight methods of analysis (C1 to C3 and I1 to I5) were each used to estimate the effects on attendance for, first, the letter and flag interventions adjusted for each other but fitted without the interaction term (in accordance with the primary analysis of the trial8), and second for their interaction. The two intervention effects and their interaction were each estimated in two models: without and then with adjustment for baseline (second round) practice uptake and the two stratifying factors (area and single/multi-handed practice). The results are thus for three effects estimated using two statistical models, each obtained using the eight methods of analysis. To enable direct comparisons, they are primarily presented as log odds ratios and corresponding standard errors.4 To aid general interpretation, odds ratios and confidence intervals are also presented for selected analyses. All analyses apart from I5 employed Stata version 7.0.13


Results
Descriptive trial results
Of the 6133 women (cluster) randomized, 3600 attended for screening in the third round, 2132 women failed to attend, and attendance status was unknown for 401 (6.5%) women.8 The proportion attending for screening was 6.4 percentage points higher among those sent a letter (1951/2960) compared with those who had not been sent a letter (1649/2772), and 6.7 percentage points higher for those whose notes were flagged (1606/2408) compared with those whose notes were not flagged (1994/3324).

Comparisons of interest
Letter
The upper third of Table 1 presents the eight analyses for the letter effect, first with no adjustment (other than for the flag) and second adjusted for the baseline and stratifying variables. The excessively low standard errors and P-values from I1 compared with the other methods are clear, regardless of adjustment.


Table 1 A comparison of eight analytical methods for a cluster randomized trial, with and without adjustment for covariates
The parameter estimates from the cluster-level analyses were quite variable, though since the standard errors varied correspondingly the P-values were more consistent. Results from meta-regression were more comparable here to unweighted than weighted regression, and very similar to those from the Bayesian model I5 (Table 1).

For the individual-level methods, the log odds ratios were necessarily identical for I1 and I2. Moreover, apart from the invalid I1, standard errors were lowest for I2. Both with and without adjustment for covariates, log odds ratios from I4 and I5 were very similar, and as expected the estimates from the random-effects method I4 were larger than those from the GEE (I3).19 However, without adjustment for covariates, the estimates of the coefficients and standard errors from I4 were unreliable according to the quadrature check, a problem known to occur in the presence of large cluster sizes and high intra-cluster correlation.13,22 Reliable estimates were, however, obtained after controlling for baseline uptake, which itself exhibited large between-practice variation.

For the planned primary analysis (adjusted I4), the results in Table 1 correspond to an odds ratio (95% confidence interval) of 1.31 (1.05, 1.64). As an indication of the scale of variation in the results, the corresponding figures from the (adjusted) standard logistic regression (I1) and meta-regression (C3) were 1.29 (1.16, 1.44) and 1.31 (1.004, 1.71) respectively. After adjustment for covariates the meta-regression log odds ratios were very similar to those from the classical as well as the Bayesian random-effects model, although the standard errors differed. In addition, the estimates of the between-practice variance (τ²) varied: 0.075, 0.053, and 0.093 for methods C3, I4, and I5 respectively from the model without the interaction but including covariates.

Flag
Many of the general observations for the letter effect apply equally here, though overall the flag effect was larger in magnitude and P-values were smaller. In this case, however, as well as generally decreasing the standard errors, adjustment for covariates increased the log odds ratios markedly for all eight models. The adjusted odds ratio (95% confidence interval) from I4 was 1.43 (1.14, 1.79); the corresponding figures from I1 and C3 were 1.48 (1.32, 1.66) and 1.42 (1.08, 1.86) respectively. Again, while the log odds ratios from methods C3, I4, and I5 were similar, the standard errors differed.

Letter and flag interaction
Although as expected the interaction P-values were much higher than those for the two interventions adjusted for each other, many of the above general patterns were maintained, in particular the clear bias in the standard error for I1 (Table 1). In addition, estimates of the interaction log odds ratio differed between the random-effects models, even after adjusting for covariates.

The apparently surprising change of direction of the estimates from the unadjusted to the adjusted models is due to a systematic baseline difference across groups. Despite the structured allocation, the combined letter and flag group had a lower mean practice uptake in the second round compared with the others, principally due to the lack of stratification on previous uptake.8 Allowing for this led to an odds ratio (95% confidence interval) of 1.41 (0.88, 2.28) for I4, with 1.36 (1.06, 1.75) and 1.41 (0.77, 2.57) for I1 and C3 respectively.


Discussion
Summary of results
Numerous methods are available for the analysis of CRT. Using as an example a factorial CRT of interventions to increase breast screening uptake, eight possible methods of analysis were compared. Three of those considered are based on cluster-level summary statistics, while the five individual-level analyses can be extended to incorporate individual-level covariates. Only one method (standard individual-level logistic regression I1) is invalid, because it fails to account for between-cluster variation. Each of the other methods accounts for clustering in some way, and would therefore be appropriate. Of note, parameter estimates and especially their standard errors differed markedly according to the method of analysis, even when based on the same underlying model.

Cluster-level methods
Methods C1 (unweighted regression of cluster log odds) and C2 (weighted regression of cluster log odds) are both based on standard linear regression: they differ in that method C1 gives equal weight to each cluster while method C2 weights according to the precision of the summary statistic (log odds). The parameter estimates from weighted and unweighted analyses would be expected to differ substantially only if there were large differences in cluster size and/or cluster-specific outcome proportions. In addition, the greater the between-cluster variability the more equal will be the weights used in method C3 (random-effects meta-regression of practice log odds) and so the greater will be the similarity between methods C1 and C3, as illustrated by the results for the present example. A further distinction between C2 and C3 is that although both incorporate between-cluster variance estimates in their weights, the former does so with a multiplicative overdispersion parameter (as do the GEE used in method I3) while the latter employs an additive approach where the weights depend on the sum of the within-cluster variance and the between-cluster variance (τ²).12 A disadvantage of C2 is that if there is less between-cluster variability than would be expected by chance then the estimated overdispersion parameter may be below 1, causing the standard errors to be too small.

Random-effects models
As explained above, methods C3, I4 (random-effects logistic regression), and I5 (Bayesian random-effects logistic regression) are based on the same statistical model. The parameter estimates from C3 and I5 were similar while, as expected, the standard errors from I5 were somewhat larger. This is because I5 accounts for the uncertainty in estimating the between-cluster variability as well as the uncertainty in the regression coefficients. In other words, the larger standard errors given by the Bayesian approach arise from the fact that it allows appropriately for the error in estimating the between-cluster variance, rather than from the uninformative priors being too vague.

The results from I4, however, differed in two ways. First, the parameter estimates from the unadjusted analyses (left-hand column of Table 1) were unstable according to the quadrature checks.13 Second, the standard errors from both unadjusted and adjusted analyses were noticeably smaller than for C3 and I5. This may lead to concern that the estimation procedure produces standard errors that are too small, even when quadrature checks do not reveal problems. Difficulties in estimating the parameters of random-effects logistic regression models are well documented,23 but it remains to be seen whether recently developed improvements to the estimation of such models24 yield larger standard errors.

Robust standard errors and generalized estimating equations
Methods I2 (logistic regression with robust standard errors) and I3 (GEE with robust standard errors) both attempt to allow for clustering without fitting a full random-effects model. In I2 the standard errors only are adjusted, with the version of the ‘sandwich’ variance estimate implemented in Stata deriving standard errors by using the observed variability in the data (via the model residuals), rather than the variability predicted by the model.13 The GEE assessed here employed an ‘exchangeable’ correlation matrix, and were thus similar to traditional models for overdispersed binomial data25,26 given that all covariates were at the cluster level.2 In this example the robust standard errors were noticeably smaller than those from the model-based procedures. This may be because the number of clusters was less than the minimum of 20 per group required given the large-sample theory on which the robust methods are based.2,27

The parameter estimates from GEE should not be interpreted as corresponding to the parameter estimates from random-effects models. In the context of analysing longitudinal data, parameter estimates from GEE are described as having ‘population-averaged’ interpretations,19,28 because they are averaged across the values of the cluster-level random effect $u_i$. The relationship between the regression coefficients from random-effects models ($\boldsymbol{\beta}_{RE}$) and GEE ($\boldsymbol{\beta}_{GEE}$) is approximately:

$$\boldsymbol{\beta}_{RE} \approx \boldsymbol{\beta}_{GEE} \times (1 + 0.35\,\tau^2)^{1/2}$$

where τ² is the between-cluster variance.19 This implies that $\boldsymbol{\beta}_{RE}$ will be further from zero than $\boldsymbol{\beta}_{GEE}$, by a degree directly related to the between-cluster variance. Here, substituting into this equation the estimate of τ² from method I4 including covariates but not the interaction, the expected random-effects log odds ratio would be 0.255 × (1 + 0.35 × 0.053)^{1/2} = 0.257 for the letter, and likewise 0.348 for the flag. These are not the same as the estimates obtained from I4 (Table 1), and the corresponding calculations for C3 and I5 give the same result. Since the log odds ratios across the various random-effects models are similar, this is again most likely to be due to the lack of reliability of I3 here given the number of clusters.

Issues in the analysis of cluster-randomized trials
The analyses adjusting for covariates gave stronger evidence for effects of the interventions, principally due to the effect of baseline uptake. While strictly speaking adjustment for the two stratification variables was necessary because randomization had been stratified on them,29 in these analyses they had little effect. Adjusting for cluster characteristics strongly associated with the outcome will often be necessary in CRT, because the number of clusters randomized is typically not large enough for randomization to provide balance between the intervention groups. Although individual-level covariates were not included in the analysis of this trial, this might be necessary where individuals are examined at baseline and follow-up. Use of an individual-level method would then be essential. More general random-effects (multi-level) models could be used to examine complex variation, using WinBUGS20 or specialist software such as MLwiN30 or HLM,31 although larger numbers of clusters are likely to be required than are typically available in CRT.

Since this was a factorial trial, the interventions were adjusted for one another.32 In an individually randomized factorial trial with equal numbers randomized to the four groups (a balanced design), this has no effect on the parameter estimates, although the standard errors are altered. In a CRT, differing cluster sizes will mean that the estimated intervention effects will be altered by such adjustment, whether or not the design is balanced at the cluster level. Moreover, this likelihood of imbalance means that while the example employed here involved a factorial design, the above comments regarding covariates are also of relevance to non-factorial cluster randomized designs, as indeed are the general conclusions from this comparison of analytical methods. Finally, while the trial presented here was not powered to detect an interaction, the nature of the interventions in this trial, as indeed in many cluster trials, means that such an effect is certainly not implausible.

Sensitivity analyses
While it remains important to specify the primary data analysis in advance, various forms of sensitivity analysis can be considered for CRT. First, results from analyses using different methods might be presented. The results presented here suggest that in practice this could lead to confusion, especially in the absence of clear guidance from methodological studies as to which is the best method. Second, investigations could be made into the sensitivity of the conclusions to model assumptions—for example, whether the most appropriate scale is the log odds (as specified in advance here), the log risk or the risk itself. Likewise for distributional assumptions concerning the random effects—options include a common Gaussian or t-distribution on the log odds scale or a common beta distribution on the risk scale. Although requiring more expertise than standard statistical packages such as Stata, WinBUGS20 allows flexible specification of this distribution. It could therefore be used to perform such sensitivity analyses, though the number of clusters will usually be too small to allow empirical investigation of the distribution of the random effects. Finally, the impact of changing the assumed priors can be investigated. Here, in analyses using a common beta distribution with vague exponential priors (data not shown) the parameter estimates were almost identical to those assuming normally distributed random effects (method I5 in Table 1), while the standard errors were smaller and even closer to those from C3 (meta-regression). Another important sensitivity analysis most easily achieved in WinBUGS would be to allow the degree of clustering to vary by randomization group.

Conclusions
With a sufficient number of clusters, statistical theory shows that all the methods considered here will give valid results apart from I1. The analysis of this CRT shows that choices regarding both the method of analysis and the variables included in the model can make important differences to the conclusions from a trial. From the results presented it is not possible to conclude which method or methods gives the most ‘correct’ estimates and standard errors. This would require extensive simulation studies comparing the results of the different methods in situations typical of CRT but where the true model parameters are known. In the context of estimating variance and covariance parameters, different procedures appear to perform better in different situations and with different types of outcome variable.33 Although difficult because of the computationally intensive nature of the individual-level methods, further such investigations are needed to provide guidance for those planning and analysing CRT.


KEY MESSAGES

  • Many different methods are available for analysing studies involving clustering effects.
  • Using a cluster randomized trial (CRT) as an example, we demonstrate that even methods allowing properly for clustering effects can give different results.
  • Adjustment for characteristics measured at baseline may have a substantial influence on parameter estimates and their standard errors.
  • Simulation studies are needed to guide the choice of analytical strategy for CRT.

 


    Acknowledgments
 
We are grateful to the Medical Research Council for funding the original study (grant numbers G9328350 and G9328361). The trial would also not have been possible without the input of Joan Austoker, Juliet Formby, Richard Hobbs, Val Redman, Lesley Roberts, Debbie Sharp, Clare Tydeman, and Sue Wilson, and the support of Julietta Patnick, Brent and Harrow Health Authority and the many individuals involved from the Breast Screening Units, Health Authorities and General Practices in London and the West Midlands. Bristol is the lead centre of the MRC Health Services Research Collaboration.


References
1 Campbell MJ. Cluster randomized trials in general family practice. Statist Methods Med Res 2000;9:81–94.

2 Donner A, Klar N. Design and Analysis of Cluster Randomisation Trials in Health Research. London: Arnold, 2000.

3 Deeks JJ, Altman DG, Bradburn MJ. Statistical methods for examining heterogeneity and combining results from several studies in meta-analysis. In: Egger M, Davey Smith G, Altman DG (eds). Systematic Reviews in Health Care: Meta-Analysis in Context, 2nd Edn. London: BMJ Books, 2001, pp. 285–312.

4 Ukoumunne OC, Gulliford MC, Chinn S, Sterne JAC, Burney PGJ. Methods for evaluating area-wide and organisation-based interventions in health and health care: a systematic review. Health Technol Assess 1999;3(5).

5 Campbell MK, Mollison J, Steen N, Grimshaw JM, Eccles M. Analysis of cluster randomized trials in primary care: a practical approach. Fam Pract 2000;17:192–96.

6 Gulliford M, Ukoumunne O, Chinn S, Sterne J, Burney P, Donner A. Methods for evaluating organisation- or area-based health interventions. In: Stevens A, Abrams K, Brazier J, Fitzpatrick R, Lilford R (eds). The Advanced Handbook of Methods in Evidence Based Healthcare. London: SAGE, 2001, pp. 295–313.

7 Ukoumunne OC, Thompson SG. Analysis of cluster randomized trials with repeated cross-sectional binary measures. Statist Med 2001;20:417–33.

8 Richards SH, Bankhead C, Peters TJ et al. A cluster randomised controlled trial comparing the effectiveness of two interventions in primary care aimed at improving attendance for breast screening. J Med Screen 2001;8:91–98.

9 Moher D, Schulz KF, Altman D, for the CONSORT Group. The CONSORT statement: revised recommendations for improving the quality of reports of parallel-group randomized trials. JAMA 2001;285:1987–91.

10 Elbourne DR, Campbell MK. Extending the CONSORT statement to cluster randomized trials: for discussion. Statist Med 2001;20:489–96.

11 Collett D. Modelling Binary Data. London: Chapman & Hall, 1991.

12 Thompson SG, Sharp SJ. Explaining heterogeneity in meta-analysis: a comparison of methods. Statist Med 1999;18:2693–702.

13 StataCorp. Stata Statistical Software: Release 7.0. College Station, TX: Stata Corporation, 2001.

14 Sharp S. sbe23: Meta-analysis regression. Stata Technical Bull 1998;42:16–24.

15 Huber PJ. Robust Statistics. New York: Wiley, 1981.

16 White H. A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica 1980;48:817–30.

17 Rogers WH. sg17: Regression standard errors in clustered samples. Stata Technical Bull 1993;13:19–23.

18 Liang KY, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika 1986;73:13–22.

19 Zeger SL, Liang K-Y. An overview of methods for the analysis of longitudinal data. Statist Med 1992;11:1825–39.

20 Spiegelhalter D, Thomas A, Best N. WinBUGS Version 1.3 User Manual. Cambridge: MRC Biostatistics Unit, 2000. Available at: http://www.mrc-bsu.cam.ac.uk/bugs

21 Brooks SP, Gelman A. General methods for monitoring convergence of iterative simulations. J Computational Graphical Statist 1998;7:434–55.

22 Lesaffre E, Spiessens B. On the effect of the number of quadrature points in a logistic random-effects model: an example. Appl Statist 2001;50:325–35.

23 Breslow NE, Clayton DG. Approximate inference in generalized linear mixed models. J Am Statist Assoc 1993;88:9–25.

24 Rabe-Hesketh S, Skrondal A, Pickles A. Reliable estimation of generalized linear mixed models using adaptive quadrature. Stata J 2002;2:1–21.

25 McCullagh P, Nelder JA. Generalized Linear Models. London: Chapman & Hall, 1983.

26 Williams DA. Extra-binomial variation in logistic linear models. Appl Statist 1982;31:144–48.

27 Donner A. Some aspects of the design and analysis of cluster randomisation trials. Appl Statist 1998;47:95–113.

28 Zeger SL, Liang K-Y, Albert PS. Models for longitudinal data: a generalized estimating equation approach. Biometrics 1988;44:1049–60.

29 Senn S. Statistical Issues in Drug Development. Chichester: Wiley, 1997.

30 MLwiN software: http://multilevel.ioe.ac.uk/features/index.html

31 HLM software: http://ssicentral.com/hlm/hlm.htm

32 Armitage P, Berry G, Matthews JNS. Statistical Methods in Medical Research, 4th Edn. Oxford: Blackwell Science, 2002.

33 Evans BA, Feng Z, Peterson AV. A comparison of generalized linear mixed model procedures with estimating equations for variance and covariance parameter estimation in longitudinal studies and group randomized trials. Statist Med 2001;20:3353–73.




