Mathematical coupling: a multilevel approach

Mark S Gilthorpe and Yu-Kang Tu

Biostatistics Unit, Centre for Epidemiology and Biostatistics, Leeds Institute of Genetics, Health & Therapeutics, University of Leeds, Leeds LS2 9LN, UK. E-mail: m.s.gilthorpe@leeds.ac.uk

Results from a paper published in this journal1 are acknowledged by its authors (see previous contribution2) to suffer from bias due to the effects of mathematical coupling (MC). The authors report an alternative analysis that seeks to overcome MC. However, this is not without its problems; we explain why and propose an alternative strategy.

Taking an example from Gunnell et al.,1 if self-reported height (x1) tended to be underestimated compared with recorded height (x2) amongst tall people and overestimated amongst short people, the standard deviation (SD) of x1 would be less than that of x2. Under the null hypothesis (H0) that over-/under-reporting is unrelated to either measure, the SDs of x1 and x2 should be equal. However, under H0, the difference x1 − x2, when correlated with or regressed on x1 or x2, nearly always yields a statistically significant association for large samples.3 Such analyses are therefore misleading. A solution is to assess the difference x1 − x2 with respect to the mean (x1 + x2)/2, as proposed by Oldham.4 Although MC remains, its effects are annulled because the statistical association between x1 − x2 and x1 + x2 is zero under H0. This can be illustrated geometrically by envisaging x1 and x2 as vectors with lengths equal to their SDs, where the cosine of the angle between them is their correlation (Figure 1); under H0 the two vectors are of equal length, and the vectors representing x1 − x2 and x1 + x2 are then always perpendicular, i.e. their values are uncorrelated. Consequently, differences correlated with or regressed on means yield a correlation or regression coefficient of zero under H0.
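Algebraically, the same point follows directly from the identity

\[
\operatorname{Cov}(x_1 - x_2,\ x_1 + x_2) = \operatorname{Var}(x_1) - \operatorname{Var}(x_2),
\]

which is zero exactly when the SDs of x1 and x2 are equal, i.e. under H0, irrespective of the correlation between x1 and x2.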



Figure 1 Variables x1 and x2 represented as vectors with lengths equal to their SDs; cosine θ is the correlation between x1 and x2; under H0 (the SDs of x1 and x2 are equal) the vectors x1 − x2 and x1 + x2 are always perpendicular, irrespective of the correlation between x1 and x2

Extending Oldham's method to multiple regression can violate the special circumstance in which the effects of MC are annulled. A covariate z, to be added to the regression model, may also be represented as a vector with length equal to its SD. Were z related to x1 or x2 (or both), its vector would not lie perpendicular to the vector x1 + x2. Consequently, in a regression of x1 − x2 on x1 + x2 and z, the relevant vectors would no longer be perpendicular, as required under H0, unless the vector for z were also perpendicular to the vector for x1 − x2 (i.e. its values are uncorrelated with those of x1 − x2). The effects of MC are therefore potentially reintroduced, to an extent that depends upon the correlation of z with both x1 − x2 and x1 + x2. It is therefore unclear how accurate Oldham's method remains when extended to multiple regression.
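The point can be checked with a small simulation. The following sketch (in Python; the sample size, coefficients, and variable names are purely illustrative assumptions, not taken from the studies discussed) generates data satisfying H0 and a covariate z correlated with both the difference and the mean:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 10_000

# Data satisfying H0: x1 and x2 share a common 'true' value and have
# measurement errors of equal variance, so SD(x1) = SD(x2).
true = rng.normal(170, 10, n)      # hypothetical true height (cm)
x1 = true + rng.normal(0, 3, n)    # self-reported height
x2 = true + rng.normal(0, 3, n)    # recorded height

diff = x1 - x2
mean = (x1 + x2) / 2

# A covariate correlated with both the difference and the mean
# (coefficients chosen only for illustration).
z = 0.5 * mean + 0.5 * diff + rng.normal(0, 5, n)

# Oldham's method: difference regressed on mean alone -> coefficient near zero.
print(sm.OLS(diff, sm.add_constant(mean)).fit().params)

# Adding z: the coefficient for the mean is no longer zero, because under H0
# it equals -Cov(diff, z) * Cov(mean, z) / [Var(mean) * Var(z) - Cov(mean, z)^2].
X = sm.add_constant(np.column_stack([mean, z]))
print(sm.OLS(diff, X).fit().params)
```

In this set-up the coefficient for the mean is essentially zero in the first regression but clearly non-zero in the second, even though the data were generated under H0: adjusting for z reintroduces a spurious association between difference and mean.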

An alternative approach is to use multilevel modelling.5 To analyse the difference between two measures in relation to their mean while avoiding MC, both measures are specified as repeated outcomes at level 1, clustered within individuals at level 2. A covariate indicating the measure type (self-reported or recorded) is included, and its coefficient is allowed to vary randomly about its mean, giving a random coefficient model.6 MC is not present because the dependent variable has no formulaic relationship with the independent variable. Furthermore, covariates associated with either measure, with their mean, or with their difference may be added without distorting the model. The unbiased correlation between difference and mean, akin to Oldham's correlation, is obtained from the covariance between the random intercept and the random slope.7 Note that the measure-type covariate must be centred about zero to avoid overestimation of the covariance term, and its interval must be one (e.g. coded −0.5 and +0.5) so that the regression coefficient can be interpreted as the mean difference between measures. MC is clearly a problem not to be overlooked. However, the extent to which Oldham's method remains biased when extended to multiple regression is not known, and the magnitude of any bias would vary from study to study. Multilevel modelling, on the other hand, avoids the risk of MC entirely.
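To make the specification concrete, the following minimal sketch (using simulated data and Python's statsmodels; the variable names and software choice are our assumptions for illustration, and any multilevel package could be used) codes the measure type as -0.5/+0.5, so that it is centred about zero with an interval of one:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: self-reported heights shrink towards the average,
# so tall people under-report and short people over-report.
rng = np.random.default_rng(1)
n = 500
true = rng.normal(170, 10, n)
x1 = 170 + 0.9 * (true - 170) + rng.normal(0, 3, n)  # self-reported
x2 = true + rng.normal(0, 3, n)                      # recorded

# Long format: two rows (level-1 measures) per individual (level-2 cluster).
data = pd.DataFrame({
    "id": np.repeat(np.arange(n), 2),
    "height": np.column_stack([x1, x2]).ravel(),
    # Measure-type covariate centred about zero with an interval of one:
    # -0.5 = self-reported, +0.5 = recorded.
    "measure": np.tile([-0.5, 0.5], n),
})

# Random coefficient model: the coefficient of 'measure' varies randomly
# between individuals.
model = smf.mixedlm("height ~ measure", data, groups="id", re_formula="~measure")
fit = model.fit()

# Fixed effect of 'measure': mean difference between recorded and self-reported.
print(fit.fe_params["measure"])

# Covariance between the random intercept (individual mean) and the random
# slope (individual difference) gives the Oldham-type difference-mean
# association, free of mathematical coupling.
print(fit.cov_re)
```

Note that with only two measures per individual the residual and random-slope variances are not separately identified, so fitting software may issue convergence warnings; the intercept-slope covariance corresponding to Oldham's correlation is nonetheless estimable.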


References
1 Gunnell D, Berney L, Holland P et al. How accurately are height, weight and leg length reported by the elderly, and how closely are they related to measurements recorded in childhood? Int J Epidemiol 2000; 29:456–64.

2 Gunnell D, Berney L, Holland P et al. Does the misreporting of adult body size depend upon an individual's height and weight? Methodological debate. Int J Epidemiol 2004; 33:1398–99.

3 Tu Y-K, Gilthorpe MS, Griffiths GS. Is reduction of pocket probing depth correlated with the baseline value or is it ‘mathematical coupling'? J Dent Res 2002; 81:722–26.

4 Oldham PD. A note on the analysis of repeated measurements of the same subjects. J Chron Dis 1962; 15:969–77.

5 Goldstein H, Browne W, Rasbash J. Multilevel modelling of medical data. Stat Med 2002; 21:3291–315.

6 Bryk AS, Raudenbush SW. Hierarchical Linear Models: Applications and Data Analysis Methods. London: Sage; 1992.

7 Gilthorpe MS, Cunningham SJ. The application of multilevel, multivariate modelling to orthodontic research data. Community Dental Health 2000; 17:236–42.