a Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, MD, USA.
b Section of Epidemiology and Biostatistics, Department of Obstetrics, Gynecology and Reproductive Sciences, UMDNJ-Robert Wood Johnson Medical School, NJ, USA.
Correspondence: Dr Stephen Cole, Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, 615 N Wolfe St E-7139 Baltimore, MD 21205, USA. E-mail: scole{at}jhsph.edu
Abstract
Epidemiologists frequently encounter studies with ordered responses. Standard ordered response logit models, such as the continuation-ratio model, constrain exposure to have a homogeneous effect across thresholds of the ordered response. We demonstrate a method for fitting regression models for unconstrained, partially constrained or fully constrained continuation odds ratios using a person-threshold data set. For each subject, we create a separate record for each response threshold the subject is at risk of passing and then apply standard binary logistic regression to estimate the continuation-ratio model. An example demonstrates the unconstrained, partially constrained and fully constrained continuation-ratio models, while a small simulation study examines some properties of the proposed person-threshold approach. Finally, we present a brief discussion of statistical software to implement the method.
Keywords Continuation-ratio model, epidemiological methods, odds ratio, ordered response
Accepted 3 May 2001
Epidemiologists frequently encounter studies with ordered responses. Standard ordered response logit models, such as the cumulative logit (a.k.a. proportional odds) and continuation-ratio models,1 constrain a treatment or exposure X to have an equivalent effect on passing each threshold of an ordered response Y. A recent review2 and commentary3 on regression models for ordinal response data underscored the need to assess the assumption of homogeneity of threshold-specific effects in models that provide a single effect estimate collapsed over thresholds. Herein, we use an empirical example to explicate a simple method to fit unconstrained, partially constrained and fully constrained continuation-ratio models as mentioned by Armstrong and Sloan.4 In addition, we evaluate some of the statistical properties of the proposed method by Monte Carlo simulation.
Continuation-ratio Model
Consider an ordered response Y, and a vector of covariates x, collected on N subjects. To such data, one may fit a fully constrained continuation-ratio model of the form
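$$\log\left[\frac{\Pr(Y > y_j \mid Y \ge y_j, \mathbf{x})}{\Pr(Y = y_j \mid Y \ge y_j, \mathbf{x})}\right] = \theta_j + \boldsymbol{\beta}'\mathbf{x}, \qquad j = 1, \ldots, k - 1, \qquad (1)$$

where $\theta_j$ is the intercept for threshold $j$ and $\boldsymbol{\beta}$ is a vector of continuation log odds ratios common to all thresholds.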
To clarify, if Y has k levels, k − 1 logits are formed. For example, a 4-level ordered response, Y = [1,2,3,4], would have k − 1 = 3 logit comparisons (j = 1, 2, 3): one for each threshold (1→2, 2→3, 3→4). In the continuation-ratio formulation of an ordered model, the three specific logit comparisons are: level 2+ versus level 1, level 3+ versus level 2, and level 4 versus level 3. There are alternative formulations of the ordered model which may better suit a specific application, such as the cumulative logit model (which compares levels 2+ versus level 1, levels 3+ versus levels 1 and 2, and level 4 versus levels 1–3) and the adjacent-category logit model (which compares level 2 versus level 1, level 3 versus level 2, and level 4 versus level 3). Greenland5 gives some biological considerations that can be used to make initial choices among the formulations. In particular, the continuation-ratio model is recommended when the underlying outcome is irreversible, in the sense that upon attaining level j a subject's response cannot revert to a lower level.
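The three formulations differ only in which response levels enter the numerator and denominator of each logit. The following minimal Python sketch enumerates those comparisons for a 4-level response; the function name `logit_comparisons` and the formulation labels are illustrative choices, not part of any statistical package.

```python
def logit_comparisons(levels, formulation):
    """Return one (numerator levels, denominator levels) pair per threshold."""
    pairs = []
    for i in range(len(levels) - 1):
        upper = levels[i + 1:]  # every level above the current threshold
        if formulation == "continuation-ratio":
            # Higher levels versus the single level just below the threshold.
            pairs.append((upper, [levels[i]]))
        elif formulation == "cumulative":
            # Higher levels versus all levels at or below the threshold.
            pairs.append((upper, levels[:i + 1]))
        elif formulation == "adjacent-category":
            # Only the two levels on either side of the threshold.
            pairs.append(([levels[i + 1]], [levels[i]]))
        else:
            raise ValueError(f"unknown formulation: {formulation}")
    return pairs


levels = [1, 2, 3, 4]
for name in ("continuation-ratio", "cumulative", "adjacent-category"):
    print(name, logit_comparisons(levels, name))
```

The first logit is the same under the continuation-ratio and cumulative formulations (levels 2+ versus level 1); the formulations diverge only at the later thresholds.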
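For a fully constrained continuation-ratio model (model 1), the estimated continuation log odds ratio, $\hat{\beta}$, is a weighted average of the threshold-specific continuation log odds ratios. In an unconstrained continuation-ratio model of the form

$$\log\left[\frac{\Pr(Y > y_j \mid Y \ge y_j, \mathbf{x})}{\Pr(Y = y_j \mid Y \ge y_j, \mathbf{x})}\right] = \theta_j + \boldsymbol{\beta}_j'\mathbf{x}, \qquad j = 1, \ldots, k - 1, \qquad (2)$$

a separate continuation log odds ratio $\boldsymbol{\beta}_j$ is estimated for each threshold.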
As the fully constrained model is nested within the unconstrained model, the difference in $-2$ log likelihoods (deviances) provides a test of the validity of the assumption that the threshold-specific continuation odds ratios are equal. Under the null, this difference is distributed as a $\chi^2$ variate with degrees of freedom (d.f.) equal to the difference in the number of parameters between the nested models. Interval estimation for $\beta$ is conducted using the standard formula for a $100(1 - \alpha)\%$ confidence interval based on the asymptotic standard error, $\hat{\beta} \pm t_{1-\alpha/2}\sqrt{\mathrm{var}(\hat{\beta})}$.
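The test and interval described above require only a few lines of arithmetic. The sketch below uses the Python standard library; the function names are illustrative, the log likelihoods and estimates in the usage lines are invented numbers rather than results from this paper, and the interval uses a standard normal quantile as a large-sample stand-in for the t quantile.

```python
import math
from statistics import NormalDist


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer d.f."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even d.f.: exp(-x/2) * sum_{r=0}^{df/2 - 1} (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= (x / 2) / r
            total += term
        return math.exp(-x / 2) * total
    # Odd d.f.: erfc(sqrt(x/2)) + 2*phi(sqrt(x)) * sum_r x^(r - 1/2) / (1*3*...*(2r - 1))
    root = math.sqrt(x)
    phi = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    term, total = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(root / math.sqrt(2)) + 2 * phi * total


def likelihood_ratio_test(loglik_constrained, loglik_unconstrained, df):
    """Difference in -2 log likelihoods and its chi-square P value."""
    stat = -2 * (loglik_constrained - loglik_unconstrained)
    return stat, chi2_sf(stat, df)


def wald_interval(beta_hat, std_err, alpha=0.05):
    """Large-sample interval: estimate +/- normal quantile * standard error."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return beta_hat - z * std_err, beta_hat + z * std_err


# Hypothetical inputs, invented for illustration only.
stat, p_value = likelihood_ratio_test(-512.4, -509.1, df=3)
lower, upper = wald_interval(beta_hat=0.85, std_err=0.20)
print(f"LR statistic = {stat:.2f}, P = {p_value:.3f}")
print(f"95% CI for the log odds ratio: {lower:.2f}, {upper:.2f}")
```

Exponentiating the interval limits gives the corresponding interval for the continuation odds ratio.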
The method we espouse is accomplished by rearranging the data in a fashion parallel to discrete person-time logit models for survival analysis.6,7 Each subject receives a number of records equal to the number of response thresholds for which he or she was at risk of passing. Subjects had to have passed the previous threshold to be at risk for the next threshold. A standard binary logit model may then be applied to analyse the rearranged person-threshold data and to derive estimates of the continuation log odds ratios from models 1 or 2, depending on specification.
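The rearrangement can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' SAS program: the function name `expand_person_threshold` and the record keys are assumptions, and the response is assumed to be coded 1, ..., k.

```python
def expand_person_threshold(subjects, n_levels):
    """Expand (x, y) pairs, with y coded 1..n_levels, into person-threshold records."""
    records = []
    for subject_id, (x, y) in enumerate(subjects):
        # A subject observed at level y was at risk of passing thresholds
        # 1..y, except that the top level has no threshold above it.
        for threshold in range(1, min(y, n_levels - 1) + 1):
            records.append({
                "id": subject_id,
                "X": x,
                "Y": y,
                "Threshold": threshold,
                "Pass": 1 if y > threshold else 0,
            })
    return records


# Three hypothetical subjects on a 4-level response.
for record in expand_person_threshold([(0, 1), (1, 3), (1, 4)], n_levels=4):
    print(record)
```

A subject at the lowest level contributes a single record with Pass equal to zero, while a subject at the highest level contributes k − 1 records, all with Pass equal to one.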
Example
In Table 1, we provide data presented by Ananth and Kleinbaum,2 and re-analysed by Cole.3 These data reflect the degree of perinatal laceration (none, 1°, 2°, 3° and 4°), an ordered response, in relation to a midline episiotomy (further details about these data are available in ref. 2). Because degree of laceration is irreversible, the continuation-ratio model (model 1) is a reasonable starting formulation. To expand the data to a person-threshold format, each subject receives one observation for each threshold for which the subject is at risk. This person-threshold data formulation explicitly weights the subjects' contributions as is necessary to recover the summary continuation odds ratio, which is a weighted average of threshold-specific odds ratios. As can be seen in Table 2, the multiple observations for a subject in the person-threshold data set all have the same X and Y values (parallel to time-constant variables in a person-time data set). A threshold variable Threshold marks which of the j thresholds the record corresponds to, while an indicator variable Pass is set to one if the threshold was passed and zero otherwise. Therefore, each subject may have at most one observation with Pass equal to zero. Following the argument by Cox,8 these contributions may be considered conditionally independent observations.

Using the person-threshold data set, a binary logistic model regressing Pass on X and including indicator variables for the j thresholds fits a fully constrained continuation-ratio model. Interactions between the indicators of threshold level and the covariate of interest X are created to model the effect of the covariate at each threshold and thereby to estimate and test the threshold-specific effects of X on Y. These exposure-by-threshold interactions are analogous to exposure-by-time interactions in pooled logistic regression, which can be used to test the assumption of proportional hazards. Modifying these interactions provides a general method for relaxing constraints on the X effect over thresholds. For example, including all the threshold-specific interactions with X yields an unconstrained continuation-ratio model with a log odds ratio for each threshold, while fitting a fully constrained model with no threshold-specific interactions produces a single summary continuation log odds ratio.
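As a minimal sketch of this fitting step, the Python code below fits the fully constrained and unconstrained models to a small person-threshold data set by Newton-Raphson. Everything here is an assumption made for illustration: the counts are invented (they are not the laceration data), the response has only three levels, records sharing the same X, Threshold and Pass values are collapsed into weighted cells, and the helper names `solve`, `fit_logistic` and `person_threshold_cells` are hypothetical. In practice any standard logistic regression routine would replace `fit_logistic`.

```python
import math


def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - tail) / m[r][r]
    return z


def fit_logistic(rows, n_iter=25):
    """Weighted binary logistic regression; rows are (design, passed, weight)."""
    p = len(rows[0][0])
    beta = [0.0] * p
    for _ in range(n_iter):
        grad = [0.0] * p
        info = [[0.0] * p for _ in range(p)]
        for d, passed, w in rows:
            mu = 1 / (1 + math.exp(-sum(b * v for b, v in zip(beta, d))))
            for i in range(p):
                grad[i] += w * (passed - mu) * d[i]
                for k in range(p):
                    info[i][k] += w * mu * (1 - mu) * d[i] * d[k]
        beta = [b + s for b, s in zip(beta, solve(info, grad))]
    loglik = 0.0
    for d, passed, w in rows:
        eta = sum(b * v for b, v in zip(beta, d))
        loglik += w * (passed * eta - math.log(1 + math.exp(eta)))
    return beta, loglik


def person_threshold_cells(counts):
    """Collapse person-threshold records into (x, threshold, passed, weight) cells."""
    cells = []
    for x, n in counts.items():
        for j in range(1, len(n)):               # thresholds 1..k-1
            cells.append((x, j, 0, n[j - 1]))    # stopped at level j
            cells.append((x, j, 1, sum(n[j:])))  # passed to a higher level
    return cells


def indicators(j):
    """Indicator variables for the two thresholds of a 3-level response."""
    return [1.0 if j == t else 0.0 for t in (1, 2)]


# Hypothetical counts of Y = 1, 2, 3 by exposure X (invented for illustration).
counts = {0: [40, 30, 30], 1: [20, 30, 50]}
cells = person_threshold_cells(counts)

# Fully constrained: threshold indicators plus a single X term.
constrained = [(indicators(j) + [float(x)], passed, w) for x, j, passed, w in cells]
# Unconstrained: threshold indicators plus one X-by-threshold term per threshold.
unconstrained = [(indicators(j) + [x * v for v in indicators(j)], passed, w)
                 for x, j, passed, w in cells]

beta_c, loglik_c = fit_logistic(constrained)
beta_u, loglik_u = fit_logistic(unconstrained)
lr_stat = -2 * (loglik_c - loglik_u)

print("summary continuation log odds ratio:", round(beta_c[2], 3))
print("threshold-specific log odds ratios:", [round(b, 3) for b in beta_u[2:]])
print("likelihood ratio statistic (1 d.f.):", round(lr_stat, 3))
```

A partially constrained model follows the same pattern: thresholds that are assumed to share an effect share a single X column in the design vector.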
Simulations
To examine this method of fitting continuation-ratio models, we performed a limited Monte Carlo simulation study. We randomly generated independent observations of data D = (X, Y), where X was a binary variable and Y was a 3-level ordinal response dependent on X through the model
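$$\log\left[\frac{\Pr(Y > y_j \mid Y \ge y_j, X = x)}{\Pr(Y = y_j \mid Y \ge y_j, X = x)}\right] = \theta_j + \beta_j x, \qquad j = 1, 2. \qquad (3)$$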
The proposed method, using a person-threshold data set, appeared unbiased (on average, the method recovered the true parameter value, Table 4). Asymptotic standard errors were nearly equivalent to the robust standard errors, and coverage of the 95% CI was uniformly nominal for correctly specified models. One exception to the foregoing statement was a slight, but apparent, underestimation of the standard error for the second threshold-specific continuation log odds ratio in unconstrained models. For example, in the first scenario of Table 4, when the true continuation log odds ratios were both 0, the average estimated standard error for the second threshold was 0.416, while the robust standard error was 0.426.
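A simulation of this kind can be sketched as follows. The design values here are assumptions chosen for illustration and are not the settings used for Table 4: intercepts of 0, Pr(X = 1) = 0.5, 500 subjects per data set and 200 replicates. The sketch also takes a shortcut that is valid only for a single binary X: the unconstrained model is then saturated, so its maximum likelihood estimates equal the sample log odds ratios among subjects at risk at each threshold, and no iterative fitting is needed.

```python
import math
import random


def simulate_subject(x, intercepts, log_odds_ratios, rng):
    """Draw Y (coded 1..k) from a continuation-ratio logit model."""
    for j, (a, b) in enumerate(zip(intercepts, log_odds_ratios), start=1):
        p_pass = 1 / (1 + math.exp(-(a + b * x)))
        if rng.random() >= p_pass:
            return j                  # did not pass threshold j, so Y = j
    return len(intercepts) + 1        # passed every threshold


def threshold_log_odds_ratios(data, n_thresholds):
    """Sample log odds ratio of passing each threshold among those at risk."""
    estimates = []
    for j in range(1, n_thresholds + 1):
        n = {(x, p): 0 for x in (0, 1) for p in (0, 1)}
        for x, y in data:
            if y >= j:                # at risk of passing threshold j
                n[(x, int(y > j))] += 1
        if min(n.values()) == 0:
            return None               # log odds ratio undefined; skip data set
        estimates.append(math.log(n[1, 1] * n[0, 0] / (n[1, 0] * n[0, 1])))
    return estimates


def run_simulation(intercepts, log_odds_ratios,
                   n_subjects=500, n_reps=200, seed=2001):
    """Return the mean and empirical SD of the threshold-specific estimates."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n_reps):
        data = []
        for _ in range(n_subjects):
            x = int(rng.random() < 0.5)
            data.append((x, simulate_subject(x, intercepts, log_odds_ratios, rng)))
        estimates = threshold_log_odds_ratios(data, len(intercepts))
        if estimates is not None:
            kept.append(estimates)
    n = len(kept)
    means = [sum(col) / n for col in zip(*kept)]
    sds = [math.sqrt(sum((e - m) ** 2 for e in col) / (n - 1))
           for col, m in zip(zip(*kept), means)]
    return means, sds


means, sds = run_simulation(intercepts=[0.0, 0.0], log_odds_ratios=[0.0, 0.0])
print("mean estimates:", [round(m, 3) for m in means])
print("empirical SDs: ", [round(s, 3) for s in sds])
```

Because fewer subjects remain at risk at the second threshold, its estimate is more variable than the first; comparing the empirical standard deviation with the average model-based standard error is one way to check for the kind of underestimation described above.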
Discussion
This proposed method for fitting continuation-ratio models improves upon a standard fully constrained model by allowing for greater flexibility in fitting the effects of covariates, while requiring only a binary logistic regression model with indicators for thresholds. Such binary logistic regression models are common to all major statistical software packages. The person-threshold formulation easily generalizes to include multiple covariates to account for confounding and interactions among covariates to assess effect measure modification. The formulation of the person-threshold data set can be easily programmed using standard statistical software packages with data management capabilities, such as SAS. A copy of the SAS program that was developed to analyse the perinatal laceration data set is available upon request.
Compared to a fully constrained model, a partially or totally unconstrained continuation-ratio model will have less efficiency (due to the increased number of parameters) but likely will represent the threshold-specific exposure effects better when these effects are not homogeneous. This is a straightforward example of the omnipresent tradeoff between bias and efficiency in statistics. We should note that the constrained continuation log odds ratio estimated in Table 3 does not equal that generated by the SAS procedure LOGISTIC formulated for the continuation-ratio model. The difference is due to the choice of link function; in our analyses we chose the logit link rather than the complementary log-log link function as proposed by Laara and Matthews9 for continuation-ratio models. When a response variable has more than two levels, the SAS procedure LOGISTIC makes the ordered logit comparisons Pr(Y > y_j | X = x) versus Pr(Y ≤ y_j | X = x) for all j. Therefore, with the (default) logit link the SAS procedure LOGISTIC fits a proportional odds model. Alternatively, with a complementary log-log link the SAS procedure LOGISTIC fits a cumulative complementary log-log model, whose exposure coefficient is mathematically equivalent to that of a complementary log-log continuation-ratio model.9
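The equivalence under the complementary log-log link can be sketched in a few lines. The parameterization below is an assumption made for illustration (the link is applied to the probability of stopping at level $y_j$, and $\theta_j$ and $\gamma_j$ denote the continuation and cumulative intercepts); sign and direction conventions in a given software package may differ.

```latex
\begin{align*}
\Pr(Y > y_j \mid Y \ge y_j, x) &= \exp\{-\exp(\theta_j + \beta x)\}, \\
\Pr(Y > y_j \mid x) &= \prod_{i \le j} \Pr(Y > y_i \mid Y \ge y_i, x)
  = \exp\Big\{-e^{\beta x} \sum_{i \le j} e^{\theta_i}\Big\}
  = \exp\{-\exp(\gamma_j + \beta x)\}, \\
\gamma_j &= \log \sum_{i \le j} e^{\theta_i}.
\end{align*}
```

The same coefficient $\beta$ therefore appears in both the continuation-ratio and the cumulative formulation, and only the intercepts change. No such identity holds under the logit link, which is why the logit continuation-ratio estimates reported here differ from the proportional odds estimates.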
Analogous to our extension of the continuation-ratio model to fully and partially constrained models, Peterson and Harrell10 extended the cumulative logit model to allow non-proportional odds. In particular, their methods include the unconstrained (j parameters for each covariate) and constrained (by specifying constraints based on trends in log odds ratios) non-proportional odds models; however, such models are difficult to implement in currently available statistical software.
In conclusion, the proposed alternative method for fitting continuation-ratio models allows great flexibility over the assumption of homogeneity of threshold-specific covariate effects, and may allow more appropriate application of models for ordered responses. Such models will be of great benefit to epidemiologists who frequently encounter studies with ordered responses.
Acknowledgments
We thank Drs Paul Allison, Sander Greenland, Tom Ten Have, Nancy Cook and John Smulian for helpful comments and suggestions.
References
1 McCullagh P. Regression models for ordinal data. J R Statist Soc (B) 1980;42:109–42.
2 Ananth CV, Kleinbaum DG. Regression models for ordinal responses: a review of methods and applications. Int J Epidemiol 1997;26:1323–33.
3 Cole SR. A reanalysis of data from regression models for ordinal responses. Int J Epidemiol 1999;28:805.
4 Armstrong BG, Sloan M. Ordinal regression models for epidemiologic data. Am J Epidemiol 1989;129:191–204.
5 Greenland S. Alternative models for ordinal logistic regression. Stat Med 1994;13:1665–77.
6 Allison PD. Discrete-time methods for the analysis of event histories. Soc Methodol 1982;13:61–98.
7 D'Agostino RB, Lee ML, Belanger AJ, Cupples LA, Anderson K, Kannel WB. Relation of pooled logistic regression to time dependent Cox regression analysis: The Framingham Heart Study. Stat Med 1990;9:1501–15.
8 Cox DR. Regression models and life tables. J R Statist Soc (B) 1972;34:187–220.
9 Laara E, Matthews JNS. The equivalence of two models for ordinal data. Biometrika 1985;72:206–07.
10 Peterson B, Harrell F. Partial proportional odds models for ordered response variables. Appl Stat 1990;39:205–17.