Effect Heterogeneity by a Matching Covariate in Matched Case-Control Studies: A Method for Graphs-based Representation

Inyoung Kim1, Noah D. Cohen2 and Raymond J. Carroll1

1 Department of Statistics, Texas A&M University, College Station, TX.
2 Department of Large Animal Medicine and Surgery, Texas A&M University, College Station, TX.

Received for publication February 23, 2001; accepted for publication April 18, 2002.


    ABSTRACT
 TOP
 ABSTRACT
 INTRODUCTION
 MATERIALS AND METHODS
 RESULTS
 DISCUSSION
 APPENDIX
 REFERENCES
 
The authors describe a method for assessing and characterizing effect heterogeneity related to a matching covariate in case-control studies, using an example from veterinary medicine. Data are from a case-control study conducted in Texas during 1997–1998 of 498 pairs of horses with colic and their controls. Horses were matched by veterinarian and by month of examination. The number of matched pairs of cases and controls varied by veterinarian. The authors demonstrate that there is effect heterogeneity related to this characteristic (i.e., cluster size of veterinarians) for the association of colic with certain covariates, using a moving average approach to conditional logistic regression and graphs-based methods. The method described in this report can be applied to examining effect heterogeneity (or effect modification) by any ordered categorical or continuous covariates for which cases have been matched with controls. The method described enables one to understand the pattern of variation across ordered categorical or continuous matching covariates and allows for any shape for this pattern. This method applies to effect modification when causality might be reasonably assumed.

case-control studies; colic; effect modifiers (epidemiology); epidemiologic methods; heterogeneity; logistic models; multicenter studies


    INTRODUCTION
Matching in case-control studies is often desirable or necessary. Although it is generally accepted that covariates for which cases and controls are matched cannot exert a confounding effect on independent covariates included in the analyses (1), effect heterogeneity or effect modification by matching covariates may occur. To our knowledge, methods for evaluating the magnitude and direction of effect heterogeneity by matching covariates in matched case-control studies are not well described. The purpose of this report is to describe a graphs-based method for assessing and characterizing effect heterogeneity by a matching covariate in matched case-control studies, using data obtained from a veterinary study. The graphs themselves are easily computed using any conditional logistic regression program. The method described in this report can help identify variables that seem to be correlated with cluster-specific odds ratios and thus document that there may be effect heterogeneity. The method, however, cannot determine causality and, in particular, cannot determine whether the variable of interest is an important determinant of responses to exposure or merely an ecologically related factor. For this reason, we will use the term "effect heterogeneity" rather than the more common "effect modification"; however, our method applies to the latter when causality might reasonably be inferred.

Specifically, we examine whether a characteristic of a matching covariate is associated with effect heterogeneity for the association of equine colic with other covariates of interest. More generally, the methods described can be extended to examine effect heterogeneity by any ordered categorical or continuous matching covariate; other categorical factors such as geographic region can also be considered. The methods described in this report may be particularly useful for the analysis of multicenter case-control studies where effect heterogeneity by center or center size may occur. Failure to account for effect heterogeneity by center could lead to inappropriate generalization of study findings among centers or failure to detect factors relating to exposure or outcome that differ among centers.

Although epidemiologic studies of equine colic have been reported (2–13), few describe management practices that predispose to colic (3, 4, 6, 8, 10). Recent change in diet has been identified as a risk factor for colic (3, 4); in these reports, participating veterinarians contributed various numbers (clusters) of matched pairs of cases of colic and controls. A natural question to ask is the extent to which cluster size results in effect heterogeneity for the association of diet change with colic. The rationale for examining cluster size is that veterinarians who contributed more pairs of cases and controls may have differed in meaningful ways from those who contributed fewer. This is an illustration of a multicenter study where center-level-specific factors (e.g., number of contributed matched pairs) may be associated with effect heterogeneity.


    MATERIALS AND METHODS
Study population
As previously reported (4), veterinarians in private practice in Texas were recruited to participate in a study of the association of equine colic with dietary and other management factors. A letter soliciting participation was mailed to 774 veterinarians. Of 244 willing respondents, 145 ultimately contributed data to the study, representing 104 private practices.

Participating veterinarians were asked to provide data for one horse treated for colic and one horse that received emergency treatment for any condition other than colic, monthly between March 1, 1997, and February 28, 1998. A colic case was defined as the first horse treated during a given month for signs of intraabdominal pain. A control was defined as the next horse that received emergency treatment for any condition other than colic and was treated by the veterinarian who treated the horse with colic. Controls were examined no more than 30 days after the corresponding colic case. The rationale for selecting the next horse treated for a condition other than colic as a control was to obviate any seasonal bias in choosing the comparison population, because incidence of colic is considered to be seasonal and because feeding and other management practices of horses vary by season; for example, the amount of fresh grass ingested while grazing varies between winter and spring. Horses were functionally matched on the basis of month, veterinarian, and region. Horses less than 6 months of age were excluded, because the types of colic in weanlings and foals are different from those in horses. Because some participating veterinarians were employed in the same practice, matched case-control pairs were often contributed by practices with cases and controls not matched by individual veterinarian. For the purposes of this study, we examined only the 498 matched pairs (996 horses) contributed by the 145 veterinarians in which the case-control sets were seen by the same veterinarian (table 1). Data collected for cases and controls included information regarding identifiers for the horse, farm, and veterinarian, date of examination, and various management factors (including dietary practices) (4).


TABLE 1. Tabulation of the number of discordant matched pairs of horses with a recent diet change, by disease status (horses with colic or their controls) and by the number of matched pairs contributed by Texas veterinarians participating in a matched case-control study of equine colic, 1997–1998
 
In the original analysis (4), using conditional logistic regression, we found that colic was significantly associated with a recent (i.e., within the 2-week period prior to examination) change in diet (i.e., that which was routinely fed to the horse), parasite control, and other management factors. We were interested in whether there was effect heterogeneity associated with the veterinarian for any or all of these variables. Unfortunately, the database was limited with respect to detailed information about veterinarians, but one potential source of effect heterogeneity to consider was the number of matched pairs contributed to the study by a given veterinarian.

Statistical methods and analysis of the data
This section describes the development of graphs-based methods, with associated confidence intervals, for understanding effect heterogeneity in a variable that is part of the matching process. Currently, no such graphs-based methods exist. The graphs used for our method, however, can be constructed by any conditional logistic regression program. In our study, the basic matching depended on the veterinarian, and the derived variable was the number of matched pairs contributed by a veterinarian. Each member of a matched pair had the same value of the potential source of effect heterogeneity.

Because it may not be obvious in what follows, we emphasize that every conditional logistic regression that we consider below is a multivariate conditional logistic regression: All the factors that were considered in the original multivariate model of risk factors for colic are considered simultaneously in our method. These factors were recent change in diet, recent change in type or batch of hay, recent change in stabling/housing management, recent change in weather conditions, history of previous colic, age (>10 years or ≤10 years), Arabian breed, whether the horse was exercised at least once each week, farm acreage (farms of >25 acres (0.1012 km²) and farms of ≤25 acres), recent administration of an anthelmintic, and whether anthelmintics were administered regularly (regular parasite control).

A simple method to test for effect heterogeneity is to add one or more multiplicative interaction terms to the model, multiplying the potential source of effect heterogeneity by the covariate in question and then testing whether this derived variable has a statistically significant effect. This can be done either simultaneously or in a series of conditional logistic regressions with each variable alone but in turn having the multiplicative interaction. Our method differs from this in two respects: 1) it is based on graphs rather than on statistical significance tests; and 2) it does not assume that the effect heterogeneity is of multiplicative form.

Our method is a version of what is known as the varying coefficient model (14), in which the coefficients of a conditional logistic regression vary with a matching variable. The idea of using the varying coefficient model in conditional logistic regression has not been reported previously, to the authors’ knowledge. We describe our method as it applies to our example. The Appendix gives algebraic details of the varying coefficient model.

Our method to understand effect heterogeneity graphically is based on a type of moving average. The moving average approach entails defining overlapping subcohorts according to the value of the potential source of effect heterogeneity and running a conditional logistic regression within each subcohort. In our example, the potential source of effect heterogeneity, k, was the number of matched pairs contributed by a veterinarian. We ran conditional logistic regression models for six subcohorts, each subcohort defined by a window of the number of matched pairs contributed. Subcohort 1 contained those matched pairs from veterinarians who contributed k = 1, 2, 3, 4, 5, or 6 matched pairs; subcohort 2 contained those matched pairs from veterinarians who contributed k = 2, 3, 4, 5, 6, or 7 matched pairs, and so on. The final subcohort, subcohort 6, consisted of those veterinarians who contributed six or more matched pairs. The regression coefficients are then plotted as a function of the source of effect heterogeneity. Any resultant trends can then be investigated further.
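The window construction can be sketched in Python. This is a minimal illustration and not the authors' code: it handles a single binary exposure, for which the conditional maximum likelihood estimate of the log odds ratio in 1:1 matched pairs is log(n10/n01), the log ratio of the two discordant-pair counts, whereas the analysis in this report fits all covariates simultaneously with a conditional logistic regression program. The function names and the toy data are hypothetical.

```python
import math


def window_bounds(n_windows=6, width=6):
    """Return (low, high) cluster-size bounds for each subcohort.

    The last window is open-ended (high is None), i.e. "six or more".
    """
    bounds = [(j, j + width - 1) for j in range(1, n_windows)]
    bounds.append((n_windows, None))
    return bounds


def in_window(k, low, high):
    """True when cluster size k falls inside the window."""
    return k >= low and (high is None or k <= high)


def log_odds_ratio(pairs):
    """Log odds ratio for one binary exposure from 1:1 matched pairs.

    Each pair is (k, case_exposed, control_exposed). Only discordant
    pairs contribute; None is returned when the estimate is not finite.
    """
    n10 = sum(1 for _, case, ctrl in pairs if case == 1 and ctrl == 0)
    n01 = sum(1 for _, case, ctrl in pairs if case == 0 and ctrl == 1)
    if n10 == 0 or n01 == 0:
        return None
    return math.log(n10 / n01)


def moving_window_coefficients(pairs, n_windows=6, width=6):
    """Coefficient for each subcohort j = 1, ..., n_windows."""
    coefs = []
    for low, high in window_bounds(n_windows, width):
        subcohort = [p for p in pairs if in_window(p[0], low, high)]
        coefs.append(log_odds_ratio(subcohort))
    return coefs


# Hypothetical data: (k, case_exposed, control_exposed) for each pair.
toy_pairs = (
    [(1, 1, 0)] * 3 + [(1, 0, 1)] * 2
    + [(4, 1, 0)] * 6 + [(4, 0, 1)] * 2
    + [(8, 1, 0)] * 9 + [(8, 0, 1)] * 1 + [(8, 1, 1)] * 4
)

for j, beta in enumerate(moving_window_coefficients(toy_pairs), start=1):
    print(j, beta)
```

Plotting the returned coefficients against j gives the kind of graph the method is built around; with several covariates, the per-window estimate would come from a multivariate conditional logistic regression instead of the closed form used here.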

To assess probabilistically whether the odds ratio depended heavily on the subcohort number j, we conducted two analyses. First, we bootstrapped the data nonparametrically (15) by sampling veterinarians with replacement and then by sampling matched pairs by veterinarians with replacement (as an aside, we also sampled veterinarians with replacement in each subcohort defined by the number of matched pairs, with no major differences). Having formed the new bootstrap sample, we recomputed the moving average subcohorts j = 1, 2, ..., 6, as previously described, thereby obtaining the coefficients β(j). We then fit linear and quadratic regressions to the plot of the coefficients β(j) against j. We then used bootstrap confidence intervals to make probability statements. We also investigated a second, more parameterized analysis. Instead of using our semiparametric subcohort method, we fit conditional logistic regression with linear and quadratic modifying effects for the change in diet. In the linear case, for example, to the dummy variable (X) indicating a change in diet, we added the variable k × X, where k is the number of matched pairs contributed by the veterinarian.
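The first analysis can be sketched as a two-stage (cluster) bootstrap followed by a least-squares fit of the window coefficients on j. The sketch below is illustrative only and rests on several assumptions that are not part of this report: a single binary exposure, a percentile confidence interval, and a 0.5 added to each discordant count so that sparse bootstrap samples still give a finite coefficient. All names and data are hypothetical.

```python
import math
import random

WINDOWS = [(1, 6), (2, 7), (3, 8), (4, 9), (5, 10), (6, None)]


def window_log_or(pairs, low, high):
    """Log odds ratio from discordant pairs with cluster size in the window.

    Adds 0.5 to each count (an assumption of this sketch, not of the
    report) so that the value stays finite in sparse resamples.
    """
    n10 = n01 = 0
    for k, case, ctrl in pairs:
        if k >= low and (high is None or k <= high):
            if case == 1 and ctrl == 0:
                n10 += 1
            elif case == 0 and ctrl == 1:
                n01 += 1
    return math.log((n10 + 0.5) / (n01 + 0.5))


def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def slope_of_coefficients(pairs, windows):
    """Slope of the window coefficients beta(j) regressed on j."""
    js = list(range(1, len(windows) + 1))
    betas = [window_log_or(pairs, lo, hi) for lo, hi in windows]
    return ols_slope(js, betas)


def bootstrap_slope_ci(clusters, windows, n_boot=1000, level=0.95, seed=0):
    """Percentile interval for the slope.

    clusters maps a veterinarian id to a list of
    (case_exposed, control_exposed) tuples, one per matched pair.
    """
    rng = random.Random(seed)
    vet_ids = list(clusters)
    slopes = []
    for _ in range(n_boot):
        sample = []
        # Stage 1: resample veterinarians with replacement.
        for vet in rng.choices(vet_ids, k=len(vet_ids)):
            own = clusters[vet]
            k = len(own)
            # Stage 2: resample that veterinarian's pairs with replacement.
            for case, ctrl in rng.choices(own, k=k):
                sample.append((k, case, ctrl))
        slopes.append(slope_of_coefficients(sample, windows))
    slopes.sort()
    tail = (1.0 - level) / 2.0
    lower = slopes[int(math.floor(tail * (n_boot - 1)))]
    upper = slopes[int(math.ceil((1.0 - tail) * (n_boot - 1)))]
    return lower, upper


# Synthetic clusters of 2 to 6 pairs each, for illustration only.
toy_clusters = {
    vet: [(1, 0)] * (vet % 5 + 1) + [(0, 1)]
    for vet in range(1, 31)
}
print(bootstrap_slope_ci(toy_clusters, WINDOWS, n_boot=200, seed=1))
```

Resampling whole veterinarians first keeps each cluster's size k attached to its pairs, so the windows can be rebuilt in every bootstrap sample exactly as in the original data.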

The method described above for ordered categorical variables can be extended to a continuous source of effect heterogeneity, if one categorizes the potential source of heterogeneity and then uses the methods described above (e.g., 5-year age categories instead of the specific age of the veterinarian). More generally, if one prefers to treat continuous variables as continuous data rather than as categorical data, we describe in the Appendix how this can be done with varying coefficient models (14), which are a class of models that contains ours as a special case.

Another way to model effect heterogeneity is through multiplicative interactions (i.e., to multiply each covariate by the source of effect heterogeneity, possibly modeling the product nonparametrically; see the Appendix for details). Our approach has the advantage of interpretation: Regression coefficients themselves depend on the source of effect heterogeneity. More importantly, our graphs-based method gives, we think, more direct and intuitive understanding of the effect heterogeneity than do significance levels obtained by fitting models with numerous multiplicative interactions.

Ours is not the only possible method that can be used to study effect heterogeneity. In the Appendix, we not only describe the varying coefficient model algebraically, but we also discuss another method, currently available only as a user-contributed S-PLUS program (MathSoft, Inc., Cambridge, Massachusetts) and suggested to us by a referee, that can be used to understand a specific type of effect modification, although the program was not intended for evaluating effect modification in case-control studies. This alternative method is not graphs based and requires special software. It is also not a varying coefficient model for effect heterogeneity except in the case that all covariates are binary. Our preliminary simulations (data not shown) suggest that, in this case, our method is more powerful statistically in detecting linear and quadratic effect heterogeneity.


    RESULTS
For all of the following results, conditional logistic regressions were performed with all risk factors in the model simultaneously. The results of conditional logistic regression using the moving window approach to assess effect heterogeneity of the number of matched pairs of cases and controls submitted by veterinarians (k) with whether there has been a change in diet are displayed in figure 1. The regression coefficients β(j) for the jth subcohort appear nearly linear in j, varying from 1.52 to 2.38. This means that the estimated odds ratios varied from 4.57 to 10.80, a ratio of 2.36. Because figure 1 indicated a linear dependence of the coefficients β(j) on the subcohort number j, we estimated this fitted line by a linear regression of the coefficients β(j) on j. The estimated line was equal to α0 + α1 j = 1.38 + 0.19 j. The bootstrap 95 percent confidence interval for this slope was 0.10, 0.30, indicating that the slope and the effect heterogeneity were highly statistically significant. For the comparison of the fitted regression prediction for the last subcohort (j = 6, α0 + 6 α1) with the fitted regression prediction for the first subcohort (j = 1, α0 + α1), the ratio of the estimated odds ratios for these two was 2.6, and the bootstrap 95 percent confidence interval for the ratio of estimated odds ratios was 1.7, 4.5, evidence of a surprisingly large modifying effect for the number of matched pairs contributed by the veterinarian. The more parameterized approach with a linear modifying effect for change in diet gave similar results.
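The odds ratios quoted above follow from exponentiating the reported coefficients. The short check below only reproduces that arithmetic from the numbers given in the text; the variable names are ours.

```python
import math

# Smallest and largest window coefficients reported for diet change.
beta_min, beta_max = 1.52, 2.38
or_min = math.exp(beta_min)
or_max = math.exp(beta_max)
ratio_observed = or_max / or_min  # same as exp(beta_max - beta_min)

# Fitted line alpha0 + alpha1 * j reported in the text.
alpha0, alpha1 = 1.38, 0.19


def fitted(j):
    """Fitted log odds ratio for subcohort j."""
    return alpha0 + alpha1 * j


# Ratio of fitted odds ratios, last vs. first subcohort: exp(5 * alpha1).
ratio_fitted = math.exp(fitted(6) - fitted(1))

print(round(or_min, 2), round(or_max, 2))
print(round(ratio_observed, 2), round(ratio_fitted, 1))
```

The ratio based on the fitted line (about 2.6) is slightly larger than the ratio of the extreme observed coefficients (about 2.36) because the line smooths over the six points instead of joining the two extremes.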



FIGURE 1. Plot of coefficients obtained from conditional logistic regression using a moving average approach for association of a change in diet with colic as a function of the subcohort number, where each subcohort was defined by a window of the number of matched pairs of cases and controls contributed by a veterinarian, Texas, 1997–1998. Solid diamonds are the regression coefficients. The regression coefficients are the natural logarithm of the estimated odds ratios for the binary variable change in diet, which takes the value of 1 if the horse’s diet had changed. The line is the least-squares fitted regression.

 

Because conditional logistic regression was performed with all variables in the model simultaneously, we can consider effect heterogeneity for other risk factors. For example, consider regular parasitic control. In figure 2, for this risk factor we plot the regression coefficients β(j) for the jth subcohort. Note here that effect heterogeneity appears to be nonlinear. Indeed, we fit a quadratic regression to this figure, resulting in the fit –0.74 + 0.05 × k – 0.02 × k². A 95 percent confidence interval for the quadratic coefficient is –0.03, –0.01, so that a quadratic effect is indicated. We performed the same analysis for diet change and, as expected, the quadratic effect was not statistically significant.



FIGURE 2. Plot of coefficients obtained from conditional logistic regression using a moving average approach for the association of the horse receiving regular administration of an anthelmintic (vs. not) with colic as a function of the subcohort number, where each subcohort was defined by a window of the number of matched pairs of cases and controls contributed by a veterinarian, Texas, 1997–1998. Solid dots are the regression coefficients, and the line is the least-squares fitted regression. The regression coefficients are the natural logarithm of the estimated odds ratio for the binary variable parasitic control, which takes the value of 1 if anthelmintics had been administered regularly to the horse.

 

    DISCUSSION
Our goal was to develop methods for understanding how regression parameters, logits, and estimated odds ratios depend on a given ordered categorical or continuous covariate for which cases were matched with controls. In the example used, the number of matched pairs contributed by a veterinarian (k) was considered to be of interest. In a sufficiently large data set, this could have been done by running a separate conditional logistic regression for each value of k. Thus, for every k, one could have created a data set consisting of all matched pairs from veterinarians who contributed k matched pairs. Then a conditional logistic regression could have been run on this subset of the data, and the regression parameter β(k) could have been computed. If one then plotted β(k) against k, the basic trend would be apparent; if there were no effect heterogeneity, then the plot of β(k) against k should reveal no trend.

This basic method is semiparametric in the following sense. In a standard conditional logistic regression with no effect heterogeneity, one is making the parametric modeling assumption that β(k) does not depend on k. On the other hand, a parametric model for linear effect heterogeneity assumes that β(k) = α0 + α1 k, for some values of α0 and α1. In our initial analysis, we tested whether the number of matched pairs contributed by the veterinarian was a source of linear effect heterogeneity for diet change and found this to be highly statistically significant. The linear effect heterogeneity test is a type of test for trend, but as a test alone it sheds no light on the shape of the effect heterogeneity, other than to suggest that it is somewhat monotone in pattern. In contrast, the method we describe enables one to describe the shape of the effect heterogeneity by ordered categorical or continuous matching covariates and allows for any shape of effect heterogeneity. For example, we found evidence of a quadratic effect heterogeneity for regular parasitic control. The basic point is that our method allows the study of effect heterogeneity without having to specify the shape a priori.

This advantage of a semiparametric method becomes more evident when one allows for the possibility of more than one covariate. Recall that our analyses were multivariate conditional logistic regression, with all 11 aforementioned risk factors for colic considered simultaneously. Using parametric modeling, one would necessarily be required to specify 11 different functions. Some covariates would not have effect heterogeneity attributable to k, some would be linear in k (as described above for diet change), while others might be quadratic in k (as described above for parasitic control). Our method will, at the least, provide an analytical means for proposing models for each of the covariates. For example, if only one covariate (e.g., X) appears to have systematic effect heterogeneity and if it appears to be linear in the source of effect heterogeneity (e.g., k), then this can be modeled directly by adding to the model the term k × X. Effect heterogeneity can be tested by conditional logistic regression software. Thus, the bootstrap, which requires special programming, is not necessary for using our approach.
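As an illustration of adding the product term, consider the simplest possible setting: 1:1 matching and a single binary covariate X. In that special case the conditional likelihood with a linear coefficient α0 + α1 k depends only on the discordant pairs, and it coincides with an ordinary logistic regression of "the case is the exposed member" on k. The sketch below fits it by Newton-Raphson; it is a simplified stand-in for a general conditional logistic regression program, and the function name and counts are hypothetical.

```python
import math


def fit_linear_heterogeneity(counts, n_iter=50, tol=1e-10):
    """Fit log OR(k) = a0 + a1 * k for one binary exposure, 1:1 matching.

    counts maps k -> (n10, n01), the numbers of discordant pairs in which
    only the case, or only the control, was exposed. At least two distinct
    values of k are needed for the slope to be identifiable.
    """
    a0 = a1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = 0.0          # score vector
        h00 = h01 = h11 = 0.0  # information matrix
        for k, (n10, n01) in counts.items():
            n = n10 + n01
            p = 1.0 / (1.0 + math.exp(-(a0 + a1 * k)))
            resid = n10 - n * p
            w = n * p * (1.0 - p)
            g0 += resid
            g1 += resid * k
            h00 += w
            h01 += w * k
            h11 += w * k * k
        det = h00 * h11 - h01 * h01
        # Newton step: inverse information times score.
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        a0 += step0
        a1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    return a0, a1


# Hypothetical discordant-pair counts by cluster size k.
toy_counts = {1: (12, 4), 2: (15, 4), 4: (20, 4), 6: (24, 3), 8: (27, 3)}
alpha0_hat, alpha1_hat = fit_linear_heterogeneity(toy_counts)
print(alpha0_hat, alpha1_hat)
```

A slope estimate near zero would correspond to no linear effect heterogeneity; with additional covariates in the model, the same term k × X would simply be added as an extra column in standard conditional logistic regression software.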

The method of running a separate conditional logistic regression for each value of k will often be unacceptable, because the conditional logistic regression in the subcohort of matched pairs for a covariate may either be wildly variable or, in the worst instance, not even converge. Using the moving average approach obviates this problem. The methods we have proposed extend easily to the analysis of any ordered categorical variable as a potential source of effect heterogeneity. The numeric scale attributed to the categories should be selected with care, because the horizontal scale can affect the slope of the dose response. If the conditional logistic regressions are stable for each value of the ordered categorical variable, then one can run these regressions and plot the regression coefficients against the ordered categories. Otherwise, the moving average idea can be used, along with the bootstrap-based confidence intervals described above. These methods also can be applied to continuous covariates (Appendix). It also may be possible to extend these methods to categorical data, such as geographic information. For example, the conditional logistic regression for subcohorts of pairs matched on geographic location could be run, and one could plot the resulting coefficients. Evaluation of the resulting plot might indicate a spatial pattern that could be represented in a model. Although our study is an example of 1:1 matching, the method is applicable for any type of matching for which conditional logistic regression can be applied.
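The instability of separate fits for each value of k can be seen in a small numerical sketch. It assumes a single binary exposure in 1:1 matched pairs, for which the log odds ratio is log(n10/n01) with approximate standard error sqrt(1/n10 + 1/n01), and it uses hypothetical discordant-pair counts; it is not a reanalysis of the colic data.

```python
import math


def log_or_with_se(n10, n01):
    """Return (log OR, standard error) from discordant-pair counts.

    Returns None when either count is zero: the estimate is then not
    finite, which is the single-covariate analogue of non-convergence.
    """
    if n10 == 0 or n01 == 0:
        return None
    return math.log(n10 / n01), math.sqrt(1.0 / n10 + 1.0 / n01)


# Hypothetical discordant-pair counts (n10, n01) by cluster size k.
counts_by_k = {1: (4, 0), 2: (5, 1), 3: (6, 2), 4: (3, 1), 5: (7, 1), 6: (5, 1)}

# Separate estimate for each k: wide standard errors, one not estimable.
for k, (n10, n01) in sorted(counts_by_k.items()):
    print(k, log_or_with_se(n10, n01))

# One moving-average window pooling k = 1, ..., 6.
pooled_n10 = sum(n10 for n10, _ in counts_by_k.values())
pooled_n01 = sum(n01 for _, n01 in counts_by_k.values())
pooled = log_or_with_se(pooled_n10, pooled_n01)
print("window 1-6", pooled)
```

Pooling neighbouring values of k trades some resolution along the k axis for a finite and far less variable coefficient, which is the rationale for the moving average.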

The results indicated that veterinarians who participated more completely differed from those who did not participate completely. This may have been because the former provided more accurate information or because these veterinarians had busier practices that saw different types of horses and clients than did those veterinarians who saw fewer horses. If, for example, the type of diet change or response to diet change depended on a horse's activity level or other management factors, it is possible that veterinarians who participated more completely were more likely to see horses at a certain level of activity (e.g., racing horses) or horses managed in a particular manner (e.g., horses that were predominately stalled). Alternatively, the veterinarians who contributed more cases may have been less careful with data collection and tended to record similar differential responses for their matched pairs (i.e., were more likely to indicate that cases but not controls had a recent change in diet).

The response rate by veterinarians was low. Explanations include the fact that some veterinarians listed as large-animal or mixed-animal practices likely treated no or few horses (i.e., treated large animals other than horses) and lacked the interest and time to collect and transmit data for a 12-month study using a long questionnaire. Comparison of respondents with nonrespondents was limited because the only information recorded about veterinarians was their address and type of practice. There was no significant difference between respondents and nonrespondents regarding the distribution of practice types; however, the categories of practice type may not have been useful to discriminate between those large-animal practices that treated horses and those that treated few or none.

Only case-control pairs that were matched by specific veterinarian were considered in this analysis. Excluding sets of pairs matched by a clinic (rather than by the individual veterinarian) could have biased our results. We believe that the effect of this bias was minimal, however, because veterinarians who contributed a large number of cases individually were associated with clinics that contributed a large number of cases. Consequently, we believe the results would have been similar if we had matched by practice. Unfortunately, we did not include a covariate for practice.

This report describes a graphs-based method for assessing and characterizing the presence of effect heterogeneity by a matched covariate in matched case-control studies. The method can be applied to ordered categorical or continuous covariates for which cases have been matched with controls. The graphs themselves can be constructed with any conditional logistic regression program. The graphs may suggest forms of effect heterogeneity, for example, linear or quadratic, that can be modeled directly using conditional logistic regression software. At least in principle, it would appear that the method could yield information about effect heterogeneity even if the covariate of interest were not ordered (e.g., geographic data), although this remains to be explored.

Our method applies without change when the variable suspected of being linked to effect heterogeneity is measured as a matching factor for the pairs (see the Appendix if this variable is a continuous one). Our method is restricted to the case of a single variable’s being linked to effect heterogeneity. In principle, it would seem possible to extend the idea to the case of effect heterogeneity in two different variables. We have not attempted to do this and believe that such an extension is likely to be challenging.

Our method is a type of regression, where, in effect, the odds ratios are being regressed on a variable suspected of being linked to effect heterogeneity. As in any regression method, there are dangers in overinterpretation and variable misspecification. The overinterpretation issue is a standard one: One cannot claim causality nor can one claim that there are no other variables that might be linked to effect heterogeneity. As mentioned, our method can help to identify variables that seem to be correlated with the cluster-specific odds ratios and thus to document that there may be effect heterogeneity. The method, however, cannot determine causality and, in particular, cannot determine whether the variable of interest is an important determinant of responses to exposure or merely an ecologically related factor.

The misspecification issue is more subtle but still important. For example, it might be that there are two variables that are linked to effect heterogeneity. If one applies our method to such a case, with a single variable, the same thing can happen as can happen in any regression problem where two variables affect a response but only one is placed in a model. As is well known, such model misspecification can lead to missed effects or effects even of the wrong sign. Thus, it is entirely possible that instances may arise whereby our method does not detect effect heterogeneity in a single variable when in fact effect heterogeneity exists and is "caused" by two variables. Alternatively, as in any regression problem, it is possible that our single-variable method may suggest an increasing pattern in odds ratios, when in fact controlling for a second variable would have shown that the first variable was associated with decreasing odds ratios. Unexplained variables affect all regression models, and ours is no exception. Patterns can be obscured or even wrongly interpreted, and significance levels can be affected in ways that are impossible to predict. Variance estimation can also be biased because of cluster-based dependencies that are secondary to an incompletely specified model for the binary outcomes, in the presence of effect heterogeneity.

Despite these important caveats, we believe that our simple graphs-based method represents an advance in suggesting a way to gain some understanding of the possibility of heterogeneity in odds ratios.


    ACKNOWLEDGMENTS
 
The research of Dr. Noah D. Cohen and Inyoung Kim was supported by the Link Equine Research Endowment. Dr. Raymond J. Carroll was supported by a grant from the National Cancer Institute (CA57030) and by the Texas A&M Center for Environmental and Rural Health via a grant from the National Institute of Environmental Health Sciences (P30-ES09106).


    APPENDIX
 
This appendix has two sections. The first gives an algebraic discussion of the varying coefficient model in conditional logistic regression and compares it with another approach suggested by a reviewer. The second section describes how to fit the varying coefficient model for continuous data.

Section 1
We let D be case-control status, S be all stratum-level variables (stratum is defined as the level of the matching covariate of interest, in our case the level of pairs submitted by the veterinarian), X1, X2, ..., XM be the individual-level predictors, and Z be the variable that may be an effect modifier. In the colic case-control study, Z = k, the number of matched pairs contributed by the veterinarian.

The usual unconditional risk model without effect modification is that the logit of risk conditional on covariates is

$$\operatorname{logit}\{\Pr(D = 1)\} = \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_M X_M + q(S), \qquad (1)$$

where q(S) includes the intercept and the unknown effects of the strata that disappear with the conditioning in conditional logistic regression; that is, they are not modeled.

The most general varying coefficient model allows all the regression parameters in equation 1 to depend on the effect modifier Z, that is,

$$\operatorname{logit}\{\Pr(D = 1)\} = \beta_1(Z) X_1 + \beta_2(Z) X_2 + \cdots + \beta_M(Z) X_M + q(S). \qquad (2)$$

If Z were categorical, as in our study, and if there were sufficient observations for each level of Z, then one simple way to fit equation 2 is to run a conditional logistic regression separately for each category and then graph each of the regression coefficients as a function of Z. Any conditional logistic regression program can be used to construct the graphs.
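As a rough sketch of this per-category fit, the code below handles only the simplest setting: 1:1 matched pairs and a single predictor. In that setting the conditional likelihood contribution of a pair depends only on the case-minus-control difference d in the predictor and equals 1/(1 + exp(-beta d)), so beta can be found by Newton's method. The function names and the toy data are assumptions made for illustration. In practice any conditional logistic regression program would be used, and the moving average approach of the main text is not reproduced here.

```python
import math

def fit_pair_clogit(diffs, weights=None, n_iter=50, tol=1e-10):
    """Conditional MLE of beta for 1:1 matched pairs with one predictor.

    diffs[i] is (case value - control value) of the predictor for pair i.
    """
    if weights is None:
        weights = [1.0] * len(diffs)
    beta = 0.0
    for _ in range(n_iter):
        score = 0.0
        info = 0.0
        for d, w in zip(diffs, weights):
            p = 1.0 / (1.0 + math.exp(-beta * d))  # pair's likelihood contribution
            score += w * d * (1.0 - p)
            info += w * d * d * p * (1.0 - p)
        if info == 0.0:
            raise ValueError("zero information: no discordant pairs, "
                             "or the estimate is not finite")
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta

def coefficients_by_category(pairs):
    """pairs: iterable of (z, diff). Fits beta separately for each value of z."""
    by_z = {}
    for z, d in pairs:
        by_z.setdefault(z, []).append(d)
    return {z: fit_pair_clogit(ds) for z, ds in sorted(by_z.items())}

# Toy data (hypothetical): binary exposure, so each difference is +1, 0, or -1.
toy = ([(1, 1)] * 30 + [(1, -1)] * 10 + [(1, 0)] * 5 +
       [(2, 1)] * 12 + [(2, -1)] * 12 + [(2, 0)] * 3)
curve = coefficients_by_category(toy)  # plot curve[z] against z
```

For a binary predictor the fit reduces to the log of the ratio of the two kinds of discordant pairs, so the toy estimate for z = 1 should be close to log(30/10). Concordant pairs (difference 0) contribute nothing.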

The interaction model described in the text takes a different form, namely,

$$\operatorname{logit}\{\Pr(D = 1)\} = \beta_1 X_1 + \cdots + \beta_M X_M + \theta_1(Z \times X_1) + \theta_2(Z \times X_2) + \cdots + \theta_M(Z \times X_M) + q(S). \qquad (3)$$

In equation 3, the functions $\theta_1(Z \times X_1), \ldots, \theta_M(Z \times X_M)$ are functions of the products of Z and X. Fitting equation 3 requires specialized software. As far as we know, no commercial packages include options for fitting equation 3, although for S-PLUS a program is available on the Web (http://lib.stat.cmu.edu/s-news/Burst/12907). In addition, if Z is an ordered categorical variable and any of the predictors are continuous variables, equation 3 suffers from the difficulty of interpreting multiplicative interactions. We believe that our model (equation 2) is more natural and more easily interpreted. We point out that even the use of equation 3 is new in the context of matched studies.

There is one case when equation 2 and equation 3 coincide, namely, when all of the predictors X1, X2, ..., XM are binary. Such a model is overparameterized in this case, so that the $\beta$s must be removed from the model for fitting. In this case, the strength of our graphs-based method is that any conditional logistic regression program can be used and not just specialized software.
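One way to see why the two models coincide for binary predictors is sketched below. This is an illustrative derivation under the definitions of this section, not a reproduction of the authors' argument; the only added assumption is that each predictor takes the values 0 and 1.

```latex
% For binary X_j, the product Z \times X_j equals 0 or Z, so
\theta_j(Z \times X_j) = \theta_j(0) + X_j \{\theta_j(Z) - \theta_j(0)\},
  \qquad X_j \in \{0, 1\}.
% Substituting into the j-th pair of terms of equation 3:
\beta_j X_j + \theta_j(Z \times X_j)
  = \theta_j(0) + \{\beta_j + \theta_j(Z) - \theta_j(0)\} X_j
  = \theta_j(0) + \beta_j(Z) X_j,
% where \beta_j(Z) = \beta_j + \theta_j(Z) - \theta_j(0).
```

The constant $\theta_j(0)$ is the same for cases and controls and can be absorbed into q(S), which leaves equation 2. Because $\beta_j$ enters only through the combination $\beta_j(Z)$, it cannot be estimated separately from $\theta_j$, which is the overparameterization noted above.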

Section 2
The text describes how to fit equation 2 when Z is a categorical variable. Here we describe how to fit equation 2 when Z is a continuous variable. The basic varying coefficient model is fit by what is called local linear regression (LOESS smoothing) (14, 16). The idea is to select matched pairs whose value of Z is near a given value, say z, and then run a conditional logistic regression on this subset of the data. To implement this method, one must specify a percentage, called the span; a span of 67 percent is typically used as the default. One then collects the nearest neighbors, that is, the span percentage of the Z values closest to z: If the span is 67 percent, these are the 67 percent of the Z values closest to z. Let $\Delta$ be the range of the collected nearest neighbors. Define the weight for any value of Z by w = 0 if Z is not in the nearest neighbor collection, while otherwise

$$w = \{1 - (Z - z)^2/\Delta^2\}^3.$$

One then runs a weighted multivariate conditional logistic regression with weights w and with the predictors $X_j$ and $X_j(Z - z)$, for j = 1, ..., M. If the former have regression coefficient estimates $\theta_{0j}$ and the latter have regression coefficient estimates $\theta_{1j}$, then the estimate of $\beta_j(z)$ is $\hat\beta_j(z) = \theta_{0j} + \theta_{1j} z$. Although these local regression methods are well established, their application to evaluation of effect modification in conditional logistic regression is new.
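The nearest-neighbor weights can be computed as in the following minimal sketch, which covers the weighting step only. Several details are assumptions rather than statements from the text: the number of neighbors is taken as the span times the number of Z values, rounded up; the range of the neighbors is read literally as their maximum minus their minimum; and weights are set to 1 when all neighbors share a single Z value. The function name `local_weights` is hypothetical.

```python
import math

def local_weights(z_values, z0, span=0.67):
    """Weights w = {1 - (Z - z0)^2 / Delta^2}^3 for the nearest neighbors of z0."""
    n = len(z_values)
    k = max(1, math.ceil(span * n))  # assumed rounding rule
    # Indices of the k values of Z closest to z0.
    order = sorted(range(n), key=lambda i: abs(z_values[i] - z0))
    neighbors = order[:k]
    nearest = [z_values[i] for i in neighbors]
    delta = max(nearest) - min(nearest)  # range of the neighbors, read literally
    weights = [0.0] * n  # w = 0 outside the neighbor collection
    for i in neighbors:
        if delta == 0:
            weights[i] = 1.0  # assumption: all neighbors share one Z value
        else:
            u2 = (z_values[i] - z0) ** 2 / delta ** 2
            weights[i] = max(0.0, 1.0 - u2) ** 3  # guard against negative weights
    return weights

z = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical Z values, one per matched pair
w = local_weights(z, z0=2, span=0.5)
```

The resulting weights would then be passed, one per matched pair, to whatever weighted conditional logistic regression program is used for the local fit.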

As before, a plot of $\beta(z)$ against z can suggest structure for effect modification. One can, for example, fit a linear regression in this plot and then use the bootstrap for inference in the same way that was described for the categorical case.
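A generic sketch of this step follows. It fits a least-squares line to the estimated curve and forms a percentile bootstrap interval for the slope. The resampling unit, the percentile interval, and all names are assumptions made for illustration; the bootstrap scheme actually used is the one described in the main text. The function `mean_by_z` is only a toy stand-in for refitting the varying coefficient model on each resample.

```python
import random

def ols_slope(xs, ys):
    """Least-squares slope of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

def mean_by_z(sample):
    """Toy stand-in for refitting the model: averages a per-unit value within each z."""
    groups = {}
    for z, value in sample:
        groups.setdefault(z, []).append(value)
    zs = sorted(groups)
    return zs, [sum(groups[z]) / len(groups[z]) for z in zs]

def bootstrap_slope_interval(units, estimate_curve, n_boot=500, level=0.95, seed=0):
    """Percentile interval for the slope of the estimated curve against z."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(n_boot):
        sample = [rng.choice(units) for _ in units]  # resample with replacement
        zs, betas = estimate_curve(sample)
        if len(set(zs)) < 2:
            continue  # slope is undefined for this replicate
        slopes.append(ols_slope(zs, betas))
    slopes.sort()
    lower = slopes[int((1 - level) / 2 * len(slopes))]
    upper = slopes[int((1 + level) / 2 * len(slopes)) - 1]
    return lower, upper

# Hypothetical units lying exactly on a line with slope 0.1.
units = [(z, 0.1 * z) for z in (1, 2, 3) for _ in range(10)]
interval = bootstrap_slope_interval(units, mean_by_z)
```

In a real analysis, `estimate_curve` would rerun the weighted conditional logistic regressions on each resample and return the grid of z values with the corresponding coefficient estimates.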


    NOTES
 
Correspondence to Dr. Noah D. Cohen, Department of Large Animal Medicine and Surgery, College of Veterinary Medicine, Texas A&M University, College Station, TX 77843-4475 (e-mail: ncohen@cvm.tamu.edu).


    REFERENCES
 

  1. Breslow NE, Day NE, eds. Statistical methods in cancer research. Vol I. The analysis of case-control studies. Lyon, France: International Agency for Research on Cancer, 1980:162–88 and 248–78. (IARC scientific publication no. 32).
  2. White NA. Epidemiology and etiology of colic. In: The equine acute abdomen. Malvern, PA: Lea & Febiger, 1990:50.
  3. Cohen ND, Matejka PL, Honnas CM, et al. Case-control study of the association between various management factors and development of colic in horses. Texas Equine Colic Study Group. J Am Vet Med Assoc 1995;206:667–73.
  4. Cohen ND, Gibbs PG, Woods AM. Dietary and other management factors associated with colic in horses. J Am Vet Med Assoc 1999;215:53–60.
  5. Cohen ND, Peloso JG. Risk factors for history of previous colic and for chronic, intermittent colic in a population of horses. J Am Vet Med Assoc 1996;208:697–703.
  6. Reeves MJ, Salman MD, Smith G. Risk factors for equine acute abdominal disease (colic): results from a multi-center case-control study. Prev Vet Med 1996;26:285–301.
  7. Reeves MJ, Gay JM, Hilbert BJ, et al. Association of age, sex and breed factors in acute equine colic: a retrospective study of 320 cases admitted to a veterinary teaching hospital in the U.S.A. Prev Vet Med 1989;7:149–60.
  8. Kaneene JB, Miller R, Ross WA, et al. Risk factors for colic in the Michigan (USA) equine population. Prev Vet Med 1997;30:23–36.
  9. Barrett DC, Taylor FR, Morgan KL. A telephone-based case-control study of fatal equine colics in Wales during 1988 with particular reference to grass disease. Prev Vet Med 1992;12:205–15.
  10. Tinker MK, White NA, Lessard P, et al. Prospective study of equine colic risk factors. Equine Vet J 1997;29:454–8.
  11. Leblond A, Villard I, Leblond L, et al. A retrospective evaluation of the causes of death of 448 insured French horses in 1995. Vet Res Commun 2000;24:85–102.
  12. Rollins JB, Clement TH. Observations on incidence of equine colic in a private practice. Equine Pract 1979;1:39–42.
  13. Proudman CJ. A two year, prospective survey of equine colic in general practice. Equine Vet J 1992;24:90–3.
  14. Carroll RJ, Ruppert D, Welsh AH. Local estimating equations. J Am Stat Assoc 1998;93:214–27.
  15. Efron B, Tibshirani RJ. An introduction to the bootstrap. London, England: Chapman & Hall, 1993.
  16. Fan J, Gijbels I. Local polynomial modeling and its applications. London, England: Chapman & Hall, 1996.



