Application of Nonparametric Models for Calculating Odds Ratios and Their Confidence Intervals for Continuous Exposures
Adolfo Figueiras1 and
Carmen Cadarso-Suárez2
1 Department of Preventive Medicine and Public Health, Faculty of Medicine, University of Santiago de Compostela, Santiago de Compostela, Spain.
2 Unit of Biostatistics, Department of Statistics and Operational Research, University of Santiago de Compostela, Santiago de Compostela, Spain.
ABSTRACT
Calculating odds ratios and corresponding confidence intervals for exposures that have been measured using a continuous scale presents important limitations in the traditional practice of analytical epidemiology. Approximations based on linear models require making arbitrary assumptions about the shape of the relation curve or about its breakpoints. Categorical analyses generally have low statistical efficiency, and cutpoints for the categories are in most cases arbitrary and/or opportunistic. The use of logistic generalized additive models to calculate odds ratios does not require these assumptions and allows great flexibility and adequate statistical efficiency. Based on the asymptotic normality of the logarithm of the odds ratio, the authors propose the use of an approximate analytical expression for the corresponding covariance matrix, which will allow the construction of confidence intervals for odds ratios that can be interpreted as in the classical parametric context. The authors illustrate this procedure by examining the relation between glycemia and risk of postoperative infection, using data obtained from a cohort study of patients undergoing surgery in Santiago, Spain (January 1996–March 1997). The authors found that glycemia values below 75 mg/dl and above 130 mg/dl were associated with increased risk of postoperative infection.
blood glucose; epidemiologic methods; generalized additive models; infection; logistic models; postoperative complications; risk; smoothing
Abbreviations:
AIC, Akaike's Information Criterion; GAM, generalized additive model; OR, odds ratio.
INTRODUCTION
Traditionally, for analysis of the effect of exposure variables measured on a continuous scale, there have been two options: 1) categorizing the exposure into two or more categories, creating dummy variables, and then calculating the effect(s) using one category as a reference group, or 2) using a linear model to describe the relation between exposure and effect. However, neither of these options is entirely appropriate. Categorizing the continuous variable and using dummy variables presents the advantage of a simple epidemiologic interpretation (one category is used as a reference group, for which the odds ratio equals 1 and no confidence interval exists). However, the disadvantages of this method include a loss of statistical efficiency (1), the possibility of an arbitrary and/or opportunistic choice of cutpoints, and the fact that the relative risks or odds ratios are based on averaging risks within categories (2). In many instances, categorization is done mechanically (tertiles, quartiles), and this can give rise to important errors when most subjects are grouped within a very narrow range of exposures (3). When reviewing the literature, we encountered various methods for establishing adequate cutpoints in relation to the results (4, 5) that help investigators avoid subjective decisions when choosing cutpoints. This approach, however, does not solve the other two problems: loss of statistical power and the fact that the relative risks or odds ratios are based on averaging risks within categories.
In many instances, linear models can provide better power than categorical analysis. However, using these models for estimation, we "force" the data to follow a linear parametric response that frequently does not fit the data closely. This problem can be overcome by examining high-order terms (x², x³, and so on) in the model, leading to different polynomial regression fits. These polynomial regression estimates are useful if they are appropriate for the data at hand but are potentially misleading otherwise: Practical experience indicates that high-order terms beyond the quadratic term tend to produce artificial turns in the fitted model (3). Fractional and inverse polynomials of x can also be included for greater flexibility, but important problems limit their application (3, 6). For example, the exposure cannot take negative values; in addition, the methods used to select the polynomial terms to be included in the model can be questionable (3, 6).
Most of the problems with fractional polynomial regression can be avoided by using regression splines (3, 7–9). Under these models, the fit is represented as a piecewise (usually cubic) polynomial with breakpoints (knots) at pre-chosen values of the covariate X (7). By allowing more knots, such models allow the family of curves to become more flexible (and more bumpy). The regression splines method is an appealing procedure because of the explicit local nature of its fit, providing a continuous and flexible estimate of the odds ratio function. Moreover, the resulting odds ratio curve is computed by using the standard software for multiple linear regression fit. Despite these great advantages, the main drawback of this approach (as in categorical analysis) lies in the practical difficulties involved in choosing the number and location of the breakpoints in the covariate (10).
Recent epidemiologic publications (9, 11–13) have shown an interest in the application of a more general flexible regression technique such as generalized additive models (GAMs) (14–16). This modern regression technique has the advantage of not assuming a parametric relation between exposure and effect ("Let the data show us the appropriate functional form" (15, p. 1)), and it eliminates the need for the investigator to impose functional assumptions. The only assumption required is that the effect of the continuous covariate follows an arbitrary continuous smooth function. In this context, a variety of nonparametric regression techniques are used to fit the curve to the data points locally, so that at any target point of the exposure, the response is obtained by averaging the responses that have covariate values in the neighborhood of the target value. The fitted curve is called a smoother. Procedures for producing such fits are called scatterplot smoothers, and they are usually based on the cubic smoothing splines (17). In spite of the advantages of GAMs, there is currently no proposed analytical method with which to calculate GAM-based odds ratios and confidence intervals for continuous exposures. Bootstrap techniques have been suggested as a solution to this problem (13).
In nonparametric models, there is no need to assume a function to which data "must" adjust, but the number of degrees of freedom (df) for the exposure-effect relation must be specified beforehand (15, pp. 52–55). For a given theoretical framework about the shape of the exposure-effect relation, one can make an a priori assumption of the number of df. Nevertheless, a previous theoretical framework is often missing. In such a case, if a number below the optimal df is selected, a large bias in the odds ratio estimation may arise, while if a higher number is set, the result may be a loss of statistical efficiency and thus a less parsimonious model, mainly because of the introduction of variance in the odds ratio estimation (15, p. 12).
In this paper, we propose a nonparametric method for calculating odds ratios and their corresponding confidence intervals when the exposure is continuous. Based on the asymptotic normality of the logarithm of the odds ratio, we construct the confidence intervals through a new approximation to the covariance matrix of the log odds ratio. We also suggest an algorithm for selecting the optimal df in the exposure-effect relation. Macros for these procedures (in S-PLUS) can be obtained from the second author by electronic mail (eicadar{at}usc.es). We use this new method to analyze the relation between glycemia and risk of postoperative infection. Diabetes mellitus is a known risk factor for postoperative infection (18–20), and risk can increase with very high levels of glycemia (21, 22). However, there are unanswered questions about the relation between postoperative infection risk and glycemia: 1) What is the shape of the curve describing this relation? 2) Is hypoglycemia a risk factor for postoperative infection? 3) If so, below what value does the risk begin to increase? Our aim in this paper is to answer these questions by applying GAM-based odds ratio estimates.
MATERIALS AND METHODS
Data source
We used data from a registry-based prospective cohort study of patients who underwent surgery in the Clinical Hospital of Santiago (northwestern Spain) between January 1996 and March 1997 (23). The medical records of all patients (n = 2,357) were reviewed for data on preoperative and operative factors related to postoperative infection. We focused our analysis on the association between postoperative infection and plasma glucose levels prior to surgery. Details appear in a previous report (23).
Statistical analysis
Since the exposure X (glycemia) is a single continuous covariate, the logistic regression model takes the general form
$$\operatorname{logit}[P(Y = 1 \mid x)] = \ln\frac{P(Y = 1 \mid x)}{1 - P(Y = 1 \mid x)} = \beta_0 + f(x), \tag{1}$$
where $\beta_0$ is needed because of the constraint that at every stage of the procedure f(x) has mean 0, Y represents the binomial outcome of interest (Y = 1 when ill (postoperative infection) and Y = 0 when healthy (no postoperative infection)), x is the exposure value of interest, and f(x) is a function relating the covariate to the logit. Assuming the above logistic model (equation 1), the odds ratio (OR) curve can be expressed as follows:
$$\mathrm{OR}(x, x_{\mathrm{ref}}) = \exp[f(x) - f(x_{\mathrm{ref}})], \tag{2}$$
where $x_{\mathrm{ref}}$ is a specific value of the exposure taken as the reference.
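As a minimal sketch of equation 2, the following Python function evaluates the odds ratio curve for any fitted f. The function name `odds_ratio_curve` and the straight-line f used to exercise it are assumptions introduced for illustration; they are not part of the original S-PLUS macros.

```python
import math

def odds_ratio_curve(f, x, x_ref):
    """Odds ratio of exposure x relative to x_ref: exp(f(x) - f(x_ref))."""
    return math.exp(f(x) - f(x_ref))

# Illustrative straight-line f (an assumption): each 10-unit increment
# of the exposure multiplies the odds by 1.10.
beta1 = math.log(1.10) / 10.0

def f_linear(x):
    return beta1 * x

or_105 = odds_ratio_curve(f_linear, 105.0, 95.0)  # 10 units above the reference
or_ref = odds_ratio_curve(f_linear, 95.0, 95.0)   # the reference itself
print(round(or_105, 3), or_ref)
```

By construction the curve equals 1 at the reference value, whatever form f takes.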
Parametric approaches
In order to study the relation between glycemia and postoperative infection, we used generalized linear models, where f(x) from equation 1 is a known function established a priori. We applied several traditional models, including categorical, linear, quadratic, and cubic models. We also used local parametric methods (cubic regression splines (8)), where f(x) has the following form:
$$f(x) = \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \sum_{i=1}^{p} \theta_i (x - k_i)_+^3 .$$
Here $(a)_+$ denotes a when a > 0 and 0 otherwise, p is the number of knots, and $k_i$ is the value of x at knot i. Note that a "knot" refers to a value of x where the cubic polynomial changes its form. Two regression spline models were used, one having only one knot (located at the median) and another having two knots (located at the tertiles).
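The truncated terms (a)+ can be made concrete with a short Python sketch that builds the design matrix of a cubic regression spline. It assumes the usual truncated power basis (x, x², x³, plus one truncated cubic per knot); the function name `cubic_spline_basis` and the glucose values are hypothetical and chosen only for illustration.

```python
import numpy as np

def cubic_spline_basis(x, knots):
    """Design matrix [x, x^2, x^3, (x-k_1)_+^3, ..., (x-k_p)_+^3]."""
    x = np.asarray(x, dtype=float)
    columns = [x, x**2, x**3]
    for k in knots:
        # (x - k)_+ is x - k when positive and 0 otherwise
        columns.append(np.clip(x - k, 0.0, None) ** 3)
    return np.column_stack(columns)

glucose = np.array([70.0, 95.0, 102.0, 130.0, 200.0])  # illustrative values

# One knot at the median, and two knots at the tertiles
one_knot = cubic_spline_basis(glucose, knots=[np.median(glucose)])
two_knots = cubic_spline_basis(glucose, knots=np.quantile(glucose, [1 / 3, 2 / 3]))
print(one_knot.shape, two_knots.shape)
```

Each extra knot adds one column, so the resulting matrix can be passed to any standard logistic regression routine.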
Nonparametric approach
We also studied the relation between plasma glucose levels and postoperative infection using nonparametric models. We used a logistic regression model belonging to the broad family of GAMs (15). In these models, f(x) in equation 1 is an unknown smooth function. To estimate f(x) in the GAM context, we apply cubic smoothing splines (for technical details, see appendix 1).
A crucial step in estimating the function f(x) is choosing the df, which controls the degree of smoothing of the estimated function, $\hat{f}(x)$ (see appendix 1 for the mathematical definition of nonparametric df). As is the case in a parametric context, larger df curves are jagged while smaller df curves are smooth. In the extreme case of df = 1, f is forced to be a straight line $f(x) = \beta_1 x$. An optimal choice of df should balance the bias of the exposure-effect relation estimate against its variance. Automatic procedures, based on minimizing some error criterion, have been proposed in order to achieve this balance. In transformed response models (such as logistic regression), one fitting criterion minimizes the theoretical expected prediction error (15, p. 158). Given that the calculation of prediction error involves unknown quantities, some approximations of the prediction error have been proposed. One of the most common procedures used is Akaike's Information Criterion (AIC) (25). This is defined as follows:
$$\mathrm{AIC} = \frac{\text{deviance} + 2 \cdot \mathrm{df} \cdot \hat{\phi}}{n}.$$
The AIC is based on the number of subjects (n), df, the deviance (i.e., the likelihood ratio statistic for the fitted model), and $\hat{\phi}$ = deviance/(n − df).
We utilized the AIC to determine the number of df of the best-fitting parsimonious nonparametric model describing the relation between glycemia and risk of postoperative infection (26). We calculated the AIC of all of the models with df between 1 and 10, at intervals of 0.25, and we selected the model with the smallest AIC. The AIC was also used (along with the deviance) to compare the fit of parametric models among themselves and with the fit of the nonparametric logistic regression model.
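The grid search described above can be sketched in Python as follows. The AIC is written here as (deviance + 2 · df · φ̂)/n with φ̂ = deviance/(n − df), which is an assumption about the exact form used; `select_df` and `toy_deviance` are hypothetical names, and `toy_deviance` merely stands in for refitting the GAM at each df.

```python
import numpy as np

def aic(deviance, df, n):
    """AIC = (deviance + 2 * df * phi_hat) / n, phi_hat = deviance / (n - df)."""
    phi_hat = deviance / (n - df)
    return (deviance + 2.0 * df * phi_hat) / n

def select_df(fit_deviance, n, df_grid):
    """Return (best_df, aic_values) for a callable mapping df -> deviance."""
    aic_values = np.array([aic(fit_deviance(df), df, n) for df in df_grid])
    return df_grid[int(np.argmin(aic_values))], aic_values

# Candidate df from 1 to 10 at intervals of 0.25
df_grid = np.arange(1.0, 10.0 + 0.25, 0.25)

def toy_deviance(df):
    # Hypothetical stand-in: deviance falls as the fit becomes more flexible.
    return 2300.0 + 60.0 * np.exp(-df / 2.0)

best_df, aic_values = select_df(toy_deviance, n=2357, df_grid=df_grid)
print(best_df, aic_values.min())
```

In a real analysis, `toy_deviance` would be replaced by a function that fits the logistic GAM with the requested df and returns its deviance.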
Once we selected the optimum df for the model, we proceeded to calculate the confidence interval for the odds ratio. Since the function f(x) is arbitrary and unknown, it cannot be explicitly expressed through coefficients. Therefore, odds ratios as well as standard errors must be estimated "point by point" for each exposure value in the data. That way, each exposure value has its own associated odds ratio and standard error (SE).
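We can construct the following 100(1 − α) percent limits for the confidence interval of the odds ratio, OR(x, xref):
$$\exp\left\{\ln\widehat{\mathrm{OR}}(x, x_{\mathrm{ref}}) \pm z_{1-\alpha/2}\,\mathrm{SE}\left[\ln\widehat{\mathrm{OR}}(x, x_{\mathrm{ref}})\right]\right\},$$
where $z_{1-\alpha/2}$ is the $1 - \alpha/2$ quantile of the standard normal distribution and $\mathrm{SE}[\ln\widehat{\mathrm{OR}}(x, x_{\mathrm{ref}})]$ is the square root of the variance of $\ln\widehat{\mathrm{OR}}(x, x_{\mathrm{ref}})$. For details on this derivation, see appendix 2.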
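A minimal Python sketch of these limits, assuming the usual normal-approximation interval on the log scale, is given below. The function name `or_confidence_limits` and the numeric inputs are illustrative assumptions; in practice the log odds ratio and its standard error would come from the fitted GAM at each exposure value.

```python
import math
from statistics import NormalDist

def or_confidence_limits(log_or, se_log_or, alpha=0.05):
    """100(1 - alpha) percent limits: exp(log_or -/+ z * se_log_or)."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return (math.exp(log_or - z * se_log_or),
            math.exp(log_or + z * se_log_or))

# Illustrative inputs (assumptions): an odds ratio of 2.0 with SE 0.25 on the log scale
lower, upper = or_confidence_limits(math.log(2.0), 0.25)
print(lower, upper)
```

Because the interval is symmetric on the log scale, the odds ratio is the geometric mean of its two limits.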
Finally, the model was adjusted for potential confounders such as sex, age (with a straight-line relation), and diabetes status.
RESULTS
Figure 1 shows the distribution of plasma glucose levels among the study participants. Glucose levels were below 95 mg/dl for one third of the sample, below 80 mg/dl for 106 participants, and above 250 mg/dl for only 0.4 percent of the sample. Among study participants, 465 (19.7 percent) developed a postoperative infection, and 142 were diabetic. Of those with glucose levels between 90 mg/dl and 100 mg/dl, 13.5 percent developed a postoperative infection.

Figure 1. Plasma glucose levels of participants in a Spanish cohort study of postoperative infection risk, 1996–1997. "Frequency" = number of patients.
To study the relation between glycemia and postoperative infection risk using a categorical analysis, we calculated the quartiles for glucose concentration and recoded the variable into four categories, which were entered into the model as three dummy variables. We took as the reference group the second quartile (92–102 mg/dl), because it contained the "normal" concentration values. The results are shown in table 1, model 4. Only glucose values above 121 mg/dl were associated with an increased risk of postoperative infection relative to the reference category.
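The recoding step can be sketched in Python as follows. This is only an illustration of quartile-based dummy coding with the second quartile as the reference; the function name `quartile_dummies` and the simulated glucose values are assumptions, not the study data.

```python
import numpy as np

def quartile_dummies(x, reference=1):
    """Code x into quartile categories 0..3 and return dummy columns for
    every category except `reference` (0-based; 1 = second quartile)."""
    x = np.asarray(x, dtype=float)
    cuts = np.quantile(x, [0.25, 0.5, 0.75])
    # Category c holds values with cuts[c - 1] < x <= cuts[c]
    category = np.searchsorted(cuts, x, side="left")
    kept = [c for c in range(4) if c != reference]
    dummies = np.column_stack([(category == c).astype(int) for c in kept])
    return category, dummies

rng = np.random.default_rng(1)
glucose = rng.normal(105.0, 25.0, size=400)  # simulated, illustrative only
category, dummies = quartile_dummies(glucose)
print(np.bincount(category), dummies.shape)
```

Subjects in the reference quartile have all three dummy variables equal to 0, so their odds ratio is fixed at 1 and has no confidence interval.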
TABLE 1. Results obtained by applying different mathematical models to estimation of the relation between plasma glucose level and risk of postoperative infection
TABLE 2. Odds ratios for the relation between plasma glucose level and risk of postoperative infection, using as the reference value a glucose level of 95 mg/dl
Figure 4. Relation between plasma glucose level and risk of postoperative infection (log odds) in a Spanish cohort study (1996–1997): influence of degrees of freedom (df) on the nonparametric estimate of odds. Part a, 3 df; part b, 4 df; part d, 8 df; part e, 9 df; part c, optimal number of df (6.25) calculated using Akaike's Information Criterion. Curves show the log odds ratio and its 95% confidence interval.
If we assume that f is a linear function, we obtain the results shown in table 1, model 1: a statistically significant association and an increment in the postoperative infection odds of 10.0 percent (95 percent confidence interval: 7.1, 13.1) for each 10-mg/dl increment of plasma glucose. If we take the value 95 as a reference value, we obtain figure 2, part a. This portion of figure 2 suggests that values below 95 are associated with a lower risk of postoperative infection than the value 95, and values above 95 are associated with a higher risk than 95. One can examine other parametric functions, such as the quadratic function (see table 1), where results indicate that the quadratic term is significant and suggest the relation depicted in figure 2, part b. If we try a cubic term in the model, it is also significant, and the relation would be as shown in figure 2, part c.

Figure 2. Relation between plasma glucose level and risk of postoperative infection (log odds ratio (LnOR)) in a Spanish cohort study (1996–1997), analyzed using different parametric methods. Part a, linear model; part b, quadratic polynomial; part c, cubic polynomial; part d, regression splines with a knot at the median value (102 mg/dl); part e, regression splines with knots at the tertiles. Curves show the log odds ratio and its 95% confidence interval.
When applying local parametric regression methods (cubic regression splines), the fit depends on the number of knots (compare parts d and e of figure 2). When the knots are established using quantiles that minimize the AIC (tertiles in our data), figure 2, part e, is obtained. The minimum risk is found at a blood glucose level of 95 mg/dl, and the maximum risk is found at approximately 180 mg/dl. However, when a single knot (located at the median) is selected, the minimum and maximum risks shift to 86 mg/dl and approximately 200 mg/dl, respectively (see figure 2, part d).
Figure 3 shows the effect of the df in GAMs on the AIC. The AIC is minimized at 6.25 df. Figure 4 shows the estimated relations based on applying GAMs using various df. When we take the model with the optimal number of df (6.25 df) as the reference model (see figure 3 and figure 4, part c), the log odds estimates with 3 or 4 df (figure 4, parts a and b) indicate that hypoglycemia would not seem to exert any effect upon postoperative infection. Estimates with 8 or 9 df (figure 4, parts d and e) do not provide any information additional to estimates with 6.25 df. With 8 or 9 df, the corresponding estimates only produce inflexions with hardly any information between glucose levels of 130 mg/dl and 225 mg/dl. The model with 6.25 df (figure 4, part c) shows that a "spoon" shape seems to explain the dose-response relation between glucose level and odds of infection, and the lower risk corresponds to levels of glucose around 95 mg/dl. So far, from these results we do not know the values that can be considered risk indicators for postoperative infection.

Figure 3. Nonparametric relation between plasma glucose level and risk of postoperative infection in a Spanish cohort study (1996–1997): effect of degrees of freedom (df) on Akaike's Information Criterion (AIC).
Applying the method proposed in this article for calculating the confidence interval of the odds ratio, and taking 95 as the reference value (since it is a minimum risk value), we obtain figure 5, part a. It shows the estimated relation between glucose concentration and the postoperative infection odds ratio, as well as the values which present the highest risk of postoperative infection from a statistical viewpoint: values under 80 mg/dl and above 130 mg/dl. The adjusted analysis (adjusted for sex, age, and diabetes status) provides similar results (see figure 5, part b). Finally, we can also present a table with the odds ratio (crude and adjusted) and its 95 percent confidence interval for any glucose value in relation to the reference value (see table 2).

Figure 5. Relation between plasma glucose level and risk of postoperative infection (log odds ratio (LnOR)) in a Spanish cohort study (1996–1997), taking as the reference value a plasma glucose level of 95 mg/dl. Part a, crude analysis; part b, analysis with adjustment for sex, age (with a straight-line relation), and diabetes status. Curves show the log odds ratio and its 95% confidence interval.
When one compares the results obtained by the various methods applied to the data in our example, the method proposed in this article seems to offer greater statistical efficiency than the categorical analysis. Moreover, it provides information on the odds ratio evolution in each quartile, while other methods would show only odds ratios based on average risks within categories (one can observe that in the first quartile the odds ratio diminishes as the glucose concentration rises, while in the third quartile the opposite happens). Parametric methods (linear, quadratic, or cubic) "force" data to follow a parametric form, which can result in important biases in the odds ratio estimates: Values under 95 mg/dl appear either to be protectors or to have no effect; by contrast, values below 85 mg/dl, when one uses flexible methods, represent risk factors. Moreover, the forms of the relation between glycemia and postoperative infection provided by nonlocal linear regression methods differ markedly from those suggested by nonparametric techniques (compare figure 2, parts a, b, and c, with figure 4, part c). Regarding models based on regression splines, the proposed method eliminates the subjectivity of having to select the number and position of knots, and it can provide less biased odds ratio estimates (obviously, if the model is truly linear, a linear approach to fitting does not result in bias). Finally, if we use the AIC as a comparison statistical criterion of the estimates of the various models (see table 1), we see that the model which best fits the data is the nonparametric model (although the two-knot regression spline provides a similar curve and AIC).
DISCUSSION
Our results show that the use of nonparametric methods can permit better estimation than the traditional approaches to calculating odds ratios and confidence intervals for continuous variables; at the same time, our proposed method allows an interpretation of these effect measures equivalent to that used in classical epidemiologic research. Comparing it with other approximations based on linear models, the main advantage of the method proposed in this paper is that the investigator does not need to make any a priori assumptions about the kind of relation or about the number or location(s) of knots in the curve. In comparison with categorical analysis, the nonparametric method described here can be statistically more efficient, and it eliminates the need to select cutpoints for categories in an arbitrary or opportunistic manner.
Through the application of our method, we have observed that both high and low levels of plasma glucose are associated with an increased risk of postoperative infection. Our method also suggests that the relation curve between plasma glucose level and postoperative infection risk has a "spoon" shape. Diabetes has been characterized as an important risk factor for postoperative infection (18–20), and that risk is exacerbated in the presence of glucose levels higher than 200 mg/dl (21, 22); but our results indicate that glucose levels above 130 mg/dl are associated with a greater risk of postoperative infection. However, the most interesting finding of our study was that low plasma glucose levels prior to surgery also increase the risk of postoperative infection.
Mechanisms for an effect of hypoglycemia on postoperative infection risk are uncertain. A detailed literature review found no articles that related low plasma glucose levels to an increased risk of postoperative infection. However, as a feature of the systemic inflammatory process, circulating levels of the protein known as macrophage migration inhibitory factor are elevated (28, 29). It is likely that migration inhibitory factor participates in the stimulation of insulin secretion (30, 31) and thus may decrease glycemia. Therefore, a decrease in plasma glucose level could reflect a response to some illness (e.g., alcoholism (32), diabetes (33, 34), sepsis (33–36), or malnutrition (37)) which in turn is a risk factor for postoperative infection, even in subclinical phases (34). Hence, in view of our results, preoperative glucose levels below 75 mg/dl could indicate the presence of some risk factor (systemic inflammatory process) in the subclinical phase. Thus, hypoglycemia could be viewed as an indicator of risk of postoperative infection. However, we cannot reject the possibility that low levels of glycemia can be a risk factor for postoperative infection per se: It has been demonstrated that glucose uptake by tissue decreases when glycemia values are low (38) and that cells increase glucose consumption to satisfy energy needs during times of stress (39, 40). Thus, low-level glycemia could reduce the patient's ability to respond to postsurgical stress, increasing the risk of postoperative infection. This agrees with the results of Shilo et al. (41), who recently found that risk of postoperative infection was elevated among older hospitalized patients who were hypoglycemic, even after adjustment for other risk factors. Our findings could suggest that there is a need to monitor and treat hypo- and hyperglycemia and possibly increase prophylactic measures in these patients to diminish their risk of postoperative infection. Further studies will be necessary to elucidate the relation between glycemia and postoperative infection.
From a methodological perspective, our detection of an elevated infection risk at low levels of glycemia could be discounted as being due to the great flexibility of nonparametric methods, which can artifactually detect minor irregularities in the data that are simply stochastic variation. However, GAMs are very conservative at the extremes of the curve (they have wide confidence intervals, as figures 4 and 5 show). This is the case because normally there are few observations at the extremes of exposure, and because of a slight boundary effect inherent in all flexible methods (42, pp. 170–2).
The risk of postoperative infection related to low-level glycemia could potentially reflect a bad choice of df. In general, selection of the number of df is problematic. The researcher can use his/her previous knowledge as a base, but knowledge is often scarce or nonexistent, as was the case in our study. In these situations, one can use the AIC to select the df, which should bring about a good compromise between flexibility and loss of statistical power and parsimony. However, use of the AIC to select the df presents two limitations. First, it has been shown that for a Gaussian response, the classical AIC tends to result in undersmoothing (that is, it overestimates the number of df) (26). A correction for the classical AIC has recently been proposed (26) but has not been tested with binomial models. In spite of that, simulation studies using families of Gaussian response curves have shown that the cubic smoothing splines technique is the nonparametric method in which the classical AIC produces the minimum amount of undersmoothing (26). Second, utilizing the AIC to determine the number of df makes the model selection data-dependent, increasing uncertainty about the estimates due to data dependence at the model selection step (43). Because of this, the variances calculated with our method are underestimated. However, in our case, the sample size is large and the number of "candidate models" is small; for that reason, the impact of this may be negligible. Additional simulation studies may be necessary to validate the use of the AIC in binomial response scenarios and the effect of selecting the model in a data-dependent fashion, using scenarios with a wide range of "candidate models" and sample sizes.
One of the main difficulties in applying GAMs in epidemiology has been the scarcity of software, since initially only S-PLUS software (44, 45) and specific software (46) were developed for their application. However, current popular statistical software packages for epidemiologists, such as Stata (47), have begun to incorporate GAM procedures. Some authors, such as Mather and Lu (48), think that the main utility of nonparametric methods is in exploring the shape of the curve and then identifying parametric functions. However, utilizing GAMs in such an exploratory manner can introduce additional uncertainty in the estimates that is not accounted for during variance calculations. Moreover, as we have illustrated, GAMs can be useful in themselves as explanatory models.
Another possible limitation of our proposed method is that results can only be presented in relation to a reference category. Greenland et al. (49, 50) maintain that establishing a reference category in which the odds ratio equals 1, without a confidence interval, for continuous exposures (where the confidence bands "pinch" the odds ratio) results in two fundamental limitations: 1) a graphical distortion that can promote misinterpretation of results by the reader and 2) arbitrary selection of the reference point. These investigators propose representing the curve centered on the weighted mean (50), based on the "floating absolute risk" (51), where there is no reference point. Although this method permits comparison of two nonreference values of exposure, the reader must calculate the widths of the confidence bands for any two exposure values ("manually" if results are only presented graphically) and carry out a series of calculations (50).
In our example, 95 mg/dl was a natural choice for the reference value. Not all applications provide an obvious choice. In some scenarios, one might want to take a point from the background literature as one of the common clinical reference points. Finally, it is necessary to highlight that the pointwise confidence bands estimated through our proposed method represent the 100(1 − α) percent confidence interval of the log odds ratio at each of the values of the continuous exposure, but these do not allow us to make global inferences about possible dose-response curves. However, the log odds ratio curve gives us a quite reasonable representation of the dose-response relation in the sample studied.
APPENDIX 1
The method used for fitting the generalized additive model (GAM) (equation 1 in the text) is the local Fisher scoring algorithm (15, pp. 140–51), which is analogous to the Fisher scoring algorithm for generalized linear models. Maximization of the log-likelihood functions is carried out by Fisher scoring iterations. The difference is the introduction of an inner backfitting iteration loop to estimate locally the function f(x) in equation 1, through nonparametric methods.
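The structure of the local scoring iterations can be sketched in Python under strong simplifying assumptions: a single covariate observed on an equally spaced, ordered grid, and a discrete second-difference penalty smoother standing in for the cubic smoothing spline. The names `second_difference_penalty` and `local_scoring`, the smoothing parameter `lam`, and the simulated data are all hypothetical.

```python
import numpy as np

def second_difference_penalty(n):
    """P = D'D, where D takes second differences of a length-n vector."""
    D = np.diff(np.eye(n), n=2, axis=0)  # shape (n - 2, n)
    return D.T @ D

def local_scoring(y, lam=50.0, n_iter=25):
    """Estimate eta = logit P(Y = 1) for y ordered by an equally spaced x.

    Each pass forms the Fisher scoring working response z and weights w,
    then smooths z with a weighted penalized smoother. With one covariate,
    the inner backfitting loop reduces to this single smoothing step.
    """
    y = np.asarray(y, dtype=float)
    P = second_difference_penalty(len(y))
    eta = np.zeros_like(y)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-eta))
        w = p * (1.0 - p)                 # iterative weights
        z = eta + (y - p) / w             # working response
        W = np.diag(w)
        eta = np.linalg.solve(W + lam * P, W @ z)
    return eta

# Simulated data (illustrative only): 200 equally spaced exposure values
rng = np.random.default_rng(0)
x = np.linspace(60.0, 250.0, 200)
true_eta = -1.5 + 1.5 * ((x - 95.0) / 155.0) ** 2
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-true_eta)))

eta_hat = local_scoring(y)
p_hat = 1.0 / (1.0 + np.exp(-eta_hat))
print(p_hat.min(), p_hat.max())
```

At convergence the fitted probabilities sum to the observed number of events, because the penalty leaves constant shifts of the curve unpenalized.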
An estimate of the function f(x), $\hat{f}(x)$, can be obtained by applying the local Fisher scoring algorithm to model 1. The smooth estimate can also be expressed as $\hat{f}(x) = S \cdot z(x)$, S being any (weighted) smoother matrix applied to the working response z(x), obtained from the Fisher scoring fit (15, pp. 140–1). For simplicity of notation, we will suppress the dependency of z on x, referring to z(x) as simply z. In the nonparametric context, the usual S matrix considered is the one based on the cubic smoothing splines (17).
However, the backfitting algorithm in the multivariate case is inefficient when the covariates are correlated, and the uniqueness of the solution cannot be ensured when collinearity is present (24). To avoid most of these problems and improve the efficiency of the method, a modified version of the backfitting algorithm is required (15, pp. 124–7). The strategy is simple. Even though f(x) is a term to be smoothed, it may be exactly decomposed into a parametric part (for instance, a linear expression $\beta_1 x$) and a purely nonparametric one, g(x). This way, f(x) can be expressed as $f(x) = \beta_1 x + g(x)$.
This procedure can be justified through rigorous Bayesian arguments when the smoother matrix is defined in terms of smoothing splines. In such a case, it can be shown that matrix S can be orthogonally decomposed as a sum of two matrices, H and G, where H is the projection operator matrix giving an estimate of the corresponding parametric part of f(x) and G is the nonprojection operator matrix, responsible for smoothing g(x) through cubic smoothing splines. The estimate $\hat{f}(x)$ is then given by
$$\hat{f}(x) = Sz = Hz + Gz = \hat{\beta}_1 x + \hat{g}(x).$$
The number of degrees of freedom of a nonparametric regression smoother gives an indication of the amount of fitting produced by the smoother matrix S. Formally, in the nonparametric context, the number of degrees of freedom is defined as the sum of the diagonal elements of S (i.e., the trace of S).
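The decomposition of S into H and G, and the trace definition of degrees of freedom, can be checked numerically. The sketch below is an illustration under stated assumptions: the smoother is a second-difference penalty smoother on an equally spaced grid (not the cubic smoothing spline used in the paper), H is taken as the weighted least squares projection onto a constant and a linear term, and the weights and `lam` are arbitrary placeholder values. The function name `decompose_smoother` is hypothetical.

```python
import numpy as np

def decompose_smoother(x, weights, lam):
    """Return S, its parametric part H, and its nonparametric part G = S - H."""
    n = len(x)
    W = np.diag(weights)
    D = np.diff(np.eye(n), n=2, axis=0)             # second differences
    S = np.linalg.solve(W + lam * D.T @ D, W)       # assumed smoother matrix
    X = np.column_stack([np.ones(n), x])            # constant and linear term
    H = X @ np.linalg.solve(X.T @ W @ X, X.T @ W)   # weighted projection
    G = S - H                                       # purely nonparametric part
    return S, H, G

x = np.linspace(60.0, 160.0, 40)   # equally spaced exposure grid (assumed)
w = np.full(40, 0.2)               # placeholder scoring weights
S, H, G = decompose_smoother(x, w, lam=5.0)

df_total = np.trace(S)             # degrees of freedom of the smoother
df_parametric = np.trace(H)        # equals 2 here: intercept and slope
df_nonparametric = np.trace(G)     # what remains for g(x)
```

For this smoother, the products `H @ G` and `G @ H` vanish up to rounding error, which is the sense in which the two parts are orthogonal, and the traces add up: `df_total` is the sum of `df_parametric` and `df_nonparametric`.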
 |
APPENDIX 2
|
---|
Assuming the nonparametric logistic model (equation 1 in the text), a natural nonparametric estimate, $\widehat{OR}(x, x_{ref})$, of the odds ratio (OR) curve can be constructed as $\widehat{OR}(x, x_{ref}) = \exp(\hat{f}(x) - \hat{f}(x_{ref}))$ by replacing the function f(x) with any smoother estimate, $\hat{f}(x)$.
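As a small illustration of this construction, the odds ratio curve is the exponential of the difference between the fitted smooth at each exposure value and at the reference value. The fitted values below are made-up placeholders, not results from the study, and the function name `odds_ratio_curve` is hypothetical.

```python
import numpy as np

def odds_ratio_curve(f_hat, ref_index):
    """OR(x, x_ref) = exp(f_hat(x) - f_hat(x_ref)) on a grid of exposure values."""
    f_hat = np.asarray(f_hat, dtype=float)
    return np.exp(f_hat - f_hat[ref_index])

glycemia = np.array([70.0, 95.0, 130.0, 160.0])  # mg/dl, illustrative grid
f_hat = np.array([0.40, -0.10, 0.30, 0.90])      # hypothetical fitted values
or_hat = odds_ratio_curve(f_hat, ref_index=1)    # reference value: 95 mg/dl
```

By construction the curve equals 1 at the reference value, and adding a constant (such as the model intercept) to `f_hat` leaves it unchanged.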
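The variance of $\ln \widehat{OR}(x, x_{ref})$ can be expressed in terms of the covariance matrix of the smoother $\hat{f}(x)$:
$$\operatorname{Var}(\ln \widehat{OR}(x, x_{ref})) = \operatorname{Var}(\hat{f}(x)) + \operatorname{Var}(\hat{f}(x_{ref})) - 2\operatorname{Cov}(\hat{f}(x), \hat{f}(x_{ref})).$$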
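Given that the smoother matrix, S, can be orthogonally decomposed as a sum of two matrices, H and G (see appendix 1), to produce the solution $\hat{f}(x) = \hat{\beta}_1 x + \hat{g}(x)$, it follows that $\hat{\beta}_1 x$ and $\hat{g}(x)$ are independent. Therefore, the asymptotic variance of $\ln \widehat{OR}(x, x_{ref})$ can be rewritten in terms of the corresponding asymptotic variances of those two components:
$$\operatorname{Var}(\ln \widehat{OR}(x, x_{ref})) = (x - x_{ref})^2 \operatorname{Var}(\hat{\beta}_1) + \operatorname{Var}(\hat{g}(x)) + \operatorname{Var}(\hat{g}(x_{ref})) - 2\operatorname{Cov}(\hat{g}(x), \hat{g}(x_{ref})),$$
where the terms in $\hat{g}$ are entries of $\operatorname{Cov}(\hat{g}) = [\operatorname{Cov}(\hat{g}(x_i), \hat{g}(x_j))]$, the asymptotic covariance matrix of the purely nonparametric smoother function, $\hat{g}(x)$, and $\hat{\phi}$ is the estimated dispersion parameter of the model, which scales these variances.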
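Under the independence of the two components stated above, the variance of the log odds ratio is the sum of a parametric term, $(x - x_{ref})^2 \operatorname{Var}(\hat{\beta}_1)$, and the variance of the contrast $\hat{g}(x) - \hat{g}(x_{ref})$ read off the covariance matrix of $\hat{g}$. The sketch below assumes that both ingredients have already been estimated; the function name `var_log_or` and its argument names are hypothetical.

```python
import numpy as np

def var_log_or(x, ref_index, var_beta1, cov_g):
    """Pointwise variance of ln OR(x, x_ref) on the grid x.

    var_beta1 : variance of the linear coefficient (assumed given)
    cov_g     : n x n covariance matrix of the nonparametric part (assumed given)
    """
    x = np.asarray(x, dtype=float)
    r = ref_index
    parametric = (x - x[r]) ** 2 * var_beta1
    # Var(g(x)) + Var(g(x_ref)) - 2 Cov(g(x), g(x_ref)) for every grid point
    nonparametric = np.diag(cov_g) + cov_g[r, r] - 2.0 * cov_g[:, r]
    return parametric + nonparametric
```

At the reference value both terms are zero, so the variance there is 0 and any interval built from it collapses to the point OR = 1.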
It can be shown that the asymptotic covariance matrix $\operatorname{Cov}(\hat{g})$ takes the form $\hat{\phi}\, G W^{-1} G^{t}$, which can itself be well approximated by $\hat{\phi}\, G W^{-1}$ (27). In these expressions, W is the diagonal matrix of the final iteratively reweighted least squares weights, $w_i$, and G is the smoothing operator matrix. Note that G can be obtained directly, although the calculations involved are computationally very lengthy. Specifically, the smoothing operator is an n × n matrix that can be written as $G = (G_1, \ldots, G_n)$, where $G_j$ (j = 1, ..., n) represents the jth column of G. Each of these columns, $G_j$, is obtained through a weighted regression by using the backfitting approach. In each of these regressions, the response is taken to be the jth column of the n × n identity matrix and the exposure of interest (X) is the covariate. The weights $w_i$ are used for the regression weighting.
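The column-by-column construction of G can be sketched as follows. This is an illustration under assumptions, not the authors' code: the nonparametric fit is obtained by a direct linear solve rather than by backfitting, the smoother is a second-difference penalty smoother on an equally spaced grid rather than a cubic smoothing spline, and the dispersion parameter `phi` defaults to 1. All function names are hypothetical.

```python
import numpy as np

def nonparametric_fit(response, x, weights, lam):
    """Weighted smooth of `response` minus its weighted linear fit."""
    n = len(x)
    W = np.diag(weights)
    D = np.diff(np.eye(n), n=2, axis=0)
    smooth = np.linalg.solve(W + lam * D.T @ D, W @ response)
    X = np.column_stack([np.ones(n), x])
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ response)
    return smooth - X @ beta

def smoothing_operator(x, weights, lam):
    """Build G by smoothing each column of the n x n identity matrix."""
    n = len(x)
    identity = np.eye(n)
    columns = [nonparametric_fit(identity[:, j], x, weights, lam)
               for j in range(n)]
    return np.column_stack(columns)

def cov_g(G, weights, phi=1.0, approximate=True):
    """phi * G W^{-1} (approximation) or phi * G W^{-1} G^t (full form)."""
    W_inv = np.diag(1.0 / np.asarray(weights, dtype=float))
    if approximate:
        return phi * G @ W_inv
    return phi * G @ W_inv @ G.T
```

Because the fit is linear in the response, applying the resulting matrix to any response vector reproduces `nonparametric_fit` for that vector. The n separate fits are what make the direct construction computationally lengthy for large samples.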
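Given that the two components, $\hat{\beta}_1 x$ and $\hat{g}(x)$, of the smoother $\hat{f}(x)$ have been shown to follow asymptotically a normal distribution (15, p. 157), we also have, asymptotically,
$$\ln \widehat{OR}(x, x_{ref}) \sim N\left(\ln OR(x, x_{ref}),\ \operatorname{Var}(\ln \widehat{OR}(x, x_{ref}))\right).$$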
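Finally, these results allow us to construct the following $100 \times (1 - \alpha)$ percent pointwise confidence intervals around the $OR(x, x_{ref})$ curve:
$$\exp\left(\ln \widehat{OR}(x, x_{ref}) \pm z_{1-\alpha/2}\, SE(\ln \widehat{OR}(x, x_{ref}))\right),$$
where SE is the standard error, $SE(\ln \widehat{OR}(x, x_{ref})) = \sqrt{\operatorname{Var}(\ln \widehat{OR}(x, x_{ref}))}$, and $z_{1-\alpha/2}$ is the $1 - \alpha/2$ quantile of the standard normal distribution.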
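A pointwise band of this kind can be computed directly from the log odds ratio and its standard error. The sketch below assumes a normal-quantile interval on the log scale that is then exponentiated; the function name `or_confidence_band` and the input values are hypothetical placeholders, not study results.

```python
import numpy as np
from statistics import NormalDist

def or_confidence_band(log_or, se, alpha=0.05):
    """Pointwise 100 x (1 - alpha) percent limits for the OR curve."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # standard normal quantile
    log_or = np.asarray(log_or, dtype=float)
    se = np.asarray(se, dtype=float)
    lower = np.exp(log_or - z * se)
    upper = np.exp(log_or + z * se)
    return lower, upper

log_or = np.array([0.5, 0.0, 0.4])   # hypothetical ln OR values
se = np.array([0.2, 0.0, 0.1])       # hypothetical standard errors
lower, upper = or_confidence_band(log_or, se)
```

The second entry plays the role of the reference value: its standard error is zero, so both limits equal 1 there. As noted in the discussion, these limits are pointwise and do not support global statements about the shape of the curve.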
 |
ACKNOWLEDGMENTS
|
---|
This research was supported in part by a Fondo de Investigaciones Sanitarias grant (97/1089) from the Spanish Ministry of Health and by a Dirección General de Enseñanza Superior e Investigación Científica grant (pb98-0182-c02-02) from the Spanish Ministry of Education.
The authors thank Drs. Sander Greenland, Trevor Hastie, and Marc Saez for helpful comments that improved the paper. They also thank Drs. Miguel Caínzos and Aquilino Fernández for making the data available.
 |
NOTES
|
---|
Correspondence to Dr. Carmen Cadarso-Suárez, Unidad de Bioestadística, Facultad de Medicina, University of Santiago de Compostela, 15705 Santiago de Compostela, Spain (e-mail: eicadar@usc.es).
 |
REFERENCES
|
---|
1. Greenland S. Avoiding power loss associated with categorization and ordinal scores in dose-response and trend analysis. Epidemiology 1995;6:450–4.
2. Weinberg CR. How bad is categorization? Epidemiology 1995;6:345–7.
3. Greenland S. Dose-response and trend analysis in epidemiology: alternatives to categorical analysis. Epidemiology 1995;6:356–65.
4. Wartenberg D, Northridge M. Defining exposure in case-control studies: a new approach. Am J Epidemiol 1991;133:1058–71.
5. Schulgen G, Lausen B, Olsen JH, et al. Outcome-oriented cutpoints in analysis of quantitative exposures. Am J Epidemiol 1994;140:172–84.
6. Royston P, Altman DG. Regression using fractional polynomials of continuous covariates: parsimonious parametric modeling (with discussion). Appl Stat 1994;43:425–67.
7. Wegman EJ, Wright IW. Splines in statistics. J Am Stat Assoc 1983;78:351–65.
8. de Boor C. A practical guide to splines. New York, NY: Springer-Verlag, 1978.
9. Rothman KJ, Greenland S, eds. Modern epidemiology. 2nd ed. Philadelphia, PA: Lippincott-Raven, 1998.
10. Hastie T, Tibshirani R. Comment on: Ramsay JO. Monotone regression splines in action. Stat Sci 1988;3:425–61.
11. Schwartz J. Air pollution and hospital admissions for cardiovascular disease in Tucson. Epidemiology 1997;8:371–7.
12. Abrahamowicz M, du Berger R, Grover SA. Flexible modeling of the effects of serum cholesterol on coronary heart disease mortality. Am J Epidemiol 1997;145:714–29.
13. Zhao LP, Kristal AR, White E. Estimating relative risk functions in case-control studies using a nonparametric logistic regression. Am J Epidemiol 1996;144:598–609.
14. Hastie TJ, Tibshirani RJ. Generalized additive models: some applications. J Am Stat Assoc 1987;82:371–86.
15. Hastie TJ, Tibshirani RJ. Generalized additive models. London, United Kingdom: Chapman and Hall Ltd, 1990.
16. Green PJ, Silverman BW. Non-parametric regression and generalized additive models: a roughness penalty approach. New York, NY: Chapman and Hall, 1994.
17. Wahba G. Spline functions for observational data. (CBMS-NSF regional conference series). Philadelphia, PA: Society for Industrial and Applied Mathematics, 1990.
18. Terranova A. The effects of diabetes mellitus on wound healing. Plast Surg Nurs 1991;11:20–5.
19. Zerr KJ, Furnary AP, Grunkemeier GL, et al. Glucose control lowers the risk of wound infection in diabetics after open heart operations. Ann Thorac Surg 1997;63:356–61.
20. Lilienfeld DE, Vlahov D, Tenney JH, et al. Obesity and diabetes as risk factors for postoperative wound infections after cardiac surgery. Am J Infect Control 1988;16:3–6.
21. Pomposelli JJ, Baxter JK, Babineau TJ, et al. Early postoperative glucose control predicts nosocomial infection rate in diabetic patients. J Parenter Enteral Nutr 1998;22:77–81.
22. Golden SH, Peart-Vigilance C, Kao WH, et al. Perioperative glycemic control and the risk of infectious complications in a cohort of adults with diabetes. Diabetes Care 1999;22:1408–14.
23. Fernandez Perez JA. Estudio de las infecciones postoperatorias utilizando el programa, "CIRUGIA." (PhD thesis). Santiago de Compostela, Spain: University of Santiago de Compostela, 2000.
24. Buja A, Hastie T, Tibshirani R. Linear smoothers and additive models (with discussion). Ann Stat 1989;17:453–555.
25. Akaike H. Statistical predictor identification. Ann Inst Stat Math 1970;22:203–17.
26. Hurvich CM, Simonoff JS, Tsai CL. Smoothing parameter selection in non-parametric regression using an improved Akaike information criterion. J R Stat Soc B 1998;60:271–93.
27. Gu C. Penalized likelihood regression: a Bayesian analysis. Stat Sinica 1992;2:255–64.
28. Mizock BA. Alterations in carbohydrate metabolism during stress: a review of the literature. Am J Med 1995;98:75–84.
29. Waeber G, Calandra T, Bonny C, et al. A role for the endocrine and pro-inflammatory mediator MIF in the control of insulin secretion during stress. Diabetes Metab Res Rev 1999;15:47–54.
30. Bernhagen J, Calandra T, Mitchell RA, et al. MIF is a pituitary-derived cytokine that potentiates lethal endotoxaemia. Nature 1993;365:756–9.
31. Calandra T, Bernhagen J, Metz CN, et al. MIF as a glucocorticoid-induced modulator of cytokine production. Nature 1995;377:68–71.
32. Nouel O, Bernuau J, Rueff B, et al. Hypoglycemia: a common complication of septicemia in cirrhosis. Arch Intern Med 1981;141:1477–8.
33. Scheetz A. Hypoglycemia and sepsis in two elderly diabetics. (Letter). J Am Geriatr Soc 1990;38:492.
34. Guyuron B, Raszewski R. Undetected diabetes and the plastic surgeon. Plast Reconstr Surg 1990;86:471–4.
35. Miller SI, Wallace RJ Jr, Musher DM, et al. Hypoglycemia as a manifestation of sepsis. Am J Med 1980;68:649–54.
36. Romijn JA, Endert E, Sauerwein HP. Glucose and fat metabolism during short-term starvation in cirrhosis. Gastroenterology 1991;100:731–7.
37. McMurray DN, Bartow RA, Mintzer CL, et al. Micronutrient status and immune function in tuberculosis. Ann N Y Acad Sci 1990;587:59–69.
38. Lang CH, Dobrescu C. Sepsis-induced increases in glucose uptake by macrophage-rich tissues persist during hypoglycemia. Metabolism 1991;40:585–93.
39. Lang CH, Dobrescu C. Gram-negative infection increases noninsulin-mediated glucose disposal. Endocrinology 1991;128:645–53.
40. Mizock BA. Alterations in carbohydrate metabolism during stress: a review of the literature. Am J Med 1995;98:75–84.
41. Shilo S, Berezovsky S, Friedlander Y, et al. Hypoglycemia in hospitalized nondiabetic older patients. J Am Geriatr Soc 1998;46:978–82.
42. Simonoff JS. Smoothing methods in statistics. New York, NY: Springer-Verlag New York, 1996.
43. Abrahamowicz M, MacKenzie T, Esdaile JM. Time-dependent hazard ratio: modeling and hypothesis testing with application in lupus nephritis. J Am Stat Assoc 1996;91:1432–9.
44. MathSoft, Inc, Data Analysis Products Division. S-PLUS 4.5. Seattle, WA: MathSoft, Inc, 1997.
45. Chambers JM, Hastie TJ. Statistical models in S. London, United Kingdom: Chapman and Hall Ltd, 1993.
46. Almudevar T, Tibshirani R. GAIM user's guide. Toronto, Ontario, Canada: S. N. Tibshirani Enterprises, Inc, 1991.
47. Stata Corporation. Stata statistical software, release 6.0. College Station, TX: Stata Corporation, 1999.
48. Mathur AK, Lu Y. Re: "Estimating relative risk functions in case-control studies using a non-parametric logistic regression." (Letter). Am J Epidemiol 1997;146:882–3.
49. Greenland S. Re: "Estimating relative risk functions in case-control studies using a non-parametric logistic regression." (Letter). Am J Epidemiol 1997;146:883–4.
50. Greenland S, Michels KB, Robins JM, et al. Presenting statistical uncertainty in trends and dose-response relations. Am J Epidemiol 1999;149:1077–86.
51. Easton DF, Peto J, Babiker AG. Floating absolute risk: an alternative to relative risk in survival and case-control analysis avoiding an arbitrary reference group. Stat Med 1991;10:1025–35.
Received for publication July 29, 1999.
Accepted for publication .