Flexible Meta-Regression Functions for Modeling Aggregate Dose-Response Data, with an Application to Alcohol and Mortality

Vincenzo Bagnardi, Antonella Zambon, Piero Quatto and Giovanni Corrao 

From the Department of Statistics, University of Milan–Bicocca, Milan, Italy.

Received for publication April 3, 2002; accepted for publication December 16, 2003.


    ABSTRACT
 TOP
 ABSTRACT
 INTRODUCTION
 APPLICATION
 MODEL FITTING
 FRACTIONAL POLYNOMIAL MODELS
 SPLINE REGRESSION MODELS
 COMPARING MODELS
 DISCUSSION
 APPENDIX
 REFERENCES
 
In this paper, the authors describe fractional polynomials and cubic splines with which to represent smooth dose-response relations in summarizing meta-analytical aggregate data. Use of these two curve-fitting families can help prevent the problems arising from inappropriate linearity assumptions. These methods are illustrated in the problem of estimating the shape of the dose-response curve between alcohol consumption and all-cause mortality risk. The authors considered aggregate data from 29 cohort studies investigating this issue (1966–2000). J-shaped curves with a nadir at approximately 5–7 g/day of alcohol consumption and a last protective dose of 47–60 g/day were consistently obtained from fractional polynomials and cubic splines. The authors conclude that both of the curve-fitting families are useful tools with which to explore dose-response epidemiologic questions by means of meta-analytical approaches, especially when important nonlinearity is anticipated.

alcohol drinking; dose-response; meta-analysis; mortality; polynomial regression; regression analysis; spline smoothing

Abbreviations: Abbreviations: AIC, Akaike’s Information Criterion; CI, confidence interval; RR, relative risk.


    INTRODUCTION
 TOP
 ABSTRACT
 INTRODUCTION
 APPLICATION
 MODEL FITTING
 FRACTIONAL POLYNOMIAL MODELS
 SPLINE REGRESSION MODELS
 COMPARING MODELS
 DISCUSSION
 APPENDIX
 REFERENCES
 
Meta-analysis is the statistical analysis of a collection of analytical results for the purpose of integrating the findings (1).

Categorical methods are most commonly used to analyze the relation between an exposure and disease risk (2). In a single report, results are almost always presented in a 2 x J table, with a condition (present or absent) given on one margin of the table, J levels of exposure on the other, and a baseline category which serves as the common reference group for J – 1 dose-specific estimates of the relative risk (RR) of disease. When access to the original data is not allowed, the only way to combine the results of several reports is to use aggregate data from the above-described tables.

One method of summarizing dose-response relations across studies is to estimate b, the change in the natural logarithm of the relative risk per unit of exposure within each study, and to combine these estimates across studies (3). Such an approach can be misleading, however, because it assumes that the dose-response relation follows a specific model form, usually linear (4).

Attempts to represent nonlinearity are usually made by means of polynomial models, typically quadratic models (5, 6). However, low-order polynomials offer a limited family of shapes, and high-order polynomials may fit poorly at the extreme values of the exposure variable (7). A further disadvantage is that polynomials do not have asymptotes and cannot fit data for which a threshold value is expected.

Nonparametric regression methods are useful alternatives for avoiding strict assumptions about the form of the dose-response relation (8, 9). A main hindrance of this approach is the lack of widely available software. Moreover, nonparametric regression is particularly useful when nothing may be assumed about the form of the exposure-disease relation. On the contrary, for most epidemiologic purposes, knowledge or strong prior evidence exists about the functional form of the relation of interest. This is particularly true in meta-analysis, for which the shape of the relation is supplied by individual studies. Finally, because of the data-driven nature of nonparametric regression, the nonparametric curve estimation methods only provide accurate estimates of the effects of particular exposure levels, rather than investigate the whole structure of the relation of interest.

Two alternative curve-fitting methods, fractional polynomial regression and spline regression, have recently been described by Greenland (10) and Royston (11). Fractional polynomials are a family of models considering as covariates power transformations of a continuous exposure variable restricted to a small predefined set of integer and noninteger exponents (12). The family includes conventional polynomials as a particular case and therefore comes close to simple regression, though it supplies important improvements (13). Splines are a family of smooth functions that can take on virtually any shape, and, as a consequence, they come close to nonparametric regression (14).

Such approaches are underused in epidemiologic research, and to our knowledge they have never been compared in meta-analysis of dose-response aggregate data. Therefore, we attempted to explore the use of these flexible tools to model aggregate dose-response data. Below we use a meta-analysis of alcohol consumption and all-cause mortality to illustrate the methods discussed in this paper.


    APPLICATION
 TOP
 ABSTRACT
 INTRODUCTION
 APPLICATION
 MODEL FITTING
 FRACTIONAL POLYNOMIAL MODELS
 SPLINE REGRESSION MODELS
 COMPARING MODELS
 DISCUSSION
 APPENDIX
 REFERENCES
 
Epidemiologic studies have consistently shown that moderate consumers of alcohol have a reduced risk of all-cause mortality in comparison with nondrinkers and heavy drinkers (15, 16) because of a protective effect of alcohol consumption against coronary heart disease (1719), despite its harmful effects on risk of other diseases (20). Knowledge of the functional relation between alcohol intake and mortality is important in formulating public health messages (21), but only a few limited quantitative overviews of the evidence have been performed (2224).

Data collection
A meta-analysis of all of the published cohort studies investigating this issue was carried out. Articles included were found through a search of the literature published from 1966 to 2000 in MEDLINE and other bibliographic databases (Current Contents, EMBASE, and CAB Abstracts), supplemented by attention to all references listed in the selected articles and by a hand search of the most relevant journals of epidemiology and medicine. Each identified study was reviewed, and the study was included in the analysis if the corresponding findings were reported as relative risks (and numbers of cases and person-years) corresponding to two or more levels of alcohol consumption with respect to the nondrinking category. Studies that were excluded from analysis were excluded because of baseline exposure contamination (e.g., use of the lowest percentile category as the reference group) or consideration of alcoholism as an exposure category. A data set was constructed as a series of records including information about each dose-response reported by the included studies. The main fields in the data set were: 1) the value x of exposures (expressed as g/day) assigned as the midpoints of the ranges of the reported categories of alcohol intake (as suggested by Berlin et al. (5), the x values were calculated as 1.2 times the lower bound of the open-ended upper categories); 2) frequency counts, adjusted estimates of log RR, and corresponding 95 percent confidence intervals for each exposure level (when confidence intervals were not reported, we evaluated them using the standard error Woolf estimator for large samples (2)); and 3) covariates describing the characteristics of the study. Major details about the methods used in assessing the studies’ characteristics are given elsewhere (19, 20).

Describing aggregate data
Twenty-nine cohort studies (16, 2552) including a total of 171,869 deaths were considered in the analysis (table 1). The scatterplot shown in figure 1 was obtained by pooling the results reported from all of the included studies. Although the data are quite scattered, a rather rapid decrease in log RR for low doses and an approximately linear increase thereafter is observed. However, the data are not homogeneously distributed in number and precision along the scale of alcohol intake. Rather, the reported estimates mainly pertain to low doses for which more precise estimates are available.


View this table:
[in this window]
[in a new window]
 
TABLE 1. Summary of the features of 29 cohort studies of the effect of alcohol intake on all-cause mortality included in a meta-analysis, 1966–2000
 


View larger version (14K):
[in this window]
[in a new window]
 
FIGURE 1. Natural logarithm of the relative risk of all-cause mortality according to alcohol intake in a meta-analysis of 29 cohort studies (16, 2552) 1966–2000. The area of each circle is proportional to the precision of the relative risk (inverse of its variance).

 

    MODEL FITTING
 TOP
 ABSTRACT
 INTRODUCTION
 APPLICATION
 MODEL FITTING
 FRACTIONAL POLYNOMIAL MODELS
 SPLINE REGRESSION MODELS
 COMPARING MODELS
 DISCUSSION
 APPENDIX
 REFERENCES
 
In modeling a function across the 184 points shown in figure 1, the following problems are encountered.

First, the conventional procedure estimating the slope within each study and then combining these estimates across studies (the "postpool" method) is not suitable when more than one parameter needs to be estimated to explore nonlinear relations. In fact, within each study, J – 2 parameters can be estimated in an unsaturated model, implying that only linear relations can be explored for studies reporting three exposure levels. An alternative involves pooling of the study data before trend analysis (the "prepool" method). This method is algebraically equivalent to the postpool procedure when linear relations are investigated, but it has the advantage of allowing fitting and testing of nonlinear trends (5, 53).

Second, because the relative risks within each study were calculated using a common reference group, the independence assumption is systematically violated. Greenland and Longnecker (53) suggested correcting the reported relative risks for the covariance matrix asymptotically estimated from the correlation between all pairs of log RRs, thus obtaining estimators of coefficients and their variance, respectively more efficient and more consistent, than the corresponding uncorrected estimators. In brief, the asymptotic covariance matrix of log RRs of the kth study was estimated as C = W-1RW-1, where W-1 is a diagonal matrix with the standard errors of the adjusted log RRs on the diagonal and R is the correlation matrix of the crude log RRs estimated using the delta method applied to the cross-classification of exposure and outcome. (See Appendix 3 in Greenland and Longnecker (53) for major details.)

Third, the relative risk within each study was estimated with differing precision, and consequently each observation should be considered with a weight equal to the inverse of its variance (being Var(log RR) = [(log log RR)/3.92]2, where and RR and the upper and lower limits of the 95 percent confidence interval of the relative risk estimate (3)).

Hence, inverse-variance-weighted and correlation-corrected prepool methods were consistently used in fitting the flexible meta-regression models described in the following two sections. This method assumes that the only source of variability of the effect size is the within-study sampling variation (fixed-effects model). Nevertheless, meta-analyses often have to deal with issues relating to comparability of studies. This implies that differences in effect size among studies, in addition to those within studies, should be considered as a further source of random variability when important heterogeneity of the effect size is observed. Therefore, according to DerSimonian and Laird (1), an additional component of the variance explaining the between-study heterogeneity was considered in weighting each observation (random-effects model). The deviance (D) of both fixed- and random-effects models was computed, and the likelihood ratio test for homogeneity (1 df) was constructed as the difference between the random-effects model deviance and the fixed-effects model deviance (21).

The 95 percent confidence intervals of the model-predicted log RRs were always obtained by the large-sample approximation , where is the vector of estimated parameters, x is the vector of covariates, and V is the covariance matrix of (10).

All of the analyses were carried out using a Statistical Analysis System macro (54, 55) developed for the current application. It is briefly described in the Appendix.


    FRACTIONAL POLYNOMIAL MODELS
 TOP
 ABSTRACT
 INTRODUCTION
 APPLICATION
 MODEL FITTING
 FRACTIONAL POLYNOMIAL MODELS
 SPLINE REGRESSION MODELS
 COMPARING MODELS
 DISCUSSION
 APPENDIX
 REFERENCES
 
Fractional polynomials of order M are defined as a family of model functions of a single positive covariate x. The model

describes log RR as a function of power transformations of the exposure variable x, where pm is an element of the powers vector p with m = 1, ..., M and ßm is the corresponding coefficient (11). (No intercept term was considered for this; likewise for all of the other models described in this paper, since for x = 0 the function would start from log RR = 0.)

It is worth considering the family of second-order fractional polynomials specifically, because only monotonic curves may be modeled by the first-order models, and because fractional polynomials with an order higher than 2 are rarely required in practice (11).

Although it is rather simple, the family of second-order fractional polynomial models offers considerably flexibility. In particular, by choosing p1 and p2 from a predefined set P = {–2, –1, –0.5, 0, 0.5, 1, 2, 3}, a very rich set of possible functions, including some so-called U-shaped and J-shaped relations, may be accommodated. The powers are expressed according to the Box-Tidwell transformation (12), in which denotes if pi != 0 and log x if pi = 0. When p1 = p2 = p, the model becomes log(RR½x) = ß1xp + ß2(xp log x).

The best fit among the family of models thus generated is defined as that with the highest likelihood or, equivalently, that with the lowest deviance. In the setting of fractional polynomials of order M (let us suppose, with M = 2), it is convenient to use the deviance associated with a reference model (e.g., with the conventional quadratic polynomial) as the baseline for reporting the deviances of other models. Thus, the gain (G) for a given model is defined as the deviance associated with the reference model minus that for the model in question. Since G moves in the direction opposite that of D, a larger gain indicates a better fit (56).

When several fractional polynomials of the same order are to be compared, Royston and Altman (12) suggest preferring a given model with respect to the reference model when (the 90th percentile of {chi}2 with M df, that is, 4.61 for second-order models).

The conventional quadratic model (p1 = 1, p2 = 2) better fits data than the linear model (p1 = 1), with a gain in deviance of 54.32, according to the expected J-shaped dose-response relation between alcohol and mortality.

Table 2 shows the gain in fitting each second-order fractional polynomial model with respect to the quadratic polynomial. The best-fitting model (p1 = p2 = 0.5) offers a gain in deviance of 138.36 with respect to the reference model.


View this table:
[in this window]
[in a new window]
 
TABLE 2. Second-order fractional polynomials for modeling the relation between alcohol intake and relative risk of all-cause mortality reported in 29 cohort studies included in a meta-analysis, 1966–2000*,{dagger}
 

    SPLINE REGRESSION MODELS
 TOP
 ABSTRACT
 INTRODUCTION
 APPLICATION
 MODEL FITTING
 FRACTIONAL POLYNOMIAL MODELS
 SPLINE REGRESSION MODELS
 COMPARING MODELS
 DISCUSSION
 APPENDIX
 REFERENCES
 
A spline function is a smoothly joined piecewise polynomial of order q (14). In particular, cubic spline models, in which third-order polynomials (q = 3) are fitted within each category, the extremes of which are defined by two knots, have the most frequent use because they provide great flexibility for fitting data. By dividing the observed range of exposure x into C categories indexed as c = 1, ..., C with C – 1 knots indexed as k1, ..., kC–1, a third-order polynomial

log(RR½x) = ß01x + ß02x2 + ß03x3

is fitted within each category. Smooth joints are obtained by assuming that the function is 2 times’ continuously differentiable on the range of x. Simultaneously fitting the C category-specific cubic models (equation 2) and taking into account continuity for the function and its derivatives is equivalent to fitting the single cubic spline model

where (xki)+ = max (0, (xki)) is the positive part. Hence, to fit the model shown in equation 3, C + 2 parameters would be estimated.

Depending on the exposure distribution, it might sometimes happen that one finds odd behavior on the part of the fitted curve in the last exposure category, such as sudden and unexpected changes in trend. When this happens, further constraints can be easily introduced by restricting the fitted curve to be linear in the upper category (14).

The cubic spline with the upper category restricted to be linear is

obtained applying to equation 3 the linearity condition for x >= kC–1 expressed by the constraints

and

It is worth considering that in restricting a spline of order q with C – 1 knots, C – 1 + q – (q 1) = C parameters would be estimated in the fitting of equation 4, independently of the order of the polynomial.

A crucial problem in spline regression is the choice of the location and number of knots. With the aim of avoiding arbitrary choices, knots have been firstly located as boundaries between equal-sized categories (e.g., one knot at the 50th percentile, two knots at the 33rd and 66th percentiles, and so on). However, a problem with this fixed percentile procedure is that it allows checking of only a few possible knots among many vastly putatively better knot positions. Following the recommendation of Royston (56), the alternative approach used in the current application is to assign knots at random positions and evaluate the likelihood of each resulting model. This procedure, called the random percentile procedure, was repeated 100 times, generating a family of 400 models (4 x 100) randomly selected among those with one, two, three, and four knots, with the model with the best goodness of fit being regarded as the best among those entertained. The model minimizing Akaike’s Information Criterion (AIC), a penalized likelihood which takes into account the number of parameters estimated in the model (56), was used as a general rule in the model choice, because the AIC has been shown to be appropriate when the model complexity is potentially unlimited (11). In the random percentile procedure, the position of the knot was regarded as an additional parameter of the model to be taken into account in the computation of the AIC (56).

Table 3 shows the AIC values according to increasing numbers of knots of cubic splines from 1 to 4. Restricted cubic splines and, among these, models constructed by considering the random percentile procedure for the choice of knot locations had lower AIC values. Clearly, there is nothing to be gained in this example by increasing the number of knots, since this always increases the AIC. Thus, the model with one knot optimized at the 25th percentile remains the most satisfactory choice.


View this table:
[in this window]
[in a new window]
 
TABLE 3. Akaike’s Information Criterion values for several unrestricted and restricted cubic spline models for alcohol-mortality data reported in 29 cohort studies included in a meta-analysis, 1966–2000*,{dagger}
 
In influence analysis, which involves seeing how much results change when the most influential data points are deleted from the analysis, these basic results of both fractional polynomial and cubic spline models were unchanged (data not shown).


    COMPARING MODELS
 TOP
 ABSTRACT
 INTRODUCTION
 APPLICATION
 MODEL FITTING
 FRACTIONAL POLYNOMIAL MODELS
 SPLINE REGRESSION MODELS
 COMPARING MODELS
 DISCUSSION
 APPENDIX
 REFERENCES
 
Fixed-effect models used in the previous sections to explore the best-fitting nonlinear functions left very large deviances, thus suggesting important heterogeneity among studies. Therefore, we repeated meta-regression analyses using random-effects models. Second-order fractional polynomials with p1 = p2 = 0.5 and restricted cubic splines with one knot randomly selected were again consistently selected. The deviances of fixed- and random-effects models fell from 1,141.72 to 683.78 (conventional quadratic), from 1,003.35 to 679.16 (fractional polynomials), and from 1,013.10 to 679.98 (cubic splines), offering evidence that: 1) flexible models better describe the shape of the dose-response curve than quadratic models; 2) almost equal deviances were left from the best-fitting fractional polynomial and cubic spline functions; and 3) between-study heterogeneity significantly influenced our estimates in all of the compared models (likelihood ratio test values (1 df) for testing the null hypothesis of homogeneity were 457.94, 324.19, and 341.12 for quadratic, fractional polynomial, and cubic spline models, respectively; p < 0.0001).

The functions obtained by fitting conventional quadratic and best-fitting fractional polynomial and cubic spline random-effects models are compared in figure 2. Very similar trends were obtained by means of the best-fitting fractional polynomial and cubic spline models, whereas the quadratic curve was very different. The nadir (the level of alcohol consumption at which all-cause risk is lowest) and the last protective dose were 20 and 52 g/day, 5 and 44 g/day, and 6 and 46 g/day according to the quadratic, fractional polynomial, and cubic spline models, respectively.



View larger version (13K):
[in this window]
[in a new window]
 
FIGURE 2. Relation between the natural logarithm of the relative risk of all-cause mortality and alcohol intake reported in 29 cohort studies included in a meta-analysis (16, 2552) 1966–2000. The relation is modeled by the best-fitting restricted cubic spline (with one knot at 25 g/day of alcohol intake), the best-fitting second-order fractional polynomial (with two root squared terms of alcohol intake), and a conventional quadratic polynomial.

 
In the following analysis, we explored the effect of gender in modifying the relative risk function as an attempt to explain part of the between-study heterogeneity. With this purpose, we considered eight studies reporting gender-specific estimates (16, 25, 29, 33, 37, 40, 44, 47). We again consistently selected restricted cubic splines with one knot randomly selected and second-order fractional polynomials with p1 = p2 = 0.5. We estimated the effect of gender by including in the corresponding models the interaction terms between gender and the selected exposure variables. Significant portions of heterogeneity were explained by gender, since residual deviances moved from 371.45 to 293.59 (fractional polynomials) and from 373.51 to 291.96 (cubic splines) when we fitted models without and with the inclusion of a gender effect, respectively. (The corresponding likelihood ratio test statistics have 1 df in both cases, since only one of the two terms showed a significant effect.) However, important between-study heterogeneity was again present, since we obtained significantly larger deviances by fitting fixed-effect models. (Likelihood ratio test values (1 df) for testing the null hypothesis of homogeneity were 124.92 and 126.25 for fractional polynomials and cubic splines, respectively; p < 0.0001.)

Figure 3 plots the dose-response fractional polynomial and cubic spline functions in men and women. In men, the nadir is reached at 6 g/day and 7 g/day according to fractional polynomials and cubic splines, respectively (RR = 0.82 (95 percent confidence interval (CI): 0.74, 0.90) and RR = 0.83 (95 percent CI: 0.74, 0.93), respectively), and the protective action is observed up to a level of 60 g/day for both models. However, the protective effect was statistically significant only up to a daily intake of 28 g (RR = 0.90 (95 percent CI: 0.82, 0.99) and RR = 0.90 (95 percent CI: 0.81, 0.99), respectively). In women, the nadir is reached at 5 g/day and 6 g/day according to fractional polynomials and cubic splines, respectively (RR = 0.76 (95 percent CI: 0.68, 0.85) and RR = 0.77 (95 percent CI: 0.68, 0.88), respectively), and the protective action is observed up to levels of 47 g/day and 51 g/day. The protective effect was statistically significant only up to a daily intake of 24 g (RR = 0.89 (95 percent CI: 0.80, 0.99) and RR = 0.87 (95 percent CI: 0.77, 0.97), respectively).



View larger version (21K):
[in this window]
[in a new window]
 
FIGURE 3. Relation between the natural logarithm of the relative risk of all-cause mortality and alcohol intake among men and women in the eight studies included in the meta-analysis that reported gender-specific estimates (16, 25, 29, 33, 37, 40, 44, 47). The relation is modeled by the best second-order fractional polynomial and the best restricted cubic spline.

 

    DISCUSSION
 TOP
 ABSTRACT
 INTRODUCTION
 APPLICATION
 MODEL FITTING
 FRACTIONAL POLYNOMIAL MODELS
 SPLINE REGRESSION MODELS
 COMPARING MODELS
 DISCUSSION
 APPENDIX
 REFERENCES
 
As the number of studies on a given dose-response association increases, meta-analysis of such studies becomes an increasingly useful tool for summarizing and evaluating the existing literature (53). The advantage of meta-analysis is that it reduces the effects of noise and of accidental confounding, thus providing more reliable and precise estimates of the effect size than single studies (57).

A challenge in epidemiologic studies, as well as in meta-analysis of such studies, is to determine the form of the relation between a given exposure and a disease. If the dose-response trend is not linear, there may be important public heath implications. For example, if there is a threshold for the effect of alcohol intake on the risk of death, measures addressed to the general population to reduce exposure below the threshold level would have more public health implications than control measures focused primarily on high consumers. Hence, there is considerable interest in understanding the shape of the dose-response relation.

In this paper, we have discussed flexible curve-regressing methods for obtaining models whose functional form is consistent with background knowledge.

An advantage of parametric approaches is that they allow researchers to obtain easily communicable results. The main advantage of parametric regression models using fractional polynomials rather than traditional ones is that models containing as few as two power transformations encompass a large range of shapes (13), allowing the arrangement of almost all known dose-response relations. In the current application, a dramatic improvement in the goodness of fit of the best-fitting second-order fractional polynomials with respect to the quadratic polynomial, which is the most popular tool for encompassing J-shaped relations, has been observed.

However, both traditional and fractional polynomials make use of all of the available data to build the global structure of the relation. Of course, this global structure may or may not be correct. Thus, perturbations of the data at an extreme end of the exposure range necessarily affect the model at the opposite extreme. This application, however, shows that fractional polynomials seem less affected by this than ordinary polynomials (see figure 2). In fact, the fast decrease in relative risk that is visually evident in the scatterplot of figure 1, with the gradual increase in relative risk values thereafter, seems better encompassed by the selected fractional polynomial model than the quadratic polynomial. Accordingly, the dose of alcohol showing the maximum protective effect, such as the one that discriminates between protective and harmful effects, showed much lower values for the best-fitting fractional polynomial than did the traditional quadratic polynomial. However, the effect of perturbations at extreme points of the exposure variable must still be there in some form.

As a consequence, it should be considered good practice to compare the results of the best-fitting fractional polynomial with those of spline analysis, which is characterized by local adapting. In this application, similar nadirs and last protective doses were found using the two approaches.

The meta-analysis described in this paper included most of the published information on alcohol and mortality currently available in the literature. The results suggest that the level of alcohol consumption at which all-cause risk is lowest is approximately 5 g/day and that alcohol exerts a protective action up to a daily intake of approximately 45 g, without any relevant differences between fractional polynomials and cubic splines.

However, because of the general weakness of meta-analysis of observational studies, our findings should be interpreted with caution. Use of different strategies in characterizing exposure categories (including the approximation involved in using the midpoint of each exposure category and the accommodation of the last open-ended category) will necessarily drive investigators toward different conclusions, the extent of which has not been investigated here. Perplexities mainly concern issues relating to comparability between studies. For example, it is questionable to pool results from studies that differ in terms of the measurement of alcohol consumption, the demographic characteristics of investigated subjects, the main patterns of alcohol consumption, the methods of obtaining the reported relative risks, the degrees of adjustment of the reported relative risks, and other unmeasured factors. Among these sources of variability, we investigated only the effect of gender, finding that: 1) the level of alcohol consumption at which all-cause risk is lowest was approximately 6 g/day in both men and women; 2) the extent of the greatest reduction in death risk was approximately 18 percent in men and 24 percent in women; and 3) alcohol seems to exert a protective action up to daily intakes of 60 g in men and 50 g in women. These findings are consistent with a recently published pool of systematic reviews (58). Other sources of heterogeneity, however, need to be investigated to accurately investigate the dose-response relation between alcohol and mortality. Moreover, other issues, such as publication bias, were not addressed in our study.

Therefore, we think that flexible modeling of dose-response aggregate data allows exploration of functional relations but does not overcome the general weakness of meta-analysis of observational studies. More generally, meta-analysis should be considered a useful alternative to but not a substitute for large, well-designed single investigations.

In summary, our results show that summarized curves generated from conventional polynomials may be misleading. Fractional polynomials and cubic splines both provide great flexibility for meta-analysis of dose-response aggregate data and are especially valuable when important nonlinearity is anticipated. The simple algebraic forms of such models and the fact that they are linear in their coefficients makes it easy to implement them in practice with the use of standard statistical software. Fractional polynomials are easier to communicate mathematically, require the estimation of fewer parameters, and are less influenced by arbitrariness in the choice of the model. On the other hand, cubic splines more adequately encompass sudden and important changes in relative risks. Hence, parallel analysis with both approaches seems the best strategy for obtaining unbiased and plausible dose-response curves.


    ACKNOWLEDGMENTS
 
The authors thank Dr. Martyn Plummer of the International Agency for Research on Cancer (Lyon, France) for his valuable suggestions.


    APPENDIX
 TOP
 ABSTRACT
 INTRODUCTION
 APPLICATION
 MODEL FITTING
 FRACTIONAL POLYNOMIAL MODELS
 SPLINE REGRESSION MODELS
 COMPARING MODELS
 DISCUSSION
 APPENDIX
 REFERENCES
 
Some details about the Statistical Analysis System (SAS)/Interactive Matrix Language (IML) (54) macro "flexible," used to produce the key analyses included in the paper, are given below.

The macro requires a SAS input data set specified by the user containing the following fields: id, the unique identifier of the study; exp, the assigned numerical value x of exposure; and n1 and n2, the numbers of cases and noncases (person-times or persons at risk) for each exposure level. In addition, adjusted estimates of relative risk (adjrr) for the nonzero exposure level and the lower (lower) and upper (upper) limits of the corresponding 95 percent confidence intervals can be reported. A set of covariates describing the study’s characteristics can also be added.

To call up the macro, the user needs to be in IML. The "flexible" macro is included in the program via the "%include ‘filename " statement, where filename is the name of the "flexible" macro file. Next the following macro variables are required: dataset, which indicates the name of the user’s input data set; ncov and covlist, representing, respectively, the number and names of the covariates describing the study’s characteristics; and type, which specifies whether the analysis has to be based on fractional polynomials (type = 1) or on cubic splines (type = 2). If analyses based on fractional polynomials (FP) are chosen, the "_FP_" module is invoked, and the following macro variables must be specified: power, defining the set of r and s power transformations of the exposure variable, and ref, defining the reference model in the gain computation. If analyses based on cubic splines (CS) are chosen, the "_CS_" module is invoked, and the following macro variables must be specified: restr, specifying whether the cubic smoothing spline has to be tail-restricted, and nrnd, defining the number of replications needed for the random percentile knot allocation method. Finally, the macro variable random defines whether random-effects models are to be fitted within the "_FP_" or "_CS_" modules.

The following instructions allow us to perform fractional polynomial analysis considering data from all of the 29 studies included in this analysis, which are stored in a data set called COHORT.

proc iml;

%include ‘flex_mac’;

dataset = COHORT; ncov = 0; covlist = {" "}; type = 1; power = {–2, –1, –0.5, 0, 0.5, 1, 2, 3}; ref = {1, 2}; random = 0;

%flexible(dataset, ncov, covlist, type, power, ref, random);

quit;

The resulting output is a table displaying the gain in the deviance for each second-order model with respect to the second-order reference model (see table 2). Parameter estimates and the corresponding standard errors of the reference and of the best-fitting model are also given.

In addition, an output data set called _PRED_ reports, for each model, the predicted value of the log relative risk (and 95 percent confidence interval) for each integer value of the exposure variable ranging from zero to the maximum value reported in the original data. Detailed graphs of the dose-response relation, like those presented in figure 2, could be obtained using any graphic software simply by plotting the data stored in _PRED_.

Further details on the use of the macro can be obtained from the first author (V. B.).


    NOTES
 
Correspondence to Professor Giovanni Corrao, Dipartimento di Statistica, Università degli Studi di Milano-Bicocca, Via Bicocca degli Arcimboldi 8, 20126 Milano, Italy (e-mail: giovanni.corrao{at}unimib.it). Back


    REFERENCES
 TOP
 ABSTRACT
 INTRODUCTION
 APPLICATION
 MODEL FITTING
 FRACTIONAL POLYNOMIAL MODELS
 SPLINE REGRESSION MODELS
 COMPARING MODELS
 DISCUSSION
 APPENDIX
 REFERENCES
 

  1. DerSimonian R, Laird N. Meta-analysis in clinical trials. Control Clin Trials 1986;7:177–88.[CrossRef][ISI][Medline]
  2. Rothman KJ, Greenland S. Modern epidemiology. 2nd ed. Philadelphia, PA: Lippincott-Raven Publishers, 1998.
  3. Greenland S. Quantitative methods in the review of epidemiologic literature. Epidemiol Rev 1987;9:1–30.[ISI][Medline]
  4. Maclure M, Greenland S. Tests for trend and dose response: misinterpretations and alternatives. Am J Epidemiol 1992;135:96–104.[Abstract]
  5. Berlin JA, Longnecker MP, Greenland S. Meta-analysis of epidemiologic dose-response data. Epidemiology 1993;4:218–28.[ISI][Medline]
  6. Friedenreich CM. Methods for pooled analyses of epidemiologic studies. Epidemiology 1993;4:295–302.[ISI][Medline]
  7. McCullagh P, Nelder JA. Generalized linear models. London, United Kingdom: Chapman and Hall, 1989.
  8. Hastie T, Tibshirani R. Generalized additive models. London, United Kingdom: Chapman and Hall, 1990.
  9. Green PJ, Silverman BW. Nonparametric regression and generalized linear models: a roughness penalty approach. London, United Kingdom: Chapman and Hall, 1994.
  10. Greenland S. Dose-response and trend analysis in epidemiology: alternative to categorical analysis. Epidemiology 1995;6:356–65.[ISI][Medline]
  11. Royston P. A strategy for modelling the effect of a continuous covariate in medicine and epidemiology. Stat Med 2000;19:1831–47.[CrossRef][ISI][Medline]
  12. Royston P, Altman DG. Regression using fractional polynomials of continuous covariates: parsimonious parametric modelling. Appl Stat 1994;43:429–67.[ISI]
  13. Royston P, Ambler G, Sauerbrei W. The use of fractional polynomials to model continuous risk variables in epidemiology. Int J Epidemiol 1999;28:964–74.[Abstract]
  14. Durrleman S, Simon R. Flexible regression models with cubic splines. Stat Med 1989;8:551–61.[ISI][Medline]
  15. Doll R, Peto R, Hall E, et al. Mortality in relation to consumption of alcohol: 13 years’ observations on male British doctors. BMJ 1994;309:911–18.[Abstract/Free Full Text]
  16. Thun MJ, Peto R, Lopez AD, et al. Alcohol consumption and mortality among middle-aged and elderly U.S. adults. N Engl J Med 1997;337:1705–14.[Abstract/Free Full Text]
  17. Stampfer MJ, Colditz GA, Willett WC, et al. A prospective study of moderate alcohol consumption and the risk of coronary disease and stroke in women. N Engl J Med 1988;319:267–73.[Abstract]
  18. Rimm EB, Giovannucci EL, Willett WC, et al. Prospective study of alcohol consumption and risk of coronary disease in men. Lancet 1991;338:464–8.[ISI][Medline]
  19. Corrao G, Rubbiati L, Bagnardi V, et al. Alcohol and coronary heart disease: a meta-analysis. Addiction 2000;95:1505–23.[CrossRef][ISI][Medline]
  20. Corrao G, Bagnardi V, Zambon A, et al. Exploring the dose-response relationship between alcohol consumption and the risk of several alcohol-related conditions: a meta-analysis. Addiction 1999;94:1551–73.[CrossRef][ISI][Medline]
  21. van Houwelingen HC, Arends LR, Stijnen T. Advanced methods in meta-analysis: multivariate approach and meta-regression. Stat Med 2002;21:589–624.[CrossRef][ISI][Medline]
  22. Norstrom T. Prevention strategies and alcohol policy. Addiction 1995;90:515–24.[CrossRef][ISI][Medline]
  23. White IR. The level of alcohol consumption at which all-cause mortality is least. J Clin Epidemiol 1999;52:967–75.[CrossRef][ISI][Medline]
  24. Leino EV, Romelsjo A, Shoemaker C, et al. Alcohol consumption and mortality. II. Studies of male populations. Addiction 1998;93:205–18.[CrossRef][ISI][Medline]
  25. Liao Y, McGee DL, Cao G, et al. Alcohol intake and mortality: findings from the National Health Interview Surveys (1988 and 1990). Am J Epidemiol 2000;151:651–9.[Abstract]
  26. Gaziano JM, Gaziano TA, Glynn RJ, et al. Light-to-moderate alcohol consumption and mortality in the Physicians’ Health Study enrollment cohort. J Am Coll Cardiol 2000;35:96–105.[CrossRef][ISI][Medline]
  27. Hart CL, Smith GD, Hole DJ, et al. Alcohol consumption and mortality from all causes, coronary heart disease, and stroke: results from a prospective cohort study of Scottish men with 21 years of follow up. BMJ 1999;318:1725–9.[Abstract/Free Full Text]
  28. Brenner H, Arndt V, Rothenbacher D, et al. The association between alcohol consumption and all-cause mortality in a cohort of male employees in the German construction industry. Int J Epidemiol 1997;26:85–91.[Abstract]
  29. Keil U, Chambless LE, Doring A, et al. The relation of alcohol intake to coronary heart disease and all-cause mortality in a beer-drinking population. Epidemiology 1997;8:150–6.[ISI][Medline]
  30. Wannamethee SG, Shaper AG. Lifelong teetotallers, ex-drinkers and drinkers: mortality and the incidence of major coronary heart disease events in middle-aged British men. Int J Epidemiol 1997;26:523–31.[Abstract]
  31. Yuan JM, Ross RK, Gao YT, et al. Follow up study of moderate alcohol intake and mortality among middle aged men in Shanghai, China. BMJ 1997;314:18–23.[Abstract/Free Full Text]
  32. Fuchs CS, Stampfer MJ, Colditz GA, et al. Alcohol consumption and mortality among women. N Engl J Med 1995;332:1245–50.[Abstract/Free Full Text]
  33. Rehm J, Sempos CT. Alcohol consumption and all-cause mortality. Addiction 1995;90:471–80.[CrossRef][ISI][Medline]
  34. Serdula MK, Koong SL, Williamson DF, et al. Alcohol intake and subsequent mortality: findings from the NHANES I Follow-up Study. J Stud Alcohol 1995;56:233–9.[ISI][Medline]
  35. Goldberg RJ, Burchfiel CM, Reed DM, et al. A prospective study of the health effects of alcohol consumption in middle-aged and elderly men: The Honolulu Heart Program. Circulation 1994;89:651–9.[Abstract]
  36. Paunio M, Heinonen OP, Virtamo J, et al. HDL cholesterol and mortality in Finnish men with special reference to alcohol intake. Circulation 1994;90:2909–18.[Abstract]
  37. Cullen KJ, Knuiman MW, Ward NJ. Alcohol and mortality in Busselton, Western Australia. Am J Epidemiol 1993;137:242–8.[Abstract]
  38. Hein HO, Sorensen H, Suadicani P, et al. Alcohol consumption, Lewis phenotypes, and risk of ischaemic heart disease. Lancet 1993;341:392–6.[ISI][Medline]
  39. de Labry LO, Glynn RJ, Levenson MR, et al. Alcohol consumption and mortality in an American male population: recovering the U-shaped curve—findings from the Normative Aging Study. J Stud Alcohol 1992;53:25–32.[ISI][Medline]
  40. Klatsky AL, Armstrong MA, Friedman GD. Alcohol and mortality. Ann Intern Med 1992;117:646–54.[ISI][Medline]
  41. Andreasson S, Romelsjo A, Allebeck P. Alcohol, social factors and mortality among young men. Br J Addict 1991;86:877–87.[ISI][Medline]
  42. Boffetta P, Garfinkel L. Alcohol drinking and mortality among men enrolled in an American Cancer Society prospective study. Epidemiology 1990;1:342–8.[Medline]
  43. Garfinkel L, Boffetta P, Stellman SD. Alcohol and breast cancer: a cohort study. Prev Med 1988;17:686–93.[ISI][Medline]
  44. Camacho TC, Kaplan GA, Cohen RD. Alcohol consumption and mortality in Alameda County. J Chronic Dis 1987;40:229–36.[ISI][Medline]
  45. Gordon T, Doyle JT. Drinking and mortality: The Albany Study. Am J Epidemiol 1987;125:263–70.[Abstract]
  46. Suhonen O, Aromaa A, Reunanen A, et al. Alcohol consumption and sudden coronary death in middle-aged Finnish men. Acta Med Scand 1987;221:335–41.[ISI][Medline]
  47. Friedman LA, Kimball AW. Coronary heart disease mortality and alcohol consumption in Framingham. Am J Epidemiol 1986;124:481–9.[Abstract]
  48. Kono S, Ikeda M, Tokudome S, et al. Alcohol and mortality: a cohort study of male Japanese physicians. Int J Epidemiol 1986;15:527–32.[Abstract]
  49. Colditz GA, Branch LG, Lipnick RJ, et al. Moderate alcohol and decreased cardiovascular mortality in an elderly cohort. Am Heart J 1985;109:886–9.[ISI][Medline]
  50. Klatsky AL, Friedman GD, Siegelaub AB. Alcohol and mortality: a ten-year Kaiser-Permanente experience. Ann Intern Med 1981;95:139–45.[ISI][Medline]
  51. Marmot MG, Rose G, Shipley MJ, et al. Alcohol and mortality: a U-shaped curve. Lancet 1981;1:580–3.[Medline]
  52. Dyer AR, Stamler J, Paul O, et al. Alcohol consumption and 17-year mortality in the Chicago Western Electric Company Study. Prev Med 1980;9:78–90.[CrossRef][ISI][Medline]
  53. Greenland S, Longnecker MP. Methods for trend estimation from summarized dose-response data, with applications to meta-analysis. Am J Epidemiol 1992;135:1301–9.[Abstract]
  54. SAS Institute, Inc. SAS/IML software: usage and reference. Version 6. Cary, NC: SAS Institute, Inc, 1985.
  55. Littell RC, Milliken GA, Stroup WW, et al. SAS system for mixed models. Cary, NC: SAS Institute, Inc, 1996.
  56. Royston P. Flexible parametric alternative to the Cox model, and more. Stata J 2001;1:1–28.
  57. Lindsey JK, Jones B. Choosing among generalized linear models applied to medical data. Stat Med 1998;15:59–68.
  58. White IR, Altmann DR, Nanchahal K. Alcohol consumption and mortality: modelling risks for men and women at different ages. BMJ 2002;325:191.[Abstract/Free Full Text]




This Article
Abstract
FREE Full Text (PDF)
Alert me when this article is cited
Alert me if a correction is posted
Services
Email this article to a friend
Similar articles in this journal
Similar articles in ISI Web of Science
Similar articles in PubMed
Alert me to new issues of the journal
Add to My Personal Archive
Download to citation manager
Search for citing articles in:
ISI Web of Science (4)
Disclaimer
Request Permissions
Google Scholar
Articles by Bagnardi, V.
Articles by Corrao, G.
PubMed
PubMed Citation
Articles by Bagnardi, V.
Articles by Corrao, G.