1 National Institute of Occupational Health Sciences, Cincinnati, OH 45226-1998.
2 Department of Mathematical Sciences, University of Cincinnati, Cincinnati, OH 45215.
In a recent issue of the Journal, McNutt et al. (1) provided an excellent discussion of the drawbacks of the method proposed by Zhang and Yu (2). They correctly concluded that the log-binomial model is the appropriate method for estimating the relative risk in studies with common outcomes, which was also the conclusion of Skov et al. (3).
McNutt et al. (1) did not give any examples illustrating estimation of relative risks for continuous covariates. They discussed possible convergence problems with the log-binomial model but did not give any definite solution. When estimating the relative risk using the log-binomial model, there is a restriction on the parameter space, because no fitted probability may exceed 1. Hence, the maximum likelihood solution might be on the boundary of the parameter space. This frequently happens for continuous covariates if the outcome is very prevalent and the relation is strong (4). Requiring additional iterations will not solve this problem. Standard software will be unable to find the correct estimate and standard error, because it uses some form of Newton's method to find where the likelihood function has a maximum. If the solution is on the boundary of the parameter space, the derivative will not be equal to 0, so Newton's method will not work. Deddens et al. (4) considered this problem, gave an illustrative example, and proposed a solution, called the COPY method. It involves combining (c − 1) copies of the original data set with one copy of the original data set in which all values of the dependent variable, Y = 0, 1, are interchanged. For the expanded data set, the solution should no longer be on the boundary of the parameter space, and standard software will find the solution, which will be close to the solution for the original data. The standard error is then adjusted for the increase in sample size by multiplying by the square root of c. Simulations show that the COPY method works well when c = 1,000.
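The data-expansion step of the COPY method can be sketched in a few lines of Python. This is only an illustration of the construction described above, not the authors' own implementation: the function names `copy_expand` and `adjust_se` and the toy records are hypothetical, and fitting the log-binomial model to the expanded data is left to whatever standard software is in use.

```python
import math

def copy_expand(records, c=1000):
    """Build the COPY-method data set from (x, y) records with y in {0, 1}.

    Returns (c - 1) copies of the original records followed by one copy
    in which every outcome y is interchanged (0 becomes 1, 1 becomes 0).
    """
    if c < 2:
        raise ValueError("c must be at least 2")
    expanded = list(records) * (c - 1)
    expanded += [(x, 1 - y) for x, y in records]
    return expanded

def adjust_se(se_expanded, c=1000):
    """Rescale a standard error from the expanded fit by sqrt(c)."""
    return se_expanded * math.sqrt(c)

# Hypothetical toy data, not the data from table 1.
toy = [(1, 0), (2, 0), (3, 1), (4, 1)]
expanded = copy_expand(toy, c=1000)
print(len(expanded))  # c times the original number of records
```

The expanded data set has c times as many records as the original, which is why the standard error reported by the software is multiplied by the square root of c.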
Use of the Poisson regression model (1) will result in exactly the same estimates and standard errors as use of the Cox regression model as recommended by Lee (5). The latter method has been used by many researchers lately. Both of these models seem unusual for estimation of the prevalence ratio. The Cox model was designed for survival analysis studies, and the Poisson regression model was designed for analyzing Poisson random variables, not binomial random variables, although the Poisson distribution would be expected to be a good approximation to the binomial distribution for large samples.
It is commonly believed that the log-binomial model has the disadvantage that it could result in a probability estimate greater than 1. However, by definition, the maximum likelihood estimates for the log-binomial model cannot result in estimated probabilities greater than 1 for covariate values within the original data set. This should be contrasted with the use of the Cox model (5), which does not estimate probabilities because it does not have an intercept in the model. The Poisson model, which has exactly the same estimated slope parameters as the Cox model plus an intercept, can be used to estimate probabilities. This is done using the equation P(Y = 1|X) = µ = exp(β0 + β1 × X). Since the mean value of a Poisson random variable can be greater than 1, the estimated probabilities may be greater than 1 even for covariate values in the original data set. This happens in the example presented by Deddens et al. (4), for which the maximum likelihood solution is on the boundary of the parameter space. The data are shown in table 1, and the estimates obtained are shown in table 2. For X = 10, the estimated probability using the true log-binomial maximum likelihood estimate is 1.00; the estimated probability using the log-binomial with the COPY method when c = 1,000 is 0.99; and the estimated probability using the Poisson regression model is 1.44.
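A short sketch may help make the contrast concrete. The coefficients and covariate values below are hypothetical and chosen only for illustration; they are not the estimates from table 2. The code evaluates exp(β0 + β1 × X) over the covariate values and checks the log-binomial restriction that the linear predictor must not exceed 0 anywhere in the data.

```python
import math

def poisson_fitted_mean(b0, b1, x):
    """Fitted mean exp(b0 + b1 * x), read as an estimated probability."""
    return math.exp(b0 + b1 * x)

def satisfies_log_binomial_constraint(b0, b1, xs):
    """True if b0 + b1 * x <= 0 for every x, so all fitted values are <= 1."""
    return all(b0 + b1 * x <= 0 for x in xs)

# Hypothetical covariate values and coefficients (assumptions for illustration).
xs = range(1, 11)
b0, b1 = -1.0, 0.15

fitted = [poisson_fitted_mean(b0, b1, x) for x in xs]
print(max(fitted) > 1)                                # True: a fitted "probability" exceeds 1
print(satisfies_log_binomial_constraint(b0, b1, xs))  # False: outside the log-binomial parameter space
```

An unconstrained Poisson fit is free to return coefficients like these, whereas a log-binomial maximum likelihood estimate must lie in the region where the check returns True.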
REFERENCES