Department of Genetics, University of Cambridge, Cambridge, England
Likelihood ratio tests (LRTs) for comparing models of sequence evolution have become popular over the last few years (Goldman 1993
; Yang, Goldman and Friday 1994, 1995
; Huelsenbeck and Crandall 1997
; Huelsenbeck and Rannala 1997
). In their simplest form, such tests compare a simpler null hypothesis (H0) with a more complex alternative hypotheses (H1) which is a generalization of H0. H0 can be derived from H1 by fixing one or more of its free parameters at particular values, and the hypotheses are described as nested. Although it is also possible to test non-nested models (Goldman 1993
), nested models are often preferred, as statistical tests are simpler to perform and their results can be easier to interpret.
The test statistic for an LRT can be written as 2 = 2 ln(H1/H0) = 2(ln(H1) - ln(H0)), where H0 and H1 are the maximum-likelihood (ML) scores under hypotheses H0 and H1, respectively. This statistic measures how much improvement H1 gives over H0, and when the hypotheses are nested, 2
will always be nonnegative. For these nested hypotheses, and under certain regularity conditions, the asymptotic distribution of 2
(i.e., for large amounts of data) will be
2k. Here, k is the number of degrees of freedom by which H0 and H1 differ, that is, the number of free parameters of H1 whose values must be fixed to derive H0 (Wald 1949
; Silvey 1975
; Felsenstein 1981;
Goldman 1993
; Yang, Goldman, and Friday 1994, 1995
). (In effect, each free parameter contributes a
21 variate to the distribution of 2
, with the sum of k independent
21 variates being distributed as
2k.) Statistical tests assessed using such
2 distributions have now become a widespread and useful tool in phylogenetics (Huelsenbeck and Crandall 1997
; Huelsenbeck and Rannala 1997
).
Recently, there has been renewed interest in testing whether the predicted 2 distribution gives a reliable estimate of the true distribution of 2
under realistic conditions (e.g., with finite sequence lengths). Whelan and Goldman (1999)
investigated cases in which the competing hypotheses were different models of nucleotide substitution. Under three specimen experimental designs (representing realistic phylogenies and nucleotide substitution processes), we found that the
2 distribution was acceptable for performing tests of the significance of parameters describing the relative rate of transition and transversion substitutions (typically denoted
; one free parameter) and describing equilibrium base frequencies (typically denoted
= (
A,
C,
G,
T); three free parameters, because of the constraint that
X
X = 1) when these parameters are estimated by ML (Whelan and Goldman 1999
). Combinations of these parameters give most of the commonly used models of nucleotide substitution. The parameter
is present in the models K2P of Kimura (1980)
and HKY of Hasegawa, Kishino, and Yano (1985)
, which reduce to the models JC of Jukes and Cantor (1969)
and FEL of Felsenstein (1981)
, respectively, when
is fixed at 1. The parameters
are present in FEL and HKY, which reduce to JC and K2P, respectively, when
is fixed at (
,
,
,
).
In contrast, for tests of the significance of the incorporation of the shape parameter of a gamma distribution used to describe among-sites variation in the rate of nucleotide substitution (Yang 1993, 1994, 1996;
typically denoted
; one free parameter), Whelan and Goldman (1999) consistently found an unacceptable fit between the predicted
2 distributions and estimates of the true distributions of 2
. We provisionally attributed this effect to the fact that reducing a hypothesis H1 incorporating a gamma distribution (denoted, e.g., JC+
) to a hypothesis H0 without a gamma distribution (in this example, JC) requires
to be fixed at
, which is on the boundary of the set [0,
) of values permitted under H1. This contravenes the conditions required for the asymptotic distribution of 2
to be
2 (see, e.g., Self and Liang 1987
). Ota and colleagues (Ota, Waddell, and Kishino 1999
; Ota et al. 2000
) have also investigated effects caused by parameters lying on their permitted boundaries. They illustrated that ML estimates of these parameters cannot be asymptotically normally distributed and how, as a consequence, the ML estimates must be biased and the distribution of 2
(2lnLR in their notation) cannot be
2. Andrews (2000) showed that parameters which lie on their permitted boundaries can also adversely affect results of nonparametric bootstraps.
We refer to a parameter which under H0 is fixed at a value on the boundary of the set of values permitted under H1 as a boundary parameter. Self and Liang (1987
, p. 608, case 5) showed that (in the absence of any other boundary parameters not being tested) the contribution to 2
of a single boundary parameter is (asymptotically) a distribution which is a 50:50 mixture of
20 and
21 distributions. The
20 distribution takes the value 0 with probability 1; consequently, this mixture distribution takes the value 0 with probability ½, and takes a value drawn from a
21 distribution with probability ½. If the statistical test being performed also tests for the significance of k - 1 other parameters which are not on their boundaries, these contribute
2k-1 to the distribution of 2
, and the complete distribution of 2
is a 50:50 mixture of
2k-1 and
2k (Self and Liang 1987
, p. 609, case 9). We denote this distribution by 2k. These are the situations of relevance to the phylogenetic tests described here.
The 2k distribution has a mean of k - ½; significance levels of observed statistics X may be calculated according to P(2k > X) = (P(2k-1 > X) + P(
2k > X))/2, where the terms to the right of the equality can be found using standard tables or computer programs (e.g., the chi2 program of the PAML package; Yang 1997
). Other features of 2 distributions are more difficult to derive and were calculated using locally written programs. If two or more parameters lie on their boundaries, the situation is more complicated (Self and Liang 1987
; Ota et al. 2000
), and if these parameters are not independent, the distribution of 2
may not be expressible as any sum of
2 distributions (e.g., Self and Liang 1987
, pp. 608609, case 8). Models incorporating more than one boundary parameter are beginning to appear in phylogenetic analyses (e.g., Huelsenbeck and Nielsen 1999
).
Using simulated data, Whelan and Goldman (1999)
performed a number of model comparisons to investigate the true distribution of 2
when the hypotheses being compared differed by various combinations of the parameters
,
, and
. We now report the results of further simulations designed to consider specifically whether the 2 distributions described by Self and Liang (1987)
and Ota et al. (2000)
are acceptable under realistic conditions for statistical comparisons of nested models where H1 incorporates a gamma distribution for rate heterogeneity and H0 does not. We again use the models JC, K2P, FEL, and HKY (using ML estimators of
in FEL and HKY, denoted FELMLE and HKYMLE in Whelan and Goldman 1999
) and their counterparts, which, in addition, include the gamma distribution (JC+
, K2P+
, FEL+
, and HKY+
). Nine model comparisons (labeled AJ in table 1
) represent all possible combinations of parameters which may then be present in H0 and H1, subject to the condition that
always be included in H1 but not H0. Thus, each comparison is always testing for the significance of including
, either singly (comparisons AD) or in combination with other parameters (EJ), and either with (BD, HJ) or without (A, EG) other (untested) parameters present. Model trees used for data simulation are those described in Whelan and Goldman (1999)
, labeled according to the original data set whose analysis gave the phylogeny and parameters used for data simulation: cytochrome b,
-globin, and mtDNA (erroneously labeled "D-loop" by Whelan and Goldman 1999
, although, in fact, it comprises two protein-coding regions and three tRNAs).
|
Ota et al. (2000)
comment on practical difficulties that can arise in iterative ML optimizations. These are due to likelihood surfaces which are almost flat, and may be particularly pronounced in cases where optimal parameter values are on or near the boundaries permitted. Possible effects are premature termination of ML iterations either before a parameter estimate reaches a boundary at which it should lie or before it escapes from a boundary at which it should not lie. We encountered some such problems in a few of the analyses reported in table 1
. Typically, difficulty in achieving an accurate ML score is greater for more complex models (e.g., H1 rather than H0) and for analyses with parameters estimated near permitted boundaries. These effects lead to slightly depressed estimates of 2
when it is near 0. We find that these effects account for our example in which the fit between our estimate of the distribution of 2
and the predicted 2 distribution is rejected as inadequate (results not shown), although even in this case the estimated mean and 95% point of the distribution of 2
are close to those of the predicted 2 distribution (table 1 , comparison D for cytochrome b).
We conclude that the theoretical (asymptotic) prediction that the parameter contributes a 21 distribution (50:50 mixture of
20 and
21) to the distribution of 2
is accurate in these examples, even for the finite-sized data sets considered, and that 2k distributions (50:50 mixtures of
2k-1 and
2k) are appropriate for performing tests involving the boundary parameter
and k - 1 additional parameters whose fixed values under H0 do not lie on the boundary permitted under H1.
Combining the results described by Ota and colleagues (Ota, Waddell, and Kishino 1999
; Ota et al. 2000
), those described by Whelan and Goldman (1999)
, and those described above, it seems certain that the use of a
21 distribution for testing the shape parameter
of the gamma distribution or other boundary parameters is inappropriate and potentially misleading. As explained by Ota et al. (2000)
, the past use of a
21 distribution instead of the correct 21 distribution (or the use of a
2k distribution instead of a 2k distribution in tests involving k - 1 other parameters) may have led to overly conservative conclusions (i.e., too great a tendency to reject H1). Although confidence in past results regarding rate heterogeneity will often be maintained (typically rejecting H0 when only H1 uses a gamma distribution to allow for such rate heterogeneity; Yang 1996
), this need not always be the case. As more complex and realistic models are compared, including some which use gamma distributions to model factors other than rate heterogeneity, observed values of 2
may be a lot smaller. For example, the failure to use appropriate 2 distributions leads to two false negative results in Huelsenbeck and Nielsen (1999
, table 2
, data set COI, second codon position, test of HKY85 vs. HKY85+
, and NADH6, third codon position, HKY85+
vs. HKY85+
r+
although we note that these comparisons are in no way crucial to those authors' findings).
|
![]() |
Acknowledgements |
---|
![]() ![]() ![]() |
---|
![]() |
Footnotes |
---|
1 Keywords: likelihood ratio tests
gamma distribution
maximum likelihood
model comparison
molecular evolution
phylogenetics
2 Address for correspondence and reprints: Nick Goldman, University Museum of Zoology, Department of Zoology, University of Cambridge, Downing Street, Cambridge CB2 3EJ, United Kingdom. E-mail: n.goldman{at}zoo.cam.ac.uk
![]() |
literature cited |
---|
![]() ![]() ![]() |
---|
Andrews, D. W. K. 2000. Inconsistency of the bootstrap when a parameter is on the boundary of the parameter space. Econometrica 68:399406.
Felsenstein, J. 1981. Evolutionary trees from DNA sequences: a maximum likelihood approach. J. Mol. Evol. 17:368376.[ISI][Medline]
Goldman, N. 1993. Statistical tests of models of DNA substitution. J. Mol. Evol. 36:182198.[ISI][Medline]
Hasegawa, M., H. Kishino, and T. Yano. 1985. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol. 22:160174.[ISI][Medline]
Huelsenbeck, J. P., and K. A. Crandall. 1997. Phylogeny estimation and hypothesis testing using maximum likelihood. Annu. Rev. Ecol. Syst. 28:437466.[ISI]
Huelsenbeck, J. P., and R. Nielsen. 1999. Variation in the pattern of nucleotide substitution across sites. J. Mol. Evol. 48:8693.[ISI][Medline]
Huelsenbeck, J. P., and B. Rannala. 1997. Phylogenetic methods come of age: testing hypotheses in an evolutionary context. Science 276:227232.
Jukes, T. H., and C. R. Cantor. 1969. Evolution of protein molecules. Pp. 21132 in H. N. Munro, ed. Mammalian protein metabolism. Vol. 3. Academic Press, New York.
Kimura, M. 1980. A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol. 16:111120.[ISI][Medline]
Ota, R., P. J. Waddell, M. Hasegawa, H. Shimodaira, and H. Kishino. 2000. Appropriate likelihood ratio tests and marginal distributions for evolutionary tree models with constraints on parameters. Mol. Evol. Biol. 17:798803.
Ota, R., P. Waddell, and H. Kishino. 1999. Statistical distribution for testing the resolved tree against star tree. Pp. 1520 in Proceeding of the annual joint conference of the Japanese Biometrics and Applied Statistics Societies. Sinfonica, Minato-ku, Tokyo.
Self, S. G., and K.-L. Liang. 1987. Asymptotic properties of maximum likelihood estimators and likelihood ratio tests under nonstandard conditions. J. Am. Stat. Assoc. 82:605610.[ISI]
Silvey, S. D. 1975. Statistical inference. Chapman and Hall, London.
Wald, A. 1949. Note on the consistency of the maximum likelihood estimate. Ann. Math. Stat. 20:595601.[ISI]
Whelan, S., and N. Goldman. 1999. Distributions of statistics used for the comparison of models of sequence evolution in phylogenetics. Mol. Biol. Evol. 16:12921299.
Yang, Z. 1993. Maximum-likelihood estimation of phylogeny from DNA sequences when substitution rates differ over sites. Mol. Biol. Evol. 10:13961401.[Abstract]
. 1994. Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods. J. Mol. Evol. 39:306314.[ISI][Medline]
. 1996. Among-site rate variation and its impact on phylogenetic analysis. TREE 11:367372.
. 1997. PAML: a program package for phylogenetic analysis by maximum likelihood. CABIOS 13:555556.
Yang, Z., N. Goldman, and A. Friday. 1994. Comparison of models for nucleotide substitution used in maximum-likelihood phylogenetic estimation. Mol. Biol. Evol. 11:316324.[Abstract]
. 1995. Maximum likelihood trees from DNA sequences: a peculiar statistical estimation problem. Syst. Biol. 44:384399.[ISI]