* Section of Ecology, Behavior and Evolution, Division of Biological Sciences, University of California, San Diego
Department of Statistics, University of Wisconsin
Department of Botany, University of Wisconsin
Correspondence: E-mail: johnh{at}biomail.ucsd.edu.
![]() |
Abstract |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
Key Words: Bayesian phylogenetic inference Markov chain Monte Carlo maximum likelihood reversible jump Markov chain Monte Carlo substitution models
![]() |
Introduction |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
One of the challenges facing the biologist is to choose a model of DNA substitution that best describes the data in hand. Generally speaking, the aim is to pick a model that adequately explains the data (in this case an alignment of DNA sequences) without introducing superfluous parameters; one wants to strike a balance between bias avoidance (achieved by using models with many parameters) and reduced variance in the parameter estimates (achieved by using models with few parameters). Fortunately, there are several methods available that allow the biologist to choose among substitution models in a statistical framework. When two nested models are compared, one can use likelihood ratio tests (Goldman 1993). Models are nested when one model is a special case of a more general model. For example, the model proposed by Jukes and Cantor (1969) is a special case of the model proposed by Kimura (1980), which introduces an additional parameter describing the transition/transversion rate bias. When the transition/transversion bias is set to one, then Kimura's model is equivalent to the Jukes-Cantor model. In such nested model comparisons, minus twice the log likelihood ratio is asymptotically distributed as a 2 distribution, where the degrees of freedom of the
2 is the difference in the number of free parameters between the general and restricted models. Phylogeneticists commonly employ a series of nested likelihood ratio tests (LRTs) to select a substitution model, either by calculating likelihood ratios by hand for a pool of candidate models or by using Modeltest (Posada and Crandall 1998), which automates this procedure. This is a reasonable approach because many of the most familiar substitution models can be arranged linearly in a nested hierarchy.
Unfortunately, not all of the interesting substitution model comparisons are between nested models. In these cases, it may make sense to use a method that does not assume nesting of models, such as Akaike's Information Criterion (AIC; Akaike 1973), which is based on information theory (specifically, the Kullback-Liebler distance between models) and introduces a penalty term for models that have more parameters. The idea is to calculate the AIC for every candidate model, and choose the model with the smallest AIC for data analysis. (One might also consider other models that achieve a low AIC score.) Methods related to the AIC include the small sample bias adjusted AIC proposed by Hurvich and Tsai (1989), a quasi-likelihood modification (QAIC) suggested by Lebreton et al. (1992), and the Bayesian information criterion (BIC; Schwarz 1978). AIC has been commonly used for choosing among a limited number of candidate substitution models, and it is implemented in the program Modeltest (Posada and Crandall 1998).
The criteria described above all work well when maximum likelihood is the criterion. In a Bayesian analysis, model choice is often guided by the Bayes factor (Kass and Raftery 1995; Raftery 1996), which is the ratio of the marginal likelihoods of two models; instead of maximizing the likelihood with respect to the parameters, the parameters are integrated over a prior probability distribution. Relatively little work has been done on Bayesian model selection in phylogenetics, with the notable exception of that by Suchard, Weiss, and Sinsheimer (2001). Their work showed how Bayes factors can be calculated for nested models using the Savage-Dickey ratio (Verdinelli and Wasserman 1995).
In this paper, we extend the pool of candidate substitution models to all possible time-reversible models. We develop a new Markov chain Monte Carlo (MCMC) algorithm that jumps between models, allowing accurate calculation of the Bayes factor for any of the models. We also evaluate the possible DNA substitution models using AIC and BIC. We show that, although the named models often do a good job of explaining much of the pattern of substitution for real data sets, (1) many other (unnamed) models also do quite well, (2) the most complicated time-reversible model is not universally chosen as best, and (3) phylogeny estimation can be performed while averaging over the possible models.
![]() |
Methods |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
|
The models typically used in phylogenetic analysis are all special cases of the GTR model, involving constraints on the nucleotide frequencies or on the rate parameters. For example, the Felsenstein (1981) model constrains the rates of substitution to be equal (rAC = rAG = rAT = rCG = rCT = rGT), and the Jukes-Cantor (1969) model has the additional constraint that the nucleotide frequencies are equal. For the remainder of this article, we will assume that the base frequencies are a common parameter across all possible models, and that they are free to vary. We concentrate our attention on constraints of the substitution rate parameters. We also introduce a notation that allows any possible time-reversible model to be described. We assign index values to each of the six substitution rates, in the order AC, AG, AT, CG, CT, GT. If a model has the constraint that ri = rj, then the index value for those two rates is the same. Moreover, the index number for the first rate is always 1, and indices are labeled sequentially. So, for example, "111111" denotes the Jukes-Cantor (1969) or Felsenstein (1981) model and "121121" denotes the Kimura (1980), Hasegawa, Kishino, and Yano (1984, 1985), or Felsenstein (1984) model. There are a total of 203 such models, with the simplest being "111111" and the most complex being the GTR model, "123456." All of the possible models are shown in table 1. Of the 203 models, one of them has one substitution type (111111), 31 of them have two substitution types (e.g., 121121), 90 of them have three substitution types (e.g., 123121), 65 of them have four substitution types (e.g., 112341), 15 of them have five substitution types (e.g., 123415), and one of them has six substitution types (123456). Only a handful of the 203 models have been named1. These include 111111 (Jukes and Cantor 1969; Felsenstein 1981), 121121 (Kimura 1980; Felsenstein 1984; Hasegawa, Kishino, and Yano 1984, 1985), 121131 (Tamura and Nei 1993), 123321 (Kimura 1981), 123341 (Posada 2003), 123425 (Posada 2003), and 123456 (Tavaré 1986).
|
|
|
To improve understanding of the remainder of this article, we introduce the following additional notation describing the substitution models. Each substitution model has K groups of substitution types. The membership of the ith group is contained in the vector ki. For example, model M101 (112231) has a total of K = 3 substitution groups: k1 = (AC, AG, GT), k2 = (AT, CG), and k3 = (CT). The number of substitution types in the ith substitution group is denoted ni and the rate for the ith substitution group is denoted ri. For model M101, n1 = 3, n2 = 2, and n3 = 1. Finally, N(M) is the number of substitution groups in model M with ni > 1. For model number 101, N(M101) = 2; the GTR model (123456) has N(M203) = 0.
Choosing Among the Possible Substitution Models
A number of criteria can be used to choose among a set of d candidate models, where the candidate models are labeled M1, M2, ... , Md. Here we review a few of these methods. However, we mainly concentrate on the Bayesian framework for choosing models, showing how Bayes factors can be calculated for substitution models of interest. We will also point out how AIC and BIC can be used to choose among all possible substitution models when using maximum likelihood.
Likelihood Ratio Testing
The likelihood ratio
|
Likelihood ratio tests have proven very useful in molecular phylogenetics (see review by Huelsenbeck and Rannala 1997), where they have been used to examine questions ranging from the molecular clock hypothesis to cospeciation in hosts and parasites. However, to use the 2 approximation of the null distribution, the requirement that the models are nested restricts the hypotheses that can be compared. With non-nested models, one can use the Cox test (Cox 1962), but in these cases it is not clear which hypothesis the null distribution should be generated under. In this article, where many of the models of interest are not nested (e.g., 111222 vs. 112211), likelihood ratio tests are of only limited value.
Information Criteria
Another approach to model choice is to attempt to find a model that in some sense minimizes the distance to the "true" model. A number of methods attempt to estimate the Kullback-Liebler information, which is a measure of the distance between models. The Akaike Information Criterion (AIC; Akaike 1973) is based on the maximum likelihood score and the number of parameters for a model, and it represents an attempt to estimate the Kullback-Liebler information (distance) between the true model and the fitted model. The AIC for model i is calculated as
|
|
Model selection using AIC or BIC works by calculating the AIC/BIC scores for each candidate model and selecting the model with the lowest AIC/BIC as best. AIC and BIC have been used previously to choose among a handful of competing phylogenetic models (e.g., Posada and Crandall 1998; Yang et al. 2000).
Bayes Factors
The Bayes factor for a comparison of model M1 to model M2 is
|
The Bayes factor can also be calculated as the ratio of the posterior odds to the prior odds of the models:
|
|
|
Reversible Jump MCMC for Model Selection
We do not pursue calculation of Bayes factors using the Savage-Dickey ratio for the same reason that we do not examine likelihood ratio tests; many of the comparisons of interest involve non-nested models. Instead, we calculate Bayes factors for all possible substitution models by constructing a Markov chain that visits substitution models in proportion to their posterior probability. The joint posterior probability of all the model parameters is calculated using Bayes' theorem as
|
Likelihood
We calculate the likelihood (probability of observing the DNA sequence alignment conditional on a specific model) under the GTR + model of DNA substitution, or submodels of the GTR +
. We approximate the continuous gamma distribution for among-site rate variation using the discrete approximation of Yang (1994). We use four categories to approximate the gamma distribution. The pruning algorithm of Felsenstein (1981) was used to calculate the likelihood for a tree.
Prior
We assume that all 203 substitution models have equal prior probability. Moreover, we assume a uniform prior on unrooted tree topologies, an exponential (10) prior on branch lengths, a flat Dirichlet distribution on nucleotide frequencies, and a uniform (0, 50) prior on the gamma shape parameter for among-site rate variation. We assume that the K rates of the substitution model are a linear transformation of a flat Dirichlet prior. In particular, if Y1, ... , YK have a flat Dirichlet distribution we let ri = 6Yi/ni for i = 1, ... , K. The prior density of the rates is
|
Markov Chain Monte Carlo
The posterior probability distribution of the phylogenetic model parameters cannot be calculated analytically. We use MCMC (Metropolis et al. 1953; Hastings 1970; Green 1995) to approximate the posterior probability distribution of the phylogenetic model parameters. MCMC for phylogeny inference has been described elsewhere (see Larget and Simon 1999; Huelsenbeck et al. 2002). The general idea is to construct a Markov chain that has as its state space the parameters of the phylogenetic model and a stationary distribution that is the posterior probability distribution of the parameters. The sequence of parameter states visited by the Markov chain forms a valid, albeit dependent, sample from the posterior probability distribution of interest (Tierney 1994). In this article, we are interested in the posterior probabilities of alternative substitution models. We construct a Markov chain that, in addition to exploring the other model parameters such as the phylogenetic tree, visits alternative substitution models. The posterior probability of the ith model is approximated as the proportion of the time the Markov chain visited model i.
In each iteration of the MCMC algorithm, we pick a parameter at random to update. We update the tree and branch lengths simultaneously using the LOCAL mechanism described in Larget and Simon (1999). We update the nucleotide frequencies and substitution model parameters using a Dirichlet proposal mechanism, and we update the gamma shape parameter using a sliding window mechanism. Each of these proposal mechanisms has been described elsewhere, and we will not go into any more detail about them here. However, two of the proposal mechanisms are unique to the problem of exploring substitution models. Specifically, we have proposal mechanisms that explore the different possible substitution models by merging or splitting groups of substitution types. The main complication is that moves from one substitution model to another involve changes in the dimension of the parameter space, and the normal MCMC theory does not apply. Instead, we construct a Markov chain that jumps dimensions, using the theory described in Green (1995).
The current substitution model will be denoted M. We attempt a merge of two substitution groups with probability pM(M) and attempt a split of a substitution group into two groups with probability pS(M). When the current model is the Jukes-Cantor (111111), then the probability of attempting a merge is zero and the probability of attempting a split is set to one. When the current model is GTR (123456) the probability of attempting a split is set to zero, and the probability of a merge is set to one.
A merge move works as follows. Two of K substitution groups of the current model, M, are chosen at random. These groups are denoted ki and kj. The new group that results from the merging of these two groups is denoted k'. For example, consider a merge of the model 111222. There are only two substitution groups that can be merged: k1 = (AC, AG, AT) and k2 = (CG, CT, GT). The group that results from the merging of these two groups is k' = (AC, AG, AT, CG, CT, GT). Of course, this is the model 111111. The rates of substitution must also be chosen for the new, merged, group. For the merge move, the new rate is deterministically chosen. The rates of substitution for the two merged groups are denoted ri and rj. The new rate, r' is
|
A split move works by picking one of the N(M) groups with at least two substitution types at random. The group to be split will be designated k. The substitution types in that group are then divided randomly, with the constraint that there is at least one item in each group. This results in two groups, k'i and k'j. For example, if the current model, M, is 112334, then only two groups are available that could be potentially split: the groups with substitutions AC and AG or with substitution types CG and CT. One of these two groups would be chosen at random (with each having an equal probability of being chosen), and the substitution types in the group randomly divided into two new groups. In this case, if k = (AC, AG), then the resulting groups would be k'i = (AC) and k'j = (AG). To find the new rates for the two new groups, r'i and r'j, we first generate a uniform random variable on the interval (nir, njr), denoted u, where r is the rate of the group to be split. The new rates are then
|
|
The probability of accepting a merge or split move, R, is
|
|
|
|
|
|
Data
We examined 16 alignments of DNA sequences. Table 2 summarizes the characteristics of the data sets examined. The posterior probabilities of substitution models (and other phylogenetic model parameters) were approximated using the MCMC algorithm described above as implemented in a program written by J.P.H. PAUP* (Swofford 2002) was used to calculate the maximum likelihood scores of the 203 models for each data set. All of the data sets were analyzed using the same model: the GTR + model of DNA substitution, and submodels thereof. The MCMC algorithm was run for 10,000,000 cycles for each data set, and sampled every 1,000th cycle. Each data set was analyzed three times. For each chain, we excluded samples taken during the first million cycles as the "burn in," basing inferences on the last 9,000 samples taken in each analysis. All probabilities were based on the pooled post-burn in samples from the three chains.
|
![]() |
Results and Discussion |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
|
Generally speaking, substitution models with two to five substitution types did best. Table 4 shows the Bayes factors for the number of substitution types (K), and figure 1 shows the posterior probability of the number of substitution types. The Jukes-Cantor (1969) and Felsenstein (1981) model (111111) did poorly for all of the data sets examined in this study. The seven named models did well for 10 of the data sets, having a Bayes factor greater than one, meaning there was more posterior probability associated with the named models than there was prior probability. However, the Bayes factor for the named models was never greater than 115, and it was less than one for four of the data sets.
|
|
In this study, we ran the Markov chains assuming a uniform prior probability on all models, and Bayes factors were calculated as the ratio of the posterior odds to the prior odds in favor of the model. The posterior probabilities of the best models happened to be of intermediate value for all 16 data sets examined, so that it was straight-forward to calculate the Bayes factor for the best model. However, it is possible that the posterior probability for the best model would be approximated to be 1.0 under a uniform prior on models, or alternatively, the biologist might be interested in the Bayes factor for a model that does a very poor job of explaining the data, having a posterior probability that is approximated using MCMC to be 0.0. However, in both cases, the true posterior probability is neither 1.0 nor 0.0, but rather a number very close to 1.0 or 0.0. How does one accurately approximate the Bayes factors for very good or very poor substitution models? One solution involves seeding the prior, giving more weight to a poor model for which one wants to approximate a Bayes factor, or giving less prior weight to a good model. One then runs the MCMC algorithm using the non-uniform prior, calculating the posterior probabilities for the model of interest as the fraction of the time that the Markov chain visited that model. One can then calculate what the posterior probability of model i would have been had the analysis been performed using a uniform prior. The posterior probability of model i is
|
Application of likelihood ratio tests to a small subset of hierarchically nested models would often lead to different conclusions about the optimal model than were reached here, when considering all possible time-reversible models. For example, consider the following three nested model comparisons: Test 1, M1 (111111) versus M15 (121121); Test 2, M15 versus M40 (121131); and Test 3, M40 versus M203 (123456). Here, the likelihood ratio test statistic is asymptotically distributed as a 2 with one (tests 1 and 2) or three (test 3) degrees of freedom. At the 5% level, the null (simple) model is rejected all of the time for Test 1, half of the time for Test 2, and 13 of the 16 times for Test 3. Importantly, the most complicated model considered (M203) would often be chosen as the best model when looking at only a small subset of all possible models. A similar conclusion would be reached if one only considered the subset of models that have one, two, or six substitution types. Phylogeneticists often concentrate on only these models, probably because the models are easily implemented using the program PAUP* (Swofford 2002) and are the only possible models in the program MrBayes (Huelsenbeck and Ronquist 2001).
The models chosen as best in the 16 DNA sequence alignments examined here were all similar in one respect: virtually all of them allowed the transition rates to differ from the transversion rates. That is to say, one rarely saw a model chosen that constrained a transition rate to be the same as a transversion rate (e.g., M60 where the substitution pattern is 123313). In fact, with only two exceptions, every model in the 95% credible sets of models allowed the transition rates to be different from the transversion rates. The exceptions included the HIV-env and the vertebrate ß-globin alignments, where the best models included M25 (112212), M60 (123313), M64 (112232), M100 (112312), and M171 (112343), among others. Clearly, in almost all of the sequence alignments examined in this study, the predominant pattern is that transitions occur at a different rate than transversions, and this fact determines which models are chosen as best; the best models all appear to be minor variants of the model proposed by Kimura (1980) and Hasegawa, Kishino, and Yano (1984, 1985).
The models currently used in molecular phylogenetics are quite complicated, and are likely to become more so as biologists add parameters to the models. For example, one strategy that is used to allow a phylogenetic model to better explain a DNA sequence alignment is to partition the data, perhaps according to codon or some other biologically relevant criterion, and to allow the parameters of the model to be independently estimated for each partition (Ronquist and Huelsenbeck 2003). This strategy, although it allows the phylogenetic model to better explain the data, also adds many parameters, and it is unclear how the biologist is to sort out which partitioning scheme is best; is it best to constrain a parameter, such as the gamma shape parameter that describes among-site rate variation, to be the same for all of the data partitions, or should one allow the parameter to be different? If it is to be potentially different, should the parameter be independently estimated for all partitions, or should this parameter be constrained to be the same for some but not all of the partitions? Questions such as these are likely to become more prevalent in the future as models become more complicated and biologists become more concerned with using any knowledge about a gene to tailor the assumptions of the analysis to specific gene regions. The automated procedure described in this article has the advantage that it allows the screening of a large number of possible models when performing a phylogenetic analysis. In principle, one could set up a very complicated general model, such as one that allows parameters to be independently estimated for each data partition, and then allow the MCMC procedure to explore the different possible sub-models in proportion to their probability. Formally, the inference of phylogeny would not depend on any specific model, but would be integrated over uncertainty in the model parameters. Moreover, one would learn how the pattern of substitution differs among data partitions from the output of such an analysis.
![]() |
Acknowledgements |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
![]() |
Footnotes |
---|
Arndt von Haeseler, Associate Editor
![]() |
Literature Cited |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
Adachi, J., and M. Hasegawa. 1995. Phylogeny of whales: dependence of the inference on species sampling. Mol. Biol. Evol. 12:177-179.
Akaike, H. 1973. Information theory as an extension of the maximum likelihood principle. Pp. 267281 in B. N. Petrov and F. Csaki, eds. Second International Symposium on Information Theory. Akademiai Kiado, Budapest.
Alfaro, M. E., and S. J. Arnold. 2001. Molecular systematics and evolution of Regina and the thamnophiine snakes. Mol. Phylo. Evol. 21:408-423.[CrossRef][ISI][Medline]
Arnason, U., A. Gullberg, and A. Janke. 1997. Phylogenetic analyses of mitochondrial DNA suggest a sister group relationship between Xenartha (Edentata) and Ferungulates. Mol. Biol. Evol. 14:762-768.[Abstract]
Barns, S. M., C. F. Delwiche, J. D. Palmer, and N. R. Pace. 1996. Perspectives on archaeal diversity, thermophyly and monophyly from environmental rRNA sequences. Proc. Natl. Acad. Sci. USA 93:9188-9193.
Bell, E. T. 1934. Exponential numbers. Am. Math. Monthly 41:411-419.
Brower, A. V. Z. 2000. Phylogenetic relationships among the Nymphalidae (Lepidoptera) inferred from partial sequences of the wingless gene. Proc. R. Soc. Lond., Ser. B 267:1201-1211.[CrossRef][ISI][Medline]
Cox, D. R. 1962. Further results on tests of families of alternate hypotheses. J. R. Stat. Soc. B 24:406-424.
Felsenstein, J. 1981. Evolutionary trees from DNA sequences: a maximum likelihood approach. J. Mol. Evol. 17:368-376.[ISI][Medline]
Felsenstein, J. 1984. Distance methods for inferring phylogenies: a justification. Evolution 38:16-24.[ISI]
Goldman, N. 1993. Statistical tests of models of DNA substitution. J. Mol. Evol. 36:182-198.[ISI][Medline]
Goldman, N., and Z. Yang. 1994. A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol. Biol. Evol. 11:725-736.
Green, P. J. 1995. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82:711-732.[ISI]
Hafner, M. S., P. D. Sudman, F. X. Villablanca, T. A. Spradling, J. W. Demastes, and S. A. Nadler. 1994. Disparate rates of molecular evolution in cospeciating hosts and parasites. Science 265:1087-1090.[ISI][Medline]
Harshman, J., C. J. Huddleston, J. P. Bollback, T. J. Parsons, and M. J. Braun. 2003. True and false Gharials: a nuclear gene phylogeny of Crocodylia. Syst. Biol. 52:386-402.[ISI][Medline]
Hasegawa, M., T. Yano, and H. Kishino. 1984. A new molecular clock of mitochondrial DNA and the evolution of Hominoids. Proc. Jpn. Acad. Ser. B 60:95-98.
Hasegawa, M., H. Kishino, and T. Yano. 1985. Dating the human-ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol. 22:160-174.[ISI][Medline]
Hastings, W. K. 1970. Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57:97-109.[ISI]
Hayasaka, K., T. Gojobori, and S. Horai. 1988. Molecular phylogeny and evolution of primate mitochondrial DNA. Mol. Biol. Evol. 5:626-644.[Abstract]
Huelsenbeck, J. H., and B. Rannala. 1997. Phylogenetic methods come of age: testing hypotheses in an evolutionary context. Science 276:227-232.
Huelsenbeck, J. P., and F. Ronquist. 2001. MrBayes: Bayesian inference of phylogenetic trees. Bioinformatics 17:754-755.
Huelsenbeck, J. P., B. Larget, R. E. Miller, and F. Ronquist. 2002. Potential applications and pitfalls of Bayesian inference of phylogeny. Syst. Biol. 51:673-688.[CrossRef][ISI][Medline]
Hurvich, C. M., and C-L. Tsai. 1989. Regression and time series model selection in small samples. Biometrika 76:297-307.[ISI]
Jackman, T. R., A. Larson, K. de Queiroz, and J. B. Losos. 1999. Phylogenetic relationships and tempo of early diversification in Anolis lizards. Syst. Biol. 48:254-285.[CrossRef][ISI]
Jeffreys, H. 1961. Theory of Probability. Oxford University Press, Oxford, U.K.
Jukes, T. H., and C. R. Cantor. 1969. Evolution of protein molecules. Pp. 21123 in H. N. Munro, ed. Mammalian Protein Metabolism. Academic Press, New York.
Kass, R. E., and A. E. Raftery. 1995. Bayes factors and model uncertainty. J. Am. Stat. Assoc. 90:773-795.[ISI]
Kimura, M. 1980. A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol. 16:111-120.[ISI][Medline]
Kimura, M. 1981. Estimation of evolutionary distances between homologous nucleotide sequences. Proc. Natl. Acad. Sci. USA 78:454-458.[Abstract]
Larget, B., and D. Simon. 1999. Markov chain Monte Carlo algorithms for the Bayesian analysis of phylogenetic trees. Mol. Biol. Evol. 16:750-759.
Lavine, M., and M. J. Schervish. 1999. Bayes factors: what they are and what they are not. Am. Stat. 53:119-122.[ISI]
Lebreton, J-D., K. P. Burnham, J. Clobert, and D. R. Anderson. 1992. Modeling survival and testing biological hypotheses using marked animals: a unified approach with case studies. Ecol. Monogr. 62:67-118.[ISI]
Losos, J. B., T. R. Jackman, A. Larson, K. de Queiroz, and L. Rodríguez-Schettino. 1998. Contingency and determinism in replicated adaptive radiations of island lizards. Science 279:2115-2118.
Mathews, S., and M. J. Donoghue. 1999. The root of angiosperm phylogeny inferred from duplicate phytochrome genes. Science 286:947-950.
Metropolis, N., A. W. Rosenbluth, M. N. Rosenbluth, A. W. Teller, and E. Teller. 1953. Equations of state calculations by fast computing machines. J. Chem. Phys. 21:1087-1091.[ISI]
Muse, S. V., and B. S. Gaut. 1994. A likelihood approach for comparing synonymous and nonsynonymous nucleotide substitution rates with application to the chloroplast genome. Mol. Biol. Evol. 11:715-724.
Posada, D. 2003. Using Modeltest and PAUP* to select a model of nucleotide substitution. Pp. 6.5.16.5.14 in A. D. Baxevanis, et al., eds. Current Protocols in Bioinformatics. John Wiley & Sons, Inc., New York.
Posada, D., and K. A. Crandall. 1998. Modeltest: testing the model of DNA substitution. Bioinformatics 14:817-818.[Abstract]
Raftery, A. E. 1996. Hypothesis testing and model selection. Pp. 163187 in W. R. Gilks, S. Richardson, and D. J. Spiegelhalter, eds. Markov chain Monte Carlo in Practice. Chapman & Hall, London.
Ronquist, F., and J. P. Huelsenbeck. 2003. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19:1572-1574.
Schöniger, M., and A. von Haeseler. 1994. A stochastic model and the evolution of autocorrelated DNA sequences. Mol. Phyl. Evol. 3:240-247.[CrossRef][Medline]
Schwarz, G. 1978. Estimating the dimension of a model. Ann. Stat. 6:461-464.[ISI]
Suchard, M. A., R. E. Weiss, and J. S. Sinsheimer. 2001. Bayesian selection of continuous-time Markov chain evolutionary models. Mol. Biol. Evol. 18:1001-1013.
Streelman, J. T., M. E. Alfaro, M. W. Westneat, D. R. Bellwood, and S. A. Karl. 2002. Evolutionary history of the parrotfishes: biogeography, ecomorphology, and comparative diversity. Evolution 56:961-971.[ISI][Medline]
Swofford, D. L. 2002. PAUP*. Phylogenetic analysis using parsimony (*and other methods). Version 4. Sinauer Associates, Sunderland, Mass.
Tamura, K., and M. Nei. 1993. Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Mol. Biol. Evol. 10:512-526.[Abstract]
Tavaré, S. 1986. Some probabilistic and statistical problems on the analysis of DNA sequences. Pp. 5786 in Lectures in Mathematics in the Life Sciences, vol. 17.
Tierney, L. 1994. Markov chains for exploring posterior distributions. Ann. Statist. 22:1701-1762.[ISI]
Van Den Bussche, R. A., R. J. Baker, J. P. Huelsenbeck, and D. M. Hillis. 1998. Base compositional bias and phylogenetic analyses: A test of the "flying DNA" hypothesis. Mol. Phylogenet. Evol. 13:408-416.
Verdinelli, I., and L. Wasserman. 1995. Computing Bayes factors using a generalization of the Savage-Dickey density ratio. J. Am. Stat. Assoc. 90:614-618.[ISI]
Yang, Z. 1994. Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods. J. Mol. Evol. 39:306-314.[ISI][Medline]
Yang, Z., R. Nielsen, N. Goldman, and A. Pedersen. 2000. Codon-substitution models for heterogeneous selection pressure at amino acid sites. Genetics 155:431-449.