* Biological Computation and Visualization Center, Department of Biological Sciences, Louisiana State University; and Department of Biological Sciences, University at Albany
Correspondence: E-mail: dpollock{at}lsu.edu.
![]() |
Abstract |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
Key Words: maximum likelihood posterior probability ancestral reconstruction tRNA secondary structure substitution model functional inference
![]() |
Introduction |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
Parsimony and maximum-likelihood (ML) methods of reconstruction have been used extensively in various ancestral sequence analyses (Stewart, Schilling, and Wilson 1987; Malcolm et al. 1990; Messier and Stewart 1997; Hassanin and Douzery 1999; Hibbett and Binder 2002; Richard, Lombard, and Dutrillaux 2003; Soltis et al. 2003) and can sometimes be reliable. For example, ancestral reconstructions using parsimony were 98% accurate in predicting ancestral sequences from experimental phylogenies created by serial propagation of bacteriophage T7 in the presence of a mutagen (Hillis et al. 1992; Bull et al. 1993). Ancestral reconstruction of sequences using parsimony is, however, known to be biased for skewed base compositions (Collins, Wimberger, and Naylor 1994; Zhang and Nei 1997; Eyre-Walker 1998; Sanderson et al. 2000). The bias in parsimony-reconstructed ancestral sequences deterministically decreases the frequency of the rare base and increases that of the most common base.
Although it has been generally assumed that ML sequence reconstruction does not suffer from the same problems (Collins, Wimberger, and Naylor 1994; Zhang and Nei 1997; Eyre-Walker 1998; Sanderson et al. 2000), both ML and parsimony can sometimes fail when reconstructing quantitative traits (Schluter et al. 1997; Hormiga, Scharff, and Coddington 2000; Oakley and Cunningham 2000; Webster and Purvis 2002). ML reconstructions of continuous ancestral traits can be particularly uncertain for traits with frequent changes (Schluter et al. 1997; Cunningham, Omland, and Oakley 1998), but continuous trait reconstruction is arguably hindered much more by modeling inadequacies than by problems with inference techniques. Even with discrete traits, however, ML reconstruction has limitations; in a recent "experimental phylogenetics" analysis using PCR-generated mutations, comparisons between known ancestral sequences and those reconstructed using ML showed that while most ancestral sequences were accurately reconstructed, errors increased with the depth of the sequence in the tree (Sanson et al. 2002). Although the models used are still imperfect (Yang, Kumar, and Nei 1995; Koshi and Goldstein 1996), and reconstruction is clearly not error-free, ML is more commonly used in ancestral reconstruction, mostly due to the large biases of parsimony.
For phylogenetic analyses, ML is generally preferred over parsimony and distance methods due to its greater accuracy and incorporation of more realistic models of evolution (Huelsenbeck 1995; Yang 1996a, 1996b; Huelsenbeck and Rannala 1997; Pollock and Bruno 2000). This is especially true for highly divergent sequences, such as vertebrate mtDNAs. Posterior probability (Bayesian) methods using Markov chain Monte Carlo (MCMC) simulations have, however, recently gained considerable attention in phylogenetic analysis, because they are computationally more efficient and faster than ML methods, particularly for analyzing more complex evolutionary models and larger data sets (Huelsenbeck and Ronquist 2001; Huelsenbeck et al. 2001; Bollback 2002; Douady et al. 2003). They also allow nuisance parameter integration, generation of credibility intervals, and analysis of parameter distributions, rather than only the most likely parameters (Antezana 2003). Posterior probability methods are therefore a potentially useful alternative to parsimony and ML methods for reconstructing ancestral sequences (Koshi and Goldstein 1996; Nielsen 2002; Huelsenbeck, Nielsen, and Bollback 2003).
Statistical biases may exist even in Bayesian methods, and the behavior of Bayesian methods in ancestral sequence reconstruction (Koshi and Goldstein 1996) is not presently well known. We have implemented a modification of Nielsen's Bayesian approach (Nielsen 2002; Nielsen and Huelsenbeck 2002) whereby internal states are mapped onto the phylogeny as augmented data during the course of the Markov chain, and we use this method here to address the differences between the Bayesian and ML approaches to ancestral reconstruction. We consider a simplification of this approach in which internal states are mapped only to internal nodes (not within branches) and the number of substitutions between nodes is limited to one or two per branch per site. We call this a "conditional pathway" approach, because likelihood calculations are conditional on a reduced or restricted set of possible substitution paths. Although this simplification is unlikely to be formally correct (i.e., more than two substitutions will almost certainly occasionally occur at a single site on a single branch during the course of evolution), it is likely to be a good approximation under many circumstances. It may not affect results dramatically in any case, because the probability of substitution between any two states with only two substitutions separating them may not be much different than the probability given many more intervening substitutions. For comparison, we also implement an extremely simple approach that is independent of branch length.
Although Bayesian methodologies are relatively efficient in phylogenetics, they can still become slow when the complexity of the model increases (Huelsenbeck and Ronquist 2001), so considering this aspect of computational limitations is important. The potential benefits of our implementation include increased computational speed and a dramatic increase in the feasibility of incorporating more complex models of evolution than are currently feasible, particularly those in which instantaneous rate matrices vary among gene positions, over time, or with changing sequence context. Computational costs for standard matrix exponentiation methods will increase linearly with the number of matrices and will increase with the square of the number of states in the matrices, whereas the methods described here will not. In our experience, it is also much easier to program new models with the methods described here, and there is no need to incorporate complicated memorization schemes to save computational time. For example, in work to be described elsewhere we implemented a model in which the instantaneous rate matrices were different at every site over the entire length of the mitochondrial genome (N. M. Krishnan et al., in preparation). Our purpose here, however, is not to demonstrate the implementation of such complex models, but to demonstrate the accuracy of our conditional pathway implementation by comparing it to full likelihood calculations as implemented by standard programs. The series of computationally simple conditional pathway methods that we implemented also result in a series of calculations intermediate between those of parsimony and full likelihood calculations, and this helps to clarify the reasons and conditions under which bias in ancestral reconstruction may occur.
We tested our program by analyzing its ability to infer ancestral sequence distributions from primate mtDNA sequences and from simulated data. Parsimony was recently used to study evolutionary changes of nucleotide composition in primate mtDNA genomes (Schmitz, Ohme, and Zischler 2002), and it was suggested that nucleotide frequencies had changed from their ancestral states. We performed preliminary analysis with ML in addition to parsimony that supported this result, but the analysis also suggested that nucleotide frequencies had changed many times along various primate lineages, always in the same direction. We present evidence that these results may have been strongly influenced by bias in these ancestral sequence reconstruction methods. In an analysis of the cytochrome b (Cyt-b) and cytochrome oxidase I (COI) gene sequences from selected primates, we find that frequencies estimated from the posterior distribution of our conditional pathway methods are dramatically more similar to extant sequences than frequencies estimated using either parsimony or ML. Surprisingly, ML reconstructions are sometimes less similar to extant sequences than parsimony reconstructions. We simulated primate mtDNA evolution under plausible conditions of stationary and changing mutation processes and found that considering the entire posterior distribution produced more accurate reconstructions, even with methods involving very simple calculations; the simplifying assumption of one or two substitutions per site per branch introduces very little bias. While there was little difference between the ML and Bayesian approaches to estimating parameters of the substitution model, the ML approach of estimating a specific ancestral sequence was considerably worse than the Bayesian approach of considering the entire posterior frequency distribution. Using the predicted folding of tRNAs into cloverleaf structures, we also considered the strong possibility that bias in reconstructed sequences can affect functional inferences, a potentially important consideration for paleo-molecular biochemistry.
![]() |
Materials and Methods |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
Likelihood Calculations
Classical phylogenetic likelihood methods integrate over all possible ancestral states and all possible branch-specific substitution histories, which requires matrix multiplications and decompositions into Eigenvalues and Eigenvectors. We avoid this here by augmenting the sequence data with mapped ancestral states and calculating probabilities of occurrences of specific events, which simplifies calculations and avoids the matrix multiplication calculations along each branch required by matrix exponentiation methods. States at internal nodes are treated as hyperparameters and updated over the course of the Markov chain. The probability of a substitution event occurring at time t and not before is given (Rice 1995) as
![]() | (1) |
![]() | (2) |
Because there is almost no information concerning the timing of the event, we integrate this probability over all possible times such that
![]() | (3) |
If we assume that two substitutions occurred between the nodes rather than one, such that state x changes to state y changes to state z, then similar calculations and integrations can be made to obtain:
![]() | (4) |
![]() | (5) |
![]() | (6) |
Running the Markov Chain
Markov chains were run using Monte Carlo techniques that included a mixture of the Metropolis Hastings algorithm (Metropolis et al. 1953; Hastings 1970) and Gibbs sampling (Geman and Geman 1984; Gelfand and Smith 1990). Parameters and augmented data states were initialized in a sequence of approximations similar to the steps in an expectation maximization (EM) algorithm (Little and Rubin 1983; Meng and Rubin 1991). A first set of states at internal nodes was obtained by moving from the tips of the tree upwards, randomly choosing a state for each internal node from the states of the two immediate descendant nodes. The model parameters were then initialized by summing substitutions over the entire tree based on the initial augmented internal states and calculating the frequencies of these substitutions as proportions of the counts for each nucleotide,
![]() | (7) |
![]() | (8) |
After initialization of the model parameters and augmented states, we ran a Markov chain in which either the internal states or rate parameters were updated with equal probability at each step. The full rate matrix was updated using the Metropolis-Hastings algorithm, in which each set of parameters in the chain, t at step, depended only on the parameters in the previous step,
t1. The parameter values for a new step were proposed based on a proposal density, q(
'|
t1), and this proposal was accepted or rejected based on the Metropolis-Hastings acceptance function,
![]() | (9) |
![]() | (10) |
In the Markov chains run for this study, the parameter priors, P(), were uniform such that P(
') = P(
t1) for all
, and the proposals were symmetric such that q(
'|
t 1) = q(
t 1|
') for all
; the acceptance probability therefore reduced to the likelihood ratio. The chains works best if the proposal densities match the shape of the target distribution, P(x), but this density is unknown. Here, the proposed changes for the rate parameters followed a normal distribution with variance determined by the acceptance probabilities, and they were thus symmetric and not biased towards any parameter values. Proposals of values out of range (e.g., rates less than zero) were reflected about the range boundary. If proposal steps are too small, the chain will mix slowly, i.e., it will move around the space slowly and converge slowly to P(x). If the proposal steps are too large the acceptance rate will be low because the proposals are likely to land in regions of much lower probability density. Since the appropriate size of the proposal step depends on the data set being used, short simulations with 50 different window range values were run for 200 iterations prior to starting each chain to determine appropriate parameter proposal window sizes. The window sizes for proposals were fixed at values for which 60%80% of the proposals from the initial point were accepted. A full matrix update was proposed for the rate matrix in an MCMC generation and accepted according to the Metropolis-Hastings criterion.
States at internal nodes were updated using a Gibbs sampler (Geman and Geman 1984; Gelfand and Smith 1990; Wang, Rutledge, and Gianola 1994; Liu, Neuwald, and Lawrence 1995; Firat, Theobald, and Thompson 1997). An initial internal node was picked randomly and a new state was calculated from the probability density of substituting to or from the states at the three surrounding nodes. The remaining internal nodes were then updated in a similar fashion, moving outward from the initially chosen node. Because each new state was sampled from the conditional posterior density, the randomly sampled state was always accepted.
Chain Convergence Diagnostics
After initialization, the Markov chain was run for 2,000 iterations until equilibrium, at which point the initial values no longer affect the current values of the model parameters. These "burn-in" samples prior to chain convergence were discarded and excluded from analyses. Chain convergence was confirmed for likelihood and all substitution parameters. To determine whether the chains indeed converged to a stationary distribution, we ran three parallel chains with over-dispersed starting values for the transition matrix. Convergence was confirmed (Gelman et al. 1992; Gelman and Rubin 1996) when the within-chain variance (WT) was equal to the estimated asymptotic variance (). If T is the number of points generated in a chain and N is the total number of chains, then the among-chain variance is
![]() | (11) |
![]() | (12) |
![]() | (13) |
![]() | (14) |
For BL, sampling continued for 25,000 generations, while for B1 and B2 it continued for 50,000 generations. Nucleotide frequencies and nucleotide ratios were calculated at each internal node and averaged across all sampled points. The effective sample size (NEff) was calculated as
![]() | (15) |
![]() | (16) |
Parsimony, Maximum-Likelihood, and Bayesian Estimation
To contrast results from the Markov chains and methods described above with more familiar methods, we performed parsimony and ML ancestral reconstructions using PAUP* 4.0 (Swofford 2000). In addition to estimating the frequencies for each site, we recorded the maximum-likelihood value for the chains run under the BL, B1, and B2 approaches. Although we present primarily the ML and parsimony results from PAUP* and the posterior distribution estimates from our own program, there was not a qualitative difference between the biases produced from the ML estimate with our method or with PAUP*, or between assuming either constant rates or gamma-distributed rates in PAUP*. An important technical point worth clarifying here is that bias from the optimization methods is a result of choosing a particular nucleotide as "best," as opposed to tracking the entire distribution. We used the top choice for parsimony reconstruction produced by PAUP* without considering alternative equally-parsimonious solutions; consideration of these alternatives should not change the bias of parsimony because PAUP* chooses randomly among equally-parsimonious solutions, and hence when one looks across many sites one gets a fair estimate of the performance of the method.
Functional Test
Primate mitochondrial tRNAs were aligned using ClustalX (Thompson, Higgins, and Gibson 1994), and tRNAscan-SE (www.genetics.wustl.edu/eddy/tRNAscan-SE/) was used to obtain predicted secondary structures (Lowe and Eddy 1997). Only perfectly aligned and consistently paired sites were considered in our analyses, meaning that sites in the alignment were discarded if they included gaps, if they included loops in any of the predicted secondary structures, or if they were paired with different sites in predicted secondary structures from different species. These alignment and pairing criteria were necessary to avoid alignment ambiguity and to avoid changes in the base-pairing context, which our approach cannot accommodate. Thus, out of about 22,000 aligned sites, 13,803 sites did not have gaps and only 7,740 sites were consistently paired in predicted secondary structures. Of these, 3,360 were variable across the data set. The alignments for six tRNAs (tRNA-Gln, tRNA-Glu, tRNA-Ile, tRNA-Met, tRNA-Leu4, and tRNA-Pro) were used in their entirety, whereas tRNA-Tyr aligned poorly and contributed few sites. The base composition variability at a site was measured with the Shannon index at that site across the species in the study, where pi is the frequency of nucleotide i (A, C, G, or T) at that site. The Shannon index was also used to estimate the ambiguity of the posterior probability distribution for the inferred nucleotide state at each internal node at each site.
Simulations of Constant and Variable Evolution
For the constant evolution simulations, evolution was stationary along each branch on the primate phylogeny. Hence, simulations were performed under the most likely model for a gene by starting at the deepest node and keeping the rate matrix and equilibrium frequencies constant. Under the variable model of evolution, the average of all inferred ancestral node frequencies was used for all the internal branches, while the external branches were simulated using the nearest tip frequencies. The ML rate parameters were kept constant throughout. The frequencies observed in the simulations were recorded for each base and for each internal node (bn), and reconstructions (
bn) were made using parsimony, ML, BL, B1, and B2. Differences between the reconstructed and simulated frequencies for each base (b) and for each internal node (n) were used to estimate the bias (
bn
bn). The total bias in the frequency reconstruction was summarized using the mean squared error (MSE):
![]() | (17) |
![]() |
Results |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
|
|
|
|
|
|
There were 3,360 consistently paired nucleotides at 15 internal nodes for variable sites, and Bayesian integrations were more compatible with base pairing than ML in 1,096 cases (32.6%). To understand which sites were contributing to this effect, we classified reconstructions according to the base composition variability of the site and how ambiguously the node and site combination was reconstructed (table 5). The percentage of cases in which the integrated posterior compatibility was better than the ML reconstruction varied according to the extent of base composition variability at a site and the ambiguity of nucleotide reconstruction (table 5).
|
|
|
![]() |
Discussion |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
Based on the predicted effect on tRNA structure, we infer that the cumulative effects of ancestral reconstruction biases can be important for functional inference. Comparisons between evolutionary models (GTR, "parsimony"), full or conditional pathway likelihood calculations (BL, B1, B2, ML), methods of inferring ancestors (Bayesian, ML, parsimony), and programs (PAUP*, our programs) help to clarify the nature of the bias and show that it is not an artifact of any particular set of procedures. Differences between tip sequences and ancestral reconstructions in primates were consistent with expected biases produced by ML and parsimony. Clearly, the idea that ancestral primates evolved from radically different frequencies than those seen today (Schmitz, Ohme, and Zischler 2002) is no longer tenable, since Bayesian estimates of ancestral frequencies are similar to extant sequences, and there is only a small amount of bias in Bayesian reconstructions whether the evolutionary process is variable or constant. Nucleotide frequencies have clearly changed during primate evolution (fig. 2), but not by nearly as much as are inferred from ML and parsimony reconstructions, and not in consistent and convergent directions along lineages leading to tip sequences.
The effects of reconstruction bias are not limited to errors in reconstructing nucleotide frequencies, but they can lead to serious bias and inaccuracies in functional predictions. All else being equal, one would normally predict that integrating base-pairing potential over posterior probabilities would yield considerably less complementarity than would optimization, since with canonical base-pairing three out of four of the possible matches are suboptimal. The observation that Bayesian estimates of ancestral tRNA base pairs are better than ML estimates in 20%40% of the cases is disturbing enough, but the fact that at sites with highly variable base composition they can be better in 85% of the cases has sobering implications for reconstruction enthusiasts. We can make predictions that more variable sites and more ambiguous reconstructions are likely to suffer from the greatest amount of bias, but it does not seem possible to accept reconstruction of ancestral conditions without question, even when the posterior probability of a particular reconstruction is high. If the measured functional features are correlated with particular nucleotides (e.g., RNA secondary structure stability is likely correlated with GC content), then functional interpretations will be biased. Any situation where physicochemical properties must be matched or balanced, as is the case with nucleotide pairing in RNA secondary structure, will also be biased.
Although we analyzed nucleotide content here, there is no reason to believe that the results cannot be generalized to amino acid sequences, and therefore to reconstruction of functional properties in ancestral proteins. For example, Gaucher et al. (2003) recently concluded not only that the common ancestor of all elongation factors of the bacterial Tu family proteins (Ef-Tu) was thermophilic rather than mesophilic, but also, surprisingly, that the common ancestor of all mesophiles was thermophilic, too. It is possible that mesophiles were derived from thermophiles, but if the last common ancestor of mesophiles was thermophilic, mesophily must have arisen in parallel at least twice among the descendents of this ancestor, and all thermophilic descendents must have gone extinct (or, at least, not have been sampled by Gaucher et al. [2003]).
Although great care was taken in the study by Gaucher et al. (2003) to consider alternative reconstructions at ambiguous nodes, our results strongly imply that extremely biased reconstructions can appear certain precisely because of the bias and that effects of bias may be cumulative. If thermostability is correlated with whichever amino acids are favored in a biased reconstruction, then the inference that the ancestral mesophile was thermophilic (and thus the inference of multiple parallel derivations of mesophily) would be false. If this is the case, consolation may be found in the possibility that reconstruction of ancestors may then be a profitable means to produce thermotolerant proteins from relatively less stable descendants. An obvious means to alleviate some (but not all) of these considerations in future studies would be to take care to maintain amino acid frequencies for all classes or conservation levels within the protein. This will reduce frequency bias, but problems with incorrect functional inference unfortunately may still remain due to interactions among sites (e.g., Pollock, Taylor, and Goldman 1999).
Our results also provide an interesting comparison concerning the effects of different assumptions on likelihood maxima and on reconstruction biases. Our simplest approach, BL, was similar to parsimony in that branch lengths were ignored, although incorporation of variation of rates among substitution types provided more flexibility than the standard parsimony algorithm. For both protein-coding genes, the likelihood maxima for this method were around 4,000 log likelihood units worse than the maxima for the methods with branch lengths (table 2), providing strong evidence to reject the hypothesis that branch lengths do not matter.
Allowing two substitutions per branch per site rather than only one improved the log likelihood maxima by about 800900 units, and allowing an infinite number of substitutions per branch improved the maxima by another 300400 units. In many ways this is not surprising, because there is no theoretical justification for limiting the number of substitutions per branch, but it is interesting to note that incorporating a nonreversible model of evolution while limiting the substitutions to two per site per branch results in likelihood maxima that are 200 units better than the maxima for reversible models with an infinite number of substitutions allowed. The assumption of a reversible model is usually made for computational convenience, rather than because of any compelling theoretical justification. For the methodology developed here, nonreversible models do not have any greater computational burden than reversible models, so a nonreversible model limited to two substitutions per branch per site may be both computationally and statistically more justified. We developed this approach to allow incorporation of more complex and biologically realistic models without undue computational burden, so it is encouraging that the assumptions made result in small likelihood reductions that are easily compensated by other means, and that the reconstructions are only slightly divergent from extant or simulated nucleotide frequencies.
![]() |
Supplementary Material |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
![]() |
Acknowledgements |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
![]() |
Footnotes |
---|
![]() |
References |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
Antezana, M. 2003. When being "most likely" is not enough: examining the performance of three uses of the parametric bootstrap in phylogenetics. J. Mol. Evol. 56:198222.[CrossRef][ISI][Medline]
Arnason, U., J. A. Adegoke, K. Bodin, E. W. Born, Y. B. Esa, A. Gullberg, M. Nilsson, R. V. Short, X. Xu, and A. Janke. 2002. Mammalian mitogenomic relationships and the root of the eutherian tree. Proc. Natl. Acad. Sci. USA 99:81518156.
Arnason, U., A. Gullberg, A. S. Burguete, and A. Janke. 2000. Molecular estimates of primate divergences and new hypotheses for primate dispersal and the origin of modern humans. Hereditas 133:217228.[ISI][Medline]
Arnason, U., A. Gullberg, and A. Janke. 1998. Molecular timing of primate divergences as estimated by two nonprimate calibration points. J. Mol. Evol. 47:718727.[ISI][Medline]
Arnason, U., A. Gullberg, and X. Xu. 1996. A complete mitochondrial DNA molecule of the white-handed gibbon, Hylobates lar, and comparison among individual mitochondrial genes of all hominoid genera. Hereditas 124:185189.[ISI]
Arnason, U., and A. Janke. 2002. Mitogenomic analyses of eutherian relationships. Cytogenet Genome Res 96:2032.[CrossRef][ISI][Medline]
Beardsley, P. M., A. Yen, and R. G. Olmstead. 2003. AFLP phylogeny of Mimulus section Erythranthe and the evolution of hummingbird pollination. Evol. Int. J. Org. Evol. 57:13971410.
Benner, S. A. 2002. The past as the key to the present: resurrection of ancient proteins from eosinophils. Proc. Natl. Acad. Sci. USA 99:47604761.
Bleiweiss, R. 1998. Origin of hummingbird faunas. Biol. J. Linnean Soc. 65:7797.[CrossRef][ISI]
Bollback, J. P. 2002. Bayesian model adequacy and choice in phylogenetics. Mol. Biol. Evol. 19:11711180.
Bull, J. J., C. W. Cunningham, I. J. Molineux, M. R. Badgett, and D. M. Hillis. 1993. Experimental molecular evolution of bacteriophage-T7. Evolution 47:9931007.[ISI]
Collins, T. M., P. H. Wimberger, and G. J. P. Naylor. 1994. Compositional bias, character-state bias, and character-state reconstruction using parsimony. Syst. Biol. 43:482496.[ISI]
Cunningham, C. W., K. E. Omland, and T. H. Oakley. 1998. Reconstructing ancestral character states: a critical reappraisal. Trends Ecol. Evol. 13:361366.[CrossRef][ISI]
Douady, C. J., F. Delsuc, Y. Boucher, W. F. Doolittle, and E. J. Douzery. 2003. Comparison of Bayesian and maximum likelihood bootstrap measures of phylogenetic reliability. Mol. Biol. Evol. 20:248254.
Eyre-Walker, A. 1998. Problems with parsimony in sequences of biased base composition. J. Mol. Evol. 47:686690.[ISI][Medline]
Faith, J. J., and D. D. Pollock. 2003. Likelihood analysis of asymmetrical mutation bias gradients in vertebrate mitochondrial genomes. Genetics 165:735745.
Firat, M. Z., C. M. Theobald, and R. Thompson. 1997. Univariate analysis of test day milk yields of British Holstein-Firesian heifers using Gibbs sampling. Acta Agric. Scand. Sect. A, Anim. Sci. 47:213220.[ISI]
Gaucher, E. A., J. M. Thomson, M. F. Burgan, and S. A. Benner. 2003. Inferring the palaeoenvironment of ancient bacteria on the basis of resurrected proteins. Nature 425:285288.[CrossRef][ISI][Medline]
Gelfand, A. E., and A. F. M. Smith. 1990. Sampling-based approaches to calculating marginal densities. J. Am. Stat. Assoc. 85:398409.[ISI]
Gelman, A., and D. B. Rubin. 1996. Markov chain Monte Carlo methods in biostatistics. Stat. Methods Med. Res. 5:339355.[Medline]
Gelman, A., D. B. Rubin, J. B. Carlin, and H. S. Stern. 1992. Bayesian data analysis. Chapman and Hall, London.
Geman, S., and D. Geman. 1984. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Machine Intell. 6:721741.[ISI]
Giannasi, N., R. S. Thorpe, and A. Malhotra. 2000. A phylogenetic analysis of body size evolution in the Anolis roquet group (Sauria: Iguanidae): character displacement or size assortment? Mol. Ecol. 9:193202.[CrossRef][ISI][Medline]
Goodman, M., C. A. Porter, J. Czelusniak, S. L. Page, H. Schneider, J. Shoshani, G. Gunnell, and C. P. Groves. 1998. Toward a phylogenetic classification of primates based on DNA evidence complemented by fossil evidence. Mol. Phylogenet. Evol. 9:585598.[CrossRef][ISI][Medline]
Hassanin, A., and E. J. P. Douzery. 1999. Evolutionary affinities of the enigmatic saola (Pseudoryx nghetinhensis) in the context of the molecular phylogeny of Bovidae. Proc. R. Soc. Lond. B 266:893900.[CrossRef][ISI][Medline]
Hastings, W. K. 1970. Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57:97109.[ISI]
Hibbett, D. S., and M. Binder. 2002. Evolution of complex fruiting-body morphologies in homobasidiomycetes. Proc. R. Soc. Lond. B 269:19631969.[CrossRef][ISI][Medline]
Hillis, D. M., J. J. Bull, M. E. White, M. R. Badgett, and I. J. Molineux. 1992. Experimental phylogenetics: generation of a known phylogeny. Science 255:589592.[ISI][Medline]
Horai, S., K. Hayasaka, R. Kondo, K. Tsugane, and N. Takahata. 1995. Recent African origin of modern humans revealed by complete sequences of hominoid mitochondrial DNAs. Proc. Natl. Acad. Sci. USA 92:532536.[Abstract]
Hormiga, G., N. Scharff, and J. A. Coddington. 2000. The phylogenetic basis of sexual size dimorphism in orb-weaving spiders (Araneae, Orbiculariae). Syst. Biol. 49:435462.[CrossRef][ISI][Medline]
Huelsenbeck, J. P. 1995. The performance of phylogenetic methods in simulation. Syst. Biol. 44:1748.[ISI]
Huelsenbeck, J. P., R. Nielsen, and J. P. Bollback. 2003. Stochastic mapping of morphological characters. Syst. Biol. 52:131158.[ISI][Medline]
Huelsenbeck, J. P., and B. Rannala. 1997. Phylogenetic methods come of age: testing hypotheses in an evolutionary context. Science 276:227232.
Huelsenbeck, J. P., and F. Ronquist. 2001. MrBayes: Bayesian inference of phylogenetic trees. Bioinformatics 17:754755.
Huelsenbeck, J. P., F. Ronquist, R. Nielsen, and J. P. Bollback. 2001. Bayesian inference of phylogeny and its impact on evolutionary biology. Science 294:23102314.
Ingman, M., H. Kaessmann, S. Paabo, and U. Gyllensten. 2000. Mitochondrial genome variation and the origin of modern humans. Nature 408:708713.[CrossRef][ISI][Medline]
Karlin, S., E. S. Mocarski, and G. A. Schachtel. 1994. Molecular evolution of herpesviruses: genomic and protein sequence comparisons. J. Virol. 68:18861902.[Abstract]
Koshi, J. M., and R. A. Goldstein. 1996. Probabilistic reconstruction of ancestral protein sequences. J. Mol. Evol. 42:313320.[ISI][Medline]
Krawczak, M., A. Wacey, and D. N. Cooper. 1996. Molecular reconstruction and homology modelling of the catalytic domain of the common ancestor of the haemostatic vitamin-K-dependent serine proteinases. Hum. Genet. 98:351370.[CrossRef][ISI][Medline]
Little, R. J. A., and D. B. Rubin. 1983. On jointly estimating parameters and missing data by maximizing the complete-data likelihood. Am. Stat. 37:218220.[ISI]
Liu, J. S., A. F. Neuwald, and C. E. Lawrence. 1995. Bayesian models for multiple sequence alignment and Gibbs sampling strategies. J. Am. Stat. Assoc. 90:11561170.[ISI]
Lowe, T. M., and S. R. Eddy. 1997. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25:955964.
Maddison, D. R., and W. P. Maddison. 2000. MacClade 4: Analysis of phylogeny and character evolution. Sinauer Associates, Sunderland, Mass.
Malcolm, B. A., K. P. Wilson, B. W. Matthews, J. F. Kirsch, and A. C. Wilson. 1990. Ancestral lysozymes reconstructed, neutrality tested, and thermostability linked to hydrocarbon packing. Nature 345:8689.[CrossRef][ISI][Medline]
Meng, X. L., and D. B. Rubin. 1991. Using EM to obtain asymptotic variancecovariance matricesthe SEM algorithm. J. Am. Stat. Assoc. 86:899909.[ISI]
Messier, W., and C. B. Stewart. 1997. Episodic adaptive evolution of primate lysozymes. Nature 385:151154.[CrossRef][ISI][Medline]
Metropolis, N., A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller. 1953. Equations of state calculations by fast computating machines. J. Chem. Phys. 21:10871092.[ISI]
Nielsen, R. 2002. Mapping mutations on phylogenies. Syst. Biol. 51:729739.[CrossRef][ISI][Medline]
Nielsen, R., and J. P. Huelsenbeck. 2002. Detecting positively selected amino acid sites using posterior predictive P-values. Pac. Symp. Biocomput. 7:576588.
Noor, M. A., and J. C. Larkin. 2000. A re-evaluation of 12S ribosomal RNA variability in Drosophila pseudoobscura. Mol. Biol. Evol. 17:938941.
Oakley, T. H., and C. W. Cunningham. 2000. Independent contrasts succeed where ancestor reconstruction fails in a known bacteriophage phylogeny. Evolution 54:397405.[ISI][Medline]
Pauling, L., and E. Zuckerkandl. 1963. Molecular restoration studies of extinct forms of life. Acta Chem. Scand. 17:916.[ISI]
Pollock, D. D., and W. J. Bruno. 2000. Assessing an unknown evolutionary process: effect of increasing site-specific knowledge through taxon addition. Mol. Biol. Evol. 17:18541858.
Pollock, D. D., W. R. Taylor, and N. Goldman. 1999. Coevolving protein residues: maximum likelihood identification and relationship to structure. J. Mol. Biol. 287:187198.[CrossRef][ISI][Medline]
Rice, J. A. 1995. Mathematical statistics and data analysis. Duxbury Press, Belmont, Calif.
Richard, F., M. Lombard, and B. Dutrillaux. 2003. Reconstruction of the ancestral karyotype of eutherian mammals. Chromosome Res. 11:605618.[CrossRef][ISI][Medline]
Robinson, D. M., D. T. Jones, H. Kishino, N. Goldman, and J. L. Thorne. 2003. Protein evolution with dependence among codons due to tertiary structure. Mol. Biol. Evol. 20:16921704.
Sanderson, M. J., M. F. Wojciechowski, J. M. Hu, T. S. Khan, and S. G. Brady. 2000. Error, bias, and long-branch attraction in data for two chloroplast photosystem genes in seed plants. Mol. Biol. Evol. 17:782797.
Sanson, G. F., S. Y. Kawashita, A. Brunstein, and M. R. Briones. 2002. Experimental phylogeny of neutrally evolving DNA sequences generated by a bifurcate series of nested polymerase chain reactions. Mol. Biol. Evol. 19:170178.
Schluter, D., T. Price, A. O. Mooers, and D. Ludwig. 1997. Likelihood of ancestor states in adaptive radiation. Evolution 51:16991711.[ISI]
Schmitz, J., M. Ohme, and H. Zischler. 2000. The complete mitochondrial genome of Tupaia belangeri and the phylogenetic affiliation of scandentia to other eutherian orders. Mol. Biol. Evol. 17:13341343.
. 2002. The complete mitochondrial sequence of Tarsius bancanus: evidence for an extensive nucleotide compositional plasticity of primate mitochondrial DNA. Mol. Biol. Evol. 19:544553.
Soltis, D. E., A. E. Senters, M. J. Zanis, S. Kim, J. D. Thompson, P. S. Soltis, L. P. R. De Craene, P. K. Endress, and J. S. Farris. 2003. Gunnerales are sister to other core eudicots: implications for the evolution of pentamery. Am. J. Bot. 90:461470.
Stewart, C. B., J. W. Schilling, and A. C. Wilson. 1987. Adaptive evolution in the stomach lysozymes of foregut fermenters. Nature 330:401404.[CrossRef][ISI][Medline]
Swofford, D. L. 2000. Phylogenetic analysis using parsimony (*and other methods). Version 4. Sinauer Associates, Sunderland, Mass.
Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1994. ClustalW: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:46734680.[Abstract]
Wang, C. S., J. J. Rutledge, and D. Gianola. 1994. Bayesian analysis of mixed linear models via Gibbs sampling with an application to litter size in Iberian pigs. Genet. Sel. Evol. 26:91115.[ISI]
Webster, A. J., and A. Purvis. 2002. Testing the accuracy of methods for reconstructing ancestral states of continuous characters. Proc. R. Soc. Lond. B 269:143149.[CrossRef][ISI][Medline]
Xu, X., and U. Arnason. 1996. A complete sequence of the mitochondrial genome of the western lowland gorilla. Mol. Biol. Evol. 13:691698.[Abstract]
Yang, Z. 1996a. Among-site rate variation and its impact on phylogenetic analyses. Tree 11:367371.
. 1996b. Phylogenetic analysis using parsimony and likelihood methods. J. Mol. Evol. 42:294307.[ISI][Medline]
Yang, Z., S. Kumar, and M. Nei. 1995. A new method of inference of ancestral nucleotide and amino acid sequences. Genetics 141:16411650.
Zhang, C., M. Zhang, J. Ju et al. (11 co-authors). 2003. Genome diversification in phylogenetic lineages I and II of Listeria monocytogenes: identification of segments unique to lineage II populations. J. Bacteriol. 185:55735584.
Zhang, J., and M. Nei. 1997. Accuracies of ancestral amino acid sequences inferred by the parsimony, likelihood, and distance methods. J. Mol. Evol. 44:S139S146.[ISI][Medline]
Zhang, J., and H. F. Rosenberg. 2002. Complementary advantageous substitutions in the evolution of an antiviral RNase of higher primates. Proc. Natl. Acad. Sci. USA 99:54865491.