*Laboratory of Biometrics, Graduate School of Agriculture and Life Sciences, University of Tokyo, Tokyo, Japan;
Program in Statistical Genetics, Statistics Department, North Carolina State University;
Theoretical Biology and Biophysics, Los Alamos National Laboratory, Los Alamos, New Mexico
![]() |
Abstract |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
![]() |
Introduction |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
The rate at which sequences change depends on a large number of factors (e.g., natural selection, population size, generation time, mutation rate, and mutation pattern). It is inconceivable that these factors are identical for all evolutionary lineages. In fact, it may be that no two evolutionary lineages share the same rate of molecular evolution.
However, closely related evolutionary lineages typically evolve at similar rates. This is because many factors that affect evolutionary rates are biological, and closely related lineages are biologically similar. Differences in rates between distantly related lineages are more likely to be large because there has been more time for divergence of the factors affecting rates.
Widely used methods for estimating divergence times assume that all evolutionary lineages experience sequence change at identical rates. Recognition of the potential problems with assuming "globally" constant rates on a phylogeny has led to the development of "local" divergence time estimation techniques that allow different portions of a phylogeny to evolve at different rates but force all branches within a particular portion of the phylogeny to evolve at the same rate (e.g., Hasegawa, Kishino, and Yano 1989
; Rambaut and Bromham 1998
; Yoder and Yang 2000
).
Sanderson (1997)
proposed a "nonparametric" method for estimating divergence times that assumes neither globally nor locally constant rates of molecular evolution. Instead, the Sanderson method has the attractive property of allowing the rate to evolve over time. The pioneering work of Sanderson was followed with Bayesian estimates of divergence times that were based on models of evolution of the rate of molecular evolution (Thorne, Kishino, and Painter 1998
; Huelsenbeck, Larget, and Swofford 2000
; Korber et al. 2000
).
In this study, we extend the Bayesian techniques for estimating divergence times, explore their behavior via simulation, and investigate the robustness of these techniques to violations of their assumptions. In addition, we assess the impact on divergence time estimation of constraints that can incorporate fossil information into the procedure.
![]() |
Method |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
For simplicity, the rate on a branch is assumed to be the average of the rates at the nodes that begin and end it. This assumption makes it technically correct to interpret our model as having the rate along a branch change in a linear fashion from the node that begins it to the node that ends it. We dislike this interpretation. We view the mean of the rates at the two nodes simply as a reasonable approximation of the average rate on a branch.
Model of Evolution of the Rate of Evolution
Given the logarithm of the rate at a node that begins a branch, we assume that the logarithm of the rate at the node ending the branch has a normal distribution. In one of our earlier implementations, the mean of the normal distribution for the logarithm of the ending rate was set equal to the logarithm of the beginning rate. For many data sets, this arrangement works well. However, when the expectation of the logarithm of the ending rate is equal to the logarithm of the beginning rate, the expectation of the ending rate is actually greater than the beginning rate. This effect on the expected ending rate may be undesirable because it predicts a tendency for rates to be higher at the tips of a tree than at the root. Our current implementation sets the mean of the normal distribution for the logarithm of the ending rate by forcing the expected ending rate to be equal to the beginning rate. The variance of the normal distribution for the logarithm of the ending rate is set equal to the time duration of the branch multiplied by a parameter . A large value of
means that there is little rate autocorrelation, and a low value implies strong rate autocorrelation. The case of
= 0 is equivalent to the molecular-clock assumption of constant evolutionary rates. Roughly, the evolutionary rate varies with the coefficient of variation
t in t time units, when t is small.
In earlier work (Thorne, Kishino, and Painter 1998
), rates were assigned to branches of a rooted tree rather than to nodes of the tree. The variance of the logarithm of the rate on a daughter branch given the logarithm of the rate on the parental branch was the product of
and the time separating the midpoints of the parental and daughter branches. A flaw of this earlier parameterization is that the correlation of rates on two sister branches should not depend on the time duration of their parental branch. In the current parameterization, the correlation of rates on two sister branches is independent of the time duration of the parental branch.
Because the hyperparameter can have a strong influence on an analysis, we add a gamma-distributed prior for
as another level to our hierarchical model. The additional level allows flexibility for the Bayesian analysis by incorporating uncertainty regarding
. We also specify the prior distribution for the rate of molecular evolution at the root node with a gamma distribution. This prior distribution is assumed to be independent of
and the internal node times on the tree.
The phrase "branch length" is employed to represent an expected amount of sequence change on a branch rather than to represent the time duration of the branch. It is prior information that combines with the branch length information to provide posterior information about node rates and divergence times. On a rooted bifurcating tree topology that relates n taxa, there are n tip nodes and n - 1 internal nodes. Because such a topology has one more node than branches, a unique set of 2n - 1 node rates cannot be identified from the node times and the 2n - 2 branch lengths. We add a restriction to the node rates so that they can be uniquely identified from the node times and the branch lengths. This is done by selecting one of the branches that emanate from the root and forcing the ending rate of this branch to be equal to the rate at its beginning. For all other branches, our model of rate evolution determines the relationship between the beginning and ending rates.
Our reasons for constraining one branch to have identical beginning and ending rates stem from the Metropolis-Hastings algorithm (Metropolis et al. 1953
; Hastings 1970
), which we adopt to sample from our posterior distributions of interest. Presence of the constraint greatly improves the ability of the Metropolis-Hastings algorithm to approximate the posterior distribution because the constraint increases the "mixing rate" of the Markov chain on which the Metropolis-Hastings algorithm relies.
Prior for Divergence Times
In earlier work (Thorne, Kishino, and Painter 1998
), the prior distribution for divergence times was a Yule process (e.g., Karlin and Taylor 1975
, pp. 119123). The Yule process can be seen as modeling origination of evolutionary lineages due to speciation. A more complicated stochastic process that reflects extinction and taxonomic sampling as well as speciation could also be incorporated (e.g., see the phylogenetic reconstruction method of Yang and Rannala [1997
]). From the standpoint of biologically interpreting the parameters of a prior for divergence times, explicit consideration of speciation and extinction is attractive. However, the processes of speciation and extinction are poorly understood. Further work is needed to characterize the suitability of models for these processes as priors for divergence times.
Our prior, a generalization of the Dirichlet distribution to rooted tree structures, is not intended as a detailed model of the speciation and extinction processes. This prior was selected because it is statistically simple and, more importantly, provides flexibility. As opposed to dictating the divergence time estimates, the prior allows the posterior to be responsive to the data.
Let Ti represent the time of node i. We assume all tip sequences are isolated at time 0. In figure 1
, this means Ti = 0 for i {0, ... , 8}. Times of internal nodes on the rooted tree are measured in terms of time units before the existence of the tips.
|
The procedure by which paths are selected depends on the number of branches that are traversed by the path. Specifically, the path that traverses the most branches is the one that is selected among all candidate paths. In cases in which multiple paths are equally optimal, our prior for divergence times is designed so that the prior will not depend on which path is arbitrarily selected (see below).
The time duration represented by each selected path is the sum of the time durations of the branches traversed by the path. If the time of the internal node that begins a path is known, then the times of other internal nodes traversed by the path can be determined from the set of proportions of the time durations of each of the branches on the path relative to the total time duration of the path.
Imagine that a selected path includes k branches. The proportion of time represented by branch i relative to the total time represented by the path will be denoted i, where i is the ith branch on the path from the initial internal node to the tip at which the path ends. The joint density of these proportions is assumed to follow a Dirichlet distribution. The probability density of a Dirichlet distribution is
| (1) |
| (2) |
As an example, consider the rooted tree in figure 1
. To sample from the divergence time prior for this case, the first step would be to generate a random value from the gamma distribution for T16, the time of the root. The path from the root node to node 0 traverses five branches. No other path from the root to a tip includes more branches. Therefore, the path from the root to node 0 could be selected first. This path can be divided into five time intervals because
| (3) |
Posterior Distribution
In the Bayesian framework, the goal is to study the posterior distribution of the parameters. The data set of aligned sequences will be denoted by X, and we assume that these sequences are related via a known rooted tree topology. Let R be a vector representing node rates, and let T be a vector denoting the times of the internal nodes. The posterior distribution for R, T, and the rate variation parameter is
| (1) |
| (2) |
The details of our MCMC implementation are somewhat tedious, and the general procedure has already been described (Thorne, Kishino, and Painter 1998
). Therefore, we omit many details here. The basic idea is to have each state of a Markov chain represent specific values of our parameters T, R, and
. In our implementation, the chain is assigned a randomly selected initial state. Thereafter, new states are randomly proposed and then randomly accepted or rejected. If a proposed state is accepted, it becomes the new state of the Markov chain. If the proposed state is rejected, the Markov chain remains at its current state. The random proposal and acceptance or rejection of proposed states is repeated many times. If the probability of accepting a proposed state is appropriately determined (see Hastings 1970
), then the stationary distribution of the resulting Markov chain will be the desired posterior distribution.
In our implementation, we cycle through different types of random proposal steps. One type produces a proposed state that is identical to the current state of the Markov chain except that the time of one internal node differs between the two states. Another proposal type is similar, except that it is the rate at a node that differs between the two states. Likewise, in a third proposal type, the two states are identical except for the value of . A fourth type generates a proposed state with branch lengths that are identical to the current state by multiplying all node rates of the current state by some randomly selected factor and then dividing all node dates by the same factor. The fifth and sixth proposal types are identical to the fourth except that the multiplication of rates is not done by the fifth type and the division of dates is not performed by the sixth type. Finally, each of these latter three proposal types is also used in a modified version that performs multiplication and/or division operations on only the times and/or the rates of nodes that are descendants of some randomly selected node on the tree.
Constraints Imposed by the Fossil Record
Analysis of an earlier model of rate evolution (see Thorne, Kishino, and Painter 1998
) yielded successful estimation of the proportion of time since an internal node of interest relative to the time since the root of the tree, but absolute times of divergence were not as well estimated. Even in the absence of the constant-rate assumption of a perfect molecular clock, constraints can assist in calibration of rates of molecular evolution. Fossil or other evidence may indicate that a certain internal node exceeds some age. Also, external evidence may indicate that a certain node must be younger than some age. Constraints are the basis of the not-very-parametric divergence time estimation technique of Sanderson (1997)
. We have adopted his idea by incorporating constraints into our method. Let C represent a set of constraints on node times. Our MCMC method is designed to approximate the posterior distribution p(T, R,
| X, C). In the presence of constraints, equation (2)
is modified so that
| (3) |
![]() |
Simulation Studies |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
|
In all MCMC analyses, the prior for the rate at the ingroup root node was the same as the distribution from which the rate at the ingroup root node was randomly sampled (i.e., a gamma distribution with a mean of 1 and a standard deviation of 0.5.). In all MCMC analyses performed without constraints on node times, the prior for the ingroup root time was a gamma distribution with a mean equal to the actual ingroup root time of 0.5 and with a standard deviation of 0.25. In all analyses performed in the presence of constraints on node times, the prior for the ingroup root time was the distribution that would result if a gamma distribution with a mean of 0.5 and a standard deviation of 0.25 were forced to conform to the constraints.
Simulation Studies Without Constraints on Node Times
In the first simulation, the difficulty of estimating divergence times in the absence of rate calibration is illustrated. In this case, data were simulated according to a perfect molecular clock. Therefore, rates at all nodes on the tree were identical to the randomly sampled rate at the ingroup root node. To make the situation even more favorable, the MCMC algorithm was forced to assume that = 0. Figure 3A
represents the prior distribution that was approximated by 100 MCMC runs. The thin lines depict the topology and true node times of the ingroup. There is a short horizontal line centered above or below each internal node. This line shows the average of the 100 different estimates of the posterior mean. With each of the 100 MCMC runs, a 95% prior probability interval for each internal node time was calculated. The average of the upper bounds and the average of the lower bounds for these prior intervals were calculated. These averages determine the limits of the vertical lines that extend above and below each of the averages of the estimated posterior means. Near each internal node on the tree is the number of times out of the 100 simulations in which the true node time was contained within the 95% prior interval. Because figure 3A
represents priors rather than posteriors, the same distribution is being approximated in each of the 100 runs of the MCMC routine. Therefore, these numbers should be close to 0 or close to 100 if the approximation is succeeding. Departures from 0 or 100 would be consistent with failure of the Markov chain to converge to the prior distribution in some fraction of the runs. In figure 3A,
the average of the prior means is close to the true node times. This is because the simulations were being performed under a best-case scenario. The width of the prior intervals demonstrates that the prior for divergence times was relatively diffuse.
|
Figure 3C is based on the same 100 analyses as figure 3B . In figure 3C, the posterior distributions depict the proportion of time from the tips to each internal node relative to the time from the tips to the root node. Figure 3C shows that proportional times can be well estimated. We believe that the difference between the narrow credibility intervals in figure 3C and the wide intervals in figure 3B illustrates the need for fossil or other external information to assist in the calibration of evolutionary rates.
Simulation Studies with Constraints on Node Times
Analyses depicted in figure 4
were performed under the constraint that one specific node must exceed 0.2 time units in age and must be no older than 0.3 time units in age. These constraints are depicted with horizontal lines that are centered above the constrained node. Figure 4A
was produced under the same situation as figure 3A
except that the prior distributions of divergence times were forced to be consistent with the two constraints. Figure 4A
shows that the prior distributions for divergence times are again diffuse, but nodes that are "close" to the constraints have noticeably more narrow 95% prior intervals.
|
In figure 4C
, robustness to the prior for rate variation per unit time is explored. Data sets were again simulated under a molecular clock. Instead of forcing the MCMC procedure to assume a perfect molecular clock, the prior for the rate variation parameter was a gamma distribution with a mean of 2 and a standard deviation of 2. The credibility intervals for this case are slightly wider than those in figure 4B
but are far more narrow than those of the prior (see fig. 4A
).
Figure 4D
shows the case in which data sets were simulated without a molecular clock but were analyzed by assuming a molecular clock. To simulate data, the value of was sampled from a gamma distribution with a mean of 2 and a standard deviation of 2. Divergence date estimates appear to be poor when a clock is assumed but rates are actually highly variable.
Data sets summarized in figure 4E
were generated by sampling the value of from a gamma distribution with a mean of 2 and a standard deviation of 2. In this scenario, the prior for
was a gamma distribution with a mean of 2 and a standard deviation of 2. Although the credibility intervals are wide, they are more narrow than those of the prior (fig. 4A
), and the true divergence times are captured more often than in the scenario depicted in figure 4D
.
Robustness to the Prior
We emphasize that, except for differences between the true value of and the prior, these are all best-case scenarios. Actual performance of divergence time estimation is likely to be substantially worse. For example, the assumed model of nucleotide substitution may differ from the actual model of nucleotide substitution. The impact of this difference on divergence time estimation is difficult to predict.
To show the impact of substantial deviations between the prior for divergence times and the actual times, posterior distributions for additional simulations are summarized in figure 5 . Except for the actual divergence times and the depicted constraints, all features of these scenarios are identical to the case of figure 4E . Figure 5A shows the prior distributions of divergence times for a situation in which much of the lineage splitting occurred near the tips of the trees. Figure 5B summarizes the posterior distributions for this case. Likewise, figure 5C and D, respectively, represent priors and posteriors for a situation in which much of the lineage splitting occurs near the root. Notice that the posteriors tend to be closer to the true values than do the priors. In figure 5C, the frequencies with which certain nodes are captured within the 95% prior interval are intermediate. Inspection of the MCMC output for these cases indicates that these intermediate frequencies are explained by slight fluctuations of the estimated interval endpoints among the 100 simulations.
|
Figure 6
shows a realization of branch lengths that were randomly generated with our model of rate evolution and the node times and topology depicted in figure 2
. To generate these branch lengths, was set at exactly 2 and the rate of molecular evolution at the root node was set to 1. Figure 6
is intended to illustrate the degree to which a typical set of branch lengths might deviate from being clocklike for the cases in which simulations allowed rate variation over time. Figure 6
depicts "true" branch lengths, rather than estimated branch lengths. Estimation error is likely to make branch lengths appear even less clocklike than they actually are.
|
![]() |
Discussion |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
Another notable way in which the HLS approach differs from our own is that the HLS treatment of rate evolution is not approximate, whereas ours is. In our approach, the branch lengths are determined by the node rates and times. In the HLS approach, the branch lengths are not completely determined by the node rates and times. For example, under the HLS model of discrete rate changes, a branch might experience exactly one rate-changing event. With only information on the rate at the node that ends the branch and the rate at the node that begins the branch, we have no information about whether the rate-changing event occurred closer to the end or the beginning of the branch. Because the branch length is affected by the exact time of the rate-changing event with the HLS procedure, the node rates and the node times are not by themselves sufficient to determine the branch lengths.
Our model could be generalized by assigning a probability density p(B | R, T) to the branch lengths for given node rates and times. This differs from our current approach in that branch lengths would not be deterministically specified from the node rates and times. In fact, this sort of generalization of equation (3) could have both the HLS model and our own as special cases. One form of p(B | R, T) would be determined by the HLS Poisson model. However, there is no reason to limit the form of p(B | R, T) to the HLS Poisson model. The probability density p(B | R, T) could be selected so as to reflect any desired combination of discrete and continuous rate change. Recent research (Bickel 2000
) suggests that other stochastic models for rate evolution should also be explored.
Intraspecific and Interspecific Data
Our dating method was designed with interspecific data sets in mind. For example, this method has recently been employed to address the timescale of eutherian evolution (Cao et al. 2000
). With interspecific data, it may be reasonable to assume that the rate of evolution of two lineages changes independently following their last common ancestral node. In contrast, this independence assumption is less plausible for intraspecific data. Rates of evolution in a large population may be slow because natural selection can overwhelm genetic drift, whereas rates of evolution in a small population may be large because genetic drift can overwhelm selection. This means that a sudden drop in population size can simultaneously increase the rate of evolution of all lineages within a population.
![]() |
Conclusions |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
![]() |
Acknowledgements |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
![]() |
Footnotes |
---|
1 Abbreviations: HLS, Huelsenbeck/Larget/Swofford; MCMC, Markov chain Monte Carlo.
2 Keywords: molecular clock
phylogeny
Markov chain Monte Carlo
Metropolis-Hastings algorithm
3 Address for correspondence and reprints: Hirohisa Kishino, Laboratory of Biometrics, Graduate School of Agriculture and Life Sciences, University of Tokyo, Yayoi 1-1-1, Bunkyo-ku, Tokyo 113-8657, Japan. E-mail: kishino{at}wheat.ab.a.u-tokyo.ac.jp
![]() |
literature cited |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
Bickel, D. R. 2000. Implications of fluctuations in substitution rates: impact on the uncertainty of branch lengths and relative-rate tests. J. Mol. Evol. 50:381390.[ISI][Medline]
Cao, Y., M. Fujiwara, M. Nikaido, N. Okada, and M. Hasegawa. 2000. Interordinal relationships and the timescale of eutherian evolution as inferred from mitochondrial genome data. Gene (in press).
Felsenstein, J. 1989. PHYLIPphylogeny inference package (Version 3.2). Cladistics 5:164166.
Hasegawa, M., H. Kishino, and T. Yano. 1989. Estimation of branching dates among primates by molecular clocks of nuclear DNA which slowed down in Hominoidea. J. Hum. Evol. 18:461476.[ISI]
Hastings, W. K. 1970. Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57:97109.
Huelsenbeck, J. P., B. Larget, and D. L. Swofford. 2000. A compound Poisson process for relaxing the molecular clock. Genetics 154:18791892.
Huelsenbeck, J. P., and B. Rannala. 1997. Maximum likelihood estimation of phylogeny using stratigraphic data. Paleobiology 23:174180.
Jukes, T. H., and C. R. Cantor. 1969. Evolution of protein molecules. Pp. 2132 in H. N. Munro, ed. Mammalian protein metabolism. Academic Press, New York.
Karlin, S., and H. M. Taylor. 1975. A first course in stochastic processes. 2nd edition. Academic Press, San Diego.
Korber, B., M. Muldoon, J. Theiler, F. Gao, R. Gupta, A. Lapedes, B. H. Hahn, S. Wolinsky, and T. Bhattacharya. 2000. Timing the ancestor of the HIV-1 pandemic strains. Science 288:17891796.
Metropolis, N., A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller. 1953. Equations of state calculations by fast computing machines. J. Chem. Phys. 21:10871092.[ISI]
Rambaut, A., and L. Bromham. 1998. Estimating divergence dates from molecular sequences. Mol. Biol. Evol. 15:442448.[Abstract]
Sanderson, M. J. 1997. A nonparametric approach to estimating divergence times in the absence of rate constancy. Mol. Biol. Evol. 14:12181232.
Thorne, J. L., H. Kishino, and I. S. Painter. 1998. Estimating the rate of evolution of the rate of molecular evolution. Mol. Biol. Evol. 15:16471657.
Yang, Z., and B. Rannala. 1997. Bayesian phylogenetic inference using DNA sequences: a Markov chain Monte Carlo method. Mol. Biol. Evol. 14:717724.[Abstract]
Yoder, A. D., and Z. Yang. 2000. Estimation of primate speciation dates using local molecular clocks. Mol. Biol. Evol. 17:10811090.