*Max-Planck-Institut für biophysikalische Chemie, Göttingen, Germany; and Max-Planck-Institut für evolutionäre Anthropologie, Leipzig, Germany
![]() |
Abstract |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
![]() |
Introduction |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
In general, any quartet-based evaluation method of phylogenetic signal is implementable in the setup of LM. We coin the general principle quartet-mapping. In this paper, we extend the LM procedure by combining its principle with the method of statistical geometry in sequence space (Eigen, Winkler-Oswatitsch, and Dress 1988
). Both LM and its extension to statistical geometry return the coordinates of a point which is mapped into the same graph. Applied to the same sequences, the two methods can therefore be used to directly compare the positions of the points and consequently the proposed topologies of the sequences. One should, however, notice a subtle difference between the two approaches. LM is built on the foundation of a stochastic model of sequence evolution, and thus the barycentric coordinates have the statistical interpretation of posterior probabilities. With the extension of this approach to statistical geometry in sequence space or maximum parsimony, for example, a clear statistical interpretation of the coordinates is lacking. However, the statistical properties of the two quartet-mapping procedures with respect to their ability to reconstruct the true evolutionary history have not been investigated so far. Although in numerous investigations it has been shown that maximum likelihood (ML) is generally robust even when evolution assumptions are violated in the reconstruction model (Gaut and Lewis 1995
; Huelsenbeck 1995
; Yang 1996
), its performance in the LM framework is not clear. One explicit question is that of support for the correct topology, especially for the star tree. In principle, a likelihood ratio test can be constructed for the star tree versus a tree phylogeny from the ML estimates (Churchill, von Haeseler, and Navidi 1992
; Strimmer, Goldman, and von Haeseler 1997
; Yang and Rannala 1997
; Whelan and Goldman 1999
; Ota et al. 2000
). However, it has been shown that under the null hypothesis, the likelihood ratio test's statistics deviate from the usual
2 approximation (Goldman 1993
; Gaut and Lewis 1995
), and thus the test is unreliable. The bootstrap test as introduced into the phylogenetic world by Felsenstein (1985) tends to support the tree favored by the tree-making method (Zharkikh and Li 1992
). Another test, called the interior-branch test, computes the standard error of estimates of edge lengths (Nei, Stephens, and Saitou 1985
; Li 1989
; Rzhetsky and Nei 1992
). Here, the emphasis of the test lies on the assessment of the significance of a positive branch length by means of a normal deviate test. Both the bootstrap test and the interior branch test have been shown to become conservative for starlike trees (Sitnikova 1996
).
In this paper, we use the quartet-mapping approach of ML and statistical geometry to investigate the following two questions: (1) Which tree shapes occur for which evolution models? (2) Given that the tree shape is correct, when is an interior branch length reliably positive? For these investigations, we use two approaches: We simulate sequence data evolving along a wide range of tree models under a broad range of parameter settings, including differences in the ratio of transition to transversion substitutions, discrete gamma-distributed site variation, unequal base composition, and unequal rate matrix entries. From the sequences, the simplex points are computed using the ML estimates as well as the statistical geometry parameters. We compare the positions of the points in the simplex relative to each other, and we compare the points with the true points as obtained from the tree models.
In addition, we compute the expected coordinates of the simplex point obtained from the statistical geometry parameters using the Jukes-Cantor (Jukes and Cantor 1969
) and the Kimura (1980)
two-parameter substitution models.
We investigate the behavior of LM and its extension to statistical geometry under violation of the evolutionary model. We also compare the efficiencies of both methods in distinguishing between starlike trees and fully resolved trees. We investigate under which circumstances a test for a positive interior branch length can be constructed without extensive statistical and computational efforts.
![]() |
Materials and Methods |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
|
![]() |
|
|
|
Let us exemplify the quartet-mapping approach in more detail for the method of statistical geometry in sequence space (Eigen, Winkler-Oswatitsch, and Dress 1988
). This method computes the sequence space graph of quartets of aligned sequences by analyzing the positionwise entries of the sequences. Let A, B, C, and D be a quartet of aligned sequences of length L, whose characters are drawn from an alphabet
of finite length. For nucleotide sequences, there are 15 possible position categories
j:
|
For a given quartet of sequences, one sums the number of positions dj that belong to j. Note that the sum of the 15 dj values is equal to L, the length of the alignment. The underlying topology of divergence of the quartet is obtained from the "informative" categories. A position category is considered informative if it helps to distinguish between different tree shapes. For statistical geometry, the configurations
6, ... ,
14 are informative. Configurations
6,
9, and
14 support tree T1,
7,
10, and
13 support tree T2; and
8,
11, and
12 support tree T3.
We define the supports GM1,
GM2, and
GM3 of the three perfect tree topologies T1, T2, and T3, respectively, as
|
An extension of the method of statistical geometry takes a weight matrix M of the alphabet into account (unpublished data). Each entry in the matrix M gives a dissimilarity measure between the two respective symbols of the alphabet. Each position in the alignment of the four sequences A, B, C, and D is then analyzed separately as follows: Let Al, Bl, Cl, and Dl be the entries at position l of A, B, C, and D, respectively. Then the four entries define six pairwise distances, M(Al, Bl), M(Al, Cl), M(Al, Dl), M(Bl, Cl), M(Bl, Dl), and M(Cl, Dl), and three distance sums, M(Al, Bl) + M(Cl, Dl), M(Al, Cl) + M(Bl, Dl), and M(Al, Dl) + M(Bl, Cl). We order these three distance sums by magnitude and label them maxl, medl, and minl, with maxl medl
minl. The six pairwise distances together with maxl are used to compute
|
|
|
Since LM and GM are based on the same graph, it is now possible to directly compare various statistical properties of these two tree evaluation methods.
Comparison of Likelihood-Mapping and Geometry-Mapping
We conducted both a theoretical analysis and computer simulations of four-taxon trees, in which we sampled a large proportion of the complete parameter space. The model trees are shown in figure 3
. Note that the branch lengths are expected numbers of nucleotide substitutions per site.
|
For the simulations, various cases of the HKY model (Hasegawa, Kishino, and Yano 1985
) and a fully reversible REV model (Yang 1994
) were assumed for the sequence evolution. The cases used, together with the model trees, the branch lengths, and the models assumed in the quartet reconstruction, are listed in table 1
. For the fully reversible model, we used the following the "instantaneous nucleotide substitution rate" matrix Q:
|
|
|
LM was used as implemented in PUZZLE, version 3.0 (Strimmer and von Haeseler 1996
). For the analyses of the sequences, the HKY model with the default option of no site variation was used as the reconstruction model for all simulations. Also prespecified was the transition/transversion parameter
according to the model used. Thus, in the presence of a gamma distribution of the sites, the reconstruction model leads to a violation of the assumption. Also, when sequences were generated under a fully reversible model, REV, the assumption that all transitions and all transversions had equal substitution rates was violated in the reconstruction process.
GM was used as implemented in STATGEOM, version 2.0. For all simulation models with a transition/transversion parameter = 4.0 (where
= 2s/v; see appendix C) the weighted GM method with a weight matrix M of the alphabet taking this
ratio into account was used. In the other cases, the unweighted GM method was applied.
In the star topology simulations, the percentage of simplex points placed (correctly) into the star area, as well as the percentage of points mapped (incorrectly) into either of the three tree areas, was recorded.
For the clock and the nonclock trees the percentage of simplex points mapped into the correct tree area was computed.
![]() |
Results |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
Given the outer branch lengths a and b, the coordinates of the simplex point can be used to calculate the area into which the simplex point is mapped as a function of the inner branch length c.
In the following sections, we will separately analyze the star tree, i.e., the case in which the expected inner branch length c is equal to zero, and the trees with positive inner branch lengths.
Success in Predicting the Star Tree
Figure 5
shows the results of quartet-mapping for the star tree when HKY substitution models or a general reversible model was simulated. The gray scale is shown in figure 4
. In the diagram, the shades in column 1 represent the percentages of quartets that are in the star tree area, while in columns 24, the shades show the percentages of quartets that are in tree areas 1, 2, and 3, respectively. We note that the probability of detecting the star tree T is virtually independent of the choice of a and b. GM has a better chance of correctly identifying the star topology. LM, on the other hand, is too liberal in accepting a binary tree. Table 2
shows the average percentages of quartets over all 225 parameter combinations of branch lengths a and b that are in the seven simplex areas T
, T1, T2, T3, T12, T13, and T23, respectively. While GM shows almost constant values for all evolution models, the values of LM are quite different for the different evolution models. If the model of sequence evolution is violated in the reconstruction process, then LM has the tendency to favor the wrong topology (%T2 > %T1 and %T3, and %T2 > %T
). This tendency is more pronounced if a << b or vice versa. GM suffers from this tendency only for very small values of a or b.
|
|
Success in Predicting the Binary Tree
We first analytically computed the areas into which the simplex point of GM was mapped for the parameter spaces of the clock and the nonclock trees. We used the Jukes-Cantor and the Kimura two-parameter evolutionary models. In the latter case, we compared the properties of the unweighted GM method with those of the weighted GM + M method.
The results of figure 6a show that for the clock tree, depending on the branch length combinations, GM suggests either the correct tree or the star topology. If the weighted GM method is applied, then the T1 region is significantly enlarged. For the nonclock tree (fig. 6b ), the set of branch length combinations for the true tree area is smaller than that for the clock tree. For most parameter values, GM suggests either the T12 topology or the star tree. For a very small range of branch length combinations, the simplex point is even mapped into the "inconsistent" tree area T2 = T(AC | BD).
|
|
We conclude that the success and the stability of LM is virtually independent of the assumed evolution model. The branch length combinations for which LM reconstructs the true tree cover a large part of the parameter space. The results of the GM analyses, on the other hand, show a clear dependency of success in reconstructing the true tree on the tree shape rather than the evolution model.
The simulations and the analytical considerations indicate a relationship between the lengths of the outer branches, a and b, and that of the inner branch, c, that determines into which simplex area the GM point is mapped. From the calculations in appendix D, we find that for the Jukes-Cantor model, the sum of the two outer branch lengths has to be smaller than a threshold value as determined in equation (15)
such that there exists a positive minimal inner branch length c, given by equation (14) , for which the corresponding simplex point is mapped into the correct tree area:
![]() |
A similar threshold for c is, of course, conceivable for more complex models of sequence evolution. To estimate that minimal value, cmin, we computed the location of s as a function of c for three examples with fixed outer branch lengths: a = 0.1 for clock tree I, and a = 0.01, b = 0.1 and a = 0.005, b = 0.1, respectively, for nonclock tree II. c was increased in steps of 0.005 starting from c = 0.005. Based on 200 simulations for a fixed c, the vector (c) was computed, and the minimal value cmin, for which
(c) was mapped into the correct tree area T1 was determined. Moreover, we computed the minimal inner edge length c0.95min such that at least 95% of the simulated s vectors were in T1. The results are shown in table 3
.
|
Second, cmin as computed by the GM method is in all cases larger than cmin of LM and changes considerably with the evolution models. It ranges between 0.02 (Jukes-Cantor) and 0.55 (Kimura two-parameter + (0.2)). We note too that the differences between cmin and c0.95min are larger for the GM method than for the LM implementation.
We conclude that GM is successful in reconstructing the correct tree when the following three conditions are fulfilled: (1) the sequences evolve without a severe site variation; (2) the total sum of the outer branch lengths is less than 0.9, and thus the mean length of one outer branch does not exceed 0.225; and (3) the interior branch length is larger than a minimal value that depends on the lengths of the outer branches.
Applications
Here we illustrate the two methods for two data sets. The first data set consisted of 24 mitochondrial sequences (11,133 bp) from three eutherian classes (18 eutheria, 2 marsupialia, and 1 monotremata) and three other noneutherian sequences (chicken, lungfish, and frog). The three codon positions were analyzed separately. For all codon positions, LM placed most of the quartets into the region of tree T1 (fig. 8a
), thus suggesting a branching pattern of eutheria and noneutheria versus marsupialia and monotremata. GM, on the other hand, mapped all quartets into the star tree area, suggesting an undefined branching pattern (fig. 8b
).
|
|
![]() |
Discussion |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
Let us point out that the general idea of quartet-mapping is to reduce the probability that a wrong tree is reconstructed. A tree is recognized only if the simplex point is mapped into one of the three small trapezia (see fig. 1 ). The main conclusion of this paper is that the shape of the tree, rather than the substitution model, determines how reliably and successfully ML and statistical geometry map the simplex point into the correct topology area.
Let us discuss the two methods separately. In all analyzed cases, LM appears to be too liberal in suggesting a tree when the underlying topology is a star tree or very close to it. This bias is especially pronounced if the reconstruction model does not take rate heterogeneity among sites into account. In the heterogeneous case, as well as the REV case, the binary tree which brings the long edges together is favored more often than the other two binary trees. The ultimate reason for this phenomenon needs further investigation, but it has been observed in other studies that when the data violate the assumptions of the analysis model, the likelihood ratio test provides strong support for the tree separating the long edges from the short edges (Gaut and Lewis 1995
).
LM appears to be robust and successful in reconstructing the correct tree over much of the whole parameter space even when simple evolution models are used instead of more complex (and correct) ones. This agrees with robustness results reported elsewhere (Gaut and Lewis 1995
; Huelsenbeck 1995; Yang 1996).
Our results show that GM allows for consistent topology reconstruction if the inner branch length is longer than a threshold value. This threshold value depends critically on the tree shape and to some degree on the substitution model. However, there exists an upper limit for the sum of the outer branches for which the interior branch length has to be infinitely long such that the correct tree is reconstructed. Thus, in a major part of the parameter space, GM suggests a star tree. On the other hand, only a small range of parameters leads to an inconsistent tree reconstruction, i.e., for which GM suggests the incorrect tree. Thus, by using the quartet-mapping approach, i.e., by considering the seven quartet graphs rather than only the three binary trees, the Felsenstein zone is significantly reduced. The large region of the star tree, whose area is equal to the sum of the three areas of the three perfect trees, is the reason that GM suggests a star tree in a large part of the parameter space. GM evaluates the absolute number of positions, and, in addition, it does not take the information of the outer branch lengths into account. Thus, the level of "noise" is quickly approached, such that a tree can no longer be extracted from the sequence alignment. It is interesting to note here that GM allows for consistent tree reconstruction even if the molecular clock is violated, as long as the threshold relation of the branch lengths is satisfied. This is in a way surprising, since statistical geometry, similarly to maximum parsimony, makes no explicit assumption about the underlying process of evolution.
One could now ask if the range of branch lengths for which GM reconstructs the true tree can be enlarged. One source of noise that destroys the reconstruction of the true tree in the statistical-geometry approach of the sequence space approach is the use of the positions in which only one pair has equal characters. We therefore used a second, modified, definition of the coordinates of the simplex point by taking merely the "parsimonious informative" positions (Swofford and Olsen 1990
), i.e., those in which two pairs have equal characters. These are the position categories
6
8 in expression (3)
. Then, the three coordinates of the simplex vector are given by sMP1 = d6/(d6 + d7 + d8), sMP2 = d7/(d6 + d7 + d8), and s3MP = d8/(d6 + d7 + d8). Note that this is the quartet-mapping implementation of the maximum-parsimony method (Farris, Kluge, and Eckhardt 1970
; Fitch 1971
). Assuming the Jukes-Cantor model, we computed the expected coordinates sMPi for the clock tree and the nonclock tree. Figure 10
shows the comparison of the predicted areas using the geometry implementation and the parsimony implementation. Indeed, we find a significantly larger threshold line for the T1 area for the clock tree and an enlarged T1 area for the nonclock tree. A similar computation for the threshold relation (cf. eqs. 13, 14, and 15
) now gives
|
|
When evaluating a reconstructed tree topology, one of the biggest difficulties is in deciding when the length of an interior branch is equal to zero. From the star tree simulations, we find that ML as implemented in LM suggests a binary tree too often. On the other hand, the domain of the parameter space in which GM reconstructs a star tree is too large. We therefore propose the following procedure to test for a positive interior branch length using a combination of LM, GM, and ML. For a quartet of sequences, both LM and GM are applied. Then, we distinguish four cases:
Let us look at the example in figure 8
. Here, most of the simplex points of LM are mapped into the tree area T1, whereas the simplex points of GM are all placed into the star tree area. From an additional tree reconstruction now applied to all positions using the ML method as implemented in PUZZLE (Strimmer and von Haeseler 1996
), we computed the five mean branch lengths: â1 = 0.31, â2 = 0.21,
1 = 0.40, and
2 = 0.21 for the outer branches and
= 0.06 for the inner branch length. Then, â = 0.26,
= 0.31, and a + b = 0.57. The sum of â and
is below the threshold of equation (15)
. Therefore, there exists a positive minimal inner branch length cmin
0.36 given by equation (14) for which GM reconstructs a tree. The value
of the inner edge separating the four groups, however, is smaller than cmin, the minimal value needed for GM to reconstruct a tree. Thus, a definite conclusion about the phylogeny of these four groups cannot be reached.
In this paper, we investigated the statistical properties of LM and GM only for model trees with four taxa. It has been shown that for trees with more than four species, LM is also too liberal to accept a bifurcating tree even if the true topology is a star tree (Nieselt-Struwe 1998
). Nieselt-Struwe (1998)
also showed that this tendency cannot be improved by increasing the sequence length.
All models of nucleotide substitution used in the simulations in this paper are Markov models and assume that evolution is independent and identical at each site and along each lineage. This assumption, of course, may be quite wrong. The work of Schöniger and von Haeseler (1995)
and others on site-dependent models has shown that, indeed, ML has an enlarged Felsenstein zone if sequence sites do not evolve independently (Muse 1995
; Rzhetsky 1995; Schöniger and von Haeseler 1995
; Tillier and Collins 1995
). Investigations of models violating these assumptions should provide another insight into the stability of the ML method and statistical geometry in sequence space.
We conclude that quartet-mapping is a powerful tool with which to test and compare the success of tree evaluation methods based on quartets of taxa. It is especially a very efficient method for a quick evaluation of zero-length branches in trees. However, we are aware that this procedure cannot replace bootstrapping or other evaluation methods. The statistical properties that have been studied in this work can be supplemented by the addition of a bootstrapping procedure to assess the spreading of the point cloud. Work on this question is in progress.
![]() |
Acknowledgements |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
![]() |
Footnotes |
---|
1 Keywords: likelihood-mapping
maximum likelihood
statistical geometry
phylogenetic content
edge test
2 Address for correspondence and reprints: Arndt von Haeseler, Max-Planck-Institut für evolutionäre Anthropologie, Inselstrasse 22, D-04103 Leipzig, Germany. E-mail: arndt{at}eva.mpg.de
![]() |
References |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
Churchill G., A. von Haeseler, W. Navidi, 1992 Sample size for a phylogenetic inference Mol. Biol. Evol 9:753-769[Abstract]
Eigen M., R. Winkler-Oswatitsch, A. Dress, 1988 Statistical geometry in sequence space: a method of comparative sequence analysis Proc. Natl. Acad. Sci. USA 85:5913-5917[Abstract]
Farris J., A. Kluge, M. Eckardt, 1970 A numerical approach to phylogenetic systematics Syst. Zool 19:172-191[ISI]
Felsenstein J., 1985 Confidence-limits on phylogeniesan approach using the bootstrap Evolution 39:783-791[ISI]
Fitch W., 1971 Toward defining the course of evolution: minimum change for a specific tree topology Syst. Zool 20:406-416[ISI]
Gaut B., P. Lewis, 1995 Success of maximum likelihood phylogeny inference in the four-taxon case Mol. Biol. Evol 12:152-162[Abstract]
Goldman N., 1993 Statistical tests for DNA substitution J. Mol. Evol 34:183-198
Hasegawa M., H. Kishino, T. Yano, 1985 Dating the human-ape splitting by a molecular clock of mitochondrial DNA J. Mol. Evol 22:160-174[ISI][Medline]
Huelsenbeck J., 1995 The robustness of two phylogenetic methods: four-taxon simulations reveal a slight superiority of maximum likelihood over neighbor joining Mol. Biol. Evol 12:843-849[Abstract]
Jukes T., C. Cantor, 1969 Evolution of protein molecules Pp. 21132 in H. Munro, ed. Mammalian protein metabolism. Academic Press, New York
Kimura M., 1980 A simple method for estimating evolutionary rate of base substitution through comparative studies of nucleotide sequences J. Mol. Evol 16:111-120[ISI][Medline]
Kuiken C. L., K. Nieselt-Struwe, G. F. Weiller, J. Goudsmit, 1994 Quasispecies behavior of human immunodificiency virus type 1: sample analysis of sequence data Pp. 100119 in K. W. Adolph, ed. Molecular Virology Techniques Part A. Academic Press, New York
Li W.-H., 1989 A statistical test of phylogenies estimated from sequence data Mol. Biol. Evol 6:424-435[Abstract]
. 1997 Molecular Evolution Sinauer Ass., Inc. Publishers, Sunderland, Mass., USA
Muse S., 1995 Evolutionary analysis of DNA sequences subject to constraint on secondary structure Genetics 139:1429-1439
Nei M., J. Stephens, N. Saitou, 1985 Methods for computing the standard errors of branching points in an evolutionary tree and their application to molecular data from humans and apes Mol. Biol. Evol 2:66-85[Abstract]
Nieselt-Struwe K., 1998 From likelihood-mapping to quartet-mapping, a new sequence analysis tool Pp. 1322 in M. K. Uyenoyama and A. von Haeseler, eds. Proceedings of the Trinational Workshop on Molecular Evolution. Duke University Publication Group, Duke University, Durham, N.C
Ota R., P. Waddell, M. Hasegawa, H. Shimodaira, H. Kishino, 2000 Appropriate likelihood ratio tests and marginal distributions for evolutionary tree models with constraints on parameters Mol. Biol. Evol 17:798-803
Rzhetsky A., 1995 Estimating substitution rates in ribosomal RNA genes Genetics 141:771-783
Rzhetsky A., M. Nei, 1992 A simple method for estimating and testing minimum-evolution trees Mol. Biol. Evol 9:945-967
Saitou N., M. Nei, 1986 The number of nucleotides required to determine the branching order of three species, with special reference to the human-chimpanzee-gorilla divergence J. Mol. Evol 24:189-204[ISI][Medline]
Schöniger M., A. von Haeseler, 1995 Performance of the maximum likelihood, neighbor joining and maximum parsimony methods when sequence sites are not independent Syst. Biol 44:533-547[ISI]
Sitnikova T., 1996 Bootstrap method of interior-branch test for phylogenetic trees Mol. Biol. Evol 13:605-611[Abstract]
Strimmer K., N. Goldman, A. von Haeseler, 1997 Bayesian probabilities and quartet puzzling Mol. Biol. Evol 14:210-211
Strimmer K., A. von Haeseler, 1996 Quartet puzzling: a quartet maximum-likelihood method for reconstructing tree topologies Mol. Biol. Evol 13:964-969
. 1997 Likelihood-mapping: a simple method to visualize phylogenetic content of sequence alignment Proc. Natl. Acad. Sci. USA 94:6815-6819
Swofford D., G. Olsen, 1990 Phylogeny reconstruction Pp. 411501 in D. Hillis and G. Moritz, eds. Molecular systematics. Sinauer, Sunderland, Mass
Tillier E., R. Collins, 1995 Neighbor joining and maximum likelihood with RNA sequences: addressing the interdependence of sites Mol. Biol. Evol 12:7-15
Whelan S., N. Goldman, 1999 Distribution of statistics used for the comparison of models of sequence evolution in phylogenetics Mol. Biol. Evol 16:1292-1299
Yang Z., 1994 Estimating the pattern of nucleotide substitution J. Mol. Evol 39:105-111[ISI][Medline]
. 1996 Phylogenetic analysis using parsimony and likelihood methods J. Mol. Evol 42:294-307[ISI][Medline]
Yang Z., B. Rannala, 1997 Bayesian phylogenetic inference using DNA sequences: a Markov chain Monte Carlo method Mol. Biol. Evol 14:717-724[Abstract]
Zharkikh A., W.-H. Li, 1992 Statistical properties of bootstrap estimation of phylogenetic variability from nucleotide sequences: II Four taxa without a molecular clock. J. Mol. Evol 35:356-366[ISI][Medline]