Quartet-Based Phylogenetic Inference: Improvements and Limits

Vincent Ranwez and Olivier Gascuel

Département Informatique Fondamentale et Applications, Montpellier, France


    Abstract
 
We analyze the performance of quartet methods in phylogenetic reconstruction. These methods first compute four-taxon trees (4-trees) and then use a combinatorial algorithm to infer a phylogeny that respects the inferred 4-trees as much as possible. Quartet puzzling (QP), which appears to be widely used, is one of the few methods able to take into account the weights of the 4-trees, which are inferred by maximum likelihood. We present weight optimization (WO), a new algorithm which is also based on weighted 4-trees. WO is faster and offers better theoretical guarantees than QP. Moreover, computer simulations indicate that the topological accuracy of WO is less dependent on the shape of the correct tree. However, although the overall performance of WO is better than that of QP, it is still less efficient than traditional phylogenetic reconstruction approaches based on pairwise evolutionary distances or maximum likelihood. This is likely related to long-branch attraction, a phenomenon to which quartet methods are very sensitive, and to inappropriate use of the initial results (weights) obtained by maximum likelihood for every quartet.


    Introduction
 
The maximum-likelihood method (Felsenstein 1981) is widely used to infer molecular phylogenies. It has sound statistical foundations and performs well in computer simulations. Unfortunately, the computing time quickly becomes unacceptable as the number of taxa (n) increases. When n is greater than about a dozen, it is much faster to use maximum likelihood to analyze each subset of four taxa than to use it to directly infer the evolutionary history of all these taxa. By combining four-taxon phylogenies (4-trees) obtained by maximum likelihood, quartet methods try to tap the strength of maximum likelihood within a reasonable computing time.

The simplest quartet approach involves keeping, for each group of four taxa (quartet), the most likely 4-tree and then seeking the complete phylogeny which contains as many of these 4-trees as possible. Methods based on this principle (Bandelt and Dress 1986; Jiang, Kearney, and Li 1998; Berry and Gascuel 2000) are interesting due to their good theoretical properties. They do not, however, account for the fact that some 4-trees are more reliable than others, and they implicitly give the same importance to all inferred 4-trees.

Several different approaches have been described for incorporating reliability into quartet-based phylogenetic analysis. In Erdös et al. (1997), 4-trees found to be unreliable are rejected and not used in phylogeny reconstruction. In Berry et al. (1999), two different 4-trees on the same quartet are used when a single 4-tree is unreliable. Finally, methods like quartet puzzling (QP) (Strimmer, Goldman, and von Haeseler 1997) and that of Willson (1999a) preserve the three possible 4-trees for each quartet and associate with each a weight proportional to the confidence placed in it. By using maximum likelihood to weight 4-trees, these algorithms seem to follow a reasonable approach for inferring a tree having a high likelihood on n taxa.

However, inference of phylogenies containing only four taxa is sometimes contested, which questions the very principle of quartet methods. The sampling of taxa preceding phylogenetic reconstruction strongly influences the inferred phylogeny. If one seeks to establish the phylogenetic relationship between four groups of taxa by using a single representative for each of these groups, the result generally depends on which representatives are selected. This problem was studied by Philippe and Douzery (1994), who, based on 4-trees inferred by a parsimony method, concluded: "Reconstructing history with only four taxa is rather a game of chance." Adachi and Hasegawa (1999) extended this result to 4-trees inferred by maximum likelihood and concluded: "As Philippe and Douzery showed, it is now clear that an argument based on a quartet analysis of a single gene is very dangerous." However, it seems possible to overcome this difficulty, since in published simulations (Strimmer and von Haeseler 1996; Strimmer, Goldman, and von Haeseler 1997) the performance of QP is close to or even better than that of maximum likelihood. This gives rise to the following questions: Is it possible to improve QP? If so, how do these new quartet methods perform compared with other classical phylogenetic reconstruction methods?

In this article, we present weight optimization (WO), a new algorithm which also uses weighted 4-trees inferred by maximum likelihood. WO searches for the tree on n taxa such that the sum of the weights of the 4-trees induced by this tree is maximal. Finding the optimal tree in this sense is computationally difficult (Steel 1992), like most optimization tasks in phylogenetic inference (Swofford et al. 1996), and thus WO is based on a heuristic approach which only guarantees finding a near-optimal tree. However, we shall see that its performance in optimizing this criterion is good enough and that better optimization does not improve the topological accuracy. We start by describing QP, and then we describe and demonstrate the properties of WO. Using computer simulations, we then compare the topological accuracies of QP and WO with those of other common phylogenetic reconstruction methods. WO significantly improves on QP, but, as discussed, the performance of these quartet methods is rather disappointing. Finally, we describe factors which seem to limit the topological accuracy of quartet methods. The implementation of WO, which uses tools of general interest in phylogenetic reconstruction, is described in the appendix.


    Methods
 
First, we define the terms employed in this article and the corresponding notation. Then, we describe QP and WO and study their properties.

Definitions and Notation
A tree is a set of nodes connected by branches (or edges) so that there is a single path connecting any two nodes. The degree of a node is the number of branches attached to it. The nodes with degree 1 (the leaves) are labeled by taxa; other nodes are called internal nodes. If all internal nodes have degree 3, then we say that the tree is fully resolved; if some have a higher degree, we say that the tree is partially resolved.

Removing a branch in a phylogeny separates taxa into two disjoint subsets such that their union equals the complete set of taxa. The branch is said to define the bipartition (or split) formed from the two subsets. When a bipartition separates two taxa from all the others, the pair of taxa is called an external pair. Let xy | zt denote the 4-tree that separates taxa x and y from taxa z and t. A quartet is a set of four taxa; for each quartet {x, y, z, t}, there are three possible 4-trees: xy | zt, xz | yt, and xt | yz. When an n-tree T contains a bipartition which separates taxa x and y from taxa z and t, T is said to induce the 4-tree xy | zt, and the 4-tree xy | zt is said to be consistent with T.

Let the pair (q, w) denote the weighted 4-tree (w4-tree) which associates a weight w with the 4-tree q. Let Q denote the set of weighted 4-trees used as the starting point for QP or WO. Q contains three w4-trees for each quartet. Qmax is the set derived from Q which, for each quartet, contains only the 4-tree having the maximum weight (among the three). In the unweighted case, for each quartet, one 4-tree has weight 1 and the two others have weight 0. The set of 4-trees induced by a tree T is denoted QT.
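As a concrete illustration of these definitions, deriving Qmax from Q amounts to a per-quartet maximum. A minimal sketch (the quartets and weights below are hypothetical):

```python
# Q maps each quartet to its three w4-trees (topology, weight).
# The quartets and weights below are hypothetical.
Q = {
    frozenset("1234"): [("12|34", 0.6), ("13|24", 0.3), ("14|23", 0.1)],
    frozenset("1235"): [("12|35", 0.2), ("13|25", 0.5), ("15|23", 0.3)],
}

def q_max(Q):
    # For each quartet, keep only the 4-tree having the maximum weight.
    return {quartet: max(w4trees, key=lambda qw: qw[1])
            for quartet, w4trees in Q.items()}
```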

After assigning weights to the 4-trees, quartet methods search for the tree T which fits them best. A natural measurement of this fit is the sum of the weights of the 4-trees induced by T. Thus, we search for the tree T which maximizes the W criterion, defined as

W(T) = Σ_{(q, w) ∈ Q : q ∈ QT} w.    (1)
However, finding T defined as such is NP-hard (Steel 1992) and can require an exponentially long computing time. Therefore, we have to use heuristic algorithms (such as QP or WO) to find a near-optimal tree within a reasonable amount of computing time.

Quartet Puzzling
To infer the phylogeny of n sequences, QP proceeds in three stages: (1) it uses the maximum-likelihood principle to weight all possible 4-trees; (2) based on these weights, it constructs a large number of n-trees; and (3) it computes the consensus tree of these n-trees.

Given a quartet, the likelihoods l1, l2, and l3 of the three associated 4-trees are used to estimate their respective probabilities p1, p2, and p3 of being the correct 4-tree. These probabilities are evaluated using Bayes' theorem as (Strimmer, Goldman, and von Haeseler 1997)

pi = li / (l1 + l2 + l3),  i = 1, 2, 3.
Strimmer, Goldman, and von Haeseler (1997) propose three different ways to use these probabilities to weight 4-trees. In the continuous case, the probabilities are directly used as weights, i.e., wi = pi. In the binary (unweighted) case, the weight of the 4-tree with the highest probability is set to 1, and the weights of the two others are set to 0. In the discrete case, the three wi's are discrete, least-squares approximations of the pi's. Assuming that p1 >= p2 >= p3, the weights (w1, w2, w3) are approximated by (1, 0, 0), (1/2, 1/2, 0), or (1/3, 1/3, 1/3). For example, if p1 = 0.5, p2 = 0.45, and p3 = 0.05, then w1 = 1/2, w2 = 1/2, and w3 = 0, indicating that it is not clear which of the first two 4-trees is the correct one but that the third one is certainly incorrect.
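The discrete weighting can be viewed as a nearest-vector search among the three admissible weight profiles. The sketch below (a minimal illustration, not the PUZZLE implementation) reproduces the example above:

```python
# Admissible discrete weight vectors for sorted probabilities p1 >= p2 >= p3.
CANDIDATES = [(1.0, 0.0, 0.0), (0.5, 0.5, 0.0), (1/3, 1/3, 1/3)]

def discrete_weights(p1, p2, p3):
    # Return the candidate vector closest to (p1, p2, p3) in the
    # least-squares sense; probabilities are assumed already sorted.
    probs = (p1, p2, p3)
    return min(CANDIDATES,
               key=lambda w: sum((wi - pi) ** 2 for wi, pi in zip(w, probs)))
```

For instance, discrete_weights(0.5, 0.45, 0.05) yields (0.5, 0.5, 0.0), matching the example in the text.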

The next stage uses these weights on 4-trees to construct a collection of n-trees. To construct a single n-tree, QP follows an addition scheme. It first randomly determines an order among the taxa. The taxa are then added one by one to a partially constructed tree until a complete n-tree containing all taxa is obtained. Suppose that the taxa are denoted as 1, 2, ..., n, and that the addition order is (1, 2, 3, ..., n). The 4-tree of maximum weight among the three which resolve {1, 2, 3, 4} is used as the starting point. At step i, the current tree contains the taxa of Si = {1, 2, 3, ..., i - 1}, and QP seeks to add the taxon i. To determine the branch on which taxon i is added, QP proceeds as follows. First, with each branch of the current partially constructed tree, it associates a penalty which is initialized to 0. Then, it considers all w4-trees of type (xi | yz, w) with x, y, and z in Si, which are the w4-trees "relevant" to the addition of i. For each of these w4-trees, QP adds the penalty w to the branches of the path connecting y to z. The taxon i is then added onto the least penalized branch.

Figure 1 shows how, starting from a set of w4-trees, QP determines the branch of 12 | 34 on which taxon 5 is added. For example, to take the w4-tree (51 | 23, 0.4) into account, QP adds a penalty of 0.4 on branches belonging to the path (2, 3). Indeed, if taxon 5 is added on one of these branches, the 4-tree 51 | 23 is not induced by the inferred tree. Note, however, that adding taxon 5 on the branch connected to taxon 4 also leads to a tree construction inconsistent with 51 | 23 and that QP does not penalize this branch.
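The penalty step can be sketched on this five-taxon configuration. The tree 12 | 34 has branches e1–e4 leading to taxa 1–4 and an internal branch e0. Only the weight 0.4 of 51 | 23 comes from the text; the remaining weights are hypothetical and, unlike those of figure 1, they happen to let QP pick the correct branch:

```python
# Paths between leaves of the tree 12|34, as sets of branches.
PATH = {
    frozenset("12"): {"e1", "e2"},
    frozenset("34"): {"e3", "e4"},
    frozenset("13"): {"e1", "e0", "e3"},
    frozenset("14"): {"e1", "e0", "e4"},
    frozenset("23"): {"e2", "e0", "e3"},
    frozenset("24"): {"e2", "e0", "e4"},
}

# Relevant w4-trees for adding taxon 5, written (x, yz, w) for (x5 | yz, w).
# Weights are hypothetical except (51 | 23, 0.4).
W4TREES = [
    ("1", "23", 0.4), ("2", "13", 0.3), ("3", "12", 0.3),
    ("1", "24", 0.4), ("2", "14", 0.3), ("4", "12", 0.3),
    ("1", "34", 0.4), ("3", "14", 0.3), ("4", "13", 0.3),
    ("2", "34", 0.4), ("3", "24", 0.3), ("4", "23", 0.3),
]

penalty = {b: 0.0 for b in ("e0", "e1", "e2", "e3", "e4")}
for x, yz, w in W4TREES:
    # Adding taxon 5 on the path (y, z) would contradict x5 | yz,
    # so QP penalizes every branch of that path by w.
    for branch in PATH[frozenset(yz)]:
        penalty[branch] += w

best = min(penalty, key=penalty.get)  # least penalized branch
```

With these weights the least penalized branch is e1 (penalty 1.8), the branch leading to taxon 1.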



Fig. 1.—WO and QP (based on continuous weighting) reconstruction procedure on a simple example with five taxa. It is assumed that the initial 4-tree is 12 | 34 for both algorithms. The w4-trees relevant for the addition of taxon 5 are grouped according to the quartet they resolve. For each quartet, the w4-tree of maximum weight is the first one. The final result provides the score of each branch, which corresponds to the sum of the scores set for each of the 12 relevant w4-trees. This score indicates the branch on which taxon 5 is added. WO, unlike QP, reconstructs the correct tree.

This basic principle is altered according to the weighting scheme. In the binary case, only weights of 1 are reported in the tree, because weights of 0 do not have any influence. In the discrete case, QP randomly selects one of the likely 4-trees associated with a given quartet and then uses the binary procedure. For example, if weights are (1/2, 1/2, 0), then some n-trees will be reconstructed based on the assumption that the correct 4-tree is the first one, and some others will be reconstructed on the assumption that it is the second one.

The n-tree that is constructed varies according to the taxon addition order. In order to overcome this problem, QP randomly defines several addition orders, constructs the corresponding n-trees, and returns their consensus as the final result. Strimmer and von Haeseler (1996) recommend that trees be constructed using as many different orders as possible and suggest that 1,000 is a reasonable value. Several consensus tree definitions exist. The current version of QP uses the "majority rule" consensus tree of Margush and McMorris (1981), whose bipartitions appear in more than half of the n-trees. Usually this consensus tree is partially resolved, and the simulation tests performed by Strimmer and von Haeseler (1996) and Strimmer, Goldman, and von Haeseler (1997) were done (personal communication) using the CONSENSE program from the PHYLIP package (Felsenstein 1989). This program usually provides a fully resolved tree with relatively high branch supports which systematically contains the bipartitions of the majority-rule consensus tree.
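The majority-rule consensus itself is easy to sketch once trees are encoded as sets of non-trivial bipartitions (each bipartition represented by one of its sides); the three input trees below are hypothetical:

```python
from collections import Counter

def majority_consensus(trees):
    # Keep every bipartition appearing in more than half of the trees.
    counts = Counter(split for tree in trees for split in tree)
    half = len(trees) / 2
    return {split for split, c in counts.items() if c > half}

# Three hypothetical 5-taxon trees, each given by its bipartitions.
trees = [
    {frozenset("12"), frozenset("45")},
    {frozenset("12"), frozenset("35")},
    {frozenset("12"), frozenset("45")},
]
```

Here the consensus keeps the bipartitions separating {1, 2} and {4, 5}, but drops the one separating {3, 5}, which appears in only one tree.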

Weight Optimization
WO uses the 4-tree weighting to dynamically define the taxon addition order. At each step, the selected taxon is added so as to generate the greatest increase of the W criterion (eq. 1). This does not guarantee that the optimal n-tree will be found, but this kind of "greedy" approach has proven effective in many optimization problems (Cormen, Leiserson, and Rivest 1992, pp. 329–356).

WO also uses continuous weighting based on likelihood, as introduced by Strimmer, Goldman, and von Haeseler (1997) and described above. WO gives branches bonuses rather than penalizing them. This new formulation has the advantage of clarifying the link with the W criterion. When adding taxon i, WO chooses the branch giving the tree with the largest W score (eq. 1). To this end, WO considers every relevant w4-tree (ix | yz, w) and adds a bonus w to each branch of the current tree such that the addition of i on this branch constructs a tree inducing ix | yz. Once all relevant w4-trees have been taken into account, the bonus of any branch b with the addition of taxon i is equal to the increase ΔW(b, i) of the W criterion induced by this addition. For the w4-tree (ix | yz, w), the branches receiving a bonus are determined as follows. There is a single internal node belonging to the three paths (x, y), (x, z), and (y, z), called the median of x, y, z (Barthélemy and Guénoche 1990, p. 57). This internal node is attached to three branches and thus defines three disjoint subtrees denoted as Tx, Ty, and Tz, such that x ∈ Tx, y ∈ Ty, and z ∈ Tz (fig. 2). If taxon i is added on a branch of Tx, then the path (i, x) has its branches within Tx, while the path (y, z) has all its branches within Ty ∪ Tz. This ensures that paths (i, x) and (y, z) do not intersect and, consequently, that the constructed tree induces ix | yz. WO thus adds the bonus w onto the branches of Tx, which is equivalent to adding a penalty w to the branches of Ty ∪ Tz. However, this is very different from penalizing only the branches of the path (y, z), as is the case with QP.



Fig. 2.—The median node u of x, y, and z splits the tree into three subtrees, Tx, Ty, and Tz.

Figure 1 shows how, starting from a set of w4-trees, WO determines the branch of 12 | 34 on which taxon 5 must be added. To take the w4-tree (51 | 23, 0.4) into account, WO puts a single 0.4 bonus on the branch connected to taxon 1. Indeed, this is the only branch for which the addition of taxon 5 generates a tree inducing 51 | 23. Note that by adding no bonus to the branch connected to taxon 4, WO, contrary to QP, penalizes this branch.
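WO's bonus step can be sketched on the same five-taxon configuration used above for QP (branches e1–e4 lead to taxa 1–4, e0 is the internal branch of the tree 12 | 34; all weights except the 0.4 of 51 | 23 are hypothetical). For this small tree, the subtree Tx cut off by the median of x, y, z can be listed directly:

```python
def bonus_branches(x, yz):
    # Branches on which adding taxon 5 yields a tree inducing x5 | yz,
    # i.e., the branches of the subtree Tx defined by the median of x, y, z.
    partner = {"1": "2", "2": "1", "3": "4", "4": "3"}[x]
    if partner in yz:
        # The median is the internal node next to x: Tx reduces to the
        # single branch leading to x.
        return {"e" + x}
    # Otherwise y and z form the opposite cherry: Tx is x's whole side
    # of the tree, including the internal branch e0.
    return {"e0", "e1", "e2"} if x in "12" else {"e0", "e3", "e4"}

# Relevant w4-trees for adding taxon 5, written (x, yz, w) for (x5 | yz, w).
# Weights are hypothetical except (51 | 23, 0.4).
W4TREES = [
    ("1", "23", 0.4), ("2", "13", 0.3), ("3", "12", 0.3),
    ("1", "24", 0.4), ("2", "14", 0.3), ("4", "12", 0.3),
    ("1", "34", 0.4), ("3", "14", 0.3), ("4", "13", 0.3),
    ("2", "34", 0.4), ("3", "24", 0.3), ("4", "23", 0.3),
]

bonus = {b: 0.0 for b in ("e0", "e1", "e2", "e3", "e4")}
for x, yz, w in W4TREES:
    for branch in bonus_branches(x, yz):
        bonus[branch] += w

best = max(bonus, key=bonus.get)  # branch with the highest bonus
```

With these weights the winning branch is e1 (bonus 1.6), and each bonus equals the increase ΔW(b, 5) of the W criterion for adding taxon 5 on branch b.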

WO randomly selects three taxa (denoted as 1, 2, and 3) to initialize the tree T and then iteratively adds the remaining taxa. Unlike with QP, the taxon addition order is not random. Instead, at each step, we add the taxon providing the "safest" addition. The safety s(i) of the addition of i is defined as follows. Let M denote the branch with the highest bonus and m the branch giving the second highest bonus. Let ΔW(M, i) and ΔW(m, i) denote the increases in W(T) resulting from adding i onto branch M or m. We have

s(i) = [ΔW(M, i) - ΔW(m, i)] / ΔW(M, i).
The greater the difference between ΔW(M, i) and ΔW(m, i), the more contrasted the certainties relative to M and m, and the safer the addition of i onto M. This definition of safety differs from that of Willson (1999a), which uses s(i) = ΔW(M, i) - ΔW(m, i). Unpublished tests show that, in the present case, it is preferable to normalize this value between 0 and 1, as in the above definition.
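A minimal sketch of this safety score, assuming the normalization divides the gap between the two best increases by the best one (the exact normalization is our assumption here):

```python
def safety(dw_best, dw_second):
    # Normalized gap between the best and second-best increases of W;
    # lies in [0, 1] since 0 <= dw_second <= dw_best.
    # The division by dw_best is an assumed normalization.
    if dw_best == 0:
        return 0.0
    return (dw_best - dw_second) / dw_best
```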

In order to completely define the addition order, we could alternatively select the 4-tree of maximum weight as the starting point. This requires examining all 4-trees, which is time- and memory-space-consuming, and simulations indicate that this does not improve the topological accuracy of WO. The simpler solution presented here ensures that the starting 4-tree is a good one, since it is the best among the 3 x (n - 3) 4-trees that resolve a quartet {1, 2, 3, x}, with x differing from 1, 2, and 3.

Differences Between WO and QP
Let T denote the correct tree. If the weighting of the 4-trees is accurate enough to obtain Qmax = QT, i.e., if for all quartets the correct 4-tree has a weight greater than those of the two others, then WO reconstructs the correct tree T with certainty. This property also holds for QP in the binary case (Strimmer and von Haeseler 1996), but it no longer holds when discrete or continuous weighting is used. Moreover, WO constructs a single tree, whereas our simulations confirm that QP needs to construct a large number of trees to achieve satisfactory performance. As we shall explain, this not only makes WO faster than QP, but also saves memory space and allows us to deal with larger data sets.

The proof that WO infers the correct tree T when Qmax = QT proceeds by showing that at each step of WO, the current tree is a subtree of T. This property is true for the 3-tree used as a starting point. Assuming that it holds at the current step, we simply have to prove that for every taxon, adding it on the branch with maximum bonus for this taxon yields a subtree of T, which guarantees that the property still holds at the next step. Since the current tree is a subtree of T, it possesses a branch such that adding i onto it leads to a subtree of T. The bonus of this branch is the sum of the weights of the newly induced 4-trees, which are all correct. The bonus of any other branch is also a sum of the weights of newly induced 4-trees, but at least one of them is not correctly resolved. These two sums involve weights that are in one-to-one correspondence, each pair corresponding to the same relevant quartet. Under our hypothesis that Qmax = QT, this implies that the highest sum indicates the correct branch on which to add i, and thus recursively proves that at each WO step the current tree is a subtree of T.

Figure 1 illustrates a case in which QP (based on continuous weighting) fails to infer the correct phylogeny, although Qmax = QT. In this example, QP and WO are used to infer the phylogeny of taxa 1, 2, 3, 4, and 5. Moreover, we assume that 12 | 34 is used as the starting point by both algorithms. The way QP and WO then use the 4-tree weighting to add taxon 5 is detailed in figure 1. After having taken all relevant w4-trees into account, WO adds taxon 5 on the branch connected to taxon 1 and thus infers the unique tree T such that Qmax = QT. Moreover, as shown above, WO reconstructs this tree T irrespective of the three taxa used to initialize the tree. On the other hand, QP adds taxon 5 on the branch connected to taxon 2 and does not infer this tree. In this example, the discrete approximation of a 4-tree weight different from 0 and 1 is always 1/3. Thus, for each quartet, the discrete-weight version of QP has one chance in three of selecting the correct 4-tree, and, depending on these random selections, it may or may not reconstruct the correct phylogeny. Finally, the only consistent version of QP is that using binary weighting.

The time complexity of a phylogenetic reconstruction algorithm expresses the computing time it requires, depending on the number of treated taxa (and possibly other parameters). QP and WO both have O(n^4) complexity. In other words, their computing time increases proportionally to the fourth power of the number of taxa. In fact, QP as described by Strimmer and von Haeseler (1996) has O(n^5) complexity. However, both QP and WO can be implemented in O(n^4) by gathering the w4-trees which modify the bonuses (or penalties) of the same branches and by reusing calculations already carried out during previous steps. Although they have the same theoretical complexity, WO is faster than QP because it constructs a single tree, whereas QP constructs numerous trees (1,000 by default). This difference increases when the number of taxa grows, since the authors of QP say that "generally, the more taxa are involved the more runs of the puzzling step are advised." However, WO remains slower than any method having O(n^3) complexity, such as distance algorithms like neighbor joining (NJ) (Saitou and Nei 1987) or BIONJ (Gascuel 1997). The O(n^4) implementation of WO is detailed in the appendix.
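The O(n^4) growth simply mirrors the number of quartets, C(n, 4) = n(n - 1)(n - 2)(n - 3)/24:

```python
from math import comb

def quartet_count(n):
    # Number of 4-taxon subsets of n taxa: C(n, 4), i.e., Theta(n^4).
    return comb(n, 4)
```

For instance, 12 taxa (as in the simulations below) give 495 quartets, while 100 taxa already give 3,921,225.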

Space complexity expresses the memory space required according to the number of treated taxa. In the algorithm shown in figure 3, the size of the input data is O(n^4), and the memory space required is thus O(n^4), as for most quartet algorithms. However, it is possible to modify quartet methods in order to input DNA sequences and to compute the weight of a 4-tree each time it is needed. This decreases the memory space required but increases the computation time as soon as weights are needed more than once. QP needs to consult each weight at least 1,000 times (once per reconstructed tree). For such methods, recomputing the weights instead of storing them is not realistic. However, this approach is well adapted in our case, since in the algorithm shown in figure 3, the weight of a 4-tree is consulted only once. Denoting l as the length of the DNA sequences, the memory space required then becomes O(nl + n^2). This relatively low space complexity allows us to deal with larger data sets than quartet methods that need to store O(n^4) weights.



Fig. 3.—The weight optimization (WO) algorithm.

    Simulation Results
 
We start by describing how we generated our test sets and which criteria were used to compare the performances of the various phylogenetic reconstruction programs. We then compare the results obtained by the different methods.

Protocol
Our experimental tests followed a protocol used within a similar framework by Kumar (1996), and later by Gascuel (1997). Six model trees were considered, each consisting of 12 taxa (fig. 4). The first three (AA, BB, AB) satisfied the molecular-clock hypothesis, while the other three (CC, DD, CD) presented varying substitution rates among lineages. Each interior branch was one unit long (a for constant- and b for variable-rate trees; the lengths of external branches are given in multiples of a or b). For each of these model trees, we studied four evolutionary conditions, corresponding to the low, medium, high, and very high divergence levels defined in the legend of figure 4.



Fig. 4.—Model trees used for simulation. Each interior branch is one unit long (a for constant- and b for variable-rate trees), and the lengths of external branches are given in multiples of a or b. Low divergence refers to a = 0.00625 substitutions per site and b = 0.005, which corresponds to a maximum pairwise divergence (MD) of about 0.1 substitutions per site. Medium divergence refers to a = 0.0185 and b = 0.015 (MD ≈ 0.3), high divergence refers to a = 0.0625 and b = 0.05 (MD ≈ 1.0), and very high divergence refers to a = 0.125 and b = 0.1 (MD ≈ 2.0).

For each tree T and evolutionary condition, we used SEQGEN (Rambaut and Grassly 1997) to generate 1,000 data files with sequences of length 300, and 1,000 data files with sequences of length 600. These sequences were obtained by simulating an evolutionary process along T according to the Kimura two-parameter model with a transition/transversion rate of 2. We thus tested the different methods on 48,000 test sets corresponding to two sequence lengths, four evolution rates, and six model trees (the data files are available on our web page at http://www.firmm.fr/~w3ifa/MAAS/).

The various reconstruction methods were judged on their ability to infer the correct topology (i.e., that of the tree used to generate the sequences). Generally, this evaluation is made either by counting how many times the tree proposed by the method has the same topology as the correct tree T or by considering a topological distance d(T̂, T) between the inferred tree T̂ and the correct one. We used both criteria. The first penalizes methods which propose not-fully-resolved trees, since such trees always differ from the correct one. The second avoids this drawback but requires choosing a measurement of the difference between two tree topologies, which could favor one reconstruction method over another. For our tests, we used the bipartition distance introduced by Robinson and Foulds (1981), which is equal to the number of bipartitions present in one of the two trees and not the other. This is the topological distance used in most simulation studies.
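Encoding each tree as its set of non-trivial bipartitions, the bipartition (Robinson and Foulds) distance is simply the size of the symmetric difference of the two sets. A minimal sketch on two hypothetical 5-taxon trees:

```python
def rf_distance(splits_a, splits_b):
    # Number of bipartitions present in one tree and not the other.
    return len(splits_a ^ splits_b)

# Two 5-taxon trees, each encoded by its internal-branch bipartitions
# (one side of each bipartition, as a frozenset of taxa).
t1 = {frozenset("12"), frozenset("45")}  # ((1,2),3,(4,5))
t2 = {frozenset("13"), frozenset("45")}  # ((1,3),2,(4,5))
```

Here rf_distance(t1, t2) is 2: the trees each carry one bipartition the other lacks, and share the one separating {4, 5}.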

Methods Tested
For our tests, we used the program versions available on the web. The different programs were run with model options corresponding to the Kimura two-parameter model with a transition/transversion ratio of 2. We used default parameter values for other program options. The following programs were tested:

  1. The most recent version (4.2) of quartet puzzling, denoted QP4.2. This version of QP is based on discrete weighting. By default, this program constructs 1,000 trees (as described above) and returns their majority-rule consensus tree.
  2. Variants of quartet puzzling based on binary (QPB), discrete (QPD), and continuous (QPC) weighting, and on the consensus tree computed by the CONSENSE program, available in the PHYLIP package. Using CONSENSE (instead of the majority-rule consensus) led in our experiments to fully resolved trees and thus to inference of the correct tree more often. This was the consensus tree used in the tests of Strimmer and von Haeseler (1996) and Strimmer, Goldman, and von Haeseler (1997) (personal communication). We also tested variants of QP based on the reconstruction of a unique tree and on binary (QP1B) and continuous (QP1C) weighting. These variants used the same maximum-likelihood computations as QP4.2.
  3. WO based on the same maximum-likelihood computations as QP variants.
  4. DNAPARS, the parsimony program (version 3.5c) of the PHYLIP package.
  5. BIONJ, a distance method designed by Gascuel (1997). BIONJ improves the NJ algorithm of Saitou and Nei (1987) by using a statistical model of the evolutionary distances obtained from sequences. The distances used as input were computed with DNADIST from the PHYLIP package, assuming the Kimura two-parameter model, as for the other methods.
  6. FASTDNAML, the maximum-likelihood program (version 1.2) introduced by Olsen et al. (1994). FASTDNAML comes from DNAML (Felsenstein 1981) and generally infers the same tree, but much faster.
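For reference, the Kimura two-parameter distances computed by DNADIST derive from the observed proportions of transitions (P) and transversions (Q); a sketch of the standard formula:

```python
from math import log

def k2p_distance(p, q):
    # Kimura (1980) two-parameter distance:
    # d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q),
    # where p and q are the observed transition and transversion fractions.
    return -0.5 * log(1 - 2 * p - q) - 0.25 * log(1 - 2 * q)
```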

Comparisons of the Different Methods
The results obtained by FASTDNAML, BIONJ, DNAPARS, WO, and the main variants of QP are detailed in tables 1, 2, 3, and 4. The results obtained are summarized in tables 5 and 6 by averaging over the two sequence lengths and the four evolutionary conditions. Tables 1, 2, and 5 provide the percentages of inferred trees that exactly corresponded to the correct tree. Tables 3, 4, and 6 indicate the average bipartition distance between the inferred tree and the correct one. The results obtained from these two types of comparisons were nearly equivalent, except for QP4.2, which often inferred non-fully-resolved trees. We first analyze the performances of the QP variants and compare them with that of WO. We then compare the performances of these quartet methods with those of other reconstruction methods.


Table 1 Percentages of Correctly Inferred Trees (sequence length 300)

Table 2 Percentages of Correctly Inferred Trees (sequence length 600)

Table 3 Average Robinson and Foulds Distances Between the Reconstructed Tree and the Model (sequence length 300)

Table 4 Average Robinson and Foulds Distances Between the Reconstructed Tree and the Model (sequence length 600)

Table 5 Percentages of Correctly Inferred Trees (averaged over all sequence lengths and evolutionary conditions)

Table 6 Average Topological Distances Between the Inferred and Model Trees (averaged over all sequence lengths and evolutionary conditions)

Tables 5 and 6 indicate a significant decrease of the topological accuracy of QP when a single tree is reconstructed instead of 1,000 trees. Indeed, the performances of QP1B and QP1C are 12%–16% worse than those of QPB and QPC, respectively, in terms of percentage of correctly inferred trees. This confirms the importance of reconstructing many trees for QP.

Our tests also confirm that using continuous weighting is slightly better than using discrete weighting (about 1% in terms of correctly reconstructed trees), but they question the idea that using discrete or continuous weighting (instead of binary weighting) improves the topological accuracy of QP. Indeed, tables 5 and 6 indicate that the average results obtained by QPB are better than those of QPD and QPC by about 2%–4% in terms of percentage of correctly inferred trees. This is likely explained by the fact that QPB is the only consistent variant of QP.

Tables 5 and 6 indicate that the topological accuracy of QP strongly depends on the model tree topology. The performances obtained by QPD and QPC are more than 20% worse for model tree AA than for tree BB in terms of percentage of correctly inferred trees, whereas other methods tend to have similar performances for both trees or better performances for tree BB than for tree AA. Moreover, the topological accuracy of QP1C does not depend on the model tree. The topological bias of QP thus seems to be due to the use of the consensus, which tends to privilege trees having many external pairs (tree BB has four, whereas tree AA has only two). To test this assumption, we applied QPC 1,000 times on a test set made of 12 identical sequences and therefore containing no phylogenetic signal. The trees constructed by QPC before the consensus step contained four external pairs on average, whereas those obtained after the consensus step always contained six (the largest possible number with 12 taxa). Moreover, on data files obtained from a model tree having six external pairs, QPC obtains an impressive performance, even better than that of FASTDNAML. This likely explains the previous excellent results obtained by QPD and QPC (Strimmer and von Haeseler 1996; Strimmer, Goldman, and von Haeseler 1997), because the model trees used to generate the sequences contained the greatest possible number of external pairs. QPB and QP4.2 are also affected by this topological bias, but to a lesser extent (their performances are about 7%–8% worse for model tree AA than for tree BB). The difference with QPC and QPD is due to the fact that trees reconstructed by QPB are less diversified and that in the majority tree provided by QP4.2, only highly supported bipartitions are retained.

It appears from tables 3 and 4 that QP4.2 is more efficient (in terms of bipartition distance) than QPC and QPB only when the evolution rate is so low (MD ≈ 0.1) and the sequences so short (l = 300) that it is not uncommon that no mutation occurs along certain branches (Kumar 1996). In this case, QP4.2 benefits from being the only method able (in our tests) to propose partially resolved trees, with about six bipartitions on average, whereas all other methods propose fully resolved trees having nine bipartitions. Under all other conditions, QPB and QPC tend to be better ranked than QP4.2, according to the bipartition distance as well as the percentage of correctly inferred trees.

However, WO is better ranked than QPB and QPC for all trees but the BB tree according to both topological criteria (tables 5 and 6 ). Its performance is better overall than that of QP and less dependent on the correct tree topology.

Nevertheless, the performance of WO is still worse than that of the other tested methods. WO is always less accurate than FASTDNAML and BIONJ, and the only cases in which WO is ranked better than DNAPARS correspond to high evolution rates (MD ≈ 1.0 and MD ≈ 2.0), for which parsimony methods are well known to be poorly suited.

Tables 5 and 6 clearly demonstrate that FASTDNAML is the most accurate method. On average, it finds the correct tree 12%–13% more often than BIONJ, which is the next best method. The topological accuracy of DNAPARS depends on the evolution rate. For a low evolution rate, which corresponds to the assumptions made by parsimony, DNAPARS is a little more accurate than BIONJ; for a medium evolution rate, BIONJ has an advantage over DNAPARS, and this advantage becomes dramatic for high evolution rates.

Since there are about n⁴ quartets, the time complexity of QP and WO (O(n⁴); see above) is thus minimal. This complexity makes it possible to deal with problems involving about one or two hundred taxa, while BIONJ and DNAPARS, which have O(n³) complexity, may deal with problems involving a few thousand taxa. However, note that BIONJ and DNAPARS are heuristic algorithms, like WO or QP, and do not systematically provide optimal trees according to the criterion they optimize (e.g., parsimony in the case of DNAPARS). We tested the different programs on a PC with a 466-MHz processor and 128 MB RAM. The average computing time required for one of our data sets was less than 0.1 s for both BIONJ and DNAPARS, about 0.6 s for WO, 2.64 s for QP, and 4.13 s for FASTDNAML. On a larger data set, containing 25 taxa and 1,896 sites, the average computing time required was about 1.6 s for BIONJ (including the computation of distances by DNADIST), 2.9 s for DNAPARS, 53.0 s for WO, 101.0 s for QP, and 318.0 s for FASTDNAML. In this case, bootstrapping the data is easy for BIONJ and DNAPARS, but becomes problematic for the other methods.
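The O(n⁴) count can be checked directly: the number of four-taxon subsets is C(n, 4) = n(n - 1)(n - 2)(n - 3)/24, i.e., about n⁴/24. A short check:

```python
from math import comb

# The number of quartets (four-taxon subsets) among n taxa grows as n^4 / 24.
# For 12 taxa this gives the 495 quartets used elsewhere in the text.
for n in (12, 25, 100):
    print(n, comb(n, 4))
```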


    Discussion
We first describe complementary tests revealing that the disappointing results of WO are not due to inefficient optimization of the criterion. Then, we discuss the difficulties encountered by quartet methods.

Better Optimizing the W Criterion
It could be conjectured that the results obtained by WO are disappointing because it uses a naive approach that is inefficient in optimizing the W criterion. When the correct tree T is not inferred by WO, two cases arise. Either T is the single optimum according to W—and in this case the problem is due to WO incorrectly optimizing the W criterion—or T is nonoptimal (or not the single optimum)—and in this case the problem concerns the W criterion, which is not discriminating enough.

In order to know which of these two problems limits the performance of WO, we isolated the second kind of error. Instead of defining a dynamic taxon addition order, we used 1,000 random-addition orders and constructed the 1,000 corresponding trees by optimizing the W criterion at each step. Then, we selected the tree among these 1,000 trees which had the highest W criterion value. This algorithm (WO1,000) does not guarantee that the resulting tree is optimal. However, in our tests, it always inferred a tree with a criterion value at least as good as that of the correct tree T. Thus, the only errors made by WO1,000 corresponded to cases in which the W criterion was not sufficiently discriminating. We observed that the performance of WO1,000 was hardly any better than that of WO: the average improvement over all model trees and conditions was no more than 2%, according to the percentage of correctly inferred trees. This clearly shows that only a tiny fraction of the errors of WO are due to insufficient optimization.
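The WO1,000 procedure can be sketched as a best-of-many random-restart search. In the sketch below, `build_tree` and `w_score` are hypothetical placeholders for the paper's greedy taxon-insertion procedure and the W criterion, respectively:

```python
import random

def wo_many(taxa, build_tree, w_score, n_orders=1000, seed=0):
    """Best-of-many random-addition-order search (the WO_1000 variant).

    build_tree(order) -> tree built by optimizing the W criterion at
    each insertion step; w_score(tree) -> W criterion value of a tree.
    Both are placeholders, not the paper's actual implementations."""
    rng = random.Random(seed)
    best_tree, best_score = None, float("-inf")
    for _ in range(n_orders):
        order = taxa[:]
        rng.shuffle(order)           # draw a random taxon-addition order
        tree = build_tree(order)
        score = w_score(tree)
        if score > best_score:       # keep the tree with the highest W value
            best_tree, best_score = tree, score
    return best_tree, best_score
```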

Possible Limits of Quartet Methods
The quartet methods are highly sensitive to 4-tree inference errors. For example, trees CC and CC' in figure 5 differ only with respect to the resolution of the nine specific quartets {1, 2, 3, x} with x ∈ {4, 5, 6, 7, 8, 9, 10, 11, 12}. For 12 taxa, there are 495 quartets. Therefore, it is sufficient to change the resolution of 2% (9/495) of the quartets of tree CC to obtain the set of 4-trees which exactly corresponds to tree CC'. In this case, it is obvious that all quartet methods infer tree CC' instead of tree CC. In fact, in the unweighted case, a change in the resolution of five quartets is enough to infer the wrong tree, whereas in the weighted case this depends on the weights. A change in the weights of the w4-trees corresponding to these nine quartets such that Qmax = QCC' is enough to make all reasonable methods infer the wrong tree. More generally, with n taxa, there are trees which differ only on n - 3 quartets, whereas the total number of quartets is about n⁴. Thus, the minimum rate of badly inferred 4-trees sufficient to mislead all quartet methods quickly approaches 0 as the number of taxa increases.
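This closing argument can be made concrete with a short calculation: two trees may differ on only n - 3 quartets out of C(n, 4) in total, so the fraction of misresolved quartets needed to mislead any quartet method shrinks rapidly with n (the function name below is ours, for illustration only):

```python
from math import comb

def misleading_rate(n):
    """Smallest fraction of wrongly resolved quartets that can make
    every quartet method prefer a wrong tree: n - 3 out of C(n, 4)."""
    return (n - 3) / comb(n, 4)

print(misleading_rate(12))        # 9/495, the 2% of the CC/CC' example
for n in (50, 200):
    print(n, misleading_rate(n))  # the rate shrinks roughly as 24 / n^3
```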



Fig. 5.—Only a few errors in quartet inference are enough to mislead quartet methods. The nine quartets {1, 2, 3, x} with x ∈ {4, 5, 6, 7, 8, 9, 10, 11, 12} are the only ones with different topologies in CC and CC'. Moreover, they are all (to some extent) subject to long-branch attraction.

 
In our simulations, we observed cases in which most of the quartets were correctly weighted and the correct tree was not reconstructed. For example, at a medium evolution rate, about 90% of the quartets in Qmax were correct, yet the correct tree was recovered less often than with other inference methods. This suggests that the inference errors of the 4-trees were not homogeneously distributed among the quartets and that most of the errors concerned specific quartets. This phenomenon is illustrated by the example in figure 5. In this example, the five quartets {1, 2, 3, x} with x ∈ {5, 6, 7, 8, 12} are subject to long-branch attraction, which tends to support the 4-trees of CC' rather than those of CC. Moreover, the four other quartets which differentiate CC from CC' are also (to a lesser extent) affected by this phenomenon. Therefore, it appears that long-branch attraction influences the resolution of specific quartets and could be one of the main difficulties quartet methods face. In global character methods such as FASTDNAML or DNAPARS, these long branches are split, thanks to the presence of other taxa, making the problem easier to solve. The analysis for distance methods is less clear. It is surprising that, although based on pairwise evolutionary distances (i.e., two-by-two maximum-likelihood analysis), they are more accurate than current quartet methods, which rely on a more extensive four-by-four maximum-likelihood analysis.

We stress that poor weighting of a few specific quartets is enough to lead quartet methods to infer an incorrect bipartition. Moreover, once a wrong bipartition is inferred due to badly weighted 4-trees, it may influence the entire tree topology. For example, assume that the correct tree is CC (fig. 5) and that, due to long-branch attraction, some 4-trees are badly weighted and indicate taxa 1 and 5 as an external pair; then, the three bipartitions respectively grouping taxa {1, 2}, {1, 2, 3}, and {4, 5} can no longer be inferred. In this case, even if all quartets {1, 2, 3, x} with x ∈ {4, 5, 6, 7, 8, 9, 10, 11, 12} are correctly weighted, the branch separating taxa 1 and 2 from the others is not inferred. This clearly indicates the importance and difficulty of correctly weighting 4-trees.

Although the likelihood approach used by QP (Strimmer, Goldman, and von Haeseler 1997) and then by WO seems well founded, this weighting system seems ill-adapted to the quartet approach. As an illustration, it should be noted that the likelihood of each 4-tree is well defined but does not allow us to predict the likelihood of the whole tree. Moreover, the continuous version of WO (presented above) finds the correct tree on average only about 3% more often than the binary variant (results not shown). The inadequacy of the weighting system currently used by QP and WO could thus explain their disappointing results.


    Conclusions
We have presented a new phylogenetic reconstruction algorithm based on weighted 4-trees. This algorithm is generally more efficient than QP, and its topological accuracy depends less on the correct tree topology. Moreover, it is faster and offers better theoretical guarantees. Unfortunately, our simulations seem to indicate that this method is less efficient overall than distance and maximum-likelihood methods. We have been working on quartet methods for a long time (Berry and Gascuel 1997, 2000) and were the first to be disappointed by the current results. Moreover, unpublished tests (V. Berry, personal communication) on quartet cleaning methods (Berry et al. 1999) seem to confirm our observations.

A great deal of work has been carried out on ways of combining 4-trees to obtain a complete n-tree. The weaknesses of the various quartet methods tested in this paper are very likely due to the method of weighting the 4-trees rather than to the method of combining them. Indeed, as explained above, considering only four taxa requires correct management of long-branch attraction. We hope that this will become possible with the new approaches of Lyons-Weiler and Takahashi (1999), who suggest ways in which calculation of likelihood could handle this problem, or of Willson (1999b), who proposes a version of parsimony that is more resistant to long-branch attraction.


    Acknowledgements
We thank David Bryant for his helpful comments and advice.


    Footnotes
 
Manolo Gouy, Reviewing Editor

1 Keywords: phylogenetic reconstruction, quartet methods, tree consensus, maximum likelihood, parsimony, distance methods, computer simulations

2 Address for correspondence and reprints: Olivier Gascuel, Département Informatique Fondamentale et Applications, LIRMM 161, Rue Ada, 34392 Montpellier cedex 5, France. gascuel@lirmm.fr


    References

    Adachi, J., M. Hasegawa. 1999. Instability of quartet analyses of molecular sequence data by the maximum likelihood method: the cetacea/artiodactyla relationships. Cladistics. 5:164–166

    Bandelt, H.-J., A. Dress. 1986. Reconstructing the shape of a tree from observed dissimilarity data. Adv. Appl. Math. 7:309–343

    Barthélemy, J. P., A. Guénoche. 1990. Trees and proximity representation. John Wiley and Sons, Chichester, England

    Berry, V., O. Gascuel. 1997. Inferring evolutionary trees with strong combinatorial evidence. Lect. Notes Comput. Sci. 1276:111–123

    ———. 2000. Inferring evolutionary trees with strong combinatorial evidence. Theor. Comput. Sci. 240:271–298

    Berry, V., T. Jiang, P. Kearney, M. Li. 1999. Quartet cleaning: improved algorithms and simulations. Lect. Notes Comput. Sci. 1643:313–324

    Cormen, T. H., C. E. Leiserson, R. L. Rivest. 1992. Introduction to algorithms. MIT Press, Cambridge, Mass

    Erdös, P., M. Steel, L. A. Székely, T. Warnow. 1997. Constructing big trees from short sequences. Lect. Notes Comput. Sci. 1256:827–837

    Felsenstein, J. 1981. Evolutionary trees from DNA sequences: a maximum likelihood approach. J. Mol. Evol. 17:368–376

    ———. 1989. Phylogeny inference package (version 3.2). Cladistics. 5:164–166

    Gascuel, O. 1997. BIONJ: an improved version of the NJ algorithm based on a simple model of sequence data. Mol. Biol. Evol. 14:685–695

    Jiang, T., P. Kearney, M. Li. 1998. Orchestrating quartets: approximation and data correction. Pp. 416–425 in R. Motwani, ed. Proceedings of the 39th IEEE Annual Symposium on Foundations of Computer Science, Los Alamitos, Calif

    Kumar, S. 1996. A stepwise algorithm for finding minimum evolution trees. Mol. Biol. Evol. 13:584–593

    Lyons-Weiler, J., K. Takahashi. 1999. Branch length heterogeneity leads to nonindependent branch length estimates and can decrease the efficiency of methods of phylogenetic inference. J. Mol. Evol. 49:392–405

    Margush, T., F. McMorris. 1981. Consensus n-trees. Bull. Math. Biol. 43:239–244

    Olsen, G., H. Matsuda, R. Hagstrom, R. Overbeek. 1994. A tool for construction of phylogenetic trees of DNA sequences using maximum likelihood. Comput. Appl. Biosci. 10:41–48

    Philippe, H., E. Douzery. 1994. The pitfalls of molecular phylogeny based on four species, as illustrated by the cetacea/artiodactyla relationship. J. Mamm. Evol. 2:133–152

    Rambaut, A., N. Grassly. 1997. Seq-gen: an application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees. Comput. Appl. Biosci. 13:235–238

    Robinson, D. F., L. R. Foulds. 1981. Comparison of phylogenetic trees. Math. Biosci. 53:131–147

    Saitou, N., M. Nei. 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406–425

    Steel, M.. 1992. The complexity of reconstructing trees from qualitative characters and subtrees. J. Classif. 9:91–116

    Strimmer, K., N. Goldman, A. von Haeseler. 1997. Bayesian probabilities and quartet puzzling. Mol. Biol. Evol. 14:210–211

    Strimmer, K., A. von Haeseler. 1996. Quartet puzzling: a quartet maximum-likelihood method for reconstructing tree topologies. Mol. Biol. Evol. 13:964–969

    Swofford, D. L., G. J. Olsen, P. Waddel, D. Hillis. 1996. Phylogenetic inference. Pp. 407–514 in C. M. David, M. Hillis, and B. Mable, eds. Molecular systematics. 2nd edition. Sinauer, Sunderland, Mass

    Willson, S. J. 1999a. Building phylogenetic trees from quartets by using local inconsistency measure. Mol. Biol. Evol. 16:685–693

    ———. 1999b. A higher parsimony method to reduce long branch attraction. Mol. Biol. Evol. 16:694–705

Accepted for publication February 22, 2001.