NJML+: An Extension of the NJML Method to Handle Protein Sequence Data and Computer Software Implementation

Satoshi Ota and Wen-Hsiung Li

Department of Ecology and Evolution, University of Chicago


    Abstract
 TOP
 Abstract
 Introduction
 Algorithm and Implementation of...
 Simulation Design
 Results
 Discussion
 Supplementary Material
 Acknowledgements
 References
 
While the maximum-likelihood (ML) method of tree reconstruction is statistically rigorous, it is extremely time-consuming for reconstructing large trees. We previously developed a hybrid method (NJML) that combines the neighbor-joining (NJ) and ML methods and thus is much faster than the ML method and improves the performance of NJ. However, we considered only nucleotide sequence data, so NJML is not suitable for handling amino acid sequence data, which requires even more computer time. NJML+ is an implementation of a further improved method for practical data analyses (including protein sequence data). Our extensive simulations using nucleotide and amino acid sequences showed that NJML+ gave good results in tree reconstruction. Indeed, NJML+ showed substantial improvements over existing methods in terms of both computational times and efficiencies, especially for amino acid sequence data. We also developed a "user-friendly" interface for the NJML+ program, including a simple tree viewer.


    Introduction
 TOP
 Abstract
 Introduction
 Algorithm and Implementation of...
 Simulation Design
 Results
 Discussion
 Supplementary Material
 Acknowledgements
 References
 
For reconstructing molecular phylogenetic trees, either nucleotide or amino sequence data are used, depending on the situation. With rapid accumulation of sequence data, it is increasingly desirable to reconstruct phylogenetic trees using amino acid sequences, especially when the divergence between sequences is very high at the nucleotide sequence level. Trees from amino acid sequence data are useful for various applications, including inference of the universal tree of life (Sogin, Hinkle, and Leipe 1993Citation ; Gribaldo and Cammarano 1998Citation ; Doolittle 1999Citation ; Philippe and Forterre 1999Citation ), protein secondary-structure prediction (Goldman, Thorne, and Jones 1996Citation ; Lio et al. 1998Citation ), and reconstruction of tissue evolution (OOta and Saitou 1999Citation ). For such purposes, the maximum-likelihood (ML) method (Felsenstein 1981Citation ) is often used because of its statistical rigor (Fukami-Kobayashi and Tateno 1991Citation ; Hasegawa, Kishino, and Saitou 1991Citation ; Hasegawa and Fujiwara 1993Citation ; Kuhner and Felsenstein 1994Citation ; Tateno, Takezaki, and Nei 1994Citation ; Huelsenbeck 1995Citation ).

The ML approach, however, requires a large amount of computer time when many taxa are involved. For amino acid sequence data, the ML method incurs even heavier computational burdens. In a given tree, the likelihood value of subtree k, which has two subtrees i and j, is computed in a recursive way: L(k)sk = ({Sigma}si Psksi({upsilon}i)L(i)si) ({Sigma}sj Psksi({upsilon}j)L(j))sj, where sm is a state of the root of subtree m, and Psksj({upsilon}) is a transition probability between states sk and sj for branch length {upsilon} (Felsenstein 1981Citation ). The number of states of any internal node (the root of any subtree) is 20 in the case of amino acid sequences, whereas it is only 4 in the case of nucleotide sequences. Clearly, it takes much more time to analyze amino acid sequence data than to analyze DNA sequence data.

Some algorithms exist for reconstructing phylogenetic trees from amino acid sequence data, e.g., heuristic methods in PROTML of MOLPHY (Adachi and Hasegawa 1996Citation ) and the quartet puzzling (QP) method (Strimmer and von Haeseler 1996Citation ). However, faster methods are needed to handle large trees.

We recently developed the NJML method (Ota and Li 2000Citation ), which is a hybrid of the neighbor-joining (NJ) (Saitou and Nei 1987Citation ) and ML methods. Our strategy was to explore the topology search space only around an initial NJ tree using the ML method. The depth of search depends on the reliability of the initial NJ tree in terms of bootstrap values. Since it uses a greedy (hill-climbing) search algorithm for the topology search, the computational cost is greatly reduced compared with the ML method. Ota and Li (2000)Citation showed that this simple method was highly efficient for improving NJ trees. Furthermore, the performance was nearly equal to or better than those of existing time-consuming heuristic ML methods.

The NJML algorithm, however, was designed only for nucleotide sequence data. In this study, we extended the algorithm to be applicable to amino acid sequence data too. In the new NJML program (NJML+), some approximation options are available for reducing the computational burden. The performances of these options were examined by extensive computer simulations.


    Algorithm and Implementation of NJML+
 TOP
 Abstract
 Introduction
 Algorithm and Implementation of...
 Simulation Design
 Results
 Discussion
 Supplementary Material
 Acknowledgements
 References
 
The NJML+ program is different from the NJML program mainly in the following six aspects:

  1. It is applicable to amino acid sequence data as well as to nucleotide sequence data.
  2. Arbitrary depth of search is available. The number of internal branches (n) reshuffled at a step can be only 1 (extremely greedy local rearrangements) or the total number of internal branches (exhaustive search).
  3. It has the following option: a user-defined initial bootstrap tree can be used as the initial tree.
  4. Statistics are reported at each step: standard errors of likelihood value differences (Kishino and Hasegawa 1989Citation ) and bootstrap values among candidate trees are computed. To compute bootstrap values, the resampling estimated log-likelihood (RELL) method (Kishino, Miyata, and Hasegawa 1990Citation ) is used.
  5. For reconstructing an initial bootstrap NJ tree, CLUSTAL W (Thompson, Higgins, and Gibson 1994Citation ) is available in addition to PHYLIP (Felsenstein 1993Citation ).
  6. Part of the algorithm was modified to reduce computational time, and memory management was improved for treating large trees.

Furthermore, NJML+ has the following features:

  1. The following evolutionary models are available for nucleotide sequence data:

  1. The following evolutionary models are available for amino acid sequence data:

  1. Approximation options (modes) (see below) have been implemented using the code of MOLPHY, version 2.2 (Adachi and Hasegawa 1996Citation ), and CLUSTAL W, version 1.7 (Thompson, Higgins, and Gibson 1994Citation ).

The original NJML program (Ota and Li 2000Citation ) has two major parts: (1) a set of modules (computer programs) for reconstructing an initial bootstrap NJ tree, and (2) a module for the ML estimation. These two parts account for almost all of the computational time taken to obtain a final result. Although the ML estimation takes much time, obtaining an initial bootstrap NJ tree also takes considerable time. We tried to reduce the computational time for both parts. To reduce the computational time for part 1, one may use CLUSTAL W instead of PHYLIP to obtain a bootstrap NJ tree, because CLUSTAL W is much faster. To reduce the computational time for part 2, one may use an approximate method used in NUCML and PROTML of MOLPHY (Adachi and Hasegawa 1996Citation ), in which branch lengths are optimized by the least-squares method and a corresponding likelihood value is computed as an approximate ML value. Since there is no iteration required (except in the final step), the computation is fast.

Now we have two options in part 1 and two options in part 2. Therefore, we can choose one of the four possible ways to obtain a final result. Users should be aware that each of the three approximation options has some drawbacks (see below). Users should choose an option according to the degree of sequence divergence; that is, choose a more exhaustive search when sequence divergence is high.

For nucleotide sequence data, NJML+ has three approximation modes (A, B, and C) and a default mode (D).

  1. Burst mode (A): An initial bootstrap NJ tree is generated using CLUSTAL W, which is very fast. In addition, the ML computation at each step is carried out using the approximate (least-squares) method mentioned above. One drawback of this mode is that CLUSTAL W uses Kimura's (1980)Citation two-parameter model but provides no other choice.
  2. Composite mode (B): An initial bootstrap NJ tree is generated by CLUSTAL W, but the ML computation at each step is carried out in an exact manner.
  3. Mixed mode (C): An initial NJ tree is reconstructed using PHYLIP, as in the NJML method. However, the ML search at each step is carried out in an approximate way, as in mode A.
  4. Default mode (D): The initial NJ bootstrap tree is carried out using PHYLIP. The ML computation at each step is carried out in an exact way. This mode is basically identical to NJML.

For amino acid sequence data, it is very time-consuming to use PHYLIP to reconstruct an initial bootstrap NJ tree (see Discussion). For this reason, we did not prepare modes C and D for amino acid sequence data. The drawback of omitting modes C and D is that for obtaining an initial NJ tree, we cannot choose any evolutionary model of amino acid substitution except for Kimura's (1983)Citation correction for multiple hits. Therefore, NJML+ for amino acid sequence data has only two approximation modes, which correspond to modes A and B for nucleotide sequence data.

We implemented NJML+ as shown in figure 1 . The data flow for each approximation mode is also shown. Some modules are inherited from NJML with minor modifications.



View larger version (20K):
[in this window]
[in a new window]
 
Fig. 1.—Implementation of NJML+. NJML+ contains a newly developed module, all_topo, which generates all possible bifurcating trees from a given multifurcating tree and computes the maximum-likelihood value for each tree by using amino acid sequence data. The other modules are modified from Ota and Li (2000)Citation by adding three approximation modes (A, B, and C)

 
Although NJML+ can run from a command line, we developed a "user-friendly" interface by which one can set various parameters for NJML+ to change the search space and can obtain a tree shape by a tree viewer. This interface is web-based, so that NJML+ in fact runs on a web server. The core part of the tree viewer was originally developed for Phylodendron, version 0.8d, beta (Gilbert 1997Citation ).


    Simulation Design
 TOP
 Abstract
 Introduction
 Algorithm and Implementation of...
 Simulation Design
 Results
 Discussion
 Supplementary Material
 Acknowledgements
 References
 
We performed three kinds of simulations: (1) fixed model trees were used to study the efficiencies of different tree reconstruction methods to recover the correct topologies; (2) randomly generated model trees were used instead; and (3) randomly generated sequence data were used to compare computer runtimes of different methods. The simulation design basically follows Ota and Li (2000)Citation .

Simulations to Measure Efficiencies to Recover Correct Topologies
Seq-Gen (Rambaut and Grassly 1997Citation ) and PSeq-Gen (Grassly, Adachi, and Rambaut 1997Citation ) were used to generate ancestral nucleotide and amino acid sequences, respectively. In Seq-Gen, the JC and K2P models were used. All base frequencies were set to 0.25. For each case, 500 replications were conducted. In PSeq-Gen, the Dayhoff and JTT models were used. The amino acid frequencies used were those observed when the accepted mutation matrices for the Dayhoff and JTT models were constructed. For each case, 100 replications were conducted.

Figure 2 shows model trees T1 and T2, modified from Strimmer and von Haeseler (1996)Citation . For each of them, a variety of substitution rates a and b were assumed. As shown in figure 2 , T1 and T2 assume a constant rate and two varying rates (among lineages), respectively.



View larger version (8K):
[in this window]
[in a new window]
 
Fig. 2.—Model trees T1 and T2 with expected substitution rates a and b. In T1, a molecular clock is assumed, whereas T2 describes a situation of extreme rate heterogeneity in different branches of the tree. Modified from Strimmer and von Haeseler (1996)Citation

 
Using model trees T1 and T2, we examined the efficiencies of PUZZLE 4.0.2 and PROTML 2.2 (with the quick add OTUs search and star decomposition options), as well as NJML+ with various modes.

To compare inferred trees with model trees, a program modified from PAML's evolver (Yang 1999Citation ) was used.

Randomly Generated Model Tree Simulation
For each case, 500 trees were generated randomly, and each possible tree was equally probable. To generate them, evolver was used with slight modification. Randomly generated ancestral amino acid sequences evolved along the randomly generated trees under the JTT or Dayhoff model. The number of OTUs was 20, and the sequence lengths were 300 and 600 amino acids. The expected branch lengths of randomly generated trees were varied for four cases: 0.03, 0.05, 0.07, and 0.10 amino acid substitutions per residue site.

The PUZZLE program often generates multifurcating trees as final trees. In fact, published results in 1996 and 1997 (Strimmer and von Haeseler 1996Citation ; Strimmer, Goldman, and von Haeseler 1997Citation ) were not obtained by the original PUZZLE program. We used both the PUZZLE program and the consensus program of PHYLIP (Felsenstein 1993Citation ) to obtain completely resolved bifurcating trees (A. von Haeseler, personal communication).

The other part of this simulation was the same as the fixed model tree simulation. Since the computational burden of ML computation is large for large amino acid sequence data sets, we used the parallel computing system Chiba City of the Argonne National Laboratory.

Computer Runtimes
Computer runtimes were measured for NJML+ with various modes. In the case of simulation with amino acid sequences, PUZZLE 4.0.2 and PROTML 2.2 (with the quick add OTUs search and star decomposition options) were also included for comparison. To measure runtimes, the clock( ) function of GNU CC, version 2.8.1, was used on a Pentium III machine. Sequences were generated by Seq-Gen and Pseq-Gen under the K2P and JTT models, respectively. A given tree was randomly generated by setting the mean branch length to 0.05. These operations were iterated 100 times for each case, and the average computer runtimes were used to estimate the performance.


    Results
 TOP
 Abstract
 Introduction
 Algorithm and Implementation of...
 Simulation Design
 Results
 Discussion
 Supplementary Material
 Acknowledgements
 References
 
Comparison of Computer Runtimes Among Various Programs
A comparison of the computer runtimes for various methods using nucleotide sequence data is shown in table 1 . The simulation results of NJML, fastDNAml, and DNAML are from Ota and Li (2000)Citation . All approximation modes of NJML+, especially modes A and B, showed reductions in computer runtime. Although mode B used the exact likelihood computation, it was almost twice as fast as NJML with the same threshold, and it was 10 times as fast as fastDNAml when the number of OTUs was 22.


View this table:
[in this window]
[in a new window]
 
Table 1 Computer Run Times (in seconds) for Nucleotide Sequence Data

 
In the case of amino acid sequence data, computer runtimes are compared in table 2 and figure 3 among NJML+, PUZZLE, and PROTML. Standard deviations of computer runtimes are also shown.


View this table:
[in this window]
[in a new window]
 
Table 2 Average Computer Run Times (in seconds) for Amino Acid Data

 


View larger version (15K):
[in this window]
[in a new window]
 
Fig. 3.—Comparison of average runtimes among three programs. PROTML-q and PROTML-s are PROTML 2.2 with the quick add OTUs search and star decomposition options, respectively; PUZZLE is PUZZLE 4.0.2; NJML+ A and NJML+ B are NJML+ with modes A and B, respectively. Error bars represent standard deviations for 100 runs, some of which are shifted to the right to avoid overlapping

 
The speed of NJML+ with mode A was remarkable, especially when the number of OTUs was large. In the case of 22 OTUs, the runtime was 5, 10, and 18 times as fast as PUZZLE (PZ), the star decomposition search in PROTML (PMs), and the quick add OTUs search in PROTML (PMq), respectively.

Since NJML+ with mode B has extremely large variances in runtimes, comparison among the three programs is not simple (see table 2 ); in particular, the standard deviation was 144.93 when the number of OTUs was 20. As shown in figure 3 , however, NJML+ with mode B had only a weak tendency to increase its runtime with the number of OTUs.

Performance of Approximation Modes (nucleotide sequence data)
Table 3 shows the performance of NJML+ with three approximation modes in comparison with the NJ method and the NJML method with the same threshold. Nucleotide sequence data were used here. Mode B performed almost as well as NJML in any case. Interestingly, in some cases, this approximation mode gave better results than NJML. Modes A and B gave similar results in many cases; however, the performance of mode A was considerably reduced in some cases. In particular, with model tree T1 and the K2P model, the performance of mode A was much poorer than that of NJ with a/b = 0.03/0.42.


View this table:
[in this window]
[in a new window]
 
Table 3 Proportions of Correctly Reconstructed Trees (nucleotide sequence data)

 
In model tree T2 with the JC model, NJ gave no result when a/b = 0.03/0.42 because in many replications no distance estimates were obtained (see table 3 ). In these cases, no result was obtained for both NJML and NJML+ with mode C. On the other hand, modes A and B gave results because CLUSTAL W was used to construct the initial NJ tree.

Performance of Approximation Modes (amino acid sequence data)
In table 4 , the performances of NJ, NJML+, PROTML, and PUZZLE are shown. Amino acid sequence data were used here. For model tree T1, NJML+ with mode B was the best, and NJML+ with mode A was the second best. For example, for the JTT model with a/b = 0.03/0.42, the proportions of correct trees obtained were 0.95 for NJML+ with mode B and 0.91 for NJML+ with mode A, but only 0.63 for NJ, 0.68 for PMq, 0.65 for PMs, and 0.74 for PZ. For model tree T2, NJML+ with mode B and mode A still performed better than the other methods except for PMs, although the differences in performance were smaller than those for model tree T1.


View this table:
[in this window]
[in a new window]
 
Table 4 Proportion of Correctly Reconstructed Trees (amino acid sequence data)

 
Surprisingly, when model tree T1 was used, the star decomposition search in PROTML showed poorer results than NJ except for three cases. On the other hand, when model tree T2 was used, the star decomposition search gave better results than all other methods except for two cases of NJML+ with modes A and B (table 4 ). In contrast, the quick add OTUs search in PROTML gave better results with model tree T1 than with model tree T2.

Efficiencies in Randomly Generated Model Trees with Amino Acid Sequence Data
The results of simulation with randomly generated model trees are shown in table 5 . In each case, 500 randomly generated trees were used as model trees (in total, 8,000 model trees were used). NJML+ always gave better results than the other methods except for one case in which it tied with NJ (Dayhoff model with 600 amino acids of amino acid sequence data).


View this table:
[in this window]
[in a new window]
 
Table 5 Proportion of Correctly Reconstructed Trees for Randomly Generated Trees

 
Note that in the case of = 0.1, the average branch length is large for trees having 20 OTUs because some pairwise distances are larger than 2. Interestingly, NJML+ gave good results even in this case.

An Example: Myosin Light Chain Sequences
Figure 4 shows two phylogenetic trees of myosin essential light chain sequences reconstructed using the NJ method and NJML+ with mode B. The number of sequences was 23, and the number of sites (the sequence length) was 187. The aligned data are from OOta and Saitou (1999)Citation . The threshold for NJML+ was 90% with n = 3. The JTT model was used.



View larger version (24K):
[in this window]
[in a new window]
 
Fig. 4.—Comparison between trees A and B, which were reconstructed with the NJ and NJML+ methods, respectively. Steps 1, 2, and 3 indicate branches evaluated by NJML+ with mode B in steps 1, 2, and 3, respectively. Other numbers by internal branches are bootstrap values for 1,000 and 100 replications for the NJ and NJML+ methods, respectively. The vertical bars represent tissues in which genes are expressed. The scale bars represent the numbers of substitutions per 100 amino acid sites

 
Since an initial NJ tree had seven internal branches with bootstrap values lower than the threshold (90%), NJML+ needed three steps (=the smallest integer not less than 7/3) to infer the final tree. In figure 4B, the branches reevaluated by NJML+ in the three steps are indicated by steps 1, 2, and 3, respectively. In each step, the likelihood statistics for different candidate trees are computed and can be used to test whether two candidate trees are significantly different.

There are three differences between the NJ tree (fig. 4A ) and the NJML+ tree (fig. 4B ).

  1. In tree A, MOHUA2 (human) and MORBLA (rabbit) are in one cluster, whereas in tree B, MORBLA is clustered with A23253 (mouse) and MORTA1 (rat). The latter clustering is in agreement with the traditional view of mammalian phylogeny. However, the difference between the two clusterings is not significant; the difference in log likelihood values in step 2 is 1.6 ± 3.2.
  2. In tree A, MOHU6M (human) is clustered with S13671 (bovine) and JX0215 (pigs), whereas in tree B, MOCHG2 (chicken) is clustered with S13671 and JX0215. Although the former clustering is more plausible in terms of vertebrate phylogeny, the latter is significantly better in terms of likelihood value (31.4 ± 15.1). This raises the possibility that MOHU6M (human) is not orthologous to S13671 (bovine) and JX0215 (pig).
  3. In tree A, the sister group to the slow skeletal/cardiac muscle and cardiac muscle clusters of proteins is the fast skeletal muscle cluster of proteins, whereas in tree B it is the smooth/nonmuscle cluster of proteins. However, the difference in log likelihood values (14.1 ± 8.8) is not significant, and future research is needed to resolve the phylogeny.

We also tried NJML+ with mode A for the same sequence data. The result was the same as tree B.


    Discussion
 TOP
 Abstract
 Introduction
 Algorithm and Implementation of...
 Simulation Design
 Results
 Discussion
 Supplementary Material
 Acknowledgements
 References
 
The simulation studies showed that for amino acid sequence data, the performance of NJML+ was better than or almost equal to those of other methods. Furthermore, in each approximation mode, the speed of NJML+ was much faster than those of other methods. In the case of nucleotide sequence data, although the reduction in computer runtime was considerable in approximate modes, some of the approximate modes were less efficient than NJML.

Table 2 shows that mode B of NJML+ led to complicated relationships between the number of OTUs and the corresponding runtime. This is probably because the convergent time in searching the ML value varied considerably with replicated sequence data. In other words, the size of the topology search space is highly dependent on the distribution of bootstrap values among internal branches, as suggested by Ota and Li (2000)Citation . On the other hand, in mode A, no iterative computation was carried out except in the final step, so the corresponding runtime and the number of OTUs were monotonously correlated.

It is obvious that Kimura's (1983)Citation correction is not appropriate when d >> 1.0, where d is the number of amino acid substitutions per site between two sequences. Actually, the NJ method could not recover any correct topology with Kimura's correction when model tree T2 was used with a/b = 0.03/0.42 (the expected number of substitutions per site between two most divergent sequences was b + b + a/2 + a/2 + b + b = 4b + a = 1.71). As a consequence, the results of NJML+ were poor in this case (data not shown). In practical analyses with amino acid data, however, diverged data with d >> 1.0 are rarely used. Better methods for estimating distances should improve the efficiency of NJML+. For example, the method of Ota and Nei (1994)Citation is a good candidate in terms of computational time.

We tried a version of NJML+ using protdist of PHYLIP instead of CLUSTAL W with amino acid sequence data. This version corresponded to NJML+ with mode B for nucleotide sequence data. In terms of efficiencies in reconstructing correct topologies, the performance was considerably better than that of NJML+ with CLUSTAL W (namely, mode C for amino acid sequence data). In terms of computer runtime, however, NJML+ with protdist was impracticable (data not shown).

The costly runtime with protdist arose from the computation of distance matrices with bootstrap resampling data. If we already have a bootstrap tree in Newick format (Felsenstein 1993Citation ), the NJML+ algorithm will improve tree reconstruction without this computational burden.

It may happen that at a step there are two or more candidate topologies that do not differ significantly in likelihood value. Since NJML+ uses a greedy search algorithm, such a situation may lead to a wrong result. However, one can reduce the chance of making an error by using the option mode of NJML+. For example, if we have m such candidate trees in step n, we can give those m trees to NJML+ as user-defined initial bootstrap trees. This is practically a beam search algorithm with m nodes under consideration at depth n (e.g., Kumar 1996Citation ; Rodin and Li 2000Citation ). In addition, if we encounter other multiple candidate trees whose log likelihoods are not significantly different, we can apply the same option. This option of NJML+ provides flexibility and makes this method distinct from simple greedy search algorithms.

In simulations with randomly generated model trees, NJ performed as well as PROTML and PUZZLE. This shows that NJ is quite robust for large amino acid sequence data sets. It is interesting that although NJ and PROTML with the star decomposition search use similar strategies, their results were very different. This may be an important difference between the minimum-evolution criterion (Saitou and Imanishi 1989Citation ; Rzhetsky and Nei 1992Citation ) and the ML criterion.

Since NJML+ elaborates the topology search space around an initial NJ tree with the ML method, it is not surprising that NJML+ gives better results than NJ. This shows an advantage in the strategy of NJML+.


    Supplementary Material
 TOP
 Abstract
 Introduction
 Algorithm and Implementation of...
 Simulation Design
 Results
 Discussion
 Supplementary Material
 Acknowledgements
 References
 
A computer software package will be available for distribution on the web site http://oota.uchicago.edu/njml_plus/njml_plus.html.


    Acknowledgements
 TOP
 Abstract
 Introduction
 Algorithm and Implementation of...
 Simulation Design
 Results
 Discussion
 Supplementary Material
 Acknowledgements
 References
 
We thank Dr. Rick Stevens for permission to use a parallel computer system in the Argonne National Laboratory. Drs. Masami Hasegawa and Joe Felsenstein kindly gave us permission to use the source code of their programs for our programs. Dr. Julie Thompson-Maaloum also allowed us to use part of the source code of CLUSTAL W. Dr. Don Gilbert not only allowed us to use his programs, but also advised us on Java environment. This work was supported by NIH grant GM30998.


    Footnotes
 
Naruya Saitou, Reviewing Editor

1 Keywords: tree reconstruction topology search approximate likelihood neighbor joining likelihood test Back

2 Address for correspondence and reprints: Wen-Hsiung Li, Department of Ecology and Evolution, University of Chicago, 1101 East 57th Street, Chicago, Illinois 60637. whli{at}uchicago.edu . Back


    References
 TOP
 Abstract
 Introduction
 Algorithm and Implementation of...
 Simulation Design
 Results
 Discussion
 Supplementary Material
 Acknowledgements
 References
 

    Adachi J., M. Hasegawa, 1996 MOLPHY version 2.3: programs for molecular phylogenetics based on maximum likelihood Comput. Sci. Monogr 28:1-150

    Dayhoff M. O., 1978 Survey of new data and computer methods of analysis Pp. 2–8 in M. O. Dayhoff, ed. Atlas of protein sequence and structure. National Biomedical Research Foundation, Silver Springs, Md

    Doolittle W. F., 1999 Phylogenetic classification and the universal tree Science 284:2124-2193[Abstract/Free Full Text]

    Felsenstein J., 1981 Evolutionary trees from DNA sequences: a maximum likelihood approach J. Mol. Evol 17:368-376[ISI][Medline]

    ———. 1993 PHYLIP: phylogeny inference package. Version 3.5c Distributed by the author, Department of Genetics, University of Washington, Seattle

    Fukami-Kobayashi K., Y. Tateno, 1991 Robustness of maximum likelihood tree estimation against different patterns of base substitutions J. Mol. Evol 32:79-91[ISI][Medline]

    Gilbert D. G., 1997 Phylodendron, Java software for phylogenetic tree drawing Indiana University, http://iubio.bio.indiana.edu/treeapp/

    Goldman N., J. L. Thorne, D. T. Jones, 1996 Using evolutionary trees in protein secondary structure prediction and other comparative sequence analyses J. Mol. Biol 263:196-208[ISI][Medline]

    Grassly N. C., J. Adachi, A. Rambaut, 1997 PSeq-Gen: an application for the Monte Carlo simulation of protein sequence evolution along phylogenetic trees Comput. Appl. Biosci 13:559-560[Medline]

    Gribaldo S., P. Cammarano, 1998 The root of the universal tree of life inferred from anciently duplicated genes encoding components of the protein-targeting machinery J. Mol. Evol 47:508-516[ISI][Medline]

    Hasegawa M., M. Fujiwara, 1993 Relative efficiencies of the maximum likelihood, maximum parsimony, and neighbor-joining methods for estimating protein phylogeny Mol. Phylogenet. Evol 2:1-5[Medline]

    Hasegawa M., H. Kishino, N. Saitou, 1991 On the maximum likelihood method in molecular phylogenetics J. Mol. Evol 32:443-445[ISI][Medline]

    Hasegawa M., H. Kishino, T. Yano, 1985 Dating of the human-ape splitting by a molecular clock of mitochondrial DNA J. Mol. Evol 22:160-174[ISI][Medline]

    Huelsenbeck J. P., 1995 The robustness of two phylogenetic methods: four-taxon simulations reveal a slight superiority of maximum likelihood over neighbor joining Mol. Biol. Evol 12:843-849[Abstract]

    Jones D. T., W. R. Taylor, J. M. Thornton, 1992 The rapid generation of mutation data matrices from protein sequences Comput. Appl. Biosci 8:275-282[Abstract]

    Jukes T. H., C. R. Cantor, 1969 Evolution of protein molecules Pp. 21–132 in H. N. Munro, ed. Mammalian protein metabolism. Academic Press, New York

    Kimura M., 1980 A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences J. Mol. Evol 16:111-120[ISI][Medline]

    ———. 1983 The neutral theory of molecular evolution Cambridge University Press, Cambridge, England

    Kishino H., M. Hasegawa, 1989 Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in hominoidea J. Mol. Evol 29:170-179[ISI][Medline]

    Kishino H., T. Miyata, M. Hasegawa, 1990 Maximum likelihood inference of protein phylogeny and the origin of chloroplasts J. Mol. Evol 31:151-160[ISI]

    Kuhner M., J. Felsenstein, 1994 A simulation comparison of phylogeny algorithms under equal and unequal evolutionary rates Mol. Biol. Evol 11:459-468[Abstract]

    Kumar S., 1996 A stepwise algorithm for finding minimum evolution trees Mol. Biol. Evol 13:584-593[Abstract]

    Lio P., N. Goldman, J. L. Thorne, D. T. Jones, 1998 PASSML: combining evolutionary inference and protein secondary structure prediction Bioinformatics 14:726-733[Abstract]

    Ota S., N. Saitou, 1999 Phylogenetic relationship of muscle tissues deduced from superimposition of gene trees Mol. Biol. Evol 16:856-867[Abstract]

    Ota S., W.-H. Li, 2000 NJML: a hybrid algorithm for the neighbor-joining and maximum likelihood methods Mol. Biol. Evol 17:1401-1409[Abstract/Free Full Text]

    Ota T., M. Nei, 1994 Estimation of the number of amino acid substitutions per site when the substitution rate varies among sites J. Mol. Evol 38:642-643[ISI]

    Philippe H., P. Forterre, 1999 The rooting of the universal tree of life is not reliable J. Mol. Evol 49:509-523[ISI][Medline]

    Rambaut A., N. C. Grassly, 1997 Seq-Gen: an application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees Comput. Appl. Biosci 13:235-238[Abstract]

    Rodin A., W.-H. Li, 2000 A rapid heuristic algorithm for finding minimum evolution trees Mol. Phylogenet. Evol 6:173-179

    Rzhetsky A., M. Nei, 1992 Statistical properties of the ordinary least-squares, generalized least-squares, and minimum-evolution methods of phylogenetic inference J. Mol. Evol 35:367-375[ISI][Medline]

    Saitou N., T. Imanishi, 1989 Relative efficiencies of the Fitch-Margoliash, maximum-parsimony, maximum-likelihood, minimum-evolution, and neighbor-joining methods of phylogenetic tree construction in obtaining the correct tree Mol. Biol. Evol 6:514-525[Free Full Text]

    Saitou N., M. Nei, 1987 The neighbor-joining method: a new method for reconstructing phylogenetic trees Mol. Biol. Evol 4:406-425[Abstract]

    Sogin M. L., G. Hinkle, D. D. Leipe, 1993 Universal tree of life Nature 362:795

    Strimmer K., N. Goldman, A. von Haeseler, 1997 Bayesian probabilities and quartet puzzling Mol. Biol. Evol 14:210-211[Free Full Text]

    Strimmer K., A. von Haeseler, 1996 Quartet puzzling: a quartet maximum-likelihood method for reconstructing tree topologies Mol. Biol. Evol 13:964-969[Free Full Text]

    Tateno Y., N. Takezaki, M. Nei, 1994 Relative efficiencies of the maximum-likelihood, neighbor-joining, and maximum-parsimony methods when substitution rate varies with site Mol. Biol. Evol 11:261-277[Abstract]

    Thompson J. D., D. G. Higgins, T. J. Gibson, 1994 CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice Nucleic Acids Res 22:4673-4680[Abstract]

    Yang Z., 1999 PAML manual Department of Biology, Galton Laboratory, University College London, London

Accepted for publication June 19, 2001.





This Article
Abstract
FREE Full Text (PDF)
Alert me when this article is cited
Alert me if a correction is posted
Services
Email this article to a friend
Similar articles in this journal
Similar articles in ISI Web of Science
Similar articles in PubMed
Alert me to new issues of the journal
Add to My Personal Archive
Download to citation manager
Search for citing articles in:
ISI Web of Science (5)
Request Permissions
Google Scholar
Articles by Ota, S.
Articles by Li, W.-H.
PubMed
PubMed Citation
Articles by Ota, S.
Articles by Li, W.-H.