Department of Biochemistry and Molecular Biophysics, University of Arizona, Tucson
Correspondence: E-mail: daubin{at}email.arizona.edu.
Abstract
Key Words: bacterial phylogeny, maximum likelihood, quartet trees, lateral gene transfer
Introduction
One approach proposed to circumvent some of the difficulties in reconstructing relationships among prokaryotes is to compare all gene trees, in the hope that a consensus tree will emerge. Concomitantly, any conflicting phylogenies, an expected result of lateral gene transfer (LGT), can be identified. Such attempts have reached different conclusions about the degree to which LGT has shaped prokaryotic genomes (Jain, Rivera, and Lake 1999; Nesbo, Boucher, and Doolittle 2001; Brochier et al. 2002; Daubin, Gouy, and Perrière 2002; Zhaxybayeva and Gogarten 2002; Daubin, Moran, and Ochman 2003). Among them, the application of the quartet-mapping method (Strimmer and von Haeseler 1997; Nieselt-Struwe and von Haeseler 2001) has provided striking support for the pervasiveness of LGT, leading some authors to abandon the notion of treelike evolution in prokaryotes.
Quartet mapping is a likelihood method originally proposed as a tool for analyzing the phylogenetic content of an alignment of n sequences by extracting all combinations of four sequences ("quartets") and evaluating the likelihood of each of the three possible unrooted topologies for every quartet (Strimmer and von Haeseler 1997; Nieselt-Struwe and von Haeseler 2001). Following these authors, the posterior probability (p_i) of each of the three possible topologies (T_1, T_2, and T_3) for a given quartet is computed as follows:
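$$p_i = \frac{L_i}{L_1 + L_2 + L_3}, \qquad i = 1, 2, 3 \tag{1}$$
where L_i is the likelihood of topology T_i.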
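The posterior weights of the three quartet topologies can be computed from the log-likelihoods that a maximum-likelihood program reports. The sketch below works in log space because the likelihood of a whole alignment is far too small to represent directly; the function name and the example log-likelihood values are hypothetical, not taken from this study.

```python
import math

def quartet_posteriors(log_likelihoods):
    """Return posterior weights p_i = L_i / (L_1 + L_2 + L_3).

    Inputs are natural-log likelihoods ln L(T_i) of the three quartet
    topologies. The log-sum-exp trick keeps the computation stable even
    though the raw likelihoods would underflow to zero.
    """
    if len(log_likelihoods) != 3:
        raise ValueError("a quartet has exactly three unrooted topologies")
    m = max(log_likelihoods)
    # Subtract the maximum before exponentiating so the largest term is exp(0) = 1.
    weights = [math.exp(lnl - m) for lnl in log_likelihoods]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical log-likelihoods for one gene alignment (illustrative only).
print(quartet_posteriors([-5230.4, -5236.1, -5241.9]))
```

Only differences between log-likelihoods matter, so a gap of a few log units already pushes the best topology's weight close to 1.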
In their original paper, Strimmer and von Haeseler (1997) applied this method to test the ability of ribosomal DNA (rDNA) to recover the relationships of myriapods and chelicerates within the Arthropoda, using quartets of different representatives of these groups. The grouping of myriapods and chelicerates was supported by over 90% of the quartets, suggesting that the alignment was sufficiently informative to resolve the relationships among these higher taxa. However, more than 7% of the quartet alignments supported an alternative topology, indicating that, depending on which sequences are included in a quartet, a substantial proportion of quartet alignments will artifactually support different trees.
The quartet-mapping method has recently been modified to analyze the extent of LGT in prokaryotes (Nesbo, Boucher, and Doolittle 2001; Zhaxybayeva and Gogarten 2002). Rather than evaluating the phylogenetic information contained in a multiple sequence alignment of many taxa for a single gene, these studies use groups of four fully sequenced genomes to assess the congruence of hundreds of orthologous gene alignments with one another. In comparisons including distantly related prokaryotic lineages, each of the three possible quartet topologies was supported by approximately equal numbers of genes, which was taken to indicate that LGT has occurred at such a high frequency that no consensus tree exists. However, previous analyses have shown that quartets, even in relatively recent groups such as mammals, frequently give strong support to arbitrary topologies, depending on the sequences chosen to represent each mammalian order (Philippe and Douzery 1994; Adachi and Hasegawa 1996). Moreover, the original application of quartet mapping to arthropods showed that genes considered reliable phylogenetic markers, such as rDNA, can yield a significant proportion of alignments supporting alternative topologies, even in the absence of LGT. Therefore, given the antiquity of bacterial phyla, protein sequences that exhibit higher levels of divergence than rDNA might produce results that could be incorrectly attributed to LGT. In this paper, we show that quartet mapping can impart too much confidence to trees that receive no statistical support from other tests, and thus that the amount of LGT inferred with this method can be overestimated.
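In the genome-scale version, every orthologous gene family contributes one alignment for the same four genomes, and genes are tallied by the quartet topology they favor. The sketch below assumes per-gene log-likelihoods have already been computed for the three topologies in a fixed order; it deliberately ignores support thresholds, and the genome labels, helper names, and values are hypothetical placeholders.

```python
from collections import Counter

def quartet_topologies(taxa):
    """Return the three unrooted topologies of four taxa as split labels."""
    a, b, c, d = taxa
    return [f"{a},{b}|{c},{d}", f"{a},{c}|{b},{d}", f"{a},{d}|{b},{c}"]

def tally_best_topologies(taxa, per_gene_lnl):
    """Count gene alignments whose highest log-likelihood favors each topology.

    per_gene_lnl holds one (lnL1, lnL2, lnL3) triple per orthologous gene,
    in the same order as quartet_topologies(taxa).
    """
    labels = quartet_topologies(taxa)
    counts = Counter({label: 0 for label in labels})
    for lnls in per_gene_lnl:
        best = max(range(3), key=lambda i: lnls[i])
        counts[labels[best]] += 1
    return counts

# Hypothetical per-gene log-likelihoods for genomes A, B, C, D.
genes = [(-812.3, -815.0, -819.4), (-640.2, -639.8, -641.0), (-1204.7, -1210.1, -1203.9)]
print(tally_best_topologies(("A", "B", "C", "D"), genes))
```

Roughly equal counts across the three splits are the pattern that earlier studies read as evidence of pervasive LGT.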
Materials and Methods
Results and Discussion
Most applications of quartet mapping ignore these statistics and instead adopt a fixed threshold, whereby a topology is judged to be strongly supported if its likelihood is at least 100-fold higher than the sum of the likelihoods of the two other topologies (see equation 1). Fixed thresholds are particularly problematic because the SEs of the log-likelihood estimates depend on the particular alignment and are often much larger than the log-likelihood difference implied by a fixed threshold (ln 100 ≈ 4.6). The SEs of log-likelihood estimates typically exceed 5 (e^5 ≈ 148); therefore, for a given alignment, a threshold of 100 can confer a posterior probability greater than 0.99 on a topology that is not statistically different from the two others.
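The arithmetic behind this argument can be checked directly. The sketch below uses a hypothetical helper that applies the fixed 100-fold rule in log space; it shows that the rule asks for a lead over the runner-up of only about 4.6 to 5.3 log-likelihood units, no more than the typical SE of 5. The example log-likelihoods are invented for illustration.

```python
import math

def passes_fixed_threshold(log_likelihoods, ratio=100.0):
    """Fixed quartet-mapping rule: L_best >= ratio * (L_2 + L_3).

    Evaluated in log space as ln L_best - ln(L_2 + L_3) >= ln(ratio),
    using log-sum-exp for the two rival topologies.
    """
    best, *others = sorted(log_likelihoods, reverse=True)
    m = max(others)
    log_sum_others = m + math.log(sum(math.exp(x - m) for x in others))
    return best - log_sum_others >= math.log(ratio)

# Posterior weight exactly at the threshold: 100 / (100 + 1).
print(round(100 / 101, 4))                                # 0.9901
# Required lead over the runner-up: ln(100) if the third topology is
# negligible, ln(200) if the two rivals are tied.
print(round(math.log(100), 2), round(math.log(200), 2))   # 4.61 5.3
print(round(math.exp(5), 1))                              # 148.4
# A 5-unit lead passes the fixed rule, although with an SE of 5 it lies
# only one standard error from zero.
print(passes_fixed_threshold([-1000.0, -1005.0, -1012.0]))  # True
```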
To assess the disparity between fixed and variable (i.e., SE-dependent) thresholds, we computed the difference between the log-likelihood of the best-supported topology and that of each of the two other topologies, as well as the SE of the log-likelihood estimates, for the families of orthologous genes from each of the two quartets of bacterial taxa (enterics and Streptococci). The dotted line in figure 1 shows the fixed threshold typically applied in the quartet-mapping method; all alignments represented by points above this line would therefore be considered as significantly supporting one of the three possible topologies. The solid line represents the variable threshold, which takes the SE into account (at the 5% level). All points in the shaded area between the solid and dotted lines represent topologies that are falsely supported when a fixed threshold is used.
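One way to build such a variable threshold is sketched below, assuming per-site log-likelihoods are available for each topology. It uses the Kishino-Hasegawa normal approximation for the SE of a log-likelihood difference. Two choices are our assumptions, not details given in the text: the critical value 1.96 (two-sided 5%) and the pairwise form of the fixed rule (difference ≥ ln 100), which simplifies the "sum of the other two" criterion. The function names are hypothetical.

```python
import math

def kh_delta_se(site_lnl_best, site_lnl_other):
    """Total log-likelihood difference between two topologies and its SE.

    Kishino-Hasegawa approach: per-site differences are treated as
    independent, so Var(sum) = n/(n-1) * sum((d_s - mean)^2).
    """
    if len(site_lnl_best) != len(site_lnl_other) or len(site_lnl_best) < 2:
        raise ValueError("need two equal-length per-site vectors, n >= 2")
    diffs = [a - b for a, b in zip(site_lnl_best, site_lnl_other)]
    n = len(diffs)
    mean = sum(diffs) / n
    var_total = n / (n - 1) * sum((d - mean) ** 2 for d in diffs)
    return sum(diffs), math.sqrt(var_total)

def support_category(delta, se, ratio=100.0, z_crit=1.96):
    """Place one (delta, SE) point relative to the fixed and variable thresholds."""
    fixed = delta >= math.log(ratio)   # pairwise simplification of the fixed rule
    variable = delta >= z_crit * se    # SE-dependent cutoff at the 5% level
    if fixed and not variable:
        return "fixed only (shaded region)"
    if fixed and variable:
        return "both"
    if variable:
        return "variable only"
    return "neither"

print(support_category(5.0, 5.0))   # fixed only (shaded region)
print(support_category(25.0, 5.0))  # both
```

When the best topology is chosen from the same data, the KH test is known to be overconfident, which is one motivation for the SH test.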
These results have broad implications for the interpretation of genome evolution in bacteria, because conflicting topologies among bacterial genes have often been interpreted as evidence of LGT. In fact, fixed and variable thresholds lead to different conclusions about the extent of LGT affecting bacterial genomes (table 1): the number of alignments "significantly" supporting an alternative topology can be severalfold higher with quartet mapping than with either the Kishino-Hasegawa (KH) test (Kishino and Hasegawa 1989) or the Shimodaira-Hasegawa (SH) test (Shimodaira and Hasegawa 1999).
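For comparison with the fixed-threshold counts, the SH test can be sketched as follows. This is a minimal, unoptimized version that assumes per-site log-likelihoods for every candidate topology and uses RELL resampling (sites are bootstrapped without re-optimizing branch lengths); it is not necessarily the implementation behind table 1, and the function name and inputs are hypothetical.

```python
import random

def sh_test(site_lnl, n_boot=1000, seed=1):
    """Shimodaira-Hasegawa test with RELL bootstrap resampling.

    site_lnl: one list of per-site log-likelihoods per candidate topology,
    all of equal length. Returns one p-value per topology; a small value
    means the topology is significantly worse than the best one.
    """
    k = len(site_lnl)
    n = len(site_lnl[0])
    if k < 2 or any(len(v) != n for v in site_lnl):
        raise ValueError("need >= 2 topologies with equal-length site vectors")
    rng = random.Random(seed)

    totals = [sum(v) for v in site_lnl]
    best_total = max(totals)
    observed = [best_total - t for t in totals]   # T_i = L_max - L_i

    # RELL: resample sites with replacement and re-sum the fixed per-site values.
    boot = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        boot.append([sum(v[i] for i in idx) for v in site_lnl])

    # Center each topology's bootstrap distribution (the SH null hypothesis).
    means = [sum(rep[j] for rep in boot) / n_boot for j in range(k)]
    exceed = [0] * k
    for rep in boot:
        centered = [rep[j] - means[j] for j in range(k)]
        top = max(centered)
        for j in range(k):
            if top - centered[j] >= observed[j]:
                exceed[j] += 1
    return [c / n_boot for c in exceed]
```

Unlike the KH test, the SH test accounts for the best topology having been selected from the data, so it is the more conservative of the two.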
As revealed by the difference between the results for the enteric bacteria and for the Streptococci (fig. 1), quartet mapping yields higher estimates of LGT when more distantly related species are compared. Moreover, within a cluster of species, genes whose phylogenies have short internal branches tend to support the hypothesis of LGT under quartet mapping, which leads to the overestimation of LGT even among bacterial species of the same genus. Although our analyses treated the Streptococci, which differ by at most 6% in their rDNA sequences, as relatively distantly related, previous studies have applied quartet mapping to representatives of different phyla, which are substantially more divergent and thus more likely to yield a higher proportion of artifactual LGT. Considering quartets of such distantly related species increases the amount of phylogenetic incongruence caused by reconstruction artifacts, misalignment, and sampling bias (Moreira and Philippe 2000; Eisen 2000; Zwickl and Hillis 2002). As in bacteria, such incongruences among trees have been observed in numerous plant and animal groups, but there they have led molecular phylogeneticists to question and refine their methods rather than to invoke lateral gene transfer. The antiquity of Bacteria poses an exceptional challenge to the reconstruction of molecular phylogenies, and although LGT is certainly a major force molding prokaryotic genomes, its inference must rely on careful phylogenetic analysis with large taxon sampling and appropriate tests of incongruence.
Footnotes
Literature Cited
Adachi, J., and M. Hasegawa. 1996. Instability of quartet analyses of molecular sequence data by the maximum likelihood method: the Cetacea/Artiodactyla relationships. Mol. Phylogenet. Evol. 6:72-76.
Altschul, S. F., T. L. Madden, A. A. Schäffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25:3389-3402.
Brochier, C., E. Bapteste, D. Moreira, and H. Philippe. 2002. Eubacterial phylogeny based on translational apparatus proteins. Trends Genet. 18:1-5.
Brochier, C., H. Philippe, and D. Moreira. 2000. The evolutionary history of ribosomal protein RpS14: horizontal gene transfer at the heart of the ribosome. Trends Genet. 16:529-533.
Brown, J. R., C. J. Douady, M. J. Italia, W. E. Marshall, and M. J. Stanhope. 2001. Universal trees based on large combined protein sequence data sets. Nat. Genet. 28:281-285.
Daubin, V., M. Gouy, and G. Perrière. 2002. A phylogenomic approach to bacterial phylogeny: evidence of a core of genes sharing a common history. Genome Res. 12:1080-1090.
Daubin, V., N. Moran, and H. Ochman. 2003. Phylogenetics and the cohesion of bacterial genomes. Science 301:829-832.
Doolittle, W. F. 1999. Lateral genomics. Trends Cell Biol. 9:M5-M9.
Eisen, J. A. 2000. Horizontal gene transfer among microbial genomes: new insights from complete genome analysis. Curr. Opin. Genet. Dev. 10:606-611.
Higgins, D. G., J. D. Thompson, and T. J. Gibson. 1996. Using CLUSTAL for multiple sequence alignments. Methods Enzymol. 266:383-402.
Jain, R., M. C. Rivera, and J. A. Lake. 1999. Horizontal gene transfer among genomes: the complexity hypothesis. Proc. Natl. Acad. Sci. USA 96:3801-3806.
Kishino, H., and M. Hasegawa. 1989. Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in Hominoidea. J. Mol. Evol. 29:170-179.
Moreira, D., and H. Philippe. 2000. Molecular phylogeny: pitfalls and progress. Int. Microbiol. 3:9-16.
Nesbo, C. L., Y. Boucher, and W. F. Doolittle. 2001. Defining the core of nontransferable prokaryotic genes: the euryarchaeal core. J. Mol. Evol. 53:340-350.
Nieselt-Struwe, K., and A. von Haeseler. 2001. Quartet-mapping, a generalization of the likelihood-mapping procedure. Mol. Biol. Evol. 18:1204-1219.
Ochman, H., J. G. Lawrence, and E. A. Groisman. 2000. Lateral gene transfer and the nature of bacterial innovation. Nature 405:299-305.
Philippe, H., and E. Douzery. 1994. The pitfalls of molecular phylogeny based on four species as illustrated by the Cetacea/Artiodactyla relationships. J. Mammal. Evol. 2:133-152.
Shimodaira, H., and M. Hasegawa. 1999. Multiple comparisons of log-likelihoods with applications to phylogenetic inference. Mol. Biol. Evol. 16:1114-1116.
Strimmer, K., and A. von Haeseler. 1996. Quartet puzzling: a quartet maximum likelihood method for reconstructing tree topologies. Mol. Biol. Evol. 13:964-969.
Strimmer, K., and A. von Haeseler. 1997. Likelihood-mapping: a simple method to visualize phylogenetic content of a sequence alignment. Proc. Natl. Acad. Sci. USA 94:6815-6819.
Zhaxybayeva, O., and J. P. Gogarten. 2002. Bootstrap, Bayesian probability and maximum likelihood mapping: exploring new tools for comparative genome analyses. BMC Genomics 3:4.
Zwickl, D. J., and D. M. Hillis. 2002. Increased taxon sampling greatly reduces phylogenetic error. Syst. Biol. 51:588-598.