*Equipe Phylogénie et Evolution moléculaires (UPRESA CNRS 8080), Université Paris-Sud, Orsay, France;
and
Laboratoire de Biologie des Protistes (UPRESA CNRS 6023), Université Clermont-Ferrand 2, Aubière, France
The molecular phylogeny of eukaryotes was first based on the comparison of the small-subunit ribosomal RNA (SSU rRNA) and quickly became a paradigm (Sogin 1991
). However, serious criticisms have recently been raised against it. In particular, it has been suggested that the various protist lineages that emerged first (e.g., microsporidia, trichomonads, mycetozoans) were misplaced because of the long-branch attraction (LBA) artifact (Philippe and Adoutte 1998
; Philippe and Laurent 1998
; Germot and Philippe 1999
; Moreira, Le Guyader, and Philippe 1999
; Roger et al. 1999
; Stiller and Hall 1999
). Indeed, simulation studies (unpublished data) showed that when a distant outgroup was used, all the standard methods of phylogenetic reconstruction artifactually inferred a phylogeny with an asymmetrical base containing the fast-evolving lineages, leading to a tree shape very similar to the rRNA one.
When the substitution rates are considerably variable for different lineages, the use of a model that reflects the actual substitution probability is of utmost importance for obtaining a correct tree topology (Lockhart et al. 1994
; Swofford et al. 1996
; Yang 1996
; Takezaki and Gojobori 1999
). In the case of rRNA-based eukaryotic phylogeny, the first model violation noticed was the variation of G + C content among species (Hasegawa and Hashimoto 1993
). Yet, the overall picture of the rRNA tree remained unchanged when only species with similar G + C contents were included (Leipe et al. 1993
) or when methods less sensitive to this bias, such as transversion parsimony (Woese et al. 1991
) or log-Det distance (Lockhart et al. 1994
), were applied. Variation in evolutionary rates across sites (RAS) is a further complexity of sequence evolution which also has a significant effect on evolutionary tree building (Olsen 1987
; Yang 1996
). For instance, the late emergence of fast-evolving lineages such as the nucleomorph of chloroarachniophytes (Van de Peer et al. 1996
) or microsporidia (Peyretaillade et al. 1998
) can be retrieved only when RAS models are assumed. Finally, the existence of covarion structure constitutes a further model violation that has just begun to be considered (Lockhart et al. 1998
; Lopez, Forterre, and Philippe 1999
).
In this study, we sequenced the complete large-subunit (LSU) rRNA of Trichomonas vaginalis. We concatenated this sequence with that of the SSU rRNA for this taxon and compared these data with similar data from other eukaryotes. Using these data, we investigated the robustness of eukaryotic phylogeny with consideration for the model of sequence evolution used. The protocol for obtaining the LSU rRNA using several universal and specific primers was the same as that previously described (Germot, Philippe, and Le Guyader 1996
), except that we sequenced both strands here. The size of the LSU rRNA was approximately 2,760 nt (the beginning and end were identified only by homology with other rRNAs), and its G + C content was 46%. The sequence of T. vaginalis was aligned manually with the available complete LSU rRNA sequences of 120 eukaryotes and of 15 archaebacteria, using the ED program of the MUST package (Philippe 1993
). To increase the sequence lengths, we aligned SSU rRNA sequences for the same set of organisms whenever possible, and only the fusion of SSU and LSU rRNA was used for further analysis. A subset of only 32 species, encompassing as much biodiversity as possible (i.e., by eliminating numerous fungal, plant, and metazoan species), was used in order to reduce the computing time of our maximum-likelihood (ML) analysis. Only positions that could be unambiguously aligned were selected, leaving 875 and 1,604 positions for SSU and LSU rRNA, respectively. The alignment is available on request or at http://bufo.bc4.u-psud.fr.
The G + C content of rRNA appeared to be significantly heterogeneous between species at the 5% level, as measured by the 2 statistics implemented in PUZZLE (Strimmer and von Haeseler 1996
). Rather than coping with this model violation using a nonhomogeneous model (e.g., Galtier and Gouy 1998
), only transversions were considered, since the ratio of purines to pyrimidines appeared to be homogeneous for all of the considered organisms, as previously observed for prokaryotic rRNA (Woese et al. 1991
). This allowed us to use complex RAS models, such as those using a
law distribution. Moreover, transitions were more saturated than transversions in a dot plot comparison (data not shown) and therefore were less likely to contain a reliable ancient phylogenetic signal.
The ML tree that was inferred with a model assuming equal rates across sites was very similar to the ones inferred from the SSU rRNA only (fig. 1A
). The three "archezoan" lineages (trichomonads, microsporidia, and diplomonads) emerged first, followed by euglenozoans and mycetozoans, and then by the crown. Within the tree, the monophyly of the various phyla was recovered with good statistical support. The main exception was the artifactual grouping of the nucleomorphs of chloroarachniophytes (Pedinomonas) and of cryptophytes (Guillardia), instead of being sister groups of chlorophytes and rhodophytes, respectively (Van de Peer et al. 1996
). As with several early branching lineages, the nucleomorph of chlorarachniophytes displayed a long branch, which suggested a possible misplacement due to the LBA artifact.
|
|
Because parts of the rRNA tree (fig. 1A
) are clearly in error (e.g., the artifactual grouping of red and green nucleomorphs), the decrease in rejection of the alternative tree as more complex models are incorporated into the analyses may reflect only this difference between the trees. We therefore tried to determine which portions of the alternative tree (fig. 1B
) were making a contribution to the relative improvement in likelihood score as models were changed. As expected, each constraint seemed to contribute to a small fraction of the difference in lnL between the revised tree and the classical tree. Unfortunately, the impact of the model used was unclear (table 1
and data not shown). For example, the enforcement of the correct position of the two nucleomorphs is less rejected with the E model (-0.21 SE) than with the model (-0.40 SE). In contrast, enforcing the sister group relationship of fungi and microsporidia is much easier with the
model (-0.49 SE) than with the E model (-1.88 SE). More detailed analysis was difficult because of stochastic effects and interdependence of the constraints.
The strong rejection of the revised tree (the tree specified in fig. 1B
) under the E model and not under the model suggests the significance of the presence of rate heterogeneity and/or covarion structure in the data. In the covarion model, sites that are variable in some parts of the underlying tree are invariable in others, and vice versa (Fitch and Markowitz 1970
). Indeed, as we show below, a concern about phylogenetic inference from these data is the demonstration that the rDNA data do appear to have evolved under a covarion model, and one which is nonstationary. If proportions of invariable sites increase independently in distantly related lineages, then even ML tree building that allows for positional heterogeneity is expected to become inconsistent (Lockhart et al. 1996, 1998
).
We applied the method of Lopez, Forterre, and Philippe (1999)
to test for the existence of a covarion structure. Since the method requires a large number of observed substitutions to be efficient, we used an expanded alignment of 60 species (divided into three groups, Archaea, basal eukaryotes, and "crown" eukaryotes) and both transitions and transversions (similar results were obtained when only transversions were used; data not shown). The matrix appeared to be significantly heterogeneous at the 1% level (the observed
2 was 3,152.10, whereas the limit was 2,394.76), demonstrating the existence of a covarion structure in rRNA. To reduce the model violations, we therefore removed the 153 positions that appeared to be significantly heterogeneous at the 5% level. These positions mainly corresponded to positions that are varied in Archaea and not in Eukaryota (38%) and to positions that are varied in the crown groups and not in Archaea or in sequences present in the asymmetrical base (27%). The resulting matrix, containing 2,326 positions, was not heterogeneous. Because of the lack of a covarion structure, the
model likely fits the data much better, although it is not possible to use a likelihood ratio test. Interestingly, with this homogeneous matrix, the tree of figure 1B
is not significantly rejected (-1.35 SE when compared with the tree of fig. 1A
). This strong decrease in the rejection of the revised tree is not observed for random trees (from -23.77 SE to -23.05 SE), indicating again that this rejection is due to the existence of model violations.
In conclusion, support for the classical rRNA tree (fig. 1A ) obtained using the E model appears to be due to model violations. Our results reported here and elsewhere suggest that the long branches of the diplomonads, microsporidia, and trichomonads have been falsely attracted toward the very long branch of the archaebacterial outgroup (fig. 1B ) as a consequence of different G + C contents, rate heterogeneity, and an increased proportion of variable positions in these ingroup taxa when compared with those taxa in the crown groups. Our observation that the tree in figure 1B could not be excluded as having a worse fit than that in figure 1A when consideration was given to the nonstationary property of biological sequence evolution is a concern for eukaryote sequence studies relying on stationary analysis models.
However, it was not possible to find good statistical support for the widely accepted grouping of microsporidia with fungi (Hirt et al. 1999
), despite the use of very long sequences (the fusion of SSU and LSU rRNA). This lack of resolution is likely due to the presence of both very short and very long branches within the same tree (see fig. 1B
). Therefore, the placement of fast-evolving lineages within the eukaryotic phylogeny is impossible because of the wide range of evolutionary rates in rRNA. To confidently locate these phyla, the use of other phylogenetic markers that evolved slowly in the corresponding lineages is probably the only reliable solution, as has been shown for microsporidia with tubulins (Keeling and Doolittle 1996
) and RNA polymerase (Hirt et al. 1999
).
![]() |
Acknowledgements |
---|
![]() ![]() ![]() |
---|
![]() |
Footnotes |
---|
1 Keywords: Trichomonas vaginalis
rRNA
phylogeny
eukaryotic evolution
rate across sites
covarion
2 Address for correspondence and reprints: Hervé Philippe, Equipe Phylogénie et Evolution moléculaires (UPRESA CNRS 8080), Bâtiment 444, Université Paris-Sud, 91405 Orsay cedex, France. E-mail: herve.philippe{at}bc4.u-psud.fr
![]() |
literature cited |
---|
![]() ![]() ![]() |
---|
Adachi, J., and M. Hasegawa. 1996. MOLPHY version 2.3: programs for molecular phylogenetics based on maximum likelihood. Comput. Sci. Monogr. 28:1150.
Baldauf, S. L., and W. F. Doolittle. 1997. Origin and evolution of the slime molds (Mycetozoa). Proc. Natl. Acad. Sci. USA 94:1200712012.
Burger, G., D. Saint-Louis, M. W. Gray, and B. F. Lang. 1999. Complete sequence of the mitochondrial DNA of the red alga Porphyra purpurea. Cyanobacterial introns and shared ancestry of red and green algae. Plant Cell 11:16751694.
Fitch, W. M., and E. Markowitz. 1970. An improved method for determining codon variability in a gene and its application to the rate of fixation of mutations in evolution. Biochem. Genet. 4:579593.[ISI][Medline]
Galtier, N., and M. Gouy. 1998. Inferring pattern and process: maximum-likelihood implementation of a nonhomogeneous model of DNA sequence evolution for phylogenetic analysis. Mol. Biol. Evol. 15:871879.[Abstract]
Germot, A., and H. Philippe. 1999. Critical analysis of eukaryotic phylogeny: a case study based on the HSP70 family. J. Eukaryot. Microbiol. 46:116124.[ISI][Medline]
Germot, A., H. Philippe, and H. Le Guyader. 1996. Presence of a mitochondrial-type 70-kDa heat shock protein in Trichomonas vaginalis suggests a very early mitochondrial endosymbiosis in eukaryotes. Proc. Natl. Acad. Sci. USA 93:1461414617.
Hasegawa, M., and T. Hashimoto. 1993. Ribosomal RNA trees misleading? Nature 361:23.
Hashimoto, T., L. B. Sanchez, T. Shirakura, M. Muller, and M. Hasegawa. 1998. Secondary absence of mitochondria in Giardia lamblia and Trichomonas vaginalis revealed by valyl-tRNA synthetase phylogeny. Proc. Natl. Acad. Sci. USA 95:68606865.
Hirt, R. P., J. M. Logsdon Jr., B. Healy, M. W. Dorey, W. F. Doolittle, and T. M. Embley. 1999. Microsporidia are related to fungi: evidence from the largest subunit of RNA polymerase II and other proteins. Proc. Natl. Acad. Sci. USA 96:580585.
Keeling, P. J., J. A. Deane, and G. I. McFadden. 1998. The phylogenetic position of alpha- and beta-tubulins from the Chlorarachnion host and Cercomonas (Cercozoa). J. Eukaryot. Microbiol. 45:561570.[ISI][Medline]
Keeling, P. J., and W. F. Doolittle. 1996. Alpha-tubulin from early-diverging eukaryotic lineages and the evolution of the tubulin family. Mol. Biol. Evol. 13:12971305.
Leipe, D. D., J. H. Gunderson, T. A. Nerad, and M. L. Sogin. 1993. Small subunit ribosomal RNA+ of Hexamita inflata and the quest for the first branch in the eukaryotic tree. Mol. Biochem. Parasitol. 59:4148.[ISI][Medline]
Lockhart, P. J., A. W. Larkum, M. Steel, P. J. Waddell, and D. Penny. 1996. Evolution of chlorophyll and bacteriochlorophyll: the problem of invariant sites in sequence analysis. Proc. Natl. Acad. Sci. USA 93:19301934.
Lockhart, P. J., M. A. Steel, A. C. Barbrook, D. Huson, M. A. Charleston, and C. J. Howe. 1998. A covariotide model explains apparent phylogenetic structure of oxygenic photosynthetic lineages. Mol. Biol. Evol. 15:11831188.[Abstract]
Lockhart, P., M. Steel, M. Hendy, and D. Penny. 1994. Recovering evolutionary trees under a more realistic model of sequence evolution. Mol. Biol. Evol. 11:605612.
Lopez, P., P. Forterre, and H. Philippe. 1999. The root of the tree of life in the light of the covarion model. J. Mol. Evol. 49:496508.[ISI][Medline]
Maddison, W. P., and D. R. Maddison. 1992. MacCladeanalysis of phylogeny and character evolution. Sinauer, Sunderland, Mass.
Moreira, D., H. Le Guyader, and H. Philippe. 1999. Unusually high evolutionary rate of the elongation factor 1 alpha genes from the Ciliophora and its impact on the phylogeny of eukaryotes. Mol. Biol. Evol. 16:234245.[Abstract]
Olsen, G. 1987. Earliest phylogenetic branching: comparing rRNA-based evolutionary trees inferred with various techniques. Cold Spring Harb. Symp. Quant. Biol. LII:825837.
Peyretaillade, E., C. Biderre, P. Peyret, F. Duffieux, G. Metenier, M. Gouy, B. Michot, and C. P. Vivares. 1998. Microsporidian Encephalitozoon cuniculi, a unicellular eukaryote with an unusual chromosomal dispersion of ribosomal genes and a LSU rRNA reduced to the universal core. Nucleic Acids Res. 26:35133520.
Philippe, H. 1993. MUST, a computer package of management utilities for sequences and trees. Nucleic Acids Res. 21:52645272.[Abstract]
Philippe, H., and A. Adoutte. 1998. The molecular phylogeny of Eukaryota: solid facts and uncertainties. Pp. 2556 in G. Coombs, K. Vickerman, M. Sleigh, and A. Warren, eds. Evolutionary relationships among Protozoa. Chapman & Hall, London.
Philippe, H., and J. Laurent. 1998. How good are deep phylogenetic trees? Curr. Opin. Genet. Dev. 8:616623.[ISI][Medline]
Roger, A. J., O. Sandblom, W. F. Doolittle, and H. Philippe. 1999. An evaluation of elongation factor 1 alpha as a phylogenetic marker for eukaryotes. Mol. Biol. Evol. 16:218233.[Abstract]
Sogin, M. L. 1991. Early evolution and the origin of eukaryotes. Curr. Opin. Genet. Dev. 1:457463.[Medline]
Stiller, J., and B. Hall. 1999. Long-branch attraction and the rDNA model of early eukaryotic evolution. Mol. Biol. Evol. 16:12701279.
Strimmer, K., and A. von Haeseler. 1996. Quartet puzzling: a quartet maximum likelihood method for reconstructing tree topologies. Mol. Biol. Evol. 13:964969.
Swofford, D. L. 1993. PAUP: phylogenetic analysis using parsimony. Version 3.1.1. Illinois Natural History Survey, Champaign.
Swofford, D. L., G. J. Olsen, P. J. Waddell, and D. M. Hillis. 1996. Phylogenetic inference. Pp. 407514 in D. M. Hillis, C. Moritz, and B. K. Mable, eds. Molecular systematics. Sinauer, Sunderland, Mass.
Takezaki, N., and T. Gojobori. 1999. Correct and incorrect vertebrate phylogenies obtained by the entire mitochondrial DNA sequences. Mol. Biol. Evol. 16:590601.[Abstract]
Van de Peer, Y., S. A. Rensing, U. G. Maier, and R. De Wachter. 1996. Substitution rate calibration of small subunit ribosomal RNA identifies chlorarachniophyte endosymbionts as remnants of green algae. Proc. Natl. Acad. Sci. USA 93:77327736.
Woese, C., L. Achenbach, P. Rouviere, and L. Mandelco. 1991. Archaeal phylogeny: reexamination of the phylogenetic position of Archaeoglobus fulgidus in light of certain composition-induced artifacts. Syst. Appl. Microbiol. 14:364371.[ISI][Medline]
Yang, Z. 1996. Among-site rate variation and its impact on phylogenetic analyses. Trends Ecol. Evol. 11:367370.[ISI]