* Department of Biosystems Science, Graduate University for Advanced Studies (Sokendai), Hayama, Kanagawa, Japan; The Institute of Statistical Mathematics, Minato-ku, Tokyo, Japan;
The Rockefeller University;
Institute of Biological Sciences, University of Tsukuba, Tsukuba, Japan
Correspondence: E-mail: hashi{at}biol.tsukuba.ac.jp.
Abstract
Key Words: Diplomonadida, Parabasala, eukaryote evolution, root, maximum likelihood, combined phylogeny
Introduction
Analysis of increasing numbers of sequences from more and more species has produced trees that often resolve the relationships within individual lineages well, but deeper relationships among distant lineages have more often been contradictory than consistent. Recent developments in our understanding of the processes of molecular evolution, together with more discriminating techniques of sequence analysis, have uncovered several reasons why the desired goal is difficult to attain with single sequences (Philippe et al. 2000; Gribaldo and Philippe 2002). Among these are the progressive loss of phylogenetic information resulting from mutational saturation of diverging sequences, the long branch attraction (LBA) artifact of phylogenetic reconstruction (Felsenstein 1978), and the failure to model group-specific and species-specific differences in the evolution of different positions of the macromolecules studied. Lateral gene transfer (LGT) has also recently been recognized as contributing to the discordance among different gene trees (Richards et al. 2003).
To overcome the lack of resolution in single-gene phylogenies, various extensive analyses of combined multigene data sets have recently been performed, establishing the monophyletic origin of each of the following higher-order groups: Opisthokonta (Metazoa + Fungi/Microsporidia), Amoebozoa (Lobosa + Conosa), Plantae (Viridiplantae + Rhodophyta + Glaucophyta), Euglenozoa + Heterolobosea, and Alveolata + stramenopiles (Moreira, Le Guyader, and Philippe 2000; Baldauf et al. 2000; Arisue et al. 2002a, 2002b; Bapteste et al. 2002). In addition to these groups, several other higher-order groups have been suggested by molecular and/or morphological findings (for review see Baldauf 2003). The group Cercozoa is supported by actin and SSUrRNA phylogenies (Keeling 2001; Cavalier-Smith and Chao 2003a, 2003b) and by a shared insertion in the polyubiquitin genes (Archibald et al. 2003). The "excavate taxa" were proposed as a putative monophyletic or paraphyletic group (O'Kelly and Nerad 1999; Simpson and Patterson 1999) comprising organisms that possess a ventral feeding groove, which collects suspended particles driven into it by the beating of a posterior flagellum. Cavalier-Smith (2002) further proposed a larger group, "Excavata," based on an unrooted SSUrRNA tree and morphological considerations. Excavata comprises Metamonada (including Diplomonadida), Parabasala, Percolozoa (including Heterolobosea), Euglenozoa, and Loukozoa (including Oxymonadida, Trimastix, Malawimonas, Carpediomonas, and Jakobea). To date, however, molecular phylogenies of SSUrRNA and tubulins have not supported either the monophyly or the paraphyly of Excavata with any statistical confidence (Dacks et al. 2001; Edgcomb et al. 2001; Silberman et al. 2002; Simpson et al. 2002).
As described above, only a small number of higher-order groups are recognized in the tree of Eukaryota, and the overall picture of eukaryotic biodiversity now appears reasonably comprehensive (Cavalier-Smith 2004). However, the phylogenetic relationships among the higher-order groups remain uncertain, and many alternative positions are still possible for the root of the tree of Eukaryota. On the basis of the distribution of the fused dihydrofolate reductase (DHFR) and thymidylate synthase (TS) gene, Stechmann and Cavalier-Smith (2002) proposed that the root is likely to be located between Opisthokonta and the other eukaryotes. Later, citing independent lines of evidence from other gene fusion events in the pyrimidine biosynthetic pathway, they argued that the root should be located between the bikonts and Opisthokonta/Amoebozoa (Stechmann and Cavalier-Smith 2003a). Particular attention was paid to a fusion of the genes for carbamoyl-phosphate synthetase II (CPSII, composed of glutamine amidotransferase [GAT] and CPS domains), dihydroorotase (DHO), and aspartate carbamoyltransferase (ACT), which is found exclusively in Metazoa, Fungi, and the amoebozoan Dictyostelium discoideum (Nara, Hashimoto, and Aoki 2000).
In the present study, to obtain a robust resolution of the evolutionary relationships among major eukaryotic groups and to gain better insight into the possible root of the eukaryotic tree, we performed combined maximum likelihood (ML) analyses of 24 genes, examining the relationships among seven major eukaryotic groups (Opisthokonta, Amoebozoa, Plantae, Euglenozoa/Heterolobosea, Alveolata/stramenopiles, Diplomonadida, and Parabasala) with an outgroup to root the tree. For this purpose we cloned and sequenced, from several protists, the genes coding for isoleucyl-tRNA and valyl-tRNA synthetases (IleRS, ValRS), cytosolic-type heat shock protein 90 (HSP90c), and the largest subunit of RNA polymerase II (RPB1). Analysis of all selected sites from the original alignments of the 24 genes, including two rRNAs, was strongly affected by the LBA artifact, which significantly positioned Diplomonadida at the base of the eukaryotic tree. However, analysis of 22 protein-coding genes using only slowly evolving amino acid sites clearly demonstrated that Diplomonadida and Parabasala are closely related, and that an early emergence of the common ancestor of these two groups is not exclusively supported. Our present analyses, together with the distribution of the fused DHFR-TS gene mentioned above, narrowed the possible position of the root of the tree of Eukaryota to the branch leading to Opisthokonta or that leading to the common ancestor of Diplomonadida/Parabasala.
Materials and Methods
Sequence Alignments of the Genes Used for the Phylogenetic Analyses
In addition to IleRS, ValRS, RPB1, and HSP90c, for which original sequences were determined from several protists in this work, 20 other genes were used for phylogenetic analyses. These included small subunit (SSU) rRNA, large subunit (LSU) rRNA, EF1, EF2, ribosomal proteins (RP) S14, S15a, L5, L8, and L10a, cytosolic-type HSP70 (HSP70c), ER-type HSP70 (HSP70er), mitochondrial-type HSP70 (HSP70mit), chaperonin 60 (CPN60), four subunits of chaperonin-containing TCP-1 (CCT), actin (ACT), α-tubulin (TBα), and β-tubulin (TBβ). Genes related to metabolic pathways were not used in the present analyses, because LGT events are frequently observed for these genes, and including them could have biased the inference of organismal phylogeny. We assumed that the genes used in this study have not been subject to LGT within the domain Eukaryota, because preliminary phylogenetic analyses of these genes did not suggest any LGT events.
For the 22 protein-coding genes above, amino acid sequences from diverse eukaryotes and several outgroups were collected from various databases, and alignments, including the original sequences of the four genes obtained in this study, were constructed using the SAM2.1 program (Hughey and Krogh 1996) and then adjusted manually. For SSUrRNA and LSUrRNA, alignments of diverse eukaryotic and four archaebacterial (outgroup) sequences were obtained from the secondary structure-based alignment database (http://oberon.fvms.ugent.be:8080/rRNA/index.html) (Wuyts et al. 2001, 2002). Several additional sequences not present in the database were inserted and aligned manually. Unambiguously aligned sites were selected from each original alignment and used for phylogenetic analyses. Alignments and data sets used are available from T. H. upon request.
Outgroup sequences used for individual genes are listed in table S1 of the Supplementary Materials online. In brief, archaebacteria were used for the analyses of SSUrRNA, LSUrRNA, EF1, EF2, RP-S14, RP-S15a, RP-L5, RP-L8, RP-L10a, and the four TCP-1 (CCT) subunits. Eubacteria were used for IleRS, ValRS, HSP70mit, and CPN60. Paralogous eukaryotic sequences were used for the other genes. Combined ML analyses in this study were carried out under the assumption that the ingroup sequences of each gene evolved independently.
Programs and Models Used in the Phylogenetic Inference
The NUCML and PROTML programs in the MOLPHY package (version 2.3) (Adachi and Hasegawa 1996) were used for analyses that assume a homogeneous rate across sites (Homogeneous model). To take rate heterogeneity across sites into account, the BASEML and CODEML programs in the PAML package (version 3.1) (Yang 1997) were used, assuming a discrete Γ-distribution with eight categories for across-site rate heterogeneity (RAS model). The Γ shape parameter (α) was estimated from the analyzed data for each gene. The combined ML analysis, which calculates the sum of the log-likelihoods over genes, was carried out using the TOTALML program in MOLPHY with a variety of gene combinations. The HKY85 and JTT-F models were assumed for nucleotide and amino acid substitution processes, respectively (Hasegawa, Kishino, and Yano 1985; Jones, Taylor, and Thornton 1992). The RELL bootstrap method (Kishino, Miyata, and Hasegawa 1990) was applied to alternative trees to obtain approximate bootstrap proportion (BP) values, because limited computational time did not allow full bootstrap analyses; the RELL method has been shown to be a good approximation to the full bootstrap (Hasegawa and Kishino 1994). The AU test (Shimodaira 2002) in the CONSEL program (Shimodaira and Hasegawa 2001) and the Shimodaira-Hasegawa (SH) test in the BASEML and CODEML programs were used for statistical comparisons among the alternative trees of interest.
Phylogenetic Analyses of Individual Genes
In the preliminary stage of each individual gene analysis, an unrooted tree was considered for each gene, excluding the outgroup sequences. The quick topology search option of the NUCML or PROTML program (q n2000) was used to produce candidate trees, which were then analyzed by the ordinary ML method using the Homogeneous model. The best tree and the alternative trees whose log-likelihoods were within 1 standard error (SE) of that of the best tree (1SE criterion) were selected. These trees were further analyzed by the ML method using the RAS model, and the best tree was finally selected. On the basis of this best tree and widely accepted phylogenetic relationships, subtree constraints were imposed in advance for the seven higher-order taxonomic groups of Eukaryota (Opisthokonta, Amoebozoa, Plantae, Alveolata/stramenopiles, Euglenozoa/Heterolobosea, Diplomonadida, and Parabasala). The subtree for the outgroup of each gene was also fixed in advance on the basis of established findings. Taxa and subtrees are shown, together with other information, in table S1 of the Supplementary Materials online.
Thereafter, for each gene with the subtree constraints, all 10,395 possible trees for eight groups (seven groups + an outgroup) were exhaustively analyzed with the Homogeneous model for a data set including all the sites initially selected from the original alignment (the "all" data set). On the basis of the best tree under the Homogeneous model, site-by-site rate categories, r1 (the slowest evolving sites, including constant sites) through r8 (the fastest evolving sites), were estimated by analysis with the RAS model. To investigate the effect of removing constant or slowly evolving sites, or rapidly evolving sites, we made alternative data sets in a manner similar to that of Hirt et al. (1999) and Dacks et al. (2002). The r8, r1, and r7 sites were removed stepwise from the "all" data set, producing four further data sets, r8, r18, r78, and r178, each named after the rate categories removed. For each of these data sets, an exhaustive analysis of the 10,395 trees using the Homogeneous model was carried out in the same manner as for the "all" data set.
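The construction of the five data sets amounts to filtering alignment columns by their estimated rate category. The sketch below assumes that each column has already been assigned a single category from 1 (slowest, including constant sites) to 8 (fastest) by the RAS analysis; the helper name `make_rate_datasets` is hypothetical and only illustrates the filtering step.

```python
from typing import Dict, List, Sequence


def make_rate_datasets(columns: Sequence[str],
                       rate_cat: Sequence[int]) -> Dict[str, List[str]]:
    """Build the "all", r8, r18, r78 and r178 data sets.

    columns[i] is alignment column i; rate_cat[i] (1..8) is its estimated
    rate category. Each derived data set is named after the categories it
    REMOVES, e.g. "r78" drops the r7 and r8 (fastest evolving) sites.
    """
    removals = {
        "all": set(),
        "r8": {8},
        "r18": {1, 8},
        "r78": {7, 8},
        "r178": {1, 7, 8},
    }
    return {
        name: [col for col, cat in zip(columns, rate_cat) if cat not in drop]
        for name, drop in removals.items()
    }
```

Each resulting data set would then be analyzed over the same set of candidate trees, so that differences in support can be attributed to the sites removed rather than to a different tree search.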
Combined Analyses of the Relationships among Seven Eukaryotic Higher-Order Groups with the Outgroup
To evaluate the support for each of the 10,395 trees from the total information residing in the individual genes, the phylogenetic information of the individual genes was combined by summing the site-by-site log-likelihoods for each tree. The tree with the highest total log-likelihood was then selected as the best tree of the combined analysis. In this approach, parameters (such as branch lengths) were optimized separately for each gene, allowing the combined analysis to accommodate heterogeneous phylogenetic information among genes. For each of the five data sets ("all," r8, r18, r78, and r178), the summation was done over the 24 genes including two rRNAs and over the 22 protein-coding genes. On the basis of the summation over 24 genes for the "all" and r78 data sets, candidate trees were selected from the 10,395 alternatives using the 4SE criterion (the best tree and trees whose log-likelihood differences from the best tree were less than 4SE), producing 137 and 572 trees for the "all" and r78 data sets, respectively. The union of these two tree sets contained 577 trees, which were exhaustively analyzed using the RAS model for each of the 24 individual genes. The combined analysis was then done in the same way as described above for each of the five data sets.
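The summation and the kSE selection can be sketched in Python as follows. The sketch assumes that site-wise log-likelihoods are already available for every gene and tree, with each gene's parameters optimized separately upstream. The SE of a log-likelihood difference is estimated here from the site-wise differences in the usual Kishino-Hasegawa manner; the text does not state the exact formula used, so this is an assumption. The function name `combined_candidates` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def combined_candidates(gene_site_lnl: Sequence[Sequence[Sequence[float]]],
                        k_se: float) -> Tuple[int, List[float], List[int]]:
    """Combine genes by summing site log-likelihoods, then apply a kSE criterion.

    gene_site_lnl[g][t][i]: log-likelihood of site i of gene g under tree t
    (parameters such as branch lengths assumed optimized per gene upstream).
    Returns (index of best tree, total lnL per tree, candidate tree indices).
    """
    n_trees = len(gene_site_lnl[0])
    # Concatenate the per-site values of all genes for each tree.
    per_tree = [[x for gene in gene_site_lnl for x in gene[t]] for t in range(n_trees)]
    totals = [sum(v) for v in per_tree]
    best = max(range(n_trees), key=totals.__getitem__)
    n = len(per_tree[best])
    candidates = []
    for t in range(n_trees):
        # Site-wise differences between the best tree and tree t.
        d = [b - x for b, x in zip(per_tree[best], per_tree[t])]
        mean = sum(d) / n
        # Assumed Kishino-Hasegawa-style SE of the total difference.
        se = math.sqrt(n / (n - 1) * sum((x - mean) ** 2 for x in d))
        # Keep the best tree itself and trees less than k_se * SE behind it.
        if t == best or totals[best] - totals[t] < k_se * se:
            candidates.append(t)
    return best, totals, candidates
```

With `k_se=4.0` this corresponds to the 4SE criterion described above; the 1SE and 3SE criteria used elsewhere in this section differ only in the value of `k_se`.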
Combined Analyses with Additional Assumptions
On the basis of the results described in the following section, Diplomonadida and Parabasala were grouped in advance, and the 945 possible trees for the six eukaryotic groups with an outgroup were exhaustively examined for each gene using the r78 data set and the RAS model. Candidate trees were then selected from the combined analyses over all 24 genes and over the 22 protein-coding genes using the 3SE criterion, producing 102 and 205 trees, respectively. The union of the two tree sets contained 214 trees, which were then used in AU tests for statistical comparisons with different combinations of genes.
Finally, assuming that Euglenozoa/Heterolobosea are closely related to Diplomonadida/Parabasala, which corresponds to the Excavata monophyly hypothesis (Cavalier-Smith 2002), the 105 possible trees for the five eukaryotic groups with an outgroup were exhaustively examined for each of the 22 protein-coding genes using the "all," r8, and r78 data sets. All 105 alternative trees were then compared on the basis of the combined analyses for these data sets.
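The tree counts used in this section (10,395, 945, and 105) follow from treating each constrained group and the outgroup as a single leaf: with n leaves there are (2n − 5)!! unrooted bifurcating trees, and rooting on the outgroup turns these into the rooted trees of the ingroups. A short Python check of this arithmetic (the function name is illustrative only):

```python
def n_unrooted_trees(n_taxa: int) -> int:
    """Number of unrooted bifurcating trees for n_taxa leaves: (2n - 5)!!."""
    count = 1
    for k in range(3, 2 * n_taxa - 4, 2):  # product 3 * 5 * ... * (2n - 5)
        count *= k
    return count


# Leaves = constrained eukaryotic groups + one outgroup.
for groups in (7, 6, 5):
    print(groups, "groups + outgroup:", n_unrooted_trees(groups + 1), "trees")
```

Running the loop prints 10395, 945, and 105 trees for seven, six, and five groups, respectively, matching the numbers of trees examined exhaustively.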
Results
The Relationship among Seven Higher-Order Groups of Eukaryota
First, combined analyses were performed with the Homogeneous model using the "all" data set, to provide an inference based on the most classical phylogenetic approach. The best tree from the 22 protein genes positioned the amitochondriate lineages Diplomonadida and Parabasala as the earliest and second-earliest branches of the eukaryotic tree, respectively, with BP support of 60% and 81% (fig. 1a). These two early branches were followed by the stepwise divergence of Opisthokonta, Amoebozoa, and Plantae. The tree recovered the monophyly of Plantae, Alveolata/stramenopiles, and Euglenozoa/Heterolobosea, consistent with their shared fused DHFR-TS gene (Stechmann and Cavalier-Smith 2002, 2003a). Inclusion of the two rRNA genes (fig. 1b) further supported the earliest branching of Diplomonadida (96%) and changed the branching order of the groups other than the two early branches. This tree was congruent with a previous tree inferred by a combined analysis of approximately 100 proteins (25,000 sites) that did not include Parabasala (Bapteste et al. 2002). Because the analysis including the two rRNA genes appeared to be strongly affected by an LBA artifact uniting Diplomonadida with the outgroup sequences, subsequent phylogenetic inference was based mainly on the 22 protein genes.
To investigate the effects of site removal and of different model specifications, BP values among the 577 alternative trees were compared across data sets with different sampled sites and different site-rate models (fig. 2b). For node "d," support was 81% for the "all" data set with the Homogeneous model but only 45% for the r78 data set with the RAS model. For node "e," the Homogeneous model analysis of the "all" data set did not support the node (38%), whereas analyses of the r78 data set, with either the Homogeneous or the RAS model, gave more than 85% support. Thus, removing the fast-evolving sites increased support for a close affinity between Diplomonadida and Parabasala and decreased support for the early branching of these two groups. This tendency was particularly obvious in the analyses using the Homogeneous model (fig. 2b, nodes d and e). In contrast, removal of the slowest evolving sites (r1) had no significant effect on the BP values for any node (fig. 2b), suggesting that removing the slowest or constant sites does not affect phylogenetic inference, at least with the ML method. For the other nodes of interest, represented by Trees B, C, D, and E, no high BP support was obtained for grouping Opisthokonta with Amoebozoa (node f) or Euglenozoa/Heterolobosea with Diplomonadida/Parabasala (node j), the latter corresponding to the Excavata monophyly hypothesis (Cavalier-Smith 2002). Removal of the fast-evolving sites increased support for the earliest branching of Opisthokonta (node l), especially in the RAS model analysis (more than 10%), although the node still lacked clear support.
Possible Root of the Tree of Eukaryota
On the basis of the above analyses, Diplomonadida and Parabasala were grouped together as a single clade. To further analyze the relationships among higher-order groups and the root of the tree, the 945 possible trees for the six higher-order eukaryotic groups (including the Diplomonadida-Parabasala clade) and an outgroup were exhaustively examined for each gene using the r78 data set with the RAS model, and combined analyses were then performed. Because the alternative trees used for the BP analyses differed from the previous set of 577 trees, the BP values for the nodes in figure 2a other than node "e" changed slightly but remained almost the same as those shown in figure 2.
Using the criteria described in Materials and Methods, 214 of the 945 trees were selected for statistical comparisons, and combined analyses of these 214 trees were performed with various combinations of genes. In the analysis with the 22 protein genes, 60 trees were finally retained by the AU test (p > 0.05). Figure 3 compares the p values of these 60 trees and of three additional trees of interest: Tree D in figure 2a, proposed by Stechmann and Cavalier-Smith (2002); the best tree from tubulins; and the best tree from rRNAs. The 214 trees did not include Tree E in figure 2a, proposed by Stechmann and Cavalier-Smith (2003a).
Analyses of tubulins and rRNAs significantly rejected most of the 60 trees, suggesting that the phylogenetic signals in the tubulins and rRNAs differ from those in the other protein data sets used in the present analyses. Interestingly, all the trees not rejected by the analysis of tubulins (p ≥ 0.05), except Tree 45, positioned Opisthokonta as the closest relative of Diplomonadida/Parabasala, indicating that the tubulin data sets strongly support a close relationship between Opisthokonta and Diplomonadida/Parabasala. In the analysis of rRNAs, on the other hand, 59 of the 60 trees were significantly rejected, leaving only Tree B (Bapteste et al. 2002). In this analysis, the classical eukaryotic tree of SSUrRNA (Tree 62) was also significantly supported, even when rapidly evolving sites were excluded from the analysis.
Because Plantae, Euglenozoa/Heterolobosea, and Alveolata/stramenopiles share a fused DHFR-TS gene, these three groups are likely to be monophyletic, and the root of the tree of Eukaryota should not be located within them (Stechmann and Cavalier-Smith 2002). The ML tree from the combined analyses presented in this study also recovered the monophyly of these three groups from sequence-based evidence, although BP support was not high (fig. 2a). Therefore, combining the 60 selected trees listed in figure 3 with the evidence from the DHFR-TS gene fusion, the possible position of the root of the eukaryotic tree can be narrowed to the branch leading to Opisthokonta (Trees 5, 6, 16, 19, 20), to the common ancestor of Diplomonadida/Parabasala (Trees 44 [Tree A], 45, 51, 52 [Tree C]), to the common ancestor of Opisthokonta and Diplomonadida/Parabasala (Tree 55), or to the common ancestor of Plantae, Euglenozoa/Heterolobosea, and Alveolata/stramenopiles (Trees 57 and 58). In addition, if a close relationship among Excavata, Plantae, and Alveolata/stramenopiles is also accepted as a candidate (Cavalier-Smith 2002; Stechmann and Cavalier-Smith 2002, 2003a), Tree 1 cannot be ruled out either. The close relationship between Opisthokonta and Diplomonadida/Parabasala in Trees 55 and 58 is likely to be an artifact caused by the tubulins, as noted above. The p values for Trees 6, 55, and 57 decreased to 0.01 ≤ p < 0.05 in the analysis of the "15 proteins," whose outgroups are not extremely distant. On the basis of these considerations, Trees 6, 55, 57, and 58 can also be ruled out, further narrowing the possibilities to Opisthokonta rooting or Diplomonadida/Parabasala rooting.
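The DHFR-TS argument acts as a filter on rooted trees: a root position is acceptable only if Plantae, Euglenozoa/Heterolobosea, and Alveolata/stramenopiles still form a clade once the tree is rooted on the outgroup. A minimal Python sketch of that check follows, representing rooted trees as nested tuples. The function names and the two example trees are hypothetical illustrations, not trees taken from figure 3.

```python
from typing import FrozenSet, Set, Union

Tree = Union[str, tuple]  # a leaf name or a tuple of child subtrees

FUSED_DHFR_TS = frozenset(
    {"Plantae", "Euglenozoa/Heterolobosea", "Alveolata/stramenopiles"})


def clades(tree: Tree, out: Set[FrozenSet[str]]) -> FrozenSet[str]:
    """Add the leaf set of every subtree to `out`; return this subtree's leaves."""
    if isinstance(tree, str):
        leaves = frozenset({tree})
    else:
        leaves = frozenset().union(*(clades(child, out) for child in tree))
    out.add(leaves)
    return leaves


def compatible_with_fusion(rooted_tree: Tree, fused=FUSED_DHFR_TS) -> bool:
    """True if the groups sharing the fused gene form a clade in the rooted tree."""
    found: Set[FrozenSet[str]] = set()
    clades(rooted_tree, found)
    return frozenset(fused) in found


# Hypothetical example: Opisthokonta rooting keeps the three groups together.
opisthokonta_root = ("Outgroup", ("Opisthokonta", ("Amoebozoa", (
    ("Plantae", ("Euglenozoa/Heterolobosea", "Alveolata/stramenopiles")),
    "Diplomonadida/Parabasala"))))
print(compatible_with_fusion(opisthokonta_root))
```

A tree rooted on, for example, the Euglenozoa/Heterolobosea branch splits the three groups and would be rejected by this check; this is the sense in which the gene fusion excludes roots within the group.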
Analysis with a Constraint on the Excavata Monophyly
To explore the relationships among higher-order groups under the assumption that Excavata are monophyletic (Cavalier-Smith 2002), the 105 possible alternative trees for the five higher-order groups (Opisthokonta, Amoebozoa, Plantae, Alveolata/stramenopiles, and Excavata) and an outgroup were exhaustively examined using the RAS model with the r78 data set. Tree D (Stechmann and Cavalier-Smith 2002) was selected as the best tree (fig. 4). Tree 60 in figure 3, in which Excavata is positioned at the base of Eukaryota, was not significantly different from the best tree (p = 0.433), nor was the tree that exchanges the positions of Excavata and Plantae in Tree D (Tree 1 in fig. 3; p = 0.493). Sixty-nine trees were rejected by the AU test at the 5% significance level (p < 0.05), including Tree E (Stechmann and Cavalier-Smith 2003a; p = 0.017). As shown by the BP values on the internal branches of the trees in figure 4, the basal placements of Opisthokonta and Excavata were supported by 1% and 97%, respectively, when the "all" data set was used with the Homogeneous model. In contrast, support for these placements shifted to 74% and 18%, respectively, in the analysis using the RAS model on the r78 data set. A close relationship between Opisthokonta and Amoebozoa received 91% support when the Homogeneous model was used on the "all" data set, but this support decreased to 21% when the r78 data set was used with the RAS model. Thus, the analyses with fast-evolving sites removed and the RAS model favored Opisthokonta rooting. Provided that Excavata are indeed monophyletic, this result supports the hypothesis proposed by Stechmann and Cavalier-Smith (2002) but not their later hypothesis (Stechmann and Cavalier-Smith 2003a).
Discussion
The tree of Eukaryota has been examined with SSUrRNA phylogenies over the past two decades, and various phylogenetic questions have been successfully addressed (e.g., Sogin and Silberman 1998). One of the most important implications of the SSUrRNA phylogeny was that the amitochondriate protists, Euglenozoa, Heterolobosea, and Amoebozoa diverged stepwise before the radiation of the terminal "Crown" groups (Sogin and Silberman 1998). In the present analyses, however, the classical SSUrRNA tree (Tree 62 in fig. 3) was supported only by rRNAs; the averaged phylogenetic signal in the protein data sets did not favor it at all. Because figure 3 reveals the rRNA tree to be an extreme case among the alternative trees of Eukaryota, the widely accepted SSUrRNA-based scenario of eukaryotic evolution should be extensively revised (Cavalier-Smith 2004). The tubulin phylogeny was likewise very discordant with the combined protein phylogeny without tubulins (fig. 3). Tubulins polymerize to form microtubules, the major components of the cytoskeleton, 9+2 axonemes, and mitotic spindles, and they are the most important molecules for forming the structure and morphology of the cell. Compared with the other proteins used in the present combined analyses, tubulins are more likely to be subject to functional constraints that depend on the lifestyle and environment of the organism. Convergent evolution at the molecular level might therefore have obscured the organismal phylogeny in the tubulin data sets.
From the combined analyses of the 22 protein sequences, together with the distribution of the DHFR-TS gene fusion in the eukaryotic tree, two possibilities remain for the root of the tree of Eukaryota: the branch leading to Opisthokonta or the branch leading to the common ancestor of Diplomonadida/Parabasala. No strong support was detected for the latter possibility, in contrast to the prominent support generally obtained in rRNA phylogenies. If the latter possibility can be discarded as a product of the widely recognized LBA artifact, Opisthokonta rooting would be the most likely option. The early emergence of Opisthokonta has also been weakly recovered in a recent HSP90c phylogeny including six different, previously unsampled eukaryotic groups (Stechmann and Cavalier-Smith 2003b).
Although the present analyses did not support the Excavata monophyly hypothesis (Cavalier-Smith 2002), we nevertheless examined trees containing the eukaryotic phylum Excavata, because the present combined analyses alone could not entirely exclude the hypothesis, possibly because they were seriously affected by LBA. Many unknown artifacts, including LBA, may critically influence the present analyses. One of the two higher-order groups in Excavata, Euglenozoa/Heterolobosea or Diplomonadida/Parabasala, might be artificially placed at the base of the eukaryotic tree (fig. 3), making the monophyly of Excavata difficult to reconstruct. It is worth noting that the best tree obtained in the analysis with a constraint on Excavata monophyly was exactly Tree D (Stechmann and Cavalier-Smith 2002). This result demonstrates once again that Opisthokonta rooting is most likely if rooting on one of the fast-evolving groups, Diplomonadida/Parabasala or Euglenozoa/Heterolobosea, is not considered. If Tree D is correct, the fused DHFR-TS gene must have been lost in the parasites of the Diplomonadida/Parabasala group, because neither DHFR nor TS activity has been detected in Giardia intestinalis, Trichomonas vaginalis, or Tritrichomonas foetus (Wang et al. 1983; Wang and Cheng 1984; Aldritt, Tien, and Wang 1985), and no related gene has been found in the genome sequencing database of Giardia intestinalis (McArthur et al. 2000). Determining whether free-living organisms belonging to this group carry the fused gene will be important for clarifying the position of Diplomonadida/Parabasala in the phylogenetic tree based on the DHFR-TS gene fusion.
Preliminary exploration of the distribution of the CPSII-ACT-DHO gene fusion, which has been found exclusively in Opisthokonta and Amoebozoa, suggested that these two groups are probably monophyletic and that the root is located on their common ancestor (Stechmann and Cavalier-Smith 2003a). However, this possibility is less supported by the sequence-based phylogeny of the present study. Because information on gene fusion events in the pyrimidine biosynthetic pathway is scant, further analyses of the gene organization of this pathway in diverse protist lineages are necessary to resolve the discrepancy between inferences based on gene fusion and those based on sequence phylogeny. Interestingly, a sequence similarity search of the genome project database of the unicellular red alga Cyanidioschyzon merolae (Matsuzaki et al. 2004) (http://merolae.biol.s.u-tokyo.ac.jp/) identified a fused CPSII-ACT-DHO gene in addition to separate CPS and GAT genes, indicating that the gene fusion event most likely occurred in the common ancestor of all eukaryotes, including bikonts, Opisthokonta, and Amoebozoa. This finding reduces the possibility of rooting on the common ancestor of Opisthokonta and Amoebozoa and instead gives more support to the Opisthokonta rooting suggested by the present molecular phylogeny. In addition to the gene fusions in the pyrimidine biosynthetic pathway summarized by Nara, Hashimoto, and Aoki (2000), a novel gene fusion, ACT-DHOD (dihydroorotate dehydrogenase), has recently been found in the euglenozoan protist Bodo saliens (Annoura et al. 2004). Compared with the DHFR-TS fusion event, gene fusions in the pyrimidine biosynthetic pathway may be more complicated, with fusion and separation events possibly occurring more than once on independent branches of the eukaryotic tree.
If the constraints assumed in advance for each of the higher-order groups were strongly discordant with the phylogenetic information in any of the genes analyzed, the constraints may have biased the phylogenetic inference. Because the constraints on RPB1 significantly affected the log-likelihood difference between the best trees with and without constraints (p < 0.01; see table S1 of the Supplementary Materials online), we excluded RPB1 from the 22 protein genes in a combined analysis (fig. 3) to explore the influence of such exclusion. Analysis of RPB1 alone was not significantly different from the analysis of the 22 proteins with regard to the patterns in figure 3. Removal of RPB1 from the 22 proteins ("RPB1") shifted the best tree to Tree 25, with Euglenozoa/Heterolobosea rooting, but the overall pattern of p values for the 63 trees in figure 3 did not change, demonstrating that inclusion of RPB1 introduced no significant influence. Instead, as already mentioned in the Results, inclusion of tubulins and/or rRNAs might significantly affect the present combined analyses. We also performed an alternative combined analysis of the 22 protein genes excluding the sequences of Rhodophyta and/or Glaucophyta, which are present in nine of the genes (see table S1 of the Supplementary Materials online), because the monophyletic origin of Plantae (Viridiplantae, Rhodophyta, and Glaucophyta) has recently been challenged (Nozaki et al. 2003). No differences were found with or without Rhodophyta and/or Glaucophyta, demonstrating that the constraint on the monophyletic origin of Plantae had no significant influence. The possibility of a polyphyletic origin of Viridiplantae, Rhodophyta, and Glaucophyta proposed by Nozaki et al. (2003) should be re-examined with more data.
Owing to limited computational time, the present analyses in which slow- or fast-evolving sites were removed did not include "control" experiments, which would assess whether removing specific sites has an effect beyond that expected from removing the same number of randomly chosen sites. Although we roughly compared BP values among analyses with different numbers of sites, such comparisons are not straightforward in general, because BP values are positively correlated with the number of sites used in the analysis. Control experiments should therefore be performed in future work for each data set with a different number of sites. Despite the use of a model approximating rate heterogeneity among sites (RAS), the removal of fast-evolving sites (r78) still had an additional effect on the BP values. This is probably because of model misspecification, whose effect was apparent especially at sites in r7 and r8. Because violation of amino acid frequency constancy was not clearly evident for the 22 proteins analyzed (table S1 of the Supplementary Materials online), the misspecification can probably be attributed to differences in the distribution of evolutionary rates across subtrees (covarion shifts), which the RAS model used in the present analyses does not take into account. Such model misspecification has been discussed in detail in an EF-1α analysis of the position of Microsporidia in light of LBA and covarion shifts (Inagaki et al. 2004). If the "all" data set in the present analyses contained such a covarion-like structure, the RAS model could not fully approximate the data, possibly resulting in an LBA artifact that places Diplomonadida, or the common ancestor of Diplomonadida and Parabasala, at the base of the tree.
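A control experiment of the kind proposed above could be sketched with RELL (resampling of estimated log-likelihoods; Kishino, Miyata, and Hasegawa 1990) bootstrap probabilities computed from a fixed matrix of site-wise log-likelihoods. The sketch assumes that each site has already been assigned to one of eight rate categories and that "r78" denotes removal of the sites in categories 7 and 8; the function names and toy data are hypothetical, and RELL only approximates a full bootstrap with re-optimization (Hasegawa and Kishino 1994).

```python
import random
from typing import List, Sequence, Set, Tuple


def rell_bp(site_lnl: Sequence[Sequence[float]], sites: Sequence[int],
            n_boot: int = 1000, seed: int = 0) -> List[float]:
    """RELL bootstrap probabilities for a set of candidate trees.

    site_lnl[t][i] is the log-likelihood of site i under tree t, computed once.
    Each replicate resamples `sites` with replacement and re-sums the stored
    values, so no tree is re-optimized.
    """
    if not sites:
        raise ValueError("no sites to resample")
    rng = random.Random(seed)
    n_trees = len(site_lnl)
    m = len(sites)
    wins = [0] * n_trees
    for _ in range(n_boot):
        sample = [sites[rng.randrange(m)] for _ in range(m)]
        totals = [sum(site_lnl[t][i] for i in sample) for t in range(n_trees)]
        # Ties go to the lowest tree index (a simplification).
        wins[max(range(n_trees), key=totals.__getitem__)] += 1
    return [w / n_boot for w in wins]


def removal_vs_random(site_lnl: Sequence[Sequence[float]], rate_cat: Sequence[int],
                      remove_cats: Set[int], target_tree: int, n_controls: int = 20,
                      n_boot: int = 1000, seed: int = 0) -> Tuple[float, List[float]]:
    """BP of target_tree after removing the sites in remove_cats, plus the BPs
    obtained after removing the same number of randomly chosen sites."""
    n_sites = len(rate_cat)
    kept = [i for i in range(n_sites) if rate_cat[i] not in remove_cats]
    n_removed = n_sites - len(kept)
    bp_specific = rell_bp(site_lnl, kept, n_boot, seed)[target_tree]

    rng = random.Random(seed + 1)
    control = []
    for k in range(n_controls):
        dropped = set(rng.sample(range(n_sites), n_removed))
        kept_random = [i for i in range(n_sites) if i not in dropped]
        control.append(rell_bp(site_lnl, kept_random, n_boot, seed + 2 + k)[target_tree])
    return bp_specific, control


if __name__ == "__main__":
    # Toy data: 16 slow sites favour tree 0; 4 fast sites (categories 7, 8) favour tree 1.
    rate_cat = [1, 2, 3, 4, 5, 6] * 2 + [1, 2, 3, 4] + [7, 8, 7, 8]
    site_lnl = [
        [-1.0] * 16 + [-10.0] * 4,  # tree 0
        [-1.5] * 16 + [-2.0] * 4,   # tree 1
    ]
    bp_r78, control = removal_vs_random(site_lnl, rate_cat, {7, 8}, target_tree=0,
                                        n_controls=5, n_boot=200)
    print(bp_r78)   # 1.0: every remaining site favours tree 0
    print(control)  # BPs after removing 4 random sites instead
```

Comparing `bp_r78` with the distribution in `control` separates the effect of removing specific sites from the general effect of having fewer sites, which is the confound noted above.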
The ML tree obtained by the combined analysis of the 22 proteins with the RAS model on the r78 data set (Tree A, shown in fig. 2a) clearly demonstrates the difficulty of resolving the higher-order phylogeny of Eukaryota. The branches leading to the outgroup and to Parabasala or Diplomonadida are extremely long, whereas those leading to nodes a, b, and c are extremely short. Except for Opisthokonta, taxon sampling within each higher-order group is sparse. Apart from the close relationship between Diplomonadida and Parabasala, the relationships among the major eukaryotic groups and the outgroup cannot be clearly resolved. Although the present analyses narrowed down the possible position of the root of the tree of Eukaryota, the problem remains open because of the lack of phylogenetic information. With the accumulation of EST data, large-scale analyses of eukaryotic phylogeny ("phylogenomics") have recently been performed and have conclusively demonstrated the position of choanoflagellates (Philippe et al. 2004). The phylogenomics approach, using more sequence data with adequate taxon sampling together with sophisticated methods for combined phylogenetic analysis, will be indispensable for more robust inference of the higher-order relationships and the root of the tree of Eukaryota.
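Phylogenomic data sets are often assembled as a concatenated supermatrix, in which a taxon lacking a gene is padded with gaps over that gene's columns; the effect of such missing data is the subject of Philippe et al. (2004). The sketch below shows only this padding step, assuming each gene has already been aligned; it is one common way to combine genes and not necessarily the procedure used in the present combined analyses. Function names, taxon names, and sequences are hypothetical.

```python
from typing import Dict, List


def concatenate(alignments: Dict[str, Dict[str, str]], taxa: List[str],
                gap: str = "-") -> Dict[str, str]:
    """Build a supermatrix from per-gene alignments (gene -> taxon -> aligned sequence).

    Genes are appended in the dictionary's insertion order. A taxon missing from
    a gene's alignment receives a run of gap characters of that alignment's
    length. Sequences for taxa not listed in `taxa` are ignored.
    """
    rows: Dict[str, List[str]] = {t: [] for t in taxa}
    for gene, aln in alignments.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: expected one alignment length, got {sorted(lengths)}")
        length = lengths.pop()
        for t in taxa:
            rows[t].append(aln.get(t, gap * length))
    return {t: "".join(parts) for t, parts in rows.items()}


def missing_fraction(seq: str, gap: str = "-") -> float:
    """Fraction of gap characters in a concatenated row (alignment gaps included)."""
    return seq.count(gap) / len(seq) if seq else 0.0


if __name__ == "__main__":
    alignments = {
        "geneA": {"taxon1": "MKV", "taxon2": "MKI"},
        "geneB": {"taxon1": "LL", "taxon3": "LI"},
    }
    matrix = concatenate(alignments, ["taxon1", "taxon2", "taxon3"])
    for taxon, row in matrix.items():
        print(taxon, row, missing_fraction(row))
```

Reporting the missing fraction per taxon makes it easy to see which lineages contribute little information to a combined analysis, one of the factors behind the sparse sampling noted above.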
Supplementary Material
FIG. S1. Unrooted maximum likelihood trees of Eukaryota: (a) IleRS; (b) ValRS; (c) Hsp90c; and (d) RPB1.
Table S1. Constraints on the subtrees for seven higher-order taxonomic groups of Eukaryota and an outgroup.
References
Adachi, J., and M. Hasegawa. 1996. MOLPHY version 2.3: program for molecular phylogenetics based on maximum likelihood. Comput. Sci. Monographs No. 28, The Institute of Statistical Mathematics, Tokyo.
Aldritt, S. M., P. Tien, and C. C. Wang. 1985. Pyrimidine salvage in Giardia lamblia. J. Exp. Med. 161:437–445.
Annoura, T., T. Nara, T. Makiuchi, T. Hashimoto, and T. Aoki. 2004. The origin of dihydroorotate dehydrogenase genes of kinetoplastids, with special reference to their biological significance and adaptation to anaerobic, parasitic conditions. J. Mol. Evol. in press.
Archibald, J. M., D. Longet, J. Pawlowski, and P. J. Keeling. 2003. A novel polyubiquitin structure in Cercozoa and Foraminifera: evidence for a new eukaryotic supergroup. Mol. Biol. Evol. 20:62–66.
Arisue, N., T. Hashimoto, J. A. Lee, D. V. Moore, P. Gordon, C. W. Sensen, T. Gaasterland, M. Hasegawa, and M. Müller. 2002a. The phylogenetic position of the pelobiont Mastigamoeba balamuthi based on sequences of rDNA and translation elongation factors EF-1α and EF-2. J. Eukaryot. Microbiol. 49:1–10.
Arisue, N., T. Hashimoto, H. Yoshikawa, Y. Nakamura, G. Nakamura, F. Nakamura, T. Yano, and M. Hasegawa. 2002b. Phylogenetic position of Blastocystis hominis and of stramenopiles inferred from multiple molecular sequence data. J. Eukaryot. Microbiol. 49:42–53.
Arisue, N., Y. Maki, H. Yoshida, A. Wada, L. B. Sánchez, M. Müller, and T. Hashimoto. 2004. Comparative analysis of the ribosomal components of the hydrogenosome-containing protist, Trichomonas vaginalis. J. Mol. Evol. 59:59–71.
Baldauf, S. L. 2003. The deep roots of eukaryotes. Science 300:1703–1706.
Baldauf, S. L., A. J. Roger, I. Wenk-Siefert, and W. F. Doolittle. 2000. A kingdom-level phylogeny of eukaryotes based on combined protein data. Science 290:972–977.
Bapteste, E., H. Brinkmann, J. A. Lee, D. V. Moore, C. W. Sensen, P. Gordon, L. Duruflé, T. Gaasterland, P. Lopez, M. Müller, and H. Philippe. 2002. The analysis of 100 genes supports the grouping of three highly divergent amoebae: Dictyostelium, Entamoeba, and Mastigamoeba. Proc. Natl. Acad. Sci. USA 99:1414–1419.
Cavalier-Smith, T. 2002. The phagotrophic origin of eukaryotes and phylogenetic classification of Protozoa. Int. J. Syst. Evol. Microbiol. 52:297–354.
Cavalier-Smith, T., and E. E. Chao. 2003a. Phylogeny of choanozoa, apusozoa, and other protozoa and early eukaryote megaevolution. J. Mol. Evol. 56:540–563.
Cavalier-Smith, T., and E. E. Chao. 2003b. Phylogeny and classification of phylum Cercozoa (Protozoa). Protist 154:341–358.
Cavalier-Smith, T. 2004. Only six kingdoms of life. Proc. R. Soc. Lond. Ser. B 271:1251–1262.
Dacks, J. B., and W. F. Doolittle. 2001. Reconstructing/deconstructing the earliest eukaryotes: how comparative genomics can help. Cell 107:419–425.
Dacks, J. B., J. D. Silberman, A. G. B. Simpson, S. Moriya, T. Kudo, M. Ohkuma, and R. J. Redfield. 2001. Oxymonads are closely related to the excavate taxon Trimastix. Mol. Biol. Evol. 18:1034–1044.
Dacks, J. B., A. Marinets, W. F. Doolittle, T. Cavalier-Smith, and J. M. Logsdon Jr. 2002. Analyses of RNA polymerase II genes from free-living protists: phylogeny, long branch attraction, and the eukaryotic big bang. Mol. Biol. Evol. 19:830–840.
Edgcomb, V. P., A. J. Roger, A. G. B. Simpson, D. T. Kysela, and M. L. Sogin. 2001. Evolutionary relationships among "jakobid" flagellates as indicated by alpha- and beta-tubulin phylogenies. Mol. Biol. Evol. 18:514–522.
Embley, T. M., M. van der Giezen, D. S. Horner, P. L. Dyal, and P. Foster. 2003. Mitochondria and hydrogenosomes are two forms of the same fundamental organelle. Philos. Trans. R. Soc. Lond. B Biol. Sci. 358:191–201.
Felsenstein, J. 1978. Cases in which parsimony or compatibility methods will be positively misleading. Syst. Zool. 27:401–410.
Gribaldo, S., and H. Philippe. 2002. Ancient phylogenetic relationships. Theoret. Popul. Biol. 61:391–408.
Hasegawa, M., and H. Kishino. 1994. Accuracies of the simple methods for estimating the bootstrap probability of a maximum-likelihood tree. Mol. Biol. Evol. 11:142–145.
Hasegawa, M., H. Kishino, and T. Yano. 1985. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol. 22:160–174.
Hashimoto, T., L. B. Sánchez, T. Shirakura, M. Müller, and M. Hasegawa. 1998. Secondary absence of mitochondria in Giardia lamblia and Trichomonas vaginalis revealed by valyl-tRNA synthetase phylogeny. Proc. Natl. Acad. Sci. USA 95:6860–6865.
Hashimoto, T., Y. Nakamura, T. Kamaishi, and M. Hasegawa. 1997. Early evolution of eukaryotes inferred from protein phylogenies of translation elongation factors 1α and 2. Arch. Protistenkd. 148:287–295.
Hirt, R. P., J. M. Logsdon Jr., B. Healy, M. W. Dorey, W. F. Doolittle, and T. M. Embley. 1999. Microsporidia are related to Fungi: evidence from the largest subunit of RNA polymerase II and other proteins. Proc. Natl. Acad. Sci. USA 96:580–585.
Horner, D. S., and T. M. Embley. 2001. Chaperonin 60 phylogeny provides further evidence for secondary loss of mitochondria among putative early-branching eukaryotes. Mol. Biol. Evol. 18:1970–1975.
Hughey, R., and A. Krogh. 1996. Hidden Markov models for sequence analysis: extension and analysis of the basic method. Comput. Appl. Biosci. 12:95–107.
Inagaki, Y., E. Susko, N. M. Fast, and A. J. Roger. 2004. Covarion shifts cause a long-branch attraction artifact that unites Microsporidia and Archaebacteria in EF-1α phylogenies. Mol. Biol. Evol. 21:1340–1349.
Jones, D. T., W. R. Taylor, and J. M. Thornton. 1992. The rapid generation of mutation data matrices from protein sequences. Comput. Appl. Biosci. 8:275–282.
Keeling, P. J. 2001. Foraminifera and Cercozoa are related in actin phylogeny: two orphans find a home? Mol. Biol. Evol. 18:1551–1557.
Keeling, P. J., and W. F. Doolittle. 1996. Alpha-tubulin from early-diverging eukaryotic lineages and the evolution of the tubulin family. Mol. Biol. Evol. 13:1297–1305.
Kishino, H., T. Miyata, and M. Hasegawa. 1990. Maximum likelihood inference of protein phylogeny, and the origin of chloroplasts. J. Mol. Evol. 31:151–160.
Lee, J. J., G. F. Leedale, and P. Bradbury, eds. 2000. An Illustrated Guide of Protozoa, 2nd ed. Society of Protozoologists, Lawrence, Kans.
Martin, W., and M. Müller. 1998. The hydrogen hypothesis of the first eukaryote. Nature 392:37–41.
Matsuzaki, M., O. Misumi, T. Shin-i et al. (42 co-authors). 2004. Genome sequence of the ultrasmall unicellular red alga Cyanidioschyzon merolae 10D. Nature 428:653–657.
McArthur, A. G., H. G. Morrison, J. E. Nixon et al. (15 co-authors). 2000. The Giardia genome project database. FEMS Microbiol. Lett. 189:271–273.
Moreira, D., H. Le Guyader, and H. Philippe. 2000. The origin of red algae and the evolution of chloroplasts. Nature 405:69–72.
Nara, T., T. Hashimoto, and T. Aoki. 2000. Evolutionary implications of the mosaic pyrimidine-biosynthetic pathway in eukaryotes. Gene 257:209–222.
Nozaki, H., M. Matsuzaki, M. Takahara, O. Misumi, H. Kuroiwa, M. Hasegawa, T. Shin-i, Y. Kohara, N. Ogasawara, and T. Kuroiwa. 2003. The phylogenetic position of red algae revealed by multiple nuclear genes from mitochondria-containing eukaryotes and an alternative hypothesis on the origin of plastids. J. Mol. Evol. 56:485–497.
O'Kelly, C. J., and T. A. Nerad. 1999. Malawimonas jakobiformis n. gen., n. sp. (Malawimonadidae n. fam.): a jakoba-like heterotrophic nanoflagellate with discoidal mitochondrial cristae. J. Eukaryot. Microbiol. 46:522–531.
Philippe, H., P. Lopez, H. Brinkmann, K. Budin, A. Germot, J. Laurent, D. Moreira, M. Müller, and H. Le Guyader. 2000. Early-branching or fast-evolving eukaryotes? An answer based on slowly evolving positions. Proc. R. Soc. Lond. B Biol. Sci. 267:1213–1221.
Philippe, H., E. A. Snell, E. Bapteste, P. Lopez, P. W. Holland, and D. Casane. 2004. Phylogenomics of eukaryotes: impact of missing data on large alignments. Mol. Biol. Evol. 21:1740–1752.
Richards, T. A., R. P. Hirt, B. A. P. Williams, and T. M. Embley. 2003. Horizontal gene transfer and the evolution of parasitic protozoa. Protist 154:17–32.
Roger, A. J. 1999. Reconstructing early events in eukaryotic evolution. Am. Nat. 154:S146–S163.
Roger, A. J., and J. D. Silberman. 2002. Mitochondria in hiding. Nature 418:827–829.
Rotte, C., K. Henze, M. Müller, and W. Martin. 2000. Origins of hydrogenosomes and mitochondria. Curr. Opin. Microbiol. 3:481–486.
Shimodaira, H. 2002. An approximately unbiased test of phylogenetic tree selection. Syst. Biol. 51:492–508.
Shimodaira, H., and M. Hasegawa. 2001. CONSEL: for assessing the confidence of phylogenetic tree selection. Bioinformatics 17:1246–1247.
Silberman, J. D., A. G. B. Simpson, J. Kulda, I. Cepicka, V. Hampl, P. J. Johnson, and A. J. Roger. 2002. Retortamonad flagellates are closely related to diplomonads: implications for the history of mitochondrial function in eukaryote evolution. Mol. Biol. Evol. 19:777–786.
Simpson, A. G. B., and D. J. Patterson. 1999. The ultrastructure of Carpediemonas membranifera (Eukaryota), with reference to the "excavate hypothesis." Eur. J. Protistol. 35:353–370.
Simpson, A. G. B., and A. J. Roger. 2002. Eukaryotic evolution: getting to the root of the problem. Curr. Biol. 12:R691–R695.
Simpson, A. G. B., A. J. Roger, J. D. Silberman, D. D. Leipe, V. P. Edgcomb, L. S. Jermiin, D. J. Patterson, and M. L. Sogin. 2002. Evolutionary history of "early-diverging" eukaryotes: the excavate taxon Carpediemonas is a close relative of Giardia. Mol. Biol. Evol. 19:1782–1791.
Sogin, M. L., and J. D. Silberman. 1998. Evolution of the protists and protistan parasites from the perspective of molecular systematics. Int. J. Parasitol. 28:11–20.
Stechmann, A., and T. Cavalier-Smith. 2002. Rooting the eukaryote tree by using a derived gene fusion. Science 297:89–91.
Stechmann, A., and T. Cavalier-Smith. 2003a. The root of the eukaryote tree pinpointed. Curr. Biol. 13:R665–666.
Stechmann, A., and T. Cavalier-Smith. 2003b. Phylogenetic analysis of eukaryotes using heat-shock protein Hsp90. J. Mol. Evol. 57:408–419.
Tovar, J., G. Leon-Avila, L. B. Sánchez, R. Sutak, J. Tachezy, M. van der Giezen, M. Hernandez, M. Müller, and J. M. Lucocq. 2003. Mitochondrial remnant organelles of Giardia function in iron-sulphur protein maturation. Nature 426:172–176.
Wang, C. C., R. Verham, S. F. Tzeng, S. Aldritt, and H. W. Cheng. 1983. Pyrimidine metabolism in Tritrichomonas foetus. Proc. Natl. Acad. Sci. USA 80:2564–2568.
Wang, C. C., and H. W. Cheng. 1984. Salvage of pyrimidine nucleosides by Trichomonas vaginalis. Mol. Biochem. Parasitol. 10:171–184.
Williams, B. A., R. P. Hirt, J. M. Lucocq, and T. M. Embley. 2002. A mitochondrial remnant in the microsporidian Trachipleistophora hominis. Nature 418:865–869.
Wuyts, J., P. De Rijk, Y. Van de Peer, T. Winkelmans, and R. De Wachter. 2001. The European Large Subunit Ribosomal RNA database. Nucleic Acids Res. 29:175–177.
Wuyts, J., Y. Van de Peer, T. Winkelmans, and R. De Wachter. 2002. The European database on small subunit ribosomal RNA. Nucleic Acids Res. 30:183–185.
Yang, Z. 1997. PAML: a program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci. 13:555–556.