*Program in Evolutionary Biology, Canadian Institute for Advanced Research, Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax;
Department of Botany, University of British Columbia, Vancouver;
Department of Zoology, University of Oxford, South Parks Road, UK;
Department of Biology, Emory University
Introduction |
---|
How to reconcile the ssu rDNA and protein sequence evidence is hotly debated and has prompted several alternative views of eukaryotic relationships. The "eukaryotic big bang" hypothesis suggests that eukaryotes evolved in a massive radiation of 4–10 groups whose interrelationships are fundamentally irresolvable (Philippe and Adoutte 1998; Philippe, Germot, and Moreira 2000a). An alternative view, based on combined protein data, proposes two superclades of eukaryotes: one group, called opisthokonts (Cavalier-Smith 1987), contains animals, fungi, and their choanozoan relatives, whereas the other contains plants, chromists, and most protozoa (Embley and Hirt 1998; Dacks and Roger 1999; Baldauf et al. 2000; Edgcomb et al. 2001). As rRNA trees also invariably and robustly resolve this dichotomy between opisthokonts and the rest of eukaryotes (Sogin 1991; Cavalier-Smith 1993, 2000), some of the broad features of both rRNA and protein trees can be reconciled and are congruent with key ultrastructural data (Cavalier-Smith 2000, 2002). However, a number of unanswered questions remain: the evolutionary affinities of the many protist groups not clearly attributable to any major grouping, the root of the eukaryotic tree, and the identity of early evolving lineages. Some difficulties are largely methodological, including artifacts arising from long-branch attraction (LBA). Another problem, more easily remediable, is poor taxon sampling: many protein trees entirely omit key, often free-living, protist groups.
The largest subunit of RNA Polymerase II (RPB1) has been one of the few proteins used to address issues of major eukaryotic relationships (Stiller and Hall 1997; Stiller and Hall 1998; Hirt et al. 1999). RPB1 is large (ca. 1,600 amino acid residues), and phylogenetic trees of this molecule can be outgroup-rooted by either its archaebacterial homologs or by its eukaryotic-specific paralogs, RPA1 or RPC1 (the largest subunits of RNA Polymerase I and III, respectively). However, because RPB1 genes are so large, they have not been characterized in a wide variety of eukaryotic species: the taxonomic representation of RPB1 is particularly sparse compared with other molecules, such as tubulins or ssu rDNA (see Baldauf et al. 2000). RPB1 orthologs have been well sampled and characterized from animals and fungi. RPB1 sequences are also available for parasitic protists once thought to be early emerging eukaryotes, but notably lacking are sequences from free-living protists. Organisms heretofore missing from RPB1 analyses include heterokonts (or stramenopiles), cercomonads, heteroloboseans, and the cryptomonad nucleomorph (nm).
The first three are each monophyletic groups, having a variety of proposed larger-scale evolutionary affinities. The heterokonts are a collection of algae and secondary heterotrophs that have recently been proposed as related to the alveolates (Cavalier-Smith 1999; Fast et al. 2001). Cercomonads are related to various filose amoebae, thaumatomonads, and chlorarachniophytes (collectively Cercozoa: Cavalier-Smith 1998), and possibly also to foraminifera (Keeling 2001). Heterolobosea were proposed as an early evolving lineage because of ssu rDNA evidence and their lack of Golgi dictyosomes (Cavalier-Smith 1993) but are now thought to be related to Euglenozoa in a larger excavate assemblage (Simpson and Patterson 1999; Cavalier-Smith 2002). The cryptomonad nm is the relict nucleus of an anciently captured red algal cell (Douglas et al. 1991, 2001).
In this study we cloned and sequenced the gene encoding RPB1 from Ochromonas danica (heterokont), Cercomonas ATCC50319, and Naegleria gruberi (heterolobosean). We have also analyzed RPB1 from the Guillardia theta nm. These data provide significant additions to the diversity of protist taxa represented in the RPB1 data set. We examine the evolutionary affinities of these lineages, as well as the effects of LBA in this data set, by a number of phylogenetic methods and consider the implications of RPB1 phylogeny for the evolution of transcription, spliceosomal introns, and the overall pattern of eukaryotic evolution.
Materials and Methods |
---|
PCR Amplification
Conserved regions A–D of the Naegleria RPB1 gene sequence were amplified using the degenerate PCR primers RPB1-F1 (GAG TGT CCA GGN CAY TTY GG) and RPB1-R2 (GTC GAA GTC TGC RTT RTA NGG) described in Hirt et al. (1999). After sequencing of this fragment, an exact-match primer, RPB1-N5X1 (AAG ATG GTA CAC GTA TCG), was used in combination with the reverse degenerate primer RPB1-R4 (TG GAA CGT ATT NAR NGT CAT) to obtain the remaining regions used for phylogenetic analysis. A second exact-match primer, RPB1-N3X1 (CAA GGG TAC TGA TGA ATT GTC), was used in combination with degenerate primer CTDR1 (TGA TAG ACT GGN GAN GTN GG) to amplify the remaining portion of the gene, including conserved region H and a portion of the carboxy-terminal domain (CTD). All PCR fragments were cloned into the TOPO 2.1 vector using the TOPO TA cloning kit (Invitrogen). Each clone was sequenced on LICOR and ABI automated sequencers. All clones were sequenced in both directions, and the full gene sequence was assembled with two- to sixfold coverage.
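As an illustration of what the degenerate positions in these primers imply, the following minimal Python sketch computes the degeneracy of a primer (the number of distinct oligonucleotides in the pool) from the standard IUPAC nucleotide codes and, optionally, expands it. The function names `degeneracy` and `expand` are hypothetical helpers introduced here for illustration and are not part of the original protocol.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def _clean(primer):
    """Strip the spaces used to group codons and normalise case."""
    return primer.replace(" ", "").upper()

def degeneracy(primer):
    """Number of distinct sequences represented by a degenerate primer."""
    n = 1
    for base in _clean(primer):
        n *= len(IUPAC[base])
    return n

def expand(primer):
    """List every concrete sequence in the degenerate primer pool."""
    choices = (IUPAC[base] for base in _clean(primer))
    return ["".join(p) for p in product(*choices)]

if __name__ == "__main__":
    for name, seq in [("RPB1-F1", "GAG TGT CCA GGN CAY TTY GG"),
                      ("RPB1-R4", "TG GAA CGT ATT NAR NGT CAT")]:
        print(name, degeneracy(seq))
```

Under these codes, RPB1-F1 (one N and two Y positions) represents a pool of 16 sequences, and RPB1-R4 (two N positions and one R) a pool of 32.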
Degenerate primers (described in Stiller and Hall 1997; Stiller, Duffield, and Hall 1998) were used for PCR amplification of RPB1 regions A–D, D–F, and F–G (Stiller and Hall 1997) from Cercomonas ATCC50319 and O. danica. Products were cloned into TOPO TA vectors (Invitrogen) and completely sequenced using ABI sequencing protocols.
Phylogeny
All sequences were obtained from NCBI, with the exception of those from N. gruberi (AF395110), O. danica (AF395111), and Cercomonas ATCC50319 (AF395835) reported herein. These three RPB1 sequences plus three RPA1 sequences as outgroups were manually added to a previously published alignment (Hirt et al. 1999) using MacClade 4.0 (Maddison and Maddison 2000). The final alignment used for global phylogenetic analysis contained 22 taxa and 746 aligned amino acid sites. A sub-data set with the outgroups removed was also analyzed. These alignments are available upon request. The microsporidial RPB1 sequences were not included in these analyses as they represent highly divergent fungi (Hirt et al. 1999), and two representative, less divergent, fungal sequences were used instead.
Maximum parsimony analyses were performed using Paup* 4.0b (Swofford 1998), whereas Neighbor-Joining and Fitch distance analyses used Phylip 3.573 (Felsenstein 1995). Protein maximum likelihood (ML) analyses were done using two methods. Puzzle 4.0.2 (Strimmer and von Haeseler 1997) was used incorporating a gamma correction for among-site rate variation plus a correction for invariant sites (8 + 1 rate categories) estimated from the data set. A Neighbor-Joining tree, estimated by Puzzle 4.0.2, was used as a basis for site rate calculations. In addition, a protML 2.2 (Adachi and Hasegawa 1996) heuristic (-q 10,000) search was performed for each data set. For the protML analyses, the relative estimated log likelihood values (RELLs) were calculated using Mol2con (A. Stoltzfus, personal communication). Although full ML heuristic searches were done to search for the optimal topology using ProML 3.6a (Felsenstein 1995), the optimal topology found by ProML contradicted several nodes supported by all other methods in our analyses, including some which are well established in the literature; these alternate nodes were not supported by any of the other methods at greater than 50%. For this reason, the trees shown are the best protML topology (i.e., with the highest log likelihood) with branch lengths estimated in Puzzle to incorporate gamma-distributed rates and invariant sites. Although these protML trees provide an accurate representation of RPB1 phylogeny, they may not be the overall best trees because of computational limitations. ML distance analyses used Tree-Puzzle 4.02 (previously called Puzzle; Strimmer and von Haeseler 1997) to calculate ML distance matrices along with Puzzleboot (A. Roger and M. Holder, personal communication; www.tree-puzzle.de); resampled matrices were then analyzed using Fitch (Felsenstein 1995) with global rearrangements and 10 times jumbling. All bootstrap support values are based on 100 replicates.
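The general idea behind RELL support can be sketched in a few lines of Python. This is not the Mol2con or protML implementation; it is a minimal illustration that assumes per-site log likelihoods are already available for each candidate tree (the input dictionary `site_lnl` and the function `rell_support` are hypothetical). Sites are resampled with replacement, the resampled log likelihoods are summed for each tree, and the fraction of replicates in which each tree scores best is reported. Support for a node would then be the summed proportion of all trees that contain it.

```python
import random

def rell_support(site_lnl, replicates=1000, seed=0):
    """Approximate bootstrap proportions by resampling per-site log likelihoods.

    site_lnl: dict mapping a tree label to a list of per-site log
              likelihoods (all lists must have the same length).
    Returns a dict mapping each tree label to the fraction of
    replicates in which that tree had the highest total.
    """
    rng = random.Random(seed)          # fixed seed for reproducibility
    names = list(site_lnl)
    n_sites = len(site_lnl[names[0]])
    wins = {name: 0 for name in names}

    for _ in range(replicates):
        # Resample site indices with replacement (one bootstrap replicate).
        cols = [rng.randrange(n_sites) for _ in range(n_sites)]
        totals = {name: sum(site_lnl[name][c] for c in cols) for name in names}
        best = max(names, key=lambda name: totals[name])
        wins[best] += 1

    return {name: wins[name] / replicates for name in names}

if __name__ == "__main__":
    # Toy values only; not taken from the RPB1 data set.
    toy = {"tree1": [-1.0, -2.0, -1.5, -0.5],
           "tree2": [-1.2, -1.8, -1.6, -0.7]}
    print(rell_support(toy, replicates=200))
```

Because no tree is re-optimized on each replicate, this kind of resampling is much cheaper than a full ML bootstrap, which is why it is practical when many candidate topologies are retained.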
After the LBA tests, a new alignment was constructed initially using Clustal X (Thompson et al. 1997) and then adjusted manually such that only regions of unambiguously alignable sequence were retained for analysis (17 taxa, 910 sites). Phylogenetic analyses for this restricted data set were identical to those done on the global set.
LBA Tests
For the 22-taxon, 746-site data set, evolutionary rates at all amino acid sites were calculated using Puzzle 4.0.2 (Strimmer and von Haeseler 1997), and selected sites were removed manually in MacClade 4.0 (Maddison and Maddison 2000). For fast site removal (FSR) all sites calculated to be in the fastest rate category (category 8 of 8) were removed. For constant site removal (CSR) those in the slowest, i.e., invariant, class (category 0) were removed. For fast and constant site removal (FCSR), sites in both categories were eliminated. Heuristic protML analyses (-q 10,000) were performed and RELL values determined as described previously.
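The three site-removal schemes amount to filtering alignment columns by their assigned rate category. In this study the sites were removed manually in MacClade; the Python sketch below only illustrates the same filtering logic, assuming that each column already carries a rate category from 0 (invariant) to 8 (fastest). The function `remove_sites` and the toy data are hypothetical.

```python
def remove_sites(alignment, site_categories, drop):
    """Return a copy of the alignment without columns in the given categories.

    alignment:       dict mapping taxon name to an aligned sequence string
    site_categories: list with one rate category per alignment column
                     (0 = invariant, 8 = fastest gamma category)
    drop:            set of categories whose columns are removed
    """
    keep = [i for i, cat in enumerate(site_categories) if cat not in drop]
    return {taxon: "".join(seq[i] for i in keep)
            for taxon, seq in alignment.items()}

if __name__ == "__main__":
    # Toy alignment and categories; not taken from the RPB1 data set.
    aln = {"taxonA": "MKVLS", "taxonB": "MRVIS", "taxonC": "MKVAS"}
    cats = [0, 3, 0, 8, 0]

    fsr = remove_sites(aln, cats, drop={8})      # fast site removal
    csr = remove_sites(aln, cats, drop={0})      # constant site removal
    fcsr = remove_sites(aln, cats, drop={0, 8})  # both classes removed
    print(fsr, csr, fcsr, sep="\n")
```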
For establishing the autapomorphy to symplesiomorphy ratio, each class of substitution was assessed manually from the 22-taxon alignment. Autapomorphies were defined as unique substitutions at an otherwise invariant position within the in-group taxa and not shared with the outgroups. The outgroups, however, did not also have to share the invariant residue with the other in-groups. Symplesiomorphies were defined as substitutions shared between an in-group and at least two of the three outgroup taxa but different from the other in-groups at an otherwise invariant position for those in-groups. Substitutions for both classes were tallied for each in-group taxon individually as well as for several taxonomic groups (fungi, animals, red algae, kinetoplastids) to compensate for uneven taxon sampling in the alignment. For these groups the final substitution count was the sum of those substitutions shared by the group and the average of the substitutions found in each of the component taxa.
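The per-taxon part of these definitions can be expressed compactly. The tallies in this study were made by hand; the Python sketch below is only an illustration of the per-taxon rules under simplifying assumptions: it ignores gaps, handles individual taxa only (not the group-level sums described above), and uses a hypothetical function name, `classify_sites`. A column is scored when exactly one in-group taxon differs from an otherwise invariant in-group; the substitution is an autapomorphy if no outgroup shares it and a symplesiomorphy if at least two outgroups do.

```python
from collections import Counter

def classify_sites(ingroup, outgroup):
    """Count autapomorphies and symplesiomorphies per in-group taxon.

    ingroup, outgroup: dicts mapping taxon name to aligned sequence.
    Returns two Counters: (autapomorphies, symplesiomorphies).
    """
    auta, symp = Counter(), Counter()
    taxa = list(ingroup)
    n_sites = len(ingroup[taxa[0]])

    for i in range(n_sites):
        column = {t: ingroup[t][i] for t in taxa}
        counts = Counter(column.values())
        if len(counts) != 2:
            continue                      # invariant or too variable
        (_, n_major), (minor, n_minor) = counts.most_common(2)
        if n_minor != 1 or n_major < 2:
            continue                      # need a single deviating taxon
        odd_taxon = next(t for t in taxa if column[t] == minor)
        shared = sum(1 for seq in outgroup.values() if seq[i] == minor)
        if shared == 0:
            auta[odd_taxon] += 1          # unique to this in-group taxon
        elif shared >= 2:
            symp[odd_taxon] += 1          # shared with >= 2 outgroups
    return auta, symp

if __name__ == "__main__":
    # Toy data only; not taken from the RPB1 alignment.
    ingroup = {"a": "MKV", "b": "MKV", "c": "MRV", "d": "MKL"}
    outgroup = {"o1": "MRI", "o2": "MRI", "o3": "MKI"}
    print(classify_sites(ingroup, outgroup))
```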
RASA 2.5 (Lyons-Weiler and Hoelzer 1999) was used to assess the phylogenetic signal in the various data sets and to identify long-branch sequences in a phylogeny-independent fashion. Outgroup-rooted analyses (using the analytical method in RASA) were performed on the 22-taxon data set, including the RPA1 sequences. Unrooted RASA analyses were performed with the RPA1 sequences removed; these analyses were performed with both the analytical and permutation methods, the latter with 30 replicates.
Results |
---|
Degenerate primers were used to amplify the CTD of RPB1 from N. gruberi. This region of RPB1 in animals, plants, and fungi, as well as a number of protists, is composed of heptad repeats with a canonical sequence of YSPTSPS (Lam et al. 1992; Stiller, Duffield, and Hall 1998). Because this region was determined by PCR amplification, the precise number of repeats in Naegleria is not known, although at least eight heptads are present. The first several repeats in the Naegleria CTD are degenerate, after which a more canonical YSPTSPA/YSPTSPN register begins. There is no information regarding the presence of CTDs in Cercomonas and Ochromonas RPB1 genes because they would be outside the amplified regions.
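How closely a CTD follows the canonical heptad can be scored by stepping through the sequence seven residues at a time and counting mismatches against YSPTSPS. The Python sketch below is an illustrative helper only (the function `heptad_mismatches` is hypothetical, and the example string is a toy built from the two repeat variants named above, not the actual Naegleria sequence). It assumes the repeat register is known and that no insertions or deletions interrupt it.

```python
CANONICAL = "YSPTSPS"

def heptad_mismatches(ctd, start=0):
    """Score consecutive 7-residue windows against the canonical heptad.

    ctd:   amino acid sequence of the carboxy-terminal region
    start: index of the first residue of the first repeat (the register)
    Returns a list of (repeat, number_of_mismatches) tuples; a trailing
    fragment shorter than seven residues is ignored.
    """
    scored = []
    for i in range(start, len(ctd) - 6, 7):
        repeat = ctd[i:i + 7]
        mismatches = sum(a != b for a, b in zip(repeat, CANONICAL))
        scored.append((repeat, mismatches))
    return scored

if __name__ == "__main__":
    toy_ctd = "YSPTSPA" + "YSPTSPN" + "YSPTSPS"
    for repeat, mismatches in heptad_mismatches(toy_ctd):
        print(repeat, mismatches)
```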
Uniquely, the G. theta nm RPB1 lacks a distinctive CTD, apparently ending abruptly at the end of domain H (the final conserved block in RPB1). Interestingly, we note a possible tandem pair of degenerate heptad repeats, YSLSLKLF-YSMMKNF, in one of three ORFs annotated as hypothetical protein genes in the 1,699 bp region immediately downstream of the putative RPB1 stop codon and before the next bona fide gene (rpl37A). However, database searches with this region did not reveal any significant similarity with CTDs (or any other proteins). Codon usage for the three ORFs resembles that for the RPB1 gene, but with so few codons in each ORF, this may not be statistically significant. Moreover, the GC content of the RPB1 gene is higher (27%) than that of any of the three ORFs (which range from 17% to 20% GC). For any of these regions to be part of the RPB1 gene, one or more spliceosomal introns would appear to be required at the 3' end of the nm RPB1, unlike the other known G. theta nm introns, which are all found at the extreme 5' ends of genes. Determination of the 3' end sequence of the RPB1 mRNA would be required to verify the absence of a CTD or whether transcription extends into the short CTD-like region in this downstream ORF (or both).
Global Phylogenetic Analysis
To assess the phylogenetic placement of our new RPB1s, amino acid sequences from diverse eukaryotes were aligned with RPA1 homologs from an animal, a plant, and a fungus. This 22-taxon data set of 746 unambiguously aligned positions was subjected to rigorous phylogenetic analysis (fig. 1A). The Giardia sequence emerged as the earliest RPB1 branch followed by the Trichomonas RPB1 sequence, both with apparently good support. Neither the Naegleria nor the Cercomonas sequences were strongly placed in these trees, whereas the Ochromonas sequence grouped with Plasmodium with variable, but modest support. Beyond those nodes that were universally supported, parsimony and distance analyses did not seem to provide significant resolution but were consistent with the seemingly more resolved ML analyses.
Hirt et al. (1999) showed that failure to correct for invariant or rapidly evolving sites could lead to artifactual resolution in phylogenies, especially when using protML in which the assumption of rate constancy is applied to all sites. A simple method of fast and constant site removal was used to compensate for this artifact. In order to assess the effect of these site rate categories in our data set, protML (-q 10,000) searches were carried out with fast, constant, and fast plus constant sites removed (fig. 1B).
Reminiscent of Hirt et al. (1999), our optimal ML topology was unchanged by site removal, but the support for several nodes was significantly affected. Although the node separating the Giardia RPB1 and the three RPA1 sequences from the rest remained robust, support for the node placing the Trichomonas sequence with Giardia and outgroups dropped from 78% to 51% RELL support with fast sites removed and to 42% with fast plus constant sites removed (fig. 1B). This suggests that the deep placement of Trichomonas might also be artifactual. Interestingly, the node uniting Ochromonas and Plasmodium rose from 78% RELL support to 95% and 94% with fast and fast plus constant sites removed, respectively (fig. 1B). It is possible that long-branch effects are masking the real phylogenetic signal in this case.
Without knowing the location of the root, it is difficult to distinguish whether a long branch is caused by rapid sequence evolution or early divergence. Stiller, Duffield, and Hall (1998) proposed a method that may help to do so. They realized that the ratio of unique substitutions (autapomorphies) in a sequence to the shared substitutions with outgroups (symplesiomorphies) should be relatively uniform even in early diverging eukaryotes, if the rate of evolution were fairly constant and the earliest branches did not precede others by an immensely long time. However, a high ratio of autapomorphies to symplesiomorphies indicates rapid sequence divergence in a taxon rather than slow, but ancient, evolution. In the 746 amino acid alignment, Giardia has 24 autapomorphies, Ochromonas and Trichomonas have 8 and 7, respectively, kinetoplastids have 6, and no other taxa or group of taxa have more than 4. Giardia has only five symplesiomorphies; Naegleria has one, as does the Homo sapiens sequence. The exceptionally numerous autapomorphies in Giardia are suggestive of rapid evolution in its RPB1 gene but do not preclude the possibility of it also being an ancient lineage among eukaryotes.
RASA (Lyons-Weiler and Hoelzer 1999) assesses phylogenetic signal by measuring the expected distribution of synapomorphies in a data set against a null hypothesis of a random distribution. It also identifies those taxa contributing more statistical noise than phylogenetic signal in the same data set. As seen in figure 2A, when the rooted 22-taxon 746-site data set was analyzed using RASA, the Giardia sequence was clearly identified as a long-branch sequence. A similar result was obtained when the outgroup sequences were manually removed (fig. 2B). As indicated in figure 2A–D, those taxa with the largest taxon variance were sequentially removed from the data set until the observed variance distribution was relatively even (fig. 2E). In each case of taxon deletion, the tRASA value rose or (in the case of Mastigamoeba) was not markedly decreased (fig. 2 lower). Although the status of Mastigamoeba as a long branch is debatable, it was also deleted from our analysis, both to be conservative regarding the possibility of long-branch artifacts and to reduce the computational load of further phylogenetic analyses.
Discussion |
---|
Evolution of Transcription
The RNA Polymerase II complex is responsible for transcription and processing of messenger RNAs, with the largest subunit, RPB1, being central. A comparison of RPB1 sequences from diverse eukaryotes allows us to examine its functional evolution, particularly at the putatively identified functional and highly conserved regions. Block A is the region to which amplification primers were designed, so no information is available for the three sequences obtained by PCR. However, it is notable that the Guillardia nm RPB1 deviates from the Cys2-His2 zinc finger motif that characterizes this region (Cornelissen, Evers, and Kock 1988): the first histidine position of the conserved Cys-X2-Cys-X9-His-X2-His motif is replaced with a tyrosine. However, this is the only clear deviation in the nm sequence (as compared with the red algal sequences) of a previously identified critical residue. Blocks B through E are present and well conserved in all sequences that we obtained from free-living protists. Block F, the location of the catalytic sites (Wlassoff, Kimura, and Ishihama 1999), is well conserved across all taxa. This region also contains residues identified for α-amanitin sensitivity (reviewed by Quon, Delgadillo, and Johnson 1996), including Arg-741, Cys-777, and Gly-785; all are perfectly conserved in our RPB1s. Conserved domains G and H are both implicated in binding RPB6, a subunit important for RNA polymerase complex formation (Minakhin et al. 2001). In particular, the residues PGEMV in domain G and DAFDVMIDEES in domain H have been pointed out as contact points. In Naegleria and the Guillardia nm, these residues in domain G are perfectly conserved. The region H residues are less well conserved in the Naegleria sequence and the nm gene seems truncated at the end of this region. Implications of these observations for RPB6 binding and complex formation will require direct experimental inquiry.
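The Cys-X2-Cys-X9-His-X2-His spacing described above translates directly into a regular expression over one-letter amino acid codes. The Python sketch below is illustrative only: the function `find_zinc_motif` and the test strings are hypothetical, and the toy sequences are not real RPB1 sequences. It shows how a tyrosine at the first histidine position, as in the nm sequence, would cause the canonical pattern to fail while a relaxed pattern still matches.

```python
import re

# Cys-X2-Cys-X9-His-X2-His, where X is any residue.
CANONICAL_MOTIF = re.compile(r"C.{2}C.{9}H.{2}H")
# Relaxed variant allowing His or Tyr at the first histidine position.
RELAXED_MOTIF = re.compile(r"C.{2}C.{9}[HY].{2}H")

def find_zinc_motif(protein, pattern=CANONICAL_MOTIF):
    """Return (start_index, matched_text) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in pattern.finditer(protein)]

if __name__ == "__main__":
    # Toy sequences built only to exercise the patterns.
    canonical = "MA" + "C" + "PE" + "C" + "GAGAGAGAG" + "H" + "FG" + "H" + "IE"
    variant = canonical.replace("HFGH", "YFGH")

    print(find_zinc_motif(canonical))
    print(find_zinc_motif(variant))
    print(find_zinc_motif(variant, RELAXED_MOTIF))
```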
The carboxy-terminal domain (CTD) of RPB1 has a number of transcriptional and posttranscriptional functions, in regulating transcription efficiency and coupling it to pre-mRNA processing: capping, splicing, 3' end cleavage, and polyadenylation. The phosphorylation of serines 2 and 5 of the heptad repeats is particularly critical. Interestingly, though, Stiller, McConaughy, and Hall (2000) demonstrated that the last serine of the YSPTSPS heptad, although highly conserved, is not essential and can be substituted by a nonphosphorylatable residue. The CTD from Naegleria is congruent with this, having nonphosphorylatable residues (either alanine or asparagine) at this position. In addition to regulatory effects by phosphate addition, the action of a prolyl-isomerase ESS1 in yeast seems also to exert a regulatory effect at the CTD (Wu et al. 2000). In line with this, the Mastigamoeba and Naegleria CTDs are perfectly conserved at both proline positions.
Despite the CTD being implicated in many aspects of transcription and RNA processing, several protists appear to be devoid of a bona fide CTD, instead having only serine- and proline-rich regions at the carboxy terminal end. Even in taxa where the conservation of the repeats is strong, CTDs sometimes contain a number of noncanonical repeats. Similarly, of the eight repeats known in the Naegleria RPB1 CTD, three diverge significantly from the canonical heptad sequence. These data suggest that the exact sequence of the repeat may not be critical and that the conservation of the repeats may be correlated with the rigor with which the function is required. Because mRNA processing occurs in some taxa where the CTD is diminished or absent, they may have different mechanisms of transcription regulation. It also underlines that the careful functional work carried out with RPB1 in animals and fungi needs to be taken in an evolutionary context and not generalized to other species without direct evidence. Comparative studies, as here, may help in generalizing to all eukaryotes.
Evolution of Splicing and Introns
The RPB1 CTD plays a major role as a platform for construction of the spliceosome (reviewed by Hirose and Manley 2000). We have observed a relationship between spliceosomal intron density and the presence of a CTD. For intron-rich species like mammals, the efficiency of spliceosome binding to the CTD may be paramount, perhaps forcing strict adherence to the heptad repeat sequence. However, for intron-sparse organisms, this conservation might be relaxed. RPB1s from Trichomonas and Giardia (organisms not known to contain spliceosomal introns; Logsdon 1998) lack CTDs with canonical repeats but instead have serine-proline-rich C-terminal regions, possibly representing degenerate CTDs. Other protists also show possible CTD degeneration (Stiller and Hall 1997; Stiller, Duffield, and Hall 1998). Interestingly, the low intron density in Naegleria (Logsdon 1998) matches its abnormal CTD. Without knowing the location of the eukaryotic root or even a well resolved eukaryotic phylogeny, we cannot be sure whether this and similar cases in other protists are degenerate or early stages of CTD evolution. The apparent absence of a CTD from the Guillardia nm RPB1 contrasts with the presence of 17 spliceosomal introns in its genome (Douglas et al. 2001). If the CTD is indeed missing from the Guillardia nm RPB1, it is very likely caused by loss because both red and green algae contain either bona fide heptad repeats or clearly degenerate repeats. Whether the absence of CTD from the Guillardia nm affects the transcription-processing functions and represents a singular loss event or a general feature of genome diminution and intron loss are interesting questions, now open to investigation. As RPB1 genes are sequenced from a diversity of eukaryotes and as more protist genomes are studied, the relationship between the CTD and the evolution and spread of spliceosomal introns will be clarified.
Eukaryotic Phylogeny
Our analyses of RPB1 phylogeny reveal support, for the first time with this molecule, for some higher-level groupings among major eukaryotic lineages. Although RPB1 does not provide robust resolution between some major eukaryotic groups, the opisthokonts (animals plus fungi), Amoebozoa, and chromalveolates are moderately supported, as they are for other phylogenetic markers (Baldauf and Palmer 1993; Baldauf et al. 2000; Fast et al. 2001). While our paper was in preparation, another RPB1 analysis (Stiller, Riley, and Hall 2001) confirmed the alveolate relationship, providing a new ciliate sequence, and showed glaucophytes as an outgroup to red algae. In the analyses shown here (fig. 3), the cryptomonad nm clearly groups with red algae (Douglas et al. 1991), though it is unclear whether it will group within the strong glaucophyte-red algal clade. Unfortunately, neither the Naegleria nor Cercomonas RPB1 sequences show strong affinity for any others in our data set.
Two apparently robust nodes in our 22-taxon phylogeny were those separating the Giardia and then the Trichomonas sequences from the other eukaryotic RPB1 sequences. However, this need not mean they are actually early emerging. In line with previous suggestions and results (Stiller, Duffield, and Hall 1998; Hirt et al. 1999), our various tests indicate that these two sequences are particularly divergent; thus, their placement as early evolving lineages is suspect. The site removal (fig. 1B), autapomorphy-symplesiomorphy ratio, and RASA (fig. 2) analyses confirmed that the Giardia and Trichomonas RPB1 sequences represent long branches within the analysis. The Trypanosoma sequence has also been suggested as a long branch; however, the Leishmania sequence appears to divide this branch and somewhat reduce its effects. Although our LBA analyses neither indicate an alternate placement for Giardia or Trichomonas nor prove that these sequences are not early evolving, they strongly concur with prior suggestions that the deeply diverging position of diplomonads and parabasalids be viewed with caution.
When long-branch taxa are excluded, we see less apparent resolution than in previous reports (Stiller, Duffield, and Hall 1998; Hirt et al. 1999) or in our global analyses (fig. 1). This suggests that long-branch taxa may structure the data set and provide false resolution. It is therefore important to view with caution any conclusions based on RPB1 phylogenies which include long-branch taxa; their presence may obscure other relationships. Our restricted data set has some resolution at the supertaxon level, consistent with data from morphological and other molecular analyses. In particular, the chromalveolates and Amoebozoa are reconstructed with moderate support (fig. 3), as are the opisthokonts; the latter two are notable, given their previous lack of resolution by RPB1 (Stiller, Duffield, and Hall 1998; Hirt et al. 1999), including a seemingly well supported, but contradictory, placement of animals and fungi (Sidow and Thomas 1994). An opisthokont plus Amoebozoa branch is recovered in the optimal topology, consistent with other data (Baldauf et al. 2000), but it is not statistically supported or recovered by other methods. Stiller, Riley, and Hall (2001) have recently provided evidence from RPB1 for the separation of red and green algae; in the analyses done here (including the removal of long-branch taxa) we find no support for this separation. Although we do not recover a monophyletic plant clade (red algae and land plants), there is no significant support for its polyphyly. Indeed, Moreira, Le Guyader, and Philippe (2000) also showed that RPB1 phylogeny was the sole exception among a variety of genes to uniting red and green algae and that analyses of RPB1 are not strongly inconsistent with this clade.
In its initial formulation, the eukaryotic big bang hypothesis stated that the major eukaryotic groups were formed in an explosive radiation yielding as many as 10 or as few as four fundamentally unresolvable groups (Philippe and Adoutte 1998). In the past few years, a number of these (and other) major eukaryotic groups have been confidently placed together using concatenated data (Baldauf et al. 2000; Moreira, Le Guyader, and Philippe 2000), novel taxon inclusion (Dacks et al. 2001), or alternative protein markers (Hirt et al. 1999; Moreira, Le Guyader, and Philippe 2000; Keeling 2001; Fast et al. 2001). Consequently, we doubt that the large-scale relationships between eukaryotes are fundamentally unresolvable by conventional molecular phylogenetics. Recent incarnations of the eukaryotic big bang hypothesis have focused on the time span of the radiation and less on fundamental lack of resolution among lineages (Philippe, Germot, and Moreira 2000a). The major eukaryotic supertaxa probably did evolve rapidly, in line with the observation that most single genes have consistent, but weak signal. However, that radiation probably left behind a phylogenetic signal that could be unraveled with more data and additional analyses. This means that the eukaryotic big bang and superclade views are not as incompatible as they might first appear. Using several different genes to establish internal relationships may prove more productive and robust than seeking the deepest diverging taxa using single genes only. Given the relative success of RPB1 in placing phylogenetically difficult taxa (Hirt et al. 1999; Stiller, Riley, and Hall 2001) and our demonstration of some larger-scale eukaryotic resolution, building a well-represented RPB1 database may help clarify some of these internal relationships.
Note Added in Proof
Two recent papers have demonstrated that the diverse amoebae Mastigamoeba, Entamoeba and Dictyostelium form a monophyletic group, Conosa (Arisue, N., T. Hashimoto, J. A. Lee, D. V. Moore, P. Gordon, C. W. Sensen, T. Gaasterland, M. Hasegawa, and M. Muller. 2002. The phylogenetic position of Mastigamoeba balamuthi based on sequences of rDNA and translation elongation factors EF-1α and EF-2. J. Eukaryot. Microbiol. 49:1–10; Bapteste, E., H. Brinkmann, J. A. Lee, D. V. Moore, C. W. Sensen, P. Gordon, L. Durufle, T. Gaasterland, P. Lopez, M. Muller, and H. Philippe. 2002. The analysis of 100 genes supports the grouping of three highly divergent amoebae: Dictyostelium, Entamoeba, and Mastigamoeba. Proc. Natl. Acad. Sci. USA 99:1419). A third paper using concatenated mitochondrial proteins (Forget, L., J. Ustinova, Z. Wang, V. A. R. Huss, and B. F. Lang. 2002. Hyaloraphidium curvatum: A linear mitochondrial genome, tRNA editing, and an evolutionary link to lower fungi. Mol. Biol. Evol. 19:310–319) shows, independently of our nuclear gene evidence and that of Baldauf et al. (2000), that Acanthamoeba also is specifically related to Dictyostelium. This extensive evidence for the monophyly of Amoebozoa is not strongly contradicted by our analyses that do not place Mastigamoeba invertens with the other two amoebae; the RPB1 data set seems sensitive to long-branch effects and M. invertens acts as a long branch. The position of M. invertens is similarly non-robust on gamma-corrected 18S rRNA trees, where it often does not group with other amoebae (and never with M. balamuthi: TC-S unpublished data). In addition, a spliceosomal intron has been recently discovered in Giardia (Nixon, J. E., A. Wang, H. G. Morrison, A. G. McArthur, M. L. Sogin, B. J. Loftus, and J. Samuelson. 2002. A spliceosomal intron in Giardia lamblia. Proc. Natl. Acad. Sci. USA 99:3701–3705). Thus Giardia must still be capable of splicing despite its abnormal CTD; this is consistent with our suggestion of widespread CTD degeneration in protists.
Footnotes |
---|
1 Contributed equally to this paper
Abbreviations: RPB1, RNA Polymerase II largest subunit; LBA, long-branch attraction; CTD, carboxy-terminal domain of RPB1; nm, nucleomorph.
Keywords: evolution, Naegleria, Cercomonas, Ochromonas, intron, nucleomorph
Address for correspondence and reprints: John M. Logsdon Jr., Department of Biology, Emory University, 1111 Rollins Research Center, 1510 Clifton Road, Atlanta, Georgia 30322. jlogsdon{at}biology.emory.edu.
References |
---|
Adachi J., M. Hasegawa, 1996 MOLPHY Version 2.3. Programs for molecular phylogenetics based on maximum likelihood. Computer Science Monographs 28
Archibald J., T. Cavalier-Smith, U. Maier, S. Douglas, 2001 Molecular chaperones encoded by a reduced nucleus: the cryptomonad nucleomorph J. Mol. Evol 52:490-501
Baldauf S. L., W. F. Doolittle, 1997 Origin and evolution of the slime molds (Mycetozoa) Proc. Natl. Acad. Sci. USA 94:12007-12012
Baldauf S. L., J. D. Palmer, 1993 Animals and fungi are each other's closest relatives: congruent evidence from multiple proteins Proc. Natl. Acad. Sci. USA 90:11558-11562
Baldauf S. L., A. J. Roger, I. Wenk-Siefert, W. F. Doolittle, 2000 A kingdom-level phylogeny of eukaryotes based on combined protein data Science 290:972-977
Cavalier-Smith T., 1987 The origin of fungi and pseudofungi Pp. 339-353 in A. D. M. Rayner, C. M. Brasier, and D. Moore, eds., Evolutionary biology of the fungi, Vol. 13. Symp. Br. Mycol. Soc. Cambridge University Press, Cambridge
Cavalier-Smith T., 1993 Kingdom Protozoa and its 18 phyla Microbiol. Rev 57:953-994
Cavalier-Smith T., 1998 A revised six-kingdom system of life Biol. Rev. Camb. Philos. Soc 73:203-266
Cavalier-Smith T., 1999 Principles of protein and lipid targeting in secondary symbiogenesis: euglenoid, dinoflagellate, and sporozoan plastid origins and the eukaryote family tree J. Eukaryot. Microbiol 46:347-366
Cavalier-Smith T., 2000 Flagellate megaevolution: the basis for eukaryote diversification Pp. 361-390 in J. R. Green and B. S. C. Leadbeater, eds. The Flagellates. Taylor and Francis, London
Cavalier-Smith T., 2002 The phagotrophic origin of eukaryotes and phylogenetic classification of Protozoa Int. J. Syst. Evol. Microbiol 52:297-354
Cornelissen A. W., R. Evers, J. Kock, 1988 Structure and sequence of genes encoding subunits of eukaryotic RNA polymerases Oxf. Surv. Eukaryot. Genes 5:91-131
Dacks J., A. J. Roger, 1999 The first sexual lineage and the relevance of facultative sex J. Mol. Evol 48:779-783
Dacks J. B., J. D. Silberman, A. G. Simpson, S. Moriya, T. Kudo, M. Ohkuma, R. J. Redfield, 2001 Oxymonads are closely related to the excavate taxon Trimastix Mol. Biol. Evol 18:1034-1044
Douglas S., S. Zauner, M. Fraunholz, M. Beaton, S. Penny, L. T. Deng, X. Wu, M. Reith, T. Cavalier-Smith, U. G. Maier, 2001 The highly reduced genome of an enslaved algal nucleus Nature 410:1091-1096
Douglas S. E., C. A. Murphy, D. F. Spencer, M. W. Gray, 1991 Cryptomonad algae are evolutionary chimaeras of two phylogenetically distinct unicellular eukaryotes Nature 350:148-151
Douglas S. E., S. L. Penny, 1999 The plastid genome of the cryptophyte alga, Guillardia theta: complete sequence and conserved synteny groups confirm its common ancestry with red algae J. Mol. Evol 48:236-244
Edgcomb V. P., A. J. Roger, A. G. Simpson, D. T. Kysela, M. L. Sogin, 2001 Evolutionary relationships among "jakobid" flagellates as indicated by alpha- and beta-tubulin phylogenies Mol. Biol. Evol 18:514-522
Embley T. M., R. P. Hirt, 1998 Early branching eukaryotes? Curr. Opin. Genet. Dev 8:624-629
Fast N. M., J. C. Kissinger, D. S. Roos, P. J. Keeling, 2001 Nuclear-encoded, plastid-targeted genes suggest a single common origin for apicomplexan and dinoflagellate plastids Mol. Biol. Evol 18:418-426
Felsenstein J., 1995 PHYLIP (phylogeny inference package) Department of Genetics, University of Washington, Seattle
Germot A., H. Philippe, H. Le Guyader, 1997 Evidence for the loss of mitochondria in Microsporidia from a mitochondrial-type HSP70 in Nosema locustae Mol. Biochem. Parasitol 87:159-168
Hirose Y., J. L. Manley, 2000 RNA polymerase II and the integration of nuclear events Genes Dev 14:1415-1429
Hirt R. P., J. M. Logsdon Jr., B. Healy, M. W. Dorey, W. F. Doolittle, T. M. Embley, 1999 Microsporidia are related to Fungi: evidence from the largest subunit of RNA polymerase II and other proteins Proc. Natl. Acad. Sci. USA 96:580-585
Keeling P. J., 2001 Foraminifera and cercozoa are related in actin phylogeny: two orphans find a home? Mol. Biol. Evol 18:1551-1557
Keeling P. J., J. A. Deane, C. Hink-Schauer, S. E. Douglas, U. G. Maier, G. I. McFadden, 1999 The secondary endosymbiont of the cryptomonad Guillardia theta contains alpha-, beta-, and gamma-tubulin genes Mol. Biol. Evol 16:1308-1313
Keeling P. J., W. F. Doolittle, 1996 Alpha-tubulin from early-diverging eukaryotic lineages and the evolution of the tubulin family Mol. Biol. Evol 13:1297-1305
Lam T. Y., L. Chan, P. Yip, C. H. Siu, 1992 The largest subunit of RNA polymerase II in Dictyostelium: conservation of the unique tail domain and gene expression Biochem. Cell. Biol 70:792-799
Lichtenstein C. P., J. Draper, 1985 Genetic engineering in plants Pp. 102-103 in D. M. Glover, ed. DNA cloning: a practical approach. IRL Press, Oxford
Logsdon J. M. Jr., 1998 The recent origins of spliceosomal introns revisited Curr. Opin. Genet. Dev 8:637-648
Lyons-Weiler J., G. A. Hoelzer, 1999 Null model selection, compositional bias, character state bias, and the limits of phylogenetic information Mol. Biol. Evol 16:1400-1406
Lyons-Weiler J., G. A. Hoelzer, R. J. Tausch, 1996 Relative apparent synapomorphy analysis (RASA). I: the statistical measurement of phylogenetic signal Mol. Biol. Evol 13:749-757
Maddison D. R., W. P. Maddison, 2000 MacClade 4; analysis of phylogeny and character evolution Sinauer Associates, Sunderland, Mass
Minakhin L., S. Bhagat, A. Brunning, E. A. Campbell, S. A. Darst, R. H. Ebright, K. Severinov, 2001 Bacterial RNA polymerase subunit omega and eukaryotic RNA polymerase subunit RPB6 are sequence, structural, and functional homologs and promote RNA polymerase assembly Proc. Natl. Acad. Sci. USA 98:892-897
Moreira D., H. Le Guyader, H. Philippe, 2000 The origin of red algae and the evolution of chloroplasts Nature 405:69-72
Philippe H., A. Adoutte, 1998 The molecular phylogeny of Eukaryota: solid facts and uncertainties Pp. 25-56 in G. Coombs, K. Vickerman, M. Sleigh, and A. Warren, eds. Evolutionary relationships among Protozoa. Chapman & Hall, London
Philippe H., A. Germot, D. Moreira, 2000a. The new phylogeny of eukaryotes Curr. Opin. Genet. Dev 10:596-601
Philippe H., P. Lopez, H. Brinkmann, K. Budin, A. Germot, J. Laurent, D. Moreira, M. Muller, H. Le Guyader, 2000b. Early-branching or fast-evolving eukaryotes? An answer based on slowly evolving positions. Proc. R. Soc. Lond. B. Biol. Sci 267:1213-1221
Quon D. V., M. G. Delgadillo, P. J. Johnson, 1996 Transcription in the early diverging eukaryote Trichomonas vaginalis: an unusual RNA polymerase II and alpha-amanitin-resistant transcription of protein-coding genes J. Mol. Evol 43:253-262
Sidow A., W. K. Thomas, 1994 A molecular evolutionary framework for eukaryotic model organisms Curr. Biol 4:596-603
Silberman J. D., C. G. Clark, L. S. Diamond, M. L. Sogin, 1999 Phylogeny of the genera Entamoeba and Endolimax as deduced from small-subunit ribosomal RNA sequences Mol. Biol. Evol 16:1740-1751
Simpson A. G. B., D. J. Patterson, 1999 The ultrastructure of Carpediemonas membranifera (Eukaryota), with reference to the "excavate hypothesis." Eur. J. Protistol 35:353-370
Sogin M. L., 1991 Early evolution and the origin of eukaryotes Curr. Opin. Gen. Dev 1:457-463
Stiller J. W., E. C. Duffield, B. D. Hall, 1998 Amitochondriate amoebae and the evolution of DNA-dependent RNA polymerase II Proc. Natl. Acad. Sci. USA 95:11769-11774
Stiller J. W., B. D. Hall, 1997 The origin of red algae: implications for plastid evolution Proc. Natl. Acad. Sci. USA 94:4520-4525
Stiller J. W., B. D. Hall, 1998 Sequences of the largest subunit of RNA polymerase II from two red algae and their implications for rhodophyte evolution J. Phycol 34:857-864
Stiller J. W., B. D. Hall, 1999 Long-branch attraction and the rDNA model of early eukaryotic evolution Mol. Biol. Evol 16:1270-1279
Stiller J. W., B. L. McConaughy, B. D. Hall, 2000 Evolutionary complementation for polymerase II CTD function Yeast 16:57-64
Stiller J. W., J. Riley, B. D. Hall, 2001 Are red algae plants? A critical evaluation of three key molecular data sets. J. Mol. Evol 52:527-539
Strimmer K., A. von Haeseler, 1997 Puzzle Zoologisches Institut, Universität München, Munich
Swofford D. L., 1998 PAUP*: phylogenetic analysis using parsimony (* and Other Methods) Sinauer Associates, Sunderland, Mass
Thompson J. D., T. J. Gibson, F. Plewniak, F. Jeanmougin, D. G. Higgins, 1997 The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools Nucleic Acids Res 25:4876-4882
Wlassoff W. A., M. Kimura, A. Ishihama, 1999 Functional organization of two large subunits of the fission yeast Schizosaccharomyces pombe RNA polymerase II. Location of the catalytic sites. J. Biol. Chem 274:5104-5113
Wu X., C. B. Wilcox, G. Devasahayam, R. L. Hackett, M. Arevalo-Rodriguez, M. E. Cardenas, J. Heitman, S. D. Hanes, 2000 The Ess1 prolyl isomerase is linked to chromatin remodeling complexes and the general transcription machinery EMBO J 19:3727-3738