Analyses of RNA Polymerase II Genes from Free-Living Protists: Phylogeny, Long Branch Attraction, and the Eukaryotic Big Bang

Joel B. Dacks1, Alexandra Marinets1, W. Ford Doolittle, Thomas Cavalier-Smith and John M. Logsdon, Jr3

*Program in Evolutionary Biology, Canadian Institute for Advanced Research, Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax;
†Department of Botany, University of British Columbia, Vancouver;
‡Department of Zoology, University of Oxford, South Parks Road, UK;
§Department of Biology, Emory University


    Abstract
 
The phylogenetic relationships among major eukaryotic protist lineages are largely uncertain. Two significant obstacles in reconstructing eukaryotic phylogeny are long-branch attraction (LBA) effects and poor taxon sampling of free-living protists. We have obtained and analyzed gene sequences encoding the largest subunit of RNA Polymerase II (RPB1) from Naegleria gruberi (a heterolobosean), Cercomonas ATCC 50319 (a cercozoan), and Ochromonas danica (a heterokont); we have also analyzed the RPB1 gene from the nucleomorph (nm) genome of Guillardia theta (a cryptomonad). Using a variety of phylogenetic methods, we show that RPB1s from Giardia intestinalis and Trichomonas vaginalis are probably subject to intense LBA effects. Thus, the deep branching of these taxa on RPB1 trees is questionable and should not be interpreted as evidence favoring their early divergence. Similar effects are discernible, to a lesser extent, with the Mastigamoeba invertens RPB1 sequence. Upon removal of the outgroup and these problematic sequences, analyses of the remaining RPB1s indicate some resolution among major eukaryotic groups. The most robustly supported higher-level clades are the opisthokonts (animals plus fungi) and the red algae plus the cryptomonad nm; the latter result gives added support to the red algal origin of cryptomonad chloroplasts. Clades comprising Dictyostelium discoideum plus Acanthamoeba castellanii (Amoebozoa) and Ochromonas plus Plasmodium falciparum (chromalveolates) are consistently observed and moderately supported. The clades supported by our RPB1 analyses are congruent with other data, suggesting that bona fide phylogenetic relationships are being resolved. Thus, the RPB1 gene has apparently retained some phylogenetically meaningful signal, making it worthwhile to obtain sequences from more diverse protist taxa. Additional RPB1 data, especially in combination with other genes, should provide further resolution of branching orders among protist groups within the apparently rapid early divergence of eukaryotes.


    Introduction
 
In the late 1980s and early 1990s, small subunit ribosomal DNA (ssu rDNA) painted a picture of three major early evolving eukaryotic lineages: the diplomonads, parabasalids, and microsporidia. These groups were followed sequentially by euglenozoans and heteroloboseans and an unresolved radiation of so-called crown taxa, including animals, plants, fungi, and a number of protists (Sogin 1991). This ssu rDNA-based view of eukaryotic relationships has been greatly weakened by protein phylogenies, indicating that some taxa, such as microsporidia (Keeling and Doolittle 1996; Germot, Philippe, and Le Guyader 1997; Hirt et al. 1999) and Mycetozoa (Baldauf and Doolittle 1997; Baldauf et al. 2000), are seriously misplaced on ssu rDNA trees. Related studies reveal phylogenetic artifacts in ssu rDNA trees formerly thought to support the apparently early divergence of diplomonads and parabasalids (Hirt et al. 1999; Silberman et al. 1999; Stiller and Hall 1999; Philippe et al. 2000b) and suggest that the root of the ssu rDNA tree using bacterial outgroups may be misplaced (Embley and Hirt 1998).

How to reconcile the ssu rDNA and protein sequence evidence is hotly debated and has prompted several alternative views of eukaryotic relationships. The "eukaryotic big bang" hypothesis suggests that eukaryotes evolved in a massive radiation of 4–10 groups whose interrelationships are fundamentally irresolvable (Philippe and Adoutte 1998; Philippe, Germot, and Moreira 2000a). An alternative view, based on combined protein data, proposes two superclades of eukaryotes: one group, called opisthokonts (Cavalier-Smith 1987), contains animals, fungi, and their choanozoan relatives, whereas the other contains plants, chromists, and most protozoa (Embley and Hirt 1998; Dacks and Roger 1999; Baldauf et al. 2000; Edgcomb et al. 2001). As rRNA trees also invariably and robustly resolve this dichotomy between opisthokonts and the rest of eukaryotes (Sogin 1991; Cavalier-Smith 1993, 2000), some of the broad features of both rRNA and protein trees can be reconciled and are congruent with key ultrastructural data (Cavalier-Smith 2000, 2002). However, a number of unanswered questions remain: the evolutionary affinities of the many protist groups not clearly attributable to any major grouping, the root of the eukaryotic tree, and the identity of early evolving lineages. Some difficulties are largely methodological, including artifacts arising from long-branch attraction (LBA). Another problem, more easily remediable, is poor taxon sampling: many protein trees entirely omit key, often free-living, protist groups.

The largest subunit of RNA Polymerase II (RPB1) has been one of the few proteins used to address issues of major eukaryotic relationships (Stiller and Hall 1997; Stiller and Hall 1998; Hirt et al. 1999). RPB1 is large (ca. 1,600 amino acid residues), and phylogenetic trees of this molecule can be outgroup-rooted by either its archaebacterial homologs or by its eukaryotic-specific paralogs, RPA1 or RPC1 (the largest subunits of RNA Polymerase I and III, respectively). However, because RPB1 genes are so large, they have not been characterized in a wide variety of eukaryotic species: the taxonomic representation of RPB1 is particularly sparse compared with other molecules, such as tubulins or ssu rDNA (see Baldauf et al. 2000). RPB1 orthologs have been well sampled and characterized from animals and fungi. RPB1 sequences are also available for parasitic protists once thought to be early emerging eukaryotes, but notably lacking are sequences from free-living protists. Organisms heretofore missing from RPB1 analyses include heterokonts (or stramenopiles), cercomonads, heteroloboseans, and the cryptomonad nucleomorph (nm).

The first three are each monophyletic groups, having a variety of proposed larger-scale evolutionary affinities. The heterokonts are a collection of algae and secondary heterotrophs that have recently been proposed as related to the alveolates (Cavalier-Smith 1999; Fast et al. 2001). Cercomonads are related to various filose amoebae, thaumatomonads, and chlorarachniophytes (collectively Cercozoa: Cavalier-Smith 1998), and possibly also to foraminifera (Keeling 2001). Heterolobosea were proposed as an early evolving lineage because of ssu rDNA evidence and their lack of Golgi dictyosomes (Cavalier-Smith 1993) but are now thought to be related to Euglenozoa in a larger excavate assemblage (Simpson and Patterson 1999; Cavalier-Smith 2002). The cryptomonad nm is the relict nucleus of an anciently captured red algal cell (Douglas et al. 1991, 2001).

In this study we cloned and sequenced the gene encoding RPB1 from Ochromonas danica (heterokont), Cercomonas ATCC50319, and Naegleria gruberi (heterolobosean). We have also analyzed RPB1 from the Guillardia theta nm. These data provide significant additions to the diversity of protist taxa represented in the RPB1 data set. We examine the evolutionary affinities of these lineages, as well as the effects of LBA in this data set, by a number of phylogenetic methods and consider the implications of RPB1 phylogeny for the evolution of transcription, spliceosomal introns, and the overall pattern of eukaryotic evolution.


    Materials and Methods
 
DNA
Purified DNA from N. gruberi was generously donated by R. J. Redfield (University of British Columbia). Total genomic DNA was extracted from O. danica (ATCC 30004) and Cercomonas ATCC 50319 using a CTAB extraction (Lichtenstein and Draper 1985).

PCR Amplification
Conserved regions A–D of the Naegleria RPB1 gene sequence were amplified using the degenerate PCR primers RPB1-F1 (GAG TGT CCA GGN CAY TTY GG) and RPB1-R2 (GTC GAA GTC TGC RTT RTA NGG) described in Hirt et al. (1999). After sequencing of this fragment, an exact-match primer, RPB1-N5X1 (AAG ATG GTA CAC GTA TCG), was used in combination with the reverse degenerate primer RPB1-R4 (TG GAA CGT ATT NAR NGT CAT) to obtain the remaining regions used for phylogenetic analysis. A second exact-match primer, RPB1-N3X1 (CAA GGG TAC TGA TGA ATT GTC), was used in combination with degenerate primer CTDR1 (TGA TAG ACT GGN GAN GTN GG) to amplify the remaining portion of the gene, including conserved region H and a portion of the carboxy-terminal domain (CTD). All PCR fragments were cloned into the TOPO 2.1 vector using the TOPO TA cloning kit (Invitrogen). Each clone was sequenced on LI-COR and ABI automated sequencers. All clones were sequenced in both directions, and the full gene sequence was assembled with two- to sixfold coverage.
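To illustrate what the degenerate positions in such primers imply, the following minimal Python sketch counts and enumerates the sequences a primer represents under the standard IUPAC nucleotide ambiguity codes. Only the two primer sequences are taken from the text; the function names are hypothetical and are not part of the original protocol.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

def degeneracy(primer):
    """Number of distinct sequences represented by a degenerate primer."""
    count = 1
    for base in primer.replace(" ", "").upper():
        count *= len(IUPAC[base])
    return count

def expand(primer):
    """Enumerate every non-degenerate sequence the primer stands for."""
    pools = [IUPAC[base] for base in primer.replace(" ", "").upper()]
    return ["".join(combo) for combo in product(*pools)]

rpb1_f1 = "GAG TGT CCA GGN CAY TTY GG"    # primer sequence as given in the text
rpb1_r2 = "GTC GAA GTC TGC RTT RTA NGG"   # primer sequence as given in the text

print(degeneracy(rpb1_f1), degeneracy(rpb1_r2))
```

Each of these two primers contains one N and two twofold-degenerate positions, so each stands for a pool of 4 x 2 x 2 = 16 oligonucleotides.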

Degenerate primers (described in Stiller and Hall 1997; Stiller, Duffield, and Hall 1998) were used for PCR amplification of RPB1 regions A–D, D–F, and F–G (Stiller and Hall 1997) from Cercomonas ATCC50319 and O. danica. Products were cloned into TOPO TA vectors (Invitrogen) and completely sequenced using ABI sequencing protocols.

Phylogeny
All sequences were obtained from NCBI, with the exception of those from N. gruberi (AF395110), O. danica (AF395111), and Cercomonas ATCC50319 (AF395835) reported herein. These three RPB1 sequences plus three RPA1 sequences as outgroups were manually added to a previously published alignment (Hirt et al. 1999) using MacClade 4.0 (Maddison and Maddison 2000). The final alignment used for global phylogenetic analysis contained 22 taxa and 746 aligned amino acid sites. A sub-data set with the outgroups removed was also analyzed. These alignments are available upon request. The microsporidial RPB1 sequences were not included in these analyses as they represent highly divergent fungi (Hirt et al. 1999), and two representative, less divergent, fungal sequences were used instead.

Maximum parsimony analyses were performed using PAUP* 4.0b (Swofford 1998), whereas Neighbor-Joining and Fitch distance analyses used PHYLIP 3.573 (Felsenstein 1995). Protein maximum likelihood (ML) analyses were done using two methods. Puzzle 4.0.2 (Strimmer and von Haeseler 1997) was used incorporating a gamma correction for among-site rate variation plus a correction for invariant sites (8 + 1 rate categories) estimated from the data set. A Neighbor-Joining tree, estimated by Puzzle 4.0.2, was used as a basis for site rate calculations. In addition, a protML 2.2 (Adachi and Hasegawa 1996) heuristic (-q 10,000) search was performed for each data set. For the protML analyses, the resampling estimated log-likelihood (RELL) values were calculated using Mol2con (A. Stoltzfus, personal communication). Although full ML heuristic searches were done to search for the optimal topology using ProML 3.6a (Felsenstein 1995), the optimal topology found by ProML contradicted several nodes supported by all other methods in our analyses, including some that are well established in the literature; these alternate nodes were not supported by any of the other methods at greater than 50%. For this reason, the trees shown are the best protML topology (i.e., with the highest log likelihood) with branch lengths estimated in Puzzle to incorporate gamma-distributed rates and invariant sites. Although these protML trees provide an accurate representation of RPB1 phylogeny, they may not be the overall best trees because of computational limitations. ML distance analyses used Tree-Puzzle 4.0.2 (previously called Puzzle; Strimmer and von Haeseler 1997) to calculate ML distance matrices along with Puzzleboot (A. Roger and M. Holder, personal communication; www.tree-puzzle.de); resampled matrices were then analyzed using Fitch (Felsenstein 1995) with global rearrangements and 10 times jumbling. All bootstrap support values are based on 100 replicates.
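The resampling step that underlies a bootstrap support value can be sketched as follows. This is a generic illustration of resampling alignment columns with replacement, not the implementation used by any of the programs cited above; the toy alignment, taxon labels, and function name are assumptions made for the example.

```python
import random

def bootstrap_replicate(alignment, rng):
    """Build one pseudoreplicate by resampling alignment columns with replacement."""
    n_sites = len(next(iter(alignment.values())))
    # Draw n_sites column indices, allowing repeats.
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    # Apply the same column choice to every taxon so columns stay aligned.
    return {taxon: "".join(seq[i] for i in columns)
            for taxon, seq in alignment.items()}

# Hypothetical toy alignment; the real global data set has 22 taxa and 746 sites.
toy = {"taxon1": "MKLVA", "taxon2": "MRLVA", "taxon3": "MKIVG"}

rng = random.Random(0)  # fixed seed so the sketch is reproducible
replicates = [bootstrap_replicate(toy, rng) for _ in range(100)]
```

A tree would then be inferred from each of the 100 pseudoreplicates, and the support for a node is the percentage of those trees in which the node appears.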

After the LBA tests, a new alignment was constructed initially using Clustal X (Thompson et al. 1997) and then adjusted manually such that only regions of unambiguously alignable sequence were retained for analysis (17 taxa, 910 sites). Phylogenetic analyses for this restricted data set were identical to those done on the global set.

LBA Tests
For the 22-taxon, 746-site data set, evolutionary rates at all amino acid sites were calculated using Puzzle 4.0.2 (Strimmer and von Haeseler 1997), and selected sites were removed manually in MacClade 4.0 (Maddison and Maddison 2000). For fast site removal (FSR) all sites calculated to be in the fastest rate category (category 8 of 8) were removed. For constant site removal (CSR) those in the slowest, i.e., invariant, class (category 0) were removed. For fast and constant site removal (FCSR), sites in both categories were eliminated. Heuristic protML analyses (-q 10,000) were performed and RELL values determined as described previously.
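The three site-removal treatments amount to filtering alignment columns by their assigned rate category. The sketch below assumes that a per-site category (0 for invariant through 8 for fastest, as described above) is already available as a list; the toy alignment, the category values, and the function name are hypothetical, and in this study the removal itself was done manually in MacClade.

```python
def remove_sites(alignment, site_categories, drop):
    """Return a copy of the alignment without columns whose category is in `drop`."""
    keep = [i for i, cat in enumerate(site_categories) if cat not in drop]
    return {taxon: "".join(seq[i] for i in keep)
            for taxon, seq in alignment.items()}

FAST, CONSTANT = 8, 0  # fastest and invariant categories named in the text

# Hypothetical toy data: three taxa, six sites, one category per site.
toy = {"taxon1": "MKLVAG", "taxon2": "MRLVAG", "taxon3": "MKIVGG"}
toy_categories = [0, 8, 3, 0, 5, 0]

fsr = remove_sites(toy, toy_categories, {FAST})             # fast sites removed
csr = remove_sites(toy, toy_categories, {CONSTANT})         # constant sites removed
fcsr = remove_sites(toy, toy_categories, {FAST, CONSTANT})  # both removed
```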

To establish the autapomorphy to symplesiomorphy ratio, each class of substitution was assessed manually from the 22-taxon alignment. Autapomorphies were defined as unique substitutions at an otherwise invariant position within the in-group taxa and not shared with the outgroups. The outgroups were not required to share the invariant residue with the other in-group taxa. Symplesiomorphies were defined as substitutions shared between an in-group taxon and at least two of the three outgroup taxa but different from the other in-group taxa at a position otherwise invariant among them. Substitutions for both classes were tallied for each in-group taxon individually as well as for several taxonomic groups (fungi, animals, red algae, kinetoplastids) to compensate for uneven taxon sampling in the alignment. For these groups the final substitution count was the sum of those substitutions shared by the group and the average of the substitutions found in each of the component taxa.
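The two definitions above can be expressed as a simple per-site rule. The following sketch is one possible reading of them for a single in-group taxon, assuming a gap-free alignment; the tallies in this study were made manually, and the function name and toy sequences are hypothetical.

```python
def count_substitution_classes(ingroup, outgroup, taxon):
    """Count (autapomorphies, symplesiomorphies) for one in-group taxon.

    A site is considered only if all other in-group taxa share one residue
    and the focal taxon differs from it.
    """
    focal = ingroup[taxon]
    others = [seq for name, seq in ingroup.items() if name != taxon]
    autapomorphies = symplesiomorphies = 0
    for i, residue in enumerate(focal):
        other_states = {seq[i] for seq in others}
        if len(other_states) != 1 or residue in other_states:
            continue  # site not invariant among the others, or no substitution
        shared = sum(1 for seq in outgroup.values() if seq[i] == residue)
        if shared == 0:
            autapomorphies += 1      # unique to the focal taxon
        elif shared >= 2:
            symplesiomorphies += 1   # shared with at least two of three outgroups
    return autapomorphies, symplesiomorphies

# Hypothetical toy alignment: three in-group and three outgroup sequences.
toy_ingroup = {"A": "MKLV", "B": "MKLV", "C": "MRLI"}
toy_outgroup = {"O1": "MKLI", "O2": "MQLI", "O3": "MKLT"}

print(count_substitution_classes(toy_ingroup, toy_outgroup, "C"))
```

In the toy data, taxon C has one substitution found in no outgroup (site 2) and one shared with two outgroups (site 4). A substitution shared with exactly one outgroup falls into neither class under this reading.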

RASA 2.5 (Lyons-Weiler and Hoelzer 1999) was used to assess the phylogenetic signal in the various data sets and to identify long-branch sequences in a phylogeny-independent fashion. Outgroup-rooted analyses (using the analytical method in RASA) were performed on the 22-taxon data set, including the RPA1 sequences. Unrooted RASA analyses were performed with the RPA1 sequences removed; these analyses were performed with both the analytical and permutation methods, the latter with 30 replicates.


    Results
 
Physical Attributes of the RPB1 Sequences
Using various degenerate and specific primers, a 4,790 nt ORF (encoding 1,596 amino acid residues), uninterrupted by introns, was amplified from N. gruberi. Only conserved regions A–G were amplified from Cercomonas ATCC50319 and O. danica, resulting in RPB1 sequences of 3,258 nt (1,055 amino acid residues) and 3,191 nt (1,062 residues), respectively. Whereas the O. danica sequence was uninterrupted by introns, the Cercomonas sequence had two, of 119 and 94 bases, respectively.

Degenerate primers were used to amplify the CTD of RPB1 from N. gruberi. This region of RPB1 in animals, plants, and fungi, as well as a number of protists, is composed of heptad repeats with a canonical sequence of YSPTSPS (Lam et al. 1992; Stiller, Duffield, and Hall 1998). Because this region was determined by PCR amplification, the precise number of repeats in Naegleria is not known, although at least eight heptads are present. Several repeats in the Naegleria CTD are degenerate, before beginning a more canonical YSPTSPA/YSPTSPN register. There is no information regarding the presence of CTDs in Cercomonas and Ochromonas RPB1 genes because they would be outside the amplified regions.
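A scan for such repeats can be sketched with a regular expression that fixes the first six positions of the heptad and allows any residue at the seventh, which covers the canonical YSPTSPS as well as the YSPTSPA and YSPTSPN forms mentioned above. The test sequence below is a hypothetical construct, not the Naegleria CTD, and the function name is an assumption; more degenerate repeats would not be detected by this pattern.

```python
import re

# First six heptad positions fixed; position 7 may be any residue.
HEPTAD = re.compile(r"YSPTSP([A-Z])")

def scan_heptads(protein):
    """Return (start, repeat, is_canonical) for each non-overlapping heptad found."""
    return [(m.start(), m.group(0), m.group(1) == "S")
            for m in HEPTAD.finditer(protein)]

# Hypothetical stretch: one canonical repeat followed by two position-7 variants.
toy_ctd = "YSPTSPS" "YSPTSPA" "YSPTSPN" "GG"

for start, repeat, canonical in scan_heptads(toy_ctd):
    print(start, repeat, "canonical" if canonical else "variant")
```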

Uniquely, the G. theta nm RPB1 lacks a distinctive CTD, apparently ending abruptly at the end of domain H (the final conserved block in RPB1). Interestingly, we note a possible tandem pair of degenerate heptad repeats, YSLSLKLF-YSMMKNF, in one of three ORFs annotated as hypothetical protein genes in the 1,699 bp region immediately downstream of the putative RPB1 stop codon and before the next bona fide gene (rpl37A). However, database searches with this region did not reveal any significant similarity with CTDs (or any other proteins). Codon usage for the three ORFs resembles that for the RPB1 gene, but with so few codons in each ORF, this resemblance may not be statistically significant. Moreover, the GC content of the RPB1 gene is higher (27%) than that of any of the three ORFs (which range from 17% to 20% GC). For any of these regions to be part(s) of the RPB1 gene, one or more spliceosomal intron(s) would appear to be required at the 3' end of the nm RPB1, unlike the other known G. theta nm introns, which are all found at the extreme 5' ends of genes. Determination of the 3' end sequence of the RPB1 mRNA would be required to verify the absence of a CTD or to determine whether transcription extends into the short CTD-like region in this downstream ORF (or both).

Global Phylogenetic Analysis
To assess the phylogenetic placement of our new RPB1s, amino acid sequences from diverse eukaryotes were aligned with RPA1 homologs from an animal, a plant, and a fungus. This 22-taxon data set of 746 unambiguously aligned positions was subjected to rigorous phylogenetic analysis (fig. 1A). The Giardia sequence emerged as the earliest RPB1 branch followed by the Trichomonas RPB1 sequence, both with apparently good support. Neither the Naegleria nor the Cercomonas sequences were strongly placed in these trees, whereas the Ochromonas sequence grouped with Plasmodium with variable, but modest support. Beyond those nodes that were universally supported, parsimony and distance analyses did not seem to provide significant resolution but were consistent with the seemingly more resolved ML analyses.



 
Fig. 1. Global RPB1 phylogeny. Data sets of 746 aligned amino acid positions were analyzed by protML (ML), Puzzle (PZ), Puzzleboot (PB), Maximum Parsimony (MP) and PAM-corrected distance methods (DI) with values given at each node in that order. In all trees, nodes that are not supported by more than 50% with at least one method have no values listed for them and dashes denote cases where the topology shown was not reconstructed by a particular method. Topologies shown represent the optimal protML tree with branch lengths estimated in Puzzle incorporating a gamma correction and invariant sites. Sequences in bold are those obtained in this study. A, Rooted analysis of 22 taxa, including 19 RPB1 sequences and 3 RPA1 outgroups. Uppercase letters denote nodes of interest matched to values in figure 1B. B, Results of site removal test for long branches. As the optimal topology was the same for all site-removal analyses, the nodes listed are directly comparable with the phylogeny in figure 1A. NoSR = no sites removed, FSR = fast sites removed, CSR = constant sites removed, and F + CSR = fast and constant sites removed. Of particular note is the change in support for nodes J and I after site removal.

 
Tests for Long Branches
It has been previously suggested that the Giardia and Trichomonas RPB1 sequences represent long branches (Hirt et al. 1999; Stiller and Hall 1999) which may artifactually place them as early emerging taxa (Philippe et al. 2000b). Of the in-group sequences, the Giardia sequence is clearly a long branch (fig. 1A). We subjected our 22-taxon data set to a number of tests devised to detect long-branch effects in our analysis (Lyons-Weiler, Hoelzer, and Tausch 1996; Stiller, Duffield, and Hall 1998; Hirt et al. 1999).

Hirt et al. (1999) showed that failure to correct for invariant or rapidly evolving sites could lead to artifactual resolution in phylogenies, especially when using protML, in which the assumption of rate constancy is applied to all sites. A simple method of fast and constant site removal was used to compensate for this artifact. In order to assess the effect of these site rate categories in our data set, protML (-q 10,000) searches were carried out with fast, constant, and fast plus constant sites removed (fig. 1B).

As in Hirt et al. (1999), our optimal ML topology was unchanged by site removal, but the support for several nodes was significantly affected. Although the node separating the Giardia RPB1 and the three RPA1 sequences from the rest remained robust, support for the node placing the Trichomonas sequence with Giardia and outgroups dropped from 78% to 51% RELL support with fast sites removed and to 42% with fast plus constant sites removed (fig. 1B). This suggests that the deep placement of Trichomonas might also be artifactual. Interestingly, the node uniting Ochromonas and Plasmodium rose from 78% RELL support to 95% and 94% with fast and fast plus constant sites removed, respectively (fig. 1B). It is possible that long-branch effects are masking the real phylogenetic signal in this case.

Without knowing the location of the root, it is difficult to distinguish whether a long branch is caused by rapid sequence evolution or early divergence. Stiller, Duffield, and Hall (1998) proposed a method that may help to do so. They reasoned that the ratio of unique substitutions (autapomorphies) in a sequence to the shared substitutions with outgroups (symplesiomorphies) should be relatively uniform even in early diverging eukaryotes, if the rate of evolution were fairly constant and the earliest branches did not precede others by an immensely long time. A high ratio of autapomorphies to symplesiomorphies, in contrast, indicates rapid sequence divergence in a taxon rather than slow, but ancient, evolution. In the 746 amino acid alignment, Giardia has 24 autapomorphies, Ochromonas and Trichomonas have 8 and 7, respectively, kinetoplastids have 6, and no other taxa or group of taxa have more than 4. Giardia has only five symplesiomorphies; Naegleria has one, as does the Homo sapiens sequence. The exceptionally numerous autapomorphies in Giardia are suggestive of rapid evolution in its RPB1 gene but do not preclude the possibility of it also being an ancient lineage among eukaryotes.

RASA (Lyons-Weiler and Hoelzer 1999) assesses phylogenetic signal by measuring the expected distribution of synapomorphies in a data set against a null hypothesis of a random distribution. It also identifies those taxa contributing more statistical noise than phylogenetic signal in the same data set. As seen in figure 2A, when the rooted 22-taxon, 746-site data set was analyzed using RASA, the Giardia sequence was clearly identified as a long-branch sequence. A similar result was obtained when the outgroup sequences were manually removed (fig. 2B). As indicated in figure 2A–D, those taxa with the largest taxon variance were sequentially removed from the data set until the observed variance distribution was relatively even (fig. 2E). In each case of taxon deletion, the tRASA value rose or (in the case of Mastigamoeba) was not markedly decreased (fig. 2, lower). Although the status of Mastigamoeba as a long branch is debatable, it was also deleted from our analysis, both to be conservative regarding possible long-branch artifacts and to reduce the computational load of further phylogenetic analyses.



 
Fig. 2. Taxon variance graphs and tRASA values for long-branch removal. RASA analyses were used to sequentially remove taxa until tRASA was maximized and taxon variance graphs appeared relatively even. Graphs of taxon variance from analytical analysis by RASA 2.5 are shown in the upper panel. Each data set is identified by the number of taxa, followed by the number of sites. In each case where a clearly long branch is present, that sequence is labeled. Taxa are always listed from top to bottom and are numbered as follows: 1 = Homo sapiens, 2 = Caenorhabditis elegans, 3 = Drosophila melanogaster, 4 = Schizosaccharomyces pombe, 5 = Saccharomyces cerevisiae, 6 = Acanthamoeba castellanii, 7 = Dictyostelium discoideum, 8 = Cercomonas ATCC50319, 9 = Ochromonas danica, 10 = Naegleria gruberi, 11 = Mastigamoeba invertens, 12 = Arabidopsis thaliana, 13 = Bonnemaisonia hamifera, 14 = Porphyra yezoensis, 15 = Plasmodium falciparum, 16 = Trichomonas vaginalis, 17 = Trypanosoma brucei, 18 = Leishmania donovani, 19 = Giardia intestinalis. In panels A and B, taxa are ordered 1–19. In C, taxa are ordered as 1–18. Panel D contains taxa 1–15, 17, and 18. In panel E, taxa are ordered 1–10, 12–15, 17, and 18. Panel F has taxa in the following order: 13, 14, 6, 7, 12, 8, 2, 3, 1, 4, 5, 10, 9, 15, 17, 18. Figure 2, lower tabulates the tRASA values under analytical (ana) and permutation (perm) models of null slope estimation. The numbering of the data sets matches those in figure 2, upper.

 
Restricted Phylogeny
After removing long-branch taxa, the remaining sequences were aligned to give a data set of 16 taxa with 910 unambiguously aligned positions. Its tRASA value was highest among those tested (fig. 2, lower), and little heterogeneity was seen in taxon variance among all sequences (fig. 2F). The G. theta nm sequence was then manually added to the alignment. Analysis of this restricted data set robustly reconstructed the major groups of red algae, animals, fungi, and Euglenozoa and gave limited resolution of the branching order among the major eukaryotic groups (fig. 3A). We recovered an Amoebozoa clade (sensu Baldauf et al. 2000) with the slime mould Dictyostelium discoideum and the amoeba Acanthamoeba castellanii grouping together with moderate affinity. A weak chromalveolate clade of Ochromonas and Plasmodium was also recovered; alternatively, the Plasmodium sequence clustered with the euglenozoan sequences (Trypanosoma and Leishmania) with the Ochromonas sequence adjacent. Because these four are the four longest-branching sequences that remain in this particular data set (fig. 3A), it is possible that moderate LBA effects (not detectable by RASA) could be masking the chromalveolate relationship. Removing constant and fast plus constant sites modestly improved support for this relationship to 60% and 67%, respectively (data not shown). When euglenozoan RPB1 sequences were removed, support for chromalveolates and Amoebozoa changed to 81% and 84% RELL and 61% and 71% Puzzle support, respectively (fig. 3B).



 
Fig. 3. Unrooted, taxon-restricted RPB1 phylogeny. Figure 3A shows an unrooted phylogeny arbitrarily rooted on the kinetoplastid sequences. ProtML RELL (ML), Puzzle (PZ), and Puzzleboot (PB) support values are shown at all nodes over 50%. Taxa in bold are those obtained in this study or newly analyzed (the G. theta nm). Uppercase letters correspond to nodes of interest matched to values in figure 3B. Figure 3B tabulates the support values for nodes upon removal of the kinetoplastids (taxa denoted with an asterisk) and reanalysis. Notably, the support for the chromalveolate and Amoebozoa nodes was particularly affected.

 
Some cryptomonad nm genes are highly diverged and give long branches in phylogenetic analyses (Keeling et al. 1999; Archibald et al. 2001). However, when the Guillardia nm RPB1 sequence is aligned into a data set lacking other long-branch taxa (i.e., our 16-taxon, 910-site alignment, above), it is recovered as a sister to the red algae with strong support (fig. 3A). This result reinforces previous evidence that the cryptomonad nm is the remnant nucleus of a captured red alga (Douglas et al. 1991; Douglas and Penny 1999). Addition or exclusion of the long-branch nm sequence did not significantly alter support or topology of the rest of the tree.


    Discussion
 
Our analysis of RPB1 sequences has provided additional insights into the evolutionary relationships among eukaryotes, including considerable support for chromalveolates and Amoebozoa and a red algal affinity for the cryptomonad nm. In addition, these data also provide the opportunity to consider the functional evolution of RPB1, itself an important piece of the eukaryotic transcriptional apparatus.

Evolution of Transcription
The RNA Polymerase II complex is responsible for transcription and processing of messenger RNAs, with the largest subunit, RPB1, being central. A comparison of RPB1 sequences from diverse eukaryotes allows us to examine its functional evolution, particularly at the putatively identified functional and highly conserved regions. Block A is the region to which amplification primers were designed, so no information is available for the three sequences obtained by PCR. However, it is notable that the Guillardia nm RPB1 deviates from the Cys2-His2 zinc finger motif that characterizes this region (Cornelissen, Evers, and Kock 1988): the first histidine position of the conserved Cys-X2-Cys-X9-His-X2-His motif is replaced with a tyrosine. However, this is the only clear deviation in the nm sequence (as compared with the red algal sequences) of a previously identified critical residue. Blocks B through E are present and well conserved in all sequences that we obtained from free-living protists. Block F, the location of the catalytic sites (Wlassoff, Kimura, and Ishihama 1999), is well conserved across all taxa. This region also contains residues identified for α-amanitin sensitivity (reviewed by Quon, Delgadillo, and Johnson 1996), including Arg-741, Cys-777, and Gly-785; all are perfectly conserved in our RPB1s. Conserved domains G and H are both implicated in binding RPB6, a subunit important for RNA polymerase complex formation (Minakhin et al. 2001). In particular, the residues PGEMV in domain G and DAFDVMIDEES in domain H have been pointed out as contact points. In Naegleria and the Guillardia nm, these residues in domain G are perfectly conserved. The region H residues are less well conserved in the Naegleria sequence and the nm gene seems truncated at the end of this region. Implications of these observations for RPB6 binding and complex formation will require direct experimental inquiry.
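The block A motif described above, Cys-X2-Cys-X9-His-X2-His, and the tyrosine substitution noted for the Guillardia nm can be written as two regular expressions. The sketch below is only an illustration of the spacing rule; the test sequences are hypothetical constructs, not real RPB1 sequences, and the function name is an assumption.

```python
import re

# Cys-X2-Cys-X9-His-X2-His, where X is any residue.
CANONICAL = re.compile(r"C[A-Z]{2}C[A-Z]{9}H[A-Z]{2}H")
# Same spacing with the first histidine replaced by tyrosine.
TYR_VARIANT = re.compile(r"C[A-Z]{2}C[A-Z]{9}Y[A-Z]{2}H")

def classify_block_a(protein):
    """Report which form of the zinc finger motif, if any, occurs in the sequence."""
    if CANONICAL.search(protein):
        return "canonical"
    if TYR_VARIANT.search(protein):
        return "Tyr variant"
    return "no motif"

# Hypothetical test sequences built to match the spacing of the motif.
toy_canonical = "MA" + "C" + "PE" + "C" + "GKTAAGLSA" + "H" + "FG" + "H" + "IE"
toy_variant = "MA" + "C" + "PE" + "C" + "GKTAAGLSA" + "Y" + "FG" + "H" + "IE"

print(classify_block_a(toy_canonical), "|", classify_block_a(toy_variant))
```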

The carboxy-terminal domain (CTD) of RPB1 has a number of transcriptional and posttranscriptional functions, in regulating transcription efficiency and coupling it to pre-mRNA processing: capping, splicing, 3' end cleavage, and polyadenylation. The phosphorylation of serines 2 and 5 of the heptad repeats is particularly critical. Interestingly, though, Stiller, McConaughy, and Hall (2000) demonstrated that the last serine of the YSPTSPS heptad, although highly conserved, is not essential and can be substituted by a nonphosphorylatable residue. The CTD from Naegleria is congruent with this, having nonphosphorylatable residues (either alanine or asparagine) at this position. In addition to regulatory effects by phosphate addition, the action of a prolyl-isomerase ESS1 in yeast seems also to exert a regulatory effect at the CTD (Wu et al. 2000). In line with this, the Mastigamoeba and Naegleria CTDs are perfectly conserved at both proline positions.

Despite the CTD being implicated in many aspects of transcription and RNA processing, several protists appear to be devoid of a bona fide CTD, instead having only serine- and proline-rich regions at the carboxy-terminal end. Even in taxa where the conservation of the repeats is strong, CTDs sometimes contain a number of noncanonical repeats. Similarly, of the eight repeats known in the Naegleria RPB1 CTD, three diverge significantly from the canonical heptad sequence. These data suggest that the exact sequence of the repeat may not be critical and that the conservation of the repeats may be correlated with the rigor with which the function is required. Because mRNA processing occurs in some taxa where the CTD is diminished or absent, these taxa may have different mechanisms of transcription regulation. This also underlines that the careful functional work carried out with RPB1 in animals and fungi needs to be taken in an evolutionary context and not generalized to other species without direct evidence. Comparative studies, such as this one, may help in generalizing to all eukaryotes.
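The comparisons made above (how many repeats are canonical, which residue occupies the seventh position, and whether both prolines are retained) can be tabulated mechanically. The sketch below is illustrative only and is not the method used here: it assumes the tail is already in register with the heptad frame, the function names are hypothetical, and the toy tail is invented rather than taken from the Naegleria sequence.

```python
CANONICAL_HEPTAD = "YSPTSPS"


def split_heptads(tail: str) -> list[str]:
    """Cut a tail into consecutive 7-residue windows, dropping any remainder."""
    usable = len(tail) - len(tail) % 7
    return [tail[i:i + 7] for i in range(0, usable, 7)]


def mismatches(repeat: str) -> int:
    """Count positions at which a repeat differs from the canonical heptad."""
    return sum(a != b for a, b in zip(repeat, CANONICAL_HEPTAD))


def summarize_ctd(tail: str) -> dict:
    """Summarize repeat conservation for a tail assumed to be in heptad register."""
    repeats = split_heptads(tail)
    return {
        "repeats": len(repeats),
        "canonical": sum(r == CANONICAL_HEPTAD for r in repeats),
        "mismatches": [mismatches(r) for r in repeats],
        # Residue at the seventh position, where Ser is reported to be dispensable.
        "position7": [r[6] for r in repeats],
        # Pro at positions 3 and 6 of every repeat.
        "prolines_conserved": all(r[2] == "P" and r[5] == "P" for r in repeats),
    }


# Hypothetical toy tail (an assumption for illustration, not a real CTD).
toy_tail = "YSPTSPS" + "YSPTSPA" + "YSPTSPN" + "YAPGSPK"
print(summarize_ctd(toy_tail))
```

Real CTDs often contain insertions or shifts of register, so a fixed 7-residue window is a simplification; an alignment-based approach would be needed for degenerate tails.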

Evolution of Splicing and Introns
The RPB1 CTD plays a major role as a platform for construction of the spliceosome (reviewed by Hirose and Manley 2000). We have observed a relationship between spliceosomal intron density and the presence of a CTD. For intron-rich species like mammals, the efficiency of spliceosome binding to the CTD may be paramount, perhaps forcing strict adherence to the heptad repeat sequence. However, for intron-sparse organisms, this conservation might be relaxed. RPB1s from Trichomonas and Giardia (organisms not known to contain spliceosomal introns; Logsdon 1998) lack CTDs with canonical repeats but instead have serine-proline–rich C-terminal regions, possibly representing degenerate CTDs. Other protists also show possible CTD degeneration (Stiller and Hall 1997; Stiller, Duffield, and Hall 1998). Interestingly, the low intron density in Naegleria (Logsdon 1998) matches its abnormal CTD. Without knowing the location of the eukaryotic root or even a well-resolved eukaryotic phylogeny, we cannot be sure whether this and similar cases in other protists represent degenerate CTDs or early stages of CTD evolution. The apparent absence of a CTD from the Guillardia nm RPB1 contrasts with the presence of 17 spliceosomal introns in its genome (Douglas et al. 2001). If the CTD is indeed missing from the Guillardia nm RPB1, this is very likely the result of loss, because both red and green algae contain either bona fide heptad repeats or clearly degenerate repeats. Whether the absence of a CTD from the Guillardia nm affects the transcription-processing functions, and whether it represents a singular loss event or a general feature of genome diminution and intron loss, are interesting questions now open to investigation. As RPB1 genes are sequenced from a diversity of eukaryotes and as more protist genomes are studied, the relationship between the CTD and the evolution and spread of spliceosomal introns will be clarified.

Eukaryotic Phylogeny
Our analyses of RPB1 phylogeny reveal support, for the first time with this molecule, for some higher-level groupings among major eukaryotic lineages. Although RPB1 does not provide robust resolution between some major eukaryotic groups, the opisthokonts (animals plus fungi), Amoebozoa, and chromalveolates are moderately supported, as they are for other phylogenetic markers (Baldauf and Palmer 1993; Baldauf et al. 2000; Fast et al. 2001). While our paper was in preparation, another RPB1 analysis (Stiller, Riley, and Hall 2001) confirmed the alveolate relationship, providing a new ciliate sequence, and showed glaucophytes as an outgroup to red algae. In the analyses shown here (fig. 3), the cryptomonad nm clearly groups with red algae (Douglas et al. 1991), though it is unclear whether it will group within the strong glaucophyte-red algal clade. Unfortunately, neither the Naegleria nor the Cercomonas RPB1 sequence shows strong affinity for any others in our data set.

Two apparently robust nodes in our 22-taxon phylogeny were those separating the Giardia and then the Trichomonas sequences from the other eukaryotic RPB1 sequences. However, this need not mean they are actually early emerging. In line with previous suggestions and results (Stiller, Duffield, and Hall 1998; Hirt et al. 1999), our various tests indicate that these two sequences are particularly divergent; thus, their placement as early evolving lineages is suspect. The site removal (fig. 1B), autapomorphy-symplesiomorphy ratio, and RASA (fig. 2) analyses confirmed that the Giardia and Trichomonas RPB1 sequences represent long branches within the analysis. The Trypanosoma sequence has also been suggested as a long branch; however, the Leishmania sequence appears to divide this branch and somewhat reduce its effects. Although our LBA analyses neither indicate an alternate placement for Giardia or Trichomonas nor prove that these sequences are not early evolving, they strongly concur with prior suggestions that the deeply diverging position of diplomonads and parabasalids be viewed with caution.

When long-branch taxa are excluded, we see less apparent resolution than in previous reports (Stiller, Duffield, and Hall 1998; Hirt et al. 1999) or in our global analyses (fig. 1). This suggests that long-branch taxa may structure the data set and provide false resolution. It is therefore important to view with caution any conclusions based on RPB1 phylogenies that include long-branch taxa; their presence may obscure other relationships. Our restricted data set has some resolution at the supertaxon level, consistent with data from morphological and other molecular analyses. In particular, the chromalveolates and Amoebozoa are reconstructed with moderate support (fig. 3), as are the opisthokonts; the latter two are notable, given their previous lack of resolution by RPB1 (Stiller, Duffield, and Hall 1998; Hirt et al. 1999), including a seemingly well supported, but contradictory, placement of animals and fungi (Sidow and Thomas 1994). An opisthokont plus Amoebozoa branch is recovered in the optimal topology, consistent with other data (Baldauf et al. 2000), but it is not statistically supported or recovered by other methods. Stiller, Riley, and Hall (2001) have recently provided evidence from RPB1 for the separation of red and green algae; in the analyses done here (including the removal of long-branch taxa) we find no support for this separation. Although we do not recover a monophyletic plant clade (red algae and land plants), there is no significant support for its polyphyly. Indeed, Moreira, Le Guyader, and Philippe (2000) also showed that RPB1 was the sole exception among a variety of genes that otherwise united red and green algae, and that analyses of RPB1 are not strongly inconsistent with this clade.

In its initial formulation, the eukaryotic big bang hypothesis stated that the major eukaryotic groups were formed in an explosive radiation yielding as many as 10 or as few as four fundamentally unresolvable groups (Philippe and Adoutte 1998). In the past few years, a number of these (and other) major eukaryotic groups have been confidently placed together using concatenated data (Baldauf et al. 2000; Moreira, Le Guyader, and Philippe 2000), novel taxon inclusion (Dacks et al. 2001), or alternative protein markers (Hirt et al. 1999; Moreira, Le Guyader, and Philippe 2000; Keeling 2001; Fast et al. 2001). Consequently, we doubt that the large-scale relationships between eukaryotes are fundamentally unresolvable by conventional molecular phylogenetics. Recent incarnations of the eukaryotic big bang hypothesis have focused on the time span of the radiation and less on a fundamental lack of resolution among lineages (Philippe, Germot, and Moreira 2000a). The major eukaryotic supertaxa probably did evolve rapidly, in line with the observation that most single genes have consistent but weak signal. However, that radiation probably left behind a phylogenetic signal that could be unraveled with more data and additional analyses. This means that the eukaryotic big bang and superclade views are not as incompatible as they might first appear. Using several different genes to establish internal relationships may prove more productive and robust than seeking the deepest diverging taxa using single genes only. Given the relative success of RPB1 in placing phylogenetically difficult taxa (Hirt et al. 1999; Stiller, Riley, and Hall 2001) and our demonstration of some larger-scale eukaryotic resolution, building a well-represented RPB1 database may help clarify some of these internal relationships.

Note Added in Proof
Two recent papers have demonstrated that the diverse amoebae Mastigamoeba, Entamoeba, and Dictyostelium form a monophyletic group, Conosa (Arisue, N., T. Hashimoto, J. A. Lee, D. V. Moore, P. Gordon, C. W. Sensen, T. Gaasterland, M. Hasegawa, and M. Muller. 2002. The phylogenetic position of Mastigamoeba balamuthi based on sequences of rDNA and translation elongation factors EF1-α and EF-2. J. Eukaryot. Microbiol. 49:1–10; Bapteste, E., H. Brinkmann, J. A. Lee, D. V. Moore, C. W. Sensen, P. Gordon, L. Durufle, T. Gaasterland, P. Lopez, M. Muller, and H. Philippe. 2002. The analysis of 100 genes supports the grouping of three highly divergent amoebae: Dictyostelium, Entamoeba, and Mastigamoeba. Proc. Natl. Acad. Sci. USA 99:14–19). A third paper using concatenated mitochondrial proteins (Forget, L., J. Ustinova, Z. Wang, V. A. R. Huss, and B. F. Lang. 2002. Hyaloraphidium curvatum: A linear mitochondrial genome, tRNA Editing, and an evolutionary link to lower fungi. Mol. Biol. Evol. 19:310–319) shows, independently of our nuclear gene evidence and that of Baldauf et al. (2000), that Acanthamoeba also is specifically related to Dictyostelium. This extensive evidence for the monophyly of Amoebozoa is not strongly contradicted by our analyses, which do not place Mastigamoeba invertens with the other two amoebae; the RPB1 data set seems sensitive to long-branch effects, and M. invertens acts as a long branch. The position of M. invertens is similarly non-robust on gamma-corrected 18S rRNA trees, where it often does not group with other amoebae (and never with M. balamuthi: TC-S unpublished data). In addition, a spliceosomal intron has been recently discovered in Giardia (Nixon, J. E., A. Wang, H. G. Morrison, A. G. McArthur, M. L. Sogin, B. J. Loftus, and J. Samuelson. 2002. A spliceosomal intron in Giardia lamblia. Proc. Natl. Acad. Sci. USA 99:3701–3705). Thus Giardia must still be capable of splicing despite its abnormal CTD; this is consistent with our suggestion of widespread CTD degeneration in protists.


    Acknowledgements
 
We would like to thank Alastair Simpson, Banoo Malik, Lesley Davis, and Andrew Roger for critical reading of the manuscript and helpful comments. We also thank two anonymous reviewers for their helpful comments. This work was made possible by grants to W.F.D. from the CIHR (Grant MT4467), to T.C.-S. from NSERC (Canada) and NERC (UK) and to J.M.L. from the NIH (GM19656). J.B.D. was supported by a CIHR Doctoral Research Award as well as a Walter C. Sumner scholarship. A.M. was partly supported by a grant from CIAR. T.C.-S. thanks the CIAR Evolutionary Biology Program and NERC for Fellowship support.


    Footnotes
 
Geoffrey McFadden, Reviewing Editor

1 Contributed equally to this paper.

Abbreviations: RPB1, RNA Polymerase II largest subunit; LBA, long-branch attraction; CTD, carboxy-terminal domain of RPB1; nm, nucleomorph.

Keywords: evolution, Naegleria, Cercomonas, Ochromonas, intron, nucleomorph

Address for correspondence and reprints: John M. Logsdon Jr., Department of Biology, Emory University, 1111 Rollins Research Center, 1510 Clifton Road, Atlanta, Georgia 30322. jlogsdon@biology.emory.edu.


    References
 

    Adachi J., M. Hasegawa, 1996. MOLPHY Version 2.3. Programs for molecular phylogenetics based on maximum likelihood. Computer Science Monographs 28.

    Archibald J., T. Cavalier-Smith, U. Maier, S. Douglas, 2001. Molecular chaperones encoded by a reduced nucleus: the cryptomonad nucleomorph. J. Mol. Evol. 52:490-501.

    Baldauf S. L., W. F. Doolittle, 1997. Origin and evolution of the slime molds (Mycetozoa). Proc. Natl. Acad. Sci. USA 94:12007-12012.

    Baldauf S. L., J. D. Palmer, 1993. Animals and fungi are each other's closest relatives: congruent evidence from multiple proteins. Proc. Natl. Acad. Sci. USA 90:11558-11562.

    Baldauf S. L., A. J. Roger, I. Wenk-Siefert, W. F. Doolittle, 2000. A kingdom-level phylogeny of eukaryotes based on combined protein data. Science 290:972-977.

    Cavalier-Smith T., 1987. The origin of fungi and pseudofungi. Pp. 339–353 in A. D. M. Rayner, C. M. Brasier, and D. Moore, eds., Evolutionary biology of the fungi, Vol. 13. Symp. Br. Mycol. Soc. Cambridge University Press, Cambridge.

    Cavalier-Smith T., 1993. Kingdom Protozoa and its 18 phyla. Microbiol. Rev. 57:953-994.

    Cavalier-Smith T., 1998. A revised six-kingdom system of life. Biol. Rev. Camb. Philos. Soc. 73:203-266.

    Cavalier-Smith T., 1999. Principles of protein and lipid targeting in secondary symbiogenesis: euglenoid, dinoflagellate, and sporozoan plastid origins and the eukaryote family tree. J. Eukaryot. Microbiol. 46:347-366.

    Cavalier-Smith T., 2000. Flagellate megaevolution: the basis for eukaryote diversification. Pp. 361–390 in J. R. Green and B. S. C. Leadbeater, eds. The Flagellates. Taylor and Francis, London.

    Cavalier-Smith T., 2002. The phagotrophic origin of eukaryotes and phylogenetic classification of Protozoa. Int. J. Syst. Evol. Microbiol. 52:297-354.

    Cornelissen A. W., R. Evers, J. Kock, 1988. Structure and sequence of genes encoding subunits of eukaryotic RNA polymerases. Oxf. Surv. Eukaryot. Genes 5:91-131.

    Dacks J., A. J. Roger, 1999. The first sexual lineage and the relevance of facultative sex. J. Mol. Evol. 48:779-783.

    Dacks J. B., J. D. Silberman, A. G. Simpson, S. Moriya, T. Kudo, M. Ohkuma, R. J. Redfield, 2001. Oxymonads are closely related to the excavate taxon Trimastix. Mol. Biol. Evol. 18:1034-1044.

    Douglas S., S. Zauner, M. Fraunholz, M. Beaton, S. Penny, L. T. Deng, X. Wu, M. Reith, T. Cavalier-Smith, U. G. Maier, 2001. The highly reduced genome of an enslaved algal nucleus. Nature 410:1091-1096.

    Douglas S. E., C. A. Murphy, D. F. Spencer, M. W. Gray, 1991. Cryptomonad algae are evolutionary chimaeras of two phylogenetically distinct unicellular eukaryotes. Nature 350:148-151.

    Douglas S. E., S. L. Penny, 1999. The plastid genome of the cryptophyte alga, Guillardia theta: complete sequence and conserved synteny groups confirm its common ancestry with red algae. J. Mol. Evol. 48:236-244.

    Edgcomb V. P., A. J. Roger, A. G. Simpson, D. T. Kysela, M. L. Sogin, 2001. Evolutionary relationships among "jakobid" flagellates as indicated by alpha- and beta-tubulin phylogenies. Mol. Biol. Evol. 18:514-522.

    Embley T. M., R. P. Hirt, 1998. Early branching eukaryotes? Curr. Opin. Genet. Dev. 8:624-629.

    Fast N. M., J. C. Kissinger, D. S. Roos, P. J. Keeling, 2001. Nuclear-encoded, plastid-targeted genes suggest a single common origin for apicomplexan and dinoflagellate plastids. Mol. Biol. Evol. 18:418-426.

    Felsenstein J., 1995. PHYLIP (phylogeny inference package). Department of Genetics, University of Washington, Seattle.

    Germot A., H. Philippe, H. Le Guyader, 1997. Evidence for the loss of mitochondria in Microsporidia from a mitochondrial-type HSP70 in Nosema locustae. Mol. Biochem. Parasitol. 87:159-168.

    Hirose Y., J. L. Manley, 2000. RNA polymerase II and the integration of nuclear events. Genes Dev. 14:1415-1429.

    Hirt R. P., J. M. Logsdon Jr., B. Healy, M. W. Dorey, W. F. Doolittle, T. M. Embley, 1999. Microsporidia are related to Fungi: evidence from the largest subunit of RNA polymerase II and other proteins. Proc. Natl. Acad. Sci. USA 96:580-585.

    Keeling P. J., 2001. Foraminifera and cercozoa are related in actin phylogeny: two orphans find a home? Mol. Biol. Evol. 18:1551-1557.

    Keeling P. J., J. A. Deane, C. Hink-Schauer, S. E. Douglas, U. G. Maier, G. I. McFadden, 1999. The secondary endosymbiont of the cryptomonad Guillardia theta contains alpha-, beta-, and gamma-tubulin genes. Mol. Biol. Evol. 16:1308-1313.

    Keeling P. J., W. F. Doolittle, 1996. Alpha-tubulin from early-diverging eukaryotic lineages and the evolution of the tubulin family. Mol. Biol. Evol. 13:1297-1305.

    Lam T. Y., L. Chan, P. Yip, C. H. Siu, 1992. The largest subunit of RNA polymerase II in Dictyostelium: conservation of the unique tail domain and gene expression. Biochem. Cell. Biol. 70:792-799.

    Lichtenstein C. P., J. Draper, 1985. Genetic engineering in plants. Pp. 102–103 in D. M. Glover, ed. DNA cloning: a practical approach. IRL Press, Oxford.

    Logsdon J. M. Jr., 1998. The recent origins of spliceosomal introns revisited. Curr. Opin. Genet. Dev. 8:637-648.

    Lyons-Weiler J., G. A. Hoelzer, 1999. Null model selection, compositional bias, character state bias, and the limits of phylogenetic information. Mol. Biol. Evol. 16:1400-1406.

    Lyons-Weiler J., G. A. Hoelzer, R. J. Tausch, 1996. Relative apparent synapomorphy analysis (RASA). I: the statistical measurement of phylogenetic signal. Mol. Biol. Evol. 13:749-757.

    Maddison D. R., W. P. Maddison, 2000. MacClade 4; analysis of phylogeny and character evolution. Sinauer Associates, Sunderland, Mass.

    Minakhin L., S. Bhagat, A. Brunning, E. A. Campbell, S. A. Darst, R. H. Ebright, K. Severinov, 2001. Bacterial RNA polymerase subunit omega and eukaryotic RNA polymerase subunit RPB6 are sequence, structural, and functional homologs and promote RNA polymerase assembly. Proc. Natl. Acad. Sci. USA 98:892-897.

    Moreira D., H. Le Guyader, H. Philippe, 2000. The origin of red algae and the evolution of chloroplasts. Nature 405:69-72.

    Philippe H., A. Adoutte, 1998. The molecular phylogeny of Eukaryota: solid facts and uncertainties. Pp. 25–56 in G. Coombs, K. Vickerman, M. Sleigh, and A. Warren, eds. Evolutionary relationships among Protozoa. Chapman & Hall, London.

    Philippe H., A. Germot, D. Moreira, 2000a. The new phylogeny of eukaryotes. Curr. Opin. Genet. Dev. 10:596-601.

    Philippe H., P. Lopez, H. Brinkmann, K. Budin, A. Germot, J. Laurent, D. Moreira, M. Muller, H. Le Guyader, 2000b. Early-branching or fast-evolving eukaryotes? An answer based on slowly evolving positions. Proc. R. Soc. Lond. B. Biol. Sci. 267:1213-1221.

    Quon D. V., M. G. Delgadillo, P. J. Johnson, 1996. Transcription in the early diverging eukaryote Trichomonas vaginalis: an unusual RNA polymerase II and alpha-amanitin–resistant transcription of protein-coding genes. J. Mol. Evol. 43:253-262.

    Sidow A., W. K. Thomas, 1994. A molecular evolutionary framework for eukaryotic model organisms. Curr. Biol. 4:596-603.

    Silberman J. D., C. G. Clark, L. S. Diamond, M. L. Sogin, 1999. Phylogeny of the genera Entamoeba and Endolimax as deduced from small-subunit ribosomal RNA sequences. Mol. Biol. Evol. 16:1740-1751.

    Simpson A. G. B., D. J. Patterson, 1999. The ultrastructure of Carpediemonas membranifera (Eukaryota), with reference to the "excavate hypothesis." Eur. J. Protistol. 35:353-370.

    Sogin M. L., 1991. Early evolution and the origin of eukaryotes. Curr. Opin. Gen. Dev. 1:457-463.

    Stiller J. W., E. C. Duffield, B. D. Hall, 1998. Amitochondriate amoebae and the evolution of DNA-dependent RNA polymerase II. Proc. Natl. Acad. Sci. USA 95:11769-11774.

    Stiller J. W., B. D. Hall, 1997. The origin of red algae: implications for plastid evolution. Proc. Natl. Acad. Sci. USA 94:4520-4525.

    Stiller J. W., B. D. Hall, 1998. Sequences of the largest subunit of RNA polymerase II from two red algae and their implications for rhodophyte evolution. J. Phycol. 34:857-864.

    Stiller J. W., B. D. Hall, 1999. Long-branch attraction and the rDNA model of early eukaryotic evolution. Mol. Biol. Evol. 16:1270-1279.

    Stiller J. W., B. L. McConaughy, B. D. Hall, 2000. Evolutionary complementation for polymerase II CTD function. Yeast 16:57-64.

    Stiller J. W., J. Riley, B. D. Hall, 2001. Are red algae plants? A critical evaluation of three key molecular data sets. J. Mol. Evol. 52:527-539.

    Strimmer K., A. von Haeseler, 1997. Puzzle. Zoologisches Institut, Universitat Muenchen, Munich.

    Swofford D. L., 1998. PAUP*: phylogenetic analysis using parsimony (* and Other Methods). Sinauer Associates, Sunderland, Mass.

    Thompson J. D., T. J. Gibson, F. Plewniak, F. Jeanmougin, D. G. Higgins, 1997. The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 25:4876-4882.

    Wlassoff W. A., M. Kimura, A. Ishihama, 1999. Functional organization of two large subunits of the fission yeast Schizosaccharomyces pombe RNA polymerase II. Location of the catalytic sites. J. Biol. Chem. 274:5104-5113.

    Wu X., C. B. Wilcox, G. Devasahayam, R. L. Hackett, M. Arevalo-Rodriguez, M. E. Cardenas, J. Heitman, S. D. Hanes, 2000. The Ess1 prolyl isomerase is linked to chromatin remodeling complexes and the general transcription machinery. EMBO J. 19:3727-3738.

Accepted for publication January 15, 2002.