Retroids in Archaea: Phylogeny and Lateral Origins

Joshua S. Rest and David P. Mindell

Department of Ecology and Evolutionary Biology and Museum of Zoology, University of Michigan, Ann Arbor

Correspondence: E-mail: mindell@umich.edu.


    Abstract
Until recently, none of the diverse elements bearing reverse transcriptase (retroids) had been known from Archaea. However, in the recently published genomes of the acetate-utilizing archaeal methanogens, Methanosarcina acetivorans and M. mazei, several open reading frames (ORFs) are annotated as reverse transcriptase (RT). These annotations led us to the characterization of a retron and 13 retrointrons, including three twintrons, clustered at seven loci of the M. acetivorans genome, and four retrointrons at two loci of the M. mazei genome. Based on a phylogeny of the RT ORFs, we infer four lateral gene transfers (LGT) of these retroids from Bacteria to Archaea, as well as retrointron mobility within the archaeal genomes. Our phylogenetic analysis also identifies several novel retrons from GenBank in the bacterial groups Firmicutes, Fusobacteria, Cyanobacteria and β-Proteobacteria, as well as in M. acetivorans. The discovery of retrointrons in Archaea as a consequence of LGT from Bacteria suggests that they did not originate in the progenote and parallels the "mitochondrial seed" theory of the origin of spliceosomes. Extending the known phylogenetic distribution of retroids to Archaea is consistent with the view that they have played a significant role in the evolution of genomes throughout the tree of life.

Key Words: retron • Archaea • group II intron • Methanosarcina • origin of introns • retroelement


    Introduction
Retroids, or retroelements, are transposable genetic elements encoding reverse transcriptase (RT) known from diverse Eukarya and Bacteria. Retroids can serve as agents of gene rearrangement and deletion, DNA repair, telomere extension, and, occasionally, movement of genes between genomes. They are also prone to frequent duplication. RT can be recognized by conserved residues forming seven domains (Doolittle et al. 1989; Xiong and Eickbush 1990), which form the catalytic site and are involved in reverse transcription (Kohlstaedt et al. 1992). These domains are conserved across diverse retroids, although the level of sequence identity between RTs in different retroid classes is low (less than 25%), and the presence of genes other than RT in retroids is variable (McClure 1999). The RT domains are thought to be homologous throughout all groups of retroids and have been used for phylogenetic reconstruction (Xiong and Eickbush 1990).

Retroids have not previously been identified in Archaea. Three groups of retroids are found in Bacteria and organelles: retrointrons, retrons, and retroplasmids. Retrointrons are self-splicing RNAs that can act as retroelements when they encode an RT open reading frame (ORF). The majority of retrointrons identified are group II introns; a twintron is a group II intron with another inserted into it, and group III introns are abbreviated versions of group II introns. Group II introns are found in the mitochondria of plants and Fungi and the chloroplasts of euglenoids and algae, as well as in Bacteria such as Cyanobacteria and γ-Proteobacteria (reviewed in Zimmerly, Hausner, and Wu 2001). Retrointrons have six conserved RNA secondary structure domains, and the RT ORF, when present, is located in domain 4.

Another group of retroids, called retrons, are found in some Bacteria and encode DNA and RNA complexes, referred to as "multicopy single-stranded DNA," or msDNAs, which are synthesized by RT (reviewed in Lampson, Inouye, and Inouye 2001). The function of msDNA remains unknown. In addition to the RT ORF, retrons consist of two loci that are situated in opposite directions: msd encodes the DNA strand of the msDNA, and msr encodes the RNA strand of msDNA. Two sets of inverted repeats direct the proper folding of the RNA to allow it to serve as a primer and template for cDNA synthesis.

A third group of retroids, called retroplasmids, are found in the mitochondria of Fungi and are linear (Walther and Kennell 1999) or circular (e.g., Natvig, May, and Taylor 1984; Pande, Lemire, and Nargang 1989) plasmids that contain and are replicated by RT.

Retroids in Archaea and the Origin of Introns
Assessing the presence of retroids in Archaea, particularly retrointrons, is relevant to several questions concerning early evolution, and one such question is the origin of the spliceosome and its associated introns in eukaryotes. Although it has been proposed that spliceosomal introns were part of the prototypical gene and that introns were "early" (e.g., Gilbert, Marchionni, and McKnight 1986), much recent evidence supports the later advent of spliceosomal introns in eukaryotes (e.g., Logsdon 1998). Similarly, because retrointrons are present in many extant Bacteria, mitochondria, and chloroplasts but not present in Archaea or in the nuclear genome of eukaryotes, it is thought that they originated not in the progenote, but later in Bacteria. Similarities in structure and function suggest that these retrointrons may be related to the spliceosome. According to the "mitochondrial seed" hypothesis, retrointrons, which originated in Bacteria, were introduced via the mitochondrial endosymbiont into the eukaryotic nuclear genome, where the five domains of the retrointron catalytic core structure split into pieces and evolved into the five small nuclear RNAs (snRNAs) that form the spliceosome (Michel and Lang 1985; Sharp 1985; Roger and Doolittle 1993; Sontheimer and Steitz 1993; Eickbush 1999; Simpson, MacQuarrie, and Roger 2002). If retrointrons found in Archaea are sister to retrointrons in Bacteria, this would be consistent with the "introns early" hypothesis, which suggests that introns were present in the progenote, their common ancestor. However, if retrointrons were transferred more recently to Archaea and are paraphyletic in analyses with bacterial retrointrons, this would be consistent with the hypothesized late advent of introns in Bacteria and subsequent transfer via the mitochondrial endosymbiont (seed) to eukaryotes.

Like the snRNAs of the spliceosome, the self-splicing RNAs of retrointrons display many of the properties of hypothesized catalytic RNAs in the RNA world (Madhani and Guthrie 1994; Jeffares, Poole, and Penny 1998). A homologous relationship has been proposed between these extant RNAs and catalytic RNAs from the RNA world. Support for such homology is contingent upon the hypothesized ancestral form of retrointrons. Retrointrons have been observed in two forms: as RNA catalytic domains only and as RNA catalytic domains along with an RT ORF (Zimmerly et al. 1995). If the similarities between the hypothetical ribozymes of the RNA world and the catalytic RNAs of retrointrons are homologous rather than convergent, then retrointrons without RT should be older than retrointrons with RT (Lambowitz and Belfort 1993; Wank et al. 1999). Alternatively, if retrointrons with RT are older, it would suggest that a retroelement independently gained catalytic RNA activity (Curcio and Belfort 1996; Zimmerly, Hausner, and Wu 2001). It is therefore reasonable to ask whether there are retrointrons in Archaea and, if so, whether they have RT ORFs.

The apparent lack of retroids in Archaea, given the abundance and variety of forms of retroids in Bacteria and organelles, has been an enigma. However, in the recently published genomes of the acetate-utilizing archaeal methanogens Methanosarcina acetivorans (Galagan et al. 2002) and M. mazei (Deppenmeier et al. 2002), several ORFs are annotated as RT. These annotations led us to the characterization of 14 retroids clustered at seven loci of the M. acetivorans genome and four retroids at two loci of the M. mazei genome, almost all of which are retrointrons. These first retroids to be identified and examined in Archaea indicate an intriguing record of duplication and horizontal transmission.


    Materials and Methods
Identification of RT ORFs
"Reverse transcriptase" is referred to 14 times in the M. acetivorans genome (NC_003552; Galagan et al. 2002), at six regions, or loci (loci 1 to 6; fig. 1 and table 1) and three times in the M. mazei genome (NC_003901; Deppenmeier et al. 2002) at one locus (locus 8; fig. 1 and table 2). Using a Blast search, we identified an additional retron RT ORF at an additional locus in M. acetivorans (locus 7; fig. 1 and table 1) and a retrointron RT ORF in M. mazei at an additional locus (locus 9; fig. 1 and table 2). We used Vector NTI Suite 8 (Informax Inc., Bethesda, Md) to characterize these nine loci by looking for additional RT ORFs (> 50 amino acids), as well as by searching for single-base pair frameshifts or stop codons that truncate the RT reading frames. Blast searches of intergenic regions against the genome and GenBank were used to identify genomic repeats and retrointron RNA domains between RT ORFs.



FIG. 1. Location and orientation of retroids in archaeal genomes. Retrointrons are located at six loci (1 to 6) of the M. acetivorans genome and a retron is located at locus 7 (table 1). Retrointrons are located at two loci (8 and 9) of the M. mazei genome (table 2). Retrointron RT ORF clades were determined by the phylogeny in figure 2.

 

Table 1 Reverse Transcriptase ORFs in Methanosarcina acetivorans.

 

Table 2 Reverse Transcriptase ORFs in Methanosarcina mazei.

 
Phylogenetic Analysis
Profile hidden Markov models (HMMs) were used to characterize, search for, and align RT ORFs from related retroids. HMMs appear effective in detecting conserved patterns in multiple sequences (see Eddy 1996; Karchin and Hughey 1998). When provided with a set of training sequences that contain members of a protein family, the resulting HMM can (1) identify the positions of amino acids that describe conserved first-order structure of the family; (2) generate a multiple alignment of unaligned sequences that reveals these conserved regions; and (3) discriminate between family and nonfamily members in a sequence database search. These three properties are advantages over Blast or FASTA searches when producing an alignment of conserved protein families. HMMER 2.2g (Eddy 1998) was used for all HMM analyses, together with the PFAM HMM for RT, called RVT (Bateman et al. 2002), or our own HMM, ARCH-RVT (see below). The RVT HMM is a generalized model for diverse RTs and contains the seven core subdomains (1 to 7) found in all RTs (Xiong and Eickbush 1990) that encompass the palm and finger regions in the crystal structure of HIV-RT (Kohlstaedt et al. 1992).

To estimate phylogeny, we used a Bayesian inference (BI) approach (Yang and Rannala 1997; Mau, Newton, and Larget 1999) with Metropolis-coupled Markov chain Monte Carlo, or (MC)³, to approximate the posterior probabilities (PP) of the trees in MrBayes 3.0alpha (Huelsenbeck and Ronquist 2001). Bayesian inference has advantages over other methods of phylogenetic inference in interpretation of results, consistency (Wilcox et al. 2002) and computational speed (Larget and Simon 1999). Although some simulations have demonstrated artifactually high PP support values, they also indicate that the reliability of the results depends on appropriateness of the model (e.g., Suzuki, Glazko, and Nei 2002). To address this, we used a substitution model known to outperform other models on RT amino acid sequences, rtREV (Dimmic et al. 2002), and estimated the α parameter for the γ distribution of rates. The search was run twice, starting from random trees with four simultaneous Markov chains and sampling every 100 generations. The proportion of sampled trees in which any given node (set of relationships) is found during the chain is an approximation of its posterior probability and provides an indication of support for that node based on the data set. For comparison to BI, we also used maximum-parsimony (MP) and neighbor-joining (NJ) distance analyses in PAUP* (Swofford 2002) with 1,000 bootstrapped data sets, summarized as a 50% majority-rule tree.
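The way node support is read off a Markov chain can be illustrated with a minimal sketch. It assumes each sampled tree has already been reduced to the set of clades it contains; the function names and the toy taxa in the usage note are hypothetical and are not part of MrBayes.

```python
from collections import Counter


def clade_posteriors(sampled_trees, burn_in=0):
    """Approximate clade support as the fraction of retained samples containing each clade.

    sampled_trees: list of trees, each given as an iterable of clades,
    where a clade is an iterable of taxon names (an assumed representation).
    burn_in: number of leading samples to discard.
    """
    kept = sampled_trees[burn_in:]
    if not kept:
        raise ValueError("no samples remain after burn-in")
    counts = Counter()
    for tree in kept:
        # Build a set first so a clade is counted at most once per tree.
        counts.update({frozenset(clade) for clade in tree})
    return {clade: n / len(kept) for clade, n in counts.items()}


def majority_rule(posteriors, threshold=0.5):
    """Keep only clades whose support is at least the threshold."""
    return {clade: pp for clade, pp in posteriors.items() if pp >= threshold}
```

For example, if the clade (A, B) appears in two of four retained samples, `clade_posteriors` reports 0.5 for it, and `majority_rule` keeps it at the default threshold.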


    Results
The retrointron (group II intron) RT ORFs are contained within six regions, or loci, of M. acetivorans (loci 1 to 6 in fig. 1) and two loci of M. mazei (loci 8 and 9 in fig. 1). Multiple RT ORFs are evident at some loci. By lowering the minimum size of predicted ORFs at these loci from 200 to 50 amino acids, we identified two additional retrointron RT ORFs in M. acetivorans that had not been annotated in the genome release (MA2799 and MA2799B in table 1 and fig. 1). We also extended two ORFs that had single-base pair deletions (MA4627 and MA4182 in table 1) and one with a stop codon (MA4187 in table 1) for purposes of comparative analysis.
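The effect of the minimum ORF length can be sketched with a simple six-frame scan. This is an illustration only, not the Vector NTI procedure: it assumes stop-to-stop ORFs under the standard genetic code, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons; unknown codons (e.g. containing N) become X."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def find_orfs(dna, min_aa=50):
    """Return (strand, frame, protein) for stop-to-stop ORFs of at least min_aa residues."""
    dna = dna.upper()
    hits = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            # Splitting on '*' yields the stretches between stop codons.
            for segment in translate(seq[frame:]).split("*"):
                if len(segment) >= min_aa:
                    hits.append((strand, frame, segment))
    return hits
```

Calling `find_orfs(sequence, min_aa=200)` and then `find_orfs(sequence, min_aa=50)` on the same locus shows which shorter reading frames only become visible at the lower cutoff.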

The 21 RT ORFs identified in M. acetivorans and M. mazei all significantly fit the RVT HMM, and we used RVT to align them. From the resulting alignment, it was apparent that at three loci in M. acetivorans (3, 4, and 5) an RT ORF had been split in two by the addition of another RT ORF. This is similar to the arrangement of chloroplast "twintrons," which are retrointrons within retrointrons (Copertino and Hallick 1991). For further comparative analysis, the split ORFs of the external introns were concatenated: MA4627 and MA4625 at locus 3, MA2799B and MA2796 at locus 4, and MA4184 and MA4182 at locus 5 (fig. 1). The RT subdomains of MA4183 and MA2797 are identical, so only a single sequence was used to represent them. Additionally, the HMM identified an RT ORF in M. acetivorans, MA2102 at locus 7, with similarity to retrons. When the resulting 17 RT ORFs (16 retrointrons + 1 retron) are aligned with RVT, the alignment contains 239 out of 260 amino acids in the RVT HMM.

According to the HMM alignment, the retron RT and seven of the 16 retrointron ORFs have all seven subdomains (tables 1 and 2). Eight RTs contain two to four subdomains, and one RT, MA2799, has only a single subdomain. The ORF of retrointrons also contains domains 0, X, and Zn, the presence of which is variable in the ORFs discussed here (Dai and Zimmerly 2003).

Loci 1, 2, and 6 in M. acetivorans each contain a single retrointron, as identified by the presence of an RT ORF, flanked by a transposase domain. Loci 3, 4, and 5 each contain a twintron and an additional retrointron downstream, with respect to the direction of replication, and locus 5 has an additional downstream retrointron. The retron at locus 7 is flanked upstream by a hypothetical protein and downstream by a type I site-specific deoxyribonuclease. In M. mazei, locus 8 contains three retrointrons separated by transposase domains in various orientations, and locus 9 contains a single retrointron. Arrows at the end of each locus in figure 1 indicate the replication direction. The retrointrons at four of the six loci of M. acetivorans and one of two loci of M. mazei are oriented opposite the direction of replication. The retron at locus 7 (MA2102) of M. acetivorans is also oriented opposite the direction of replication.

For each of the nine loci that contain RT ORFs in the two genomes, Blast searches were used to characterize intergenic regions. Short stretches (~50 to 100 bp) of unique nucleotide sequences that are repeated three to 29 times elsewhere in the genome were identified (fig. 1). We also used a Blast search to identify putative retrointron RNA domains on each side of the retrointron ORFs. Although we identify many of the group II RNA domains using this method, Dai and Zimmerly (2003) provide a more detailed analysis of these domains, and we have annotated figure 1 accordingly.

Phylogenetic Analysis
We used the 239 amino acid alignment (based on RVT, described above) of the 16 retrointron and one retron RT ORFs to build an archaeal-specific HMM, called ARCH-RVT. ARCH-RVT was used to search all nonredundant GenBank CDS translations+PDB+SwissProt+PIR (November 1, 2002) and returned a set of 335 ORFs (excluding ORFs from M. acetivorans and M. mazei) with bit scores of 17 or greater, suggesting that they are members of the same family as the archaeal RTs, and with an E value (expectation value) of 0.05 or less, which indicates that the bit score is better than expected from random sequence. We aligned the resulting sequences with RVT, clustered them into 70% identity groups using BlastCLUST, retained only a single representative from each group, and added in the 17 archaeal sequences (16 retrointrons and one retron), resulting in a final alignment 260 amino acids long with 177 unique RT sequences.
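The filtering and thinning steps described above can be sketched as follows. The thresholds (bit score of 17 or greater, E value of 0.05 or less) come from the text; the `Hit` record, the function names, and the choice of the highest-scoring member as each cluster's representative are assumptions made for illustration, since the text does not say how the representative was chosen.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One database hit from an HMM search (hypothetical record layout)."""
    name: str
    bit_score: float
    e_value: float


def passes_thresholds(hit, min_bits=17.0, max_e=0.05):
    """Apply the bit-score and E-value cutoffs stated in the text."""
    return hit.bit_score >= min_bits and hit.e_value <= max_e


def select_representatives(hits, clusters):
    """Keep one passing hit per identity cluster.

    clusters: list of lists of hit names, e.g. 70% identity groups.
    Assumption: the representative is the member with the highest bit score.
    """
    kept = {h.name: h for h in hits if passes_thresholds(h)}
    representatives = []
    for cluster in clusters:
        members = [kept[name] for name in cluster if name in kept]
        if members:
            representatives.append(max(members, key=lambda h: h.bit_score).name)
    return representatives
```

A cluster whose members all fail the cutoffs contributes no representative, so the number of sequences retained can be smaller than the number of clusters.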

BI analyses for this alignment resulted in the topology shown in figure 2. A burn-in period of 1.2 × 10⁶ generations was necessary for the chains to reach stationarity, and the chains were run for an additional 1 × 10⁶ generations to sample the posterior probability landscape. The phylogeny identifies the major monophyletic lineages of RT related to retrons and retrointrons, including 64 non-LTR retrotransposons (retroposons), three retroplasmids, 22 retrons, and 88 retrointrons. Retroposons provide an appropriate outgroup to retroplasmids, retrons and retrointrons for phylogenetic analysis and were used to root the tree in figure 2 (Nakamura et al. 1997). In the overall phylogeny the major clades of retroids (retroplasmids, retrons, retroposons, and retrointrons) are each monophyletic. Retrointrons and retrons are sister clades and retroplasmids are sister to them.
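As a rough check on the size of the sample behind these posterior probabilities, the arithmetic can be sketched as below. It reads the burn-in as 1.2 × 10^6 generations and the sampling phase as 1 × 10^6 generations, uses the sampling interval of 100 generations given in the Methods, and assumes the counts are per run and ignore any sample taken at generation zero. The function name is hypothetical.

```python
def sample_counts(burn_in_generations, post_burn_in_generations, sample_every=100):
    """Return (discarded, retained) numbers of sampled trees for one run."""
    if sample_every <= 0:
        raise ValueError("sample_every must be positive")
    discarded = burn_in_generations // sample_every
    retained = post_burn_in_generations // sample_every
    return discarded, retained


# Values taken from the text; per-run counts under the assumptions above.
discarded, retained = sample_counts(1_200_000, 1_000_000)
```

Under these assumptions each run discards 12,000 sampled trees and retains 10,000.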



FIG. 2. Bayesian inference (BI) phylogeny of archaeal retroids and retroids that fit the ARCH-RVT HMM. Three inferred horizontal transmission events from Bacteria to Archaea are indicated by bold arrows. Retrointron clades follow nomenclature of Zimmerly, Hausner, and Wu (2001), except the newly designated archaeal clades, α, β and δ, which correspond to figure 1. ORFs from M. acetivorans are preceded by MA (table 1), and ORFs from M. mazei are preceded by MM (table 2). Other ORFs are identified by genus, species, and gene name and by protein GenInfo Identifier (GI). BI posterior probabilities (PP) are shown above each node; nodes with less than 50% PP support are collapsed. PP values are shown in bold when more than 50% of NJ distance bootstrap replicates also support the node and/or underlined when more than 50% of parsimony bootstrap replicates also support the node. See text for details of the analyses. The topology for the retron clade is shown in figure 3, and the topologies for the retroposon and retroplasmid clades are not shown. Retroplasmid GIs are 5052324, 11359715, 2226080. Retroposon GIs are 21296871, 21301295, 21296599, 21297850, 21296834, 21296602, 11359829, 21301225, 21296835, 6576738, 2636680, 17544114, 17568437, 17554978, 17508371, 17506149, 17509887, 4378025, 21104021, 9366563, 21294828, 21293795, 21292767, 159574, 7327281, 85020, 283590, 21291582, 21297143, 21291690, 18157526, 7496780, 12862434, 7493965, 21296089, 21293793, 21300974, 7511782, 7511783, 103359, 7511795, 2708259, 21740636, 22208512, 22830600, 8570049, 22755662, 2293759, 8843742, 15234647, 22655797, 5410432, 5410438, 14286189, 7264294, 112261, 130551, 7511758, 422524, 7522140, 7494647, 7539028, 9631308, 134087.

 
The archaeal retrointron RTs arise three times as crown groups in the tree (see arrows in fig. 2). One group, which we have named archaeal group δ, arises in the crown of the bacterial group D retrointrons. The clade consists of the internal introns in the three M. acetivorans twintrons, found in loci 3, 4, and 5. Archaeal group δ is sister to group II intron intB from the bacterium Escherichia coli (Ferat, Le Gouar, and Michel 1994).

A second group of 13 retrointrons arises in the crown of the algal chloroplast–like group. It consists of two clades, which we named archaeal groups α and β. Archaeal group α consists of retrointrons from both M. mazei and M. acetivorans. In this clade, the external ORFs of the twintrons (loci 3, 4, and 5 in M. acetivorans) are basal to the two retrointrons found directly downstream of the twintrons (loci 4 and 5 in M. acetivorans) or alone (locus 6) and basal to three retrointrons from M. mazei (locus 8). The archaeal α/β clade is sister to two bacterial full-length retrointrons, one from E. coli plasmid pO157 (ORF L0272) and another from P. putida (ORF494), which is at the locus of a transposon (Dai and Zimmerly 2002).

A third archaeal retrointron, MM3360, located alone at locus 9 of M. mazei, is in a group sister to the algal chloroplast–like group. MM3360 is sister to a retrointron fragment M.t.F1 from the bacterium Mycobacterium tuberculosis (Dai and Zimmerly 2002). M.t.F1 contains RT subdomains 1 to 3, whereas MM3360 contains subdomains 6 to 7. Interestingly, directly upstream of the M.t.F1 fragment is a microsatellite repeat. However, no such repeat is detectable surrounding MM3360.

The monophyly of retrointrons is supported by 84% PP, and the bacterial group C is supported as the basal retrointron group with 77% PP. Other major groups of retrointrons represented in figure 2 include bacterial group B, the fungal mitochondrial group, and potentially bacterial group A, represented by a single E. coli ORF (2443214). The basal position of the bacterial groups B and C is in agreement with previous analyses (Zimmerly, Hausner, and Wu 2001). No retrointrons from liverworts or plants in the mitochondrial group or euglenoids in the algal chloroplast–like group are present in the phylogeny.

The clade comprising the 22 retrons is shown in figure 3. This is a larger sample of retrons than has been previously examined (Lampson, Inouye, and Inouye 2001). The phylogeny identifies several new putative retrons and significantly extends the taxonomic distribution of retrons in Bacteria. Previously, retrons have only been identified in the δ-Proteobacteria and γ-Proteobacteria. This analysis reveals retrons present in Cyanobacteria, Fusobacteria, Firmicutes, and β-Proteobacteria, as well as in Archaea. The archaeal retron, MA2102, is sister to the Firmicutes retron, Staphylococcus aureus Sav2209.



FIG. 3. Retron clade from BI phylogeny of RT ORFs presented in figure 2. An inferred horizontal transmission event from Bacteria to Archaea is indicated by a bold arrow. MA2102 is the retron RT ORF from M. acetivorans (table 1). To the right of the topology, the relevant taxonomic group of each retron is indicated. Families in which retrons have not been previously identified in the literature are indicated by an asterisk (*). Bacterial retron RT ORFs are identified by genus, species, and gene name and by protein GenInfo Identifier (GI). BI posterior probabilities (PP) are shown above each node; nodes with less than 50% PP support are collapsed. PP values are shown in bold when NJ distance bootstrap also supports the node and/or underlined when parsimony bootstrap also supports the node. See text and figure 2 for details of the analyses.

 
We were not able to detect the inverted repeats that define the termini of the msr and msd loci in the retron and that would allow us to propose a secondary structure. The GC content of the M. acetivorans retron is 31%. For comparison, the GC content of the retron from S. aureus is 27% (genome 33%) and the GC content of the M. acetivorans genome is 43%.
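The GC comparison is a simple base count, sketched here for concreteness. The function name is hypothetical, and the choice to ignore ambiguous bases such as N is an assumption rather than something stated in the text.

```python
def gc_content(seq):
    """Return the fraction of G and C among unambiguous bases (A, C, G, T)."""
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    return sum(base in "GC" for base in counted) / len(counted)


def gc_difference(element_seq, genome_seq):
    """Difference in GC fraction between an element and its host genome."""
    return gc_content(element_seq) - gc_content(genome_seq)
```

With the percentages reported above, the M. acetivorans retron sits 12 percentage points below its host genome (31% versus 43%), whereas the S. aureus retron sits 6 points below its own genome (27% versus 33%).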


    Discussion
Although many aspects of the structure and function of archaeal retroids remain to be analyzed, these findings significantly extend the known phylogenetic distribution of retroids to include all three of life's primary lineages: Eukarya, Bacteria, and Archaea. Retroids are clustered into seven loci in the M. acetivorans genome and two loci of the M. mazei genome (fig. 1). Each locus contains one to four RT ORFs (when the external introns of twintrons are concatenated), with a varying complement of retrointron RNA domains, transposase, and integrase ORFs and genomic DNA repeats.

The retroids in seven of the nine loci from M. acetivorans and M. mazei described here are oriented opposite the direction of replication in their respective genomes. Typically, the majority of genes in Archaea are transcribed in the same direction as DNA replication; however, in several archaeal genomes, including Pyrococcus horikoshii, P. abyssi, and P. furiosus, around half of the genes are transcribed in the opposite direction of replication (Zivanovic et al. 2002). Although colinearity of transcription and replication might not be under as heavy a constraint in M. mazei and M. acetivorans as it is in other prokaryotes such as E. coli and Bacillus subtilis, selection generally appears to favor colinearity for highly expressed genes in Archaea as in Bacteria.

The phylogeny diagnoses the relationship between major groups of retroids: retrointrons, retrons, retroplasmids, and retroposons (fig. 2). Here, retrointrons and retrons are sister groups, whereas Nakamura et al. (1997), using a smaller data set and Neighbor-Joining, found retroplasmids and retrons as sister groups.

The phylogeny indicates common ancestry of the archaeal retroids with retroids from Bacteria, and four lateral transfers from Bacteria to Archaea (figs. 2 and 3). In light of the fundamentally similar genome organization of Bacteria and Archaea (Baumann, Qureshi, and Jackson 1995), it is not surprising that retroids found in Archaea are related to those of Bacteria and organelles rather than eukaryotes. The difference in GC content between the retron MA2102 and the genome of M. acetivorans is consistent with LGT, and the presence of retrointrons at several different loci in M. acetivorans and two different loci in M. mazei, rather than at a single locus (Deppenmeier et al. 2002), suggests multiple LGTs and/or mobility of the retrointrons within Archaea. Zimmerly, Hausner, and Wu (2001) inferred horizontal transfer of retrointrons within and between bacterial genomes and the organellar genomes of Fungi and algae. A large amount of LGT from Bacteria was generally observed in the M. acetivorans and M. mazei genomes (Deppenmeier et al. 2002; Galagan et al. 2002). For example, in M. mazei, 30% of ORFs have their most significant Blast match in Bacteria, with about 16% having significant matches only in Bacteria, including 56 of the 102 transposases (Deppenmeier et al. 2002). Rates of apparent LGT from Bacteria to M. acetivorans are similarly high (Galagan et al. 2002) and have been observed elsewhere in Archaea (Nesbo et al. 2001). LGT can be attributed to proximity of local populations of Archaea and Bacteria (Deppenmeier et al. 2002).

The presence of a retron (MA2102) in M. acetivorans, as well as the first observation of retrons in the bacterial groups Firmicutes, Fusobacteria, Cyanobacteria, and β-Proteobacteria (fig. 3), adds to the mystery concerning the function of msDNAs. Retrons encode DNA and RNA complexes, referred to as "multicopy single-stranded DNA," or msDNAs, found in some Bacteria and synthesized by RT (reviewed in Lampson, Inouye, and Inouye 2001). However, the function of msDNA is unknown.

Origins of Retrointrons in Methanosarcina acetivorans
Of particular interest are the 13 retrointrons, including several twintrons, in the archaeal α/β clade. As in all other known twintrons, the internal intron of the M. acetivorans twintrons disrupts the functional domain of the external intron. The excision of all the internal introns, prior to that of the external, is essential for complete twintron excision (Copertino and Hallick 1991). The insertion of a mobile intron into another already fixed intron is the most likely method of twintron formation.

We propose that three major events, suggested by the phylogeny (fig. 2) and relative chromosomal location (fig. 1), led to the current diversity and distribution of this clade in Archaea:

(1) Group α diverged from group β upon transfer into M. acetivorans from Bacteria. (2) Transfer of group δ from Bacteria to M. acetivorans, followed by the insertion of a group δ retrointron into the ancestral α (perhaps at locus 4), formed a twintron. Subsequently, this twintron retrotransposed, or was otherwise duplicated, to loci 3 and 5. It is also possible that the insertion of δ into α occurred three times (at loci 3, 4, and 5) or that LGT of the twintron from Bacteria to M. acetivorans occurred three times, although we consider these possibilities to be unlikely. Two single α elements and two β elements are found downstream of the twintrons, and while the origin and timing of these insertions are not clear, it is possible that they are the result of cis retrotransposition of the twintron after excision of the internal intron. (3) The crown position of the M. mazei α retrointrons with respect to the M. acetivorans retrointrons suggests possible LGT from M. acetivorans to M. mazei.

Interestingly, six of the eight loci in M. mazei and M. acetivorans that have retrointrons also contain transposases. It has been suggested that the abundance of bacterial and archaeal transposases in the Methanosarcina genomes is an indication of their extensive utilization of LGT to promote genetic diversity (Deppenmeier et al. 2002).

Potential Impact of Retroids on Archaeal Genomes
The Methanosarcineae are the most metabolically versatile methanogens known, and although vertical evolution and LGT themselves are probably the major driving forces, the presence of retroids in Methanosarcina is also correlated with this metabolic expansion. Retrointrons can act as retroelements, particularly in Bacteria (Dai and Zimmerly 2002), thereby contributing to genome expansion and rearrangement (Sellem, Lecellier, and Belcour 1993). The number and duplicated nature of retrointrons found in M. acetivorans, together with its relatively large genome size, suggest they may have contributed to the "strikingly wide and unanticipated variety of metabolic and cellular capabilities" observed in the M. acetivorans genome (Galagan et al. 2002).

Only two of the 16 (12.5%) completely sequenced archaeal genomes contain retroids, M. mazei (4 RTs, ~4 Mb) and M. acetivorans (14 RTs, ~6 Mb) (as of November 1, 2002). One previous study (Ben-Mahrez et al. 1991) suggested possible biochemical RT activity in the halophile Halobacterium halobium; however, the complete sequencing of the H. halobium genome failed to confirm this. In comparison, 24/85 (28%) of completely sequenced whole-bacterial chromosomes and 23/64 (36%) of completely sequenced unique whole-bacterial genomes contain RT (unpublished data). Some studies have suggested a correlation between retroid abundance and genome expansion (e.g., Elsik and Williams 2000; Shirasu et al. 2000), and it is perhaps no coincidence that M. acetivorans and M. mazei are the only two archaeal genomes sequenced that are greater than 3.5 Mb. Although the evidence is suggestive, there is no clear correlation in Bacteria between RT abundance and genome size (unpublished data).

Implications for Hypotheses of the Origin of Catalytic RNAs and Introns
Whereas some retrointrons in M. acetivorans do not contain RT, many do contain an RT ORF (this study; Dai and Zimmerly 2003). The widespread distribution of retrointrons with RT is consistent with the retroelement ancestor hypothesis, which predicts that retrointrons are descendants of retroelements that gained RNA catalytic function, and supports the idea that the catalytic RNA activity of the spliceosome and retrointrons is independently derived (convergent) rather than homologous with ribozymes from the RNA world (Lambowitz and Belfort 1993; Wank et al. 1999; Toor, Hausner, and Zimmerly 2001).

The retrointrons observed in Archaea, given current sampling and the phylogeny in figure 1, are due to LGT. The sample of Archaea, thus far, is small, and sequencing of additional archaeal diversity is necessary. However, the origin of archaeal retrointrons by LGT suggests that introns were either lost from Archaea at an early point in time or, more likely, that their origin in Bacteria came about after the existence of the progenote. This distribution is also consistent with the retroelement ancestor hypothesis, described above. LGT between Bacteria and Archaea is consistent with the substantial horizontal transfer of retrointrons that has been observed within Bacteria (Zimmerly, Hausner, and Wu 2001; Dai and Zimmerly 2002). The three LGTs of retrointrons from Bacteria to Archaea inferred in this study are remarkably parallel to the hypothesized introduction of the spliceosome and its associated introns into eukaryotes by retrointrons, perhaps seeded by the mitochondrial endosymbiont (Michel and Lang 1985; Sharp 1985; Roger and Doolittle 1993; Sontheimer and Steitz 1993; Eickbush 1999; Simpson, MacQuarrie, and Roger 2002). The similarity in structure and function between the five domains of retrointrons and the five snRNAs of the spliceosome led to the suggestion of homology between these two types of introns and the hypothesis of a mitochondrial seed. Although previous studies have inferred LGT of introns among and between bacterial and organellar genomes (e.g., Zimmerly, Hausner, and Wu 2001), we are aware of only one other event of intron LGT that crosses the major divisions of life (under the paradigm that organelles are part of the Bacterial division). Kudla et al. (2002) inferred that, in the monocot Washingtonia robusta, a retrointron sequence was transferred from the mitochondrial to the nuclear genome and that part of the retrointron was used to build a spliceosomal intron in the alcohol dehydrogenase gene.
We consider this transfer from the mitochondrial to nuclear genome and the LGT from Bacteria to M. acetivorans and M. mazei as evidence supporting the viability of the mitochondrial seed hypothesis for the origin of the spliceosome.


    Acknowledgements
 
We thank J. Moran, M. Dimmic, and S. Stuart for helpful comments. This research was supported with funding from NIH Grant T32-HG00040 to J.S.R. and NSF Grant DBI-9974525 to D.P.M.


    Footnotes
 
Keith Crandall, Associate Editor


    Literature Cited
 

    Bateman, A., E. Birney, L. Cerruti, R. Durbin, L. Etwiller, S. R. Eddy, S. Griffiths-Jones, K. L. Howe, M. Marshall, and E. L. Sonnhammer. 2002. The Pfam protein families database. Nucleic Acids Res. 30:276-280.

    Baumann, P., S. A. Qureshi, and S. P. Jackson. 1995. Transcription: new insights from studies on Archaea. Trends Genet. 11:279-283.

    Ben-Mahrez, K., I. Sorokine, M. Nakayama, and M. Kohiyama. 1991. Reverse transcriptase in archaebacteria: purification and characterization of a primase-reverse-transcriptase complex from Halobacterium halobium. Eur. J. Biochem. 195:157-162.

    Copertino, D. W., and R. B. Hallick. 1991. Group II twintron: an intron within an intron in a chloroplast cytochrome b-559 gene. EMBO J. 10:433-442.

    Curcio, M. J., and M. Belfort. 1996. Retrohoming: cDNA-mediated mobility of group II introns requires a catalytic RNA. Cell 84:9-12.

    Dai, L., and S. Zimmerly. 2002. Compilation and analysis of group II intron insertions in bacterial genomes: evidence for retroelement behavior. Nucleic Acids Res. 30:1091-1102.

    Dai, L., and S. Zimmerly. 2003. ORF-less and reverse-transcriptase-encoding group II introns in archaebacteria, with a pattern of homing into related group II intron ORFs. RNA 9:14-19.

    Deppenmeier, U., A. Johann, and T. Hartsch, et al. (22 co-authors). 2002. The genome of Methanosarcina mazei: evidence for lateral gene transfer between Bacteria and Archaea. J. Mol. Microbiol. Biotechnol. 4:453-461.

    Dimmic, M. W., J. S. Rest, D. P. Mindell, and R. A. Goldstein. 2002. rtREV: an amino acid substitution matrix for inference of retrovirus and reverse transcriptase phylogeny. J. Mol. Evol. 55:65-73.

    Doolittle, R. F., D. F. Feng, M. S. Johnson, and M. A. McClure. 1989. Origins and evolutionary relationships of retroviruses. Q. Rev. Biol. 64:1-30.

    Eddy, S. R. 1996. Hidden Markov models. Curr. Opin. Struct. Biol. 6:361-365.

    Eddy, S. R. 1998. Profile hidden Markov models. Bioinformatics 14:755-763.

    Eickbush, T. H. 1999. Mobile introns: retrohoming by complete reverse splicing. Curr. Biol. 9:R11-14.

    Elsik, C. G., and C. G. Williams. 2000. Retroelements contribute to the excess low-copy-number DNA in pine. Mol. Gen. Genet. 264:47-55.

    Ferat, J. L., M. Le Gouar, and F. Michel. 1994. Multiple group II self-splicing introns in mobile DNA from Escherichia coli. C. R. Acad. Sci. Serie III Sci. Vie 317:141-148.

    Galagan, J. E., C. Nusbaum, and A. Roy, et al. (55 co-authors). 2002. The genome of M. acetivorans reveals extensive metabolic and physiological diversity. Genome Res. 12:532-542.

    Gilbert, W., M. Marchionni, and G. McKnight. 1986. On the antiquity of introns. Cell 46:151-153.

    Huelsenbeck, J. P., and F. Ronquist. 2001. MrBayes: Bayesian inference of phylogenetic trees. Bioinformatics 17:754-755.

    Jeffares, D. C., A. M. Poole, and D. Penny. 1998. Relics from the RNA world. J. Mol. Evol. 46:18-36.

    Karchin, R., and R. Hughey. 1998. Weighting hidden Markov models for maximum discrimination. Bioinformatics 14:772-782.

    Kohlstaedt, L. A., J. Wang, J. M. Friedman, P. A. Rice, and T. A. Steitz. 1992. Crystal structure at 3.5 Å resolution of HIV-1 reverse transcriptase complexed with an inhibitor. Science 256:1783-1790.

    Kudla, J., F. J. Albertazzi, D. Blazevic, M. Hermann, and R. Bock. 2002. Loss of the mitochondrial cox2 intron 1 in a family of monocotyledonous plants and utilization of mitochondrial intron sequences for the construction of a nuclear intron. Mol. Genet. Genomics 267:223-230.

    Lambowitz, A. M., and M. Belfort. 1993. Introns as mobile genetic elements. Annu. Rev. Biochem. 62:587-622.

    Lampson, B., M. Inouye, and S. Inouye. 2001. The msDNAs of bacteria. Prog. Nucleic Acid Res. Mol. Biol. 67:65-91.

    Larget, B., and D. L. Simon. 1999. Markov chain Monte Carlo algorithms for the Bayesian analysis of phylogenetic trees. Mol. Biol. Evol. 16:750-759.

    Logsdon, J. M., Jr. 1998. The recent origins of spliceosomal introns revisited. Curr. Opin. Genet. Dev. 8:637-648.

    Madhani, H. D., and C. Guthrie. 1994. Dynamic RNA-RNA interactions in the spliceosome. Annu. Rev. Genet. 28:1-26.

    Mau, B., M. A. Newton, and B. Larget. 1999. Bayesian phylogenetic inference via Markov chain Monte Carlo methods. Biometrics 55:1-12.

    McClure, M. A. 1999. The retroid agents: disease, function and evolution. Pp. 163–195 in E. Domingo, R. Webster, and J. Holland, eds. Origin and evolution of viruses. Academic Press, London.

    Michel, F., and B. F. Lang. 1985. Mitochondrial class II introns encode proteins related to the reverse transcriptases of retroviruses. Nature 316:641-643.

    Nakamura, T. M., G. B. Morin, K. B. Chapman, S. L. Weinrich, W. H. Andrews, J. Lingner, C. B. Harley, and T. R. Cech. 1997. Telomerase catalytic subunit homologs from fission yeast and human. Science 277:955-959.

    Natvig, D. O., G. May, and J. W. Taylor. 1984. Distribution and evolutionary significance of mitochondrial plasmids in Neurospora spp. J. Bacteriol. 159:288-293.

    Nesbo, C. L., S. L'Haridon, K. O. Stetter, and W. F. Doolittle. 2001. Phylogenetic analyses of two "archaeal" genes in Thermotoga maritima reveal multiple transfers between Archaea and Bacteria. Mol. Biol. Evol. 18:362-375.

    Pande, S., E. G. Lemire, and F. E. Nargang. 1989. The mitochondrial plasmid from Neurospora intermedia strain Labelle-1b contains a long open reading frame with blocks of amino acids characteristic of reverse transcriptases and related proteins. Nucleic Acids Res. 17:2023-2042.

    Roger, A. J., and W. F. Doolittle. 1993. Molecular evolution: why introns-in-pieces? Nature 364:289-290.

    Sellem, C. H., G. Lecellier, and L. Belcour. 1993. Transposition of a group II intron. Nature 366:176-178.

    Sharp, P. A. 1985. On the origin of RNA splicing and introns. Cell 42:397-400.

    Shirasu, K., A. H. Schulman, T. Lahaye, and P. Schulze-Lefert. 2000. A contiguous 66-kb barley DNA sequence provides evidence for reversible genome expansion. Genome Res. 10:908-915.

    Simpson, A. G., E. K. MacQuarrie, and A. J. Roger. 2002. Eukaryotic evolution: early origin of canonical introns. Nature 419:270.

    Sontheimer, E. J., and J. A. Steitz. 1993. The U5 and U6 small nuclear RNAs as active site components of the spliceosome. Science 262:1989-1996.

    Suzuki, Y., G. V. Glazko, and M. Nei. 2002. Overcredibility of molecular phylogenies obtained by Bayesian phylogenetics. Proc. Natl. Acad. Sci. USA 99:16138-16143.

    Swofford, D. L. 2002. PAUP*: phylogenetic analysis using parsimony (*and other methods). Sinauer Associates, Sunderland, Mass.

    Toor, N., G. Hausner, and S. Zimmerly. 2001. Coevolution of group II intron RNA structures with their intron-encoded reverse transcriptases. RNA 7:1142-1152.

    Walther, T. C., and J. C. Kennell. 1999. Linear mitochondrial plasmids of F. oxysporum are novel, telomere-like retroelements. Mol. Cell 4:229-238.

    Wank, H., J. SanFilippo, R. N. Singh, M. Matsuura, and A. M. Lambowitz. 1999. A reverse transcriptase/maturase promotes splicing by binding at its own coding segment in a group II intron RNA. Mol. Cell 4:239-250.

    Wilcox, T. P., D. J. Zwickl, T. A. Heath, and D. M. Hillis. 2002. Phylogenetic relationships of the dwarf boas and a comparison of Bayesian and bootstrap measures of phylogenetic support. Mol. Phylogenet. Evol. 25:361-371.

    Xiong, Y., and T. H. Eickbush. 1990. Origin and evolution of retroelements based upon their reverse transcriptase sequences. EMBO J. 9:3353-3362.

    Yang, Z. H., and B. Rannala. 1997. Bayesian phylogenetic inference using DNA sequences: a Markov Chain Monte Carlo method. Mol. Biol. Evol. 14:717-724.

    Zimmerly, S., H. Guo, P. S. Perlman, and A. M. Lambowitz. 1995. Group II intron mobility occurs by target DNA-primed reverse transcription. Cell 82:545-554.

    Zimmerly, S., G. Hausner, and X. Wu. 2001. Phylogenetic relationships among group II intron ORFs. Nucleic Acids Res. 29:1238-1250.

    Zivanovic, Y., P. Lopez, H. Philippe, and P. Forterre. 2002. Pyrococcus genome comparison evidences chromosome shuffling-driven evolution. Nucleic Acids Res. 30:1902-1910.

Accepted for publication March 11, 2003.