Proliferation and Deterioration of Rickettsia Palindromic Elements

Haleh Amiri, Cecilia M. Alsmark and Siv G. E. Andersson

Department of Molecular Evolution, University of Uppsala, Sweden


    Abstract
 
It has been suggested that Rickettsia Palindromic Elements (RPEs) have evolved as selfish DNA that mediates protein sequence evolution by being targeted to genes that code for RNA and proteins. Here, we have examined the phylogenetic depth of two RPEs that are located close to the genes encoding elongation factors Tu (tuf) and G (fus) in Rickettsia. An exceptional organization of the elongation factor genes was found in all 11 species examined, with complete or partial RPEs identified downstream of the tuf gene (RPE-tuf) in six species and of the fus gene (RPE-fus) in 10 species. A phylogenetic reconstruction shows that both RPE-tuf and RPE-fus have evolved in a manner that is consistent with the expected species divergence. The analysis provides evidence for independent loss of RPE-tuf in several species, possibly mediated by short repetitive sequences flanking the site of excision. The remaining RPE-tuf sequences evolve as neutral sequences in different stages of deterioration. Likewise, highly fragmented remnants of the RPE-fus sequence were identified in two species. This suggests that genome-specific differences in the content of RPEs are the result of recent loss rather than recent proliferation.


    Introduction
 
Genome size expansions may occur by the acquisition of foreign DNA via horizontal gene transfer (Ochman, Lawrence, and Groisman 2000), by intragenomic duplications, or by the acquisition of transposons and other types of repetitive sequences that propagate as selfish DNA (Gregory and Hebert 1999; Hartl 2000). Genome size variations in eukaryotes may also result from differences in the rate of spontaneous DNA elimination (Petrov et al. 2000). Likewise, it has been shown that deletions occur predominantly in selectively unconstrained sequences in the small genomes of obligate intracellular parasites of the genus Rickettsia (Andersson and Andersson 1999a, 1999b, 2001). In order to understand the evolution of genome sizes and content, it is necessary to understand the balance between proliferation and elimination of genetic material.

In humans, repeated sequences represent about 50% of the entire genome (International Human Genome Sequencing Consortium 2001). These consist of transposon-derived repeats, processed pseudogenes, short sequence repeats, and large segmental duplications. Mobile elements, such as insertion sequences and transposons, normally encode one or several enzymes required for transposition between the 10- to 40-bp inverted repeats flanking the element (Mahillon and Chandler 1998; Mahillon, Leonard, and Chandler 1999). Short repetitive elements previously described in the γ-proteobacteria include the 125-bp intergenic repeat unit and the 152-bp-long RSA repeat (Bachellier, Clement, and Hofnung 1999; Rudd 1999).

A novel type of repetitive element was recently identified in Rickettsia conorii, called the Rickettsia Palindromic Element (RPE) (Ogata et al. 2000). This element represents the first example of a repetitive element that is inserted into protein-coding genes in bacteria (Ogata et al. 2000). However, the timing of acquisition is unclear, and it is not known whether the 45 RPEs in the R. conorii genome are related by species divergence or by recent intragenomic duplications. One extreme interpretation is that the RPEs are modern-day remnants of a highly mobile RNA world (Dwyer 2001). On the other extreme, it has been suggested that RPE elements have spread by intragenomic proliferation subsequent to speciation within the genus Rickettsia (Ogata, Audic, and Claverie 2001).

Here, we have studied the evolution of RPEs positioned downstream of the elongation factor genes in 10 species of Rickettsia. The genes encoding elongation factor Tu (tufA) and elongation factor G (fus) are normally colocated in the streptomycin (str) operon (Sicheritz-Ponten and Andersson 1997). Most proteobacteria have one additional gene coding for elongation factor Tu (tufB), which is part of the tufB gene operon (Sicheritz-Ponten and Andersson 1997). It has been suggested that the elongation factor genes were part of a large cluster of ribosomal protein genes (the so-called superribosomal protein operon) in the common ancestor of bacteria and archaea (Keeling, Charlebois, and Doolittle 1994). The presumed ancestral gene order is conserved in bacteria such as Escherichia coli and Bacillus subtilis (Wächtershäuser 1998).

It has been very informative to compare these conserved genomic structures with those found in R. prowazekii. This species is unique among the proteobacteria in that it has only one copy of the tuf gene that is not contained within the conventional str operon or within the typical tufB gene neighborhood (Syvänen et al. 1996). Upstream of the single tuf gene, we have identified two of the tRNA genes (tRNATyr and tRNAGly) that are located upstream of the tufB gene in E. coli (Syvänen et al. 1996). Downstream of this gene, we have found the S10 ribosomal protein gene operon that is located downstream of the tufA gene in E. coli (Syvänen et al. 1996). The chimeric disposition of the single tuf gene is thought to be the result of an intrachromosomal recombination event that caused an inversion of the segment flanked by the two ancestral tuf genes, followed by a deletion of one tuf gene (Andersson and Kurland 1995; Syvänen et al. 1996).

In this article, we have examined the arrangement of the elongation factor genes in several α-proteobacterial species and studied the phylogeny of RPEs located in the spacer region downstream of the elongation factor genes in 10 Rickettsia species. The Typhus Group Rickettsia (TG) includes species such as R. prowazekii and R. typhi that are transmitted by insects. The Spotted Fever Group Rickettsia (SFG) are transmitted by ticks and include species such as R. rickettsii, R. parkeri, R. sibirica, R. montana, R. rhipicephali, R. felis, and R. helvetica. We show that the acquisition of RPEs as well as the rearrangement of the elongation factor genes occurred before the divergence of the TG and the SFG, but subsequent to divergence of genera within the α-proteobacteria.


    Materials and Methods
 
Genomic DNA and Cosmid Clones
Genomic DNA from Rickettsia species was isolated and purified as described previously (Pretzman et al. 1987). The nucleotide sequences of the tuf, rpsJ, fus, secE, trpT, and nusG genes along with their intergenic regions were determined in R. helvetica, R. montana (strain OH 83-441), R. parkeri, R. rickettsii (strain Sawtooth), R. rhipicephali (strain 3-7-6), R. sibirica (strain 246), R. prowazekii (strain 22-2), R. typhi (strains Wilmington, Isakanjan, Azk1, Azk3), R. felis, and R. bellii (strain 369-C). In R. conorii (strain ITT-586), only the spacer regions of fus and secE were determined. Cosmid clones containing the genes for elongation factors Tu and G from Rhodobacter capsulatus sb1003 were kindly provided by R. Haselkorn. Cosmid DNA was prepared and subcloned into Bluescript M13 (pBS) vector using standard procedures. The elongation factor gene sequences from Bartonella henselae, strain Houston 1, were obtained as part of our in-house genome project (U.C.M. Alsmark, C. Frank, B. Canbäck, D. Ardell, A.-S. Eriksson, K. Näslund, S. Handley, C. Hansen, M. Holmberg, S.G.E. Andersson, personal communication).

Polymerase Chain Reaction Amplification
We designed a set of degenerate primers (table 1) based on highly conserved parts of the tuf, rpsJ, fus, secE, trpT, and nusG genes. The primers were constructed so that the codon bias of Rickettsia (Andersson et al. 1998) was taken into account. An additional set of species-specific primers was designed in order to cover missing parts and to deal with the difficulties of cross-species polymerase chain reaction (PCR) amplification.


Table 1 Primers Used for the Amplification of the Elongation Factor Genes

 
PCR reactions were carried out under standard conditions in 100-µl reactions using 20–40 ng DNA, 2 U Taq polymerase (Promega Biotech, Madison, Wis.), and 50 pmol of each primer, in a DNA thermal cycler. The program included the following steps: 94°C for 2 min, 80°C while the DNA polymerase was added, 30 cycles of 95°C for 1 min, 37°–59°C for 1 min, and 72°C for 1 min, followed by a final 5-min extension at 72°C. The annealing temperature was 4°–6°C below the melting temperature of the primers. The PCR fragments were purified from the reactions using the Wizard PCR Preps DNA Purification System (Promega Biotech).
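The rule that the annealing temperature was set 4°–6°C below the primer melting temperature can be illustrated with a short Python sketch. The text does not say how melting temperatures were estimated, so the Wallace rule used here (2°C per A or T, 4°C per G or C) is an assumption, and the function names are hypothetical. Degenerate bases, which the primers in table 1 may contain, are not handled.

```python
def wallace_tm(primer: str) -> int:
    """Estimate primer melting temperature in °C with the Wallace rule.

    Assumption: the source does not state which Tm formula was used.
    """
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("primer must contain only A, C, G, and T")
    return 2 * at + 4 * gc


def annealing_window(forward: str, reverse: str,
                     low_offset: int = 4, high_offset: int = 6) -> tuple[int, int]:
    """Return (lowest, highest) annealing temperature in °C.

    The window lies 4-6 °C below the lower of the two primer Tm values.
    """
    tm = min(wallace_tm(forward), wallace_tm(reverse))
    return tm - high_offset, tm - low_offset


if __name__ == "__main__":
    # Invented placeholder primers, not taken from table 1.
    print(annealing_window("AAAATTTTGGGGCCCCAAAA", "GGGGCCCCGGGGCCCCAAAA"))
```

Taking the lower Tm of the primer pair is a common conservative choice; it keeps both primers below their melting temperature during annealing.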

Sequencing
Plasmid clones and overlapping PCR products were sequenced on both strands using the PCR primers as well as internal oligomers. Sequence reactions were carried out using terminator cycle sequencing based on Sanger's dideoxy chain termination method. Reactions included 30–60 ng PCR product and 3.2 pmol primer in addition to dNTPs, fluorescent ddNTPs, and thermostable DNA polymerase provided in the ABI PRISM Big Dye Terminator Cycle Sequencing Kit (Perkin Elmer). Thermal cycling was performed using a Peltier Thermal Cycler 225 (MJ Research) in 25 cycles repeating the following steps: 96°C for 10 s, annealing temperature for 5 s, and 60°C for 4 min. Sequencing products were separated on a 5% Long Ranger denaturing polyacrylamide gel on an ABI sequenator (Perkin Elmer).

Sequence Analysis
The sequences were assembled and edited using STADEN software (Staden 1996). Processed consensus sequences were aligned with the aid of ClustalW software (Thompson, Higgins, and Gibson 1994). We calculated the overall G+C content and the G+C content at third codon positions with the help of CodonW software (Lloyd and Sharp 1992). Pairwise distances for synonymous and nonsynonymous substitutions, Ks and Ka values, were calculated for the fus and the tuf genes separately using Li's method with the aid of MATDISLI software (Li 1993). The phylogenetic reconstructions were accomplished using maximum parsimony (MP) and neighbor-joining (NJ) (Saitou and Nei 1987) methods in Phylo_win (Galtier, Gouy, and Gautier 1996) and PAUP (Swofford 1998) software.
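As an illustration of the two base composition measures used throughout the Results, the following Python sketch computes the overall G+C content and the G+C content at synonymous third codon positions (GC3s). It is not the CodonW implementation. It assumes the standard genetic code and the usual GC3s convention of excluding Met (ATG), Trp (TGG), and stop codons, and the function names are hypothetical.

```python
# Codons with no synonymous alternative (Met, Trp) and the stop codons are
# excluded from GC3s under the convention assumed here.
EXCLUDED_CODONS = {"ATG", "TGG", "TAA", "TAG", "TGA"}


def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases of a sequence."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A, C, G, or T")
    return sum(b in "GC" for b in bases) / len(bases)


def gc3s(cds: str) -> float:
    """Fraction of G and C at the third position of synonymous codons.

    The coding sequence is assumed to start in frame; a trailing partial
    codon is ignored.
    """
    cds = cds.upper()
    usable_length = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable_length, 3)]
    synonymous = [c for c in codons
                  if c not in EXCLUDED_CODONS and set(c) <= set("ACGT")]
    if not synonymous:
        raise ValueError("no synonymous codons to evaluate")
    return sum(c[2] in "GC" for c in synonymous) / len(synonymous)


if __name__ == "__main__":
    toy_cds = "ATGAAAGGCTAA"  # invented example: Met, Lys, Gly, stop
    print(gc_content(toy_cds), gc3s(toy_cds))
```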

Nucleotide Sequence Accession Numbers
The nucleotide sequences obtained in this study have been given the following GenBank accession numbers: The tuf gene and downstream sequences in R. parkeri, AF502180; R. sibirica, AF502181; R. rickettsii, AF502179; R. montana, AF502183; R. rhipicephali, AF502182; R. felis, AF502185; R. helvetica, AF502184; R. typhi, AF502186; and R. bellii, AF502187. The fus gene and downstream sequences in R. parkeri, AF502171; R. sibirica, AF502172; R. rickettsii, AF502170; R. montana, AF502174; R. rhipicephali, AF502173; R. felis, AF502178; R. helvetica, AF502175; R. typhi, AF502176; and R. bellii, AF502177. The B. henselae tufA gene, AY099295; tufB gene, AY099292; and fus gene, AY099293. The R. capsulatus fus and tufA genes, AY099291; and tufB gene, AY099294.


    Results
 
Sequence of Elongation Factor Genes in α-proteobacteria
We have shown previously that the R. prowazekii genome is characterized by an inversion within the superribosomal protein gene operon (Syvänen et al. 1996; Andersson et al. 1998). In order to place this rearrangement event in an evolutionary context, we have sequenced the genes encoding elongation factors Tu (EF-Tu) and G (EF-G) as well as their downstream flanking sequences in nine other lineages of Rickettsia and in two additional α-proteobacterial species. Nucleotide frequencies of the tuf and fus genes reflect the genomic mutational bias of each species, particularly at synonymous third codon positions (GC3S). Thus, the fus and tuf genes have GC3S values ranging from 14% in the A+T–rich R. prowazekii genome to 85% in the G+C–rich genome of R. capsulatus.

Members of the TG have GC3S values that are about 3%–5% lower than those of members of the SFG, as also observed for several other genes (Andersson and Andersson 1997, 1999a, 1999b, 2001). In contrast to the strong effect on silent sites, the biased mutation pressure has altered the amino acid composition of the elongation factors only to a minor extent. For EF-G, the average ratio of amino acids encoded by A+T–rich codons over those encoded by G+C–rich codons (Andersson and Sharp 1996) is 1.35 in Rickettsia compared with 0.99 in R. capsulatus, and no variation in the overall amino acid composition is observed for EF-Tu. This suggests that the elongation factors are suitable as phylogenetic markers despite the strong mutation pressures acting on the α-proteobacterial genomes.
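A ratio of this kind can be computed directly from a protein sequence, as in the Python sketch below. The text does not list which amino acids were assigned to each class, so the grouping used here is an assumption: Phe, Tyr, Met, Ile, Asn, and Lys (FYMINK) as the A+T–rich class and Gly, Ala, Arg, and Pro (GARP) as the G+C–rich class. The function name is hypothetical.

```python
# Assumed amino acid classes; the source text does not specify them.
AT_RICH_AMINO_ACIDS = set("FYMINK")
GC_RICH_AMINO_ACIDS = set("GARP")


def at_over_gc_amino_acid_ratio(protein: str) -> float:
    """Ratio of residues encoded by A+T-rich codons to those encoded by
    G+C-rich codons, using one-letter amino acid codes."""
    residues = protein.upper()
    at_rich = sum(aa in AT_RICH_AMINO_ACIDS for aa in residues)
    gc_rich = sum(aa in GC_RICH_AMINO_ACIDS for aa in residues)
    if gc_rich == 0:
        raise ValueError("no residues from the G+C-rich class; ratio undefined")
    return at_rich / gc_rich


if __name__ == "__main__":
    # Invented peptide, used only to show the call.
    print(at_over_gc_amino_acid_ratio("MKKIFNGA"))
```

Values near 1 indicate a balanced composition, whereas values well above 1 indicate a shift toward amino acids encoded by A+T–rich codons.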

Phylogenetic Relationships Inferred from the Elongation Factors
To establish the phylogenetic context of the α-proteobacterial species selected for analysis, we performed a phylogenetic reconstruction based on the combined amino acid sequences of the two elongation factors (fig. 1). The EF-Tu and EF-G sequences from Rickettsia, B. henselae, R. capsulatus, and Agrobacterium tumefaciens were first aligned with the corresponding sequences from representative bacterial species. The α-proteobacterial elongation factors are of similar sizes, with a few exceptions. For example, B. henselae and A. tumefaciens share an insertion of five amino acids in EF-G with the consensus sequence GRDG(K/R), and all Rickettsia species are characterized by a unique insertion of nine amino acids with the consensus sequence VKDLKDEDK.



Fig. 1. Phylogenetic relationships of (A) α-proteobacteria based on the combined protein sequences of EF-Tu and EF-G and (B) Rickettsia species derived from the combined nucleotide sequences of the tuf and fus genes. Branch lengths are proportional to those reconstructed with the neighbor-joining (NJ) method. Values at nodes indicate the percentages of 500 NJ bootstraps and 500 maximum parsimony bootstraps, in this order. Only bootstrap support values above 75% are shown.

 
The unrooted tree topologies (fig. 1A) produced from these alignments by the neighbor-joining and maximum parsimony methods were identical and have the following features. First, the α-proteobacteria represent a monophyletic clade (100% bootstrap support), in which the lineage leading to Rickettsia diverged before the divergence of the other three species (100% bootstrap support). The topology further indicates that R. capsulatus diverged before the divergence of A. tumefaciens and B. henselae (100% bootstrap support). An unexpected feature of this tree is that the yeast mitochondrial sequence forms a cluster with the spirochetes with bootstrap support values of 100%. This clustering is also seen in reconstructions based on EF-G, but not in the phylogenies based on EF-Tu (data not shown). The close relationship of mitochondrial and spirochete EF-G sequences is possibly the result of a horizontal gene transfer from the spirochetes into the eukaryotes, in an event that was distinct from the endosymbiotic event that gave rise to mitochondria from the α-proteobacteria (Kurland and Andersson 2000; Karlberg et al. 2000).

We utilized the same methods to establish the phylogenetic relationships among the rickettsial species selected for analysis (fig. 1B). Because of their close relationships, this analysis was based on the nucleotide sequences of the fus and tuf genes. The inferred tree separates the TG (R. prowazekii and R. typhi) from the SFG (R. rickettsii, R. sibirica, R. parkeri, R. montana, R. rhipicephali, R. felis, and R. helvetica), whereas R. bellii represents an earlier diverging species. The results further suggest that the pathogenic species R. parkeri, R. sibirica, and R. rickettsii are phylogenetically distinct from the nonpathogenic species R. rhipicephali and R. montana (97% bootstrap support). R. felis and R. helvetica are placed as early diverging species within the SFG in this tree. The general features of this phylogeny match the tree topology previously obtained with small and large ribosomal RNA as well as with the citrate synthase gene sequences (Roux and Raoult 1995; Stothard and Fuerst 1995; Roux et al. 1997; Andersson et al. 1999).

Likewise, the frequencies of nonsynonymous (Ka) and synonymous (Ks) substitutions distinguish the TG from the SFG (table 2). The mean Ka and Ks values for comparisons across the two groups are 0.046 and 0.15 substitutions per site for the tuf gene, respectively. These values are comparable to the mean Ka and Ks values of 0.035 and 0.33 substitutions per site for the fus gene. R. bellii is equally distant from both groups, as inferred from Ka and Ks values in the range of 0.15 and 0.35 for the tuf gene and 0.04 and 0.66 for the fus gene. These synonymous substitution frequencies are consistent with previous estimates for these species based on other gene sequences (Andersson et al. 1998; Andersson and Andersson 1999a, 1999b). Taken together, our analysis confirms that the elongation factor genes sequenced here accurately reflect the evolution of the α-proteobacterial genomes in which they reside.


Table 2 Synonymous and Nonsynonymous Substitutions (×100) for the tuf and fus Genes. Nonsynonymous substitutions (Ka) are shown above the diagonals and synonymous substitutions (Ks) are shown below the diagonals.

 
Rearrangement of the Elongation Factor Genes in Rickettsia
The organization of the elongation factor genes tuf and fus in the superribosomal protein gene operon in E. coli is thought to represent the ancestral organization of the elongation factor genes (fig. 2A) (Keeling, Charlebois, and Doolittle 1994; Sicheritz-Ponten and Andersson 1997; Wächtershäuser 1998). In accordance with this structure, we identified two copies of the tuf gene in B. henselae and R. capsulatus, as was also previously observed in A. tumefaciens (Syvänen et al. 1996). Furthermore, we found that the tufA gene is located near the fus gene in B. henselae and R. capsulatus, and that the second tuf gene, tufB, is located distantly from this operon (fig. 2B). In B. henselae, the ribosomal protein gene operons S10, spc, and α have been found to be located downstream of the str operon. Likewise, we have identified the gene encoding RNA polymerase (rpoC) upstream of the str operon and the gene encoding ribosomal protein S10 (rpsJ) downstream of this operon in R. capsulatus.



Fig. 2. Rearrangement of the elongation factor genes and position of Rickettsia palindromic elements (RPEs) in Rickettsia species. (A) The conserved, ancestral organization of the elongation factor genes, as observed in Escherichia coli and other proteobacteria. Boxes with stripes represent the tufA gene and genes normally located in the neighborhood of the tufA gene. Black boxes represent the tufB gene and genes normally located in the neighborhood of the tufB gene. (B) Schematic illustration of the phylogenetic relationships of α-proteobacteria as inferred from figure 1. Boxes illustrate the rearranged (Rickettsia) and conserved (Bartonella, Agrobacterium, and Rhodobacter) organization of the elongation factor genes. Triangles represent RPEs, and symbols (ψ) indicate pseudo-RPEs with frameshift mutations or internal termination codons.

 
In contrast, only a single tuf gene was previously identified in the R. prowazekii genome (Andersson et al. 1998). Likewise, only one tuf gene could be amplified from the genomes of R. typhi, R. parkeri, R. sibirica, R. rickettsii, R. montana, R. rhipicephali, R. helvetica, R. felis, and R. bellii. All species displayed the same rearranged organization of the elongation factor genes as previously identified in R. prowazekii (fig. 2B). We conclude that the atypical organization of the elongation factor genes in Rickettsia represents a rearrangement event that occurred subsequent to the divergence of the genus Rickettsia from the other α-proteobacteria, but before the divergence of the TG from the SFG.

RPEs Downstream of the Elongation Factor Genes
The variation in sizes and G+C content values of the tuf-rpsJ and the fus-secE intergenic regions in Rickettsia correlates with the presence of RPEs (compare table 3 and fig. 2B). Thus, the RPE-containing tuf-rpsJ intergenic regions of R. rickettsii, R. sibirica, R. parkeri, R. rhipicephali, and R. helvetica are 200 bp in size and have mean G+C content values of 32%–37%, as expected for coding sequences in Rickettsia (Andersson et al. 1998). In contrast, the RPE-lacking spacers of R. prowazekii, R. typhi, R. montana, R. felis, and R. bellii are approximately 100 bp shorter and have G+C content values of only 17%–20%, as expected for noncoding sequences (Andersson et al. 1998). Likewise, the fus-secE spacer region varies in size from 205 bp in R. bellii, which lacks an RPE, to about 400 bp in the RPE-containing spacers. The high G+C content of the fus-secE spacer regions is explained by the presence of a gene coding for tRNATrp in this region.


Table 3 Lengths and G+C Contents of the tuf-rpsJ and the fus-secE Spacer Regions

 
Vertical Transmission of RPEs
To analyze whether the RPEs are related by vertical descent or by intragenomic expansion, we determined the relationships of RPE-tuf and RPE-fus to the complete set of RPEs in R. conorii. To begin with, we analyzed the most slowly evolving RPEs in R. conorii, i.e., those that are located inside protein-coding genes. A phylogenetic reconstruction (fig. 3A) shows that the RPE-FUS peptides from the various Rickettsia species form a cluster with bootstrap support values of 100% and 99% in the NJ and MP methods, respectively. The order of divergence is in perfect agreement with the expected species divergence, with R. helvetica and R. felis representing early diverging taxa (compare figs. 1B and 3A). Likewise, the RPE-TUF peptides form a cluster, although the bootstrap support values are lower for this cluster (51%–78%), most likely because of inactivation and thereby a more rapid rate of sequence divergence in some species (see subsequent discussion). Nevertheless, for this cluster too the order of divergence is as expected.



Fig. 3. Relationships of RPEs inferred from phylogenetic analyses. Phylogenetic relationship of (A) the aligned amino acid sequences of the Rickettsia conorii RPEs located inside coding regions with RPE-TUF and RPE-FUS and (B) the aligned nucleotide sequences of the R. conorii RPEs located in noncoding regions with RPE-tuf and RPE-fus. Branch lengths in A are proportional to those reconstructed with the neighbor-joining (NJ) method, whereas the consensus topology reconstructed with the maximum parsimony (MP) method is shown in B. Values at nodes indicate the percentages of 500 NJ bootstraps (bold numbers) and MP bootstraps using FastStep as search criterion (italic numbers). NJ and MP methods gave identical divergences at all nodes with bootstrap support values above 70% in the NJ (A) and MP trees (B). Only bootstrap support values above 65% (NJ) and 50% (MP) are shown in the trees.

 
To examine the relationship of RPE-tuf and RPE-fus to the intergenic RPEs in R. conorii, we aligned the corresponding nucleotide sequences and tried to reconstruct their internal relationships (fig. 3B). Because of the lack of selective constraints to maintain the reading frame open, many of these RPEs show signs of frameshift mutations and degradation (Ogata et al. 2000), which makes the topology of this tree less robust than the tree reconstructed from the RPEs located in coding regions. Nevertheless, in this analysis too we found that the RPE-tuf as well as the RPE-fus sequences clustered in a species-specific manner (85%–100% bootstrap support).

Any pair of RPEs related by proliferation within the R. conorii genome would easily be recognized as a cluster in this analysis, as inferred from the short branches and high bootstrap support values for the RPE-tuf and RPE-fus clusters (fig. 3). No such strong clustering is observed. On the contrary, the 19 RPEs located inside coding regions were found to be more divergent from each other than either the RPE-tuf or the RPE-fus sequences were among the pathogenic SFG (R. rickettsii, R. sibirica, R. conorii, and R. parkeri) (fig. 3). Only two significant clusters could be identified among the coding RPEs: ubiH and RP167 (65%–87% bootstrap support) and orf3 and rlpA (71%–77% bootstrap support), but even for these, branch lengths were much longer than for RPE-fus and RPE-tuf in the pathogenic SFG clusters. No additional clusters with strong bootstrap support values were observed for the 25 RPEs located in noncoding regions (fig. 3) or for the complete set of RPEs (data not shown). This suggests that the RPEs identified in the R. conorii genome are the result of proliferation before speciation within the SFG. However, because of the short sizes of the RPEs, it is difficult to elucidate the exact order of duplication and divergence.
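The reasoning that recently duplicated RPEs would stand out as pairs separated by very short distances can be illustrated with a simple pairwise comparison. The Python sketch below computes uncorrected p-distances (the fraction of differing sites) between aligned sequences. This is a simplification for illustration only and is not the NJ or MP analysis reported here; the function names and the toy sequences are hypothetical.

```python
from itertools import combinations


def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of differing sites between two aligned sequences.

    Columns with a gap ('-') in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
               if a != "-" and b != "-"]
    if not columns:
        raise ValueError("no comparable columns")
    return sum(a != b for a, b in columns) / len(columns)


def pairwise_distances(aligned: dict[str, str]) -> dict[tuple[str, str], float]:
    """p-distance for every pair of named sequences, keyed by sorted name pair."""
    return {(n1, n2): p_distance(aligned[n1], aligned[n2])
            for n1, n2 in combinations(sorted(aligned), 2)}


if __name__ == "__main__":
    # Invented toy alignment; a recent duplicate pair would show a distance
    # close to zero, as for "a" and "c" here.
    toy = {"a": "ACGT", "b": "ACGA", "c": "ACGT"}
    for pair, dist in pairwise_distances(toy).items():
        print(pair, dist)
```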

Loss and Deterioration of RPEs
The placement of R. montana and R. felis in phylogenetic trees based on the elongation factor genes (fig. 1B) and the RPE-fus sequences (fig. 3A) suggests that the absence of an RPE-tuf sequence in these two species is best explained by two independent losses. A 7-bp repeated sequence (AAGATGT) flanking the RPE-tuf element may have served as the site of excision (fig. 4A). RPE-tuf sequences could not be identified in R. prowazekii, R. typhi, and R. bellii either. These were most likely lost by a similar mechanism. R. prowazekii and R. typhi, members of the typhus group, were found to contain highly fragmented sequence remnants of the RPE-fus sequences identified in the SFG (fig. 4B). In these species, at least six deletion events have to be inferred, ranging in size from one to 18 nucleotides, some of which seem to have occurred before the separation of the two species. An extreme example of RPE deterioration is observed in the early diverging species R. bellii, where only nine nucleotides align perfectly with the RPEs in the other species, indicating that more than 95% of the RPE has been eliminated from this species.
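How a short flanking repeat could mark the boundaries of an excision can be sketched in Python. Only the 7-bp motif AAGATGT comes from the text. The assumption that the deletion removes the intervening segment and leaves a single copy of the repeat is an illustrative model, not a mechanism established here, and the function names and the example sequence are hypothetical.

```python
def find_all(seq: str, motif: str) -> list[int]:
    """Return the start positions of every occurrence of motif in seq."""
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions


def excise_between_repeats(seq: str, motif: str) -> str:
    """Delete the segment between the first and last copy of a direct repeat.

    One copy of the repeat is retained (assumed model of excision). The
    sequence is returned unchanged if fewer than two copies are found.
    """
    hits = find_all(seq, motif)
    if len(hits) < 2:
        return seq
    first, last = hits[0], hits[-1]
    return seq[:first] + seq[last:]


if __name__ == "__main__":
    # Invented spacer: the segment between the repeats stands in for an RPE.
    spacer = "CC" + "AAGATGT" + "GGGGCCCC" + "AAGATGT" + "TT"
    print(find_all(spacer, "AAGATGT"))
    print(excise_between_repeats(spacer, "AAGATGT"))
```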



Fig. 4.—Alignment of the RPEs and flanking nucleotides in (A) the tuf-rpsJ and (B) the fus-secE intergenic regions of RPR (Rickettsia prowazekii), RTY (R. typhi), RPA (R. parkeri), RSI (R. sibirica), RRI (R. rickettsii), RMO (R. montana), RRH (R. rhipicephali), RFE (R. felis), RHE (R. helvetica), and RBE (R. bellii). Boxes indicate a short repetitive element that is flanking the site of deletions in RPR, RTY, RMO, RFE, and RBE. Arrows indicate the 5'- and 3'-ends of RPE-tuf and RPE-fus. The in-frame termination codon in RPE-fus of R. rhipicephali is marked with a shaded color. Nucleotides not determined ()

 
We reconstructed the ancestral open reading frames in the RPE-tuf and RPE-fus sequences by removing, adding, or altering nucleotides at sites of frameshift mutations and stop codons (table 4). The reconstructed RPE-fus gene sequences displayed codon position-dependent G+C content values (G+C = 35.6%; GC3s = 19.7%, on average), i.e., features normally associated with genes having no mutational defects (Andersson et al. 1998). Likewise, the base composition patterns of the RPE-tuf sequences (G+C = 39.4%; GC3s = 27.6%, on average) indicate that they are presently, or have until recently been, influenced by selective constraints, although it should be noted that the GC3s values are higher than for RPE-fus. Because RPE-tuf in R. parkeri, R. sibirica, R. conorii, and R. rickettsii share an identical deletion event, it seems likely that the open reading frame in these species was destroyed before species divergence. Indeed, these sequences have accumulated nucleotide substitutions to a similar extent at each of the three codon positions (two, four, and three substitutions at first, second, and third codon positions, respectively), as expected for neutrally evolving sequences (Andersson and Andersson 1999a, 1999b, 2001).
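The tally of substitutions at first, second, and third codon positions can be reproduced for any pair of aligned sequences with a few lines of Python. The sketch below assumes that the alignment starts at the first position of a codon and that gaps, if any, do not shift the reading frame. The function name and the example sequences are hypothetical.

```python
def substitutions_by_codon_position(seq_a: str, seq_b: str) -> tuple[int, int, int]:
    """Count differences at first, second, and third codon positions.

    Both sequences must be aligned, of equal length, and in frame from the
    first column. Columns with a gap ('-') in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    counts = [0, 0, 0]
    for index, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper())):
        if a == "-" or b == "-":
            continue
        if a != b:
            counts[index % 3] += 1
    return counts[0], counts[1], counts[2]


if __name__ == "__main__":
    # Invented sequences: one change at a first position, one at a third.
    print(substitutions_by_codon_position("ATGAAACCC", "ATGGAACCT"))
```

Under selective constraint, changes are expected to concentrate at third positions; roughly equal counts across the three positions are the pattern expected for neutrally evolving sequences.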


Table 4 Base Frequencies for RPE-tuf and RPE-fus in Rickettsia

 

    Discussion
 
Genome sizes in eukaryotes correlate well with frequencies of repeated sequences and noncoding DNA. Microbial genomes, however, contain less than 10% of repeated sequences and noncoding DNA, with the exception of R. prowazekii (Andersson et al. 1998) and Mycobacterium leprae (Cole et al. 2001). Although the molecular processes that generate genome diversity at the sequence level have been described in great detail for microbes, the mechanisms of genome size expansion and degradation are less well understood. Within the α-proteobacteria (Olsen, Woese, and Overbeek 1994), genome sizes range from 1.1 Mbp in Rickettsia to 8.7 Mbp in Bradyrhizobium japonicum (Krawiec and Riley 1990). We have previously suggested that intracellular parasites with small genomes, like Rickettsia, have evolved from free-living α-proteobacteria with larger genome sizes by reductive evolutionary processes (Andersson and Kurland 1995, 1998). Indeed, we have shown in this study that the elongation factor genes that are conservatively organized in free-living α-proteobacteria are rearranged in Rickettsia.

Genomic streamlining and rearrangements may result from a variety of processes, for example by intrachromosomal recombination at duplicated genes and repeated sequences. Indeed, we have previously suggested that a recombination event between two ancestral tuf genes in Rickettsia caused an inversion of the intervening segment (Andersson and Kurland 1995; Syvänen et al. 1996). The rate of recombination at such sites will eventually decrease as more and more of the repeated sequences are consumed in the recombination process, unless the rate of repeat regeneration is very fast. It is therefore not surprising that highly reduced genomes often have very low frequencies of repeated sequences (Andersson et al. 1998; Stephens et al. 1998; Frank, Amiri, and Andersson 2002). In contrast to the scarcity of repeated sequences in the R. prowazekii genome, however, as many as 45 copies of a repeated palindromic element were found in the genome of its close relative, R. conorii (Ogata et al. 2000). Even more astonishing was that as many as 19 of these are located inside open reading frames (ORFs; Ogata et al. 2000). When the R. conorii RPEs were used to search for similar sequences in the R. prowazekii genome, 10 highly divergent RPEs were identified in protein-coding genes in this species as well (Ogata et al. 2000).

When were the RPEs acquired in Rickettsia? Dwyer has suggested that RPEs are modern-day vestiges of the early RNA world that have survived through evolution more or less intact (Dwyer 2001). If so, they would have arisen thousands of millions of years ago. Ogata and his colleagues, on the other hand, believe that RPE insertion occurred after the divergence of species within Rickettsia (Ogata, Audic, and Claverie 2001); if so, they would only be a few million years old. In order to place the acquisition of the RPEs in the evolutionary history of Rickettsia, we have superimposed the distribution of the RPEs located in the downstream regions of the elongation factor genes onto a phylogeny of Rickettsia inferred from the elongation factor genes. The analyses strongly suggest that the proliferation of RPEs predates speciation within the genus Rickettsia.

First, the high degree of sequence conservation and the finding of identical insertion sites for RPEs in a variety of Rickettsia species, including members of both the TG and the SFG, argue that the acquisition of RPEs occurred early in the lineage leading to Rickettsia. Some of the ancestral RPEs may already have been lost, or they may no longer be recognizable by sequence similarity searches. Indeed, the highly fragmented RPE-fus sequences in R. prowazekii and R. typhi were identified only because of their location at an RPE-insertion site in the other species. Thus, the most parsimonious explanation for species-specific differences in the host proteins targeted by RPEs is recent RPE loss in some lineages, rather than recent RPE gain in the other lineages.

The ancestry of each RPE can be studied in more detail by exploiting the fact that any recent proliferation would become visible in the phylogenetic reconstructions as clusters of similar repeats. However, there is no support for a recent amplification, nor any evidence that RPEs can proliferate in a selfish manner. On the contrary, the phylogenetic relationships of the RPE-tuf and RPE-fus sequences showed that these are related by vertical descent rather than by a recent intragenomic expansion. This suggests that the RPEs flourished before speciation within the genus Rickettsia and that there are few, if any, recently born RPEs. These results are incompatible with the idea that the RPEs correspond to recently proliferating sequences (Ogata et al. 2000).
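The expectation stated above can be illustrated with a small sketch. It is not the phylogenetic reconstruction used in this study; it is a simplified nearest-neighbour check on pairwise distances, and the species labels and sequences are invented toy data. Under vertical descent, the closest relative of each repeat should be the copy at the same locus in another species; under recent intragenomic proliferation, it should be another copy in the same genome.

```python
def p_distance(a, b):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def nearest_neighbour_pattern(copies):
    """Classify each repeat copy by where its most similar copy is found.

    copies maps (species, locus) -> aligned sequence.
    'same_locus'  : closest copy sits at the same locus in another species
                    (the pattern expected under vertical descent).
    'same_genome' : closest copy sits at another locus in the same species
                    (the pattern expected under recent proliferation).
    """
    pattern = {}
    for key, seq in copies.items():
        distances = [(p_distance(seq, other), k)
                     for k, other in copies.items() if k != key]
        _, nearest = min(distances)
        if nearest[1] == key[1]:
            pattern[key] = "same_locus"
        elif nearest[0] == key[0]:
            pattern[key] = "same_genome"
        else:
            pattern[key] = "other"
    return pattern


# Invented toy alignment (hypothetical species and sequences, not real RPEs).
toy_copies = {
    ("sp_A", "tuf"): "ACGTACGTACGT",
    ("sp_B", "tuf"): "ACGTACGTACGA",
    ("sp_A", "fus"): "TTGCAAGCTTGC",
    ("sp_B", "fus"): "TTGCAAGCTTGA",
}

toy_pattern = nearest_neighbour_pattern(toy_copies)
for key, label in sorted(toy_pattern.items()):
    print(key, label)
```

In this toy case every copy groups with its counterpart at the same locus, which is the pattern described in the text for RPE-tuf and RPE-fus. A real analysis would rely on a proper alignment and tree-building method rather than raw nearest neighbours.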

Is there any functional role associated with the spread of RPEs? Dwyer has noted that many proteins in multigene families have evolved from smaller duplication units encoded by short inverted repeat segments that resemble transposable genetic elements (Dwyer 1998). These transposable exons, "trexons," are similar to the RPEs in that they encode a helix and a short turn or loop. It has therefore been suggested that a putative function of the RPEs and the trexons was to support protein modularity by encoding protein segments that are specialized for participating in protein-protein and protein-DNA interactions (Dwyer 1998). Although it cannot be ruled out that the RPEs at some point or another contributed important functional characteristics to the genes in which they were inserted, our analyses suggest that modern RPEs are nothing but silent passengers in the genomes in which they reside.

The survival of selfish DNA depends on the balance between two factors: the rate of intragenomic proliferation and the rate of spontaneous deletion. Studies of Rickettsia pseudogenes have shown that small deletions occur more frequently than small insertions (Andersson and Andersson 1999a, 1999b, 2001). Likewise, deletions were found to be more common than insertions in the RPE-tuf sequences, suggesting that these too evolve as neutral sequences. Furthermore, we have shown that RPE-tuf has been lost independently in two lineages, presumably by recombination at short repeated sites. Because insertion into a protein-coding gene will have a stabilizing effect on the element, it may not be surprising that the few remaining RPEs in the R. prowazekii genome are located inside protein-coding genes (Ogata et al. 2000), with the exception of the partial RPE-fus identified in this study.
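The balance between proliferation and deletion can be made concrete with a minimal sketch. It assumes a simple linear model, not one fitted in this study, in which each copy proliferates at rate b and is deleted at rate d per unit time, so that the expected copy number follows N(t) = N0 exp((b - d) t). The function name and the rate values are hypothetical; only the starting count of 45 copies is taken from the R. conorii figure cited above.

```python
import math


def expected_copy_number(n0, proliferation_rate, deletion_rate, time):
    """Expected number of copies after `time` under an assumed linear birth-death model."""
    return n0 * math.exp((proliferation_rate - deletion_rate) * time)


# Hypothetical rates per arbitrary time unit, chosen only for illustration.
start = 45  # RPE copies reported for R. conorii (Ogata et al. 2000)
balanced = expected_copy_number(start, 0.02, 0.02, 100)  # proliferation equals deletion
decaying = expected_copy_number(start, 0.00, 0.02, 100)  # no proliferation, deletion only

print(f"balanced: {balanced:.1f} copies, decaying: {decaying:.1f} copies")
```

When proliferation has ceased, the copy number can only decline, which is the situation the text infers for the RPEs; how fast it declines in a given lineage depends on the deletion rate and on whether selection on the host gene protects the element.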

If all of the identified RPEs in R. conorii originated before speciation in Rickettsia, as indicated by our phylogenetic analysis, the decay of RPEs must have occurred more rapidly in the TG than in the SFG. This is reminiscent of the finding that gene loss has been much more extensive in R. prowazekii than in R. conorii (Ogata, Audic, and Claverie 2001). Of the 834 previously identified genes in R. prowazekii, only 30 have been lost from R. conorii, whereas about half of the 552 genes uniquely present in R. conorii have been lost from R. prowazekii, and the remaining 229 are present as highly degraded gene remnants. The extent to which differences in growth properties, population structures, and mode of transmission have influenced the dynamics of gene loss and RPE degradation in the TG and SFG Rickettsia remains to be determined.
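The asymmetry in gene loss can be expressed as proportions using only the counts quoted above. The short calculation below assumes that the 229 gene remnants are counted among the 552 R. conorii-specific genes; the variable names are illustrative.

```python
# Counts as quoted in the text from Ogata, Audic, and Claverie (2001).
prowazekii_genes = 834        # genes identified in R. prowazekii
lost_from_conorii = 30        # of those, lost from R. conorii
conorii_unique_genes = 552    # genes uniquely present in R. conorii
remnants_in_prowazekii = 229  # of those, present as degraded remnants in R. prowazekii

# Assumption: the remnants are a subset of the 552 R. conorii-specific genes,
# so the rest have left no recognizable trace in R. prowazekii.
fully_lost_in_prowazekii = conorii_unique_genes - remnants_in_prowazekii

frac_lost_conorii = lost_from_conorii / prowazekii_genes
frac_fully_lost = fully_lost_in_prowazekii / conorii_unique_genes
frac_remnants = remnants_in_prowazekii / conorii_unique_genes

print(f"lost from R. conorii:            {frac_lost_conorii:.1%}")  # 3.6%
print(f"fully lost from R. prowazekii:   {frac_fully_lost:.1%}")    # 58.5%
print(f"remnants in R. prowazekii:       {frac_remnants:.1%}")      # 41.5%
```

Under this reading, 323 of the 552 genes have been lost without trace from R. prowazekii, compared with 30 of 834 lost from R. conorii.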

Taken together, the observed phylogenetic relationships and the ongoing degradation of RPEs suggest that they are neither modern-day remnants of an ancestral RNA world (Dwyer 2001) nor the outcome of a recent proliferation within the R. conorii genome (Ogata, Audic, and Claverie 2001). Although it cannot be ruled out that RPEs may in rare cases have conferred novel functional features, the unique presence of RPEs in protein-coding genes of Rickettsia is most likely explained by a reduced selective efficiency of the encoded gene products because of recurrent bottlenecks and small population sizes.


    Acknowledgements
 
We thank M. Thollesson for discussions and assistance with the phylogenetic analyses. This work was funded by the Swedish Natural Sciences Research Council, the Swedish Foundation for Strategic Research, and the Knut and Alice Wallenberg Foundation.


    Footnotes
 
Mark Ragan, Reviewing Editor

Keywords: elongation factors, molecular evolution, phylogeny, RPE, Rickettsia, selfish DNA

Address for correspondence and reprints: Siv G. E. Andersson, Department of Molecular Evolution, Norbyvägen 18C, S-752 36 Uppsala, Sweden. E-mail: siv.andersson@ebc.uu.se


    References
 

    Andersson J. O., S. G. E. Andersson, 1997 Genomic rearrangements during evolution of the obligate intracellular parasite Rickettsia prowazekii as inferred from an analysis of 52,015 bp nucleotide sequence Microbiology 143:2783-2795

    ———. 1999a Genome degradation is an ongoing process in Rickettsia Mol. Biol. Evol 16:1178-1191

    ———. 1999b Insight into the evolutionary process of genome degradation Curr. Opin. Genet. Dev 9:664-671

    ———. 2001 Pseudogenes, junk DNA and the dynamics of Rickettsia genomes Mol. Biol. Evol 18:829-839

    Andersson S. G. E., C. G. Kurland, 1995 Genomic evolution drives the evolution of the translation system Biochem. Cell Biol 73:775-787

    ———. 1998 Reductive evolution of resident genomes Trends Microbiol 6:263-268

    Andersson S. G. E., P. M. Sharp, 1996 Codon usage and base composition in Rickettsia prowazekii J. Mol. Evol 42:525-536

    Andersson S. G. E., D. R. Stothard, P. Fuerst, C. G. Kurland, 1999 Molecular phylogeny and rearrangement of rRNA genes in Rickettsia species Mol. Biol. Evol 16:987-995

    Andersson S. G. E., A. Zomorodipour, J. O. Andersson, T. Sicheritz-Pontén, U. C. M. Alsmark, R. F. Podowski, A. K. Näslund, A.-S. Eriksson, H. H. Winkler, C. G. Kurland, 1998 The genome sequence of Rickettsia prowazekii and the origin of mitochondria Nature 396:133-140

    Bachellier S., J. M. Clement, M. Hofnung, 1999 Short palindromic repetitive DNA elements in enterobacteria: a survey Res. Microbiol 150:627-639

    Cole S. T., K. Eiglmeier, J. Parkhill, et al. (44 co-authors) 2001 Massive gene decay in the leprosy bacillus Nature 409:1007-1011

    Dwyer D. S., 2001 Selfish DNA and the origin of genes Science 291:252-253

    Dwyer D. S., 1998 Assembly of exons from unitary transposable genetic elements: implications for the evolution of protein-protein interactions J. Theor. Biol 194:11-27

    Frank C., H. Amiri, S. G. E. Andersson, 2002 Genome deterioration: loss of repeated sequences and accumulation of junk DNA Genetica (in press)

    Galtier N., M. Gouy, C. Gautier, 1996 SeaView and Phylo_win, two graphic tools for sequence alignment and molecular phylogeny Comput. Appl. Biosci 12:543-548

    Gregory T. R., P. D. Herbert, 1999 The modulation of DNA content: proximate causes and ultimate consequences Genome Res 9:317-324

    Hartl D. L., 2000 Molecular melodies in high and low Nature Rev. Genet 1:145-149

    International Human Genome Sequencing Consortium. 2001 Initial sequencing and analysis of the human genome Nature 409:860-921

    Karlberg O., B. Canbäck, C. G. Kurland, S. G. E. Andersson, 2000 The dual origin of the yeast mitochondrial proteome Yeast Comp. Funct. Genomics 17:170-187

    Keeling P. J., R. L. Charlebois, W. F. Doolittle, 1994 Archaebacterial genomes: eubacterial form and eukaryotic content Curr. Opin. Genet. Dev 4:816-822

    Krawiec S., M. Riley, 1990 Organization of the bacterial chromosome Microbiol. Rev 54:502-539

    Kurland C. G., S. G. E. Andersson, 2000 Origin and evolution of the mitochondrial proteome Microbiol. Mol. Biol. Rev 64:786-820

    Li W.-H., 1993 Unbiased estimation of the rates of synonymous and nonsynonymous substitution J. Mol. Evol 36:96-99

    Lloyd A. T., P. M. Sharp, 1992 CODONS: a microcomputer program for codon usage analysis J. Hered 83:239-240

    Mahillon J., M. Chandler, 1998 Insertion sequences Microbiol. Mol. Biol. Rev 62:725-774

    Mahillon J., C. Leonard, M. Chandler, 1999 IS elements as constituents of bacterial genomes Res. Microbiol 150:675-687

    Ochman H., J. G. Lawrence, E. A. Groisman, 2000 Lateral gene transfer and the rate of bacterial innovation Nature 405:299-304

    Ogata H., S. Audic, J.-M. Claverie, 2001 Response Science 291:252-253

    Ogata H., S. Audic, V. Barbe, F. Artiguenave, P. E. Fournier, D. Raoult, J.-M. Claverie, 2000 Selfish DNA in protein coding genes Science 290:347-350

    Olsen G. J., C. R. Woese, R. Overbeek, 1994 The winds of (evolutionary) change: breathing new life into microbiology J. Bacteriol 175:3893-3896

    Petrov D. A., T. A. Sangster, J. S. Johnston, D. L. Hartl, K. L. Shaw, 2000 Evidence for DNA loss as a determinant of genome size Science 287:1060-1062

    Pretzman C. I., Y. Rikihisa, D. Ralph, J. C. Gordon, S. Bech Nielsen, 1987 Enzyme-linked immunosorbent assay for Potomac Horse Fever disease J. Clin. Microbiol 25:31-36

    Roux V., D. Raoult, 1995 Phylogenetic analysis of the genus Rickettsia by 16S rDNA sequencing Res. Microbiol 146:385-396

    Roux V., E. Rydkina, M. Eremeeva, D. Raoult, 1997 Citrate synthase gene comparison, a new tool for phylogenetic analysis, and its application for the rickettsiae Int. J. Syst. Bacteriol 47:252-261

    Rudd K., 1999 Novel intergenic repeats of Escherichia coli K-12 Res. Microbiol 150:653-664

    Saitou N., M. Nei, 1987 The neighbor-joining method: a new method for reconstructing phylogenetic trees Mol. Biol. Evol 4:406-425

    Sicheritz-Pontén T., S. G. E. Andersson, 1997 GRS: a graphic tool for genome retrieval and segment analysis Microbial Comp. Genomics 2:123-139

    Staden R., 1996 The Staden analysis package Mol. Biotechnol 5:233-241

    Stephens R. S., S. Kalman, C. Lammel, et al. (12 co-authors) 1998 Genome sequence of an obligate intracellular pathogen of humans: Chlamydia trachomatis Science 282:754-759

    Stothard D. R., P. A. Fuerst, 1995 Evolutionary analysis of the spotted fever and typhus groups of Rickettsia using 16S rRNA gene sequences Syst. Appl. Microbiol 18:52-61

    Swofford D. L., 1998 PAUP: phylogenetic analysis using parsimony (and other methods), Version 4 Sinauer Associates, Sunderland, Massachusetts

    Syvänen A.-C., H. Amiri, A. Jamal, S. G. E. Andersson, C. G. Kurland, 1996 A chimeric disposition of the elongation factor genes in Rickettsia prowazekii J. Bacteriol 178:6192-6199

    Thompson J. D., D. G. Higgins, T. J. Gibson, 1994 CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice Nucleic Acids Res 22:4673-4680

    Wächtershäuser G., 1998 Towards a reconstruction of ancestral genomes by gene cluster alignment Syst. Appl. Microbiol 21:473-477

Accepted for publication February 25, 2002.