Evolutionary Genomics of Chromoviruses in Eukaryotes

Benjamin Gorinsek*, Franc Gubensek*,† and Dusan Kordis*

* Department of Biochemistry and Molecular Biology, Jožef Stefan Institute, Ljubljana, Slovenia
† Department of Chemistry and Biochemistry, Faculty of Chemistry and Chemical Technology, University of Ljubljana, Ljubljana, Slovenia

Correspondence: E-mail: dusan.kordis@ijs.si.


    Abstract
The diversity, origin, and evolution of chromoviruses in Eukaryota were examined using the large amount of genome sequence data available for different eukaryotic lineages. A surprisingly large number of novel full-length chromoviral elements were found, greatly exceeding the number of previously known chromoviruses. These new elements are mostly structurally intact and highly conserved. To explain the evolutionary dynamics of chromoviruses in amniotes, chromoviruses in a key amniote lineage, the reptiles, were analyzed by PCR. Phylogenetic analyses provide evidence for a novel centromere-specific chromoviral clade that is widespread and highly conserved in all seed plants. These analyses also show that chromoviral diversity in plants, fungi, and vertebrates is much greater than previously expected. The age of plant chromoviruses has been significantly extended by finding their representatives in the most basal plant lineages, the green and the red algae. The evolutionary origin of chromoviruses has been found to be no earlier than in Cercozoa. The evolutionary history and dynamics of chromoviruses can be explained simply by strict vertical transmission in plants, followed by more complex evolution in fungi and in Metazoa. The currently available data clearly show that chromoviruses indeed represent the oldest and the most widespread clade of Metaviridae.

Key Words: Chromovirus • retrotransposon • gypsy • evolutionary genomics


    Introduction
Retrotransposons are class I transposable elements (TE) that transpose through reverse transcription of an RNA intermediate. They are present in all eukaryotic genomes, where they constitute the most abundant class of mobile DNA. In many cases, they comprise over 50% of the nuclear DNA, a situation that may have arisen in just a few Myr. Retrotransposons play a central role in the structure, evolution, and function of eukaryotic genomes (Bennetzen 2000; Kidwell and Lisch 2001).

Retrotransposons bearing long terminal repeats (LTRs) are classified into vertebrate retroviruses (Retroviridae), hepadnaviruses, caulimoviruses, Ty1/copia (Pseudoviridae), Ty3/gypsy (Metaviridae), BEL, and DIRS1 groups (Malik and Eickbush 2001). Metaviridae populate the genomes of many eukaryotes, including oomycetes (Judelson 2002), slime molds (Glockner et al. 2001), fungi (Neuveglise et al. 2002), plants (Suoniemi, Tanskanen, and Schulman 1998; Marin and Llorens 2000; Friesen, Brandes, and Heslop-Harrison 2001), and animals (Bae et al. 2001; Butler et al. 2001; Volff, Korting, and Schartl 2001). Nine clades of Metaviridae have been recognized on the basis of phylogenetic analyses of the combined reverse transcriptase (RT), ribonuclease H (RH), and integrase (IN) domains (Malik and Eickbush 1999; Bae et al. 2001). Metaviridae have been classified into three genera: Errantivirus, defined by the presence of an envelope (env) gene; Metavirus, defined by its absence; and Chromovirus, defined by the presence of a chromodomain in the integrase (chromointegrase) (Malik and Eickbush 1999; Marin and Llorens 2000).

Chromoviruses (Marin and Llorens 2000) are the most widespread clade of Metaviridae and are present in the genomes of plants, fungi, and vertebrates. In contrast to the numerous full-length chromoviruses identified in fungi (Hamann, Feller, and Osiewacz 2000, and references therein) and in plants (Marin and Llorens 2000, and references therein), the only full-length elements known in vertebrates are from the pufferfish Takifugu rubripes (Poulter and Butler 1998; Butler et al. 2001). In addition to the Takifugu sushi elements, a highly corrupted Hsr1 element from the salamander Hydromantes supramontis has been described (Marracci et al. 1996; Butler et al. 2001). Numerous partial chromoviral sequences from fishes, amphibians, and reptiles are known (Miller et al. 1999; Butler et al. 2001). The low-copy-number Hur1 element, a distant relative of the vertebrate sushi elements, has been discovered in the human genome (Butler et al. 2001). Genome sequence data from model organisms provide a unique opportunity to gain insight into the complete diversity of any TE group. Chromoviruses are the only Metaviridae representatives with a Eukaryota-wide distribution. Because the evolution of chromoviruses is still poorly understood (Marin and Llorens 2000; Butler et al. 2001), we addressed questions regarding their origin, diversity, and evolutionary dynamics in eukaryotic genomes.


    Materials and Methods
Data Mining
All database searches were performed online and were completed in August 2003. The databases analyzed were the nonredundant (NR), EST, GSS, and HTGS databases at the National Center for Biotechnology Information (NCBI) (www.ncbi.nlm.nih.gov). In addition to the Ensembl (www.ensembl.org) databases, we searched the European Bioinformatics Institute (EBI) (www.ebi.ac.uk), DNA Data Bank of Japan (DDBJ) (www.ddbj.nig.ac.jp), Joint Genome Institute (JGI) (www.jgi.doe.gov), the Wellcome Trust Sanger Institute (www.sanger.ac.uk), and the Institute for Genomic Research (TIGR) (www.tigr.org) databases. Diverse taxon-specific genome databases were searched at the JGI (Chlamydomonas reinhardtii, Ciona intestinalis, Phanerochaete chrysosporium, Phytophthora sojae, Populus trichocarpa, Thalassiosira pseudonana, and Xenopus [Silurana] tropicalis) and at the NCBI for all major eukaryotic lineages. To detect all the available sequences corresponding to chromoviruses, database searches were performed iteratively, first using the sequences of the core chromovirus data set (see table 1 in Supplementary Material online) and then the novel chromoviral sequences. At the NCBI Web site, comparisons were performed using different Blast programs (Altschul et al. 1997). At the JGI and other Web sites, the TBlastN program was used with the e-value set at 10⁻⁵ and otherwise default settings. Numerous full-length chromoviral representatives or their separate domains, such as gag, protease, RT/RH, and chromointegrase, were used as queries. The Translate program (www.expasy.org/tools/dna.html) was used to translate chromoviral DNA sequences, and the Blast2 program (www.ncbi.nlm.nih.gov/blast/bl2seq/bl2.html) was used to identify their LTRs. Closely related groups of full-length chromoviruses displaying greater than 90% amino acid similarity between their reverse transcriptases were designated as families.
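The family criterion above (greater than 90% amino acid similarity between reverse transcriptases) can be illustrated with a minimal sketch. It assumes pre-aligned RT sequences, uses plain identity over non-gap columns as a stand-in for similarity (true similarity would also credit conservative substitutions), and joins elements into families by single linkage. Neither the identity measure nor the linkage rule is stated in the text, and the function names and toy sequences are hypothetical.

```python
from itertools import combinations


def pairwise_identity(a: str, b: str) -> float:
    """Fraction of identical residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def group_families(rts: dict[str, str], threshold: float = 0.90) -> list[set[str]]:
    """Single-linkage grouping: two elements share a family if identity > threshold.

    Identity is an assumed stand-in for the 'similarity' used in the text.
    """
    parent = {name: name for name in rts}

    def find(x: str) -> str:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(rts, 2):
        if pairwise_identity(rts[a], rts[b]) > threshold:
            parent[find(a)] = find(b)

    families: dict[str, set[str]] = {}
    for name in rts:
        families.setdefault(find(name), set()).add(name)
    return sorted(families.values(), key=sorted)


# Hypothetical aligned RT fragments (not real chromoviral sequences).
toy_rts = {
    "elemA": "MKLVDGSTRPQWERTYHNLA",
    "elemB": "MKLVDGSTRPQWERTYHNLV",  # 19/20 identical to elemA
    "elemC": "MRLIEGATKPHWDKSFHQIA",  # 8/20 identical to elemA
}
families = group_families(toy_rts)
```

Single linkage means a family can chain together elements that are each above the threshold only with a neighbor; a stricter alternative would require every pair within a family to pass the cutoff.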


Table 1 Novel Chromoviruses Characterized in This Study.

 
Phylogenetic Analysis of Chromoviruses
The amino acid sequences of the combined RT and RH domains of chromoviral elements were aligned using ClustalX (Thompson et al. 1997). In-frame stop codons were designated by X. Gaps in the aligned sequences were removed for the purpose of analysis. Phylogenetic trees were inferred using the neighbor-joining (NJ) method (Saitou and Nei 1987), as implemented in the Treecon (Van de Peer and De Wachter 1997) and MEGA version 2.1 (Kumar et al. 2001) programs. The Drosophila buzzatii Osvaldo element (AJ133521) was used as the outgroup. The significance of the various phylogenetic lineages was assessed by bootstrap analysis. To confirm that the novel elements belong to the chromoviruses, we included representatives of all known Metaviridae clades. The consensus NJ tree (fig. 1) was inferred using the combined RT and RH domains of all known Metaviridae clades (Malik and Eickbush 1999; Bae et al. 2001; Goodwin and Poulter 2002); this circular tree was drawn using the MEGA 2.1 program (Kumar et al. 2001).
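The alignment treatment described above, together with the Poisson-corrected distances named in the legends of figs. 4 to 6, can be sketched as follows. This is an illustrative reimplementation, not the Treecon or MEGA code. It assumes that "removing gaps" means deleting every column containing a gap in any sequence (complete deletion) and keeps X (in-frame stop) as an ordinary residue; function names and toy sequences are hypothetical. The Poisson correction turns the observed proportion p of differing sites into d = -ln(1 - p), allowing for multiple substitutions at the same site.

```python
import math


def remove_gap_columns(aln: dict[str, str]) -> dict[str, str]:
    """Drop every alignment column that contains a gap ('-') in any sequence."""
    rows = list(aln.values())
    length = len(rows[0])
    if any(len(r) != length for r in rows):
        raise ValueError("alignment rows must have equal length")
    keep = [i for i in range(length) if all(r[i] != "-" for r in rows)]
    return {name: "".join(seq[i] for i in keep) for name, seq in aln.items()}


def poisson_distance(a: str, b: str) -> float:
    """Poisson-corrected amino acid distance d = -ln(1 - p)."""
    if not a or len(a) != len(b):
        raise ValueError("need two non-empty sequences of equal length")
    p = sum(x != y for x, y in zip(a, b)) / len(a)
    return math.inf if p >= 1.0 else -math.log(1.0 - p)


def distance_matrix(aln: dict[str, str]) -> tuple[list[str], list[list[float]]]:
    """Gap-free Poisson distance matrix, the kind of input an NJ program clusters."""
    ungapped = remove_gap_columns(aln)
    names = list(ungapped)
    matrix = [[poisson_distance(ungapped[i], ungapped[j]) for j in names] for i in names]
    return names, matrix


# Hypothetical RT/RH fragments; 'X' marks an in-frame stop codon as in the text.
toy_aln = {"s1": "MK-LVX", "s2": "MKALVX", "s3": "MRAIVQ"}
names, d = distance_matrix(toy_aln)
```

Complete deletion is the simplest way to make every pairwise comparison use the same sites; its cost is that a single gapped sequence removes that column for all others.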



FIG. 1. Evolutionary position of chromoviruses among Metaviridae. A circular NJ phylogenetic tree shows the evolutionary relationships among the different clades of the Metaviridae and the evolutionary position of the chromoviruses.

 
PCR Amplification, Cloning, and Sequencing of Chromoviruses
The reptilian genomic DNA samples were those tested previously for non-LTR retrotransposons (Kordiš and Gubenšek 1998). Genomic DNA from additional species was prepared using a standard method of proteinase K digestion, phenol/chloroform extraction, and ethanol precipitation.

The PCR analyses were based on degenerate primers specific for the following conserved motifs: IRPS, YPLP, KTAF (RT domain), and DALS (RH domain) (fig. 2). The sense primers LAS2 (CTKGCCTCTGGGRATYATWWGACC), TVKS (ACBGTYAARAAYAAGTACCCWCTYCC), and GDEW (GGRGATGAGTGGAAGACGGCTTTCAAC) were used in combination with one of the antisense primers, ADS (GKCKGGAKAGGGCATCAGC) or ADAL2 (TCGNGANAGNGCATCNGC). Amplification of the approximately 1-kb fragment was performed on 250 ng of genomic DNA with 1x PCR buffer containing 3 mM magnesium chloride, 200 µM dNTPs, 150 nM of each primer, and 0.7 units of AmpliTaq polymerase (PerkinElmer) in a total reaction volume of 25 µl. The cycling program consisted of initial denaturation at 95°C for 5 min; 35 cycles of denaturation at 95°C for 30 s, annealing at 53°C for 30 s, and extension at 72°C for 80 s; and a final extension at 72°C for 10 min. PCR products were sized on a 1.5% agarose gel, and individual DNA bands in the range of 700 to 1,300 bp were purified on QIAquick columns (Qiagen) and ligated into a plasmid vector with 3'-T overhangs at the insertion site (pGEM-T-Easy, Promega). Both strands of the positive clones were sequenced with the BigDye Ready Reaction Kit and analyzed on an ABI 310 DNA sequencer (Applied Biosystems).
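Because the primers above contain IUPAC ambiguity codes (K, R, Y, W, B, N), each one is a mixture of distinct oligonucleotides. The short sketch below, an illustration not described in the original protocol, computes each primer's degeneracy by expanding the codes and gives the reverse complement of an antisense primer, i.e., the sense-strand site it anneals to. The primer names and sequences are copied from the text; the helper functions are hypothetical.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Complement of each code (e.g. R = A/G pairs with Y = C/T; K = G/T with M = A/C).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

PRIMERS = {
    "LAS2": "CTKGCCTCTGGGRATYATWWGACC",     # sense
    "TVKS": "ACBGTYAARAAYAAGTACCCWCTYCC",   # sense
    "GDEW": "GGRGATGAGTGGAAGACGGCTTTCAAC",  # sense
    "ADS": "GKCKGGAKAGGGCATCAGC",           # antisense
    "ADAL2": "TCGNGANAGNGCATCNGC",          # antisense
}


def degeneracy(primer: str) -> int:
    """Number of distinct sequences encoded by a degenerate primer."""
    n = 1
    for base in primer:
        n *= len(IUPAC[base])
    return n


def expand(primer: str) -> list[str]:
    """All concrete A/C/G/T sequences in the primer mixture."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in primer))]


def reverse_complement(primer: str) -> str:
    """Reverse complement, keeping ambiguity codes as ambiguity codes."""
    return primer.translate(COMPLEMENT)[::-1]


for name, seq in PRIMERS.items():
    print(f"{name}: {len(seq)} nt, {degeneracy(seq)}-fold degenerate")
```

With the sequences as given, LAS2 is 32-fold, TVKS 96-fold, GDEW 2-fold, ADS 8-fold, and ADAL2 256-fold degenerate. The reverse complement of ADS begins GCT GAT GCC (Ala-Asp-Ala), consistent with its placement on the RH-domain motif.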



FIG. 2. Structural organization of chromoviruses. Chromoviruses have long terminal repeats in direct orientation at each end. The genes within the chromovirus encode Gag and Pol proteins. The following enzymatic domains in the pol gene are indicated: PR, protease; RT, reverse transcriptase; RH, RNase H; IN, integrase; and CHR, chromodomain. Typical sequence motifs are shown above the pol gene. The locations and names of the oligonucleotide primers used in this study are shown below the pol gene. Characteristic sequence motifs in the LTRs are shown (5'-TG and CA-3'), as well as direct repeats (DR). Other features shown are the PBS (primer binding site) and PPT (polypurine tract).

 

    Results and Discussion
Identification of Novel Chromoviruses
A comprehensive survey of chromoviruses was conducted using the large amount of genome sequence data available for different eukaryotic lineages. Novel elements were detected by homology-based searching with previously identified plant, fungal, and vertebrate chromoviruses. Fifteen chromoviruses, which together represent the currently known chromoviral diversity, were used as queries for searching sequence databases (see table 1 in Supplementary Material online). Searching was then repeated with the novel chromoviral sequences. More than 100 distinct, novel, full-length chromoviral representatives were identified, the majority from zebrafish (> 30 distinct elements) and angiosperms (> 30 distinct elements) and the remainder from fungi (e.g., Coprinopsis cinerea, Phanerochaete chrysosporium, Ustilago hordei, and others), green algae (Chlamydomonas reinhardtii), and amphibians (Xenopus tropicalis) (table 1).
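The iterative strategy described here, querying first with the core data set and then with each batch of newly found elements until no new hits appear, amounts to a fixed-point search. A minimal sketch follows; `search` is a hypothetical stand-in for a Blast or TBlastN query with an e-value cutoff, and the toy hit table is invented for illustration.

```python
from typing import Callable, Iterable


def iterative_search(
    seeds: Iterable[str],
    search: Callable[[str], set[str]],
    max_rounds: int = 10,
) -> set[str]:
    """Re-query with newly found elements until a round yields nothing new."""
    found = set(seeds)
    queries = set(found)
    for _ in range(max_rounds):
        new_hits: set[str] = set()
        for query in queries:
            new_hits |= search(query) - found
        if not new_hits:
            break  # converged: the last round added no new elements
        found |= new_hits
        queries = new_hits  # only the novel elements are used as the next queries
    return found


# Hypothetical hit table: which database entries each query retrieves.
toy_hits = {
    "core1": {"novelA"},
    "novelA": {"novelB"},
    "novelB": {"core1"},
    "unrelated": set(),
}
result = iterative_search(["core1"], lambda q: toy_hits.get(q, set()))
print(sorted(result))  # ['core1', 'novelA', 'novelB']
```

Re-querying only with the newest hits avoids repeating searches, and `max_rounds` guards against runaway expansion if a permissive cutoff keeps pulling in distant relatives.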

Most of the novel, structurally intact chromoviruses retain the conserved features of LTRs, including the dinucleotide end sequences (5'-TG...CA-3') that are parts of the inverted terminal repeats (fig. 2). We found that most novel chromoviruses are flanked by 5-bp target site duplications. LTR size differs between families, clades, and species; in the majority of cases, the two LTRs of an element share more than 95% identity (table 1). The primer-binding sites (PBSs) of plant chromoviral elements differ from those of fungal and animal elements. In plant chromoviruses, PBSs are complementary to the 3' end of an initiator methionine tRNA (iMet), whereas fungal and vertebrate chromoviruses use a self-priming mechanism to initiate reverse transcription (Butler et al. 2001). All novel chromoviral elements show the same PBS patterns noted above. Additionally, a polypurine tract (PPT), the primer for second-strand cDNA synthesis, was observed upstream of the 3' LTR in the majority of novel chromoviral elements. Chromoviral elements carry either a single ORF, encoding a Gag/Pol polyprotein, or two ORFs, encoding separate Gag and Pol proteins. Gag protein sequences show very low levels of conservation and differ greatly between members of different families (data not shown). The Pol polyprotein shows higher levels of conservation, especially in the RT/RH domains. Most of the novel elements have structurally intact ORFs and LTRs or are slightly degenerate copies with only a few in-frame stop codons, indicating recent transposition. Some chromoviral elements have short LTRs (< 500 bp), whereas some plant chromoviral elements have much longer LTRs (> 1,500 bp). The longest LTRs (4,149 bp) were found in the Ljchromovir-4 element from Lotus japonicus, which, at nearly 13 kb, is also one of the longest currently known chromoviruses.

We found a number of highly conserved elements in Coprinopsis cinerea, in contrast to the highly degenerate chromoviral elements previously reported from Neurospora crassa, Podospora anserina, and Aspergillus nidulans.
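The structural criteria used above to recognize intact elements (5'-TG...CA-3' LTR termini, 5-bp target site duplications, and identity between the two LTRs) can be checked with a few simple functions. This is a minimal sketch assuming the element boundaries are already known (0-based, end-exclusive coordinates) and the two LTRs have already been aligned; the function names and the toy locus are hypothetical.

```python
from typing import Optional


def has_canonical_ends(ltr: str) -> bool:
    """True if an LTR starts with 5'-TG and ends with CA-3'."""
    s = ltr.upper()
    return s.startswith("TG") and s.endswith("CA")


def ltr_identity(ltr5: str, ltr3: str) -> float:
    """Identity of two pre-aligned LTRs over columns without gaps."""
    if len(ltr5) != len(ltr3):
        raise ValueError("LTRs must be aligned to equal length")
    pairs = [(a, b) for a, b in zip(ltr5.upper(), ltr3.upper()) if a != "-" and b != "-"]
    return sum(a == b for a, b in pairs) / len(pairs) if pairs else 0.0


def target_site_duplication(genome: str, start: int, end: int, tsd_len: int = 5) -> Optional[str]:
    """Return the TSD if the tsd_len bases flanking genome[start:end] match, else None."""
    if start < tsd_len or end + tsd_len > len(genome):
        return None  # not enough flanking sequence on one side
    left = genome[start - tsd_len:start]
    right = genome[end:end + tsd_len]
    return left if left.upper() == right.upper() else None


# Toy locus: a 5-bp duplication (ATCGA) flanking a miniature "element".
locus = "GGG" + "ATCGA" + "TGAAACA" + "ATCGA" + "CCC"
element_start, element_end = 8, 15
print(has_canonical_ends(locus[element_start:element_end]))        # True
print(target_site_duplication(locus, element_start, element_end))  # ATCGA
print(ltr_identity("TGCAATCA", "TGCATTCA"))                        # 0.875
```

In a real analysis the two LTRs would first be aligned (for example with Blast2, as in Materials and Methods); values above 0.95 correspond to the near-identical LTR pairs reported here.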

For some taxonomic groups, such as Cercozoa (the plasmodiophorid Polymyxa betae), Rhodophyta (Porphyra yezoensis), and Bryophyta (Physcomitrella patens), no full-length chromoviral elements could be obtained from the sequence databases. In all these cases, however, the most characteristic part of chromoviruses, the chromointegrase, can easily be recognized (fig. 3). No chromoviruses have been found in the genome databases of the most basal eukaryotic lineages (Diplomonadida, Euglenozoa, and Alveolata). Among Metazoa, no chromoviruses can be found in the genome databases of Protostomia or of the most basal Deuterostomia lineages (Echinodermata and Urochordata). Consistent with the abundance of retrotransposons in plants and teleost fishes, their genomes harbor many novel full-length chromoviruses.



FIG. 3. Novel chromointegrases are present in the basal eukaryotic lineages. This alignment of novel chromointegrases was constructed with the program ClustalX (Thompson et al. 1997). Identical residues are in black, and conservative substitutions are in gray. Most sequences were obtained from GenBank; species and accession numbers are included. Chlamydomonas reinhardtii and Phytophthora sojae sequences were obtained from the corresponding JGI genome Web sites.

 
Chromoviruses in the Kingdom Plantae
Chromoviral Diversity in Plant Genomes
The search for chromoviruses in complete genomes (Oryza sativa and Arabidopsis thaliana) and numerous large genomic sequences from different angiosperms (e.g., Medicago truncatula, Lotus japonicus, and Zea mays) revealed a very diverse chromoviral repertoire in angiosperm genomes. The NJ tree (fig. 4) shows at least five distinct monophyletic clades in plant genomes, three of which have been previously reported (Marin and Llorens 2000). We renamed these three clades as follows: the previously named ancestral lineage IV was renamed the Reina clade, ancestral lineage II the Tekay clade, and ancestral lineage III the Galadriel clade. The novel elements significantly increase the number of full-length elements known in these clades (fig. 4).



FIG. 4. Diversity of chromoviruses in Viridiplantae. A rooted NJ tree using the Poisson correction model and the D. buzzatii Osvaldo element as the outgroup. The NJ tree represents the bootstrap consensus after 1,000 replicates; nodes with confidence values greater than 50% are indicated. Most sequences were obtained from GenBank; genus names and accession numbers are included. Chlamydomonas reinhardtii and Populus balsamifera ssp. trichocarpa sequences were obtained from the corresponding JGI genome Web sites. For the annotated sequences, we added the name of the element. Plant chromoviral clades were named after typical elements (mostly from maize). The two novel plant chromoviral clades are in bold. The branches of the novel Lotus japonicus chromoviruses are in bold to show their large chromoviral diversity. Novel chromoviral elements found in this study are in bold.
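The bootstrap consensus values in the legend above come from resampling alignment columns with replacement (1,000 replicates) and counting how often each clade reappears in trees built from the replicates. The sketch below illustrates only the resampling and counting; tree building is delegated to a caller-supplied function, `build_clades`, a hypothetical placeholder for an NJ program. The toy `closest_pair` function and the sequences are invented for illustration.

```python
import random
from typing import Callable


def bootstrap_replicate(aln: dict[str, str], rng: random.Random) -> dict[str, str]:
    """Resample alignment columns with replacement, keeping sequences in register."""
    length = len(next(iter(aln.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in cols) for name, seq in aln.items()}


def clade_support(
    aln: dict[str, str],
    build_clades: Callable[[dict[str, str]], set[frozenset[str]]],
    clade: frozenset[str],
    n_reps: int = 1000,
    seed: int = 1,
) -> float:
    """Fraction of bootstrap replicates whose tree contains the given clade."""
    rng = random.Random(seed)
    hits = sum(clade in build_clades(bootstrap_replicate(aln, rng)) for _ in range(n_reps))
    return hits / n_reps


def closest_pair(aln: dict[str, str]) -> set[frozenset[str]]:
    """Toy stand-in for tree building: report the closest pair as the only clade."""
    names = sorted(aln)
    pairs = [
        (sum(x != y for x, y in zip(aln[p], aln[q])), p, q)
        for i, p in enumerate(names)
        for q in names[i + 1:]
    ]
    _, p, q = min(pairs)
    return {frozenset({p, q})}


toy = {"a": "MKLVDG", "b": "MKLVDG", "c": "RQIWEH"}
support = clade_support(toy, closest_pair, frozenset({"a", "b"}), n_reps=100)
```

Under the legend's convention, a node would be labelled only if its support exceeded 0.5.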

 
The fourth, novel centromere-specific clade was named the CRM clade, after the Zea mays CRM element. The insertion specificity of these elements for centromeres has been confirmed through FISH analyses (Miller et al. 1998) and genome sequences of large centromeric regions (Hudakova et al. 2001). The centromere-specific chromoviruses from angiosperms, represented by previously reported monocot elements (Hordeum vulgare cereba, Zea mays CRM, and Oryza sativa RIRE7) and by the novel CRM-like elements from dicots found here in the genomes of Arabidopsis thaliana, Lotus japonicus, Brassica oleracea, and Medicago truncatula, are highly conserved, much more so than the members of any other plant chromoviral clade. Searching the NR and EST databases confirmed the presence of highly conserved CRM-like elements in gymnosperm genomes, represented by Picea glauca (AF229251) and Pinus taeda (BQ290691) (fig. 4). Indeed, the centromeric localization of an RT fragment from Pinus pinaster (clone pPpgy1) has been demonstrated by FISH analysis (Friesen, Brandes, and Heslop-Harrison 2001). Knowledge of the distribution of centromere-specific chromoviruses in plants was previously very limited, and they were believed to be cereal specific (Miller et al. 1998; Langdon et al. 2000; Hudakova et al. 2001; Kumekawa et al. 2001). We have considerably extended their known range, showing their widespread presence in angiosperms and gymnosperms and thus revealing that the CRM clade is indeed widespread in seed plants (Spermatophyta) (fig. 4). It would not be surprising if they were also found in much older lineages of land plants, such as mosses, ferns, and liverworts.

The fifth plant chromoviral clade was found in the green algae, in the Chlamydomonas reinhardtii genome (fig. 4). In the most basal plant lineage, the red algae, we found a sixth chromoviral clade (fig. 3), but the lack of RT/RH sequences prevents its inclusion in the phylogenetic analysis.

Among plant chromoviruses, large differences can be seen in the size of the elements and their LTRs. The Tekay clade contains the largest chromoviruses, which can reach 14,522 bp (Sorghum bicolor element Leviathan-1 [AY144442]), and element size varies widely, from 7.7 kb (Arabidopsis thaliana Legolas) to 14.5 kb (Leviathan-1). The size of their LTRs is also quite variable, from 1.3 kb (Legolas) to 4.1 kb (Lotus japonicus, Ljchromovir-4), the majority being around 2.5 kb. The Reina, Galadriel, and CRM clades have elements of similar size, all in the range of 5 to 7 kb, but they differ in the sizes of their LTRs. The shortest LTRs are found in the Reina clade (300 to 400 bp), intermediate sizes in the Galadriel clade (600 to 800 bp), and the largest in the CRM clade (850 to 1,100 bp). The elements in the green algae-specific Chlamyvir clade are 7.5 kb long but have relatively short LTRs (~400 bp).

The enormous size of many plant genomes demonstrates a great tolerance for repetitive DNA, a substantial fraction of which is composed of retrotransposons. The genome sequence data show that chromoviruses have been very successful in plants. Big differences in chromoviral repertoires can be seen between species with small and large genomes. The phylogenetic analysis (fig. 4) shows the presence of three different chromoviral clades in the small genome of Arabidopsis thaliana, but they do not contain numerous diverse families (Marin and Llorens 2000). The rice genome harbors much higher copy numbers of distinct chromoviral elements than the compact genome of Arabidopsis thaliana (McCarthy et al. 2002).

We found numerous structurally intact and diverse chromoviral elements in the partial genome sequence data from the legume Lotus japonicus, which has a genome size of 450 Mb (table 1). The diversification of particular chromoviral clades into numerous distinct families is apparent in the Lotus japonicus genome (fig. 4). Most of these families possess completely different LTRs and exhibit numerous differences in their ORFs (table 1). A very similar situation can be seen in rice (McCarthy et al. 2002). In contrast to the diverse chromoviral repertoire in the Lotus japonicus genome, another legume species with a similar genome size, Medicago truncatula, shows much lower chromoviral diversity. Although more than 10 times as many BAC clones are currently available for Medicago truncatula as for Lotus japonicus, an unusually small number of distinct full-length elements can be found in it.

Plant Chromoviruses Have Different Insertion Specificities
As a result of the potentially deleterious effects of TE proliferation, host organisms have developed strategies that limit the activity of TEs (Okamoto and Hirochika 2001). TEs, in turn, direct their integration to specific parts of the genome, which minimizes the damage they can cause. In plant species with smaller genomes, such as Arabidopsis thaliana, retrotransposons make up a very small percentage of the genome, perhaps less than 5%, and cluster in the gene-poor pericentromeric regions (Copenhaver et al. 1999). The majority of plant LTR retrotransposons insert preferentially into relatively silent chromatin, especially in pericentromeric regions and centromeres (Kumar and Bennetzen 1999), thereby reducing the risk of negative selection operating on TEs (Kidwell and Lisch 2001).

The centromeric localization of the novel plant CRM clade has been clearly shown by FISH analyses (Miller et al. 1998; Hudakova et al. 2001; Kumekawa et al. 2001; Langdon et al. 2000; Friesen, Brandes, and Heslop-Harrison 2001), in contrast to all the other plant LTR retrotransposons, which show dispersed patterns. The plant CRM clade and the other plant chromoviral clades show clear differences in their chromointegrase sequences. They differ in otherwise conserved sequence motifs in the C-terminal region of the integrase, such as the HPVFHS motif, and in two motifs of the chromodomain (see figure 1 in Supplementary Material online). Changes in these motifs might be responsible for the specific targeting of CRM retroelements into centromeric regions, because the C-terminal regions of LTR retrotransposon integrases have been shown to be responsible for targeting and insertion specificities (Nymark-McMahon and Sandmeyer 1999; Singleton and Levin 2002). Because such experimental proof has not yet been provided for any plant chromoviral element, the crucial sequence motifs responsible for targeting to centromeres might first be recognized from comparisons of chromointegrases, as was done for the restriction enzyme–like endonucleases in the R2 clade of non-LTR retrotransposons, where the predictions were subsequently confirmed experimentally (Yang, Malik, and Eickbush 1999).
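The comparative approach proposed above, locating a conserved integrase motif such as HPVFHS in a reference chromointegrase and listing the substitutions at the same alignment columns in other elements, can be sketched as follows. The aligned fragments are invented toy sequences, not real CRM or Tekay integrases, and the function name is hypothetical; a real comparison would start from a ClustalX alignment of full chromointegrases.

```python
def motif_variants(
    aln: dict[str, str], ref_name: str, motif: str = "HPVFHS"
) -> dict[str, tuple[str, list[str]]]:
    """Extract the alignment columns carrying `motif` in the reference row and
    list substitutions (e.g. 'V3I' = motif position 3, V replaced by I) in every row."""
    start = aln[ref_name].find(motif)
    if start < 0:
        raise ValueError(f"{motif} not found in {ref_name}")
    end = start + len(motif)
    variants = {}
    for name, row in aln.items():
        window = row[start:end]
        diffs = [f"{m}{i + 1}{w}" for i, (m, w) in enumerate(zip(motif, window)) if m != w]
        variants[name] = (window, diffs)
    return variants


# Invented aligned C-terminal integrase fragments (not real sequences).
toy_int = {
    "reference": "GGHPVFHSKL",
    "variant": "GGHPIFHNKL",
}
for name, (window, diffs) in motif_variants(toy_int, "reference").items():
    print(name, window, diffs)
```

Applied clade by clade, substitutions shared by all CRM elements but absent from the other plant clades would be the first candidates for the experimental tests the text calls for.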

Chromoviruses Are Transcriptionally Active in Plant Genomes
The majority of retrotransposons are quiescent in somatic tissues but are activated under different stress conditions (Okamoto and Hirochika 2001). Numerous ongoing plant EST sequencing projects enable the analysis of transcripts of known chromoviruses. Searches of the plant EST databases, using representatives of all seed plant chromoviral clades and the oldest full-length chromoviral representative (Crchromovir-1 from Chlamydomonas reinhardtii), show numerous matches (see table 2 in Supplementary Material online). We found that chromoviruses are underrepresented in the EST databases and that only a few families from each species are transcribed (Lotus japonicus, Arabidopsis thaliana, Oryza sativa, Zea mays, and Medicago truncatula). EST matches to the plant chromoviral clades were found in all the available Spermatophyta species and, for one group, also in mosses (Physcomitrella patens). EST matches were also found for green (Chlamydomonas reinhardtii) and red (Porphyra yezoensis) algae, indicating that transcriptionally active chromoviruses are present in all plant genomes. Numerous high-score EST matches to the LTRs or internal regions of chromoviral elements were found in stressed plants or tissues, indicating chromoviral activation under biotic (fungal infection) and abiotic (drought or cold) stress conditions (see table 2 in Supplementary Material online).


Table 2 Vertebrate Species That Were Tested for the Presence of Chromoviruses by PCR.

 
Chromoviruses in the Kingdom Fungi
Diversity of Chromoviruses in Fungal Genomes
Fungal genomes are small, mostly in the range of 10 to 30 Mb; some more basal fungal lineages have considerably larger genomes, but these are still much smaller than plant genomes. We found quite a large number of novel chromoviruses in diverse fungal genomes, such as those of basidiomycetes (Coprinopsis cinerea, Phanerochaete chrysosporium, and Ustilago hordei), ascomycetes (Coccidioides posadasii, Histoplasma capsulatum, and Pneumocystis carinii), and zygomycetes (Rhizopus oryzae).

The topology of the fungal chromoviral tree shows several basidiomycete-specific and ascomycete-specific clades, but no clade contains representatives from both phyla (fig. 5). Ascomycetes contain six chromoviral clades (Maggy, MGRL3, Pyret, Coccy, Tf1, and Ty3), whereas basidiomycetes contain four (MarY1, Smut, Tcn1, and Tcn2). We have significantly increased the number of full-length elements known in ascomycetes and, especially, in basidiomycetes (fig. 5).



FIG. 5. Diversity of chromoviruses in Fungi. A rooted NJ tree using the Poisson correction model and the D. buzzatii Osvaldo element as the outgroup. The NJ tree represents the bootstrap consensus after 1,000 replicates; nodes with confidence values greater than 50% are indicated. Novel chromoviral elements found in this study are in bold. Most sequences were obtained from GenBank; genus names and accession numbers are included. Phanerochaete chrysosporium sequences were obtained from the JGI genome Web site. For the annotated sequences, we added the name of the element. Fungal chromoviral clades were named after typical elements. Histoplasma capsulatum sequences were obtained from the UWGC Web site (http://genome.wustl.edu) and Pneumocystis carinii sequences from the Pneumocystis genome project Web site (http://pneumocystis.uc.edu). Cryptococcus neoformans elements belonging to the Tcn1 and Tcn2 clades were obtained from the Retrobase Web site (http://biocadmin.otago.ac.nz/Retrobase/home.html). Asterisks denote the manually assembled Rhizopus oryzae chromoviral sequences.

 
As previously reported, the majority of ascomycete chromoviral elements are vertically inactivated, containing numerous stop codons (Hamann, Feller, and Osiewacz 2000). Apparently, fungi have a much lower tolerance for transposable elements than plants. Indeed, the majority of fungal TEs, including chromoviruses, are inactivated through repeat-induced point mutation (RIP) or methylation induced premeiotically (MIP) (Selker 2002; Daboussi et al. 2002). Currently available genome data for the basal fungal lineages are very limited (e.g., Rhizopus oryzae, Zygomycetes), but we expect that Zygomycetes and Chytridiomycetes harbor more chromoviruses in their larger genomes than ascomycetes and basidiomycetes do.
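Whether a copy is structurally intact, slightly degenerate with a few in-frame stop codons, or heavily inactivated like many ascomycete elements can be assessed by counting premature stop codons in the expected reading frame. A minimal sketch, assuming the ORF start (and hence the frame) is known and the standard genetic code applies; the function name and toy sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def premature_stops(orf: str, frame: int = 0) -> list[int]:
    """0-based codon indices of in-frame stop codons, ignoring a terminal stop."""
    seq = orf.upper()
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    stops = [i for i, codon in enumerate(codons) if codon in STOP_CODONS]
    if stops and stops[-1] == len(codons) - 1:
        stops.pop()  # the ORF's own terminating stop is not a defect
    return stops


# Toy ORF: ATG AAA TAG CCC TGA GGG TAA
toy_orf = "ATGAAATAGCCCTGAGGGTAA"
print(premature_stops(toy_orf))  # [2, 4]
```

A copy with an empty result would count as intact, one with a handful of stops as slightly degenerate; RIP-affected copies typically accumulate many, because C-to-T transitions readily create TAG, TAA, and TGA codons.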

Insertion Specificities of Fungal Chromoviruses
Retrotransposons insert into the host genome by targeting integration into "safe havens," thereby avoiding the disruption of coding sequences. Although 60% of the genome of Schizosaccharomyces pombe is coding sequence, all Tf1 insertions occur in intergenic regions. It has been found experimentally that Tf1 preferentially inserts into intergenic regions that include RNA polymerase II promoters. Integration of Tf1 is biased toward promoter-proximal regions of genes, 100 to 400 bp upstream of the translation start site. Recently active Tf1 elements were found to be absent from centromeres and pericentromeric regions of the genome containing tandem tRNA gene clusters. Chromosome III has twice the density of insertion events observed in the other two chromosomes (Singleton and Levin 2002; Bowen et al. 2003). The S. cerevisiae retrotransposon Ty3 inserts specifically into the initiation sites of genes transcribed by RNA polymerase III (Jordan and McDonald 1999). Data relating to the insertion specificities of other fungal chromoviruses are very limited. Insertion specificities for the centromeres have been reported for Neurospora crassa (Cambareri, Aisner, and Carbon 1998) and for Aspergillus nidulans (Nielsen, Hermansen, and Aleksenko 2001), but no FISH studies have been made to show the preference of any fungal chromovirus for the centromeres.
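The Tf1 preference for sites 100 to 400 bp upstream of the translation start can be tested on mapped insertions with simple strand-aware arithmetic. The sketch assumes forward-strand coordinates for both the insertion site and the first base of the start codon; the function names and coordinates are hypothetical, and the window limits are the values cited above.

```python
def upstream_distance(insertion: int, start_codon: int, strand: str) -> int:
    """Distance of an insertion upstream of a gene's translation start.

    Positive values lie upstream (5') of the start codon; negative values
    lie downstream, i.e. inside or beyond the gene.
    """
    if strand == "+":
        return start_codon - insertion
    if strand == "-":
        return insertion - start_codon
    raise ValueError("strand must be '+' or '-'")


def in_tf1_window(insertion: int, start_codon: int, strand: str,
                  low: int = 100, high: int = 400) -> bool:
    """True if the insertion lies 100 to 400 bp upstream of the start codon."""
    return low <= upstream_distance(insertion, start_codon, strand) <= high


print(in_tf1_window(9_800, 10_000, "+"))   # True: 200 bp upstream
print(in_tf1_window(10_200, 10_000, "+"))  # False: inside the gene
print(in_tf1_window(5_250, 5_000, "-"))    # True: 250 bp upstream on the minus strand
```

Tallying the fraction of insertions that fall in the window, compared with randomly placed sites, would give a simple measure of the promoter-proximal bias.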

Chromoviruses Are Transcriptionally Active in Fungal Genomes
Searching with known fungal chromoviral elements shows their presence in the fungal section of the EST database. For example, searching with the MAGGY element shows numerous high-score matches to Magnaporthe grisea ESTs, and the same was found with other elements from Magnaporthe. We found that chromoviruses are underrepresented in the EST databases: only a few families in Magnaporthe grisea are transcribed (see table 2 in Supplementary Material online). Searching with the Cgret element, the only characterized chromovirus from Colletotrichum gloeosporioides, reveals novel elements in C. trifolii with 46% identity to the Cgret element, indicating that Colletotrichum trifolii also contains several distinct chromoviral elements. In the fungal section of the EST database, significant matches were found for only a few ascomycete fungi (Magnaporthe grisea, Colletotrichum trifolii, S. pombe, S. cerevisiae, Blumeria graminis, and Paracoccidioides brasiliensis). This bias is caused by the lack of adequate EST sequence data for the basidiomycetes and the other, more basal fungal lineages (Chytridiomycetes and Zygomycetes). The presence of transcripts of already known elements and of novel diverse elements (Colletotrichum trifolii) in the EST database suggests that chromoviruses are transcriptionally active in fungi. They have been found to be expressed in germline tissues (conidia) and in growing tissues (appressoria and young, actively growing mycelia) (see table 2 in Supplementary Material online).

Chromoviruses in Metazoa

Chromoviral Diversity in Vertebrata
Until now, the extent of chromoviral diversity in vertebrates was known only for Takifugu rubripes, a teleost fish with a very compact genome (Butler et al. 2001). It now becomes clear that chromoviral diversity is considerably greater in species with larger genomes. In the nearly complete genome of the zebrafish, which is five times larger than that of the pufferfish, we found a very diverse chromoviral repertoire containing more than 30 distinct full-length elements (fig. 6). The zebrafish elements are structurally intact and highly conserved; even their LTRs are, in most cases, either identical or no more than 5% divergent, indicating recent retrotransposition (table 1).



FIG. 6. Diversity of chromoviruses in Vertebrata. Rooted NJ tree constructed using the Poisson correction model, with the D. buzzatii Osvaldo element as outgroup. The NJ tree represents the bootstrap consensus after 1,000 replicates; nodes with confidence values greater than 50% are indicated. The novel consensus PCR-amplified RT/RH sequences obtained in this study are in bold italics. Novel chromoviral elements from teleost fishes and X. tropicalis found in this study are in bold. Most sequences were obtained from GenBank; genus names and accession numbers are included. Xenopus tropicalis sequences were obtained from the corresponding JGI genome Web site. For annotated sequences, the name of the element is added.

 
The availability of genome sequence data for two species with very different genome sizes provides insight into the diversity of chromoviruses in compact and normal-sized teleost genomes. In the compact genome of Takifugu rubripes, only three families of sushi elements exist (Butler et al. 2001), whereas more than 30 distinct chromoviral families can be seen in the zebrafish genome (table 1 and fig. 6), which also differ greatly in their copy numbers (data not shown). Our observations on chromoviral diversity and conservation in zebrafish thus differ from the tentative conclusions of Butler et al. (2001), which were based on the limited data then available. In compact vertebrate genomes, retrotransposon families appear to have low copy numbers and to be few in number, suggesting that reduced TE diversity is typical of compact genomes.

The zebrafish genome is very informative because it shows, for the first time, the complete chromoviral diversity of a teleost fish. Teleost fishes have genome sizes that are, in most cases, only one-third that of humans. In contrast, salamanders possess some of the largest genomes. Given the very high copy number of the Hsr1 element (Marracci et al. 1996), we can expect an extremely diverse, high-copy-number chromoviral repertoire in their huge genomes. Chromoviral diversity is much more difficult to assess in the other vertebrate groups. Sequence data from amphibians, represented by the Xenopus tropicalis genome database and the X. laevis and X. tropicalis EST databases, show the presence of full-length and diverse chromoviral elements in Xenopus tropicalis, but the real level of diversity will become apparent only from its complete genome sequence.

Chromoviruses in Amniota
The amniotes (Amniota) are a monophyletic group of vertebrates, comprising reptiles, birds, and mammals, whose embryos develop within a membrane called the amnion. The absence of chromoviruses from birds and mammals has been noted previously, whereas protease (PR)/RT fragments have been amplified from a few reptilian species (Miller et al. 1999). Although chromoviral elements may constitute the major part of the salamander genome (Marracci et al. 1996), they have clearly disappeared from some Amniota lineages. Reptiles are, therefore, a crucial taxonomic group for understanding chromoviral evolution in Amniota.

This key position of reptiles prompted a more extensive investigation of chromoviral distribution in this group. An approximately 1.0-kb fragment encoding RT and RH was amplified by PCR using degenerate oligonucleotide primers. Chromoviral distribution was analyzed in 15 reptilian species, including representatives of crocodiles, turtles, and squamates (table 2). Chromoviruses were detected in the earliest reptilian lineage, the Squamata (snakes and lizards), and in crocodiles, but not in the genomes of turtles. To confirm that the amplified PCR products were chromoviruses and to analyze their evolutionary relationships, they were cloned and sequenced from 14 reptilian species. In all reptilian chromoviral sequences, we found stop codons and short indels, the latter causing frameshift mutations or "in-frame" stop codons (fig. 7).



FIG. 7. Novel RT/RH sequences of the reptilian chromoviruses. The alignment was constructed with the program ClustalX (Thompson et al. 1997). Identical residues are in black and conservative substitutions are in gray.

 
In contrast to the numerous full-length chromoviral elements in teleost (Danio rerio) and amphibian (Xenopus tropicalis) genomes, no full-length chromoviral elements can be found in Amniota genomes. Searches of all publicly available avian (chicken, Gallus gallus) and mammalian (monotremes, marsupials, and numerous eutherian species) genome databases revealed no chromoviruses in these genomes. If reptilian genomes contained conserved and potentially active chromoviral elements, these would have been amplified by PCR, because the degenerate primers used were based on conserved motifs. The numerous in-frame stop codons in the majority of reptilian chromoviral sequences therefore indicate that active elements are absent from reptilian genomes. Whereas squamates and crocodiles still harbor vertically inactivated chromoviral elements, these elements have been lost stochastically in birds and mammals. A pattern of rapid vertical inactivation and stochastic loss of chromoviruses is thus evident in the genomes of Amniota.
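The inference from in-frame stop codons can be illustrated with a short scan of a cloned fragment. This is a minimal sketch, not the authors' procedure: it assumes each RT/RH fragment is supplied as a DNA string in the coding orientation and that the standard genetic code applies (stop codons TAA, TAG, and TGA). The function names `stop_codon_positions` and `has_open_frame` are hypothetical.

```python
from __future__ import annotations

# Stop codons of the standard genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_codon_positions(seq: str, frame: int = 0) -> list[int]:
    """0-based offsets of stop codons in one forward reading frame of `seq`."""
    if frame not in (0, 1, 2):
        raise ValueError("frame must be 0, 1, or 2")
    seq = seq.upper().replace("U", "T")
    # Step through complete codons only; a trailing partial codon is ignored.
    return [i for i in range(frame, len(seq) - 2, 3) if seq[i:i + 3] in STOP_CODONS]


def has_open_frame(seq: str) -> bool:
    """True if at least one forward frame contains no stop codon."""
    return any(not stop_codon_positions(seq, f) for f in range(3))
```

For example, `stop_codon_positions("ATGTAAGGG")` returns `[3]`, while `"TAACTAGCTGA"` has a stop codon in every forward frame, so `has_open_frame` returns `False` for it. A fragment that is closed in all frames cannot encode a continuous RT/RH polypeptide. Checking only forward frames is a deliberate simplification; a fragment of unknown orientation would also need its reverse complement scanned.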

Hur1-like Elements in the Mammalian Genomes
Butler et al. (2001) identified the full-length Hur1 element in the human genome. We obtained additional full-length representatives from the mouse, rat, and sheep genomes (see figure 2 in Supplementary Material online). Using the human KIAA1051 protein (which contains Gag and PR domains) as a query, we found novel representatives in chimpanzee (AC094111), macaque (AB060816), baboon (AC092529), cat (AC108197), dog (AC110669), cow (AC144997), pig (BF079738), and rat (XM_228548, AI599367). Such a widespread distribution of Hur1 elements across diverse mammalian orders, including Primates, Cetartiodactyla, Carnivora, and Rodentia, indicates that they originated more than 100 MYA. The novel full-length Hur1 elements contain gag, PR, and RT domains but lack LTRs and RH and chromointegrase sequences. Hur1 elements are present in very low copy number in eutherian mammals. They are highly conserved in mammals, and recent studies have shown that their expression is developmentally regulated (Ono et al. 2001; Shigemoto et al. 2001). There is increasing evidence that Hur1 elements represent a new example of mammalian genes derived from TEs and may indeed be important regulatory genes (Volff, Korting, and Schartl 2001).

Chromoviruses Are Transcriptionally Active in Vertebrate Genomes
Searching the vertebrate sections of EST databases with different chromoviral representatives yields numerous matches to various teleost fishes (Danio rerio, Oryzias latipes, and salmonid fishes) and to amphibians (Xenopus laevis). This indicates that, among vertebrates, transcriptionally active chromoviruses are present at least in teleost fishes and amphibians. Chromoviruses are nevertheless underrepresented in the vertebrate section of the EST databases, because only a few chromoviral families in zebrafish are transcribed. Chromoviral transcripts have been found in germline and embryonic tissues. Transcripts of Hur1 elements have been found in human cancer cell lines and in embryonic and germline tissues of the mouse and rat (see table 2 in Supplementary Material online).

Phylogenomic Analysis of Chromoviruses in Eukaryota
Kingdom Plantae
The evolution of chromoviruses can be described very simply using the phylogenomic approach (Eisen and Hanawalt 1999), in which knowledge of the correct taxonomy is crucial. Although plants are a very diverse group of eukaryotes, genome sequence data are limited largely to economically important angiosperms (cereals), which prevents an unbiased view of chromoviral evolutionary history and dynamics in plants. Phylogenetic analysis indicates that the genomes of land plants contain four clades of plant chromoviruses, which emerged much earlier than previously thought (Marin and Llorens 2000). The presence of all four chromoviral clades in the genomes of Coniferophyta, evident from the phylogenetic analysis of plant chromoviral RT sequences (data not shown), indicates that clade diversification occurred before the radiation of Spermatophyta. The absence of distinct chromoviral clades in green algae suggests that the clades emerged after the separation of the green algae from the lineage leading to Streptophyta (fig. 4). The finding of chromoviruses in both red and green algae significantly extends their known distribution in plants, because they were previously believed to be present only in Spermatophyta (Kumar and Bennetzen 1999; Marin and Llorens 2000). The finding of chromointegrases in the red algae (fig. 3), the most basal plant lineage, which originated around 1.5 billion years ago (Hedges 2002), shows the very early origin and old age of chromoviruses in the kingdom Plantae.

If chromoviruses were strictly vertically transferred in the plant kingdom, we would expect to find them in all major plant lineages. In this study, we observed chromoviruses in most of the plant lineages for which sufficient genome or EST data are available (figs. 4 and 8A; also see table 3 in Supplementary Material online), including the most ancestral plant lineages, the red and green algae. Among land plants, we found chromoviruses in mosses (Bryopsida), ferns (Filicopsida), gymnosperms (Cycadales, Ginkgoales, and Coniferales), and flowering plants (angiosperms). Such a widespread distribution confirms the ubiquity of chromoviruses in the kingdom Plantae and supports their evolution by strict vertical transfer.



FIG. 8. Phylogenetic distribution of chromoviruses in the kingdom Plantae (A), in unikonts (B), and in eukaryotes (C). Taxonomic groups with available genome data are shown in bold. The presence or absence of chromoviruses is indicated by (+) or (–), while a question mark (?) indicates a paucity of sequence data. (A) This representation of the phylogenetic relationships between the major plant groups is based on the data of Soltis and Soltis (2003) and Hedges (2002). Branches in bold represent the presence of chromoviruses. (B) This representation of the phylogenetic relationships between the unikonts (Amoebozoa + Fungi + Animalia) is based on the data of Stechmann and Cavalier-Smith (2003) and Hedges (2002). An asterisk indicates the presence of Hur1 elements. Branches in bold represent the presence of chromoviruses, whereas dotted branches represent their absence. (C) A representation of the phylogenetic relationships between the eukaryotes based on the data of Baldauf (2003), Cavalier-Smith and Chao (2003), and Hedges (2002). The presence of chromoviruses in a taxonomic group is shown in bold, whereas the absence of chromoviruses (in taxonomic groups with genome data) is shown in outlined font.

 
The majority of chromoviral sequence data come from different angiosperm lineages. We found chromoviruses in the most basal extant angiosperm lineages, represented by the Amborellales and Nymphaeales. Among the eudicotyledons, we found them in the genomes of numerous orders, such as Asterales, Apiales, Solanales, Caryophyllales, Fabales, Malpighiales, Rosales, Cucurbitales, Brassicales, Malvales, Sapindales, Lamiales, Vitales, and Ranunculales (fig. 4; also see table 3 in Supplementary Material online). Such a widespread distribution clearly shows that chromoviruses are a ubiquitous component of eudicot genomes. Among the monocotyledons (Liliopsida), chromoviruses are present in different orders, such as Poales, Zingiberales, Liliales, and Asparagales, showing that they are a ubiquitous genome component of monocots as well (fig. 4; also see table 3 in Supplementary Material online). It should be noted that, for the majority of plant taxonomic groups (red algae, green algae, glaucophytes, hornworts, liverworts, lycophytes, ferns, and gymnosperms), very limited or no sequence data exist.

Kingdom Fungi
Fungi are divided into six phyla: Microsporidia, Chytridiomycota, Zygomycota, Glomeromycota, Basidiomycota, and Ascomycota. If chromoviruses in the kingdom Fungi were strictly vertically transferred, we would expect to find them in all major fungal lineages. We observed them in most fungal phyla (figs. 5 and 8B; also see table 3 in Supplementary Material online), except in the phylum Chytridiomycota, for which there are insufficient sequence data. In the phylum Microsporidia (parasitic fungi with compact genomes), we observed chromoviruses in Spraguea lophii but not in the Encephalitozoon cuniculi genome. In the phylum Zygomycota, we found chromoviruses in Rhizopus oryzae, and in the phylum Glomeromycota, in Glomus intraradices.

The phylum Basidiomycota is divided into three classes, and we found chromoviruses in representatives of all three. In the class Hymenomycetes, we found them in both the Heterobasidiomycetes and the Homobasidiomycetes; we also found them in the classes Ustilaginomycetes and Urediniomycetes. Their presence in all three classes confirms their widespread distribution in the phylum Basidiomycota. It should be noted, however, that chromoviral repertoires differed greatly between basidiomycete species, from very rich and diverse (Coprinopsis cinerea) to very limited (e.g., Ustilago maydis).

The phylum Ascomycota is divided into three subphyla: Pezizomycotina, Saccharomycotina (Hemiascomycetes), and Taphrinomycotina (Archiascomycetes). We observed chromoviruses in all major classes of the subphylum Pezizomycotina, such as Eurotiomycetes, Sordariomycetes, Leotiomycetes, and Dothideomycetes. In the subphylum Saccharomycotina, chromoviruses are present in a number of yeast species (see table 3 in Supplementary Material online). In the subphylum Taphrinomycotina, chromoviruses are present in the class Schizosaccharomycetes. We also found chromoviruses in the class Pneumocystidiomycetes. Finding chromoviruses in all three subphyla and in all the major classes confirms their widespread distribution in the phylum Ascomycota.

In this study, we observed chromoviruses in most fungal phyla for which sufficient genome or EST data are available (figs. 5 and 8B; also see table 3 in Supplementary Material online). Such a widespread distribution confirms their ubiquity in fungal genomes and supports their evolution by strict vertical transfer. The evolutionary dynamics of chromoviruses in the ascomycetes are characterized by vertical inactivation, because most chromoviruses are inactivated by repeat-induced point mutation (RIP) (Selker 2002). Some microsporidians, such as Encephalitozoon cuniculi, lack chromoviruses entirely, a clear case of stochastic loss. It is important to note that, for most fungal taxonomic groups, very limited or no sequence data exist. The Fungal Genome Initiative (www.genome.wi.mit.edu/seq/fgi/) will soon allow the analysis of a number of genomes from the major fungal taxonomic groups. We hope that analysis of these novel fungal genomes will greatly improve our knowledge of TE diversity and evolution in the kingdom Fungi.

Phylum Amoebozoa
According to the latest evolutionary studies, animals, fungi, and Amoebozoa belong to the unikonts (Stechmann and Cavalier-Smith 2003) (fig. 8B and C). Amoebozoa are a very diverse protozoan phylum with poorly resolved evolutionary relationships, and genome data are limited to Dictyostelium discoideum and Entamoeba histolytica. Given their close evolutionary relationship with animals and fungi, the strict vertical transfer hypothesis predicts that they will contain chromoviruses (fig. 8B). Chromoviruses were indeed observed in the genome of Dictyostelium discoideum (element Skipper) (Malik and Eickbush 1999; Marin and Llorens 2000) but not in the genome of Entamoeba histolytica.

Kingdom Animalia
On the basis of the strict vertical transfer hypothesis, we would expect to find chromoviruses in the majority of phyla in the animal kingdom, from the most basal metazoan lineages (Porifera and Cnidaria) to the vertebrates. However, this is not the case: in the subkingdom Bilateria, no chromoviruses can be observed in the genomes of Protostomia (comprising most invertebrate phyla) or in the most basal Deuterostomia lineages (Echinodermata and Urochordata). Chromoviruses appeared much later, in the Vertebrata, but quickly disappeared in Amniota (birds and mammals) (fig. 8B).

The observed distribution of chromoviruses in metazoan genomes can be explained either by frequent loss or by a single horizontal transfer into the most basal vertebrate lineage. Under the loss hypothesis, because chromoviruses are present in vertebrates, they must have been present in the genome of the urbilaterian animal. They were then apparently lost in the lineage leading to Protostomia but retained in the lineage leading to Deuterostomia. Within Deuterostomia, chromoviruses were apparently lost from the basal lineages but retained in the vertebrates. The most recent case of stochastic loss of chromoviruses can be seen in avian genomes. Such loss is not unexpected, because these genomes are very compact and nearly devoid of TEs other than CR1 elements and endogenous retroviruses. It is important to note that genome sequence data are extremely sparse for the most basal metazoans (Porifera and Cnidaria), although these lineages are crucial for understanding the evolution of Bilateria.
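The loss hypothesis can be made explicit with a Dollo-parsimony count, in which an element is gained once and can afterwards only be lost. The sketch below assumes the element was present at the root of the supplied tree and counts the minimum number of losses as the number of maximal subtrees whose tips all lack it. The topology, tip names, and presence calls are a simplified, hypothetical illustration drawn loosely from the text; they are not fig. 8B, and `count_losses` is not a method used in this study.

```python
from __future__ import annotations

from typing import Union

# A tree is either a tip name or a tuple of child subtrees.
Tree = Union[str, tuple]


def tips(tree: Tree) -> list[str]:
    """All tip names below (and including) this node."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in tips(child)]


def count_losses(tree: Tree, present: set[str]) -> int:
    """Minimum number of losses under Dollo parsimony.

    Assumes the element was present at the root of `tree`; each maximal
    subtree whose tips all lack the element costs exactly one loss.
    """
    if not any(name in present for name in tips(tree)):
        return 1  # one loss on the branch leading to this subtree
    if isinstance(tree, str):
        return 0  # a tip that still carries the element
    return sum(count_losses(child, present) for child in tree)


# Hypothetical, simplified topology for illustration only.
VERTEBRATA = ("Teleostei", ("Amphibia", ("Mammalia", ("Reptilia", "Aves"))))
BILATERIA = ("Protostomia", ("Echinodermata", ("Urochordata", VERTEBRATA)))
PRESENT = {"Teleostei", "Amphibia", "Reptilia"}

print(count_losses(BILATERIA, PRESENT))   # 5: origin in the urbilaterian
print(count_losses(VERTEBRATA, PRESENT))  # 2: origin by transfer into vertebrates
```

On this toy tree, a single origin in the urbilaterian requires five losses, whereas a horizontal transfer into the vertebrate ancestor requires one transfer plus two losses (birds and mammals). Parsimony alone does not choose between these scenarios; the outcome depends on how transfers are weighted against losses.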

The Origin and Age of Chromoviruses
In this study, we found chromoviruses in Cercozoa (Phytomyxea) and in Heterokonta (Oomycetes) (fig. 8C). No full-length elements could be found in Polymyxa betae (AJ245891) or Phytophthora infestans (Judelson 2002), but they are present in the P. sojae genome. The most characteristic part of chromoviruses, the chromointegrase, can be recognized in each of these taxa (fig. 3). The phylogenetic relationships of the basal eukaryotic lineages are still hotly debated (Cavalier-Smith and Chao 2003). Phytomyxea and Oomycetes are plant parasites. Although some still treat them as fungi, phylogenetic analyses based on rRNA or protein data never show close relationships with the fungi (Cavalier-Smith and Chao 2003). No chromoviruses can be found in the genome databases of the most basal eukaryotic lineages (Diplomonadida, Euglenozoa, Alveolata, and Bacillariophyta [diatoms]) (fig. 8C). Their absence from these genomes indicates either loss or, more likely, a later origin. The genomes of the most basal eukaryotic lineages lack any LTR retrotransposons, although they contain non-LTR retrotransposons (Burke et al. 2002). The discovery of chromoviruses in Cercozoa and in Heterokonta is crucial for understanding the evolutionary origin of LTR retrotransposons, because it reveals the presence and origin of chromoviruses in very basal eukaryotic lineages. The currently available genome data for basal eukaryotes, although very limited relative to their enormous diversity, indicate that chromoviruses originated no earlier than in Cercozoa.

High Rate of Genomic Turnover of Chromoviruses
Recent Amplification of Chromoviruses: Evidence from LTR Similarity
The majority of the novel and the previously known full-length chromoviral elements show greater than 95% LTR identity (table 1; also see table 1 in Supplementary Material online). Because the two LTRs of an element are identical at the time of insertion and subsequently accumulate mutations independently, the level of sequence divergence between them can be used to determine the relative ages of LTR retrotransposon families (SanMiguel et al. 1998). The low level of sequence divergence between LTRs indicates that most of the full-length chromoviruses in eukaryotes are relatively young. The prevalence of young, full-length LTR retrotransposons has been found previously in plants (SanMiguel et al. 1998; McCarthy et al. 2002), baker's yeast (Jordan and McDonald 1999), C. elegans (Bowen and McDonald 1999), and D. melanogaster (Bowen and McDonald 2001).
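LTR-based dating of this kind can be sketched as follows. This is a minimal illustration, assuming the two LTRs of an element are already aligned to equal length; it computes the raw p-distance (greater than 95% identity corresponds to p < 0.05), applies a Kimura two-parameter correction, and converts the corrected distance K to a time with T = K/2r. The function names are hypothetical, the choice of the Kimura correction is ours, and the substitution rate r is an assumption the user must supply; no rate value comes from this study.

```python
from __future__ import annotations

import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def ltr_divergence(ltr5: str, ltr3: str) -> tuple[float, float]:
    """Return (p-distance, K2P distance) between two pre-aligned LTRs.

    Alignment columns are skipped if either base is a gap or is not A/C/G/T.
    """
    if len(ltr5) != len(ltr3):
        raise ValueError("LTRs must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(ltr5.upper(), ltr3.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # gap or ambiguous base
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    P = transitions / sites
    Q = transversions / sites
    if 1 - 2 * P - Q <= 0 or 1 - 2 * Q <= 0:
        raise ValueError("divergence too high for the K2P correction")
    # Kimura two-parameter distance, correcting for multiple substitutions per site
    k2p = 0.5 * math.log(1 / (1 - 2 * P - Q)) + 0.25 * math.log(1 / (1 - 2 * Q))
    return P + Q, k2p


def insertion_age(k: float, rate: float) -> float:
    """Time since insertion, T = K / (2r).

    Both LTRs start identical and diverge independently, hence the factor 2.
    `rate` (substitutions per site per year) is an external assumption.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    return k / (2 * rate)
```

For instance, `ltr_divergence("ACGTACGTAC", "ACGTACGTAT")` gives a p-distance of 0.1 (one transition over ten sites), and identical LTRs give zero divergence and therefore an age of zero, which is the signature of very recent retrotransposition.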

Our data clearly show the prevalence of young, full-length elements in Viridiplantae genomes, from the green algae to the seed plants (table 1; also see table 1 in Supplementary Material online). High genomic turnover has been observed previously for chromoviruses in fission yeast (Bowen et al. 2003) and for gypsy elements in hemiascomycetous yeasts (Neuveglise et al. 2002). We observed a prevalence of young, full-length elements in diverse basidiomycetes, namely Hymenomycetes (Coprinopsis cinerea and Phanerochaete chrysosporium) and Ustilaginomycetes (Ustilago hordei). For the first time, it can be seen that most of the full-length chromoviruses in fish and amphibian genomes are very young (table 1). The prevalence of young, full-length elements in Danio rerio and Xenopus tropicalis and their high genomic turnover constitute the first such report for LTR retrotransposons in vertebrates. In contrast to mammals (Tristem 2000), zebrafish (a representative of a typical teleost and vertebrate genome) contains a very diverse repertoire of numerous, recently amplified LTR retrotransposons (data not shown).

Genomic Turnover
Data reported here indicate that chromoviral populations in eukaryotic genomes are diverse but, within a particular family, very homogeneous. Elements within a given family are very similar in both size and sequence. Furthermore, LTR comparisons indicate that most chromoviruses in eukaryotic genomes transposed recently, suggesting that plant, fungal, and vertebrate genomes contain many functional chromoviruses. Collectively, these facts suggest a high level of genomic turnover of chromoviruses in eukaryotic genomes.

Eukaryotic genomes also possess a specific mechanism for eliminating LTR retrotransposon insertions: recombination between the two LTRs of an element, which leaves a solo LTR behind. Plant (SanMiguel et al. 1998; McCarthy et al. 2002), fungal (Jordan and McDonald 1999; Bowen et al. 2003), and animal (Ganko, Fielman, and McDonald 2001) genomes contain numerous solo LTRs that vastly outnumber full-length elements. This suggests that the ultimate fate of LTR retrotransposons in the genome is elimination. Replication through retrotransposition, however, provides a means for LTR retrotransposons to escape this fate. The high levels of 5' to 3' LTR identity indicate that most full-length chromoviruses in eukaryotes transposed relatively recently. Chromoviruses thus appear to have evolved strategies, such as high genomic turnover and site-specific integration, that enable their long-term survival in diverse eukaryotic genomes. The rapid genomic turnover of chromoviruses evidenced here probably reflects their ability to escape genomic repression and elimination mechanisms.


    Supplementary Material
Supplementary tables and figures are available online at the MBE Web site (www3.oup.co.uk/jnls/lis/molbev). The sequences reported in this paper have been deposited to the GenBank database under accession numbers AY158706 to AY158744 and AY312970 to AY312975.


    Note Added in Proof
In a paper published after submission of this manuscript, an evolutionary analysis of Hur1 elements in mammals was reported (Lynch and Tristem 2003).


    Acknowledgements
We thank Prof. R. H. Pain for critical reading of the manuscript, Dr. R. Zardoya for the vertebrate genomic DNA samples, and two anonymous reviewers for their valuable comments on an earlier version of the manuscript. This work was supported by the Ministry of Education, Science and Sport of Slovenia by research program P0-0501-0106-03.


    Footnotes
 
Brandon Gaut, Associate Editor


    Literature Cited

    Altschul, S. F., T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25:3389-3402.

    Bae, Y. A., S. Y. Moon, Y. Kong, S. Y. Cho, and M. G. Rhyu. 2001. CsRn1, a novel active retrotransposon in a parasitic trematode, Clonorchis sinensis, discloses a new phylogenetic clade of Ty3/gypsy-like LTR retrotransposons. Mol. Biol. Evol. 18:1474-1483.

    Baldauf, S. L. 2003. The deep roots of eukaryotes. Science 300:1703-1706.

    Bennetzen, J. L. 2000. Transposable element contributions to plant gene and genome evolution. Plant Mol. Biol. 42:251-269.

    Bowen, N. J., I. K. Jordan, J. A. Epstein, V. Wood, and H. L. Levin. 2003. Retrotransposons and their recognition of pol II promoters: a comprehensive survey of the transposable elements from the complete genome sequence of Schizosaccharomyces pombe. Genome Res. 13:1984-1997.

    Bowen, N. J., and J. F. McDonald. 1999. Genomic analysis of Caenorhabditis elegans reveals ancient families of retroviral-like elements. Genome Res. 9:924-935.

    Bowen, N. J., and J. F. McDonald. 2001. Drosophila euchromatic LTR retrotransposons are much younger than the host species in which they reside. Genome Res. 11:1527-1540.

    Burke, W. D., H. S. Malik, S. M. Rich, and T. H. Eickbush. 2002. Ancient lineages of non-LTR retrotransposons in the primitive eukaryote, Giardia lamblia. Mol. Biol. Evol. 19:619-630.

    Butler, M., T. Goodwin, M. Simpson, M. Singh, and R. Poulter. 2001. Vertebrate LTR retrotransposons of the Tf1/sushi group. J. Mol. Evol. 52:260-274.

    Cambareri, E. B., R. Aisner, and J. Carbon. 1998. Structure of the chromosome VII centromere region in Neurospora crassa: degenerate transposons and simple repeats. Mol. Cell. Biol. 18:5465-5477.

    Cavalier-Smith, T., and E. E. Chao. 2003. Phylogeny of choanozoa, apusozoa, and other protozoa and early eukaryote megaevolution. J. Mol. Evol. 56:540-563.

    Copenhaver, G. P., K. Nickel, and T. Kuromori, et al. (14 co-authors). 1999. Genetic definition and sequence analysis of Arabidopsis centromeres. Science 286:2468-2474.

    Daboussi, M. J., J. M. Daviere, S. Graziani, and T. Langin. 2002. Evolution of the Fot1 transposons in the genus Fusarium: discontinuous distribution and epigenetic inactivation. Mol. Biol. Evol. 19:510-520.

    Eisen, J. A., and P. C. Hanawalt. 1999. A phylogenomic study of DNA repair genes, proteins, and processes. Mutat. Res. 435:171-213.

    Friesen, N., A. Brandes, and J. S. Heslop-Harrison. 2001. Diversity, origin, and distribution of retrotransposons (gypsy and copia) in conifers. Mol. Biol. Evol. 18:1176-1188.

    Ganko, E. W., K. T. Fielman, and J. F. McDonald. 2001. Evolutionary history of Cer elements and their impact on the C. elegans genome. Genome Res. 11:2066-2074.

    Glockner, G., K. Szafranski, T. Winckler, T. Dingermann, M. A. Quail, E. Cox, L. Eichinger, A. A. Noegel, and A. Rosenthal. 2001. The complex repeats of Dictyostelium discoideum. Genome Res. 11:585-594.

    Goodwin, T. J., and R. T. Poulter. 2002. A group of deuterostome Ty3/gypsy-like retrotransposons with Ty1/copia-like pol-domain orders. Mol. Genet. Genomics 267:481-491.

    Hamann, A., F. Feller, and H. D. Osiewacz. 2000. Yeti--a degenerate gypsy-like LTR retrotransposon in the filamentous ascomycete Podospora anserina. Curr. Genet. 38:132-140.

    Hedges, S. B. 2002. The origin and evolution of model organisms. Nat. Rev. Genet. 3:838-849.

    Hudakova, S., W. Michalek, G. G. Presting, R. ten Hoopen, K. dos Santos, Z. Jasencakova, and I. Schubert. 2001. Sequence organization of barley centromeres. Nucleic Acids Res. 29:5029-5035.

    Jordan, I. K., and J. F. McDonald. 1999. Tempo and mode of Ty element evolution in Saccharomyces cerevisiae. Genetics 151:1341-1351.

    Judelson, H. S. 2002. Sequence variation and genomic amplification of a family of Gypsy-like elements in the oomycete genus Phytophthora. Mol. Biol. Evol. 19:1313-1322.

    Kidwell, M. G., and D. R. Lisch. 2001. Perspective: transposable elements, parasitic DNA, and genome evolution. Evol. Int. J. Org. Evol. 55:1-24.

    Kordis, D., and F. Gubensek. 1998. Unusual horizontal transfer of a long interspersed nuclear element between distant vertebrate classes. Proc. Natl. Acad. Sci. USA 95:10704-10709.

    Kumar, A., and J. L. Bennetzen. 1999. Plant retrotransposons. Annu. Rev. Genet. 33:479-532.

    Kumar, S., K. Tamura, I. B. Jakobsen, and M. Nei. 2001. MEGA2: molecular evolutionary genetics analysis software. Bioinformatics 17:1244-1245.

    Kumekawa, N., N. Ohmido, K. Fukui, E. Ohtsubo, and H. Ohtsubo. 2001. A new gypsy-type retrotransposon, RIRE7: preferential insertion into the tandem repeat sequence TrsD in pericentromeric heterochromatin regions of rice chromosomes. Mol. Genet. Genomics 265:480-488.[CrossRef][ISI][Medline]

    Langdon, T., C. Seago, M. Mende, M. Leggett, H. Thomas, J. W. Forster, R. N. Jones, and G. Jenkins. 2000. Retrotransposon evolution in diverse plant genomes. Genetics 156:313-325.[Abstract/Free Full Text]

    Lynch, C., and M. Tristem. 2003. A co-opted gypsy-type LTR-retrotransposon is conserved in the genomes of humans, sheep, mice, and rats. Curr. Biol. 13:1518-1523.[CrossRef][ISI][Medline]

    Malik, H. S., and T. H. Eickbush. 1999. Modular evolution of the integrase domain in the Ty3/Gypsy class of LTR retrotransposons. J. Virol. 73:5186-5190.[Abstract/Free Full Text]

    Malik, H. S., and T. H. Eickbush. 2001. Phylogenetic analysis of ribonuclease H domains suggests a late, chimeric origin of LTR retrotransposable elements and retroviruses. Genome Res. 11:1187-1197.[Abstract/Free Full Text]

    Marin, I., and C. Llorens. 2000. Ty3/Gypsy retrotransposons: description of new Arabidopsis thaliana elements and evolutionary perspectives derived from comparative genomic data. Mol. Biol. Evol. 17:1040-1049.[Abstract/Free Full Text]

    Marracci, S., R. Batistoni, G. Pesole, L. Citti, and I. Nardi. 1996. Gypsy/Ty3-like elements in the genome of the terrestrial Salamander hydromantes (Amphibia, Urodela). J. Mol. Evol. 43:584-593.[ISI][Medline]

    McCarthy, E. M., J. Liu, G. Lizhi, and J. F. McDonald. 2002. Long terminal repeat retrotransposons of Oryza sativa. Genome Biol. 3:RESEARCH00053.1–0053.11.

    Miller, J. T., F. Dong, S. A. Jackson, J. Song, and J. Jiang. 1998. Retrotransposon-related DNA sequences in the centromeres of grass chromosomes. Genetics 150:1615-1623.[Abstract/Free Full Text]

    Miller, K., C. Lynch, J. Martin, E. Herniou, and M. Tristem. 1999. Identification of multiple Gypsy LTR-retrotransposon lineages in vertebrate genomes. J. Mol. Evol. 49:358-366.

    Neuveglise, C., H. Feldmann, E. Bon, C. Gaillardin, and S. Casaregola. 2002. Genomic evolution of the long terminal repeat retrotransposons in hemiascomycetous yeasts. Genome Res. 12:930-943.

    Nielsen, M. L., T. D. Hermansen, and A. Aleksenko. 2001. A family of DNA repeats in Aspergillus nidulans has assimilated degenerated retrotransposons. Mol. Genet. Genomics 265:883-887.

    Nymark-McMahon, M. H., and S. B. Sandmeyer. 1999. Mutations in nonconserved domains of Ty3 integrase affect multiple stages of the Ty3 life cycle. J. Virol. 73:453-465.

    Okamoto, H., and H. Hirochika. 2001. Silencing of transposable elements in plants. Trends Plant Sci. 6:527-534.

    Ono, R., S. Kobayashi, H. Wagatsuma, K. Aisaka, T. Kohda, T. Kaneko-Ishino, and F. Ishino. 2001. A retrotransposon-derived gene, PEG10, is a novel imprinted gene located on human chromosome 7q21. Genomics 73:232-237.

    Poulter, R., and M. Butler. 1998. A retrotransposon family from the pufferfish (fugu) Fugu rubripes. Gene 215:241-249.

    Saitou, N., and M. Nei. 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406-425.

    SanMiguel, P., B. S. Gaut, A. Tikhonov, Y. Nakajima, and J. L. Bennetzen. 1998. The paleontology of intergene retrotransposons of maize. Nat. Genet. 20:43-45.

    Selker, E. U. 2002. Repeat-induced gene silencing in fungi. Adv. Genet. 46:439-450.

    Shigemoto, K., J. Brennan, E. Walls, C. J. Watson, D. Stott, P. W. Rigby, and A. D. Reith. 2001. Identification and characterisation of a developmentally regulated mammalian gene that utilises -1 programmed ribosomal frameshifting. Nucleic Acids Res. 29:4079-4088.

    Singleton, T. L., and H. L. Levin. 2002. A long terminal repeat retrotransposon of fission yeast has strong preferences for specific sites of insertion. Eukaryot. Cell 1:44-55.

    Soltis, D. E., and P. S. Soltis. 2003. The role of phylogenetics in comparative genetics. Plant Physiol. 132:1790-1800.

    Stechmann, A., and T. Cavalier-Smith. 2003. The root of the eukaryote tree pinpointed. Curr. Biol. 13:R665-R666.

    Suoniemi, A., J. Tanskanen, and A. H. Schulman. 1998. Gypsy-like retrotransposons are widespread in the plant kingdom. Plant J. 13:699-705.

    Thompson, J. D., T. J. Gibson, F. Plewniak, F. Jeanmougin, and D. G. Higgins. 1997. The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 25:4876-4882.

    Tristem, M. 2000. Identification and characterization of novel human endogenous retrovirus families by phylogenetic screening of the human genome mapping project database. J. Virol. 74:3715-3730.

    Van de Peer, Y., and R. De Wachter. 1997. Construction of evolutionary distance trees with TREECON for Windows: accounting for variation in nucleotide substitution rate among sites. Comput. Appl. Biosci. 13:227-230.

    Volff, J., C. Korting, and M. Schartl. 2001. Ty3/Gypsy retrotransposon fossils in mammalian genomes: did they evolve into new cellular functions? Mol. Biol. Evol. 18:266-270.

    Yang, J., H. S. Malik, and T. H. Eickbush. 1999. Identification of the endonuclease domain encoded by R2 and other site-specific, non-long terminal repeat retrotransposable elements. Proc. Natl. Acad. Sci. USA 96:7847-7852.

Accepted for publication November 7, 2003.