Structural and Evolutionary Analyses of the Ty3/gypsy Group of LTR Retrotransposons in the Genome of Anopheles gambiae

Jose Manuel C. Tubío*, Horacio Naveira† and Javier Costas‡

* Departamento de Xenética, Facultade de Bioloxía, Universidade de Santiago de Compostela, Spain; † Departamento de Bioloxía Celular e Molecular, Universidade de A Coruña, Spain; and ‡ Unidade de Medicina Molecular, INGO, Complexo Hospitalario Universitario de Santiago de Compostela, Spain

Correspondence: E-mail: bfcostas@usc.es.


    Abstract
 
The recent availability of the genome of Anopheles gambiae offers an extraordinary opportunity for comparative studies of the diversity of transposable elements (TEs) and their evolutionary dynamics between two related species, taking advantage of the existing information from Drosophila melanogaster. To this end, we screened the genome of A. gambiae for elements belonging to the Ty3/gypsy group of long-terminal repeat (LTR) retrotransposons. The A. gambiae genome displays a rich diversity of LTR retrotransposons, clearly greater than that of D. melanogaster. We have characterized in detail 63 families, belonging to five of the nine main lineages of the Ty3/gypsy group. The Mag lineage is the most diverse and abundant, with more than 30 families. In sharp contrast with this finding, a single family belonging to this lineage has been found in D. melanogaster, here reported for the first time in the literature, most probably consisting of old inactive elements. The CsRn1 lineage is also abundant in A. gambiae but almost absent from D. melanogaster. Conversely, the Osvaldo lineage has been detected in Drosophila but not in Anopheles. Comparison of structural characteristics of different families led to the identification of several lineage-specific features, such as the primer-binding site (PBS), the gag-pol translational recoding signal (TRS), which is extraordinarily diverse within the Ty3/gypsy retrotransposons of A. gambiae, and the presence or absence of specific amino acid motifs. Interestingly, some of these characteristics, although in general well conserved within lineages, may have evolved independently in particular branches of the phylogenetic tree. We also show evidence of recent activity for around 75% of the families. Nevertheless, almost all families contain a high proportion of degenerate members and solitary LTRs (solo LTRs), indicative of a lower turnover rate of retrotransposons belonging to the Ty3/gypsy group in A. gambiae than in D. melanogaster. Finally, we have detected significant overrepresentations of insertions on the X chromosome versus autosomes and of putatively active insertions on euchromatin versus heterochromatin.

Key Words: Ty3/gypsy • retrotransposon • Anopheles gambiae • Drosophila melanogaster


    Introduction
 
Eukaryotic transposable elements (TEs) often make up a substantial fraction of the host genome in which they reside. For example, they constitute 4% to 6% of the euchromatic genome in Drosophila melanogaster (Kaminker et al. 2002), 16% in Anopheles gambiae (Holt et al. 2002), and 45% in humans (International Human Genome Sequencing Consortium 2001).

With the availability of an increasing number of eukaryotic genomic sequences, a primary task in studies of transposon evolution is the characterization of the full transposon complement of sequenced genomes (Holmes 2002; Kaminker et al. 2002). The recently released genome of the dipteran A. gambiae (Holt et al. 2002) offers an extraordinary opportunity for comparative studies of TE diversity and evolutionary dynamics between two related species, taking advantage of the existing information from D. melanogaster (Kaminker et al. 2002; Lerat, Rizzon, and Biémont 2003).

The most abundant type of TEs in Drosophila is the Ty3/gypsy group of long-terminal repeat (LTR) retrotransposons, also referred to as Metaviridae according to virus taxonomy (Boeke et al. 2000). Nine different lineages of this group have been so far identified in different organisms, based on the phylogenetic analysis of their reverse transcriptase (RT), ribonuclease H (RNaseH), and integrase (INT) amino acid domains (Malik and Eickbush 1999; Bae et al. 2001). However, so far, only six of them have been identified in insects, namely CsRn1, Gypsy, Mag, Mdg1, Mdg3, and Osvaldo. All but Mag have been previously detected in D. melanogaster (Bae et al. 2001; Kaminker et al. 2002; Kapitonov and Jurka 2003).

Our analysis of the Mdg1 lineage of A. gambiae revealed the existence of 10 different families, mainly consisting of degenerate copies and solitary LTRs (solo LTRs), although some of them also contain very recent, putatively active, insertions (Tubío, Costas, and Naveira 2004). Three additional Ty3/gypsy elements have been partially characterized previously; two of them (referred to as A. gambiae retrotransposon 1 and A. gambiae retrotransposon 2 [Volff et al. 2001]) belong to the Mag lineage, whereas the other, Ozymandias (Hill et al. 2001), has been assigned to the CsRn1 lineage (Tubío, Costas, and Naveira 2004). Here, we report our findings on the diversity of the Ty3/gypsy group of LTR-retrotransposons in A. gambiae. In addition to the recently published study focused on the non-LTR retrotransposons (Biedler and Tu 2003), this work represents an important step towards the characterization of the full set of TEs within the genome of the African malaria mosquito.


    Materials and Methods
 
Genome Screening of TE Families of the Ty3/gypsy Group
TBlastN (Altschul et al. 1997) was used to search for sequences homologous to the pol region of representative elements of each lineage of the Ty3/gypsy group in the A. gambiae genome (Holt et al. 2002). Specifically, the query sequences included all the well-characterized elements from D. melanogaster as well as the CsRn1-like element within contig AE003787 (positions 212564 to 208162), the retrotransposons Mag from the Lepidoptera Bombyx mori, CsRn1 from the Trematoda Clonorchis sinensis, Ty3 from the yeast Saccharomyces cerevisiae, Sushi from the fish Fugu rubripes, Cyclops from the plant Vicia faba, Cer1 from the Nematoda Caenorhabditis elegans, and Osvaldo from Drosophila buzzatii. Those hits showing at least 30% amino acid identity over at least 80% of the length of the query sequence were subjected to further analyses, to identify both LTRs of each insertion by means of Blast 2 sequences (Tatusova and Madden 1999). Chromosomal locations of the different insertions were obtained from the A. gambiae section of the NCBI MapViewer (www.ncbi.nlm.nih.gov/mapview). Additional BlastN searches were performed using as queries those A. gambiae elements identified in this way. This iterative process (namely, chromosomal allocation of new hits and additional searches) was continued until no new insertions were identified. Different insertions were initially assigned to the same family if they showed at least a stretch of 400 bp of the pol region with a pairwise identity of at least 90%. A consensus sequence for each TE family was then constructed by choosing the most frequent nucleotide at each position after manual alignment of the elements with the aid of BioEdit version 5.0.9 (Hall 1999). After construction of family consensus sequences, family assignments were further refined according to the following rules.
First, different insertions were assigned to the same family if they presented at least a contiguous stretch of 400 bp of the pol region with an identity of at least 90% with the family consensus sequence. Second, insertions were also included in a family if they had a pol region shorter than 400 bp but a gag region larger than 400 bp with an identity of at least 90% with the family consensus sequence. Finally, those insertions showing (1) pol regions shorter than 400 bp with at least 90% identity, (2) gag regions shorter than 400 bp with at least 90% identity, or (3) gag regions larger than 400 bp with an identity of 85% to 89% were assigned to the family if they additionally shared a minimum of 90% nucleotide identity over at least three quarters of the length of the consensus LTR sequence from that family. Solo LTRs were assigned to a specific family if they showed a sequence identity of at least 90% to the consensus LTR sequence.
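The screening filter and the family-assignment rules above can be summarized in a short Python sketch. This is only an illustration of the stated thresholds, not the authors' actual procedure, which relied on manual alignment. The `Insertion` record, its field names, and the function names are hypothetical, and treating a stretch of exactly 400 bp as "long" is an assumption, because the text leaves that boundary case open.

```python
from dataclasses import dataclass

MIN_STRETCH_BP = 400         # length threshold for pol and gag stretches
MIN_IDENTITY = 0.90          # identity threshold against the family consensus
RELAXED_GAG_IDENTITY = 0.85  # lower bound of the 85% to 89% gag window
MIN_LTR_COVERAGE = 0.75      # three quarters of the consensus LTR length


def passes_hit_filter(aa_identity: float, query_coverage: float) -> bool:
    """Initial TBlastN filter: >=30% identity over >=80% of the query length."""
    return aa_identity >= 0.30 and query_coverage >= 0.80


@dataclass
class Insertion:
    """Similarity of one insertion to a family consensus (hypothetical record)."""
    pol_len: int = 0             # aligned pol stretch in bp (0 if absent)
    pol_identity: float = 0.0    # fraction of identical positions in pol
    gag_len: int = 0             # aligned gag stretch in bp (0 if absent)
    gag_identity: float = 0.0    # fraction of identical positions in gag
    ltr_identity: float = 0.0    # identity to the consensus LTR
    ltr_coverage: float = 0.0    # aligned fraction of the consensus LTR length


def ltr_supports_family(ins: Insertion) -> bool:
    """LTR evidence required by the third rule."""
    return ins.ltr_identity >= MIN_IDENTITY and ins.ltr_coverage >= MIN_LTR_COVERAGE


def belongs_to_family(ins: Insertion) -> bool:
    # Rule 1: long pol stretch with high identity.
    if ins.pol_len >= MIN_STRETCH_BP and ins.pol_identity >= MIN_IDENTITY:
        return True
    # Rule 2: short pol, but long gag stretch with high identity.
    if (ins.pol_len < MIN_STRETCH_BP and ins.gag_len >= MIN_STRETCH_BP
            and ins.gag_identity >= MIN_IDENTITY):
        return True
    # Rule 3: weaker internal evidence, accepted only with LTR support.
    short_pol = 0 < ins.pol_len < MIN_STRETCH_BP and ins.pol_identity >= MIN_IDENTITY
    short_gag = 0 < ins.gag_len < MIN_STRETCH_BP and ins.gag_identity >= MIN_IDENTITY
    relaxed_gag = (ins.gag_len >= MIN_STRETCH_BP
                   and RELAXED_GAG_IDENTITY <= ins.gag_identity < MIN_IDENTITY)
    return (short_pol or short_gag or relaxed_gag) and ltr_supports_family(ins)


def is_solo_ltr_of_family(ltr_identity: float) -> bool:
    """Solo LTRs need >=90% identity to the consensus LTR."""
    return ltr_identity >= MIN_IDENTITY
```

For instance, an insertion with a 200-bp pol fragment at 95% identity would be rejected on its own but accepted once its LTR matches the consensus at 90% or more over three quarters of its length.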

All the family consensus sequences first reported in this paper have been deposited in the A. gambiae section of Repbase Update (http://www.girinst.org/Repbase_Update.html [Jurka 2000]). They were named from GYPSY18_AG to GYPSY72_AG. The previously discovered element Ozymandias (Hill et al. 2001) and the elements A. gambiae retrotransposon 1 and A. gambiae retrotransposon 2 (Volff et al. 2001) have been renamed as GYPSY50_AG, GYPSY28_AG, and GYPSY55_AG, respectively, after their full characterization, according to Repbase terminology. The families for which no consensus could be obtained are reported in this paper with the name of the contig or scaffold where a representative sequence was identified.

Characterization of Insertions
Putative open reading frames (ORFs) were found by sorted three-frame translation of each TE insertion with the aid of BioEdit version 5.0.9. The primer-binding site (PBS) of each element was localized by searching the compilation of tRNA sequences of Sprinzl et al. (1999), using sliding windows of 9 bp at 1-bp steps as probes, starting –1 bp relative to the 5' LTR end. Individual insertions were considered putatively active if they contained intact ORFs (i.e., without any frameshift or nonsense mutation) and two nontruncated LTRs (i.e., LTRs without indels >10 bp, as compared with the consensus sequence). Those insertions with frameshift mutations, nonsense mutations, or truncated LTRs were classified as inactive insertions. Those insertions with unsequenced gaps but meeting the criteria to be regarded as putatively active based on the analysis of the available sequence were not assigned to a specific activity status. Those insertions bearing identity exclusively to the LTR of a family consensus sequence were considered solo LTRs. Average pairwise divergence between both LTRs from the same element copy and between different copies of the same family was obtained as the proportion of nucleotide differences with the aid of MEGA version 2.1 (Kumar et al. 2002), using the pairwise deletion option.
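As a minimal sketch of the criteria in this section, the Python code below classifies an insertion by activity status and computes the proportion of nucleotide differences with pairwise deletion. It is not the BioEdit or MEGA implementation; the function names and the boolean inputs are hypothetical. Here pairwise deletion is taken to mean that any aligned site with a gap or ambiguous base in either sequence of the pair is skipped.

```python
MAX_LTR_INDEL_BP = 10          # LTRs with larger indels count as truncated
VALID_BASES = set("ACGT")


def classify_insertion(internal_region_present: bool,
                       orfs_intact: bool,
                       max_ltr_indel_bp: int,
                       has_unsequenced_gaps: bool) -> str:
    """Assign one of the four categories used in the text.

    max_ltr_indel_bp is the largest indel found in either LTR relative to the
    family consensus (a hypothetical summary of the LTR comparison).
    """
    if not internal_region_present:
        return "solo LTR"            # similarity restricted to the LTR
    if not orfs_intact or max_ltr_indel_bp > MAX_LTR_INDEL_BP:
        return "inactive"            # frameshift, nonsense, or truncated LTR
    if has_unsequenced_gaps:
        return "unassigned"          # looks intact, but sequence is incomplete
    return "putatively active"


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or an ambiguous base are ignored
    (pairwise deletion).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in VALID_BASES and b in VALID_BASES:
            compared += 1
            if a != b:
                differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared
```

For two aligned LTRs of the same copy, `p_distance(ltr5, ltr3)` returning 0.0 corresponds to the identical-LTR case reported for many putatively active elements.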

Multiple Sequence Alignments and Phylogenetic Analyses
Our phylogenetic analyses were based on the alignment of the seven amino acid domains of the RT defined by Xiong and Eickbush (1990) and the RNaseH and INT domains defined by Malik and Eickbush (1999). The general alignment, available as Supplementary Material online, was obtained in two steps. First, we generated an alignment for each one of the Ty3/gypsy lineages present in insects using the multiple-alignment mode of ClustalX (Thompson et al. 1997). Each one of the alignments included the consensus sequences of the A. gambiae elements of the lineage, the available representative sequences of D. melanogaster elements of the lineage, and representative sequences of all the lineages (Cer1, CsRn1, Cyclops, Gypsy, Mag, Mdg1, Mdg3, Osvaldo, and Ty3). Second, these different alignments were joined together manually, using as guide the representative sequences for each one of the Ty3/gypsy lineages, common to all the lineage-specific alignments, with the help of BioEdit. For the purpose of phylogenetic analyses, the amino acid motifs of the D. melanogaster insertions at genomic sequences AC016130 and AE003787, corresponding to elements belonging to the Mag and CsRn1 lineages, respectively, have been reconstructed by the introduction of gaps to compensate for frameshift mutations.

Phylogenetic relationships between different retrotransposons based on this general alignment were obtained both by distance (neighbor-joining [NJ]) and maximum-parsimony (MP) methods, as implemented in MEGA version 2.1, using the pairwise deletion option. The amino acid distances were computed using the Poisson correction for multiple substitutions and assuming equality of substitution rates among sites. In MP analyses, we searched for the best tree using the close-neighbor interchange, with default parameter values and random addition of sequences to produce the initial trees. In both MP and NJ analyses, bootstrapping was performed (1,000 replicates) to assess the support for each internal branch of the tree.
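The Poisson-corrected amino acid distance used for the NJ trees is conventionally defined as d = -ln(1 - p), where p is the proportion of differing sites. The sketch below computes it for two aligned protein sequences with pairwise deletion. It illustrates the standard formula under equal rates among sites and is not MEGA's code; the function name is hypothetical.

```python
import math

AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def poisson_distance(seq_a: str, seq_b: str) -> float:
    """Poisson-corrected distance d = -ln(1 - p) between aligned proteins.

    Sites with a gap or a non-standard residue in either sequence are
    skipped (pairwise deletion).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in AMINO_ACIDS and b in AMINO_ACIDS:
            compared += 1
            if a != b:
                differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = differences / compared
    if p >= 1.0:
        raise ValueError("distance undefined when all compared sites differ")
    return -math.log(1.0 - p)
```

The correction matters most for the deep comparisons between lineages, where the observed proportion of differences underestimates the number of substitutions per site.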

Statistical Analysis of the Distribution of Insertions in the X Chromosome Versus the Autosomes
The equiproportional hypothesis of Montgomery, Charlesworth, and Langley (1987) postulates that the turnover of insertions should occur at equal rates on the X chromosome and the autosomes. Under this hypothesis, the expected ratio of haploid mean copy number of any given family in the X chromosome and the autosomes (HX/HA) at equilibrium can be obtained by solving the quadratic in X = HX/HA, after assigning numerical values to the constants in equation (2) of Montgomery, Charlesworth, and Langley (1987), corrected after Langley et al. (1988). We followed the statements of Krzywinski et al. (2004) and assumed that the Y chromosome is entirely heterochromatic, that it constitutes 10% of the haploid genome of a male, and that 975 out of the 8,845 total unmapped scaffolds of the A. gambiae genome are most likely to be linked to this chromosome. We also followed Holt et al. (2002) for assumptions on the relative size of each chromosome. All unmapped scaffolds, amounting to roughly 44 Mb, were pooled into a separate category, conceptually equivalent to part of the "heterochromatin" in the models of Montgomery, Charlesworth, and Langley (1987). Finally, after assuming that transposition rates per copy per generation do not differ either between sexes or between heterochromatic and euchromatic insertions, a value of 8.0% was obtained for the expected proportion of elements on the X chromosome under this equiproportional hypothesis. Observed and expected frequencies were compared by means of χ² tests.
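A comparison of observed and expected frequencies of this kind can be sketched as a χ² goodness-of-fit test. The example below assumes only two classes (X-linked versus all other insertions) and therefore one degree of freedom; the categories actually used in the paper may differ. The counts in the usage lines are hypothetical, and only the 8.0% expectation comes from the text.

```python
import math


def chi2_goodness_of_fit_two_classes(observed_x: int,
                                     observed_other: int,
                                     expected_x_fraction: float):
    """Pearson chi-square for X-linked vs. other insertions (1 df).

    Returns (chi2, p_value). For one degree of freedom the upper-tail
    probability equals erfc(sqrt(chi2 / 2)).
    """
    total = observed_x + observed_other
    expected_x = total * expected_x_fraction
    expected_other = total - expected_x
    chi2 = ((observed_x - expected_x) ** 2 / expected_x
            + (observed_other - expected_other) ** 2 / expected_other)
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts, for illustration only: 16 of 100 insertions on the X,
# tested against the 8.0% expected under the equiproportional hypothesis.
chi2_example, p_example = chi2_goodness_of_fit_two_classes(16, 84, 0.08)
```

The expected fraction itself comes from solving the quadratic of Montgomery, Charlesworth, and Langley (1987), which is not reproduced here because its constants are not given in this section.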


    Results
 
Ty3/gypsy Families in the A. gambiae Genome
We have identified three known (Ozymandias, A. gambiae retrotransposon 1, and A. gambiae retrotransposon 2) and 127 putative novel families of retrotransposons belonging to the Ty3/gypsy group in the sequenced genome of A. gambiae, in addition to the 10 families from the Mdg1 lineage previously reported by our group (Tubío, Costas, and Naveira 2004). Sixty-three of these families, representing those cases where either it was possible to obtain a consensus sequence or there was at least one complete insertion sequence potentially capable of transposition, are characterized below. Those insertions not classified in any of these families are succinctly described in table 1 of Supplementary Material online.

Figure 1 shows the phylogenetic relationships of all the well-characterized families from D. melanogaster and A. gambiae belonging to the Ty3/gypsy group of LTR retrotransposons, based on the alignment of the conserved amino acid domains of RT, RNaseH, and INT. The tree also includes the D. melanogaster elements belonging to the Mag (AC016130) and CsRn1 (AE003787) lineages, as well as representatives from other species of each one of the six lineages of the Ty3/gypsy group without well-characterized sequences in the D. melanogaster genome.



 
FIG. 1.— Phylogenetic relationships between the Ty3/gypsy retrotransposons of D. melanogaster and A. gambiae inferred by the NJ method based on the conserved domains of RT, RNaseH, and INT. Representative sequences of lineages not found in D. melanogaster are also included (represented in italics). Vertical bars indicate those lineages found in A. gambiae. Bootstrap values (1,000 replications) of at least 75% supporting the clusters are shown above (NJ method) or below (MP method) the branches leading to them. Specific branches (or clusters defined by these branches) referred to in the text are marked by capital letters. Names of Anopheles elements are those from Repbase (but omitting hyphens for clarity) except for putatively active individual insertions without enough information to reconstruct family consensus sequences, which are named by their accession numbers. D. melanogaster insertions at genomic sequences AC016130 and AE003787 are denoted by their accession numbers followed by the letters DM.

 
The high bootstrap values (>95% in all cases) support an unambiguous assignment of all mosquito elements to each one of five different lineages, namely CsRn1, Gypsy, Mag, Mdg1, and Mdg3. The only exception is an A. gambiae family from the Mdg1 lineage (GYPSY54_AG, undetected in our previous work [Tubío, Costas, and Naveira 2004]) representing a very basal branch in the phylogenetic tree of this lineage. Although its clustering with members of the Mdg1 lineage is well supported by the NJ method (98% bootstrap value), this is not the case using MP (63% bootstrap value). In addition to the well-supported phylogenetic relationships clearly defining each lineage, all the elements of each one of the lineages have distinctive structural characteristics, presented in figure 2. There is also some structural variation within lineages. Thus, families from the CsRn1 lineage show two alternative translational recoding signals (TRSs), some families from the Gypsy lineage differ in the presence or absence of an additional ORF encoding the env protein, and a few families from the Mag lineage have a particular TRS (fig. 2).



 
FIG. 2.— Schematic diagram showing the main structural features of the different Ty3/gypsy retrotransposons of A. gambiae. Open boxes indicate ORFs. Gray boxes indicate noncoding regions. Black triangles indicate LTRs with the usual TG...CA termini. Gray triangles indicate LTRs with TG...AA termini. Arrows indicate the TSD, with the number of duplicated base pairs shown above. The CCHC and GPY/F motifs are indicated (not to scale), as well as the CHCC motif present in members of the CsRn1 lineage instead of the usual CCHC motif (Bae et al. 2001). In all cases, the CCHC motif is present in two consecutive copies, as previously described for Mag from B. mori (Garel, Nony, and Prudhomme 1994). The tRNAs complementary to the PBS are indicated on the right side, followed by the number of families corresponding to each one of the structures and PBS. N. f. indicates that a tRNA complementary to the PBS has not been found. GYPSY has been abbreviated as G at the beginning of the Repbase family names, and the "_AG" at the end of the name has been omitted to save space. The eight putatively active elements belonging to families without enough information to reconstruct family consensus sequences have been named according to their accession numbers, but excluding the AAAB0100 string at the beginning. Exact locations of the insertions are AAAB01008880, 1172895 to 1167795; AAAB01008811, 153120 to 147785; AAAB01008986, 10714616 to 10709222; AAAB01008852, 172295 to 163857; AAAB01008851, 882036 to 887093; AAAB01008181, 18998 to 18820; AAAB01008967, 14102 to 19005; and AAAB01008445, 156 to 5229. If two ORFs overlap, a number within a box indicates the frameshift in base pairs. The word "stop" within a box indicates the gag stop codon that has to be read through to translate ORF2. PT, protease; RT, reverse transcriptase; RNase, ribonuclease H; INT, integrase.

 
The CsRn1 Lineage
This lineage was first described in trematodes and has also been detected in D. melanogaster and A. gambiae (Bae et al. 2001; Hill et al. 2001). We have identified 21 putative families, although we have only been able to obtain a consensus sequence for seven of them because of the low number of insertions available for the other families.

In contrast to A. gambiae, the CsRn1 lineage appears to be poorly represented in the D. melanogaster genome. Our Blast searches to evaluate this observation revealed the existence of only one family (referred to as AE003787DM in the phylogeny) with five insertions in the fly genome: three solo LTRs (located at genomic scaffolds AE003522, from 83073 to 83278; AE003526, from 203102 to 203307; and AE003784, from 304622 to 304425), one partial insertion (genomic scaffold AE03843; from 326893 to 322718) and one complete insertion bearing inactivating mutations in the INT domain (genomic scaffold AE003787; from 212564 to 208162).

The Gypsy Lineage
We have identified 24 putative new families belonging to the Gypsy lineage, but we were only able to obtain a consensus sequence for nine of them. Neither of the previously described Anopheles elements belonging to this lineage (Afun1 from A. funestus and Aste11 from A. stephensi [Cook et al. 2000]) was identified in A. gambiae. As shown in figure 2, we have identified an env-like ORF3 in three of the nine families, conforming to an R-X2-R-X4-5-6-G-X3-K-X3-G-X2-D-X2-D rule, which is slightly different from the general pattern proposed as a specific probe for the in silico detection of insect endogenous retroviral envelope proteins (Terzian, Pélisson, and Bucheton 2001). The 18 insertions of families GYPSY41_AG, GYPSY42_AG, and GYPSY43_AG, where a target-site duplication (TSD) could be identified, showed preferential insertion at ATAT sites. Two other families of the Gypsy lineage (GYPSY44_AG and GYPSY45_AG) also showed preferential insertion at C(G/T)CG, based on 12 individual members.
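The env-like rule quoted above can be expressed as a regular expression for scanning translated ORF3 sequences. The sketch below assumes that "X4-5-6" means a spacer of four, five, or six residues and that X stands for any amino acid; both readings are assumptions, and the function name is hypothetical.

```python
import re

# R-X2-R-X(4 to 6)-G-X3-K-X3-G-X2-D-X2-D, with X as any residue letter.
ENV_MOTIF = re.compile(
    r"R[A-Z]{2}R[A-Z]{4,6}G[A-Z]{3}K[A-Z]{3}G[A-Z]{2}D[A-Z]{2}D"
)


def find_env_motif(protein: str):
    """Return (start, matched_text) for each non-overlapping motif match."""
    return [(m.start(), m.group()) for m in ENV_MOTIF.finditer(protein.upper())]


# Synthetic sequence built to satisfy the rule; not a real env protein.
synthetic_hit = "MMRAARAAAAAGAAAKAAAGAADAADLL"
matches = find_env_motif(synthetic_hit)
```

A match of this kind only flags a candidate; the assignment of an ORF as env-like would still rest on its position and overall similarity.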

The Mag Lineage
Two A. gambiae elements of the Mag lineage had already been partially characterized (referred to as A. gambiae retrotransposon 1 and A. gambiae retrotransposon 2 [Volff et al. 2001]). We have identified 53 putative families in the A. gambiae genome belonging to this lineage, but it has only been possible to characterize in detail 30 of them, representing 48% of all the characterized Ty3/gypsy families in A. gambiae. A. gambiae retrotransposon 1 and A. gambiae retrotransposon 2 have been identified as members of families GYPSY28_AG and GYPSY55_AG, respectively.

Until now, no Mag-like TEs had been identified in the genus Drosophila. To confirm the absence of the Mag lineage from Drosophila, we carried out Blast searches of the genome of D. melanogaster, using the pol region of different A. gambiae families as queries. This search led to the detection of an insertion within a 2R centromeric heterochromatin sequence (AC016130.13, unfinished sequence; pol region around nucleotide positions 89502 to 91566), most similar to elements from the Mag lineage. The insertion bears several inactivating mutations. Additional hits related to this element were identified in unfinished genomic sequences. Phylogenetic analyses revealed that this element and other Mag families cluster together with high bootstrap values, representing an old branch of the lineage (fig. 1).

The Mdg1 Lineage
Most mosquito members of this lineage have been described elsewhere (Tubío, Costas, and Naveira 2004). Here, we show the existence of a basal member of this lineage (GYPSY54_AG). Six additional putative families related to this one had to be excluded from the analysis (table 1 in Supplementary Material online). All the other A. gambiae families of this lineage are more related to D. melanogaster families than to this novel family (fig. 1). Nevertheless, both the phylogenetic relationships (well-supported by bootstrap values in the case of NJ) and the structural characteristics of this family are consistent with its classification within the Mdg1 lineage. Namely, the element lacks a CCHC domain, contains a GPY/F domain, and the translation of the pol ORF requires a frameshift of –1 bp as in the remaining elements of the lineage (fig. 2 [Tubío, Costas, and Naveira 2004]), although we were not able to identify any tRNA complementary to the PBS. Blast searches failed to identify elements closely related to this one in other genomes.

The Mdg3 Lineage
We have identified 25 putative new families in the A. gambiae genome belonging to this lineage, but it has only been possible to offer a full description of 16 of them. No Mdg3 lineage elements had been described in A. gambiae before this work.

Analysis of Individual Insertions
Table 1 shows the total number of insertions, classified as putatively active insertions, inactive insertions, and solo LTRs, belonging to each one of the families, as well as the chromosomal distribution of all the insertions. The most abundant family is GYPSY50_AG, containing 28 members. Five additional families comprise more than 20 members. We have identified putatively active members for 47 of the 63 characterized families. Nevertheless, it must be pointed out that at least some of the unclassified members (those meeting the criteria to be considered active but with short stretches of unfinished sequence) are most likely to be active. For 71 of the 85 putatively active elements, belonging to 40 different families, the two LTRs are identical in sequence. In addition, all families contain inactive members, which have accumulated several indels and/or nonsense mutations. In general, the number of putatively active members per family is lower than that of inactive members. It was possible to calculate the average pairwise identity between putatively active copies and between inactive copies for 17 families (table 2). In all but one case, the identity was higher between putatively active copies. This difference is highly significant (Student's t = 6.319, P < 0.001). The average pairwise identity between putatively active copies was higher than 99% in the 17 families.


 
Table 1 Total Number of Insertions According to Their Category and Chromosomal Location

 

 
Table 2 Average Pairwise Identity Between Putatively Active Insertions and Between Inactive Insertions of the Same Family

 
Assuming that most unmapped scaffolds are located in heterochromatin (as in Kaminker et al. [2002]), 89% of the putatively active elements (76/85) and 66% of the inactive elements (292/440) are inserted in euchromatin, representing a significant association between activity status and chromatin location (χ² = 18.05, P < 0.001). We also checked for any biased distribution of insertions associated with particular chromosomes. The X chromosome represents 9.45% of the DNA in chromosome arms (Holt et al. 2002) but contains 12.7% of the located insertions (χ² = 5.78, P = 0.016). However, this comparison does not take into account either the fact that X chromosomes are actually three quarters as numerous as any autosome in the population (A. gambiae males are hemizygous) or the contribution of Y-linked insertions to the pool of retrotranspositions. These two factors are conveniently addressed in the mathematical developments of the equiproportional hypothesis (Montgomery, Charlesworth, and Langley 1987; Langley et al. 1988), which produces a value of 8.0% for the expected proportion of elements on the X chromosome of A. gambiae (see last section of Materials and Methods). Observed frequencies were found to depart significantly from these expectations (χ² = 14.02, P < 0.001), because of an overrepresentation of insertions on the X chromosome.
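The association between activity status and chromatin location can be checked from the counts given above (76 of 85 putatively active and 292 of 440 inactive elements in euchromatin). The sketch below computes a Pearson χ² for the resulting 2 × 2 table without continuity correction, which is an assumption about how the test was run; under that assumption it gives a value that rounds to the reported 18.05. The function name is hypothetical.

```python
import math


def chi2_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (chi2, p_value); for 1 df the upper tail is erfc(sqrt(chi2 / 2)).
    """
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = numerator / denominator
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Rows: putatively active, inactive. Columns: euchromatin, heterochromatin.
active_eu, active_het = 76, 85 - 76
inactive_eu, inactive_het = 292, 440 - 292
chi2_value, p_value = chi2_2x2(active_eu, active_het, inactive_eu, inactive_het)
print(f"chi2 = {chi2_value:.2f}, P = {p_value:.2g}")
```

The X-chromosome comparisons cannot be recomputed in the same way from this section alone, because the number of located insertions is given only in table 1.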


    Discussion
 
Diversity and Characteristic Features of the Ty3/gypsy Group of LTR Retrotransposons Within the A. gambiae Genome
In our search of the A. gambiae genome, we have identified retrotransposon families belonging to five of the nine major lineages of the Ty3/gypsy group of LTR retrotransposons. The Osvaldo lineage has been detected in Drosophila but not in the mosquito. The best hits in our Blast searches using Osvaldo elements as query correspond to elements from other lineages. Osvaldo was originally discovered in D. buzzatii (Pantazidis, Labrador, and Fontdevila 1999), and Kapitonov and Jurka (2003) have recently identified Osvaldo-like elements in D. melanogaster. Three lineages (namely Mdg1, Gypsy, and Mdg3) are represented in the genome of both A. gambiae and D. melanogaster by several distinct families. These lineages show two contrasting tree topologies. On the one hand, the Mdg3 and Gypsy lineages contain several branches that comprise elements in both Drosophila and Anopheles, a clear indication of an old diversification process before the split of Diptera. On the other hand, the Mdg1 lineage splits into two species-specific monophyletic groups, with the exception of the very basal family GYPSY54_AG. This fact, consistent with vertical transmission, strongly suggests that the main diversification of this lineage took place in parallel in both genomes after the divergence of flies and mosquitoes. The latest splits of Anopheles families, dated to 14 to 5 MYA (Tubío, Costas, and Naveira 2004), occurred in this lineage.

The other two mosquito lineages are almost absent from D. melanogaster but abundant in A. gambiae. The most extreme case is that of the Mag lineage. Thus, whereas we have identified for the first time a family of Mag-like elements in D. melanogaster, most probably consisting of old inactive members, A. gambiae contains at least 30 families, most of them putatively active, with an extraordinary structural diversity, accounting for approximately 48% of all the characterized Ty3/gypsy families in A. gambiae (fig. 2). Interestingly, 13 of the 30 families arose in a short period of evolutionary time (cluster J in figure 1). A similar situation occurs in the case of the CsRn1 lineage. We have characterized seven families belonging to this lineage in A. gambiae, whereas the D. melanogaster genome bears only a single family (Bae et al. 2001) with just two full-length members bearing inactivating mutations. In summary, our data reveal a rich diversity of LTR retrotransposons in A. gambiae, clearly greater than in D. melanogaster. A similar situation has been recently described in the case of non-LTR retrotransposons (Biedler and Tu 2003).

The seven new CsRn1 families characterized for the first time in this paper constitute a significant contribution to the total number of known families of this lineage, detected mainly in Trematoda (Bae et al. 2001; Copeland et al. 2003). This fact allows us to confirm the general distinctive characteristics of the CsRn1 lineage, such as a PBS complementary to tRNATrp, the unusual CHCC gag motif instead of the typical CCHC motif, and the existence of the GPY/F motif at the 3' end of the INT gene (Bae et al. 2001). In a similar way, each lineage shows distinctive features (or a combination of features), as clearly shown in figure 2 for the A. gambiae representatives. Thus, members of the related lineages Gypsy and Mdg1 are characterized by a frameshift of –1 bp at the gag-pol boundary and the absence of the CCHC motif at the C-terminal end of gag. They differ in the presence (Mdg1) or absence (Gypsy) of the GPY/F motif. Members of the Mdg3 lineage are the only ones with a single gag-pol ORF bearing the CCHC gag motif and a GPY/F domain at the C-terminal end. Members of the Mag lineage also have the conventional CCHC gag motif, but they lack the GPY/F motif at the 3' end of the INT gene. In addition, Mag is the only lineage that generates a TSD of 5 bp at the insertion site, instead of the typical 4 bp.
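The combination of features described in this paragraph can be written as a small lookup table. The Python sketch below keys each lineage on the type of gag zinc-finger motif and on the presence of the GPY/F motif, the two characters that, according to the text, separate the five lineages found in A. gambiae. It is only a summary of the general pattern: the names are hypothetical, within-lineage variation exists, and lineage assignment in the paper rests on the phylogenetic analysis.

```python
from typing import Optional

# (gag motif, GPY/F present) -> lineage, as summarized in the text.
# The gag motif is "CCHC", "CHCC", or None when no such motif is present.
LINEAGE_BY_FEATURES = {
    ("CHCC", True): "CsRn1",   # unusual CHCC motif, GPY/F present
    (None, False): "Gypsy",    # no CCHC motif, no GPY/F
    (None, True): "Mdg1",      # no CCHC motif, GPY/F present
    ("CCHC", True): "Mdg3",    # CCHC motif, GPY/F present
    ("CCHC", False): "Mag",    # CCHC motif, no GPY/F
}

# Typical target-site duplication length in bp; Mag is the stated exception.
TYPICAL_TSD_BP = 4
TSD_BP_BY_LINEAGE = {"Mag": 5}


def lineage_from_features(gag_motif: Optional[str], has_gpy_f: bool) -> Optional[str]:
    """Return the lineage suggested by the two motifs, or None if no match."""
    return LINEAGE_BY_FEATURES.get((gag_motif, has_gpy_f))


def expected_tsd_bp(lineage: str) -> int:
    """TSD length expected for a lineage under the general pattern."""
    return TSD_BP_BY_LINEAGE.get(lineage, TYPICAL_TSD_BP)
```

For example, an element with a CCHC motif and no GPY/F motif maps to Mag, for which a 5-bp TSD would be expected.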

Several characteristics, although generally conserved among members of the same lineage, have changed in specific branches of the phylogenetic tree. One example is the PBS. Four closely related Mdg3 families from A. gambiae (AAAB01008445, GYPSY37_AG, GYPSY38_AG, and GYPSY71_AG), clustered together with strong bootstrap support (cluster F in figure 1), seem to have shifted from a PBS complementary to tRNALeu, common to all the other elements from the Mdg3 lineage, to one complementary to tRNAPro. A similar situation has been detected in the case of the Mag-like elements GYPSY24_AG and GYPSY66_AG (cluster I in figure 1). These elements contain a PBS complementary to tRNALeu instead of one complementary to tRNASer, as found in the other members of the Mag lineage, with the exception of GYPSY68_AG, which contains a PBS complementary to tRNAArg. Members of the Gypsy lineage may be further split into two groups, based on the presence of a PBS complementary to tRNASer or to tRNALys, as pointed out previously in the case of Drosophila elements (Terzian, Pélisson, and Bucheton 2001). The phylogenetic tree of figure 1 strongly suggests that the PBS complementary to tRNALys arose later in a specific branch of the tree (cluster A in figure 1). Interestingly, this acquisition predated the split of Diptera.
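
Assigning a PBS to a tRNA amounts to checking that the PBS is complementary to the 3' end of that tRNA. A minimal Python sketch of such a check follows; it is an illustration under stated assumptions rather than the procedure used here. The function names are hypothetical, and any sequences passed to it in testing would be toy strings, not real tRNA or element sequences.

```python
# Complement table for DNA and RNA bases; U pairs with A like T does.
_COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def reverse_complement(seq):
    """DNA reverse complement of a DNA or RNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def pbs_match_length(pbs, trna):
    """Count the leading PBS bases complementary to the 3' end of the tRNA.

    The reverse complement of the tRNA, read 5'->3', starts with the bases
    that pair with the tRNA 3' terminus, so a perfect PBS is a prefix of it.
    """
    expected = reverse_complement(trna)
    n = 0
    for a, b in zip(pbs.upper(), expected):
        if a != b:
            break
        n += 1
    return n
```

Comparing the match length of one PBS against a panel of tRNA 3' ends would then point to the most likely primer tRNA.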

Another clear example of distinctive characteristics evolving at specific branches is the TRS at the gag-pol boundary. Thus, a wide variety of strategies have been identified within the Mag lineage. Although most elements contain a single ORF, there is one element presenting a –1 frameshifting (characteristic of other lineages) and a cluster of three elements (cluster H in figure 1) showing two nonoverlapping ORFs separated by more than 100 bp. A similar TRS has been previously observed in several plant retrotransposons, but the mechanism to express pol in these cases is not clear, although splicing, internal ribosomal entry, or a bypass mechanism have been suggested (Gao et al. 2003). Furthermore, there is a mosquito family of the Mag lineage characterized by a long ORF encoding all the protein domains with the exception of INT, which is encoded by a different overlapping ORF, requiring a frameshift of –2 to be translated. The reasons for this particular structure are unknown.

The CsRn1 lineage also shows different TRSs. While some elements show a conventional –1 frameshift, there is a cluster (cluster G in figure 1) that has a stop codon at the gag-pol boundary. Stop codon readthrough has been previously described in a few elements, such as the Kamikaze element from B. mori, the RIRE2 element from rice, and several mammalian retroviruses (reviewed in Gao et al. [2003]). Interestingly, the LTR termini of members of this cluster are TG...AA, instead of the expected TG...CA. By comparison, all Drosophila retrotransposons have TG...CA termini except those from the Gypsy lineage, which show AG...YT at the LTR ends (Kapitonov and Jurka 2003), and the Drosophila family from the CsRn1 lineage (AE003787), which shows TG...TA (Bae et al. 2001).
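
The terminal dinucleotide patterns mentioned above (TG...CA, TG...AA, TG...TA, AG...YT) can be checked mechanically. The Python sketch below is a hypothetical helper, not a tool used in this study; it assumes patterns are written as in the text, with "..." separating the 5' and 3' termini, and it supports only the few IUPAC codes listed in the table.

```python
# Small subset of IUPAC nucleotide codes (Y = pyrimidine, R = purine).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "Y": "CT", "R": "AG", "N": "ACGT"}


def matches_termini(ltr, pattern):
    """Return True if the LTR ends match a pattern such as 'TG...CA'."""
    head, tail = pattern.split("...")
    s = ltr.upper()
    ends = s[: len(head)] + s[-len(tail):]
    return all(base in IUPAC[code] for base, code in zip(ends, head + tail))
```

With such a helper, each LTR sequence could be tested against the canonical TG...CA pattern first and against the lineage-specific variants afterwards.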

We have also identified two clear cases of acquisition of preferential insertion in specific sequences. Thus, those elements belonging to the cluster formed by GYPSY41_AG, GYPSY42_AG, and GYPSY43_AG (cluster C in figure 1) are inserted at ATAT sites and those belonging to the related families GYPSY44_AG and GYPSY45_AG are inserted at C(G/T)CG sites (cluster B in figure 1). This preferential insertion might play an important role in host-retrotransposon coevolution (SanMiguel et al. 1996; Voytas 1996).
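
The two insertion-site preferences can be written as simple patterns. The following Python sketch is illustrative only; it assumes that the preferred site coincides with the 4-bp sequence duplicated at the insertion, which is an assumption made here for the example, and the names TARGET_SITES and site_preference are hypothetical.

```python
import re

# Patterns transcribed from the text: ATAT (cluster C) and C(G/T)CG (cluster B).
TARGET_SITES = {
    "cluster C": re.compile(r"ATAT"),
    "cluster B": re.compile(r"C[GT]CG"),
}


def site_preference(site):
    """Return the clusters whose preferred insertion site matches the sequence."""
    s = site.upper()
    return [name for name, rx in TARGET_SITES.items() if rx.fullmatch(s)]
```

Tallying the result over all insertions of a family would show how strict the preference is.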

Finally, the env ORF present in some members of the Gypsy lineage deserves more attention. It has been shown that this lineage has acquired its env gene from a class of insect baculoviruses early in its evolution (Malik, Henikoff, and Eickbush 2000). Later, a few Drosophila elements have lost the env gene, such as Burdock (Terzian, Pélisson, and Bucheton 2001). Our survey of elements of the Gypsy lineage in A. gambiae revealed the existence of nine families. Only three of them (namely GYPSY41_AG, GYPSY46_AG, and GYPSY47_AG) conserve the env gene. Taking into account the phylogenetic relationships shown in figure 1, this fact implies three independent losses of the env ORF during the evolution of these elements (branches B, D, and E in figure 1). The role of the env gene in the life cycle of the elements from the Gypsy lineage remains enigmatic. It has been shown that the env protein of Gypsy may confer infectious properties to the element (Song et al. 1994), leading to the suggestion of a mechanism for Gypsy mobilization through infection of the germline by retroviral particles produced in the follicle cells (Song et al. 1997). Nevertheless, amplification of Gypsy of D. melanogaster may occur in an env-independent manner in the female germline (Chalvet et al. 1999). In a similar way, the Drosophila Zam element, which contains an env ORF, enters the oocyte via the vitelline granule traffic with no apparent need for its env protein, after expression in follicle cells surrounding the oocyte (Leblanc et al. 2000).

Turnover of LTR Retrotransposons in A. gambiae
We have identified 47 families of the Ty3/gypsy group in mosquito containing putatively active elements (table 1). The average pairwise identity between the putatively active elements of each family is always higher than 99% (table 2). Furthermore, 83.5% of the putatively active elements have identical flanking LTRs (table 1). Thus, the genome of the PEST strain of A. gambiae, the strain selected by the Anopheles genome project, presents clear evidence of recent activity for around 75% (47 of 63) of the LTR retrotransposon families characterized in this work.
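
Because the two LTRs of an element are identical at the time of insertion, the identity between them is a convenient indicator of how recently the element inserted. A minimal Python sketch of the identity calculation is given below; it assumes the two sequences are already aligned to the same length with "-" marking gaps, skips gapped columns (one of several possible conventions), and uses a hypothetical function name.

```python
def percent_identity(a, b):
    """Percent identity between two aligned sequences, ignoring gapped columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Keep only columns where neither sequence has a gap.
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)
```

Under this convention, a value of 100 for the 5' and 3' LTRs of one element corresponds to the "identical flanking LTRs" criterion used above.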

Lerat, Rizzon, and Biémont (2003) have recently shown that, in general, the TE families of D. melanogaster are characterized by a high degree of homogeneity and a lack of divergent elements. By contrast, all the families of the Ty3/gypsy group in A. gambiae contain a significant proportion of inactive degenerate elements, bearing indels and/or nonsense mutations and showing an average pairwise divergence significantly higher than that between active members (tables 1 and 2). Thus, around 40% of the insertions within the sequenced genome of A. gambiae are inactive degenerate elements, but only around 13% are active copies. Eighteen percent of the insertions (117/642) correspond to elements without obvious inactivating mutations but with unsequenced gaps, precluding their classification as either active or inactive. The significant overrepresentation of inactive elements within the unmapped scaffolds strongly indicates that heterochromatin is a shelter for these degenerate copies because of lower selection against inserted elements in heterochromatin. Bearing in mind that natural selection acts against inserted elements mainly because of insertional mutations and chromosomal rearrangements generated by ectopic exchange between different insertions (Charlesworth and Langley 1989; Charlesworth, Sniegowski, and Stephan 1994), this lower selection is easily explained by the reduced gene density and recombination rates in heterochromatic regions. The high frequency of inactive copies is a strong indication of a slower turnover rate of Ty3/gypsy retrotransposons within the genome of A. gambiae than within that of D. melanogaster, most probably reflecting a reduced efficacy of selection against insertions of Ty3/gypsy retrotransposons in A. gambiae. The weakening of the efficacy of selection against TE insertions might be related to the complex genetic population structure of A. gambiae sensu stricto (della Torre et al. 2002). This species is composed of different isolated or semi-isolated genetic units. There are different chromosomal and molecular forms showing incomplete premating barriers. This complex structure might give rise to a reduced effective population size and/or a reduced recombination rate, both features leading to a reduced efficiency of selection against TE insertions (Charlesworth and Langley 1989; Charlesworth, Sniegowski, and Stephan 1994).

In addition to the high frequency of inactive members, we have noted a strong excess of solo LTRs in A. gambiae in comparison with D. melanogaster, confirming our previous observation from the Mdg1 lineage (Tubío, Costas, and Naveira 2004). Thus, 176 of the 642 insertions (27%) identified in the present work correspond to solo LTRs versus 58 of 740 insertions (7.8%) reported by Kaminker et al. (2002) in D. melanogaster. This feature is in agreement with the above-mentioned slower turnover rate of retrotransposons in Anopheles. As a consequence, each individual insertion remains within the genome for a longer period of time, increasing the probability of exchange between the two LTRs flanking an element, giving rise to solo LTRs.
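
The two solo-LTR proportions quoted above can be reproduced, and their difference gauged, with a short calculation. The Python sketch below uses a pooled two-proportion z-test as one possible way to quantify the excess; this test is an assumption chosen for illustration and is not reported in this work, and the function name is hypothetical.

```python
from math import erfc, sqrt


def two_proportion_z(k1, n1, k2, n2):
    """Pooled two-proportion z-test; returns (p1, p2, z, two-sided p-value)."""
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = erfc(abs(z) / sqrt(2))  # two-sided tail of the standard normal
    return p1, p2, z, p_value


# Counts taken from the text: solo LTRs / total insertions in each species.
p_ag, p_dm, z, p_value = two_proportion_z(176, 642, 58, 740)
```

With these counts the proportions come out at about 27% and 7.8%, matching the figures in the text, and the z statistic is large, consistent with the description of a strong excess.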

Finally, we have found a significant overrepresentation of insertions of LTR retrotransposons on the X chromosome in comparison with the autosomes. Taking into account that A. gambiae exhibits comparable recombination frequencies in both sexes but males are hemizygous (Zheng et al. 1996), the overrepresentation of insertions on the X chromosome might be simply explained by a stronger selective pressure against autosomal insertions, because of their greater opportunity for ectopic recombination, according to theoretical expectations (Charlesworth and Langley 1989; Charlesworth, Sniegowski, and Stephan 1994). Nevertheless, the chromosomal distribution of TEs depends on a series of complex interacting factors in addition to recombination rates, such as gene density, chromatin structure, transposition mechanisms, or interactions between TEs and host genes (Carr et al. 2002; Rizzon et al. 2002). Thus, another possible explanation for the underrepresentation of autosomal insertions might be, for instance, a lower overall transposition rate per copy per generation in male genomes. It is interesting to note that different transposition rates between sexes have been detected for specific elements (Pasyukova et al. 1997).


    Supplementary Material
Supplementary table 1.

Supplementary Alignment. Text file in fasta format. Insertions not assigned to families.


    Acknowledgements
J.M.C.T. was supported by a predoctoral fellowship from Xunta de Galicia (Spain) and FSE European funds. The authors would like to thank E. Valadé for interesting discussions.


    Footnotes
 
Billie Swalla, Associate Editor


    References

    Altschul, S. F., T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25:3389–3402.

    Bae, Y-A., S-Y. Moon, Y. Kong, S-Y. Cho, and M-G. Rhyu. 2001. CsRn1, a novel active retrotransposon in a parasitic trematode, Clonorchis sinensis, discloses a new phylogenetic clade of Ty3/gypsy-like LTR retrotransposons. Mol. Biol. Evol. 18:1474–1483.

    Biedler, J., and Z. Tu. 2003. Non-LTR retrotransposons in the African malaria mosquito, Anopheles gambiae: unprecedented diversity and evidence of recent activity. Mol. Biol. Evol. 20:1811–1825.

    Boeke, J. D., T. Eickbush, S. B. Sandmeyer, and D. F. Voytas. 2000. Family Metaviridae. Pp. 359–367 in M. Regenmortel, C. Fauquet, D. Bishop, eds. Virus taxonomy: classification and nomenclature of viruses. Academic Press, San Diego.

    Carr, M., J. R. Soloway, T. E. Robinson, and J. F. Brookfield. 2002. Mechanisms regulating the copy numbers of six LTR retrotransposons in the genome of Drosophila melanogaster. Chromosoma 110:511–518.

    Chalvet, F., L. Teysset, C. Terzian, N. Prud'homme, P. Santamaria, A. Bucheton, and A. Pelisson. 1999. Proviral amplification of the Gypsy endogenous retrovirus of Drosophila melanogaster involves env-independent invasion of the female germline. EMBO J. 18:2659–2669.

    Charlesworth, B., and C. H. Langley. 1989. The population genetics of Drosophila transposable elements. Annu. Rev. Genet. 23:251–287.

    Charlesworth, B., P. Sniegowski, and W. Stephan. 1994. The evolutionary dynamics of repetitive DNA in eukaryotes. Nature 371:215–220.

    Cook, J. M., J. Martin, A. Lewin, R. E. Siden, and M. Tristem. 2000. Systematic screening of Anopheles mosquito genomes yields evidence for a major clade of Pao-like retrotransposons. Insect Mol. Biol. 9:109–117.

    Copeland, C. S., P. J. Brindley, O. Heyers, S. F. Michael, D. A. Johnston, D. L. Williams, A. C. Ivens, and B. H. Kalinna. 2003. Boudicca, a retrovirus-like long terminal repeat retrotransposon from the genome of the human blood fluke Schistosoma mansoni. J. Virol. 77:6153–6166.

    della Torre, A., C. Costantini, N. J. Besansky, A. Caccone, V. Petrarca, J. R. Powell, and M. Coluzzi. 2002. Speciation within Anopheles gambiae—the glass is half full. Science 298:115–117.

    Gao, X., D. J. Rowley, X. Gai, and D. F. Voytas. 2003. Translational recoding signals between gag and pol in diverse LTR retrotransposons. RNA 9:1422–1430.

    Garel, A., P. Nony, and J. C. Prudhomme. 1994. Structural features of mag, a gypsy-like retrotransposon of Bombyx mori, with unusual short terminal repeats. Genetica 93:125–137.

    Hall, T. A. 1999. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp. Ser. 41:95–98.

    Hill, S. R., S. S. Leung, N. L. Quercia, D. Vasiliauskas, J. Yu, I. Pasic, D. Leung, A. Tran, and P. Romans. 2001. Ikiara insertions reveal five new Anopheles gambiae transposable elements in islands of repetitious sequence. J. Mol. Evol. 52:215–231.

    Holmes, I. 2002. Transcendent elements: whole-genome transposon screens and open evolutionary questions. Genome Res. 12:1152–1155.

    Holt, R. A., G. M. Subramanian, A. Halpern et al. (126 co-authors). 2002. The genome sequence of the malaria mosquito Anopheles gambiae. Science 298:129–149.

    International Human Genome Sequencing Consortium. 2001. Initial sequencing and analysis of the human genome. Nature 409:860–921.

    Jurka, J. 2000. Repbase update: a database and an electronic journal of repetitive elements. Trends Genet. 16:418–420.

    Kaminker, J. S., C. M. Bergman, B. Kronmiller et al. (12 co-authors). 2002. The transposable elements of Drosophila melanogaster euchromatin: a genomics perspective. Genome Biol. 3:research0084.1–0084.20.

    Kapitonov, V., and J. Jurka. 2003. Molecular paleontology of transposable elements in the Drosophila melanogaster genome. Proc. Natl. Acad. Sci. USA 100:6569–6574.

    Krzywinski, J., D. R. Nusskern, M. K. Kern, and N. Besansky. 2004. Isolation and characterization of Y chromosome sequences from the African malaria mosquito Anopheles gambiae. Genetics 166:1291–1302.

    Kumar, S., K. Tamura, I. B. Jakobsen, and M. Nei. 2001. MEGA2: molecular evolutionary genetics analysis software. Bioinformatics 17:1244–1245.

    Langley, C. H., E. Montgomery, R. Hudson, N. Kaplan, and B. Charlesworth. 1988. On the role of unequal exchange in the containment of transposable element copy number. Genet. Res. 52:223–235.

    Leblanc, P., S. Desset, F. Giorgi, A. R. Taddei, A. M. Fausto, M. Mazzini, B. Dastugue, and C. Vaury. 2000. Life cycle of an endogenous retrovirus, ZAM, in Drosophila melanogaster. J. Virol. 74:10658–10669.

    Lerat, E., C. Rizzon, and C. Biémont. 2003. Sequence divergence within transposable element families in the Drosophila melanogaster genome. Genome Res. 13:1889–1896.

    Malik, H. S., and T. H. Eickbush. 1999. Modular evolution of the integrase domain in the Ty3/gypsy class of LTR retrotransposons. J. Virol. 73:5186–5190.

    Malik, H. S., S. Henikoff, and T. H. Eickbush. 2000. Poised for contagion: evolutionary origins of the infectious abilities of invertebrate retroviruses. Genome Res. 10:1307–1318.

    Montgomery, E., B. Charlesworth, and C. H. Langley. 1987. A test for the role of natural selection in the stabilization of transposable element copy number in a population of Drosophila melanogaster. Genet. Res. 49:31–41.

    Pantazidis, A., M. Labrador, and A. Fontdevila. 1999. The retrotransposon Osvaldo from Drosophila buzzatii displays all structural features of a functional retrovirus. Mol. Biol. Evol. 16:909–921.

    Pasyukova, E., S. Nuzhdin, W. Li, and A. J. Flavell. 1997. Germ line transposition of the copia retrotransposon in Drosophila melanogaster is restricted to males by tissue-specific control of copia RNA levels. Mol. Gen. Genet. 255:115–124.

    Rizzon, C., G. Marais, M. Gouy, and C. Biémont. 2002. Recombination rate and the distribution of transposable elements in the Drosophila melanogaster genome. Genome Res. 12:400–407.

    SanMiguel, P., A. Tikhonov, Y-K. Jin et al. (12 co-authors). 1996. Nested retrotransposons in the intergenic regions of the maize genome. Science 274:765–768.

    Song, S. U., T. Gerasimova, M. Kurkulos, J. D. Boeke, and V. G. Corces. 1994. An Env-like protein encoded by Drosophila retroelement: evidence that gypsy is an infectious retrovirus. Genes Dev. 8:2046–2057.

    Song, S. U., M. Kurkulos, J. D. Boeke, and V. G. Corces. 1997. Infection of the germ line by retroviral particles produced in the follicle cells: a possible mechanism for the mobilization of the gypsy retroelement of Drosophila. Development 124:2789–2798.

    Sprinzl, M., K. S. Vassilenko, J. Emmerich, and F. Bauer. 1999. Compilation of tRNA sequences and sequences of tRNA genes. http://www.staff.uni-bayreuth.de/~btc914/search/index.html.

    Tatusova, T. A., and T. L. Madden. 1999. Blast 2 sequences—a new tool for comparing protein and nucleotide sequences. FEMS Microbiol. Lett. 174:247–250.

    Terzian, C., A. Pélisson, and A. Bucheton. 2001. Evolution and phylogeny of insect endogenous retroviruses. BMC Evol. Biol. 1:3.

    Thompson, J. D., T. J. Gibson, K. Plewniak, F. Jeanmougin, and D. G. Higgins. 1997. The ClustalX windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 25:4876–4882.

    Tubío, J. M. C., J. C. Costas, and H. F. Naveira. 2004. Evolution of the mdg1 lineage of the Ty3/gypsy group of LTR retrotransposons in Anopheles gambiae. Gene 330:123–131.

    Volff, J-N., C. Körting, J. Altschmied, J. Duschl, K. Sweeney, K. Wichert, A. Froschauer, and M. Schartl. 2001. Jule from the fish Xiphophorus is the first complete vertebrate Ty3/gypsy retrotransposon from the mag family. Mol. Biol. Evol. 18:101–111.

    Voytas, D. F. 1996. Retroelements in genome organization. Science 274:737–738.

    Xiong, Y., and T. H. Eickbush. 1990. Origin and evolution of retroelements based upon their reverse transcriptase sequences. EMBO J. 9:3353–3362.

    Zheng, L., M. Q. Benedict, A. J. Cornel, F. H. Collins, and F. C. Kafatos. 1996. An integrated genetic map of the African human malaria vector mosquito, Anopheles gambiae. Genetics 143:941–952.

Accepted for publication August 30, 2004.




