New Regulatory Regions of Drosophila 412 Retrotransposable Element Generated by Recombination

Nathalie Mugnier, Christian Biémont and Cristina Vieira

Laboratoire de Biométrie et Biologie Evolutive, Université Claude Bernard Lyon, Villeurbanne Cedex, France

Correspondence: E-mail: vieira{at}biomserv.univ-lyon1.fr.


    Abstract
There is no doubt that transposable elements (TEs) have greatly influenced genome evolution. They have, however, evolved in different ways in mammals, plants, and invertebrates. In mammals they are widely present but show low transposition activity; in plants they are responsible for large increases in genome size. In Drosophila, despite their low abundance, they appear to transpose at higher rates. Understanding how these elements have evolved in different genomes, and how host genomes have managed to counter them, are therefore major questions in genome evolution. We analyzed sequences of the retrotransposable element 412 in natural populations of Drosophila simulans and D. melanogaster, two species that differ greatly in their TE content. We identified new subfamilies of this element that resulted not only from mutation and insertion-deletion processes but also from interfamily recombination. These new elements were well conserved in the D. simulans natural populations. The new regulatory regions produced by recombination could give rise to new elements able to overcome host control of transposition and, thus, become potential genome invaders.

Key Words: retrotransposable element 412 • Drosophila natural populations • regulatory region


    Introduction
From the data on sequenced eukaryotic genomes (human, Drosophila, nematode, and yeast) (The C. elegans Sequencing Consortium 1998; Adams et al. 2000; International Human Genome Sequencing Consortium 2003), it is clear that transposable elements (TEs), which are increasingly thought to have had a strong evolutionary impact (McDonald et al. 1997; Bowen and Jordan 2002), are a major component of most genomes (Berg and Howe 1989). Among the TE families, the long terminal repeat retrotransposons (LTR retrotransposons) are abundant in most species. The transposition mechanism of these elements relies on a reverse transcriptase and an RNA polymerase, both of which lack proofreading activity. Errors can therefore accumulate in the newly retrotransposed DNA copies, in addition to the mutations that arise during cell replication (Katz and Skalka 1990). As a result, nucleotide diversity between copies of these LTR retrotransposons increases over time, possibly more than for simple repeated sequences.

In retroviruses and LTR retrotransposons, the LTR sequences include several regulatory sequences necessary for the replication cycle. Despite this important regulatory function, LTRs are very rapidly evolving sequences (Arkhipova, Lyubomirskaya, and Ilyin 1995), a phenomenon attributed to the need to cope with changing genomic environments (Ludwig et al. 2000). The LTR-ULR regions (ULR, untranslated leader region) of LTR retrotransposons contain a high number of regulatory motifs (McDonald et al. 1997), which are involved in mechanisms such as the response to stress, as has been shown for the Tnt1 element in tobacco (Casacuberta et al. 1997) and the 1731 element in Drosophila (Faure, Best-Belpomme, and Champion 1996). This suggests that the LTR-ULR region may play an important role in TE expression in different genomic environments (Ludwig 2002) and could be key to the successful invasion of a genome by a new TE.

The number of euchromatic insertion sites of the 412 retrotransposable element (Yuki et al. 1986) has a very peculiar distribution in natural populations of D. simulans: it follows a latitudinal cline, with populations from East Africa having fewer insertion sites than populations from the Northern Hemisphere (Vieira and Biémont 1996a). This cline may result from adaptation to environmental conditions, such as minimum temperature (Vieira et al. 1998), although no direct effect of these conditions on 412 expression has been demonstrated in the laboratory. The transposition rate of 412 was found to be around 10⁻³ in natural populations (Vieira and Biémont 1997), and its expression in gonads was found to be restricted to the testis and to be tightly controlled independently of copy number (Borie et al. 2002). In contrast, the somatic expression of 412 increases with copy number when this number is low but decreases at high copy numbers, suggesting that there may be a silencing mechanism related to the number of copies (Borie, Loevenbruck, and Biémont 2000).

To find out more about the features of the 412 copies present in the natural populations with differing insertion site numbers, we carried out a detailed study of the regulatory regions of this element in natural populations of D. simulans and D. melanogaster. We identified two distinct subfamilies of regulatory sequences in D. simulans, whereas the sequences in the D. melanogaster populations were more homogeneous. The new regulatory regions in D. simulans were generated by the usual mechanisms of nucleotide substitutions and insertion/deletion, but also by interfamily recombination. These findings allow us to discuss the possible evolutionary history of 412 and its potential capacity to invade the Drosophila genome.

The nucleotide sequence data from this study have been submitted to GenBank under accession numbers AY634189 to AY634215.


    Materials and Methods
Drosophila Strains
Flies derived from natural populations (table 1) were maintained at 18°C in the laboratory as isofemale lines or small mass cultures with around 50 pairs in each generation. We worked with the strains of D. melanogaster from Reunion Island (Indian Ocean), Bolivia, Saint Cyprien (France), and Canton (China) and with strains of D. simulans from Amieu, Papeete and Moorea (Pacific Ocean), Makindu (Kenya), Moscow (Russia), Madagascar, and Canberra (Australia) described in Biémont et al. (2003).


Table 1 Populations and Natural Size Variants of the LTR-ULR Region of 412 Studied

 
PCR Amplification and Sequencing
Total genomic DNA was prepared from adult flies as described by Bender, Spierer, and Hogness (1983). Primers F412LTRULR (5'-TAT GTG CAT ATA TCG AGG GTA CA-3') and R412LTRULR (5'-TCT ACC AAG CAG ACG TTC GAA C-3') were designed from the 412 reference sequence (GenBank accession number X04132) and used to amplify the 412 sequence extending from within the 5' LTR to the beginning of the gag gene (fig. 1). PCR conditions were as follows: 5 min at 94°C; 30 cycles of 1 min at 94°C, 1 min at 57°C, and 1 min at 72°C; and a final 10 min at 72°C. PCR products were fractionated on 2% agarose gels and visualized by ethidium bromide staining. PCR of genomic 412 sequences yielded a pool of LTR-ULR products from the multiple 412 elements in a genome. These PCR products were then subcloned into the pDrive Cloning Vector using the Qiagen PCR Cloning Kit (Qiagen). Individual subclones were PCR amplified as described above, and the products were fractionated on 2% agarose gels. For each population, we then chose the size variants to be sequenced. Automated sequencing was carried out at the DTAMB sequencing facility of the University of Lyon. The 412 LTR-ULR subclones were sequenced in both directions using the M13 forward and M13 reverse universal primers. The resulting chromatograms were aligned and compared using the SeqEd program (Applied Biosystems) to resolve any sequence ambiguity.



FIG. 1.— The structure and regulatory region of the 412 element. The regulatory region is about 1,800 bp long and is composed of the 5' LTR and the 5' ULR. The two small ORFs (sORF1 and sORF2) are putative ORFs. LTR, long terminal repeat; ULR, untranslated leader region; gag, group-specific antigen; pol, polymerase domain; sORF, small open reading frame.

 
Sequence Analysis
Twenty-seven size variants of the LTR-ULR region were sequenced from natural populations: nine from D. melanogaster and 18 from D. simulans (table 1). In addition to the size variants from natural populations, we analyzed the regulatory regions of the 26 complete 412 copies from the sequenced genome of D. melanogaster (table 2). These 26 regulatory regions were retrieved from release 3.0 of the genome on the Web site of the Berkeley Drosophila Genome Project (http://www.fruitfly.org/); all were located in the euchromatic portion of the genome. Each copy was identified by the following convention: chromosome arm_nucleotide position of the beginning of the sequence.
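
The identifier convention can be illustrated with a small Python helper that splits a name such as 3R_10928293 into its chromosome arm and start coordinate. This is a hypothetical sketch for readers handling these copy names programmatically, not part of the original analysis; the function name parse_copy_id and the list of accepted arm labels are our own assumptions.

```python
from typing import NamedTuple

# Assumed set of D. melanogaster chromosome arm labels (hypothetical filter).
KNOWN_ARMS = {"X", "2L", "2R", "3L", "3R", "4"}


class CopyId(NamedTuple):
    arm: str    # chromosome arm, e.g. "3R"
    start: int  # nucleotide position of the beginning of the sequence


def parse_copy_id(name: str) -> CopyId:
    """Split an 'arm_position' identifier into its arm label and start position."""
    arm, sep, pos = name.partition("_")
    if not sep or arm not in KNOWN_ARMS or not pos.isdigit():
        raise ValueError(f"not an arm_position identifier: {name!r}")
    return CopyId(arm, int(pos))


for copy_name in ["3R_10928293", "3L_1028125", "X_21318687"]:
    print(parse_copy_id(copy_name))
```

Names that do not follow the arm_position pattern, such as the heterochromatic locus AE_003306_1 mentioned in the Results, are rejected rather than silently misparsed.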


Table 2 Copies from the Sequenced Genome

 
Sequences were edited with the SEAVIEW sequence editor (Galtier, Gouy, and Gautier 1996). Size variants were aligned using ClustalX (Thompson et al. 1997), and all phylogenetic trees and nucleotide diversity values were based on this alignment. Local alignments between pairs of sequences were performed with the LFasta program (Pearson and Lipman 1988). A graphical representation of the LFasta output, giving the percentage similarity of the aligned regions, was produced with Lalnview (Duret, Gasteiger, and Perrière 1996); we used this representation to illustrate particular events, such as recombination. Repeated sequences were detected with Tandem Repeats Finder version 2.02 (Benson 1999), and binding sites were detected with MatInspector (Quandt et al. 1995). BlastN (Altschul et al. 1997) was used for similarity searches between sequences or against GenBank.

Nucleotide Diversity
To detect differences in selective pressure along the regulatory region, we calculated the nucleotide diversity (π) with DnaSP version 3.51 (Rozas and Rozas 1999). This analysis was performed on different groups of sequences (size-type variants, species, etc.). For a given group of sequences, Π was calculated as the average number of nucleotide differences between two randomly chosen sequences (Li 1997):

$$\Pi = \frac{2}{n(n-1)} \sum_{i<j} d_{ij}$$

where d_ij is the number of nucleotides that differ between sequences i and j, n is the number of sequences studied, and n(n–1)/2 is the number of possible pairs. Because Π depends on sequence length, we used the standardized nucleotide diversity π = Π/L (Li 1997), where L is the length of the sequence. π, the number of nucleotide differences per site between two randomly chosen sequences, is an estimate of the neutral mutation parameter θ = 4Neµ. The number of segregating sites, k (Watterson 1975), provides another estimate of θ, namely k/a_1, with a_1 as defined below. Under the neutral model of evolution, these two estimates are expected to be similar. The Tajima D statistic (Tajima 1989), computed with DnaSP, compares them to test neutrality for a group of sequences or for regions within sequences:

$$D = \frac{\Pi - k/a_1}{\sqrt{e_1 k + e_2 k(k-1)}}$$

with

$$a_1 = \sum_{i=1}^{n-1} \frac{1}{i}, \qquad a_2 = \sum_{i=1}^{n-1} \frac{1}{i^2}, \qquad b_1 = \frac{n+1}{3(n-1)}, \qquad b_2 = \frac{2(n^2+n+3)}{9n(n-1)},$$

$$c_1 = b_1 - \frac{1}{a_1}, \qquad c_2 = b_2 - \frac{n+2}{a_1 n} + \frac{a_2}{a_1^2}, \qquad e_1 = \frac{c_1}{a_1}, \qquad e_2 = \frac{c_2}{a_1^2 + a_2}.$$

Tajima (1989) suggested using a β distribution as an approximation of the distribution of D. A negative value of D reflects an excess of low-frequency variants, which can indicate that sequences are subject to selective pressure against new mutations. Because this test is suitable for genes but not for transposable elements, we used it only to compare different parts (LTR, ULR, and sORFs) of the regulatory region within the same group of sequences. Gaps were not taken into account. Finally, for each sORF, we tested for selective pressure with a χ² comparison of synonymous and replacement substitutions in the sORF coding sequences.
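
The quantities used in this section (Π, π, the number of segregating sites k, and Tajima's D) can be sketched in a few lines of Python. This is an illustrative reimplementation under our own assumptions, not DnaSP: sequences are equal-length aligned strings, columns containing a gap ('-') in any sequence are removed before counting (one way of not taking gaps into account), and the toy alignment is invented.

```python
import math
from itertools import combinations


def drop_gap_columns(seqs):
    """Remove every alignment column that has a gap ('-') in any sequence."""
    keep = [i for i in range(len(seqs[0])) if all(s[i] != "-" for s in seqs)]
    return ["".join(s[i] for i in keep) for s in seqs]


def diversity_and_tajima(seqs):
    """Return (Pi, pi, k, D) for a list of aligned, equal-length sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    seqs = drop_gap_columns(seqs)
    n, length = len(seqs), len(seqs[0])
    # Pi: mean number of differences over the n(n-1)/2 sequence pairs.
    pairs = list(combinations(seqs, 2))
    Pi = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)
    pi = Pi / length  # standardized per-site diversity
    # k: number of segregating (polymorphic) sites.
    k = sum(len(set(column)) > 1 for column in zip(*seqs))
    if n < 4 or k == 0:
        return Pi, pi, k, float("nan")  # D is undefined in these cases
    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1, e2 = c1 / a1, c2 / (a1**2 + a2)
    D = (Pi - k / a1) / math.sqrt(e1 * k + e2 * k * (k - 1))
    return Pi, pi, k, D


# Toy alignment in which every polymorphism is a singleton (a rare variant).
toy = ["AAAAA", "TAAAA", "ATAAA", "AATAA", "AAATA"]
Pi, pi, k, D = diversity_and_tajima(toy)
print(f"Pi={Pi:.3f} pi={pi:.3f} k={k} D={D:.3f}")
```

In the toy alignment every polymorphism is a singleton, the situation in which Π falls below k/a_1 and D becomes negative. With fewer than four sequences or no segregating sites, D is undefined and the sketch returns NaN.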

The differences between π values were evaluated statistically using the standard deviations of nucleotide diversity provided by DnaSP. Each difference was transformed into a z score, which is approximately normally distributed (a t distribution with infinite degrees of freedom). The z score is the absolute value of the difference between the two π values divided by the square root of the sum of the squares of their standard deviations:

$$z = \frac{|\pi_1 - \pi_2|}{\sqrt{\mathrm{SD}_1^2 + \mathrm{SD}_2^2}}$$

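
The z comparison described here reduces to a few lines of Python. The sketch is illustrative only: the function names are ours, the two-sided P value from the standard normal distribution is our addition (the text reports z), and the input values in the example are hypothetical, not taken from this study's tables.

```python
import math


def z_score(pi1, sd1, pi2, sd2):
    """z score for the difference between two nucleotide diversity values."""
    return abs(pi1 - pi2) / math.sqrt(sd1**2 + sd2**2)


def two_sided_p(z):
    """Two-sided tail probability of a standard normal deviate."""
    return math.erfc(z / math.sqrt(2))


# Hypothetical pi values and standard deviations for two regions.
z = z_score(0.010, 0.002, 0.004, 0.001)
print(f"z = {z:.2f}, P = {two_sided_p(z):.4f}")
```

Because z is compared with the standard normal distribution, |z| > 1.96 corresponds to P < 0.05 (two-sided).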
Phylogenetic Analysis
Phylogenetic trees were obtained with the Phylo_win program (Galtier, Gouy, and Gautier 1996) using the Neighbor-Joining method (Saitou and Nei 1987). We chose this method because it does not assume a constant rate of evolution, and transposable elements can be subject to variable rates of substitution. Distances were corrected with the Jukes and Cantor (1969) model, which can be applied to noncoding sequences (Li 1997). Bootstrap values were used to assess the support for internal branches. The reconstruction was based on the LTR region because no recombination events were detected in this region; the LTR phylogeny can, thus, be used to reconstruct the evolutionary history of all the size variants. In addition, four regulatory regions from the sequenced genome were included in the tree, treating the sequenced genome as an additional population.
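
The two computational steps named here, Jukes and Cantor correction of pairwise distances followed by neighbor joining, can be sketched in Python as below. This is our own minimal illustration, not Phylo_win: it assumes gap-free aligned sequences, returns an unrooted tree in Newick format, breaks ties between equally good pairs by input order, and omits bootstrapping (which would repeat the whole procedure on alignments resampled by column). The example sequences are invented.

```python
import math


def jc69_distance(a, b):
    """Jukes and Cantor (1969) distance between two aligned, gap-free sequences."""
    p = sum(x != y for x, y in zip(a, b)) / len(a)  # proportion of differing sites
    if p >= 0.75:
        return math.inf  # correction undefined at saturation
    return -0.75 * math.log(1 - 4 * p / 3)


def neighbor_joining(names, dist):
    """Saitou and Nei (1987) neighbor joining; returns an unrooted Newick tree."""
    nodes = list(names)
    d = {(a, b): dist[i][j] for i, a in enumerate(names) for j, b in enumerate(names)}
    while len(nodes) > 3:
        r = len(nodes)
        tot = {a: sum(d[a, b] for b in nodes) for a in nodes}
        # Join the pair that minimizes Q(i, j) = (r - 2) d(i, j) - R_i - R_j.
        i, j = min(
            ((a, b) for x, a in enumerate(nodes) for b in nodes[x + 1:]),
            key=lambda pair: (r - 2) * d[pair] - tot[pair[0]] - tot[pair[1]],
        )
        li = d[i, j] / 2 + (tot[i] - tot[j]) / (2 * (r - 2))
        lj = d[i, j] - li
        u = f"({i}:{li:.4f},{j}:{lj:.4f})"
        # Distances from the new internal node to every remaining node.
        for k in nodes:
            if k not in (i, j):
                d[u, k] = d[k, u] = (d[i, k] + d[j, k] - d[i, j]) / 2
        d[u, u] = 0.0
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b, c = nodes  # resolve the last three nodes around a single center
    la = (d[a, b] + d[a, c] - d[b, c]) / 2
    lb = d[a, b] - la
    lc = d[a, c] - la
    return f"({a}:{la:.4f},{b}:{lb:.4f},{c}:{lc:.4f});"


# Hypothetical aligned fragments, used only to show the calling convention.
seqs = {"s1": "ACGTACGTAC", "s2": "ACGTACGTTC", "s3": "ACGAACGTTC", "s4": "TCGAACGTTA"}
labels = list(seqs)
matrix = [[jc69_distance(seqs[x], seqs[y]) for y in labels] for x in labels]
print(neighbor_joining(labels, matrix))
```

Rooting on chosen outgroup copies is a separate step applied to the unrooted output and is not shown.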


    Results
Size Variants and Nucleotide Diversity
412 Copies from the Sequenced Genome of D. melanogaster
The general structure of the 412 transposable element is shown in figure 1. Note the presence in the ULR of two small open reading frames (sORFs) of unknown function and of two repeated motifs. To get a general idea of the degree of diversity among 412 copies, we first analyzed data from the sequenced genome of D. melanogaster. The 26 full-length 412 copies of the sequenced genome of D. melanogaster studied by Lerat, Rizzon, and Biémont (2003) share 99% to 100% sequence similarity. Detailed analysis of the regulatory regions of these copies indicates that variation in size results essentially from repeats of two motifs: a 30-bp motif located in the LTR and a motif of about 60 bp located in the ULR, between the two sORFs (see figure 1 and table 3). We named these motifs the 30-bp LTR and the 60-bp ULR, respectively. We defined four types of regulatory region on the basis of the presence or absence of these two repeats (fig. 1). We also detected a small variant (3R_10928293), which resulted from a deletion of about 260 bp in sORF1, and three larger variants (3L_1028125, 3L_1200276, and X_21318687), which had an insertion of 89 bp just before the end of the LTRs. These three copies all have an LTR (about 553 bp for 3L_1028125 and 517 bp for the two other copies).


Table 3 Copies of the 412 Size Variants from the Natural Populations of D. melanogaster and D. simulans

 
The nucleotide diversity index, π (Li 1997), was calculated for different parts of the regulatory region (table 4). This index corresponds to the mean number of nucleotide differences per site over all pairwise comparisons of the sequences in a group. The 5' LTRs of the 412 copies of the sequenced genome (π = 0.0084) were significantly more divergent (z = 7.206) than their ULRs (π = 0.0011), and within the ULR, the sORFs were even better conserved (π = 0.0007 for sORF1 and 0.0002 for sORF2).


Table 4 Nucleotide Diversity Index π of the LTR-ULR of the 412 Elements from Natural Populations

 
412 Copies in Natural Populations
Table 1 shows the number of size variants of the regulatory region found in natural populations; all the sequences obtained are presented in table 3, and figure 2a summarizes the types of regulatory regions obtained from natural populations of D. melanogaster. Seven of the nine sequences obtained were similar to the main types defined above from the sequenced genome (fig. 1 and table 3). They had the 30-bp LTR and 60-bp ULR repeats. Their levels of nucleotide diversity were higher than those of the 412 sequences from the sequenced genome (table 4). Of the two other size variants, one (DmCyp2, 1,265 bp), found only in the Saint Cyprien population, resulted from a deletion of sORF1, and the other (DmR3, 1,514 bp), found in the Reunion Island population, resulted from a deletion in the ULR plus an insertion just before the gag gene. This last variant diverged in sequence from the others, and a BLAST search against the whole sequenced genome of D. melanogaster identified it as a heterochromatic copy (fig. 3): DmR3 was more than 90% similar to a heterochromatic 412 copy (locus AE_003306_1) present in the sequenced genome. This finding suggests that DmR3 is an old heterochromatic 412 copy that has accumulated many substitutions. We, therefore, used this element to root the phylogenetic tree described below.



FIG. 2.— Structure of the regulatory region of the 412 elements from natural populations. (a) Structure of the regulatory region of the 412 elements detected in D. melanogaster natural populations. (b) Structure of the regulatory region of the 412 elements detected in D. simulans natural populations.

 


FIG. 3.— LALNVIEW (Duret, Gasteiger, and Perrière 1996) representation of the LFasta (Pearson and Lipman 1988) results between the DmR3 size variant and (a) a consensus of the other size variants found in D. melanogaster natural populations and (b) the heterochromatic element found in the sequenced genome of D. melanogaster.

 
In the natural populations of D. simulans, besides the common variants found in D. melanogaster, we also found other size variants (fig. 2b and table 3) resulting from several deletion events. Among these deleted variants, we distinguished two groups with distinct structural features: the first group contained 412 sequences of around 1,600 bp resulting from several small deletions, whereas the second group contained composite elements of around 1,300 bp (see figure 2b and table 3). The sequences of the 1,600-bp group all presented a similar pattern of deletion and were the only sequences to have a deletion within their LTRs. We found these elements in four populations of D. simulans, from Canberra, Papeete, Madagascar, and Makindu. The Canberra variant (DsCb1, 1,665 bp) was the only variant of the 1,600-bp group with no deletion in the 5' LTR. The elements of the 1,300-bp group were detected in populations of D. simulans from Canberra, Moorea, Amieu, Madagascar, Moscow, and Makindu. These elements resulted from a recombination between a 412 element and an mdg1 element, as shown in figure 4; we refer to them as "composite elements." Their regulatory region consists of a 412-type LTR followed by the ULR of mdg1, and they terminate in 412-element sequence. This type of element has never been reported in D. melanogaster, and we did not observe it in our D. melanogaster samples either.



FIG. 4.— LALNVIEW (Duret, Gasteiger, and Perrière 1996) representation of the LFasta (Pearson and Lipman 1988) data explaining the recombination event of the composite elements. (a) Level of similarity between the canonical sequence of the regulatory region of the 412 element and a consensus of the regulatory region of the composite elements. (b) Level of similarity between the canonical sequence of the regulatory region of the mdg1 element and a consensus of the regulatory region of the composite elements.

 
The nucleotide diversity (π) for all 412 variants from D. simulans was around 0.128, compared with 0.017 for the sequences from the natural populations of D. melanogaster (table 4). The higher π values in D. simulans were mainly caused by the high sequence divergence of the ULRs of the composite elements (see table 2). We did not detect any significant difference between LTRs and ULRs when the D. simulans variants were analyzed by type of element (table 4).

Regulatory Motifs in the LTR-ULR Regions
We looked for potential regulatory binding sites in the different natural 412 variants using MatInspector. The search revealed a large number of motifs with high similarity scores. These motifs correspond to ecdysone-responsive sites (Thummel, Burtis, and Hogness 1990; Von Kalm et al. 1994), binding sites for homeodomain proteins (Wilson et al. 1996) or heat-shock proteins (Fernandes, Xiao, and Lis 1994), and MAR (matrix attachment region) binding sites (Dickinson, Dickinson, and Kohwi-Shigematsu 1997). The potential regulatory motifs were distributed throughout the regulatory region, although many of them were located within the ULR, as in the copia LTR retrotransposon (Jordan and McDonald 1998b). The 30-bp LTR motif did not contain any known binding site (see table S1 in Supplementary Material online).

Neutrality Test
Selection acting on the regulatory region was assessed with the Tajima (1989) test of neutrality, which compares different estimates of the neutral mutation parameter, θ. The test was applied to different groups of sequences and to the LTR and sORFs of the regulatory region when enough sequences were available for it to be valid (table 5). It was used only to compare the different parts of the same group of sequences, not to compare the species with each other. No significant deviation from the neutral model was detected for the D. simulans sequences of any group, although D was always negative, which may suggest some kind of purifying selection. In contrast, the ULR region of the D. melanogaster sequences deviated significantly from neutrality (D = –1.9982, P < 0.01), whereas the LTR did not. These findings suggest that a greater functional constraint operates on the ULR than on the LTR of the 412 element in D. melanogaster.


Table 5 Tajima Test of Neutrality

 
To analyze the coding regions (sORFs), we performed χ² comparisons of the distributions of synonymous and nonsynonymous mutations (table 6), using only D. melanogaster and D. simulans sequences with a complete sORF. The test was significant for sORF1 (χ² = 10.150, P < 0.01), suggesting selective pressure against replacement substitutions. No significant deviation from the neutral model of substitution was detected for sORF2. It is possible that the D. simulans sequences interfered with these tests, because no selective pressure had previously been detected for this group of sequences; we could not test the D. simulans group separately because there were not enough sequences with a complete sORF. We did not test sORF2 of D. melanogaster either, because these sequences were highly conserved (only two mutations), which makes the strong selective pressure on sORF2 in this species evident.
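
A χ² comparison of this kind can be sketched as a Pearson test on a 2 × 2 contingency table, assuming, as the title of table 6 suggests, that rows are synonymous versus replacement changes and columns are polymorphic versus fixed sites. The function name and the counts in the example are hypothetical and are not the values of table 6; with small expected counts, a continuity correction or Fisher's exact test may be preferable.

```python
import math


def chi2_2x2(table):
    """Pearson chi-square statistic (no continuity correction) and P for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Hypothetical counts: rows = synonymous, replacement; columns = polymorphic, fixed.
chi2, p = chi2_2x2([[12, 3], [4, 10]])
print(f"chi2 = {chi2:.3f}, P = {p:.4f}")
```

The sketch assumes every row and column total is nonzero; otherwise an expected count is zero and the statistic is undefined.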


Table 6 Amount of Polymorphic and Fixed Sites Found on the sORFs of the ULR of the 412 Elements

 
Phylogenetic Reconstruction
We analyzed the phylogenetic relationships among the 412 size variants using the Neighbor-Joining method (Saitou and Nei 1987). This phylogenetic reconstruction was carried out on the LTR, the least deleted region, to include as many sequences as possible in the analysis. Because Phylo_win produces unrooted trees, a root had to be chosen. We found two divergent elements, DmR3 and DsA2; DmR3 is a heterochromatic copy, as suggested above. We, therefore, assumed that these two divergent elements are old copies that have accumulated numerous substitutions, and we used them to root the trees.

The phylogenetic analysis of the LTRs shown in figure 5 was based on an alignment of 335 nucleotide sites. This analysis identified two distinct clusters (bootstrap value: 87%), which we named "subfamilies." The first subfamily was a monophyletic group composed of all the D. melanogaster variants plus the full-length variants of D. simulans (subfamily 1, see caption of figure 5). However, the low bootstrap values did not allow us to distinguish between sequences from the two species. The 412 size variants from the sequenced genome clustered with the D. melanogaster size variants from natural populations. The second subfamily contains the composite elements DsMoo2, DsCb2, DsMak3, DsA3, DsMad1, and DsMos2 (indicated by circles in fig. 5) and the four 1,600-bp size variants DsCb1, DsMad2, DsP1, and DsMak1 (indicated by stars in fig. 5), all of which belong to D. simulans (subfamily 2, see caption of figure 5). This group was not supported by a high bootstrap value, but the structural features of the subfamily 2 sequences clearly separate them from the full-length elements and suggest at least two distinct events in the evolution of the 412 regulatory region in D. simulans, namely deletion and recombination. Note that the DsCb1 1,600-bp variant (indicated by an arrow in fig. 5) clustered with the composite elements because of its full-length LTR, which is typical of composite elements. A phylogenetic reconstruction based on the sORF1 region clustered DsCb1 with the 1,600-bp sequences (data not shown). This suggests that a recombination event could have taken place between composite elements and 1,600-bp elements in the Canberra population, but this interpretation must be treated with caution given the low bootstrap support.



FIG. 5.— Phylogenetic reconstruction based on the 5' LTR with the method of Neighbor-Joining and the distance of Jukes and Cantor (1969). The DsCb1 size variant is indicated by an arrow. Only bootstrap values higher than 70% are reported. Subfamily 1 = the full-length variants (DsMoo1, DsMoo3, DsP2, DsP3, DsMos1, DsMad3, DmCant2, DsA1, DmBol1, DmCant3, DmR2, DmR1, DmCyp2, and DmCyp1) and the sequences from the sequenced genome (3R_13335013, 3L_4309413, 3L_9035692, and 2R_18957922). The type of full-length variant (number of the 30-bp LTR motif and number of the 60-bp ULR motif) follows the sequence name. Subfamily 2 = the 1,600-bp–type elements (DsP1, DsMak1, DsMad2, and DsCb1), indicated by a star, and the composite elements (DsCb2, DsMak3, DsA3, DsMad1, DsMoo2, and DsMos2), indicated by a black circle.

 

    Discussion
D. melanogaster and D. simulans Size Variants
We detected different regulatory regions of the retrotransposon 412 in natural populations sampled throughout the world. The regulatory regions identified differed in size and in sequence, but not all of them were detected in both species studied, D. melanogaster and D. simulans. With the exception of the deleted variant from the St. Cyprien population and the heterochromatic DmR3 variant, all the variants identified in D. melanogaster were highly similar in sequence, and their size variation resulted from the two repeats (designated 30-bp LTR and 60-bp ULR). This sequence conservation suggests that all these elements, except DmR3, resulted from recent transposition events, in agreement with analyses of the transposable element copies from the sequenced genome of D. melanogaster (Bowen and McDonald 2001; Lerat et al. 2003).

In contrast to D. melanogaster, the elements described in D. simulans are variable in both sequence and structure, as reported by Cizeron and Biémont (1999), who used restriction enzymes to analyze 412 copies in genomes from natural populations. In D. simulans populations, we found deleted regulatory regions that were not present in the D. melanogaster populations, suggesting either that these deleted elements appeared in D. simulans after speciation or that D. melanogaster has lost them. Such deletions could reflect the high rate of DNA loss through deletion reported in the genus Drosophila (Petrov, Lozovskaya, and Hartl 1996 [but see Gregory 2004]), and this process would be expected to be stronger in D. simulans than in D. melanogaster because D. simulans has a smaller genome and, therefore, contains fewer transposable elements (Vieira and Biémont 1996b; Vieira and Biémont 2004). However, the presence in D. simulans of copies that are highly divergent by substitution suggests either that this species has powerful mechanisms to limit transposable element copy number or that it has not yet been invaded by transposable elements, in contrast to what happened long ago to D. melanogaster, which displays numerous transposable element copies with a high level of sequence similarity and a few highly divergent copies (Lerat, Rizzon, and Biémont 2003). Furthermore, we identified two groups of regulatory regions specific to D. simulans, the composite elements and the 1,600-bp elements. These two types of regulatory region are distinct from the more usual type observed in D. melanogaster, but their sequences are conserved within each group, which could result from the recent transmission of these groups between the populations studied.

Regulatory Motifs and Evolution
The LTR-ULR region contains various regulatory motifs that could be involved in 412 expression. We found several different binding sites (Micard et al. 1988; Brookman et al. 1992; Vasilyeva, Bubenshchikova, and Ratner 1999), ecdysone-inducible motifs (Von Kalm et al. 1994; Thummel, Burtis, and Hogness 1990), homeodomains (Wilson et al. 1996), and heat shock elements (Fernandes, Xiao, and Lis 1994). These transcriptional regulatory motifs are more numerous in the ULR region, as found for the copia regulatory region (Jordan and McDonald 1998b). The Tajima test and nucleotide diversity index indicate that the ULR region is under selective pressure for D. melanogaster 412 elements but not for D. simulans.

No known regulatory motifs were detected in the 30-bp LTR repeat. Numerous LTR retrotransposons, such as copia (Jordan and McDonald 1998a), carry repeats of about 30 bp in their LTRs that contain regulatory motifs thought to increase promoter activity (Matyunina, Jordan, and McDonald 1996). In the case of 412, if we assume that the sORFs are translated, it is tempting to suggest that they encode regulatory proteins that bind unknown motifs in these repeats. This idea is supported by the discovery of small Rev-like regulatory proteins in the 3' ULR of the Cer-7 LTR retrotransposon (Bowen and McDonald 1999). Other regulatory mechanisms are also possible, for example, translational regulation by mRNA secondary structure, as has been shown for retroelements (Berkhout 2000).

The statistical tests suggest that there is more constraint on the ULR region than on the LTR, but this pattern is less clear for sequences from D. simulans. The Tajima tests of neutrality gave negative, but not significant, values for the different groups of D. simulans sequences. Two hypotheses could account for this finding: either some inactive sequences mask the signal, or selective pressure operated in the past but can no longer be detected. Further experiments are required to decide between these hypotheses.

Evolution of the 412 Regulatory Regions
Evolutionary History
Our analysis of the size and sequence variation of the regulatory region of the 412 element in natural populations of D. melanogaster and D. simulans reveals two subfamilies for the regulatory region. The phylogenetic analysis discriminated between these two subfamilies, each of which groups structurally distinct size variants. The second subfamily is composed of the six composite elements plus the four 1,600-bp elements of D. simulans, whereas the first subfamily clusters all the other sequences except the two highly divergent variants, DmR3 and DsA2 (fig. 5). These findings agree with data on the LTR retrotransposons Retrolyc1, Blood, and Tnt1 (Vernhettes, Grandbastien, and Casacuberta 1998; Araujo et al. 2001; Costas, Valadé, and Naveira 2001a), for which subfamilies were observed for the regulatory region, particularly for the U3 region of the LTRs. This suggests that the differentiation and diversification of the regulatory region are major factors in the evolution of LTR retrotransposons. Transposable elements belonging to the same family can, thus, acquire different patterns of expression through divergence of their regulatory regions, leading to the emergence of new subfamilies (Nilsson and Bohm 1994; Beguiristain et al. 2001; Costas, Valadé, and Naveira 2001a). This implies that these subfamilies could be expressed in different tissues and at different stages of development. Once their expression has diverged, the subfamilies can evolve independently and later give rise to distinct families. We also found composite elements in D. simulans that probably result from a recombination event between two families of transposable elements, 412 and mdg1. This illustrates another mechanism of transposable element evolution, known as mosaic evolution (Maisonhaute and Capy 2001).
A recombination event between the regulatory regions of two elements could be a quick and efficient way of changing both an element's expression pattern and its regulatory interactions, allowing new transposable element "mutants" to evade regulation by the host.
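The subfamily clustering described above rests on pairwise distances between regulatory-region sequences. The reference list includes the Jukes and Cantor (1969) substitution model and the neighbor-joining method (Saitou and Nei 1987). The sketch below computes Jukes–Cantor distances, d = -3/4 ln(1 - 4p/3), where p is the proportion of differing sites. It is an illustration under stated assumptions, not a reproduction of the analysis behind fig. 5. The sketch assumes aligned, ungapped sequences of equal length. The helper names and toy sequences are hypothetical.

```python
import math
from itertools import combinations


def p_distance(s1, s2):
    """Proportion of differing sites between two aligned, ungapped sequences."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(s1, s2)) / len(s1)


def jukes_cantor(s1, s2):
    """Jukes-Cantor (1969) corrected distance: d = -3/4 * ln(1 - 4p/3)."""
    p = p_distance(s1, s2)
    if p >= 0.75:
        raise ValueError("p >= 0.75: Jukes-Cantor distance is undefined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(named_seqs):
    """Symmetric pairwise Jukes-Cantor distances, keyed by (name1, name2)."""
    dist = {}
    for (n1, s1), (n2, s2) in combinations(named_seqs.items(), 2):
        d = jukes_cantor(s1, s2)
        dist[(n1, n2)] = d
        dist[(n2, n1)] = d
    return dist


if __name__ == "__main__":
    # Hypothetical toy sequences, not real 412 regulatory regions.
    toy = {"varA": "ACGTACGTAC", "varB": "ACGTACGTAA", "varC": "TCGAACGTAC"}
    for pair, d in sorted(distance_matrix(toy).items()):
        print(pair, round(d, 4))
```

A matrix of this kind could then be passed to a tree-building method such as neighbor joining. Resampling alignment columns would provide bootstrap support values like those reported in fig. 5.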

The 412-mdg1 composite element detected in D. simulans is striking for two reasons. In most populations of this species, mdg1 is present in the heterochromatin but almost entirely absent from the euchromatin (Vieira et al. 1999). In D. melanogaster, by contrast, it is present in many copies on the chromosome arms (20.8 ± 4.0 copies [Vieira et al. 1999]). If the recombination event occurred in D. simulans, it may have involved copies of mdg1 located in heterochromatin. Alternatively, recombination could have taken place between 412 and mdg1 RNAs in the capsid during reverse transcription, as observed for retroviruses (Gao et al. 1998). However, this hypothesis implies that heterochromatic copies were transcribed. That is unlikely unless full-length copies already existed within the heterochromatin (Kogan et al. 2003; Sun et al. 2003).

412 Invasion in D. simulans
The Makindu and Canberra populations have only two 412 size variant types, both from subfamily 2: a composite element and a 1,600-bp–type element. These two populations are, however, quite different in their 412 copy number (Vieira et al. 1999). We found that Makindu had one to four euchromatic copies, whereas Canberra had more than 64 copies. DsCb1, from the Canberra population, was the only 1,600-bp–type variant with an undeleted LTR, so it could be an aggressive copy. The phylogenetic tree based on the 5' LTR clustered DsCb1 with the composite elements, with a bootstrap value of about 70% (fig. 5). DsCb1 could therefore result from recombination between a 1,600-bp–type variant (such as DsMak1) and a composite element (such as DsCb2). This would explain why DsCb1 has a 1,600-bp–type 5' ULR and a full-length, composite-type 5' LTR. Whether this specific regulatory region is responsible for the high copy number of 412 in this population warrants further investigation. We cannot, however, exclude the possibility that other aggressive size variants exist in Canberra but were not detected.
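The recombinant origin proposed for DsCb1, a 1,600-bp–type 5' ULR joined to a composite-type 5' LTR, can be illustrated with a simple single-breakpoint scan. This is a minimal sketch and not the method used in this study. It assumes three aligned, ungapped sequences of equal length. It returns the position k that best explains a query as the left part of one putative parent followed by the right part of another, scoring one point per matching site. The function name and toy sequences are hypothetical. A real analysis would also need to test whether the breakpoint model fits significantly better than no recombination.

```python
def best_breakpoint(query, left_parent, right_parent):
    """Return (k, score) for the breakpoint k that best explains query as
    left_parent[:k] + right_parent[k:], counting one point per matching site.
    Ties are resolved in favour of the smallest k."""
    if not (len(query) == len(left_parent) == len(right_parent)):
        raise ValueError("sequences must be aligned to the same length")
    n = len(query)
    # prefix[k]: matches between query and left_parent at positions < k
    prefix = [0] * (n + 1)
    for i in range(n):
        prefix[i + 1] = prefix[i] + (query[i] == left_parent[i])
    # suffix[k]: matches between query and right_parent at positions >= k
    suffix = [0] * (n + 1)
    for i in range(n - 1, -1, -1):
        suffix[i] = suffix[i + 1] + (query[i] == right_parent[i])
    best_k = max(range(n + 1), key=lambda k: prefix[k] + suffix[k])
    return best_k, prefix[best_k] + suffix[best_k]


if __name__ == "__main__":
    # Hypothetical toy parents standing in for a 1,600-bp-type and a composite element.
    left = "AAAAAAAAAA"
    right = "CCCCCCCCCC"
    print(best_breakpoint("AAAACCCCCC", left, right))
```

The prefix and suffix tables make the scan linear in sequence length rather than quadratic.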

Such recombination events resemble the typical situation in retroviruses (Sharp et al. 1999), in which reverse transcription occurs in the viral capsid, which contains two RNA genomes. If the genomes of two distinct retroviral subtypes are packaged in the same capsid, recombination between them can occur. The secondary structure of the RNA could be involved in such recombination (Moumen et al. 2003), especially near the primer binding site (PBS). Our results suggest that the 412 retrotransposon follows a retroviral-like evolutionary strategy in D. simulans, except that it does not leave the cell.
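The copy-choice view of retroviral recombination holds that reverse transcriptase switches between the two co-packaged RNA templates. A toy simulation can caricature this process. It is a minimal sketch under strong simplifying assumptions. It assumes two aligned templates of equal length and a constant per-site switching probability. Positions are copied in order, so the direction of synthesis, RNA secondary structure and the PBS region are not modelled. The parameter switch_prob is hypothetical and is not an estimate for 412, mdg1 or any retrovirus.

```python
import random


def copy_choice(template_a, template_b, switch_prob, rng):
    """Copy two aligned templates site by site, starting on template_a and
    jumping to the other template with probability switch_prob before each
    site after the first. Returns (product, positions where a switch occurred)."""
    if len(template_a) != len(template_b):
        raise ValueError("templates must be aligned to the same length")
    templates = (template_a, template_b)
    current = 0  # index of the template currently being copied
    product, switches = [], []
    for i in range(len(template_a)):
        if i > 0 and rng.random() < switch_prob:
            current = 1 - current
            switches.append(i)
        product.append(templates[current][i])
    return "".join(product), switches


if __name__ == "__main__":
    # Toy templates standing in for two co-packaged RNAs; not real sequence data.
    rng = random.Random(2005)
    print(copy_choice("AAAAAAAAAAAA", "CCCCCCCCCCCC", 0.1, rng))
```

Each switch in the product marks a junction between the two parental templates, the kind of mosaic that a breakpoint scan is designed to detect.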

Recombination between RNA genomes during reverse transcription is frequent in retroviruses (Hu and Temin 1990), and such a phenomenon has also been reported in D. melanogaster. Malik and Eickbush (1999) assigned 412 and mdg1 to the mdg1 lineage, along with the elements pilgrim, blood, and stalker. This lineage seems to be subject to recombination events between its families. For example, Costas, Valadé, and Naveira (2001b) described stalker as a composite element thought to result from recombination events between the ancestral families of what are now the 412, mdg1, pilgrim, and blood elements. Recombination thus appears to be an important and common way of creating new TE families in natural populations of D. simulans. The new regulatory regions produced by recombination can give rise to new elements that may be able to overcome host control of transposition and are, therefore, potential genomic invaders. Further studies of these elements are needed to elucidate their distribution, structure, and expression.


    Acknowledgements
 
This work was funded by the Centre National de la Recherche Scientifique (UMR 5558, GDR 2157). We thank M.L. Perez and C. Jarrin for technical help, K. Jordan, J. McDonald, E. Lerat, and D. Mouchiroud for their helpful comments, and M. Ghosh for revising the English text.


    Footnotes
 
Pierre Capy, Associate Editor


    References
 

    Adams, M. D., S. E. Celniker, R. A. Holt et al. (170 co-authors). 2000. The genome sequence of Drosophila melanogaster. Science 287:2185–2195.

    Altschul, S. F., T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25:3389–3402.

    Araujo, P. G., J. M. Casacuberta, A. P. P. Costa, R. Y. Hashimoto, M. A. Grandbastien, and M. A. Van Sluys. 2001. Retrolyc1 subfamilies defined by different U3 regulatory regions in the Lycopersicon genus. Mol. Genet. Genomics 266:35–41.

    Arkhipova, I. R., N. V. Lyubomirskaya, and Y. V. Ilyin. 1995. Drosophila retrotransposons. Springer, Berlin.

    Beguiristain, T., M. A. Grandbastien, P. Puigdomènech, and J. M. Casacuberta. 2001. Three Tnt1 subfamilies show different stress-associated patterns of expression in tobacco: consequences for retrotransposon control and evolution in plants. Plant Physiol. 127:212–221.

    Bender, W., P. Spierer, and D. S. Hogness. 1983. Chromosomal walking and jumping to isolate DNA from the Ace and rosy loci and the bithorax complex in Drosophila melanogaster. J. Mol. Biol. 168:17–33.

    Benson, G. 1999. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27:573–580.

    Berg, D. E., and M. M. Howe. 1989. Mobile DNA. American Society for Microbiology, Washington, DC.

    Berkhout, B. 2000. Multiple biological roles associated with the repeat (R) region of the HIV-1 RNA genome. Adv. Pharmacol. 127:212–221.

    Biémont, C., C. Nardon, G. Deceliere, D. Lepetit, C. Loevenbruck, and C. Vieira. 2003. Worldwide distribution of transposable element copy number in natural populations of Drosophila simulans. Evolution 57:159–167.

    Borie, N., C. Loevenbruck, and C. Biémont. 2000. Developmental expression of the 412 retrotransposon in natural populations of D. melanogaster and D. simulans. Genet. Res. 76:217–226.

    Borie, N., C. Maisonhaute, S. Sarrazin, C. Loevenbruck, and C. Biémont. 2002. Tissue-specificity of 412 retrotransposon expression in Drosophila simulans and D. melanogaster. Heredity 89:247–252.

    Brookman, J. J., A. T. Toosy, L. S. Shashidhara, and R. A. H. White. 1992. The 412 retrotransposon and the development of the gonadal mesoderm in Drosophila. Development 116:1185–1192.

    Bowen, N. J., and I. K. Jordan. 2002. Transposable elements and the evolution of eukaryotic complexity. Curr. Issues Mol. Biol. 4:65–76.

    Bowen, N. J., and J. F. McDonald. 1999. Genomic analysis of Caenorhabditis elegans reveals ancient families of retroviral-like elements. Genome Res. 9:924–935.

    ———. 2001. Drosophila euchromatic LTR retrotransposons are much younger than the host species in which they reside. Genome Res. 11:1527–1540.

    Casacuberta, J. M., S. Vernhettes, C. Audeon, and M. A. Grandbastien. 1997. Quasispecies in retrotransposons: a role for sequence variability in Tnt1 evolution. Genetica 100:109–117.

    Cizeron, G., and C. Biémont. 1999. Polymorphism in structure of the retrotransposable element 412 in Drosophila simulans and Drosophila melanogaster populations. Gene 232:183–190.

    Costas, J., E. Valadé, and H. Naveira. 2001a. Amplification and phylogenetic relationships of a subfamily of blood, a retrotransposable element of Drosophila. J. Mol. Evol. 52:342–350.

    ———. 2001b. Structural features of the lineage of the Ty3/Gypsy group of LTR retrotransposons inferred from the phylogenetic analyses of its open reading frames. J. Mol. Evol. 53:168–171.

    Dickinson, L. A., C. D. Dickinson, and T. Kohwi-Shigematsu. 1997. An atypical homeodomain in SATB1 promotes specific recognition of the key structural element in a matrix attachment region. J. Biol. Chem. 272:11463–11470.

    Duret, L., E. Gasteiger, and G. Perrière. 1996. LALNVIEW: a graphical viewer for pairwise sequence alignment. Comput. Appl. Biosci. 12:507–510.

    Faure, E., M. Best-Belpomme, and S. Champion. 1996. Upregulation of the Drosophila 1731 retrotransposon long-terminal repeat by UV-B irradiation requires a short sequence in U3 region. Arch. Biochem. Biophys. 326:219–226.

    Fernandes, M., H. Xiao, and J. T. Lis. 1994. Fine structure analyses of the Drosophila heat shock factor-heat shock element interaction. Nucleic Acids Res. 22:167–173.

    Galtier, N., M. Gouy, and C. Gautier. 1996. SEAVIEW and PHYLO_WIN: two graphic tools for sequence alignment and molecular phylogeny. Comput. Appl. Biosci. 12:543–548.

    Gao, F., D. L. Robertson, C. D. Carruthers et al. (13 co-authors). 1998. An isolate of human immunodeficiency virus type 1 originally classified as subtype I represents a complex mosaic comprising three different group M subtypes (A, G and I). J. Virol. 72:10234–10241.

    Gregory, T. R. 2004. Insertion-deletion biases and the evolution of genome size. Gene 324:15–34.

    Hu, W. S., and H. M. Temin. 1990. Genetic consequences of packaging two RNA genomes in one retroviral particle: pseudodiploidy and high rate of genetic recombination. Proc. Natl. Acad. Sci. USA 87:1556–1560.

    International Human Genome Sequencing Consortium. 2003. Initial sequencing and analysis of the human genome. Nature 409:860–921.

    Jordan, I. K., and J. F. McDonald. 1998a. Interelement selection in the regulatory region of the copia retrotransposon. J. Mol. Evol. 47:670–676.

    ———. 1998b. Evolution of the copia retrotransposon in the Drosophila melanogaster species subgroup. Mol. Biol. Evol. 15:1160–1171.

    Jukes, T. H., and C. R. Cantor. 1969. Evolution of protein molecules. Mammalian protein metabolism. Academic Press, New York.

    Katz, R. A., and A. M. Skalka. 1990. Generation of diversity in retroviruses. Ann. Rev. Genet. 24:409–445.

    Kogan, G. L., A. V. Tulin, A. A. Aravin, Y. A. Abramov, A. I. Kalmykova, C. Maisonhaute, and V. A. Gvozdev. 2003. The GATE retrotransposon in Drosophila melanogaster: mobility in heterochromatin and aspects of its expression in germline tissues. Mol. Genet. Genomics 269:234–242.

    Lerat, E., C. Rizzon, and C. Biémont. 2003. Sequence divergence within transposable element families in the Drosophila melanogaster genome. Genome Res. 13:1889–1896.

    Li, W. H. 1997. Molecular evolution. Sinauer Associates, Sunderland, Mass.

    Ludwig, M. Z. 2002. Functional evolution of noncoding DNA. Curr. Opin. Genet. Dev. 12:634–639.

    Ludwig, M. Z., C. Bergman, N. H. Patel, and M. Kreitman. 2000. Evidence for stabilizing selection in a eukaryotic enhancer element. Nature 403:564–567.

    Malik, H. S., and T. H. Eickbush. 1999. Modular evolution of the integrase domain in the Ty3/Gypsy class of LTR retrotransposons. J. Virol. 73:5186–5190.

    Maisonhaute, C., and P. Capy. 2001. Acquisition/loss of modules: the construction set of transposable elements. Russ. J. Genet. 38:594–601.

    Matyunina, L. V., I. K. Jordan, and J. F. McDonald. 1996. Naturally occurring variation in copia expression is due to both element (cis) and host (trans) regulatory variation. Proc. Natl. Acad. Sci. USA 93:7097–7102.

    McDonald, J. F., L. V. Matyunina, S. Wilson, I. K. Jordan, N. J. Bowen, and W. J. Miller. 1997. LTR retrotransposons and the evolution of eukaryotic enhancers. Genetica 100:3–13.

    Micard, D., J. L. Couderc, M. L. Sobrier, G. Giraud, and B. Dastugue. 1988. Molecular study of the retrovirus-like transposable element 412, a 20-OH ecdysone responsive sequence in Drosophila cultured cells. Nucleic Acids Res. 16:455–470.

    Moumen, A., L. Polomack, T. Unge, M. Veron, H. Buc, and M. Negroni. 2003. Evidence for a mechanism of recombination during reverse transcription dependent on structure of the acceptor RNA. J. Biol. Chem. 278:15973–15982.

    Nilsson, M., and S. Bohm. 1994. Inducible and cell type-specific expression of VL30 U3 subgroups correlate with their enhancer design. J. Virol. 68:276–288.

    Pearson, W. R., and D. J. Lipman. 1988. Improved tools for biological sequence comparison. Proc. Natl. Acad. Sci. USA 85:2444–2448.

    Petrov, D. A., E. R. Lozovskaya, and D. L. Hartl. 1996. High intrinsic rate of DNA loss in Drosophila. Nature 384:346–349.

    Quandt, K., K. Frech, H. Karas, E. Wingender, and T. Werner. 1995. MatInd and MatInspector: new fast and versatile tools for detection of consensus matches in nucleotide sequence data. Nucleic Acids Res. 23:4878–4884.

    Rozas, J., and R. Rozas. 1999. DnaSP version 3: an integrated program for molecular population genetics and molecular evolution analysis. Bioinformatics 15:174–175.

    Saitou, N., and M. Nei. 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406–425.

    Sharp, P. M., E. Bailes, D. L. Robertson, F. Gao, and B. H. Hahn. 1999. Origins and evolution of AIDS virus. Biol. Bull. 196:338–342.

    Sun, X., H. D. Le, J. M. Wahlstrom, and G. H. Karpen. 2003. Sequence analysis of a functional Drosophila centromere. Genome Res. 13:182–194.

    Tajima, F. 1989. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123:585–595.

    The C. elegans Sequencing Consortium. 1998. Genome sequence of the nematode C. elegans: a platform for investigating biology. Science 282:2012–2018.

    Thompson, J. D., T. J. Gibson, F. Plewniak, F. Jeanmougin, and D. G. Higgins. 1997. The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 25:4876–4882.

    Thummel, C. S., K. C. Burtis, and D. S. Hogness. 1990. Spatial and temporal patterns of E74 transcription during Drosophila development. Cell 61:101–111.

    Vasilyeva, L. A., E. V. Bubenshchikova, and V. A. Ratner. 1999. Heavy heat shock induced retrotransposon transposition in Drosophila. Genet. Res. 74:111–119.

    Vernhettes, S., M. A. Grandbastien, and J. M. Casacuberta. 1998. Evolutionary analysis of the Tnt1 retrotransposon in Nicotiana species reveals the high variability of its regulatory sequences. Mol. Biol. Evol. 15:827–836.

    Vieira, C., and C. Biémont. 1996a. Geographical variation in insertion site number of retrotransposon 412 in Drosophila simulans. J. Mol. Evol. 42:443–451.

    ———. 1996b. Selection against transposable elements in D. simulans and D. melanogaster. Genet. Res. 68:9–15.

    ———. 1997. Transposition rate of the 412 retrotransposable element is independent of copy number in natural populations of Drosophila simulans. Mol. Biol. Evol. 14:185–188.

    ———. 2004. Transposable element dynamics in two sibling species: Drosophila melanogaster and Drosophila simulans. Genetica 120:115–123.

    Vieira, C., P. Aubry, D. Lepetit, and C. Biémont. 1998. A temperature cline in copy number for 412 but not roo/B104 retrotransposons in populations of Drosophila simulans. Proc. R. Soc. Lond. B Biol. Sci. 265:1161–1165.

    Vieira, C., D. Lepetit, S. Dumont, and C. Biémont. 1999. Wake up of transposable elements following Drosophila simulans worldwide colonisation. Mol. Biol. Evol. 16:1251–1255.

    Von Kalm, L., K. Crossgrove, D. Von Seggern, G. M. Guild, and S. K. Beckendorf. 1994. The Broad-complex controls a tissue-specific response to the steroid hormone ecdysone at the onset of Drosophila metamorphosis. EMBO J. 13:3505–3516.

    Watterson, G. A. 1975. On the number of segregating sites in genetical models without recombination. Theor. Popul. Biol. 7:256–276.

    Wilson, D. S., G. Sheng, S. Jun, and C. Desplan. 1996. Conservation and diversification in homeodomain-DNA interactions: a comparative genetic analysis. Proc. Natl. Acad. Sci. USA 93:6886–6891.

    Yuki, S., S. Inouye, S. Ishimaru, and K. Saigo. 1986. Nucleotide sequence characterization of a Drosophila retrotransposon, 412. Eur. J. Biochem. 158:403–410.

Accepted for publication November 25, 2004.





An erratum to this article has been published. Versions of this article: 22/3/747 (most recent) and msi060v1.