Molecular and Evolutionary Analysis of Two Divergent Subfamilies of a Novel Miniature Inverted Repeat Transposable Element in the Yellow Fever Mosquito, Aedes aegypti

Zhijian Tu

Department of Biochemistry, Virginia Polytechnic Institute and State University

Abstract

A novel family of miniature inverted repeat transposable elements (MITEs) named Pony was discovered in the yellow fever mosquito, Aedes aegypti. It has all the characteristics of MITEs, including terminal inverted repeats, no coding potential, A+T richness, small size, and the potential to form stable secondary structures. Past mobility of Pony was indicated by the identification of two Pony insertions which resulted in the duplication of the TA dinucleotide targets. Two highly divergent subfamilies, A and B, were identified in A. aegypti based on sequence comparison and phylogenetic analysis of 38 elements. These subfamilies showed less than 62% sequence similarity. However, within each subfamily, most elements were highly conserved, and multiple subgroups could be identified, indicating recent amplifications from different source genes. Different scenarios are presented to explain the evolutionary history of these subfamilies. Both subfamilies share conserved terminal inverted repeats similar to those of the Tc2 DNA transposons in Caenorhabditis elegans, indicating that Pony may have been borrowing the transposition machinery from a Tc2-like transposon in mosquitoes. In addition to the terminal inverted repeats, full-length and partial subterminal repeats of a sequence motif TTGATTCAWATTCCGRACA represent the majority of the conservation between the two subfamilies, indicating that they may be important structural and/or functional components of the Pony elements. In contrast to known autonomous DNA transposons, both subfamilies of Pony are highly reiterated in the A. aegypti genome (8,400 and 9,900 copies, respectively). Together, they constitute approximately 1.1% of the entire genome. Pony elements were frequently found near other transposable elements or in the noncoding regions of genes. 
The relative abundance of MITEs varies in eukaryotic genomes, which may have in part contributed to the different organizations of the genomes and reflect different types of interactions between the hosts and these widespread transposable elements.

Introduction

Transposable elements can be classified on the basis of the mechanisms of their transposition as DNA-mediated or RNA-mediated elements (Finnegan 1992). RNA-mediated transposable elements include long terminal repeat (LTR) retrotransposons, non-LTR retrotransposons, and short interspersed repetitive elements (SINEs). Their transposition involves a reverse transcription step which generates cDNA from RNA molecules. DNA-mediated elements such as P, hobo, and mariner use a cut-and-paste mechanism directly from DNA to DNA, and they are characterized by terminal inverted repeats flanking a gene encoding a transposase.

Recently, several families of short interspersed elements with terminal inverted repeats have been found in plants (e.g., Bureau and Wessler 1992; Bureau, Ronald, and Wessler 1996; Charrier et al. 1999; Surzycki and Belknap 1999), vertebrates (Morgan and Middleton 1990; Morgan 1995; Ünsal and Morgan 1995; Smit and Riggs 1996; Izsvák et al. 1999), a nematode (Oosumi, Garlick, and Belknap 1995, 1996), and two species of insects (Tu 1997; Braquart, Royer, and Bouhin 1999). These elements can be grouped as miniature inverted repeat transposable elements (MITEs) based on their common structural characteristics, as proposed by Wessler, Bureau, and White (1995). These features include small size, no coding potential, conserved terminal inverted repeats, A+T richness, and, in many cases, the potential to form stable secondary structures. The distribution of many families of MITEs in the genome appears to be biased. Many plant MITEs are associated with genes, where more than 90% of them are found in the noncoding regions, mostly in the 5' and 3' flanking sequences (Bureau and Wessler 1992, 1994a, 1994b; Bureau, Ronald, and Wessler 1996). The three families of MITEs previously reported in the yellow fever mosquito, Aedes aegypti, are also associated with genes (Tu 1997). A few MITEs have been found to have terminal inverted repeats that are highly similar to those of some autonomous DNA-mediated elements which encode transposases (Morgan 1995; Oosumi, Garlick, and Belknap 1996). However, the sequence similarity between these MITEs and their corresponding DNA-mediated elements is confined to the terminal inverted repeats.
In this regard, MITEs are similar to the Ds1 transposable elements in maize (Federoff 1989; MacRae and Clegg 1992), and they are not simply nonautonomous deletion derivatives of the DNA-mediated elements. MITEs are generally homogeneous in size and are units of highly successful transposition. It has been suggested that some MITEs and the DNA-mediated elements share the same transposition machinery based on common terminal inverted repeats (Morgan 1995; Oosumi, Garlick, and Belknap 1996). On the other hand, Izsvák et al. (1999) propose that MITEs transpose by a DNA intermediate resulting from the folding back of a single strand of DNA during replication. They suggest that the DNA intermediate is reintegrated into the genome using host factors which are involved in cellular replication.

Here, I report the discovery and characterization of a novel family of highly divergent and highly reiterated MITEs named Pony in A. aegypti. The phylogenetic relationships between a large number of Pony elements are analyzed. Different scenarios are presented to explain the evolution of these elements. The transposition mechanism employed by Pony and other MITEs is assessed. The genomic distribution of Pony elements and evolutionary implications of the relative abundance of MITEs in different eukaryotic genomes are discussed.

Materials and Methods

Mosquitoes
Mosquitoes used in this study were from the Rock strain of A. aegypti.

Construction of a Genomic Library
To facilitate the characterization of the short Pony elements in A. aegypti, a genomic library that contains inserts from 1.3 to 2.5 kb was prepared using a λ ZapExpress vector kit from Stratagene Cloning Systems (La Jolla, Calif.). Genomic DNA was prepared from the Rock strain of A. aegypti. The vector was predigested with BamHI and treated with calf intestine alkaline phosphatase. The genomic DNA was partially digested with Sau3AI, which produces ends that are compatible with BamHI cuts. The digestion conditions were optimized to produce mostly 1–3-kb fragments. The digested fragments were separated on an agarose gel. Fragments between 1.3 and 2.5 kb were cut out and purified using the Sephaglas Bandprep Kit from Amersham Pharmacia Biotech (Arlington Heights, Ill.). These fragments were ligated to an approximately equimolar amount of vector to minimize tandem inserts in one clone. The primary library had 2.8 × 10^6 original plaque-forming units, with a 1.7% background. A total of 2.0 × 10^6 original plaque-forming units were amplified and stored. Aliquots of the remaining unamplified library were used in the screening experiments described below.

Screening of the λ ZapExpress Genomic Library
The unamplified library was screened using two digoxigenin-labeled ssDNA probes, obtained using either Pony-Aa-A1 or Pony-Aa-B1 as template. For the Pony-Aa-A1 probe, the template for the labeling reaction was a gel-purified polymerase chain reaction (PCR) product obtained using a plasmid that contains Pony-Aa-A1 and a primer matching the terminal inverted repeat TACCGTTTTGNYTCANATTNCGNACA. For the Pony-Aa-B1 probe, the template for the labeling reaction was a gel-purified PCR product obtained using a plasmid that contains Pony-Aa-B1 and the primers AACACTTAACTTTCGAATGGT and TATTCCGGACACTCTACTTTG. The above PCR products were labeled in two separate asymmetric PCR reactions using GTATGGTACAGCATTTTGATT and AACACTTAACTTTCGAATGGT as respective primers. The labeling conditions were the same as those described in Tu and Hagedorn (1997), using a digoxigenin-dUTP labeling mixture. MagnaGraph Nylon membranes (Micron Separation Inc., Westborough, Mass.) were used to lift the plaques. The prehybridization solution was 5× SSC with 2% nonfat milk, 0.1% N-lauroylsarcosine, and 0.02% SDS. Approximately 20 ng of probe per milliliter of prehybridization solution was used for hybridization. Hybridization was carried out at 55°C in a Gene Roller from Savant Instruments, Inc. (Holbrook, N.Y.). The final set of washes was at 55°C with 0.5× SSC containing 0.1% SDS. After washing, the membranes were incubated in a solution of an alkaline phosphatase–linked anti-digoxigenin antibody and two phosphatase substrates, X-phosphate and nitroblue tetrazolium salt, following the protocol of Boehringer Mannheim Biochemicals (Indianapolis, Ind.). Secondary screening was performed to confirm and purify the positive clones isolated during the primary screening. Inserts in λ ZapExpress clones were excised in vivo and inserted into the pBK-CMV phagemid vector using the ExAssist helper phage from Stratagene Cloning Systems.
There should be no cross-hybridization between the two Pony probes which are 59.5% identical, because the screening stringency described above allows slightly more than 20% mismatch. This was later confirmed by sequence comparisons between the positive clones isolated using the two different probes.

Estimation of the Copy Number of Pony Elements
A total of four 150-mm plates that contained 31,500 plaques from the unamplified A. aegypti genomic library were screened under the conditions described above. The copy numbers of the two subfamilies were calculated based on the ratio of positive plaques to the total number of plaques screened, taking into account the known size of the haploid genome of the A. aegypti Rock strain (800 Mb; Rao and Rai 1987), the 1.9-kb average insert size of the genomic library, and the level of background. This method was previously described in Tu, Isoe, and Guzova (1998).

PCR and TA Cloning
Pony elements were amplified by PCR using approximately 3 ng of genomic DNA isolated from the Rock strain of A. aegypti. A single primer, TACCGTTTTGNYTCANATTNCGNACA, which corresponds to the TA insertion site plus the consensus sequence of the terminal inverted repeats of known Pony elements, was used in each reaction. The calculated melting temperature of the primer was 68°C. Three different annealing temperatures were used: 60°C, 56°C, and 51°C, respectively. Samples were denatured at 94°C for 35 s and extended at 72°C for 140 s. Approximately 1 U of TakaRa Taq polymerase (Takara, Shuzo Co., Otsu, Shiga, Japan), 1.5 mM MgCl2, 0.2 mM dNTP, and 2 µM primer were used per 20-µl reaction. PCR products were run on a 0.9% agarose gel and purified using a Sephaglas Bandprep Kit from Amersham Pharmacia Biotech. Purified PCR products were cloned in a pCR 2.1 vector using an Original TA Cloning Kit from Invitrogen (Carlsbad, Calif.).
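The single primer described above contains IUPAC ambiguity codes, so it stands for a pool of related sequences rather than one oligonucleotide. The following is a minimal sketch, not part of the original methods, that counts how many distinct sequences the primer represents and converts it into a regular expression for scanning a sequence. The helper names (`degeneracy`, `to_regex`) and the example target site are hypothetical and chosen for illustration only.

```python
import re

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Primer from the text: TA insertion site plus the consensus of the
# terminal inverted repeats.
PRIMER = "TACCGTTTTGNYTCANATTNCGNACA"


def degeneracy(seq):
    """Number of distinct unambiguous sequences represented by seq."""
    count = 1
    for base in seq:
        count *= len(IUPAC[base])
    return count


def to_regex(seq):
    """Compile an IUPAC string into a regular expression."""
    parts = []
    for base in seq:
        options = IUPAC[base]
        parts.append(options if len(options) == 1 else f"[{options}]")
    return re.compile("".join(parts))


primer_len = len(PRIMER)          # 2-bp TA site + 24-bp repeat consensus
n_variants = degeneracy(PRIMER)   # four N positions and one Y position
pattern = to_regex(PRIMER)

# Hypothetical target site, made up for this example.
example_site = "TACCGTTTTGATTCAAATTCCGGACA"
matches_example = pattern.fullmatch(example_site) is not None
```

Under these definitions the primer is 26 nucleotides long, which agrees with a TA dinucleotide followed by a 24-bp terminal inverted repeat consensus, and it represents 512 unambiguous sequences.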

DNA Sequencing
Sequencing of the genomic clones was done with the sequencing facilities at the University of Arizona and Virginia Tech with either T3/T7 primers or custom synthetic primers using an automated sequencer (Model 377, Applied Biosystem International, Foster City, Calif.). Cloned PCR products were sequenced with an IRD800 dye-labeled T7 primer using a 4200S Gene ReadIR sequencing instrument from Li-Cor (Lincoln, Nebr.).

Sequence Analysis and Phylogenetic Inference
Searches for matches of either nucleotide or amino acid sequences in the database (Non-redundant GenBank + EMBL + DDBJ + PDB) were done using Fasta of GCG (Genetics Computer Group, Madison, Wis., version 9.0, 1996) and BLAST (Altschul et al. 1997). Pairwise comparisons were done by Gap and Bestfit of GCG. Multiple sequences were aligned by Pileup, which is a progressive, pairwise method from GCG. Specific parameters such as gap weight and gap length weight are described in the figure legend of the alignment. Consensus of the multiple-sequence alignment was obtained using Pretty of GCG. Phylogenetic trees were constructed using the minimum-evolution, neighbor-joining, and maximum-parsimony methods of PAUP*, version 4.0 b2 (Swofford 1999). Specific parameters used in the phylogenetic analyses are described in the corresponding figure legend. One thousand bootstrap replicates were used to assess the confidence in the grouping (Felsenstein and Kishino 1993). Direct and inverted repeats within MITE sequences were analyzed using GeneQuest of LaserGene (DNASTAR, Inc., Madison, Wis.). DNA secondary structure was predicted using the RNA folding program of GeneQuest, which uses the Vienna modifications (Hofacker et al. 1994) of the Zuker (1989) algorithm.
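The idea behind the bootstrap replicates mentioned above can be illustrated on a toy case. The sketch below is not the PAUP* procedure used in this study. It assumes a made-up alignment of four sequences (two "a" and two "b" sequences) and uses a simple distance rule for four taxa: the favored split is the pairing with the smallest summed number of mismatches. Alignment columns are resampled with replacement, and the support value is the fraction of replicates that recover the a|b split. All sequences and function names are hypothetical.

```python
import random

# Toy alignment, invented for illustration; not Pony sequences.
TOY_ALIGNMENT = {
    "a1": "ACGTACGTACGTACGTACGT",
    "a2": "ACGTACGTACGAACGTACGT",
    "b1": "TCGAACCTTCGTAGGTTCGA",
    "b2": "TCGAAGCTTCGTAGGTTCGA",
}


def mismatches(s, t, cols):
    """Count mismatches between s and t over the sampled columns."""
    return sum(s[i] != t[i] for i in cols)


def split_supported(aln, cols):
    """True if the (a1,a2)|(b1,b2) pairing has the smallest summed distance."""
    a1, a2, b1, b2 = aln["a1"], aln["a2"], aln["b1"], aln["b2"]
    within = mismatches(a1, a2, cols) + mismatches(b1, b2, cols)
    cross_1 = mismatches(a1, b1, cols) + mismatches(a2, b2, cols)
    cross_2 = mismatches(a1, b2, cols) + mismatches(a2, b1, cols)
    return within < min(cross_1, cross_2)


def bootstrap_support(aln, replicates=1000, seed=0):
    """Fraction of column-resampled replicates that recover the split."""
    rng = random.Random(seed)
    n_cols = len(aln["a1"])
    hits = 0
    for _ in range(replicates):
        # Sample columns with replacement, keeping the alignment length.
        cols = [rng.randrange(n_cols) for _ in range(n_cols)]
        hits += split_supported(aln, cols)
    return hits / replicates


support = bootstrap_support(TOY_ALIGNMENT)
```

A support value close to 1 means that nearly every resampled alignment still separates the two groups, which is the sense in which a 100% bootstrap value supports a division between two subfamilies.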

Results

Discovery of a Novel Family of MITEs Named Pony in A. aegypti
The first copy of Pony, Pony-Aa-A1, was discovered in A. aegypti inside a previously reported Wujin element (Tu 1997). This 514-bp Pony insertion was not detected in the previous paper because a more stringent multiple-sequence alignment method was used to compare different copies of Wujin elements. However, as shown in figure 1A, a pairwise comparison between two Wujin elements using a condition that imposes less penalty on gap creation indicates that Pony-Aa-A1 was inserted in Wujin-Aa4, resulting in a TA target site duplication. Subsequent database searches identified three additional copies of Pony near other transposable elements and eight copies in the noncoding regions of genes in A. aegypti (table 1). As described below, analysis of these 12 Pony elements and 26 additional copies suggests that Pony is a novel family of MITEs.



 
Fig. 1.—Evidence of past mobility of the Pony element. Shown here are the insertion of Pony-Aa-A1 in Wujin-Aa4 (A) and the insertion of Pony-Aa-A11 in Pony-Aa-A12 (B). The putative TA target duplications are underlined. The 3' end of Wujin-Aa4 (TATCACTG) was not identified in Tu (1997) because of the interruption by the Pony insertion. The revised sequence of Wujin-Aa4 is deposited in GenBank (AF208664), while the sequence of Wujin-Aa3 can be found in Tu (1997).

 

 
Table 1 Identification of Pony Elements Near Genes and Other Repetitive Sequences in Aedes aegypti

 
The Presence of Two Divergent Subfamilies of Highly Reiterated Pony Elements
Analysis of the 12 Pony elements found during the database searches suggested that there were two distinct subfamilies, namely, Pony-Aa-A and Pony-Aa-B. Approximately 31,500 plaques from an unamplified A. aegypti genomic library were screened using both A and B probes as described in Materials and Methods. Double lifts were performed. One set of membranes was probed with the A probe, while the other set was probed with the B probe. The A and B probes detected 617 and 730 positives, respectively. Thus, based on the known genome size of A. aegypti and the average insert size of the library described in Materials and Methods, the copy numbers of Pony elements in the A and B subfamilies were approximately 8,400 and 9,900, respectively. The two subfamilies together constitute approximately 1.1% of the entire genome. Approximately 2%–3% of the positive clones hybridized to both probes based on a preliminary analysis aligning the positive spots on the two sets of membranes. Because there should be no cross-hybridization between the two probes under the screening conditions described in Materials and Methods, these clones are likely to contain at least one Pony element from each of the two subfamilies. If we assume that there is no physical linkage between the two subfamilies, the probability of finding a Pony-Aa-A in a clone that contains a Pony-Aa-B will be 2% (617/31,500), while the probability of finding a Pony-Aa-B in a clone that contains a Pony-Aa-A will be 2.3% (730/31,500). These numbers are similar to the observed frequency (2%–3%), indicating that there was no close physical linkage between the two subfamilies.
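The copy-number and genome-fraction figures above can be approximately reproduced with a short calculation. The sketch below is one possible reading of the method, not the author's own script. It assumes that the 1.7% background is handled by reducing the number of plaques that carry inserts, and that an average Pony element is about 500 bp long; neither assumption is stated explicitly in this section, and the function name is illustrative.

```python
GENOME_SIZE_BP = 800e6     # haploid genome of the Rock strain (from the text)
AVG_INSERT_BP = 1900       # average insert size of the library (from the text)
PLAQUES_SCREENED = 31_500  # plaques screened (from the text)
BACKGROUND = 0.017         # assumed: fraction of plaques without an insert
ELEMENT_LENGTH_BP = 500    # assumed: approximate length of one Pony element


def estimate_copy_number(positives):
    """Estimate copies per haploid genome from the number of positive plaques."""
    recombinant_plaques = PLAQUES_SCREENED * (1 - BACKGROUND)
    # Fraction of one haploid genome represented by the screened inserts.
    genome_equivalents = recombinant_plaques * AVG_INSERT_BP / GENOME_SIZE_BP
    return positives / genome_equivalents


copies_a = estimate_copy_number(617)  # subfamily A positives
copies_b = estimate_copy_number(730)  # subfamily B positives

# Fraction of the genome occupied by both subfamilies together.
genome_fraction = (copies_a + copies_b) * ELEMENT_LENGTH_BP / GENOME_SIZE_BP

# Expected chance of co-occurrence in one clone if A and B are unlinked.
p_a_given_b = 617 / PLAQUES_SCREENED
p_b_given_a = 730 / PLAQUES_SCREENED
```

Rounded to the nearest hundred, these inputs give about 8,400 and 9,900 copies and a genome fraction of about 1.1%, in line with the values reported in the text. The expected co-occurrence rates of about 2.0% and 2.3% fall within the observed 2%–3% range.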

Additional Pony Sequences Obtained by Library Screening and PCR
Fifteen additional Pony elements isolated from the genomic library were sequenced (table 2). Analysis of all available sequences indicated that the 24-bp terminal inverted repeats were highly conserved. To survey the diversity of Pony elements and to obtain additional sequences, the consensus of the 24-bp terminal inverted repeats of all available Pony sequences plus the specific TA target sequence was used as the primer to amplify Pony elements from genomic DNA of the Rock strain of A. aegypti in a PCR experiment, as shown in figure 2. A predominant 500-bp product and a small amount of a 1,000-bp product were obtained in three different reactions at different annealing temperatures (60°C, 56°C, and 51°C, respectively). The 500-bp product obtained at the lowest annealing temperature was cloned. Eleven clones, representing different copies of Pony elements, were sequenced (GenBank accession numbers AF259802–AF259812). The 1,000-bp product is likely a dimer of the Pony elements, because when it was purified and reamplified using the same primer, a predominant 500-bp product was obtained (data not shown).


 
Table 2 Pony Elements Identified During the Screening of an Aedes aegypti Genomic Library

 


 
Fig. 2.—A 0.9% agarose gel showing the amplification of Pony elements from Aedes aegypti genomic DNA using polymerase chain reactions (PCRs). Lanes 1 and 5 are 1-kb DNA ladders from Promega (Madison, Wis.). Lanes 2, 3, and 4 are PCR products generated at three different annealing temperatures (60°C, 56°C, and 51°C, respectively). Approximately 3 ng of genomic DNA isolated from the Rock strain of A. aegypti was used as template for each of the three reactions. A single primer corresponding to the consensus sequence of the terminal inverted repeat of known Pony elements was used. See Materials and Methods for the primer sequence and detailed PCR conditions

 
Sequence Comparison and Phylogenetic Analysis
A total of 38 Pony elements have been discovered as described above, 28 of which are either full-length or close to full-length. A multiple-sequence alignment of these 28 elements is shown in figure 3. The alignment indicated that the terminal inverted repeats and several segments of internal sequences were conserved among all of the elements. The sequence alignment also suggested that there were two subfamilies based on shared nucleotide substitutions. The two subfamilies are indicated by the letters A and B. There were 17 elements in subfamily A and 11 in subfamily B. Two consensus sequences were obtained based on separate multiple-sequence alignments of the A and B elements. One of the A elements, A6, and one of the B elements, B2, showed more sequence divergence than the rest of their respective subfamilies. While A6 was 75% similar to the consensus of subfamily A, the other 16 elements in subfamily A were more than 90% similar to the consensus. While B2 was 76% similar to the consensus of subfamily B, the other 10 elements in subfamily B were more than 85% similar to the consensus. Therefore, most of these Pony elements were highly conserved within their respective subfamilies. On the other hand, the cross-subfamily comparisons between individual elements showed less than 60% sequence similarity, and the two consensus sequences were less than 62% similar. It should be noted that in some cases one or more significant gaps were introduced in the alignments, indicating that deletions and insertions were not uncommon. As shown in figure 4, the phylogenetic analysis of the 28 Pony elements is consistent with the notion that there are two subfamilies. Three methods, including minimum-evolution distance, neighbor joining, and maximum parsimony, were used. The division between subfamilies A and B was well supported by all three methods, with bootstrap values at 100%.
In addition, there were four nodes within subfamily A and six within subfamily B that were supported by all three methods with bootstrap values above 50%. It is interesting to note that all 11 elements obtained in the PCR analysis belong to either the A or the B subfamily.




 
Fig. 3.—Multiple-sequence alignment of 28 Pony elements that are either full-length or close to full-length. Names of the elements are abbreviated by omitting the common prefix "Pony-Aa". The system used to name the Pony elements is explained in the footnote to table 1. The alignment was done using Pileup of GCG (gap weight = 1, gap length weight = 0) and was then slightly modified manually at the 5' and 3' termini for improvement. The consensus sequence was created by Pretty (plurality = 14, threshold = 1) of GCG. Dots indicate sequences that are identical to the consensus. Lowercase letters indicate sequence variation. Dashed lines indicate gaps. The arrows mark the flanking TA target duplications, as well as the terminal inverted repeats. Note that there is a 36-bp insertion in Pony-Aa-A14 that is not shared by any other Pony sequences. N = A, C, G, T; R = A, G; S = G, C; V = G, A, C; W = A, T; Y = C, T.

 


 
Fig. 4.—A, Phylogenetic analysis of the 28 Pony elements aligned in figure 3. Naming of the elements is as described in the legend of figure 3. The TA duplications were not included in the analysis. All analyses were conducted using PAUP*, version 4.0 b2 (Swofford 1999). All trees were unrooted. The relative branch length was calculated by PAUP*, version 4.0 b2. The tree shown here is an unrooted phylogram constructed using a minimum-evolution algorithm. The heuristic search was conducted using the tree bisection-reconnection (TBR) branch-swapping algorithm. All characters are of equal weight and unordered. Neighbor joining and maximum parsimony were also used in the analysis. Confidence of the groupings was estimated using 1,000 bootstrap replications. The Arabic numerals at the base of a node are the bootstrap values, which represent the percentage of times out of 1,000 bootstrap resamplings that branches were grouped together at a particular node. The first number is the bootstrap value derived from a minimum-evolution analysis, the second is that derived from a neighbor-joining analysis, and the third is that derived from a parsimony analysis. Only groups supported by all three independent bootstrap analyses above the 50% level are marked as thick branches. More detailed branching patterns of the subgroups in subfamilies A and B are shown in panels B and C, respectively.

 
The 10 incomplete copies of Pony elements were assigned to one of the two subfamilies based on pairwise comparisons to the consensus sequences of the two subfamilies. As shown in tables 1 and 2, six of them (Pony-Aa-A4, -A9, and -A15–18) belong to subfamily A, while four (Pony-Aa-B3 and -B7–9) belong to subfamily B. With the exception of A4, the similarities between the truncated copies and their respective consensus sequences range from 74% to 95%. Pony-Aa-A4 was 64% similar to the consensus of subfamily A, which was in part due to the fact that most of the 212-bp region of similarity was outside the highly conserved sequences. There was no significant similarity between A4 and the consensus of subfamily B.

Despite a Relatively Low Sequence Similarity, the Two Subfamilies Share Several Structural Characteristics, Including a Tc2-like Terminal Inverted Repeat and Conserved Subterminal Motifs
The consensus sequences of the two subfamilies are less than 62% similar, as shown in a pairwise comparison (fig. 5A). Like all characterized MITEs, the two subfamilies of Pony elements are bordered by imperfect terminal inverted repeats. As shown in figure 5B, 14 of the 18 terminal nucleotides of the two subfamilies of Pony are similar to the terminal sequences of the Tc2 DNA transposon (Ruvolo, Hill, and Levitt 1992) and the three MITE-like elements in C. elegans (Oosumi, Garlick, and Belknap 1995, 1996). In addition, as discussed below, Pony tends to specifically insert in a TA dinucleotide target, which is common among Tc2-like elements. A 20-bp sequence motif named α (TTGATTCAWATTCCGRACAS) was identified in both consensus sequences (fig. 5C). It overlaps the terminal inverted repeats but starts 5 bp inside the termini. Two additional copies of this α sequence were found in the internal regions, one of which had multiple substitutions in the consensus of the B subfamily. Two copies of a segment of α, the 15-bp β sequence TCAWATTCCGRACAS, and the reverse sequence of another short fragment of α, the 8-bp γ sequence WATTCCGG, were also highly conserved between the two subfamilies. The majority of the conserved sequences were related to these subterminal repeat motifs (α, β, and γ). The consensus sequences of both subfamilies contain more than 62% A+T. They all have the potential to form stable secondary structures, with ΔG values lower than -78 kcal/mol (fig. 5D).
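The relationships among the three subterminal motifs can be checked directly from their sequences. The sketch below is an illustration rather than the analysis performed with GeneQuest. It treats two IUPAC symbols as compatible when they share at least one base, and it takes the 24-bp terminal inverted repeat consensus to be the primer from Materials and Methods without its leading TA, which is an assumption made here. It compares sequences in the written orientation only and does not model the orientation in which each copy occurs in the elements. Function and variable names are hypothetical.

```python
# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

ALPHA = "TTGATTCAWATTCCGRACAS"  # 20-bp motif from the text
BETA = "TCAWATTCCGRACAS"        # 15-bp motif from the text
GAMMA = "WATTCCGG"              # 8-bp motif from the text

# Assumption: primer minus its leading TA equals the 24-bp repeat consensus.
TIR_CONSENSUS = "TACCGTTTTGNYTCANATTNCGNACA"[2:]


def compatible(a, b):
    """True if equal-length IUPAC strings share a base at every position."""
    if len(a) != len(b):
        return False
    return all(set(IUPAC[x]) & set(IUPAC[y]) for x, y in zip(a, b))


def find_compatible(fragment, target):
    """0-based offsets where fragment is IUPAC-compatible with target."""
    k = len(fragment)
    return [i for i in range(len(target) - k + 1)
            if compatible(fragment, target[i:i + k])]


beta_offsets = find_compatible(BETA, ALPHA)
gamma_offsets = find_compatible(GAMMA, ALPHA)
# The last base of the motif extends past the 24-bp consensus, so only its
# first 19 bases are compared with the repeat.
tir_offsets = find_compatible(ALPHA[:19], TIR_CONSENSUS)
```

With these inputs, β is compatible with the last 15 positions of α (offset 5), γ is compatible with an 8-bp fragment of α (offset 8), and the first 19 bases of α are compatible with the repeat consensus starting at offset 5, which agrees with the statement that the motif starts 5 bp inside the termini.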



 
Fig. 5.—Comparisons between the consensus sequences of the two subfamilies of Pony elements. A, A pairwise alignment between the two consensus sequences aligned using Gap of GCG (gap weight = 30, gap length weight = 1). The consensus sequences of subfamilies A and B were created by Pretty of GCG using the majority rule. In a few positions where no base was in the majority, a base that occurred no less than any other bases was chosen arbitrarily. Dots indicate gaps introduced during the alignment. The terminal inverted repeats, as well as the three types of subterminal repeat motifs (α, β, and γ), are marked by arrows. B, Multiple-sequence alignment of the terminal sequences of Pony-Aa-A, Pony-Aa-B, Tc2, and three MITE-like families in Caenorhabditis elegans, including CeleTc2, Cele11, and Cele12 (Oosumi, Garlick, and Belknap 1995, 1996). The common TA insertion site was included. The consensus of these six sequences was created using the simple majority rule. Lowercase letters indicate sequence differences from the consensus. When no base was in the majority, a base that occurred no less than any other bases was chosen arbitrarily. Dashes indicate gaps introduced during the alignment. C, The relationships between the sequences of the three types of subterminal repeat motifs (α, β, and γ). The sequence of the α motif is derived based on the majority of the four α repeats in the two subfamilies. D, Comparison between the two consensus sequences in their lengths, A+T contents, and potentials to form stable secondary structures as indicated by ΔG values. D = A, G, T; H = A, C, T; K = G, T; M = A, C; N = A, C, G, T; R = A, G; S = G, C; W = A, T; Y = C, T.

 
Pony Tends to Insert in a Specific TA Dinucleotide Target
As shown in figure 1A and B, two examples were found in which a Pony element was inserted in another MITE, resulting in the duplication of the TA target sequence. Furthermore, as shown in figure 3, not counting the two elements that had truncations at the 3' termini, 25 of the 26 Pony sequences were flanked by TA repeats. Thus, the insertion target of the Pony family is highly specific. However, no obvious consensus was found in the target sequences other than the conserved TA insertion site.

Pony Elements Were Often Found near Genes and Other Transposable Elements
As shown in table 1, five Pony-Aa-A elements and three Pony-Aa-B elements were found in the introns and flanking regions of six genes in A. aegypti. The gene sequences used in the above analysis include 21 genes retrieved from the nonredundant GenBank database (September 1999) and four additional genes, including VgB, VgC (Isoe and Hagedorn, personal communication), AaE74 (unpublished data), and a ferritin gene (Pham and Law, personal communication). It is not yet clear whether this association between Pony elements and the noncoding regions of genes in A. aegypti is random. A large number of randomly selected genes need to be analyzed to address this question. Pony elements were also frequently found near other transposable elements. As shown in table 1, four Pony elements were found near known transposable elements. In addition, all of the Pony elements found in the noncoding regions of genes had at least one other transposable element nearby (data not shown). Furthermore, as shown in table 2, 11 of the 14 Pony clones isolated from the genomic library contained at least one other transposable element. Most of the transposable elements found near Pony were either elements of the Feilai family (Tu 1999), fragments of non-LTR retrotransposons, or MITEs themselves.

Discussion

Diversity and Evolutionary History of Pony Elements
Sequence comparison and phylogenetic analysis indicate that there are two highly divergent subfamilies of Pony elements in A. aegypti (figs. 3 and 4). The consensus sequences of these subfamilies share less than 62% similarity. Because all 11 elements sequenced from cloned PCR products belong to either the A or the B subfamily and because a degenerate primer that matched the conserved terminal inverted repeats of all available Pony sequences was used in the above PCR experiment at low annealing temperatures (fig. 2), it is likely that there are only two major subfamilies of Pony elements in A. aegypti. However, we cannot rule out the presence of other subfamilies that are much less abundant. Although the two subfamilies are highly divergent, all but 2 of the 28 full-length Pony elements are more than 85% similar to the consensus sequences of their respective subfamilies. Moreover, most of them are more than 90% similar to their consensus, and phylogenetic analysis showed the presence of multiple subgroups within each of the subfamilies. A small but significant fraction of both subfamilies are truncated copies. Truncations occurred at both the 5' and 3' termini, and the truncated copies were not flanked by TA target duplications. Thus, these truncations could have resulted from recombination, insertion of other transposable elements, and deletions of large terminal segments. Like the full-length copies, a small fraction of the truncated copies showed less than 75% similarity to the consensus of their subfamily, while the majority of them showed 79%–95% similarity to their consensus. Although it is not clear how Pony, or any other MITE, might have originated in the genome, different hypotheses can be proposed to explain the evolutionary history of the Pony elements.
One hypothesis is that the two subfamilies could have coexisted for a long time in the ancestral species and they were only extensively amplified relatively recently, which could explain the presence of a small number of divergent copies of full-length and truncated Pony elements amid a large number of highly conserved copies within each of these subfamilies. In addition, more than one source gene was amplified which resulted in multiple subgroups within each subfamily. If we assume similar rates of evolution for most Pony elements, amplification events at different times could account for the varied levels of divergence in different subgroups. Alternatively, the two subfamilies could be relatively new in the mosquito genome. Under this hypothesis, to account for the presence of a small number of highly divergent copies within each subfamily, it is necessary to assume that they have been evolving at a much faster rate than the rest of the Pony sequences. To distinguish between these hypotheses, it is necessary to determine the distribution and diversity of Pony in different mosquitoes and perhaps in other insects. Such analysis would also help to address issues such as vertical transmission versus horizontal transfer of MITE families.

The Importance of Subterminal Repeats in Pony and Other MITEs
Sequence comparisons between the two highly divergent subfamilies of Pony elements present a unique opportunity to analyze what features may be important for this and other families of MITEs. In addition to the terminal inverted repeats, the α, β, and γ subterminal repeats represent the majority of the conservation between the two subfamilies (fig. 5), indicating that these repeats may be important structural and/or functional components of the Pony elements. One of the α repeats is less conserved between the two subfamilies because of three substitutions in the consensus of the B subfamily, which could indicate that not all of the repeats are equally important. The occurrence of subterminal repeats has been previously documented in a few MITEs (Morgan and Middleton 1990; Bureau and Wessler 1992; Charrier et al. 1999). New subterminal repeats have been identified in a number of MITE families, as shown in table 3, although not all of the MITEs surveyed contained significant subterminal repeats. Therefore, subterminal repeats could be an important feature for a subgroup of MITEs. It is possible that some of the conserved repeat motifs in the Pony sequences and other MITEs may serve as multiple binding sites for transposases or other proteins that are involved in the transposition process. The ATTTGCAT octamer repeats in a MITE-like element from Xenopus have been shown to bind to an oocyte nuclear protein (Morgan and Middleton 1990). A segment of the α motif of the Pony elements, TTGATTCAW, is similar to one of the two repeat motifs (TTGATTCAY) that constitute the strong binding site of the bacterial transposon Mu (Groenen, Timmers, and van de Putte 1985), although there is no evidence yet for the function of this sequence in Pony elements.
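Motifs such as TTGATTCAW and TTGATTCAY use IUPAC degenerate codes (W = A or T, Y = C or T, R = A or G). The following minimal sketch shows one way to locate full-length copies of a degenerate motif on both strands of a sequence by translating the motif into a regular expression. The function names and the toy test sequence are hypothetical; this is not the procedure used to identify the repeats in figure 5 or table 3, and it does not detect partial repeats.

```python
import re

# IUPAC nucleotide codes expressed as regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement table that also covers the degenerate codes.
COMPLEMENT = str.maketrans("ACGTRYWSKMBDHVN", "TGCAYRWSMKVHDBN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def motif_to_regex(motif: str) -> str:
    return "".join(IUPAC[base] for base in motif.upper())


def find_motif(seq: str, motif: str):
    """Return sorted (start, strand) pairs; start is 0-based on the given strand.

    A lookahead is used so that overlapping matches are all reported.
    """
    seq = seq.upper()
    hits = []
    for strand, m in (("+", motif), ("-", reverse_complement(motif))):
        pattern = re.compile("(?=(" + motif_to_regex(m) + "))")
        hits.extend((match.start(), strand) for match in pattern.finditer(seq))
    return sorted(hits)
```

Both TTGATTCAW and TTGATTCAY accept the concrete sequence TTGATTCAT, which is the sense in which the Pony segment and the Mu repeat motif overlap; sequence similarity alone, as noted above, says nothing about function.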


Table 3 Examples of Subterminal Repeats in Different MITEs

 
Implications for the Transposition Mechanisms of Pony and Other MITEs
The terminal inverted repeats of the two subfamilies of Pony are similar to the terminal sequences of the Tc2 DNA transposon (Ruvolo, Hill, and Levitt 1992) and the three MITE-like elements in C. elegans (Oosumi, Garlick, and Belknap 1995, 1996). All of these elements use the TA dinucleotide as insertion targets. Because the terminal inverted repeats often harbor the most important cis-elements required for transposition, and because the same target sequence could indicate a similar endonuclease used in DNA cleavage before integration, Pony may have been borrowing the transposition machinery from a Tc2-like DNA transposon in mosquitoes. The PCR analysis shown in figure 2 failed to amplify an autonomous DNA transposon in A. aegypti that shares similar terminal inverted repeats with Pony elements. However, this could be due to selective amplification of smaller fragments in PCR or to the high copy number of the Pony sequences relative to the possible autonomous elements. A previously reported MITE in A. aegypti, Wujin, is flanked by the same TA target duplications, and the first 5 nt of its terminal repeats are identical to those of the Tc1 transposons in C. elegans (Tu 1997). A fragment of a DNA element that encodes a transposase similar to Tc1 has been found in A. aegypti, although no full-length copy has been isolated (unpublished data).
The hypothesis that MITEs use the transposition machinery of autonomous DNA-mediated elements has previously been proposed (Morgan 1995; Oosumi, Garlick, and Belknap 1996; Smit and Riggs 1996). One of the difficulties of this hypothesis is that most MITEs, including Pony, are present in the genome at very high copy numbers, which have otherwise been attained only by retroelements. Interestingly, phylogenetic analysis of Pony suggests the presence of subfamilies and amplification from different source genes, which resembles the evolutionary pattern of a SINE retrotransposon named Feilai in A. aegypti (Tu 1999). However, coexistence of subfamilies is not unique to retroelements. Multiple subfamilies of P elements have been found in a few species of Drosophila (Clark et al. 1995; Clark and Kidwell 1997). Therefore, the presence of independent subfamilies cannot be used as evidence for a retrotransposition mechanism.
Izsvák et al. (1999) proposed an alternative hypothesis suggesting that MITEs transpose by a DNA intermediate produced from the folding back of a single-stranded DNA during replication. The authors argue that the stable secondary structure of MITEs is crucial in the formation of such single-stranded DNA stem-loops and thus critical to their transposition. They suggest that these stem-loops are integrated in the chromosomes using host factors of cellular replication and that MITEs may be only occasionally mobilized by trans-complementary transposases. Although I think it is more likely that Pony may have been borrowing the transposition machinery from a Tc2-like transposon as described above, the data presented here do not directly contradict the "snap-back" hypothesis of Izsvák et al. (1999). The snap-back model can explain the highly successful amplification of MITEs relatively well. However, it does not address the fact that a number of MITEs, including Pony, share similar terminal inverted repeats with autonomous DNA transposons (Morgan 1995; Oosumi, Garlick, and Belknap 1996; Smit and Riggs 1996) or the fact that not all MITEs have the potential to form stable secondary structures (Bureau, Ronald, and Wessler 1996). Furthermore, this model seems to predict that most MITEs should have the potential to form simple hairpin structures, which is not true for many MITEs, including Pony and the other three MITEs in A. aegypti (Tu 1997). It is possible that different MITEs have been exploiting different ways of transposition. Some MITEs may primarily use a snap-back mechanism, while others may rely mainly on the transposition machinery of autonomous DNA transposons. These two proposed mechanisms are not mutually exclusive.
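The two structural hallmarks discussed in this section, terminal inverted repeats and flanking TA target duplications, can be checked computationally. The sketch below is a minimal illustration under simplifying assumptions: it measures only perfect terminal inverted repeats (no mismatches allowed, whereas real elements often tolerate a few), and it expects the caller to supply the element and its immediate flanking sequences separately. The function names and the toy sequences in the usage note are hypothetical and are not Pony sequences.

```python
def reverse_complement(seq: str) -> str:
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def tir_length(element: str) -> int:
    """Length of the perfect terminal inverted repeat of an element.

    The 5' end is compared, base by base, with the reverse complement of the
    3' end; counting stops at the first mismatch or at the element midpoint.
    """
    element = element.upper()
    rc = reverse_complement(element)
    length = 0
    for a, b in zip(element[: len(element) // 2], rc):
        if a != b:
            break
        length += 1
    return length


def has_ta_tsd(flank5: str, flank3: str) -> bool:
    """True if the element is immediately flanked by TA on both sides."""
    return flank5.upper().endswith("TA") and flank3.upper().startswith("TA")
```

For a toy element built as `"CAGTGC" + "AAAAAAAA" + "GCACTG"`, the first six bases pair with the reverse complement of the last six, so `tir_length` reports a 6-nt inverted repeat. Shared termini and a shared TA target are consistent with, but do not demonstrate, use of the same transposase.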

Genomic Distribution of MITEs and the Evolution of Eukaryotic Genomes
Both subfamilies of Pony elements are highly reiterated in A. aegypti. Together, they constitute approximately 1.1% of the entire genome. There was no close physical linkage between the two subfamilies. However, as shown in tables 1 and 2, Pony elements are frequently found near other transposable elements. Most of the transposable elements found near Pony are either elements of the Feilai family, fragments of non-LTR retrotransposons, or MITEs themselves, indicating a possible nonrandom distribution of Pony elements in the genome. On the other hand, Pony elements are also frequently found in the noncoding regions of genes in A. aegypti. It seems that Pony elements are not biased against genic regions. This is consistent with our preliminary analysis showing concentrations of a number of repetitive elements in the noncoding regions of a number of genes in A. aegypti (unpublished data). Nonrandom distribution of MITEs and other transposable elements has previously been indicated in A. aegypti (Tu 1997, 1999; Tu, Isoe, and Guzova 1998; Tu and Hill 1999).
Together with the other MITEs (Tu 1997), the highly reiterated Pony elements may have contributed to the highly repetitive nature and the pattern of "short period interspersion" of the A. aegypti genome (Warren and Crampton 1991). In addition, many families of highly repetitive MITEs have been discovered in genomes that have high levels of repetitive sequences, such as several cereal grasses, a frog, and humans (e.g., Morgan 1995; Ünsal and Morgan 1995; Bureau, Ronald, and Wessler 1996; Smit and Riggs 1996; Izsvák et al. 1999). In contrast, no highly repetitive MITEs have been reported in the genome of Drosophila melanogaster, which has a low level of repetitive sequences and a pattern of "long period interspersion" (Crain et al. 1976). Although a number of families of MITEs have been found in C. elegans, which has a small and compact genome (Oosumi, Garlick, and Belknap 1996), the copy numbers of these MITEs are generally low. Similarly, MITEs are much less abundant in the genome of Arabidopsis thaliana, which has only approximately 4% repetitive sequences (Casacuberta et al. 1998; Surzycki and Belknap 1999). The distribution of MITE-like elements in these various genomes suggests that the massive proliferation of MITEs may be associated with more repetitive genomes in both the plant and the animal kingdoms. The difference in the relative abundance of MITEs may have in part contributed to the different organizations of the eukaryotic genomes and may reflect different types of interactions between the hosts and these widespread transposable elements.
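A genome-fraction figure such as the approximately 1.1% quoted for Pony can be related to copy numbers by simple arithmetic: sum copy number times mean copy length over the families and divide by the haploid genome size. The sketch below shows that calculation under the assumption that a single mean length per subfamily is an adequate summary (truncated copies make the true mean shorter than the full-length element). The function name is hypothetical, and no element lengths or genome size are asserted here; those inputs must come from the measurements reported elsewhere in the paper.

```python
def genome_fraction(families, genome_size_bp: float) -> float:
    """Fraction of a genome occupied by one or more repeat families.

    families: iterable of (copy_number, mean_copy_length_bp) pairs.
    genome_size_bp: haploid genome size in base pairs.
    """
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    occupied_bp = sum(copies * mean_len for copies, mean_len in families)
    return occupied_bp / genome_size_bp
```

With the estimated copy numbers of the two subfamilies (8,400 and 9,900) as the first member of each pair, the result depends linearly on the assumed mean lengths and inversely on the assumed genome size, so uncertainty in either input carries straight through to the percentage.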

In summary, this study describes a novel family of highly divergent and highly reiterated MITEs in the yellow fever mosquito, A. aegypti. The analysis offers evolutionary insights into the expansion of the Pony family and its impact on the organization of the genome. Pony may have been borrowing the transposition machinery from an autonomous DNA transposon, although the detailed mechanisms remain to be determined. Specifically, it remains to be seen how Pony might have attained a very high copy number through a DNA-mediated transposition mechanism. The rapid accumulation of genomic sequences from a wide range of organisms will almost certainly provide a better understanding of the diversity and common characteristics of this large group of transposable elements. Such information, as well as the knowledge of the mechanism of their transposition and expansion, will undoubtedly shed light on the evolution of eukaryotic genomes.

Supplementary Material

The sequences reported in this manuscript have been deposited in GenBank with accession numbers AF208664–AF208681 and AF259802–AF259812.

Acknowledgements

I thank Shirley Luckhart, Jiann-Shin Chen, Andrea Crampton, and Chunhong Mao for critical comments on the manuscript. I thank Jennifer Hill and Yumin Qi for valuable technical assistance. I also thank the sequencing facilities at the University of Arizona and Virginia Tech for their service. I am indebted to Jun Isoe, Henry Hagedorn, Daphne Pham, and John Law for sharing unpublished data. This work was supported by NIH grant AI42421 to Z.T. and by the Agricultural Experimental Station at Virginia Tech.

Footnotes

Pierre Capy, Reviewing Editor

1 Abbreviations: MITE, miniature inverted repeat transposable element; SINE, short interspersed repetitive element.

2 Keywords: MITEs, SINEs, interspersed repeats, genome evolution, phylogenetics

3 Address for correspondence and reprints: Zhijian Tu, Department of Biochemistry, Virginia Polytechnic Institute and State University, Blacksburg, Virginia 24061. E-mail: jaketu@vt.edu.

Literature Cited

    Altschul, S. F., T. L. Madden, A. A. Schäffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25:3389–3402.

    Braquart, C., V. Royer, and H. Bouhin. 1999. DEC: a new miniature inverted-repeat transposable element from the genome of the beetle Tenebrio molitor. Insect Mol. Biol. 8:571–574.

    Bureau, T. E., P. C. Ronald, and S. R. Wessler. 1996. A computer-based systematic survey reveals the predominance of small inverted repeat elements in wild-type rice genes. Proc. Natl. Acad. Sci. USA 93:8524–8529.

    Bureau, T. E., and S. R. Wessler. 1992. Tourist: a large family of small inverted repeat elements frequently associated with maize genes. Plant Cell 4:1283–1294.

    ———. 1994a. Mobile inverted repeat elements of the Tourist family are associated with the genes of many cereal grasses. Proc. Natl. Acad. Sci. USA 91:1411–1415.

    ———. 1994b. Stowaway: a new family of inverted repeat elements associated with the genes of both monocotyledonous and dicotyledonous plants. Plant Cell 6:907–916.

    Casacuberta, E., J. M. Casacuberta, P. Puigdomenech, and A. Monfort. 1998. Presence of miniature inverted repeat transposable elements (MITEs) in the genome of Arabidopsis thaliana: characterisation of the Emigrant family of elements. Plant J. 16:79–85.

    Charrier, B., F. Foucher, E. Kondorosi, Y. d'Aubenton-Carafa, C. Thermes, A. Kondorosi, and P. Ratet. 1999. Bigfoot: a new family of MITE elements characterized from the Medicago genus. Plant J. 18:431–441.

    Clark, J. B., T. K. Altheide, M. J. Schlosser, and M. G. Kidwell. 1995. Molecular evolution of P transposable elements in the genus Drosophila. I. The saltans and willistoni species groups. Mol. Biol. Evol. 12:902–913.

    Clark, J. B., and M. G. Kidwell. 1997. A phylogenetic perspective on P transposable element evolution in Drosophila. Proc. Natl. Acad. Sci. USA 94:11428–11433.

    Crain, W. R., F. C. Eden, W. R. Pearson, E. H. Davidson, and R. J. Britten. 1976. Absence of short period interspersion of repetitive and non-repetitive sequences in the DNA of Drosophila melanogaster. Chromosoma 56:309–326.

    Edwards, M. J., and H. H. Hagedorn. 1998. Vitelline envelope genes of the yellow fever mosquito, Aedes aegypti. Insect Biochem. Mol. Biol. 28:915–925.

    Fedoroff, N. 1989. Maize transposable elements. Pp. 375–411 in D. E. Berg and M. M. Howe, eds. Mobile DNA. American Society for Microbiology, Washington, D.C.

    Felsenstein, J., and H. Kishino. 1993. Is there something wrong with the bootstrap on phylogenies? A reply to Hillis and Bull. Syst. Biol. 42:193–200.

    Finnegan, D. J. 1992. Transposable elements. Curr. Opin. Genet. Dev. 2:861–867.

    Groenen, M. A. M., E. Timmers, and P. van de Putte. 1985. DNA sequences at the ends of the genome of bacteriophage Mu essential for transposition. Proc. Natl. Acad. Sci. USA 82:2087–2091.

    Hofacker, I., W. Fontana, P. Stadler, S. Bonhoeffer, M. Tacker, and P. Schuster. 1994. Fast folding and comparison of RNA secondary structures. Monatsh. Chem. 125:167–188.

    Izsvák, Z., Z. Ivics, N. Shimoda, D. Mohn, H. Okamoto, and P. B. Hackett. 1999. Short inverted repeat transposable elements in Teleost fish and implications for a mechanism of their amplification. J. Mol. Evol. 48:13–21.

    Luck, J. E., G. J. Lawrence, E. J. Finnegan, D. A. Jones, and J. G. Ellis. 1998. A flax transposon identified in two spontaneous mutant alleles of the L6 rust resistance gene. Plant J. 16:365–369.

    MacRae, A. F., and M. T. Clegg. 1992. Evolution of Ac and Ds1 elements in select grasses (Poaceae). Genetica 86:55–66.

    Morgan, G. T. 1995. Identification in the human genome of mobile elements spread by DNA-mediated transposition. J. Mol. Biol. 254:1–5.

    Morgan, G. T., and K. M. Middleton. 1990. Short interspersed repeats in Xenopus that contain multiple octamer motifs are related to known transposable elements. Nucleic Acids Res. 18:5781–5786.

    Oosumi, T., B. Garlick, and W. R. Belknap. 1995. Identification and characterization of putative transposable DNA elements in solanaceous plants and Caenorhabditis elegans. Proc. Natl. Acad. Sci. USA 92:8886–8890.

    ———. 1996. Identification of putative nonautonomous transposable elements associated with several transposon families in Caenorhabditis elegans. J. Mol. Evol. 43:11–18.

    Rao, P. S., and K. S. Rai. 1987. Inter and intraspecific variation in nuclear DNA content in Aedes mosquitoes. Heredity 59:253–258.

    Romans, P., Z. Tu, Z. Ke, and H. H. Hagedorn. 1995. Analysis of a vitellogenin gene of the mosquito, Aedes aegypti and comparisons to vitellogenins from other organisms. Insect Biochem. Mol. Biol. 25:939–958.

    Ruvolo, V., J. E. Hill, and A. Levitt. 1992. The Tc2 transposon of Caenorhabditis elegans has the structure of a self-regulated element. DNA Cell Biol. 11:111–122.

    Smit, A. F. A., and A. D. Riggs. 1996. Tiggers and other DNA transposon fossils in the human genome. Proc. Natl. Acad. Sci. USA 93:1443–1448.

    Song, W.-Y., L.-Y. Pi, T. Bureau, and P. C. Ronald. 1998. Identification and characterization of 14 transposon-like elements in the noncoding regions of members of the Xa21 family of disease resistance genes in rice. Mol. Gen. Genet. 258:449–456.

    Surzycki, S. A., and W. R. Belknap. 1999. Characterization of repetitive DNA elements in Arabidopsis. J. Mol. Evol. 48:684–691.

    Swofford, D. L. 1999. PAUP*. Version 4.0 b2. A commercial test version; completed version 4.0 distributed by Sinauer, Sunderland, Mass.

    Tu, Z. 1997. Three novel families of miniature inverted repeat transposable elements are associated with genes of the yellow fever mosquito, Aedes aegypti. Proc. Natl. Acad. Sci. USA 94:7475–7480.

    ———. 1999. Genomic and evolutionary analysis of Feilai, a diverse family of highly reiterated SINEs in the yellow fever mosquito, Aedes aegypti. Mol. Biol. Evol. 16:760–772.

    Tu, Z., and H. H. Hagedorn. 1997. Biochemical, molecular, and phylogenetic analysis of pyruvate carboxylase in the yellow fever mosquito, Aedes aegypti. Insect Biochem. Mol. Biol. 27:133–147.

    Tu, Z., and J. J. Hill. 1999. MosquI, a novel family of mosquito retrotransposons distantly related to the Drosophila I factors, may consist of elements of more than one origin. Mol. Biol. Evol. 16:1675–1686.

    Tu, Z., J. Isoe, and J. A. Guzova. 1998. Structural, genomic, and phylogenetic analysis of Lian, a novel family of non-LTR retrotransposons in the yellow fever mosquito, Aedes aegypti. Mol. Biol. Evol. 15:837–853.

    Ünsal, K., and G. T. Morgan. 1995. A novel group of families of short interspersed repetitive elements (SINEs) in Xenopus: evidence of a specific target site for DNA-mediated transposition of inverted repeat SINEs. J. Mol. Biol. 248:812–823.

    Warren, A. M., and J. M. Crampton. 1991. The Aedes aegypti genome: complexity and organization. Genet. Res. 58:225–232.

    Wessler, S. R., T. E. Bureau, and S. E. White. 1995. LTR-retrotransposons and MITEs: important players in the evolution of plant genomes. Curr. Opin. Genet. Dev. 5:814–821.

    Zuker, M. 1989. Computer prediction of RNA structure. Methods Enzymol. 180:262–288.

Accepted for publication May 19, 2000.