maT—A Clade of Transposons Intermediate Between mariner and Tc1

Charles Claudianos*,†,2, Jeremy Brownlie†,‡, Robyn Russell†, John Oakeshott† and Steven Whyard†

*Research School of Biological Sciences, The Australian National University, Canberra;
†CSIRO Entomology, Canberra;
‡Department of Botany and Zoology, The Australian National University, Canberra


    Abstract
 
A group of transposons, named maT, with characteristics intermediate between mariner and Tc1 transposons, is described. Two defective genomic copies of MdmaT from the housefly Musca domestica, with 85% identity, were found flanking and embedded in the MdαE7 esterase gene involved in organophosphate insecticide resistance. Two cDNA clones, with 99% identity to each other and 72%–89% identity to the genomic copies, were also obtained, but both represented truncated versions of the putative open reading frame. A third incomplete genomic copy of MdmaT was also identified upstream of the putative M. domestica period gene. The MdmaT sequences showed high identity to the transposable element Bmmar1 from the silkworm moth, Bombyx mori, and to previously unidentified sequences in the genome of Caenorhabditis elegans. A total of 16 copies of full-length maT sequences were identified in the C. elegans genome, representing three variants of the transposon, with 34%–100% identity amongst them. Twelve of the copies, named CemaT1, were virtually identical, and eight of these encode a putative full-length, intact transposase. Secondary structure predictions and phylogenetic analyses confirm that maT elements belong to the mariner-Tc1 superfamily of transposons, but their intermediate sequence and predicted structural characteristics suggest that they form a unique clade, distinct from both mariner-like and Tc1-like elements.


    Introduction
 
The mariner transposon of Drosophila mauritiana and the Tc1 transposable element (TE) of Caenorhabditis elegans are members of a superfamily of class II TEs found in a wide range of organisms, from fungi to vertebrates (Doak et al. 1994; Robertson 1995). The mariner-Tc1 transposons range from 1,300 to 2,400 bp in length, contain a single transposase gene, and are flanked by inverted terminal repeats (ITRs). Despite low overall sequence identity (16%) across the superfamily (Robertson 1995), the transposases share several functional and predicted structural characters that suggest they are derived from a common ancestor (Plasterk, Izsvak, and Ivics 1999).

Transposition of mariner-Tc1 TEs occurs by a cut-and-paste mechanism and is mediated by the transposase's ability to recognize the ITRs (Plasterk, Izsvak, and Ivics 1999). The two functions of the transposase, sequence-specific DNA binding and excision and insertion of the TE, are mediated by separate DNA-binding and catalytic domains. The predicted DNA-binding domain lies at the N-terminus of the transposase and, for most mariner-Tc1 elements, is about 150 amino acids long. For one member of the superfamily, Tc3, the crystal structure of the N-terminal region of the DNA-bound transposase has been determined (van Pouderoyen et al. 1997); it contains a DNA-binding domain with a helix-turn-helix (HTH) motif like that of the DNA-binding paired-like domain of some transcription factors (Franz et al. 1994; Ivics et al. 1996). The HTH motif binds to specific ITR sequences of the Tc3 transposon. Although overall amino acid identities amongst the mariner-Tc1 transposases are not high (30%; Robertson 1995), secondary structure predictions and sequence alignments suggest that other transposases within the superfamily possess similar paired-like HTH motifs (Pietrokovsky and Henikoff 1997; van Pouderoyen et al. 1997; Plasterk, Izsvak, and Ivics 1999). Similar analyses also predict a second HTH motif that resembles a homeodomain-like DNA-binding domain; DNase footprinting and methylation interference studies have shown that this domain binds to ITR sequences adjacent to those bound by the paired-like HTH (Vos, van Luenen, and Plasterk 1993; Colloms, van Luenen, and Plasterk 1994; Vos and Plasterk 1994).

The second functional domain of the transposase is the catalytic domain (~130 amino acids), which is responsible for the site-specific cleavage and joining steps of transposition. The presumptive active site within this domain is defined by a three-amino-acid motif consisting of two aspartic acid residues (D) separated by more than 90 residues in the primary sequence, followed by a third aspartic acid (D) or a glutamic acid (E) residue at a typical distance of 34 or 35 residues (Doak et al. 1994). The catalytic domain of mariner-Tc1 transposases displays distant DNA sequence similarity (39%–45%) to several prokaryotic IS transposases and to some long terminal repeat (LTR) retroelement and retroviral integrases (Fayet et al. 1990; Khan et al. 1991; Doak et al. 1994). The similarities in sequence and predicted structure across the superfamily may reflect functional conservation, because similar mobility mechanisms have been observed for Tc1 (Vos, De Baere, and Plasterk 1996) and three other members of the superfamily: Tc3 (van Luenen, Colloms, and Plasterk 1994), Mos1 (Tosi and Beverley 2000; Zhang, Dawson, and Finnegan 2001), and Himar1 (Lampe, Churchill, and Robertson 1996). All of these elements are thought to excise through double-stranded DNA breaks at the ends of the ITRs. When the break is repaired, the staggered catalytic cleavage at the 5' end of the transposon donor site leaves behind a characteristic footprint (Vos and Plasterk 1994). Excision yields a copy of the element that can subsequently reintegrate elsewhere in the genome at a TA dinucleotide target site, and integration is accompanied by duplication of the TA target, so that a TA flanks each end of the inserted element.
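The integration step described above can be illustrated with a toy string model. The sketch below uses hypothetical helpers (`find_ta_sites`, `insert_at_ta`) that are not part of any published tool; it assumes the target and element are plain DNA strings and ignores cleavage chemistry, strand details, and excision footprints. It only shows how inserting at a TA site and duplicating the target leaves a TA on each side of the element.

```python
def find_ta_sites(seq: str) -> list[int]:
    """Return 0-based start positions of every TA dinucleotide in seq."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "TA"]


def insert_at_ta(target: str, pos: int, element: str) -> str:
    """Insert `element` at the TA dinucleotide starting at `pos`.

    The TA target is duplicated, so the inserted element ends up
    flanked by TA on both sides (target-site duplication).
    """
    target = target.upper()
    if target[pos:pos + 2] != "TA":
        raise ValueError(f"no TA dinucleotide at position {pos}")
    # Left copy of TA, then the element, then the original TA and downstream sequence.
    return target[:pos + 2] + element.upper() + target[pos:]


if __name__ == "__main__":
    target = "GGCATACCG"                      # toy target sequence (hypothetical)
    site = find_ta_sites(target)[0]           # first TA site in the toy target
    print(insert_at_ta(target, site, "cagt"))  # prints GGCATACAGTTACCG
```

The toy element "CAGT" ends up between two copies of TA, which is the pattern later used to recognize intact insertions in genomic sequence.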

Although members of the mariner-Tc1 superfamily have several features in common, some features distinguish mariner from Tc1 elements. Mariner elements are generally about 1.3 kb long, whereas Tc1 elements are slightly larger, ranging from 1.6 to 1.7 kb. The difference in length is often due to differences in ITR length: the ITRs of mariner are about 30 bp, whereas those of Tc1 range from 20 to 460 bp. The nucleotide sequences of the ITRs also differ, and some nucleotides are characteristically conserved within each family (Robertson 1995). Robertson (1995) compiled consensus sequences for mariner and Tc1 transposases and identified 99 mariner residues and 86 Tc1 residues that can serve as distinguishing characters for each family. Mutation of some of these conserved residues has a profound effect on transposition rates of the Mos1 mariner element in Drosophila melanogaster (Lohe, De Aguiar, and Hartl 1997). However, the character most often used to distinguish mariner from Tc1 is the catalytic triad motif of the transposase: mariner elements have a D,D,D catalytic triad, whereas Tc1 elements have a D,D,E catalytic triad (Robertson and Asplund 1996; Plasterk, Izsvak, and Ivics 1999).
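The nominal triad-based classification can be written down as a small rule. The sketch below is a hypothetical illustration, not a published classifier: it takes 1-based positions of the three catalytic residues in a protein, checks the D,D,D/E pattern and the >90-residue spacing of the first two aspartates, and builds a DDxD/DDxE-style label. It assumes the number in such labels counts the residues lying between the second and third catalytic residues.

```python
def triad_label(protein: str, d1: int, d2: int, d3: int) -> str:
    """Build a DDxD/DDxE-style label from 1-based positions of the triad residues.

    Assumption: the number in the label is the count of residues lying
    between the second and third catalytic residues.
    """
    residues = [protein[p - 1] for p in (d1, d2, d3)]
    if residues[0] != "D" or residues[1] != "D" or residues[2] not in "DE":
        raise ValueError(f"positions do not hold a D,D,D/E triad: {residues}")
    if d2 - d1 - 1 <= 90:
        raise ValueError("first two aspartates are expected to be >90 residues apart")
    return f"DD{d3 - d2 - 1}{residues[2]}"


def nominal_family(label: str) -> str:
    """Classify by the third catalytic residue alone (D -> mariner, E -> Tc1)."""
    return {"D": "mariner-like", "E": "Tc1-like"}[label[-1]]


if __name__ == "__main__":
    # Toy 200-residue protein with a triad at positions 10, 110 and 146 (hypothetical).
    seq = ["A"] * 200
    seq[9], seq[109], seq[145] = "D", "D", "E"
    label = triad_label("".join(seq), 10, 110, 146)
    print(label, nominal_family(label))
```

Because the rule looks only at the last residue, a DD37D element would be called "mariner-like" regardless of its other characters, which is exactly the limitation discussed for maT elements below.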

This article identifies a novel group of TEs that clearly belong to the mariner-Tc1 superfamily. These TEs have features intermediate between mariner and Tc1 elements, and hence we have named them maT elements. We first detected a maT element as a repetitive sequence in the genome of the housefly Musca domestica, flanking and embedded in the MdαE7 gene implicated in organophosphate (OP) insecticide resistance (Claudianos, Russell, and Oakeshott 1999). A GenBank search revealed sequences highly similar to the housefly sequence in the genome of C. elegans. Full-length maT elements from C. elegans were subsequently identified, and this article describes their distribution within the C. elegans genome and their phylogenetic relationship to other members of the mariner-Tc1 superfamily.


    Materials and Methods
 
DNA Isolation, Cloning, PCR Amplification, and DNA Sequencing
Genomic DNA from M. domestica (Rutgers and sbo strains), Lucilia cuprina (LS2 strain), and D. melanogaster (Oregon R) was prepared from 1–3-day-old adult female flies by the Lifton method (Bender, Spierer, and Hogness 1983). A third-instar larval Uni-Zap cDNA library from the Rutgers housefly strain was a gift of René Feyereisen (University of Arizona, USA).

Claudianos, Russell, and Oakeshott (1999) described the isolation of clone λ19.3.1 from a λDash (Stratagene) genomic library made from adults of the Rutgers strain, kindly provided by R. Feyereisen, University of Arizona, USA. Clone λ19.3.1 contains an esterase gene (MdαE7) implicated in OP resistance. In the current study, DNA from λ19.3.1 was subcloned into compatibly digested pBluescript SK(-) (Stratagene) plasmid vector with T4 DNA ligase (New England Biolabs). Nested deletions of subcloned EcoRI fragments were created using the Erase-a-Base system (Promega). Subcloned restriction fragments and deletion clones were sequenced on both strands using flanking primers (T3 and T7) and TaqFS dye-terminators. Reactions were analyzed on a 373A automated DNA sequencer (Applied Biosystems). In total, 10 kb of the 17.8 kb of λ19.3.1 was sequenced. This included two novel ~900-nt sequences associated with intron I and the 3' flanking region of MdαE7. These sequences had 85% identity to each other and were named MdmaT1.A and MdmaT1.B, respectively.

Two primers, AE7.25 (5'-CGCAACAGAAAGAAAATAAAC-3') and AE7.26 (5'-ATCGACACTTTGGTATTTTT-3'), were used in PCR experiments to amplify maT sequences from M. domestica genomic DNA and cDNA. The MdαE7-specific primers used to amplify adjacent sequences from genomic DNA were AE7.4 (5'-TCGATTATTTGGGTTTCATTTGT-3'), AE7.12 (5'-GGCATGGAAAACCTCACCTGG-3'), and AE7.30 (5'-ATGAATTTCAAAGTTAGTCAA-3'). PCR was conducted in a Corbett Research FTS-1 thermal cycler. The 50-µl PCR reaction mix contained 100 µg DNA, 50 pmol of each primer, 10 mM Tris-HCl (pH 8.3), 1.5 mM KCl, 0.25 mM of each dNTP, and 1 unit of Taq polymerase (GIBCO BRL). Amplification began with an initial denaturation step of 95°C for 3 min, followed by addition of the Taq polymerase at 80°C, and then 35 cycles of 1 min at 95°C, 1 min at 55°C, and 2 min at 72°C. A final 5 min at 72°C was used to fully extend all PCR products. PCR products were resolved in agarose gels and purified using a Qiagen QIAquick PCR purification kit according to the manufacturer's instructions. Purified amplicons were cloned into EcoRV-cleaved, T-tailed pGEM-T plasmid vector (Promega). Plasmid DNA was isolated (Wizard, Promega) and sequenced with primers complementary to the T7 and SP6 promoters in the vector, using TaqFS dye-terminator chemistry (Applied Biosystems) on an Applied Biosystems Model 373A automated DNA sequencer.
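The cycling program above can be captured as a small data structure, which makes the total hold time explicit. This is a sketch only: the step names and helper functions are hypothetical, times are in minutes, and ramp times and the hot-start Taq addition at 80°C are not modeled.

```python
# Thermal profile transcribed from the text as (step, temperature in °C, minutes).
# Ramp times and the hot-start Taq addition at 80 °C are not modeled.
INITIAL = [("denature", 95, 3)]
CYCLE = [("denature", 95, 1), ("anneal", 55, 1), ("extend", 72, 2)]
CYCLES = 35
FINAL = [("final extension", 72, 5)]


def hold_minutes(steps) -> int:
    """Sum of hold times for a list of (name, temperature, minutes) steps."""
    return sum(minutes for _, _, minutes in steps)


def total_hold_minutes() -> int:
    """Initial denaturation + all cycles + final extension."""
    return hold_minutes(INITIAL) + CYCLES * hold_minutes(CYCLE) + hold_minutes(FINAL)


if __name__ == "__main__":
    print(total_hold_minutes())  # 3 + 35 * 4 + 5 = 148 minutes of holds
```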

Southern Analysis
Southern blot analysis of genomic DNA was carried out after separating 10 µg of digested DNA on duplicate 1.0% agarose gels and blotting onto supported nitrocellulose membranes (NitroPure, Micron Separations Inc.). A 536-bp MdmaT1.A genomic PCR product was labeled by the random primer method and used as a probe. Membranes were hybridized with ³²P-labeled DNA at 42°C in a 50% formamide hybridization solution using standard techniques. High-stringency washes were performed at 65°C in 2× SSC, 0.1% SDS, and low-stringency washes used the same solutions at 50°C.

Phylogenetic Analysis, Sequence Alignment, and Molecular Modeling
The two M. domestica maT sequences were compared with nonredundant databases on the NCBI server using tblastx and tblastn (www.ncbi.nlm.gov/cgi-bin/BLAST). maT sequences were similarly identified from the completed C. elegans genome project (www.sanger.ac.uk/Projects/C_elegans/blast_server.shtml). Additional Tc1-, Tc3-, and mariner-like sequences within the C. elegans genome were identified by comparison with previously reported sequences (Robertson 1995; Robertson and Asplund 1996; Gomulski et al. 2001; Shao and Tu 2001). Inferred translation products were used to estimate identity and distance scores. Pairwise comparisons were performed using the GCG alignment program "Gap" (Devereux, Haeberli, and Smithies 1984), and multiple sequence comparisons and consensus sequences were generated using either the GCG program "Pileup" (Devereux, Haeberli, and Smithies 1984) or "CLUSTAL W" (Thompson, Higgins, and Gibson 1994). Default parameters were used for nucleotide sequences (gap weight 5.0, gap length weight 0.3), and the default scoring matrix was used for proteins (gap weight 3.0, gap length weight 0.1, and end gap penalties enforced for Pileup; gap opening penalty 10.0 and gap extension penalty 0.1 for CLUSTAL W). In multiple alignments, the locations of indels (insertions and deletions) were adjusted as necessary so that they fell outside known structural elements of other transposon and paired-box proteins. Phylogenetic analyses were performed using the PAUP* (Swofford 2000) and PHYLIP (Felsenstein 1989) program packages. Distance trees were constructed by the neighbor-joining method with standard distances and mean character differences (PAUP*) and with "Prodist" using the "Categories-chemical model" (PHYLIP). Parsimony trees were constructed using a simple heuristic search with fast stepwise addition (PAUP*) and with "Protopars" (PHYLIP). Confidence values were obtained by random resampling with 1,000 bootstrap replications.
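The resampling step behind bootstrap confidence values can be sketched in a few lines. The helpers below are hypothetical and cover only two ingredients: a p-distance (proportion of differing sites, which roughly corresponds to a mean character difference when gapped columns are skipped) and the drawing of alignment columns with replacement. A real bootstrap analysis, as in PAUP* or PHYLIP, rebuilds a tree from every pseudo-replicate and reports how often each clade recurs; tree building is omitted here.

```python
import math
import random


def p_distance(a: str, b: str, gap: str = "-") -> float:
    """Proportion of differing residues between two aligned sequences.

    Columns with a gap in either sequence are skipped; returns NaN if no
    comparable columns remain.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return math.nan
    return sum(x != y for x, y in pairs) / len(pairs)


def resample_columns(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """One bootstrap pseudo-replicate: draw alignment columns with replacement."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}


def bootstrap_p_distances(alignment: dict[str, str], a: str, b: str,
                          replicates: int = 1000, seed: int = 1) -> list[float]:
    """p-distance between taxa a and b in each of `replicates` pseudo-replicates."""
    rng = random.Random(seed)
    return [p_distance(*(lambda rep: (rep[a], rep[b]))(resample_columns(alignment, rng)))
            for _ in range(replicates)]


if __name__ == "__main__":
    # Toy aligned protein fragments; names and sequences are hypothetical.
    aln = {"taxonA": "MKDLSE-QRV", "taxonB": "MKELSEAQKV", "taxonC": "MRDLTEAQRI"}
    dists = [d for d in bootstrap_p_distances(aln, "taxonA", "taxonB") if not math.isnan(d)]
    print(p_distance(aln["taxonA"], aln["taxonB"]), sum(dists) / len(dists))
```

Fixing the random seed makes the pseudo-replicates reproducible, which is convenient when comparing runs.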

Molecular modeling was performed using the EMBL PredictProtein server (Rost 1996) and the MolMod mirror site (Molecular Modeling and Structure Prediction at ANGIS). The analyses included PHDsec, PROSITE and ProDom searches, the COILS algorithm, and incremental threading optimization onto known three-dimensional structures using TITO (Labesse and Mornon 1998). PSORT II (Horton and Nakai 1997) was used to predict the presence of nuclear localization signals (NLSs).


    Results and Discussion
 
Characterization of maT in the Housefly
Figure 1A shows the locations of the two novel sequences, MdmaT1.A and MdmaT1.B, in the first intron and in the 3' flanking region of the MdαE7 gene, respectively. Database searches revealed that MdmaT1.A and MdmaT1.B (GenBank accessions AF315724 and AF315725) had 29% and 32% transposase amino acid sequence identity, respectively, to the Bmmar1 transposase consensus sequence from the silkworm Bombyx mori (Robertson and Asplund 1996), as well as similarity to a number of undescribed sequences from the C. elegans genome project (see below). Neither MdmaT1.A nor MdmaT1.B is a complete transposon, because no ITRs or dinucleotide insertion footprints typical of mariner-Tc1 transposons (Plasterk 1996) were detected within 2 kb of either sequence. The sequences appear to represent nearly full-length, yet imperfect, ORFs (316 and 288 amino acids for MdmaT1.A and MdmaT1.B, respectively), because their lengths approach that of the Bmmar1 transposase consensus (346 amino acids). The MdmaT1.B sequence is within 1 kb of a second putative ORF, which shows 45% sequence identity to the potassium channel gene Shaw from D. melanogaster (Butler, Wei, and Salkoff 1990). The Shaw-like sequence lies downstream of MdmaT1.B and is oriented in the same direction. MdmaT1.A and MdmaT1.B share 85% identity over 900 bp of overlapping nucleotide sequence. However, each sequence has several frameshift and stop mutations (up to five and four, respectively), which suggests that these putative elements are no longer functional, even if transcribed.



 
Fig. 1.—(A) Schematic view of the structure of the MdαE7 gene and surrounding genomic region. The exons are boxed and numbered 1–6. The ATG initiation codon and TAA termination codon are marked. The restriction map of the MdαE7 genomic region indicates various cleavage sites, including the 2.3- and 3.2-kb EcoRI (E) fragments that were cloned and sequenced. Block arrows indicate the relative positions of the MdmaT1.A, MdmaT1.B, and putative Shaw sequences. The PCR primers used to amplify DNA fragments are indicated with small arrows. (B) Southern blot of EcoRI (E), XbaI (X), SalI (S), and HindIII (H) digested λ19.3.1 DNA and sbo and Rutgers strain genomic DNA probed with the maT1 PCR product. Hybridization patterns confirm that the above EcoRI fragments contain maT-related sequences. Dark smears indicate that both the OP-resistant Rutgers (R+) and OP-susceptible sbo strains contain multiple copies of maT sequences.

 
The primer pair AE7.25 and AE7.26 was used with λ19.3.1 DNA as template to PCR amplify a 536-bp MdmaT1.A fragment of M. domestica DNA. This PCR product was subcloned, labeled with [α-32P]dATP, and used as a probe for Southern blot analysis. As expected from the sequencing analysis, this revealed two strongly hybridizing fragments in the λ19.3.1 clone (fig. 1B). However, when the same analysis was carried out on genomic DNA from OP-resistant Rutgers or OP-susceptible sbo housefly strains, a hybridizing smear was obtained with all combinations of enzymes and conditions, suggesting the presence of possibly hundreds of MdmaT sequences in both strains. Comparative analyses under low-stringency washing conditions revealed no detectable cross-hybridization to genomic DNA from the sheep blowfly L. cuprina or D. melanogaster (results not shown).

A combination of MdmaT (AE7.25) and MdαE7 esterase (AE7.4, AE7.12) primers was designed to identify and characterize MdmaT1.A and MdmaT1.B in the sbo fly strain. The primer pair AE7.12 and AE7.25 produced a 3.1-kb PCR product and confirmed that MdmaT1.B and the adjacent Shaw sequence are present at the 3' end of MdαE7 in both the OP-resistant Rutgers and OP-susceptible sbo strains (results not shown). However, MdmaT1.A was not detected in sbo flies using AE7.25 and AE7.4 or any other combination of primers. Southern blot analyses of genomic DNA from various housefly strains probed with the full-length MdαE7 cDNA indicate significant differences at the 5' end of the MdαE7 gene among strains (Claudianos 1999).

An attempt was made to PCR amplify MdmaT sequences from housefly cDNA. PCR experiments using a third-instar Rutgers Uni-Zap cDNA library as template and the primer pair AE7.25 and AE7.26 consistently produced ~500-bp products, which were cloned and sequenced. Two similar sequences of 497 and 498 bp encode 99% identical peptides and share a single stop mutation (amino acid position 129), resulting in truncated ORFs (GenBank accessions AF324221 and AF324222). A consensus housefly MdmaT sequence was generated from MdmaT1.A, MdmaT1.B, and the two cDNA sequences. The resulting 1,050-bp sequence encodes a 350-amino-acid protein that has 35% identity and 55% similarity to the 346-amino-acid consensus sequence of the Bmmar1 transposon (fig. 2A).
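The codon arithmetic in this paragraph (1,050 bp giving 350 codons, and an in-frame stop at codon 129 truncating the ORF) can be checked with a short scan. The helpers below are a hypothetical sketch: they assume the standard genetic code stop codons (TAA, TAG, TGA) and a reading frame fixed by the caller, and they do not attempt to model frameshifts.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def codon_count(nt: str) -> int:
    """Number of complete codons in a reading frame starting at the first base."""
    return len(nt) // 3


def in_frame_stops(nt: str, frame: int = 0) -> list[int]:
    """1-based codon (amino acid) positions of stop codons in frame 0, 1 or 2."""
    nt = nt.upper()
    return [(i - frame) // 3 + 1
            for i in range(frame, len(nt) - 2, 3)
            if nt[i:i + 3] in STOP_CODONS]


if __name__ == "__main__":
    print(codon_count("N" * 1050))           # 350 codons for a 1,050-bp ORF
    print(in_frame_stops("ATGAAATAGCCC"))    # toy sequence: stop at codon 3
```

Applied to a cDNA fragment in its coding frame, a premature stop such as the one at position 129 would show up as the first entry of the returned list.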



 
Fig. 2.—(A) Multiple alignment of the consensus M. domestica maT (MdmaT), C. elegans maT (CemaT1), and B. mori Bmmar1 sequences. Black shading indicates the putative D,D,D/E catalytic triad residues common to mariner-Tc1 TEs, gray shading indicates the bipartite nuclear localization signal, * indicates identical residues, and + indicates conservative amino acid replacements. (B) Schematic representation of the C. elegans maT transposase protein and maT transposon DNA. Putative domains with similarity to structural and functional domains of mariner-Tc1 TEs are indicated. The shading and numbering of residues and nucleotides correspond to the protein and DNA sequences of maT used in this study. The 26-bp ITR is shown, flanked by TA dinucleotide insertion footprints.

 
An examination of the GenBank database revealed that a partial MdmaT element is also present in the intron of the period gene of M. domestica (Piccin et al. 2000). This MdmaT sequence lacks ITRs but has 72% and 71% identity to MdmaT1.A and MdmaT1.B, respectively. This sequence, designated MdmaT1.C, has a number of frameshift and stop mutations and would not be expected to encode an active transposase.

maT in C. elegans
Examination of the C. elegans genome sequencing project (www.sanger.ac.uk/Projects/) revealed a total of 16 maT sequences, apparently randomly distributed throughout the nematode's genome (table 1). Twelve of the 16 C. elegans maT sequences, all denoted CemaT1, are highly similar, with greater than 99% identity. Four of these 12 CemaT1 elements contain a small number (1–5) of coding, frameshift, and stop mutations and are likely nonfunctional (table 1). The remaining eight CemaT1 copies contain a single 1,008-nt ORF encoding a 336-amino-acid peptide. These apparently intact CemaT1 elements have perfect 26-nt palindromic ITRs and are flanked by TA dinucleotide insertion duplications. Taken together, the relatively small number of copies, the high ratio of putatively functional to nonfunctional sequences, and the limited sequence divergence suggest that CemaT1 invaded the genome of the C. elegans N2 strain relatively recently.
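The two hallmarks of an intact CemaT1 insertion described above, perfect inverted terminal repeats and TA target-site duplications, can be tested mechanically. The sketch below is hypothetical: it assumes the input is a locus laid out as TA + element + TA, uses the 26-nt ITR length from the text as a default, and runs on a toy sequence rather than on real CemaT1 DNA.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def check_element_ends(locus: str, itr_len: int = 26) -> dict[str, bool]:
    """Check TA flanks and perfect inverted terminal repeats.

    Assumes `locus` = TA + element + TA, with the ITRs occupying the
    element's first and last `itr_len` bases.
    """
    locus = locus.upper()
    element = locus[2:-2]
    left_itr, right_itr = element[:itr_len], element[-itr_len:]
    return {
        "ta_flanks": locus.startswith("TA") and locus.endswith("TA"),
        # Perfect inverted repeat: the right ITR is the reverse complement of the left.
        "perfect_itrs": len(element) >= 2 * itr_len
        and left_itr == reverse_complement(right_itr),
    }


if __name__ == "__main__":
    itr = "CAGTGTACGG"  # toy 10-nt ITR (hypothetical)
    element = itr + "AAAACCCC" + reverse_complement(itr)
    print(check_element_ends("TA" + element + "TA", itr_len=10))
```

A single mismatch in either terminal repeat sets `perfect_itrs` to False, which mirrors how defective copies differ from the eight intact ones.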


 
Table 1 The maT Elements In Caenorhabditis elegans. Three Families of maT Elements Were Identified and Their Genomic Locations Were Determined Using ACeDB (www.sanger.ac.uk). Putative Intact Open Reading Frames (ORFs) are Indicated (Y = yes), as are Putatively Nonfunctional ORFs

 
Three of the four remaining CemaT elements are highly similar to each other (88%–94% identity) and have been designated CemaT2. The CemaT2 elements are 34% identical and 54% similar to CemaT1. The remaining C. elegans maT element, designated CemaT3, is distinct from both other groups, with 55% and 53% sequence divergence from CemaT1 and CemaT2, respectively. All the CemaT2 and CemaT3 elements appear to be nonfunctional because there are several stop codons within their predicted ORFs.

A pairwise comparison of the proteins encoded by the consensus housefly MdmaT and the nematode CemaT1 (GenBank accession U41268, nt 7,264–8,271) shows 38% identity and 58% similarity (fig. 2A). The consensus MdmaT has 29% and 36% identity to CemaT2 and CemaT3, respectively. A multiple alignment of the CemaT1 and Bmmar1 encoded proteins confirms a shared C-terminal D,D,D catalytic triad characteristic of mariner transposase proteins (Doak et al. 1994). Similarly, the maT-encoded protein has N-terminal similarity to the DNA-binding and DNA recognition domains of Tc1 and Tc3 transposons (Colloms, van Luenen, and Plasterk 1994). A schematic of the sequence organization and putative protein characteristics of CemaT1 is shown in figure 2B.
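Identity and similarity figures like those quoted here come from counting columns of a pairwise alignment. The sketch below is a hypothetical simplification: it scores identity exactly but treats "similar" as membership in a small, illustrative set of residue groups, whereas GCG Gap and CLUSTAL W derive similarity from substitution matrices. Programs also differ in the denominator they use, so this will not reproduce published percentages exactly.

```python
# Hypothetical groups of "conservative" substitutions, for illustration only;
# GCG Gap and CLUSTAL W instead score similarity with substitution matrices.
SIMILAR_GROUPS = [set("ILMV"), set("FWY"), set("KRH"), set("DE"), set("NQ"), set("ST")]


def is_similar(x: str, y: str) -> bool:
    """True if residues are identical or fall in the same illustrative group."""
    return x == y or any(x in g and y in g for g in SIMILAR_GROUPS)


def identity_similarity(a: str, b: str, gap: str = "-") -> tuple[float, float]:
    """Percent identity and similarity over gap-free columns of a pairwise alignment."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    cols = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != gap and y != gap]
    if not cols:
        raise ValueError("no gap-free columns to compare")
    n = len(cols)
    identical = sum(x == y for x, y in cols)
    similar = sum(is_similar(x, y) for x, y in cols)
    return 100 * identical / n, 100 * similar / n


if __name__ == "__main__":
    print(identity_similarity("ILKD", "VLKE"))  # toy pair: 50% identical, 100% similar
```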

maT as a Novel Clade
A multiple alignment of maT, Tc1- and mariner-related proteins, together with a distantly related bacterial transposon, was used to construct phylogenetic trees (fig. 3A and B). These analyses confirmed that Bmmar1, MdmaT, and CemaT sequences cluster together as a single clade within the mariner-Tc1 superfamily. Several recently identified transposons of the mariner-Tc1 superfamily (Shao and Tu 2001) also fall within this clade: two from the nematode Caenorhabditis briggsae (C.briggsae.ITmD37D1 and C.briggsae.ITmD37D2) and one from the dipteran insect Sarcophaga peregrina (S.peregrina.ITmD37D1). Two somewhat related mosquito sequences (An. gambiae and Ae. atropalpus ITmD37E1; Shao and Tu 2001) show a statistically supported monophyletic relationship with maT elements (fig. 3). Intriguingly, the apparent close phylogenetic relationship of the mosquito DD37E elements to the DD37D maT elements conflicts with the nominal catalytic motif (D,D,D/E) classification system.



 
Fig. 3.—(A) Midpoint-rooted and (B) unrooted versions of a neighbor-joining tree showing phylogenetic relationships of putative proteins encoded by M. domestica maT, C. elegans maT, and B. mori Bmmar1 TEs with other members of the mariner-Tc1 superfamily. Ellipses separate the Tc1 (shaded) and mariner families from the maT elements. Distance (PAUP*) and parsimony (PHYLIP) bootstrap values >50% are shown above and below the nodes, respectively. Strong statistical support (*) delineates three distinct lineages in the phylogeny. Congruent tree topologies were confirmed using PAUP* and PHYLIP (parsimony and distance) treatments (data not shown). The alignment used to generate these trees is available from EMBL at ALIGN_000448.

 
Although many of the lineages of mariner-Tc1 TEs in the phylogenetic analysis are considerably diverged, all have sequence similarity (>36%) to a transposase domain that contains a D,D,E/D catalytic triad and slightly greater similarity (>40%) to paired DNA-binding proteins. Congruent bootstrap parsimony and distance trees (fig. 3A and B, respectively), supported by high confidence values, confirm that maT-type TEs lie between the mariner and Tc1 groups and form a novel group of TEs, distinct from either of the two major families. In addition, maT transposases share only 19% and 26%, respectively, of the 99 mariner and 86 Tc1 distinguishing characters, or synapomorphies, defined by Robertson and Asplund (1996). Within these synapomorphies are three conserved blocks of sequence associated with the catalytic triad that can distinguish mariner from Tc-like elements (Capy et al. 1996). For mariner-like elements, the consensus sequence around the first catalytic residue is TXDE, whereas the Tc-like consensus is (W/F)(S/T)DE. CemaT1 has an FTDE sequence, which conforms to the Tc consensus. At the second catalytic residue, the consensus is HDNA for mariner elements and QDND for Tc elements. CemaT1 has a QDGA motif, containing the glutamine of the Tc motif and the alanine of the mariner motif. Finally, at the third catalytic residue, mariner elements have an SPDLAP(S/T/I)DY consensus and Tc elements have an SPDLNPIE consensus. CemaT1 has an SPDLNPMDY motif, which combines the Tc asparagine at position 5, a methionine at position 7 not found in either family, and the mariner catalytic aspartic acid and adjacent tyrosine. Based on the consensus sequences around the catalytic residues, CemaT1 appears intermediate between the two families.
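The motif comparisons in this paragraph can be reproduced position by position. The sketch below is hypothetical: it parses consensus strings such as (W/F)(S/T)DE into per-position sets of allowed residues, treats X as a wildcard, and compares motifs ungapped and left-aligned (so the 9-residue CemaT1 site 3 motif is compared with only the 8 positions of the Tc consensus). Published alignments may place these motifs differently.

```python
def parse_consensus(pattern: str) -> list:
    """Turn a consensus like '(W/F)(S/T)DE' into per-position allowed-residue sets.

    'X' means any residue and is stored as None.
    """
    positions, i = [], 0
    while i < len(pattern):
        if pattern[i] == "(":
            j = pattern.index(")", i)
            positions.append(set(pattern[i + 1:j].split("/")))
            i = j + 1
        else:
            positions.append(None if pattern[i] == "X" else {pattern[i]})
            i += 1
    return positions


def compare_motif(motif: str, pattern: str) -> list[bool]:
    """Position-by-position agreement of a motif with a consensus (ungapped, left-aligned)."""
    return [allowed is None or aa in allowed
            for aa, allowed in zip(motif, parse_consensus(pattern))]


# Consensus motifs (mariner, Tc) and CemaT1 motifs as given in the text.
CONSENSUS = {
    "site 1": ("TXDE", "(W/F)(S/T)DE"),
    "site 2": ("HDNA", "QDND"),
    "site 3": ("SPDLAP(S/T/I)DY", "SPDLNPIE"),
}
CEMAT1 = {"site 1": "FTDE", "site 2": "QDGA", "site 3": "SPDLNPMDY"}

if __name__ == "__main__":
    for site, (mariner, tc) in CONSENSUS.items():
        m = compare_motif(CEMAT1[site], mariner)
        t = compare_motif(CEMAT1[site], tc)
        print(f"{site}: mariner {sum(m)}/{len(m)}, Tc {sum(t)}/{len(t)}")
```

Match counts alone can hide the pattern described in the text: at site 2, CemaT1 agrees with each consensus at two positions, but with the Tc glutamine in one case and the mariner alanine in the other, so the per-position lists are more informative than the totals.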
The overall length of the maT elements and their ITR lengths are more similar to those of mariner elements, although the maT ITR sequences show greater similarity to a Tc1 consensus ITR sequence (10 of 12 consensus bp) than to a mariner ITR sequence (5 of 12 consensus bp). Lampe et al. (2000) recently named several C. elegans mariner TEs with high identity to Bmmar1, but these are not yet documented in GenBank. Examination of these C. elegans sequences (kindly provided by H. Robertson, University of Illinois) confirmed that some of the so-called C. elegans mariner elements are the same as the maT elements described here. Given the number of characteristics shared with both mariner and Tc1 elements, we contend that the TEs forming this clade are better named maT to reflect their intermediate position in the mariner-Tc1 superfamily.

A multiple alignment of experimentally confirmed or putatively active members of the mariner-Tc1 superfamily was used to construct maximum parsimony trees for the DNA-binding and catalytic domains of the transposase protein (fig. 4A and B, respectively). The trees for the two domains had similar topologies to each other and to the more extensive phylogenetic analysis based on the complete transposase sequence (fig. 3). Although maT transposases have a D,D,D catalytic triad, which has often been used as a defining character for mariner elements, phylogenetic analysis of the catalytic domain still predicts a closer relationship between Tc and maT elements than between mariner and maT TEs. It seems unlikely that maT elements are hybrid TEs resulting from recombination between a Tc and a mariner element, because both of maT's transposase domains appear more Tc-like than mariner-like. Interelement recombination or gene conversion is thought to have given rise to various hybrid elements, including LINE retrotransposons in humans (Saxton and Martin 1998), Ty retrotransposons in yeast (Jordan and McDonald 1998), and numerous bacterial insertion sequence (IS) transposons (Mahillon and Chandler 1998). In all these cases, however, phylogenetic analyses suggested that the putative hybrid elements had acquired their two functional domains from different elements. In contrast, there is no evidence that maT elements acquired one of their two transposase domains from a mariner element and the other from a Tc element.



 
Fig. 4.—Phylogenetic relationships of active (putatively or experimentally determined) members of the mariner-Tc1 superfamily. Parsimony trees of (A) the DNA-binding domain and (B) the catalytic domain were constructed using PHYLIP and rooted with the IS630 sequence. Confidence values are shown (calculated from 1,000 replications). Alignments used to generate the trees are available from EMBL (ALIGN_000233 and ALIGN_000210).

 
Interestingly, in the parsimony analysis, the majority of informative character states (PAUP* character states; Swofford 2000) are associated with the C-terminal transposase domain of the aligned TEs. Additionally, database searches and phylogenetic analyses using the separated ~130-amino-acid N-terminal domain show that the maT transposase is more similar to paired proteins than to the N-terminal domains of other mariner-Tc1 transposases (51% versus 40%–45%, respectively). These similarities strongly suggest that the transposases of this superfamily and the DNA-binding domains of paired proteins share a common ancestral origin.
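Whether a column counts as informative for parsimony follows a standard rule: at least two character states must each occur in at least two sequences. The sketch below is a hypothetical helper that applies this rule, treating gaps as missing data (an assumption; programs can be configured differently). Counting the returned indices on either side of a domain boundary would show which domain contributes more informative sites.

```python
from collections import Counter


def is_parsimony_informative(column: str, gap: str = "-") -> bool:
    """True if at least two character states each occur in at least two sequences.

    Gaps are treated as missing data and ignored.
    """
    counts = Counter(c for c in column if c != gap)
    return sum(1 for n in counts.values() if n >= 2) >= 2


def informative_sites(alignment: list[str]) -> list[int]:
    """0-based indices of parsimony-informative columns in an aligned set of sequences."""
    if len({len(s) for s in alignment}) != 1:
        raise ValueError("all sequences must have the same aligned length")
    return [i for i, col in enumerate(zip(*alignment))
            if is_parsimony_informative("".join(col))]


if __name__ == "__main__":
    toy = ["AAC", "AGC", "GGC", "GAT"]  # hypothetical 4-taxon alignment
    print(informative_sites(toy))       # columns 0 and 1 are informative
```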

Molecular Modeling of maT: Structure-Function Analyses
The 130-amino-acid N-terminal domain of the C. elegans CemaT1 transposase has 23% identity and 42% similarity to the N-terminal DNA-binding domain of the C. elegans Tc3 transposase (Tc3A; Collins, Forbes, and Anderson 1989) and 24% identity and 52% similarity to the DNA-binding paired domain of the Pax-paired (prd) family of transcription factors of Drosophila (Bopp et al. 1986) and mammals (Franz et al. 1994). The corresponding N-terminal domain of the mariner element Mos1 has lower identity (20%) to CemaT1 than either Tc3A or prd. Secondary structure predictions for the CemaT1 transposase N-terminal domain (fig. 5) identified six putative α-helices (residues 9–14, 18–24, 32–44, 62–68, 80–88, and 109–124) in an HTH arrangement typical of eukaryotic paired and homeobox proteins, as well as prokaryotic λ repressors and Hin recombinase proteins (Wintjens and Rooman 1996).
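The helix boundaries listed above can be turned into a per-residue track, similar in spirit to the gray shading in an alignment figure. The sketch below is illustrative only: the track format ('H' for helix, '-' elsewhere) and the helper name are hypothetical and do not reproduce PHDsec output; the residue ranges and the 130-residue domain length come from the text.

```python
# Helix boundaries (1-based, inclusive) as listed in the text for the
# CemaT1 N-terminal domain.
HELICES = [(9, 14), (18, 24), (32, 44), (62, 68), (80, 88), (109, 124)]


def helix_track(length: int, helices) -> str:
    """Per-residue track: 'H' inside a predicted helix, '-' elsewhere."""
    track = ["-"] * length
    for start, end in helices:
        for pos in range(start, end + 1):
            track[pos - 1] = "H"
    return "".join(track)


if __name__ == "__main__":
    track = helix_track(130, HELICES)
    for i in range(0, len(track), 60):   # print in 60-residue rows
        print(f"{i + 1:>4} {track[i:i + 60]}")
    print("helical residues:", track.count("H"))
```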



Fig. 5.—(A) Secondary structure schematic of the N-terminal domain of C. elegans CemaT1. The maT protein is predicted to have seven structurally conserved α-helices, six of which (α1–6) are typical of the HTH motifs of prd and Tc3A-N that recognize and bind to specific DNA target sites. (B) Amino acid sequence alignment of C. elegans CemaT1 and Tc3, D. melanogaster prd, and D. mauritiana Mos1. The predicted α-helices are shaded in gray. Empirically determined DNA contacts in prd and Tc3A-N are indicated above their corresponding peptide sequences: p indicates contacts with the sugar-phosphate backbone, M indicates major groove base contacts, and m indicates minor groove contacts with the DNA double helix. A PSORT II analysis indicates that a bipartite nuclear localization signal is associated with helix 6 (underlined). PHDsec predicts an additional intervening helix (black box) that is not present in the prd or Tc3A-N structures. A homeodomain motif between helices 3 and 4 is also indicated (italics).

 
X-ray crystallography studies (van Pouderoyen et al. 1997) and secondary structure analyses (fig. 5) indicate that the DNA-binding domains of both prd and Tc3A have six α-helices, arranged in a bipartite manner with three helices in each subdomain. In prd, helices 2 and 3 form an HTH motif that makes direct DNA contacts (Xu et al. 1995). In Tc3A, helices 1–3 form a paired-like motif, with helices 2 and 3 involved in specific DNA binding to a portion of the ITRs (van Pouderoyen et al. 1997). Helices 4–6 of Tc3A form a more homeodomain-like motif and are considered part of a DNA recognition subdomain that recognizes ITR sequences adjacent to those bound by the paired-like HTH motif (van Pouderoyen et al. 1997). In prd, helices 4–6 could not be shown to bind DNA (Xu et al. 1995). However, helices 4–6 of other members of the paired domain family (Pax proteins) showed sequence-specific recognition, like that of the second subdomain of Tc3A (Epstein et al. 1994; Xu et al. 1995).

The alignment of the DNA-binding domains of CemaT1, Tc3, Mos1, and prd shows that the first three helices of the CemaT1 transposase closely match, in position and size, the equivalent helices of the DNA-binding subdomains of Tc3A-N and prd (fig. 5). In contrast, the mariner element Mos1 differs from the others: it shares only two of the three helices in this subdomain, and these two helices are separated by a longer intervening loop than that observed or predicted for the other sequences examined. X-ray crystallography data for the N-terminal domains of Tc3A and prd indicate that the two proteins share highly similar topographies, with a difference of only 2.3 Å between their DNA-binding domains and 0.66 Å between the connecting loops of the Tc3A-N and prd structures (van Pouderoyen et al. 1997). Threading optimization of the CemaT1 transposase N-terminus against prd and Tc3 (using TITO; Labesse and Mornon 1998) predicted a hypothetical CemaT1 structure that closely matched the three-dimensional structures of both prd and Tc3A-N, falling within the 2.3 Å helix and 0.66 Å loop tolerances suggested by the crystal structure comparison of Tc3A-N and prd (results not shown).
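Assuming the quoted 2.3 Å and 0.66 Å differences are root-mean-square deviations (RMSDs) over matched atoms (the text does not state the metric), a tolerance check of a threaded model could be sketched as follows. The coordinates are assumed to be already superposed (for example by a least-squares fit, not shown), and `rmsd` and `within_tolerances` are hypothetical helpers, not part of TITO.

```python
import math

Coord = tuple[float, float, float]


def rmsd(a: list[Coord], b: list[Coord]) -> float:
    """RMSD between two equal-length, already superposed coordinate lists."""
    if len(a) != len(b) or not a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(a, b)
    )
    return math.sqrt(total / len(a))


def within_tolerances(helix_rmsd: float, loop_rmsd: float,
                      helix_tol: float = 2.3, loop_tol: float = 0.66) -> bool:
    """True if both deviations fall within the Tc3A-N/prd tolerances (assumed RMSDs)."""
    return helix_rmsd <= helix_tol and loop_rmsd <= loop_tol


# Toy C-alpha traces (hypothetical), offset by 1 Å along z.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
template = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 1.0)]
print(rmsd(model, template))  # 1.0
```

Helix and loop deviations would be computed separately over the residues assigned to each region, then passed to `within_tolerances` together.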

Atomic structure comparisons show that Tc3A shares 11 of the 16 sugar-phosphate DNA contacts of prd, and four of these are identical residues (residues 15, 46, 51, and 70 of prd). The relative positions of the HTH motifs involved in base-specific major and minor groove (DNA-helix) docking interactions are also conserved between these two proteins (fig. 5). CemaT1 shares seven identical DNA contact residues with prd and four with Tc3A. In contrast, Mos1 shares only one identical contact residue with either prd or Tc3A. The relative lack of conservation of nucleotide-specific docking residues makes it difficult to predict any maT-specific target sequences. However, as for Tc3A and Tc1A (Colloms, van Luenen, and Plasterk 1994; van Pouderoyen et al. 1997), these target nucleotides are expected to lie within the 26-bp ITR, and they would likely differ between Tc3 and CemaT1 because their ITR sequences differ by 54%.
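Counting shared contact residues requires mapping each reference residue number (such as residues 15, 46, 51, and 70 of prd) to its alignment column and then comparing the residues in that column. A minimal sketch, with hypothetical function names and toy aligned rows rather than the real prd, Tc3A, or CemaT1 alignment:

```python
def residue_to_column(aligned: str, residue_number: int, gap: str = "-") -> int:
    """Map a 1-based residue number in the ungapped sequence to its
    0-based column in the gapped alignment row."""
    count = 0
    for column, ch in enumerate(aligned):
        if ch != gap:
            count += 1
            if count == residue_number:
                return column
    raise IndexError(f"residue {residue_number} is beyond the sequence end")


def shared_identical_contacts(query_aln: str, ref_aln: str,
                              ref_contacts: list[int], gap: str = "-") -> list[int]:
    """Reference contact residues whose aligned query residue is identical."""
    shared = []
    for residue in ref_contacts:
        column = residue_to_column(ref_aln, residue, gap)
        # ref_aln[column] is never a gap, so equality also excludes gaps.
        if query_aln[column] == ref_aln[column]:
            shared.append(residue)
    return shared


# Toy aligned rows (hypothetical); contacts are reference residue numbers.
ref_aln = "RK-GQRSTLA"
query_aln = "RQAGQKSALA"
print(shared_identical_contacts(query_aln, ref_aln, [1, 2, 4, 5, 7]))  # [1, 4]
```

Working in reference residue numbers rather than alignment columns keeps the contact list stable when the alignment is edited.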

PHDsec predictions also suggest that the CemaT1 transposase has the equivalents of prd and Tc3A helices 4, 5, and 6, along with an additional helix predicted at residues 93–100, between helices 5 and 6. A GRPR-like sequence is conserved in mariner-Tc1 transposases between helices 3 and 4, and CemaT1 similarly has a GRPP sequence. This motif is characteristic of homeodomain proteins (Gehring et al. 1994) and mediates the DNA interactions of these and related proteins. Overall, the primary sequence and secondary structure profile of the CemaT1 N-terminus suggest that it is more closely related to prd than to either Tc3A-N or Mos1. However, clear differences exist among the DNA-binding domains of prd, Tc3A-N, Mos1, and the CemaT1 transposase. The few residues preceding the first helix of Tc3A adopt a conformation different from that of the longer N-terminus of the paired domain, which forms a small β-sheet structure (Xu et al. 1995; van Pouderoyen et al. 1997). In this respect, the CemaT1 transposase appears more similar to Tc3A and Mos1, and it may share their ability to effect the conformational changes in DNA needed for transposon-transposase complementarity.
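A motif such as GRPR or GRPP can be located with a regular expression and related to the predicted helices. The sketch below uses the CemaT1 helix boundaries given in this section (residues 9–14, 18–24, 32–44, 62–68, 80–88, and 109–124) and assumes that a "GRPR-like" motif means GRP followed by R or P; that pattern, the function name, and the test sequence are illustrative assumptions.

```python
import re

# CemaT1 helix boundaries (1-based, inclusive) predicted in the text; the
# extra helix at residues 93-100 is omitted so the numbering follows prd/Tc3A.
HELICES = [(9, 14), (18, 24), (32, 44), (62, 68), (80, 88), (109, 124)]

# Assumption: "GRPR-like" is taken to mean GRP followed by R or P.
GRPR_LIKE = re.compile(r"GRP[RP]")


def find_linker_motifs(seq: str, helices=HELICES):
    """Return (start, motif, (left_helix, right_helix)) for each hit, where
    start is 1-based and the helices are the nearest ones lying entirely
    before and after the hit (None if there is no such helix)."""
    hits = []
    for m in GRPR_LIKE.finditer(seq):
        start, end = m.start() + 1, m.end()  # 1-based, inclusive
        left = right = None
        for number, (h_start, h_end) in enumerate(helices, start=1):
            if h_end < start:
                left = number  # helices are sorted, so the last one wins
            elif h_start > end and right is None:
                right = number
        hits.append((start, m.group(), (left, right)))
    return hits


# Hypothetical 124-residue test sequence with GRPP at residues 50-53.
toy = "A" * 49 + "GRPP" + "A" * 71
print(find_linker_motifs(toy))  # [(50, 'GRPP', (3, 4))]
```

A hit reported as lying between helices 3 and 4 corresponds to the position the text gives for the conserved GRPR-like linker.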

The mariner-Tc1 TE proteins contain a bipartite-type NLS comprising two basic amino acids, followed by a 10-amino-acid spacer and a cluster of three basic residues (Ivics et al. 1996). PSORT II analysis shows that the last helix of the CemaT1 and Mos1 DNA recognition domains overlaps the bipartite NLS (fig. 5). Although the last helix of CemaT1 aligns reasonably well with that of Tc3, the last helix of Mos1 does not align with the last helix of any of the other sequences examined, owing to additional intervening sequence between its last two helices.
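The bipartite NLS definition given above (two basic residues, a 10-residue spacer, then three basic residues) translates directly into a pattern scan. This is a simplified sketch, not PSORT II, which applies its own rules; treating only K and R as basic and requiring exactly three consecutive basic residues are assumptions, and the example sequence is made up.

```python
import re

# Simplified from the description above: two basic residues, a 10-residue
# spacer, then three basic residues. Assumption: only K and R count as basic.
# The lookahead lets overlapping candidates be reported.
BIPARTITE_NLS = re.compile(r"(?=([KR]{2}[A-Z]{10}[KR]{3}))")


def find_bipartite_nls(seq: str) -> list[tuple[int, str]]:
    """Return (1-based start, segment) for every candidate NLS."""
    return [(m.start() + 1, m.group(1)) for m in BIPARTITE_NLS.finditer(seq.upper())]


print(find_bipartite_nls("MAKRSTQPLANDEGKRKLA"))  # [(3, 'KRSTQPLANDEGKRK')]
```

Reported start positions could then be compared with the predicted helix boundaries to check for the kind of helix/NLS overlap described for CemaT1 and Mos1.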


    Conclusions
 
Bioinformatic analyses, including database searches, multiple alignments, phylogenetic tree reconstructions, and molecular modeling, all support the hypothesis that the maT and mariner-Tc1 transposases are closely related chimeric proteins comprising DNA-binding and catalytic transposase domains. The maT family represents a phylogenetic node intermediate between the mariner and Tc1 subfamilies, clearly distinguishing a third major lineage of TEs within the mariner-Tc1 superfamily. The discovery of homologous maT sequences in organisms as diverse as the housefly M. domestica, the nematode C. elegans, and the silk moth B. mori suggests that members of this new clade may be widespread.


    Acknowledgements
 
We would like to thank René Feyereisen, Ian Boussy, and David Rowell for their helpful discussions and comments. C.C. was supported by a grant from the National Health and Medical Research Council of Australia (997038).


    Footnotes
 
Ross Crozier, Reviewing Editor

Keywords: transposon, mariner, Tc1, Caenorhabditis elegans, Musca domestica, MdαE7

Address for correspondence and reprints: Charles Claudianos, Research School of Biological Sciences, The Australian National University, G.P.O. Box 475, Canberra, ACT 2601. E-mail: claudianos@rsbs.anu.edu.au


    References
 

    Bender W., P. Spierer, D. S. Hogness, 1983 Chromosomal walking and jumping to isolate DNA from the Ace and rosy loci and the bithorax complex in Drosophila melanogaster. J. Mol. Biol. 168:17–33.

    Bopp D., M. Burri, S. Baumgartner, G. Frigerio, M. Noll, 1986 Conservation of a large protein domain in the segmentation gene paired and in functionally related genes of Drosophila. Cell 47:1033–1040.

    Butler A., A. Wei, L. Salkoff, 1990 Shal, Shab, and Shaw: three genes encoding potassium channels in Drosophila. Nucleic Acids Res. 18:2173–2174.

    Capy P., R. Vitalis, T. Langin, D. Higuet, C. Bazin, 1996 Relationships between transposable elements based upon the integrase-transposase domains: is there a common ancestor? J. Mol. Evol. 42:359–368.

    Claudianos C., 1999 The evolution of α-esterase mediated organophosphate resistance in Musca domestica. Doctoral dissertation, The Australian National University, Canberra, Australia.

    Claudianos C., R. J. Russell, J. G. Oakeshott, 1999 The same amino acid substitution in orthologous esterases confers organophosphate resistance on the house fly and a blowfly. Insect Biochem. Mol. Biol. 29:675–686.

    Collins J., E. Forbes, P. Anderson, 1989 The Tc3 family of transposable genetic elements in Caenorhabditis elegans. Genetics 121:47–55.

    Colloms S., H. G. A. M. van Luenen, R. H. A. Plasterk, 1994 DNA binding activities of the Caenorhabditis elegans Tc3 transposase. Nucleic Acids Res. 22:5548–5554.

    Devereux J., P. Haeberli, O. Smithies, 1984 A comprehensive set of sequence analysis programs for the VAX. Nucleic Acids Res. 12:387–395.

    Doak T., F. Doerder, C. Jahn, G. Herrick, 1994 A proposed superfamily of transposase genes: transposon-like elements in ciliated protozoa and a common "D35E" motif. Proc. Natl. Acad. Sci. USA 91:942–946.

    Epstein J., J. Cai, T. Glaser, L. Jepeal, R. Maas, 1994 Identification of a Pax paired domain recognition sequence and evidence for DNA-dependent conformational changes. J. Biol. Chem. 269:8355–8361.

    Fayet O., P. Ramond, P. Polard, M. Prere, M. Chandler, 1990 Functional similarities between retroviruses and the IS3 family of bacterial insertion sequences. Mol. Microbiol. 4:1771–1777.

    Felsenstein J., 1989 PHYLIP: phylogeny inference package (version 3.2). Cladistics 5:164–166.

    Franz G., T. Loukeris, G. Dialektaki, C. Thompson, C. Savakis, 1994 Mobile Minos elements from Drosophila hydei encode a two-exon transposase with similarity to the paired DNA-binding domain. Proc. Natl. Acad. Sci. USA 91:4746–4750.

    Gehring W. J., Y. Q. Qian, M. Billeter, K. Furukubo-Tokunaga, A. F. Schier, D. Resendez-Perez, M. Affolter, G. Otting, K. Wüthrich, 1994 Homeodomain-DNA recognition. Cell 78:211–223.

    Gomulski L. M., C. Torti, M. Bonizzoni, D. Moralli, E. Raimondi, P. Capy, G. Gasperi, A. R. Malacrida, 2001 A new basal subfamily of mariner elements in Ceratitis rosa and other Tephritid flies. J. Mol. Evol. 53:597–606.

    Horton P., K. Nakai, 1997 Better prediction of protein cellular localization sites with the k nearest neighbor classifier. Intellig. Syst. Mol. Biol. 5:147–152.

    Ivics Z., Z. Izsvak, A. Minter, P. Hackett, 1996 Identification of functional domains and evolution of Tc1-like transposable elements. Proc. Natl. Acad. Sci. USA 93:5008–5013.

    Jordan I. K., J. F. McDonald, 1998 Evidence for the role of recombination in the regulatory evolution of Saccharomyces cerevisiae Ty elements. J. Mol. Evol. 47:14–20.

    Khan E., J. Mack, R. Katz, J. Kulkosky, A. Skalka, 1991 Retroviral integrase domains: DNA binding and the recognition of LTR sequences. Nucleic Acids Res. 19:851–860. Published erratum appears in Nucleic Acids Res. 1991 Mar 25; 19(6):1358.

    Labesse G., J. Mornon, 1998 Incremental threading optimization (TITO) to help alignment and modeling of remote homologues. Bioinformatics 14:206–211.

    Lampe D. J., M. E. A. Churchill, H. M. Robertson, 1996 A purified mariner transposase is sufficient to mediate transposition in vitro. EMBO J. 15:5470–5479.

    Lampe D. J., K. K. O. Walden, J. M. Sherwood, H. M. Robertson, 2000 Genetic engineering of insects with mariner transposons. Pp. 237–248 in A. M. Handler and A. A. James, eds. Insect transgenesis: methods and applications. CRC Press, Boca Raton, Fla.

    Lohe A. R., D. De Aguiar, D. L. Hartl, 1997 Mutations in the mariner transposase: the D,D(35)E consensus sequence is nonfunctional. Proc. Natl. Acad. Sci. USA 94:1293–1297.

    Mahillon J., M. Chandler, 1998 Insertion sequences. Microbiol. Mol. Biol. Rev. 62:725–774.

    Piccin A., M. Couchman, J. D. Clayton, D. Chalmers, R. Costa, C. P. Kyriacou, 2000 The clock gene period of the housefly, Musca domestica, rescues behavioral rhythmicity in Drosophila melanogaster. Evidence for intermolecular coevolution? Genetics 154:747–758.

    Pietrokovski S., S. Henikoff, 1997 A helix-turn-helix DNA-binding motif predicted for transposases of DNA transposons. Mol. Gen. Genet. 254:689–695.

    Plasterk R. H. A., 1996 The Tc1/mariner transposon family. Curr. Top. Microbiol. Immunol. 204:125–143.

    Plasterk R. H. A., Z. Izsvak, Z. Ivics, 1999 Resident aliens: the Tc1/mariner superfamily of transposable elements. Trends Genet. 15:326–332.

    Robertson H. M., 1995 The Tc1-mariner superfamily of transposons in animals. J. Insect Physiol. 41:99–105.

    Robertson H. M., M. L. Asplund, 1996 Bmmar1: a basal lineage of the mariner family of transposable elements in the silk moth, Bombyx mori. Insect Biochem. Mol. Biol. 26:945–954.

    Rost B., 1996 PHD: predicting one-dimensional protein structure by profile-based neural networks. Methods Enzymol. 266:525–539.

    Saxton J. A., S. L. Martin, 1998 Recombination between subtypes creates a mosaic lineage of LINE-1 that is expressed and actively retrotransposing in the mouse genome. J. Mol. Biol. 280:611–622.

    Shao H., Z. Tu, 2001 Expanding the diversity of the IS630-Tc1-mariner superfamily: discovery of a unique DD37E transposon and reclassification of the DD37D and DD39D transposons. Genetics 159:1103–1115.

    Swofford D. L., 2000 PAUP*: phylogenetic analysis using parsimony (*and other methods). Version 4. Sinauer Associates, Sunderland, Mass.

    Thompson J. D., D. G. Higgins, T. J. Gibson, 1994 CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673–4680.

    Tosi L. R., S. M. Beverley, 2000 cis and trans factors affecting Mos1 mariner evolution and transposition in vitro, and its potential for functional genomics. Nucleic Acids Res. 28:784–790.

    van Luenen H. G. A. M., S. D. Colloms, R. H. A. Plasterk, 1994 The mechanism of transposition of Tc3 in C. elegans. Cell 79:293–301.

    van Pouderoyen G., R. Ketting, A. Perrakis, R. Plasterk, T. Sixma, 1997 Crystal structure of the specific DNA-binding domain of Tc3 transposase of C. elegans in complex with transposon DNA. EMBO J. 16:6044–6054.

    Vos J. C., I. De Baere, R. H. A. Plasterk, 1996 Transposase is the only nematode protein required for in vivo transposition of Tc1. Genes Dev. 10:755–761.

    Vos J., R. Plasterk, 1994 Tc1 transposase of Caenorhabditis elegans is an endonuclease with a bipartite DNA binding domain. EMBO J. 13:6125–6132.

    Vos J. C., H. G. A. M. van Luenen, R. H. A. Plasterk, 1993 Characterization of the Caenorhabditis elegans Tc1 transposase in vivo and in vitro. Genes Dev. 7:1244–1253.

    Wintjens R., M. Rooman, 1996 Structural classification of HTH DNA-binding domains and protein-DNA interaction modes. J. Mol. Biol. 262:294–313.

    Xu W., M. Rould, S. Jun, C. Desplan, C. Pabo, 1995 Crystal structure of a paired domain-DNA complex at 2.5 Å resolution reveals structural basis for Pax developmental mutations. Cell 80:639–650.

    Zhang L., A. Dawson, D. J. Finnegan, 2001 DNA-binding activity and subunit interaction of the mariner transposase. Nucleic Acids Res. 29:3566–3575.

Accepted for publication July 17, 2002.




