Drosophila P Transposons in the Human Genome?

Sylvia Hagemann and Wilhelm Pinsker

Institute of Medical Biology, University of Vienna, Vienna, Austria

P elements are DNA transposons that were first discovered as the causative agent of hybrid dysgenesis in Drosophila melanogaster (Kidwell, Kidwell, and Sved 1977) but were later found to occur in many drosophilid species. The interspecific distribution of P-element sequences, as well as their sequence relationships, is not in accordance with the phylogeny of their host species. Therefore, P elements are not merely inherited vertically but can also be transmitted horizontally between sexually isolated taxa. The most striking example is the rather recent transfer from Drosophila willistoni to D. melanogaster (Daniels et al. 1990), which must have occurred in the last century and was followed by a rapid spread of P elements through the natural populations of this new host (Anxolabéhère, Kidwell, and Periquet 1988). Additional cases of horizontal transmission show that P elements have repeatedly crossed species barriers and have even invaded species of the related genera Scaptomyza and Lordiphosa (Hagemann, Haring, and Pinsker 1996a; Clark and Kidwell 1997; Haring, Hagemann, and Pinsker 2000; Silva and Kidwell 2000). However, because their mode of transposition requires host-encoded proteins (Rio and Rubin 1988), the distribution of active P elements seemed to be restricted to fruit flies of the drosophilid family. Thanks to the data provided by the human genome project, we are able to show that a homolog of the P-element coding sequence exists as a stationary single-copy sequence in the human genome.

Actively transposing Drosophila P elements consist of four exons (designated 0–3) flanked by terminal inverted repeats (O'Hare and Rubin 1983). The exons code for two different proteins produced by differential splicing of the primary transcript (Misra and Rio 1990). The transposase contains the information of all four exons and mediates transposition in germ line cells. A second protein, which is translated from an mRNA that retains the third intron, acts as a repressor of P-element transposition in somatic cells. Besides these full-sized P elements, terminally truncated P-element homologs have been detected in two different Drosophila lineages (Miller et al. 1992; Nouaud and Anxolabéhère 1997). In both cases, the terminal inverted repeats are missing and the coding region lacks the transposase-specific exon 3. It is assumed that these truncated P homologs are derivatives of previously active transposons that were initially maintained in their host genomes as suppressors of transposition but later acquired new functions (Miller et al. 1999). P-related sequences without terminal inverted repeats were also found outside the drosophilid family in the Australian sheep blowfly Lucilia cuprina (Perkins and Howells 1992) and in the house fly Musca domestica (Lee, Clark, and Kidwell 1999). These immobile P homologs may represent either truncated derivatives of once transpositionally active P elements or, alternatively, descendants of an ancestral genomic progenitor sequence that later evolved into mobile P elements by acquisition of the terminal structures essential for transposition.

The P homolog in the human genome (Phsa) was discovered by a BLAST search of the GenBank database carried out with a 1,690-bp cDNA sequence derived from a recently isolated 3'-truncated P element of Drosophila subsilvestris (unpublished data), which belongs to the T-type subfamily of P transposons (Hagemann, Haring, and Pinsker 1996b). The search revealed significant amino acid similarity of the deduced Drosophila protein sequence to a human protein of unknown function (accession number BAB15609), with 23% identities and 40% similarities over 407 amino acids (BLAST E value: 9 × 10^-7). The corresponding cDNA sequence of this human protein was originally described in the course of the NEDO human cDNA sequencing project (accession number AK026973). Partial sequences are registered in the human EST database (AA443424, AI219532, AA659374, AA194210, AA194021). Through the cDNA sequence, we were able to identify the entire gene in the recently released complete human genome sequence (position NT_006413.2/Hs 4_6570 on the long arm of chromosome 4). The exon/intron limits were subsequently deduced from the alignment of the genomic sequence with the cDNA. Alignments with several insect P-element sequences suggest that the coding region of Phsa extends farther upstream of the previously presumed start codon.
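As an illustration of how the identity and similarity percentages of such a pairwise alignment are counted, the following minimal Python sketch tallies both over all alignment columns, gaps included. The two aligned strings and the residue groups are hypothetical. BLAST itself scores similarity with a substitution matrix rather than with fixed residue groups, so this is a simplified stand-in and not the procedure used in the search described above.

```python
# Hypothetical similarity groups (a simplification; BLAST uses a substitution matrix).
SIMILAR_GROUPS = ["ILMV", "KRH", "DE", "NQ", "ST", "FWY"]


def identity_and_similarity(aln_a, aln_b, groups=SIMILAR_GROUPS):
    """Return (percent identity, percent similarity) over all alignment columns."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    if not aln_a:
        raise ValueError("alignment is empty")
    # Map each residue to the index of its similarity group.
    group_of = {aa: i for i, group in enumerate(groups) for aa in group}
    identical = similar = 0
    for x, y in zip(aln_a, aln_b):
        if x == "-" or y == "-":
            continue  # gap columns count toward the length but never match
        if x == y:
            identical += 1
            similar += 1  # identical residues also count as similar
        elif x in group_of and group_of.get(y) == group_of[x]:
            similar += 1
    n = len(aln_a)
    return 100.0 * identical / n, 100.0 * similar / n


# Toy alignment (hypothetical sequences, not taken from Phsa or any P element).
print(identity_and_similarity("ATQLFS-K", "ATQIFSGR"))
```

In the toy alignment, five of the eight columns are identical (62.5%) and seven are identical or similar (87.5%).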

In figure 1a, the molecular structure of Phsa is compared with that of the full-sized canonical P element (pπ25.1) of D. melanogaster. The coding region of Phsa includes equivalents of exons 1–3 but lacks nearly the whole of exon 0 and the terminal structures. No inverted repeats are found in the flanking regions over a distance of 1 kb on both sides. The presumed start codon is located close to the 3' end of exon 0 (ed0) of the Drosophila sequence. Phsa contains three exons (eh1–eh3) and two introns (ih1 and ih2). The introns of the Drosophila P element (id1–id3) are missing in Phsa. The large exon eh3 is homologous to the 3' section of ed1 and the complete sequences of ed2 and ed3. The two introns (ih1 and ih2), which are not found in the Drosophila sequence, are located within the section corresponding to ed1. The larger intron ih2 has a length of 9,008 bp and contains insertions of six mobile sequences (five Alu elements and one LINE-1 element). The reading frames are intact and encode a protein of 759 amino acids. The occurrence as a single-copy sequence (at least in the euchromatic part of the human genome), the absence of the characteristic inverted repeat termini, and the length of the sequence (12.4 kb) suggest that Phsa is not transpositionally active. In figure 1b, the deduced protein sequences of Phsa and the D. melanogaster P element (pπ25.1) are compared. Sequence similarity is highest in the central section of ed2. One of the conserved motifs (ATQLFS) is found not only in Phsa and pπ25.1, but also in all Drosophila P-related sequences and, with one replacement, in the P homologs of L. cuprina and M. domestica. The section corresponding to ed3 shows the strongest divergence (amino acid similarity: 22.1% in ed1, 27.7% in ed2, and 13.6% in ed3) and differs in length by 56 amino acids. This is not surprising, as exon 3, which is required for transposase function only, is the most variable section among the Drosophila P elements (Witherspoon 1999). Nevertheless, the presence of two conserved motifs, AGYV++KL and GL++PSE, suggests that this section too is homologous to P elements.
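A conserved motif that tolerates a single replacement, such as ATQLFS, can be located with a simple sliding-window comparison. The sketch below is only an illustration: the function names and the toy sequences are hypothetical and are not taken from any of the P homologs discussed here.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def find_motif(seq, motif, max_mismatches=1):
    """Return (position, window, mismatches) for every window of seq that
    matches motif with at most max_mismatches replacements (no gaps)."""
    k = len(motif)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        d = hamming(window, motif)
        if d <= max_mismatches:
            hits.append((i, window, d))
    return hits


# Toy protein fragments (hypothetical, for illustration only).
exact = "MKATQLFSGG"    # carries the motif unchanged
variant = "MKATQIFSGG"  # carries the motif with one replacement (L to I)

print(find_motif(exact, "ATQLFS"))
print(find_motif(variant, "ATQLFS"))
```

Positions are 0-based. Setting max_mismatches to 0 restricts the search to exact occurrences of the motif.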



Fig. 1. Comparison of the human P homologous gene (Phsa) with the canonical Drosophila melanogaster P element (pπ25.1). a, Molecular structure of the genes. ATG is the start codon; exons are represented by eh (Phsa) and ed (pπ25.1); introns are represented by ih (Phsa) and id (pπ25.1). Positions of introns in the corresponding sequences are indicated by vertical arrows. Intron ih2 contains five Alu insertions (full circles) and one LINE-1 element (open circle). b, Alignment of deduced protein sequences. Identical amino acids are indicated by asterisks, and conservative replacements are indicated by colons (250 PAMs >0) or dots (250 PAMs =0). The translation start postulated on the basis of the cDNA sequence is marked by an open square; the black square shows the presumed start deduced from the sequence alignment. Exon limits (eh, ed) are indicated by vertical arrows. The dendrogram in figure 2 is based on the conserved section (BLAST E value: 2.29 × 10^-18) framed by horizontal arrows.

 
In a BLAST search of the protein database with the deduced Phsa protein sequence, significant score values were obtained for 12 different P elements and P-related sequences (score values ranged from 6 × 10^-12 for Lucilia cuprina to 2 × 10^-4 for Drosophila davidii). The sequence relationships among eight P-encoded proteins are depicted in figure 2. A multiple alignment was obtained by the program CLUSTAL W (Thompson et al. 1997) and adjusted by hand (not shown; available as supplementary material). The maximum-parsimony dendrogram (PAUP, version 4.0b6) is based on the internal section indicated in figure 1b. This section, with a length of 472 bp in the alignment, appeared to be the best-conserved region (average amino acid identity = 25.7%), with unambiguous homology among the eight sequences. It covers most of exons ed1 and ed2, which are also the sections best conserved among Drosophila P elements. Because there is no sequence that can be used as an outgroup, midpoint rooting was employed to root the tree. The dendrogram shows that Phsa is separated from the cluster of insect P elements, reflecting the early split of these two lineages. Terminal inverted repeats are found in Drosophila sequences only, but not in the P homologs of Musca, Lucilia, and Homo, and thus may be interpreted as a derived trait. Nevertheless, the loss of mobility through terminal truncation has been shown to occur repeatedly in different phylogenetic lineages (Miller et al. 1992; Nouaud and Anxolabéhère 1997). Therefore, the dendrogram does not provide a definite answer for the evolutionary origin of mobile P transposons. Phsa may be considered a degenerated transposon that became immobile through loss of the terminal structures and has now acquired a novel function. Recruitment of transposon sequences for new assignments in the host genome, described as "molecular domestication" (Miller et al. 1992), now appears to be an important and rather common evolutionary phenomenon (Kidwell and Lisch 2001). Recent analyses of the human genome sequence revealed that at least 47 human genes are derived from transposable elements, which seem to act as a creative force in the evolution of novel coding sequences (International Human Genome Sequencing Consortium 2001). Alternatively, one could assume that Phsa might be the genomic ancestor of the P transposon family. It may have originated as a single-copy gene that in the course of sequence rearrangements acquired the appropriate termini necessary for genomic mobility. Partial cDNA sequences from the cow Bos taurus (accession number AW483725) and the chicken Gallus gallus (accession numbers AJ395159 and AJ394151) that are homologous to Phsa and the insect-derived P sequences were found in the EST database. Thus, Phsa, as well as the other two vertebrate P-element homologs, is expressed, although the protein function is still unknown. Our findings indicate that P-element-related sequences may be widespread in vertebrates. Accumulating data from genomic databases will soon provide the solution to this evolutionary puzzle.
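Midpoint rooting, which was used here because no outgroup is available, places the root halfway along the longest path between any two tips of the unrooted tree. The following Python sketch shows the idea under stated assumptions: the tree, its tip names, and its branch lengths are hypothetical, and the code is not the implementation used by PAUP.

```python
def distances_from(tree, start):
    """Path length from start to every node, plus parent pointers."""
    dist = {start: 0.0}
    parent = {start: None}
    stack = [start]
    while stack:
        node = stack.pop()
        for nbr, length in tree[node].items():
            if nbr not in dist:
                dist[nbr] = dist[node] + length
                parent[nbr] = node
                stack.append(nbr)
    return dist, parent


def midpoint_root(tree):
    """Return (u, v, offset): the root lies on edge u-v, offset away from u.

    tree is an unrooted tree given as {node: {neighbor: branch_length}}
    with positive branch lengths.
    """
    tips = [n for n, nbrs in tree.items() if len(nbrs) == 1]
    if len(tips) < 2:
        raise ValueError("need at least two tips")
    # Two sweeps find the most distant pair of tips (a, b).
    d0, _ = distances_from(tree, tips[0])
    a = max(tips, key=lambda t: d0[t])
    da, parent = distances_from(tree, a)
    b = max(tips, key=lambda t: da[t])
    half = da[b] / 2.0
    # Walk from b back toward a until the edge containing the midpoint.
    node = b
    while da[parent[node]] > half:
        node = parent[node]
    u = parent[node]
    return u, node, half - da[u]


# Hypothetical unrooted tree: three close tips (A, B, C) and one distant tip (H).
toy_tree = {
    "A": {"X": 1.0},
    "B": {"X": 1.0},
    "C": {"Y": 1.0},
    "H": {"Y": 5.0},
    "X": {"A": 1.0, "B": 1.0, "Y": 1.0},
    "Y": {"X": 1.0, "C": 1.0, "H": 5.0},
}

print(midpoint_root(toy_tree))
```

In this toy tree the longest tip-to-tip path has length 7, so the root falls 3.5 units from H on the branch joining H to the rest of the tree. The method assumes roughly equal rates of change along both sides of the root, which is why a rooting obtained this way is best read as provisional.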



Fig. 2. Protein sequence relationships among the human sequence Phsa and P-element sequences of the insects Drosophila (with representatives of five different types), Lucilia, and Musca. The maximum-parsimony dendrogram was generated from a multiple alignment (CLUSTAL W) of the conserved inner section (472 bp) using PAUP (version 4.0b6; Swofford 1997). Midpoint rooting was applied to root the tree, and gaps were treated as missing characters. Bootstrap values (1,000 replicates) are shown at the nodes. Accession numbers: Lucilia cuprina, A46361; Musca domestica, AF183396; Drosophila melanogaster, X06779; Drosophila bifasciata-M, X60990; Drosophila guanche, A44085; Drosophila subsilvestris, AY032732; D. bifasciata-O, X71634.

 

Acknowledgements

This work was supported by the Austrian Science Foundation (FWF, project P11819-GEN).

Footnotes

Pierre Capy, Reviewing Editor

1 Keywords: P element, Drosophila, human genome sequence, phylogeny

2 Address for correspondence and reprints: Sylvia Hagemann, Institute of Medical Biology, University of Vienna, Währingerstrasse 10, A-1090 Vienna, Austria. sylvia.hagemann@univie.ac.at.

References

    Anxolabéhère D., M. G. Kidwell, G. Periquet. 1988. Molecular characteristics of diverse populations are consistent with the hypothesis of a recent invasion of Drosophila melanogaster by mobile P elements. Mol. Biol. Evol. 5:252-269.

    Clark J. B., M. G. Kidwell. 1997. A phylogenetic perspective on P transposable element evolution in Drosophila. Proc. Natl. Acad. Sci. USA 94:11428-11433.

    Daniels S. B., K. R. Peterson, L. D. Strausbaugh, M. G. Kidwell, A. Chovnick. 1990. Evidence for horizontal transmission of the P transposable element between Drosophila species. Genetics 124:339-355.

    Hagemann S., E. Haring, W. Pinsker. 1996a. Repeated horizontal transfer of P transposons between Scaptomyza pallida and Drosophila bifasciata. Genetica 98:43-51.

    Hagemann S., E. Haring, W. Pinsker. 1996b. A new P element subfamily from Drosophila tristis, D. ambigua, and D. obscura. Genome 39:978-985.

    Haring E., S. Hagemann, W. Pinsker. 2000. Ancient and recent horizontal invasions of drosophilids by P elements. J. Mol. Evol. 51:577-586.

    International Human Genome Sequencing Consortium. 2001. Initial sequencing and analysis of the human genome. Nature 409:860-921.

    Kidwell M. G., J. F. Kidwell, J. A. Sved. 1977. Hybrid dysgenesis in Drosophila melanogaster: a syndrome of aberrant traits including mutation, sterility and male recombination. Genetics 86:813-833.

    Kidwell M. G., D. R. Lisch. 2001. Perspective: transposable elements, parasitic DNA, and genome evolution. Evolution 55:1-24.

    Lee S. H., J. B. Clark, M. G. Kidwell. 1999. A P element-homologous sequence in the house fly, Musca domestica. Insect Mol. Biol. 8:491-500.

    Miller W. J., S. Hagemann, E. Reiter, W. Pinsker. 1992. P homologous sequences are tandemly repeated in the genome of Drosophila guanche. Proc. Natl. Acad. Sci. USA 89:4018-4022.

    Miller W. J., J. F. McDonald, D. Nouaud, D. Anxolabéhère. 1999. Molecular domestication – more than a sporadic episode in evolution? Genetica 107:197-207.

    Misra S., D. C. Rio. 1990. Cytotype control of P element transposition: the 66 kd protein is a repressor of transposase activity. Cell 62:269-284.

    Nouaud D., D. Anxolabéhère. 1997. P element domestication: a stationary truncated P element may encode a 66-kDa repressor-like protein in the Drosophila montium species subgroup. Mol. Biol. Evol. 14:1132-1144.

    O'Hare K., G. M. Rubin. 1983. Structures of P transposable elements and their sites of insertion and excision in the Drosophila melanogaster genome. Cell 34:25-35.

    Perkins H. D., A. J. Howells. 1992. Genomic sequences with homology to the P element of Drosophila melanogaster occur in the blowfly Lucilia cuprina. Proc. Natl. Acad. Sci. USA 89:10753-10757.

    Rio D. C., G. M. Rubin. 1988. Identification and purification of a Drosophila protein that binds to the terminal 31-base-pair inverted repeats of the P transposable element. Proc. Natl. Acad. Sci. USA 85:8929-8933.

    Silva J. C., M. G. Kidwell. 2000. Horizontal transfer and selection in the evolution of P elements. Mol. Biol. Evol. 17:1542-1557.

    Swofford D. 1997. PAUP: phylogenetic analysis using parsimony. Version 4.0.0d. Smithsonian Institution, Washington, D.C.

    Thompson J. D., T. J. Gibson, F. Plewniak, F. Jeanmougin, D. G. Higgins. 1997. The ClustalX windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 25:4876-4882.

    Witherspoon D. J. 1999. Selective constraints on P-element evolution. Mol. Biol. Evol. 16:472-478.

Accepted for publication June 26, 2001.