*Department of Ecology and Evolution, University of Chicago;
and
Department of Molecular and Cellular Biology, Harvard University
Abstract
Jingwei (jgw) is the first gene found to be of sufficiently recent origin in Drosophila to offer insights into the origin of a gene. While its chimerical gene structure was partially resolved as including a retrosequence of alcohol dehydrogenase (Adh), the structure of its non-Adh parental gene, the donor of the N-terminal domain of jgw, is unclear. We characterized this non-Adh parental locus, yellow emperor (ymp), by cloning it, mapping it onto the polytene chromosomes, sequencing the entire locus, and examining its expression patterns in Drosophila melanogaster. We show that ymp is located in the 96-E region; the N-terminal domain of ymp has donated the non-Adh portion of jgw via a duplication. The similar 5' portions of the gene and its regulatory sequences give rise to similar testis-specific expression patterns in ymp and jgw in Drosophila teissieri. Furthermore, between-species comparison of ymp revealed purifying selection in the protein sequence, suggesting a functional constraint in ymp. While the structure of ymp provides clear information for the molecular origin of the new gene jgw, it unexpectedly casts a new light on the concept of genes. We found, for the first time, that the single locus of the ymp gene encompasses three major molecular mechanisms determining structure of eukaryotic genes: (1) the 5' exons of ymp are involved in an exon-shuffling event that has created the portion recruited by jgw; (2) using alternative cleavage sites and alternative splicing sites, the 3' exon groups of ymp produce two proteins with nonhomologous C-terminal domains, both exclusively in the testis; and (3) in the opposite strand of the third intron of ymp is an essential gene, musashi (msi), which encodes an RNA-binding protein. The composite gene structure of ymp manifests the complexity of the gene concept, which should be considered in genomic research, e.g., gene finding.
Introduction
The early history of a gene is of interest, because it addresses a general question about the origin of genes. A number of new genes with novel functions have been found which have revealed various evolutionary mechanisms underlying the origin of new genes (e.g., Long and Langley 1993; Martignetti and Brosius 1993; Ohta 1994; Long et al. 1996; Begun 1997; Chen, DeVries, and Cheng 1997). One of the major molecular processes that give rise to new genes is exon shuffling (Gilbert 1978). Many cases have been reported of new genes originating via exon shuffling (Patthy 1995; Long and Langley 1993; Long et al. 1996; Nurminsky et al. 1998). However, insight into the early evolution of such genes is dependent on the discovery of a young gene because of the rapid sequence evolution characteristic of new genes as revealed by several investigations (Long and Langley 1993; Long et al. 1996; Nurminsky et al. 1998).
Jingwei (jgw) was the first gene observed in Drosophila to have recently been created by exon shuffling, and its age is estimated at around 2 Myr. A portion of jgw was identified in Drosophila yakuba and Drosophila teissieri in an in situ hybridization using Adh as a probe (Langley, Montgomery, and Quattlebaum 1982). By cloning and sequencing this portion of the gene, Jeffs and Ashburner (1991) observed that all Adh introns were lost, interpreting this as a processed pseudogene resulting from random insertion of a retrosequence into a region devoid of regulatory sequences.
Further molecular population genetic analysis, however, revealed strong purifying selection, as shown by the near limitation of nucleotide polymorphism to silent sites (Long and Langley 1993). This gene was observed to have specific RNA expression patterns, and its evolution was driven by ubiquitous Darwinian positive selection (Long and Langley 1993), which usually acts only on functional genes. These results suggest that jgw is a newly evolved functional gene. Furthermore, molecular characterization showed that the insertion of the Adh retrosequence recruited nearby preexisting exons and introns and thereby created a chimerical gene structure in a standard form of exon shuffling.
What is the source of the recruited exons and introns of the jgw gene? They could originate from a unique noncoding genomic sequence, as is approximately seen in the genes encoding BC1 RNA in rodents and BC200 RNA in primates (Brosius and Gould 1992). Alternatively, they could have originated from a preexisting gene or a duplicate of a gene. Although Long, Wang, and Zhang (1999) demonstrated that these recruited exons and introns are a portion of a duplicate of the gene yellow emperor (ymp), the structure of ymp itself was unclear, and the process by which the non-Adh portion originated remains to be investigated.
In this paper, we report the structure and some information concerning the function of the ymp locus in Drosophila melanogaster. We found that its structure is unique and not only offers a further explanation for the origin of the jgw gene, but also manifests the complexity of the concept of genes. The implication of these results will be discussed with respect to genomic research, such as gene-finding from genomic sequence data.
Materials and Methods
Screening cDNA and Genomic Libraries
cDNA libraries of D. melanogaster and D. yakuba and a genomic library of D. teissieri were screened using a 32P-labeled DNA fragment containing the first three exons of D. teissieri jgw, following standard procedure (Sambrook, Fritsch, and Maniatis 1989). Two distinct transcripts were isolated from the D. melanogaster cDNA library. Both strands of the inserts were sequenced using the sequencing kit of United States Biochemical (version 2). We named the transcripts ymp-1 and ymp-2, respectively, following Long, Wang, and Zhang (1999).
Drosophila yakuba cDNA and D. teissieri genomic libraries were made, using Lambda ZAP II and Lambda FIX II (Stratagene, San Diego), respectively, as vectors, using protocols provided by Stratagene and Sambrook, Fritsch, and Maniatis (1989). The RNA and genomic DNA were extracted from adult flies using modified procedures from Ashburner (1989). The D. melanogaster cDNA library (RNA from adult flies of the Oregon R strain) was a generous gift of Dr. Bruce A. Hamilton of the Whitehead Institute, Massachusetts Institute of Technology.
Mapping ymp in Polytene Chromosomes Using Fluorescence In Situ Hybridization
Digoxigenin-11-dUTP (DIG) (Roche Molecular Biochemicals) or Biotin-16-dUTP (Roche Molecular Biochemicals) labeled probes were constructed specifically for the shared three 5' exons, ymp-1 3' exons, and ymp-2 3' exons, respectively, by PCR. Primers A747 and A698 (Long, Wang, and Zhang 1999) were used for amplifying the three shared exons, ymp1F (5'-GTGCCCATTATTGCGATTTCAT-3') and ymp1R (5'-TCCCTGGCCTTTTATTTCCTTC-3') were used for the ymp-1 3' exons, and y43-3 (5'-TGGCATTGGTGAAGGACG-3') and y43-1 (5'-AAAGAAGTAGCTACTCGGC-3') were used for the ymp-2 3' exons. Polytene chromosome slides for fluorescence in situ hybridization (FISH) were prepared according to the protocols of Ashburner (1989). DIG-labeled probes were detected with rhodamine-conjugated antibody, and biotin-labeled probes were detected with fluorescein-conjugated streptavidin. Single and double color FISHs were performed as described by Wiegant (1996) with modifications.
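As a quick sanity check on the probe primers listed above, the following minimal Python sketch reports each primer's length, GC fraction, and a rule-of-thumb melting temperature. The sequences are copied from the text; the helper names (`reverse_complement`, `gc_fraction`, `wallace_tm`) are illustrative, and the Wallace rule (2 °C per A/T, 4 °C per G/C) is an assumption chosen here for simplicity, not a calculation reported in this study.

```python
# Primer sequences (5'->3') copied from the Methods text.
PRIMERS = {
    "ymp1F": "GTGCCCATTATTGCGATTTCAT",
    "ymp1R": "TCCCTGGCCTTTTATTTCCTTC",
    "y43-3": "TGGCATTGGTGAAGGACG",
    "y43-1": "AAAGAAGTAGCTACTCGGC",
}

# Translation table mapping each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (uppercase ACGT only)."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_fraction(seq):
    """Fraction of bases that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Rule-of-thumb Tm in degrees C: 2 per A/T plus 4 per G/C.

    This is an illustrative approximation for short oligos, not the
    annealing condition used in the paper.
    """
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


for name, seq in PRIMERS.items():
    print(f"{name}: {len(seq)} nt, GC={gc_fraction(seq):.2f}, "
          f"Tm~{wallace_tm(seq)} C, revcomp={reverse_complement(seq)}")
```

The reverse complement is included because a reverse primer anneals to the strand opposite the one usually shown in a sequence record, so searching a genomic sequence for a primer site generally requires both orientations.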
P1 Subcloning
Based on the results of polytene chromosome in situ hybridization, which show these genes located at 96E on the third chromosome of D. melanogaster, we screened D. melanogaster P1 clones (Hartl et al. 1994) around 96E by PCR amplifications using the same primers described in Mapping ymp in Polytene Chromosomes Using Fluorescence In Situ Hybridization. These P1 clones were from the laboratory of Dr. Spyros Artavanis-Tsakonas of Yale University.
Two P1 clones (DS00423 and DS02160), each containing both the ymp-1 and the ymp-2 sequences, were identified by PCR amplification of both ymp-1 and ymp-2. DNA fragments from XhoI digestion of these two P1 clones were separated on a 0.7% agarose gel, and were then transferred to nylon membrane (Roche Molecular Biochemicals) by Southern blotting. The three DIG-labeled probes described in Mapping ymp in Polytene Chromosomes Using Fluorescence In Situ Hybridization were successively hybridized to the membrane. Almost identical hybridization patterns were found for these two independent P1 clones. All positive bands were purified from another agarose gel and subcloned into XhoI-cut Bluescript SK(+) plasmid (Stratagene, San Diego). These inserts were sequenced using an ABI automated sequencer. The contig for these subclones was established by PCR analyses using various primer-pairing strategies, together with the help of the ymp-1 and ymp-2 cDNA sequences.
Reverse Transcription Polymerase Chain Reaction
Poly (A) RNAs extracted from D. melanogaster whole heads, thoraces (male), abdomen (female), abdomen (male), eyes, brains, proboscis, gut, testis, and muscle were used for reverse transcription polymerase chain reaction (RT-PCR) in order to detect expression patterns of ymp-1 and ymp-2. PCR with gapdh2 primers was used to provide an internal control for normalizing the cDNA concentration. The primers in the PCR reactions, at a concentration of 8 µM, are A691-internal/CD-3 (5'-TCCTGCAGTGAGAGCATAGA-3') for ymp-1 and A691-internal (5'-TAGATGATGATCCTTGTGTG-3')/Y43-4 (5'-CGGATTCGAAACCTCAAGGC-3') for ymp-2. The expression of the gapdh2 gene encoding glyceraldehyde-3-phosphate dehydrogenase-2 was chosen as an internal control because of its stable expression in various tissues (Tso, Sun, and Wu 1985). The primers for amplifying gapdh2 that were added into the same PCR reactions used to amplify ymp-1 or ymp-2 were JCT.L (5'-CAAGCAAGCCGATAGATAAAC-3') and t11.R (5'-GTCAAATCGACCACGGAAA-3') at a concentration of 8 µM. The oligo JCT.L was designed to span an intron in order to rule out PCR amplification of genomic DNA. The detailed procedures for microdissecting flies, extracting RNA, synthesizing cDNA, and normalizing cDNA concentrations are in Alvarez, Robison, and Gilbert (1996).
Sequence Analyses
DNA alignments were conducted using the GeneJockeyII program package (BIOSOFT). The virtual translation of DNA sequences into protein sequences and alignment of the translated protein sequences were also carried out with the GeneJockeyII package. DNA and protein sequence similarity searches were conducted through the NCBI web site of the National Institutes of Health (http://www.ncbi.nlm.nih.gov). Estimation of synonymous substitution rates (Ks) and nonsynonymous substitution rates (Ka) and a test of deviation of the ratio Ka/Ks from unity were carried out using the K-estimator proposed by Comeron (1999).
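To make the distinction behind Ks and Ka concrete, the following minimal Python sketch classifies codon differences between two aligned coding sequences as synonymous or nonsynonymous under the standard genetic code. It is not the K-estimator method: it only counts raw differences, skips codon pairs that differ at more than one position, and applies no normalization by the number of sites or correction for multiple hits. The function name `classify_differences` and the toy sequences are illustrative assumptions.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
# (third position varying fastest); "*" marks stop codons.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W"
         "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR"
         "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_differences(seq_a, seq_b):
    """Count single-site codon differences between two aligned coding sequences.

    Returns (synonymous, nonsynonymous, skipped). Codon pairs that differ at
    more than one position, or that contain characters outside TCAG, are
    counted as skipped rather than assigned to a pathway.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned, equal length, in frame")
    syn = nonsyn = skipped = 0
    for i in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        if codon_a == codon_b:
            continue
        n_diff = sum(x != y for x, y in zip(codon_a, codon_b))
        if n_diff != 1 or codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
            skipped += 1
        elif CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn, skipped


# Toy example (not data from this study): TTT->TTC keeps Phe, AAA->AGA is Lys->Arg.
print(classify_differences("ATGTTTGGGAAA", "ATGTTCGGGAGA"))
```

A full estimate of Ka and Ks additionally divides these counts by the numbers of nonsynonymous and synonymous sites and corrects for multiple substitutions at the same site, which is what dedicated programs such as the K-estimator handle.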
Results
Gene Structure of the ymp Locus
Using the 5' portion of jgw as a probe, we identified 77 positive plaques from a total of 300,000 pfu of the D. melanogaster cDNA library. Among them, we identified two distinct classes of transcripts, ymp-1 and ymp-2. We also obtained the ymp-1 homologous sequence from the screening of a D. yakuba cDNA library and the ymp-2 homologous sequence of D. teissieri by sequencing a ymp-positive phage clone identified from the D. teissieri genomic DNA library that we constructed. The cDNA sequences are shown in figure 1b-d.
Subcloning and sequencing of the D. melanogaster P1 clones showed a complex genomic structure for these genes (fig. 1a).
Strikingly, we found a well-characterized gene, msi, located in the big intron (14.9 kb) which separates the three small homologous exons from the downstream exons of ymp-1 and ymp-2. The msi gene is about 7.6 kb long, has two introns, and encodes a neural RNA-binding protein which is required for the development of adult external sensory organs (Nakamura et al. 1994). This gene is located on the DNA strand opposite the sense strand that encodes ymp genes (fig. 1a).
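The nested, opposite-strand arrangement described here can be expressed as a simple interval-containment check, which is the kind of test a gene-finding pipeline would need in order to report such loci. The sketch below is a minimal illustration in Python: only the two lengths (a 14.9-kb intron and an msi gene of about 7.6 kb) come from the text, while the coordinates, the `Feature` class, and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """A genomic interval: 0-based start (inclusive), end (exclusive), strand."""
    name: str
    start: int
    end: int
    strand: str  # "+" or "-"

    @property
    def length(self):
        return self.end - self.start


def is_nested(inner, outer):
    """True if `inner` lies entirely within the span of `outer`."""
    return outer.start <= inner.start and inner.end <= outer.end


def is_antisense_nested(inner, outer):
    """True if `inner` is nested in `outer` and encoded on the opposite strand."""
    return is_nested(inner, outer) and inner.strand != outer.strand


# Hypothetical coordinates chosen only to match the lengths given in the text;
# they are NOT the real positions within the 96E region.
intron3 = Feature("ymp intron 3", 1_000, 15_900, "+")  # 14.9 kb
msi = Feature("msi", 4_000, 11_600, "-")               # about 7.6 kb

print(is_antisense_nested(msi, intron3))
```

Checking strand separately from containment keeps the two properties distinct, since a nested gene may in principle lie on either strand relative to its host.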
From these gene structure data, it appears that with two introns (3 and 7) separating three distinct exon groups of ymp, three novel proteins originated by recombination of these exon groups and the Adh retrosequence (fig. 2). That intron 3 also harbors a developmentally important gene indicates a unique role of introns in the evolution of genes. The following analyses will further show that the proteins YMP-1 and YMP-2 are not functionless.
The origin of new genes includes two processes: the initial molecular assembly events and the subsequent population genetics. A processed retrosequence of the Adh gene is part of a young functional gene, jgw (Long and Langley 1993). The Adh-derived sequence was combined with three upstream exons about 2.5 MYA in the yakuba-teissieri lineage. Recently, Long, Wang, and Zhang (1999) demonstrated that there is another gene, dubbed ymp, containing the same structure as the recruited portion of jgw, which must have provided the donor for the exon-shuffling process that created jgw. However, the structure and function of ymp, as well as the portion of the donor gene that was involved in the shuffling process, remained unclear. The results of this study revealed that the ymp locus, the source of the recruited portion of jgw, has a remarkably complex gene structure.
The ymp locus produces two mRNAs, ymp-1 and ymp-2, resulting from the use of two adenylation sites and an alternative splicing process. The between-species comparison indicated significantly lower nonsynonymous substitution rates than synonymous substitution rates in the coding sequences of each protein (table 1), suggesting an evolutionary constraint on protein sequence typical of functional genes. These two transcripts share the three 5' exons that are highly similar to the recruited portion of jgw, suggesting that the recruited portion of jgw arose from a duplication event of the ymp gene (Long, Wang, and Zhang 1999). Both ymp-1 and ymp-2 are specifically expressed in testes (fig. 3), suggesting that their functions may be related to reproduction. Thus, the interesting fact that jgw is specifically expressed in adult male D. teissieri is likely a consequence of a similar regulatory sequence inherited by the jgw gene from the ymp locus. It is remarkable that a sibling species of D. teissieri, D. yakuba, which has been separated for a short time (2.5 Myr), evolved a different expression pattern in which the transcripts are also present in other developmental stages.
It becomes clear from this investigation and previous analysis (Long, Wang, and Zhang 1999) that the first three exons of the ymp locus are a donor for the recruited portion of jgw. Considering the hydrophobicity of the N-terminal peptide in JGW, YMP-1, and YMP-2 (Long, Wang, and Zhang 1999), the three small exons may encode a signal peptide, although this needs to be experimentally confirmed. Because there is no reported signal peptide homologous to this peptide, the target cellular membrane location of this signal peptide is unknown. YMP-1 and YMP-2 probably carry out different functions, since their sequences are not similar at the C-terminal ends. This feature, together with the shared promoter, makes the ymp locus different from other loci with multiple adenylation sites or alternative splicing, which usually produce isoforms with somewhat similar domains (with the exception of the unc-17/cha-1 locus [Alfonso et al. 1994]; the unc-17/cha-1 locus encodes two alternative forms, one of which contains only a noncoding first exon).
The ymp locus is further complicated by the presence of the msi gene, nested in intron 3, which separates the first three exons from the rest of the downstream exons of ymp (fig. 1a). The nested structure of the ymp locus shows two unique features that differ from nested genes previously reported. The intronic msi is 7.6 kb long, making it the longest nested gene identified so far. The other nested genes are usually around 1 kb long (Henikoff et al. 1986; Chen et al. 1987; Furia et al. 1990, 1993; Levinson et al. 1990; Neufeld, Carthew, and Rubin 1991; McBabb, Greig, and Davis 1996; Valleix et al. 1999). Moreover, like the first reported nested cuticle gene in the Gart locus (Henikoff et al. 1986), the msi gene is located on the strand opposite its host gene. Simultaneous transcription of both strands may lead to RNA interference (O'Hare 1986; Sharp 1999), as two recent experiments on D. melanogaster showed (Kennerdell and Carthew 1998; Misquitta and Paterson 1999). In the ymp locus, this interference, if any, may be avoided by a spatially differential expression of the ymp gene and the msi gene. The msi gene is expressed in sensilla (Nakamura et al. 1994), while the expression of the ymp gene is restricted to the testis. In the GART locus, however, simultaneous transcription of the purine gene and the intronic gene seems possible (Henikoff et al. 1986). Nested genes may not be uncommon gene structures. In a survey of a genomic region surrounding Adh gene of 2.9 Mb in D. melanogaster, Ashburner et al. (1999) identified 17 nested coding regions (CDS) using computer programs for gene prediction, although all of them except Adh and Adh-r have yet to be confirmed experimentally. How typical the different cases represented by GART and ymp are in their structures and their expression patterns and how nested genes are related to transcriptional interference are questions that remain to be clarified with further experimental data.
The ymp locus encompasses three phenomena pointing to an important role for introns: (1) an event of exon shuffling involving 5' exons, (2) a long nested gene within an intron, and (3) alternative transcription termination associated with alternative splicing. A single locus combining this set of molecular properties has not previously been reported. This finding may add to the classical concept of genes, which has been modified with the discoveries of operons, introns, overlapping genes, alternative splicing, multiple polyadenylation sites, complex promoters, and nested genes. The complex structure and evolutionary history of ymp indicate the importance of introns in the origin of new genes, as the exon theory of genes has suggested (Gilbert 1978, 1989). Indeed, introns 3 and 7 in ymp facilitate the recombination of several exon groups and Adh retrosequences that led to the origin of three proteins (fig. 2). Meanwhile, the complexity of gene structure, as shown in the ymp locus, ought to be an important factor to consider in genomic research, such as the prediction of genes from genome data. In fact, the complex gene structure of the ymp locus, as described in this report, was not predicted from the genome sequences of D. melanogaster (Adams et al. 2000).
Acknowledgements
We thank Walter Gilbert of Harvard for support and discussion; Bruce Hamilton of MIT for his generous gift of D. melanogaster cDNA libraries; the laboratory of Spyros Artavanis-Tsakonas of Yale for maintaining and delivering the P1 clones of D. melanogaster; and Janice Spofford, A. Hon-Tsen Yu, and members of M.L.'s laboratory for critical reading and discussion of the manuscript. We also thank Josep Comeron for his K-estimator program. This project was supported by a Packard Fellowship in Science and Engineering and a grant from the National Science Foundation to M.L.
Footnotes
Edward Holmes, Reviewing Editor
1 Keywords: origin of new genes, exon shuffling, nested gene, alternative splicing
2 Address for correspondence and reprints: Manyuan Long, Department of Ecology and Evolution, University of Chicago, 1101 East 57th Street, Chicago, Illinois 60637. E-mail: mlong{at}midway.uchicago.edu
Literature Cited
Adams, M. D., S. E. Celniker, R. A. Holt et al. (195 co-authors). 2000. The genome sequence of Drosophila melanogaster. Science 287:2185-2195.
Alfonso, A., K. Grundahl, J. R. McManus, J. M. Asbury, and J. B. Rand. 1994. Alternative splicing leads to two cholinergic proteins in Caenorhabditis elegans. J. Mol. Biol. 241:627-630.
Alvarez, C. E., K. Robison, and W. Gilbert. 1996. Novel Gq alpha isoform is a candidate transducer of rhodopsin signaling in a Drosophila testes-autonomous pacemaker. Proc. Natl. Acad. Sci. USA 93:12278-12282.
Ashburner, M. 1989. Drosophila, a laboratory manual. Cold Spring Harbor Laboratory Press, New York.
Ashburner, M., S. Misra, J. Roote et al. (25 co-authors). 1999. An exploration of the sequence of a 2.9-Mb region of the genome of Drosophila melanogaster: the Adh region. Genetics 153:179-219.
Begun, D. J. 1997. Origin and evolution of a new gene descended from alcohol dehydrogenase in Drosophila. Genetics 145:375-382.
Brosius, J., and S. J. Gould. 1992. On "genomenclature": a comprehensive (and respectful) taxonomy for pseudogenes and other "junk DNA". Proc. Natl. Acad. Sci. USA 89:10706-10710.
Chen, C., T. Malone, T. Beckendorf, and R. L. Davis. 1987. At least two genes reside within a large intron of the dunce gene of Drosophila. Nature 329:721-724.
Chen, L. B., A. L. DeVries, and C. H. C. Cheng. 1997. Evolution of antifreeze glycoprotein gene from a trypsinogen gene in Antarctic notothenioid fish. Proc. Natl. Acad. Sci. USA 94:3811-3816.
Comeron, J. M. 1999. K-estimator: calculation of the number of nucleotide substitutions per site and the confidence intervals. Bioinformatics 15:763-764.
Furia, M., P. P. D'Avino, S. Crispi, D. Artiaco, and L. C. Polito. 1993. Dense cluster of genes is located at the ecdysone-regulated 3C puff of Drosophila melanogaster. J. Mol. Biol. 231:531-538.
Furia, M., F. A. Digilio, D. Artiaco, E. Giordana, and L. C. Polito. 1990. A new gene nested within the dunce genetic unit of Drosophila melanogaster. Nucleic Acids Res. 18:5837-5841.
Gilbert, W. 1978. Why genes in pieces? Nature 271:501.
Gilbert, W. 1989. The exon theory of genes. Cold Spring Harb. Symp. Quant. Biol. 52:901-905.
Hartl, D. L., D. I. Nurminsky, R. W. Jones, and E. R. Lozovskaya. 1994. Genome structure and evolution in Drosophila: applications of the framework P1 map. Proc. Natl. Acad. Sci. USA 91:6824-6829.
Henikoff, S., M. A. Keene, K. Fechtel, and J. W. Fristrom. 1986. Gene within a gene: nested Drosophila genes encode unrelated proteins on opposite strands. Cell 44:33-42.
Jeffs, P., and M. Ashburner. 1991. Processed pseudogenes in Drosophila. Proc. R. Soc. Lond. B 244:151-159.
Kennerdell, J. R., and R. W. Carthew. 1998. Use of dsRNA-mediated genetic interference to demonstrate that frizzled and frizzled 2 act in the wingless pathway. Cell 95:1017-1026.
Langley, C. H., E. Montgomery, and W. F. Quattlebaum. 1982. Restriction map variation in the Adh region of Drosophila melanogaster. Proc. Natl. Acad. Sci. USA 79:5631-5635.
Levinson, B., S. Kenwrick, D. Lakich, G. Hammonds, and J. Gitschier. 1990. A transcribed gene in an intron of the human factor VIII gene. Genomics 7:1-11.
Long, M., S. J. de Souza, C. Rosenberg, and W. Gilbert. 1996. Exon shuffling and the origin of the mitochondrial targeting function in plant cytochrome c1 precursor. Proc. Natl. Acad. Sci. USA 93:7727-7731.
Long, M., and C. H. Langley. 1993. Natural selection and origin of jingwei, a chimeric processed functional gene. Science 260:91-95.
Long, M., W. Wang, and J. Zhang. 1999. Origin of new genes and source for N-terminal domain of the chimerical gene, jingwei, in Drosophila. Gene 238:135-142.
McBabb, S., S. Greig, and T. Davis. 1996. The alcohol dehydrogenase gene is nested in the outspread locus of Drosophila melanogaster. Genetics 143:897-911.
Martignetti, J. A., and J. Brosius. 1993. BC200 RNA: a neural RNA polymerase III product encoded by a monomeric Alu element. Proc. Natl. Acad. Sci. USA 90:11563-11567.
Misquitta, L., and B. M. Paterson. 1999. Targeted disruption of gene function in Drosophila by RNA interference (RNA-i): a role for nautilus in embryonic somatic muscle formation. Proc. Natl. Acad. Sci. USA 96:1451-1456.
Nakamura, M., H. Okano, J. A. Blendy, and C. Montell. 1994. Musashi, a neural RNA-binding protein required for Drosophila adult external sensory organ development. Neuron 13:67-81.
Neufeld, T. P., R. W. Carthew, and G. M. Rubin. 1991. Evolution of gene position: chromosomal arrangement and sequence comparison of the Drosophila melanogaster and Drosophila virilis sina and Rh4 genes. Proc. Natl. Acad. Sci. USA 88:10203-10207.
Nurminsky, D. I., M. V. Nurminskaya, D. De Aguiar, and D. L. Hartl. 1998. Selective sweep of a newly evolved sperm-specific gene in Drosophila. Nature 396:572-575.
O'Hare, K. 1986. Genes within genes. Trends Genet. 2:33.
Ohta, T. 1994. Further examples of evolution by gene duplication revealed through DNA sequence comparisons. Genetics 138:1331-1337.
Patthy, L. 1995. Protein evolution by exon-shuffling. Springer-Verlag, New York.
Sambrook, J., E. Fritsch, and T. Maniatis. 1989. Molecular cloning: a laboratory manual. 2nd edition. Cold Spring Harbor Laboratory Press, New York.
Sharp, P. A. 1999. RNAi and double-strand RNA. Genes Dev. 13:139-141.
Tso, J. Y., X.-H. Sun, and R. Wu. 1985. Structure of two unlinked Drosophila melanogaster glyceraldehyde-3-phosphate dehydrogenase genes. J. Biol. Chem. 260:8220-8228.
Valleix, S., J.-C. Jeanny, S. Elsevier, R. L. Joshi, P. Fayet, D. Bucchini, and M. Delpech. 1999. Expression of human F8B, a gene nested within the coagulation factor VIII gene, produces multiple eye defects and developmental alterations in chimeric and transgenic mice. Hum. Mol. Genet. 8:1291-1301.
Wiegant, J. 1996. Nonradioactive in situ hybridization application manual. 2nd edition. Boehringer, Mannheim, Germany.