Institute of Bioinformatics, Tsinghua University, Beijing, China
Department of Medical Genetics, China Medical University, Shenyang, China
Life Science College, Nanjing Normal University, Nanjing, China
Correspondence: E-mail: bphma{at}mail.tsinghua.edu.cn.
Abstract |
---|
Key Words: amino acid usage; alternatively spliced genes; protein length distribution; eukaryotes; evolution
Introduction |
---|
The above findings strongly suggest an important role for alternative splicing in the formation of the biological complexity of the human body. Recent genomic analyses of alternative splicing have strongly supported this hypothesis (Graveley 2001; Kondrashov and Koonin 2003; Kriventseva et al. 2003). Meanwhile, further questions have been raised about the identification, functional roles, and regulation of alternatively spliced forms across the whole genome, in addition to those about the mechanisms, origin, and evolution of alternative splicing.
Understanding the mechanisms and functions of alternative splicing may provide a unique way of elucidating how a genome evolves and of analyzing its sequence data. A comparative study of the amino acid usage and protein length of alternatively and non-alternatively spliced genes is therefore warranted. The purpose of this study was to explore amino acid usage patterns and protein length distributions of alternatively and non-alternatively spliced genes in six eukaryotic genomes: human (Homo sapiens), mouse (Mus musculus), rat (Rattus norvegicus), fruit fly (Drosophila melanogaster), Caenorhabditis elegans, and bovine (Bos taurus).
Materials and Methods |
---|
Statistical Analysis
A χ² test was used to examine the statistical significance of the difference in overall amino acid usage between alternatively and non-alternatively spliced genes. The value of χ² was calculated as:
|
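The formula itself is not preserved in this text. As a minimal sketch only, and assuming the comparison takes the form of a standard χ² test of homogeneity on a 2 × k table of amino acid counts (an assumption, not necessarily the authors' exact calculation), the statistic could be computed as follows. The function name and the count dictionaries are hypothetical.

```python
from typing import Dict, Tuple


def chi_square_usage(counts_alt: Dict[str, int],
                     counts_non: Dict[str, int]) -> Tuple[float, int]:
    """Return (chi-square statistic, degrees of freedom) for a 2 x k table.

    Assumption: rows are the two gene sets, columns are amino acids,
    and expected counts come from the row and column totals.
    """
    # Keep only amino acids observed at least once in either set.
    residues = sorted(r for r in set(counts_alt) | set(counts_non)
                      if counts_alt.get(r, 0) + counts_non.get(r, 0) > 0)
    rows = [[counts_alt.get(r, 0) for r in residues],
            [counts_non.get(r, 0) for r in residues]]
    row_totals = [sum(row) for row in rows]
    if min(row_totals) == 0:
        raise ValueError("each gene set needs at least one observed residue")
    col_totals = [rows[0][j] + rows[1][j] for j in range(len(residues))]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i in range(2):
        for j in range(len(residues)):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (rows[i][j] - expected) ** 2 / expected
    # (2 - 1) * (k - 1) degrees of freedom for a 2 x k table.
    return statistic, len(residues) - 1
```

With real data the dictionaries would hold the total count of each of the 20 amino acids in each gene set, and the statistic would be compared with the χ² distribution on 19 degrees of freedom.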
Because the compared sample sets differ in size, a percent difference test (Ma 1982, pp. 194-197; Steel and Torrie 1980) was carried out to evaluate the statistical significance of the difference in the usage of single amino acids between alternatively and non-alternatively spliced genes; it was calculated as:
|
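The expression used for the percent difference test is likewise not preserved here. A common form of such a test is the two-proportion u (z) test with a pooled standard error; the sketch below assumes that form, which may differ from the one in the cited textbooks. The function name and arguments are hypothetical.

```python
import math


def percent_difference_u(x1: int, n1: int, x2: int, n2: int) -> float:
    """Two-proportion u statistic with a pooled standard error.

    x1, x2: occurrences of one amino acid in each gene set.
    n1, n2: total residues counted in each gene set.
    Assumption: this pooled form stands in for the authors' formula.
    """
    if n1 <= 0 or n2 <= 0:
        raise ValueError("sample sizes must be positive")
    p1 = x1 / n1
    p2 = x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    if standard_error == 0.0:
        raise ValueError("pooled proportion of 0 or 1 gives no variance")
    return (p1 - p2) / standard_error
```

A value of |u| above roughly 1.96 would correspond to significance at the 5% level under the normal approximation, which is reasonable here because residue counts in whole-genome data sets are large.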
Results and Discussion |
---|
Protein Length Distribution in Alternatively and Non-alternatively Spliced Genes Among the Six Eukaryotes
We also compared the protein length distributions of alternatively and non-alternatively spliced genes from the six eukaryotes studied (table 4 and fig. 2). As shown in table 4, across all data sets, the average lengths (in amino acids) of the protein products of alternatively and non-alternatively spliced genes were, respectively, 656.8 and 484.6 (human), 625.7 and 439.0 (mouse), 759.7 and 443.9 (rat), 764.3 and 504.3 (fruit fly), 679.3 and 439.1 (C. elegans), and 604.2 and 349.0 (bovine), whereas those of highly expressed alternatively and non-alternatively spliced genes were, respectively, 686.3 and 432.6 (human) and 494.5 and 407.4 (mouse). These results suggest that, for both the genome overall and the highly expressed genes, the protein products of alternatively spliced genes are on average significantly longer than those of non-alternatively spliced ones. Further analysis showed that the average lengths of the shorter isoforms of alternatively spliced genes from human, mouse, rat, fruit fly, C. elegans, and bovine are, respectively, 563.4, 551.5, 591.2, 701.4, 613.0, and 512.1 amino acids, all greater than the averages for non-alternatively spliced genes (data not shown). Furthermore, as shown in figure 2, although the two types of genes have very similar protein length distributions in the six species, in humans a markedly higher proportion of non-alternatively spliced genes seem to encode proteins shorter than 400 amino acids. Nevertheless, highly expressed human and mouse genes have a protein length distribution similar to that of genes overall.
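The summary statistics discussed above (mean length, mean length of the shorter isoform, and the share of proteins below 400 amino acids) can be sketched as follows. This is an illustration only: the helper names and the toy lengths are hypothetical and are not taken from the study's data sets.

```python
from statistics import mean
from typing import Dict, List


def mean_shortest_isoform(isoform_lengths: Dict[str, List[int]]) -> float:
    """Mean length of the shortest isoform of each alternatively spliced gene."""
    return mean(min(lengths) for lengths in isoform_lengths.values())


def fraction_shorter_than(lengths: List[int], cutoff: int = 400) -> float:
    """Fraction of proteins strictly shorter than `cutoff` amino acids."""
    return sum(1 for n in lengths if n < cutoff) / len(lengths)


# Toy data (hypothetical): isoform lengths per alternatively spliced gene,
# and one protein length per non-alternatively spliced gene.
alt_isoforms = {"geneA": [650, 720], "geneB": [480, 910, 530]}
non_alt_lengths = [310, 450, 395, 620]

shortest_mean = mean_shortest_isoform(alt_isoforms)
non_alt_mean = mean(non_alt_lengths)
short_share = fraction_shorter_than(non_alt_lengths)
```

Comparing `shortest_mean` with `non_alt_mean` mirrors the check reported in the text, where even the shorter isoforms of alternatively spliced genes were longer on average than the products of non-alternatively spliced genes.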
The influence of alternatively spliced genes that may have been mixed into the non-alternative subset has also been evaluated. As shown in figure 3, when subset A is mixed with some of subset C2, the mean for subset C1, as estimated from all of subset A, will change; e.g., it will increase if the mean of subset C2 is greater than that of C1. Similarly, we assume that the non-alternatively and alternatively spliced gene data sets have mean values of µ1 and µ2, respectively. Using c1 and c2 to denote the sample sizes of the non-alternatively and alternatively spliced gene subsets, and f to denote the proportion between C2 and A, subset A, with a mean value of µ1 + fµ2 and a size of m (m = c1 + fc2), will be the sum of subsets C1 and C2, whereas subset B, with a mean value of µ2 and a size of c2, will be equal to subset C2.
It can be seen that the influence of alternatively spliced genes potentially mixed into the non-alternatively spliced subset is likely to be negligible.
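The direction of this effect can be illustrated with a generic weighted-mean calculation. The sketch below is not the authors' derivation; it only assumes that the mean of a mixed subset is the size-weighted average of its components. All function names, sizes, and means are hypothetical.

```python
def mixed_mean(mean_pure: float, n_pure: int,
               mean_contaminant: float, n_contaminant: int) -> float:
    """Mean of a subset formed by pooling a pure group with contaminants."""
    total = n_pure + n_contaminant
    return (mean_pure * n_pure + mean_contaminant * n_contaminant) / total


def pure_mean(observed_mean: float, n_total: int,
              mean_contaminant: float, n_contaminant: int) -> float:
    """Back out the mean of the pure group from the observed mixed mean."""
    if n_contaminant >= n_total:
        raise ValueError("contaminants must be fewer than the whole subset")
    n_pure = n_total - n_contaminant
    return (observed_mean * n_total - mean_contaminant * n_contaminant) / n_pure


# Hypothetical example: 900 truly non-alternative genes (mean 400 aa)
# pooled with 100 undetected alternatively spliced genes (mean 600 aa).
observed = mixed_mean(400.0, 900, 600.0, 100)
recovered = pure_mean(observed, 1000, 600.0, 100)
```

Under this assumption, undetected alternatively spliced genes with longer products can only raise the observed mean of the non-alternative subset, so the true difference between the two groups would be at least as large as the observed one.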
Implications for the Origin of Alternatively Spliced Genes
Our findings seem to imply that alternatively spliced genes might have originated from non-alternatively spliced ones through DNA mutation or gene fusion. Results from other studies have also implied that mutation events may play an important role in the generation of alternatively spliced genes at the early stages of eukaryote evolution.
Mutational events during the course of evolution may have resulted in loss or retention of introns in the mRNA, as well as shift of transcription/polyadenylation/splice sites, which may all enable production of multiple mRNAs from a single gene. Various alternative splicing patterns have been identified among organisms (Wang, Selvakumar, and Helfman 1997; König, Ponta, and Herrlich 1998; Lopez 1998; Muro, Iaconcig, and Baralle 1998; Smith and Valcarcel 2000; Coward, Haas, and Vingron 2002). It has been discovered that some beneficial mutations could lead to a new gene function (Ohno 1970; Li 1997; Wagner 1998; Holland 1999; Hughes 1999), and that mutations that may result in the emergence of multiple alternative splicing variants are randomly distributed (Kriventseva et al. 2003). Luzi et al. (2000) revealed that the alternatively spliced mammalian shc, rai, and sli genes may have evolved from a single ancestor. It has been proposed that splicing errors might have been caused by mutations in the genomic DNA that have either destroyed a normal splicing signal or created a new one (Cooper and Mattox 1997), whereas other types of "mis-splicing" might have been caused by mutations in splicing regulatory proteins (Blencowe 2000; Jensen et al. 2000; Reenan, Hanrahan, and Ganetzky 2000). Graveley (2001) has suggested that splicing "errors" may also result from mutations of the pre-mRNA during transcription. Recent studies on C. elegans have suggested that the major targets of the mRNA surveillance system are not aberrantly spliced RNAs, but rather transcripts that are deliberately spliced to contain premature stop codons (Mitrovich and Anderson 2000).
The greater average length of alternatively spliced genes in the six eukaryotes suggests that such genes might have arisen through gene fusion or the insertion of exogenous genes or segments. Other mechanisms that may bear on the origin of alternatively spliced genes include fusion (Thompson et al. 2000) or deletion between two adjacent genes (Nurminsky et al. 1998), DNA duplication (Kim et al. 1992; Bark 1993; Stewart and Denell 1993; Hayward and Bonthron 1998; Kondrashov and Koonin 2001; Letunic, Copley, and Bork 2002), insertion (Kondrashov and Koonin 2003), and exon creation and/or loss (Modrek and Lee 2003), among others.
Our analysis may provide an important clue for understanding the origin and evolutionary mechanisms of alternative splicing, although further research in the field is still needed.
Acknowledgements |
---|
Footnotes |
---|
Literature Cited |
---|
Bark, I. C. 1993. Structure of the chicken gene for SNAP-25 reveals duplicated exon encoding distinct isoforms of the protein. J. Mol. Biol. 233:67-76.
Blencowe, B. J. 2000. Exonic splicing enhancers: mechanism of action, diversity and role in human genetic diseases. Trends Biochem. Sci. 25:106-110.
Brett, D., H. Pospisil, J. Valcarcel, J. Reich, and P. Bork. 2002. Alternative splicing and genome complexity. Nat. Genet. 30:29-30.
Chiusano, M. L., F. Alvarez-Valin, M. Di Giulio, G. Donofrio, G. Ammirato, G. Colonna, and G. Bernardi. 2000. Second codon positions of genes and the secondary structures of proteins. Relationships and implications for the origin of the genetic code. Gene 261:63-69.
Cooper, T. A., and W. Mattox. 1997. The regulation of splice-site selection, and its role in human disease. Am. J. Hum. Genet. 61:259-266.
Coward, E., S. A. Haas, and M. Vingron. 2002. SpliceNest: visualizing gene structure and alternative splicing based on EST clusters. Trends Genet. 18:53-55.
Duret, L., and D. Mouchiroud. 1999. Expression pattern and, surprisingly, gene length shape codon usage in Caenorhabditis, Drosophila, and Arabidopsis. Proc. Natl. Acad. Sci. USA 96:4482-4487.
Graveley, B. R. 2001. Alternative splicing: increasing diversity in the proteomic world. Trends Genet. 17:100-107.
Hayward, B. E., and D. T. Bonthron. 1998. Structure and alternative splicing of the ketohexokinase gene. Eur. J. Biochem. 257:85-91.
Holland, P. W. 1999. Gene duplication: past, present and future. Semin. Cell Dev. Biol. 10:541-547.
Hughes, A. L. 1999. Adaptive evolution of genes and genomes. Oxford University Press, New York, Oxford.
Jensen, K. B., B. K. Dredge, G. Stefani, R. Zhong, R. J. Buckanovich, H. J. Okano, Y. Y. L. Yang, and R. B. Darnell. 2000. Nova-1 regulates neuron-specific alternative splicing and is essential for neuronal viability. Neuron 25:359-371.
Kersey, P., H. Hermjakob, and R. Apweiler. 2000. Varsplic: Alternatively-spliced protein sequences derived from Swiss-Prot and TrEMBL. Bioinformatics 16:1048-1049.
Kim, J., J. J. Yim, S. Wang, and D. Dorsett. 1992. Alternate use of divergent forms of an ancient exon in the fructose-1,6-bisphosphate aldolase gene of Drosophila melanogaster. Mol. Cell. Biol. 12:773-783.
Kondrashov, F. A., and E. V. Koonin. 2001. Origin of alternative splicing by tandem exon duplication. Hum. Mol. Genet. 10:2661-2669.
Kondrashov, F. A., and E. V. Koonin. 2003. Evolution of alternative splicing: deletions, insertions and origin of functional parts of proteins from intron sequences. Trends Genet. 19:115-119.
König, H., H. Ponta, and P. Herrlich. 1998. Coupling of signal transduction to alternative pre-mRNA splicing by a composite splice regulator. EMBO J. 17:2904-2913.
Kriventseva, E. V., I. Koch, R. Apweiler, M. Vingron, P. Bork, M. S. Gelfand, and S. Sunyaev. 2003. Increase of functional diversity by alternative splicing. Trends Genet. 19:124-128.
Letunic, I., R. Copley, and P. Bork. 2002. Common exon duplication in animals and its role in alternative splicing. Hum. Mol. Genet. 11:1561-1567.
Li, W. H. 1997. Molecular evolution. Sinauer Associates, Sunderland, Mass.
Lopez, A. J. 1998. Alternative splicing of pre-mRNA: developmental consequences and mechanisms of regulation. Annu. Rev. Genet. 32:279-305.
Luzi, L., S. Confalonieri, P. Paolo, D. Fiore, and P. G. Pelicci. 2000. Evolution of Shc functions from nematode to human. Curr. Opin. Genet. Dev. 10:668-674.
Ma, Y. 1982. Experimental statistics. Agriculture Press, Beijing [in Chinese].
Mikheeva, S., M. Hakim-Zargar, D. Carson, and K. A. Jarrell. 1997. Use of an engineered ribozyme to produce a circular human exon. Nucleic Acids Res. 25:5085-5094.
Mitrovich, Q. M., and P. Anderson. 2000. Unproductively spliced ribosomal protein mRNAs are natural targets of mRNA surveillance in C. elegans. Genes Dev. 14:2173-2184.
Modrek, B., and C. Lee. 2002. A genomic view of alternative splicing. Nat. Genet. 30:13-19.
Modrek, B., and C. Lee. 2003. Alternative splicing in the human, mouse and rat genomes is associated with an increased frequency of exon creation and/or loss. Nat. Genet. 34:1-4.
Modrek, B., A. Resch, C. Grasso, and C. Lee. 2001. Genome-wide analysis of alternative splicing using human expressed sequence data. Nucleic Acids Res. 29:2850-2859.
Muro, A. F., A. Iaconcig, and F. E. Baralle. 1998. Regulation of the fibronectin EDA exon alternative splicing. Cooperative role of the exonic enhancer element and the 5' splicing site. FEBS Lett. 437:137-141.
Nurminsky, D. I., M. V. Nurminskaya, E. V. Benevolenskaya, Y. Y. Shevelyov, D. L. Hartl, and V. A. Gvozdev. 1998. Cytoplasmic dynein intermediate-chain isoforms with different targeting properties created by tissue-specific alternative splicing. Mol. Cell. Biol. 18:6816-6825.
Ohno, S. 1970. Evolution by gene duplication. Springer-Verlag, New York.
Okayasu, T., M. Ikeda, K. Akimoto, and K. Sorimachi. 1997. The amino acid composition of mammalian and bacterial cells. Amino Acids 13:379-391.
Reenan, R. A., C. J. Hanrahan, and B. Ganetzky. 2000. The mle(napts) RNA helicase mutation in Drosophila results in a splicing catastrophe of the para Na+ channel transcript in a region of RNA editing. Neuron 25:139-149.
Smith, C. W., and J. Valcarcel. 2000. Alternative pre-mRNA splicing: the logic of combinatorial control. Trends Biochem. Sci. 25:381-388.
Sorimachi, K. 1999. Evolutionary changes reflected by the cellular amino acid composition. Amino Acids 17:207-226.
Sorimachi, K. 2002. The classification of various organisms according to the free amino acid composition change as the result of biological evolution. Amino Acids 22:55-69.
Sorimachi, K., T. Itoh, Y. Kawarabayasi, T. Okayasu, K. Akimoto, and A. Niwa. 2001. Conservation of the basic pattern of cellular amino acid composition of archaeobacteria during biological evolution and the putative amino acid composition of primitive life forms. Amino Acids 21:393-399.
Sorimachi, K., T. Okayasu, K. Akimoto, and A. Niwa. 2000. Conservation of the basic pattern of cellular amino acid composition during biological evolution in plants. Amino Acids 18:193-197.
Steel, R. G. D., and J. H. Torrie. 1980. Principles and procedures of statistics, a biometrical approach. 2nd edition. McGraw-Hill Book Company, New York.
Stewart, M. J., and R. Denell. 1993. The Drosophila ribosomal protein S6 gene includes a 3' triplication that arose by unequal crossing-over. Mol. Biol. Evol. 10:1041-1047.
Thompson, T. M., J. J. Lozano, N. Loukili, R. Carrió, F. Serras, B. Cormand, M. Valeri, V. M. Díaz, J. Abril, and M. Burset. 2000. Fusion of the human gene for the polyubiquitination coeffector UEV1 with Kua, a newly identified gene. Genome Res. 10:1743-1756.
Wagner, A. 1998. The fate of duplicated genes: loss or new function? Bioessays 20:785-788.
Wang, Y.-C., M. Selvakumar, and D. M. Helfman. 1997. Alternative pre-mRNA splicing. Pp. 242-279 in A. R. Krainer, ed. Eukaryotic mRNA processing. Oxford University Press, Oxford.