Comparative Analysis of Amino Acid Usage and Protein Length Distribution Between Alternatively and Non-alternatively Spliced Genes Across Six Eukaryotic Genomes

Yonglong Zhuang*, Fei Ma*, Jesse Li-Ling†, Xiaofeng Xu‡ and Yanda Li*

* Institute of Bioinformatics, Tsinghua University, Beijing, China
† Department of Medical Genetics, China Medical University, Shenyang, China
‡ Life Science College, Nanjing Normal University, Nanjing, China

Correspondence: E-mail: bphma@mail.tsinghua.edu.cn.


    Abstract
Alternative splicing has been discovered in nearly all metazoan organisms as a mechanism to increase the diversity of gene products. However, the origin and evolution of alternatively spliced genes are still poorly understood. To understand the mechanisms by which alternatively spliced genes evolve, it may be important to study the differences between alternatively and non-alternatively spliced genes. The aim of this research was to compare amino acid usage and protein length distribution between alternatively and non-alternatively spliced genes across six nearly complete eukaryotic genomes: human (Homo sapiens), mouse (Mus musculus), rat (Rattus norvegicus), fruit fly (Drosophila melanogaster), Caenorhabditis elegans, and bovine (Bos taurus). Our results suggest the following. (1) Across the six species, alternatively and non-alternatively spliced genes show a very similar tendency in amino acid usage, both overall and among highly expressed genes, with the preferred amino acids of the highly expressed genes including A, E, G, K, L, P, S, V, R, T, and D. (2) Both overall and among highly expressed genes, the average length of the protein products of alternatively spliced genes is significantly greater than that of non-alternatively spliced ones. In contrast, the distributions of protein lengths for the two groups of genes are very similar in all six species. Based on these results, we propose that alternatively spliced genes may have originated from non-alternatively spliced ones through events such as DNA mutations or gene fusion.

Key Words: amino acid usage • alternatively spliced genes • protein length distribution • eukaryotes • evolution


    Introduction
A major task for the postgenomic era is to characterize the full protein complement, i.e. the proteome, for organisms. It is already clear that alternative splicing has an extremely important role in expanding the protein diversity and is responsible, at least in part, for the discrepancy between the number of genes and the biological complexity seen in many organisms (Graveley 2001). Alternative splicing has been found in nearly all metazoan organisms as a means of producing functionally diverse polypeptides from a single gene, where variation in mRNA structure may take many different forms (Lopez 1998; Smith and Valcarcel 2000). For instance, introns that are normally excised can be retained in the mRNA. The positions of either 5' or 3' splice sites can shift to result in longer or shorter exons. In addition to the above changes in splicing, alterations in transcriptional start sites or polyadenylation sites may also allow production of multiple mRNAs from a single gene. Exonic splicing enhancers are often found at the 5' and 3' ends of an exon, and sometimes in the middle, and may regulate the accessibility of different exons to the splicing machinery through the formation of secondary structures (Wang, Selvakumar, and Helfman 1997; König, Ponta, and Herrlich 1998; Muro, Iaconcig, and Baralle 1998). Exon duplication and loss can also lead to alternative splicing (Kondrashov and Koonin 2001; Letunic, Copley, and Bork 2002; Modrek and Lee 2003). Furthermore, secondary structures of RNA can also alter the splicing pattern of human mRNA transcripts (Mikheeva et al. 1997). Recent estimates, based on analyses of expressed sequence tags (ESTs), have suggested that the transcripts from 35%–59% of human genes are alternatively spliced (Modrek et al. 2001; Brett et al. 2002; Modrek and Lee 2002).

The above findings have strongly suggested an important role for alternative splicing in the formation of biological complexity of the human body. Some researchers have recently presented results from genomic analyses of alternative splicing that strongly supported the hypothesis (Graveley 2001; Kondrashov and Koonin 2003; Kriventseva et al. 2003). Meanwhile, further questions have been raised in regard to the identification, functional roles, and regulation of alternative splicing forms across the whole genome, in addition to those relevant to the mechanisms, origin, and evolution of alternative splicing.

Understanding the mechanisms and functions of alternative splicing may provide a unique way of elucidating the evolutionary mechanisms of a genome and of analyzing its sequence data. A comparative study of the amino acid usage and protein length of alternatively and non-alternatively spliced genes may therefore be necessary. The purpose of this study was to explore amino acid usage patterns and protein length distributions of alternatively and non-alternatively spliced genes in six eukaryotic genomes: human (Homo sapiens), mouse (Mus musculus), rat (Rattus norvegicus), fruit fly (Drosophila melanogaster), Caenorhabditis elegans, and bovine (Bos taurus).


    Materials and Methods
Data Source and Processing
The Swiss-Prot database has so far provided the most comprehensive information on eukaryotic genomes, and it contains a list of genes annotated as alternatively spliced. In addition, a program, "varsplic.pl," has been written to generate additional records from Swiss-Prot, producing one record for each isoform of the same protein (Kersey, Hermjakob, and Apweiler 2000). The output files from running the program on Swiss-Prot are also available from the non-redundant "varsplic" database (available online at ftp://ftp.ebi.ac.uk/pub/software/swissprot/varsplic_sprot.fas.Z). In the present work, protein products of alternatively and non-alternatively spliced genes from human, mouse, rat, fruit fly, C. elegans, and bovine were extracted from the Swiss-Prot database (http://www.expasy.org, Release 40.34, 15 Nov 2002) and the "varsplic" database. The protein data were divided into two subsets according to whether an entry carried the key words "alternative splicing." To minimize redundancy, sequences of fewer than 20 amino acid residues were excluded, and only one sequence from each set of homologous alternatively spliced genes was retained. A summary of the final data sets is shown in table 1, with the full content available online at ftp://166.111.30.65/pub/Data/.
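The partitioning step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used for the study: the `Entry` record, its field names, and the example accessions are hypothetical, and the entries are assumed to have already been parsed from the database files. The removal of homologous duplicates is not shown.

```python
from dataclasses import dataclass

MIN_LENGTH = 20  # residues; shorter sequences are excluded, as stated in the text


@dataclass
class Entry:
    """Hypothetical, already-parsed protein record (not a real Swiss-Prot parser)."""
    accession: str
    keywords: list
    sequence: str


def split_by_splicing(entries):
    """Drop short sequences, then split entries on the 'alternative splicing' key words."""
    alternative, non_alternative = [], []
    for entry in entries:
        if len(entry.sequence) < MIN_LENGTH:
            continue  # excluded as too short
        is_alt = any(k.strip().lower() == "alternative splicing" for k in entry.keywords)
        (alternative if is_alt else non_alternative).append(entry)
    return alternative, non_alternative


# Toy records with made-up accessions and placeholder sequences.
entries = [
    Entry("X0001", ["Alternative splicing", "Kinase"], "M" * 120),
    Entry("X0002", ["Hydrolase"], "M" * 80),
    Entry("X0003", ["Alternative splicing"], "M" * 12),  # below the length cutoff
]
alt, non_alt = split_by_splicing(entries)
print(len(alt), len(non_alt))
```

The key word match is case-insensitive here only as a convenience; whether the original selection was case-sensitive is not stated.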


Table 1 Numbers of Protein Encoding Sequences and Total Amino Acids Among Alternatively and Non-alternatively Spliced Genes from Six Eukaryotes.

 
Estimating the Level of Gene Expression
To analyze the amino acid usage and protein length distribution of highly expressed genes, the level of gene expression was estimated for each selected gene by counting its EST sequences, which serve as an indirect measure of transcript abundance (Duret and Mouchiroud 1999). The numbers of ESTs were retrieved from UniGene libraries via the links that the Swiss-Prot database provides to GenBank (http://www.ncbi.nlm.nih.gov, also available via ftp://166.111.30.65/pub/Data).
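As a rough illustration of this selection step, the sketch below ranks genes by EST count and keeps those at or above a cutoff. The function name, the gene identifiers, and the cutoff value are all hypothetical; the cutoff actually applied is not given in this section, so it is left as a parameter.

```python
def highly_expressed(est_counts, min_ests):
    """Return gene identifiers with at least `min_ests` ESTs, most abundant first.

    `est_counts` maps a gene identifier to its number of matching ESTs.
    `min_ests` is an assumed, user-chosen cutoff.
    """
    selected = [gene for gene, n in est_counts.items() if n >= min_ests]
    return sorted(selected, key=lambda gene: est_counts[gene], reverse=True)


# Toy counts with made-up gene names.
toy_counts = {"geneA": 250, "geneB": 12, "geneC": 900, "geneD": 100}
top_genes = highly_expressed(toy_counts, min_ests=100)
print(top_genes)
```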

Statistical Analysis
A χ² test was used to examine the statistical significance of the difference in overall amino acid usage between alternatively and non-alternatively spliced genes. The value of χ² was calculated as:

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

where $O_i$ = frequency of the ith amino acid in alternatively spliced genes, $E_i$ = frequency of the ith amino acid in non-alternatively spliced genes, $k = 20$, and df = 19. Probabilities were estimated based on the values obtained by comparison of cumulative distributions.
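The statistic can be computed directly from residue counts, as in the following sketch. One assumption is made loudly: the observed values are taken as raw counts in the alternatively spliced set, and the expected values are the non-alternatively spliced composition rescaled to the same total. The text says only "frequency," so this scaling is an interpretation, and the function names are illustrative.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
CRITICAL_DF19_P05 = 30.144  # chi-square critical value for df = 19 at alpha = 0.05


def composition_counts(sequences):
    """Count the 20 standard residues over a collection of protein sequences."""
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for seq in sequences:
        for residue in seq:
            if residue in counts:
                counts[residue] += 1
    return counts


def chi_square(observed_counts, reference_counts):
    """Sum (O_i - E_i)^2 / E_i over the 20 amino acids.

    E_i is the reference composition rescaled to the observed total (an assumption).
    """
    n_obs = sum(observed_counts.values())
    n_ref = sum(reference_counts.values())
    stat = 0.0
    for aa in AMINO_ACIDS:
        expected = reference_counts[aa] / n_ref * n_obs
        if expected == 0:
            raise ValueError(f"expected count for {aa} is zero")
        stat += (observed_counts[aa] - expected) ** 2 / expected
    return stat


# Toy data: a uniform reference, and an observed set enriched in leucine.
reference = {aa: 10 for aa in AMINO_ACIDS}
observed = dict(reference, L=30)
stat = chi_square(observed, reference)
print(stat, stat > CRITICAL_DF19_P05)
```

Comparing the statistic with a tabulated critical value keeps the sketch free of external libraries; an exact probability would require the χ² cumulative distribution.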

Because the sizes of the compared sample sets differ, a percent difference test (Ma 1982, pp. 194–197; Steel and Torrie 1980) was carried out to evaluate the statistical significance of the difference in the usage of single amino acids between alternatively and non-alternatively spliced genes. The test compares the two proportions $\hat{p}_1 = x_1/n_1$ ($x_1$ = number of occurrences of a single amino acid in alternatively spliced genes; $n_1$ = overall number of amino acids in alternatively spliced genes) and $\hat{p}_2 = x_2/n_2$ ($x_2$ = number of occurrences of a single amino acid in non-alternatively spliced genes; $n_2$ = overall number of amino acids in non-alternatively spliced genes).
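A common textbook form of a test on the difference between two proportions is the pooled two-sample test sketched below. It is offered as an assumption about the general shape of such a test, not as the exact statistic used in the cited references, and the function names and example counts are illustrative only.

```python
import math


def percent_difference_u(x1, n1, x2, n2):
    """Pooled two-proportion statistic u = (p1 - p2) / SE (assumed textbook form)."""
    p1 = x1 / n1
    p2 = x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    variance = pooled * (1 - pooled) * (1 / n1 + 1 / n2)
    if variance == 0:
        raise ValueError("pooled proportion is 0 or 1; the statistic is undefined")
    return (p1 - p2) / math.sqrt(variance)


def two_sided_p(u):
    """Two-sided probability under the standard normal approximation."""
    return math.erfc(abs(u) / math.sqrt(2))


# Made-up counts: 700 of 10,000 residues versus 650 of 10,000 residues.
u = percent_difference_u(700, 10_000, 650, 10_000)
p_value = two_sided_p(u)
print(round(u, 3), round(p_value, 3))
```

With residue totals in the hundreds of thousands or more, even small differences in proportion yield large values of u, which is worth keeping in mind when reading counts of "significantly different" amino acids.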



    Results and Discussion
Amino Acid Usage in Alternatively and Non-alternatively Spliced Genes Among Six Eukaryotes
The overall amino acid usage in alternatively and non-alternatively spliced genes in human (A), mouse (B), rat (C), bovine (D), fruit fly (E), and C. elegans (F) is summarized in figure 1. As shown, the two types of genes exhibit a very similar tendency in amino acid usage, with the preferred amino acids (frequency ≥ 5%) in both groups being Ala (A), Glu (E), Gly (G), Lys (K), Leu (L), Pro (P), Ser (S), Val (V), Arg (R), Thr (T), and Asp (D). Notably, these amino acids fall into two sets, one hydrophobic (L, V, A, and P) and one hydrophilic (E, G, K, S, R, T, and D). Chiusano et al. (2000) proposed that the physicochemical properties of the most frequently used amino acids are reflected in their secondary structures, and that analysis of prokaryotic and human proteins suggested that whereas Leu, Glu, Lys, Ala, and Val prefer α-helix and β-sheet structures, Asp, Pro, Gly, and Ser prefer aperiodic structures. We show here that use of the above two groups of amino acids is also significantly biased among alternatively and non-alternatively spliced genes across the six species studied. On χ² test, significant differences were also found in amino acid usage between alternatively and non-alternatively spliced genes within each species (data not shown). To study this further, we calculated the difference in the usage of each amino acid between the two groups of genes in the six organisms and found significant differences in, respectively, 17 (human), 17 (mouse), 16 (rat), 15 (bovine), 15 (fruit fly), and 13 (C. elegans) of the 20 amino acids (table 2). It is also interesting to note that the amino acids with similar usage in alternatively and non-alternatively spliced genes are among the preferred ones, such as A, D, G, K, P, R, S, T, and V (fig. 1, table 2).
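The usage percentages and the "preferred" set (frequency ≥ 5%) can be derived as in the sketch below. The function names and toy sequences are illustrative assumptions; only the 5% threshold comes from the text.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def usage_percent(sequences):
    """Percentage of each of the 20 standard amino acids over all given sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(r for r in seq.upper() if r in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard residues found")
    return {aa: 100.0 * counts[aa] / total for aa in AMINO_ACIDS}


def preferred(usage, threshold=5.0):
    """Amino acids whose usage is at or above the threshold, in alphabetical order."""
    return sorted(aa for aa, pct in usage.items() if pct >= threshold)


# Toy sequences: 3 x Ala, 3 x Leu, 2 x Gly in total.
toy_usage = usage_percent(["AAAL", "LLGG"])
print(preferred(toy_usage))
```

Running the same two functions on the alternatively and non-alternatively spliced sets separately gives the per-group profiles that the comparison rests on.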



FIG. 1. Overall usage of amino acids among alternatively and non-alternatively spliced genes from human (A), mouse (B), rat (C), bovine (D), fruit fly (E), and C. elegans (F), and among highly expressed genes from human (A) and mouse (B)

 

Table 2 Differences in Overall Amino Acid Usage Between Alternatively and Non-alternatively Spliced Genes Among Six Eukaryotes.

 
To explore whether gene expression levels influence the bias in amino acid usage, we estimated expression levels of human and mouse genes by counting the number of matches between each gene and EST sequences, which may serve as an indirect measure of the abundance of gene transcripts (Duret and Mouchiroud 1999). Highly expressed genes were identified according to their numbers of ESTs (table 3). Indeed, we found that some housekeeping genes known to be expressed at high levels, such as those encoding translation elongation factor, actin, ribosomal protein, heat shock protein, and histone, have relatively large numbers of ESTs (>1,500) in our data sets. This finding affirms the validity of our approach. Subsequently, we compared amino acid usage between highly expressed alternatively and non-alternatively spliced genes. A very similar tendency was found between the two groups of genes in human and mouse, both in overall usage and in the preferentially used amino acids, e.g., Ala (A), Glu (E), Gly (G), Lys (K), Leu (L), Pro (P), Ser (S), Val (V), Arg (R), Thr (T), and Asp (D). On a percent difference test, a statistically significant difference was found in the use of, respectively, 15 (human) and 14 (mouse) of the 20 amino acids (data not shown). These results suggest that the difference in amino acid usage between alternatively and non-alternatively spliced genes remains significant in highly expressed genes from human and mouse. This seems to imply that, to maintain structural and functional stability, the studied organisms have probably all endured selective pressure during their evolution, which has resulted in the similar tendency in amino acid usage in both alternatively and non-alternatively spliced genes.


Table 3 Estimated Numbers of Relatively Highly Expressed Genes Based on Those of EST Sequences for Alternatively and Non-alternatively Spliced Genes in Human and Mouse.

 
The above findings seem to be in agreement with previous studies reporting that the basic pattern of cellular amino acid composition is relatively constant in all examined organisms (Okayasu et al. 1997; Sorimachi 1999, 2002; Sorimachi et al. 2000, 2001). Moreover, the significant differences in amino acid usage seem to imply biological changes as the result of evolution. The patterns of amino acid usage may therefore reflect evolutionary mechanisms that are crucial for understanding the molecular evolution and origin of alternatively spliced genes.

Protein Length Distribution in Alternatively and Non-alternatively Spliced Genes Among the Six Eukaryotes
We also compared the protein length distributions of alternatively and non-alternatively spliced genes from the six eukaryotes studied (table 4 and fig. 2). As shown in table 4, among all data sets, the average lengths (in amino acids) of the protein products of alternatively and non-alternatively spliced genes were, respectively, 656.8 and 484.6 (human), 625.7 and 439.0 (mouse), 759.7 and 443.9 (rat), 764.3 and 504.3 (fruit fly), 679.3 and 439.1 (C. elegans), and 604.2 and 349.0 (bovine), whereas the average lengths for highly expressed alternatively and non-alternatively spliced genes were, respectively, 686.3 and 432.6 (human) and 494.5 and 407.4 (mouse). These results suggest that, both across all genes and among the highly expressed genes, the protein products of alternatively spliced genes are on average significantly longer than those of non-alternatively spliced ones. Further analysis of the shorter isoform of each alternatively spliced gene showed that the average lengths of the shorter isoforms from human, mouse, rat, fruit fly, C. elegans, and bovine are, respectively, 563.4, 551.5, 591.2, 701.4, 613.0, and 512.1 amino acids, all greater than the averages for non-alternatively spliced genes (data not shown). Furthermore, as shown in figure 2, although the two types of genes have very similar protein length distributions in the six species, proteins shorter than 400 amino acids make up a markedly higher proportion of non-alternatively spliced genes in human. Nevertheless, highly expressed human and mouse genes have a protein length distribution similar to that of genes overall.
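The length comparison reduces to a few summary statistics per gene set, sketched below. The function names and toy data are assumptions made for illustration; the 400-residue cutoff is the one discussed in the text, and the second helper mirrors the "shorter isoform" analysis by taking the minimum isoform length per gene.

```python
from statistics import mean


def length_summary(sequences, cutoff=400):
    """Mean protein length and the fraction of proteins shorter than `cutoff` residues."""
    lengths = [len(seq) for seq in sequences]
    if not lengths:
        raise ValueError("no sequences given")
    below = sum(1 for n in lengths if n < cutoff)
    return {
        "n": len(lengths),
        "mean_length": mean(lengths),
        "fraction_below_cutoff": below / len(lengths),
    }


def shorter_isoform_lengths(isoforms_by_gene):
    """Length of the shortest isoform for each gene (gene -> list of isoform sequences)."""
    return [min(len(seq) for seq in isoforms) for isoforms in isoforms_by_gene.values()]


# Toy data: placeholder sequences of known length.
summary = length_summary(["A" * 100, "A" * 500])
shortest = shorter_isoform_lengths({"gene1": ["A" * 10, "A" * 30], "gene2": ["A" * 50]})
print(summary, shortest)
```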


Table 4 Summary of Protein Lengths of Alternatively and Non-alternatively Spliced Genes.

 


FIG. 2. Distribution of protein lengths of alternatively and non-alternatively spliced genes in human, mouse, rat, fruit fly, C. elegans, and bovine

 
We selected the subset of alternatively spliced genes from the Swiss-Prot database primarily because that database provides the largest set of experimentally verified splicing isoforms. To reduce the "noise" in the subset, we relied on the program varsplic.pl, which was developed for creating new records for alternatively spliced isoforms from the information provided by the Swiss-Prot database (Kersey, Hermjakob, and Apweiler 2000). This may effectively reduce the number of alternatively spliced genes contained within the subset of non-alternatively spliced ones.

The influence of alternatively spliced genes that may have been mixed into the non-alternative subset has been evaluated. As shown in figure 3, when subset A is mixed with some of subset C2, the mean for subset C1, as estimated from all of subset A, will change; for example, it will increase if the mean of subset C2 is greater than that of C1. Accordingly, we assume that the non-alternatively and alternatively spliced gene data sets have mean values of µ1 and µ2, respectively. Using c1 and c2 to denote the sample sizes of the non-alternatively and alternatively spliced gene subsets and f to denote the proportion between C2 and A, subset A, with a mean value of µ1 + fµ2 and a size of m (m = c1 + fc2), will be the sum of subsets C1 and C2, whereas subset B, with a mean value of µ2 and a size of c2, will be equal to subset C2.



FIG. 3. The supposed two subsets of data (A) and (B)

 
Hence,


then,


If


then


Therefore, the higher f is, the higher f/(1 – f) and the lower µ1 will become, and vice versa.

It can be seen clearly that the influences of potentially mixed-in alternatively spliced genes in the non-alternatively spliced subset may be negligible.
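The direction of this effect can be checked numerically with a simple two-component mixture. The sketch below is not the authors' derivation: it assumes that a fraction f of the observed non-alternative subset actually consists of alternatively spliced genes, so that the observed mean equals (1 − f)µ1 + fµ2, and then solves for µ1. The function name is hypothetical; the two means are the human values quoted from table 4.

```python
def corrected_mean(observed_mean, alt_mean, f):
    """Solve observed_mean = (1 - f) * mu1 + f * alt_mean for mu1.

    `f` is the assumed fraction of the observed subset that is really
    alternatively spliced (0 <= f < 1).
    """
    if not 0 <= f < 1:
        raise ValueError("f must satisfy 0 <= f < 1")
    return (observed_mean - f * alt_mean) / (1 - f)


# Human averages quoted in the text: 484.6 (non-alternative) and 656.8 (alternative).
for f in (0.0, 0.05, 0.10, 0.20):
    print(f, round(corrected_mean(484.6, 656.8, f), 1))
```

Under this assumption, any contamination can only lower the estimate of µ1, so the observed gap between the two groups would, if anything, understate the true difference in mean length.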

Implications for the Origin of Alternatively Spliced Genes
Our findings seem to imply that alternatively spliced genes might have originated from non-alternatively spliced ones through DNA mutation or gene fusion. Results from other studies have also implied that mutation events may play an important role in the generation of alternatively spliced genes at the early stages of eukaryote evolution.

Mutational events during the course of evolution may have resulted in loss or retention of introns in the mRNA, as well as shift of transcription/polyadenylation/splice sites, which may all enable production of multiple mRNAs from a single gene. Various alternative splicing patterns have been identified among organisms (Wang, Selvakumar, and Helfman 1997; König, Ponta, and Herrlich 1998; Lopez 1998; Muro, Iaconcig, and Baralle 1998; Smith and Valcarcel 2000; Coward, Haas, and Vingron 2002). It has been discovered that some beneficial mutations could lead to a new gene function (Ohno 1970; Li 1997; Wagner 1998; Holland 1999; Hughes 1999), and that mutations that may result in the emergence of multiple alternative splicing variants are randomly distributed (Kriventseva et al. 2003). Luzi et al. (2000) revealed that the alternatively spliced mammalian shc, rai, and sli genes may have evolved from a single ancestor. It has been proposed that splicing errors might have been caused by mutations in the genomic DNA that have either destroyed a normal splicing signal or created a new one (Cooper and Mattox 1997), whereas other types of "mis-splicing" might have been caused by mutations in splicing regulatory proteins (Blencowe 2000; Jensen et al. 2000; Reenan, Hanrahan, and Ganetzky 2000). Graveley (2001) has suggested that splicing "errors" may also result from mutations of the pre-mRNA during transcription. Recent studies on C. elegans have suggested that the major targets of the mRNA surveillance system are not aberrantly spliced RNAs, but rather transcripts that are deliberately spliced to contain premature stop codons (Mitrovich and Anderson 2000).

The greater average length of the alternatively spliced genes in the six eukaryotes suggests that such genes might have arisen through gene fusion or insertion of exogenous genes/segments. Other mechanisms that may bear on the origin of alternatively spliced genes include fusion (Thompson et al. 2000) or deletion between two adjacent genes (Nurminsky et al. 1998), DNA duplication (Kim et al. 1992; Bark 1993; Stewart and Denell 1993; Hayward and Bonthron 1998; Kondrashov and Koonin 2001; Letunic, Copley, and Bork 2002), insertion (Kondrashov and Koonin 2003), and exon creation and/or loss (Modrek and Lee 2003), among others.

Our analysis may provide an important clue for understanding the origin and evolutionary mechanisms of alternative splicing, although further research in the field is still needed.


    Acknowledgements
We are indebted to anonymous reviewers for their insightful comments. This study was supported by the Postdoctoral Science Foundation of China (grant number 200211) and a grant from the Science and Technology Ministry of China (Preliminary Study on Functional Genome Systematics) (grant number 2001CCA01400).


    Footnotes
 
1 The first two authors should be regarded as joint first authors.

Peer Bork, Associate Editor


    Literature Cited

    Bark, I. C. 1993. Structure of the chicken gene for SNAP-25 reveals duplicated exon encoding distinct isoforms of the protein. J. Mol. Biol. 233:67-76.

    Blencowe, B. J. 2000. Exonic splicing enhancers: mechanism of action, diversity and role in human genetic diseases. Trends Biochem. Sci. 25:106-110.

    Brett, D., H. Pospisil, J. Valcarcel, J. Reich, and P. Bork. 2002. Alternative splicing and genome complexity. Nat. Genet. 30:29-30.

    Chiusano, M. L., F. Alvarez-Valin, M. Di Giulio, G. Donofrio, G. Ammirato, G. Colonna, and G. Bernardi. 2000. Second codon positions of genes and the secondary structures of proteins. Relationships and implications for the origin of the genetic code. Gene 261:63-69.

    Cooper, T. A., and W. Mattox. 1997. The regulation of splice-site selection, and its role in human disease. Am. J. Hum. Genet. 61:259-266.

    Coward, E., S. A. Haas, and M. Vingron. 2002. SpliceNest: visualizing gene structure and alternative splicing based on EST clusters. Trends Genet. 18:53-55.

    Duret, L., and D. Mouchiroud. 1999. Expression pattern and, surprisingly, gene length shape codon usage in Caenorhabditis, Drosophila, and Arabidopsis. Proc. Natl. Acad. Sci. USA 96:4482-4487.

    Graveley, B. R. 2001. Alternative splicing: increasing diversity in the proteomic world. Trends Genet. 17:100-107.

    Hayward, B. E., and D. T. Bonthron. 1998. Structure and alternative splicing of the ketohexokinase gene. Eur. J. Biochem. 257:85-91.

    Holland, P. W. 1999. Gene duplication: past, present and future. Semin. Cell Dev. Biol. 10:541-547.

    Hughes, A. L. 1999. Adaptive evolution of genes and genomes. Oxford University Press, New York, Oxford.

    Jensen, K. B., B. K. Dredge, G. Stefani, R. Zhong, R. J. Buckanovich, H. J. Okano, Y. Y. L. Yang, and R. B. Darnell. 2000. Nova-1 regulates neuron-specific alternative splicing and is essential for neuronal viability. Neuron 25:359-371.

    Kersey, P., H. Hermjakob, and R. Apweiler. 2000. Varsplic: Alternatively-spliced protein sequences derived from Swiss-Prot and TrEMBL. Bioinformatics 16:1048-1049.

    Kim, J., J. J. Yim, S. Wang, and D. Dorsett. 1992. Alternate use of divergent forms of an ancient exon in the fructose-1,6-bisphosphate aldolase gene of Drosophila melanogaster. Mol. Cell. Biol. 12:773-783.

    Kondrashov, F. A., and E. V. Koonin. 2001. Origin of alternative splicing by tandem exon duplication. Hum. Mol. Genet. 10:2661-2669.

    Kondrashov, F. A., and E. V. Koonin. 2003. Evolution of alternative splicing: deletions, insertions and origin of functional parts of proteins from intron sequences. Trends Genet. 19:115-119.

    König, H., H. Ponta, and P. Herrlich. 1998. Coupling of signal transduction to alternative pre-mRNA splicing by a composite splice regulator. EMBO J. 17:2904-2913.

    Kriventseva, E. V., I. Koch, R. Apweiler, M. Vingron, P. Bork, M. S. Gelfand, and S. Sunyaev. 2003. Increase of functional diversity by alternative splicing. Trends Genet. 19:124-128.

    Letunic, I., R. Copley, and P. Bork. 2002. Common exon duplication in animals and its role in alternative splicing. Hum. Mol. Genet. 11:1561-1567.

    Li, W. H. 1997. Molecular evolution. Sinauer Associates, Sunderland, Mass.

    Lopez, A. J. 1998. Alternative splicing of pre-mRNA: developmental consequences and mechanisms of regulation. Annu. Rev. Genet. 32:279-305.

    Luzi, L., S. Confalonieri, P. P. Di Fiore, and P. G. Pelicci. 2000. Evolution of Shc functions from nematode to human. Curr. Opin. Genet. Dev. 10:668-674.

    Ma, Y. 1982. Experimental statistics. Agriculture Press, Beijing [in Chinese].

    Mikheeva, S., M. Hakim-Zargar, D. Carson, and K. A. Jarrell. 1997. Use of an engineered ribozyme to produce a circular human exon. Nucleic Acids Res. 25:5085-5094.

    Mitrovich, Q. M., and P. Anderson. 2000. Unproductively spliced ribosomal protein mRNAs are natural targets of mRNA surveillance in C. elegans. Genes Dev. 14:2173-2184.

    Modrek, B., and C. Lee. 2002. A genomic view of alternative splicing. Nat. Genet. 30:13-19.

    Modrek, B., and C. Lee. 2003. Alternative splicing in the human, mouse and rat genomes is associated with an increased frequency of exon creation and/or loss. Nat. Genet. 34:1-4.

    Modrek, B., A. Resch, C. Grasso, and C. Lee. 2001. Genome-wide analysis of alternative splicing using human expressed sequence data. Nucleic Acids Res. 29:2850-2859.

    Muro, A. F., A. Iaconcig, and F. E. Baralle. 1998. Regulation of the fibronectin EDA exon alternative splicing. Cooperative role of the exonic enhancer element and the 5' splicing site. FEBS Lett. 437:137-141.

    Nurminsky, D. I., M. V. Nurminskaya, E. V. Benevolenskaya, Y. Y. Shevelyov, D. L. Hartl, and V. A. Gvozdev. 1998. Cytoplasmic dynein intermediate-chain isoforms with different targeting properties created by tissue-specific alternative splicing. Mol. Cell. Biol. 18:6816-6825.

    Ohno, S. 1970. Evolution by gene duplication. Springer-Verlag, New York.

    Okayasu, T., M. Ikeda, K. Akimoto, and K. Sorimachi. 1997. The amino acid composition of mammalian and bacterial cells. Amino Acids 13:379-391.

    Reenan, R. A., C. J. Hanrahan, and B. Ganetzky. 2000. The mle(napts) RNA helicase mutation in Drosophila results in a splicing catastrophe of the para Na+ channel transcript in a region of RNA editing. Neuron 25:139-149.

    Smith, C. W., and J. Valcarcel. 2000. Alternative pre-mRNA splicing: the logic of combinatorial control. Trends Biochem. Sci. 25:381-388.

    Sorimachi, K. 1999. Evolutionary changes reflected by the cellular amino acid composition. Amino Acids 17:207-226.

    Sorimachi, K. 2002. The classification of various organisms according to the free amino acid composition change as the result of biological evolution. Amino Acids 22:55-69.

    Sorimachi, K., T. Itoh, Y. Kawarabayasi, T. Okayasu, K. Akimoto, and A. Niwa. 2001. Conservation of the basic pattern of cellular amino acid composition of archaeobacteria during biological evolution and the putative amino acid composition of primitive life forms. Amino Acids 21:393-399.

    Sorimachi, K., T. Okayasu, K. Akimoto, and A. Niwa. 2000. Conservation of the basic pattern of cellular amino acid composition during biological evolution in plants. Amino Acids 18:193-197.

    Steel, R. G. D., and J. H. Torrie. 1980. Principles and procedures of statistics, a biometrical approach. 2nd edition. McGraw-Hill Book Company, New York.

    Stewart, M. J., and R. Denell. 1993. The Drosophila ribosomal protein S6 gene includes a 3' triplication that arose by unequal crossing-over. Mol. Biol. Evol. 10:1041-1047.

    Thompson, T. M., J. J. Lozano, N. Loukili, R. Carrió, F. Serras, B. Cormand, M. Valeri, V. M. Díaz, J. Abril, and M. Burset. 2000. Fusion of the human gene for the polyubiquitination coeffector UEV1 with Kua, a newly identified gene. Genome Res. 10:1743-1756.

    Wagner, A. 1998. The fate of duplicated genes: loss or new function? Bioessays 20:785-788.

    Wang, Y.-C., M. Selvakumar, and D. M. Helfman. 1997. Alternative pre-mRNA splicing. Pp. 242–279 in A. R. Krainer, ed. Eukaryotic mRNA processing. Oxford University Press, Oxford.

Accepted for publication June 18, 2003.