The average vertebrate gene consists of multiple small exons (average size, 137 nucleotides) separated by introns that are considerably larger(1) . Thus, the vertebrate splicing machinery has the task of finding small desired exons amid much longer introns. The splice site consensus sequences that drive exon recognition are located at the very termini of introns(2, 3) . Despite the discriminatory challenge faced during exon recognition in large multiexon premessenger RNAs, vertebrate splice sites are short and poorly conserved. In fact, splice site sequences in mammals are less conserved than their yeast counterparts despite the fact that only a minority of genes in Saccharomyces cerevisiae have introns; and those genes that are split by introns usually have only a single intron(4, 5) . Thus, vertebrate splicing contends with a more complex specificity problem via recognition of less precise consensus sequences. Any mechanism for the orchestration of splicing in multiexon vertebrate genes must provide an explanation for this puzzle.
Part of the solution of the puzzle comes from the observation that individual splice sites are not independently recognized consensus sequences. In both yeast and vertebrate splicing, interactions between 5` and 3` splice sites and the factors that recognize them have been observed during the earliest steps of spliceosome assembly(4, 5, 6, 7, 8, 9, 10, 11, 12) . Usually these interactions are depicted as occurring between the 5` and 3` splice sites across an intron. Experimentally, such interactions have been observed with in vitro splicing precursor RNAs having naturally short or artificially shortened introns. It is difficult to extrapolate initial interactions between the factors that recognize the 5` and 3` splice sites flanking a small vertebrate intron to introns that can naturally be 100 kilobases in length, especially given the likelihood that such introns will contain sequences that are as good a match to consensus splice sites as the actual utilized sites.
Models that invoke pairing between the splice sites across an
exon, as contrasted with pairing across an intron, are useful
perspectives of splice site pairing for the splicing of pre-mRNAs with
large introns and small exons. Such an exonic perspective of splice
site recognition has been termed "exon
definition"(10) . This review discusses exon definition
and contrasts it with intron-oriented perspectives that are more useful
when considering splicing in lower eukaryotes with small introns. The
basic exon definition model proposes that in pre-mRNAs with large
introns, the splicing machinery searches for a pair of closely spaced
splice sites in an exonic polarity (Fig. 1). When such a pair is
encountered, the exon is defined by the binding of U1 and U2 snRNPs and associated splicing factors, including the 3` splice
site-recognizing factors U2AF and SC35 and the 5` splice
site-recognizing factor
ASF/SF2(2, 13, 14, 15, 16) .
Following definition of the exon, neighboring exons must be juxtaposed,
presumably via interactions between the factors that recognize
individual exons. Thus, from this perspective, assembly of the active
vertebrate spliceosome consists of the sequential steps of exon
definition and exon juxtaposition.
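The search step of the basic model can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not a splice site predictor: the dinucleotides AG and GT stand in for the much longer 3` and 5` splice site consensus sequences, the sequence is written as DNA, and the function name `find_exon_candidates` and the length bounds `min_len` and `max_len` are hypothetical values chosen for the example rather than taken from the text.

```python
from typing import List, Tuple


def find_exon_candidates(
    seq: str, min_len: int = 50, max_len: int = 300
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) intervals of candidate exons.

    A candidate begins immediately after an 'AG' (toy 3' splice site) and
    ends immediately before a downstream 'GT' (toy 5' splice site), so the
    two sites are paired across the exon. Only pairs whose spacing lies
    between min_len and max_len (hypothetical bounds) are reported.
    """
    seq = seq.upper()
    # Position just past each toy 3' splice site (first base of the exon).
    acceptors = [i + 2 for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]
    # Position of each toy 5' splice site (first base after the exon).
    donors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]

    candidates = []
    for start in acceptors:
        for end in donors:
            if min_len <= end - start <= max_len:
                candidates.append((start, end))
    return candidates


# A 100-nucleotide exon flanked by toy splice sites is reported as one pair.
paired = "C" * 20 + "AG" + "C" * 100 + "GT" + "C" * 20
print(find_exon_candidates(paired))

# An isolated toy 5' splice site deep in an "intron" yields no candidate.
isolated = "C" * 500 + "GT" + "C" * 50
print(find_exon_candidates(isolated))
```

In this sketch a lone site never produces a candidate, which mirrors the idea that requiring a closely spaced pair in exonic polarity limits recognition of isolated cryptic sites within long introns.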
Figure 1: Exon Definition in pre-mRNAs with small exons and large introns. snRNPs (red and green) and SR protein (yellow) are shown interacting with isolated exons during exon definition. U4/U5/U6 snRNPs (black) are depicted as joining the assembly during the subsequent step of exon juxtaposition.
Predictions of Exon Definition
The exon definition model offers predictions of pre-mRNA behavior. Several of these predictions have been tested in the last several years, and the results lend credence to an exonic perspective of splice site recognition.
Figure 2: Predictions of the phenotype of mutation of the 5` splice site bordering an internal exon. Exon pairing of splice sites predicts exon skipping or the activation of a proximal cryptic 5` splice site (left), whereas intronic pairing of splice sites predicts intron inclusion or distal cryptic site activation (right).
Mutation of vertebrate splice sites also leads to exon skipping. A survey of mammalian mutations available in the database in the summer of 1994 indicated that over 100 splice site mutations have been characterized in disease gene DNA(21) . Four phenotypes were observed: exon skipping, activation of a cryptic splice site, creation of a pseudo-exon within an intron, and intron retention, at frequencies of 51, 32, 11, and 6%, respectively. Exon skipping, the most frequent phenotype, is predicted from an exon perspective because mutation of the splice site at one side of an exon should inhibit pairing of splice sites across the exon and thereby inhibit recognition of the exon. Rejection of the exon leads directly to exon skipping.
The observation of exon skipping strongly indicates that splice sites are recognized as exonic pairs. It is presumably this dependence upon a pair of sites that minimizes recognition of isolated cryptic sites within large vertebrate introns. Occasionally, mutation of human genes has created a strong splice site deep within an intron. Such created sites have been observed to be utilized via the activation of a nearby cryptic splice site of the opposite polarity to create a pseudo-exon from within an intron. Again, the observation is that only pairs of splice sites can be recognized and that cryptic splice sites in introns can only be activated by creation of a nearby site of the opposite type in an exonic polarity.
Occasionally, mutation of an internal splice site results in intron retention. Exon definition would not predict intron retention, except perhaps for very small introns. Of the splice site mutations mentioned above, only 6% caused intron retention. Four of the included introns were very short, and three were terminal introns, suggesting abrogation of exon definition modes of recognition when introns are very small or at the ends of pre-mRNAs (see below). Three examples involved large internal introns and cannot be explained by current exon perspectives.
Figure 3: Internal exon size distribution. Length distribution of 1600 primate internal exons from a library normalized to represent highly related exons only a single time (top) (library kindly provided by D. Searles, University of Pennsylvania) or 194 alternative vertebrate cassette exons (bottom) compiled by Stamm et al.(46) or by S. Smith and T. A. Cooper (Baylor College of Medicine).
Last exons begin with a 3` splice site and terminate with a poly(A) site(30) . They are often the largest exon in a vertebrate gene, with an average size of approximately 600 nucleotides(1, 31) . Exon recognition predicts that factors recognizing 3` splice sites interact with factors recognizing poly(A) sites to recognize last exons. Indeed, mutation of 3` splice sites inhibits the in vitro polyadenylation cleavage reaction(32) . Just as with first exons, mutation of the signal at the distal end of a 3`-terminal exon, the poly(A) site, inhibits in vitro removal of proximal but not distal introns(33) . These results suggest that splicing and polyadenylation factors interact across 3`-terminal exons. The mechanism of this interaction is unclear, although recent observations have suggested that U1 snRNPs or the U1 snRNP A protein are involved, either positively or negatively, via recognition of exon internal sequences upstream of the polyadenylation signal AAUAAA(34, 35, 36) .
Exon Enhancer Sequences and Differential Splicing
Exon definition has proven to be a useful framework for considering differential splicing, especially those differential splicing events involving cassette exons that are differentially included. Generally, differentially recognized exons have either weaker splicing signals or a suboptimal length compared with constitutive exons (3, 37) (Fig. 3), suggesting that the constitutive exon definition process is so strong as to be difficult to regulate unless the involved exon recognition signals are weak. Exon inclusion in these cases appears to occur via recognition of special sequences by tissue- or development-specific splicing factors(38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49) . One class of sequences commonly found associated with differential exons, referred to as exon enhancers, resides within the target exon. The existence of exon internal consensus sequences was initially surprising because of the constraints imposed upon such sequences by coding requirements. A family of such sequences, often purine-rich and coding for a wide variety of amino acids, has been observed to be important for recognition of weak exons. These sequences appear to be the binding site for a family of splicing factors known as SR proteins because of the arginine and serine repeats that characterize them(50) . In addition to binding exon sequences via their RNA binding domains, the SR proteins also make protein-protein contacts with U2AF bound near the 3` splice site and U1 snRNPs bound to the 5` splice site via the arginine-serine-rich domains present in each(12, 15, 51) . Such recognition makes the SR proteins ideal candidates for exon-bridging proteins involved in exon definition. Bridging across exons has been experimentally detected in that UV cross-linking of U2AF to the 3` splice site of an isolated exon is affected by the strength of the 5` splice site terminating the exon(52) .
Interestingly, SR proteins have not yet been found in S.
cerevisiae. Even those yeast splicing proteins that are equivalent
in known function to vertebrate proteins containing SR domains lack SR
domains in their yeast forms(53, 54) . From an exon
definition viewpoint, this absence may not be surprising in that
organisms with small introns, such as S. cerevisiae, may not
use exon definition and therefore may not need many or all of the SR
proteins. In general, evidence exists to suggest that pre-mRNAs with
small introns use the intron, rather than the exon, as the initial mode
of pairing between splice sites(4, 5, 11) .
In Schizosaccharomyces pombe, pre-mRNAs have multiple small introns
of less than 100 nucleotides(55) . In Drosophila, 50%
of the introns are less than 100 nucleotides and are often flanked by
large exons(1, 56) . Expanding small introns in either
organism inhibits splicing of the intron or activates cryptic sites
within the expanded introns(57, 58) . Thus, in genes with small exons, expanding the exon leads
to aberrant splicing, whereas in genes with small introns, expanding
the introns leads to aberrant splicing. These observations suggest
that the pairing unit utilized is that offering the smallest distance
between two adjacent splice sites.
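The shortest-distance idea can be written as a very small rule. The sketch below is a deliberate simplification and not a model of spliceosome assembly: the function name `pairing_unit` is hypothetical, and the comparison ignores splice site strength, enhancer sequences, and every other factor that influences recognition.

```python
def pairing_unit(exon_len: int, intron_len: int) -> str:
    """Name the unit that offers the shorter span between adjacent splice sites.

    exon_len and intron_len are lengths in nucleotides of an exon and of a
    neighboring intron. The rule is a toy reading of the shortest-distance
    idea and deliberately ignores all other determinants of recognition.
    """
    if exon_len < intron_len:
        return "exon"
    if intron_len < exon_len:
        return "intron"
    return "ambiguous"


# Small exon next to a very large intron: pairing across the exon.
print(pairing_unit(exon_len=137, intron_len=100_000))

# Large exon next to a small intron: pairing across the intron.
print(pairing_unit(exon_len=1_000, intron_len=80))
```

The first call uses the average vertebrate exon size and the large intron length mentioned earlier in the text; the second uses illustrative values for a gene with small introns and large exons.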
Mutation of splice sites in genes with small introns has a different phenotype from the same mutation in genes with large introns. In pre-mRNAs with small introns, mutation of an internal 5` splice site does not lead to exon skipping. Instead, the mutated intron is included in the final mRNA and the splicing of neighboring introns is unaffected (58) . A difference in splicing signals between the two types of introns has also been noticed(56, 57) . Small introns often lack the pyrimidine tract located between the branch point and the 3` splice site of vertebrate but not S. cerevisiae introns. Therefore, small introns appear to have different signals and to be recognized somewhat differently than large introns.
Initial pairing of splice
sites across an exon may be similar to initial pairing across an
intron. Except for the SR proteins, the vertebrate factors known to be
required for splicing are found in yeast and are required there as
well. Several lines of experimental evidence also suggest that either
the intron or exon can be the pairing unit during pre-mRNA recognition.
As mentioned earlier, expansion of an internal exon in a vertebrate
gene can cause exon skipping. If the same exons and their flanking
splice sites, however, are placed in a gene in which the introns
flanking the expanded exon are small, the expanded exon is
constitutively included(59) . Expansion of the
small introns reverts the phenotype to exon skipping. These
observations suggest that large exons are only a problem in genes with
large introns, and more importantly, that the same splice sites can be
recognized in either intronic or exonic polarity (Fig. 4).
Figure 4: Earliest complex formation in vertebrates via exon definition versus that in lower eukaryotes via intron definition.
Exon/intron architecture in Drosophila melanogaster also suggests multiple ways of pairing splice sites within the same pre-mRNA. Although many Drosophila genes fit neatly into two categories, genes with small introns and large exons or genes with small exons and large introns, a reasonable number of genes have a mixed exon/intron architecture, suggesting that over part of their length the exon is the unit of recognition and over part of their length the intron is the unit of recognition. Sorting out how two such recognition mechanisms can operate within the same precursor RNA without a disruption in exon recognition or exon ordering is one of the future challenges for exon definition.
Although exon definition suggests how exons and their splice sites are initially recognized by the splicing machinery, it does not immediately offer a solution to the second step in spliceosome assembly (Fig. 1). Juxtaposition of exons across large vertebrate introns is a formidable problem, especially if inadvertent exon skipping is to be avoided. Little insight is available as to how such juxtapositioning could occur. A likely scenario invokes interactions between the SR proteins bound to one exon and the SR proteins bound to an adjoining exon. In addition to the SR proteins, another class of nuclear proteins found only in organisms with large introns is the hnRNP proteins(60) . At least one hnRNP protein affects 5` splice site recognition and is likely to have a major role in differential splicing (22, 61) . Like the SR proteins, the hnRNP proteins contain both an RNP recognition domain and a protein-protein recognition domain. Unlike the SR proteins, the limited information available suggests that the hnRNP proteins recognize intronic consensus sequences rather than exonic sequences. Given their capacity to differentially recognize RNA sequences and their preference for intronic sequences, the hnRNP proteins remain potentially interesting players in both differential splicing and exon juxtapositioning.