Section of Evolution and Ecology, Division of Biology, University of California, Davis
Introduction
In Clarkia the PgiC gene was duplicated before the divergence of the extant species. The two paralogous genes, designated PgiC1 and PgiC2, have an average pairwise nucleotide identity of about 96% in exons and 88% in introns (Gottlieb and Ford 1996). Comparison of genomic library clones from three species revealed a conserved region about 430–490 nt upstream of the translation start. A primer that binds within this conserved region, together with primers against exons, was used in PCR to obtain templates for sequencing the upstream region from other species. The 5' regions of 26 PgiC genes from Clarkia and one from the related O. mexicana were examined (Gottlieb and Ford 1996; Ford and Gottlieb 1999, 2002, unpublished data). Most of the genes, including both PgiC1s and PgiC2s, have a perfectly aligned TATA sequence about 80 bases upstream of the start codon, which seemed a likely binding site for the TATA-binding protein during transcription initiation. However, seven genes from diploid species (and their orthologues from tetraploids) have one or more of the four bases deleted or substituted (fig. 1). Because all these genes encode active PGIC isozymes (Gottlieb and Weeden 1979; Weeden and Gottlieb 1979; unpublished data), they must use either a TATA located elsewhere or a nonconsensus sequence.
In every species we found an intron in the leader sequence of PgiC. In C. xantiana the +1 site and TATA box lie well upstream of the previously noted TATA near the start codon, so this latter TATA plays no role in transcription initiation. Here, we describe the structure of the 5'-nontranslated region of PgiC. We then compare the evolution of the leader intron and the leader exon with that of the introns and exons in the coding portion of the gene. Leader introns are known from relatively few plant genes and have not previously been compared with the introns in the coding regions. The comparison is significant because it is not known whether exons and introns outside the coding region of a gene are subject to different constraints from those within it, apart from the obvious coding constraints on exons.
Materials and Methods
RNA Isolation and RLM-RACE
The RLM-RACE experiments used RNA prepared from one individual each of C. xantiana subsp. parviflora, C. arcuata, and C. modesta and from a single line of A. thaliana. In each case we used the same PgiC genotype previously used for genomic sequencing. Total RNA was prepared with the RNeasy Plant Total RNA Kit (Qiagen) from 50–60 mg of leaves of single seedlings. Before each experiment, the total RNA pool was checked to confirm that it contained PgiC transcripts. To do this, we reverse transcribed an aliquot of the RNA pool with an upstream-pointing primer that binds exon 6. We then carried out PCR on the cDNA with primers that bind exons 1 and 4, or another combination of exons near the 5' end of the gene (fig. 2). The expected fragment was observed in every case, so RLM-RACE was carried out on a second aliquot of the RNA pool.
For C. xantiana PgiC1, reverse transcription used a PgiC-specific primer against exon 6. The first PCR on the cDNA was carried out with primers against the outer site on the oligo and against exon 5. The second PCR, on a 125-fold dilution of PCR1, used primers against the inner site on the oligo and a PgiC1-specific site in exon 4 (fig. 2). (Similar primers were used for other species and genes.) After electrophoresis and ethidium bromide staining, the agarose gel of the PCR2 reaction showed a smear of fragments between 350 and 450 nt. Fragments were cut from the gel. The DNA was recovered with the Zymoclean DNA Gel Recovery Kit (Zymo Research) and cloned with the TOPO TA Cloning Kit for Sequencing (Invitrogen). PCR using PgiC1-specific primers against exons 1 and 4 confirmed that the clones carried PgiC1 fragments. Restriction enzyme analysis identified clones with inserts of varying length. However, none of the inserts initially sequenced proved to be full length, i.e., none included the transcription start site.
Several factors complicated the identification of clones carrying full-length inserts:
- relatively few clones contained full-length inserts;
- at the start, we did not know how long the leader might be;
- as we eventually learned, the true full-length inserts varied in size because of alternative splicing (see Results).

Also, small differences in length are difficult to resolve on a gel. PCR experiments using the LP primer (the primer that binds in the conserved region 430–490 nt upstream of the start codon) showed that certain clones included the LP-binding site. These proved to be of full or nearly full length. PCR tests using the LP primer and the primer against exon 4 were subsequently used to identify additional full-length clones for both genes in C. xantiana. In total, 63 RLM-RACE clones were sequenced on a Perkin-Elmer ABI 377 sequencer. The sequences were aligned as described earlier (Gottlieb and Ford 1996). Accession numbers for sequences of representative RACE clones are AJ419548–AJ419555. Only the longest clone of each variant pattern was submitted.
Genomic Sequences
Genomic sequences of PgiC1 (X89386) and PgiC2 (X89387) from C. xantiana subsp. parviflora were obtained previously (Ford, Thomas, and Gottlieb 1995). To look for variation in the leader intron splice junctions and adjacent regions, we examined genomic DNA from eight other alleles of PgiC1 and 10 of PgiC2. These were sampled from seven other geographically dispersed populations of C. xantiana subsp. parviflora, recently discovered by Eckhart and Geber (1999), and from three populations of the outcrossing subsp. xantiana. Sequencing templates were generated by PCR using the LP primer (fig. 2) and a primer that binds in exon 1. The accession numbers are AJ419527–AJ419547.
Genomic sequences were obtained previously for the PgiC1s and PgiC2s of other Clarkia species, including C. arcuata and C. modesta, and for PgiC of O. mexicana, used as an out-group (Gottlieb and Ford 1996; Ford and Gottlieb 1999, 2002, unpublished data; X89384–X89397, AJ302021–AJ302022, AJ312367–AJ312371, AJ437270–AJ437271, AJ437274–AJ437278).
PgiC from A. thaliana (X69195) was originally described from both cDNA and genomic clones (Thomas et al. 1993). Additional upstream sequence was obtained from both the cDNA clone A12C (AJ419524) and the genomic library clone. The newly obtained genomic sequence was in full agreement with the corresponding GenBank sequence AB007647 (A. thaliana chromosome 5, P1 clone MJB21). A PgiC cDNA clone from O. sativa (rice) (D45217) was compared with the corresponding genomic sequence from AC091494 (O. sativa chromosome 3, BAC clone OSJNBa0070N04).
Sequencing of PCR-amplified fragments and plasmid subclones of the genomic library clones was performed as described earlier (Gottlieb and Ford 1996).
Results
We compared the rice cDNA sequence (D45217) with the corresponding region of the rice chromosome 3 sequence (AC091494, version of September 14, 2001). The aim was to determine whether PgiC from a monocot also has a leader intron, and whether the introns in its coding region correspond to those in Clarkia and A. thaliana. The match with the cDNA begins at position 73207 of the chromosome sequence, and the start codon begins at position 73466. Alignment of the cDNA and genomic sequences showed that the rice gene also has a leader intron and 22 introns in the coding region, in exactly the same positions as those in Clarkia. The leader intron is 195 nt long, is located 9 nt upstream of the start codon, and has consensus dinucleotide borders.
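The cDNA-to-genomic comparison described above can be illustrated with a minimal Python sketch. It makes two simplifying assumptions: there is a single intron, and both sides match exactly and without gaps. Real comparisons use alignment software that tolerates mismatches. The function name `find_single_intron` and the toy sequences are hypothetical and are not taken from D45217 or AC091494.

A junction is ambiguous when the last bases of the upstream exon repeat at the end of the intron. The sketch therefore tries each equivalent junction position and prefers GT...AG borders.

```python
def find_single_intron(genomic, cdna, seed=5):
    """Locate one intron by exact matching of a cDNA to genomic DNA.

    Returns 0-based (start, end) genomic coordinates of the intron, with `end`
    exclusive, or None if the cDNA matches contiguously or cannot be placed.
    """
    start = genomic.find(cdna[:seed])  # anchor the 5' end of the cDNA
    if start < 0:
        return None
    # Extend the upstream exon as far as genomic and cDNA agree.
    k = 0
    while (k < len(cdna) and start + k < len(genomic)
           and genomic[start + k] == cdna[k]):
        k += 1
    tail = cdna[k:]  # remainder of the cDNA = downstream exon
    if not tail:
        return None  # whole cDNA matched contiguously: no intron
    j = genomic.find(tail, start + k)
    if j < 0:
        return None
    i = start + k
    # Equivalent junctions: shift left while the base before the intron
    # equals the last base of the intron (the spliced product is unchanged).
    candidates = [(i, j)]
    while i > start and genomic[i - 1] == genomic[j - 1]:
        i, j = i - 1, j - 1
        candidates.append((i, j))
    # Prefer canonical GT...AG borders; otherwise keep the greedy placement.
    for a, b in candidates:
        if genomic[a:a + 2] == "GT" and genomic[b - 2:b] == "AG":
            return a, b
    return candidates[0]


exon_a, intron, exon_b = "TTACCGATCA", "GTAAGTTTTATTTTGCAG", "GCTATGGC"
genomic = "CCC" + exon_a + intron + exon_b + "TTT"
print(find_single_intron(genomic, exon_a + exon_b))  # (13, 31)
```

The rice coordinates reported above are internally consistent in the same way. The 259 nt from position 73207 up to the start codon at 73466 equal 55 nt of leader exon in the longest rice cDNA, plus the 195-nt intron, plus 9 nt.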
The RLM-RACE clones were aligned with the genomic sequence of the 5' region of the PgiC genes from 15 species of Clarkia and Oenothera studied previously (Gottlieb and Ford 1996; Ford and Gottlieb 2002). All the genes have appropriate splice sites in similar positions and thus have the leader intron (fig. 4).
The PgiC2 genes have a four-base tandem duplication of the region AGGT that includes the 5' splice site of the leader intron, creating two potential splice sites (fig. 4, positions 1a and 1b). The extra bases result from a duplication in PgiC2 rather than a deletion in PgiC1, because PgiC of the out-group Oenothera has only one copy of the splice region (fig. 4). Seven of the PgiC2 genes have GT at both positions. The RACE clones demonstrate that one of these, C. xantiana PgiC2, uses the upstream GT. Two PgiC2 genes, those of C. lewisii and C. concinna, have GT only in the upstream position, with GC at the downstream site. Finally, in C. modesta PgiC2 and C. lingulata PgiC2, the upstream position is mutated to CT. The RACE clones show that C. modesta PgiC2 uses the downstream GT, and presumably so does C. lingulata PgiC2.
All the PgiC genes have a 3'-AG splice site at the same position (fig. 4, position 3), except C. lewisii PgiC1 and C. gracilis PgiC2, in which the site is mutated to TG and AT, respectively. These genes may use an AG 8 nt downstream (fig. 4, position 4) or another AG 6 nt further downstream. The former is more likely because it is used in alternative splicing (see below).
The leaders of eight other alleles of PgiC1 and 10 of PgiC2 from relatively widespread populations of C. xantiana were also sequenced. Polymorphic sites made up 1.5% of the sites in PgiC1 and 2.8% in PgiC2, but there was no variation at either splice site of the leader introns (data not shown).
Alternative Splicing of the Leader Intron
RLM-RACE clones of three of the five Clarkia genes showed two alternative splice patterns of the leader intron (fig. 3). Alternative splicing occurred at the 5' end of the intron in one of the genes and at the 3' end in the other two.
Six clones of C. arcuata PgiC1 transcripts used the same 5'-GT as other PgiC1 genes, but two clones used an alternative site 19 nt downstream (fig. 3, position 2). This alternative dinucleotide is also present in other PgiC1 and PgiC2 genes but was not used for splicing. Its use in C. arcuata PgiC1 may reflect a change in sequence context resulting from an adjacent 21-nt deletion (fig. 3). The deletion places the alternative 5'-GT splice junction in a context (GGTAA) similar to that at positions 1a and 1b (fig. 3). By contrast, position 2 in the other genes (fig. 3) is in a GGTT or TGTT context.
Clarkia xantiana PgiC2 has two potential 5'-GT splice sites only a few bases apart, resulting from the four-base duplication. Nevertheless, all 15 RACE clones used the upstream site, clearly showing that this site was preferred (fig. 3 , position 1a).
A particular AG-3' was generally used as the acceptor splice site (fig. 3, position 3). However, one of 12 clones of C. xantiana PgiC1 used an alternative AG-3' 8 nt downstream (fig. 3, position 4), and three of seven clones of C. modesta PgiC2 used the same downstream site. The use of this downstream AG-3' in some RACE clones makes it likely that it is also the acceptor site used in C. lewisii PgiC1 and C. gracilis PgiC2 (fig. 4).
The leader introns spliced from the cloned transcripts range from 262 to 309 nt in length (fig. 3). Much of the variation results from insertions and deletions (see below) as well as from the use of alternative splice sites.
Thirteen available transcripts of Arabidopsis PgiC used identical splice sites to remove the leader intron. The transcripts are the 11 RACE clones, the library clone, and another PgiC cDNA (EMBL AF372970) that includes 23 nt upstream of the leader intron.
Transcription Start Site
The discovery of the leader intron explains why a TATA sequence near the start codon (fig. 1) can be present or absent in expressed Clarkia PgiC genes. This TATA is actually located within the leader intron and cannot be involved in initiating transcription. A likely +1 transcription start site and TATA box were identified well upstream of the leader intron in C. xantiana PgiC genes.
For C. xantiana PgiC1, the three longest RACE clones began at an identical position 181 nt upstream of the intron. The corresponding genomic sequence shows a putative TATA box at -21, four additional TA-rich regions, and a CAAT box farther upstream (fig. 5). Similarly, for C. xantiana PgiC2, the seven longest clones begin 165 nt upstream of the intron. The genomic sequence upstream shows an apparent TATA box at -30, additional TA-rich regions, and a CAAT box (fig. 5). Several clones of each transcript share the same 5' base and lie downstream of a candidate TATA box. This suggests that the longest clones from these genes are full length and that their 5' terminus is the transcription start (+1). The +1 of PgiC1 is 11 nt upstream of that of PgiC2 (fig. 5).
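Locating a candidate TATA box at a plausible distance upstream of a putative +1 can be sketched as a simple motif scan. The paper does not state the criteria it used. The motif TATA[AT]A, the search window of -40 to -15, the function name, and the toy sequence are all assumptions for illustration. Positions follow the convention used above, in which the base immediately upstream of +1 is -1 and there is no position 0.

```python
import re

def tata_candidates(seq, plus1, window=(-40, -15), motif="TATA[AT]A"):
    """Positions (relative to +1; no position 0) of motif matches upstream of +1.

    `seq` is genomic DNA and `plus1` the 0-based index of the putative
    transcription start. Only matches whose first base falls inside
    `window` are reported. The motif and window are illustrative assumptions.
    """
    hits = []
    # A zero-width lookahead also reports overlapping matches (e.g. TATATAAA).
    for m in re.finditer(f"(?={motif})", seq.upper()):
        rel = m.start() - plus1  # index plus1 - 1 maps to -1
        if window[0] <= rel <= window[1]:
            hits.append(rel)
    return hits


upstream = "G" * 15 + "TATAAA" + "G" * 19  # toy sequence: TATAAA at -25
seq = upstream + "ACGTACGT"               # +1 is the first base after `upstream`
print(tata_candidates(seq, plus1=len(upstream)))  # [-25]
```

A scan like this only flags candidates. The TA-rich regions and CAAT box noted above would still have to be judged by inspection.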
The longest cDNA clones of PgiC of Arabidopsis and rice include 94 and 55 nt, respectively, upstream of the leader intron but are presumably not complete. Thus, it is not yet possible to identify the transcription start of the gene in these species.
The RLM-RACE procedure is intended to recover only full-length transcripts, but in our hands it generally did not. The length variation at the 5' end of the RACE clones probably resulted from RNA degradation before the oligo was ligated to the mRNA, from incomplete dephosphorylation, or from both. For several of the genes, PCR2 produced two fragment-length classes. Individual clones from both size classes showed a range of sizes, and there was no evidence that they represent products of a second transcription start. RLM-RACE is too time-consuming and expensive to repeat until a perfect result is obtained. Identifying the +1 site may therefore require sequencing many clones until several are found with identical starting points located an appropriate distance downstream of a TATA box. After we discovered that the LP-binding site was included in the PgiC messages, PCR with this primer let us select relatively long RACE clones for sequencing. This enabled us to obtain multiple clones of the longest transcripts for both genes in C. xantiana.
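The rule suggested above, sequencing clones until several share an identical, most upstream 5' end, can be written as a short Python sketch. The support threshold `min_support`, the function name, and the clone coordinates are all hypothetical. Only the three identical ends at 181 nt echo the C. xantiana PgiC1 result reported above. The sketch also omits the additional check that the shared end lies an appropriate distance downstream of a TATA box.

```python
from collections import Counter

def candidate_plus1(starts, min_support=3):
    """Most upstream 5' end shared by at least `min_support` clones, or None.

    `starts` give each clone's 5' end as the distance (nt) upstream of a fixed
    landmark such as the leader intron, so larger values mean longer clones.
    Shorter, scattered ends are treated as truncated products.
    """
    counts = Counter(starts)
    supported = [pos for pos, n in counts.items() if n >= min_support]
    return max(supported) if supported else None


# Hypothetical clone ends: three share the longest end; the rest are shorter.
clone_starts = [181, 181, 181, 143, 120, 88, 88, 88, 40]
print(candidate_plus1(clone_starts))  # 181
```

Requiring several identical ends guards against a single aberrant long clone being taken as the transcription start.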
Insertions, Deletions, and Base Substitutions in the PgiC Leader
The unprocessed leader extends from the transcription start site to the start codon, a length of 463 and 471 nt in C. xantiana PgiC1 and PgiC2, respectively. In the other genes included in figure 4, assuming that the +1 is about 20 nt upstream of the LP site (fig. 5), the length ranges from about 440 nt in C. unguiculata PgiC2 to about 520 nt in C. gracilis PgiC2. The leader exon is the upstream 30%–40% of this region, from the +1 to the leader intron. The leader intron, about 57%–66% of the total, is followed by the 18 nt (10 nt in some genes) of the nontranslated portion of exon 1 (fig. 2). After excision of the leader intron, the mature leader consists of the leader exon and the nontranslated part of exon 1.
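As a worked check on these proportions, the C. xantiana PgiC1 values reported in this paper can be combined. The inputs are an unprocessed leader of 463 nt and a leader exon of 181 nt (from the shared 5' end of the longest RACE clones to the intron). Assuming the 18-nt nontranslated part of exon 1 applies to this gene, the leader-intron length follows by subtraction:

$$
\begin{aligned}
L_{\text{intron}} &= 463 - 181 - 18 = 264\ \text{nt},\\
\frac{181}{463} &\approx 0.39\ \text{(leader exon)}, \qquad \frac{264}{463} \approx 0.57\ \text{(leader intron)}.
\end{aligned}
$$

Both lengths agree with the base counts used for the same gene in the base-composition comparison. The leader intron gives 73 + 103 + 42 + 46 = 264, and the leader exon gives 40 + 46 + 25 + 70 = 181. The two fractions fall within the 30%–40% and 57%–66% ranges stated above.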
The leader intron, ranging from 252 nt in C. unguiculata PgiC2 to 325 nt in C. gracilis PgiC2, is generally longer than all but five of the introns in the coding region. The only exceptions, in a few genes, are certain introns in the coding region that are very large because of transposonlike insertions (Gottlieb and Ford 1996). Neither the leader exon nor the leader intron shows any very large insertions or deletions, direct or inverted repeats, runs of dinucleotides, or similar features. (Some PgiC1 leader introns have a short tract of varying numbers of A's.)
There are, however, numerous smaller insertions and deletions (collectively termed gaps or indels). In the leader intron there are about 10 gaps characterizing single PgiC1 sequences, nine characterizing single PgiC2 sequences, and 13 shared by two or more sequences. (Because a gap may result from a number of insertions or deletions [or both], we cannot infer the number of events causing the observed pattern. Also, the characterization of an insertion as "shared" may be problematic if there are differences in the length or sequence, particularly if the insertion is short.) With one exception, every pair of gene sequences differs by at least one gap in the leader intron. Two gaps distinguish all PgiC1s from PgiC2s: the 4-nt duplication that includes the 5' splice site in PgiC2 and a 12-nt insertion in PgiC2 (these are identified as insertions by reference to the out-group Oenothera). The duplication of the 5' splice site region is noteworthy because there are no deletions, insertions, duplications, or substitutions in the splice sites of introns in the coding region of expressed PgiC genes (data not shown).
The frequency of gaps in the leader intron is similar to that in intron 7, the only similarly sized intron of the coding region. Intron 7 is about 250 nt long. It has about nine gaps characterizing single PgiC1 sequences, 10 characterizing single PgiC2 sequences, and seven shared by two or more sequences. Among the seven are two deletions in PgiC2 versus PgiC1 and the out-group PgiC. Comparison can also be made with intron 12, which is about 200–230 nt in PgiC2s but about 400 nt in PgiC1s because of a single large deletion in the PgiC2s. If the region of this deletion is excluded, there are about six gaps characterizing single PgiC1 sequences, eight characterizing single PgiC2 sequences, and 12 shared by two or more sequences. One of the shared gaps is an insertion in all PgiC1s.
Among the other introns, 16 are generally less than 170 nt; among these the introns at the 3' end of the gene exhibit more gaps. For example, introns 1, 3, and 5, all less than 100 nt in length, have only four, six, and 10 gaps, respectively. In contrast, introns 18 through 22, also generally less than 100 nt, have from 16 to more than 30 gaps apiece plus regions where the pattern of overlapping insertions and deletions is too complex to make a count. The four longest introns have the most gaps. Intron 10, typically about 450 nt but varying from 282 to 889 nt, has over 60 gaps. Thus, relative to its length, the leader intron appears similar to the introns of the coding region in the accumulation of the gaps.
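A simplified version of the gap tally can be sketched in Python. The sketch extracts each maximal run of gap characters in every aligned sequence and counts how many sequences share exactly that run. The function names, the use of '-' as the gap symbol, and the toy alignment are assumptions for illustration.

As noted above, a single run may reflect several insertion or deletion events. This sketch also neither polarizes gaps with an out-group nor handles partially overlapping gaps. An insertion unique to one sequence therefore appears here as a gap shared by all the others.

```python
from collections import Counter

def gap_runs(aligned):
    """(start, end) alignment columns of each maximal run of '-' (end exclusive)."""
    runs, start = [], None
    for i, ch in enumerate(aligned):
        if ch == "-" and start is None:
            start = i
        elif ch != "-" and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:  # run reaching the end of the alignment
        runs.append((start, len(aligned)))
    return runs

def classify_gaps(alignment):
    """Split distinct gap runs into those seen in one sequence and shared ones."""
    counts = Counter(run for seq in alignment for run in gap_runs(seq))
    single = sorted(run for run, n in counts.items() if n == 1)
    shared = sorted(run for run, n in counts.items() if n >= 2)
    return single, shared


toy_alignment = [
    "ACGT--ACGTAC",
    "ACGT--ACG-AC",
    "ACGTTTACGTAC",
    "A-GTTTACGTAC",
]
print(classify_gaps(toy_alignment))  # ([(1, 2), (9, 10)], [(4, 6)])
```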
The leader exon, ranging from about 144 nt in C. heterandra PgiC2 to 195 nt in C. epilobioides PgiC2, is longer than all coding exons, except that it is shorter than exon 5 (156 nt) in three genes. Unlike coding exons, the leader exon also exhibits numerous gaps. The region where comparisons can be made is bounded at the 5' end by the LP site and varies from 106 to 157 nt. Six gaps characterize single PgiC1 sequences, at least seven characterize single PgiC2 sequences, and several are shared by groups of PgiC1s or PgiC2s. There are also three gaps that distinguish PgiC1s from PgiC2s: an 8-nt deletion in PgiC2, a 1-nt insertion in PgiC1, and a 22-nt deletion in PgiC1. In contrast, in the 1,707- to 1,713-nt length of the coding exons in Clarkia PgiC genes there are only two gap areas: a codon in exon 22 that is represented by one, two, or three copies in various genes, and a trinucleotide that overlaps two codons and is duplicated in C. rostrata PgiC1.
The rate of base substitution in the entire 5' leader is similar to that in the introns of the coding region and much higher than the rate in the coding exons. Take, for example, the divergence between PgiC1 and PgiC2 of C. xantiana, using the Kimura two-parameter corrected distance (K2p) with its standard error (SE):
- the mature leaders (leader exon plus the nontranslated part of exon 1) differ by 11.8% (K2p = 0.129; SE = 0.031);
- the leader introns differ by 10.5% (0.114 ± 0.024);
- the introns of the coding region, taken overall, differ by 11.2% (0.121 ± 0.006).

The differences among these three regions are not significant. They are considerably smaller than the variation among the 22 individual introns of the coding region. Between PgiC1 and PgiC2 of C. xantiana, for example, seven introns differ by less than 10%, seven by more than 10% but less than 12%, and eight by more than 12%. In contrast, the coding exons of the two genes in C. xantiana differ by only 3.7% (K2p = 0.038 ± 0.005). Similar results are obtained by averaging over all pairs of PgiC1s and PgiC2s in figure 4: the mature leaders differ by 11.4%, the leader introns by 11.6%, the introns of the coding region by 12.9%, and the coding exons by only 4.2%. Comparisons among PgiC1s only and among PgiC2s only show approximately the same pattern, but actual values vary greatly depending on the relatedness of the species (data not shown).
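The Kimura two-parameter (K2p) correction used above converts the observed proportions of transitions (P) and transversions (Q) between two aligned sequences into an estimated number of substitutions per site:

d = -½ ln(1 - 2P - Q) - ¼ ln(1 - 2Q)

A minimal Python sketch follows. It skips alignment columns with a gap or ambiguous base in either sequence. That is an assumption about site handling, which the paper does not specify. The sketch also omits the standard error.

```python
import math

def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences.

    Columns with a gap or non-ACGT character in either sequence are skipped.
    Raises ZeroDivisionError if no comparable sites remain, and ValueError
    (from math.log) if the sequences are too divergent for the correction.
    """
    purines = {"A", "G"}
    transitions = transversions = sites = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a == b:
            continue
        # Transition: both purines (A<->G) or both pyrimidines (C<->T).
        if (a in purines) == (b in purines):
            transitions += 1
        else:
            transversions += 1
    p = transitions / sites    # proportion of transition differences (P)
    q = transversions / sites  # proportion of transversion differences (Q)
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


s1 = "A" * 100
s2 = "G" * 10 + "A" * 90  # 10 transitions (A<->G), no transversions
print(round(k2p_distance(s1, s2), 4))  # 0.1116
```

Consider an uncorrected difference of 0.118. Depending on how it splits into transitions and transversions, the correction gives about 0.128 to 0.135. That is the same magnitude as the K2p of 0.129 reported above for the C. xantiana mature leaders.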
Another way to compare the variability of the leader region with that of other noncoding and coding regions is to examine the proportion of variable sites. Only sites at which all sequences have a base are considered. Among the sample of PgiC1s, the proportion of variable sites is 22% in the mature leader, 28% in the leader intron, 31% in the introns of the coding region, and 12% in the coding exons. Among the PgiC2s, the proportions are, respectively, 25%, 26%, 27%, and 10%. Thus, in both genes, the leader region has a proportion of variable sites similar to that of the introns of the coding region but two to nearly three times that of the coding exons.
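The proportion of variable sites, counting only alignment columns in which every sequence has a base, can be computed as in the minimal Python sketch below. The function name and the toy alignment are hypothetical. The sketch treats any non-ACGT character (including '-') as missing.

```python
def proportion_variable(alignment):
    """Fraction of gap-free alignment columns that are variable.

    Columns in which any sequence has a gap or a non-ACGT character are
    skipped, so only sites where all sequences have a base are counted.
    """
    complete = variable = 0
    for column in zip(*(seq.upper() for seq in alignment)):
        if any(base not in "ACGT" for base in column):
            continue
        complete += 1
        if len(set(column)) > 1:
            variable += 1
    return variable / complete


toy_alignment = ["ACGTA-", "ACGTT-", "ACCTAA"]
print(proportion_variable(toy_alignment))  # 0.4
```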
Base Composition in the PgiC Leader
The leader intron and the leader exon differ markedly in base composition. The leader intron is more like the introns in the coding region, and the leader exon is more like the coding exons (table 1). In general, both the leader intron and the introns in the coding region of Clarkia PgiC genes have a high proportion of T + A, mostly attributable to an excess of T's. The proportion is 60%–68% for the leader intron and 65%–67% for the introns of the coding region.

The leader exon, like the coding exons, has a higher proportion of C + G. In the leader exon, however, this reflects a uniquely high proportion of C's and a low proportion of G's (similar to that in the introns). In the coding exons, the proportions of C's and G's are roughly equal. For PgiC1 of C. xantiana, for example, a 4 × 2 contingency table compares the leader intron (A, 73; T, 103; G, 42; C, 46) with the leader exon (A, 40; T, 46; G, 25; C, 70). It shows a highly significant difference (χ² = 26.1, df = 3, P < 0.001). The largest contribution comes from the different proportions of C, and the second largest from the difference in T.
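The contingency-table test above can be reproduced with a minimal Pearson chi-square sketch in plain Python. The base counts are those given in the text. The function name is hypothetical. The sketch also reports each base's contribution to the statistic, so the two largest contributions can be checked.

```python
def chi_square_contingency(rows):
    """Pearson chi-square for an r x c table of counts.

    Returns the statistic, the degrees of freedom, and each column's
    contribution to the statistic (summed over rows).
    """
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    grand = sum(row_totals)
    contributions = [0.0] * len(col_totals)
    for i, row in enumerate(rows):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            contributions[j] += (observed - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(col_totals) - 1)
    return sum(contributions), df, contributions


# Base counts (A, T, G, C) for C. xantiana PgiC1, as given in the text.
leader_intron = [73, 103, 42, 46]
leader_exon = [40, 46, 25, 70]
stat, df, parts = chi_square_contingency([leader_intron, leader_exon])
print(round(stat, 2), df)  # 26.15 3
```

The statistic is about 26.15 with 3 degrees of freedom, consistent with the reported χ² = 26.1. It is well above the 0.001 critical value for 3 df (about 16.27). The C and T columns contribute most, roughly 18.6 and 5.9.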
In all the Clarkia, Arabidopsis, and rice PgiC genes, the base frequencies of exon 1 (including the translated and nontranslated nucleotides) are similar to the average given in table 1 for the coding exons. The proportion of T's is sometimes lower: 22%–29% in Clarkia and rice and only 15% in Arabidopsis (data not shown). Thus, the leader intron is flanked on both sides by exons with a markedly lower proportion of T's. However, the difference with exon 1 is not statistically significant because of the small size of exon 1. Again using PgiC1 of C. xantiana as an example, a 4 × 2 contingency table comparing the leader intron with exon 1 (A, 19; T, 18; G, 20; C, 12) gives χ² = 7.6, df = 3, P < 0.1.
Plant genes generally have about 15% more T's in the introns of the coding region than in the coding exons. This difference has been shown to be important for correct recognition of the splice sites (Brown and Simpson 1998; Lorkovic et al. 2000). The contrast in base composition between the leader exon and the leader intron in PgiC is therefore probably functional. It indicates that the region is evolving under certain constraints despite its relatively rapid sequence divergence.
In the absence of selective constraints, base composition is expected to approach an equilibrium that depends on the relative frequencies of spontaneous mutations of each possible type (A to G, A to T, A to C, G to A, etc.; Li 1997, pp. 59–68). We lack direct information on these frequencies. Two estimates of unselected base composition may instead be provided by the region upstream of the +1 transcription start site and by the fourfold degenerate sites in the coding exons (Li 1997, e.g., pp. 401–410, 422–423). The upstream region may include some regulatory sites, but if these are relatively short, they would have a negligible effect on overall base composition.

Upstream sequences from genomic library clones vary in base composition (table 1). The clones are PgiC1 (1,110 bp) and PgiC2 (683 bp) of C. xantiana, PgiC1 (347 bp) and PgiC2 (760 bp) of C. lewisii, and PgiC1 (884 bp) of C. mildrediae. All are generally high in A and T and thus more like the introns than the exons. The A + T composition at the fourfold degenerate sites of the genes in figure 4 is 67%–74% (data not shown). This is similar to, but even higher than, that of the introns in both the coding region and the leader sequence. These estimates suggest that maintaining the difference in base composition between the leader exon and the leader intron may particularly involve selection to reduce the A + T content of the leader exon.
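One of the two proxies for unselected base composition, A + T content at fourfold degenerate sites, can be computed as in the Python sketch below. It assumes an in-frame, intron-free coding sequence and the standard genetic code. The function name and toy sequence are hypothetical. It is an illustration, not the authors' pipeline.

```python
# First two codon bases whose third position is fourfold degenerate in the
# standard genetic code: Leu CTN, Val GTN, Ser TCN, Pro CCN, Thr ACN,
# Ala GCN, Arg CGN, Gly GGN.
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}

def fourfold_at_content(cds):
    """A + T fraction at fourfold degenerate third codon positions.

    Assumes `cds` is an in-frame coding sequence with introns removed.
    Returns NaN if the sequence contains no fourfold degenerate codons.
    """
    cds = cds.upper()
    at = total = 0
    for i in range(0, len(cds) - 2, 3):  # complete codons only
        codon = cds[i:i + 3]
        if codon[:2] in FOURFOLD_PREFIXES and codon[2] in "ACGT":
            total += 1
            at += codon[2] in "AT"
    return at / total if total else float("nan")


# Toy CDS: ATG CTA GCC GGT TAA -> fourfold sites A, C, T (two of three are A/T)
print(round(fourfold_at_content("ATGCTAGCCGGTTAA"), 3))  # 0.667
```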
Discussion
Our study was undertaken because a comparison of the PgiC genes of numerous species of Clarkia and one from the related genus Oenothera revealed an apparent discordance: most of the genes have a typical TATA sequence about 80 nt upstream of the start codon, but the sequence was absent or substituted in seven genes that are known to be expressed (shown by electrophoretic analysis of isozymes). We learned that this TATA is actually located in the leader intron and does not play a role in transcription initiation. In C. xantiana we obtained and sequenced the cDNAs of complete transcripts of the 5' leaders and identified the transcription start and a likely TATA box.
The availability of leader sequences from PgiC1 and PgiC2 of related species of Clarkia presented an unusual opportunity to compare the evolution of the leader intron and the leader exon with that of the introns and exons of the coding region. Comparative data for the leader intron of fatty acid desaturase are available from a phylogenetic analysis of numerous species of Gossypium, but the leader exon and the coding regions were not sequenced (Liu et al. 2001).
Unlike the splice sites of the introns in the coding region, the splice sites of the leader intron in Clarkia PgiC genes are not perfectly conserved across species. The AG-3' splice site is mutated in C. lewisii PgiC1 and in C. gracilis PgiC2; probably the nearest AG downstream is used instead. The AGGT at the 5' splice site in PgiC1 is duplicated in PgiC2. Clarkia xantiana PgiC2 uses the upstream copy of the splice site. In C. modesta and C. lingulata, where the upstream copy is mutated to CT, the downstream copy is used. Among the other PgiC2 genes, examined from genomic data only, several had both copies of the 5' splice site intact, and in two the downstream copy was mutated (fig. 4).

These variations show that the splice sites of the leader intron evolve under less constraint than those of the introns in the coding region. There, no base substitutions, insertions, or deletions involving splice sites are known, although the entire intron 22 has been deleted from PgiC1 of C. concinna (Gottlieb and Ford 1996). This relaxed constraint in the leader is presumably tolerated because it does not affect the reading frame or the protein sequence.
Three of the five Clarkia PgiC genes examined with RLM-RACE (52 clones in all) exhibited alternative splicing of the leader intron. Clarkia xantiana PgiC1 used either of the two AG-3' sites, as did C. modesta PgiC2, but in each case only one 5'-GT site was used. Clarkia arcuata PgiC1 used either of the two 5'-GT sites and one AG-3' site. As a consequence, each of these genes produces mature transcripts that differ slightly in length and sequence upstream of the start codon. It is quite possible that a larger sample of messages would have revealed alternative splicing in the other two genes as well.
The use of variant nearby splice sites to remove a leader intron has also been observed for the waxy gene in different rice cultivars, and the different splice patterns have a large effect on gene expression (Cai et al. 1998; Isshiki et al. 1998). In japonica rice, however, the normal 5' splice site is mutated from GT to TT, so the mutation appears to promote the use of other sites for splicing. In the coding region, alternative splicing of introns is generally deleterious and not tolerated, except when it serves a specific purpose. One example is bringing together different combinations of exons to produce distinct proteins in different tissues (reviewed in Brown and Simpson 1998).
Also, the presence of functional and nonfunctional versions of the same transcript may regulate transcript level and translatability (Lorkovic et al. 2000).
Our results in PgiC show that variable splicing of the leader intron may occur even when there are no splice site mutations. It is presumably permitted because there is no effect on the reading frame or the resulting protein sequence. Whether the variants are differently regulated remains to be determined.
The recognition of splice junctions is known to depend on the context of the adjacent sequence (Brown and Simpson 1998). In this regard, the 21-nt deletion in C. arcuata PgiC1 is interesting. It places an adjacent GT (fig. 3, position 2) into a sequence context similar to that of the standard splice sites (positions 1a and 1b), perhaps allowing it to serve as an alternative splice site.
Both the leader intron and the leader exon accumulate base substitutions and gaps at a rate similar to that of the introns of the coding region. This is as expected, given the lower functional constraint on sequence in these noncoding regions. However, they show a high degree of conservation relative to the region upstream of the transcription start. This is presumably a consequence of selection to maintain signals governing intron recognition and splicing, as well as other possible regulatory functions.
The contrast in base composition between T-rich introns and GC-rich exons is known to be important for the recognition of intron splice sites (Brown and Simpson 1998; Lorkovic et al. 2000). Plant introns are usually T-rich throughout their length, with about 15% more T's than the adjacent exons. Yet experiments with TA-deficient introns that are not spliced have shown that inserting short T-rich elements anywhere in the intron can restore correct splicing (reviewed in Lorkovic et al. 2000). Much of the T-richness of introns may therefore be redundant, but the matter is not yet understood. Because plant exons and introns are often short (less than 100 nt), a biologically significant difference in base composition between adjacent introns and exons may not be statistically significant.

The experiments with T-rich elements involved introns located between coding exons. Indeed, all summary data on intron-exon base composition in plants are based on the introns and exons of the coding regions. But the same splicing machinery is likely to excise the leader intron. This would explain why selection maintains a marked difference in base composition between the leader exon and the leader intron, despite the relatively high (intronlike) rate of sequence divergence throughout the leader region.
The peculiarly high proportion of C in the leader exon has not been reported previously in other genes, and its possible significance and generality are unknown. One possibility is that selection to reduce the frequency of T's results in an accumulation of C's simply because of the bias for transitions over transversions, i.e., substitution of a T nucleotide most commonly results in a C. In coding exons this tendency would be countered by selection to maintain the amino acid sequence.
Acknowledgements |
---|
Footnotes |
---|
Abbreviations: PgiC, cytosolic isozyme of phosphoglucose isomerase; RLM-RACE, RNA ligase-mediated rapid amplification of cDNA ends.
Keywords: alternative splicing, Arabidopsis, base composition, Clarkia, leader intron, PgiC
Address for correspondence and reprints: L. D. Gottlieb, Section of Evolution and Ecology, Division of Biology, University of California, 1 Shields Avenue, Davis, California 95616. E-mail: ldgottlieb{at}ucdavis.edu
References |
---|
Brown J. W. S., C. G. Simpson. 1998. Splice site selection in plant pre-mRNA splicing. Annu. Rev. Plant Physiol. Plant Mol. Biol. 49:77-95.
Cai X.-L., Z.-Y. Wang, Y.-Y. Xing, J.-L. Zhang, M.-M. Hong. 1998. Aberrant splicing of intron 1 leads to the heterogeneous 5'UTR and decreased expression of waxy gene in rice cultivars of intermediate amylose content. Plant J. 14:459-465.
Eckhart V. M., M. A. Geber. 1999. Character variation and geographic distribution of Clarkia xantiana A. Gray (Onagraceae): flowers and phenology distinguish two subspecies. Madroño 46:117-125.
Filatov D. A., D. Charlesworth. 1999. DNA polymorphism, haplotype structure and balancing selection in the Leavenworthia PgiC locus. Genetics 153:1423-1434.
Ford V. S., L. D. Gottlieb. 1999. Molecular characterization of PgiC in a tetraploid plant and its diploid relatives. Evolution 53:1060-1067.
Ford V. S., L. D. Gottlieb. 2002. Single mutations silence PgiC2 genes in two very recent allotetraploid species of Clarkia. Evolution 56:699-707.
Ford V. S., B. R. Thomas, L. D. Gottlieb. 1995. The same duplication accounts for the PgiC genes in Clarkia xantiana and C. lewisii (Onagraceae). Syst. Bot. 20:147-160.
Fu H., S. Y. Kim, W. D. Park. 1995. High-level tuber expression and sucrose inducibility of a potato Sus4 sucrose synthase gene require 5' and 3' flanking sequences and the leader intron. Plant Cell 7:1387-1394.
Gottlieb L. D., V. S. Ford. 1996. Phylogenetic relationships among the sections of Clarkia (Onagraceae) inferred from the nucleotide sequences of PgiC. Syst. Bot. 21:45-62.
Gottlieb L. D., N. F. Weeden. 1979. Gene duplication and phylogeny in Clarkia. Evolution 33:1024-1039.
Hirano H.-Y., M. Eiguchi, Y. Sano. 1998. A single base change altered the regulation of the waxy gene at the posttranscriptional level during the domestication of rice. Mol. Biol. Evol. 15:978-987.
Isshiki M., K. Morino, M. Nakajima, R. J. Okagaki, S. R. Wessler, T. Izawa, K. Shimamoto. 1998. A naturally occurring functional allele of the rice waxy locus has a GT to TT mutation at the 5' splice site of the first intron. Plant J. 15:133-138.
Isshiki M., M. Nakajima, H. Satoh, K. Shimamoto. 2000. dull: Rice mutants with tissue-specific effects on the splicing of the waxy pre-mRNA. Plant J. 23:451-460.
Kato T., E. Itoh, R. F. Whittier, D. Shibata. 1998. Increase of foreign gene expression in monocot and dicot cells by an intron in the 5' untranslated region of a soybean phosphoenol pyruvate carboxylase gene. Biosci. Biotech. Biochem. 62:151-153.
Kawabe A., K. Yamane, N. T. Miyashita. 2000. DNA polymorphism at the cytosolic phosphoglucose isomerase (PgiC) locus of the wild plant Arabidopsis thaliana. Genetics 156:1339-1347.
Li W.-H. 1997. Molecular evolution. Sinauer Associates, Sunderland, Mass.
Liu Q., C. L. Brubaker, A. G. Green, D. R. Marshall, P. J. Sharp, S. P. Singh. 2001. Evolution of the Fad2-1 fatty acid desaturase 5'UTR intron and the molecular systematics of Gossypium (Malvaceae). Am. J. Bot. 88:92-102.
Liu R., D. Charlesworth, M. Kreitman. 1999. The effect of mating system differences on nucleotide diversity at the phosphoglucose isomerase locus in the plant genus Leavenworthia. Genetics 151:343-357.
Liu X., M. A. Gorovsky. 1993. Mapping the 5' and 3' ends of Tetrahymena thermophila mRNAs using RNA ligase mediated amplification of cDNA ends (RLM-RACE). Nucleic Acids Res. 21:4954-4960.
Lorkovic Z. J., D. A. Wieczorek Kirk, M. H. L. Lambermon, W. Filipowicz. 2000. Pre-mRNA splicing in higher plants. Trends Plant Sci. 5:160-167.
Nakano R., T. Matsumura, H. Sakakibara, T. Sugiyama, T. Hase. 1997. Cloning of maize ferredoxin II gene: presence of a unique repetitive nucleotide sequence within an intron found in the 5' untranslated region. Plant Cell Physiol. 38:1167-1170.
Norris S. R., S. E. Meyer, J. Callis. 1993. The intron of Arabidopsis thaliana polyubiquitin genes is conserved in location and is a quantitative determinant of chimeric gene expression. Plant Mol. Biol. 21:895-906.
Okuley J., J. Lightner, K. Feldmann, N. Yadav, E. Lark, J. Browse. 1994. Arabidopsis FAD2 gene encodes the enzyme that is essential for polyunsaturated lipid synthesis. Plant Cell 6:147-158.
Plesse B., M.-C. Criqui, A. Durr, Y. Parmentier, J. Fleck, P. Genschik. 2001. Effects of the polyubiquitin gene Ubi.U4 leader intron and first ubiquitin monomer on reporter gene expression in Nicotiana tabacum. Plant Mol. Biol. 45:655-667.
Shaw J. R., R. J. Ferl, J. Baier, D. St. Clair, C. Carson, D. R. McCarty, L. C. Hannah. 1994. Structural features of the maize sus1 gene and protein. Plant Physiol. 106:1659-1665.
Terauchi R., T. Terachi, N. T. Miyashita. 1997. DNA polymorphism at the Pgi locus of a wild yam, Dioscorea tokoro. Genetics 147:1899-1914.
Thomas B. R., V. S. Ford, E. Pichersky, L. D. Gottlieb. 1993. Molecular characterization of duplicate cytosolic phosphoglucose isomerase genes in Clarkia and comparison to the single gene in Arabidopsis. Genetics 135:895-905.
Thomas B. R., D. Laudencia-Chingcuanco, L. D. Gottlieb. 1992. Molecular analysis of the plant gene encoding cytosolic phosphoglucose isomerase. Plant Mol. Biol. 19:745-757.
Weeden N. F., L. D. Gottlieb. 1979. Distinguishing allozymes and isozymes of phosphoglucose isomerase by electrophoretic comparison of pollen and somatic tissues. Biochem. Genet. 17:287-296.
Werr W., W.-B. Frommer, C. Maas, P. Starlinger. 1985. Structure of the sucrose synthase gene on chromosome 9 of Zea mays L. EMBO J. 4:1373-1380.