(Received for publication, September 28, 1994; and in revised form, November 18, 1994)
A single mouse genomic locus encodes proteins catalyzing three steps of purine synthesis, glycinamide ribonucleotide synthetase (GARS), aminoimidazole ribonucleotide synthetase (AIRS), and glycinamide ribonucleotide formyltransferase (GART). This gene has 22 exons and spans 28 kilobases. The existence of a second genetic locus and closely related pseudogenes was ruled out by Southern analysis. Mouse tissues express two related classes of messages encoded by this single locus: a trifunctional GARS-AIRS-GART mRNA and a monofunctional GARS mRNA. These transcripts used the same set of multiple transcriptional start sites, and both used the same first 10 exons. CCAAT and TATA elements were not found for this locus. Exon 11, which represented the last coding sequence of the GARS domain, was differentially utilized for the two messages. The trifunctional mRNA was generated by splicing exon 11 to exon 12, the first coding sequence for the AIRS domain with subsequent use of a polyadenylation signal at the end of exon 22. Genomic sequence corresponding to the 3`-UTR of the monofunctional GARS mRNA was contiguous with exon 11, so that the smaller message arose from the recognition of one of the multiple polyadenylation signals present within the intron between exons 11 and 12. Hence, polyadenylation of the primary transcript at a position corresponding to an intron of the genomic locus was responsible for the generation of the monofunctional GARS class of mRNAs. This utilization of an intronic polyadenylation site without alternative exon usage is comparable to the mechanism whereby both secreted and membrane-bound forms of the immunoglobulin µ heavy chain are made from a single genetic locus.
The enzyme activities catalyzing the second, third, and fifth
steps in de novo purine synthesis, i.e. glycinamide
ribonucleotide synthetase (GARS), glycinamide
ribonucleotide formyltransferase (GART), and aminoimidazole
ribonucleotide synthetase (AIRS), respectively, are present as a single
110-kDa trifunctional protein in those vertebrate species studied
to date(1, 2) . The mouse cDNA for the trifunctional
GARS-AIRS-GART was previously isolated in this laboratory by initial
screening of a mouse expression library using a polyclonal antibody
generated against the 110-kDa protein(3) . Using one of the
cDNA clones obtained from this initial screen as a probe, we isolated a
second class of mouse cDNAs encoding a monofunctional GARS. Its coding
region was identical in sequence to the GARS domain of the mouse
trifunctional cDNA, but its 3`-untranslated region (UTR) was completely
different. In agreement with the sequence of this GARS cDNA, a
protein with the molecular weight predicted for a monofunctional GARS,
in addition to the trifunctional GARS-AIRS-GART, was detected in mouse
L1210 cells by Western blot. These two proteins were subsequently
separated, and both protein fractions were shown to have GARS enzymatic
activity. Two cDNAs for monofunctional GARS and trifunctional
GARS-AIRS-GART have also been reported in chicken and human
tissue(4) . These enzymatic activities are present in bacteria
as three single-domain
proteins(5, 6, 7, 8) . Drosophila melanogaster and Drosophila pseudoobscura also express two mRNAs encompassing these three enzyme activities,
one encoding a trifunctional GARS-AIRS-AIRS-GART, with an apparent
endoduplication of the AIRS domain, and the other corresponding to only
the GARS domain(9) . These two mRNAs have been reported to be
encoded by a single gene in Drosophila(10) . Hence,
although higher eukaryotes have evolved a presumably more efficient or
more favorable trifunctional protein combining a GARS active site with
peptides that form the active sites for two subsequent pathway
reactions, a catalytically active monofunctional GARS has been shown to
be present in mouse(3) ; all three vertebrate species studied
to date express the monofunctional GARS mRNA(3, 4) .
The reason for this apparent functional redundancy remains unclear.
We now report the organization of the region of the mouse genome corresponding to these mRNAs. Our evidence demonstrates that both the trifunctional GARS-AIRS-GART and the monofunctional GARS are encoded by a single 28-kb genomic locus. The detailed structure of the gene indicates that the monofunctional GARS mRNA is generated by alternative processing of the intron separating the last coding domain for GARS and the first exon of the AIRS domain of the trifunctional transcript, and that this occurs without the use of alternative exons. Multiple polyadenylation signals in intron 11 are used to generate the monofunctional GARS mRNA, and both transcripts used the same set of multiple transcriptional start sites.
Figure 2: Schematic representation of overlapping mouse genomic clones for GARS-AIRS-GART, restriction map and exon location. A, four cDNA probes, pK4MF0.8, pK4MF1.0, pYTF1.0, and pYTF0.7, corresponding to the monofunctional GARS and the trifunctional GARS-AIRS-GART cDNA, were used to determine the overlap between the four genomic clones. The positions of oligonucleotide and cDNA probes that are described in the text are indicated. B, the EcoRI, HindIII, and BamHI sites were mapped by Southern analysis using four cDNA probes (pK4MF0.8, pK4MF1.0, pYTF1.0, and pYTF0.7 in A) and four oligonucleotides (P(R), JK9, JK20, and JK1R). The HindIII site indicated by the asterisk (*) is an RFLP for the mouse strains, CBA and C57 BL6. The genomic fragments that were subcloned into pBluescript SK II(+) are noted below the restriction map. These subclones were used for sequencing analysis to determine exon size and location within the genomic locus.
Figure 1: Northern analysis of monofunctional
GARS and trifunctional GARS-AIRS-GART in mouse tissues and in mouse
L1210 leukemic cells. The blots were hybridized with a radiolabeled
probe corresponding to the GARS domain (probe pQ0.8, Fig. 2A) (A) and to the AIRS and GART domains
(probe pW1.15T, Fig. 2A) (B). The GARS domain
probe hybridized to two messages of 3.4 and 1.7-1.9 kb, and the
AIRS-GART probe only hybridized with the 3.4-kb message. In mouse
liver, two distinct messages of approximately 1.7 to 1.9 kb are
detected. Note the different amounts of poly(A) RNA
per lane; the film was exposed for 36 h.
Restriction maps were constructed using endonucleases EcoRI, BamHI, and HindIII for four of the genomic clones (Fig. 2B), but three clones (LC1, LC6, and LC5) were sufficient to encompass the entire genomic locus (Fig. 2A). LC1 contained sequence corresponding to most of the GARS domain, the first domain of the trifunctional GARS-AIRS-GART cDNA, and had a common restriction map with LC7 over 13 kb. LC6 had a 3.0-kb overlap with LC1 and had a sequence corresponding to the most 3` end of the GARS domain, all of the AIRS domain, and most of the GART domain. LC5 had a 6-kb overlap with LC6 and contained coding sequence corresponding to the most 3` end of the AIRS domain and all of the GART domain. The genomic locus was found to span 28 kb.
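The restriction mapping described above can be mimicked in silico. The following is a minimal sketch, not the procedure used in this work: it locates EcoRI, BamHI, and HindIII recognition sites in a sequence and reports the fragment sizes a complete digest of a linear molecule would give. The function names and the toy sequence are hypothetical; the toy sequence is not mouse genomic sequence.

```python
# Recognition sequences and top-strand cut offsets for the three enzymes
# named in the text (G^AATTC, G^GATCC, A^AGCTT).
ENZYMES = {
    "EcoRI": ("GAATTC", 1),
    "BamHI": ("GGATCC", 1),
    "HindIII": ("AAGCTT", 1),
}


def cut_positions(seq, enzyme):
    """Return 0-based top-strand cut positions for one enzyme."""
    site, offset = ENZYMES[enzyme]
    seq = seq.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + offset)
        start = seq.find(site, start + 1)
    return positions


def fragment_sizes(seq, enzyme):
    """Return fragment lengths from a complete digest of a linear molecule."""
    bounds = [0] + cut_positions(seq, enzyme) + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


# Hypothetical toy insert with two EcoRI sites and one HindIII site.
toy = "ATATAT" + "GAATTC" + "CCCCCCCC" + "AAGCTT" + "GGGG" + "GAATTC" + "TT"

for name in ENZYMES:
    print(name, fragment_sizes(toy, name))
```

Comparing such predicted fragment lists between overlapping clones is one way to picture how a shared restriction map establishes overlap; real mapping additionally has to cope with partial digests and fragments joined to vector sequence.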
Seventeen genomic restriction fragments that spanned the entire genomic locus were subcloned into pBluescript SK II(+) to allow a more detailed analysis of the intron/exon organization of this gene. The sizes of the exons and junctional sequences were determined by limited sequence analysis of the subcloned fragments first using oligonucleotides from the GARS-AIRS-GART sequence known to hybridize with the various subcloned fragments, followed by a second round of sequencing extending intron primers adjacent to the exon back into and through each exon (see ``Materials and Methods''). The sequence obtained was compared with that of the previously isolated mouse cDNAs(3) ; divergence from the cDNA and match to consensus splice junction sequence (16) were used as criteria for identification of the 5` and the 3` splice sites of each exon. The exons ranged in size from 69 to 342 nt (Table 1), and the genomic locus was comprised of 22 exons. Exon 2 contained the sequence for the start methionine for both classes of RNAs and, along with exons 3 to 11, encoded the GARS domain for both GARS-AIRS-GART and the monofunctional GARS. Exons 12 to 17 encoded AIRS with exon 18 having a sequence corresponding to the end of the AIRS domain and the beginning of the GART domain. Exons 19 to 22 encoded GART and the 3`-UTR of GARS-AIRS-GART. The beginning of the genomic GART domain was clearly identifiable from the high conservation of the N-terminal amino acid sequence of yeast and Escherichia coli monofunctional GART cDNAs with the mouse cDNA GART domain(3, 8, 17) . Exon 1 in the genomic map (Fig. 2B) had sequence identity with that of the initial 5`-UTR region for the monofunctional GARS cDNA isolated from cDNA libraries(3) . Because the cDNA for the 5`-UTR of GARS-AIRS-GART was not found in cDNA libraries, it was not clear whether this first exon was utilized by both classes of mRNAs. 
Alternative transcriptional start sites and different first exons might be responsible for the generation of these two classes of messages from the same gene. This was not the case (see below).
Figure 3: Use of PCR for sizing the mouse GARS-AIRS-GART gene. Primer pairs between adjacent exons were
used for PCR against the genomic clones. Lanes 1 and 2 represent primer pairs between exon 1 to exon 2 and exon 2
to exon 3, respectively; lane 3, exons 2-4; lane
4, exons 4-5; lane 5, exons 5-6; lane
6, exons 6-7; lane 7, exons 7-8; lane
8, exons 8-9; lane 9, exons 8-10; lane
10, exons 10-11; lane 11, exons 11-12; lane 12, exons 14-15; lane 13, exons
15-17; lane 14, exons 17-18; lane 15,
exons 18-19; lane 16, exons 19-20; and lane
17, exons 19-21. The standards are λ DNA restricted with HindIII and φX DNA restricted with HaeIII (M) (Life Technologies, Inc.).
Figure 4: Restriction enzyme analysis of mouse genomic DNA and isolated genomic clones defining the GARS-AIRS-GART locus. Cloned DNAs (0.01 µg) (LC1, lanes 1 and 5; LC6, lanes 2 and 6; and LC5, lanes 3 and 7) or CBA mouse genomic DNA (10 µg, lanes 4 and 8) were cleaved with EcoRI (lanes 1-4) or HindIII (lanes 5-8), Southern-blotted, and hybridized with a mixed probe corresponding to the entire cDNA for trifunctional GARS-AIRS-GART. The hybridizing EcoRI and HindIII restriction fragments found in the genomic DNA are also found in the overlapping clones. Note that there were no restriction fragments found in the CBA genomic DNA that were not represented in these three clones either as an identically sized fragment or as a fragment attached to the vector arms, specifically the 2.3-kb fragment in the EcoRI digest and the 4.4-kb band in the HindIII digest.
Figure 5: Schematic representation of three models for the generation of both the monofunctional GARS and trifunctional GARS-AIRS-GART transcripts from a single mouse locus. The boxes represent exons which are separated by introns (line). The splicing used to generate the trifunctional GARS-AIRS-GART (solid line above the intron) and the monofunctional GARS (dashed line below the intron) are shown. Exon 11 (solid box) represents the exon containing the last of the GARS coding domain, and exon 12 (striped box) represents the beginning of the AIRS coding domain. The 3`-UTR for the monofunctional transcript is indicated by the shaded box. A, both transcripts utilize the same exon 11 with the trifunctional transcript generated by splicing of exon 11 to exon 12 and the monofunctional transcript by the recognition that polyadenylation signals in intronic sequence are contiguous with exon 11. B, the two transcripts are generated by alternative splicing of two exons 11; 11A for monofunctional transcript that contains a contiguous 3`-UTR and 11B for trifunctional transcript. C, the two transcripts are produced by usage of the same exon 11 but alternative splicing to either the 3`-UTR of monofunctional or to exon 12 for the trifunctional.
To determine the mechanism for generation of the two classes of
mRNAs, two oligonucleotides, JK9 corresponding to the GARS coding
domain (amino acids 410-416) and JK10R corresponding to the
antisense strand of the 3`-UTR of monofunctional GARS (nt
1724-1743) (see Fig. 2A), were used for Southern
blot analysis of LC6. Both oligonucleotides hybridized to the same
restriction fragments for DNA cut with BamHI, HindIII, and EcoRI. The smallest hybridizing
fragments, a 3.9-kb HindIII-EcoRI and a 4.4-kb EcoRI fragment, were subcloned into pBluescript SK II(+) (Fig. 2B). Sequence analysis was performed on the two
subclones using oligonucleotides JK9 and JK11R to ascertain their
relative positions with respect to each other and for comparison of
genomic sequence with the 3`-UTR of monofunctional GARS cDNA.
Double-stranded DNA sequencing of these genomic subclones primed from
oligonucleotide JK9 gave clear sequence of the last GARS coding domain
(amino acids 419-433) along with the consensus 5` splice donor
site which contained the stop codon of monofunctional GARS and which
ran immediately into a sequence identical with the 3`-UTR of
monofunctional GARS including that representing the oligonucleotide
JK10R (Fig. 6). Using JK10R as a primer, the confirmatory
antisense sequence was obtained including a sequence representing oligo
JK9. These results indicated that the 3`-UTR of monofunctional GARS was
immediately adjacent to the last coding region for GARS in the mouse
genome. Thus, the model in which exon 11 is spliced to a separate exon
containing only the 3`-UTR of monofunctional GARS (Fig. 5C) was ruled out,
but the two remaining mechanisms (Fig. 5, A and B) could not yet be
definitively distinguished. Because the coding region
of GARS for both classes of messages was identical, it was unlikely
that there would be two alternative exons (11A and 11B in Fig. 5B) representing the last 232 nt of identical GARS
coding domain, as evolutionary drift should have resulted in some
sequence differences between the two classes of mRNAs. Detailed
Southern analysis of the region of
LC6 which separates the
penultimate exon of the GARS domain (exon 10) and the first coding
region of the AIRS domain (exon 12) demonstrated that both the
oligonucleotide probes JK9 and JK10R hybridized to the same restriction
fragments. Sequence analysis of the smallest hybridizing fragment, a
1.0-kb RsaI-StyI fragment, indicated the presence of
only one copy of exon 11 (data not shown). Therefore, we rule out the
possibility of two alternative nearly identical exons and conclude that
the monofunctional GARS mRNA results from alternative processing of the
5` splice junction (Fig. 5A).
Figure 6: Nucleotide sequence of the GARS-AIRS-GART
exon 11 and adjacent genomic intron sequences. Sequence of exon 11
obtained from double-stranded sequencing of subcloned restriction
fragments, 6RR4.4 and
6RR3.9 (see Fig. 2B).
The taa stop codon for the monofunctional GARS transcript and the five
potential polyadenylation signals are underlined. The three
polyadenylation signals that are used for polyadenylation as identified
by 3`-RACE are indicated by roman numerals (I, II, III), and the sites of cleavage for addition of the
polyadenylation tail are noted. For polyadenylation site I
(3/19 clones analyzed), three different sites for cleavage were
identified; for site II (11/19 clones), the major class of clones was
identified; for site III (2/19 clones), the cleavage was identified by
both 3`-RACE and by the monofunctional GARS cDNA previously isolated.
The putative pausing signal is boxed.
Of the 19 clones analyzed,
three polyadenylation signals were found to be utilized, two of which
had the more common consensus polyadenylation signal sequence of AATAAA
(which has been reported to be used in 86% of genes surveyed, (18) ), and one of which had the less common consensus sequence
of ATTAAA (present in 12% of genes previously surveyed) (Fig. 6). Of the two AATAAA sequences recognized, one (site III
in Fig. 6) was already known to be used in the cDNA previously
isolated (3) and the other (site I) allowed cleavage of the
hnRNA at one of three closely spaced positions for subsequent addition
of the poly(A) tail. Only one cleavage site was found immediately
downstream of two overlapping polyadenylation signals identified
jointly as site III in Fig. 6. The predominant polyadenylation
signal that was used and identified by this 3`-RACE experiment was the
ATTAAA (site II). The presence of more than one polyadenylation signal
and the ability to use several of these polyadenylation signals appears
to represent a redundant mechanism. This suggests that the
monofunctional GARS plays an important role that cannot be fulfilled by
the GARS domain of the trifunctional GARS-AIRS-GART. For all three
polyadenylation signals used, a GT-like tract was found in proximity to
the site of cleavage for poly(A) addition; such a motif is important for
3` end formation and cleavage (reviewed in (19) ). In addition, examination of the
sequence in the intron downstream of the last polyadenylation site for
monofunctional GARS mRNA revealed a 29-nucleotide poly(T) tract (Fig. 6). This poly(T) tract was separate from the GT-like
tract. Similar poly(T) tracts present in the first intron of
c-myc, c-myb, and c-fos(20, 21, 22) have been implicated in
premature transcript arrest, subsequent generation of abbreviated
transcripts, and control of message levels for those genes. There is a
precedent that such a mechanism would allow intronic polyadenylation: a
synthetic poly(A) site placed in intron 2 of the -globin gene
could be recognized efficiently only if the upstream 5` splice donor
site was deleted or when a pause site was placed downstream of the
synthetic poly(A) site (23) . However, we note that a
functional mRNA is generated by utilization of an intronic
polyadenylation signal in the mouse GARS-AIRS-GART gene (Fig. 6), whereas this is not the case for c-myc,
c-myb, and c-fos.
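The sequence features discussed above (AATAAA and ATTAAA hexamers, and a long poly(T) tract downstream of the last utilized site) are simple enough to scan for programmatically. The sketch below is illustrative only: the function names, the `min_len` threshold, and the toy sequence are hypothetical, and the toy sequence is not the intron 11 sequence of Fig. 6. A lookahead pattern is used so that overlapping hexamers are each reported.

```python
import re

# The two hexamers discussed in the text.
POLYA_HEXAMERS = ("AATAAA", "ATTAAA")


def find_polya_signals(seq):
    """Return sorted (position, hexamer) pairs, including overlapping hits."""
    seq = seq.upper()
    hits = []
    for hexamer in POLYA_HEXAMERS:
        # A zero-width lookahead lets overlapping occurrences all match.
        for match in re.finditer(f"(?={hexamer})", seq):
            hits.append((match.start(), hexamer))
    return sorted(hits)


def find_t_tracts(seq, min_len=10):
    """Return (position, length) for each run of T at least min_len long."""
    pattern = re.compile("T{%d,}" % min_len)
    return [(m.start(), len(m.group())) for m in pattern.finditer(seq.upper())]


# Hypothetical toy sequence: one AATAAA, one ATTAAA, two overlapping
# AATAAA copies, and a 29-nt poly(T) tract.
toy = ("GCGC" + "AATAAA" + "CCGTGTGTCC" + "ATTAAA" + "GGCC"
       + "AATAAATAAA" + "GC" + "T" * 29 + "GG")

print(find_polya_signals(toy))
print(find_t_tracts(toy))
```

A scan of this kind only nominates candidate signals; which of them are actually used has to be established experimentally, as was done here by 3`-RACE.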
Figure 7: Nucleotide sequence of exon 1 of GARS-AIRS-GART and adjacent genomic sequence. The sequence of this exon was obtained by double-stranded sequencing of subcloned restriction fragment 1BB3.0 (see Fig. 2B). The start sites identified by 5`-RACE for the monofunctional and trifunctional transcripts and by primer extension analysis are noted. The potential GC boxes are underlined. The prevalent start site identified by 5`-RACE was used to denote nucleotide position +1 for numbering of the exons (upper case) with the promoter region and a portion of intron 1 shown (lower case letters).
Figure 8: Southern analysis of the HindIII RFLP. Mouse genomic DNA (10 µg) from CBA, C57 BL6, and the F1 hybrid were restricted with HindIII. The DNA was separated on a 1% agarose gel, blotted, and probed with the cDNAs pYTF1.0 and pYTF0.7 (see Fig. 2A). The additional HindIII site which resulted in restriction fragments of 3.9 and 2.2 kb is seen in both the CBA strain and the F1 hybrid, but not in the C57 BL6 mouse strain.
The exonic sequence for this gene differed from
the sequence of the cDNA that we previously published (3) by
several single nucleotide changes. Almost all of these polymorphisms
were in the third nucleotide of the codon and did not result in an
amino acid change. One of these polymorphisms (an A-to-G transition, cDNA
to genomic DNA sequence, at the second nucleotide of the codon) resulted
in Gly instead of the Asp previously reported in the mouse cDNA. All of
the polymorphisms identified are noted in GenBank.
We show that there is only one mouse genomic locus, spanning 28 kb, responsible for the generation of two related mRNAs, a monofunctional GARS mRNA and a trifunctional GARS-AIRS-GART. The sequence and relative location of all 22 exons of this gene and the intron sequence adjacent to these exons was determined. The first 11 exons of this gene encode the GARS domain and are used by both mRNAs. Exons 12 to 17 contain sequence for the AIRS domain of the trifunctional mRNA, exon 18 contains sequence for the end of the AIRS domain and the beginning of the GART domain, and exons 19 to 22 comprise the rest of the GART domain and the 3`-UTR of the trifunctional transcript. The mechanism used to generate the mature GARS and GARS-AIRS-GART transcripts does not involve alternative exon usage. Instead, both mature transcripts use the same terminal GARS exon (exon 11), with splicing to the next downstream exon (exon 12) generating the trifunctional GARS-AIRS-GART transcript, while 3` end formation and cleavage-polyadenylation within intron 11 of this gene generates the monofunctional transcript (Fig. 5A). The sequence of both the 3`-UTR of the monofunctional GARS cDNA and the corresponding location within intron 11 indicated the presence of multiple polyadenylation signals (four AATAAAs and one ATTAAA); at least three of the five can be recognized and used for polyadenylation. In addition, both transcripts use the same set of multiple transcriptional start sites, and, hence, the generation of the two classes of transcripts is unlikely to be regulated by different promoter elements in mouse L1210 cells.
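The exon assignments summarized above can be written down as a small data model, which makes the relationship between the two transcripts explicit. This is a sketch under the assignments stated in the text; the dictionary layout and function names are hypothetical conveniences, not part of the original analysis.

```python
# Exon assignments as stated in the text. Exon 1 carries only 5`-UTR;
# exon 18 holds the end of AIRS and the start of GART, so it is listed
# under both domains.
DOMAIN_EXONS = {
    "GARS": set(range(2, 12)),   # exons 2-11
    "AIRS": set(range(12, 19)),  # exons 12-18
    "GART": set(range(18, 23)),  # exons 18-22
}

TRANSCRIPTS = {
    "trifunctional": {"exons": set(range(1, 23)), "poly_a": "end of exon 22"},
    "monofunctional": {"exons": set(range(1, 12)), "poly_a": "within intron 11"},
}


def domains_encoded(name):
    """Return the domains whose exons are all present in a transcript."""
    exons = TRANSCRIPTS[name]["exons"]
    return [domain for domain, needed in DOMAIN_EXONS.items() if needed <= exons]


def shared_exons():
    """Return the exons used by both transcript classes."""
    tri = TRANSCRIPTS["trifunctional"]["exons"]
    mono = TRANSCRIPTS["monofunctional"]["exons"]
    return sorted(tri & mono)


for name in TRANSCRIPTS:
    print(name, domains_encoded(name), TRANSCRIPTS[name]["poly_a"])
print("shared exons:", shared_exons())
```

In this representation the two transcripts differ only in where the exon list stops and where the poly(A) tail is added, which is the point of model A in Fig. 5: no exon is unique to the monofunctional message.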
Whereas both transcripts use the same first 10 exons, removal of intron 11 by splicing competes directly with the recognition of the polyadenylation signals in intron 11. The ratio of the trifunctional to monofunctional mRNAs indicated that splicing predominated over polyadenylation in this alternative processing in spite of some substantial differences in the level of total expression between, for instance, L1210 cells and mouse liver (Fig. 1). This ratio also did not change appreciably, even when the levels of the trifunctional mRNA were down-regulated by contact inhibition of the growth of 10 T1/2 mouse embryo fibroblasts (data not shown). In all mouse tissues examined to date, the smaller message was always found. Hence, it appears that the GARS and the GARS-AIRS-GART mRNAs are coordinately expressed regardless of the level of transcription of this gene, at least in the adult tissues and cell lines we have examined.
A few other cases are known in which an
intronic polyadenylation signal is used without alternative exon usage:
the immunoglobulin heavy chain genes(26) , the thyroid hormone
receptor (c-erb-A-1)(27) ,
spectrin(28) , and the 2`-5`-oligo(A) synthetase
gene(29, 30) . In these cases, the production of the
different forms of mRNA by controlled use of an intronic
polyadenylation site constitutes a mechanism of tissue-specific or
developmentally regulated gene expression. In the well-characterized
immunoglobulin µ heavy chain gene, the recognition and use of the
intronic polyadenylation signal is developmentally regulated as a part
of the process of B cell maturation(31) . Thus, in plasma
cells, the prevalent form is the secreted (µs) form
produced by recognition of the polyadenylation signal in intron 4, and
the membrane form (µm), predominant in B cells, is
generated by splicing out intron 4 and use of the polyadenylation
signal at the end of exon 6. Several studies on regulation of the
µs and µm forms of immunoglobulin genes
indicate that developmentally regulated trans-acting factors
are involved (31, 32, 33) . The similarity of
this aspect of alternative processing of the immunoglobulin heavy chain
gene to that of the GART locus is striking, yet, so far, we do
not have evidence for the modulation of the intronic polyadenylation
mechanism as a control feature to regulate the two transcripts from the GART gene.
An analysis of the common features between the immunoglobulin heavy chain and the GART gene might permit insight into the underlying mechanism responsible for recognition of intronic polyadenylation sites. In previous studies on the immunoglobulin µ heavy chain gene, it was found that the sequence at the 5` splice donor site immediately preceding the site of intronic polyadenylation differed from the consensus sequence usually found at this position in that the nucleotide found at the +5 position of the intron was an A. In an extensive compilation of sequence homology around splice sites(16) , the nucleotide normally found at this position is a G, which is present in 87% of cases; A is present at this position in only 7% of analyzed genes. The centrality of this A to the mechanism of intronic polyadenylation is further suggested by its conservation in six surveyed mammalian and vertebrate species(31) . When this A was changed to a G in the mouse heavy chain gene, splicing predominated over intronic polyadenylation, even in cells that would have otherwise preferentially used the intronic polyadenylation signal(31) . The nucleotide used at this position of the 5` splice site of the 11th intron in the mouse GART gene is also an A (Fig. 6), suggesting that the immunoglobulin heavy chain and the GART genes have common steps in the mechanism used to generate a transcript by intronic polyadenylation.
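The +5 comparison described above amounts to reading one base of the intron and looking it up against the reported frequencies. A minimal sketch follows; the function names and the toy intron starts are hypothetical (they are not the sequences of Fig. 6), and only the two frequencies quoted in the text (16) are tabulated.

```python
# Frequencies at intron position +5 quoted in the text from the splice
# site compilation (16); other bases are not given there.
PLUS5_FREQUENCY = {"G": 0.87, "A": 0.07}


def donor_plus5(intron_seq):
    """Return the base at intron position +5 (1-based) of a 5` splice donor."""
    intron_seq = intron_seq.upper()
    if len(intron_seq) < 5:
        raise ValueError("need at least 5 nt of intron sequence")
    if not intron_seq.startswith("GT"):
        raise ValueError("intron does not begin with the canonical GT")
    return intron_seq[4]


def plus5_report(intron_seq):
    """Return (base, quoted frequency or None) for the +5 position."""
    base = donor_plus5(intron_seq)
    return base, PLUS5_FREQUENCY.get(base)


# Hypothetical intron starts: one with the common G at +5, one with A.
print(plus5_report("gtaagt"))
print(plus5_report("gtaaat"))
```

A weaker donor site is expected to slow commitment to splicing and so give cleavage-polyadenylation within the intron more opportunity, which is the interpretation offered for the µ heavy chain gene; whether the same holds for intron 11 here remains to be tested.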
Both a monofunctional GARS mRNA and a trifunctional message occur in chicken, human, D. melanogaster, and D. pseudoobscura (Fig. 1, Refs. 4, 9, and 34). The Drosophila GART locus has been cloned and sequenced in its entirety. Even though the Drosophila GART locus contains 7 exons compared to the 22 exons comprising the mouse locus, the position of three of the introns found in the Drosophila GART gene are exactly conserved in the mouse GART gene. Thus, introns 2, 3, and 5 in the Drosophila GART locus correspond to the position of introns 4, 10, and 14 in the mouse gene, respectively (Fig. 9). Intron 2 in Drosophila and intron 4 in the mouse gene interrupt the codon for a homologous serine (amino acid 142 in Drosophila and 149 in the mouse protein) both between the second and third nucleotides of the codon. Introns 3 and 5 in the Drosophila gene and introns 10 and 14 in the mouse GART locus interrupt the codons for homologous glycines; both sets of homologous introns again interrupt the codon between the second and third nucleotides. Hence, the position of these ancestral introns has been precisely conserved during the substantial divergent evolution between these two species. In addition, the intron involved in generation of the monofunctional GARS (Drosophila intron 4 and mouse intron 11) was conserved and interrupted the codon between the second and third nucleotides, occurring before the first AIRS coding region in both cases. However, the amino acid that is interrupted has diverged (arginine in mouse and isoleucine in Drosophila).
Figure 9: Conservation of the positioning of three introns between Drosophila and mouse GART loci. The sequence of the exons (upper case letters) surrounding the intron position (lower case letters) are shown. The amino acid interrupted by the intron is conserved between the two species. Note that the position of interruption within the codon is also the same between the two species.
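The position of an intron within a codon (often called its phase) follows from the number of coding nucleotides upstream of it. The sketch below illustrates the arithmetic for the conserved serine codons mentioned above. The function names are hypothetical, the phase numbering (0, 1, 2) is the commonly used convention rather than terminology from this paper, and the nucleotide counts are derived from the stated amino acid numbers, not taken from the sequence.

```python
def coding_nt_upstream(codon_number, nt_before_intron):
    """Coding nucleotides preceding an intron that falls after
    nt_before_intron bases (0, 1, or 2) of the given 1-based codon."""
    return (codon_number - 1) * 3 + nt_before_intron


def intron_position(coding_nt):
    """Return (phase, codon_number) for an intron after coding_nt bases.

    Phase 0 lies between codons (codon_number is then the codon that
    follows the intron); phase 1 lies after the first base of the codon;
    phase 2 lies between the second and third bases.
    """
    return coding_nt % 3, coding_nt // 3 + 1


# Serine 149 in mouse (intron 4) and serine 142 in Drosophila (intron 2),
# both interrupted between the second and third nucleotides of the codon.
mouse = intron_position(coding_nt_upstream(149, 2))
fly = intron_position(coding_nt_upstream(142, 2))
print("mouse intron 4:", mouse)
print("Drosophila intron 2:", fly)
```

Both introns come out in the same phase while interrupting homologous residues, which is the kind of agreement that distinguishes a conserved ancestral intron from one that merely lies nearby.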
In both Drosophila and mouse, the monofunctional GARS message contains an in-frame stop codon consisting of the TAA present as part of the 5` splice donor site. Thus, there are no additional amino acids which are added to the monofunctional GARS protein relative to the GARS domain present in the trifunctional GARS-AIRS-GART protein in both species. We have previously presented evidence that the monofunctional mouse GARS mRNA is translated and that this protein has GARS catalytic activity. The role of this monofunctional GARS protein is unknown, and its presence in mouse tissues is somewhat surprising in view of the development of a fused trifunctional GARS-AIRS-GART during evolution, presumably as a result of a selective advantage lent by substrate channeling or some related effect. It would seem that, if the monofunctional GARS did not have a function and was merely an evolutionary remnant, the intronic sequences causative of the origin of the monofunctional GARS mRNA would have been lost. The opposite is the case: the mouse 3`-UTR for the monofunctional message contains five polyadenylation signals, of which at least three are used, whereas the Drosophila GART gene contains only one polyadenylation signal in the intron immediately downstream from the GARS domains. Hence, not only was the Drosophila polyadenylation signal maintained during evolution, but these additional features appeared in the mouse gene, apparently to ensure the generation of a monofunctional mRNA and a catalytically active GARS protein. At the moment, we conclude only that the monofunctional GARS has an essential function which cannot be fulfilled by the GARS domain of the trifunctional protein.
We conclude that several aspects of the GARS-AIRS-GART gene would define it as a typical housekeeping gene, in particular, a ubiquitous expression pattern, and the patterns of promoter elements and multiple start sites found. However, the polyfunctionality of one of the encoded proteins, the careful conservation of this function during evolution, the use of multiple intronic polyadenylation sites as a means of generating an alternate transcript, and the coordinate up-regulation of both sets of transcripts in dividing cells all suggest it to be an intriguing example of higher order control of gene expression. This gene could serve as a valuable model system for the study of the mechanism of competition between splicing and cleavage/polyadenylation in an endogenous transcript in mammalian cells and aid in identification of protein factors required for these two processes. Unlike the immunoglobulin heavy chain gene, the mouse GARS-AIRS-GART locus does not have the added complications of complex gene rearrangements.
Note Added in Proof-A polymorphism was found in half of the 5`-RACE clones in which an additional CAG was present in these cDNAs at the position equivalent to the end of exon 1, resulting in a trinucleotide repeat (not shown in Fig. 7).
The nucleotide sequence reported in this paper has been submitted to the GenBank(TM)/EMBL Data Bank with accession number U18418.