(Received for publication, May 4, 1995; and in revised form, June 29, 1995)
Approximately 25% of Caenorhabditis elegans genes are
organized as operons. Polycistronic transcripts are converted to
monocistronic mRNAs by 3` cleavage/polyadenylation and 5` trans-splicing with untranslated, 5`-terminal exons called
spliced leaders (SLs). The 5` termini of mRNAs encoded by downstream
genes in operons are acceptors for 7 recently discovered
``novel'' SLs and a classical SL (SL2). Diversity in SL exons
is now partly explained by the discovery and characterization of five
novel genes that encode C. elegans SL RNAs. These novel SL
RNAs contain a 22- or 23-nucleotide SL followed by conserved splice
donor and downstream sequences that are essential for catalysis of trans-splicing reactions. The SL3α, SL4, and SL5 RNA genes
are tightly clustered on chromosome III; their 114-nucleotide
transcripts deliver three distinct SLs to mRNAs. The SL3β and SL3γ
RNA genes are on chromosome I, but are not tightly linked. SL
RNAs 3α, 3β, and 3γ provide identical 5` leader exons,
although their 3` sequences diverge. Transcription of SL 3-5 RNA
genes appears to be driven by flanking DNA elements that are homologous
with segments of promoters for the C. elegans SL2 RNA and
small nuclear RNA genes. RNase protection assays demonstrated that
novel SL RNAs are transcribed in vivo and accumulate in the
poly(A-) RNA pool. SL3 exons are transferred to mRNAs
as frequently as SL2 exons. In contrast, SL4 is appended to mRNAs 10%
as frequently as SL3. The abundance of SL4 RNA increased 6-fold during
postembryonic development, and the SL4 RNA gene promoter is active
principally in hypodermal cells.
The free living nematode Caenorhabditis elegans excises
noncoding segments of gene transcripts by three processes. Internal
introns are apparently removed by a conserved classical cis-splicing mechanism that involves snRNPs (U1-U6), a branched intermediate, and auxiliary protein
factors (1, 2, 3). Approximately 70% of C. elegans mRNAs are covalently modified at their 5` termini
by the addition of a 22-nt leader sequence (4) named the
``spliced leader'' or SL. SLs are attached in trans-splicing reactions(5, 6, 7) .
The 5` SL that appears most frequently on C. elegans mRNAs
corresponds to nucleotides 1-22 of the product of the SL1 RNA
gene (5, 8). This gene is reiterated in tandem
~100 times on chromosome V and encodes an RNA composed of
~95 nt (8). C. elegans transcripts that (a)
contain a consensus splice acceptor sequence (UUUCAG) upstream from the
coding region and (b) lack a corresponding 5` splice donor
sequence, are targets for trans-splicing(9) . Most of
the spliceosomal components required for cis-splicing are also
essential for trans-splicing(10) . Key differences are
as follows: (a) an SL1 (or SL2; see below) RNA complexed with
proteins is included in the splicing complex; (b) SL RNA is
consumed in each round of trans-splicing, thereby
necessitating reloading or reassembly of the spliceosome for subsequent
catalysis; and (c) SL RNA provides a trimethylguanosine cap
for mature mRNAs(11, 12) . The 22-nt leader derived
from SL1 RNA is usually appended to the 5` end of mRNAs derived from
typical genes, in which transcribed sequences are preceded by
contiguous promoter/enhancer regions.
Unlike many other eukaryotes, C. elegans contains numerous genes that are organized as
operons(4, 13) . Thus, transcription of two (or more)
structural genes is sometimes driven by a unique 5` promoter/enhancer
region. Polycistronic transcripts are converted to monocistronic mRNAs
by a combination of cleavage, polyadenylation, and trans-splicing(13, 14, 15) .
Messenger RNAs encoded by downstream genes in operons receive leader
sequences that differ from SL1 (13, 16). Initially,
two highly homologous SL RNA genes (SL2α and SL2β), which
encode identical 22-nt SLs, were identified as sources of these 5`
termini (16).
An early compilation of trans-spliced
cDNA sequences suggested that C. elegans mRNAs received SL1 or
SL2 in a mutually exclusive manner(16) . However, recent
applications of a reverse transcriptase-anchored PCR procedure (17) revealed that the 5` ends of mRNAs encoding a sex
determination factor (TRA-2; (18) ), protein kinase
C1A(19) , and the subunit of casein kinase II (
)(20) are modified with multiple
``novel'' SLs. Thus, our knowledge of SL RNA genes is
incomplete, and multiple pertinent questions regarding the origin and
utilization of the novel SLs can be asked. Are the novel SLs more
closely related to SL1 or SL2? Are the structures of the novel SL RNA
genes similar to the previously described SL1 and SL2 RNA genes or
different? Are the novel SL RNA genes clustered or scattered in the
genome? What are the sizes of the pools of novel SL RNA transcripts
relative to the levels of SL1 and SL2 RNAs? To what degree are the
novel SLs utilized on mRNAs in vivo? Is the expression of any
of the novel SL RNA genes regulated?
To address these questions we
have cloned and characterized five novel SL RNA genes. The genes can be
either clustered or isolated, and they encode homologous RNA sequences
of 114 nt. The novel 5` SL sequences are more closely related to
SL2α and SL2β than to SL1. Although the relative abundance of the
novel SL RNAs is low when compared with steady-state SL1 and SL2 RNA
concentrations, the utilization of certain novel 5` SL sequences on
mRNAs occurs with a substantial frequency. Finally, expression of the
novel SL4 RNA gene is developmentally regulated, and SL4 RNA
transcripts are detected only in hypodermal cells.
Figure 1:
Multiple,
nonclassical SLs are incorporated at the 5` termini of certain C.
elegans cDNAs. Primary structures of nonclassical SLs were
determined by the direct sequencing of amplified
cDNAs(17, 18, 19) . The cDNAs contained
complete copies of the 5` termini of C. elegans mRNAs encoding
protein kinase C1A, TRA-2, and casein kinase II (18, 19, 20) .
Only the SL
sequences are shown. SLs a-d were appended to protein kinase C1A
mRNA; SLs b-f were present at the 5` end of tra-2 mRNA;
SLg was found at the 5` terminus of cDNA encoding the
subunit of
casein kinase II. The novel SLs are aligned with the classical C.
elegans spliced leader sequences SL1 and
SL2 (5, 16). Novel SLs marked with an asterisk are evident in both protein kinase C1A and tra-2 cDNAs. A
distinctive dinucleotide sequence that is conserved in all of the novel
SLs is shown in boldface type. The lengths (in
nucleotides) of the SLs are given in parentheses.
Plasmids were linearized by
digestion with SpeI, and 32P-labeled antisense SL
RNA probes (~200 nt) were synthesized via SP6 RNA polymerase and
purified as described previously (20, 22). Control
(sense) probes were generated by T7 RNA polymerase after cleaving with PvuII. Hybridization of
32P-labeled antisense (or
sense) probes (10
cpm) with C. elegans total RNA
and subsequent digestion of single-stranded RNA with RNase T1
and RNase A were performed as previously reported (20).
Protected duplex RNA was denatured and subjected to electrophoresis in
a 6% polyacrylamide gel containing 7 M urea. Protected,
32P-labeled complementary SL RNA (~115 nt) was
visualized by autoradiography on XAR-5 film and quantified by
PhosphorImager (Molecular Dynamics) analysis.
The abundance of various SL sequences in
poly(A+) RNA was determined by RNase protection
analysis as described above for poly(A-) RNA.
32P-labeled antisense RNA probes (5
10
cpm) that complement only the SL sequence were employed. Probes
were generated by the method of Milligan et al.(32) using T7 RNA polymerase (>1000 units/ml) provided
by Heike Pelka (Department of Molecular Developmental Biology, Albert
Einstein College of Medicine). Reaction products (22- or 23-nt SL plus
12 or 13 irrelevant nt) were analyzed on a 10% polyacrylamide, 7 M urea gel to verify their lengths and purity. The 35-nt probes were
readily distinguished from the 22/23-nt protected fragments in RNase
protection assays.
Figure 3:
Sequence of a segment of C. elegans DNA that contains the SL3, SL4, and SL5 RNA genes. The
spliced leader sequences SL3 (inverse complement of nucleotides
933-954), SL5 (nucleotides 1790-1812), and SL4 (nucleotides
2691-2712) are shown in boldface type.
Sequences of the SL RNA genes are underlined. Restriction
enzyme sites (ClaI, SacI, and XbaI, in
ascending order) used to clone the individual SL RNA genes and their
associated 5`- and 3`-flanking regions are indicated with boldface
italic type. The 152-nt inverted repeats that flank the gene
cluster are shown in lowercase letters.
The pPD16.51 expression plasmid (33) was cleaved at a unique SacI site that precedes the lacZ gene. An
oligonucleotide primer was designed to initiate Taq DNA
polymerase-catalyzed DNA synthesis 2 kbp downstream from the lacZ transcription initiation site. DNA synthesis progressed toward the
5` end of the lacZ gene, yielding a 2-kilobase
single-stranded, antisense DNA probe. Primer (30 ng) and 200 ng of
template DNA were added to a reaction mixture containing 0.5 M KCl, 0.01 M Tris-HCl, pH 8.3, 1.5 mM MgCl2, 0.001% gelatin, 0.2 mM dATP, 0.2 mM dCTP, 0.2 mM dGTP, and 0.13 mM dTTP.
Digoxigenin-11-dUTP was added to a final concentration of 70
µM. The reaction mixture (25 µl) was heated at 100
°C for 3 min and then was incubated in a thermal cycler under the
following conditions: denaturation at 95 °C for 45 s; annealing at
55 °C for 30 s; and synthesis at 72 °C for 90 s. After 35
cycles, single-stranded DNA was precipitated with ethanol and dissolved
in 0.3 ml of buffer B. The probe was boiled for 10 min and an aliquot
(6 µl) was added to a suspension (20 µl) of fixed nematodes in
buffer B. Hybridization of probe with lacZ RNA derived from
the chimeric reporter genes (see above) was carried out at 48 °C
for 16 h. Subsequently, C. elegans were washed serially with
1-ml aliquots of PBS, 0.1% Tween 20 containing 80, 60, 40, 20, and 0%
buffer B at 22 °C. C. elegans were then washed with 0.1%
(w/v) albumin and 0.1% Triton X-100 in PBS (buffer C) and incubated 16
h at 4 °C with a 1:2500 dilution of antibody directed against
digoxigenin (Boehringer Mannheim) in buffer C. The IgG was coupled to
alkaline phosphatase. After washing C. elegans four times with
10 volumes of buffer C, antigen-antibody complexes were detected by
incubation with a solution containing 0.5% (w/v) 4-nitro blue
tetrazolium chloride and 0.33 mg/ml 5-bromo-4-chloro-3-indolyl
phosphate, 0.1 M NaCl, 5 mM MgCl2, 0.1 M Tris-HCl pH 9.5, 1 mM levamisole, and 0.1% Tween 20
for 20 min at 37 °C. Alkaline phosphatase catalyzes the synthesis
of an insoluble blue reaction product. Procedures for fixation of C. elegans and in situ hybridization of RNA are
modifications of methods described by Tautz and Pfeifle (36) and Patel and Goodman(37) , respectively.
Modifications were introduced by Seydoux and Fire (Dept. of Embryology,
Carnegie Institution of Washington).
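The graded wash series and the thermal cycling program described above reduce to simple arithmetic. The following Python sketch is illustrative only: the function names are hypothetical, and it assumes that each 1-ml wash is prepared by mixing buffer B with PBS, 0.1% Tween 20 by volume, which the text implies but does not state. Ramping time between temperatures is ignored.

```python
def wash_series(percent_buffer_b=(80, 60, 40, 20, 0), aliquot_ml=1.0):
    """Return (percent buffer B, ml buffer B, ml PBS/Tween) for each wash.

    Assumption: percentages are volume fractions of a fixed-volume aliquot.
    """
    steps = []
    for pct in percent_buffer_b:
        buffer_b_ml = aliquot_ml * pct / 100.0
        steps.append((pct, buffer_b_ml, aliquot_ml - buffer_b_ml))
    return steps


def cycling_minutes(denature_s=45, anneal_s=30, extend_s=90, cycles=35):
    """Total programmed hold time in minutes (temperature ramps not counted)."""
    return cycles * (denature_s + anneal_s + extend_s) / 60.0


if __name__ == "__main__":
    for pct, b_ml, pbs_ml in wash_series():
        print(f"{pct:>2d}% buffer B: {b_ml:.2f} ml buffer B + {pbs_ml:.2f} ml PBS/Tween")
    print(f"35 cycles: {cycling_minutes():.2f} min of programmed holds")
```

Under these assumptions, the 35 cycles account for roughly an hour and a half of programmed hold time, in addition to the initial 3-min incubation at 100 °C.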
Figure 2:
Characterization and cloning of fragments
of C. elegans DNA that encode novel SL RNA genes. A and
B, samples (10 µg) of high molecular weight C. elegans DNA were digested with ClaI (lane 1)
and PstI (lane 2) and processed as described
under ``Experimental Procedures.'' Dried agarose gels were
probed with 32P-labeled oligonucleotides corresponding to
SLb (A) and SL2 (B) (see Fig. 1 and
``Experimental Procedures''). Autoradiograms are shown. C, genomic DNA inserts derived from clones (see
``Experimental Procedures'') of recombinant bacteriophage
EMBL4 designated SLA (lane 1) and SLB (lane 2) were digested with AluI and MspI, respectively, and subjected to Southern gel analysis as
described under ``Experimental Procedures.'' Dried agarose
gels were probed with a 32P-labeled oligonucleotide
corresponding to SLb. Autoradiograms are presented. Gels were
calibrated with DNA markers that were electrophoresed in parallel lanes
and stained with ethidium bromide. Sizes of the markers are given in
kbp.
Subsequently, a C. elegans genomic DNA
library in bacteriophage EMBL4 was screened with 32P-labeled
oligonucleotides complementary to the SLb and SLa leader sequences (Fig. 1). Approximately 200 candidate clones were obtained from
1.6
10
recombinant phage. Several recombinants were
plaque purified, and the DNA inserts were characterized by digestion
with restriction endonucleases and Southern gel analysis. Each genomic
insert contained multiple copies of novel SL RNA genes. Representative
examples are shown in Fig. 2C.
Figure 5:
Alignments of novel and classical SL RNA
sequences. A, derived sequences of the novel SL RNAs were
obtained from the results presented in Fig. 3 and Fig. 4.
Sequences of SL1, SL2α, and SL2β
RNAs were obtained from (5) and (16). Nucleotides conserved in all SL RNA
transcripts are marked with asterisks; nucleotides comprising
the donor splice site are shown in boldface type;
nucleotides that constitute the Sm antigen binding site are underlined. The conservation of key sequence motifs in the
intronic region of novel C. elegans SL RNAs is illustrated in panel B. A. lumbricoides expresses a single SL RNA that
contains the indicated sequences(46) . The introduction of
mutations into these functional regions inhibits the trans-splicing of mRNAs catalyzed by extracts of Ascaris embryos(39) .
Figure 4:
Sequences of the C. elegans SL3β and SL3γ
RNA genes. Sequences of C. elegans genomic DNA fragments that include the SL3β
RNA gene (A) and the SL3γ
RNA gene (B) are presented.
Positions of the SL3 sequence are indicated with boldface lettering. The SL RNA structural genes are underlined.
Contiguous DNA sequences that flank the 5` and 3` ends of the SL RNA
genes are also shown.
The SL3β and SL3γ RNA genes were found in the DNA inserts of
SLB and SLC (Fig. 4). Spliced leader sequences encoded
by these genes are identical with nucleotides 1-22 of SL3α
RNA, but their downstream sequences differ significantly.
Derived
novel SL RNA sequences are aligned with SL1, SL2α, and SL2β
RNAs in Fig. 5A. The novel SL RNA genes encode spliced
leaders (nucleotides 1-22 or 1-23) that are more homologous
with SL2 (at least 80% identity) than SL1 (
64% identity). A feature
that distinguishes novel SLs from both SL1 and SL2 is the AC
dinucleotide that lies 6 nt upstream from the donor splice site. In
contrast, two segments of SL sequences that correspond to nucleotides
1-5 and 8-12 in SL2 are invariant.
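The identity values quoted above are position-by-position comparisons of aligned sequences. A minimal Python sketch of that calculation follows. The function names are hypothetical, and the two example strings are arbitrary placeholders rather than actual SL sequences, which appear only in the figures.

```python
def percent_identity(seq_a, seq_b):
    """Percent of aligned positions carrying the same nucleotide.

    Both inputs must already be aligned to the same length; a gap
    character ('-') never counts as a match.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b and a != "-")
    return 100.0 * matches / len(seq_a)


def min_matches(length, percent):
    """Smallest number of matching positions giving at least `percent` identity."""
    return -(-length * percent // 100)  # integer ceiling division


if __name__ == "__main__":
    # Placeholder sequences for illustration only (not real spliced leaders).
    print(percent_identity("ACGUACGUAC", "ACGUACGUAG"))
    # Matching positions needed for "at least 80% identity" over a 22-nt leader.
    print(min_matches(22, 80))
```

For a 22-nt leader, "at least 80% identity" therefore corresponds to 18 or more matching positions.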
Approximately 80 nt
at the 3` end of an SL RNA transcript are essential for catalyzing the
joining of an SL with a target mRNA(38, 39) . Included
in this region are a donor splice site and an Sm antigen-binding
segment(39, 40) . Antibodies directed against
``Sm'' proteins that are common to all snRNPs (41) also bind with RNPs containing C. elegans SL
RNAs(42, 43, 44) . Proteins that interact
with the Sm sequence are essential for the formation of active
spliceosomes(39, 40) . These key functional regions
are evident in the novel SL RNAs (Fig. 5). The splice donor site
(AGGU) is conserved throughout the SL RNA family. However, the
site is followed by U in the novel SL RNAs as compared with A in SL1
and SL2 RNAs. Three versions of the consensus Sm antigen binding site
(PuoA(U)
GPuo) (45) are employed in C. elegans SL RNAs: SL2, SL3
, SL3
and SL5 RNAs share the sequence
AA(U)
GG (nucleotides 70-78 in SL2); SL1 and SL3
RNAs contain AA(U)
GG; and the corresponding region of SL4
RNA is AA(U)
GA. In Ascaris lumbricoides, a
parasitic nematode, short sequences immediately downstream (AAC) and
15 nt upstream (GUGGC) from the Sm site (Fig. 5B)
play central roles in the first step of the trans-splicing
reaction(39, 40) . Modified versions of these
sequences (Fig. 5A) are present at the appropriate
locations in C. elegans SL RNAs. Sequences of functional
regions in C. elegans SL RNAs are compared with the analogous
segments of SL RNA from Ascaris in Fig. 5B.
The novel 114-nt SL RNAs are more homologous with SL2α and SL2β
RNAs (~70% identity) than with SL1 RNA (~60% identity). The degree of
similarity with SL2 RNA declines according to the following pattern:
SL3
RNA > SL3
RNA > SL3
RNA > SL4 RNA > SL5
RNA. Thus, the closely linked SL3
, SL4, and SL5 RNA genes (Fig. 3) exhibit maximal homology among themselves and diverge
from SL RNA genes located on different chromosomes (see below). Several
permutations of 5` SLs and 3` catalytic regions are engaged in trans-splicing (Fig. 5A). For example,
SL3
and SL4 RNAs are 90% identical, but unlike the
highly conserved SL2α and SL2β RNAs, they donate different SLs.
In contrast, SL3 is provided by discrete transcripts (SL3
and
SL3
RNAs) that share only 80% overall identity.
Figure 6: Alignments of putative PSE sequences that flank the 5` ends of C. elegans snRNA and SL RNA genes. The consensus predicted snRNA PSE is taken from (47); the consensus SL RNA PSE is based on the published sequences of DNA that lies upstream from the SL1 and SL2 RNA genes (5, 16) and DNA sequences that precede the novel SL RNA genes (as reported in Fig. 3 and Fig. 4). Conserved nucleotides in the PSEs are indicated with boldface type.
Although perfectly matched probe and
target DNAs yielded the most intense signals, hybridization experiments
also identified distinct but homologous SL RNA genes. This is
illustrated by results obtained for SL2 RNA genes. Radiolabeled probes
derived from the full-length SL2 RNA gene hybridized maximally (Fig. 7A) with YACs (Y74A11, Y56D12, Y48F5) that place
the gene at its previously established locus (on chromosome I) in the
physical map. However, two other strong signals (Fig. 7B) and five weaker signals (not shown) were
observed on the grid after longer exposures to x-ray film. One strong
signal corresponds to YAC Y39H2, which contains the 93% identical
SL2β RNA gene (16). Another highly conserved SL2-like gene
is evidently present in YAC Y50H4. Assuming that each YAC clone
contains 1 or 2 copies of SL RNA genes, the results suggest that there
are 3-6 SL2 RNA genes and 5-10 closely related genes in the C. elegans genome.
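The gene-number estimate above follows from multiplying the number of hybridizing loci by the assumed 1 or 2 gene copies per YAC. The Python sketch below reproduces that arithmetic. The function name is hypothetical, and treating the three overlapping YACs at the chromosome I locus as a single locus is an interpretation of the text rather than a statement from it.

```python
def copy_range(n_loci, copies_per_locus=(1, 2)):
    """Return (minimum, maximum) gene count for a number of hybridizing loci."""
    low, high = copies_per_locus
    return n_loci * low, n_loci * high


if __name__ == "__main__":
    # Strong signals: the chromosome I locus (Y74A11/Y56D12/Y48F5, counted once),
    # plus YACs Y39H2 and Y50H4.
    strong_loci = 3
    # Weaker signals seen only after longer exposures.
    weak_loci = 5
    print("SL2 RNA genes:", copy_range(strong_loci))
    print("closely related genes:", copy_range(weak_loci))
```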
Figure 7:
Identification and chromosomal locations
of SL2 RNA genes and SL2 RNA gene homologs. A filter grid that contains
>90% of the C. elegans genome in an array of YACs was
probed with random-primed, 32P-labeled DNA corresponding to
the SL2α RNA gene (see ``Experimental Procedures'').
Autoradiograms were obtained 8 (A), 48 (B), and 96 h
(not shown) after hybridization and washing under stringent conditions
as defined in (21) . Signals obtained from YACs Y74A11, Y56D12,
and Y48F5 are indicated with arrows; triangles mark
the signals from YACs Y39H2 and Y50H4.
When similar analyses were performed
with novel SL RNA gene probes, 20 homologous DNA fragments were
identified in recombinant YACs. The DNA inserts mapped to all six
chromosomes, but none hybridized with probes derived from the SL2α
RNA gene. Thus, ~30 genes direct the synthesis of RNAs that can
donate SL2-related and novel spliced leader exons to mRNAs. These
results, the identification of SLs for which genes have yet to be
cloned (Fig. 1) and the possible occurrence of SL RNA genes that
do not hybridize with the SL gene probes used in these studies, suggest
that the total numbers of SL1 RNA genes (~100 tandem copies on
chromosome V) and dispersed non-SL1 SL RNA genes may be similar.
Figure 8:
Novel SL RNA genes are expressed in C.
elegans. The accumulation of full-length SL RNA transcripts was
monitored by RNase protection analysis as described under
``Experimental Procedures.'' Assays were performed with 40
µg of total RNA isolated from a mixed population of C.
elegans. The 32P-labeled antisense RNA probes used were
complementary to SL1 RNA (lane1), SL2
RNA (lane2), SL3
RNA (lanes3 and 4), and SL3
, SL5, SL4, and SL3
RNAs (lanes 5-8, respectively). The sample applied to lane 3 was hybridized with 40 µg of tRNA instead of total C. elegans RNA. A composite autoradiogram is presented.
Signals from lanes 1 and 2 were obtained
after exposing x-ray film for 3 h, whereas the time of exposure was
increased to 48 h for lanes 3-8. This
experiment was replicated four times. Similar data were obtained in
each instance. Typical results are shown.
Figure 9:
RNase protection analysis distinguishes
among individual SL RNA transcripts. A, RNase protection
assays were performed as described under ``Experimental
Procedures'' and in Fig. 8 with one modification.
Nonradioactive SL2 RNA (0.5 µg) was substituted for C.
elegans total RNA. The
32P-labeled antisense RNA probes
used were complementary to SL1, SL2
, SL3
, SL3
, SL5, SL4,
and SL3
RNAs (lanes 1-7, respectively). An
autoradiogram is shown. Panel B shows autoradiographic signals
obtained when
32P-labeled, antisense RNA complementary to
SL4 RNA was hybridized with 0.5 µg of nonradioactive SL3
(lane 1) or SL4 (lane 2) RNA. Arrows indicate the position of the full-length, protected
probes. Lower molecular weight bands in lane 2 (panel A) are apparently due to a low level of partial
radiolytic cleavage of the probe. These bands contribute
~8% of the
protected radioactivity and are specifically protected only by SL2
RNA.
The major protected species obtained
with the SL2 and SL3
probes were closely spaced doublets (Fig. 8, lanes2 and 4). This may
result from ``breathing'' in the RNA duplexes. Alternatively,
the two species may reflect heterogeneity at the 3` ends of the
poly(A-) SL RNAs or differentially modified 5`
caps (48). As expected, the protected fragment of antisense SL1
RNA is ~95 nt in length (5). In addition to the principal
protected species, unique patterns of smaller fragments are observed
with antisense probes for SL RNAs 2
, 3
, 3
, and 3
(Fig. 8, lanes 2, 4, 5, and 8). These fragments probably arise from partial protection of
the probes by homologous but nonidentical SL RNAs.
Relative levels
of novel SL RNAs were measured by PhosphorImager analysis. Transcripts
derived from the SL 3α, 3β, 3γ, 4, and 5 RNA genes are
collectively 0.2 ± 0.02% (mean ± S.E., n = 4) as abundant as SL1 RNA and 0.9 ± 0.07% (n = 4) as abundant as SL2
RNA. Thus, the steady-state
levels of the transcripts of the five newly discovered genes account for
only a small fraction of SLs that are available for modifying the 5`
ends of C. elegans mRNAs.
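The two percentages above also imply a ratio between the SL2 and SL1 RNA pools, although the text does not state it. The Python sketch below derives that ratio under the assumption that both percentages refer to the same pooled novel SL RNA signal measured on a comparable basis. The function name is hypothetical, and the reported standard errors are ignored.

```python
def implied_ratio(fraction_of_a, fraction_of_b):
    """If X = fraction_of_a * A and X = fraction_of_b * B, return B / A."""
    if fraction_of_b <= 0:
        raise ValueError("fraction_of_b must be positive")
    return fraction_of_a / fraction_of_b


if __name__ == "__main__":
    novel_vs_sl1 = 0.2 / 100.0  # novel SL RNAs relative to SL1 RNA
    novel_vs_sl2 = 0.9 / 100.0  # novel SL RNAs relative to SL2 RNA
    ratio = implied_ratio(novel_vs_sl1, novel_vs_sl2)
    print(f"Implied SL2 RNA level relative to SL1 RNA: {ratio:.2f}")
```

On these assumptions, full-length SL2 RNA would be present at roughly one-fifth of the SL1 RNA level.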
An assay
was designed to estimate the levels of specific SLs on intact SL RNA
molecules. 32P-labeled antisense RNA probes corresponding to
individual 22- or 23-nt SLs were protected with samples of
poly(A-) RNA. Conditions of hybridization and RNase
digestion were adjusted empirically so that only perfectly matched
probes and target RNA sequences yielded significant signals. Typical
results are shown in Fig. 10. SL1 is the most abundant spliced
leader in C. elegans. However, the amounts of SL exons 3, 4,
and 5 in the poly(A-) RNA population (Fig. 10)
are much higher than the levels of the individual full-length
transcripts reported above. The observations that excess SL2 and SL3
RNA sequences do not protect
P-labeled SL3 and SL2
antisense probes, respectively (data not shown), verified the
specificity of the assays. Quantification of the results with a
PhosphorImager revealed that SLs 2-5 are each 5-15% as
abundant as SL1 in poly(A-) RNA. Collectively, the
size of the donor pool of SLs 2-5 is similar to the size of the
SL1 donor pool. Evidently, multiple SL RNA genes encode identical 5` SL
(2, 3, 4, or 5) sequences.
Figure 10:
Detection of novel spliced leader exons
(SL3, SL4, and SL5) in the poly(A-) fraction of C.
elegans RNA. RNase protection analysis was performed as described
under ``Experimental Procedures,'' using 30 µg of C.
elegans poly(A-) RNA. The 32P-labeled
oligonucleotide probes were complementary to the 22 or 23 nucleotides
that constitute SL1 (lane 1), SL2 (lane 2), SL3 (lane 3), SL5 (lane 4), and SL4 (lane 5). An autoradiogram
is presented. The gel was calibrated with DNA oligonucleotides of the
indicated sizes that were fractionated in parallel lanes. This
experiment was replicated three times and the results were essentially
the same in each instance.
Under standard conditions of
electrophoresis the protected, 32P-labeled antisense SLs
exhibit mobilities similar to the mobility of a DNA marker comprising
26 nucleotides (Fig. 10). Moreover, the SL bands are somewhat
diffuse. Several factors may account for these properties: RNA
molecules migrate 5-10% more slowly than DNA fragments in this
gel system; the 3` ends of the protected antisense RNAs are
heterogeneous because irrelevant nucleotides immediately downstream
from the SL sequence in the probe can be protected when they match (by
chance) downstream nucleotides in intact SL RNAs; the high resolution
gel system can separate molecules of the same length on the basis of
nucleotide composition; and RNAs lacking the trimethylguanosine cap
migrate faster than capped RNAs. Protected fragments that
migrate more rapidly than the 22-nt DNA marker may be due to the
hybridization of the probe with highly homologous but distinct SLs
and/or ``breathing'' in the AU-rich region of RNA duplexes
that corresponds to the conserved 5` ends of the SL sequences (Fig. 5A).
Figure 11:
Detection of novel spliced leader exons
incorporated in C. elegans mRNAs. RNase protection assays were
performed (see ``Experimental Procedures'') using 1.0 µg
of poly(A) RNA for lanes2-5,
0.2 µg of poly(A
) RNA for lane1, and 2.3 µg of poly(A
) RNA for lane 6. Assays were performed with
32P-labeled probes complementary to SL1 (lane 1), SL2 (lane 2), SL3 (lane 3), SL5 (lane 4), and SL4 (lanes 5 and 6). Protected fragments from the
radiolabeled probes were fractionated by electrophoresis in a 15%
polyacrylamide, 7 M urea gel. Signals were visualized by
autoradiography. Autoradiographic signals were quantified as described
under ``Experimental Procedures,'' and the data are presented
in Table 1. Size markers were oligodeoxynucleotides of the
indicated size (in nt) that were end-labeled with
[γ-32P]ATP and T4 polynucleotide
kinase.
Northern
blots of poly(A+) RNA were incubated with radiolabeled
antisense DNA corresponding to either SL3 or SL4. Both probes
hybridized with a heterogeneous array of mRNAs that ranged from several
hundred to >3000 nt in length (Fig. 12). Although high
intensity bands of 0.5 and 1.1 kilobases were observed with the
SL3-specific probe, it appears that both SLs are incorporated into
large constellations of mRNAs. Signal intensities from the Northern
blots were measured in a PhosphorImager. Messenger RNAs containing SL3
are ~8-fold more abundant than mRNAs with a 5` SL4 exon. Thus,
relative frequencies of SL3 and SL4 utilization determined by RNase
protection and Northern analyses are in agreement.
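The agreement noted above can be checked by putting both estimates on the same scale. The Python sketch below converts the fold difference from the Northern analysis into a relative frequency, so that it can be set beside the statement that SL4 is appended to mRNAs 10% as frequently as SL3. The function name is hypothetical.

```python
def fold_to_percent(fold_difference):
    """Convert 'A is N-fold more abundant than B' into B as a percent of A."""
    if fold_difference <= 0:
        raise ValueError("fold difference must be positive")
    return 100.0 / fold_difference


if __name__ == "__main__":
    northern_percent = fold_to_percent(8)  # SL3 mRNAs 8-fold more abundant than SL4 mRNAs
    rnase_protection_percent = 10.0        # SL4 utilization relative to SL3
    print(f"Northern: SL4 is {northern_percent:.1f}% of SL3")
    print(f"RNase protection: SL4 is {rnase_protection_percent:.1f}% of SL3")
```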
Figure 12:
SL3 and SL4 are incorporated into numerous C. elegans mRNAs. C. elegans poly(A+) RNA (2.5
µg/lane) was denatured and fractionated in a 1% agarose
gel as described previously (30). Resolved mRNAs were
transferred to Nytran membranes, and separate blots were probed with
32P-labeled antisense DNA probes complementary to SL3 (lane 1) and SL4 (lane 2).
Hybridization and washing conditions are described under
``Experimental Procedures.'' RNAs that hybridized with the
probes were visualized by autoradiography at -70 °C.
Figure 13:
Expression of SL RNAs during C.
elegans development. RNase protection analysis was performed as
described under ``Experimental Procedures'' using 30 µg
of total RNA from L1 larvae, L3 larvae, and young adult (A)
animals. 32P-labeled antisense RNAs complementary to SL1,
SL2, SL3
, and SL4 RNAs were employed as probes. The fragments
protected by SL2, SL3
, and SL4 RNAs are 115 nt in length; the
fragment protected by SL1 RNA is 95 nt long. Signals were recorded by
autoradiography. Quantitative analysis was performed with a
PhosphorImager (Molecular Dynamics).
Figure 14:
Expression of SL4 RNA gene promoter
activity in C. elegans hypodermal cells. Transgenic C.
elegans, which contain a lacZ reporter gene downstream
from 800 bp of DNA that flanks the 5` end of the SL4 RNA gene, were
generated as described under ``Experimental Procedures.'' RNA
encoding β-galactosidase was detected by in situ hybridization with digoxigenin-labeled, antisense DNA as described
under ``Experimental Procedures.'' RNA-DNA complexes were
visualized by incubating the specimens serially with anti-digoxigenin
IgGs coupled to alkaline phosphatase and a chromogenic substrate.
IgG-coupled alkaline phosphatase catalyzes the synthesis of an
insoluble blue reaction product in cells transcribing the SL4 RNA gene.
Nomarski interference microscopy revealed that the histochemical stain
appeared in the nuclei of hypodermal cells. A photograph of a stained
adult animal (taken with a Zeiss Axioscope microscope at a
magnification of ×100) is presented. More than 90% of the
transgenic C. elegans exhibited similar staining
patterns.
Seminal studies by Hirsh and co-workers (5, 6, 8, 16) and Blumenthal and colleagues (9, 13, 49) demonstrated that 5` ends of C. elegans mRNAs are often covalently modified via trans-splicing. Early reports suggested that targeted mRNAs received either of two 22-nt leader sequences (SL1 or SL2) in a mutually exclusive manner (16). However, recent determinations of sequences at the extreme 5` ends of several C. elegans cDNAs suggested that a larger family of SL RNA genes may be present in the C. elegans genome (Fig. 1 and Refs. 18, 19, 20). Heretofore, direct evidence for the occurrence of novel SL RNA genes, the transcription of such genes, and the utilization of their 5` SL exons for trans-splicing mRNAs in vivo was lacking.
We have
now cloned and characterized genes that encode five novel C.
elegans SL RNAs. Each gene yields a transcript of 114 nt,
which contains several consensus sequences that are predicted to be
essential for biological activity (Fig. 3, Fig. 4, Fig. 5, and Fig. 6). Three of these
genes (encoding SL RNAs 3α, 4, and 5) are clustered within 2 kbp of
genomic DNA near the center of chromosome III. Despite the tight gene
linkage, the corresponding SL RNAs donate three distinct spliced
leaders. Head to tail and head to head orientations of SL RNA genes are
evident in the cluster, suggesting that transcription may proceed from
promoters located in both DNA strands. The observations that composite
SL RNA/snRNA promoter elements (PSEs) lie ~50 bp upstream from
nucleotide 1 in the SL RNA coding sequences (i.e. the
predicted transcription start site) support this idea (Fig. 3, Fig. 4, and Fig. 6).
Two other novel SL RNA genes
(3β and 3γ) and associated PSEs were sequenced and mapped to
chromosome I. SL RNA 3β and 3γ transcripts provide identical 5`
leader exons, although their 3` sequences differ significantly (Fig. 4 and Fig. 5A).
The derived 114-nt SL3α-γ, SL4, and SL5 RNAs are more homologous with SL2 RNAs than
with the SL1 transcript (Fig. 5A). Likewise, the novel 22-
or 23-nt spliced leader sequences (SLs 3-5) are more closely
related to SL2 than to SL1. These homologies and the ability of multiple
SL RNA genes to donate identical leader sequences (see Fig. 3, Fig. 4, Fig. 5, and Ref. 16) suggested
that SL2-5 RNA genes may be members of a larger subfamily. A
minimum estimate of the size of this group of genes was obtained by
hybridizing filters, which contain an ordered, overlapping array of DNA
fragments comprising most of the C. elegans genome, with
32P-labeled probes for the SL2, SL3, SL4, and SL5 RNA genes.
Approximately 30 loci that are identical or closely related to the
SL2-5 RNA genes were identified on the autosomal and X
chromosomes. Thus, this gene family is both large and dispersed over a
substantial portion of the total genome.
During the preparation of
this paper, sequences of overlapping cosmid inserts derived from C.
elegans chromosome III were deposited in the GenBank data base, in the context of the C. elegans genome
project (50). The cosmid designated CEL B0280 (accession number
U10438) contains the clustered SL3α, SL4, and SL5 RNA genes. The
sequence we determined for the genomic DNA insert in
SLA (Fig. 3) is identical with that reported for cosmid CEL B0280.
However, the presence of three SL RNA genes and their associated PSEs
was not detected by the methods employed in the genome project.
Analysis of the data from the C. elegans genome project with
the Genefinder program predicts that the three SL RNA genes in the
cluster lie within introns of a gene that encodes a putative glutamate
receptor subunit (50). Placement of small SL RNA genes within
introns of a larger gene may reflect the efficient utilization of C. elegans' relatively small genome (
2
10
bp). The nematode genome, which is only
5% as large
as the human genome, contains many introns and intergenic DNA sequences
that are considerably smaller than their counterparts in mammalian
systems (51). As additional data on the C. elegans genome are generated, it will be possible to determine whether
dispersed SL RNA genes are typically positioned within other genes or
are distributed in a more randomized manner along the chromosomes.
Sharp (52), Nilsen (39), and Blumenthal and Thomas (51) suggested that SL RNAs are functional chimeras composed of a 5`
exon that is fused to an snRNA-like downstream sequence. By performing in vitro mutagenesis and assaying trans-splicing in
extracts of Ascaris embryos, Nilsen and colleagues (38, 39) demonstrated that the intronic, snRNA-like
region of an SL RNA can deliver a variety of natural and synthetic 5`
exons to acceptor mRNAs. SL RNA sequence elements that are essential
for trans-splicing are the Sm protein(s) binding site, a
trinucleotide sequence immediately downstream from the Sm site, a
pentanucleotide sequence 15 nt upstream from the Sm site, and a 5`
splice site (Fig. 5B). Consensus sequences for each of
these elements appear in the novel C. elegans SL RNA
transcripts as well as the classical SL2 and SL1 RNAs, thereby
suggesting that the newly discovered genes encode components of SL RNPs
and trans-spliceosomes.
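The arrangement of a 5` leader exon, a splice donor, and a downstream Sm site can be checked computationally. The following is a minimal sketch, not the analysis used in this work. It assumes the commonly cited Sm-site consensus RAU(3-6)GR and a GU dinucleotide at the 5` splice site. The function name `find_sl_elements` is hypothetical, and `toy_sl_rna` is an invented sequence, not a real C. elegans SL RNA.

```python
import re

# Assumed Sm-site consensus: purine, A, three to six U, G, purine.
SM_SITE = re.compile(r"[AG]AU{3,6}G[AG]")

def find_sl_elements(rna, leader_len=22):
    """Locate the splice donor and an Sm-like site in an SL RNA (illustrative).

    rna        -- RNA sequence written 5' to 3' with letters A/C/G/U
    leader_len -- length of the spliced leader exon (22 or 23 nt in the text)
    """
    rna = rna.upper()
    donor = rna[leader_len:leader_len + 2]   # first two intron nucleotides
    sm = SM_SITE.search(rna, leader_len)     # search only 3' of the exon
    return {
        "leader": rna[:leader_len],
        "donor_is_GU": donor == "GU",
        "sm_site": sm.group(0) if sm else None,
        "sm_start": sm.start() + 1 if sm else None,  # 1-based position
    }

# Invented sequence: 22-nt placeholder exon, GU donor and spacer, Sm-like site, tail.
toy_sl_rna = ("ACGUACGUACGUACGUACGUAC" + "GUAAACCGCUCCGCAC"
              + "AAUUUUGG" + "CCACUC")
result = find_sl_elements(toy_sl_rna)
print(result)
```

The trinucleotide and pentanucleotide elements are not modeled because their sequences are not given in the text.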
In the closely related nematode Ascaris, the 22-nt SL DNA sequence functions as a promoter
element that is essential for transcription by RNA polymerase
II (53). The high degree of sequence conservation observed at
the 5` ends (nucleotides 1-12) of C. elegans SLs, the
repetitive utilization of certain spliced leader exons (e.g. SL3 and SL2) on multiple SL RNA gene transcripts, and the
discovery of composite SL1/SL2 PSEs 50 bp upstream from the novel
SL RNA structural genes (Fig. 6) prompt the speculation that
these sequence elements and their corresponding trans-acting
proteins may coordinately drive expression of RNAs that deliver
multiple ``isoforms'' of 5`-untranslated exons to C.
elegans mRNAs.
SL RNA transcripts encoded by the C. elegans SL3, SL4, and SL5 RNA
genes are produced in vivo (Fig. 8). Although the levels of these individual
transcripts are low (1% of the abundance of SL2 RNA), the total
pools of novel SLs in poly(A-) RNA (full-length SL
RNAs) and the amounts of novel SLs transferred to mRNAs are similar to
the levels of SL2 in the poly(A-) and
poly(A+) fractions of C. elegans RNA (Fig. 10 and Fig. 11, Table 1). This is probably due
to the occurrence of a limited subset of shared SL exons that are
components of a substantially larger group of related but distinct SL
RNA genes. Differences in the activities of individual SL RNA gene
promoters and/or the stabilities of SL RNAs may also contribute to the
net accumulation and utilization of novel SL RNAs.
In instances where reverse transcription-PCR has been employed to determine sequences at the 5` termini of specific mRNAs, it appears that the novel SL exons listed in Table 1 are used in trans-splicing reactions with frequencies that match the utilization rate of the classical SL2 sequence (18, 19). These results provide direct experimental evidence for the in vivo functionality of novel SL RNAs. The appearance of SL3 on mRNA encoding the sex determination factor TRA-2 (18) documents a linkage among a novel SL RNA gene(s), its transcription in vivo, and the use of the 5` SL exon of the transcript to modify a specific mRNA via trans-splicing.
Many C. elegans mRNAs receive a 5` leader exon from transcripts of the repeated SL1 RNA genes. The precise physiological significance of trans-splicing with SL1 is not known. Since the SL usually terminates within 1-50 nt of the initiator AUG, it seems probable that trans-splicing eliminates upstream, out-of-frame AUG codons and deletes long 5`-untranslated sequences capable of folding into secondary structures that inhibit translation. In addition to optimizing translation efficiency, modification with the SL may alter the stability of the target mRNA.
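The proposed effect of trans-splicing on upstream AUG codons can be made concrete with a small sketch. It is illustrative only: every sequence below is a hypothetical placeholder rather than a real C. elegans RNA, the 22-nt `leader` is an arbitrary stand-in for an SL, and the function names are assumptions introduced for this example.

```python
def out_of_frame_upstream_augs(mrna, start):
    """Return 0-based positions of AUG triplets 5' of the initiator AUG
    (at index `start`) that are not in its reading frame."""
    return [
        i for i in range(start)
        if mrna[i:i + 3] == "AUG" and (start - i) % 3 != 0
    ]

def trans_splice(pre_mrna, acceptor, leader):
    """Replace everything 5' of the acceptor index with the leader exon."""
    return leader + pre_mrna[acceptor:]

# Hypothetical placeholder sequences (not real data).
outron = "GGAUGCCUUAUGGCAUUUUCAG"   # 5' segment removed by trans-splicing
cds = "AUGGCUAAAUAA"                # toy open reading frame
pre_mrna = outron + "AAACCA" + cds
leader = "ACGUACGUACGUACGUACGUAC"   # arbitrary 22-nt stand-in for an SL

spliced = trans_splice(pre_mrna, len(outron), leader)
before = out_of_frame_upstream_augs(pre_mrna, pre_mrna.index(cds))
after = out_of_frame_upstream_augs(spliced, spliced.index(cds))
print(before, after)
```

In this toy case the two out-of-frame AUG triplets lie within the discarded 5` segment, so none remain upstream of the initiator codon after the leader is added.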
Messenger RNAs that contain SL1 are encoded by
conventional structural genes that are preceded by contiguous promoter
sequences. However, approximately 25% of C. elegans genes are organized
as operons, in which transcription of 2-5 mRNAs is driven by a
single 5` promoter/enhancer region (4, 13). Exact
sequences of 5` ends of mRNAs derived from downstream genes in C.
elegans operons have been determined only in a few instances
(reviewed in Refs. 4 and 13). The data indicate that SL2 is appended
only to mRNAs encoded by downstream genes. Moreover, polycistronic
mRNAs are not cleaved and trans-spliced if upstream
polyadenylation signals are eliminated by mutation (13). On the
basis of these considerations, Spieth et al. (13) proposed that SL2 RNA plays a specialized role in
processing polycistronic mRNAs. A plausible suggestion is that RNPs
containing SL2 RNA bind to proteins that catalyze 3`-end processing of
pre-mRNAs (13). Such complexes would mediate the 3` cleavage
and polyadenylation of an upstream mRNA and simultaneously place the
SL2 RNP in proximity with the trans-splice acceptor site at
the 5` end of the neighboring downstream mRNA.
The SL3 exon, which
appears at the 5` ends of three SL RNA structural genes described in
this paper, and additional novel SLs (Table 1) are incorporated
at the 5` termini of mRNAs for TRA-2 and/or protein kinase C1A and
casein kinase II (18, 19). Each of these mRNAs is
derived from a gene that occupies a downstream position in an
independent operon (13). These observations and a conserved
structural relationship with SL2 RNA (see above) strongly suggest that
SL3, SL4, and SL5 RNAs play important roles in the
generation of efficiently translated, monocistronic RNAs from the
internal segments of polycistronic transcripts. The molecular basis for
the exclusion of SL1 RNA and the utilization of a family of non-SL1
RNAs for this mode of pre-mRNA processing is unknown. One speculative
suggestion is that a partially conserved sequence near the 3` ends of
SL RNAs 2-5 (nt 94-114) provides a binding site for a
protein(s) involved in the 3` cleavage and/or polyadenylation of mRNAs.
The smaller SL1 RNA transcript lacks this sequence.
The relative abundance of SL4 RNA increases 6-fold during postembryonic development (Fig. 13). Moreover, the putative promoter for the SL4 RNA gene is active principally in hypodermal cells (Fig. 14). These results suggest that the SL4 leader might selectively modulate mRNA translation, stability, etc., in a subset of hypodermal transcripts during late development. Furthermore, novel upstream promoter/enhancer elements may control the level and cell-specific expression of SL4 RNA transcripts. These possibilities must be regarded with caution in the absence of (a) knowledge of the properties of specific mRNAs that are trans-spliced with SL4 and (b) systematic analysis of the gene promoter by mutagenesis. Moreover, patterns of promoter activity must be established for other SL RNA genes and compared with that observed for the SL4 RNA gene. Nevertheless, the observations indicate that potential regulatory elements in the SL4 sequence and SL4 RNA gene promoter/enhancer merit further study.
The nucleotide sequence(s) reported in this paper has been submitted to the GenBank(TM)/EMBL Data Bank with accession number(s) U29449, U29490, and U29491.