©1995 by The American Society for Biochemistry and Molecular Biology, Inc.
Structure and Expression of Novel Spliced Leader RNA Genes in Caenorhabditis elegans(*)

(Received for publication, May 4, 1995; and in revised form, June 29, 1995)

Leorah H. Ross, Jonathan H. Freedman (§), Charles S. Rubin

From the Department of Molecular Pharmacology, Atran Laboratories, Albert Einstein College of Medicine, Bronx, New York 10461

ABSTRACT

Approximately 25% of Caenorhabditis elegans genes are organized as operons. Polycistronic transcripts are converted to monocistronic mRNAs by 3′ cleavage/polyadenylation and 5′ trans-splicing with untranslated, 5′-terminal exons called spliced leaders (SLs). The 5′ termini of mRNAs encoded by downstream genes in operons are acceptors for ≥7 recently discovered "novel" SLs and a classical SL (SL2). Diversity in SL exons is now partly explained by the discovery and characterization of five novel genes that encode C. elegans SL RNAs. These novel SL RNAs contain a 22- or 23-nucleotide SL followed by conserved splice donor and downstream sequences that are essential for catalysis of trans-splicing reactions. The SL3alpha, SL4, and SL5 RNA genes are tightly clustered on chromosome III; their 114-nucleotide transcripts deliver three distinct SLs to mRNAs. The SL3beta and SL3 RNA genes are on chromosome I, but are not tightly linked. SL RNAs 3alpha, 3beta, and 3 provide identical 5′ leader exons, although their 3′ sequences diverge. Transcription of SL 3-5 RNA genes appears to be driven by flanking DNA elements that are homologous with segments of promoters for the C. elegans SL2 RNA and small nuclear RNA genes. RNase protection assays demonstrated that novel SL RNAs are transcribed in vivo and accumulate in the poly(A) RNA pool. SL3 exons are transferred to mRNAs as frequently as SL2 exons. In contrast, SL4 is appended to mRNAs 10% as frequently as SL3. The abundance of SL4 RNA increased 6-fold during postembryonic development, and the SL4 RNA gene promoter is active principally in hypodermal cells.


INTRODUCTION

The free-living nematode Caenorhabditis elegans excises noncoding segments of gene transcripts by three processes. Internal introns are apparently removed by a conserved classical cis-splicing mechanism that involves snRNPs¹ (U1-U6), a branched intermediate, and auxiliary protein factors (1, 2, 3). Approximately 70% of C. elegans mRNAs are covalently modified at their 5′ termini by the addition of a 22-nt leader sequence (4) named the "spliced leader" or SL. SLs are attached in trans-splicing reactions (5, 6, 7). The 5′ SL that appears most frequently on C. elegans mRNAs corresponds to nucleotides 1-22 of the product of the SL1 RNA gene (5, 8). This gene is reiterated in tandem 100 times on chromosome V and encodes an RNA composed of 95 nt (8). C. elegans transcripts that (a) contain a consensus splice acceptor sequence (UUUCAG) upstream from the coding region and (b) lack a corresponding 5′ splice donor sequence are targets for trans-splicing (9). Most of the spliceosomal components required for cis-splicing are also essential for trans-splicing (10). Key differences are as follows: (a) an SL1 (or SL2; see below) RNA complexed with proteins is included in the splicing complex; (b) SL RNA is consumed in each round of trans-splicing, thereby necessitating reloading or reassembly of the spliceosome for subsequent catalysis; and (c) SL RNA provides a trimethylguanosine cap for mature mRNAs (11, 12). The 22-nt leader derived from SL1 RNA is usually appended to the 5′ end of mRNAs derived from typical genes, in which transcribed sequences are preceded by contiguous promoter/enhancer regions.

Unlike many other eukaryotes, C. elegans contains numerous genes that are organized as operons (4, 13). Thus, transcription of two (or more) structural genes is sometimes driven by a unique 5′ promoter/enhancer region. Polycistronic transcripts are converted to monocistronic mRNAs by a combination of cleavage, polyadenylation, and trans-splicing (13, 14, 15). Messenger RNAs encoded by downstream genes in operons receive leader sequences that differ from SL1 (13, 16). Initially, two highly homologous SL RNA genes (SL2alpha and SL2beta), which encode identical 22-nt SLs, were identified as sources of these 5′ termini (16).

An early compilation of trans-spliced cDNA sequences suggested that C. elegans mRNAs received SL1 or SL2 in a mutually exclusive manner (16). However, recent applications of a reverse transcriptase-anchored PCR procedure (17) revealed that the 5′ ends of mRNAs encoding a sex determination factor (TRA-2; 18), protein kinase C1A (19), and the beta subunit of casein kinase II² (20) are modified with multiple "novel" SLs. Thus, our knowledge of SL RNA genes is incomplete, and multiple pertinent questions regarding the origin and utilization of the novel SLs can be asked. Are the novel SLs more closely related to SL1 or SL2? Are the structures of the novel SL RNA genes similar to the previously described SL1 and SL2 RNA genes or different? Are the novel SL RNA genes clustered or scattered in the genome? What are the sizes of the pools of novel SL RNA transcripts relative to the levels of SL1 and SL2 RNAs? To what degree are the novel SLs utilized on mRNAs in vivo? Is the expression of any of the novel SL RNA genes regulated?

To address these questions, we have cloned and characterized five novel SL RNA genes. The genes can be either clustered or isolated, and they encode homologous RNA sequences of 114 nt. The novel 5′ SL sequences are more closely related to SL2alpha and SL2beta than to SL1. Although the relative abundance of the novel SL RNAs is low when compared with steady-state SL1 and SL2 RNA concentrations, the utilization of certain novel 5′ SL sequences on mRNAs occurs with a substantial frequency. Finally, expression of the novel SL4 RNA gene is developmentally regulated, and SL4 RNA transcripts are detected only in hypodermal cells.


EXPERIMENTAL PROCEDURES

Growth of C. elegans

The Bristol N2 strain of C. elegans was grown, synchronized, and harvested as described in previous publications (21, 22).

Southern Gel Analysis

C. elegans genomic DNA was prepared by the procedure of Yesner and Emmons (23). Samples of high molecular weight DNA (10 µg) were digested with restriction endonucleases, and the resulting fragments were fractionated on a 1% agarose gel. To optimize hybridization of ³²P-labeled oligonucleotides with DNA fragments, "dry" gels were prepared as described by Thein and Wallace (24). DNA fragments in the gels were denatured by incubation in 0.5 M NaOH, 1.5 M NaCl for 10 min at 22 °C. After neutralization in 0.5 M Tris-HCl, pH 8.0, containing 1.5 M NaCl (10 min at 22 °C), the gel was placed in 5× SSPE (1× SSPE = 10 mM sodium phosphate, pH 6.5, containing 0.15 M NaCl and 1.3 mM EDTA) that was supplemented with 0.1% sodium pyrophosphate, 0.1% SDS, and 0.1 mg/ml tRNA. Oligonucleotides complementary to SL2 (16) and SLb (see Fig. 1), a novel spliced leader, were 5′-end-labeled (10^9 cpm/µg) as described previously (21). Probes (2 × 10^6 cpm/ml) were hybridized with the gel for 8 h at 35 °C. Subsequently, the gel was washed twice with 6× SSC (1× SSC = 0.15 M NaCl, 15 mM sodium citrate, pH 7.0) containing 0.1% SDS for 15 min at 45 °C. After three rinses with the same solution at 22 °C, the gels were exposed to x-ray film (Kodak XAR-5) at -70 °C.
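The working-strength buffer concentrations implied by the 1× definitions above can be checked with a short calculation. The following Python sketch is illustrative only: the dictionary and function names are assumptions introduced here, and the 1× values are copied from the text.

```python
# 1x buffer definitions copied from the text (concentrations in mM).
SSPE_1X = {"sodium phosphate": 10.0, "NaCl": 150.0, "EDTA": 1.3}
SSC_1X = {"NaCl": 150.0, "sodium citrate": 15.0}


def scale_buffer(stock_1x, fold):
    """Return component concentrations (mM) of a fold-concentrated buffer."""
    # Rounding only suppresses floating-point noise in the printed values.
    return {name: round(conc * fold, 6) for name, conc in stock_1x.items()}


sspe_5x = scale_buffer(SSPE_1X, 5)  # hybridization buffer base
ssc_6x = scale_buffer(SSC_1X, 6)    # wash buffer base

for label, buf in (("5x SSPE", sspe_5x), ("6x SSC", ssc_6x)):
    print(label, ", ".join(f"{conc:g} mM {name}" for name, conc in buf.items()))
```

Under these definitions, 5× SSPE contains 0.75 M NaCl and 6× SSC contains 0.9 M NaCl; the supplements (pyrophosphate, SDS, tRNA) are added at the final concentrations stated in the text and are not scaled.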


Figure 1: Multiple, nonclassical SLs are incorporated at the 5′ termini of certain C. elegans cDNAs. Primary structures of nonclassical SLs were determined by the direct sequencing of amplified cDNAs (17, 18, 19). The cDNAs contained complete copies of the 5′ termini of C. elegans mRNAs encoding protein kinase C1A, TRA-2, and casein kinase IIbeta (18, 19, 20).² Only the SL sequences are shown. SLs a-d were appended to protein kinase C1A mRNA; SLs b-f were present at the 5′ end of tra-2 mRNA; SLg was found at the 5′ terminus of cDNA encoding the beta subunit of casein kinase II. The novel SLs are aligned with the classical C. elegans spliced leader sequences SL1 and SL2 (5, 16). Novel SLs marked with an asterisk are evident in both protein kinase C1A and tra-2 cDNAs. A distinctive dinucleotide sequence that is conserved in all of the novel SLs is shown in boldface type. The lengths (in nucleotides) of the SLs are given in parentheses.



Isolation of Novel SL RNA Genes

A C. elegans genomic library in bacteriophage EMBL4 was provided by Dr. Chris Link, University of Denver (Denver, CO). This library was screened with oligonucleotides complementary to novel leader sequences SLb and SLa (see Fig. 1), which appear at the 5′ ends of PKC1A mRNAs (19). Duplicate nitrocellulose filter lifts of recombinant EMBL4 phage were hybridized at 35 °C for 18 h in a mixture of 5× SSPE containing 5× Denhardt's solution, 0.1% sodium pyrophosphate, 0.1% SDS, 0.1 mg/ml tRNA, and ³²P-labeled probe (4 × 10^6 cpm/ml). Subsequently, filters were washed twice (15 min) at 45 °C with 6× SSC, 0.1% sodium pyrophosphate, 0.1% SDS and twice (15 min) at room temperature with the same solution before drying and autoradiography at -70 °C. Three positive phage clones were purified to homogeneity. The DNA insert from clone SLA hybridized with the antisense SLa and SLb oligonucleotides; the other DNA inserts (clones SLB and SLC) hybridized only with the SLb antisense oligonucleotide. When SLA was digested with HpaII, two fragments of similar size (2.2 kbp) hybridized with the ³²P-labeled SLb probe. These segments of genomic DNA were cloned into pGEM7Z for DNA sequence analysis. Digestion of SLC with HindIII yielded a 1-kbp DNA fragment that hybridized with the SLb probe. This fragment was cloned into pGEM7Z and sequenced as described below. SLB was digested with SpeI, and a 5.6-kbp fragment that hybridized with the SLb oligonucleotide was subcloned into pGEM5Z. The amplified 5.6-kbp insert was digested sequentially with AatII, XbaI, and ClaI. An AatII/XbaI fragment (2.5 kbp) hybridized weakly with the SLb probe and was subcloned into pGEM7Z for DNA sequencing. A 2.3-kbp ClaI/XbaI fragment hybridized strongly with the probe and was subcloned into pGEM7Z for further restriction enzyme analysis. Upon digestion with SacI, this recombinant plasmid yielded a 1.3-kbp genomic DNA fragment and linearized vector (3.0 kbp) fused with 1.0 kbp of C. elegans DNA. Both species of DNA hybridized with the SLb probe. The 1.3-kbp SacI fragment was cloned into pGEM5Z, the 4-kbp DNA segment was religated, and both inserts were sequenced as described below. Each recombinant pGEM5Z or 7Z plasmid that was sequenced contained only one SL RNA gene.

DNA Sequence Analysis

DNA inserts were sequenced by the dideoxynucleotide chain termination procedure of Sanger et al. (25) using SP6, T7, and synthetic oligonucleotide primers as described previously (21).

Computer Analysis

Analysis of sequence data, sequence comparisons, and data base searches were performed using PCGENE-IntelliGenetics software (IntelliGenetics, Mountain View, CA) and BLAST programs (26) provided by the NCBI server at the National Library of Medicine/National Institutes of Health.

Mapping SL RNA Genes to Chromosomes

A nitrocellulose filter that contains an array of 958 recombinant yeast artificial chromosomes (YACs) was obtained from Dr. Alan Coulson at the Medical Research Council Laboratory of Molecular Biology, Cambridge, United Kingdom. Overlapping DNA inserts (200-250 kbp) in the YACs account for >90% of the C. elegans genome (27). DNA inserts from plasmids that contain individual SL RNA genes and their contiguous 5′ and 3′ flanking sequences (see above) were used as templates to generate randomly primed, ³²P-labeled probes (21). A plasmid containing the SL2alpha RNA gene was generously provided by Dr. David Hirsh (Department of Biochemistry, Columbia University). Filter grids were hybridized and washed under high stringency conditions as described previously for Southern blot analysis (21). Signals were visualized by autoradiography. Physical map positions for the hybridizing inserts were determined from the C. elegans electronic map data base (Harvard Medical School). Map positions of novel SL RNA genes were verified by the fingerprinting method of Coulson et al. (28) in a collaborative experiment with A. Coulson and colleagues (MRC Laboratory of Molecular Biology, Cambridge, United Kingdom).

Isolation of RNA and Northern Gel Analysis

Total RNA was extracted from C. elegans as described by Hu and Rubin (21). Poly(A) RNA was isolated by the procedure of Sambrook et al. (29). Northern blot analysis was performed as described previously (30). RNA samples used in these studies were normalized to contain the same amounts of myosin light chain mRNAs as described in Hu and Rubin (20). Levels of myosin light chain mRNAs are constant during C. elegans development (31).

RNase Protection Analysis: Detection and Quantification of SL RNAs

Primers that direct the synthesis of 200-bp segments of C. elegans DNA, which include an SL RNA gene and short 5′- and 3′-flanking sequences, were designed. Recombinant pGEM5Z or 7Z plasmids that contain SL RNA genes (see above) served as templates for DNA synthesis catalyzed by Pfu DNA polymerase (Stratagene). NotI and NsiI restriction sites were introduced via the 5′ and 3′ PCR primers, respectively. PCR products were digested with NotI and NsiI and cloned into pGEM5Z plasmid that was cleaved with the same enzymes. Constructs were verified by sequencing both DNA strands.

Plasmids were linearized by digestion with SpeI, and ³²P-labeled antisense SL RNA probes (200 nt) were synthesized via SP6 RNA polymerase and purified as described previously (20, 22). Control (sense) probes were generated by T7 RNA polymerase after cleaving with PvuII. Hybridization of ³²P-labeled antisense (or sense) probes (10^6 cpm) with C. elegans total RNA and subsequent digestion of single-stranded RNA with RNase T1 and RNase A were performed as previously reported (20). Protected duplex RNA was denatured and subjected to electrophoresis in a 6% polyacrylamide gel containing 7 M urea. Protected, ³²P-labeled complementary SL RNA (115 nt) was visualized by autoradiography on XAR-5 film and quantified by PhosphorImager (Molecular Dynamics) analysis.

Detection and Quantification of Spliced Leader Sequences in C. elegans mRNAs and Poly(A) RNA

Levels of specific SL sequences in poly(A) RNA were measured by RNase protection analysis, using full-length ³²P-labeled antisense probes (see above). Assays were carried out as described above except for two modifications: probe and poly(A) RNA were heated at 85 °C for 10 min and then incubated at 37 °C for 14 h, and RNase digestion was performed at 35 °C. Protected ³²P-labeled fragments were characterized by electrophoresis (in a 15 or 18% polyacrylamide gel that contained 7 M urea) and autoradiography.

The abundance of various SL sequences in poly(A) RNA was determined by RNase protection analysis as described above for poly(A) RNA. ³²P-labeled antisense RNA probes (5 × 10^5 cpm) that complement only the SL sequence were employed. Probes were generated by the method of Milligan et al. (32) using T7 RNA polymerase (>1000 units/ml) provided by Heike Pelka (Department of Molecular Developmental Biology, Albert Einstein College of Medicine). Reaction products (22- or 23-nt SL plus 12 or 13 irrelevant nt) were analyzed on a 10% polyacrylamide, 7 M urea gel to verify their lengths and purity. The 35-nt probes were readily distinguished from the 22/23-nt protected fragments in RNase protection assays.

Preparation of Transgenic C. elegans

Cloned genomic DNA that corresponds to the first 40 bp of the coding region of the SL4 RNA gene and 800 bp of contiguous 5′-flanking DNA (Fig. 3) was amplified by PCR (20, 22). A separate set of primers was used to amplify a 197-bp DNA fragment that encompassed 162 bp of 5′-flanking sequence and 35 bp of the SL4 RNA structural gene. SphI and SalI restriction sites were introduced via the 5′ and 3′ primers, respectively. Amplified DNAs were digested with SphI and SalI and cloned into the C. elegans expression vector pPD16.51 (33), which was cleaved with the same enzymes. This places the putative SL4 RNA gene promoter upstream from a nuclear localization signal and the lacZ reporter gene. Recombinant SL4 RNA:lacZ chimeric genes and the rol-6 gene, which provides a selectable marker phenotype (34), were co-injected into the gonadal syncytium of C. elegans, and stable lines of transgenic nematodes were established as described previously (35).


Figure 3: Sequence of a segment of C. elegans DNA that contains the SL3alpha, SL4, and SL5 RNA genes. The spliced leader sequences SL3 (inverse complement of nucleotides 933-954), SL5 (nucleotides 1790-1812), and SL4 (nucleotides 2691-2712) are shown in boldface type. Sequences of the SL RNA genes are underlined. Restriction enzyme sites (ClaI, SacI, and XbaI, in ascending order) used to clone the individual SL RNA genes and their associated 5′- and 3′-flanking regions are indicated with boldface italic type. The 152-nt inverted repeats that flank the gene cluster are shown in lowercase letters.



Detection of SL RNA Gene Promoter Activity by in Situ Hybridization Analysis

C. elegans were fixed with 4% formaldehyde in 0.1 M HEPES-NaOH buffer, pH 6.9, containing 2 mM MgSO4 and 1 mM EGTA for 4.5 h at 20 °C. The specimens were dehydrated by successive incubations (10 min) with solutions containing increasing amounts (20% increments) of methanol and corresponding decrements in 10 mM sodium phosphate buffer, pH 7.4, 0.15 M NaCl (PBS). Nematodes were stored in 100% methanol at -70 °C. Prior to hybridization, C. elegans were incrementally rehydrated at 22 °C with PBS containing 0.1% (v/v) Tween 20 and 0.2% (w/v) SDS (buffer A) and washed twice with 5% (v/v) 2-mercaptoethanol in buffer A. The nematodes were then incubated with 50 µg/ml Proteinase K in PBS for 15 min at 37 °C to permeabilize the external collagenous coat. Partial digestion was terminated by washing with 10 volumes of 25 mM glycine in buffer A. After two washes with buffer A, C. elegans were incubated in PBS containing 4% formaldehyde for 20 min at 22 °C, washed three times with PBS containing 0.1% Tween 20, and equilibrated with buffer B (40% (v/v) formamide, 5× SSC, 0.1 mg/ml sonicated salmon sperm DNA, 50 µg/ml heparin, and 0.1% Tween 20). Prehybridization in buffer B was carried out at 48 °C for 1 h.

The pPD16.51 expression plasmid (33) was cleaved at a unique SacI site that precedes the lacZ gene. An oligonucleotide primer was designed to initiate Taq DNA polymerase-catalyzed DNA synthesis 2 kbp downstream from the lacZ transcription initiation site. DNA synthesis progressed toward the 5′ end of the lacZ gene, yielding a 2-kilobase single-stranded, antisense DNA probe. Primer (30 ng) and 200 ng of template DNA were added to a reaction mixture containing 0.5 M KCl, 0.01 M Tris-HCl, pH 8.3, 1.5 mM MgCl2, 0.001% gelatin, 0.2 mM dATP, 0.2 mM dCTP, 0.2 mM dGTP, and 0.13 mM dTTP. Digoxigenin-11-dUTP was added to a final concentration of 70 µM. The reaction mixture (25 µl) was heated at 100 °C for 3 min and then was incubated in a thermal cycler under the following conditions: denaturation at 95 °C for 45 s; annealing at 55 °C for 30 s; and synthesis at 72 °C for 90 s. After 35 cycles, single-stranded DNA was precipitated with ethanol and dissolved in 0.3 ml of buffer B. The probe was boiled for 10 min and an aliquot (6 µl) was added to a suspension (20 µl) of fixed nematodes in buffer B. Hybridization of probe with lacZ RNA derived from the chimeric reporter genes (see above) was carried out at 48 °C for 16 h. Subsequently, C. elegans were washed serially with 1-ml aliquots of PBS, 0.1% Tween 20 containing 80, 60, 40, 20, and 0% buffer B at 22 °C. C. elegans were then washed with 0.1% (w/v) albumin and 0.1% Triton X-100 in PBS (buffer C) and incubated 16 h at 4 °C with a 1:2500 dilution of antibody directed against digoxigenin (Boehringer Mannheim) in buffer C. The IgG was coupled to alkaline phosphatase. After washing C. elegans four times with 10 volumes of buffer C, antigen-antibody complexes were detected by incubation with a solution containing 0.5% (w/v) 4-nitro blue tetrazolium chloride and 0.33 mg/ml 5-bromo-4-chloro-3-indolyl phosphate, 0.1 M NaCl, 5 mM MgCl2, 0.1 M Tris-HCl pH 9.5, 1 mM levamisole, and 0.1% Tween 20 for 20 min at 37 °C. Alkaline phosphatase catalyzes the synthesis of an insoluble blue reaction product. Procedures for fixation of C. elegans and in situ hybridization of RNA are modifications of methods described by Tautz and Pfeifle (36) and Patel and Goodman (37), respectively. Modifications were introduced by Seydoux and Fire (Dept. of Embryology, Carnegie Institute of Washington).³
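The nucleotide balance and cycling time of the digoxigenin labeling reaction described above can be verified with simple arithmetic. The sketch below uses only values stated in the text; the variable names are illustrative assumptions.

```python
# Nucleotide concentrations in the labeling reaction (mM), from the text.
DNTP_MM = {"dATP": 0.2, "dCTP": 0.2, "dGTP": 0.2, "dTTP": 0.13}
DIG_DUTP_MM = 0.070  # digoxigenin-11-dUTP, 70 µM

# dTTP plus its labeled analog should match the other nucleotides.
total_t_analog_mm = DNTP_MM["dTTP"] + DIG_DUTP_MM
labeled_fraction = DIG_DUTP_MM / total_t_analog_mm

# Thermal cycling: 45 s denaturation, 30 s annealing, 90 s synthesis.
STEP_SECONDS = (45, 30, 90)
CYCLES = 35
total_cycle_seconds = CYCLES * sum(STEP_SECONDS)

print(f"dTTP + DIG-dUTP = {total_t_analog_mm:.2f} mM")
print(f"labeled fraction of the T pool = {labeled_fraction:.0%}")
print(f"hold time over {CYCLES} cycles = {total_cycle_seconds / 60:.2f} min")
```

The combined dTTP and digoxigenin-11-dUTP pool equals the 0.2 mM used for each of the other nucleotides, with about one third of that pool carrying the label. The cycling figure excludes ramp times and the initial 3-min denaturation.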


RESULTS

Formulation of Strategy and Cloning of Novel SL RNA Genes from C. elegans

Nonclassical SLs that appear at the 5′ ends of cDNAs encoding three distinct proteins are shown in Fig. 1. Three of these SLs (Fig. 1, asterisks) appear upstream from cDNA sequences encoding both protein kinase C1A (19) and TRA-2 (18). The remaining SLs were attached to only a single type of cDNA. Nonclassical SLs are collectively designated "novel SLs." Oligonucleotides complementary to SL2 and the novel SL named SLb (Fig. 1) were end-labeled with ³²P and used to establish hybridization conditions suitable for differentially identifying C. elegans DNA fragments encoding novel SL RNA genes (see "Experimental Procedures"). When C. elegans DNA was digested with ClaI and fractionated in an agarose gel, the SLb probe hybridized exclusively with a 15-kbp fragment (Fig. 2A, lane 1). In contrast, the SL2 probe generated a prominent signal with a 5-kbp segment of ClaI-digested DNA and also hybridized with five other restriction fragments (Fig. 2B, lane 1). None of these fragments corresponded to the band observed with the SLb probe. The pattern of four principal bands obtained with the SL2 probe (Fig. 2B, lanes 1 and 2) is in agreement with the copy number for SL2 genes reported by Huang and Hirsh (16).


Figure 2: Characterization and cloning of fragments of C. elegans DNA that encode novel SL RNA genes. A and B, samples (10 µg) of high molecular weight C. elegans DNA were digested with ClaI (lane 1) and PstI (lane 2) and processed as described under "Experimental Procedures." Dried agarose gels were probed with ³²P-labeled oligonucleotides corresponding to SLb (A) and SL2 (B) (see Fig. 1 and "Experimental Procedures"). Autoradiograms are shown. C, genomic DNA inserts derived from clones (see "Experimental Procedures") of recombinant bacteriophage EMBL4 designated SLA (lane 1) and SLB (lane 2) were digested with AluI and MspI, respectively, and subjected to Southern gel analysis as described under "Experimental Procedures." Dried agarose gels were probed with a ³²P-labeled oligonucleotide corresponding to SLb. Autoradiograms are presented. Gels were calibrated with DNA markers that were electrophoresed in parallel lanes and stained with ethidium bromide. Sizes of the markers are given in kbp.



Subsequently, a C. elegans genomic DNA library in bacteriophage EMBL4 was screened with ³²P-labeled oligonucleotides complementary to the SLb and SLa leader sequences (Fig. 1). Approximately 200 candidate clones were obtained from 1.6 × 10^5 recombinant phage. Several recombinants were plaque purified, and the DNA inserts were characterized by digestion with restriction endonucleases and Southern gel analysis. Each genomic insert contained multiple copies of novel SL RNA genes. Representative examples are shown in Fig. 2C.

Terminology

SL is defined as a 22- or 23-nucleotide RNA segment (untranslated exon) that is transferred from the 5′ end of a spliced leader RNA to the 5′ terminus of a messenger RNA via transesterification. SL RNA is a 114-nt (or 95-nt for SL1 RNA) RNA transcript encoded by a spliced leader RNA gene. SL RNA contains an SL, a contiguous donor splice site, and downstream sequences essential for catalysis of the trans-splicing reaction. SL RNA genes are segments of DNA that encode SL RNAs. Expression of these genes is driven (in part) by a characteristic promoter element that lies 40-65 bp upstream from the transcription start site (nucleotide 1 in the SL sequence).

Organization and Sequences of SL RNA Genes

Genomic DNA inserts from recombinant phage clones designated SLA, SLB, and SLC (see "Experimental Procedures") were characterized. Restriction fragments of the insert in SLA that hybridized with the ³²P-labeled oligonucleotides (see "Experimental Procedures") were subcloned and sequenced. Three SL RNA genes were discovered within 2990 bp of contiguous sequence (Fig. 3). Novel SL sequences were initially identified by homology with SLs presented in Fig. 1. Alignment of eight derived RNA sequences revealed that "typical" C. elegans SL RNA genes comprise 114 bp and contain conserved sequences that are essential for spliceosome assembly and function (see Fig. 5, and see below). Genes and cognate SL RNAs were named according to their 5′ leader sequence. SL3alpha RNA has a 22-nt leader sequence that differs from SL2 by the substitution of AC for CU at nucleotides 17 and 18. SL5 is identical with SL3alpha except for the insertion of A at position 13. The SL3alpha and SL5 RNA genes are oriented head to head, so that transcription is likely to be driven by promoter elements located in the intervening 835 bp (Fig. 3). SL4 differs from SL2 at multiple positions. It contains UA in place of GU as nucleotides 13 and 14, as well as the AC dinucleotide at positions 17 and 18. The SL5 and SL4 RNA genes are positioned in a head to tail fashion and are separated by 900 bp of intervening sequence (Fig. 3). The three-gene cluster is flanked by 152-bp inverted repeats (151 identities) at its 5′ and 3′ ends. Repeated sequences begin at +85 (taking the first nucleotide of the SL as +1) in both the SL3alpha and SL4 RNA genes and extend to +237 in each instance. Other SL RNA gene sequences diverge after nucleotide 114.
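The relationships among the leader sequences described above can be made concrete by applying the stated substitutions to SL2. The Python sketch below is illustrative only. The SL2 sequence is an assumption taken from the commonly cited 22-nt leader and should be checked against Fig. 1; only the differences named in the text are applied, so any additional differences shown in the figures are not represented. The helper names are hypothetical.

```python
# ASSUMPTION: commonly cited SL2 leader (RNA alphabet); verify against Fig. 1.
SL2 = "GGUUUUAACCCAGUUACUCAAG"


def substitute(seq, start, new):
    """Replace len(new) nucleotides beginning at 1-based position start."""
    i = start - 1
    return seq[:i] + new + seq[i + len(new):]


def insert(seq, pos, nt):
    """Insert nt so that it occupies 1-based position pos."""
    return seq[:pos - 1] + nt + seq[pos - 1:]


def identity(a, b):
    """Fraction of matching positions in two equal-length, ungapped sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Sanity checks: the assumed SL2 carries the residues the text says are replaced.
assert SL2[16:18] == "CU" and SL2[12:14] == "GU"

SL3 = substitute(SL2, 17, "AC")                        # AC for CU at 17-18
SL5 = insert(SL3, 13, "A")                             # SL3 plus A at position 13
SL4 = substitute(substitute(SL2, 13, "UA"), 17, "AC")  # UA at 13-14, AC at 17-18

for name, seq in (("SL2", SL2), ("SL3", SL3), ("SL4", SL4), ("SL5", SL5)):
    print(f"{name}  {seq}  ({len(seq)} nt)")
print(f"SL3 vs SL2 identity: {identity(SL3, SL2):.1%}")
print(f"SL4 vs SL2 identity: {identity(SL4, SL2):.1%}")
```

Under this assumption the derived SL3 and SL4 leaders are 22 nt long and SL5 is 23 nt long, matching the lengths given in the text. SL5 is omitted from the identity comparison because the ungapped helper requires equal lengths.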


Figure 5: Alignments of novel and classical SL RNA sequences. A, derived sequences of the novel SL RNAs were obtained from the results presented in Fig. 3 and Fig. 4. Sequences of SL1, SL2alpha, and SL2beta RNAs were obtained from Refs. 5 and 16. Nucleotides conserved in all SL RNA transcripts are marked with asterisks; nucleotides comprising the donor splice site are shown with boldface type; nucleotides that constitute the Sm antigen binding site are underlined. The conservation of key sequence motifs in the intronic region of novel C. elegans SL RNAs is illustrated in panel B. A. lumbricoides expresses a single SL RNA that contains the indicated sequences (46). The introduction of mutations into these functional regions inhibits the trans-splicing of mRNAs catalyzed by extracts of Ascaris embryos (39).




Figure 4: Sequences of the C. elegans SL3beta and SL3 RNA genes. Sequences of C. elegans genomic DNA fragments that include the SL3beta RNA gene (A) and the SL3 RNA gene (B) are presented. Positions of the SL3 sequence are indicated with boldface lettering. The SL RNA structural genes are underlined. Contiguous DNA sequences that flank the 5′ and 3′ ends of the SL RNA genes are also shown.



The SL3beta and SL3 RNA genes were found in the DNA inserts of SLB and SLC (Fig. 4). Spliced leader sequences encoded by these genes are identical with nucleotides 1-22 of SL3alpha RNA, but their downstream sequences differ significantly.

Derived novel SL RNA sequences are aligned with SL1, SL2alpha, and SL2beta RNAs in Fig. 5A. The novel SL RNA genes encode spliced leaders (nucleotides 1-22 or 1-23) that are more homologous with SL2 (at least 80% identity) than SL1 (≤64% identity). A feature that distinguishes novel SLs from both SL1 and SL2 is the AC dinucleotide that lies 6 nt upstream from the donor splice site. In contrast, two segments of SL sequences that correspond to nucleotides 1-5 and 8-12 in SL2 are invariant.

Approximately 80 nt at the 3′ end of an SL RNA transcript are essential for catalyzing the joining of an SL with a target mRNA (38, 39). Included in this region are a donor splice site and an Sm antigen-binding segment (39, 40). Antibodies directed against "Sm" proteins that are common to all snRNPs (41) also bind with RNPs containing C. elegans SL RNAs (42, 43, 44). Proteins that interact with the Sm sequence are essential for the formation of active spliceosomes (39, 40). These key functional regions are evident in the novel SL RNAs (Fig. 5). The splice donor site (AGGU) is conserved throughout the SL RNA family. However, the site is followed by U in the novel SL RNAs as compared with A in SL1 and SL2 RNAs. Three versions of the consensus Sm antigen binding site (PuA(U)nGPu) (45) are employed in C. elegans SL RNAs: SL2, SL3beta, SL3, and SL5 RNAs share the sequence AA(U)5GG (nucleotides 70-78 in SL2); SL1 and SL3alpha RNAs contain AA(U)4GG; and the corresponding region of SL4 RNA is AA(U)4GA. In Ascaris lumbricoides, a parasitic nematode, short sequences immediately downstream (AAC) and 15 nt upstream (GUGGC) from the Sm site (Fig. 5B) play central roles in the first step of the trans-splicing reaction (39, 40). Modified versions of these sequences (Fig. 5A) are present at the appropriate locations in C. elegans SL RNAs. Sequences of functional regions in C. elegans SL RNAs are compared with the analogous segments of SL RNA from Ascaris in Fig. 5B.
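The three Sm site variants listed above can be checked against the consensus with a regular expression. This Python sketch is illustrative: Pu is read as a purine (A or G), and because the text gives no lower bound for n, the pattern assumes n ≥ 1. The names `SM_CONSENSUS`, `SM_SITES`, and `u_run_length` are hypothetical.

```python
import re

# Consensus Sm site: Pu-A-(U)n-G-Pu, where Pu is a purine (A or G).
# ASSUMPTION: n >= 1, since the text does not state a minimum run length.
SM_CONSENSUS = re.compile(r"[AG]A(U+)G[AG]")

# Sm site variants as written out in the text.
SM_SITES = {
    "SL2, SL3beta, SL3, SL5": "AAUUUUUGG",  # AA(U)5GG
    "SL1, SL3alpha": "AAUUUUGG",            # AA(U)4GG
    "SL4": "AAUUUUGA",                      # AA(U)4GA
}


def u_run_length(site):
    """Return the length of the U run if site fits the consensus, else None."""
    match = SM_CONSENSUS.fullmatch(site)
    return len(match.group(1)) if match else None


for rnas, site in SM_SITES.items():
    print(f"{rnas}: {site} -> U run of {u_run_length(site)}")
```

All three variants fit the consensus; they differ only in the length of the uridine run (4 or 5) and in the identity of the final purine (G or A).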

The novel 114-nt SL RNAs are more homologous with SL2alpha and beta RNAs (≥70% identity) than SL1 RNA (≤60% identity). The degree of similarity with SL2 RNA declines according to the following pattern: SL3 RNA > SL3beta RNA > SL3alpha RNA > SL4 RNA > SL5 RNA. Thus, the closely linked SL3alpha, SL4, and SL5 RNA genes (Fig. 3) exhibit maximal homology among themselves and diverge from SL RNA genes located on different chromosomes (see below). Several permutations of 5′ SLs and 3′ catalytic regions are engaged in trans-splicing (Fig. 5A). For example, SL3alpha and SL4 RNAs are 90% identical, but unlike the highly conserved SL2alpha and SL2beta RNAs, they donate different SLs. In contrast, SL3 is provided by discrete transcripts (SL3alpha and SL3beta RNAs) that share only 80% overall identity.

Characterization of DNA Sequences That Flank the 5′ Ends of SL RNA Structural Genes

DNA flanking the 5′ ends of C. elegans snRNA genes (U2-U6) contains a 22-24-nt proximal site element (PSE) that lies between nucleotides -40 and -65 (Fig. 6 and Ref. 47). The snRNA PSE is thought to be an important component of the promoter, and PSE-like sequences were also observed in DNA upstream from the SL1 and SL2 RNA genes (47). Putative PSEs for the SL3alpha, -3, -4, and -5 RNA genes are homologous and appear to be "composite" elements that contain short DNA sequences corresponding to segments of PSEs preceding the SL1 and SL2 RNA genes (Fig. 6). In contrast, the PSE for the SL3beta RNA gene is divergent and distinct. Differential utilization of PSE, 5′ splice donor, and 3′ catalytic modules may generate diversity in both the levels and sequences of SL RNAs.


Figure 6: Alignments of putative PSE sequences that flank the 5′ ends of C. elegans snRNA and SL RNA genes. The consensus predicted snRNA PSE is taken from Ref. 47; the consensus SL RNA PSE is based on the published sequences of DNA that lies upstream from the SL1 and SL2 RNA genes (5, 16) and DNA sequences that precede the novel SL RNA genes (as reported in Fig. 3 and Fig. 4). Conserved nucleotides in the PSEs are indicated with boldface type.



Copy Number and Chromosomal Location of Novel SL RNA Genes

DNA fragments that account for >90% of the C. elegans genome were amplified in YAC vectors and immobilized on a gridded filter (27). Novel SL RNA genes were subcloned in the plasmid pGEM7Z, amplified, and then excised to generate templates for the preparation of random-primed ³²P-labeled DNA probes (see "Experimental Procedures"). High stringency hybridization and washing of the recombinant YAC grids revealed that the SL3beta and SL3 RNA genes are present in YACs Y43C3 and Y40B9. Overlap between the YACs places the SL3beta and SL3 RNA genes in the center of chromosome I (between the unc-29 and mei-26 genes) on the physical map (27). Using the same approach, the clustered SL3alpha, SL4, and SL5 RNA genes were mapped to the central portion of chromosome III.

Although perfectly matched probe and target DNAs yielded the most intense signals, hybridization experiments also identified distinct but homologous SL RNA genes. This is illustrated by results obtained for SL2 RNA genes. Radiolabeled probes derived from the full-length SL2alpha RNA gene hybridized maximally (Fig. 7A) with YACs (Y74A11, Y56D12, Y48F5) that place the gene at its previously established locus (on chromosome I) in the physical map. However, two other strong signals (Fig. 7B) and five weaker signals (not shown) were observed on the grid after longer exposures to x-ray film. One strong signal corresponds to YAC Y39H2, which contains the 93% identical SL2beta RNA gene (16). Another highly conserved SL2-like gene is evidently present in YAC Y50H4. Assuming that each YAC clone contains 1 or 2 copies of SL RNA genes, the results suggest that there are 3-6 SL2 RNA genes and 5-10 closely related genes in the C. elegans genome.
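The copy-number ranges quoted above follow from simple bookkeeping, which can be sketched as follows. This is only an illustration of the arithmetic: the function name and variable names are hypothetical, and the only inputs are the counts of hybridizing loci and the stated assumption of 1 or 2 gene copies per YAC clone.

```python
def copy_number_range(n_loci, min_copies=1, max_copies=2):
    """Return (low, high) bounds on gene number for n_loci hybridizing loci,
    assuming each locus carries between min_copies and max_copies genes."""
    return n_loci * min_copies, n_loci * max_copies

# Strong signals: the established SL2alpha locus plus YACs Y39H2 and Y50H4.
strong_loci = 3
# Weaker signals detected only after longer exposures.
weak_loci = 5

sl2_range = copy_number_range(strong_loci)
related_range = copy_number_range(weak_loci)

print("SL2 RNA genes: %d-%d" % sl2_range)
print("closely related genes: %d-%d" % related_range)
```

The three overlapping YACs Y74A11, Y56D12, and Y48F5 are counted as a single locus because they all map to the previously established SL2alpha position.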


Figure 7: Identification and chromosomal locations of SL2 RNA genes and SL2 RNA gene homologs. A filter grid that contains >90% of the C. elegans genome in an array of YACs was probed with random-primed, 32P-labeled DNA corresponding to the SL2alpha RNA gene (see ``Experimental Procedures''). Autoradiograms were obtained 8 (A), 48 (B), and 96 h (not shown) after hybridization and washing under stringent conditions as defined in Ref. 21. Signals obtained from YACs Y74A11, Y56D12, and Y48F5 are indicated with arrows; triangles mark the signals from YACs Y39H2 and Y50H4.



When similar analyses were performed with novel SL RNA gene probes, 20 homologous DNA fragments were identified in recombinant YACs. The DNA inserts mapped to all six chromosomes, but none hybridized with probes derived from the SL2alpha RNA gene. Thus, 30 genes direct the synthesis of RNAs that can donate SL2-related and novel spliced leader exons to mRNAs. These results, the identification of SLs for which genes have yet to be cloned (Fig. 1) and the possible occurrence of SL RNA genes that do not hybridize with the SL gene probes used in these studies, suggest that the total numbers of SL1 RNA genes (100 tandem copies on chromosome V) and dispersed non-SL1, SL RNA genes may be similar.

Novel SL RNA Genes Are Transcribed in C. elegans

A sensitive RNase protection assay was used to determine whether the novel SL RNA genes direct the synthesis of full-length SL RNAs in vivo. 32P-Labeled, antisense RNA probes that are complementary to nucleotides 1-115 in the predicted transcripts were incubated with total C. elegans RNA prior to digestion with RNases A and T1. A representative autoradiogram (Fig. 8) shows that each novel gene encodes an authentic 115-nt SL RNA. Neither tRNA (Fig. 8, lane 3) nor excess nonradioactive SL2alpha RNA (Fig. 9A) protected the labeled probes for the novel SL RNAs. Excess nonradioactive SL3alpha RNA failed to protect the 32P-labeled SL4 RNA probe, although the two sequences are 93% identical (Fig. 9B). Thus, the assay discriminates between closely related gene products.
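For readers who wish to reproduce identity figures such as the 93% value cited above, a minimal sketch of a percent-identity calculation for two pre-aligned sequences of equal length is given here. The function name is hypothetical and the two 15-nt strings are arbitrary toy sequences, not the SL3alpha or SL4 RNA sequences; gaps and alignment are deliberately not handled.

```python
def percent_identity(seq_a, seq_b):
    """Percent of positions that match between two pre-aligned,
    equal-length sequences (no gap handling)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)

# Toy example only: 14 of 15 positions match.
toy_a = "ACGUACGUACGUACG"
toy_b = "ACGUACGUACGUACC"
toy_identity = percent_identity(toy_a, toy_b)
print(round(toy_identity, 1))
```

A single mismatch in a short duplex is enough to expose the probe to RNase cleavage, which is consistent with the observation that highly similar transcripts do not cross-protect under these conditions.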


Figure 8: Novel SL RNA genes are expressed in C. elegans. The accumulation of full-length SL RNA transcripts was monitored by RNase protection analysis as described under ``Experimental Procedures.'' Assays were performed with 40 µg of total RNA isolated from a mixed population of C. elegans. The 32P-labeled antisense RNA probes used were complementary to SL1 RNA (lane 1), SL2alpha RNA (lane 2), SL3beta RNA (lanes 3 and 4), and SL3alpha, SL5, SL4, and SL3 RNAs (lanes 5-8, respectively). The sample applied to lane 3 was hybridized with 40 µg of tRNA instead of total C. elegans RNA. A composite autoradiogram is presented. Signals from lanes 1 and 2 were obtained after exposing x-ray film for 3 h, whereas the time of exposure was increased to 48 h for lanes 3-8. This experiment was replicated four times, and similar data were obtained in each instance. Typical results are shown.




Figure 9: RNase protection analysis distinguishes among individual SL RNA transcripts. A, RNase protection assays were performed as described under ``Experimental Procedures'' and in Fig. 8 with one modification. Nonradioactive SL2alpha RNA (0.5 µg) was substituted for C. elegans total RNA. The 32P-labeled antisense RNA probes used were complementary to SL1, SL2alpha, SL3beta, SL3alpha, SL5, SL4, and SL3 RNAs (lanes 1-7, respectively). An autoradiogram is shown. Panel B shows autoradiographic signals obtained when 32P-labeled, antisense RNA complementary to SL4 RNA was hybridized with 0.5 µg of nonradioactive SL3alpha (lane 1) or SL4 (lane 2) RNA. Arrows indicate the position of the full-length, protected probes. Lower molecular weight bands in lane 2 (panel A) are apparently due to a low level of partial radiolytic cleavage of the probe. These bands contribute 8% of the protected radioactivity and are specifically protected only by SL2alpha RNA.



The major protected species obtained with the SL2alpha and SL3beta probes were closely spaced doublets (Fig. 8, lanes 2 and 4). This may result from ``breathing'' in the RNA duplexes. Alternatively, the two species may reflect heterogeneity at the 3` ends of the poly(A)- SL RNAs or differentially modified 5` caps (48). As expected, the protected fragment of antisense SL1 RNA is 95 nt in length (5). In addition to the principal protected species, unique patterns of smaller fragments are observed with antisense probes for SL RNAs 2alpha, 3beta, 3alpha, and 3 (Fig. 8, lanes 2, 4, 5, and 8). These fragments probably arise from partial protection of the probes by homologous but nonidentical SL RNAs.

Relative levels of novel SL RNAs were measured by PhosphorImager analysis. Transcripts derived from the SL 3alpha, 3beta, 3, 4, and 5 RNA genes are collectively 0.2 ± 0.02% (mean ± S.E., n = 4) as abundant as SL1 RNA and 0.9 ± 0.07% (n = 4) as abundant as SL2alpha RNA. Thus, the steady-state levels of the transcripts of the 5 newly discovered genes account for only a small fraction of SLs that are available for modifying the 5` ends of C. elegans mRNAs.
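The two percentages above also fix the ratio of SL2alpha RNA to SL1 RNA, a value that is implied rather than stated in the text. The sketch below makes that arithmetic explicit; it uses only the two reported means, ignores the standard errors, and the variable names are illustrative.

```python
# Reported means: pooled novel SL RNAs relative to each reference transcript.
novel_vs_sl1 = 0.2 / 100.0       # novel SL RNAs as a fraction of SL1 RNA
novel_vs_sl2alpha = 0.9 / 100.0  # novel SL RNAs as a fraction of SL2alpha RNA

# novel = a * SL1 and novel = b * SL2alpha, so SL2alpha / SL1 = a / b.
sl2alpha_vs_sl1 = novel_vs_sl1 / novel_vs_sl2alpha

print("SL2alpha RNA is roughly %.0f%% as abundant as SL1 RNA"
      % (100.0 * sl2alpha_vs_sl1))
```

Because the standard errors are not propagated, the result should be read as an order-of-magnitude consistency check rather than a measured quantity.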

Determination of the Relative Abundance of Various SLs in C. elegans Poly(A)- RNA

The concentrations of selected individual SL RNAs may not be reliable indicators of the total amounts of SLs that are available for modifying the 5` ends of mRNAs. A minimum of 30 gene loci encode SL RNAs identical with or closely related to the products of the SL2-5 RNA genes (Fig. 7, and see above). Levels of the individual SL RNAs may vary significantly as functions of their rates of transcription and/or half-lives. Moreover, multiple SL RNAs may donate identical or nearly identical 21-23-nt SLs to mRNAs.

An assay was designed to estimate the levels of specific SLs on intact SL RNA molecules. 32P-labeled antisense RNA probes corresponding to individual 22- or 23-nt SLs were protected with samples of poly(A)- RNA. Conditions of hybridization and RNase digestion were adjusted empirically so that only perfectly matched probes and target RNA sequences yielded significant signals. Typical results are shown in Fig. 10. SL1 is the most abundant spliced leader in C. elegans. However, the amounts of SL exons 3, 4, and 5 in the poly(A)- RNA population (Fig. 10) are much higher than the levels of the individual full-length transcripts reported above. The observations that excess SL2 and SL3 RNA sequences do not protect 32P-labeled SL3 and SL2 antisense probes, respectively (data not shown), verified the specificity of the assays. Quantification of the results with a PhosphorImager revealed that SLs 2-5 are each 5-15% as abundant as SL1 in poly(A)- RNA. Collectively, the size of the donor pool of SLs 2-5 is similar to the size of the SL1 donor pool. Evidently, multiple SL RNA genes encode identical 5` SL (2, 3, 4, or 5) sequences.


Figure 10: Detection of novel spliced leader exons (SL3, SL4, and SL5) in the poly(A)- fraction of C. elegans RNA. RNase protection analysis was performed as described under ``Experimental Procedures,'' using 30 µg of C. elegans poly(A)- RNA. The 32P-labeled oligonucleotide probes were complementary to the 22 or 23 nucleotides that constitute SL1 (lane 1), SL2 (lane 2), SL3 (lane 3), SL5 (lane 4), and SL4 (lane 5). An autoradiogram is presented. The gel was calibrated with DNA oligonucleotides of the indicated sizes that were fractionated in parallel lanes. This experiment was replicated three times and the results were essentially the same in each instance.



Under standard conditions of electrophoresis the protected, 32P-labeled antisense SLs exhibit mobilities similar to the mobility of a DNA marker comprising 26 nucleotides (Fig. 10). Moreover, the SL bands are somewhat diffuse. Several factors may account for these properties: RNA molecules migrate 5-10% more slowly than DNA fragments in this gel system; the 3` ends of the protected antisense RNAs are heterogeneous because irrelevant nucleotides immediately downstream from the SL sequence in the probe can be protected when they match (by chance) downstream nucleotides in intact SL RNAs; the high resolution gel system can separate molecules of the same length on the basis of nucleotide composition; and RNAs lacking the trimethyl guanosine cap migrate faster than capped RNAs. (^4) Protected fragments that migrate more rapidly than the 22-nt DNA marker may be due to the hybridization of the probe with highly homologous but distinct SLs and/or ``breathing'' in the AU-rich region of RNA duplexes that corresponds to the conserved 5` ends of the SL sequences (Fig. 5A).

Novel Spliced Leaders Are Incorporated into trans-Spliced mRNAs in Vivo

Poly(A)+ RNA from an asynchronous population of C. elegans was probed with antisense SL1-SL5 RNAs in RNase protection assays (Fig. 11). Since poly(A)- RNA is removed by oligo(dT) cellulose chromatography, only the 22- or 23-nt segments complementary to SL sequences are protected in RNA-RNA duplexes. Slight differences in mobilities of the protected fragments (Fig. 11) may be ascribed to factors listed above and the larger size (23 nt) of SL5 (lane 4). Collectively, non-SL1 exons are incorporated into mRNAs 8% as frequently as SL1 (Fig. 11, Table 1). SL2 and SL3 are attached to mRNAs with similar frequencies (Table 1). In contrast, SL4 is used only 10% as often as SL2 or SL3.


Figure 11: Detection of novel spliced leader exons incorporated in C. elegans mRNAs. RNase protection assays were performed (see ``Experimental Procedures'') using 1.0 µg of poly(A)+ RNA for lanes 2-5, 0.2 µg of poly(A)+ RNA for lane 1, and 2.3 µg of poly(A)+ RNA for lane 6. Assays were performed with 32P-labeled probes complementary to SL1 (lane 1), SL2 (lane 2), SL3 (lane 3), SL5 (lane 4), and SL4 (lanes 5 and 6). Protected fragments from the radiolabeled probes were fractionated by electrophoresis in a 15% polyacrylamide, 7 M urea gel. Signals were visualized by autoradiography. Autoradiographic signals were quantified as described under ``Experimental Procedures,'' and the data are presented in Table 1. Size markers were oligodeoxynucleotides of the indicated size (in nt) that were end-labeled with [gamma-32P]ATP and T4 polynucleotide kinase.





Northern blots of poly(A)+ RNA were incubated with radiolabeled antisense DNA corresponding to either SL3 or SL4. Both probes hybridized with a heterogeneous array of mRNAs that ranged from several hundred to >3000 nt in length (Fig. 12). Although high intensity bands of 0.5 and 1.1 kilobases were observed with the SL3-specific probe, it appears that both SLs are incorporated into large constellations of mRNAs. Signal intensities from the Northern blots were measured in a PhosphorImager. Messenger RNAs containing SL3 are 8-fold more abundant than mRNAs with a 5` SL4 exon. Thus, relative frequencies of SL3 and SL4 utilization determined by RNase protection and Northern analyses are in agreement.


Figure 12: SL3 and SL4 are incorporated into numerous C. elegans mRNAs. C. elegans poly(A)+ RNA (2.5 µg/lane) was denatured and fractionated in a 1% agarose gel as described previously (30). Resolved mRNAs were transferred to Nytran membranes, and separate blots were probed with 32P-labeled antisense DNA probes complementary to SL3 (lane 1) and SL4 (lane 2). Hybridization and washing conditions are described under ``Experimental Procedures.'' RNAs that hybridized with the probes were visualized by autoradiography at -70 °C.



Levels of Certain SL RNAs May Be Regulated during Postembryonic Development

Levels of SL1, SL2alpha, and SL3alpha transcripts are essentially invariant throughout C. elegans development (Fig. 13). In contrast, SL4 RNA is minimally expressed in L1 larvae, and the content of the transcript increases during later development. Adult nematodes have a 6-fold higher level of SL4 RNA than L1 animals. The developmentally controlled limitation of SL4 RNA content in early development may account for its rather infrequent utilization on mRNAs derived from mixed populations of C. elegans (see Fig. 11, Fig. 12, and Table 1).


Figure 13: Expression of SL RNAs during C. elegans development. RNase protection analysis was performed as described under ``Experimental Procedures'' using 30 µg of total RNA from L1 larvae, L3 larvae, and young adult (A) animals. 32P-labeled antisense RNAs complementary to SL1, SL2, SL3alpha, and SL4 RNAs were employed as probes. The fragments protected by SL2, SL3alpha, and SL4 RNAs are 115 nt in length; the fragment protected by SL1 RNA is 95 nt long. Signals were recorded by autoradiography. Quantitative analysis was performed with a PhosphorImager (Molecular Dynamics).



The SL4 RNA Gene Promoter Is Active in Hypodermal Cells

The developmental stage-specific accumulation of SL4 RNA may be due to its restricted expression in a limited number of highly differentiated cells. This possibility was investigated by generating transgenic nematodes, in which the first 40 bp of the SL4 RNA structural gene and 800 bp of contiguous 5`-flanking DNA drive expression of the E. coli lacZ reporter gene. Transgene expression was monitored in individual cells of fixed C. elegans via in situ hybridization (see ``Experimental Procedures'' and Fig. 14). SL4 RNA gene promoter activity was evident only in hypodermal cells of C. elegans. Similar results were obtained when the 5`-flanking DNA was limited to 162 bp immediately adjacent to the SL4 RNA structural gene.


Figure 14: Expression of SL4 RNA gene promoter activity in C. elegans hypodermal cells. Transgenic C. elegans, which contain a lacZ reporter gene downstream from 800 bp of DNA that flanks the 5` end of the SL4 RNA gene, were generated as described under ``Experimental Procedures.'' RNA encoding beta-galactosidase was detected by in situ hybridization with digoxigenin-labeled, antisense DNA as described under ``Experimental Procedures.'' RNA-DNA complexes were visualized by incubating the specimens serially with anti-digoxigenin IgGs coupled to alkaline phosphatase and a chromogenic substrate. IgG-coupled alkaline phosphatase catalyzes the synthesis of an insoluble blue reaction product in cells transcribing the SL4 RNA gene. Nomarski interference microscopy revealed that the histochemical stain appeared in the nuclei of hypodermal cells. A photograph of a stained adult animal (taken with a Zeiss Axioscope microscope at a magnification of × 100) is presented. More than 90% of the transgenic C. elegans exhibited similar staining patterns.




DISCUSSION

Seminal studies by Hirsh and co-workers (5, 6, 8, 16) and Blumenthal and colleagues (9, 13, 49) demonstrated that 5` ends of C. elegans mRNAs are often covalently modified via trans-splicing. Early reports suggested that targeted mRNAs received either of two 22-nt leader sequences (SL1 or SL2) in a mutually exclusive manner (16). However, recent determinations of sequences at the extreme 5` ends of several C. elegans cDNAs suggested that a larger family of SL RNA genes may be present in the C. elegans genome (Fig. 1 and Refs. 18, 19, and 20). Heretofore, direct evidence for the occurrence of novel SL RNA genes, the transcription of such genes, and the utilization of their 5` SL exons for trans-splicing mRNAs in vivo was lacking.

We have now cloned and characterized genes that encode five novel C. elegans SL RNAs. Each gene yields a transcript of 114 nt, which contains several consensus sequences that are predicted to be essential for biological activity (Fig. 3, Fig. 4, Fig. 5, and Fig. 6). Three of these genes (encoding SL RNAs 3alpha, 4, and 5) are clustered within 2 kbp of genomic DNA near the center of chromosome III. Despite the tight gene linkage, the corresponding SL RNAs donate three distinct spliced leaders. Head to tail and head to head orientations of SL RNA genes are evident in the cluster, suggesting that transcription may proceed from promoters located in both DNA strands. The observations that composite SL RNA/snRNA promoter elements (PSEs) lie 50 bp upstream from nucleotide 1 in the SL RNA coding sequences (i.e. the predicted transcription start site) support this idea (Fig. 3, Fig. 4, and Fig. 6).

Two other novel SL RNA genes (3beta, 3) and associated PSEs were sequenced and mapped to chromosome I. SL RNA 3beta and 3 transcripts provide identical 5` leader exons, although their 3` sequences differ significantly (Fig. 4 and Fig. 5A).

The derived 114-nt SL3alpha, SL4, and SL5 RNAs are more homologous with SL2 RNAs than with the SL1 transcript (Fig. 5A). Likewise, the novel 22- or 23-nt spliced leader sequences (SLs 3-5) are more closely related to SL2 than to SL1. These homologies and the ability of multiple SL RNA genes to donate identical leader sequences (see Fig. 3, Fig. 4, Fig. 5, and Ref. 16) suggested that SL2-5 RNA genes may be members of a larger subfamily. A minimum estimate of the size of this group of genes was obtained by hybridizing filters, which contain an ordered, overlapping array of DNA fragments comprising most of the C. elegans genome, with 32P-labeled probes for the SL2, SL3, SL4, and SL5 RNA genes. Approximately 30 loci that are identical or closely related to the SL2-5 RNA genes were identified on the autosomal and X chromosomes. Thus, this gene family is both large and dispersed over a substantial portion of the total genome.

During the preparation of this paper, sequences of overlapping cosmid inserts derived from C. elegans chromosome III were deposited in the GenBank data base, in the context of the C. elegans genome project (50). The cosmid designated CEL B0280 (accession number U10438) contains the clustered SL3alpha, SL4, and SL5 RNA genes. The sequence we determined for the genomic DNA insert in SLA (Fig. 3) is identical with that reported for cosmid CEL B0280. However, the presence of three SL RNA genes and their associated PSEs was not detected by the methods employed in the genome project. Analysis of the data from the C. elegans genome project with the Genefinder program predicts that the three SL RNA genes in the cluster lie within introns of a gene that encodes a putative glutamate receptor subunit (50). Placement of small SL RNA genes within introns of a larger gene may reflect the efficient utilization of C. elegans' relatively small genome (2 × 10^8 bp). The nematode genome, which is only 5% as large as the human genome, contains many introns and intergenic DNA sequences that are considerably smaller than their counterparts in mammalian systems (51). As additional data on the C. elegans genome are generated it will be possible to determine whether dispersed SL RNA genes are typically positioned within other genes or are distributed in a more randomized manner along the chromosomes.

Sharp (52), Nilsen (39), and Blumenthal and Thomas (51) suggested that SL RNAs are functional chimeras composed of a 5` exon that is fused to an snRNA-like downstream sequence. By performing in vitro mutagenesis and assaying trans-splicing in extracts of Ascaris embryos, Nilsen and colleagues (38, 39) demonstrated that the intronic, snRNA-like region of an SL RNA can deliver a variety of natural and synthetic 5` exons to acceptor mRNAs. SL RNA sequence elements that are essential for trans-splicing are the Sm protein(s) binding site, a trinucleotide sequence immediately downstream from the Sm site, a pentanucleotide sequence 15 nt upstream from the Sm site, and a 5` splice site (Fig. 5B). Consensus sequences for each of these elements appear in the novel C. elegans SL RNA transcripts as well as the classical SL2 and SL1 RNAs, thereby suggesting that the newly discovered genes encode components of SL RNPs and trans-spliceosomes.

In the closely related nematode Ascaris, the 22-nt SL DNA sequence functions as a promoter element that is essential for transcription by RNA polymerase II (53). The high degree of sequence conservation observed at the 5` ends (nucleotides 1-12) of C. elegans SLs, the repetitive utilization of certain spliced leader exons (e.g. SL3 and SL2) on multiple SL RNA gene transcripts, and the discovery of composite SL1/SL2 PSEs 50 bp upstream from the novel SL RNA structural genes (Fig. 6) prompt the speculation that these sequence elements and their corresponding trans-acting proteins may coordinately drive expression of RNAs that deliver multiple ``isoforms'' of 5`-untranslated exons to C. elegans mRNAs.

SL RNA transcripts encoded by the C. elegans SL3alpha, SL4, and SL5 RNA genes are produced in vivo (Fig. 8). Although the levels of these individual transcripts are low (1% of the abundance of SL2 RNA), the total pools of novel SLs in poly(A)- RNA (full-length SL RNAs) and the amounts of novel SLs transferred to mRNAs are similar to the levels of SL2 in the poly(A)- and poly(A)+ fractions of C. elegans RNA (Fig. 10, Fig. 11, and Table 1). This is probably due to the occurrence of a limited subset of shared SL exons that are components of a substantially larger group of related but distinct SL RNA genes. Differences in the activities of individual SL RNA gene promoters and/or the stabilities of SL RNAs may also contribute to the net accumulation and utilization of novel SL RNAs.

In instances where reverse transcription-PCR has been employed to determine sequences at the 5` termini of specific mRNAs, it appears that the novel SL exons listed in Table 1 are used in trans-splicing reactions with frequencies that match the utilization rate of the classical SL2 sequence (18, 19). These results provide direct experimental evidence for the in vivo functionality of novel SL RNAs. The appearance of SL3 on mRNA encoding the sex determination factor TRA-2 (18) documents a linkage among a novel SL RNA gene(s), its transcription in vivo, and the use of the 5` SL exon of the transcript to modify a specific mRNA via trans-splicing.

Many C. elegans mRNAs receive a 5` leader exon from transcripts of the repeated SL1 RNA genes. The precise physiological significance of trans-splicing with SL1 is not known. Since the SL usually terminates within 1-50 nt of the initiator AUG, it seems probable that trans-splicing eliminates upstream, out-of-frame AUG codons and deletes long 5`-untranslated sequences capable of folding into secondary structures that inhibit translation. In addition to optimizing translation efficiency, modification with the SL may alter the stability of the target mRNA.

Messenger RNAs that contain SL1 are encoded by conventional structural genes that are preceded by contiguous promoter sequences. However, 25% of C. elegans genes are organized as operons, in which transcription of 2-5 mRNAs is driven by a single 5` promoter/enhancer region (4, 13). Exact sequences of 5` ends of mRNAs derived from downstream genes in C. elegans operons have been determined only in a few instances (reviewed in Refs. 4 and 13). The data indicate that SL2 is appended only to mRNAs encoded by downstream genes. Moreover, polycistronic mRNAs are not cleaved and trans-spliced if upstream polyadenylation signals are eliminated by mutation (13). On the basis of these considerations, Spieth et al. (13) proposed that SL2 RNA plays a specialized role in processing polycistronic mRNAs. A plausible suggestion is that RNPs containing SL2 RNA bind to proteins that catalyze 3`-end processing of pre-mRNAs (13). Such complexes would mediate the 3` cleavage and polyadenylation of an upstream mRNA and simultaneously place the SL2 RNP in proximity with the trans-splice acceptor site at the 5` end of the neighboring downstream mRNA.

The SL3 exon, which appears at the 5` ends of three SL RNA structural genes described in this paper, and additional novel SLs (Table 1) are incorporated at the 5` termini of mRNAs for TRA-2 and/or protein kinase C1A and casein kinase IIbeta (18, 19). Each of these mRNAs is derived from a gene that occupies a downstream position in an independent operon (13). These observations and a conserved structural relationship with SL2 RNA (see above) strongly suggest that SL3alpha, SL4, and SL5 RNAs play important roles in the generation of efficiently translated, monocistronic RNAs from the internal segments of polycistronic transcripts. The molecular basis for the exclusion of SL1 RNA and the utilization of a family of non-SL1 RNAs for this mode of pre-mRNA processing is unknown. One speculative suggestion is that a partially conserved sequence near the 3` ends of SL RNAs 2-5 (nt 94-114) provides a binding site for a protein(s) involved in the 3` cleavage and/or polyadenylation of mRNAs. The smaller SL1 RNA transcript lacks this sequence.

The relative abundance of SL4 RNA increases 6-fold during postembryonic development (Fig. 13). Moreover, the putative promoter for the SL4 RNA gene is active principally in hypodermal cells (Fig. 14). These results suggest that the SL4 leader might selectively modulate mRNA translation, stability, etc., in a subset of hypodermal transcripts during late development. Furthermore, novel upstream promoter/enhancer elements may control the level and cell-specific expression of SL4 RNA transcripts. These possibilities must be regarded with caution in the absence of (a) knowledge of the properties of specific mRNAs that are trans-spliced with SL4 and (b) systematic analysis of the gene promoter by mutagenesis. Moreover, patterns of promoter activity must be established for other SL RNA genes and compared with that observed for the SL4 RNA gene. Nevertheless, the observations indicate that potential regulatory elements in the SL4 sequence and SL4 RNA gene promoter/enhancer merit further study.


FOOTNOTES

*
This work was supported in part by National Institutes of Health Grant DK44597 (to C. S. R.) and Medical Scientist Training Program Grant GM7288 (to L. H. R.). The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked ``advertisement'' in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

The nucleotide sequence(s) reported in this paper has been submitted to the GenBank(TM)/EMBL Data Bank with accession number(s) U29449, U29490, and U29491.

§
Present address: Duke University, School of the Environment, Box 90328, Room A254, LSRC, Durham, NC 27708-0328.

(^1)
The abbreviations used are: snRNP and snRNA, small nuclear RNP and RNA, respectively; RNP, ribonucleoprotein; SL, spliced leader; PCR, polymerase chain reaction; YAC, yeast artificial chromosome; nt, nucleotide(s); bp, base pair(s); kbp, kilobase pair(s); PBS, phosphate-buffered saline; PSE, proximal site element.

(^2)
E. Hu, R. Y. Lin, and C. S. Rubin, unpublished observations.

(^3)
G. Seydoux and A. Fire, personal communication.

(^4)
L. H. Ross and C. S. Rubin, unpublished observations.


ACKNOWLEDGEMENTS

We thank Ann Marie Alba for expert secretarial services.


REFERENCES

  1. Guthrie, C., and Patterson, B. (1988) Annu. Rev. Genet. 22,387-419 [CrossRef][Medline] [Order article via Infotrieve]
  2. Steitz, J. A., Black, D. L., Gerke, V., Parker, K. A., Kr ä mer, A., Frendeqay, D., and Keller, W. (1988) in Structure and Function of Major and Minor Small Nuclear Ribonucleoprotein Particles (ed. M. L. Birnstiel) pp. 115-154, Springer-Verlag, New York, NY
  3. Moore, M. J., Query C. C., and Sharp, P. A. (1993) in The RNA World (Gesteland, R. F., and Atkins, J. F., eds) pp. 303-357, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY
  4. Zorio, D. A. R., Cheng, N. N., Blumenthal, T., and Speith, J. (1994) Nature 372,270-272 [CrossRef][Medline] [Order article via Infotrieve]
  5. Krause, M., and Hirsh, D. (1987) Cell 49,753-761 [Medline] [Order article via Infotrieve]
  6. Bektesh, S. L., and Hirsh, D. (1988) Nucleic Acids Res. 16,5692 [Medline] [Order article via Infotrieve]
  7. Hannon, G. J., Maroney, P. A., Denker, J. A., and Nilsen, T. W. (1990) Cell 61,1247-1255 [Medline] [Order article via Infotrieve]
  8. Bektesh, S., Van Doren, K., and Hirsh, D. (1988) Genes & Dev. 2,1277-1283
  9. Conrad, R., Thomas, J., Spieth, J., and Blumenthal, T. (1991) Mol. Cell. Biol. 11,1921-1926 [Medline] [Order article via Infotrieve]
  10. Hannon, G. J., Maroney, P. A., and Nilsen, T. W. (1991) J. Biol. Chem. 266,22792-22795 [Abstract/Free Full Text]
  11. Van Doren, K., and Hirsh, D. (1990) Mol. Cell. Biol. 10,1769-1772 [Medline] [Order article via Infotrieve]
  12. Liou, R. F., and Blumenthal, T. (1990) Mol. Cell. Biol. 10,1764-1768 [Medline] [Order article via Infotrieve]
  13. Spieth, J., Brooke, G., Kuersten, S., Lea, K., and Blumenthal, T. (1993) Cell 73,521-532 [Medline] [Order article via Infotrieve]
  14. Matthews, K. R., Tschudi, C., and Ullu, E. (1994) Genes & Dev. 8,491-501
  15. LeBowitz, J. H., Smith, H. Q., Rusche, L., and Beverley, S. M. (1993) Genes & Dev. 7,996-1007
  16. Huang, X. Y., and Hirsh, D. (1989) Proc. Natl. Acad. Sci. U. S. A. 86,8640-8644 [Abstract]
  17. Frohman, M. A., Dush, M. K., and Martin, G. R. (1988) Proc. Natl. Acad. Sci. U. S. A. 85,8998-9002 [Abstract]
  18. Kuwabara, P. E., Okkema, P. G., and Kimble, J. (1992) Mol. Biol. Cell 3,461-473 [Abstract]
  19. Land, M., Islas-Trejo, A., and Rubin, C. S. (1994) J. Biol. Chem. 269,14820-14827 [Abstract/Free Full Text]
  20. Hu, E., and Rubin, C. S. (1991) J. Biol. Chem. 266,19796-19802 [Abstract/Free Full Text]
  21. Hu, E., and Rubin, C. S. (1990) J. Biol. Chem. 265,5072-5080 [Abstract/Free Full Text]
  22. Land, M., Islas-Trejo, A., Freedman, J., and Rubin, C. S. (1994) J. Biol. Chem. 269,9234-9244 [Abstract/Free Full Text]
  23. Emmons, S. W., and Yesner, L. (1984) Cell 36,599-605 [Medline] [Order article via Infotrieve]
  24. Thein, S. L., and Wallace, R. B. (1986) in Human Genetic Diseases (Davis, K. E., ed.) pp. 33-50, IRL Press, Oxford
  25. Sanger, F., Nicklen, S., and Coulson, A. R. (1977) Proc. Natl. Acad. Sci. U. S. A. 74,5463-5467 [Abstract]
  26. Altschul, S. F., Gish, W., Miller, W., Myers, E. W., and Lipman, D. J. (1990) J. Mol. Biol. 215,403-410 [CrossRef][Medline] [Order article via Infotrieve]
  27. Coulson, A., Kozono, Y., Lutterbach, B., Shownkeen, R., Sulston, J., and Waterston, R. (1991) BioEssays 13,413-417 [Medline] [Order article via Infotrieve]
  28. Coulson, A., Sulston, J., Brenner, S., and Karn, J. (1986) Proc. Nat. Acad. Sci. U. S. A. 83,7821-7825 [Abstract]
  29. Sambrook, J., Fritsch, E. F., and Maniatis, T. (1989) Molecular Cloning: A Laboratory Manual , 2nd Ed., pp. 7.26-7.29, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY
  30. Sul, H. S., Wise, L. S., Brown, M. L., and Rubin, C. S. (1984) J. Biol. Chem. 259, 1201-1205
  31. Cummins, C., and Anderson, P. (1988) Mol. Cell. Biol. 8, 5339-5349
  32. Milligan, J. F., Groebe, D. R., Witherell, G. W., and Uhlenbeck, O. C. (1987) Nucleic Acids Res. 15, 8783-8798
  33. Fire, A., Harrison, S. W., and Dixon, D. (1990) Gene (Amst.) 93, 189-198
  34. Mello, C. C., Kramer, J. M., Stinchcomb, D., and Ambros, V. (1991) EMBO J. 10, 3959-3970
  35. Freedman, J. H., Slice, L. W., Dixon, D., Fire, A., and Rubin, C. S. (1993) J. Biol. Chem. 268, 2554-2564
  36. Tautz, D., and Pfeifle, C. (1989) Chromosoma 98, 81-85
  37. Patel, N. H., and Goodman, C. S. (1992) in Nonradioactive Labeling and Detection of Biomolecules (Kessler, C., ed.) pp. 377-381, Springer-Verlag, Berlin
  38. Maroney, P. A., Hannon, G. J., Shambaugh, J. D., and Nilsen, T. W. (1991) EMBO J. 10, 3869-3875
  39. Nilsen, T. W. (1993) Annu. Rev. Microbiol. 47, 413-440
  40. Hannon, G. J., Maroney, P. A., Yi, Y. T., Hannon, G. E., and Nilsen, T. W. (1992) Science 258, 1775-1780
  41. Lerner, M. R., and Steitz, J. A. (1979) Proc. Natl. Acad. Sci. U. S. A. 76, 5495-5499
  42. Van Doren, K., and Hirsh, D. (1988) Nature 335, 556-559
  43. Thomas, J. D., Conrad, R. C., and Blumenthal, T. (1988) Cell 54, 533-539
  44. Bruzik, J. P., Van Doren, K., Hirsh, D., and Steitz, J. A. (1988) Nature 335, 559-562
  45. Mattaj, I. W. (1988) in Small Nuclear Ribonucleoprotein Particles (Birnstiel, M. L., ed.) pp. 100-114, Springer-Verlag, New York
  46. Nilsen, T. W., Shambaugh, J., Denker, J., Chubb, G., Fraser, C., Putnam, L., and Bennett, K. (1989) Mol. Cell. Biol. 9, 3543-3547
  47. Thomas, J., Lea, K., Aprison, I., and Blumenthal, T. (1990) Nucleic Acids Res. 18, 2633-2642
  48. Ullu, E., and Tschudi, C. (1990) Nucleic Acids Res. 18, 3319-3326
  49. Conrad, R., Liou, R. F., and Blumenthal, T. (1993) EMBO J. 12, 1249-1255
  50. Wilson, R., Ainscough, R., Anderson, K., Baynes, C., Berks, M., Bonfield, J., Burton, J., Connell, M., Copsey, T., Cooper, J., Coulson, A., Craxton, M., Dear, S., Du, Z., Durbin, R., Favello, A., Fraser, A., Fulton, L., Gardner, A., Green, P., Hawkins, T., Hillier, L., Jier, M., Johnston, L., Jones, M., Kershaw, J., Kirsten, J., Laisster, N., Latreille, P., Lightning, J., Lloyd, C., Mortimore, B., O'Callaghan, M., Parsons, J., Percy, C., Rifken, L., Roopra, A., Saunders, D., Shownkeen, R., Sims, M., Smaldon, N., Smith, A., Smith, M., Sonnahammer, E., Staden, R., Sulston, J., Thierry-Mieg, J., Thomas, K., Vaudin, M., Vaughan, K., Waterston, R., Watson, A., Weinstock, L., Wilkinson-Sproat, J., and Wohldman, P. (1994) Nature 368, 32-38
  51. Blumenthal, T., and Thomas, J. (1988) Trends Genet. 4, 305-308
  52. Sharp, P. A. (1987) Cell 50, 147-148
  53. Hannon, G. J., Maroney, P. A., Ayers, D. G., and Nilsen, T. W. (1990) EMBO J. 9, 1915-1921
