Multiple Ribonuclease H–Encoding Genes in the Caenorhabditis elegans Genome Contrasts with the Two Typical Ribonuclease H–Encoding Genes in the Human Genome

Arulvathani Arudchandran*, Susana M. Cerritelli*, Nathan J. Bowen†, Xiongfong Chen‡, Michael W. Krause§ and Robert J. Crouch*

*Laboratory of Molecular Genetics,
†Laboratory of Gene Regulation and Development,
‡Unit on Biologic Computation, National Institute of Child Health and Human Development,
§Laboratory of Molecular Biology, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland


    Abstract
Database searches of the Caenorhabditis elegans and human genomic DNA sequences revealed genes encoding ribonuclease H1 (RNase H1) and RNase H2 in each genome. The human genome contains a single copy of each gene, whereas C. elegans has four genes encoding RNase H1–related proteins and one gene for RNase H2. By analyzing the mRNAs produced from the C. elegans genes, examining the amino acid sequences of the predicted proteins, and expressing the proteins in Escherichia coli, we have identified two active RNase H1–like proteins. One is similar to other eukaryotic RNases H1, whereas the second RNase H (rnh-1.1) is unique. The rnh-1.0 gene is transcribed as a dicistronic message with three dsRNA-binding domains; the mature mRNA is trans-spliced with the SL2 splice leader and contains only one dsRNA-binding domain. Formation of RNase H1 is further regulated by differential cis-splicing events. A single rnh-2 gene, encoding a protein similar to several other eukaryotic RNases H2L, also has been examined. The diversity and enzymatic properties of RNase H homologues are other examples of expansion of protein families in C. elegans. The presence of two RNases H1 in C. elegans suggests that two enzymes are required in this rather simple organism to perform the functions that are accomplished by a single enzyme in more complex organisms. Phylogenetic analysis indicates that the active C. elegans RNases H1 are distantly related to one another and that the C. elegans RNase H1 is more closely related to the human RNase H1. The database searches also suggest that RNase H domains of LTR-retrotransposons in C. elegans are quite unrelated to cellular RNases H1, but numerous RNase H domains of human endogenous retroviruses are more closely related to cellular RNases H.


    Introduction
Data generated from the various genome-sequencing projects provide useful information on the diversity in structure and function of genes and proteins, giving us a foothold for understanding related proteins from various organisms. Completion of the genomic sequences of Caenorhabditis elegans (The C. elegans Sequencing Consortium 1998) and humans (Lander et al. 2001; Venter et al. 2001) directed our interest toward the characterization of ribonuclease H (RNase H) homologues of these multicellular animals.

The RNases H in a variety of organisms have been studied extensively with respect to structure, function, and enzymatic properties on the basis of their specific degradation of the RNA in RNA-DNA hybrids (Crouch and Toulmé 1998, pp. 1–265). The RNases H participate in cellular processes, such as DNA replication, repair, and transcription, as well as in the replication of retroviral genomes. Escherichia coli has two RNases H (HI and HII), each having its own characteristic amino acid sequence (Ohtani et al. 1999a, 1999b). Bacillus subtilis also has two active RNases H, both related by sequence to E. coli RNase HII (Itaya et al. 1999; Ohtani et al. 1999a, 1999b). Despite their similarity at the amino acid sequence level, these two B. subtilis proteins have very different specific activities, specificities of cleavage sites, and strikingly different divalent metal ion preferences and, therefore, have been classified as RNases HII and HIII. Bacillus subtilis also has a gene encoding a protein with strong sequence similarity to E. coli RNase HI, but the protein lacks a portion of the basic protrusion and has other changes that render it nonfunctional as an RNase H (Itaya et al. 1999; Ohtani et al. 1999a). The presence of a gene encoding an inactive RNase HI in B. subtilis and the disparate activities of RNases HII and HIII point out the difficulties in assigning a function merely on the basis of amino acid sequence similarity.

Thus far, at least one gene encoding an RNase H–like protein has been found in all prokaryotic and archaeal genomes (Ohtani et al. 1999b). Most often there are two genes, either a combination of HI and HII or HII and HIII. Little is known about the number and types of RNases H in eukaryotes. Two proteins from mammalian sources and Saccharomyces cerevisiae (RNase H1 and RNase H2L) (Crouch and Cerritelli 1998; Frank, Braunshofer-Reiter, and Wintersberger 1998) have been shown to be related by amino acid sequence to E. coli RNases HI and HII. A third RNase H, RNase H70 (Frank et al. 1999), is also present in S. cerevisiae and has sequence similarity to several other proteins in S. cerevisiae, including Rex3P (RNA exonuclease), Rex4P, and Pan2P, the last being a subunit of the polyA ribonuclease. All these proteins are related to exonuclease III, an enzyme known for many years to degrade RNA of RNA-DNA hybrids (Keller and Crouch 1972). At present, there is no clear consensus amino acid sequence that will permit defining a protein related to RNase H70 as an RNase H.

An analysis of the genome sequence and EST data from C. elegans reveals four RNase HI–like genes. This large number of potential RNases H in a single organism is unprecedented, and there are features of the predicted proteins that are unique. In contrast, the human genome has two genes encoding RNase H1 and RNase H2. We characterized cDNAs obtained from the mRNA of C. elegans for different RNases H and identified unique structural features of some proteins. The splicing events for the regulation of the rnh-1.0 gene, their phylogenetic relationship, and the enzymatic properties of different RNases H expressed in E. coli are also described.


    Materials and Methods
Bacterial Strains
The E. coli strains used in this study were TOP10F' (Invitrogen), DH5α (Life Technologies), BL21(DE3)pLysS (Novagen), and MIC1066 [rnhA-339::cat recB270(Ts)] (Cazenave, Mizrahi, and Crouch 1998).

Computer Analysis
Caenorhabditis elegans BLAST searches (Altschul et al. 1990) were performed at either the Sanger (http://www.sanger.ac.uk/Projects/C_elegans/blast_server.html) or National Center for Biotechnology Information (NCBI) (http://www.ncbi.nlm.nih.gov/BLAST/) web sites. Query RNase H protein sequences were E. coli RNase HI and RNases H1 of S. pombe, S. cerevisiae, and human. The human genome sequences at NCBI and at Celera were examined separately. Query sequences for the NCBI site were the four RNases H used to search the C. elegans genome plus the four RNase H1–related proteins of C. elegans derived from this work. Accession numbers from each of the eight files were combined. Duplicate accession numbers were eliminated, creating a file used for Batch Entrez (http://www.ncbi.nlm.nih.gov:80/entrez/batchentrez.cgi?db=Nucleotide). The output from the Batch Entrez search is the Unmasked database. The Unmasked database was searched with the hurep.ref and hurep.sub files (http://www.girinst.org/server/RepBase/) (Jurka 2000), removing repeat sequences and generating the Masked database. The Masked database was searched using local tBLASTn with each of the RNase H queries to determine which sequences were selected by all or only some of the protein queries. The Celera database (http://publication.celera.com) was searched using the human RNase H1 protein and mRNA (cDNA) sequences as queries using BLASTp and BLASTn, respectively. Analyses of DNA and protein sequences were done using Wisconsin Package Version 10.0 (Genetics Computer Group [GCG], Madison, Wisc.).
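The step of combining the accession numbers from the eight query-specific hit files and eliminating duplicates can be illustrated with a short sketch. This is not the procedure the authors ran; the function name and the placeholder accession strings are hypothetical, and the only assumption taken from the text is that each query yields a list of accession numbers that is merged into one non-redundant list for Batch Entrez.

```python
from typing import Iterable, List


def merge_accessions(hit_lists: Iterable[Iterable[str]]) -> List[str]:
    """Combine per-query accession lists into one list without duplicates.

    The first occurrence of each accession is kept, so the merged list
    preserves the order in which hits were first seen.
    """
    seen = set()
    merged = []
    for hits in hit_lists:
        for accession in hits:
            accession = accession.strip()
            if accession and accession not in seen:
                seen.add(accession)
                merged.append(accession)
    return merged


if __name__ == "__main__":
    # Hypothetical placeholder identifiers, one list per protein query.
    per_query_hits = [
        ["ACC_0001", "ACC_0002"],
        ["ACC_0002", "ACC_0003"],
    ]
    # One accession per line, the usual layout for a batch-retrieval input file.
    print("\n".join(merge_accessions(per_query_hits)))
```

Keeping first-seen order is a design choice made here for readability of the merged file; a plain set would serve equally well if order does not matter.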

Phylogenetic Analysis
The phylogenetic analysis was performed on the multiple sequence alignment shown in figure 1A using the maximum likelihood method implemented in version 3.6a2.1 of the PHYLIP package (Felsenstein 2001). The alignment was generated using PILEUP of the GCG package, and adjustments were made, taking into account the structural data for E. coli RNase HI. The sequence alignment was converted to the PHYLIP format, and SEQBOOT was used to generate 100 data replicates. The data were subsequently analyzed with PROML (constant rate of change), followed by NEIGHBOR, and finally with CONSENSE to generate the bootstrapped tree. The phylogram was rooted with E. coli RNase HI, and the tree was visualized with TreeViewPPC version 1.5.3 (Page 1996).
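The generation of 100 data replicates rests on bootstrap resampling of alignment columns: each replicate has the same length as the original alignment, with columns drawn at random with replacement. The sketch below illustrates only that idea; it is not PHYLIP's SEQBOOT code, and the function names, the dictionary representation of the alignment, and the seed are assumptions made for illustration.

```python
import random
from typing import Dict, List


def bootstrap_alignment(aln: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """Return one bootstrap replicate of an alignment.

    Columns are sampled with replacement, and the same sampled columns are
    applied to every sequence so that each column stays intact.
    """
    lengths = {len(seq) for seq in aln.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    length = lengths.pop()
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns) for name, seq in aln.items()}


def make_replicates(aln: Dict[str, str], n: int = 100, seed: int = 1) -> List[Dict[str, str]]:
    """Generate n bootstrap replicates from a single seeded random generator."""
    rng = random.Random(seed)
    return [bootstrap_alignment(aln, rng) for _ in range(n)]


if __name__ == "__main__":
    # Toy alignment with hypothetical sequence names.
    toy = {"seqA": "ACDEFG", "seqB": "ACDEYG", "seqC": "AC-EFG"}
    replicates = make_replicates(toy, n=100)
    print(len(replicates), replicates[0])
```

Each replicate would then be analyzed separately, and the resulting trees summarized as a consensus, which is the role CONSENSE plays in the workflow described above.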



 
Fig. 1.—Protein sequences of C. elegans RNases H homologous to RNases H1. Amino acid sequences of RNase HI–like proteins are shown. (A) RNase H domains of the C. elegans proteins are aligned with E. coli RNase HI (Ecoli HI), the RNase H portion of HIV-1 reverse transcriptase (HIV-1), and Cer1-1, a retrotransposon of C. elegans. Highly conserved residues (D10, E48, D70, H124, N130, and D134) in RNases H are marked above the sequences with the amino acid numbers of the E. coli RNase HI enzyme. For E. coli and HIV-1 proteins, secondary structure regions are underlined and noted below the amino acid sequences as β-strands and α-helices. Amino acids of αE of E. coli RNase HI that interact with β2 and β3 are marked by *. Intron locations are indicated by the use of reverse shaded letters. Several RNases H have N-terminal amino acid extensions that are indicated by dots (.). Adjustments in the positions of amino acids to generate the alignments shown are indicated by dashes (-). (B) The N-terminal, non–RNase H domain of RNase H1A is presented with three segments of sequence highlighted. The first boxed sequence is rich in Gly residues (bold G), the second boxed region is Ser-rich (bold S), and the third region has several RS repeats (bold and underlined). The RS-containing region is still quite Ser-rich. (C) The duplex RNA–binding domains of C. elegans RNase H1, indicated as R1CE, R2CE, and R3CE, are aligned with corresponding sequences of human (Hu), mouse (Mo), and S. cerevisiae (SCR1) RNases H1. The AR sequence in R3CE is the site at which the 19-nt insertion described in the text results in premature termination of the protein. The duplex RNA–binding domain is underlined in the human RNase H1 sequence (amino acids 28–71), and the secondary structure of the S. cerevisiae RNase H1 duplex RNA–binding domain is indicated by underlines with β-strands in normal font and α-helices in italics.

 
cDNA Cloning
The mRNA isolated from the total C. elegans RNA (Krause 1995) was reverse transcribed (Life Technologies) to synthesize the first strand of cDNA by using the oligo dT primer. Primers specific for corresponding rnh genes were designed on the basis of the sequence information obtained from the splicing predicted by the GENEFINDER program (table 1). More than one set of primers was used in PCR reactions to characterize each type of cDNA clone. Total PCR product or the gel purified (Gene clean II kit, Bio 101) product was cloned into the pCRII-TOPO TA cloning vector and transformed into TOP10F' competent cells (Invitrogen). More than two different isolates were sequenced in each case from two different RNA preparations.


 
Table 1 Oligonucleotides

 
Northern Blotting
Five micrograms of mRNA isolated from a mixed-stage population of C. elegans was run on a 1.0% agarose-LE gel and blotted to Bright Star Plus nylon membrane (Ambion). Northern hybridization was carried out by using the NorthernMax-Gly protocol, as described by the manufacturer (Ambion). 32P-Labeled antisense transcripts were generated by in vitro transcription (Promega) using DNA for the R1-R2 dsRNA-binding domains or the RNase H domain.

Transcript Analysis
The 5'- and 3'-rapid amplification of cDNA ends (RACE) reactions were carried out to analyze transcripts of different RNase H genes. The 3'- and 5'-RACE primers (Life Technologies) were used with gene-specific primers in PCR reactions to characterize the 3'- and 5'-ends of the messages. In some cases, primers specific for splice leaders (Huang and Hirsh 1989) were used with the gene-specific primers to identify the 5'-end of messages. To confirm that the PCR reaction products were indeed derived from the target mRNA when splice leader and gene-specific primers were used in PCR reactions, Southern analysis of the PCR products was carried out using the appropriate probes. Two different mRNA preparations were used.

Expression in E. coli
Once the complete cDNA sequence was obtained and the coding region determined, primers were synthesized to amplify the coding regions such that an NdeI or NcoI site was at the first Met codon and the downstream primer included a BamHI or XhoI restriction enzyme site. The PCR products were cloned into pCRII-TOPO and digested with the appropriate enzymes, and the fragment was cloned into the pET15b expression vector (Novagen) and transformed into the BL21(DE3) pLysS E. coli strain (Novagen). Cells were grown at 32°C to mid-log phase, and expression was induced by the addition of IPTG (final concentration 1 mM), followed by incubation for 3 h. HIS-tagged proteins were purified from HIS-bind columns, as described by the manufacturer (Novagen, Clontech).

RNase H Activity
In Gel Assay
Renaturation gel assays were carried out with the partially purified HIS-tagged proteins (Han, Ma, and Crouch 1997; Cazenave, Mizrahi, and Crouch 1998). Renaturation was carried out either with Mg2+ or Mn2+ ions in the buffer. Autoradiograms were developed upon exposing gels to films.

Complementation Assay
The pET15b vectors harboring different rnh-like cDNAs derived from mRNAs of C. elegans were transformed into the MIC1066 E. coli strain (Cazenave, Mizrahi, and Crouch 1998). Transformants were plated on LB-amp plates at 32 and 42°C. Growth at 42°C indicates that a functional RNase H is present in MIC1066. The pET15b vector was used as a control in this study.


    Results
Database Searches for RNase H–Related Proteins in the Genomes of C. elegans and Humans
Caenorhabditis elegans
Using tBLASTn with RNases HI or H1 of E. coli, S. pombe, S. cerevisiae, and human, four C. elegans genes related to RNases HI or H1 were found, each having several of the hallmarks of RNases HI or H1. Another gene related by sequence to RNase HII of E. coli was also observed. Table 2 shows the results of tBLASTn searches against the Sanger Centre and NCBI databases. F59A6.6, ZK938.7, and C04F12.9 give very significant "expect" values (see table 2) when any of the four RNases H is used as the query sequence. In contrast, ZK1290.6 was detected with low probability scores by only two of the RNase H queries. The alignments indicate that most but not all of the canonical RNase H residues are present in the predicted amino acid sequences (Ohtani et al. 1999b). In addition to the four genes shown in table 2, a few genes having low probability scores were detected. Upon examining the sequences carefully, we found that these additional genes encode proteins that have none or only some of the highly conserved amino acid residues of RNases H.


 
Table 2 Four RNase HI(1)'s Versus Caenorhabditis elegans Database

 
Despite the similarity of E. coli RNase HI and eukaryotic RNases H1 to the sequence of RNase H found in reverse transcriptases and retrotransposons, none of the proteins in table 2 is from the retroviral elements of C. elegans (Britten 1995; Bowen and McDonald 1999). Unlike the reverse transcriptase of HIV-1, the RNases H of the C. elegans retrotransposons are so distantly related to the cellular RNases H used as query sequences that BLAST does not detect them. We believe that we have uncovered several interesting genes that are useful for studying RNases H in C. elegans, but because of our failure to find any C. elegans retroviral sequence, there may be more cellular rnh genes to be found. Only one candidate clone is found when using any of several RNase HII– or RNase H2L–like sequences as the query amino acid sequence (data not shown).

Human Genome
Having found several RNase H1 proteins in C. elegans, we searched the human genomic sequence for genes encoding RNase H1–like proteins. Although gaps are still present in the genomic sequence of the human DNA, we were unable to uncover any RNase H1–encoding genes except for the RNASEH1 gene on chromosome 2 (our modification of AC108488; see supplementary material at MBE web site: http://www.molbiolevol.org) and two pseudogenes, one located on chromosome 17p11.2 (AC022596.9) and one on chromosome 1q32.1-4 (AL035414.30) (supplementary material). In contrast to the search of C. elegans, the search of the human DNA detected numerous retroviral sequences that yield very low (significant) expect scores (supplementary material). One class of these is shown in figure 2 and is a member of the human endogenous retrovirus L (HERVL) family. Elimination of these sequences for further examination using RepMask sequences reduced the total number of sequences to about 47 (supplementary material). These sequences score well in the BLAST searches because of three highly conserved Trp residues: the substitution matrix Blosum62 (Henikoff S. and Henikoff J. G. 1992) credits 11 points to a Trp-Trp match compared with 4 points for a Leu-Leu match. Interestingly, the conserved Trp residues are in the αB-αC-αD region (fig. 1A), where HIV-1 RT has a deletion when compared with cellular RNases H1. Thus, the RNase H domains of HERVL elements are more similar to cellular RNase H1 than to the RNase H domain of HIV-1.
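The effect of the conserved Trp residues on the alignment score can be made concrete with a small calculation. The sketch below uses only the two Blosum62 diagonal values quoted in the text (11 for a Trp-Trp match, 4 for a Leu-Leu match); the dictionary and function names are hypothetical, and the rest of the matrix is deliberately omitted.

```python
# Blosum62 diagonal entries quoted in the text: Trp-Trp = 11, Leu-Leu = 4.
# Only these two values are included; this is not the full matrix.
BLOSUM62_DIAGONAL = {"W": 11, "L": 4}


def identical_match_score(residue: str, count: int) -> int:
    """Raw score contributed by `count` identical matches of one residue."""
    return BLOSUM62_DIAGONAL[residue] * count


if __name__ == "__main__":
    trp_total = identical_match_score("W", 3)  # three conserved Trp matches
    leu_total = identical_match_score("L", 3)  # three Leu matches, for comparison
    print(f"3 Trp matches: {trp_total} points")
    print(f"3 Leu matches: {leu_total} points")
    print(f"difference: {trp_total - leu_total} points")
```

Three matched Trp residues therefore add 33 points to the raw score, against 12 points for three matched Leu residues, which is why a short region with conserved Trp positions can lift an otherwise weak hit above the reporting threshold.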



 
Fig. 2.—BLAST-selected HERVL (retroviral) sequence in the human genome. One example of a retroelement in the human genome whose BLAST selection is enhanced by the presence of three highly conserved Trp residues in the αB-αC-αD region. Escherichia coli RNase HI is the query sequence, and the sequence found is in a human endogenous retrovirus L sequence.

 
Four C. elegans Genes Produce RNase H1–like Proteins
The four RNase H1–related genes detected by the database searches were examined by cloning cDNAs generated by reverse transcription, followed by PCR amplification. We examined several independently derived clones for each gene. Amino acid sequences of the proteins expressed from these cDNAs are shown in figure 1A. Each of the DNAs was cloned into pET15b for expression as a His-RNase H fusion protein in E. coli. Determination of RNase H activity of these four proteins expressed in E. coli was accomplished by two different assays. A third, typical assay for RNase H activity based on measuring the degradation of RNA-DNA hybrids in a solution-based procedure was of limited usefulness because of the limited solubility of the proteins expressed in E. coli.

In Gel Assay
Results obtained from the gel renaturation assays with labeled RNA-DNA hybrids are presented in figure 3. RNase H1 (F59A6.6) and RNase H1A (ZK1290.6) exhibited enzymatic activity in this assay with activity detected in two bands for both samples. The major activity of RNase H1 (marked with an arrow) is coincident with the stained protein, with some minor activity migrating at the position of a dimer of the 33-kDa protein. We occasionally see dimer bands in this gel assay (Han, Ma, and Crouch 1997). RNase H1A activity marked by the arrow corresponds to the full-length protein, whereas the band at about 35 kDa most likely represents activity derived from a proteolytic product of RNase H1A. Proteins containing only the RNase H domain of RNase H1 or RNase H1A have significant RNase H activity in the gel assay (data not shown). Retention of the substrate in the activity gel (fig. 3B, lane H1A, arrow at about 66 kDa) indicates the binding of the enzyme to the substrate without substantial degradation. This phenomenon is related to the ability of the N-terminal portion of RNase H1A to bind to some types of nucleic acids independently of the RNase H domain (A. Arudchandran and R. J. Crouch, unpublished data).



 
Fig. 3.—Renaturation gel assay for RNase H activity. RNase H activity was detected using SDS-PAGE, in which 32P-labeled poly(rA)-poly(dT) was included in the running gel during polymerization. The left side of the figure (A) shows a gel of the partially purified proteins after staining with Coomassie Blue. The position of molecular mass markers is given on the left side of the figure. Lanes H1 (RNase H1), H1A (RNase H1A), H1B (RNase H1B), H1C (RNase H1C), and H2L (RNase H2L) show the proteins with molecular masses of 33, 68, 28, 16, and 35 kDa, respectively. (B) Results of the gel activity assay as the reverse image of an autoradiogram of the gel after renaturation

 
RNase H activity was not detected for the RNases H obtained from the genes ZK938.7 (RNase H1B) and C04F12.9 (RNase H1C) or for the RNase H2L of C. elegans. The apparent activity in H1C migrating with the mobility of a protein of about 30 kDa is often found in proteins after using cobalt columns for partial purification and is absent when using nickel columns (data not shown).

Complementation Assay
The inability of the E. coli strain MIC1066 [rnhA-339::cat recB270(Ts)] to grow at 42°C can be overcome if an active RNase H protein is expressed. The strain has a T7 RNA polymerase gene to drive transcription of the RNase H genes cloned into the pET15b expression vector. In many instances, the RNase H produced by basal levels of transcription is sufficient to permit growth at 42°C. Of the five C. elegans RNase H–like genes, only rnh-1.0 and rnh-1.1 were able to complement the temperature-sensitive phenotype of MIC1066 (data not shown). For complementation of the ts-growth defect by these RNase H–like proteins, the polypeptide needs to exhibit RNase H activity. Thus, in agreement with the gel assay, RNase H1 and RNase H1A can express RNase H activity and require no C. elegans–specific modification to be active.

Evolutionary Relationships of RNase H1
The four RNases H1 of C. elegans share several of the conserved amino acids present in RNases H of this class yet differ in significant ways. For example, RNase H1 and RNase H1A (fig. 1A and B) have the RNase H domain attached to an N-terminal non–RNase H sequence, whereas RNase H1B and RNase H1C consist of only an RNase H domain. To assess their evolutionary relationship, we generated the maximum likelihood phylogram shown in figure 4. The phylogram was generated using only the RNase H region seen in figure 1A. The tree is shown rooted with E. coli RNase HI. Indeed, these results support the view, suggested above, that the C. elegans RNases H1B and H1C are more related to one another than either is to any other RNase H used in this analysis. Interestingly, C. elegans RNase H1A groups with the S. cerevisiae and S. pombe RNases H1, albeit quite weakly. Also, the phylogram reveals that C. elegans RNase H1 does appear to be an orthologue of the human RNase H1. One interpretation of this phylogram suggests that the ancestral state may have consisted of multiple RNases H and that fungi and animals have both independently lost different members of this group. A more probable interpretation, discussed later, is that the RNase H of C. elegans has duplicated several times since its separation from the common ancestor of fungi and mammals.



 
Fig. 4.—Phylogram of the RNase H1 domains of C. elegans and other eukaryotic RNases H1. The phylogram was generated by using the sequences and alignment indicated in figure 1A (Page 1996). Escherichia coli RNase HI was the root. Cer1-1 is the LTR Cer1-1 of C. elegans. HIV-1 is human immunodeficiency virus 1 RNase H. Sp is S. pombe, Sc is S. cerevisiae, Ce is C. elegans, and Hu is human.

 
Multiple mRNAs are Produced from the rnh-1.0 Gene
The mRNA encoding the C. elegans RNase H1 protein is derived by the transcription of a 1.25-kb dicistronic transcript (fig. 5A), processed to yield two mRNAs (fig. 5A; 0.5 and 0.8 kb). Translation of the first (0.5 kb) mRNA produces a protein having two copies of a sequence very closely related to the duplex RNA–binding sequence found at the N-terminus of eukaryotic RNases H1 (figs. 1B and 5). The 5'-proximal mRNA has the common SL1 sequence found at the 5'-end of most C. elegans transcripts (Krause and Hirsh 1987; Huang and Hirsh 1989), whereas the second, RNase H1–encoding, transcript possesses an SL2 sequence, indicative of an RNA of a dicistronic origin (Spieth et al. 1993). In addition to these two cDNAs, we obtained cDNAs with two alternative splice sites. One alternative splice event eliminates the site (fig. 5B exon 3–4) for processing the dicistronic mRNA, and a second (fig. 5B exon 5–6) is within the RNase H–coding region. The former would require internal initiation of translation to produce RNase H1, whereas the latter results in a frameshift, producing a truncated RNase H1. We found one example in which both alternative splice sites are present on a single transcript. The rnh-1.3 gene was the only other gene for which we observed alternatively spliced mRNAs (supplementary material).



 
Fig. 5.—Transcription and splicing of rnh-1.0. (A) Northern analysis probing for transcripts derived from the rnh-1.0 gene. The blot was probed with antisense transcripts from the R1-R2 region (blue-R1, red-R2, and lane A) and from the RNase H region (black and lane B). The dicistronic transcript (1.25 kb) can be seen in both lanes A and B. The two trans-spliced products are detected as 0.5 kb (lane A, R1-R2) and 0.8 kb (lane B, R3-H). Purple denotes R3. (B) Splice patterns of C. elegans rnh-1.0. Splice patterns for multiple splice products are shown with boxed areas noting exons connected by thin caret lines marking the intronic regions. (i) is the full-length transcript (1.25 kb) detected by PCR using primers BC673 and BC675 and is labeled as "Dicistronic cDNA." (ii) is the dicistronic precursor to the mature mRNAs shown in (iii) and (iv). The upstream R1-R2–encoding transcript (iii) results from cleavage and polyadenylation of (ii). The mRNA encoding the active RNase H1 protein is shown in (iv), having a purple duplex RNA–binding domain with black boxes designating the RNase H domain, and is noted as "Processed cDNA." The duplex RNA–binding regions are shown as blue (R1), red (R2), and purple (R3). "polyA" denotes the position of polyA addition to the first mRNA, and SL2 indicates the site to which the SL2 sequence has been trans-spliced. Numbers above each indicate the exon number starting with the first exon in each case. The lower part of the figure represents an enlargement of the region in (i) and (iv) sometimes having an extra 19 nt (green) at the position marked by *.

 
RNase H2 (rnh-2)
A BLAST search with yeast RNase H2 (or several other related protein sequences) as the query sequence identified a locus in C. elegans cosmid T13H5, with the reported product (T13H5.2) being related to the squid retinal-binding protein. The large T13H5.2 protein of 1,264 amino acids has an RNase H2L–like 298–amino acid central domain with a 455–amino acid N-terminal and a 561–amino acid C-terminal domain. We made a series of oligonucleotides both upstream and downstream of the cDNA sequence encoding the RNase H2L protein, all of which failed to produce PCR products when paired with oligonucleotides internal to the rnh-2 gene (data not shown). By using primers both to the N-terminal and C-terminal regions and by analyzing the 5'- and 3'-ends of transcripts (RACE reactions), we have cloned and characterized the cDNA for RNase H2L (accession number AF181619). We conclude that the RNase H2L protein is not part of a larger polypeptide derived from either the N-terminal or the C-terminal domains, as predicted in the database.


    Discussion
Multiple RNases H
Genomic DNA sequences can now serve as the starting point for examining the proteins of cells. Several surprises have already been uncovered when examining complete genome sequences for genes encoding RNase H–like proteins. In addition, proteins with RNase HI–related sequences do not always exhibit RNase H activity (Ohtani et al. 1999a). Because of the poor conservation of RNase HI sequences, the failure to find more than one RNase H in some organisms may be a reflection of the search engines used to find homologous protein sequences.

We have taken advantage of the wealth of information about the DNA sequences of C. elegans and humans to examine potential genes coding for RNases H. To determine the number and types of RNases H that might be present in animals on the basis of the genomic DNA sequence, we used several RNases H as query sequences in a BLAST search (tBLASTn). Use of multiple proteins helps to overcome the poor conservation of sequences of RNases H1, including the spacing between those regions that are most indicative of RNases H. Even when employing these searches, no retrotransposon element of C. elegans was detected. The alignment shown in figure 1A helps explain why we failed to detect C. elegans retrotransposons in our searches. Only 17 of 155 amino acids are identical in the alignment, and three important conserved regions are missing (the αC region is missing as well as the region between β5 and αE, including the important His124, E. coli numbering). Even though Cer1-1 contains a Glu48 residue (fig. 1A), the context in which it resides bears little resemblance to the other RNases H1. In contrast, numerous human endogenous retrovirus (HERV) elements are readily detected in the human genome by using almost any of the RNase H proteins as query sequences. This indicates that many HERV RNase H domains are more recently derived from cellular RNases H or vice versa, whereas the separation between the origin of retroelements in C. elegans and cellular enzymes is much greater. This conclusion supports that of Malik and Eickbush (2001).
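The figure of 17 identical residues out of 155 corresponds to roughly 11% identity, which helps explain why the retrotransposon sequence falls below what a BLAST search detects. The sketch below shows the arithmetic and one way to count identities in a pair of aligned sequences. How the 155 positions were counted in the original alignment is not stated, so the rule used here (columns in which neither sequence has a gap) and the function names are assumptions.

```python
from typing import Tuple

GAP_CHARS = "-."


def count_identities(seq_a: str, seq_b: str) -> Tuple[int, int]:
    """Return (identical, compared) for two aligned sequences.

    Assumption: a column is compared only when neither sequence has a gap.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    identical = 0
    compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in GAP_CHARS or b in GAP_CHARS:
            continue
        compared += 1
        if a == b:
            identical += 1
    return identical, compared


def percent_identity(identical: int, compared: int) -> float:
    """Percentage of compared positions that are identical."""
    return 100.0 * identical / compared


if __name__ == "__main__":
    # Values quoted in the text: 17 identical residues out of 155.
    print(f"{percent_identity(17, 155):.1f}% identity")
```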

Five different genes yielding proteins related to RNases H were found in the C. elegans genome. In organisms such as E. coli and S. cerevisiae, cells deleted for both RNase H–encoding genes are viable (Frank et al. 1999; Itaya et al. 1999; Arudchandran et al. 2000; A. Arudchandran and R. J. Crouch, unpublished data). We found that RNAi (Fire et al. 1998) inactivation of any or several of the RNase H mRNAs also produced no easily detectable phenotype (data not shown). Thus far, only in Drosophila melanogaster is there a serious defect related to an RNase H mutation (Filippov, Filippova, and Gill 2001).

Of the five RNase H–related genes, one is similar to RNase H2 or RNase HII. When expressed in E. coli or when assayed in a gel renaturation assay, no enzymatic activity is detected (fig. 3). We have expressed RNase H2 from S. cerevisiae, human, and mouse and uniformly find no enzymatic activity (Crouch and Cerritelli 1998). A similar observation has been reported for the human protein (Lima, Wu, and Crooke 2001), but others have observed very weak activity after refolding of the S. cerevisiae RNase H2 in E. coli (Qiu et al. 1999). The RNase H2 may be composed of two subunits (Frank et al. 1998) or may require modification to exhibit RNase H activity.

Two C. elegans proteins, similar to RNase HI of E. coli in amino acid sequence, exhibit RNase H activity when expressed in E. coli. Caenorhabditis elegans RNase H1 is similar to most RNases H1 of eukaryotes in having a duplex RNA–binding domain at its N-terminus and the RNase H domain at the C-terminus. RNase H1A is unique. Its N-terminal region is not found in any of the other RNase H sequences and contains a large number of Arg-Ser repeats (fig. 1B), typical of SR proteins involved in splicing (Graveley 2000). The Arg-Ser repeats are important for protein-protein interactions and may direct these proteins to the spliceosome (Yuryev et al. 1996). There seems to be no obvious direct role for RNase H in splicing, and we are unaware of any report indicating a requirement for RNase H in splicing. The importance of the amino-terminal region is unclear, particularly because it is not required for enzymatic activity (data not shown). The RNase H domain does differ from those of other active RNases H1. In particular, the αB-, αC-, and αD-helices of RNase H1A are more similar in size and content to those of HIV-1 RNase H than to those of E. coli RNase HI, suggesting that additional amino acids are important for the binding of the protein to nucleic acid substrates. It may be that the C-terminal extension seen in RNase H1A supplies the binding function through the many basic amino acid residues present there. It should be pointed out that several of the Arg residues are followed by Ser, similar to what is found near the N-terminus of the protein. We are currently examining RNase H1A for determinants of RNase H activity and inquiring into the role of the non–RNase H domain.
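
The Arg-Ser character described above can be quantified from a protein sequence by counting RS and SR dipeptides and locating uninterrupted RS tracts. The following is a minimal Python sketch offered only as an illustration: the function names, the minimum tract length of two repeats, and the toy sequence are all hypothetical and are not taken from the RNase H1A sequence or from the analysis in this paper.

```python
import re


def count_rs_dipeptides(protein: str) -> dict:
    """Count Arg-Ser (RS) and Ser-Arg (SR) dipeptides in a protein sequence.

    Neither dipeptide can overlap with itself, so str.count is sufficient.
    """
    seq = protein.upper()
    return {"RS": seq.count("RS"), "SR": seq.count("SR")}


def rs_tracts(protein: str, min_repeats: int = 2) -> list:
    """Return uninterrupted runs of at least min_repeats consecutive RS units.

    The threshold min_repeats is an illustrative assumption.
    """
    pattern = r"(?:RS){%d,}" % min_repeats
    return re.findall(pattern, protein.upper())


# Hypothetical toy sequence, not a real protein.
toy_protein = "MRSRSRSPSRKRS"
toy_counts = count_rs_dipeptides(toy_protein)  # {'RS': 4, 'SR': 3}
toy_tracts = rs_tracts(toy_protein)            # ['RSRSRS']
```

Applied to an N-terminal region and to a C-terminal extension separately, counts of this kind would give a simple way to compare the SR-like character of the two ends of a protein.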

The RNases H1B and H1C are inactive, as expressed from the cDNAs we have cloned. The genes encoding RNases H1B and H1C contain introns and, therefore, are probably not pseudogenes. RNase H1C does not have the C-terminal αE-helix whose presence is necessary for enzymatic activity (Haruki et al. 1994; Goedken, Raschke, and Marqusee 1997). If a splice were to occur near the end of the gene, an α-helix could possibly be attached. We have examined four independently derived cDNAs and have found no example of a transcript encoding the putative αE-helix. The defect in RNase H1B is most likely due to the unusual nature of the αB-αC-αD region. In RNase H1B, there are numerous Thr and Ser residues rather than the typical basic and Trp residues at conserved locations (fig. 1A). We have obtained three independent clones of rnh-1.2, all of which have the same sequence. The RNases H1 and H1C contain introns between the coding sequences for αA and β4 (fig. 1A; reverse letters indicate the splice site). RNase H1B has no equivalent splice site. Translation of the rnh-1.2 mRNA in the αB-αC-αD region in all three reading frames reveals an out-of-frame coding sequence that yields a very good αB-αC-αD region, thereby suggesting that the formation of an active RNase H1B is possible (data not shown). The RNases H1B and H1C may have a function(s) other than providing RNase H activity, but it is also possible that their expression in an RNase H–active form is limited to specialized situations.
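
Translating a nucleotide sequence in all three forward reading frames, as described above for the rnh-1.2 mRNA, can be sketched in a few lines of Python. This is an illustrative sketch only: it uses the standard genetic code, the function names are hypothetical, and the input is a short made-up sequence rather than any part of rnh-1.2. It is not the software used in this study.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(nucleotides: str, frame: int = 0) -> str:
    """Translate one forward reading frame (0, 1, or 2).

    Stop codons are shown as '*'; codons with unknown bases as 'X'.
    A trailing partial codon is ignored.
    """
    seq = nucleotides.upper().replace("U", "T")  # accept RNA or DNA input
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def three_frame_translation(nucleotides: str) -> dict:
    """Return the translation of each of the three forward frames."""
    return {frame: translate(nucleotides, frame) for frame in range(3)}


# Hypothetical toy sequence, used only to show the output format.
toy_frames = three_frame_translation("ATGGCCAAATGA")
# {0: 'MAK*', 1: 'WPN', 2: 'GQM'}
```

Comparing the three translations of a region against a known protein alignment is one simple way to spot an out-of-frame segment that would encode a better-conserved sequence than the annotated frame.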

The abundance and diversity of alternatively spliced mRNAs of C. elegans RNase H1 are striking (fig. 5B and supplementary material) and make it clear that synthesis of RNase H1 is regulated by splicing. The primary transcript is differentially spliced to produce two types of dicistronic messages, one of which is processed by the usual pathway to generate two mRNAs encoding two different proteins. Because the intercistronic region containing the poly(A) addition signal is deleted when splicing occurs in the alternatively spliced mRNA (fig. 5B, exon 3 to exon 4), the two coding regions remain on a single message and would require internal initiation of translation to produce RNase H1. The alternative splice joining exon 5 to exon 6 (fig. 5B) does not permit the synthesis of an active RNase H1.

Generality of Multiple Genes in C. elegans
In contrast to the C. elegans genome, we have been unable to find evidence for multiple RNase H1–like proteins in the human genome. This disparity in the numbers of proteins of a particular type between C. elegans and other genomes is not unique to RNases H (Combes et al. 2000; Keiper et al. 2000; Robertson 2000; Hodgkin 2001). Sternberg (2001) has suggested that each of the small number of cells comprising C. elegans may be more complex or may respond in more complex ways because of increased molecular diversity within each cell. One prime example is in olfactory neuronal cells, where one cell possesses multiple receptors and yet can sense different odors (Bargmann 1998). In simpler cell types, a single protein may perform many functions but would be limited in its role in any cell by the presence of one or only a few substrates. For example, RNase H1 in human cells may have multiple functions, but within a given cell type these functions may be limited by environmental factors such as substrates. In C. elegans, RNases H of several types may be present within a single cell type but may of necessity be limited to one function or one cell organelle. The apparent regulation of RNase H1 levels by splicing may indicate that this protein is able to recognize all the cellular substrates and, therefore, needs to be kept under tight control so that it does not subsume another enzyme's role. Alternatively, RNase H1A may have a requirement uniquely present in C. elegans for splicing or some splice-related event, as indicated by the N-terminal SR character (fig. 1B).


    Acknowledgements
We thank Vivian Tsai for initial characterization of the yk177f5 clone and Dr. Yuji Kohara for several cDNA clones.


    Footnotes
 
David Irwin, Reviewing Editor

Keywords: ribonuclease H, splicing, multiple genes, Caenorhabditis elegans, human genome, double-stranded RNA, RNA-DNA hybrids

Address for correspondence and reprints: Robert J. Crouch, Building 6B Room 2B-231, 6 Center Drive MSC 2790, National Institutes of Health, Bethesda, MD 20892. robert_crouch@nih.gov


    References

    Altschul S. F., W. Gish, W. Miller, E. W. Myers, D. J. Lipman, 1990 Basic local alignment search tool J. Mol. Biol 215:403-410

    Arudchandran A., S. M. Cerritelli, S. K. Narimatsu, M. Itaya, D.-Y. Shin, Y. Shimada, R. J. Crouch, 2000 The absence of ribonuclease H1 or H2 alters the sensitivity of Saccharomyces cerevisiae to hydroxyurea, caffeine and ethyl methane sulphonate: implications for roles of RNases H in DNA replication and repair Genes Cells 5:789-802

    Bargmann C. I., 1998 Neurobiology of the Caenorhabditis elegans genome Science 282:2028-2033

    Bowen N. J., J. F. McDonald, 1999 Genomic analysis of Caenorhabditis elegans reveals ancient families of retroviral-like elements Genome Res 9:924-935

    Britten R. J., 1995 Active gypsy/TY3 retrotransposons or retroviruses in Caenorhabditis elegans Proc. Natl. Acad. Sci. USA 92:599-601

    Cazenave C., V. Mizrahi, R. J. Crouch, 1998 Methods—rnh gene and RNase H activity analysis Pp. 251–265 in R. J. Crouch and J. J. Toulmé, eds. Ribonucleases H. INSERM, Paris

    Combes D., Y. Fedon, M. Grauso, J. P. Toutant, M. Arpagaus, 2000 Four genes encode acetylcholinesterases in the nematodes Caenorhabditis elegans and Caenorhabditis briggsae. cDNA sequences, genomic structures, mutations and in vivo expression J. Mol. Biol 300:727-742

    Crouch R. J., S. M. Cerritelli, 1998 RNases H of S. cerevisiae, S. pombe, C. fasciculata, and N. crassa Pp. 79–100 in R. J. Crouch and J. J. Toulmé, eds. Ribonucleases H. INSERM, Paris

    Crouch R. J., J. J. Toulmé, eds 1998 Ribonucleases H INSERM, Paris

    Felsenstein J., 2001 PHYLIP: phylogeny inference package. Version 3.6a Department of Genetics, University of Washington, Seattle, Wash

    Filippov V., M. Filippova, S. S. Gill, 2001 Drosophila RNase H1 is essential for development but not for proliferation Mol. Genet. Genomics 265:771-777

    Fire A., S. Q. Xu, M. K. Montgomery, S. A. Kostas, S. E. Driver, C. C. Mello, 1998 Potent and specific genetic interference by double-stranded RNA in Caenorhabditis elegans Nature 391:806-811

    Frank P., C. Braunshofer-Reiter, A. Karwan, R. Grimm, U. Wintersberger, 1999 Purification of Saccharomyces cerevisiae RNase H(70) and identification of the corresponding gene FEBS Lett 450:251-256

    Frank P., C. Braunshofer-Reiter, U. Wintersberger, 1998 Yeast RNase H(35) is the counterpart of the mammalian RNase HI, and is evolutionarily related to prokaryotic RNase HII FEBS Lett 421:23-26

    Frank P., C. Braunshofer-Reiter, U. Wintersberger, R. Grimm, W. Büsen, 1998 Cloning of the cDNA encoding the large subunit of human RNase HI, a homologue of the prokaryotic RNase HII Proc. Natl. Acad. Sci. USA 95:12872-12877

    Goedken E. R., T. M. Raschke, S. Marqusee, 1997 Importance of the C-terminal helix to the stability and enzymatic activity of Escherichia coli ribonuclease H Biochemistry 36:7256-7263

    Graveley B. R., 2000 Sorting out the complexity of SR protein functions RNA 6:1197-1211

    Han L. Y., W. P. Ma, R. J. Crouch, 1997 Ribonuclease H renaturation gel assay using a fluorescent-labeled substrate Biotechniques 23:920-926

    Haruki M., E. Noguchi, A. Akasako, M. Oobatake, M. Itaya, S. Kanaya, 1994 A novel strategy for stabilization of Escherichia coli ribonuclease HI involving screening for intragenic suppressors of carboxyl-terminal deletions J. Biol. Chem 269:26904-26911

    Henikoff S., J. G. Henikoff, 1992 Amino-acid substitution matrices from protein blocks Proc. Natl. Acad. Sci. USA 89:10915-10919

    Hodgkin J., 2001 What does a worm want with 20,000 genes? Genome Biol 2:2008.1-2008.4

    Huang X. Y., D. Hirsh, 1989 A 2nd trans-spliced RNA leader sequence in the nematode Caenorhabditis elegans Proc. Natl. Acad. Sci. USA 86:8640-8644

    Itaya M., A. Omori, S. Kanaya, R. J. Crouch, T. Tanaka, K. Kondo, 1999 Isolation of RNase H genes that are essential for growth of Bacillus subtilis 168 J. Bacteriol 181:2118-2123

    Jurka J., 2000 Repbase Update: a database and an electronic journal of repetitive elements Trends Genet 16:418-420

    Keiper B. D., B. J. Lamphear, A. M. Deshpande, M. Jankowska-Anyszka, E. J. Aamodt, T. Blumenthal, R. E. Rhoads, 2000 Functional characterization of five eIF4E isoforms in Caenorhabditis elegans J. Biol. Chem 275:10590-10596

    Keller W., R. Crouch, 1972 Degradation of DNA RNA hybrids by ribonuclease H and DNA polymerases of cellular and viral origin Proc. Natl. Acad. Sci. USA 69:3360-3364

    Krause M. W., 1995 Techniques for analyzing transcription and translation Pp. 513–529 in H. F. Epstein and D. C. Shakes, eds. Caenorhabditis elegans: modern biological analysis of an organism. Academic Press, San Diego

    Krause M., D. Hirsh, 1987 A trans-spliced leader sequence on actin messenger-RNA in C. elegans Cell 49:753-761

    Lander E. S., L. M. Linton, B. Birren, et al. (240 co-authors) 2001 Initial sequencing and analysis of the human genome Nature 409:860-921

    Lima W. F., H. J. Wu, S. T. Crooke, 2001 Human RNases H Methods Enzymol 341:430-440

    Malik H. S., T. H. Eickbush, 2001 Phylogenetic analysis of ribonuclease H domains suggests a late, chimeric origin of LTR retrotransposable elements and retroviruses Genome Res 11:1187-1197

    Ohtani N., M. Haruki, M. Morikawa, R. J. Crouch, M. Itaya, S. Kanaya, 1999a Identification of the genes encoding Mn2+-dependent RNase HII and Mg2+-dependent RNase HIII from Bacillus subtilis: classification of RNases H into three families Biochemistry 38:605-618

    Ohtani N., M. Haruki, M. Morikawa, S. Kanaya, 1999b Molecular diversities of RNases H J. Biosci. Bioeng 88:12-19

    Page R. D., 1996 TreeView: an application to display phylogenetic trees on personal computers Comput. Appl. Biosci 12:357-358

    Qiu J. Z., Y. Qian, P. Frank, U. Wintersberger, B. H. Shen, 1999 Saccharomyces cerevisiae RNase H(35) functions in RNA primer removal during lagging-strand DNA synthesis, most efficiently in cooperation with Rad27 nuclease Mol. Cell. Biol 19:8361-8371

    Robertson H. M., 2000 The large srh family of chemoreceptor genes in Caenorhabditis nematodes reveals processes of genome evolution involving large duplications and deletions and intron gains and losses Genome Res 10:192-203

    Spieth J., G. Brooke, S. Kuersten, K. Lea, T. Blumenthal, 1993 Operons in C. elegans: polycistronic messenger-RNA precursors are processed by transplicing of SL2 to downstream coding regions Cell 73:521-532

    Sternberg P. W., 2001 Working in the post-genomic C. elegans world Cell 105:173-176

    The C. elegans Sequencing Consortium, 1998 Genome sequence of the nematode C. elegans: a platform for investigating biology Science 282:2012-2018

    Venter J. C., M. D. Adams, E. W. Myers, et al. (252 co-authors) 2001 The sequence of the human genome Science 291:1304-1351

    Yuryev A., M. Patturajan, Y. Litingtung, R. V. Joshi, C. Gentile, M. Gebara, J. L. Corden, 1996 The C-terminal domain of the largest subunit of RNA polymerase II interacts with a novel set of serine/arginine-rich proteins Proc. Natl. Acad. Sci. USA 93:6975-6980

Accepted for publication July 15, 2002.