(Received for publication, September 18, 1995; and in revised form, November 2, 1995)
A transcription termination factor (Rho) was purified from the
Gram-positive bacterium Micrococcus luteus, and the complete
gene sequence was determined. The M. luteus Rho polypeptide
has 690 residues, which is 271 residues more than its homolog from Escherichia coli. Most of the additional residues compose a
highly charged, hydrophilic segment that is inserted in a nonconserved
region between two conserved regions of the RNA-binding domain of the
known Rho homolog proteins. This segment extends from residues 49 to
311 and includes a stretch of 238 residues that contain no hydrophobic
side chains. Biochemical studies indicate that the M. luteus protein is very similar to E. coli Rho in terms of its
RNA-dependent NTPase activity and its sensitivity to the Rho-specific
inhibitor bicyclomycin. However, the M. luteus protein has a
less stringent RNA cofactor specificity. It also acts to terminate RNA
transcription with E. coli RNA polymerase on the cro DNA template, but at much earlier termination stop points than
those recognized by E. coli Rho. Thus, the M. luteus protein functions as a true Rho factor, but with a different
specificity than that of E. coli Rho. We propose that this
altered specificity is consistent with its need to function on
transcripts that have a high content of G + C residues.
The orderly expression of the genetic information in DNA segments into RNA molecules depends on the function of transcription terminators. In Escherichia coli, one mechanism of transcription termination is mediated in part by an essential protein factor called Rho (1). Rho factor from E. coli has been studied since its discovery nearly 25 years ago (2). The Rho monomer is a 47-kDa protein. However, Rho factor functions as a homohexamer (3) that can bind to a nascent transcript and mediate its release by actions on the transcription complex that are coupled to the hydrolysis of NTPs (1).
A recent phylogenetic study by Opperman and Richardson (4) comparing rho genes isolated from organisms from several of the major branches
of bacteria suggests that Rho is ubiquitous throughout the bacterial
domain. An unexpected result was discovered during the analysis of the rho gene from Micrococcus luteus, a Gram-positive
soil bacterium that has an unusually high G + C DNA content
(74%) (5). The M. luteus rho homolog was found to have
an open reading frame encoding a protein that was homologous to E.
coli Rho through a very long portion. However, the homology did
not extend all the way through the RNA-binding domain toward the amino
terminus of the protein. Because the region of homology starts in a
segment that has an in-frame GTG codon preceded by a sequence that is a
good match to a Shine-Dalgarno sequence, Opperman and Richardson
proposed that translation began at that GTG codon to yield a 41,733-Da
protein of 382 amino acids that is 52% identical (71% similar) to E. coli Rho. If this proposal were correct, the M. luteus Rho protein would be unusual in comparison with the other Rho
homologs as it would lack a conserved part of its RNA-binding domain.
Additionally, the protein would be 30 amino acids smaller than any
of the other predicted Rho factors that have been sequenced.
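The percent identity quoted above can be illustrated with a minimal sketch. It assumes the two protein sequences have already been aligned, with gaps written as "-", and it counts identity only over columns in which both sequences have a residue. The function name, the gap symbol, and the choice of denominator are assumptions made for illustration; they are not the procedure used by Opperman and Richardson (4). The "similar" percentage would additionally require a definition of conservative substitution groups, which is not given here.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identical residues over ungapped columns of a pairwise alignment.

    Both inputs must be the same length (they are rows of one alignment).
    Columns where either row has a gap are skipped (an assumed convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0   # columns where both rows carry a residue
    identical = 0  # columns where those residues match
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            identical += 1

    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Toy alignment (hypothetical sequences, not Rho): 4 of 5 ungapped columns match.
toy_identity = percent_identity("MKT-LE", "MRTALE")  # 80.0
```

Different programs use different denominators (for example, the full alignment length or the length of the shorter sequence), so a value computed this way is comparable only to values computed under the same convention.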
The data of Opperman and Richardson (4) were also consistent with
an alternative hypothesis, namely that the M. luteus Rho
polypeptide is much larger than the homologs from other organisms and
includes a large region with a very unusual amino acid sequence. The
DNA sequence determined in that work indicated that the open reading
frame extended upstream for at least 160 amino acid residues. However,
because the G + C content of that upstream region was 78%,
which is a value that is typical of intergenic spacer regions in M. luteus DNA (5), and because these upstream codons had a
very unusual bias favoring Arg, Asp, Gln, and Gly residues and lacking
hydrophobic residues, Opperman and Richardson argued that it was
unlikely to be part of the coding region for the M. luteus Rho
protein.
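As an illustration of the G + C measure used in this argument, the following minimal sketch computes the G + C fraction of a DNA string and of successive windows along it, which is one way to compare an upstream region with the rest of a gene. The function names and the windowing scheme are assumptions for illustration only; they are not the method used in (4) or (5).

```python
from typing import Iterator, Tuple


def gc_content(seq: str) -> float:
    """Return the fraction (0.0 to 1.0) of G and C among the A/C/G/T bases."""
    bases = [b for b in seq.upper() if b in "ACGT"]  # ignore N, gaps, whitespace
    if not bases:
        raise ValueError("sequence contains no A, C, G, or T bases")
    gc = sum(1 for b in bases if b in "GC")
    return gc / len(bases)


def windowed_gc(seq: str, window: int, step: int = 1) -> Iterator[Tuple[int, float]]:
    """Yield (0-based start, G + C fraction) for each full window along seq."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    for start in range(0, len(seq) - window + 1, step):
        yield start, gc_content(seq[start:start + window])


# Toy input (hypothetical, not M. luteus sequence): 6 of 8 bases are G or C.
toy_fraction = gc_content("GCGCGCAT")  # 0.75
```

Multiplying the returned fraction by 100 gives a percentage that can be compared with figures such as the 74% and 78% values quoted in the text.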
To resolve this issue, we purified Rho protein from M. luteus. Our studies show conclusively that the latter hypothesis is correct and demonstrate directly that an organism that is phylogenetically distinct from E. coli also has a factor that can cause the termination of RNA transcription.
To identify a 203-kbp fragment containing the sequence encoding the N-terminal region of the M. luteus rho gene, 10 µg of genomic DNA were digested with various combinations of restriction endonucleases, and the products were separated by agarose gel electrophoresis. Fragments containing the desired rho gene segment were identified by Southern hybridization (11) with a 0.28-kbp HindIII/SmaI fragment from the plasmid pMLRHOSK (12), which had been radiolabeled to a high specific activity with 32P (13).
It was determined that a BamHI/PstI double digest of M. luteus genomic DNA produced a 2-kbp fragment that contained the desired portion of the M. luteus rho gene. To clone this DNA fragment, 50 µg of M. luteus DNA were digested with BamHI and PstI, and the fragments were size-selected on a 1% agarose gel and ligated into pBluescript II SK (Stratagene). Colonies of E. coli DH5α F′ transformed with these ligated plasmids were screened by nucleic acid hybridization (14) with the same HindIII/SmaI fragment used for Southern analysis.
Analysis of the final fraction by electrophoresis on a 10% SDS-polyacrylamide gel (8) revealed that it consisted of a single polypeptide in >95% purity with an apparent Mr of 95,000 (Fig. 1), which is approximately twice as large as the E. coli Rho polypeptide and also significantly larger than the M. luteus rho gene product (Mr = 41,733) proposed by Opperman and Richardson (4).
Figure 1: Gel electrophoretic analysis of the RNA-dependent ATPase from M. luteus. The protein samples were separated by electrophoresis on a 10% polyacrylamide gel with the Laemmli buffer system (8). Lane M, marker proteins (in kilodaltons); lane Ec, 2 µg of E. coli Rho protein; lane Ml, 4 µg of the RNA-dependent ATPase purified from M. luteus.
A sample of this highly purified M. luteus ATPase, analyzed by the Microsequencing Facility at Harvard University, yielded an N-terminal amino acid sequence of TESTE, which is different from the sequence of MAGIL at the N terminus of the proposed M. luteus rho gene product (4). This sequence also did not match any other pentapeptide sequence in the segment of the M. luteus rho gene that had been sequenced prior to this work. However, a 42-kDa fragment generated from the 95-kDa protein by partial digestion with trypsin had the N-terminal sequence GRPGPEVDE, which did match a sequence located upstream of the previously proposed rho translational start site (12). Together, these results concerning the apparent size and N-terminal sequence of the M. luteus RNA-dependent ATPase suggested that the M. luteus rho gene sequence reported by Opperman and Richardson (4) was not complete.
Figure 2: Nucleotide and predicted amino acid sequences of the M. luteus rho gene. The sequences were determined by sequencing both DNA strands. The Shine-Dalgarno sequence is indicated (underlined).
To test whether the lower activity of M. luteus Rho with CTP was related to the RNA cofactor used, the rate of CTP hydrolysis was measured with poly(U) and was found to be 30% of that with ATP (data not shown). Thus, the lower activity of M. luteus Rho with CTP was not a consequence of using poly(C) as an activator.
Figure 3: Bicyclomycin inhibits M. luteus Rho ATPase activity. ATP hydrolysis at 37 °C was measured in standard Rho ATPase reaction mixtures containing 58 nM Rho (M. luteus or E. coli), 10 µg/ml poly(C), and bicyclomycin as indicated. Reactions were initiated by the addition of ATP (final concentration of 1 mM) to prewarmed solutions, and Pi release was detected colorimetrically. 100% activity is 11.5 units (M. luteus) and 11.5 units (E. coli).
Figure 4: M. luteus Rho terminates transcription. Ternary transcription complexes stalled on a cro template were prepared and elongated as described under "Experimental Procedures." Lanes 1-5 show the distribution of RNA transcripts after elongation for 2 s, 5 s, 8 s, 3 min, and 6 min, respectively. Samples in lanes 6-13 were incubated for 3 min with the following additions: lane 6, 28 nM E. coli Rho; lane 7, 28 nM E. coli Rho and 25 nM NusG; lane 8, none, then for an additional 3 min with 28 nM E. coli Rho; lane 9, 28 nM M. luteus (Mlu) Rho; lane 10, 28 nM M. luteus Rho and 25 nM NusG; lane 11, none, then for an additional 3 min with 28 nM M. luteus Rho; lane 12, 28 nM M. luteus Rho and 200 µM bicyclomycin; lane 13, none, then for an additional 3 min with 28 nM M. luteus Rho and 200 µM bicyclomycin. The nucleotide lengths of the RNAs indicated at the right were determined by transcribing the cro template using RNA chain-terminating analogs (data not shown). RT, readthrough.
To show that the smaller transcripts were the result of M. luteus Rho action as a transcription termination factor rather than as a ribonuclease, transcripts synthesized in the absence of Rho factor were subsequently incubated with M. luteus Rho (Fig. 4, lane 11). Although a small amount of a 145-nucleotide RNA appeared, no other RNA molecules appeared that had the same sizes as the products made when M. luteus Rho was present cotranscriptionally, which rules out the possibility that those products were generated by a ribonuclease activity. Because very few of the transcripts were extended to the size of the readthrough RNA when M. luteus Rho was present during transcription, the overall efficiency of termination within the transcribed fragment was nearly 100%.
Two lines of evidence suggest that the 145-nucleotide RNA arose as a result of a contamination of M. luteus Rho with a ribonuclease. First, the extent of appearance of the 145-nucleotide RNA was higher with other, less pure preparations of the M. luteus factor (data not shown). Second, it also appeared when the function of M. luteus Rho was inhibited by bicyclomycin (Fig. 4, lanes 12 and 13).
A comparison of the distribution of transcripts in reaction mixtures lacking Rho that had been quenched at 2, 5, and 8 s after initiation (Fig. 4, lanes 1-3) with the distribution of the terminated transcripts (lanes 9 and 10) indicated that, as with E. coli Rho, the preferred positions for termination stop points were at the positions where RNA polymerase naturally pauses. However, with M. luteus Rho, the termination occurred at pause sites that were farther upstream than the pause sites that were used as the termination points by E. coli Rho.
We have isolated a transcription termination factor from M. luteus that is phylogenetically related to transcription termination factor Rho from E. coli. Although rho homologs have been identified from several different phylogenetic branches of bacteria (4, 25, 26, 27, 28), this is the first demonstration that an organism that is distantly related to E. coli actually expresses its rho homolog gene. Although M. luteus Rho is similar to E. coli Rho in having a broad NTP substrate specificity, in its turnover number with poly(C) as a cofactor, and in its sensitivity to inhibition with bicyclomycin, it differs in having a less stringent RNA cofactor specificity and in its specificity of termination during transcription of a coliphage gene with E. coli RNA polymerase. We have also found that M. luteus Rho differs from E. coli Rho in containing an extended insertion of very unusual sequence and likely structure within its RNA-binding domain.
M. luteus belongs to the phylogenetic branch called the high G + C Gram-positive
group. The G + C content of its DNA is 74% (5). In
contrast, the G + C content of E. coli DNA is only 50%.
In its function, the Rho factor of E. coli acts by binding to
the nascent transcript at regions of the RNA called rut (rho utilization site). Although rut sequences lack a consensus (1), they do have certain
specific, defining characteristics: they have little base-paired
secondary structure (1) and usually have a compositional bias
that is high in C residues and low in G residues (29). Because
of their high G + C content, the RNA molecules in M. luteus are likely to have more extensive base pairing than the RNA
molecules in E. coli. Thus, M. luteus Rho has likely
been adapted to use a rut site that has more extensive base
pairing than is typical for a rut site in E. coli.
Evidence in support of this hypothesis is our finding that M. luteus Rho caused termination of transcription at a site located
well before the rut site used by E. coli Rho on the
cro gene template. The RNA encoded by the upstream region of cro forms extended base-paired secondary
structures (30), thus making it unavailable as a rut site for E. coli Rho. This interpretation is supported by
the finding that E. coli Rho will cause termination at
upstream sites when transcription is performed with ITP in place of GTP
because the resulting inosine-substituted RNA has less stable
base-paired secondary structure than the normal cro transcript (31). M. luteus Rho, in contrast, was
able to use these segments in the first 100 nucleotides of a normal,
guanosine-containing cro transcript as its rut site
to cause termination.
An exceptionally unusual feature of M. luteus Rho is the amino acid composition of the insert in the
RNA-binding domain between the two phylogenetically conserved sequence
segments that are found in the RNA-binding domain of all the known Rho
sequences. In M. luteus Rho, this insert is between Ile and Gly (Fig. 5). In Rho factors from most
organisms, these two phylogenetically conserved landmark residues are
usually separated by 14 amino acids with very little phylogenetic
conservation. With its insert, M. luteus Rho has 263 residues
instead of 14 in this putative loop region. The first part of the
insertion sequence is rich in Ala residues, while the C-terminal part
is rich in Arg, Asp, Gly, and Asn residues. Also, in a stretch of 238
residues, there are no amino acids with a hydrophobic side chain
(excluding Pro and Ala residues). Since patterns of polar and nonpolar
residues are important in the formation of ordered β-stranded and α-helical secondary structures (32) and since hydrophobic
residues have a major role in the formation of ordered tertiary
structures for globular domains (33), we predict that this very
hydrophilic segment of the protein will be randomly coiled, lacking a
defined secondary structure. Indeed, when the insert sequence was
analyzed for secondary structure (PHDsec Secondary Structure Prediction
Program, EMBL, Heidelberg, Germany) (34, 35, 36), 80% was predicted
to exist as a loop. However, this segment has approximately an equal
number of positively and negatively charged residues and might form an
unprecedented, ordered structure consisting of many salt bridges.
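The two sequence properties described above, a long stretch free of hydrophobic side chains and a near balance of positive and negative charges, can be checked on any protein sequence with a short scan. The sketch below is illustrative only. The hydrophobic set (Val, Ile, Leu, Met, Phe, Trp) is an assumption; the text states only that Pro and Ala are not counted as hydrophobic, and whether Tyr or Cys should be included is a judgment call, so the set is a parameter. Treating His as uncharged is likewise an assumption.

```python
from typing import FrozenSet, Tuple

# Assumed hydrophobic residues (one-letter codes). Pro and Ala are excluded,
# as in the text; Tyr and Cys are left out here by assumption.
HYDROPHOBIC: FrozenSet[str] = frozenset("VILMFW")


def longest_run_without(seq: str, excluded: FrozenSet[str] = HYDROPHOBIC) -> Tuple[int, int]:
    """Return (0-based start, length) of the longest run with no excluded residue."""
    best_start, best_len = 0, 0
    run_start = 0  # index where the current excluded-free run begins
    for i, aa in enumerate(seq.upper()):
        if aa in excluded:
            run_start = i + 1  # run is broken; next run starts after this residue
        else:
            run_len = i - run_start + 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
    return best_start, best_len


def charge_counts(seq: str) -> Tuple[int, int]:
    """Return (positive, negative) counts: Lys + Arg versus Asp + Glu."""
    s = seq.upper()
    positive = s.count("K") + s.count("R")
    negative = s.count("D") + s.count("E")
    return positive, negative


# Toy peptide (hypothetical, not the Rho insert): residues 2-11 (0-based)
# form the longest stretch without a hydrophobic side chain.
toy_run = longest_run_without("MLRDGRDEAPGNLK")  # (2, 10)
toy_charges = charge_counts("RDGRDEK")           # (3, 3)
```

Applied to a full-length sequence, the first function reports where the longest hydrophobic-free stretch lies and how long it is, and the second gives a quick check of whether basic and acidic residues are present in roughly equal numbers.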
Figure 5: Schematic representation of the M. luteus and E. coli Rho polypeptides. The M. luteus and E. coli Rho polypeptides have been drawn to scale. The relative positions of amino acid insertions are compared with E. coli Rho and are indicated by diverging lines. The black area in M. luteus Rho represents the 263-amino acid segment between Ile and Gly, and the gray area the 10-amino acid segment between Lys and Gln. The E. coli RNA-binding and ATP-binding domains are indicated by arrows below. The amino acids are numbered from the open reading frame.
The sequences of two other rho genes from this same group of organisms have recently become available: the genes from Streptomyces lividans and Mycobacterium leprae (GenBank accession number U15186). The open reading frames of these genes predict Rho proteins with 706 and 610 residues, respectively. With both, the major part of the additional residues over the 420 that are typical of Rho homologs from other phylogenetic groups starts after Ile in S. lividans Rho and after Ile in M. leprae Rho and ends before Gly and Gly, respectively. Thus, S. lividans Rho has 263 and M. leprae Rho has 162
residues between these landmark residues. Like M. luteus Rho, S. lividans Rho has a major part that is very rich in Arg,
Asp, Gly, and Glu residues, but is different in having many Gln
residues instead of many Asn residues. The M. leprae sequence
is also rich in polar residues. Like the M. luteus Rho insert,
these sequences are very deficient in hydrophobic residues. These
observations suggest that the presence of a polar, random-coiled
structure insert is a conserved feature of the Rho proteins in these
organisms that have a very high G + C content. However, in spite
of the similar features, the three known Rho RNA-binding domain
insertion sequences did not reveal any obvious phylogenetic
relatedness. It will be of great interest to learn how the presence of
a structurally unordered subdomain can help these Rho factors contend
with their nascent transcripts to cause termination.
M. luteus Rho also contains another smaller insertion sequence that runs from Lys to Gln (Fig. 5; see (5)). It is between two phylogenetically conserved residues in the RNA-binding domain corresponding to Glu and Arg in E. coli Rho. The S. lividans and M. leprae Rho homologs have insertions of three and six amino
acids in that position, respectively. Like the large upstream insertion
sequence, these lack amino acids with hydrophobic side chains.
The nucleotide sequence(s) reported in this paper has been submitted to the GenBank/EMBL Data Bank with accession number(s) L27277.