Escherichia coli has two known modes for termination of RNA transcription (1, 2, 3, 4). One is intrinsic to the function of RNA polymerase, which can spontaneously terminate transcription in response to certain, limited sequences. The other mode is dependent upon the action of an essential protein factor called Rho and occurs at sequences that are specific for its function but that are less constrained than the sequences for intrinsic termination.
Rho protein functions as a hexamer of a single polypeptide chain with 419 residues, which is the product of the rho gene (5). It is an RNA-binding protein with the capacity to hydrolyze ATP and other nucleoside triphosphates. Rho causes termination by first binding to a site on the nascent transcript and then using its ATP hydrolysis activity as a source of energy to mediate dissociation of the transcript from RNA polymerase and the DNA template (6). In the cell, the ability of Rho to act at several terminators depends on the presence of an essential 21-kDa protein called NusG (7), which binds both RNA polymerase and Rho itself (8). In vitro, the dependence on NusG became apparent only at proximal terminators (at sites <300 base pairs from the promoter) and under conditions in which the RNA molecules are being elongated at the in vivo rate of 40-50 nucleotides/s (9). The requirement for NusG when RNA chain growth is fast suggests that NusG acts to overcome a kinetic limitation on the ability of Rho to act alone, perhaps by mediating earlier access to the nascent RNA through the formation of a complex of Rho with RNA polymerase (10).
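The kinetic argument can be made concrete with a little arithmetic. The sketch below estimates how long RNA polymerase takes to reach a proximal terminator at the quoted in vivo elongation rate. It assumes a constant elongation rate with no pausing and treats base pairs from the promoter as nucleotides transcribed; the function name `transit_time_s` is hypothetical and used only for illustration.

```python
def transit_time_s(distance_nt: float, rate_nt_per_s: float) -> float:
    """Seconds needed to transcribe `distance_nt` nucleotides at a constant rate."""
    if rate_nt_per_s <= 0:
        raise ValueError("elongation rate must be positive")
    return distance_nt / rate_nt_per_s


# Values quoted in the text: terminators < 300 bp from the promoter,
# elongation at 40-50 nucleotides/s. Constant rate is an assumption.
PROXIMAL_LIMIT_NT = 300
for rate in (40, 50):
    window = transit_time_s(PROXIMAL_LIMIT_NT, rate)
    print(f"{rate} nt/s: a site {PROXIMAL_LIMIT_NT} nt downstream is reached in {window:.1f} s")
```

Under these assumptions Rho has at most about 6 to 7.5 s to bind and act at a proximal terminator, which illustrates why a factor that speeds access to the nascent RNA could matter most at such sites.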
The mechanism of how Rho acts to dissociate the transcription complex is unknown. One important approach to elucidating how interactions with RNA mediate termination of transcription is to determine the structure of the protein. Until a good crystal structure becomes available the properties of its structure will have to be inferred from other, less direct methods, such as biochemical characterization of the protein, phylogenetic comparative analyses, and the functional properties of mutants with known amino acid changes. This review summarizes our current understanding of the structure and function of transcription termination factor Rho based on these indirect approaches.
The Rho Polypeptide Has Two, Distinct Functional Domains
The E. coli rho gene was isolated and sequenced in 1983 by Pinkham and Platt (11). Subsequent comparison of its predicted amino acid sequence with those of other proteins indicated that it has several sequence motifs that are characteristic of ATPases (12) plus a sequence motif that is similar to the RNP1 sequence of eukaryotic RNA-binding proteins (13). The relative positions of these ATPase and RNA-binding sequence motifs are indicated in Fig. 1.
Figure 1: Diagram of sequence features in the Rho polypeptide. The positions of the RNP1, RNP2, ATPase A (P loop), and ATPase B sequence elements and the trypsin cleavage sites are indicated. Sections with greater than average phylogenetic conservation are shown by rectangular blocks. The numbers indicate amino acid residue positions. The approximate extent of conservation is indicated by the degree of shading in the block: the darker the shading, the higher the conservation.
The location of possible loops in Rho was probed by determining where it is most readily cleaved by trypsin. The first cleavages of Rho factor by trypsin occur with about equal probability at two different sites (14), one at Lys (15) and the other at either Arg or Lys (14) (the exact position in this region has not been determined). By using a UV cross-linking assay it was found that oligo(C) binds to the N-terminal fragment produced by the cleavage near residue 130 (14) and that trp t′ RNA reacts with a site that is between Lys and Lys (16). A polypeptide consisting of just the first 130 residues of Rho is also able to bind oligo(C) well, as could a fragment ending at residue 116, whereas a fragment ending at residue 114 could not (17). Thus, the first 116 residues define the minimum RNA-binding domain. This segment also includes the sequence motif that resembles the RNP1 sequence of eukaryotic RNA-binding proteins (Fig. 1).
Treatment of ATP-Rho mixtures with ultraviolet light covalently linked ATP to a site in the middle third of the polypeptide (14), within a region containing sequences that are very similar to two highly conserved sequence motifs in many ATPases, ATPase A (also known as the P loop sequence) and ATPase B (Fig. 1). Lys, a residue within the ATPase A motif of Rho, is the site of reaction with the ATP analog pyridoxal 5′-diphospho-5′-adenosine (18) and of photoreaction with 8-azido-ATP (19), thus implicating that residue in close contacts with the -phosphate and part of the adenosine ring. Altogether, sequences of Rho that are similar to those of other ATPases start at Arg and extend as far as Glu (20). Thus, of the two sites with about equal access to trypsin, the one at Lys is within sequences that are part of an extended ATPase domain whereas the one near residue 130 is clearly between two functional domains.
These results indicate that the Rho polypeptide consists of an N-terminal RNA-binding domain that is a distinct structural and functional entity. It is connected by a linker to an ATP-binding domain that comprises most of the remainder of the polypeptide.
Phylogenetic Comparison of Bacterial rho Gene Homologs
Genes with sequences related to E. coli rho have been isolated from a number of different organisms including several from deeply diverged lineages of the bacteria (20). The widespread occurrence of a rho-like gene suggests that Rho is ubiquitous in the bacteria, while its presence in the most deeply diverged organisms suggests that a rho-like gene was present in the ancestor that was common to the three domains of life, the bacteria, the archaea, and the eukarya. It is not yet known whether the archaea or the eukarya have rho-like genes.
The comparative analysis revealed that the Rho homologs are highly conserved, exhibiting a minimum identity of 50% of their amino acid residues in pairwise comparisons (20). Fig. 1 shows the distribution of the most highly conserved segments throughout the Rho polypeptides. The ATP-binding domain has a particularly high degree of conservation, consisting of some blocks of residues with sequences that are very similar to segments of the α and β subunits of F1-ATPase and of other blocks with sequences that are unique to Rho. The proteins in the data base with the closest similarities to the Rho homologs were almost invariably β subunits of the F1-ATPase (AtpB) or subunits of vacuolar, proton-translocating ATPases. Even though Rho is functionally an RNA-DNA helicase (21), it does not show significant similarity to the RNA helicases that are members of a large class of proteins that have the DEAD or DEAH sequence motifs. Thus, as a helicase it appears to be unique. Within the ATP-binding domain of Rhos, the minimum identity with E. coli AtpB is 21%, and the minimum similarity is 43% in pairwise comparisons (20). The high degree of similarity among these proteins suggests that they may be derived from a common ancestor.
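To illustrate what a "minimum identity in pairwise comparisons" means operationally, the following minimal sketch computes percent identity for every pair in a set of pre-aligned sequences and reports the smallest value. The denominator convention (columns where neither sequence has a gap) is an assumption, since the review does not state how identity was calculated, and the toy sequences are invented placeholders rather than real Rho sequences.

```python
from itertools import combinations
from typing import Mapping


def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # skip gapped columns (assumed convention)
        compared += 1
        identical += x == y
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


def minimum_pairwise_identity(aligned: Mapping[str, str]) -> float:
    """Smallest percent identity found among all pairs of aligned sequences."""
    return min(
        percent_identity(aligned[p], aligned[q])
        for p, q in combinations(aligned, 2)
    )


# Invented toy alignment, for illustration only.
toy = {"seq1": "ACDEFGHIK-", "seq2": "ACDQFGHLKM", "seq3": "ACDEFGHLKM"}
print(f"minimum pairwise identity: {minimum_pairwise_identity(toy):.1f}%")
```

Other conventions (for example, dividing by the length of the shorter sequence) give somewhat different numbers, so published identity values are comparable only when the same definition is used.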
Because of this close phylogenetic relationship, the structure of the F1-ATPase that was deduced from x-ray crystallography (22) provides an excellent prototype for the ATP-binding domain of Rho. Based on the alignments of the similar and identical sequences, Miwa et al. (23) have modeled the tertiary structure of the ATP-binding domain of Rho. A diagram of that model is shown in Fig. 2 with some landmarks indicated. This model includes some secondary structural elements that extend into the RNA-binding domain because these also show some limited sequence similarity with the F1-ATPase subunits (23).
Figure 2: A model for the tertiary structure of a Rho subunit based on the structures of the F1-ATPase α and β subunits. This is a modified version of the tertiary structure model proposed by Miwa et al. (23) (with permission). This is a view from one side of a subunit with the central cavity of the hexamer on the right. The N and C termini are indicated with those letters. The α helices are represented by cylinders and the β-strands by flat arrows. The shaded circles and polygons represent an ATP molecule bound at its site. The α helices in the ATP-binding domain are identified by letters. Helix A starts with Ala, and helix I ends at Thr. The β-strands -1, 1, 2, and 2.5 show some homology to the ATPase α and β subunits (23). β-Strands -1 and 1 are within the RNA-binding domain, as defined by the deletion analysis of Modrak and Richardson (17). The two primary trypsin cleavage sites are shown. P loop identifies the loop with the ATPase A sequence. Loops Q and R are the next two loops that face the same side of the tertiary structure, and their residue numbers are indicated. In the F1-ATPase subunits the β-strands identified on the model as 1 and 2 form part of an extended sheet with strands 5, 4, 6, 7, 3, 8, and 9. The predictions of the three C-terminal α helices labeled 1c, 2c, and 3c are tentative because they are based on very limited sequence similarity (23). The N-terminal segment, which shows no sequence similarity with the F1-ATPase subunits, is depicted as an oval at the top of the picture.
The blocks of sequence in E. coli Rho that are most similar to the other ATPases are those that extend from residues 167 to 192 and from residues 251 to 275 (20). The former includes the ATPase A sequence motif, APPKAGKT in E. coli Rho, and is closely similar to the sequences that form the P loop in phosphohydrolases of known structure. The second block includes the ATPase B sequence motif, VIILLD in E. coli Rho, of which the final Asp plays an important role in catalyzing ATP hydrolysis (24).
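As a small check on the motif assignment, the sketch below tests the quoted ATPase A sequence against the widely used Walker A (P loop) consensus pattern [AG]-x(4)-G-K-[ST], and infers the residue span of the motif from the two conserved lysines that the mutational section of this review places at residues 181 and 184. The consensus pattern is general ATPase knowledge rather than something stated in the review, and the helper name `motif_span` is hypothetical.

```python
import re

# Commonly used Walker A (P loop) consensus: [AG]-x(4)-G-K-[ST].
# Assumption: this pattern comes from general ATPase literature, not the review.
WALKER_A = re.compile(r"[AG].{4}GK[ST]")

ATPASE_A = "APPKAGKT"        # motif quoted in the text for E. coli Rho
CONSERVED_LYS = (181, 184)   # lysine residue numbers quoted in the text


def motif_span(motif: str, lys_positions: tuple) -> tuple:
    """Infer (first, last) residue numbers of `motif` from the numbers of its lysines."""
    lys_offsets = [i for i, aa in enumerate(motif) if aa == "K"]
    if len(lys_offsets) != len(lys_positions):
        raise ValueError("number of lysines does not match the positions given")
    starts = {pos - off for pos, off in zip(lys_positions, lys_offsets)}
    if len(starts) != 1:
        raise ValueError("lysine positions are inconsistent with the motif spacing")
    first = starts.pop()
    return first, first + len(motif) - 1


print("matches Walker A consensus:", WALKER_A.fullmatch(ATPASE_A) is not None)
print("inferred residue span:", motif_span(ATPASE_A, CONSERVED_LYS))
```

The inferred span, residues 178 to 185, falls inside the 167 to 192 block, so the quoted motif, the lysine numbering, and the conserved block are mutually consistent.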
Two blocks of amino acid residues that are highly conserved within Rho homologs do not exhibit strong similarity to AtpB. Hence, these residues may be important for structural and/or functional properties that are unique to Rho and have been maintained by strong selective pressures during evolution of the bacteria. The first of these conserved sequence blocks extends from residues 297 to 310 in E. coli Rho, and the second block extends from residues 324 to 342 and includes loop R, helix G, and β-strand 8 (Fig. 2).
The RNA-binding domain is more diverged than the ATP-binding domain (Fig. 1). However, one of its most highly conserved segments includes an RNP1-like sequence, DGFGFLR in E. coli Rho. It is known from mutational studies described below to be involved in RNA binding. The part of the RNP1-like sequence in E. coli Rho that is most like the consensus sequence of eukaryotic RNA-binding proteins is the core sequence GFGF (Fig. 1). In Rho the first of these Gly residues is preceded by a highly conserved Asp residue. In contrast, the RNP1 consensus has a conserved basic residue at that position (25). The Rhos also have an RNP2 motif, IYV in E. coli Rho, that is another characteristic of the eukaryotic RNP domain proteins. However, in Rho the RNP2 motif is about 12 residues downstream of RNP1, at residues 79-81, whereas in the eukaryotic proteins it is usually at least 30 residues upstream of RNP1 (25). In this respect Rho is similar to a class of single-stranded DNA-binding proteins in bacteria called cold shock proteins, including CspA and CspB (26). Thus, in spite of the partial similarity, the RNA-binding domain of Rho does not seem to be evolutionarily related to the eukaryotic RNA-binding proteins. The fact that the Rhos and the eukaryotic RNA-binding domain proteins have the same GFGF core sequence is likely a consequence of convergent evolution.
An interesting variant of the structure of the RNA-binding domain has been found in Rho proteins from organisms in the high G + C Gram-positive phylogenetic branch. This branch includes the Streptomycetes, Mycobacteria, and Micrococcus luteus. The rho homolog genes from three organisms in this group have been isolated and sequenced (27). They all encode larger polypeptides than do the rho genes isolated from other organisms (690 versus 420 residues), with most of the difference resulting from an insert of about 260 residues in the RNA-binding domain. These inserts are in the phylogenetically divergent region between the first and second conserved segments (residues 41-52 of E. coli Rho (Fig. 1)). Although all three organisms have an insert of very unusual amino acid composition, the sequences of these inserts are not conserved. The M. luteus insert starts with an Ala-rich segment followed by a segment that has a large proportion of Arg, Asp, Gly, and Asn residues (27). It also has a very unusual 238-residue segment that lacks any hydrophobic residues and is thus unlikely to have ordered secondary or tertiary structures. Although the role of these inserts is not known, the evidence that a similar insert is present in other organisms from the same phylogenetic group suggests that the inserts arose as an evolutionary adaptation. These organisms have an unusually high G + C content in their DNA, and since G residues have a strong propensity to pair with other residues, their mRNA molecules are likely to be more highly structured than RNA from organisms with fewer G residues. E. coli Rho is known to initiate its action at sites on the nascent transcript that have very little secondary structure (5). One possible adaptation that has been made in these organisms is to have a Rho factor that can initiate its action on a more highly structured RNA. Indeed, M. luteus Rho was found to terminate transcription of λ cro DNA with E. coli RNA polymerase at sites that are not accessible to E. coli Rho due to the structure of the transcript (27).
Residues That Affect Primary Function within the Domains
The involvement of residues in the RNP1-like sequence in RNA binding was established from studies of the functional defects of specific mutant Rho factors. Mutants in which both the Phe residues in the RNP1-like sequence (Phe, Phe) were changed to either leucines or alanines had lower affinities for RNA, with the double Ala mutant being considerably more defective than the double Leu mutant (16). Of these two, Phe appears to be more sensitive to change than Phe. This became evident from the results of a study designed to determine which, if any, of the 19 residues in and preceding the RNP1-like sequence were critical for Rho function.
A large number of randomly produced mutants with single residue changes were screened for defects in termination function. This approach revealed three residues in the RNP1-like sequence that were particularly sensitive to change: Asp, Phe, and Arg. When Phe was changed to a Ser, the resulting protein had a 100-fold lower affinity for λ cro RNA (a transcript terminated by Rho action) and was very defective for transcription termination in vitro, thus implicating this residue in a critical, non-ionic interaction with RNA. Mutations of the conserved Arg residue to a number of other residues had strong effects on Rho function in vivo, and the change of Asp to a Gly residue caused Rho to bind more tightly to RNA, particularly to RNA molecules lacking a rut sequence needed to activate termination. Thus, three residues in the phylogenetically conserved RNP1-like sequence are particularly important for Rho function.
Within the ATP-binding domain, mutants in which either of the two phylogenetically conserved lysines at residues 181 and 184 in the ATPase A sequence were changed to Gln residues had decreased termination efficiency, and the change of Asp, a residue in the ATPase B motif, to Asn yielded a mutant Rho with very low ATPase activity and undetectable termination (28). This result was consistent with the evidence that the corresponding Asp residue in other ATPases has an essential role in catalyzing ATP hydrolysis (24), confirming the role of this sequence motif in ATPase function.
From a collection of defective mutants with changes in residues in the C-terminal 100 amino acids, Miwa et al. (23) identified several that affect the binding of ATP and have modest to severe effects on ATP hydrolysis. Most of the changes were between residues 326 and 366 and include regions with homology to the F1-ATPase subunits. In the model for the ATP-binding domain of Rho (Fig. 2) the changed residues that affected ATP hydrolysis were all in contact with or near various functional groups on the ATP, thus supporting the use of the F1-ATPase structure as a model for the ATP-binding domain of Rho.
Evidence for Cooperative Interactions between the Two Domains
Several mutants of E. coli Rho have been isolated based on defects in termination function in vivo. Recently the residues changed by some of these mutations have been determined, including rho1 and rho115 (29, 30). Besides being defective in transcription termination in vivo, these two mutants have similar functional defects. Both bind RNA with about the same affinity as does wild-type Rho but are defective in activation of RNA-dependent ATP hydrolysis. They also have greatly increased Km values for oligo(rC) in assays in which the hydrolysis of ATP depends on the presence of both poly(dC) and oligo(rC) (30, 31). Since poly(dC) can bind to the primary site for polynucleotides in Rho, the change in Km is interpreted as affecting the putative secondary site (32, 34). In spite of their similarity in function, the mutational changes in these two mutants are in very different locations; rho1 has a change of Lys to Glu (29) while rho115 has two changes, G99V and P235H, of which the G99V change is responsible for the difference in Km values for oligo(rC) (30). The Lys change is of a conserved residue deep in the ATP-binding domain while Gly99 is a conserved residue of the RNA-binding domain downstream from the RNP1 and RNP2 motifs.
Miwa et al. (23) also isolated some mutants of this type that had increased Km values for oligo(C) but were little changed in primary site RNA binding. One change is of an unconserved residue at the C-terminal end (M416K) while another is of a conserved residue in the ATP-binding domain (M327T). Thus, mutations that yield this phenotype are not clustered in one or two specific regions. Instead, changes in a number of different regions can affect the function of this proposed secondary RNA-binding site. Although some of these changes might directly affect the interaction of Rho with RNA at the putative secondary site, all or most of these mutants could be merely affecting steps in the coupling of RNA binding with ATP hydrolysis. These steps are likely to involve multiple inter- and intrasubunit rearrangements and thus be sensitive to changes at many different locations.
Another mutant that was isolated because it is defective for Rho-dependent termination in vivo is rho201. The Rho factor isolated from that mutant is also defective in its function in vitro. The protein has a single residue change, a Phe to Cys at residue 232(35) . The biochemical characterization of the defect of this Rho factor shows that it has a 100-fold lower affinity for mRNA and a greater rate of ATP hydrolysis with nonspecific RNA than does wild-type Rho. Thus, even though the mutational change is of a highly conserved Phe residue in the region of the ATP-binding domain between the ATPase A and ATPase B sequence, its primary defect is in RNA binding. This result suggests that mutations that are in the ATP-binding domain can affect the function of the RNA-binding domain.
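The spread of these mutations across the polypeptide can be tabulated against the residue ranges quoted earlier in this review. The sketch below is only a bookkeeping aid: the region boundaries are those stated in the text, the one-letter mutation labels follow the notation used above (F232C is shorthand introduced here for the Phe to Cys change at residue 232), and the names `REGIONS`, `parse_mutation`, and `regions_for` are hypothetical.

```python
import re

# Residue ranges quoted in the text for E. coli Rho (1-based, inclusive).
REGIONS = {
    "minimum RNA-binding domain": (1, 116),
    "ATPase A block": (167, 192),
    "ATPase B block": (251, 275),
    "Rho-specific conserved block 1": (297, 310),
    "Rho-specific conserved block 2 (loop R, helix G, beta-strand 8)": (324, 342),
}

MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(label: str) -> tuple:
    """Split a label such as 'G99V' into (wild-type residue, position, new residue)."""
    match = MUTATION.match(label)
    if match is None:
        raise ValueError(f"unrecognized mutation label: {label!r}")
    wild_type, position, replacement = match.groups()
    return wild_type, int(position), replacement


def regions_for(label: str) -> list:
    """Names of the listed regions that contain the mutated position."""
    _, position, _ = parse_mutation(label)
    return [name for name, (lo, hi) in REGIONS.items() if lo <= position <= hi]


for label in ("G99V", "P235H", "M327T", "M416K", "F232C"):
    hits = regions_for(label)
    print(label, "->", hits if hits else ["outside the listed blocks"])
```

Only G99V and M327T fall inside listed blocks; P235H and F232C lie between the ATPase A and ATPase B blocks, and M416K lies beyond all of them, which matches the observation that the phenotype is not tied to one or two specific regions.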
Functional Changes Involving Non-conserved Residues
An example of a functional change caused by an alteration of a residue in a phylogenetically unconserved region is provided by the mutation Met416 to Lys. Another such mutation is Phe to Leu (36), which changes the specificity of Rho action by making it more active (37). In addition, two mutations with functional defects have been isolated in the phylogenetically divergent connector region. One is an Asp to Asn change at residue 156 (36); the other is the change of Glu to Asp at residue 134 (29). The E134D mutant behaves like a classical polarity suppressor mutation (like rho1), although it was isolated as a suppressor of a defect in NusA (29).
Some mutations in Rho factor create defects in the cell that prevent growth of certain bacteriophages, including λ and T4. Two mutants in this class (called rho (nusD)) are rho026 and rho4008. The first has two changes, Pro to Leu and Ser to Tyr, while rho4008 has only Pro to Leu. Although the mechanism of the bacteriophage growth exclusion has not yet been elucidated, these Rho factors cause transcription of the λ cro gene DNA template to be terminated at sites that precede those used with wild-type Rho alone but are identical to those used when NusG is also present. Thus, this mutation has made the Rho factor less dependent on NusG, which is consistent with its partial termination activity after NusG depletion in vivo (38).
Is There a Site for RNA in the ATP-binding Domain?
Several seemingly unrelated mutations that affect interactions with RNA (both primary and secondary) are actually clustered in the part of the tertiary structure of the ATP-binding domain that includes loop R, helix G, and β-strand 8 (23) (Fig. 2). To understand how the side chains in this region might be involved in interactions with RNA, we have to consider how the subunits might be arranged in Rho factor. Electron micrographs of negatively stained images (33, 39) and cryoelectron microscopic studies (40) revealed a distinct ring-shaped structure in which six globular subunits were arranged around a hollow core. This organization closely resembles that of the α and β subunits of F1-ATPase. In the F1-ATPase the six isologous subunits have a pseudo-C6 symmetry (22) and are arranged with all the N-terminal domains on one side and the C-terminal domains on the other. This organization differs from the symmetry that has been proposed for Rho; based on an interpretation of the intermediates formed during partial protein-protein cross-linking studies, Geiselmann et al. (41) proposed that Rho has D3 symmetry. However, more recent studies of cross-linking intermediates are more readily interpreted in terms of a C6 symmetry, like the F1-ATPase structure. Geiselmann et al. (41) also found that fluorescent groups attached to Cys in Rho were separated beyond the critical Förster distance of 45 Å in the Rho hexamer. They argued that this result was most consistent with a D3 structure. However, in the model of Miwa et al. (23) Cys is near the N-terminal end of β-strand 4 or close to the surface periphery of the Rho hexamer (Fig. 2). With a 60-Å radius for the outer boundary of the hexamer (40), the individual Cys residues would be separated by 60 Å in the model with C6 symmetry and thus be consistent with the fluorescence studies. Assuming that the quaternary structure of Rho is similar to that of the F1-ATPase, the six subunits in the hexamer would be oriented with all of the RNA-binding domains on one face of the ring and with the parts of the subunits containing loop R, helix G, and β-strand 8 exposed to the solvent in the inner hole of the ring. Thus, the fact that many mutations that affect interactions with RNA are clustered in the segments that extend into the hole suggests that the hole is a site of interaction with RNA. This arrangement raises the possibility that Rho could use the six RNA-binding domains to form the extended primary binding site with the secondary RNA-binding site located in the hole of the ring.
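The distance argument used above for the fluorescence data can be checked with simple ring geometry. The sketch below places six equivalent sites on a circle of 60-Å radius, computes the chord distances between them, and evaluates the standard Förster relation E = 1 / (1 + (r/R0)^6) at the nearest-neighbor distance. It assumes an ideal sixfold cyclic ring with each labeled site lying exactly on the 60-Å outer boundary, which is a simplification of the model; the function names are hypothetical.

```python
import math


def chord_distances(radius: float, n_subunits: int = 6) -> list:
    """Distances from one site to each of the other sites on an ideal cyclic ring."""
    return [
        2.0 * radius * math.sin(math.pi * k / n_subunits)
        for k in range(1, n_subunits)
    ]


def fret_efficiency(distance: float, forster_distance: float) -> float:
    """Standard Förster relation for energy-transfer efficiency."""
    return 1.0 / (1.0 + (distance / forster_distance) ** 6)


RADIUS_A = 60.0   # outer radius of the hexamer quoted in the text
R0_A = 45.0       # critical Förster distance quoted in the text

distances = chord_distances(RADIUS_A)
nearest = min(distances)
print("site-to-site distances (Å):", [round(d, 1) for d in distances])
print(f"nearest-neighbor distance: {nearest:.1f} Å")
print(f"transfer efficiency at that distance: {fret_efficiency(nearest, R0_A):.2f}")
```

For a sixfold ring the nearest-neighbor chord equals the radius, so peripheral sites on adjacent subunits would be about 60 Å apart, beyond the 45-Å Förster distance, and the expected transfer efficiency under these assumptions is low.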
Fig. 3A shows a diagram of the six subunits of Rho arranged with the six RNA-binding domains, colored red, all on one face of the ring structure, and Fig. 3B represents a complex of Rho with an RNA molecule indicated by a blue line. The RNA is shown making extensive contact along the surface composed of the RNA-binding domains of the six subunits (the primary site) and with its 3′-end inserted through the hole in the center of the ring (the secondary site). Distinct conformational shifts in the structure of a subunit could occur with each of the three following steps: binding of ATP; conversion of ATP to ADP + Pi; and release of ADP + Pi. This directed set of conformational changes could be tightly coupled to contacts of the protein with the RNA in the hole and could act to pull the 3′-end through the hole. Since the 5′-portion of the RNA would be held by bonds along the surface of the RNA-binding domain, the interactions set forth in this model would be a type of tethered tracking, a mode of translocation that was suggested from studies of Faus and Richardson (42) and of Steinmetz and Platt (43). After the 5′-portion of the rut site in the nascent transcript forms stable contacts with the extended RNA-binding domain surface of Rho, a dissociation and reassociation of one or more subunits, which is known to occur readily with free Rho (44, 45, 46) and with Rho bound to RNA, would allow the 3′-tail of the rut region of the transcript to be captured in the hole. The electron micrographs of negatively stained, unbound Rho often reveal a distinct break in the ring (33, 40), which might also be an open site for allowing RNA to enter the hole. Once the RNA is captured, the conformational changes associated with concerted rounds of ATP hydrolysis would propel Rho in the 3′ direction along the RNA and provide a force that is sufficient to dissociate the transcript from RNA polymerase.
Figure 3: Model for quaternary structure of Rho based on the arrangement of α and β subunits of F1-ATPase. The view is through the axis of the hexameric ring. The ATP-binding domain is indicated in yellow and the RNA-binding domain (which is based on the outline of the non-homologous N-terminal domain in the F1-ATPase subunits) is shown in red. A, free Rho; B, Rho with a bound RNA molecule (blue) with its 3′-tail through the hole.
This model differs from two others that have been proposed recently (47, 48) in ascribing a major role to a part of the protein that has not previously been implicated. Although the parts of Rho that comprise loop R, helix G, and β-strand 8 have not yet been shown to interact directly with RNA, the corresponding segments of RecA, another protein with a core ATP-binding domain that is topologically identical to that of F1-ATPase (23, 49), make direct contact with DNA. Thus, there is a precedent for implicating that region of Rho. Also, recent work on the structure and mechanism of the RuvB component of a multisubunit complex that promotes DNA branch migration indicated that it may act by pulling DNA molecules through the hole of a hexameric ring (50). However, this model for Rho is very speculative, and evidence that the RNA is inserted through the hole, let alone translocated through it, is lacking.
The results obtained from the phylogenetic analysis and studies with mutant Rho factors have provided new insights into its structure and function, and these have been used to formulate a speculative model for its mechanism of action. One purpose for proposing this model is to stimulate the design of experiments that will elucidate the true mechanism. Until the structure of Rho has been determined by x-ray crystallographic analysis, we will have to continue to rely on these indirect approaches. Since one of the key differences in the model is the symmetry, further cross-linking studies and possibly immunoelectron micrographs might resolve this issue. Eventually, this work will lead to an understanding of how this protein can couple the chemical energy derived from ATP hydrolysis to the mechanical actions that dissociate a transcript from its complex with RNA polymerase and DNA.