(Received for publication, June 5, 1995)
Two independent assays capable of measuring the relative in vivo translational step times across a selected codon pair in a growing polypeptide in the bacterium Escherichia coli have been employed to demonstrate that codon pairs observed in protein coding sequences more frequently than predicted (over-represented codon pairs) are translated more slowly than pairs observed less frequently than expected (under-represented codon pairs). These results are consistent with the findings that translational step times are influenced by codon context and that these context effects are related to the compatibilities of adjacent tRNA isoacceptor molecules on the surface of a translating ribosome. These results also support our previous suggestion that the frequency of one codon next to another has co-evolved with the structure and abundance of tRNA isoacceptors in order to control translational step times without imposing additional constraints on amino acid sequences or protein structures.
While it is known that translational elongation rates are discontinuous and influenced by codon context, the reasons for variations in translational step times across individual codon pairs are not well understood. The prevailing theory has been that translational rates reflect the correlation between the species-specific usage of a given codon and the abundance of its cognate tRNA (Sorensen et al., 1989). However, more recent experimental results support the idea that translation rates are influenced by the compatibilities of adjacent tRNAs in the A- and P-sites on the surface of translating ribosomes (Smith and Yarus, 1989; Yarus and Curran, 1992). In support of this idea, we previously described an extreme, species-specific, codon pair utilization bias in bacteria, yeast, and mammals (Gutman and Hatfield, 1989; Hatfield and Gutman, 1992). We showed that some codon pairs are used in protein coding sequences much more frequently than expected from the usage of the individual codons of these pairs (over-represented codon pairs), and that some codon pairs are observed much less frequently than expected (under-represented codon pairs). Similar results were obtained by Kolaskar and Reddy (1986), who analyzed codon pair bias in the protein coding sequences of Escherichia coli together with nine coliphages.
For E. coli, our codon pair utilization analysis (Gutman and Hatfield, 1989; Hatfield and Gutman, 1992) was performed on a collection of 237 nonredundant protein coding sequences containing 75,403 codon pairs taken from the GENBANK data base (Release 40.0). The usage frequencies of the 61 nonterminating codons were determined and used to calculate the expected values for the random occurrence of each of the 3721 (61²) codon pairs in these sequences. The actual occurrence of each codon pair in the data set was tabulated, and these expected and observed values were used to calculate a χ² value for each codon pair (CHISQ1). These values represent the degree of bias of codon pair usage, and we arbitrarily identify χ² values associated with under-represented codon pairs as negative numbers. A second set of expected values, which removes the component of codon pair bias associated with the small bias for amino acid nearest neighbors in E. coli (Gutman and Hatfield, 1989), was calculated and used to generate a second set of χ² values (CHISQ2). A third set of expected values, further corrected for the well known III-I dinucleotide bias in protein coding sequences (Fluck et al., 1977; Bossi and Roth, 1980; Colby et al., 1976; Fienstein and Altman, 1977), was calculated, and yet another set of χ² values (CHISQ3) was generated. Thus, the bias represented by CHISQ3 (used in this report) cannot be the consequence of the bias in amino acid nearest neighbors, the bias of adjacent nucleotides between codons, or the bias of codon usage (since the actual codon frequencies were used to calculate the expected values). These CHISQ3 values range from 125.7 for the most over-represented GAA CUG, Glu-Leu, codon pair to -52.5 for the most under-represented CUG CAG, Leu-Gln, codon pair.
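The first step of this analysis (the CHISQ1-style value) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' program: the function names are hypothetical, the expected count is taken to be the product of the two codon frequencies times the total number of pairs, and the statistic is assumed to be the usual (observed - expected)²/expected with a negative sign attached to under-represented pairs. The corrections for amino acid nearest neighbors (CHISQ2) and III-I dinucleotide bias (CHISQ3) are not reproduced here.

```python
from collections import Counter

STOP_CODONS = {"UAA", "UAG", "UGA"}  # only the 61 nonterminating codons are scored


def split_codons(seq):
    """Split an in-frame coding sequence into codons, stopping at the first stop codon."""
    seq = seq.upper().replace("T", "U")
    codons = []
    for i in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[i:i + 3]
        if codon in STOP_CODONS:
            break
        codons.append(codon)
    return codons


def signed_chisq_per_pair(sequences):
    """Return {(codon_a, codon_b): signed chi-square} for adjacent codon pairs.

    Assumption: expected = freq(a) * freq(b) * total_pairs, and the value is
    (observed - expected)**2 / expected, negated when observed < expected.
    """
    codon_counts = Counter()
    pair_counts = Counter()
    for seq in sequences:
        codons = split_codons(seq)
        codon_counts.update(codons)
        pair_counts.update(zip(codons, codons[1:]))  # adjacent pairs only

    total_codons = sum(codon_counts.values())
    total_pairs = sum(pair_counts.values())

    values = {}
    for a, count_a in codon_counts.items():
        for b, count_b in codon_counts.items():
            expected = (count_a / total_codons) * (count_b / total_codons) * total_pairs
            observed = pair_counts.get((a, b), 0)
            chisq = (observed - expected) ** 2 / expected
            values[(a, b)] = chisq if observed >= expected else -chisq
    return values
```

Because the pairs are tabulated in reading order, the value for pair A-B is computed independently of the value for pair B-A, which is what allows the directionality of the bias to be examined.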
We also previously demonstrated that codon pair biases are directional (the bias associated with codon pair A-B is independent of the bias associated with codon pair B-A) and restricted to adjacent codons, and that genes expressed at high levels tend to avoid over-represented codon pairs (in addition to their well known avoidance of infrequently used codons (Sharp and Li, 1986; Ikemura, 1992)). These observations suggested that at least a portion of codon pair bias is related to the translation process. This conclusion is consistent with the hypothesis that codon pair bias is correlated with translational step times which, in turn, might be related to the compatibility of adjacent tRNA molecules on the surface of a translating ribosome. It is, therefore, possible that the use of one codon next to another has co-evolved with the structure and abundance of tRNA isoacceptors to control translational step times without imposing constraints on amino acid sequences or protein structure (Hatfield and Gutman, 1992).
In this report, we describe two independent assays capable of measuring the relative in vivo translational step times of specific codon pairs in a growing polypeptide chain. We have used each of these assays to demonstrate that over-represented codon pairs are translated more slowly than under-represented codon pairs. One assay is based on the observation that a ribosome pausing at a site near the beginning of an mRNA coding sequence can inhibit translation initiation by physically interfering with the attachment of a new ribosome to the message (Liljenstrom and von Heijne, 1987; Bergmann and Lodish, 1979). The other assay is based on the fact that the transit time of a ribosome through the leader polypeptide coding region of the leader RNA of the trp operon sets the basal level of transcription through the trp attenuator (Landick and Yanofsky, 1987).
A plasmid containing the trpLep (Landick et al., 1990) leader-attenuator region of the E. coli tryptophan operon transcriptionally fused to a 3'-truncated lacZ gene, with unique PstI and EcoRI restriction endonuclease sites in the trp leader polypeptide coding sequence, was constructed as follows. Two deletions were made in the Simons transcriptional fusion vector pRS551 (Simons et al., 1987). A 3853-bp BclI restriction endonuclease fragment containing the 3' portion of the lacZ gene and a 520-bp XhoI-HindIII restriction endonuclease fragment from the kanamycin resistance gene were removed to form plasmid pXH1. The unique PstI site in the ampicillin resistance gene of pXH1 was removed by replacing the BsaI-ScaI restriction endonuclease fragment of this plasmid with the analogous BsaI-ScaI fragment from pUC19. The unique EcoRI site in pXH1 was eliminated by digestion of the plasmid with EcoRI, end-filling with the Klenow fragment of DNA polymerase I, and self-ligation. A 490-bp Sau3AI restriction endonuclease fragment containing the trpLep leader-attenuator region was isolated from the plasmid pRL410 (Landick et al., 1990) and ligated into the unique BamHI site of the PstI⁻ and EcoRI⁻ plasmid pXH1 to yield the plasmid pBI-1. Plasmid constructions were verified by DNA sequencing.
All cultures were grown on LB agar or in Luria broth (Miller, 1972). Ampicillin was added to the medium at a final concentration of 100 µg/ml for the growth of strains containing plasmids and 50 µg/ml for the growth of strains containing plasmids integrated into the bacterial chromosome.
Figure 1: A, the DNA sequence of the ptrc::lacZ transcription and translation initiation region. The -35 and the -10 RNA polymerase recognition sequences of the trc promoter are underlined. The transcriptional start site of the lac mRNA is identified as +1. The palindromic sequences of the lac repressor binding site are identified and labeled lacO. The Shine-Dalgarno region of the lacZ gene, important for ribosome attachment and translation initiation, is underlined and labeled lacS.D. The first 10 codons of the lacZ gene and the first 10 amino acids of β-galactosidase are shown. The locations of the NcoI and BamHI restriction endonuclease sites used to replace the first nine codons of the lacZ coding region with synthetic double-stranded DNA oligonucleotides are shown. Insertion of the oligonucleotides destroys the NcoI site by changing the 3' G to an A as shown. B, the trp leader polypeptide coding region. The DNA sequence of the region of the wild-type trp leader containing the 14-amino acid leader polypeptide coding sequence and the dyad symmetry encoding the stem-loop 1:2 structure (denoted by the arrows) of the leader RNA is shown in the upper sequence. The DNA sequence of the region of the trpLep leader containing the 15-amino acid leader polypeptide coding sequence, with the PstI and EcoRI sites used to replace codons 8 through 15 with codons encoded in double-stranded DNA oligonucleotides, is shown in the lower sequence. The three bases encoding a glutamine codon, CAG, that were inserted into the leader to create the PstI site and the one base change and one base insertion that created the EcoRI site are indicated with bold type.
When the slightly under-represented Ala-Leu codon pair (GCC CUU, CHISQ3 = -5.7) at positions 3 and 4 of this lacZ mRNA sequence was changed to the more highly under-represented Thr-Leu (ACC CUG, CHISQ3 = -27.3) codon pair, the steady state rate of β-galactosidase synthesis increased 2-fold (Table 1, compare strains IH78 and IH35). The further observation that the expression of the downstream lacA gene of the polycistronic lacZYA operon, which encodes a thiogalactoside transacetylase, remained the same in both constructs suggested that these nucleotide changes did not affect the transcriptional initiation rate or the stability of the lac mRNA. In fact, this is true for all of the codon pair substitutions reported in Table 1. It should also be noted that none of the altered codon pairs described in Table 1 significantly alters the bias of the flanking codon pairs at positions two and three or four and five. These results suggest, therefore, that the highly under-represented codon pair at positions three and four is translated faster than the moderately under-represented pair at the same position and that the difference in the steady state levels of β-galactosidase produced from mRNAs containing these codon pairs is the result of different translation initiation rates.
When a single nucleotide change was made in the highly under-represented Thr-Leu (ACC CUG, CHISQ3 = -27.3) codon pair to create the highly over-represented Thr-Leu (ACG CUG, CHISQ3 = 78.9) codon pair, the translational activity of the lacZ gene decreased nearly 10-fold (Table 1, compare strains IH35 and IH12). The fact that the amino acid sequence of β-galactosidase produced by the lacZ mRNA sequences in strains IH35 and IH12 is unaltered argues against any intrinsic differences in β-galactosidase activities in this experiment. Also, the observation that the level of lacA expression remains the same in these strains argues against any changes in message stability or ρ-induced transcription termination.
Since our previous statistical analyses (Gutman and Hatfield, 1989; Hatfield and Gutman, 1992) showed that there is no correlation between the directionality of a codon pair and its bias, i.e. codon pair A-B versus B-A, we also wished to determine if there is a lack of correlation between the translational efficiencies of a codon pair in a forward and a reverse orientation. If under-represented codon pairs are translated faster than over-represented codon pairs and codon pair usage (codon context) is related to translational efficiency, as suggested by the above results, then the under-represented ACC CUG codon pair might be expected to be translated faster than the randomly utilized CUG ACC codon pair. The data in Table 1 show that this is, indeed, the case (compare strains IH35 and IH53). In fact, the mRNA sequence in strain IH53 is translated at about the same efficiency as the mRNA sequence in strain IH78, which is also composed of randomly used codon pairs (Table 1). Furthermore, the fact that both ACC and CUG are frequently used codons suggests that the translational efficiency of these codons is not simply related to the frequency of usage of the individual codons.
Since the above results suggested that a highly over-represented codon pair is translated more slowly than an under-represented pair, we sought to extend these observations by determining how the translational efficiency of a modestly over-represented codon pair compares to the translational efficiencies of more highly over- and under-represented codon pairs. The data in Table 1 show that the modestly over-represented Ala-Leu codon pair (GCG CUG, CHISQ3 = 12.3) in strain IH1718 is translated only half as efficiently as the slightly under-represented Ala-Leu codon pair (GCC CUU, CHISQ3 = -5.4) in strain IH78 and 5-fold less efficiently than the highly under-represented Thr-Leu codon pair (ACC CUG, CHISQ3 = -27.3) in strain IH35. However, the modestly over-represented codon pair in strain IH1718 is not translated as slowly as the highly over-represented codon pair in strain IH12. Thus, these results suggest a relationship between the degree of codon pair utilization bias and translational efficiency.
All of the codon pair substitutions reported in Table 1 are at codon positions three and four of the lacZ coding sequence. Since it was possible that this region of the message might be important for translational initiation in a manner unrelated to codon context (Gold and Stormo, 1987), we examined the translational efficiency of the over- and under-represented codon pairs shown in Table 1 when placed farther downstream at codon positions six and seven. In these cases, we observed the same results as shown in Table 1 (data not shown). Thus, there is no significant positional effect on the placement of these codon pairs early in the coding sequence of this gene. This suggests that the sequence changes we have made do not influence the translational initiation mechanism in a trivial way, such as by facilitating base pairing with upstream sequences and interfering with the attachment of a ribosome to the lacZ message.
The data in Table 2 show that, as predicted, the replacement of the non-biased Lys-Gly codon pair AAA GGU (CHISQ3 = -0.8) with the under-represented (rapidly translated) Thr-Leu codon pair, ACC CUG (CHISQ3 = -27.3), does not significantly affect the basal level of transcription through the trp attenuator (compare strains IH211 and IH278). However, the replacement of this same codon pair with the highly over-represented (slowly translated) Thr-Leu codon pair, ACG CUG (CHISQ3 = 78.9), results in a 2-fold increase in transcription through the trp attenuator (Table 2; compare strains IH211 and IH256).
This level of deattenuation is less than the 5- to 6-fold increase expected if transcription into the lacZ gene were fully deattenuated, which might have been expected with a codon pair that severely inhibits translation initiation (Table 1). One explanation for this low level of transcription through the attenuator might be that the insertion of the over-represented codon pair into the leader alters the secondary structure of the leader RNA. This is unlikely, however, since the data in Table 2 show that comparable basal levels of transcription through the trp attenuator are observed when the translation initiation codon of the leader polypeptide coding region in strains IH211, IH278, and IH256 is changed from AUG to AUA (strains IH212, IH279, and IH257, respectively). While these codon changes abolish translation of the leader polypeptide coding region, the superattenuated basal levels of transcription through the attenuator of these three strains, measured by the production of β-galactosidase, are the same. Thus, the 2-fold increase in transcription into the lac structural genes observed in strain IH256 must be due to translation through the leader polypeptide coding region of the trp leader RNA and not due to an alteration of the intrinsic secondary structures. Also, the fact that the β-galactosidase to transacetylase activity ratios vary less than 2-fold for all of the strains shown in Table 2 suggests that the nucleotide changes we have introduced into codons nine and ten of the trp leader RNA do not significantly affect the translational initiation of the lacZ gene.
Another explanation for the low level of deattenuation observed with the highly over-represented ACG CUG codon pair might be that the stalling of a ribosome on the over-represented codon pair only partially disrupts the base pairings in the stem 1:2 region. This possibility was tested by pausing a ribosome at this same position by an independent mechanism. In this case, we placed a UGA translational stop codon at either codon position nine (strain IH869) or codon position ten (strain IH8610) of the trpLep leader polypeptide coding sequence (Table 2). It has been demonstrated previously that the substitution of a Trp codon with a stop codon in the trp leader causes full deattenuation due to the slow release of the ribosome from the leader RNA (Landick, 1990). The data in Table 2 show that the stalling of a ribosome at codons eight and nine, with codon eight in the P-site and the stop codon at position nine in the A-site, does not cause deattenuation. However, ribosome stalling at codons nine and ten in strain IH8610 causes a 2-fold deattenuation, the same level observed when a ribosome is stalled at the over-represented codon pair at positions nine and ten in strain IH256. Therefore, the stalling of a ribosome at the codon pair immediately preceding the tandem Trp codon pair does, indeed, lead to only a partial deattenuation.
In summary, our interpretation of the results obtained with the attenuation assay is the same as our interpretation of the results of the translation initiation assay; that is, the over-represented Thr-Leu codon pair ACG CUG (CHISQ3 = 78.9) is translated more slowly than the under-represented Thr-Leu codon pair ACC CUG (CHISQ3 = -27.3).
The data reported here suggest a correlation between the biased use of an individual codon pair and the translational efficiency of that pair. We have demonstrated that an over-represented codon pair is translated more slowly than an under-represented codon pair, and that the more over-represented a codon pair is, the more slowly it is translated. These effects of codon pair bias on translation are consistent with the facts that codon pair biases in E. coli are directional and limited to nearest neighbors (there is very little correlation between the χ² values of any given codon pair and its reverse counterpart, and more than 95% of the codon pair utilization bias is removed when codon pairs separated by two or three intervening codons are examined (Gutman and Hatfield, 1989; Hatfield and Gutman, 1992)). For example, the non-biased Leu-Thr codon pair (CUG ACC, CHISQ3 = 0.0) is translated more than 2-fold more slowly than the highly under-represented Thr-Leu codon pair (ACC CUG, CHISQ3 = -27.3; Table 1). In addition to supporting the correlation between codon pair bias and translational efficiency, this observation also shows that translational efficiency is more closely related to codon context (codon pair bias) than it is to the utilization frequency of individual codons. This is because both ACC and CUG are frequently used codons in E. coli, but the ACC CUG and CUG ACC pairs are translated at markedly different rates in a context where the biases of the flanking codon pairs are not significantly altered. If the differing translation rates were due primarily to the frequency of usage of one or the other of these codons, then the translation rates in both orientations would be expected to be the same.
The data presented in Table 1 also show that two codon pairs with nearly equal codon pair bias values, but encoding different amino acid pairs and differing at all six nucleotide positions, can exhibit the same translational efficiency (compare strains IH78 and IH53). This observation suggests a close relationship between codon pair χ² values and translational efficiency. However, this close relationship might not be observed in every case. For example, it is known that identical codons located at different positions in an mRNA can be read by different isoacceptor tRNAs (Holmes et al., 1977; Goldman et al., 1979). Therefore, if translational efficiency is the consequence of the compatibility of adjacent tRNA molecules on a translating ribosome, then the translational step time across a given codon pair could differ depending on the identity of the tRNA molecules decoding these codons. In this case, other factors that affect isoacceptor tRNA selection, such as III-I dinucleotide biases and codon-anticodon stacking energies, could also affect translational efficiency. These sorts of effects on the results presented in Table 1 cannot be excluded.
To confirm the conclusions drawn from the results of the translation initiation assay, we employed an independent trp attenuator-based assay to examine the translational efficiency of the same highly over-represented and under-represented codon pairs. With this assay we demonstrated that the highly over-represented Thr-Leu codon pair (ACG CUG, CHISQ3 = 78.9) that severely inhibits translation initiation also restricts translation of the polypeptide coding sequence in the trp leader-attenuator region and causes increased transcription through the trp attenuator.
The observations reported here suggest that the discontinuities in the translation rates of genes are "hard wired" into the sequence of each gene. If this is so, then it is reasonable to assume that the use of one codon next to another has co-evolved with the structure and abundance of tRNA isoacceptors in order to control translational step times without imposing additional constraints on amino acid sequences or protein structures. This hypothesis offers a simple explanation for the large, seemingly excessive, number of tRNA isoacceptor molecules found in all living cells. It implies that, for any given amino acid sequence, appropriately biased codon pairs can be employed to set the translational step times for the addition of amino acids to the growing polypeptide chain. In this manner, translational pauses important for the folding and other functions of nascent polypeptide chains can be incorporated into the DNA coding sequence of a gene. However, since the relative translational efficiencies of only a small number of codon pairs have been studied, it is not yet possible to ascertain how consistent the relationship between translational step times and codon pair bias values will be. As more codon pairs are examined, it will be interesting to determine if it is possible to use these values in a way that will be predictive of relative translational step times for the identification of translational pause sites.
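If codon pair bias values did prove to be predictive, a scan for candidate pause sites might look like the following Python sketch. It is illustrative only: the lookup table holds just the CHISQ3 values quoted in this report rather than the full table of 3721 pairs, the function name and the threshold of 50 are hypothetical choices, and, as noted above, whether such values reliably predict step times has not yet been established.

```python
# CHISQ3 values quoted in this report; a real scan would need the full table.
CHISQ3 = {
    ("GAA", "CUG"): 125.7,   # most over-represented pair (Glu-Leu)
    ("ACG", "CUG"): 78.9,    # highly over-represented Thr-Leu
    ("GCG", "CUG"): 12.3,    # modestly over-represented Ala-Leu
    ("CUG", "ACC"): 0.0,     # non-biased Leu-Thr
    ("AAA", "GGU"): -0.8,    # non-biased Lys-Gly
    ("ACC", "CUG"): -27.3,   # highly under-represented Thr-Leu
    ("CUG", "CAG"): -52.5,   # most under-represented pair (Leu-Gln)
}


def candidate_pause_sites(seq, table=CHISQ3, threshold=50.0):
    """List adjacent codon pairs whose bias value meets a (hypothetical) threshold.

    Returns tuples of (position of first codon, first codon, second codon, value),
    with codon positions numbered from 1. Pairs absent from the table are skipped.
    """
    seq = seq.upper().replace("T", "U")
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    hits = []
    for position, pair in enumerate(zip(codons, codons[1:]), start=1):
        value = table.get(pair)
        if value is not None and value >= threshold:
            hits.append((position, pair[0], pair[1], value))
    return hits


# Example: only the ACG CUG pair (codons 2 and 3) exceeds the threshold.
print(candidate_pause_sites("AUGACGCUGACCCUG"))
```

Raising or lowering the threshold trades the number of flagged sites against the strength of bias required, and any such cutoff would have to be calibrated against measured step times.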
In summary, we have employed two independent assays to demonstrate a close relationship between the translational efficiency of a codon pair and its degree of bias in protein coding sequences of E. coli. We have demonstrated that at least some codon pairs that are observed in protein coding sequences more frequently than predicted by the frequency of the individual codons in that pair are translated more slowly than codon pairs that are found less frequently than expected. Additionally, we have demonstrated a general relationship between the utilization bias of a codon pair and its translational efficiency: the more over-represented these codon pairs are, the more slowly they are translated, and the more under-represented they are, the faster they are translated. We have also shown that the translational efficiency of a given codon pair is correlated with its codon pair utilization bias and not with the utilization frequency of the individual codons of the pair. If we conclude, therefore, that codon pair bias is related to translational step times, which are mechanistically related to the compatibility of adjacent tRNA molecules on the surface of a translating ribosome, then each of these observations is predicted by the results of our statistical analyses of the codon pair utilization patterns in protein coding sequences of E. coli (Gutman and Hatfield, 1989; Hatfield and Gutman, 1992).