(Received for publication, December 9, 1996, and in revised form, March 17, 1997)
From the Gladstone Institute of Cardiovascular Disease, the
Cardiovascular Research Institute, and the Department of
Medicine, University of California, San Francisco, California
94141-9100 and the
Departments of Medicine and
Pharmacology, Vanderbilt University School of Medicine,
Nashville, Tennessee 37232
We previously characterized a mutant apoB allele
(the apoB86 allele) that produces both a truncated apoB (apoB86) and a
full-length apoB100. The mutant allele contained a deletion of a single
cytosine in exon 26, creating a stretch of eight consecutive adenines
in the −1 reading frame. The altered reading frame was
restored, with ~10% efficiency, by the transcriptional insertion of
an extra adenine into the stretch of eight consecutive adenines,
thereby accounting for the synthesis of the full-length apoB100. Here, we demonstrate that this reading-frame restoration does not occur when
the long stretch of adenines is interrupted by a cytosine. To assess
whether reading-frame restoration is unique to a single site in the
apoB gene, the same mutation (eight consecutive adenines in the −1
reading frame) was inserted into another site within the apoB gene.
Reading-frame restoration occurred at the second site and was abrogated
when the stretch of adenines was interrupted by another base. Of note,
a computerized analysis of human cDNA sequences revealed that long
stretches of adenines in protein-coding sequences occur at a lower than
predicted frequency, suggesting that evolution has selected against
these sequences.
In 1992, we reported the existence of a mutant apoB allele causing familial hypobetalipoproteinemia, the apoB86 allele (1). This allele, which resulted in the synthesis of both a truncated apoB (apoB86) and a full-length apoB100, contained a deletion of a single cytosine residue in exon 26 of the apoB gene. This frameshift mutation was predicted to yield a stretch of 20 novel amino acids, followed by a premature stop codon. That this mutation was responsible for the production of apoB86 was proved by immunochemical studies utilizing monoclonal antibodies and an anti-peptide antibody directed against the 20 novel amino acids at the carboxyl terminus of apoB86 (1). The production of apoB100 from this allele involved a novel mechanism: reading-frame restoration by transcriptional insertion of an extra nucleotide at the site of the 1-base pair (bp) deletion. The deletion of the single cytosine residue in the apoB86 allele created a stretch of eight consecutive adenine residues. Minigene expression studies in cultured hepatoma cells demonstrated that the single cytosine deletion, along with the stretch of eight consecutive adenines, was faithfully present in the genomic DNA, but that ~10% of apoB86 cDNA clones actually contained nine consecutive adenines (1). Thus, the production of apoB100 by the apoB86 allele was due to the transcriptional insertion of an extra adenine, which restored the proper reading frame to the mutant mRNA. These studies provided the first in vivo evidence of "transcriptional slippage" (or "stuttering") along a long stretch of consecutive adenines in mammalian cells.
Slippage of RNA polymerase during transcription was first proposed by Chamberlin and Berg in 1962 (2). In recent years, Wagner et al. (3) described transcriptional slippage by Escherichia coli RNA polymerase during RNA elongation at runs of 10 or more adenines or thymines, resulting in the addition of untemplated thymine or adenine residues and restoration of the proper reading frame to out-of-frame lacZ constructs. Interestingly, RNA polymerase stuttering was not observed when similar experiments were performed in yeast (3).
This study was undertaken to define further the DNA sequence requirements for reading-frame restoration in the apoB86 allele. We also sought to determine whether the reading-frame restoration was somehow unique to the apoB86 allele or would occur at similar sequences elsewhere in the apoB gene. Finally, because transcriptional slippage could potentially introduce frameshift mutations, we hypothesized that evolution may have selected against the presence of long stretches of adenines in protein-coding sequences. To assess this possibility, we analyzed whether long stretches of adenines occur at a lower than predicted frequency in human proteins containing three consecutive lysine residues.
Blood was collected from H. J. B. (4-7), and plasma was used to prepare very low density lipoproteins, which were used as a source of apoB size standards for Western blot experiments. H. J. B. is a compound heterozygote for hypobetalipoproteinemia with two mutant apoB alleles: an apoB37 allele (7, 8) and an apoB86 allele. The apoB37 allele yields exclusively apoB37 (6, 7), whereas the apoB86 allele yields apoB100, apoB86, and apoB48 (1, 6, 7). ApoB48 is produced by the apoB86 allele as a result of apoB mRNA editing in the intestine (9).
ApoB86 Minigene and cDNA Expression Vectors
The apoB86 allele contains a single cytosine deletion (cDNA nucleotide 11840)
in exon 26 of the apoB gene, which generates a stretch of eight
consecutive adenines in the −1 reading frame (i.e. the
mutation changes the sequence AAA AAC AAA to AAA AAA AA). An apoB
fusion minigene vector (pB18/86) containing the 1-bp deletion found in
the apoB86 allele and a wild-type vector lacking the mutation
(pB18/100) have been described (1). For this study, we mutated the
pB18/100 expression vector to delete one of the adenines immediately
preceding cytosine 11840 instead of cytosine 11840 itself. This
construct contained the sequence AAA ACA AA and was designated
pB18/86:4AC3A (because it had four adenines, one cytosine, and then
three more adenines) (Fig. 1). Thus, pB18/86:4AC3A contained the same
−1 frameshift as the apoB86 allele, but lacked the
long stretch of adenines. To create pB18/86:4AC3A, we used the
mutagenesis technique of Deng and Nickoloff (10) and a mutagenic primer
(5′-GCCAGTTTGAAAACAAAGCAGAT-3′). In addition, we generated pB18/86cDNA (Fig. 1), which is identical to pB18/86, except that it
was constructed entirely from cDNA clones. For this clone, a
2119-bp BamHI-HindIII cDNA fragment was
ligated to the pB18 cDNA vector, rather than to the 3567-bp
BamHI-HindIII genomic fragment. The apoB86
mutation was introduced into the vector by site-directed mutagenesis.
All mutations were confirmed by DNA sequencing.
Introduction of an "ApoB86-like" Frameshift Mutation into a cDNA Expression Vector Coding for ApoB46
A cDNA
expression vector coding for apoB46, pB46N (originally designated
pB46neo; constructed in the neo-containing vector pRC/CMV (Invitrogen, San Diego, CA)), was obtained from Dr. Zemin Yao
(University of Ottawa, Ottawa, Canada) (11). The apoB-coding sequence
of pB46N terminates at the EcoRI site at apoB cDNA
nucleotide 6507 (corresponding to apoB amino acid residue 2099). To
introduce a mutation similar to that observed in the apoB86 allele, we
used site-directed mutagenesis (with the primer
5′-GACTCCAAAAAAAACAGCATT-3′) to delete a single guanine residue (apoB
cDNA nucleotide 4553) from the middle of a long stretch of
adenines, changing the sequence AAA AAG AAA (coding for amino acids
1447-1449) to eight consecutive adenines in the −1 reading frame (AAA
AAA AA) (see Fig. 3). The vector pB46N8A was predicted to code for a
truncated apoB protein (apoB32) containing 1492 amino acids (including
44 novel amino acids at the carboxyl terminus). An apoB46 expression
vector (pB46N9A) containing nine consecutive adenines was constructed
by mutating the guanine at cDNA nucleotide 4553 to an adenine. This
vector did not have a frameshift mutation and therefore was predicted to yield the full-length apoB protein. We also generated another vector
(pB46N11A) containing 11 consecutive adenines in the −1 reading frame
(see Fig. 3).
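The sequence changes described above can be checked with a few lines of code. The following is a minimal sketch, not part of the original methods: it takes the nine-nucleotide windows quoted in the text, applies the stated deletions or substitution, and reports the longest uninterrupted run of adenines in each construct. The helper names `longest_run` and `delete_at` are illustrative assumptions.

```python
def longest_run(seq: str, base: str = "A") -> int:
    """Return the length of the longest uninterrupted run of `base` in `seq`."""
    best = current = 0
    for nt in seq.upper():
        current = current + 1 if nt == base else 0
        best = max(best, current)
    return best


def delete_at(seq: str, index: int) -> str:
    """Return `seq` with the nucleotide at 0-based `index` removed."""
    return seq[:index] + seq[index + 1:]


# Nine-nucleotide windows quoted in the text, written without spaces.
EXON26_WT = "AAAAACAAA"   # AAA AAC AAA (the cytosine is cDNA nucleotide 11840)
APOB46_WT = "AAAAAGAAA"   # AAA AAG AAA (the guanine is cDNA nucleotide 4553)

variants = {
    "pB18/100 (wild type)": EXON26_WT,
    "pB18/86 (C deleted)": delete_at(EXON26_WT, 5),
    "pB18/86:4AC3A (A deleted)": delete_at(EXON26_WT, 4),
    "pB46N8A (G deleted)": delete_at(APOB46_WT, 5),
    "pB46N9A (G replaced by A)": APOB46_WT.replace("G", "A"),
}

for name, seq in variants.items():
    print(f"{name:28s} {seq:10s} longest A run = {longest_run(seq)}")
```

Both single-nucleotide deletions in the exon 26 window shorten it by one base and therefore shift the frame, but only the cytosine deletion produces an uninterrupted run of eight adenines.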
Transfection of COS-7 Cells
COS-7 cells were grown to ~30% confluency in T-75 flasks and transfected with 10-15 µg of plasmid DNA by calcium phosphate coprecipitation and glycerol shock as described previously (12). Two days after the transfection, the cells were washed twice with phosphate-buffered saline, scraped from the plates, and pelleted by centrifugation. The cell pellet was dissolved in 200 µl of sample buffer (62.5 mM Tris-HCl, pH 6.8, 20% glycerol, 2% SDS, and 5% 2-mercaptoethanol) and sonicated before electrophoresis on SDS-3-12% polyacrylamide gels. As a control for these experiments, COS-7 cells were transfected with an apoB33 cDNA expression vector, pB33 (13).
Selection of Stable McA-RH7777 Cell Lines
Stable transformants of McA-RH7777 cells (a rat hepatoma cell line) were obtained by cotransfecting cells with 10-15 µg of minigene DNA with pSV2neo (at a molar ratio of 20:1), followed by selection with G418 (400 µg/ml) (1, 13). The pB46neo series of constructs (see Fig. 3) contained a neo gene in the plasmid backbone, obviating the need for cotransfection with pSV2neo. Individual G418-resistant colonies were picked, and stable cell lines were propagated in medium containing G418. To analyze apoB secretion from the cells, ~30% confluent cells were incubated in serum-free Dulbecco's modified Eagle's medium for 2 days. The conditioned medium was then harvested and analyzed for apoB expression by Western blotting. As controls for these experiments, we analyzed the human apoB-containing lipoproteins secreted by McA-RH7777 cells stably transformed with apoB31 or apoB37 cDNA expression vectors (pB31 and pB37, respectively) (13) and McA-RH7777 cells stably transformed with p158, a P1 bacteriophage clone spanning the human apoB gene (14).
Western Blot Analysis of Human ApoB Proteins
Human apoB proteins from transfected COS-7 cell lysates or from the medium from stably transformed McA-RH7777 cells were detected on Western blots of SDS-3-12% polyacrylamide gels (14) using the human apoB-specific monoclonal antibody 1D1 (which binds to an epitope between apoB amino acids 474 and 539) (15). The human apoB-containing lipoproteins in the medium were either concentrated by adsorption onto Cab-O-Sil (13) or partially purified from the medium by preparing the d < 1.21 g/ml lipoproteins by ultracentrifugation.
Density Gradient Ultracentrifugation
The medium from an McA-RH7777 cell line transfected with pB46N11A was analyzed by discontinuous density gradient ultracentrifugation (7, 13). All density fractions were dialyzed against phosphate-buffered saline containing 1.0 mM EDTA before analysis by Western blotting of SDS-3-12% polyacrylamide gels.
Analysis of Codon Usage in Human Proteins with Lys-Lys-Lys Motifs
A total of 150 Lys-Lys-Lys motifs from 128 different proteins were identified in the SWISS-PROT data base (Release 33) using the BLITZ program from the European Bioinformatics Institute. The nucleotide sequence for the Lys-Lys-Lys motif was then recorded. Proteins with more than one isoenzyme were included only once because including multiple isoenzymes might bias the results of the analysis (i.e. the isoenzymes might have arisen by alternative splicing events or by a relatively recent gene duplication event). The accession numbers for the 128 human proteins, together with the nucleotide sequences coding for the Lys-Lys-Lys motif, are listed in the "Appendix."
Because two different codons specify lysine (AAA and AAG), there are
eight different nucleotide sequences coding for Lys-Lys-Lys. The
frequency of these eight different sequences was predicted from the
codon usage for lysine in human protein-coding sequences (taken from
the studies of Nakamura et al. (16)). The differences between the
predicted frequencies of the different nucleotide sequences and the
actual frequencies in the 128 different human proteins were analyzed
with a χ² test (17).
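The prediction step can be illustrated with a short calculation. This is a sketch under the assumption that the three codons of a motif are chosen independently, with the overall human lysine codon usage reported in this paper (59.4% AAG, 40.6% AAA); the function name `predicted_frequencies` is hypothetical.

```python
from itertools import product

# Lysine codon usage in human coding sequences, as quoted in this paper.
P_AAA = 0.406
P_AAG = 0.594


def predicted_frequencies(p_aaa: float = P_AAA) -> dict:
    """Predicted frequency of each of the eight Lys-Lys-Lys coding sequences,
    assuming the three codons are chosen independently."""
    usage = {"AAA": p_aaa, "AAG": 1.0 - p_aaa}
    freqs = {}
    for codons in product(("AAA", "AAG"), repeat=3):
        prob = 1.0
        for codon in codons:
            prob *= usage[codon]
        freqs[" ".join(codons)] = prob
    return freqs


freqs = predicted_frequencies()
for seq, freq in sorted(freqs.items(), key=lambda item: item[1]):
    print(f"{seq}: {100 * freq:.1f}%")
```

The eight predicted frequencies sum to 1, and they serve as the expected values against which the observed motif counts are compared in the χ² test.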
Initially, we
sought to further define the mechanism for reading-frame restoration in
the apoB86 allele and to determine whether this phenomenon was unique
to hepatoma cells. We transiently transfected COS-7 cells with apoB
expression vectors and then analyzed the COS-7 cell extracts on Western
blots of SDS-polyacrylamide gels. As shown in Fig. 2 (lane
2), the pB18/100 vector yielded a full-length apoB
protein of the expected size (approximately apoB34-sized, migrating
ahead of the apoB37 size standard). The mutant vector pB18/86 yielded
the expected truncated protein (approximately apoB24-sized), but, in
addition, yielded small amounts of the full-length protein, indicating
that reading-frame restoration occurs in COS-7 cells. To test whether
reading-frame restoration depends on the creation of a long stretch of
adenines, we used site-directed mutagenesis to construct a mutant
expression vector (pB18/86:4AC3A) in which one of the adenines, rather
than the cytosine, was deleted (converting the wild-type sequence AAA
AAC AAA to AAA ACA AA) (Fig. 1). This construct contains a −1
frameshift within the same codon as that in the apoB86 allele, but does
not result in a long stretch of adenines. When transfected into COS-7 cells, the pB18/86:4AC3A vector yielded only a truncated apoB and no
full-length apoB (Fig. 2, lane 4), indicating that the reading-frame restoration requires the long stretch of adenines. The
frameshift mutation in pB18/86 was <100 bp 5′ of intron 26. Although
the 1-bp deletion did not appear to create a cryptic splice site (1),
we nevertheless thought it important to exclude the possibility that
an unusual or low frequency mRNA splicing event contributed to the
reading-frame restoration. To do so, we generated a pB18/86
expression vector entirely from cDNA (pB18/86cDNA) and analyzed
its expression in COS-7 cells. Western blot analysis revealed that
pB18/86cDNA yielded both the truncated and full-length apoB
proteins, indicating that reading-frame restoration can occur in the
absence of the nearby introns (Fig. 2, lane 5).
Reading-frame Restoration with an ApoB86-like Mutation Introduced into Another Site in the ApoB Gene
We next sought to test whether
the reading-frame restoration at the stretch of eight consecutive
adenines is a unique feature of a specific site within the apoB gene
(i.e. that reading-frame restoration might depend on the
sequences adjacent to the apoB86 mutation, perhaps because they form a
secondary structure that facilitates transcriptional stuttering) or is
a more general phenomenon that could occur at other sites. We generated
a mutant apoB46 expression vector (Fig. 3) containing an
apoB86-like frameshift mutation. The mutant apoB46 construct (pB46N8A)
contained a deletion of a single nucleotide (the guanine at cDNA
nucleotide 4553), which converts the sequence AAA AAG AAA to AAA AAA
AA. This −1 frameshift was predicted to yield 44 novel amino acids,
followed by a premature stop codon (TGA), generating an apoB32-sized
protein. If transcriptional slippage occurred along the stretch of
eight consecutive adenines, thereby correcting the reading frame, a full-length (apoB46-sized) protein would also be expected. In transient
expression studies in COS-7 cells, pB46N8A yielded the predicted
apoB32-sized protein, but also yielded a small amount of apoB46 (Fig.
4, lanes 5 and 6). To test whether
the reading-frame restoration was dependent on the stretch of eight
consecutive adenines, we generated another mutant construct, pB46N4AG3A
(Fig. 3), in which one of the adenines, rather than the guanine, was deleted (converting the sequence AAA AAG AAA to AAA AGA AA). This construct yielded only the truncated protein, apoB32, and none of the
full-length protein (Fig. 4, lanes 7 and 8). In
another construct, pB46N9A (Fig. 3), we substituted an adenine for the guanine, generating the sequence AAA AAA AAA. The predominant product
of this construct was a full-length apoB46; interestingly, however, a
small amount of a truncated, apoB32-sized protein was also produced,
suggesting that transcriptional slippage may have introduced a +1
frameshift at the site of the nine consecutive adenines (Fig. 4,
lanes 3 and 4). (A +1 frameshift would be
expected to yield a truncated, apoB32-sized protein.)
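The effect of these mutations on the reading frame can be sketched by translating a short window around the site. In the example below, the flanking nucleotides are taken from the mutagenic primer quoted in the methods; the wild-type window is an inference (the primer with the deleted guanine put back), and treating GAC as the first in-frame codon follows from AAA AAG AAA being codons 1447-1449. The sequences are written as coding-strand DNA, although slippage itself is proposed to occur in the mRNA. The names `CODON_TABLE` and `translate` are illustrative.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq: str) -> str:
    """Translate `seq` from its first nucleotide, ignoring a trailing partial codon."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


UPSTREAM, DOWNSTREAM = "GACTCC", "CAGCATT"   # flanks from the pB46N8A primer

wild_type = UPSTREAM + "AAAAAGAAA" + DOWNSTREAM   # inferred: guanine restored
mutant_8a = UPSTREAM + "A" * 8 + DOWNSTREAM       # pB46N8A, guanine deleted
slipped_9a = UPSTREAM + "A" * 9 + DOWNSTREAM      # one extra adenine inserted

for label, seq in [("wild type", wild_type),
                   ("8 adenines", mutant_8a),
                   ("8 adenines + 1 inserted", slipped_9a)]:
    print(f"{label:24s} {translate(seq)}")
```

In this window the deletion changes the residues downstream of the adenine run, and a single inserted adenine brings the downstream codons back into the wild-type frame, which is the behavior expected of reading-frame restoration by slippage.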
The expression of the pB46N8A and pB46N4AG3A constructs was further
tested by generating stably transformed McA-RH7777 cell lines and using
Western blots of SDS-polyacrylamide gels to examine the human
apoB-containing lipoproteins in the cell culture medium. Western blots
of the d < 1.21 g/ml lipoproteins from pB46N8A
transformants confirmed that pB46N8A results in the production of both
apoB32 and apoB46 (Fig. 5, lanes 3 and
4). In contrast, pB46N4AG3A stable transformants yielded
only apoB32 (Fig. 5, lanes 5 and 6), indicating that reading-frame restoration does not occur when the stretch of
adenines is interrupted.
To determine if a longer stretch of adenines results in reading-frame
restoration, we generated a pB46N11A expression vector (Fig. 3), which
was similar to pB46N8A except that it contained 11 consecutive adenines
in the −1 reading frame. Western blots of SDS-polyacrylamide gels
showed that the pB46N11A-transformed cells produced both the predicted
truncated apoB32 and the full-length apoB46 (Fig. 6,
lanes 7 and 8).
The production of the full-length apoB46 by the pB46N8A and pB46N11A
expression vectors appeared to be definitive (Figs. 4, 5, 6), but we
wanted to obtain additional evidence that would prove that the
apoB46-sized bands on the Western blots represented bona
fide human apoB46 and not a Western blot artifact. To achieve this
goal, we sought to demonstrate, using discontinuous gradient ultracentrifugation, that the apoB32- and apoB46-containing
lipoproteins produced by a pB46N11A transformant had distinctive
physical properties. Because apoB46 has more lipid-binding sequences
than apoB32, apoB46-containing lipoproteins should be more buoyant than
apoB32-containing lipoproteins (13). For example, in our previous
experiments with stably transformed McA-RH7777 cells (13), apoB31- and
apoB37-containing lipoproteins had peak densities of ~1.17 and
~1.14 g/ml, respectively, whereas apoB48-containing lipoproteins had
a peak density of ~1.10 g/ml. As predicted, the apoB46-containing
lipoproteins from the pB46N11A cell line were more buoyant than the
apoB32-containing lipoproteins (Fig. 7), demonstrating
that two distinctly different apoB proteins were generated from a
single construct containing a frameshift mutation.
Analysis of Polyadenine Stretches in Protein-coding Sequences
These apoB expression experiments demonstrated that
transcriptional slippage is likely to occur at long stretches of
adenines. This process can correct the reading frame when the stretch of adenines is in the −1 reading frame, but the same process could introduce a frameshift if the stretch of adenines were
in the proper reading frame (as suggested by the results with the
pB46N9A construct). The introduction of frameshift mutations, even at a
low rate, would seemingly be detrimental to any organism. Because
transcriptional slippage could be harmful, we hypothesized that
evolution exerted selective pressure against long stretches of adenines
within the protein-coding sequences of genes. To test this hypothesis,
we analyzed the nucleotide sequences coding for three consecutive
lysines in human proteins. Lysine can be specified by only two codons,
AAG and AAA. In an analysis of 9808 human protein-coding sequences from
the GenBankTM Data Bank, there were 268,178 lysine residues, of which
159,094 were specified by AAG (59.4%) and 109,084 were specified by
AAA (40.6%) (16). Because there are two codons for lysine, Lys-Lys-Lys
can be coded for by eight (2 × 2 × 2) different nucleotide
sequences. If codon usage within Lys-Lys-Lys motifs conformed to the
usual pattern (59.4% AAG codons and 40.6% AAA codons), the predicted
frequencies of Lys-Lys-Lys motifs coded by AAA AAA AAA and AAA AAA AAG
would be 6.7% (0.406 × 0.406 × 0.406) and 9.8%
(0.406 × 0.406 × 0.594), respectively. According to our
hypothesis, however, we would predict that these two sequences would
actually occur at a lower than predicted frequency. This suspicion was
confirmed: in a series of 150 Lys-Lys-Lys motifs from 128 different
human proteins, only 2.0% of the motifs were coded by AAA AAA AAA, and
only 2.7% were coded by AAA AAA AAG (Table I). In
contrast, nucleotide sequences containing only two or three consecutive
adenines (AAG AAG AAG or AAG AAG AAA) were noted at a greater than
predicted frequency (49.3% versus 35.3%) (Table
II). A χ² analysis revealed that the
observed nucleotide sequences for Lys-Lys-Lys motifs were significantly
different from those predicted by the lysine codon usage data from
human proteins (p = 0.003) (Table I). When the eight
different nucleotide sequences were segregated into groups containing
short (two or three), intermediate (five or six), and long (eight or
nine) stretches of adenines and analyzed, the differences remained
highly significant (p = 0.004) (Table II). Thus, there
appears to be little doubt that evolution has chosen to avoid long
stretches of adenines as a means to code for Lys-Lys-Lys.
The foregoing analysis depends on appropriate codon usage information
on which to base the "predicted" frequencies for the eight
nucleotide sequences specifying Lys-Lys-Lys. Because our predictions
were based on data from 9808 human coding sequences, we believe that it
is highly unlikely that faulty codon usage data would have caused us to
draw inaccurate conclusions. Nevertheless, because of the importance of
this issue, we calculated alternative predicted frequencies for the
Lys-Lys-Lys motifs using codon usage data from the 450 lysine
codons in the 150 Lys-Lys-Lys motifs. Because of the selective
pressure against long stretches of adenines, one would expect to find a
higher percentage of AAG codons and a lower percentage of AAA codons in
the subset of 450 lysine codons. This expectation was borne out: 66.4%
of the lysines were specified by AAG, and 33.6% were specified by AAA.
Using these codon usage data (which inherently are biased against our
hypothesis), we recalculated the predicted frequencies for the eight
possible nucleotide sequences coding for Lys-Lys-Lys and used
χ² analysis to compare the predicted frequencies with
those that were actually observed. Despite the inherent bias, this
analysis showed a significant difference between the predicted and
observed frequencies for the eight different nucleotide sequences
(p = 0.031), with the sequences containing long
stretches of adenines being under-represented relative to their
predicted frequency.
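The grouping of the eight sequences by adenine run length, and the effect of switching to the motif-derived codon usage, can be sketched as follows. This is an illustrative calculation only: it assumes independent codon choice, uses the two AAA frequencies quoted in the text (40.6% overall, 33.6% within the 450 motif codons), and does not reproduce the χ² tests, because the observed counts for all eight sequences are in the tables rather than the text. The function names are hypothetical.

```python
from itertools import product


def longest_a_run(seq: str) -> int:
    """Length of the longest uninterrupted run of adenines in `seq`."""
    best = current = 0
    for nt in seq:
        current = current + 1 if nt == "A" else 0
        best = max(best, current)
    return best


def group_frequencies(p_aaa: float) -> dict:
    """Predicted share of Lys-Lys-Lys motifs whose coding sequence has a
    short (2-3), intermediate (5-6), or long (8-9) adenine run."""
    groups = {"short": 0.0, "intermediate": 0.0, "long": 0.0}
    for codons in product(("AAA", "AAG"), repeat=3):
        prob = 1.0
        for codon in codons:
            prob *= p_aaa if codon == "AAA" else 1.0 - p_aaa
        run = longest_a_run("".join(codons))
        if run <= 3:
            groups["short"] += prob
        elif run <= 6:
            groups["intermediate"] += prob
        else:
            groups["long"] += prob
    return groups


overall = group_frequencies(0.406)       # usage in 9808 human coding sequences
motif_based = group_frequencies(0.336)   # usage within the 450 motif codons

for label, groups in [("overall usage", overall), ("motif usage", motif_based)]:
    summary = ", ".join(f"{k} {100 * v:.1f}%" for k, v in groups.items())
    print(f"{label}: {summary}")
```

Lowering the assumed AAA frequency lowers the predicted share of long adenine runs, which is why the motif-derived codon usage is the more conservative basis for the comparison.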
In earlier studies, our laboratory demonstrated that a unique apoB allele (the apoB86 allele) results in the production of both a truncated apoB (apoB86) and a full-length apoB100 (1, 6, 7). The production of apoB86 was due to a single cytosine deletion in exon 26 of the apoB gene (1). Remarkably, the production of apoB100 by the apoB86 allele was due to the transcriptional insertion of an extra adenine into a stretch of eight consecutive adenines created by the 1-bp deletion (1). In other words, transcriptional slippage by RNA polymerase "corrected" the mutant reading frame and allowed apoB100 to be synthesized. In this study, we demonstrated that reading-frame restoration at this site does not occur when the long stretch of adenines is interrupted by a cytosine. This finding in mammalian cells is consistent with the experimental observations of Wagner et al. (3) in E. coli. They found that the insertion of a guanine into a stretch of 11 consecutive adenines prevented RNA polymerase slippage and prevented reading-frame restoration with an "out-of-frame" lacZ construct. Another of our experimental aims was to determine whether reading-frame restoration is unique to liver cells. The fact that reading-frame restoration occurred in COS-7 cells strongly suggests that nonhepatic cell lines have the capacity for transcriptional slippage.
Another aim was to determine whether reading-frame restoration would
occur at other long stretches of adenines in the apoB gene or whether
it was unique to the specific sequence that we had fortuitously
encountered in a human kindred. To test this issue, we introduced an
apoB86-like frameshift mutation (with eight consecutive adenines in the
−1 reading frame) into the coding sequences of an apoB46 cDNA
expression vector and expressed the mutant plasmid (pB46N8A) in COS-7
cells and McA-RH7777 cells. In both cell lines, pB46N8A yielded the
expected truncated protein (apoB32) as well as the full-length apoB46,
indicating that reading-frame restoration occurs at other long
stretches of adenines in the apoB gene. A construct containing 11 consecutive adenines in the
−1 reading frame (pB46N11A) also yielded
both apoB32 and apoB46, but a construct lacking the uninterrupted
stretch of adenines (pB46N4AG3A) yielded only the truncated protein.
The production of apoB46 in these experiments cannot be dismissed as a
Western blot artifact. As illustrated by the density gradient
ultracentrifugation experiment (Fig. 7), apoB46 was
packaged into lipoproteins with density profiles that
were quite distinct from those of lipoproteins containing apoB32.
Reading-frame restoration could conceivably occur by several
mechanisms. For example, the sequence A AAA AAC, when positioned immediately upstream from a stable RNA stem-loop structure, can lead to
ribosomal frameshifting in the −1 direction in mammalian systems (18,
19). Similarly, the sequence A AAA AAG is associated with ribosomal
frameshifting in E. coli, again when located in close
proximity to an RNA stem-loop structure (20-23). If the ribosome were
to move backwards a single nucleotide along a long stretch of adenines
in the −1 reading frame, the altered reading frame would obviously be
corrected. In the case of the apoB86 and apoB46 constructs, we were not
able to identify sequences downstream from the mutations that would
yield a stable RNA stem-loop structure. Therefore, we doubt that
ribosomal frameshifting contributed significantly to the reading-frame
restoration. Of course, reading-frame restoration could conceivably
occur at the DNA level, if DNA polymerase introduced an extra adenine
into a long stretch of adenines. However, we did look for genomic DNA
sequence heterogeneity in the analysis of the apoB86 allele and found
none (1). Finally, reading-frame restoration could be due to
transcriptional slippage by RNA polymerase. Reading-frame restoration
due to transcriptional slippage has been well documented in the case of
the apoB86 allele (1) and in lacZ constructs in E. coli (3, 24). Based on those precedents, we believe that it is
overwhelmingly likely that transcriptional slippage was
responsible for the reading-frame restoration that we observed in the
experiments described here.
In the case of the apoB86 allele, we demonstrated that reading-frame restoration was due to the transcriptional insertion of an extra adenine into the stretch of eight consecutive adenines (1). Although the extra adenine corrected the frameshift mutation, we hypothesized that the same mechanism could introduce frameshift mutations whenever long stretches of adenines occurred in protein-coding sequences. In our experiments, we obtained evidence that this might indeed occur. A construct containing nine consecutive adenines in the 0 reading frame (pB46N9A) yielded primarily apoB46, but also a small amount of a truncated, apoB32-sized protein, strongly suggesting that transcriptional slippage along the stretch of nine adenines may have introduced a +1 frameshift into the apoB mRNA. Because the introduction of frameshifts, even at a low frequency, could be detrimental to an organism, we further hypothesized that nature would select against long stretches of adenines in protein-coding sequences. Data included in this paper strongly support this hypothesis. Analysis of the nucleotide sequences coding for 150 Lys-Lys-Lys motifs in 128 human proteins demonstrated that the sequences AAA AAA AAA and AAA AAA AAG were utilized at a significantly lower than predicted frequency, suggesting that evolution has exerted selective pressure against long stretches of adenines within protein-coding sequences.
We thank Y. Marcel and R. Milne for monoclonal antibody 1D1 and B. Blackhart, Z. Yao, and B. McCarthy for pB46neo.