Reading-frame Restoration by Transcriptional Slippage at Long Stretches of Adenine Residues in Mammalian Cells*

(Received for publication, December 9, 1996, and in revised form, March 17, 1997)

MacRae F. Linton‡§, Martin Raabe∥, Vincenzo Pierotti, and Stephen G. Young∥**‡‡

From the Gladstone Institute of Cardiovascular Disease, the ∥Cardiovascular Research Institute, and the **Department of Medicine, University of California, San Francisco, California 94141-9100 and the ‡Departments of Medicine and Pharmacology, Vanderbilt University School of Medicine, Nashville, Tennessee 37232



ABSTRACT

We previously characterized a mutant apoB allele (the apoB86 allele) that produces both a truncated apoB (apoB86) and a full-length apoB100. The mutant allele contained a deletion of a single cytosine in exon 26, creating a stretch of eight consecutive adenines in the -1 reading frame. The altered reading frame was restored, with ~10% efficiency, by the transcriptional insertion of an extra adenine into the stretch of eight consecutive adenines, thereby accounting for the synthesis of the full-length apoB100. Here, we demonstrate that this reading-frame restoration does not occur when the long stretch of adenines is interrupted by a cytosine. To assess whether reading-frame restoration is unique to a single site in the apoB gene, the same mutation (eight consecutive adenines in the -1 reading frame) was inserted into another site within the apoB gene. Reading-frame restoration occurred at the second site and was abrogated when the stretch of adenines was interrupted by another base. Of note, a computerized analysis of human cDNA sequences revealed that long stretches of adenines in protein-coding sequences occur at a lower than predicted frequency, suggesting that evolution has selected against these sequences.


INTRODUCTION

In 1992, we reported the existence of a mutant apoB allele causing familial hypobetalipoproteinemia, the apoB86 allele (1). This allele, which resulted in the synthesis of both a truncated apoB (apoB86) and a full-length apoB100, contained a deletion of a single cytosine residue in exon 26 of the apoB gene. This frameshift mutation was predicted to yield a stretch of 20 novel amino acids, followed by a premature stop codon. That this mutation was responsible for the production of apoB86 was proved by immunochemical studies utilizing monoclonal antibodies and an anti-peptide antibody directed against the 20 novel amino acids at the carboxyl terminus of apoB86 (1). The production of apoB100 from this allele involved a novel mechanism: reading-frame restoration by transcriptional insertion of an extra nucleotide at the site of the 1-base pair (bp)¹ deletion. The deletion of the single cytosine residue in the apoB86 allele created a stretch of eight consecutive adenine residues. Minigene expression studies in cultured hepatoma cells demonstrated that the single cytosine deletion, along with the stretch of eight consecutive adenines, was faithfully present in the genomic DNA, but that ~10% of apoB86 cDNA clones actually contained nine consecutive adenines (1). Thus, the production of apoB100 by the apoB86 allele was due to the transcriptional insertion of an extra adenine, which restored the proper reading frame to the mutant mRNA. These studies provided the first in vivo evidence of "transcriptional slippage" (or "stuttering") along a long stretch of consecutive adenines in mammalian cells.

Slippage of RNA polymerase during transcription was first proposed by Chamberlin and Berg in 1962 (2). In recent years, Wagner et al. (3) described transcriptional slippage by Escherichia coli RNA polymerase during RNA elongation at runs of 10 or more adenines or thymines, resulting in the addition of untemplated thymine or adenine residues and restoration of the proper reading frame to out-of-frame lacZ constructs. Interestingly, RNA polymerase stuttering was not observed when similar experiments were performed in yeast (3).

This study was undertaken to define further the DNA sequence requirements for reading-frame restoration in the apoB86 allele. We also sought to determine whether the reading-frame restoration was somehow unique to the apoB86 allele or would occur at similar sequences elsewhere in the apoB gene. Finally, because transcriptional slippage could potentially introduce frameshift mutations, we hypothesized that evolution may have selected against the presence of long stretches of adenines in protein-coding sequences. To assess this possibility, we analyzed whether long stretches of adenines occur at a lower than predicted frequency in human proteins containing three consecutive lysine residues.


MATERIALS AND METHODS

Human Subjects

Blood was collected from H. J. B. (4-7), and plasma was used to prepare very low density lipoproteins, which were used as a source of apoB size standards for Western blot experiments. H. J. B. is a compound heterozygote for hypobetalipoproteinemia with two mutant apoB alleles: an apoB37 allele (7, 8) and an apoB86 allele. The apoB37 allele yields exclusively apoB37 (6, 7), whereas the apoB86 allele yields apoB100, apoB86, and apoB48 (1, 6, 7). ApoB48 is produced by the apoB86 allele as a result of apoB mRNA editing in the intestine (9).

ApoB86 Minigene and cDNA Expression Vectors

The apoB86 allele contains a single cytosine deletion (cDNA nucleotide 11840) in exon 26 of the apoB gene, which generates a stretch of eight consecutive adenines in the -1 reading frame (i.e. the mutation changes the sequence AAA AAC AAA to AAA AAA AA). An apoB fusion minigene vector (pB18/86) containing the 1-bp deletion found in the apoB86 allele and a wild-type vector lacking the mutation (pB18/100) have been described (1). For this study, we mutated the pB18/100 expression vector to delete one of the adenines immediately preceding cytosine 11840 instead of cytosine 11840 itself. This construct contained the sequence AAA ACA AA and was designated pB18/86:4AC3A (because it had four adenines, one cytosine, and then three more adenines) (Fig. 1). Thus, pB18/86:4AC3A contained the same -1 frameshift as the apoB86 allele, but lacked the long stretch of adenines. To create pB18/86:4AC3A, we used the mutagenesis technique of Deng and Nickoloff (10) and a mutagenic primer (5'-GCCAGTTTGAAAACAAAGCAGAT-3'). In addition, we generated pB18/86cDNA (Fig. 1), which is identical to pB18/86, except that it was constructed entirely from cDNA clones. For this clone, a 2119-bp BamHI-HindIII cDNA fragment, in place of the 3567-bp BamHI-HindIII genomic fragment, was ligated to the pB18 cDNA vector. The apoB86 mutation was introduced into the vector by site-directed mutagenesis. All mutations were confirmed by DNA sequencing.
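The two sequence changes described above can be illustrated with a short sketch. It is a minimal illustration only: the function names (`delete_base`, `longest_run`) and the 0-based positions within the nine-nucleotide window are assumptions introduced here and are not part of the original methods.

```python
def delete_base(seq: str, index: int) -> str:
    """Return seq with the base at the given 0-based index removed."""
    return seq[:index] + seq[index + 1:]


def longest_run(seq: str, base: str = "A") -> int:
    """Return the length of the longest uninterrupted run of `base` in seq."""
    best = current = 0
    for nucleotide in seq:
        current = current + 1 if nucleotide == base else 0
        best = max(best, current)
    return best


# Wild-type window AAA AAC AAA, as given in the text (spaces removed).
WILD_TYPE = "AAAAACAAA"

# apoB86 mutation: delete the cytosine (index 5 in this window).
apob86 = delete_base(WILD_TYPE, 5)

# pB18/86:4AC3A: delete the adenine immediately preceding the cytosine (index 4).
interrupted = delete_base(WILD_TYPE, 4)

for name, seq in [("wild type", WILD_TYPE), ("apoB86", apob86), ("4AC3A", interrupted)]:
    print(f"{name:10s} {seq:10s} longest adenine run = {longest_run(seq)}")
```

Both deletions remove one nucleotide and therefore produce the same -1 frameshift, but only the cytosine deletion yields an uninterrupted run of eight adenines.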


Fig. 1. Schematic diagram of the apoB fusion protein expression vectors. To generate pB18/100, a wild-type (wt) 3567-bp BamHI-HindIII fragment from the 3'-end of the human apoB gene (spanning exons 26-29) was ligated in frame to a pB18 cDNA expression vector (1). The inset shows the changes in the apoB-coding sequences generated by site-directed mutagenesis (see "Materials and Methods"). The cytosine that is deleted in the apoB86 allele (apoB cDNA nucleotide 11840) is indicated. Frame indicates the reading frame. CMV, cytomegalovirus promoter and enhancer; hGH, human growth hormone terminator; SV40, simian virus 40 enhancer.

Introduction of an "ApoB86-like" Frameshift Mutation into a cDNA Expression Vector Coding for ApoB46

A cDNA expression vector coding for apoB46, pB46N (originally designated pB46neo; constructed in the neo-containing vector pRC/CMV (Invitrogen, San Diego, CA)), was obtained from Dr. Zemin Yao (University of Ottawa, Ottawa, Canada) (11). The apoB-coding sequence of pB46N terminates at the EcoRI site at apoB cDNA nucleotide 6507 (corresponding to apoB amino acid residue 2099). To introduce a mutation similar to that observed in the apoB86 allele, we used site-directed mutagenesis (with the primer 5'-GACTCCAAAAAAAACAGCATT-3') to delete a single guanine residue (apoB cDNA nucleotide 4553) from the middle of a long stretch of adenines, changing the sequence AAA AAG AAA (coding for amino acids 1447-1449) to eight consecutive adenines in the -1 reading frame (AAA AAA AA) (see Fig. 3). The vector pB46N8A was predicted to code for a truncated apoB protein (apoB32) containing 1492 amino acids (including 44 novel amino acids at the carboxyl terminus). An apoB46 expression vector (pB46N9A) containing nine consecutive adenines was constructed by mutating the guanine at cDNA nucleotide 4553 to an adenine. This vector did not have a frameshift mutation and therefore was predicted to yield the full-length apoB protein. We also generated another vector (pB46N11A) containing 11 consecutive adenines in the -1 reading frame (see Fig. 3).


Fig. 3. Schematic diagram of apoB46 expression vectors. pB46N is a pRC/CMV mammalian expression vector containing apoB100 nucleotides 1-6507, which codes for the amino-terminal 46% of human apoB100. The inset shows the mutations introduced by site-directed mutagenesis. Frame indicates the reading frame. CMV, cytomegalovirus promoter and enhancer; bGH, bovine growth hormone transcription and termination signals; Neo, neomycin resistance gene; wt, wild-type. The guanine (apoB cDNA nucleotide 4553) that was deleted to create the apoB86-like mutation is indicated.

Transfection of COS-7 Cells

COS-7 cells were grown to ~30% confluency in T-75 flasks and transfected with 10-15 µg of plasmid DNA by calcium phosphate coprecipitation and glycerol shock as described previously (12). Two days after the transfection, the cells were washed twice with phosphate-buffered saline, scraped from the plates, and pelleted by centrifugation. The cell pellet was dissolved in 200 µl of sample buffer (62.5 mM Tris-HCl, pH 6.8, 20% glycerol, 2% SDS, and 5% 2-mercaptoethanol) and sonicated before electrophoresis on SDS-3-12% polyacrylamide gels. As a control for these experiments, COS-7 cells were transfected with an apoB33 cDNA expression vector, pB33 (13).

Selection of Stable McA-RH7777 Cell Lines

Stable transformants of McA-RH7777 cells (a rat hepatoma cell line) were obtained by cotransfecting cells with 10-15 µg of minigene DNA with pSV2neo (at a molar ratio of 20:1), followed by selection with G418 (400 µg/ml) (1, 13). The pB46neo series of constructs (see Fig. 3) contained a neo gene in the plasmid backbone, obviating the need for cotransfection with pSV2neo. Individual G418-resistant colonies were picked, and stable cell lines were propagated in medium containing G418. To analyze apoB secretion from the cells, ~30% confluent cells were incubated in serum-free Dulbecco's modified Eagle's medium for 2 days. The conditioned medium was then harvested and analyzed for apoB expression by Western blotting. As controls for these experiments, we analyzed the human apoB-containing lipoproteins secreted by McA-RH7777 cells stably transformed with apoB31 or apoB37 cDNA expression vectors (pB31 and pB37, respectively) (13) and McA-RH7777 cells stably transformed with p158, a P1 bacteriophage clone spanning the human apoB gene (14).

Western Blot Analysis of Human ApoB Proteins

Human apoB proteins from transfected COS-7 cell lysates or from the medium from stably transformed McA-RH7777 cells were detected on Western blots of SDS-3-12% polyacrylamide gels (14) using the human apoB-specific monoclonal antibody 1D1 (which binds to an epitope between apoB amino acids 474 and 539) (15). The human apoB-containing lipoproteins in the medium were either concentrated by adsorption onto Cab-O-Sil (13) or partially purified from the medium by preparing the d < 1.21 g/ml lipoproteins by ultracentrifugation.

Density Gradient Ultracentrifugation

The medium from an McA-RH7777 cell line transfected with pB46N11A was analyzed by discontinuous density gradient ultracentrifugation (7, 13). All density fractions were dialyzed against phosphate-buffered saline containing 1.0 mM EDTA before analysis by Western blotting of SDS-3-12% polyacrylamide gels.

Analysis of Codon Usage in Human Proteins with Lys-Lys-Lys Motifs

A total of 150 Lys-Lys-Lys motifs from 128 different proteins were identified in the SWISS-PROT data base (Release 33) using the BLITZ program from the European Bioinformatics Institute.² The nucleotide sequence for the Lys-Lys-Lys motif was then recorded. Proteins with more than one isoenzyme were included only once because including multiple isoenzymes might bias the results of the analysis (i.e. the isoenzymes might have arisen by alternative splicing events or by a relatively recent gene duplication event). The accession numbers for the 128 human proteins, together with the nucleotide sequences coding for the Lys-Lys-Lys motif, are listed in the "Appendix."

Because two different codons specify lysine (AAA and AAG), there are eight different nucleotide sequences coding for Lys-Lys-Lys. The frequency of these eight different sequences was predicted from the codon usage for lysine in human protein-coding sequences (taken from the studies of Nakamura et al. (16)).³ The differences between the predicted frequencies of the different nucleotide sequences and the frequencies actually observed in the 128 different human proteins were analyzed with a χ² test (17).
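The prediction step can be sketched as follows, assuming (as the Results state) that each lysine codon in a motif is chosen independently with the general human usage of 59.4% AAG and 40.6% AAA. The names `LYSINE_CODON_USAGE` and `predicted_frequencies` are illustrative and do not come from the original analysis.

```python
from itertools import product

# Lysine codon usage in human coding sequences, as quoted in the Results (16).
LYSINE_CODON_USAGE = {"AAG": 0.594, "AAA": 0.406}


def predicted_frequencies(usage=LYSINE_CODON_USAGE):
    """Return the predicted frequency of each of the eight Lys-Lys-Lys
    coding sequences, assuming the three codons are chosen independently."""
    frequencies = {}
    for codons in product(usage, repeat=3):
        probability = 1.0
        for codon in codons:
            probability *= usage[codon]
        frequencies[" ".join(codons)] = probability
    return frequencies


predicted = predicted_frequencies()
for sequence, frequency in sorted(predicted.items(), key=lambda item: -item[1]):
    print(f"{sequence}  {frequency:.3f}")
```

Under this independence assumption the eight frequencies sum to 1, and the all-AAA sequence has the lowest predicted frequency (0.406 × 0.406 × 0.406).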


RESULTS

Reading-frame Restoration in the ApoB86 Allele

Initially, we sought to further define the mechanism for reading-frame restoration in the apoB86 allele and to determine whether this phenomenon was unique to hepatoma cells. We transiently transfected COS-7 cells with apoB expression vectors and then analyzed the COS-7 cell extracts on Western blots of SDS-polyacrylamide gels. As shown in Fig. 2 (lane 2), the pB18/100 vector yielded a full-length apoB protein of the expected size (approximately apoB34-sized, migrating ahead of the apoB37 size standard). The mutant vector pB18/86 yielded the expected truncated protein (approximately apoB24-sized), but, in addition, yielded small amounts of the full-length protein, indicating that reading-frame restoration occurs in COS-7 cells. To test whether reading-frame restoration depends on the creation of a long stretch of adenines, we used site-directed mutagenesis to construct a mutant expression vector (pB18/86:4AC3A) in which one of the adenines, rather than the cytosine, was deleted (converting the wild-type sequence AAA AAC AAA to AAA ACA AA) (Fig. 1). This construct contains a -1 frameshift within the same codon as that in the apoB86 allele, but does not result in a long stretch of adenines. When transfected into COS-7 cells, the pB18/86:4AC3A vector yielded only a truncated apoB and no full-length apoB (Fig. 2, lane 4), indicating that the reading-frame restoration requires the long stretch of adenines. The frameshift mutation in pB18/86 was <100 bp 5' of intron 26. Although the 1-bp deletion did not appear to create a cryptic splice site (1), we nevertheless thought it important to evaluate the possibility that an unusual or low frequency mRNA splicing event contributed to the reading-frame restoration. To address this possibility, we generated a pB18/86 expression vector entirely from cDNA (pB18/86cDNA) and analyzed its expression in COS-7 cells. Western blot analysis revealed that pB18/86cDNA yielded both the truncated and full-length apoB proteins, indicating that reading-frame restoration can occur in the absence of the nearby introns (Fig. 2, lane 5).


Fig. 2. Western blot analysis of the apoB proteins produced by transiently transfected COS-7 cells. Cell lysates from transiently transfected COS-7 cells were subjected to electrophoresis on SDS-3-12% polyacrylamide gels; the separated proteins were then transferred to Immobilon-P membranes for Western blot analysis with the apoB-specific monoclonal antibody 1D1. Shown are proteins from nontransfected COS-7 cells (lane 1); human apoB fusion proteins from COS-7 cells transfected with pB18/100 (lane 2), with pB18/86 (lane 3), with pB18/86:4AC3A (lane 4), and with pB18/86cDNA (lane 5); and apoB48/apoB37 size markers from the very low density lipoproteins isolated from the plasma of H. J. B. (lane 6).

Reading-frame Restoration with an ApoB86-like Mutation Introduced into Another Site in the ApoB Gene

We next sought to test whether the reading-frame restoration at the stretch of eight consecutive adenines is a unique feature of a specific site within the apoB gene (i.e. that reading-frame restoration might depend on the sequences adjacent to the apoB86 mutation, perhaps because they form a secondary structure that facilitates transcriptional stuttering) or is a more general phenomenon that could occur at other sites. We generated a mutant apoB46 expression vector (Fig. 3) containing an apoB86-like frameshift mutation. The mutant apoB46 construct (pB46N8A) contained a deletion of a single nucleotide (the guanine at cDNA nucleotide 4553), which converts the sequence AAA AAG AAA to AAA AAA AA. This -1 frameshift was predicted to yield 44 novel amino acids, followed by a premature stop codon (TGA), generating an apoB32-sized protein. If transcriptional slippage occurred along the stretch of eight consecutive adenines, thereby correcting the reading frame, a full-length (apoB46-sized) protein would also be expected. In transient expression studies in COS-7 cells, pB46N8A yielded the predicted apoB32-sized protein, but also yielded a small amount of apoB46 (Fig. 4, lanes 5 and 6). To test whether the reading-frame restoration was dependent on the stretch of eight consecutive adenines, we generated another mutant construct, pB46N4AG3A (Fig. 3), in which one of the adenines, rather than the guanine, was deleted (converting the sequence AAA AAG AAA to AAA AGA AA). This construct yielded only the truncated protein, apoB32, and none of the full-length protein (Fig. 4, lanes 7 and 8). In another construct, pB46N9A (Fig. 3), we substituted an adenine for the guanine, generating the sequence AAA AAA AAA. 
The predominant product of this construct was a full-length apoB46; interestingly, however, a small amount of a truncated, apoB32-sized protein was also produced, suggesting that transcriptional slippage may have introduced a +1 frameshift at the site of the nine consecutive adenines (Fig. 4, lanes 3 and 4). (A +1 frameshift would be expected to yield a truncated, apoB32-sized protein.)
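The reading-frame logic behind these constructs can be sketched in a few lines. This is an illustration under stated assumptions: the 8A, 9A, and 4AG3A sequences are those given in the text, whereas the pB46N11A sequence is assumed here to be 11 adenines replacing the same nine-nucleotide window (the exact sequence appears only in Fig. 3). The helper names are hypothetical.

```python
WILD_TYPE = "AAAAAGAAA"  # AAA AAG AAA, codons for amino acids 1447-1449

CONSTRUCTS = {
    "pB46N": WILD_TYPE,
    "pB46N8A": "A" * 8,
    "pB46N9A": "A" * 9,
    "pB46N11A": "A" * 11,      # ASSUMPTION: 11 adenines replace the same window
    "pB46N4AG3A": "AAAAGAAA",  # AAA AGA AA, run of adenines interrupted
}


def relative_frame(seq: str, reference: str = WILD_TYPE) -> int:
    """Return the downstream reading frame relative to wild type: -1, 0, or +1."""
    remainder = (len(seq) - len(reference)) % 3
    return {0: 0, 1: 1, 2: -1}[remainder]


def add_untemplated_adenine(run: str) -> str:
    """Model transcriptional slippage as one extra adenine in an
    uninterrupted adenine run (a simplification of the proposed mechanism)."""
    if set(run) != {"A"}:
        raise ValueError("slippage is modeled only for uninterrupted adenine runs")
    return run + "A"


for name, seq in CONSTRUCTS.items():
    print(f"{name:11s} template frame = {relative_frame(seq):+d}")

# One extra adenine restores the frame for 8A but shifts the in-frame 9A to +1.
restored_8a = relative_frame(add_untemplated_adenine(CONSTRUCTS["pB46N8A"]))
shifted_9a = relative_frame(add_untemplated_adenine(CONSTRUCTS["pB46N9A"]))
print("8A after slippage:", restored_8a, "| 9A after slippage:", shifted_9a)
```

In this simplified model, a single inserted adenine moves pB46N8A back into frame (full-length apoB46), whereas the same event in pB46N9A produces a +1 frameshift, consistent with the small amount of truncated protein observed.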


Fig. 4. Western blot analysis of the apoB proteins produced by transiently transfected COS-7 cells. COS-7 cell lysates were electrophoresed on SDS-3-12% polyacrylamide gels; the separated proteins were then transferred to Immobilon-P for Western blot analysis with monoclonal antibody 1D1. Shown are apoB48/apoB37 size markers from the very low density lipoproteins isolated from the plasma of H. J. B. (lane 1); lysates from COS-7 cells transfected with pB46N (lane 2), with pB46N9A (lanes 3 and 4), with pB46N8A (lanes 5 and 6), and with pB46N4AG3A (lanes 7 and 8); and apoB proteins from COS-7 cells transfected with pB33, a plasmid coding for apoB33 (lane 9).

The expression of the pB46N8A and pB46N4AG3A constructs was further tested by generating stably transformed McA-RH7777 cell lines and using Western blots of SDS-polyacrylamide gels to examine the human apoB-containing lipoproteins in the cell culture medium. Western blots of the d < 1.21 g/ml lipoproteins from pB46N8A transformants confirmed that pB46N8A results in the production of both apoB32 and apoB46 (Fig. 5, lanes 3 and 4). In contrast, pB46N4AG3A stable transformants yielded only apoB32 (Fig. 5, lanes 5 and 6), indicating that reading-frame restoration does not occur when the stretch of adenines is interrupted.


Fig. 5. Western blot analysis of the human apoB-containing lipoproteins secreted by McA-RH7777 cells stably transformed with pB46N, pB46N8A, or pB46N4AG3A. The d < 1.21 g/ml lipoproteins from the cell culture medium were prepared by ultracentrifugation and size-fractionated on SDS-3-12% polyacrylamide gels. The separated proteins were then transferred to Immobilon-P for Western blot analysis with monoclonal antibody 1D1. Shown are lipoproteins from nontransfected McA-RH7777 cells (lane 1) and from cells stably transformed with pB46N (lane 2), with pB46N8A (lanes 3 and 4), and with pB46N4AG3A (lanes 5 and 6) and the very low density lipoproteins (VLDL) isolated from the plasma of H. J. B. (lane 7).

To determine if a longer stretch of adenines results in reading-frame restoration, we generated a pB46N11A expression vector (Fig. 3), which was similar to pB46N8A except that it contained 11 consecutive adenines in the -1 reading frame. Western blots of SDS-polyacrylamide gels showed that the pB46N11A-transformed cells produced both the predicted truncated apoB32 and the full-length apoB46 (Fig. 6, lanes 7 and 8).


Fig. 6. Western blot analysis of the human apoB proteins secreted by McA-RH7777 cells stably transformed with pB31, pB37, p158, or pB46N11A. The proteins in the cell culture medium were concentrated by adsorption onto Cab-O-Sil, size-fractionated on SDS-3-12% polyacrylamide gels, and analyzed by Western blotting with monoclonal antibody 1D1. Shown are proteins from cells stably transformed with pB31 (lanes 1 and 2), with pB37 (lanes 3 and 4), with p158 (lanes 5 and 6), and with pB46N11A (lanes 7 and 8).

The production of the full-length apoB46 by the pB46N8A and pB46N11A expression vectors appeared to be definitive (Figs. 4, 5, 6), but we wanted to obtain additional evidence that would prove that the apoB46-sized bands on the Western blots represented bona fide human apoB46 and not a Western blot artifact. To achieve this goal, we sought to demonstrate, using discontinuous gradient ultracentrifugation, that the apoB32- and apoB46-containing lipoproteins produced by a pB46N11A transformant had distinctive physical properties. Because apoB46 has more lipid-binding sequences than apoB32, apoB46-containing lipoproteins should be more buoyant than apoB32-containing lipoproteins (13). For example, in our previous experiments with stably transformed McA-RH7777 cells (13), apoB31- and apoB37-containing lipoproteins had peak densities of ~1.17 and ~1.14 g/ml, respectively, whereas apoB48-containing lipoproteins had a peak density of ~1.10 g/ml. As predicted, the apoB46-containing lipoproteins from the pB46N11A cell line were more buoyant than the apoB32-containing lipoproteins (Fig. 7), demonstrating that two distinctly different apoB proteins were generated from a single construct containing a frameshift mutation.


Fig. 7. Separation, by density gradient ultracentrifugation, of apoB32- and apoB46-containing lipoproteins produced by McA-RH7777 cells stably transformed with pB46N11A. The medium from a pB46N11A transformant was subjected to discontinuous density gradient ultracentrifugation (see "Materials and Methods"). A preliminary analysis revealed that fractions 1 and 2 (d = 1.063 and 1.071 g/ml, respectively) did not contain human apoB. To analyze the distribution of human apoB32 and apoB46 in the gradient, fractions 3-14 were size-fractionated on an SDS-3-12% polyacrylamide gel and analyzed by Western blotting with monoclonal antibody 1D1. A sample of the unfractionated cell culture medium was included on the same gel.

Analysis of Polyadenine Stretches in Protein-coding Sequences

These apoB expression experiments demonstrated that transcriptional slippage is likely to occur at long stretches of adenines. This process can correct the reading frame when the stretch of adenines is in the -1 reading frame, but the same process could introduce a frameshift if the stretch of adenines were in the proper reading frame (as suggested by the results with the pB46N9A construct). The introduction of frameshift mutations, even at a low rate, would seemingly be detrimental to any organism. Because transcriptional slippage could be harmful, we hypothesized that evolution exerted selective pressure against long stretches of adenines within the protein-coding sequences of genes. To test this hypothesis, we analyzed the nucleotide sequences coding for three consecutive lysines in human proteins. Lysine can be specified by only two codons, AAG and AAA. In an analysis of 9808 human protein-coding sequences from the GenBank Data Bank, there were 268,178 lysine residues, of which 159,094 were specified by AAG (59.4%) and 109,084 were specified by AAA (40.6%) (16). Because there are two codons for lysine, Lys-Lys-Lys can be coded for by eight (2 × 2 × 2) different nucleotide sequences. If codon usage within Lys-Lys-Lys motifs conformed to the usual pattern (59.4% AAG codons and 40.6% AAA codons), the predicted frequencies of Lys-Lys-Lys motifs coded by AAA AAA AAA and AAA AAA AAG would be 6.7% (0.406 × 0.406 × 0.406) and 9.8% (0.406 × 0.406 × 0.594), respectively. According to our hypothesis, however, we would predict that these two sequences would actually occur at a lower than predicted frequency. This prediction was confirmed: in a series of 150 Lys-Lys-Lys motifs from 128 different human proteins, only 2.0% of the motifs were coded by AAA AAA AAA, and only 2.7% were coded by AAA AAA AAG (Table I).
In contrast, nucleotide sequences containing only two or three consecutive adenines (AAG AAG AAG or AAG AAG AAA) were noted at a greater than predicted frequency (49.3% versus 35.3%) (Table II). A χ² analysis revealed that the observed nucleotide sequences for Lys-Lys-Lys motifs were significantly different from those predicted by the lysine codon usage data from human proteins (p = 0.003) (Table I). When the eight different nucleotide sequences were segregated into groups containing short (two or three), intermediate (five or six), and long (eight or nine) stretches of adenines and analyzed, the differences remained highly significant (p = 0.004) (Table II). Thus, these data strongly suggest that evolution has selected against long stretches of adenines as a means to code for Lys-Lys-Lys.

Table I. Nucleotides coding for 150 Lys-Lys-Lys motifs in 128 human proteins from the SWISS-PROT data base

Nucleotide sequence    Predicted frequencyᵃ    Observed frequencyᵇ

AAG AAG AAG            0.210                   0.267
AAG AAG AAA            0.143                   0.227
AAG AAA AAG            0.143                   0.147
AAA AAG AAG            0.143                   0.107
AAA AAG AAA            0.098                   0.113
AAG AAA AAA            0.098                   0.093
AAA AAA AAG            0.098                   0.027
AAA AAA AAA            0.067                   0.020

ᵃ Based on the codon usage data base from 9808 human coding sequences (16).
ᵇ p = 0.003 (χ²).

Table II. Codon usage in 150 Lys-Lys-Lys motifs in 128 human proteins from the SWISS-PROT data base

                        Short stretch of      Intermediate stretch of    Long stretch of
                        adenines (2 or 3)     adenines (5 or 6)          adenines (8 or 9)

Nucleotide sequence     AAG AAG AAG           AAG AAA AAG                AAA AAA AAG
                        AAG AAG AAA           AAA AAG AAG                AAA AAA AAA
                                              AAA AAG AAA
                                              AAG AAA AAA
Predicted frequencyᵃ    0.353                 0.482                      0.165
Observed frequencyᵇ     0.493                 0.460                      0.047

ᵃ Based on the codon usage data base from 9808 human coding sequences (16).
ᵇ p = 0.004 (χ²).
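The grouping used in Table II can be reproduced from the Table I values with a short sketch. The classification by longest adenine run follows the table headings; the function names are illustrative, and because the Table I entries are rounded to three decimals, a grouped sum can differ from Table II in the last digit (for example, 0.267 + 0.227 gives 0.494 rather than 0.493).

```python
# (predicted, observed) frequencies transcribed from Table I.
TABLE_I = {
    "AAG AAG AAG": (0.210, 0.267),
    "AAG AAG AAA": (0.143, 0.227),
    "AAG AAA AAG": (0.143, 0.147),
    "AAA AAG AAG": (0.143, 0.107),
    "AAA AAG AAA": (0.098, 0.113),
    "AAG AAA AAA": (0.098, 0.093),
    "AAA AAA AAG": (0.098, 0.027),
    "AAA AAA AAA": (0.067, 0.020),
}


def longest_adenine_run(motif: str) -> int:
    """Return the longest run of consecutive adenines in a nine-nucleotide motif."""
    best = current = 0
    for nucleotide in motif.replace(" ", ""):
        current = current + 1 if nucleotide == "A" else 0
        best = max(best, current)
    return best


def stretch_class(motif: str) -> str:
    """Classify a motif as in Table II: short (2-3), intermediate (5-6), long (8-9)."""
    run = longest_adenine_run(motif)
    if run <= 3:
        return "short"
    if run <= 6:
        return "intermediate"
    return "long"


def grouped_frequencies(table=TABLE_I):
    """Sum predicted and observed frequencies within each stretch class."""
    totals = {"short": [0.0, 0.0], "intermediate": [0.0, 0.0], "long": [0.0, 0.0]}
    for motif, (predicted, observed) in table.items():
        totals[stretch_class(motif)][0] += predicted
        totals[stretch_class(motif)][1] += observed
    return {name: (round(p, 3), round(o, 3)) for name, (p, o) in totals.items()}


grouped = grouped_frequencies()
for name, (predicted, observed) in grouped.items():
    print(f"{name:13s} predicted = {predicted:.3f}  observed = {observed:.3f}")
```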

The foregoing analysis depends on appropriate codon usage information on which to base the "predicted" frequencies for the eight nucleotide sequences specifying Lys-Lys-Lys. Because our predictions were based on data from 9808 human coding sequences, we believe that it is highly unlikely that faulty codon usage data would have caused us to draw inaccurate conclusions. Nevertheless, because of the importance of this issue, we calculated alternative predicted frequencies for the Lys-Lys-Lys motifs using codon usage data from the 450 lysine codons in the 150 Lys-Lys-Lys motifs. Because of the selective pressure against long stretches of adenines, one would expect to find a higher percentage of AAG codons and a lower percentage of AAA codons in the subset of 450 lysine codons. This expectation was borne out: 66.4% of the lysines were specified by AAG, and 33.6% were specified by AAA. Using these codon usage data (which inherently are biased against our hypothesis), we recalculated the predicted frequencies for the eight possible nucleotide sequences coding for Lys-Lys-Lys and used χ² analysis to compare the predicted frequencies with those that were actually observed. Despite the inherent bias, this analysis showed a significant difference between the predicted and observed frequencies for the eight different nucleotide sequences (p = 0.031), with the sequences containing long stretches of adenines being under-represented relative to their predicted frequency.
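The motif-specific codon usage can be recovered from Table I with a brief sketch. The motif counts below are an assumption: they are the whole numbers obtained by multiplying each observed frequency in Table I by 150, not values reported directly in the text. The recalculated predictions again assume independent codon choice; the statistical test itself is not reproduced here.

```python
from itertools import product

# ASSUMED motif counts: observed frequencies in Table I multiplied by 150.
OBSERVED_COUNTS = {
    "AAG AAG AAG": 40,
    "AAG AAG AAA": 34,
    "AAG AAA AAG": 22,
    "AAA AAG AAG": 16,
    "AAA AAG AAA": 17,
    "AAG AAA AAA": 14,
    "AAA AAA AAG": 4,
    "AAA AAA AAA": 3,
}


def codon_usage(counts=OBSERVED_COUNTS):
    """Return the fraction of AAG and AAA codons among all codons in the motifs."""
    tally = {"AAG": 0, "AAA": 0}
    for motif, count in counts.items():
        for codon in motif.split():
            tally[codon] += count
    total = sum(tally.values())
    return {codon: number / total for codon, number in tally.items()}


def predicted_frequencies(usage):
    """Predicted motif frequencies, assuming independent codon choice."""
    frequencies = {}
    for codons in product(usage, repeat=3):
        probability = 1.0
        for codon in codons:
            probability *= usage[codon]
        frequencies[" ".join(codons)] = probability
    return frequencies


total_motifs = sum(OBSERVED_COUNTS.values())
usage = codon_usage()
alternative = predicted_frequencies(usage)

print(f"AAG: {usage['AAG']:.1%}  AAA: {usage['AAA']:.1%}")
for motif in ("AAA AAA AAG", "AAA AAA AAA"):
    observed = OBSERVED_COUNTS[motif] / total_motifs
    print(f"{motif}  predicted = {alternative[motif]:.3f}  observed = {observed:.3f}")
```

With these assumed counts, the sketch returns the 66.4% AAG and 33.6% AAA usage quoted above, and the two long-stretch sequences remain below their recalculated predicted frequencies.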


DISCUSSION

In earlier studies, our laboratory demonstrated that a unique apoB allele (the apoB86 allele) results in the production of both a truncated apoB (apoB86) and a full-length apoB100 (1, 6, 7). The production of apoB86 was due to a single cytosine deletion in exon 26 of the apoB gene (1). Remarkably, the production of apoB100 by the apoB86 allele was due to the transcriptional insertion of an extra adenine into a stretch of eight consecutive adenines created by the 1-bp deletion (1). In other words, transcriptional slippage by RNA polymerase "corrected" the mutant reading frame and allowed apoB100 to be synthesized. In this study, we demonstrated that reading-frame restoration at this site does not occur when the long stretch of adenines is interrupted by a cytosine. This finding in mammalian cells is consistent with the experimental observations of Wagner et al. (3) in E. coli. They found that the insertion of a guanine into a stretch of 11 consecutive adenines prevented RNA polymerase slippage and prevented reading-frame restoration with an "out-of-frame" lacZ construct. Another of our experimental aims was to determine whether reading-frame restoration is unique to liver cells. The fact that reading-frame restoration occurred in COS-7 cells strongly suggests that nonhepatic cell lines have the capacity for transcriptional slippage.

Another aim was to determine whether reading-frame restoration would occur at other long stretches of adenines in the apoB gene or whether it was unique to the specific sequence that we had fortuitously encountered in a human kindred. To test this issue, we introduced an apoB86-like frameshift mutation (with eight consecutive adenines in the -1 reading frame) into the coding sequences of an apoB46 cDNA expression vector and expressed the mutant plasmid (pB46N8A) in COS-7 cells and McA-RH7777 cells. In both cell lines, pB46N8A yielded the expected truncated protein (apoB32) as well as the full-length apoB46, indicating that reading-frame restoration occurs at other long stretches of adenines in the apoB gene. A construct containing 11 consecutive adenines in the -1 reading frame (pB46N11A) also yielded both apoB32 and apoB46, but a construct lacking the uninterrupted stretch of adenines (pB46N4AG3A) yielded only the truncated protein. The production of apoB46 in these experiments cannot be dismissed as a Western blot artifact. As illustrated by the density gradient ultracentrifugation experiment (Fig. 7), the apoB46-containing lipoproteins were packaged into lipoproteins with density profiles that were quite distinct from those of lipoproteins containing apoB32.

Reading-frame restoration could conceivably occur by several mechanisms. For example, the sequence A AAA AAC, when positioned immediately upstream from a stable RNA stem-loop structure, can lead to ribosomal frameshifting in the -1 direction in mammalian systems (18, 19). Similarly, the sequence A AAA AAG is associated with ribosomal frameshifting in E. coli, again when located in close proximity to an RNA stem-loop structure (20-23). If the ribosome were to move backwards a single nucleotide along a long stretch of adenines in the -1 reading frame, the altered reading frame would obviously be corrected. In the case of the apoB86 and apoB46 constructs, we were not able to identify sequences downstream from the mutations that would yield a stable RNA stem-loop structure. Therefore, we doubt that ribosomal frameshifting contributed significantly to the reading-frame restoration. Of course, reading-frame restoration could conceivably occur at the DNA level, if DNA polymerase introduced an extra adenine into a long stretch of adenines. However, we did look for genomic DNA sequence heterogeneity in the analysis of the apoB86 allele and found none (1). Finally, reading-frame restoration could be due to transcriptional slippage by RNA polymerase. Reading-frame restoration due to transcriptional slippage has been well documented in the case of the apoB86 allele (1) and in lacZ constructs in E. coli (3, 24). Based on those precedents, we believe that it is overwhelmingly likely that transcriptional slippage was responsible for the reading-frame restoration that we observed in the experiments described here.

In the case of the apoB86 allele, we demonstrated that reading-frame restoration was due to the transcriptional insertion of an extra adenine into the stretch of eight consecutive adenines (1). Although the extra adenine corrected the frameshift mutation, we hypothesized that the same mechanism could introduce frameshift mutations whenever long stretches of adenines occurred in protein-coding sequences. In our experiments, we obtained evidence that this might indeed occur. A construct containing nine consecutive adenines in the 0 reading frame (pB46N9A) yielded primarily apoB46, but also a small amount of a truncated, apoB32-sized protein, suggesting that transcriptional slippage along the stretch of nine adenines may have introduced a +1 frameshift into the apoB mRNA. Because the introduction of frameshifts, even at a low frequency, could be detrimental to an organism, we further hypothesized that nature would select against long stretches of adenines in protein-coding sequences. Data included in this paper support this hypothesis. Analysis of the nucleotide sequences coding for 150 Lys-Lys-Lys motifs in 128 human proteins demonstrated that the sequences AAA AAA AAA and AAA AAA AAG were utilized at a significantly lower than predicted frequency, suggesting that evolution has exerted selective pressure against long stretches of adenines within protein-coding sequences.
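The reasoning behind the Lys-Lys-Lys comparison can be sketched as follows. Lysine is encoded by AAA or AAG, so a Lys-Lys-Lys motif can be written in eight ways, which differ in the length of their longest adenine stretch. The sketch below lists those lengths and computes the fraction of each coding expected if the three codons were chosen independently. The value of `p_aaa` (the fraction of Lys codons that are AAA) is a placeholder assumption, not a figure taken from this paper, and the function names are hypothetical; the few observed entries are copied from the Appendix as a small subset for illustration only.

```python
from collections import Counter
from itertools import product

def longest_a_run(seq):
    """Length of the longest uninterrupted stretch of adenines."""
    best = current = 0
    for base in seq:
        current = current + 1 if base == "A" else 0
        best = max(best, current)
    return best

def expected_fraction(triplet, p_aaa):
    """Expected fraction of a Lys-Lys-Lys coding, assuming independent codon choice."""
    fraction = 1.0
    for codon in triplet:
        fraction *= p_aaa if codon == "AAA" else 1.0 - p_aaa
    return fraction

# All eight ways of encoding Lys-Lys-Lys.
ALL_TRIPLETS = list(product(("AAA", "AAG"), repeat=3))
runs = {" ".join(t): longest_a_run("".join(t)) for t in ALL_TRIPLETS}

p_aaa = 0.4  # placeholder assumption, not a value reported in the paper

for triplet in ALL_TRIPLETS:
    name = " ".join(triplet)
    print(f"{name}  longest A run = {runs[name]}  expected = {expected_fraction(triplet, p_aaa):.3f}")

# A small subset of Appendix entries (illustration only, not the full data set).
observed_subset = ["AAA AAA AAA", "AAA AAA AAG", "AAG AAA AAA", "AAG AAG AAG"]
print(Counter(runs[s] for s in observed_subset))
```

Only AAA AAA AAA (nine adenines) and AAA AAA AAG (eight adenines) contain a stretch of eight or more adenines, which is why those two codings are the ones singled out in the comparison with predicted frequencies.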

APPENDIX

Nucleotide Sequences Encoding 150 Lys-Lys-Lys Motifs in 128 Human Proteins (Listed by SWISS-PROT/EMBL ID)

Protein Nucleotide sequence Protein Nucleotide sequence

2ACA_HUM AAG AAA AAA DHSB_HUM AAG AAG AAG
4F2_HUM AAA AAA AAG DLK_HUM AAG AAG AAG
5H1B_HUM AAG AAG AAA DMD_HUM AAG AAG AAA
8ODP_HUM AAG AAG AAA DOC2_HUM AAG AAA AAG
25A1_HUM AAG AAG AAG DOC2_HUM AAG AAA AAG
41_HUM AAA AAG AAG DPOE_HUM AAG AAA AAG
A4_HUM AAG AAG AAA G6PD_HUM AAG AAG AAG
AACT_HUM AAG AAA AAG GAA2_HUM AAA AAG AAA
AC15_HUM AAG AAA AAG GBP1_HUM AAG AAA AAA
AC15_HUM AAG AAA AAG GBPI_HUM AAG AAA AAG
ACBP_HUM AAG AAA AAA H1B_HUM AAG AAA AAG
ACDM_HUM AAG AAG AAG H1B_HUM AAG AAG AAG
ACHO_HUM AAA AAG AAA HGFA_HUM AAG AAG AAA
ACM2_HUM AAA AAG AAG HM74_HUM AAG AAG AAG
ADDA_HUM AAG AAG AAG ICAL_HUM AAG AAA AAA
ADO_HUM AAG AAG AAA IL5_HUM AAA AAA AAA
AF9_HUM AAG AAG AAA INI2_HUM AAG AAA AAG
AHR_HUM AAA AAG AAA IP3L_HUM AAG AAG AAG
AK79_HUM AAG AAA AAG IT5P_HUM AAG AAG AAG
AMPR_HUM AAG AAA AAG ITB4_HUM AAG AAG AAG
AMPR_HUM AAG AAG AAA K22E_HUM AAG AAG AAG
AOFA_HUM AAG AAG AAA KINH_HUM AAA AAG AAG
AOFA_HUM AAG AAG AAG KRAA_HUM AAG AAG AAA
APB_HUM AAA AAG AAA KRAB_HUM AAG AAA AAG
APE1_HUM AAG AAG AAA KU86_HUM AAG AAA AAG
AQP1_HUM AAG AAG AAG MLH1_HUM AAG AAG AAG
ARK1_HUM AAG AAG AAG MPI3_HUM AAA AAA AAG
ARL3_HUM AAG AAG AAA MPK1_HUM AAG AAG AAG
ARY2_HUM AAG AAG AAA MSH3_HUM AAA AAG AAA
ATCP_HUM AAA AAG AAA MTF1_HUM AAA AAG AAA
ATCP_HUM AAG AAA AAG MYCM_HUM AAG AAG AAG
ATF3_HUM AAG AAG AAG MYSB_HUM AAA AAA AAA
ATPA_HUM AAG AAG AAG MYSB_HUM AAG AAG AAG
ATPG_HUM AAG AAG AAA MYSS_HUM AAA AAG AAA
AVR2_HUM AAA AAA AAG MYSS_HUM AAA AAG AAG
BAL_HUM AAG AAG AAG NAGC_HUM AAG AAA AAA
BASO_HUM AAG AAG AAA NAH1_HUM AAG AAA AAG
BCR_HUM AAG AAG AAG NEFA_HUM AAG AAG AAG
BMP3_HUM AAA AAG AAA NGAL_HUM AAA AAG AAG
BN51_HUM AAG AAA AAA P11A_HUM AAG AAG AAA
BN51_HUM AAG AAG AAA P11G_HUM AAG AAA AAA
BOMA_HUM AAG AAA AAG P2YR_HUM AAA AAG AAG
BTC_HUM AAG AAG AAA P53_HUM AAG AAG AAA
BTF3_HUM AAG AAG AAA P68_HUM AAA AAG AAG
BTK_HUM AAA AAG AAA PA2Y_HUM AAG AAG AAA
C24B_HUM AAG AAG AAG PE19_HUM AAG AAG AAG
CA15_HUM AAA AAG AAG PGDR_HUM AAA AAG AAG
CAG1_HUM AAG AAA AAG PGDS_HUM AAG AAG AAA
CAG1_HUM AAG AAG AAA PRS4_HUM AAG AAA AAG
CAH2_HUM AAA AAG AAA RA54_HUM AAA AAG AAA
CATE_HUM AAG AAG AAG RA54_HUM AAA AAG AAA
CCG1_HUM AAA AAA AAG RA54_HUM AAA AAG AAG
CCG1_HUM AAG AAG AAA RA54_HUM AAA AAG AAG
CCG1_HUM AAG AAG AAG RA54_HUM AAG AAA AAA
CD2_HUM AAG AAA AAG RAPA_HUM AAG AAG AAG
CD3G_HUM AAA AAA AAA RBB2_HUM AAG AAG AAA
CD4X_HUM AAG AAA AAG RL40_HUM AAG AAG AAG
CDP_HUM AAG AAG AAG RS27_HUM AAG AAG AAA
CDX1_HUM AAG AAG AAA RS5_HUM AAG AAG AAG
CENC_HUM AAA AAG AAG RU2B_HUM AAG AAA AAA
CENC_HUM AAG AAG AAA RU2B_HUM AAG AAA AAA
CENC_HUM AAG AAG AAA RU2B_HUM AAG AAA AAA
CENE_HUM AAA AAG AAA RYNR_HUM AAG AAG AAG
CENE_HUM AAA AAG AAA SP2_HUM AAG AAG AAG
CIC1_HUM AAG AAG AAA SPCB_HUM AAG AAG AAG
CIK4_HUM AAG AAG AAA TEC_HUM AAA AAG AAG
CINA_HUM AAG AAA AAA TKT_HUM AAA AAG AAG
CRCM_HUM AAA AAG AAA TNR2_HUM AAA AAG AAG
CYCLI_HUM AAG AAA AAG TOP1_HUM AAG AAG AAG
CYG4_HUM AAA AAG AAG TPMB_HUM AAG AAG AAG
DEK_HUM AAG AAG AAA UBA52_HUM AAG AAG AAG
DESP_HUM AAG AAG AAG VATC_HUM AAG AAA AAA
DEST_HUM AAA AAG AAA VAV_HUM AAG AAG AAG
DHB2_HUM AAG AAA AAG VCA1_HUM AAG AAA AAA
DHQU_HUM AAG AAG AAA YY07_HUM AAG AAG AAA


FOOTNOTES

*   This work was supported in part by National Institutes of Health Grant HL41633; a fellowship award from the American Heart Association, California Affiliate (to M. F. L.); Clinical Investigator Development Award HL-02925 from the National Institutes of Health (to M. F. L.); and a research fellowship from the Deutsche Forschungsgemeinschaft (to M. R.). The costs of publication of this article were defrayed in part by the payment of page charges. The article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
§   To whom reprint requests may be addressed: Medical Center, Div. of Endocrinology, Vanderbilt University, Nashville, TN 37232-2250. Tel.: 615-936-1656; Fax: 615-936-1667.
‡‡   To whom correspondence and reprint requests may be addressed: Gladstone Inst. of Cardiovascular Disease, P. O. Box 419100, San Francisco, CA 94141-9100. Tel.: 415-695-3343; Fax: 415-285-5632; E-mail: steve_young.gicd@quickmail.ucsf.edu.
1   The abbreviation used is: bp, base pair(s).
2   Available at http://www.ebi.ac.uk.
3   Available at http://www.dna.affrc.go.jp/~nakamura/codon.html.

ACKNOWLEDGEMENTS

We thank Y. Marcel and R. Milne for monoclonal antibody 1D1 and B. Blackhart, Z. Yao, and B. McCarthy for pB46neo.


REFERENCES

  1. Linton, M. F., Pierotti, V., and Young, S. G. (1992) Proc. Natl. Acad. Sci. U. S. A. 89, 11431-11435
  2. Chamberlin, M., and Berg, P. (1962) Proc. Natl. Acad. Sci. U. S. A. 48, 81-93
  3. Wagner, L. A., Weiss, R. B., Driscoll, R., Dunn, D. S., and Gesteland, R. F. (1990) Nucleic Acids Res. 18, 3529-3535
  4. Steinberg, D., Grundy, S. M., Mok, H. Y. I., Turner, J. D., Weinstein, D. B., Brown, W. V., and Albers, J. J. (1979) J. Clin. Invest. 64, 292-301
  5. Young, S. G., Peralta, F. P., Dubois, B. W., Curtiss, L. K., Boyles, J. K., and Witztum, J. L. (1987) J. Biol. Chem. 262, 16604-16611
  6. Young, S. G., Bertics, S. J., Curtiss, L. K., Dubois, B. W., and Witztum, J. L. (1987) J. Clin. Invest. 79, 1842-1851
  7. Young, S. G., Bertics, S. J., Curtiss, L. K., and Witztum, J. L. (1987) J. Clin. Invest. 79, 1831-1841
  8. Young, S. G., Northey, S. T., and McCarthy, B. J. (1988) Science 241, 591-593
  9. Powell, L. M., Wallis, S. C., Pease, R. J., Edwards, Y. H., Knott, T. J., and Scott, J. (1987) Cell 50, 831-840
  10. Deng, W. P., and Nickoloff, J. A. (1992) Anal. Biochem. 200, 81-88
  11. Yao, Z., Blackhart, B. D., Johnson, D. F., Taylor, S. M., Haubold, K. W., and McCarthy, B. J. (1992) J. Biol. Chem. 267, 1175-1182
  12. Chen, C., and Okayama, H. (1987) Mol. Cell. Biol. 7, 2745-2752
  13. Yao, Z., Blackhart, B. D., Linton, M. F., Taylor, S. M., Young, S. G., and McCarthy, B. J. (1991) J. Biol. Chem. 266, 3300-3308
  14. Linton, M. F., Farese, R. V., Jr., Chiesa, G., Grass, D. S., Chin, P., Hammer, R. E., Hobbs, H. H., and Young, S. G. (1993) J. Clin. Invest. 92, 3029-3037
  15. Pease, R. J., Milne, R. W., Jessup, W. K., Law, A., Provost, P., Fruchart, J.-C., Dean, R. T., Marcel, Y. L., and Scott, J. (1990) J. Biol. Chem. 265, 553-568
  16. Nakamura, Y., Wada, K.-N., Wada, Y., Doi, H., Kanaya, S., Gojobori, T., and Ikemura, T. (1996) Nucleic Acids Res. 24, 214-215
  17. Glantz, S. A. (1992) Primer of Biostatistics: The Program, Version 3.0, McGraw-Hill Book Co., New York
  18. Jacks, T., Townsley, K., Varmus, H. E., and Majors, J. (1987) Proc. Natl. Acad. Sci. U. S. A. 4298-4302
  19. Chamorro, M., Parkin, N., and Varmus, H. E. (1992) Proc. Natl. Acad. Sci. U. S. A. 89, 713-717
  20. Sekine, Y., and Ohtsubo, E. (1989) Proc. Natl. Acad. Sci. U. S. A. 86, 4609-4613
  21. Flower, A. M., and McHenry, C. S. (1990) Proc. Natl. Acad. Sci. U. S. A. 87, 3713-3717
  22. Tsuchihashi, Z., and Kornberg, A. (1990) Proc. Natl. Acad. Sci. U. S. A. 87, 2516-2520
  23. Atkins, J. F., Weiss, R. B., and Gesteland, R. F. (1990) Cell 62, 413-423
  24. Weiss, R. B., Dunn, D. M., Shuh, M., Atkins, J. F., and Gesteland, R. F. (1989) New Biol. 1, 159-169

©1997 by The American Society for Biochemistry and Molecular Biology, Inc.