©1995 by The American Society for Biochemistry and Molecular Biology, Inc.
Differential Effects of Simple Repeating DNA Sequences on Gene Expression from the SV40 Early Promoter (*)

(Received for publication, July 25, 1994; and in revised form, November 30, 1994)

Sorour Amirhaeri, Franz Wohlrab (§), Robert D. Wells (¶)

From the Department of Biochemistry, Schools of Medicine and Dentistry, University of Alabama, Birmingham, Alabama 35294 and the Institute of Biosciences and Technology, Texas A&M University, Texas Medical Center, Houston, Texas 77030

ABSTRACT

The influence of simple repeat sequences, cloned into different positions relative to the SV40 early promoter/enhancer, on the transient expression of the chloramphenicol acetyltransferase (CAT) gene was investigated. Insertion of (G)·(C) in either orientation into the 5′-untranslated region of the CAT gene reduced expression in CV-1 cells 50-100-fold when compared with controls with random sequence inserts. Analysis of CAT-specific mRNA levels demonstrated that the effect was due to a reduction of CAT mRNA production rather than to post-transcriptional events. In contrast, insertion of the same insert in either orientation upstream of the promoter-enhancer or downstream of the gene stimulated gene expression 2-3-fold. These effects could be reversed by cotransfection of a competitor plasmid carrying (G)·(C) sequences. The results suggest that a G·C-binding transcription factor modulates gene expression in this system and that promoter strength can be regulated by providing protein-binding sites in trans. Although constructs containing longer tracts of alternating (C-G), (T-G), or (A-T) sequences inhibited CAT expression when inserted in the 5′-untranslated region of the CAT gene, the amount of CAT mRNA was unaffected. Hence, these inhibitions must be due to post-transcriptional events, presumably at the level of translation. These effects of microsatellite sequences on gene expression are discussed with respect to recent data on related simple repeat sequences which cause several human genetic diseases.


INTRODUCTION

Eukaryotic genomes contain a relatively high percentage of direct repeat sequences known as microsatellites. Simple di-, tri-, or tetranucleotide repeats have been reported to be unstable and vary in length of repeat unit from one to several thousand base pairs. Recently, abnormalities of such repeats have been implicated in the genesis of a number of human genetic conditions. For example, instability of dinucleotide (CA) repeats has been observed in some human cancers (1). More recently, several neurological diseases have been found to result from an increased number of trinucleotide repeat units near or within certain genes (2, 3, 4, 5). Expansion of trinucleotide (CGG) repeats is responsible for the fragile X syndrome (6), while several other disorders (myotonic dystrophy, spinal and bulbar muscular atrophy, spinocerebellar ataxia type 1, and Huntington's disease) have been linked to expanded triplet blocks of (CTG) repeats (4, 7, 8, 9, 10). The massive expansion of these triplet repeats is a novel type of mutational event (11). The molecular etiology of this non-Mendelian behavior is unknown at present.

Simple repeat sequences adopt several types of non-B-DNA structures under appropriate environmental conditions (negative supercoil density, ionic strength, etc.). Repeating pur-pyr sequences, (G-C) or (A-C), form left-handed Z-DNA, and inverted repeats adopt cruciform structures (11, 12). Under the influence of negative supercoiling and acidic conditions, (G-A)·(T-C) repeat sequences fold into intramolecular triple-stranded structures (H-DNA) based on C·G·C triads, as long as there is mirror symmetry of the pur·pyr stretch (11, 13). In the presence of magnesium, (G)·(C) sequences in plasmids also form C·G·G-based triple strands even at neutral pH (13, 14). The tandem arrays of 12 bp (^1) direct repeats (DR2) in the segment inversion site of herpes simplex virus-1 are highly G+C-rich and form an unorthodox conformation under physiological conditions (15). Additionally, runs of three or more Gs can self-associate to form tetra-stranded structures which have been implicated as possible synaptic intermediates during meiosis (16) and are essential parts of the structure adopted by the single-stranded telomeric repeats at the ends of eukaryotic chromosomes (17, 18, 19). The trinucleotide (CTG) repeat found in the 3′-untranslated region of the myotonin kinase gene (20) adopts a still different non-B DNA structure (^2)(^3) and displays preferential nucleosome assembly in vitro (22).

Recent studies demonstrate that the non-B DNA structures form and have biological consequences in vivo. Left-handed Z-DNA exists in plasmids (23) and the chromosome (24) and is stabilized by domains of negative supercoiling (25, 26) in living Escherichia coli cells. In addition, triplexes can exist in vivo and regulate gene expression (27, 28).

Simple repeat sequences of pur·pyr composition occur with high frequency in the promoters and particularly around the transcriptional signal of many eukaryotic genomes (13, 14). These repeat sequences are found in the 5′-untranslated region of a number of human genes (3, 29, 30) and are the binding sites for several proteins (29, 31). The trinucleotide (CCG) repeat which becomes unstable in fragile X syndrome is located in the 5′-untranslated region of the FMR1 gene and is characterized as a binding site for a specific nuclear protein (32).

In spite of the realization that simple repeat sequences exhibit high structural variability in vitro, relatively little is known about their properties in biological systems. Hence, we investigated the effects of simple repeating sequences on transcription. Herein, a systematic study was conducted to understand the biological consequences of the presence of repeating sequences in the vicinity of a transcription unit on transient gene expression in monkey cells. Our data show that (C)·(G) tracts act as modulators in cis on transcription driven by the SV40 early promoter/enhancer. Transcription is inhibited, and this inhibition can be titrated in trans with a competitor sequence. Therefore, (G)·(C)-binding proteins may act as transcriptional regulators. Constructs containing other repeating sequences, (C-G), (T-G), and (A-T), inhibited gene expression only when inserted in the 5′-untranslated region of the CAT gene, in a manner that was length-dependent. RNA analyses revealed that transcription was not inhibited by these inserts and that full-length mRNA was produced. This indicates that the inhibition is due to post-transcriptional events.


EXPERIMENTAL PROCEDURES

DNA Samples

Table 1 shows the constructs used in this study. pRW1906 was generated by insertion of the pUC12 polylinker fragment into the unique HindIII site (site 1, 5′-untranslated region) of pSV2cat (33). The G·C insert was generated originally by insertion of a blunt-ended 25-bp (G)·(C) oligomer between the SmaI and the filled-in DraII site of pBluescript (+)KS (pRW1231). Subsequent isolation of a HaeIII fragment containing this insert and ligation into the SmaI site of pRW1906 resulted in a 29-bp (G)·(C) stretch flanked by two BamHI sites with the following sequence: GGATCCCCC(G)ATCC. This plasmid is designated pRW1253. Inversion of the HindIII fragment containing the polylinker and the insert gave pRW1254. In order to study the effect of flanking sequences on CAT gene expression, a vector plasmid (pRW1293) was constructed by inserting a BglII linker into the filled-in HindIII site of pSV2cat. Insertion of the BamHI fragment containing the (G) sequence into this site yielded pRW1266. To move the DNA repeating blocks upstream of the SV40 promoter, a vector plasmid (pRW1292) was constructed by insertion of a BglII linker into the Tth111I site of pSV2cat (site 2, 400 bp upstream of the promoter/enhancer). The BamHI fragment containing the (G) insert was excised from pRW1253 and then placed into this site, giving rise to pRW1259 and pRW1260, which differ in the orientation of the insert. Insertion of the same BamHI fragment into the BamHI site of pSV2cat (site 3, 1600 bp downstream of the promoter/enhancer) resulted in pRW1269. pRW1221 was prepared by cloning the synthetic (GGGGA)5 insert into the SmaI site of the plasmid vector pRW790. The BamHI fragment containing this insert was then excised from this plasmid, blunt-ended, and inserted into site 1 of pSV2cat to yield pRW1271 (purines in template strand) and pRW1272 (pyrimidines in template strand). The insertion of the same fragment into site 2 of pSV2cat resulted in pRW1273.
For constructs in site 4, the BamHI fragment containing the (G) insert was excised from pRW1253, blunt-ended, and inserted into the MscI site of pSV2cat (600 bp downstream of the SV40 promoter/enhancer, within the CAT coding sequence), which resulted in pRW1931 and pRW1932, with different orientations of the insert.



In parallel studies, plasmids containing different lengths of other repeating DNA sequences were constructed (Table 1). Fragments containing (CG)E(CG)9, (CG)7E(CG)7, and (CG)7 were excised from pRW1558, pRW1557, and pRW1567 (34), respectively, by XhoII cleavage except for (CG)7 where XhoII + EcoRI were used. (E designates the AATT in the EcoRI sites.) These fragments were blunt-ended and inserted into the filled-in HindIII site of pSV2cat (site 1) except for pRW1958, which had its insert in the SmaI site of the polylinker in pRW1906. The plasmids thus obtained were designated pRW1958, pRW1957, and pRW1925, respectively. The 48- and 32-bp alternating (pur·pyr) sequences were inserted into site 2 of pSV2cat resulting in pRW1922 and pRW1923, respectively. The insertion of the same 48- and 32-bp fragments into site 3 generated pRW1927 and pRW1928, respectively. Recombinant plasmids also were constructed which contained other repeating sequences. pRW1912 has a 98-bp BamHI fragment containing (TG)E(CA) derived from pRW1151 (35). Insertion of this fragment into sites 2 and 3 resulted in pRW1921 and pRW1926, respectively. pRW1924 contains the 140-bp insert from the 3′ side of the mouse kappa immunoglobulin gene containing a 62-bp tract of (T-G); this fragment was excised from pRW777 (36) using EcoRI + HindIII. pRW1929 has a 120-bp XbaI-SspI fragment containing the (AT)N7(AT) sequence. This fragment was excised from a modified pUC plasmid containing the 1.9-kilobase pair KpnI-PvuII fragment, which is the DNase I super-hypersensitive site located 53 kbp upstream of the human beta-globin gene (37) (a gift of Dr. Tim Townes of University of Alabama, Birmingham). The above fragments were blunt-ended and inserted into site 1 of pSV2cat. All plasmids were grown in E. coli HB101 and purified twice over cesium chloride gradients. All inserts were characterized by DNA sequence analyses on both strands using the Maxam-Gilbert sequencing method (38).
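The insert lengths quoted above can be checked with simple arithmetic. The following Python sketch, offered only as an illustration, builds the (CG)7 and (CG)7E(CG)7 repeat tracts with E taken as AATT, as defined in the text. The helper name `repeat_block` is hypothetical, and the sketch covers only the repeat tract itself, not any flanking linker sequence of the actual restriction fragments.

```python
def repeat_block(unit: str, n: int) -> str:
    """Return n tandem copies of a repeat unit, e.g. ("CG", 7) for (CG)7."""
    return unit * n


E = "AATT"  # central four bases of the EcoRI site, as defined in the text

cg7 = repeat_block("CG", 7)      # the (CG)7 tract
cg7_e_cg7 = cg7 + E + cg7        # the (CG)7E(CG)7 tract

print(len(cg7))                  # 14
print(len(cg7_e_cg7))            # 32
print("GAATTC" in cg7_e_cg7)     # True: the junction recreates an EcoRI site
```

The 14- and 32-bp lengths agree with the sizes given for the inserts of pRW1925 and pRW1957, respectively.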

DNA Transfections

Plasmid DNA was introduced into CV-1 cells by the calcium phosphate procedure (39). Briefly, cells were grown in Dulbecco's modified Eagle's medium (Flow Laboratories) supplemented with 10% fetal calf serum (Life Technologies, Inc.) and transfected at 60-80% confluence with 20 µg of plasmid DNA/100-mm dish. At 6 h after transfection at 37 °C, cells were shocked with 10% glycerol for 2 min, refed with fresh medium, and further incubated for 48 h. COS-1 cells were transfected by the DEAE-dextran procedure with 20 µg of DNA/100-mm plate in the presence of chloroquine phosphate. The cells were shocked with 10% Me2SO and further incubated with fresh culture medium as described above.

Isolation of DNA

Extrachromosomal DNA was recovered from transfected COS-1 cells from Hirt supernatants (40) by several cycles of treatment with proteinase K and ribonuclease A, phenol-chloroform extractions, and ethanol precipitations. High molecular weight DNA was recovered from transfected CV-1 cells (41) and purified as described above. Supercoiled or linear DNA was digested with DpnI for several hours. DpnI is a multicutting enzyme that requires methylated DNA for its substrate and thus cleaves the input plasmid DNA, but not the DNA newly replicated in a eukaryotic cell. The digested products were fractionated on agarose gels and transferred to Zeta-probe nylon membranes (Bio-Rad). The integrity of the inserts was monitored by the Southern blot analysis of the replicated DNA, using labeled linearized pSV2cat as a probe.
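The logic of the DpnI step can be sketched in a few lines of Python. DpnI recognizes the sequence GATC only when the adenine is methylated, as it is on plasmid DNA grown in Dam-positive E. coli. The function names and the toy sequence below are hypothetical, and the model is a deliberate simplification that treats DNA as either fully methylated (input plasmid) or unmethylated (replicated in the eukaryotic cell), ignoring hemimethylated intermediates.

```python
def gatc_sites(seq: str) -> list[int]:
    """Return the 0-based start position of every GATC site in seq."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 3) if seq[i:i + 4] == "GATC"]


def dpni_cut_sites(seq: str, dam_methylated: bool) -> list[int]:
    """Sites DpnI is expected to cut: GATC sites, but only on methylated DNA."""
    return gatc_sites(seq) if dam_methylated else []


toy = "AAGATCTTGGATCCAA"  # hypothetical sequence, not taken from pSV2cat

print(dpni_cut_sites(toy, dam_methylated=True))   # [2, 9]  input plasmid
print(dpni_cut_sites(toy, dam_methylated=False))  # []      replicated DNA
```

In this simplified picture, only DNA that survives DpnI digestion intact is counted as having replicated in the transfected cells.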

CAT Assays

CAT assays were conducted essentially as described by Gorman et al. (33). Briefly, transfected cells were scraped into Eppendorf tubes and resuspended in 40 mM Tris-HCl, pH 7.4, 1 mM EDTA, 150 mM NaCl. The cells were broken by repeated cycles of freeze-thawing, and the supernatant was analyzed for protein content. Equal amounts of extract were used in enzyme assays using 14C-labeled chloramphenicol (DuPont) and acetyl-CoA (Pharmacia LKB Biotechnology) as substrates. The products of the reaction were separated by ascending thin layer chromatography and analyzed by autoradiography and liquid scintillation counting of the individual spots. The assays were normalized for transfection efficiency by analysis for expression of beta-galactosidase activity by cotransfection of 10 µg/100-mm dish of the plasmid pCH110 (Clontech), which carries a beta-galactosidase gene driven by the SV40 early promoter.
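The text does not give an explicit formula for the quantitation, so the following Python sketch shows only one common way such data could be reduced: percent acetylation from the scintillation counts of the acetylated and unacetylated spots, divided by the beta-galactosidase activity of the same extract. All function names and all numbers are hypothetical and chosen purely for illustration.

```python
def percent_acetylated(acetylated_cpm: float, unacetylated_cpm: float) -> float:
    """Percent of chloramphenicol converted to acetylated forms."""
    total = acetylated_cpm + unacetylated_cpm
    if total <= 0:
        raise ValueError("total counts must be positive")
    return 100.0 * acetylated_cpm / total


def normalized_cat_activity(percent: float, beta_gal_activity: float) -> float:
    """CAT conversion corrected for transfection efficiency (assumed ratio)."""
    if beta_gal_activity <= 0:
        raise ValueError("beta-galactosidase activity must be positive")
    return percent / beta_gal_activity


# Hypothetical counts (cpm) and beta-galactosidase units for two extracts.
control = normalized_cat_activity(percent_acetylated(3000, 7000), 1.0)
test = normalized_cat_activity(percent_acetylated(30, 9970), 1.0)

print(f"fold reduction relative to control: {control / test:.0f}")
```

A ratio of this kind is what a statement such as "a 20-100-fold reduction relative to pSV2cat" summarizes.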

Analysis of RNA

Total cellular RNA was prepared by the guanidinium isothiocyanate total cell lysis method (42). RNA was analyzed by transferring to Zeta-probe nylon membrane and subsequent slot or Northern blotting.

In Vivo Competition Assay

(G) CAT constructs were cotransfected in CV-1 cells with different amounts of a competitor plasmid (pRW1231) which contains (G)·(C) in pBluescript(+) KS. The total amount of DNA transfected per 100-mm dish was brought to 20 µg by addition of pBluescript vector. CAT assays and RNA analyses were performed as described.
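The bookkeeping for keeping total DNA constant at 20 µg per dish can be written as a small Python helper. This is a minimal sketch: the function name and the example amounts are hypothetical, and it assumes that only the reporter, the competitor, and the pBluescript filler count toward the 20 µg total.

```python
TOTAL_DNA_UG = 20.0  # total DNA per 100-mm dish, as stated in the text


def filler_vector_ug(reporter_ug: float, competitor_ug: float,
                     total_ug: float = TOTAL_DNA_UG) -> float:
    """Amount of pBluescript vector needed to bring the total to total_ug."""
    filler = total_ug - reporter_ug - competitor_ug
    if filler < 0:
        raise ValueError("reporter plus competitor exceeds the total DNA")
    return filler


# Hypothetical titration: 3 µg of reporter with increasing competitor.
for competitor in (0.0, 4.0, 8.0, 12.0):
    print(competitor, filler_vector_ug(3.0, competitor))
```

Holding the total constant in this way means that any change in CAT activity can be attributed to the competitor sequence rather than to the amount of DNA taken up by the cells.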


RESULTS

Runs of (G)·(C) within the Untranslated Leader Sequence Strongly Reduce Gene Expression

The plasmids shown in Table 1 were designed to investigate the effects of pur·pyr tracts on transcription from the SV40 early promoter. Simple DNA sequences were placed in different positions in the expression vector pSV2cat (shown in Fig. 1), namely upstream of the promoter/enhancer region (Tth111I site) (site 2), 600 bp downstream (MscI site) (site 4), far downstream (BamHI site) (site 3), and at a site located between the promoter and the start of translation (HindIII site) (site 1). Whereas the first three positions lie outside the transcription unit, cloning into the HindIII site results in modification of the 5′-untranslated region of the CAT mRNA.


Figure 1: Schematic diagram of pSV2cat. The SV40 early promoter, the TATA box (AT), the 21-bp repeats, and the 72-bp enhancers are shown. The arrow indicates the direction of transcription. DNA-repeating sequences were inserted into the 5′-untranslated region of the CAT gene (site 1). In addition, the same sequences were placed either upstream (site 2) or downstream of the SV40 promoter/enhancer (sites 3 and 4). The heavy line indicates the bacterial CAT-coding sequence. The cross-hatched regions correspond to the three DNA fragments used as hybridization probes.



Table 1 and Fig. 2 show that insertion of 50 bp of polylinker sequences into the 5′-untranslated region of the gene (pRW1906) has no detrimental effect on gene expression and, in fact, consistently yielded slightly higher enzyme activity levels than the parent plasmid pSV2cat. In contrast, the level of CAT activity for pRW1253, which contains a 29-bp G·C tract in the same context as pRW1906, is strongly reduced. If this effect were caused by mRNA secondary structure, inversion of the insert (pRW1254) should lead to normal levels of CAT expression, since the other strand is now the coding strand. Comparison of the CAT activity expressed by the two constructs containing (G)·(C) sequences in different orientations showed that both produced similar low levels of enzyme. As shown in Table 1, the reduction in activity was of the order of 20-100-fold relative to pSV2cat.


Figure 2: CAT levels expressed by constructs containing repeating (G)·(C) tracts within the transcribed region. Transfection of CV-1 cells and determination of CAT activity were performed as described under "Experimental Procedures."



Since the negative effect of the (G)·(C) sequences was independent of the orientation and therefore of the transcribed DNA strand, it was likely to be caused by an event at the DNA level. To eliminate the possibility that the observed reductions are a consequence of the polylinker sequences present in addition to the pur·pyr tracts in pRW1253 and pRW1254, we also constructed a plasmid (pRW1266) in which most of the polylinker sequences surrounding the (G)·(C) block were deleted (see "Experimental Procedures"). This construct was very deficient in CAT expression, even more than pRW1253 and pRW1254, indicating that the negative effect is caused by the presence of the pur·pyr tracts and not by any flanking sequences (data not shown).

The effect of simple repeat DNA sequences was investigated further by studying sequence isomeric inserts. The level of CAT activity in constructs containing (C-G) and (C-G) was strongly reduced (80-90%) whereas no inhibition was observed for pRW1925 containing 14 bp of (C-G). Thus, the inhibition of gene expression depends on the length of the (C-G) insert. Additionally, constructs containing (T-G) (pRW1912) and (A-T) sequences (pRW1929) inhibited CAT expression whereas little or no inhibition was observed with pRW1924 containing the (T-G) insert (Table 1).

The inhibition of CAT expression observed with plasmids containing these repeating sequences was not a result of deletions or rearrangements of these repeats in mammalian cells, since Southern blot analyses on plasmids containing the above inserts in CV-1 and COS-1 cells confirmed the integrity of all the insert sequences (see "Experimental Procedures"). The reason why other workers (43) observed deletions of (C-G) sequences in SV40-derived constructs is unclear but may be due to the use of different systems.

The Block to Gene Expression in (G) Tracts Is Pretranslational

To investigate further whether the block to gene expression was pretranslational, the RNA blot analyses shown in Fig. 3 were performed. The probes used were specific to the 5′-untranslated sequence, probe A (NcoI-HindIII fragment), and the 3′ end of the RNA, probe C (EcoRI-BamHI fragment), shown as cross-hatched regions in Fig. 1. Fig. 3 shows that the level of CAT mRNA made by pRW1253 is greatly reduced compared to the control plasmids. This is true with the probe hybridizing to full-length RNA and the one hybridizing to the 5′ end of the CAT gene. Similar results also were obtained with probe B (HindIII-EcoRI fragment), which covers the middle of the transcript (data not shown). These results indicate that the overall levels of transiently transcribed RNA are reduced rather than just the production of full-length RNA. This conclusion is corroborated by the same result obtained with pRW1254, which carries the insert in the opposite orientation. In contrast, full-length RNA was made with constructs containing 48-, 32-, and 14-bp alternating (C-G) sequences (pRW1958, pRW1957, and pRW1925, respectively) as revealed by RNA analysis using probe C, which hybridizes to the 3′ end of the CAT transcript (Fig. 3). The amount of mRNA made from these constructs was approximately equal to or greater than that made from pSV2cat or pRW1906. Similar results also were obtained with plasmids containing (T-G) and (A-T) inserts (data not shown). In these cases, it is clear that the block to gene expression does not lie at the DNA level, but rather is post-transcriptional, presumably at the level of translation.


Figure 3: CAT mRNA levels in transfected CV-1 cells. Total RNA was extracted from CV-1 cells transfected with the indicated constructs 48 h post-transfection, transferred to nylon membranes, and probed with the DNA fragments as described under "Experimental Procedures."



Effect of Pur·Pyr Sequences Upstream of the Promoter

Since some of the simple pur·pyr sequences appeared to down-regulate promoter activity rather than block transcription elongation, the inserts were moved outside of the transcription unit. Table 1 shows that when the (G) tract was inserted in site 2, approximately 400 bp upstream of the transcription start site (pRW1259), gene expression was higher than that in pSV2cat or a control plasmid pRW1292, which carries a BglII linker instead of the pur·pyr sequences in the same position. The effect was again orientation independent, as demonstrated by the orientation isomer pRW1260. However, the increase in CAT activity (compared to the parent plasmid pSV2cat) was only 2-3-fold and could only be observed with plasmids containing the SV40 72-bp enhancers. No stimulation of gene expression above the basal level could be observed (data not shown) with constructs derived from the parent plasmids pA10cat2 and pCAT promoter (Promega), which lack the enhancer sequences.

The up-modulation of gene expression was largely independent of the distance of the (G) inserts from the transcription start site, since constructs with inserts in site 3, downstream of the CAT gene (pRW1269), showed the same level of expression as those in site 2 (pRW1259 and pRW1260). This is further corroborated by the results obtained with the (G) insert in site 4. Since the insertion of (G) and (C) at site 4 lies within the coding sequence of the CAT gene and causes no expression of the gene, RNA analyses were performed on these constructs. Full-length mRNA was made by pRW1931 as revealed by Northern blot analysis (not shown), which again was independent of the orientation of the insert as demonstrated by the orientation isomer pRW1932. The amounts of RNA made by the constructs at site 4 were similar to those in sites 2 and 3.

The level of CAT activity observed in constructs containing 48- and 32-bp repeating (C-G) inserts in site 2 (pRW1922 and pRW1923, respectively) and site 3 (pRW1927 and pRW1928, respectively) was similar to that of the parental plasmids, pSV2cat or pRW1292. Similar results were obtained with pRW1921 and pRW1926 containing repeating (T-G) inserts at sites 2 and 3, respectively.

Furthermore, when these sequences were inserted into pA10cat2, no stimulation of CAT expression was observed. Our results clearly indicate that the alternating (C-G) and (T-G) sequences do not enhance transcription of the CAT gene in the enhancerless constructs (pA10cat2 derivatives), in disagreement with a previous report (44) and likewise do not have any effect on transcription from the wild-type SV40 promoter.

Effect of G to A Substitutions

A possible reason for the observed inhibition of CAT gene expression is the binding of a transcription factor to the (G)·(C) sequences. To investigate this possibility, we first examined gene expression in pSV2cat which carried (GGGGA)5 inserts. As shown in Table 1, these inserts (pRW1271, pRW1272, and pRW1273) had little effect on CAT expression, indicating that the modest interruption of the run of Gs is detrimental to the observed inhibitions. This result also argues against the possibility that the modulation of gene expression is due to cleavage of the pur·pyr sequences by a member of the family of nuclear endonucleases with specificity for (G)·(C) tracts (45, 46, 47, 48). In vitro experiments have shown that (GGGGA)5 is cleaved by these enzymes at approximately one-third the rate of (G)·(C) (data not shown). This relatively small difference does not explain the large influences seen in the transient expression assays. In addition, Southern blot analysis of the fate of the (G)·(C)-containing plasmid in COS-1 cells showed about 5% double-stranded cleavage, which is too little to account for the effects on transcription.

The Effect on Gene Expression Mediated by (G) Tracts Is Due to a Trans-acting Factor

If the (G) tract-repeating sequences are complexed to putative DNA-binding proteins in vivo, this may interfere with gene expression. To investigate this behavior, we conducted an in vivo competition experiment. pRW1254 was transfected into CV-1 cells together with a competitor plasmid (pRW1231), which contains a run of (G) in pBluescript(+) KS. The total amount of DNA transfected per 100-mm dish was brought to 20 µg by addition of pBluescript vector. As shown in Fig. 4, the presence of the (G) block on the competitor plasmid relieved the inhibition of gene expression on the reporter construct. When more of the reporter plasmid was used, proportionally more of the competitor plasmid had to be added to obtain the same effect. Cotransfection of 12 µg of pRW1231 or pBluescript had no effect (< ±4%) on CAT expression from the pSV2cat control. This suggests that a trans-acting factor is responsible for the inhibition of gene expression in pRW1254, and that the factor acts as a repressor on the (G) inserts in site 1. The fact that the degree of derepression is titratable by competitor plasmid indicates that the factor involved is present in limiting amounts. The in vivo competition assay was also carried out for (G) inserts in sites 2 and 3. pRW1259 and pRW1269 were cotransfected with competitor plasmid (pRW1231) as described above. The up-modulation of CAT gene expression was reversed upon adding the competitor plasmid in a manner that was orientation-independent. This is also true for the insert at site 4 (pRW1931 and pRW1932) as confirmed by RNA analyses. This suggests that the trans-acting factor acts as an activator when the (G) sequences are inserted distal from the initiation site.


Figure 4: Competition in trans by a plasmid containing repeating (G)·(C) sequences. CAT activity of cells cotransfected with 3 µg (open circles) or 8 µg (filled circles) of reporter plasmid pRW1254 and increasing amounts of competitor plasmid pRW1231 is shown. For the CAT assays, protein concentrations were adjusted so that 20-35% of starting material was acetylated after 30 min. All values reflect the average of two determinations. Reproducibility was ±15%.




DISCUSSION

These results clearly demonstrate that (G)·(C) blocks of sufficient length at site 1 can exert a negative regulatory effect on gene expression in a transient expression system. This effect is independent of the orientation of the insert, which excludes the possibility of formation of mRNA secondary structure as the basis for the observed block of CAT expression. In addition, it appears from RNA blots that overall transcription is repressed in these constructs and that pausing or stoppage of the RNA polymerase at the pur·pyr sites does not significantly contribute to the effect.

In parallel studies, we found that alternating (G-C) stretches of 48 or 32 bp in length exhibited a strong (80-90%) reduction in CAT activity. Similar effects were observed with constructs containing 120-bp alternating (A-T) and 98-bp (T-G) insert sequences. The large reduction in CAT activity exerted by these inserts is clearly due to a post-transcriptional block. Since these inserts have a dyad axis in their sequences, a hairpin can form in the mRNA. Such hairpins have been shown to inhibit the movement of the 40 S ribosomal subunit along the RNA (49, 50). The (T-G) tracts lacked this symmetry and did not inhibit CAT gene expression. Calculations of the free energies of formation of such hairpins, which are of the order of 100 kcal/mol, reveal that the mRNA of the (G-C) inserts can fold into extraordinarily stable hairpins. It is very likely that such a structure would present a strong stop for the ribosomes. The finding that the (C-G)7 insert, which contains an inverted repeat, does not inhibit CAT expression is probably due to its shorter length and its lower free energy (<30 kcal/mol) for mRNA hairpin formation. The free energy of RNA hairpin formation for the (A-T)-rich insert (pRW1929) is <30 kcal/mol. The reason for the observed reduction in its CAT activity is uncertain but may be related to its A+T content. The alternating (C-G) and (T-G) sequences are known to form Z-DNA in vitro and in vivo in bacterial cells (34, 35, 51), whereas plasmids containing repeat sequences of (A-T) (pRW1929) and (TG)E(CA) (pRW1912) adopt cruciform structures. Our data do not strictly exclude the possible existence of such structures in mammalian cells, but they clearly indicate that these sequences do not act as inhibitors of transcription. From our studies, it therefore appears that the G+C content of the insert alone is not sufficient to produce the observed inhibitions of (G)·(C) tracts on transcription.
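The dyad-symmetry argument can be illustrated with a short Python check of whether a transcribed repeat is its own reverse complement, which is the condition for folding back into a perfect hairpin stem. This is a sketch only: the function names are hypothetical, the repeat numbers are arbitrary rather than those of the actual inserts, only Watson-Crick pairs are considered (G·U wobble pairs are ignored), and no free energies are computed.

```python
_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement_rna(seq: str) -> str:
    """Reverse complement of an RNA sequence written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def has_dyad_symmetry(seq: str) -> bool:
    """True if the sequence equals its own reverse complement."""
    return seq.upper() == reverse_complement_rna(seq)


print(has_dyad_symmetry("CG" * 16))                         # True
print(has_dyad_symmetry("AU" * 10))                         # True
print(has_dyad_symmetry("UG" * 5 + "AAUU" + "CA" * 5))      # True
print(has_dyad_symmetry("UG" * 31))                         # False
```

Alternating (C-G) and (A-U) runs and a (UG)E(CA) arrangement pass the test, whereas an uninterrupted (U-G) run does not, in line with the observation that the plain (T-G) tract did not inhibit CAT expression.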
Since the inserts seem to be >95% stable, we assume that the down-regulation of CAT transcription is not due to gene rearrangements or site-specific cutting by (G)·(C)-specific endonucleases. We cannot rule out extensive site-specific nicking of the pur·pyr sequences by such enzymes. However, such a model would make it difficult to explain the observed reduction of expression of the CAT gene by the constructs carrying inserts outside the transcription unit. In addition, to ascertain that the (G)·(C)-specific reductions in transcriptional competence are not cloning artifacts, we have removed the inserts from these constructs and reconstituted wild-type pSV2cat activity (data not shown).

It is conceivable that mRNA containing long runs of homopolymers could hybridize with its coding sequence by forming a D-loop or a triplex and thus hinder the formation of new transcripts. However, this is very unlikely since the down-regulation of transcription is independent of the orientation of the inserts. Similarly, the fact that these sequences do not act as enhancers excludes the possibility of the competition for transcription stimulation factors. Since the inserts do not appear to inhibit elongation of the CAT mRNA, but rather repress transcription initiation, it seems possible that the (G)·(C) inserts act at a distance to render the promoter region less active. Oligo(G)·(C) blocks have been shown to exert structural distortions on flanking sequences (52, 53), and similar pur·pyr sequences have been reported to influence structural transitions over a long distance (54, 55, 56). The fact that gene expression is reduced to a different degree depending on the distance of the (G)·(C) inserts from the promoter supports this model.

A plausible model for site 1 is that the (G)·(C) sequences are complexed to putative DNA-binding proteins in vivo which then interfere with gene expression. In this case, the binding protein would obstruct the accessibility of the promoter to transcription factors or prevent the DNA from assuming the conformation necessary for promoter function. In previous experiments, a model was proposed for the effect of (G)n tracts on gene expression when these sequences were inserted 5′ of the TK enhancerless promoter (28). In that model, a trans-acting factor fails to bind to the long (G) tracts due to formation of triple helix structure in mouse LTK cells. Our results with inserts at site 2 agree with those results, although the level of enhancement of CAT gene expression is different. We have observed 2-3-fold enhancement with (G) at this site as compared to the 10-fold enhancement reported there. The difference could be due to the fact that the two systems are not the same. It is possible that different promoters have different responses to the presence of polypurine tracts, and also that the G-binding protein (GBP) could be present at different levels in different cell types.

As stated above, simple repeat sequences containing runs of Gs and Cs occur in the 5'-untranslated region of several genes (14, 29) and are binding sites for proteins (29, 57). An erythrocyte-specific factor (BGP1) binds to linear fragments of (G) tracts in the 5'-flanking region of the chicken adult beta-globin gene (29). It has been shown that this factor is distinct from Sp1 and has greater affinity than Sp1 for the (G)n sequences. Additionally, the nuclear factor suGF1 from sea urchin embryos was reported recently to interact in vitro with 11 contiguous Gs in the H1-H4 intergenic region of a sea urchin early histone gene (58). Although it has been suggested that these factors may play a role in gene regulation, the current experimental evidence for the proposed role of these factors binding to (G)·(C) is unclear. Here, we have shown that when (G) is inserted in the 5'-untranslated region of the CAT gene, transcription is inhibited, and this inhibition can be titrated in trans with the competitor. Thus, the (G)-binding proteins can act as transcriptional regulators. We believe that this (G)-binding factor is different from Sp1, but further work is required to characterize this factor and study its interaction with the (G) tracts in detail.

The results with (G) inserts in sites 1 and 2-4 are consistent with a looping model. DNA looping is known to facilitate protein-protein interactions for accurate transcriptional initiation (59). A possible mechanism by which these inserts may affect gene expression is suggested by the fact that the (G)-binding protein (GBP) can recognize the (G) sequences. The DNA-bound regulatory protein (GBP) contacts the TATA-box binding protein (TBP) and initiates transcription when the intervening DNA loops out or bends to allow protein-protein interactions to occur. When (G) is inserted in site 1, the loop is too small to form because the distance between the (G) and the transcription initiation site is 100 bp; the GBP binds to (G) in the reporter plasmid, blocking transcription and resulting in very low expression. This suggests that GBP acts as a repressor for the inserts in site 1. In the presence of the competitor, the GBP binds to the (G) in the competitor DNA, and the level of expression goes up slightly. Alternatively, when (G) is inserted in sites 2, 3, or 4, distal from the initiation site, looping is possible, transcription is initiated, and RNA is produced. In this case, the loop is not formed in the presence of the competitor, and the expression is lowered, suggesting that the GBP now acts as an activator for the inserts at these sites.
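
The qualitative predictions of the looping model described above can be summarized in a short sketch. This is only an illustration under stated assumptions, not an analysis used in this work: the function names, the site groupings as Python sets, and the outcome labels are hypothetical, and the sketch encodes only the direction of each effect relative to a random-insert control, with no quantitative values.

```python
# Illustrative sketch of the looping model's qualitative predictions.
# All names and labels are hypothetical; nothing here is fitted to data.

# Site 1: 5'-untranslated region, about 100 bp from the initiation site
# (loop too small to form). Sites 2, 3, and 4: distal (looping possible).
PROXIMAL_SITES = {1}
DISTAL_SITES = {2, 3, 4}


def gbp_bound_to_reporter(competitor_present: bool) -> bool:
    """Assumption of the model: competitor (G) tracts titrate GBP
    away from the (G) insert on the reporter plasmid."""
    return not competitor_present


def predicted_effect(site: int, competitor_present: bool) -> str:
    """Return the predicted direction of CAT expression for a (G) insert
    at the given site, relative to a random-sequence insert control."""
    if site not in PROXIMAL_SITES | DISTAL_SITES:
        raise ValueError(f"unknown insert site: {site}")

    bound = gbp_bound_to_reporter(competitor_present)

    if site in PROXIMAL_SITES:
        # Bound GBP cannot loop to the initiation complex and blocks
        # transcription; the competitor relieves this only slightly.
        return "repressed" if bound else "repression partially relieved"

    # Distal sites: bound GBP can loop and contact the initiation complex;
    # the competitor removes GBP, so the stimulation is lost.
    return "stimulated" if bound else "stimulation lost"


if __name__ == "__main__":
    for site in (1, 2, 3, 4):
        for competitor in (False, True):
            label = "with competitor" if competitor else "no competitor"
            print(f"site {site}, {label}: {predicted_effect(site, competitor)}")
```

Treating GBP binding as all-or-none is a deliberate simplification; in the experiments the competitor raised expression at site 1 only slightly, so titration of the factor is presumably partial.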

Our results are consistent with this looping model. However, we cannot exclude the possibility of formation of an altered DNA structure in (G) tracts inside the monkey cells which could be recognized by the G-binding factor, thereby affecting the initiation of transcription. Protein-protein interactions between GBP and the basal transcription factor can be studied further using the two-hybrid system pioneered by Fields et al. (61).

The biological functions of DNA microsatellites are unknown. Our data demonstrate that microsatellite-type sequences, which are recognized by some proteins, can profoundly affect transcription and thus gene expression, depending on their sequence, length, and map location. Numerous prior studies have revealed that these same factors are important in stabilizing non-B-DNA conformations (25, 26). Several human genetic diseases (Introduction, 2-11) are caused by expansion of (CGG) or (CTG) triplet repeats which are proximal to or within the relevant genes. It is of interest that the (CGG)n repeat sequences are located in the 5'-untranslated region of the FMR1 gene, whose transcriptional regulation is linked to the fragile X syndrome. In affected individuals, the FMR1 protein is absent (21), suggesting a possible role of microsatellites in gene expression. These sequences are known to adopt non-B-DNA structures (11-13, 15-19, 28, 34-36, 51-53, 56, 60) and can serve as binding sites for some proteins (32, 60). These correlations suggest possible conformational roles of these repetitive sequences in gene regulation in human diseases.


FOOTNOTES

*
This work was supported by National Institutes of Health Grant GM 30822, National Science Foundation Grant DMB-9103942, and the Robert A. Welch Foundation. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

§
Present address: Dept. of Molecular Genetics, University and Biocenter Vienna, A-1030 Wien, Austria.

¶
To whom correspondence should be addressed.

(^1)
The abbreviations used are: bp, base pair(s); CAT, chloramphenicol acetyltransferase.

(^2)
S. Kang, K. Ohshima, S. Amirhaeri, and R. D. Wells, manuscript submitted for publication.

(^3)
S. Amirhaeri and R. D. Wells, manuscript in preparation.


ACKNOWLEDGEMENTS

We thank Dr. Tom Boal (Cruachem) for helpful discussions.


REFERENCES

  1. Yee, C. J., Roodi, N., Verrier, C. S., and Parl, F. F. (1994) Cancer Res. 54, 1641-1644
  2. Caskey, C. T., Pizzuti, A., Fu, Y-H., Fenwick, R. G., and Nelson, D. (1991) Science 256, 784-788
  3. Richards, R. I., and Sutherland, R. G. (1992) Cell 70, 709-712
  4. The Huntington's Disease Collaborative Research Group (1993) Cell 72, 971-983
  5. Miva, S. (1994) Nature Genet. 6, 3-4
  6. Kremer, E. J., Pritchard, M., Lynch, M., Yu, S., Holman, K., Baker, E., Warren, S. T., Schlessinger, D., Sutherland, G. R., and Richards, R. I. (1991) Science 252, 1711-1714
  7. Mahadevan, M., Tsilfidis, C., Sabourin, L., Shutler, G., Amemiya, C., Jansen, G., Neville, C., Narang, M., Barcelo, J., O'Hoy, K., Leblond, S., Earle-MacDonald, J., De Jong, P. J., Wieringa, B., and Korneluk, R. G. (1992) Science 255, 1253-1255
  8. Biancalana, V., Serville, F., Pommier, J., Julien, J., Hanauer, A., and Mandel, J. L. (1992) Hum. Mol. Genet. 1, 255-258
  9. Orr, H. T., Chung, M-Y., Banfi, S., Kwiatkowski, T. J., Servadio, A., Beaudet, A. L., McCall, A. E., Duvick, L. A., and Ranum, L. P. W. (1993) Nature Genet. 4, 221-226
  10. La Spada, A. R., Wilson, E. M., Lubahn, D. B., Harding, A. E., and Fischbeck, K. H. (1991) Nature 352, 77-79
  11. Sinden, R. R., and Wells, R. D. (1992) Curr. Opin. Biotechnol. 3, 612-615
  12. Murchie, A. I. H., Bowater, R., Aboul-ela, F., and Lilley, D. M. J. (1992) Biochim. Biophys. Acta 1131, 1-15
  13. Mirkin, S. M., and Frank-Kamenetskii, M. D. (1994) Annu. Rev. Biophys. Biomol. Struct. 23, 541-576
  14. Lu, G.-H., and Ferl, R. J. (1993) Int. J. Biochem. 25, 1529-1537
  15. Wohlrab, F., and Wells, R. D. (1989) J. Biol. Chem. 264, 8207-8213
  16. Sen, D., and Gilbert, W. (1988) Nature 334, 364-366
  17. Sundquist, W. I., and Klug, A. (1989) Nature 342, 825-829
  18. Williamson, J. R., Raghuraman, M. K., and Cech, T. R. (1989) Cell 59, 871-880
  19. Panyutin, I. G., Kovalsky, O. I., Budowsky, R. E., Dickerson, R. E., Rikhirev, M. E., and Lipanov, A. A. (1990) Proc. Natl. Acad. Sci. U. S. A. 87, 867-872
  20. Wieringa, B. (1994) Human Mol. Genet. 3, 1-7
  21. Bates, G., and Lehrach, H. (1994) BioEssays 16, 277-284
  22. Wang, Y. H., Amirhaeri, S., Kang, S., Wells, R. D., and Griffith, J. (1994) Science 265, 669-671
  23. Jaworski, A., Hsieh, W.-T., Blaho, J. A., Larson, J. E., and Wells, R. D. (1987) Science 238, 773-777
  24. Lukomski, S., and Wells, R. D. (1994) Proc. Natl. Acad. Sci. U. S. A. 91, 9980-9984
  25. Zheng, G., Kochel, T., Hoepfner, R. W., Timmons, S. E., and Sinden, R. R. (1991) J. Mol. Biol. 221, 107-129
  26. Rahmouni, A. R., and Wells, R. D. (1992) J. Mol. Biol. 223, 131-144
  27. Ussery, D. W., and Sinden, R. R. (1993) Biochemistry 32, 6206-6213
  28. Kohwi, Y., and Kohwi-Shigematsu, T. (1991) Genes & Dev. 5, 2547-2554
  29. Clark, S. P., Lewis, C. D., and Felsenfeld, G. (1990) Nucleic Acids Res. 18, 5119-5126
  30. Higgins, G. J., Lokey, L. K., Chastain, J. L., Leiner, H. A., Sherman, S. L., Wilkinson, K. D., and Warren, S. T. (1992) Nature Genet. 2, 186-191
  31. Zhu, Q. S., Heisterkamp, N., and Groffen, J. (1990) Nucleic Acids Res. 18, 7119-7125
  32. Richards, R. I., Holman, K., Yu, S., and Sutherland, G. R. (1993) Hum. Mol. Genet. 2, 1429-1435
  33. Gorman, C. M., Moffat, L. F., and Howard, B. H. (1982) Mol. Cell. Biol. 2, 1044-1051
  34. Zacharias, W., Jaworski, A., Larson, J. E., and Wells, R. D. (1988) Proc. Natl. Acad. Sci. U. S. A. 85, 7069-7073
  35. Blaho, J. A., Larson, J. E., McLean, M. J., and Wells, R. D. (1988) J. Biol. Chem. 263, 14446-14455
  36. Wells, R. D., Miglietta, J. J., Klysik, J., Larson, J. E., Stirdivant, S. M., and Zacharias, W. (1982) J. Biol. Chem. 257, 10166-10171
  37. Ryan, T., Behringer, R. R., Martin, N. C., Townes, T. M., Palmiter, R. D., and Brinster, R. L. (1989) Genes & Dev. 3, 314-323
  38. Maxam, A. M., and Gilbert, W. (1980) Methods Enzymol. 65, 499-560
  39. Gorman, C. M. (1985) in DNA Cloning (Glover, D. M., ed) Vol. 2, pp. 143-190, IRL Press, Washington, D. C.
  40. Hirt, B. (1967) J. Mol. Biol. 26, 365-369
  41. Berger, S. L., and Kimmel, A. R. (1987) Methods Enzymol. 152, 182-183
  42. Chirgwin, J. M., Przybyla, A. E., MacDonald, R. J., and Rutter, W. J. (1979) Biochemistry 18, 5294-5299
  43. Casasnovas, J. M., Ellison, M. J., Rodriguez-Campos, A., and Azorin, F. (1987) Eur. J. Biochem. 167, 489-492
  44. Howard, H., Seidman, M., Howard, B. H., and Gorman, C. M. (1984) Mol. Cell. Biol. 4, 2622-2630
  45. Ruiz-Carrillo, A., and Renaud, J. (1987) EMBO J. 6, 401-407
  46. Gottlieb, J., and Muzyczka, N. (1988) Mol. Cell. Biol. 6, 2513-2522
  47. Cote, J., Renaud, J., and Ruiz-Carrillo, A. (1989) J. Biol. Chem. 264, 3301-3310
  48. Wohlrab, F., Chatterjee, S., and Wells, R. D. (1991) Proc. Natl. Acad. Sci. U. S. A. 88, 6432-6436
  49. Kozak, M. (1989) Mol. Cell. Biol. 9, 5134-5142
  50. Pelletier, J., and Sonnenberg, N. (1985) Cell 40, 515-526
  51. Wells, R. D., Amirhaeri, S., Blaho, J. A., Collier, D. A., Hanvey, J. C., Jaworski, A., Larson, J. E., Rahmouni, A., Rajagopalan, M., Shimizu, M., Wohlrab, F., and Zacharias, W. (1990) in Structure and Methods (Sarma, R. H., and Sarma, M. H., eds) pp. 25-31, Adenine Press, Schenectady, NY
  52. Kohwi, Y. (1989) Nucleic Acids Res. 17, 4493-4502
  53. Kohwi-Shigematsu, T., and Kohwi, Y. (1986) Cell 43, 199-206
  54. Boles, T. C., and Hogan, M. E. (1987) Biochemistry 26, 367-376
  55. McCarthy, J. G., and Heywood, S. M. (1987) Nucleic Acids Res. 15, 8069-8085
  56. Wohlrab, F., McLean, M. J., and Wells, R. D. (1987) J. Biol. Chem. 262, 6407-6416
  57. Lewis, C. D., Clark, S. P., Felsenfeld, G., and Gould, H. (1988) Genes & Dev. 2, 863-873
  58. Hapgood, J., and Patterton, D. (1994) Mol. Cell. Biol. 14, 1402-1418
  59. Ptashne, M., and Gann, A. F. (1990) Nature 346, 329-331
  60. Liu, Z., and Gilbert, W. (1994) Cell 77, 1083-1092
  61. Chien, C. T., Bartel, P. L., Sternglanz, R., and Fields, S. (1991) Proc. Natl. Acad. Sci. U. S. A. 88, 9578-9582
