DNA Secondary Structures and the Evolution of Hypervariable Tandem Arrays*

(Received for publication, August 2, 1996, and in revised form, December 4, 1996)

M. Neale Weitzmann‡, Kerry J. Woodford‡, and Karen Usdin§

From the Section on Genomic Structure and Function, Laboratory of Molecular and Cellular Biology, NIDDK, National Institutes of Health, Bethesda, Maryland 20892-0830



ABSTRACT

Tandem repeats are ubiquitous in nature and constitute a major source of genetic variability in populations. This variability is associated with a number of genetic disorders in humans including triplet expansion diseases such as Fragile X syndrome and Huntington's disease. The mechanism responsible for the variability/instability of these tandem arrays remains contentious. We show here that formation of secondary structures, in particular intrastrand tetraplexes, is an intrinsic property of some of the more unstable arrays. Tetraplexes block DNA polymerase progression and may promote instability of tandem arrays by increasing the likelihood of reiterative strand slippage. In the course of doing this work we have shown that some of these tetraplexes involve unusual base interactions. These interactions not only generate tetraplexes with novel properties but also lead us to conclude that the number of sequences that can form stable tetraplexes might be much larger than previously thought.


INTRODUCTION

Tandemly repeated DNA sequences are distributed widely in nature and may constitute as much as 10% of the human genome (1). They are sometimes referred to as satellites, minisatellites, or microsatellites, depending on their repeat size or array length. Polymorphic tandem repeats are also sometimes referred to as hypervariable repeats (HVRs)1 or variable number of tandem repeats. Instability of some of these tandem arrays has been implicated in a number of disease states including the so-called triplet expansion diseases (2) such as Fragile X syndrome, one of the most frequent single gene disorders and the second most common genetic cause of mental retardation (3).

The nature of the evolutionary forces that act to create and maintain these tandem arrays has been the subject of much debate (1, 4-12). Processes such as unequal crossing over during recombination (13) and strand slippage during replication (14, 15) have been invoked as potential mechanisms for both the generation of these tandem arrays and for the variability that is sometimes associated with these sequences. This variability is of two sorts. Tandem arrays can show length changes due to the gain and loss of repeat units. These changes tend to occur at one end of the array, and for this reason are said to show polarity. Tandem arrays are also prone to the acquisition of point mutations, and the distribution of these mutations shows a similar polarity (9, 12, 16, 17). This has led to the suggestion that either flanking sequences are important in imparting polarity to an otherwise non-polar process (12) or a mechanism that has an inherent polarity such as replication slippage (16) is involved. However, many of the most hypervariable arrays show a many-fold increase in repeat number that is thought to take place within the space of only a few cell divisions (18). Such a large increase in repeat number cannot be accomplished by a single strand slippage or recombinational event, and it has been suggested that in such cases some specialized mutational mechanism must be active (19, 20).

Many hypervariable sequences that have been described are G + C-rich and show a strand asymmetry in that one strand is predominantly G-rich and the other C-rich (21). It had been suggested that these sequences contained a χ-like sequence that could account for the observed variability by promoting recombination (10). However, many of the more recently identified hypervariable sequences lack a discernible χ-like motif. We had previously found that a hypervariable sequence, the CGG repeat in the human FMR1 gene that undergoes triplet expansion to result in Fragile X syndrome (22, 23), forms a series of intrastrand tetraplexes at physiological temperatures, pH, and ionic strengths (24). This occurs despite the fact that this sequence is one-third C, and this C-richness would be expected to reduce tetraplex stability. We have now tested a series of other highly hypervariable tandem repeats (Table I) for the ability to form intrastrand tetraplexes using a K+-dependent arrest of DNA synthesis assay that we have recently developed (25). These sequences are also G + C-rich but, like the CGG repeat at the FMR1 locus, contain a number of non-G bases. We have found that the ability to form intrastrand tetraplexes is a shared property of all of these sequences. This, together with the observation that other hypervariable tandem arrays form hairpins (24, 26-32) or triplexes (33), supports the idea that DNA secondary structure may play a major role in the generation and evolution of tandem arrays.

Table I. Hypervariable sequences used in this study

A numeral following a base or a parenthesized unit gives the number of times that base or unit is repeated (for example, G4 is GGGG).

Description          Organism  Ref.  Repeat unit     Sequence tested
Ms6-hm locus         Mice      72    CAGGG           (CAGGG)8
(TGG)n               Humans    50    TGG             (TGG)20
D4S43 locus          Humans    51    G4AG5AAGA       GGAG5AAGA(G4AG5AAGA)2
Insulin-linked HVR   Humans    73    ACAGGGGTGTGGGG  (ACAGGGGTGTGGGG)4
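The compact notation in Table I can be expanded mechanically. The following is a minimal sketch, not part of the original methods; the function names `expand` and `g_fraction` are hypothetical, and the parser assumes that a numeral after a base or after a closing parenthesis is a repeat count.

```python
import re

def expand(notation: str) -> str:
    """Expand compact repeat notation, e.g. 'G4AG5AAGA' or '(CAGGG)8'.

    Assumption: a number after a single base repeats that base, and a
    number after a parenthesized unit repeats the whole unit.
    """
    # Step 1: expand single bases carrying a count ('G4' -> 'GGGG').
    # Counts that follow ')' are left alone because ')' is not a base.
    seq = re.sub(r"([ACGT])(\d+)",
                 lambda m: m.group(1) * int(m.group(2)),
                 notation)
    # Step 2: expand parenthesized units ('(TGG)20' -> 'TGG' x 20).
    seq = re.sub(r"\(([ACGT]+)\)(\d+)",
                 lambda m: m.group(1) * int(m.group(2)),
                 seq)
    return seq

def g_fraction(seq: str) -> float:
    """Fraction of positions in seq that are G."""
    return seq.count("G") / len(seq)

if __name__ == "__main__":
    for entry in ["(CAGGG)8", "(TGG)20",
                  "GGAG5AAGA(G4AG5AAGA)2", "(ACAGGGGTGTGGGG)4"]:
        full = expand(entry)
        print(entry, len(full), round(g_fraction(full), 2))
```

Expanding the D4S43 entry this way yields one partial unit followed by two full units, which corresponds to the "2.5 copies" described for this template in the figure legends.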


MATERIALS AND METHODS

Clone Construction

Oligonucleotides containing hypervariable repeat units were synthesized on an ABI 381A oligonucleotide synthesizer using standard phosphoramidite chemistry and cloned into the plasmid pMS189Δ as described previously (24, 34). Plasmids were replicated in Escherichia coli MBM7070, isolated by alkaline lysis, and purified by CsCl gradient centrifugation according to standard procedures.

Intrastrand Tetraplex Assay

Hypervariable sequences were tested for the ability to block DNA synthesis reactions as follows (25). Sequencing primer was phosphorylated with [γ-32P]ATP (DuPont NEN, 3000-6000 Ci/mmol) using T4 polynucleotide kinase (Epicentre Technologies, Inc.) and a buffer containing 50 mM Tris-HCl, pH 8.0, and 10 mM MgCl2. Reaction mixtures (total volume 6 µl) contained 0.2-2 nM template, 0.16 nM of the primer SupFR4 (5'-ATGCTTTTACTGGCCTGCT-3'), 10 µM dNTPs, one of the following dideoxynucleotides at the concentration indicated in parentheses: ddATP (0.3 mM), ddGTP (0.017 mM), ddCTP (0.2 mM), ddTTP (0.6 mM), 50 mM Tris-HCl, pH 9.3, 2.5 mM MgCl2, 5 units of Taq polymerase (Life Technologies, Inc.), and where indicated 50 mM monovalent cation. Reaction mixtures were subjected to 30 rounds of heating and cooling in a Perkin-Elmer PCR machine for 30 s at 95 °C, 30 s at 55 °C, and 30 s at 72 °C. The reaction was terminated by the addition of one-half volume of stop buffer containing 95% (v/v) formamide, 10 mM EDTA, pH 9.5, 10 mM NaOH, 0.1% xylene cyanol, and 0.1% bromphenol blue, and the mixtures were heated at 90 °C for 5 min prior to electrophoresis on a 6.5% polyacrylamide sequencing gel. The sequence located between the sequencing primer and the repeat on the template strand is 5'-CTCGAGTCAACGTAACACTTTACAGCGGCGCGTCATTTGATATGATGCGCCCCGCTTCCCGATAAGGG-3'.
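The concentrations quoted above can be converted into absolute amounts per 6-µl reaction and into ddNTP:dNTP ratios. This is a minimal arithmetic sketch only; the helper names are hypothetical, and it assumes that "10 µM dNTPs" means 10 µM of each deoxynucleotide, which the text does not state explicitly.

```python
# Values taken from the reaction description; names are illustrative.
REACTION_VOLUME_L = 6e-6          # 6 µl total volume
DNTP_M = 10e-6                    # assumed: 10 µM of EACH dNTP
DDNTP_M = {                       # dideoxynucleotide concentrations
    "ddATP": 0.3e-3,
    "ddGTP": 0.017e-3,
    "ddCTP": 0.2e-3,
    "ddTTP": 0.6e-3,
}

def femtomoles(conc_molar: float, volume_l: float = REACTION_VOLUME_L) -> float:
    """Amount of a component in one reaction, in femtomoles."""
    return conc_molar * volume_l * 1e15

def dd_to_d_ratio(name: str) -> float:
    """Molar ratio of a ddNTP to its matching dNTP."""
    return DDNTP_M[name] / DNTP_M

if __name__ == "__main__":
    print("template, low end (fmol):", femtomoles(0.2e-9))
    print("template, high end (fmol):", femtomoles(2e-9))
    print("primer (fmol):", femtomoles(0.16e-9))
    for name in DDNTP_M:
        print(name, "to dNTP ratio:", dd_to_d_ratio(name))
```

Under these assumptions the reactions contain femtomole quantities of template and primer, consistent with a cycled, linear primer-extension format rather than exponential amplification.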

Preparation of 7-Deazaguanine Templates by PCR

Templates containing guanine or 7-deazaguanine were prepared by PCR amplification of plasmids containing the HVR of interest using the primers AMP2 (5'-GGCGACACGGAAATGTTGAA-3') and supFR1 (5'-GATCGAATTCGTCGACATGGTGGTGGGGGAA-3'), which flank the HVR. The primer binding sites are located about 500 bases apart, the precise distance depending on the template, with the repeat being located about halfway between the two primer binding sites. Reaction mixtures contained 10 ng of plasmid template DNA containing the repeat of interest; 1 µM each of AMP2 and supFR1; 2.5-5 units of Taq polymerase (Life Technologies, Inc.); 50 mM Tris-HCl, pH 8.0; 10 mM MgCl2; and 100 or 160 µM each of dATP, dTTP, dCTP, and either dGTP or 7-deaza-dGTP. They were overlaid with a drop of mineral oil and subjected to 30 rounds of heating and cooling in a Perkin-Elmer PCR machine for 30 s at 95 °C, 30 s at 55 °C, and 30 s at 72 °C. The PCR products were purified on a 5% polyacrylamide gel and used as templates in the tetraplex assay described above.

DMS Protection Assays

Dimethyl sulfate (DMS) protection assays were performed on gel-purified oligonucleotides using the method of Williamson et al. (35) with slight modifications. End-labeled oligonucleotide (1-5 ng per reaction) was resuspended in 18 µl of TE buffer and heated for 1 min at 90 °C. Potassium chloride (1 µl) was added to appropriate tubes to a final concentration of 50 mM. Reactions were then heated for 30 s at 95 °C, 30 s at 55 °C, and 30 s at 72 °C, cooled to room temperature, and reacted for 1 min with 1 µl of DMS (diluted 1:5 in water). Reactions were terminated by addition of 20 µl of 2 M pyrrolidine (diluted in cold water), and cleavage was effected at 90 °C for 10 min. Samples were precipitated twice with 1.2 ml of butan-1-ol. The samples were dried under vacuum, redissolved in 20 µl of 42.5% (v/v) formamide, 5 mM EDTA, pH 9.5, 5 mM NaOH, 0.05% xylene cyanol, 0.05% bromphenol blue, denatured for 5 min at 90 °C, and run on a 20% sequencing gel. Gels were covered with plastic wrap and exposed to x-ray film overnight at -20 °C.


RESULTS

Intrastrand tetraplexes form when four G-rich motifs on a single strand interact to form a series of tetrads (36-39). A series of stacked tetrads creates a hollow stem or cylinder. This stem is bounded by three loops formed by bases between the G-rich regions (L1, L2, and L3 in Fig. 1). We have recently developed a highly sensitive and specific technique for the identification of sequences that can form intrastrand DNA tetraplexes (25, 34, 40). This assay, illustrated in Fig. 1, is based on the ability of such sequences to block DNA polymerase progression in the presence of K+ but not in the absence of monovalent cations or in the presence of cations such as Li+, NH4+, Rb+, or Cs+. The specificity of this reaction for K+ is probably related to the fact that its ionic radius is small enough for the ion to fit inside the tetraplex cavity but is still large enough for it to interact with the keto oxygens of guanines in adjacent tetrads (41). This K+ specificity parallels the K+-dependent anomalous mobility of tetraplex-forming oligonucleotides that is considered a diagnostic feature of tetraplex formation (35, 42, 43). Our assay is simple to use and has the advantage of allowing multiple tetraplexes to be discerned in a mixture of such structures, and of allowing tetraplexes to be identified even when they are formed by only a small fraction of the molecules in solution.
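The requirement for four G-rich motifs separated by three loops on one strand can be expressed as a simple sequence scan. The sketch below is a heuristic illustration only and is not the assay described here: the function names, the minimum run length (`min_run`), and the maximum loop length (`max_loop`) are assumptions, and because tetraplexes may involve non-G bases, a negative result from such a scan does not exclude tetraplex formation.

```python
import re

# Translation table for complementing a DNA sequence.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the complementary strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def has_four_g_runs(seq: str, min_run: int = 2, max_loop: int = 7) -> bool:
    """True if seq contains four runs of >= min_run Gs separated by
    three loops of 1..max_loop bases (hypothetical thresholds)."""
    pattern = (rf"G{{{min_run},}}"
               rf"(?:[ACGT]{{1,{max_loop}}}G{{{min_run},}}){{3}}")
    return re.search(pattern, seq) is not None

if __name__ == "__main__":
    g_strand = "CAGGG" * 8                    # Ms6-hm repeat, G-rich strand
    c_strand = reverse_complement(g_strand)   # complementary C-rich strand
    print("G-rich strand:", has_four_g_runs(g_strand, min_run=3))
    print("C-rich strand:", has_four_g_runs(c_strand, min_run=3))
```

Scanning both strands in this way mirrors the strand asymmetry of these arrays: only the G-rich strand carries the runs of guanines needed to build stacked tetrads.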


Fig. 1. The K+-dependent block to DNA synthesis assay for tetraplex formation. Diagrammatic representation of the tetraplex arrest assay on a template containing a generic intrastrand tetraplex containing five G4 tetrads (shown as gray parallelograms). The loops L1, L2, and L3 each contain three unspecified bases (N). DNA synthesis starts 3' of the tetraplex-forming region and proceeds in a 5' to 3' direction toward the tetraplex. The front end of the polymerase is represented by the diagonally striped bullet and the nascent DNA strand by the dashed line. The site of premature chain termination that would result from the formation of the tetraplex on the template strand is indicated by the filled arrow. Inset, a G4 tetrad with a K+ ion situated within the tetrad cavity (not to scale).


One of the most unstable loci thus far identified in any organism is the mouse minisatellite locus Ms6-hm, which has a germ line mutation rate of 2.5% per gamete and which shows frequent intergenerational changes of a kilobase or more (44). This locus contains from 200 to >1000 repeats of the pentamer 5'-CAGGG-3'. A template containing eight CAGGG repeats was tested for the ability to form a K+-dependent block to DNA synthesis. Two distinct non-dideoxynucleotide-mediated chain termination products are seen at the 3' end of the repeat tract in the presence of 50 mM KCl when the G-rich strand is used as a template (Fig. 2). The more prominent of the two products (filled arrow) corresponds to a block to DNA synthesis just 3' of the first G residue in the first 5'-CAGGG-3' repeat on the template. The second product (open arrow) corresponds to premature chain termination one base 3' of this one. A series of weaker stops is seen at corresponding positions in the next four repeats. A smaller amount of premature termination is also observed in the presence of 50 mM NaCl, but none is observed in the absence of cation or in the presence of LiCl, RbCl, CsCl, or NH4Cl. Since metal binding sites on a hairpin are equally accessible to all cations, and the affinity of cations for binding sites on DNA decreases slightly with increasing metal ion radius (45), the cation specificity is inconsistent with the blocks being due to hairpin formation. No block to DNA synthesis is seen when the complementary strand is used as a template (Fig. 2, right panel) or when single-stranded phage DNA is used as a template (data not shown), ruling out structures such as triplexes that involve interactions between the template and its complementary strand (46). Arrest of DNA synthesis is seen when these repeats are cloned into other vectors (data not shown), indicating that flanking sequences are not involved. Blockage is also independent of template concentration over a wide range (data not shown), indicating that the blocks do not involve interactions between two or more template strands but are due to the formation of intrastrand structures.


Fig. 2. Tetraplex assay of the Ms6-hm locus. The assay was carried out on the G-rich and the C-rich strands, in the presence of the indicated monovalent cations as described under "Materials and Methods." The lane markers T, C, G, and A indicate the bases on the template strand. The bracket on the left side of the figure indicates the tandem repeat, and the open and filled arrows mark the positions of sites of the major monovalent cation-dependent sites of premature chain termination.


The properties of both the Na+- and the K+-dependent DNA synthesis arrest sites including the position of the blocks to DNA synthesis, the template concentration independence, and the strand specificity, are most consistent with intrastrand tetraplex formation. The major stop reflects the most stable tetraplex(es) involving the maximum number of repeats. The less prominent stops at subsequent repeats reflect a series of tetraplexes that presumably involve a smaller number of repeats. In addition to these monovalent cation-dependent stops, a smaller amount of cation-independent premature chain termination is seen at the second G of every repeat. These stops are even more marked in both guanine and 7-deazaguanine containing linear templates (Fig. 3), and this is paralleled by a hypersensitivity of that G to methylation by DMS (see Fig. 4). We hypothesize that these phenomena may be related to a conformational peculiarity of the DNA backbone of this region.


Fig. 3. Tetraplex assay of the Ms6-hm locus on templates containing 7-deazaguanine. The Ms6-hm HVR was assayed for tetraplex formation using PCR-generated templates containing either guanine or 7-deazaguanine as described under "Materials and Methods." The assay was conducted in the absence of added monovalent cation (0), in the presence of 50 mM K+ (KCl), or in the presence of 50 mM Na+ (NaCl). The lane markers T, C, G, and A indicate the bases on the template strand. The brackets alongside the gel indicate the extent of each tandem array. The filled arrow marks the first major K+-dependent block to DNA synthesis, and the open arrow marks the second stop.



Fig. 4. Dimethyl sulfate protection assay of the Ms6-hm HVR. Oligonucleotides containing the Ms6-hm HVR were treated with DMS in the presence and absence of K+ as described under "Materials and Methods." The bracket demarcates the HVR. The solid vertical line on the left indicates the region of DMS protection. Solid arrows represent DMS-reactive bases. The asterisk indicates the G located outside the HVR that serves as a reference base for comparison of the DMS reactivity of bases within the HVR with and without K+.


To confirm that polymerase arrest in the presence of K+ and Na+ is related to tetraplex formation, the polymerase chain reaction (PCR) was used to generate templates containing either guanine or 7-deazaguanine. These templates were then tested for the ability to cause K+/Na+-dependent DNA synthesis arrest. Since 7-deazaguanine lacks the N7 that guanine contributes to the hydrogen bonds of a G tetrad, substitution of all guanine residues with 7-deazaguanine should abolish the K+/Na+-dependent polymerase blocks. As can be seen in Fig. 3, this is precisely what happens. The PCR template in which all the Gs have been replaced by 7-deazaguanine has lost all the K+/Na+-dependent blocks to DNA synthesis, whereas the PCR template containing guanines produces the same blocks to DNA synthesis seen on the circular templates (Fig. 3).

DMS treatment of an oligonucleotide containing the HVR was also carried out. Since Gs involved in tetrads do not have their N7 positions exposed, they are protected from modification by DMS. In theory, Gs in tetrads are completely protected from DMS, whereas Gs in the loops of the tetraplex that are not involved in intraloop or interloop interactions should be DMS-reactive (24, 48). In practice, the picture is not always so clear, and this represents a very real limitation on the value of this technique. For example, if a tetraplex is not very stable and is formed by only a small fraction of the molecules in the population, this may produce a pattern of DMS modification in which only partial protection of Gs is apparent. In addition, many tetraplex-forming sequences show conformational complexity that can complicate DMS data interpretation, since a base protected in one structure may be exposed in another. Since the fraction of molecules in the population that form a K+-dependent block to DNA synthesis in the case of the mouse Ms6-hm HVR is small, we would expect to see some DMS protection, but this protection would not be complete. This is in fact the case (Fig. 4). After normalizing the K+ and K+-free reactions to a G outside of the HVR (indicated by an asterisk in Fig. 4) we can see that Gs within the HVR show less DMS reactivity when K+ is present than when it is absent. While not definitive, these data are consistent with our other data and support the idea that the mouse Ms6-hm HVR is capable of tetraplex formation.
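The normalization to a reference G outside the HVR can be written out explicitly. The following is a minimal sketch under stated assumptions: the paper does not describe how band intensities were quantified, so the function name `protection` and all intensity values used here are hypothetical and purely illustrative.

```python
def protection(bands_k, ref_k, bands_no_k, ref_no_k):
    """Fractional DMS protection of each G in the presence of K+.

    bands_k / bands_no_k: band intensities for the same G positions
    with and without K+ (arbitrary units, hypothetical data).
    ref_k / ref_no_k: intensity of the reference G outside the HVR
    in each lane. Each lane is first normalized to its own reference
    band, so differences in loading between lanes cancel.
    Returns 0.0 for no protection and 1.0 for complete protection.
    """
    result = []
    for with_k, without_k in zip(bands_k, bands_no_k):
        norm_k = with_k / ref_k          # reactivity relative to reference, +K+
        norm_0 = without_k / ref_no_k    # reactivity relative to reference, -K+
        result.append(1.0 - norm_k / norm_0)
    return result

if __name__ == "__main__":
    # Hypothetical intensities for three Gs; not measured data.
    print(protection([50.0, 100.0, 10.0], 100.0,
                     [100.0, 100.0, 100.0], 100.0))
```

Values between 0 and 1 correspond to the partial protection expected when only a fraction of the molecules in the population adopt the tetraplex.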

Why a Na+-induced polymerase block is seen only with this sequence and not with other tetraplexes we have tested (24, 25, 34, 47) is not clear, but preliminary evidence suggests that it is related to the involvement of adenines in the structure, since the sequence (CTGGG)12 shows K+-dependent but not Na+-dependent DNA polymerase arrest (data not shown). However, the mere presence of adenines is not sufficient to elicit a Na+ stop, since not all A-containing templates show such stops (Fig. 5). Rather, we believe the Na+ effect is related to a specific hydrogen bonding interaction in which As are involved. The molecular basis of the Na+ effect is currently under investigation.


Fig. 5. Tetraplex formation by different HVRs. Plasmids containing (TGG)20 (A), 2.5 copies of the sequence 5'-GGGGAGGGGGAAGA-3' from the human D4S43 locus (B), and the sequence (ACAGGGGTGTGGGG)4 from the human insulin-linked HVR (C) were tested for tetraplex formation as described under "Materials and Methods." The lane markers T, C, G, and A indicate the bases on the template strand. The brackets indicate the extent of each tandem array. The positions of the major K+-dependent blocks to DNA synthesis are indicated by black lines. The dashed line in the case of the D4S43 HVR marks the position of a monovalent cation-independent arrest site. The open circle in A marks the position of a monovalent cation-independent arrest site seen only on linear templates.


Tandem arrays of the repeat 5'-TGG-3' are polymorphic (49), as are arrays containing a mixture of the triplets AGG and TGG (50). As with the mouse Ms6-hm minisatellite, we found that a template containing (TGG)20 blocked DNA synthesis in a K+-dependent manner (Fig. 5A), producing a series of premature chain termination products corresponding to arrest opposite the T residues of repeats 13-20 in the (TGG)20 tract. No blocks are seen when the complementary pyrimidine-rich strand is used as template (Fig. 5A). The blocks to DNA synthesis disappear when 7-deazaguanine is incorporated into the template strand (Fig. 5A). A single novel weak stop (open circle) is observed at the second guanine base in repeat 20 on PCR templates containing 7-deazaguanine. This stop is also seen in PCR templates containing guanines and is not dependent on monovalent cation, since it is seen in the absence of KCl (data not shown). Since this stop is unique to the PCR templates, is not affected by substitution of Gs by 7-deazaguanine, and is not related to the presence of K+, we presume that it reflects some aspect of the linear templates that is not related to tetraplex formation. Most of the guanines in the TGG repeat are also either fully or partially protected from methylation by DMS (Fig. 6, left panel), consistent with tetraplex formation.


Fig. 6. Dimethyl sulfate protection assays of the (TGG)20, the D4S43 HVR, and the insulin HVR. Oligonucleotides containing repeats of (TGG)20, the D4S43 HVR, and the insulin HVR were treated with DMS in the presence and absence of K+ as described under "Materials and Methods." The brackets on the right side of each panel demarcate each HVR. The solid vertical lines on the left side of each panel represent regions of strong DMS protection. Solid arrows mark DMS-reactive bases. The asterisks mark reference Gs outside the HVRs that are used in comparisons between reactions carried out in the presence and absence of K+.


We have previously shown that a (CGG)20 tract blocks DNA synthesis in a similar manner, producing eight premature chain termination products opposite C residues at the 3' end of the CGG tract (24). The similarity in both the pattern of polymerase arrest and DMS protection leads us to think that the tetraplexes formed by these sequences could be very similar. Such tetraplexes may contain G4 tetrads interspersed with pyrimidines or a smaller number of G4 tetrads interspersed with a mixture of Gs and either T or C. We have previously shown that an AGG triplet does not destabilize a CGG-containing tetraplex (24). It is therefore reasonable to assume that a mixture of AGGs and TGGs would also form a tetraplex.

We also tested repeats with the sequence 5'-GGGGAGGGGGAAGA-3'. Between 1 and 22 repeats of this unit are found upstream of the Huntington's disease gene in humans (51). A template containing 2.5 repeats of this sequence produces a complex pattern of premature chain terminations. There is at least one strong strand-specific K+-dependent block to DNA synthesis and a number of other more minor ones. A small amount of monovalent cation-independent polymerase arrest is seen at the 3' end of the D4S43 tract. This may be due either to the formation of a small amount of tetraplex in the absence of monovalent cation or the formation of another structure such as a hairpin that forms independently of added monovalent cation. A significant amount of monovalent cation-independent arrest is seen in the middle of this tract (indicated by the dashed line in Fig. 5B). This block is consistent with triplex formation between the G-rich template and the nascent strand (52). Any or all of these blocks to DNA synthesis could explain the difficulties reported in amplifying this region by PCR and the observation that incorporation of 7-deaza-dGTP is able to correct this problem (51). Once again, the K+-dependent blocks disappear when other monovalent cations are substituted for K+, or when K+ is omitted, and no K+-dependent stops are seen when the complementary pyrimidine-rich strand is used as a template.

Substitution of guanines in the template with 7-deazaguanine eliminates the K+-dependent blocks to DNA synthesis (Fig. 5B). The K+-independent polymerase arrest observed midway through the sequence is also eliminated, supporting the hypothesis that this stop may represent a purine:purine:pyrimidine triplex formed between the template and the nascent strand produced in the assay. This HVR shows a pattern of DMS modification with alternating regions of DMS protection and DMS reactivity in the presence of K+ (Fig. 6). This contrasts with the almost uniform reactivity of Gs in the absence of K+. Some of the most protected bases show a DMS reactivity indistinguishable from background. Both the 7-deazaguanine substitution data and the DMS protection data are thus consistent with tetraplex formation.

Four repeats from the type I diabetes-linked hypervariable region in the human insulin promoter also produce a number of K+-dependent blocks to DNA synthesis consistent with an array of different tetraplexes (Fig. 5C). These blocks are eliminated by substitution of guanine with 7-deazaguanine and are not observed on the complementary pyrimidine-rich strand. A number of Gs in the HVR are as reactive with DMS as a reference base outside the repeat (indicated with an asterisk in Fig. 6, right panel). These Gs are separated by regions of protected Gs in which no reactivity can be seen above background. Based on indirect evidence from gel electrophoretic mobility assays, and using enzymatic and chemical probes, it had been suggested that this region is able to form a series of intramolecular tetraplexes (43, 53, 54). Our data support this claim.


DISCUSSION

Our observations suggest that the ability to form an intrastrand tetraplex in vitro is a common feature of a number of hypervariable sequences including the mouse minisatellite at the Ms6-hm locus which is one of the most hypervariable sequences thus far described (44). The tetraplex formed by the repeats in the Ms6-hm tandem array is unusual in that it can be stabilized by Na+ as well as K+, albeit with lower efficacy. This contrasts with our observations that all other tetraplexes that we have tested are seen only in the presence of K+ (24, 25, 34, 40, 47). Since the ionic radius of Na+ is smaller than that of K+, it may be that the Ms6-hm tetraplex has smaller internal dimensions than the other previously described tetraplexes. This interpretation is consistent with the fact that other monovalent cations such as Rb+, Cs+, and NH4+ do not result in a block to DNA synthesis in our assay, since these ions have radii that are all larger than that of K+. Li+, on the other hand, is much smaller than Na+ and may still be too small to form the coordination complex that is important in stabilizing these types of structures (41). Our assay might thus be useful in distinguishing between different kinds of tetraplexes such as those that are K+-specific and that correspond to previously described G4 tetrad containing tetraplexes and those that are also seen in the presence of other cations, specifically Na+, that may represent a novel class of tetraplex with different base interactions and thus different properties.

Since we have shown previously that the amount of K+ used in this assay represents saturating amounts of cation for tetraplex formation (24), it is likely that the same pattern of polymerase pausing/tetraplex formation would be seen at physiological [K+], which is typically around 150 mM in mammalian cells (55). Tetraplex formation in vivo would require these regions to be transiently unpaired at some time. This might occur during DNA replication or on extrusion from otherwise duplex molecules (53, 56) at any time during the cell cycle. In eukaryotic cells it is thought that only relatively small regions of DNA are unpaired during replication, although it has been suggested that many hundreds of bases can be unpaired under certain circumstances (57). Direct evidence for an altered structure in vivo has been obtained for one of these sequences, that of the human insulin HVR (58), suggesting that formation of DNA tetraplexes by the hypervariable sequences described here might in fact be possible. The fact that a variety of tetraplex-binding proteins have been isolated from eukaryotic cells (59-65) supports the idea that tetraplexes can form in vivo. The HVRs we have tested are much shorter than those actually found at their specified loci on chromosomes. Therefore not only could the number of potential tetraplexes at these loci be much larger, but the stability of these tetraplexes could be significantly higher as well.

A variety of other tandem repeats have been shown to form fold-back structures. These include the 5'-CAG-3' repeat that is unstable in triplet expansion diseases such as Huntington's disease and myotonic dystrophy (26, 28, 29, 31) and the centromeric satellite sequence (27). Other simple satellites such as the A + T-rich hypervariable sequence in the 3' region of the human apolipoprotein B gene (66) also have the potential to form cruciforms and hairpins. Some G + C-rich repeats may also form other unusual DNA structures such as triplexes (33).

In the strand slippage models for the generation and evolution of tandem arrays, the nascent strand dissociates from the template, allowing the two strands to slip relative to one another. Successful priming from the slipped position results in a change in repeat number. Factors that favor strand dissociation over polymerization or that stabilize a slipped nascent strand-template complex would be expected to affect the frequency with which repeat units are added to or lost from the array. Blocks to DNA synthesis, such as those resulting from tetraplex formation, would be expected to increase the likelihood that strand slippage would occur. Since the strongest blocks to DNA synthesis are encountered at the 3' end of such an array, these structures would account for the polarity observed for the gain and loss of repeat units from tandem arrays (12, 16, 67). In addition, since polymerase pause sites are known to be hotspots for nucleotide misinsertions (68), such blocks could also explain the clustering of point mutations at one end of the array (12, 16, 67).
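The qualitative link between slippage frequency and repeat-number change can be illustrated with a toy simulation. This sketch is not a model taken from this paper or from the cited work; the function names, the single `slip_prob` parameter, the one-unit step size, and the equal chance of gain and loss are all simplifying assumptions, and polarity along the array is not modeled.

```python
import random

def replicate(n_repeats: int, slip_prob: float, rng: random.Random) -> int:
    """One round of replication of an array with n_repeats units.

    With probability slip_prob a slippage event occurs and the array
    gains or loses one repeat unit (assumed equally likely); the array
    is never allowed to fall below one unit.
    """
    if rng.random() < slip_prob:
        step = rng.choice((-1, 1))
        return max(1, n_repeats + step)
    return n_repeats

def simulate(n_repeats: int, generations: int,
             slip_prob: float, seed: int = 0) -> list:
    """Repeat number after each generation, starting value included."""
    rng = random.Random(seed)
    history = [n_repeats]
    for _ in range(generations):
        history.append(replicate(history[-1], slip_prob, rng))
    return history

if __name__ == "__main__":
    # Hypothetical probabilities: a higher value stands in for an array
    # in which blocks to DNA synthesis make slippage more likely.
    print("low slippage :", simulate(20, 10, slip_prob=0.01))
    print("high slippage:", simulate(20, 10, slip_prob=0.5))
```

In this picture, any factor that raises the per-replication slippage probability, such as a block to polymerase progression, increases the rate at which repeat number drifts.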

One model that attempts to explain the large scale increase in repeat number seen in some tandem arrays invokes a long-lived block to DNA synthesis that induces repeated strand slippage during replication (20). Tetraplexes make compelling candidates for this long-lived block since they form strong, stable blocks to DNA synthesis under physiological conditions (24, 25, 34). We have shown that even very long hairpins are not effective barriers to DNA polymerase in our assay (see Ref. 47 and Woodford et al.2), which suggests that sequences that are only able to form hairpins may not arrest DNA synthesis. This would be consistent with in vivo observations (69). However, both tetraplexes and hairpins may act to increase the frequency of successful strand slippage by stabilizing the strand slippage intermediate, thus increasing the likelihood that reinitiation of the polymerase would occur from the slipped position.

In addition, we would expect that the intramolecular tetraplex-forming tandem arrays are also likely to form intermolecular tetraplexes involving either one or three other DNA strands (70). Formation of such structures may facilitate synapsis of the DNA strands prior to crossing over during recombination. A combination of enhanced pausing at intrastrand tetraplexes, and enhanced synapsis between strands from different chromosomes or chromatids, may promote instability by facilitating strand switching.

It is possible that the formation of secondary structures in general may contribute to the generation and evolution of tandem arrays. In this regard, we would expect the likelihood of structure formation to be affected by a variety of factors, including the nature of the flanking sequences, the local chromatin structure, the transcriptional activity of a region, the rate of replication through the tandem array, the size of individual nucleotide pools, and whether the secondary structure-forming sequence is in the leading or lagging strand of DNA synthesis (71).


FOOTNOTES

*   The costs of publication of this article were defrayed in part by the payment of page charges. The article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
‡   These authors have contributed equally to this work.
§   To whom correspondence should be addressed: Bldg. 8, Rm. 202, National Institutes of Health, 8 Center Dr. MSC 0830, Bethesda, MD 20892-0830. Tel.: 301-496-2189; Fax: 301-402-0240; E-mail: ku@helix.nih.gov.
1   The abbreviations used are: HVRs, hypervariable repeats; PCR, polymerase chain reaction; DMS, dimethyl sulfate; dd, dideoxy.
2   K. J. Woodford, M. N. Weitzmann, and K. Usdin, unpublished observations.

ACKNOWLEDGEMENTS

We thank Drs. Anthony Furano and Herbert Tabor for critical reading of this manuscript and for their advice and support.


REFERENCES

  1. Willard, H. F. (1989) Genome 31, 737-744
  2. Caskey, C. T., Pizzuti, A., Fu, Y. H., Fenwick, R. J., and Nelson, D. L. (1992) Science 256, 784-789
  3. Nussbaum, R. L., and Ledbetter, D. H. (1986) Annu. Rev. Genet. 20, 109-145
  4. Stephan, W., and Cho, S. (1994) Genetics 136, 333-341
  5. Stephan, W. (1989) Mol. Biol. Evol. 6, 198-212
  6. Weber, J. L., and Wong, C. (1993) Hum. Mol. Genet. 2, 1123-1128
  7. Willard, H. F. (1991) Curr. Opin. Genet. & Dev. 1, 509-514
  8. Armour, J. A., Wong, Z., Wilson, V., Royle, N. J., and Jeffreys, A. J. (1989) Nucleic Acids Res. 17, 4925-4935
  9. Armour, J. A., Harris, P. C., and Jeffreys, A. J. (1993) Hum. Mol. Genet. 2, 1137-1145
  10. Jeffreys, A. J., Wilson, V., and Thein, S. L. (1985) Nature 314, 67-73
  11. Jeffreys, A. J., Royle, N. J., Wilson, V., and Wong, Z. (1988) Nature 332, 278-281
  12. Jeffreys, A. J., Tamaki, K., MacLeod, A., Monckton, D. G., Neil, D. L., and Armour, J. A. (1994) Nat. Genet. 6, 136-145
  13. Smith, G. P. (1974) Cold Spring Harbor Symp. Quant. Biol. 38, 507-513
  14. Streisinger, G., Okada, Y., Emrich, J., Newton, J., Tsugita, A., Terzaghi, E., and Inouye, M. (1966) Cold Spring Harbor Symp. Quant. Biol. 31, 77-84
  15. Levinson, G., and Gutman, G. A. (1987) Mol. Biol. Evol. 4, 203-221
  16. Kunst, C. B., and Warren, S. T. (1994) Cell 77, 853-861
  17. Snow, K., Tester, D. J., Kruckeberg, K. E., Schaid, D. J., and Thibodeau, S. N. (1994) Hum. Mol. Genet. 3, 1543-1551
  18. Wöhrle, D., Hennig, I., Vogel, W., and Steinbach, P. (1993) Nat. Genet. 4, 140-142
  19. Richards, R. I., and Sutherland, G. R. (1994) Nat. Genet. 6, 114-116
  20. Wells, R. D., and Sinden, R. R. (1993) in Genome Rearrangement and Stability (Davies, K. E., and Warren, S. T., eds), 1st Ed., Vol. 7, pp. 107-138, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY
  21. Jarman, A. P., and Wells, R. A. (1989) Trends Genet. 5, 367-371
  22. Fu, Y. H., Kuhl, D. P., Pizzuti, A., Pieretti, M., Sutcliffe, J. S., Richards, S., Verkerk, A. J., Holden, J. J., Fenwick, R. G., Warren, S. T., Oostra, B. A., Nelson, D. L., and Caskey, C. T. (1991) Cell 67, 1047-1058
  23. Verkerk, A. J., Pieretti, M., Sutcliffe, J. S., Fu, Y. H., Kuhl, D. P., Pizzuti, A., Reiner, O., Richards, S., Victoria, M. F., Zhang, F. P., Eussen, B. E., van Ommen, G. B., Blonden, L. A. J., Riggins, G. J., Chastain, J. L., Kunst, C. B., Galjaard, H., Caskey, C. T., Nelson, D. L., Oostra, B. A., and Warren, S. T. (1991) Cell 65, 905-914
  24. Usdin, K., and Woodford, K. J. (1995) Nucleic Acids Res. 23, 4202-4209
  25. Weitzmann, M. N., Woodford, K. J., and Usdin, K. (1996) J. Biol. Chem. 271, 20958-20964
  26. Chen, X., Mariappan, S. V., Catasti, P., Ratliff, R., Moyzis, R. K., Laayoun, A., Smith, S. S., Bradbury, E. M., and Gupta, G. (1995) Proc. Natl. Acad. Sci. U. S. A. 92, 5199-5203
  27. Ferrer, N., Azorin, F., Villasante, A., Gutierrez, C., and Abad, J. P. (1995) J. Mol. Biol. 245, 8-21
  28. Gacy, A. M., Goellner, G., Juranic, N., Macura, S., and McMurray, C. T. (1995) Cell 81, 533-540
  29. Mitas, M., Yu, A., Dill, J., Kamp, T. J., Chambers, E. J., and Haworth, I. S. (1995) Nucleic Acids Res. 23, 1050-1059
  30. Mitas, M., Yu, A., Dill, J., and Haworth, I. S. (1995) Biochemistry 34, 12803-12811
  31. Mitchell, J. E., Newbury, S. F., and McClellan, J. A. (1995) Nucleic Acids Res. 23, 1876-1881
  32. Yu, A., Dill, J., Wirth, S. S., Huang, G., Lee, V. H., Haworth, I. S., and Mitas, M. (1995) Nucleic Acids Res. 23, 2706-2714
  33. Brereton, H. M., Firgaira, F. A., and Turner, D. R. (1993) Nucleic Acids Res. 21, 2563-2569
  34. Woodford, K. J., Howell, R. M., and Usdin, K. (1994) J. Biol. Chem. 269, 27029-27035
  35. Williamson, J. R., Raghuraman, M. K., and Cech, T. R. (1989) Cell 59, 871-880
  36. Gellert, M., Lipsett, M. N., and Davies, D. R. (1962) Proc. Natl. Acad. Sci. U. S. A. 48, 2013-2018
  37. Zimmerman, S. B., Cohen, G. H., and Davies, D. R. (1975) J. Mol. Biol. 92, 181-192
  38. Sen, D., and Gilbert, W. (1988) Nature 334, 364-366
  39. Sundquist, W. I., and Klug, A. (1989) Nature 342, 825-829
  40. Woodford, K. J., Weitzmann, M. N., and Usdin, K. (1995) Nucleic Acids Res. 23, 539
  41. Pinnavaia, T. J., Marshall, C. L., Mettler, C. M., Fisk, C. L., Miles, H. T., and Becker, E. D. (1978) J. Am. Chem. Soc. 100, 3625-3627
  42. Murchie, A. I., and Lilley, D. M. (1994) EMBO J. 13, 993-1001
  43. Hammond-Kosack, M. C., and Docherty, K. (1992) FEBS Lett. 301, 79-82
  44. Kelly, R., Gibbs, M., Collick, A., and Jeffreys, A. J. (1991) Proc. R. Soc. Lond. B Biol. Sci. 245, 235-245
  45. Ross, P. D., and Scruggs, R. L. (1964) Biopolymers 2, 79
  46. Dayn, A., Samadashwily, G. M., and Mirkin, S. M. (1992) Proc. Natl. Acad. Sci. U. S. A. 89, 11406-11410
  47. Howell, R. M., Woodford, K. J., Weitzmann, M. N., and Usdin, K. (1996) J. Biol. Chem. 271, 5208-5214
  48. Howell, R., and Usdin, K. (1997) Mol. Biol. Evol. 14, 144-155
  49. Nurnberg, P., Roewer, L., Neitzel, H., Sperling, K., Popperl, A., Hundrieser, J., Poche, H., Epplen, C., Zischler, H., and Epplen, J. T. (1989) Hum. Genet. 84, 75-78
  50. Armour, J. A., Crosier, M., Malcolm, S., Chan, J. C., and Jeffreys, A. J. (1995) Proc. R. Soc. Lond. B Biol. Sci. 261, 345-349
  51. Horn, G. T., McClatchey, A. I., Richards, B., MacDonald, M. E., and Gusella, J. F. (1991) Nucleic Acids Res. 19, 4772
  52. Baran, N., Lapidot, A., and Manor, H. (1991) Proc. Natl. Acad. Sci. U. S. A. 88, 507-511
  53. Hammond-Kosack, M. C., Dobrinski, B., Lurz, R., Docherty, K., and Kilpatrick, M. W. (1992) Nucleic Acids Res. 20, 231-236
  54. Hammond-Kosack, M. C., Kilpatrick, M. W., and Docherty, K. (1993) J. Mol. Endocrinol. 10, 121-126
  55. Ling, G. N. (ed) (1984) In Search of the Physical Basis of Life, pp. 227-269, Plenum Press, New York
  56. Voloshin, O. N., Veselkov, A. G., Belotserkovskii, B. P., Danilevskaya, O. N., Pavlova, M. N., Dobrynin, V. N., and Frank-Kamenetskii, M. D. (1992) J. Biomol. Struct. & Dyn. 9, 643-652
  57. Kunkel, T. A. (1992) BioEssays 14, 303-308
  58. Hammond-Kosack, M. C., Kilpatrick, M. W., and Docherty, K. (1992) J. Mol. Endocrinol. 9, 221-225
  59. Fang, G., and Cech, T. R. (1993) Biochemistry 32, 11646-11657
  60. Frantz, J. D., and Gilbert, W. (1995) J. Biol. Chem. 270, 9413-9419
  61. Giraldo, R., Suzuki, M., Chapman, L., and Rhodes, D. (1994) Proc. Natl. Acad. Sci. U. S. A. 91, 7658-7662
  62. Liu, Z., Frantz, J. D., Gilbert, W., and Tye, B. K. (1993) Proc. Natl. Acad. Sci. U. S. A. 90, 3157-3161
  63. Liu, Z., and Gilbert, W. (1994) Cell 77, 1083-1092
  64. Schierer, T., and Henderson, E. (1994) Biochemistry 33, 2240-2246
  65. Weisman-Shomer, P., and Fry, M. (1993) J. Biol. Chem. 268, 3306-3312
  66. Huang, L.-S., and Breslow, J. L. (1987) J. Biol. Chem. 262, 8952-8955
  67. Eichler, E. E., Hammond, H. A., Macpherson, J. N., Ward, P. A., and Nelson, D. L. (1995) Hum. Mol. Genet. 4, 2199-2208
  68. Fry, M., and Loeb, L. A. (1992) Proc. Natl. Acad. Sci. U. S. A. 89, 763-767
  69. Charette, M. F., Weaver, D. T., and DePamphilis, M. L. (1986) Nucleic Acids Res. 14, 3343-3362
  70. Sen, D., and Gilbert, W. (1990) Nature 344, 410-414
  71. Trinh, T. Q., and Sinden, R. R. (1991) Nature 352, 544-547
  72. Kelly, R., Bulfield, G., Collick, A., Gibbs, M., and Jeffreys, A. J. (1989) Genomics 5, 844-856
  73. Bell, G. I., Karam, J. H., and Rutter, W. J. (1982) Prog. Clin. Biol. Res. 103, 57-65

©1997 by The American Society for Biochemistry and Molecular Biology, Inc.