(Received for publication, June 10, 1994; and in revised form, October 12, 1994)
Polypurine-polypyrimidine (R·Y) sequences have the unusual
ability to form DNA triple helices. Such tracts are overrepresented
upstream of eukaryotic genes, although a function there has not been
clear. We report that transcription in vitro into one such
upstream R·Y tract in the direction that makes a predominantly
purine RNA is effectively blocked by formation of an intramolecular
triple helix. The triplex is triggered by transcription and stabilized
by the binding of nascent purine RNA to the template. Transcription in
the opposite direction is not restricted. Polypurine-polypyrimidine DNA
may provide a dynamic and selective block to transcription without the
aid of accessory proteins.
DNA sequences with an asymmetric strand distribution of purine
and pyrimidine bases (R·Y) constitute up to 0.4% of mammalian
genomes and are frequent in the upstream regions of genes, where they
contribute S1 nuclease-hypersensitive sites (1, 2, 3). This has led to the
speculation that R·Y sequences may be involved in gene regulation (4, 5, 6). The nuclease sensitivity of
R·Y sequences stems from the ability to adopt non-B secondary
structures, including DNA triple helices. To form an intramolecular
triplex, part of the R·Y DNA duplex must dissociate and wind back
down the major groove of the DNA helix to form nonstandard bonds with a
central purine strand. In so doing, it relaxes negative supercoils.
Consequently, the presence of negative supercoiling has a strong
influence on the formation of intramolecular triple
helices (4, 5, 6).
An argument against the
existence of intramolecular triple helices in vivo has been
that such structures require negative supercoiling to form at
physiologic pH and salt conditions, whereas evidence does not suggest
that genomic DNA is highly supercoiled. A source of negative
supercoiling for these R·Y tracts could be transcription
elongation, during which a local wave of negative supercoiling is
generated by the passage of DNA through a polymerase (7, 8). Indeed, it has been reported that
transcription through an R·Y repeat can lead to formation of
intramolecular triple helices on a supercoiled template (9).
We examined the effects of transcription through a naturally
occurring R·Y sequence from the upstream region of a
neuron-specific gene. More than 80% of the coding strand in the first
500 base pairs (bp) immediately upstream of the rat GAP-43 gene is composed of purines (10). Within this
region are three R·Y tracts (Fig. 1, top). We demonstrate
that these sequences form stable non-B structures that are consistent
with triple helices on either supercoiled or linear DNA under
conditions that approximate the in vivo state. These
structures form only when transcription is in the direction that
produces a purine-rich RNA, and formation of these structures blocks
further transcription.
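The purine content and tract definitions used above can be illustrated with a minimal sketch. It assumes a plain DNA string for one strand and treats an R·Y tract as an uninterrupted run of purines (or of pyrimidines) on that strand; the function names, the 20-base minimum length, and the toy sequence are illustrative assumptions and are not the GAP-43 sequence or the procedure used in this work.

```python
import re

PURINES = set("AG")


def purine_fraction(strand: str) -> float:
    """Fraction of A/G bases on one DNA strand (case-insensitive)."""
    s = strand.upper()
    if not s:
        raise ValueError("empty sequence")
    return sum(base in PURINES for base in s) / len(s)


def find_ry_tracts(strand: str, min_len: int = 20):
    """Find uninterrupted purine ('R') or pyrimidine ('Y') runs.

    Returns sorted (start, end, kind) tuples, 0-based and end-exclusive.
    min_len is an assumed cutoff, not a value taken from the paper's methods.
    """
    s = strand.upper()
    tracts = []
    for kind, base_class in (("R", "[AG]"), ("Y", "[CT]")):
        pattern = f"{base_class}{{{min_len},}}"  # e.g. "[AG]{20,}"
        for match in re.finditer(pattern, s):
            tracts.append((match.start(), match.end(), kind))
    return sorted(tracts)


# Hypothetical toy sequence: a (GA)15 run, a short spacer, then a (CT)12 run.
toy = "ACGT" + "GA" * 15 + "TCAC" + "CT" * 12
print(find_ry_tracts(toy))
print(round(purine_fraction(toy), 3))
```

Because a run of purines on one strand is a run of pyrimidines on the complement, scanning a single strand for both classes is sufficient to locate tracts in either orientation.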
Figure 1:
Transcription into an R·Y tract to
make a purine-rich transcript relaxes negative supercoils. Top, the sequence of the insert used in this work is shown 50
bases/line (10). R·Y tracts I-III (numbered from the top
or 5` end) are underlined. Plasmids containing the insert in
the orientation that produces a purine-rich transcript from the T7
promoter are designated T7+. Bottom, the native
mobilities of plasmids pSP72 (-), 72T7-, and
72T7+ are shown in lanes 1-3,
respectively. The effects of transcription with either SP6 (S)
or T7 (T) RNA polymerase on gel mobility are shown in lanes 4-9. Lanes 10-15 contain the same
material after single strand-specific RNase treatment (RNase A, 20
µg/ml and T1, 1000 units/ml for 1 h at 37 °C). Templates
transcribed to make a purine-rich RNA now exhibit bands with the
mobility of relaxed and partially relaxed conformers (lanes 12 and 15), and all others have been returned to a
pretranscription mobility. Lane M is the
λ-DNA BstEII size marker.
The possibility that transcription might cause the GAP-43 upstream R·Y DNA tract (Fig. 1, top) to adopt
an alternative structure was first tested with supercoiled templates.
The templates were transcribed from flanking phage promoters, and the
effect of such transcription on the mobility of supercoiled templates
was analyzed by gel electrophoresis (Fig. 1, bottom).
Transcription initially produces a smeared appearance for all templates (Fig. 1, bottom lanes 4-9). The majority of this
smearing is resolved by treatment with single strand-specific RNases,
suggesting that it is due to RNA tangled with the supercoiled
templates. The parental plasmid transcribed in either direction, and
the plasmids with insert transcribed to make a predominantly pyrimidine
(Y) transcript, are fully resolved by RNase treatment (lanes
10, 11, 13, and 14). However, templates
transcribed to make a predominantly purine transcript are resistant to
complete resolution by single strand-specific RNases (lanes 12 and 15). The same results were obtained with pSP73,
73T7-, and 73T7+ (not shown). Only transcription to make a
purine-rich transcript produces this RNase-resistant mobility shift. It
is independent of polymerase used, orientation within the plasmid, or
distance from the promoter. RNase treatment of the untranscribed
plasmids had no effect (not shown). The several discrete species that
remain in lanes 12 and 15 after RNase treatment exhibit the mobility of
relaxed conformers, as has been shown by Reaban and Griffin (9) for a different R·Y tract in a supercoiled template.
The sequence we tested resulted in greater relaxation of supercoils
than was evident in the figures presented in that report (9) ,
an indication that it may have a higher propensity to form alternative
structures.
Figure 2:
Transcription through an R·Y tract is
blocked in one direction. Linear plasmids with the GAP-43 upstream R·Y tracts inserted in opposite orientations are
illustrated at the bottom, with boxes indicating the length
contribution and orientation of the R·Y tracts. The cut point is
10 base pairs downstream of the sequence shown in the top of Fig. 1 for the GA-rich direction, and 40 bp upstream of the
sequence as shown, for the inverted (CU) direction. Lanes
1 and 2 show the mobility of the control linear plasmids. Lanes 3-8 show mobility after T7 transcription followed
by no further treatment (lanes 3 and 4), treatment
with RNases specific for single strands (lanes 5 and 6), or treatment with an RNase specific for the RNA in
RNA·DNA hybrids (lanes 7 and 8). Transcription
to make a purine-rich RNA (lane 3) results in no detectable
product and retarded template mobility. Transcription producing a
pyrimidine-rich RNA (lane 4) results in an abundance of
product and no alteration in template mobility. The retardation of the
template is partially reduced by RNase A and T1 treatment (lane 5) and completely eliminated by RNase H treatment (lane 7). Lane M,
λ-DNA BstEII digest.
Transcription that produces a purine-rich RNA (Fig. 2, lane 3) causes a very large mobility shift in
the linear DNA template but generates little free product RNA.
Conversely, transcription that produces a pyrimidine-rich RNA
transcript (Fig. 2, lane 4) generates an abundance of
product and no alteration in the mobility of the template. The aberrant
migration of templates generating purine transcripts (lane 3)
appears to be due to retention of nascent transcripts and polymerase.
The combination of single strand-specific RNases (A and T1) reduces the
retardation (lane 5). Partial protection from RNase A and T1
implies that some of the RNA can base pair with the template, as is
confirmed by the complete resolution by RNase H (lane 7), which
is specific for the RNA in an RNA·DNA hybrid. As expected, RNase A
and T1 completely remove the product of opposite orientation
transcription (lane 6), while RNase H treatment does not
affect the mobility of the template or transcripts from that
transcription (lane 8).
Figure 3:
Electron micrographs reveal a knotted
complex. A, EM of templates transcribed in the
``permitted'' direction (pyrimidine RNA, as in lane 4 of Fig. 2). Small structures corresponding to RNA
transcripts are in large molar excess over the plasmid templates. Of 221
templates viewed, 218 (99%) had this simple, linear appearance. B, EM of templates transcribed in the ``blocked''
direction (purine RNA, as in lane 3 of Fig. 2). Knotted
complexes near the end of the templates may correspond to stalled
transcription units. Of 251 templates scored, structures were present
at one end in 240 (96%) while eight had a simple linear appearance and
three had additional structures away from the end. Note the absence of
anything resembling free transcript molecules. C, EM of
templates transcribed in the blocked direction in which the R·Y
tract is located closer to the center. Templates were generated by
cutting the 2987-bp templates 660 bp downstream of the site used in B. The point of knotting is closer to the center, as
predicted, in 30 (88%) out of 34 templates viewed. Again, little that
might correspond to free transcripts is visible in these spreads. D, a typical molecule, as in B, at higher resolution.
A sharp bend causes the free end (wide arrow) to fold back,
almost touching the plasmid at a point upstream of the knot.
Dark stained beads near the bend point (small arrows) may
correspond to polymerase units. Roughly 80% (194/240) of the templates
with complexes at the end exhibit a similar conformation. In templates
where the free end is clearly visible, the bend ranges from 90 to 180
degrees (as in B). E, about 20% of the templates with
complexes at the end (46/240) have a more ``relaxed''
appearance, exhibiting an open loop (hollow arrow) of varying
size. Templates with a large open loop, such as this example, exhibit
no particular orientation of the free end (wide filled arrow)
in relation to the rest of the template. The dark staining beads (small arrows) may correspond to polymerase, giving this the
appearance of an extended transcription bubble. Size bar equals 0.25 µm.
Figure 4:
Transcription is blocked at multiple
points. End-labeled transcripts show the accumulation of CU-rich RNA
after 2, 5, and 10 min of labeling (lanes 1-3) and the
lack of accumulation of GA-rich transcripts over the same period (lanes 4-6). A longer exposure of lanes 4-6 shows multiple points
of blockade to GA production. The schematic map indicates the distance,
in bases, from the point of T7 initiation (bottom) to the cut
end of the plasmid (top) for the GA-rich RNA. The total length
of the expected transcript is 621 bases, including 47 bases of plasmid
sequence 5` to the insert shown in Fig. 1. The majority of stops
occur 5` to and within the 5` end of the polypurine tracts. Blockade
does not appear to occur within the 3` half of tract I (210-290
bases) nor beyond the 5` portion of tract III (450 to the end of the
template).
Fig. 3B illustrates the
appearance of the templates transcribed to make a purine-rich RNA
product (blocked direction), corresponding to those with an altered
mobility in lane 3 of Fig. 2. The 2996-bp templates had
been made linear 10 bp downstream of the sequence shown in Fig. 1 and exhibit structures on one end that are consistent with
stalled transcription complexes. The position of the complex is close
to the end of the linear DNA in Fig. 3B, as predicted
by the near-terminal location of the R·Y tracts on the templates.
When templates are made linear by cutting 660 bp downstream of the
point used in Fig. 3B, the knotted complex formed after
transcription in the blocked direction is more toward the center of the
template (Fig. 3C). This demonstrates that the R·Y
tract determines the position of the knot and that the unidirectional
transcription block does not require proximity to a free end.
In Fig. 3, D and E, individual templates from a preparation shown in Fig. 3B are displayed at higher resolution. In Fig. 3D, the knotted structure near the end of the template includes a sharp bend that causes the free duplex end of the template (wide arrow) to kink back toward the rest of the template. Such a kink is predicted by models of intramolecular triple helices (12). Triplex-induced kinks may have consequences for both transcription and recombination by bringing together formerly distant sequences (4, 6, 12, 13), as in this example. The small arrows point to blobs with tails that we believe are polymerase units with nascent transcripts that remain with the template. No fixation step was used in any of these experiments, so the association of polymerase/nascent transcript with the template must be fairly stable. The template exhibited in Fig. 3E has a more relaxed appearance. The free end of the template (wide arrow) is similar in length to the one present in Fig. 3D but is not kinked. The sharp bend is replaced by a bubble (wide, hollow arrow). The dark stained objects opposite this (small arrows) may correspond to polymerase molecules. We suggest that Fig. 3D is consistent with the expected conformation of a template containing an intact triple helix and that Fig. 3E is consistent with one that has unwound after trapping RNA polymerase.
The structures corresponding to stalled transcription units in Fig. 3 predicted the existence of truncated transcripts. Such structures sometimes appeared to be bound at slightly different distances from the free end of comparable templates (Fig. 3C), and some templates appeared to contain several such structures (e.g. Fig. 3B, Fig. 3D, and Fig. 3E), which predicted multiple truncation points. To examine the extent and location of these transcription stop points, transcripts were end-labeled during transcription and the products analyzed on a denaturing polyacrylamide gel.
In Fig. 4, the second set of lanes 4-6 is an 8-fold longer exposure
of lanes 4-6, to show the points of arrest during
purine-rich transcription. There are many stop points spanning nearly
400 bases (over 10% of the total template length). The stopping points
cluster into two primary groups, those between 300 and 400 bases and
those below 210 bases in length. The schematic map shown to the right
indicates the 621-base region transcribed (bottom to top) to produce
the purine-rich material in the longer exposure. Thin lines join selected size intervals on the gel to the
corresponding point on the schematic, so that stop points can be
correlated with the locations of the R·Y tracts. It can be seen
that the lower group of stop points is located 5` to, and within the
5` half of, tract I. Transcription does not stop from midway through the
first R·Y tract to its end (from around 210 to 300 bases).
Similarly, the second major cluster of stops is 5` to, and within the
5` end of, tract II. Few additional stops occur from midway through
this tract to the cut end of the plasmid (621 bases, or the top of the
schematic). Given the frequency of transcription stops in both
repetitive and non-repetitive sequences, recognition of a particular
stop sequence by the polymerase seems unlikely. Rather, it appears that
structures formed by the R·Y sequences block entry by RNA
polymerase. The many stop points may indicate either that a number of
structures are formed, or that polymerase units stack 5` to a few
structures. The existence of two primary clusters of stops indicates
that there are likely to be at least two classes of structures.
Fig. 5 shows the changes in gel mobility for templates that
had been transcribed in the blocked direction into tract I by itself (lanes 1-4), tracts I and II (lanes 5-8),
or all three (lanes 9-12). When more than one tract is
transcribed, additional smearing and a slight mobility change are
apparent even after RNase treatment (lanes 7 and 8).
When all three tracts are transcribed, two broad smears that may
correspond to multiple mobility classes are visible (lanes 11 and 12). These span the mobility characteristic of
transcription through just tract I (lanes 3 and 4)
and tracts I and II (lanes 7 and 8). Multiple
mobility classes are also apparent when all three tracts are
transcribed in the blocked direction in supercoiled plasmids (Fig. 1, lanes 12 and 15). Our interpretation
is that transcription through all three R·Y tracts can generate a
variety of triplex structures both within and between the separate
tracts.
Figure 5:
The three GAP-43 R·Y tracts
cooperate in forming mobility-altering structures. The plasmid
73T7+ was digested with restriction enzyme NsiI (lanes 1-4), NheI (lanes 5-8),
or BglII (lanes 9-12) to include either one,
two, or all three R·Y tracts downstream of the T7 promoter in the
blocked orientation. One-fourth of each digest was left untreated. The
rest was reacted with T7 polymerase and then incubated with no
RNase (-), 0.2 µg/ml RNase A and 10 units/ml T1 (+), or
20 µg/ml A and 1000 units/ml T1 (+++) for 30 min at
37 °C. The inclusion of all three tracts leads to a greater degree
of complexity in mobility after RNase
treatment.
This was confirmed by direct EM visualization of templates
corresponding to those in lane 11 of Fig. 5 after
preparation with either a denaturing spreading technique (21) or a non-denaturing technique (11). The fields in Fig. 6, A and B, show the predominant classes
of structures found in these preparations. Formamide is predicted to
denature the putative triplex, resulting in an R-loop (Fig. 6C, top). The denatured templates shown
in Fig. 6A clearly demonstrate the presence of the
predicted RNA·DNA hybrids as multiple size classes of R-loops.
Figure 6:
Multiple structures formed by
transcription resist single strand-specific RNase treatment. A, partial denaturation by formamide reveals the predicted
RNA·DNA hybrids as R-loops. Of 119 molecules scored, 58 (49%) had
a large single loop, 27 (23%) had a small single loop, 23 (19%)
appeared to be linear, 9 were Y-shaped, and 3 had two separate loops. B, lack of denaturation reveals knotted complexes. Ethidium
bromide intercalates into double-stranded nucleic acid and extends it,
giving the templates a long thin appearance. The templates have a
characteristic knot and loop structure, and the knot corresponds to the
predicted location of the putative R-R·Y triplex. The loops in B correspond to the RNase-resistant RNA·DNA hybrids that
appear as R-loops in A. Of 82 separate molecules scored, 30
(37%) had a large loop (form 3), 17 (21%) had a double loop (form 2), 15 (18%) had a small loop (form 1), and 4
(5%) were linear. An additional 16 molecules had a loop evident but
were not spread enough to interpret as structures 1, 2, or 3. C, interpretations. After spreading with formamide/cytochrome c, the triplex would be denatured, leaving R-loops which
should vary in size. Below are shown interpretations of the three major
species found in the ethidium bromide/magnesium spreads. Thickened
lines in the area of the triplex regions are used to indicate the
condensed nature of the triplex. Size bars equal 0.25
µm.
When spread using the non-denaturing technique to permit better
retention of structure, the templates exhibit the conformations
displayed in Fig. 6B. Interpretations of the
non-denatured conformations are shown at the bottom of Fig. 6C. The knot in the templates shown in Fig. 6B is characteristic and corresponds to the
predicted location of the proposed R-R·Y triplex structure. This
condensed area is not the polymerase, which stains much more densely in
these preparations (see Fig. 3C for a polymerase next
to a knot). The loops correspond to the RNA·DNA hybrids that are
seen as R-loops under the denaturing conditions in Fig. 6A (compare also to Fig. 3, D and E). The
various structures shown in Fig. 6 indicate that R-R·Y
triple helices are capable of forming between non-symmetrical R·Y
sequences and across a distance.
Transcription into the GAP-43 upstream R·Y
tract in the direction that produces a predominantly purine RNA
triggers formation of a structure that traps the transcription complex
and blocks subsequent rounds of transcription. Since it permits
transcription in only one direction, we refer to the R·Y tract as
a DNA diode.
The documented ability of R·Y sequences to form
triple helices suggests that the blocking complex contains a triple
helix. Blockade occurs only when a purine-rich RNA is generated, so any
triple helix formed is not likely to be the well-described Y-R·Y
(H-DNA) triplex, but rather the more recently described R-R·Y
structure (5, 18, 22, 23). This
would be true whether this RNA acts as a third strand or replaces the
``donated'' strand in an intramolecular triplex. The
relaxation of negatively supercoiled templates in Fig. 1 is
indicative of an intramolecular triplex.
This is the first example
of a transcription elongation block by an intramolecular triplex, and
its dynamic nature is somewhat surprising. Triple helices have been
shown to interfere with transcription in other systems.
Oligonucleotides acting as the third strand in a triplex near the site
of a DNA-binding protein needed for transcription activation may
compete with it and thereby lower transcription initiation (24, 25, 26, 27).
Oligonucleotide-directed interference with in vitro transcription elongation by intermolecular triple helix formation
has been reported for Escherichia coli RNA polymerase (28) and for RNA polymerase II (29). In both cases
chemically modified pyrimidine oligonucleotides caused some pausing of
polymerase, but efficient blockade of transcription elongation was
observed only when these oligonucleotides were covalently cross-linked
to the templates. In contrast, the transcription elongation block by
the naturally occurring R·Y repeats we observe is due to an
intramolecular R-R·Y triplex and needs no base modifications,
exogenous oligonucleotides, or cross-linking for its formation
and stability. It is also very efficient.
An advancing polymerase
generates local negative supercoiling in its
wake (7, 8, 30), while intramolecular triple
helices relax negative supercoils and can form in response to increased
negative supercoiling (4, 6, 17). The
diagram in Fig. 7 (A and B) is
presented as a way to visualize the combination of these events. As a
polymerase moves through the R·Y tract, local negative
supercoiling behind the polymerase helps drive the formation of a
triple-stranded structure. As the donated third strand is wound down
the major groove, the rotation of the acceptor helix releases negative
supercoils (Fig. 7A). The spring-like tension relaxed
by the denaturation and winding back of the third strand helps make
formation of the triplex energetically favorable.
Figure 7:
Genesis of the DNA diode. A, when
RNA polymerase moves through an R·Y tract in the direction that
makes purine RNA, the local negative supercoiling generated in the wake
of the polymerase drives the formation of a triple-stranded structure.
As the acceptor duplex DNA rotates (direction shown by wide
arrow), it releases negative supercoils. This rotation pulls the
donated third strand (direction shown by small arrow) into the
major groove, much as a string would be wrapped around a rotating
screw. The initiation and progression of this triple strand formation
are probably also aided by the polymerase-induced denaturation of the
DNA duplex and, perhaps most importantly, by the propensity of the
R-R·Y triplex to form. B, after formation, an
RNA·DNA duplex stabilizes the triplex. The triplex blocks
subsequent passage by RNA polymerase. Polymerase in the loop during
triplex formation may become trapped, but how this occurs is not yet
clear. Because RNase treatment specific for single strands can remove
trapped polymerase molecules, we favor a mechanism of steric hindrance
by a higher order structure(s) involving the free end of the
transcript. For clarity, this has been omitted from the figure. C, a representation of an R-R·Y triplex (the H-r3
conformation) that could form behind an advancing polymerase. The
additional purine strand is antiparallel to the central purine strand,
and all bonds are stable at physiologic pH. Normal Watson-Crick base
pairs are indicated by a single large dot, and alternative
bonds stable at physiologic pH are indicated by two small
dots. D, a representation of the Y-R·Y or
``H-DNA'' triplex is shown (H-y3 conformation). The
pyrimidine third strand is paired in parallel to the central purine
strand. The G-C Hoogsteen bonds require protonation and are indicated
by a plus sign. In both triplex structures shown, a donated
strand folds back in an intramolecular triplex and would alter the
supercoiling. In contrast, an intermolecular triplex would not affect
supercoiling.
As shown in Fig. 7B, an RNA·DNA hybrid stabilizes the triplex (9) by blocking the donated purine DNA strand from
reassociation. The sensitivity of the complex to RNase H, resistance to
RNases A and T1, and direct EM visualization of R-loops all support the
hypothesis that an RNA·DNA hybrid with normal base pairing
stabilizes the structure. Such kinetic trapping of a triplex by
occupation of the complement to the donated strand has been
demonstrated for the H-DNA triplex using pyrimidine oligonucleotides (31) and has been proposed to result in the protection of
purine oligonucleotides by the R-R·Y triplex in
vivo (32).
We believe that the diode-like effect of the
R·Y sequences stems from differential stability of the two
potential types of intramolecular triple helices generated by
transcription. The triple helix under discussion here, the R-R·Y
triplex (Fig. 7C), is stable at physiologic pH, and its
stability is enhanced by the presence of divalent cations, conditions
that are likely to prevail in the nucleus (18, 22, 23, 32, 33, 34).
The Y-R·Y triplex (Fig. 7D) might be expected
to form in response to transcription that makes a
pyrimidine-rich RNA. However, cytosine bases on the third
strand must be protonated to make Hoogsteen base pairs in a
C+G·C triplet, and H-DNA usually requires a low pH to achieve
triplex formation (4, 6). Even then, when free energy
was calculated, the stability of the third strand in an R-R·Y
triplex at pH 7.3 was found to be twice that of the third strand in the
corresponding Y-R·Y structure at pH 5.5 (35, 36).
Had the Y-R·Y (H-DNA) triplex been as stable as the R-R·Y
triplex under conditions of transcription, both directions should have
been blocked.
A prior report suggesting transcription-mediated
formation of intramolecular triple helices on supercoiled templates
originally did predict an H-DNA triplex stabilized by a pyrimidine
RNA (9), which would seem to be at odds with our findings.
However, the model was subsequently modified to that of an R-R·Y
triplex (37), in accordance with what we have found.
The
conditions for triplex formation may be less stringent for the
R-R·Y structure even beyond that of pH requirements, since it
appears able to form without strict mirror symmetry, between
non-contiguous tracts, and on linear DNA. In Fig. 8 a model of
one of the possible interactions between tracts I and II is shown. The
apparent mismatches between these non-mirror tracts may not be as
destabilizing as they appear, given the relative promiscuity and
flexibility in the proposed hydrogen bonding schemes for R-R·Y
triplets (18, 19, 20). Accordingly,
transcription-triggered formation of an R-R·Y triplex as presented
in Fig. 7 may be able to initiate anywhere within the perfectly
symmetrical (GA) repeats and also at a number of points within the more
variable sequences of tracts II and III. The only requirement may be
that any triplex formed must be stable enough to remain after the
transient wave of negative supercoiling has dissipated.
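The idea of mirror symmetry invoked above can be made concrete with a short sketch. A mirror repeat reads the same in both directions on a single strand, which differs from an inverted repeat, where the complementary strand is involved. The score below is simply the fraction of positions that match their mirror partner; the function name and the example strings are hypothetical, and the score is an illustration rather than a measure used in this work.

```python
def mirror_symmetry_score(strand: str) -> float:
    """Fraction of positions whose base equals the base at the mirrored position.

    A score of 1.0 means the strand is a perfect mirror repeat
    (it reads the same left-to-right and right-to-left on one strand).
    """
    s = strand.upper()
    if not s:
        raise ValueError("empty sequence")
    matches = sum(a == b for a, b in zip(s, reversed(s)))
    return matches / len(s)


# Hypothetical examples: an odd-length GA repeat is a perfect mirror repeat,
# whereas an irregular purine run is only partly mirror symmetric.
print(mirror_symmetry_score("GAGAGAGAG"))  # perfect mirror repeat
print(mirror_symmetry_score("GGAAG"))      # imperfect
```

A score below 1.0 for a purine run would correspond to the kind of non-mirror tract that the text suggests may still participate in R-R·Y pairing.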
Figure 8:
A possible alignment of R·Y tracts I
and II across 70 bp of non-R·Y DNA. An example of how non-mirror
R·Y tracts may interact. The sequence shown is from Fig. 1,
and numbered accordingly. It shows one possible alignment of a 3`
section of tract I with the purine strand of tract II. The hairpin
shown is one of several roughly equivalent possibilities predicted by a
folding program (38, 39). Cruciform extrusion of
intervening DNA may assist in the formation of alternative structures
between separate R·Y tracts (40). The pyrimidine-rich
strand is shown only in the area of the triplex for clarity but is
continuous with the rest of the plasmid on both ends (as is the folded
strand). As in the previous figure, a single black dot denotes
normal base pairs, and two small dots indicate alternative
interactions.
In the
example in Fig. 8, the intervening non-R·Y DNA between
tracts I and II is shown as a self-paired hairpin, one of several
predicted by a folding program (38, 39). The
non-R·Y DNA between tracts II and III is a perfect inverted
repeat (centered on base 369 in Fig. 1). It is intriguing that
both intervening non-R·Y segments are potentially capable of
forming hairpins, since cruciform extrusion of inverted repeats between
separate R·Y tracts has been demonstrated to facilitate the
interaction of those tracts (40). Moreover, cruciform extrusion
of palindromic sequences is driven by negative supercoiling and
initiated by a denaturation event (41). Transcription could be
expected to aid cruciform extrusion in a way analogous to the model
presented in Fig. 7 for triplex formation.
Given the
propensity of these structures to form in vitro, and their
remarkable stability after formation, it seems very likely that they do
exist, at least transiently, in vivo. R·Y tracts of
greater than 20 consecutive base pairs are rare in prokaryotes and
phage but are overrepresented in the genomes of higher eukaryotes, and
in birds and mammals in particular (1, 2). In theory,
long R·Y tracts may be able to regulate transcription without
additional DNA-binding proteins. Whether or not that is the case, the
relatively facile formation of these structures suggests a need for
enzymes to resolve them.
Our results suggest that the R·Y
tracts upstream of the major GAP-43 transcription starts may
serve to block polymerase units starting upstream of the gene from
transcribing through it. R·Y sequences are present in the upstream
regions of numerous genes (4, 12), but since the
observed transcription blockade is directional, analysis of orientation
is necessary before concluding that it has more general relevance as a
motif upstream of tightly controlled transcription units.
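The orientation analysis called for above can be sketched as follows. Given one strand of a tract written 5` to 3`, a transcript made in the "forward" direction has the same sequence as that strand, while a transcript made in the "reverse" direction has the sequence of its reverse complement. The sketch reports which direction would yield a purine-rich RNA and would therefore be the candidate blocked direction. The function names and the 0.8 purine-fraction threshold are assumptions chosen for illustration (the threshold loosely echoes the 80% purine content noted for the GAP-43 upstream region) and are not criteria established in this work; sequences are written in DNA letters, so T stands in for U.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def transcript_as_dna(top_strand: str, direction: str) -> str:
    """Sequence (5'->3', DNA letters) of the RNA made in the given direction.

    'forward': the transcript matches the top strand as written.
    'reverse': the transcript matches the reverse complement of the top strand.
    """
    top = top_strand.upper()
    if direction == "forward":
        return top
    if direction == "reverse":
        return top.translate(COMPLEMENT)[::-1]
    raise ValueError("direction must be 'forward' or 'reverse'")


def predicts_purine_rich_rna(top_strand: str, direction: str,
                             threshold: float = 0.8) -> bool:
    """True if the transcript's purine fraction meets the assumed threshold."""
    rna = transcript_as_dna(top_strand, direction)
    if not rna:
        raise ValueError("empty sequence")
    purines = sum(base in "AG" for base in rna)
    return purines / len(rna) >= threshold


# Hypothetical tract: a (GA)20 repeat on the top strand.
tract = "GA" * 20
for direction in ("forward", "reverse"):
    print(direction, predicts_purine_rich_rna(tract, direction))
```

Applied to annotated upstream regions, a check of this kind would indicate only whether a tract is oriented so that read-through from upstream makes purine RNA; whether a block actually forms would still have to be tested experimentally.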
For
ribosomal RNA, the potential for R·Y sequences to play an
important role in transcription blockade is particularly striking. The
tandem array of genes coding for ribosomal RNA contains long R·Y
tracts in the so-called ``non-transcribed
spacer'' (42). In the human ribosomal non-transcribed
spacer, R·Y tracts constitute more than 10% of the
sequence (43), five R·Y homopolymers range from 50 to 200
bp in length, and all are oriented such that antisense transcription
into this region would make a predominantly purine RNA and cause
triplex formation. This could serve to prevent production of antisense
ribosomal RNA from cryptic initiation sites upstream of the strong
polymerase I enhancer.
The nucleotide sequence(s) reported in this paper has been submitted to the GenBank(TM)/EMBL Data Bank with accession number(s) L21190.