(Received for publication, June 29, 1995; and in revised form, January 3, 1996)
We have re-examined the kinetics of the branch migration of double-stranded DNA that is mediated by the stepwise movement of the Holliday junction. This work revises and extends our previous treatment (Thompson, B. J., Camien, M. N., and Warner, R. C. (1976) Proc. Natl. Acad. Sci. U. S. A. 73, 2299-2303). New methodology and new highly purified substrates have been used. The latter include figure 8s prepared from phage G4 DNA by annealing single-stranded components and two sizes of a novel cruciform. We treat the process as a one-dimensional diffusion based on the random walk, the mathematical basis of which is discussed in detail. The step rate is shown to be 3 orders of magnitude slower than we reported previously. The most important contribution to the erroneously high rate was a result of the presence of EDTA in the spreading solution used for electron microscopy at that time. A second contribution of about 4-fold resulted from catalysis by EcoRI and other proteins. The rates reported here are for the uncatalyzed reaction.
The exchange of strands between two duplex DNAs by branch
migration through the Holliday junction is essential to genetic
recombination, and its physical parameters are of fundamental
importance. The determinations reported here of the rate of uncatalyzed
branch migration required the development of highly purified substrates
and the removal of EDTA and of protein catalysts. The figure 8
substrates that we have used in this work were prepared by annealing
purified single-stranded precursors derived from phage G4-RF. We also report the development and use of a novel type of
cruciform substrate. The unexpected finding using these substrates and
using [³H]trioxsalen labeling for the rate
measurements was that the step rate of branch migration was slower by 3
orders of magnitude than our previously reported measurements (1) indicated. We present here measurements using
the new substrates with support for the conclusion that these rates
characterize uncatalyzed branch migration. The earlier observations
were made on X-forms derived by EcoRI cleavage of the figure 8
molecules that occur naturally in the RF DNA of the single-stranded
phage G4 (2). In these experiments, the distinction between the
X-forms and the terminal product of branch migration, linear form III
DNA, was made by electron microscopy on a dimer-enriched fraction of RF
containing about 2% figure 8s. We now find that these high rates were
due primarily to the presence of EDTA in the spreading solutions used
at that time for the preparation of samples for examination by electron
microscopy and secondarily to the catalysis of branch migration by EcoRI and other proteins. The slow rates of branch migration
that have also been reported by several investigators (3, 4, 5, 6, 7) are
discussed, as are our results in relation to the structure and stability
of Holliday junctions (8, 9, 10). The
mathematical bases for the use of random walk theory and for the
calculation of step rate constants are given under
``Appendix.''
Double-stranded DNA was cross-linked and labeled by a modification
of the procedure of Isaacs et al. (14). Samples
containing 0.2 µg of DNA in 40 µl of buffer were placed in a
microtiter plate packed in ice. [³H]Trioxsalen,
about 10
to 10
cpm in 5 µl of methyl
alcohol, was added, and the samples were allowed to stand in the dark
for 30 min to equilibrate the intercalation reaction. They were then
irradiated in the light box described below for 30 min. The salt
concentration was increased to about 0.2 M by adding 4 M NaCl to improve the extractability of trioxsalen and
photoproducts. The solution was extracted 5 times with about 0.6 ml of
chloroform and twice with ether using solvents equilibrated with 2
mM EDTA.
The light box was similar to that described by Isaacs et al. (14) in that it employed two 400-watt GE Hg vapor lamps (H 400 DX 33-1). They were mounted in an air-cooled chamber above and at an angle to the microtiter plate packed in ice so that the end of each lamp was 8 cm from the samples. A 7-mm thickness of glass was used to filter out the UV radiation below 340 nm so that the principal effective radiation had wavelengths near 360 nm.
The labeled DNA species were separated by vertical-tube gel electrophoresis on 1% agarose gels containing 1 µg/ml EtBr. The gels were sliced with a parallel-wire gel slicer into 1-mm segments, which were counted by scintillation. The output of the counter was passed to a computer, which produced a graph of the results as illustrated in Fig. 1. RO was calculated after correcting the counts under the peaks that refer to the branch-migrating species and to the product. For this purpose, two controls were run with each series of timed migration samples. An ``RO control'' run for several half-times, usually at 55 °C, was used to correct for background and nonmigrating DNA. The slowest peak in Fig. 1 is an example of the latter. A zero time control was used to establish the initial state with respect to migration. The step rate was derived from the RO as described under ``Appendix.''
Figure 1: Branch migration of a 3694-bp cruciform. Four stages in the branch migration
of a cruciform with closed ends, cross-linked and labeled at each stage
with [³H]trioxsalen as described in the text. The
fast peak is due to residual unligated 1847-bp snapback. The central
peaks show the conversion of the cruciform to the 3694-bp linear double
strand with closed ends, referred to in the text as the cruciform
precursor. This peak of linear DNA is also present in substantial
amount at zero time because it is produced along with the cruciform in
the annealing reaction. The peaks are arbitrarily scaled so that the
highest counts/min in each panel is at the top of each graph. The four
panels, A-D, are for migration times of 0, 120, 395, and
1720 min, respectively, at 45 °C.
The precision of the method for determining the relative amounts of electrophoretically well-separated DNA species was tested using an XhoI restriction of G4 RF. The larger of the two resulting fragments is 3372 bp in size or 60.5% of the total. In 11 determinations at different times, in each of which the total counts in both peaks were greater than 50,000 cpm, the mean value for this measure was 60.3 ± 0.6%. Thus, the procedures of labeling, electrophoresis, and background correction do not introduce any significant error into the determination of RO. Error greater than that indicated by this test arises from problems of allocation of counts to overlapping peaks in the electrophoretic pattern. This is a problem in the analysis for all substrates in part because the branch migrating peak is skewed in the direction of migration and may overlap the product peak as illustrated in Fig. 1.
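As a rough check on the 60.5% figure, the expected fraction of label in the larger XhoI fragment can be recomputed from fragment lengths. The sketch below assumes that labeling is uniform along the DNA and that the G4 monomer length is half of the 11,154-bp dimer size quoted in the legend to Fig. 4 (5577 bp); the constant and function names are illustrative and do not come from the original procedure.

```python
# Assumption: G4 monomer length taken as half of the 11,154-bp dimer size.
G4_MONOMER_BP = 11154 // 2      # 5577 bp
LARGE_XHOI_FRAGMENT_BP = 3372   # larger XhoI fragment, from the text
MEASURED_MEAN_PERCENT = 60.3    # mean of 11 determinations, from the text
MEASURED_SPREAD_PERCENT = 0.6   # quoted +/- value, from the text


def expected_percent(fragment_bp: int, total_bp: int) -> float:
    """Percentage of total label expected in a fragment, assuming uniform labeling."""
    return 100.0 * fragment_bp / total_bp


expected = expected_percent(LARGE_XHOI_FRAGMENT_BP, G4_MONOMER_BP)
deviation = MEASURED_MEAN_PERCENT - expected
print(f"expected {expected:.1f}%, measured {MEASURED_MEAN_PERCENT:.1f}%, "
      f"difference {deviation:+.2f}%")
```

Under these assumptions the measured mean lies within the quoted ± 0.6% of the value expected from fragment length alone.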
Protein constituents that were present during the preparation of the branch migration reaction mixture were digested with proteinase K at a concentration of about 0.5 µg/ml at 0 °C. The digestion products and the proteinase K were removed by gel filtration on Bio-Gel A-1.5m (from Bio-Rad) or by dialysis using Spectra/Por 6 membrane (50,000 molecular weight cutoff). These treatments completely removed EcoRI when it was used to convert figure 8s to X-forms and other proteins used before the cross-linking reaction.
Figure 2:
Agarose gel electrophoresis of G4 DNA and
its annealing to form figure 8s. Agarose gels (1%) were run at 6 °C
for about 17 h and were stained with EtBr. All are of G4-DNA except lanes 4c and 4d which are of X-DNA. Annealings
of dimer III and viral DNAs were done in 2 mM EDTA, pH 8, at
the NaCl concentration and the temperature indicated for 30 min: lane 1a, 50 mM NaCl at 60 °C with a 10-fold
excess of viral DNA; lane 1b, 25 mM NaCl at 45 °C
with a 10-fold excess of viral DNA; lane 2a, 50 mM NaCl at 65 °C with a 1.5-fold excess of viral DNA; lane
2b, 50 mM NaCl at 65 °C with a 10-fold excess of
viral DNA. The dimer III used in the annealings is shown in lane 1d and after denaturation at 100 °C in 2 mM EDTA and
quick cooling in lane 1c. The purified figure 8s are shown in lane 3b along with markers in lane 3a. Gel 4 shows a
sample of G4-RF in lane 4a and of
X-RF in lane 4c and after denaturation at 100 °C, 1 mM EDTA in lanes 4b and 4d, respectively. The bands are
identified in the figure as follows: 8, figure 8s; σ, one
or two bands containing double-stranded circles with double-stranded
tails formed during annealing; i, intermediate band formed
during annealing at low ionic strength, lower than optimum temperature
or at short time periods of annealing; mI, supercoiled
monomer; mII, relaxed, circular monomer; mIII,
linear, double-stranded monomer, in these gels DNA from palindromic,
dimeric, single strands released during denaturation of the dimer III
prepared by ligation; dII, relaxed, circular dimer; dIII, linear, double-stranded dimer, in lane 1d the
dimer III is the purified product of ligation of monomer III and
consists of all possible head and tail combinations of monomers, in
other lanes the dimer III arises from a small amount of reannealing of d(+)- and d(-)-single strands; v,
viral DNA, circular single (+)-strands; ml+,
monomeric, linear, single (+)-strands and ml- for
the corresponding (-)-strands; mc+, monomeric,
circular, single strands and mc- for the
corresponding (-)-strands; d(+) and d(-), dimeric, linear single
strands.
The progress of annealing dimer III with an excess of G4 viral
strands is illustrated by gels 1 and 2 in Fig. 2. The DNA in lane 1b was annealed at 45 °C
for 30 min in 25 mM NaCl, 2 mM EDTA and in lane
1a at 60 °C in 50 mM NaCl for 30 min. Lanes 2a and 2b were annealed at 65 °C in 50 mM NaCl
for 30 min, but in lane 2a there was only a small excess of
viral DNA over the potential dimer (-)-strand and in 2b there was a 10-fold excess. The latter conditions provide optimum
annealing. Both of these gels illustrate the formation of an
intermediate, designated as the i band, that under longer annealing or
different conditions is driven into figure 8s or into one of the two
bands identified as σ bands. We refer to these as σ bands
because by electron microscopic examination they were shown to consist
mainly of double-stranded loops with double-stranded tails. Many of the
loops are smaller than monomer length. The origin of these structures
is not clear although a small contamination by linear (+)-strands
in the viral DNA or by trimer III in the dimer III may contribute. We
have not found any conditions under which this band is absent. However,
the σ species and all other non-figure 8 DNAs are removed by the
CsCl-EtBr gradient, which yields figure 8s that migrate as a single band (Fig. 2, lane 3b). Electron micrographs of these figure
8s are shown in Fig. 3.
Figure 3: Electron micrograph of annealed G4 figure 8s. Preparation from single-stranded G4 components as described in the text.
The species derived from the dimer III that do not anneal with viral DNA, such as snapback double strands and dimer (+)-strands, are shown in these gels to persist and not to interfere with the annealing leading to figure 8s. The main conclusions are illustrated by the gel patterns shown in Fig. 2. While there is some residual non-figure 8 DNA present in the annealing mixture in addition to the excess viral DNA, none of this DNA bands with supercoiled DNA in EtBr. Highly purified figure 8s are obtained from the half-relaxed region of an equilibrium CsCl-EtBr gradient.
The isolation of figure 8s from the CsCl-EtBr gradient demonstrates that there are no nicks in one half of the molecule. In the other half, the nick is at the EcoRI site, yielding finally X-forms with no nicks. The gel for the zero time branch migration sample usually shows a few percent of linear monomers, part of which may be migration runoff during sample preparation. An additional 5% peak, as yet unidentified, that does not branch migrate appears in the terminal runoff sample. In a conventional gel (Fig. 2, lane 3b) there is only one figure 8 band. We thus estimate the figure 8 content to be about 90%. The theoretical yield should be equal to half the amount of DNA in the purified dimer III sample, but because of the low yield in the annealing and the gradient steps the actual yield falls far short of this. The best yield in several preparations was 10% of the dimer.
Figure 4: Electron micrograph of cruciform structures. Preparation is described in the text. The cruciforms are of dimer size (11,154 bp). Most of the linear double-stranded molecules are of the same size.
In our preparation, monomer III from restriction of supercoiled G4-RF with EcoRI was ligated to a high degree of linear ligation, and a limit digest of the product was made with XhoI, which has two cleavage sites in G4. The two restriction fragments that are palindromic and contain EcoRI sites at their centers were isolated by gel electrophoresis. G4 DNA treated in this way yields a 716-bp fragment containing an EcoRI site at the head-to-head position and a 3694-bp fragment with an EcoRI site at the tail-to-tail position, based on the definition of ``head'' as the end of monomer III made by EcoRI that contains a 5′-overhang on the (+)-strand. When these fragments were denatured by heating, each yielded on cooling a homogeneous snapback fragment half the length of the linear double strand, which contained the EcoRI sequence in the snapback position and a free ligatable XhoI sequence at the other end. Ligation yielded covalently closed, double-stranded molecules 716 or 3694 bp in length. The steps in this preparation are illustrated in Fig. 5 for the 3694-bp cruciform. Similar small, covalently closed structures have been made by Mizuuchi et al. (21), Wemmer and Benight (22), and Erie et al. (23). We refer to these palindromic, double hairpins as cruciform precursors. Cruciforms can be made by heating and quick cooling the precursors as indicated above for the full-length G4 dimer. Each preparation from a precursor contained, in addition to a cruciform, a quantity of linear dimer of the same size, in greater proportion from the 716-bp than from the 3694-bp precursor. The linear dimer is apparently formed as a result of the interaction between the nucleation event and the subsequent renaturation at each of two adjacent restriction sites as discussed under ``Appendix.'' In order to obtain a cruciform, there must be at least one initiation at an EcoRI sequence and one at an XhoI sequence. 
The interval of time between these two initiations must be less than the time required to propagate double-stranded structure from the first to yield a linear dimer. A longer DNA would thus be expected to be more favorable for cruciform formation.
Figure 5: Outline of the synthesis of a cruciform substrate for branch migration. The formation of a 3694-bp cruciform is used as an example. The procedure begins with the tail-to-tail ligation of the mIII that results from the restriction of G4-RF by EcoRI. The DNA sequences of the single EcoRI site and the two XhoI sites in G4-RF are shown to illustrate the palindromic nature of the intermediates. The sequences cannot be shown to scale, but the asymmetry of the positions of the two XhoI sites is approximately correct in order to make it clear that a smaller cruciform of 716 bp would be formed from a head-to-head ligation in Step 1. Step 1, ligation of G4-RF mIII to yield a tail-to-tail dimer III; Step 2, restriction of the tail-to-tail dimer by XhoI; Step 3, a heat and cool step forms two 1847-bp snapbacks; Step 4, ligation of the snapbacks leads to a 3694-bp cruciform precursor; Step 5, another heat and cool step gives the product, the 3694-bp cruciform.
The EDTA effect on rate measurements was avoided when electron microscopy was replaced with the trioxsalen cross-linking method. This prevented additional branch migration during the subsequent processing steps (see ``Experimental Procedures''). The second and smaller increase in rate was traced to the EcoRI that we had routinely allowed to remain in the reaction mixture after its use in converting figure 8s to X-forms. EcoRI was not used when the small cruciforms were the substrates. EcoRI was found to increase the step rate severalfold, presumably as a result of its nonspecific binding to double-stranded DNA as reported by Jack et al. (24). The method described under ``Experimental Procedures'' removes EcoRI and other proteins and allows the measurement of uncatalyzed branch migration.
Rate measurements were then made on
full-length X-forms from annealed figure 8s, on native figure 8s, and
on 3694- and 716-bp cruciforms prepared as already described. The
cruciforms were used both with closed arm ends and with arm ends opened
by S1 nuclease. All rates were determined using
[³H]trioxsalen as described under
``Experimental Procedures'' including the use of proteinase
when any protein components were present in the early stages of the
sample preparation. Experimental values for RO were converted to step
rate constants using or 11 described under
``Appendix.'' Results at several temperatures are shown in Table 1. The standard deviation of the mean is given as a measure
of the overall precision of the step rate. The data in Table 1 are plotted in Fig. 6 in a form suitable for the
derivation of activation energy parameters.
Figure 6: Plot of the data in Table 1 for obtaining kinetic activation parameters. The straight line is a least squares regression of the average value of log k/T (T in kelvin) at each temperature plotted against 1/T. The average values for k/T were weighted as indicated in Table 1.
Some observations on the step rate under varying conditions and with the addition of proteins are given in Table 2. The first entry is on the effect of EcoRI, which was crucial to establishing the minimum uncatalyzed step rate. The experiment in Table 2 is supported by six determinations of the rate before and after the removal of EcoRI from the reaction mixture during the testing of the procedure we devised for accomplishing this. The catalytic effect of EcoRI was found to be 4- to 9-fold for concentrations of EcoRI up to several times the maximum used in the experiment described in Table 2. We have not examined this effect in detail except for the trial of another DNA-binding protein, cytochrome c, shown in Table 2. In this experiment, the highest concentration is about that commonly used when cytochrome c is employed to coat DNA and enhance its image in electron micrographs. We also examined S1 nuclease for its effect on branch migration. It had none when migration was carried out in EcoRI buffer. However, enzymatic activity requires a pH of 4.6, and, in that buffer in the absence of the S1 enzyme (Table 2), branch migration proceeded at about 40 times its rate in EcoRI buffer at the same temperature, based on a 15 °C rate interpolated from Fig. 6. The treatment of the cruciform with the S1 enzyme to open the ends of the arms was carried out in S1 buffer at 15 °C for 15 min. There is a small but significant amount of branch migration during this treatment, for which a correction was made in calculating the rate constant.
Some results are given in Table 3 on the step rate
in acidic buffers. The entry in the sixth line was done in a buffer
that approximates the spreading solution used in our early
experiments (1). The other entries show some effects of pH,
ionic strength, buffer composition, and EtBr at pH values below 7.
Recent determinations of the EDTA effect by Panyutin and Hsieh (7) show an increase in step rate of 800-fold in 0.1 mM EDTA at neutral pH over the rate in 10 mM Mg²⁺.
A number of other variables influencing
the rate were also tested using catalyzed migration. These results are
enumerated only briefly here because they have not been repeated in the
absence of catalysis. (i) The rate continuously decreased with increase
in ionic strength. (ii) Ethidium bromide inhibited branch migration to
about 4 10
and 2
10
of its optimum value at 60 µM and 120
µM, respectively. (iii) 70% formamide increased the rate
by 70-fold. These results are given in more detail by
Fishel (25).
Before the identification of the EDTA effect, various attempts were made to account for the rates originally reported (1). Some of these were based on the supposition that in addition to the catalytic effect of EcoRI, other proteins catalyzing migration were introduced into the reaction mixtures as contaminants of the relatively crude preparation of EcoRI used at that time. This EcoRI was purified by the method of Greene et al. (26) using E. coli strain RY13. Fractions were first made from the parent E. coli K12 lacking the R plasmid for EcoRI. Most fractions from steps in the procedure of Greene et al. (26) prior to the use of a phosphocellulose column could not be tested because of the presence of nucleases. Increases in step rate were found on adding phosphocellulose fractions that would have contained EcoRI had strain RY13 been used, but they were never more than 3- to 4-fold in magnitude. Similar results were obtained using the same fractions of an extract prepared from the RY13 strain. These extracts contained EcoRI. Its catalytic effect was allowed for in calculating the increase in rate attributable to the extract. EcoRI assays for this calculation were done by monitoring the conversion of G4 mI to mIII by gel electrophoresis. Thus the maximum catalysis of step rate due to the combined presence of EcoRI and other proteins that may have been added as contaminants of EcoRI is indicated to be about an order of magnitude.
The constancy of step rate shown in Table 1 for substrates prepared by different methods and of widely varying size both validates the application of random walk theory and gives strong support for the conclusion that the step rate is independent of the length of the arms of the migrating junction. The random walk theory as developed under ``Appendix'' requires that the number of steps or the time taken to reach the same RO for substrates of different size be proportional to the square of the size parameter, r, defined under ``Appendix.'' For our three substrates, the number of such steps is in the ratio 1 to 27 to 243, thus providing a stringent test of the relevance of random walk theory to branch migration. Direct evidence for the adherence of our substrates to the multiple walk formulation is also provided by several kinetic runs having points from both the low and the high RO regions of the curves in Fig. 9. Points from one experiment in which G4 X-forms were used are superimposed on the theoretical curve in this figure in order to show agreement with theory in both regions of the curve. All of the rate constants given in Table 1 include results calculated from experimental points at both low and high ROs.
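The 1 to 27 to 243 ratio can be reproduced from the substrate sizes alone. The sketch below assumes that the size parameter r is proportional to total substrate length in base pairs (716 and 3694 bp for the cruciforms, 11,154 bp for the dimer-size G4 X-form); the exact definition of r is given under ``Appendix,'' and the names used here are illustrative only.

```python
# Substrate sizes in base pairs, taken from the text and figure legends.
SUBSTRATE_BP = {
    "716-bp cruciform": 716,
    "3694-bp cruciform": 3694,
    "G4 X-form (dimer size)": 11154,
}


def relative_step_counts(sizes_bp):
    """Steps needed to reach a given RO, relative to the smallest substrate.

    Assumes the size parameter r is proportional to substrate length,
    so that the number of steps scales as r squared.
    """
    smallest = min(sizes_bp.values())
    return {name: (bp / smallest) ** 2 for name, bp in sizes_bp.items()}


ratios = relative_step_counts(SUBSTRATE_BP)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}")
```

Rounded to whole numbers, the three values agree with the ratio quoted above, which spans more than two orders of magnitude in walk length.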
Figure 9:
Logarithmic plot of the progress of branch
migration with runoff. The multiple walk with absorbing barriers is a
plot of . The linear portion including its extension to the
vertical axis (dashed line) is a plot of . The
single walk with absorbing barriers is a plot of . Its
linear portion is given by . Orienting values of RO are
given on the right-hand vertical axis. The points, indicated
by , on the multiple walk curve are experimental values, each
entered at the position determined by the k calculated for
that point. The mean value of k for the points shown was 44.5
± 2.0 steps s⁻¹.
Recent studies of the structure and dynamics of the Holliday
junction have been based on immobile junctions introduced and first
synthesized by Seeman (27) and Kallenbach et al. (28). These structures have been examined by the use
of chemical probes for the reactivity of hydrogen bonds and by
spectroscopic methods (8, 9, 10, 29). The use of
fluorescence resonance energy transfer by Clegg et al. (8, 30) has been particularly valuable in
characterizing them. The junction in the presence of Mg²⁺ can be formulated as a stacked X-structure, a compact,
specifically folded arrangement of the four arms. In the absence of
Mg²⁺, this structure unfolds to yield an unstacked
positioning of the arms so that the four pairs of nucleotides that
border the junction form a square. The transition from the X-stacked
structure to the unfolded square in the absence of Mg²⁺
parallels the change in rate of branch migration from that in the
presence of Mg²⁺ to the very high rate observed in
EDTA. This suggests that the unfolded square is an intermediate in a
step of the migration. It would presumably exist in the presence of
Mg²⁺ at a very small concentration in equilibrium with
the X-stacked form as a true chemical intermediate, not an activated
structure as suggested by Clegg et al. (30). The
adoption of the unfolded square as the intermediate is supported by the
finding of Panyutin and Hsieh (7) that although the rate of
branch migration is 800-fold greater in EDTA, the energy of activation, E_a, is the same as in the presence of
Mg²⁺. This implies that the structure that is activated
is the same under both conditions. A mechanism of branch migration
formulated in these terms is shown in Fig. 7. The kinetics of
branch migration is particularly suited to the application of
transition state theory because of the fact that the initial and final
states in a step of the random walk have the same structure and thus
are at the same energy level. This simplifies the interpretation of the
activation process in that there are no contributions to the energy of
activation from the difference in energy levels that ordinarily exist
between a reactant and product in a chemical reaction. This
interpretation is evident from the treatment of transition state theory
by Hammes (31). The fact that the overall process of branch
migration does conform to a random walk suggests that a mechanism
requiring equality for the forward and reverse reactions should be
reflected in the properties of the activated state. The intermediates
shown in Fig. 7 for the stepwise movement should be considered
as representing activated states that have such properties. We have
therefore followed Hammes (31) in evaluating ΔH‡, the enthalpy of activation,
directly from a plot of ln k/T versus 1/T as
shown in Fig. 6. This yields ΔH‡ = 34.4 kcal mol⁻¹ and ΔS‡ = 58.4 cal K⁻¹ mol⁻¹, giving also ΔG‡ = 17.2 kcal mol⁻¹. The Arrhenius parameter, E_a, is then given by E_a = ΔH‡ + RT = 35 kcal mol⁻¹ at 37 °C.
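The evaluation of activation parameters from a plot of ln k/T versus 1/T can be sketched as follows. This is a minimal illustration, assuming the standard Eyring form ln(k/T) = ln(k_B/h) + ΔS‡/R − ΔH‡/(RT); the temperatures and rate constants are synthetic values generated from the quoted ΔH‡ and ΔS‡, not the measurements of Table 1, and the function names are hypothetical.

```python
import math

R = 1.987e-3            # gas constant, kcal mol^-1 K^-1
KB_OVER_H = 2.0837e10   # Boltzmann constant / Planck constant, s^-1 K^-1


def eyring_rate(temp_k, dh, ds):
    """Eyring rate constant (s^-1); dh in kcal/mol, ds in kcal/(mol K)."""
    return KB_OVER_H * temp_k * math.exp(ds / R) * math.exp(-dh / (R * temp_k))


def fit_eyring(temps_k, rates):
    """Unweighted least-squares line through ln(k/T) vs 1/T; returns (dh, ds)."""
    xs = [1.0 / t for t in temps_k]
    ys = [math.log(k / t) for k, t in zip(rates, temps_k)]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    dh = -slope * R                             # slope = -dH/R
    ds = (intercept - math.log(KB_OVER_H)) * R  # intercept = ln(kB/h) + dS/R
    return dh, ds


def activation_energy(dh, temp_k):
    """Arrhenius E_a = dH + RT, in kcal/mol."""
    return dh + R * temp_k


# Synthetic illustration only: these are NOT the data of Table 1.
temps = [303.15, 310.15, 318.15, 328.15]
rates = [eyring_rate(t, 34.4, 0.0584) for t in temps]
dh_fit, ds_fit = fit_eyring(temps, rates)
print(f"dH = {dh_fit:.1f} kcal/mol, dS = {1000 * ds_fit:.1f} cal/(K mol)")
print(f"E_a at 37 C = {activation_energy(dh_fit, 310.15):.1f} kcal/mol")
```

The fit here is unweighted, whereas the regression in Fig. 6 used weighted averages; the sketch is meant only to show how the slope and intercept map onto ΔH‡ and ΔS‡ and how RT (about 0.6 kcal mol⁻¹ at 37 °C) accounts for the difference between ΔH‡ and E_a.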
Figure 7: Branch migration of a Holliday junction. The indicated exchange of base pairs between B and C or B and D defines the unit step size of migration for the random walk treatment as formulated in the text of the ``Appendix.'' Structures A and B are shown in equilibrium as proposed under ``Discussion.'' A is the stacked X-structure; B, C, and D are unfolded, square structures; and hypothetical activated intermediates are shown between B and C and B and D.
The high E_a for the uncatalyzed reaction is in agreement
with the value of 36 kcal mol⁻¹ reported by Panyutin
and Hsieh (7). Some of the main contributions to E_a
in chemical and structural terms can be
identified. Meselson (32) proposed that rotary diffusion of the
arms drives branch migration and he made an order of magnitude
calculation to show that the force developed by rotary diffusion would
be more than adequate. We previously reached the conclusion, based on
Meselson's calculation, that events in the junction, rather than
rotary diffusion, are rate-limiting even for the fast rates we observed
at that time (1). This conclusion is now strongly supported by
the lack of dependence of the step rate on the length of the arms. The
diffusion-driven reaction is similar to the classic example of a
bimolecular reaction driven by a collision mechanism as developed by
Noyes (33) and by Berg and von Hippel (34). They point
out that the rates of most bimolecular reactions are limited by the
need for activation of intermediates to overcome energy barriers of a
chemical or structural nature. Diffusion control of the rate is reached
only in the absence of such barriers, in which case the maximum possible
rate is achieved, and the process would have an E_a
of about 4 kcal mol⁻¹ in the 40 to 50 °C
temperature range. However, only a fraction of the energy of rotary
diffusion can be delivered in a way that will result in the opening of
the junction. The four arms of double-stranded DNA that are tethered to
the junction exert a force on the junction that is applied both
randomly and independently. This results in a varying net force being
applied to opening hydrogen bonds in the junction. Robinson and Seeman (35) have formulated this process as the provision of a
``productive torque'' that can cause the opening of the
junction. The torque is at a maximum when each of the four arms rotates
simultaneously in the direction that will result in a cooperative
contribution that can be achieved in either of two directions. Since
the rotation is random in direction and force, many of the net torques
will be nonproductive. This process will make a large contribution to
ΔH‡ because only a small fraction of
the delivered potential energy will be effective since most of it is
dissipated as entropy of activation. Another contribution to
ΔH‡ is a consequence of the
equilibrium step between the A and B or A and C structures that are
part of the mechanism in Fig. 7. This will make a contribution
to ΔG‡ that depends on the
equilibrium constant, K = (A)/(B). It
would amount to ΔG = 9 kcal mol⁻¹
for K = 10
.
This contribution represents the energy difference between the
X-stacked Mg²⁺ complex and the unfolded square shown in Fig. 7. This ΔG may be
larger than our order of magnitude estimate. An additional contribution
will result from energy differences between the square configuration
and the activated state, including the pair of hydrogen bonds that must
be broken.
The rate of branch migration reported in Table 1 is much too slow to account for in vivo genetic recombination mediated by the Holliday junction and operating according to the mechanisms we previously suggested (13, 36). This point has also been made by Panyutin and Hsieh (7). It is evident that branch migration in vivo must be a catalyzed reaction if it is to fulfill the role in recombination generally attributed to it. We have described catalysis of branch migration by EcoRI (Table 2) and by protein components in extracts of E. coli (see ``Results''). These results show the possibility of such catalysis, but not of sufficient activity to account for in vivo recombination. We assume the catalysis by EcoRI to be a consequence of its nonspecific binding to DNA and the subsequent one-dimensional diffusion that permits it to find its specific binding site. This process has been defined for the lac repressor by Berg and von Hippel (34) and Winter et al. (37) and for EcoRI by Terry et al. (38, 39). The collision of this bound but randomly moving protein with a junction could contribute to the opening of the base pairs that form the junction and thus permit its migration (Fig. 7).
Branch migration rates slower than those
we first reported (1) have been observed in several
studies (3, 4, 5, 6, 7).
Except for the recent work of Panyutin and Hsieh (7), none of
these results were calculated to yield step rates and thus cannot be
compared directly with each other or with the rates determined here. We
have recalculated the results of these studies (3, 4, 5, 6) in terms of step rates
employing the methods and equations developed under
``Appendix.'' This requires determining values for the size
parameter, r, and whether the equation for the single walk or
that for the multiple walk as defined under ``Appendix''
should be used. If the substrates used have regions of nonpalindromic
DNA, this must be allowed for. These criteria are considered in detail
under ``Appendix.'' We find that the results of Courey and
Wang (3), Sinden and Pettijohn (5), and Johnson and
Symington (6) all give step rates that are 3-fold to 4-fold
greater than the rate reported here for the same temperature and
for approximately the same ionic conditions. The value calculated from
Gellert et al. (4) is based on a t of ``about 1 h'' and leads to a step rate
of about one-third of the rate we find. Some of this difference may be
due to the lower concentration of Mg²⁺ in their buffer.
In all of these studies, restriction enzymes and, in some cases, other
proteins were present in the buffer. We interpret these results as
characterizing catalyzed reactions of branch migration.
An
additional kinetic study was made available to us before publication by
Dr. P. Hsieh. In it Panyutin and Hsieh (7) report measurements
on cruciforms in which two arms of the junction in the initial
configuration were blocked by nonpalindromic DNA, thus making the path
of branch migration similar to that in the extruded cruciform. Their
results were calculated using a computer simulation of the random walk
which gives the same results as our or 13. Their results
are reported as half-times and step times in buffers containing 10
mM Mg²⁺ at several temperatures. The step
rate constant we have used is given by the reciprocal of their step
time in seconds. Their result at 37 °C of 3.5 bp s⁻¹
is in agreement with our value of 4.0 bp s⁻¹.
This agreement provides support for the assumption that nonpalindromic
DNA adjacent to a palindrome acts as a reflecting barrier to branch
migration.
The random walk can be represented as a binomial distribution and the probability of occupancy of a position in the walk calculated from the binomial coefficients. This is illustrated in Fig. 8, which shows the first 8 steps of such a walk in the form of Pascal's triangle (40). The triangle facilitates the definition of the scales that relate the walk to the progress of branch migration in DNA. The combinatorial identity given in Fig. 8 uniquely defines the x scale, which gives the positions that can be occupied by a walker after n steps. The k scale is a translation of the x scale to place the beginning of the walk at the origin of the scale. The m scale is provided in order to relate the progress of the walk to the unit step of 1 base pair of DNA as defined in Fig. 7. On the m scale, parity is required such that both n and m must be even or both must be odd to permit occupancy, as is evident from Pascal's triangle. Odd values are not shown since we have chosen n = 8 to illustrate the distribution of probabilities.
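The construction described above can be sketched in a few lines of Python. This is only an illustration: the function name `walk_row` is hypothetical, and the mapping m = 2x - n is an assumption that follows from counting x steps in one direction and n - x steps in the other.

```python
from math import comb

def walk_row(n):
    """Return {m: probability} for an unrestricted symmetric walk after n steps.

    x is the number of steps taken in the positive direction (0..n).
    m = 2x - n is the net displacement in unit steps (assumed mapping).
    """
    return {2 * x - n: comb(n, x) / 2 ** n for x in range(n + 1)}

row = walk_row(8)
for m in sorted(row):
    # Only even m appear for n = 8, which is the parity rule on the m scale.
    print(f"m = {m:+d}  P = {row[m]:.6f}")
```

For n = 8 the keys are the even values from -8 to +8, and the probabilities are the entries of the eighth row of Pascal's triangle divided by 2^8.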
Figure 8: The unrestricted, one-dimensional, symmetric random walk represented as Pascal's triangle. The generating function for the terms of the binomial expansion is $(a + b)^{n} = \sum_{x=0}^{n} C(n, x)\, a^{n-x} b^{x}$. The coefficients which constitute Pascal's triangle are given by C(n, x) = n!/[x! (n - x)!]. The probability of occupancy is then $P(n, x) = C(n, x)\, 2^{-n}$. Also shown are the alternative k and m scales as discussed in the text.
Our initial formulation of the walk (1) was taken from Feller (page 179 of (41)) and was based on the k scale. The alternative m scale arises from a definition of m as the point reached after n steps such that (n + m)/2 steps are positive and (n - m)/2 are negative. This is equivalent to Feller's definition (page 75 of (41)). The general form for the binomial distribution employing the m scale is then
$$P(n, m) = C\left(n, \frac{n + m}{2}\right) 2^{-n}$$
This equation can be used for unrestricted branch migration calculations for small values of n. As is customary for large values of the discrete variable n, a transition is made to the continuous variable N of the equivalent normal density function
$$f(m/2) = \sqrt{\frac{2}{\pi N}}\, \exp\left(-\frac{m^{2}}{2N}\right)$$
or, for cumulative values, of the normal distribution function
$$F(m/2) = \sqrt{\frac{2}{\pi N}} \int_{-\infty}^{m/2} \exp\left(-\frac{2u^{2}}{N}\right) du$$
The criterion for these relations to hold is that the approximating normal have the same mean and variance as the binomial. This criterion is developed by Feller (41) and more directly in texts on stochastic processes such as Harnett (42). The application of this criterion in the transition from the binomial to the equivalent normal distribution requires that the probabilities in the density and distribution functions be formulated in terms of m/2. In this case, the mean of m/2 = 0 and the variance = N/4.
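The quality of the normal approximation can be checked numerically. The sketch below is illustrative and assumes that the approximating normal is written in m/2 with mean 0 and variance N/4, as stated above; the function names are hypothetical.

```python
from math import comb, exp, pi, sqrt

def binomial_p(n, m):
    """Exact probability of net displacement m after n unit steps."""
    if (n + m) % 2 or abs(m) > n:
        return 0.0  # parity rule: n and m must both be even or both be odd
    return comb(n, (n + m) // 2) / 2 ** n

def normal_p(n, m):
    """Normal density in m/2 with mean 0 and variance n/4 (assumed form)."""
    return sqrt(2.0 / (pi * n)) * exp(-m * m / (2.0 * n))

n = 100
for m in (0, 10, 20, 40):
    # Central terms agree closely; agreement worsens toward the tails.
    print(m, binomial_p(n, m), normal_p(n, m))
```

Because occupied positions are one unit apart on the m/2 scale, the density value can be compared directly with the binomial probability at the same m.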
In our first application of the random walk to branch migration (1), the progress of an unrestricted walk was calculated on the assumption that the unit step on the k scale was 1 base pair. This error, which does not enter into the other calculations in the paper, was corrected in a footnote to (13).
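The multiple walk is treated by analogy with the conduction of heat in a rod that is initially at a uniform temperature and whose ends are held at zero temperature. The temperature in the rod is
$$v = \frac{4V}{\pi} \sum_{j=0}^{\infty} \frac{(-1)^{j}}{2j + 1} \exp\left[-\frac{\kappa (2j + 1)^{2} \pi^{2} t}{4\ell^{2}}\right] \cos\frac{(2j + 1) \pi x}{2\ell}$$
where V = initial temperature, v = temperature at time t at position x, 2ℓ = length of rod with origin at center, i.e. -ℓ < x < +ℓ, and κ = conductivity or diffusivity coefficient of heat conduction, comparable to the diffusion coefficient for molecular diffusion.
The initial heat content of the rod is
$$H_{0} = 2\ell A C V$$
where A = cross-sectional area of the rod and C = heat capacity of the rod per unit volume.
The heat content of the rod at time t is
$$H_{t} = A C \int_{-\ell}^{+\ell} v\, dx$$
Heat loss from the rod corresponds to runoff in branch migration. We therefore find the heat loss on a fractional basis, H, by taking the difference between the initial heat content and the heat content at time t and dividing by the initial heat content to obtain
$$H = 1 - \frac{1}{2\ell V} \int_{-\ell}^{+\ell} v\, dx$$
The result of the integration, which involves only the cosine term of the series, is simplified by using sin[(2j + 1)π/2] = (-1)^j to yield
$$H = 1 - \frac{8}{\pi^{2}} \sum_{j=0}^{\infty} \frac{1}{(2j + 1)^{2}} \exp\left[-\frac{\kappa (2j + 1)^{2} \pi^{2} t}{4\ell^{2}}\right]$$
This equation is converted to the branch migration case by substituting N or kt for 2κt, RO for H, and r for ℓ to obtain
$$RO = 1 - \frac{8}{\pi^{2}} \sum_{j=0}^{\infty} \frac{1}{(2j + 1)^{2}} \exp\left[-\frac{(2j + 1)^{2} \pi^{2} N}{8 r^{2}}\right]$$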
This equation gives a value for N from an experimentally determined RO and, thus, k from N = kt, since r is known for a given DNA substrate. The equation was programmed in QBASIC in double precision in order to evaluate the number of terms of the series required for a specified precision as a function of RO. The series converges rapidly at RO > 0.3, but requires about 15 terms at an RO of 0.1 and about 100 terms near an RO of 0.01, depending on the precision needed. However, the calculation can be greatly simplified by the use of two limit approximations to the series, one for high and one for low RO. A logarithmic plot of the series is shown in Fig. 9. It is evident that as RO increases, the plot approaches a straight line and the process becomes essentially first order. The equation of this line is the same as that of the series for j = 0. Thus
$$\ln(1 - RO) = \ln\frac{8}{\pi^{2}} - \frac{\pi^{2} N}{8 r^{2}}$$
will give a value of N that is 0.3% lower than that given by the full series for the same RO at RO = 0.5 and a diminishing error at higher RO. The error increases as RO decreases, becoming 1% at RO = 0.46. At low values of RO, the series approaches direct proportionality to √N/r. (See Fig. 3B of (1), which is a computer-generated plot of the multiple walk that is identical with the plot of the series.) An examination of the ratio of RO to √N/r at low values of RO, using the full series, shows that it approaches √(2/π). The relationship
$$RO = \sqrt{\frac{2}{\pi}}\, \frac{\sqrt{N}}{r}$$
can be used in place of the series at low RO. The value of N obtained from this relationship will be lower than that from the series by an increasing amount, reaching 0.3% at RO = 0.52 and 1% at RO = 0.58. The low-RO and high-RO approximations cover the range of RO from 0 to 0.52 and from 0.52 to 1, respectively, with an error no greater than 0.3%, and we have used them routinely.
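The relation between the series and its two limit approximations can be explored with a short Python sketch. It is not the QBASIC program referred to in the text. It assumes the standard slab-diffusion form of the multiple-walk series, RO = 1 - (8/π²) Σ (2j + 1)⁻² exp[-(2j + 1)² π² N/(8r²)], and all function names are hypothetical. The variable `tau` stands for N/r².

```python
from math import exp, log, pi

def ro_multiple(tau, terms=200):
    """Runoff fraction for the multiple walk; tau = N / r**2 (assumed series form).

    With 200 terms the sum is adequate for RO above roughly 0.01.
    """
    s = sum(exp(-(2 * j + 1) ** 2 * pi ** 2 * tau / 8.0) / (2 * j + 1) ** 2
            for j in range(terms))
    return 1.0 - 8.0 / pi ** 2 * s

def tau_from_ro(ro, lo=1e-6, hi=50.0):
    """Invert ro_multiple by bisection (RO increases with tau)."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if ro_multiple(mid) < ro:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def tau_high(ro):
    """High-RO limit: only the j = 0 term of the series is kept."""
    return 8.0 / pi ** 2 * log(8.0 / (pi ** 2 * (1.0 - ro)))

def tau_low(ro):
    """Low-RO limit: RO = sqrt(2 / pi) * sqrt(N) / r."""
    return pi * ro ** 2 / 2.0

for ro in (0.1, 0.52, 0.9):
    print(ro, tau_from_ro(ro), tau_low(ro), tau_high(ro))
```

At RO = 0.52 both approximations give a value of N/r² a fraction of a percent below the series value, which is why that RO is a convenient crossover point between them.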
The single walk with symmetric absorbing barriers is also shown in Fig. 9. The integrated equation for this process was obtained by Spitzer (44) in a form similar to that of the multiple-walk series. It is given below, slightly modified for application to branch migration.
$$RO = 1 - \frac{4}{\pi} \sum_{j=0}^{\infty} \frac{(-1)^{j}}{2j + 1} \exp\left[-\frac{(2j + 1)^{2} \pi^{2} N}{8 r^{2}}\right]$$
This equation reproduces the computer-generated solution to this problem shown in Fig. 3A of (1). The series expansion has convergence properties similar to those of the multiple-walk series, and we have obtained similar limit approximations at high and low RO that are convenient for making calculations. At high RO, the limit approximation is the straight line given by the series for j = 0, which is shown in Fig. 9:
$$\ln(1 - RO) = \ln\frac{4}{\pi} - \frac{\pi^{2} N}{8 r^{2}}$$
This approximation will give a value of N that is 0.1% higher than that given by the full series for the same RO at RO = 0.41 and one that is 1% higher at RO = 0.25. The approximation at low RO is one that we previously used and is shown in Fig. 3A of (1). It was obtained from Feller (page 90, Equation 7.7 of (41)) and is given here in a form adapted to the branch migration problem:
$$RO = 2\sqrt{\frac{2}{\pi}} \int_{r/\sqrt{N}}^{\infty} \exp\left(-\frac{s^{2}}{2}\right) ds$$
The integral can be obtained from tables of probability functions (45). This approximation is 0.1% higher in N at RO = 0.41 and 1% higher at RO = 0.62 than the full series at the same RO. The QBASIC programs for the two series equations are available by Anonymous FTP.
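The single-walk calculation can be sketched in the same way. The forms used here are assumptions, not transcriptions of the published equations: the series is taken as the standard solution for diffusion from the midpoint between two absorbing barriers, RO = 1 - (4/π) Σ (-1)^j (2j + 1)⁻¹ exp[-(2j + 1)² π² N/(8r²)], and the low-RO limit as twice the one-barrier first-passage probability, written with the complementary error function. Function names are hypothetical and `tau` stands for N/r².

```python
from math import erfc, exp, log, pi, sqrt

def ro_single(tau, terms=200):
    """Runoff fraction for the single walk; tau = N / r**2 (assumed series form)."""
    s = sum((-1) ** j * exp(-(2 * j + 1) ** 2 * pi ** 2 * tau / 8.0) / (2 * j + 1)
            for j in range(terms))
    return 1.0 - 4.0 / pi * s

def tau_from_ro_single(ro, lo=0.01, hi=50.0):
    """Invert ro_single by bisection; RO is negligible below tau = 0.01."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if ro_single(mid) < ro:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def tau_high_single(ro):
    """High-RO limit: only the j = 0 term of the series is kept."""
    return 8.0 / pi ** 2 * log(4.0 / (pi * (1.0 - ro)))

def ro_low_single(tau):
    """Low-RO limit: 4 * [1 - Phi(r / sqrt(N))] = 2 * erfc(1 / sqrt(2 * tau))."""
    return 2.0 * erfc(1.0 / sqrt(2.0 * tau))

t = tau_from_ro_single(0.41)
print(t, tau_high_single(0.41), ro_low_single(t))
```

Near RO = 0.41 the series and both limit forms agree to about 0.1%, so either approximation can be used on its own side of that value.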
The single-walk equation describes branch migration from an initial configuration that is identical for all molecules and starts at the origin of the m scale (Fig. 7). The equation also applies to branch migration from a cruciform extruded from a reverse repeat in a plasmid by supercoiling. Branch migration can then be initiated by nucleolytic relaxation of the supercoils. The single-walk equation will be applicable to this situation if it is assumed that the migration starts on a perfectly reflecting barrier and proceeds with random steps but, because of the reflecting barrier, with a net movement in one direction toward an absorbing barrier at the end of the repetitive DNA. Supercoiled plasmids were used in this way by Courey and Wang (3), Gellert et al. (4), and Sinden and Pettijohn (5) in studies of branch migration. These results are further considered under "Discussion."
The quantity N/r² is derived from an experimental RO by the use of one of the equations or approximations given above. In order to calculate N and thus k, it is necessary to specify r. This parameter defines the position of the absorbing barrier and corresponds to the half-length of the cylindrical rod on which the equation for the multiple walk is based. The transition from the diffusion of heat to the random walk requires that r be measured as a number of steps. For the branch migration of DNA, r is equal to the number of steps that would be required to move a single junction from an initial position to a runoff position with a continuous non-random movement. If there is more than one possible path, the length of each must be weighted by the probability of its occurrence. The use of a step size of 1 bp makes the number of non-random steps in this path equal to the number of base pairs in r and makes it possible to specify r from the structure of the substrate. For X-forms from figure 8s or for other cruciforms of the kind we have used, this definition leads to r = DNA bp/4, where DNA bp is the molecular size of the substrate or of its migratable portion.
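The chain from an experimental RO to a step rate can be illustrated as follows. This is a sketch under stated assumptions: it uses the two multiple-walk limit approximations in the forms RO = √(2/π)·√N/r for RO ≤ 0.52 and ln(1 - RO) = ln(8/π²) - π²N/(8r²) above that, and the function name and the input values in the example are hypothetical.

```python
from math import log, pi

def step_rate(ro, t_seconds, dna_bp):
    """Step rate k (steps per second) from one runoff measurement, multiple walk.

    ro        : fraction of molecules run off at time t_seconds
    dna_bp    : size of the migratable DNA in base pairs, so that r = dna_bp / 4
    """
    r = dna_bp / 4.0
    if ro <= 0.52:
        tau = pi * ro ** 2 / 2.0                                   # low-RO limit
    else:
        tau = 8.0 / pi ** 2 * log(8.0 / (pi ** 2 * (1.0 - ro)))    # high-RO limit
    n_steps = tau * r ** 2                                         # N = (N / r**2) * r**2
    return n_steps / t_seconds                                     # k = N / t

# Hypothetical example: a 716-bp migratable region (r = 179), RO = 0.52 after 1 h.
print(step_rate(0.52, 3600.0, 716))
```

Because N scales with r², an error in the assumed migratable length enters the derived step rate as its square.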
Fig. 9 shows that the curves for both initial distributions of migrating species approach straight lines that have the same slope. It can be shown by computer simulation that this will occur for any initial distribution. These straight lines represent a condition in which the distribution of migrating species and the runoff have reached a steady state and the process is kinetically first order, that is, the rate of runoff is proportional to the fraction of the species remaining to run off. This phenomenon is discussed in treatments of probability as a stationary distribution (see pp. 135 and 335 of (41)).
Another variation on the basic random walk that, like the step size, cannot be proven or disproven solely by kinetic measurements, is an elastic barrier similar to that defined by Feller (page 343 of (41) ). If such a barrier requires that a step to an adjacent position has a probability of 0.5 and the same probability of remaining at (or returning to) its initial position, the walk would have the same properties and the same mathematical pattern of runoff as the basic walk if a suitable definition of a step were made.
Our choice of a step size of 1 bp is entirely arbitrary with respect to the use of random walk theory. Another choice of bp per step would result in different values of r, N, and k, but when the derived value of k is converted to the number of bp s⁻¹, the result will be the same for any assumed step size, including nonintegral values. The step size of 1 bp has only the advantage of its direct correspondence with the structural unit of DNA. This point is made to emphasize that any choice of step size for the physical process must be based on considerations of chemical structure and energetics.
The equations discussed above for the calculation of N are based on the normal approximation to the binomial distribution. The precision of this approximation is discussed by Feller (pp. 179-190 of (41)) for an unrestricted, single walk. He shows that it is quite accurate even for low values of N for calculations involving the central terms of a distribution, but that the precision deteriorates in the region of the tails. The same problem can lead to large errors, particularly in the case of the multiple walk, in calculations for small DNA substrates and low values of RO. In order to evaluate these errors, we have made computations in which (n/r²) as a function of RO was obtained for substrates in which r was given values from 5 to 320. The computer programs yield RO for each step for the binomial progression of both the single and multiple walks. The basis for these algorithms was described previously (1) and is given in further detail in the documentation associated with the deposit of these programs in Anonymous FTP. The percent error in N resulting from the use of (N/r²) instead of (n/r²) was calculated.
The results for both multiple and single walks are summarized in Table 4. It should be possible, using Table 4, to determine whether calculations of the desired precision can be made from the series equations for the multiple or single walk or from the approximations to them. If not, the binomial progression of a walk is easily programmed, or our QBASIC programs are available. They enable the calculation of (n/r²) from an experimental RO obtained using a substrate of any specified value of r. The error in the single walk calculation is much less than that for the multiple walk because runoff does not occur in the single walk until n > r. This allows a closer approach to a steady state between RO and the distribution of migrating species than is possible for the multiple walk, for which runoff begins with the first step. In the work reported in this paper, the lowest value of r was 179 bp. Comparison with the values in Table 4 shows that the error at the ROs used in our work was negligible.
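A binomial progression of the kind referred to above can be programmed in a few lines. The following Python sketch is not the deposited QBASIC code. It assumes that a junction is counted as run off when it reaches position +r or -r, and that the multiple walk starts from a uniform distribution over the interior positions; both conventions and the function name are assumptions.

```python
def runoff_by_step(r, n_steps, multiple=False):
    """Cumulative runoff after each step for a walk with absorbing barriers at +r and -r."""
    size = 2 * r + 1                       # lattice positions -r..+r, index = position + r
    p = [0.0] * size
    if multiple:
        # Assumed start: junctions spread uniformly over the interior positions.
        for i in range(1, size - 1):
            p[i] = 1.0 / (size - 2)
    else:
        p[r] = 1.0                         # single walk: every junction starts at the origin
    ro, out = 0.0, []
    for _ in range(n_steps):
        q = [0.0] * size
        for i in range(1, size - 1):
            q[i - 1] += 0.5 * p[i]         # step in the negative direction
            q[i + 1] += 0.5 * p[i]         # step in the positive direction
        ro += q[0] + q[-1]                 # probability reaching a barrier runs off
        q[0] = q[-1] = 0.0
        p = q
        out.append(ro)
    return out

r = 5
single = runoff_by_step(r, 60)
multi = runoff_by_step(r, 60, multiple=True)
for n in (1, 5, 10, 25, 50):
    print(n, n / r ** 2, single[n - 1], multi[n - 1])
```

Tabulating n/r² against RO for several values of r, and comparing with N/r² from the continuous equations at the same RO, gives the kind of percent error described in the text.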
Another problem in the calculation of rate constants is encountered if the initial state of the population of migrating molecules cannot be accurately identified as that for the multiple or for the single walk or, for example, if branch migration occurred during the preparation of the substrate. In such cases, large errors can be made in calculations based on low values of RO. These errors can be avoided by limiting the calculation to the difference between two or more timed points on the linear portion of the curves shown in Fig. 9. In such a calculation any product from branch migration that is present at the beginning of the timed reaction must be subtracted from the product peak in the zero time control pattern. This procedure takes advantage of the general property of first order reactions that the rate constant calculation is independent of the initial concentration.
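The two-point calculation can be sketched as follows. It assumes that on the linear portion of the curve the slope of ln(1 - RO) against time is -π²k/(8r²), which follows from the j = 0 term of either series with N = kt. The intercept cancels in the difference, so the same function serves for both walks. The function names and the numbers in the example are hypothetical.

```python
from math import exp, log, pi

def k_from_two_points(ro1, t1, ro2, t2, r):
    """Step rate from two timed points on the linear part of ln(1 - RO) versus t."""
    slope = (log(1.0 - ro2) - log(1.0 - ro1)) / (t2 - t1)   # equals -pi**2 * k / (8 * r**2)
    return -8.0 * r ** 2 * slope / pi ** 2

def ro_linear(k, t, r, intercept=8.0 / pi ** 2):
    """Synthetic RO on the linear portion, used here only to exercise the function."""
    return 1.0 - intercept * exp(-pi ** 2 * k * t / (8.0 * r ** 2))

r = 179
ro_a = ro_linear(4.0, 7200.0, r)
ro_b = ro_linear(4.0, 14400.0, r)
print(ro_a, ro_b, k_from_two_points(ro_a, 7200.0, ro_b, 14400.0, r))
```

Both RO values must already have been corrected for any product present at zero time, as described above.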
Our definition of the multiple walk is a walk starting from an initial distribution of the positions of branch-migrating junctions which were randomized over all possible positions on the substrate. This distribution resulted from the randomization of the positions of the pair of EcoRI sites on the G4 figure 8 by branch migration before the conversion of the figure 8s to X-forms by EcoRI. The distribution was maintained by performing all necessary manipulations at 0 °C, prior to the kinetic experiment. The random distribution also prevails for the cruciform substrates in which it is determined, during their annealing, by the interaction between the nucleation and extension events leading to double-stranded structure and the formation of a Holliday junction. Following nucleation at one of the four reverse-repeat positions of a cruciform precursor, the zipping up of double strands will continue rapidly to yield a linear molecule unless a second similar event occurs at one of the adjacent positions. When the two zippings intersect, the process is stalled, and the arm lengths thus established will persist as the remainder of the cruciform is completed by nucleation and annealing from the two remaining reverse-repeat positions. No significant branch migration occurs during this period because it is so much slower than the zipping reaction. The relative arm lengths will then be determined by the relative timing of the two initial nucleations, which we assume to be random events. This randomness may be somewhat compromised if the XhoI and the EcoRI reverse-repeat positions do not nucleate with the same probability.
Direct evidence for the adherence of our substrates to the multiple walk formulation is provided by several kinetic runs having points from both the low and the high RO regions of the curve on Fig. 9. Points from a single kinetic experiment using G4 X-forms are superimposed on the theoretical curve in this figure in order to show agreement with theory in both regions of the curve. The data for the 3694-bp cruciform are less precise, but also show agreement between rate constants calculated at low and high ROs.
The above consideration of random walks has been limited to those with absorbing barriers because of the application of this restriction to our experimental work. Calculations involving, for example, the distance in a genome that a Holliday junction will move with respect to time or to a number of steps can be made by the use of the normal distribution function for the unrestricted walk. It must be borne in mind that such a calculation can have only a probabilistic answer, i.e. a statement of the probability that a junction will be found in the region between two points at specific distances from the origin after a given number of steps or a given time.
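A probabilistic statement of this kind can be computed directly from the normal approximation. The sketch below assumes a step size of 1 bp, so that the net displacement after N steps has mean 0 and variance N; the function name is hypothetical, and the example uses the step rate of 4.0 bp s⁻¹ quoted in the text for 37 °C.

```python
from math import erf, sqrt

def prob_between(a_bp, b_bp, n_steps):
    """Probability that the junction lies between a_bp and b_bp from the origin.

    Normal approximation to the unrestricted walk: mean 0, variance n_steps.
    """
    def cdf(x):
        return 0.5 * (1.0 + erf(x / sqrt(2.0 * n_steps)))
    return cdf(b_bp) - cdf(a_bp)

k = 4.0                      # steps per second
n = k * 3600.0               # number of steps in 1 h, N = kt
print(prob_between(-120.0, 120.0, n))   # within one standard deviation (sqrt(14400) = 120 bp)
```

The standard deviation of the displacement grows only as the square root of the number of steps, so a tenfold longer time widens the likely region by a factor of about 3.2.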