(Received for publication, October 17, 1994; and in revised form, November 17, 1994)
From the
We have determined the fidelity of DNA replication by human cell
extracts in reactions containing excess dGTP. Replication errors were
scored using two M13 DNA substrates having the replication origin on
opposite sides of the lacZ α-complementation gene. The
data suggest that the average rates for replication errors resulting
from G(template)·dGTP, T·dGTP, and A·dGTP mispairs are 25 × 10⁻⁶, 12 × 10⁻⁶, and 3 × 10⁻⁶, respectively. The data also suggest that error
rates for both the (+) and (-) strands differ by less than
2-fold when they are replicated either as the leading or lagging
strand. This is in contrast to the 33- and 8-fold differences observed
earlier for G·dTTP and C·dTTP mispairs on the (+) strand
when replicated by the leading or lagging strand complex (Roberts, J.
D., Izuta, S., Thomas, D. C., and Kunkel, T. A. (1994) J. Biol.
Chem. 269, 1711-1717). Thus, the relative fidelity of the
leading and lagging strand replication proteins varies with the mispair
and sequence considered. Misincorporation of dGTP preferentially occurs
at template positions where dGTP is the next correct nucleotide to be
incorporated. This "next nucleotide" effect is
characteristic of reduced exonucleolytic proofreading and suggests that
these replication errors are normally proofread efficiently. Fidelity
measurements performed in the absence or presence of dGMP, an inhibitor
of proofreading exonuclease activity, suggest that the leading strand
replication complex proofreads some mispairs more efficiently than does
the lagging strand replication complex.
Studies with purified DNA polymerases performed during the last
25 years have been invaluable for understanding the basic principles
for accurate DNA polymerization (reviewed in Echols and Goodman (1991),
Kunkel (1992), and Johnson (1993)). These studies have shown that several
discrimination steps in the reaction cycle determine the selectivity
for correct nucleotide incorporation and the efficiency of
exonucleolytic proofreading. They have also revealed that the fidelity
of polymerization reactions can be highly variable, depending on the
DNA polymerase under study, the type of error being considered (e.g. base substitution versus frameshift), the base
composition of the mispair or misalignment, the symmetry of the error (e.g. T·dGTP versus G·dTTP or
addition versus deletion intermediate), and the local sequence
surrounding the error.
As complex as these model polymerization
reactions are, replicating the entire genome of an organism is much
more complicated. More than one DNA polymerase and several accessory
proteins are required to replicate the two antiparallel strands
coordinately. Thus, a full appreciation of how genomes are stably
replicated and how instability may arise to generate disease requires a
better understanding of the fidelity of this complex replication
machinery. An important step toward achieving this understanding has
been the development of systems that replicate double-stranded DNA in vitro. One system for studying human genomic replication
depends on the SV40 origin of replication (for recent review, see
Stillman (1994)). Circular double-stranded DNA substrates containing the
SV40 origin can be fully replicated by the proteins present in human
cells, with only the addition of SV40 large T antigen needed to
initiate replication at the origin. These factors can be supplied
either by crude extracts of human cells grown in culture or by
reconstitution with purified proteins prepared from such extracts (Waga
and Stillman, 1994). At least two DNA polymerases, α and δ, are
among the host factors required for complete DNA replication (Lee et al., 1989; Weinberg and Kelly, 1989; Melendy and Stillman,
1991). Additional host proteins are required for specific initiation at
the origin, for chain elongation on the leading and lagging strands,
and for completion and separation of the daughter molecules.
As measured with DNA substrates containing reporter genes for scoring
replication errors, SV40 replication in unfractionated cell extracts
has been found to be highly accurate (Roberts and Kunkel, 1988; Hauser et al., 1988). Replication is in fact more accurate than DNA
synthesis by either the 4-subunit DNA polymerase α-primase complex
or DNA polymerase δ with its associated 3'→5' exonuclease
(Thomas et al., 1991). Further investigation of highly
accurate replication thus requires reaction conditions that generate
replication errors above the background frequencies of existing
fidelity assays. One approach has been to replicate damaged DNA (Carty et al., 1992; Thomas and Kunkel, 1993; Thomas et al.,
1993, 1994). Another strategy is to replicate undamaged DNA using a
damaged dNTP (Pavlov et al., 1994). Still a third approach
uses undamaged substrates in reactions containing unequal
concentrations of dNTPs to force errors that revert specific
pre-existing substitution (Roberts and Kunkel, 1988; Roberts et
al., 1991) or frameshift mutations (Bebenek et al., 1992;
Roberts et al., 1993).
In order to define SV40 replication
fidelity with respect to the type, base composition, symmetry, and
location of errors, we are performing experiments using a fidelity
assay that detects a variety of substitution, deletion, and addition
errors in a target sequence of several hundred base pairs. The first
such study used reactions containing excess dTTP to force a specific
subset of replication errors (Roberts et al., 1994). Two of
twelve possible substitutions as well as single-nucleotide frameshifts
were induced by this substrate imbalance. Errors were found throughout
the 250-base pair target, but they were distributed non-randomly. Two
hot spots were observed, one for a G → A transition and one for
the loss of a G·C base pair in a homopolymeric run. Examination of
the fidelity of replication of the same sequence when copied as the
leading or lagging strand suggested that the overall error rates for
G·dTTP and C·dTTP mispairs as well as the error rates at the
two hot spots depended on whether replication was performed by leading
or lagging strand replication proteins.
The current study presents two sets of experiments intended to expand our understanding of the fidelity of the human replication apparatus. The first set describes replication fidelity in reactions containing excess dGTP to define a new set of substitution and frameshift error rates on the leading and lagging strands that are forced by this substrate imbalance. This pool bias provides information on 3 more of the 12 possible mispairs, and the observed error specificity further suggests that proofreading contributes to replication fidelity. The analysis also reveals a base substitution hot spot that is detected as a lagging strand error but not as a leading strand error. A similar observation was made in the earlier study with excess dTTP (Roberts et al., 1994), but it was for a different mispair at a different location. In both cases, the error specificity is consistent with the possibility that some mispairs are more effectively proofread during leading strand replication than during lagging strand replication. The second set of experiments was performed to examine this possibility.
Figure 1: Replication error spectra in reactions containing excess dGTP. The mutational spectra
with both the Ori left and Ori right vectors are displayed above and below, respectively, the double-stranded sequence of
the lacZ region of M13mp2SV. Each mutation is shown as
the mispair considered most likely to occur under the reaction
conditions used. Only those mutations consistent with the pool bias are
displayed. The underlined nucleotides in the sequence are
sites at which mutations generated by mispairs with dGTP have
previously been identified and are known to be detectable. Arrows on the right indicate the direction of synthesis for that
strand. Open triangles represent the loss of a single
nucleotide; closed triangles represent the addition of a
single nucleotide. Because neither the nucleotide that is lost or added
in homopolymeric runs nor the strand that represents the template
strand for frameshift errors is known, the deletion or addition is
centered under the runs in the (+) strand.
The reproducible increases in mutant frequency suggest that many of
the mutants obtained from replication products may have resulted from
incorporation of dGTP during replication. To examine this possibility,
the DNA sequences of 104 mutants from excess dGTP-containing reactions
with Ori left substrate and 180 mutants from reactions with the Ori
right substrate were determined for nucleotides -84 through
+170 of the lacZ α-complementation gene. Sequence
changes were found in 76 Ori left mutants and 117 Ori right mutants (Table 2). The remainder had no change in the 254-nucleotide
target sequence. We have previously reported (Roberts et al.,
1994) that, with a dTTP bias, some mutants had changes between
positions 170 and 479, the remaining downstream lacZ gene sequence in
M13mp2. Although that may be the case here as well, we did not analyze
sequences beyond position 170, as this would have more than doubled the
sequencing effort.
The error rates on both the (+) and (-) strands for both vectors were calculated for
G·dGTP, T·dGTP, and A·dGTP mispairs (Table 3).
Rates are expressed per detectable nucleotide incorporated to correct
for small differences in the number of detectable sites for each type
of error on each of the two strands. For all three mispairs, note that
the same sequence, whether a (+) or (-) strand, is replicated
with similar accuracy regardless of the orientation of the origin
relative to the lacZ target sequence. This is in marked
contrast to previous observations with the same assay for reactions
containing excess dTTP (Roberts et al., 1994). In that study,
on the (+) strand, there were 33- and 8-fold differences in rates
for G·dTTP and C·dTTP errors, respectively, between the Ori
left and Ori right vectors.
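The normalization described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it supposes that an error rate is obtained by apportioning the overall mutant frequency to one mispair class according to that class's share of the sequenced mutants, and then dividing by the number of detectable sites for that class. The function names, argument names, and all numbers in the usage lines are hypothetical and are not taken from Table 3, and any correction for background mutants or for the efficiency of mutant expression is omitted.

```python
def error_rate_per_detectable_site(mutant_frequency, class_count,
                                   sequenced_total, detectable_sites):
    """Apportion a mutant frequency to one mispair class and normalize it.

    Assumed formula (an illustration, not the paper's stated method):
        rate = mutant_frequency * (class_count / sequenced_total) / detectable_sites
    """
    if sequenced_total <= 0 or detectable_sites <= 0:
        raise ValueError("sequenced_total and detectable_sites must be positive")
    if not 0 <= class_count <= sequenced_total:
        raise ValueError("class_count must lie between 0 and sequenced_total")
    # Share of the overall mutant frequency attributed to this mispair class.
    class_frequency = mutant_frequency * class_count / sequenced_total
    # Normalize by the number of sites at which this error can be detected.
    return class_frequency / detectable_sites


def fold_difference(rate_a, rate_b):
    """Return the larger rate divided by the smaller one."""
    low, high = sorted((rate_a, rate_b))
    if low <= 0:
        raise ValueError("rates must be positive")
    return high / low


# Hypothetical inputs for the same strand replicated from the two vectors.
ori_left = error_rate_per_detectable_site(1.0e-3, 10, 100, 20)
ori_right = error_rate_per_detectable_site(1.0e-3, 8, 100, 20)
print(f"Ori left:  {ori_left:.2e}")
print(f"Ori right: {ori_right:.2e}")
print(f"fold difference: {fold_difference(ori_left, ori_right):.2f}")
```

Dividing by the number of detectable sites is what allows rates for different mispairs, or for the two strands, to be compared even though the reporter sequence offers a different number of scorable positions for each.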
In the
previous study, employing a dTTP pool bias (Roberts et al.,
1994), we observed a situation similar to that just described. A
template G at position 145 on the (+) strand was found to be a hot
spot for misincorporation of dTTP but only when replicated as the
lagging strand. The sequence at this site is 5'-A-G-3' (the G is nucleotide 145),
underlined), and the next correct nucleotide to be incorporated is
dTTP. This site is thus suitable for a second test of the contribution
of proofreading to fidelity but for a mispair having the reciprocal
symmetry (G·dTTP rather than T·dGTP). Therefore, we performed parallel reactions
containing excess dTTP with and without added dGMP, and monitored G → A
transitions at position 145. The results in the absence of
dGMP (Table 5, bottom) confirm our previous observation that
position 145 is replicated less accurately by the lagging strand
apparatus (the Ori left substrate) than by the leading strand apparatus
(the Ori right substrate). The addition of dGMP to the reaction
increased the error rate with both substrates. An independent repeat of
this analysis yielded a similar result (data not shown).
From this study we can infer error rates during replication
in human cell extracts for the three mispairs involving
misincorporation of dGTP. These and earlier results with excess dTTP
(Roberts et al., 1994) provide replication error rates for 6
of the 12 possible single-base mispairs, in the following order of
highest to lowest error rate: G·dTTP, C·dTTP, G·dGTP, T·dGTP >> T·dTTP, A·dGTP.
This same relative order and similar error rates are obtained during
replication in extracts and in reconstituted reactions known to lack
mismatch repair activity (Roberts et al., 1994), demonstrating
the reproducibility of the observations and suggesting that this
specificity reflects the average base selectivity and proofreading
potential of the human replication apparatus. The error specificity
pattern is not characteristic of that obtained with DNA polymerase
α, δ (plus proliferating cell nuclear antigen), or ε during
gap-filling synthesis templated by the same (+) strand lacZ sequence used here (Thomas et al., 1991). Also note that
the replication error rates for two of the four transversion mispairs
examined, C·dTTP and G·dGTP, are similar to those for the two
transition mispairs, G·dTTP and T·dGTP. These transition
mispairs are among the most common DNA polymerase misinsertion and
mispair extension errors (for review, see Echols and Goodman (1991) and
Johnson (1993)). Specificity differences between purified DNA
polymerases and the multiprotein replication apparatus could reflect
modulation of polymerase fidelity by other proteins. For example, the
single-stranded DNA-binding protein reportedly increases the accuracy
of DNA synthesis by DNA polymerase α (Carty et al., 1992),
and the 3'→5' exonuclease of human DNA polymerase δ is
regulated by accessory proteins such as human single-stranded
DNA-binding protein, proliferating cell nuclear antigen, and
replication factor C (Lee, 1993).
As in our earlier study of
replication fidelity with excess dTTP (Roberts et al., 1994),
errors resulting from the presence of excess dGTP do not occur equally
at all template positions. This non-random error distribution
represents preferential misincorporation of dGTP opposite certain
template T and template G nucleotides (Fig. 1) with an overall
preference for those sites followed by a template C (incoming correct
dGTP, Table 4). This sequence preference is a hallmark of
suppression of exonucleolytic proofreading, wherein dGTP
misincorporations are not removed because the high concentration of
dGTP also favors polymerization of the next correct nucleotide prior to
excision of the error. This neighboring nucleotide effect is seen here
for both G·dGTP and T·dGTP errors (Table 4) but was not
seen in an earlier study of G·dTTP and C·dTTP errors. These
data imply that, at least under the reaction conditions used here,
which are required to observe errors with the highly accurate
replication machinery, proofreading is not suppressed equally for all
replication errors by a high concentration of the next correct dNTP.
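The next nucleotide logic can be illustrated with a short Python sketch. It assumes the template is written 5'→3', so that the polymerase, which reads the template 3'→5', moves from a given site to the base on its 5' side. The function names and the index convention are hypothetical choices made for this illustration.

```python
# Watson-Crick complements for DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def next_correct_dntp(template_5to3, site_index):
    """Return the dNTP that correctly pairs with the next template base.

    template_5to3: template strand written 5'->3'.
    site_index: 0-based index of the template base where the misinsertion occurs.
    Returns None when the site has no 5' neighbor in the supplied sequence.
    """
    if not 0 <= site_index < len(template_5to3):
        raise IndexError("site_index is outside the template")
    if site_index == 0:
        return None
    # The polymerase reads 3'->5', so the next template base lies 5' of the site.
    next_template_base = template_5to3[site_index - 1].upper()
    return "d" + COMPLEMENT[next_template_base] + "TP"


def is_next_nucleotide_site(template_5to3, site_index, excess_dntp="dGTP"):
    """True when the dNTP present in excess is also the next correct nucleotide."""
    return next_correct_dntp(template_5to3, site_index) == excess_dntp


# A template T whose 5' neighbor is C: the next correct nucleotide is dGTP.
print(next_correct_dntp("CT", 1), is_next_nucleotide_site("CT", 1))
# The 5'-A-G-3' context: the next correct nucleotide after the G is dTTP.
print(next_correct_dntp("AG", 1), is_next_nucleotide_site("AG", 1, excess_dntp="dTTP"))
```

At a site flagged by `is_next_nucleotide_site`, the nucleotide that is present in excess both creates the mispair and drives rapid extension from it, which is the situation in which excision by the exonuclease is expected to be least effective.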
We noted that several 1-nucleotide frameshift errors at repetitive
sequence positions were generated in reactions containing excess dGTP.
For example, four mutants contained an extra G·C base pair in a
three-base run of G·C base pairs at positions 88-90 and
another had one fewer G·C base pair (Fig. 1). These
mutations in repetitive sequences were not observed in either of two
previous replication fidelity studies (Thomas et al., 1991;
Roberts et al., 1994) or among background mutations from
unreplicated DNA (Thomas et al., 1991), suggesting that they
are dGTP-induced replication slippage errors. Models for their
generation have been described (Bebenek and Kunkel, 1990; Roberts et al., 1993; Roberts et al., 1994).
SV40 replication in vitro is a complex reaction that utilizes a variety of enzymatic activities and accessory proteins (for review, see Stillman (1994)). These studies have shown that there is an enzymological asymmetry at the replication fork. A major objective of comparing error specificity with the two M13mp2SV substrates was to determine if asymmetric replication enzymology on the leading and lagging strands results in unequal error rates. Initial observations revealed that replication errors inferred to result from misincorporation of dTTP opposite template G and C on the (+) strand occurred at 33- and 8-fold higher rates, respectively, when this strand was replicated as the lagging strand as compared with the leading strand (Roberts et al., 1994). In the present study, we were particularly interested in determining if lagging strand replication is also less accurate with a completely different set of replication errors, those resulting from dGTP misincorporation. Here the pattern that emerged is remarkably different from that seen in the earlier study. Although average rates for dGTP misincorporation opposite template T and G were slightly higher for the lagging as compared with the leading strand replication complex (Table 4), the differences were 2-fold or less. This suggests that the relative fidelity of the leading and lagging strand replication complexes varies depending on both the template sequence and the mispair considered.
The next nucleotide effect shown in Table 4 suggests that exonucleolytic
proofreading modulates site-specific misincorporation of dGTP. One
possible explanation for a difference in error rates between the
leading and lagging strand replication complexes for the same mispair
in the same sequence context is a difference in proofreading activity.
To determine if this could explain site-specific differences, we looked
for deoxynucleoside monophosphate inhibition of exonuclease activity at
positions 121 and 145 on the (+) strand, hot spots for two
different substitution errors by the lagging strand complex but not the
leading strand complex. The addition of dGMP, a known inhibitor of
exonucleolytic proofreading (for review, see Kunkel (1988)), increased
the error rate at both sites with both the Ori left and Ori right
substrates (Table 5). If one assumes that the addition of dGMP
does not affect the inherent base selectivity of the insertion step,
then the dGMP-dependent increase in error rate suggests that
misinsertions are indeed occurring that, in the absence of
monophosphate, are removed by the exonuclease. Because fidelity in the
absence of dGMP is higher for both errors during leading strand
replication, this suggests that leading strand misinsertions are more
effectively excised than are the lagging strand errors that are readily
detected even when dGMP is absent. This provides one mechanism that can
explain site- and mispair-specific differences in the fidelity of
leading and lagging strand replication. Because the assignment of the
leading and lagging strand DNA polymerases during eukaryotic
replication is not yet definitive (for review, see Linn (1991)),
proofreading on the two strands could be carried out by the
exonucleases tightly associated with DNA polymerase δ or ε, by
an exonuclease that copurified with DNA polymerase α-primase
(Bialek and Grosse, 1993), or by a separate exonuclease.
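The inference drawn from the dGMP experiment can be made concrete with a small calculation, sketched here in Python. It relies on the assumption stated in the text, that dGMP does not change the base selectivity of the insertion step, and on a further assumption made only for this illustration: that the error rate measured with dGMP approximates the rate of misinsertion. If inhibition of the exonuclease is incomplete, the value returned underestimates the true fraction of misinsertions that are proofread. The function name and the rates in the usage lines are hypothetical and are not values from Table 5.

```python
def apparent_proofread_fraction(rate_without_dgmp, rate_with_dgmp):
    """Estimate the fraction of misinsertions removed by proofreading.

    Assumes that the rate measured with dGMP approximates the misinsertion
    rate and that dGMP leaves insertion selectivity unchanged.
    """
    if rate_without_dgmp < 0 or rate_with_dgmp <= 0:
        raise ValueError("rates must be non-negative, and positive with dGMP")
    if rate_without_dgmp >= rate_with_dgmp:
        # No dGMP-dependent increase, so no evidence of proofreading here.
        return 0.0
    return 1.0 - rate_without_dgmp / rate_with_dgmp


# Hypothetical error rates at one site for the two replication complexes.
leading = apparent_proofread_fraction(rate_without_dgmp=1.0e-6, rate_with_dgmp=2.0e-5)
lagging = apparent_proofread_fraction(rate_without_dgmp=1.0e-5, rate_with_dgmp=2.0e-5)
print(f"leading strand: {leading:.2f} of misinsertions apparently excised")
print(f"lagging strand: {lagging:.2f} of misinsertions apparently excised")
```

With these illustrative inputs the two complexes reach the same error rate once the exonuclease is inhibited, so the difference seen without dGMP would be attributed to more effective excision on the leading strand, which is the pattern of reasoning applied to positions 121 and 145.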