*Department of Evolutionary Biology, Zoological Institute, University of Copenhagen Denmark, Copenhagen, Denmark; and
Department of Statistics, Oxford University, Oxford, England
It is generally believed that sequence heterogeneity in PCR products from fossil remains is due to regular DNA polymerase errors as well as miscoding lesions compounded by damage in the template DNA (Pääbo 1990; Handt et al. 1994b, 1996; Höss et al. 1996; Krings et al. 1997). However, it has been difficult to test how often this assumption holds. First, DNA extractions from fossil remains rarely produce a yield large enough for pre-PCR analysis of postmortem modifications (Höss et al. 1996). Second, in most cases, it is not possible to determine whether nucleotide misincorporations by the DNA polymerase enzyme during amplification are caused by regular DNA polymerase errors or by miscoding lesions in the template DNA sequences (Greenwood et al. 1999). Finally, the error rates of the DNA polymerase enzymes used for PCR have proved to be highly unpredictable, making it difficult to account for regular DNA polymerase errors in amplified DNA sequences (Eckert and Kunkel 1991).
Here, we present a statistical model for analyzing PCR-mediated base-misincorporations, catalyzed by the commonly used Thermus aquaticus (Taq) polymerase enzyme, in amplification products from fossil remains.
The error rate of the Taq polymerase enzyme may vary more than 10-fold (2 × 10⁻⁴ to <1 × 10⁻⁵ per nucleotide per cycle) according to the precise DNA sequence and the in vitro conditions of DNA synthesis (Eckert and Kunkel 1991). Therefore, the tests of the model rely solely on the relative distribution of the distinct Taq polymerase errors, which, in contrast to the highly variable error rate, is nearly constant and independent of the starting template material and the conditions for the PCR, as shown in table 1. Hence, the tests are not affected by variations in PCR efficiency and accuracy. The model compares the distribution of the regular Taq polymerase errors with the observed substitutions in amplification products from fossil remains under the hypothesis that any significant differences between the distributions are due to miscoding lesions in the template DNA sequences used for PCR. The model was applied to published multiple clone sequences of the mitochondrial (mt) DNA control region from three differently preserved specimens of Homo representing different ages: a 600-year-old Hokokam Indian (VC15A) found in a cave in Arizona, southwestern United States (Handt et al. 1996), the 5,000-year-old ice man recovered from a glacier in the Tyrolean Alps (Handt et al. 1994b), and the >30,000-year-old Neanderthal-type specimen found in a limestone quarry near Düsseldorf, Germany (Krings et al. 1997).
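The comparison of the Taq error distribution with the observed substitutions can be illustrated with a small sketch. It is not the authors' test: it assumes a plain Pearson goodness-of-fit statistic, and the class labels, Taq proportions, and counts below are hypothetical placeholders rather than the values of table 1 or of any specimen.

```python
def pearson_chi_square(observed, expected_props):
    """Pearson statistic comparing observed counts with expected proportions."""
    if abs(sum(expected_props.values()) - 1.0) > 1e-9:
        raise ValueError("expected proportions must sum to 1")
    # Only classes named in the expected distribution are counted.
    total = sum(observed.get(cls, 0) for cls in expected_props)
    stat = 0.0
    for cls, prop in expected_props.items():
        expected = total * prop
        stat += (observed.get(cls, 0) - expected) ** 2 / expected
    return stat


# Hypothetical placeholder proportions, NOT the values of table 1.
taq_props = {"AT>GC": 0.60, "GC>AT": 0.20, "AT>TA": 0.10,
             "AT>CG": 0.04, "GC>TA": 0.04, "GC>CG": 0.02}
# Hypothetical substitution counts for a single specimen.
observed = {"AT>GC": 20, "GC>AT": 25, "AT>TA": 3,
            "AT>CG": 1, "GC>TA": 1, "GC>CG": 0}

stat = pearson_chi_square(observed, taq_props)
print(f"chi-square statistic = {stat:.2f} on {len(taq_props) - 1} degrees of freedom")
```

A statistic that is large relative to the chi-square distribution with k − 1 degrees of freedom would indicate a departure from the Taq error spectrum. With counts as small as those typical of clone data sets, an exact or likelihood-based test would be more appropriate than this approximation.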
Clone sequences whose ancient origins were considered uncertain by the authors were omitted from the analysis. Furthermore, the sequence materials used in the model were all obtained using different primer pairs that enabled partially overlapping sequences to be amplified in order to prevent amplification of nuclear insertions (Handt et al. 1994b, 1996; Krings et al. 1997). Therefore, contaminant DNA and nuclear insertions were unlikely to be present in the clone sequences used for analysis.
For each of the specimens, a sequence was constructed that contained all of the observed substitutions in the multiple-clone data set. This sequence was then compared with the proposed consensus sequence of the specimen, and the number of substitutions was calculated (table 2). Identical substitutions in a given position present in more than one clone sequence were treated as single events. Ambiguous residues (0.3% in the Tyrolean ice man, 1.0% in the Neanderthal), indels (0.3% in the Hokokam Indian, 1.1% in the Tyrolean ice man, 1.1% in the Neanderthal), and positions with two or more nonidentical substitutions (0.5% in the Neanderthal) were omitted from the analysis. All columns in the alignment of the consensus sequence and the sequence incorporating substitutions were considered as independent observations arising from a common distribution. As a consequence, if p is the probability of a pre-PCR derived substitution and q is the probability of a regular Taq polymerase error, the additive probability of a substitution is p + q. The index notation used to describe the data and the model is explained in table 2.
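The counting rules in this paragraph can be sketched as follows. This is a minimal illustration that assumes the clones are already aligned to the consensus as equal-length strings; the function name and the toy sequences are hypothetical and are not taken from the published data sets.

```python
from collections import defaultdict


def count_substitutions(consensus, clones):
    """Count distinct substitution events per (consensus base, observed base)."""
    # Collect, for every alignment column, the set of non-consensus characters.
    variants = defaultdict(set)
    for clone in clones:
        for pos, (ref, obs) in enumerate(zip(consensus, clone)):
            if obs != ref:
                variants[pos].add(obs)

    counts = defaultdict(int)
    for pos, bases in variants.items():
        ref = consensus[pos]
        # Omit columns with two or more nonidentical changes.
        if ref not in "ACGT" or len(bases) != 1:
            continue
        (obs,) = bases
        # Omit ambiguous residues (e.g. N) and indels (e.g. -).
        if obs not in "ACGT":
            continue
        # Identical changes seen in several clones count as one event.
        counts[(ref, obs)] += 1
    return dict(counts)


# Hypothetical toy alignment.
consensus = "ACGTACGT"
clones = ["ACGTACGT", "ATGTACGT", "ATGTACAT",
          "ACGTNCGT", "ACGAACGT", "ACGCACGT"]
print(count_substitutions(consensus, clones))
```

In the toy alignment, the C-to-T change at the second position appears in two clones but is counted once, the column containing N is dropped, and the column with both T-to-A and T-to-C changes is dropped as well.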
When contamination, nuclear insertions, and mitochondrial heteroplasmy are excluded as significant contributors to the observed sequence heterogeneity, we find the only plausible reason for the discrepancy between the expected and observed distribution of base-misincorporations in the clone sequences to be miscoding lesions in the template DNA sequences. As all three specimens differ in age and preservation conditions, the result suggests that miscoding lesions are common in DNA from fossil remains, regardless of the age of the specimens and their preservation conditions.
To test for significant differences in the distribution of pre-PCR derived transitions, the clone data sets were tested under the following hypothesis (H2): Do AT→GC (TS1) and GC→AT (TS2) substitutions occur at the same rate? The test of H2 is shown in table 3.
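A deliberately simplified version of the question posed by H2 can be sketched as an exact binomial test. This is not the test reported in table 3: it assumes that regular polymerase errors are negligible and that AT and GC sites are equally frequent, neither of which holds for Taq-amplified data, and the counts used are hypothetical.

```python
from math import comb


def two_sided_binomial_p(k, n, p=0.5):
    """Exact two-sided binomial P value for k successes in n trials.

    Sums the probabilities of all outcomes that are no more likely
    than the observed one.
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-9)  # small tolerance for float ties
    return min(1.0, sum(x for x in pmf if x <= threshold))


# Hypothetical counts of the two transition classes in one data set.
ts1 = 4   # AT→GC events
ts2 = 14  # GC→AT events
p_value = two_sided_binomial_p(ts1, ts1 + ts2)
print(f"two-sided P = {p_value:.4f}")
```

Under these assumptions, a small P value would suggest that the two transition classes do not arise at the same rate. With Taq data, the expected share of each class would first have to be corrected for the polymerase's own error spectrum.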
We found that only the clone sequences from the Neanderthal specimen contain significantly more CG→TA changes than TA→CG changes (P < 0.05) (table 3). As this is the oldest of the specimens, the results suggest that distinct miscoding lesions occur at different rates, producing a displacement between transitions with time. This is in agreement with the observation that hydrolytic deamination of cytosine and its homolog 5-methyl cytosine to uracil and thymine, respectively, which generates CG→TA transitions during replication, is among the major types of miscoding lesions in the genome of living human cells. This deamination is believed to occur at a rate about 30–50 times that of hydrolytic deamination of adenine to hypoxanthine, which generates TA→CG transitions during replication (Lindahl 1993b).
The inclusion of the distribution of Taq polymerase errors in the statistical model causes a problem of overparameterization, which limits the opportunities for statistical analysis (table 3). Using high-fidelity polymerases such as Pfu, with an error rate of 2.0 × 10⁻⁶ to 6.5 × 10⁻⁷ per nucleotide per cycle (Flaman et al. 1994; André et al. 1997), would permit regular DNA polymerase errors to be ignored in the statistical model. This would allow for comparisons of factors such as the numbers of transitions and transversions within a clone data set and transition/transversion ratios among different data sets. Therefore, future amplification of DNA from fossil remains should be carried out using high-fidelity DNA polymerases, as has recently been shown to be possible (Willerslev et al. 1999).
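The practical effect of the two error-rate ranges can be illustrated with a first-order calculation, in which the expected number of polymerase errors per cloned molecule is approximated as error rate × fragment length × number of template doublings. The error rates are those quoted in the text; the fragment length and the number of doublings are hypothetical values chosen only for illustration.

```python
def expected_errors_per_clone(error_rate, length_bp, doublings):
    """First-order estimate of polymerase errors per amplified molecule."""
    return error_rate * length_bp * doublings


# Assumed, illustrative amplification conditions (not from the paper).
LENGTH_BP = 300
DOUBLINGS = 30

# Error rates per nucleotide per cycle as quoted in the text.
rates = {
    "Taq (upper)": 2e-4,
    "Taq (lower)": 1e-5,
    "Pfu (upper)": 2.0e-6,
    "Pfu (lower)": 6.5e-7,
}

for name, rate in rates.items():
    errors = expected_errors_per_clone(rate, LENGTH_BP, DOUBLINGS)
    print(f"{name}: {errors:.4f} expected errors per clone")
```

Under these assumed conditions the Taq estimate ranges from less than one error in ten clones to more than one error per clone, whereas the Pfu estimate stays below about one error in fifty clones, which is the sense in which regular polymerase errors might be ignored.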
In summary, the results provide statistical evidence for the assumption that heterogeneity observed in PCR products from fossil remains is, in general, due to regular DNA polymerase errors as well as miscoding lesions in the template DNA sequences (Pääbo 1990; Handt et al. 1994b, 1996; Krings et al. 1997). Furthermore, the results suggest that miscoding lesions in DNA sequences from fossil remains can occur at different rates, generating a displacement of transitions with time.
Acknowledgements
We are grateful to M.-A. Coutellec-Vreto, S. Mathiasen, J. Pritchard, and S. Sumner for critical reading of the manuscript. A.J.H. and E.W. were supported by the VELUX Foundation of 1981, Denmark, and C.W. was supported by grant BBSRC 43/MMI09788 and the Carlsberg Foundation, Denmark. A.J.H. and E.W. contributed equally to this work and should be regarded as joint first authors.
Footnotes
Fumio Tajima, Reviewing Editor
1 Keywords: ancient DNA, miscoding lesions, Taq polymerase errors
2 Address for correspondence and reprints: Eske Willerslev, Department of Evolutionary Biology, Zoological Institute, University of Copenhagen Denmark, Universitetsparken 15, DK-2100, Copenhagen Ø, Denmark. E-mail: ewillerslev{at}zi.ku.dk
Literature Cited
André, P., A. Kim, K. Khrapko, and W. Thilly. 1997. Fidelity and mutational spectrum of Pfu DNA polymerase on a human mitochondrial DNA sequence. Genome Res. 7:843–852.
Austin, J. J., A. B. Smith, and R. H. Thomas. 1997. Palaeontology in a molecular world: the search for authentic ancient DNA. TREE 12:303–306.
Dunning, A. M., P. Talmud, and S. E. Humphries. 1988. Errors in the polymerase chain reaction. Nucleic Acids Res. 16:10393.
Eckert, K. A., and T. A. Kunkel. 1990. The fidelity of DNA polymerase used in the polymerase chain reaction. Pp. 225–244 in M. J. McPherson, P. Quirke, and G. R. Taylor, eds. PCR: a practical approach. IRL Press, Oxford University Press, Oxford, England.
Eckert, K. A., and T. A. Kunkel. 1991. DNA polymerase fidelity and the polymerase chain reaction. PCR Methods Appl. 1:17–24.
Flaman, J.-M., T. Frebourg, V. Moreau, F. Charbonnier, C. Martin, C. Ishioka, S. H. Friend, and R. Iggo. 1994. A rapid PCR fidelity assay. Nucleic Acids Res. 22:3259–3260.
Gocke, C. D., F. A. Benko, and P. K. Rogan. 1998. Transmission of mitochondrial DNA heteroplasmy in normal pedigrees. Hum. Genet. 102:182–186.
Greenwood, A. D., C. Capelli, G. Possnert, and S. Pääbo. 1999. Nuclear DNA sequences from late Pleistocene megafauna. Mol. Biol. Evol. 16:1466–1473.
Handt, O., M. Höss, M. Krings, and S. Pääbo. 1994a. Ancient DNA: methodological challenges. Experientia 50:524–529.
Handt, O., M. Krings, R. H. Ward, and S. Pääbo. 1996. The retrieval of ancient human DNA sequences. Am. J. Hum. Genet. 59:368–376.
Handt, O., M. Richards, M. Trommsdorff et al. (13 co-authors). 1994b. Molecular genetic analyses of the Tyrolean ice man. Science 264:1775–1778.
Höss, M., P. Jaruga, T. H. Zastawny, M. Dizdaroglu, and S. Pääbo. 1996. DNA damage and DNA sequence retrieval from ancient tissues. Nucleic Acids Res. 24:1304–1307.
Krings, M., A. Stone, R. W. Schmitz, H. Krainitzki, M. Stoneking, and S. Pääbo. 1997. Neandertal DNA sequences and the origin of modern humans. Cell 90:19–30.
Lindahl, T. 1993a. Recovery of antediluvian DNA. Nature 365:700.
Lindahl, T. 1993b. Instability and decay of the primary structure of DNA. Nature 362:709–715.
Lindahl, T. 1997. Facts and artifacts of ancient DNA. Cell 90:1–3.
Lindahl, T. 2000. Fossil DNA. Curr. Biol. 10:616.
Pääbo, S. 1990. Amplifying ancient DNA. Pp. 159–166 in M. A. Innis, D. H. Gelfand, J. J. Sninsky, and T. J. White, eds. PCR protocols: a guide to methods and applications. Academic Press, San Diego.
Pääbo, S., R. G. Higuchi, and A. C. Wilson. 1989. Ancient DNA and the polymerase chain reaction. J. Biol. Chem. 264:9709–9712.
Pääbo, S., and A. C. Wilson. 1991. Miocene DNA sequence – a dream come true? Curr. Biol. 1:45–46.
Saiki, R. K., D. H. Gelfand, S. Stoffel, S. J. Scharf, R. Higuchi, G. T. Horn, K. B. Mullis, and H. A. Erlich. 1988. Primer-directed enzymatic amplification of DNA with thermostable DNA polymerase. Science 239:487–491.
Tindall, K. R., and T. A. Kunkel. 1988. Fidelity of DNA synthesis by the Thermus aquaticus DNA polymerase. Biochemistry 27:6008–6013.
Willerslev, E., A. J. Hansen, B. Christensen, J. P. Steffensen, and P. Arctander. 1999. Diversity of Holocene life forms in fossil glacier ice. Proc. Natl. Acad. Sci. USA 96:8017–8021.