Statistical Evidence for Miscoding Lesions in Ancient DNA Templates

Anders J. Hansen, Eske Willerslev, Carsten Wiuf, Tobias Mourier and Peter Arctander

*Department of Evolutionary Biology, Zoological Institute, University of Copenhagen, Copenhagen, Denmark; and
†Department of Statistics, Oxford University, Oxford, England

It is generally believed that sequence heterogeneity in PCR products from fossil remains is due to regular DNA polymerase errors as well as miscoding lesions arising from damage in the template DNA (Pääbo 1990; Handt et al. 1994b, 1996; Höss et al. 1996; Krings et al. 1997). However, it has been difficult to test how often this assumption holds. First, DNA extractions from fossil remains rarely produce a yield large enough for pre-PCR analysis of postmortem modifications (Höss et al. 1996). Second, in most cases, it is not possible to determine whether nucleotide misincorporations by the DNA polymerase enzyme during amplification are caused by regular DNA polymerase errors or by miscoding lesions in the template DNA sequences (Greenwood et al. 1999). Finally, the error rates of the DNA polymerase enzymes used for PCR have proved to be highly unpredictable, making it difficult to account for regular DNA polymerase errors in amplified DNA sequences (Eckert and Kunkel 1991).

Here, we present a statistical model for analyzing PCR-mediated base-misincorporations, catalyzed by the commonly used Thermus aquaticus (Taq) polymerase enzyme, in amplification products from fossil remains.

The error rate of the Taq polymerase enzyme may vary more than 10-fold (~2 × 10⁻⁴ to <1 × 10⁻⁵ per nucleotide per cycle) according to the precise DNA sequence and the in vitro conditions of DNA synthesis (Eckert and Kunkel 1991). Therefore, the tests of the model rely solely on the relative distribution of the distinct Taq polymerase errors, which, in contrast to the highly variable error rate, is nearly constant and independent of the starting template material and the conditions for the PCR, as shown in table 1. Hence, the tests are not affected by variations in PCR efficiencies and accuracy. The model compares the distribution of the regular Taq polymerase errors with the observed substitutions in amplification products from fossil remains under the hypothesis that any significant differences between the distributions are due to miscoding lesions in the template DNA sequences used for PCR. The model was applied to published multiple clone sequences of the mitochondrial (mt) DNA control region from three differently preserved specimens of Homo representing different ages: a ~600-year-old Hokokam Indian (VC15A) found in a cave in Arizona, southwestern United States (Handt et al. 1996), the ~5,000-year-old ice man recovered from a glacier in the Tyrolean Alps (Handt et al. 1994b), and the >30,000-year-old Neanderthal-type specimen found in a limestone quarry near Düsseldorf, Germany (Krings et al. 1997).


Table 1 Taq Polymerase Errors Obtained from the Literature

 
Contamination by contemporary DNA poses a serious threat to studies of ancient DNA, especially from human remains (Pääbo, Higuchi, and Wilson 1989). Therefore, the data sets applied to the statistical analysis were carefully chosen from the literature to ensure that all recommended criteria and controls were fulfilled, in order to verify the authenticity of the sequence material (Lindahl 1993a; Handt et al. 1994a; Austin et al. 1997). The estimated ages of all three specimens fall within the theoretical limit of 50,000–100,000 years for amplifiable ancient DNA sequences (Pääbo and Wilson 1991; Lindahl 1997, 2000). All DNA extractions and PCR setups were physically separated from running, cloning, and sequencing through the use of fully equipped pre-PCR laboratories solely dedicated to ancient DNA work. Appropriate controls were used to detect possible contamination. For each of the specimens, unambiguous and reproducible results were obtained from independent DNA extracts by different laboratories. Finally, the sequences were congruent with what can reasonably be expected from known mitochondrial sequence variation in present populations of Homo and Pan.

Clone sequences whose ancient origins were considered uncertain by the authors were omitted from the analysis. Furthermore, the sequence materials used in the model were all obtained using different primer pairs that enabled partially overlapping sequences to be amplified in order to prevent amplification of nuclear insertions (Handt et al. 1994b, 1996; Krings et al. 1997). Therefore, contaminant DNA, as well as nuclear insertions, were unlikely to be present in the clone sequences used for analysis.

For each of the specimens, a sequence was constructed that contained all of the observed substitutions in the multiple-clone data set. This sequence was then compared with the proposed consensus sequence of the specimen, and the number of substitutions was calculated (table 2). Identical substitutions in a given position present in more than one clone sequence were treated as single events. Ambiguous residues (0.3% in the Tyrolean ice man, 1.0% in the Neanderthal), indels (0.3% in the Hokokam Indian, 1.1% in the Tyrolean ice man, 1.1% in the Neanderthal), and positions with two or more nonidentical substitutions (0.5% in the Neanderthal) were omitted from the analysis. All columns in the alignment of the consensus sequence and the sequence incorporating substitutions were considered as independent observations arising from a common distribution. As a consequence, if p is the probability of a pre-PCR derived substitution and q is the probability of a regular Taq polymerase error, the additive probability of a substitution is p + q. The index notation used to describe the data and the model is explained in table 2.
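The counting procedure described above can be illustrated with a minimal Python sketch. It is not the code used for the analysis: the function names, the toy consensus, and the toy clone sequences are all hypothetical. It also assumes that a column is dropped entirely as soon as any clone shows an ambiguous residue or a gap there, which is one possible reading of the omission rule.

```python
from collections import Counter

PURINES = {"A", "G"}
VALID = set("ACGT")

def collect_substitutions(consensus, clones):
    """Return {position: (consensus_base, variant_base)} for usable columns.

    Identical substitutions seen in several clones count as one event.
    Columns with ambiguous residues or gaps, and columns with two or more
    nonidentical substitutions, are omitted (assumed reading of the rule).
    """
    variants = {}     # position -> set of variant bases observed
    skipped = set()   # positions with ambiguous residues or gaps
    for clone in clones:
        if len(clone) != len(consensus):
            raise ValueError("clones must be aligned to the consensus")
        for pos, (ref, base) in enumerate(zip(consensus, clone)):
            if ref not in VALID or base not in VALID:
                skipped.add(pos)
            elif base != ref:
                variants.setdefault(pos, set()).add(base)
    return {
        pos: (consensus[pos], next(iter(bases)))
        for pos, bases in sorted(variants.items())
        if pos not in skipped and len(bases) == 1
    }

def classify(ref, alt):
    """Purine<->purine or pyrimidine<->pyrimidine changes are transitions."""
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"

# Hypothetical toy alignment, for illustration only.
consensus = "ACGTACGT"
clones = [
    "ATGTACGT",  # C->T at position 1
    "ATGTACGT",  # same C->T again: still a single event
    "ACGTACGA",  # T->A at position 7
    "ACGTNCGT",  # ambiguous residue at position 4
    "ACGTA-GT",  # gap at position 5
    "ACCTACGT",  # G->C at position 2
    "ACTTACGT",  # G->T at position 2: nonidentical, column omitted
]

subs = collect_substitutions(consensus, clones)
counts = Counter(classify(ref, alt) for ref, alt in subs.values())
print(subs)    # {1: ('C', 'T'), 7: ('T', 'A')}
print(counts)
```

Positions are zero-based here; a real analysis would map them back to the numbering of the published control-region sequence.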


Table 2 Data and Notation

 
Using chi-square statistics, the clone data sets were tested under the following hypothesis (H1): Can all of the observed substitutions be ascribed to regular Taq polymerase errors? The test of H1 is shown in table 3.
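As a rough illustration of the kind of test H1 involves, the sketch below runs a Pearson chi-square goodness-of-fit test of observed substitution counts against an expected relative distribution. It is a simplified stand-in, not the model of table 2: the two-category split, the Taq proportions, and the observed counts are hypothetical placeholders rather than values from tables 1 or 3.

```python
import math

def chi_square_gof(observed, expected_props):
    """Pearson statistic for observed counts vs. expected proportions."""
    if abs(sum(expected_props.values()) - 1.0) > 1e-9:
        raise ValueError("expected proportions must sum to 1")
    total = sum(observed.values())
    stat = 0.0
    for category, count in observed.items():
        expected = total * expected_props[category]
        stat += (count - expected) ** 2 / expected
    return stat

def p_value_one_df(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

# Hypothetical relative distribution of Taq errors (placeholder values).
taq_props = {"transition": 0.7, "transversion": 0.3}
# Hypothetical substitution counts from one clone data set (placeholder values).
observed = {"transition": 38, "transversion": 4}

stat = chi_square_gof(observed, taq_props)
p = p_value_one_df(stat)
print(f"chi-square = {stat:.3f}, P = {p:.4f}")
# With two categories there is 1 degree of freedom; P < 0.05 would reject H1.
```

Because the test uses only proportions, the absolute Taq error rate never enters the calculation, which mirrors the reasoning given for relying on the relative distribution of errors.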


Table 3 Transitions and Transversions

 
We find the distribution of substitutions in the amplification products for all three specimens to be significantly different (P < 0.05) from the distribution expected solely from regular Taq polymerase errors (table 3). Therefore, regular Taq errors cannot account for all of the heterogeneity observed in the multiple-clone sequences. Mitochondrial heteroplasmy can possibly account for some of the observed substitutions in the clone sequences. However, single-site heteroplasmy in the human mitochondrial control region has been encountered at no more than one or two sites in only 1%–3% of all individuals investigated (Gocke, Benko, and Rogan 1998). Therefore, possible sequence variation caused by heteroplasmy is of negligible importance to this investigation.

When contamination, nuclear insertions, and mitochondrial heteroplasmy are excluded as significant contributors to the observed sequence heterogeneity, we find the only plausible reason for the discrepancy between the expected and observed distribution of base-misincorporations in the clone sequences to be miscoding lesions in the template DNA sequences. As all three specimens differ in age and preservation conditions, the result suggests that miscoding lesions are common in DNA from fossil remains, regardless of specimen age and preservation conditions.

To investigate whether the distribution of pre-PCR derived transitions differs significantly, the clone data sets were tested under the following hypothesis (H2): Do AT->GC (TS1) and GC->AT (TS2) substitutions occur at the same rate? The test of H2 is shown in table 3.

We found that only the clone sequences from the Neanderthal specimen contain significantly more CG->TA changes than TA->CG changes (P < 0.05) (table 3). As this is the oldest of the specimens, the results suggest that distinct miscoding lesions occur at different rates, producing a displacement between transitions with time. This is in agreement with the observation that hydrolytic deamination of cytosine and its homolog 5-methyl cytosine to uracil and thymine, generating CG->TA transitions during replication, is among the major types of miscoding lesions in the genome of living human cells. These transitions are believed to occur at a rate about 30–50 times that of hydrolytic deamination of adenine to hypoxanthine, generating TA->CG transitions during replication (Lindahl 1993b).

The inclusion of the distribution of Taq polymerase errors in the statistical model causes a problem of overparameterization, which limits the opportunities for statistical analysis (table 3). Using high-fidelity polymerases such as Pfu, with an error rate of 2.0 × 10⁻⁶ to 6.5 × 10⁻⁷ per nucleotide per cycle (Flaman et al. 1994; André et al. 1997), would permit regular DNA polymerase errors to be completely ignored in the statistical model. This would allow for comparisons of factors such as the amounts of transitions and transversions within a clone data set and transition/transversion ratios among different data sets. Therefore, future amplification of DNA from fossil remains should be carried out using high-fidelity DNA polymerases, as has recently been proved possible (Willerslev et al. 1999).
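To make the simplification concrete, the following sketch shows what a comparison of the two transition classes might look like if polymerase errors could be ignored altogether. It is an assumption-laden illustration, not the test reported in table 3: it conditions on the total number of transitions, treats sites as independent, and uses hypothetical counts and base compositions.

```python
import math

def equal_rate_test(n_ts1, n_ts2, at_sites, gc_sites):
    """Chi-square test (1 df) that AT->GC and GC->AT occur at equal per-site rates.

    Under equal rates, the expected split of transitions is proportional
    to the number of A/T and G/C sites in the consensus sequence.
    Polymerase errors are ignored (assumption of this sketch).
    """
    total = n_ts1 + n_ts2
    sites = at_sites + gc_sites
    expected_ts1 = total * at_sites / sites
    expected_ts2 = total * gc_sites / sites
    stat = ((n_ts1 - expected_ts1) ** 2 / expected_ts1
            + (n_ts2 - expected_ts2) ** 2 / expected_ts2)
    p = math.erfc(math.sqrt(stat / 2.0))  # upper tail, 1 degree of freedom
    return stat, p

# Hypothetical values: 5 AT->GC and 20 GC->AT changes observed in a
# consensus with 200 A/T sites and 160 G/C sites.
stat, p = equal_rate_test(n_ts1=5, n_ts2=20, at_sites=200, gc_sites=160)
print(f"chi-square = {stat:.2f}, P = {p:.5f}")
```

With small counts, an exact binomial test would be a more cautious choice than the chi-square approximation used here.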

In summary, the results provide statistical evidence for the assumption that heterogeneity observed in PCR products from fossil remains is in general due to regular DNA polymerase errors as well as miscoding lesions in the template DNA sequences (Pääbo 1990; Handt et al. 1994b, 1996; Krings et al. 1997). Furthermore, the results suggest that miscoding lesions in DNA sequences from fossil remains can occur at different rates, generating a displacement of transitions with time.

Acknowledgements

We are grateful to M.-A. Coutellec-Vreto, S. Mathiasen, J. Pritchard, and S. Sumner for critical reading of the manuscript. A.J.H. and E.W. were supported by the VELUX Foundation of 1981, Denmark, and C.W. was supported by grant BBSRC 43/MMI09788 and the Carlsberg Foundation, Denmark. A.J.H. and E.W. contributed equally to this work and should be regarded as joint first authors.

Footnotes

Fumio Tajima, Reviewing Editor

1 Keywords: ancient DNA, miscoding lesions, Taq polymerase errors

2 Address for correspondence and reprints: Eske Willerslev, Department of Evolutionary Biology, Zoological Institute, University of Copenhagen, Universitetsparken 15, DK-2100 Copenhagen Ø, Denmark. E-mail: ewillerslev@zi.ku.dk

Literature Cited

    André, P., A. Kim, K. Khrapko, and W. Thilly. 1997. Fidelity and mutational spectrum of Pfu DNA polymerase on a human mitochondrial DNA sequence. Genome Res. 7:843–852.

    Austin, J. J., A. B. Smith, and R. H. Thomas. 1997. Palaeontology in a molecular world: the search for authentic ancient DNA. TREE 12:303–306.

    Dunning, A. M., P. Talmud, and S. E. Humphries. 1988. Errors in the polymerase chain reaction. Nucleic Acids Res. 16:10393.

    Eckert, K. A., and T. A. Kunkel. 1990. The fidelity of DNA polymerase used in the polymerase chain reaction. Pp. 225–244 in M. J. McPherson, P. Quirke, and G. R. Taylor, eds. PCR: a practical approach. IRL Press, Oxford University Press, Oxford, England.

    ———. 1991. DNA polymerase fidelity and the polymerase chain reaction. PCR Methods Appl. 1:17–24.

    Flaman, J.-M., T. Frebourg, V. Moreau, F. Charbonnier, C. Martin, C. Ishioka, S. H. Friend, and R. Iggo. 1994. A rapid PCR fidelity assay. Nucleic Acids Res. 22:3259–3260.

    Gocke, C. D., F. A. Benko, and P. K. Rogan. 1998. Transmission of mitochondrial DNA heteroplasmy in normal pedigrees. Hum. Genet. 102:182–186.

    Greenwood, A. D., C. Capelli, G. Possnert, and S. Pääbo. 1999. Nuclear DNA sequences from late Pleistocene megafauna. Mol. Biol. Evol. 16:1466–1473.

    Handt, O., M. Höss, M. Krings, and S. Pääbo. 1994a. Ancient DNA: methodological challenges. Experientia 50:524–529.

    Handt, O., M. Krings, R. H. Ward, and S. Pääbo. 1996. The retrieval of ancient human DNA sequences. Am. J. Hum. Genet. 59:368–376.

    Handt, O., M. Richards, M. Trommsdorff et al. (13 co-authors). 1994b. Molecular genetic analyses of the Tyrolean ice man. Science 264:1775–1778.

    Höss, M., P. Jaruga, T. H. Zastawny, M. Dizdaroglu, and S. Pääbo. 1996. DNA damage and DNA sequence retrieval from ancient tissues. Nucleic Acids Res. 24:1304–1307.

    Krings, M., A. Stone, R. W. Schmitz, H. Krainitzki, M. Stoneking, and S. Pääbo. 1997. Neandertal DNA sequences and the origin of modern humans. Cell 90:19–30.

    Lindahl, T. 1993a. Recovery of antediluvian DNA. Nature 365:700.

    ———. 1993b. Instability and decay of the primary structure of DNA. Nature 362:709–715.

    ———. 1997. Facts and artifacts of ancient DNA. Cell 90:1–3.

    ———. 2000. Fossil DNA. Curr. Biol. 10:616.

    Pääbo, S. 1990. Amplifying ancient DNA. Pp. 159–166 in M. A. Innis, D. H. Gelfand, J. J. Sninsky, and T. J. White, eds. PCR protocols: a guide to methods and applications. Academic Press, San Diego.

    Pääbo, S., R. G. Higuchi, and A. C. Wilson. 1989. Ancient DNA and the polymerase chain reaction. J. Biol. Chem. 264:9709–9712.

    Pääbo, S., and A. C. Wilson. 1991. Miocene DNA sequence—a dream come true? Curr. Biol. 1:45–46.

    Saiki, R. K., D. H. Gelfand, S. Stoffel, S. J. Scharf, R. Higuchi, G. T. Horn, K. B. Mullis, and H. A. Erlich. 1988. Primer-directed enzymatic amplification of DNA with thermostable DNA polymerase. Science 239:487–491.

    Tindall, K. R., and T. A. Kunkel. 1988. Fidelity of DNA synthesis by the Thermus aquaticus DNA polymerase. Biochemistry 27:6008–6013.

    Willerslev, E., A. J. Hansen, B. Christensen, J. P. Steffensen, and P. Arctander. 1999. Diversity of Holocene life forms in fossil glacier ice. Proc. Natl. Acad. Sci. USA 96:8017–8021.

Accepted for publication October 9, 2000.