The Human Genome Project Reveals a Continuous Transfer of Large Mitochondrial Fragments to the Nucleus

Tobias Mourier, Anders J. Hansen, Eske Willerslev and Peter Arctander

Department of Evolutionary Biology, Zoological Institute, University of Copenhagen, Copenhagen, Denmark

Mitochondrial genomes are believed to gradually transfer DNA fragments (numts) into the nuclear chromosomes of eukaryotic cells during evolution (reviewed in Zhang and Hewitt 1996). This assumption relies on hybridization studies of mitochondrial DNA sequences (mtDNA) (Tsuzuki et al. 1983), sequencing of numts (e.g., Lopez et al. 1994; Arctander 1995; Zischler et al. 1995; Herrnstadt et al. 1999), and similarity searches in sequence databases (Blanchard and Schmidt 1996; Bensasson et al. 2001). Here we present the first extensive analysis of numts in the human nuclear genome. Through a combination of conventional BLAST alignment (Altschul et al. 1997) and a DNA block aligning (DBA) algorithm (Jareborg, Birney, and Durbin 1999), we searched roughly 93.5% of the human genome (http://www.ncbi.nlm.nih.gov/genome/seq/) for numts. This approach revealed three notable findings. First, several numts exceed the size of the longest human numt reported to date (Herrnstadt et al. 1999). Second, all parts of the mitochondrial DNA are represented in the nuclear genome. Finally, the integration of mtDNAs into the nucleus is a continuous evolutionary process, thereby verifying previous beliefs (Zhang and Hewitt 1996; Wallace et al. 1997; Herrnstadt et al. 1999).

Through the web service provided by NCBI (http://www.ncbi.nlm.nih.gov/), we compared the complete human mitochondrial DNA and the working draft of the human nuclear genome (as of mid-April 2001) using BLAST. This procedure was followed by alignment using the DBA algorithm (Jareborg, Birney, and Durbin 1999), which finds collinear blocks of conserved sequence while allowing for indels between blocks. The rationale for this twofold alignment procedure stems from the assumption that two mechanisms may obscure the BLAST alignment. First, the extant mtDNA will have diverged from the ancestral sequence. Second, as the numts are presumably released from selection, larger deletions and insertions may take place.

Hits from the BLAST search (default settings) in the same sense and within the vicinity (4–6,128 bp) of each other were considered to potentially stem from a single insertion event. If such a group of hits involved more than 100 identical positions, the genomic sequence covering all the hits and their intervening sequences was retrieved. This sequence was aligned to the corresponding mtDNA sequence using the DBA algorithm. The sequences were considered the result of a single insertion event if the DBA algorithm was able to align more than 80% of the mtDNA sequence in a collinear way.
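The grouping criteria above can be sketched in code. The following Python sketch is illustrative only and is not the authors' implementation: the `Hit` record, the function names, and the use of 6,128 bp (the upper end of the reported spacing range) as a maximum gap are assumptions made for the example. The DBA alignment itself is not reproduced; only the more-than-80% acceptance test is shown.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Hit:
    """One BLAST hit of mtDNA against a nuclear contig (hypothetical record)."""
    contig: str
    strand: str      # "+" or "-": the sense of the hit
    nuc_start: int   # 1-based, inclusive nuclear coordinates
    nuc_end: int
    identities: int  # number of identical positions in the hit

MAX_GAP = 6128               # assumed threshold: upper end of the reported 4-6,128 bp range
MIN_IDENTITIES = 100         # a group must involve more than this many identical positions
MIN_ALIGNED_FRACTION = 0.80  # DBA must align more than this fraction of the mtDNA sequence

def group_hits(hits: List[Hit], max_gap: int = MAX_GAP) -> List[List[Hit]]:
    """Chain hits on the same contig and strand that lie within max_gap of each other."""
    groups: List[List[Hit]] = []
    for hit in sorted(hits, key=lambda h: (h.contig, h.strand, h.nuc_start)):
        if groups:
            prev = groups[-1]
            same_sense = prev[0].contig == hit.contig and prev[0].strand == hit.strand
            group_end = max(h.nuc_end for h in prev)
            if same_sense and hit.nuc_start - group_end <= max_gap:
                prev.append(hit)
                continue
        groups.append([hit])
    return groups

def candidate_regions(hits: List[Hit]) -> List[Tuple[str, str, int, int]]:
    """Return (contig, strand, start, end) for groups with more than 100 identities."""
    regions = []
    for group in group_hits(hits):
        if sum(h.identities for h in group) > MIN_IDENTITIES:
            start = min(h.nuc_start for h in group)
            end = max(h.nuc_end for h in group)
            regions.append((group[0].contig, group[0].strand, start, end))
    return regions

def is_single_insertion(aligned_mt_bp: int, mt_span_bp: int) -> bool:
    """Accept a region if more than 80% of the mtDNA sequence was aligned collinearly."""
    return aligned_mt_bp / mt_span_bp > MIN_ALIGNED_FRACTION
```

Each region returned by `candidate_regions` spans the hits and their intervening sequence, which is the stretch that would then be passed to a block aligner together with the corresponding mtDNA sequence.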

Following the above criteria, we found 296 numts ranging between 106 and 14,654 bp in size (table 1). Fifteen of these were found to be longer than 5,842 bp, previously reported by Herrnstadt et al. (1999) as the length of the longest human numt.


Table 1 The 60 Longest Human Numts

 
Furthermore, we found that all positions of the mitochondrial genome are represented in the nuclear DNA, with the domain comprising the control region being relatively underrepresented (fig. 1). As this could be an artifact caused by the distal position of the control region in the linear mtDNA sequence, we constructed an alternative representation in which the control region was central. Neither this nor the removal of the low-complexity filter of BLAST produced additional hits to this region (not shown). The deficiency of numts from the control region probably results from the significantly higher evolutionary rate of extant mtDNA in this region (Saccone, Pesole, and Sbisá 1991). This hypothesis is further supported by the increased number of numts in the region comprising the central conserved domain (fig. 1).
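The per-position representation and the recentered sequence described above can be illustrated with a short Python sketch. It is a sketch under stated assumptions rather than the procedure actually used: the function names are hypothetical, intervals are assumed to be 1-based and inclusive, and an interval whose start exceeds its end is assumed to wrap around the origin of the circular genome.

```python
from typing import List, Tuple

def coverage_depth(intervals: List[Tuple[int, int]], genome_len: int) -> List[int]:
    """Count how many numts cover each position of a circular mtDNA genome.

    Intervals are 1-based and inclusive. If start > end, the interval is
    treated as wrapping around the origin (an assumption of this sketch).
    """
    diff = [0] * (genome_len + 1)  # difference array: +1 at start, -1 after end
    for start, end in intervals:
        if start <= end:
            segments = [(start, end)]
        else:
            segments = [(start, genome_len), (1, end)]
        for s, e in segments:
            diff[s - 1] += 1
            diff[e] -= 1
    depth, running = [], 0
    for i in range(genome_len):
        running += diff[i]
        depth.append(running)
    return depth

def rotate_sequence(seq: str, new_start: int) -> str:
    """Re-linearize a circular sequence so that it begins at 1-based position new_start."""
    return seq[new_start - 1:] + seq[:new_start - 1]

# Toy example on a 10-bp circular genome.
print(coverage_depth([(2, 5), (4, 8), (9, 2)], 10))
print(rotate_sequence("ABCDEFGH", 4))
```

Rotating the linear sequence moves a region that sits at the ends of the original representation into the interior, so that hits spanning the original origin are no longer split.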



Fig. 1. Circular diagram of the number of numts descending from a given position in the mitochondria (thick line). The inner hatched circle depicts the mitochondria, with the two hypervariable segments of the control region (encompassing the central conserved domain) highlighted (black).

 
Interestingly, we found four numts covering the complete control region (table 1), indicating that at least these numts are the result of a DNA-based transfer (for a discussion see Shay and Werbin [1992] and references therein).

To estimate the time of insertion of the numts, we collected all numt-mitochondria alignments longer than 2,000 bp (i.e., either complete numts, if they were completely alignable, or subsets of numts of which DBA blocks exceeded 2,000 bp) and aligned these with the corresponding mtDNA sequences from a variety of mammals. The phylogenetic analysis supported the general conviction that numt DNAs are continually integrated into the nuclear genome as a result of several independent evolutionary events (fig. 2).
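The caption of fig. 2 describes how each numt is confined to one or more branching points (A–F) across 100 bootstrap replicates. A minimal Python sketch of that bookkeeping follows. It rests on assumptions that are not stated in the text: the branching points are taken to be ordered A to F along a single lineage, the per-replicate placements are assumed to have been extracted from the bootstrap trees beforehand, and the function name `confine` is hypothetical.

```python
from typing import List

BRANCH_POINTS = "ABCDEF"  # assumed to be ordered along one lineage

def confine(placements: List[str]) -> str:
    """Return the narrowest range of branching points that contains every
    bootstrap placement, i.e. the range with 100% support.

    placements holds one branching-point label per bootstrap replicate.
    """
    if not placements:
        raise ValueError("at least one bootstrap placement is required")
    indices = sorted({BRANCH_POINTS.index(label) for label in placements})
    first, last = BRANCH_POINTS[indices[0]], BRANCH_POINTS[indices[-1]]
    return first if first == last else f"{first}-{last}"

# A numt placed at A in every replicate is confined to A; one that
# alternates between A and B can only be confined to the range A-B.
print(confine(["A"] * 100))
print(confine(["A"] * 60 + ["B"] * 40))
```

Under this reading, a numt whose placements span every branching point would be reported as A-F, matching the box of numts whose exact position is undetermined.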



Fig. 2. Consensus tree of the phylogenetic positions of human numts, based on 35 individual bootstrap analyses of all blocks from the DBA alignment longer than 2,000 bp. The trees were constructed in PAUP*, version 4.0b4a (Swofford 1998), using the neighbor-joining algorithm based on maximum-likelihood (ML) distance measures. The shape parameters of the gamma distributions, α (0.28–0.43), and the transition-transversion rates (1.6–2.9) were estimated using ML. Six branching points are depicted on the tree (A–F). On the basis of 100% support with 100 bootstrap replicates and platypus as the outgroup, numts could be confined to one or more of the branching points, as shown below the tree. For example, numts listed in the gray box (A) are positioned at branching point A with 100% bootstrap support, whereas numts listed in the box (A–B) can, with the same support, only be confined to either branching point A or branching point B. Numts in the box covering all positions (A–F) are restricted to the primate clade, but their exact position is undetermined. If two or more alignment blocks come from the same numt, these have letter suffixes (see table 1 for details). The following mtDNA sequences were used (GenBank accession numbers in parentheses): human (Homo sapiens; NC_001807), chimpanzee (Pan troglodytes; NC_001643), gorilla (Gorilla gorilla; NC_001645), orangutan (Pongo pygmaeus; NC_001646), gibbon (Hylobates lar; NC_002082), baboon (Papio hamadryas; NC_001992), wallaroo (Macropus robustus; NC_001794), opossum (Didelphis virginiana; NC_001610), and platypus (Ornithorhynchus anatinus; NC_000891). Nonprimate placentals: alpaca (Lama pacos; NC_002504), armadillo (Dasypus novemcinctus; NC_001821), bat (Chalinolobus tuberculatus; NC_002626), cat (Felis catus; NC_001700), cow (Bos taurus; NC_001567), European hedgehog (Erinaceus europaeus; NC_002080), flying fox (Pteropus scapulatus; NC_002619), guinea pig (Cavia porcellus; NC_000884), Madagascar hedgehog (Echinops telfairi; NC_002631), rabbit (Oryctolagus cuniculus; NC_001913), squirrel (Sciurus vulgaris; NC_002369), and tree shrew (Tupaia belangeri; NC_002521).

 
Since we used the working draft of the human nuclear genome for analysis, we cannot exclude that some of the recent integration events are simply due to erroneous sequencing of mitochondrial contamination. However, this will not change the above conclusions. On the contrary, the above findings may be an underestimate, since recently transferred numts may not have reached fixation (e.g., Zischler et al. 1995) and therefore may not be present in the available human genome draft.

This study presents the first large-scale survey of human numts based on the human genome project, an initial step on the way to a complete catalog of human numts.

As previously stated (Perna and Kocher 1996), human numts may serve as both obstacles and tools in understanding the evolution of the human mitochondria. For example, the large number of long numts can confound studies on mitochondrial heteroplasmy as well as phylogenetic and population studies using mtDNA markers. For these studies, decisive knowledge of human numts may be crucial in detecting erroneous results due to false amplification of nuclear homologs.

On the other hand, since numts may be regarded as "molecular fossils" of mtDNA (Zischler, Geisert, and Castresana 1998), they may provide fruitful insight into the evolution of modern human mitochondria and help to uncover the evolutionary basis of contemporary human diseases related to the genetics of the mitochondria.

Supplementary Materials

A table of all 296 human numts is provided on the Molecular Biology and Evolution web site.

Acknowledgements

We thank Douda Bensasson, Kasi B. Desfor, Sylvia Mathiasen, and Seirian Sumner for help and discussions. A.J.H. and E.W. were supported by the VELUX foundation of 1981, Denmark. A.J.H. and E.W. contributed equally to this work and should be regarded as joint authors.

Footnotes

Pekka Pamilo, Reviewing Editor

1 Keywords: mitochondrial DNA, nuclear insertions, human genome

2 Address for correspondence and reprints: Tobias Mourier, Department of Evolutionary Biology, Zoological Institute, University of Copenhagen, Universitetsparken 15, DK-2100 Copenhagen Ø, Denmark. tmourier@zi.ku.dk.

References

    Altschul S. F., T. L. Madden, A. A. Schäffer, J. Zhang, Z. Zhang, W. Miller, D. J. Lipman, 1997 Gapped BLAST and PSI-BLAST: a new generation of protein database search programs Nucleic Acids Res 25:3389-3402

    Arctander P., 1995 Comparison of a mitochondrial gene and a corresponding nuclear pseudogene Proc. R. Soc. Lond. B Biol. Sci 262:13-19

    Bensasson D., D.-X. Zhang, D. Hartl, G. Hewitt, 2001 Mitochondrial pseudogenes: evolution's misplaced witnesses Trends Ecol. Evol 16:314-321

    Blanchard J. L., G. W. Schmidt, 1996 Mitochondrial DNA migration events in yeast and humans: integration by a common end-joining mechanism and alternative perspectives on nucleotide substitution patterns Mol. Biol. Evol 13:537-548

    Herrnstadt C., W. Clevenger, S. S. Ghosh, C. Anderson, E. Fahy, S. Miller, N. Howell, R. E. Davis, 1999 A novel mitochondrial DNA-like sequence in the human nuclear genome Genomics 60:67-77

    Jareborg N., E. Birney, R. Durbin, 1999 Comparative analysis of noncoding regions of 77 orthologous mouse and human gene pairs Genome Res 9:815-824

    Lopez J. V., N. Yuhki, R. Masuda, W. Modi, S. J. O'Brien, 1994 Numt, a recent transfer and tandem amplification of mitochondrial DNA to the nuclear genome of the domestic cat J. Mol. Evol 39:174-190

    Perna N. T., T. D. Kocher, 1996 Mitochondrial DNA: molecular fossils in the nucleus Curr. Biol 6:128-129

    Saccone C., G. Pesole, E. Sbisá, 1991 The main regulatory region of mammalian mitochondrial DNA: structure-function model and evolutionary pattern J. Mol. Evol 33:83-91

    Shay J. W., H. Werbin, 1992 New evidence for the insertion of mitochondrial DNA into the human genome: significance for cancer and aging Mutat. Res 275:227-235

    Swofford D. L., 1998 PAUP*: phylogenetic analysis using parsimony (*and other methods). Version 4 Sinauer, Sunderland, Mass

    Tsuzuki T., H. Nomiyama, C. Setoyama, S. Maeda, K. Shimada, 1983 Presence of mitochondrial-DNA-like sequences in the human nuclear DNA Gene 25:223-229

    Wallace D. C., C. Stugard, D. Murdock, T. Schurr, M. D. Brown, 1997 Ancient mtDNA sequences in the human nuclear genome: a potential source of errors in identifying pathogenic mutations Proc. Natl. Acad. Sci. USA 94:14900-14905

    Zhang D.-X., G. M. Hewitt, 1996 Nuclear integrations: challenges for mitochondrial DNA markers Trends Ecol. Evol 11:247-251

    Zischler H., H. Geisert, A. von Haeseler, S. Pääbo, 1995 A nuclear ‘fossil’ of the mitochondrial D-loop and the origin of modern humans Nature 378:489-492

    Zischler H., H. Geisert, J. Castresana, 1998 A hominoid-specific nuclear insertion of the mitochondrial d-loop: implications for reconstructing ancestral mitochondrial sequences Mol. Biol. Evol 15:463-469

Accepted for publication June 4, 2001.