Department of Cell and Molecular Biology, Northwestern University Medical School, Chicago;
Division of Mammals, Field Museum of Natural History, Chicago
Abstract |
---|
Introduction |
---|
Several criteria have been proposed for discriminating between mtDNA and numt sequences (see Zhang and Hewitt 1996; Bensasson et al. 2001). Those applicable to rRNA genes involve either interpretation of the phylogenetic topology, the inferred secondary structure of the transcribed RNA molecule, or a combination of the two. (We assume sequence ambiguities on chromatograms and multiple bands on autoradiograms are automatically considered suspect and therefore limit our discussion here to clean sequences.) Phylogenetic indicators include "unusual or contradictory" (Zhang and Hewitt 1996, p. 250) or "aberrant" (Arctander 1995, p. 18) results. However, the absence of prior knowledge of phylogenetic relationships renders this criterion difficult to apply in some situations (Arctander 1995), particularly as the objective of many phylogenetic studies is the inference of the phylogenetic position of a given taxon (or taxa) in the first place. Significantly shorter branches may also indicate nuclear introgressions under the assumption that numts evolve more slowly than their mtDNA paralogs (Sorenson and Fleischer 1996; Sorenson and Quinn 1998). These same authors point out, however, that not only would recently introgressed copies fail detection by this criterion, but that the absence of selective constraints would eventually allow numts to accumulate substitutions at selectively conserved mtDNA sites such that branch lengths leading to older numts may be longer than those leading to their mtDNA paralogs (e.g., see Lopez et al. 1994, 1997).
A second widely advocated method for discriminating rDNA numts involves aligning sequences to secondary structure models to determine whether structurally disruptive mutations or changes at phylogenetically conserved positions have occurred (Hickson et al. 1996; Sorenson and Fleischer 1996; Houde et al. 1997; Sorenson and Quinn 1998). Unfortunately, no explicit criteria have been proposed along these lines, leaving the underlying question of what constitutes a permissible change with regard to secondary structure or conserved sites unanswered. Secondary structure itself evolves, and novel structural features involving wholesale losses of otherwise highly conserved elements have been reported (e.g., Janke et al. 1994), as has the acquisition of novel stem structures (Olson 1999). Also, as Sorenson and Fleischer (1996) noted, numts will not necessarily be identifiable by structurally incompatible changes or substitutions in conserved sequence motifs owing, again, to their presumably slower substitution rates (see also Hickson et al. 1996).
Curiously, despite the apparent popularity of the secondary structure method of detection, few studies have used it to identify a suspected rRNA numt. Noor and Larkin (2000) employed structural criteria in their examination of suspiciously variable Drosophila 12S rRNA sequences reported by previous authors. These authors used three separate tests, two of which compared the number of changes in conserved regions (conserved sites and sites in stems) with those in nonconserved regions within and between several species. A third test compared minimal free energy values among taxa for a portion of the 12S rRNA molecule folded according to a structural model. Results of each of these tests suggested either errors in the sequences in question or the unintentional sequencing of one or more numts, although additional sequencing under similar conditions by Noor and Larkin (2000) failed to support a pseudogene explanation.
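The first two tests described above ask, in essence, whether changes fall disproportionately in conserved regions. The following is a minimal sketch of one way such a comparison could be framed, assuming a one-sided Fisher's exact test on a 2 x 2 table of changed versus unchanged sites. The choice of test, the function name, and the counts are illustrative assumptions and are not taken from Noor and Larkin (2000).

```python
from math import comb


def fisher_one_sided(changed_cons, unchanged_cons, changed_non, unchanged_non):
    """Probability of seeing at least `changed_cons` changes at conserved
    sites if changes were distributed without regard to site class
    (hypergeometric null; hypothetical helper for illustration)."""
    conserved = changed_cons + unchanged_cons       # conserved sites surveyed
    nonconserved = changed_non + unchanged_non      # nonconserved sites surveyed
    changed = changed_cons + changed_non            # total changed sites
    total = conserved + nonconserved
    denom = comb(total, changed)
    upper = min(conserved, changed)
    # Sum the upper tail of the hypergeometric distribution.
    tail = sum(
        comb(conserved, k) * comb(nonconserved, changed - k)
        for k in range(changed_cons, upper + 1)
    )
    return tail / denom


# Invented counts: 6 of 60 conserved sites changed vs. 5 of 200 nonconserved.
p_value = fisher_one_sided(6, 54, 5, 195)
print(f"one-sided p = {p_value:.4f}")
```

A small p-value under this framing would indicate an excess of changes at conserved sites, which is the pattern that would raise suspicion of a pseudogene or of sequencing error.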
We are unaware of any study that has critically examined the effectiveness of using secondary structure to identify known numts. For example, none of the studies reporting nuclear insertions of mitochondrial rRNA or tRNA genes in table 1 of Zhang and Hewitt (1996) compared their resulting sequences with secondary structure models to ascertain the extent to which predicted structure was affected. Thus, the oft-cited utility of secondary structure models in discriminating rDNA numts remains largely speculative. We explore this approach using the 12S rRNA gene from the human mitochondrial genome and the recently completed Human Genome Project. Given the known genomic origin and status of these sequences in humans (functional mitochondrial gene or nuclear pseudogene), the availability of mtDNA sequences for the same locus from several additional primates, a well-supported phylogenetic hypothesis for the chosen taxa, and rigorously tested secondary structure models for the rRNA subunit of interest, we are afforded an unprecedented opportunity for examining this issue in detail. We consider a hypothetical scenario in which a species of uncertain phylogenetic affinity (Homo sapiens, in this case) is surveyed for a mitochondrial gene (12S) whose status (nuclear paralog vs. functional mitochondrial gene) we wish to determine using secondary structure. We consider two structural models and a number of numts in an attempt to avoid erroneous conclusions based on the potential idiosyncrasies of any single pseudogene or structural hypothesis.
Methods |
---|
Parsimony analyses were performed using PAUP* 4.0b8 (Swofford 2001). The branch and bound search algorithm was employed with all characters equally weighted. Trees were rooted using Lemur and Tarsius as outgroups. Bootstrap support was calculated using 100 branch and bound search replicates. Tree searches were conducted for the alignment based on the SD96 model only, as the H96 model is limited to the third domain of the 12S molecule. Ambiguously aligned regions were excluded from parsimony searches. Separate analyses were conducted for each human sequence (three nuclear and one mitochondrial), and their respective phylogenetic positions were compared with the expected phylogeny of hominoids (Ruvolo 1997; Goodman et al. 1998). A fifth analysis in which all four human sequences were included was also carried out in an attempt to determine the relative age of each introgression event as well as to explore the possibility of gene duplication with respect to the pseudogenes sampled, i.e., whether one of the pseudogenes considered gave rise to one or both of the remaining numts.
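For readers unfamiliar with the bootstrap step, the resampling that underlies each replicate can be sketched as follows. This is only an illustration of column resampling with replacement, not a reproduction of the PAUP* analysis: the toy alignment, the function name, and the random seed are assumptions, and the tree search that would be run on each pseudoreplicate is omitted.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one pseudoreplicate: alignment columns drawn with replacement.

    `alignment` maps taxon name -> aligned sequence (all of equal length).
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    # Draw as many column indices as there are columns, with replacement.
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][c] for c in cols) for name in names}


# Toy alignment (invented sequences, for illustration only).
toy = {"Homo": "ACGTAC", "Pan": "ACGTTC", "Gorilla": "ACGAAC", "Lemur": "TCGAAG"}
rng = random.Random(0)  # fixed seed so the sketch is repeatable
replicates = [bootstrap_alignment(toy, rng) for _ in range(100)]
```

In a full analysis, a tree search would be run on each of the 100 pseudoreplicates, and the support for a node would be the percentage of replicate trees in which that node appears.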
Results and Discussion |
---|
All three numts were readily alignable to both secondary structure models (fig. 1; H96 model and an expanded version of fig. 1 are available online as supplementary material). The only indels larger than one nucleotide occurring within a putative stem structure in any pseudogene are both found in numt3. These include a 4-base deletion encompassing portions of either stem 1 or stem 3 (see fig. 1 for alternative placements of this indel), which would result in the loss of three pair bonds in either case. No comparable indels were observed in any mitochondrial sequence.
Of the 56 conserved positions in the H96 model for Domain III, only three were found for which one or more sequences possess an alternative base (fig. 1). Two of these involve mutations to C's in positions otherwise conserved for T's in stem 31; one occurs in numt1 and one in Lemur. The mutation in numt1 occurs in the fourth position of stem 31 and is not compensated for in the corresponding nucleotide in 31', indicating the loss of a TA pair bond in this stem. Noncanonical CA bonds are generally discounted, but they have nonetheless been suggested for this stem in other mammals (Springer and Douzery 1996) and may be relatively common among vertebrates in Domain III (Hickson et al. 1996). In addition, other mammals possess a C at this position (Springer and Douzery 1996; personal observation). Similarly, the C in what is otherwise the last pairing position of stem 31 in the SD96 model observed in Lemur is uncompensated at the corresponding position in 31', resulting in the loss of a TA bond. The third site found to possess a nonconserved base in any sequence occurs in the nonpairing region connecting stems 36 and 27. Normally constrained to a purine, a cytosine is observed in Papio, Nasalis, and numt1. With respect to the highly conserved positions proposed by Hickson et al. (1996), then, only one of the three pseudogenes (numt1) exhibits any deviation, and for every aberrant conserved site in numt1, one or more of the mitochondrial sequences are found to be similarly variable, either at the same position or within the same stem.
The total number of disruptive mutations for each sequence is given in table 2. Not surprisingly, a much greater range is encountered when the entire molecule is considered (SD96 only), in which case numt1 exhibits more than twice as many disruptive mutations as any other sequence. Values for numts 2 and 3 do not fall outside the range observed in other primate (mtDNA) sequences. When only Domain III is considered, no numt exceeds all other primate sequences in the number of disruptive mutations observed.
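The idea of an uncompensated change in a stem, such as the T-to-C substitution described above for stem 31, can be made concrete with a short sketch. It assumes that a pair is intact when the two positions form a Watson-Crick pair or a G-T wobble pair (written in the DNA alphabet) and counts the stem pairs that fail this test. The pairing rule, the function name, and the toy sequences are assumptions for illustration; they do not reproduce the counting scheme behind table 2.

```python
# Pairs treated as intact: Watson-Crick plus G-T wobble (DNA alphabet).
# This rule is an assumption; noncanonical C-A bonds are counted as lost.
INTACT = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G"), ("G", "T"), ("T", "G")}


def count_lost_pairs(seq, stem_pairs):
    """Count stem pairs (0-based index pairs) that cannot form an intact bond."""
    return sum((seq[i], seq[j]) not in INTACT for i, j in stem_pairs)


# Toy hairpin: a 4-bp stem closing a 4-base loop (invented sequence).
pairs = [(0, 11), (1, 10), (2, 9), (3, 8)]
reference = "GTCTAAAAAGAC"    # all four pairs intact
uncompensated = "GTCCAAAAAGAC"  # T->C at position 3; partner A unchanged
compensated = "GTCCAAAAGGAC"    # T->C at position 3; partner A->G restores pairing

for label, seq in [("reference", reference),
                   ("uncompensated", uncompensated),
                   ("compensated", compensated)]:
    print(label, count_lost_pairs(seq, pairs))
```

Tallying such losses over every stem in a structural model gives one simple per-sequence count that can be compared against the range seen in undoubted mitochondrial sequences.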
Results of parsimony analyses are shown in fig. 2. When only mitochondrial sequences are analyzed (fig. 2A), a single most-parsimonious topology is recovered. Although the expected human-chimp sister relationship is not obtained, bootstrap support for the conflicting gorilla-chimp clade is weak; all other nodes receive 80% bootstrap support and conform to the well-corroborated phylogenetic hypothesis relating these taxa. Analyses in which individual numts were substituted for the human mitochondrial sequence are shown in fig. 2B-D, and the results of an analysis with all sequences included are summarized in fig. 2E. The well-supported phylogenetic position of numt1 (fig. 2B and E) suggests that it introgressed from the mitochondrial genome prior to the hominoid radiation. A 4-base deletion occurring in all hominoid sequences (including all numts) in the nonpairing region (in both models) between stems 30 and 31 serves to establish a maximum age of origin for numt1. A survey of 44 additional primate 12S sequences in GenBank (see Appendix) suggests this 4-base deletion is unique to catarrhines, and that the introgression of numt1 therefore likely postdated the platyrrhine-catarrhine divergence. Collectively, this evidence indicates that numt1 originated between 18 and 25 MYA. Numts 2 and 3, on the other hand, appear to be of much more recent origin, apparently having diverged from the human mitochondrial genome sometime after the divergence of orangutans from the remaining hominids (fig. 2C-E). The poorly resolved phylogenetic position of numts 2 and 3 relative to the Gorilla, Pan, and Homo mitochondrial sequences limits inferences of the relative timing of these introgressions. The possibility that either numt2 or numt3 gave rise to the other cannot be discounted based on our analyses, given the recovery of this arrangement in two of the nine most-parsimonious trees.
In our exploration of the inferred structural characteristics of three 12S rRNA pseudogenes from the human genome, we found only two structurally disruptive indel events in stem regions, both of which, ironically, occur in what is likely to be the most recently introgressed copy (numt3). The more disruptive of the two, suggesting the loss of three pair bonds in one or possibly two stems, is well outside the region amplified by the popular Kocher et al. (1989) primers and is hence not included in the H96 model; the same is true for the inordinate loss of pair bonds in stem 40 of numt1. Neither of the other numts exhibits any immediately suspicious indels. A simple count of disruptive mutations would suggest numt1 (but not numts 2 or 3) to be an outlier, but only when the entire gene is assayed; none of the numts appears inordinately modified when Domain III alone is considered. This suggests that studies employing only partial 12S sequences, which may be more prone to numt contamination to begin with (e.g., van der Kuyl et al. 1995), may be even further compromised if secondary structure alone is used to check for numt amplification.
Inspection of conserved positions in the H96 model similarly fails to elicit alarm. Two of the three numts (not surprisingly, the two youngest) are invariant at all of these sites. Numt1 varies at only two positions, but so do other mammalian mitochondrial sequences at both positions for the same nucleotide. Broadening the taxonomic sample to include other primates would likely assuage suspicion. All 25 cercopithecid (Old World monkeys) and 16 out of 17 platyrrhine (New World monkeys) 12S sequences from GenBank listed in the Appendix possess either a C or a T at this position (rather than the A or G proposed in the H96 model), suggesting that the selective constraints that have maintained a purine at this position in other animal taxa are not acting similarly in primates.
Thus, with the possible exception of numt3, a thorough visual inspection of these sequences against a structural model would likely fail to identify the pseudogenes. Numt1, by far the oldest copy, appears questionable when all disruptive mutations are counted along the entire molecule, but not when shorter pieces are considered. Comparisons of predicted stability for Domain III are perhaps more sophisticated, but their results are equally equivocal and cast as much suspicion on some mitochondrial sequences (e.g., Lemur) as on some pseudogenes, whereas one pseudogene surpasses all other sequences in estimated thermal stability. As would be predicted based on its age, numt1 is the least stable of the human sequences, falling outside the range of free energies observed in other primates in the SD96 model but not under the H96 model. This potential for model-dependent conclusions has implications beyond our study, particularly for studies investigating the evolution of rRNA. We suggest that either alternative models of secondary structure need to be considered or that such models be fine-tuned for the taxa of interest. With respect to our hypothetical scenario, we believe that, in the absence of additional evidence, numts 1, 2, and possibly 3 would be accepted as mitochondrial if inadvertently sequenced and subjected to a cursory check against secondary structure. This is particularly sobering in the case of numt1, given its well-supported phylogenetic placement, which differs substantially from that of humans and could lead to myriad erroneous conclusions if accepted as a mitochondrial sequence. Whereas a more exhaustive investigation of these sequences might very well identify a pseudogene signature for these numts, casual inspection of secondary structure can clearly be insufficient for discriminating both recently introgressed and relatively ancient mitochondrial rRNA pseudogenes.
Reference to secondary structure is often claimed to be or advocated (e.g., Houde et al. 1997) as a means of confirming the cytoplasmic origin of sequenced mitochondrial rRNA genes. The expectation that rRNA pseudogenes will eventually accumulate structurally disruptive mutations in the form of excessive point substitutions or indel events or both is certainly a reasonable one, and reference to secondary structure is likely a relatively rapid method of numt identification if such mutations have occurred. Criteria for accepting a given rDNA sequence as mitochondrial based on structural characteristics alone are often vague, at best, and probably for a good reason: namely, uncertainty as to what constitutes structurally and selectively tenable mutations. Although we strongly advocate the use of secondary structure for inferring positional homology (see Kjer 1995) and studying character evolution (regardless of the dizzying tedium involved), we caution against relying on concordance to secondary structure as a means of identifying pseudogenes, as this may engender false confidence. Rather, we suggest that researchers who suspect such numts employ more direct methods such as cloning or single-stranded conformation polymorphism (Bensasson et al. 2001).
Acknowledgements |
---|
Appendix |
---|
Tarsius syrichta (AF069976), Callithrix pygmaea (AF069983), Callithrix jacchus (AF069982), Callimico goeldii (AF069981), Saguinus geoffroyi (F069972), Saguinus oedipus (AF069973), Leontopithecus rosalia (AF069969), Cebus apella (AF069965), Saimiri sciureus (AF069974), Aotus trivirgatus (AF069977), Alouatta palliata (AF069964), Alouatta seniculus (AF069975), Ateles sp. (AF069978), Brachyteles arachnoides (AF069979), Callicebus moloch (AF069980), Chiropotes satanas (AF069966), Lagothrix lagotricha (AF069968), Pithecia pithecia (AF069971), Cercocebus aterrimus (L35192), Cercocebus torquatus (L35204), Cercopithecus aethiops (L35185, L35187, L35189, L35190, L35194, L35207), Cercopithecus ascanius (L35202), Cercopithecus cephus (L35191), Cercopithecus diana (L35193), Cercopithecus galeritus (L35208), Cercopithecus mitis (L35197), Cercopithecus mona (L35198), Cercopithecus neglectus (L35182), Cercopithecus nictitans (L35199), Cercopithecus patas (L35186), Colobus guereza (L35195), Macaca mulatta (L35203), Macaca sylvanus (L35188), Mandrillus sphinx (L35196), Miopithecus talapoin (L35205), Papio cynocephalus (L35184), Papio ursinus (L35206), Presbytis cristatus (L35200), and Gorilla gorilla (L35209).
Footnotes |
---|
1 Present address: Department of Ecology and Evolutionary Biology, Yale University, New Haven, Connecticut
Keywords: pseudogene, numt, 12S rRNA
Address for correspondence and reprints: Link E. Olson, Department of Cell and Molecular Biology, Northwestern University Medical School, 303 E. Chicago Avenue, Chicago, Illinois 60611. lolson{at}fmnh.org.
References |
---|
Altschul S. F., T. L. Madden, A. A. Schäffer, J. Zhang, Z. Zhang, W. Miller, D. J. Lipman, 1997 Gapped BLAST and PSI-BLAST: a new generation of protein database search programs Nucleic Acids Res 25:3389-3402
Arctander P., 1995 Comparison of a mitochondrial gene and a corresponding nuclear pseudogene Proc. R. Soc. Lond 262:13-19
Bensasson D., D.-X. Zhang, D. L. Hartl, G. M. Hewitt, 2001 Mitochondrial pseudogenes: evolution's misplaced witnesses Trends Ecol. Evol 16:314-321
Collura R. V., C.-B. Stewart, 1995 Insertions and duplications of mtDNA in the nuclear genomes of Old World monkeys and hominoids Nature 378:485-492
Fukuda M., S. Wakasugi, T. Tsuzuki, H. Nomiyama, K. Shimada, T. Miyata, 1985 Mitochondrial DNA-like sequences in the human nuclear genome: characterization and implications in the evolution of mitochondrial DNA J. Mol. Biol 186:257-266
Goodman M., C. A. Porter, J. Czelusniak, S. L. Page, H. Schneider, J. Shoshani, G. Gunnell, C. P. Groves, 1998 Toward a phylogenetic classification of primates based on DNA evidence complemented by fossil evidence Mol. Phylogenet. Evol 9:585-598
Groves C. P., 1993 Order Primates Pp. 243-277 in D. E. Wilson and D. M. Reeder, eds. Mammal species of the world. A taxonomic and geographic reference. 2nd edition. Smithsonian Institution Press, Washington
Hickson R. E., C. Simon, A. Cooper, G. S. Spicer, J. Sullivan, D. Penny, 1996 Conserved sequence motifs, alignment, and secondary structure for the third domain of animal 12S rRNA Mol. Biol. Evol 13:150-169
Houde P., A. Cooper, E. Leslie, A. E. Strand, G. A. Montaño, 1997 Phylogeny and evolution of 12S rDNA in Gruiformes (Aves) Pp. 121-158 in D. P. Mindell, ed. Avian molecular evolution and systematics. Academic Press, San Diego, Calif
Janke A., G. Feldmaier-Fuchs, W. K. Thomas, A. von Haeseler, S. Pääbo, 1994 The marsupial mitochondrial genome and the evolution of placental mammals Genetics 137:243-256
Kjer K. M., 1995 Use of rRNA secondary structure in phylogenetic studies to identify homologous positions: an example of alignment and data presentation from the frogs Mol. Phylogenet. Evol 4:314-330
Kocher T. D., W. K. Thomas, A. Meyer, S. V. Edwards, S. Pääbo, F. X. Villablanca, A. C. Wilson, 1989 Dynamics of mitochondrial DNA evolution in animals: amplification and sequencing with conserved primers Proc. Natl. Acad. Sci. USA 86:6196-6200
Lopez J. V., M. Culver, J. C. Stephens, W. E. Johnson, S. J. O'Brien, 1997 Rates of nuclear and cytoplasmic mitochondrial DNA sequence divergence in mammals Mol. Biol. Evol 14:277-286
Lopez J. V., N. Yuhki, R. Masuda, W. Modi, S. J. O'Brien, 1994 Numt, a recent transfer and tandem amplification of mitochondrial DNA to the nuclear genome of the domestic cat J. Mol. Evol 39:174-190
Mathews D. H., J. Sabina, M. Zuker, D. H. Turner, 1999 Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure J. Mol. Biol 288:911-940
Noor M. A., J. C. Larkin, 2000 A re-evaluation of 12S ribosomal RNA variability in Drosophila pseudoobscura Mol. Biol. Evol 17:938-941
Olson L. E., 1999 Systematics, evolution, and biogeography of Madagascar's tenrecs (Mammalia: Tenrecidae) Doctoral dissertation, University of Chicago, Chicago, Ill
Ruvolo M., 1997 Genetic diversity in hominoid primates Annu. Rev. Anthropol 26:515-540
Sorenson M. D., R. C. Fleischer, 1996 Multiple independent transpositions of mitochondrial DNA control region sequences to the nucleus Proc. Natl. Acad. Sci. USA 93:15239-15243
Sorenson M. D., T. W. Quinn, 1998 Numts: a challenge for avian systematics and population biology Auk 115:214-221
Springer M. S., E. Douzery, 1996 Secondary structure and patterns of evolution among mammalian mitochondrial 12S rRNA molecules J. Mol. Evol 43:357-373
Swofford D. L., 2001 PAUP* Phylogenetic analysis using parsimony (*and other methods). Sinauer Associates, Sunderland, Mass
Tajima F., 1993 Simple methods for testing the molecular evolutionary clock hypothesis Genetics 135:599-607
Van de Peer Y., P. De Rijk, J. Wuyts, T. Winkelmans, R. De Wachter, 2000 The European small subunit ribosomal RNA database Nucleic Acids Res 28:175-176
van der Kuyl A. C., C. L. Kuiken, J. T. Dekker, W. R. K. Perizonius, J. Goudsmit, 1995 Nuclear counterparts of the cytoplasmic mitochondrial 12S rRNA gene: a problem of ancient DNA and molecular phylogenies J. Mol. Evol 40:652-657
Zhang D.-X., G. M. Hewitt, 1996 Nuclear integrations: challenges for mitochondrial DNA markers Trends Ecol. Evol 11:247-251
Zischler H., H. Geisert, A. von Haeseler, S. Pääbo, 1995 A nuclear fossil of the mitochondrial d-loop and the origin of modern humans Nature 378:489-492
Zuker M., D. H. Mathews, D. H. Turner, 1999 Algorithms and thermodynamics for RNA secondary structure prediction: a practical guide Pp. 11-43 in J. Barciszewski and B. F. C. Clark, eds. RNA biochemistry and biotechnology. Kluwer Academic, Boston, Mass