Consistency of SINE Insertion Topology and Flanking Sequence Tree: Quantifying Relationships Among Cetartiodactyls

J. Koji Lum*, Masato Nikaido†, Mitsuru Shimamura†, Hidetoshi Shimodaira*, Andrew M. Shedlock*†, Norihiro Okada† and Masami Hasegawa*,2

*Institute of Statistical Mathematics, Tokyo, Japan; and
†Tokyo Institute of Technology, Yokohama, Japan


    Abstract
 
Short interspersed nuclear elements (SINEs) have been used to generate unambiguous phylogenetic topologies relating eukaryotic taxa. The irreversible nature of SINE retroposition is supported by a large body of comparative genome data and is a fundamental assumption inherent in the value of this qualitative method of inference. Here, we assess the key assumption of unidirectional SINE insertion by comparing the SINE insertion–derived topology and the phylogenetic tree based on seven independent loci of five taxa in the order Cetartiodactyla (Cetacea + Artiodactyla). The data sets and analyses were largely independent, but the loci were, by definition, linked, and thus their consistency supported an irreversible pattern of SINE retroposition. Moreover, our analyses of the flanking sequences provided estimates of divergence times among cetartiodactyl lineages unavailable from SINE insertion analysis alone. Unexpected rate heterogeneity among sites of SINE-flanking sequences and other noncoding DNA sequences was observed. Sequence simulations suggest that this rate heterogeneity may be an artifact resulting from the inaccuracies of the substitution model used.


    Introduction
 
Analysis of the insertion patterns of SINEs to diagnose common ancestry among eukaryotic host genomes is emerging as a powerful new method of phylogenetic inference (Murata et al. 1993; Shimamura et al. 1997; Takahashi et al. 1998; Nikaido, Rooney, and Okada 1999; Shedlock and Okada 2000). The putatively irreversible, independent nature of SINE retropositional events allows them to be employed as unambiguous, shared, derived characters for straightforward cladogram construction. Exploration of the limits of this new approach to molecular systematics invites further investigation (Hillis 1999; Miyamoto 1999; Shedlock, Milinkovitch, and Okada 2000; Shedlock and Okada 2000). Statistical evaluation of SINE data and their integration with conventional comparative DNA sequence analyses are particularly rich areas for the development of new methods.

Although SINE insertions allow one to construct tree topologies, they cannot be used reliably to calculate relative branch lengths without the ability to model amplification rates of SINE markers over time. It seems clear from well-studied systems such as mammalian MIR elements, human Alus, and the Hpa-I SINEs in fishes (Quentin 1988; Deininger and Batzer 1993; Takasaki et al. 1994; Schmid 1996) that assuming a constant rate of amplification across and within lineages is not justified and that establishing accurate historical amplification profiles for elements inserted at each independent locus analyzed is intractable. However, SINE-flanking sequences may potentially be used for dating historical retropositional events that diagnose common ancestry, because of the probable neutral nature of evolution in nonfunctional regions of the genome (Del Pozzo and Guardiola 1990; Shedlock and Okada 2000). The amount of divergence between nonfunctional SINE-flanking sequences at orthologous loci is proportional to elapsed time, and so analyses of these flanking sequences will provide branch lengths to the SINE topology. In contrast, if loci assumed to be orthologous are actually paralogous, we expect a lack of correspondence between the SINE insertion topology and the flanking-sequence tree. Thus, our evaluation of the consistency between the phylogenetic signal of the SINE insertion and flanking-sequence data is an explicit test of the assumption of orthology of loci.

Traditional morphological taxonomy groups the even-toed ungulates within the order Artiodactyla. Within this order are the three suborders Tylopoda (camels), Suiformes (pigs, peccaries, and hippopotamuses), and Ruminantia (deer, cows, giraffes, etc.) (Simpson 1984). Cetacea (baleen and toothed whales) is considered a distinct order of probable ungulate origin (Dawson and Krishtalka 1984). In contrast, molecular phylogenetic analyses have described cetaceans as nested within the order Artiodactyla (Graur and Higgins 1994; Shimamura et al. 1997), often with whales and hippopotamuses as sister taxa (Sarich 1985; Gatesy et al. 1996; Ursing and Arnarson 1998; Nikaido, Rooney, and Okada 1999). A recent, inclusive analysis of DNA sequence data from Cetartiodactyla taxa supports these relationships and also identifies the camel as the root of Cetartiodactyla (Gatesy 1999). Thus, molecular data view both the traditional order Artiodactyla and the suborder Suiformes as paraphyletic and argue for a revised order Cetartiodactyla (Cetacea + Artiodactyla).

In order to evaluate the orthologous nature of SINE insertion loci and to explore the feasibility of accurately estimating branch lengths for SINE cladograms, we analyzed the DNA sequences flanking seven SINE insertion sites, five of which were recently used to qualitatively define the relationships within the order Cetartiodactyla (Nikaido, Rooney, and Okada 1999). We reasoned that if the SINE loci examined were orthologous and no misleading excision of elements had occurred, the relationships of the flanking sequences should mirror the SINE insertion topology. If the two data sets were found to be consistent, we then planned to use the sequence divergences to estimate branch lengths for the SINE insertion topology.


    Materials and Methods
 
DNA sequences from seven SINE insertion loci (APO, GPI, INO, KM14, LAC, M11, and PRO) were determined by previously described methods (Shimamura et al. 1997; Nikaido, Rooney, and Okada 1999). Sequences were aligned using ClustalX (Thompson et al. 1997), and all ambiguously aligned regions and gap sites were removed. A total of 2,048 bases from at least one species representing five presumably monophyletic groups (Tylopoda, Hippopotamidae, Suidae, Cetacea, and Ruminantia) were analyzed (table 1). Although our analyses were of unrooted trees, we implicitly assumed the root to Cetartiodactyla was to Tylopoda, as indicated by analyses of DNA sequences (Gatesy 1999) and SINE insertions (Nikaido, Rooney, and Okada 1999).


 
Table 1 Species Used in the Analyses

 
The seven loci were analyzed individually and as concatenated sequences. The log-likelihoods of the 15 possible unrooted topologies relating the five taxa were calculated assuming a general reversible substitution model with a discrete gamma distribution of eight categories for the rate variation among sites (Yang 1994) using the program baseML in the software package PAML, version 1.3A (Yang 1997). The bootstrap support for nodes was estimated from the total data by the RELL method (Kishino, Miyata, and Hasegawa 1990; Hasegawa and Kishino 1994) using the program TotalML in MOLPHY, version 2.3 (Adachi and Hasegawa 1996a). Divergence times were estimated from the concatenated sequences allowing the rate of evolution to vary among lineages (Thorne, Kishino, and Painter 1998).
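
The RELL idea mentioned above can be sketched in a few lines: instead of re-optimizing every bootstrap replicate, the per-site log-likelihoods of each candidate topology are resampled and summed. The Python sketch below is only an illustration under stated assumptions. The function name `rell_support` and its input layout are hypothetical, the per-site log-likelihoods are assumed to have been computed beforehand by an external program, and nothing here reproduces the MOLPHY implementation.

```python
import random


def rell_support(site_lnl, n_replicates=10000, seed=0):
    """Estimate RELL bootstrap proportions for a set of topologies.

    site_lnl[t][s] is the log-likelihood of site s under topology t
    (hypothetical input, computed elsewhere). Returns, for each topology,
    the fraction of replicates in which it has the highest resampled
    total log-likelihood.
    """
    rng = random.Random(seed)
    n_trees = len(site_lnl)
    n_sites = len(site_lnl[0])
    wins = [0] * n_trees
    for _ in range(n_replicates):
        # Draw site indices with replacement; the same resample is
        # applied to every topology so the totals are comparable.
        sample = [rng.randrange(n_sites) for _ in range(n_sites)]
        totals = [sum(row[s] for s in sample) for row in site_lnl]
        best = max(range(n_trees), key=totals.__getitem__)
        wins[best] += 1
    return [w / n_replicates for w in wins]
```

Called with one row of per-site values for each of the 15 topologies, the function returns 15 proportions that sum to 1. Ties are resolved in favor of the first topology, which is a simplification of this sketch.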

For each data set, the gamma parameter α, which summarizes rate variation among sites, was estimated using the program baseML (Yang 1997). To evaluate the distribution of the estimated α values, two additional log-likelihoods were calculated from each data set by fixing α to be first 1 and then ∞. The log-likelihood value corresponding to the estimated α was then compared with the values corresponding to α = 1 and α = ∞ using the likelihood ratio test. To further explore the estimation of α, two data sets were simulated from the seven concatenated loci from five taxa using a version of PAML modified by one of us (H.S.). Each of these data sets consisted of 1,000 replicates of simulated sequences (5 × 2,048 bases) with α fixed at either the value estimated from the original data or ∞. Values of α were then estimated from the two sets of 1,000 replicates.
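
As a rough illustration of the comparison described above, the following Python sketch converts two log-likelihoods into a one-sided P value. It assumes that twice the log-likelihood difference is referred to a chi-square distribution with 1 degree of freedom and that the tail probability is halved, in keeping with the one-sided tests described in the Discussion. The function names and any input values are hypothetical and are not taken from table 3.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))


def one_sided_lrt_pvalue(lnl_estimated, lnl_fixed):
    """One-sided P value for a fixed alpha against the estimated alpha.

    lnl_estimated: maximized log-likelihood with alpha estimated.
    lnl_fixed:     log-likelihood with alpha fixed (e.g., 1 or infinity).
    """
    # The statistic cannot be negative in exact arithmetic; clip small
    # negative values caused by optimizer round-off.
    stat = max(2.0 * (lnl_estimated - lnl_fixed), 0.0)
    # Half of the two-sided chi-square tail (assumption of this sketch).
    return 0.5 * chi2_sf_1df(stat)
```

With this convention, a fixed-α model whose log-likelihood equals the maximized value receives P = 0.5, and larger differences give smaller P values.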


    Results
 
The results of the maximum-likelihood analyses of the seven loci and the total data are shown in table 2. Tree 1, (Tylopoda, Suidae, (Ruminantia, (Hippopotamidae, Cetacea))), is favored by the M11 locus and the total data analyses. For the total data analysis, tree 1 is favored over other topologies by at least 9.4 log-likelihood units and has bootstrap support of 88%. Of the four other trees supported by at least one locus, the two with the strongest support are tree 4, (Tylopoda, Ruminantia, (Suidae, (Hippopotamidae, Cetacea))), and tree 5, (Tylopoda, (Suidae, Ruminantia), (Hippopotamidae, Cetacea)), with 6% and 4% bootstrap support, respectively. Thus, 98% of bootstrap replicates support the pairing of the hippopotamus and the whale to the exclusion of other artiodactyls. Although tree 1 is not the maximum-likelihood tree for all loci, more importantly, it is not rejected by any locus. The total branch length, defined as the cumulative substitutions per site over tree 1 for the concatenated sequences, is 0.50 and is fairly uniform among loci (table 2).
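
The 98% figure follows from summing the bootstrap proportions of every topology that contains the hippopotamus + whale grouping. The short Python sketch below makes that bookkeeping explicit, using only the three topologies and percentages quoted in the text. Representing each tree by its two internal groupings is a choice made for this illustration, and the function name `clade_support` is hypothetical.

```python
# Groupings are written as frozensets of taxon names, with Tylopoda
# treated as the implicit root side of each unrooted tree.
HIPPO_WHALE = frozenset({"Hippopotamidae", "Cetacea"})
RUMINANT_HIPPO_WHALE = frozenset({"Ruminantia", "Hippopotamidae", "Cetacea"})
SUID_HIPPO_WHALE = frozenset({"Suidae", "Hippopotamidae", "Cetacea"})
SUID_RUMINANT = frozenset({"Suidae", "Ruminantia"})

# Bootstrap support (%) as reported in the text for trees 1, 4, and 5.
TREES = {
    "tree 1": {"support": 88, "clades": {HIPPO_WHALE, RUMINANT_HIPPO_WHALE}},
    "tree 4": {"support": 6, "clades": {HIPPO_WHALE, SUID_HIPPO_WHALE}},
    "tree 5": {"support": 4, "clades": {HIPPO_WHALE, SUID_RUMINANT}},
}


def clade_support(trees, clade):
    """Sum the support of all topologies that contain the given grouping."""
    return sum(t["support"] for t in trees.values() if clade in t["clades"])
```

The same function gives the support for the ruminant + (hippopotamus, whale) grouping, which among these three topologies is found only in tree 1.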


 
Table 2 Maximum-Likelihood Analyses of SINE-Flanking Sequences

 
The estimates of the gamma shape parameter α for the tree 1 topology for the seven loci and five concatenated sequences are shown in table 3. For each of the seven single-locus analyses, the maximum number of species available (total data) was used to estimate α. For the M11 locus (estimated α = 3.6), α = 1 was rejected (P < 0.015), but not α = ∞ (P > 0.13). Also included in table 3 for comparison are values of α we estimated from intron sequences from salmon (GH1C and GH2C; Oakley and Phillips 1999), carnivores (transthyretin intron I; Flynn and Nedbal 1998), primates (IRBP intron 1; Schneider et al. 1996; Harris and Disotell 1998), the ψ-η-globin gene from primates (Koop et al. 1986; Fitch et al. 1988), and fourfold-degenerate sites from primate mtDNA (Horai et al. 1995; Adachi and Hasegawa 1996b). Figure 1B shows that when data are simulated with α fixed at 99 (infinity in PAML), 45% of the estimates of α are lower, with a peak at α = 10.


 
Table 3 The Gamma Shape Parameter (α) Estimated from Noncoding Sequences, with Distributions of Log-Likelihoods Inferred from Comparisons Between Estimated and Fixed α Analyses

 


 
Fig. 1. The distributions of simulated α values when α is fixed at either (A) 2.01, the observed value of the actual data, or (B) 99 (∞ in baseML)

 
Figure 2 shows the maximum-likelihood tree from the concatenated sequences with branch lengths proportional to the estimated number of substitutions. Bootstrap support for the nodes of this tree was calculated from the total data. As noted above, the grouping of hippopotamuses and Cetacea receives 98% bootstrap support, while the placement of ruminants as the most closely related taxa to this group receives 88% bootstrap support. The likelihood ratio test favors the nonclock model (fig. 2) over a clock model (P < 9.7 × 10⁻⁷), indicating rate variation among lineages. Although a molecular clock is rejected, divergence times may still be estimated by allowing the rates of evolution to evolve along lineages (Thorne, Kishino, and Painter 1998). Figure 3 displays divergence times estimated in this manner. The variable molecular clock is calibrated by assuming the divergence of hippopotamuses and whales to be 52 MYA based on the primitive semiaquatic cetacean Pakicetus (Gingerich, Russell, and Ibrahim Shah 1983).



 
Fig. 2. Maximum-likelihood tree from concatenated sequences (2,048 bp). Numbers at the nodes are bootstrap estimates from the total data generated from 10,000 replicates using the RELL method (Kishino, Miyata, and Hasegawa 1990)

 


 
Fig. 3. Divergence times estimated allowing the evolution of evolutionary rates along lineages (Thorne, Kishino, and Painter 1998)

 

    Discussion
 
One goal of this project was to evaluate the assumption of orthologous insertion inherent in the use of retroposons to diagnose clades. If any locus had experienced paralogous insertion or precise excision of a SINE element, we expected the sequences flanking that locus to strongly reject the SINE insertion topology. Analyses of the flanking sequences were further used to generate accurate branch lengths for SINE cladograms and thus provide additional insight into cetartiodactyl evolution. Our results are consistent with the original SINE analyses for these taxa (Shimamura et al. 1997; Nikaido, Rooney, and Okada 1999). Although tree 1 (Tylopoda, Suidae, (Ruminantia, (Hippopotamidae, Cetacea))) is not the maximum-likelihood tree for all loci, it is not rejected by any of the seven independent SINE loci examined and is the maximum-likelihood tree for all evidence considered simultaneously (table 2). Although no outgroup to Cetartiodactyla is analyzed here, our unrooted phylogeny is completely consistent with the most recent SINE cladogram (Nikaido, Rooney, and Okada 1999).

Figure 3 shows the divergence times estimated from the concatenated sequences allowing rates of evolution to evolve within lineages (Thorne, Kishino, and Painter 1998). The radiation of the order Cetartiodactyla is estimated at 65 MYA, in general agreement with the appearance of large-bodied mammals in the fossil record (Dawson and Krishtalka 1984). The split between ruminants and the sister taxa hippopotamuses and whales is estimated at 60 MYA, indicating an earlier, and relatively rapid, divergence of the taxa Tylopoda and Suidae. These times are calibrated from a 52 MYA estimated divergence between hippopotamuses and whales. Pakicetus, an amphibious, but apparently not fully aquatic, ancestor of Cetacea (Gingerich, Russell, and Ibrahim Shah 1983), is dated to 52 MYA, and presumably the ancestor of both Cetacea and Hippopotamus lived substantially prior to this time, making these divergence times minimum estimates.

A corollary of the divergence dates discussed above is a set of estimates of the independent rates of evolution along each lineage. These relative rates of evolution are listed in table 4, along with the body masses of each taxon. Since sequences from multiple cetaceans and ruminants are used here, an average body mass weighted by the number of bases from each species examined (table 1) is presented. Although it is unclear how long whales have been large, there are fossil skulls measuring 3.5 m dated to the Miocene (Dawson and Krishtalka 1984). Similarly, it is uncertain when the other taxa examined attained their present size and life history characteristics, but surveys of extant species suggest that ruminants and suids have probably been the smallest artiodactyls over evolutionary time (Simpson 1984). Note the inverse relationship between estimated substitution rate and body mass. The similarity of estimated evolutionary rates of ruminants and suids may reflect similar sizes and corresponding life histories. If evolution is neutral, as is probably the case for the SINE-flanking sequences, the substitution rate in evolution coincides with the mutation rate (Kimura 1983). Body mass has been shown to be correlated with age of reproductive maturity, life span, and gestation period and inversely correlated with metabolic rate within mammalian taxa (Calder 1984). We use body masses as estimators for these more interesting life history parameters because they are much easier to determine and are commonly available. The inverse relationship between estimated substitution rate and body mass is generally consistent with both the generation length (Wu and Li 1985; Li, Tanimura, and Sharp 1987) and the metabolic-rate (Martin, Naylor, and Palumbi 1992) hypotheses of molecular evolution. Further work focusing on the metabolic slowdown associated with deep-diving whales may be able to tease apart these correlated hypotheses.
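
The weighted average body mass described above can be written out as a small calculation. The Python sketch below shows one way to do it, assuming each species contributes a body mass and a count of analyzed bases. The function name `weighted_mean_mass` is hypothetical, and any numbers passed to it are placeholders rather than values from table 1 or table 4.

```python
def weighted_mean_mass(species):
    """Average body mass weighted by the number of bases per species.

    species: iterable of (body_mass_kg, n_bases) pairs (hypothetical input).
    """
    pairs = list(species)
    total_bases = sum(n_bases for _, n_bases in pairs)
    if total_bases == 0:
        raise ValueError("at least one species must contribute bases")
    weighted_sum = sum(mass * n_bases for mass, n_bases in pairs)
    return weighted_sum / total_bases
```

A species contributing three times as many bases as another pulls the average three times as strongly toward its own body mass.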


 
Table 4 Estimated Rates of Evolution and Body Masses of Taxa Within Cetartiodactyla

 
Since retroposons typically insert into nonfunctional regions dispersed throughout the eukaryotic genome (Weiner, Deininger, and Efstratiadis 1986), one expects little rate heterogeneity among sites within flanking sequences. This expectation can be evaluated by examining the gamma shape parameter α, which describes the mutation rate heterogeneity among sites. Theoretically, α values of infinity result when all sites are free to vary with equal probability. In contrast, protein-coding sequences typically have α values less than 1 due to the difference in constraints between the degenerate third position and the conservative, and sometimes nearly invariable, first and second positions within a codon (Hasegawa, Kishino, and Yano 1985; Swofford et al. 1996). The values of α we observed for the SINE-flanking sequences ranged between 0.70 (PRO) and 3.90 (GPI) and were comparable with a wide range of other noncoding regions from a variety of taxa (table 3). These data might seem to indicate that rate heterogeneity among sites is common even in the absence of functional constraints.
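
To make the role of α concrete, the Python sketch below approximates the relative rates of an eight-category discrete gamma model for a given α. It is a Monte Carlo approximation written for illustration: rates are drawn from a gamma distribution with mean 1 and variance 1/α, sorted, and averaged within equal-sized bins. Analytical implementations compute the category rates from the distribution itself, so the function name `discrete_gamma_rates` and its sampling approach are assumptions of this sketch and do not describe the code in PAML.

```python
import random


def discrete_gamma_rates(alpha, n_categories=8, n_draws=80000, seed=0):
    """Approximate mean relative rates of equal-probability gamma categories."""
    rng = random.Random(seed)
    # Shape alpha with scale 1/alpha gives mean 1 and variance 1/alpha.
    draws = sorted(rng.gammavariate(alpha, 1.0 / alpha) for _ in range(n_draws))
    # Draws left over when n_draws is not a multiple of n_categories
    # are ignored.
    size = n_draws // n_categories
    return [
        sum(draws[k * size:(k + 1) * size]) / size
        for k in range(n_categories)
    ]
```

For small α the category rates span a wide range, with some categories close to zero, whereas for large α all eight rates approach 1, which corresponds to the equal-rates case described above.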

For each of the noncoding sequences, we calculated log-likelihood values for fixed α values of 1 and ∞ in order to evaluate the shape of the likelihood surface. For the GPI locus and the two salmon introns (GH1C and GH2C), the likelihood surface was flat, with no significant variation with respect to values of α ranging from 1 to ∞. Values of α for four of the other SINE-flanking loci (INO, KM14, LAC, and PRO), as well as IRBP intron 1 from primates, are not significantly different from 1. Interestingly, M11 is the only SINE-flanking sequence with an α value not significantly different from ∞.

In order to remove the potential effects of subtle biological sources of rate heterogeneity such as base composition, repeat motif instability, and regulatory constraints on the estimation of α, a computer simulation was undertaken. Figure 1A shows the estimates of α from 1,000 data sets simulated from the seven concatenated loci with α fixed at 2.01, the value estimated from the data (table 3). The 1,000 values of α have a mean of 2.2 and a standard deviation of 0.81. In contrast, figure 1B shows the distribution of α values estimated from data sets simulated with α fixed at infinity. There are two modes to this distribution, with 55% of the values at 99 (infinity in PAML) and another peak at 10. These simulations indicate that rate heterogeneity may be inferred by chance even when α = ∞.

The unusual shape of the histogram (fig. 1B) of the estimated α values is caused by the parameterization of rate heterogeneity. In fact, the histogram is well approximated by the half-normal distribution if we use the natural parameterization 1/α instead of α (fig. 4). The following simplified model explains this:



 
Fig. 4. The distribution of 1/α simulated for figure 1B.

 

where x_i and y_i correspond to the relative rate and the observed pattern at site i, respectively. The variance 1/α of rates is estimated by


which is asymptotically distributed as the normal distribution, but with 0 representing all the negative values. The simulation results of Whelan and Goldman (1999) and the argument of Ota et al. (2000) confirm the above observation. Taking this into account, the P values of table 3 are calculated by the one-sided tests of 1/α = 1 against 1/α < 1, or 1/α = 0 against 1/α > 0.
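
The boundary behavior described above can be reproduced with a toy simulation. The Python sketch below is not the simplified model of the text. It assumes, purely for illustration, that the observation at each site is a Poisson count whose mean is proportional to a gamma-distributed relative rate with mean 1 and variance 1/α, and it estimates that variance by a moment estimator truncated at 0. All function names and parameter values are hypothetical.

```python
import math
import random
import statistics


def poisson_draw(rng, mean):
    """Draw a Poisson variate by Knuth's method (adequate for small means)."""
    limit = math.exp(-mean)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def estimate_rate_variance(counts):
    """Moment estimate of 1/alpha, with negative values set to 0.

    Under the assumed model Var(y) = m + m^2 * v, where m is the mean
    count and v = 1/alpha is the variance of the relative rates.
    """
    mean = statistics.fmean(counts)
    if mean == 0:
        return 0.0
    raw = (statistics.pvariance(counts) - mean) / mean ** 2
    return max(0.0, raw)


def simulate_estimates(true_variance, n_sites=2048, n_replicates=200,
                       site_mean=0.5, seed=1):
    """Simulate replicate data sets and return the estimates of 1/alpha."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_replicates):
        counts = []
        for _ in range(n_sites):
            if true_variance > 0:
                # Shape 1/v and scale v give mean 1 and variance v.
                rate = rng.gammavariate(1.0 / true_variance, true_variance)
            else:
                rate = 1.0  # no heterogeneity, i.e., alpha = infinity
            counts.append(poisson_draw(rng, site_mean * rate))
        estimates.append(estimate_rate_variance(counts))
    return estimates
```

When `true_variance` is 0, roughly half of the simulated estimates fall exactly on the boundary at 0 and the remainder are positive, which is the half-normal pattern described for 1/α in figure 4.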

Theoretically, the gamma parameter α measures rate heterogeneity, but in practice, it may also compensate for other model parameters. For example, if there are aspects of the substitution process that are modeled unrealistically, then observed values of α may confound rate heterogeneity and these hidden inaccuracies. The simulation results suggest that the rate heterogeneity observed in the SINE-flanking and other noncoding sequences may be an artifact of the substitution modeling process. Further work is in progress to explore this suggestion.

There are limits of SINE insertion analysis that require novel statistical approaches. These include the retention of ancestral polymorphisms and situations in which missing data or insufficient sampling of loci reduces confidence in hypothesis formation (Hillis 1999; Miyamoto 1999; Shedlock, Milinkovitch, and Okada 2000; Shedlock and Okada 2000). Also, methods used to assess the reliability of phylogenetic inference based on sequences, such as bootstrap resampling of data (Felsenstein 1985), are not appropriate for small character sets. If the assumptions of irreversible and orthologous insertions are upheld, as in the present case, confidence in the character set defined by resampling of the data is not necessary (Sanderson 1995; Shedlock, Milinkovitch, and Okada 2000; Shedlock and Okada 2000). Results from the present analyses represent an initial step toward confirming the assumptions fundamental to SINE character reliability. The integration of insertion data with nucleotide substitution information contained in nonfunctional flanking sequences provides a means for confirming the reliability and quantifying the phylogenetic results obtained qualitatively.


    Supplementary Material
 
The sequence alignment files used in the analyses are available on request. Accession numbers are AB028484–AB028488 and AB042892–AB042933.


    Acknowledgements
 
We thank the following individuals and organizations for providing samples: H. Kato (National Research Institute of Far Seas Fisheries, Shizuoka), M. Goto (Institute of Cetacean Research, Tokyo), I. Munechika (Chiba Zoological Park), Y. Mukai (Meat Hygiene Inspection Office, Nagano), and O. Ryder (Zoological Society of San Diego's Center for Reproduction of Endangered Species). We thank M. Fujiwara for assistance in implementing the software of J. Thorne. We would also like to thank the Japan Society for the Promotion of Science for postdoctoral support to J.K.L. (P97904) and A.M.S. (P98881) and the Ministry of Education, Science, Sports and Culture of Japan for research grants to M.H., N.O., and H.S.


    Footnotes
 
Dan Graur, Reviewing Editor

1 Keywords: SINE, DNA sequence, phylogeny, Cetartiodactyla, rate heterogeneity, mutation rate

2 Address for correspondence and reprints: Masami Hasegawa, Institute of Statistical Mathematics, 4-6-7 Minami-Azabu, Minato-ku, Tokyo 106-8569, Japan. E-mail: hasegawa@ism.ac.jp


    Literature Cited
 

    Adachi, J., and M. Hasegawa. 1996a. MOLPHY version 2.3: programs for molecular phylogenetics based on maximum likelihood. Comput. Sci. Monogr. 28:1–150

    ———. 1996b. Tempo and mode of synonymous substitutions in mitochondrial DNA of primates. Mol. Biol. Evol. 13:200–208

    Calder, W. A. 1984. Size, function and life history. Harvard University Press, Cambridge, Mass

    Dawson, M. R., and L. Krishtalka. 1984. Fossil history of the families of recent mammals. Pp. 11–58 in S. Anderson and J. K. Jones, eds. Orders and families of recent mammals of the world. John Wiley and Sons, New York

    Deininger, P. L., and M. A. Batzer. 1993. Evolution of retroposons. Evol. Biol. 27:157–196

    Del Pozzo, G., and J. Guardiola. 1990. A SINE insertion provides information on the divergence of the HLA-DQA1 and HLA-DQA2 genes. Immunogenetics 31:229–232

    Felsenstein, J. 1985. Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39:783–791

    Fitch, D. H. A., C. Mainone, J. L. Slightom, and M. Goodman. 1988. The spider monkey ψη-globin gene and surrounding sequences: recent or ancient insertions of LINEs and SINEs? Genomics 3:237–255

    Flynn, J. J., and M. A. Nedbal. 1998. Phylogeny of the Carnivora (Mammalia): congruence vs. incompatibility among multiple data sets. Mol. Phylogenet. Evol. 9:414–426

    Gatesy, J. 1999. Stability of cladistic relationships between Cetacea and higher-level artiodactyl taxa. Syst. Biol. 48:6–20

    Gatesy, J., C. Hayashi, M. A. Cronin, and P. Arctander. 1996. Evidence from milk casein genes that cetaceans are close relatives of hippopotamid artiodactyls. Mol. Biol. Evol. 13:954–963

    Gingerich, P. D., D. E. Russell, and S. M. Ibrahim Shah. 1983. Origin of whales in epicontinental remnant seas: new evidence from the early Eocene of Pakistan. Science 220:404–406

    Graur, D., and D. G. Higgins. 1994. Molecular evidence for the inclusion of cetaceans within the order Artiodactyla. Mol. Biol. Evol. 11:357–364

    Harris, E. E., and T. R. Disotell. 1998. Nuclear gene trees and the phylogenetic relationships of the Mangabeys (Primates: Papionini). Mol. Biol. Evol. 15:892–900

    Hasegawa, M., and H. Kishino. 1994. Accuracies of the simple methods for estimating the bootstrap probability of a maximum likelihood tree. Mol. Biol. Evol. 11:142–145

    Hasegawa, M., H. Kishino, and T. Yano. 1985. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol. 22:160–174

    Hillis, D. M. 1999. SINEs of the perfect character. Proc. Natl. Acad. Sci. USA 96:9979–9981

    Horai, S., K. Hayasaka, R. Kondo, K. Tsugane, and N. Takahata. 1995. The recent African origin of modern humans revealed by complete sequences of hominoid mitochondrial DNAs. Proc. Natl. Acad. Sci. USA 92:532–536

    Kimura, M. 1983. The neutral theory of molecular evolution. Cambridge University Press, Cambridge, Mass

    Kishino, H., and M. Hasegawa. 1989. Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in Hominoidea. J. Mol. Evol. 29:170–179

    Kishino, H., T. Miyata, and M. Hasegawa. 1990. Maximum likelihood inference of protein phylogeny and the origin of chloroplasts. J. Mol. Evol. 30:151–160

    Koop, B. F., M. Goodman, P. Xu, K. Chan, and J. L. Slightom. 1986. Primate η-globin DNA sequences and man's place among the great apes. Nature 319:234–238

    Li, W.-H., M. Tanimura, and P. M. Sharp. 1987. An evaluation of the molecular clock hypothesis using mammalian DNA sequences. J. Mol. Evol. 25:330–342

    Martin, A. P., G. J. P. Naylor, and S. R. Palumbi. 1992. Rates of mitochondrial DNA evolution in sharks are slow compared to mammals. Nature 357:153–155

    Miyamoto, M. M. 1999. Perfect SINEs of evolutionary history? Curr. Biol. 9:R816–R819

    Murata, S., N. Takasaki, M. Saitoh, and N. Okada. 1993. Determination of the phylogenetic relationships among Pacific salmonids by using short interspersed elements (SINEs) as temporal landmarks of evolution. Proc. Natl. Acad. Sci. USA 90:6995–6999

    Nikaido, M., A. P. Rooney, and N. Okada. 1999. Phylogenetic relationships among cetartiodactyls based on insertions of short and long interspersed elements: hippopotamuses are the closest extant relatives of whales. Proc. Natl. Acad. Sci. USA 96:10261–10266

    Oakley, T. H., and R. B. Phillips. 1999. Phylogeny of Salmonine fishes based on growth hormone introns: Atlantic (Salmo) and Pacific (Oncorhynchus) salmon are not sister taxa. Mol. Phylogenet. Evol. 11:381–393

    Ota, R., P. J. Waddell, M. Hasegawa, H. Shimodaira, and H. Kishino. 2000. Appropriate likelihood ratio tests and marginal distributions for evolutionary tree models with constraints on parameters. Mol. Biol. Evol. 17:798–803

    Quentin, Y. 1988. The Alu family developed through successive waves of fixation closely connected with primate lineage history. J. Mol. Evol. 27:194–202

    Sanderson, M. J. 1995. Objections to bootstrapping phylogenies: a critique. Syst. Biol. 44:299–320

    Sarich, V. 1985. Rodent macromolecular systematics. Pp. 423–452 in W. Luckett and J. Hartenberger, eds. Evolutionary relationships among rodents: a multidisciplinary approach. Plenum Press, New York

    Schmid, C. W. 1996. Alu: structure, origin, evolution, significance and function of one-tenth of human DNA. Prog. Nucleic Acid Res. Mol. Biol. 53:283–319

    Schneider, H., I. Sampaio, M. L. Harada, C. M. Barroso, M. P. Schneider, J. Czelusniak, and M. Goodman. 1996. Molecular phylogeny of the New World monkeys (Platyrrhini, primates) based on two unlinked nuclear genes: IRBP intron 1 and epsilon-globin sequences. Am. J. Phys. Anthropol. 100:153–179

    Shedlock, A. M., M. C. Milinkovitch, and N. Okada. 2000. SINE evolution, missing data, and the origin of whales. Syst. Biol. (in press)

    Shedlock, A. M., and N. Okada. 2000. SINE insertions: powerful tools for molecular systematics. Bioessays 22:148–160

    Shimamura, M., H. Yasue, K. Ohshima, H. Abe, H. Kato, T. Kishiro, M. Goto, I. Munechika, and N. Okada. 1997. Molecular evidence from retroposons that whales form a clade within even-toed ungulates. Nature 388:666–670

    Simpson, C. D. 1984. Artiodactyls. Pp. 563–588 in S. Anderson and J. K. Jones, eds. Orders and families of recent mammals of the world. John Wiley and Sons, New York

    Swofford, D. L., G. J. Olsen, P. J. Waddell, and D. M. Hillis. 1996. Phylogenetic inference. Pp. 407–514 in D. M. Hillis, C. Moritz, and B. K. Mable, eds. Molecular systematics. 2nd edition. Sinauer, Sunderland, Mass

    Takahashi, K., Y. Terai, M. Nishida, and N. Okada. 1998. A novel family of short interspersed repetitive elements (SINEs) from cichlids: the pattern of insertion of SINEs at orthologous loci support the proposed monophyly of four major groups of cichlid fishes in Lake Tanganyika. Mol. Biol. Evol. 15:391–407

    Takasaki, N., S. Murata, M. Saitoh, T. Kobayashi, L. Park, and N. Okada. 1994. Species-specific amplification of tRNA-derived SINEs via retroposition: a process of parasitization of entire genomes during the evolution of salmonids. Proc. Natl. Acad. Sci. USA 91:10153–10157

    Thompson, J. D., T. J. Gibson, F. Plewniak, F. Jeanmougin, and D. G. Higgins. 1997. The ClustalX windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 25:4876–4882

    Thorne, J. L., H. Kishino, and I. S. Painter. 1998. Estimating the rate of evolution of the rate of molecular evolution. Mol. Biol. Evol. 15:1647–1657

    Ursing, B. M., and U. Arnarson. 1998. Analyses of mitochondrial genomes strongly support a hippopotamus–whale clade. Proc. R. Soc. Lond. B Biol. Sci. 265:2251–2255

    Weiner, A., P. L. Deininger, and A. Efstratiadis. 1986. Nonviral retroposons: genes, pseudogenes, and transposable elements generated by the reverse flow of genetic information. Annu. Rev. Biochem. 55:631–661

    Whelan, S., and N. Goldman. 1999. Distributions of statistics used for the comparison of models of sequence evolution in phylogenetics. Mol. Biol. Evol. 16:1292–1299

    Wu, C.-I., and W.-H. Li. 1985. Evidence for higher rates of nucleotide substitution in rodents than in man. Proc. Natl. Acad. Sci. USA 82:1741–1745

    Yang, Z. 1994. Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods. J. Mol. Evol. 39:306–314

    ———. 1997. PAML: a program package for phylogenetic analysis by maximum likelihood. CABIOS 13:555–556

Accepted for publication May 29, 2000.