*Institute of Statistical Mathematics, Tokyo, Japan;
and
Tokyo Institute of Technology, Yokohama, Japan
Abstract

Introduction
Although SINE insertions allow one to construct tree topologies, they cannot be used reliably to calculate relative branch lengths without the ability to model amplification rates of SINE markers over time. It seems clear from well-studied systems such as mammalian MIR elements, human Alus, and the Hpa-I SINEs in fishes (Quentin 1988; Deininger and Batzer 1993; Takasaki et al. 1994; Schmid 1996) that assuming a constant rate of amplification across and within lineages is not justified and that establishing accurate historical amplification profiles for elements inserted at each independent locus analyzed is intractable. However, SINE-flanking sequences may potentially be used for dating historical retropositional events that diagnose common ancestry, because of the probable neutral nature of evolution in nonfunctional regions of the genome (Del Pozzo and Guardiola 1990; Shedlock and Okada 2000). Under neutrality, the amount of divergence between nonfunctional SINE-flanking sequences at orthologous loci is expected to be proportional to elapsed time, so analyses of these flanking sequences can provide branch lengths for the SINE topology. In contrast, if loci assumed to be orthologous are actually paralogous, we expect a lack of correspondence between the SINE insertion topology and the flanking-sequence tree. Thus, our evaluation of the consistency between the phylogenetic signal of the SINE insertion and flanking-sequence data is an explicit test of the assumption of orthology of loci.
Traditional morphological taxonomy groups the even-toed ungulates within the order Artiodactyla. Within this order are the three suborders Tylopoda (camels), Suiformes (pigs, peccaries, and hippopotamuses), and Ruminantia (deer, cows, giraffes, etc.) (Simpson 1984). Cetacea (baleen and toothed whales) is considered a distinct order of probable ungulate origin (Dawson and Krishtalka 1984). In contrast, molecular phylogenetic studies have placed cetaceans within the order Artiodactyla (Graur and Higgins 1994; Shimamura et al. 1997), often with whales and hippopotamuses as sister taxa (Sarich 1985; Gatesy et al. 1996; Ursing and Arnarson 1998; Nikaido, Rooney, and Okada 1999). A recent, inclusive analysis of DNA sequence data from Cetartiodactyla taxa supports these relationships and also identifies the camel as the root of Cetartiodactyla (Gatesy 1999). Thus, molecular data view both the traditional order Artiodactyla and the suborder Suiformes as paraphyletic and argue for a revised order Cetartiodactyla (Cetacea + Artiodactyla).
To evaluate the orthologous nature of SINE insertion loci and to explore the feasibility of accurately estimating branch lengths for SINE cladograms, we analyzed the DNA sequences flanking seven SINE insertion sites, five of which were recently used to qualitatively define the relationships within the order Cetartiodactyla (Nikaido, Rooney, and Okada 1999). We reasoned that if the SINE loci examined were orthologous and no misleading excision of elements had occurred, the relationships of the flanking sequences should mirror the SINE insertion topology. If the two data sets were found to be consistent, we then planned to use the sequence divergences to estimate branch lengths for the SINE insertion topology.
Materials and Methods
For each data set, the gamma parameter α, which summarizes rate variation among sites, was estimated using the program baseML (Yang 1997). To evaluate the distribution of the estimated α values, two additional log-likelihoods were calculated from each data set by fixing α to be first 1 and then ∞. The log-likelihood value corresponding to the estimated α was then compared with the values corresponding to α = 1 and α = ∞ using the likelihood ratio test. To further explore the estimation of α, two data sets were simulated from the seven concatenated loci from five taxa using a version of PAML modified by one of us (H.S.). Each of these data sets consisted of 1,000 replicates of simulated sequences (5 × 2,048 bases) with α fixed at either the value estimated from the original data or ∞. Values of α were then estimated from the two sets of 1,000 replicates.
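As an illustration of the likelihood ratio comparison described above, the following minimal sketch turns a pair of log-likelihoods (one with α estimated, one with α fixed) into a P value. The function names and the log-likelihood values are hypothetical and are not taken from this study or from PAML. Halving the chi-square tail probability is one common way to make such a test one-sided when the fixed value lies on the boundary of the parameter space (α = ∞, i.e., 1/α = 0); whether this matches the exact procedure used here is an assumption.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    if x <= 0.0:
        return 1.0
    # For 1 df, P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0))


def lrt_p_value(lnl_free, lnl_fixed, one_sided=False):
    """P value for comparing a model with alpha estimated against alpha fixed.

    lnl_free  : log-likelihood with alpha estimated
    lnl_fixed : log-likelihood with alpha fixed (e.g., at 1 or at infinity)
    one_sided : halve the chi-square tail probability (assumed convention)
    """
    stat = max(0.0, 2.0 * (lnl_free - lnl_fixed))
    if stat == 0.0:
        return 1.0
    p = chi2_sf_1df(stat)
    return 0.5 * p if one_sided else p


# Hypothetical log-likelihoods, for illustration only (not values from the paper).
lnl_estimated_alpha = -5230.4
lnl_alpha_one = -5231.1
lnl_alpha_infinite = -5233.9

p_vs_one = lrt_p_value(lnl_estimated_alpha, lnl_alpha_one)
p_vs_infinite = lrt_p_value(lnl_estimated_alpha, lnl_alpha_infinite, one_sided=True)
print(f"alpha = 1:        P = {p_vs_one:.3f}")
print(f"alpha = infinity: P = {p_vs_infinite:.3f}")
```

With these made-up inputs, the fixed value α = 1 would not be rejected, whereas α = ∞ would be; real conclusions depend entirely on the log-likelihoods obtained from the data.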
Results

Discussion
Figure 3 shows the divergence times estimated from the concatenated sequences, allowing rates of evolution to evolve within lineages (Thorne, Kishino, and Painter 1998). The radiation of the order Cetartiodactyla is estimated at 65 MYA, in general agreement with the appearance of large-bodied mammals in the fossil record (Dawson and Krishtalka 1984). The split between ruminants and the sister taxa hippopotamuses and whales is estimated at 60 MYA, indicating an earlier, and relatively rapid, divergence of the taxa Tylopoda and Suidae. These times are calibrated from a 52 MYA estimated divergence between hippopotamuses and whales. Pakicetus, an amphibious, but apparently not fully aquatic, ancestor of Cetacea (Gingerich, Russell, and Ibrahim Shah 1983), is dated to 52 MYA, and presumably the ancestor of both Cetacea and Hippopotamus lived substantially prior to this time, making these divergence times minimum estimates.
A corollary of the divergence dates discussed above is a set of estimates of the independent rates of evolution along each lineage. These relative rates of evolution are listed in table 4, along with the body masses of each taxon. Since sequences from multiple cetaceans and ruminants are used here, an average body mass weighted by the number of bases from each species examined (table 1) is presented. Although it is unclear how long whales have been large, there are fossil skulls measuring 3.5 m dated to the Miocene (Dawson and Krishtalka 1984). Similarly, it is uncertain when the other taxa examined attained their present size and life history characteristics, but surveys of extant species suggest that ruminants and suids have probably been the smallest artiodactyls over evolutionary time (Simpson 1984). Note the inverse relationship between estimated substitution rate and body mass. The similarity of estimated evolutionary rates of ruminants and suids may reflect similar sizes and corresponding life histories. If evolution is neutral, as is probably the case for the SINE-flanking sequences, the substitution rate in evolution coincides with the mutation rate (Kimura 1983). Body mass has been shown to be correlated with age of reproductive maturity, life span, and gestation period and inversely correlated with metabolic rate within mammalian taxa (Calder 1984). We use body masses as estimators for these more interesting life history parameters because they are much easier to determine and are commonly available. The inverse relationship between estimated substitution rate and body mass is generally consistent with both the generation-length hypothesis (Wu and Li 1985; Li, Tanimura, and Sharp 1987) and the metabolic-rate hypothesis of molecular evolution (Martin, Naylor, and Palumbi 1992). Further work focusing on the metabolic slowdown associated with deep-diving whales may be able to tease apart these correlated hypotheses.
For each of the noncoding sequences, we calculated log-likelihood values for fixed α values of 1 and ∞ in order to evaluate the shape of the likelihood surface. For the GPI locus and the two salmon introns (GH1C and GH2C), the likelihood surface was flat, with no significant variation with respect to values of α ranging from 1 to ∞. Values of α for four of the other SINE-flanking loci (INO, KM14, LAC, and PRO), as well as IRBP intron 1 from primates, are not significantly different from 1. Interestingly, M11 is the only SINE-flanking sequence with an α value not significantly different from ∞.
To remove the potential effects of subtle biological sources of rate heterogeneity, such as base composition, repeat motif instability, and regulatory constraints, on the estimation of α, a computer simulation was undertaken. Figure 1A shows the estimates of α from 1,000 data sets simulated from the seven concatenated loci with α fixed at 2.01, the value estimated from the data (table 3). The 1,000 values of α have a mean of 2.2 and a standard deviation of 0.81. In contrast, figure 1B shows the distribution of α values estimated from data sets simulated with α fixed at infinity. There are two modes to this distribution, with 55% of the values at 99 (infinity in PAML) and another peak at 10. These simulations indicate that rate heterogeneity may be inferred by chance even when α = ∞.
The unusual shape of the histogram (fig. 1B) of the estimated α values is caused by the parameterization of rate heterogeneity. In fact, the histogram is well approximated by the half-normal distribution if we use the natural parameterization 1/α instead of α (fig. 4). The following simplified model explains this:

[equation]

where x_i and y_i correspond to the relative rate and the observed pattern at site i, respectively. The variance 1/α of rates is estimated by

[equation]

which is asymptotically distributed as the normal distribution, but with 0 representing all the negative values. The simulation results of Whelan and Goldman (1999) and the argument of Ota et al. (2000) confirm the above observation. Taking this into account, the P values of table 3 are calculated by the one-sided tests of 1/α = 1 against 1/α < 1, or 1/α = 0 against 1/α > 0.
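The boundary effect described above can be illustrated with a small simulation. This is a toy sketch, not the authors' model or the modified PAML simulation: it assumes each site has an observation y_i = x_i + e_i, where the rate x_i has mean 1 and variance 1/α (gamma distributed, or constant when 1/α = 0) and e_i is normal noise of known variance. The variance of rates is then estimated by the sample variance of y minus the noise variance, truncated at 0. All names and settings are hypothetical.

```python
import random
import statistics


def estimate_rate_variance(n_sites, true_var, rng, noise_sd=1.0):
    """Truncated moment estimate of the rate variance (1/alpha) in the toy model."""
    ys = []
    for _ in range(n_sites):
        if true_var > 0.0:
            # Gamma with shape 1/true_var and scale true_var: mean 1, variance true_var.
            x = rng.gammavariate(1.0 / true_var, true_var)
        else:
            x = 1.0  # no rate heterogeneity (alpha = infinity)
        ys.append(x + rng.gauss(0.0, noise_sd))
    raw = statistics.variance(ys) - noise_sd ** 2
    # Negative raw estimates are all mapped to the boundary value 0.
    return max(0.0, raw)


def simulate(n_reps=2000, n_sites=200, true_var=0.0, seed=1):
    """Repeat the estimation on n_reps independently simulated data sets."""
    rng = random.Random(seed)
    return [estimate_rate_variance(n_sites, true_var, rng) for _ in range(n_reps)]


estimates = simulate()
frac_at_boundary = sum(e == 0.0 for e in estimates) / len(estimates)
print(f"fraction of replicates with estimated 1/alpha = 0: {frac_at_boundary:.2f}")
```

In this toy setting, roughly half of the replicates simulated without rate heterogeneity sit exactly on the boundary (estimated 1/α = 0, i.e., estimated α = ∞), and the rest form the positive half of an approximately normal distribution. This is qualitatively the two-part pattern reported for figure 1B, and it is why a one-sided test is appropriate at the boundary.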
Theoretically, the gamma parameter measures rate heterogeneity, but in practice, it may also compensate for other model parameters. For example, if there are aspects of the substitution process that are modeled unrealistically, then observed values of α may confound rate heterogeneity and these hidden inaccuracies. The simulation results suggest that the rate heterogeneity observed in the SINE-flanking and other noncoding sequences may be an artifact of the substitution modeling process. Further work is in progress to explore this suggestion.
SINE insertion analysis has limitations that require novel statistical approaches. These include the retention of ancestral polymorphisms and situations in which missing data or insufficient sampling of loci reduces confidence in hypothesis formation (Hillis 1999; Miyamoto 1999; Shedlock, Milinkovitch, and Okada 2000; Shedlock and Okada 2000). Also, methods used to assess the reliability of phylogenetic inference based on sequences, such as bootstrap resampling of data (Felsenstein 1985), are not appropriate for small character sets. If the assumptions of irreversible and orthologous insertions are upheld, as in the present case, confidence in the character set defined by resampling of the data is not necessary (Sanderson 1995; Shedlock, Milinkovitch, and Okada 2000; Shedlock and Okada 2000). Results from the present analyses represent an initial step toward confirming the assumptions fundamental to SINE character reliability. The integration of insertion data with nucleotide substitution information contained in nonfunctional flanking sequences provides a means for confirming the reliability and quantifying the phylogenetic results obtained qualitatively.
Supplementary Material

Acknowledgements

Footnotes
1 Keywords: SINE, DNA sequence, phylogeny, Cetartiodactyla, rate heterogeneity, mutation rate
2 Address for correspondence and reprints: Masami Hasegawa, Institute of Statistical Mathematics, 4-6-7 Minami-Azabu, Minato-ku, Tokyo 106-8569, Japan. E-mail: hasegawa{at}ism.ac.jp
Literature Cited
Adachi, J., and M. Hasegawa. 1996a. MOLPHY version 2.3: programs for molecular phylogenetics based on maximum likelihood. Comput. Sci. Monogr. 28:1–150.
Adachi, J., and M. Hasegawa. 1996b. Tempo and mode of synonymous substitutions in mitochondrial DNA of primates. Mol. Biol. Evol. 13:200–208.
Calder, W. A. 1984. Size, function and life history. Harvard University Press, Cambridge, Mass.
Dawson, M. R., and L. Krishtalka. 1984. Fossil history of the families of recent mammals. Pp. 11–58 in S. Anderson and J. K. Jones, eds. Orders and families of recent mammals of the world. John Wiley and Sons, New York.
Deininger, P. L., and M. A. Batzer. 1993. Evolution of retroposons. Evol. Biol. 27:157–196.
Del Pozzo, G., and J. Guardiola. 1990. A SINE insertion provides information on the divergence of the HLA-DQA1 and HLA-DQA2 genes. Immunogenetics 31:229–232.
Felsenstein, J. 1985. Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39:783–791.
Fitch, D. H. A., C. Mainone, J. L. Slightom, and M. Goodman. 1988. The spider monkey ψη-globin gene and surrounding sequences: recent or ancient insertions of LINEs and SINEs? Genomics 3:237–255.
Flynn, J. J., and M. A. Nedbal. 1998. Phylogeny of the Carnivora (Mammalia): congruence vs. incompatibility among multiple data sets. Mol. Phylogenet. Evol. 9:414–426.
Gatesy, J. 1999. Stability of cladistic relationships between Cetacea and higher-level artiodactyl taxa. Syst. Biol. 48:6–20.
Gatesy, J., C. Hayashi, M. A. Cronin, and P. Arctander. 1996. Evidence from milk casein genes that cetaceans are close relatives of hippopotamid artiodactyls. Mol. Biol. Evol. 13:954–963.
Gingerich, P. D., D. E. Russell, and S. M. Ibrahim Shah. 1983. Origin of whales in epicontinental remnant seas: new evidence from the early Eocene of Pakistan. Science 220:404–406.
Graur, D., and D. G. Higgins. 1994. Molecular evidence for the inclusion of cetaceans within the order Artiodactyla. Mol. Biol. Evol. 11:357–364.
Harris, E. E., and T. R. Disotell. 1998. Nuclear gene trees and the phylogenetic relationships of the Mangabeys (Primates: Papionini). Mol. Biol. Evol. 15:892–900.
Hasegawa, M., and H. Kishino. 1994. Accuracies of the simple methods for estimating the bootstrap probability of a maximum likelihood tree. Mol. Biol. Evol. 11:142–145.
Hasegawa, M., H. Kishino, and T. Yano. 1985. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol. 22:160–174.
Hillis, D. M. 1999. SINEs of the perfect character. Proc. Natl. Acad. Sci. USA 96:9979–9981.
Horai, S., K. Hayasaka, R. Kondo, K. Tsugane, and N. Takahata. 1995. The recent African origin of modern humans revealed by complete sequences of hominoid mitochondrial DNAs. Proc. Natl. Acad. Sci. USA 92:532–536.
Kimura, M. 1983. The neutral theory of molecular evolution. Cambridge University Press, Cambridge, Mass.
Kishino, H., and M. Hasegawa. 1989. Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in Hominoidea. J. Mol. Evol. 29:170–179.
Kishino, H., T. Miyata, and M. Hasegawa. 1990. Maximum likelihood inference of protein phylogeny and the origin of chloroplasts. J. Mol. Evol. 30:151–160.
Koop, B. F., M. Goodman, P. Xu, K. Chan, and J. L. Slightom. 1986. Primate η-globin DNA sequences and man's place among the great apes. Nature 319:234–238.
Li, W.-H., M. Tanimura, and P. M. Sharp. 1987. An evaluation of the molecular clock hypothesis using mammalian DNA sequences. J. Mol. Evol. 25:330–342.
Martin, A. P., G. J. P. Naylor, and S. R. Palumbi. 1992. Rates of mitochondrial DNA evolution in sharks are slow compared to mammals. Nature 357:153–155.
Miyamoto, M. M. 1999. Perfect SINEs of evolutionary history? Curr. Biol. 9:R816–R819.
Murata, S., N. Takasaki, M. Saitoh, and N. Okada. 1993. Determination of the phylogenetic relationships among Pacific salmonids by using short interspersed elements (SINEs) as temporal landmarks of evolution. Proc. Natl. Acad. Sci. USA 90:6995–6999.
Nikaido, M., A. P. Rooney, and N. Okada. 1999. Phylogenetic relationships among cetartiodactyls based on insertions of short and long interspersed elements: hippopotamuses are the closest extant relatives of whales. Proc. Natl. Acad. Sci. USA 96:10261–10266.
Oakley, T. H., and R. B. Phillips. 1999. Phylogeny of Salmonine fishes based on growth hormone introns: Atlantic (Salmo) and Pacific (Oncorhynchus) salmon are not sister taxa. Mol. Phylogenet. Evol. 11:381–393.
Ota, R., P. J. Waddell, M. Hasegawa, H. Shimodaira, and H. Kishino. 2000. Appropriate likelihood ratio tests and marginal distributions for evolutionary tree models with constraints on parameters. Mol. Biol. Evol. 17:798–803.
Quentin, Y. 1988. The Alu family developed through successive waves of fixation closely connected with primate lineage history. J. Mol. Evol. 27:194–202.
Sanderson, M. J. 1995. Objections to bootstrapping phylogenies: a critique. Syst. Biol. 44:299–320.
Sarich, V. 1985. Rodent macromolecular systematics. Pp. 423–452 in W. Luckett and J. Hartenberger, eds. Evolutionary relationships among rodents: a multidisciplinary approach. Plenum Press, New York.
Schmid, C. W. 1996. Alu: structure, origin, evolution, significance and function of one-tenth of human DNA. Prog. Nucleic Acid Res. Mol. Biol. 53:283–319.
Schneider, H., I. Sampaio, M. L. Harada, C. M. Barroso, M. P. Schneider, J. Czelusniak, and M. Goodman. 1996. Molecular phylogeny of the New World monkeys (Platyrrhini, primates) based on two unlinked nuclear genes: IRBP intron 1 and epsilon-globin sequences. Am. J. Phys. Anthropol. 100:153–179.
Shedlock, A. M., M. C. Milinkovitch, and N. Okada. 2000. SINE evolution, missing data, and the origin of whales. Syst. Biol. (in press).
Shedlock, A. M., and N. Okada. 2000. SINE insertions: powerful tools for molecular systematics. Bioessays 22:148–160.
Shimamura, M., H. Yasue, K. Ohshima, H. Abe, H. Kato, T. Kishiro, M. Goto, I. Munechika, and N. Okada. 1997. Molecular evidence from retroposons that whales form a clade within even-toed ungulates. Nature 388:666–670.
Simpson, C. D. 1984. Artiodactyls. Pp. 563–588 in S. Anderson and J. K. Jones, eds. Orders and families of recent mammals of the world. John Wiley and Sons, New York.
Swofford, D. L., G. J. Olsen, P. J. Waddell, and D. M. Hillis. 1996. Phylogenetic inference. Pp. 407–514 in D. M. Hillis, C. Moritz, and B. K. Mable, eds. Molecular systematics. 2nd edition. Sinauer, Sunderland, Mass.
Takahashi, K., Y. Terai, M. Nishida, and N. Okada. 1998. A novel family of short interspersed repetitive elements (SINEs) from cichlids: the pattern of insertion of SINEs at orthologous loci support the proposed monophyly of four major groups of cichlid fishes in Lake Tanganyika. Mol. Biol. Evol. 15:391–407.
Takasaki, N., S. Murata, M. Saitoh, T. Kobayashi, L. Park, and N. Okada. 1994. Species-specific amplification of tRNA-derived SINEs via retroposition: a process of parasitization of entire genomes during the evolution of salmonids. Proc. Natl. Acad. Sci. USA 91:10153–10157.
Thompson, J. D., T. J. Gibson, F. Plewniak, F. Jeanmougin, and D. G. Higgins. 1997. The ClustalX windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 24:4876–4882.
Thorne, J. L., H. Kishino, and I. S. Painter. 1998. Estimating the rate of evolution of the rate of molecular evolution. Mol. Biol. Evol. 15:1647–1657.
Ursing, B. M., and U. Arnarson. 1998. Analyses of mitochondrial genomes strongly support a hippopotamus–whale clade. Proc. R. Soc. Lond. B Biol. Sci. 265:2251–2255.
Weiner, A., P. L. Deininger, and A. Efstratiadis. 1986. Nonviral retroposons: genes, pseudogenes, and transposable elements generated by the reverse flow of genetic information. Annu. Rev. Biochem. 55:631–661.
Whelan, S., and N. Goldman. 1999. Distributions of statistics used for the comparison of models of sequence evolution in phylogenetics. Mol. Biol. Evol. 16:1292–1299.
Wu, C.-I., and W.-H. Li. 1985. Evidence for higher rates of nucleotide substitution in rodents than in man. Proc. Natl. Acad. Sci. USA 82:1741–1745.
Yang, Z. 1994. Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods. J. Mol. Evol. 39:306–314.
Yang, Z. 1997. PAML: a program package for phylogenetic analysis by maximum likelihood. CABIOS 13:555–556.