*The Institute of Statistical Mathematics, Tokyo, Japan;
Molecular Evolution Laboratory, Faculty of Bioscience and Biotechnology, Tokyo Institute of Technology, Japan
Abstract |
---|
Introduction |
---|
When analyzing a multiple sequence alignment, one must assume an underlying evolutionary model. This model includes tree topology, branch lengths, rate heterogeneity among sites, and substitution probabilities. All these parameters may change from gene to gene. For example, the tree topology of various genes may not be the same because of horizontal gene transfer. Similarly, the substitution model may vary from gene to gene because of different evolutionary constraints or because of differences in GC content. In this article we focus on the fitting of different branch length models and rate heterogeneity models to several protein data sets.
The Number of Branch Length Parameters
For an unrooted tree topology T with n sequences, the number of branches is 2n - 3. We denote the branch lengths by $t_1, \ldots, t_{2n-3}$. Assume now that we have two data sets (e.g., two different genes), each with n homologous sequences. One way to analyze a combination of these data sets is to concatenate the sequences and estimate a single set of branch lengths. We refer to this model as the "concatenate model." In such a scenario the joint probability would be

$$P(\mathrm{data}_1, \mathrm{data}_2 \mid T, t_1, \ldots, t_{2n-3}) = P(\mathrm{data}_1 \mid T, t_1, \ldots, t_{2n-3}) \times P(\mathrm{data}_2 \mid T, t_1, \ldots, t_{2n-3})$$

The Proportional Branch Lengths Approach
The proportional branch lengths approach assumes that the branch lengths of the two trees are the same up to a scaling factor r. Thus, if $t_1, \ldots, t_{2n-3}$ are the branch lengths of the first gene, the branch lengths of the second gene are $rt_1, \ldots, rt_{2n-3}$. This scaling factor r corresponds to a gene-specific rate that is assigned to each gene, and for N genes we have N gene-specific rate factors $r_1, \ldots, r_N$. The average r should be equal to 1.0. We refer to this model as the "proportional model." For two data sets the joint probability is

$$P(\mathrm{data}_1, \mathrm{data}_2 \mid T, t_1, \ldots, t_{2n-3}, r) = P(\mathrm{data}_1 \mid T, t_1, \ldots, t_{2n-3}) \times P(\mathrm{data}_2 \mid T, rt_1, \ldots, rt_{2n-3})$$

Consequently, the total number of branch-length parameters for two genes under this proportional model is 2n - 3 + 1 = 2n - 2. The number of parameters for an arbitrary number of genes under each of the three models is summarized in table 1.
The Number of Among-Site Rate Variation Parameters
We consider three possible models of among-site rate variation. The first model assumes that all sites have the same rate of evolution ("homogenous" model), the second model assumes one gamma rate parameter for all genes ("1-GAM" model), and the third model assumes a separate gamma parameter for each gene ("N-GAM" model).
We compare all nine combinations of models with respect to likelihood (concatenate-homogenous, concatenate-1-GAM, concatenate-N-GAM, proportional-homogenous, proportional-1-GAM, proportional-N-GAM, separate-homogenous, separate-1-GAM, separate-N-GAM). With respect to branch lengths, we show that the proportional and separate models are always better than the concatenate model. Which of these two models is preferable depends on the data set under study: for some data sets the proportional model is the best model, whereas for others the separate model is the best. With respect to the number of gamma parameters, the N-GAM model is the best model for all the data sets included in our study.
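The parameter counts behind these nine combinations can be tallied mechanically. The following is a minimal Python sketch, not the authors' C++ program. It assumes that the separate model fits a full set of 2n - 3 branch lengths per gene, that the proportional model adds one free gene-specific rate per gene beyond the first (the average rate being fixed at 1.0), and that the homogenous, 1-GAM, and N-GAM models add zero, one, and one-per-gene gamma shape parameters. The function names are hypothetical.

```python
def branch_length_params(branch_model: str, n_seqs: int, n_genes: int) -> int:
    """Free branch-length parameters for an unrooted tree with n_seqs tips."""
    branches = 2 * n_seqs - 3
    if branch_model == "concatenate":
        return branches                      # one shared set of branch lengths
    if branch_model == "proportional":
        return branches + (n_genes - 1)      # shared set plus free gene rates
    if branch_model == "separate":
        return branches * n_genes            # assumed: one full set per gene
    raise ValueError(f"unknown branch-length model: {branch_model}")


def gamma_params(rate_model: str, n_genes: int) -> int:
    """Free gamma shape parameters under each among-site rate model."""
    counts = {"homogenous": 0, "1-GAM": 1, "N-GAM": n_genes}
    return counts[rate_model]


def total_params(branch_model: str, rate_model: str,
                 n_seqs: int, n_genes: int) -> int:
    """Total free parameters for one of the nine model combinations."""
    return (branch_length_params(branch_model, n_seqs, n_genes)
            + gamma_params(rate_model, n_genes))


if __name__ == "__main__":
    # Example sizes: 56 sequences and 12 genes, as in the mitochondrial data set.
    for branch_model in ("concatenate", "proportional", "separate"):
        for rate_model in ("homogenous", "1-GAM", "N-GAM"):
            k = total_params(branch_model, rate_model, 56, 12)
            print(f"{branch_model}-{rate_model}: {k}")
```

For two genes the proportional count reduces to 2n - 3 + 1 = 2n - 2, in agreement with the text.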
Material and Methods |
---|
Madsen Data Set
The Madsen nucleotide alignment includes 28 species for four independent nuclear genes: the alpha-2B adrenergic receptor (A2AB, 344 sites), the breast cancer susceptibility gene (BRCA1, 557 sites), the interphotoreceptor retinoid-binding protein (IRBP, 301 sites), and the von Willebrand factor (vWF, 338 sites). The sequences of the golden mole (Amblysomus hottentotus) and the Madagascar hedgehog (Echinops telfairi) were not included because sequence data were not available for IRBP. The BRCA1 sequence of the thick-tailed opossum (Lutreolina crassicaudata; accession number: AY057826) was added manually to the Madsen alignment.
Murphy Data Set
Among the 18 genes considered in the nucleotide alignment of Murphy et al. (2001), we excluded the seven noncoding genes. Because we required that all genes share the same species sampling (i.e., no missing sequences), three more genes for which marsupial sequences were unavailable were excluded. Two genes with poor species sampling were also excluded. This exclusion allows our analysis to maintain a large and diversified species sampling in which all sequences are available for all genes. The final alignment includes 46 species for six nuclear genes: adenosine A3 receptor (ADORA3, 107 sites), the Menkes disease gene (ATP7A, 220 sites), the brain-derived neurotrophic factor (BDNF, 182 sites), the cannabinoid receptor 1 (CNR1, 219 sites), the sphingolipid G-protein-coupled receptor 1 (EDG1, 199 sites), and the zinc finger protein X-linked (ZFX, 67 sites).
Mitochondrial Data Set
The mitochondrial data set included 56 species comprising the 43 complete mitochondrial coding sequences analyzed by Nikaido et al. (2001), together with the following sequences: (1) Asiatic shrew, Soriculus fumidus, AF348081; (2) long-tailed bat, Chalinolobus tuberculatus, AF321051; (3) little red flying fox, Pteropus scapulatus, AF321050; (4) northern brown bandicoot, Isoodon macrourus, AF358864; (5) gymnure, Echinosorex gymnura, AF348079; (6) American pika, Ochotona princeps, AF348080; (7) barbary ape, Macaca sylvanus, AJ309865; (8) slow loris, Nycticebus coucang, AJ309867; (9) white-fronted capuchin, Cebus albifrons, AJ309866; (10) cane rat, Thryonomys swinderianus, AJ301644; (11) vole, Volemys kikuchii, AF348082; (12) tree shrew, Tupaia belangeri, AF217811; and (13) small Madagascar hedgehog, E. telfairi, AJ400734. The 12 H-strand mitochondrial protein-coding genes are ND1 (313 sites), ND2 (313 sites), COX1 (512 sites), COX2 (225 sites), ATP8 (32 sites), ATP6 (201 sites), COX3 (259 sites), ND3 (104 sites), ND4L (94 sites), ND4 (438 sites), ND5 (526 sites), and Cytb (375 sites). The overlapping regions between ATP6 and ATP8 and between ND4 and ND4L were excluded.
All nucleotide alignments were translated into amino acid alignments. To agree with the reading frame, some minor changes were made to the alignments of Madsen et al. (2001) and Murphy et al. (2001). For all genes, gap positions were excluded from the analysis. Positions for which data were missing in >5% of the species studied were also excluded. All the protein alignments and accession numbers are attached as supplementary material at http://www.molbiolevol.org/.
Tree Topologies
For each data set four different topologies were considered: a morphological tree, a mitochondrial tree, and two nuclear trees (Madsen and Murphy topologies). Because species sampling differed among the three data sets, the four trees differed slightly from one data set to another. For the mitochondrial data set the morphological and mitochondrial trees are presented in figures 1 and 2, respectively. For the Murphy data set the Murphy tree is given in figure 3, and for the Madsen data set the Madsen tree is given in figure 4. All 12 trees are attached as supplementary material at http://www.molbiolevol.org/.
Mitochondrial Tree
The mitochondrial tree is based on Cao et al. (2000). However, the position of the rodents was chosen in agreement with Reyes, Pesole, and Saccone (2000) and Mouchaty et al. (2001). The position of the vole among the rodents was chosen in agreement with morphological data (McKenna and Bell 1997). Among cetartiodactyls, the alpaca and the pig were sister clades in agreement with Arnason et al. (2000). The relationships among bats were chosen in agreement with Nikaido et al. (2001) and McKenna and Bell (1997). The shrews and moles were placed as a sister clade of the bats, in agreement with Nikaido et al. (2001). The Afrotheria phylogeny was in agreement with Murphy et al. (2001). Xenarthra was placed as a sister clade of Afrotheria, in agreement with Reyes, Pesole, and Saccone (2000). The positions of the rabbit, tree shrew, primates, and hedgehog were in agreement with Schmitz, Ohme, and Zischler (2000). The lagomorphs were considered monophyletic. The relationships among primates followed McKenna and Bell (1997). Hedgehog and gymnure were placed together. Finally, relationships among marsupials were taken from Phillips et al. (2001). For the other relationships we followed Murphy et al. (2001). Figure 2 presents the mitochondrial tree for the mitochondrial data set.
The Murphy Tree
This tree was based on Murphy et al. (2001). McKenna and Bell (1997) were followed for relationships that were not determined by Murphy et al. Figure 3 presents the Murphy tree for the Murphy data set.
The Madsen Tree
This tree is based on Madsen et al. (2001, fig. A). Murphy et al. (2001) and McKenna and Bell (1997) were followed for relationships that were not determined by Madsen et al. Figure 4 presents the Madsen tree for the Madsen data set.
Model
In this study, models based on amino acid sequences were used. The replacement probabilities among amino acids were calculated with the JTT matrix (Jones, Taylor, and Thornton 1992) for nuclear genes and the REV model (Adachi and Hasegawa 1996) for mitochondrial genes. However, the approach presented here is also valid for nucleotide sequences and for any substitution model. The alpha parameter of the gamma distribution was estimated using the ML method. The discrete gamma distribution with four categories was used (Yang 1994). A program implementing all nine models described above was written in C++ and is attached as supplementary material at: http://www.molbiolevol.org/.
Procedures for calculation of the likelihood functions were adapted from the SEMPHY program (Friedman et al. 2001).
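To make the four-category discrete gamma concrete, here is a minimal Python sketch. It is an illustration only, not the authors' implementation. It assumes the common convention of four equal-probability categories, each represented by its mean rate, with the gamma distribution scaled to mean 1; whether the original program used category means or medians is not stated here. The helper names are hypothetical and only the standard library is used.

```python
import math


def reg_lower_gamma(a: float, x: float) -> float:
    """Regularized lower incomplete gamma function P(a, x), by power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0
    total = 1.0
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= x / (a + n)
        total += term
    return total * math.exp(a * math.log(x) - x - math.lgamma(a + 1.0))


def gamma_quantile(a: float, p: float) -> float:
    """Quantile of a Gamma(shape=a, scale=1) variable, found by bisection."""
    lo, hi = 0.0, 1.0
    while reg_lower_gamma(a, hi) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if reg_lower_gamma(a, mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def discrete_gamma_rates(alpha: float, n_cat: int = 4) -> list:
    """Mean relative rate of each equal-probability category (overall mean 1)."""
    cuts = [gamma_quantile(alpha, i / n_cat) for i in range(1, n_cat)]
    # The partial mean of a Gamma(alpha) variable is expressed through the
    # distribution function of a Gamma(alpha + 1) variable.
    cdf = [0.0] + [reg_lower_gamma(alpha + 1.0, c) for c in cuts] + [1.0]
    return [n_cat * (cdf[i + 1] - cdf[i]) for i in range(n_cat)]


if __name__ == "__main__":
    for alpha in (0.12, 0.5, 0.93):
        rates = discrete_gamma_rates(alpha)
        print(alpha, [round(r, 4) for r in rates])
```

Smaller alpha values spread the four rates further apart, which is why a single shared alpha can fit poorly when genes differ in rate heterogeneity.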
To compare the different models, the Akaike Information Criterion (AIC), defined as AIC = -2 × log-likelihood + 2 × number of free parameters, was used (Sakamoto, Ishiguro, and Kitagawa 1986). A model with a lower AIC is considered more appropriate (Sakamoto, Ishiguro, and Kitagawa 1986). To evaluate whether the difference between the AIC values of two models is significant, the test of Linhart (1988) was used. When comparing different tree topologies under the same model, the one-tailed Kishino-Hasegawa test was used (Kishino and Hasegawa 1989).
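The AIC definition above translates directly into code. The sketch below is a hedged illustration: the parameter count for the proportional model with one gamma parameter per gene (2n - 3 branch lengths, one free rate per gene beyond the first, one alpha per gene) is an assumption, although it reproduces the AIC value reported in the Results for the 56-species, 12-gene mitochondrial data set. The function name is hypothetical.

```python
def aic(log_likelihood: float, n_free_params: int) -> float:
    """Akaike Information Criterion: -2 * lnL + 2 * (number of free parameters)."""
    return -2.0 * log_likelihood + 2.0 * n_free_params


# Mitochondrial data set, mitochondrial tree, proportional model with N-GAM.
n_seqs, n_genes = 56, 12
# Assumed count: 109 branch lengths + 11 free gene rates + 12 alpha parameters.
k = (2 * n_seqs - 3) + (n_genes - 1) + n_genes
log_likelihood = -90999.3  # value quoted in the Results

if __name__ == "__main__":
    print(k, round(aic(log_likelihood, k), 1))
```

Because the penalty term is linear in the number of parameters, the separate model must improve the log-likelihood by more than one unit per extra parameter to beat the proportional model on AIC.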
Results |
---|
For the 12 mitochondrial genes studied (table 5), the AIC values obtained using the proportional model were significantly lower than those obtained with either the concatenate model or the separate model. For example, under the N-GAM model and the mitochondrial tree, the AIC value was 182,262.6 for the proportional model, whereas the separate and concatenate models gave AIC values of 182,483.55 and 182,619.42, respectively. The difference of 221 between the proportional model and the separate model is significant (P < 0.05). Thus, the proportional model is significantly better than both the separate model and the concatenate model. However, the AIC difference of 136 between the separate model and the concatenate model (table 5) is not statistically significant.
Among the four different tree topologies, for all models the most likely tree was the mitochondrial tree. For the N-GAM model with proportional branch length, the log-likelihood of the mitochondrial tree was -90,999.3, whereas the Murphy tree was second best at -91,022.96. This difference of 23.66 ± 47.84 corresponds to a P value of 0.31, using the Kishino-Hasegawa test (Kishino and Hasegawa 1989). Hence, when using the proportional model, the mitochondrial tree is not significantly different from the Murphy tree. Similar results were obtained when comparing the mitochondrial tree and the Madsen tree (P value of 0.21 using the Kishino-Hasegawa test). However, the morphological tree was rejected when compared with all other trees (log-likelihood difference > 742; P < 0.001).
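The reported P values can be approximately recovered from the quoted log-likelihood differences. The sketch below assumes that the "±" figure is the standard error of the difference and that the one-tailed P value comes from a normal approximation; the function name is hypothetical and this is not the authors' code.

```python
import math


def kh_one_tailed_p(delta_lnl: float, std_err: float) -> float:
    """One-tailed P value for a log-likelihood difference (normal approximation)."""
    z = delta_lnl / std_err
    # Upper-tail probability of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


if __name__ == "__main__":
    # Mitochondrial tree versus Murphy tree, proportional model with N-GAM.
    print(round(kh_one_tailed_p(23.66, 47.84), 2))
```

A difference smaller than about 1.64 standard errors is therefore not significant at the 5% level under this approximation.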
Murphy Data Set
For the six nuclear genes studied, again the AIC values obtained using the proportional model were significantly lower than those obtained with either the concatenate model or the separate model. For example, under the N-GAM model and the Madsen tree, the AIC value was 23,287.7 for the proportional model, whereas the separate and concatenate models gave 23,464.2 and 23,427.3, respectively (table 6). The difference of 176.5 between the proportional model and the separate model is significant (P < 0.05). As for the mitochondrial data set, the AIC differences between the separate model and the concatenate model were not significant. The most likely tree topology is the Madsen topology for all models except the concatenate-N-GAM model, for which the best tree is the Murphy tree; this contrasts with the observations of Murphy et al. (2001). This discrepancy most likely arises from our use of amino acid data sets instead of nucleotide data sets, as well as from differences in alignment. However, the difference in log-likelihood between the Madsen topology and the Murphy topology is very small and nonsignificant in each case (i.e., log-likelihood difference < 10; table 6). For example, for the proportional-N-GAM model the log-likelihood difference between the Madsen and the Murphy tree topologies is 1.87 ± 8.22, corresponding to a P value of 0.41 by the Kishino-Hasegawa test. Assuming this model, both the mitochondrial tree and the morphological tree are significantly worse than the Madsen tree (log-likelihood difference > 44; P < 0.01).
Madsen Data Set
Unlike the results obtained for the other two data sets, the lowest AIC values for the Madsen data set were obtained with the separate model regardless of the number of gamma parameters assumed. For example, under the N-GAM model and the Murphy tree, the AIC value was 62,738.6 for the separate model, whereas the proportional and concatenate models gave 62,933.6 and 63,152.2, respectively. The difference of 195 between the proportional model and the separate model is significant (P < 0.01). Here, the proportional model is also significantly better than the concatenate model (P < 0.01). The most likely tree topology was the Murphy topology for all the models considered. Surprisingly, the Murphy tree appears to be significantly better than all other tree topologies. For example, for the separate-N-GAM model the log-likelihood difference between the Madsen (the second best tree) and Murphy tree topologies is 53.19 ± 20.62, corresponding to P < 0.05 by the Kishino-Hasegawa test.
Tree Search
In the above analyses, we investigated the differences in support for four predetermined tree topologies under nine different models. As can be seen from table 6, the model can have an impact on the best topology found. For the Murphy data set under the concatenate-N-GAM model, the Murphy tree appears to be the best, whereas under all the other models the Madsen tree is the best.
To further determine the effect of the model on tree topology, we implemented a tree-search algorithm to find the most likely tree under each of the models. Because of computational limitations, the tree search was conducted on 14 taxa representing the main mammalian clades, using a subset of the mitochondrial data set (fig. 5). From several starting trees (a neighbor-joining tree, a tree based on the mitochondrial topology, and a tree based on the Murphy topology), we searched the tree space for better trees using the nearest-neighbor interchange (NNI) algorithm. We also limited our searches to the two best models found above (i.e., the proportional-N-GAM model and the separate-N-GAM model). The alpha parameters and the gene-specific rates for this search were based on the corresponding N-GAM analysis of the complete mitochondrial data set.
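A quick calculation illustrates why a heuristic search is needed even for 14 taxa. The sketch below uses two standard combinatorial facts about unrooted binary trees rather than anything specific to the authors' program: the number of distinct topologies is (2n - 5)!!, and each of the n - 3 internal branches yields two NNI rearrangements. The function names are hypothetical.

```python
def n_unrooted_topologies(n_taxa: int) -> int:
    """Number of unrooted binary tree topologies, (2n - 5)!! for n >= 3 taxa."""
    count = 1
    for odd in range(3, 2 * n_taxa - 4, 2):
        count *= odd
    return count


def n_nni_neighbors(n_taxa: int) -> int:
    """Number of trees one NNI step away: two per internal branch."""
    return 2 * (n_taxa - 3)


if __name__ == "__main__":
    print(n_unrooted_topologies(14), n_nni_neighbors(14))
```

Each NNI round thus evaluates only a few dozen candidate trees, which is why several starting trees are used to reduce the risk of stopping at a local optimum.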
Discussion |
---|
Model Selection
Currently, the method of concatenating sequences is most frequently used for analyzing multiple data sets (Reyes, Pesole, and Saccone 2000; Madsen et al. 2001; Murphy et al. 2001). In contrast, our results show that both the proportional model and the separate model yield results that are superior to the concatenate model. This is in agreement with the result of Cao et al. (1998), who showed that for mitochondrial data, separate analysis is superior to the concatenate method. However, selecting the superior of the separate and proportional models is a more complicated issue. The proportional model is more appropriate for analysis of the mitochondrial and Murphy data sets, whereas the separate model is favored for the Madsen data set. The separate analysis is preferable when each gene is of sufficient length to allow accurate determination of a separate set of branch lengths. In our study the sequence lengths in the Madsen data set were longer than those in the Murphy or mitochondrial data sets. When only the first 100 positions in each gene of the Madsen data set were analyzed, the proportional model was the best model (data not shown). Another factor that can affect model selection is inconsistent branch lengths among different genes in a data set. If, for example, significant rate acceleration occurred for only one gene in a specific clade, then analysis by the separate model would be justified. Thus, it is expected that the proportional model would be most appropriate when analyzing closely related sequences. Yang (1996b) showed that the proportional model was superior to the separate analysis for a data set composed of four parts: the first-, second-, and third-codon positions of six mitochondrial genes, and a fourth part composed of 11 tRNA sequences. We speculate that this proportional model preference is due to the fact that Yang's analysis considered only primate sequences.
In our analyses it proved more fruitful to assume a different gamma parameter for each gene rather than a single gamma parameter for all genes. Each gamma distribution involves a shape parameter alpha. This alpha parameter determines the shape of the gamma distribution and is inversely related to the extent of rate variation among sites. Some genes show substantial rate variation among sites, whereas others exhibit a more homogenous distribution of rates across positions. For example, for the Murphy data set, assuming the best model, the alpha parameter ranges from 0.93 for the ATP7A gene to 0.12 for the CNR1 gene (see table 3). We conclude that the substantial difference in rate distribution among genes is sufficient to justify a separate gamma parameter for each gene.
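As a rough guide to what these alpha values mean, assume (as is conventional, though not stated explicitly here) that the gamma distribution of relative rates r is scaled to have mean 1. The variance of the rates is then the reciprocal of the shape parameter, which gives for the two extreme values quoted above:

$$\mathrm{Var}(r) = \frac{1}{\alpha}, \qquad \alpha = 0.93 \;\Rightarrow\; \mathrm{Var}(r) \approx 1.08, \qquad \alpha = 0.12 \;\Rightarrow\; \mathrm{Var}(r) \approx 8.33$$

Under this assumption the among-site rate variance differs nearly eightfold between the two genes, which a single shared alpha cannot capture.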
Regarding the effect of the model on tree selection, our results show that the model chosen has an effect on tree topology. It is expected that the model would also affect bootstrap support for different clades, and molecular date estimation based on several genes. More simulation studies and improvements in computational techniques are required to explore fully the effect of these different models on phylogeny reconstruction.
Before selecting a model that combines different genes, one must consider whether there is a basis for combining the genes of interest in the first place. To address this issue, Huelsenbeck and Bull (1996) proposed a likelihood ratio test designed to detect conflicting phylogenetic signals among genes. Regarding the genes used in our study, we followed Cao et al. (2000), Madsen et al. (2001), and Murphy et al. (2001) and assumed that there is agreement between the gene tree and the species tree. Of course, before any analysis of a new data set, such an assumption should be verified (for review see Huelsenbeck, Bull, and Cunningham 1996).
Mammalian Phylogeny
For all the models and data sets considered in our study, the morphological tree had the lowest log-likelihood values, significantly lower than those of the other trees (results of the Kishino-Hasegawa test not shown). Many traditional morphological clades are not supported by molecular phylogeny analysis (see Springer et al. 1997, 1999; Murphy et al. 2001), as exemplified by the clades Archonta (bats and primates), Anagalida (elephant shrew and glires), and Ungulata (aardvark, horses, cows, whales, elephants, dugongs, and hyraxes). Interestingly, the McKenna tree (McKenna and Bell 1997) has also been challenged by recent morphological discoveries. For example, Thewissen et al. (2001) confirmed a close relationship between Cetacea and Artiodactyla, whereas Cetacea was previously considered a sister clade of Mesonychia.
Both the mitochondrial and the nuclear data sets support their respective trees for all the models considered. Our results for the mitochondrial data set show that there is no significant difference in likelihood between the mitochondrial tree and the nuclear trees when using the 1-GAM or the N-GAM models (P > 0.05; results of the Kishino-Hasegawa test not shown). However, with the homogenous models the mitochondrial tree was found to be significantly better than both the Madsen and the Murphy trees (P < 0.03; results of the Kishino-Hasegawa test not shown). This is in agreement with Sullivan and Swofford (1997), who showed that simplified models could lead to systematic errors.
Our results for the two nuclear data sets reject the mitochondrial tree for all the models considered (P < 0.05; results of the Kishino-Hasegawa test not shown). Thus, the nuclear data sets discriminate more than the mitochondrial data set between alternative topologies. Hence, it is apparent that there is more "phylogenetic signal" in the nuclear genes (e.g., Springer et al. 2001). The main differences between the mitochondrial tree and the nuclear trees are that (1) Eulipotyphla insectivores (hedgehogs, moles, shrews) are paraphyletic in the mitochondrial tree and the Erinaceidae (hedgehogs) are the most basal mammalian taxa in the mitochondrial tree; (2) glires and Euarchonta (primates, flying lemur, tree shrews) do not cluster in a single clade (the Euarchontoglires) in the mitochondrial tree but appear paraphyletic at the base of the placental tree; (3) rodents are paraphyletic in the mitochondrial tree and monophyletic in the nuclear trees; and (4) consequently, Afrotheria and Xenarthra (armadillos, anteaters, and sloths) are at the base of the placental tree in the nuclear trees but have a more internal position in the mitochondrial tree.
When comparing the two nuclear trees, the Madsen data set supports the Murphy tree, and the Murphy data set supports the Madsen tree (for eight out of the nine models). For the Murphy data set the differences are not significant; however, the Madsen data set significantly supports the Murphy tree. Both trees support the same topology between the four main clades, Laurasiatheria, Euarchontoglires, Xenarthra, and Afrotheria, and any differences concern only the relationships within these four clades. It is worth noting that the full NNI tree search on the subset of the mitochondrial data set led to a tree supporting these four main clades as well as rodent monophyly. Our results suggest that the Murphy tree is probably closer to the "true tree" than is the Madsen tree. However, we speculate that the true tree lies between these two alternative nuclear trees, and additional gene sequences and the development of better models will help to address these questions.
Acknowledgements |
---|
Footnotes |
---|
Keywords: combining data sets, phylogeny, maximum likelihood, Mammalia, molecular evolution
Address for correspondence and reprints: Tal Pupko, The Institute of Statistical Mathematics, 4-6-7 Minami-Azabu, Minato-ku, Tokyo 106-8569, Japan. E-mail: tal{at}ism.ac.jp
References |
---|
Adachi J., M. Hasegawa, 1996 MOLPHY version 2.3: programs for molecular phylogenetics based on maximum likelihood Comput. Sci. Monogr 28:1-150
Arnason U., A. Gullberg, S. Gretarsdottir, B. Ursing, A. Janke, 2000 The mitochondrial genome of the sperm whale and a new molecular reference for estimating eutherian divergence dates J. Mol. Evol 50:569-578
Burnham K. P., D. R. Anderson, 1998 Model selection and inference: a practical information-theoretic approach Springer-Verlag, New York
Cao Y., M. Fujiwara, M. Nikaido, N. Okada, M. Hasegawa, 2000 Interordinal relationships and timescale of eutherian evolution as inferred from mitochondrial genome data Gene 259:149-158
Cao Y., A. Janke, P. J. Waddell, M. Westerman, O. Takenaka, S. Murata, N. Okada, S. Pääbo, M. Hasegawa, 1998 Conflict among individual mitochondrial proteins in resolving the phylogeny of eutherian orders J. Mol. Evol 47:307-322
Corneli P. S., R. H. Ward, 2000 Mitochondrial genes and mammalian phylogenies: increasing the reliability of branch length estimation Mol. Biol. Evol 17:224-234
Felsenstein J., 1981 Evolutionary trees from DNA sequences: a maximum likelihood approach J. Mol. Evol 17:368-376
Friedman N., M. Ninio, I. Pe'er, T. Pupko, 2001 A structural EM algorithm for phylogenetic inference Pp. 132-140 in T. Lengauer, D. Sankoff, S. Istrail, P. Pevzner, and M. Waterman, eds. Proceedings of the Fifth Annual International Conference on Computational Biology. ACM Press, New York
Graur D., W. H. Li, 1999 Fundamentals of molecular evolution. 2nd edition Sinauer Press, Sunderland, Mass
Huelsenbeck J. P., J. J. Bull, 1996 A likelihood ratio test to detect conflicting phylogenetic signal Syst. Biol 45:92-98
Huelsenbeck J. P., J. J. Bull, C. W. Cunningham, 1996 Combining data in phylogenetic analysis Trends Ecol. Evol 11:152-157
International Human Genome Sequencing Consortium. 2001 Initial sequencing and analysis of the human genome Nature 409:860-921
Jones D. T., W. R. Taylor, J. M. Thornton, 1992 The rapid generation of mutation data matrices from protein sequences Comput. Appl. Biosci 8:275-282
Kishino H., M. Hasegawa, 1989 Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in Hominoidea J. Mol. Evol 29:170-179
Linhart H., 1988 A test whether two AIC's differ significantly S. Afr. Stat. J 22:153-161
Madsen O., M. Scally, C. J. Douady, D. J. Kao, R. W. DeBry, R. Adkins, H. M. Amrine, M. J. Stanhope, W. W. de Jong, M. S. Springer, 2001 Parallel adaptive radiations in two major clades of placental mammals Nature 409:610-614
McKenna M. C., S. K. Bell, 1997 Classification of mammals above the species level Columbia University Press, New York
Mouchaty S. K., F. M. Catzeflis, A. Janke, U. Arnason, 2001 Molecular evidence of an African Phiomorpha-South-American Caviomorpha clade and support for Hystricognathi based on the complete mitochondrial genome of cane rat (Thryonomys swinderianus) Mol. Phylogenet. Evol 18:127-135
Murphy W. J., E. Eizirik, W. E. Johnson, Y. P. Zhang, O. A. Ryder, S. J. O'Brien, 2001 Molecular phylogenetics and the origins of placental mammals Nature 409:614-618
Nei M., S. Kumar, 2000 Molecular evolution and phylogenetics Oxford University Press, New York
Nikaido M., K. Kawai, Y. Cao, M. Harada, S. Tomita, N. Okada, M. Hasegawa, 2001 Maximum likelihood analysis of the complete mitochondrial genomes of eutherians and a reevaluation of the phylogeny of bats and insectivores J. Mol. Evol 53:508-516
Novacek M. J., 1992 Mammalian phylogeny: shaking the tree Nature 356:121-125
Phillips M. J., Y.-H. Lin, G. Harrison, D. Penny, 2001 Mitochondrial genomes of a bandicoot and a brushtail possum confirm the monophyly of australidelphian marsupials Proc. R. Soc. Lond. B 268:1533-1538
Reyes A., G. Pesole, C. Saccone, 2000 Long-branch attraction phenomenon and the impact of among-site rate variation on rodent phylogeny Gene 259:177-187
Sakamoto Y., M. Ishiguro, G. Kitagawa, 1986 Akaike information criterion statistics Reidel, Dordrecht, The Netherlands
Schmitz J., M. Ohme, H. Zischler, 2000 The complete mitochondrial genome of Tupaia belangeri and the phylogenetic affiliation of Scandentia to other Eutherian orders Mol. Biol. Evol 17:1334-1343
Springer M. S., H. M. Amrine, A. Burk, M. J. Stanhope, 1999 Additional support for Afrotheria and Paenungulata, the performance of mitochondrial versus nuclear genes, and the impact of data partitions with heterogeneous base composition Syst. Biol 48:65-75
Springer M. S., A. Burk, J. R. Kavanagh, V. G. Waddell, M. J. Stanhope, 1997 The interphotoreceptor retinoid binding protein gene in therian mammals: implications for higher level relationships and evidence for loss of function in the marsupial mole Proc. Natl. Acad. Sci. USA 94:13754-13759
Springer M. S., R. W. DeBry, C. Douady, H. M. Amrine, O. Madsen, W. W. de Jong, M. J. Stanhope, 2001 Mitochondrial versus nuclear gene sequences in deep-level mammalian phylogeny reconstruction Mol. Biol. Evol 18:132-143
Sullivan J., D. L. Swofford, 1997 Are guinea pigs rodents? The importance of adequate models in molecular phylogenetics J. Mamm. Evol 4:77-86
Takahashi K., M. Nei, 2000 Efficiencies of fast algorithms of phylogenetic inference under the criteria of maximum parsimony, minimum evolution, and maximum likelihood when a large number of sequences are used Mol. Biol. Evol 17:1251-1258
Thewissen J. G. M., E. M. Williams, J. L. Roe, S. T. Hussain, 2001 Skeletons of terrestrial cetaceans and the relationships of whales to artiodactyls Nature 413:277-281
Yang Z., 1994 Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods J. Mol. Evol 39:306-314
Yang Z., 1996a. Among-site rate variation and its impact on phylogenetics analysis Trends Ecol. Evol 11:367-372
Yang Z., 1996b. Maximum-likelihood models for combined analyses of multiple sequence data J. Mol. Evol 42:587-596