European Molecular Biology Laboratory (EMBL), Biocomputing Unit, Heidelberg, Germany
Introduction
To determine whether some guidelines can be established, I conducted a comprehensive analysis of complete or nearly complete cytochrome b sequences using a single method of tree reconstruction and genetic distance estimation. In particular, I measured genetic distances from maximum-likelihood trees, and among-site rate variation was allowed in the model of evolution to avoid underestimating distances among the most divergent species (Golding 1983; Yang 1996). Understanding the general trends and, very importantly, the main reasons for possible biases in the mammalian classification should make it easier to propose more consistent taxonomic revisions from a molecular perspective.
Materials and Methods
Phylogenetic Analysis
Phylogenetic analyses were performed using PAUP*, version 4.0b4a (Swofford 1998). The model of evolution used in all calculations was the HKY model (Hasegawa, Kishino, and Yano 1985) with among-site rate variation modeled by a discrete gamma distribution with six rate categories. Parameter estimates have been shown to be more reliable for large trees (Excoffier and Yang 1999). Thus, the parameters of this model were estimated from an alignment of 632 full-length mammalian sequences, which yielded a transition/transversion ratio of 4.2 and a gamma shape parameter of 0.54. Randomly chosen subsets with fewer sequences produced successively lower values for both parameters, indicating that the values obtained from the complete set were close to the saturation point. For families with only two representatives, the maximum-likelihood distance for the single pair was calculated. For families with three or more representatives, a neighbor-joining tree was first built and its branch lengths were then optimized by maximum likelihood. Distances between all pairs of species were then measured as the sum of the branch lengths separating them in the tree (patristic distances).
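The patristic-distance step can be illustrated with a minimal Python sketch. It assumes that the tree with maximum-likelihood branch lengths has already been obtained (in this study, with PAUP*) and is stored as a hypothetical parent-pointer table; the tree, the node names, and the functions `distances_to_ancestors` and `patristic` are illustrative and not part of the original analysis.

```python
from itertools import combinations

# Hypothetical rooted tree stored as child -> (parent, branch length in
# substitutions per site). Topology and lengths are illustrative only.
TREE = {
    "A": ("n1", 0.05),
    "B": ("n1", 0.07),
    "n1": ("n2", 0.02),
    "C": ("n2", 0.10),
    "n2": ("root", 0.03),
    "D": ("root", 0.20),
}


def distances_to_ancestors(node, tree):
    """Map node and each of its ancestors to the summed branch length from node."""
    dist = {node: 0.0}
    total = 0.0
    while node in tree:
        parent, length = tree[node]
        total += length
        dist[parent] = total
        node = parent
    return dist


def patristic(a, b, tree):
    """Sum of branch lengths on the path joining a and b through their last common ancestor."""
    up_from_a = distances_to_ancestors(a, tree)
    node, up_from_b = b, 0.0
    # Climb from b until reaching a node on a's path to the root: that node is the LCA.
    while node not in up_from_a:
        parent, length = tree[node]
        up_from_b += length
        node = parent
    return up_from_a[node] + up_from_b


for a, b in combinations(["A", "B", "C", "D"], 2):
    print(a, b, round(patristic(a, b, TREE), 3))
```

Storing each node's parent keeps the lookup simple: the first ancestor of b that also lies on a's path to the root is their last common ancestor, so the two partial sums add up to the patristic distance.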
Statistical Analysis
Pairwise distances were divided into intrageneric and intergeneric distances. In histograms of unweighted distances, genera and families with large numbers of species are overwhelmingly represented, because the number of pairwise distances in a genus or family grows in proportion to the square of its number of species. On the other hand, averaging the pairwise distances within every genus and family, as was done previously (Johns and Avise 1998), discards the information about the most biased values, which are one of the interests of the present work. Therefore, each distance was weighted so that every genus or family contributes to the histogram in proportion to its number of species rather than to the square of this number. Specifically, intrageneric distances were given a weight of 2/(S - 1), where S is the number of species in the corresponding genus, and intergeneric distances were given a weight of 1/√D, where D is the number of intergeneric distances in the corresponding family. Because a genus of S species yields S(S - 1)/2 intrageneric distances, all intrageneric distances in a genus together contribute a total weight equal to S, the number of species in the genus; likewise, all intergeneric distances in a family together contribute √D, which grows approximately in proportion to the number of species in the family. Other weighting schemes with the same objective produced similar results.
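As a sketch of the weighting scheme, the two weights can be computed and checked against the property stated above: a genus of S species contributes a total weight of S, and a family with D intergeneric distances contributes √D. The function names and the example family are hypothetical.

```python
import math


def intrageneric_weight(n_species):
    """Weight 2/(S - 1) given to each pairwise distance inside a genus of S species."""
    if n_species < 2:
        raise ValueError("a genus needs at least two species to have pairwise distances")
    return 2.0 / (n_species - 1)


def intergeneric_weight(n_distances):
    """Weight 1/sqrt(D) given to each of the D intergeneric distances in a family."""
    return 1.0 / math.sqrt(n_distances)


def count_intergeneric_pairs(genus_sizes):
    """D: number of species pairs in a family whose two members belong to different genera."""
    n = sum(genus_sizes)
    all_pairs = n * (n - 1) // 2
    within = sum(s * (s - 1) // 2 for s in genus_sizes)
    return all_pairs - within


# Hypothetical family with three genera of 5, 3 and 2 sampled species.
sizes = [5, 3, 2]
for s in sizes:
    pairs = s * (s - 1) // 2
    # The total weight of a genus equals its species count S.
    print(f"genus of {s}: {pairs} distances, total weight {pairs * intrageneric_weight(s):.3f}")

d = count_intergeneric_pairs(sizes)
# The total intergeneric weight of the family equals sqrt(D).
print(f"family: {d} intergeneric distances, total weight {d * intergeneric_weight(d):.3f}")
```

Here the genus of five species has 10 distances of weight 0.5 each (total 5), and the family has 31 intergeneric distances whose weights sum to √31, roughly 5.57.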
Body masses of the species analyzed here were taken from Silva and Downing (1995). Correlations between the logarithm of the average body mass and the maximum intrageneric distance in each genus were calculated with the program JMP, version 3.2.6 (SAS Institute, Cary, N.C.). Phylogenetically independent comparisons (Harvey and Pagel 1991) were not used for this calculation because the maximum intrageneric distance in each genus depends on a taxonomic decision.
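The genus-level correlation can be sketched in Python as a stand-in for the JMP calculation. Each genus contributes one point, the base-10 logarithm of the mean body mass of its species against its maximum intrageneric distance, and a Pearson coefficient is computed over genera. The genera, masses, and distances below are invented placeholders, and the use of log10 is an assumption (the text says only "logarithm").

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def genus_point(masses_g, intrageneric_distances):
    """One point per genus: (log10 of mean body mass in g, max intrageneric distance)."""
    mean_mass = sum(masses_g) / len(masses_g)
    return math.log10(mean_mass), max(intrageneric_distances)


# Hypothetical genera: species body masses (g) and all intrageneric distances (subs/site).
genera = {
    "small_genus": ([20.0, 25.0, 30.0], [0.25, 0.28, 0.22]),   # 3 species, 3 pairs
    "medium_genus": ([500.0, 700.0], [0.15]),                  # 2 species, 1 pair
    "large_genus": ([50000.0, 80000.0], [0.06]),               # 2 species, 1 pair
}
points = [genus_point(m, d) for m, d in genera.values()]
x = [p[0] for p in points]
y = [p[1] for p in points]
print(f"r = {pearson(x, y):.3f}")
```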
For the correlation between body mass and evolutionary rates, I used phylogenetically independent comparisons (Harvey and Pagel 1991). All trees were rooted at their midpoint, and terminal pairs of sister taxa (of the same genus or of different genera) for which body masses were available in Silva and Downing (1995) were compared. If the longer branch in each pair is nonrandomly associated with the larger (or smaller) species, then a positive (or negative) correlation will be found. Thus, for each pair of species, the difference in the logarithm of body mass was compared with the difference in the logarithm of the branch length from their last common ancestor, as described previously (Bromham, Rambaut, and Harvey 1996), and the correlation between these two variables was calculated. The Pearson product-moment correlation and the Spearman rank correlation produced very similar results, and only the former is reported.
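A minimal sketch of the sister-pair comparison, with both correlation coefficients. It assumes each contrast is oriented so that the body-mass difference is non-negative (a common convention for independent contrasts, not stated in the text) and uses log10, which does not affect either coefficient. The function names and the example pairs are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values receive the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def sister_contrasts(pairs):
    """For sister pairs ((mass1, branch1), (mass2, branch2)), return the differences in
    log body mass and in log branch length from the pair's last common ancestor.
    Contrasts are oriented so the mass difference is non-negative (an assumption)."""
    d_mass, d_branch = [], []
    for (m1, b1), (m2, b2) in pairs:
        dm = math.log10(m1) - math.log10(m2)
        db = math.log10(b1) - math.log10(b2)
        if dm < 0:  # flip both differences together to keep them paired
            dm, db = -dm, -db
        d_mass.append(dm)
        d_branch.append(db)
    return d_mass, d_branch


# Hypothetical sister pairs: (body mass in g, branch length in subs/site) for each species.
pairs = [
    ((120.0, 0.081), (35.0, 0.074)),
    ((4000.0, 0.052), (9000.0, 0.060)),
    ((18.0, 0.110), (22.0, 0.095)),
    ((650.0, 0.070), (300.0, 0.066)),
]
x, y = sister_contrasts(pairs)
print(f"Pearson r = {pearson(x, y):.3f}, Spearman rho = {spearman(x, y):.3f}")
```

Only terminal sister pairs are compared, so each pair's two branch lengths are measured from their own last common ancestor and no pair shares a branch with another.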
Results and Discussion
In the histograms of unweighted distances (fig. 1A), a few genera and families with many species were overwhelmingly represented, because the number of pairwise distances in each genus or family grows in proportion to the square of its number of species. To overcome this problem, which can make comparisons difficult, each distance was weighted so that every genus or family contributed to the histograms in proportion to its number of species rather than to the square of this number (see Materials and Methods). Weighting markedly reduced the relative height of the peaks at long distances (at 0.22 substitutions per site in the intrageneric distance distribution and at 1 substitution per site in the intergeneric distribution; fig. 1B), which mainly contained pairwise distances from rodent families and dasyurids, making the histograms more suitable for comparison.
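The effect of weighting on a histogram can be shown with a small sketch. The bin width of 0.02 substitutions per site, the maximum distance, and the two hypothetical genera below are assumptions for illustration; they are not the settings used for fig. 1.

```python
import math


def weighted_histogram(distances, weights, bin_width=0.02, max_distance=1.2):
    """Sum the weights of the distances falling in each bin of width bin_width.
    Distances at or beyond max_distance are added to the last bin."""
    n_bins = math.ceil(max_distance / bin_width)
    heights = [0.0] * n_bins
    for d, w in zip(distances, weights):
        k = min(int(d // bin_width), n_bins - 1)
        heights[k] += w
    return heights


# Hypothetical data: a large genus (S = 10, so 45 distances, all near 0.21)
# and a small genus (S = 3, so 3 distances near 0.09).
big = [0.21 + 0.0001 * i for i in range(45)]
small = [0.085, 0.09, 0.095]
dists = big + small

unweighted = weighted_histogram(dists, [1.0] * len(dists))
# Intrageneric weight 2/(S - 1): 2/9 for the large genus, 2/2 for the small one.
weighted = weighted_histogram(dists, [2 / 9] * len(big) + [2 / 2] * len(small))
print("unweighted:", unweighted[4], unweighted[10])
print("weighted:  ", weighted[4], round(weighted[10], 6))
```

Without weights, the large genus outweighs the small one 45 to 3; with weights, the ratio becomes 10 to 3, the ratio of their species numbers.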
The clear separation of the intrageneric and intergeneric distance distributions allows the detection of groups of species whose classification deviates strongly from the main trends. The most extreme cases can be identified using the weighted distributions (fig. 1B). Thus, all genera in which some species are separated by intrageneric distances higher than the limit of the modal class of the intergeneric distance distribution (i.e., larger than 0.30 substitutions per site) are listed in table 1; splitting these genera would clearly make their classification more consistent. In fact, many of these genera are already known to be divided into distinct clades or subgenera. The other distribution, that of intergeneric distances, reveals species that are placed in different genera (in some cases up to five genera) but are genetically very similar (i.e., with distances smaller than 0.10 substitutions per site; table 2). For these species, classification in a single genus would better reflect their phylogenetic proximity. Furthermore, the highest intergeneric distances, mostly involving rodent and dasyurid genera, would support splitting some of the corresponding families.
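The two screening rules can be expressed as simple filters over the distances. The thresholds (0.30 and 0.10 substitutions per site) come from the text; the data structures, function names, and placeholder genera and species are hypothetical and are not taken from tables 1 or 2.

```python
INTRA_LIMIT = 0.30  # limit of the modal class of intergeneric distances (from the text)
INTER_LIMIT = 0.10  # threshold for "genetically very similar" species (from the text)


def lumped_genera(intrageneric):
    """intrageneric: {genus: [distance, ...]}. Genera with any distance above INTRA_LIMIT."""
    return sorted(g for g, ds in intrageneric.items() if any(d > INTRA_LIMIT for d in ds))


def split_pairs(intergeneric):
    """intergeneric: [(species1, genus1, species2, genus2, distance), ...].
    Pairs placed in different genera yet closer than INTER_LIMIT."""
    return [(s1, s2, d) for s1, g1, s2, g2, d in intergeneric if g1 != g2 and d < INTER_LIMIT]


# Hypothetical placeholder data.
intra = {"GenusA": [0.12, 0.34, 0.29], "GenusB": [0.08, 0.15]}
inter = [
    ("sp1", "GenusC", "sp2", "GenusD", 0.07),
    ("sp3", "GenusC", "sp4", "GenusE", 0.26),
]
print(lumped_genera(intra))
print(split_pairs(inter))
```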
Great Apes Taxonomy
Other genera with distance values less extreme than those in tables 1 and 2 may also need their generic names changed to be more consistent with their phylogeny. Recent classification schemes (Groves 1993), as well as the taxonomy adopted by the EMBL and GenBank databases, include the great apes (chimpanzee, bonobo or pygmy chimpanzee, gorilla, and orangutan) in the same family as humans, the Hominidae. Although the phylogeny of hominids is well known (Ruvolo 1997; Satta, Klein, and Takahata 2000), all species except the two chimpanzees are classified in separate genera, and it has been suggested that at least humans and chimpanzees should be classified in the same genus (Diamond 1992, p. 25; Easteal, Collet, and Betty 1995, p. 131; Goodman et al. 1998). The cytochrome b genetic distances between humans and the two chimpanzee species (chimpanzee and bonobo) are 0.150 and 0.160 substitutions per site, respectively. Although these distances are not as small as the distances between different genera reported in table 2, they are much closer to the modal class of the intrageneric distance distribution than to the modal class of the intergeneric distribution (fig. 1B), which would support including humans, chimpanzees, and bonobos in the same genus, Homo (Diamond 1992, p. 25; Easteal, Collet, and Betty 1995; Goodman et al. 1998). Distances between gorillas and the chimpanzee/human clade range between 0.166 and 0.190 substitutions per site, close to the point equidistant from both distributions (fig. 1B), indicating that the generic status of gorillas (Homo or Gorilla) is more difficult to determine from a single locus. Finally, distances between orangutans and the other hominids range between 0.221 and 0.264 substitutions per site, close to the modal class of the intergeneric distance distribution, which justifies placing orangutans in the genus Pongo, separate from the other hominids.
Toward a More Objective Classification
Although the cytochrome b genetic distance distributions shown in this work are most readily used to detect groups whose classification is highly inconsistent with mammalian relationships, they can also help to estimate an approximate divergence limit beyond which two species should be placed in different genera, in a biological classification aided by a standardized temporal scheme (Avise and Johns 1999). The current mammalian taxonomy would be least disrupted by setting this limit at a genetic distance, measured with the complete cytochrome b gene, of around 0.2 substitutions per site (i.e., midway between the intrageneric and intergeneric distance distributions; fig. 1B), or 0.1 substitutions per site per lineage. If the human-chimpanzee divergence happened 5 MYA (see Yoder and Yang 2000), this limit would correspond very approximately to 6 to 7 MYA, although more loci should be analyzed to obtain a better estimate (Avise and Johns 1999). In addition, some flexibility around this limit (in the form of a pre-established interval), allowing generic divisions to be placed on the longer edges of the tree, would be desirable.
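The conversion of the 0.2 limit into an approximate age can be reproduced with simple proportionality. This sketch assumes a strict, linear molecular clock calibrated on the human-chimpanzee and human-bonobo distances given above (0.150 and 0.160) and a 5 MYA split; the function name is hypothetical, and real distances are only approximately linear in time.

```python
def genus_limit_age(limit_pairwise, calib_distance, calib_age_myr):
    """Age (Myr) at which a pairwise distance reaches limit_pairwise, assuming a strict,
    linear clock calibrated on one pair of species with a known divergence age."""
    # Each lineage accumulates half of the pairwise distance since the split.
    rate_per_lineage = (calib_distance / 2) / calib_age_myr  # subs/site/lineage/Myr
    return (limit_pairwise / 2) / rate_per_lineage


# Human-chimpanzee and human-bonobo distances from the text, with a 5 MYA split.
for d in (0.150, 0.160):
    print(f"calibration distance {d}: limit of 0.2 ~ {genus_limit_age(0.2, d, 5.0):.2f} MYA")
```

The two calibrations give about 6.67 and 6.25 MYA, that is, between 6 and 7 MYA.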
Body Size Taxonomic Bias
It is also of interest to decipher the main reasons for the biases in the mammalian classification. Examination of the mammalian genera that are too lumped (table 1) or too split (table 2) indicates that the former are mostly small animals (e.g., rodents or shrews), whereas the latter are larger ones (e.g., elephants or dolphins). To determine whether this bias is more general, I plotted the maximum intrageneric distance in each genus (as a measure of the genetic diversity of the genus) against the average body mass of the species in the genus (fig. 2). For example, the genus Marmota contains 14 species (Hoffmann et al. 1993; Steppan et al. 1999), 13 of which had complete cytochrome b sequences available in the databases; the maximum genetic distance (0.155 substitutions per site) occurred between Marmota caligata and Marmota himalayana, two species belonging, respectively, to the two earliest-diverging groups, and this distance was plotted against the logarithm of the average body mass of the genus in grams, which was 3.704 (Silva and Downing 1995). The plot of all genera for which at least two species were available shows that genera of large animals tend to contain genetically similar species, whereas genera of smaller animals tend to include genetically more diverse species (fig. 2). In fact, there is a strong negative correlation between the two variables. Using the average instead of the maximum intrageneric distance as the measure of genetic diversity, or using the logarithm of the maximum distance, yielded essentially the same results.
Cytochrome b Evolutionary Rates
Finally, it was also necessary to test whether cytochrome b rates vary systematically with body size, since the evolutionary rates of the cytochrome b gene are well known to differ among lineages (Kocher et al. 1989; Andrews, Jermiin, and Easteal 1998). Some authors have postulated that metabolic rate (Martin and Palumbi 1993; Nunn and Stanley 1998) or generation time (Bromham, Rambaut, and Harvey 1996; Li et al. 1996), both of which are in turn correlated with body mass, could be responsible for the different rates. Although these hypotheses were put forward on the basis of a limited number of species (Bromham, Rambaut, and Harvey 1996), it remains possible that the observed correlation between maximum cytochrome b divergence and body mass (fig. 2) merely reflects faster evolutionary rates of the cytochrome b sequences of small animals. This can now be tested with the large sample of cytochrome b sequences used here. Using phylogenetically independent comparisons (Bromham, Rambaut, and Harvey 1996) of 82 terminal pairs of mammals taken from the cytochrome b trees (see Materials and Methods), I found no correlation between body mass and the maximum-likelihood distance from the last common ancestor of each pair (fig. 3), at least at the divergence levels analyzed in this work. Therefore, the observed correlation between body mass and genetic diversity in mammalian genera (fig. 2) cannot be explained by accelerated evolutionary rates in small animals; rather, it reflects a body size taxonomic bias.
Footnotes
1 Keywords: cytochrome b, body size, mammals, hominids, maximum likelihood, genetic distance
2 Address for correspondence and reprints: Jose Castresana, European Molecular Biology Laboratory (EMBL), Biocomputing Unit, Meyerhofstrasse 1, D-69117 Heidelberg, Germany. jose.castresana@embl-heidelberg.de
Literature Cited
Andrews, T. D., L. S. Jermiin, and S. Easteal. 1998. Accelerated evolution of cytochrome b in simian primates: adaptive evolution in concert with other mitochondrial proteins? J. Mol. Evol. 47:249-257.
Arnason, U., K. Bodin, A. Gullberg, C. Ledje, and S. Mouchaty. 1995. A molecular view of pinniped relationships with particular emphasis on the true seals. J. Mol. Evol. 40:78-85.
Avise, J. C., and G. C. Johns. 1999. Proposal for a standardized temporal scheme of biological classification for extant species. Proc. Natl. Acad. Sci. USA 96:7358-7363.
Baker, W., A. van den Broek, E. Camon, P. Hingamp, P. Sterk, G. Stoesser, and M. A. Tuli. 2000. The EMBL nucleotide sequence database. Nucleic Acids Res. 28:19-23.
Bromham, L., A. Rambaut, and P. H. Harvey. 1996. Determinants of rate variation in mammalian DNA sequence evolution. J. Mol. Evol. 43:610-621.
Diamond, J. M. 1992. The third chimpanzee: the evolution and future of the human animal. HarperCollins, New York.
Easteal, S., C. Collet, and D. Betty. 1995. The mammalian molecular clock. R. G. Landes, Austin, Texas.
Excoffier, L., and Z. Yang. 1999. Substitution rate variation among sites in mitochondrial hypervariable region I of humans and chimpanzees. Mol. Biol. Evol. 16:1357-1368.
Faulkes, C. G., N. C. Bennett, M. W. Bruford, H. P. O'Brien, G. H. Aguilar, and J. U. Jarvis. 1997. Ecological constraints drive social evolution in the African mole-rats. Proc. R. Soc. Lond. B Biol. Sci. 264:1619-1627.
Giao, P. M., D. Tuoc, V. V. Dung, E. D. Wikramanayake, G. Amato, P. Arctander, and J. R. MacKinnon. 1998. Description of Muntiacus truongsonensis, a new species of muntjac (Artiodactyla: Muntiacidae) from central Vietnam, and implications for conservation. Anim. Conserv. 1:61-68.
Golding, G. B. 1983. Estimates of DNA and protein sequence divergence: an examination of some assumptions. Mol. Biol. Evol. 1:125-142.
Goodman, M., C. A. Porter, J. Czelusniak, S. L. Page, H. Schneider, J. Shoshani, G. Gunnell, and C. P. Groves. 1998. Toward a phylogenetic classification of Primates based on DNA evidence complemented by fossil evidence. Mol. Phylogenet. Evol. 9:585-598.
Groves, C. P. 1993. Order Primates. Pp. 243-277 in D. E. Wilson and D. M. Reeder, eds. Mammal species of the world: a taxonomic and geographic reference. Smithsonian Institution Press, Washington, D.C.
Harvey, P. H., and M. D. Pagel. 1991. The comparative method in evolutionary biology. Oxford University Press, Oxford, England.
Hasegawa, M., H. Kishino, and T. Yano. 1985. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol. 22:160-174.
Hoffmann, R. S., C. G. Anderson, R. W. Thorington, and L. R. Heaney. 1993. Family Sciuridae. Pp. 419-465 in D. E. Wilson and D. M. Reeder, eds. Mammal species of the world: a taxonomic and geographic reference. Smithsonian Institution Press, Washington, D.C.
Irwin, D. M., T. D. Kocher, and A. C. Wilson. 1991. Evolution of the cytochrome b gene of mammals. J. Mol. Evol. 32:128-144.
Johns, G. C., and J. C. Avise. 1998. A comparative summary of genetic distances in the vertebrates from the mitochondrial cytochrome b. Mol. Biol. Evol. 15:1481-1490.
Kocher, T. D., W. K. Thomas, A. Meyer, S. V. Edwards, S. Pääbo, F. X. Villablanca, and A. C. Wilson. 1989. Dynamics of mitochondrial DNA evolution in animals: amplification and sequencing with conserved primers. Proc. Natl. Acad. Sci. USA 86:6196-6200.
Lara, M. C., J. L. Patton, and M. N. da Silva. 1996. The simultaneous diversification of South American echimyid rodents (Hystricognathi) based on complete cytochrome b sequences. Mol. Phylogenet. Evol. 5:403-413.
LeDuc, R. G., W. F. Perrin, and A. E. Dizon. 1999. Phylogenetic relationships among the delphinid cetaceans based on full cytochrome b sequences. Mar. Mamm. Sci. 15:619-648.
Li, W. H., D. L. Ellsworth, J. Krushkal, B. H. Chang, and D. Hewett-Emmett. 1996. Rates of nucleotide substitution in primates and rodents and the generation-time effect hypothesis. Mol. Phylogenet. Evol. 5:182-187.
Martin, A. P., and S. R. Palumbi. 1993. Body size, metabolic rate, generation time, and the molecular clock. Proc. Natl. Acad. Sci. USA 90:4087-4091.
Matthee, C. A., and T. J. Robinson. 1999. Cytochrome b phylogeny of the family Bovidae: resolution within the Alcelaphini, Antilopini, Neotragini, and Tragelaphini. Mol. Phylogenet. Evol. 12:31-46.
Meyer, A. 1994. Shortcomings of the cytochrome b gene as a molecular marker. Trends Ecol. Evol. 9:278-280.
Nunn, G. B., and S. E. Stanley. 1998. Body size effects and rates of cytochrome b evolution in tube-nosed seabirds. Mol. Biol. Evol. 15:1360-1371.
Ruvolo, M. 1997. Molecular phylogeny of the hominoids: inferences from multiple independent DNA sequence data sets. Mol. Biol. Evol. 14:248-265.
Satta, Y., J. Klein, and N. Takahata. 2000. DNA archives and our nearest relative: the trichotomy problem revisited. Mol. Phylogenet. Evol. 14:259-275.
Silva, M., and J. A. Downing. 1995. CRC handbook of mammalian body masses. CRC Press, Boca Raton, Fla.
Steppan, S. J., M. R. Akhverdyan, E. A. Lyapunova, D. G. Fraser, N. N. Vorontsov, R. S. Hoffmann, and M. J. Braun. 1999. Molecular phylogeny of the marmots (Rodentia: Sciuridae): tests of evolutionary and biogeographic hypotheses. Syst. Biol. 48:715-734.
Swofford, D. L. 1998. PAUP*: phylogenetic analysis using parsimony (*and other methods). Version 4. Sinauer, Sunderland, Mass.
Swofford, D. L., G. J. Olsen, P. J. Waddell, and D. M. Hillis. 1996. Phylogenetic inference. Pp. 407-514 in D. M. Hillis, C. Moritz, and B. K. Mable, eds. Molecular systematics. Sinauer, Sunderland, Mass.
Wilson, D. E., and D. M. Reeder. 1993. Mammal species of the world: a taxonomic and geographic reference. 2nd edition. Smithsonian Institution Press, Washington, D.C.
Yang, Z. 1996. Among-site rate variation and its impact on phylogenetic analysis. Trends Ecol. Evol. 11:367-372.
Yoder, A. D., and Z. Yang. 2000. Estimation of primate speciation dates using local molecular clocks. Mol. Biol. Evol. 17:1081-1090.