Phylogénie, Bioinformatique et Génome, CNRS, Université Pierre et Marie Curie, Paris
Abstract
Introduction
RAS models postulate that the evolutionary rate of a position is constant throughout time (i.e., in all lineages), even though this rate can vary between positions, which gives rise to so-called slow and fast positions. We will call such models homotachous (from "same speed" in Greek). In such a static evolutionary framework, a fast-evolving position will be fast in every taxonomic group. However, as demonstrated by Fitch (1971), substitutions in cytochrome c occur at different positions in fungi and in metazoa, which is incompatible with any homotachous model. This is, however, compatible with the covarion model (Fitch and Markowitz 1970). In this model, at a given time, only a fraction of the positions (their number is called "c"), the concomitantly variable codons (covarions), can accept substitutions, all with the same probability. After a substitution, the probability that the set of covarions changes is 1 - "p" (p is called the persistence). When such a change happens, a randomly chosen variable position becomes invariable and vice versa, since "c" is assumed to be constant. Nevertheless, studies have shown that the number of variable positions can differ between lineages (Germot and Philippe 1999), suggesting that a constant c is a limitation of the covarion model (Steel, Huson, and Lockhart 2000). The model has been refined by including permanently variable and permanently invariable positions (Fitch and Ye 1991). Although this framework appeared as early as 1970, it has never been shown to explain the data thoroughly. Fitch verified that, in a simulation under a covarion model, pairs of simulated sequences displayed the same number of differences as real ones. However, this is not an extensive validation, and the authors admit that "the gamma [...] model is a viable alternative" (Miyamoto and Fitch 1995). In fact, the covarion model received little attention until recently (Lockhart et al. 1998; Tuffley and Steel 1998; Lopez, Forterre, and Philippe 1999; Galtier 2001; Gaucher, Miyamoto, and Benner 2001).
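The covarion mechanics described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not simtree and not Fitch and Markowitz's own procedure: a lineage has `n_sites` positions, `c` of which are variable at any time; each substitution falls on a uniformly chosen variable position; after each substitution, with probability 1 - p, one randomly chosen variable position and one randomly chosen invariable position swap status. The function name `simulate_covarion_lineage`, its parameters, and the choice of the first `c` positions as the initial covarion set are hypothetical.

```python
import random


def simulate_covarion_lineage(n_sites, c, p, n_subs, seed=None):
    """Hypothetical sketch of one lineage evolving under the covarion model.

    n_sites: total number of positions
    c:       number of concomitantly variable positions (held constant)
    p:       persistence; the covarion set changes with probability 1 - p
    n_subs:  number of substitutions to simulate
    Returns the number of substitutions at each position.
    """
    if not 1 <= c <= n_sites:
        raise ValueError("c must be between 1 and n_sites")
    rng = random.Random(seed)
    variable = set(range(c))              # assumed initial covarion set
    invariable = set(range(c, n_sites))
    counts = [0] * n_sites
    for _ in range(n_subs):
        # Every covarion is equally likely to receive the next substitution.
        site = rng.choice(sorted(variable))
        counts[site] += 1
        # With probability 1 - p, one variable and one invariable position swap
        # status, so the number of covarions stays equal to c.
        if invariable and rng.random() < 1 - p:
            off = rng.choice(sorted(variable))
            on = rng.choice(sorted(invariable))
            variable.remove(off)
            invariable.add(off)
            invariable.remove(on)
            variable.add(on)
    return counts


if __name__ == "__main__":
    # Two lineages starting from the same covarion set drift apart.
    lineage_a = simulate_covarion_lineage(100, 20, 0.9, 500, seed=1)
    lineage_b = simulate_covarion_lineage(100, 20, 0.9, 500, seed=2)
    only_a = [i for i in range(100) if lineage_a[i] > 0 and lineage_b[i] == 0]
    print("positions variable in lineage A only:", len(only_a))
```

With p = 1 the covarion set never changes and only the initial c positions ever vary, which is a homotachous special case; lowering p lets independent lineages accumulate substitutions at different positions, as Fitch observed for cytochrome c.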
Many proteins display global substitution rates of their positions that fit a Γ law (Uzzell and Corbin 1971; Yang 1996), which explains the current success of the RAS model. Until recently, the assumption that these rates are constant within a position had not been tested. Using various statistical tools (Philippe et al. 1996; Lockhart et al. 1998; Gu 1999; Lopez, Forterre, and Philippe 1999; Gaucher, Miyamoto, and Benner 2001), it has now been convincingly demonstrated that the evolutionary rate of a given position is not always constant throughout time. These findings invalidate homotachous models, but they do not establish the covarion model as a sufficient explanation of sequence evolution either. For this reason, we coined the word heterotachous to describe such positions (Philippe and Lopez 2001), rather than using the previous term covarion-like (Lockhart et al. 1998; Lopez, Forterre, and Philippe 1999). A new term was also needed because covarion is easily confused with covariation, which can be completely unrelated to heterotachy.
The rejection of homotachous models has always been achieved with very divergent orthologs (archaea vs. eukaryota, plastids vs. cyanobacteria, or animals vs. plants [Fitch 1971; Miyamoto and Fitch 1995; Lockhart et al. 1998]) or with paralogs of different functions (Gu 1999; Lopez, Forterre, and Philippe 1999; Naylor and Gerstein 2000). In such cases, the functional constraints are likely different, which would explain the different distributions of variable sites. It has even been suggested that "the covarion theory can be treated as a special case of functional divergence" (Gu 1999). We think instead that heterotachy (of which the covarion theory is one example) is more widely relevant. We therefore investigated whether homotachous models could be rejected in a case where functional changes can presumably be ruled out.
As an accurate estimation of evolutionary rates requires a large amount of data, we focused on vertebrate cytochrome b, for which more than 3,000 almost complete sequences are available. As mitochondrial metabolism is homogeneous among vertebrates, the proteins of our data set can reliably be considered free of functional changes. It appeared that almost all variable positions in cytochrome b are heterotachous, although there is likely no functional shift. We investigated the location of heterotachous positions in the three-dimensional structure of the protein and did not find any clear relationship.
Methods
Test of the Distribution of the Number of Substitutions
The observed distribution of the number of substitutions per position is compared with the distribution expected under the best-fitting distribution (estimated by PAML [Yang 1997]). If the two are not significantly different, as assessed by a chi-square test at the 5% level, we consider the fit good.
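As a sketch of this goodness-of-fit test, assume that the fitted distribution is the negative binomial expected when Poisson-distributed substitution counts have Γ-distributed rates (shape `alpha`, mean count `mu`). The sketch takes `alpha` and `mu` as given rather than estimating them with PAML, pools every count at or above the last class into one tail class, and subtracts two degrees of freedom for the two fitted parameters; all three choices, and the function names, are our assumptions. The chi-square tail probability is computed exactly from the regularized incomplete gamma function, using only the standard library.

```python
import math


def nb_pmf(k, alpha, mu):
    """Negative binomial probability of k substitutions at a position whose
    Poisson rate is Gamma-distributed with shape alpha and mean mu."""
    log_p = (math.lgamma(k + alpha) - math.lgamma(alpha) - math.lgamma(k + 1)
             + alpha * math.log(alpha / (alpha + mu))
             + k * math.log(mu / (alpha + mu)))
    return math.exp(log_p)


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1,
    using Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1) with y = x / 2."""
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)               # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))    # Q(1/2, y)
    while a < df / 2.0:
        if y > 0:
            q += math.exp(a * math.log(y) - y - math.lgamma(a + 1))
        a += 1.0
    return q


def nb_goodness_of_fit(observed, alpha, mu, n_fitted=2, level=0.05):
    """observed[k] = number of positions with exactly k substitutions; the last
    class means 'len(observed) - 1 or more'. Choose the classes so that every
    expected count is reasonably large. Returns (statistic, df, p, good_fit)."""
    n_sites = sum(observed)
    last = len(observed) - 1
    probs = [nb_pmf(k, alpha, mu) for k in range(last)]
    probs.append(1.0 - sum(probs))             # pooled tail class
    stat = sum((o - n_sites * p) ** 2 / (n_sites * p)
               for o, p in zip(observed, probs))
    df = len(observed) - 1 - n_fitted
    if df < 1:
        raise ValueError("too few classes for the number of fitted parameters")
    p_value = chi2_sf(stat, df)
    return stat, df, p_value, p_value >= level
```

Writing the tail probability exactly avoids a dependency on a statistics library; with SciPy available, `scipy.stats.nbinom` and `scipy.stats.chi2.sf` would serve the same purpose.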
Test of Heterotachy
If the substitution rate of a position is constant, its substitutions should be spread more or less evenly along the tree. As the data set is divided into monophyletic groups, a tree is computed for each of them. A position is then described by its profile, i.e., the number of changes it undergoes in each group. If a given position is homotachous, its profile should be proportional to the sizes (in steps) of the groups, which we test with a modified chi-square test (Lopez, Forterre, and Philippe 1999). Our method thus determines how many positions significantly reject homotachous behavior. The number of substitutions was inferred for each position either by maximum parsimony (MP), using PAUP 3.1 (Swofford 1993), or by maximum likelihood (ML), using GZ-AA (Gu and Zhang 1997). A sufficient number of substitutions is necessary to yield significant statistical results. Therefore, a position is considered testable when it undergoes more substitutions than half the number of groups.
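The profile test can be sketched as follows. The modification of the chi-square test used by Lopez, Forterre, and Philippe (1999) is not described here, so this sketch plainly swaps in a different technique: the Pearson statistic of a position's profile against counts proportional to the group sizes (in steps), with a Monte Carlo p-value obtained by redistributing the same number of substitutions among groups at random. The Monte Carlo calibration is our assumption, chosen because expected counts per group are small; the function names and the `min_per_group` parameter (0.5 reproduces the testability rule above) are hypothetical.

```python
import random


def is_testable(profile, min_per_group=0.5):
    """A position is testable when its total number of substitutions exceeds
    min_per_group times the number of groups (0.5 = half the number of groups)."""
    return sum(profile) > min_per_group * len(profile)


def pearson(profile, weights):
    """Pearson statistic of a profile against counts proportional to weights."""
    n, total_w = sum(profile), sum(weights)
    expected = [n * w / total_w for w in weights]
    return sum((o - e) ** 2 / e for o, e in zip(profile, expected))


def heterotachy_test(profile, group_steps, n_rep=999, seed=0):
    """Monte Carlo p-value for the homotachous null hypothesis: a position's
    substitutions fall in each group in proportion to the group's size in steps."""
    rng = random.Random(seed)
    n = sum(profile)
    observed = pearson(profile, group_steps)
    groups = range(len(group_steps))
    n_extreme = 0
    for _ in range(n_rep):
        # Redistribute the same n substitutions among groups under the null.
        simulated = [0] * len(group_steps)
        for g in rng.choices(groups, weights=group_steps, k=n):
            simulated[g] += 1
        # Small tolerance so that floating-point ties count as "as extreme".
        if pearson(simulated, group_steps) >= observed - 1e-12:
            n_extreme += 1
    return (n_extreme + 1) / (n_rep + 1)
```

Counting the testable positions whose p-value falls below 0.05 gives the percentage of heterotachous positions; under homotachy about 5% are expected by chance.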
Simulations
Sets of sequences were simulated on a template tree obtained from the MP reconstruction of 200 vertebrate cytochrome b sequences (four monophyletic groups of 50 sequences). Simulations under a homotachous model were performed with Seq-Gen (Rambaut and Grassly 1997), and simulations under a covarion model with simtree (Fitch and Ye 1991).
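Seq-Gen simulates full sequences along the tree; as a much cruder stand-in, the homotachous hypothesis can be illustrated directly on substitution counts. In this hypothetical sketch, each position draws one Γ-distributed relative rate (shape `alpha`, mean 1) that it keeps in every group, and its number of substitutions in a group is Poisson with mean rate × group length. Treating group lengths as expected substitutions per position, and ignoring tree topology, are our assumptions; the function names are illustrative.

```python
import random


def poisson(rng, lam):
    """Poisson draw: count unit-rate exponential arrivals falling before lam."""
    k, t = 0, rng.expovariate(1.0)
    while t < lam:
        k += 1
        t += rng.expovariate(1.0)
    return k


def simulate_homotachous_profiles(group_lengths, n_sites, alpha, seed=None):
    """Per-position substitution profiles under a homotachous Gamma model:
    each position has one rate, shared by all groups (constant through time)."""
    rng = random.Random(seed)
    profiles = []
    for _ in range(n_sites):
        rate = rng.gammavariate(alpha, 1.0 / alpha)   # mean 1, shape alpha
        profiles.append([poisson(rng, rate * length) for length in group_lengths])
    return profiles
```

Because each position keeps a single rate in every group, its counts across groups are, conditional on its total, multinomial with probabilities proportional to the group lengths, so a well-calibrated profile test should reject homotachy for only about 5% of testable positions in such data.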
Results and Discussion
First, we increased the number of species while keeping the number of groups constant. For three groups (birds, mammals, and fishes), the number of significantly heterotachous positions steadily increased with the number of sequences until it reached an asymptotic value (fig. 2a). With 10 sequences in each group (a common situation for most markers), only 9% of the positions are detected as heterotachous, a value close to the 5% expected by chance. However, up to 47% are detected with 300 sequences in each group. Second, we increased the number of species by increasing the number of groups. The number of significantly heterotachous positions increased almost linearly, reaching 81% with 32 groups. Contrary to the previous case (fig. 2a), the shape of the curve (fig. 2b) suggests that saturation is not reached and that adding more groups would allow the detection of still more heterotachous positions, likely up to 100%. Third, the MP method, which we used to reduce computation time, underestimates large numbers of changes, and this underestimation might reduce the detection of heterotachy. Whenever ML methods were applied, the percentage of heterotachous positions increased, up to 88% for our complete data set, which makes our MP-based estimates (81%) conservative. Finally, we considered as testable the positions that underwent on average at least 0.5 substitutions per group, a level at which the resolving power of our method can be weak. Raising this threshold always increased the percentage of heterotachous positions. For instance, with 32 groups, at least one substitution per group, and ML estimates, 95% of the positions are heterotachous.
Heterotachy and Function-structure Analysis
The fact that the functional constraints on a position do not stay the same throughout time should not be unexpected, for both intrinsic (protein structure) and extrinsic (protein interactions) reasons (Spiller et al. 1999). First, a single mutation can change the subsequent mutation probabilities of other positions (Fitch and Markowitz 1970). Second, the environment of the protein will necessarily change, especially because of substitutions occurring in interacting proteins. This is of course more relevant for proteins belonging to large complexes, such as cytochrome b, which have many interactions with other molecules (Xia et al. 1997). As an example of such environmental changes, repopulating mouse mitochondrial DNA-less cells with rat mitochondria restores mitochondrial translation but not respiratory function (Yamaoka et al. 2000). Although mouse and rat proteins are highly similar, this demonstrates that some mitochondrion-encoded rat proteins (e.g., cytochrome b) cannot interact properly with mouse nucleus-encoded proteins (e.g., cytochrome c, cytochrome oxidase IV): a few independent modifications of interacting proteins were enough to severely disturb the function of the complex. This result is in full agreement with our finding of heterotachy in murid cytochrome b (fig. 1). Similar observations were made in primates (Kenyon and Moraes 1997) and in yeast (Spirek et al. 2000). These experiments show how, in different lineages, the coevolution of proteins canalizes the evolution of a protein in different directions, which explains why heterotachy can be found when function remains the same.
Because the crystal structure of the cytochrome bc1 complex from bovine heart mitochondria has been solved (Xia et al. 1997), we mapped the heterotachous sites onto the three-dimensional structure (fig. 3). For clarity, the two cytochrome b subunits, which show opposite sides of the molecule, are displayed, so that the eight transmembrane helices are easily visible (fig. 3A). When all vertebrates are considered, the patterns marking constant and heterotachous positions are evenly distributed across the structure, whereas those marking homotachous positions are scarce. The distribution of the different pattern types could not be easily interpreted in terms of the three-dimensional structure. We suspect that this is due to the considerable evolutionary divergence separating the groups studied. We therefore focused on smaller taxonomic groups (rodents, fig. 3B; birds, fig. 3C; cetartiodactyls, fig. 3D). In these analyses, many homotachous positions did appear, but they also seemed evenly distributed across the structure. Under the naive view that the helices mainly serve to anchor cytochrome b in the membrane, one would expect the functional constraints acting on them to remain the same throughout time. The presence of many heterotachous positions in these helices, however, indicates that their function may be more complex than simple anchoring.
Heterotachy and the Covarion Model
Considering the extent of heterotachy in protein evolution, sufficiently descriptive models of sequence evolution need to reproduce this feature. The models currently implemented in phylogenetic reconstruction (e.g., the Γ law model [Strimmer and von Haeseler 1996]) are homotachous, i.e., they assume that the substitution rate of a position is constant throughout time. At first glance, such a model seems appropriate for vertebrate cytochrome b, as the total number of substitutions per position fits a negative binomial distribution quite well (data not shown), as expected if substitution rates are distributed according to a Γ law. However, we verified that sequences simulated under this model display a level of heterotachy close to 5%, the level expected by chance (table 2). Only one heterotachous model, the covarion model, has been proposed, by Fitch and Markowitz in 1970. In this model, at a given time, only a fraction of the positions, the covarions, can accept substitutions, all with the same probability. After a substitution, the covarion pool has a fixed probability of changing. When such a change happens, a randomly chosen variable position becomes invariable and vice versa. To determine whether the covarion model explains our observations, we conducted extensive simulations of sequence evolution (see Supplementary Material). In brief, depending on the values of its two free parameters, the covarion model was able to generate heterotachous positions, a negative binomial distribution of the number of substitutions, or both. Unfortunately, no parameter values (table 2) simultaneously reproduce the observations made on cytochrome b (too many heterotachous positions for the correct α parameter, or too high an α parameter for the correct fraction of heterotachous positions). Thus, our observations are not explained by any current model of sequence evolution.
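The link between a Γ law of rates and a negative binomial distribution of substitution counts can be made explicit. Assume that a position has a relative rate r drawn from a Γ distribution with shape α and mean 1, and that, given r, its number of substitutions K is Poisson with mean μr, where μ is the mean number of substitutions per position (this parameterization is our assumption). Integrating over r gives

$$
P(K=k)=\int_0^{\infty}\frac{e^{-\mu r}(\mu r)^{k}}{k!}\,\frac{\alpha^{\alpha}r^{\alpha-1}e^{-\alpha r}}{\Gamma(\alpha)}\,dr
=\frac{\Gamma(k+\alpha)}{\Gamma(\alpha)\,k!}\left(\frac{\alpha}{\alpha+\mu}\right)^{\alpha}\left(\frac{\mu}{\alpha+\mu}\right)^{k},
$$

a negative binomial with mean μ and variance μ + μ²/α, so a smaller α (stronger rate variation among positions) gives a more overdispersed distribution. The derivation involves only the total rate of each position over the whole tree: a position whose rate differs between lineages but whose overall rate follows the same Γ law yields the same marginal counts, which is why a good negative binomial fit cannot by itself rule out heterotachy.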
Acknowledgements
Footnotes
Keywords: covarion, cytochrome b, molecular evolution, phylogeny, vertebrates
Address for correspondence and reprints: Hervé Philippe, Phylogénie, Bioinformatique et Génome, UMR 7622 CNRS, Université Pierre et Marie Curie, 9, quai St. Bernard, 75005 Paris, France. herve.philippe{at}snv.jussieu.fr.
References
Adachi J., M. Hasegawa, 1996 Model of amino acid substitution in proteins encoded by mitochondrial DNA J. Mol. Evol 42:459-468
Adachi J., P. J. Waddell, W. Martin, M. Hasegawa, 2000 Plastid genome phylogeny and a model of amino acid substitution for proteins encoded by chloroplast DNA J. Mol. Evol 50:348-358
Altschul S. F., E. V. Koonin, 1998 Iterated profile searches with PSI-BLAST: a tool for discovery in protein databases Trends Biochem. Sci 23:444-447
Brenner S. E., C. Chothia, T. J. Hubbard, 1998 Assessing sequence comparison methods with reliable structurally identified distant evolutionary relationships Proc. Natl. Acad. Sci. USA 95:6073-6078
Dayhoff M. O., 1979 Atlas of protein sequence and structure. Vol. 5, Suppl. 3, 1978 National Biomedical Research Foundation, Washington, D.C
Fitch W. M., 1971 The nonidentity of invariable positions in the cytochromes c of different species Biochem. Genet 5:231-241
Fitch W. M., E. Markowitz, 1970 An improved method for determining codon variability in a gene and its application to the rate of fixation of mutations in evolution Biochem. Genet 4:579-593
Fitch W. M., J. Ye, 1991 Weighted parsimony: does it work? Pp. 147-154 in M. M. Miyamoto and J. Cracraft, eds. Phylogenetic analysis of DNA sequences. Oxford University Press, New York
Galtier N., 2001 Maximum-likelihood phylogenetic analysis under a covarion-like model Mol. Biol. Evol 18:866-873
Gaucher E. A., M. M. Miyamoto, S. A. Benner, 2001 Function-structure analysis of proteins using covarion-based evolutionary approaches: elongation factors Proc. Natl. Acad. Sci. USA 98:548-552
Germot A., H. Philippe, 1999 Critical analysis of eukaryotic phylogeny: a case study based on the HSP70 family J. Eukaryot. Microbiol 46:116-124
Gu X., 1999 Statistical methods for testing functional divergence after gene duplication Mol. Biol. Evol 16:1664-1674
Gu X., J. Zhang, 1997 A simple method for estimating the parameter of substitution rate variation among sites Mol. Biol. Evol 14:1106-1113
Huelsenbeck J. P., 1998 Systematic bias in phylogenetic analysis: is the Strepsiptera problem solved? Syst. Biol 47:519-537
Jones D. T., W. R. Taylor, J. M. Thornton, 1992 The rapid generation of mutation data matrices from protein sequences Comput. Appl. Biosci 8:275-282
Jones D. T., W. R. Taylor, J. M. Thornton, 1994 A mutation data matrix for transmembrane proteins FEBS Lett 339:269-275
Kenyon L., C. T. Moraes, 1997 Expanding the functional human mitochondrial DNA database by the establishment of primate xenomitochondrial cybrids Proc. Natl. Acad. Sci. USA 94:9131-9135
Kimura M., 1987 Molecular evolutionary clock and the neutral theory J. Mol. Evol 26:24-33
Lockhart P. J., M. A. Steel, A. C. Barbrook, D. Huson, M. A. Charleston, C. J. Howe, 1998 A covariotide model explains apparent phylogenetic structure of oxygenic photosynthetic lineages Mol. Biol. Evol 15:1183-1188
Lopez P., P. Forterre, H. Philippe, 1999 The root of the tree of life in the light of the covarion model J. Mol. Evol 49:496-508
Miyamoto M. M., W. M. Fitch, 1995 Testing the covarion hypothesis of molecular evolution Mol. Biol. Evol 12:503-513
Naylor G. J., M. Gerstein, 2000 Measuring shifts in function and evolutionary opportunity using variability profiles: a case study of the globins J. Mol. Evol 51:223-233
Oue S., A. Okamoto, T. Yano, H. Kagamiyama, 1999 Redesigning the substrate specificity of an enzyme by cumulative effects of the mutations of non-active site residues J. Biol. Chem 274:2344-2349
Philippe H., A. Germot, 2000 Phylogeny of eukaryotes based on ribosomal RNA: long-branch attraction and models of sequence evolution Mol. Biol. Evol 17:830-834
Philippe H., G. Lecointre, H. L. V. L, H. Le Guyader, 1996 A critical study of homoplasy in molecular data with the use of a morphologically based cladogram Mol. Biol. Evol 13:1174-1186
Philippe H., P. Lopez, 2001 On the conservation of protein sequences in evolution Trends Biochem. Sci 26:414-416
Rambaut A., N. C. Grassly, 1997 Seq-Gen: an application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees Comput. Appl. Biosci 13:235-238
Rzhetsky A., M. Nei, 1994 Unbiased estimates of the number of nucleotide substitutions when substitution rate varies among different sites J. Mol. Evol 38:295-299
Spiller B., A. Gershenson, F. H. Arnold, R. C. Stevens, 1999 A structural view of evolutionary divergence Proc. Natl. Acad. Sci. USA 96:12305-12310
Spirek M., A. Horvath, J. Piskur, P. Sulo, 2000 Functional co-operation between the nuclei of Saccharomyces cerevisiae and mitochondria from other yeast species Curr. Genet 38:202-207
Steel M., D. Huson, P. J. Lockhart, 2000 Invariable sites models and their use in phylogeny reconstruction Syst. Biol 49:225-232
Strimmer K., A. von Haeseler, 1996 Quartet puzzling: a quartet maximum likelihood method for reconstructing tree topologies Mol. Biol. Evol 13:964-969
Sullivan J., D. L. Swofford, 1997 Are guinea pigs rodents? The importance of adequate models in molecular phylogenetics J. Mamm. Evol 4:77-86
Swofford D. L., 1993 PAUP: phylogenetic analysis using parsimony. Version 3.1.1 Illinois Natural History Survey, Champaign
Tobin M. B., C. Gustafsson, G. W. Huisman, 2000 Directed evolution: the rational basis for irrational design Curr. Opin. Struct. Biol 10:421-427
Tuffley C., M. Steel, 1998 Modeling the covarion hypothesis of nucleotide substitution Math. Biosci 147:63-91
Uzzell T., K. W. Corbin, 1971 Fitting discrete probability distributions to evolutionary events Science 172:1089-1096
Van de Peer Y., A. Ben Ali, A. Meyer, 2000 Microsporidia: accumulating molecular evidence that a group of amitochondriate and suspectedly primitive eukaryotes are just curious fungi Gene 246:1-8
Xia D., C. A. Yu, H. Kim, J. Z. Xia, A. M. Kachurin, L. Zhang, L. Yu, J. Deisenhofer, 1997 Crystal structure of the cytochrome bc1 complex from bovine heart mitochondria Science 277:60-66
Yamaoka M., K. Isobe, H. Shitara, H. Yonekawa, S. Miyabayashi, J. I. Hayashi, 2000 Complete repopulation of mouse mitochondrial DNA-less cells with rat mitochondrial DNA restores mitochondrial translation but not mitochondrial respiratory function Genetics 155:301-307
Yang Z., 1996 Among-site rate variation and its impact on phylogenetic analyses Trends Ecol. Evol 11:367-370
Yang Z., 1997 PAML: phylogenetic analysis by maximum likelihood. Version 1.3 Department of Integrative Biology, University of California at Berkeley