Phylogénie, Bioinformatique et Génome, Université Pierre et Marie Curie, Paris, France
Correspondence: E-mail: herve.philippe{at}umontreal.ca.
Key Words: covarion, evolutionary rate, hemoglobin, heterotachy, protein function, tertiary structures
Introduction
The switch of constraints on positions over time is a poorly understood phenomenon. Although the notion that not all sites in a protein are subject to the same evolutionary forces is well established (Kimura 1983), a given site can also show dramatic changes in substitution rate on separate parts of a phylogeny. Evidence for such behavior dates back as far as the 1970s, with the formulation of Fitch's covarion model of molecular evolution (Fitch and Markowitz 1970). The term heterotachy (Greek for "different speed") was recently coined to refine the description of this phenomenon (Philippe and Lopez 2001), as opposed to homotachy, which indicates a homogeneous substitution rate. Because heterotachy (i.e., lineage-specific substitution rate shifts) reflects variation over time in the constraints acting on specific sites of a protein structure, it is generally regarded as a hallmark of functional divergence (Gaucher et al. 2002b). Under this reasoning, the identification of heterotachous profiles between paralogous genes would be potentially informative for structure/function prediction analyses, because gene duplication is a major source of functional innovation (Ohno, Wolf, and Atkin 1968). Various approaches have been applied to a number of paralogous families in order to identify shifts in replacement rates that may be indicative of their functional diversification after duplication (Gu 1999, 2001; Gaucher, Miyamoto, and Benner 2001; Knudsen and Miyamoto 2001; Gaucher et al. 2002a, 2002b). Recently, Naylor and Gerstein (2000) employed Gu's coefficient of functional divergence (Gu 1999) to identify shifts in variability profiles between alpha and beta globins for three groups of mammals as a measure of their specialization over evolutionary time. Because they observed marked rate shifts between alpha and beta globins, but not within each subunit, they concluded that this approach can successfully pinpoint functional differentiation.
However, because a very large number of sequences is needed to assess site-specific rate shifts with statistical confidence (Lopez, Casane, and Philippe 2002), these results might have been biased by both sparse sampling and limited evolutionary distances. In addition to sites with replacement rate switches, highly constrained residues may also hold information regarding functional divergence. For instance, sites that switched state between two diverging proteins, but that nonetheless remained under high evolutionary constraint and are consequently not identified in analyses of heterotachous positions, might be important in the process of functional specialization. Although pinpointed by different approaches (Lichtarge, Bourne, and Cohen 1996; Gu 2001), this type of site remains poorly investigated in functional genomics studies.
We applied an approach coupling evolutionary and structural knowledge to survey the sites potentially responsible for functional specialization in the two subunits of vertebrate hemoglobin. Recently, such an approach was employed to investigate the fine functional differences between elongation factor homologs in bacteria and eukarya (Gaucher, Miyamoto, and Benner 2001; Gaucher et al. 2002a). Our choice was driven by the considerable representation of hemoglobin sequences in molecular databases, and by the vast amount of structural and functional information available. Tetrameric vertebrate hemoglobin consists of two identical α subunits of 141 residues and two identical β subunits of 146 residues, each containing one heme group. Subunits α and β are paralogous peptides arising from an ancient gene duplication at the base of jawed vertebrates. Oxygen binding is cooperative and is associated with a large shift in the quaternary structure of the heterotetramer, from the deoxy (T) to the oxy (R) form, as one dimer rotates relative to the other. Two types of subunit interfaces are implicated in this transition, namely α1β1 (or α2β2) and α1β2 (or α2β1), also referred to as packing and sliding surfaces, respectively (Perutz 1970). During the transition from the T to the R state, the α1β2-subunit interface undergoes a dramatic sliding movement, whereas the α1β1-subunit interface remains essentially unchanged. Mutational studies have demonstrated the importance of both inter-subunit and intra-subunit contact regions for critical hemoglobin properties such as oxygen affinity and cooperativity (Shionyu, Takahashi, and Go 2001). We sought to identify and study the set of positions potentially implicated in the development of such highly functional interactions over the divergence of α and β subunits. We first studied heterotachous positions and found a similar level of variability-profile shift between orthologs and between paralogs, suggesting that heterotachous positions may be poor signatures of functional divergence. We then turned to examining more constrained positions, which appeared to be much more reliable predictors.
Methods
Phylogenetic Analyses and Study of Site-Specific Evolutionary Behavior
Heterotachy was tested as previously described (Lopez, Casane, and Philippe 2002). Briefly, the number of substitutions at each site was calculated on a phylogenetic tree obtained from each of the six subalignments (corresponding to three α and three β sequence groups, one of each per taxonomic cluster). Substitution numbers were inferred by maximum likelihood (ML) using PAML (Yang 1997) with the JTT + F + Γ model. Each position was described by a profile indicating the numbers of inferred substitutions for every group. The program HTACH (Philippe, unpublished) was then employed for all possible binary comparisons to classify positions as (1) homotachous, (2) heterotachous, (3) constant, or (4) constant but different (CBD). Positions displaying fewer than three substitutions in total were classified as untestable, because no statistical test can detect a difference in such cases (this corresponds to the criterion "a total number of substitutions smaller than half the number of groups" used by Lopez, Forterre, and Philippe [1999]). Positions with only one change in one terminal branch were considered for categories (3) or (4), because this difference is likely the result of sequencing error.
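The classification rules above can be illustrated with a minimal Python sketch for a single binary comparison. It is not the HTACH program, which is unpublished and not described in detail here. The function names, the `len_a` and `len_b` parameters, and the exact binomial test are all assumptions made for illustration: the test supposes that, under homotachy, substitutions at a site are shared between the two groups in proportion to their total tree lengths. The special handling of a single change on a terminal branch is omitted for brevity.

```python
from math import comb


def binomial_two_sided_p(k, n, p):
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely
    than the observed count k (illustrative choice of test).
    """
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = probs[k] * (1 + 1e-9)  # small tolerance for float ties
    return min(1.0, sum(q for q in probs if q <= cutoff))


def classify_site(subs_a, subs_b, state_a, state_b, len_a, len_b, alpha=0.05):
    """Classify one alignment position for one pair of sequence groups.

    subs_a, subs_b   : inferred substitution counts in each group
    state_a, state_b : amino acid of a group when it is invariant, else None
    len_a, len_b     : total tree length of each group (hypothetical input
                       used to set the expected share under homotachy)
    """
    total = subs_a + subs_b
    if total == 0:
        # No change in either group: constant, or constant but different.
        return "constant" if state_a == state_b else "CBD"
    if total < 3:
        # Too few substitutions for any test to detect a difference.
        return "untestable"
    expected_share = len_a / (len_a + len_b)
    p_value = binomial_two_sided_p(subs_a, total, expected_share)
    return "heterotachous" if p_value < alpha else "homotachous"
```

For instance, a site with ten substitutions in one group and none in the other, with equal tree lengths, would be called heterotachous by this sketch, whereas a site with five substitutions in each group would be called homotachous.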
Structure Analyses
The different classes of sites were superimposed onto three-dimensional hemoglobin structures retrieved from the Protein Data Bank (Berman et al. 2000), by using the user script option in RasMol (Sayle and Milner-White 1995). Side chain solvent accessibility was calculated by the program Access (http://www.csb.yale.edu/userguides/datamanip/access/access_descrip.html), but because various types of positions display similar values, this is not discussed here (see table S1 in the online Supplementary Material). Mutational data were retrieved from the Databases of Human Hemoglobin Variants and Other Resources through the Globin Gene Server at http://globin.cse.psu.edu/hbvar/menu.html (Hardison et al. 2002).
Results and Discussion
The significance of these findings was investigated further by observing the structural distribution of the 40 heterotachous (H) positions identified in mammalian paralogous comparisons (fig. 2a; for a detailed list, see table S2 in the online Supplementary Material). These positions appeared evenly dispersed over the whole structure, at both internal and external locations (fig. 2a). Some of the H positions are likely to hold high functional significance, as they showed strong constraints in one subunit and much higher variability in the other. Consistently, the function of such residues was critical to only one chain (see table S2 in the online Supplementary Material). This was the case for six positions lying at inter-subunit contact surfaces. For example, leucine α40 presented no substitutions over the whole mammalian tree, whereas its β39 homolog was much more variable. This site is crucial in the α chain as it interacts with histidine β146 at the sliding interface. In contrast, we found no functional indication for residue β39. Similarly, position β60 displayed a remarkably conserved valine over the whole mammalian tree, whereas its α homolog switched to variable amino acids on different branches. A valine to glutamate mutation at this site is reported to lead to a highly unstable β globin responsible for a severe form of thalassemia (Podda et al. 1991), whereas the α55 homolog does not appear to be involved in any essential interaction.
To find more genuine signatures of functional divergence between the two globin subunits, we turned to positions harboring strong evolutionary constraints in both paralogs. Among them, we selected those that displayed different amino acid states in each paralog. Accordingly, we named them "conserved but different" (CBD). As shown in figure 1c, we found that, in contrast to H sites, CBD sites were overrepresented in paralogous comparisons with respect to orthologous ones, with a mean of 10% and 2%, respectively. The nonparametric Mann-Whitney rank test suggested that the difference was significant (w = 23; P < 0.002). For example, although a total of 15 and 13 such positions were identified in paralogous comparisons in Sauropsida and mammals, respectively, only one was found in orthologous comparisons between Sauropsida and mammals (see fig. 1c). This evidence seems to indicate a likely involvement of CBD positions in the specialization of the two globin families.
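The kind of rank comparison reported above can be sketched in plain Python. This is only an illustration of a Mann-Whitney test with an exact p-value obtained by enumerating every relabeling of the pooled values. It does not reproduce the authors' computation or their reported statistic, and the function names are hypothetical. The inputs would be the per-comparison percentages of CBD sites for paralogous and for orthologous comparisons, which are not listed in this text.

```python
from itertools import combinations


def midranks(values):
    """Return 1-based ranks of values, averaging ranks over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def mann_whitney_exact(x, y):
    """Return (U statistic of x, exact two-sided p-value).

    The p-value is the fraction of all ways of choosing len(x) items from
    the pooled sample whose rank sum is at least as far from its null
    expectation as the observed rank sum. Only practical for small samples.
    """
    ranks = midranks(list(x) + list(y))
    n_x, n = len(x), len(x) + len(y)
    expected_sum = n_x * (n + 1) / 2
    observed_sum = sum(ranks[:n_x])
    u_x = observed_sum - n_x * (n_x + 1) / 2
    observed_dev = abs(observed_sum - expected_sum)
    extreme = total = 0
    for subset in combinations(range(n), n_x):
        total += 1
        rank_sum = sum(ranks[i] for i in subset)
        if abs(rank_sum - expected_sum) >= observed_dev - 1e-12:
            extreme += 1
    return u_x, extreme / total
```

Exhaustive enumeration is chosen here because the number of pairwise comparisons between six sequence groups is small, so no normal approximation is needed.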
To confirm this prediction, we studied the distribution of CBD sites on the hemoglobin quaternary structure. When superimposed onto the 3D structure of human adult hemoglobin, the 13 CBD sites identified in mammalian paralogous comparisons (for a detailed list, see table S2 in the online Supplementary Material) were concentrated at non-exposed locations (fig. 2b). This concentration was confirmed by the fact that almost all of them (10/13) were indeed reported to occupy contact surfaces (see table S2 in the online Supplementary Material), such as the central cavity, ligand binding pockets, and inter-subunit contacts. In particular, six CBD sites were directly involved in both α1β2 (sliding) and α1β1 (packing) interfaces (Perutz 1970; Shionyu, Takahashi, and Go 2001). For example, tyrosine α41 and its homolog arginine β40 were identified as a highly constrained CBD couple in mammals. These sites interact with each other at the sliding surface in the oxy state. Another case is that of arginine α141 and its homolog histidine β146, both of which presented no substitutions over the whole mammalian tree. These sites are involved in crucial interactions with different residues in the deoxy state. The high proportion of CBD positions at inter-subunit surfaces supports their role as potential indicators of functional divergence, because the refinement of interactions at these interfaces played a fundamental role in the evolution of critical functions such as modulation of oxygen affinity and cooperative binding (Perutz 1970). It will be interesting to verify whether CBD sites have the same critical role in proteins that do not oligomerize, once their representation in public databases is sufficient to allow an analysis similar to that presented here. Only three CBD sites occupied external locations on human hemoglobin (fig. 2b; see also table S2 in the online Supplementary Material). Because the reason for their strong conservation on the heterotetramer surface is not obvious, and because they are probably involved in functional divergence, they might represent promising candidates for further experimental studies.
In conclusion, our study underlines the power of integrating evolutionary analyses with structural data in functional prediction studies. Specifically, we identify CBD positions as more reliable markers of functional specialization than heterotachous sites. Although heterotachy remains a phenomenon worthy of further investigation for understanding the dynamics of protein evolution, the study of CBD sites may represent a novel and promising direction for genomic studies aimed at dissecting the function of members of large multigene families.
Footnotes
Brian Golding, Associate Editor
Literature Cited
Berman, H. M., J. Westbrook, Z. Feng, G. Gilliland, T. N. Bhat, H. Weissig, I. N. Shindyalov, and P. E. Bourne. 2000. The Protein Data Bank. Nucleic Acids Res. 28:235-242.
Bickel, P. J., K. J. Kechris, P. C. Spector, G. J. Wedemayer, and A. N. Glazer. 2002. Finding important sites in protein sequences. Proc. Natl. Acad. Sci. USA 99:14764-14771.
Blouin, C., Y. Boucher, and A. J. Roger. 2003. Inferring functional constraints and divergence in protein families using 3D mapping of phylogenetic information. Nucleic Acids Res. 31:790-797.
Casari, G., C. Sander, and A. Valencia. 1995. A method to predict functional residues in proteins. Nat. Struct. Biol. 2:171-178.
Eisen, J. A. 1998. Phylogenomics: improving functional predictions for uncharacterized genes by evolutionary analysis. Genome Res. 8:163-167.
Eisenberg, D., E. M. Marcotte, I. Xenarios, and T. O. Yeates. 2000. Protein function in the post-genomic era. Nature 405:823-826.
Enright, A. J., and C. A. Ouzounis. 2001. Functional associations of proteins in entire genomes by means of exhaustive detection of gene fusions. Genome Biol. 2:Research0034.
Felsenstein, J. 1985. Phylogenies and the comparative method. Am. Nat. 125:1-15.
Fitch, W. M., and E. Markowitz. 1970. An improved method for determining codon variability in a gene and its application to the rate of fixation of mutations in evolution. Biochem. Genet. 4:579-593.
Gaucher, E. A., U. K. Das, M. M. Miyamoto, and S. A. Benner. 2002a. The crystal structure of eEF1A refines the functional predictions of an evolutionary analysis of rate changes among elongation factors. Mol. Biol. Evol. 19:569-573.
Gaucher, E. A., X. Gu, M. M. Miyamoto, and S. A. Benner. 2002b. Predicting functional divergence in protein evolution by site-specific rate shifts. Trends Biochem. Sci. 27:315-321.
Gaucher, E. A., M. M. Miyamoto, and S. A. Benner. 2001. Function-structure analysis of proteins using covarion-based evolutionary approaches: elongation factors. Proc. Natl. Acad. Sci. USA 98:548-552.
Gu, X. 1999. Statistical methods for testing functional divergence after gene duplication. Mol. Biol. Evol. 16:1664-1674.
Gu, X. 2001. Maximum-likelihood approach for gene family evolution under functional divergence. Mol. Biol. Evol. 18:453-464.
Hannenhalli, S. S., and R. B. Russell. 2000. Analysis and prediction of functional sub-types from protein sequence alignments. J. Mol. Biol. 303:61-76.
Hardison, R. C., D. H. Chui, B. Giardine, C. Riemer, G. P. Patrinos, N. Anagnou, W. Miller, and H. Wajcman. 2002. HbVar: a relational database of human hemoglobin variants and thalassemia mutations at the globin gene server. Hum. Mutat. 19:225-233.
Jensen, L. J., M. Skovgaard, and S. Brunak. 2002. Prediction of novel archaeal enzymes from sequence-derived features. Protein Sci. 11:2894-2898.
Kimura, M. 1983. The neutral theory of molecular evolution. Cambridge University Press, Cambridge.
Knudsen, B., and M. M. Miyamoto. 2001. A likelihood ratio test for evolutionary rate shifts and functional divergence among proteins. Proc. Natl. Acad. Sci. USA 98:14512-14517.
Lichtarge, O., H. R. Bourne, and F. E. Cohen. 1996. An evolutionary trace method defines binding surfaces common to protein families. J. Mol. Biol. 257:342-358.
Lichtarge, O., and M. E. Sowa. 2002. Evolutionary predictions of binding surfaces and interactions. Curr. Opin. Struct. Biol. 12:21-27.
Lopez, P., D. Casane, and H. Philippe. 2002. Heterotachy, an important process of protein evolution. Mol. Biol. Evol. 19:1-7.
Lopez, P., P. Forterre, and H. Philippe. 1999. The root of the tree of life in the light of the covarion model. J. Mol. Evol. 49:496-508.
Marcotte, E. M., M. Pellegrini, H. L. Ng, D. W. Rice, T. O. Yeates, and D. Eisenberg. 1999. Detecting protein function and protein-protein interactions from genome sequences. Science 285:751-753.
Naylor, G. J., and M. Gerstein. 2000. Measuring shifts in function and evolutionary opportunity using variability profiles: a case study of the globins. J. Mol. Evol. 51:223-233.
Ohno, S., U. Wolf, and N. B. Atkin. 1968. Evolution from fish to mammals by gene duplication. Hereditas 59:169-187.
Pellegrini, M., E. M. Marcotte, M. J. Thompson, D. Eisenberg, and T. O. Yeates. 1999. Assigning protein functions by comparative genome analysis: protein phylogenetic profiles. Proc. Natl. Acad. Sci. USA 96:4285-4288.
Perutz, M. F. 1970. Stereochemistry of cooperative effects in haemoglobin. Nature 228:726-739.
Philippe, H. 1993. MUST, a computer package of management utilities for sequences and trees. Nucleic Acids Res. 21:5264-5272.
Philippe, H., and P. Lopez. 2001. On the conservation of protein sequences in evolution. Trends Biochem. Sci. 26:414-416.
Podda, A., R. Galanello, L. Maccioni, M. A. Melis, C. Rosatelli, L. Perseu, and A. Cao. 1991. Hemoglobin Cagliari (beta 60 [E4] Val-Glu): a novel unstable thalassemic hemoglobinopathy. Blood 77:371-375.
Sayle, R. A., and E. J. Milner-White. 1995. RasMol: biomolecular graphics for all. Trends Biochem. Sci. 20:374-376.
Shionyu, M., K. Takahashi, and M. Go. 2001. Variable subunit contact and cooperativity of hemoglobins. J. Mol. Evol. 53:416-429.
Tame, J. R., and B. Vallone. 2000. The structures of deoxy human haemoglobin and the mutant Hb Tyralpha42His at 120 K. Acta Crystallogr. D. Biol. Crystallogr. 56:805-811.
Yang, Z. 1997. Phylogenetic Analysis by Maximum Likelihood (PAML), Version 1.3. Department of Integrative Biology, University of California at Berkeley.
Yang, Z. 2002. Inference of selection from multiple species alignments. Curr. Opin. Genet. Dev. 12:688-694.