Clinical Pharmacology Department, Royal College of Surgeons in Ireland, Dublin, Ireland
Abstract
Key Words: Gene duplication, gene function, adaptive evolution, amino acid replacement, conserved sites
Introduction
Data and Methods
Phylogenetic Tree and Ancestral Node Reconstruction
We performed a large-scale comparison, at the amino acid level, of evolutionary patterns after duplication and after speciation events. Phylogenetic trees were constructed automatically from 1,821 Pfam protein domain "seed" alignments using ClustalW (Thompson et al. 1997), which builds neighbor-joining trees from distances corrected for multiple amino acid replacements. Trees were rooted using Retree from the PHYLIP package (Felsenstein 1993). Because the trees were constructed and rooted automatically, some errors in topology and in the inferred direction of evolution are expected. Although such errors add noise to the data set, they should not be biased with respect to the definition of speciation and duplication branches. Ancestral sequences were reconstructed for each internal node of each tree using software from the PAML package (Yang 1997).
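ClustalW's option to correct protein distances for multiple substitutions is usually described as applying Kimura's empirical formula, D = -ln(1 - p - 0.2p²), where p is the proportion of differing residues. The sketch below assumes that formula and, as a further assumption, skips columns in which either sequence has a gap. It only illustrates the correction. It is not ClustalW's code, and ClustalW's separate handling of very divergent pairs is not reproduced.

```python
import math


def kimura_protein_distance(seq_a: str, seq_b: str) -> float:
    """Kimura's empirical correction for multiple amino acid replacements.

    p is the proportion of differing residues among aligned positions where
    neither sequence has a gap ('-'); the corrected distance is
    D = -ln(1 - p - 0.2 * p**2).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped positions to compare")
    p = sum(a != b for a, b in pairs) / len(pairs)
    arg = 1.0 - p - 0.2 * p * p
    if arg <= 0.0:
        # The formula breaks down for very divergent pairs (p above about 0.85).
        raise ValueError("sequences too divergent for the Kimura formula")
    return -math.log(arg)


# Hypothetical aligned fragments: 1 difference among 4 ungapped positions,
# so p = 0.25 and D = -ln(0.7375), roughly 0.30.
print(kimura_protein_distance("MKV-L", "MKVAI"))
```

A pairwise matrix of such distances is what a neighbor-joining step would then take as input.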
Defining Postduplication Branches
Consider a tree containing proteins from only one organism: it would contain only duplication nodes and branches. Adding sequences from other species is equivalent to introducing speciation nodes at effectively random points along these branches. Evolution after a speciation node should therefore be the same as evolution at a random point along the original duplication branches; that is, the speciation nodes can be treated as random samples of positions along those branches.
Each branch of the tree was classified as following either a gene duplication or a speciation event, and the evolutionary changes between the reconstructed sequences at the nodes at the beginning and end of the branch were analyzed (fig. 1). Duplication nodes, and the "duplication branches" descending from them, were defined as all nodes whose two descendant branches were both ancestral to members of the same species. All other nodes were considered nonduplication, or "speciation," nodes. When the tree topology is correct, nodes arising through gene duplication should be identified correctly, although some duplication nodes may be misidentified as speciation nodes when paralogs descended from them have not been identified or have been lost through deletion. Such misclassification reduces the observed differences between postduplication and postspeciation branches, so the true differences are probably greater than those observed. A total of 35,500 speciation and 14,377 duplication branches were assessed.
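The species-overlap rule can be sketched as a postorder traversal that collects the species below each node: a node is labeled a duplication node when its two subtrees share a species, and the branches below it are then postduplication branches. The tree encoding (nested tuples) and the leaf names (UniProt-style `NAME_SPECIES/start-end`, with the species mnemonic after the last underscore) are assumptions made for illustration, as are the globin names in the example.

```python
from typing import FrozenSet, List, Tuple, Union

# Hypothetical tree encoding: a leaf is a Pfam-style sequence name such as
# "HBA_HUMAN/1-141"; an internal node is a (left, right) tuple.
Tree = Union[str, Tuple["Tree", "Tree"]]


def species_of(leaf: str) -> str:
    """Species mnemonic after the last '_' of a UniProt-style entry name."""
    entry_name = leaf.split("/", 1)[0]
    return entry_name.rsplit("_", 1)[-1]


def classify_nodes(tree: Tree, labels: List[Tuple[Tree, str]]) -> FrozenSet[str]:
    """Label every internal node as 'duplication' or 'speciation'.

    A node is a duplication node when its two descendant subtrees share at
    least one species; labels are appended in postorder. Returns the set of
    species found below `tree`.
    """
    if isinstance(tree, str):
        return frozenset([species_of(tree)])
    left, right = tree
    left_species = classify_nodes(left, labels)
    right_species = classify_nodes(right, labels)
    kind = "duplication" if left_species & right_species else "speciation"
    labels.append((tree, kind))
    return left_species | right_species


tree = (("HBA_HUMAN/1-141", "HBA_MOUSE/1-141"),
        ("HBB_HUMAN/1-146", "HBB_MOUSE/1-146"))
labels: List[Tuple[Tree, str]] = []
classify_nodes(tree, labels)
for node, kind in labels:
    print(kind, node)
```

Here the two within-gene nodes come out as speciation nodes and the root, whose subtrees both contain human and mouse sequences, as a duplication node.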
Constrained Sites
Sequence positions for which there was evidence (at the 90% confidence level) that the position was evolving more slowly than the sequence as a whole (i.e., $r_i < 1$; see below) were classified as constrained sites (on average 23% of sequence positions). Let $d_c$ be the inferred number of replacements per constrained site and $d$ the number of replacements per site; $d_c$ and $d$ are corrected separately for multiple hits. The ratio $b = d_c/d$ indicates the proportion of replacements occurring at constrained sites relative to the proportion of sites that are constrained.
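For a single branch, b can be computed from counts of positions that differ between the two node sequences. The multiple-hit correction is not specified here, so the sketch assumes the simple Poisson correction d = -ln(1 - p) for both distances. The function names and the example counts are hypothetical.

```python
import math


def poisson_corrected(changed_sites: int, total_sites: int) -> float:
    """Replacements per site from the proportion of changed sites, d = -ln(1 - p)."""
    if total_sites <= 0:
        raise ValueError("need at least one site")
    p = changed_sites / total_sites
    if p >= 1.0:
        raise ValueError("every site changed; the correction is undefined")
    return -math.log(1.0 - p)


def constrained_ratio(changed_constrained: int, n_constrained: int,
                      changed_all: int, n_all: int) -> float:
    """b = d_c / d for one branch, each distance corrected separately."""
    d_c = poisson_corrected(changed_constrained, n_constrained)
    d = poisson_corrected(changed_all, n_all)
    if d == 0.0:
        raise ValueError("no replacements on this branch; b is undefined")
    return d_c / d


# Hypothetical branch: 100 aligned positions, 23 of them constrained;
# 20 positions differ between the two node sequences, 2 of them constrained.
print(constrained_ratio(2, 23, 20, 100))
```

Without the correction, the same branch gives (2/23)/(20/100), about 0.43. This equals the share of replacements at constrained sites (2 of 20) divided by the share of sites that are constrained (23 of 100). A value of 1 would mean that constrained sites change as often as the average site.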
Whether a residue evolved significantly more slowly than the sequence as a whole was determined as follows. The probability of a given amino acid substitution at residue position $i$ on a branch is described by the PAM matrix (Dayhoff 1972) of length $L r_i$, where $L$ is the actual PAM length of the branch and $r_i$ is the relative rate of evolution at position $i$ (a length of 10 PAM corresponds to 10 replacements per 100 sites). The accuracy of the estimate of $r_i$ depends on the number of branches, on the lengths of the branches, and on the number of residues in the protein. We therefore generated a confidence interval for $r_i$ that allows for the different amounts of information about $r_i$ carried by branches of different lengths. From Bayes's Theorem,
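As a rough illustration of how a posterior for $r_i$ can combine branches of different lengths, the sketch below evaluates it on a grid. It replaces the full PAM likelihood with three simplifying assumptions that are not the source's exact model. First, each branch contributes only whether position $i$ changed, with P(change) = 1 - exp(-L r / 100). Second, the prior for r is uniform on (0, 10]. Third, "significantly slower at the 90% level" is read as a posterior probability P(r_i < 1) of at least 0.9.

```python
import math
from typing import Sequence, Tuple


def prob_rate_below_one(branches: Sequence[Tuple[float, bool]],
                        r_max: float = 10.0, step: float = 0.005) -> float:
    """Posterior probability that the relative rate r_i of one position is below 1.

    branches: (PAM length L > 0 of the branch, whether position i changed on it).
    Simplifications (assumptions, not the source's exact model):
      * P(position changes on a branch) = 1 - exp(-L * r / 100);
      * uniform prior for r on a grid over (0, r_max].
    """
    grid = [step * k for k in range(1, int(round(r_max / step)) + 1)]
    log_like = []
    for r in grid:
        total = 0.0
        for pam_length, changed in branches:
            lam = pam_length * r / 100.0  # expected replacements at this site
            # log(1 - exp(-lam)) via expm1 for accuracy when lam is small
            total += math.log(-math.expm1(-lam)) if changed else -lam
        log_like.append(total)
    # With a flat prior the posterior is proportional to the likelihood.
    peak = max(log_like)
    weights = [math.exp(v - peak) for v in log_like]
    below = sum(w for r, w in zip(grid, weights) if r < 1.0)
    return below / sum(weights)


def is_constrained(branches: Sequence[Tuple[float, bool]], level: float = 0.9) -> bool:
    """Call a position constrained when P(r_i < 1 | data) >= level."""
    return prob_rate_below_one(branches) >= level


unchanged = [(20.0, False)] * 30  # never changed on 30 branches of 20 PAM
variable = [(20.0, True)] * 30    # changed on every one of them
print(is_constrained(unchanged), is_constrained(variable))
```

Long branches without a change push the posterior toward small r much more strongly than short ones. This is the sense in which branches of different lengths carry different amounts of information about $r_i$.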
Branches with Unusually High Numbers of Constrained Site Changes
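Assuming a binomial distribution for replacements at constrained and unconstrained sites on a branch, the probability of the observed or a greater number of replacements at constrained sites occurring on a branch was calculated as

$$P = \sum_{k=m}^{n} \binom{n}{k} p^{k} (1-p)^{n-k},$$

where $n$ is the total number of replacements on the branch, $m$ is the number of those replacements at constrained sites, and $p$ is the expected probability that a replacement falls at a constrained site.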
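The upper-tail binomial probability for a single branch can be computed directly by summing binomial terms from m up to n. The way the null proportion p was set is not stated here, so the value used in the usage line is purely hypothetical. One natural choice is the family-wide share of replacements that fall at constrained sites.

```python
from math import comb


def binomial_upper_tail(n: int, m: int, p: float) -> float:
    """P(X >= m) for X ~ Binomial(n, p).

    n: total replacements on the branch; m: replacements at constrained sites;
    p: probability that a single replacement falls at a constrained site.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    if m <= 0:
        return 1.0
    if m > n:
        return 0.0
    return sum(comb(n, k) * p**k * (1.0 - p) ** (n - k) for k in range(m, n + 1))


# Hypothetical branch: 20 replacements, 8 at constrained sites, assumed p = 0.1.
print(binomial_upper_tail(20, 8, 0.1))
```

`math.comb` requires Python 3.8 or later. A small tail probability flags a branch with unusually many constrained-site changes.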
Error bars for the figures were calculated by the bootstrap method (Efron 1979): the data sets were resampled randomly with replacement 1,000 times, and the 2.5% tail at each end of the resulting distribution was used to produce the error bars for figure 2.
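A percentile bootstrap of this kind can be sketched as follows. The function resamples the observations with replacement 1,000 times, recomputes a statistic each time, and reads off the values that cut 2.5% from each end of the sorted replicates. What exactly was resampled (for example, individual branches within a PAM bin) and which statistic was bootstrapped are assumptions. The mean is only a placeholder default.

```python
import random
from statistics import mean
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


def bootstrap_interval(data: Sequence[T],
                       statistic: Callable[[Sequence[T]], float] = mean,
                       n_boot: int = 1000, tail: float = 0.025,
                       seed: int = 1) -> Tuple[float, float]:
    """Percentile bootstrap interval: resample `data` with replacement
    `n_boot` times and cut `tail` of the replicate distribution from each end."""
    if not data:
        raise ValueError("need at least one observation")
    rng = random.Random(seed)
    n = len(data)
    replicates = sorted(
        statistic([data[rng.randrange(n)] for _ in range(n)]) for _ in range(n_boot)
    )
    k = round(tail * n_boot)  # 25 replicates per tail when n_boot = 1000
    return replicates[k], replicates[n_boot - 1 - k]


# Hypothetical per-branch b values from one PAM bin.
b_values = [0.12, 0.25, 0.18, 0.31, 0.09, 0.22, 0.27, 0.15]
print(bootstrap_interval(b_values))
```

Passing a different `statistic` (for example, a pooled b computed from per-branch counts) reuses the same resampling loop.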
Results
We tested whether replacements at constrained sites were more frequent after duplication. "Constrained sites" were defined as those with a significantly slower than average rate of evolution. For a given branch, b is defined as the ratio of the number of replacements per constrained site to the number of replacements per site. Figure 2 compares the values of b for branches after gene duplication and for branches after speciation. Over a range of evolutionary distances, there is a consistent and significant excess of constrained site change after duplication. For example, for branches of PAM 6 to 10, b is approximately 60% higher on postduplication branches. Constrained sites are therefore subject to greater change after gene duplication than after speciation. Within vertebrate lineages alone, in the range 5 < PAM < 31, the value of b after duplication nodes was 0.23, significantly higher (P = 0.001) than after nonduplication nodes (0.17), so vertebrate protein evolution shows a similar excess of constrained site change after duplication.
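The comparison in PAM-length bins can be sketched by grouping branch records by bin and by branch type, then averaging b within each group. The record layout, the half-open bin edges, and the use of a simple mean of per-branch b (rather than a pooled ratio) are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Sequence, Tuple

Branch = Tuple[float, bool, float]  # (PAM length, after duplication?, b)
Key = Tuple[Tuple[float, float], str]


def mean_b_by_bin(branches: Iterable[Branch], edges: Sequence[float]) -> Dict[Key, float]:
    """Average b per PAM-length bin, separately for postduplication and
    postspeciation branches. Bins are half-open: edges[j] <= L < edges[j + 1]."""
    groups: Dict[Key, List[float]] = defaultdict(list)
    for pam, after_dup, b in branches:
        for lo, hi in zip(edges, edges[1:]):
            if lo <= pam < hi:
                kind = "duplication" if after_dup else "speciation"
                groups[((lo, hi), kind)].append(b)
                break  # branches outside all bins are skipped
    return {key: mean(vals) for key, vals in groups.items()}


def relative_increase(b_dup: float, b_spec: float) -> float:
    """Fractional excess of b after duplication relative to after speciation."""
    return (b_dup - b_spec) / b_spec


# Hypothetical branch records.
records = [(7.0, True, 0.3), (8.0, True, 0.1), (7.0, False, 0.1), (20.0, False, 0.2)]
print(mean_b_by_bin(records, [6, 11, 31]))
print(relative_increase(0.23, 0.17))
```

Applied to the vertebrate values quoted above, `relative_increase(0.23, 0.17)` gives (0.23 - 0.17)/0.17, about 0.35, which is roughly a 35% higher b after duplication.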
On average, nodes preceding gene duplications occur earlier in the phylogenetic trees than speciation nodes. To test whether this introduced a bias, we compared postduplication and postspeciation branches in bins of evolutionary depth, approximated as the average PAM length between the node and its descendant terminal nodes. Although the ratio b increases with evolutionary depth, the difference between postduplication and postspeciation branches remained significant (fig. 3). The increase in constrained site change with evolutionary depth could reflect long-term selection for diversity of protein function or gradual changes in the set of residues that are constrained (Miyamoto and Fitch 1995).
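Evolutionary depth as defined here is the mean path length from a node to every terminal node below it. The sketch assumes a hypothetical tree encoding in which an internal node is a list of (child, branch length in PAM) pairs and a leaf is a sequence name.

```python
from typing import List, Tuple, Union

# Hypothetical encoding: a leaf is a sequence name; an internal node is a list of
# (child, branch length in PAM) pairs.
Node = Union[str, List[Tuple["Node", float]]]


def leaf_distances(node: Node) -> List[float]:
    """Path length (PAM) from `node` to each terminal node below it."""
    if isinstance(node, str):
        return [0.0]
    distances: List[float] = []
    for child, length in node:
        distances.extend(length + d for d in leaf_distances(child))
    return distances


def evolutionary_depth(node: Node) -> float:
    """Average PAM length between a node and its descendant terminal nodes."""
    distances = leaf_distances(node)
    return sum(distances) / len(distances)


subtree = [("A", 4.0), ([("B", 2.0), ("C", 6.0)], 1.0)]
print(evolutionary_depth(subtree))  # (4 + 3 + 7) / 3
```

Averaging over leaves, rather than taking the longest path, keeps a single long terminal branch from dominating the depth estimate.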
Discussion
It is also possible that functional redundancy is maintained for a long time after duplication in a subset of duplicated genes (Nowak et al. 1997; Taverna and Goldstein 2000; Wagner 2000). Because these genes remain duplicated for much longer than most gene pairs, they have a much greater opportunity to evolve novel functions. If this is the case, then long-term redundancy or near-redundancy may frequently be established after gene duplication, and the raw material for the evolution of novel function may be long-term functionally redundant gene pairs rather than the more common short-lived duplicate pairs. Amino acid changes accumulated over divergence times far exceeding the estimated half-life of duplicate genes (Lynch and Conery 2000) are thus likely to be relevant to the fate of duplicated genes.
Previous investigations (Lynch and Conery 2000; Kondrashov et al. 2002) have revealed an increase after gene duplication in the ratio of nonsynonymous nucleotide substitutions per nonsynonymous site (dN) to synonymous substitutions per synonymous site (dS). Although many examples of the evolution of functional novelty after gene duplication have been found (Rosenberg et al. 1995; Zhang, Rosenberg, and Nei 1998; Zhang, Kimberly, and Rosenberg 2000), it is not clear whether the increase in the rate of nonsynonymous substitution more often reflects functional change or a reduction in purifying selection that allows more slightly deleterious mutations to be fixed. If the increase in dN reflects reduced efficiency of purifying selection during a prolonged period in which functional redundancy is maintained, then the number of slightly deleterious replacements should increase. Maintaining functional redundancy, however, requires that replacements that completely alter or destroy protein function continue to be selected against. We would therefore not expect the increase in slightly deleterious replacements to be accompanied by an increase in replacements at highly constrained sites, and the proportion of potentially function-altering replacements should not increase after duplication. We have found evidence that, after duplication, the proportion of amino acid replacements at normally constrained sites does increase. We argue that amino acid replacements at constrained sites are indicative of changes in protein function, and suggest that increased functional change after gene duplication is a general phenomenon that can be observed as an increased relative rate of replacement at constrained sites.
Constrained sites reflect the action of purifying selection against amino acid changes that would alter the properties of a protein so as to prevent it from carrying out its function effectively. We have interpreted the increased rate of replacement at constrained sites as evidence of greater functional change after duplication than after speciation. This interpretation is reasonable provided that "change in protein function" is used in a broad sense, to include loss of function, impaired function, or reduced function of one or both gene duplicates as well as the evolution of a novel function. Although a change in the functioning of a duplicated gene does not in itself prove that a novel protein function has evolved, over long timescales both copies of the gene are more likely to be maintained for proteins that have evolved novel functions or specificities. The change in function could also be temporary. For example, a change that reduces the fitness of one duplicate could allow a protein to move between distinct fitness peaks during a period of relaxed selection after duplication, followed by a return to stronger selection.
In some cases, a change in protein function may be caused by a set of amino acid replacements at normally constrained sites, with the preduplication site-specific rates restored soon after duplication. This has been referred to as type II functional divergence (Gu 1999). Alternatively, a change in protein function may permanently change the site-specific rates of replacement, so that the sites constrained after duplication differ from those constrained before duplication. This kind of functional divergence has been termed type I (Gu 1999). Although the method we have used to analyze functional divergence after duplication is better suited to detecting type II divergence, it should also detect type I divergence in many cases. We identified constrained amino acid sites from the set of replacements observed across an entire protein family. Type II functional divergence is associated with an increase in the proportion of amino acid replacements at sites that are constrained throughout the whole family. Type I divergence may alter the set of sites that are constrained. Provided the total length of branches after the duplication is substantially shorter than the remaining length of the tree, however, the constrained sites will be determined largely by the remaining branches, and an altered rate of replacement after duplication may then raise the proportion of replacements at constrained sites.
Previous codon-based comparisons of nonsynonymous changes per nonsynonymous site (dN) with synonymous changes per synonymous site (dS) point to a period of relaxed selection immediately after gene duplication (Lynch and Conery 2000; Kondrashov et al. 2002). Our results, however, show no increase in the proportion of replacements at constrained sites on short branches immediately after gene duplication. There are several possible explanations for this. One is that phylogenetic relationships involving short branches are inferred less accurately than those involving longer branches, which affects both the labeling of nodes as derived from duplication or speciation and the inference of the amino acid replacements on each branch. Another is that protein pairs immediately after duplication are more likely to be functionally redundant. While complete functional redundancy is maintained, the number of slightly deleterious mutations may increase without a concomitant increase in replacements at highly constrained sites (as argued above). If these shorter branches include many protein pairs that remain functionally redundant as well as proteins undergoing functional change, the redundant pairs may mask any functional adaptation along these branches. The change in the proportion of replacements at constrained sites reported here differs on two grounds from the relaxed selection reported previously. First, the timescale over which we observed our result (5 to 25 amino acid replacements per 100 sites) is longer than that of most comparisons of dN with dS after duplication (Lynch and Conery 2000; Kondrashov et al. 2002). Second, our method highlights amino acid replacements at a subset of residues, so the effect cannot be produced by an increase in the overall proportion of nonsynonymous change alone.
Comparisons of sequence divergence after gene duplication and after speciation should be approached with caution. In several published studies, an excess of dN over dS has been observed immediately after duplication, followed by a gradual decrease in dN/dS with time (Hughes 1999; Lynch and Conery 2000). These results have often been interpreted as evidence of a period of relaxed selection after duplication, with the normal mode of selection resuming after some time. Caution is needed because nonsynonymous sequence positions, unlike synonymous positions, are subject to a wide range of selective pressures. If, for example, a small proportion of the nonsynonymous positions in a sequence normally evolve under diversifying selection (e.g., an antigenic region of a pathogen), dN/dS may exceed 1 for comparisons of closely related sequences, regardless of whether the sequences diverged through duplication or speciation. Because we compare amino acid sequences and group branches into bins according to length, our comparison of postduplication and postspeciation evolution avoids this problem. To infer that positive selection was brought about by the duplication event, sequence divergence at a specific time point after duplication should be compared with divergence at the same time point after speciation for the same gene. The same caution applies to inferring selection from the ratio of radical to nonradical amino acid changes.
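A toy calculation shows how site heterogeneity alone can make dN/dS exceed 1 for close comparisons and then decline with divergence, with no change in selection. The model is a hypothetical illustration, not an analysis from this study. It has two classes of nonsynonymous sites: 10% evolve 20 times faster than synonymous sites and 90% evolve 20 times slower. Jukes-Cantor saturation is applied within each class, and a single Jukes-Cantor correction is applied to the pooled nonsynonymous differences. All parameter values are assumptions.

```python
import math


def jc_expected_p(d: float) -> float:
    """Expected proportion of differing sites after d substitutions per site (Jukes-Cantor)."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


def jc_distance(p: float) -> float:
    """Jukes-Cantor estimate of substitutions per site from a proportion p."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def apparent_dn_ds(t: float, fast_fraction: float = 0.1,
                   fast_omega: float = 20.0, slow_omega: float = 0.05) -> float:
    """dN/dS estimated from pooled nonsynonymous sites in a two-class toy model.

    t is divergence in expected synonymous substitutions per site, so the
    corrected synonymous distance is t itself. The rate ratios of both site
    classes are the same at every t: selection never changes.
    """
    p_n = (fast_fraction * jc_expected_p(fast_omega * t)
           + (1.0 - fast_fraction) * jc_expected_p(slow_omega * t))
    return jc_distance(p_n) / t


for t in (0.01, 0.1, 1.0):
    print(t, round(apparent_dn_ds(t), 2))
```

The estimate falls from above 1 at t = 0.01 to well below 1 at t = 1, because the fast class saturates early. A decline of this kind is therefore not, by itself, evidence that selection was relaxed immediately after duplication.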
Unusual changes at constrained sites provide important insights into the evolution of functional differences (Dermitzakis and Clark 2001; Knudsen and Miyamoto 2001). Identifying clusters of such changes within a protein pinpoints regions that may confer functional specificity (Casari, Sander, and Valencia 1995; Lichtarge, Bourne, and Cohen 1996; Gu 1999; Caffrey, O'Neill, and Shields 2000; Gu 2001a, 2001b, 2001c), complementing other approaches to identifying adaptive change, such as detecting an excess of nonsynonymous over synonymous DNA substitutions (Iwabe, Kuma, and Miyata 1996; Hughes 1999; Liberles et al. 2001). Such regions could, for example, serve as targets in the design of therapeutic drugs specific to the protein. The findings of this study provide a strong rationale for seeking residues that confer functional specificity, and indicate that in the majority of protein families such searches should focus on evolutionary changes that occurred over a reasonably wide evolutionary time frame after duplication.
Acknowledgements
Footnotes
Literature Cited
Bateman, A., E. Birney, R. Durbin, S. R. Eddy, K. L. Howe, and E. L. Sonnhammer. 2000. The Pfam protein families database. Nucleic Acids Res. 28:263-266.
Braunitzer, G., and I. Hiebl. 1988. Molecular aspects of high altitude respiration of birds. Hemoglobins of the striped goose (Anser indicus), the Andean goose (Chloephaga melanoptera) and vulture (Gyps rueppellii). Naturwissenschaften 75:280-287.
Caffrey, D. R., L. A. O'Neill, and D. C. Shields. 2000. A method to predict residues conferring functional differences between related proteins: application to MAP kinase pathways. Protein Sci. 9:655-670.
Casari, G., C. Sander, and A. Valencia. 1995. A method to predict functional residues in proteins. Nat. Struct. Biol. 2:171-178.
Dayhoff, M. 1972. Atlas of protein sequence and structure, Vol. 5. National Biomedical Research Foundation, Washington, DC.
Dermitzakis, E. T., and A. G. Clark. 2001. Differential selection after duplication in mammalian developmental genes. Mol. Biol. Evol. 18:557-562.
Durbin, R., S. Eddy, A. Krogh, and G. Mitchison. 1998. Biological sequence analysis. Cambridge University Press, Cambridge.
Efron, B. 1979. Bootstrap methods: another look at the jackknife. Ann. Stat. 7:1-26.
Felsenstein, J. 1993. PHYLIP (phylogeny inference package). Version 3.5c. Distributed by the author, Department of Genetics, University of Washington, Seattle.
Force, A., M. Lynch, F. B. Pickett, A. Amores, Y. L. Yan, and J. Postlethwait. 1999. Preservation of duplicate genes by complementary, degenerative mutations. Genetics 151:1531-1545.
Gu, X. 1999. Statistical methods for testing functional divergence after gene duplication. Mol. Biol. Evol. 16:1664-1674.
Gu, X. 2001a. A site-specific measure for rate difference after gene duplication or speciation. Mol. Biol. Evol. 18:2327-2330.
Gu, X. 2001b. Mathematical modeling for functional divergence after gene duplication. J. Comput. Biol. 8:221-234.
Gu, X. 2001c. Maximum-likelihood approach for gene family evolution under functional divergence. Mol. Biol. Evol. 18:453-464.
Hughes, A. L. 1992. Coevolution of the vertebrate integrin alpha- and beta-chain genes. Mol. Biol. Evol. 9:216-234.
Hughes, A. L. 1994. The evolution of functionally novel proteins after gene duplication. Proc. R. Soc. Lond. B Biol. Sci. 256:119-124.
Hughes, A. L. 1999. Adaptive evolution of genes and genomes. Oxford University Press, New York.
Iwabe, N., K. Kuma, and T. Miyata. 1996. Evolution of gene families and relationship with organismal evolution: rapid divergence of tissue-specific genes in the early evolution of chordates. Mol. Biol. Evol. 13:483-493.
Knudsen, B., and M. M. Miyamoto. 2001. A likelihood ratio test for evolutionary rate shifts and functional divergence among proteins. Proc. Natl. Acad. Sci. USA 98:14512-14517.
Kondrashov, F. A., I. B. Rogozin, Y. I. Wolf, and E. V. Koonin. 2002. Selection in the evolution of gene duplications. Genome Biol. 3:research0008.1-0008.9.
Li, W.-H. 1997. Molecular evolution. Sinauer Associates, Sunderland, Mass.
Liberles, D. A., D. R. Schreiber, S. Govindarajan, S. G. Chamberlin, and S. A. Benner. 2001. The adaptive evolution database (TAED). Genome Biol. 2:research0028.1-0028.6.
Lichtarge, O., H. R. Bourne, and F. E. Cohen. 1996. An evolutionary trace method defines binding surfaces common to protein families. J. Mol. Biol. 257:342-358.
Lynch, M., and J. S. Conery. 2000. The evolutionary fate and consequences of duplicate genes. Science 290:1151-1155.
Messier, W., and C.-B. Stewart. 1997. Episodic adaptive evolution of primate lysozymes. Nature 385:151-154.
Miyamoto, M. M., and W. M. Fitch. 1995. Testing the covarion hypothesis of molecular evolution. Mol. Biol. Evol. 12:503-513.
Nowak, M. A., M. C. Boerlijst, J. Cooke, and J. M. Smith. 1997. Evolution of genetic redundancy. Nature 388:167-171.
Ohno, S. 1970. Evolution by gene duplication. Springer Verlag, Heidelberg, Germany.
Rosenberg, H. F., K. D. Dyer, H. L. Tiffany, and M. Gonzalez. 1995. Rapid evolution of a unique family of primate ribonuclease genes. Nat. Genet. 10:219-223.
Stewart, C.-B., and A. C. Wilson. 1987. Sequence convergence and functional adaptation of stomach lysozymes from foregut fermenters. Cold Spring Harbor Symp. Quant. Biol. 52:891-899.
Taverna, D. M., and R. M. Goldstein. 2000. The evolution of duplicated genes considering protein stability constraints. Pp. 69-80 in Proceedings of the Pacific Symposium on Biocomputing.
Thompson, J. D., T. J. Gibson, F. Plewniak, F. Jeanmougin, and D. G. Higgins. 1997. The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 25:4876-4882.
Yang, Z. 1993. Maximum-likelihood estimation of phylogeny from DNA sequences when substitution rates differ over sites. Mol. Biol. Evol. 10:1396-1401.
Yang, Z. 1997. PAML: a program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci. 13:555-556.
Wagner, A. 2000. Decoupled evolution of coding region and mRNA expression patterns after gene duplication: implications for the neutralist-selectionist debate. Proc. Natl. Acad. Sci. USA 97:6579-6584.
Zhang, J., D. D. Kimberly, and H. F. Rosenberg. 2000. Evolution of the rodent eosinophil-associated RNase gene family by rapid gene sorting and positive selection. Proc. Natl. Acad. Sci. USA 97:4701-4706.
Zhang, J., H. F. Rosenberg, and M. Nei. 1998. Positive Darwinian selection after gene duplication in primate ribonuclease genes. Proc. Natl. Acad. Sci. USA 95:3708-3713.
Zvelebil, M. J., G. J. Barton, W. R. Taylor, and M. J. Sternberg. 1987. Prediction of protein secondary structure and active sites using the alignment of homologous sequences. J. Mol. Biol. 195:957-961.