Ratios of Radical to Conservative Amino Acid Replacement are Affected by Mutational and Compositional Factors and May Not Be Indicative of Positive Darwinian Selection

Tal Dagan, Yael Talmor and Dan Graur

Department of Zoology, George S. Wise Faculty of Life Sciences, Tel Aviv University


    Abstract
 
The ratio of radical to conservative amino acid replacements is frequently used to infer positive Darwinian selection. This method is based on the assumption that radical replacements are more likely than conservative replacements to improve the function of a protein. Therefore, if positive selection plays a major role in the evolution of a protein, one would expect the radical-conservative ratio to exceed the expectation under neutrality. Here, we investigate the possibility that factors unrelated to selection, i.e., transition-transversion ratio, codon usage, genetic code, and amino acid composition, influence the radical-conservative replacement ratio. All factors that have been studied were found to affect the radical-conservative replacement ratio. In particular, amino acid composition and transition-transversion ratio are shown to have the most profound effects. Because none of the studied factors had anything to do with selection (positive or otherwise) and also because all of them (singly or in combination) affected a measure that was supposed to be indicative of positive selection, we conclude that selectional inferences based on radical-conservative replacement ratios should be treated with suspicion.


    Introduction
 
Nonsynonymous substitutions are far more likely than synonymous substitutions to improve the function of a protein. Because advantageous mutations undergo fixation much more rapidly than neutral mutations, and because the rate of synonymous mutation per synonymous site is the same as the rate of nonsynonymous mutation per nonsynonymous site, the rate of nonsynonymous substitution is expected to exceed that of synonymous substitution if positive Darwinian selection plays a major role in the evolution of a protein. Nei and Gojobori (1986) were the first to take advantage of this rationale to infer positive selection. In their method, the ratio between nonsynonymous and synonymous rates is used; if the ratio is significantly larger than 1, positive selection is inferred. This method was used in a large number of studies, e.g., most recently by Bielawski and Yang (2001), Ford (2001), Johnson and Seger (2001), Lukens and Doebley (2001), Swanson et al. (2001), and Welch and Meselson (2001). Indeed, Endo, Ikeo, and Gojobori (1996) used this method to estimate the prevalence of positive selection and concluded that positive selection is a rare phenomenon, being detectable in only ~0.5% of the protein-coding genes in their set. One problem with the nonsynonymous-synonymous ratio is that synonymous substitutions tend to become saturated more quickly than nonsynonymous substitutions and are therefore more severely underestimated. In such cases, the nonsynonymous-synonymous ratio may artifactually exceed 1, and positive selection may be inferred where none exists.

Hughes, Ota, and Nei (1990) proposed to circumvent the saturation problem by using the ratio of radical to conservative amino acid replacements. The rationale of this method is very similar to that used in the nonsynonymous-synonymous ratio case. That is, radical replacements are assumed to be more likely than conservative replacements to improve the function of a protein. Therefore, if positive selection plays a major role in the evolution of a protein, we should expect the radical-conservative ratio to exceed the expectation under no selection. There are several methods to estimate the radicalism or conservatism of a particular amino acid replacement. One may, for example, decide that the property of interest is electric charge, so that all replacements that result in charge changes are radical, whereas all replacements that do not affect charge are conservative. Alternatively, several properties may be considered simultaneously through the use of a physico-chemical measure, such as Grantham's (1974) distance. The radical-conservative replacement ratio has also been used extensively to infer positive selection (e.g., Hughes, Ota, and Nei 1990; Hughes 1992; Hughes et al. 2000; Rand, Weinreich, and Cezairliyan 2000; Hughes 2002).

In this study, we investigate the possibility that factors unrelated to selection influence the radical-conservative replacement ratio values. For example, it is known that transversions result in more dramatic changes than do transitions. That is, transversions are more likely than transitions to be nonsynonymous in protein-coding regions, and nonsynonymous transversions are more likely to result in radical replacement than nonsynonymous transitions (Zhang 2000). It is, therefore, possible that differences in radical-conservative replacement ratios may be caused by mutational factors, such as the transition-transversion ratio, rather than by selectional forces. We simulated DNA-sequence evolution and the resulting radical-conservative replacement ratios while varying transition-transversion ratios, codon usage, genetic code, and amino acid composition. In the simulation we introduced no hint of positive selection.


    Methods
 
Simulated Protein Evolution
Each virtual protein-coding gene was 300 nucleotides long, resulting in a protein 100 amino acids in length. Genetic code, codon usage, and amino acid composition were fixed at the beginning of each simulation. Each virtual gene was used as the ancestor sequence in the simulated-evolution program of ROSE software (Stoye, Evers, and Meyer 1998). In each run, fixed transition-transversion ratio values were used. Each combination of variables was run 50 times. The number of substitutions between the ancestor sequence and the resulting sequence was 50.
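The simulations themselves were run with ROSE. Purely to illustrate the set-up described above (a 300-nucleotide ancestor, 50 substitutions, a fixed transition bias, no insertions or deletions), the following Python sketch applies random substitutions to a sequence. It is not ROSE and does not reproduce its model. The function name `evolve`, the parameter `p_transition` (taken here to be the probability that a given substitution is a transition), and the uniformly random ancestor are assumptions made for the example.

```python
import random

# Each base has one transition partner and two transversion partners.
TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}
TRANSVERSIONS = {"A": "CT", "G": "CT", "C": "AG", "T": "AG"}


def evolve(ancestor, n_substitutions=50, p_transition=0.5, seed=None):
    """Apply n_substitutions point substitutions at random sites.

    Hypothetical helper: sites are drawn uniformly, multiple hits at the
    same site are allowed, and no selection of any kind is applied.
    """
    rng = random.Random(seed)
    seq = list(ancestor)
    for _ in range(n_substitutions):
        site = rng.randrange(len(seq))
        base = seq[site]
        if rng.random() < p_transition:
            seq[site] = TRANSITION[base]
        else:
            seq[site] = rng.choice(TRANSVERSIONS[base])
    return "".join(seq)


# Illustrative ancestor: 300 random nucleotides (the study instead built
# ancestors from a fixed amino acid composition and codon usage).
_rng = random.Random(1)
ancestor = "".join(_rng.choice("ACGT") for _ in range(300))
descendant = evolve(ancestor, n_substitutions=50, p_transition=0.5, seed=2)

# Observed differences can be fewer than 50 because of multiple hits.
n_diff = sum(a != b for a, b in zip(ancestor, descendant))
```

Because sites may be hit more than once, the number of observed differences in such a sketch is at most, and often slightly below, the number of substitution events.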

Radical-Conservative Ratios
All 190 possible amino acid replacements were classified using three independent criteria: (1) charge, (2) volume and polarity, and (3) Grantham's (1974) physico-chemical distance.

Classification by charge was made by dividing the amino acids into three categories: positive (R, H, K), negative (D, E), and uncharged (A, N, C, Q, G, I, L, M, F, P, S, T, W, Y, V).

Classification by volume and polarity was made by dividing the amino acids into six categories: special (C), neutral and small (A, G, P, S, T), polar and relatively small (N, D, Q, E), polar and relatively large (R, H, K), nonpolar and relatively small (I, L, M, V), and nonpolar and relatively large (F, W, Y).

The two classifications above were taken from Zhang (2000). We did not use an additional classification in Zhang (2000), i.e., polarity, in order to keep the divisions independent of one another. Within each of the two classifications above, amino acid replacements were deemed conservative if they involved exchanges within a category and radical if the exchanges occurred among categories.

As far as Grantham's (1974) distances are concerned, an amino acid replacement was deemed conservative if the distance value was smaller than 100 and radical otherwise.
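The two category-based classifications can be written down directly. The Python sketch below encodes the charge and the volume-and-polarity categories exactly as listed above and labels each of the 190 amino acid pairs as conservative or radical. The names `CHARGE`, `VOLUME_POLARITY`, `is_radical`, and `count_pairs` are hypothetical. Grantham's distance matrix is not reproduced in this article, so the third criterion is left out.

```python
from itertools import combinations

# Categories as given in the text (one-letter amino acid codes).
CHARGE = {
    "positive": "RHK",
    "negative": "DE",
    "uncharged": "ANCQGILMFPSTWYV",
}
VOLUME_POLARITY = {
    "special": "C",
    "neutral_small": "AGPST",
    "polar_small": "NDQE",
    "polar_large": "RHK",
    "nonpolar_small": "ILMV",
    "nonpolar_large": "FWY",
}


def category_of(classification):
    """Map each amino acid to the name of its category."""
    return {aa: name for name, members in classification.items() for aa in members}


def is_radical(aa1, aa2, classification):
    """A replacement is radical if the two residues fall in different categories."""
    lookup = category_of(classification)
    return lookup[aa1] != lookup[aa2]


def count_pairs(classification):
    """Return (conservative, radical) counts over all unordered amino acid pairs."""
    lookup = category_of(classification)
    pairs = list(combinations(sorted(lookup), 2))
    radical = sum(lookup[a] != lookup[b] for a, b in pairs)
    return len(pairs) - radical, radical


print(count_pairs(CHARGE))           # (109, 81)
print(count_pairs(VOLUME_POLARITY))  # (28, 162)
```

These counts treat every amino acid pair equally. They ignore the genetic code, so they say nothing about how often each replacement can arise by a single nucleotide change.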

Codon Usage
Three patterns of codon usage were used: random, GC biased, and AT biased. In the random pattern, each codon frequency was calculated as the frequency of the amino acid specified by the codon divided by the number of possible codons for the amino acid. In the GC- and AT-biased patterns of codon usage, each codon frequency was calculated as the frequency of the amino acid specified by the codon divided by the number of possible codons ending in G or C, or in A or T, respectively.
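The three codon usage patterns amount to splitting each amino acid frequency evenly over a chosen subset of its codons. The Python sketch below is one possible reading for the standard genetic code. The function `codon_frequencies` and its `preferred_third` parameter are hypothetical names. Two further assumptions are that codons outside the preferred subset receive frequency zero, and that an amino acid with no codon in the preferred subset (methionine and tryptophan under AT bias in the standard code) falls back to all of its codons. The text does not say how such cases were handled.

```python
from itertools import product

# Standard genetic code in the usual TCAG ordering ("*" marks stop codons).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_frequencies(aa_freq, preferred_third="ACGT", code=STANDARD_CODE):
    """Split each amino acid frequency evenly over its preferred codons.

    preferred_third="ACGT" gives the random pattern, "GC" the GC-biased
    pattern, and "AT" the AT-biased pattern.
    """
    freqs = {}
    for aa, freq in aa_freq.items():
        codons = [c for c, a in code.items() if a == aa]
        preferred = [c for c in codons if c[2] in preferred_third]
        usable = preferred or codons  # assumed fallback when no codon qualifies
        for c in codons:
            freqs[c] = freq / len(usable) if c in usable else 0.0
    return freqs


# Illustrative input: equal frequencies for the 20 amino acids.
uniform = {aa: 0.05 for aa in set(AMINO_ACIDS) - {"*"}}
random_usage = codon_frequencies(uniform)
gc_usage = codon_frequencies(uniform, preferred_third="GC")
at_usage = codon_frequencies(uniform, preferred_third="AT")
```

Under this reading, leucine's frequency is shared among six codons in the random pattern and among three codons (TTG, CTC, CTG) in the GC-biased pattern.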

Amino Acid Composition
Eight amino acid compositions were used. Two compositions were the theoretical equilibrium expectations of two replacement matrices, i.e., Dayhoff's (1978, p. 345) and JTT (Jones, Taylor, and Thornton 1992). Five compositions were derived from mean amino acid frequencies in different protein classes: (1) extracellular proteins, (2) anchored proteins, (3) membranal proteins, (4) intracellular proteins, and (5) nuclear proteins. The values were taken from Cedano et al. (1997). The eighth composition was of a proline-rich protein as an example of extreme amino acid bias. In this case, the frequency of 19 amino acids was set at 0.045, whereas the frequency of proline was 0.136. All amino acid frequencies are shown in table 1.


Table 1 Amino Acid Frequencies in the Different Compositions

 
Transition-Transversion Ratios
Transition-transversion ratios inferred from real data range widely, depending, among other factors, on divergence time, lineage, and DNA origin (e.g., Lanave et al. 1986; Purvis and Bromham 1997; Yang and Yoder 1999). In our simulation we varied the ratio from 0.017 to 29. We studied 58 ratios, the probability for transition ranging from 0.001 to 0.0295 at 0.0005 intervals and the probability for transversion ranging from 0.029 to 0.0005 at 0.0005 intervals. These simulated values encompass the range of ratios reported in the literature.
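The grid of 58 probability pairs can be enumerated as follows. This is a minimal Python sketch; the function name `ts_tv_grid` is hypothetical. The text does not state how the ratio is computed from the two probabilities. Dividing the transition probability by twice the transversion probability is an assumption made here because it gives endpoints of about 0.017 and 29.5, close to the reported range, whereas the plain quotient would run from about 0.034 to 59.

```python
def ts_tv_grid(n_steps=58, step=0.0005):
    """Enumerate (p_transition, p_transversion, ratio) along the grid.

    The ratio p_ts / (2 * p_tv) is an assumed definition (two possible
    transversions per transition), not one stated in the text.
    """
    grid = []
    for i in range(n_steps):
        p_ts = round(0.001 + i * step, 4)  # 0.001, 0.0015, ..., 0.0295
        p_tv = round(0.029 - i * step, 4)  # 0.029, 0.0285, ..., 0.0005
        grid.append((p_ts, p_tv, p_ts / (2 * p_tv)))
    return grid


grid = ts_tv_grid()
lowest_ratio = grid[0][2]
highest_ratio = grid[-1][2]
```

In every pair the two probabilities sum to 0.03, so only the balance between transitions and transversions changes along the grid.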

Insertion and deletion frequencies were set to zero in order to keep the length of the sequences constant and prevent gaps in the alignment.

Genetic Code
Two genetic codes were used: the standard (so-called universal) code and the vertebrate mitochondrial code.

Statistical Analyses
The effects of various variables and the interactions among them on the three radical-conservative replacement ratios were tested by a multiway analysis of variance (ANOVA). All the effects were considered as fixed.

Reality Check
In order to establish that compositional and mutational factors may indeed produce false positive inferences of Darwinian selection, we simulated the evolution of several human protein-coding genes in which positive selection has never been reported, e.g., β hemoglobin, interleukin 2, and ribosomal protein S21 (accession numbers NM_000518.3, NM_000586.1, and NM_001024.2, respectively), under the substitution matrix of pseudogenes (presumably a completely neutral matrix of substitution reflecting the pattern of mutation without selection). The neutral substitution matrix was taken from Graur and Li (1999, p. 126).


    Results and Discussion
 
The results of the multiway ANOVA are shown in table 2. Regardless of the measure used to estimate the radical-conservative replacement ratio, all four factors that have been studied were found to affect the ratio. The transition-transversion ratio and the amino acid composition, as well as the interaction between these two factors, were found to have the most pronounced effect on the radical-conservative ratio. All three radical-conservative measures are affected by mutational and compositional factors. When the amino acid replacements are classified by charge, most of the variation in the radical-conservative ratio is explained by amino acid composition. When the amino acid replacements are classified by either volume and polarity or by Grantham's distance, most of the variation in the radical-conservative ratio is explained by the transition-transversion ratio. These results were unaffected by either length of protein or divergence time between the proteins.


Table 2 P Values (left column) and Percent Variation Explained (right column) for Multiway Analyses of Variance of the Effects of Transition-Transversion Ratio, Amino Acid Composition, Codon Usage, Genetic Code, and Their Interactions on Radical-Conservative Ratio Measures Based on Amino Acid Classifications by Charge, Volume and Polarity, and Grantham's Distances

 
We tested the frequency of false positive inferences of Darwinian selection by simulating neutral evolution in β hemoglobin, interleukin 2, and ribosomal protein S21. When the radical-conservative ratio was calculated on the basis of volume and polarity, 100% of estimates were false positives. When the radical-conservative ratio was calculated on the basis of Grantham's distances for β hemoglobin, interleukin 2, and ribosomal protein S21, 17%, 21%, and 13% of the estimates, respectively, were false positives. With these three proteins, we obtained no false positives when the radical-conservative ratio was calculated on the basis of electric charge. We note, however, that false positive inferences of Darwinian selection with electric charge as the yardstick for computing radical-conservative ratio were especially abundant in our simulations when the amino acid composition was that of proteins located in the nucleus. None of the three proteins used in the reality check part are nuclear.

We conclude that many factors that have nothing to do with selection (positive or otherwise) either singly or in combination affect measures that were supposed to be indicative of positive selection. Therefore, selectional inferences based on radical-conservative replacement ratios should be treated with utmost caution. In fact, we recommend that these measures not be used at all.


    Footnotes
 
Pekka Pamilo, Reviewing Editor

Address for correspondence and reprints: Tal Dagan, Department of Zoology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Ramat Aviv 69978, Israel. tali@kimura.tau.ac.il.

Keywords: positive Darwinian selection, conservative replacement, radical replacement, transition bias, codon usage, genetic codes, amino acid composition


    References
 

    Bielawski J. P., Z. Yang, 2001 Positive and negative selection in the DAZ gene family Mol. Biol. Evol 18:523-529

    Cedano J., P. Aloy, J. A. Perez-Pons, E. Querol, 1997 Relation between amino acid composition and cellular location of proteins J. Mol. Biol 266:594-600

    Dayhoff M. O., 1978 Atlas of protein sequence and structure, Vol. 5 (Suppl. 3) National Biomedical Research Foundation, Silver Spring, Md

    Endo T., K. Ikeo, T. Gojobori, 1996 Large-scale search for genes on which positive selection may operate Mol. Biol. Evol 13:685-690

    Ford M. J., 2001 Molecular evolution of transferrin: evidence for positive selection in salmonids Mol. Biol. Evol 18:639-647

    Grantham R., 1974 Amino acid difference formula to help explain protein evolution Science 185:862-864

    Graur D., W.-H. Li, 1999 Fundamentals of molecular evolution Sinauer Associates, Inc., Sunderland, Mass

    Hughes A. L., 1992 Coevolution of the vertebrate integrin α- and β-chain genes Mol. Biol. Evol 9:216-234

    Hughes A. L., 2002 Origin and evolution of viral interleukin-10 and other DNA virus genes with vertebrate homologues J. Mol. Evol 54:90-101

    Hughes A. L., J. A. Green, J. M. Garbayo, R. M. Roberts, 2000 Adaptive diversifications within a large family of recently duplicated, placentally expressed genes Proc. Natl. Acad. Sci. USA 97:3319-3323

    Hughes A. L., T. Ota, M. Nei, 1990 Positive Darwinian selection promotes charge profile diversity in the antigen binding cleft of class I major-histocompatibility-complex molecules Mol. Biol. Evol 7:515-524

    Johnson K. P., J. Seger, 2001 Elevated rates of nonsynonymous substitution in island birds Mol. Biol. Evol 18:874-881

    Jones D. T., W. R. Taylor, J. M. Thornton, 1992 The rapid generation of mutation data matrices from protein sequences Comput. Appl. Biosci 8:275-282

    Lanave C., S. Tommasi, G. Preparata, C. Saccone, 1986 Transition and transversion rate in the evolution of animal mitochondrial DNA Biosystems 19:273-283

    Lukens L., J. Doebley, 2001 Molecular evolution of the teosinte branched gene among maize and related grasses Mol. Biol. Evol 18:627-638

    Nei M., T. Gojobori, 1986 Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions Mol. Biol. Evol 3:418-426

    Purvis A., L. Bromham, 1997 Estimating the transition/transversion ratio from independent pairwise comparisons with an assumed phylogeny J. Mol. Evol 44:112-119

    Rand D. M., D. M. Weinreich, B. O. Cezairliyan, 2000 Neutrality tests of conservative-radical amino acid changes in nuclear and mitochondrially encoded proteins Gene 261:115-125

    Stoye J., D. Evers, F. Meyer, 1998 ROSE: generating sequence families Bioinformatics 14:157-163

    Swanson W. J., A. G. Clark, H. M. Waldrip-Dail, M. F. Wolfner, C. F. Aquadro, 2001 Evolutionary EST analysis identifies rapidly evolving male reproductive proteins in Drosophila Proc. Natl. Acad. Sci. USA 98:7375-7379

    Welch D. B., M. S. Meselson, 2001 Rates of nucleotide substitution in sexual and anciently asexual rotifers Proc. Natl. Acad. Sci. USA 98:6720-6724

    Yang Z., A. Yoder, 1999 Estimation of the transition/transversion rate bias and species sampling J. Mol. Evol 48:274-283

    Zhang J., 2000 Rates of conservative and radical nonsynonymous nucleotide substitutions in mammalian nuclear genes J. Mol. Evol 50:56-68

Accepted for publication January 31, 2002.