Center for Information Biology and DNA Data Bank of Japan, National Institute of Genetics, Mishima-shi, Shizuoka-ken, Japan
Correspondence: E-mail: yossuzuk{at}lab.nig.ac.jp.
![]() |
Abstract |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
Key Words: Three-dimensional structure window analysis positive selection human influenza A virus hemagglutinin neuraminidase
![]() |
Introduction |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
When the number of nucleotide substitutions that have accumulated at each codon site is small, several sites may be grouped to increase the total number of nucleotide substitutions. Although it may be difficult to identify selected sites exactly, the neutrality test should be more sensitive than with single-site analysis. There have been two approaches for grouping sites. In the first approach, the amino acid sites involved in similar functions are grouped because the type and strength of selection is likely to be similar for these sites (Hughes and Nei 1988, 1989). This approach is applicable when the functions of amino acid sites are known, but this is not the case for most proteins of interest. In the second, a sliding window is used to detect selection (one-dimensional window analysis) (Clark and Kao 1991). It is not necessary to understand the functions of amino acid sites in this approach. However, the sites included in a window are not necessarily involved in similar functions, because functions are determined by the three-dimensional structure rather than by the linear sequence. It is possible that the type and strength of selection is different among the sites in a window. In this case, the sensitivity of neutrality test should be low (Endo, Ikeo, and Gojobori 1996).
The purpose of this paper is to solve these problems, at least partially, by developing a method of three-dimensional window analysis for detecting selection in structural regions of proteins. It is shown that three-dimensional window analysis not only is more sensitive than single-site analysis and one-dimensional window analysis but also provides as much specificity as these methods for inferring positive selection in the analyses of the hemagglutinin (HA) and neuraminidase (NA) genes of human influenza A viruses.
![]() |
Materials and Methods |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
When the coordinate data for the side chains are available, the amino acid sites exposed on the protein surface in the three-dimensional window may be identified and grouped separately to allow for detection of positive selection, which is known to operate on the exposed sites in many cases (Hughes 1999). The exposed sites can be identified using relative solvent accessibility, which is defined as the solvent accessibility of a given site divided by the maximum solvent accessibility of the corresponding amino acid (Kabsch and Sander 1983). The maximum accessibilities for 20 amino acids are available in Rost and Sander (1994), and the accessibility of a given site can be computed by DSSP (version DsspCMBI-April-2000) (Kabsch and Sander 1983) using the coordinate data. An amino acid site is judged exposed when the relative accessibility is greater than 16% (Rost and Sander 1994).
The neutrality test is conducted for the grouped sites by extending the parsimony and likelihood methods for detecting selection at single amino acid sites. In the parsimony method of single-site analysis, the average numbers of synonymous (sS) and nonsynonymous (sN) sites as well as the total numbers of synonymous (cS) and nonsynonymous (cN) substitutions for a given phylogenetic tree are computed at each codon site of the sequences. Positive and negative selection are inferred if cN and cS are significantly larger than the expected values, respectively. In the parsimony method of three-dimensional window analysis, the sS and sN values are averaged and the cS and cN values are summed over the grouped sites to obtain SS, SN, CS, and CN. The neutrality test is conducted similarly to the single-site analysis, where sS, sN, cS, and cN are replaced with SS, SN, CS, and CN, respectively. In the likelihood method of single-site analysis, the rN/rS value is estimated at each codon site by the maximum-likelihood method. Positive and negative selection are inferred if rN/rS > 1 and rN/rS < 1, respectively, and the likelihood value is significantly larger than that obtained under the assumption rN/rS = 1. In the likelihood method of three-dimensional window analysis, the rN/rS value for the grouped sites is estimated under the assumption that the rN/rS values are the same for these sites. The neutrality test is conducted by comparing the likelihood value with that obtained under the assumption rN/rS = 1 for these sites.
In this paper, three-dimensional window analysis uses the parsimony method. One of the possible problems of the parsimony method is that the number of nucleotide substitutions may be underestimated when the sequence divergence is large, because this method does not correct for multiple substitutions. However, the probability of multiple substitutions appears to be small in the following two data sets because of the small sequence divergence as indicated below (Saitou 1989).
Analysis of the HA Genes of Human Influenza A Viruses
Influenza A viruses are etiological agents of influenza. HA is an envelope glycoprotein of 566 amino acids and forms homotrimers in the virion. This protein is classified into 15 subtypes (H1 to H15) according to the antigenicity that is determined mainly by five epitopes (A, B, C, D, and E). The amino acid positions included in each epitope are as follows: positions 122, 124, 126, 130 to 133, 135, 137, 138, 140, 142 to 146, 150, 152, and 168 in epitope A; positions 128, 129, 155 to 160, 163 to 165, 186 to 190, 192 to 194, and 196 to 198 in epitope B; positions 44 to 48, 50, 51, 53, 54, 273, 275, 276, 278 to 280, 294, 297, 299, 300, 304, 305, and 307 to 312 in epitope C; positions 96, 102, 103, 117, 121, 167, 170 to 177, 179, 182, 201, 203, 207 to 209, 212 to 219, 226 to 230, 238, 240, 242, 244, and 246 to 248 in epitope D; and positions 57, 59, 62, 63, 67, 75, 78, 80 to 83, 86 to 88, 91, 92, 94, 109, 260 to 262, and 265 in epitope E (Wiley, Wilson, and Skehel 1981; Macken et al. 2001; Wright and Webster 2001). Positive selection has been identified at many of these positions in H3 HA from human influenza A viruses (Fitch et al. 1997; Bush et al. 1999; Suzuki and Gojobori 1999). In addition, influenza viruses are usually passaged in embryonated chicken eggs before isolation. It has been reported that 22 amino acid sites (egg-selection sites) of the H3 HA of human influenza A viruses are positively selected during the passage (Bush et al. 2000). These sites are positions 111, 126, 137, 138, 144, 145, 155, 156, 158, 159, 185, 186, 193, 194, 199, 219, 226, 229, 246, 248, 276, and 290. Therefore, the sensitivity and specificity of single-site analysis, one-dimensional window analysis, and three-dimensional window analysis, were examined using this gene. It was hypothesized that positive selection should be predicted for the windows related to epitopes and egg-selection sites, whereas negative selection should be inferred at other windows.
All nucleotide sequences of the H3 HA genes of human influenza A viruses were extracted from the international nucleotide sequence database (DDBJ release 56). The sequences, including ambiguous nucleotides and minor gaps, were removed. In addition, I eliminated all sequences except for one when multiple sequences had been determined for the same strain. The accession numbers of the remaining 24 sequences were AB019354 to AB019357, AF017270, AF348176, AF363502 to AF363504, AF382318, AF382320, AF382322, AF382324, AJ252129, AJ252131, AJ289703, AY035591, AY271794, J02132, J02135, M55059, U26830, U97740, and X05907.
A multiple alignment of these sequences was made using ClustalW (Thompson, Higgins, and Gibson 1994). The phylogenetic tree was constructed by the neighbor-joining method (Saitou and Nei 1987) with the number of synonymous substitutions (Nei and Gojobori 1986). The total and average branch lengths of the phylogenetic tree were 0.539 and 0.0120, respectively. Single-site analysis was conducted with ADAPTSITE (version 1.3) (Suzuki, Gojobori, and Nei 2001). The two-parameter model (Kimura 1980) of nucleotide mutation was used for estimating sS and sN. The transition/transversion ratio (R) was estimated as the ratio of the transitional to transversional nucleotide diversities at the fourfold degenerate site and was determined to be R = 3.09. In one-dimensional window analysis, the window was defined as a certain number (window size) of continuous amino acid sites in the linear sequence, and all sites included in the window were grouped. The neutrality test was conducted in a way similar to three-dimensional window analysis. The window was progressively moved from the N-terminus towards the C-terminus of the sequence. For three-dimensional window analysis, the coordinate data for H3 HA from strain A/Hong Kong/1/68 (H3N2) with the accession number 1KEN in the PDB was used (Barbey-Martin et al. 2002). The HA of this strain is one of the best-characterized HAs among all H3 strains. The significance level was 5% for all neutrality tests.
Analysis of the NA Genes of Human Influenza A Viruses
NA is also an envelope glycoprotein of 469 amino acids, and forms homotetramers in the virion. This protein is classified into nine subtypes (N1 to N9). Several epitopes have been identified in N2 NA of human influenza A viruses. One of the best-characterized epitopes is called NC41, which consists of positions 326 to 330, 343 to 347, 366 to 372, 399 to 403, and 430 to 435 (Colman et al. 1987; Wright and Webster 2001). There appears to be no indication of positive selection for any epitope of NA in the literature. Single-site analysis, one-dimensional window analysis, and three-dimensional window analysis, were used to examine the gene for indications of positive selection.
All nucleotide sequences of the N2 NA genes of human influenza A viruses were extracted from the DDBJ (release 56). The sequences, including ambiguous nucleotides and minor gaps, were removed. In addition, I eliminated all sequences except for one when multiple sequences had been determined for the same strain. The accession numbers of the remaining 120 sequences were AB126623, AF038260 to AF038265, AF382329, AF382332, AF386761, AF386763, AF503463 to AF503469, AF503471, AF533730, AF533731, AF533733, AF533735, AF533738, AF533742 to AF533745, AF533747 to AF533750, AJ291403, AJ307599 to AJ307606, AJ307609, AJ307611, AJ307612, AJ307614 to AJ307621, AJ307623, AJ307624, AJ307626, AJ307627, AJ307629, AJ316063, AJ457931 to AJ457938, AJ457940, AJ457942 to AJ457946, AJ457956, AJ457957, AJ457960, AJ457962 to AJ457966, AJ489846 to AJ489849, AY271795, D10164, K01150, U42632 to U42637, U42770 to U42780, U43417 to U43424, U43426, U51245 to U51247, and U71140 to U71143.
These sequences were analyzed in a way similar to the analysis of the HA genes. The total and average branch lengths of the phylogenetic tree were 1.04 and 0.00438, respectively. The R value was estimated as 1.53. The coordinate data for N2 NA was available for the strain A/Tokyo/3/67 (H2N2) with the accession number 1NN2 in the PDB (Varghese and Colman 1991). The NA of this strain is one of the best-characterized NAs among all N2 strains.
![]() |
Results |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
Positive selection was not inferred for any site of HA by single-site analysis. Note that positive selection has been inferred for many sites by previous analyses using many (> 100), but partial, sequences of the same gene, as mentioned above. It is likely that the number of sequences analyzed in this study was so small that the number of nucleotide substitutions that had accumulated at each codon site was insufficient to detect significant differences between rS and rN.
One-dimensional window analysis was used to increase the number of nucleotide substitutions for the neutrality test. Various window sizes, ranging from 2 to 20 were examined. However, positive selection was not inferred for any window of any size.
In three-dimensional window analysis, window sizes ranging from 1 Å to 10 Å were examined to make the average number of sites in a window compatible with that in one-dimensional window analysis ( 20) (table 1). The number of amino acid sites in a window increased on average as the window size increased, and its distribution was roughly unimodal for each window size (Supplementary Material online). The analyses with window sizes of 1 Å, 2 Å, and 3 Å were the same as single-site analysis because only one site was included in every window. One window each was inferred as positively selected when the window size was 6 Å, 7 Å, and 8 Å. The positively selected window of 6 Å consisted of five sites that were included in epitope A (three sites were also involved in egg selection) and two sites that were involved neither in any epitope nor in egg selection. However, the latter sites, positions 136 and 139, were invariable codon sites. Note that invariable codon sites do not contribute very much to the neutrality test because cS = cN = 0 at these sites. This window was, therefore, considered to be involved in epitope A. The positively selected window of 7 Å was apparently obtained by adding position 140, which was included in epitope A, to the positively selected window of 6 Å. However, because this position was also an invariable codon site, the positively selected window of 7 Å was indistinguishable from that of 6 Å from the statistical viewpoint, and the former window could be reduced to the latter. The positively selected window of 8 Å was apparently obtained by adding position 144, which was a variable codon site and involved in epitope A (and also in egg selection), to the positively selected window of 7 Å. This window was, therefore, also involved in epitope A (fig. 1).
|
|
|
In one-dimensional window analysis, however, positive selection was not inferred at any window with sizes ranging from 2 to 20, despite the fact that the number of nucleotide substitutions was increased for the neutrality test compared with single-site analysis. Note that all windows containing position 267 were not inferred as positively selected. This is apparently because the adjacent sites of position 267 in the linear sequence were negatively selected, and the effect of positive selection at position 267 was diluted when adjacent sites were added for the neutrality test. In fact, the cS/cN values at positions 266 and 268 were 1/0 and 2/1, respectively, indicating rS > rN at both sites, although not significant.
Window sizes ranging from 1 Å to 10 Å in three-dimensional window analysis were examined to make the average number of sites included in a window 20 or less. The analyses with the window sizes of 1 Å, 2 Å, and 3 Å were essentially the same as single-site analysis because the number of sites included in a window was equal or very close to unity (1.00 for 1 Å and 2 Å and 1.01 for 3 Å). Positive selection was not inferred for any window of the sizes ranging from 4 Å to 10 Å, except for a window of 10 Å. This window consisted of positions 370, 403, [427], [428], [429], [430], 431, 432, [433], [434], 435, and [439], where the sites that were included in NC41 are bold-faced and the invariable codon sites are indicated within brackets. Most of these sites were included in NC41. In addition, all sites that were not included in NC41 were invariable codon sites. These results suggest that this window is involved in NC41. Note that all windows including position 267 were not inferred as positively selected, apparently because the adjacent sites of position 267 in the three-dimensional structure were negatively selected. In fact, the cS/cN values at positions 265, 266, 268, and 269, which were in the closest proximity to position 267, were 3/4, 1/0, 2/1, and 0/0, respectively, indicating rS rN at all sites, although not significant.
Because the coordinate data for the side chains were available, the sites exposed on the protein surface in each window were identified and grouped separately for the neutrality test. To make the average number of exposed sites in a window 20 or less, window sizes ranging from 1 Å to 15 Å were examined. Positive selection was not inferred for any window of any size, except for one window of 10 Å. This window consisted of positions 370, [430], 431, 432, [433], [434], and 435 (with the same notations as above), which were all included in NC41. Note that these positions were the subset of positions in the positively selected window inferred without distinguishing the exposed and buried sites, as indicated above. Note also that all windows containing position 267 were not inferred as positively selected, although this position was judged as exposed.
![]() |
Discussion |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
Similar results were obtained in the analysis of the NA gene, which represents a relatively large data set. No window was inferred as positively selected in one-dimensional window analysis, but one window related to NC41 was inferred as positively selected in three-dimensional window analysis. These results appear to be reliable because the monoclonal antibody against NC41 is known to reduce the fitness of influenza A viruses, although it is unclear whether NC41 is involved in neutralization (Wright and Webster 2001). It is interesting that position 267 was inferred as positively selected only by single-site analysis, apparently because the adjacent sites of this position were negatively selected both in the linear sequence and in the three-dimensional structure. Although the biological function of position 267 does not appear to be known, it may be worth examination because the parsimony method of single-site analysis is known to be conservative (Suzuki and Nei 2002).
Three-dimensional window analysis, however, appears to have some problems. First, the three-dimensional structure of at least one of the sequences analyzed has to be known. This problem, however, may be solved to some extent as the number of three-dimensional structures increases exponentially in the PDB. Second, it is implicitly assumed that the three-dimensional structures of all sequences analyzed are the same as that of the reference sequence for which the coordinate data are available. Strictly speaking, this assumption should almost always be violated because the three-dimensional structures of proteins should not be exactly the same if one amino acid site is different. However, the overall structures of proteins are known to be quite stable, even if the sequence divergence is large (Branden and Tooze 1999). Therefore, the assumption may be acceptable as long as closely related sequences are analyzed and relatively large window sizes are used. Third, the most appropriate window size for detecting selection may be different according to the proteins analyzed. For example, the number of amino acid sites that contact with an antibody in general ranges from 15 to 22 (Klein and Horejsi 1997), suggesting that the window sizes examined in this study are appropriate. However, such information is usually not available, and various window sizes may be examined for each data set (Fares et al. 2002). Fourth, even when positive selection is inferred, it may be difficult to identify selected sites exactly, as mentioned above. For example, in the analysis of the HA genes, the positively selected window of 7 Å was obtained by adding an invariable codon site to the positively selected window of 6 Å (table 1). Similar examples can also be found in table 2. In these cases, the additional sites did not contribute to the neutrality test but were included in the positively selected windows because the amino acid sites around them were positively selected. Therefore, the invariable codon sites may be eliminated when the functions of amino acid sites in the positively selected windows are examined. At any rate, it is important to recognize that selection inferred by the methods examined in this study remains tentative until it is confirmed by experiments. This is partly because multiple neutrality tests are conducted with a 5% significance level (two-tailed) and positive selection is expected to be inferred for 2.5% of all windows, even if the null hypothesis is true. The significance level may be lowered by using the Bonferroni correction (Sokal and Rohlf 1995), but it does not appear to be suitable in the postgenomic era, where a large number of tests (e.g., for each codon site of the entire coding region) may be conducted based on limited amounts of data (e.g., the number of nucleotide substitutions accumulated at each codon site). Under such conditions, the corrected significance level becomes unrealistically low. It may, therefore, be important not to try to draw conclusions only from the statistical analysis but to confirm the statistical predictions by experiments, because the uncorrected significance level is applicable as long as a particular window is concerned.
In conclusion, three-dimensional window analysis appears to be more sensitive than one-dimensional window analysis for detecting positive selection. Three-dimensional window analysis also appears to be more sensitive than single-site analysis, especially when the number of sequences analyzed is relatively small and positive selection operates over the structural regions of proteins. This method, however, may fail to detect selection when it operates only on a particular site, in which case single-site analysis may be more accurate, although a large number of sequences is required.
The computer programs for the parsimony and likelihood methods of three-dimensional window analysis will be implemented in ADAPTSITE.
![]() |
Acknowledgements |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
![]() |
Footnotes |
---|
![]() |
References |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
Barbey-Martin, C., B. Gigant, T. Bizebard, L. J. Calder, S. A. Wharton, J. J. Skehel, and M. Knossow. 2002. An antibody that prevents the hemagglutinin low pH fusogenic transition. Virology 294:7074.[CrossRef][ISI][Medline]
Branden, C., and J. Tooze. 1999. Introduction to protein structure. 2nd edition. Garland Publishing, Inc., New York.
Bush, R. M., W. M. Fitch, C. A. Bender, and N. J. Cox. 1999. Positive selection on the H3 hemagglutinin gene of human influenza virus A. Mol. Biol. Evol. 16:14571465.[Abstract]
Bush, R. M., C. B. Smith, N. J. Cox, and W. M. Fitch. 2000. Effects of passage history and sampling bias on phylogenetic reconstruction of human influenza A evolution. Proc. Natl. Acad. Sci. USA 97:69746980.
Clark, A. G., and T.-H. Kao. 1991. Excess nonsynonymous substitution at shared polymorphic sites among self-incompatibility alleles of Solanaceae. Proc. Natl. Acad. Sci. USA 88:98239827.[Abstract]
Colman, P. M., W. G. Laver, J. N. Varghese, A. T. Baker, P. A. Tulloch, G. M. Air, and R. G. Webster. 1987. Three-dimensional structure of a complex of antibody with influenza virus neuraminidase. Nature 326:358363.[CrossRef][ISI][Medline]
Endo, T., K. Ikeo, and T. Gojobori. 1996. Large-scale search for genes on which positive selection may operate. Mol. Biol. Evol. 13:685690.[Abstract]
Fares, M. A., S. F. Elena, J. Ortiz, A. Moya, and E. Barrio. 2002. A sliding window-based method to detect selective constraints in protein-coding genes and its application to RNA viruses. J. Mol. Evol. 55:509521.[CrossRef][ISI][Medline]
Fitch, W. M., R. M. Bush, C. A. Bender, and N. J. Cox. 1997. Long term trends in the evolution of H(3) HA1 human influenza type A. Proc. Natl. Acad. Sci. USA 94:77127718.
Hughes, A. L. 1999. Adaptive evolution of genes and genomes. Oxford University Press, New York.
Hughes, A. L., and M. Nei. 1988. Pattern of nucleotide substitution at major histocompatibility complex class I loci reveals overdominant selection. Nature 335:167170.[CrossRef][ISI][Medline]
. 1989. Nucleotide substitution at major histocompatibility complex class II loci: evidence for overdominant selection. Proc. Natl. Acad. Sci. USA 86:958962.[Abstract]
Kabsch, W., and C. Sander. 1983. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22:25772637.[ISI][Medline]
Kimura, M. 1980. A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol. 16:111120.[ISI][Medline]
Klein, J., and V. Horejsi. 1997. Immunology. 2nd edition. Blackwell Science Ltd, Oxford, UK.
Macken, C., H. Lu, J. Goodman, and L. Boykin. 2001. The value of a database in surveillance and vaccine selection. Pp 103106 in A. D. M. E. Osterhaus, N. Cox, and A. W. Hampson, eds. Options for the control of influenza IV. Elsevier Science, Amsterdam.
Nei, M., and T. Gojobori. 1986. Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol. 3:418426.[Abstract]
Rost, B., and C. Sander. 1994. Conservation and prediction of solvent accessibility in protein families. Proteins 20:216226.[ISI][Medline]
Saitou, N. 1989. A theoretical study of the underestimation of branch lengths by the maximum parsimony principle. Syst. Zool. 38:16.[ISI]
Saitou, N., and M. Nei. 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406425.[Abstract]
Sayle, R., and E. J. Milner-White. 1995. RASMOL: biomolecular graphics for all. Trends Biochem. Sci. 20:374.[CrossRef][ISI][Medline]
Sokal, R. R., and F. J. Rohlf. 1995. Biometry. 3rd edition. W. H. Freeman, New York.
Suzuki, Y. 2004. New methods for detecting positive selection at single amino acid sites. J. Mol. Evol. 59:1119.[ISI][Medline]
Suzuki, Y., and T. Gojobori. 1999. A method for detecting positive selection at single amino acid sites. Mol. Biol. Evol. 16:13151328.[Abstract]
Suzuki, Y., T. Gojobori, and M. Nei. 2001. ADAPTSITE: detecting natural selection at single amino acid sites. Bioinformatics 17:660661.
Suzuki, Y., and M. Nei. 2002. Simulation study of the reliability and robustness of the statistical methods for detecting positive selection at single amino acid sites. Mol. Biol. Evol. 19:18651869.
. 2004. False-positive selection identified by ML-based methods: examples from the Sig1 gene of the diatom Thalassiosira weissflogii and the tax gene of a human T-cell lymphotropic virus. Mol. Biol. Evol. 21:914921.
Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1994. CLUSTAL W: improving the sensitivity of progressive multiple-sequence alignment through sequence weighting, position-specific gap penalties, and weight-matrix choice. Nucleic Acids Res. 22:46734680.[Abstract]
Varghese, J. N., and P. M. Colman. 1991. Three-dimensional structure of the neuraminidase of influenza virus A/Tokyo/3/67 at 2.2 Å resolution. J. Mol. Biol. 221:473486.[CrossRef][ISI][Medline]
Wiley, D. C., I. A. Wilson, and J. J. Skehel. 1981. Structural identification of the antibody-binding sites of Hong Kong influenza haemagglutinin and their involvement in antigenic variation. Nature 289:373378.[ISI][Medline]
Wright, P. F., and R. G. Webster. 2001. Orthomyxoviruses. Pp. 15331579 in D. M. Knipe, P. M. Howley, D. E. Griffin, R. A. Lamb, M. A. Martin, B. Roizman, and S. E. Straus, eds. Fields virology. 4th edition. Lippincott Williams & Wilkins, Philadelphia.
Yang, Z., R. Nielsen, N. Goldman, and A. M. K. Pedersen. 2000. Codon-substitution models for heterogeneous selection pressure at amino acid sites. Genetics 155:431449.