Galton Laboratory, Department of Biology, University College London;
Department of Molecular Biology and Genetics, Cornell University;
Department of Biology, University of California, Riverside
Introduction |
---|
Recently Nielsen and Yang (1998) and Yang et al. (2000) extended the model of codon substitution of Goldman and Yang (1994) (see also Muse and Gaut 1994) to account for variable selective pressures among sites in the sequence. A statistical distribution is assumed for the nonsynonymous/synonymous rate ratio (ω) among sites. For example, the discrete model (M3) assumes three site classes, which have different ω ratios. The proportions and ω ratios for the site classes are estimated from the data by maximum likelihood. In such a model, we assume that there are several heterogeneous site classes, but we do not know a priori which class each site is from. We refer to such models as random-sites models. Application of those models to real data sets has led to the detection of positive selection in a number of genes, demonstrating the importance of accounting for variable selective pressures among sites (Zanotto et al. 1999; Bishop, Dean, and Mitchell-Olds 2000; Bielawski and Yang 2001; Fares et al. 2001; Ford 2001; Haydon et al. 2001; Peek et al. 2001; Swanson et al. 2001; see Yang and Bielawski 2000 for a review). Consistent with the analyses of real data, computer simulations have also confirmed the power of those methods (Anisimova, Bielawski, and Yang 2001).
Sometimes prior information is available to partition sites into classes, which are expected to have different selective pressures and thus different ω ratios. In such cases, it is sensible to use this information and fit models that assign different ω ratios to the site classes. For example, Hughes and Nei (1988) tested the hypothesis that amino acid residues at the antigen-recognition site (ARS) of the major histocompatibility complex (MHC), identified by Bjorkman et al. (1987a, 1987b), might be under diversifying selection. In this case, residues in the MHC can be partitioned into two classes, those in the ARS and those outside it, and two independent ω ratios can be used. Another possible use of such models is the combined analysis of multiple protein-coding genes from the same set of species to test for similarities and differences in their substitution patterns. The models are then similar to the relative-ratio test developed by Muse and Gaut (1997).
In this paper, we implement models that account for the heterogeneity of different site partitions, and refer to them as the fixed-sites models. We apply the new models to two well-documented genes, the MHC class I gene (Hughes and Nei 1988, 1989; Hughes, Ota, and Nei 1990) and the abalone sperm lysin gene (Lee, Ota, and Vacquier 1995; Yang, Swanson, and Vacquier 2000).
Theory |
---|
When we apply the model to data of partitioned sites, we use different ω ratios, and thus different Q matrices, for sites from different partitions. Similarly, we can allow other parameters to differ between site partitions. These models are structurally similar to the models of nucleotide substitution of Yang (1996), which account for different transition/transversion rate ratios, different base frequencies, and different levels of among-site rate variation among prior partitions of sites, for example, the three codon positions. Here we also implement several models to accommodate different levels of site heterogeneity (table 1). The simplest model assumes that all sites in the sequence have the same substitution pattern with identical parameters (model A in table 1). Parameters in the model include the b branch lengths, the transition/transversion rate ratio κ, the nonsynonymous/synonymous rate ratio ω, and the nine parameters for the codon frequencies, with b + 11 parameters in total. The most complex model (model F in table 1) assumes that all site partitions have different substitution patterns with independent substitution parameters. This model is equivalent to analyzing data of different partitions as separate data sets and summing up the log-likelihood values. For g partitions, the model has g × (b + 11) parameters. Models B to E lie in between these two extremes, and assume proportional branch lengths among partitions. Branch lengths for partition k are rk times those for the first partition (r1 = 1). Thus b + (g - 1), instead of b × g, parameters are used to specify all branch lengths for the site partitions. Apart from the different substitution rates, model B (table 1) assumes homogeneity among partitions in the transition/transversion rate ratio κ, the nonsynonymous/synonymous rate ratio ω, and the codon frequencies. Model C assumes proportional branch lengths, identical κ and ω, but different codon frequencies among partitions. Model D assumes proportional branch lengths, different κ and ω, but identical codon frequencies among partitions. Model E assumes proportional branch lengths, different κ and ω, and different codon frequencies among partitions. These models are implemented in the PAML program package (Yang 1997); see table 1 for details.
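The parameter counts described above can be tallied with a short sketch. The function name `n_params` and the model labels as dictionary keys are illustrative choices, not part of PAML; the counts follow the description in the text (b branch lengths, g - 1 rate multipliers under proportional branch lengths, 2 parameters for the transition/transversion and nonsynonymous/synonymous rate ratios, and 9 codon-frequency parameters, each either shared or partition-specific).

```python
def n_params(model: str, b: int, g: int) -> int:
    """Count free parameters for fixed-sites models A-F.

    b: number of branches in the tree; g: number of site partitions.
    Illustrative helper based on the model descriptions in the text.
    """
    rates = g - 1  # r_2, ..., r_g (r_1 is fixed at 1)
    counts = {
        "A": b + 2 + 9,                  # everything shared
        "B": b + rates + 2 + 9,          # proportional branch lengths only
        "C": b + rates + 2 + 9 * g,      # partition-specific codon frequencies
        "D": b + rates + 2 * g + 9,      # partition-specific kappa and omega
        "E": b + rates + 2 * g + 9 * g,  # both partition-specific
        "F": g * (b + 2 + 9),            # fully separate analyses
    }
    return counts[model]


if __name__ == "__main__":
    # Two partitions on a tree with 381 branches (192 sequences, unrooted).
    for m in "ABCDEF":
        print(m, n_params(m, b=381, g=2))
```

Differences between these counts give the degrees of freedom for likelihood ratio tests between nested models; for example, models E and F differ by b - 1 parameters when g = 2.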
Analysis of Class I MHC Alleles and Abalone Sperm Lysin Genes |
---|
Class I MHC
The class I MHC glycoprotein recognizes and binds foreign peptides, and the selective pressure on the molecule appears to favor recognition and binding of a large number of such peptides. Based on the crystal structure, different domains of the MHC have been characterized. The ARS is the cleft that binds foreign antigens (Bjorkman et al. 1987a, 1987b). The identification of the ARS enabled previous researchers to partition the data into ARS and non-ARS sites and to demonstrate positive selection in the ARS (Hughes and Nei 1988, 1989). Without partitioning the data, positive selection was not detected in pairwise comparisons that average rates over the entire sequence. Therefore, the MHC makes an ideal test case for maximum likelihood analyses of partitioned data. We compiled and aligned 192 alleles of the human class I MHC from the A, B, and C loci. The alignment is available from the authors upon request. Alignment gaps were removed, leaving 270 codons in each sequence. We used the maximum likelihood method to estimate pairwise distances under the codon-substitution model (Goldman and Yang 1994), and then used the neighbor-joining method (Saitou and Nei 1987) to construct a tree topology, which is used in later analysis. The tree topology was found to have little effect on the analysis in previous studies (e.g., Yang et al. 2000; Ford 2001), and in this paper we ignore the uncertainty of the tree topology.
First, we applied the random-sites models (Nielsen and Yang 1998; Yang et al. 2000) to the data. The results are presented in table 2. Model M0 assumes one ω ratio for all sites. The log likelihood is ℓ = -8225.16, with the estimate ω = 0.612. This is an average over all sites in the protein and all lineages in the tree, and indicates the dominating role of purifying selection in the evolution of the MHC. Model M1 (neutral) assumes two site classes in the sequence: the conserved sites with ω0 = 0 and the neutral sites with ω1 = 1. This model has the same number of parameters as M0 (one-ratio) but fits the data much better, with a log likelihood ℓ = -7719.46. Model M2 (selection) adds another site class to M1 (neutral), with a free ω ratio estimated from the data, thus allowing for the possibility of positive selection. Parameter estimates suggest that about 10% of sites are under positive selection with ω2 = 8.1 (table 2). This model fits the data much better than the neutral model; the test statistic is 2Δℓ = 2 × (-7296.69 - [-7719.46]) = 845.54, compared with the χ² distribution with d.f. = 2. Model M3 (discrete) assumes three site classes with the proportions (p0, p1, p2) and ω ratios (ω0, ω1, ω2) estimated from the data. The estimates suggest that the majority of sites are under purifying selection with ω0 = 0.07, but about 9% of sites are under strong diversifying selection with ω2 = 6.0. M3 fits the data significantly better than any of the simpler models M0, M1, or M2. Model M7 (beta) assumes a beta distribution of ω over sites. The beta distribution can take a variety of shapes, although it is limited to the interval (0, 1), so it provides a flexible null model for testing positive selection. The estimated distribution B(0.103, 0.354) has an extreme U shape, with most of the sites having ω close to either 0 or 1. Model M8 (beta & ω) adds an extra site class to M7 (beta) with a free ω ratio estimated from the data. The estimates suggest that about 10% of sites are under diversifying selection with ω = 5.1. The likelihood ratio test comparing M7 (beta) and M8 (beta & ω) has the statistic 2Δℓ = 2 × (-7232.68 - [-7498.97]) = 2 × 266.29 = 532.58, much greater than the χ² critical value at d.f. = 2. Summing up, the random-sites models demonstrate extreme variability in selective pressure among sites in the MHC and the presence of a number of sites under diversifying selection. Sites inferred to be under positive selection are listed in table 2. The posterior probabilities and posterior means for sites are shown in figure 1. Inferred sites are also mapped onto the crystal structure in figure 2. It is noteworthy that the sites inferred to be under positive selection are scattered along the primary sequence, but are all clustered in the ARS in the crystal structure (fig. 2).
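The likelihood ratio tests quoted above can be reproduced with a minimal sketch. The helper names `chi2_sf` and `lrt` are hypothetical and not part of PAML; the upper-tail probability is computed from the standard recurrence for the regularized incomplete gamma function, which is valid for integer degrees of freedom, and the log-likelihood values are those reported in the text.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    # chi2_sf(x, df) = Q(df/2, h), the regularized upper incomplete gamma function.
    if df % 2:
        a, q = 0.5, math.erfc(math.sqrt(h))  # Q(1/2, h)
    else:
        a, q = 1.0, math.exp(-h)             # Q(1, h)
    # Recurrence: Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1)
    while a < df / 2.0:
        q += math.exp(a * math.log(h) - h - math.lgamma(a + 1.0))
        a += 1.0
    return min(q, 1.0)


def lrt(lnl_null: float, lnl_alt: float, df: int):
    """Return the statistic 2 * (lnL_alt - lnL_null) and its chi-square P value."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf(stat, df)


if __name__ == "__main__":
    # M1 (neutral) versus M2 (selection), d.f. = 2
    print(lrt(-7719.46, -7296.69, 2))
    # M7 (beta) versus M8 (beta & omega), d.f. = 2
    print(lrt(-7498.97, -7232.68, 2))
```

Both comparisons give statistics in the hundreds, so the P values are vanishingly small; the same helper applies to any pair of nested models once the difference in parameter counts is known.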
Table 3 lists results obtained under the fixed-sites models. The simplest model (model A in table 3) assumes no site heterogeneity and gives ℓA = -8225.16. Allowing for different substitution rates for the two partitions (model B in table 3) gave ℓB = -7790.10. This is a dramatic improvement of 435.06 log-likelihood units upon adding a single parameter (r2). The estimate of r2 indicates that the substitution rate in the ARS is 6.5 times as high as outside the ARS. Model C further allows for different codon frequencies for the two partitions, by using nine additional parameters for base frequencies at the three codon positions. The log likelihood increased by ℓC - ℓB = -7767.77 - (-7790.10) = 22.33. Although statistically significant, this is not a very large improvement. Model D uses different κ and ω but the same codon frequencies for the two partitions. It has two more parameters than model B and fits the data much better; the likelihood ratio statistic is 2Δℓ = 2(ℓD - ℓB) = 2 × ([-7691.57] - [-7790.10]) = 197.06. Variation in κ and ω between the partitions is much more important to the fit of the model than variation in the codon frequencies. Model E assumes different κ and ω as well as different codon frequencies for the two partitions, and fits the data significantly better than any of the simpler models. Parameter estimates under model E are similar to those under model D. They all suggest that the ω ratio is very different in the two partitions. The non-ARS sites are under purifying selection with ω1 = 0.23, whereas the ARS sites are under diversifying selection with ω2 = 1.9. Like the comparison between models B and D, the comparison between models C and E leads to rejection of model C, with 2Δℓ = 2(ℓE - ℓC) = 191.70, indicating that κ and ω are different between the partitions. Model F is the separate analysis. Despite its use of 381 × 2 branch lengths for the two partitions, many of which are zero, the model fits the data significantly better than the models for combined analysis, which assume proportional branch lengths (models B, C, D, and E). For example, the test statistic for comparing models E and F is 2Δℓ = 492.34, and P < 0.0001 with d.f. = 380. Nevertheless, estimates of parameters such as κ and ω are highly similar to those obtained in the combined analyses. The tree length, i.e., the sum of branch lengths along the tree, for the first partition (sites outside the ARS) is 1.957 nucleotide substitutions per codon, or dS = 1.789 synonymous substitutions per synonymous site and dN = 0.414 nonsynonymous substitutions per nonsynonymous site. At the ARS, the tree length is 12.087 nucleotide substitutions per codon, or dS = 2.317 and dN = 4.297. Therefore, the synonymous rates are similar between the two partitions, and the over sixfold difference in substitution rate between the two partitions is mainly caused by the accelerated nonsynonymous rate at the ARS.
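As a quick consistency check on the tree lengths reported above for the separate analysis, the sketch below compares the two partitions. The dictionary layout and the function name `compare_partitions` are illustrative only; the numbers are the synonymous and nonsynonymous tree lengths quoted in the text, and the ratio of nonsynonymous to synonymous rate is used as a rough stand-in for the maximum likelihood estimate of the rate ratio in each partition.

```python
# Tree lengths under the separate analysis (model F), as quoted in the text.
non_ars = {"tree_length": 1.957, "dS": 1.789, "dN": 0.414}
ars = {"tree_length": 12.087, "dS": 2.317, "dN": 4.297}


def compare_partitions(first: dict, second: dict) -> dict:
    """Summarize how the second partition differs from the first."""
    return {
        "omega_first": first["dN"] / first["dS"],
        "omega_second": second["dN"] / second["dS"],
        "rate_ratio": second["tree_length"] / first["tree_length"],
        "dS_ratio": second["dS"] / first["dS"],
        "dN_ratio": second["dN"] / first["dN"],
    }


if __name__ == "__main__":
    for name, value in compare_partitions(non_ars, ars).items():
        print(f"{name}: {value:.3f}")
```

The synonymous ratio stays close to 1 while the nonsynonymous ratio is roughly tenfold, which is the pattern the text attributes to accelerated nonsynonymous substitution at the ARS.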
The poorer performance of the fixed-sites models appears to be mainly caused by the inclusion of conserved sites in the list of the 57 ARS sites. We note that structural studies permit the identification of sites potentially involved in antigen binding, but we do not expect all of them to be under diversifying selection in the data set examined. The random-sites model M8 (beta & ω) identified 25 sites to be under positive selection (table 2), of which 22 are in the list of ARS sites. The three sites that are not in the list are 45M, 94T, and 113Y. These sites are located in the ARS domain, although not in the binding cleft, and might also be involved in the specificity of binding foreign peptides. Previous studies demonstrated that antibody specificity can be mediated by both variable loops and substitutions on the protein framework that do not have direct contact with the antigen (Foote and Winter 1992). The results here suggest that a similar process may be occurring at these sites in the MHC. There are 35 sites in the ARS partition that are not identified to be under positive selection by the random-sites models. Of them, site 73T has posterior probability P = 0.64 and posterior mean ω = 3.6, and is quite likely to be under positive selection (fig. 1). Sites 64T, 66K, 74H, 75R, 76V, and 171Y all have posterior means ω > 0.8 and are possibly under positive selection but not detected by the random-sites models because of a lack of information in the data at these sites. Sites 5M, 22F, 26G, 57P, 72Q, 84Y, 146K, 154E, 159Y, 165V, and 169R have posterior probabilities close to zero and posterior mean ω < 0.1 (fig. 1). These sites are most likely to be under strong purifying selection. Indeed, sites 57P, 72Q, 154E, 165V, and 169R point away from the antigen binding cleft and were predicted not to be involved in direct antigen binding in the original MHC structural analysis (Bjorkman et al. 1987b).
Overall, these comparisons demonstrate the consistency of the fixed-sites and random-sites models and, in particular, the utility of the random-sites models even when structural information is available. They also highlight the power of predicting binding sites by incorporating both structural and evolutionary information.
It is also interesting to compare the results of table 2 (see also fig. 2) with those of Swanson et al. (2001), who applied the random-sites models to a data set of only six MHC alleles. The smaller data set included the signal sequence and additional C-terminal sequence, which were removed in this paper because these regions were not sequenced in all 192 alleles analyzed. Under the numbering system of this paper, this analysis identified 12 sites at the 50% level: 45M, 62G, 63E, 66K, 67V, 70H, 71S, 97R, 114H, 116Y, 151H, and 156L. All but one site (site 66K) are in the list of this paper (table 2). It is remarkable that all sites identified in both studies are clustered in the ARS domain. At the 95% level, only two sites (114H and 156L) were identified in the small data set, compared with 25 sites in this paper. This comparison demonstrates the dramatic improvement in the power of the method with the increase of the number of sequences used, consistent with the simulation study of Anisimova, Bielawski, and Yang (2001). We suggest that more sites might be under positive selection in the MHC than identified in this paper.
Abalone Sperm Lysin
Abalones are large marine gastropod mollusks that exhibit external fertilization, with sperm and eggs released directly into seawater, where fertilization occurs. Although many of the species have overlapping breeding seasons and habitats, they remain distinct. One barrier to cross-species fertilization is the species-specific interaction of sperm and eggs, which can be demonstrated quantitatively in the laboratory (e.g., Lyon and Vacquier 1999). The molecules involved in the species-specific interaction have been characterized extensively (reviewed in Vacquier et al. 1999). Abalone sperm lysin is a 16-kDa protein localized in the sperm acrosome granule. Upon exocytosis, lysin dissolves a hole in the egg vitelline envelope (VE) in a nonenzymatic and species-specific manner. Lysin binds to and unravels the fibrous VE by disrupting hydrogen bonds and hydrophobic interactions of its receptor VERL (Swanson and Vacquier 1997, 1998). The crystal structures of lysin from the red (Haliotis rufescens) and green (H. fulgens) abalone have been determined (Shaw et al. 1995; Kresge, Vacquier, and Stout 2000a, 2000b). The sperm lysin genes of 25 abalone species were sequenced and analyzed by Lee, Ota, and Vacquier (1995), and strong diversifying selection was demonstrated at a number of amino acid sites in lysin, particularly in closely related sympatric species (Yang, Swanson, and Vacquier 2000). The sequence data used in this paper are the same as those analyzed by Lee, Ota, and Vacquier (1995) and Yang et al. (2000), except that an alignment gap between residues 133 and 134 in the original alignment is deleted in this paper, so that 134 codons are in each sequence. We use the phylogeny estimated by Lee, Ota, and Vacquier (1995).
Extensive analysis of the data under random-sites models was performed by Yang et al. (2000). In this paper, we present results obtained under models M7 (beta) and M8 (beta & ω) only (table 5). Parameter estimates are essentially identical to those in Yang et al. (2000), but the log-likelihood values are quite different because of the removed site. Estimates under model M8 (beta & ω) suggest that many sites are highly conserved, but as many as 27% of sites are under diversifying selection with ω2 = 3.0. The likelihood ratio test comparing these two models suggests that the difference is statistically significant; the test statistic is 2Δℓ = 2(ℓ1 - ℓ0) = 2 × ([-4410.57] - [-4472.16]) = 123.18, compared with the χ² distribution with d.f. = 2. Sites inferred to be under positive selection are listed in table 5. The lysin structure of the red abalone (H. rufescens), with sites identified to be under positive selection mapped onto it, was presented in Yang, Swanson, and Vacquier (2000).
The results obtained under the fixed-sites models are shown in table 6. Model A, which assumes the same parameters in the two partitions, gave ℓA = -4627.03. Model B allows the overall rates to differ and fits the data much better than model A; the likelihood ratio test statistic is 2Δℓ = 2 × ([-4549.99] - [-4627.03]) = 154.08, compared with the χ² distribution with d.f. = 1. The rate at the solvent-exposed sites is 2.8 times as high as at the buried sites (r1:r2 = 1:2.755). Model C further allows for different codon frequencies for the two partitions, determined by the nucleotide frequencies at the three codon positions. This model fits the data much better than model B (2Δℓ = 119.84, d.f. = 9), suggesting that the codon usage patterns are indeed different at the buried and exposed sites. Model D assumes the same codon frequencies but different transition/transversion rate ratio κ and nonsynonymous/synonymous rate ratio ω. This model fits the data better than model B (2Δℓ = 35.74, d.f. = 2). The estimates are κ1 = 1.7 and ω1 = 0.39 for the buried sites and κ2 = 1.5 and ω2 = 1.25 for the solvent-exposed sites (table 6). Whereas estimates of κ are similar between the partitions, estimates of ω are very different. As hypothesized, buried sites are under strong purifying selection, and solvent-exposed sites appear to be under diversifying selection. Unlike the MHC data set, allowing for different codon frequencies (model C) improves the fit of the model more than allowing for different κ and ω (model D). This pattern might be the result of different amino acid compositions at the buried and exposed sites. Model E allows different κ and ω as well as different codon frequencies between partitions, and fits the data better than any of the simpler models. The model gave parameter estimates similar to those under model D (table 6). Model F is equivalent to separate analysis of the two partitions. It is not significantly better than model E; the statistic is 2Δℓ = 38.06, and P = 0.79, with d.f. = 46. So it is acceptable to use 47 + 1 instead of 47 × 2 parameters for branch lengths in the two partitions. Parameter estimates under model F are similar to those obtained in the combined analyses (models B to E). The tree length for the buried sites is 3.96 nucleotide substitutions per codon, or dS = 2.20 synonymous substitutions per synonymous site and dN = 0.99 nonsynonymous substitutions per nonsynonymous site. The tree length for the solvent-exposed sites is 9.98, or dS = 2.76 and dN = 3.50. Thus the 2.5-fold rate difference between the two partitions is mainly caused by the accelerated nonsynonymous rate at the exposed sites.
Discussions |
---|
We note that in the MHC data set, 22 of the 25 sites identified by the random-sites models to be under positive selection are in the list of sites in the ARS, whereas the other three sites are in the ARS domain. In the lysin data set, all sites identified by the random-sites models are in the partition of exposed sites. Such consistency between the two classes of models supports both the biological hypothesis used to partition sites a priori and the reliability of the random-sites models. We suggest that the random-sites models are useful whether or not prior information is available to partition sites in the sequence. However, it should be emphasized that identification of sites under positive selection using Bayes' theorem requires simultaneous inferences at all sites in the sequence. Whereas the accuracy at one site might be high, as indicated by the posterior probability, it is very unlikely for all sites to be identified correctly. Furthermore, the empirical Bayes procedure we used does not account for the sampling errors in parameter estimates, and the posterior probability calculations might be sensitive to parameters in the distribution (Yang and Bielawski 2000). Those problems may be serious when the analyzed data set is small and contains only a few highly similar sequences, with little information to estimate parameters in the ω distribution. Thus we suggest that caution be exercised and that the inferred sites be considered hypotheses to be verified by experimental investigation.
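The empirical Bayes step discussed above can be illustrated with a minimal sketch for a discrete set of site classes. The function name `site_posteriors` and all numerical inputs in the example are hypothetical; in practice the per-class site likelihoods come from the phylogenetic likelihood calculation and are usually handled on a log scale, and the class proportions and rate ratios are plugged in as point estimates, which is why their sampling errors are ignored.

```python
def site_posteriors(props, omegas, site_likelihoods):
    """Posterior class probabilities and posterior mean rate ratio for one site.

    props: estimated class proportions p_k (summing to 1)
    omegas: estimated rate ratio omega_k for each class
    site_likelihoods: probability of the data at this site given each class
    """
    joint = [p * f for p, f in zip(props, site_likelihoods)]
    total = sum(joint)  # marginal probability of the data at this site
    post = [j / total for j in joint]
    mean_omega = sum(w * q for w, q in zip(omegas, post))
    return post, mean_omega


if __name__ == "__main__":
    # Hypothetical values for illustration only, not estimates from the paper.
    props = [0.6, 0.3, 0.1]
    omegas = [0.1, 1.0, 6.0]
    site_likelihoods = [1e-12, 4e-12, 5e-11]
    post, mean_omega = site_posteriors(props, omegas, site_likelihoods)
    print(post, mean_omega)
```

A site is typically reported as positively selected when the posterior probability of a class with a rate ratio above 1 exceeds a chosen cutoff, such as the 50% and 95% levels mentioned in the text.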
Analysis of Data from Multiple Genes
We envisage that one major use of the fixed-sites models is to test for similarities and differences in the evolutionary process among different genes. When sequences from multiple protein-coding genes are available for the same set of species, they can be analyzed as a combined data set, with their differences in the substitution pattern accounted for. Interesting hypotheses concerning differences among genes in the selective pressure indicated by the ω ratio can then be tested. In this regard, some variations of the models we implemented here might be more interesting. For example, one such model might have a homogeneous synonymous substitution rate and variable nonsynonymous rates among genes. Another model might assume proportional branch lengths at the synonymous sites and freely variable branch lengths at the nonsynonymous sites among the genes. It might also be worthwhile to decouple κ and ω. In this paper, these two parameters are either both homogeneous or both different among genes. The analyses in this paper did not assume a molecular clock, so that the overall rate varies among branches. Models that enforce the molecular clock at the synonymous sites but not at the nonsynonymous sites might be interesting. We note that some similar models have been developed by Muse and Gaut (1997) in their pioneering work, and further implementation of such likelihood models is straightforward.
Footnotes |
---|
Keywords: synonymous rate
nonsynonymous rate
positive selection
partitioned data
lysin
MHC
maximum likelihood
Bayes
Address for correspondence and reprints: Ziheng Yang, Department of Biology, 4 Stephenson Way, London NW1 2HE, UK. z.yang{at}ucl.ac.uk.
References |
---|
Akashi H., 1999 Within- and between-species DNA sequence variation and the footprint of natural selection Gene 238:39-51
Anisimova M., J. P. Bielawski, Z. Yang, 2001 The accuracy and power of likelihood ratio tests to detect positive selection at amino acid sites Mol. Biol. Evol 18:1585-1592
Bielawski J. P., Z. Yang, 2001 Positive and negative selection in the DAZ gene family Mol. Biol. Evol 18:523-529
Bishop J. G., A. M. Dean, T. Mitchell-Olds, 2000 Rapid evolution in plant chitinases: molecular targets of selection in plant-pathogen coevolution Proc. Natl. Acad. Sci. USA 97:5322-5327
Bjorkman P. J., M. A. Saper, B. Samraoui, W. S. Bennett, J. L. Strominger, D. C. Wiley, 1987a. The foreign antigen binding site and T cell recognition regions of class I histocompatibility antigens Nature 329:512-518
Bjorkman P. J., M. A. Saper, B. Samraoui, W. S. Bennett, J. L. Strominger, D. C. Wiley, 1987b. Structure of the class I histocompatibility antigen, HLA-A2 Nature 329:506-512
Crandall K. A., C. R. Kelsey, H. Imamichi, H. C. Lane, N. P. Salzman, 1999 Parallel evolution of drug resistance in HIV: failure of nonsynonymous/synonymous substitution rate ratio to detect selection Mol. Biol. Evol 16:372-382
Endo T., K. Ikeo, T. Gojobori, 1996 Large-scale search for genes on which positive selection may operate Mol. Biol. Evol 13:685-690
Fares M. A., A. Moya, C. Escarmis, E. Baranowski, E. Domingo, E. Barrio, 2001 Evidence for positive selection in the capsid protein-coding region of the foot-and-mouth disease virus (FMDV) subjected to experimental passage regimens Mol. Biol. Evol 18:10-21
Foote J., G. Winter, 1992 Antibody framework residues affecting the conformation of the hypervariable loops J. Mol. Biol 224:487-499
Ford M. J., 2001 Molecular evolution of transferrin: evidence for positive selection in salmonids Mol. Biol. Evol 18:639-647
Fraczkiewicz R., W. Braun, 1998 Exact and efficient analytical calculation of the accessible surface areas and their gradients for macromolecules J. Comp. Chem 19:319-333
Gao G. F., J. Tormo, U. C. Gerth, J. R. Wyer, A. J. McMichael, D. I. Stuart, J. I. Bell, E. Y. Jones, B. K. Jakobsen, 1997 Crystal structure of the complex between human CD8alpha(alpha) and HLA-A2 Nature 387:630-634
Goldman N., Z. Yang, 1994 A codon-based model of nucleotide substitution for protein-coding DNA sequences Mol. Biol. Evol 11:725-736
Haydon D. T., A. D. Bastos, N. J. Knowles, A. R. Samuel, 2001 Evidence for positive selection in foot-and-mouth-disease virus capsid genes from field isolates Genetics 157:7-15
Hughes A. L., M. Nei, 1988 Pattern of nucleotide substitution at major histocompatibility complex class I loci reveals overdominant selection Nature 335:167-170
Hughes A. L., M. Nei, 1989 Evolution of the major histocompatibility complex: independent origin of nonclassical class I genes in different groups of mammals Mol. Biol. Evol 6:559-579
Hughes A. L., T. Ota, M. Nei, 1990 Positive Darwinian selection promotes charge profile diversity in the antigen-binding cleft of class I major-histocompatibility-complex molecules Mol. Biol. Evol 7:515-524
Kresge N., V. D. Vacquier, C. D. Stout, 2000a. 1.35 and 2.07 Å resolution structures of the red abalone sperm lysin monomer and dimer reveal features involved in receptor binding Acta Crystallogr 56:34-41
Kresge N., V. D. Vacquier, C. D. Stout, 2000b. The high resolution crystal structure of green abalone sperm lysin: implications for species-specific binding of the egg receptor J. Mol. Biol 296:1225-1234
Lee Y.-H., T. Ota, V. D. Vacquier, 1995 Positive selection is a general phenomenon in the evolution of abalone sperm lysin Mol. Biol. Evol 12:231-238
Lyon J. D., V. D. Vacquier, 1999 Interspecies chimeric sperm lysins identify regions mediating species-specific recognition of the abalone egg vitelline envelope Dev. Biol 214:151-159
Muse S. V., B. S. Gaut, 1994 A likelihood approach for comparing synonymous and nonsynonymous nucleotide substitution rates, with application to the chloroplast genome Mol. Biol. Evol 11:715-724
Muse S. V., B. S. Gaut, 1997 Comparing patterns of nucleotide substitution rates among chloroplast loci using the relative ratio test Genetics 146:393-399
Nielsen R., Z. Yang, 1998 Likelihood models for detecting positively selected amino acid sites and applications to the HIV-1 envelope gene Genetics 148:929-936
Peek A. S., V. Souza, L. E. Eguiarte, B. S. Gaut, 2001 The interaction of protein structure, selection, and recombination on the evolution of the type-1 fimbrial major subunit (fimA) from Escherichia coli J. Mol. Evol 52:193-204
Saitou N., M. Nei, 1987 The neighbor-joining method: a new method for reconstructing phylogenetic trees Mol. Biol. Evol 4:406-425
Sharp P. M., 1997 In search of molecular Darwinism Nature 385:111-112
Shaw A., P. A. Fortes, C. D. Stout, V. D. Vacquier, 1995 Crystal structure and subunit dynamics of the abalone sperm lysin dimer: egg envelopes dissociate dimers, the monomer is the active species J. Cell Biol 130:1117-1125
Swanson W. J., V. D. Vacquier, 1997 The abalone egg vitelline envelope receptor for sperm lysin is a giant multivalent molecule Proc. Natl. Acad. Sci. USA 94:6724-6729
Swanson W. J., V. D. Vacquier, 1998 Concerted evolution in an egg receptor for a rapidly evolving abalone sperm protein Science 281:710-712
Swanson W. J., Z. Yang, M. F. Wolfner, C. F. Aquadro, 2001 Positive Darwinian selection in the evolution of mammalian female reproductive proteins Proc. Natl. Acad. Sci. USA 98:2509-2514
Vacquier V. D., W. J. Swanson, E. C. Metz, C. D. Stout, 1999 Acrosomal proteins of abalone spermatozoa Adv. Dev. Biochem 5:49-81
Yang Z., 1996 Maximum-likelihood models for combined analyses of multiple sequence data J. Mol. Evol 42:587-596
Yang Z., 1997 PAML: a program package for phylogenetic analysis by maximum likelihood Comput. Appl. Biosci 13:555-556
Yang Z., 2001 Adaptive molecular evolution Pp. 327-350 in D. Balding, M. Bishop, and C. Cannings, eds. Handbook of statistical genetics. Wiley, New York
Yang Z., J. P. Bielawski, 2000 Statistical methods for detecting molecular adaptation Trends Ecol. Evol 15:496-503
Yang Z., R. Nielsen, N. Goldman, A.-M. K. Pedersen, 2000 Codon-substitution models for heterogeneous selection pressure at amino acid sites Genetics 155:431-449
Yang Z., W. J. Swanson, V. D. Vacquier, 2000 Maximum likelihood analysis of molecular adaptation in abalone sperm lysin reveals variable selective pressures among lineages and sites Mol. Biol. Evol 17:1446-1455
Zanotto P. M., E. G. Kallas, R. F. Souza, E. C. Holmes, 1999 Genealogical evidence for positive selection in the nef gene of HIV-1 Genetics 153:1077-1089