Department of Biology and Health Services, Edinboro University of Pennsylvania
Correspondence: E-mail: usorhannus{at}edinboro.edu.
Abstract |
---|
Key Words: sexual reproduction gene, likelihood, parsimony, positive selection, synonymous substitution, nonsynonymous substitution
Introduction |
---|
The centric diatom Thalassiosira weissflogii analyzed here forms flagellated spermatozoa and egg cells that must recognize each other when released among a multitude of other cells (i.e., vegetative cells and gametes of other species) (Armbrust and Galindo 2001). Armbrust (1999) has identified three sexually induced genes (Sig1, Sig2, and Sig3) in T. weissflogii, which are thought to play a role in sperm-egg recognition. There are at least 10 unique copies of Sig1 present in an individual (Armbrust and Galindo 2001).
In their study of the evolution of Sig1, Armbrust and Galindo (2001) conducted a gene-wide dN/dS ratio analysis and failed to detect evidence for positive selection. However, gene-wide dN/dS tests average the ratio over all codon sites and therefore have little power compared with more recently developed likelihood and parsimony methods that examine individual sites. Thus, maximum-likelihood-based (Yang et al. 2000) and parsimony-based (Suzuki and Gojobori 1999) analyses were carried out here.
Data and Methods |
---|
The software package DAMBE (version 4.0.98 [Xia 2000]) was employed to manage the data. Maximum-likelihood-based (Yang et al. 2000) and parsimony-based (Suzuki and Gojobori 1999) methods were used to detect potential effects of positive selection on the polypeptide derived from Sig1. The maximum-likelihood-based technique (Yang et al. 2000) was implemented by the CODEML program in the PAML package (version 3.11 [Yang 1997]). A set of likelihood models in CODEML allows for variable dN/dS ratios among sites (Yang et al. 2000). A likelihood ratio test was used to examine the data for positive selection, that is, for the presence of sites with dN/dS ratios significantly greater than 1. This was accomplished by comparing a null model that did not allow for variable dN/dS ratios among sites to a more general model that did. Model M0 (one-ratio) was contrasted with model M3 (discrete), and model M7 (β) was compared with model M8 (β and ω), to discover potential significant heterogeneity in dN/dS among sites (Yang et al. 2000). Since model M8 (β and ω) was prone to multiple local optima, two different initial dN/dS values, one greater than 1 and the other less than 1, were used in the analysis (Yang 2001). The run that gave the highest likelihood was chosen as the best result. A significance level of 5% was used to test for positive selection. Bayes' theorem was used to calculate the posterior probabilities (confidence probability level = 95%) that sites with dN/dS > 1 were influenced by positive selection (Yang et al. 2000).
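The likelihood ratio test described above can be sketched in a few lines. This is a minimal illustration, not the CODEML implementation: the function names are hypothetical, the log-likelihood values are invented for the example and are not taken from table 1, and the degrees of freedom are an assumption based on the usual parameter counts for these comparisons (4 for M0 versus M3 with three site classes, 2 for M7 versus M8). The closed-form tail probability used here is valid only for an even number of degrees of freedom.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable (even df only)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term, total = 1.0, 1.0
    # sf(x; 2k) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def likelihood_ratio_test(lnl_null, lnl_alt, df, alpha=0.05):
    """Return (2 * delta lnL, p-value, reject null?) for two nested models."""
    stat = 2.0 * (lnl_alt - lnl_null)
    p_value = chi2_sf_even_df(stat, df)
    return stat, p_value, p_value < alpha


# Hypothetical log-likelihoods, for illustration only.
stat, p_value, reject = likelihood_ratio_test(lnl_null=-1000.0, lnl_alt=-990.0, df=2)
print(f"2*delta_lnL = {stat:.2f}, p = {p_value:.2e}, reject null: {reject}")
```

A significant result only shows that the more general model fits better; positive selection is inferred when that model also estimates a class of sites with dN/dS greater than 1.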
The parsimony-based method (Suzuki and Gojobori 1999) was implemented by the computer program ADAPTSITE (Suzuki, Gojobori, and Nei 2001). This method was used to infer the ancestral codons for all the internal nodes of the gene tree. Then the total numbers of nonsynonymous (cN) and synonymous (cS) substitutions per codon site and the average numbers of nonsynonymous (sN) and synonymous (sS) sites per codon site were computed (Suzuki and Nei 2002). The null hypothesis of neutral evolution was tested under the assumption that cS and cN are binomially distributed and that the probabilities of occurrence of synonymous and nonsynonymous substitutions are sS/(sS + sN) and sN/(sS + sN), respectively (Suzuki and Nei 2002). Statistically, positive selection can be inferred when cN/sN is significantly larger than cS/sS (Suzuki and Nei 2002). A significance level of 5% was used.
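The per-site binomial test can be illustrated as follows. This is a sketch under stated assumptions, not ADAPTSITE's code: the function name is hypothetical, the test is taken to be one-sided (an excess of nonsynonymous changes), and the counts in the example are invented.

```python
from math import comb


def neutrality_p_value(c_s, c_n, s_s, s_n):
    """One-sided binomial p-value for an excess of nonsynonymous changes.

    c_s, c_n: counts of synonymous and nonsynonymous changes at a codon site.
    s_s, s_n: average numbers of synonymous and nonsynonymous sites.
    """
    n = c_s + c_n
    # Probability that a change is nonsynonymous under neutral evolution.
    p_n = s_n / (s_s + s_n)
    # P(X >= c_n) for X ~ Binomial(n, p_n).
    return sum(
        comb(n, k) * p_n**k * (1.0 - p_n) ** (n - k) for k in range(c_n, n + 1)
    )


# Invented example: five changes at a site, all nonsynonymous.
p = neutrality_p_value(c_s=0, c_n=5, s_s=1.0, s_n=2.0)
print(f"p = {p:.4f}")  # (2/3)**5, about 0.13, so not significant at 5%
```

Even a site at which every inferred change is nonsynonymous does not reach significance in this example, which shows why a test applied site by site needs many substitutions per site.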
Recombination events can create the appearance of parallel/convergent changes in different branches of the tree. CODEML will infer parallel/convergent substitutions as independent and could, as a result, give rise to faulty conclusions about positive selection. GENECONV (version 1.81 [Sawyer 1999]) was employed, using the default settings, to detect recombination events in the data set. This method searched for unusually long identical fragments within pairs of aligned sequences or pairwise segments within the alignment characterized by uncommonly high matching scores (Sawyer 1999). To evaluate the significance of the hypothesis that similar fragments arose by recombination, 10,000 randomly permuted data sets derived from the real alignment were generated (see Sawyer 1999 for additional details about the computations). The significance level was set at 5%.
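The idea behind the permutation test can be sketched in simplified form. The code below is not GENECONV's algorithm or scoring scheme: it is a hypothetical illustration that measures the longest run of identical polymorphic sites shared by one pair of sequences, then shuffles the order of the polymorphic columns to ask how often a run at least that long arises by chance. All function names and the toy alignment are assumptions made for the example.

```python
import random


def longest_identical_run(a, b):
    """Length of the longest stretch of consecutive matching positions."""
    best = run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        best = max(best, run)
    return best


def permutation_p_value(seqs, i, j, n_perm=10_000, seed=0):
    """Permutation p-value for the longest identical run between seqs[i] and seqs[j]."""
    rng = random.Random(seed)
    # Keep only polymorphic alignment columns; invariant columns carry no signal.
    cols = [col for col in zip(*seqs) if len(set(col)) > 1]

    def score(columns):
        return longest_identical_run(
            [c[i] for c in columns], [c[j] for c in columns]
        )

    observed = score(cols)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(cols)  # permute site order, breaking up contiguous fragments
        if score(cols) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Toy alignment, invented for illustration.
alignment = ["ACGTACGTAC", "ACGTACGTTG", "TGCATGCAAC"]
observed, p = permutation_p_value(alignment, 0, 1, n_perm=1000)
print(f"longest shared run = {observed}, permutation p = {p:.3f}")
```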
Results and Discussion |
---|
The likelihood ratio tests for the "large" and the "small" data sets indicated that the selection models M3 (discrete) and M8 (β and ω) fitted the data significantly better than the null models M0 (one-ratio) and M7 (β), respectively (table 1). The M3 (discrete) and M8 (β and ω) models suggested that about 8% of the sites were under positive selection in both data sets (table 1). Calculations of posterior probabilities identified four amino acid sites (4, 42, 52, and 149) under positive selection in the "large" data set and seven sites (4, 9, 42, 52, 119, 149, and 182) in the "small" data set (table 2). Most of the replacement substitutions in the sites influenced by positive selection were found in the two Long Island isolates. However, two nonsynonymous changes took place in the lineages in the Pacific Ocean, one in the California isolate and the other in the Hawaiian isolate. Replacement substitutions in amino acid site number 4 were the most widely distributed since they occurred in both the Atlantic and Pacific oceans.
Simulation studies and analyses of real data by Suzuki and Nei (2001, 2002) suggested that positively selected amino acid sites are more reliably inferred by parsimony-based methods than by likelihood-based methods. Suzuki and Nei (2002) concluded that the parsimony-based method tended to be conservative, whereas the maximum-likelihood-based technique appeared to be liberal in inferring the presence of positively selected sites. An analysis of the human leukocyte antigen (HLA) was taken to support their conclusion (Suzuki and Nei 2001). However, the results obtained from the likelihood analyses of the HLA data by Suzuki and Nei (2001) appear to be problematic, as simpler models had much higher likelihood values than the more general models, and multiple runs led to many different sets of parameter estimates. Yang and Swanson (2002) analyzed a similar data set of MHC alleles, and the results were all sensible. The parsimony-based method is expected to lack power for "smaller" data sets because the technique performs a separate statistical test on each amino acid site. Thus, the failure of the method to detect sites influenced by positive selection here is not surprising.
Extensive simulations performed by Anisimova, Bielawski, and Yang (2001, 2002) and a review paper by Yang (2002) have suggested that the maximum-likelihood-based approach implemented here is in general usable and robust. Predictions of positively selected sites are expected to be unreliable when sequences are very similar and the number of lineages is small (e.g., tree length less than 0.12 substitutions per codon and number of lineages less than seven) (Anisimova, Bielawski, and Yang 2002). In this study, the tree length derived from the "large" data set (34 sequences) was 0.5 substitutions per codon and that derived from the "small" data set (25 sequences) was 0.3 substitutions per codon. Since the tree lengths and the numbers of sequences in both data sets clearly exceed these thresholds, the results are expected to be reliable. As far as the author is aware, this is the first study that has shown that reproductive proteins in unicellular eukaryotes are influenced by positive selection.
Acknowledgements |
---|
Footnotes |
---|
Literature Cited |
---|
Anisimova, M., J. P. Bielawski, and Z. Yang. 2001. The accuracy and power of likelihood ratio tests to detect positive selection at amino acid sites. Mol. Biol. Evol. 18:1585-1592.
Anisimova, M., J. P. Bielawski, and Z. Yang. 2002. Accuracy and power of Bayes prediction of amino acid sites under positive selection. Mol. Biol. Evol. 19:950-958.
Armbrust, E. V. 1999. Identification of a new gene family expressed during the onset of sexual reproduction in the centric diatom Thalassiosira weissflogii. Appl. Environ. Microbiol. 65:3121-3128.
Armbrust, E. V., and H. M. Galindo. 2001. Rapid evolution of a sexual reproduction gene in centric diatoms of the genus Thalassiosira. Appl. Environ. Microbiol. 67:3501-3513.
Civetta, A., and R. S. Singh. 1995. High divergence of reproductive tract proteins and their association with postzygotic reproductive isolation in Drosophila melanogaster and Drosophila virilis group species. J. Mol. Evol. 41:1085-1095.
Sawyer, S. A. 1999. GENECONV: a computer package for the statistical detection of gene conversion. Distributed by the author, Department of Mathematics, Washington University in St. Louis, available at http://www.math.wustl.edu/sawyer.
Suzuki, Y., and T. Gojobori. 1999. A method for detecting positive selection at single amino acid sites. Mol. Biol. Evol. 16:1315-1328.
Suzuki, Y., T. Gojobori, and M. Nei. 2001. ADAPTSITE: detecting natural selection at single amino acid sites. Bioinformatics 17:660-661.
Suzuki, Y., and M. Nei. 2001. Reliabilities of parsimony-based and likelihood-based methods for detecting positive selection at single amino acid sites. Mol. Biol. Evol. 18:2179-2185.
Suzuki, Y., and M. Nei. 2002. Simulation study of the reliability and robustness of the statistical methods for detecting positive selection at single amino acid sites. Mol. Biol. Evol. 19:1865-1869.
Swanson, W. J., and V. D. Vacquier. 1995. Extraordinary divergence and positive Darwinian selection in a fusagenic protein coating the acrosomal process of abalone spermatozoa. Proc. Natl. Acad. Sci. USA 92:4957-4961.
Swanson, W. J., Z. Yang, M. F. Wolfner, and C. F. Aquadro. 2001. Positive Darwinian selection drives the evolution of several female reproductive proteins in mammals. Proc. Natl. Acad. Sci. USA 98:2509-2514.
Tsaur, S. C., and C. I. Wu. 1997. Positive selection and the molecular evolution of a gene of male reproduction, Acp26Aa of Drosophila. Mol. Biol. Evol. 14:544-549.
Vacquier, V. D. 1998. Evolution of gamete recognition proteins. Science 281:1995-1998.
Wyckoff, G. J., W. Wang, and C. I. Wu. 2000. Rapid evolution of male reproductive genes in the descent of man. Nature 403:304-309.
Xia, X. 2000. Data analysis in molecular biology and evolution. Kluwer Academic Publishers, Boston.
Yang, Z. 1997. PAML: a program package for phylogenetic analysis by maximum likelihood. CABIOS 13:555-556.
Yang, Z. 2001. Phylogenetic analysis by maximum likelihood (PAML). Version 3.11. University College London.
Yang, Z. 2002. Inference of selection from multiple species alignments. Curr. Opin. Genet. Dev. 12:688-694.
Yang, Z., R. Nielsen, N. Goldman, and A. M. K. Pedersen. 2000. Codon-substitution models for heterogeneous selection pressure at amino acid sites. Genetics 155:431-449.
Yang, Z., and W. J. Swanson. 2002. Codon-substitution models to detect adaptive evolution that account for heterogeneous selective pressures among site classes. Mol. Biol. Evol. 19:49-57.