* Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Ramat Aviv, Israel
Department of Biology and Biochemistry, University of Houston
Department of Biochemistry, George S. Wise Faculty of Life Sciences, Tel Aviv University, Ramat Aviv, Israel
Correspondence: E-mail: dgraur{at}uh.edu.
![]() |
Abstract |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
Key Words: rate variation among sites evolutionary conservation empirical Bayesian methods bioinformatics Bcl-xL
![]() |
Introduction |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
Numerous site-specific conservation scores have been proposed over the years (reviewed in Valdar 2002; see also del Sol Mesa, Pazos, and Valencia 2003, Yao et al. 2003). Though evolution is the driving force that determines site conservation, none of these methods make full use of either the information contained in the phylogenetic tree or the stochastic nature of amino acid replacements. This deficit may lead to erroneous predictions. For example, when branch lengths are ignored, a replacement on a short branch will be given the same weight as one occurring on a long branch. However, an amino acid replacement between two divergent sequences is less surprising than one occurring between two closely related sequences. The incorporation of advanced evolutionary models was proved to greatly increase the accuracy of site-specific rate inference (Pupko et al. 2002). Evolutionary rates are commonly measured as number of replacements per amino acid site per year. The term site-specific evolutionary rate in the context of our conservation scores is different. Here, the rate is relative to the average evolutionary rate across all sites and hence is unitless. In addition, for each site we assume that the rate is constant across all lineages. Finally, in this paper we limit our discussion to site-specific rate inference that is based on probabilistic evolutionary models.
Currently, likelihood methods are considered state-of-the-art phylogenetic techniques, allowing robust statistical testing of evolutionary hypotheses (Whelan, Lio, and Goldman 2001). Several alternatives within the likelihood framework are currently being used for inferring evolutionary rates. These can be divided into two types: (1) Bayesian methods that presuppose a prior distribution of evolutionary rates, and (2) maximum-likelihood (ML) methods that do not. Both approaches have solid statistical foundations and are closely related, as they use the same models of evolution and operate within the same statistical framework.
The ML approach for estimating site-specific conservation scores chooses the rate that yields the highest probability to the observed data. The first site-specific rate estimation using ML was the DNArates program developed in the early 1990s by Gary Olsen. A paper describing DNArates was never published, but documentation can be found at http://geta.life.uiuc.edu/gary/programs/DNArates.html (see also Felsenstein 2001). Nielsen (1997) also studied ML based estimation for DNA sequences and suggested incorporating a Gamma prior to avoid cases where the ML estimate is infinite. Using the same ML methodology, Pupko et al. (2002) developed the Rate4Site tool for the identification of functional regions in proteins. Rate4Site was embedded in the ConSurf server (Glaser et al. 2003; http://consurf.tau.ac.il) and successfully identified functional residues at the contact interface of several proteins (Donaudy et al. 2003; Mella et al. 2003; Ramelot et al. 2003; RamShankar et al. 2003).
Bayesian inference is based on the posterior probability distribution, which is directly proportional to the product of the prior distribution and the likelihood. A Bayesian approach, assuming a Gamma prior for DNA sequences, was suggested by Yang and collaborators (Yang and Wang 1995; Excoffier and Yang 1999). Computing a Bayesian estimate based on a continuous Gamma distribution is computationally impracticable for even a modest number of sequences (Yang 1996). Yang (1994) suggested the discrete Gamma model as an approximate method and found that four categories are sufficient to provide a decent approximation to the continuous Gamma distribution.
Site-specific evolutionary rates are directly connected to the branch lengths of the phylogenetic tree (see Materials and Methods). The problems of estimating branch lengths and site-specific rates are thus inseparable. Two possible solutions exist: (1) estimate branch lengths first and then estimate site-specific rates, assuming that the branch lengths are known (e.g., Nielsen 1997, Pupko et al. 2002); or (2) estimate branch lengths and site-specific rate simultaneously, using an iterative procedure (e.g., Meyer and von Haeseler 2003).
The purpose of this study is to compare the performances of ML and Bayesian estimates through simulation. First, we shall study the effect of the number of discrete Gamma categories on the performance of the Bayesian method for the task of evaluating site-specific rates. Then, we shall study the effect of various evolutionary parameters, such as number of taxonomic units, branch lengths, sequence length, and the shape of the rate distribution, on the quality of predictions. These comparisons assume that the tree topology and branch lengths are known prior to rate inference. We then explore the accuracy of rate estimation in the more realistic scenario where branch lengths are not known in advance. We conclude with an illustrative biological example.
![]() |
Materials and Methods |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
|
|
Within the Bayesian framework, the posterior probability is obtained from the likelihood function and the prior probability. Assuming that the topology, the branch lengths and are known a priori, the probability of any given rate, r, is
|
|
Estimating Branch Lengths
When the branch lengths are unknown one may estimate the branch lengths using the classical ML approach and then treat these branch lengths as known for the task of rate estimation, using either the ML or the Bayesian method. When inferring the branch length in this case we assumed a Gamma distribution and found the ML estimates of the parameter and the branch lengths simultaneously. Alternatively, in the maximum-likelihood framework one can consider a rich model in which each site has its own rate. The tree and branch lengths can be estimated using this model (see, e.g., Meyer and von Haeseler 2003). In this case, assuming that the tree topology is known a priori, the parameters of the model (i.e., the site-specific rates) and the branch lengths are estimated simultaneously by an iterative procedure. In each iteration we first estimate site-specific rates, given the branch lengths. We then find the ML estimate of the branches given the rates. We continue until convergence of the likelihood function. We call this rich-model method ML-RICH.
Branch lengths and site-specific rate estimates are not independent. One can always multiply all branch lengths by a constant factor c and divide all rates by this factor, resulting in no change in the likelihood score. To avoid this circularity, in all methods, site-specific rates were scaled so that the average is 1. Our simulations indicate that scaling has a negligible effect on EB-EXP, whereas it increases the accuracy of the ML methods (data not shown).
Simulation
A simulated site-specific rate parameter was assigned to each site. Given a model tree and simulated rates, protein sequences were generated by simulating evolutionary changes along the branches. The simulation used the JTT model of amino acid replacement (Jones, Taylor, and Thornton 1992), in which each site evolves independently. For each run a total of 500 sites were generated in this manner.
For the simulation, one must determine the "true" rate in each site. If the true rates are sampled from a Gamma distribution, this could bias the results toward the Bayesian method, which assumes a Gamma prior. To avoid this bias, the rates used in our simulations were drawn from an empirical rate distribution inferred from a biological multiple sequence alignment (MSA) with many homologs. We used a distribution inferred by EB-EXP from an MSA of 34 homologous Src-homology-2 (SH2) domains (Pupko et al. 2002). The simulated rates were scaled so that the average was set to 1. To avoid a possible bias because the rate distribution was inferred by a specific method, the same MSA was used to infer a second empirical distribution using ML. The simulation results obtained with this distribution were similar with regard to the relative accuracy of the methods (data not shown).
Due to the complexity of the parameter space, we studied only several special cases. In all trees used, the length of interior branches was d and that of the exterior branches was 3d. Our simulation runs varied in their value of d, number of sequences, sequence lengths, and rate distributions used for generating the data. Illustrative model tree with six and 18 sequences are shown in figure 2. The generated sequences, along with the model tree, were given as input to the EB-EXP and ML methods and rates were inferred for each position. We note that in these simulations the tree topology and branch lengths were assumed to be known a priori. The parameter of EB-EXP was inferred from the data for each run. A different set of simulations were performed to study the more complicated case in which the branch lengths are not given a priori (see below). In each simulation run, the accuracy of inference was analyzed by computing the mean square error (MSE) between the simulated rates and the respective inferred estimates. MSE was calculated as
|
|
|
Program Availability
The ML and EB-EXP rate-inference methods were implemented in computer programs written in C++ and are available at http://www.tau.ac.il/talp/rate4site.html. A server for automatic inference of conserved regions in proteins and for projecting them onto the three dimensional structure is available through the ConSurf server (http://consurf.tau.ac.il/).
![]() |
Results |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
|
|
|
|
|
Rate Estimation When the Branch Lengths Are Unknown a Priori
When the branch lengths are unknown two alternatives exists. Either the branch lengths are first estimated and then site-specific rates are inferred (using either ML or EB-EXP) or the branch lengths and site-specific rates are estimated simultaneously using the ML-RICH method (see Materials and Methods). We tested the accuracy of site-specific rate prediction using these three alternatives (ML, EB-EXP, and ML-RICH). Figure 4b presents our results for trees with different number of sequences. As expected, EB-EXP was superior to both ML methods, with ML better than ML-RICH in all cases studied. However, the differences between the methods diminished as the number of sequences increased, with the three methods reaching almost the same level of accuracy for 30 sequences.
Case Study
Will the differences between the various inference methods be noticeable when analyzing real data sets? To address this question we examined the evolutionary conservation pattern of the Bcl-2 protein family. This protein family plays a central role in the regulation of apoptotic cell death (Adams and Cory 1998). The family is divided into two subfamilies: anti-apoptotic and pro-apoptotic. All family members possess at least one of four conserved sequence motifs, known as Bcl-2 homology (BH) domains (BH1-BH4). Here we focus on the Bcl-xL protein, for which the structure is known. Bcl-xL contains all four BH domains, whereas distantly related proteins that promote apoptosis posses only BH3. The BH1, BH2, and BH3 domains strongly influence homo- and hetero-dimerization of Bcl-xL. BH4 has been shown to be essential for Bcl-xL to prevent apoptotic mitochondrial changes (Shimizu et al. 2000).
Homologous sequences were obtained from the SwissProt database (www.expasy.org/sprot/). Since only five orthologous sequences were obtained, we supplemented the alignment with 26 paralogous sequences. An MSA of these homologs was built using ClustalW (Thompson, Higgins, and Gibson 1994). We call this data set BCL-BIG. A smaller MSA consisting of the 14 closest homologs of Bcl-xL was also constructed. This set only includes representatives from the anti-apoptotic family. We call this data set BCL-SMALL. For both data sets, an NJ tree was inferred using pairwise distances estimated by ML. Branch lengths in the resulting tree were then optimized using ML. The trees and the MSAs were given as input to the EB-EXP and the ML rate-inference methods. The inferred rates were then projected onto the three dimensional structure of a complex between Bcl-xL and a Bak BH3 fragment (PDB ID: 1bxl; Sattler et al. 1997). In this step, the continuous evolutionary rates were partitioned into a discrete scale of 9 bins. The range of each bin varied such that each one contained 1/9 of the positions. Bin 9 contained the most conserved positions and bin 1 contained the most variable positions.
The conservation pattern obtained by both EB-EXP and ML using the BCL-BIG set of homologs yielded two main surface patches of conserved residues (fig. 8a and b). The first patch corresponds to a hydrophobic groove, formed by residues from the BH1, BH2, and BH3 regions. This patch is the binding site for the Bak peptide. The conservation pattern obtained by EB-EXP is slightly more pronounced than the patch obtained by ML. The second conserved patch corresponds to the BH4 domain. Empirical evidence suggests that BH4 prevents apoptosis (Huang, Adams, and Cory 1998). However, this region is missing in more distant family members that also promote cell survival. In addition, no single residue in BH4 appeared to be essential for its function (Huang, Adams, and Cory 1998). EB-EXP graded the whole BH4 region as less conserved compared to ML.
|
![]() |
Discussion |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
One basic assumption in this study was that the rate at each site is constant during evolution. However, one might also try to find sites that are conserved in one part of the tree but are variable in the other. Such rate shifts may indicate change in the selection intensity at specific sites during evolution (reviewed in Gaucher et al. 2002). Rate shifts can also be inferred using an empirical Bayesian approach (Susko et al. 2002; Blouin, Boucher, and Roger 2003) or by using ML (Knudsen and Miyamoto 2001; Pupko and Galtier 2002). In our simulations we assumed that the tree topology is known a priori. In cases where this is not the case, one might use the Markov chain Monte Carlo technique to take the uncertainty of the tree topology into account (Huelsenbeck et al. 2001). Bayesian methods in phylogeny were recently criticized by Suzuki, Glazko, and Nei (2002) in the context of overestimation of Bayesian support for internal nodes. In our case, however, we limited the Bayesian part to a Gamma prior over the evolutionary rates, which is not the case with Bayesian methods that aim at inferring phylogenies.
When using a discrete approximation to the Gamma distribution, as in EB-EXP, the number of discrete categories must be specified. Yang (1994, 1995) suggested that four rate categories are sufficient to provide an optimum or near-optimum fit by the model to the data and to provide a good approximation to the continuous Gamma distribution. Our results showed that four categories are insufficient. For example, when four rate categories are used, 12.5% at each end of the distribution is not taken into account, i.e., 25% of the area below the rate distribution curve is ignored. Consequently, very high or very low substitution rates cannot be observed. This is unfortunate, since these are exactly the rates we seek to identify when predicting functionally important sites. We note, however, that Yang's (1994) emphasis was either phylogenetic tree reconstruction or estimating the shape of the Gamma distribution, which may not change dramatically with the number of categories. In contrast, here we were interested in the rates themselves. The discrete Gamma method with eight categories was recently used by Susko et al. (2002) to infer rate shifts in different subtrees and by Excoffier and Yang (1999), Meyer, Weiss, and von Haeseler (1999), and Misof et al. (2002) to infer substitution rates per site. In light of our findings, choosing 16 categories instead of eight may improve the results.
The simulation results showed that EB-EXP performs better than ML. Since both methods use the same likelihood function in their computations, the differences between EB-EXP and ML must be due to the incorporation of the prior distribution, which reduces the posterior probability of extreme unfavorable observed rates in EB-EXP. It can be claimed that the superiority of the Bayesian approach depends on how well the prior function fits the data. An empirical Bayesian approach is used here, in which the parameter of the prior Gamma function is inferred from the data. This gives more flexibility for the prior to fit the data.
There is another difference between the EB-EXP and ML methods. In EB-EXP, the inferred rate is the expectation over the posterior rate distribution (Yang and Wang 1995; Excoffier and Yang 1999; Susko et al. 2002), whereas the ML estimate is the rate that maximizes the likelihood function. A second Bayesian method, EB-MAP, is possible, in which the rate yielding the maximum a posteriori probability is chosen (i.e., rmap = argmaxrP(r | data, T)). One advantage of EB-MAP over EB-EXP is that there is no need to use a discrete approximation to the continuous Gamma distribution. This can be done by a maximization procedure directly on the continuous posterior distribution. However, taking the expectation of the continuous Gamma distribution is known to be asymptotically more accurate than EB-MAP when the accuracy is measured by a sum-of-square error function. Thus, the advantage of using EB-MAP is that there is no need to approximate the Gamma distribution, while the advantage of EB-EXP is that without the approximation it should be asymptotically more accurate. Simulation results obtained with EB-MAP were very similar, though slightly inferior, to those obtained with EB-EXP (data not shown). We chose EB-EXP for this study because in this method it is easier to obtain not only a point estimate but also its credibility interval (Susko et al. 2002). We note that a common way to infer site-specific rates (e.g., Meyer, Weiss, and von Haeseler 1999) is to choose the discrete rate category that contributes the most to the posterior distribution. This is not a real "Map" estimate: because the prior probability of each category is identical, this would in fact be a discrete version of the ML approach.
We note that in our simulation the accuracy of inference is overestimated, since we rarely know the true tree as was set up in the simulation. In addition, the substitution model used for the simulation is the same as the one used for inference, which is most certainly not the case for real data sets. Nonetheless, this discrepancy is the same for all inference methods, so our conclusions regarding the relative efficiency of the two inference methods should still hold. This uncertainty in the estimation of tree topology, branch lengths, and evolutionary model also results in underestimated credibility intervals obtained for EB-EXP.
We demonstrated that regardless of the inference method employed, accuracy of prediction depends strongly on the amount of data, i.e., the number of sequences in the MSA. We further showed that the degree of similarity in these sequences, represented by branch lengths in the phylogenetic tree, also affects results. A decrease in prediction success was observed when the branch lengths were extremely short. In these cases the number of amino acid replacements was too small to allow reliable rate inferences. For EB-EXP, when branch lengths are very large, multiple replacements at a site might obscure the history of a character, resulting in a reduced accuracy. As ML tends to adopt extreme rates and MSE scores are highly sensitive to extreme rate values, a peculiar behavior for highly diverged sequences was observed in ML.
The shape of the rate distribution influences rate inference accuracy. Meyer and von Haeseler (2003) recently presented an ML variant that identifies site-specific substitution rates. In their simulation study that included different model trees, a decrease in accuracy was observed with increasing values, which is in disagreement with our results. The discrepancy can be explained by the different approaches used to infer accuracy. While MSE was used in our study, Meyer and von Haeseler (2003) used the correlation coefficient between the inferred and simulated rates. To illustrate why these two criteria for accuracy may yield different results, consider two sites evolving at relative rates of 1.02 and 0.98, respectively. If the inferred rates are 1.0 and 1.01, respectively, the inferred rates are very close to the true values but they are in the wrong order. MSE measures the deviation of the inferred rate from its true value for each site independently from the other sites. The correlation coefficient, however, measures to what extent the inferred and simulated rates vary together. Thus, when the rates are nearly homogenous (i.e., high
values), rates with extreme values are rare and the inference is more accurate (low MSE). Correlation coefficients, however, are expected to be relatively low.
Another shortcoming of the ML method is that its point estimates tend to adopt extreme values when the amount of data drops below a critical threshold (Lewis 2001). Thus, when the data are scarce, as was the case when rates were inferred from less than 12 sequences, ML resulted in very rough estimates (MSE = 2.92 and 2.0 for six and 12 sequences, respectively, compared with 0.51 and 0.32, respectively, for EB-EXP). Figure 9a and b show scatter plots of inferred rates obtained using the ML and EB-EXP methods versus the simulated values. Whereas several extreme values were observed using the ML method (fig. 9a), the inferred rates of the EB-EXP method were clustered close to the y = x line (fig. 9b).
|
One of the main difficulties in calculating site-specific conservation scores is to distinguish between amino acid sites that are conserved due to their functionality and those that appear to be conserved due to insufficient time since divergence. EB-EXP appears to differentiate better between these two cases. ML calculates only the most probable rate, which may be misleading when little data are available. Looking at the Bcl-xL example, the arginines in positions 6 and 139 provide an illustration. Both positions are fully conserved, yet while Arg139 is present in all 32 homologs, Arg6 appears in only 11 of them. ML could not discriminate between these two positions and assigned both the highest conservation score. In contrast, EB-EXP rated Arg139 as the most conserved position. Indeed, mutating Arg139 to glutamine in Bcl-xL has been shown to inhibit its anti-apoptotic function (Sattler et al. 1997). Arg6, on the other hand, was only the 29th conserved position (out of 169) when graded by EB-EXP, as it was missing in 21 homologs. This result is congruent with experiments: mutating Arg6 to alanine, in Bcl-xL's close homolog Bcl-2, did not diminish the protein activity (Huang, Adams, and Cory 1998).
The distinction between the Bayesian and ML analyses was reinforced when using limited data, as was the case with BCL-SMALL. Whereas the conservation pattern using EB-EXP was a bit more scattered than in the complete analysis (fig. 8c as compared to 8a), ML graded a vast number of positions as extremely conserved (fig. 8d). As a consequence, the conserved patch expands far beyond the Bak binding groove.
A robust evolutionary analysis can provide means for the identification of patches of conserved residues on the protein surface, which are essential for the protein's function. The bottleneck for the in silico identification of these functional patches appears to be the availability of sequence data (Bell and Ben-Tal 2003). Too little variation in the MSA caused by too few sequences or too little diversity among them can render evolutionary analysis meaningless (Thornton et al. 2000). Ten available homologous proteins appear to be the sensitivity threshold when using ML (Bell and Ben-Tal 2003). Our study implies that these are exactly the conditions where EB-EXP is distinctly better than ML.
![]() |
Acknowledgements |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
![]() |
Footnotes |
---|
![]() |
Literature Cited |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
Adams, J. M., and S. Cory. 1998. The Bcl-2 protein family: arbiters of cell survival. Science 281:1322-1326.
Bell, R. E., and N. Ben-Tal. 2003. In silico identification of functional protein interfaces. Comp. Funct. Genom. 4:420-423.[CrossRef][ISI]
Blouin, C., Y. Boucher, and A. J. Roger. 2003. Inferring functional constraints and divergence in protein families using 3D mapping of phylogenetic information. Nucleic Acids Res. 31:790-797.
del Sol Mesa, A., F. Pazos, and A. Valencia. 2003. Automatic methods for predicting functionally important residues. J. Mol. Biol. 326:1289-1302.[CrossRef][ISI][Medline]
Donaudy, F., A. Ferrara, L. Esposito, R. Hertzano, O. Ben-David, R. E. Bell, S. Melchionda, L. Zelante, K. B. Avraham, and P. Gasparini. 2003. Multiple mutations of MYO1A, a cochlear-expressed gene, in sensorineural hearing loss. Am. J. Hum. Genet. 72:1571-1577.[CrossRef][ISI][Medline]
Excoffier, L., and Z. Yang. 1999. Substitution rate variation among sites in mitochondrial hypervariable region I of humans and chimpanzees. Mol. Biol. Evol. 16:1357-1368.[Abstract]
Felsenstein, J. 1981. Evolutionary trees from DNA sequences: a maximum likelihood approach. J. Mol. Evol. 17:368-376.[ISI][Medline]
Felsenstein, J. 2001. Taking variation of evolutionary rates between sites into account in inferring phylogenies. J. Mol. Evol. 53:447-455.[CrossRef][ISI][Medline]
Gaucher, E. A., X. Gu, M. M. Miyamoto, and S. A. Benner. 2002. Predicting functional divergence in protein evolution by site-specific rate shifts. Trends Biochem. Sci. 27:315-321.[CrossRef][ISI][Medline]
Glaser, F., T. Pupko, I. Paz, R. E. Bell, D. Bechor-Shental, E. Martz, and N. Ben-Tal. 2003. ConSurf: Identification of functional regions in proteins by surface-mapping of phylogenetic information. Bioinformatics 19:163-164.
Huang, D. C., J. M. Adams, and S. Cory. 1998. The conserved N-terminal BH4 domain of Bcl-2 homologues is essential for inhibition of apoptosis and interaction with CED-4. Embo. J. 17:1029-1039.
Huelsenbeck, J. P., F. Ronquist, R. Nielsen, and J. P. Bollback. 2001. Bayesian inference of phylogeny and its impact on evolutionary biology. Science 294:2310-2314.
Jin, L., and M. Nei. 1990. Limitations of the evolutionary parsimony method of phylogenetic analysis. Mol. Biol. Evol. 7:82-102.[Abstract]
Jones, D. T., W. R. Taylor, and J. M. Thornton. 1992. The rapid generation of mutation data matrices from protein sequences. Comput. Appl. Biosci. 8:275-282.[Abstract]
Kimura, M. 1983. The neutral theory of molecular evolution. Pp. 208233 in M. Nei and R. Koehn, eds. Evolution of Genes and Proteins. Sinauer Associates, Sunderland, Mass.
Knudsen, B., and M. M. Miyamoto. 2001. A likelihood ratio test for evolutionary rate shifts and functional divergence among proteins. Proc. Natl. Acad. Sci. USA 98:14512-14517.
Leonard, T., and J. S. J. Hsu. 1999. Bayesian methods: an analysis for statisticians and interdisciplinary researchers. Cambridge University Press, Cambridge.
Lewis, P. O. 2001. A likelihood approach to estimating phylogeny from discrete morphological character data. Syst. Biol. 50:913-925.[CrossRef][ISI][Medline]
Lichtarge, O., and M. E. Sowa. 2002. Evolutionary predictions of binding surfaces and interactions. Curr. Opin. Struct. Biol. 12:21-27.[CrossRef][ISI][Medline]
Mella, M., G. Colotti, C. Zamparelli, D. Verzili, A. Ilari, and E. Chiancone. 2003. Information transfer in the penta-EF-hand protein sorcin does not operate via the canonical structural/functional pairing. A study with site-specific mutants. J. Biol. Chem. 278:24921-24928.
Meyer, S., and A. von Haeseler. 2003. Identifying site-specific substitution rates. Mol. Biol. Evol. 20:182-189.
Meyer, S., G. Weiss, and A. von Haeseler. 1999. Pattern of nucleotide substitution and rate heterogeneity in the hypervariable regions I and II of human mtDNA. Genetics 152:1103-1110.
Misof, B., C. L. Anderson, T. R. Buckley, D. Erpenbeck, A. Rickert, and K. Misof. 2002. An empirical analysis of mt 16S rRNA covarion-like evolution in insects: Site-specific rate variation is clustered and frequently detected. J. Mol. Evol. 55:460-469.[CrossRef][ISI][Medline]
Nielsen R. 1997. Site-by-site estimation of the rate of substitution and the correlation of rates in mitochondrial DNA. Syst. Biol. 46:346-353.[ISI][Medline]
Pupko, T., R. E. Bell, I. Mayrose, F. Glaser, and N. Ben-Tal. 2002. Rate4Site: an algorithmic tool for the identification of functional regions in proteins by surface mapping of evolutionary determinants within their homologues. Bioinformatics 18:S71-S77.
Pupko, T., and N. Galtier. 2002. A covarion-based method for detecting molecular adaptation: application to the evolution of primate mitochondrial genomes. Proc. R. Soc. Lond. B Biol. Sci. 269:1313-1316.[CrossRef][ISI][Medline]
Ramelot, T. A., S. Ni, S. Goldsmith-Fischman, J. R. Cort, B. Honig, and M. A. Kennedy. 2003. Solution structure of Vibrio cholerae protein VC0424: A variation of the ferredoxin-like fold. Protein Sci. 12:1556-1561.
RamShankar, M., S. Girirajan, O. Dagan, H. M. Ravi Shankar, R. Jalvi, R. Rangasayee, K. B. Avraham, and A. Anand. 2003. Contribution of connexin26 (GJB2) mutations and founder effect to non-syndromic hearing loss in India. J. Med. Genet. 40:E68.[CrossRef][Medline]
Saitou, N., and M. Nei. 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406-425.[Abstract]
Sattler, M., H. Liang, and D. Nettesheim, et al. (12 co-authors). 1997. Structure of Bcl-xL-Bak peptide complex: recognition between regulators of apoptosis. Science 275:983-986.
Shimizu, S., A. Konishi, T. Kodama, and Y. Tsujimoto. 2000. BH4 domain of antiapoptotic Bcl-2 family members closes voltage-dependent anion channel and inhibits apoptotic mitochondrial changes and cell death. Proc. Natl. Acad. Sci. USA 97:3100-3105.
Sullivan, J., K. E. Holsinger, and C. Simon. 1996. The effect of topology on estimates of among-site rate variation. J. Mol. Evol. 42:308-312.[ISI][Medline]
Susko, E., Y. Inagaki, C. Field, M. E. Holder, and A. J. Roger. 2002. Testing for differences in rates-across-sites distributions in phylogenetic subtrees. Mol. Biol. Evol. 19:1514-1523.
Suzuki, Y, G. V. Glazko, and M. Nei. 2002. Overcredibility of molecular phylogenies obtained by Bayesian phylogenetics. Proc. Natl. Acad. Sci. USA 99:16138-16143.
Swofford, D. L., G. J. Olsen, P. J. Waddell, and D. M. Hillis. 1996. Phylogenetic inference. Pp. 407514 in C. M. D. M. Hillis and B. K. Mable, eds. Molecular systematics. Sinauer Associates, Sunderland, Mass.
Thompson, J. D., D. G. Higgins., and T. J. Gibson. 1994. ClustalW: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673-4680.[Abstract]
Thornton, J. M., A. E. Todd, D. Milburn, N. Borkakoti, and C. A. Orengo. 2000. From structure to function: approaches and limitations. Nat. Struct. Biol. 7:991-994.[CrossRef][Medline]
Valdar, W. S. 2002. Scoring residue conservation. Proteins 48:227-241.[CrossRef][ISI][Medline]
Whelan, S., P. Lio, and N. Goldman. 2001. Molecular phylogenetics: state-of-the-art methods for looking into the past. Trends Genet. 17:262-272.[CrossRef][ISI][Medline]
Yang, Z. 1993. Maximum-likelihood estimation of phylogeny from DNA sequences when substitution rates differ over sites. Mol. Biol. Evol. 10:1396-1401.[Abstract]
Yang, Z. 1994. Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods. J. Mol. Evol. 39:306-314.[ISI][Medline]
Yang, Z. 1995. A space-time process model for the evolution of DNA sequences. Genetics 139:993-1005.
Yang, Z. 1996. Among-site rate variation and its impact on phylogenetic analyses. Trends Ecol. Evol. 11:367-372.[CrossRef][ISI]
Yang, Z., and T. Wang. 1995. Mixed model analysis of DNA sequence evolution. Biometrics 51:552-561.[ISI][Medline]
Yao, H., D. M. Kristensen, I. Mihalek, M. E. Sowa, C. Shaw, M. Kimmel, L. Kawraki, and O. Lichtarge. 2003. An accurate, sensitive, and scalable method to identify functional sites in protein structures. J. Mol. Biol. 326:255-261.[CrossRef][ISI][Medline]