Department of Zoology/Genetics, Center for Bioinformatics and Biological Statistics, Iowa State University
![]() |
Abstract |
---|
![]() |
Introduction |
---|
Since gene family proliferation is thought to have provided the raw materials for functional innovations, it is desirable, from sequence analysis, to identify amino acid sites that are responsible for the functional diversity. This approach has great potential for functional genomics because it is cost-effective, and these predictions can be further tested by experimentation. For molecular evolutionists, it is important to know the level of functional divergence after gene (or genome) duplication, as well as how many amino acid substitutions are actually involved in functional innovations. Since most amino acid changes are not related to functional divergence but represent neutral evolution, it is crucial to develop appropriate statistical methods to distinguish between these two possibilities. Indeed, some conventional computational approaches, such as homologous search and multiple alignment, may not be sufficient to solve this problem, because amino acid differences between two duplicate gene copies can be the result of either an ancient gene duplication or a more recent rapid divergence (Golding and Dean 1998
).
It has been recognized that when sequences of a gene family are available, the identification of functionally important residues can be approached computationally (e.g., Casari, Sander, and Valencia 1995
; Lichtarge, Bourne, and Cohen 1996
; Livingstone and Barton 1996
; Gu 1999
; Landgraf, Fischer, and Eisenberg 1999
). The approach introduced by Casari, Sander, and Valencia (1995)
used a vectorial analysis of sequence profiles to identify functionally important residues. Lichtarge, Bourne, and Cohen (1996)
developed a method called evolutionary tracing, which was extended recently by Landgraf, Fischer, and Eisenberg (1999)
, known as weighted evolutionary tracing. In these methods, the degree of conservation in each position is scored for different subfamilies and then visualized on the three-dimensional protein structure.
Gu (1999)
has developed a novel probabilistic model based on the underlying principle that functional divergence after gene duplication is highly correlated with the change of evolutionary rate. This correlation is the complement to a fundamental rule in molecular evolutionfunctional importance is highly correlated with evolutionary conservation (Kimura 1983
). A site-specific profile based on posterior probability was then developed to predict critical residues for functional differences between two gene clusters. Wang and Gu (unpublished data) have successfully applied this newly developed method to the study of the functional diversity of the caspase gene family and found that our predictions are supported by experimental data.
In this paper, the modeling for functional divergence after gene duplication was studied extensively under the Markov chain model of sequence evolution (Felsenstein 1981
), which is further extended to the case of a large family with many member genes. According to the observed alignment pattern (amino acid configuration), we studied two important types of functional divergence (type I and type II, respectively). We show that Gu's (1999)
method is a fast algorithm for two gene clusters. The performance of these methods was compared by examples.
![]() |
Functional Divergence After Gene Duplication |
---|
|
After gene duplication, functional divergence between homologous genes 1 and 2 is likely to occur in the early stage (fig. 2 ). According to amino acid configurations that are likely to be generated, there are two basic types of functional divergence after gene duplication. Type I functional divergence results in altered functional constraints (i.e., different evolutionary rates) between duplicate genes. Type II functional divergence results in no altered functional constraints but a radical change in amino acid properties between them (e.g., charge, hydrophobicity, etc.). Intuitively, one may expect that type I (or type II) amino acid configurations are likely to be generated by type I (or type II) functional divergence. This might be true only when the effect of type I (or type II) functional divergence has been shown to be statistically significant under a stochastic model. Thus, the possibility of a site being functional divergencerelated (type I or type II) can be measured by a posterior probability when the observed amino acid configuration is given.
|
![]() |
Type I Functional Divergence: Two Clusters |
---|
Unrooted Likelihood for Each Cluster
Under the Markov chain model, the likelihood for sequence evolution can be derived as follows (Felsenstein 1981
; Kishino, Miyata, and Hasegawa 1990
). First, the transition probability matrix for a given time period t can be computed as P = e
Rt, where the rate matrix R represents the pattern of amino acid substitutions, which can be empirically determined by, for example, the Dayhoff model (Dayhoff, Schwartz, and Orcutt 1978
) or the JTT model (Jones, Taylor, and Thornton 1992). The evolutionary rate (
) may vary among sites because of different functional constraints. Usually,
is treated as a random variable which follows a gamma distribution, i.e.,
| (1) |
Consider the phylogenetic tree in figure 2
. Let X = (x1, x2, x3, x4) and Y = (y1, y2, y3, y4) be the observed amino acid configurations of a site for clusters 1 and 2, respectively. For the unrooted subtree for cluster 1 or 2 (fig. 2A
), the conditional probability of observing X or Y at a site can be written as follows:
| (2) |
| (3) |
Two-State Model for Functional Divergence
For two gene clusters generated by gene duplication, the two-state model assumes that in each cluster one site has two possible states, F0 (functional divergenceunrelated) and F1 (functional divergencerelated). As a result, there are four combined states, i.e., (F0, F0), (F0, F1), (F1, F0), and (F1, F1). These states are also called functional divergence configurations, where the first position is for cluster 1 and the second is for cluster 2. When a site is under (F0, F0), i.e., no altered functional constraints in both clusters, the evolutionary rate at this site is virtually the same between two clusters, i.e., 1 =
2. For the last three combined states, however, the amino acid residue has experienced altered functional constraints (i.e., under F1) at least in one cluster, resulting in statistical independence between
1 and
2 (Gu 1999
).
The assumption of rate-independence for type I functional divergence means that knowledge of the evolutionary rate (or the functional constraint) at such sites in one cluster provides no information for predicting the intensity of functional constraint in the other cluster. Since 1 and
2 are independent under (F0, F1), (F1, F0), or (F1, F1), these combined states are not distinguishable under the current model; they have to be degenerated to a single one. Consequently, there are two nondegenerate combined states (functional divergence configurations), denoted by S0 = (F0, F0) and S1 = (F0, F1)
(F1, F0)
(F1, F1), respectively. It should be noted that (F0, F0) was written as F0, and S1 was written as F1 (Gu 1999
). To avoid further confusion, the F-notation describes the status in a single cluster, while S-notation is used for the functional divergence configuration of a gene family.
The Subtree Likelihood
Let P(S1) = 12 be the probability of a site being in state S1, and let P(S0) = 1 -
12 be the probability of a site being in state S0. We call
12 the coefficient of type I functional divergence between cluster 1 and cluster 2 (Gu 1999
). Let X and Y be the amino acid configurations of a site in clusters 1 and 2, respectively. Our purpose is to build a likelihood function for estimating
12 from sequences. Gu (1999)
has shown that the subtree likelihood provides a simple solution for this purpose. Since it only depends on the (unrooted) subtrees of two clusters, the joint probability can be easily derived based on the pattern of rate independence. In the following, we use an asterisk to distinguish the subtree likelihood from the conventional (whole-tree) likelihood, e.g., the joint probability of two subtrees is denoted by p*(X, Y).
Since evolutionary rates (1 and
2) at an S1 site (i.e., a site under S1) are statistically independent between two clusters, whereas they are completely correlated (
1 =
2 =
, without loss of generality) at an S0 site, the joint probability of subtrees conditional on S0 or S1 is given by
| (4) |
It should be noted that equation (4) actually assumes that the prior frequencies of amino acids at a site are independent between clusters X and Y, which means that the internal branch length is large. As shown in appendix A, the subtree likelihood (denoted by f0) can also be built if this prior is identical (i.e., the internal branch length is small). Interestingly, we have shown that the whole-tree likelihood can be expressed as a linear combination of f* and f0 (see eq. A.4 ). Here, we use f* as an example. Intuitively, it usually gives more conservative results than f0.
From the two-state model, one can easily show that the joint probability of two subtrees can be written as
| (5) |
| (6) |
Numerical Algorithm
It is complicated to compute p*(X, Y); it involves the phylogenetic tree, branch lengths (v), the shape parameter () of a gamma distribution, and the coefficient of functional divergence (
12). Usually, the integration requires a great deal of computational time even for a small number of sequences (Gu, Fu, and Li 1995
). We propose the following iteration to solve this problem:
| (7) |
After obtaining these ML estimates, the likelihood ratio test (LRT) can be constructed under the null hypothesis H0: 12 = 0 versus HA:
12 > 0. If H0 is rejected significantly, it provides statistical evidence for functional divergence in the coding region after gene duplication; i.e., functional constraints have shifted between two homologous genes.
![]() |
Type I Functional Divergence: Multiple Clusters |
---|
Subtree Likelihood for Multiple Clusters
Let i and Xi be the evolutionary rate and amino acid configuration of a site in cluster i, respectively; i = 1, ..., n, and X = (X1, ..., Xn). For two possible states (F0/F1) in each gene cluster, we have, in total, 2n possible combined states (functional divergence configurations).
For three gene clusters (n = 3), all functional divergence configurations (23 = 8) are listed as follows: (F0, F0, F0), (F0, F0, F1), (F0, F1, F0), (F1, F0, F0), (F0, F1, F1), (F1, F0, F1), (F1, F1, F0), and (F1, F1, F1), where the first, second, and third positions refer to the F0/F1 status of gene clusters 1, 2, and 3, respectively. For each of them, the relationship of evolutionary rates (1,
2, and
3) among three clusters is shown in table 1
. If a site is under (F0, F0, F0),
1,
2, and
3 are completely correlated so that one can assume
1 =
2 =
3 (=
) without loss of generality. Similarly, under (F0, F0, F1),
1 =
2 (=
), which is independent of
3, and so forth. However, the last four combined states, (F0, F1, F1), (F1, F0, F1), (F1, F1, F0), and (F1, F1, F1), have to be degenerated to a single combined state (S4) because
1,
2, and
3 are mutually independent (table 1
). Note that there are m = 2n - n nondegenerate combined states (functional divergence configurations) in the case of n clusters, which are denoted by Sj, j = 0, ..., m - 1. In particular, S0 indicates the functional divergence configuration that all clusters are under F0, i.e., S0 = (F0, F0, ..., F0).
|
| (8) |
Let j be the (prior) probability of a site under Sj, i.e.,
j = P(Sj). Thus, the joint probability of three subtrees at a site is given by
| (9) |
| (10) |
Numerical Algorithm
Similar to equation (6)
, unknown parameters can be estimated by maximizing the likelihood L* = k p*(X(k)), which can be approached by extending the algorithm for two clusters, i.e., equation (7) is replaced by the following procedure. When the number of gene clusters
(n) is not very large, the Newton-Raphson algorithm is
computationally efficient. Let
be the parameter vector,
=
(
1, ...,
m-1). The
iteration equation is then given by
(l+1)
=
(l) -
H-1g, where g is the grade vector,
whose ith element is gi =
ln
L*/
i, and H is the
Haesson matrix, whose ijth element is Hij
=
2 ln
L*/
i
j.
When appropriate initial values are given,
(l)
would converge to
, as l
. Finally, their large sample variance-covariance matrix can be approximately estimated by the inverse of Fisher's information matrix.
When n is large, an expectation and maximization (EM) algorithm can be implemented (Dempster, Laird, and Rubin 1977
). The EM method is a very general iterative approach for the data set with missing (or incomplete) data. In our case, the ML estimates of
i will be easy to obtain if we know the state (F0/F1) to which each site belongs in each gene cluster. Thus, the original data set is treated as incomplete data, missing the category information. Using a current estimate of the unknown parameter values, the expected value of the incomplete data is computed, weighted by the posterior probability. This is the expectation [E] step. The result is a set of likelihood equations that are considerably easier to solve than the full likelihood (the maximization [M] step). The new estimates obtained from the M step are then used to update the expected values, and this approach is iterated until convergence. A detailed discussion of implementing an EM algorithm for a large gene family will be published elsewhere.
LRTs can be constructed under various null hypotheses by specifying some coefficients of functional divergence. In particular, the LRT under the null H0: 0 = 1 (or
f = 0) is apparently the most powerful test. Significant rejection of the null hypothesis suggests functional divergence among member genes of a gene family.
![]() |
Type I and II Functional Divergences: Two Gene Clusters |
---|
In the early stage after gene duplication, many evolutionary forces (e.g., positive selection, functional relaxation, or coevolution between contact sites) might play roles in amino acid substitutions such that a comprehensive modeling could be complicated. A simple solution is to consider the internal branch between two clusters (i.e., the early stage) as cluster 0 (fig. 2B
), which is ancestral. Let 1 and
2 be the evolutionary rates in clusters 1 and 2, respectively, and let
0 be the evolutionary rate in the internal branch (cluster 0). For each cluster, a given site has two possible states, F0 (functional divergenceunrelated) and F1 (functional divergencerelated). Therefore, similar to the subtree likelihood of three clusters, we have 23 = 8 possible combined states that can be degenerated to 5 functional divergence configurations, under which the relationship between
0,
1, and
2 is shown in table 2
. Let
j (j = 0, 1, ..., 4) be the probability of a site being under Sj, i.e.,
j = P(Sj). For a given site, the conditional probability of observing X and Y is given by
| (11) |
| (12) |
| (13) |
| (14) |
|
![]() | (15) |
![]() |
Predicting Critical Residues for Functional Divergence |
---|
Type I Functional Divergence Predicted from the Subtree Likelihood
For the simple two-cluster case, there are only two nondegenerate states: S0 and S1. We wish to know the probability of S1 for a given site when the amino acid configuration (X, Y) is observed, i.e., P*(S1 | X, Y). The prior probability of S1 is P(S1) = 1 =
12. According to the Bayesian law, we have
| (16) |
Next, we consider the case of multiple clusters. Similar to equation (16)
, when the amino acid configuration (X) at a site is given, the posterior probability of each functional divergence configuration Si can be generally expressed as follows:
| (17) |
For a large gene family, with little knowledge about its functional diversity, a site-specific measure for the overall level of type I functional divergence at each site is useful. Since the coefficient of overall functional divergence of a gene family is defined as f = 1 -
0, gwhere
0 = P(S0) for S0 = (F0, ..., F0), it is natural to define such a measure as P*(F1 | X) = 1 - P*(S0 | X). According to equation (17)
, it is given by
| (18) |
Type I and Type II Functional Divergence
Based on the whole-tree likelihood for functional divergence, we can develop a site-specific profile for type I, as well as type II, functional divergence. In the case of two clusters, the posterior probability of each (nondegenerate) combined state Si (table 2
) can be computed as
| (19) |
| (20) |
![]() |
Case Study: COX (Cyclooxygenase) Gene Family |
---|
Figure 3
shows the phylogenetic tree of the COX gene family, inferred by the neighbor-joining method (Saitou and Nei 1987
). It is clear that these two isoforms were generated in the early stage of vertebrates (Zou et al. 1999
). Given this topology (fig. 3
), the ML estimate of the coefficient of type I functional divergence
between COX-1 and COX-2 is
12 = 0.44 ±
0.09, and that of the gamma shape parameter is
= 0.43 ± 0.05. We conducted the LRT for the null hypothesis H0:
12 = 0. The logarithm of likelihood under functional divergence is ln L = -7,312.70, and that under H0 (
12 = 0) is ln L = -7,326.51, resulting in
ln L = -7,312.70 - (-7,326.51) = 13.81. Thus, assuming that 2
ln L approximately follows a
2 distribution, we conclude that the null hypothesis (no functional divergence) can be rejected statistically (P < 0.01). Figure 4
shows the shape of the log of likelihood with respect to
12.
|
|
|
|
|
![]() |
Discussion |
---|
The Markov Chain Model for Amino Acid Substitutions
More complicated models for amino acid substitutions can be incorporated into our framework. For example, Bruno (1996)
proposed a likelihood that allows specification of the amino acid usage at a residue. Another interesting case to consider is that of the secondary structure of proteins (e.g., Thorne, Goldman, and Jones 1996
). The difficulty in implementing these models in practice concerns the computational complexity. In the case of the COX gene family (11 sequences, 538 residues) with two gene clusters, the current likelihood method takes about 15 min for MLE searching and posterior computation on a Pentium III PC. It can be expected that a more complex model will require more CPU time. We will investigate how much our estimation and prediction can be affected by the model selection, which can be performed with computer simulations.
Evolutionary rate and the pattern of amino acid substitutions are two fundamental components of an exploration of function/structure relationship from sequences (Lichtarge, Bourne, and Cohen 1996
; Livingstone and Barton 1996
; Landgraf, Fischer, and Eisenberg 1999
). The current model can be extended by assuming that the pattern of amino acid substitutions differs between states F1 and F0. For example, a typical Dayhoff- like model is used under state F0, while an equal-rates model is used under state F1 that allows more radical amino acid substitutions.
The Model for Functional Divergence
Gu (1999)
developed a simple method for estimating the coefficient of type I functional divergence between two gene clusters which is much faster than the current likelihood method. As shown by the example of the COX gene family, these methods give similar results. This conclusion has been verified by several other gene families (results not shown). Although extensive study is needed, it seems that estimation and prediction of functional divergence of a gene family is robust. Moreover, Gu (2000)
has studied a rate-component (di- gamma) model for functional divergence after gene duplication. The relationship between this model and the current two-state model is discussed in appendix B.
The two-state model for functional divergence can be modified by taking the universally conserved site (type 0 in fig. 1 ), at which the amino acid type is invariable among all sequences of the gene family, into account. These sites are important for maintaining the common function of the gene family and/or the backbone of the three-dimensional protein structure and apparently are not involved in functional diversity among duplicate genes. Thus, we may assign a permanent F0 state for a type 0 site in each cluster, such that the functional divergence configuration is always S0 = (F0, ..., F0).
The Subtree Versus the Whole-Tree Likelihood
We have developed two likelihood frameworks. As the whole-tree likelihood (eq. 14
) is built on the basis of the conventional approach (Felsenstein 1981
), the subtree likelihood (eq. 4
) neglects the evolutionary relationship among gene clusters. The subtree likelihood can be built for independent prior frequencies of amino acids (f*), which is suitable for large internal branch lengths, or for identical priors (f 0), which is suitable for short internal branch lengths (see appendix A for details). Although the estimation bias caused by these approximations has been shown to be small (unpublished data), further study is needed.
Interestingly, we show in equation (A.6)
that the whole-tree likelihood can be expressed as the average of
f* and f 0; the weighting function
is determined by the internal branch length. Our preliminary analysis in a
simple case has shown virtually no bias except for very skewed amino acid
frequencies (not shown). Nevertheless, a simple solution for the potential
bias correction is that the final estimate of is the average of
* and
0, i.e.,
= (1 - e-
)
* + e-
0, where
* and
0 are the estimates based on subtree likelihood f* and f 0, respectively, and
is the internal branch length. Another approach is to use ancestral sequence interference (e.g., Zhang and Nei 1997
). Of course, the subtree likelihoods cannot be used to infer the evolutionary relationship among clusters or locate the ancestral node of each cluster. Although the whole-tree likelihood is apparently superior, the subtree likelihood is useful in practice for the following reasons: (1) the computation is fast for multiple gene clusters; (2) it is still applicable when the phylogenetic relationship among gene clusters is unreliable; and (3) the statistical modeling is simple.
The Assumption of Site Independence
The assumption of site independence in our modeling can be criticized because it seems to be contradicted with the great deal of evidence that covariation of residues in evolution is related to structural or functional change (e.g., Pollock, Taylor, and Goldman 1999
). Here, we show that our method is actually the complement to the model of covariation, and a comprehensive analysis combining these two methods is important.
As indicated before, coevolution between sites is one important mechanism for functional divergence after gene duplication. If the evolutionary innovation occurred mainly in the early stage (e.g., cluster 0 in fig. 2 ), it is likely that protein evolution within each gene clusters (the late stage) may simply be under purifying selection, although functional constraints can be different between them. Since the subtree likelihood for type I functional divergence considers only gene clusters (late stages), the assumption of site independence seems reasonable.
For the whole-tree likelihood for estimating type I and type II functional divergences, modeling the mechanism of evolution in the early stage (cluster 0 in fig. 2
) is achievable, e.g., a site dependence model (Pollock, Taylor, and Goldman 1999
) or positive selection (Zhang, Rosenberg, and Nei 1998
). The difficulty concerns the statistical identifiability and the computational feasibility. While a simple model may provide a conservative test for type II functional divergence, we will address these problems in a future study.
Effects of Number of Sequences and Phylogeny
In order to obtain reliable estimates, we suggest that each gene cluster should contain at least four sequences with sufficient sequence diversity. It is not required to use the same set of species in each gene cluster. However, our preliminary result indicates that the estimation could be very biased in an extreme case of unbalanced trees. Moreover, we found that the accuracy of posterior prediction was satisfactory as long as the phylogeny of each cluster was largely correct.
In summary, the newly developed method, as well as previous work (Gu 1999
), provides a novel approach to exploring the pattern of functional divergence among member genes of a gene family. Predictions based on posterior analysis can be powerful and cost-effective in defining a group of amino acid residues that are responsible for these functional differences. Our method, combined with other approaches (e.g., positive selection analysis, coevolution of sites, ancestral inference, etc.), as well as substantial functional/structural information through high throughput technologies, is helpful in achieving a better understanding of functional diversity at the genome level.
![]() |
Appendix A |
---|
| (A1) |
| (A2) |
| (A3) |
| (A4) |
| (A5) |
| (A6) |
![]() |
Appendix B |
---|
Let 1 and
2 be the evolutionary rates of clusters 1 and 2, respectively. The rate-component model assumes that
1 and
2 can be formulated by the following linear equations:
| (1) |
Then, one can show that 1 (
2) is also gamma- distributed, with shape parameter
1 (or
2) and scale parameter ß1 (or ß2), respectively. Moreover,
1 =
0 +
1, and
2 =
0 +
2. We note that the model for rate variation among sites (e.g., Gu, Fu, and Li 1995
) is also a special case of the rate-component model when u1 = 0 and u2 = 0. Gu (2000)
showed that
1 and
2 jointly follow a di-gamma distribution and obtained the analytical form of the density
(
1,
2). In practice, however, it is more convenient to use the joint density of rate components, since they are independent, i.e.,
![]() | (2) |
| (3) |
The relationship between the two-state model and the three-component model can be described as follows. Equation (A.1)
implies that the covariance between 1 and
2 is proportional to the variance of u0, i.e., Cov(
1,
2) = Var(u0)/ß1ß2 =
20/ß1ß2. Since Var(
i) =
i/ß2i (i = 1, 2), one can verify that the coefficient of correlation between
1 and
2 can be written as
| (4) |
| (5) |
![]() |
Acknowledgements |
---|
![]() |
Footnotes |
---|
1 Keywords: gene family evolution
functional divergence
gene (genome) duplication
maximum likelihood
functional prediction
2 Address for correspondence and reprints: Xun Gu, Department of Zoology/Genetics, Center for Bioinformatics and Biological Statistics, 332 Science II Hall, Iowa State University, Ames, Iowa 50011. xgu{at}iastate.edu
![]() |
literature cited |
---|
Bork, P., and E. V. Koonin. 1998. Predicting functions from protein sequenceswhere are the bottlenecks? Nature Genetics 18:313318
Bruno, W. J. 1996. Modeling residue usage in aligned protein sequences via maximum likelihood. Mol. Biol. Evol. 13: 13681374
Casari, G., C. Sander, and A. Valencia. 1995. A method to predict functional residues in proteins. Struct. Biol. 2:171 178
Dayhoff, M. O., R. M. Schwartz, and B. C. Orcutt. 1978. A model of evolutionary change in proteins. Pp. 342352 in M. O. Dayhoff, ed. Atlas of Protein Sequence Structure. vol 5, Suppl. 3. National Biomedical Research Foundation, Washington, D.C
Dempster, A. P., N. M. Laird, and D. B. Rubin. 1977. Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. 39:138[ISI]
Felsenstein, J. 1981. Evolutionary trees from DNA sequences: a maximum likelihood approach. J. Mol. Evol. 17:368 376[ISI][Medline]
Golding, G. B., and A. M. Dean. 1998. The structural basis of molecular adaptation. Mol. Biol. Evol. 15:355369[Abstract]
Gu, X. 1999. Statistical methods for testing functional divergence after gene duplication. Mol. Biol. Evol. 16:1664 1674
Gu, X., Y. X. Fu, and W. H. Li. 1995. Maximum likelihood estimation of the heterogeneity of substitution rate among nucleotide sites. Mol. Biol. Evol. 12:546557[Abstract]
Gu, X., and J. Zhang. 1997. A simple method for estimating the parameter of substitution rate variation among sites. Mol. Biol. Evol. 14:11061113[Abstract]
Henikoff, S., E. A. Greene, S. Pietrokovski, P. Bork, T. K. Attwood, and L. Hood. 1997. Gene families: the taxonomy of protein paralogs and chimeras. Science 278:609 614
Holland, P. W. H., J. Garcia-Fernandez, N. A. Williams, and A. Sidow. 1994. Gene duplication and the origins of vertebrate development. Dev. Suppl. 1994:125133
Jones, D. T., W. R. Taylor, and J. M. Thornton. 1992. The rapid generation of mutation data matrices from protein sequences. Comput. Appl. Biosci. 8:275282[Abstract]
Kimura, M. 1983. The neutral theory of molecular evolution. Cambridge University Press, Cambridge, England
Kishino, H., T. Miyata, and Hasegawa. 1990. Maximum likelihood inference of protein phylogeny and the origin of chloroplasts. J. Mol. Evol. 31:151160[ISI]
Landgraf, R., D. Fischer, and D. Eisenberg. 1999. Analysis of heregulin symmetry by weighted evolutionary tracing. Protein Eng. 12:943951
Li, W. H. 1983. Evolution of duplicated genes. Pp. 1437 in M. Nei and R. K. Koehn, eds. Evolution of genes and proteins. Sinauer, Sunderland, Mass
Lichtarge, O., H. R. Bourne, and F. E. Cohen. 1996. An evolutionary trace method defines binding surfaces common to protein families. J. Mol. Biol. 257:342358[ISI][Medline]
Livingstone, C. D., and G. Barton. 1996. Identification of functional residues and secondary structure from protein sequence alignment. Methods Enzymol. 266:497512[ISI][Medline]
Lundin, L. G. 1993 Evolution of the vertebrate genome as reflected in paralogous chromosomal regions in man and the house mouse. Genomics 16:119
Ohno, S. 1970. Evolution by gene duplication. Springer-Verlag, Berlin
Pollock, D., W. R. Taylor, and N. Goldman. 1999. Coevolving protein residues: maximum likelihood identification and relationship to structure. J. Mol. Biol. 287:187 198[ISI][Medline]
Press, W. H., S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery. 1992. Numerical recipes in C. Cambridge University Press, Cambridge, England
Saitou, N., and M. Nei. 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406425[Abstract]
Spring, J. 1997. Vertebrate evolution by interspecific hybridisationare we polyploid? FEBS Lett. 400:28
Thorne, J. L., N. Goldman, D. T. Jones. 1996. Combining protein evolution and secondary structure. Mol. Biol. Evol. 13:666673[Abstract]
Uzzel, T., and K. W. Corbin. 1971. Fitting discrete probability distribution to evolutionary events. Science 172:1089 1096
Wallace, J. L. 1999. Distribution and expression of cyclooxygenase (COX) isoenzymes, their physiological roles, and the categorization of nonsteroidal antiinflammatory drugs. Am. J. Med. 107:11S16S
Williams, C. S., M. Mann, and R. N. DuBois. 1999. The role of cyclooxygenases in inflammation, cancer, and development. Oncogene 18:79087916
Wolfe, K. H., and D. C. Shields. 1997. Molecular evidence for an ancient duplication of the entire yeast genome. Nature 387:708713
Yang, Z. 1993. Maximum-likelihood estimation of phylogeny from DNA sequences when substitution rates differ over sites. Mol. Biol. Evol. 10:13961401[Abstract]
. 1994. Maximum likelihood estimation of phylogeny from DNA sequences when substitution rates differ over sites: approximate methods. J. Mol. Evol. 39:306314[ISI][Medline]
Zhang, J., and M. Nei. 1997. Accuracy of ancestral amino acid sequences inferred by the parsimony, likelihood and distance method. J. Mol. Evol. 44:S139S146
Zhang, J., H. F. Rosenberg, and M. Nei. 1998. Positive Darwinian selection after gene duplication in primate ribonuclease genes. Proc. Natl. Acad. Sci. USA 95:37083713
Zou, J., N. F. Neumann, J. W. Holland, M. Belosevic, C. Cunningham, C. J. Secombes, and A. F. Rowley. 1999. Fish macrophages express a cyclo-oxygenase-2 homologue after activation. Biochem. J. 340:153159[ISI][Medline]