Structural Constraints and Emergence of Sequence Patterns in Protein Evolution
Gustavo Parisi and Julián Echave
Universidad Nacional de Quilmes, Bernal, Argentina
Abstract
The aim of this work was to study the relationship between structure conservation and sequence divergence in protein evolution. To this end, we developed a model of structurally constrained protein evolution (SCPE) in which trial sequences, generated by random mutations at the gene level, are selected against departure from a reference three-dimensional structure. Since SCPE is completely unbiased at the mutational level, any emergent sequence pattern will be due exclusively to structural constraints. In this first report, we show that SCPE correctly predicts the characteristic hexapeptide motif of the left-handed parallel β helix (LβH) domain of UDP-N-acetylglucosamine acyltransferases (LpxA).
Introduction
Protein sequences diverge through amino acid replacements that have a mostly neutral effect on organism fitness (Kimura 1983; Perutz 1983). To be neutral, sequence variation should have little effect on the protein's function, and should therefore produce only small variations in, for example, active sites (Mirny and Shakhnovich 1999), folding transition states (Shakhnovich, Abkevich, and Ptitsyn 1996; Mirny and Shakhnovich 1999; Li, Mirny, and Shakhnovich 2000), and native state structures (Bajaj and Blundell 1984; Chothia and Lesk 1986; Flores et al. 1993; Wood and Pearson 1999). Structure conservation would explain why substitution patterns depend on factors such as amino acid physicochemical properties (Xia and Li 1998), local structural environment (Overington et al. 1990), and overall environmental constraints (Tourasse and Li 2000). It is known that taking structural effects into account can improve phylogenetic inference (Naylor and Brown 1997), and the latest models of protein evolution do so by using structure-dependent amino acid replacement rates (Koshi and Goldstein 1998; Liò and Goldman 1998). Despite the proven usefulness of such models, further insight into protein evolution requires new models in which structure-dependent substitution patterns are not set in advance but emerge naturally as a consequence of restraining structural variation. In this report, we present such a structurally constrained protein evolution (SCPE) model. The model is related to methods used to study the inverse protein folding problem (Babajide et al. 1997; Koehl and Levitt 1999a, 1999b).
Materials and Methods
The Model
In SCPE, trial sequences, generated by random mutations at the gene level, are selected against departure from a reference structure. The model is based on a sequence-structure distance score, Sdist, which depends on a reference native structure, and on a parameter, Sdiv, which measures the degree of structural divergence tolerated by natural selection. Sdist is calculated as follows. First, the trial sequence is forced to adopt the three-dimensional reference structure. Then, the mean-field energies per position, Etrial(p) and Eref(p), are calculated for the trial and reference sequences, respectively. Finally, the score is obtained as

$$S_{\mathrm{dist}} = \left\{ \sum_{p} \left[ E_{\mathrm{trial}}(p) - E_{\mathrm{ref}}(p) \right]^{2} \right\}^{1/2}.$$

To calculate the mean-field energies, we used the PROSA II potential (Sippl 1993), which includes additive pair contributions that depend on the amino acid types and on the geometric distance between the Cβ atoms of the interacting amino acids, as well as a surface term that models protein-solvent interactions.
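The score is a root-sum-of-squares, over structure positions, of per-position energy differences. The Python sketch below assumes that the per-position mean-field energies of the threaded trial sequence and of the reference sequence are already available as equal-length lists; evaluating the PROSA II potential itself is not reproduced, and the function name `s_dist` is hypothetical.

```python
import math


def s_dist(e_trial, e_ref):
    """Sdist = sqrt(sum_p [E_trial(p) - E_ref(p)]^2) over structure positions p.

    e_trial, e_ref: per-position mean-field energies (assumed precomputed,
    e.g. with a knowledge-based potential; not computed here).
    """
    if len(e_trial) != len(e_ref):
        raise ValueError("energy profiles must cover the same positions")
    return math.sqrt(sum((et - er) ** 2 for et, er in zip(e_trial, e_ref)))
```

A trial sequence would then pass the structural filter when `s_dist(e_trial, e_ref) < s_div`. Note that the reference sequence itself always scores 0.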
An SCPE simulation starts with a reference DNA sequence that codes for a reference protein of known three-dimensional structure. Each run then repeats evolutionary time steps, each consisting of the following four operations. First, the DNA sequence from the previous time step is mutated by introducing a random nucleotide substitution at a randomly chosen sequence position (Jukes-Cantor model). Second, if the mutation introduces a stop codon, the mutated DNA is rejected; otherwise, the mutated DNA is translated, using the genetic code, to obtain a trial protein sequence. Third, the sequence-structure distance score, Sdist, is computed. Finally, the trial sequence is accepted only if Sdist is below the specified cut-off, Sdiv, which represents the degree of structural divergence allowed by natural selection.
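The four operations of a time step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the reference DNA is taken to be a coding sequence whose length is a multiple of three and that carries no terminal stop codon; translation uses the standard genetic code; a rejected trial leaves the previous DNA unchanged; and `score` is a hypothetical stand-in for the PROSA II-based Sdist, passed in as a function of the trial protein. All function names are illustrative.

```python
import random

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, ..., GGG.
AMINO_BY_CODON = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_BY_CODON[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna):
    """Translate a coding sequence; '*' marks a stop codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def jc_mutate(dna, rng):
    """Jukes-Cantor step: random site, replaced by one of the other three bases."""
    pos = rng.randrange(len(dna))
    new_base = rng.choice([b for b in BASES if b != dna[pos]])
    return dna[:pos] + new_base + dna[pos + 1:]


def scpe_run(ref_dna, score, s_div, n_steps, seed=0):
    """Hypothetical SCPE loop; `score` stands in for the PROSA II-based Sdist."""
    rng = random.Random(seed)
    dna = ref_dna
    n_accepted = 0
    for _ in range(n_steps):
        trial_dna = jc_mutate(dna, rng)        # 1. random nucleotide substitution
        trial_protein = translate(trial_dna)   # 2. translate ...
        if "*" in trial_protein:               #    ... rejecting stop codons
            continue
        if score(trial_protein) < s_div:       # 3-4. compute score, apply cut-off
            dna = trial_dna
            n_accepted += 1
        # otherwise the previous DNA is kept for the next time step
    return dna, n_accepted
```

As a usage example, a toy score that counts amino acid changes at a few chosen "core" positions, combined with a cut-off of 1, lets those positions accept only synonymous codon changes while the other positions drift. This is only a crude caricature of a structural constraint, not a substitute for the energy-based score.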
Test System
The SCPE model was tested on the left-handed parallel β helix (LβH) domain of UDP-N-acetylglucosamine acyltransferases (LpxA), which displays a distinctive sequence pattern that is likely to result from structural constraints. The reference for SCPE simulations was the LpxA of Escherichia coli (Raetz and Roderick 1995) (fig. 1A and B; PDB code 1lxa). The sequences of the LβH domain of members of the LpxA family (fig. 1C) consist of imperfect tandem repeats of hexapeptide units (Vaara 1992; Vuorio et al. 1994; Raetz and Roderick 1995). The hexapeptides are characterized by a highly conserved third position, which usually displays I, L, or V (amino acids are designated by their one-letter codes). Hexapeptide position 1 is also significantly conserved, although less so than position 3, whereas the other four hexapeptide sites (2, 4, 5, and 6) are not conserved. Figure 1B shows that the residues at conserved sites 1 and 3 point toward the inside of the β helix, whereas those at variable positions point toward the outside. The LpxA family belongs to a larger superfamily of LβH acyltransferases. All members of this superfamily present the hexapeptide sequence motif, and those members whose structures have been determined display the LβH fold (Raetz and Roderick 1995; Kisker et al. 1996; Beaman et al. 1997; Beaman, Sugantino, and Roderick 1998; Brown et al. 1999). Thus, both the LβH structure and the hexapeptide motif are highly conserved, despite the considerable divergence in sequence and function observed in the LβH superfamily (Parisi, Fornasari, and Echave 2000).

Fig. 1. LpxA of Escherichia coli and LpxA family (Vaara 1992; Vuorio et al. 1994; Raetz and Roderick 1995). A, Cartoon view of the LβH domain of the LpxA of E. coli (PDB entry 1lxa). The left-handed β helix is formed by nine triangular coils (C1–C9). Each coil is formed by three hexapeptides, colored red, yellow, and blue, respectively. Loops are colored gray. B, Detailed view of coil C2 of A. Amino acids at conserved hexapeptide positions 1 (A18, A24, and C30) and 3 (I20, I26, and V32) are labeled. Panels A and B were prepared with the program MOLMOL (Koradi, Billeter, and Wuthrich 1996). C, Multiple-sequence alignment of the LβH domain of the members of the LpxA family. The alignment was obtained using CLUSTAL W (Thompson, Higgins, and Gibson 1994). Sequences are identified by their SwissProt/TrEMBL codes (Bairoch and Apweiler 2000). Conserved substitutions are shaded in black if the whole column is conserved and in gray if >75% of the column is conserved. The following classes were used to judge conservation: aliphatic (ACILMV), aromatic (FHWY), polar (NQST), charged positive (KR), charged negative (DE), and special (GP). Colors in the first line of C relate the alignment to the structure shown in A and B. In addition, the different coils (C1–C9) and hexapeptide third positions (dots) are explicitly indicated.
Probability Distributions
To compare the outcome of our simulations with the sequence patterns of actual sequences, we used amino acid probability distributions and entropies. The probability distribution for each sequence site class s was calculated as follows. First, the sequences used to estimate the distribution were aligned. Second, a matrix H was built, where H(p, a) = 1 if amino acid a is found at column p of the multiple-sequence alignment, and H(p, a) = 0 otherwise. Finally, the distribution was calculated as

$$P(s, a) = \frac{\sum_{p \in s} H(p, a)}{\sum_{p \in s} \sum_{a'=1}^{20} H(p, a')},$$

where p ∈ s indicates that the sums are restricted to sequence positions that belong to class s.
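The Python sketch below computes the presence matrix H and the class distribution P(s, a) from an alignment given as equal-length strings. It assumes gap characters are simply ignored, since they are not among the 20 amino acids. The helper `hexapeptide_columns`, which selects the columns of one hexapeptide site under an idealized gap-free tandem repeat, is a hypothetical illustration; real LβH alignments contain loops and insertions that would require explicit column assignments.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def presence_matrix(alignment):
    """H[p][a] = 1 if amino acid a occurs anywhere in column p, else 0."""
    n_cols = len(alignment[0])
    H = []
    for p in range(n_cols):
        column = {seq[p] for seq in alignment}
        H.append([1 if aa in column else 0 for aa in AMINO_ACIDS])
    return H


def class_distribution(H, columns):
    """P(s, a): presence counts summed over the columns of class s, normalized."""
    counts = [sum(H[p][a] for p in columns) for a in range(len(AMINO_ACIDS))]
    total = sum(counts)
    return {aa: c / total for aa, c in zip(AMINO_ACIDS, counts)}


def hexapeptide_columns(first_col, n_hexapeptides, site):
    """Columns of hexapeptide site 1..6, assuming an ideal gap-free repeat."""
    return [first_col + 6 * k + (site - 1) for k in range(n_hexapeptides)]
```

Because H records presence rather than frequency, an amino acid seen in many sequences of one column counts once for that column, so P(s, a) weights diversity across columns rather than abundance within them.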
Let P(a; M) and P(a; D) be, respectively, a simulated distribution obtained with model M and the distribution obtained from experimental data set D. The error of model M with respect to D was defined as

$$\epsilon(M, D) = \sum_{a} \frac{[P(a; M) - P(a; D)]^{2}}{P(a; M) + P(a; D)},$$

and the goodness of fit between the model and the data was measured using

$$z_P(M, D) = \frac{\epsilon(M, D) - \bar{\epsilon}(M)}{\sigma_{\epsilon}(M)},$$

where $\bar{\epsilon}(M)$ and $\sigma_{\epsilon}(M)$ are the mean and standard deviation of the errors obtained from comparing pairs of simulated runs. From such simulations, the distribution of zP(M, D) was obtained numerically and was found to be well fitted by a normal distribution with zero mean and unit standard deviation. To compare the abilities of two models, M0 and M1, to fit the observed amino acid distribution D, we used

$$z_P(M_0, M_1) = \frac{z_P(M_0, D) - z_P(M_1, D)}{\sqrt{2}},$$

which has a normal distribution with zero mean and unit variance.
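A Python sketch of the distribution error and the two z-scores follows. Distributions are represented as dictionaries mapping amino acids to probabilities. Terms for which P(a; M) + P(a; D) = 0 are skipped, which is an assumption since the formula is undefined there. The list `pair_errors`, holding errors between pairs of independent simulated runs, is assumed to be supplied by the caller, and all function names are hypothetical. Dividing the difference of two independent unit normals by the square root of 2 restores unit variance, which is why that factor appears in the comparison score.

```python
import math
import statistics


def distribution_error(p_model, p_data):
    """epsilon(M, D): sum over amino acids of (P_M - P_D)^2 / (P_M + P_D)."""
    total = 0.0
    for aa in set(p_model) | set(p_data):
        pm = p_model.get(aa, 0.0)
        pd = p_data.get(aa, 0.0)
        if pm + pd > 0.0:  # assumption: skip amino acids absent from both
            total += (pm - pd) ** 2 / (pm + pd)
    return total


def z_p(p_model, p_data, pair_errors):
    """z_P(M, D), standardized by errors between pairs of simulated runs."""
    mean = statistics.mean(pair_errors)
    sd = statistics.stdev(pair_errors)
    return (distribution_error(p_model, p_data) - mean) / sd


def z_p_compare(z_m0, z_m1):
    """z_P(M0, M1); positive values favor M1 (it has the smaller z_P)."""
    return (z_m0 - z_m1) / math.sqrt(2.0)
```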
Entropies
The variability of each site class was characterized using the site class entropy. These entropies were calculated from the amino acid probability distributions in the usual way, using

$$S(s) = -\sum_{a=1}^{20} P(s, a) \ln P(s, a).$$
The entropies of a model M and an experimental data set D were compared using

$$z_S(M, D) = \left| \frac{S(D) - \bar{S}(M)}{\sigma_S(M)} \right|,$$

where S(D) is the entropy of D, $\bar{S}(M)$ is the entropy of M averaged over independent runs, and $\sigma_S(M)$ is the corresponding standard deviation. The cumulative distribution function of zS, found numerically from simulations, is well fitted by P(zS < z) = 2Φ(z) - 1, where Φ(z) is the normal cumulative distribution function with zero mean and unit variance. As in the previous section, the abilities of two models M0 and M1 to fit the same data D can be compared using

$$z_S(M_0, M_1) = \frac{z_S(M_0, D) - z_S(M_1, D)}{\sqrt{2}},$$

whose distribution is approximately normal with zero mean and unit variance.
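The site class entropy, its z-score, and the tail probability implied by the fitted cumulative distribution P(zS < z) = 2Φ(z) - 1 can be sketched in Python as below. The tail probability is 1 minus that CDF, that is, 2[1 - Φ(z)], with Φ computed from `math.erf`. The list of entropies from independent runs is assumed to be supplied by the caller, and the function names are hypothetical.

```python
import math
import statistics


def entropy(dist):
    """S = -sum_a P(a) ln P(a); zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0.0)


def z_s(s_data, run_entropies):
    """z_S(M, D) = |S(D) - mean S(M)| / sd S(M) over independent runs."""
    mean = statistics.mean(run_entropies)
    sd = statistics.stdev(run_entropies)
    return abs((s_data - mean) / sd)


def normal_cdf(z):
    """Standard normal CDF Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def z_s_tail_probability(z):
    """P(z_S >= z) under the fitted CDF P(z_S < z) = 2 Phi(z) - 1."""
    return 2.0 * (1.0 - normal_cdf(z))
```

For instance, a uniform distribution over the 20 amino acids has entropy ln 20 (about 3.0), the maximum possible, while a fully conserved site has entropy 0.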
Results and Discussion
We begin by exploring the relationship between sequence divergence and the constraint for structure conservation. Although the hexapeptide motif is very well conserved in the LpxA family, sequences can diverge considerably, showing as little as 40% identity (Vuorio et al. 1994). Figure 2 shows that SCPE predicts a sigmoidal relationship between sequence divergence and tolerance to structural divergence (Sdiv). Even though sequences diverge, structure conservation limits this divergence to an extent that depends on the degree of constraint imposed by the environment. If structural divergence is too constrained (Sdiv → 0), sequences cannot diverge at all, whereas in the limit of unconstrained evolution (Sdiv → ∞), they become effectively random. Since Sdiv measures the tolerance of the environment to structural divergence, it is expected to depend on the protein's function. This suggests a possible connection between figure 2 and the recent observation of a sigmoidal dependence of function similarity on sequence similarity (Wilson, Kreychman, and Gerstein 2000).

Fig. 2. Sequence divergence is constrained by structure conservation. This figure shows the distance between the final amino acid sequences of SCPE simulations and the reference sequence as a function of Sdiv, the tolerance to structural divergence. The distance is obtained by averaging the percentage of amino acid differences over four runs of 6,000 mutational steps. Also shown is a sigmoidal function, $y = 95x^{2.25}/(5.74^{2.25} + x^{2.25})$, that fits the data, with correlation coefficient R = 0.998.
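The divergence measure of Fig. 2 and the reported sigmoidal fit, y = 95x^2.25/(5.74^2.25 + x^2.25) with x = Sdiv, can be written as the Python sketch below. `percent_difference` assumes ungapped sequences of equal length, which holds for SCPE because it only substitutes nucleotides; `fitted_divergence` merely evaluates the published fit and is a hypothetical helper name.

```python
def percent_difference(seq_a, seq_b):
    """Percentage of aligned positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    diffs = sum(a != b for a, b in zip(seq_a, seq_b))
    return 100.0 * diffs / len(seq_a)


def fitted_divergence(s_div, y_max=95.0, half=5.74, n=2.25):
    """Hill-type sigmoid y = y_max * x^n / (half^n + x^n), evaluated at x = Sdiv."""
    return y_max * s_div ** n / (half ** n + s_div ** n)
```

The parameter `half` is the Sdiv value at which divergence reaches half its plateau: the fit gives 47.5% at Sdiv = 5.74 and approaches 95% for large Sdiv.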
We next tested whether the SCPE model can reproduce the characteristic variability pattern of the hexapeptide motif, using the site entropy as a measure of the variability of a given hexapeptide site. Figure 3 shows that hexapeptide sites 1 and 3 are significantly conserved, whereas sites 2, 4, 5, and 6 are almost free to vary. It can also be seen from figure 3 that for Sdiv = 6, the SCPE variability pattern is in very good agreement with that of the LpxA family. More importantly, the agreement is much better than that of the reference LpxA of E. coli (Sdiv = 0), which is the only information SCPE has about the LpxA family, since no member of the LβH superfamily was part of the database used to fit the PROSA II potential (Sippl 1993). The variability pattern of any SCPE simulation at time 0 is that of the LpxA of E. coli (the initial sequence). As time increases, site entropies increase until they reach their asymptotic values. Figure 3 also shows the variability patterns of an unconstrained SCPE simulation (Sdiv = ∞) with the same number of amino acid substitutions as the Sdiv = 6 simulation. Comparison between the Sdiv = 6 and Sdiv = ∞ cases shows that the Sdiv = 6 pattern is mostly the result of structural constraints rather than of memory of the initial sequence. A similar agreement between SCPE and experiment was found for intermediate constraints in the range 5 < Sdiv < 10 (data not shown).
Table 1 shows a quantitative comparison of the entropies shown in figure 3. The fourth row of this table shows that SCPE with Sdiv = 6 fits the experimental LpxA entropies significantly better than E. coli (Sdiv = 0) for most sites. An exception is site 1, for which the LpxA of E. coli gives better results than the Sdiv = 6 SCPE simulations. However, when other members of the LβH superfamily are included in the determination of the experimental pattern, Sdiv = 6 SCPE simulations also give significantly better results for hexapeptide site 1, as can be seen from the last two columns of table 1. The last row of table 1 shows that SCPE with Sdiv = 6 gives significantly better results than the unconstrained case (Sdiv = ∞) for almost all hexapeptide sites: for every site except site 4, the unconstrained model is rejected in favor of the constrained one at a significance level below 10%.
As a final assessment, we evaluated the ability of SCPE to predict the correct amino acid probability distributions for the different hexapeptide sites. Figure 4 shows that SCPE with Sdiv = 6 (and with 5 < Sdiv < 10, not shown) is in very good agreement with the observed LpxA amino acid distributions. As for the variability patterns discussed in the previous paragraphs, this contrasts with the poorer agreement found between the LpxA family and either the reference protein (LpxA of E. coli; Sdiv = 0) or the unconstrained evolution case (Sdiv = ∞). For the key hexapeptide site 3, Sdiv = 6 SCPE simulations reveal amino acids F, M, W, Y, and C, which are not present in the reference protein. Of these, F, M, and W are confirmed predictions, since they are also present in the LpxA family. Y, which does not appear in the LpxA distribution, is also a confirmed prediction, since we found it in other LβH proteins. In general, all upward triangles in figure 4 mark amino acids predicted by SCPE that, despite not being found in LpxA, are found in other LβH families. In contrast, downward triangles indicate differences between Sdiv = 6 SCPE and LpxA distributions that could not be found in the other LβH proteins considered. Note, however, that most downward-triangle amino acids have probabilities so small that they are unlikely to be found in a sample the size of the LβH families considered. Moreover, even though downward-triangle amino acids may arise during evolution, they are selected against in Sdiv = 6 SCPE relative to the unconstrained case (Sdiv = ∞).

Fig. 4. SCPE predicts the hexapeptide amino acid distributions. The probability (bubble area) distribution for each hexapeptide site of the LpxA family (LpxA: light gray) is compared with three SCPE simulations: maximum structural constraint Sdiv = 0 (SCPE0: black), intermediate constraint Sdiv = 6 (SCPE6: dark gray), and no constraint Sdiv = ∞ (SCPE∞: white). Amino acids are sorted as follows: aliphatic (ACILMV), aromatic (FHWY), polar (NQST), charged positive (KR), charged negative (DE), and special (GP). SCPE0 is the distribution corresponding to the LpxA of Escherichia coli, since no variation is allowed. On the other hand, SCPE∞ is the distribution obtained with the Jukes-Cantor model with no selection, except for the rejection of stop codons. Therefore, the differences between SCPE6 and SCPE∞ are due to the selection pressure against structural divergence imposed by Sdiv = 6. Triangles mark differences between SCPE6 and LpxA. Triangles pointing upward denote amino acids that, despite being absent in the LpxA family, were found in other proteins of the LβH superfamily (we considered all sequences with >25% identity to any of the LβH proteins of known structure: Raetz and Roderick 1995; Kisker et al. 1996; Beaman et al. 1997; Beaman, Sugantino, and Roderick 1998; Brown et al. 1999). Downward triangles mark differences between SCPE6 and LpxA that could not be found in the other LβH sequences considered. The SCPE6 and SCPE∞ distributions were obtained by averaging over 11 runs of 550 amino acid replacements, which was enough to converge the probability distributions for the SCPE6 case. The first and last coils, C1 and C9 in figure 1, which are subject to constraints different from those of the other coils of LpxA, were not included in the calculation of probability distributions.
In table 2, the amino acid distributions of figure 4 are compared quantitatively. The fourth row of table 2 shows that SCPE with Sdiv = 6 fits the LpxA distributions significantly better than SCPE with Sdiv = 0 for most hexapeptide sites. As with the entropies, site 1 is an exception: its LpxA distribution is closer to that of the LpxA of E. coli than to the Sdiv = 6 SCPE distributions. As before, the situation is reversed when other members of the LβH superfamily are included in the determination of the experimental pattern (last two columns of table 2). The last row of table 2 shows that Sdiv = 6 SCPE gives significantly better results than the unconstrained case (Sdiv = ∞) for the conserved hexapeptide sites 1 and 3, but that the unconstrained model cannot be significantly rejected in favor of the constrained one for the variable sites 2, 4, 5, and 6.
Conclusions
This report presented a novel and general model of structurally constrained protein evolution, developed to study the effects of structural constraints on sequence divergence. For the LβH domain of the LpxA family, using only the sequence and structure of one of its members, the model predicts the sequence patterns characteristic of the whole family with remarkable accuracy. The general applicability of the SCPE model to other protein families remains to be studied; this will take some time, since the model is computationally demanding. In this report, we aimed to present the model and show its applicability by studying one example case. From a mutational point of view, the model treats all sites and all nucleotide replacements equivalently. Therefore, the observed biases in amino acid replacement patterns are a genuine outcome of the model, showing that they result naturally from constraining structural divergence.
Three considerations should be taken into account. First, SCPE is a neutral evolution model that cannot account for adaptive amino acid substitutions. However, this is not a serious drawback, since such replacements are very rare (Perutz 1983; Golding and Dean 1998). Second, SCPE does not explicitly consider the folding pathway, whereas folding constraints are known to result in sequence conservation (Shakhnovich, Abkevich, and Ptitsyn 1996; Li, Mirny, and Shakhnovich 2000). Nevertheless, this should not be a major shortcoming, since folding seems to be largely determined by the native structure (Baker 2000). Finally, introducing mutations at the gene level rather than at the protein level is not only more realistic but also makes the model potentially useful for studying issues such as the effects of nucleotide substitution biases on amino acid sequence patterns, or the effects of selection at the protein level on nucleotide substitution patterns.
The SCPE model can easily be improved by using a nucleotide mutation model that is more realistic than the Jukes-Cantor model. Also, different energy functions can be used to calculate the sequence-structure distance score. Finally, in the present case we accepted all sequences with scores Sdist < Sdiv and rejected those with Sdist > Sdiv, but other dependencies of the probability of acceptance on Sdist could be used.
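The hard cut-off used in this work, and one possible alternative, are sketched in Python below. The logistic rule and its `width` parameter are hypothetical illustrations of a smooth dependence of the acceptance probability on Sdist; they are not part of the SCPE model described here. As `width` shrinks toward zero, the logistic rule approaches the hard cut-off.

```python
import math
import random


def accept_hard(s_dist, s_div):
    """Rule used in this work: accept only if Sdist is below the cut-off Sdiv."""
    return s_dist < s_div


def accept_probability_logistic(s_dist, s_div, width):
    """Hypothetical smooth rule: acceptance probability 1/2 at Sdist = Sdiv."""
    x = (s_dist - s_div) / width
    if x > 700.0:  # avoid math.exp overflow; probability is effectively zero
        return 0.0
    return 1.0 / (1.0 + math.exp(x))


def accept(s_dist, s_div, rng, width=None):
    """Hard cut-off when width is None, otherwise a Bernoulli draw."""
    if width is None:
        return accept_hard(s_dist, s_div)
    return rng.random() < accept_probability_logistic(s_dist, s_div, width)
```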
Regarding the dynamics of the substitutional process under the SCPE model, some of the issues that are currently being addressed in our group are (1) the site-dependent amino acid substitution probabilities under the SCPE model and their comparison with current models of protein evolution, (2) substitutional rate variation among amino acid sites, (3) correlations between the evolution of different amino acid sites, and (4) effects of structural constraints on the patterns of nucleotide substitution.
Even though our aim in building the SCPE model was to gain a better understanding of the process of molecular evolution, this model can also be useful in addressing phylogenetic inference issues. Thus, the model can be used to generate large benchmark data sets for the assessment of current probabilistic models. Furthermore, SCPE may be used to obtain structure-dependent substitution matrices and build structure-based probabilistic models that can be used, in turn, for phylogenetic inference purposes. Both issues are currently being studied in our group.
Acknowledgements
This work was supported by the Universidad Nacional de Quilmes and the Fundación Antorchas. J.E. is a Researcher of CONICET and a Guggenheim Fellow.
Footnotes
William Taylor, Reviewing Editor
1 Abbreviation: SCPE, structurally constrained protein evolution. 
2 Keywords: molecular evolution, protein evolution, simulation, model
3 Address for correspondence and reprints: Julián Echave, Universidad Nacional de Quilmes, Saenz Peña 180, B1876BXD Bernal, Argentina. je{at}unq.edu.ar 
Literature Cited
Babajide, A., I. L. Hofacker, M. J. Sippl, and P. F. Stadler. 1997. Neutral networks in protein space: a computational study based on knowledge-based potentials of mean force. Fold Des. 2:261–269.
Bairoch, A., and R. Apweiler. 2000. The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res. 28:45–48.
Bajaj, M., and T. Blundell. 1984. Evolution and the tertiary structure of proteins. Annu. Rev. Biophys. Bioeng. 13:453–492.
Baker, D. 2000. A surprising simplicity to protein folding. Nature 405:39–42.
Beaman, T. W., D. A. Binder, J. S. Blanchard, and S. L. Roderick. 1997. Three-dimensional structure of tetrahydrodipicolinate N-succinyltransferase. Biochemistry 36:489–494.
Beaman, T. W., M. Sugantino, and S. L. Roderick. 1998. Structure of the hexapeptide xenobiotic acetyltransferase from Pseudomonas aeruginosa. Biochemistry 37:6689–6696.
Brown, K., F. Pompeo, S. Dixon, D. Mengin-Lecreulx, C. Cambillau, and Y. Bourne. 1999. Crystal structure of the bifunctional N-acetylglucosamine 1-phosphate uridyltransferase from Escherichia coli: a paradigm for the related pyrophosphorylase superfamily. EMBO J. 18:4096–4107.
Chothia, C., and A. M. Lesk. 1986. The relation between the divergence of sequence and structure in proteins. EMBO J. 5:823–826.
Flores, T. P., C. A. Orengo, D. S. Moss, and J. M. Thornton. 1993. Comparison of conformational characteristics in structurally similar protein pairs. Protein Sci. 2:1811–1826.
Golding, G. B., and A. M. Dean. 1998. The structural basis of molecular adaptation. Mol. Biol. Evol. 15:355–369.
Kimura, M. 1983. The neutral theory of molecular evolution. Cambridge University Press, Cambridge, England.
Kisker, C., H. Schindelin, B. E. Alber, J. G. Ferry, and D. C. Rees. 1996. A left-hand beta-helix revealed by the crystal structure of a carbonic anhydrase from the archaeon Methanosarcina thermophila. EMBO J. 15:2323–2330.
Koehl, P., and M. Levitt. 1999a. De novo protein design. I. In search of stability and specificity. J. Mol. Biol. 293:1161–1181.
Koehl, P., and M. Levitt. 1999b. De novo protein design. II. Plasticity in sequence space. J. Mol. Biol. 293:1183–1193.
Koradi, R., M. Billeter, and K. Wuthrich. 1996. MOLMOL: a program for display and analysis of macromolecular structures. J. Mol. Graph. 14:51–55, 29–32.
Koshi, J. M., and R. A. Goldstein. 1998. Models of natural mutations including site heterogeneity. Proteins 32:289–295.
Li, L., L. A. Mirny, and E. I. Shakhnovich. 2000. Kinetics, thermodynamics and evolution of non-native interactions in a protein folding nucleus. Nat. Struct. Biol. 7:336–342.
Liò, P., and N. Goldman. 1998. Models of molecular evolution and phylogeny. Genome Res. 8:1233–1244.
Mirny, L. A., and E. I. Shakhnovich. 1999. Universally conserved positions in protein folds: reading evolutionary signals about stability, folding kinetics and function. J. Mol. Biol. 291:177–196.
Naylor, G. J., and W. M. Brown. 1997. Structural biology and phylogenetic estimation [letter]. Nature 388:527–528.
Overington, J., M. S. Johnson, A. Sali, and T. L. Blundell. 1990. Tertiary structural constraints on protein evolutionary diversity: templates, key residues and structure prediction. Proc. R. Soc. Lond. B Biol. Sci. 241:132–145.
Parisi, G., M. Fornasari, and J. Echave. 2000. Evolutionary analysis of gamma-carbonic anhydrase and structurally related proteins. Mol. Phylogenet. Evol. 14:323–334.
Perutz, M. F. 1983. Species adaptation in a protein molecule. Mol. Biol. Evol. 1:1–28.
Raetz, C. R., and S. L. Roderick. 1995. A left-handed parallel beta helix in the structure of UDP-N-acetylglucosamine acyltransferase. Science 270:997–1000.
Shakhnovich, E., V. Abkevich, and O. Ptitsyn. 1996. Conserved residues and the mechanism of protein folding. Nature 379:96–98.
Sippl, M. J. 1993. Recognition of errors in three-dimensional structures of proteins. Proteins 17:355–362.
Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673–4680.
Tourasse, N. J., and W. H. Li. 2000. Selective constraints, amino acid composition, and the rate of protein evolution. Mol. Biol. Evol. 17:656–664.
Vaara, M. 1992. Eight bacterial proteins, including UDP-N-acetylglucosamine acyltransferase (LpxA) and three other transferases of Escherichia coli, consist of a six-residue periodicity theme. FEMS Microbiol. Lett. 76:249–254.
Vuorio, R., T. Harkonen, M. Tolvanen, and M. Vaara. 1994. The novel hexapeptide motif found in the acyltransferases LpxA and LpxD of lipid A biosynthesis is conserved in various bacteria. FEBS Lett. 337:289–292.
Wilson, C. A., J. Kreychman, and M. Gerstein. 2000. Assessing annotation transfer for genomics: quantifying the relations between protein sequence, structure and function through traditional and probabilistic scores. J. Mol. Biol. 297:233–249.
Wood, T. C., and W. R. Pearson. 1999. Evolution of protein sequences and structures. J. Mol. Biol. 291:977–995.
Xia, X., and W. H. Li. 1998. What amino acid properties affect protein evolution? J. Mol. Evol. 47:557–564.
Accepted for publication November 20, 2000.