Protein sequence entropy is closely related to packing density and hydrophobicity

H. Liao¹, W. Yeh², D. Chiang³, R.L. Jernigan⁴ and B. Lustig¹,⁵

¹Department of Chemistry and ²Department of General Engineering, San Jose State University, San Jose, CA 95192-0101, ³Sage-N Research, Saratoga, CA 95070-6082 and ⁴L.H. Baker Center for Bioinformatics and Biological Statistics, Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, IA 50014, USA

⁵To whom correspondence should be addressed. E-mail: blustig{at}science.sjsu.edu


Abstract
We investigated the correlation between the Shannon information entropy (‘sequence entropy’) and the local flexibility of native globular proteins, as described by the inverse packing density. Both quantities are determined at each residue position for a set of 130 query proteins, with sequence entropies calculated from the set of aligned residues at each position. For the aggregate of the 130 alignment sets, a strong linear correlation is observed between the calculated sequence entropy and the corresponding inverse packing density at each residue position. This region of linearity spans Cα packing densities from 12 to 25 Cα atoms within a sphere of 9 Å radius. Three different hydrophobicity scales all mimic the behavior of the sequence entropies. This confirms the idea that the ability to accommodate mutations depends strongly on the available space and on the propensity of each amino acid type to be buried. Future applications of these types of methods may prove useful in identifying both core and flexible residues within a protein.

Keywords: hydrophobicity/sequence entropy/sequence–structure relationship/sequence variability


Introduction
General studies of the geometries within proteins have a long history and have led to important insights into protein structure (Chothia et al., 1981; Chothia and Finkelstein, 1990; Maritan et al., 2000; Banavar et al., 2002). Specific studies of packing geometries have indicated, for coarse-grained structures with one point per residue, that amino acids pack in local clusters with the same orientations as close-packed spheres (Bagci et al., 2002, 2003). At the same time, cavities within protein structures are known to be important for function (Doyle et al., 1998; Sigler et al., 1998; Zhang et al., 2003).

Globular proteins are compact and hence densely packed (Richards, 1974), even to the extent that their interiors are frequently viewed as solid-like (Hermans and Scheraga, 1961; Richards, 1997); nevertheless, there are numerous voids and cavities in protein interiors (Liang and Dill, 2001). Tight packing is widely thought to be important for protein stability (Eriksson et al., 1992; Privalov, 1996), for nucleation of protein folding (Ptitsyn, 1998; Ptitsyn and Ting, 1999; Ting and Jernigan, 2002) and for the design of novel proteins (Dahiyat and Mayo, 1997). In conjunction with nucleation, it has previously been posited that the residues conserved through evolution may include essential tightly packed sites (Mirny et al., 1998; Ptitsyn, 1998; Ptitsyn and Ting, 1999; Ting and Jernigan, 2002).

However, the exact relationship between sequence and structure, which is the subject of this paper, is only partially understood (Jones, 2000; Baker and Sali, 2001). Whereas protein sequence is easily determined, 3-D structure is significantly more difficult to obtain. Employing sequence alignments in conjunction with molecular modeling has proven to be among the most successful computational methodologies for protein structure prediction (Bryant and Lawrence, 1993; Marti-Renom et al., 2000). One key assumption in homology-based modeling is that conserved regions share structural similarities, but the structural basis of this connection has not been clearly determined.

Multiple alignments of regions of secondary structure may be useful for identifying key hydrophobic residues by hydrophobic cluster analysis (Poupon and Mornon, 1999; Gross et al., 2000). Determining patterns of variability within amino acid sequences by information theory has also proven useful in identifying unique protein secondary structures (Pilpel and Lancet, 1999). Large-scale exploration of sequence space has shown clustering of sequence entropy values corresponding to a particular fold (Larson et al., 2002). The application of Shannon entropy to nucleic acid sequence variability has proven to be a useful tool for identifying control regions in DNA (Schneider et al., 1986) and has been extended as one of several methods for scoring amino acid conservation in proteins (Zou and Saven, 2000; Valdar, 2002).

Shannon entropies for protein sequences have been shown to correlate with entropies calculated from local physical parameters, including backbone geometry (Koehl and Levitt, 2002). Interestingly, conventional generalized chain statistics appear to significantly overestimate the entropic penalty associated with loop closure in proteins and RNA (Lustig et al., 1998; Scalley-Kim et al., 2003). Continued exploration of the connections among entropy, structure and sequence is clearly critical to a better understanding of protein stability and function.

Although there have been some demonstrations of connections between sequence conservation and structural properties (Demirel et al., 1998), there are no definitive studies on this subject. Establishing direct connections between sequences and structural features has proven difficult, as reflected in the limited number of successes in protein design and the limited understanding of mutagenesis. Recent applications of sequence variability to structure prediction have improved results, so empirical measures of sequence variability are useful by themselves, even if their full implications for structural features are not well understood.

While investigations of the packing of protein atoms would likely be informative, here we chose to investigate coarse-grained packing among points, each representing a neighboring amino acid. The results are then more general, even if less directly useful for predictions related to protein design.

Here we generate a large set of aligned protein sequences from a diverse sample of 130 query protein sequences and calculate the sequence entropy at each residue position. These entropies are then compared with the corresponding local flexibility, as measured by the extent of Cα packing calculated from the corresponding structures. Similar comparisons are also made between residue hydrophobicity and the corresponding packing.


Methods
A diverse, well-characterized set of 130 protein sequences (Table I) was compiled from the Protein Data Bank (2002), and redundant proteins were removed. The set covers a wide variety of proteins: 18% (23 sequences) come from multi-chain proteins and the remaining 107 sequences from single-chain proteins. Aligned sequences are generated against each of these query sequences with BLASTP (Altschul et al., 1997), searching GenBank as available from the National Center for Biotechnology Information (2002). Alignments are excluded if their bit score falls below 100 or below 40% of the best score. Calculations with a representative set of proteins showed 40% of the best BLASTP bit score to be a reasonable threshold with respect to the calculated sequence entropies and their dependence on density.


Table I. List of 130 proteins

 
At least 10 aligned sequences are also required for each query, and typically at most 100 alignments are retained. The result is a representative distribution of 7143 aligned protein sequences. The mean and median numbers of alignments per query are both 55, and the overall range is 10–100. The frequency distribution of the BLASTP bit scores for all 130 sets of alignments is consistent with the right-skewed (i.e. positively skewed) distribution expected for a randomized set of BLAST scores (Altschul et al., 1994). The mean, median and overall range of BLASTP bit scores for all 7143 alignments are 408, 354 and 100–1793, respectively.
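The alignment-selection rules above (bit score of at least 100 and at least 40% of the best score, with at least 10 and typically at most 100 alignments per query) can be sketched as a filter over BLASTP hits. This is a minimal sketch, not the authors' code: the `Hit` record and `select_alignments` function are hypothetical names, parsing of BLASTP output is omitted, the "best score" is assumed to be the highest bit score among the query's hits, and the highest-scoring hits are assumed to be kept when the cap of 100 applies.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One BLASTP hit for a query (hypothetical record; parsing not shown)."""
    accession: str
    bit_score: float


def select_alignments(hits, min_bits=100.0, rel_cutoff=0.40,
                      min_count=10, max_count=100):
    """Apply the bit-score thresholds; return None if too few hits remain."""
    if not hits:
        return None
    best = max(h.bit_score for h in hits)  # assumed: best hit for this query
    kept = [h for h in hits
            if h.bit_score >= min_bits and h.bit_score >= rel_cutoff * best]
    # Assumed: when more than max_count pass, keep the highest-scoring ones.
    kept.sort(key=lambda h: h.bit_score, reverse=True)
    kept = kept[:max_count]
    return kept if len(kept) >= min_count else None
```

A query whose hits leave fewer than 10 alignments after both thresholds is simply dropped (the function returns `None`).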

For protein sequences, the sequence entropy S_k at amino acid position k is expressed as

$$S_k = -\sum_{j=1}^{20} P_{jk} \log P_{jk}$$ (1)

where the probability P_jk of amino acid type j at sequence position k is derived from the frequency f_jk of that type among all of the aligned residues at position k. Although gaps could have been treated as an additional amino acid type, we chose to ignore them here. In order to compare against the random case, we subtract the following term (Gerstein and Altman, 1995) from Equation 1:

(2)

where P_j is the probability of amino acid type j over all alignments.
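Equation 1 can be illustrated for a gapped alignment given as equal-length strings, with one column per query position. This sketch assumes natural logarithms (changing the base only rescales S_k), discards gaps and any non-standard residue codes, uses hypothetical function names, and does not implement the randomness correction of Equation 2.

```python
import math
from collections import Counter

# The 20 standard residue types; gaps and other symbols are ignored.
AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def column_entropy(column):
    """Shannon entropy S_k (natural log) of one alignment column, gaps ignored."""
    counts = Counter(res for res in column.upper() if res in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # column of gaps only: no information
    entropy = 0.0
    for n in counts.values():
        p = n / total  # P_jk for residue type j at position k
        entropy -= p * math.log(p)
    return entropy


def sequence_entropies(alignment):
    """S_k for every column of an alignment given as equal-length strings."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")
    return [column_entropy("".join(seq[k] for seq in alignment))
            for k in range(length)]
```

For example, `sequence_entropies(["ACD", "ACE", "AC-", "GCE"])` gives zero entropy for the fully conserved second column and a positive value for the other two.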

For each residue of the 130 sample protein sequences, the Cα packing density is calculated from the associated atomic coordinates as the number of Cα atoms within a sphere around that residue's Cα. A radius of 9 Å was found to be optimal in limited preliminary investigations: greater scatter is observed, for example, in the single averaged entropies for radii of 10 and 11 Å, while smaller radii omit some important cases in the distribution. Here we investigate the extent to which the inverse of the local packing density, as a measure of local flexibility (Bahar et al., 1997), is correlated with sequence variability.
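The Cα packing density described above can be sketched as a neighbor count within a 9 Å sphere. Whether the central Cα itself is counted and whether the 9 Å boundary is inclusive are not stated in the text; both are assumptions here, exposed as the `include_self` flag and the `<=` comparison. The coordinates are assumed to be already extracted (for example, from the CA atoms of a PDB file), and the function names are hypothetical.

```python
import math


def packing_densities(ca_coords, radius=9.0, include_self=False):
    """Count C-alpha atoms within `radius` Å of each C-alpha (O(N^2) loop)."""
    r2 = radius * radius
    counts = []
    for i, (xi, yi, zi) in enumerate(ca_coords):
        n = 0
        for j, (xj, yj, zj) in enumerate(ca_coords):
            if i == j and not include_self:
                continue  # assumed: the central C-alpha is not counted
            if (xi - xj) ** 2 + (yi - yj) ** 2 + (zi - zj) ** 2 <= r2:
                n += 1
        counts.append(n)
    return counts


def inverse_densities(counts):
    """Inverse packing density 1/n, used as a proxy for local flexibility."""
    return [1.0 / n if n > 0 else math.inf for n in counts]
```

A brute-force double loop is adequate for single proteins of a few hundred residues; a spatial grid or k-d tree would be the usual choice for much larger structures.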


Results
The calculated sequence entropy (Equation 1) for each protein is compared against the inverse Cα packing density (see Table I for a summary). For individual proteins, the probability P that the observed data could come from a randomized population (Bevington, 1969) typically falls below 0.001. Selected correlation plots are shown in Figure 1A, B and C for pepsinogen (3psg, 365 aligned residues), dihydrofolate reductase (4dfr, 158 aligned residues) and oncomodulin (1omd, 107 aligned residues), respectively. The respective slopes are 13.020, 6.064 and 4.328, with correlation coefficients of 0.447, 0.274 and 0.141. Data were also collected in bins, one for each integral number of Cα atoms falling within the 9 Å sphere, and averaged within each bin. For most single-protein correlation plots, the slopes remain effectively unchanged upon averaging.
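The per-protein analysis combines an ordinary least-squares fit with binning by integral packing density. The sketch below assumes that the reported correlation coefficients are Pearson coefficients and that each bin average weights every residue equally; it omits the significance test (P) and uses hypothetical function names.

```python
import math
from collections import defaultdict


def linear_fit(xs, ys):
    """Least-squares line y = slope*x + intercept and Pearson correlation r."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


def bin_by_density(densities, entropies):
    """Mean entropy per integral packing density, as (1/density, mean) pairs."""
    bins = defaultdict(list)
    for d, s in zip(densities, entropies):
        if d > 0:
            bins[d].append(s)
    return [(1.0 / d, sum(v) / len(v)) for d, v in sorted(bins.items())]
```

Fitting the raw (inverse density, entropy) points and fitting the binned means correspond to the two kinds of fit reported for each protein in Figure 1.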



Fig. 1. Correlation between sequence entropy and inverse packing density for a range of proteins. The inverse packing density (abscissa) is calculated from the sample protein's atomic coordinates by counting the Cα atoms within a 9 Å radius of each residue's Cα. Sequence entropy is calculated from a sequence alignment set generated by BLASTP from the query sequence. Average entropy (closed squares) is determined by averaging the sequence entropy for all sequence positions falling within an interval of packing density. Error bars (standard deviations calculated from the data) and the linear fit to all points (line) are shown. (A) For pepsinogen (3psg: 365 aligned residues), the straight-line fit for all data is y = 13.020x – 0.09 with correlation coefficient 0.447 and P < 0.001. For averaged data, y = 12.070x – 0.09 with correlation coefficient 0.898 and P < 0.001. (B) For dihydrofolate reductase (4dfr: 158 aligned residues), the straight-line fit for all data is y = 6.064x + 0.34 with correlation coefficient 0.274 and P < 0.001. For averaged data, y = 7.350x + 0.22 with correlation coefficient 0.796 and P < 0.001. (C) For oncomodulin (1omd: 107 aligned residues), the straight-line fit for all data is y = 4.328x + 0.43 with correlation coefficient 0.141 and P < 0.15. For averaged data, y = 1.624x + 0.59 with correlation coefficient 0.149 and P < 0.15.

 
In total, 41 543 query residues remain after removal of the 89 extreme outlying values lying outside the two arrows in Figure 2. The mean and median packing densities (Cα atoms within a 9 Å radius) are unchanged by this removal, at 14.6 and 15. The overall (i.e. for all 130 alignment sets) correlation plots of sequence entropy versus inverse Cα packing density are shown in Figure 3A. Here, a single average is obtained by averaging the individual residue entropies within a particular Cα packing density interval over all 130 sets of protein alignments. ‘Double’ averaging entails first averaging the entropy per density interval for individual proteins, before averaging over the full set of proteins. Apart from a significant reduction in standard deviations with the ‘double’ averaging procedure, the two types of averaged sequence entropy are essentially identical.
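The difference between single and ‘double’ averaging is easiest to see in a small sketch. Each protein is represented here (hypothetically) as a list of (packing density, entropy) pairs; single averaging pools all residues per density interval, whereas double averaging first averages within each protein, so every protein that populates an interval carries equal weight.

```python
from collections import defaultdict
from statistics import mean


def single_average(proteins):
    """Pool residues from all proteins, then average entropy per density bin."""
    pooled = defaultdict(list)
    for residues in proteins:
        for density, entropy in residues:
            pooled[density].append(entropy)
    return {d: mean(v) for d, v in sorted(pooled.items())}


def double_average(proteins):
    """Average per bin within each protein first, then across proteins."""
    per_bin = defaultdict(list)
    for residues in proteins:
        local = defaultdict(list)
        for density, entropy in residues:
            local[density].append(entropy)
        for density, values in local.items():
            per_bin[density].append(mean(values))
    return {d: mean(v) for d, v in sorted(per_bin.items())}
```

If one protein contributes three residues and another contributes one residue to the same interval, single averaging weights the first protein three times as heavily, while double averaging weights the two proteins equally.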



Fig. 2. Frequency distribution of the number of aligned query residues as a function of Cα packing density within a radius of 9 Å. The original 41 632 query protein residues for the set of 130 proteins have a mean packing density of 14.6, a median of 15 and an SD of 4.056. These values remain effectively unchanged for the 41 543 residues remaining after removal of the outlying values to the left of the first arrow and to the right of the second.

 


Fig. 3. Correlation plots of overall average entropy for the set of 130 proteins against inverse packing density. (A) Inverse packing density (abscissa) is calculated from the Cα packing density noted in Figure 2, and overall sequence entropy (ordinate) is calculated in three ways: single averaged (open diamonds), single averaged and corrected for randomness as noted on the right ordinate (open circles), and ‘double’ averaged (open triangles). Single averaged entropy is determined by averaging the sequence entropy of each associated residue position within its interval of inverse packing density. The estimated standard deviation with and without corrections for randomness is 0.5. ‘Double’ averaged sequence entropy is calculated by first averaging each protein's sequence entropy for a particular density interval and subsequently averaging over all proteins. The estimated standard deviation is 0.3. (B) Linear regression for the selected Region I, comprising 31 169 of the total of 41 632 aligned query residues, with averaged residue entropy on the ordinate. These averaged sequence entropy values correspond to inverse packing densities (abscissa) between 0.040 and 0.083 (i.e. 25 to 12 Cα atoms within a 9 Å radius). Overall single averaged entropy (open squares) is fitted with the straight line y = 12.350x – 0.20 with correlation coefficient 0.997 and P < 0.001. The ‘double’ averaged entropy (open triangles) straight-line fit is y = 12.658x – 0.22 with correlation coefficient 0.997 and P < 0.001. Note that between inverse packing densities of 0.040 and 0.083, the single averaged entropy corrected for randomness has the straight-line fit y = 12.409x + 3.05 with correlation coefficient 0.998 and P < 0.001.

 
Two major regions, corresponding to high and low packing densities, are observable in the correlation plots of sequence entropy versus inverse packing density in Figure 3A. A similar overall pattern of single averaged sequence entropy was observed when the effects of randomness were accounted for by subtracting the term in Equation 2. Region I, with a steep slope, corresponds to the higher packing densities of 25 to 12 Cα atoms (inverse density from 0.040 to 0.083), where sequence entropy clearly increases linearly with inverse density. Region II, to the right, still includes a significant number of residues (10 173) and shows a nearly constant calculated sequence entropy over packing densities from 11 to 6 (an upper bound inverse density of 0.17). It is reasonable that once the packing density falls below a certain level, sequence entropy no longer changes with density.
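The two regions can be expressed as a simple classification on the integer packing density, together with a check of the residue fractions quoted for them. The helper names are hypothetical; the region bounds (12 to 25 and 6 to 11 Cα atoms within 9 Å) and the residue counts are taken from the text and from Figures 2 and 3.

```python
def classify_region(density):
    """Assign a residue to Region I or II from its C-alpha count within 9 Å."""
    if 12 <= density <= 25:
        return "I"   # inverse density 0.040-0.083: entropy rises linearly
    if 6 <= density <= 11:
        return "II"  # inverse density up to ~0.17: entropy nearly constant
    return None      # outside both regions


def percent(count, total):
    """Percentage rounded to one decimal place."""
    return round(100.0 * count / total, 1)


TOTAL_RESIDUES = 41632      # all aligned query residues (Figure 2)
REGION_I_RESIDUES = 31169   # Region I (Figure 3B)
REGION_II_RESIDUES = 10173  # Region II (Results text)

print(percent(REGION_I_RESIDUES, TOTAL_RESIDUES))   # 74.9
print(percent(REGION_II_RESIDUES, TOTAL_RESIDUES))  # 24.4
```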

Region I, in the overall correlation plots (Figure 3B), involves 74.9% of all the sample protein residues. Here the single averaged and ‘double’ averaged sequence entropies are strongly linearly correlated with the inverse packing density. The straight-line fit for the single averaged sequence entropy versus inverse packing density is y = 12.350x – 0.20, with correlation coefficient 0.997 and P < 0.001. The straight-line fit for the ‘double’ averaged entropy is effectively identical. In Region II, which accounts for a further 24.4% of the sample protein residues, strongly hydrophobic residue types (Poupon and Mornon, 1999) appear to reach a limiting fraction of about 10% (Figure 3A). This suggests a threshold for the number of hydrophobic residues embedded in regions that are probably accessible to water.

Figure 4A shows a superposition of the normalized averaged hydrophobicities of the sample proteins and the single averaged sequence entropy, as functions of inverse packing density. Hydrophobicity is calculated, using three different scales (Hopp and Woods, 1981; Engelman et al., 1986; Sharp et al., 1991), for every query protein residue that is part of an alignment. For the Hopp and Woods (1981) scale, calculations by Levitt (1976) were also included. For each scale, a normalized hydrophobicity is calculated for the set of all residues within a density interval. The three normalized hydrophobicity plots (Figure 4B) are then averaged and renormalized. Superimposed is a smooth normalized curve of the sequence entropy, determined from the original values in Figure 3A. Clearly, all three sets of hydrophobicity values (Figure 4B) resemble the corresponding sequence entropy values.
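The normalization and averaging of the three hydrophobicity scales can be sketched as follows. The text does not state the normalization method, so min-max scaling to [0, 1] is assumed here; the scales are also assumed to be pre-oriented so that larger values mean more hydrophobic (the Hopp and Woods values, for instance, are reported as hydrophilicities and would need their sign flipped). The function names are hypothetical.

```python
def min_max_normalize(values):
    """Rescale values linearly to [0, 1] (assumed normalization method)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def combined_hydrophobicity(binned_scales):
    """Normalize each scale's per-bin averages, average them, then renormalize.

    binned_scales: one list per hydrophobicity scale, each giving the average
    hydrophobicity per density bin, all over the same bins and oriented so
    that larger means more hydrophobic.
    """
    n_bins = len(binned_scales[0])
    if any(len(scale) != n_bins for scale in binned_scales):
        raise ValueError("all scales must cover the same density bins")
    normalized = [min_max_normalize(scale) for scale in binned_scales]
    averaged = [sum(col) / len(col) for col in zip(*normalized)]
    return min_max_normalize(averaged)
```

Because each scale is normalized before averaging, scales with very different numeric ranges contribute equally to the combined curve.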



Fig. 4. Comparison of average hydrophobicity per residue and overall single-averaged sequence entropy with respect to inverse Cα packing density. Residue hydrophobicity is calculated for each query protein by weighting the aligned residue type with each of the different scales: Hopp and Woods (1981), Engelman et al. (1986) and Sharp et al. (1991). The average hydrophobicity for each scale is calculated by averaging the residue hydrophobicities for all aligned residues within an interval of packing density. (A) Each of the three sets of hydrophobicities corresponding to the different scales is normalized, and their average is then renormalized (dotted line). The single-averaged sequence entropy from Figure 3A is normalized (solid line) and also plotted against inverse density. (B) Inset showing the corresponding three normalized sets of hydrophobicities plotted against inverse density: Sharp et al. (diamonds), Hopp and Woods (inverted triangles) and Engelman et al. (squares).

 

Discussion
Flexibility and sequence entropy

A strong correlation has previously been reported between measured hydrogen exchange (HX) and computed displacements based on elastic networks reflecting residue packing (Bahar et al., 1998). The freedom of a residue to move is entropic in character. Regions of high packing density resist hydrogen exchange because of both stability and inaccessibility. Here we have gone further, relating the inverse Cα packing density calculated from X-ray structures to sequence variability. Strong linear correlations are observed between sequence entropy and inverse packing density, except at the highest and lowest densities. This provides a quantitative relationship between these two quantities and an important structural measure for determining likely sites for mutagenesis.

The selection of sequences for sequence analysis is a difficult problem, and results can depend strongly on the selection procedure. Ptitsyn (1998) advocated selecting conserved clusters from sequence sets that include only distantly related species. Here, however, we simply used the sequence matches from GenBank without any further filtering. Despite this, the overall trends are extremely clear, although they are evident only to a limited extent within individual proteins.

In addition, the correlation between sequence variability and motility is consistent with a similar pattern that we noted for peptide binding to RNA (Hsieh et al., 2002). Enhanced motility at a particular residue position is associated with the ability of the local structure to accommodate mutation. Such behavior can be related more broadly to sequence variability in a folded protein: the ability to accommodate mutations corresponds to allowing a range of positions, including possible contacts.

Hydrophobicity and sequence variability

The strong correlation between the calculated sequence entropy and the hydrophobicity shown in Figure 4 is remarkable. For each protein, the sequence entropy is calculated at each sequence position and simply reflects the sequence variability at that position. The hydrophobicity at each residue position of each original query sequence is averaged within each bin over just the 130 sample proteins. It is important to remember that both the sequence entropy and the hydrophobicity are averaged over all residues within each density bin. In addition, the three hydrophobicity scales (Hopp and Woods, 1981; Engelman et al., 1986; Sharp et al., 1991) are diverse in their origins and include experimental optimization and/or validation based on a variety of systems; for Hopp and Woods (1981), calculations by Levitt (1976) were also included. The lack of any significant differences among the three sets of normalized hydrophobicity values (Figure 4B) as a function of inverse density suggests that the relative differences among individual amino acids within a hydrophobicity scale are largely compensated by the other values within that set. Clearly, the correlations between the sequence variabilities reflected in the sequence entropies and the corresponding hydrophobicities are consistent with the average behavior of residues with a given packing density. This observed correlation between average sequence entropy and hydrophobicity is remarkable, but both reflect fundamental properties related to the extent of burial. The critical importance of hydrophobicity for the folding of model protein chains is well known (Hinds and Levitt, 1994; Dill et al., 1995), consistent with the fact that key hydrophobic residues can be described as buried or tightly packed (Ptitsyn, 1998; Ting and Jernigan, 2002).

Packing and the resulting interactions associated with hydrophobicity are not simply a matter of accounting for pairs of contacts (Dima and Thirumalai, 2004); in packing, multiple contacts are usual. Our calculation of Cα packing density represents a coarse-grained, less detailed counting of such contacts. We show that local flexibility is closely related to the inverse of this coarse-grained packing density.

Here, sequence variability as measured by sequence entropy is correlated with the inverse of residue packing. The propensity of a particular amino acid type for packing reflects its hydrophobicity and side chain entropy (Pickett and Sternberg, 1993). Notably, average contact energies for the various amino acid pairs also correlate well with existing hydrophobicity scales (Young et al., 1994), which suggests that these interactions are, in principle, strongly entropic in nature. It might be possible to calculate configurational entropies more directly, in lieu of the comparable inverse density measure of relative flexibility, by using a full atomic representation. Such calculations would depend on a residue's environment in more realistic ways than simple residue density does. This might also reduce the spread of individual residue entropies calculated from sequence variability within a density bin.

Progress in this direction would assist with protein design, a closely related problem (Dahiyat and Mayo, 1997; Li et al., 1998; Buchler and Goldstein, 1999; Shih et al., 2000; Tiana et al., 2001; Koehl and Levitt, 2002; Larson et al., 2002; England et al., 2003). Further studies along the lines of the present work could lead to better predictions of tolerated sequence substitutions. However, the present results indicate that the packing density of a single residue in a single protein does not necessarily correlate well with the sequence conservation at that site. Further efforts are clearly required to achieve this goal; nevertheless, the present results begin to point toward a way of achieving it.

Conclusion

Here, packing at the residue level in coarse-grained structures has been shown to exhibit a strong connection to sequence conservation when averaged over large numbers of residues. Why is this averaging necessary? One possible explanation is that the large number of ways in which a residue's atoms can be packed requires averaging over many occurrences to obtain a meaningful single representation of all these combinations. It is also possible that residue size affects the results, so that averaging over many occurrences is needed to account fully for all of the various types of neighboring residues, including individual side chain conformations.

Two distinct behaviors are identified for different inverse packing density regions (Figures 3 and 4). In the first region, 74.9% of sequence positions exhibit a linear dependence of sequence entropy on inverse Cα packing density over the range 0.040–0.083, whereas in the second region, with inverse packing density >0.083, another 24.4% of query positions typically show a nearly constant sequence entropy. This saturation suggests that up to a certain minimum number of residues are allowed in low-density regions. Moreover, a certain fraction of those residues are hydrophobic and would appear to be accessible to water, consistent with a considerable lack of restrictions on the types of residues that can be accommodated. All of this suggests that, for most residue positions, the ability to accommodate sequence substitutions, as measured by sequence entropy, is inversely correlated with the extent of packing. Also, on average for a particular amino acid type, hydrophobicity is correlated with the degree of residue packing. A deeper understanding of the connections between structural properties and sequence entropy awaits further study. However, the future development of such sequence entropy methods for identifying core as well as flexible residues appears promising.


Acknowledgments
This work included resources from BERI (Biotechnology Education and Research Institute) at San Jose State University and CSUPERB (California State University Program for Education and Research in Biotechnology).


References
Altschul,S.F., Boguski,M.S., Gish,W. and Wootton,J.C. (1994) Nat. Genet., 6, 119–129.

Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Nucleic Acids Res., 25, 3389–3402.

Bagci,Z., Jernigan,R.L. and Bahar,I. (2002) J. Chem. Phys., 116, 2269–2276.

Bagci,Z., Kloczkowski,A., Jernigan,R.L. and Bahar,I. (2003) Proteins, 53, 56–67.

Bahar,I., Atilgan,A.R. and Erman,B. (1997) Fold. Des., 2, 173–181.

Bahar,I., Wallqvist,A., Covell,D.G. and Jernigan,R.L. (1998) Biochemistry, 37, 1067–1075.

Baker,D. and Sali,A. (2001) Science, 294, 93–96.

Banavar,J.R., Maritan,A., Micheletti,C. and Trovato,A. (2002) Proteins, 47, 315–322.

Bevington,P.R. (1969) Data Reduction and Error Analysis for the Physical Sciences. McGraw-Hill, New York, Appendix C.

Bryant,S.H. and Lawrence,C.E. (1993) Proteins, 16, 92–112.

Buchler,N.E. and Goldstein,R.A. (1999) Proteins, 34, 113–124.

Chothia,C. and Finkelstein,A.V. (1990) Annu. Rev. Biochem., 59, 1007–1039.

Chothia,C., Levitt,M. and Richardson,D. (1981) J. Mol. Biol., 145, 215–250.

Dahiyat,B.I. and Mayo,S.L. (1997) Science, 278, 82–87.

Demirel,M.C., Atilgan,A.R., Jernigan,R.L., Erman,B. and Bahar,I. (1998) Protein Sci., 7, 2522–2532.

Dill,K.A., Bromberg,S., Yue,K., Fiebig,K.M., Yee,D.P., Thomas,P.D. and Chan,H.S. (1995) Protein Sci., 4, 561–602.

Dima,R.I. and Thirumalai,D. (2004) J. Phys. Chem. B, 108, 6564–6570.

Doyle,D.A., Cabral,J.M., Pfuetzner,R.M., Kuo,A., Gulbis,J.M., Cohen,S.L., Chait,B.T. and MacKinnon,R. (1998) Science, 280, 69–77.

Engelman,D.M., Steitz,T.A. and Goldman,A. (1986) Annu. Rev. Biophys. Biophys. Chem., 15, 321–353.

England,J.L., Shakhnovich,B.E. and Shakhnovich,E.I. (2003) Proc. Natl Acad. Sci. USA, 100, 8727–8731.

Eriksson,A.E., Baase,W.A., Zhang,X.J., Heinz,D.W., Blaber,M., Baldwin,E.P. and Matthews,B.W. (1992) Science, 255, 178–183.

Gerstein,M. and Altman,R.B. (1995) J. Mol. Biol., 251, 161–175.

Gross,E.A., Li,G.R., Lin,Z.-Y., Ruuska,S.E., Boatright,J.H., Mian,I.S. and Nickerson,J.M. (2000) Mol. Vis., 6, 30–39.

Hermans,J. and Scheraga,H.A. (1961) J. Am. Chem. Soc., 83, 3293–3330.

Hinds,D.A. and Levitt,M. (1994) J. Mol. Biol., 243, 668–682.

Hopp,T.P. and Woods,K.R. (1981) Proc. Natl Acad. Sci. USA, 78, 3824–3828.

Hsieh,M., Collins,E.D., Blomquist,T. and Lustig,B. (2002) J. Biomol. Struct. Dyn., 20, 243–251.

Jones,D.T. (2000) Curr. Opin. Struct. Biol., 10, 371–379.

Koehl,P. and Levitt,M. (2002) Proc. Natl Acad. Sci. USA, 99, 1280–1285.

Larson,S.M., England,J.L., Desjarlais,J.R. and Pande,V.S. (2002) Protein Sci., 11, 2804–2813.

Levitt,M. (1976) J. Mol. Biol., 104, 59–107.

Li,H., Tang,C. and Wingreen,N.S. (1998) Proc. Natl Acad. Sci. USA, 95, 4987–4990.

Liang,J. and Dill,K.A. (2001) Biophys. J., 81, 751–766.

Lustig,B., Bahar,I. and Jernigan,R.L. (1998) Nucleic Acids Res., 26, 5212–5217.

Maritan,A., Micheletti,C., Trovato,A. and Banavar,J.R. (2000) Nature, 406, 287–290.

Marti-Renom,M.A., Stuart,A.C., Fiser,A., Sanchez,R., Melo,F. and Sali,A. (2000) Annu. Rev. Biophys. Biomol. Struct., 29, 291–325.

Mirny,L., Abkevich,V.L. and Shakhnovich,E.I. (1998) Proc. Natl Acad. Sci. USA, 95, 4976–4981.

National Center for Biotechnology Information (2002) http://www.ncbi.nlm.nih.gov/

Pickett,S.D. and Sternberg,M.J.E. (1993) J. Mol. Biol., 231, 825–839.

Pilpel,Y. and Lancet,D. (1999) Protein Sci., 8, 969–977.

Poupon,A. and Mornon,J.-P. (1999) Theor. Chem. Acc., 101, 2–8.

Privalov,P.L. (1996) J. Mol. Biol., 258, 707–725.

Protein Data Bank (2002) http://www.rcsb.org/pdb/

Ptitsyn,O.B. (1998) J. Mol. Biol., 278, 655–666.

Ptitsyn,O.B. and Ting,K.L. (1999) J. Mol. Biol., 291, 671–682.

Richards,F.M. (1974) J. Mol. Biol., 82, 1–14.

Richards,F.M. (1997) Cell. Mol. Life Sci., 53, 790–802.

Scalley-Kim,M., Minard,P. and Baker,D. (2003) Protein Sci., 12, 197–206.

Schneider,T.D., Stormo,G.D. and Gold,L. (1986) J. Mol. Biol., 188, 415–431.

Sharp,K.A., Nicholls,A., Friedman,R. and Honig,B. (1991) Biochemistry, 30, 9686–9697.

Shih,C.T., Su,Z.Y., Gwan,J.F., Hao,B.L., Hsieh,C.H. and Lee,H.C. (2000) Phys. Rev. Lett., 84, 386–389.

Sigler,P.B., Xu,Z., Rye,H.S., Burston,S.G., Fenton,W.A. and Horwich,A.L. (1998) Annu. Rev. Biochem., 67, 581–608.

Tiana,G., Broglia,R.A. and Provasi,D. (2001) Phys. Rev. E, 64, 011904.

Ting,K.L. and Jernigan,R.L. (2002) J. Mol. Evol., 54, 425–436.

Valdar,W.S.J. (2002) Proteins, 48, 227–241.

Young,L., Jernigan,R.L. and Covell,D.G. (1994) Protein Sci., 3, 717–729.

Zhang,J., Chen,R., Tang,C. and Liang,J. (2003) J. Chem. Phys., 118, 6102–6109.

Zou,J. and Saven,J.G. (2000) J. Mol. Biol., 296, 281–294.

Received August 5, 2004; revised January 25, 2005; accepted January 28, 2005.

Edited by Harold Scheraga
