1Departments of Physics and Applied Physics and Laboratory for Advanced Materials, Stanford University, CA 94305 and 2Laboratory of Computational Biology, National Heart, Lung and Blood Institute, National Institutes of Health, Bethesda, MD 20892, USA
3 To whom correspondence should be addressed. E-mail: zhengwj{at}helix.nih.gov
Abstract |
---|
Keywords: fold recognition/linear regression/neural network/small angle X-ray scattering
Introduction |
---|
Fold recognition (see review by Marchler-Bauer and Bryant, 1999) has been a reasonably effective method by which to identify a probable fold from a fold library for an unknown target protein sequence which has no sequence homologue with a known structure. The standard procedure used is to thread the given sequence on to each candidate fold and evaluate the conformational potential energy which is expected to be minimal for the correct fold (potential-based threading). Threading may be done either in gapless mode, where all possible gapless alignments of the target sequence with a given candidate fold are examined, or by making use of multiple sequence alignment using gap penalties, to create an optimal alignment (or alignments) for subsequent energy testing (Jones, 1999). Recently, attempts have been made to incorporate more sequence-based structural predictions into the fold recognition protocol (David et al., 2000). As an example, a 1D profile consisting of predicted secondary structural assignments and solvent accessibility is employed to do prediction-based threading (Rost et al., 1997). Sequential information derived from multiple sequence alignment is also helpful in improving the performance of fold recognition (Rykunov et al., 2000; Williams et al., 2001).
Besides using sequence-based predictions of structural information to supplement potential-based threading, an alternative approach by which to improve standard threading procedures is to exploit additional structural information derived from experiments such as circular dichroism spectroscopy, which are relatively easy to do in comparison with full-scale structural determination (i.e. based on X-ray crystallography or NMR). In this paper we report on the application of small angle X-ray scattering (SAXS) data as a way to impose physical constraints on threading-based protein structure prediction.
SAXS measures X-ray scattering from a protein in a relatively dilute solution. Thus the measurement of SAXS profiles avoids the need to crystallize the protein. SAXS yields physical information about the internal pair distribution of a molecule in its native state. Svergun et al. (2001) have shown that, given a SAXS profile that extends to 5 Å resolution, it is possible to reconstruct a map giving approximate 3D locations of all the residues in the protein. Hence, despite limitations in resolution resulting from the orientational averaging of the molecules in solution and from practical signal-to-noise ratio limitations resulting from radiation damage effects, we believe this physical information has the potential to reduce false positives which naturally occur in fold identification processes based purely on sequence-based information. Recently, we have for the first time explored the application of SAXS-based physical constraints in improving ab initio protein structure prediction (Zheng and Doniach, 2002) and have obtained encouraging results. The present work was motivated by the above preliminary work and was aimed at providing a more comprehensive and in-depth study of this novel method in the context of fold recognition. The following improvements were made compared with the previous work (Zheng and Doniach, 2002): first, instead of an empirical combination of the SAXS-based fitness scores with the other scores, we attempted more systematic optimizations of the combined scores; second, we tested this method on a significantly larger set of proteins (see Materials and methods).
Following our previous study, we used SAXS-derived structural information to compute a fitness score which evaluates the similarity in SAXS profile between that of the candidate fold (derived computationally from the Cα representation of the protein) and of the target protein (measured experimentally or simulated computationally). Because SAXS measurements are made on an intact protein (or protein fragment), gapped sequence alignments would not be expected to lead to a strong SAXS similarity (since extra or missing residues in the candidate structure would distort the SAXS profile). Therefore, in this paper we use this score as a supplementary constraint for fold identification that is based on a gapless version of the standard potential energy-based threading procedure. We use both a linear regression-based method (LR) and a neural network-based method (NN) to find optimized combinations of a set of fitness scores. Use of explicit optimization allows us to quantify the performance of the fold identification procedure. We find that the use of an optimized score which includes SAXS information leads to results which are significantly better than those obtained by using each individual fitness score separately and are also significantly better than results obtained by using an optimized combined score without including the SAXS information.
Besides providing an improved fold identification method, the present approach can also be used directly to identify domains which are structurally similar to the target. This is achieved by combining a fold library for fold recognition and a domain library for structural similarity identification. This approach potentially has the capability of recognizing structural homologues or analogues for proteins which are not related by significant sequence similarity.
Materials and methods |
---|
The protein sequences studied were selected from the list in our previous paper (Zheng and Doniach, 2002) and from the Rosetta test set from Baker's group at the University of Washington (Simons et al., 1999a), after excluding those irregular targets without well-defined secondary structures. These lists cover a variety of fold classes (α, β, α/β) with sequence lengths that vary between 31 and 172. In total we use 11 proteins in our training set and 62 proteins in our test set, which marks a significant extension to the set of sequences studied in our previous work (Zheng and Doniach, 2002).
Generating candidate structures by threading to the Dali domain library
In the Dali Domain Classification (Holm and Sander, 1998), each domain is assigned a Domain Classification number DC_lmnp representing the fold space attractor region (l), globular folding topology (m), functional family (n) and sequence family (p). We used the Dali Domain Definitions (v3.01) published by the Structural Genomics Group at EMBL-EBI in October 2000, which contains 3689 domains with different DC_lmnp numbers. Given a target protein, we first exclude all domain entries that share the same DC_lmnp number with it because these sequences bear >25% sequence identity with the target. Then we continuously thread the target sequence on to each domain which has a longer sequence length and discard residues which do not overlap the target. Thus for a domain with length L1 and a target sequence with length L0 (L1 > L0), L1 − L0 + 1 structural candidates are obtained by threading. A continuous (gapless) threading is not expected to give as good a residue-wise alignment as dynamic programming-based gapped threading, but it is much more efficient and is sufficient to detect the globally correct folds for most targets we study.
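The window enumeration described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `gapless_threading_windows` and the representation of a domain as a list of Cα coordinates are assumptions made for the example, not the code used in this work.

```python
def gapless_threading_windows(target_len, domain_coords):
    """Return every contiguous window of the domain that the target can occupy.

    domain_coords is a list of per-residue coordinates (hypothetical input
    format). A domain of length L1 and a target of length L0 yield
    L1 - L0 + 1 windows; residues outside a window are discarded.
    """
    l1 = len(domain_coords)
    if l1 < target_len:
        return []  # the domain is too short to host the target
    return [
        domain_coords[offset:offset + target_len]
        for offset in range(l1 - target_len + 1)
    ]


# Example: a 10-residue domain and a 7-residue target give 10 - 7 + 1 = 4 candidates.
domain = [(float(i), 0.0, 0.0) for i in range(10)]
candidates = gapless_threading_windows(7, domain)
```

Each returned window is one structural candidate onto which the target sequence is mapped residue by residue, without gaps.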
Definition of native-like structures
In order to define a measure of the closeness of a candidate structure to the native structure of a target protein, we define a native-like structure as lying in one of three classes, depending on the overall quality of the set of all generated candidates:
Prescreening
Before doing full-scale structural evaluation, we perform a simple prescreening using the 1D profile consisting of secondary structural assignments (H for α-helix, E for β-strand and X for loop) and HPN-3 letter translation of the sequence (H for hydrophobic, P for polar, N for neutral), where the classification of hydrophobicity follows Huang et al. (1995). The secondary structural assignment of both target and candidate fold is obtained by the DSSP program (available at http://www.sander.ebi.ac.uk/dssp/).
The alignment of 1D profile between profile A and profile B is done as follows, where A and B are two sequences of either H/E/X or H/P/N:
Given a residue position i, the score AlignAB(i, i) is 1 (a match) if there exist j ∈ [i − 1, i + 1] and k ∈ [i − 1, i + 1] such that A_j = B_k; otherwise AlignAB(i, i) is 0.
To define FSS and FHPN, we compute the fraction (F) of matches for the whole alignment of 1D profile. We keep structures which satisfy the following criteria: FSS > 0.6 and FHPN > 0.8.
After prescreening, about 10⁴–10⁵ candidate structures are kept for further evaluation.
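The matching rule and the two thresholds can be illustrated with the following sketch. It follows the rule as stated above (a match at position i if any letter of A in the window i − 1 to i + 1 equals any letter of B in the same window). The function names, and the choice to truncate the window at the ends of the sequence, are assumptions made for the example.

```python
def window_match(a, b, i):
    """Return 1 if A and B share a letter within positions i-1..i+1, else 0."""
    lo = max(0, i - 1)          # truncate the window at the sequence ends
    hi = min(len(a), i + 2)     # (assumed boundary handling)
    return 1 if set(a[lo:hi]) & set(b[lo:hi]) else 0


def match_fraction(a, b):
    """Fraction F of matched positions for two equal-length 1D profiles."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("profiles must be non-empty and of equal length")
    return sum(window_match(a, b, i) for i in range(len(a))) / len(a)


def passes_prescreen(ss_target, ss_candidate, hpn_target, hpn_candidate):
    """Apply the criteria F_SS > 0.6 and F_HPN > 0.8 quoted in the text."""
    return (match_fraction(ss_target, ss_candidate) > 0.6
            and match_fraction(hpn_target, hpn_candidate) > 0.8)
```

For example, `match_fraction("HEEE", "XHXX")` gives 0.5: positions 0 and 1 match through the shared H, positions 2 and 3 do not.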
Fitness scores evaluation
We use the following fitness scores to evaluate the candidate structures:
1. Combined hydrophobicity and burial score Fhpb. First we define Fhp (HP fitness score; see Huang et al., 1995) based on the hydrophobic-polar (HP) model which counts pairs of contacts between hydrophobic residues. We define two residues to be in contact if the distance between their Cα atoms is <7 Å and they are not sequential neighbors.
Then we define Fburial (burial score; see Huang et al., 1995), which measures the extent to which hydrophobic residues are buried inside the core. It is computed by summing the number of residues within a 10 Å distance cutoff from every hydrophobic residue.
Finally, we combine the above two scores as
![]() | (1) |
2. Statistical contact energy Fstat. We define the statistical energy as the sum of statistical pairwise contact energies between any two residues in contact, based on a 20 × 20 matrix. The pairwise residue–residue interaction energy is calculated based on the frequencies of tertiary contacts in a given PDB structure database. We use the table given in Dima et al. (2000), which we have found to work better than the table used in our previous paper (Zheng and Doniach, 2002).
3. Radius of gyration FRg. We define FRg as the root mean square distance from the center of mass of all Cα atoms along the Cα backbone. This is a useful fitness score for selecting compact structures. Since Rg can be reliably derived from the SAXS data, it partially overlaps the SAXS score defined later.
4. SAXS fitness score FSAXS. This is defined in the next subsection.
5. 1D profile alignment score: FSS, FHPN. This was defined in the previous subsection.
We make further use of these parameters to construct a combined fitness score in addition to the use in prescreening.
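The geometric ingredients of scores 1 and 3 can be sketched directly from Cα coordinates. The sketch below returns raw counts and a radius of gyration only. It does not reproduce the combination in Equation (1) or any sign convention. The function names, the exclusion of a residue from its own burial count, and the use of equal masses are assumptions made for illustration.

```python
import math

CONTACT_CUTOFF = 7.0   # Angstrom, C-alpha contact distance quoted in the text
BURIAL_CUTOFF = 10.0   # Angstrom, burial distance quoted in the text


def hp_contacts(coords, is_hydrophobic):
    """Count hydrophobic-hydrophobic contacts, skipping sequential neighbors."""
    count = 0
    n = len(coords)
    for i in range(n):
        if not is_hydrophobic[i]:
            continue
        for j in range(i + 2, n):  # j = i + 1 is a sequential neighbor
            if is_hydrophobic[j] and math.dist(coords[i], coords[j]) < CONTACT_CUTOFF:
                count += 1
    return count


def burial_count(coords, is_hydrophobic):
    """Sum, over hydrophobic residues, the number of other residues within 10 A."""
    total = 0
    for i, ci in enumerate(coords):
        if is_hydrophobic[i]:
            total += sum(
                1 for j, cj in enumerate(coords)
                if j != i and math.dist(ci, cj) <= BURIAL_CUTOFF  # self excluded (assumed)
            )
    return total


def radius_of_gyration(coords):
    """RMS distance of the C-alpha atoms from their centroid (equal masses assumed)."""
    n = len(coords)
    center = [sum(c[k] for c in coords) / n for k in range(3)]
    return math.sqrt(sum(math.dist(c, center) ** 2 for c in coords) / n)
```

A statistical contact energy in the spirit of score 2 would reuse the same contact loop, adding a table lookup for each residue pair in place of the hydrophobic test.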
SAXS fitness score evaluation
We adopt the score function used by Walther et al. (2000). The profile of scattering intensity associated with a bead model is given as follows using the Debye equation in its pair-distance histogram form:
![]() | (2) |
![]() | (3) |
![]() | (4) |
Here we simulate IE with an all-atom bead model, whereas IM is computed from a Cα-atom-only model without explicit consideration of side chain coordinates, assuming that the side chain atoms sit at the same coordinates as the Cα atom. This approximation in computing IM may reduce the performance of the SAXS score; however, it also increases the robustness of our approach, which may tolerate some degree of measurement error.
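As a rough illustration of a Debye calculation in pair-distance histogram form, the sketch below uses the textbook formula I(q) = Σ_i Σ_j sin(q r_ij)/(q r_ij) with unit form factors for all beads. It is not a transcription of Equations (2)–(4): the bin width, the symbol q and its momentum-transfer convention, and the `profile_mismatch` function (a simple stand-in, not the score of Walther et al.) are all assumptions.

```python
import math


def pair_distance_histogram(coords, bin_width=0.5):
    """Histogram of pair distances (i < j); keys are bin indices, values are counts."""
    hist = {}
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            k = int(math.dist(coords[i], coords[j]) / bin_width)
            hist[k] = hist.get(k, 0) + 1
    return hist


def debye_intensity(hist, n_beads, q, bin_width=0.5):
    """Debye sum over histogram bins, with unit scattering factor per bead."""
    total = float(n_beads)  # the i == j self terms each contribute 1
    for k, count in hist.items():
        x = q * (k + 0.5) * bin_width  # use the bin centre as the distance
        sinc = 1.0 if x == 0.0 else math.sin(x) / x
        total += 2.0 * count * sinc    # each pair appears twice in the double sum
    return total


def profile_mismatch(i_model, i_target):
    """Mean squared difference of two profiles, each scaled by its first point.

    Illustrative stand-in only; not the published score function.
    """
    a = [v / i_model[0] for v in i_model]
    b = [v / i_target[0] for v in i_target]
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
```

At q = 0 every term equals 1, so the intensity reduces to the square of the number of beads, which is a convenient sanity check for any implementation.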
Structural alignment
CRMS1 and CRMS0.8
We use the standard coordinate RMSD (cRMS) to do structural comparisons between our predicted backbone and the corresponding native Cα backbone (McLachlan, 1971). This is done by superimposing the two structures on to each other and minimizing the RMS deviation between 100% or 80% of all the residues. We try both the given Cα backbone and its mirror image in the computation of cRMS and keep the minimum value.
LGA
The LGA program was developed by Zemla for structural comparative analysis of two protein structures (Zemla, 2003). We use LGA to search for the largest (not necessarily continuous) set of equivalent residues between a candidate structure and its native structure deviating by no more than DIST = 5 Å. We use the quality score LGA_Q (Zemla, 2003) to assess the structure comparison.
Linear regression
Given a set of N fitness scores Fi (i = 1, 2,..., N), we determine a linearly weighted sum of them (FLR) by fitting the following linear regression model of the form (Simons et al., 1999b):
![]() | (5) |
![]() | (6) |
We construct a training set of structures: {S(t, j) | 0 ≤ t < T, 0 ≤ j < N} for T targets and N structures per target, then we minimize the following squared error:
![]() | (7) |
![]() | (8) |
![]() | (9) |
![]() | (10) |
![]() | (11) |
The A matrix is properly regularized so that it is non-singular and the above linear equation is uniquely solvable.
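A least-squares fit of this kind can be sketched as follows. Because Equations (5)–(11) are not reproduced here, the sketch makes its own assumptions: a model with no intercept, a small ridge term added to the diagonal as one possible way of keeping the matrix non-singular, and hypothetical quality values as the regression target. The function names are illustrative.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_linear_weights(scores, quality, ridge=1e-8):
    """Weights w minimising sum_s (sum_i w_i F_i(s) - quality(s))^2.

    scores: one row of fitness scores per training structure.
    quality: target value per structure (hypothetical quality function).
    ridge: small diagonal term that keeps the normal matrix invertible (assumed).
    """
    n = len(scores[0])
    a = [[sum(row[p] * row[q] for row in scores) + (ridge if p == q else 0.0)
          for q in range(n)] for p in range(n)]
    b = [sum(row[p] * y for row, y in zip(scores, quality)) for p in range(n)]
    return solve_linear_system(a, b)


def combined_score(weights, scores_for_structure):
    """Linear combination of the individual scores for one candidate."""
    return sum(w * f for w, f in zip(weights, scores_for_structure))
```

Once the weights are fitted on the training set, `combined_score` is applied unchanged to every candidate of a test target to produce a ranking.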
Multi-layer feed forward neural network
We use a typical three-layer feed-forward neural network (Figure 2) to do fold recognition: the input layer consists of six neurons corresponding to the six fitness scores to be compiled for evaluation. The scores are rescaled by a sigmoid function f(x) = 1/(1 + e^(−x)) to values between 0 and 1 at the input layer. The hidden layer has five neurons, which is sufficient for six input variables, and the output layer has two neurons corresponding to positive and negative, respectively. We then compute the ratio between the two outputs and rank the candidates by this ratio P/N: the higher it is, the more favorable is the candidate.
The training is performed using the standard back-propagation algorithm and all link-associated weights are adjusted as a result of the learning process. The training set is composed of 11 proteins from Set (A). For each protein from the training set, 5000 candidate structures are extracted from its set of all candidates as ranked by their cRMS1, which includes all the native-like candidates with cRMS1 < 6 Å. The choice of 5000 results from a tradeoff between computing efficiency and the diversity of training data. The target values for both outputs are functions of cRMS1: Positive output is set to 1 if cRMS1 < 4 Å, 0 if cRMS1 > 6 Å and linearly interpolated in between; the negative output is set to 1 minus the positive output.
The learning process goes through the training set multiple times until 90% of the training targets have at least one native-like candidate ranking in top 10 by the ratio P/N. This choice of learning termination criteria ensures sufficient training without over-learning.
The validation of performance is done by running the neural network on a test set of 32 proteins from Set (A).
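The training targets and the 6-5-2 forward pass described above can be sketched as follows. Only the target rule (1 below 4 Å, 0 above 6 Å, linear in between) and the layer sizes come from the text. The bias terms, the sigmoid output activation, the weight layout and the function names are assumptions for illustration, and back-propagation training is not shown.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def training_targets(crms1):
    """(positive, negative) target outputs as a function of cRMS1 in Angstrom."""
    if crms1 < 4.0:
        positive = 1.0
    elif crms1 > 6.0:
        positive = 0.0
    else:
        positive = (6.0 - crms1) / 2.0  # linear interpolation between 4 and 6 A
    return positive, 1.0 - positive


def forward(scores, w_hidden, w_out):
    """Forward pass: 6 inputs -> 5 hidden -> 2 outputs.

    Each weight row holds one weight per input followed by a bias (assumed layout).
    """
    x = [sigmoid(s) for s in scores]  # inputs rescaled to (0, 1)
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row[:-1], x)) + row[-1])
              for row in w_hidden]
    out = [sigmoid(sum(w * h for w, h in zip(row[:-1], hidden)) + row[-1])
           for row in w_out]
    return out[0], out[1]


def pn_ratio(scores, w_hidden, w_out):
    """Ranking score: positive output divided by negative output."""
    positive, negative = forward(scores, w_hidden, w_out)
    return positive / negative
```

Candidates are then sorted by `pn_ratio` in descending order, so that the most favorable candidate comes first.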
Results and discussion |
---|
The method used in this paper consists of the definition of a number of fitness scores with which to assess an alignment of a target sequence with a set of 10⁴–10⁵ candidate structures generated by gapless threading against the set of folds in the Dali domain library. An optimized combination of these fitness scores is then developed by use of two optimization methods, linear regression and neural network based, on a training set of 11 target sequences.
Once these optimized combinations of fitness scores have been generated, we apply them to a set of 62 test sequences for which we generate 10⁴–10⁵ candidates per target sequence. We then assess the performance of the fitness scores taken individually and of their optimized combinations, by computing their average Z-score for the native-like subset of candidate structures (see Materials and methods for the definition of native-like) and by finding the best Z-score rank of the native-like candidate structures.
As another measure of the effectiveness of the optimized fitness scores, we also determine whether at least one of the structures with the top 10 Z-scores is a structural neighbor of the target structure, as measured by the Dali structure alignment tool.
Generating candidate structures via gapless threading
To generate a set of candidate structures for training and evaluation of our fitness scores, we perform gapless threading of each of the target sequences against the Dali domain library (Holm and Sander, 1998), by a procedure which is described in detail in Materials and methods. The results are collected in Table I. This procedure generates sets of 10⁴–10⁵ candidates for each sequence. For small proteins with sequence length <80 these candidate sets are found to contain native-like structures of type (A) (cRMS1 < 6 Å; see Materials and methods). For longer sequences (>90) the candidate sets contain structures with partially good structural alignments of type (C) (detectable by the LGA structure alignment tool with LGA_Q > 1.9, meaning significant structural similarity; see Materials and methods).
We select 11 proteins from the (A) set to serve as a training set for both the linear regression and neural network procedures. The rest of the targets are used as a test set for evaluating the performance of our SAXS-aided fold recognition protocol. Efforts are made to ensure that no protein in the test set is a sequence homologue (with >25% sequence identity) of any protein in the training set.
Z-score evaluation of individual scores
In order to select native-like structures from the set of candidates, we need to define fitness scores (see Materials and methods) that are capable of discriminating them against non-native ones. It is then desirable to combine these scores to optimize the overall performance.
Before exploring the combination of multiple score functions we first study them individually. In total six fitness scores (Fhpb, FRg, FSAXS, Fstat, FSS and FHPN) are used, which are described in detail in Materials and methods. They can be classified into three types: energy based (Fhpb, which essentially evaluates how well the hydrophobic residues are buried inside a compact core, and Fstat, which is a statistical pairwise contact energy derived from a protein structure database), 1D profile based (secondary structure assignment profile score FSS and hydrophobic-polar profile score FHPN) and SAXS based (FRg and FSAXS). The main purpose of this study is to focus on the evaluation of the SAXS-based scores and their ability, in combination with the other scores, to improve the overall discrimination power of fold recognition.
For a given fitness score F and a given native-like structure s, we can define the following Z-score:
![]() | (12) |
In Table I of the Supplementary data (available at PEDS Online), we list the average and the optimal Z-scores and the best Z-score rank of the native-like structures for each individual fitness score F. One can see that FSAXS (with average Zavg = 0.776) and FRg (with average Zavg = 1.289) do possess a good discrimination power to select native-like structures and that they are comparable to Fhpb (with average Zavg = 0.906) and Fstat (with average Zavg = 0.739). Therefore, SAXS-based scores indeed have the potential to help to improve the selection of native-like structures in combination with the other more standard score functions.
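A Z-score and a best rank of the kind tabulated here can be computed as in the sketch below. Since Equation (12) is not reproduced, the sign convention is an assumption: the score is oriented so that a larger Z means a more favorable candidate, with a flag for scores where lower values are better. The function names and the use of the population standard deviation are likewise assumptions.

```python
from statistics import mean, pstdev


def z_score(value, all_values, higher_is_better=True):
    """Standardised score of one candidate relative to the whole candidate set."""
    z = (value - mean(all_values)) / pstdev(all_values)
    return z if higher_is_better else -z


def best_rank(native_like_values, all_values, higher_is_better=True):
    """Rank (1 = best) of the best-scoring native-like candidate."""
    if higher_is_better:
        best = max(native_like_values)
        better = sum(1 for v in all_values if v > best)
    else:
        best = min(native_like_values)
        better = sum(1 for v in all_values if v < best)
    return better + 1


def average_z(native_like_values, all_values, higher_is_better=True):
    """Average Z-score over the native-like subset."""
    return mean(z_score(v, all_values, higher_is_better) for v in native_like_values)
```

An average Z-score near zero means that the native-like candidates are indistinguishable from the bulk of the candidate set under that score.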
Linear regression: performance evaluation of FLR
To find an optimal linear combination of the individual scores that we have just evaluated, we use the linear regression (LR) method (Simons et al., 1999b), which is a simple and effective way of optimizing linear decision making. The motivation is to minimize the overall square deviation between a linear combination of all scores and a prediction quality function (see Materials and methods).
The coefficients for the optimal linear combination are evaluated for the training set of 11 target proteins by minimizing this function when averaged over those 5000 candidate structures closest in cRMS to each of the targets in the training set as explained in Materials and methods.
To evaluate the significance of SAXS scores in addition to other standard score functions, we run an LR for all the score functions excluding SAXS scores (FSAXS and FRg) and then compare it with the LR results obtained when all score functions are included. Here is a summary of the results.
On average, the addition of the SAXS scores improves the Z-scores of FLR from 2.066 to 2.319. Assuming that FLR follows a Gaussian distribution approximately, then this improvement corresponds to a reduction of the p-value from 0.019 to 0.01 (or roughly by a factor of 2), which is fairly significant.
Out of 11 targets in the training set, 11 (100%) show better FLR performance than any individual score F and 10 (90.9%) show better performance for FLR with SAXS information than without it.
Out of 32 targets in the test set [also from Set (A)], 19 (59.4%) show better FLR performance than any individual F and 24 (75%) show better performance for FLR with SAXS information than without it. Therefore, LR provides a reasonably optimal way of combining multiple fitness scores into one score and manages to get the best of all performance in most cases. Furthermore the incorporation of SAXS information improves LR's performance further with high probability (75%). Notably, in most of the cases where FSAXS fails to improve the performance further, FLR has already achieved a good Z-score without SAXS data.
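The conversion from an average Z-score to a p-value quoted in this summary is consistent with a one-sided Gaussian tail probability, which can be checked with a short calculation. The function name below is illustrative.

```python
import math


def one_sided_p_value(z):
    """Upper-tail probability of a standard normal variable exceeding z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


p_without_saxs = one_sided_p_value(2.066)  # about 0.019
p_with_saxs = one_sided_p_value(2.319)     # about 0.010
ratio = p_without_saxs / p_with_saxs       # roughly a factor of 2
```

The two values agree with the figures of 0.019 and 0.01 given above to within rounding.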
In the light of the significantly better performance of FLR, it is natural to ask how much each individual score contributes to this improvement. To shed some light on this issue, we also show the linear correlation coefficient between each individual score F and FLR which measures the relevance of each F to FLR (Table II). It is evident that FSAXS [average correlation coefficient (c.c.) = 0.367] and FRg (average c.c. = 0.584) correlate better with FLR than the other energy-based scores such as Fhpb (average c.c. = 0.104) and Fstat (average c.c. = 0.100). This suggests that FLR's significant improvement in discrimination of native-like structures is to a substantial extent due to the contribution of SAXS information.
Neural network: performance evaluation of FNN
Neural networks (NNs) have found extensive application in bioinformatics for their well-known capability of learning complicated patterns of relationships among multiple variables characteristic of biological knowledge of gene sequences and structures. There has been some application of NNs in fold recognition (Jones, 1999; Ding and Dubchak, 2001). Here we use a typical three-layer feed-forward NN to explore an optimal exploitation of the same six fitness scores used in LR (including SAXS scores). In comparison with LR, which is a typical linear decision procedure, non-linearity is introduced in NNs with the use of the sigmoid function (see Materials and methods); an NN is therefore not limited simply to producing a weighted linear combination of the original variables and is thus potentially more flexible in capturing complex patterns. The NN in use has six input variables corresponding to six scores: Fhpb, FRg, FSAXS, Fstat, FSS and FHPN; each is normalized by subtracting the statistical average and then dividing by the standard deviation. There are two outputs, one corresponding to positive and the other to negative. To make comparisons with LR's combined score function FLR, we introduce a new score function FNN, which is the ratio between the positive output and the negative one, and rank structure candidates by this ratio. Similarly to the evaluation procedure used for FLR, we run the NN training and test with and without SAXS scores for comparison. In Table I of the Supplementary data, we list the Z-scores of FNN. The training set for NN is the same as that used for LR. Here is a summary of the results.
On average, the addition of the SAXS scores improves the Z-scores of FNN from 1.550 to 2.033. Again assuming that FNN follows a Gaussian distribution approximately, then this improvement corresponds to a reduction of the p-value from 0.0606 to 0.0212 (or roughly by a factor of 3), which is fairly significant.
Out of 11 targets in the training set, 11 (100%) show better FNN performance than any individual score F and 10 (90.9%) show better performance with SAXS information than without it.
Out of 32 targets in the test set [also from Set (A)], 21 (65.6%) show better FNN performance than any individual F and 24 (75%) show better performance with SAXS information than without it. Therefore, NN shows a similar improvement to that found for LR and again SAXS is shown to be valuable in helping to improve the performance of the NN.
Testing FLR and FNN in native-like structure selection
After obtaining the optimal compilation of our fitness scores, we tested their performance in discriminating native-like structures from the candidate sets generated by our threading protocol. We list the best Z-score rank of native-like structures in Table III.
Testing FLR and FNN in structural neighbor identification
As an alternative test of the effectiveness of the performance of FLR and FNN, we measured which of the candidate structures in the top 10 of the Z-score ranked structures is also a structural neighbor (SN) of the actual protein as measured by the Dali structure alignment tool (alignment Z-score >2). This is a more challenging task than finding structures with low cRMS because the SNs are more remotely related to the target structure and the simple cRMS1 does not detect the partial structural similarities that are detected by the Dali structural alignment. Since our scores are based mostly on the structure as a whole and are sensitive to possible fragmentation of the structure, their ability to discriminate native-like partial structural features is expected to be weaker.
In spite of this, the results in Table III still show that we have achieved moderate success in identifying correct SNs among the top 10 Z-score candidates: in seven (six) out of all 11 targets from the training set, at least one candidate from a correct SN is ranked in the top 10 by FLR (FNN). In 11 (11) out of all the 16 targets for which there exist correct SNs in the set of all candidates from the test set (the rest of set A), at least one native-like structure is ranked in the top 10 by FLR (FNN). In 10 (11) out of all 26 targets from the harder test set [Sets (B) and (C)], at least one native-like structure is ranked in the top 10 by FLR (FNN). This suggests that the success rate of SN identification is between 60 and 70% for relatively easy targets, whereas for harder targets it drops to about 40%, which is still reasonably good.
We also give the p-values for the successful cases in Table III to assess the statistical significance of selecting an SN in the top 10. For some of the target proteins, the p-value is relatively high because of the large number of SNs for those proteins; for most others, the p-value is fairly low and suggests high statistical significance.
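The dependence of such a p-value on the number of SNs can be illustrated with a simple null model. The sketch below assumes that, by chance, the top 10 are a uniformly random subset of the candidates, which gives a hypergeometric probability of finding at least one SN. This null model and the function name are assumptions for illustration and may differ from the calculation behind Table III.

```python
from math import comb


def chance_of_top_hit(n_candidates, n_neighbors, top=10):
    """Probability that a random top-`top` selection contains at least one SN."""
    top = min(top, n_candidates)
    # Probability that none of the selected candidates is an SN.
    p_none = comb(n_candidates - n_neighbors, top) / comb(n_candidates, top)
    return 1.0 - p_none


# With 10^4 candidates, 5 SNs give a small chance probability, 500 SNs a large one.
p_few = chance_of_top_hit(10_000, 5)
p_many = chance_of_top_hit(10_000, 500)
```

Under this model the chance probability grows quickly with the number of SNs, which is why targets with many SNs yield relatively high p-values.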
Compared with the previous test on native-like structure selection, this test is more relevant in the context of functional genomics based on structural homology relations. As is well known, a specific biological function of proteins is in general executed by a limited number of specific structural features (such as an enzyme's binding site) which are only part of the native structure as a whole. Therefore, the conservation of such partial structural features rather than the whole structure is more relevant to the conservation of function. In this context the present SN selection protocol seems to be fairly promising.
Applications of structural neighbor identification
The identification of correct SNs can provide clues to the functional study of a target protein. To illustrate this, we now discuss several such examples for targets we have studied for which correct SNs are selected and where we see interesting functional connections:
In summary, the above examples demonstrate that conservation in protein structures may imply evolutionary relationships and that structurally similar proteins may possibly share similar or related functions. Therefore, by identifying SNs which are structurally similar to a given target, we may gain some insight regarding the biochemical function of the target. Work in this direction is expected to be very fruitful.
Conclusion
We have carried out a systematic study of the use of structural information derived from SAXS measurements to improve fold recognition. The SAXS data for a target protein can serve as a structural fingerprint of its native conformation and can therefore be used to construct a similarity-based fitness score to evaluate candidate structures generated by threading. To combine the SAXS scores with the standard energy scores and other 1D profile-based scores, we have used both a linear regression method and a neural network approach from which we obtain optimal combined fitness scores and apply them to the ranking of candidate structures. Our results show that the use of SAXS scores combined with gapless threading significantly improves the performance of fold recognition. We also demonstrate the effectiveness of this protocol in selecting structural neighbors of target proteins, which can potentially aid the study of their biochemical functions.
The above results support the idea that SAXS-based fitness scores contain new structural information beyond that in the energy-based scores, since the energy scores only take into account spatially short-range native contacts (with inter-residue distance <7 Å) whereas the SAXS profile contains distance distribution information up to the size of the protein (although residue identities are not resolved). Indeed, at the angle cutoff of Smax = 0.12 Å⁻¹, the SAXS measurement is able to resolve the shape information (but not the detailed secondary structures). Therefore, besides the compactness information from Rg, the additional filtering capacity of FSAXS is mostly due to the shape information encoded in the SAXS data. The performance of FSAXS for a given target protein may thus depend on the uniqueness of its shape.
To improve the SAXS-aided fold recognition further, it is desirable to replace gapless threading with more sophisticated gapped threading algorithms with inputs from the multiple sequence alignments (e.g. by PsiBlast; see Altschul et al., 1997). This will significantly enrich the native-like structures in the generated set of candidate structures compared with those obtained by gapless threading. We note that the threading-derived sequence–structure alignments must be further used to build a set of complete structural models before the SAXS scores can be assessed. This is not a straightforward task and may need ab initio modeling for those parts of the target protein for which no significant alignment with known structures is found.
In addition to the obvious application of this approach in the post-structural genomics age to help in the identification of the structures of specific genome sequences, it also has potential applications in the implementation of structural genomics projects. Given a set of proteins which have been shown by sequence alignment search to lack sequence homology to proteins of known structure, the use of SAXS data as an input, together with a fold recognition protocol, may be applied to identify a significant number of targets with structural similarity to known proteins even though they lack sequence homology. This approach will then help in target prioritization, either by confirming the putative structural homologues or analogues identified by the SAXS-based threading procedure or by suggesting target sequences with hitherto unknown folds. The SAXS-based technique may therefore help in reducing bottlenecks in high-throughput genomics projects by focusing attention on targets of specific biological or structural interest.
For future work, we plan to improve the SAXS-based protocol by using more accurate models which include side chains and other backbone atoms, in combination with experimentally obtained SAXS data, which may be complicated by measurement errors and the effects of hydration.
Acknowledgements |
---|
References |
---|
Bernstein,F.C., Koetzle,T.F., Williams,G.J.B., Meyer,E.F.,Jr, Brice,M.D., Rogers,J.R., Kennard,O., Shimanouchi,T. and Tasumi,M. (1977) J. Mol. Biol., 112, 535–542.
Burley,S.K. (2000) Nat. Struct. Biol., 7, Suppl., 932–934.
David,R., Korenberg,M.J. and Hunter,I.W. (2000) Pharmacogenomics, 1, 445–455.
Dima,R., Settanni,G., Micheletti,C., Banavar,J. and Maritan,A. (2000) J. Chem. Phys., 112, 9151–9166.
Ding,C.H. and Dubchak,I. (2001) Bioinformatics, 17, 349–358.
Holm,L. and Sander,C. (1998) Proteins, 33, 88–96.
Huang,E.S., Subbiah,S. and Levitt,M. (1995) J. Mol. Biol., 252, 709–720.
Jones,D.T. (1999) J. Mol. Biol., 287, 797–815.
Marchler-Bauer,A. and Bryant,S.H. (1999) Proteins, 37, 218–225.
McLachlan,A.D. (1971) J. Mol. Biol., 61, 409–424.
Rost,B., Schneider,R. and Sander,C. (1997) J. Mol. Biol., 270, 471–480.
Rykunov,D.S., Lobanov,M.Y. and Finkelstein,A.V. (2000) Proteins, 40, 494–501.
Simons,K.T., Bonneau,R., Ruczinski,I. and Baker,D. (1999a) Proteins, 3, 171–176.
Simons,K.T., Ruczinski,I., Kooperberg,C., Fox,B.A., Bystroff,C. and Baker,D. (1999b) Proteins, 34, 82–95.
Stevens,R.C., Yokoyama,S. and Wilson,I.A. (2001) Science, 294, 89–92.
Svergun,D.I., Petoukhov,M.V. and Koch,M.H. (2001) Biophys. J., 80, 2946–2953.
The Genome International Sequencing Consortium (2001) Nat. Biotechnol., 409, 860–921.
Venter,J.C. et al. (2001) Science, 29, 1304–1351.
Walther,D., Cohen,F.E. and Doniach,S. (2000) J. Appl. Crystallogr., 33, 350–363.
Williams,M.G. et al. (2001) Proteins, 45, Suppl. 5, 92–97.
Zemla,A. (2003) Nucleic Acids Res., 31, 3370–3374.
Zheng,W.J. and Doniach,S. (2002) J. Mol. Biol., 316, 173–187.
Received November 10, 2004; revised March 7, 2005; accepted March 25, 2005.
Edited by Fred Cohen