Automated clustering of ensembles of alternative models in protein structure databases

Francisco S. Domingues1, Jörg Rahnenführer and Thomas Lengauer

Max-Planck-Institut für Informatik, Stuhlsatzenhausweg 85, 66123 Saarbrücken, Germany

1 To whom correspondence should be addressed. E-mail: doming{at}mpi-sb.mpg.de


    Abstract
 
Experimentally determined protein structures have been classified in different public databases according to their structural and evolutionary relationships. Frequently, alternative structural models, determined using X-ray crystallography or NMR spectroscopy, are available for a protein. These models can present significant structural dissimilarity. Currently there is no classification available for these alternative structures. In order to classify them, we developed STRuster, an automated method for clustering ensembles of structural models according to their backbone structure. The method is based on the calculation of carbon alpha (Cα) distance matrices. Two filters are applied in the calculation of the dissimilarity measure in order to identify both large and small (but significant) backbone conformational changes. The resulting dissimilarity value is used for hierarchical clustering and partitioning around medoids (PAM). Hierarchical clustering reflects the hierarchy of similarities between all pairs of models, while PAM groups the models into the ‘optimal’ number of clusters. The method has been applied to cluster the structures in each SCOP species level and can be easily applied to any other sets of conformers. The results are available at: http://bioinf.mpi-sb.mpg.de/projects/struster/.

Keywords: alternative models/clustering/distance matrix/hierarchical clustering/structure classification


    Introduction
 
Different databases are currently available for the classification of proteins according to their structural and evolutionary relationships. In particular, SCOP (Murzin et al., 1995), CATH (Orengo et al., 1997) and Dali (Holm and Sander, 1996) provide convenient access to the known structures deposited in the Protein Data Bank (PDB) (Berman et al., 2000) and facilitate the understanding of the structural and evolutionary relationships between proteins. In addition, they provide a definition of protein domains based on structural or evolutionary relationship. The SCOP and CATH databases provide a hierarchical classification of the domains in different levels, where the lower levels correspond to a structural domain from a protein of a certain organism (SCOP level Species) or from a set of domains with identical sequence (CATH level Identical). In the Dali database, the protein structures are grouped according to sequence similarity. For each group a representative is selected and used to perform an all-against-all 3D structure comparison.

It is now common that alternative experimental models of the same protein are available in the PDB. These models can display considerable structural differences as, in general, they correspond to different crystal forms, different physicochemical conditions, different mutants and proteins forming different complexes or binding to different ligands. Currently no database is available that classifies these alternative models according to their structure relationships. In the current SCOP release (1.65, December 2003), 75% of sets at the species level have at least two entries (two or more alternative structural models for a protein domain) and on average there are 6.7 entries per species level. The problem is becoming more relevant as the average number of alternative structures for a given domain is expected to increase.

In recent years, clustering techniques have been applied to classify protein structures. Clustering has been used to automate the classification of proteins in different folds and families or in the analysis of the trajectories from molecular dynamics simulations; see, for example, May (1999), Laboulais et al. (2002) and Choi et al. (2004). The method of Carugo and Pongor (2002), in particular, is based on the comparison of the distributions of Cα coordinates between two proteins to estimate the structural similarity of different proteins in an efficient way. Clustering techniques have also been applied to determine representatives for ensembles of NMR-derived structures (Kelley et al., 1996). The all-atom root-mean-square deviation (rmsd) is used as a dissimilarity measure for hierarchical clustering. The models are grouped into different clusters and a representative for each cluster is given (model closest to the centroid of each cluster). The OLDERADO database (Kelley and Sutcliffe, 1997) provides the results for the NMR ensembles available in the PDB. So far, these methods have not been applied to clustering alternative structure models (determined by protein crystallography or NMR spectroscopy) available for each protein domain in the PDB.

Here we propose STRuster, a method for clustering alternative structural models corresponding to different structure determination experiments. The structures are classified according to backbone structure similarity using Cα distance matrices. The dissimilarity measure used for clustering is based on the Euclidean distance for each pair of Cα coordinates. Filters are applied in order to render the method sensitive to a wide range of backbone conformational changes. The method has been applied to each SCOP species level and the results are available online. These results can be useful for guiding further structure determination experiments, in the design and interpretation of mutational experiments, in the selection of models for docking or in the selection of templates for structure prediction. More generally, the results can be useful in the selection of a non-redundant set of protein structures.


    Materials and methods
 
Structure data set

The SCOP classification database, release 1.65 (December 2003), is used. Analysis is restricted to the first seven classes, the true classes. Therefore, we excluded coiled-coil proteins, low-resolution protein structures, peptides and designed proteins, which are not true classes. Of 7705 sets corresponding to the SCOP species level, 4187 with at least three domain structural models (entries) are clustered. For each set, entries are compared with the one with the largest number of residues. Only the entries with at least 80% of the number of residues and at least 90% sequence identity to the largest entry are considered. The program ALIGN (Myers and Miller, 1989) is used to align the sequences and to determine the equivalent residues. The final number of species sets to be clustered is 3716, comprising a total of 36 531 entries. The ASTRAL SCOP 1.65 PDB-style files (Chandonia et al., 2004) are used as the source of the coordinates for each SCOP entry.
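The two selection criteria can be sketched in Python as follows. This is only an illustration under stated assumptions: the input is a pair of already aligned sequences with `-` as the gap character, sequence identity is taken over the aligned (gap-free) positions, and the function name `passes_filters` is hypothetical. The text does not specify how ALIGN output is parsed or how the identity is normalized.

```python
def passes_filters(aligned_ref, aligned_entry, min_length=0.80, min_identity=0.90):
    """Apply the 80% length and 90% identity criteria to one aligned pair.

    aligned_ref is the largest entry of the set, aligned_entry the candidate.
    Identity is computed over positions without a gap (an assumption).
    """
    ref_len = sum(1 for c in aligned_ref if c != "-")
    entry_len = sum(1 for c in aligned_entry if c != "-")
    paired = [(r, e) for r, e in zip(aligned_ref, aligned_entry)
              if r != "-" and e != "-"]
    if not paired:
        return False
    identity = sum(1 for r, e in paired if r == e) / len(paired)
    return entry_len >= min_length * ref_len and identity >= min_identity
```

An entry that is too short or too divergent relative to the largest entry is rejected; all others are kept for clustering.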

Dissimilarity measure

For each SCOP species set, the Cα distance matrices are calculated for all entries. Consider the Cα coordinates for residue i, $(x_i, y_i, z_i)$. The Euclidean distance between the Cα atoms of residues i and j in entry a is

$$D_{ij}(a) = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2 + (z_i - z_j)^2}$$

In order to reduce the influence of differences in large distances associated with extensive conformational changes, a first filter is applied with cut-off F1, resulting in $D'_{ij}(a)$:

$$D'_{ij}(a) = \begin{cases} D_{ij}(a) & \text{if } D_{ij}(a) \le F_1 \\ F_1 & \text{otherwise} \end{cases}$$

For each pair of entries a, b the absolute difference is then calculated for each residue pair: $\Delta_{ij}(a,b) = |D'_{ij}(a) - D'_{ij}(b)|$. Only residues that can be aligned to the largest entry in the set (used as reference) are considered. If one of the entries includes an insertion, the corresponding residues in the insertion are not considered. A second filter is then applied with cut-off F2 in order to restrict the analysis to ‘significant’ structural differences:

$$\Delta'_{ij}(a,b) = \begin{cases} 1 & \text{if } \Delta_{ij}(a,b) > F_2 \\ 0 & \text{otherwise} \end{cases}$$

In this study, we set F1 = 14.0 Å and F2 = 1.0 Å. The matrix M contains the dissimilarity values of all pairs involving the N entries in the set, where M(a,b) corresponds to the dissimilarity between entries a and b with L aligned residues:

$$M(a,b) = \sum_{i=1}^{L} \sum_{j=1}^{L} \Delta'_{ij}(a,b)$$
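A minimal Python sketch of this dissimilarity measure is given below. It rests on two assumptions that are one reading of the description, chosen to be consistent with the normalization R(a,b) = M(a,b)/L² used later: distances above F1 are capped at F1, and M(a,b) counts the ordered residue pairs whose capped distances differ by more than F2. The function names are illustrative and the input is assumed to contain only the aligned residues, in the same order for both entries.

```python
import math

F1 = 14.0  # cut-off for large distances, as set in the text
F2 = 1.0   # threshold for a 'significant' distance difference

def capped_distance_matrix(coords, f1=F1):
    """C-alpha distance matrix with every distance above f1 replaced by f1."""
    return [[min(math.dist(p, q), f1) for q in coords] for p in coords]

def dissimilarity(coords_a, coords_b, f1=F1, f2=F2):
    """Count ordered residue pairs whose capped distances differ by more than f2."""
    if len(coords_a) != len(coords_b):
        raise ValueError("entries must contain the same aligned residues")
    da = capped_distance_matrix(coords_a, f1)
    db = capped_distance_matrix(coords_b, f1)
    n = len(coords_a)
    return sum(1 for i in range(n) for j in range(n)
               if abs(da[i][j] - db[i][j]) > f2)

# Invented four-residue example: the last residue moves in the second model.
model_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
model_b = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (7.6, 3.8, 0.0)]
m_ab = dissimilarity(model_a, model_b)
```

In this toy case the moved residue changes its distance to the first two residues by more than 1 Å but keeps its distance to the third, so four ordered pairs are counted.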

Clustering

The R programming environment for data analysis (version 1.8.1) is used for clustering (http://www.r-project.org/). The dissimilarity matrix M is used as input for two alternative clustering methods. The first is a hierarchical method, using group average agglomeration (Gordon, 1999). Dendrograms are generated for visualization of the hierarchical dependencies in the data. The second method, partitioning around medoids (PAM), is applied in order to obtain the optimal number of clusters where the entries are grouped in a robust way (Kaufman and Rousseeuw, 1990; Struyf et al., 1997).
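Group average agglomeration can be sketched in a few lines of Python. This is not the R code used in the study; it is a plain, unoptimized illustration in which the function name `average_linkage` and the example matrix are invented. At each step the two clusters with the smallest mean pairwise dissimilarity are merged, and the merge heights are what a dendrogram displays.

```python
from itertools import combinations

def average_linkage(m):
    """Agglomerate entries 0..N-1 using group-average linkage on matrix m.

    Returns the merges in order as (members_1, members_2, height) tuples.
    """
    clusters = [(i,) for i in range(len(m))]
    merges = []

    def avg(c1, c2):
        # Mean dissimilarity over all pairs with one entry in each cluster.
        return sum(m[a][b] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: avg(*pair))
        merges.append((c1, c2, avg(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)]
        clusters.append(tuple(sorted(c1 + c2)))
    return merges

# Invented symmetric dissimilarity matrix for four entries.
example_m = [
    [0, 2, 10, 12],
    [2, 0, 10, 12],
    [10, 10, 0, 4],
    [12, 12, 4, 0],
]
example_merges = average_linkage(example_m)
```

For the example matrix, entries 0 and 1 merge first, then 2 and 3, and the two pairs join last at a much larger height, which corresponds to two well-separated clusters.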

PAM is a partitioning algorithm and can be regarded as a generalization of K-means clustering to arbitrary dissimilarity matrices. The goal is to minimize the objective function $\sum_{i=1}^{N} \min_{t=1,\dots,k} M(a_i, m_t)$, where the sum is taken over all entries $a_1,\dots,a_N$ in the protein set and $m_1,\dots,m_k$ are k appropriately chosen representatives (medoids) from the set. The algorithm consists of two steps. In the BUILD step, k initial medoids are sequentially selected. In the SWAP step, the objective function is minimized iteratively by replacing one medoid with another entry. This step is repeated until convergence.
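The two steps can be sketched in Python as below. This is a simplified illustration, not the implementation used in the study: BUILD greedily adds the entry that lowers the objective the most, and SWAP accepts the first exchange that lowers the objective rather than searching for the best exchange in each pass. The function names and the example matrix are invented, and k is assumed not to exceed the number of entries.

```python
def total_cost(m, medoids):
    """Objective: sum over entries of the dissimilarity to the nearest medoid."""
    return sum(min(m[a][t] for t in medoids) for a in range(len(m)))

def pam(m, k):
    """Return (medoids, labels) for k clusters on dissimilarity matrix m."""
    n = len(m)
    medoids = []
    # BUILD: add, one at a time, the entry that lowers the objective the most.
    for _ in range(k):
        candidates = [a for a in range(n) if a not in medoids]
        medoids.append(min(candidates,
                           key=lambda a: total_cost(m, medoids + [a])))
    # SWAP: exchange a medoid for a non-medoid while this lowers the objective.
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for a in range(n):
                if a in medoids:
                    continue
                trial = medoids[:pos] + [a] + medoids[pos + 1:]
                if total_cost(m, trial) < total_cost(m, medoids):
                    medoids, improved = trial, True
    # Assign every entry to its nearest medoid.
    labels = [min(range(k), key=lambda c: m[a][medoids[c]]) for a in range(n)]
    return medoids, labels

# Invented symmetric dissimilarity matrix for four entries.
pam_m = [
    [0, 2, 10, 12],
    [2, 0, 10, 12],
    [10, 10, 0, 4],
    [12, 12, 4, 0],
]
pam_medoids, pam_labels = pam(pam_m, 2)
```

Because the objective strictly decreases with every accepted swap and the number of medoid sets is finite, the loop terminates.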

The silhouette width value is a measure of cluster validity (Rousseeuw, 1987) and is used to select the best number of clusters obtained with the PAM algorithm. Assume that we have a clustering of N protein entries into k clusters, such that an entry a belongs to cluster C of size r. The average dissimilarity between a and all other entries in cluster C is

$$d(a,C) = \frac{1}{r-1} \sum_{b \in C,\, b \neq a} M(a,b)$$

The average dissimilarity of a to all entries b that belong to another cluster U ≠ C of size t is

$$d(a,U) = \frac{1}{t} \sum_{b \in U} M(a,b)$$

The dissimilarity between a and the closest cluster that is different from C can be defined as

$$d'(a) = \min_{U \neq C} d(a,U)$$

The silhouette width s(a) for entry a and the average silhouette width for the set are defined as

$$s(a) = \frac{d'(a) - d(a,C)}{\max\{d(a,C),\, d'(a)\}}, \qquad \bar{s} = \frac{1}{N} \sum_{a} s(a)$$


Silhouette values lie in the range [–1, 1]. Entries with a silhouette value s(a) close to 1.0 are well clustered, in the sense that the average distance to entries in the same cluster is small compared with the average distance to the closest other cluster. If the silhouette value is smaller than 0, the entry is misclassified. PAM clustering is applied for all numbers of clusters k between 1 and N – 1 and the corresponding average silhouette values are calculated. The best clustering corresponds to k* number of clusters: $k^* = \arg\max_k \bar{s}(k)$, where $\bar{s}(k)$ is the average silhouette width obtained with k clusters.
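The silhouette computation and the selection of the best partition can be sketched in Python as follows. The sketch takes cluster labels as input, so it can be combined with any partitioning method; the function names and the example matrix are invented. Two conventions are assumed: an entry that is alone in its cluster receives s(a) = 0, as in Rousseeuw (1987), and at least two clusters are required, since the closest other cluster is undefined for a single cluster.

```python
def silhouette_widths(m, labels):
    """Silhouette width s(a) for every entry, given one cluster label per entry."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette widths need at least two clusters")
    widths = []
    for a, lab in enumerate(labels):
        own = [b for b in clusters[lab] if b != a]
        if not own:  # singleton cluster
            widths.append(0.0)
            continue
        within = sum(m[a][b] for b in own) / len(own)
        nearest = min(sum(m[a][b] for b in members) / len(members)
                      for other, members in clusters.items() if other != lab)
        denom = max(within, nearest)
        widths.append((nearest - within) / denom if denom > 0 else 0.0)
    return widths

def average_silhouette(m, labels):
    """Average silhouette width of a partition."""
    return sum(silhouette_widths(m, labels)) / len(labels)

def best_partition(m, candidate_labelings):
    """Pick the candidate partition with the largest average silhouette width."""
    return max(candidate_labelings, key=lambda labels: average_silhouette(m, labels))

# Invented symmetric dissimilarity matrix for four entries.
sil_m = [
    [0, 2, 10, 12],
    [2, 0, 10, 12],
    [10, 10, 0, 4],
    [12, 12, 4, 0],
]
sil_widths = silhouette_widths(sil_m, [0, 0, 1, 1])
```

In practice the candidate labelings would be the PAM partitions obtained for each number of clusters k, and the one returned by `best_partition` corresponds to k*.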


    Results
 
The size distribution of the SCOP sets is analyzed first. The dissimilarity measure used for clustering is then compared with the rmsd after superposition. Three clustering examples are presented. In the first example the clustering is based on small structural differences. In the second example, there is both a large conformational change and a local structural difference. The third example illustrates how the silhouette width can be used as a measure of cluster quality.

Distribution of set size and dissimilarity values

Figure 1 shows the histogram of sizes of the protein sets that have been clustered. Most of the sets (76%) have ≤10 entries. The percentage of sets with >40 entries is small (2%). The largest set, with 309 entries, results from the extensive protein engineering work on bacteriophage T4 lysozyme, with SCOP unique identifier (sunid) 53983. All these sets have been clustered and the results are available online at http://bioinf.mpi-sb.mpg.de/projects/struster/.



 
Fig. 1. Distribution of the number of entries in the SCOP sets corresponding to the species level. For convenience, only sets with <40 entries are displayed. The largest set has 309 entries.

 
Table I gives the distribution of the maximum dissimilarity values M(a,b) observed in each of the 3716 sets clustered. For 10% of the sets there is no dissimilarity. Almost 50% of the cases have considerable dissimilarity (100–1000) and very high dissimilarity (>1000) is observed in 10% of the sets.


 
Table I. Maximum dissimilarity in the clustered sets

 
Dissimilarity versus rmsd

The rmsd after rigid-body superposition is a popular measure for expressing the structure similarity between proteins. Let $d_i$ be the distance between the ith pair of equivalent atoms in two optimally superposed structures. The rmsd over n equivalent atoms is defined as

$$\mathrm{rmsd} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} d_i^2}$$

In the present work, the structural models are clustered based on a measure of dissimilarity M(a,b) between structures. This measure is sensitive to both large and small (but still significant) backbone conformational differences. It reflects the extent of significant differences (>1 Å) in short- to medium-range intramolecular distances between the two structures a and b. The value for normalized dissimilarity $R(a,b) = M(a,b)/L^2$ is easier to interpret. It is independent of the number of equivalent residues L in the set and its value lies in the interval between 0 (identity) and 1 (maximum difference).
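The two quantities compared in this section can be written down in a few lines of Python. The sketch assumes that the coordinates passed to `rmsd` are already optimally superposed (the superposition itself is not performed here) and that M(a,b) has been computed beforehand; both function names are illustrative.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation over equivalent, already superposed atoms."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

def normalized_dissimilarity(m_ab, n_aligned):
    """R(a,b) = M(a,b) / L^2 for L aligned residues."""
    return m_ab / n_aligned ** 2
```

For example, a dissimilarity of 4 over 4 aligned residues gives R = 0.25, whereas the same value over 100 aligned residues gives R = 0.0004, which is why R is easier to compare across sets of different size.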

Figure 2 shows the relationship between rmsd and both the dissimilarity measure M(a,b) (Figure 2A) and the normalized dissimilarity R(a,b) (Figure 2B). There is considerable positive correlation in both cases (0.813 for A and 0.815 for B).



 
Fig. 2. (A) Dissimilarity M(a,b) versus rmsd on a logarithmic scale. For each of the 3716 sets, the rmsd was calculated for the pair of entries with largest value of M(a,b). A smoothed quantile line is shown, corresponding to a sliding window of 1% quantiles of the rmsd values, calculated over a window size of 0.1 on a logarithmic scale. Evaluations are made at 100 equidistant points and the resulting quantiles are smoothed with a lowess function (local linear scatter plot smoother). (B) Analogously, normalized dissimilarity R(a,b) versus rmsd on a logarithmic scale.

 
The positive correlation is significant considering that rmsd and M measure different levels of structure similarity. In particular, rmsd is applicable for measuring small (all atom) to medium structural differences, while M captures the extent of backbone conformational changes, including large changes in conformation. In contrast, rmsd values spread considerably when large conformational differences are observed. It can also be observed from Figure 2 that for a given dissimilarity value, there is a clear lower limit for rmsd. A smoothed 1% quantile line is shown for both plots giving the lower rmsd limit for given values of M(a,b) and R(a,b).

D-2-Deoxyribose-5-phosphate aldolase

The structure models for Escherichia coli D-2-deoxyribose-5-phosphate aldolase provide a first example of the clustering analysis. Figure 3 shows the structural superposition of the six structures found in the corresponding SCOP species level (sunid 69395) and the dendrogram obtained from hierarchical clustering. The models present small structural differences. Nevertheless two clusters, C1 and C2, can be distinguished in the dendrogram. The same two clusters are produced by PAM clustering, with an average silhouette width of 1.0, indicating two well-differentiated clusters. Larger numbers of clusters produce lower average silhouette width values.



 
Fig. 3. (A) Dendrogram for hierarchical clustering of SCOP PDB entry domains for the species level of deoxyribose phosphate aldolase from Escherichia coli (SCOP sunid 69395). One can observe two clusters C1 and C2. Horizontal scale in dissimilarity units M(a,b). Entries are identified by the SCOP sid code. (B) Multiple structure comparison of the six models displayed as Cα trace. Entries from C1 are in light gray, entries from C2 in black. There is a deviation of the backbone conformation for serine 238 (next to the labels). This residue makes a main-chain contact with the substrate (coordinates from substrate in djcla_ are shown in space fill). All multiple structure comparisons are generated with MULTIPROT (Shatsky et al., 2002) and the molecular graphics are produced with RasMol (Sayle and Milner-White, 1995).

 
It is noticeable from Figure 3B that there are no significant differences in the Cα atom conformations, except for two residues in the loop before the C-terminal helix (residues 238–239). In this region the structures d1ktna_ and d1ktnb_ (matching cluster C1) have small but significant backbone deviations from the remaining models (for residue 238, Cα distances of 1.7 Å in the superposition). The dendrogram reflects these small backbone differences. It is noticeable that the models in C1 (but not in C2) include the substrate D-2-deoxyribose-5-phosphate attached to the binding pocket. In fact, residue 238 makes a contact to the substrate in the C1 models (Heine et al., 2001). As the substrate is not present in the C2 models, there is a small conformational change in this last loop. This translates into a more open conformation at the entrance to the binding pocket.

Serum transferrin

Transferrins are responsible for sequestering and solubilizing iron. In particular, serum transferrin binds Fe(III) in the blood and transports it to cells, where it is released at low pH into the endosome. The iron-free apotransferrin is then recycled back to circulation. Vertebrate transferrins consist of a single polypeptide chain with a twofold internal repeat, resulting in two homologous lobes (N- and C-lobes). The lobes contain similar iron-binding sites located in a deep cleft between two α/β subdomains. A single Fe(III) is bound in this cleft to four amino acid side chains and to a carbonate ion. The SCOP species level of human serum transferrin (sunid 53899) includes 20 entries for the N-lobe.

Figure 4 shows the corresponding dendrogram and the structure comparison of the different models. Two major clusters separated by a large dissimilarity value are visible in the dendrogram, C1 and C2 (Figure 4A). In fact, these two clusters correspond to the iron-free apo form (C1) and to the iron-binding holo form of transferrin (C2). The apo form is transformed into the holo form by a large (63°) rigid-body rotation of one of the α/β subdomains relative to the other, resulting in the opening of the iron-binding cleft (see Figure 4B).



 
Fig. 4. (A) Dendrogram for hierarchical clustering of SCOP entries for the species level of human serum transferrin (SCOP sunid 53899). All structural models correspond to the N-lobe. Two major clusters can be observed (C1 and C2). Within C2 two additional subclusters are noticeable (C21, C22). (B) Multiple structure comparison of the 20 models. Models shown in dark gray correspond to the apo form and match cluster C1. Models in light gray correspond to the iron-binding holo form and match cluster C2. The iron atom and the associated carbonate are shown in space-fill. (C) Multiple structure comparison of the C2 models (holo form), detailed view. Differences are observed around residues labeled 140 and 329.

 
Cluster C2 is further subdivided (C21 and C22). These two clusters correspond to structural models obtained from two different crystal forms, one in the tetragonal P4₁2₁2 space group (C21 cluster) and the other in the orthorhombic P2₁2₁2₁ space group (C22 cluster). Significant differences between the structures in the two clusters can be observed around a surface loop (residues 136–145) (see Figure 4C). Residues in this loop form a crystal contact in the tetragonal form (C21 cluster) but not in the orthorhombic form (C22 cluster). Other differences can also be observed in the C-terminal region (328–331) and in the 307–309 region; see the original publication for details (MacGillivray et al., 1998).

Within each of these clusters (C1, C21 and C22), the backbone differences are small and the different models correspond to single residue mutants, different crystallization conditions or different expression systems. In particular, within the C1 cluster, one can observe two subclusters with low dissimilarity. They match almost identical structures (1btj and 1bp5 PDB entries) derived from two closely related crystal forms (Jeffrey et al., 1998).

Figure 5 shows the values of average silhouette width for PAM clustering from one to 19 clusters. Best clustering is achieved with three clusters with a high average silhouette width (0.939) indicating a clear separation between the clusters. The three clusters correspond to C1, C21 and C22. The representatives are d1bp5d_, d1a8f__ and d1n84a_ for C1, C21 and C22, respectively.



 
Fig. 5. PAM clustering for human serum transferrin (SCOP sunid 53899). Average silhouette width values for different number of clusters. Maximum value (0.939) for three clusters, corresponding to clusters C1, C21 and C22. Second highest value for two clusters (0.892), corresponding to C1 and C2.

 
Glucose dehydrogenase

The examples so far demonstrate ensembles which can be clearly classified into different types of structures. It is also possible that the structural neighborhood of a model (the other closely related models) is less clearly defined. This is the case when the structural differences in the set are more continuous or when the structural neighborhood of a given model varies along the polypeptide chain. These cases are associated with a lower silhouette width. The clustering of glucose dehydrogenase structures from Bacillus megaterium provides such an example. The protein consists of a tetramer with four identical subunits. The corresponding SCOP species level (sunid 51785) includes four sequence-identical entries from the same PDB file (Yamamoto et al., 2001).

The Cα conformation is very similar in these four entries. The only significant difference is located in a flexible surface loop (Arg39–Asp44) (see Figure 6A). Models d1gcoe_ and d1gcof_ (with a very similar backbone conformation) differ from the other models in this region. In particular, the differences between d1gcoe_ and d1gcof_ on the one hand and d1gcoa_ on the other are clear for residues 41 and 44, but the respective differences to d1gcob_ are only significant for residue 44. From the structural comparison it is clear that d1gcoe_ and d1gcof_ should belong to the same cluster and d1gcoa_ to a different one, whereas d1gcob_ is an intermediate case.



 
Fig. 6. Multiple structure comparison and clustering results for Bacillus megaterium glucose dehydrogenase (SCOP sunid 51785). (A) Multiple structure comparison, detailed view. Entry d1gcoa_ in black, entry d1gcob_ in gray, d1gcoe_ and d1gcof_ in light gray. Structural differences can be observed for lysine 41 and glutamate 44 (labeled). In the first case, d1gcob_ has a conformation closer to d1gcoe_ and d1gcof_, but in glutamate 44, model d1gcob_ is more similar to d1gcoa_. (B) Dendrogram from hierarchical clustering. (C) PAM clustering results. Silhouette width values for two clusters. d1gcoa_ and d1gcob_ cluster together again, but the average silhouette value for d1gcob_ is lower (0.654 for d1gcob_), indicating it is not as well clustered as the other entries.

 
The dendrogram in Figure 6B provides one solution, where d1gcob_ clusters with d1gcoa_, but the poor cluster assignment of d1gcob_ is not evident in this type of representation. PAM results (Figure 6C) give a best partition for two clusters, with d1gcob_ and d1gcoa_ clustering together in agreement with the dendrogram. The poor cluster assignment of d1gcob_ is reflected in the lower average silhouette value for d1gcob_.


    Discussion
 
We have provided a method for clustering of protein structures that uses differences in intramolecular distances for the calculation of a dissimilarity measure. Two filters are applied in this calculation. The first introduces a cut-off for large distances. This is especially important in the case of a rigid-body motion of two subdomains as illustrated by the transferrin example. Here, the two subunits maintain the same conformation; only the relative orientation changes. Considering the differences over large distances would otherwise result in very high dissimilarity values, which would not reflect the fact that the subunits have the same conformation. Therefore, the large change in the orientation of the two subunits can mask other smaller structural differences. The second filter places a threshold on the distance differences, therefore discarding small differences that can result in additional noise. The comparison of the clustering results obtained with and without filters illustrates this effect. We use the human serum transferrin set (SCOP sunid 53899) again as an example. With the filters set to F1 = 14.0 and F2 = 1.0 as described before (Figure 4A), one can observe two main clusters (C1 and C2) and two subclusters (C21 and C22). Figure 7A shows the resulting dendrogram when these filters are not applied. In this case the subclusters C21 and C22 are not evident. This observation is confirmed by the PAM clustering results, as the maximum average silhouette width is obtained for two clusters (C1 and C2). Similar PAM results are obtained when rmsd is used as a dissimilarity measure. Figure 7B gives the corresponding hierarchical clustering results. The subclusters are now more noticeable than in Figure 7A, but still not as evident as when the filters are applied (see Figure 4A), as in the method proposed.
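The masking effect of a rigid-body motion can be illustrated with a toy example in Python. The sketch assumes that the first filter caps distances at F1 and that the dissimilarity counts ordered residue pairs whose distance difference exceeds F2; the coordinates are invented and the function name is hypothetical. Two three-residue 'subdomains' are placed far apart. In the second model one subdomain is displaced rigidly and, in addition, a single residue of the other subdomain is moved locally. Only the first filter is switched off for the comparison; the second filter is kept in both cases.

```python
import math

def dissimilarity(coords_a, coords_b, f1=14.0, f2=1.0):
    """Count ordered residue pairs whose capped distances differ by more than f2."""
    n = len(coords_a)
    total = 0
    for i in range(n):
        for j in range(n):
            da = min(math.dist(coords_a[i], coords_a[j]), f1)
            db = min(math.dist(coords_b[i], coords_b[j]), f1)
            if abs(da - db) > f2:
                total += 1
    return total

# Model a: two straight three-residue segments, more than 20 A apart.
toy_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
         (30.0, 0.0, 0.0), (33.8, 0.0, 0.0), (37.6, 0.0, 0.0)]
# Model b: third residue moved locally, second segment shifted rigidly by 10 A.
toy_b = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0),
         (30.0, 10.0, 0.0), (33.8, 10.0, 0.0), (37.6, 10.0, 0.0)]

with_cap = dissimilarity(toy_a, toy_b)                       # F1 = 14.0
without_cap = dissimilarity(toy_a, toy_b, f1=float("inf"))   # no first filter
```

With the cap, only the local change contributes (2 ordered pairs). Without it, every pair of residues that spans the two segments is also counted, giving 20, so the local difference is a small fraction of the total.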



 
Fig. 7. Dendrogram for hierarchical clustering of human serum transferrin (SCOP sunid 53899). (A) Clustering based on dissimilarity values M(a,b) without any filters. (B) Clustering based on rmsd.

 
Two clustering techniques have been applied, hierarchical and PAM clustering. These different methods provide complementary views on the relationships observed. The former method reflects the hierarchy of structure similarity (or dissimilarity) between the entries, where the dendrogram provides an overall view of the structure relationships. The latter method (PAM) provides an objective partitioning of the entries into the optimal number of clusters. In addition, the silhouette width value provides a measure of the quality of the clustering. Other clustering approaches could be tested: single-linkage or complete-linkage hierarchical clustering, divisive hierarchical clustering or maximum-likelihood methods, for example. Full cluster optimization is outside the scope of the current work, but we intend to test these additional methods and dissimilarity measures further. In particular, the stability of the resulting classifications over a jackknife procedure should be investigated.

The current method only takes into account the Cα conformations. Including the coordinates of the side-chain atoms would make the procedure more sensitive to small structural changes, which might be important for more detailed structural analysis. In addition, we will also investigate how to include temperature factors and occupancy information in the method.


    Acknowledgments
 
We thank Ingolf Sommer and Mario Albrecht for helpful discussions and Andreas Kämper and Oliver Sander for their comments and suggestions. Financial support was provided to J.R. by BMBF grant No. 031U117. This work is part of the BioSapiens project. The BioSapiens project is funded by the European Commission within its FP6 Programme, under the thematic area ‘Life sciences, genomics and biotechnology for health’, contract number LHSG-CT-2003-503265.


    References
 
Berman,H.M., Westbrook,J., Feng,Z., Gilliland,G., Bhat,T.N., Weissig,H., Shindyalov,I.N. and Bourne,P.E. (2000) Nucleic Acids Res., 28, 235–242.

Carugo,O. and Pongor,S. (2002) J. Mol. Biol., 315, 887–898.

Chandonia,J.M., Hon,G., Walker,N.S., Lo Conte,L., Koehl,P., Levitt,M. and Brenner,S.E. (2004) Nucleic Acids Res., 32, D189–D192.

Choi,I.G., Kwon,J. and Kim,S.H. (2004) Proc. Natl Acad. Sci. USA, 101, 3797–3802.

Gordon,A.D. (1999) Classification, 2nd edn. Chapman and Hall, London.

Heine,A., DeSantis,G., Luz,J.G., Mitchell,M., Wong,C.H. and Wilson,I.A. (2001) Science, 294, 369–374.

Holm,L.L. and Sander,C. (1996) Science, 273, 595–602.

Jeffrey,P.D., Bewley,M.C., MacGillivray,R.T., Mason,A.B., Woodworth,R.C. and Baker,E.N. (1998) Biochemistry, 37, 13978–13986.

Kaufman,L. and Rousseeuw,P.J. (1990) Finding Groups in Data: an Introduction to Cluster Analysis. Wiley-Interscience, New York.

Kelley,L.A. and Sutcliffe,M.J. (1997) Protein Sci., 6, 2628–2630.

Kelley,L.A., Gardner,S.P. and Sutcliffe,M.J. (1996) Protein Eng., 9, 1063–1065.

Laboulais,C., Ouali,M., Le Bret,M. and Gabarro-Arpa,J. (2002) Proteins, 47, 169–179.

MacGillivray,R.T. et al. (1998) Biochemistry, 37, 7919–7928.

May,A.C. (1999) Proteins, 37, 20–29.

Murzin,A.G., Brenner,S.E., Hubbard,T. and Chothia,C. (1995) J. Mol. Biol., 247, 536–540.

Myers,E.W. and Miller,W. (1989) Comput. Appl. Biosci., 4, 11–17.

Orengo,C.A., Michie,A.D., Jones,S., Jones,D.T., Swindells,M.B. and Thornton,J.M. (1997) Structure, 5, 1093–1108.

Rousseeuw,P. (1987) J. Comput. Appl. Math., 20, 53–65.

Sayle,R. and Milner-White,E. (1995) Trends Biochem. Sci., 20, 374–374.

Shatsky,M., Nussinov,R. and Wolfson,H.J. (2002) In Guigó,R. and Gusfield,D. (eds), Proceedings of the 2nd Workshop on Algorithms in Bioinformatics (WABI). Springer, Berlin, pp. 235–250.

Struyf,A., Hubert,M. and Rousseeuw,P.J. (1997) Comput. Stat. Data Anal., 26, 17–37.

Yamamoto,K., Kurisu,G., Kusunoki,M., Tabata,S., Urabe,I. and Osaki,S. (2001) J. Biochem. (Tokyo), 129, 303–312.

Received June 18, 2004; revised August 3, 2004; Edited by Andrej Sali




