ProSup: a refined tool for protein structure alignment

Peter Lackner1,2, Walter A. Koppensteiner3, Manfred J. Sippl1,3 and Francisco S. Domingues1

1 Center for Applied Molecular Engineering, Institute for Chemistry and Biochemistry, University of Salzburg, Jakob-Haringer Str. 3, A-5020 Salzburg and 3 ProCeryon Biosciences GmbH, Jakob-Haringer Str. 3, A-5020 Salzburg, Austria


    Abstract
 Top
 Abstract
 Introduction
 Materials and methods
 Results and discussion
 References
 
We investigated and optimized a method for structure comparison which is based on rigid body superimposition. The method maximizes the number of structurally equivalent residues while keeping the root mean square deviation constant. The resulting number of equivalent residues then provides an adequate similarity measure, which is easy to interpret. We demonstrate that the approach is able to detect remote structural similarity. We show that the number of equivalent residues is a suitable measure for ranking database searches and that the results are in good agreement with expert knowledge protein structure classification. Structure comparison frequently has multiple solutions. The approach that we use provides a range of alternative alignments rather a single solution. We discuss the nature of alternative solutions on several examples.

Keywords: alternative alignments/structure similarity measure/superimposition


    Introduction
 Top
 Abstract
 Introduction
 Materials and methods
 Results and discussion
 References
 
Protein structure comparison is an important tool in structural biology, e.g. for finding evolutionary relationships (Murzin et al., 1995Go; Orengo et al., 1997Go), structural and functional analysis (Weber and Salemme, 1980Go; Murzin, 1993Go, 1998Go; Holm and Sander, 1996Go; Orengo et al., 1999Go; Koppensteiner et al., 2000Go) or in the evaluation of structure prediction methods (Defay and Cohen, 1995Go; Lemer et al., 1995Go; Marchler-Bauer and Bryant, 1995Go; Lackner et al., 1999Go; Domingues et al., 2000Go).

There are many possible ways to define structure similarity among proteins, e.g. by root mean square deviation (r.m.s.d.) of rigid body superimposition (Zuker and Somorjai, 1989Go; May and Johnson, 1994Go; Diederichs, 1995Go) or differences in interatomic distances (Taylor and Orengo, 1989Go; Holm and Sander, 1993Go). We focus on rigid body superimposition, where the goal is to find the highest number of equivalent residues with the lowest r.m.s.d. Unfortunately, both parameters are interdependent. A solution is to keep the r.m.s.d. constant while optimizing the number of equivalent residues (Feng and Sippl, 1996Go). As a consequence, the main measure of similarity is the number of structurally equivalent residues, which is a rather intuitive measure and considerably simplifies ranking.

It has been demonstrated that multiple solutions are inherent in structure comparison (Boutonnet et al., 1995Go; Feng and Sippl, 1996Go; Godzik, 1996Go). For a particular protein pair several solutions with comparable extent of similarity can be observed. They can differ in small shifts in secondary structure elements, e.g. a shift of two residues in ß-strands, but also correspond to alignments of different regions, e.g. different domains. Therefore, alternative solutions are relevant whenever an alignment is needed, as in the evaluation of structure prediction (Lackner et al., 1999Go). Here we investigated and optimized the ProSup structure comparison program (Feng and Sippl, 1996Go). In ProSup the r.m.s.d. is kept below a threshold by defining a certain distance cutoff for equivalent residues. This cutoff applies during both major steps of the algorithm, (1) the iterative expansion of preliminary alignments and (2) the dynamic programming-based refinement. Pairs of small fragments having an r.m.s.d. below a certain cutoff are selected for initial superimpositions. This in general results in a number of distinct alignments. ProSup provides different filters applicable to the alignments, in order to remove regions of non-significant similarity.

We describe and discuss the ProSup algorithm and its parameters in detail in the Materials and methods section. In the Results and discussion section we (1) show that ProSup is able to detect remote structural similarity using a test set of 10 protein pairs reported in the literature, (2) compare the ranking of ProSup with the ranking of DALI (Holm and Sander, 1993Go), which is based on the alignment of distance matrices, (3) compare the ProSup ranking with the expert knowledge-derived structure classification of SCOP (Murzin et al., 1995Go), (4) calculate a measure of statistical significance for the structure alignments, (5) discuss the problem of side chain orientation and (6) investigate the nature of multiple solutions on several examples.


    Materials and methods
 Top
 Abstract
 Introduction
 Materials and methods
 Results and discussion
 References
 
Below we describe the algorithm and the effect of different parameters on alignment quality, alternative alignments and computing time requirements.

The algorithm

The algorithm basically has four steps (Figure 1Go):



View larger version (13K):
[in this window]
[in a new window]
 
Fig. 1. Flow chart of the ProSup algorithm.

 
  1. Identify similar fragments (seed fragments) in both proteins.
  2. Iteratively expand these seed fragments to initial alignments.
  3. Iteratively apply a dynamic programming algorithm to refine initial alignments.
  4. Evaluate the refined alignments.

Every similar fragment pair from (1) is expanded to an initial alignment. From step (2) a subset of initial alignments is selected rather than a single one. This subset is the origin of the alternative alignments. The preliminary initial alignments are optimized in step (3). Subsequently the optimized alignments are evaluated by a filter, which removes short fragments and checks for side chain orientations. Finally, they are classified by a clustering procedure.

Let A and B be the first and second protein chain, respectively. Let lS be the length of the fragments to be compared and lA and lB the sizes of proteins A and B. Then in step (1) all (lA lS + 1)x(lBlS + 1) fragments are superimposed (Sippl and Stegbuchner, 1981Go) and the subset with r.m.s.d. cS (r.m.s.d. cutoff for seed fragments) is forwarded to step (2).

In step (2) a seed is being expanded by (2a) superimposing the whole A and B according to the seed fragment pair Ai..i + lS–1 and Bj..j + lS–1 where i and j define the position of the seed in A and B. Subsequently (2b) starting from the N- and C-terminal ends of the seed fragment those residue pairs An, Bm are located which have a distance in space smaller than the distance cutoff for alignment construction dC. This defines a new set of equivalent residues. In (2c) A and B are superimposed according to the new set of equivalent residues and step (2b) is repeated until the initial alignment does not change any more.

Step (2) is repeated for each seed fragment pair. Naturally many Ai..i+lS–1, Bj..j+lS–1 pairs expand to the same initial alignment. Identical alignments are disregarded. Because of computing time considerations, not every initial alignment will be refined. Only the nS alignments with the highest number of equivalent residues and the lowest r.m.s.d. will be forwarded to step (3).

In step (3) a seed alignment is refined by a dynamic programming algorithm (Needleman and Wunsch, 1970Go). From step (2) we keep the last superimposition. Then in (3a) we compute a score matrix S for every residues pair Ai, Bj by measuring the distance dAi,Bj:

Then the optimal path through this matrix is search by a Needleman and Wunsch algorithm (Needleman and Wunsch, 1970Go). This results in a set of aligned residues. Note that aligned now does not necessarily mean structurally equivalent. In (3b), those residues where dAi',Bj' <= dC and i', j' is an aligned pair, define a new set of equivalent residues. The procedure is repeated at (3a) until the alignment does not change any more. Also in this step different initial alignments may converge to the same final alignment. Identical alignments are disregarded. Besides dC, the distance cutoff for alignment construction, there are also other parameters which can be applied in the refinement: gap penalty g, gap extension penalty and maximum gap length. In the study presented, only gap penalty is applied.

In (4) several filters are applied to remove non-significant similarities. A second distance cutoff is now introduced which defines the final equivalent residues (dE). As an option, a cutoff (dB) for the maximum distance between Cß atoms can be applied. These will remove equivalent residues where the side chains have different orientations. Finally, a filter for a minimum fragment length of consecutively aligned pairs (lF) is applied.

Parameter optimization

In order to determine the parameter set that produces the most reasonable alignments in the shortest computational time, an optimization procedure was performed. In general, a high number of equivalent residues is associated with significant matches and maximizing this number was used as the main optimization criterion. Human decision was required to select the parameters dE and lF by visual inspection of the superimpositions. Seven parameters were optimized (Table IGo, except nE and cC ) using a set of 100 protein pairs (Table IIGo).


View this table:
[in this window]
[in a new window]
 
Table I. Default parameters of ProSupa
 

View this table:
[in this window]
[in a new window]
 
Table II. Structure pairs used in the parameter optimization
 
Besides the number of equivalent residues, additional optimization criteria were r.m.s.d., the number of distinct solutions and computation time requirements. The average number of equivalent residues and average r.m.s.d. are calculated from the top ranking alternative alignment of each structure pair rather than using all alternatives. The multiple solutions are sorted by the number of equivalent residues as the primary key and r.m.s.d. as the secondary key.

Seed fragment parameters: lS and cS

The selection of initial seed fragments is controlled by the length lS and r.m.s.d. cutoff cS. Large r.m.s.d. cutoff values cS and small fragment lengths lS result in a large number of seed fragments and initial alignments, which is computationally demanding but is more sensitive to weak similarities. Optimization results are shown in Figure 2Go.



View larger version (17K):
[in this window]
[in a new window]
 
Fig. 2. Influence of the seed fragment length lS (top) and seed r.m.s.d. cutoff cS (bottom) parameters on the number of equivalent residues (left) and execution time (right). For a certain lS (here 10), if the seed cutoff is below a transition point (2 Å) seed fragments are overlooked (top left). Above this point also low similarity seed fragments are selected and initial alignments are calculated, which costs computing time (top right). The bottom part shows the dependence on seed fragment length lS. If the seed fragment size is larger than 12 the number of equivalent residues drops considerably.

 
Initial alignment parameters: nS and dC

If the maximum number of initial alignments nS is set to one, only the best initial alignment is being forwarded to refinement. Alternative alignments would be neglected. Increasing nS increases the number of alternative alignments but also the computation time. Low values of nS should be used for database searches, where the goal is just to identify the related structures. Larger values should be used for individual comparison where the set of alternatives might provide important information.

The distance cutoff for alignment construction dC defines a maximum distance allowed between two residues in steps (2) and (3). By optimization of the number of equivalent residues and by visual inspection we found that fairly large values for dC should be used in order to find the similarities. To remove aligned regions of low similarity a more restrictive cutoff should be applied in the filter stage.

Refinement parameters: dC and g

dC is used in creating the seed alignments and also on the refinement, the overall influence of dC is discussed above. Small values for the gap penalty g, as in any other application of this type of dynamic programming alignment algorithms, cause scattered alignments if the structures to be aligned have low similarity. An optimal gap penalty depends not only on the similarity function in the alignment procedure but also on the definition of the minimum fragment length in the evaluation. A larger gap penalty forces the alignments to be less scattered. Less scattered alignments produce larger regions of consecutive equivalent residues. Thus, if the minimum fragment length is larger also larger gap penalties should be preferred (Figure 3Go).



View larger version (9K):
[in this window]
[in a new window]
 
Fig. 3. Optimization of the gap penalty g. Solid lines, minimum fragment length lF = 5; dashed lines, lF = 1. For lF = 1 the estimated optimum for g is 5, which corresponds to more scattered alignments.

 
Alignment evaluation parameters: dE, lF, nE and cC

Alignment evaluation parameters cannot be optimized on the number of equivalent residues. Increasing the distance cutoff for evaluation dE always increases the number of equivalences whereas increasing the minimum fragment length lF always decreases that number. Therefore, visualization of individual superimpositions was required to select reasonable values for these parameters, 5 Å for dE and five residues for lF. If lF is set to a low number some residues may be marked as equivalent where the structural relation is insignificant. It happens, for example, that a helix from the one protein crosses a strand from the other and the corresponding C{alpha} atoms are close together in a limited area. Larger values of lF avoid this problem. Again a balance must be found between filtering insignificant equivalences and not losing short but important stretches of equivalences.

An additional criterion is the minimum number of equivalent residues nE (after applying lF and dE). Alignments with a smaller number of equivalences than nE are disregarded.

ProSup provides an additional feature which does not influence the structure comparison results, but is convenient for the analysis of the output. In some cases alternative alignments might share many equivalent residues, in other cases they are rather different. We use a cluster algorithm to classify the alternative alignments. Two alternative alignments are put into different clusters if they have less than cCx100% equivalent residues aligned equally, relative to the smaller alignment. Smaller values for the cluster cutoff cC result in more clusters.

Suggestion for parameter settings

Parameter setting depends on the application of structure comparison. The optimized parameters presented in this study (Table IGo) are a good starting point. For finding remote similarity the distance cutoffs dE and dC can be larger, e.g. 5–6 Å. For calculating precise alignments lower distance cutoff values should be preferred (3–4 Å). Scattered alignments should be avoided by setting lE to 4–5. In case of database searches, where speed is an issue, the number of initial alignments forwarded to the refinement step should be kept low (nS of 20). For studying individual protein pairs large values for nS (100 or more) should be used to search exhaustively for alternative solutions.


    Results and discussion
 Top
 Abstract
 Introduction
 Materials and methods
 Results and discussion
 References
 
Detection of remote similarity

A main requirement for any structure comparison method is its ability to detect weak structural similarity. A test suite should contain structures which have low but detectable structural similarity. We chose a set of 10 protein pairs which was previously used by Shindyalov and Bourne (1998) to investigate their CE method. The set itself was published by Fischer et al. (1996). ProSup results were compared with those from three methods available as web services and frequently used by the scientific community, VAST, DALI and CE (Table IIIGo).


View this table:
[in this window]
[in a new window]
 
Table III. Comparison of structure alignments for 10 difficult cases from Fischer et al. (1996) obtained by VAST, DALI, CE and ProSup
 
All these methods report the number of structurally equivalent residues. This number differs considerably for several protein pairs. In principal, one would expect that a higher number of equivalent residues would indicate a better performance of a particular method in the detection of structural similarity. However, comparing the number of equivalent residues is insufficient without taking r.m.s.d. into account.

R.m.s.d. reported by ProSup is fairly constant in all cases; the range of r.m.s.d. is 1.7 –2.7 Å. The upper limit of r.m.s.d. is determined by the parameters applied in ProSup. In the other three methods we find a larger variation in r.m.s.d.; the largest value is 4.2 Å in VAST. This variation in r.m.s.d. prevents an objective comparison of the results in terms of performance based on number of equivalent residues. For example, consider the comparison of the granulocyte-colony stimulating factors 1bge-B and 2gmf-A. ProSup reports 87 equivalent residues with an r.m.s.d. of 2.4 Å. VAST gives a similar r.m.s.d. of 2.3 Å, but the alignment is shorter with 71 aligned residues. CE reports the largest number of equivalent residues, 107, but also a larger r.m.s.d. of 3.9 Å. Even a detailed comparison according to these numbers is difficult, but in general ProSup results are in the range of those from the other three methods.

We also compare the structural alignments obtained from the different methods. We found that the ProSup alignments agree best with CE, where in sum 761 residue pairs are aligned identically; worst is the agreement with VAST, where 715 residue pairs are aligned identically. In most cases >40% of the aligned residues match. A remarkable exception is the pair 1bge-B and 2gmf-A, where the alignment of ProSup differs from all the others. In this case also CE differs substantially from DALI and VAST in terms of alignment.

Ranking in database searches

The number of equivalent residues nevertheless can be an appropriate measure if r.m.s.d. is kept constant and provides a simple and intuitive measure to rank structural similarity. This principle is applied in ProSup and used whenever a comparison of a single protein structure with a database of structures is performed. Other methods use different approaches. For example, in DALI ranking is done by a Z-score derived from a measure based on intramolecular distances.

DALI is a frequently used method and enjoys broad acceptance in the scientific community. We compare the results of database searches performed with ProSup and DALI. Table IVGo shows the results for a database scan with the transcription factor ETS-1 (1bqv), a CASP3 target (Murzin, 1999Go). The table shows the 10 top ranking structures reported by ProSup. In DALI, structures with a Z-score >2 are considered to be similar and therefore are shown in the table.


View this table:
[in this window]
[in a new window]
 
Table IV. Database search for the CASP3 target structure 1bqv, which has a size of 110 residues
 
Both methods report ribonucleotide reductase 1rlr as a top ranking match, position 1 in DALI and 2 in ProSup. DALI converts its measure of similarity into a Z-score in order to express significance of a match. On the other hand, ProSup directly ranks by its measure of similarity which does not reflect significance, but is easy to interpret: 58 residues is more than 50% of the size of 1bqv. ProSup applies a distance cutoff for equivalent residues; therefore, one knows in advance that none of these residue pairs can have a distance above the cutoff (5 Å). DALI calculates the r.m.s.d. after the structure alignment is calculated and it can vary considerably, in this example from 3.0 to 7.0 Å. An extreme case is the glycogen phosphorylase B 1pjr, ranking position 2 in DALI with 73 equivalent residues at 7.0 Å. See Figure 4Go for a comparison to the ProSup results. ProSup reports 47 equivalent residues with an r.m.s.d. of 2.47 Å. This pushes 1pjr down to rank 26 in ProSup.



View larger version (35K):
[in this window]
[in a new window]
 
Fig. 4. Comparison of the superimposition of 1bqv and 1pjr in DALI and ProSup. The distance between equivalent residues after superimposition is plotted against the sequence position: *, Distance in DALI; {square}, distance in ProSup. The horizontal dashed line corresponds to the distance cutoff for equivalent residues in ProSup.

 
ProSup ranking versus SCOP classification

A further test of ProSup ranking was performed by comparison with the expert knowledge classification of SCOP (Murzin et al., 1995Go). ProSup ranking should match SCOP classification. We used a set of 53 single domain proteins from Domingues et al. (2000) and compared them against a database of about 800 protein chains with low sequence similarity. For all of the 53 target proteins there is at least one structure in the database with the same SCOP fold category. In addition, each of the target proteins has several similar folds in the database according to ProSup. We removed trivial cases of high sequence similarity between target and database structures. Only chains where FASTA3 (Pearson and Lipman, 1988Go) does not detect significant similarity were considered.

In 48 of the 53 cases the target protein and the best ranking database structure are members of the same SCOP fold category (Table VGo). In 42 cases (80%) ProSup identifies a chain which belongs to the same superfamily. This means that ProSup tends to rank those structures in first position where SCOP expects evolutionary relationships.


View this table:
[in this window]
[in a new window]
 
Table V. Comparison of the best match of ProSup database search with SCOP expert knowledge classification
 
In five cases the same fold is on position 2, in one case on position 3. In three of these five cases the best ranking structures are typical examples of the Russian doll effect (Orengo et al., 1997Go). The four-helix bundles 1lpe and 1vlt-A have considerable similarity to a seven-helix fold. The nine-stranded ß-sandwich 1ulo superimposes well with the 12–14-stranded ß-sandwich 2ayh. For 1aps a structure with considerable similarity is identified by ProSup, which is classified in a different fold category by SCOP. Finally, the all-ß protein 2lef-A matches helices in the {alpha} + ß folds 1ldg and 1hyh-A. These are exceptional cases. As a rule, ProSup ranking agrees well with SCOP expert classification.

Significance of structure comparison

We investigate how a measure of significance of structural similarity can be applied to ProSup results. Such a measure has been proposed by Levitt and Gerstein (1998). Here we calculate the structure similarity score, defined as Sstr = M{{Sigma}1/[1 + (dij/d0)2] – Ngap/2}, where M = 20, dij is the distance between aligned C{alpha} atoms, d0 = 5 and Ngap is the number of gaps for the ProSup based alignments. Note that the dij are the distances in the complete alignment, not from the equivalent regions only. Subsequently we convert the scores into p-values by applying the formalism given in Levitt and Gerstein (1998). We compare how the p-values correlate with the number of equivalent residues. We assume that the underlying distribution of Sstr is similar in both methods. Figure 5Go shows the number of equivalent residues versus the logarithm of p-values. The horizontal line separates the 10 lowest p-values and the vertical line separates the ten top ranking cases with ProSup. Five proteins are common in both cases; the agreement is considerable. This was confirmed when we repeated this experiment in the 53 database scans described in the section above. The correlation coefficient ranges from –0.66 to –0.95, average –0.85.



View larger version (32K):
[in this window]
[in a new window]
 
Fig. 5. The p-values calculated according to Levitt and Gerstein plotted against the number of equivalent residues reported by ProSup for the database search of 1bqv as in Table IVGo. The correlation coefficient is –0.76.

 
Side chain orientation

Frequently structure comparison is based on C{alpha} traces. This is sufficient if the goal is the identification of similar architectures or topologies. In other applications, such as the detection of evolutionary relationships or evaluation of predictions, a more detailed description of similarity is required. Not only C{alpha} superimposition but also the orientation of the side chains is important.

For example, in the superimposition of ß-strands, C{alpha} atoms close in space can have Cß atoms in an opposite orientation (Figure 6Go). A simple but efficient way to take side chain orientation into account was implemented in ProSup. It consists of a filter applied after C{alpha} superimposition. For each pair of equivalent C{alpha} atoms we measure also the distance between the corresponding Cß. If the distance is larger than a distinct cutoff this pair is not considered as structurally equivalent. For some protein pairs, Cß filtering reduces the number of equivalent residues considerably (Table VIGo).



View larger version (13K):
[in this window]
[in a new window]
 
Fig. 6. Two extended fragments of 1bd9-A and 1rlw which are aligned by ProSup. The Cß atoms for the residue pair 1bd9-A/88 and 1rlw/60 are shown. They point in opposite directions. By applying the Cß filter such cases are removed from the alignment. This figure was prepared with MOLSCRIPT (Kraulis, 1991Go) and Raster3D (Merritt and Murphy, 1994Go).

 

View this table:
[in this window]
[in a new window]
 
Table VI. Some cases where Cß filtering decreases the number of equivalences considerably
 
Multiple solutions in structure comparison

Structure comparison often yields multiple solutions which correspond to distinct alignments. The main sources of alternative alignments are structural repeats at different levels, secondary structure, supersecondary structure, internal repeats and domains.

For example, the comparison of four-helix bundles with shorter and larger helices obviously can be aligned in different ways (Feng and Sippl, 1996Go), because of the repetitive nature of {alpha}-helices. However, there are also more intricate cases at the level of secondary structure, such as in the comparison of the {alpha} + ß proteins 1cew-I and 1oun-A. Superimposition yields two alternative alignments which have comparable numbers of equivalent residues and r.m.s.d., 70/2.4 and 68/2.4 Å, but the aligned residues are different in the two alternatives (Figure 7Go). The differences between these two solutions come from shifts of four residues in a helix and shifts of two residues in the strands.



View larger version (9K):
[in this window]
[in a new window]
 
Fig. 7. Two alternative alignments of 1cew-I and 1oun-A. Vertical bars mark equivalent residues. In the boxed areas shifts of two or four residues occur. The dashed box encloses helical regions ad the solid boxes strand regions.

 
In the case of the lipocalin folds 1adl and 1mup, not only shifts within ß-strands can be observed but also the alignment of different strands, corresponding to shifts of strand–loop–strand supersecondary structure units. Two alignments can be observed, one with 78 equivalent residues at 2.7 Å and the other with 76 equivalent residues at 2.7 Å but showing as few as 18 aligned pairs in common. In a previous section we showed that the best ProSup alignment for 1beg-B and 2gmf-A was completely different to those obtained with other methods (Table IIIGo). In fact, there is an alternative alignment with 77 equivalent residues which agrees with DALI and VAST (66 and 67 identically aligned positions, respectively).

An example of alternative alignments caused by internal repeats is the second domain of 1tdj. It consists of two subdomains. Both of them can be superimposed with the ferredoxin-like domain of 1psd-A.

There are many examples on the domain level. For example, the superimposition of a protein consisting of a single immunoglobulin-like domain with one consisting of two such domains has obviously (at least) two solutions.

Alternative alignments provide important information for several applications of structure comparison. For example, if structure alignments are used for deriving amino acid substitution matrices (Prlic et al., 2000Go), the question arises of which alternative to use. If there is sequence homology, the number of identical residues could serve as a measure to select alternatives but in the case of remote homologous or analogous proteins this should not be considered as a reliable criterion. Probably there is no answer and the best solution is to use both alignments. If structure comparison is used to evaluate or optimize structure prediction methods, considering only one and neglecting the other solution might lead to a wrong evaluation. To conclude, in every application of structure comparison, where the alignment plays a role, one must be aware that alternative solutions may exist.

Computational effort and availability

The computing time depends on both the size of the structures and the parameters used. With the default parameter set a search in a database containing 842 protein chains takes between 14 min and 3.2 h on a PentiumIII 500MHz Linux system. For example, a BPTI-like fold (1den, 60 residues) takes 14 min, a globin (1ash, 150 residues) 1 h and a TIM barrel (1ece-A, 358 residues) 2.5 h. On average it is 4.5 s per pairwise comparison for an average target structure size of 157 residues and an average database structure size of 258 residues.

ProSup is available as an online service at http://www.came.sbg.ac.at.

Conclusion

Optimizing the number of equivalent residues while keeping the r.m.s.d. constant provides a simple and intuitive measure of structure similarity. We have demonstrated that this measure can be used for ranking in database searches, whose results are in good agreement with expert knowledge classification. We further showed that structure comparison frequently has multiple solutions, which should not be neglected for some applications. ProSup provides a number of alternative alignments rather than a single solution. The C{alpha} trace is an adequate approximation of the backbone conformation. By default ProSup uses C{alpha} atoms only in the structure superimposition calculation. In this way the computing time is kept to a minimum. Nevertheless, in some applications side chain orientation plays an important role. ProSup provides a filter which guarantees proper side chain orientation.


    Notes
 
2 To whom correspondence should be addressed Back


    Acknowledgments
 
We are grateful to Andreas Prliâ and Nora Benhabiles for valuable suggestions for improving the manuscript. This work was supported by grant No. 7899 of the Jubiläumsfonds der Österreichischen Nationalbank.


    References
 Top
 Abstract
 Introduction
 Materials and methods
 Results and discussion
 References
 
Boutonnet,N.S., Rooman,M.J., Ochagavia,M.E., Richelle,J. and Wodak,J.S. (1995) Protein Eng., 8, 647–662.[Abstract]

Defay,T. and Cohen,F.E. (1995) Proteins, 23, 431–445.[ISI][Medline]

Diederichs,K. (1995) Proteins, 23, 187–195.[ISI][Medline]

Domingues,F.S., Lackner,P. and Sippl,M.J. (2000) J. Mol. Biol., 297, 1003–1013.[ISI][Medline]

Feng,Z.K. and Sippl,M.J. (1996) Fold. Des., 1, 123–132.[ISI][Medline]

Fischer,D., Eloffson,A., Rice,D.W. and Eisenberg,D. (1996) In Hunter L. and Klein T. (eds), Proceedings of the 1st Pacific Symposium on Biocomputing. World Scientific Publishing Company, Singapore, pp. 300–318.

Godzik,A. (1996) Protein Sci., 5, 1325–1338.[Abstract/Free Full Text]

Holm,L. and Sander,C. (1993) J. Mol. Biol., 233, 123–138.[ISI][Medline]

Holm,L. and Sander,C. (1996) Science, 273, 595–603.[Abstract/Free Full Text]

Koppensteiner,W.A., Lackner,P., Wiederstein,M. and Sippl,M.J. (2000) J. Mol. Biol., 296, 1139–1152.[ISI][Medline]

Kraulis,P.J. (1991) J. Appl. Crystallogr., 24, 946–950.[ISI]

Lackner,P., Koppensteiner,W.A., Domingues,F.S. and Sippl,M.J. (1999) Proteins, Suppl. 3, 112–120.

Lemer,C.M., Rooman,M.J. and Wodak,S.J. (1995) Proteins, 23, 337–355.[ISI][Medline]

Levitt,M. and Gerstein,M. (1998) Proc. Natl Acad. Sci. USA, 95, 5913–5920.[Abstract/Free Full Text]

Marchler-Bauer,A. and Bryant,S.H. (1995) Proteins, 23, 74–82.

May,A. and Johnson,M.S. (1994) Protein Eng., 7, 475–485.[ISI][Medline]

Merritt,E.A. and Murphy,M.E.P. (1994) Acta Crystallogr. D, 50, 869–873.[ISI][Medline]

Murzin,A.G. (1993) EMBO J., 12, 861–867.[Abstract]

Murzin,A.G. (1998) Curr. Opin. Struct. Biol., 8, 380–387.[ISI][Medline]

Murzin,A.G. (1999) Proteins, Suppl. 3, 88–103.

Murzin,A.G., Brenner,S.E., Hubbard,T. and Chothia,C. (1995) J. Mol. Biol., 247, 536–540.[ISI][Medline]

Needleman,S.B. and Wunsch,C.D. (1970) J. Mol. Biol., 48, 443–453.[ISI][Medline]

Orengo,C.A., Michie,A.D., Jones,S., Jones,D.T., Swindells,M.B. and Thornton,J.M. (1997) Structure, 5, 1093–1108.[ISI][Medline]

Orengo,C.A., Todd,A.E. and Thornton,J.M. (1999) Curr. Opin. Struct. Biol., 9, 374–382.[ISI][Medline]

Pearson,W.R. and Lipman,D.J. (1988) Proc. Natl Acad. Sci. USA, 85, 2444–2448.[Abstract]

Prlic,A., Domingues,F.S. and Sippl,M.J. (2000) Protein Eng., 13, 545–550.[Abstract/Free Full Text]

Shindyalov,I.N. and Bourne,P.E. (1998) Protein Eng., 11, 739–747.[Abstract]

Sippl,M.J. and Stegbuchner,H. (1981) Comput. Chem., 15, 73–78.

Taylor,W.R. and Orengo,C.A. (1989) J. Mol. Biol., 208, 1–22.[ISI][Medline]

Weber,P.C. and Salemme,F.R. (1980) Nature, 287, 82–84.[ISI][Medline]

Zuker,M. and Somorjai,R.L. (1989) Bull. Math. Biol., 51, 55–78.[ISI][Medline]

Received March 9, 2000; revised September 14, 2000; accepted September 28, 2000.