1 National Research Council of Canada, Plant Biotechnology Institute, Saskatoon, Saskatchewan, S7N 0W9 and Departments of 3 Computer Science and 4 Biochemistry, University of Saskatchewan, Saskatoon, Saskatchewan, S7N 0W0, Canada
2 To whom correspondence should be addressed. E-mail: kusalik{at}cs.usask.ca
![]() |
Abstract |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
Keywords: 3-D protein structure comparison/MolCom
![]() |
Introduction |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
In essence, conventional methods for comparing proteins attempt to co-locate similar portions of the molecules using features or properties associated with a single level of analysis; to date, no methods analyze and compare a series of defined molecular properties (be they structural, chemical or otherwise) using multiple, simultaneous levels of resolution and none provide a compact consensus structure outlining precisely where the similarity exists in three-dimensional space. The MolCom method simultaneously satisfies all of these shortcomings.
This paper addresses the problem of how to analyze and compare protein structures rigorously so that several levels of detail can be incorporated into a comprehensive three-dimensional spatial map of structural and chemical similarity. Our MolCom method spatially decomposes protein molecules for comparison, similarity detection and categorization. It uses an octtree space-partitioning strategy to explore recursively and represent trends in a dynamic collection of molecular features whose composition changes systematically with the level of analysis. The collection of properties examined and their relative contributions to the assessment of similarity at each level of analysis can be tailored by the investigator. The fundamental goal is to explore and summarize variations in the structure and chemistry of the molecules through space. Features appropriate to the volumes under analysis are compared at every iteration of the algorithm. Measurements taken from overlapping regions of space are summarized at discrete aggregation points. The summarized information is used, in turn, to construct an octtree which identifies those cubic regions containing similar molecular features. The octtree is particularly efficient at summarizing information essential for the characterization and topological localization of molecular properties observed within and around the space occupied by the co-located protein structures being compared. Moreover, the octtree constitutes a multiple alignment of the spatial regions containing molecular similarity (OHearn, 2000; OHearn and Kusalik, 2000
) and provides a compact three-dimensional representation of equivalenced portions of the molecules which reveals a common structural core (Sander and Holm, 1996
). A collection of octtrees creates a rapid, efficient comparison space for automated classification of newly resolved molecules.
A computer application, MolCom3D, has been developed which successfully applies the MolCom method to protein molecules (the name MolCom will refer to both the algorithm and the MolCom3D implementation hereafter, except where indicated otherwise). Separate experiments were conducted to calibrate MolCom and to verify that the spatial analyses and resulting alignments accurately identified both similar and dissimilar molecules (OHearn, 2000). Using pairs of structures with varying degrees of known similarity, the accuracy of MolCom was found to be over 99% in classifying structures as similar or dissimilar. MolComs results were also correlated with scores from CE and compared qualitatively with results from DALI and VAST and with detailed visual inspections of the orientations of secondary structural elements. The similarity scores correlated strongly with the z-scores of the CE method (Shindyalov and Bourne, 1998
). MolCom was highly sensitive even when molecules had extremly low sequence identities (below 4%). On the basis of the qualitative visual comparisons, we consider MolComs similarity scores to be chemically and biologically accurate insomuch as collections of physicochemical properties, which ultimately govern folding and function, are appropriately matched during the comparison.
In the next section, a number of concepts for comparing structures in three dimensions are described, including octtree spatial maps, hierarchical multiple structure alignments and appropriate levels of analysis for structural and chemical properties. These concepts are then used in presenting the details of the MolCom method and its calibration and verification methodologies. The results of calibration and verification tests follow. Finally, in the last section, the advantages of the MolCom method and future research directions are discussed.
![]() |
Materials and methods |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
Our technique requires a number of parameters and functions (called settings hereafter) to be established. Some settings were simply assigned reasonable values based on background knowledge, whereas others required empirical calibration through controlled comparisons using proteins of known similarity. Once an acceptable calibration was achieved, the calibrated method was verified by comparing results from the MolCom and CE methods using a representative sample of proteins.
Definitions
The MolCom method extends prevailing notions of octtrees, multiple structure alignments, levels of analysis and approriate levels of analysis. We now present these definitions before delving into the details of the method itself.
Definition of the octtree spatial map
The MolCom method is a three-dimensional instance of our recently developed Structure Comparison Framework for comparing N-dimensional objects using hypercubes (OHearn, 2000). In three dimensions, the hypercubic subdivision specified in the framework reduces to a cubic mesh structure represented by an octtree. Based on operational definitions compiled from the literature (Foley et al., 1990
), we give the following general definition of an octtree:
An octtree is a hierarchical data structure which represents and locates aggregated feature information within a cubic region of space. It is constructed in accordance with a recursive spatial decomposition of each cubic region into eight uniform cubic sub-regions. These sub-regions are bounded by the three (orthogonal) dividing planes and the original (planar) faces of the cubic region being subdivided.
The octtree is neither the actual subdivided cubic region nor the resulting mesh structure. Rather, it is a logical structure in the form of a tree representing and locating features of interest which are found whenever the corresponding cubic regions are similarly broken down. The actual data structure used to implement the octtree can exist in any convenient form (e.g. binary tree, octary tree, 23 tree, etc.). The octtree simply requires each of the eight sub-cubes to be represented by a corresponding subtree. The MolCom method considers the octtree to be a special case of a binary space partition (BSP) tree (Foley et al., 1990; OHearn, 2000
).
Figure 1 shows a spatial decomposition of successive cubic regions (left-hand side) and a corresponding octary tree representation of the octtree (right-hand side).
|
Definition of analytical appropriateness
We informally define the level of analysis as the degree of physical detail or conceptual abstraction chosen for the investigation or description of a system. A level of analysis can be considered appropriate if the degree of physical detail (or conceptual abstraction) succinctly embodies the essential characteristics and functionality of some portion of the system under investigation. Current comparative methods apply a fixed level of analysis. For example, r.m.s. deviation and distance matrices consider amino acid residue positions (Holm and Sander, 1993; Collaborative Computational Project No. 4, 1994
; Shindyalov and Bourne, 1998
), clustering is based on backbone fragments (Alexandrov et al., 1992
; Gilbrat et al., 1996
) and topological algorithms concentrate on a particular level of structure (Gilbert and Westhead, 1998
). However, the hierarchical nature of protein structure warrants investigation from a variety of perspectives adopting different levels of analysis. To date, no method for protein comparison examines molecules in a recursive manner while changing the features considered salient over the progression of the algorithm. We devised the MolCom method to fill this void, focusing its comparisons on appropriate molecular features at every level of analysis over the progression of the algorithm.
Definition of the hierarchical multiple structure alignment
Current methods for detecting protein similarity involve the construction of some form of multiple structure alignment which reveals regions of similarity based on three-dimensional structure. Our method is no exception. However, the notion of what constitutes a multiple structure alignment varies considerably with the approach used to build it, ranging from alignments of -carbon atoms to alignments of secondary structural elements. Consequently, we have extended the definition of the multiple structure alignment by considering the principal purpose of such alignments, namely to identify general regions of correspondence between proteins in a way that is chemically and biologically meaningful. Our extended definition reads as follows:
A multiple structure alignment of k objects is a computational entity representing the collection of spatial locations (or other referencing identifiers) mapped to regions deemed to coincide in all k objects and deemed equivalent with respect to the consideration of (one or more) properties within and around those regions in all k objects.
Correspondence (coincidence) does not pertain exclusively to sequence or to a given type of structure, but rather to any comparable property that arises from the structural constitution of the molecules (consequently, the word structure has been retained in multiple structure alignment). In particular, the spatial localities from which properties are considered (measured, predicted, conceptualized) need not coincide geometrically with the spatial locations actually attributed with the measurements. This abstraction allows for physically reasonable but insightful departures from the purely geometric bounds of the objects being compared. Such geometric departures have been found to increase sensitivity to similarity between non-rigid objects (OHearn, 2000; OHearn and Kusalik, 2000
) and are characteristic of topological methods such as the TOPS-based structure comparison method (Gilbert and Westhead, 1998
). The MolCom method matches hierarchical sets of features in building its octtrees and the resulting octtree-based alignments are referred to as hierarchical multiple structure alignments.
Dynamic levels of analysis and property weighting curves
Over successive levels of analysis, the depth of the octtree increases and property information is aggregated from cubes having edge lengths which are half as long as in the previous level. To promote analytical appropriateness, the MolCom method defines a set of property weighting functions that regulate the extent to which the various types of molecular properties contribute to the overall similarity score as the level of analysis increases (see Figure 2). The weighting functions are meant to approximate the relative contributions of empirical properties at the volumes of space corresponding with each level of analysis.
|
Octtrees constructed using this hierarchically organized approach to property analysis constitute information-rich hierarchical multiple structure alignments. The octtree-based alignments are, in effect, consensus structures which reveal structurally and chemically relevant similarity based on the defined collection of properties used in the comparative analyses.
Operation of the MolCom algorithm
In general, the MolCom algorithm operates as follows. Properties in corresponding regions of space for each molecule are compared. Regions found to contain similar properties are recursively analyzed in further detail. As the resolution becomes finer, the collection of properties examined changes so that the volumes of the regions under analysis are matched with properties expected to dominate molecular structure and chemical activity. An octtree spatial map is constructed during the comparison which precisely locates those cubic regions of space where, collectively, the properties match. Moreover, the octtree spatial map can be annotated with information about the properties deemed similar in those regions.
The details of the method are as follows. It is assumed that the molecules are pre-aligned in approximately the same orientation. [In earlier implementations of MolCom, molecules under comparison were first rotated and translated into initial alignment orientations by a program called lsqkab, a protein crystallographic program from the Collaborative Computational Project (CCP4) (Collaborative Computational Project No. 4, 1994). With subsequent development of the algorithm, the initial alignment function was incorporated into the implementation. Currently the alignment is performed using repeated orienting applications of MolCom, starting at 40° increments and then refining the best 40° orientations to within 20, 10 and then 5° of the optimal orientation giving the best similarity score. The authors are currently perfecting a better incremental pre-alignment method (see Discussion section).] A starting cube of edge length |H0| is defined such that it is large enough to enclose fully the largest molecule under investigation. Over successive iterations of the algorithm or levels of analysis, L, this cube is recursively subdivided into eight uniform sub-cubes of length |HL| and comparative information is summarized while an octtree is constructed. A fully constructed octtree locates only those cubic regions of space where collections of properties have been found equivalent. Further subdivision of a given sub-cube terminates when an octant decision score falls under a certain exploration threshold value. An octtree is considered fully constructed if no further similarity can be found for the collection of molecules or if the maximum height Lmax has been reached. As a consequence of the algorithm, if one molecule is vastly larger than the other, the molecules will be deemed dissimilar early on in the analysis.
The results of all property measurements for a given sub-cube (or equivalently, octant) are collated and assigned to a point located at the center of the sub-cube called the aggregation point (the points at the centers of the cubes shown in Figure 1). The properties currently examined by the MolCom method are listed in Table I
. The nine properties are categorized into groups 1a and 2a (more influential) and groups 1b and 2b (less influential).
|
|
|
|
The MolCom method indicates the extent to which trends in the defined collection of properties match in the compared molecules by way of a similarity score. The score is meant to account fairly for sparse distributions of properties in space. A significant portion of the volume containing a molecule consists of empty space. While this empty space should be considered in the similarity scores, it should not be allowed to dominate the scoring, lest every molecule be found nearly 100% similar. Thus our similarity score is cumulative over the levels of analysis and is adjusted for the amount of volume under analysis. The similarity of two molecules for a given level of analysis, L, is given by the cumulative volume-adjusted (CVA) score defined by the following relation:
![]() | (1) |
where
CVA0 does not exist;
TL = total number of octants examined at level L;
![]() |
![]() |
The CVA score at level L > 1 is the sum of two terms. The first term scales the ratio of similar octants to total octants examined (RL) by the proportion of the volume actually found to contain properties (TL/8TL- 1). Every octant in a region where RL is above the exploration threshold is further subdivided in the next iteration. The second term awards the remaining proportion of the volume whose octants contain no properties [1 (TL/8TL- 1)] the prevailing similarity score from the previous level of analysis (RL- 1). The octants in this region are not further subdivided because they contain no discernible properties. The factor (1 S) governs the extent to which empty space influences the score. The influence of empty space increases as S is increased. S was set to 0 during calibration, but subsequent informal tests found S = 0.15 to give slightly better scores with large molecules. The overall similarity score (simply referred to as the CVA score) is the geometric mean of the level-dependent CVA scores:
![]() | (2) |
By definition, a CVA score at or above 70% is meant to indicate similarity; dissimilar molecules should never be awarded CVA scores of 70% or more. MolCom was calibrated on the basis of this similarity boundary.
Implementation
The MolCom method for proteins was implemented as a C++ computer application, called MolCom3D, along with auxiliary UNIX shell scripts and a Perl/CGI Web interface. MolCom3D takes as inputs two protein coordinate files (currently in PDB format). It produces the octtree during the comparative process, the best CVA similarity score found and two new PDB coordinate files with the molecules co-located at their centers of mass and rotated into the positions giving the highest CVA scores. A script file is also produced which automatically loads and superimposes the generated files into VMD (Humphrey et al., 1996) for viewing.
Calibration of the MolCom algorithm
The MolCom algorithm required several constants and functions (settings) to be determined in accordance with the rules of the Structure Comparison Framework (OHearn, 2000) from which it was derived. These are summarized in the first two columns of Table III
. Some of these parameters could be assigned values whereas others required empirical calibration using protein structural data. On the basis of background knowledge (Lehninger et al., 1993
; Shortle, 1998
) and practical experience, it was straightforward to assign values for settings controlling the general exploratory behaviour and the overall scoring contribution scheme for similarity detection. These included |H0|, Lmax, D(r), S and the functional coefficients of Wp(L), i.e. a, b, c and d. To enable MolComs exploratory behavior and property scoring contributions to be tuned for proteins, the remaining settings, including E(L), T and the scalars wp, were assigned values based on calibration. The objective of calibration testing was to determine at least one set of parameter assignments that would result in the generation of reasonable CVA similarity scores based on criteria described below.
Calibration testing was conducted using pairs of structures drawn from the knottin and snake toxin superfamilies (folds) (Murzin et al., 1998). These superfamilies offered a collection of medium-sized proteins with roughly equal sizes (6070 residues) which were convenient for calibration purposes. The calibration based on medium-sized proteins was expected to extend to large proteins; this extension was validated with subsequent verification testing.
MolCom made a total of 1950 comparisons using 78 prealigned pairs of protein structures with varying degrees of similarity. The 78 pairs consisted of all possible (132) combinations of structures with PDB identifiers 1AHO, 1LQH, 1NRA, 1PTX, 2SN3, 1COD (knottins) and 1COD, 1CRE, 1ERA, 1FAS, 1KBS, 1NXB, 2CDX, 2CRT (snake toxin-like). The comparisons involved five cubic overlap settings, k, of E(L), ranging from 100 to 200% in increments of 25%, and five exploration thresholds, T, ranging from 50 to 70% in 5% increments (OHearn, 2000). The protein pairs were assigned to two groups based on their confirmed similarity; similar pairs were assigned to Group S and dissimilar pairs were assigned to Group D. Similarity was confirmed if the r.m.s.d. values were below 10 Å, the proteins came from the same SCOP classifications and the presence and orientations of SSEs matched on the basis of visual inspection; pairs were confirmed to be dissimilar otherwise.
A limit of seven cubic subdivisions was imposed for each execution of MolCom since the CVA score was found not to change appreciably beyond this level during previous experimentation. Using the best parameter assignments for k and T, further comparisons were conducted using four different scalar multiplier (wp) combinations for the functions Wp(L) (Figure 2) associated with each property group. The multipliers are given as a 4-tuple (w1a,w1b,w2a,w2b). The 4-tuples tested were (1,1,1,1) (shown in Figure 2
), (1,2,1,1), (1,1,1,2) and (1,2,1,2). For instance, applying combination (2,1,1,1) to the curves shown in Figure 2
would cause helices and sheets to affect the similarity score four times as much as turns and loops (instead of two times) and charged and polar residues to have twice the effect of aromatic and aliphatic residues (as already shown in the figure).
The validity of the CVA scores was assessed as follows. First, the CVA scores were expected to demonstrate a tendency to increase monotonically from 0 to 100% as the degree of similarity of the calibration structures rose from extreme dissimilarity to identity. The correlation of the CVA scores with the minimum r.m.s.d.s of the structures was used as an approximate measure of this tendency. A successful calibration required a correlation of -0.8 or less. Second, the number of individual errors committed in awarding CVA scores was to be minimized. An error was committed whenever a CVA score below 70% was awarded to a pair of proteins from group S (similar structures were deemed dissimilar) or whenever a score of 70% or more was awarded to proteins from group D (dissimilar structures were deemed similar). This was referred to as the segregation rule and was in effect only during calibration wherefor a special, visually inspected, bipartite collection of structures was used. The number of mismatches based on the segregation rule was tallied. It was required that at least 95% of the protein pairs be correctly classified as similar or dissimilar according to the segregation rule. Finally, the difference between the averaged CVA scores from Groups S and D was to be maximized. This difference is referred to as the selectivity of the CVA score.
The results of calibration testing are reported in the Results section.
Verification of the MolCom method
The objective of verification testing was to confirm that the calibrated MolCom method produced meaningful and accurate CVA scores (with at least as much accuracy as the currently available automated methods) and that MolComs alignments of superimposed structures, based on a quantitative analysis of trends in molecular properties through space, would be as good as alignments produced by manual manipulation of structures using visualization software (such as Turbo Frodo, MolMol, Setor or VMD). The verification was performed on data different from that used for calibration.
A collection of 1054 pairs of structures (distinct from those used during calibration testing) was formed from four superfamily classifications, including knottins (1AYJ, 1BCG, 1BMR, 1BRZ, 1CHZ, 1CN2, 1DQ7:A, 1DQ7:B, 1FH3, 1GPS, 1I6F, 1LQQ, 1SNB, 2B3C), snake toxins (1CRD, 1CTX, 1CVO, 1FRA, 1FSC, 1KBT, 1NEA, 1NOR, 1NTN, 1QKE, 2CRS, 2CTX, 3EBX), lysozymes (1AT5, 1AZF, 1HEL, 1HER, 1JPO, 1LYY, 1LZG, 1OUG, 1UIF, 1UIH, 1WQR, 2HEC) and microbial ribonucleases (1BU4, 1DET, 1GSP, 1RGA, 1RGL, 1RLS, 1RMS, 1RTU, 2GSP, 3BIR, 3GSP, 4BIR, 4GSP, 5GSP, 6GSP). The structure sizes ranged from 51 to 436 amino acid residues. The pairs formed from within and between the four superfamily classifications were compared by MolCom and by CE (Shindyalov and Bourne, 1998). There were about four times as many pairs of dissimilar structures as similar structures as a result of the pairing composition. The structures were not pre-aligned. Instead, MolComs property-based technique for alignment of structures was tested. MolCom was configured to combine CVA score estimates from 125, 150, 175 and 200% cubic overlap into an overall CVA score.
In addition to the above test cases, some structures known to be complex or difficult to compare were also tried. These included pairs of large structures with low sequence identities and pairs of structures whose similarity was documented as difficult to detect for CE, DALI and VAST (Shindyalov and Bourne, 1998). The pairs are listed in Table V
.
|
The validity of the CVA scores was also assessed by counting the number of incorrectly awarded CVA scores: pairs of structures formed between groups were expected to receive CVA scores under 70%, while pairs formed within groups were expected to receive CVA scores over 50%.
A subset of 200 alignments of superimposed structures produced by MolCom was chosen at random and assessed qualitatively by visual inspection and any errors were tallied. The MolCom alignments were also visually compared with alignments produced by the lsqkab software (Collaborative Computational Project No. 4, 1994) whenever the minimized r.m.s.d. was 4 Å or lower. Where CVA scores were high (over 70%), the major trends in SSEs were expected to match (e.g.,
-helices found near
-helices, ß-sheets found near ß-sheets, etc.) and were expected to be in approximately similar orientations. This tendency was to become more pronounced with increasing CVA scores.
![]() |
Results |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
MolCom was successfully calibrated. A detailed (and Web-accessible) listing of the calibration results is available (OHearn et al., 2002). For brevity, we discuss only the parameter assignments that yielded an acceptable calibration:
Figure 4 shows the general relationship between the CVA scores and the amount of r.m.s.d. between structures. The CVA scores were found to increase monotonically with the degree of similarity as evidenced by a strong negative correlation between the CVA scores and the r.m.s.d. values, r = -0.85.
|
Verification
A Web-accessible listing of the verification results is available (OHearn et al., 2002). The MolCom method was found to produce biologically accurate CVA scores on the 1053 pairs of structures. Figure 5
shows a scatter plot of MolComs CVA scores versus z-scores from the CE method. The regression line relating CVA scores to z-scores was
|
![]() |
The CVA scores correlated strongly with the z-scores, r = 0.82. Figure 5 also shows four intersecting areas where mismatches in scoring pairs of structures were considered significant. These areas are also listed in Table IV
. Each area is bounded by a z-score domain boundary and a CVA score range boundary; mismatches in other areas were considered marginal.
|
|
The number of incorrectly awarded CVA scores was seven out of 1054 (>99% accuracy). No false positives were detected; all pairs of structures formed between groups (dissimilar pairs) received CVA scores <70%. The seven CVA scores considered incorrect (pairs within groups receiving CVA scores <50%) all exhibited highly discordant secondary structural geometries (the structure pairs are listed in the right-hand column of Table IV). Of the seven misclassified pairs, the CE method also misclassified five of the pairs as being insignificant, with z < 3.7 (1AYJ+1CN2, 1BCG+1GPS, 1CDR+1CTX, 1CN2+1GPS, 1GPS+2B3C) and classified one pair (1CHZ+1GPS) with z = 3.7. The largest mismatch in scoring (1GPS+1SNB) between CE (z = 3.9) and MolCom (CVA = 46.9%) was in the CE twilight zone. The 1SNB structure contains a large looped region not present in the 1GPS structure. The resulting discord in centers of mass causes MolComs low score.
Qualitative visual assessment of the subset of MolComs superimposed structure alignments showed that all alignments appeared correctly superimposed. All lsqkab-generated alignments below 4 Å were virtually indistinguishable from alignments generated by MolCom. In all cases, the major trends in SSEs were noticeably coincident with an increasing amount of overlap as the CVA score increased.
MolCom was able to detect at least marginal similarity in 10 out of the 12 challenging pairs of structures listed in Table V deemed similar by experts (Shindyalov and Bourne, 1998
). Pairs 14 were not detected as similar by DALI or VAST; pairs 5 and 6 were not detected by DALI; pairs 7 and 8 were not detected by VAST. Pairs 912 (and pair 8) were rather complex structures with low sequence identities (well under 30%). MolCom did not consider pairs 6 and 8 to be similar, but the alignments were reasonable. Only VAST detected similarity in pair 6. CE detected similarity in all pairs detected by MolCom, along with pair 8. However, the structures of pair 8 were considered similar by CE primarily because of matching localized regions of sequence and not because of global, 3-D structural similarity. MolCom, in fact, produced a reasonable CVA score for pair 8 despite the amount of global structural differences.
As further illustration of the operation of MolCom, Figure 7 shows an example superposition produced by MolCom of two similar structures from Staphylococcus aureus, V8 protease (V8prot) [recently resolved by X-ray crystallography at the University of Saskatchewan (Prasad et al., 2001
)] and epidermolytic toxin A (1AGJ). Figure 8
shows another example superposition generated by MolCom involving two large TIM-barrel molecules, enolase (1EBH) and mandelate racemase (2MNR).
|
|
![]() |
Discussion |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
Using a modest-sized collection of properties and an unoptimized but acceptable calibration, MolCom performed as well as the CE method in detecting similarity. We conclude that MolCom gives biologically accurate results since CE is considered to do so and MolComs results are consistent with those of CE. MolCom has the potential to be optimized for detecting (and revealing the locations of) extremely subtle differences in similar molecules by expanding the set of properties considered or by increasing the exploration threshold parameter. Thus MolCom can become an indispensable tool of the protein crystallographer for studying subtle differences in crystal structures resulting from differences in determination method (X-ray crystallography, NMR spectroscopy), substrate conformations in the ligand-bound and ligand-unbound states and so on.
The MolCom paradigm can be extended in several ways to handle a wider range of comparative and analytical problems, from comparing two or more structures, to analyzing individual structures and summarizing their structural and chemical characteristics. To this end, annotated octtrees will be incorporated into the algorithm in the near term. Annotation will allow octtree-based comparison spaces to be created in which newly resolved molecules can be rapidly compared and classified. Annotated octtrees could also be used to represent the nature and locations of structural and chemical properties analyzed for a single protein structure in three-dimensional space. It is possible that additional calibration testing would result in settings that give even better verification results. For instance, some of the parameters that were simply assigned values could be set using empirical tests. For improved robustness, the calibration set could be extended to incorporate more diverse proteins. The extended calibration set would be kept available for investigators to add new property types to the method. A substructural co-location algorithm will be developed (OHearn, 2000), which will bring regions of statistically significant similarity into closer contact. Co-location should dramatically increase sensitivity to subtle similarities and differences in hierarchically complex aggregate features, such as binding sites. Additional properites will also be added to MolCom, such as regional dihedral angle distribution tendencies [suggested by Prasad et al. (Prasad et al., 2001
)] and other collections of properties for comparing surfaces and for exploring complementarity in structures. A fast alignment method is under development. It is based on a superposition of the actual observation points using a four-dimensional r.m.s.d. algorithm which incorporates three observation point positional vectors and one observation point type vector. The fast alignment method will allow a local version of MolCom to be created which will be able to detect interesting similarities between molecules of substantially different sizes. We also intend to build a property browser allowing an investigator to view configurable representations of property densities through space.
![]() |
References |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
Alexandrov,N.N. Takahashi,K. and Go,N. (1992) J. Mol. Biol., 225, 59.[ISI][Medline]
Collaborative Computational Project No. 4 (1994) Acta Crystallogr., P50, 760763.
Falicov,A. and Cohen,F.E. (1996) J. Mol. Biol., 258, 871892.[CrossRef][ISI][Medline]
Foley,J.D., van Dam,A., Feiner,S.K. and Hughes,J.F. (1990) Computer Graphics, Principles and Practice. Addison-Wesley, Reading, MA.
Gilbert,D. and Westhead,D. (1998) http//www.ebi.ac.uk/~drg/epsrc-vf/epsrc-report-long.html.
Gilbrat,J.F., Madej,T. and Bryant,S.H. (1996) Curr. Opin. Struct. Biol., 6, 377385.[CrossRef][ISI][Medline]
Holm,L. and Sander C. (1993) J. Mol. Biol., 233, 123138.[CrossRef][ISI][Medline]
Hyams,D. (1997) http//www.ebicom.net/~dhyams/cmain.htm.
Humphrey,W., Dalke,A. and Schulten K. (1996) J. Mol. Graphics, 14, 3338.[CrossRef][ISI][Medline]
Lehninger,A.L., Nelson,D.L. and Cox,M.M. (1993) Principles of Biochemistry. Worth, New York.
Mezey,P.G. (1993) Shape in Chemistry, an Introduction to Molecular Shape and Topology. VCH, New York.
Murzin,A.G., Brenner,S.E., Hubbard,T.J.P. and Chothia C. (1998) http//scop.mrc-lmb.cam.ac.uk/scop/intro.html.
OHearn,S. (2000) Thesis, University of Saskatchewan. http//www.octtech.com/publications/ohearn_thesis.pdf.
OHearn,S. and Kusalik,A. (2000) P073, Intelligent Systems for Molecular Biology. http//ismb2000.sdsc.edu/posters_abstracts/MondayPS.html (download, http//www.octtech.com/publications/ohearn_poster.pdf).
OHearn,S., Kusalik,A. and Angel J (2002) The Protein 3-D Structure Alignment Web Site. http//www.octtech.com/MolCom3D.
Prasad, L, Hayakawa,K., Leduc,Y. and Delbaere,L. (2001) P097. American Crystallographic Association Abstracts. http//www.hwi.buffalo.edu/ACA/ACA01/abstracts/PSII.html
Sander,C. and Holm,L. (1993) J. Mol. Biol., 233, 123138.[CrossRef][ISI][Medline]
Sander,C. and Holm,L. (1996) Science, 273, 595602.
Shindyalov,I.N. and Bourne,P.E. (1998) Protein Eng., 11, 739747.[Abstract]
Shortle,D. (1998) Curr. Biol., 7, 151154.
Taylor,W.R. and Orengo,C.A. (1989) J. Mol. Biol., 208, 122.[ISI][Medline]
Received May 15, 2002; revised October 28, 2002; accepted December 17, 2002.