On the rotational operators in protein structure simulations

Carlos Alvarado1 and Kazem Kazerounian2,3

1Biomedical Engineering Program and 2Department of Mechanical Engineering, University of Connecticut, Storrs, CT 06269-3139, USA

3 To whom correspondence should be addressed. e-mail: kazem{at}engr.uconn.edu


    Abstract
 Top
 Abstract
 Introduction
 Methods
 Results
 Discussion
 References
 
The reduction of the computational complexity of the algorithms dealing with protein structure analysis and conformation predictions is of prime importance. One common element in most of these algorithms is the process of transforming geometrical information between dihedral angles and Cartesian coordinates of the atoms in the protein using rotational operators. In the literature, the operators used in protein structures are rotation matrices, quaternions in vector and matrix forms and the Rodrigues–Gibbs formula. In the protein structure-related literature, the most widely promoted rotational operator is the quaternions operator. In this work, we studied the computational efficiency of the mathematical operations of the above rotational operators applied to protein structures. A similar study applied to protein structures has not been reported previously. We concluded that the computational efficiency of these rotational operators applied to protein chains is different from those reported for other applications (such as mechanical machinery) and the conclusions are not analogous. Rotation matrices are the most efficient mathematical operators in the protein chains. We examined our findings in two protein molecules: Ab1 tyrosine kinase and heparin-binding growth factor 2. We found that the rotation matrix operator has between 2 and 187% fewer mathematical operations than the other rotational operators.

Keywords: protein/rotation/simulations/transformation


    Introduction
 Top
 Abstract
 Introduction
 Methods
 Results
 Discussion
 References
 
The computer algorithms used for the analysis and simulation of protein structures are computationally intensive. These programs require a set of generalized coordinates to define specific three-dimensional geometrical structures. The two commonly used sets of generalized coordinates are (i) the Cartesian coordinates of the atoms in the protein molecule as the generalized coordinates (similar to PDB file formats) and (ii) the dihedral angles. The application of dihedral angles has the advantage of reducing the dimension of search space drastically. However, in almost all cases, transforming the geometrical information back and forth between dihedral angles (as generalized coordinates) and Cartesian coordinates of the atoms in the chain is needed. In other words, given the dihedral angles we calculate the x, y and z coordinates of all atoms in the macromolecules (and vice versa). This transformation mainly relies on what is referred to as a ‘rotation operator’.

A rotation operator is the mathematical means of ‘rotating’ a ‘point’ (e.g. atom) in space by a given angle. Protein structures can be described as a serial combination of such rotations (Branden and Tooze, 1999Go), which are called open serial chains. An analogy of an open serial chain is the body of a snake, where the vertebras (links) can move relative to each other about their joints. The representative structural description of a protein’s open serial chain is shown in Figure 1.



View larger version (25K):
[in this window]
[in a new window]
 
Fig. 1. Open serial chain rotation operations. The open serial chains (such as a protein molecule), are geometrically defined by the body vectors, b, unit screw vectors, u, and joint angles, {theta}. The task with protein structures is to determine the new position of the body vectors (or atom positions) and the unit screw vectors for given dihedral and rotameter angles (in our study we consider only the dihedral angles).

 
Every time that we calculate the position of atoms (or body vectors) of protein structures, a sequential composition (multiplication) of several rotational operators is required. This process is computationally very expensive.

Several rotational operators have been developed over the past few decades. These methods are widely reported in the field of analytical and theoretical kinematics. The rotational operators used in protein simulation literature are the quaternion descriptions (Jain et al., 1993Go; Kneller and Hinsen, 1994Go), rotation matrices (Dullweber et al., 1997Go), transformations (Manocha et al., 1995Go) and Rodrigues–Gibbs [similar to the vector rotation approach found in Güntert (Güntert, 1993Go)]. The most widely used method in the modeling of molecular systems is the quaternions in its matrix form.

Our intention in this work was to study and compare the number of operations required in the application of (i) the rotation matrix, (ii) the quaternions vector form, (iii) the quaternions in matrix form, and (iv) the Rodrigues formula to rotate atoms in protein chains. The operator that requires the fewest operations decreases the computational costs of molecular simulations. A reduction in unnecessary computationally expensive calculations improves the computational efficiency of computer simulations.

Funda and Paul performed a similar computational comparison of rotational operators (excluding the Rodrigues formula) (Funda and Paul, 1988Go). Their conclusion, however, is only true for small mechanical chains. They only considered a sequential chain of two revolute joints and assumed that the result increases linearly with the number of joints.

In this study, we not only calculated the number of operations of two compositions, but also provided a general formulation that accounts for a larger number of compositions such as are found in protein chains. We have shown that the conclusions vary significantly as the number of joints increases (as is the case in protein structures) and therefore the results reported previously are no longer valid. We examine our findings with two examples: Abl tyrosine kinase and heparin-binding growth factor 2.


    Methods
 Top
 Abstract
 Introduction
 Methods
 Results
 Discussion
 References
 
Evaluation of rotational operators in protein structures is of prime importance for selecting the least computationally expensive operator in protein simulations. This study compares several rotational operators, the rotation matrix, quaternions (in both vector and matrix form) and Rodrigues formula, to decide which one requires the least number of operations. This by no means suggests that these are the only rotation operators, but they are the methods that have been widely used in molecular simulations.

First, the different operators are introduced and their respective number of operations applied to protein chains. Then, we show the number of operations required to rotate atoms or vectors of two proteins: Abl tyrosine kinase (62 amino acids) and heparin-binding growth factor 2 (155 amino acids).

Rotation matrix

The rotation matrix operator definition to rotate a vector in space about a screw axis, u, by an angle, {theta}, is (Ha and Radcliffe, 1978Go)

where C{theta} and S{theta} represent cos {theta} and sin {theta}, respectively.

Evaluation of the rotation matrix requires 15 multiplications (*), 10 additions (+) and two function evaluations (f). To determine the rotation of a single body vector about a unit screw axis by an angle {theta}, the following relation is used:

This operation requires 9 (*) and 6 (+) for multiplying a rotation matrix and a vector. In open serial chains, there is a sequential multiplication of rotational operators. To determine the rotation of the last vector (see Figure 1), we have a composition of rotation operators:

The composition of two rotational operators accounts for 27 (*) and 18 (+). Again, referring to Figure 1, vector bM–1 is affected by the same joints as bM, so its new position is defined as follows:

Vector bM–2 is not affected by the joint at uN, so the equation becomes

This can be applied to all the vectors in the kinematic chain defining the position of the vectors. The general equation describing the total number of operations [assuming that (*) and (+) have the same computational cost (Field, 1999Go)] to perform rotation of vectors in open serial chains is

where N is the number of degrees of freedom, M is the number of amino acids and Mj is the number of vectors to operate per amino acid (in fact the number of atoms per amino acid). In proteins {sum}Mj is greater than N or, in other words, there are more atoms than degrees of freedom. (Note: only the dihedral angle contributions are considered, not the rotameter angles.)

Quaternion vector operator form

A quaternion/vector operator (Bottema and Roth, 1979Go) that determines the rotation of a single body vector can be defined as

where is a quaternion number that has four components as follows:

[3 (*)]. Evaluation of this rotational operator gives 22 (*), 19 (+) and 2 (f). The position of vector bM of Figure 1 is

Sequential multiplication of two quaternions has 16 (*) and 12 (+). The same joints affect vector bM–1 compared with bM, so its new position is defined as follows:

Vector bM–2 is not affected by the joint at uN, so the equation becomes

This can be applied to all the vectors in the open serial chain defining the position of the vectors. The general equation describing the total number of operations to perform rotation of vectors in open serial chains is

Quaternions in matrix form

In addition to the quaternion vector form, there is a matrix form of the quaternion. This form can be from the rotation matrix (Sheperd, 1978Go) and is shown below:

where, q0 = cos ({theta}/2), q1 = ux sin ({theta}/2), q2 = uy sin ({theta}/2), and q3 = uz sin ({theta}/2).

Evaluation of operations of this matrix gives 16 (*), 13 (+) and 2 (f). This operator now acts in the same way as the rotation matrix. The equation below shows how this quaternion matrix operates in a different way with composition of rotations (see Figure 1):

The sequential composition of two rotational operators accounts for 27 (*) and 18 (+). The same joints affect vector bM–1 as compared with bM, so its new position is defined as follows:

Vector bM–2 is not affected by the joint at uN, so the equation becomes

This can be applied to all the vectors in the kinematic chain defining the position of the vectors. The general equation describing the total number of operations to perform rotation of vectors in open serial chains is

Other quaternion formulations can be found in the literature (Horn, 1987Go), but they are similar to the quaternion vector form.

Rodrigues

The Rodrigues–Gibbs formulation [derived from Bottema and Roth (Bottema and Roth, 1979Go) or Goldstein (Goldstein, 1980Go)] is a screw transformation in terms of a general coordinate system:

This accounts for 21 (*), 12 (+) and 2 (f) for each rotation operation. Again, we show how this method could be applied to successive N rotations and Mj vectors (see Figure 1). For example, let us define the rotation of the body vector, b5, which is affected by the first and second joints (two rotations, one vector):

which accounts for 21 (*), 12 (+), 2(f)];

which accounts for 15 (*), 11 (+). This accounts for a total of 36 (*), 23 (+) and 2 (f). The Rodrigues–Gibbs formula differs from the rotation matrix and quaternion operators in that we have to substitute one vector into the other to obtain its final position. This approach can be applied to all the vectors in the kinematic chain defining the position of all the body vectors. The general equation describing the total number of operations to perform rotation of vectors in open serial chains is

In order to identify which method is the least computationally expensive, it will be helpful to show examples that require the operation of various vectors and joints (or degrees of freedom). In the next section we consider two protein examples, which meet the requirements mentioned before, to understand which of the methods is more computationally efficient.

Rotational operators applied to proteins

In this section, the number of operations to rotate the atoms of proteins Abl tyrosine kinase and heparin-binding growth factor 2 is determined using Equations 1–4. Kinase is an enzyme that attaches phosphate groups to organic substrates that has 62 amino acids: MNDPNLFVALYDFVASGDNTLSITK GEKLRVLGYNHNGEWCEAQTKNGQGWVPSNYITPVNS. Keratin is a protein responsible for the structural skin surface, hair, nails, etc. It has 155 amino acids: MAAG SITTLPALPEDGGSGAFPPGHFKDPKRLYCKNGGFFLRIHPDGRVDGVREKSDPHIKLQLQAEERGVVSIKGVSANRYLAMKEDGRLLASKSVTDECFFFERLESNNYNTYRSRKYTSWYVALKRTGQYKLGSKTGPGQKAILFLPMSAKS.

To determine the number of operations we (i) identify atoms affected by a rotation(s), because not all atoms are affected by all dihedral angles; and (ii) substitute corresponding N rotation(s) (dihedral angles), M amino acids and Mj vectors (atoms) in the four derived equations (1–4). Table I summarizes the number of computations required by each method to describe the position of atoms.


View this table:
[in this window]
[in a new window]
 
Table I. Operations of protein body vectors
 

    Results
 Top
 Abstract
 Introduction
 Methods
 Results
 Discussion
 References
 
Table I show that the rotation matrix requires a smaller number of operations than the other methods. The percentage difference in the number of mathematical operations of the rotational operators with respect to the rotation matrix is also given. The quaternion in matrix form is the closest to the rotation matrix with 2% more mathematical operations than the rotation matrix. The Rodrigues–Gibbs formulation requires the largest number of operations with 187% more mathematical operations than the rotation matrix.


    Discussion
 Top
 Abstract
 Introduction
 Methods
 Results
 Discussion
 References
 
In order to understand why the rotation matrix formulation requires the least number of operations in protein structures, let us take a closer look at the equations for each rotation operator. If Mj (number of atoms per amino acid) is larger that N (dihedral angles) the rotation matrix is computationally the least expensive as compared to the other rotational operators. In the case that N is greater than Mj, then it could be possible for the quaternion matrix to be less computationally expensive than the rotation matrix. However, this situation does not arise in the protein chain because Mj is always greater than N.

In addition, vector operators, such as the quaternions in vector form and the Rodrigues–Gibbs formulation, are more computationally expensive for protein chain descriptions. Both the rotation matrix and the quaternions in matrix form require significantly fewer mathematical operations than the vector operators.


    References
 Top
 Abstract
 Introduction
 Methods
 Results
 Discussion
 References
 
Bottema,O. and Roth,B. (1979) Theoretical Kinematics. Dover, New York, pp. 56–62 and 518–525.

Branden,C. and Tooze,J. (1999) Introduction to Protein Structure, 2nd edn. Garland Publishing, New York.

Dullweber,A., Leimkuhler,B. and McLachlan,R. (1997) J. Chem. Phys., 107, 5840–5851.[CrossRef][ISI]

Field,M. (1999) A Practical Introduction to the Simulation of Molecular Systems. Cambridge University Press, Cambridge.

Funda,J. and Paul,R. (1988) Presented at the IEEE International Conference on Robotics and Automation, April 24–29, 1988, Philadelphia, PA.

Goldstein,H. (1980) Classical Mechanics. Addison–Wesley, Reading, MA, Ch. 4.

Güntert,P. (1993) Dissertation, ETH, Zurich, No. 10135.

Ha,C. and Radcliffe,C.W. (1978) Kinematics and Mechanism Design. Wiley, New York, pp. 45–51.

Horn,B.K.P. (1987) J. Opt. Soc. Am. A, 4, 629–642.[ISI]

Jain,A., Vaidehi,N. and Rodriguez,G. (1993) J. Comput. Phys., 106, 258–268.[CrossRef][ISI]

Kernighan,R. and Pike,R. (1999) The Practice of Programming. Addison–Wesley, Reading, MA.

Kneller,G. and Hinsen,K. (1994) Phys. Rev. E, 50, 1559–1564.[CrossRef][ISI]

Manocha,D., Zhu,Y. and Wright,W. (1995) Comput. Appl. Biol. Sci., 11, 71–86.[Abstract]

Sheperd,S. (1978) AIAA J. Guidance Control Dyn., 1, 223–224.

Received June 6, 2003; revised July 17, 2003; accepted August 20, 2003.





This Article
Abstract
FREE Full Text (PDF)
Alert me when this article is cited
Alert me if a correction is posted
Services
Email this article to a friend
Similar articles in this journal
Similar articles in ISI Web of Science
Similar articles in PubMed
Alert me to new issues of the journal
Add to My Personal Archive
Download to citation manager
Search for citing articles in:
ISI Web of Science (3)
Request Permissions
Google Scholar
Articles by Alvarado, C.
Articles by Kazerounian, K.
PubMed
PubMed Citation
Articles by Alvarado, C.
Articles by Kazerounian, K.