Spatial sign-alternating charge clusters in globular proteins

Yuri N. Chirgadze1 and Elena A. Larionova

Institute of Protein Research, Russian Academy of Sciences, 142292 Pushchino, Moscow, Russia


    Abstract
 
Large sign-alternating charge clusters formed by the charged side groups of amino acid residues and by the N- and C-terminal groups were found in the majority of the globular proteins considered, namely in 235 of 274 protein structures, i.e. 85.8%. The clusters were determined by the criteria proposed earlier: charged groups were included in a cluster if their charged N and O atoms were located at distances between 2.4 and 7.0 Å. The set of selected proteins consisted of known non-homologous protein structures from the Protein Data Bank with a resolution of 2.5 Å or better and a pairwise sequence similarity of less than 25%. Molecular masses of the proteins ranged from 5.5 to 91.5 kDa and chain lengths from 50 to 830 residues. The charged groups on the protein surface were divided approximately equally among isolated charged groups, small clusters of two or three groups, and large clusters of four or more groups, which account for 33, 35 and 32% of all protein charged groups, respectively. The large sign-alternating charge clusters with four or more charged groups were studied in greater detail. The number of such clusters depends on the protein chain length: small proteins contain 1–3 clusters, while large proteins display 4–6 or more. On average, 1.5 clusters per 100 residues were observed. In contrast, the size of a cluster, i.e. the number of charged groups inside it, does not depend on the protein molecular mass, and large clusters are observed in proteins across the whole range of molecular masses. Clusters consisting of four to six charged groups occur most frequently, although extra-large clusters are also often found. We conclude that sign-alternating charge clusters are a common feature of the surface of globular proteins. They are suggested to play a general functional role as a local polar factor of the protein surface.

Keywords: charged groups/globular proteins/hydrophilicity/molecular surface/sign-alternating charge cluster


    Introduction
 
One of the general features of the protein surface is the existence of extended regions with different functional properties. Several such regions have so far been detected and examined in globular proteins: polar areas formed by continuous networks of hydrogen bonds between polar side chains and water molecules (Peters and Peters, 1993); clusters of ion pairs (Musofia et al., 1995); extended polar charge areas formed by spatially sign-alternating charged groups of the protein (Chirgadze and Tabolina, 1996); and hydrophobic patches of non-polar groups (Lijnzaad et al., 1996).

'Complex salt bridges', i.e. clusters of ion pairs with a charge-to-charge distance of 4 Å or less, have recently been examined in a PDB subset of protein structures (Musofia et al., 1995). It was found that 60% of the studied proteins contained complex salt bridges and that most of them had an important function in intersubunit interactions. Extended polar charge regions that include spatial sign-alternating charge clusters were found in the calf eye lens protein γ-crystallin (Chirgadze and Tabolina, 1996). There the charge-to-charge distances were taken to be 7.0 Å or less. It was shown that γ-crystallin has five such large clusters of four to six charged groups, which together comprise 54% of all charged groups in this protein. The common stereochemical properties of sign-alternating charge clusters have led to their description as surface structural invariants. The evolutionary conservation of these clusters was confirmed for all members of the γ-crystallin family of vertebrates, including fish, frog, mouse, rat, calf and man. The charge clusters play two functional roles in γ-crystallins. One is connected with a decrease of the surface 'hydrophilic potential' in the cluster areas, which allows the native protein to exist in the condensed medium of the eye lens (Wistow et al., 1983). The other is an increase in the local stability of the protein internal structure (Chirgadze, 1996). The result obtained for γ-crystallin encouraged us to look for sign-alternating charge clusters in all known protein structures determined at high resolution. We show below that charge clusters of this type are widespread among globular proteins and are of general interest for studying the protein surface.


    Materials and methods
 
Definition of sign-alternating charge clusters

Ion pairs on the protein surface are formed by the side groups of amino acid residues and can be divided into two types. The first comprises short-contact ion pairs, which are postulated to have intercharge distances of less than 4.0 Å (Barlow and Thornton, 1983). The second comprises distant ion pairs with intercharge distances from 4 to 7 Å. It should be noted that in both cases neither a water molecule nor any other protein atomic group can be situated between the charged atoms. Here we follow the cluster definition given in the previous paper (Chirgadze and Tabolina, 1996), which includes ion pairs of both types. A sign-alternating charge cluster is taken to be a system of oppositely charged atoms of amino acid side chains and of N- and C-terminal charged groups with distances d between charged atoms of 2.4 < d ≤ 7.0 Å. In this approximation, point charges are placed at the centres of the corresponding oxygen and nitrogen atoms. The positively charged groups of Lys, Arg and the N-terminus and the negatively charged groups of Asp, Glu and the C-terminus were considered. The pH was assumed to be neutral, and the side groups of histidine residues were not taken into account. Such clusters differ from the small clusters formed by conventional short-contact ion pairs in the extent of the local area, which is evenly filled only with charged atoms or atomic groups. As mentioned above, these clusters can be considered as a surface structural invariant. We therefore paid attention mainly to large-size clusters, although the charged groups on the protein surface are distributed among single isolated groups, small-size clusters of two or three groups, and large-size clusters formed by four or more charged groups.
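
The distance criterion above can be illustrated with a minimal Python sketch. It is not the CLUSTER program: how that program links atoms into clusters is not described here, so grouping by connected components (via a small union-find) is an assumption, as are the function name find_clusters, the input layout and the example coordinates. The requirement that no water molecule or other atomic group lies between the charged atoms is not modelled.

```python
import math
from itertools import combinations

D_MIN, D_MAX = 2.4, 7.0  # Angstrom; limits of the criterion 2.4 < d <= 7.0


def find_clusters(atoms):
    """Group charged groups into sign-alternating clusters.

    atoms: list of (group_id, sign, (x, y, z)) with sign +1 or -1.
    Two groups are linked when any pair of their oppositely charged
    atoms satisfies D_MIN < d <= D_MAX. Returns a list of sets of
    group ids (connected components; an assumed reading of the text).
    """
    parent = {group_id: group_id for group_id, _, _ in atoms}

    def root(g):
        # Follow parent links to the representative, shortening the path.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for (g1, s1, p1), (g2, s2, p2) in combinations(atoms, 2):
        if g1 == g2 or s1 == s2:
            continue  # same group, or same sign: never linked
        if D_MIN < math.dist(p1, p2) <= D_MAX:
            parent[root(g1)] = root(g2)

    clusters = {}
    for g in parent:
        clusters.setdefault(root(g), set()).add(g)
    return list(clusters.values())


# Hypothetical coordinates, for illustration only.
example_atoms = [
    ("LYS10", +1, (0.0, 0.0, 0.0)),
    ("ASP12", -1, (3.0, 0.0, 0.0)),
    ("ASP12", -1, (4.0, 1.0, 0.0)),
    ("ARG30", +1, (7.0, 0.0, 0.0)),
    ("GLU45", -1, (10.0, 0.0, 0.0)),
    ("LYS99", +1, (30.0, 0.0, 0.0)),
]
example_sizes = sorted(len(c) for c in find_clusters(example_atoms))
```

In this toy input, four groups form one chain of alternating charges and one lysine stays isolated; cluster size is counted in charged groups, not atoms, as in the text.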


    Program
 
The search for sign-alternating charge clusters was performed with the CLUSTER program (Chirgadze and Tabolina, 1996), which is available on request. In this work we used the modified version 03.97. The program finds charge clusters in the protein molecule, produces a list of the charged atoms in each cluster and yields statistics, in particular on the number of clusters and their size. It thus provides the main results on isolated charged residues and on small-size and large-size charge clusters. Statistics on the composition of the charge clusters were also obtained.
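
The kind of size statistics described above can be sketched as follows. This is an illustrative Python fragment, not the CLUSTER program itself; the function names size_class and summarize and the sample sizes are hypothetical. It uses the three classes of the text: isolated groups (size 1), small-size clusters (2–3 groups) and large-size clusters (four or more groups).

```python
from collections import Counter


def size_class(n_groups):
    """Classify a cluster by its number of charged groups."""
    if n_groups == 1:
        return "isolated"
    if n_groups <= 3:
        return "small"
    return "large"


def summarize(cluster_sizes):
    """Return (clusters per class, groups per class, % of groups per class)."""
    n_clusters = Counter()
    n_groups = Counter()
    for size in cluster_sizes:
        cls = size_class(size)
        n_clusters[cls] += 1
        n_groups[cls] += size
    total = sum(n_groups.values())
    share = {cls: 100.0 * n / total for cls, n in n_groups.items()}
    return n_clusters, n_groups, share


# Hypothetical cluster sizes for one protein, for illustration only.
clusters_by_class, groups_by_class, share_by_class = summarize([1, 1, 2, 3, 4, 6, 1])
```

Applied to the cluster sizes of all proteins in a data set, the third return value corresponds to the kind of percentage breakdown reported in the Results.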

Data

We analysed protein structure data from the Brookhaven Protein Data Bank (Bernstein et al., 1977). Only high-resolution 3D crystallographic data with a resolution limit of 2.5 Å or better were chosen. Non-homologous proteins with a pairwise sequence identity of no more than 25% were taken into account. Small proteins with a chain length of less than 100 residues were allowed to have a sequence similarity of less than 30%. The smallest proteins, with chain lengths of less than 50 residues, were not examined. To obtain a high-quality data set, the PDB files were inspected visually at the final step of file selection. Only a few files were discarded, mainly because data such as the side groups of charged residues were partly missing. Very often a few residues of the N- and C-termini or some parts in the middle of the chain were lacking; structure files with such deficiencies were nevertheless processed. Sometimes alternative (twin) positions of charged groups were found in data files with a resolution better than 1.8 Å. In these cases we chose the first position of the charged atom.
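
The numerical part of these selection rules can be written down as a small filter. This is a sketch under stated assumptions: the Entry record, its field names and the function passes_selection are hypothetical, the pairwise identity is assumed to have been computed beforehand, and the visual inspection step is not represented.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    pdb_id: str               # hypothetical identifier
    resolution: float         # Angstrom
    chain_length: int         # residues
    max_pair_identity: float  # percent, against entries already selected (assumed precomputed)


def passes_selection(entry):
    """Apply the resolution, chain-length and non-homology thresholds."""
    if entry.resolution > 2.5:
        return False  # resolution must be 2.5 Angstrom or better
    if entry.chain_length < 50:
        return False  # the smallest proteins were not examined
    if entry.chain_length < 100:
        return entry.max_pair_identity < 30.0  # relaxed limit for small proteins
    return entry.max_pair_identity <= 25.0
```

For example, a 150-residue entry at 2.0 Å with 20% maximum identity passes, while the same entry at 2.8 Å does not.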


    Results
 
Number of charge clusters depending on protein chain length

An accurate treatment of charge clusters was carried out in the atomic approximation described in our previous paper (Chirgadze and Tabolina, 1996). However, counting charged groups is a much more common practice. For this reason we regarded as large only clusters with four or more charged groups, which consist of at least five charged atoms. The main result is that large charge clusters were observed in 235 of the 274 selected protein structures and were absent in only 39 (Table I). Thus the majority of proteins, i.e. 85.8%, display charge clusters on their surfaces. It should also be noted that almost all proteins of medium and large size, i.e. with chain lengths of 200 residues or more, contain large charge clusters.


 
Table I. PDB codes and chain identifiers of proteins examined
 
The number of clusters in a protein molecule depends on the protein chain length. We selected protein structures with chain lengths from 50 to 830 amino acid residues, which corresponds approximately to molecular masses from 5.5 to 91.5 kDa. The distribution of the selected proteins by chain length is presented in Figure 1, where the proteins are grouped into intervals. Proteins with a chain length between 50 and 200 residues are the most strongly represented.



 
Fig. 1. Distribution of 274 selected proteins depending on their chain length.

 
The relative numbers of proteins with different numbers of large-size charge clusters are presented in Table II, where the large proteins are combined into wider intervals of chain length. Most of the proteins that contain no large charge clusters at all are small proteins with chain lengths of 50–100 residues. Proteins with a chain length of 50–200 residues contain either no clusters or 1–3 clusters per molecule. In middle-size proteins of 200–300 residues, 2–4 clusters are often found, while in large proteins with a chain length of more than 300 residues, 5–6 or even more clusters are observed. The general dependence of the total number of charge clusters on the chain length of a protein molecule is shown in Figure 2. On average, 1.5 large charge clusters are observed per 100 residues. The observed regularities seem to be related to the size of the protein and, correspondingly, of the protein surface, and reflect the increase in the absolute number of charged residues in large protein molecules.


 
Table II. Relative amount of large sign-alternating charge clusters in proteins with different chain length
 


 
Fig. 2. Amount of charge clusters in the protein molecule as a function of protein chain length. Only the data on large-size clusters with four or more charged groups is presented.

 
Size of charge clusters

The overall distribution of charge clusters by size is presented in Table III for 235 globular proteins. We distinguish three main types of cluster according to their size, which is taken to be simply the number of charged groups composing the cluster. Of the 13 754 charged groups considered for these proteins, isolated charged groups make up 32.5%, small-size clusters of 2–3 charged groups contain 35.1%, and large clusters of four or more groups contain 32.4%. Thus in globular proteins nearly one third of all charged groups are united in large-size sign-alternating surface charge clusters. In total, 805 such clusters were observed. Cluster size as a function of protein chain length is presented in Figure 3. A wide range of cluster sizes, from 4 to 22 charged groups, was observed. Clusters of four to six charged groups occur most frequently. However, extra-large clusters that include seven or more charged groups are also widespread. It is interesting to note that the existence of extra-large clusters does not depend on the protein chain length: such clusters occur over the whole range of chain lengths from 50 to 830 amino acid residues.
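
As a quick consistency check on these figures, the short Python calculation below derives two quantities that are not stated in the text and should be read as rough estimates only, since the percentages are rounded: the approximate number of charged groups contained in large clusters, and the resulting mean size of a large cluster. The variable names are illustrative.

```python
total_groups = 13754          # charged groups considered in the 235 proteins
share_percent = {"isolated": 32.5, "small": 35.1, "large": 32.4}
n_large_clusters = 805        # large clusters observed in total
n_proteins = 235

# The three shares should account for all charged groups.
share_sum = sum(share_percent.values())

# Estimated number of charged groups sitting in large clusters.
groups_in_large = total_groups * share_percent["large"] / 100.0

# Derived averages (estimates from rounded percentages).
mean_large_cluster_size = groups_in_large / n_large_clusters
mean_large_clusters_per_protein = n_large_clusters / n_proteins

print(round(groups_in_large), round(mean_large_cluster_size, 1),
      round(mean_large_clusters_per_protein, 1))
```

The estimated mean of about 5.5 charged groups per large cluster is in line with the observation that clusters of four to six groups are the most frequent.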


 
Table III. Statistics of sign-alternating charge clusters of different size as observed in 235 proteins
 


 
Fig. 3. Cluster size, i.e. number of cluster charged groups, as a function of protein chain length.

 
Composition of charge clusters

The relative contents of the various charged groups that form clusters are presented in Table IV. In general, charged groups of different kinds are distributed equally among the three cluster types. However, some groups show a clear preference. The charged side group of Arg has a lower content among single isolated groups but a higher content in large-size clusters. The charged side group of Lys and the carboxyl group of the C-terminus are preferred in small-size clusters but are weakly represented in large-size clusters. For large-size clusters we can conclude that Arg is the more strongly represented component, giving 41.1%, while Lys and the C-terminus are less strongly represented, giving 28.0 and 21.5%, respectively. The result for Arg and Lys agrees fully with the conclusion of Musofia et al. (1995) on the composition of the so-called 'complex salt bridges' consisting of 3–5 ionic groups.


 
Table IV. Relative contents of various charged groups in sign-alternating charge clusters of globular proteins
 

    Discussion
 
The results suggest that sign-alternating charge clusters are a rather common feature of globular proteins. About 86% of the protein structures considered were found to contain large-size charge clusters. Charge clusters with four to six charged groups occur most frequently, although extra-large clusters with 7–10 or even more charged groups are also observed. The arginine side group is slightly preferred in large-size charge clusters, whereas the lysine side group and the carboxyl group of the C-terminus participate weakly in their composition. At present there is no doubt about the functional importance of sign-alternating charge clusters. Their most general function seems to be connected with the local polarity and possibly the stability of the protein surface. Earlier data analysis did not allow an unambiguous explanation of an increase in protein stability on introducing salt bridges (Lee and Vasmatzis, 1997). However, a recent consideration of protein families shows that, for the majority of families, the stability of thermophilic proteins is clearly connected with an increase in the number of hydrogen bonds and salt bridges (Vogt and Argos, 1997). Thus one can suggest that at least some charge clusters could act as a stabilizing factor of the protein surface. In fact, a direct connection of charge clusters with parts of the secondary and tertiary molecular structure has been found in the structure of calf eye lens γ-crystallins (Chirgadze, 1996). A detailed study of large charge clusters in γ-crystallins showed that they display some specific features, such as planar geometry, large linear dimension, water arrangement along the cluster boundary, etc., which should also be taken into account. However, the most interesting feature, which remains a matter for future consideration, is a decrease of the 'hydrophilic potential' of the protein surface in the cluster area (Wistow et al., 1983). This is a consequence of partial shielding of oppositely charged atoms.

Another important but less common role of sign-alternating charge clusters is their possible participation in the formation of quaternary structure through intersubunit interactions, or in any other type of protein–protein association. The joining of two secondary structures of the contacting protein molecules has been observed in a number of proteins containing complex salt bridges in this area (Musofia et al., 1995); these salt bridges are, in fact, particular cases of the charge clusters considered here. Spatial sign-alternating charge clusters can also play a specific functional role as a recognition site. An example is the Lys-Glu-Lys-Glu motif, which is present in multicatalytic proteases and in chaperonins (Realini et al., 1994). We also suggest that charge clusters can play a very significant role at the last stages of protein folding. Finally, surface charge clusters could also be taken into account in the design of protein molecules.


    Note added in proof
 
The theoretical probability of occurrence of charge clusters was estimated as a function of their size [E.Larionova and Yu Chirgadze (1998) Mol. Biol., 32, 1–5 (in Russian)]. It was shown that the observed distribution of single charged groups, small-size clusters and large charge clusters of 4–6 groups is consistent with random occurrence. However, giant clusters consisting of seven or more charged groups were observed in proteins much more frequently than would be expected from random occurrence.


    Acknowledgments
 
This work was supported by the Russian Academy of Sciences and the Russian Foundation for Basic Research (grant No. 96-04-48585).


    Notes
 
1 To whom correspondence should be addressed


    References
 
Barlow,D.J. and Thornton,J.M. (1983) J. Mol. Biol., 168, 867–885.

Bernstein,F.C., Koetzle,T., Williams,G., Meyer,E., Brice,M., Rogers,J., Kennard,O., Shimanouchi,T. and Tasumi,M. (1977) J. Mol. Biol., 112, 535–542.

Chirgadze,Yu.N. (1996) Molek. Biologia (in Russian), 30, 343–347. English translation, pp. 202–205.

Chirgadze,Yu.N. and Tabolina,O.Yu. (1996) Protein Engng, 9, 745–754.

Lee,B. and Vasmatzis,G. (1997) Curr. Opin. Biotechnol., 8, 423–428.

Lijnzaad,P., Berendsen,H.J.C. and Argos,P. (1996) Proteins, 25, 389–397.

Musofia,B., Buchner,V. and Arad,D. (1995) J. Mol. Biol., 254, 761–770.

Peters,D. and Peters,J. (1993) Mol. Engng, 2, 375–400.

Realini,C., Rogers,S.W. and Rechsteiner,M. (1994) FEBS Lett., 348, 109–113.

Vogt,G. and Argos,P. (1997) Fold. Design, 2, 540–548.

Wistow,G., Turnell,B., Summers,L., Slingsby,C., Moss,D., Miller,L., Lindley,P.F. and Blundell,T.L. (1983) J. Mol. Biol., 170, 175–202.

Received December 17, 1997; revised May 12, 1998; accepted September 22, 1998.