Biostructure and Protein Engineering Group, Department of Life Sciences, Aalborg University, Sohngaardsholmsvej 49, DK-9000 Aalborg, Denmark
Abstract |
---|
Keywords: amino acid properties/protein engineering/solvent accessibility/spatial contacts/structural preference
Introduction |
---|
The spatial neighbourhood around individual residues has also been previously investigated (Burley and Petsko, 1985; Bryant and Amzel, 1987; Miyazawa and Jernigan, 1993; Petersen et al., 1999). Further, spatial contacts have been studied to derive contact potentials for the different amino acid interactions (Brocchieri and Karlin, 1995; Miyazawa and Jernigan, 1996, 1999). The common strategy is to study the number of contacts within a given distance cut-off. However, the literature seems devoid of investigations of distance-dependent contacts and also of reports utilizing the embedded information of the solvent accessibility of the residues involved.
For a two-state prediction of solvent accessibility, a correlation between hydrophobicity, buried contact propensity and the location in the prediction window has been reported (Mucchielli-Giorgi et al., 1999). However, that study does not describe any correlation between individual residue distributions.
It is important to be able to discriminate between correctly folded and misfolded model structures. It has been pointed out that potential energy-based methods do not discriminate well between folded and misfolded structures. However, structural features such as buried polar surface (Overington et al., 1992) and number of polar contacts (Bryant and Amzel, 1987; Golovanov et al., 1999) have proven valuable.
In protein engineering the concept of conservative mutations is frequently used. The general idea is that a substitution of an amino acid with another amino acid with similar physico-chemical properties will not influence the stability and function of the protein. The present paper shows that the spatial preferences for similar residues can be dramatically different in protein structures under similar circumstances (in this context solvent accessibility).
The results of the neighbour analysis will be valuable in model validation, as a tool for structure prediction and especially as a guide in the search for stability enhancing mutations.
Methods |
---|
The spatial neighbours of each residue were determined based on solvent accessibility and spatial distance. The solvent accessibility was taken from the respective HSSP files (Dodge et al., 1998). For each surface residue the neighbouring surface residues were grouped according to their distance to the residue in question. The distance between two residues was computed as the shortest distance among the set of all possible pairs of atoms in the two residues. We assume that the alignment in the HSSP file implies that neighbours in the main sequence are also neighbours in the aligned sequences and that the solvent accessibility is conserved (Andrade et al., 1998; Goldman et al., 1998). The expected number of neighbour interactions between residues of type i and j is calculated by
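$E_{ij \mid d,\mathrm{ACC}} = x_i \, x_j \, N_0$ | (1) |
where $E_{ij \mid d,\mathrm{ACC}}$ denotes the expected number of neighbour contacts, $x_i$ and $x_j$ are the fractions of amino acids i and j in the dataset for the distance range d and at a solvent accessibility larger than the cut-off ACC, and $N_0$ is the total number of observed neighbour contacts. The score, $S_{ij \mid d,\mathrm{ACC}}$, is calculated from the observed number of contacts, $N_{ij \mid d,\mathrm{ACC}}$, by
$S_{ij \mid d,\mathrm{ACC}} = \ln\left( N_{ij \mid d,\mathrm{ACC}} / E_{ij \mid d,\mathrm{ACC}} \right)$ | (2) |
This gives a negative score for disfavoured neighbour pairs and a positive score for favoured interactions. The score value $S_{ij \mid d,\mathrm{ACC}}$ can be transformed into an apparent thermodynamic parameter by multiplication with RT.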
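The neighbour analysis described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the expected count is taken as the product of the two residue fractions and the total number of contacts, the score is taken as the natural logarithm of observed over expected, and each contact is counted in both directions. The function names and the data layout are hypothetical and are not taken from the original work.

```python
import math
from collections import Counter
from itertools import product


def min_atom_distance(atoms_a, atoms_b):
    """Shortest distance over all atom pairs of two residues.

    Each argument is a list of (x, y, z) coordinates in Å.
    """
    return min(math.dist(a, b) for a, b in product(atoms_a, atoms_b))


def contact_scores(contacts):
    """Score every observed residue-type pair in one distance/accessibility bin.

    `contacts` lists each neighbour contact once as (type_i, type_j).
    ASSUMPTION: expected = x_i * x_j * N_0 and score = ln(observed / expected).
    """
    # Count every contact in both directions so (i, j) and (j, i) score alike.
    observed = Counter()
    for a, b in contacts:
        observed[(a, b)] += 1
        observed[(b, a)] += 1
    n0 = sum(observed.values())

    # Fraction of each residue type among all contact partners.
    partner_counts = Counter()
    for (a, _b), n in observed.items():
        partner_counts[a] += n
    x = {res: n / n0 for res, n in partner_counts.items()}

    return {
        (a, b): math.log(n / (x[a] * x[b] * n0))
        for (a, b), n in observed.items()
    }
```

Pairs that are never observed are left out of the result, because the logarithm of zero is undefined; a real analysis would need a rule for such pairs, for example a pseudo-count.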
The net charge in each layer of the protein was calculated. Aspartic acid and glutamic acid are considered negatively charged and arginine and lysine are considered positively charged. Histidine is considered either uncharged or positively charged. The relative net charge, $q_{\mathrm{rel}}$, we define as
$q_{\mathrm{rel}} = (N_{\mathrm{Positive}} - N_{\mathrm{Negative}}) / N_{\mathrm{Total}}$ | (3) |
where $N_{\mathrm{Positive}}$ is the number of positive residues, $N_{\mathrm{Negative}}$ the number of negative residues and $N_{\mathrm{Total}}$ the total number of residues in that particular layer.
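A minimal sketch of the relative net charge for one layer follows. It assumes the quantity is the number of positive residues minus the number of negative residues, divided by the total number of residues in the layer. The function name, the three-letter residue codes and the `histidine_charged` switch are hypothetical choices made for illustration.

```python
POSITIVE = {"ARG", "LYS"}
NEGATIVE = {"ASP", "GLU"}


def relative_net_charge(residues, histidine_charged=False):
    """Relative net charge of one solvent accessibility layer.

    `residues` is a list of three-letter residue names in the layer.
    ASSUMPTION: q_rel = (N_positive - N_negative) / N_total.
    """
    if not residues:
        raise ValueError("layer contains no residues")
    # Histidine is counted as positive only when requested.
    positive = (POSITIVE | {"HIS"}) if histidine_charged else POSITIVE
    n_positive = sum(res in positive for res in residues)
    n_negative = sum(res in NEGATIVE for res in residues)
    return (n_positive - n_negative) / len(residues)


layer = ["ARG", "LYS", "ASP", "HIS", "ALA", "GLY", "GLU", "LYS"]
q_neutral_his = relative_net_charge(layer)
q_charged_his = relative_net_charge(layer, histidine_charged=True)
```

Evaluating both histidine settings side by side mirrors the two treatments of histidine described in the text.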
The PDB identification codes for the structures used are 1ptx, 2bbi, 1hcp, 1iml, 1cdq, 1vcc, 1nkl, 1tiv, 2abd, 2hts, 1tpg, 1fbr, 1pco, 1who, 1beo, 2ncm, 1fim, 1tlk, 1xer, 1onc, 1rga, 1erw, 1fd2, 1put, 1fkj, 1jpc, 1thx, 1jer, 1ccr, 1wad, 2tgi, 1pls, 1neu, 4rhn, 1rmd, 1hce, 1hfh, 1tam, 2pf1, 1bip, 1whi, 1yua, 1bp2, 1zia, 4fgf, 7rsa, 1bw4, 2vil, 1eal, 1rie, 1doi, 3chy, 1cpq, 1msc, 1mut, 1rcb, 1lzr, 1htp, 1lid, 1lis, 1lit, 1kuh, 1nfn, 1irl, 1poc, 2tbd, 1cof, 1pms, 1rsy, 1snc, 1eca, 1jvr, 2end, 1anu, 5nul, 1fil, 1jon, 1lcl, 1itg, 1tfe, 1maz, 1pkp, 1lba, 1vsd, 2fal, 1ash, 1def, 2hbg, 1div, 1gds, 1grj, 1i1b, 1ilk, 1rcy, 1sra, 1ulp, 1mbd, 1aep, 1jcv, 2gdm, 1phr, 1rbu, 1esl, 1hlb, 1mup, 1vhh, 1gpr, 1btv, 1cyw, 1klo, 1l68, 3dfr, 2cpl, 1sfe, 1huw, 5p21, 1ha1, 1wba, 1lki, 2fha, 1prr, 2fcr, 1amm, 1cid, 1hbq, 1cdy, 2stv, 153l, 1rec, 1xnb, 2sas, 1gky, 1knb, 1ryt, 1zxq, 1har, 1cex, 1chd, 2tct, 2ull, 1gen, 1iae, 1nox, 1rnl, 2gsq, 1cfb, 1dyr, 1nsj, 2hft, 1fua, 2eng, 1thv, 1hxn, 2abk, 9pap, 1lbu, 3cla, 1vid, 2ayh, 2dtr, 1gpc, 1dts, 1jud, 1emk, 1ois, 1akz, 1sgt, 1ad2, 1nfp, 1din, 1lrv, 1dhr, 1bec, 1lbd, 1dpb, 1jul, 1mrj, 1fib, 1hcz, 1mml, 1vin, 1dja, 2cba, 3dni, 1lxa, 1arb, 1rgs, 1tys, 3tgl, 1ako, 1eny, 1ndh, 2dri, 1xjo, 1drw, 1kxu, 2prk, 1cnv, 1tfr, 1ytw, 1iol, 2ebn, 1tml, 1han, 1xsm, 1pbn, 1amp, 1ryc, 1bia, 1vpt, 1csn, 2ora, 1ctt, 1bco, 1fnc, 1gym, 1pda, 1cpo, 1esc, 2reb, 1mla, 1sig, 8abp, 1ghr, 1iow, 2ctc, 1gca, 1sbp, 1ede, 1pgs, 2cmd, 1anv, 1gsa, 1tag, 1dsn, 2acq, 1cvl, 1tca, 2abh, 2pia, 1pot, 1vdc, 1axn, 1msk, 1hmy, 2bgu, 1ldm, 1dxy, 1ceo, 1nif, 1arv, 1xel, 1uxy, 1rpa, 2lbp, 3pte, 1uby, 1fkx, 1pax, 3bcl, 1air, 1mpp, 2mnr, 1eur, 1cem, 1fnf, 1pea, 1omp, 2chr, 1pud, 1kaz, 1mxa, 1edg, 2sil, 1ivd, 1pbe, 1svb, 1ars, 1oyc, 1inp, 1oxa, 1eft, 1phg, 1cpt, 1iso, 1qpg, 2amg, 1uae, 1gnd, 2dkb, 1gpl, 1csh, 4enl, 1pmi, 1lgr, 1nhp, 1gcb, 1bp1, 1geo, 2bnh, 3grs, 1gln, 1gai, 2pgd, 2cae, 2aaa, 1byb, 1smd, 2myr, 3cox, 1dpe, 1pkm, 1ayl, 1crl, 1ctn, 1clc, 1tyv, 2cas, 1ecl, 1oxy, 1vnc, 1gal, 1dlc, 
1sly, 1dar, 1gof, 1bgw, 1aa6, 1vom, 8acn, 1kit, 1taq, 1gpb, 1qba, 1alo and 1kcw.
Results and discussion |
---|
Figure 2 shows the score values for all amino acid neighbour pairs involving tryptophan, glycine, alanine, proline, serine, histidine, lysine and aspartic acid for neighbour pairs with at least 20% solvent accessibility. The results for the other amino acids are available on our homepage (http://www.bio.auc.dk/). Score values have been calculated similarly for other solvent accessibility cut-offs. The aromatic residue tryptophan is one of only two residues showing a clear preference for contacts with the same residue type (the other is cysteine). Interactions with the other aromatic residues are also preferred. Interestingly, the interactions between tryptophan and the two acidic residues (aspartic acid and glutamic acid) seem to differ. While contacts between tryptophan and glutamic acid are observed less frequently than expected, the opposite is observed for tryptophan and aspartic acid. Glycine shows the typical negative score for interactions with the same residue type. Also, glycine does not seem to have neighbours in the close spatial neighbourhood (3.5 Å). This under-representation of close neighbours is even clearer for proline. We interpret this under-representation as a sign of the preference that proline residues have for loops. The lack of interactions with all other amino acids in its vicinity points to most contacts being with solvent molecules. However, proline has an abundance of contacts at a larger distance (4–5 Å). Histidine is interesting in that it shows signs of its aromatic properties, through preference for contacts with aromatic residues (~3.5 Å), and of its polarisable nature, through preferred contacts with the negatively charged residues (~3 Å). The basic amino acid lysine has, as expected, a clear negative score for contacts with other lysines. The favourable electrostatic interactions with the acidic amino acids are evident.
The amino acid composition of each solvent accessibility layer was determined. As expected, the buried parts of the proteins contain a higher proportion of non-polar residues than the more solvent-exposed layers. The correlation between amino acids was calculated from their composition across the individual structural layers. Amino acids that have similar preferences for solvent contact and local environment are expected to show a high positive correlation because of similar trends in their distribution. Conversely, amino acids showing negative correlation have different preferences for local environment and are therefore not believed to be compatible, i.e. a single-site mutation of this type at this location is not recommended. As the non-polar residues are abundant in the core and show a gradual decrease as the solvent accessibility increases, the correlation between the non-polar residues is in general positive (Figure 4). In contrast, the polar residues are more abundant in the highly exposed parts and hence are negatively correlated with the non-polar residues. Histidine and threonine behave markedly differently. They show positive correlation with each other, but little correlation with any of the other residues, with the exception of arginine and glycine. This is caused by the low occurrence of histidine and threonine in both the buried and highly exposed areas and their relatively high occurrence in the medium-exposed layers. Histidine has positive correlation with two aromatic residues, tryptophan and tyrosine, and with the weakly polar threonine and the polar arginine. Again we interpret this as a sign of both the aromatic properties and the charge properties of histidine. The weakly polar residues do not have the same clear similarity in distribution as the polar and non-polar residues. Proline and serine seem to be more closely related to the polar residues. The weakly polar residue alanine has positive correlation only with the non-polar residues.

We propose that mutations between residues with high positive correlation have a high chance of maintaining the thermodynamic stability of the 3D structure. This is particularly so for charged residues. In contrast, the residues with a high degree of negative correlation are typically residues with different physico-chemical properties, which cannot be interchanged without changing the physical chemistry of the protein. The non-correlated residues include residues with a special role in the structure, e.g. some residues often involved in catalysis. We believe that the observation that proline behaves similarly to polar residues in our study is related to the structural role of proline residues and their preference for loops and turns.

The alanine screening often used in protein engineering projects involves the substitution of residues by alanine, based on the assumption that alanine is a `neutral' residue. However, our data show that alanine has a high negative correlation with all but the non-polar residues. We therefore propose the use of, for example, serine as a substitute for the residues that are negatively correlated with alanine.
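The layer-based correlation discussed above can be illustrated with a short sketch. It assumes that a Pearson correlation coefficient is computed between the per-layer fractions of two amino acids; the source does not state which correlation measure was used. The function names are hypothetical and the fractions below are invented toy numbers, not results from this study.

```python
import math


def pearson(u, v):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    spread_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    spread_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (spread_u * spread_v)


def composition_correlations(layers):
    """Correlate every pair of residues across solvent accessibility layers.

    `layers` maps a residue name to its fraction in each layer,
    ordered from buried to exposed.
    """
    names = sorted(layers)
    return {
        (a, b): pearson(layers[a], layers[b])
        for a in names
        for b in names
        if a < b
    }


# Toy fractions (hypothetical), ordered from buried to exposed layers.
toy_layers = {
    "LEU": [0.14, 0.10, 0.06, 0.03],
    "ILE": [0.10, 0.07, 0.04, 0.02],
    "LYS": [0.01, 0.04, 0.08, 0.12],
}
correlations = composition_correlations(toy_layers)
```

In this toy input the two non-polar residues fall off together towards the surface and correlate positively, while lysine rises and correlates negatively with both, which is the pattern the text describes for non-polar versus polar residues.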
Notes |
---|
Acknowledgments |
---|
References |
---|
Bairoch,A. and Apweiler,R. (1997) Nucleic Acids Res., 25, 31–36.
Bernstein,F.C., Koetzle,T.F., Williams,G.J., Meyer,E.E.,Jr, Brice,M.D., Rodgers,J.R., Kennard,O., Shimanouchi,T. and Tasumi,M. (1977) J. Mol. Biol., 112, 535–542.
Bhavnani,M., Lloyd,D., Bhattacharyya,A., Marples,J., Elton,P. and Worwood,M. (2000) Gut, 46, 707–710.
Brocchieri,L. and Karlin,S. (1995) Proc. Natl Acad. Sci. USA, 92, 12136–12140.
Bryant,S.H. and Amzel,L.M. (1987) Int. J. Pept. Protein Res., 29, 46–52.
Burley,S.K. and Petsko,G.A. (1985) Science, 229, 23–28.
Chandonia,J.M. and Karplus,M. (1999) Proteins, 35, 293–306.
Chothia,C. (1976) J. Mol. Biol., 105, 1–14.
Chou,P.Y. and Fasman,G.D. (1978) Annu. Rev. Biochem., 47, 251–276.
Deane,C.M., Allen,F.H., Taylor,R. and Blundell,T.L. (1999) Protein Eng., 12, 1025–1028.
Dodge,C., Schneider,R. and Sander,C. (1998) Nucleic Acids Res., 26, 313–315.
Donnelly,D., Overington,J.P. and Blundell,T.L. (1994) Protein Eng., 7, 645–653.
Dyson,H.J., Jeng,M.F., Tennant,L.L., Slaby,I., Lindell,M., Cui,D.S., Kuprin,S. and Holmgren,A. (1997) Biochemistry, 36, 2622–2636.
Giletto,A. and Pace,C.N. (1999) Biochemistry, 38, 13379–13384.
Goldman,N., Thorne,J.L. and Jones,D.T. (1998) Genetics, 149, 445–458.
Golovanov,A.P., Volynsky,P.E., Ermakova,S.B. and Arseniev,A.S. (1999) Protein Eng., 12, 31–40.
Hobohm,U. and Sander,C. (1994) Protein Sci., 3, 522–524.
Hobohm,U., Scharf,M., Schneider,R. and Sander,C. (1992) Protein Sci., 1, 409–417.
Holbrook,S.R., Muskal,S.M. and Kim,S.H. (1990) Protein Eng., 3, 659–665.
Jones,D.T. (1999) J. Mol. Biol., 292, 195–202.
McGrath,M.E., Vasquez,J.R., Craik,C.S., Yang,A.S., Honig,B. and Fletterick,R.J. (1992) Biochemistry, 31, 3059–3064.
Miller,S., Janin,J., Lesk,A.M. and Chothia,C. (1987) J. Mol. Biol., 196, 641–656.
Miyazawa,S. and Jernigan,R.L. (1993) Protein Eng., 6, 267–278.
Miyazawa,S. and Jernigan,R.L. (1996) J. Mol. Biol., 256, 623–644.
Miyazawa,S. and Jernigan,R.L. (1999) Proteins, 36, 347–356.
Mucchielli-Giorgi,M.H., Tuffery,P. and Hazout,S. (1999) Theor. Chim. Acta., 101, 186–193.
Overington,J., Donnelly,D., Johnson,M.S., Sali,A. and Blundell,T.L. (1992) Protein Sci., 1, 216–226.
Petersen,M.T.N., Jonson,P.H. and Petersen,S.B. (1999) Protein Eng., 12, 535–548.
Petersen,S.B., Jonson,P.H., Fojan,P., Petersen,E.I., Petersen,M.T.N., Hansen,S., Ishak,R.J. and Hough,E. (1998) J. Biotechnol., 66, 11–26.
Rost,B. and Sander,C. (1994) Proteins, 20, 216–226.
Thompson,M.J. and Goldstein,R.A. (1996) Proteins, 25, 38–47.
Vonderviszt,F., Mátrai,G. and Simon,I. (1986) Int. J. Pept. Protein Res., 27, 483–492.
Wako,H. and Blundell,T.L. (1994) J. Mol. Biol., 238, 693–708.
Wojcik,J., Mornon,J.P. and Chomilier,J. (1999) J. Mol. Biol., 289, 1469–1490.
Received October 17, 2000; revised January 21, 2001; accepted February 15, 2001.