Experimental evidence for the correlation of bond distances in peptide groups detected in ultrahigh-resolution protein structures

Luciana Esposito1, Luigi Vitagliano1, Adriana Zagari1 and Lelio Mazzarella1,2,3

1 Centro di Studio di Biocristallografia CNR and Dipartimento di Chimica, Università di Napoli 'Federico II', Via Mezzocannone 4, I-80134 Naples and 2 CEINGE, Biotecnologie Avanzate, Naples, Italy


    Abstract
 
The structural analysis of a deamidated derivative of ribonuclease A, determined at 0.87 Å resolution, reveals a highly significant negative correlation between CN and CO bond distances in peptide groups. This trend, i.e. the CO bond lengthens when the CN bond shortens, is also found in seven out of eight protein structures, determined at ultrahigh resolution (<0.95 Å). In five of them the linear correlation is statistically significant at the 95% confidence level. The present findings are consistent with the traditional view of amide resonance and, although already found in small peptide structures, they represent a new and important result. In fact, in a protein structure the fine details of the peptide geometry are only marginally affected by the crystal field and they are mostly produced by intramolecular and solvent interactions. The analysis of very high-resolution protein structures can reveal subtle information about local electronic features of proteins which may be critical to folding, function or ligand binding.

Keywords: amide resonance/crystallography/peptide bond/protein structure/ultrahigh resolution

Abbreviations: Beq, equivalent B-factor • CSD, Cambridge Structural Database • N67isoD-RNase A, ribonuclease A containing an isoaspartyl residue at position 67 • PDB, Protein Data Bank • PPI, peptidyl–prolyl cis–trans isomerase


    Introduction
 
Very high-resolution diffraction data from well-ordered crystals with low thermal motion are essential to reveal subtle features of complex and dynamic systems such as protein molecules. The development of high-intensity synchrotron radiation sources, efficient two-dimensional detectors and cryogenic techniques (Walter et al., 1995; Garman and Schneider, 1997; Moffat and Ren, 1997; Lindley, 1999) is producing a growing number of atomic resolution protein structures of increasing size and complexity. The level of accuracy of these 3D protein structures is approaching that obtained for small molecules, thus providing relevant information on biological function and catalysis of the particular enzymes under study (Dauter et al., 1997a; Longhi et al., 1998). From these unbiased structures, evidence for general and fundamental molecular properties can also be derived. Recent studies, focused on the geometry of the polypeptide backbone and on its experimental electron density (Lamzin et al., 1999), have demonstrated the potential of crystallography at ultrahigh resolution. In fact, a correlation between conformation and geometrical main-chain features, such as the N–Cα–C angle and carbonyl carbon pyramidalization (Esposito et al., 2000a,b), has been detected, and in addition the average charge density parameters for the peptide unit have been derived from a multipole refinement of crambin at 0.54 Å resolution (Jelsch et al., 2000). Many of these results have proved to be consistent with those from extensive analyses performed on small molecule structures (see, e.g., Jeffrey et al., 1985).

In this paper, we consider a further aspect of the peptide group, i.e. the correlation between CO and CN bond lengths as expected from the classical Pauling resonance model. This model posits the contribution of the polar form, >N⁺=C–O⁻, to the peptide structure, with a partial double bond character for the C–N bond, and it implies an increase in the C=O bond order when the C–N bond order decreases. The simple resonance model for the amide structure has been debated recently (Wiberg and Breneman, 1992; Fogarasi and Szalay, 1997) since some authors have suggested a more complex picture which, in addition to the π-orbital overlap, emphasizes the role of σ-bond polarization and of the Coulombic interactions between covalently linked atoms (Milner-White, 1997, and references cited therein).


    Results and discussion
 
This analysis considers the peptide bond lengths of a previously determined crystal structure of a ribonuclease A deamidated derivative (N67isoD-RNase A) refined at 0.87 Å resolution (Esposito et al., 2000a). To date, only two protein structures (with >50 residues) have been determined at a better resolution (Genick et al., 1998; Kuhn et al., 1998). In addition, the N67isoD-RNase A system is particularly interesting since the crystal contains two molecules in the asymmetric unit, so that the analyses performed on the model can be validated by checking the correspondence between the two molecules. The large number of observations per modelled atom (~65) has allowed the use of relaxed stereochemical restraints in the refinement and the description of atomic vibrations by six anisotropic displacement parameters. The model was refined to R = 10.1% using all data between 61 and 0.87 Å resolution (Esposito et al., 2000a). Furthermore, to derive an unbiased backbone geometry, the structure was also refined by omitting all the restraints which involve main-chain atoms (Esposito et al., 2000a,b). Peptides were selected on the basis of the equivalent B-factors (Beq) of their atoms (see Materials and methods). Although somewhat arbitrary, this criterion allows one to filter out inaccurate data coming from the less well-defined regions of the molecule. Here, both the restrained and the partially unrestrained models were used to study the relationship between CN and CO bond lengths. Similar results were obtained for the two models (Table I), thus showing that at this resolution most of the bond lengths are sharply defined by the X-ray data alone and are scarcely influenced by the geometric restraints. Hence the discussion will be restricted to the partially unrestrained model.
The CO bond distances were plotted against the CN bond distances of each peptide group and a correlation was found between them in both A and B molecules of the asymmetric unit. The linear regression analysis reveals that the best-fit line is very similar in both chains and that the distances correlate with a negative slope (Table I), i.e. the CO bond lengthens when the CN bond shortens. The correlation coefficient is −0.44 and −0.38 for A and B molecules, respectively, and −0.41 when considering the total set of 177 selected peptide groups (Figure 1A). In all cases, statistical tests were used to provide further evidence that the correlation is real and not a coincidence. The tests yield p-values (see Materials and methods) ranging from 10⁻⁴ to 10⁻⁸, indicating that the correlation is statistically highly significant. It is worth noting that the estimated average uncertainty for CO and CN bond lengths (see Materials and methods) is of the order of 0.01 Å. Although this value is likely to be significantly underestimated, it is well below the observed variation of these bond distances along the chain. The reliability of these findings is further supported by the results of the analysis performed on CN and CO distances averaged over the two molecules of the asymmetric unit. The above trend persisted, with a correlation coefficient of larger magnitude than that found for either of the individual molecules (Table I).
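
As an illustration of the regression described above, the following minimal Python sketch fits a least-squares line to CO versus CN distances and returns the slope, the intercept and the correlation coefficient. The function name and the five distance pairs are hypothetical and are not taken from Table I; they serve only to show the sign convention, i.e. a negative slope and a negative correlation coefficient when the CO bond lengthens as the CN bond shortens.

```python
from math import sqrt


def linear_fit(x, y):
    """Ordinary least-squares fit y = slope * x + intercept.

    Returns (slope, intercept, r), where r is the Pearson
    correlation coefficient between x and y.
    """
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two equally long samples with at least 3 points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


# Made-up bond distances in Å, for illustration only (not data from this work).
d_cn = [1.320, 1.325, 1.330, 1.335, 1.340]
d_co = [1.245, 1.240, 1.238, 1.232, 1.230]

slope, intercept, r = linear_fit(d_cn, d_co)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f} Å, r = {r:.2f}")
```

The sign of r always equals the sign of the fitted slope, so a negative slope in the regression corresponds to a negative correlation coefficient.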


 
Table I. Results of the linear regression analysis
 




 
Fig. 1. Scatter diagrams of the C–O bond distances versus the C–N bond distances. Linear regression analysis is shown for data from (A) N67isoD-RNase A (1DY5 unrestr., see footnote to Table I), (B) selected peptides from the CSD and (C) total set of high-resolution protein structures presenting a statistically significant bond length correlation (see Table I).

 
This investigation was extended to peptide structures available in the Cambridge Structural Database (CSD) (Allen et al., 1979) and to ultrahigh-resolution protein structures available in the Protein Data Bank (PDB) (Bernstein et al., 1977). The CSD was searched for peptides, excluding data from cyclic, disordered or D-amino acid-containing peptides, as well as from structures with an R value >0.07. The analysis of bond distances in the selected 252 peptide groups resulted in a linear correlation coefficient of −0.34 (p = 4 × 10⁻⁸) (Figure 1B). Also, the equation of the linear fit closely matches those found in our protein structure (Table I). This trend of CO/CN distances in amide groups has already been observed in previous small molecule studies (Chakrabarti and Dunitz, 1982; Popelier et al., 1990; Cieplak, 1994). In particular, Popelier et al. (1990) found a fairly strict dependence of the CO length on the number of hydrogen bonds that the oxygen accepts.

In the PDB, eight protein structures were selected when considering only anisotropically refined models containing >50 residues and determined at a resolution of better than 0.95 Å (see Table I for references). Again, peptide groups were selected on the basis of the equivalent B-factors of backbone atoms. In five structures, the negative correlation is statistically significant and the linear fitted equations appear to be rather similar (Table I). For two structures the regression line shows a negative slope, although the correlation is less significant. In the remaining structure the value of the slope is close to zero, an indication that there is no correlation. Figure 1C illustrates the results of the regression analysis for the five protein structures and for the A and B molecules of N67isoD-RNase A. The data suggest that the correlated variations in peptide geometry are real, even though sometimes they may be hidden by the stereochemical restraints used in the refinement.

Although already traced in small molecules, the negative correlation between protein CO and CN bond distances represents a new and important result. These geometry changes in small peptides are mostly due to simple crystal field effects, whereas in proteins they can be produced by intramolecular interactions. Preliminary attempts to connect the bond length changes with specific structural motifs or hydrogen bonding environments did not produce consistent results. As an example, the examination of the CO bond lengths in the better defined regions of α-helices of the N67isoD-RNase A structure showed that, on average, the CO distance is shorter (by ~0.01 Å) when the oxygen is not involved in the hydrogen bond typical of α-helices. The difference in bond lengths is, however, not statistically meaningful. The failure to detect the origins of the CN/CO distance variations can be ascribed to the limited database of ultrahigh-resolution structures and also to the high complexity of a macromolecular framework. In proteins, almost every oxygen and nitrogen atom of peptide groups forms hydrogen bonds and it is difficult to evaluate the strength of the various interactions. A rigorous analysis should take into account not only the number and the stereochemical features of hydrogen bonds, but also the occupancies and B-factors of the atoms involved, the long-range interactions, the role played by the solvent and the placement of the peptide bond in the interior or on the surface of the protein. In particular, an accurate definition of the local electrostatic potential near each peptide group is required.

The results presented here are consistent with Pauling's classical model of amide resonance and at the same time contribute to changing the traditional view of peptide groups usually considered as fairly rigid structural units. In proteins, peptide bonds actually undergo subtle changes in the electronic distribution, which are then reflected in geometry differences.

The importance of these subtle electronic variations has been highlighted by studies on peptidyl–prolyl cis–trans isomerases (PPI) (Van Duyne et al., 1991), which catalyse the interconversion of cis and trans isomers of peptide bonds, a process which can often have a rate-limiting role in protein folding. It has been suggested that in PPI the catalytic action occurs by burying the proline amide bond in a hydrophobic cavity. This process destabilizes amide resonance forms in which the oxygen is negatively charged, thus favouring the resonance structures in which the amide carbonyl is more ketone-like and consequently lowering the rotational barrier around the C–N bond (Eberhardt et al., 1992; Harrison and Stein, 1992). The present findings give further support to the hypothesized mechanism by showing a certain degree of flexibility of the peptide linkage, whose electronic state and geometry can be influenced by the environment.

In conclusion, the analysis of very high-resolution protein structures can reveal subtle information about local electronic features of proteins, which may be critical to folding, function or ligand binding. A deeper understanding of the structural properties may be of importance for further progress in structure prediction methods. On the other hand, the accuracy of these methods could be evaluated by their ability to reproduce the fine details found in very high-resolution protein structures.


    Materials and methods
 
The 0.87 Å resolution structure was refined by SHELX-97 (Sheldrick and Schneider, 1997) to an R of 10.1% and an Rfree of 12.0% (Esposito et al., 2000a). The removal of the geometric restraints involving main-chain atoms (decrease in the number of restraints from 26 709 to 23 157) resulted in practically unaltered R and Rfree factors. As expected, in the less well-defined regions of the structure, larger deviations from geometric ideal values were observed in the partially unrestrained model. However, in this analysis, peptide groups belonging to these zones were rejected by a selection criterion based on equivalent B-factors. In fact, for the N67isoD-RNase A structure as well as for each ultrahigh-resolution protein structure, we calculated the Beq mean values of the C, N and O backbone atoms. Peptide groups were then excluded from the regression analysis when at least one of the atoms (C, N or O) had a Beq more than 20% larger than the corresponding mean value.
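
The selection criterion can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the script used in this work: the class, the field names and the Beq values are hypothetical, the mean Beq is assumed to be taken separately for each atom type over all the peptide groups supplied, and the N atom of a peptide group is taken to be the amide N of the following residue.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class PeptideGroup:
    """One peptide group; all field names here are hypothetical."""
    label: str     # e.g. residue pair "12-13"
    beq_c: float   # Beq of the carbonyl C (Å^2)
    beq_n: float   # Beq of the amide N of the following residue (Å^2)
    beq_o: float   # Beq of the carbonyl O (Å^2)


def select_peptides(groups, excess=0.20):
    """Keep the groups whose C, N and O Beq values are all within
    (1 + excess) times the mean Beq of the corresponding atom type."""
    limit_c = (1 + excess) * mean(g.beq_c for g in groups)
    limit_n = (1 + excess) * mean(g.beq_n for g in groups)
    limit_o = (1 + excess) * mean(g.beq_o for g in groups)
    return [
        g for g in groups
        if g.beq_c <= limit_c and g.beq_n <= limit_n and g.beq_o <= limit_o
    ]


# Made-up Beq values, for illustration only.
groups = [
    PeptideGroup("1-2", 10.0, 10.0, 10.0),
    PeptideGroup("2-3", 10.0, 10.0, 10.0),
    PeptideGroup("3-4", 10.0, 10.0, 10.0),
    PeptideGroup("4-5", 10.0, 10.0, 30.0),  # poorly defined carbonyl O
]
kept = select_peptides(groups)
print([g.label for g in kept])
```

In this toy set the mean Beq of the O atoms is 15 Å², so the limit is 18 Å² and the last group is rejected because of its carbonyl oxygen alone, even though its C and N atoms are well defined.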

An estimate of the positional uncertainties has been derived at the end of the refinement by a blocked matrix inversion (Esposito et al., 2000a).

The CSD was searched for the fragment –CH–C(=O)–N–CH– belonging to peptide sequences and not being an N-acetyl or N-methylamide terminal unit. Metal-containing complexes were excluded, as were data from cyclic, disordered or D-amino acid-containing peptides. Furthermore, structure determinations resulting in an R value >0.07 were rejected.

To evaluate the validity of the correlation coefficients between C–O and C–N distances, the so-called null hypothesis, i.e. the hypothesis that the variables are not correlated, was tested by using Student's t distribution. The statistical test yields a p-value (reported in Table I) which represents the probability that random sampling would result in a correlation coefficient at least as far from zero as that observed in our data set, under the hypothesis that there is no correlation between the two variables; p-values <0.05 allow one to reject the null hypothesis at the 95% confidence level.
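
For readers who wish to reproduce this kind of test, a minimal Python sketch is given below. It assumes the standard form of the test, in which a correlation coefficient r obtained from n pairs is converted to t = r·sqrt(n − 2)/sqrt(1 − r²) and compared with Student's t distribution with n − 2 degrees of freedom, and it reports a two-sided p-value. The function names are hypothetical and the numerical integration (Simpson's rule after the substitution t = sqrt(df)·tan θ) is merely one convenient way of evaluating the tail area without external libraries; it is not claimed to be the procedure used for Table I.

```python
from math import atan, cos, exp, lgamma, pi, sqrt


def t_statistic(r, n):
    """Student's t for a Pearson correlation coefficient r from n pairs."""
    if n < 3:
        raise ValueError("at least 3 data pairs are required")
    if abs(r) >= 1.0:
        raise ValueError("|r| must be smaller than 1")
    return r * sqrt(n - 2) / sqrt(1.0 - r * r)


def two_sided_p(t, df, steps=2000):
    """Two-sided tail probability of Student's t with df degrees of freedom.

    With t = sqrt(df) * tan(theta) the density becomes proportional to
    cos(theta)**(df - 1), which is integrated from theta0 to pi/2 by
    Simpson's rule (steps must be even).
    """
    theta0 = atan(abs(t) / sqrt(df))
    # Normalisation constant Gamma((df+1)/2) / (sqrt(pi) * Gamma(df/2)).
    const = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(pi)
    h = (pi / 2 - theta0) / steps

    def density(theta):
        return cos(theta) ** (df - 1)

    total = density(theta0) + density(pi / 2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(theta0 + i * h)
    return 2.0 * const * total * h / 3.0


def correlation_p_value(r, n):
    """p-value for the null hypothesis of no correlation."""
    return two_sided_p(t_statistic(r, n), n - 2)


# Example call with a correlation coefficient and sample size of the kind
# discussed in the text (the sign of r does not affect a two-sided p-value).
print(correlation_p_value(-0.41, 177))
```

A correlation of modest magnitude can therefore be highly significant when the number of peptide groups is large, since t grows with sqrt(n − 2) for a given r.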


    Notes
 
3 To whom correspondence should be addressed. E-mail: mazzarella@chemistry.unina.it


    Acknowledgments
 
The authors thank MURST 'PRIN 2000' and the CNR 'Agenzia 2000' for financial support. The authors are also grateful to Luca De Luca for technical assistance.


    References
 
Allen,F.H. et al. (1979) Acta Crystallogr., B35, 2331–2339.

Bernstein,F.C., Koetzle,T.F., Williams,G.J.B., Meyer,E.F.,Jr, Brice,M.D., Rodgers,J.R., Kennard,O., Shimanouchi,T. and Tasumi,M. (1977) J. Mol. Biol., 112, 535–542.

Chakrabarti,P. and Dunitz,J.D. (1982) Helv. Chim. Acta, 65, 1555–1562.

Cieplak,A.S. (1994) Struct. Chem., 5, 85–98.

Dauter,Z., Sieker,L.C. and Wilson,K.S. (1992) Acta Crystallogr., B48, 42–59.

Dauter,Z., Lamzin,V.S. and Wilson,K.S. (1997a) Curr. Opin. Struct. Biol., 7, 681–688.

Dauter,Z., Wilson,K.S., Sieker,L.C., Meyer,J. and Moulis,J.M. (1997b) Biochemistry, 36, 16065–16073.

Deacon,A., Gleichmann,T., Kalb,A.J., Price,H., Raftery,J., Bradbrook,G., Yariv,J. and Helliwell,J.R. (1997) J. Chem. Soc., Faraday Trans., 93, 4305–4312.

Declercq,J., Evrard,C., Lamzin,V. and Parello,J. (1999) Protein Sci., 8, 2194–2204.

Eberhardt,E.S., Loh,S.N., Hinck,A.P. and Raines,R.T. (1992) J. Am. Chem. Soc., 114, 5436–5437.

Esposito,L., Vitagliano,L., Sica,F., Sorrentino,G., Zagari,A. and Mazzarella,L. (2000a) J. Mol. Biol., 297, 713–732.

Esposito,L., Vitagliano,L., Zagari,A. and Mazzarella,L. (2000b) Protein Sci., 9, 2038–2042.

Fogarasi,G. and Szalay,P.G. (1997) J. Phys. Chem. A, 101, 1400–1408.

Garman,E.F. and Schneider,T.R. (1997) J. Appl. Crystallogr., 30, 211–237.

Genick,U.K., Soltis,S.M., Kuhn,P., Canestrelli,I.L. and Getzoff,E.D. (1998) Nature, 392, 206–209.

Harrison,R.K. and Stein,R.L. (1992) J. Am. Chem. Soc., 114, 3464–3471.

Jeffrey,G.A., Houk,K.N., Paddon-Row,M.N., Rondan,N.G. and Mitra,J. (1985) J. Am. Chem. Soc., 107, 321–326.

Jelsch,C., Teeter,M.M., Lamzin,V., Pichon-Pesme,V., Blessing,R.H. and Lecomte,C. (2000) Proc. Natl Acad. Sci. USA, 97, 3171–3176.

Kuhn,P., Knapp,M., Soltis,S.M., Ganshaw,G., Thoene,M. and Bott,R. (1998) Biochemistry, 37, 13446–13452.

Lamzin,V.S., Morris,R.J., Dauter,Z., Wilson,K.S. and Teeter,M.M. (1999) J. Biol. Chem., 274, 20753–20755.

Lindley,P.F. (1999) Acta Crystallogr., D55, 1654–1662.

Longhi,S., Czjzek,M. and Cambillau,C. (1998) Curr. Opin. Struct. Biol., 8, 730–737.

Milner-White,E.J. (1997) Protein Sci., 6, 2477–2482.

Moffat,K. and Ren,Z. (1997) Curr. Opin. Struct. Biol., 7, 689–696.

Parisini,E., Capozzi,F., Lubini,P., Lamzin,V., Luchinat,C. and Sheldrick,G.M. (1999) Acta Crystallogr., D55, 1773–1784.

Popelier,P., Lenstra,A.T.H., Van Alsenoy,C. and Geise,H.J. (1990) Struct. Chem., 2, 3–9.

Sheldrick,G.M. and Schneider,T.R. (1997) Methods Enzymol., 277, 319–343.

Van Duyne,G.D., Standaert,R.F., Karplus,P.A., Schreiber,S.L. and Clardy,J. (1991) Science, 252, 839–842.

Walsh,M.A., Schneider,T., Sieker,L.C., Dauter,Z., Lamzin,V. and Wilson,K.S. (1998) Acta Crystallogr., D54, 522–546.

Walter,R.L., Thiel,D.J., Barna,S.L., Tate,M.W., Wall,M.E., Eikenberry,E.F., Gruner,S.M. and Ealick,S.E. (1995) Structure, 3, 835–844.

Wiberg,K.B. and Breneman,C.M. (1992) J. Am. Chem. Soc., 114, 831–840.

Received August 10, 2000; revised October 12, 2000; accepted October 12, 2000.