(Received for publication, January 3, 1997, and in revised form, February 5, 1997)
From the X-ray Research Laboratory, Rigaku
Corporation, 3-9-12 Matsubara, Akishima, 196 Tokyo, Japan, the
§ Department of Industrial Chemistry, Kyungpook National
University, Taegu, 702-701 Korea, and the ¶ Merkert Chemistry
Center, Department of Chemistry, Boston College, Chestnut Hill,
Massachusetts 02167
It is not agreed that correlated positions of disordered protein side chains (substate correlations) can be deduced from diffraction data. The pure Ser-22/Ile-25 (SI form) crambin crystal structure confirms correlations deduced for the natural, mixed sequence form of crambin crystals. Physical separation of the mixed form into pure SI form and Pro-22/Leu-25 (PL form) crambin and the PL form crystal structure determination (Yamano, A., and Teeter, M. M. (1994) J. Biol. Chem. 269, 13956-13965) support the proposed (Teeter, M. M., Roe, S. M., and Heo, N. H. (1993) J. Mol. Biol. 230, 292-311) correlation model. Electron density of mixed form crambin crystals shows four possible pairs of side chain conformations for heterogeneous residue 22 and nearby Tyr-29 (2² = 4, two conformations for each of two side chains). One combination can be eliminated because of short van der Waals' contacts. However, only two alternates have been postulated to exist in mixed form crambin: Pro-22/Tyr-29A and Ser-22/Tyr-29B. In crystals of the PL form, Pro-22 and Tyr-29A are found to be in direct van der Waals' contact (Yamano, A., and Teeter, M. M. (1994) J. Biol. Chem. 269, 13956-13965). Comparison of the SI form structure with the mixed form electron density confirms that the fourth combination of side chains does not occur and that side chain correlations are mediated by water networks.
Motion correlated over 5-8 Å (liquid-like movement) has been shown by the non-Bragg technique of x-ray diffuse scattering to be important in insulin and lysozyme crystals (1, 2). State of the art molecular dynamics methods cannot model such correlations (3), perhaps because of inadequate sampling of conformational substates (4). Multiple substates of nearly equal energy are also proposed for myoglobin based on spectroscopic evidence (5-7), but spectroscopy is not well suited to elucidate the nature of these substates. Neither is NMR, unless extremely tight distance restraints are used (8).
Diffraction from a crystal is averaged over many unit cells and over the time spent on data collection. It is generally believed that this averaging precludes extracting dynamic information, such as the occurrence of multiple substate correlations, from an x-ray structure. However, nonrandom correlations will contribute to Bragg reflections. Given diffraction data beyond 1.4 Å (9), the correlations can be modeled as substate disorder and provide insight into protein dynamics. If one could physically separate the substates and study each separately, one could prove that such correlations exist and derive the rules for the correlation.
Crambin presents an excellent system for such an experiment. Crambin from the natural source contains two sequence isomers in a 3:2 ratio (10, 11), the so-called mixed form of crambin. The major isomer has Pro and Leu at positions 22 and 25, respectively (the PL form); the minor isomer has Ser and Ile at the same positions (the SI form). In the mixed form crystal structure, side chain electron densities for heterogeneous residues are superimposed (Pro and Ser at residue 22 and Leu and Ile at residue 25). The Tyr-29 side chain from a 2₁ screw axis-related molecule has close contacts with the Pro or Ser residue and adopts two conformations. A proposed correlation of the Tyr-29 conformation with the identity of the amino acid at residue 22 (12) has been supported by the PL form structure (13). Now the second or SI form of crambin has been purified by fast protein liquid chromatography and crystallized. It establishes the side chain correlations definitively and establishes associated solvent interactions.
In this paper, first the proposed mixed form protein networks are extended to water disorder using stereochemical "rules," such as van der Waals' contacts and hydrogen bonding. Second, these postulated networks are compared with the crystal structures of the physically separated pure forms of crambin: the PL form structure and the newly determined SI form structure. These results establish that the x-ray structure of the mixed form of crambin can elucidate substate spatial correlations between side chains and solvent, as proven by the pure form structures.
Alternative conformations at disordered residues and water molecules are designated by attaching A and B to the residue number. Such disordered conformations are often correlated with neighboring residue disorder through space and may represent conformational substates of the protein. For example, Ser-22A, Tyr-29A, and Wat-132A represent one disordered substate correlated through space with the alternates Ser-22B, Tyr-29B, and Wat-132B.
Crambin was purified to a single sequence form (13), and crystals of the SI form were grown by vapor diffusion techniques (14). Conditions were similar to those previously used (15) but with an initial reservoir concentration of 50% ethanol. In contrast to other forms, seeding by methods such as the streak seeding technique (16) was essential to nucleate crystal growth. Here a submicroscopic mixed form crystal served as the seed crystal, and small crystals appeared along the streak line 2 days after seeding. The ethanol concentration of the reservoir was reduced to 45% ethanol after crystal growth stopped at 50%. Crystals grew to the proper size for x-ray diffraction experiments in 2 weeks (0.5 × 0.2 × 0.1 mm).
Diffraction data were collected to 0.89 Å resolution on a Rigaku AFC5
four circle diffractometer on a Rigaku RU-200 rotating anode generator.
The crystal was flash cooled (17) to 150 K with a Molecular Structure
Corporation rigid tube low temperature device. Refinement consisted of
PROLSQ restrained least squares (18) alternating with interactive
rebuilding using the program FRODO (19) on an Evans & Sutherland PS390.
The initial model, which was the mixed form structure at 130 K without
side chains for residues 22 and 25 but including hydrogen, was first
refined with isotropic temperature factors against 1.5 Å data.
Hydrogens were refined, because it is difficult to fix or ride them in
PROLSQ. The resolution was extended to 0.89 Å in three resolution
steps (1.2, 1.0, and 0.89 Å). Three-parameter anisotropic
temperature factors (20) were introduced after convergence with
isotropic refinement. 95 cycles of PROLSQ refinement brought the
standard R-factor down to 14.7% (with Rerr (Σσ(Fo)/ΣFo) of 9.5%).
The final model has 495 heavy atoms (349 protein atoms, 140 water sites, and 2 ethanol sites) and 429 hydrogen atoms, for a total of 924 atoms. Table I summarizes refinement statistics for the SI form structure, and Table II summarizes the agreement with stereochemical restraints. Errors are estimated from a Luzzati plot (21) to be about 0.08 Å for the SI form, 0.06 Å for the PL form, and 0.06 Å for the mixed form crambin (the true σ from full matrix refinement of the mixed form is 0.022 Å) (22).
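For reference, the two agreement indices quoted for the refinement are presumably the conventional crystallographic ones. The text does not define them, so the forms below are an assumption, with F_o and F_c the observed and calculated structure factor amplitudes and σ(F_o) the estimated standard deviation of each observation:

```latex
R = \frac{\sum_{hkl} \bigl|\, |F_o| - |F_c| \,\bigr|}{\sum_{hkl} |F_o|},
\qquad
R_{\mathrm{err}} = \frac{\sum_{hkl} \sigma(F_o)}{\sum_{hkl} |F_o|}
```

Read this way, R_err measures the relative uncertainty of the data themselves, which gives a scale against which the refined R-factor can be judged.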
In the SI form, seven residues (15.2%) have multiple conformations. This is less than the eight residues in the PL form (17.4%) and considerably less than the mixed form (28.3%), where sequence heterogeneity plays a major role.
The overall structure of the SI form of crambin (Fig. 1)
is very similar to that of the PL form (at 150 K (13)) and the mixed
form (at 293 K (10) and at 130 K (12)). The largest structural
differences might be expected at the turn from residues 19-22, because
Ser-22 is more flexible than Pro. However, the rms deviation is only
0.056 Å between the SI and PL forms.
Fig. 2 shows the proposed disordered protein/water networks in the mixed form atomic model and 2Fo − Fc electron density around residue 22. The electron density for the side chain suggested three-way disorder: one Pro and two Ser sites with disordered Oγ. The Tyr-29 side chain has two conformations. The weak electron density between Tyr-29A Oη and a Pro-22 ring carbon was assigned to the water alternates 132A/132B (1.75 Å apart).
Considering only protein atoms and a single Ser conformation, four possible combinations exist: 1) Ser-22/Tyr-29A, 2) Ser-22/Tyr-29B, 3) Pro-22/Tyr-29A, and 4) Pro-22/Tyr-29B. Because of a short van der Waals' contact, the fourth choice can be excluded immediately (the distance from Oη of Tyr-29B to a ring carbon of Pro-22 is only 2.41 Å).
Tyr-29A forms a slightly short hydrogen bond to Wat-182A, and Wat-182A makes a hydrogen bond to Wat-82. But Tyr-29B Oη forms hydrogen bonds to either Wat-132A or Wat-132B as well as to Wat-47 in ring A of the pentagon water ring cluster (23). Wat-132A or Wat-132B hydrogen bonds to the backbone N of residue 22. However, this is only possible with the Ser side chain or site Tyr-29B because of the Pro Cδ-N covalent bond. Short contacts with these waters would result with Pro-22 or Tyr-29A (Wat-132A to Cδ of residue 22, 1.47 Å; Wat-132B to Cδ of residue 22, 1.52 Å; see Fig. 2).
The three remaining networks can be extended to include hydrogen-bonded waters (branches are indicated in parentheses): 1, Ser-22/Tyr-29A/Wat-182A/Wat-82; 2, Ser-22/Wat-132/(Wat-182B)/Tyr-29B (the red network in Fig. 2); and 3, Pro-22/Tyr-29A/Wat-182A/Wat-82 (the green network in Fig. 2).
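The elimination step described above can be sketched in a few lines of Python. This is only an illustration of the logic: the 2.41 Å distance comes from the text, whereas the 3.0 Å cutoff is an assumed, illustrative lower limit for a nonbonded contact and the function name is hypothetical.

```python
from itertools import product

# Alternates observed in the mixed form electron density (from the text).
RES22 = ("Ser-22", "Pro-22")
TYR29 = ("Tyr-29A", "Tyr-29B")

# Closest interatomic distance (in Å) reported for a pair. Only the
# Pro-22/Tyr-29B value is given in the text; other pairs are left out.
CLOSEST_CONTACT = {("Pro-22", "Tyr-29B"): 2.41}

# ASSUMPTION: illustrative minimum nonbonded distance, not a value from the paper.
MIN_CONTACT = 3.0


def allowed_pairs(min_contact=MIN_CONTACT):
    """Enumerate the 2 x 2 side chain pairs and drop any with a short contact."""
    kept = []
    for pair in product(RES22, TYR29):
        distance = CLOSEST_CONTACT.get(pair)
        if distance is not None and distance < min_contact:
            continue  # sterically forbidden combination
        kept.append(pair)
    return kept


for res22, tyr29 in allowed_pairs():
    print(f"{res22}/{tyr29}")
```

Three pairs survive this purely steric filter, which is why the Ser-22/Tyr-29A combination has to be ruled out on other grounds.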
Based on this analysis, one would predict for the pure form structures
that waters associated with the missing form would have altered
occupancies. Key would be the weak density sites Wat-132A/B. They
should be considerably stronger in the SI form but absent from the PL
form. Indeed, in the mixed form the occupancy sum and average B value
for these waters are 0.4 and 11, whereas for the SI structure the
occupancy sum is 0.8 and the average B value is 3.1.
Fig. 3 shows the electron density and atomic model of
the pure PL form structure at the same region that is shown in Fig. 2.
The electron density is consistent with the elimination of the Ser and
Ile side chains. The density at residue 22 matches Pro. The phenol ring
of residue 29 takes the A conformation, and Tyr-29A Oη makes allowed
van der Waals' contacts with two Pro-22 ring carbons.
Water sites Wat-132A and Wat-132B are absent in this structure. The
water-protein conformations perfectly match the green
network in Fig. 2. The rms deviation from the mixed form structure is
0.065 Å over Tyr-29A, Pro-22, Wat-47, Wat-82, and Wat-182A.
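A minimal sketch of how an rms deviation such as the ones quoted here could be computed over matched atoms. It assumes the two models are already expressed in the same coordinate frame (the text does not say whether a superposition was applied), and the coordinates in the usage lines are made-up placeholders, not values from any crambin structure.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))


# Hypothetical coordinates (Å) for two matched atoms in two models.
model_1 = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
model_2 = [(0.1, 0.0, 0.0), (1.1, 1.0, 1.0)]
print(f"rmsd = {rmsd(model_1, model_2):.3f} Å")
```

Because the pure and mixed form crystals are described as closely similar, comparing coordinates directly is a plausible simplification, but a least-squares superposition would be needed for models in different frames.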
Fig. 4 shows the electron density and the atomic model
for the SI form structure. Residue 22 electron density is interpreted as a Ser with disordered Oγ, and no Pro is
present. Tyr-29 takes the Tyr-29B conformation, and waters 132A/132B
are enhanced as predicted. This structure is nearly identical to the
red network in Fig. 2, except for Wat-182B. The rms
deviation between this and the mixed form structure is 0.267 Å for
Tyr-29B, Ser-22A/B, Wat-47, Wat-132A/B, and Wat-182B. An additional
water site (Wat-182C) could be modeled in Fig. 2 (elongated density on
182A), because an additional water site is visible from the Ser/Ile
structure (Wat-182A).
From the above comparisons, one can conclude that interpenetrating
disorder networks can be separated by optimizing van der Waals'
contacts and hydrogen bonds. Because networks 2 and 3 account for all
the electron density in the mixed form crystal, these are the only
networks needed to account for the mixed form disorder.
Why is network 1 not present in nature? Stereochemical requirements alone cannot exclude this possibility, because it neither violates van der Waals' contact limits nor has inappropriate hydrogen bonds. However, if the phenol ring of residue 29 took the Tyr-29A conformation and the side chain of residue 22 were Ser, there would be a large vacancy around the positions of Wat-132A, Wat-132B, and a Pro-22 ring carbon. The potential empty space is eliminated by the spatial correlation among side chains and water molecules. In other words, space or vacuum is not allowed at a protein surface, probably because it is energetically unfavorable.
Further, in the SI form structure, this space filling can be seen from the alternate water conformations identified in electron density maps. Wat-182A is shifted downward (Fig. 4) to fill the empty space created by the absence of the Tyr-29A conformation. Another water site (182B) alternates with this site 2.31 Å away and hydrogen bonds to Wat-132. Wat-132A/B disorder appears for similar reasons. Both alternate pairs fill the available space and optimize packing and hydrogen bonding.
From these disordered water molecules, the importance of solvent for protein flexibility is evident. The full rationalization of the correlations derived from Fig. 2 must involve solvent-mediated interactions.
Proposed disorder networks in the mixed form crambin are extended to solvent and confirmed by the pure Pro-22/Leu-25 (13) and Ser-22/Ile-25 forms of crambin. Here the two disordered forms resulting from sequence differences were physically separated by fast protein liquid chromatography, and each was crystallized. The spatial correlations implied from the mixed form structure were proven by examining both protein and water from the two pure form structures. Water was critical for this confirmation.
Derived rules for correlations provide insight into the structure and dynamics of proteins in general. In this paper, we have proven that correlated conformations obey simple stereochemical rules and have alternates that fill space. The same logic used here should apply to assigning multiple conformational substates where sequence differences are not involved (13, 24). These results demonstrate that dynamic correlation does occur and can be deduced from an x-ray structure at 1 Å resolution using fundamental principles. Such elucidation is important for understanding the mechanisms of such important proteins as lysozyme and myoglobin.
The atomic coordinates (code 1abl) and structure factors (code 1ablsf) have been deposited in the Protein Data Bank, Brookhaven National Laboratory, Upton, NY.
The crystal structure determinations of the pure forms of crambin were initiated with a sample of the mixed form of crambin, which was extracted and purified by Hucheng Bei, whose work is gratefully acknowledged. Thanks are due to Ofer Markman, who assisted with the figures. We thank Jack Dunitz and Boguslaw Stec for helpful discussions.