(Received for publication, April 24, 1995; and in revised form, June 12, 1995)
The solution structure has been determined for a 19-residue
peptide that is fully folded at room temperature. The sequence of this
peptide is based on the C-loop, residues 371-389, of the fourth
epidermal growth factor-like domain of thrombomodulin, a protein that
acts as a cofactor for the thrombin activation of protein C. Despite
its small size, the peptide forms a compact structure with almost no
repeating secondary structure. The results indicate the structure is
held together by hydrophobic interactions, which in turn stabilize the
two β-turns in the structure. The first β-turn in the C-loop
represents a conserved motif that is found in the published structures
of five other epidermal growth factor-like proteins. The critical role
of Phe in the stabilization of the first β-turn is
consistent with mutagenesis data with soluble thrombomodulin. The
results also show that a small subdomain of a larger protein can fold
independently, and therefore it could act as an initiation site for
further folding.
Thrombomodulin (TM), an endothelial cell surface
glycoprotein, binds thrombin and alters its specificity away from
fibrinogen cleavage and toward the activation of protein C. The
activation of protein C by thrombin is accelerated >1000-fold when
TM is present as a cofactor. Generation of activated protein C, which
inactivates factor Va and factor VIIIa, is an important anticoagulant
mechanism of the endothelial cell surface (1, 2, 3).
TM is a multidomain protein, which spans the endothelial cell membrane. Full cofactor activity is present in the soluble ectodomain produced by elastase (4). Several studies performed with mutagenesis or with peptides derived from the ectodomain of human TM have defined the domains required for activity. The smallest fragment with full cofactor activity for the activation of protein C by thrombin contains the last three consecutive EGF-like domains, EGF4-6 (5, 6). Recent studies suggest that the region of thrombomodulin that binds tightly to thrombin is distinct from the region that modulates the active site of thrombin. Significantly, a construct made from the fourth and fifth EGF-like domains, EGF45, retains approximately 10% of the cofactor activity, although binding to thrombin is drastically reduced (5).
Deletion mutants near EGF4 affect kcat/Km for the
thrombin-TM complex with protein C, and removal of the fourth domain
results in a complete loss of cofactor activity. However, deletion
mutants that include the C-loop of EGF6 have a normal kcat/Km for
protein C but decreased affinity for thrombin, as measured by the Kd
for thrombin (6). A cyclic
peptide based on the C-loop of EGF5 and the interdomain loop between
EGF5 and EGF6 binds with high affinity to thrombin at the anion exosite
of thrombin, a positively charged groove on the surface of thrombin
important for TM, fibrinogen, and hirudin
binding (5, 7, 8). The results suggest that
EGF56 contains the high affinity binding site for thrombin and EGF4
contains residues that are absolutely required for activity.
Site-specific mutants around EGF4, which result in low activity
analogs, have defined some residues important for cofactor activity in
this domain (9, 10). These include Asp in the interdomain region
between EGF3 and EGF4, Glu and Tyr in the B-loop, and Phe in the
C-loop of EGF4 (the numbering of the residues in this paper is
consistent with the sequence of thrombomodulin given by Suzuki et
al. (11)). Met388 in the interdomain region
between EGF4 and EGF5 adjacent to the C-loop of EGF4 can be oxidized to
the low activity methionine sulfoxide analog (12). Perhaps of
more interest are EGF4 mutants that result in an increase in cofactor
activity. Replacement of Met388 by leucine results in an
analog with twice the cofactor activity of wild-type TM (13).
When a second mutation in the C-loop, His381 → Gly, is combined with
the Met388 → Leu mutation, the resultant TM analog has four times the
activity of wild-type TM.
Clearly, one way these mutants could modulate cofactor activity is by altering the conformation of TM. To test this hypothesis, a set of cyclic peptides was synthesized based on the sequence of the individual loops of TM. Each loop contained a single disulfide. NMR was used to measure structures of these peptides in solution. It was hoped that a comparison between several peptides with single site mutations would shed light on the relationship between structure and function.
Of course, these structural comparisons could only be made if the peptides were folded. In our experience, most peptides of this size do not fold in aqueous solutions. Indeed, there are relatively few structures of peptides of this length listed in the Brookhaven Protein Data Bank. There are seven solution structures for peptides of less than 30 residues (Spring, 1994). All of these compounds contain at least two internal cross-links, which form the core of the structures. Most of the remaining residues form hairpin loops that are wrapped around a densely packed central core. During the course of this study, NMR was used to investigate the structure of nine different peptides. Each peptide contained approximately 20 residues and a single disulfide cross-link. The sequences were based on a single loop found either in TM or a homologous EGF-like protein. For most of the peptides, the spacing between the cysteines was much greater than the loops found in the small peptides of the protein bank data. Therefore, there must be a significant loss of conformational entropy during refolding. The experimental results indicated that the only peptides that formed a compact structure were based on the sequence of the C-loop of TM-EGF4 (Table 1). This was an unexpected result, since the peptides contained a long stretch of 13 amino acids between the cysteines. The experimental results were used to evaluate the structural homology between TM-EGF4 and other EGF-like proteins or domains that, in turn, provided a working hypothesis that explains some of the functional data.
The sequential proton assignments of the four peptides were obtained using standard homonuclear techniques (24). The spectra were processed using UXNMR (Bruker Instruments), and the results were analyzed using a software package developed from the algorithms outlined by Adler and Wagner (25). Chemical shifts were somewhat arbitrarily assigned to be consistent with known values for both proteins and water. The chemical shift of water was assigned at 4.89 ppm at 3 °C and 4.66 ppm at 23 °C. Unless otherwise noted, the chemical shifts are reported for 3 °C at pH 6.8.
Figure 1: A graph that shows the sequential NOEs that were used in making the proton assignments. The right column identifies the type of NOEs. dXY signifies an NOE between the X proton in the first residue and the Y proton in the second residue. The NOEs are listed for sequential residues except where noted. The widths of the black lines are proportional to the NOE intensity. For the three prolines, NOEs to the CδH were used in place of NOEs to the HN. The symbol ( ) is used to denote NOEs that could not be observed for either practical or theoretical reasons, such as potential overlap or residues that are missing a proton. The bottom line of the graph lists some of the observed J coupling constants. Symbols are used as follows: α indicates J ≤ 5.5 Hz, b indicates 8.0 ≤ J < 10.0 Hz, β indicates 10.0 ≤ J. The use of the symbols α and β does not imply that the residue is part of an α-helix or a β-sheet.
NOE peak intensities
were quantified from a NOESY spectrum of the double mutant
(H381G,M388L). The recycle delay between pulses was 3.6 s. A 100-ms
mixing time was used to limit the artifacts caused by spin diffusion. The
spectrum was processed using a 75° shifted sine bell. The base line
was flattened with a 5th order polynomial subroutine in both directions
on the fully transformed data set using the Bruker processing software
UXNMR. This subroutine has an automatic selection of base-line points.
Correction along the F axis was performed in eight uneven
sections to minimize the distortions caused by the dispersive water
signal.
The distance constraints were calculated from the peak
intensity. A correction factor was included, which controlled for
variation of peak width, based on the relative width of a resonance in
the F direction compared with the width of the peaks used
for calibration. Eight well-resolved methylene pairs were used to
calculate the scaling factor between the peak volumes and the target
distances. The variation in intensity between these peaks was 20%. The
standard equation for translating peak volumes into target distances
was modified so that the distances were lengthened to compensate for
any experimental uncertainty. First, the volume of each peak was
divided by two to compensate for variations in both peak width and
intensity. Also, it was assumed the volumes of the weaker peaks were
less accurate than the more intense ones. This uncertainty was handled
mathematically by using instead of power to calculate distances from
peak volumes. The final effect on the experimental data was that the
target distances of 2.2, 2.5, 3.0, 3.5, and 4.0 Å were lengthened
to 2.6, 3.1, 3.8, 4.6, and 5.0 Å, respectively (an upper limit of
5.0 Å was used for all observed NOEs). The accuracy of the
modified distance function was verified by examining the target values
for both intraresidue and sequential HN to CαH NOEs. The ranges of
distances calculated from NOE peaks were roughly 20% larger than the
expected values. The calculated distances obtained from the 100-ms NOESY
spectrum ranged from 2.5 to 4.7 Å. An additional 0.7 Å was
added to all NOEs involving methyl groups (27). Lower bound
constraints were set to the van der Waals contact radii.
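As a rough illustration of the volume-to-distance conversion described above, the following Python sketch applies the standard inverse sixth-power relation, calibrated against a reference peak, and then applies the 5.0 Å upper limit and the 0.7 Å methyl allowance. It is not the modified function used in this work. The function name, the 1.8 Å reference distance assumed for a geminal methylene pair, and the order in which the cap and the methyl allowance are applied are all assumptions made for the example.

```python
def noe_upper_bound(volume, ref_volume, ref_distance=1.8,
                    exponent=6.0, involves_methyl=False, cap=5.0):
    """Upper-bound distance (in angstroms) from a NOESY cross-peak volume.

    Standard calibration: d = d_ref * (V_ref / V) ** (1 / exponent).
    ref_distance=1.8 is an assumed geminal methylene H-H distance.
    """
    if volume <= 0 or ref_volume <= 0:
        raise ValueError("peak volumes must be positive")
    distance = ref_distance * (ref_volume / volume) ** (1.0 / exponent)
    # All observed NOEs are capped at the same upper limit.
    distance = min(distance, cap)
    # Assumption: the methyl allowance is added after the cap.
    if involves_methyl:
        distance += 0.7
    return distance


if __name__ == "__main__":
    # A peak 64 times weaker than the reference is twice as far away.
    print(round(noe_upper_bound(1.0, 64.0), 2))
    print(round(noe_upper_bound(1.0, 64.0, involves_methyl=True), 2))
```

Passing a different `exponent` shows how a softer power law lengthens the distances assigned to weak peaks, which is the direction of the modification described in the text.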
No
stereospecific assignments were performed for the methylene protons. If
a pair of NOE peaks was observed between two protons of a diastereotopic
pair and a third proton, the weaker NOE peak was used to calculate the
distance constraint. When only one NOE was observed, the distance
constraint was referenced back to the nearest heavy atom that was
equidistant from both protons. The distance constraint was lengthened
by the fixed distance between the protons and the heavy atom. In
general, the preliminary structures were not used to further interpret
the spectra due to the inherent uncertainty involved with these
techniques. However, some of the NOEs to the CH
positions of the two phenylalanines were stereospecifically assigned
when the preliminary structures indicated a separation of greater than
7 Å between protons that had NOEs to the same Phe CH.
The issue of spin diffusion was not explicitly addressed during the preparation of the constraints. NOEs that involved a pair of methylene protons were examined for evidence of spin diffusion. We specifically looked at pairs of NOEs where one of the two NOEs was very intense and could be a source of spin diffusion. In all cases, the inter-proton distance calculated for the weaker NOE was confirmed by an independent structural constraint.
A fourth set of substitutions was made in an attempt to increase
activity. His381 is found only in human TM. Glycine is found in this
position in both mouse and hamster TM. Comparison with homologous
proteins indicates that there was probably a β-turn at this position.
Experiments demonstrated that His381 → Gly doubled the specific
activity of human TM-M388L. The double mutant, H381G,M388L, has four
times the activity of soluble TM analogs with the native human
sequence. The substitution His381 → Ala (10) had no effect.
His381 → Pro actually caused a slight decrease in activity
(60 ± 28%).
Exhaustive analysis of the NOESY spectra of both
the A- and B-loops revealed only eight NOEs that connected residues
separated by at least one amino acid; two bridged across the disulfide
bonds, one connected the HNs of Asn and Tyr,
and the remaining five involved residues separated by a single amino
acid. Only two of the eight NOEs involved backbone-backbone
interactions. Structure calculations utilizing the combined
experimental constraints from both peptides failed to converge on a
unique structure.
The solution structures of three other loops from
EGF-like proteins were examined: the C-loop of TM-EGF5, the C-loop of
transforming growth factor-α, and the B-loop of human urokinase EGF
domain. Inspection of the two-dimensional TOCSY spectra indicated that
there was little chemical shift dispersion of the protons beyond what
was expected based on the random coil values (28). No further
analysis was attempted. The results from the C-loop of TM-EGF5 were
confirmed by a recent report (8). Although the peptide is
unfolded in solution, it forms a unique conformation upon binding to
the anion exosite of thrombin.
Two-dimensional spectra of the
peptide based on the C-loop of TM-EGF4 indicated that the peptide did
form a compact structure. In particular, the chemical shifts of the
amide protons ranged from 7.5 to 9.3 ppm (Table 2). The
comparable range for the same protons in unfolded peptides would be
8.2-8.4 ppm (28). To probe further the relationship between
structure and function, a total of four peptides derived from the
C-loop of EGF4 were synthesized (Table 1). Although the peptides
folded into compact structures, the isolated peptides had no measurable
effect on modulating the activation of protein C by thrombin when
tested alone as a cofactor for thrombin or as a competitive inhibitor
of the action of thrombomodulin on thrombin, even at concentrations as
high as 5 mM. The activity measurements shown in Table 1 were
performed by incorporating the sequence of each of
these peptides back into a truncated but fully active form of
thrombomodulin containing the fourth, fifth, and sixth EGF-like
domains, TM-M388L. The activity, as a percentage
of the specific activity of the TM native sequence, ranged from 400 to
10%. The most detailed structural work was performed on the double
mutant, H381G,M388L, since this represents the most active sequence.
The two-dimensional spectra indicated that all four peptides had the
same overall fold (see below for details).
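The chemical shift dispersion argument used above can be expressed as a small check. The Python sketch below compares a list of amide proton shifts with the 8.2-8.4 ppm window quoted for unfolded peptides. The function names and the example shift list are hypothetical; only the 7.5 and 9.3 ppm extremes and the random coil window come from the text.

```python
def amide_dispersion(shifts_ppm):
    """Return (lowest, highest, spread) of amide proton shifts in ppm."""
    low, high = min(shifts_ppm), max(shifts_ppm)
    return low, high, high - low


def exceeds_coil_window(shifts_ppm, coil_low=8.2, coil_high=8.4):
    """True if any amide shift lies outside the random-coil window."""
    return any(s < coil_low or s > coil_high for s in shifts_ppm)


if __name__ == "__main__":
    # Hypothetical shift list spanning the range reported for the C-loop.
    folded_like = [7.5, 8.1, 8.3, 8.9, 9.3]
    low, high, spread = amide_dispersion(folded_like)
    print(low, high, round(spread, 2))
    print(exceeds_coil_window(folded_like))
```

A spread of about 1.8 ppm, against roughly 0.2 ppm for the random coil window, is the kind of difference taken here as evidence of a compact structure.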
Figure 2:
A, a stereo view of the double mutant
showing the heavy atoms, without the carbonyl oxygens. The structure
presented was judged the best based on the residual value of the penalty
function and how close the coordinates were to the average structure. B, a superposition of the 20 best structures. The side chain
of Val and the guanidinium group of Arg have been omitted for clarity.
The root mean square deviation of the well determined backbone atoms
is 0.6 Å to the average structure and 0.9 Å for the pair-wise
comparisons (Fig. 2B). This figure excludes the N-terminal
Val because there is almost no structural information for
this residue. The root mean square deviation for all heavy atoms to the
average structure is 1.3 Å (1.8 Å for the pair-wise comparisons).
The side chain conformation has been accurately determined for
Phe, Ile, His, and Gln. Constraints on the protein backbone also
determine the locations of the side chains of Ala, Ala, Pro, Pro, and
Pro. Less information is available for the other side chains.
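The two kinds of root mean square deviation quoted above (to the average structure, and pair-wise) can be sketched as follows. This is a minimal Python illustration, not the procedure used to produce the reported values: it assumes the structures have already been superimposed, represents each structure as a list of (x, y, z) tuples, and uses hypothetical function names.

```python
import math
from itertools import combinations


def rmsd(a, b):
    """RMSD between two equal-length coordinate lists (no fitting done)."""
    if len(a) != len(b) or not a:
        raise ValueError("structures must be non-empty and the same size")
    total = sum((p[i] - q[i]) ** 2 for p, q in zip(a, b) for i in range(3))
    return math.sqrt(total / len(a))


def mean_structure(ensemble):
    """Average the coordinates of each atom over all structures."""
    n = len(ensemble)
    return [tuple(sum(s[k][i] for s in ensemble) / n for i in range(3))
            for k in range(len(ensemble[0]))]


def ensemble_rmsds(ensemble):
    """Return (mean RMSD to the average structure, mean pair-wise RMSD)."""
    average = mean_structure(ensemble)
    to_mean = sum(rmsd(s, average) for s in ensemble) / len(ensemble)
    pairs = list(combinations(ensemble, 2))
    pairwise = sum(rmsd(a, b) for a, b in pairs) / len(pairs)
    return to_mean, pairwise


if __name__ == "__main__":
    # Two one-atom "structures" 2 units apart, for illustration only.
    print(ensemble_rmsds([[(0.0, 0.0, 0.0)], [(2.0, 0.0, 0.0)]]))
```

The pair-wise value is larger than the value to the average structure, which is the same ordering seen in the reported figures (0.9 versus 0.6 Å for the backbone).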
The second bend includes residues Ile, Pro, Gly381, and Glu. Both
type I and II β-turns are compatible with the experimental data for
the double mutant. The intensity of intraresidue NOEs between the HN
and CαHs of Gly381 would clearly resolve the
ambiguity if there were stereospecific assignments available for the
CαHs. Unfortunately, the stereospecific assignments
could not be determined in a reliable fashion. The other three
peptides, including the native sequence, all have histidine at the third
position of this turn, and all three exhibit a type I β-turn. This
β-turn is part of a five-residue insertion, Pro to His, in the
sequence of this EGF-like domain (Table 3).
The β-turn is stabilized by hydrophobic
interactions that are centered on Ile. Only the outer
edges of the methyl groups are exposed to the solvent. The side chain
of the Ile residue is covered by the methylene side chains of
residues Pro, Glu, Arg, and Gln. The close interaction between these
side chains probably adds to the stability of the protein.
A third, less well defined type I β-turn appears between residues
Glu, Pro, His, and Arg. The chain
itself forms a right angle turn through this bend. This geometry
distorts the conformation of Glu and weakens the hydrogen
bond between the CO of Glu and the HN of Arg.
The three prolines, Pro, Pro, and Pro, all have trans peptide bonds.
There was no detectable amount of any folded species that contained a
cis peptide bond. All three prolines are located at bends in the
protein backbone. The prolines are all involved in delineating the
second β-turn. This is part of a five-residue insertion in the
sequence of the C-loop (Table 3).
There are three hydrogen
bonds that can be easily identified in the structure. Two of the
hydrogen bonds are found in the β-turns: CO Ala to HN Phe and
CO Ile to HN Glu. A third hydrogen bond is found between the CO of
Ala and the HN of Gln. This hydrogen bond appears where the protein
backbone crosses back upon itself (Fig. 2A). Other
potential hydrogen bonds may exist in this structure but cannot be
identified given the resolution of the structures. Indeed, the peptide
appears to have only a few internal hydrogen bonds that could
contribute to the stability of the structure.
Figure 3:
The superimposed structures of the C-loop
from TM-EGF4 (double mutant) with the C-loop from the five other
EGF-like proteins shown in Table 3. The two loops that extend out
on the left side are labeled for TM-EGF4 and FXa-C,
respectively. The superposition was done using the backbone atoms from
structurally homologous residues shown in bold in Table 3. Side chains are shown for the residues that interact
with the first β-turn.
The sequence of the structurally conserved residues can be described
as Cys-X-X-Gly-(Phe/Tyr)-X-X . . . X-Cys . . . X. The first gap contains between one and
six residues; the second gap contains either one or two. The structural
similarity in the last position is surprising. The charge,
hydrophobicity, and size of this residue vary between the proteins
listed in Table 3. Also, TM-EGF4 and FXa-C, the two longest
C-loops, exhibit a one-residue deletion prior to this residue. The side
chain of this residue is in close contact with the conserved aromatic
residue in the first β-turn and appears to limit the ring's
exposure to solvent. This interaction must be important to the
stability of the protein, since the interaction is maintained despite
the large variation of sequence in this position. In fact, there is
little conservation of the sequence for six out of ten conserved
positions (Table 3).
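The conserved pattern described above, Cys-X-X-Gly-(Phe/Tyr)-X-X . . . X-Cys . . . X with a first gap of one to six residues and a second gap of one or two, can be written as a regular expression over one-letter codes. The Python sketch below is one reading of that pattern. The function name and the test sequence are hypothetical and are not taken from Table 3.

```python
import re

# Cys-X-X-Gly-(Phe/Tyr)-X-X, gap of 1-6, X-Cys, gap of 1-2, X.
C_LOOP_MOTIF = re.compile(
    r"C..G[FY]..(?P<gap1>.{1,6}).C(?P<gap2>.{1,2})."
)


def find_c_loop(sequence):
    """Return the start index and gap lengths of the first motif match.

    Quantifiers are greedy, so when several gap lengths would fit,
    the longest allowed gap is reported. Returns None if no match.
    """
    match = C_LOOP_MOTIF.search(sequence.upper())
    if match is None:
        return None
    return {
        "start": match.start(),
        "gap1": len(match.group("gap1")),
        "gap2": len(match.group("gap2")),
    }


if __name__ == "__main__":
    # Hypothetical sequence, chosen only to exercise the pattern.
    print(find_c_loop("SCKDGFSGTNCEYK"))
    # Replacing the conserved Gly removes the match.
    print(find_c_loop("SCKDAFSGTNCEYK"))
```

Because six of the ten conserved positions are written as X, a sequence match of this kind is permissive; the conservation described in the text is structural and is not implied by the pattern alone.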
All six C-loops exhibit a second chain reversal shown on the left side of Fig. 3. Four of the proteins share the same overall length of the C-loop. The fold of this chain reversal is conserved in each peptide. FXa-C and TM-EGF4 have a four- and five-residue insertion between the cysteines. Both proteins accommodate this insertion in roughly the same manner (Fig. 3). The results show that this bend in the structure accommodates considerable variations in both sequence and structure.
A more detailed study of temperature effects was carried out using two-dimensional NMR. A NOESY spectrum of the double mutant was collected at 23 °C and visually compared to the corresponding data obtained at 3 °C. Although the intensities of the cross-peaks were attenuated at the higher temperature, there was no evidence of any detectable change in conformation.
DQF-COSY spectra
were obtained for all four peptides (Table 1) at both 3 and 23
°C. The similarity of chemical shifts again indicated that the
conformation remained intact. There were, however, consistent changes
in the CαH shifts of residues 376-378 and 386-388 (Fig. 4). These residues form an antiparallel structure. Some of
the chemical shift perturbations extended to the side chains that
participate in the hydrophobic pocket surrounding residue Phe
(Fig. 4). The elevated temperature had little effect on
the aliphatic protons located near the β-turns, indicating that the
β-turns were stable at the higher temperature and there was no
global unfolding of the peptide.
Figure 4:
The structure of the double mutant showing
the side chain heavy atoms. Each atom has been depicted in gray scale
to show the absolute value of the change in chemical shift when the
temperature was raised from 3 to 23 °C. The darkest gray represents no change; white represents a
change of greater than 0.13 ppm. The gray scale of the heavy atoms is
determined by the average of their attached proton(s) or by the average
of the nearest assigned proton(s). The ribbon is shaded using the
changes in the CαH chemical shifts.
A detailed comparison was made
between the 200-ms NOESY spectra (3 °C) of the double mutant and
the peptide with oxidized Met388 ([Met388]Mso). These peptides
represent the most and least active sequences. Of the original 213
NOEs assigned for the double mutant, only two were found missing for
the [Met388]Mso peptide. Both NOEs involved the
backbone protons of Phe. The remaining 17 NOEs to
Phe were found in both peptides. The
[Met388]Mso peptide also had some new NOEs not
found for the double mutant. Observation of the new NOEs probably
stemmed from the higher sample concentration used for the
[Met388]Mso peptide. NOESY spectra of the native
peptide and the Met388 → Leu peptide were also very similar to the
corresponding data for the double mutant. Each spectrum contains nearly
the same set of NOEs for the side chains of residues 388 and 389.
However, lower sample concentration precluded a more quantitative
examination of the data.
Similarity in the structures of the four
peptides is also demonstrated by comparing the chemical shifts. The
substitution His381 → Gly did not significantly
affect (±0.08 ppm) the chemical shifts of the protons beyond a
5-Å radius of the site of the modification. The oxidation of
Met388 also had very minor effects.
It is worth noting
that the spectra of the oxidized [Met388]Mso
peptide indicated the presence of two closely related peptides, even
though the compound was pure, as judged by high performance liquid
chromatography and mass spectrometry. At 3 °C, there was measurable
splitting of all the resonances of the methionine sulfoxide,
Mso388, and in the backbone protons of Gln and
Phe. Within the accuracy of the data, the intensity of both
sets of peaks was the same. At 23 °C, this splitting became more
pronounced and affected additional residues in the hydrophobic pocket
around Phe. The mono-oxidation of Sδ of
methionine introduces a chiral center at the sulfoxide. Since the
peptide was made with a synthetically prepared derivative of
methionine, it contained a racemic mixture of both R and S forms of
methionine sulfoxide at the Sδ position. These results
indicate that each sulfoxide stereoisomer has a slightly different
conformation.
As discussed in the results section, the two β-turns are stabilized by
the formation of hydrophobic pockets. It is worth noting that the side
chains of two other hydrophobic residues, Val and
Phe, do not interact with the hydrophobic pocket
surrounding Phe, even though their backbone atoms are
close to this pocket (Fig. 2A). A possible explanation
for these results can be found by examining the structure of the
homologous proteins. If Val and Phe are
compared to homologous residues in other EGF-like proteins (Table 3), the corresponding amino acids do not participate in
stabilizing this hydrophobic cluster. The results imply that the
structural constraints that control folding of the intact protein are
somehow encoded in the isolated C-loop.
Finally, the structure of
this peptide has some interesting implications for protein folding. It
clearly shows that a subdomain of a larger protein can act as an
autonomous folding unit. The temperature shift data imply that the
two β-turns are more stable structures and, therefore, may guide
the folding of the peptide. However, this peptide is small enough
that it could find the correct structure by random search of
conformational space, and folding may take place in a single
cooperative step. The folded C-loop may then act as a template that
guides the subsequent steps in protein folding. The results suggest
that the folding of the backbone and the side chains can take place
concurrently. Overall folding of the protein may consist of a series of
precise events with intermediates that have a well defined structure.
Previous work has identified
five residues in or near TM-EGF4, whose substitution by alanine
decreases the activity of TM by more than a factor of four:
Asp, Glu, Tyr, Phe, and Met388 (9, 10, 13). Asp
is found in the interdomain loop N-terminal to EGF4. Residues
Glu and Tyr in the A-loop are located in
the three-residue loop between the second and third cysteine.
Phe is in the C-loop, and Met388 is in the
comparatively short three-residue interdomain loop C-terminal to EGF4.
A potential explanation for these functional data can be found by
examining the structure of the homologous protein domain, FXa-C, the
C-terminal EGF domain of factor Xa (29). Of the five proteins
available for structural comparisons (Table 3), this protein
comes closest to matching TM-EGF4 in the size of the critical loops,
including matching the spacing between the second and third cysteine.
The work presented here has shown that Phe and
Met388 in TM-EGF4 are directly homologous to Tyr
and Pro in FXa-C (the numbering of residues for
FXa-C is the same as that used by Padmanabhan et al. (29)).
Based on the relative location of the second and third cysteines,
Glu and Tyr should be directly homologous
to residues Asp and Gln in FXa-C. These
four residues form a contiguous patch on the surface of FXa-C. This
patch accounts for roughly half of the contact area between FXa-C
and the serine protease domain. A similar interface involving an
EGF-like domain is also found in prostaglandin H2
synthase-1 (34). It is quite possible that TM-EGF4 forms part
of a ternary complex with thrombin and/or protein C using a similar
binding motif. Therefore, mutations in residues Glu,
Tyr, Phe, and Met388 would
perturb the formation of this complex. However, without more direct
experimental information, this model must be treated as a working
hypothesis.
The atomic coordinates and structure factors (code 1tmr) have been deposited in the Protein Data Bank, Brookhaven National Laboratory, Upton, NY.