The Crystal Structure of the Inhibitor-complexed Carboxypeptidase D Domain II and the Modeling of Regulatory Carboxypeptidases*

Patrick Aloy‡§, Verònica Companys‡§, Josep Vendrell‡, Francesc X. Aviles‡, Lloyd D. Fricker||**, Miquel Coll‡‡, and F. Xavier Gomis-Rüth‡‡

From the ‡Institut de Biologia Fonamental and Departament de Bioquímica i Biologia Molecular, Unitat de Ciències, Universitat Autònoma de Barcelona, 08193 Bellaterra, Barcelona, Spain, ||Department of Molecular Pharmacology, Albert Einstein College of Medicine, Bronx, New York 10461, and ‡‡Institut de Biologia Molecular de Barcelona, Centre d'Investigació i Desenvolupament-Consejo Superior de Investigaciones Científicas, Jordi Girona, 18-26, 08034 Barcelona, Spain

Received for publication, December 19, 2000, and in revised form, February 14, 2001


    ABSTRACT

The three-dimensional crystal structure of duck carboxypeptidase D domain II has been solved in a complex with the peptidomimetic inhibitor, guanidinoethylmercaptosuccinic acid, occupying the specificity pocket. This structure allows a clear definition of the substrate binding sites and the substrate funnel-like access. The structure of domain II is the only one available from the regulatory carboxypeptidase family and can be used as a general template for its members. Here, it has been used to model the structures of domains I and III from the former protein and of human carboxypeptidase E. The models obtained show that the overall topology is similar in all cases, the main differences being local and because of insertions in non-regular loops. In both carboxypeptidase D domain I and carboxypeptidase E slightly different shapes of the access to the active site are predicted, implying some kind of structural selection of protein or peptide substrates. Furthermore, emplacement of the inhibitor structure in the active site of the constructed models showed that the inhibitor fits very well in all of them and that the relevant interactions observed with domain II are conserved in domain I and carboxypeptidase E but not in the non-active domain III because of the absence of catalytically indispensable residues in the latter protein. However, in domain III some of the residues potentially involved in substrate binding are well preserved, together with others of unknown roles, which also are highly conserved among all carboxypeptidases. These observations, taken together with others, suggest that domain III might play a role in the binding and presentation of proteins or peptide substrates, such as the pre-S domain of the large envelope protein of duck hepatitis B virus.


    INTRODUCTION

Carboxypeptidases (CPs)1 are enzymes that catalyze the cleavage of C-terminal peptide bonds in proteins and peptides. From a mechanistic point of view, CPs can be classified into two groups, metalloCPs and serine CPs. The metalloCPs possess a Zn2+ cofactor in the active site. In mammals, this family currently contains 13 members subdivided into two subfamilies, the digestive enzymes and the regulatory enzymes (1-5). Whereas the biological function of the digestive CPs is to contribute to protein degradation, the regulatory ones are generally involved in physiological processes that require a higher specificity. Within each subfamily, the members share 25-63% amino acid sequence identity, but this drops to only 15-25% when members of different subfamilies are compared. This low overall homology between subfamilies implies that they diverged early in time.

The digestive CPs are soluble, non-glycosylated proteins that are synthesized as inactive precursors containing a 90-95-amino acid N-terminal pro-segment (5, 6). The regulatory CPs have been purified and characterized from biological fluids and tissues, where they are found in soluble or in membrane-attached forms, in minor quantities. This subfamily includes CPD, CPE, CPM, CPN, CPZ, and novel proteins with an unknown function designated adipocyte enhancer-binding protein 1, CPX-1, and CPX-2 (3, 7-11). These proteins perform a variety of important physiological functions, including neuropeptide and prohormone processing, regulation of peptide hormone activity, and alteration of protein-protein or protein-cell interactions (2, 3, 12).

CPE, also known as enkephalin convertase or CPH (EC 3.4.17.10), is a CPB-like enzyme associated with the biosynthesis of many peptide neurotransmitters and hormones. It was purified for the first time from bovine brain (13). Later, cDNAs corresponding to CPEs from cattle (14), rat (15, 16), human (17), Aplysia californica (18), and the fish Lophius americanus (19) were cloned and sequenced. The amino acid sequence homology among vertebrate species is greater than 80%. The molecular mass of CPE is 55 kDa, and it is formed by 476 residues, of which 25 correspond to the signal peptide, and 17 correspond to the pro-segment. However, and in contrast with digestive CPs, scission of the pro-segment is not necessary for expression of the activity (20). Also, in contrast with the great majority of metalloCPs, whose optimum pH value is around neutrality, CPE has its maximum activity at an acidic pH value, between 5 and 5.5 (21), coincident with the internal pH value of the secretory granules. It has also been observed that its activity is regulated by the presence of Co2+ (1). Several analogs of arginine and lysine, which were originally designed as active site-directed inhibitors of CPB and CPN, were found to be potent inhibitors of CPE (13). Two of these compounds, guanidinoethylmercaptosuccinic acid (GEMSA) and aminopropylmercaptosuccinic acid, are several hundred-fold more potent as inhibitors of CPE than of either CPB or CPN (13).

It has been shown that mice with the mutation Cpe^fat/Cpe^fat have deficient proinsulin processing because of the absence of CPE activity in the pancreatic islets and the pituitary, caused by a point mutation S202P (22). Mice containing such mutations in the Cpe gene also show a reduced ability to process other hormones (23). However, the observation that Cpe^fat/Cpe^fat mice are still able to process a small quantity of insulin suggested that another CP was also involved in peptide processing.

A search for additional CPE-like enzymes led to the discovery of CPD (EC 3.4.17.22) (8). CPD is a 180-kDa protein containing a signal peptide, three CPE-like domains of ~390 residues separated by short bridge regions, a transmembrane domain, and a 60-residue C-terminal cytosolic tail (24-26). The cDNAs corresponding to CPD of human (25), rat (26), mouse (27), duck (24), Drosophila melanogaster (28), and A. californica (29) have been cloned. All species contain three CPE-like domains (here named CPD-I, CPD-II, and CPD-III), suggesting that their distinct physiological functions are important. The characterization of the first and second domains of CPD has shown that both possess catalytic activity and have somewhat complementary activities. Specifically, the first domain is optimally active at pH 6.3-7.5 and prefers substrates with C-terminal Arg, whereas the second domain is optimally active at pH 5.0-6.5 and prefers substrates with C-terminal Lys (30). In contrast, the third domain is inactive toward a variety of standard CP substrates (30, 31). Duck CPD, also named gp180, was identified by its ability to bind the pre-S domain of the large envelope protein of duck hepatitis B virus particles (24). A comparison of human and duck CPD reveals 66, 83, and 82% amino acid sequence identity among the first, second, and third CP repeats, respectively. Recent studies with mutants lacking the first, second, or third CP-like domains have shown that the third domain of duck CPD is responsible for binding to the pre-S domain of the large envelope protein from hepatitis B virus and that this binding does not require CP activity (31). Despite the absence of activity in the third domain, the fact that it is highly conserved among duck and mammals suggests the existence of a biological function for it.

Crystallization of both CPE and the complete three-domain CPD has been attempted. However, low yields in the protein recovery and the occurrence of glycosylations, together with the fact that the interdomain linker peptides in CPD are probably highly flexible, have precluded direct 3D structure determination. The only crystal structure from the regulatory metalloCP subfamily that has been solved is that of the second domain of duck CPD (32). It displays a 300-residue N-terminal α/β-hydrolase subdomain with overall topological similarity to pancreatic CPA. This subdomain is followed by a C-terminal 80-residue β-sandwich subdomain, unique for these regulatory metalloenzymes and topologically related to transthyretin and sugar-binding proteins (32). To further investigate and better define the enzyme substrate pocket and to provide a basis for the rational design of specific inhibitors of regulatory CPs, we have solved the crystal structure of CPD domain II in complex with the peptidomimetic inhibitor GEMSA. Based on this structure, overall models and detailed ones of the respective active sites have been built for human CPE and domains I and III of duck CPD. These models permit hypotheses about the structural basis of enzyme specificity and biological activity.

    EXPERIMENTAL PROCEDURES

Crystallization-- Crystals of native CPD-II were produced as described previously (32). The CPD-II·GEMSA complex was obtained by soaking native crystals for 3 days in 2.5 M ammonium sulfate, buffered to pH 5.2 with 0.15 M sodium acetate and containing 10 mM GEMSA (purchased from Calbiochem). Diffraction data to 2.6-Å resolution were collected at the EMBL synchrotron beamline BW7B (Deutsches Elektronen-Synchrotron, Hamburg, Germany) from a single N2-cryocooled complex crystal belonging to the same space group (P2₁3) as the native ones. Data were processed with MOSFLM, version 6.0.1 (33) and SCALA from the CCP4 suite (Collaborative Computational Project, Number 4, 1994). The coordinates of native CPD-II (after removal of solvent molecules and sulfate anion 998 located in the active site cleft; see Ref. 32) were used for initial rigid body refinement. Positional/temperature refinement followed, employing the program CNS, version 0.9 (34), with maximum likelihood as the minimization criterion, and omit maps (σA-weighted 2Fobs - Fcalc and Fobs - Fcalc) were computed. The difference density map clearly revealed the location of the bound inhibitor (Fig. 1), allowing it to be built using the Turbo-Frodo program (35). The complex was submitted to further positional/temperature refinement after setup of appropriate inhibitor parameter and topology files. Refinement of the inhibitor occupancy revealed 100% presence, in accordance with the very high affinity of the inhibitor (in the nanomolar range). The final model comprises residues Gln4-Thr383 of the chemical sequence, 195 solvent molecules (labeled 601-795), one zinc cation (residue 999), one sulfate anion (998) with partial occupancy, and the 15-atom inhibitor GEMSA (designated Gem801). Three asparagine residues were found to be glycosylated (Asn136, Asn321, and Asn377). One peptide bond (Pro190-Phe191) was found in the cis conformation. Table I provides a summary of the data processing and final model refinement. The coordinates of the complex structure have been deposited with the Protein Data Bank (access code 1h8l).


Fig. 1.   Stereo plot of CPD-II in complex with GEMSA displaying the final structure around the active site cleft superimposed with the initial σA-weighted omit map density (Fobs - Fcalc) contoured at 2.5 σ. One inhibitor carboxylate group coordinates the catalytic zinc ion (blue sphere) in a bidentate manner, and the other binds Arg145/Arg135 of the protein. The 2-guanidinoethyl moiety of the inhibitor is placed in the specificity pocket. Some residues are labeled.

                              
Table I
Data collection, processing, and final model refinement

Model Building-- A preliminary multiple alignment of the three duck CPD structural repeats (domains) was performed with the program PILEUP (36, 37). This alignment was used as a "seed" to build a hidden Markov model profile with the program HMMER (38), and the profile was then used to align eight additional homologous sequences. Expert knowledge and experimental information were also used to improve the quality of the alignment in several segments. The primary and 3D structures of duck CPD-II were used as a template to build the models. A segment of 30 residues from CPT (PDB access code 1obr) was aligned to CPE to model a 23-residue insertion (residues 202-224; see footnote to Table II for the conventions used on numbering the different sequences). Finally, a 25-residue stretch from the sequence of adenovirus coat protein (1dhx) was also aligned to the insertion observed in CPD-I (residues 96-124). Using the multiple alignment for the three CPD domains (CPD-I, CPD-II, and CPD-III) and CPE as a starting point, a method of comparative modeling by satisfaction of spatial constraints was used to build the 3D structure of CPD-I, CPD-III, and CPE. This method is implemented in the program MODELLER (39). The spatial constraints are derived by transferring the spatial features from the structures of known proteins to the sequence of the unknown ones. The program PROSA-II (40) was used to check the quality of the models as described in a previous work (41). The regions with non-near-native fold were identified by the high positive values of pseudo-potential energy, independently of the crystallographic structure. Once the three models were built automatically, manual intervention was required for re-modeling those regions identified by PROSA-II with non-near-native fold. The program FRAZER, developed in our laboratory,2 was used to reconstruct the problematic regions. The overall RMSD calculations and superimposition of the three modeled structures with respect to the crystallographic one (CPD-II) were obtained according to the structural alignment given by the program SSAP (42). The active site superposition and GEMSA inhibitor replacement in the three models were also performed with FRAZER. The coordinates of the CP models, in PDB format, are available upon request.

                              
Table II
Selected residues in duck CPD-II and their equivalents in bovine CPA, bovine CPB, human CPE, and full-length duck CPD


    RESULTS

Structure of the Complex CPD-II·GEMSA-- The CPD-II polypeptide chain in the complex is folded into two distinct subdomains, a 300-residue catalytic CP subdomain displaying the α/β-hydrolase fold reminiscent of the CPA structure and an 80-residue C-terminal subdomain of all-β pre-albumin-like β-sandwich folding topology, the so-called transthyretin subdomain (32). The complex structure displays no significant deviations from the native protein (32), as denoted by an RMSD of 0.19 Å for all Cα atoms. Only the catalytic zinc ion is displaced somewhat (0.7 Å) from its position in the non-complexed domain structure, forced by the presence of the inhibitor. This movement is accompanied by a similar displacement (0.7 Å) of one of the coordinating residues, His74. Interestingly, the catalytic solvent molecule (601) attached to the zinc ion in the unliganded structure is moved 2.3 Å away upon inhibitor binding (solvent molecule 684 in the present structure; see Fig. 2). The peptidomimetic GEMSA molecule occupies the primed side of the catalytic active site cleft, emulating a bound C-terminal amino acid after proteolytic cleavage of a substrate. The guanidinoethylmercapto group is reminiscent of a substrate arginine side chain (CPD-II displays a CPB-like preference for basic residues in P1') and occupies the same position in the specificity pocket. It is anchored through its atoms Nη1 and Nη2 to the side chain of Asp192 and the main chain carbonyls of Gly246 and Tyr250, the latter one present in the "down" conformation as in the native structure (32). This planar guanidinoethylmercapto group establishes an additional van der Waals' interaction (3.8 Å) with Val252. The inhibitor carboxylate group mimicking a peptide substrate C terminus is anchored to both Arg145 and Arg135. The second carboxylate group is similar to a scissile carbonium ion in the transition state and coordinates the catalytic zinc ion in a bidentate manner. One of its oxygens is further bonded by Arg135 and His74 (see Fig. 2).


Fig. 2.   Stereo plot of CPD-II in complex with GEMSA as a ball and stick model displaying the hydrogen bond network around the active site. The catalytic zinc ion (999) is displayed as a cyan sphere, the inhibitor moiety is displayed in violet, and the solvent molecules are displayed as red spheres. Residues that form the active site and interact with the inhibitor are labeled.

Sequence Alignments, Model Building, and Refinement-- Fig. 3 shows the multiple sequence alignment of the three domains of duck CPD and human CPE. This alignment, performed as indicated under "Experimental Procedures," allows us to derive accurate models for these proteins. The alignment reveals 42% sequence identity between duck CPD-II and CPD-I and 32% sequence identity between duck CPD-II and CPD-III. The percentage identity for human CPE with respect to duck CPD-II is even higher, at 50%. These identity levels allow homology modeling of the three-dimensional structures of these proteins.
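As an illustration of how pairwise identity percentages such as those quoted above can be read off a multiple alignment, the following minimal sketch counts identical residues over the columns in which both sequences carry a residue. The choice of denominator (ungapped columns only) is an assumption, as the text does not state which convention was used, and the two fragments are invented toy strings, not sequences from Fig. 3.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over columns where both aligned sequences have a residue.

    Assumption: columns containing a gap in either sequence are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip insertion/deletion columns
        compared += 1
        if a.upper() == b.upper():
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical toy fragments, for illustration only (not taken from Fig. 3).
toy_a = "HGNE-AVGRE"
toy_b = "HGGEPLVS-E"
print(percent_identity(toy_a, toy_b))  # 5 matches over 8 ungapped columns
```

Other conventions (for example, dividing by the length of the shorter sequence) give somewhat different percentages, so values are only comparable when computed the same way.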


Fig. 3.   Multiple alignment of duck CPD (domains I, II, and III) and human CPE. Numbers above the sequences highlight residues mentioned under "Results" or "Discussion" and correspond to the numbering used in the description of the crystal structure of CPD-II (32). The indicated sequence of CPD-II begins with residue 2 of the protein used for crystallization. CPD-I, CPD-II, and CPD-III correspond to residues 38-500, 501-920, and 925-1336 of full-length duck CPD, respectively (24). The indicated sequence of CPE corresponds to residues 43-476 of human CPE (17). The alignment was performed using the programs PILEUP and HMMER and was manually refined to account for experimental information. Metal binding residues, catalytic residues, and some important substrate binding residues are in bold and boxed. See Table II for equivalent positions in the standard sequence numbering for pancreatic carboxypeptidases.

The alignment in Fig. 3 shows that all the residues experimentally described as important (those at the active site, at the metal binding site, and at the substrate binding subsites) are essentially conserved among all the sequences, except for CPD-III. Several insertions and deletions, however, can be observed in the alignment. First of all, a large insertion (29 residues) in CPD-I can be detected. This inserted stretch is extremely charged, with 5 basic and 15 acidic residues, and the only sequences in the data banks that show a certain level of homology with it belong to proteins that interact with nucleic acid-binding proteins, which are largely unstructured in the absence of an interacting partner (43). In any case, only one of these related sequences has its 3D structure determined (adenovirus type 2 hexon, PDB code 1dhx), and the low percentage of identity observed is not sufficient to model this stretch. Another large insertion (23 residues) can be found in human CPE. This insertion is present in all species of CPE, as well as most other members of the CP family, although the length varies from 14-15 residues for CPA, CPB, and the bacterial CP to 27 residues for CPX-1, CPX-2, and adipocyte enhancer-binding protein 1. Because the three-dimensional structure of this loop in CPA, CPB, and CPT is known, this region of CPE can be modeled using the crystal structure and alignment of the other CPs. The remaining indels are much shorter and can be modeled with reasonable confidence by energy optimization.

The pseudo-energies of the original models were calculated with PROSA-II (40) to identify the incorrect chain tracings. As expected, the regions that presented higher energy were those of the indels (data not shown). In the case of the large highly charged insertion in duck CPD-I, the energy tended to be infinite, and consequently, the loop was removed from the model. For the insertion in human CPE, the pseudo-energy was corrected to acceptable values by manual modification of the CPE model and energy minimization.

The overall RMSD calculations between the 3D structure of CPD-II and the three different models gave the following values: 0.5 Å for CPD-I (once the non-modeled loop was removed), 1.3 Å for CPD-III, and 0.3 Å for CPE. Taking into account that only one crystal structure was used to model the three sequences, the RMSDs correlate well with the percentage of sequence identity obtained in the multiple alignment. The RMSD was also calculated for the three models using only the active site residues. In this case, the results were 0.1 Å for CPD-I and CPE and 0.5 Å for CPD-III.
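For readers unfamiliar with the quantity, the RMSD between two structures is the square root of the mean squared distance between structurally equivalent atoms. The sketch below computes it for two lists of paired coordinates that are assumed to be already superimposed and already matched one-to-one (in this work the equivalences came from SSAP; the superposition step itself is not shown). The function name and the toy coordinates are hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(coords_a: Sequence[Coord], coords_b: Sequence[Coord]) -> float:
    """RMSD in the input units (e.g. Å) over paired, pre-superimposed atoms.

    Assumption: coords_a[i] and coords_b[i] are equivalent atoms (for example
    Cα atoms matched by a structural alignment) in a common reference frame.
    """
    if len(coords_a) != len(coords_b) or len(coords_a) == 0:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


# Hypothetical toy coordinates: every atom is shifted by 1 Å along z.
template = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(template, model))
```

Because residues without a structural equivalent are left out of the sum, loops that were removed or not aligned (such as the CPD-I insertion) do not contribute to the reported values.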

Fig. 4 shows the modeled 3D structures of CPD-I, CPD-III, and CPE compared with the crystal structure of CPD-II. The RMSD values indicate that, although a number of local differences are obviously present (discussed below), the models share a common topology, and the relative positions of the two subdomains are maintained in all of them. A close inspection of the models also shows that the major structural features in CPD-II that suggest a different selectivity of regulatory CPs toward large protein substrates as compared with the pancreatic enzymes are also present in the other members of the regulatory family studied here. Thus, those loops in the funnel-like access to the active site, which are probably responsible for the different selectivity of the two families of CPs, form an opening of the solvent-exposed surface, which, beyond individual characteristics that will be discussed below, is very similar in all cases.


Fig. 4.   Ribbon representation of the modeled structures of duck CPD (domains I and III) and human CPE, compared with the crystal structure of duck CPD-II. Top, the three loops that shape the entrance to the active site are in red (124), green (149), and cyan (225). Bottom, modeled structures showing the regular secondary structures, α-helices (blue) and β-strands (magenta). The residue numbering corresponds to the CPD-II structure.

Conserved Interactions between GEMSA and the Different Models-- After superimposing the four active sites, the GEMSA inhibitor was emplaced in the three models to find and rationalize its possible interactions with the enzymes. The fit was excellent in all three cases. The residues in the x-ray structure of CPD-II interacting with the GEMSA inhibitor and their equivalents in the three modeled structures are shown in Table II. As can be seen, in CPD-I and CPE all the hydrogen bonds found in the co-crystal are conserved in the complex between the inhibitor and the protein residues. In contrast, in CPD-III several critical interactions are lost because of the different residues found at the active site.

    DISCUSSION

Structural Basis of the Inhibitor Action-- The CP inhibitor GEMSA has been frequently used as a potent inhibitor of regulatory CPs (CPN, CPE, and CPD). The Ki values determined to date fall in the low nanomolar range, 4 nM for duck CPD-I, 34 nM for duck CPD-II (30), and 8 nM for bovine CPE (13). This is the first time that a crystal structure of its complex with one of these enzymes has been reported. This structure clearly explains the powerful action of the inhibitor; the catalytic water and the essential Zn2+ are both displaced, the latter being bound in a bidentate manner by one of the carboxylate groups of the inhibitor. In addition, the inhibitor is bound to residues of CPD-II that are essential for substrate binding and polarization, Tyr250, Arg145, and Arg135. Therefore, several structural elements indispensable for the catalytic action of the enzyme are perturbed or shielded by the inhibitor.
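The Ki values above also make the full inhibitor occupancy refined in the crystal plausible. As a rough sketch only, a single-site equilibrium estimate of the bound fraction, [I] / ([I] + Ki), can be evaluated for the 10 mM GEMSA soak described under "Experimental Procedures." This treats the solution Ki as a dissociation constant in the crystal, takes the free inhibitor concentration as equal to the total, and ignores competition from sulfate and any pH effects; all of these are simplifying assumptions, not results from this work.

```python
def fractional_occupancy(inhibitor_molar: float, ki_molar: float) -> float:
    """Bound fraction for simple single-site binding: [I] / ([I] + Ki).

    Assumptions: Ki approximates the dissociation constant, and the free
    inhibitor concentration equals the total concentration.
    """
    if inhibitor_molar < 0 or ki_molar <= 0:
        raise ValueError("concentration must be >= 0 and Ki must be > 0")
    return inhibitor_molar / (inhibitor_molar + ki_molar)


soak_concentration = 10e-3  # 10 mM GEMSA in the soaking solution
ki_cpd2 = 34e-9             # 34 nM, Ki reported for duck CPD-II (30)

occupancy = fractional_occupancy(soak_concentration, ki_cpd2)
print(f"estimated occupancy: {occupancy:.6f}")
```

Under these assumptions the inhibitor concentration exceeds Ki by more than five orders of magnitude, so the estimate is indistinguishable from complete occupancy.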

Taking into account the similarity of CPD-II to the modeled structures and the ease with which GEMSA was fitted into them, it is expected that the inhibitor binds in a very similar way to CPD-I and CPE. However, it is unlikely that GEMSA binds CPD-III because of the absence of critical residues (discussed below).

Overall Comparison of the Models-- The derived models of CPD-I, CPD-III, and CPE show an overall similarity with the recently described crystal structure of CPD-II (32). In all models two subdomains are clearly visible, the CP subdomain and the C-terminal subdomain, which shares topological similarity and connectivity with transthyretin (32). The CP subdomain shows the α/β-hydrolase fold common to many proteases from the cysteine, serine, and metalloprotease families. It is formed by a doubly wound eight-stranded β-sheet flanked on both sides by three and six helices, respectively. Meanwhile, the C-terminal subdomain displays a rod-like shape forming a β-barrel or β-sandwich of pre-albumin-like folding topology made up by seven strands connected by short loops. This is valid for all the models studied here.

As in CPD-II, the interactions between both subdomains are mainly of a hydrophobic nature in all models. Most of the van der Waals' interactions described for CPD-II are also found in CPD-I and CPE. Albeit containing a smaller number of such interactions, CPD-III still conserves the most significant ones. A number of hydrogen bonds also contribute to subdomain interactions in CPD-I, CPD-III, and CPE, most of them being exactly conserved in CPD-I and CPE versus CPD-II, and greater differences being found for CPD-III. It is worth mentioning that the only salt bridge between subdomains described for CPD-II, Asp206-Arg343, is also conserved in the modeled structures between pairs of Asp/Arg at equivalent positions in CPD-I and CPE and between Glu1123 and His1258 at equivalent positions in CPD-III. The disulfide bond in CPD-II between Cys230 and Cys275 is likewise predicted in the three models; an additional disulfide between Cys70 and Cys132 in CPE (already detected from biochemical measurements) is predicted in the model built for this enzyme.

CPE is the protein with the highest homology to CPD-II and also the one whose model has the lowest RMSD value with the experimental 3D structure. However, it should be taken into account that the RMSD value was calculated only on the structurally equivalent residues given by the alignment performed with the program SSAP. This means, for instance, that the 23-residue insertion in CPE, spanning from Glu158 to Lys189, was not considered. As compared with CPD-II, pancreatic and bacterial CPs also have a 14-amino acid insertion in this region, forming a loop that shapes one side of the entrance to the active site and establishes cross-connections to an adjacent loop. This feature is considered to be one of the distinctive determinants of specificity between regulatory and pancreatic CPs. The 23 extra residues in CPE form a turn-rich region rather exposed to the solvent in the model built, according to the well defined structure of this loop in Thermoactinomyces vulgaris CPT (1obr).

The main difference between CPD-I and CPD-II is the above mentioned long insertion of 29 residues in CPD-I that contains 20 net charges. This sequence was eliminated from the calculations as no homologous sequences and 3D structures were found to model it with a sufficient degree of confidence. A further significant difference is a glycine-rich insertion of nine residues in one of the loops that shape the active site entrance (residues 165 to 173 in Fig. 1). This insertion does not generate a substantial change in the surface of the active site cleft in our model and is folded inwards over the molecular body of the enzyme.

The study of the model of CPD-III is particularly important because of its lack of enzymatic activity (31), probably because of the absence of key residues for CP catalysis. However, alignment of the sequences and superimposition of the 3D structures show that other residues with yet unknown function are highly conserved. When comparing the models and the structure, three categories of residues can be defined. The first one is formed by the residues essential for catalysis. In CPD-II, these essential residues are His74, Glu77, and His181 (coordinators of Zn2+), Glu272, and Arg135. Only the first His is conserved in CPD-III, whereas the other residues are replaced by Ala, Asp, Tyr, and His, respectively. The enzymatic machinery of CPD-III is therefore disabled, because neither a complete set of Zn2+ coordinators nor a general base nor a polarizing residue is present (6).
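The residue-by-residue correspondence in the preceding paragraph is easy to misread from the "respectively" construction, so it is restated here as a small lookup table with a conservation check. The table only re-encodes what the text states (CPD-II numbering; roles as given in the text); the variable and function names are hypothetical.

```python
# (CPD-II residue, role described in the text, residue type found in CPD-III)
CATALYTIC_RESIDUES = [
    ("His74", "Zn2+ coordinator", "His"),
    ("Glu77", "Zn2+ coordinator", "Ala"),
    ("His181", "Zn2+ coordinator", "Asp"),
    ("Glu272", "general base", "Tyr"),
    ("Arg135", "polarizing residue", "His"),
]


def split_by_conservation(residues):
    """Separate positions whose residue type is kept in CPD-III from replaced ones."""
    conserved, replaced = [], []
    for name, role, cpd3_type in residues:
        if name[:3] == cpd3_type:  # compare three-letter residue types
            conserved.append(name)
        else:
            replaced.append((name, cpd3_type, role))
    return conserved, replaced


conserved, replaced = split_by_conservation(CATALYTIC_RESIDUES)
print("conserved in CPD-III:", conserved)
for name, new_type, role in replaced:
    print(f"{name} -> {new_type} ({role})")
```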

Those residues that are necessary for substrate binding are included in the second category. The triad Asn144, Arg145, and Asn146, which is responsible for the anchoring of the C-terminal carboxylate (COO-) of the substrate, is generally conserved in all CPs that are enzymatically active toward peptides, including the pancreatic and bacterial CPs (Table II). In CPD-III, this triad is replaced by Asp, Thr, and Asp, rendering a domain that has lost the ability to anchor the carboxyl group. Interestingly, a peptidase in the bacterium Bacillus sphaericus is a distant member of the metalloCP family that also lacks this Asn-Arg-Asn triad (44). Instead of cleaving substrates with a C-terminal carboxylate group, as in other CPs, the B. sphaericus peptidase hydrolyzes C-terminal meso-diaminopimelic acid. This substrate has an amino group in place of the carboxylate of a typical peptide, consistent with the replacement of the Asn-Arg-Asn with an Asn-Asp-Gln. Thus, the differences in this sequence between CPD-II and CPD-III are predicted to be critical for defining the binding specificity of each domain.

Some other residues necessary for substrate binding in CPD-II are also different in CPD-III. For example, Tyr250, Val252, and Gly255 of CPD-II are replaced by His, His, and Ser, respectively, in CPD-III. However, despite these replacements, when the model of CPD-III and the crystal structure of CPD-II are superimposed, it can be observed that the different residues in CPD-III occupy exactly the same positions as their homologues in CPD-II. Taken together, these observations make it unlikely that CPD-III binds with high affinity to peptides that are substrates of the other two domains.

The rest of the residues that are highly conserved in almost all CPs, either regulatory or pancreatic, like Gly40, Asn117, Gly120, and Pro190 (numbering system of CPD-II) would belong to the third category. None of them has been related to any specific function, and their role is more likely purely structural.

Thus, to summarize, the catalytic machinery of CPD-III has been suppressed by replacement of the key residues for CP activity, and there are also substantial differences in the residues responsible for substrate binding. The high conservation of sequence and structure in the enzymatically incompetent CPD-III suggests another biological function, possibly related to the binding of proteins or other molecules.

Active Site and Substrate Binding Subsites-- All residues involved in metal binding and catalysis are conserved in CPD-I, CPD-II, and CPE. CPD-III is the exception, as already noted, because it lacks most of the residues involved in Zn2+ binding, the Arg that binds the terminal carboxylate (here a Thr), and the residue that polarizes the scissile peptide bond (a His in CPD-III). Also, the position of the general base (Glu272 in CPD-II) is occupied by a Tyr in CPD-III.

The loops that form the specificity pocket in CPD-II (S1' subsite) (Asn188-Asp192, Gly246-Gln257, and Phe267-Thr270) have the same length in all the models; amino acid residue identity in these loops is high for CPD-I and CPE and low for CPD-III. There is also low identity between these loops of CPD-II and those of the pancreatic CPs. On the other hand, it is worth noting that Tyr250 is replaced by His in CPD-III, supporting the idea that this domain is unable to catalyze peptide bond hydrolysis. Tyr250 is equivalent to Tyr248 in pancreatic CPs, the residue that caps the active site, facilitates the proper location of the substrate over it, and fluctuates between two conformations depending on substrate binding.

A key residue that is essential for the specificity of digestive CPB for C-terminal basic residues is Asp255 (6, 45), which is replaced by an Ile or Leu in the digestive enzymes that prefer C-terminal aliphatic and aromatic residues (Table II). In CPD-I, CPD-II, and CPE, which are all highly specific for basic C-terminal amino acids, the residue in the position sequentially equivalent to Asp255 of CPB is a Gln (Table II), which cannot functionally play a role similar to that of the Asp. Instead, the electronegative character required for the selectivity for C-terminal Lys and Arg residues is provided by Asp192, located in a spatially comparable position. This Asp192 is conserved in all regulatory CPs, including CPD-III. However, in CPD-III a Lys residue is found in the position equivalent to Asp255 of CPB (Table II). In the model built, this Lys residue is not directed toward the substrate-binding pocket, as it adopts a conformation similar to that of the side chain on which it has been modeled (i.e. Gln257 in CPD-II). However, if we consider the presence of this Lys residue, together with the above-mentioned substitution of the highly conserved triad Asn-Arg-Asn at the bottom of the specificity pocket by Asp-Thr-Asp in CPD-III, it is tempting to envisage that CPD-III could show a fully reversed selectivity and bind positive terminal charges linked to acidic side chains. Clearly, only a crystal structure of CPD-III in complex with an as yet unknown putative substrate would shed light on this question.
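
The charge argument in the preceding paragraph can be made explicit with a small bookkeeping sketch. The residue identities are those given in the text; the formal charges (Asp and Glu as -1, Lys and Arg as +1, every other residue including His as 0 at neutral pH) and the names FORMAL_CHARGE, net_charge, CARBOXYLATE_ANCHOR, and POCKET_DETERMINANT are simplifying assumptions introduced here for illustration only and are not part of the modeling procedure.

```python
# Formal side-chain charges at neutral pH.
# Assumption: His and all unlisted residues are treated as neutral.
FORMAL_CHARGE = {"Asp": -1, "Glu": -1, "Lys": +1, "Arg": +1}


def net_charge(residues):
    """Sum the formal side-chain charges of a group of residues."""
    return sum(FORMAL_CHARGE.get(res, 0) for res in residues)


# Triad that anchors the C-terminal carboxylate of the substrate,
# as stated in the text for CPD-II and CPD-III.
CARBOXYLATE_ANCHOR = {
    "CPD-II": ("Asn", "Arg", "Asn"),
    "CPD-III": ("Asp", "Thr", "Asp"),
}

# Residue at the position sequentially equivalent to Asp255 of CPB.
POCKET_DETERMINANT = {
    "CPB": ("Asp",),
    "CPD-II": ("Gln",),
    "CPD-III": ("Lys",),
}

for label, table in (("carboxylate anchor", CARBOXYLATE_ANCHOR),
                     ("Asp255-equivalent position", POCKET_DETERMINANT)):
    for protein, residues in table.items():
        charge = net_charge(residues)
        print(f"{label:28s} {protein:8s} {'-'.join(residues):12s} {charge:+d}")
```

Under these assumptions the anchoring triad goes from +1 in CPD-II to -2 in CPD-III, whereas the position equivalent to Asp255 of CPB goes from -1 (CPB) to 0 (CPD-II) and +1 (CPD-III), which is the pattern behind the speculated reversal of selectivity. Such a tally ignores side-chain orientation and the shared Asp192, so it is a plausibility check at most.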

The relevant residues at the S2 subsite in CPD-II are also found in equivalent positions in the three models of CPD-I, CPD-III, and CPE. The residues that line this subsite are considerably different in pancreatic CPs, suggesting that a general specificity for either sequence or volume of the substrates is shared in all regulatory enzymes, including the inactive CPD-III. As an example, Gly182 and Gly183 (CPD-II numbering) are present in all models of the regulatory forms at the same positions found in the crystal structure, whereas the equivalent residues in pancreatic CPs are Ser197 and Tyr198, also highly conserved in such pancreatic enzymes.

Variation is also observed in all proteins for those residues putatively involved in subsite S3. One remarkable difference involves Lys277, which is conserved in CPD-I and CPE and was putatively involved in P2 carbonyl oxygen binding in CPD-II (32), but is replaced by a Tyr in CPD-III.
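
The CPD-II to CPD-III replacements mentioned in this section can be gathered in one place, as in the sketch below. The substitutions are those stated in the text (CPD-II numbering); the coarse physicochemical classes in RESIDUE_CLASS and the helper classify are assumptions made here for illustration, and other classification schemes (for instance, treating His or Tyr differently) would be equally defensible.

```python
# CPD-II residue -> residue at the equivalent position in the CPD-III model,
# as stated in the text (CPD-II numbering).
SUBSTITUTIONS = {
    "Tyr250": "His",  # substrate binding; caps the active site
    "Val252": "His",  # substrate binding
    "Gly255": "Ser",  # substrate binding
    "Gln257": "Lys",  # position equivalent to Asp255 of CPB
    "Glu272": "Tyr",  # general base
    "Lys277": "Tyr",  # putative P2 carbonyl oxygen binding
}

# Coarse side-chain classes (assumption: one class per residue).
RESIDUE_CLASS = {
    "Asp": "acidic", "Glu": "acidic",
    "Lys": "basic", "Arg": "basic", "His": "basic",
    "Ser": "polar", "Thr": "polar", "Asn": "polar", "Gln": "polar",
    "Tyr": "aromatic", "Phe": "aromatic", "Trp": "aromatic",
    "Val": "aliphatic", "Leu": "aliphatic", "Ile": "aliphatic",
    "Ala": "aliphatic", "Met": "aliphatic",
    "Gly": "special", "Pro": "special", "Cys": "special",
}


def classify(substitutions):
    """Return (site, new residue, old class, new class) for each substitution."""
    rows = []
    for site, new in substitutions.items():
        old = site[:3]  # three-letter code precedes the residue number
        rows.append((site, new, RESIDUE_CLASS[old], RESIDUE_CLASS[new]))
    return rows


for site, new, old_cls, new_cls in classify(SUBSTITUTIONS):
    flag = "class change" if old_cls != new_cls else "same class"
    print(f"{site} -> {new}: {old_cls} -> {new_cls} ({flag})")
```

With this particular grouping, every listed replacement changes the side-chain class, in line with the view that the binding and catalytic properties of CPD-III differ from those of CPD-II even though the backbone positions are preserved.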

Accessibility of the Active Site-- One of the most significant structural differences between the crystal structure of CPD-II and pancreatic CPs is the long insertion Tyr225-His241, which shapes the border of the funnel that leads to the active site and hinders the binding of potato CP inhibitor to CPD. Potato CP inhibitor is a 39-residue peptide that potently inhibits several of the digestive CPs, including CPA, CPB, and CPU (see Ref. 32). Although particular residues are not conserved, the loop is present in all models, suggesting that restrictions in specificity may be common to all regulatory enzymes. However, two further loops are also critical in shaping the funnel border, and significant differences are observed in these cases (Fig. 4). The insertion of a 9-residue Gly-rich sequence at loop Ser124-Val133 (CPD-II numbering) does not seem to affect the accessible surface in CPD-I. In all cases the loop is longer than that observed in pancreatic CPs and is folded inwards, partially covering the access to the active site. CPE has a much longer insertion between residues 157 and 158 of CPD-II (Fig. 3) that coincides with an equivalent, albeit shorter, insertion in pancreatic CPA and CPB. Taken together, all these observations suggest that, within a general frame of specificity, regulatory CPs have developed variations in the structural determinants that lead to a selection of substrates far more sophisticated than the mere selectivity for C-terminal residues observed in the pancreatic enzymes. Work is in progress to test this hypothesis.

The information collected or derived in the present study might facilitate the understanding of the differential biological roles of regulatory CPs and the design of specific inhibitors for them. These would be interesting tools to experimentally analyze the properties and roles of these enzymes, and to produce lead compounds for drug design, given the potential biotechnological and biomedical interest in the modulation of their activities.

    ACKNOWLEDGEMENTS

The support provided by the Training and Mobility of Researchers/Access to Large Scale Facilities program to the EMBL Hamburg Outstation (reference ERBFMGECT980134) is gratefully acknowledged.

    FOOTNOTES

* This work was supported in part by Grants BIO98-0362, BIO2000-1659, PB98-1631, and 2FD97-0518 from the Ministerio de Educación y Cultura (Spain), by Grant 1999SGR-188 and the Centre de Referència en Biotecnologia (both from the Generalitat de Catalunya), by Grants DA-00194 and DK-51271 from the National Institutes of Health, and by the United States-Spain Science and Technology Collaborative Program, 1999. V. C. is a predoctoral fellow of the Universitat Autònoma de Barcelona, and P. A. is a postdoctoral fellow of the Ministerio de Educación y Cultura (Spain). The costs of publication of this article were defrayed in part by the payment of page charges. The article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

§ First author.

To whom correspondence may be addressed: Institut de Biologia Fonamental, Universitat Autònoma de Barcelona, 08193 Bellaterra, Barcelona, Spain. Tel.: 34-93-581-1315; Fax: 34-93-581-2011; E-mail: fx.aviles@blues.uab.es.

** To whom correspondence may be addressed: Dept. of Molecular Pharmacology, Albert Einstein College of Medicine, 1300 Morris Park Ave., Bronx, NY 10461. Tel.: 718-430-4225; Fax: 718-430-8954; E-mail: fricker@aecom.yu.edu.

Published, JBC Papers in Press, February 14, 2001, DOI 10.1074/jbc.M011457200

2 Unpublished information.

    ABBREVIATIONS

The abbreviations used are: CP(s), carboxypeptidase(s); GEMSA, guanidinoethylmercaptosuccinic acid; 3D, three-dimensional; RMSD(s), root mean square deviation(s).

    REFERENCES

1. Fricker, L. D. (1988) Annu. Rev. Physiol. 50, 309-321
2. Skidgel, R. A. (1988) Trends Pharmacol. Sci. 9, 299-304
3. Skidgel, R. A. (1996) in Zinc Metalloproteases in Health and Disease (Hooper, N. M., ed), pp. 241-283, Taylor and Francis, London
4. Rawlings, N. D., and Barrett, A. J. (1995) Methods Enzymol. 248, 183-228
5. Vendrell, J., Querol, E., and Aviles, F. X. (2000) Biochim. Biophys. Acta 1477, 284-298
6. Aviles, F. X., Vendrell, J., Guasch, A., Coll, M., and Huber, R. (1993) Eur. J. Biochem. 211, 381-389
7. He, G. P., Muise, A., Li, A. W., and Ro, H. S. (1995) Nature 378, 92-96
8. Song, L., and Fricker, L. D. (1995) J. Biol. Chem. 270, 25007-25013
9. Song, L., and Fricker, L. D. (1997) J. Biol. Chem. 272, 10543-10550
10. Xin, X., Day, R., Dong, W., Lei, Y., and Fricker, L. D. (1998) DNA Cell Biol. 17, 897-909
11. Lei, Y., Xin, X., Morgan, D., Pintar, J. E., and Fricker, L. D. (1999) DNA Cell Biol. 18, 175-185
12. Fricker, L. D. (1991) in Peptide Biosynthesis and Processing (Fricker, L. D., ed), pp. 199-230, CRC Press, Inc., Boca Raton, FL
13. Fricker, L. D., and Snyder, S. H. (1983) J. Biol. Chem. 258, 10950-10955
14. Fricker, L. D., Evans, C. J., Esch, F. S., and Herbert, E. (1986) Nature 323, 461-464
15. Fricker, L. D., Adelman, J. P., Douglass, J., Thompson, R. C., von Strandmann, R. P., and Hutton, J. (1989) Mol. Endocrinol. 3, 666-673
16. Rodríguez, C., Brayton, K. A., Brownstein, M., and Dixon, J. E. (1989) J. Biol. Chem. 264, 5988-5995
17. Manser, E., Fernández, D., Loo, L., Goh, P. Y., Monfries, C., Hall, C., and Lim, L. (1990) Biochem. J. 267, 517-525
18. Fan, X., and Nagle, G. T. (1996) DNA Cell Biol. 15, 937-945
19. Roth, W. W., Mackin, R. B., Spiess, J., Goodman, R. E., and Noe, B. D. (1991) Mol. Cell. Endocrinol. 78, 171-178
20. Parkinson, D. (1990) J. Biol. Chem. 265, 17101-17105
21. Greene, D., Das, B., and Fricker, L. D. (1992) Biochem. J. 285, 613-618
22. Naggert, J. K., Fricker, L. D., Varlamov, O., Nishina, P. M., Rouille, Y., Steiner, D. F., Carroll, R. J., Paigen, B. J., and Leiter, E. H. (1995) Nat. Genet. 10, 135-142
23. Fricker, L. D., Berman, Y. L., Leiter, E. H., and Devi, L. A. (1996) J. Biol. Chem. 271, 30619-30624
24. Kuroki, K., Eng, F., Ishikawa, T., Turck, C., Harada, F., and Ganem, D. (1995) J. Biol. Chem. 270, 15022-15028
25. Tan, F., Rehli, M., Krause, S. W., and Skidgel, R. A. (1997) Biochem. J. 327, 81-87
26. Xin, X., Varlamov, O., Day, R., Dong, W., Bridget, M. M., Leiter, E. H., and Fricker, L. D. (1997) DNA Cell Biol. 16, 897-905
27. Ishikawa, T., Murakami, K., Kido, Y., Ohnishi, S., Yazaki, Y., Harada, F., and Kuroki, K. (1998) Gene 215, 361-370
28. Settle, S. H., Jr., Green, M. M., and Burtis, K. C. (1995) Proc. Natl. Acad. Sci. U. S. A. 92, 9470-9474
29. Fan, X., Qian, Y., Fricker, L. D., Akalal, D. B. G., and Nagle, G. T. (1999) DNA Cell Biol. 18, 121-132
30. Novikova, E. G., Eng, F. J., Yan, L., Qian, Y., and Fricker, L. D. (1999) J. Biol. Chem. 274, 28887-28892
31. Eng, F. J., Novikova, E. G., Kuroki, K., Ganem, D., and Fricker, L. D. (1998) J. Biol. Chem. 273, 8382-8388
32. Gomis-Rüth, F. X., Companys, V., Qian, Y., Fricker, L. D., Vendrell, J., Aviles, F. X., and Coll, M. (1999) EMBO J. 18, 5817-5826
33. Leslie, A. G. W. (1991) in Crystallographic Computing V (Moras, D., Podjarny, A. D., and Thierry, J. C., eds), pp. 27-38, Oxford University Press, Oxford
34. Brünger, A. T., Adams, P. D., Clore, G. M., Delano, W. L., Gros, P., Grosse-Kunstleve, R. W., Jiang, J. S., Kuszewski, J., Nilges, M., Pannu, N. S., Read, R. J., Rice, L. M., Simonson, T., and Warren, G. L. (1998) Acta Crystallogr. Sect. D Biol. Crystallogr. 54, 905-921
35. Roussel, A., and Cambillau, C. (1989) Turbo-Frodo, Silicon Graphics Geometry Partners Directory, pp. 77-79, Silicon Graphics, Mountain View, CA
36. Feng, D. F., and Doolittle, R. F. (1987) J. Mol. Evol. 25, 351-360
37. Higgins, D. G., and Sharp, P. M. (1989) Comput. Appl. Biosci. 5, 151-153
38. Eddy, S. R. (1998) Bioinformatics 14, 755-763
39. Šali, A., and Blundell, T. L. (1993) J. Mol. Biol. 234, 779-815
40. Sippl, M. J. (1993) Proteins 17, 355-362
41. Aloy, P., Mas, J. M., Martí-Renom, M. A., Querol, E., Aviles, F. X., and Oliva, B. (2000) J. Comput. Aided Mol. Des. 14, 83-92
42. Orengo, C. A., Brown, N. P., and Taylor, W. R. (1992) Proteins 14, 139-167
43. Ptashne, M., and Gann, A. (1997) Nature 386, 569-577
44. Hourdou, M. L., Guinand, M., Vacheron, M. J., Michel, G., Denoroy, L., Duez, C., Englebert, S., Joris, B., Weber, G., and Ghuysen, J. M. (1993) Biochem. J. 292, 563-570
45. Coll, M., Guasch, A., Aviles, F. X., and Huber, R. (1991) EMBO J. 10, 1-9


Copyright © 2001 by The American Society for Biochemistry and Molecular Biology, Inc.