(Received for publication, November 14, 1994; and in revised form, December 30, 1994)
A full-length cDNA clone coding for human pancreatic
preprocarboxypeptidase A2 has been isolated from a λgt11 human
pancreatic library. Expression clones were identified by specific
interaction with antisera raised against the native protein. The open
reading frame of the polynucleotide sequence is 1254 base pairs in
length and encodes a protein of 417 amino acids. The encoded protein includes a
short signal peptide of 16 amino acids and a 94-amino acid-long
activation segment. The amino acid sequence shows 89% identity to that
of rat procarboxypeptidase A2, the only A2 form sequenced so far, and
64% identity to that of human procarboxypeptidase A1. The newly
determined sequence was modeled on the three-dimensional crystal
structures of both bovine carboxypeptidase A and porcine
procarboxypeptidase A1 by a novel distance geometry approach. Biases in
the modeling were avoided by relying exclusively on automatic
procedures and by using random structures as starting points.
Information taken from the known homologous structures refers only to
the backbone since no explicit data describing the conformation of side
chains were transferred. Ten structures of human carboxypeptidase A2
were determined on the basis of each of the two known crystal
structures. The root-mean-square distance for the backbone atoms
between the 10 structures and their mean for 237 selected residues is
0.7 Å when starting from the bovine protein and 0.8 Å for
251 selected residues when starting from the porcine protein. The
94-residue-long activation segment was also determined in the modeling
based on the porcine zymogen; its structure is well defined but not its
orientation with respect to the enzyme moiety. The model obtained for
human procarboxypeptidase A2 is discussed with respect to the
specificity and activation of the enzyme.
The traditional classification of pancreatic carboxypeptidases
(CPs) and their zymogens (procarboxypeptidases, pro-CPs)
into the A and B forms (1) has changed in recent years after
the identification of the A1 and A2 isoforms, first reported for the
rat proteins (2, 3). The A1 isoform is equivalent to
the forms previously known as A, of which the bovine (4) and
porcine (5, 6) species are good representatives, and
shows preference for aliphatic C-terminal residues of peptide
substrates. The A2 isoform selectively acts on the bulkier aromatic
C-terminal residues and has been characterized in depth only for the
enzyme form in the rat system (3, 7); however, no
information about the three-dimensional structure of the proenzyme is
available.
At the proenzyme level, these proteins show a higher degree of complexity owing to their oligomeric association with proenzymes of serine proteases. These associations preferentially involve the precursors of the A1 form, whereas the A2 and B forms are generally found in the monomeric state (1). This has been clearly demonstrated for the human species (8). In this system, it was also found that the A2 zymogen shows inhibition properties and an activation mechanism closer to those of the B forms than to those of the A1 form, which is surprising given that A2 is more closely related to A1 in both sequence and specificity.
Given the limited information about
procarboxypeptidases A2 (pro-CPA2), studies on this isoform from
species other than rat may help to confirm its differential character
and evolutionary pathway and to understand the molecular reasons for
its specific functional properties. With these aims in mind, we have
cloned and sequenced a full-length cDNA for A2 from a λgt11 human
pancreatic library, as reported in this paper.
A comparative analysis of the deduced amino acid sequence with those of other pro-CPs reveals amino acid substitutions that may well explain the properties of the human A2 zymogen and its active form. These sequence comparisons are further substantiated in the context of our earlier determinations of the primary structure of human pro-CPA1 (9) and of the three-dimensional structures of both porcine pro-CPA1 and pro-CPB (10, 11). Here, we extend this analysis by modeling the primary structure of human pro-CPA2 onto the known structures of bovine CPA (12, 13) and porcine pro-CPA1 (10), to which it shows 64 and 63% identity, respectively.
Comparison studies have shown that sequence identities exceeding about 40% between two proteins reliably indicate similar three-dimensional structures (14). Therefore, it is warranted to model a protein with a new sequence on the known fold of a homologous protein, provided that the sequence homology is significant. Two important questions arise with respect to the modeled structure. 1) How do the substituted amino acid residues fold within the frame of the known structure, and how do the conserved residues and the backbone fold adapt to the new packing requirements? 2) How accurate is the three-dimensional model of the new protein, in particular in regions with substituted residues or with deletions or insertions? The modeling procedure presented and used here avoids biases by relying exclusively on automatic procedures and on random structures as starting points. The information taken from a homologous structure is restricted to its backbone conformation. The local precision of the modeled structure is defined by the variations among the conformers resulting from repeated application of the procedure to different random start structures.
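The precision measure used throughout this paper, the root-mean-square distance of the backbone atoms between the conformers and their mean, can be illustrated with a minimal Python sketch. It assumes that the conformers have already been superimposed (the superposition step itself, e.g. a Kabsch fit, is omitted), that coordinates are plain (x, y, z) tuples in Å, and that the reported value is the average of the per-conformer RMSDs; the function names are hypothetical and do not correspond to DIANA or any other program used here.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def mean_structure(conformers: Sequence[Sequence[Coord]]) -> List[Coord]:
    """Atom-wise mean of conformers that are already superimposed."""
    n_conf = len(conformers)
    n_atoms = len(conformers[0])
    return [
        (
            sum(conf[i][0] for conf in conformers) / n_conf,
            sum(conf[i][1] for conf in conformers) / n_conf,
            sum(conf[i][2] for conf in conformers) / n_conf,
        )
        for i in range(n_atoms)
    ]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """Root-mean-square distance between two equally long atom lists."""
    total = sum(math.dist(p, q) ** 2 for p, q in zip(a, b))
    return math.sqrt(total / len(a))


def rmsd_to_mean(conformers: Sequence[Sequence[Coord]]) -> float:
    """Average RMSD of each conformer from the mean structure."""
    mean = mean_structure(conformers)
    return sum(rmsd(conf, mean) for conf in conformers) / len(conformers)


# Toy example: two conformers of a two-atom "backbone", offset by 2 Å along x.
conf_a = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
conf_b = [(2.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
print(rmsd_to_mean([conf_a, conf_b]))  # 1.0
```

Restricting the atom lists to the backbone atoms of a selected set of residues gives a number of the kind quoted in the abstract; which residues to include is a separate choice not covered by this sketch.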
The two lists with distance and dihedral angle constraints and the new sequences were used as input for the distance geometry program DIANA (19). This program, which operates in dihedral angle space and thus keeps all bond lengths and bond angles at their ideal values, first creates a structure with a random conformation from the given sequence. This conformation is then modified by varying all dihedral angles in an attempt to satisfy all constraints from the input lists as well as the requirements of steric repulsion. DIANA was applied to 50 different random start structures using the standard protocol and one REDAC cycle (20). At the end, 3000 iterations of optimization with all distance constraints were added. During this last DIANA step, no dihedral angle constraints were enforced, which should allow local rearrangements that accommodate the new packing requirements caused by the side chain substitutions. For the description of the newly modeled structure, the ten DIANA conformers that converged best, i.e. those with the lowest residual constraint violations, were used.
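The final selection step, keeping the conformers with the lowest residual constraint violations, can be sketched as follows. This is a simplified illustration under stated assumptions: only upper distance limits are scored, each conformer is a hypothetical mapping from atom labels to coordinates, and conformers are ranked by the sum of their violations. DIANA's own target function also includes van der Waals and dihedral angle terms, which are omitted here.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]
Conformer = Dict[str, Coord]          # hypothetical: atom label -> coordinates (Å)
UpperLimit = Tuple[str, str, float]   # (atom label i, atom label j, upper limit in Å)


def upper_violations(conf: Conformer, limits: Sequence[UpperLimit]) -> List[float]:
    """Amount (Å) by which each upper distance limit is exceeded; 0.0 if satisfied."""
    return [max(0.0, math.dist(conf[i], conf[j]) - ub) for i, j, ub in limits]


def select_best(conformers: Sequence[Conformer],
                limits: Sequence[UpperLimit],
                n_best: int = 10) -> List[int]:
    """Indices of the n_best conformers with the lowest summed violation."""
    scores = [sum(upper_violations(c, limits)) for c in conformers]
    return sorted(range(len(conformers)), key=lambda k: scores[k])[:n_best]


# Toy example: one 5.0 Å upper limit between atoms "A" and "B".
limits = [("A", "B", 5.0)]
conformers = [
    {"A": (0.0, 0.0, 0.0), "B": (6.0, 0.0, 0.0)},  # violated by 1.0 Å
    {"A": (0.0, 0.0, 0.0), "B": (4.0, 0.0, 0.0)},  # satisfied
    {"A": (0.0, 0.0, 0.0), "B": (5.5, 0.0, 0.0)},  # violated by 0.5 Å
]
print(select_best(conformers, limits, n_best=2))  # [1, 2]
```

Ranking by the summed violation is only one possible criterion; ranking by the final value of a full target function would be closer to what a distance geometry program reports.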
After two rounds of rescreening, only six
hybridization-positive clones were independently isolated from 10 recombinant phages; two of these clones contained large inserts.
Their cDNAs were isolated on a preparative scale, and their inserts
were shown to be 1331 bp long by digestion with EcoRI restriction endonuclease followed by agarose gel
electrophoresis. One of these large cDNA inserts was subcloned into the EcoRI site of the pUC9 vector and sequenced entirely on both
strands. A simple comparison of the amino acid sequence deduced for this
clone with the sequences of other pancreatic pro-CPs indicates that
this cDNA insert contains the full-length coding sequence of human pre-pro-CPA2. This
was further confirmed by data base screening with the FASTA
program (21). The nucleotide sequence and the corresponding
amino acid sequence of the protein are shown in Fig. 1.
Figure 1: Nucleotide sequence and corresponding amino acid sequence of human pancreatic preprocarboxypeptidase A2. The signal and activation regions extend from residues -110 to -95 and from residues -94 to -1, respectively. The processing sites for the pre- and propeptides are indicated by arrows. The probable polyadenylation signal ATTAAA is underlined. Amino acids are labeled according to the numbering scheme of rat pro-CPA1 (9). Note the two deletions with respect to pro-CPA1 at positions 6 and 57.
The analyzed full-length cDNA insert contained standard 5′- and 3′-flanking regions, a poly(A) tail, and an open reading frame of 1254 bp coding for 417 amino acids. The size of the coding region coincides with that of the homologous A2 isoform from rat (3), the only species in which this form had been sequenced until now. The 5′-flanking region of the human full-length cDNA insert was only 4 bp in length, and the 3′-flanking region had a 48-nucleotide segment containing the consensus polyadenylation signal sequence ATTAAA located 18 nucleotides upstream of the poly(A) tail, which is at least 25 nucleotides in length.
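The reading-frame arithmetic in this paragraph can be checked with a few lines of Python: 417 codons plus a stop codon account for the 1254-bp open reading frame. The second helper locates a polyadenylation signal and measures its distance to the poly(A) tail; it assumes that the tail is the run of A residues at the 3′ end and that the distance is counted from the last nucleotide of the signal, a convention not specified in the text. The example sequence is a made-up toy, not the human pro-CPA2 cDNA, and both function names are hypothetical.

```python
from typing import Optional, Tuple


def orf_length_bp(n_amino_acids: int, include_stop: bool = True) -> int:
    """Open reading frame length in bp: three nucleotides per codon."""
    return 3 * (n_amino_acids + (1 if include_stop else 0))


def polya_signal_offset(utr3: str, signal: str = "ATTAAA") -> Optional[Tuple[int, int]]:
    """Return (start of the last signal, nt between signal end and poly(A) tail), or None."""
    tail_start = len(utr3.rstrip("A"))       # poly(A) tail = trailing run of A
    pos = utr3.rfind(signal, 0, tail_start)  # last signal lying fully upstream of the tail
    if pos == -1:
        return None
    return pos, tail_start - (pos + len(signal))


print(orf_length_bp(417))  # 1254

# Toy 3' region: signal, 18-nt spacer, 25-nt poly(A) tail (not the real sequence).
toy = "GC" + "ATTAAA" + "C" * 18 + "A" * 25
print(polya_signal_offset(toy))  # (2, 18)
```

Because the tail is defined simply as the trailing run of A residues, a signal that directly abutted the tail would have its final A residues absorbed into the tail by this rule; that case does not arise for an 18-nucleotide spacer.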
The sequenced human pre-pro-CPA2 cDNA encodes a 16-amino acid signal peptide at its N terminus (residues -110 to -95 in Fig. 1), which is presumably cleaved during biosynthesis of the inactive zymogen. This is deduced by comparison with the N-terminal sequences of human pro-CPs previously determined by one of our groups (8). It is worth stressing that, in contrast to most eukaryotic signal peptides (22), the cleavage site is at a cysteine residue. Comparison with the sequences of the active human enzymes (8) also allows us to deduce that the proteolytic activation of the human A2 zymogen occurs by tryptic cleavage of a 94-residue N-terminal fragment, generating a 307-amino acid-long enzyme (residues -94 to -1 and 1 to 309 with two gaps, respectively, in Fig. 1).
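The residue bookkeeping in this paragraph is internally consistent, as a short check shows: the signal peptide (-110 to -95), the activation segment (-94 to -1), and the enzyme (1 to 309 minus the two deleted positions 6 and 57) add up to the 417 residues encoded by the open reading frame. The helper below is a hypothetical illustration that simply counts positions in an inclusive numbered range and skips listed gap positions.

```python
from typing import Iterable


def residues_in_range(first: int, last: int, gaps: Iterable[int] = ()) -> int:
    """Count residues in an inclusive numbered range, skipping deleted positions."""
    gap_set = set(gaps)
    return sum(1 for pos in range(first, last + 1) if pos not in gap_set)


signal = residues_in_range(-110, -95)
activation = residues_in_range(-94, -1)
enzyme = residues_in_range(1, 309, gaps=(6, 57))
print(signal, activation, enzyme, signal + activation + enzyme)  # 16 94 307 417
```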
Figure 2: Comparison of the amino acid sequence of human pro-CPA2 with those of bovine pro-CPA, rat and human pro-CPA1, human mast cell pro-CPA, human pro-CPB, and rat pro-CPA2. The numbering of the carboxypeptidase moiety follows that of the bovine A enzyme (4). Residues in the activation segments are written in italics and are preceded by an A; their numbering results from an alignment based on maximal coincidence of secondary structure elements, except in the C-terminal regions, where the alignment is also based on maximal point identities. The reference sequence in the latter case is that of porcine pro-CPB, and each insertion or deletion is considered to occupy one position (see Ref. 11). The actual length of the activation segments is 94 residues for all the A forms and 95 for the B forms. For the other proteins, only the amino acids that differ from those of the bovine pro-CPA sequence are shown. Dashes represent amino acid deletions. Open circles above the alignments identify the two deletions in the enzyme moiety of the A2 forms. Asterisks are placed over the functionally important residues discussed in the text. The arrow indicates the site of the primary tryptic activation cleavage.
The
alignments shown in Fig. 2 confirm that the overall sequence
homology in the activation segment regions is substantially lower than
that in the enzyme moieties. When the human A2 activation segment is
compared with the corresponding bovine A, rat A1, human A1, human B,
rat A2, and human mast cell A activation segments, the following
identity scores are obtained, respectively: 55, 51, 56, 23, 82, and
22%. These data confirm that the human A2 form is closer to the A1 form
than to the B form. In particular, human A2 shows an important deletion
at relative positions 45-48 in the activation segment, like the
porcine A1 form; in the three-dimensional structure of pro-CPB (11), this region is folded as a 3₁₀ helix. The functional importance of these and other changes
will be discussed below. These sequence comparisons also support the
notion that mast cell pro-CPA is closer to pancreatic pro-CPB than to
pancreatic pro-CPAs.
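Identity scores like those listed above depend on how gaps are treated. As a hedged illustration, the sketch below counts identical residues over the alignment columns in which both sequences have a residue; this normalization is an assumption, since the convention used for the percentages is not stated, and other choices (e.g. dividing by the length of the shorter sequence) give somewhat different numbers. The toy sequences are invented.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identical residues over ungapped columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not columns:
        return 0.0
    identical = sum(1 for x, y in columns if x == y)
    return 100.0 * identical / len(columns)


# Toy alignment with one gap and one mismatch (not real pro-CP sequences).
print(percent_identity("ACDE-G", "ACNEFG"))  # 80.0
```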
The
list of upper distance constraints used for human CPA2 contains 4210
entries, and the one for human pro-CPA2, 5563 entries. These numbers
include three constraints for the enforcement of each of the two
disulfide bridges between cysteines 138 and 161, and cysteines 210 and
244. The former disulfide bridge is conserved in all CPAs, while the
latter is observed only in the A2 forms. The corresponding residues in
the crystal structures of the A1 forms are, however, separated by only
a short distance (e.g. the distance from Thr-210 Cβ to Ile-244 Cβ in
bovine CPA is 3.8 Å). A few additional distance constraints
enforce the interaction of the zinc ion with histidines 69 and 196.
Again, these two histidines are part of sequence fragments that are
conserved in all CPAs considered here. The calculations with DIANA
yielded for each constraint list 10 conformers with all residual
violations of the distance and the van der Waals' constraints
below 1.2 Å (with the exception of one van der Waals'
constraint in one conformer) indicating that the human CPA2 sequence
can assume the overall chain fold of bovine CPA and of porcine
pro-CPA1. No upper distance constraint from the 4210-entry list, one
upper distance constraint from the 5563-entry list, and exactly one
van der Waals' constraint were violated by more than 0.5 Å
in more than three conformers. Thus, no serious inconsistency of the
distance constraints with the altered packing requirements due to the
new sequence occurred.
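The consistency test applied in this paragraph, counting constraints violated by more than 0.5 Å in more than three of the ten conformers, can be written as a small filter. The sketch assumes a hypothetical violation table with one row per conformer and one column per constraint (values in Å); it illustrates the criterion only and is not code from the DIANA package.

```python
from typing import List, Sequence


def consistently_violated(violations: Sequence[Sequence[float]],
                          cutoff: float = 0.5,
                          max_conformers: int = 3) -> List[int]:
    """Indices of constraints violated by more than `cutoff` Å in more than
    `max_conformers` conformers; violations[k][c] refers to conformer k, constraint c."""
    n_constraints = len(violations[0])
    flagged = []
    for c in range(n_constraints):
        n_violating = sum(1 for row in violations if row[c] > cutoff)
        if n_violating > max_conformers:
            flagged.append(c)
    return flagged


# Toy table: 5 conformers x 2 constraints.
table = [
    [0.6, 0.6],
    [0.6, 0.6],
    [0.6, 0.6],
    [0.6, 0.0],
    [0.0, 0.0],
]
print(consistently_violated(table))  # [0]
```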
The backbone superposition of human pro-CPA2 and porcine
pro-CPA1 in Fig. 3 shows that backbone deviations either occur
in surface loops or are due to a different position of the activation
domain relative to the enzyme but that the interior with the binding
pocket is very similar. Thus the 153 side-chain replacements and the
two deletions in human pro-CPA2 with respect to porcine pro-CPA1
require no significant change of the backbone structure. The residues
involved in zinc binding and catalysis (His69, Glu72, Arg127, His196, and
Glu270) and in substrate anchoring and positioning (Arg71, Asn144, Arg145,
and Tyr248) are conserved among all CPAs, and therefore no
significant structural difference occurs between the model and the
crystal structure at these positions. Among the other residues of importance for substrate
binding, such as Arg, Ser, Tyr, Ser, Met, Ile, Ile, Ala, Gly, Ser, Ile,
Asp, Ala, and Phe, which are
responsible for the enzyme specificity, only three are not conserved between
human pro-CPA2 and porcine pro-CPA1 or bovine pro-CPA. Met203
of human pro-CPA2 corresponds to a Leu in both A1 forms,
Ser254 to Val in porcine pro-CPA1, and Ala268 to Ser in porcine pro-CPA1
and to Thr in bovine pro-CPA. In
the model structure of human pro-CPA2, all three residues are well defined
and next to each other (Fig. 4). Compared with the A1 forms, the
flexible Met and the small Ser and Ala present a binding surface more
capable of accommodating a bulky side chain from the substrate. The
presence of an Ile residue at position 255 in the specificity site (7) confirms that the A types share this characteristic, in
contrast to the Asp residue found at this position in the B types. This
residue, located in the center of the binding pocket of the catalytic
domain, seems to be a critical determinant for the different
specificities for hydrophobic or for charged substrate residues.
Figure 3:
Stereo view of the human pro-CPA2 model
and the porcine pro-CPA1 crystal structure. The Cα traces of both structures are shown after superposition of the
backbone of the enzyme part. The enzyme part of the model is drawn with
a heavy line; its activation domain and the entire
porcine pro-CPA1 are drawn with thin lines.
Figure 4: Stereo view of the binding pocket of human pro-CPA2 and porcine pro-CPA1. Shown are the backbone fragments 61-68, 141-145, 188-206, 236-257, and 261-271 of both molecules with thin lines and the side chains of residues 203, 254, and 268 (Met, Ser, and Ala in human pro-CPA2; Leu, Val, and Ser in porcine pro-CPA1) with heavy lines.
As mentioned above, the major structural difference between human pro-CPA2 and porcine pro-CPA1 in the activation segment is its imprecisely defined position with respect to the enzyme part. The largest difference occurs for the two helices on the outer face of the activation domain (Fig. 3). Also, the connecting helix between the activation and enzyme domains is not very well defined. Nevertheless, superposition of the activation domains of the two structures (not shown) indicates that all important side chains coincide within the structural spread of the human pro-CPA2 model.
The results shown here demonstrate that a pro-CPA2 form is encoded in the human genome. The abundance of its cDNA in a pancreatic library and the coincidence of its derived N-terminal sequence with that of a major procarboxypeptidase form previously isolated from the human pancreas (8) confirm that pro-CPA2 is abundantly expressed in this organ in humans. Similar reasoning applies to the human A1 form previously cloned by our laboratories (9). Taken together, these findings confirm the earlier assignment of the zymogens (B2, B1, A2, A1, and the A1 binary complex) separated by high performance liquid chromatography on DEAE columns, which was based on their different biochemical properties (8).
The higher sequence
identity found between human pro-CPA2 and rat pro-CPA2 (89%),
as compared with that between human pro-CPA2 and human pro-CPA1 (64%), is
in agreement with the proposal of Gardell et al. (3) that the two isoforms arose by gene
duplication before the speciation of mammals. A careful alignment of these
sequences reveals several features that support the differentiation of
A2 from A1, such as the occurrence of Cys210 and Cys244 in the former;
these residues form a disulfide bond in the rat A2 form (7) and are absent
in the A1 forms. This disulfide bond is important because it may affect the
conformation and dynamics of the active-site surface loop extending
from residue 242 to 263. Other distinguishing structural features are the
deletions at positions 6 and 57 in human pro-CPA2, also evident in the
rat counterpart.
The occurrence of two carboxypeptidase A isoforms has been described only in the human and rat species. The lack of reports of A2 in the bovine system (the most studied one) does not prove its absence, even taking into account that bovine CPA has a specificity midway between those of the A1 and A2 isoforms. Most likely, its apparent absence is due to the inability of the separation procedures used to resolve the different forms at the protein level. Resolving this question will probably require more efficient separation methods or molecular genetic approaches applied to vertebrate species other than bovine.
The
comparison between human carboxypeptidases A2 and A1, rat A2, and
bovine A forms (the latter two of known three-dimensional structures)
clearly shows the conservation of the residues that are important
for catalysis and for the delineation of the binding site cavity for
small substrates or inhibitors (such as Gly-Tyr). Thus, the
Zn ligands (His69, Glu72, His196) and the residues involved in catalysis
(Glu270, Arg127) and in substrate anchoring and
positioning (Arg, Arg, Asn, Arg) are present in all of these molecules. Also,
the residues that form the surface loop and close the active site
(Ile, Tyr, Ala) are always present. Differences may be expected at the
residues forming the specificity pocket. The replacement of the residues at
positions 203, 253, 254, or 268 of the A1 forms by residues with smaller
side chains in human pro-CPA2 should enlarge the specificity cavity and
facilitate the recognition of substrates with bulkier aromatic
residues, one of the characteristics of the A2 function. From this
point of view, the earlier hypothesis on the specificity basis of
the rat A2 form (3) is fully supported by the human counterpart.
Sequence comparisons between the activation segment regions of
different pro-CPs provide clues to their inhibition mechanisms and
activation processes (1). These comparisons are shown in Fig. 5 for the human A2 form and the porcine A and B forms, of
known tertiary structure (10, 11). It is worth noting first that in the porcine
proenzymes these regions are folded as an α/β open sandwich formed by two
α-helices packed on one side of a four-stranded β-sheet plus two reverse
turns; an extra 3₁₀ helix is packed at the side of the two α-helices in
the B form. When the three sequences are aligned
on the basis of both secondary structure and residue conservation, it
is found that the human A2 form exhibits a 56 and 24% homology with the
A and B porcine forms, respectively.
Figure 5: Primary and secondary structure comparison between the activation segments (As) of porcine pancreatic pro-CPs A1 and B (10, 11) and human A2. The numbering system adopted is explained in Fig. 2. The actual length of the activation segments is 94 residues for the A forms and 95 for the B forms. The boxes mark the limits of the secondary structure elements observed, which are identified below the sequences.
Residue conservation at the
activation segment regions of pro-CPs is therefore smaller than that
observed among the enzymes themselves. However, the substitutions do
not modify the secondary structure propensities of those regions, as
shown by sequence-based prediction and by homology modeling. As
depicted in Fig. 5, the deletion of residues 45-48 with
respect to the B form implies the loss of the 3₁₀ helix in A2,
as is also the case in porcine A1. Given that in pro-CPB this region
has been assigned the role of keeping the proenzyme inactive
toward small substrates (10), we may expect a
change in this property in human A2. Indeed, this has been shown by
kinetic analysis with Bz-Gly-L-Phe, which indicated that human
pro-CPA2 presents a significant residual activity against this
substrate (not shown).
An apparent contradiction arises at the C terminus of the activation segment (residues -18 to -1 in Fig. 1, or residues 84-101 according to the alignment of Fig. 5). This region connects the activation domain to the
enzyme moiety and is of prime importance in the proteolytic activation
mechanism of pro-CPs (1, 24). The alignments of
sequence and secondary structure elements (both predicted from the
sequence and observed in the model for human pro-CPA2) indicate the
presence of a long α-helix in this region. However, the activation
process of human pro-CPA2 (8) is neither bimodal nor slower
than that of human pro-CPB, unlike that of the A1 forms, in which this region is
structured as a long helix (1, 10). In
the present case, it seems that the long helix does not clamp the
activation segment of human pro-CPA2 onto the active enzyme after
being severed from it, as suggested for porcine pro-CPA1 (10).
A more detailed analysis of the three-dimensional structure of this
region may help explain this behavior.
The nucleotide sequence reported in this paper has been submitted to the GenBank(TM)/EMBL Data Bank with accession number U19977.