(Received for publication, October 4, 1996, and in revised form, December 20, 1996)
From the Department of Internal Medicine, Division of Respiratory, Critical Care, and Occupational Medicine, University of Utah Health Sciences Center and Veterans Administration Medical Center, Salt Lake City, Utah 84132
Dipeptidyl-peptidase I, a lysosomal cysteine
proteinase, is important in intracellular degradation of proteins and
appears to be a central coordinator for activation of many serine
proteinases in immune/inflammatory cells. Little is known about the
molecular genetics of the enzyme. In the present investigation the gene for dipeptidyl-peptidase I was cloned and characterized. The gene spans
approximately 3.5 kilobases and consists of two exons and one intron.
The genomic organization is distinct from the complex structures of the
other members of the papain-type cysteine proteinase family. By
fluorescence in situ hybridization, the gene was mapped to
chromosomal region 11q14.1-q14.3. Analysis of the sequenced 5′-flanking
region revealed no classical TATA or CCAAT box in the GC-rich region
upstream of the cap site. A number of possible regulatory elements that
could account for tissue-specific expression were identified. Northern
analyses demonstrated that the dipeptidyl-peptidase I message is
expressed at high levels in lung, kidney, and placenta, at moderate to
low levels in many organs, and at barely detectable levels in the
brain, suggesting tissue-specific regulation. Among immune/inflammatory
cells, the message is expressed at high levels in polymorphonuclear
leukocytes and alveolar macrophages and their precursor cells.
Treatment of lymphocytes with interleukin-2 resulted in a significant
increase in dipeptidyl-peptidase I mRNA levels, suggesting that
this gene is subjected to transcriptional regulation. The results
provide initial insights into the molecular basis for the regulation of
human dipeptidyl-peptidase I.
Granule-associated serine proteinases are major constituents of polymorphonuclear leukocytes (PMNL), cytotoxic lymphocytes, and mast cells, accounting for up to 30% of the cellular protein in these cells. They are involved in many physiologic and pathologic processes. Unlike the more extensively characterized serine proteinases such as trypsin, chymotrypsin, or pancreatic elastase that are stored as inactive zymogens in the secretory vesicles of cells and activated only after secretion into the intestinal lumen, granule-associated serine proteinases of leukocytes and mast cells are stored as fully active enzymes. Nevertheless, based on their known cDNA sequences, these immune/inflammatory cell proteinases are initially translated as zymogens and then processed in several steps including the cleavage of signal peptides and the subsequent removal of short propeptides that typically consist of two amino acid residues (1, 2). The cleavage of the propeptides is unusual in that it occurs at an acidic residue in contrast to most proteinase zymogens that are processed at a basic or, rarely, an aromatic residue. Thus, a major mechanism of control of leukocyte or mast cell granule-associated serine proteinases occurs at the level of dipeptidase activation.
Dipeptidyl-peptidase I (DPP-I, EC 3.4.14.1), a cysteine proteinase, was
recently demonstrated to play a requisite role in removing the
activation dipeptide from many of the leukocyte and mast cell
granule-associated proteinases including human cathepsin G, leukocyte
elastase, mast cell chymase and tryptase, and lymphocyte granzymes B
and H (3-5). DPP-I, originally called cathepsin C, was discovered when
extracts of kidney were found to catalyze the hydrolysis of
Gly-Phe-β-naphthylamide (6). It is a lysosomal enzyme widely
expressed in many tissues and is thought to be important in intracellular
degradation of proteins. The enzyme was purified from human spleen and
characterized as a glycoprotein with a pI of 5.4, a molecular mass of
200 kDa as determined by gel filtration under non-denaturing
conditions, and a subunit size of 24 kDa (7).
The strong circumstantial evidence that DPP-I is the central
coordinator for activation of many serine proteinases contained in
immune/inflammatory cells and that it is differentially expressed in
human tissues (8) emphasizes the need for in-depth studies to define
factors regulating its expression. A human and a rat DPP-I cDNA
have recently been cloned (9, 10), but the reported sequences contained
only a portion of the 5′-untranslated regions (UTRs). Moreover,
information on gene expression and regulation has not been reported. In
the present investigation we describe the structure, localization, and
expression of the gene for DPP-I. We also demonstrate its regulation in
cytokine-stimulated lymphocytes.
Maximum strength Nytran membranes were from
Schleicher & Schuell, Inc. Multiple tissue Northern blots, ExpressHyb
hybridization solution, human spleen total RNA, and Marathon cDNA
amplification kits were from CLONTECH. MicroFastTrack kits, cDNA
cycle kits, and TA Cloning kits were from Invitrogen. LA PCR kits were
from Panvera. TRI REAGENT was from Molecular Research Center Inc.
[γ-32P]ATP (3000 Ci/mmol) and
[α-32P]dCTP (3000 Ci/mmol) were from Amersham Life
Science, Inc. Sequenase DNA sequencing kits were from U. S.
Biochemical Corp. RPMI 1640, minimum Eagle's medium, nonessential
amino acids, and Cot1 DNA were from Life Technologies, Inc. Defined
fetal bovine serum was from Hyclone Laboratories (Logan, UT). Hybrisol
VI was from Oncor Inc. Actinomycin D, cycloheximide, and other
chemicals not specifically mentioned were high quality grade from
Sigma.
Poly(A)+ RNA was
isolated from human spleen using a MicroFastTrack kit. The first strand
cDNA was synthesized with a combination of oligo(dT) and random
primers. A portion of the cDNA was amplified using the PCR primers
that represent the 5′ (743-760 nt) and 3′ (complementary to 1429-1446
nt) termini for the rat mature protein coding region (10). The PCR
product represented 756-1455 nt of human DPP-I cDNA and was used
as a probe to screen a human genomic PAC (P1
bacteriophage-derived artificial chromosome)
library (GenomeSystems, Inc.). Phage DNA from the PAC clones was
purified by alkali lysis, and the insert was released with
NotI followed by digestion with EcoRI. The
digested DNA was analyzed by Southern blot hybridization with the human
DPP-I cDNA probe used in the library screening, and the relevant
DNA fragments were purified from the gel using a Prep-A-Gene kit. To
determine the exon and intron organization, fragments of genomic DNA
were amplified by PCR and sequenced. To locate intron sequences, the
following oligonucleotide primer pairs were selected to amplify
overlapping regions spanning the entire length of the human DPP-I
cDNA: 1) 13-37 nt and complementary to 855-878 nt, 2) 855-872 nt
and complementary to 1438-1455 nt, and 3) 1385-1408 nt and
complementary to 1661-1684 nt. The primer pairs used to obtain the
5′-flanking sequence and identify the polyadenylation site were 1) T7
sequencing primer and complementary to 139-162 nt and 2) 1621-1641 nt
and SP6 sequencing primer.
To obtain data on the intron size and splice junction site, long and accurate PCR was performed to amplify the fragments of genomic DNA using a GeneAmp™ PCR System 9600. The PCR amplification reaction consisted of an initial denaturation at 94 °C for 1 min, followed by 30 cycles of denaturation at 94 °C for 15 s, annealing at 62 °C for 15 s, and extension at 72 °C for 2 min. Each PCR product was analyzed on an agarose gel, directly subcloned into the pCR™II vector using the TA cloning kit, and sequenced.
Chromosomal Assignment of Human DPP-I Gene
The genomic plasmid clone was used as a probe for chromosomal localization of DPP-I by fluorescence in situ hybridization. The genomic clone was nick translation-labeled with biotin, hybridized to metaphase chromosomes, and detected with Cy3-conjugated streptavidin. Human metaphase chromosome spreads were prepared by standard procedures and G-banded after trypsin treatment and Wright's staining. Hybridization and detection conditions on metaphase chromosomes were performed as described previously (11). Briefly, the G-banded preparations were destained with a fixative containing methanol and glacial acetic acid (3:1), dehydrated by successive washings in 70, 90, and 100% ice-cold ethanol, and dried at 37 °C. Probe signals were detected with Cy3 conjugate viewed through a triple pass filter using an epifluorescence microscope. The fluorescence in situ hybridization image was overlaid on the G-banded metaphase image to localize the gene.
Determination of Transcription Site
To determine the
transcription start site, anchored PCR was performed. Briefly, the
first strand cDNA was synthesized at 50 °C using spleen
total RNA (1 µg) and a gene-specific primer that was complementary to
855-878 nt. The 5′-end of purified cDNA was (dA)-tailed with
terminal deoxynucleotidyltransferase and anchored. The second
strand cDNA was synthesized using oligo(dT) containing a 3′ rapid
amplification of cDNA ends adapter primer. After purification, the
double-stranded cDNA was amplified by PCR using a primer pair of an
abridged universal amplification primer (Life Technologies, Inc.) and a
gene-specific primer complementary to 828-851 nt. The PCR product was
re-amplified with an abridged universal amplification primer and a
nested gene-specific primer complementary to 751-774 nt. The PCR
product was analyzed on an agarose gel, subcloned into
the pCR™II vector, and sequenced.
Primer extension was performed to confirm the transcription start site.
Briefly, the 18-nt primer complementary to 17-34 nt was end-labeled
with [γ-32P]ATP using T4 polynucleotide kinase. An
annealing reaction was carried out with 100 fmol of labeled primer and
25 µg of human spleen total RNA at 58 °C for 20 min. To this
reaction, avian myeloblastosis virus reverse transcriptase (1 unit) was
added to a 20-µl final volume containing 1 mM dNTPs, 50 mM Tris, pH 8.3, 50 mM KCl, 10 mM
MgCl2, 10 mM dithiothreitol, and 0.5 mM spermidine. The extension reaction was carried out at
42 °C for 30 min and terminated by adding 20 µl of loading dye.
The primer extension samples were boiled for 10 min and loaded onto a
6% sequencing gel. After electrophoresis, the gel was dried and
developed.
Multitissue blots were used to
determine expression of the DPP-I gene within human tissues. For
studies of gene expression in immune/inflammatory cells, U937, PLB 985, or HL-60 cells were grown as described previously (12). Studies of gene
regulation were conducted in IL-2-stimulated lymphocytes. Experiments
were performed on cells having viabilities of >95% as judged by
trypan blue exclusion. Cells were changed to fresh medium before the exposure to possible agonists/modulators at concentrations specified for the indicated periods of time. The cells were harvested at the end
of each time period and kept frozen at -80 °C until further analysis. Actinomycin D stock (5 mg/ml) was prepared in 95% (v/v) ethanol. Cycloheximide stock (10 mg/ml) was prepared in
phosphate-buffered saline.
Total RNA was prepared by the acid guanidinium thiocyanate
phenol/chloroform method (13) and quantified by measuring absorbance at
260 nm. RNA (10 µg/lane) was size-fractionated on 1% agarose, 0.4 M formaldehyde gels containing formamide and transferred to Nytran membranes by capillary action. The RNA was cross-linked to the
membrane by exposure to UV light, prehybridized at 68 °C for 30 min,
and hybridized with an [α-32P]dCTP-labeled probe at
68 °C for 1 h. After several low stringency washes, the blot
was washed twice at high stringency and developed.
Overall, the human DPP-I
gene spans 3.5 kb and contains two exons separated by 1645 nt of
intronic DNA (Fig. 1). The first exon comprises the
5′-UTR followed by 889 nt that encode the signal peptide, propeptide,
and partial mature protein region. The second exon contains the
remainder of the mature protein-coding sequence of 501 nt, the stop
codon, and the 3′-UTR including a polyadenylation signal. The location
of the intron was confirmed by sequence analysis. As shown in Fig. 1,
the exon-intron boundaries conform to classical splice donor and
acceptor consensus sequences (14). The single intron splice site
occurred between nucleotides 952 and 953, the first and second
nucleotides of the codon for Gly297 in the cDNA
sequence, which was indicative of a phase 1 intron. The exon sequence
agrees with that determined for the cDNA, indicating that the
cDNA obtained by reverse transcription PCR is free of PCR
artifacts.
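The stated intron phase can be cross-checked with simple coordinate arithmetic. The sketch below assumes that cDNA numbering begins at the transcription initiation site, so that the A of the initiator ATG is nucleotide 64 (63 nt downstream of position 1). The function names are illustrative and do not come from the original work.

```python
ATG_POSITION = 64  # assumed cDNA position of the A in the initiator ATG


def codon_of(nt_position, atg_position=ATG_POSITION):
    """Return (codon number, position within the codon, 1-3) for a cDNA nucleotide."""
    offset = nt_position - atg_position
    if offset < 0:
        raise ValueError("position lies in the 5'-UTR")
    return offset // 3 + 1, offset % 3 + 1


def intron_phase(last_exon_nt, atg_position=ATG_POSITION):
    """Phase of an intron that follows `last_exon_nt`.

    Phase 0: between codons; phase 1: after the first nucleotide of a codon;
    phase 2: after the second nucleotide.
    """
    _, pos_in_codon = codon_of(last_exon_nt, atg_position)
    return pos_in_codon % 3


splice_after = 952  # last nucleotide of exon 1 in the cDNA

print(codon_of(splice_after))           # (297, 1)
print(codon_of(splice_after + 1))       # (297, 2)
print(intron_phase(splice_after))       # 1
print(splice_after - ATG_POSITION + 1)  # 889 coding nucleotides in exon 1
```

Under this numbering, nucleotides 952 and 953 are the first and second positions of codon 297, which corresponds to a phase 1 intron and to 889 coding nucleotides in the first exon.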
The cDNA is similar in size and composition to that recently
reported for human ileum DPP-I (9) but contains an additional 30 nt of
the 5′-UTR, including the transcription initiation site (see below).
The DPP-I cDNA sequence spans 1888 nucleotides and includes a
1392-nt open reading frame that encodes 463 amino acids. This
includes a 24-amino acid residue signal peptide, a 206-amino acid
residue propeptide region, and a 233-amino acid residue mature enzyme.
The coding region of human DPP-I shows 78% identity to the rat at
nucleotide and amino acid levels. The mature protein shows 88%
identity at nucleotide and amino acid levels with the respective rat
sequences. The cDNA sequence differs from that reported for the
ileum at nucleotides 276 (C → G, Leu73 unchanged) and
1440 (G → A, Pro459 unchanged) in the protein coding
region and at nucleotides 1461 (C → G), 1503 (A → G), and 1861 (G → A) in the 3′-UTR.
In addition, there is a five-nucleotide (ACTGC)
deletion immediately 5′ to the poly(A)+ tail in the spleen
cDNA when compared with that of the ileum. These five nucleotides
(ACTGC) preceding the poly(A)+ tail reported for human
ileum DPP-I by Paris et al. (9) are identical to those we
observed in the genomic sequence. The basis for this difference is not
presently known but may result from the existence of limited genetic
variability.
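The size figures quoted for the open reading frame are mutually consistent, as a short calculation shows. This sketch uses only numbers from the text, plus the assumption (not stated in the text) that the 1392-nt open reading frame includes the stop codon.

```python
SIGNAL_PEPTIDE_AA = 24
PROPEPTIDE_AA = 206
MATURE_ENZYME_AA = 233

# Total length of the translated preproenzyme.
total_aa = SIGNAL_PEPTIDE_AA + PROPEPTIDE_AA + MATURE_ENZYME_AA

# Three nucleotides per codon, plus one stop codon (assumption).
orf_nt = 3 * (total_aa + 1)

# First residue of the mature enzyme, counted from the initiator methionine.
first_mature_residue = SIGNAL_PEPTIDE_AA + PROPEPTIDE_AA + 1

print(total_aa)              # 463
print(orf_nt)                # 1392
print(first_mature_residue)  # 231
```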
Of the 20 metaphase cells that were located, all
showed Cy3 signals on the long arms of chromosome 11. Fourteen of 20 showed four hybridization signals (one per chromatid, two on each
chromosome 11) whereas 6 showed only one signal on one chromosome 11 and two signals on the other chromosome 11. No other chromosomes showed signals with the genomic probe, suggesting a single genomic sequence with high homology to the DPP-I gene locus. Imaging techniques further
localized DPP-I to 11q14.1-q14.3 (Fig. 2) with a 92.5% efficiency of hybridization.
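The 92.5% figure can be reproduced from the signal counts if hybridization efficiency is taken as observed signals divided by the four signals expected per metaphase cell (one per chromatid on both copies of chromosome 11). That definition is an interpretation of the counts, not a formula given in the text.

```python
EXPECTED_SIGNALS_PER_CELL = 4  # two chromatids on each of two chromosomes 11

# (number of cells, signals observed per cell)
observations = [(14, 4), (6, 3)]

observed = sum(cells * signals for cells, signals in observations)
expected = sum(cells for cells, _ in observations) * EXPECTED_SIGNALS_PER_CELL
efficiency = 100 * observed / expected

print(f"{observed}/{expected} signals = {efficiency:.1f}%")  # 74/80 signals = 92.5%
```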
Identification of Transcription Initiation Site
Sequence
analysis of independent subclones generated by anchored PCR indicated
that transcription initiates at the A nucleotide located 63 nt upstream
from the ATG that represents the translation initiation codon. Primer
extension analysis with the antisense primer positioned 30 nt 5′ to the
ATG yielded a 34-nt-long product (Fig. 3) calculated to
end at the A nucleotide located 63 nt 5′ from the ATG and is,
therefore, consistent with the results obtained by anchored PCR. The
sequence of the region encompassing the transcription initiation site
revealed that the A nucleotide is preceded by an invariant C nucleotide
and matches the consensus cap signal that has been found in the
majority of eukaryotic promoters (15).
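The expected length of the primer extension product follows from the primer coordinates given in the methods. The sketch assumes cDNA numbering in which the transcription initiation site is nucleotide 1 and the A of the ATG is nucleotide 64; the helper name is hypothetical.

```python
def extension_product_length(primer_5prime_nt, start_site_nt=1):
    """Length of a run-off primer extension product.

    The 5' end of the antisense primer pairs with `primer_5prime_nt` on the
    mRNA; reverse transcriptase extends the primer's 3' end until the
    template ends at `start_site_nt`.
    """
    return primer_5prime_nt - start_site_nt + 1


PRIMER_SPAN = (17, 34)  # primer is complementary to nt 17-34
ATG_POSITION = 64       # assumed numbering, see the note above

product_nt = extension_product_length(PRIMER_SPAN[1])
primer_to_atg = ATG_POSITION - PRIMER_SPAN[1]
start_to_atg = ATG_POSITION - 1

print(product_nt, primer_to_atg, start_to_atg)  # 34 30 63
```

A 34-nt product from a primer whose 5′ end lies 30 nt upstream of the ATG places the start site 63 nt upstream of the ATG, in agreement with the anchored PCR result.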
Identification of Putative Regulatory Elements in the 5′-Flanking Region
The 5′-regulatory region was determined by sequencing the ~3-kb PCR fragment that contained a portion of the first exon and the adjoining upstream region (Fig. 4). Computer analysis of this region revealed several features that are characteristic of promoters. The first 200 nt of the immediate 5′-region relative to the transcription initiation site is GC-rich (65%) as compared with the GC composition (50%) of the whole 5′-region. Further analysis of this region revealed neither a classical TATA box nor a CCAAT box. However, a potential cis-acting DNA element, GC-box/simian virus 40 protein (Sp1) binding site (position -55 in reverse orientation) (16) was identified. The 5′-region contains recognition sequences for several other transcription factors including three sites for the cyclic AMP response element binding protein (CRE-BP) (17) at positions -519, -950 (reverse orientation), and -953; five sites for CCAAT/enhancer binding protein (C/EBP) (18) at positions -480, -531 (reverse orientation), -660, -732 (reverse orientation), and -665; two sites for NF-κB/c-Rel (19) at positions -637 (reverse orientation) and -797; and two sites for Oct-1 (20) at positions -897 and -1115 (reverse orientation). Other sites of interest in the 5′-region include binding sites for several cell-specific transcription factors involved in the proliferation and differentiation of hematopoietic cells. These include five myeloid zinc finger (MZF1) sites (21) at positions -73 (reverse orientation), -116 (reverse orientation), -349 (reverse orientation), -435, and -1070; nine GATA (22) family binding sites including four GATA-1 sites at positions -76 (reverse orientation), -731 (reverse orientation), -939, and -1055 (reverse orientation), four GATA-2 sites at positions -76 (reverse orientation), -550, -731 (reverse orientation), and -939, and one site for GATA-3 at position -77 (reverse orientation); two sites for Ikaros (IK-2) (23) at positions -691 and -1074 (reverse orientation); one site for the lymphoid transcription factor (Lyf-1) (24) at position -1075 (reverse orientation); three sites for v-Myb (25, 26) at positions -563, -654, and -969; one site for the nuclear respiratory factor (NRF-2) (27) at position -1054 (reverse orientation); one site for Pbx-1 (28) at position -1111; one site for E1A (early region 1A of adenovirus)-associated 300-kDa protein (p300) (29) at position -357 (reverse orientation); and several sites for CdxA (30). Potential recognition sequences for the Ets (E26 transformation specific) family of transcription factors c-Ets-1 (31) and Ets-like (Elk-1) (32) are also present in reverse orientation at positions +29, -42, -1052, and -1054 and at position -39, respectively.
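The kind of computer analysis described here (GC composition of a window and a search for TATA or CCAAT boxes on either strand) can be sketched in a few lines. This is an illustration only: the program actually used is not named in the text, the TATA pattern is a simplified consensus chosen for the example, and the toy sequence is invented and is not the DPP-I promoter.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_percent(seq):
    """Percentage of G and C bases in a sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100 * sum(base in "GC" for base in seq) / len(seq)


def find_motif(seq, pattern):
    """Return sorted (0-based forward-strand start, strand) hits for a regex motif."""
    seq = seq.upper()
    lookahead = re.compile(f"(?=({pattern}))")  # lookahead allows overlapping hits
    hits = [(m.start(), "+") for m in lookahead.finditer(seq)]
    for m in lookahead.finditer(reverse_complement(seq)):
        # Convert the reverse-strand coordinate back to the forward strand.
        start = len(seq) - (m.start() + len(m.group(1)))
        hits.append((start, "-"))
    return sorted(hits)


# Toy sequence for illustration only; NOT the DPP-I 5'-flanking region.
toy = "GGGCGGCCAATGCGCTATAAAGGCGCGATTGG"
TATA_BOX = "TATA[AT]A"  # simplified consensus (assumption)
CCAAT_BOX = "CCAAT"

print(gc_percent(toy))             # 62.5
print(find_motif(toy, TATA_BOX))   # [(15, '+')]
print(find_motif(toy, CCAAT_BOX))  # [(6, '+'), (27, '-')]
```

Scanning both strands matters because many of the sites listed above are reported in reverse orientation. A promoter like the one described would return empty lists for both box patterns while still showing a high GC percentage near the start site.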
Expression of Human DPP-I mRNA in Tissues
The expression pattern of DPP-I in adult human tissues was determined by Northern blot analysis. The human DPP-I cDNA probe hybridized to a transcript of ~2 kb in all the tissues but with varying intensities. The strongest signal for human DPP-I mRNA was detected in the lung, kidney, and placenta. A signal of moderate intensity was detected in the small intestine, colon, spleen, and pancreas. A low intensity signal was observed in the heart, reproductive organs (testis and ovary), and peripheral blood leukocytes. A weak signal was present in the thymus, prostate, liver, and skeletal muscle. Transcripts were barely detectable in the brain.
Because of our interest in DPP-I as a central coordinator in activating
granule-associated serine proteinases, we determined the expression of
human DPP-I mRNA in immune/inflammatory cells and their precursors.
A representative Northern blot is shown in Fig. 5. Among
the fully differentiated cells, the strongest hybridization signal was
observed in PMNL and alveolar macrophages. A weak signal was detected
in unstimulated lymphocytes and monocytes. Among precursor cells,
strong signals were observed in PLB 985, a myelomonoblastic cell line,
U937, a myelomonocytic cell line, and HL-60, a promyelocytic cell
line.
Regulation of Human DPP-I in Activated Lymphocytes
Because
granzymes present in lymphokine-activated lymphocytes are presumably
activated by DPP-I that is present in only low levels in unstimulated
lymphocytes, we studied the regulation of the human DPP-I gene in
lymphocytes stimulated by IL-2. As shown in Fig. 6, low
levels of human DPP-I mRNA were present in lymphocytes not exposed
to IL-2. When lymphocytes were exposed to IL-2, induction of human
DPP-I mRNA occurred as early as 12 h, peaked at 48 h, and
then declined by 72 h. Actinomycin D prevented the induction of
human DPP-I mRNA in lymphocytes stimulated by IL-2. Treatment of
lymphocytes with cycloheximide also prevented the induction of human
DPP-I mRNA by IL-2 (data not shown). These results indicate that
the induction of human DPP-I mRNA observed in lymphocytes treated
with IL-2 most likely occurred at the level of gene transcription and
was dependent on protein synthesis.
DPP-I is a lysosomal cysteine proteinase differentially expressed in a variety of tissues and thought to play an important role in intracellular protein degradation. To date, it is the only cysteine proteinase that has been demonstrated in PMNL, and recent studies have focused on the importance of this enzyme as a central coordinator for activation of granule-associated serine proteinases contained in PMNL and mast cells. In this investigation, to better understand the molecular basis for the regulation and physiologic effects of DPP-I and to gain insight into its tissue-specific and cytokine-induced expression, we have determined the organization, chromosomal location, and expression of the human DPP-I gene.
DPP-I has been classified as a member of the lysosomal papain-type cysteine proteinase family that also includes cathepsins B, H, L, O, and S. This classification is based on its localization within the cell, acidic pH optimum for enzyme activity, and conserved amino acid sequence with respect to the NH2-terminal and COOH-terminal regions that form the substrate-binding pocket of the enzyme. However, unlike cathepsin B, H, L, O, and S, which are monomeric proteins (molecular mass 20-30 kDa) with endopeptidase activity, DPP-I is an oligomeric protein (200 kDa) with exopeptidase activity. In addition, the overall amino acid sequence homology of DPP-I shows relatively little identity with other members of this group of proteinases.
We now report that the organization of the human DPP-I gene contrasts strikingly with that of the other enzymes contained within the papain group. The human DPP-I gene is of limited size and complexity, existing as a single copy that spans approximately 3.5 kb, contains two exons divided by a single intron, and is expressed as a single transcript. Reports of genes previously described for cathepsin B (33), cathepsin H (34), cathepsin L (35), and cathepsin S (36) emphasize complex structures consisting of multiple exons and introns, some undergoing alternative splicing that gives rise to multiple transcripts that are differentially expressed. Recently, an alignment/phylogeny of the papain superfamily of cysteine proteases was created (37) in which cathepsin B and DPP-I were placed in the same class, appearing to have diverged from the other papain group of sequences before the origin of kinetoplastids. However, the grouping of cathepsin B and DPP-I was not well supported statistically. Based on the results of the current investigation demonstrating the strikingly different genomic organization, we question the grouping of cathepsin B and DPP-I and speculate that rather than having a common ancestral origin with the other mammalian cysteine proteinases of the papain superfamily, DPP-I may have evolved into the class through convergence by selective evolutionary pressure. Also of note, human DPP-I is neither located on the chromosomes of other cysteine proteinase groups (36, 38, 39) in which it is classified nor on chromosomes of granule-associated serine proteinases (40) to which it functions as a processing enzyme.
To begin to address the regulation of human DPP-I gene expression, we
analyzed the 5′-flanking sequence for potential upstream regulatory
elements. The transcription initiation site is located
63 nt upstream of the
translation initiation site and is surrounded by a canonical cap signal
sequence. Consensus transcription sequences such as TATA and CCAAT are
notably absent in the GC-rich upstream region of the cap site.
Eukaryotic promoters lacking a TATA or CCAAT sequence frequently encode
proteins or enzymes with housekeeping functions. Most of these
constitutively expressed genes exhibit multiple transcription
initiation sites distributed over a limited region (41) and multiple
Sp1 binding sequences. In contrast, the human DPP-I promoter has a
single transcription initiation site and a single Sp1 site. This is
similar to the cathepsin S (36), the thrombin receptor (42), and the
nerve growth factor receptor (43) genes that are also subject to
regulated expression.
The 5′-flanking region of the human DPP-I gene contains putative
regulatory elements. Many of these elements (e.g. MZF1,
v-Myb, GATA) have been shown to be important in proliferation and
differentiation of hematopoietic cells. The promoter region contains T
cell-specific transcription factor binding sites (e.g.
IK-2/Lyf-1, p300) as well as NF-κB recognition sites reported to be
involved in cytokine-stimulated gene expression (44). Further
functional analysis of the promoter region will be necessary to
determine which of the factors are involved in the regulation of the
human DPP-I gene.
It has been reported previously that DPP-I mRNA is widely expressed in all rat tissues (10) and that the relative level of message in different tissues mirrored the protein content (45) as well as the enzyme activity (8). This also appears true for the human, where the level of expression of DPP-I transcripts observed in the current investigation is in close agreement with reports of the distribution of DPP-I enzyme activity in tissues (8) and hematopoietic cells (3, 7). However, there are notable differences in the expression of DPP-I in the human and the rat. In the human, for example, DPP-I is expressed at low or barely detectable levels in the liver and brain, whereas in the rat DPP-I is highly expressed in the liver and moderately expressed in the brain. These results suggest species-specific expression of the enzyme.
Importantly, the pattern of expression of DPP-I transcripts in immune/inflammatory cells is distinct from that observed for granule-associated serine proteinases. Results from the current investigation suggest that DPP-I is expressed at all stages of myeloid cell development. In contrast, mRNA expression of granule-associated serine proteinases is restricted to specific stages of myeloid cell development (46-51). This suggests a role for DPP-I in immune/inflammatory cells that extends beyond that of a processing enzyme for the granule-associated enzymes.
In summary, we isolated and characterized the human DPP-I gene including a 1.2-kb promoter region. The gene contains a single intron and maps to chromosome 11q14.1-q14.3. The putative promoter region has neither consensus TATA nor CCAAT sequences, a characteristic of housekeeping genes, but it appears to be regulated, at least in certain settings. Further studies are needed to determine the basis for this regulated expression.
The nucleotide sequence reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number U79415.