(Received for publication, September 5, 1995)
Dictyostelium discoideum makes multiple developmentally
regulated lysosomal cysteine proteinases. One of these, a lysosomal
enzyme called proteinase I, contains a cluster of GlcNAc-α-1-P-Ser
residues. We call this modification phosphoglycosylation. To study its function, a
cDNA library from vegetative cells was screened, and two novel cysteine
proteinase clones were characterized (cprD and cprE).
Each of them has highly conserved regions expected for cysteine
proteinases, but unlike any other, each has a serine-rich domain
containing three distinct motifs, poly-S, SGSQ, and SGSG. cprD and cprE cDNAs were overexpressed in Dictyostelium and the active enzymes identified. cprD codes for a
protein of approximately 36 kDa (CP4), which is recognized by
monoclonal antibodies against GlcNAc-1-P and fucose. cprE corresponds to a 29-kDa protein, which is recognized by antibodies
against GlcNAc-1-P. mRNA for both enzymes is present in the vegetative
phase and increases during growth on bacteria but decreases throughout
development. When formation of the fruiting body is complete, mRNA
for both genes is detected again but at very low levels. Having
cloned cDNAs for proteins that carry GlcNAc-1-P should allow us to
probe the function of the carbohydrate in these putative lysosomal
enzymes.
Dictyostelium discoideum is a eukaryotic amoeba that
grows as single cells, but when the bacterial food source is removed,
the cells initiate a complex multicellular developmental program. Cells
aggregate and differentiate into several different types and, in the
end, 85% of them are converted into spores sitting atop a cellular
stalk (1). We are interested in studying the role of
carbohydrate modifications in this organism (2). One of these
is the addition of GlcNAc-1-P to serine residues, which has been well
documented to occur on a cysteine proteinase called proteinase I found
in vegetative cells (3, 4, 5). Although
antibodies against GlcNAc-1-P recognize various proteins in the cells
and in secretions of cells grown in axenic medium, the
identity of these proteins is unknown. To study the function of
GlcNAc-1-P on a defined protein, we decided to clone members of the
cysteine proteinase family expressed in vegetative cells.
Previous studies in Dictyostelium identified two developmentally regulated members of this gene family, cprA (CP1) and cprB (CP2) (6, 7, 8), but none have been identified in vegetative cells. Since cysteine proteinases are highly conserved in all eukaryotes, we used the active site consensus sequence of cysteine proteinases and the cprA and cprB cDNAs to clone two novel vegetative cysteine proteinases, cprD and cprE. They have the predicted conserved regions but also have an unusual serine-rich domain not previously found in any known cysteine proteinase that could be the site of GlcNAc-1-P addition. The cDNA clones were overexpressed and the active enzymes were shown to have GlcNAc-1-P.
Figure 1: Nucleotide and deduced amino acid sequences of cprD and cprE. Panels A and B correspond to cprD and cprE, respectively. The start of the polyadenylation signal, AATAAA, is underlined. The putative N terminus of the mature proteinase is boxed. Asterisks signify termination. The amino acids are indicated by the single letter code.
Figure 2: Sequence alignment of CP4 and CP5 to human, plant, and Dictyostelium cysteine proteinases. Shared sequences are boxed. Double underlines indicate putative N-glycosylation sites, and arrows show the active site cysteine and histidine. * indicates the beginning of the mature protein. In boldface are the serine-rich domains on CP4 and CP5.
An unusual feature of these deduced amino acid sequences is the presence of a serine-rich domain near the C terminus of both proteins. In CP4 it is 115 amino acids long and contains 60 serine residues (52%), while in CP5 the same region contains 12 serine residues out of 24 amino acids (50%). Another Dictyostelium cysteine proteinase, CP2, also has an insert in this region (42 amino acids long), but its serine content is only 11%. Other cysteine proteinases typically have much shorter sequences (1-12 amino acids) in this region (Fig. 2). In CP4, the serine residues seem to be distributed in three distinct repeated motifs: poly-S, SGSQ, and SGSG. Serines in the insert from CP5 seem to follow the same pattern but in fewer repeats.
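The serine percentages and motif types described above can be checked with a short script. The following is a minimal Python sketch, not the authors' analysis: the function names, the toy sequence, and the choice of three or more consecutive serines as the definition of a poly-S run are all assumptions made for illustration. Only the residue counts (60 of 115 for CP4, 12 of 24 for CP5) come from the text.

```python
import re


def serine_fraction(seq: str) -> float:
    """Return the fraction of residues in seq that are serine (S)."""
    seq = seq.upper()
    return seq.count("S") / len(seq) if seq else 0.0


def count_motifs(seq: str) -> dict:
    """Count non-overlapping matches of the three motif types in seq.

    The poly-S threshold (3 or more consecutive serines) is an
    assumption; the paper does not define a minimum run length.
    """
    seq = seq.upper()
    return {
        "poly-S": len(re.findall(r"S{3,}", seq)),
        "SGSQ": len(re.findall(r"SGSQ", seq)),
        "SGSG": len(re.findall(r"SGSG", seq)),
    }


# Serine content from the counts reported in the text.
cp4_percent = round(100 * 60 / 115)  # 60 serines in 115 residues
cp5_percent = round(100 * 12 / 24)   # 12 serines in 24 residues

# Hypothetical toy insert, NOT the real CP4 or CP5 sequence.
toy_insert = "SSSSSGSQSGSGSSSSGSQ"

if __name__ == "__main__":
    print(f"CP4 insert: {cp4_percent}% serine; CP5 insert: {cp5_percent}% serine")
    print(f"toy insert: {serine_fraction(toy_insert):.2f} serine fraction")
    print(count_motifs(toy_insert))
```

Because each pattern is counted independently, a serine that closes a poly-S run can also open an SGSQ or SGSG match; a real analysis would need to decide how to treat such shared residues.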
The tertiary structures of the cysteine proteinases actinidin, papain, and human liver cathepsin B are known (27, 28, 29). The similarity in the conserved regions of CP4 and CP5 to these cysteine proteinases suggests that they may have the same overall structure. Fig. 3 shows the tertiary structure of actinidin and the location of the serine-rich insert of CP4 and CP5 based on the inferred amino acid sequence homology and crystal structures. The insertion occurs at Gly-170 (actinidin), and in CP4 it comprises nearly one-third of the predicted size of the mature protein. As seen in Fig. 3, the insert lies on the opposite side of the protein, away from the active site.
Figure 3: Location of the serine-rich inserts in relation to the active site of a cysteine proteinase. The relative size and location of the serine-rich insertions of CP4 (A) and CP5 (B) are depicted on the α-carbon structure of actinidin. The active site residues Cys-25 and His-162 are indicated. The insertion occurs at Gly-170 of the actinidin sequence.
Figure 4: Southern blots of cprA, cprB, cprD, and cprE. Genomic DNA digested with BamHI, BglII, ClaI, and EcoRI/HindIII was electrophoresed in agarose gels and blotted onto nylon. The blot was probed at high stringency (55 °C) with the entire cDNAs of cprD and cprE. The same filters were reprobed with cprA and cprB. The size markers are indicated in kb.
Figure 5: Analysis of the mRNA levels corresponding to cprD and cprE during growth and development. A, cells were plated on SM agar plates with K. aerogenes, and 10 cells were collected after 44 h (growing cells), 47 h (beginning of clearing), or 50 h (clearing plates). Cells were then washed free of bacteria with 20 mM phosphate, pH 6.4, and plated for development on nitrocellulose filters. Samples of 10 cells were taken after 0, 2, 4, and 8 h of development. B, 10 cells were taken from AX-4, AX-2, CP4-25, CP4-6, and CP5-12 axenic cultures. C, exponentially growing cells from axenic (HL-5) cultures were washed with 20 mM phosphate buffer, pH 6.4, and plated for development on nitrocellulose filters. Samples of 10 cells were taken after 0, 2, 4, 8, 12, 16, 20, and 24 h of development. Total RNA was isolated, and 20 µg was subjected to electrophoresis in agarose-formaldehyde gels. The gels were blotted onto nylon membranes and hybridized with radioactive probes corresponding to cprD, cprE, and cprA (as an internal control of development) or 1G7 (a constitutively expressed gene) as indicated.
Figure 6: Glycosylation pattern of cells that overexpress CP4 and CP5. 40 µg of protein from total cell lysates of control cells grown in HL-5 or on Klebsiella (Ka) and of transformants (CP4-25, CP4-6, and CP5-12) grown in HL-5, together with 1 µg of purified proteinase I, were subjected to SDS-PAGE. The proteins in the gel were then blotted onto nitrocellulose filters and detected immunologically using a monoclonal antibody (ab) against GlcNAc-1-P (AD7.5) or against fucose (83.5). Primary antibody binding was detected using a goat anti-mouse antibody conjugated to alkaline phosphatase.
Figure 7: Cysteine proteinase activity in cells overexpressing CP4 and CP5. 10 µg of a total cell lysate was preincubated for 30 min with or without 10 µM E-64 in 0.1 M phosphate/citrate, pH 5.0, 1 mM DTT, and then 0.3 mM of the substrates N-Cbz-Lys-ON-p and H-D-Val-Leu-Lys-p-NA was added. After 20 min, color development was measured at 405 nm. The graphs show the percent of activity in the transformants (CP4-25, CP4-6, CP5-6, and CP5-12) in relation to the controls (AX-4 and AX-2). Values are the average of duplicates. The data are representative of five different experiments.
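The calculation behind a plot of this kind (duplicate readings averaged, then expressed as percent of control, with and without E-64) can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the function names and all absorbance values are hypothetical, and blank subtraction is omitted because the caption does not describe it.

```python
from statistics import mean


def percent_of_control(sample_a405, control_a405):
    """Express the mean of duplicate A405 readings as percent of the control mean."""
    control = mean(control_a405)
    if control <= 0:
        raise ValueError("control absorbance must be positive")
    return 100.0 * mean(sample_a405) / control


def e64_sensitive_fraction(without_e64, with_e64):
    """Fraction of the activity that is abolished by E-64 preincubation."""
    total = mean(without_e64)
    if total <= 0:
        raise ValueError("uninhibited absorbance must be positive")
    return (total - mean(with_e64)) / total


# Hypothetical duplicate readings at 405 nm (illustration only, not data
# from the paper).
control = [0.20, 0.22]           # e.g. an untransformed control lysate
transformant = [0.61, 0.65]      # e.g. an overexpressing transformant
transformant_e64 = [0.05, 0.07]  # same lysate preincubated with E-64

if __name__ == "__main__":
    print(f"{percent_of_control(transformant, control):.0f}% of control")
    print(f"{e64_sensitive_fraction(transformant, transformant_e64):.2f} E-64 sensitive")
```

Reporting the E-64-sensitive fraction alongside the percent of control separates cysteine proteinase activity from any residual hydrolysis of the substrates by other enzymes.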
Figure 8: Cysteine proteinase activity in SDS-PAGE gels from cells overexpressing CP4 and CP5. 40 µg of protein from a total cell lysate of transformants (CP4-25, CP4-6, CP5-6, and CP5-12) and control cells (AX-4 and AX-2) was subjected to SDS-PAGE without boiling. The gels were then preincubated with or without 10 µM E-64 in 0.1 M phosphate/citrate, pH 5.0, 20 mM cysteine for 20 min and then incubated in the same buffer containing 20 µM N-t-Boc-Val-Leu-Lys-7-MCA. Fluorescence developed almost immediately and was observed on a UV transilluminator.
North and Cotter (31) have described cysteine
protease activities in Dictyostelium throughout development
and point out the complex and dynamic activity patterns seen in
vegetative cells. A series of 4-5 different cysteine
proteinase activity bands with apparent Mr of
30-54 kDa is expressed depending upon whether cells are grown on
bacteria or in axenic media (32, 33). Gustafson and
co-workers (3, 4) reported a vegetative stage cysteine
proteinase of 38 kDa, proteinase I, that contained up to 20% by weight
GlcNAc-1-P linked to serine residues. The serine content this implies is not
typical of cysteine proteinases. Previously, two developmental
stage-specific cysteine proteinase genes, cprA (CP1) and cprB (CP2) (6, 7), were cloned in Dictyostelium, but their serine content closely resembles that
of other typical eukaryotic cysteine proteinases. A partial sequence
for another developmentally regulated cysteine proteinase (CP3) (34) has also been identified; however, it does not encode a
full-sized enzyme.
We are interested in studying the function of GlcNAc-1-P and the signals needed for its addition to proteins. Based on the previous studies, we screened a vegetative cell cDNA library for typical cysteine proteinase genes that have serine-rich region(s). We found two such cDNAs that could code for cysteine proteinases, cprD and cprE. mRNA for both genes is detected during vegetative growth and decreases with the start of development, reappearing at low levels when the fruiting body is formed (Fig. 5). This is in agreement with the observation that general cysteine proteinase activity slightly increases at the end of development (35). A surprising feature is that the amount of mRNA increases substantially at the end of vegetative growth. This is typical of the prestarvation responsive genes (36) and occurs in parallel with a burst of cysteine proteinase activity seen at this time (35). The reason for this increase is unclear, but it may reflect an increased need for digesting bacteria or for the increased protein turnover known to occur in development. When the cells start development, the protease may no longer be necessary, and its mRNA levels decrease. This is consistent with the decrease seen in cysteine proteinase activity during development (37, 38). Southern blot analysis of cprA, cprB, cprD, and cprE shows that they are located on different genomic DNA fragments (Fig. 4). This was confirmed by mapping the genes in the Dictyostelium genome using yeast artificial chromosomes (YACs). cprD maps to chromosome 3, and cprE maps to the middle of chromosome 2 (39).
CP4 and CP5 have an unusual
domain not present in the other previously studied cysteine
proteinases. CP4 contains a 115-amino acid domain composed of 52%
serine residues divided into three separate contiguous motifs, poly-S,
SGSQ, and SGSG. CP5 contains similar motifs within a 24-amino acid
domain. The serine stretches probably evolved from a series of tandem
duplications. CP5 appears to be the older version of the motifs before
the onset of tandem duplications. The serine-rich inserts in both CP4
and CP5 appear to be located in a non-conserved region of other
cysteine proteinases (Fig. 2). Although they are near the active
site histidine residue in the sequence, their location in space is expected to be away
from the active site, as shown in the three-dimensional structure in Fig. 3. It is possible, though, that the presence of the insert
or of the putative carbohydrate chains may influence the activity of
the enzymes, since the serine-rich domain is connected directly to the
β-strand involved in the active site. This domain may serve special
needs for CP4 and CP5 but is evidently not vital for activity, since
most cysteine proteinases are devoid of it.
To show that cprD and cprE code for enzymes that can carry GlcNAc-1-P, the cDNAs were overexpressed in axenically growing cells. This resulted in an average 3-fold increase in cysteine proteinase activity (Fig. 7), which corresponded to an increased activity band of 36 kDa (in CP4 transformants) or 29 kDa (in CP5 transformants) on SDS-PAGE (Fig. 8). Significantly, a monoclonal antibody against GlcNAc-1-P recognizes the same band in the transformants, and this band is found in very low amounts in non-transformed cells (Fig. 6). CP4 transformants also show some additional bands ranging from 45 to 70 kDa, which are detected in the Western blots but not in the activity gels. We are currently unable to explain these bands, but they could represent unprocessed forms of the enzyme that accumulate because of its overexpression. The results also show that this antibody recognizes GlcNAc-1-P in the 38-kDa proteinase I purified from bacterially grown cells (Fig. 6) (5). Axenically grown cells also have a 38-kDa protein, but it seems to migrate at a slightly higher molecular weight, both in control and transfected cells. North and co-workers (32, 33) have shown that 38-kDa cysteine proteinases are present in cells grown axenically or in the presence of bacteria but that they have different biochemical properties. Different cysteine proteinase activity patterns are observed in vegetative cells depending on nutrient availability. These interconversions may be due to differences in post-translational modifications (31, 32, 33), and the anomalous migration of the 38-kDa protein in the transformants may reflect altered glycosylation of this protein when CP4 is overexpressed. Resolving these issues will require additional experiments, but it is clear that CP4 (36 kDa) and proteinase I (38 kDa) are different. This was confirmed by partial amino acid sequencing of proteinase I (35), although both proteins showed similar amino acid compositions.
It is possible that both are members of a closely related family modified by GlcNAc-1-P, since they at least partially co-purify. Both proteins also contain fucose, as shown by binding of another monoclonal antibody (Fig. 6). The location of the fucose residues is unknown; either they are absent from CP5 or its expression levels were not high enough to permit detection with this antibody, even when the blots were overdeveloped. Based on the processing of other cysteine proteinases, the expected masses of CP4 and CP5 without any modifications would be 32,816 and 24,459 Da, respectively. Two potential N-linked glycosylation sites occur in both enzymes, but it is not known if they are actually used. CP4 migrates in SDS-PAGE as an approximately 36-kDa band and CP5 as a 29-kDa band, but additional experiments will be necessary to determine how much of this mass is contributed by N-linked chains or by GlcNAc-1-P and fucose.
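The gap between the predicted polypeptide masses and the apparent SDS-PAGE masses quoted above is simple arithmetic, sketched below in Python. The dictionary and function names are illustrative assumptions. The apparent masses are taken as exactly 36,000 and 29,000 Da for the purpose of the calculation, although SDS-PAGE estimates are only approximate, so the differences are rough upper-level estimates rather than measured modification masses.

```python
# Masses from the text, in daltons.
PREDICTED_DA = {"CP4": 32_816, "CP5": 24_459}  # unmodified mature proteins
APPARENT_DA = {"CP4": 36_000, "CP5": 29_000}   # SDS-PAGE bands (approximate)


def unaccounted_mass(enzyme: str) -> int:
    """Apparent mass minus predicted polypeptide mass, in daltons."""
    return APPARENT_DA[enzyme] - PREDICTED_DA[enzyme]


if __name__ == "__main__":
    for enzyme in ("CP4", "CP5"):
        extra = unaccounted_mass(enzyme)
        share = 100 * extra / APPARENT_DA[enzyme]
        print(f"{enzyme}: {extra} Da unaccounted for ({share:.1f}% of apparent mass)")
```

On these figures, roughly 3.2 kDa (CP4) and 4.5 kDa (CP5) of apparent mass is not explained by the polypeptide alone. The calculation cannot say how that difference is split among N-linked chains, GlcNAc-1-P, fucose, and anomalous gel migration.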
GlcNAc-1-P is most probably added to CP4
and CP5 in this newly characterized serine-rich domain. It is
interesting to note that this domain has three distinct motifs,
polyserine, SGSQ, and SGSG. An enzyme activity that transfers
GlcNAc-1-P to serine units in proteins has recently been characterized
in Dictyostelium (40), and an SGSG peptide can act as
an acceptor in an in vitro GlcNAc-1-P transferase
assay (41). SGSG repeats are used as sites for the addition of
glycosaminoglycan chains to core proteins such as
serglycins (42, 43). Polyserine repeats have recently
been described in a secreted acid phosphatase from Leishmania,
which is modified by a new class of phosphoserine-linked glycans,
Man-1-P bound to serines (44, 45). SP96, a spore coat
protein that is present in prespore vesicles of Dictyostelium (46), is recognized by the GlcNAc-1-P
antibody. SP96 has a 96-amino acid domain with 70% serines
interspersed with alanines and prolines and a 49-amino acid region with
SG and GSQ repeats (47). SP70, another spore coat protein, also
has repeats of SG and a polyserine region (48). Both proteins
have been shown to be fucosylated and phosphorylated (49). It
may be that these prespore vesicle proteins, some putative lysosomal
proteins like CP4 and CP5, and other proteins yet to be identified
share a property influenced or controlled by GlcNAc-1-P and/or fucose.
The cloning of these two novel cysteine proteinases will allow us to begin to determine the function of GlcNAc-1-P. By characterizing the sites of GlcNAc-1-P addition and creating mutations in these sites we may understand its potential role in targeting these cysteine proteinases to lysosomes or in affecting enzyme activity. Since these mutant cDNAs can be expressed in Dictyostelium, we can study the fate of the protein with altered glycosylation.
The nucleotide sequences reported in this paper have been submitted to the GenBank(TM)/EMBL Data Bank with accession numbers L36204 (CP4) and L36205 (CP5).