(Received for publication, November 4, 1996, and in revised form, February 10, 1997)
From the Departments of Dental Research and Biochemistry and Biophysics, the School of Medicine and Dentistry, University of Rochester, Rochester, New York 14642
The cDNA for a fourth member of the mammalian UDP-GalNAc:polypeptide N-acetylgalactosaminyltransferase family, termed ppGaNTase-T4, has been cloned from a murine spleen cDNA library and expressed transiently in COS7 cells as a secreted functional enzyme. Degenerate primers, based upon regions that are conserved among the known mammalian members of the enzyme family (ppGaNTase-T1, -T2, and -T3) and three Caenorhabditis elegans homologues (ppGaNTase-TA, -TB, and -TC), were used in polymerase chain reactions to identify and clone this new isoform. Substrate preferences for recombinant murine ppGaNTase-T1 and ppGaNTase-T4 isozymes were readily distinguished. ppGaNTase-T1 glycosylated a broader range of synthetic peptide substrates; in contrast, ppGaNTase-T4 preferentially glycosylated a single substrate among the panel of 11 peptides tested. Using Northern blot analysis, a ppGaNTase-T4 message of 5.5 kilobases was detectable in murine embryonic tissues, as well as in the adult sublingual gland, stomach, colon, small intestine, lung, cervix, and uterus, with lower levels detected in kidney, liver, heart, brain, spleen, and ovary. Thus, the pattern of expression of ppGaNTase-T4 is more restricted than that of the three previously reported isoforms of the enzyme. The variation in expression patterns and substrate specificities within the ppGaNTase enzyme family suggests that differential expression of these isoenzymes may be responsible for the cell-specific repertoire of mucin-type oligosaccharides on cell-surface and secreted O-linked glycoproteins.
The acquisition of carbohydrate side chains in O-glycosidic linkage to either Thr or Ser has a profound structural impact on a polypeptide backbone and thus underlies the unique physicochemical properties of heavily O-glycosylated proteins such as mucin glycoproteins (1). In addition, O-glycans function as ligands for receptors mediating such diverse processes as lymphocyte trafficking (2), sperm-egg binding (3), and tumor cell adhesion (4). The first committed step of O-glycosylation is catalyzed by UDP-GalNAc:polypeptide N-acetylgalactosaminyltransferase (EC 2.4.1.41) (ppGaNTase).1 Thus far, three isoforms of this enzyme have been cloned and expressed: ppGaNTase-T1 (5, 6), -T2 (7), and -T3 (8, 9). The three isoforms differ in their patterns of tissue expression and appear to have overlapping as well as unique substrate preferences. Thus, to understand fully how O-glycosylation of specific peptide sequences is initiated and how the process is regulated in a given tissue or cell type, the repertoire of ppGaNTases in that cell background must be defined.
In the present study, we used degenerate primers derived from conserved sequences within a region spanning 420 amino acid residues to amplify putative ppGaNTases from RNA samples. Subcloned PCR products were then secondarily screened by plaque hybridization and subjected to sequence analysis. The method was validated by the successful isolation of products corresponding to the ppGaNTase-T1, -T2, and -T3 isoforms from murine RNA samples. In addition, the approach led to the detection of a cDNA that appeared to encode a novel form of ppGaNTase. When a truncated version of this cDNA was expressed as a secreted product in COS7 cells, ppGaNTase activity was recovered. The peptide substrate specificity of this enzyme, however, appeared to be more selective than that of ppGaNTase-T1. Further, this isoform displayed a pattern of transcript expression more restricted than that of the ppGaNTase-T1, -T2, or -T3 transcripts. Collectively, these data indicate that we have identified a fourth distinct isoform of the ppGaNTase family.
The known ppGaNTase family members from mouse and/or human (ppGaNTase-T1, -T2, and -T3) and the Caenorhabditis elegans clones that encode putative ppGaNTase-TA, -TB, and -TC (ZK688.8, yk2F11, and yk3G10, respectively)2 were aligned. All were found to contain the translated amino acid sequences AGGLF and CH(G/N)XGGNQ (Fig. 1), which lie an average of 180 aa apart. Pools of sense and antisense degenerate PCR primers were reverse-translated from the consensus sequences AGGLF and CH(G/N)XGGNQ, respectively. The sense primers (primer 1, Fig. 1A) were d(CNCCNAYNNWKGCNGGWGICTITT) and d(WGCNGGWGGNCTCTT), and the antisense primers (primer 2, Fig. 1A) were d(TGRTTNCCHCCIIIRTTRTGRCA), d(TGRTTNCCNCCNNNNCCRTGRCA), and d(TGRTTNCCNCCIIIICCRTGRCA). Reverse transcriptase-PCR of spleen total RNA, using the protocol described in Hagen et al. (5), yielded the expected 540-bp product. The entire PCR sample was cloned into an M13 cloning vehicle and screened for putative ppGaNTase fragments by plaque hybridization with a panel of oligonucleotides, d(GARATHTGGGGNGGNGARAA) (primer 3, Fig. 1A), d(GTNGGNCAYGTNTTYMG) (primer 4, Fig. 1A), d(TTRTAYTCRTCCATCCANAC) (primer 5, Fig. 1A), and d(GGRTADATRTTYTCNARRTACCA) (primer 6, Fig. 1A), which correspond to the conserved aa residues IWGGEN, CSXVGHVFR, EVWMDE, and WYLEN, respectively. Positive M13 clones were sequenced with infrared fluorescent dye-labeled primers on a LI-COR 4000L DNA sequencer. The insert was used to screen a random- and oligo(dT)-primed λZapII mouse spleen cDNA library (Stratagene). Three positive clones were sequenced on both strands (Fig. 2). Sequence alignments were performed using the MegAlign program (DNASTAR) (Fig. 3).
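Because each primer is a pool, its complexity follows directly from the IUPAC codes: every position contributes a factor equal to the number of bases it can stand for. The short Python sketch below computes that product for primers written in the d(...) notation used above. It is an illustration, not part of the original protocol, and it assumes that inosine (I) contributes a factor of 1, since a single inosine-containing oligonucleotide pairs broadly without adding distinct sequences to the pool.

```python
# Number of distinct bases represented by each IUPAC nucleotide code.
# Inosine (I) is counted as 1 (an assumption): it is a single chemical species
# that pairs promiscuously, so it broadens specificity without enlarging the pool.
IUPAC_FACTOR = {
    "A": 1, "C": 1, "G": 1, "T": 1, "I": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def pool_complexity(primer: str) -> int:
    """Return how many distinct oligonucleotides a degenerate primer pool contains."""
    seq = primer.upper()
    # Accept the d(...) notation used in the text.
    if seq.startswith("D(") and seq.endswith(")"):
        seq = seq[2:-1]
    complexity = 1
    for base in seq:
        complexity *= IUPAC_FACTOR[base]
    return complexity


if __name__ == "__main__":
    for p in ("d(WGCNGGWGGNCTCTT)",
              "d(TGRTTNCCHCCIIIRTTRTGRCA)",
              "d(TGRTTNCCNCCNNNNCCRTGRCA)"):
        print(p, pool_complexity(p))
```

Under that assumption, d(WGCNGGWGGNCTCTT) is a pool of 64 sequences, d(TGRTTNCCHCCIIIRTTRTGRCA) a pool of 192, and d(TGRTTNCCNCCNNNNCCRTGRCA) a pool of 32,768, which shows how quickly fully degenerate N positions inflate pool size compared with inosine.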
Functional Expression
A 600-bp region of the cDNA beginning with the first alanine codon immediately following the N-terminal transmembrane domain (Fig. 2, aa 34 of the ppGaNTase-T4 protein) was amplified using a PCR primer that introduced an MluI site at the beginning of the luminal region of the transferase. This PCR fragment was fully sequenced (it was identical to ppGaNTase-T4 except for the introduced MluI site) and reinserted into the full-length cDNA construct. An MluI-NheI digest released a 1.6-kb fragment containing the coding region of the truncated ppGaNTase-T4 isozyme and 5 nucleotides of 3′-untranslated region. This MluI-NheI fragment was cloned into the MluI-BamHI sites of pIMKF1, an SV40 promoter-driven mammalian expression vector, to create pF1-mT4 (Fig. 4). pIMKF1 contains an insulin secretion signal (I) to direct secretion of the recombinant ppGaNTase-T4 enzyme into the cell culture medium (5), a metal binding site (M) to facilitate purification of the recombinant enzyme (10), a heart muscle kinase site (K) to facilitate detection of the recombinant protein, and an N-terminal FLAG™ epitope tag (F). Mock transfections with pSVL-β-galactosidase and a positive control, pF1-mT1, which expresses mouse ppGaNTase-T1 (9), were performed in parallel. To assess variability in transfection efficiency, two independent pF1-mT4 constructs were used to transfect COS7 cells; well-to-well variation among the independent transfections with pF1-mT4 (n = 4) was 16%.
The expression vector, COS7 cell transfections, and kinetic assays were as reported previously (10). Briefly, 1 µg of supercoiled DNA and 8 µl of LipofectAMINE (Life Technologies, Inc.) were used to transfect 35-mm wells of COS7 cells at 85-90% confluency, as recommended by the manufacturer. After transfection, cells were grown at 30 °C in 2 ml of Dulbecco's modified Eagle's medium. After 3 days, enzyme activity was measured in 5 µl of clarified cell culture medium (centrifuged at 100 × g for 10 min) under the following assay conditions: a final volume of 25 µl containing 500 µM peptide EA2 (PTTDSTTPAPTTK, which corresponds to the tandem repeat sequence of rat submandibular gland apomucin (11)), 50 µM total UDP-GalNAc containing UDP-[14C]GalNAc (25,000 cpm), 10 mM MnCl2, 40 mM cacodylate, pH 6.5, 40 mM 2-mercaptoethanol, and 0.1% Triton X-100. All assays were performed in duplicate at 37 °C, as indicated in the figures. One unit of activity was defined as the transfer of 100 pmol of GalNAc to the EA2 peptide per hour under these assay conditions.
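The unit definition can be connected to the raw counts with a little arithmetic. In the 25-µl assay, 50 µM UDP-GalNAc amounts to 1,250 pmol of donor carrying 25,000 cpm, or 20 cpm/pmol, so one unit (100 pmol/h) corresponds to 2,000 cpm/h of incorporated label. The Python sketch below encodes this conversion. It assumes background-subtracted counts, equal counting efficiency for the donor stock and the glycopeptide product, and equal transfer of labeled and unlabeled donor; none of these is stated in the source, and the function names are hypothetical.

```python
def donor_cpm_per_pmol(total_cpm: float, donor_uM: float, volume_ul: float) -> float:
    """Specific radioactivity of the UDP-GalNAc donor, in cpm per pmol.

    1 µM equals 1 pmol/µl, so µM × µl gives pmol of donor in the assay.
    """
    donor_pmol = donor_uM * volume_ul
    return total_cpm / donor_pmol


def activity_units(cpm_per_h: float, cpm_per_pmol: float,
                   pmol_per_unit: float = 100.0) -> float:
    """Convert incorporated label (cpm/h) to units, where 1 unit = 100 pmol GalNAc/h.

    Assumes counts are background-subtracted and counted with the same
    efficiency as the donor stock.
    """
    pmol_per_h = cpm_per_h / cpm_per_pmol
    return pmol_per_h / pmol_per_unit


if __name__ == "__main__":
    sa = donor_cpm_per_pmol(total_cpm=25_000, donor_uM=50, volume_ul=25)
    print(f"{sa:.1f} cpm/pmol")                      # 25,000 cpm / 1,250 pmol -> 20.0
    print(f"{activity_units(2_000, sa):.2f} units")  # 2,000 cpm/h -> 1.00
```

Dividing the result by the 5 µl of medium assayed would express activity in units per µl of medium.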
To compare the substrate preferences of the ppGaNTase-T4 and ppGaNTase-T1 isoforms, a panel of peptide substrates was screened using anti-FLAG™ affinity-purified recombinant ppGaNTase-T4 and -T1 enzymes. Briefly, clarified cell culture medium (1.5 ml) from COS7 cells transfected with either pF1-mT1 or pF1-mT4 was incubated overnight at 4 °C with gentle rocking with anti-FLAG™ M2 affinity gel (Eastman Kodak Co.; 150 µl of a 50% (v/v) slurry). Bound material was eluted with 4 mM FLAG™ peptide dissolved in 75 µl of 50 mM NaCl, 50% glycerol, 50 mM sodium cacodylate buffer, pH 6.5. The peptide substrates were EA2; Muc 1a (APPAHGVTSAPDTRPAPGC) and Muc 1b (PDTRPAPGSTAPPAC) (12), derived from the protein core of the human Muc 1 mucin-type glycoprotein; Muc 2 (PTTTPISTTTMVTPTPTPTC) (13), derived from human intestinal apomucin; RMUC-176 (TTTPDV) (14), derived from rat small intestine/colonic apomucin; EPO-T (PPDAATAAPLR) and EPO-S (PPDAASAAPLR) (5), derived from human erythropoietin; rMUC-2 (SPTTSTPISSTPQPTS) (15), derived from rat small and large intestinal apomucin; AWN1a (AIPPLNLSCGKE) and AWN1b (KRLPTSGHPASP) (16), derived from porcine spermadhesin AWN; and MCP2 (STSSSTTKSPASSAS) (17), derived from human testis membrane cofactor protein. Assays were conducted in duplicate using enzymes derived from the same transfection. To verify that ppGaNTase-T1 and ppGaNTase-T4 have different substrate specificities and that the recombinant preparations do not contain differing levels of transferase inhibitors, mixing experiments were performed in which sugar transfer by each isoform alone was compared with that obtained when the two isoforms were mixed.
Northern Blot Analysis
Following electrophoresis, mouse total RNA samples were transferred to Hybond-N membranes (Amersham Corp.). A 117-bp segment of the ppGaNTase-T4 cDNA (nucleotide positions 562-679, Fig. 2) was labeled by asymmetric PCR (18) using the antisense oligonucleotide d(CCAGGAAAGTGAGGACA) and used as a probe for the ppGaNTase-T4 transcript. ppGaNTase-T1 message was detected by the same method, using cDNA sequence encoding aa positions 515-559 of ppGaNTase-T1 and the oligonucleotide d(AAGAAAGGATTGACTGGGCTAC). The antisense 18 S ribosomal RNA oligonucleotide d(TATTGGAGCTGGAATTACCGCGGCTGCTGG) was end-labeled (19) and hybridized at a 5-fold molar excess of probe to normalize for sample loading. All hybridizations were performed in 5× SSPE/50% formamide at 42 °C, with two final washes in 2× SSC/0.1% SDS at 65 °C for 20 min.
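The text states only that the 18 S probe was used to normalize for sample loading. One common way to apply such a loading control, assumed here, is to divide each lane's target signal (for example, from densitometry or phosphorimaging, assumed to lie within the linear range) by the 18 S signal in the same lane and then express every lane relative to a chosen reference lane. The Python sketch below shows that calculation; the function name, lane names, and values are hypothetical.

```python
from typing import Dict


def normalize_to_18s(target: Dict[str, float], rrna: Dict[str, float],
                     reference_lane: str) -> Dict[str, float]:
    """Return each lane's target/18 S ratio, scaled so the reference lane equals 1.0."""
    # Loading correction: divide the target band by the 18 S band from the same lane.
    ratios = {lane: target[lane] / rrna[lane] for lane in target}
    # Express every lane relative to the chosen reference lane.
    ref = ratios[reference_lane]
    return {lane: ratio / ref for lane, ratio in ratios.items()}


if __name__ == "__main__":
    # Hypothetical signal values, for illustration only.
    target = {"lane_a": 300.0, "lane_b": 40.0}
    rrna = {"lane_a": 150.0, "lane_b": 100.0}
    print(normalize_to_18s(target, rrna, reference_lane="lane_b"))
```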
Primary sequence alignments of three different isoforms of the ppGaNTase family from different mammals and of putative transferases from C. elegans demonstrated that the conserved region of this enzyme family spans approximately 420 amino acids (consensus aa 151-570, Fig. 3). This 420-aa region contains 13 segments of 7-12 amino acids that are highly conserved (>70% similarity) among the three mammalian isoforms and the three C. elegans homologues (Fig. 1A). Two of these short blocks of nearly invariant sequence, AGGLF at consensus aa 367 and CH(G/N)XGGNQ at consensus aa 543 (Fig. 3), were used to design degenerate PCR primers (Fig. 1, primers 1 and 2). Reverse transcriptase-PCR of 14 different RNA sources produced 540-bp PCR products in all samples tested (data not shown). We continued the analysis with the PCR products from spleen because our prior work had shown that spleen expresses a low level of ppGaNTase-T1 transcript but high levels of ppGaNTase enzyme activity (20). The PCR products were subcloned to create a library in M13, and the resulting clones were rescreened with internal nested primers (Fig. 1A, primers 3, 4, 5, and 6) to eliminate false positive clones.
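The two primer-binding blocks can be located in a candidate protein sequence with regular expressions, writing X as a wildcard and (G/N) as a character class. The Python sketch below is an illustrative assumption, not a tool used in the study: it finds AGGLF and the first downstream CH(G/N)XGGNQ block and converts the residue span into an approximate coding-sequence length. An ~180-aa spacing corresponds to roughly 540 bp (180 × 3), the size of the observed product.

```python
import re
from typing import Optional, Tuple

# Conserved blocks used for primer design, written as regular expressions:
# X (any residue) becomes ".", and (G/N) becomes the character class [GN].
SENSE_MOTIF = re.compile(r"AGGLF")
ANTISENSE_MOTIF = re.compile(r"CH[GN].GGNQ")


def amplicon_estimate(protein: str) -> Optional[Tuple[int, int]]:
    """Return (aa_span, bp_span) from the start of AGGLF to the end of the first
    downstream CH(G/N)XGGNQ block, or None if either block is missing."""
    first = SENSE_MOTIF.search(protein)
    if first is None:
        return None
    # Search for the antisense block only downstream of the sense block.
    second = ANTISENSE_MOTIF.search(protein, first.end())
    if second is None:
        return None
    aa_span = second.end() - first.start()
    return aa_span, aa_span * 3  # three nucleotides per codon


if __name__ == "__main__":
    # Synthetic test sequence, not a real ppGaNTase.
    toy = "M" + "A" * 10 + "AGGLF" + "S" * 20 + "CHNAGGNQ" + "K"
    print(amplicon_estimate(toy))
```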
Based on DNA sequencing results, the degenerate PCR primers and oligonucleotide probes were successful in identifying all known ppGaNTase isoforms (ppGaNTase-T1, -T2, and -T3) from a variety of tissue sources (data not shown), thus validating the approach. In addition, a PCR fragment lacking nucleic acid sequence identity with ppGaNTase-T1, -T2, or -T3 was detected among the amplified fragments derived from spleen RNA. This fragment was used to probe a mouse spleen cDNA library (1 × 10⁶ plaques).
cDNA Cloning of ppGaNTase-T4 and Sequence Analysis
Of the 34 positive clones obtained, one contained a 1734-bp open reading frame encoding a 578-amino acid protein (Fig. 2). Conceptual translation of the ppGaNTase-T4 message revealed a protein with the type II membrane protein architecture typical of the ppGaNTase family: a 12-aa N-terminal cytoplasmic segment, a 22-aa hydrophobic transmembrane domain, and a 544-aa C-terminal luminal domain. Potential N-glycosylation sites are found at Asn466 and Asn471. Alignments generated with the Clustal method showed that this protein is distinct from the ppGaNTase-T1, -T2, and -T3 isozymes yet shares sequence similarity with them between consensus aa positions 150 and 570 (Fig. 3). BestFit analysis revealed that the overall amino acid sequence similarity over this 420-aa region is 69, 67, and 68% when mouse ppGaNTase-T4 is compared with mouse ppGaNTase-T1, human ppGaNTase-T2, and mouse ppGaNTase-T3, respectively. The 100-aa segment from consensus positions 358 to 457 is highly conserved, as is a 26-aa span beginning at consensus position 185 (Fig. 3). A search of the combined GenBank™/NCBI databases did not reveal any genes or expressed sequence tags identical to ppGaNTase-T4.
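Two of the numbers above can be checked by arithmetic: 12 + 22 + 544 = 578 residues accounts for the whole protein, and 578 codons × 3 = 1734 bp, so the stated open reading frame length does not include the stop codon. Potential N-glycosylation sites such as Asn466 and Asn471 are conventionally flagged by the N-X-S/T sequon, where X is any residue except Pro. The Python sketch below scans for that motif; it illustrates the standard rule rather than the program used in the study, and a sequon marks only a potential site, not one that is necessarily glycosylated.

```python
import re
from typing import List

# N-X-S/T with X any residue except proline; the lookahead allows overlapping sites.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_positions(protein: str) -> List[int]:
    """Return 1-based positions of Asn residues that begin a potential N-glycosylation sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


if __name__ == "__main__":
    # Synthetic example: N2-G-S qualifies, N6-P-T is excluded by the Pro rule, N10-A-T qualifies.
    print(sequon_positions("MNGSANPTANATQ"))
```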
Functional Expression
A truncated coding region of the putative transferase, beginning with the first alanine (Ala34) codon following the N-terminal transmembrane domain, was cloned downstream of a FLAG™ epitope tag and insulin secretion signal to create the vector pF1-mT4 (Fig. 4). The expressed product was secreted into the culture medium of transfected COS7 cells and was enzymatically active. A low background of activity, equal to 1% of the wild-type ppGaNTase-T1 activity, was detected in the culture medium of mock-transfected cells. After each recombinant enzyme was enriched by immunoadsorption on anti-FLAG™ affinity matrix, similar units of ppGaNTase-T1 and ppGaNTase-T4 activity, normalized using the EA2 peptide, were compared. Recombinant ppGaNTase-T1 glycosylated 6 of a diverse set of 11 peptide substrates, whereas ppGaNTase-T4 displayed a marked preference for EA2, a peptide derived from the tandem repeat sequence of rat submandibular gland mucin (Fig. 5). ppGaNTase-T4 showed limited activity (252 cpm/h) on five additional substrates.
To confirm that ppGaNTase-T1 and ppGaNTase-T4 have different substrate specificities, and to rule out the possibility that the observed differences are due to specific inhibitors of transferase activity in either enzyme preparation, we performed a mixing experiment under conditions in which each reaction was linear (data not shown). As summarized in Table I, the activities of the two isoforms are additive (within a 10% error margin): the calculated sum of the ppGaNTase-T1 and -T4 activities per µl of enzyme preparation matched the activity measured when both isoforms were added together.
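The additivity criterion behind Table I can be written as a one-line test: the rate measured for the mixture should fall within 10% of the sum of the two individual rates, each expressed per µl of enzyme preparation and measured in the linear range. The Python sketch below applies that criterion; the function name and the example rates are hypothetical.

```python
def is_additive(rate_t1: float, rate_t4: float, rate_mix: float,
                tolerance: float = 0.10) -> bool:
    """True if the mixture's rate lies within `tolerance` (a fraction) of the sum of the
    individual rates, as expected when neither preparation inhibits the other enzyme."""
    expected = rate_t1 + rate_t4
    return abs(rate_mix - expected) <= tolerance * expected


if __name__ == "__main__":
    # Hypothetical rates (e.g., cpm/h per µl), for illustration only.
    print(is_additive(100.0, 50.0, 140.0))  # within 15 of the expected 150
    print(is_additive(100.0, 50.0, 120.0))  # 30 below the expected 150
```

A mixture rate well below the expected sum would instead point to an inhibitor carried over in one of the preparations.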
The size and tissue distribution of the murine ppGaNTase-T4 and -T1 messages clearly differ (Fig. 6). The predominant ppGaNTase-T4 transcript was 5.5 kb; minor transcripts of 2.8 and 1.7 kb were also detected. The most abundant ppGaNTase-T4 signal was observed in the mouse sublingual gland, stomach, colon, small intestine, and cervix. Intermediate levels of message were seen in kidney, ovary, lung, and uterus. Low levels of transcript were observed in spleen, with trace amounts detected in liver, heart, and brain. No signal was detected in the submandibular and parotid glands, skeletal muscle, or testis. In contrast, the ppGaNTase-T1 message is expressed as a doublet at 3.6 and 4.2 kb, at varying levels, in all the tissues surveyed.
To help define the repertoire of ppGaNTases in a tissue, we have devised a PCR-based strategy that employs a series of oligonucleotides derived from conserved sequences within a 420-aa region of the ppGaNTase enzyme family. PCR products are amplified with one set of oligonucleotide primers, cloned into an M13 vehicle, and then screened with a second set of degenerate oligonucleotides nested within the original primers. This strategy yields an M13 library of the PCR products and a means of discriminating among clones that encode different ppGaNTase isoforms and false PCR products. Unlike Bennett et al. (8), we did not use restriction enzyme cleavage patterns to identify novel isoforms, because some restriction recognition sequences were conserved among different isoforms, whereas other sites were not present in all mammalian species. Using this approach, we identified products corresponding to the previously described ppGaNTase-T1, -T2, and -T3, as well as the ppGaNTase-T4 described herein. A search of the GenBank™/NCBI database found no identical matches, indicating that this isoform is novel. We have also identified transcripts that encode additional putative transferases expressed in the rat sublingual gland.2 We are currently determining whether these species encode active enzymes.
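The nested screen can be mimicked in silico to see which conserved oligonucleotides a sequenced clone would be expected to match. The Python sketch below converts each IUPAC probe into a regular expression and searches both strands of a clone. Exact pattern matching is only a rough stand-in for plaque hybridization, whose tolerance for mismatches depends on stringency, and treating inosine as matching any base is an assumption; the function names are hypothetical.

```python
import re

# Regular-expression class for each IUPAC code; inosine (I) is treated as
# matching any base, an assumption for this in-silico proxy.
IUPAC_CLASS = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]", "I": "[ACGT]",
}


def probe_pattern(probe: str):
    """Compile a degenerate IUPAC probe into a regular expression."""
    return re.compile("".join(IUPAC_CLASS[b] for b in probe.upper()))


def reverse_complement(seq: str) -> str:
    """Reverse complement of a plain A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def hits(clone: str, probes: dict) -> list:
    """Names of probes whose pattern occurs on either strand of `clone`."""
    strands = (clone.upper(), reverse_complement(clone.upper()))
    return [name for name, probe in probes.items()
            if any(probe_pattern(probe).search(s) for s in strands)]


if __name__ == "__main__":
    probes = {"primer 4": "GTNGGNCAYGTNTTYMG", "primer 3": "GARATHTGGGGNGGNGARAA"}
    # Synthetic clone fragment matching primer 4 only.
    print(hits("AAAA" + "GTAGGCCATGTGTTCCG" + "TTTT", probes))
```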
All four members of the ppGaNTase family share a type II membrane architecture, consisting of a short (4-19 aa) N-terminal cytoplasmic segment, a 19-24-aa transmembrane domain, a stem region of variable length, and a C-terminal luminal catalytic domain that makes up most of the coding region. Although there is no obvious primary sequence similarity in the N-terminal transmembrane and stem regions of these enzymes, the sequence of the Golgi luminal domain is highly conserved within a block of approximately 420 aa (consensus aa 151-570, Fig. 3). This region is considerably larger than the putative 61-aa "GalNAc-T Motif" previously reported by Bennett et al. (8). Because sequence divergence within this 420-aa region is low among the known ppGaNTase isoforms, sequence comparison alone does not identify the residues likely to be involved in substrate binding or catalysis.
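The text does not say how the hydrophobic transmembrane segments were assigned. One widely used approach, swapped in here as an assumption, is a Kyte-Doolittle hydropathy scan: average the hydropathy values over a sliding window (19 residues is the commonly cited choice for membrane-spanning helices) and flag windows whose mean exceeds about 1.6. The Python sketch below implements that scan; it is a screening heuristic rather than a topology predictor, and the function names are hypothetical.

```python
from typing import List

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 19) -> List[float]:
    """Mean hydropathy of every full window; entry i is the window starting at residue i + 1."""
    values = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def candidate_tm_starts(protein: str, window: int = 19,
                        threshold: float = 1.6) -> List[int]:
    """1-based start positions of windows whose mean hydropathy exceeds `threshold`."""
    return [i + 1 for i, h in enumerate(hydropathy_profile(protein, window))
            if h > threshold]


if __name__ == "__main__":
    # Synthetic type II-like sequence: charged N terminus, 22-residue hydrophobic
    # stretch (residues 10-31), then a polar tail.
    toy = "MRKRKDEKR" + "L" * 22 + "DEKRDEKRSTNQ" * 3
    print(candidate_tm_starts(toy))
```

Applied to a ppGaNTase sequence, a run of high-scoring windows near the N terminus would be expected to coincide with the 19-24-aa transmembrane domain described above.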
When partially purified recombinant ppGaNTase-T1 and -T4 were compared for their ability to incorporate GalNAc into a panel of peptide substrates, only EA2 (PTTDSTTPAPTTK) proved to be a good acceptor for ppGaNTase-T4, which transferred GalNAc to EA2 at a rate approximately 12 times that observed for the single-site substrate EPO-T. In contrast, ppGaNTase-T1 transferred GalNAc to EA2 at only twice the rate observed for EPO-T (Fig. 5). To verify that ppGaNTase-T1 and -T4 have different substrate specificities, we took advantage of their apparent preferences for EA2 and EPO-T and performed mixing experiments under conditions in which each reaction was linear. The activities of the transferases in a mixture were additive, which rules out the possibility that inhibitors in either enzyme preparation account for the observed differences in substrate specificity.
The patterns of transcript expression of the known ppGaNTase isoforms differ. Based upon our work (Refs. 9 and 20 and the present study) and that of others (8), it appears that ppGaNTase-T1 and -T2 are expressed ubiquitously, -T3 in a more limited range of tissues, and -T4 in a still more restricted set. Perhaps the pattern of expression reflects the diversity of endogenous substrates that must be O-glycosylated. For example, the rat submandibular gland, which expresses a relatively simple apomucin (11), expresses relatively low levels of ppGaNTase-T1 and -T3 transcripts and no -T4. In contrast, the rat sublingual gland, which produces a much larger apomucin,3 expresses at least seven forms of ppGaNTase transcript (i.e. ppGaNTase-T1, -T2, -T3, and -T4 plus three as yet uncharacterized forms identified with the PCR approach described here), with very high levels of ppGaNTase-T4 noted (Fig. 5). Thus, it is conceivable that more than one enzyme is required to fully glycosylate the complete repertoire of mucin-type substrates expressed in a given cell. This likely accounts for the differences observed in the tissue-specific expression of the various ppGaNTases identified to date.
The nucleotide sequences reported in this paper have been submitted to the GenBank™/EMBL Data Bank with accession numbers U73820 (mouse ppGaNTase-T1) and U73819 (mouse ppGaNTase-T4).