(Received for publication, May 22, 1995; and in revised form, August 21, 1995)
From the
Monoclonal antibodies were raised to membrane-bound proteoglycans derived from rat brain, and four monoclonal antibodies that recognized a 150-kDa chondroitin sulfate proteoglycan with a core glycoprotein of 120 kDa were obtained. Immunohistological study revealed that the proteoglycan was associated with developing neurons. We screened rat brain cDNA libraries using the four monoclonal antibodies and isolated overlapping cDNA clones that encoded the entire core protein of 514 amino acids plus a 30-residue signal peptide. The deduced amino acid sequence suggested an integral membrane protein divided into five structurally different domains: an N-terminal domain to which chondroitin sulfate chains might be attached, a basic amino acid cluster consisting of seven arginine and two lysine residues, a cysteine-containing domain, a membrane-spanning segment, and a C-terminal cytoplasmic domain of 95 amino acids. On Northern blots, the cDNA hybridized with a single mRNA of 3.1 kilobases that was detectable in brains of neonatal and adult rats but not in kidney, liver, lung, and muscle of either. The sequence of the proteoglycan did not exhibit significant homology to any other known protein, indicating that the proteoglycan, designated neuroglycan C, is a novel integral membrane proteoglycan.
Proteoglycans are molecules in which glycosaminoglycan chains are covalently linked to core glycoproteins. They are major glycoconjugates, located at the cell surface and in extracellular spaces, and they are considered to have various regulatory effects on cells by binding to chemotrophic factors, cell adhesion molecules, and extracellular matrix glycoproteins (1, 2). Early biochemical studies of the mammalian brain revealed the presence of a varied set of proteoglycans whose expression is regulated precisely during brain development (3, 4). The developmentally regulated expression of such molecules has been implicated in the formation of several complicated structures in the brain, such as the mossy fiber (5), visual (6), and cortical barrel systems (7, 8, 9). Studies in vitro have also shown that neuronal proteoglycans are involved in the morphogenesis of neurons, for example, in neuritogenesis. They are capable of exerting both positive (10, 11) and negative (12, 13, 14, 15) effects on the outgrowth of neurites in vitro.
Our understanding of the functions of neuronal proteoglycans is expanding as the primary structures of core proteins are elucidated. Such analyses have revealed that a number of proteoglycan core proteins share similar sequence motifs and can be grouped into several families. Neurocan (16), brevican (17), versican (18), and BEHAB (19) are members of a soluble chondroitin sulfate proteoglycan family; their sequences include tandem repeats of a hyaluronan-binding domain. N-Syndecan (20) is a member of the syndecan family of heparan sulfate proteoglycans, which have extensively similar transmembrane and cytoplasmic domains. Moreover, it was recently revealed that a chondroitin sulfate proteoglycan, phosphacan/6B4 proteoglycan, is an extracellular variant of a receptor-like protein-tyrosine phosphatase, RPTPζ/β (21, 22). Although primary structures of a number of core proteins have been reported, more proteoglycans in the brain remain to be characterized. In particular, the complete primary structures of only three membrane-bound proteoglycans have been reported, namely NG2 (23), glypican (24), and cerebroglycan (25), although there appear to be at least 16 core proteins in the membrane-bound fraction of the rat brain (4). Therefore, we attempted to raise monoclonal antibodies against membrane-bound proteoglycans and to isolate cDNA clones that encoded a proteoglycan core protein.
Here, we report a novel membrane-spanning chondroitin sulfate proteoglycan that is recognized by a panel of monoclonal antibodies. The predicted protein is an integral membrane protein containing a signal peptide, a chondroitin sulfate-attachment domain, a cluster of basic amino acids, a cysteine-containing domain, a 24-amino acid transmembrane domain, and a cytoplasmic tail. The amino acid sequence of the core protein exhibits little similarity to other known proteins, indicating that the proteoglycan is a novel species of integral membrane proteoglycan. We propose the name neuroglycan C (NGC, from neuronal proteoglycan with chondroitin sulfate) for this proteoglycan.
The proteoglycan fraction (eluted by 0.45-0.68 M NaCl) was pooled and concentrated to 5 ml on a Diaflo YM-10 membrane (Amicon Corp., Danvers, MA). The concentrated solution was chromatographed on a column (1.6 cm, inner diameter, × 100 cm) of Sepharose CL-4B (Pharmacia) in 4 M guanidine HCl, 50 mM Tris-HCl, pH 7.5, that contained 0.2% Nonidet P-40 and the protease inhibitors. The proteoglycan fraction (Kav from 0.11 to 0.65) was pooled and concentrated to 3 ml on a Diaflo YM-10 membrane. Proteoglycans were precipitated from the solution by addition of 3 volumes of 95% ethanol that contained 1.3% (w/v) potassium acetate. The precipitate was washed twice with 75% ethanol that contained 1% (w/v) potassium acetate at 0 °C, and then it was dried over P2O5 in vacuo.
The dried material was dissolved in 10 ml of 4 M guanidine HCl, 50 mM Tris-HCl, pH 7.5 (guanidine HCl buffer), at room temperature. The solution was applied to a column (1.6 cm, inner diameter, × 5.0 cm) of octyl-Sepharose (Pharmacia). Elution was performed with a linear gradient of Nonidet P-40 from 0 to 0.8% (v/v) in 100 ml of the guanidine HCl buffer. Fractions eluted at a detergent concentration from 0.3 to 0.6% were pooled, and proteoglycans in the pooled fraction were precipitated with ethanol and dried as described above.
The proteoglycan preparation was further purified by ultracentrifugation in a CsCl density gradient at an initial density of 1.38 g/ml in 4 M guanidine HCl, 50 mM Tris-HCl, pH 7.5, at 10 °C in an RPS-65T rotor (Hitachi, Tokyo) at 150,000 × g for 30 h. After centrifugation, the sample was collected as seven fractions. Fractions numbered 2 to 6 were pooled and dialyzed against PBS at 4 °C. The amounts of hexuronate and protein in the proteoglycan solution were measured by the method of Bitter and Muir (26) and a protein assay kit (Bio-Rad), respectively. This purification procedure usually yielded about 3 µmol of hexuronate (2.2 mg of protein) from 100 brains.
Soluble proteoglycans were extracted from 10-day-old rat brains with PBS that contained protease inhibitors and purified as described previously (3). In brief, proteoglycans were purified sequentially by column chromatography on DEAE-Sepharose and by gel filtration on Sepharose CL-4B. Fractions with Kav that ranged from 0.45 to 0.60 were utilized for Western blot analysis.
Figure 2: Spatiotemporal patterns of expression of NGC in the rat cerebral cortex. A and B, coronal sections of the cerebral cortex of a 0-day-old (A) and a 7-day-old (B) rat were immunostained with mAb C5. Immunopositive material associated with neurons can be seen throughout the cortex. Scale bar = 100 µm. C, developmental changes in the relative amounts of the core glycoprotein of the novel proteoglycan. The intensities of the immunolabeled bands of protein on a Western blot were quantified by densitometry. The levels are shown as percentages of the level on P20.
The purified 120-kDa core protein was subjected to amino acid sequence analysis. Fragments of the 120-kDa core protein were obtained by treatment with cyanogen bromide (CNBr) as described by Scott et al.(28) . The intact core protein and the fragments were separated by SDS-PAGE, as described above, and transferred to a PVDF membrane at 0.3 A for 3 h at 4 °C in 10 mM CAPS buffer (pH 11) that contained 10% methanol. After staining with 0.5% Ponceau Red in 1% acetic acid, bands of protein were excised from the membrane, and proteins were sequenced on an automatic protein sequencer (PPSQ-10; Shimadzu, Kyoto, Japan).
Figure 3: Sequence of the cDNA and the deduced amino acid sequence of rat NGC. The single-letter code for amino acids is used. The putative transmembrane domain is shown by dashed underlining. Deduced sequences corresponding to directly determined sequences are underlined. A cluster of basic amino acid residues is indicated by double underlining. Cysteine residues are circled. Potential sites of N-linked glycosylation are indicated by filled triangles. Potential sites of glycosaminoglycan attachment are boxed. Potential sites of phosphorylation by protein kinase C in the cytoplasmic domain are indicated by open triangles.
Figure 1: Western blot of a partially purified preparation of membrane-bound proteoglycans and a preparation of PBS-soluble proteoglycans from brains of 10-day-old rats. The preparation of partially purified membrane-bound proteoglycans (A1) was digested with chondroitinase ABC (A2), chondroitinase ABC plus heparitinase I (A3 and B1), neuraminidase (B2), O-glycanase (B3), or N-glycanase (B4). In B, samples were digested sequentially with these glycosidases. Samples (2 nmol of uronic acid for the preparation of intact membrane-bound proteoglycans (A1), and 0.2 nmol for others) were subjected to electrophoresis on a 6% polyacrylamide gel, blotted onto a PVDF membrane, and stained with mAb C5. The positions of molecular mass markers are indicated in kDa.
To examine the possibility that the proteoglycan might have oligosaccharide side chains in addition to chondroitin sulfate chains, the core glycoprotein was digested sequentially with three glycosidases, as shown in Fig. 1B, and the molecular mass of the deglycosylated core protein was estimated by SDS-PAGE. Each treatment with glycosidase resulted in an increase in the mobility of the band of the core protein on SDS-PAGE. The ability of mAb C5 to recognize the core protein was not affected by digestion with any of the glycosidases. Therefore, it appeared that the epitope recognized by mAb C5 was present on the polypeptide moiety.
The spatial expression of the proteoglycan in the rat cerebral cortex was examined by immunohistochemical staining with mAb C5. On postnatal day 0 (P0), immunolabeling with mAb C5 was associated with developing neuronal cells and was seen throughout the cerebral cortex (Fig. 2A). A similar pattern of staining was seen in the cerebral cortex on P7 (Fig. 2B). As the brain matured, the intensity of immunostaining of the cortex decreased (data not shown). A similar spatial expression of the proteoglycan was also observed when sections were stained with the other mAbs, C1, C3, and C15. Fresh medium containing normal mouse IgG (5 µg/ml, Sigma) was used in place of mAb solutions as a negative control for immunostaining. The control sections were virtually free of immunolabeling.
To quantify the temporal expression of the proteoglycan in the cerebral cortex, PBS-insoluble extracts of the cerebral cortex at various developmental stages (from embryonic day 12 (E12) to adulthood) were digested with chondroitinase ABC and processed for immunoblotting, and the intensity of the immunolabeled band was quantified. A small amount of the proteoglycan was detected on E16 through E18, and the amount of the proteoglycan increased to reach a maximum level around P20 (Fig. 2C). After P20, the amount of the proteoglycan component decreased, and in the mature cerebral cortex this proteoglycan was expressed at approximately half of the peak level of expression. These biochemical data confirmed our immunohistochemical data.
One positive clone (λNGC1) from the λgt11 cDNA library (random primers) derived from a rat brain on P18 and another positive clone (λNGC3) from the λgt11 cDNA library (oligo(dT) primers) derived from a rat brain on P8 were initially isolated by immunoscreening with a mixture of the four mAbs, C1, C3, C5, and C15. λNGC1 and λNGC3 contained inserts of 682 bp (nucleotides 108-790 in Fig. 3) and 1,398 bp (nucleotides 710-2,107), respectively. The fusion protein with β-galactosidase from λNGC1 reacted with only mAb C1 and not with mAbs C3, C5, and C15. By contrast, the fusion protein from λNGC3 reacted with mAbs C3, C5, and C15 but not with mAb C1. The inserts of λNGC1 and λNGC3 were subcloned into the pBluescript II plasmid vector for further analyses (to yield pNGC1 and pNGC3, respectively). The authenticity of these clones was unequivocally established by identification of an amino acid sequence from the 120-kDa core glycoprotein and the 24-kDa CNBr-generated fragment within the amino acid sequence deduced from the insert in pNGC1 and by that from the 15-kDa CNBr fragment within the amino acid sequence deduced from the insert in pNGC3 (see Fig. 3, underlining).
The insert in pNGC1 was labeled with [32P]dCTP and used as a probe to identify overlapping cDNA clones. Four overlapping cDNA clones (λNGC7 (nucleotides 339-1,219 in Fig. 3), λNGC13 (nucleotides >694; 1.6 kb), λNGC15 (nucleotides 205-683), and λNGC19 (nucleotides >752; 2.2 kb)), obtained from the λgt11 cDNA library (random primers), covered the entire coding region of the proteoglycan, which we designated NGC.
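The way the overlapping clones tile the cDNA can be illustrated with a short interval-merging sketch. The coordinates are the nucleotide ranges quoted above for NGC1, NGC3, NGC7, and NGC15 (clone names are written without their phage prefix); NGC13 and NGC19 are left out because only their start positions are given. The function name `merge_intervals` is hypothetical, and the sketch only checks that the stated ranges form one contiguous stretch. It is not a reconstruction of the authors' assembly procedure.

```python
def merge_intervals(intervals):
    """Merge overlapping inclusive (start, end) intervals into contiguous stretches."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous stretch: extend it if this clone reaches further.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Nucleotide ranges as quoted in the text (positions refer to Fig. 3).
clones = {
    "NGC1": (108, 790),
    "NGC3": (710, 2107),
    "NGC7": (339, 1219),
    "NGC15": (205, 683),
}

contigs = merge_intervals(clones.values())
print(contigs)
```

A single merged interval indicates that these four clones overlap without gaps across the region they span.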
With the same probe, mRNA for NGC was detected by Northern blot analysis (Fig. 4). A single transcript of 3.1 kb was detected in analyses of brains of 7-day-old and adult rats. This transcript was not detected in analyses of kidney, liver, lung, and muscle.
Figure 4: Northern blot of mRNAs from brains of 7-day-old (B7) and adult (Ba) rats and from kidney (K), liver (Li), lung (Lu), and muscle (M) of adult rat. Samples (5 µg of poly(A)+ RNA) were subjected to electrophoresis on a 1% agarose gel that contained 2.2 M formaldehyde, blotted onto a PVDF membrane, and hybridized with the 32P-labeled insert of λNGC1. NGC mRNA was detected in the rat brain. This transcript was undetectable in the case of all non-neural tissues, even after prolonged autoradiography. The positions of 28 and 18 S ribosomal RNA are indicated on the left for reference.
The size of the open reading frame was confirmed by reverse transcriptase-mediated PCR. All of the DNA fragments amplified with multiple pairs of primers (see "Experimental Procedures") had the sizes predicted from the cDNA sequence. Primers 1 and 2 used for reverse transcriptase-mediated PCR are positioned on the N-terminal sequence of the intact core glycoprotein determined by amino acid sequence analysis. In addition, primer 5 is positioned on the predicted C-terminal sequence. Therefore, the size of the open reading frame shown in Fig. 3 can be considered to be appropriate. After removal of oligosaccharide side chains, however, the estimated molecular mass of the core protein was 100 kDa (Fig. 1), which is still considerably larger than the calculated molecular mass (55.8 kDa) of the mature core protein with 514 amino acid residues encoded by the cloned cDNA. This discrepancy might be attributable to the anomalously slow electrophoretic migration of glycosylated proteins that results from decreased binding of SDS (35), since after digestion by the glycosidases some oligosaccharides might still remain on the core protein. Differences between actual and apparent molecular masses have been reported for the core proteins of other proteoglycans, such as syndecan (36), versican (18, 37), and neurocan (16).
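The size of this discrepancy can be put in rough numbers. The sketch below assumes a generic mean residue mass of about 110 Da, a common rule of thumb that is not a value taken from this study, so it only approximates the calculated 55.8 kDa. The function name `approx_mass_kda` is hypothetical.

```python
# Assumption: generic average mass per amino acid residue (rule of thumb, not from the paper).
MEAN_RESIDUE_MASS_DA = 110.0


def approx_mass_kda(n_residues, mean_residue_da=MEAN_RESIDUE_MASS_DA):
    """Rough polypeptide mass in kDa from the residue count alone."""
    return n_residues * mean_residue_da / 1000.0


# Mature core protein: 514 residues (from the text).
estimate_kda = approx_mass_kda(514)

# Apparent mass on SDS-PAGE after deglycosylation (100 kDa) versus the
# mass calculated from the sequence (55.8 kDa), both from the text.
apparent_to_calculated = 100.0 / 55.8

print(f"rule-of-thumb estimate: {estimate_kda:.1f} kDa")
print(f"apparent/calculated ratio: {apparent_to_calculated:.2f}")
```

Under this assumption the rule-of-thumb estimate falls close to the calculated value, while the apparent mass on SDS-PAGE is nearly twice as large.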
The core protein of NGC was rich in glycine (10.5%), leucine (9.9%), proline (9.0%), glutamic acid (8.1%), serine (7.9%), and threonine (7.7%) residues. The calculated pI is 4.9. We found a total of 10 cysteine residues in the core protein. All of them were localized in the central domain, just outside the transmembrane domain. The predicted extracellular domain of the NGC core protein contained three potential sites of N-glycosylation (38) and three serine-threonine clusters (residues 143-144, 188-189, and 271-272) that could serve as acceptors for O-linked carbohydrates (39) .
The N-terminal region, from amino acid residues 31-281, contained eight serine-glycine (SG) or glycine-serine (GS) dipeptide sequences. These dipeptides have been proposed to be the core of the consensus sequence for chondroitin sulfate attachment sites (18, 40). The amino acid sequences around the eight serine residues differed from the consensus sequences for attachment sites for glycosaminoglycans, which are SGXG (40) and (E/D)GSG(E/D) (18). However, these consensus sequences are not the only ones that allow attachment of glycosaminoglycans to core proteins. In neurocan, for example, the sequence around a serine residue to which a chondroitin sulfate chain is attached is EEVASGQED (16). The alignment of amino acids characteristic of attachment sites for chondroitin sulfate in reported proteoglycans, such as aggrecan, versican, syndecan, decorin, and collagen type IX, is SG/GS with preceding and following acidic amino acid residues. The importance of acidic amino acid residues in the attachment of chondroitin sulfate has also been noted elsewhere (18, 40, 41). The eight SG/GS dipeptides of the NGC core protein were arrayed in sequence with this consensus-type arrangement. NGC contained two additional dipeptide sequences (serine residues at positions 341 and 374) in its putative extracellular domain. However, these dipeptides were not associated with this consensus-type organization. Therefore, the N-terminal domain is putatively defined as the chondroitin sulfate attachment domain.
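The consensus-type arrangement described above (an SG or GS dipeptide with acidic residues on both sides) can be expressed as a simple sequence scan. The sketch below is illustrative only: the function name `find_sg_gs_serines` and the flanking window of four residues are assumptions, not criteria stated in the paper. It is applied to the neurocan sequence EEVASGQED quoted in the text.

```python
ACIDIC = set("DE")  # aspartic acid and glutamic acid


def find_sg_gs_serines(seq, window=4):
    """Find serines that form an SG or GS dipeptide.

    Returns a list of (position, acidic_before, acidic_after) tuples, where
    position is the 1-based index of the serine and the two flags report
    whether an acidic residue occurs within `window` residues on each side.
    The window size is an assumption made for illustration.
    """
    sites = []
    for i, residue in enumerate(seq):
        if residue != "S":
            continue
        glycine_before = i > 0 and seq[i - 1] == "G"
        glycine_after = i + 1 < len(seq) and seq[i + 1] == "G"
        if not (glycine_before or glycine_after):
            continue
        before = seq[max(0, i - window):i]
        after = seq[i + 1:i + 1 + window]
        sites.append((
            i + 1,
            any(r in ACIDIC for r in before),
            any(r in ACIDIC for r in after),
        ))
    return sites


# Sequence around the chondroitin sulfate-bearing serine of neurocan (from the text).
print(find_sg_gs_serines("EEVASGQED"))
```

A site with both flags set has acidic residues on both sides of the serine, which is the arrangement the text associates with chondroitin sulfate attachment.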
A short stretch of basic amino acids KRRKRRRRIR (residues 282-291) is found (double underlining in Fig. 3) adjacent to this putative chondroitin sulfate-attachment domain. The basic amino acid residues could contribute to the proteolytic processing of this proteoglycan since they correspond to sites of cleavage by serine proteases that might cooperate in the regulation of neurite outgrowth (42, 43, 44, 45, 46, 47) . The soluble type of NGC might be such a proteolytic product since NGC mRNA was detected as a single 3.1-kb transcript on Northern blots of mRNAs from the rat brain (Fig. 4). Moreover, we reported previously the proteolytic processing of another chondroitin sulfate proteoglycan, neurocan(48) .
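The composition of this basic stretch can be checked directly from the sequence given above. This is only a counting sketch; the variable names are illustrative.

```python
from collections import Counter

# Basic cluster as given in the text (residues 282-291).
cluster = "KRRKRRRRIR"

counts = Counter(cluster)
n_arginine = counts["R"]
n_lysine = counts["K"]
n_basic = n_arginine + n_lysine

print(f"length: {len(cluster)}, Arg: {n_arginine}, Lys: {n_lysine}, basic: {n_basic}")
```

The count agrees with the description of the cluster as seven arginine and two lysine residues, with a single isoleucine interrupting the run.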
The cytoplasmic domain of NGC contained threonine and serine residues that could be potential sites of phosphorylation by protein kinases. The amino acid sequences around the threonine residue at position 465 (KLRRTNK) and around the serine residue at position 521 (SPK) are similar to the consensus sequence for the sites of phosphorylation by protein kinase C (X(R/K)XX(T/S)X(R/K)) proposed by Graff et al. (49). Although it remains to be determined whether NGC is phosphorylated, it is possible that NGC might be involved in signal transduction. The putative cytoplasmic domain contains two tyrosine residues, but the sequences adjacent to these tyrosine residues differ from the reported consensus sequences for tyrosine phosphorylation (50). Fig. 5 is a schematic representation of the proposed domain structure of the NGC core protein.
Figure 5: Schematic representation of the structure of the NGC core protein. The structural organization of the NGC core protein is proposed from the deduced amino acid sequence. The putative signal sequence is shown as a black box, a cluster of basic amino acids is shaded with dots, and a transmembrane domain is diagonally striped. Potential sites of glycosaminoglycan attachment are indicated by solid vertical bars, potential sites of N-linked glycosylation are indicated by filled triangles, and potential sites of phosphorylation by protein kinase C in the cytoplasmic domain are indicated by open triangles. The cysteine residues in the predicted core protein are also indicated.
A database search at the amino acid level indicated that small portions of NGC are homologous to regions of the core protein of aggrecan, a large chondroitin sulfate proteoglycan of cartilage (51). All of the homologous fragments are distributed in the putative extracellular domain of NGC. In the putative chondroitin sulfate-attachment domain, we identified eleven homologous clusters: residues 45-94 of NGC and residues 907-956 of aggrecan (32% identity/48% chemical similarity); 26-70 and 967-1011 (29%/56%), 53-103 and 955-1005 (22%/47%), 88-113 and 1585-1610 (42%/54%), 117-151 and 1649-1683 (31%/40%), 43-92 and 1130-1179 (28%/38%), 50-94 and 892-936 (27%/47%), 59-108 and 1233-1282 (22%/44%), 122-163 and 1296-1337 (29%/36%), 109-153 and 1705-1749 (31%/40%), and 90-114 and 1108-1132 (32%/56%). All of the corresponding homologous clusters in aggrecan are localized in the putative chondroitin sulfate-attachment domain. In the cysteine-containing domain of NGC, we found three homologous clusters: 287-322 and 1827-1862 (31% identity/42% chemical similarity), 348-386 and 1898-1936 (23%/38%), and 401-411 and 1998-2008 (55%/55%). All of the corresponding homologous clusters in aggrecan are also localized in one of the cysteine-containing domains of aggrecan. A schematic alignment of homologous sequences in rat aggrecan and NGC is shown in Fig. 6. Aggrecan consists of several structurally different domains, namely an immunoglobulin-like domain and tandem repeats of a hyaluronic acid-binding region in the N-terminal portion, a chondroitin sulfate-attachment domain in the central portion, and epidermal growth factor-like domains, a lectin-like domain, and a complement regulatory protein-like domain in the C-terminal portion (51). Several other chondroitin sulfate proteoglycans, such as versican (18), neurocan (16), brevican (17), and BEHAB (19), also include these domains. Therefore, they are all considered to be members of the aggrecan family. NGC is not a member of this family since the NGC core protein does not have any of these domains except for a chondroitin sulfate-attachment domain.
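For readers unfamiliar with how identity percentages such as those quoted for the homologous clusters are obtained, the following sketch computes percent identity over an ungapped alignment of two equal-length segments. The function name `percent_identity` and the two example peptides are hypothetical and are not taken from NGC or aggrecan. Chemical similarity is not computed here, because it depends on a residue grouping that the text does not specify.

```python
def percent_identity(segment_a, segment_b):
    """Percent of aligned positions with identical residues (ungapped, equal length)."""
    if len(segment_a) != len(segment_b) or not segment_a:
        raise ValueError("segments must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(segment_a, segment_b))
    return 100.0 * matches / len(segment_a)


# Hypothetical peptides for illustration only.
identity = percent_identity("SGEDLPEA", "SGQDLPDA")
print(f"{identity:.1f}% identity")
```

The compared segments in the text are of equal length in both proteins (for example, residues 45-94 and 907-956 each span 50 positions), which is the situation this ungapped comparison assumes.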
Figure 6: Schematic alignment of amino acid sequences in rat aggrecan (52) with those in NGC. Sequences were compared with a local alignment search tool(30) . Central positions of homologous clusters in rat aggrecan and in NGC are connected by lines. CS, CS1, CS2, and CS3, chondroitin sulfate-attachment domains; H1 and H2, hyaluronic acid-binding domains; KS, a keratan sulfate-attachment domain; G(ELC), a globular domain that includes epidermal growth factor-like domains, a lectin-like domain, and a complement regulatory protein-like domain; B, a stretch of basic amino acid residues; G, a globular domain; T, a transmembrane region; CP, a cytoplasmic domain. A scale in amino acid residues is shown.
The nucleotide sequence reported in this paper has been submitted to the GenBank(TM)/EMBL Data Bank with accession number U33553.