(Received for publication, June 21, 1994)
We have isolated and sequenced cDNA clones that encode the core
protein of a PG-M-like proteoglycan produced by cultured mouse aortic
endothelial cells (Morita, H., Takeuchi, T., Suzuki, S., Maeda, K.,
Yamada, K., Eguchi, G., and Kimata, K. (1990) Biochem. J. 265,
61-68). A homology search of the cDNA sequence suggested that
the core protein is a mouse equivalent of chick PG-M(V1), one of the
alternatively spliced forms of the PG-M core protein, which may
correspond to human versican. Northern blot analysis revealed three
mRNA species of 10, 9, and 8 kilobases (kb). Analysis of PG-M mRNA
species in embryonic limb buds and adult brain revealed additional
mRNA species of different sizes: the largest (12 kb) was found in
embryonic limb buds, and smaller ones of 7.5 and 6.5 kb were found in
adult brain. Sequencing of cDNA clones for the smaller forms in adult
brain showed that they differed from PG-M(V1) in encoding only the
second chondroitin sulfate attachment domain (CSα). The detection of
PCR products spanning the junction of the first and second chondroitin
sulfate attachment domains suggested that the 12-kb mRNA corresponds
to a transcript without alternative splicing (PG-M(V0)). It is
likely, therefore, that multiple forms of the PG-M core protein are
generated by alternative usage of either or both of the two different
chondroitin sulfate attachment domains (CSα and CSβ) and that the
molecular form of PG-M varies from tissue to tissue through such
alternative splicing.
PG-M, a large chondroitin sulfate proteoglycan, is one of the major extracellular matrix molecules expressed in various developing tissues as well as in several differentiated tissues, such as aortic smooth muscle (1). This proteoglycan was first isolated from chick limb buds at prechondrogenic stages (2) and was shown to be expressed in the area of mesenchymal cell condensation in chick limb buds (2, 3). This expression pattern suggests that PG-M may play important roles as an extracellular factor in actively differentiating tissues (1, 3).
Recently, we isolated cDNA
clones encoding the entire core protein of chick PG-M (4). Analysis of
the deduced amino acid sequence revealed a hyaluronan-binding domain
at the amino terminus and, at the carboxyl terminus, two epidermal
growth factor (EGF)-like domains, a lectin-like domain, and a
complement regulatory protein-like domain. Versican, a chondroitin
sulfate proteoglycan from human fibroblasts, shares these structural
features, and its amino- and carboxyl-terminal domains show extremely
high identity to those of the PG-M core protein (5). Although the
chondroitin sulfate attachment domain in the middle region of the PG-M
core protein shows low identity to the corresponding domain of
versican, we found an alternatively spliced form, PG-M(V1), that
differs in the chondroitin sulfate attachment region, is about 100 kDa
smaller than the original form, and is similar in size to versican.
This finding suggested that core proteins of various sizes are
generated by alternative splicing in the chondroitin sulfate attachment
region of the PG-M core protein and that versican could be a human
equivalent of chick PG-M(V1) (4).
We have shown that END-D cells, derived from mouse aortic endothelium, produce a large chondroitin sulfate proteoglycan capable of binding to hyaluronan, and, judging from its structural and functional similarity, we concluded that this proteoglycan is identical or very closely related to chick PG-M (6). Large aggregating chondroitin sulfate proteoglycans have recently been isolated from various tissues and cultured cells, such as blood vessels (7, 8, 9), brain (10), aortic smooth muscle cells (11), and skeletal muscle (12). They also appear to be identical or closely related to PG-M or versican. In addition, immunological analysis of the chick PG-M core protein detected four core molecules of different sizes (550, 500, 450, and 300-350 kDa) in chondroitin ABC lyase (EC 4.2.2.4)-digested extracts of various embryonic chicken tissues (1). This heterogeneity in structure and distribution, together with the need both to clarify the relationship between the multiple forms of PG-M and versican and to examine the biological functions of PG-M by gene manipulation, led us to perform cDNA analysis of the PG-M core proteins of mouse endothelial cells and subsequently of those in a variety of other mouse tissues.
In the present study, we present a full-length
cDNA sequence that encodes the core protein of a large chondroitin
sulfate proteoglycan produced by mouse aortic endothelial cells.
Analysis of the deduced amino acid sequence revealed that this molecule
is one of the spliced forms of the PG-M core protein, corresponding to
PG-M(V1)/versican. Northern blot analysis of mRNA species from various
mouse tissues and subsequent sequencing of these transcripts suggest
that the molecular form of PG-M may vary from tissue to tissue through
alternative usage of at least two different exons (CSα and CSβ)
encoding the chondroitin sulfate attachment region. We propose a
designation for this proteoglycan population and discuss possible
biological meanings of the heterogeneity generated by alternative
splicing.
Figure 1: Comparative diagrams of the isolated cDNA clones for mouse PG-M(V1) from cultured aortic endothelial cells and PG-M(V2) from adult mouse brain. Sizes and locations of the overlapping cDNA clones are indicated. Coding and noncoding regions are indicated by thick and thin lines, respectively. EcoRI restriction sites are indicated by arrowheads. The core protein structures are shown schematically. Domains of the coding region with high homology in the amino-terminal and carboxyl-terminal regions are boxed and patterned, as are the chondroitin sulfate attachment domains (CSα and CSβ).
Figure 2: Nucleotide sequence and deduced amino acid sequence for the mouse aortic endothelial cell PG-M core protein, PG-M(V1) (A), and for adult mouse brain PG-M(V2) (B). Panel A shows the cDNA sequence for PG-M(V1), determined from the overlapping clones of Fig. 1, together with the translation of an open reading frame of 2397 residues. Panel B shows the nucleotide and predicted amino acid sequences of the chondroitin sulfate attachment domain (CSα), which is unique to PG-M(V2) and joins directly to the amino- and carboxyl-terminal domains shared with PG-M(V1). Consensus sequences for chondroitin sulfate attachment sites proposed by Bourdon (18) (acidic X-Ser-Gly, where X is one or two amino acids that are hydrophobic or small neutral residues) are underlined. Alternative splicing sites are indicated by double underlines in the nucleotide sequence. Potential N-glycosylation sites are indicated by asterisks. Portions of the chondroitin sulfate attachment domain (CSβ) of PG-M(V1) that show high identity with human versican (amino acid positions 349-380 and 2061-2090 in panel A) are marked with dotted underlines.
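To make the site definitions concrete, the following is a minimal Python sketch of how Ser-Gly/Gly-Ser dipeptides, a simplified form of Bourdon's acidic X-Ser-Gly consensus, and N-glycosylation sequons could be located in a protein sequence. The peptide is a made-up example, not part of PG-M; the N-glycosylation rule assumes the common Asn-X-Ser/Thr sequon with X not Pro, and all function names are hypothetical.

```python
import re

# Hypothetical toy peptide for illustration only; it is NOT a PG-M sequence.
PEPTIDE = "MDESGAVSGTNGSEPLGSAENVSQDLTSG"

ACIDIC = set("DE")


def ser_gly_sites(seq: str) -> list[int]:
    """1-based positions of serines that occur in a Ser-Gly or Gly-Ser dipeptide."""
    serines = set()  # a set, so a Ser in "GSG" is counted once
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair == "SG":
            serines.add(i + 1)  # Ser is the first residue of the pair
        elif pair == "GS":
            serines.add(i + 2)  # Ser is the second residue of the pair
    return sorted(serines)


def bourdon_like_sites(seq: str) -> list[int]:
    """Ser-Gly serines preceded by an acidic residue with a 1- or 2-residue spacer.

    Simplification: the requirement that the spacer be hydrophobic or small
    and neutral is NOT enforced here.
    """
    hits = []
    for i in range(len(seq) - 1):
        if seq[i:i + 2] != "SG":
            continue
        # acidic residue at i-2 (one-residue spacer) or i-3 (two-residue spacer)
        if any(j >= 0 and seq[j] in ACIDIC for j in (i - 2, i - 3)):
            hits.append(i + 1)
    return hits


def n_glyc_sequons(seq: str) -> list[int]:
    """1-based positions of Asn in Asn-X-Ser/Thr sequons where X is not Pro."""
    # The zero-width lookahead lets overlapping sequons be reported.
    return [m.start() + 1 for m in re.finditer(r"(?=N[^P][ST])", seq)]


if __name__ == "__main__":
    print(ser_gly_sites(PEPTIDE))       # [4, 8, 13, 18, 28]
    print(bourdon_like_sites(PEPTIDE))  # [4, 28]
    print(n_glyc_sequons(PEPTIDE))      # [11, 21]
```

Because the constraint on X is not applied, the second function may report more sites than the full consensus would; a stricter variant would test the spacer residues against an explicitly chosen residue set.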
Figure 5: PCR analysis and the sequences at the alternatively spliced sites of PG-M(V0). A, the positions of the specific primers used for PCR analysis are indicated as a and b. Domains in the amino- and carboxyl-terminal regions and the chondroitin sulfate attachment domains (CSα and CSβ) in the central region are boxed and patterned as in Fig. 1. B, the PCR products were analyzed by agarose gel electrophoresis. DNA size markers are shown at the right. C, the PCR product from the mouse limb bud cDNA library was sequenced, and the part of the sequence around the alternative splicing site is shown. The nucleotide positions of the alternative splicing sites are indicated by arrows, and both the nucleotide positions and the sequences for the chondroitin sulfate attachment domain are shown in boldface type.
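The reasoning behind the junction-spanning PCR (a product forms only when both primer sites lie on the same transcript) can be illustrated with a toy in-silico PCR. All sequences below are invented, and the placement of the primers is an assumption consistent with the text: primer a anneals in the first chondroitin sulfate attachment domain and primer b in the second. The function names are hypothetical.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def pcr_product_length(template: str, fwd: str, rev: str) -> Optional[int]:
    """Length of the product from fwd to rev on a sense-strand template, or None.

    fwd must match the template as written; rev anneals to the template,
    so its reverse complement must occur downstream of fwd.
    """
    start = template.find(fwd)
    if start == -1:
        return None
    rev_site = template.find(reverse_complement(rev), start + len(fwd))
    if rev_site == -1:
        return None
    return rev_site + len(rev) - start


# Invented toy segments, not real PG-M sequence.
G1_REGION = "ATGAAACCCGGG"          # stands in for the amino-terminal region
CS_FIRST = "GATTACAGGCTTACCGAGTC"   # first CS attachment domain
CS_SECOND = "CCTAGGATCGTTGCAAGTGA"  # second CS attachment domain
G3_REGION = "TTTGGGCCCTAA"          # stands in for the carboxyl-terminal region

PRIMER_A = CS_FIRST[2:12]                       # forward primer in the first CS domain
PRIMER_B = reverse_complement(CS_SECOND[8:18])  # reverse primer in the second CS domain

transcripts = {
    "both CS domains": G1_REGION + CS_FIRST + CS_SECOND + G3_REGION,
    "first CS domain only": G1_REGION + CS_FIRST + G3_REGION,
    "second CS domain only": G1_REGION + CS_SECOND + G3_REGION,
}

for name, seq in transcripts.items():
    print(name, pcr_product_length(seq, PRIMER_A, PRIMER_B))
# both CS domains 36
# first CS domain only None
# second CS domain only None
```

Only the template carrying both domains yields a product, which is why a band from primers a and b points to an unspliced (PG-M(V0)-type) transcript.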
The composite sequence is 7547 nucleotides long and encodes 2397 amino acids (Figs. 1 and 2A), which suggests that this cDNA may correspond to chicken PG-M(V1), a shorter alternatively spliced form of the PG-M core protein (4), or to human versican (5). The sequence includes a 178-nucleotide 5′-leader and a 175-nucleotide 3′-trailer. The nucleotide sequence immediately upstream of the ATG codon agrees well with the consensus sequence for translational initiation in eukaryotes (16). The putative signal peptide consists of the amino-terminal 20 amino acid residues, with a putative cleavage site between Ala-20 and Leu-21; this site conforms to the (-3, -1) rule of von Heijne (17). The core protein contains a total of 36 cysteine residues: 12 in the amino-terminal portion, 2 on the amino-terminal side of the chondroitin sulfate attachment region, and 22 in the carboxyl-terminal portion. Unlike the chicken and human molecules, the mouse core protein has one extra cysteine residue in the amino-terminal region. The middle part of the core protein contains a total of 31 potential chondroitin sulfate attachment sites with Ser-Gly or Gly-Ser sequences, 11 of which agree well with the consensus sequence for chondroitin sulfate attachment sites, acidic X-Ser-Gly, proposed by Bourdon (18). In addition, there are nine potential N-glycosylation sites (19) and 45 potential threonine O-glycosylation sites (20). Both types of glycosylation sites are distributed almost uniformly along the core protein.
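The lengths reported for the composite sequence are internally consistent. A quick arithmetic check, written as a small Python sketch, assumes that the open reading frame ends in a single 3-nt stop codon and that signal-peptide removal is the only processing step; the helper name is hypothetical.

```python
def orf_length_nt(n_residues: int, with_stop: bool = True) -> int:
    """Nucleotides needed to encode n_residues amino acids, plus one stop codon."""
    return 3 * n_residues + (3 if with_stop else 0)


# Values reported in the text for mouse PG-M(V1)
LEADER_5P_NT = 178
TRAILER_3P_NT = 175
ORF_RESIDUES = 2397
COMPOSITE_NT = 7547
SIGNAL_PEPTIDE = 20  # putative cleavage between Ala-20 and Leu-21
CYSTEINES = {"amino-terminal": 12, "CS region, amino-terminal side": 2, "carboxyl-terminal": 22}

orf_nt = orf_length_nt(ORF_RESIDUES)
assembled_nt = LEADER_5P_NT + orf_nt + TRAILER_3P_NT
# Assumes no processing other than signal-peptide removal.
mature_residues = ORF_RESIDUES - SIGNAL_PEPTIDE

print(orf_nt, assembled_nt, mature_residues, sum(CYSTEINES.values()))
# -> 7194 7547 2377 36
```

The leader, coding region with stop codon, and trailer add up exactly to the 7547-nt composite, and the three cysteine groups sum to the stated total of 36.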
As expected for mouse PG-M, the deduced amino acid sequence of the core protein revealed a link protein-like sequence (corresponding to the hyaluronan-binding domain) in the amino-terminal region and, in the carboxyl-terminal region, two EGF-like sequences, a lectin-like sequence, and a complement regulatory protein-like sequence. These domains of the mouse proteoglycan show extremely high identity to the corresponding domains of the chicken and human molecules (Fig. 3), particularly in the carboxyl-terminal region. Of the two EGF-like domains, which together span 76 amino acid residues, the second (39 residues) was completely identical in amino acid sequence between mouse and human. In the lectin-like domain of 129 residues, only 1 residue differed between mouse and human, and 5 residues between mouse and chicken. In the complement regulatory protein-like domain of 61 residues, only 1 residue differed between mouse and human, and 3 between mouse and chicken. The cysteine residues in these domains are at identical positions in all three species.
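For the gap-free domain comparisons above, percent identity follows directly from the mismatch counts. The sketch below converts those counts; the second function, for pre-aligned sequences containing gaps, uses one common convention (a residue opposite a gap counts as a mismatch, and gap-gap columns are ignored), which is an assumption and not necessarily the convention behind the identities reported in this paper. Function names are hypothetical.

```python
def identity_from_mismatches(length: int, mismatches: int) -> float:
    """Percent identity of two gap-free, equal-length domains."""
    if length <= 0 or not 0 <= mismatches <= length:
        raise ValueError("need length > 0 and 0 <= mismatches <= length")
    return 100.0 * (length - mismatches) / length


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length ('-' = gap).

    Assumed convention: gap-gap columns are skipped; residue vs. gap is a mismatch.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not columns:
        raise ValueError("alignment has no informative columns")
    # Gap-gap columns were removed, so x == y means identical residues.
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


# (domain length, differing residues) as reported in the text
COMPARISONS = {
    "second EGF-like, mouse vs human": (39, 0),
    "lectin-like, mouse vs human": (129, 1),
    "lectin-like, mouse vs chick": (129, 5),
    "complement regulatory-like, mouse vs human": (61, 1),
    "complement regulatory-like, mouse vs chick": (61, 3),
}

for name, (length, diff) in COMPARISONS.items():
    print(f"{name}: {identity_from_mismatches(length, diff):.1f}%")
# 100.0%, 99.2%, 96.1%, 98.4%, 95.1% respectively
```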
Figure 3: Comparison of the amino acid sequences of the amino-terminal and carboxyl-terminal regions. Residues identical among human, mouse, and chick are shaded. Cysteine residues are shown in boldface type, and one residue found only in mouse PG-M is indicated by an asterisk. The human versican data are taken from Ref. 5.
In contrast, such high identity was not seen in the chondroitin sulfate attachment region in the middle part of the core protein. In addition, a comparison of molecular weights estimated from the deduced amino acid sequences showed that this region was about 100 kDa smaller than that of the chicken PG-M core protein (3562 amino acid residues), so that the mouse molecule appeared to correspond to chicken PG-M(V1) or human versican. We therefore compared the amino acid sequences of the chondroitin sulfate attachment region among these molecular species. Some identity was observed between mouse and human (52.8%), and essentially none between mouse and chicken (21.7%). Interestingly, about 30 amino acid residues at both the beginning and the end of this region (see Fig. 2A) showed high identity (about 80%) between mouse and human.
Figure 4: Northern blot analysis of mRNAs for PG-M core proteins in various mouse tissues. Poly(A)+ RNA samples from cultured END-D cells (lanes 1 and 2), adult brain (lanes 3 and 4), and embryonic limb buds (lanes 5 and 6) were electrophoresed and transferred to a Hybond N membrane. The bound RNAs were first hybridized with probe A (lanes 1, 3, and 5) and then with probe B (lanes 2, 4, and 6) as described under ``Materials and Methods.'' The location of each probe is shown at the top. Positions of RNA molecular size markers are indicated in kilobases at the right. Six mRNA bands of different sizes are indicated by arrows at the left.
Notably, probe B (Fig. 4, top) hybridized to four mRNA bands of 12, 10,
9, and 8 kb from the END-D cell and limb bud samples (Fig. 4, lanes 2
and 6) but not to the two smaller bands of 7.5 and 6.5 kb from the adult
brain sample (Fig. 4, lane 4). Because probe B covers the chondroitin
sulfate attachment domain encoded by a single large exon of 5.2 kb, this
result suggests that the 7.5- and 6.5-kb PG-M transcripts have a
different chondroitin sulfate attachment region, generated by
alternative usage of the exons encoding the chondroitin sulfate
attachment domains. This was expected from the difference in the
chondroitin sulfate attachment region between chicken PG-M and
PG-M(V1) (4).
Figure 6: Schematic presentation of the relationship among different PG-M transcripts (A) and predicted structure of PG-M(V0) (B). A, domains in the amino- and carboxyl-terminal regions and the chondroitin sulfate attachment domains (CSα and CSβ) in the central region are boxed and patterned as in Fig. 1. B, chondroitin sulfate attachment sites (Ser-Gly or Gly-Ser sequences), N-glycosylation sites, and cysteine residues are indicated by vertical lines. Consensus sequences for chondroitin sulfate attachment sites proposed by Bourdon (18) are indicated by thick vertical lines. One cysteine residue found only in mouse PG-M is indicated by an arrowhead.
The relationship among the three transcripts for the different forms of PG-M is shown in Fig. 6A, and a schematic presentation of the PG-M(V0) structure, including putative chondroitin sulfate attachment sites, putative N-glycosylation sites, and the locations of cysteine residues, is shown in Fig. 6B.
Earlier studies, including ours, on proteoglycans synthesized by stage 22-23 chick embryo limb buds revealed a unique band whose sedimentation velocity differed from those of proteoglycans found in differentiated cartilage. This component was referred to as Fraction III (21), PCS-M (22), Fraction II (23), or PGS(LM)-1 (24). In 1986 we first isolated and characterized this proteoglycan and named it PG-M because it was the predominant proteoglycan of the mesenchyme (2). Since then, we have used this designation in our studies on the function and distribution of PG-M (1, 3, 4, 25, 26, 27). In addition, our cDNA study of PG-M from chick limb buds showed that versican may correspond to only one of the alternatively spliced forms of the PG-M core protein (4). We therefore continue to use this designation, although the name versican has become widely used.
In the present study we have presented a full-length cDNA
sequence of mouse PG-M and suggested that alternative splicing of a
primary PG-M gene transcript generates multiple forms of the core
protein. As shown in Fig. 6A, we have demonstrated at least three
different forms of the PG-M core protein arising from alternative
usage of two different exons encoding the chondroitin sulfate
attachment region (CSα and CSβ domains). We propose the following
designation for the transcripts of the PG-M core protein:
PG-M(V0), the transcript (12 kb) without alternative splicing;
PG-M(V1), the larger alternatively spliced forms (10, 9, and 8 kb),
which have the CSβ domain of the chondroitin sulfate attachment
region and correspond to human versican; and PG-M(V2), the smaller
alternatively spliced forms (7.5 and 6.5 kb), which have the CSα
domain of the chondroitin sulfate attachment region.
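The designation can be restated as a small decision procedure over the two read-outs used in this study: hybridization with probe B (which, according to the text, covers a chondroitin sulfate attachment domain encoded by a single 5.2-kb exon) and the PCR product spanning the junction of the two chondroitin sulfate attachment domains. This is a hypothetical sketch covering only the three forms defined here; in practice the PCR was performed on cDNA libraries rather than on individual Northern bands, so scoring each band for both read-outs is a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Band:
    size_kb: float
    probe_b_positive: bool  # hybridizes with probe B (5.2-kb CS exon)
    spans_junction: bool    # junction-spanning PCR product assigned to this transcript


def designate(band: Band) -> str:
    """Hypothetical restatement of the PG-M(V0)/(V1)/(V2) designation rules."""
    if band.spans_junction:
        return "PG-M(V0)"  # unspliced: both CS attachment domains present
    if band.probe_b_positive:
        return "PG-M(V1)"  # retains the probe-B exon only
    return "PG-M(V2)"      # retains only the other CS attachment domain


# Bands as described in the text; the junction assignment of the 12-kb band is inferred.
BANDS = [
    Band(12, True, True),
    Band(10, True, False),
    Band(9, True, False),
    Band(8, True, False),
    Band(7.5, False, False),
    Band(6.5, False, False),
]

for b in BANDS:
    print(f"{b.size_kb} kb -> {designate(b)}")
```

Probe B alone cannot separate PG-M(V0) from PG-M(V1), since both carry the probe-B exon; the junction-spanning PCR product is what distinguishes the unspliced transcript.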
The present study could not determine how the size differences arise
within the 10-, 9-, and 8-kb transcripts and within the 7.5- and 6.5-kb
transcripts. Similar size heterogeneity has been observed for versican
in human osteogenic sarcoma cells (28). Rapid amplification of cDNA
ends at the 5′-terminus did not reveal a 1- or 2-kb size difference
among them. Aggrecan expressed by human cartilage has been reported to
undergo alternative splicing of the EGF-like domain in the
carboxyl-terminal G3 domain (29). We therefore examined whether
alternative splicing also occurs in the amino-terminal and/or
carboxyl-terminal regions of the PG-M core protein to cause such
heterogeneity of transcripts. PCR amplification using several mouse
cDNA libraries and combinations of appropriate primers did not support
this possibility (only a single band was obtained).
Thus, the 1- or 2-kb size heterogeneity might be caused by differences
in the 3′-noncoding region. Alternatively, this size difference among
transcripts might be an artifact of degradation.
Although chick PG-M(V1) probably corresponds to human versican, the chondroitin sulfate attachment region of chick PG-M(V1) is still about 700 nucleotides longer than that of human versican or mouse PG-M(V1). The difference might reflect species specificity, because our attempts did not detect any further alternative splicing within the chondroitin sulfate attachment region of PG-M(V1) (data not shown). Further analysis of the PG-M gene would be needed to reveal their relationship.
In the present study, PG-M transcripts of various sizes were identified in embryonic limb buds, adult brain, and cultured aortic endothelial cells. These results suggest tissue-dependent regulation of the alternative splicing. Consistent with these observations, immunological analysis of PG-M core proteins in various chick tissues previously suggested that there are at least four core molecules of different sizes (550, 500, 450, and 300-350 kDa) and that their appearance varies in a tissue-dependent manner (1). For example, all four core molecules were detected in embryonic aorta and lung, a 450-kDa core molecule in embryonic skeletal muscle, and a 300-350-kDa molecule in embryonic brain. Several smaller core molecules were also suggested to exist in brain. Our present study suggests that alternative splicing of mRNA may cause such multiplicity of the PG-M core protein, although some forms may also arise from posttranslational modification. In vitro translation of the various PG-M mRNA species and analysis of the products would clarify their relationship. In addition, in situ hybridization with appropriate probes would reveal which mRNA forms are expressed in particular tissues at particular stages.
Neurocan and brevican have recently been cloned and identified as chondroitin sulfate proteoglycans unique to brain (30, 31). Their structures resemble those of PG-M, versican, and aggrecan in having the same domain elements in the amino- and carboxyl-terminal regions. The coding regions of the cDNAs are about 3.8 and 2.2 kb, and the mRNAs about 7.5 and 3.3 kb, for rat neurocan and bovine brevican, respectively. Considering these sizes, PG-M(V2) might be a mouse equivalent of rat neurocan. However, a comparison of the amino- and carboxyl-terminal domains of mouse PG-M(V2) and rat neurocan showed only about 50% identity in the amino-terminal domains and 57% in the carboxyl-terminal domains. Compared with the extremely high conservation of these PG-M domains across species, the identity between mouse PG-M(V2) and rat neurocan is rather low. Bovine brevican is distinctly shorter than PG-M(V2), and in the comparison between mouse PG-M(V2) and bovine brevican the highest identities were 64% in the amino-terminal B domain and 61% in the carboxyl-terminal lectin-like domain. Taken together, these data suggest that the smaller forms of PG-M may be molecules distinct from neurocan and brevican.
Some extracellular matrix proteins, such as fibronectin, tropoelastin, tenascin, and collagens (types II, VI, and XIII), as well as CD44, have been reported to exist in diverse molecular forms generated by alternative exon usage (32). Several isoforms of these proteins have been shown to be regulated developmentally and/or in a tissue-dependent manner (32). The functional significance of alternative splicing has been analyzed extensively in fibronectin, and form-specific functions related to the regulation of dimer formation, secretion (33), cell adhesiveness (34), and incorporation into fibrin clots (35) have been identified. The alternatively spliced region of CD44 has been shown to be related to the ability to adhere to lymph node stromal cells (15). Our present study has shown that alternative usage of two different exons encoding chondroitin sulfate attachment domains may yield at least three forms of PG-M with differently sized chondroitin sulfate attachment regions. Our laboratory has shown that PG-M strongly inhibits the adhesion of various types of cells to substrate-adhesive glycoproteins and that the attached chondroitin sulfate chains are responsible for this activity (26). This finding emphasizes the importance of the chondroitin sulfate chains of the proteoglycan as inhibitory modifiers in the regulation of cell-substrate interactions and therefore suggests that alternative splicing in the chondroitin sulfate attachment region may be one of the important cellular mechanisms regulating cell-matrix interactions during cell differentiation and tissue morphogenesis.
The nucleotide sequences reported in this paper have been submitted to the GenBank™/EMBL Data Bank with accession numbers D16263 and D28599.