©1995 by The American Society for Biochemistry and Molecular Biology, Inc.
The Gene Structure and Organization of Mouse PG-M, a Large Chondroitin Sulfate Proteoglycan
GENOMIC BACKGROUND FOR THE GENERATION OF MULTIPLE PG-M TRANSCRIPTS (*)

Tamayuki Shinomura, Masahiro Zako, Kazuo Ito, Minoru Ujita, Koji Kimata (§)

From the Institute for Molecular Science of Medicine, Aichi Medical University, Nagakute, Aichi 480-11, Japan

ABSTRACT

We previously showed not only the presence of multiple RNA transcripts of different sizes encoding the core protein of mouse PG-M, but also their tissue-dependent expression. The multiple forms were found to arise mainly from alternative usage of the two different chondroitin sulfate attachment domains (α and β). In this study, genomic DNA analysis has revealed that these domains are encoded by two large exons, exon VII (2880 base pairs) and exon VIII (5229 base pairs). The splice sites of these two exons were consistent with the occurrence of alternative splicing without frameshift. Furthermore, the mouse PG-M gene was shown to have four distinct polyadenylation signals and three candidates for the transcription initiation site. These genomic structural variations may contribute to the multiplicity of PG-M transcripts. Northern hybridization analysis showed that at least three different transcripts were generated by usage of distinct polyadenylation signals.


INTRODUCTION

PG-M, a large chondroitin sulfate proteoglycan, was shown to be expressed preferentially in the mesenchymal cell condensation area of chick limb bud at the prechondrogenic stages (1, 2). Expression in this area, however, is suppressed once aggrecan is expressed during cartilage differentiation. Such a transient expression pattern has suggested that PG-M may play some important roles as an extracellular factor influencing tissue differentiation and morphogenesis (3).

We have isolated cDNA clones encoding the entire core proteins of chicken PG-M (4) and mouse PG-M (5). Recently, the whole sequence of human PG-M has also been identified (6), part of which was initially identified as versican (7), corresponding to one of the alternatively spliced forms of PG-M (6). Protein homology analysis of the deduced amino acid sequence of the core protein revealed the presence of a link protein-like sequence in the amino-terminal region and two epidermal growth factor-like sequences, one C-type lectin-like sequence, and one complement regulatory protein-like sequence in the carboxyl-terminal region. These amino- and carboxyl-terminal regions have been shown to have hyaluronan-binding and oligosaccharide-binding activities, respectively (8, 9). In addition, the amino acid sequences of these regions are evolutionarily highly conserved. In contrast, the primary structure of the chondroitin sulfate attachment region localized at the middle of the core protein is not conserved. Furthermore, the expression of this region is regulated by alternative splicing, which generates four different forms of the PG-M core protein (5, 10).

In previous studies, we showed the presence of at least seven mRNA species of 12, 10, 9, 8, 7.5, 6.5, and 3 kb encoding mouse PG-M core protein and that their appearance varied in a tissue-dependent manner (5, 10). The major variation has been shown to be generated by alternative splicing of the chondroitin sulfate attachment region as described above. In this study, we determined the genomic DNA sequence, providing evidence not only for alternative splicing without frameshift, but also for the minor size variations in mRNA species due to different usage of multiple transcription initiation sites and polyadenylation signals.


EXPERIMENTAL PROCEDURES

Isolation and Characterization of the Mouse PG-M Gene

A 129/sv mouse genomic DNA library in the Fix II vector (Stratagene, La Jolla, CA) was used for screening. About 5 10plaques were screened with various ³²P-labeled cDNA and genomic DNA probes encoding mouse PG-M core protein. Restriction enzyme mapping and cross-hybridization analysis of isolated genomic clones were performed to identify overlapping clones. The genomic DNA inserts were excised from the phage DNA by digestion with NotI or SalI and then were subcloned into plasmid vector pGEM3Zf() (Promega) for further characterization. To map exons, results from restriction mapping, hybridization of Southern blots, and screening were combined. The size of each intron was determined by restriction mapping and, in some cases, by polymerase chain reaction (PCR) using primers located in the exons.

DNA Sequencing and Analysis

The nucleotide sequence of the isolated genomic DNA was determined by the dideoxy chain termination method (11) using oligonucleotide primers synthesized based on the cDNA sequences and the preceding sequences. The DNA sequences obtained were compiled and analyzed using DNASIS computer programs (Hitachi Software Engineering Co.).

Northern Blot Analysis

About 5 µg of poly(A)⁺ RNA purified from a cultured mouse aortic endothelial cell line (END-D) was used for Northern blot analysis as described previously (4, 5). Five different cDNA probes were used for hybridization. Probe A corresponds to the hyaluronan-binding region (nucleotides 653-1220 in Fig. 2A in Ref. 5). Probe B corresponds to a portion of the β domain of the chondroitin sulfate attachment region (nucleotides 1486-6160 in Fig. 2A in Ref. 5). Probes C-E, corresponding to different portions of the 3'-untranslated region, were prepared by PCR amplification using isolated genomic DNA as a template. Their exact positions are as follows: probe C, nucleotides 1-452; probe D, nucleotides 665-1169; and probe E, nucleotides 1280-1867 (see Fig. 3). Each probe was radiolabeled with [α-³²P]dCTP (Amersham International, plc) and was used sequentially for hybridizations as described previously (5). RNA sizes were determined using an RNA ladder (Life Technologies, Inc.).


Figure 2: Alignment of exon-intron boundaries and comparison with consensus sequences for splicing. Exon sequences are indicated by upper-case letters and intron sequences by lower-case letters. Exon sequences in the 5'-untranslated region are shown in italics. Amino acids encoded near and at splice junctions are indicated in three-letter code below their codons. Splice site consensus sequences are shown at the top and are adapted from the work of Shapiro and Senapathy (16). Py, pyrimidine.




Figure 3: 3'-Region (exon XV) of the mouse PG-M gene. The start site of exon XV is numbered 1. Four poly(A) signals are indicated by underlined boldface letters. Coding and noncoding regions are shown in lower-case and upper-case letters, respectively. The termination codon is underlined.



Analysis of the 5'-Ends of Mouse PG-M RNAs by PCR

Rapid amplification of cDNA ends was performed to identify transcription initiation sites (12, 13). For this purpose, a 5'-AmpliFINDER RACE kit (CLONTECH, Palo Alto, CA) was used. Briefly, 2 µg of END-D cell-derived poly(A)⁺ RNA was annealed with an antisense gene-specific primer corresponding to the cDNA sequence derived from the third exon (primer a, 5'-dGAGAGCCTTTAACAGGTGGGCTGGTTTCC-3'). To prevent artificial termination of the reaction due to secondary structures of the mRNA, reverse transcription was carried out with avian myeloblastosis virus reverse transcriptase at 52 °C for 30 min. The 3'-end of the resulting first-strand cDNA was ligated with T4 RNA ligase to a single-stranded oligonucleotide anchor provided in the kit. Then, an aliquot of the reaction product was used as a template for PCR amplification using a primer complementary to the anchor sequence (primer d) and a nested gene-specific primer corresponding to the junction of the first and second exons (primer b, 5'-dCAACATCTTGTCCTTGAAAGGCGGCTTACTGC-3'). The reaction was performed in a Model PJ 9600 DNA thermal cycler (Perkin-Elmer) using a GeneAmp DNA amplification reagent kit (Takara Shuzo Co.) under the conditions of 30 cycles at 95 °C for 1 min, 65 °C for 2 min, and 72 °C for 3 min. The product was then used as a template for the second PCR amplification, which combined the anchor primer with an antisense gene-specific primer modified to contain an EcoRI site for post-amplification cloning (primer c, 5'-dTGGGAATTCACTGCTGAGCTGTGAGAAAGGGCTCCG-3'). The final product was analyzed by agarose gel electrophoresis (see Fig. 5). Regions of the gel containing specific products were isolated, and the DNAs were purified by JETSORB (GENOMED GmbH, Oeynhausen, Federal Republic of Germany) and cloned into a pGEM3Zf() vector after EcoRI digestion. The inserted cDNA sequences were analyzed by the dideoxy chain termination method (11).
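The primer design described above can be checked with a few lines of code. The following is a minimal sketch, not part of the original protocol: it takes the three gene-specific primer sequences as printed in the text (without the leading "d", which denotes deoxy), confirms that only primer c carries an EcoRI recognition sequence (GAATTC), and derives the sense-strand sequence each antisense primer anneals to. The function names are hypothetical and chosen for illustration only.

```python
# Gene-specific primers copied from the text (5'->3', antisense to the mRNA).
PRIMERS = {
    "a": "GAGAGCCTTTAACAGGTGGGCTGGTTTCC",
    "b": "CAACATCTTGTCCTTGAAAGGCGGCTTACTGC",
    "c": "TGGGAATTCACTGCTGAGCTGTGAGAAAGGGCTCCG",
}
ECORI_SITE = "GAATTC"  # EcoRI recognition sequence

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def ecori_positions(seq: str) -> list[int]:
    """Return the 0-based start position of every EcoRI site in seq."""
    positions = []
    start = seq.find(ECORI_SITE)
    while start != -1:
        positions.append(start)
        start = seq.find(ECORI_SITE, start + 1)
    return positions


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        sites = ecori_positions(seq)
        print(f"primer {name}: EcoRI sites at {sites}")
        print(f"  sense-strand target: {reverse_complement(seq)}")
```

Because the EcoRI site sits near the 5'-end of primer c, digestion of the final PCR product leaves almost all of the gene-specific sequence in the cloned insert.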


Figure 5: Determination of the mouse PG-M gene transcription initiation site by a modified 5'-RACE method. A, the positions of four different primers used for the reverse transcription reaction and PCR amplification are indicated by arrows a-d. Different exon boundaries are indicated by vertical arrows. EcoRI sites are indicated by open inverted triangles. mRNA and first-strand cDNA are indicated by thin lines. The anchor is indicated by the thick line. B, the products from the second PCR amplification were analyzed by agarose gel electrophoresis. The amplified DNA fragments are indicated by arrowheads. DNA size markers in base pairs are shown to the right.



Primer Extension Analysis

We used two different antisense oligonucleotide primers (5'-dAGCTGGAGAGGAGAGAGGATGCGCTGGTAG-3' and 5'-dTCACATAGGAAGCGCGGTGGCCTGGGGAAG-3', corresponding to nucleotides +147 to +118 and nucleotides +198 to +169, respectively, in exon I of the mouse PG-M gene in Fig. 6). Each primer was end-labeled with [γ-³²P]dATP (Amersham International, plc) using T4 polynucleotide kinase (Takara Shuzo Co.) and was coprecipitated with either 15 µg of total RNA or 3 µg of poly(A)⁺ RNA from END-D cells. RNA and primer were resuspended in 15 µl of 150 mM KCl, 10 mM Tris-HCl, pH 8.3, 1 mM EDTA; heated at 65 °C for 60 min; and then cooled to room temperature. This mixture was then incubated with 30 µl of extension mixture (30 mM Tris-HCl, pH 8.3, 15 mM MgCl₂, 8 mM dithiothreitol, 1 mM dNTPs, and 0.22 mg/ml actinomycin D) and 1 µl (20 units) of avian myeloblastosis virus reverse transcriptase. After incubation at 42 °C for 60 min, RNase A and EDTA were added, and the mixture was incubated at 37 °C for 30 min. The reaction mixture was phenol- and chloroform-extracted, ethanol-precipitated, and analyzed on a 6% polyacrylamide gel containing 7 M urea.


Figure 6: 5'-Region of the mouse PG-M gene. The three transcription initiation sites in exon I are indicated by underlined boldface letters. A possible TATA sequence is underlined. The intron sequence is shown in lower-case letters.



S1 Nuclease Analysis

S1 nuclease protection assays were performed as described previously (14). Briefly, single-stranded DNA probes complementary to the transcribed sequence were prepared by primer extension in the presence of the ³²P-end-labeled antisense oligonucleotide primers described under "Primer Extension Analysis" and Klenow DNA polymerase using a genomic DNA clone as a template. After digestion with SacI, radiolabeled fragments were purified by electrophoresis on an alkaline agarose gel to obtain single-stranded probes. We used two different probes that contained 264 and 315 nucleotides of sequence corresponding to nucleotides −117 to +147 and nucleotides −117 to +198, respectively, in Fig. 6. Approximately 1 10cpm of each probe was hybridized to 50 µg of total RNA in the presence of 80% formamide at 30 °C overnight after heat denaturation at 65 °C for 10 min. After digestion with S1 nuclease, protected products were analyzed on a 6% polyacrylamide gel containing 7 M urea.
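The stated probe lengths can be reproduced from the stated coordinates. The sketch below assumes the usual convention in which the transcription start is numbered +1 and the base immediately upstream is −1, so that no position 0 exists, and it assumes both probes begin at position −117 (upstream of +1). The helper name `span_length` is hypothetical.

```python
def span_length(start: int, end: int) -> int:
    """Count nucleotides from start to end inclusive, where positions are
    numbered ..., -2, -1, +1, +2, ... (there is no position 0)."""
    if start == 0 or end == 0:
        raise ValueError("position 0 does not exist in this numbering")
    if start > end:
        raise ValueError("start must not exceed end")
    length = end - start + 1
    if start < 0 < end:
        length -= 1  # the span crosses the -1/+1 boundary, which skips 0
    return length


if __name__ == "__main__":
    print(span_length(-117, 147))  # S1 probe made with the first primer
    print(span_length(-117, 198))  # S1 probe made with the second primer
    print(span_length(118, 147))   # first primer itself
    print(span_length(169, 198))   # second primer itself
```

Under these assumptions the two probes come out at 264 and 315 nucleotides, matching the lengths given in the text, and each primer spans 30 nucleotides, matching the printed sequences.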


RESULTS

Isolation of the Mouse PG-M Gene

A 129/sv mouse genomic DNA library was initially screened with cDNA probes derived from the mouse PG-M cDNA clones (5). After several positive clones were isolated, subsequent clones were obtained using the 5'- or 3'-terminal regions of the preceding genomic DNA clones as probes. A total of 11 independent and overlapping clones were isolated, which together spanned a continuous genomic region of almost 100 kb (Fig. 1).


Figure 1: Organization of the mouse PG-M gene and alignment of isolated genomic DNA clones. Eleven phage clones representing a 100-kb genomic section are shown as numbered horizontal lines. Exons are indicated by closed boxes and are numbered from I to XV. The corresponding domains of the PG-M core protein are also shown. The name and characteristic of each domain are described under "Results." Introns as well as 5'- and 3'-flanking regions are indicated by lines. The initiation (ATG) and termination (TGA) codons are indicated by arrows. The scale for 10 kb is indicated. EGF, epidermal growth factor; LEC, C-type lectin; CRP, complement regulatory protein.



Structural Organization of the Mouse PG-M Gene

By a combination of restriction enzyme mapping, PCR, and partial sequencing, the mouse PG-M gene was determined to contain 15 exons. All exon sequences including part of the exon-flanking regions were sequenced bidirectionally. Fig. 2 summarizes the splice junction sequences. The placement and size of each exon, scaled relative to the restriction-mapped region of almost 100 kb, are shown in Fig. 1. The structure of the mouse PG-M gene indicated a correlation between exon-intron architecture and the domain structure of the PG-M core protein. Table I summarizes actual exon sizes and corresponding domain structures. Exon I contains part of the 5'-untranslated sequence. The remainder of the 5'-untranslated sequence is located in exon II, which contains the translation initiation site. Exons III-VI encode the hyaluronan-binding region, whose structure is similar to that of link protein and consists of three loop-like subdomains, loops A, B, and B'. The loop A structure shows homology to an immunoglobulin fold and is encoded by exon III. The loop B and B' subdomains form a tandem homologous repeat. The former is encoded by two exons, exons IV and V. In contrast, the loop B' subdomain is encoded by a single exon, exon VI. Exons VII and VIII are very large and encode two different chondroitin sulfate attachment domains, CS-α and CS-β, respectively (5). These two exons are differently expressed in a tissue-dependent manner by alternative splicing (5, 10). Finally, the carboxyl-terminal region, which has been shown to possess sugar-binding activity (8), is encoded by seven exons. This region consists of two epidermal growth factor-like domains, one C-type lectin-like domain, and one complement regulatory protein-like domain. Two epidermal growth factor-like tandem repeats are encoded by exons IX and X. The C-type lectin-like motif is constructed from exons XI-XIII. A complement regulatory protein-like motif is located in exon XIV. Exon XV contains the rest of the carboxyl-terminal coding region and the whole 3'-untranslated region.

Polyadenylation Sites

The mouse PG-M gene is transcribed into at least seven mRNA species of different sizes (5, 10). Their differences have been shown to be generated by alternative usage of either or both of exons VII and VIII, which encode the two different chondroitin sulfate attachment domains, CS-α and CS-β (5), or by usage of neither (10). However, it remained to be determined how the minor differences in size were generated, for example among the 10-, 9-, and 8-kb transcripts detected in END-D cell mRNA. Considering other comparable examples (15), it is likely that the differences are due to alternative usage of polyadenylation sites. To investigate this possibility, exon XV encoding the 3'-untranslated region was sequenced over 2076 bp downstream from the termination codon. As shown in Fig. 3, four different polyadenylation signals were detected. Based on these results, we carried out the following Northern blot analysis. Poly(A)⁺ RNA obtained from END-D cells was hybridized with DNA probes encoding different parts of the 3'-untranslated region as shown in Fig. 4. Three mRNA species of 10, 9, and 8 kb in size were detected by hybridization with probes corresponding to the hyaluronan-binding region (Fig. 4, lane 1) and the large chondroitin sulfate attachment domain, CS-β (lane 2), but not with the probe encoding the other chondroitin sulfate attachment domain, CS-α (data not shown). All three mRNA species hybridized with probe C (Fig. 4, lane 3), whereas only the larger mRNA species of 10 and 9 kb hybridized with probe D (lane 4). Furthermore, hybridization with probe E was restricted to the largest species of 10 kb (lane 5). These results indicate that the three mRNA species in END-D cells are generated by usage of different polyadenylation signals.


Figure 4: Northern blot analysis of the mRNA encoding mouse PG-M core protein. Poly(A)⁺ RNA obtained from cultured END-D cells was analyzed by Northern blot hybridization using probe A (lane 1), probe B (lane 2), probe C (lane 3), probe D (lane 4), and probe E (lane 5) as described under "Experimental Procedures." The locations of probes C-E are shown at the bottom. The four polyadenylation signals, which are located at 305, 495, 1102, and 1540 bp in Fig. 3, are marked by closed circles. The termination codon is indicated by the arrow. The coding region encoding the carboxyl-terminal end of the PG-M core protein is indicated by boxes as follows: epidermal growth factor-like (EGF), C-type lectin-like (LEC), and complement regulatory protein-like (CRP) domains. Three mRNA bands of different sizes are indicated in kilobases by the arrows to the left. The appearance of smearing bands was discussed previously (5).
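The logic of the probe experiment can be made explicit with a small sketch. It uses the probe coordinates and polyadenylation signal positions given in the text, and it assumes, purely for illustration, that a transcript ends about 30 nucleotides downstream of the signal it uses and that a probe hybridizes whenever any part of it lies within the transcript. The names `CLEAVAGE_OFFSET`, `probes_hybridizing`, and `hybridization_table` are hypothetical.

```python
# Positions in bp, numbered from the start of exon XV as in Fig. 3.
POLYA_SIGNALS = [305, 495, 1102, 1540]
PROBES = {"C": (1, 452), "D": (665, 1169), "E": (1280, 1867)}

# Assumption: the transcript ends roughly 30 nt downstream of the signal used.
CLEAVAGE_OFFSET = 30


def probes_hybridizing(signal_pos: int, offset: int = CLEAVAGE_OFFSET) -> list[str]:
    """Return the probes that overlap a transcript ending after signal_pos."""
    transcript_end = signal_pos + offset
    # A probe overlaps the transcript if it starts before the transcript ends.
    return [name for name, (start, _end) in PROBES.items() if start <= transcript_end]


def hybridization_table() -> dict[int, list[str]]:
    """Map each polyadenylation signal to the probes expected to hybridize."""
    return {pos: probes_hybridizing(pos) for pos in POLYA_SIGNALS}


if __name__ == "__main__":
    for pos, probes in hybridization_table().items():
        print(f"signal at {pos}: probes {', '.join(probes)}")
```

Under these assumptions the three predicted patterns (probe C only; probes C and D; probes C, D, and E) parallel the 8-, 9-, and 10-kb bands, respectively. The signals at 305 and 495 give the same pattern, so this probe set alone would not distinguish between them.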



Transcription Initiation Sites

The transcription initiation sites of the mouse PG-M gene were analyzed by primer extension, S1 nuclease protection, and modified 5'-RACE methods. Although the former two methods gave no interpretable result because the signals were too weak, the modified 5'-RACE method amplified three different DNA fragments as shown in Fig. 5. Sequence analysis of these fragments suggested the presence of three different initiation sites, which were located 313 (Site I), 280 (Site II), and 254 (Site III) bases upstream from the 3'-end of the first exon (Fig. 6). However, it is possible that some of these apparent initiation sites reflect artificial termination of the reverse transcription reaction by secondary structure rather than the real 5'-end of the mRNA. In relation to this possibility, a possible TATA box was found 28 bases upstream from Site II, which was numbered as position +1, whereas Sites I and III did not have a canonical TATA box in their vicinities. Further analysis would be needed to confirm the actual transcription initiation site.
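The three candidate sites can be placed on the +1 numbering used in Fig. 6 with simple arithmetic. The sketch below takes the distances from the 3'-end of exon I given in the text, treats Site II as position +1, and assumes the usual convention that no position 0 exists. The function names are hypothetical.

```python
# Distance (in bases) of each candidate site upstream from the 3'-end of exon I.
SITES = {"I": 313, "II": 280, "III": 254}


def position_relative_to(site: str, reference: str = "II") -> int:
    """Return the position of `site` when `reference` is numbered +1.

    Positions run ..., -2, -1, +1, +2, ... with no position 0.
    """
    offset = SITES[reference] - SITES[site]  # positive means downstream
    return offset + 1 if offset >= 0 else offset


def max_5prime_spread() -> int:
    """Largest length difference that start-site choice alone can produce."""
    return max(SITES.values()) - min(SITES.values())


if __name__ == "__main__":
    for name in SITES:
        print(f"Site {name}: position {position_relative_to(name):+d}")
    print(f"maximum spread: {max_5prime_spread()} nt")
```

On this numbering Site I falls at −33 and Site III at +27, and the choice of start site changes transcript length by at most 59 nucleotides, far less than the roughly 1-kb steps between the 10-, 9-, and 8-kb bands.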


DISCUSSION

In previous studies, we identified PG-M transcripts of various sizes in embryonic limb buds, adult brain, and cultured aortic endothelial cells (4, 5). Their expression varied from tissue to tissue. Alternative splicing was a major cause of the variation (4, 5, 6, 10). However, other causes have remained unexplained. We therefore examined the genomic background of the generation of various transcripts encoding the PG-M core protein. In this study, we have characterized the complete genomic structure of the mouse PG-M gene. The gene is large, with 15 exons distributed over about 100 kb. The exon-intron architecture is completely consistent with the domain structure of the PG-M core protein. The chondroitin sulfate attachment region is encoded by two large exons, VII and VIII, spanning 3 and 5 kb, respectively. These exons are alternatively spliced out without frameshift, which yields four different forms (Fig. 7), as further discussed below. Similarly, other exons also have a potential for alternative splicing without frameshift, as indicated in Fig. 2. However, in spite of our extensive study of mRNAs from various tissues by reverse transcription-PCR, we have not identified other alternatively spliced forms (data not shown).


Figure 7: Schematic representation of the structures of four different mouse PG-M molecules. Thick lines represent the core proteins. The various structural domains are depicted as folded structures based on the presumed disulfide bonding. Thin lines represent chondroitin sulfate side chains, and the long thin line represents hyaluronan. The open arrow and sugar residues at the top indicate interactions between the carboxyl-terminal regions and those residues.



Our genomic cloning of the PG-M gene revealed the presence of four AATAAA polyadenylation signals (Fig. 3). Northern hybridization analysis using probes corresponding to different parts of the 3'-untranslated region suggested that the 10-, 9-, and 8-kb mRNAs detected in END-D cells were generated by usage of different polyadenylation signals (Fig. 4). These messages hybridized with the hyaluronan-binding region and CS-β probes, but not with the CS-α probe, indicating that all of these transcripts encode PG-M(V1). Furthermore, by modified 5'-RACE analysis, we obtained results suggesting the presence of three possible transcription initiation sites (Figs. 5 and 6). However, the candidate initiation sites lie too close together to account for the observed size differences in the mRNAs. Further study would be needed to completely explain the variation.

In conclusion, we have identified four different PG-M splice variants, schematically shown in Fig. 7. A transcript containing both exons VII and VIII corresponds to the PG-M(V0) form. When either of the two exons is spliced out, PG-M(V1) or PG-M(V2) is generated. When both are spliced out, PG-M(V3) is generated. These isoforms may have distinct functions. Supporting this hypothesis, we have shown that different splice forms of the PG-M molecule are expressed in distinct spatial and temporal patterns (5, 6, 10). The difference in the isoforms may affect the extracellular functions of PG-M in which chondroitin sulfate chains are involved. Thus, the expression of different PG-M molecules might influence the formation of the extracellular network and could modulate cell-matrix and cell-cell interactions.
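The statement that the two large exons can be skipped without frameshift can be illustrated numerically. Removing a whole exon leaves the downstream reading frame intact when the exon length is a multiple of three, and both reported exon sizes (2880 and 5229 bp) satisfy this. The sketch below enumerates the four inclusion patterns. The assignment of PG-M(V1) to the form that retains exon VIII follows the Northern result reported above, in which the V1 transcripts hybridized with the probe for the large chondroitin sulfate attachment domain. The function names are hypothetical.

```python
# Exon sizes in base pairs, as reported in the abstract.
EXON_VII_BP = 2880
EXON_VIII_BP = 5229

# (exon VII included, exon VIII included) -> isoform name.
ISOFORMS = {
    (True, True): "PG-M(V0)",
    (False, True): "PG-M(V1)",
    (True, False): "PG-M(V2)",
    (False, False): "PG-M(V3)",
}


def skipping_preserves_frame(exon_length_bp: int) -> bool:
    """Skipping an exon keeps the downstream frame if its length is 3n."""
    return exon_length_bp % 3 == 0


def cs_region_length(include_vii: bool, include_viii: bool) -> int:
    """Length in bp contributed by exons VII and VIII to a transcript."""
    return include_vii * EXON_VII_BP + include_viii * EXON_VIII_BP


def isoform_table() -> dict[str, tuple[int, int]]:
    """Map each isoform to (bp, codons) contributed by exons VII and VIII."""
    table = {}
    for (vii, viii), name in ISOFORMS.items():
        bp = cs_region_length(vii, viii)
        table[name] = (bp, bp // 3)
    return table


if __name__ == "__main__":
    print(skipping_preserves_frame(EXON_VII_BP), skipping_preserves_frame(EXON_VIII_BP))
    for name, (bp, codons) in isoform_table().items():
        print(f"{name}: {bp} bp, {codons} codons")
```

The codon counts refer only to the portion of the core protein encoded by exons VII and VIII. They are not full protein lengths.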

  
Table I: Summary of exon sizes of the mouse PG-M gene and the corresponding domain of the transcript

The name and characteristic of each domain are described in the text except for the following: 5'-UTR, 5'-untranslated region; SP, signal peptide domain; and Stop, 3'-terminal region containing translated and untranslated sequences.



FOOTNOTES

*
This work was supported in part by grants-in-aid from the Ministry of Education, Culture, and Science of Japan, by special coordination funds for promoting science and technology from the Science and Technology Agency, and by a special research fund from Seikagaku Corp. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

The nucleotide sequences reported in this paper have been submitted to the GSDB/DDBJ/EMBL/NCBI data bases with accession numbers D45888 (5'-promoter region of the mouse gene for the PG-M core protein (800 bp)) and D45889 (3'-noncoding region of the mouse gene for the PG-M core protein (2201 bp)).

§
To whom correspondence should be addressed. Tel.: 0561-62-3311 (ext. 2088); Fax: 0561-63-3532.

The abbreviations used are: kb, kilobase pair(s); bp, base pair(s); PCR, polymerase chain reaction; CS, chondroitin sulfate; 5'-RACE, rapid amplification of cDNA 5'-ends.


ACKNOWLEDGEMENTS

We are grateful to Y. Noda for assistance. We are also grateful to Drs. G. Eguchi and T. Takeuchi (National Institute for Basic Biology) for providing the END-D cells.


REFERENCES
  1. Kimata, K., Oike, Y., Tani, K., Shinomura, T., Yamagata, M., Uritani, M., and Suzuki, S. (1986) J. Biol. Chem. 261, 13517-13525
  2. Shinomura, T., Jensen, K. L., Yamagata, M., Kimata, K., and Solursh, M. (1990) Anat. Embryol. 181, 227-233
  3. Shinomura, T., and Kimata, K. (1990) Dev. Growth & Differ. 32, 243-248
  4. Shinomura, T., Nishida, Y., Ito, K., and Kimata, K. (1993) J. Biol. Chem. 268, 14461-14469
  5. Ito, K., Shinomura, T., Zako, M., Ujita, M., and Kimata, K. (1995) J. Biol. Chem. 270, 958-965
  6. Dours-Zimmermann, M. T., and Zimmermann, D. R. (1994) J. Biol. Chem. 269, 32992-32998
  7. Zimmermann, D. R., and Ruoslahti, E. (1989) EMBO J. 8, 2975-2981
  8. Ujita, M., Shinomura, T., Ito, K., Kitagawa, Y., and Kimata, K. (1994) J. Biol. Chem. 269, 27603-27609
  9. LeBaron, R. G., Zimmermann, D. R., and Ruoslahti, E. (1992) J. Biol. Chem. 267, 10003-10010
  10. Zako, M., Shinomura, T., Ujita, M., Ito, K., and Kimata, K. (1995) J. Biol. Chem. 270, 3914-3918
  11. Sanger, F., Nicklen, S., and Coulson, A. R. (1977) Proc. Natl. Acad. Sci. U. S. A. 74, 5463-5467
  12. Frohman, M. A., Dush, M. K., and Martin, G. R. (1988) Proc. Natl. Acad. Sci. U. S. A. 85, 8998-9002
  13. Edwards, J. B. D. M., Delort, J., and Mallet, J. (1991) Nucleic Acids Res. 19, 5227-5232
  14. Greene, J. M., and Struhl, K. (1994) in Current Protocols in Molecular Biology (Ausubel, F. M., Brent, R., Kingston, R. E., Moore, D. D., Seidman, J. G., Smith, J. A., and Struhl, K., eds) pp. 4.6.1-4.6.13, John Wiley & Sons, Inc., New York
  15. Vihinen, T., Auvinen, P., Alanen-Kurki, L., and Jalkanen, M. (1993) J. Biol. Chem. 268, 17261-17269
  16. Shapiro, M. B., and Senapathy, P. (1987) Nucleic Acids Res. 15, 7155-7174
