We previously showed not only the presence of multiple RNA
transcripts of different sizes encoding the core protein of mouse PG-M,
but also their tissue-dependent expression. The multiple forms were
found to arise mainly from alternative usage of the two
different chondroitin sulfate attachment domains (
PG-M, a large chondroitin sulfate proteoglycan, was shown to be
expressed preferentially in the mesenchymal cell condensation area of
chick limb bud at the prechondrogenic stages (1, 2).
Expression in this area, however, is suppressed when aggrecan is
expressed upon cartilage differentiation. Such a transient expression
pattern suggests that PG-M may play important roles as an
extracellular factor influencing tissue differentiation and
morphogenesis (3).
We have isolated cDNA clones encoding the
entire core proteins of chicken PG-M (4) and mouse PG-M (5). Recently,
the complete sequence of human PG-M has also been determined (6); part
of it was initially identified as versican (7), which corresponds to
one of the alternatively spliced forms of PG-M (6). Protein homology
analysis of the deduced amino acid sequence of the core protein
revealed a link protein-like sequence in the amino-terminal region and
two epidermal growth factor-like sequences, one C-type lectin-like
sequence, and one complement regulatory protein-like sequence in the
carboxyl-terminal region. These amino- and carboxyl-terminal regions
have been shown to have hyaluronan-binding and oligosaccharide-binding
activities, respectively (8, 9). In addition, the amino acid sequences
of these regions are highly conserved during evolution. In contrast,
the primary structure of the chondroitin sulfate attachment region in
the middle of the core protein is not conserved. Furthermore,
expression of this region is regulated by alternative splicing, which
generates four different forms of the PG-M core protein (5, 10).
In previous studies, we showed the presence
of at least seven mRNA species with various sizes of 12, 10, 9, 8, 7.5,
6.5, and 3 kb
In previous studies, we identified PG-M transcripts of
various sizes in embryonic limb buds, adult brain, and cultured aortic
endothelial cells (4, 5). The expression of each transcript varied from
tissue to tissue. Alternative splicing was a major cause of this
variation (4, 5, 6, 10); however, other causes have remained
unexplained. We therefore examined the genomic basis for the
generation of the various transcripts encoding the PG-M core protein.
In this study, we have characterized the complete genomic structure of
the mouse PG-M gene. The gene is large, with 15 exons distributed over
about 100 kb. The exon-intron architecture is fully consistent with the
domain structure of the PG-M core protein. The chondroitin sulfate
attachment region is encoded by two large exons, VII and VIII, spanning
approximately
In conclusion, we have
identified four different PG-M splice variants, shown schematically in
Fig. 7. A transcript containing both exons VII and VIII
corresponds to the PG-M(V0) form. When either of the two exons is
spliced out, PG-M(V1) or PG-M(V2) is generated. When both are spliced
out, PG-M(V3) is generated. These isoforms may have distinct functions.
Supporting this hypothesis, we have shown that different splice forms
of the PG-M molecule are expressed in distinct spatial and temporal
patterns (5, 6, 10). The differences among the isoforms may affect
those extracellular functions of PG-M in which chondroitin sulfate
chains are involved. Thus, the expression of different PG-M molecules
might influence the formation of the extracellular network and could
modulate cell-matrix and cell-cell interactions.
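The isoform scheme is a two-way inclusion/exclusion rule, which a short Python sketch makes explicit. The function names are hypothetical illustrations, not part of the original work. The mapping of PG-M(V1) to transcripts that retain only exon VIII (and of V2 to those retaining only exon VII) is inferred from the Northern data, in which V1 messages hybridized with the probe for the large chondroitin sulfate attachment domain but not with the probe for the other domain; the exon sizes are those reported for exons VII (2880 bp) and VIII (5229 bp).

```python
# Sizes (bp) of the two alternatively spliced CS-attachment exons
EXON_SIZES_BP = {"VII": 2880, "VIII": 5229}


def pgm_isoform(has_vii, has_viii):
    """Name the PG-M splice variant from the CS exons a transcript retains."""
    if has_vii and has_viii:
        return "V0"  # both CS-attachment domains present
    if has_viii:
        return "V1"  # exon VII skipped; only the large exon VIII domain remains
    if has_vii:
        return "V2"  # exon VIII skipped; only the exon VII domain remains
    return "V3"      # both CS-attachment exons skipped


def cs_region_bp(has_vii, has_viii):
    """Coding length (bp) contributed by the retained CS-attachment exons."""
    vii = EXON_SIZES_BP["VII"] if has_vii else 0
    viii = EXON_SIZES_BP["VIII"] if has_viii else 0
    return vii + viii


if __name__ == "__main__":
    for vii in (True, False):
        for viii in (True, False):
            print(pgm_isoform(vii, viii), cs_region_bp(vii, viii), "bp")
```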
The names and
characteristics of the domains are described in the text except for the
following: 5′-UTR, 5′-untranslated region; SP, signal peptide domain;
and Stop, 3′-terminal region containing translated and untranslated
sequences.
The nucleotide
sequences reported in this paper have been submitted to the
GSDB/DDBJ/EMBL/NCBI data bases with accession numbers D45888
(5′-promoter region of the mouse gene for the PG-M core protein (800
bp)) and D45889 (3′-noncoding region of the mouse gene for the PG-M
core protein (2201 bp)).
We are grateful to Y. Noda for assistance. We are also
grateful to Drs. G. Eguchi and T. Takeuchi (National Institute for
Basic Biology) for providing the END-D cells.
CS-α and CS-β). In
this study, genomic DNA analysis revealed that these domains are
encoded by two large exons, exon VII (2880 base pairs) and exon VIII
(5229 base pairs). The splice sites of these two exons are consistent
with alternative splicing without frameshift.
Furthermore, the mouse PG-M gene was shown to have four distinct
polyadenylation signals as well as three candidate transcription
initiation sites. These structural features of the gene may
contribute to the multiplicity of PG-M transcripts. Northern
hybridization analysis showed that at least three different transcripts
are generated by differential usage of the distinct polyadenylation
signals.
encoding mouse PG-M core protein
and that their appearance varied in a tissue-dependent manner (5, 10).
As described above, the major variation has been shown to arise from
alternative splicing of the chondroitin sulfate attachment region. In
this study, we determined the genomic DNA sequence, which provides
evidence not only for alternative splicing without frameshift, but also
for minor size variations among mRNA species due to the use of
multiple transcription initiation sites and polyadenylation signals.
Isolation and Characterization of the Mouse PG-M Gene
A 129/sv mouse genomic DNA library in the Fix II
vector (Stratagene, La Jolla, CA) was used for screening. About 5
10
plaques were screened with various
³²P-labeled cDNA and genomic DNA probes encoding mouse PG-M
core protein. Restriction enzyme mapping and cross-hybridization
analysis of isolated genomic clones were performed to identify
overlapping clones. The genomic DNA inserts were excised from the phage
DNA by digestion with NotI or SalI and then were
subcloned into plasmid vector pGEM3Zf(
) (Promega) for further
characterization. To map exons, results from restriction mapping,
hybridization of Southern blots, and screening were combined. The size
of each intron was determined by restriction mapping and, in some
cases, by polymerase chain reaction (PCR) using primers located in the
exons.
DNA Sequencing and Analysis
The nucleotide
sequence of the isolated genomic DNA was determined by the dideoxy
chain termination method
(11) using oligonucleotide primers
synthesized on the basis of the cDNA sequence and of the sequences determined in preceding reactions.
The DNA sequences obtained were compiled and analyzed using DNASIS
computer programs (Hitachi Software Engineering Co.).
Northern Blot Analysis
About 5 µg of
poly(A)+ RNA purified from a cultured mouse aortic
endothelial cell line (END-D) was used for Northern blot analysis as
described previously (4, 5). Five different DNA probes
were used for hybridization. Probe A corresponds to the
hyaluronan-binding region (nucleotides 653-1220 in Fig. 2A of Ref. 5). Probe B corresponds to a portion of the CS-β domain
of the chondroitin sulfate attachment region (nucleotides
1486-6160 in Fig. 2A of Ref. 5). Probes C-E,
corresponding to different portions of the 3′-untranslated region, were
prepared by PCR amplification using isolated genomic DNA as a template.
Their exact positions are as follows: probe C, nucleotides 1-452;
probe D, nucleotides 665-1169; and probe E, nucleotides 1280-1867
(see Fig. 3). Each probe was radiolabeled with
[α-³²P]dCTP (Amersham International, plc) and
was used sequentially for hybridizations as described previously (5).
RNA sizes were determined using an RNA ladder (Life
Technologies, Inc.).
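Reading RNA sizes from a ladder is usually done by interpolating on a semilogarithmic plot, on the assumption that log(size) is roughly linear in migration distance over the resolving range of the gel. The sketch below fits that line by least squares; the function names are illustrative, and the marker sizes and band distances in the example are hypothetical, not values for the Life Technologies ladder or for the blots in this paper.

```python
import math


def fit_semilog(ladder):
    """Least-squares fit of log10(size) = intercept + slope * distance.

    `ladder` is a list of (distance_mm, size_kb) pairs for the marker bands.
    """
    xs = [d for d, _ in ladder]
    ys = [math.log10(s) for _, s in ladder]
    n = len(ladder)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def estimate_size_kb(distance_mm, intercept, slope):
    """Convert a band's migration distance into an estimated size (kb)."""
    return 10 ** (intercept + slope * distance_mm)


if __name__ == "__main__":
    # Hypothetical marker bands: (migration distance in mm, size in kb)
    ladder = [(12.0, 9.5), (17.0, 7.5), (28.0, 4.4), (41.0, 2.4), (52.0, 1.4)]
    a, b = fit_semilog(ladder)
    for d in (11.0, 14.0, 18.0):  # hypothetical sample band positions
        print(f"{d} mm -> {estimate_size_kb(d, a, b):.1f} kb")
```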
Figure 2:
Alignment of
exon-intron boundaries and comparison with consensus sequences for
splicing. Exon sequences are indicated by upper-case letters and intron sequences by lower-case letters. Exon
sequences in the 5`-untranslated region are shown in italics.
Amino acids encoded near and at splice junctions are indicated in
three-letter code below their codons. Splice site consensus
sequences are shown at the top and are adapted from the work of Shapiro
and Senapathy (16). Py, pyrimidine.
Figure 3:
3′-Region
(exon XV) of the mouse PG-M gene. The start site of exon XV is numbered
1. Four poly(A) signals are indicated by underlined boldface letters. Coding and noncoding regions
are shown in lower-case and upper-case letters,
respectively. The termination codon is
underlined.
Analysis of the 5′-Ends of Mouse PG-M RNAs by PCR
Rapid amplification of cDNA ends (RACE) was performed to identify
transcription initiation sites (12, 13), using a 5′-AmpliFINDER RACE kit
(CLONTECH, Palo Alto, CA). Briefly, 2 µg of END-D
cell-derived poly(A)+ RNA was annealed with an
antisense gene-specific primer corresponding to the cDNA sequence
derived from the third exon (primer a,
5′-dGAGAGCCTTTAACAGGTGGGCTGGTTTCC-3′). To prevent premature
termination of the reaction at secondary structures of the mRNA,
reverse transcription was carried out with avian myeloblastosis virus
reverse transcriptase at 52 °C for 30 min. The 3′-end of the
resulting first-strand cDNA was ligated by T4 RNA ligase to a
single-stranded oligonucleotide anchor provided in the kit. Then, an
aliquot of the reaction product was used as a template for PCR
amplification with a primer complementary to the anchor sequence
(primer d) and a nested gene-specific primer corresponding to the
junction of the first and second exons (primer b,
5′-dCAACATCTTGTCCTTGAAAGGCGGCTTACTGC-3′). The reaction was performed in
a Model PJ 9600 DNA thermal cycler (Perkin-Elmer) using a GeneAmp DNA
amplification reagent kit (Takara Shuzo Co.) for 30 cycles of 95 °C
for 1 min, 65 °C for 2 min, and 72 °C for 3 min. The product was
then used as a template for a second PCR amplification with the
anchor primer and an antisense gene-specific primer modified
to contain an EcoRI site for post-amplification cloning
(primer c, 5′-dTGGGAATTCACTGCTGAGCTGTGAGAAAGGGCTCCG-3′). The final
product was analyzed by agarose gel electrophoresis (see Fig. 5).
Regions of the gel containing specific products were isolated, and the
DNAs were purified by JETSORB (GENOMED GmbH, Oeynhausen, Federal
Republic of Germany) and cloned into a pGEM3Zf(
) vector after
EcoRI digestion. The inserted cDNA sequences were analyzed by
the dideoxy chain termination method (11).
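Because primer c carries an EcoRI site for post-amplification cloning, a quick check of the three gene-specific RACE primers as printed can confirm that the site (GAATTC) occurs only in primer c. This is an illustrative sketch, not part of the authors' protocol; the helper names are hypothetical. Since GAATTC is palindromic, the site is present on both strands of the amplified product.

```python
ECORI = "GAATTC"  # EcoRI recognition sequence (cuts G^AATTC)

# Gene-specific 5'-RACE primers as given in the text (5'->3')
PRIMERS = {
    "a": "GAGAGCCTTTAACAGGTGGGCTGGTTTCC",
    "b": "CAACATCTTGTCCTTGAAAGGCGGCTTACTGC",
    "c": "TGGGAATTCACTGCTGAGCTGTGAGAAAGGGCTCCG",
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_site(seq, site=ECORI):
    """0-based start positions of every occurrence of `site` in `seq`."""
    return [i for i in range(len(seq) - len(site) + 1) if seq.startswith(site, i)]


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, len(seq), "nt; EcoRI at", find_site(seq))
```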
Figure 5:
Determination of the mouse PG-M gene
transcription initiation site by a modified 5′-RACE method. A,
the positions of four different primers used for the reverse
transcription reaction and PCR amplification are indicated by
arrows a-d. Different exon boundaries are
indicated by vertical arrows. EcoRI sites
are indicated by open inverted triangles. mRNA and
first-strand cDNA are indicated by thin lines. The
anchor is indicated by the thick line. B,
the products from the second PCR amplification were analyzed by agarose
gel electrophoresis. The amplified DNA fragments are indicated by
arrowheads. DNA size markers in base pairs are shown to the
right.
Primer Extension Analysis
Two different
antisense oligonucleotide primers were used
(5′-dAGCTGGAGAGGAGAGAGGATGCGCTGGTAG-3′ and
5′-dTCACATAGGAAGCGCGGTGGCCTGGGGAAG-3′, corresponding to nucleotides
+147 to +118 and nucleotides +198 to +169,
respectively, in exon I of the mouse PG-M gene in Fig. 6). Each
primer was end-labeled with [γ-³²P]dATP
(Amersham International, plc) using T4 polynucleotide kinase (Takara
Shuzo Co.) and was coprecipitated with either 15 µg of total RNA or
3 µg of poly(A)+ RNA from END-D cells. RNA and
primer were resuspended in 15 µl of 150 mM KCl, 10
mM Tris-HCl, pH 8.3, 1 mM EDTA; heated at 65 °C
for 60 min; and then cooled to room temperature. This mixture was then
incubated with 30 µl of extension mixture (30 mM Tris-HCl,
pH 8.3, 15 mM MgCl₂, 8 mM dithiothreitol,
1 mM dNTPs, and 0.22 mg/ml actinomycin D) and 1 µl (20 units)
of avian myeloblastosis virus reverse transcriptase. After incubation
at 42 °C for 60 min, RNase A and EDTA were added, and the mixture
was incubated at 37 °C for 30 min. The reaction mixture was phenol-
and chloroform-extracted, ethanol-precipitated, and analyzed on a 6%
polyacrylamide gel containing 7 M urea.
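The primer coordinates follow promoter-style numbering, in which +1 is the reference start (Site II in Fig. 6) and, we assume, there is no position 0. Under that convention each 30-nt primer should span exactly 30 positions, and a cDNA extended from the primer's 5′ end to a start at +1 would be 147 or 198 nt long. A minimal sketch of the arithmetic; the function names are illustrative.

```python
def span_length(a, b):
    """Inclusive length from position a to b in numbering with no position 0."""
    lo, hi = min(a, b), max(a, b)
    length = hi - lo + 1
    if lo < 0 < hi:
        length -= 1  # the numbering jumps from -1 directly to +1
    return length


# Antisense primer sequences (5'->3') with exon I positions of their 5' and 3' ends
PRIMERS = {
    "AGCTGGAGAGGAGAGAGGATGCGCTGGTAG": (147, 118),
    "TCACATAGGAAGCGCGGTGGCCTGGGGAAG": (198, 169),
}


def extension_product_length(primer_5prime_pos, start_site=1):
    """cDNA length from a primer's 5' end to an mRNA 5' end at start_site."""
    return span_length(start_site, primer_5prime_pos)


if __name__ == "__main__":
    for seq, (p5, p3) in PRIMERS.items():
        print(len(seq), "nt primer spans", span_length(p5, p3), "positions;",
              extension_product_length(p5), "nt product for a +1 start")
```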
Figure 6:
5′-Region of the mouse PG-M gene. The
three transcription initiation sites in exon I are indicated by
underlined boldface letters. A possible TATA sequence is
underlined. The intron sequence is shown in lower-case
letters.
S1 Nuclease Analysis
S1 nuclease protection assays
were performed as described previously (14). Briefly,
single-stranded DNA probes complementary to the transcribed sequence
were prepared by primer extension with the
³²P-end-labeled antisense oligonucleotide primers described
under ``Primer Extension Analysis'' and Klenow DNA polymerase,
using a genomic DNA clone as the template. After digestion with
SacI, the radiolabeled fragments were purified by electrophoresis
on an alkaline agarose gel to obtain single-stranded probes. We used
two different probes that contained 264 and 315 nucleotides of sequence,
corresponding to nucleotides
−117 to +147 and nucleotides
−117 to +198, respectively, in Fig. 6. Approximately 1
10
cpm of each probe was hybridized to 50 µg of
total RNA in the presence of 80% formamide at 30 °C overnight after
heat denaturation at 65 °C for 10 min. After digestion with S1
nuclease, protected products were analyzed on a 6% polyacrylamide gel
containing 7 M urea.
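The S1 probe lengths are consistent with a numbering that has no position 0: counting from −117 (presumably the SacI-cut end) to +147 or +198 gives 264 and 315 nucleotides, and an mRNA starting at +1 would protect 147 or 198 nucleotides of labeled probe. A sketch assuming that convention; the helper names are hypothetical.

```python
def span_length(a, b):
    """Inclusive length from position a to b in numbering with no position 0."""
    lo, hi = min(a, b), max(a, b)
    length = hi - lo + 1
    if lo < 0 < hi:
        length -= 1  # the numbering jumps from -1 directly to +1
    return length


# Probe ends in Fig. 6 coordinates: -117 (assumed SacI end) to the labeled primer end
S1_PROBES = {"probe_147": (-117, 147), "probe_198": (-117, 198)}


def protected_length(start_site, labeled_end):
    """Length of labeled probe protected by an mRNA whose 5' end is start_site."""
    return span_length(start_site, labeled_end)


if __name__ == "__main__":
    for name, (left, right) in S1_PROBES.items():
        print(name, span_length(left, right), "nt probe;",
              protected_length(1, right), "nt protected by a +1 start")
```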
Isolation of the Mouse PG-M Gene
A 129/sv mouse
genomic DNA library was initially screened with cDNA probes derived
from the mouse PG-M cDNA clones
(5). After several positive
clones had been isolated, further clones were obtained using the 5′- or
3′-terminal regions of the previously isolated genomic clones
as probes. A total of 11 independent, overlapping clones were
isolated; together they represent a continuous genomic region of almost
100 kb (Fig. 1).
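Chromosome walking of this kind is complete when the overlapping clones merge into a single contig. Interval merging expresses that check; in the sketch below the function name is illustrative and the clone coordinates are hypothetical placeholders, since the actual clone positions are given only graphically in Fig. 1.

```python
def merge_clones(clones):
    """Merge overlapping (start_kb, end_kb) clone intervals into contigs.

    Intervals that merely touch (start == previous end) are treated as joined.
    """
    contigs = []
    for start, end in sorted(clones):
        if contigs and start <= contigs[-1][1]:
            contigs[-1][1] = max(contigs[-1][1], end)  # extend current contig
        else:
            contigs.append([start, end])  # gap: start a new contig
    return [tuple(c) for c in contigs]


if __name__ == "__main__":
    # Hypothetical coordinates (kb) for 11 overlapping phage clones
    clones = [(0, 14), (9, 25), (20, 34), (30, 45), (40, 55), (50, 63),
              (60, 72), (68, 80), (76, 88), (84, 95), (90, 99)]
    print(merge_clones(clones))
```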
Figure 1:
Organization of the mouse PG-M gene
and alignment of isolated genomic DNA clones. Eleven phage clones
representing an approximately 100-kb genomic region are shown as numbered
horizontal lines. Exons are indicated by closed boxes and are numbered from I to XV. The corresponding
domains of the PG-M core protein are also shown. The names and
characteristics of the domains are described under
``Results.'' Introns as well as 5′- and 3′-flanking regions
are indicated by lines. The initiation (ATG) and termination
(TGA) codons are indicated by arrows. A 10-kb scale bar is
shown. EGF, epidermal growth factor; LEC, C-type
lectin; CRP, complement regulatory protein.
Structural Organization of the Mouse PG-M Gene
By
a combination of restriction enzyme mapping, PCR, and partial
sequencing, the mouse PG-M gene was determined to contain 15 exons. All
exon sequences, including part of the exon-flanking regions, were
sequenced in both directions. Fig. 2 summarizes the splice junction
sequences. The position and size of each exon within the
restriction-mapped, almost 100-kb region are shown to scale in Fig. 1.
The exon-intron architecture of the mouse PG-M gene correlates with the
domain structure of the PG-M core protein. Table I summarizes the exon
sizes and the corresponding domain structures. Exon I contains part of the 5′-untranslated
sequence. The remainder of the 5′-untranslated sequence is located in
exon II, which contains the translation initiation site. Exons
III-VI encode the hyaluronan-binding region, whose structure is
similar to that of link protein and consists of three loop-like
subdomains, loops A, B, and B′. The loop A structure shows homology to
an immunoglobulin fold and is encoded by exon III. The loop B and B′
subdomains form a tandem homologous repeat. The former is encoded by
two exons, exons IV and V, whereas the loop B′ subdomain is
encoded by a single exon, exon VI. Exons VII and VIII are very large
and encode the two different chondroitin sulfate attachment domains,
CS-α and CS-β, respectively (5). These two exons are
differentially included in a tissue-dependent manner by alternative
splicing (5, 10). Finally, the carboxyl-terminal
region, which has been shown to possess sugar-binding activity (8), is
encoded by seven exons. This region consists of two
epidermal growth factor-like domains, one C-type lectin-like domain,
and one complement regulatory protein-like domain. The two epidermal growth
factor-like tandem repeats are encoded by exons IX and X. The C-type
lectin-like motif is encoded by exons XI-XIII. The complement
regulatory protein-like motif is located in exon XIV. Exon XV contains
the rest of the carboxyl-terminal coding region and the entire
3′-untranslated region.
Polyadenylation Sites
The mouse PG-M gene is
transcribed into at least seven mRNA species of different sizes (5, 10).
These differences have been shown to arise from alternative usage of
either or both of exons VII and VIII, which encode the two chondroitin
sulfate attachment domains, CS-α and CS-β (5), or from usage of
neither (10).
However, it remains to be determined how the minor size differences
arise, for example among the 10-, 9-, and 8-kb transcripts
detected in END-D cells. By analogy with other comparable
examples (15), these differences are likely due to
alternative usage of polyadenylation sites. To test this
possibility, we sequenced exon XV, which contains the 3′-untranslated
region, for 2076 bp downstream of the termination codon. As shown in
Fig. 3, four different polyadenylation signals were detected.
We then carried out Northern blot
analysis. Poly(A)+ RNA obtained from END-D cells was
hybridized with DNA probes for different parts of the
3′-untranslated region, as shown in Fig. 4. Three mRNA species of
10, 9, and 8 kb were detected by hybridization with probes
for the hyaluronan-binding region (Fig. 4, lane 1) and the large
chondroitin sulfate attachment domain, CS-β (lane 2), but not with the
probe for the other chondroitin sulfate attachment domain, CS-α (data
not shown). All three mRNA species hybridized with probe C
(Fig. 4, lane 3), whereas only the larger 10- and 9-kb species were
detected with probe D (lane 4). Furthermore, hybridization with probe E
(lane 5) was restricted to the largest, 10-kb species. These results
indicate that the three mRNA species in END-D cells are generated by
differential usage of the polyadenylation signals.
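The Northern logic can be made explicit: a transcript that uses a given AATAAA signal retains exon XV only up to a short distance past that signal, so it hybridizes only with probes that begin upstream of its 3′-end. The sketch below uses the signal positions and probe spans given in the text and assumes, as a typical value rather than a measured one, that cleavage occurs about 20 nt downstream of the hexamer; the function names are illustrative.

```python
# AATAAA signal start positions (bp from the start of exon XV; Fig. 3)
POLYA_SIGNALS = [305, 495, 1102, 1540]

# Probe spans within exon XV (Experimental Procedures)
PROBES = {"C": (1, 452), "D": (665, 1169), "E": (1280, 1867)}

HEXAMER_LEN = len("AATAAA")
CLEAVAGE_OFFSET = 20  # assumption: typical cleavage distance, not measured here


def transcript_end(signal_start, offset=CLEAVAGE_OFFSET):
    """Approximate last exon XV position retained when this signal is used."""
    return signal_start + HEXAMER_LEN - 1 + offset


def probes_detecting(signal_start, offset=CLEAVAGE_OFFSET):
    """Probes whose span overlaps exon XV positions 1..transcript_end."""
    end = transcript_end(signal_start, offset)
    return sorted(name for name, (lo, hi) in PROBES.items() if lo <= end and hi >= 1)


if __name__ == "__main__":
    for s in POLYA_SIGNALS:
        print(s, probes_detecting(s))
```

With these inputs the predicted patterns (C only; C and D; C, D, and E) match the 8-, 9-, and 10-kb species, and they do not change for any offset between 10 and 30 nt. The probes cannot distinguish use of the first signal from the second, so the 8-kb message could reflect either.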
Figure 4:
Northern blot analysis of the mRNA
encoding mouse PG-M core protein. Poly(A)+ RNA obtained
from cultured END-D cells was analyzed by Northern blot hybridization
using probe A (lane 1), probe B (lane 2), probe C (lane 3), probe D (lane 4), and probe E (lane 5) as described
under ``Experimental Procedures.'' The locations of probes
C-E are shown at the bottom. The four polyadenylation signals,
located at 305, 495, 1102, and 1540 bp in Fig. 3, are marked
by closed circles. The termination codon is indicated
by the arrow. The region encoding the carboxyl-terminal
end of the PG-M core protein is indicated by boxes as follows:
epidermal growth factor-like (EGF), C-type lectin-like
(LEC), and complement regulatory protein-like (CRP)
domains. The sizes of the three mRNA bands are indicated in kilobases
by the arrows to the left. The smeared bands
were discussed previously (5).
Transcription Initiation Sites
The transcription
initiation sites of the mouse PG-M gene were analyzed by primer
extension, S1 nuclease protection, and modified 5′-RACE methods.
Although the first two methods gave no clear result because the
signals were too weak, the modified 5′-RACE method amplified three
different DNA fragments, as shown in Fig. 5. Sequence analysis of
these fragments suggested the presence of three different initiation
sites, located 313 (Site I), 280 (Site II), and 254 (Site
III) bases upstream of the 3′-end of the first exon (Fig. 6).
However, some of these apparent initiation sites might reflect
premature termination of the reverse transcription reaction at
secondary structures rather than the true 5′-end of the mRNA. In this
regard, a possible TATA box was found 28 bases upstream of Site II,
which was numbered as position +1, whereas Sites I and III had no
canonical TATA box in their vicinity. Further analysis will be
needed to confirm the actual transcription initiation site.
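Converting the three RACE-derived distances into positions relative to Site II (+1) shows how closely spaced the candidate starts are. This worked sketch assumes that all three distances are measured to the same 3′-end of exon I and that the numbering has no position 0; the names are illustrative.

```python
# Distances (nt) of each RACE-derived start site upstream of the 3' end of exon I
UPSTREAM_OF_EXON_I_END = {"I": 313, "II": 280, "III": 254}
REFERENCE_SITE = "II"  # numbered +1 in Fig. 6


def relative_position(distance, ref_distance):
    """Position relative to the reference site (+1), skipping position 0."""
    offset = ref_distance - distance  # > 0 means downstream of the reference
    return offset + 1 if offset >= 0 else offset


def site_positions():
    ref = UPSTREAM_OF_EXON_I_END[REFERENCE_SITE]
    return {name: relative_position(d, ref) for name, d in UPSTREAM_OF_EXON_I_END.items()}


if __name__ == "__main__":
    spread = max(UPSTREAM_OF_EXON_I_END.values()) - min(UPSTREAM_OF_EXON_I_END.values())
    print(site_positions(), "spread:", spread, "nt")
```

Site I then falls at −33 and Site III at +27; the three sites lie within 59 nt of one another, far less than the roughly 1-kb steps between the 10-, 9-, and 8-kb messages.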
3 and 5 kb, respectively. These exons are
alternatively spliced out without frameshift, which yields four
different forms (Fig. 7), as further discussed below. Other exons
could likewise be spliced out without frameshift, as indicated in
Fig. 2. However, despite an extensive reverse transcription-PCR survey
of mRNAs from various tissues, we have not identified any other
alternatively spliced forms (data not shown).
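Skipping a coding exon preserves the downstream reading frame only when the exon length is a multiple of 3, which is equivalent to both of its boundaries having the same splice phase. A quick check of the reported sizes of exons VII and VIII, written as a sketch with illustrative names:

```python
# Exon sizes (bp) reported for the two CS-attachment exons
EXON_SIZES_BP = {"VII": 2880, "VIII": 5229}


def skip_preserves_frame(length_bp):
    """True if removing length_bp nt of internal coding sequence keeps the
    downstream reading frame, i.e. the length is divisible by 3."""
    return length_bp % 3 == 0


if __name__ == "__main__":
    for skipped in (("VII",), ("VIII",), ("VII", "VIII")):
        total = sum(EXON_SIZES_BP[e] for e in skipped)
        print("+".join(skipped), total, "bp:", skip_preserves_frame(total),
              total // 3, "codons")
```

Both exons, and their sum, are divisible by 3 (960 and 1743 codons), consistent with all four isoforms being in frame.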
Figure 7:
Schematic representation of the structures
of four different mouse PG-M molecules. Thick lines represent the core proteins. The various structural domains are
depicted as folded structures based on the presumed disulfide bonding.
Thin lines represent chondroitin sulfate side chains,
and the long thin line represents
hyaluronan. The open arrow and the sugar residues at the
top indicate interactions between the carboxyl-terminal region and
these sugar residues.
Our genomic cloning of the PG-M gene
revealed the presence of four AATAAA polyadenylation signals
(Fig. 3). Northern hybridization analysis using probes for different
parts of the 3′-untranslated region suggested that the
10-, 9-, and 8-kb mRNAs detected in END-D cells are generated by
differential usage of the polyadenylation signals (Fig. 4).
These messages hybridized with the hyaluronan-binding region and
CS-β probes, but not with the CS-α
probe, indicating that all
of these transcripts encode PG-M(V1). Furthermore, modified 5′-RACE
analysis suggested the presence of three possible
transcription initiation sites (Figs. 5 and 6). However, these sites
lie too close together to account for the observed size differences
among the mRNAs. Further study will be needed to explain the variation
completely.
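Locating candidate polyadenylation signals such as those in Fig. 3 amounts to scanning the 3′ sequence for the AATAAA hexamer, including overlapping occurrences. A minimal sketch: the function name is hypothetical, only the canonical hexamer is searched, and the example is a toy string rather than the PG-M sequence (deposited as D45889). Positions are 1-based to match the numbering of Fig. 3.

```python
def find_polya_signals(seq, motif="AATAAA"):
    """Return 1-based start positions of every (possibly overlapping) motif."""
    seq = seq.upper()
    hits = []
    i = seq.find(motif)
    while i != -1:
        hits.append(i + 1)          # convert 0-based index to 1-based position
        i = seq.find(motif, i + 1)  # advance by one so overlaps are not missed
    return hits


if __name__ == "__main__":
    toy = "ggAATAAAccAATAAATAAAtt"  # toy sequence, not PG-M
    print(find_polya_signals(toy))
```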
Table: Summary of exon sizes of the mouse PG-M gene
and the corresponding domain of the transcript
©1995 by The American Society for Biochemistry and Molecular Biology, Inc.