A novel 85-kDa protein secreted by the mouse stromal osteogenic
cell line MN7 was identified using two-dimensional polyacrylamide gel
electrophoresis (Mathieu, E., Meheus, L., Raymackers, J., and
Merregaert, J. (1994) J. Bone Miner. Res. 9, 903-913).
Degenerate primers were used to isolate the cDNA coding for this
protein. The full-length cDNA clone is 1.9 kilobases (kb) and codes for
a protein of 559 amino acid residues. The DNA and deduced amino acid
sequences have no counterparts in public databases, but a structural
similarity to serum albumin family proteins and to Endo16 (a
calcium-binding protein of sea urchin), involving typical cysteine
doublets, can be observed. Northern blot analysis revealed the presence of a 1.9-kb
transcript in various tissues and a shorter transcript of 1.5 kb,
derived by alternative splicing, in tail, front paw, and skin of
embryonic mice. The gene for the p85 protein, termed Ecm1 (for
extracellular matrix protein 1), is a single-copy gene, which was
localized to the region on mouse chromosome 3 known to contain at least
one locus associated with developmental disorders of the skin, soft
coat (soc). Alternative splicing may serve as a mechanism for
generating functional diversity in the Ecm1 gene.
The process of embryonic bone formation involves the creation of
an extracellular matrix that mineralizes during the course of tissue
maturation. This matrix is subject to constant remodeling during the
lifetime of an individual, through the combined actions of osteoblasts
and osteoclasts. A careful balance of matrix formation and resorption
must be maintained because perturbations can result in various bone
disorders (reviewed in Ref. 1).
The extracellular matrix of bone
consists of two phases, an organic phase and a mineral one. The organic
phase consists primarily of the collagen type I fibrils that are
associated with a number of noncollagenous matrix proteins. Interest in
the noncollagenous proteins of the bone was greatly stimulated when
Urist first demonstrated that demineralized bone extracts could induce
ectopic bone formation (2). Noncollagenous proteins of bone are
now believed to be involved in mineralization as well as the local
regulation of bone cell function (3, 4). In the past few years, a
number of noncollagenous proteins of bone have been isolated and
characterized; among these are osteocalcin, osteopontin, osteonectin,
and bone sialoprotein (4, 5).
To study the
properties of bone-forming cells, our laboratory has established a
clonal osteogenic cell line (MN7) from bone marrow stroma of the adult
mouse (6). These cells, under appropriate conditions, undergo
typical osteoblastic differentiation in vitro and are able to
form a mineralized extracellular matrix (7, 8).
To
characterize the proteins secreted by MN7 cells during in vitro proliferation and differentiation, two-dimensional polyacrylamide
gel electrophoretic (2D SDS-PAGE)(
In the present study we report the
isolation of the p85 cDNA clone. The full-length cDNA contains an ORF
of 1677 bp encoding a protein of 559 amino acids. Computer analysis of
the deduced primary amino acid sequence revealed a hydrophobic signal
peptide characteristic of a secreted protein. Motif analyses did not
identify features typical of known protein families. The 1.9-kb
message is expressed in various tissues, including liver, heart, and
lungs, whereas a splice variant is present in embryonic cartilage and
skin. The corresponding gene for p85 (called Ecm1, for
extracellular matrix protein 1) maps to mouse chromosome (Chr) 3, in
a region containing several loci involved in developmental disorders
of the skin.
All cell culture materials were purchased from Life
Technologies, Inc. (Ghent, Belgium). RNA for making the cDNA library
was isolated from MN7 cells that were grown as described
before (9). Briefly, cells were grown in BGJ-B medium
(Biggers-Gwatkin-Jones Bone medium, Fitton-Jackson modification)
supplemented with 10% fetal calf serum and 2 mM L-glutamine. After reaching confluence, the cells were
grown in serum-free medium (BGJ-B with ITS
(insulin-transferrin-selenium premix, purchased from Sigma)) for 48 h
before harvesting. MC3T3-E
For time course expression studies, MN7 cells were grown under conditions optimized
for inducing mineralization (8). Briefly, the cells were seeded
at a density of 5
Standard blotting and hybridization procedures were
used (14). Total RNA (10 µg) was
electrophoresed in formaldehyde gels, blotted onto Hybond N membranes
(Amersham Corp.), UV cross-linked, and then hybridized (Amersham
hybridization protocols). Quantitative comparisons of RNAs present on the
blots were made using the murine glyceraldehyde-3-phosphate
dehydrogenase probe (15).
Mouse genomic DNA was digested with different enzymes and
separated on 0.8% agarose gels. The DNAs were blotted onto Hybond N
membranes, UV cross-linked, and hybridized.
A directional cDNA library in the pSPORT
The library was
screened by colony hybridization using selected (see
"Results"), end-labeled oligonucleotides as a probe.
Labeling of the oligonucleotides was performed as described (14) using T4
Microsequencing of the p85 protein was done on spots isolated
from dried two-dimensional gels as described earlier (9). One
N-terminal and two internal peptide sequences were obtained. Among
these, a 6-amino acid N-terminal sequence, DQREMT, had minimum
codon degeneracy and was used to design degenerate oligonucleotide
primers with the help of the PGEN computer program (16). The
oligonucleotides were synthesized commercially (British Bio-technology
Products Ltd.) as 17-mers, in three sets, each 16-fold degenerate. The
sequences of the oligonucleotides were: 1)
5′-CTRGTYGCRCTYTACTG-3′, 2)
5′-CTRGTYGCYCTYTACTG-3′, and 3)
5′-CTRGTYTCYCTYTACTG-3′, where R = A or G and Y = C or T.
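The stated degeneracy of each primer set can be checked by expanding the ambiguity codes. The following is a minimal sketch, not the PGEN program used in the study; the function names `expand` and `degeneracy` are illustrative assumptions, and only the R and Y codes defined in the text are handled.

```python
from itertools import product

# Ambiguity codes as defined in the text: R = A or G, Y = C or T.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT"}

# The three degenerate 17-mer sets, copied from the text.
PRIMER_SETS = {
    1: "CTRGTYGCRCTYTACTG",
    2: "CTRGTYGCYCTYTACTG",
    3: "CTRGTYTCYCTYTACTG",
}


def expand(degenerate):
    """Return every concrete sequence encoded by a degenerate oligonucleotide."""
    choices = [IUPAC[base] for base in degenerate]
    return ["".join(combo) for combo in product(*choices)]


def degeneracy(degenerate):
    """Number of distinct sequences: the product of the choices per position."""
    count = 1
    for base in degenerate:
        count *= len(IUPAC[base])
    return count


if __name__ == "__main__":
    for number, sequence in PRIMER_SETS.items():
        print(number, len(sequence), degeneracy(sequence))
```

Each set carries four two-fold ambiguous positions, which is where the 16-fold degeneracy per set comes from.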
The cDNA clone of p85 was sequenced by the dideoxynucleotide
chain termination method (17) using
[
The software packages used for computer analysis of the
sequence data are named under "Results" where
appropriate.
The 5′ sequence of the message was obtained using the 5′-RACE
(Rapid Amplification of cDNA Ends) amplification system of
Life Technologies, Inc. The three gene-specific primers used are listed
below. NotI and SalI linkers designed for cloning are
indicated in braces; numbers indicate positions on the cDNA (see Fig. 1): GSP1 (antisense, 313-289),
5′-CAGAGTGGTCAACTCGGAGTCTCCG-3′; GSP2 (antisense, 222-198),
5′-{TCGCGAACTAGTGCGGCCGC}CGTCATCTCTCGCTGGTCTGAAGC-3′;
GSP3 (antisense, 178-154),
5′-{TCGCGAACTAGTGCGGCCGC}CAGCAGAAGCAAGAGCCAAGCAGGC-3′.
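As an illustration of how an antisense primer relates to the coding strand, the sketch below reverse-complements the gene-specific part of GSP2 and translates it with the standard genetic code. It assumes that the brace-delimited prefix is the linker and that the remainder is gene-specific; the helper names are illustrative. Under that assumption, the translation in the first reading frame contains the N-terminal peptide DQREMT used for primer design.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons in frame 1; a trailing partial codon is ignored."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


# Gene-specific part of GSP2 (the linker in braces is left out; assumption).
GSP2_SPECIFIC = "CGTCATCTCTCGCTGGTCTGAAGC"

sense = reverse_complement(GSP2_SPECIFIC)
peptide = translate(sense)

if __name__ == "__main__":
    print(sense)
    print(peptide)
```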
Reverse transcription PCR was used to clone the
differentially spliced p85 transcript. Total RNA (1 µg) from mouse
embryonic front paw and tail was reverse transcribed using the
preamplification kit of Life Technologies, Inc. This was followed by
two rounds of PCR amplification using cDNA-specific primers. The
primers used are listed below. NotI and SalI linkers
designed for cloning are indicated in braces. Numbers indicate
positions on the cDNA (Fig. 1): N1 (upper, 29-53),
5′-CAGGCTTGCAGCAAGTGGCCCAACC-3′; N2 (upper, 61-86),
5′-{TCGCGAGTCGAC}GTCTAGATCTGCCTGTGACAACCAGC-3′; C1
(lower, 1870-1847), 5′-GTAATGAGTGTTCGAGGAGGGTGG-3′; C2 (lower,
1806-1784),
5′-{TCGCGAGCGGCCGC}GGTGACTCATTCTTCCTTGGACC-3′.
PCR conditions were as follows: denaturation at 95 °C
for 5 min; 36 cycles of 95 °C for 3 s, 55 °C for 30 s, and 72 °C for
2.30 min; and a final extension of 10 min at 72 °C. Products of the
first amplification (using N1 and C1 primers) were purified by phenol
extraction and then subjected to a second amplification using N2 and C1
primers. The PCR products were analyzed on agarose gels, purified, and
subsequently cloned in the NotI and SalI sites of
pSPORT
A 462-bp EcoRI-PstI fragment of the p85
cDNA clone was used as a hybridization probe on Southern blots to type
DNAs from two genetic crosses: (NFS/N or C58/J
The p85 polypeptide isolated from 2D SDS-PAGE gels was
microsequenced as described (9). An N-terminal and two internal
peptide sequences were obtained (see Fig. 1). Three sets of
degenerate oligonucleotide probes were made against the N-terminal
sequence DQREMT (see "Materials and Methods").
These oligonucleotides were tested by hybridization to MN7
poly(A)+
Screening of approximately 750,000 clones
yielded a single positive clone with an insert size of 1.9 kb. This
clone was sequenced at the 5′ end to confirm the presence of the
N-terminal peptide sequence DQREMT. Subsequently, the insert
was cut into five separate fragments using EcoRI and PstI and subcloned into the pSPORT
To identify the 5′ end of the message, a
5′-RACE reaction was carried out and 18 additional nucleotides were
obtained. The complete cDNA sequence contains an ORF of 1677 bp, which
extends between positions 121 and 1797 and codes for a protein of 559 amino
acids. All three peptide sequences obtained by microsequencing are
present in the deduced protein sequence (see Fig. 1). The first
AUG, at position 121, is in a favorable codon context for a eukaryotic
translation start site (A at -3 and G at +4; Ref. 23). It is
preceded by three in-frame stop codons, while the ORFs in the two
alternative reading frames are not more than 100 bp long. The coding
region is terminated by a single UGA codon and is followed by a
3′-untranslated region of 95 bp, which includes an AAUAAA
polyadenylation signal (24), and a poly(A) tail.
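The ORF coordinates and the start codon context described above can be expressed as a short check. This is a sketch that takes the stated coordinates at face value (1-based, inclusive); the two toy mRNA strings are hypothetical and are not taken from the p85 cDNA.

```python
ORF_START, ORF_END = 121, 1797  # 1-based, inclusive positions from the text


def orf_length(start, end):
    """Length in nucleotides of a span given by 1-based inclusive coordinates."""
    return end - start + 1


def has_favorable_context(mrna, aug_index):
    """Check for A at -3 and G at +4, where the A of AUG is +1.

    aug_index is the 0-based index of the A of the AUG in mrna.
    """
    if aug_index < 3 or aug_index + 3 >= len(mrna):
        return False
    return (
        mrna[aug_index:aug_index + 3] == "AUG"
        and mrna[aug_index - 3] == "A"
        and mrna[aug_index + 3] == "G"
    )


if __name__ == "__main__":
    length = orf_length(ORF_START, ORF_END)
    print(length, length // 3, length % 3)
    # Hypothetical toy messages, used only to exercise the context check.
    print(has_favorable_context("GCCACCAUGGCG", 6))
    print(has_favorable_context("GCCUCCAUGUCG", 6))
```

The span 121-1797 is 1677 nucleotides, that is, 559 complete codons, which matches the stated protein length.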
The protein
sequence contains a putative signal peptide at the N terminus, which is
the only hydrophobic portion of the protein (see Fig. 2b). The cleavage site for the signal peptide is
most probably between the two alanines at positions 19 and 20. This is based on
N-terminal protein sequencing data, in which the first residue is alanine
(at position 20), and on the application of von Heijne's (-3,
-1) rule (25). The cleaved extracellular protein would
therefore contain 540 amino acids, with a calculated mass of 61 kDa and
a pI of 6.26. On 2D SDS-PAGE gels, the p85 protein migrates as a
diffuse spot with an average mass of 85 kDa and a pI of 5.7 (9).
The differences between apparent and predicted values are most probably the
result of post-translational modifications. Indeed, analysis of the
sequence reveals three potential N-glycosylation sites and several possible phosphorylation
sites (see Fig. 1). N-Linked glycosylation can
contribute significantly to the molecular weight of
proteins (26), and phosphorylation has been shown to affect
the mobility of proteins on SDS gels (27). However, the protein
does not contain potential sites for glycosaminoglycan addition,
indicating that it is unlikely to be a proteoglycan (28).
Considering the difference between apparent and predicted molecular
weights, the total carbohydrate content of the molecule could be up to
28%.
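The text does not spell out how the 28% figure was obtained. One reading, sketched below as an assumption, is that the whole difference between the apparent and the calculated mass is attributed to carbohydrate and expressed as a fraction of the apparent mass. The function names are illustrative.

```python
def mature_length(precursor_residues, signal_residues):
    """Residues remaining after removal of the signal peptide."""
    return precursor_residues - signal_residues


def max_nonprotein_fraction(apparent_kda, predicted_kda):
    """Upper bound on the non-protein share of the apparent mass.

    Assumes the entire mass difference is due to modification, which
    ignores anomalous migration on SDS gels.
    """
    return (apparent_kda - predicted_kda) / apparent_kda


if __name__ == "__main__":
    print(mature_length(559, 19))
    print(round(100 * max_nonprotein_fraction(85.0, 61.0), 1))
```

Because phosphorylation and protein shape also affect gel mobility, this value is better read as an upper limit than as a measurement.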
Analysis of the deduced protein sequence of p85 using the
WinDot (30) and Clustal V (31) programs revealed the presence of
a tandemly duplicated domain within the protein; region 170-298
is homologous to region 302-424 (Fig. 2a). These
two domains immediately follow the N-terminal cysteine-free domain of
150 amino acids. The first repeat contains 10 cysteines and the second
9. There is a remarkable conservation of the cysteines, including the
positions of two cysteine doublets, within these domains.
Similarity
searches (performed using BLASTP (32) at the National Center for
Biotechnology Information (NCBI), FASTA at Los Alamos National
Laboratory (33), and D-FLASH (34)) indicated that
although the p85 sequence was not closely related to any other known
protein, a limited similarity to a recently characterized
calcium-binding protein of sea urchin called Endo16 (35) could
be observed. Following alignment, three regions in the p85 sequence
(235-264, 265-289, and 400-418) showed 33%, 48%, and
47% similarity to regions 273-301, 412-436, and
664-682 of Endo16, respectively. Coinciding with this, we find
that residues 253-290 of p85 are similar (55%) to residues
654-690 of calpain, a calcium-activated neutral
protease (36). This region in calpain contains a
calcium-binding loop of an EF-hand structure (residues 663-674 of
calpain), and both p85 and calpain contain 6 acidic residues in this
region. Regions 264-278 and 361-376 of p85 are also similar
(66% and 62%) to serum albumin family repeats (37). The
similarities to Endo16 and the serum albumin family of proteins
involve characteristic cysteine doublets that typify these proteins
and are discussed under "Arrangement of Cysteines in the p85
Proteins."
Domain and motif searches using the mail servers
SBASE (38, 39) and PRODOM (40) primarily yielded
similarities of the first 19 residues of p85 to signal peptides of
various secreted proteins. Apart from this, typical signatures for
domains or motifs of other known proteins were not identifiable. Cell
attachment motifs such as RGD and LDV were absent (41).
Alignments of protein sequences on the basis of similarities in amino
acid compositions (using the ExPASy server; Ref. 42) did not identify
any homologues to p85.
The deduced protein sequence of p85 was
analyzed using the Kyte and Doolittle algorithm (hydrophobicity) as
well as the Chou and Fasman algorithm (secondary structure), which are
included in the University of Wisconsin Genetics Computer Group
software package GCG. Predictions indicate that the protein is highly
hydrophilic with no intervening transmembrane domains (Fig. 2b). The molecule is predicted to contain 39.75%
helices, 24.38% sheets, and 35.87% coils.
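For readers unfamiliar with hydropathy plots, the idea behind the Kyte and Doolittle analysis is a sliding-window average of per-residue hydropathy values. The sketch below is not the GCG implementation; the window size of 9 and the toy sequence are assumptions chosen for illustration, and the toy sequence is not the p85 sequence.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=9):
    """Mean hydropathy of every full window along the sequence."""
    if window > len(seq):
        return []
    scores = [KD[residue] for residue in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


# Hypothetical toy sequence: a hydrophobic leader followed by a polar stretch.
TOY = "MLLLAVLALLLAAGLVLA" + "DQREMTDSEKRQDNEKSR"

if __name__ == "__main__":
    profile = hydropathy_profile(TOY)
    print(round(profile[0], 2), round(profile[-1], 2))
```

A signal peptide shows up as a positive peak at the start of such a profile, while a hydrophilic protein stays below zero over most of its length.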
Structurally, the p85
molecule can be divided into four regions: a 19-residue
signal peptide, a cysteine-free domain of 150 residues, two
tandem repeats of 129 and 123 residues, and a C-terminal region of 135
residues. This is schematically represented in Fig. 2c.
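The region sizes can be reconciled with the 559-residue precursor by simple bookkeeping. In the sketch below, the coordinates of the signal peptide and the two repeats come from the text, whereas the boundaries of the cysteine-free domain (20-169) and of the C-terminal region (425-559) are inferred from the stated lengths and are assumptions.

```python
PRECURSOR_LENGTH = 559

# (name, first residue, last residue), 1-based and inclusive.
REGIONS = [
    ("signal peptide", 1, 19),          # from the text
    ("cysteine-free domain", 20, 169),  # inferred from the 150-residue length
    ("tandem repeat 1", 170, 298),      # from the text
    ("tandem repeat 2", 302, 424),      # from the text
    ("C-terminal region", 425, 559),    # inferred from the 135-residue length
]


def region_lengths(regions):
    """Map each region name to its length in residues."""
    return {name: last - first + 1 for name, first, last in regions}


def unassigned_residues(regions, total):
    """Residues of the precursor not covered by any listed region."""
    return total - sum(region_lengths(regions).values())


if __name__ == "__main__":
    for name, length in region_lengths(REGIONS).items():
        print(name, length)
    print("unassigned:", unassigned_residues(REGIONS, PRECURSOR_LENGTH))
```

On these assumptions the listed regions cover 556 residues, and the remaining 3 correspond to positions 299-301 between the two repeats.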
To study the tissue specificity of the p85 gene (Ecm1)
transcription, we performed Northern blot analysis on RNAs isolated
from tissues of embryonic mice and different cell lines. In addition,
the steady state levels of the p85 message in MN7 cultures at different
time points were examined.
To explain the observation that RNAs from tail, front paw,
and skin contained a shorter p85 transcript of 1.5 kb (Fig. 3a), we looked for multiple copies of the p85 gene
by genomic Southern blot analysis. Mouse genomic DNA digested with
different enzymes always showed the presence of a single band when
hybridized to a 5′ cDNA probe (Fig. 4). Genomic DNA digested with BamHI and probed with the full-length cDNA gave only two bands
of 4.4 and 1.0 kb, corresponding to the 5′ and 3′ ends of the gene,
respectively (Fig. 4). Since BamHI does not cut in the
cDNA, these restriction sites ought to be located within intron
sequences. We also isolated different genomic clones of the p85 gene,
and all of them share the same restriction enzyme pattern. Furthermore,
genomic mapping identifies a single locus on Chr 3 corresponding to the
p85 gene (see "Genetic Mapping"). These data support the
conclusion that the p85 gene exists as a single-copy gene in mouse.
The p85 protein contains 29 cysteines, one located in the
signal peptide and the remaining 28 after the 150-residue
N-terminal cysteine-free domain. The 29 cysteines comprise 6 pairs
of cysteine doublets and 17 single cysteines. The arrangement of the
cysteines is very similar to the pattern in serum albumin family
proteins (45, 46), and the spacing of the single
cysteines is strikingly similar to that of the Endo16 protein (35) (Fig. 5). The six pairs of cysteine doublets, at
positions 239-240, 277-278, 367-368, 402-403,
468-469, and 505-506 of p85, are arranged in a highly
specific manner in which two general rules can be identified. 1) All
six doublets are followed by a single cysteine within 7-10
amino acid residues (this is typical for the serum albumin family
proteins (35, 46)); 2) the distance between the single
cysteine that follows the doublet and its next successive neighbor is
12 residues in every alternate case. The distance between the single
preceding cysteine and its following doublet is also 12 in every
alternate case (Fig. 5). Notably, these characteristic spacings
are also preserved in the splice variant of p85.
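The first spacing rule lends itself to a simple scan over a protein sequence. The sketch below is illustrative only: it interprets "within 7-10 residues" as 7 to 10 intervening non-cysteine residues, which is an assumption, and it runs on a hypothetical toy sequence rather than on p85.

```python
import re


def doublets_with_following_cys(seq, min_gap=7, max_gap=10):
    """Find CC doublets followed by a cysteine after min_gap..max_gap residues.

    Returns (position, gap) tuples, where position is the 1-based index of
    the first cysteine of the doublet and gap counts the intervening
    non-cysteine residues.
    """
    hits = []
    for match in re.finditer("CC", seq):
        gap = seq[match.end():].find("C")
        if gap != -1 and min_gap <= gap <= max_gap:
            hits.append((match.start() + 1, gap))
    return hits


def total_cysteines(doublets, singles):
    """Cysteine count implied by a number of doublets and single cysteines."""
    return 2 * doublets + singles


# Hypothetical toy sequence: the first doublet obeys the rule (gap of 8),
# the second does not (gap of 12).
TOY = "MK" + "CC" + "A" * 8 + "C" + "GGG" + "CC" + "A" * 12 + "C"

if __name__ == "__main__":
    print(doublets_with_following_cys(TOY))
    print(total_cysteines(6, 17))
```

The second helper simply confirms that 6 doublets plus 17 single cysteines account for the 29 cysteines reported.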
The gene encoding p85 was also localized on the mouse linkage
map in an effort to determine if this gene may be implicated in any
known developmental diseases. Southern blot hybridization with a 462-bp
fragment of p85 cDNA identified ApaI fragments of 6.0 kb in
parental M. m. musculus and 5.8 kb in NFS/N and C58/J. Digestion
with BamHI identified fragments of 5.2 and 4.5 kb in M.
spretus and NFS/N, respectively. Inheritance of the polymorphic
fragments in the progeny of the two genetic crosses was compared with
inheritance of over 700 markers previously mapped to all 19 autosomes
and the X chromosome. As shown in Fig. 6, the gene encoding p85, Ecm1, was linked to markers on Chr 3 just distal to Gba. The closest linkage was observed with Fcgr1. In
the M. m. musculus cross, no recombination was observed
between Ecm1 and Fcgr1 in the 71 mice typed for both
markers, indicating that these genes are within 4.1 centimorgans (see Fig. 6).
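The text does not state how the 4.1-centimorgan limit was derived. One standard approach, sketched here as an assumption, is the upper 95% confidence limit on the recombination fraction when no recombinants are seen among n informative progeny, with recombination percentage taken as approximately equal to map distance at short range.

```python
def upper_limit_no_recombinants(n_typed, confidence=0.95):
    """Upper confidence limit on the recombination fraction given 0 of n recombinants.

    Solves (1 - r) ** n_typed = 1 - confidence for r.
    """
    if n_typed <= 0:
        raise ValueError("n_typed must be positive")
    return 1.0 - (1.0 - confidence) ** (1.0 / n_typed)


if __name__ == "__main__":
    limit = upper_limit_no_recombinants(71)
    print(round(100 * limit, 1), "cM (approximate)")
```

For 71 mice this bound falls close to the 4.1 centimorgans quoted above; typing more progeny without finding a recombinant would tighten it.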
p85 was originally identified as a novel secreted protein of
the mouse stromal osteogenic cell line, MN7. This protein has been
further characterized by cDNA cloning, sequencing, Northern analysis
and genomic mapping. The complete cDNA clone is 1.9 kb and contains an
open reading frame of 1677 bp that codes for a protein of 559 amino
acids.
The deduced protein sequence contains a hydrophobic signal
peptide corresponding to the first 19 residues. The detection of p85 in
the culture supernatants of MN7, as well as the establishment of the
N-terminal sequence of this protein as Ala-Ser-Glu . . . , confirms the
function of this signal sequence in the processing of the p85 protein.
The deduced protein also contains three potential N-glycosylation sites and several potential phosphorylation
sites. These post-translational modifications may explain the diffuse
nature of the p85 spot on 2D SDS-PAGE gels (9) and the
differences between apparent and predicted values of mass and pI. The acidic
and basic amino acid contents of the protein are almost balanced (13.4
and 14.6% mass), and the distribution of charged residues is fairly
uniform throughout the sequence (data not shown). The sequence does not
contain acidic regions, common cell attachment motifs, or leucine
repeats that are observed in other noncollagenous bone matrix proteins
such as osteocalcin, osteopontin, osteonectin, and bone
sialoprotein (49). Database searches also indicate that neither p85
nor closely related genes have been described previously.
Study of
the tissue distribution by Northern blot analysis reveals that the p85
transcript is present in various tissues such as liver, heart, and
lungs, but expression is not detected in brain and is negligible in the
calvaria and the BALB/c 3T3 cell line. Uniquely, in skin and
cartilage-containing tissues, a smaller transcript derived via
alternative splicing is observed. This suggests the presence of a
smaller protein isoform in these tissues.
Based on the distribution
of cysteines and repeats in the protein sequence, four different
regions can be identified: an N-terminal cysteine-free domain rich in
prolines and glutamines, two tandem repeats, and a C-terminal region.
The p85 molecule, therefore, appears to have a large multidomain
structure.
The cysteine-containing portion of the molecule contains
six cysteine doublets that all have the typical
CC-(X
The cysteine pattern in p85
is also very similar to that of Endo16 with respect to the 12-residue
distances that separate the successive single cysteines in every
alternate case(35) .
The similarities in the structures of
these proteins may be reflected in their functions as well. It seems
very likely that the cysteine-containing portion of p85 would form
double loop structures similar to those of the albumin proteins. When the
p85 sequence is modeled on the albumin pattern, each tandem
repeat in the p85 molecule would comprise one double loop domain, and
the third double loop domain would lie in the C-terminal region. The
typical double loop domain structure of the albumins confers on them
the ability to bind and carry various molecules in the blood, wherein
the different loops bind to distinct ligands (52). We can at
this point only speculate that this cysteine arrangement in p85 has a
similar role in its biological functions.
It has been demonstrated
that molecules of the extracellular matrix bind to various ligands
important for growth and differentiation and present them to the cells
(reviewed in Refs. 53-55). Since p85 is probably secreted into
the extracellular matrix of bone by osteoblastic cells, the
cysteine-containing region of this protein may be involved in binding
to important ligands (possibly including growth factors) and then
presenting them to cell surface receptors or other interactive
molecules.
The functional analogy may be further supported by the
globular, hydrophilic nature of the p85 molecule, which is similar to
the soluble nature of the albumins. The ligand binding ability of the
albumins is also attributed to the high proportion of acidic and basic
residues in these proteins. The acidic and basic amino acid contents of
the p85 sequence are 13.4 and 14.6%, respectively. By comparison, the
basic amino acid content of p85 is slightly higher than the values for
albumin (16.6% acidic and 12.9% basic) (56) and Endo16 (17.8%
acidic and 14% basic) (35). The higher basic amino acid content of p85
may reflect different ligand binding affinities.
The selective
presence of the smaller Ecm1 transcript in skin and cartilage
may signal a diversification in function. Interestingly, the splicing
event does not disrupt the pattern in C-CC-C distances. It does lead to
the loss of the first double loop domain, and creates a rearranged
double loop domain instead. The effects of this alteration are not
known. We observe that the region that is spliced out contains a
sequence similar to the calcium-binding loop of calpain (36). It
is tempting to speculate that the larger isoform has the ability to
bind calcium and thus would be appropriately placed in bone. The
predicted smaller isoform, on the other hand, may have a structural
modification for an altered function in skin and cartilage. Analysis of
the amino acid compositions of the two isoforms reveals that the
smaller protein is more acidic (predicted pI is 5.55) as compared to
the larger molecule (predicted pI is 6.26). We also observe that the
alternatively deleted domain contains one of the three possible N-glycosylation sites and three of the 14 possible
phosphorylation sites, indicative of possible post-translational
differences.
Thus, alternative splicing may serve as a mechanism for
generating functional diversity in the Ecm1 gene. The larger Ecm1 transcript is expressed in several tissues at varying
levels, but the levels are comparatively higher in MN7 cells. This
larger transcript is detected in subconfluent cultures of MN7,
indicating that the Ecm1 gene is an early-expressed gene.
Furthermore, the steady state levels of the transcript decline after
MN7 cells have passed the proliferative phase, which suggests that the
larger isoform of p85 may be necessary for promoting cell proliferation
and/or inducing the early matrix formation events in
bone (7, 57).
The presence of the smaller transcript
in skin and cartilage is especially interesting in light of the
chromosomal location of the gene in mouse. The Ecm1 gene maps
to Chr 3, just distal to Gba in a region containing at least
three known mutations affecting skin: ft (flaky tail), soc (soft coat), and ma (matted) (the phenotypes of these
mutations are summarized in Ref. 58). This suggests that Ecm1 may represent a candidate for any of these mutations. In
particular, mice with soc have abnormalities in the epidermis,
hair bulb, and whiskers and display a clumping of the hairs of the
coat (58), all of which is consistent with the known expression
pattern of Ecm1. Correlation of soc and Ecm1 would provide important information in the elucidation of the in vivo function of p85.
The localization of Ecm1 is also interesting from another standpoint. The Ecm1 region shares linkage homology with human chromosome
1q21 (59), a region that contains a cluster of three families of
genes involved in epidermal differentiation (60). One family
includes the proteins loricrin (LOR), involucrin (IVL), and a
small proline-rich protein (SPRR1). These proteins are closely
associated in the formation of the cornified cell envelope in the
uppermost layers of the
epidermis (61-64). Each of
these genes contains a region of short tandem peptide repeats that have
been partially conserved during evolution (65, 66). A
recent report demonstrated that the mouse homolog of LOR maps to mouse
Chr 3 in apparent close proximity to Ecm1 (67).
The
second group includes several members of the S100 family of small
calcium-binding proteins: calcyclin (CACY), calpactin I light chain
(CAL1L), calgranulins A and B, and possibly
others (68-71). These proteins
contain two calcium-binding domains with the EF-hand motif, are highly
homologous at the amino acid sequence level, and have a similar gene
organization.
The third family localized to human 1q21 includes
profilaggrin (FLG) (72) and trichohyalin (THH) (73). These genes appear to be
"fused" genes containing at the 5′ end two EF-hand calcium
binding motifs like those of the S100 family, and tandem peptide
repeats that are characteristic of the cornified cell envelope
family (74, 75). The mouse Flg locus has
recently been mapped to Chr 3 (67).
The close physical
linkage of these genes and the striking similarity in their
organizations have been suggested to be the result of a common
evolution (60, 66). It has been suggested that some of
these genes may share common regulatory regions and may function in
concert during the final steps of epidermal
differentiation (76). The questions of whether Ecm1 is
evolutionarily related to these genes and whether the p85 protein is
involved in epidermal differentiation are intriguing and require
further investigation.
Finally, whether p85, which
shares close structural similarity with Endo16 (a protein that performs
important functions during the development and differentiation of sea
urchin embryos) (35), could be a mammalian analogue needs
further investigation, especially for its potential implications in
ontogenic and phylogenic studies.
The nucleotide
sequence(s) reported in this paper has been submitted to the
GenBank
We thank Dr. A. Van de Voorde and Dr. W. Deleersnijder
for helpful discussions and encouragement during the course of this
study.
) patterns of
proteins present in MN7 culture supernatants at different time points
were compared to those of non-osteogenic stromal cell
cultures (9). Fifteen different protein spots were found to be
specific for MN7. Five of these polypeptides were isolated and
partially microsequenced. Four of them were identified as osteopontin,
collagen α2(I), cathepsin L, and tissue inhibitor of
metalloproteinases-2, all of which are known to be involved in bone
formation or resorption processes (10, 11). Three
partial peptide sequences of the fifth protein did not have any
counterparts in public data banks (9). This protein migrated as
a train of spots with an average mass of 85 kDa and pI 5.7, and was
designated "p85."
Cell Culture
cells (12) were grown in α-MEM (α-minimal essential medium) containing 10% fetal calf
serum and 2 mM L-glutamine. BALB/c 3T3 cells were
grown in Dulbecco's modified Eagle's medium containing 10%
fetal calf serum. MC615 chondrocytes (13) were grown in
Dulbecco's modified Eagle's medium/F-12 medium containing
10% fetal calf serum and 2 mM L-glutamine.
10
cells/cm² in 24-well culture plates, in α-MEM supplemented with 10% fetal calf
serum, 2 mM L-glutamine, 10 mM sodium β-glycerophosphate, and 50 µg/ml L-ascorbic acid. The
cultures were refreshed every 2 days and harvested by trypsinization at
different time points. The cells were cultured in the presence of serum
throughout the experiment.
RNA Studies
Isolation of RNA
Isolation from Cells in Culture
This was
performed as described in Ref. 14. Briefly, cells were lysed in Nonidet
P-40, cell lysates extracted with phenol and chloroform, and then
precipitated with isopropanol at -20 °C overnight. The RNA
was pelleted, washed in 70% ethanol, and then dissolved in diethyl
pyrocarbonate-treated water.
Isolation from Tissues
This was performed as
described in Ref. 14. Tissues were dissected out of embryonic BALB/c
mice (16 days post-coitum) and immediately frozen in liquid nitrogen.
The frozen tissues were homogenized in 4 M guanidine
thiocyanate and then centrifuged over a cesium chloride cushion at
32,000 rpm for 24 h at 21 °C in a SW41 rotor (Beckman). The pellet
containing the RNA was washed in 70% ethanol and dissolved in diethyl
pyrocarbonate-treated water.
Poly(A)+ Enrichment
Poly(A)+-containing RNA was isolated
from total RNA using magnetic oligo(dT) beads (Dynabeads)
according to the manufacturer's protocol.
Northern Blotting
Southern Blotting
Construction and Screening of the MN7 cDNA Library
vector was made using MN7 poly(A)+-enriched RNA and the
Superscript cDNA cloning system of Life Technologies, Inc. The
library contained approximately 2
10
independent
clones, with an average insert size of 1.7 kb.
polynucleotide kinase (New England
Biolabs) and [γ-32P]dATP (Amersham).
Hybridizations were done at room temperature according to standard
procedures. The hybridization was followed by washes at increasing
stringency. Positive clones were subjected to secondary and tertiary
screening.
Design of Oligonucleotide Primers
DNA Sequencing
S]dATP (Amersham Corp.) and Sequenase (a
modified T7 DNA polymerase; Amersham). Vector-specific as well as
insert-specific primers were used.
Identification of the 5′ End by PCR-RACE
Figure 1: Nucleotide and deduced amino acid
sequence of the p85 cDNA. Residues are numbered at the sides of the
figure. Nucleotides from positions 1 to 18 were obtained by 5′-RACE.
The putative signal peptide (amino acid positions 1-19) is marked
by bold underline, and the putative cleavage site after residue 19 is marked by a symbol. The single
N-terminal and the two internal peptide sequences obtained by
microsequencing of the protein spot are marked by thin underline. Cysteine residues are circled. The
three potential N-glycosylation sites are marked with asterisks (*). Potential phosphorylation sites for protein kinase C, casein kinase
II, tyrosine kinase, and cAMP- and
cGMP-dependent protein kinases are each indicated by a separate symbol. The termination codon and
polyadenylation signal are marked in bold. The differentially
spliced exon is shaded.
Briefly, first strand cDNA was produced by reverse transcription
using GSP1 as the first primer. This was followed by tailing at the 5′
end using dCTP and terminal transferase. The tailed cDNA product was
amplified using an anchor primer (supplied with the kit) and GSP2. The
amplified product was then cloned in the NotI and SalI sites of pSPORT and sequenced. Sequences
obtained were verified on several independent PCR clones. The 5′
sequence obtained was independently confirmed by repeating PCR-RACE
using the primer GSP3.
Cloning of the Splice Variant by Reverse Transcription PCR
. The region involved in differential splicing was
identified on the basis of restriction mapping and sequence analysis of
several independent clones originating from separate reactions.
Genomic Mapping
Mus m. musculus) × M. m. musculus (18) and (NFS/N × Mus spretus) × M. spretus or
C58/J (19). DNAs from these crosses have been typed for over 700
markers including the Chr 3 markers Gba (glucocerebrosidase), Fcgr1 (Fc
receptor), Cd1a (cluster designation 1a), Amy1 (amylase 1), Ngfb (nerve
growth factor β), and Egf (epidermal growth factor). Probes for these
markers and RFLPs used to type these crosses for Gba, Fcgr1, Cd1a, and Ngfb have been described
previously (20). Amy1 was typed following HindIII digestion in the M. spretus cross using a
530-bp probe kindly provided by Dr. M. Meisler (University of Michigan) (21). Egf was typed following BamHI digestion in the M. spretus cross and HindIII digestion in the M. m. musculus cross using a 1200-bp fragment of the
pre-pro-epidermal growth factor cDNA obtained from Dr. G. Bell
(University of Chicago) (22).
Isolation of the p85 cDNA Clone and Its Sequence
RNA on three separate Northern blots. Although
all three sets hybridized specifically to a single message of 1.9 kb,
set number 2 hybridized at highest stringency and was used to screen
the MN7 cDNA library.
vector.
Sequencing of the subclones was performed on both strands using the T7 and
SP6 primers of pSPORT. Where these primers could
not be used (because of compressions or an overly long insert), specifically designed
primers were employed. The complete sequence of the p85 cDNA is shown
in Fig. 1.
Figure 2:
a, internal sequence homology of
p85. The deduced amino acid sequence was analyzed by computer for
self-homology; gaps are permitted. Similar residues are in uppercase, and identical residues are shaded.
Cysteine doublets are shaded dark. Numbers indicate the positions of the amino acid residues starting at the
N terminus. b, hydropathicity plot of the coding region (Kyte and
Doolittle algorithm included in the GCG software package). Apart from
the hydrophobic signal peptide (visible as a single peak at amino acid
positions 1-19), the protein is essentially hydrophilic in
nature. c, schematic representation of the different regions
in the p85 protein. The molecule has been drawn to scale. Each box represents one of the four identifiable regions. The residue
positions that mark the ends of each region are indicated. Single
cysteine residues are indicated as single-headed arrows, and
cysteine-doublets are shown as double-headed arrows.
The alternatively spliced domain is shaded gray.
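The hydropathicity plot in panel b can be approximated with a sliding-window average of Kyte-Doolittle indices. The following Python sketch is illustrative only: it is not the GCG implementation, the window length of 9 residues is an assumption, and the toy peptide is hypothetical rather than the p85 sequence.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=9):
    """Return the mean hydropathy of every full window along the sequence."""
    seq = sequence.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[res] for res in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


# Hypothetical toy peptide: a hydrophobic stretch (signal-peptide-like)
# followed by a charged, hydrophilic stretch.
toy = "MLLLVLLAALLA" + "DEKRDEKRDEKR"
profile = hydropathy_profile(toy, window=9)
print(round(profile[0], 2), round(profile[-1], 2))
```

Positive window means mark hydrophobic stretches such as a signal peptide, while a profile that stays mostly negative is what one would expect for an essentially hydrophilic protein.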
Analysis of the amino acid composition reveals that the protein is
rich in Arg, Pro, and Leu residues (see Table). The basic and acidic
amino acid contents are balanced (14.6 and 13.4% of the mass,
respectively), and the distribution of charged residues is fairly
uniform over the sequence (data not shown). The protein is not rich in
Ser or Thr, residues that are generally abundant in phosphoproteins, and it
does not contain the high number of aspartic acid or glutamic acid residues
common to some bone glycoproteins (10). The protein is rich
in cysteines (a total of 29 residues or 4.8% of the mass), which are unevenly
distributed in the sequence. There is one residue in the signal
peptide, followed by a cysteine-free region of 150 residues, while the
remaining 28 residues are distributed over the rest of the molecule in
a pattern characteristic of the serum albumin family of
proteins (see "Arrangement of Cysteines in the p85
Proteins"). There are 63 prolines and 34 glycines in the protein,
which suggests rigidity in the unfolded polypeptide chain, since
prolines permit less rotational freedom (29).
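Mass percentages of the kind quoted above can be computed from a sequence by weighting residue counts with average residue masses. The sketch below makes several assumptions: it uses standard average residue masses, it treats Arg, Lys, and His as basic and Asp and Glu as acidic (the grouping used in the paper is not stated), and it runs on a hypothetical toy peptide rather than the p85 sequence.

```python
from collections import Counter

# Average residue masses in daltons (free amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.10, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
    "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19,
    "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}


def composition(sequence):
    """Map each residue to (count, percent of the total residue mass)."""
    counts = Counter(sequence.upper())
    total_mass = sum(RESIDUE_MASS[aa] * n for aa, n in counts.items())
    return {
        aa: (n, 100.0 * RESIDUE_MASS[aa] * n / total_mass)
        for aa, n in counts.items()
    }


def group_mass_percent(sequence, residues):
    """Sum the mass percentages of a group of residues, e.g. 'RKH'."""
    comp = composition(sequence)
    return sum(comp[aa][1] for aa in residues if aa in comp)


# Hypothetical toy peptide, not the p85 sequence.
toy = "MCCGGRKDE"
comp = composition(toy)
basic_pct = group_mass_percent(toy, "RKH")   # assumed definition of "basic"
acidic_pct = group_mass_percent(toy, "DE")   # assumed definition of "acidic"
print(comp["C"], round(basic_pct, 1), round(acidic_pct, 1))
```

Applied to the full deduced sequence, the same two calls would give the basic and acidic mass shares to compare with the values reported in the text.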
Computer Analysis of the DNA and Deduced Protein Sequences
Northern Blot Studies
Expression in Embryonic Tissues
Weak expression of a
1.9-kb p85 gene transcript was detected in liver, heart, and lungs (Fig. 3a). Calvaria, which are essentially membranous
bones produced directly by osteoblasts without a cartilage
intermediate, showed almost negligible expression. On the other hand,
skin- and cartilage-containing tissues, such as tail and front paw,
showed a very strong expression of p85. Moreover, these tissues
uniquely contained a smaller message of 1.5 kb in addition to the
longer transcript of 1.9 kb. In skin, only the smaller transcript was
present. No p85 gene message was detected in brain.
Figure 3:
Northern blot analysis. Each lane contains
10 µg of total RNA. The blots were first probed with the
full-length p85 cDNA and then with mouse glyceraldehyde-3-phosphate
dehydrogenase. a, embryonic tissues. Highest steady state
levels of the p85 transcript were detected in front paw, tail, and MN7.
The single shorter transcript present in skin was clearly visible after
a longer exposure of 5 days (right). b, cell lines.
RNAs were isolated from cells in culture when they reached 95%
confluence. High steady state levels of the longer p85 transcript were
present in MN7 and MC3T3-E1, and moderate levels in MC615. In
contrast, the p85 transcript was barely detectable in BALB/c 3T3
fibroblasts. c, time course expression. 5 µg of total RNAs
isolated from cells at different time points were loaded on the gels.
High steady state levels of p85 were present before and during the
onset of confluence; thereafter the levels
declined.
Expression in Cell Lines
We examined whether bone- and cartilage-derived
cell lines other than MN7 expressed p85. Strong
expression was detected in MC3T3-E1, which is a mouse
calvarial preosteoblastic cell line (12). Expression was also
seen in MC615, a mouse chondroblastic cell line (13).
Interestingly, all the above cell lines expressed only the longer
transcript. Negligible expression of p85 was detected in BALB/c 3T3
fibroblasts (Fig. 3b).
Time Course Expression in MN7
Since MN7 cells are
able to proliferate and differentiate in vitro, forming a
mineralized matrix (7, 8), we studied the steady state
levels of the p85 gene transcript at different periods of culture in an
attempt to identify the stage of MN7 culture (proliferation or
maturation) with which its expression could be correlated. The
level of p85 mRNA peaked during confluence and then declined (Fig. 3c). This indicates that maximum expression
occurred during the proliferative phase of MN7. However, the expression
of the p85 gene is not completely repressed in post-confluent cultures:
experiments in which MN7 cells were cultured for longer periods in
serum-free media revealed that p85 mRNA could still be detected in
30-day post-confluent cultures (results not shown).
Characterization of the Splice Variant
Figure 4:
Genomic Southern blot. Mouse genomic DNA
was isolated from BALB/c 3T3 cells. Each lane contains 10 µg of DNA
digested with different restriction enzymes (E, EcoRI; B, BamHI; H, HindIII). The blot on the left was probed with a 5′
cDNA probe (470-bp PstI fragment of the p85 cDNA). This probe
consistently hybridizes to single bands. The blot on the right represents BamHI-digested DNA probed with the full-length
p85 cDNA. The two bands of 4.4 and 1.0 kb arise from sites
situated within introns.
Post-transcriptional processing of the mRNA seemed a more likely
explanation for the two transcripts. The 1.9-kb message contains a
single polyadenylation signal, indicating that differential
polyadenylation was unlikely. It was also observed, by 5′-RACE, that
the two transcripts share a common 5′-untranslated region. Therefore,
to look for alternatively spliced internal exons, we performed reverse
transcription-PCR using primers located at the 5′ and 3′ ends of the
cDNA (see "Materials and Methods") and observed a specific
band corresponding to the expected size of the smaller transcript.
Subsequent cloning and sequence analysis revealed that a region of 375
bp, from nucleotide 886 to nucleotide 1260, was missing in this transcript.
Comparison of this sequence to the genomic sequence
revealed that the deleted region corresponds to a single internal
exon. This is a phase 0 exon (it has complete codon triplets at both
ends; Ref. 43) and is bounded by canonical donor and acceptor splice
sites (44). This exon is deleted through a normal cassette
mechanism, and the splicing event does not change the reading frame of
the protein but produces a smaller protein lacking 125 amino acids. The
absence of this exon in the smaller transcript has been confirmed using
an exon-specific probe (a 250-bp PstI fragment present within
the spliced exon) on Northern blots containing tail and front paw RNAs.
This probe hybridizes only to the longer transcript (data not shown).
Thus tissue-specific alternative splicing is associated with the
expression of the p85 gene.
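The arithmetic behind the splice variant can be checked in a few lines. The sketch below uses only numbers given in the text (exon coordinates 886 to 1260, a full-length protein of 559 residues); the function name and the returned keys are illustrative. Divisibility of the exon length by 3 is tested as the condition for keeping the downstream reading frame. The exon phase itself cannot be verified here, because that would require the position of the start codon in the cDNA.

```python
def skipped_exon_effect(start_nt, end_nt, protein_length_aa):
    """Summarize the effect of skipping an internal exon (inclusive coordinates)."""
    exon_length = end_nt - start_nt + 1
    # A length divisible by 3 leaves the downstream reading frame unchanged.
    preserves_frame = exon_length % 3 == 0
    residues_lost = exon_length // 3 if preserves_frame else None
    short_isoform = (
        protein_length_aa - residues_lost if preserves_frame else None
    )
    return {
        "exon_length_nt": exon_length,
        "preserves_frame": preserves_frame,
        "residues_lost": residues_lost,
        "short_isoform_aa": short_isoform,
    }


# Values from the text: exon spans nucleotides 886-1260; the full-length
# protein has 559 residues.
effect = skipped_exon_effect(886, 1260, 559)
print(effect)
```

With these inputs the exon is 375 nt long, its removal keeps the frame and deletes 125 codons, and the predicted shorter isoform is 434 residues (559 minus 125).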
Arrangement of Cysteines in the p85 Proteins
Figure 5:
The regular distribution of single
cysteines and cysteine doublets in the p85 protein and its putative
smaller isoform. The cysteine doublets have been aligned below each
other, and the distances in terms of amino acid residues of preceding
and following single cysteines are indicated. The figures below the
doublets represent their positions on the protein sequence. The
alternating 12-residue distances are boxed.
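The spacing analysis summarized in this figure can be sketched as a scan for adjacent cysteines. In the Python example below, a "distance" is taken to be the number of residues lying between a doublet and the nearest single cysteine on either side; that definition is an assumption, since the caption does not spell it out. The toy sequence is hypothetical and is not the p85 sequence.

```python
def cysteine_layout(sequence):
    """Locate Cys-Cys doublets and the gaps to the flanking single cysteines.

    Positions are 1-based. A run of three cysteines is reported as two
    overlapping doublets.
    """
    seq = sequence.upper()
    doublets = [i + 1 for i in range(len(seq) - 1) if seq[i:i + 2] == "CC"]
    in_doublet = {pos for d in doublets for pos in (d, d + 1)}
    singles = [
        i + 1 for i, aa in enumerate(seq)
        if aa == "C" and i + 1 not in in_doublet
    ]
    layout = []
    for d in doublets:
        before = [s for s in singles if s < d]
        after = [s for s in singles if s > d + 1]
        layout.append({
            "doublet": (d, d + 1),
            # Residues strictly between the previous single Cys and the doublet.
            "gap_before": d - before[-1] - 1 if before else None,
            # Residues strictly between the doublet and the next single Cys.
            "gap_after": after[0] - (d + 1) - 1 if after else None,
        })
    return layout


# Hypothetical toy sequence: single Cys, 12 residues, doublet, 5 residues, single Cys.
toy = "AC" + "A" * 12 + "CC" + "A" * 5 + "C" + "AA"
layout = cysteine_layout(toy)
print(layout)
```

Listing `gap_before` and `gap_after` for consecutive doublets is one way to look for a recurring spacing such as the 12-residue distances boxed in the figure.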
Genetic Mapping
Figure 6:
An abbreviated map of Chr 3 showing the
location of the p85 gene, Ecm1. The map to the left represents the composite map of this chromosome showing the
locations of the marker loci typed in our crosses and the locations of
relevant mutations (47). Centimorgan distances of these loci from the
centromere are given to the immediate left of the map. Human map
locations for homologs of the underlined mouse genes are given
to the far left. The two maps to the right were generated from
the two separate crosses typed in this study. Recombination fractions
are given to the right of both maps for each adjacent locus
pair. Numbers in parentheses represent the percent
recombination and standard error calculated according to Green (48). No
double recombinants were observed in this interval in either cross. The
Mouse Genome Data Base (MGD) accession numbers are MGD-CREX-299 for M. m. musculus and MGD-CREX-300 for M.
spretus.
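The percent recombination and standard error shown on such maps can be illustrated with the usual binomial estimate for backcross data. This is a sketch under an assumption: the caption cites Green (48) for the calculation, and the formula below, SE = sqrt(r(1 - r)/n), is assumed to correspond to it. The counts in the example are hypothetical and are not taken from the crosses typed in this study.

```python
import math


def recombination_percent(recombinants, total):
    """Return (percent recombination, standard error in percent)."""
    if total <= 0 or not 0 <= recombinants <= total:
        raise ValueError("need 0 <= recombinants <= total and total > 0")
    r = recombinants / total
    # Binomial standard error of a proportion estimated from `total` meioses.
    se = math.sqrt(r * (1.0 - r) / total)
    return 100.0 * r, 100.0 * se


# Hypothetical counts: 2 recombinants among 100 backcross progeny.
pct, se = recombination_percent(2, 100)
print(f"{pct:.1f} +/- {se:.1f}")
```

For these hypothetical counts the estimate is about 2.0% with a standard error of about 1.4%.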
)-C arrangement, which is
characteristic of the serum albumin family of proteins (46).
Such an arrangement was predicted to generate characteristic
"double loop" domains in the serum albumin family proteins (45), as confirmed by x-ray
analysis (46, 50, 51). These double loop
structures are involved in important ligand-binding functions of the
albumin proteins (reviewed in Ref. 52).
Table: Amino acid composition of the p85 coding region
/EMBL Data Bank with accession number(s)
L33416.
. Ph.D. thesis, University of Antwerp, Belgium
©1995 by The American Society for Biochemistry and Molecular Biology, Inc.