(Received for publication, April 3, 1997, and in revised form, May 5, 1997)
From the Center for Extracellular Matrix Biology,
Albert B. Alkek Institute of Biosciences and Technology and the
Department of Biochemistry and Biophysics, Texas A & M University,
Houston, Texas 77030, the § Orthopedic Research
Laboratories, Montefiore Medical Center, Bronx, New York 10467, and the
¶ Shriners Hospital for Children and the Department of
Biochemistry and Molecular Biology, University of South Florida,
Tampa, Florida 33612
The epiphysis of developing bones is a cartilaginous structure that is eventually replaced by bone during skeletal maturation. We have separated a dermatan sulfate proteoglycan, epiphycan, from decorin and biglycan by using dissociative extraction of bovine fetal epiphyseal cartilage, followed by sequential ion-exchange, gel permeation, hydrophobic, and Zn2+ chelate chromatographic steps. Epiphycan is a member of the small leucine-rich proteoglycan family, contains seven leucine-rich repeats (LRRs), is related to osteoglycin (osteoinductive factor) (Bentz, H., Nathan, R. M., Rosen, D. M., Armstrong, R. M., Thompson, A. Y., Segarini, P. R., Mathews, M. C., Dasch, J., Piez, K. A., and Seyedin, S. M. (1989) J. Biol. Chem. 264, 20805-20810), and appears to be the bovine equivalent of the chick proteoglycan PG-Lb (Shinomura, T., and Kimata, K. (1992) J. Biol. Chem. 267, 1265-1270). The intact proteoglycan had a median size of ~133 kDa. The core protein was 46 kDa by electrophoretic analysis, had a calculated size of 34,721 Da, and had two approximately equimolar N termini (APTLES ... and ETYDAT ... ) separated by 11 amino acids. There were at least three O-linked oligosaccharides in the N-terminal region of the protein, based on blank cycles in Edman degradation and corresponding serine or threonine residues in the translated cDNA sequence. The glycosaminoglycans ranged in size from 23 to 34 kDa, were more heterogeneous than those in other dermatan sulfate small leucine-rich proteoglycans, and were found in the acidic N-terminal region of the protein core, N-terminal to the LRRs. A four-cysteine cluster was present at the N terminus of the LRRs, and a disulfide-bonded cysteine pair was present at the C terminus of the protein core. The seventh LRR and an N-linked oligosaccharide were between the two C-terminal cysteines. An additional potential N-glycosylation site near the C terminus did not appear to be substituted at a significant level.
Cartilage contains a variety of proteoglycans. Particularly
abundant are aggrecan and the small leucine-rich proteoglycans (SLRPs) (1) fibromodulin and decorin.
Several other proteoglycans in cartilage have also been identified,
including versican, perlecan, and the leucine-rich proteoglycans
lumican and biglycan. The relative abundance of these proteoglycans
varies during development and by location within the tissue. It is
likely that this variation has a role in the differentiation and
maintenance of tissue structure. The exact roles of the SLRPs are
unclear at present, but it is thought that fibromodulin and decorin are
involved in the process of collagen fibrillogenesis (2, 3) and may play
a crucial role in optimizing the diameter of collagen fibrils that will eventually be replaced when the cartilage is remodeled during calcification. It is possible that the SLRPs also have a role in
regulating growth factors, e.g. transforming growth
factor-β, which binds to decorin (4, 5).
A hallmark of the SLRPs is the presence of cysteine clusters that flank the
leucine-rich repeats. These cysteines form disulfide bonds and perhaps
provide a structure that differs from leucine-rich repeat
(LRR)-containing proteins that do not contain this feature. The LRR
motif was first identified by Patthy (6) and is characterized by an
LXXLXLXXNXL sequence, where
X is any amino acid and L is often a leucine, but may be any
amino acid with a hydrophobic aliphatic side chain (Ile, Val, and Met).
The LRR motif is conserved throughout evolution, and the increasing
number of members of this family includes a range of proteins with
diverse functions and distributions (reviewed by Kobe (7)). The
three-dimensional structure of the porcine ribonuclease inhibitor, a
member of the leucine-rich protein family, has been determined (8). In
this protein, the 15 individual LRRs adopt a stacked
β-sheet/α-helix hairpin structure, resulting in an overall
horseshoe shape and indicating that the LRR is likely to be primarily a
folding motif and not an indicator of function.
By using a novel purification protocol, we have isolated milligram quantities of an LRR-containing proteoglycan from bovine fetal epiphysis and determined its primary structure. We have named this proteoglycan epiphycan based on its isolation from the epiphysis and show it to be a mammalian homolog of the avian proteoglycan PG-Lb, isolated from developing chick limb (9). The amino acid sequence is very similar to a recently published mouse cDNA-derived sequence (10). Unlike other cartilage-derived LRR-containing proteoglycans and glycoproteins, epiphycan contains only seven LRRs instead of the more usual 10 or 11. We show that all of the glycosylation, with the exception of one N-linked oligosaccharide, occurs in the N-terminal domain.
Guanidine hydrochloride (GdnHCl) was from Research Plus Laboratories. Phenylmethylsulfonyl fluoride, iodoacetamide, sodium citrate, and sodium chloride were from Sigma. Pepstatin A was from Calbiochem. Sepharose CL-4B, octyl-Sepharose, chelating Sepharose Fast Flow, DEAE-Sepharose, Q-Sepharose, Sepharose PD10, Superose 6, and Superdex 75 10/30 columns were from Pharmacia Biotech Inc. Radioisotopes ([125I]iodine and NaB3H4) were supplied by DuPont or by Amersham Life Science, Inc. Taq polymerase, dNTPs, and restriction enzymes were supplied by Life Technologies, Inc. Chondroitinase ABC was supplied by Seikagaku. PCR primers were purchased from Life Technologies, Inc., National Biosciences (Plymouth, MN), or the Texas A & M Core Facility (College Station, TX); were synthesized in-house (Shriners Hospital for Children, Tampa, FL); or were gifts from Michelle Deere. 14C-Labeled molecular mass markers were purchased from Amersham Corp. The pCRII plasmid (TA cloning) was purchased from Invitrogen. Wizard™ PCR preps were purchased from Promega. Sequencing was performed with Sequenase Version 2.0 (U. S. Biochemical Corp.) or by the Automated DNA Sequencing Facility at the Interdisciplinary Center for Biotechnology Research at the University of Florida (Gainesville, FL). ProSpin™ centrifugal filters were purchased from Applied Biosystems/Perkin-Elmer (Foster City, CA). The pBluescript plasmid was obtained from Stratagene. Trypsin (sequencing-grade) and endoprotease Lys-C were purchased from Boehringer Mannheim. Zeta-probe GT membrane was purchased from Bio-Rad.
Isolation of Epiphycan from Bovine Fetal Epiphyseal Cartilage
Dermatan sulfate proteoglycans were isolated from bovine fetal epiphyseal cartilage, as described previously for skin and bovine articular cartilage, by ion-exchange followed by gel permeation chromatography (11, 12).
Octyl-Sepharose Chromatography
The DEAE-bound and eluted proteoglycan-containing fractions from the gel permeation chromatography were applied to an octyl-Sepharose column that had been equilibrated with 2 M GdnHCl and 0.15 M sodium acetate, pH 6.3, at 25 °C and allowed to bind for 2 h. The column was washed with 3 volumes of 2 M GdnHCl, and the proteoglycans were eluted with a linear gradient of 2-6 M GdnHCl. Fractions were analyzed for uronate, revealing the presence of two major peaks. Fractions containing both epiphycan and decorin were pooled; concentrated to 5 mg/ml with an Amicon YM-2 filter; and dialyzed against 4 M GdnHCl, 5 mM EDTA, and 50 mM Tris, pH 7.5.
Zn2+ Chelate Chromatography
Chelating Sepharose (50 ml) was charged with 500 ml of ZnCl2 (2 mg/ml) and then washed with water. The gel was packed into a 60-ml water-jacketed column on top of 10 ml of uncharged chelating Sepharose and equilibrated with 500 ml of Chelex 100-treated 0.15 M NaCl and 50 mM Tris-HCl, pH 8.1. The concentrated and dialyzed eluate from the octyl-Sepharose column, containing a mixture of decorin and epiphycan, was dialyzed against Chelex-treated equilibration buffer and applied to the column. The sample was allowed to bind for 2 h at 25 °C. The column was eluted with 0.15 M NaCl and 50 mM Tris-HCl, pH 8.1 (300 ml), followed by a linear pH gradient from pH 8 to 4 (total volume of 300 ml). Fractions (7.2 ml) were collected at a flow rate of 40 ml/h and were monitored for uronate, protein, and pH and by SDS-PAGE.
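The elution figures quoted for the Zn2+ chelate step (300 ml isocratic wash, 300-ml pH gradient, 7.2-ml fractions, 40 ml/h) fix the collection schedule. The following Python sketch works through that arithmetic. It assumes an ideal gradient in which pH changes linearly with eluted volume; the function names are illustrative and are not part of the published protocol, and the measured pH of each fraction (which the protocol records) would replace the nominal value in practice.

```python
import math

# Figures taken from the protocol text.
FLOW_ML_PER_H = 40.0    # flow rate
FRACTION_ML = 7.2       # volume of each collected fraction
ISOCRATIC_ML = 300.0    # wash with 0.15 M NaCl, 50 mM Tris-HCl, pH 8.1
GRADIENT_ML = 300.0     # linear pH gradient, pH 8 to 4
WASH_PH = 8.1
PH_START, PH_END = 8.0, 4.0


def minutes_per_fraction():
    """Time needed to collect one fraction at the stated flow rate."""
    return FRACTION_ML / FLOW_ML_PER_H * 60.0


def total_fractions():
    """Number of fractions needed to cover the wash plus the gradient."""
    return math.ceil((ISOCRATIC_ML + GRADIENT_ML) / FRACTION_ML)


def nominal_ph(volume_ml):
    """Nominal eluent pH at a cumulative elution volume.

    Assumption: the gradient is perfectly linear in volume.
    """
    if volume_ml <= ISOCRATIC_ML:
        return WASH_PH
    fraction_done = min((volume_ml - ISOCRATIC_ML) / GRADIENT_ML, 1.0)
    return PH_START + fraction_done * (PH_END - PH_START)


if __name__ == "__main__":
    print(f"{minutes_per_fraction():.1f} min per fraction")
    print(f"{total_fractions()} fractions in total")
    print(f"nominal pH at gradient midpoint: {nominal_ph(450.0):.1f}")
```

Under these assumptions each fraction takes a little under 11 min to collect and the full elution spans roughly 15 h.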
Protein Sequence Analysis
Protein and peptide samples were sequenced by established methods using an Applied Biosystems 477A sequencer with on-line detection of phenylthiohydantoin-derivatives on a 120A microbore HPLC. A purified sample of epiphycan was applied to an acetonitrile-wetted polyvinylidene difluoride membrane in a ProSpin™ centrifugal filter, washed with water, and sequenced. The same protocol was used to identify the Superdex 75 V0 peak (below).
Peptide Mapping
Initial peptide mapping was performed using trypsin and a mixture of epiphycan and decorin as the substrate. Peptides that were unique to the map of epiphycan-containing material, as compared with purified decorin, were sequenced. Peptides were separated by gel permeation chromatography on a Superose 12 column, followed by reversed-phase HPLC of individual fractions as described elsewhere (13).
Alternatively, purified epiphycan was digested overnight with endoprotease Lys-C at an enzyme/substrate ratio of ~1:25 in 50 mM Tris-HCl, pH 8, at 37 °C. The products of the digest were partially separated by gel permeation chromatography on a Superdex 75 column prior to analysis.
The void volume peak from the Superdex 75 column was identified as being from the N terminus by Edman degradation. It was dialyzed against trypsin digestion buffer (50 mM Tris-HCl, pH 8) and further digested with trypsin. The products were further separated on a Superdex 75 column, followed by reversed-phase HPLC. In this case, fractions were also assayed for sulfated GAGs by the dimethyl methylene blue assay (14).
Peptide Nomenclature
Peptides are numbered in order from the N terminus to the C terminus. The letter preceding the number indicates which enzyme released the peptide from the parent protein. Thus, K-1-T-1 is the N-terminal peptide (T-1) derived from subdigestion of the N-terminal endoprotease Lys-C-derived peptide (K-1) with trypsin.
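The naming scheme can be read mechanically as a chain of (enzyme, position) steps. The short Python sketch below illustrates this reading for simple names such as K-1-T-1. The function name is hypothetical, and composite labels used later in the paper (for example a reduced peptide or a disulfide-linked pair) are deliberately not handled.

```python
# Letter codes as used in the text: K = endoprotease Lys-C, T = trypsin.
ENZYMES = {"K": "endoprotease Lys-C", "T": "trypsin"}


def parse_peptide_name(name):
    """Split a name such as 'K-1-T-2' into ordered (enzyme, index) steps.

    Each step gives the enzyme that released the peptide and the peptide's
    position counted from the N terminus of its parent.
    """
    tokens = name.split("-")
    if len(tokens) % 2 != 0:
        raise ValueError(f"expected letter-number pairs in {name!r}")
    steps = []
    for letter, number in zip(tokens[0::2], tokens[1::2]):
        if letter not in ENZYMES or not number.isdigit():
            raise ValueError(f"unrecognized step {letter}-{number} in {name!r}")
        steps.append((ENZYMES[letter], int(number)))
    return steps


if __name__ == "__main__":
    for step in parse_peptide_name("K-1-T-1"):
        print(step)
```

For K-1-T-1 this yields a Lys-C step followed by a trypsin step, matching the description of the N-terminal tryptic subpeptide of the N-terminal Lys-C peptide.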
Identification of the Epiphycan-coding cDNA Sequence
Initial protein sequence data enabled us to design two
degenerate oligonucleotide primers (forward primer,
HTAYTTYTAYWSHMGVTTYAA; and reverse primer, CCVARBCKRTTRTTGSWNAT) that,
based on sequence similarity to chick PG-Lb, would be expected to give
a product of 230 base pairs. Reverse transcription-PCR of bovine
chondrocyte RNA, with an annealing temperature of 40 °C, gave a band
of the expected size, which was excised from an agarose gel. The PCR product was reamplified using similar primers modified with
EcoRI and BamHI clamps at the 5′-ends and cloned
into pBluescript. Sequence analysis of the cloned insert indicated
that the PCR product corresponded to the determined protein
sequence.
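The two primers are written in IUPAC ambiguity codes, so each one stands for a pool of distinct oligonucleotides. As a rough illustration of how degenerate they are, the Python sketch below multiplies the number of bases allowed at each position. The code table is the standard IUPAC nucleotide alphabet; the function name is illustrative only.

```python
# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

FORWARD = "HTAYTTYTAYWSHMGVTTYAA"
REVERSE = "CCVARBCKRTTRTTGSWNAT"


def degeneracy(primer):
    """Number of distinct sequences represented by a degenerate primer."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


if __name__ == "__main__":
    for label, primer in (("forward", FORWARD), ("reverse", REVERSE)):
        print(label, len(primer), "nt, degeneracy", degeneracy(primer))
```

By this count the 21-nucleotide forward primer represents 3456 sequences and the 20-nucleotide reverse primer 2304, which is in keeping with the low annealing temperature (40 °C) used for the first amplification.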
A bovine cDNA λZAP library kit (Stratagene) was used to produce a
cDNA library from bovine epiphyseal cartilage mRNA prepared by
the method of Smale and Sasse (15). PCR primers were designed based on
the sequence of the previously obtained PCR product and the
DNA sequence of the λZAP library arms. The entire coding region for
bovine epiphycan was PCR-amplified and sequenced in both directions,
either by sequence analysis of cloned PCR products or by direct
sequencing of the PCR products using the originating PCR primers.
The SLRPs epiphycan, decorin, and biglycan (50 µg each) were radiolabeled with 0.5 mCi of [125I]iodine by the chloramine-T method (16). The labeled proteoglycan was separated from unincorporated Na125I on a PD-10 column, followed by a 1-ml mono-Q-Sepharose column.
Molecular Mass Estimation
Labeled proteoglycans were further purified on a 0.5-ml DEAE-Sepharose column. The molecular masses of the intact proteoglycans were estimated by gel permeation chromatography on a Superose 6 10/30 column that was eluted at a flow rate of 0.4 ml/min in 4 M GdnHCl, 50 mM sodium acetate, pH 5.8, and 0.05% CHAPS and that had been calibrated with 14C-labeled molecular mass markers.
Isolation and Analysis of Core Proteins
The 125I-labeled SLRPs epiphycan, decorin, and biglycan (400,000 cpm each) were digested with chondroitinase ABC (10 units/ml) for 24 h at 37 °C in 0.1 M Tris, 30 mM sodium acetate, 0.2% bovine serum albumin, 10 mM EDTA, 10 mM N-ethylmaleimide, 5 mM phenylmethylsulfonyl fluoride, and 1 mg/ml pepstatin A. The 125I-labeled core proteins were applied either to a Superose 6 column as described above or onto a 5-15% SDS-polyacrylamide gel to determine the molecular mass.
Isolation and Analysis of Glycosaminoglycan Chains
Proteoglycans (50 µg) in 3 M sodium acetate, pH 6.0, were precipitated with 2 volumes of ethanol for 1 h at −20 °C. The precipitated material was collected by centrifugation for 15 min, washed with 70% ethanol, and dried in a Speedvac™. The proteoglycans were resuspended in 100 µl of 0.05 M NaOH containing 5 mCi of NaB3H4 at a final concentration of 1 M and allowed to react for 24 h at 45 °C. The samples were placed on ice; 1 M acetic acid was added dropwise until gas was no longer released; and the samples were then dried in a Speedvac™. The tritiated samples were washed twice in 10% methanol and dried again. Finally, the samples were resuspended in 500 µl of phosphate-buffered saline, 0.1% bovine serum albumin, and 0.1% dextran sulfate and applied to a 1-ml DEAE-Sepharose column equilibrated with phosphate-buffered saline containing 0.1% bovine serum albumin. The column was washed with equilibration buffer, and the GAGs were eluted with 2.5 column volumes of phosphate-buffered saline containing 1 M NaCl and 0.1% bovine serum albumin. Fractions were monitored for radioactivity and analyzed by gel permeation chromatography on a Superose 6 10/30 column as described above. Molecular size estimates for GAG chains are based on the data of Wasteson (17).
Small proteoglycans were isolated from bovine fetal epiphyseal cartilage as
described previously for skin and bovine articular cartilage (11, 12)
by dissociative extraction, equilibrium density gradient
centrifugation, DEAE-Sephacel chromatography, and gel permeation
chromatography on Sepharose CL-4B. The resultant material (Fig.
1) contained a mixture of small proteoglycans.
The proteoglycan-containing fractions from the gel permeation
chromatography were applied to an octyl-Sepharose column. The proteoglycans were eluted with a linear gradient of increasing GdnHCl
concentration. Fractions were analyzed for uronate, which revealed the
presence of two major peaks (Fig. 2). The material in
these peaks was analyzed by SDS-PAGE and peptide mapping. The results
indicated that the first peak contained biglycan, and the second peak
contained decorin and a second somewhat larger proteoglycan. This
unknown proteoglycan gave rise to unique tryptic peptides, which, when
sequenced, showed homology to avian PG-Lb. This proteoglycan was named
epiphycan. Fractions containing either biglycan or a mixture of
epiphycan and decorin were pooled separately.
The pooled mixture of epiphycan and decorin was passed over chelating
Sepharose charged with Zn2+. Decorin bound to the
Zn2+-charged column, whereas epiphycan was not retained.
The column was eluted with a linear pH gradient from pH 8 to 4. Epiphycan was completely separated from decorin in this chromatographic step as determined by monitoring for uronate, protein, and pH and by
SDS-PAGE (Fig. 3). The fractions containing epiphycan
were pooled as shown and assessed for homogeneity by SDS-PAGE.
Molecular Mass Determinations
The molecular mass of epiphycan
was estimated by gel permeation chromatography of a
125I-labeled sample on a Superose 6 10/30 column. The
elution position of the epiphycan proteoglycan (Fig. 4)
was compared with those of decorin and biglycan isolated from the same
tissue and with 14C-labeled molecular mass standards. The
elution profile for 125I-epiphycan showed one major peak
with an elution position at Kav = 0.31, which
corresponds to a molecular mass of ~133,000 Da. Epiphyseal
125I-decorin and 125I-biglycan each eluted as
single peaks at Kav = 0.34 (120 kDa) and 0.29 (150 kDa), respectively.
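Converting an elution position to a molecular mass is conventionally done by fitting log(mass) against Kav for a set of standards and reading unknowns off the fitted line. The Python sketch below illustrates the approach under stated assumptions: it uses the standard definition Kav = (Ve - V0)/(Vt - V0), and, because the elution positions of the 14C-labeled standards are not given in the text, it borrows the decorin and biglycan values quoted above as stand-in reference points. It is therefore not the calibration actually used here, and the function names are illustrative.

```python
import math


def kav(elution_ml, void_ml, total_ml):
    """Partition coefficient: Kav = (Ve - V0) / (Vt - V0)."""
    return (elution_ml - void_ml) / (total_ml - void_ml)


def fit_log_linear(points):
    """Least-squares fit of log10(mass) = a + b * Kav.

    points: iterable of (Kav, mass_in_kDa) pairs for the standards.
    """
    xs = [k for k, _ in points]
    ys = [math.log10(m) for _, m in points]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def mass_from_kav(k, intercept, slope):
    """Estimated mass (kDa) at a given Kav on the fitted line."""
    return 10 ** (intercept + slope * k)


# Stand-in reference points (assumption): decorin and biglycan from the text.
REFERENCE = [(0.34, 120.0), (0.29, 150.0)]

if __name__ == "__main__":
    a, b = fit_log_linear(REFERENCE)
    print(f"slope = {b:.2f} log10(kDa) per Kav unit")
    print(f"estimate at Kav 0.31: {mass_from_kav(0.31, a, b):.0f} kDa")
```

With only two reference points the line passes through both exactly, so the estimate for Kav = 0.31 necessarily falls between the decorin and biglycan values; a real calibration would use several standards spanning the fractionation range.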
The molecular masses of the 125I-labeled core proteins were
determined by SDS-PAGE after digestion with chondroitinase ABC.
Epiphycan, decorin, and biglycan core proteins were of similar size and
migrated into the resolving gel to a position equivalent to a protein
of 46 kDa (Fig. 5).
The size of the GAG chains was estimated by gel permeation
chromatography after reductive β-elimination in the presence of 3H-labeled sodium borohydride. The radiolabeled GAG chains
were purified on DEAE-Sepharose and analyzed on a Superose 6 column (Fig. 6). The epiphycan 3H-labeled GAGs
eluted as a heterogeneous peak at Kav = 0.46-0.57, corresponding to a molecular mass of ~23,000-34,000 Da.
Tritium-labeled GAG chains derived from decorin and biglycan each
eluted as one major peak at Kav = 0.57 (23 kDa)
and 0.54 (25 kDa), respectively.
Peptide Mapping
An endoprotease Lys-C digestion of unreduced epiphycan followed by gel permeation chromatography and reversed-phase separation of the products resulted in the isolation of peptides that were subsequently sequenced (Table I). A search of the GenBank™ Data Bank confirmed that all of the peptide sequences were highly homologous to the sequence of PG-Lb (9). The sequence determined from these peptides covered most of the epiphycan core protein (Fig. 7).
cDNA Analysis
Alignment of the sequences of the tryptic
peptides (Table I) generated from the decorin/epiphycan mixture (Fig.
2) with the protein sequence of PG-Lb enabled two degenerate PCR
primers to be designed (Fig. 7). Reverse transcription-PCR
amplification of a pool of fetal cartilage mRNA resulted in the
expected 230-base pair product. The product was gel-purified and
reamplified with primers that had restriction site-containing clamps
attached at the 5′-end. This product was cloned into pBluescript and
sequenced, confirming that the product derived from DNA coding for
epiphycan. Similar products were obtained by PCR amplification of a
human cDNA library (18). The cloned PCR product was used to define specific primers that allowed the entire coding sequence to be amplified from a cDNA library generated from reverse-transcribed fetal epiphyseal cartilage mRNA. From this sequence, nested primers were used in a second round of PCR amplification to generate an 800-base pair PCR product, which was then excised and sequenced in both
directions. The entire coding region of bovine epiphycan mRNA and
the deduced amino acid sequence were determined (Fig. 7). The coding
sequence of bovine epiphycan has 966 base pairs, corresponding to a
translated protein of 321 amino acids. A signal peptide of 19 amino
acids precedes the mature protein, which has a calculated size of
34,721 Da and a pI of 4.52. The shorter form has a calculated size of
33,531 Da and a pI of 4.54.
Edman degradation of the intact protein indicated that epiphycan had two N-terminal sequences (APTLES ... and ETYDAT ... ) in approximately equimolar amounts. The first N-terminal sequence (APTLES ... ) derived from removal of the signal peptide. The two sequences could be identified separately by virtue of the fact that a repeat of the second sequence beginning ETY ... could be found after 11 cycles of sequencing. The second N terminus (ETYDAT ... ) may derive from the action of a protease or may be a result of cleavage by exopeptidases. Similar N-terminal processing occurs in biglycan (19) and in decorin (20) and, in these cases, appears to have a role in control of GAG chain length, either by altering the rate of intracellular transport of the proteoglycan or by altering the rate of synthesis of the GAG chain (21).
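The figures reported for the coding sequence and the two core protein forms can be cross-checked with simple arithmetic. The Python sketch below does so using standard average residue masses. It assumes that the 11 residues missing from the shorter form are APTLESINYNS, as pieced together from the N-terminal sequence (APTLESIN ...) and the NYNS-ETYD cleavage site quoted in this paper; that reconstruction and all names in the code are illustrative.

```python
# Standard average residue masses in Da, rounded to two decimals.
RESIDUE_MASS = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.10, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
    "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19,
    "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}

# Values reported in the text.
CODING_BP = 966
PRECURSOR_AA = 321
SIGNAL_AA = 19
LONG_FORM_DA = 34721
SHORT_FORM_DA = 33531

# Assumption: residues removed to give the ETYDAT ... N terminus.
REMOVED = "APTLESINYNS"


def residue_sum(seq):
    """Sum of residue masses; this is the mass lost when seq is cleaved off."""
    return sum(RESIDUE_MASS[aa] for aa in seq)


codons = CODING_BP // 3                 # includes the stop codon
mature_aa = PRECURSOR_AA - SIGNAL_AA    # after signal peptide removal
short_aa = mature_aa - len(REMOVED)     # after the second cleavage
removed_da = residue_sum(REMOVED)

if __name__ == "__main__":
    print(f"{codons} codons = {PRECURSOR_AA} residues + stop")
    print(f"mature protein: {mature_aa} residues; shorter form: {short_aa}")
    print(f"mass of removed residues: {removed_da:.0f} Da")
    print(f"reported difference: {LONG_FORM_DA - SHORT_FORM_DA} Da")
```

Under this assumption the mass of the 11 removed residues agrees with the difference between the two calculated sizes to within 1 Da, and 966 base pairs accounts for 321 codons plus a stop codon.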
A disulfide bond was unequivocally assigned between the two C-terminal cysteines. A peptide with two N termini (TPQ ... and DMY ... ; K-9+K-12) (Table I) eluted from a Superdex 75 column at a position consistent with a size of 4-8 kDa. These two peptides have predicted molecular masses of 1859 and 5756 Da, respectively, and would not be expected to coelute on gel permeation chromatography. Reduction of this peptide allowed the isolation of a peptide (K-9R) that corresponded to one-half of the disulfide-bonded pair. Thus, Cys-278 and Cys-311 are linked by a disulfide bond. We have not been able to confirm the presence of a disulfide-bonded loop at the N terminus.
Consensus sequences for N-linked oligosaccharides were present at positions 282 and 301. A peptide was found in which the N-terminal residue could not be identified (XLTYIRK, peptide K-10). The unidentified residue corresponded to residue 282, indicating the likely presence of an N-linked oligosaccharide. This peptide also eluted anomalously early on gel permeation chromatography with an estimated size of 3-6 kDa, despite its calculated molecular mass of 907 Da. Asparagine was detected at position 301 (peptide K-11), indicating that this residue was generally not substituted.
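The reasoning for peptide K-10 can be illustrated in code: an N-glycosylation consensus site is the sequon Asn-X-Ser/Thr (X not Pro), and an unsubstituted peptide has a mass equal to its residue masses plus one water. The Python sketch below assumes that the unidentified residue X in XLTYIRK is asparagine, as the cDNA-derived consensus site at position 282 implies; the function names are illustrative.

```python
import re

# Standard average residue masses in Da, rounded to two decimals.
RESIDUE_MASS = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.10, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
    "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19,
    "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}
WATER = 18.02

# Lookahead so that overlapping sequons are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_positions(seq, first_residue=1):
    """Residue numbers of Asn in N-X-S/T sequons (X is any residue but Pro)."""
    return [m.start() + first_residue for m in SEQUON.finditer(seq)]


def peptide_mass(seq):
    """Average mass of an unmodified linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


# Assumption: X in XLTYIRK (peptide K-10) is Asn-282.
K10 = "NLTYIRK"

if __name__ == "__main__":
    print("sequon at residue", sequon_positions(K10, first_residue=282))
    print(f"unmodified mass: {peptide_mass(K10):.0f} Da")
```

The unmodified mass comes out at about 907 Da, in line with the calculated value quoted above, so the elution at an apparent 3-6 kDa is consistent with a bulky substituent on Asn-282.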
Three likely O-substituted sites have been found at positions 60, 64, and 95 in peptide K-1, based on blanks in the Edman degradation and serine or threonine in the cDNA-derived sequence. Residue 60 in peptide K-1-T-2 (Table I) is a threonine (IEIATVMPSGN) in the cDNA-derived sequence and is likely substituted with an O-linked oligosaccharide. The GAG chain(s) are likely to be attached at serine residues (e.g. Ser-64 and/or Ser-95). Residue 64 also in peptide K-1-T-2 is a typical glycosaminoglycan attachment site (Ser-Gly) similar to the type found in decorin, biglycan, and aggrecan.
To determine the sites of GAG substitution, the high molecular mass
peptide K-1 was isolated and subdigested with trypsin. Two tryptic
peptides, K-1-T-2 and K-1-T-4, were found in the void volume of a
Superdex 75 column. This elution behavior suggested that the peptides
were covalently linked to GAG chain(s) or large oligosaccharide(s).
Reversed-phase analysis of this material resulted in a single, late
eluting homogeneous peak with an N terminus corresponding to the
sequence LIDG ... (peptide K-1-T-4) (Table I and Fig.
8) and a variety of earlier eluting, broad peaks, which
had the same N termini (AEIE ... ; peptide K-1-T-2) (Table I and
Fig. 8). Peptide K-1-T-2 contains Ser-64, which most likely is
substituted with a GAG based on its heterogeneity on reversed-phase HPLC. The late eluting peptide K-1-T-4, which contains Ser-95, is
probably substituted with an O-linked oligosaccharide based on its homogeneity on reversed-phase HPLC. However, we cannot exclude
the possibility that Ser-95 could also be substituted with a GAG
chain.
We have previously isolated decorin and biglycan from fetal skin and from bovine articular cartilage using a protocol involving dissociative extraction, ion-exchange chromatography, and hydrophobic chromatography on octyl-Sepharose 4B (11, 12). When small proteoglycans were isolated from fetal bovine epiphysis using the same procedure, we found three proteoglycans in roughly equal amounts. The third proteoglycan, which was named epiphycan based on its tissue source, was found to be the mammalian homolog of the avian proteoglycan PG-Lb. Epiphycan coeluted with decorin on octyl-Sepharose, but the two proteoglycans could be separated by metal chelate chromatography on Zn2+-charged columns.
The ability to prepare significant amounts of proteoglycan (~0.5 mg from 50 g of epiphysis) has enabled us to chemically characterize the proteoglycan. The peptide sequence data allowed degenerate oligonucleotide primers to be designed. The PCR product was sequenced, and this was used to initiate the determination of the cDNA sequence of bovine and mouse (10) and human (18) epiphycan.
N-terminal sequence analysis of intact epiphycan indicated the presence of two N termini. One of these (APTLESIN ... ) is generated by removal of the signal peptide. The other N terminus (ETYDAT ... ) did not conform to a signal peptide cleavage site and may derive from the action of a protease cleaving at NYNS-ETYD or from the action of exopeptidases. A similar two-step processing pathway has been noted for biglycan (19, 22).
The sizes of the two alternative core proteins calculated from the deduced amino acid sequence (33,531 and 34,721 Da) were smaller than those of decorin (36,421 Da) and biglycan (37,113 Da). There are one (epiphycan), two (biglycan), or three (decorin) N-linked oligosaccharides attached to these proteins. As determined by SDS-PAGE (Fig. 5), there is little difference between the sizes of the core proteins of these three SLRPs after digestion with chondroitinase ABC. If N-linked oligosaccharides were the only substituents on the epiphycan core protein, then the difference between epiphycan and decorin or biglycan would be substantial (at least 6 kDa), implying that, in epiphycan, there are additional post-translational modifications that would increase the apparent core protein size to the same range as decorin and biglycan. These modifications are presumably O-glycosylations; based on Edman degradation, there appear to be at least two O-linked oligosaccharides and one O-linked glycosaminoglycan in epiphycan.
Intact epiphycan is intermediary in size between decorin and biglycan. The core protein, with O- and N-linked oligosaccharides attached, is similar in size to both these proteoglycans. The average size of the GAG chains released from epiphycan is rather larger than that of the GAG chains obtained from either decorin or biglycan. This would be consistent with the presence of one GAG chain in epiphycan. The tryptic peptide containing Ser-64 elutes over a broad range on reversed-phase HPLC (Fig. 8). This, coupled with its high molecular mass, suggests that it is substituted with a GAG chain. In contrast, Ser-95 is found on a peptide that elutes late and as a symmetrical peak on reversed-phase HPLC. This suggests that this peptide has a smaller and more homogeneous carbohydrate substituent, likely a conventional O-linked oligosaccharide.
Epiphycan is the mammalian homolog of chick PG-Lb (9) and, within the
LRR-containing region, is 78% identical. Epiphycan is also related
(49% identity within the LRR-containing region) to osteoglycin
(formerly named osteoinductive factor), which is a proteoglycan found
in the extracellular matrix of developing bone (23). The
epiphycan/PG-Lb family appears to be a separate branch of the
leucine-rich proteoglycans. An unrooted phylogeny diagram is shown in
Fig. 9, indicating the relationship of epiphycan to
chick PG-Lb, to a partial sequence of a shark analog of these proteoglycans, and to osteoglycin,
decorin, biglycan, fibromodulin, lumican, PRELP, and chondroadherin.
Each family appears to be essentially unrelated to the other families,
although all have the common feature of LRRs. It is noteworthy that
mammalian osteoglycin is more remotely related to mammalian epiphycan
(47% identical to the LRR region of bovine epiphycan) than the partial
shark-derived sequence (56% identical to the LRR region of bovine
epiphycan), implying that it diverged from a common ancestor earlier
than the cartilaginous fishes. It is therefore reasonable to assume that osteoglycin has a different role from epiphycan in the same way
that decorin and biglycan are in the same subfamily, but have different
properties and therefore, presumably, different roles in the
extracellular matrix.
The seven LRRs in epiphycan are heterogeneous in length. The start of the first detectable consensus sequence is 15 residues after the fourth cysteine. In common with other SLRPs, the first LRR is atypical, starting with a hydroxy amino acid and a weak consensus motif. This is likely to be due to a dramatic change in structure at this point, corresponding to the interface with the four-cysteine cluster. The lengths of the LRR-containing sequences are 24, 24, 20, 26, 21, 31, and 32 amino acids. The last LRR appears between the two C-terminal cysteines. This dimeric repeat pattern (long-short-long-short-long) differs from that of the proteoglycans decorin, biglycan, fibromodulin, and lumican, which have a triplet repeat pattern (long-long-short) (24). This, in turn, differs from the complete regularity of the RNase inhibitor, which has 15 LRRs spaced at intervals of 28 or 29 amino acids (8). A similar regularity is seen in chondroadherin, where the repeats are spaced at intervals of 24 amino acids (25). The significance of these patterns will probably become clear once the three-dimensional structure of these domains has been determined, but may reflect a mechanism for changing the overall curvature of the molecule.
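The alternating pattern can be made explicit by labeling each repeat as long or short. The Python sketch below does this for the seven lengths listed above. The 24-residue cutoff is an assumption chosen for illustration (it is the spacing quoted for chondroadherin), not a criterion stated in this paper.

```python
# Lengths of the seven LRR-containing sequences, as listed in the text.
LRR_LENGTHS = [24, 24, 20, 26, 21, 31, 32]

# Assumption: repeats of 24 residues or more are called "long".
CUTOFF = 24


def classify(lengths, cutoff=CUTOFF):
    """Label each repeat 'long' or 'short' relative to the cutoff."""
    return ["long" if n >= cutoff else "short" for n in lengths]


labels = classify(LRR_LENGTHS)
total_residues = sum(LRR_LENGTHS)

if __name__ == "__main__":
    for number, (length, label) in enumerate(zip(LRR_LENGTHS, labels), start=1):
        print(f"LRR {number}: {length} residues ({label})")
    print("total:", total_residues, "residues")
```

With this cutoff, repeats 2 through 6 read long-short-long-short-long, and the seven repeats together span 178 residues.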
Information on structurally important features of proteins can often be
obtained by comparison of the same protein in different species.
Comparison of chick PG-Lb, human epiphycan, and murine PG-Lb with
bovine epiphycan shows that the majority of the canonical SLRP
structure (cysteine-rich region, a series of LRRs, followed by a
C-terminal disulfide bond(s)) is highly conserved (Fig.
10). The majority of changes are conservative. In
common with decorin and biglycan, the N terminus, in front of the first
cysteine, differs considerably between species. However, a section of
30 amino acids in front of the first cysteine is quite conserved between members of this family. This region contains either an O-linked oligosaccharide (in epiphycan), as shown here, or a
GAG chain (postulated in PG-Lb). The conservation of this region may indicate functional importance. Osteoglycin is the closest relative to
epiphycan, but has almost no similarity in the N-terminal region and
only 52% identity in the region from the first N-terminal cysteine
cluster to the C-terminal cysteine cluster (Fig. 10). A shark
proteoglycan that has been partially characterized is
clearly related to these proteoglycans. It bears a greater similarity
to bovine epiphycan than to bovine osteoglycin and so may be a shark
counterpart to epiphycan.
It remains to be seen to what extent the mammalian homolog of PG-Lb mimics the avian proteoglycan. Immunolocalization of PG-Lb in developing chick limb indicated that PG-Lb was most abundant in the region that contained flattened chondrocytes (9). This would imply that epiphycan may have a function in a region of cartilage that is not associated with calcification, perhaps acting to delay the onset of calcification or to arrange the matrix so that it is ready for the extensive remodeling that occurs during calcification. It may control collagen fibrillogenesis in a region where the type II collagen will be completely removed and replaced with a calcified, type I collagen-containing extracellular matrix.
The nucleotide sequence reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number U77127.
We are grateful to Ray Boynton for preparation of the epiphyseal RNA.