(Received for publication, June 16, 1995)
We have determined the primary structure of a connective tissue
matrix protein from the nucleotide sequence of a clone isolated from a
human articular chondrocyte cDNA library. The major part of the amino
acid sequence has also been determined by direct protein sequencing.
The translated primary sequence corresponds to 382 amino acid residues,
including a 20-residue signal peptide. The molecular mass of the mature
protein is 41,646 Da. The main part of the protein consists of 10
leucine-rich repeats ranging in length from 20 to 26 residues, with
asparagine at position 10 (B-type). The N-terminal part is unusual in
that it is basic and rich in arginine and proline. There are four
potential N-linked glycosylation sites present. In three of
these sites, post-translational modifications are likely to be present
since Asn was not found by direct protein sequencing. The amino- and
carboxyl-terminal parts contain four and two cysteine residues,
respectively, probably forming disulfide bonds by analogy with the
other members of this family. The protein shows highest identity (36%)
to fibromodulin and 33% to bovine lumican, two other leucine-rich
repeat connective tissue proteins. Northern blot analysis showed the
presence of a 3.8-kilobase mRNA in different types of bovine
cartilage and cultured osteoblasts, whereas RNAs isolated from bovine
kidney, skin, spleen, thymus, and trabecular bone and rat calvaria were
negative. Human articular chondrocyte and rat chondrosarcoma cell RNAs
contained an additional mRNA of 1.6 and 1.8 kilobases,
respectively.
Connective tissues are dominated by an extensive extracellular
matrix. In cartilage, the matrix has an abundance of collagen II and
aggrecan, the large aggregating proteoglycan. The extracellular matrix
is also rich in smaller proteoglycans and other non-collagenous
proteins. A few years ago, it was shown that one of the first sequenced
smaller proteoglycans, decorin (1, 2), contains
leucine-rich repeats (LRRs). Since then, the number of
identified primary sequences for LRR proteins in connective tissues has
increased. Connective tissue proteins of the extracellular matrix with
LRRs so far known are biglycan (3), decorin,
fibromodulin (4), lumican (5),
chondroadherin (6), proteoglycan-Lb (7), and
osteoinductive factor (8). Except for chondroadherin, all of
them are proteoglycans with one or a few glycosaminoglycan chains, and
several of these molecules have been shown to bind components of the
extracellular matrix, e.g. collagen (9), growth
factors (10), and cells (11).
Intra- and extracellular LRR-containing proteins are found in almost any system
studied, e.g. in mammalian and plant cells, yeast, and
prokaryotes (12). The number of residues in a given LRR is
between 20 and 29, and the consensus sequence derived from all known
LRR proteins contains leucine or other aliphatic residues at positions
2, 5, 7, 12, 16, 21, and 24 and asparagine (B-type repeat), cysteine
(A-type repeat), or threonine at position 10. Recently, the
three-dimensional structure of the LRR-containing porcine ribonuclease
inhibitor was determined, first for the protein and then for its complex
with ribonuclease (13). Other LRR proteins are likely to
have a similar structure in their repeat region. The
three-dimensional structure shows that each LRR is composed of a
β-sheet and an α-helix. In the ribonuclease inhibitor, the
β-sheets that form the consensus part of the LRR are arrayed on one
face of the protein, while the less conserved helices are arrayed on
the opposite face. LRR-containing proteins are known to participate in
protein-protein interactions. The specificity and the diversity of the
protein-protein interactions probably arise from the non-consensus
residues.
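The consensus described above can be checked mechanically against a candidate repeat. The following Python sketch is only an illustration, not the procedure used in this work: the function name `score_repeat` is hypothetical, and reading "leucine or other aliphatic residues" as Ala, Ile, Leu, or Val is an assumption made for the example.

```python
# Consensus positions are 1-based within a single repeat, as in the text.
ALIPHATIC_POSITIONS = (2, 5, 7, 12, 16, 21, 24)
# Residue at position 10 and the repeat type it defines in the text.
POSITION_10_TYPES = {"N": "B-type", "C": "A-type", "T": "threonine"}
# Assumption: "aliphatic" is taken to mean Ala, Ile, Leu, or Val.
ALIPHATIC = set("AILV")


def score_repeat(repeat):
    """Compare one candidate LRR (20 to 29 residues) with the consensus."""
    repeat = repeat.upper()
    if not 20 <= len(repeat) <= 29:
        raise ValueError("an LRR is expected to span 20 to 29 residues")
    # Consensus positions beyond the end of a short repeat are not checked.
    checked = [p for p in ALIPHATIC_POSITIONS if p <= len(repeat)]
    hits = [p for p in checked if repeat[p - 1] in ALIPHATIC]
    return {
        "length": len(repeat),
        "aliphatic_checked": checked,
        "aliphatic_hits": hits,
        "repeat_type": POSITION_10_TYPES.get(repeat[9], "non-consensus"),
    }
```

A repeat that carries an aliphatic residue at every checked position and asparagine at position 10 would be reported as a B-type repeat with all positions matched; the non-consensus positions are deliberately ignored here.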
The protein described here was originally purified as a prominent component of bovine articular cartilage with a molecular mass of 58 kDa (14). The amino acid composition of the 58-kDa protein was similar to that of fibromodulin, with a high content of leucine and aspartic acid/asparagine residues. The 58-kDa protein was also rich in proline. The proteins showed different ionic properties, with only fibromodulin binding to DEAE-cellulose. Studies of the distribution among tissues by radioimmunoassays showed that the protein was present in many types of cartilage and also in non-cartilage connective tissues such as aorta, sclera, cornea, kidney, liver, skin, and tendon. It was not detected in bone extracts.
To further characterize the 58-kDa protein, we have determined its primary structure. This reveals that the protein belongs to the LRR family of connective tissue proteins with four potential N-linked glycosylation sites and, in contrast to those previously described, a rather basic N-terminal extension rich in arginine and proline. We therefore propose to refer to the protein as PRELP (proline arginine-rich end leucine-rich repeat protein).
Two-dimensional chromatography was performed on an endoprotease Lys-C digest by preceding the reversed-phase separation of peptides with a gel filtration step on Superdex 75 (Pharmacia Biotech Inc.) equilibrated in 4 M guanidine HCl, 25 mM phosphate, pH 6.5. Peptides were isolated from individual fractions by reversed-phase HPLC.
Figure 1: The cDNA sequence and the translated amino acid sequence of PRELP. The putative cleavage site of the signal sequence is indicated. Cysteine residues are encircled, and potential N-linked glycosylation sites are underlined.
There is some evidence for a substituent,
possibly an O-linked oligosaccharide, on Thr-23. A
relatively high molecular mass peptide (10-15 kDa) could be
isolated after digestion with endoprotease Lys-C. This peptide included
the Thr residue and gave a blank on Edman degradation at this position.
However, the background in the sequence data was high, and this peptide
probably represented the blocked N-terminal peptide. A more detailed
description of the N-terminal region of this protein will require
further analysis.
Removal of N-linked oligosaccharides by N-glycosidase digestion resulted in a band with a mobility corresponding to 48 kDa
(Fig. 2). Digestion with keratanase resulted in a
small, but clearly visible shift on an 8% SDS-polyacrylamide gel, thus
indicating the possible existence of keratan sulfate or polylactosamine
on the protein (Fig. 2). O-Glycosidase digestion with
and without prior digestion with neuraminidase resulted in no shift
(data not shown), indicating the absence of O-glycosidically
linked sugars.
Figure 2: Digestion of PRELP with keratan-sulfate 1,4-β-D-galactanohydrolase and N-glycosidase F. PRELP was electrophoresed on an 8% SDS-polyacrylamide gel under reducing conditions. PRELP (10 µg) was treated as follows: lane 1, digested with keratan-sulfate 1,4-β-D-galactanohydrolase; lane 2, undigested; lane 3, digested with N-glycosidase F.
Figure 3: Dendrogram of the core regions of LRR-containing connective tissue proteins made by the pair group maximum averages method (26, 27). The sequences of PRELP, fibromodulin, biglycan, decorin, chondroadherin, and osteoinductive factor (OIF) are from human sources. Lumican is of bovine origin, and proteoglycan Lb (PG-Lb) is of embryonic chick origin.
Figure 4: Alignment of PRELP (human origin), fibromodulin (human origin), and lumican (bovine origin). The conserved cysteine residues and the potential N-glycosylation sites at identical sites in the three proteins are indicated. The lengths and numbers of the LRRs are indicated. Identical residues are shown by shaded boxes.
Figure 5: A, Northern blot analysis of total RNAs extracted from human articular chondrocytes (lane 1), bovine osteoblasts (lane 2), bovine tracheal chondrocytes (lane 3), rat chondrosarcoma cells (lane 4), and rat calvariae (lane 5). B, Northern blot analysis of total RNAs extracted from a second trimester bovine fetus: articular cartilage (lane 1), epiphyseal cartilage (lane 2), trabecular bone (lane 3), skin (lane 4), kidney (lane 5), liver (lane 6), spleen (lane 7), and thymus (lane 8). kb, kilobases.
The primary structure of mature human PRELP comprises 362 amino acid residues, which correspond to a calculated molecular mass of 41,646 Da. Laser desorption mass spectrometry indicates that the mass of the intact protein falls into a broad range of 52,000 ± 2,500 Da. This difference in size compared with the protein isolated from articular cartilage is at least partly due to N-linked oligosaccharide modifications since N-glycosidase digestion of the protein resulted in an apparent molecular mass of 48 kDa. The discrepancy in size of the protein after N-glycosidase digestion compared with the translated amino acid sequence is likely to be caused by the presence of other post-translational modifications of the protein. An indication of other modifications was obtained from a Lys-C digest, in which the threonine at position 23 could not be detected by Edman degradation, suggesting the presence of an O-glycosidically linked carbohydrate moiety. Direct analysis for O-glycosidically linked oligosaccharides by enzyme digestion was, however, negative.
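The size comparison above rests on simple arithmetic, which can be laid out as a short Python sketch. All numbers are taken from the text; the function name `mass_budget` is hypothetical. Note that the 48-kDa value is an apparent mass from gel mobility, so subtracting the calculated polypeptide mass from it gives only a rough indication, not a measured modification mass.

```python
# Values quoted in the text.
PRECURSOR_RESIDUES = 382             # translated sequence
SIGNAL_PEPTIDE_RESIDUES = 20
CALCULATED_MASS_DA = 41_646          # mature polypeptide, from the sequence
MS_MASS_DA = 52_000                  # laser desorption mass spectrometry
MS_TOLERANCE_DA = 2_500
DEGLYCOSYLATED_APPARENT_DA = 48_000  # SDS-PAGE mobility after N-glycosidase


def mass_budget():
    """Summarize how much mass is not accounted for by the polypeptide."""
    mature = PRECURSOR_RESIDUES - SIGNAL_PEPTIDE_RESIDUES
    excess = MS_MASS_DA - CALCULATED_MASS_DA
    return {
        "mature_residues": mature,
        "mean_residue_mass_da": CALCULATED_MASS_DA / mature,
        "excess_over_polypeptide_da": excess,
        "excess_range_da": (excess - MS_TOLERANCE_DA, excess + MS_TOLERANCE_DA),
        # Rough only: apparent gel mass minus calculated polypeptide mass.
        "unexplained_after_n_glycosidase_da": (
            DEGLYCOSYLATED_APPARENT_DA - CALCULATED_MASS_DA
        ),
    }
```

On these figures, roughly 10 kDa of the measured mass lies outside the polypeptide chain, and about 6 kDa of apparent mass would remain unexplained after removal of the N-linked oligosaccharides, subject to the caveat about gel mobility.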
The main part of PRELP consists of 10 LRRs. The β-sheet-forming
part of the repeats is highly conserved, whereas residues from position
15 in the repeats to the end of the repeats are less well conserved.
The length of the repeats ranges from 20 to 26 residues. They show a
periodicity beginning with two 24-26-residue-long repeats
followed by a shorter 20-21-residue-long repeat. The end of the
last repeat is, however, difficult to predict. This periodicity
is also present in fibromodulin, lumican, decorin, and biglycan.
By analogy with the other related proteins (6, 29),
disulfide bridges are likely in the amino-terminal part between
cysteine residues at positions 53 and 69 and in the carboxyl-terminal
part between cysteine residues at positions 312 and 353. Several other
LRR-containing proteins show a similar pattern, with PRELP being most
homologous to fibromodulin and lumican. Proteoglycan-Lb (7) and
osteoinductive factor (8) are shorter proteins with fewer LRRs,
but with the same conserved cysteine residues in the amino-terminal
part and a pair of cysteines in the carboxyl-terminal part.
Chondroadherin diverges partly from this pattern of cysteines at its
carboxyl-terminal end, with four cysteines forming two disulfide
bridges (6).
The post-translational modifications of the LRR-containing connective tissue proteins differ. All of them, except for chondroadherin, appear to have oligosaccharide substitutions, but with different content and in different numbers. Biglycan and decorin have two and one chondroitin/dermatan sulfate chains, respectively, close to the amino terminus (30). Fibromodulin and lumican have at least one and at most four keratan sulfate chains positioned at the N-linked oligosaccharide substitutions (31). The four N-glycosylation sequences are located at conserved positions in LRR-1, -3, -5, and -8 in both fibromodulin and lumican. In fibromodulin, the four sites are identified as hexosamine-rich, but whether they all contain keratan sulfate is uncertain. The sulfate substitutions show variations in different tissues (32). Arterial lumican appears to be unsulfated, whereas corneal lumican is highly sulfated. Fibromodulin contains sulfated tyrosine residues in the amino-terminal part (30), and lumican contains consensus sites for tyrosine sulfation. The sulfate substitutions of fibromodulin and lumican contribute acidic properties to the proteins. In addition, however, the primary sequences themselves show low pI values. PRELP differs considerably from lumican and fibromodulin in that its basic amino-terminal region lacks consensus sites (33, 34) for tyrosine sulfation. Whether PRELP is a proteoglycan with keratan sulfate chains is not clear. Four potential N-linked glycosylation sites are present according to the consensus sequence Asn-Xaa-Ser/Thr. In three of the sites, post-translational modifications are likely to be present as amino acid sequence analysis gave blank cycles at these positions. Two of the N-linked oligosaccharide sites are situated in the same position in the LRRs as in fibromodulin and lumican (LRR-1 and -8), whereas the third substituted glycosylation site is positioned in the last repeat.
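Potential N-linked glycosylation sites of the kind counted above can be located by scanning a sequence for the consensus Asn-Xaa-Ser/Thr. The Python sketch below is a minimal illustration; the function name `find_sequons` is hypothetical. It applies the consensus exactly as stated in the text. Many search tools additionally exclude proline at the Xaa position, which this sketch does not do.

```python
import re

# Asn-Xaa-Ser/Thr as stated in the text. Only the Asn is consumed, and the
# lookahead checks the next two residues, so overlapping sites are all found.
SEQUON = re.compile(r"N(?=[A-Z][ST])")


def find_sequons(protein):
    """Return 1-based positions of Asn residues in Asn-Xaa-Ser/Thr sites."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]
```

A match marks only a potential site; as the blank Edman cycles at three of the four PRELP sites illustrate, whether a site is actually substituted has to be established experimentally.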
The behavior on DEAE-cellulose chromatography shows that PRELP has basic properties. However, because the basic residues in the amino-terminal part give the primary structure a higher pI than it would have without them (9.7 versus 8.3), a shorter keratan sulfate chain might nevertheless be present. Keratanase digestion showed a small shift on an SDS-polyacrylamide gel, which might indicate either a keratan sulfate chain or a non-sulfated polylactosamine. Carbohydrate analysis (14) does not exclude keratan sulfate/polylactosamine substitutions. Attempts to identify keratan sulfate chains on the intact protein and on peptide fragments of the protein by the use of several monoclonal antibodies to keratan sulfate were not conclusive.
Residues 4-47 contain an arginine- and proline-rich segment followed by a proline-rich segment. Inserted between these is a short acidic segment. The proline-rich segment is reminiscent of three turns of an extended collagen-type helix and is therefore likely to form an extended structure. The arginine- and proline-rich segment is also likely to form an extended structure due to steric occlusion and/or charge repulsion of the side chains. The N terminus may, therefore, form an extended structure or loop back on itself in a hairpin, depending on whether the basic and acidic residues interact with each other.
The basic region of the amino-terminal part in PRELP contains two T/GRRPRP sequences. This sequence corresponds to the proposed sequence for protein-glycosaminoglycan interactions: X-B-B-X-B-X, where B denotes a basic residue (35). The consensus sequence was derived from 12 known heparin-binding sequences in vitronectin, apolipoproteins E and B-100, and platelet factor 4. It has also been suggested that the non-collagenous NC4 domain of collagen type IX, which is basic and has one consensus sequence at the N-terminal end, may interact with polyanionic glycosaminoglycan in cartilage (36). The existence of an interaction between PRELP and glycosaminoglycans, however, remains to be verified experimentally.
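The correspondence between T/GRRPRP and the X-B-B-X-B-X pattern can be verified with a short motif search. The Python sketch below rests on two assumptions that the text does not specify: B is taken to be Lys or Arg, and X is treated as any residue. The function name `find_xbbxbx` and the test peptide are hypothetical.

```python
import re

# Assumptions: B = Lys or Arg; X = any residue.
# The whole motif sits inside a lookahead so that overlapping hits are kept.
XBBXBX = re.compile(r"(?=(.[KR][KR].[KR].))")


def find_xbbxbx(protein):
    """Return (1-based start, matched hexapeptide) for each X-B-B-X-B-X hit."""
    return [(m.start() + 1, m.group(1)) for m in XBBXBX.finditer(protein.upper())]


# Hypothetical peptide containing both variants named in the text.
print(find_xbbxbx("AATRRPRPAAGRRPRPAA"))
# → [(3, 'TRRPRP'), (11, 'GRRPRP')]
```

Both TRRPRP and GRRPRP satisfy the pattern under these assumptions. A pattern match alone says nothing about actual glycosaminoglycan binding.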
Most of the LRR-containing proteins have been shown to participate
in protein-protein interactions probably mediated through the LRR
structure. Chondroadherin binds cells (11). Fibromodulin
and decorin bind to collagens I and II and affect fibril
formation (9, 37, 38), and biglycan binds to
collagen VI. Biglycan, decorin, and fibromodulin have all
been shown to bind to transforming growth factor-β (10).
Whether PRELP has any of these properties has yet to be determined.
In Northern blot analysis, PRELP seems to be synthesized in high amounts only in cartilage since no mRNA corresponding to PRELP was detected in RNA isolated from other tissues of a bovine fetus (liver, kidney, skin, spleen, and thymus). The detection of PRELP mRNA in cultured bovine osteoblasts may be the effect of up-regulated PRELP expression under culture conditions since the bovine trabecular bone and rat calvarial RNAs gave no positive signal. In the radioimmunoanalysis carried out by Heinegård et al. (14), no PRELP was detected in bone extracts. However, the radioimmunoanalysis indicated the presence of PRELP in bovine kidney, liver, and skin extracts. One possible explanation could be the age of the tissues. In the literature, a 55-kDa protein with similar properties to PRELP has been described (39). This protein appears to be identical to PRELP as partial peptide sequences of the 55-kDa protein are identical to PRELP at positions 138-149 and 217-228. The 55-kDa protein seems to be deficient in newborns and accumulates in cartilage with age rather than being destroyed and resynthesized by the chondrocytes.
The nucleotide sequence reported in this paper has been submitted to the GenBank(TM)/EMBL Data Bank with accession number U29089.