(Received for publication, November 16, 1995; and in revised form, February 7, 1996)
Previous studies showed that the keratan sulfate-containing proteoglycans of bovine corneal stroma contain three unique core proteins designated 37A, 37B, and 25 (Funderburgh, J. L., Funderburgh, M. L., Mann, M. M., and Conrad, G. W.(1991) J. Biol. Chem. 266, 14226-14231). Degenerate oligonucleotides designed from amino acid sequences of the 37A protein were used to screen a cDNA expression library from cultured bovine keratocytes. A cDNA clone coding for keratocan, a 37A protein, was isolated and sequenced. The deduced keratocan amino acid sequence is unique but related to two other keratan sulfate-containing proteins, lumican (the 37B core protein) and fibromodulin. These three proteins share approximately 35% amino acid identity and a number of conserved structural features. Northern hybridization and immunoblotting of tissue extracts found keratocan distribution to be more limited than that of lumican or fibromodulin. Keratocan is abundant in cornea and sclera and detected in much lesser amounts in skin, ligament, cartilage, artery, and striated muscles. Only in cornea was keratocan found to contain large, sulfated keratan sulfate chains. Keratocan, like lumican, is a core protein of a major corneal proteoglycan but is present in non-corneal tissues primarily as a non-sulfated glycoprotein.
Proteoglycans of the corneal stroma are important in maintaining
the transparency of the cornea. These complex molecules are responsible
for the hydrophilic character of the tissue, providing tissue hydration
that is essential for transparency(1). It is not surprising,
therefore, that the corneal stroma has a markedly different
proteoglycan composition than that of other fibrous connective tissues
such as skin and sclera. The unique character of corneal proteoglycans
was recognized almost 60 years ago with the initial description of
keratan sulfate, the most abundant glycosaminoglycan in the
cornea(2). Corneal keratan sulfate is a highly sulfated,
linear polymer of N-acetyllactosamine, linked to asparagine
residues in the keratan sulfate proteoglycan (KSPG) core proteins via a
mannose-containing oligosaccharide(3, 4). Because the
keratan sulfate glycosaminoglycans are posttranslational modifications
of the KSPG proteins, determination of the structure and
tissue-specific expression of these core proteins are essential to
understanding the biological roles of KSPG. Research from our
laboratory has demonstrated the existence of three KSPG proteins in
bovine cornea. These proteins (designated 37A, 37B, and 25) have unique
primary structures and differ in glycosylation, with protein 37A
containing three keratan sulfate chains and the other two proteins
containing one keratan sulfate chain each(5). Complementary
DNA coding for protein 37B (lumican) has been cloned, and the deduced
amino acid sequence revealed homology to three other proteoglycan
proteins, fibromodulin, decorin, and biglycan(6, 7).
Lumican is present in several tissues other than cornea in a
non-sulfated form(6, 7).
In this paper, we report the sequence of cDNA encoding a KSPG 37A protein (keratocan). This DNA was isolated by screening a cDNA library with degenerate oligonucleotides representing amino acid sequences of tryptic peptides from bovine corneal KSPG protein 37A. The deduced amino acid sequence, size of mRNA, and tissue distribution of keratocan differentiate it from lumican.
Total RNA was isolated
from cultured keratocytes and from various bovine tissues after
pulverization in liquid nitrogen and homogenization in a denaturing
solution containing 4 M guanidine isothiocyanate, 0.1 M 2-mercaptoethanol, 0.5% Sarkosyl, and 25 mM sodium
citrate, pH 7.0(8). The homogenate was mixed with 0.1
volume 2 M sodium acetate, pH 4.0, and then extracted with an
equal volume of phenol/chloroform/isoamyl alcohol (25:24:1). RNA was
ethanol-precipitated from the aqueous phase, and carbohydrates were
extracted with 4 M LiCl. After reprecipitation with isopropyl
alcohol, RNA was dissolved in 10 mM Tris, pH 7.6, 1 mM EDTA, and stored at -70 °C.
Oligonucleotides were designed by reverse translation of amino acid
sequences of tryptic peptides from the 37A protein (6) inserting inosine to reduce the degeneracy of the probes.
The oligonucleotide 5′-TGGTAYYTITAYYTIGARAAYAAYYT-3′ (degeneracy 256)
was produced from the sequence of the peptide designated IWYL, and the
oligonucleotide 5′-YTIGAYYTICARCAYAAYAAYAA-3′ (degeneracy 128) was
produced from the peptide designated FSNL. Initial screening was done
at 20-30 10
plaques/150-mm plate. Phage DNA
from plaques was transferred to Magna Lift nylon membranes (Micron
Separations); denatured in 0.5 mM NaOH, 1.5 M NaCl;
neutralized in 0.5 M Tris-HCl, pH 7.6, 1.5 M NaCl;
and fixed with 120 mJ/cm² UV light. The membranes were
prehybridized overnight at 42 °C in prehybridization solution
containing 6× SSC (SSC is 0.15 M NaCl, 0.015 M sodium citrate, pH 7.0), 0.010 M sodium phosphate, pH 6.8,
5× Denhardt's solution(9), 100 µg/ml
denatured herring sperm DNA, 1 mM EDTA, and 0.5% SDS. The
oligonucleotide probe FSNL, end-labeled with
[γ-³²P]ATP to a specific activity of >2
10
cpm/µg, was diluted to 1
10
cpm/ml of prehybridization solution, and the membranes were
hybridized at temperatures decreasing from 60 to 37 °C over a
period of 2-2.5 days. The membranes were washed in 6× SSC,
0.05% SDS at room temperature for 1 h with three changes of wash
solution and then washed again under the same conditions except that
the temperature was raised to 37 °C. Positive clones were
rescreened with probe IWYL using the same methods at a density of
200-300 plaques/100-mm plate to ensure isolation of pure plaques.
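
The degeneracy values quoted for the two probes can be checked by multiplying the number of bases each ambiguity code stands for. The following is a minimal sketch, not part of the original methods; the function name `degeneracy` is hypothetical, and inosine (I) is assumed to count as a single choice because it was inserted to avoid adding degeneracy.

```python
# Number of bases represented by each IUPAC nucleotide code.
# Assumption: inosine (I) counts as 1, since it replaces a fully
# degenerate position without adding to the probe mixture.
IUPAC_CHOICES = {
    "A": 1, "C": 1, "G": 1, "T": 1, "I": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def degeneracy(oligo: str) -> int:
    """Return the number of distinct sequences in a degenerate oligonucleotide."""
    total = 1
    for base in oligo.upper():
        total *= IUPAC_CHOICES[base]
    return total


# Probe sequences as given in the text (5' to 3').
IWYL_PROBE = "TGGTAYYTITAYYTIGARAAYAAYYT"
FSNL_PROBE = "YTIGAYYTICARCAYAAYAAYAA"

print(degeneracy(IWYL_PROBE))  # 256
print(degeneracy(FSNL_PROBE))  # 128
```

The IWYL probe contains seven Y positions and one R position (2⁸ = 256), and the FSNL probe contains six Y positions and one R position (2⁷ = 128), in agreement with the stated degeneracies.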
The pBluescript plasmid containing the cDNA insert was released from the Uni-Zap XR bacteriophage by coinfecting E. coli XL1-Blue cells with purified phage and ExAssist helper phage. The inserts were excised from purified plasmid DNA by digestion with EcoRI and XhoI, and their sizes were checked by agarose gel electrophoresis. The sequence of the cDNA insert was determined using an Applied Biosystems sequencer and the walking primer method(9). The sequence was read at least two times in each direction.
The nucleotide sequence of the clone 3710 insert is shown in Fig. 1. The IWYL and FSNL probe sequences are located starting at nucleotides 445 and 671, respectively. The longest open reading frame encodes a 352-amino acid protein. This predicted protein contains the amino acid sequences (underlined in Fig. 1) of all three tryptic peptides (IWYL, FSNL, and NVXV) previously identified in bovine corneal 37A protein(5). The 20 most N-terminal amino acids have characteristics of a signal peptide(13), predicting a mature protein of 332 amino acids with a molecular mass of 38,047 Da. This size corresponds closely to the experimentally determined molecular mass of the 37A core protein(5).
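
As an illustration of how a mature-protein mass can be predicted from a deduced sequence, the sketch below removes a signal peptide of given length and sums average residue masses plus one water. It is a hedged example only: the function name `mature_protein_mass` and the toy sequence are hypothetical, the keratocan sequence itself is not reproduced here, and the paper does not state which mass table or software was used.

```python
# Average masses (Da) of amino acid residues, i.e. the amino acid minus water.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # one water is added for the free N and C termini


def mature_protein_mass(precursor: str, signal_length: int) -> float:
    """Mass of the unmodified polypeptide left after signal peptide removal."""
    mature = precursor.upper()[signal_length:]
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in mature) + WATER_MASS


# Length bookkeeping from the text: 352 residues minus a 20-residue signal peptide.
print(352 - 20)  # 332

# Toy precursor (hypothetical, NOT the keratocan sequence): 3-residue "signal" + 5 residues.
toy_precursor = "MKLGASWY"
print(round(mature_protein_mass(toy_precursor, 3), 2))
```

A mass computed this way covers the polypeptide only; glycosylation and other posttranslational modifications are not included.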
Figure 1: Complementary DNA and deduced amino acid sequence of clone 3710. The DNA sequence of the cDNA insert of clone 3710 is shown from its insertion site (EcoRI) to the beginning of the poly(A) tail. Translation of the longest open reading frame is shown below the DNA sequence with the amino acid residue numbers in parentheses at the right. The underlined regions are amino acid sequences identified previously (5) in tryptic fragments of the 37A protein. The amino acid predicted to be the N terminus after cleavage of the signal peptide is boxed.
Figure 2: Alignment of KSPG protein sequences. The deduced amino acid sequences of bovine 37A and lumican (6) are aligned, with gaps inserted to maximize amino acid identities. Leucine-rich repeats are underlined, and the six conserved cysteines are designated by (+). The three consensus sites for N-linked glycosylation shared between the two proteins are shown by inverted triangles. Two other such sites in the 37A sequence are designated by asterisks. Sites of potential tyrosine sulfation are boxed. The sequences are shown from the known or predicted N terminus of the mature protein (residues 17 for lumican and 21 for 37A)(21). Residues are numbered from the beginning of the deduced amino acid sequence.
Figure 3: Amino acid sequence relationships among bovine proteoglycan proteins. Bovine lumican, 37A, fibromodulin, decorin, and biglycan were aligned using a PAM 250 matrix(22), and the relatedness of the five sequences was determined using the unweighted pair group method with arithmetic mean(23). Similarity of amino acid sequences between any two proteins is inversely proportional to the line lengths connecting the two.
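
The unweighted pair group method with arithmetic mean (UPGMA) named in the legend to Fig. 3 repeatedly merges the two clusters with the smallest average pairwise distance. The sketch below is a generic illustration under stated assumptions: the function `upgma`, the leaf labels A to E, and every distance value are invented for the example and are not taken from the paper, which derived relatedness from PAM 250 alignments.

```python
from itertools import combinations


def upgma(leaf_dist):
    """Cluster leaves by UPGMA.

    leaf_dist maps frozenset({a, b}) to the distance between leaves a and b.
    Returns a nested tuple that records the order of merges.
    """
    leaves = sorted({name for pair in leaf_dist for name in pair})
    # Each cluster is (tree, members): a nested tuple and a flat tuple of leaf names.
    clusters = [(name, (name,)) for name in leaves]

    def average_distance(members_a, members_b):
        # UPGMA distance: mean over all leaf pairs spanning the two clusters.
        total = sum(leaf_dist[frozenset((a, b))] for a in members_a for b in members_b)
        return total / (len(members_a) * len(members_b))

    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest average distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_distance(clusters[ij[0]][1], clusters[ij[1]][1]),
        )
        tree_i, members_i = clusters[i]
        tree_j, members_j = clusters[j]
        merged = ((tree_i, tree_j), members_i + members_j)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]


# Invented distances for five placeholder sequences (illustration only).
EXAMPLE_DISTANCES = {
    frozenset(("A", "B")): 0.60,
    frozenset(("A", "C")): 0.65,
    frozenset(("B", "C")): 0.62,
    frozenset(("D", "E")): 0.45,
}
for x in "ABC":
    for y in "DE":
        EXAMPLE_DISTANCES[frozenset((x, y))] = 0.80

print(upgma(EXAMPLE_DISTANCES))  # (('D', 'E'), ('C', ('A', 'B')))
```

With these made-up distances the result is two subgroups, one of three leaves and one of two, the same kind of topology that Fig. 3 is described as showing for the five proteins.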
Figure 4: Detection of lumican and 37A transcripts. Northern blotting of 10 µg of total RNA purified from several bovine tissues was carried out using a labeled probe for the 37A mRNA as described under "Experimental Procedures" (A). After exposure, the blot was stripped of 37A probe, assessed for completeness of stripping, and reprobed with a probe for lumican (B) as described under "Experimental Procedures." Migration of RNA standards (in kilobases) is shown on the left.
Figure 5: Specificity of anti-37A antibody. Bovine corneal KSPG core proteins were separated into two fractions by DEAE-Sephacel chromatography as described previously(5). The fractions were then separated by SDS-polyacrylamide gel electrophoresis and subjected to immunoblotting with antibody against KSPG (lanes 1 and 2) or with antibody affinity-purified with a synthetic peptide from the deduced 37A sequence (lanes 3 and 4). Size of marker proteins (in kDa) is shown on the left.
Figure 6: Detection of 37A protein in bovine tissues. Extracts of bovine tissues in 6 M urea were treated with endo-β-galactosidase, then fractionated by SDS-polyacrylamide electrophoresis, and subjected to immunoblotting with the affinity-purified anti-37A antibody as described under "Experimental Procedures." Size of marker proteins (in kDa) is shown on the left. The major band in cornea was approximately 50 kDa.
Figure 7: Size of fully glycosylated 37A molecules. Crude tissue extracts were fractionated by selective alcohol precipitation, then analyzed by SDS-polyacrylamide gel electrophoresis, and transferred to nitrocellulose as described under "Experimental Procedures." The 37A antigens were unmasked by treatment of the membrane in 0.1 M Tris phosphate, pH 6.8, 1 mg/ml bovine serum albumin with endo-β-galactosidase, 0.00015 unit/ml, overnight at room temperature. The antigens were detected with anti-37A antibody as in Fig. 6. Size of marker proteins (in kDa) is given on the right.
Our previous studies have demonstrated that bovine corneal
KSPG is a mixture of at least three unique core proteins. Our current
data confirm that two of these three are structurally similar proteins
that arise from unique messenger RNAs. The lack of significant regions
of identity between lumican and 37A mRNAs indicates that they do not
arise as a result of alternate splicing. The most reasonable
interpretation of these data is that the 37A and lumican proteins are
products of different genes. This interpretation is confirmed by recent
studies that found lumican (15) and 37A genes to
map at different chromosomal loci in the mouse. We propose the name
"keratocan" for the 37A gene because only in the cornea
does the protein assume the form of a proteoglycan.
The keratocan gene appears to be the newest member of the family of proteoglycans containing leucine-rich repeat motifs. These proteins each contain an N-terminal hypervariable region, six highly conserved cysteines, and 11 leucine-rich repeats(6). There are now five bovine proteins in this group for which cDNA sequences have been determined. Amino acid sequence similarities suggest two subgroups within this family of proteins. Keratocan, fibromodulin, and lumican (all keratan sulfate proteoglycans) are in one group, and decorin and biglycan (dermatan sulfate proteoglycans) are in the other (Fig. 3). This grouping also appears to be reflected at the genomic level. The genes for decorin and biglycan contain 8 exons(16), whereas those for fibromodulin and lumican contain 3 exons(17, 18). Another feature that differentiates these groups is the presence of consensus sites for tyrosine sulfation. Keratocan, lumican, and fibromodulin each contain at least one tyrosine followed by an acidic amino acid (Asp or Glu) in the N-terminal (hypervariable) portion of the molecule. In the case of fibromodulin, these tyrosines have been shown to be sulfated(14). In keratocan, a single potentially sulfated tyrosine is found at amino acid residue 27 (Fig. 2). Similar tyrosine residues are absent in biglycan and decorin. The presence of potentially sulfated tyrosines in all three keratan sulfate-containing proteins and their absence in the dermatan sulfate proteoglycans suggest the possibility that sulfated tyrosine may provide a signal for posttranslational addition of keratan sulfate chains.
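
The criterion used above for a potential sulfation site, a tyrosine immediately followed by Asp or Glu, is simple enough to scan for programmatically. The following is a minimal sketch, assuming only that criterion; the function name `tyrosine_acidic_sites`, its `n_terminal_length` parameter, and the toy sequence are hypothetical, and the code is not a full tyrosine sulfation predictor.

```python
import re


def tyrosine_acidic_sites(sequence, n_terminal_length=None):
    """Return 1-based positions of tyrosines directly followed by Asp (D) or Glu (E).

    If n_terminal_length is given, only tyrosines within that many residues
    of the N terminus are reported.
    """
    # The lookahead keeps the match on the tyrosine itself.
    positions = [m.start() + 1 for m in re.finditer(r"Y(?=[DE])", sequence.upper())]
    if n_terminal_length is not None:
        positions = [p for p in positions if p <= n_terminal_length]
    return positions


# Toy sequence (hypothetical, not a proteoglycan sequence).
toy = "AYDGYEKYA"
print(tyrosine_acidic_sites(toy))                       # [2, 5]
print(tyrosine_acidic_sites(toy, n_terminal_length=3))  # [2]
```

Restricting the scan with `n_terminal_length` mirrors the observation that the candidate tyrosines lie in the N-terminal hypervariable region.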
This family of related proteoglycans currently has five
known members, but it is highly likely that more exist. A BLAST search
of GenBank sequences similar to keratocan identified an
unpublished sequence (accession number U29089) from a human cartilage
cDNA library that codes for a protein with an amino acid sequence very
similar to those for the three keratan sulfate-containing members of
the family (not shown). In addition, amino acid sequences from tryptic
fragments of the third corneal KSPG core protein (the 25-kDa protein)
indicate that it may also be a member of this group(5).
Lumican and keratocan share a characteristic not seen in other known proteoglycans. In the cornea, these proteins are core proteins of corneal keratan sulfate proteoglycan and bear long, highly sulfated keratan sulfate chains. In the other tissues in which they are found, however, these proteins occur as poorly sulfated or non-sulfated glycoproteins. Lumican from the artery contains oligomeric N-acetyllactosamine, essentially a non-sulfated form of keratan sulfate (19). Our data (Fig. 7) suggest that keratocan, like lumican, contains short, non-sulfated poly(N-acetyllactosamine) chains in tissues other than cornea. The different abundance and tissue localization of these two proteins suggest that, outside of the cornea, they may have different functions. In the cornea, however, both proteins have been recruited into the unique KSPG molecules thought to be essential for corneal transparency.
The specialized nature of the KSPG form of these two proteins may be
inferred from their pattern of glycosylation. Recent x-ray
crystallographic studies of ribonuclease inhibitor, another
leucine-rich repeat-containing protein, show that the repeats in this
protein form β-sheets in the hydrophobic core of the molecule,
whereas the intervening sequences form α-helices on the exterior of
the molecule(20). Lumican and keratocan have three conserved
consensus sites for N-glycosylation (Fig. 2). Each of
these occurs between leucine-rich repeats, therefore making these sites
candidates for the addition of keratan sulfate chains. Our previous
work showed that corneal lumican contains one keratan sulfate chain and
that keratocan contains three(5). This pattern of
post-translational addition of keratan sulfate indicates a high level
of selectivity in terms of both the individual proteins involved and of
choice of sites to which keratan sulfate is added. Such a level of
targeting suggests the presence in keratocytes of a unique
glycosylation system involved in biosynthesis of corneal KSPG.
Understanding the nature of this system and the signals that initiate
its action are important goals in understanding the role of KSPG in the
cornea and the biochemical mechanisms by which this role is
maintained.
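
Consensus sites for N-glycosylation such as those discussed above are conventionally defined as the sequon Asn-X-Ser/Thr, where X is any residue except proline. The sketch below locates such sequons in a protein sequence. It is an illustration only: the function name `n_glycosylation_sequons` and the toy sequence are hypothetical, and the presence of a sequon does not show that a site is glycosylated or carries keratan sulfate.

```python
import re


def n_glycosylation_sequons(sequence):
    """Return 1-based positions of Asn residues in N-X-S/T sequons (X is not Pro)."""
    # A zero-width lookahead lets overlapping sequons (e.g. "NNTS") all be found.
    return [m.start() + 1 for m in re.finditer(r"(?=N[^P][ST])", sequence.upper())]


# Toy sequence (hypothetical): NGS matches, NPT is excluded, NNT and NTS overlap.
toy = "ANGSANPTNNTS"
print(n_glycosylation_sequons(toy))  # [2, 9, 10]
```

Applied to the aligned lumican and keratocan sequences, a scan of this kind would list candidate sites that could then be compared with the positions of the leucine-rich repeats.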
Note Added in Proof: Since submission of this manuscript, a description of the protein U29089 was published (Bengtsson, E., Neame, P. J., Heinegard, D., and Sommarin, Y. (1995) J. Biol. Chem. 270, 25639-25644).
The nucleotide sequence(s) reported in this paper has been submitted to the GenBank(TM)/EMBL Data Bank with accession number(s) U48360.