(Received for publication, February 15, 1995; and in revised form, May 31, 1995)
A human lumican cDNA sequence was derived by polymerase chain reaction techniques from RNA obtained from intestine, placenta, and articular cartilage. A contiguous sequence of 1729 bases was obtained, corresponding to an observed message size of 1.8 kilobases (kb). The cDNA sequence consists of an 80-base pair (bp) 5′-untranslated region, a 1014-bp coding sequence, and a 618-bp 3′-untranslated region terminating in a 17-bp poly(A) tail. The deduced lumican protein sequence has 338 amino acids, including a putative 18-residue signal peptide. The human lumican gene spans about 7.5 kb of genomic DNA and is located on chromosome 12q22. The gene consists of three exons separated by introns of 2.2 and 3.5 kb. The shorter 5′-intron resides 21 bases upstream of the translation initiation codon, and the 3′-intron resides 152 bases upstream of the translation termination codon. The lumican message is expressed at high levels in adult articular chondrocytes but at low levels in the young juvenile. This age-related trend in message level is not, however, common to all tissues in which the lumican gene is expressed. Lumican is present in the extracellular matrix of human articular cartilage at all ages, although it is far more abundant in the adult. In adult cartilage, lumican exists predominantly as a glycoprotein lacking keratan sulfate, whereas in the juvenile the molecule is a proteoglycan.
Lumican belongs to the family of relatively small leucine-rich proteoglycans that are present in the extracellular matrix of many tissues. In addition to lumican, the family includes decorin, biglycan, and fibromodulin (1, 2), and each family member shares a common structure consisting of a central region of leucine-rich repeats flanked on either side by a disulfide-bonded domain. The central leucine-rich region possesses attachment sites for N-linked oligosaccharides, which in fibromodulin and lumican may be modified by sulfation of their polylactosamine units to yield keratan sulfate. Fibromodulin and lumican may therefore also be classed as keratan sulfate-proteoglycans (KS-PG). Such modification does not usually occur in decorin and biglycan, which retain unsulfated oligosaccharides. Decorin and biglycan do, however, possess attachment sites for chondroitin sulfate in the amino-terminal region of their core proteins. In many tissues the chondroitin sulfate is modified to dermatan sulfate by epimerization, so decorin and biglycan are classed as dermatan sulfate-proteoglycans (DS-PG).
The structure of the leucine-rich repeats places lumican, fibromodulin, decorin, and biglycan in the leucine-rich repeat (LRR) superfamily of proteins (3). Each member of the superfamily is characterized by multiple adjacent leucine-rich repeats, which may contain from 20 to 29 amino acid residues and be repeated up to 30 times. In the four proteoglycans, the leucine-rich repeats consist of 24 amino acid residues, the most common size among other superfamily members. Each of the proteoglycans also possesses 10 of the repeating units between the flanking disulfide-bonded domains. The presence of a common structural motif suggests that the four proteoglycans may share common functional properties, one of which may be interaction with fibrillar collagen (4, 5, 6). The proteoglycans are thought to influence the interaction of collagen fibrils with one another or with other matrix components (7), and as the binding site for each proteoglycan appears to be distinct (7, 8), it is likely that each family member fulfills a different role.
The complete amino acid sequences of bovine and chicken lumican have been deduced from cDNA clones (9, 10). The bovine and chicken cDNA clones encode proteins of 342 and 343 amino acids, respectively, and in both cases the first 18 amino acids are thought to represent signal peptides. In the bovine cornea, several distinct KS-PG isoforms have been isolated with deglycosylated core protein sizes of 37 and 25 kDa (11). The 37-kDa core proteins could be further subdivided on a charge basis into two components, designated 37A and 37B, owing to differences in KS content (12). Only the proteoglycan with the lowest KS content (37B) represents lumican (13). The other proteoglycans appear to have different protein structures (12) and represent the products of separate mRNAs (14). One of these proteoglycans may be fibromodulin, although corneal decorin has been shown to exist in a KS-PG form, bearing both chondroitin sulfate and keratan sulfate chains, in the adult chick (15). It is also apparent that under some conditions lumican can exist as a glycoprotein rather than a proteoglycan (16), and that in the cornea the conversion from nonsulfated polylactosamine chains to keratan sulfate chains is developmentally regulated (17).
While there is considerable information on corneal lumican, much less is known about the expression of this molecule in other connective tissues, particularly cartilage, which is known to be rich in decorin, biglycan, and fibromodulin (18). Furthermore, there is as yet no information on the organization of the lumican gene or its chromosomal localization. The aim of this work is to address these gaps in humans, for whom no lumican protein sequence has yet been reported.
Figure 1: The cDNA sequence of the human lumican gene. The figure describes the nucleotide sequence, with numbering appearing on the right. The positions of the translation initiation codon (ATG; underline), the translation termination codon (TAA; broken underline), and the polyadenylation signal (AATAAA; double underline) are indicated.
In addition, the cDNA fragments HAC.1 and HAC.2, spanning the expressed lumican sequence, were amplified from adult human chondrocyte cDNA. Primers representing bp 42-59 and bp 1091-1111 for HAC.1 and bp 768-785 and bp 1507-1526 for HAC.2 (Fig. 1) were chosen from the human lumican sequence derived from the placental and intestinal cDNA. Both amplifications were run for 40 cycles (94 °C for 45 s, 42 °C for 30 s, and 72 °C for 1 min per cycle for HAC.1; 94 °C for 45 s, 50 °C for 30 s, and 72 °C for 1.5 min per cycle for HAC.2). All amplified fragments described above were cloned into the pCRII vector using the TA cloning kit (Invitrogen) according to the manufacturer's instructions and subjected to sequence analysis.
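As a quick check on the primer layout above, the sketch below computes the expected product sizes for HAC.1 and HAC.2 from the stated primer coordinates, their overlap, and whether together they cover the open reading frame. It is a minimal illustration, assuming 1-based inclusive cDNA coordinates and, from the abstract, an ATG at base 81 (after the 80-bp 5′-untranslated region) with the stop codon at bases 1095-1097; the helper names are hypothetical.

```python
# Hypothetical helper: expected PCR product sizes from primer coordinates.
# Coordinates are 1-based and inclusive on the human lumican cDNA (Fig. 1).
PRIMER_PAIRS = {
    "HAC.1": ((42, 59), (1091, 1111)),
    "HAC.2": ((768, 785), (1507, 1526)),
}

# Assumed from the abstract: 80-bp 5'-UTR, so ATG starts at base 81, and a
# 1014-bp ORF (338 codons) followed by TAA at bases 1095-1097.
CODING = (81, 1097)


def amplicon(forward, reverse):
    """Return (start, end, length) of the product between a primer pair."""
    start, end = forward[0], reverse[1]
    return start, end, end - start + 1


def overlap(a, b):
    """Number of shared bases between two inclusive intervals (0 if none)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


spans = {name: amplicon(*pair) for name, pair in PRIMER_PAIRS.items()}
for name, (start, end, length) in spans.items():
    print(f"{name}: bp {start}-{end} ({length} bp)")

hac1, hac2 = spans["HAC.1"], spans["HAC.2"]
shared = overlap(hac1[:2], hac2[:2])
covers_orf = min(hac1[0], hac2[0]) <= CODING[0] and max(hac1[1], hac2[1]) >= CODING[1]
print(f"overlap: {shared} bp; together cover the ORF: {covers_orf}")
```

On these coordinates the products come out at 1070 and 759 bp, overlapping by 344 bp, so the two fragments jointly span bp 42-1526.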
To obtain data on intron size and splice junction sites, intron-containing fragments were amplified from genomic DNA using the Expand Long Template PCR system. Primer pairs representing bp 42-61 and bp 113-132, and bp 768-785 and bp 1091-1111, of the human lumican cDNA sequence (Fig. 1) were used with 250 or 500 ng of genomic DNA, as suggested by the manufacturer. After an initial denaturation for 2 min at 94 °C, amplifications were performed for 30 cycles (94 °C for 10 s, 42 °C for 30 s, and 68 °C for 15 min per cycle). A 20-µl aliquot of each reaction was analyzed on a 0.7% agarose gel. The major band from each reaction was purified from the gel using the Qiaquick gel extraction kit. Purified fragments were blunt-ended using the Klenow fragment of Escherichia coli DNA polymerase I and phosphorylated with bacteriophage T4 polynucleotide kinase (23). They were then cloned into SmaI-digested pUC18 and identified by sequence analysis.
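The intron sizes follow from simple subtraction: each genomic product contains the cDNA segment between the primers plus the intron. The sketch below assumes one intron per product and 1-based inclusive primer coordinates, and it takes the product sizes to be the 2.3- and 3.8-kb inserts of the genomic plasmid clones described for the in situ hybridization; these are gel estimates, so the results are approximate. The function name is hypothetical.

```python
def intron_size(genomic_product_bp, fwd_start, rev_end):
    """Estimate intron length (bp) from a genomic PCR product.

    Assumes exactly one intron lies between the primers, and that fwd_start
    and rev_end are 1-based cDNA coordinates of the outer primer ends.
    """
    cdna_span = rev_end - fwd_start + 1  # exonic bases included in the product
    return genomic_product_bp - cdna_span


# Product sizes assumed equal to the 2.3- and 3.8-kb genomic clone inserts.
first = intron_size(2300, 42, 132)     # primers bp 42-61 and 113-132
second = intron_size(3800, 768, 1111)  # primers bp 768-785 and 1091-1111
print(f"first intron ~{first / 1000:.1f} kb, second intron ~{second / 1000:.1f} kb")
```

Rounded to 0.1 kb, this reproduces the 2.2- and 3.5-kb intron sizes given in the abstract.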
To obtain the chromosome location, a panel of human/hamster somatic cell hybrids covering the entire human genome was PCR screened in two separate reactions. The first reaction used primers representing bp 301-320 and bp 459-478 (Fig. 1) in a 40-cycle reaction (94 °C for 45 s, 50 °C for 30 s, 72 °C for 30 s per cycle), while the second reaction used primers representing bp 768-785 and bp 850-867 (94 °C for 45 s, 42 °C for 30 s, 72 °C for 45 s per cycle). The products were analyzed on a 1.5% agarose gel.
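Assignment to a chromosome from a somatic cell hybrid panel rests on concordance: the gene should map to a human chromosome retained in every hybrid that gives the human-specific PCR product and absent from every hybrid that does not. The sketch below expresses that rule. The panel is entirely hypothetical (it is not the Fig. 4 data), and a real analysis would usually tolerate a few discordant hybrids rather than demand a perfect match.

```python
# HYPOTHETICAL panel: hybrid -> (human chromosomes retained, PCR product seen).
PANEL = {
    "hybrid_A": ({"3", "12", "X"}, True),
    "hybrid_B": ({"1", "5", "12"}, True),
    "hybrid_C": ({"2", "7", "12", "20"}, True),
    "hybrid_D": ({"1", "3", "7"}, False),
    "hybrid_E": ({"5", "20", "X"}, False),
}


def concordant_chromosomes(panel):
    """Chromosomes present in every PCR-positive hybrid and absent from every
    PCR-negative hybrid (strict concordance, no discordant hybrids allowed)."""
    positives = [chroms for chroms, positive in panel.values() if positive]
    negatives = [chroms for chroms, positive in panel.values() if not positive]
    if not positives:
        return set()
    candidates = set.intersection(*positives)  # new set: retained by all positives
    for chroms in negatives:
        candidates -= chroms                    # drop anything seen in a negative
    return candidates


print(concordant_chromosomes(PANEL))
```

With this toy panel only chromosome 12 survives both filters.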
Figure 2: Comparison of the deduced protein sequence of human lumican with those from other species. The complete amino acid sequences are given for human lumican (HLM), bovine lumican (BLM), and chick lumican (CLM). The sequences are aligned to indicate conserved cysteine residues (open boxes), asparagine residues capable of participating in N-linked glycosylation (shaded boxes), and leucine-rich regions (black boxes). The position for cleavage of the putative signal peptides is also indicated. The numbering for each sequence appears on the left, and amino acids are represented in single-letter code. The ditto sign indicates the same amino acid as is present in the human lumican sequence. The sequences for the chick and bovine lumican are derived from the published data of Blochberger et al. (9) and Funderburgh et al. (10), respectively.
The cDNA encodes a protein of 338 amino acids (Fig. 2), of which the first 18 represent a putative signal peptide (33, 34). The deduced amino acid sequence was verified by additional nucleotide sequencing of PCR products generated from adult human chondrocyte RNA. The sequence contains six cysteine residues, at positions 37, 41, 43, 53, 295, and 328 of the primary translation product, that could be involved in disulfide bond formation, with 10 leucine-rich regions separating the first four from the last two cysteine residues. This central leucine-rich repeat region contains four asparagine residues, at positions 88, 127, 160, and 252, that form part of the consensus sequence NX(S/T) required for substitution by N-linked oligosaccharides or keratan sulfate. These features are conserved in the published sequences of bovine (10) and chick (9) lumican.
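The NX(S/T) consensus is easy to scan for computationally. The sketch below reports 1-based positions, counted from the initiating methionine as in the text, of asparagines that begin such a motif. It adds the widely used refinement that X is not proline, which the text does not state, and it runs on a hypothetical peptide rather than the lumican sequence, which is not reproduced here. The mature length follows from removing the putative 18-residue signal peptide.

```python
import re

# Lookahead makes every N-X-S/T start position visible, even when motifs overlap.
# X != P is a common refinement and an assumption here; the text states only NX(S/T).
SEQUON = re.compile(r"(?=N[^P][ST])")
SIGNAL_PEPTIDE = 18  # putative signal peptide length from the text


def sequon_positions(protein):
    """Return 1-based positions of Asn residues that begin an N-X-S/T motif."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


mature_length = 338 - SIGNAL_PEPTIDE  # residues left after signal peptide removal
toy = "MKTNLSAANPTQNGT"               # hypothetical peptide, not lumican
print(mature_length, sequon_positions(toy))
```

This prints `320 [4, 13]`: the asparagine at position 9 is skipped because it is followed by proline.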
Figure 3: Schematic representation of the organization of the human lumican gene. The lumican cDNA is depicted as a line above that for the corresponding genomic DNA, with the relative sites of intron insertion in the genomic DNA indicated by connecting dashed lines. The genomic DNA depicts exons as open bars, with the nucleotide sequence surrounding each splice site indicated below. The nucleotide sequences representing introns are given in lowercase. The genomic DNA sequence is drawn to scale, with the length of each intron indicated above. The positions of the translation initiation codon (ATG), the termination codon (TAA), and the polyadenylation signal (AATAAA) are depicted above the cDNA. The numbering of bases in the nucleotide sequences refers to the positions in the cDNA sequence (Fig. 1).
The location of the putative introns was confirmed by nucleotide sequence analysis, with both showing classical GT/AG sequences flanking the intron splice junctions (35). The 5′-intron splice site occurs between nucleotides 59 and 60 of the cDNA sequence, a site that lies within the 5′-untranslated region 21 bases upstream of the translation initiation codon (Fig. 3). The 3′-intron splice site occurs between nucleotides 942 and 943 and resides within the coding sequence, 152 nucleotides upstream of the translation termination codon. This junction indicates a phase 1 intron, which splits the codon for lysine (AAG) at amino acid residue 288 of the deduced protein sequence between its first and second bases. These splice junctions suggest that the human lumican gene consists of three exons spanning about 7.5 kb of genomic DNA, although additional introns may exist at the extreme 5′- and 3′-ends of the gene. The first exon encodes most of the 5′-untranslated region, the second exon encodes most of the coding region, and the third exon encodes the remainder of the coding region and the 3′-untranslated region.
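The splice-site positions above can be checked against the reading frame with a little arithmetic. The sketch assumes, from the abstract, that the ATG begins at base 81 (after the 80-bp 5′-untranslated region) and that the 1014-bp open reading frame excludes the stop codon, so TAA begins at base 1095. Phase is taken as the number of bases of the interrupted codon that lie 5′ of the intron; the function name is hypothetical.

```python
ATG_START = 81                     # assumed: first base after the 80-bp 5'-UTR
STOP_START = ATG_START + 338 * 3   # 1095, assuming the 1014-bp ORF excludes TAA


def intron_phase(splice_after, atg_start=ATG_START):
    """Phase of an intron lying between cDNA bases splice_after and
    splice_after + 1, plus the number of the codon containing the first coding
    base 3' of the intron (the split codon when the phase is 1 or 2)."""
    coding_bases_before = splice_after - atg_start + 1
    return coding_bases_before % 3, coding_bases_before // 3 + 1


print("5'-intron to ATG:", ATG_START - (59 + 1), "bases")    # bases 60-80
print("3'-intron to TAA:", STOP_START - (942 + 1), "bases")  # bases 943-1094
print("3'-intron (phase, codon):", intron_phase(942))
```

Run as written, this gives 21 and 152 bases and `(1, 288)`, agreeing with the distances, phase, and residue number stated in the text.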
Figure 4: Analysis of DNA from human-hamster somatic cell hybrids. DNA was analyzed by PCR using oligonucleotide primers that give a specific reaction product from the human lumican gene. Analysis was carried out on DNA from 12 hybrids, with a positive reaction being obtained in three cell lines (light shading). The human chromosomes contained in each cell line are indicated (dot), as is the chromosome common to the cell lines generating a positive response (dark shading).
To determine the precise location of the gene on chromosome 12, fluorescence in situ hybridization analysis was performed using the 2.3- and 3.8-kb genomic plasmid clones spanning the first and second introns of the gene, respectively. Two independent experiments were performed, and over 100 metaphase cells were evaluated. Signals were clearly seen on both chromatids at band 12q22 of at least one chromosome 12 homolog in 25% of cells with the 2.3-kb plasmid clone (Fig. 5) and in 5% of cells with the 3.8-kb plasmid clone (data not shown). No other chromosomal site showed consistent signals in more than 1% of cells.
Figure 5: Fluorescence in situ hybridization mapping of the human lumican gene. A human chromosome preparation was hybridized with the biotinylated plasmid probe, which contains the 2.3-kb DNA insert spanning the first intron of the human lumican gene. The fluorescein isothiocyanate signals were clearly visible at the chromomycin and distamycin reverse-banded chromosomal region 12q22 (A). A human chromosome 12 ideogram showing the location of the human lumican gene at 12q22 is depicted next to the micrograph (B).
Figure 6: Northern blot analysis of lumican, fibromodulin, decorin, and biglycan messages in articular chondrocytes. Total RNA preparations were obtained from chondrocytes isolated from a 4-week-old neonate (N) and a 66-year-old adult (A), and 10 µg of each was used for analysis. The blot was analyzed with cDNA probes for the four proteoglycan messages and for glyceraldehyde-3-phosphate dehydrogenase. The positions of the 18 S and 28 S ribosomal RNAs are indicated.
The increase in lumican message expression between juvenile and adult chondrocytes is not unique to the lumican gene but is also apparent for the fibromodulin and decorin genes. While quantitation from Northern blots can be misleading owing to variation in probe specific activity and the presence of multiple message sizes, the current data strongly suggest that in adult articular chondrocytes the lumican message is much more abundant than the messages of the three other LRR-proteoglycan genes (Fig. 6). Biglycan is the only family member whose expression in articular cartilage is lower in the adult than in the juvenile.
Expression of the lumican gene is not confined to articular cartilage but is widespread in different tissues (Fig. 7). In the adult, expression was high in heart, placenta, skeletal muscle, kidney, and pancreas but low in brain, lung, and liver. Expression in kidney showed the same age trend as in articular cartilage, with low expression in fetal RNA and high expression in the adult. In contrast, the lung showed the opposite trend, with higher message expression in the fetus than in the adult. In general, the tissue distribution of the lumican message in the adult resembled that of the fibromodulin and decorin messages but was quite distinct from that of the biglycan message, which was expressed at high levels in lung and liver. In the fetus, the relative expression of each LRR-proteoglycan message differed among tissues.
Figure 7: Northern blot analysis of lumican, fibromodulin, decorin, and biglycan messages in various tissues. Total RNA preparations were analyzed from heart (lane 1), brain (lane 2), placenta (lane 3), lung (lane 4), liver (lane 5), skeletal muscle (lane 6), kidney (lane 7), and pancreas (lane 8), with tissue obtained from either adults or fetuses. The blots were probed for the four proteoglycan messages plus that for glyceraldehyde-3-phosphate dehydrogenase. The positions of reference messages of defined size are indicated (4.4, 2.4, and 1.35 kb).
Figure 8: Western blot analysis of lumican in extracts of articular cartilage. Dialyzed cartilage extracts were analyzed by SDS-PAGE without any prior purification. The fractionated proteins were transferred to nitrocellulose by electroblotting and immunolocalized. Extracts were from a 6-week-old neonate (N), a 3-year-old juvenile (J), a 37-year-old adult (A), and a 64-year-old adult (A), and each sample represents the material extracted from 0.5 mg of tissue. Extracts were analyzed without modification (A) or following treatment with keratanase II (B) or endo-β-galactosidase (C). The positions of prestained molecular weight markers are indicated.
The molecular structure of human cartilage lumican was further studied by investigating the effect of keratanase II and endo-β-galactosidase, which are capable of degrading sulfated and nonsulfated polylactosamine chains, respectively. Both enzymes reduced the size heterogeneity of the cartilage lumican, indicating substitution by keratan sulfate. However, the greater effect was produced by endo-β-galactosidase, which yielded a product of the same size from all ages studied. This product had an average size of about 57 kDa and was equivalent in size to the major form of lumican detected in the adult cartilage extracts prior to enzyme treatment. This suggests that in adult cartilage much of the lumican exists in a form devoid of keratan sulfate or polylactosamine chains. These molecules are presumably still substituted with oligosaccharides, since their size is greater than that expected for a mature core protein of 320 amino acids (Fig. 2). The size difference suggests that all four potential N-linked oligosaccharide attachment sites might be occupied, as occurs in fibromodulin (36). In the mature adult there is some evidence for fragmentation of the lumican core protein, as small amounts of immunoreactive material with sizes ranging between 20 and 50 kDa are detected.
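The argument from size can be made explicit with a rough calculation. The sketch assumes a mean residue mass of about 110 Da, a common rule of thumb rather than a value from this work, and treats the 57-kDa SDS-PAGE estimate as if it were a true mass, which is only approximate for glycoproteins. Under those assumptions the excess over the predicted core protein is roughly 22 kDa, more than a single oligosaccharide would normally add, which is consistent with, but does not prove, occupancy of several sites.

```python
# Rough mass bookkeeping for the endo-beta-galactosidase product (approximate).
AVG_RESIDUE_DA = 110          # rule-of-thumb mean residue mass (assumption)
PRECURSOR_RESIDUES = 338      # primary translation product
SIGNAL_PEPTIDE = 18           # putative signal peptide
OBSERVED_KDA = 57.0           # apparent size on SDS-PAGE

mature_residues = PRECURSOR_RESIDUES - SIGNAL_PEPTIDE   # 320
core_kda = mature_residues * AVG_RESIDUE_DA / 1000      # predicted core protein
excess_kda = OBSERVED_KDA - core_kda                    # attributed to glycosylation
print(f"core ~{core_kda:.1f} kDa, excess ~{excess_kda:.1f} kDa")
```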
The human lumican message contains an open reading frame of 1014 bases giving rise to a deduced protein sequence of 338 amino acids. This is slightly shorter than the corresponding primary translation products of the bovine and chick lumican messages, which give rise to proteins of 342 and 343 amino acids, respectively (9, 10). In all cases, signal peptides representing the first 18 amino acids have been predicted, and at least in the bovine the subsequent amino acid has been shown to represent the start of the mature protein isolated from cornea (10). The coding region of the human lumican message shows 86 and 84% identity at the nucleotide and amino acid levels, respectively, with the equivalent bovine sequences, but only 65 and 67% identity at the nucleotide and amino acid levels, respectively, with the equivalent chick sequences. The 5′- and 3′-untranslated regions of the lumican messages of the three species are also of similar size, giving rise to single transcripts of between 1.8 and 2.0 kb on Northern blots.
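Identity figures such as those above depend on how the sequences are aligned and on which positions are counted, and the text does not state the convention used. The sketch below shows one common convention for already aligned sequences: identical positions divided by the columns in which neither sequence has a gap. The sequences are hypothetical toy strings, not lumican data.

```python
def percent_identity(a, b, gap="-"):
    """Percent identity of two pre-aligned, equal-length sequences.

    Identical columns are divided by the number of columns in which neither
    sequence has a gap; other conventions give different numbers.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not columns:
        return 0.0
    identical = sum(x == y for x, y in columns)
    return 100.0 * identical / len(columns)


# Hypothetical aligned fragments (not real lumican sequence).
print(f"{percent_identity('MKLLA-VS', 'MKLIAQVT'):.1f}%")
```

Here 5 of the 7 gap-free columns match, giving 71.4%; dividing by all 8 columns instead would give 62.5%, which is why the convention matters when comparing published figures.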
The coding sequence of the human lumican message is shorter than those of the three other human LRR-proteoglycan messages, which encode proteins of 376, 368, and 359 amino acids for fibromodulin, biglycan, and decorin, respectively (25, 27). This difference lies mainly in the number of amino acids preceding the N-terminal conserved disulfide-bonded domain. The coding region of the lumican message shows 53 and 47% identity at the nucleotide and amino acid levels, respectively, with the equivalent region of the fibromodulin message but lower identity with the decorin and biglycan messages, where identity at the amino acid level decreases to 36% in both cases. The human lumican message is similar in size to that of decorin but smaller than those of biglycan and fibromodulin, which are about 2.6 and 3.0 kb, respectively (27, 37), mainly because of variation in the length of the 3′-untranslated regions.
The different members of the LRR-proteoglycan family show considerable conservation of amino acid sequence with respect to the presence of 10 leucine-rich repeats flanked by cysteine-rich domains. This homology is greatest in the KS-PG members of the family, which also show conservation of four asparagine residues that can act as potential sites for N-linked oligosaccharide substitution in the leucine-rich repeat region. In the case of bovine fibromodulin (38), it has been shown that each of these sites can also be occupied by an N-linked keratan sulfate chain (36). One would therefore predict that all KS-PG members of this family may be substituted with keratan sulfate at the equivalent sites, although not all sites need be occupied on a given molecule.
The greater homology of lumican with fibromodulin than with decorin or biglycan at both the protein and glycosaminoglycan levels also extends to gene organization. Both the human lumican and fibromodulin genes are composed of three exons, with the first intron just preceding the translation initiation site and the second intron just preceding the termination codon (25). In the human fibromodulin gene, the first intron is only 4 bases upstream of the initiation codon, whereas this separation increases to 21 bases in the lumican gene. The intron itself is also smaller in the fibromodulin gene than in the lumican gene, at about 1.0 and 2.2 kb, respectively. The second intron resides 150 bases upstream of the termination codon in the human fibromodulin gene, whereas the separation is 152 bases in the lumican gene. This two-base difference results in fibromodulin having a phase 0 intron that separates two distinct codons, whereas lumican has a phase 1 intron that divides a single codon. The second intron in fibromodulin is larger than that in lumican, at about 5 and 3.5 kb, respectively. The differences in intron size and a longer exon 3 encoding the 3′-untranslated region make the entire fibromodulin gene about 1 kb longer than the lumican gene.
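Why a two-base shift changes the intron phase can be seen by counting back from the stop codon, which always begins a codon. The sketch assumes an uninterrupted reading frame between the intron and the stop codon and that the quoted distances count the coding bases between the intron and the first base of the stop codon; the function name is hypothetical.

```python
def phase_from_stop_distance(coding_bases_before_stop):
    """Intron phase from the number of coding bases between the intron and the
    first base of the stop codon (assumes no other intron in between)."""
    # Codon boundaries fall every 3 bases counting back from the stop codon.
    # A remainder r means r bases of the split codon lie 3' of the intron,
    # so 3 - r bases lie 5' of it, which is the phase.
    r = coding_bases_before_stop % 3
    return 0 if r == 0 else 3 - r


for gene, distance in [("fibromodulin", 150), ("lumican", 152)]:
    print(f"{gene}: {distance} bases -> phase {phase_from_stop_distance(distance)}")
```

The 150-base distance is a whole number of codons (phase 0), whereas 152 bases leaves two bases of a split codon downstream of the intron, so one base lies upstream (phase 1), matching the comparison above.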
The human biglycan gene is similar in size to the fibromodulin and lumican genes, but it is quite different in organization, being composed of eight exons (39). Here the majority of the coding region is composed of six exons, in contrast to the single exon used in fibromodulin and lumican. The human decorin gene also consists of eight exons and has an organization similar to that of the biglycan gene (40, 41). It is, however, much larger than the genes of the other family members, spanning at least 38 kb of genomic DNA (40).
Although the lumican and decorin genes show the greatest differences in genomic organization, they appear to reside quite close to one another on human chromosome 12. In this work the lumican gene has been shown to reside on chromosome 12q22, whereas others have shown that the human decorin gene resides between regions 12q21.3 (41) and 12q23 (40). The human biglycan and fibromodulin genes are on distinct chromosomes, with the biglycan gene on chromosome Xq28 (42) and the fibromodulin gene on chromosome 1q32 (36). The region of chromosome 12 encompassing the lumican and decorin genes is of interest, since it has been shown to be the locus for Holt-Oram syndrome (43). This is an autosomal dominant condition that causes skeletal abnormalities, particularly in the upper limbs, and cardiac abnormalities, which led to its more descriptive name of heart-hand syndrome. The cause of this disorder is unknown, but lumican can now be added as a candidate gene.
In articular cartilage, lumican message expression is higher in the adult than in the young juvenile, a trend also exhibited by the messages for fibromodulin and decorin but not by that for biglycan, which shows the opposite trend with age. This trend may be related to the different roles played by the proteoglycans, since, unlike the other family members, biglycan localizes to the pericellular matrix rather than the more remote matrix rich in collagen fibrils (44). It is also apparent that lumican, fibromodulin, and decorin are not expressed in a similar age-related manner by all tissues, and this probably reflects differences in the functional properties of the proteoglycans and the functional needs of the tissues.
The higher level of lumican message expression in adult chondrocytes is mirrored by a higher level of lumican in the adult cartilage matrix. However, the matrix form of cartilage lumican differs distinctly between adult and juvenile tissues. In juvenile cartilage, all the lumican exists as a keratan sulfate proteoglycan substituted with sulfated polylactosamine chains, whereas in the adult most of the lumican exists as a glycoprotein devoid of polylactosamine chains. Such glycoprotein forms of lumican have previously been described in early embryonic cornea, prior to later substitution with keratan sulfate (17), but not in mature connective tissues. The predominance of the glycoprotein form in adult cartilage is unlikely to result from glycosidase action within the extracellular matrix, since the keratan sulfate chains of aggrecan are present and are longer in the adult than in the juvenile (45). One therefore presumes that with age human articular chondrocytes switch from synthesizing a proteoglycan form of lumican to a glycoprotein form. Neither the reason for this switch nor its functional effect on the tissue is obvious. It is interesting to note, however, that glycoprotein forms of fibromodulin, the other KS-PG member of the LRR-proteoglycan family, have also been shown to exist in mature bovine cartilage (46).
The nucleotide sequence reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number U18728.