(Received for publication, June 13, 1994; and in revised form, October 13, 1994)
We isolated and sequenced genomic and cDNA clones encoding the
complete amino-terminal portion and the 5'-untranslated region of mouse
pro-α2(XI) collagen mRNA. Fourteen exons encoded the amino-terminal
propeptide, which was divided into three consecutive domains (a long
globular domain, an amino-terminal triple helical domain, and a
telopeptide domain). The long globular domain was further divided into
an upstream basic subdomain and a downstream highly acidic subdomain,
as is the case for the amino-terminal propeptides of pro-α1(V) and
pro-α1(XI) collagens. We also demonstrated that the primary
transcript undergoes complex alternative splicing. Three consecutive
exons (exons 6, 7, and 8) encoding most of the acidic subdomain showed
alternative splicing, which dramatically affected the structure of the
amino-terminal propeptide of pro-α2(XI) collagen. Using the reverse
transcription-polymerase chain reaction, we analyzed the expression of
these exons in various tissues and in developing limb buds of mice. The
pro-α2(XI) transcripts were abundant in cartilage, but most of them
lacked the 3-exon sequences encoding the acidic domain. Most other
tissues also contained mRNAs that corresponded to longer splice
variants, including exons 6-8. The differential expression of
specific domains of pro-α2(XI) collagen may be important in
modulating interactions between various components of the extracellular
matrix and/or may influence heterotypic collagen assembly.
The fibrillar component of hyaline cartilage consists of several
types of collagen. Type II collagen is the major component of the
fibrils, whereas a quantitatively minor type IX collagen is associated
with their surfaces (1, 2, 3). Type XI
collagen is another component of cartilage fibrils that seems to be
localized in the interior of the fibrils (4, 5). The
type XI collagen molecule is composed of three distinct polypeptide
subunits: α1(XI), α2(XI), and α3(XI) (6). The
α3(XI) chain is believed to be a post-translational variation
product of the α1(II) collagen gene (7), whereas the
α1(XI) and α2(XI) chains are distinct gene products that are
closely related to the α1(V) chain (8, 9, 10). Type XI collagen is
predominantly found in cartilage. However, transcripts of the
α1(XI) chain have also been detected in a variety of
non-cartilaginous cells and tissues (11, 12, 13). In addition, the
α1(XI) chain has been detected in bone in association with the
α1(V) and α2(V) chains (14). Furthermore, the fibrils
in the vitreous humor are assembled from molecules containing the
α1(XI) and α2(V) chains (15). It has therefore been
suggested that type V and XI collagens are not separate collagen types
but are part of a larger collagen family (15).
The functions of type V/XI collagen are still obscure. However, emerging evidence
suggests that type V collagen forms heterotypic fibrils with the more
abundant type I collagen and that it may influence fibrillogenesis by
controlling the lateral growth of the fibrils through the coassembly
process (16, 17). This function of type V collagen is
presumably based on the particularly slow and/or limited processing of
its amino-terminal propeptide (18, 19) (in this paper,
the entire amino-terminal portion between the signal peptide and the
start of the major triple helix is referred to as the N-propeptide).
Similarly, type XI collagen coassembles with type II
collagen and may regulate the diameter of cartilage collagen
fibrils (4, 5). The N-propeptide domain of type XI
collagen seems to be at least partly retained after proteolytic
processing (20, 21).
In the present study, we
extended our cloning experiments on pro-α2(XI) collagen. This study
investigated the genomic structure coding for the N-propeptide and most of
the major triple-helical domain of the mouse pro-α2(XI) chain. In
addition, we obtained evidence that the N-propeptide domain of
pro-α2(XI) collagen is differentially expressed because of
alternative RNA splicing. The differentially expressed domain is highly
acidic and is encoded by three exons that are expressed in various
combinations, potentially increasing the functional versatility of type
XI collagen.
Figure 2:
Schematic representation of the α2(XI)
N-propeptide and the alignment of two cDNA clones with exon-intron
organization. The two cDNAs have the same sequence at their 5' and 3'
ends. pRAC2-28 contains a 321-bp deletion relative to
pRAC1-15. At the top is the domain structure of the
α2(XI) N-propeptide encoded by pRAC1-15. SP, BS, AS, TH,
and TP indicate the signal peptide, basic subdomain, acidic
subdomain, amino-terminal triple helical domain, and amino-terminal
telopeptide, respectively. Vertical arrows indicate the
position of cysteinyl residues in the basic subdomain. The locations of
the primers used to obtain cDNAs are indicated by horizontal
half-arrows. The bottom part of the figure shows the
location of exons 1-14 in the gene. As discussed under
"Results," exons 6-8 (hatched boxes) are
alternatively spliced exons encoding an acidic subdomain.
Figure 3:
Combined nucleotide sequences of the mouse
pro-α2(XI) cDNAs, pRAC2-28 and pRAC1-15, and of the 5'
portion of α2(XI) genomic clones as well as the amino acid sequence
of the conceptual translation product. Capital letters indicate the
coding sequences, whereas lowercase letters signify noncoding sequences.
The numbering of nucleotides and
amino acid residues begins with the start of the putative signal
sequence. The third and fourth rows show human and bovine amino acid
sequences (32, 47). Identical amino acid residues are
indicated by short lines. Dots indicate a stretch of the human
sequence which is not available. Asterisks indicate sequences
absent in the human cDNA. Vertical lines mark the beginning
and end of the human and bovine sequences. The positions of exon-intron
splice junctions are indicated by triangles above the
nucleotide sequence. The open arrow represents the putative
signal peptidase cleavage site. The amino-terminal triple helical
domain and part of the major triple helical domain are shown by boxed
sequences. The alternatively spliced domain encoded by
exons 6-8 is shown by a dashed box. Putative
N-proteinase cleavage sites in the mouse sequence are underlined.
Imperfections in the Gly-Xaa-Yaa triplet structure
are underlined by double lines. The cysteine residues
are indicated by vertical arrows, and the putative tyrosine
sulfation sites are circled.
Figure 1:
Genomic clones coding for mouse
pro-α2(XI) collagen and the partial exon-intron structure of the
gene. The positions of the genomic clones are shown at the top. B and S
are restriction sites for BamHI and SalI, respectively. The locations of
the identified exons (vertical lines) in the gene are shown below the restriction
map. Dashed lines indicate the region which has not been
sequenced. Exons coding for the major triple helical domain are bracketed.
Since several minor
5'-RACE products were present in addition to the major cDNA clone,
pRAC2-28, some of them were also subcloned and sequenced.
Analysis of the longest cDNA clone, pRAC1-15, showed that it had
exactly the same 5' and 3' sequences as pRAC2-28 (Fig. 2).
However, pRAC1-15 also contained a region of 321 bp which was not
present in pRAC2-28. These 321 nucleotides (nt 799-1119 in Fig. 3)
found in pRAC1-15 did not change the reading frame
and coded for an additional 107 amino acid residues. This additional
sequence in the N-propeptide contained multiple tyrosine residues and
was highly acidic, with a theoretical pI value of 3.1. Among the 13
tyrosine residues found in this region, at least 4 were embedded within
sequences which fulfilled the consensus features for a tyrosine
sulfation site (30). The inclusion of this domain made the
configuration of the pro-α2(XI) N-propeptide quite similar to that
of pro-α1(XI) or pro-α1(V) (10, 11, 31). Examination of
the nucleotide-derived structure indicated that the longer form of the
mouse pro-α2(XI) N-propeptide was divided into three consecutive
domains: a long globular domain, an interrupted collagenous domain
(amino-terminal triple helical domain), and a short nonhelical segment
(amino-terminal telopeptide). In addition, as noted for the
pro-α1(XI) and pro-α1(V) chains, the globular domain of the
pro-α2(XI) N-propeptide was divided into two subdomains:
an upstream basic subdomain (theoretical pI = 10.2) containing 4
cysteine residues and a downstream highly acidic subdomain rich in
tyrosine (Figs. 2 and 3). These results indicated that there are
at least two distinct populations of mouse pro-α2(XI) collagen
mRNA. The major transcript encodes a 352-amino acid-long N-propeptide,
whereas the minor transcript encodes a 459-amino acid-long N-propeptide
containing a highly acidic subdomain. Comparison of the mouse
pro-α2(XI) N-propeptide sequence with that of humans (32) revealed
that the human sequence contained a similar region, coding for 21
amino acids, located within the acidic subdomain encoded
by pRAC1-15 (Fig. 3) and absent from pRAC2-28.
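The length relationships quoted above can be checked with simple arithmetic. The following is only a minimal sketch: the numbers are those stated in the text, while the function and variable names are illustrative and not part of the original work.

```python
# Values quoted in the text; no new data are introduced here.
INSERT_START_NT = 799           # first nucleotide of the extra region (Fig. 3 numbering)
INSERT_END_NT = 1119            # last nucleotide of the extra region
SHORT_N_PROPEPTIDE_AA = 352     # N-propeptide length encoded by the major transcript


def insert_length_nt(start, end):
    """Length of an inclusive nucleotide range."""
    return end - start + 1


def added_residues(length_nt):
    """Residues added by an in-frame insert; fails if the frame would shift."""
    if length_nt % 3 != 0:
        raise ValueError("insert length is not a multiple of 3 (frameshift)")
    return length_nt // 3


length_nt = insert_length_nt(INSERT_START_NT, INSERT_END_NT)
extra_aa = added_residues(length_nt)
long_n_propeptide_aa = SHORT_N_PROPEPTIDE_AA + extra_aa

print(length_nt, extra_aa, long_n_propeptide_aa)  # 321 107 459
```

The 321-nt region is a multiple of three, which is consistent with the statement that the reading frame is unchanged, and 352 + 107 residues gives the 459-residue longer form.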
Mouse pro-α2(XI) collagen contains 11 putative N-proteinase
cleavage sites (Ala-Gln or Pro-Gln), and some of them have no
counterparts in the human or bovine sequences (Fig. 3). Although
Ala-Ala-Gln at positions 468-470 most closely resembled the
conserved sequences of fibrillar procollagen
chains (11, 33), it lacked an associated phenylalanine
at position -3, which is suggested to be critical for the action
of N-proteinase (33).
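A scan for putative Ala-Gln or Pro-Gln dipeptides of the kind described above could be sketched as follows. This is an illustration only: the peptide string is hypothetical (it is not the mouse sequence of Fig. 3), the function name is an assumption, and a real analysis would also need the sequence context discussed in the text.

```python
def find_dipeptide_sites(sequence, motifs=("AQ", "PQ")):
    """Return (1-based position, motif) for each Ala-Gln or Pro-Gln dipeptide.

    Uses the one-letter amino acid code (A = Ala, P = Pro, Q = Gln).
    """
    sites = []
    for i in range(len(sequence) - 1):
        pair = sequence[i:i + 2]
        if pair in motifs:
            sites.append((i + 1, pair))
    return sites


# Hypothetical toy peptide, used only to show the scan; not taken from the paper.
toy_peptide = "MGPQLLAQDEPQ"
sites = find_dipeptide_sites(toy_peptide)
print(sites)  # [(3, 'PQ'), (7, 'AQ'), (11, 'PQ')]
```

Such a scan only lists candidate dipeptides; whether any candidate is actually cleaved depends on features that a simple string match does not capture.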
To detect the
expression of each of the alternatively spliced exons, Northern blot
analysis was performed. As shown in Fig. 4, the insert of the
cDNA clone (pRAC2-28) hybridized to an RNA band migrating at approximately
6.0-6.4 kb. Oligonucleotide probes specific for exons 6 and 8
also hybridized to bands that were almost identical to the band
obtained using the cDNA probe. The probe specific for exon 7 did not show a
clear positive signal under these hybridization and washing conditions.
These results suggest that at least exons 6 and 8 were present in part
of the mouse pro-α2(XI) transcripts. However, the size difference
that was predicted to be contributed by exons 6-8 could not be
resolved into separate bands by this agarose gel electrophoresis
method.
Figure 4:
Northern blot hybridization of
pro-α2(XI) cDNA and exon-specific probes. The probes used were the
random primer-labeled insert of clone pRAC2-28 and 3' end-labeled
oligonucleotides (see "Experimental Procedures" and panel A of Fig. 5
for the positions of the exon-specific probes).
Each lane contained approximately 3 µg of poly(A)
RNA isolated from whole skeletal tissue. The positions of the RNA
size markers are indicated on the left in kilobases.
Figure 5:
RT-PCR analysis of alternative splicing of
exons 6-8 of the pro-α2(XI) gene. First-strand cDNA prepared
from the indicated tissues was subjected to PCR using primers A and
B (A). The amplification products were separated on 2%
agarose gels and stained with ethidium bromide (B). Control
PCR products of glyceraldehyde-3-phosphate dehydrogenase (G3PDH) are
shown below B. C, Southern blots of the
PCR products hybridized with oligonucleotide probes (see
"Experimental Procedures"). Probes specific for exons 5/9,
6, 7, and 8 were used to identify the combinations of alternatively spliced
exons. The predicted sizes of the PCR products with various exon
combinations are shown on the left of the panel. Each
designation represents the splice variant containing the indicated
exons. Possible splice variants that may represent low level aberrant
splicing are indicated in brackets. p.c., postcoitum.
Proof of the identity of some of the PCR products (E59, E569, E589, E5689, and E56789) was obtained by subcloning and DNA sequencing of the amplification products (data not shown), and the exon-intron boundaries were further confirmed by comparison with the gene sequences (Fig. 6). The sequences at the 5' and 3' splice sites conformed well with the general splice consensus sequences (34, 35). An auxiliary splice site, located 6 nt downstream from the 5' end of exon 8, was also used in some of the subclones from E589 (Fig. 6). Between 15 and 40 nt upstream from the 3' cleavage site of each intron, putative lariat branch point sequences were found (Fig. 6) (see Refs. 34 and 35 for review). The pattern of divergence from the consensus sequence, as such, did not seem to correlate with alternative splicing of exons 6-8.
Figure 6: Nucleotide sequences at the alternatively spliced exon-intron boundaries. Exon sequences are indicated by capital letters and intron sequences by lowercase letters. The consensus sequences of the 5' and 3' splice sites and branch point (34, 35) are shown at the top. The putative lariat branch point sequences located 15-40 nucleotides upstream from the 3' cleavage sites of the introns are underlined. The six nucleotides at the 5' end of exon 8 were spliced out in some subclones of the E589 variant (denoted by italics). r, purine; y, pyrimidine; n, any base.
We isolated mouse genomic clones containing the entire col11a-2 and sequenced approximately 70% of its coding region,
including the complete N-propeptide coding exons. Comparison of the
gene sequence with cDNAs indicated that the mouse pro-α2(XI)
N-propeptide was encoded by 14 exons. In addition, the results provided
evidence that there are complex alternatively spliced forms of the mouse
pro-α2(XI) collagen mRNA. At least four combinations of exons
6-8 (E59, E569, E56789, and E589) have been identified. Depending on the combination of
these exons, the length of the α2(XI) N-propeptide could range from
352 to 459 amino acid residues. Inclusion of each of these exons not
only changes the length but seems to dramatically affect the structure
and nature of the N-propeptide, since each of the alternatively spliced
exons encodes a highly acidic domain.
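To make the combinatorics of the three cassette exons explicit, the sketch below enumerates every possible inclusion pattern of exons 6-8 between the constitutive exons 5 and 9, using the same E-naming convention as the text. It is illustrative only: the function name and the set of "listed" variants are assumptions drawn from the sentence above, and no claim is made about which of the remaining combinations occur in vivo.

```python
from itertools import combinations

CASSETTE_EXONS = (6, 7, 8)
# The four combinations named in the paragraph above.
LISTED_IN_TEXT = {"E59", "E569", "E589", "E56789"}


def variant_names(cassette=CASSETTE_EXONS, upstream=5, downstream=9):
    """Name every inclusion pattern of the cassette exons, e.g. 'E569'."""
    names = []
    for k in range(len(cassette) + 1):
        for subset in combinations(cassette, k):
            middle = "".join(str(exon) for exon in subset)
            names.append(f"E{upstream}{middle}{downstream}")
    return names


all_variants = variant_names()
for name in all_variants:
    status = "listed in the text" if name in LISTED_IN_TEXT else "not listed there"
    print(name, status)
```

Three independently included exons give 2^3 = 8 theoretical patterns, ranging from E59 (all three skipped, the shortest N-propeptide) to E56789 (all three included, the longest).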
Although the longest
α2(XI) mRNA shows relatively low expression, it seems to encode an
N-propeptide with a domain structure very similar to those of
pro-α1(XI) and pro-α1(V) collagens. In this context, the
pro-α1(XI), pro-α2(XI), and pro-α1(V) chains appear to
constitute a distinct subclass within the type V/XI collagen family.
However, as reported previously, the primary structure of each of the
domains in the N-propeptide is not equally similar between these
collagen chains (31). The primary structure of the acidic
subdomain in the N-propeptide differs among the pro-α1(V),
pro-α1(XI), and pro-α2(XI) chains, in contrast to the
similarities found in other regions. The occurrence of alternative
splicing of the pro-α2(XI) chain generates further diversity and
may increase the potential functional versatility of the N-propeptide
subdomain.
Many genes are known to produce alternatively spliced
mRNAs that each encode a different protein (36). In some
instances, alternative splicing produces several forms of a protein
that are necessary at different times in different tissues. Alternative
splicing is also known to occur in collagen genes such as α1(II),
α3(IV), α2(VI), α3(VI), and
α1(XIII) (37, 38, 39, 40, 41, 42, 43).
Our present findings thus add another example of alternative splicing
in collagen genes. As demonstrated by RT-PCR analysis, complex
alternative splicing of pro-α2(XI) collagen occurred in various
tissues. Pro-α2(XI) mRNA without exons 6-8 was more abundant
in cartilage, and mRNAs including each of these exons in various
combinations were mostly found in non-cartilaginous tissues. It has
been demonstrated previously that the pro-α1(II) mRNA, including
exon 2, which codes for a cysteine-rich globular domain, is observed in
non-cartilaginous tissues and prechondrogenic cells, whereas the mRNA
without exon 2 is localized in cartilage and cells with a chondrogenic
phenotype (40, 44, 45). The structures of the
alternatively spliced domains in the N-propeptides of pro-α1(II)
and pro-α2(XI) are not homologous. However, considering that
procollagen α1(II) and α2(XI) mRNAs are coordinately expressed
in many tissues (46) and that the pro-α1(II) gene also
codes for the α3 chain of type XI collagen, it is tempting to
assume that similar tissue-specific splicing mechanisms act on both of
these mRNAs. We also speculate that the α1 chain of type XI
collagen could undergo similar alternative splicing, since the domain
structure (and possibly the gene structure) of the α1(XI) and
α2(XI) N-propeptides is highly conserved.
Procollagen type XI
has been suggested to undergo two-step proteolytic processing during
its conversion to the matrix form, with the first step removing the
carboxyl propeptides and the second step involving cleavage within the
N-propeptides. Rotary shadowing data (20) have revealed a
substructure within the N-propeptide domain consisting of a hinge,
followed by a short rod-like stretch in the matrix form of type XI
collagen. It is therefore unlikely that the putative N-proteinase
cleavage sites found in the amino-terminal telopeptide of the
α2(XI) chain and the α1(XI) chain (11) are utilized,
since the entire N-propeptide would be removed if such a site were
cleaved. It is more likely that cleavage of the α2(XI) propeptide
occurs upstream of the amino-terminal triple helical domain. In support
of this, the basic domain of the α2(XI) N-propeptide has been
recovered as a continuous peptide (proline/arginine-rich domain; amino
acid positions 28-245 in Fig. 3) from cartilage as a
product of processing (47). It is therefore possible that
Pro-Gln at positions 258-259 or 263-264 is the site cleaved
by N-proteinase. Alternatively, yet another procollagen peptidase may
be necessary for processing of the N-propeptide of the α2(XI)
collagen chain.
The importance of these alternatively spliced
domains of the α2(XI) N-propeptide remains unclear. However, it
should be remembered that the alternatively spliced tyrosine-rich
acidic subdomain corresponds to a region in pro-α1(V) collagen that
has been implicated in the regulation of heterotypic type I + V
collagen fibrillogenesis. This region of the α1(V) chain remains
after procollagen processing and projects away from the major
triple-helical axis with a short amino-terminal triple helical
arm (17). Accordingly, the acidic subdomain resides on the
surface of fibrils and may sterically prevent further deposition of
collagen molecules (16, 17). In the type II + XI
collagen fibrils, the persisting N-propeptide region of type XI
collagen seems to play a homologous
role (5, 20, 21). We propose that the
differential expression of the acidic subdomain in the pro-α2(XI)
collagen plays an additional regulatory role in type II + XI
collagen fibrillogenesis. If such a domain were, at least transiently,
expressed on the surface of the fibrils, it would modulate interactions
with other molecules.
The nucleotide sequence(s) reported in this paper has been submitted to the GenBank/EMBL Data Bank with accession number(s) D38412.