(Received for publication, May 22, 1995)
The human COL11A2 gene was analyzed from two
overlapping cosmid clones that were previously isolated in the course
of searching the human major histocompatibility region (Janatipour, M.,
Naumov, Y., Ando, A., Sugimura, K., Okamoto, N., Tsuji, K., Abe, K.,
and Inoko, H. (1992) Immunogenetics 35, 272-278).
Nucleotide sequencing defined over 28,000 base pairs of the gene. It
was shown to contain 66 exons. As with most genes for fibrillar
collagens, the first intron was among the largest, and the introns at
the 5'-end of the gene were in general larger than the introns at the
3'-end. Analysis of the exons coding for the major triple helical
domain indicated that the gene structure had not evolved with the genes
for the major fibrillar collagens in that there were marked differences
in the number of exons, the exon sizes, and codon usage. The gene was
located close to the gene for the retinoic X receptor in a
head-to-tail arrangement similar to that previously seen with the two
mouse genes (P. Vandenberg and D. J. Prockop, submitted for
publication). Also, there was marked interspecies homology in the
intergenic sequences. The amino acid sequences and the pattern of
charged amino acids in the major triple helix of the α2(XI) chain
suggested that the chain can be incorporated into the same molecule as
α1(XI) and α1(V) chains but not into the same molecule as the
α3(XI)/α1(II) chain. The structure of the carboxyl-terminal
propeptide was similar to the carboxyl-terminal propeptides of the
proα1(XI) chain and proα chains of other fibrillar collagens,
but it was shorter because of internal deletions of about 30 amino
acids.
Over 19 types of collagens are known, each with an apparently
unique biological function (1, 2, 3). A major
subclass is the fibrillar collagens that form ordered extracellular
fibrils and that include type I, type II, type III, type V, and type XI
collagens. Type I and type III collagens are found in most
non-cartilaginous tissues. Type II is found primarily in cartilage
where it is the most abundant protein, but it is also present in the
vitreous humor and several other tissues in early embryonic
development. Type XI collagen was originally recognized as a minor
fibrillar collagen in cartilage that was similar to type II collagen.
The protein was considered to consist of three chains referred to
as α1(XI), α2(XI), and α3(XI) (4, 5, 6). The α3(XI) chain
was subsequently shown to be derived from the same gene as the
α1(II) chain of type II collagen that, by an unknown mechanism, was
assembled with the α1(XI) and α2(XI) chains to form a unique
procollagen molecule (4, 5, 6, 7, 8, 9).
In further analyses, type XI collagen was found to be closely related
in structure to type V collagen, and both type V and type XI collagens
were found in small amounts in a variety of cartilaginous and
non-cartilaginous connective tissues (10, 11). Amino
acid sequencing of fragments of collagen fibrils from mammalian
vitreous humor demonstrated that fibrils were assembled from molecules
containing α1(XI) and α2(V) chains (12). Also, during
the maturation of articular cartilage, isolated fractions rich in type
XI collagen were found to contain an increasing proportion of the
α1(V) chain and a decreasing proportion of α1(XI)
chains (7). These observations and others (13, 14, 15, 16, 17) led to
the suggestion (2) that type V and type XI collagens are a
heterogeneous class of collagens composed of five or six different
chains, i.e. the α1(V), α2(V), α3(V), α1(XI), and α2(XI) chains
together with the α3(XI) chain, which
apparently has the same primary structure as the α1(II) chain.
Here, we report the complete structure of the human gene for the
α2(XI) chain of type XI collagen. The results demonstrate that (a) the gene structure did not evolve with the genes for the
major fibrillar collagens in that there were marked differences in the
number of exons, the exon sizes, and the codon
usage (1, 3); (b) the gene was located close
to the gene for the retinoic X receptor in a head-to-tail
arrangement similar to that seen previously with the two mouse genes,
and there was marked interspecies homology in the intergenic sequences;
(c) the amino acid sequences and the pattern of
charged amino acids in the major triple helix of both the α1(XI)
chain (11) and α2(XI) chain (18, 19) as
well as the α1(V) chain (20) differed from the charged
amino acid pattern seen with the α3(XI)/α1(II)
chain (21), an observation suggesting that the
α3(XI)/α1(II) chain is not incorporated into the same molecule;
and (d) the structure of the C-propeptide was
similar to the C-propeptides of the proα1(XI) chain and proα
chains of other fibrillar collagens, but it was shorter because of
internal deletions of about 30 amino acids.
To prepare templates for
sequencing, 3-5 µg of a cosmid clone or plasmid subclone in
10 mM Tris-HCl buffer (pH 8) and 1 mM EDTA was
denatured with 0.2 M NaOH for 5 min at room temperature in a
total volume of 20 µl. The sample was precipitated by adding 2
µl of 5 M NaCl and 55 µl of 100% ethanol and then
incubating the sample on dry ice for 10 min. The sample was centrifuged
for 10 min, and the pellet was washed with 200 µl of 70% ethanol,
dried, and dissolved in 7 µl of distilled water. For annealing, 25
ng of a 17- or 18-mer primer was used in a reaction volume of 10
µl. The samples were annealed at 37 °C for over 30 min. For the
initial DNA sequencing, primers were designed based on published
sequences of the COL11A2 gene by Kimura et al. (18) and Zhidkova et al. (19). Primers
for the retinoic X receptor gene were designed from the published
sequences by Fleischhauer et al. (25) and Epplen and
Epplen (26). Additional primers were developed based on the
sequences derived during the progress of the work. Nucleotide sequence
analysis was carried out using the Wisconsin Sequence Analysis Package
(GCG) Version 8.0-UNIX (Genetics Computer Group).
Figure 1: Schematic diagram of the two cosmid clones 515 and 505-1 containing the human COL11A2 gene (22). Exons and introns are drawn to scale. Sites for cleavage by selected restriction enzymes are shown. The size of exon 1 is not defined.
Figure 2: Nucleotide sequences of the exon-intron boundaries and the sizes of the exons and the introns of the human COL11A2 gene. Intron sequences are in lowercase, and exon sequences are in uppercase. Amino acids are numbered by the first glycine in the major triple helix defined as position 1. Numbers indicate the first amino acid in each exon.
Figure 3: Nucleotide sequences between the retinoic X receptor gene and the COL11A2 gene in the human (top line) and mouse (bottom line) genomes. Rectangle, putative poly(A) addition signal for the retinoic X receptor gene; four successive open boxes, putative MAZ binding sites for the termination of transcription (28); capital letters and open rectangle, coding sequences for the COL11A2 gene; ellipses, gaps created to align sequences; lines, conserved Sp1 sites.
The first
11 exons of the mouse COL11A2 gene were previously
defined. Comparison of the mouse gene with the human gene
here established that the sizes of the first 11 exons were identical (Table 1). Also, the sizes of the first 10 introns were similar.
Figure 4: Schematic of the exon sizes for the major triple helical domain of the human COL11A2 gene and the human COL2A1 gene. As indicated, the junction exons in the COL11A2 gene are exons 14 and 63 and in the COL2A1 gene exons 6 and 49. The exons are drawn to scale. The exon sizes for the human COL2A1 gene are from Ala-Kokko and Prockop(29) .
As indicated in Fig. 4,
the coding region for the major triple helical domain of the α2(XI)
chain had a large number of 54-bp exons. The remaining exons appeared
to maintain a 54-bp motif in that seven exons were twice 54 bp (or 108
bp). Also, seven exons were 45 bp or the equivalent of 54 bp with a
9-bp deletion. These exons were similar in size, therefore, to many of
the exons found in the major fibrillar
collagens (1, 3). However, there was no exon of 162 bp
as is found in each of the major fibrillar collagens. Also, there were
no exons of 99 bp, the size of five exons in the COL2A1 gene.
In addition, there were two exons of unusual size in that one exon was
90 bp (exon 40) and the other was 36 bp (exon 61). As indicated in Fig. 4, the number of exons and the pattern of exon sizes in the COL11A2 gene were different from those in the COL2A1 gene (29). Despite these differences, the number of amino
acids for the major triple helical domain was 1,014, the same number as
in major fibrillar collagens.
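The size relationships described above can be illustrated with a short sketch. It is not part of the original analysis: the function name `classify_exon` is hypothetical, and the list contains only the exon sizes named in the text, not the full exon table of Fig. 2. The sketch labels a size as a multiple of 54 bp, as such a multiple shortened by 9 bp, or as outside the motif.

```python
def classify_exon(size_bp, unit=54, deletion=9):
    """Describe an exon size relative to the 54-bp motif.

    Returns a label for exact multiples of `unit`, for multiples
    shortened by one `deletion`, or 'outside the motif' otherwise.
    """
    if size_bp % unit == 0:
        return f"{size_bp // unit} x {unit} bp"
    if (size_bp + deletion) % unit == 0:
        return f"{(size_bp + deletion) // unit} x {unit} bp minus {deletion} bp"
    return "outside the motif"


# Exon sizes named in the text (illustrative list, not the full exon table).
sizes_in_text = [54, 108, 45, 162, 99, 90, 36]
labels = {size: classify_exon(size) for size in sizes_in_text}

for size, label in labels.items():
    print(f"{size:>3} bp: {label}")

# Every size listed is a whole number of Gly-X-Y triplets (9 bp each).
all_whole_triplets = all(size % 9 == 0 for size in sizes_in_text)

# 1,014 residues in the major triple helix correspond to 3,042 coding bp.
TRIPLE_HELIX_BP = 1014 * 3
```

Under these rules the 90-bp and 36-bp exons fall outside the motif, which matches their description in the text as exons of unusual size, whereas 162 bp and 99 bp (absent from COL11A2) fit it.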
The differences between the COL11A2 gene and other genes for fibrillar collagens were also
emphasized by comparison of the third base used in codons for glycine,
proline, and alanine (Table 2). The four bases were more
uniformly used for glycine codons in the α2(XI) chain than in the
α1(I) and α1(II) chains. In addition, the pattern of third base
usage in codons for alanine appeared to be different between the
α2(XI) chain and the α1(I), α1(II), and α1(XI) chains.
Overall, the pattern of codon usage for glycine, proline, and alanine
in the α2(XI) chain was most similar to the pattern of codon usage
for the α1(V) chain.
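A third-base comparison of this kind can be sketched as follows. This is a minimal illustration, not the procedure used to build Table 2: the function names and the toy sequence are hypothetical, and the input is assumed to be an in-frame coding sequence. Glycine (GGN), proline (CCN), and alanine (GCN) are fourfold-degenerate codon families, so the first two bases identify the amino acid and the third base is free to vary.

```python
from collections import Counter

# Fourfold-degenerate codon families: the first two bases fix the amino acid.
FAMILIES = {"GG": "Gly", "CC": "Pro", "GC": "Ala"}


def third_base_usage(cds):
    """Count third-position bases in Gly, Pro, and Ala codons of an in-frame CDS."""
    usage = {aa: Counter() for aa in FAMILIES.values()}
    cds = cds.upper()
    # Walk the sequence codon by codon, ignoring any trailing partial codon.
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        aa = FAMILIES.get(codon[:2])
        if aa is not None:
            usage[aa][codon[2]] += 1
    return usage


def as_fractions(counter):
    """Convert base counts to fractions over A, C, G, T (empty dict if no codons)."""
    total = sum(counter.values())
    return {base: counter[base] / total for base in "ACGT"} if total else {}


# Hypothetical toy sequence: GGT CCC GCA GGC CCT GGA
toy_cds = "GGTCCCGCAGGCCCTGGA"
usage = third_base_usage(toy_cds)
gly_fractions = as_fractions(usage["Gly"])
```

Applying `as_fractions` to each chain's counts would give values that can be compared across chains, with a more uniform distribution indicating more even use of the four bases.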
Figure 5: Comparison of amino acid sequences for the major triple helical domains of α2(XI), α1(XI), α1(V), and α1(II) chains. The residues identical with those found in the α2(XI) chain are indicated by a dash. The sequences for the α1(XI) chain are from Bernard et al. (11), sequences for the α1(V) chain are from Takahara et al. (20), and sequences for the α1(II) chain are from Baldwin et al. (21).
Figure 6: Schematic for the distribution of positively charged amino acid residues in the α2(XI), α1(XI), α1(V), and α1(II) chains. As indicated, the distribution of positively charged amino acids is largely identical among the first three chains, but the pattern in the α1(II) chain is different. The sequences for the α1(XI) chain are from Bernard et al. (11), sequences for the α1(V) chain are from Takahara et al. (20), and sequences for the α1(II) chain are from Baldwin et al. (21).
Figure 7: The 3'-end of the human COL11A2 gene. The nucleotide sequences of exon 65, intron 65, and exon 66 are indicated. Underlined sequences are the three nested forward primers used in the 3'-RACE assay. The box indicates the unusual ATTTAA sequence that is probably a poly(A) signal sequence.
Figure 8: Comparison of the amino acid sequences of the C-propeptides of proα2(XI), proα1(XI), proα1(V), proα2(V), proα1(II), and proα1(I) chains. Homologous amino acid sequences are indicated by boxes. Dark vertical rectangles indicate conserved cysteine residues. Dashes indicate gaps created to align sequences. Homologous amino acids are defined as suggested by Dayhoff et al. (35). The data for proα1(XI) are from Bernard et al. (11), for proα1(V) from Takahara et al. (20), for proα2(V) from Weil et al. (36), for proα1(II) from Cheah et al. (37), and for proα1(I) from Bernard et al. (30).
Initial analyses of the genes for the major fibrillar
collagens revealed a striking pattern in the exons encoding the
major triple helical domains of the proteins (see (1) and (3)). All the exons began with a complete codon for glycine.
Also, the sizes had a 54-bp motif in that most exons were 54 bp or
simple multiples thereof. In addition, the bases used for the third
position in codons for glycine, proline, and alanine were similar among
the genes. Therefore, the results suggested that the genes for the
fibrillar collagens evolved from a 54-bp exon that was duplicated
during evolution. The suggestion was supported by the further
observation that these features of the exons were conserved among man,
rodents, and chick. Also, with one exception, the exon sizes were
conserved among the four genes for type I collagen, type II collagen,
and type III collagen. Analysis of the genes for type IV collagen
demonstrated a different pattern in that many of the exons began with
split codons for glycine, and the exon sizes did not show a consistent
54-bp motif. The genes for other non-fibrillar collagens also varied in
exon structures, but it was generally assumed that all fibrillar
collagens maintained the same gene structure through long periods of
evolution. The results here present the first complete structure of a
gene for a minor fibrillar collagen. The exon structure does not fit
the pattern found in the genes for the major fibrillar collagens. Also,
the codon usage differs. Therefore, the results demonstrate that the COL11A2 gene has not evolved with the genes for the major
fibrillar collagens. Previous reports demonstrated that the codon usage
for the α1(V) chain of type V collagen differed from that of other
fibrillar collagens (20). Also, the distribution of charged
amino acids in the α2(V) chain appeared to differ from the
distribution in the major fibrillar collagens (38). Therefore,
the genes for α1(V) and α2(V) chains of type V collagen may
also have evolved differently from the genes for the major fibrillar
collagens.
Considerable evidence has suggested that the α1(XI)
and α2(XI) chains are incorporated into the same procollagen
molecules as the α3(XI)/α1(II) chain and the α1(V)
chain (4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17).
If this conclusion is correct, there should be a similarity in the
amino acid sequences and in the distribution of charged amino acids
among the α chains (see (32)). The results here
demonstrated that a high degree of homology in the sequences and in the
pattern of charged amino acids was in fact found between the α2(XI)
chain and the α1(XI) and α1(V) chains. Therefore, the results
were consistent with the chemical data indicating that heterotrimeric
molecules can be composed of varying combinations of α1(XI),
α2(XI), and type V collagen chains. However, there was relatively
little conservation of sequence and of distribution of charged amino
acids with the α3(XI)/α1(II) chain. Therefore, if the
α3(XI)/α1(II) chain is incorporated into the same molecule as
α1(XI) and α2(XI) chains, the resulting triple helical molecule
must be far more heterogeneous in the distribution of amino acid side
chains that are on the surface of the molecule and that direct fibril
assembly than any better characterized molecule of a fibrillar collagen
(see (32)). Because of the differences among the α chains
observed here, it appears that further documentation of the chemical
structure of the molecules containing α3(XI)/α1(II) chains is
now warranted.
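The comparison of charge patterns can be expressed numerically with a sketch such as the one below. It is an illustration only and not the method behind Fig. 6: the function names and the three short sequences are hypothetical, the chains are assumed to be already aligned to equal length, and positively charged residues are assumed to be lysine (K) and arginine (R) only.

```python
POSITIVE = {"K", "R"}  # assumption: lysine and arginine only


def positive_positions(aligned_seq):
    """Return the 0-based alignment columns holding a positively charged residue."""
    return {i for i, res in enumerate(aligned_seq.upper()) if res in POSITIVE}


def shared_fraction(seq_a, seq_b):
    """Jaccard index of the positive-charge positions of two aligned chains."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    a, b = positive_positions(seq_a), positive_positions(seq_b)
    union = a | b
    # Two chains with no positive charges at all are treated as identical.
    return len(a & b) / len(union) if union else 1.0


# Hypothetical Gly-X-Y fragments, not taken from Fig. 5.
chain_a = "GPKGERGPP"
chain_b = "GARGEKGPA"
chain_c = "GKPGPPGER"

same_pattern = shared_fraction(chain_a, chain_b)
different_pattern = shared_fraction(chain_a, chain_c)
```

In this toy case the first two fragments differ in sequence but carry positive charges at the same positions, so the index is 1.0, whereas the third fragment shares none of those positions and the index is 0.0.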
The C-propeptide of the proα2(XI) chain showed a
relatively high degree of homology with the C-propeptides of other
fibrillar collagens (39). However, there was a series of
internal deletions that made the chain shorter by about 30 amino acids.
Apparently, the deleted 30 or so amino acids are not critical for
directing chain association and chain selection.
Recent analyses of
the mouse COL11A2 gene demonstrated that the gene was located
at the 3'-end of the gene for the retinoic X receptor and that
the two genes were close together. The results here
demonstrate that a similar arrangement of the two genes is maintained
in the human genome. In addition, there is a high degree of
conservation of the intergenic sequences. Therefore, the results
suggest that there may be some functional consequences of this unusual
arrangement of genes. The presence of four MAZ sequences at the 3'-end
of the retinoic X receptor gene apparently prevents continuous
transcription of the two genes, much as has been found in at least
three other pairs of genes that are in a similar head-to-tail
arrangement (see (28)).
The nucleotide sequence reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number U32169.