(Received for publication, June 8, 1994; and in revised form, September 12, 1994)
From the
In order to eventually elucidate the mechanisms regulating
α1(XI) collagen expression in cartilaginous and non-cartilaginous
tissues, we performed an initial analysis of the structural-functional
features of the promoter of the human gene (COL11A1). After
cloning and sequencing the 5′ portion of COL11A1, primer
extension and nuclease protection assays identified several minor
transcriptional start sites clustered around a major one located 318
base pairs from the ATG codon. Consistent with this finding, analysis
of the upstream sequence revealed the absence of a TATA motif and the
presence of several GC boxes. Transient transfection experiments
delineated the smallest promoter sequence directing relatively high
expression of a reporter gene in a cell type-specific manner. Nine
nuclear protein-bound areas were located within this promoter sequence
of the COL11A1 gene. Sequence homologies suggested that the
majority of the footprints correspond to potential binding sites for
ubiquitous nuclear proteins, such as AP2 and Sp1. Additional
experimental evidence indicated that one of the protected areas may
bind a transcriptional complex that is identical or closely related to
the one that regulates tissue specificity in the coordinately expressed
α2(V) collagen gene.
The fibrillar group of collagens includes five structurally
related trimers which are traditionally divided into major (types
I-III) and minor (types V and XI) types, based on their relative
abundance (1). Each of the minor collagens is an integral
component of the fibrils of one of the major types (2, 3, 4). This
and experiments of fibril reconstitution in vitro suggest that the
minor collagens regulate types I and II collagen fibrillogenesis in
cartilaginous (type XI) and non-cartilaginous (type V)
matrices (5, 6, 7). In addition to cartilage,
the α1 subunit of type XI collagen is also produced by a large
variety of non-cartilaginous tissues and cultured cells. These include
bone, vitreous, skin, heart, sterna, arterial smooth muscle cells, and
two rhabdomyosarcoma cell lines (8, 9, 10, 11, 12, 13, 14, 15).
The non-cartilaginous synthesis of α1(XI) collagen has often been
associated with the production of hybrid type V/XI
trimers (9, 13, 14).
Based on this evidence and contrary to the previous belief, it is now
hypothesized that the minor collagen types represent a distinct subclass of
fibrillar molecules, which perform comparable functions in a large
variety of tissues and molecular associations (1). A corollary
to this postulate is that tissue-specific expression of minor collagen
genes must be regulated in coordination with that of the major collagen
genes and, in some cases, with each other. For example, distinct
transcriptional programs are expected to control α1(XI) collagen
production in cartilaginous and non-cartilaginous tissues and in
coordination with types II and V collagen expression, respectively.
Consistent with this hypothesis, recent work has shown that
co-expression of the α1(XI) and α2(V) collagen genes (COL11A1 and
COL5A2) in arterial smooth muscle cells is modulated in a similar
manner by serum deprivation and transforming growth factor-β1 (15).
Because of the potential relevance of the minor collagen types to connective tissue physiopathology, we are interested in elucidating their function and regulation. Accordingly, we recently began to examine the factors and mechanisms responsible for coordinated and tissue-specific expression of the human COL11A1 and COL5A2 genes. Our first report showed that 152 bp of the COL5A2 promoter are sufficient to sustain cell type-specific transcription of a reporter gene in transient transfection experiments (16). Various DNA assays demonstrated that the activity of the COL5A2 promoter is under the positive control of two neighboring cis-acting elements, termed FP-A and FP-B. These two COL5A2 promoter sequences apparently contain novel nuclear protein-binding sites which are conserved in the corresponding region of the mouse gene (17).
As an extension of this work, we have now cloned and sequenced the 5′ portion of the human COL11A1 gene. We also delineated the composition and extent of the minimal COL11A1 promoter and defined its nuclear protein-binding pattern. Interestingly, a protein-bound area of the minimal COL11A1 promoter seems to bind a transcriptional complex identical or closely related to the one that recognizes the FP-B element of the COL5A2 gene.
Figure 1: Genomic and cDNA clones at the 5′ portion of the human COL11A1 gene. Top, map of a portion of the two genomic inserts with indicated exon number, exon and intron length, and the position of EcoRI (E) and PstI (P) sites. Note that the 5′-most PstI fragment is the one used to derive the promoter/CAT constructs. Below the gene map are depicted the two cDNA clones discussed in the text with the indication of the restriction sites common to the genomic clones. Bottom, sequences of the exon/intron junctions with the encoded amino acids numbered according to Yoshioka and Ramirez (10).
Figure 2: Composition of the sequence around the start sites of transcription. The sequence extends from the ATG codon to nucleotide -96, relative to the major start site of transcription. Continuous and discontinuous underlinings indicate the positions of the oligonucleotides used in the primer extension and RACE experiments, respectively. The PstI site of the 248-nt fragment used in the primer extension experiments is highlighted by double underlining. The 5′ ends of the extension products are signified by the black triangles (using the PstI-EcoRI fragment as a primer), and by the white triangles (using the synthetic oligonucleotide as a primer). The approximate 5′ ends of the RNase protected bands are indicated by the stippled boxes, with the black box indicating the position of the minor 300-nt product (see Fig. 3C). The 5′ ends of the RACE products are instead shown by the black dots. The asterisks highlight the 5′ ends of the cDNAs isolated from this (HY 91, nucleotide +5) and previous (HY 83, nucleotide 142) library screenings (10).
Figure 3: Determination of the start site of COL11A1 transcription. Panel A, primer extension experiment using the PstI-EcoRI fragment (see Fig. 2) and RNA from A-204 (lane 1) and HT-1080 (lane 2) cells. Size markers (in bp) are on the right side of the autoradiogram. Panel B, primer extension experiment using the synthetic oligonucleotide primer (see Fig. 2) and RNA from A-204 (lane 1), HT-1080 (lane 2), and yeast (lane 3); lane 4 is the control sample without addition of RNA. Positions of the various extension products are indicated by the arrows with the numbers referring to the nucleotides shown in Fig. 2. Sequencing reactions using the same primer are shown in the last four lanes (C, T, A, and G). Panel C, RNase protection experiment of the 719-nt riboprobe using RNA from A-204 (lane 2) and HT-1080 (lane 3) cells. The undigested riboprobe is in lane 1, whereas size markers (in bp) are indicated on the right of lane 4. The arrows on the left highlight the protected products whose 5′ ends are shown in Fig. 2.
In addition to identifying these exons, the analysis also determined
the composition of about 1.8 kb of sequence lying immediately upstream
of the ATG codon (Fig. 2 and Fig. 4). Previous work has
shown that the 5′ end of clone HY 83 is located 161 bp upstream of the
ATG codon (Fig. 1 and Fig. 2) (10). In order to
identify possible transcripts initiating further upstream of HY 83, we
first screened an unamplified cDNA library from RD rhabdomyosarcoma
cells with a 5′ fragment of HY 83. Analysis of the resulting positive
clone, HY 91, revealed that this cDNA contains an additional 154 bp of
non-coding sequence (Fig. 1 and Fig. 2). Based on these
data and in order to establish the 5′ boundary of exon 1, extension
experiments were performed using two different primer sequences and RNA
purified from the α1(XI) collagen-producing A-204 cells and
non-producer HT-1080 fibrosarcoma cells (32, 33).
Figure 4: Composition of the COL11A1 promoter sequence. The sequence extends from immediately 5′ of the major start site of transcription (-1, see also Fig. 2) to the 5′ PstI site (-1454) of the 1.7-kb genomic clone mentioned under "Materials and Methods." The underlined nucleotides indicate the 5′ ends of the promoter/CAT constructs shown in Fig. 5. The arrows signify the boundaries of the probes used in the DNase I footprinting assays, while the boxed sequences delineate the approximate extent of the nuclear protein-bound areas.
Figure 5: Deletion analysis of the COL11A1 promoter. The histograms indicate the percentage of CAT conversion of the chimeric constructs normalized for the internal control and relative to pSVCAT. Plasmids were transfected in HT-1080 (white histogram), A-204 (gray histogram), and 1120 (stippled histogram) cells. The data represent an average of three to five independent tests ± S.D.
The first primer is the 248-nt PstI/EcoRI internal
fragment of HY 83, whose 5′ end is positioned 49 bp upstream of the ATG
codon (Fig. 1 and Fig. 2) (10). This primer
sequence yielded two major extension products with estimated sizes of
approximately 470 and 490 nt (Fig. 3A). After
subtracting the length of the primer sequence, the extension products
placed the end of the transcripts about 20 and 40 bp downstream of the
5′ end of clone HY 91 (Fig. 2). The second primer extension
experiment utilized a 21-mer complementary to a sequence located in the
5′ portion of HY 91 (Fig. 2). This primer was chosen to
ascertain the possible presence of transcripts extending further
upstream of clone HY 91. The second primer extension experiment gave
rise to four identifiable products ending 30-70 nt further 5′ of
HY 91 (Fig. 2 and Fig. 3B). Incidentally, both
sets of experiments yielded extension products only with RNA isolated
from the α1(XI) collagen-producing cell line A-204.
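The coordinate arithmetic behind the first primer extension experiment can be sketched as follows. This is only an illustration built from the distances quoted in the text; the function and constant names are hypothetical, and the sketch assumes that each extension product consists of the 248-nt primer plus newly synthesized DNA continuing upstream from the PstI end of the primer.

```python
# Distances are in nucleotides upstream of the ATG codon, as quoted in the text.
HY83_5P_END = 161                        # 5' end of clone HY 83
HY91_5P_END = HY83_5P_END + 154          # HY 91 adds 154 bp of non-coding sequence
MAJOR_START = HY91_5P_END + 3            # most RACE clones end 3 nt 5' of HY 91

PRIMER_LENGTH = 248                      # PstI/EcoRI fragment used as primer
PRIMER_UPSTREAM_END = 49                 # its PstI end, 49 bp upstream of the ATG


def transcript_end(product_nt):
    """Map an extension product length to a position upstream of the ATG.

    Assumption: the product is the primer plus newly synthesized DNA that
    continues upstream from the PstI end of the primer.
    """
    return product_nt - PRIMER_LENGTH + PRIMER_UPSTREAM_END


for product_nt in (470, 490):            # approximate sizes from Fig. 3A
    end = transcript_end(product_nt)
    print(f"{product_nt}-nt product: {end} nt upstream of ATG, "
          f"{HY91_5P_END - end} nt downstream of the HY 91 5' end")

print("major start site:", MAJOR_START, "nt upstream of ATG")
```

Under these assumptions the two products map 44 and 24 nt downstream of the 5′ end of HY 91, in line with the approximate values of 40 and 20 bp given above, and the combined distances reproduce the 318-nt position of the major start site.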
In order to have an independent estimate of the relative representation of the two groups of extension products, we employed the RACE protocol using primers located downstream of the 5′ end of clone HY 83 (Fig. 2). Following subcloning, 11 positive RACE clones were randomly chosen and sequenced. Four of the RACE products were found to end downstream of HY 91, with two of them nearly coinciding with the last nucleotide of HY 83 (Fig. 2). Two of the RACE products extended upstream of HY 91, approximately within the region where the ends of the second group of primer extension products had been mapped (Fig. 2). Finally, 5 of the 11 RACE clones ended 3 nucleotides 5′ of HY 91, suggesting that this represents the major start site of COL11A1 transcription (Fig. 2).
The location of the
major start site of transcription was independently assessed by a
nuclease protection experiment. To this end, we utilized a 719-nt
riboprobe spanning from a PstI site located upstream of the
ATG codon and common to both the HY 83 and HY 91 clones (Fig. 1 and Fig. 2). After RNase digestion, we observed
three major groups of resistant products, ranging in size from 220 to
260 nt, and a minor product of about 300 nt (Fig. 3C).
The largest of the three major groups of protected bands placed the end
of the transcripts at the same location as the 5′ end of HY 91 and the
majority of the RACE products (Fig. 2). The other two major
groups of protected products were found to be nearly superimposable on
the 5′ ends of the first primer extension products (Fig. 2).
Finally, the 300-nt resistant species mapped a minor transcript
within the region of the second primer extension products (Fig. 2). Also in this case, specificity was supported by the
negative data obtained with RNA from the α1(XI) collagen
non-producer cell line HT-1080.
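The mapping of the RNase-protected fragments can be checked with the same kind of arithmetic. The sketch below is a rough illustration, not the authors' procedure: it assumes that the riboprobe is anchored at the same PstI site used for the primer fragment (49 bp upstream of the ATG) and that the length of a protected fragment equals the distance from that site to the 5′ end of the transcript. All names are hypothetical.

```python
PSTI_SITE = 49                 # assumed: same PstI site as in the primer fragment
MAJOR_START = 318              # +1, in nt upstream of the ATG (from the text)
HY91_5P_END = 161 + 154        # 5' end of HY 91, 315 nt upstream of the ATG


def protected_end(fragment_nt):
    """5' end of a protected transcript, in nt upstream of the ATG."""
    return PSTI_SITE + fragment_nt


def offset_from_major_start(fragment_nt):
    """Positive values lie downstream of +1, negative values upstream."""
    return MAJOR_START - protected_end(fragment_nt)


# Window of the second primer extension products: 30-70 nt 5' of HY 91.
SECOND_WINDOW = (HY91_5P_END + 30, HY91_5P_END + 70)

for fragment_nt in (220, 260, 300):      # approximate sizes from Fig. 3C
    end = protected_end(fragment_nt)
    in_window = SECOND_WINDOW[0] <= end <= SECOND_WINDOW[1]
    print(fragment_nt, end, offset_from_major_start(fragment_nt), in_window)
```

With these assumptions the 260-nt species ends 309 nt upstream of the ATG, close to the 5′ end of HY 91 at 315, while the 300-nt species ends at 349, inside the 345-385 window of the second group of primer extension products.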
Based on this evidence, we concluded that the COL11A1 gene contains multiple transcriptional start sites with a major one (+1 in Fig. 2) located 318 nt upstream of the translational initiation codon. The presence of multiple start sites of transcription was indirectly supported by the observation that the sequence immediately upstream of +1 lacks a TATA box and contains several potential GC boxes (Fig. 4).
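The kind of motif search implied here (absence of a TATA box, presence of GC boxes) can be sketched in a few lines. The sequence below is a made-up toy string, not the COL11A1 promoter, which is shown only in Fig. 4. The patterns are the commonly cited TATA consensus TATA(A/T)A(A/T) and the GC-box core GGGCGG with its reverse complement; the function name is hypothetical.

```python
import re

# Commonly cited consensus patterns (simplified).
TATA_BOX = re.compile(r"TATA[AT]A[AT]")
# Lookahead so that overlapping GC boxes on either strand are all reported.
GC_BOX = re.compile(r"(?=(GGGCGG|CCGCCC))")


def scan_promoter(seq):
    """Return 0-based start positions of TATA-box and GC-box matches."""
    seq = seq.upper()
    return {
        "tata": [m.start() for m in TATA_BOX.finditer(seq)],
        "gc_boxes": [m.start() for m in GC_BOX.finditer(seq)],
    }


# Hypothetical toy sequence, assembled only to exercise the scan.
toy = "AGCT" + "GGGCGG" + "AGC" + "CCGCCC" + "TTGAG" + "GGGCGG" + "GGCTCA"
print(scan_promoter(toy))
```

On the toy sequence the scan reports no TATA match and three GC boxes, which is the qualitative pattern described for the region upstream of +1.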
The
transcriptional activity of the COL11A1/CAT plasmid containing
1.4 kb of upstream sequence was compared in A-204 and 1120 cells versus HT-1080 cells. In contrast to HT-1080, transfection of
the -1454 COL11A1/CAT plasmid in the α1(XI)
collagen-producing cells yielded some level of CAT enzyme activity (Fig. 5). This result suggested that the 1.4-kb promoter
sequence contains cell type-specific regulatory elements (Fig. 4). To narrow down the length of the most active promoter
sequence with tissue-specific expression, five constructs harboring
progressive 5′ deletions of the 1.4-kb sequence were assayed in the
three cell lines. The results revealed a substantial loss of
transcriptional activity when the sequence between -541 and
-200 was omitted from the construct (Fig. 5). The
-541 and -199 promoter sequences are both expressed in a
cell type-specific manner, but at significantly different levels;
because of this and in order to distinguish them, they will herein be
referred to as the minimal and the basal promoter, respectively. The
experiments also suggested the possible presence of negative cis-acting elements in the promoter region spanning from
-744 to -542 (Fig. 5).
Figure 6: DNase I footprinting analysis of the basal promoter sequence. The -199 to -36 region was analyzed using the overlapping probes A and B of Fig. 4. In each test, the DNA was incubated without nuclear extract (lanes 1), and with 60 µg of nuclear extract (lanes 2) in the presence of 100-fold molar excess of specific (lanes 3) or unspecific (lanes 4) competitor DNA. Lanes G and A are Maxam and Gilbert sequencing reactions used as markers. Vertical bars indicate the approximate extent of the nuclease-protected areas (see also Fig. 4).
The overlapping probes A and B cover the most proximal region of the COL11A1 promoter (-199 to -36); this is the segment that we have arbitrarily defined as the basal promoter because it is significantly less active than the -541 sequence. Three different footprints were identified within the 163-bp-long basal promoter (Fig. 6). They correspond to GC-rich boxes (numbered 1-3 in Fig. 4) highly homologous to the consensus recognition sequences for the binding of transcription factors AP2 and Sp1 (34). The presence of these particular binding sites is consistent with the multiple transcriptional start sites and the TATA-less nature of the COL11A1 promoter (35).
The
other three probes cover the -541 to -200 segment of the
minimal COL11A1 promoter. The DNase I footprinting analysis
revealed that this region is characterized by the presence of at least
six protected areas (Fig. 7). A computer-aided search identified
possible homologies to known nuclear factor binding sites (34).
They include the ubiquitously expressed factors NF-κB (footprint
6), CF1 (footprints 4 and 5), AP3 (5′ third of footprint 9), and AP2
(3′ third of footprint 9). In addition, a potential GATA-like
recognition sequence was identified in the middle of footprint
9 (36). Footprint 8 scored the highest homology with a
regulatory sequence (GGGXGGPuPu) of human polyoma virus
JC (37). The same element was also noted in the 5′ half of
footprint 5. Finally, the search identified a good candidate for the
binding of the transcriptional complex that interacts with one of the
aforementioned cis-acting elements of the COL5A2 promoter. This initial prediction was solely based on the homology
between the 3′ half of footprint 7 (TTGAATACAG) and the core sequence
of FP-B (ATCAATCAG) (16). FP-B is the major cis-acting
sequence necessary for high and tissue-specific COL5A2 gene
expression (16). Indeed, a 5-nucleotide substitution of the CAATC
motif in FP-B was shown to result in the loss of protein binding in
vitro and of transcriptional activity in transfection
assays (16).
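The homology between the two quoted sequences can be made explicit with a small comparison. This is a minimal sketch using Python's standard difflib module, not the computer-aided search used in the study; the helper names are hypothetical, and the single gap in the second comparison is placed by hand purely for illustration.

```python
from difflib import SequenceMatcher

FOOTPRINT_7_3P_HALF = "TTGAATACAG"   # 3' half of footprint 7 (from the text)
FPB_CORE = "ATCAATCAG"               # core sequence of FP-B (from the text)


def shared_blocks(a, b, min_len=3):
    """Return (start_in_a, start_in_b, substring) for shared runs >= min_len."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [(m.a, m.b, a[m.a:m.a + m.size])
            for m in matcher.get_matching_blocks() if m.size >= min_len]


def identities(row_a, row_b):
    """Count identical, non-gap columns in a pre-aligned pair of rows."""
    return sum(x == y and x != "-" for x, y in zip(row_a, row_b))


print(shared_blocks(FOOTPRINT_7_3P_HALF, FPB_CORE))

# Hand-placed single gap (an illustrative choice, not taken from the paper).
print(identities("TTGAATACAG", "ATCAAT-CAG"))
```

The two sequences share the runs AAT and CAG, and with one gap in FP-B they are identical at 7 of the 10 aligned positions, which gives a sense of the degree of similarity on which the prediction rested.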
Figure 7: DNase I footprinting analysis of the minimal promoter sequence. The -541 to -200 region was analyzed using the three contiguous probes C, D, and E of Fig. 4. In each test, the DNA was incubated without nuclear extract (lanes 1), and with 60 µg of nuclear extract (lanes 2) in the presence of 100-fold molar excess of specific (lanes 3) or unspecific (lanes 4) competitor DNA. Lanes G and A are Maxam and Gilbert sequencing reactions used as markers. Vertical bars indicate the nuclease-protected areas (see also Fig. 4).
To confirm the above hypothesis, we performed a competition experiment using the gel mobility shift assay. To this end, a radiolabeled oligonucleotide (-395 to -379) encompassing the sequence of footprint 7 was incubated with nuclear proteins purified from 1120 nuclei. Binding to footprint 7 was challenged by increasing amounts of unlabeled oligonucleotides for footprint 7, FP-B, or an unrelated sequence. With the exception of the last one, the other two oligonucleotides competed binding with comparable effectiveness (Fig. 8). The results therefore suggested that the DNA elements of the COL11A1 and COL5A2 promoters bind identical or closely related nuclear proteins. Consistent with this conclusion, the FP-B oligonucleotide containing the aforementioned 5-nucleotide substitution (ACCGA for CAATC) failed to compete binding of nuclear proteins to footprint 7 (Fig. 9).
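The mutant competitor can be written out from the sequences given in the text. The sketch below applies the ACCGA-for-CAATC substitution to the FP-B core only, because the full oligonucleotide sequence is not given here; the helper name is hypothetical.

```python
FPB_CORE = "ATCAATCAG"     # core sequence of FP-B (from the text)
WILD_TYPE_MOTIF = "CAATC"
MUTANT_MOTIF = "ACCGA"     # substitution quoted in the text


def substitute_motif(seq, old, new):
    """Replace a single occurrence of `old` with `new`, keeping the length."""
    if len(old) != len(new):
        raise ValueError("substitution must preserve length")
    if seq.count(old) != 1:
        raise ValueError("motif must occur exactly once")
    return seq.replace(old, new)


mutant_core = substitute_motif(FPB_CORE, WILD_TYPE_MOTIF, MUTANT_MOTIF)
changed = sum(x != y for x, y in zip(FPB_CORE, mutant_core))
print(mutant_core, changed, WILD_TYPE_MOTIF in mutant_core)
```

The substitution changes all five positions of the motif (ATCAATCAG becomes ATACCGAAG), so the mutated competitor no longer carries the CAATC element while keeping the same length and flanking bases.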
Figure 8: Complex competition between COL11A1 and COL5A2 promoter elements. DNA binding was analyzed by the gel mobility shift assay using the footprint 7 (F7) probe. Binding was challenged with increasing molar amounts of the same sequence, and of oligonucleotides for FP-B (FB) and the sequence of oligonucleotide C (C, see also Fig. 4). Numbers above each lane indicate the -fold excess of the competitors.
Figure 9: Further evidence for a common type V-XI nuclear protein binding site. The gel mobility shift assay was used to analyze the effect of challenging nuclear protein binding to footprint 7 with 50-fold molar excess of the same sequence (F7), FP-B (FB), oligonucleotide C (C), and the mutated FP-B (FB*) sequence.
Experiments in progress are testing the validity of this hypothesis. They are also examining the specificity of the COL11A1 promoter in the more physiological environment of the transgenic mouse. Additional work is mapping the cytokine-responsive elements of the COL11A1 and COL5A2 genes using vascular smooth muscle cells as a model. Aside from elucidating how a specific set of genes is modulated by the same cytokines, this information may provide new insights into the etiopathogenesis of diseased vascular tissues.
The nucleotide sequence reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number U12139.