(Received for publication, September 24, 1996, and in revised form, February 5, 1997)
From the Departments of Pathology,
§ Pediatrics, and ¶ Biochemistry and Molecular
Biology, and ** Committee on Developmental Biology, University of
Chicago, Chicago, Illinois 60637
Aggrecan is a large chondroitin sulfate
proteoglycan, the expression of which is both tissue-specific and
developmentally regulated. Here we report the cloning and sequencing of
the 1.8-kilobase genomic 5′-flanking sequence of the chick aggrecan
gene and provide a functional and structural characterization of its
promoter and enhancer region. Sequence analysis reveals potential Sp1,
AP2, and NF-I related sites, as well as several putative transcription factor binding sites, including the cartilage-associated silencers CIIS1 and CIIS2. A number of these transcription factor binding motifs
are embedded in a sequence flanked by prominent inverted repeats.
Although lacking a classic TATA box, there are two instances in the
1.8-kb genomic fragment of TATA-like TCTAA sequences, as have been
defined previously in other promoter regions. Primer extension and S1
protection analyses reveal three major transcription start sites, also
located between the inverted repeats. Transient transfections of chick
sternal chondrocytes and fibroblasts with reporter plasmids bearing
progressively reduced portions of the aggrecan promoter region allowed
mapping of chondrocyte-specific transcription enhancer and silencer
elements that are consistent with the sequence analysis. These findings
suggest the importance of this regulatory region in the tissue-specific
expression of the chick aggrecan gene.
During development, the extracellular matrix is a complex dynamic structure, the components and organization of which help to establish the requisite position and state of differentiation. The large chondroitin sulfate proteoglycan (CSPG), aggrecan, has been localized predominantly to skeletal tissue and is considered to be a hallmark of cartilage differentiation. In chick cartilage, aggrecan expression begins at embryonic day 5 in limb rudiments, continues through the entire period of chondrocyte development, and remains a biochemical marker of the cartilage phenotype thereafter. In very early embryos, aggrecan is expressed in the notochord as early as stage 16, long before chondrogenesis occurs (1).
We have extensively studied the properties and expression of aggrecan from embryonic chick cartilage. These studies include synthesis and processing (2-5), structural analysis via peptide sequencing to elucidate glycosylation motifs, and a consensus sequence for O-xylosylation and mapping of the S103L monoclonal antibody epitope (6-10). Moreover, we have conducted molecular analysis to construct the composite sequence of chick cartilage CSPG from overlapping cDNAs and to identify a defect in the aggrecan gene associated with the chondrodystrophy, nanomelia (9, 11).
This sequence, obtained from 10-day-old chick embryos, has 6464 nucleotides that include an open reading frame encoding 2109 amino
acids and 16 nucleotides of the first untranslated exon (11). Another
chick aggrecan cDNA sequence, obtained from embryonic chick brain,
was 6597 nt in length, including 265 nt of 5′-untranslated exon
sequence (12). Using chick CSPG cDNA probes, we subsequently isolated genomic clones containing exons encoding the chick CSPG core
protein. The two 5′ globular domains, G1 and G2, are encoded by four
and three exons, respectively, and the interglobular domain is encoded
by a single exon. The chondroitin sulfate attachment domain is encoded
by the largest exon, 3216 bp, which is approximately 50% of the total
coding sequence. These data reveal that the chick CSPG gene contains at
least 18 exons spanning more than 30 kb. No evidence was obtained for
multiple genes for aggrecan in the chick genome. Elucidation of the
genomic organization of chick aggrecan has allowed for a more thorough
comparison with the mammalian aggrecans, as well as the avian and
mammalian link proteins, with respect to origin and mechanisms of
divergence. A summary of this work was published recently (13).
We have also found that aggrecan is developmentally expressed, in ovo and in limb bud cultures, at both the protein and mRNA levels in a pattern commensurate with the onset of chondrogenesis. The modulation of expression of this cartilage-specific CSPG and type II collagen mRNA in stage 24 limb bud mesenchyme cells cultured at high density was examined under conditions that promote chondrogenesis in vitro (14) and mimic the same process in limb development in ovo. Morphologically, mesenchymal proliferation ceases by day 2, condensation occurs first in the formation of aggregates by days 4-5, and then of overt nodules by days 6-8, concomitant with cellular differentiation and production of matrix. Quantitatively, a 50-fold increase in aggrecan mRNA occurs from day 2 (when first detected) to day 6, followed by a slight decline (about 2-fold) by day 8, after which the message level remains at a plateau (15). This same pattern is observed immunologically, using the monoclonal antibody S103L, which is specific to the aggrecan protein. These studies indicate that during limb development the expression of these two differentiation-specific proteins is stringently controlled until the establishment of the cartilage phenotype. Thereafter, aggrecan continues to be synthesized and deposited in the extracellular matrix, perhaps to effect a decrease in cell adhesion necessary for maintenance of the chondrogenic state.
Concurrent with studies of mechanisms that control the temporal-spatial
aspects of cartilage differentiation are structural and functional
analyses of expression of the differentiation-specific products of the
extracellular matrix. For instance, significant work has been done to
understand the tissue-specific expression of collagen genes and the
mechanisms that regulate their distinct transcriptional programs
(16-18). In contrast, there have been no studies of the
transcriptional regulation of the aggrecan gene that examine its
tissue-specific expression during development. Mouse aggrecan has been
cloned; however, no functional analysis has been performed to examine
its tissue specificity (19). A preliminary characterization of the rat
aggrecan promoter has also appeared, describing a 120-bp sequence
containing transcription start sites (20). It is not clear whether this
120-bp genomic fragment contains tissue-specific control elements,
because the 5′ promoter/enhancer region is probably larger or may
contain additional regulatory elements. The same report described
promoter assays on a larger isolate containing an additional 520 bp of 5′-flanking
sequence, but the sequence data were not presented.
Therefore, to begin to elucidate the mechanisms that govern aggrecan
expression in chondrocytes, we have cloned the promoter region of the
embryonic chick S103L-reactive CSPG (aggrecan). The aim of the present
study was to identify and characterize the cell- and stage-specific
elements in the 5′ genomic flanking region of the aggrecan gene, which
could regulate the expression of this extracellular macromolecule
during embryonic development.
Oligonucleotides were made with an Applied Biosystems 380B DNA synthesizer. Reagents for biochemical and molecular cloning experiments were of the highest quality available from commercial vendors. Restriction endonucleases were from New England Biolabs unless otherwise stated. T4 DNA ligase, T4 kinase, S1 nuclease, avian myeloblastosis virus reverse transcriptase, and Klenow polymerase were from Promega. Taq polymerase was from Perkin-Elmer. A chick genomic library was purchased from CLONTECH Laboratories.
Preparation of Probe and Screening of Chick Genomic Library
A chick aggrecan cDNA fragment comprising 260 bp of
the 5′-untranslated exon plus 56 bp of the signal peptide (SP) exon was obtained via PCR from the previously reported cDNA, clone 1 (11). Because the template clone was inserted in pGEM-4Z, the upstream primer
was the SP6 promoter primer (Promega); the downstream primer was a
17-mer, 5′-CTGTGGTGATGGCTTGC-3′, from the antisense strand of the SP
exon. The probe was then purified by low-melting-point agarose gel
electrophoresis and labeled with ³²P using a Multiprime DNA
labeling system and [α-³²P]dCTP purchased from Amersham
Corp. Approximately 50,000 independent members of the chick genomic
library were screened. The chick genomic library was plated, and
nitrocellulose plaque-lifts were prepared and probed by
hybridization according to standard methods (21). Positive plaques were
picked, then re-plated, and screened as above for two or three rounds
until the plaques were purified.
The screening yielded a 14-kb genomic fragment (Fig. 1B).
Phage DNA was purified from plate lysates (21). Isolates from the
library screening were subcloned into the vector pGEM-4Z by standard
methods (21). Southern blot analysis using the same aggrecan
untranslated exon probe identified an approximately 1.8-kb BglII-BbsI genomic fragment that was subcloned
into pGEM-4Z. Initial sequencing with the T7 promoter primer (Promega)
revealed that one end of the subclone had a sequence identical to the
5′ 145 bp of a previously published S103L-CSPG cDNA sequence (12), with the exception of three dA residues that were not present in the
cDNA sequence. The genomic clone has a tract of 21 dAs where the
cDNA has a stretch of 18 dAs. This likely reflects an error arising
during library generation because the flanking sequences are identical.
The 1.8-kb insert was excised from pGEM-4Z by
EcoRI-KpnI digestion, treated with Klenow
polymerase, and blunt-end ligated into the reporter vector pGL2-Basic
(Promega), which had been linearized with the restriction enzyme
NheI and treated with Klenow. The reporter vector pGL2-Basic
does not contain any eukaryotic promoter or enhancer elements.
Sequences to be assayed for promoter activity are inserted upstream
(5′) of a luciferase gene. Plasmids were sequenced to find clones that
had the insert positioned in the forward (+) and reverse (−)
orientations (Fig. 2C). The forward orientation was defined as having the 1.8-kb insert ligated into the
reporter vector pGL2-Basic with the same 5′-3′ orientation relative to
the reporter gene as the native sequence in the genomic clone relative
to the aggrecan gene. Constructs that contained the 1.8-kb genomic
insert of the chick aggrecan gene were named Ag-1(+) and Ag-1(−).
Sequence Determination and Analysis
Dideoxynucleotide chain termination sequencing (22) of the BglII/BbsI DNA fragments subcloned into pGEM-4Z plasmids was performed using the U. S. Biochemical Sequenase (version 2.0) system. Primers were T7 or SP6 promoter primers (Promega) or 18-20-mer oligonucleotides synthesized according to the obtained sequence. Multiple sequence determinations were made for each primer used. Ambiguities in sequencing were resolved by using a different polymerase (e.g. avian myeloblastosis virus reverse transcriptase), sequencing the complementary strand, or both. All residues were confirmed by at least two separate sequence determinations. DNA sequence analysis was performed using the Wisconsin Package (23). Searching for palindromic sequences was done using the program COMPARE to find inverted repeats by comparing the sequence to its own complement (24), and the results were displayed via the program DOTPLOT. Putative transcription factor binding sites were located with the program FINDPATTERNS using the pattern file tfsite.dat, which comprises the Transcription Factor Database (25).
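The pattern search described above can be illustrated with a minimal sketch. It is not the FINDPATTERNS program and does not read tfsite.dat; the function names and the toy sequence are hypothetical, and only exact matches are reported. The motif CACCTCC (CIIS2) is taken from the text; everything else is an assumption made for illustration.

```python
# Minimal sketch of an exact-match motif search on both DNA strands.
# Function names and the toy sequence are hypothetical (not from the paper).

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def find_motif(seq, motif):
    """Return sorted (0-based start, strand) pairs for every exact match.

    A "-" hit means the reverse complement of the motif occurs at that
    start position on the given (forward) strand.
    """
    seq = seq.upper()
    motif = motif.upper()
    patterns = [("+", motif)]
    rc = reverse_complement(motif)
    if rc != motif:  # avoid reporting palindromic motifs twice
        patterns.append(("-", rc))
    hits = []
    for strand, pattern in patterns:
        start = seq.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(pattern, start + 1)  # allow overlapping hits
    return sorted(hits)


if __name__ == "__main__":
    toy = "GGCACCTCCAAGGAGGTGTT"  # hypothetical sequence
    for start, strand in find_motif(toy, "CACCTCC"):
        print(start, strand)
```

Converting the 0-based offsets to the numbering used in Fig. 3 would require the offset of the most upstream start site within the fragment, which is not assumed here.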
Purification of DNA
Plated colonies were used to inoculate 5 ml of LB medium (21). The cells were grown overnight at 37 °C with vigorous shaking. The 5-ml culture was added to 400 ml of LB. The culture was shaken at 37 °C for at least 12 h, cells were harvested, and plasmid DNA was recovered using the QIAGEN Plasmid Maxiprep kit.
Synthesis of Deletion Constructs
The inserts for plasmid
constructs 1300(+), 900(+), 500(+), and 500(−) were made by PCR using
the Ag-1(+) construct as a template (Fig. 2, A and
B, and Fig. 6B). XhoI sites were
introduced at the end of the amplified fragments via the primers used.
PCR fragments were purified using Qiaquick PCR Preps (QIAGEN) and
digested with XhoI for 2 h. The fragments were gel
purified and ligated into the XhoI site of the pGL2-Basic
vector. Inserts A(+) to F(+) were made via PCR with Ag-1(+) as a
template, and the primer oligonucleotides contained downstream
BglII/SmaI and upstream KpnI
restriction enzyme cutting sites. The PCR fragments were gel purified,
digested with BglII and KpnI, and ligated
directly into pGL2-Basic, producing the constructs A(+) to F(+) (Fig.
2, A and B, and Fig. 6B). The constructs A(−) to F(−) were made in the same fashion as above, except that each insert was digested with SmaI and
KpnI at the insert ends to ensure their opposite orientation
in the pGL2-Basic vector relative to the A(+) to F(+) inserts (Fig. 2,
A and B, and Fig. 6B). Sequencing of
the various constructs was done to confirm the appropriate orientation
of the inserts and exclude PCR artifacts.
Cell Cultures
Cultures of day-14 chick sternal chondrocytes were established according to the procedures described by Cahn et al. (26) and as modified by Campbell and Schwartz (3). Cultures of fibroblasts were established from skin of day-10 chick embryos following trypsinization (3). Cells were plated at an initial density of 1.5 × 10⁶ cells per 100-mm tissue culture dish (Falcon) in either F-12 medium (chondrocytes) or Dulbecco's modified Eagle's medium (fibroblasts) supplemented with 10% fetal calf serum. The cells were permitted to attach to the dishes, and subsequent growth (2-3 days) was maintained by a complete change of the medium every 2 days (2). On the day of transfection, chondrocyte cultures were trypsinized, and single cells were suspended in F-12 medium, replated, and allowed to attach to the dishes for 3-4 h before treatment as described below.
Transfection
Standard methods were followed for transient
calcium phosphate transfections (21). Duplicate plates containing
approximately 5 × 10⁶ cells (either chondrocytes or
fibroblasts) received 20 pmol of a given plasmid construct to be
assayed. Five µg of a β-galactosidase reporter plasmid were
cotransfected with each experimental construct to correct for
cell loss. Duplicate transfection sets were repeated three times, each
time yielding similar results. The transfections were allowed to
proceed for 36 h. The relative efficiency of transfecting the
chondrocytes was approximately 13% that of transfecting the fibroblasts.
Reagents for the luciferase and β-galactosidase assays were purchased from Promega. Because both
luciferase assays and β-galactosidase assays were performed,
Promega's Reporter Lysis Buffer (RLB, E3971) was used to prevent the
inhibition of β-galactosidase activity that occurs in buffers
containing detergents such as Triton X-100. No deviations were made
from the manufacturer's protocol for preparation of extracts from
tissue culture cells. The enzymatic activity of luciferase was measured
with a luminometer (Analytical Luminescence Laboratory, Monolight
1500). The enzymatic activity for β-galactosidase was measured with a
microplate reader (Dynatech) at 409 nm. Standard deviations were
determined for the six assays performed on duplicate plates within one
experiment.
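The exact arithmetic used to combine the two reporter readings is not given here, so the following is only a sketch of one common approach: each luciferase reading is divided by the β-galactosidase reading from the same extract, and the result is expressed as fold activity over the no-insert pGL2-Basic control. The function names and all numerical readings are hypothetical and serve only to show the calculation.

```python
# Sketch of reporter normalization under stated assumptions; not the
# authors' documented procedure. All readings below are made-up values.
from statistics import mean, stdev


def normalized_activity(luciferase, beta_gal):
    """Luciferase reading corrected by the co-transfected beta-gal reading."""
    if beta_gal <= 0:
        raise ValueError("beta-galactosidase reading must be positive")
    return luciferase / beta_gal


def fold_over_control(sample, control):
    """Fold activity of a construct relative to the no-insert control.

    Both arguments are (luciferase, beta_gal) pairs from single extracts.
    """
    return normalized_activity(*sample) / normalized_activity(*control)


def summarize(fold_values):
    """Mean and sample standard deviation of replicate fold values."""
    return mean(fold_values), stdev(fold_values)


if __name__ == "__main__":
    control = (400.0, 1.0)                          # hypothetical no-insert readings
    replicates = [(9000.0, 0.5), (8800.0, 0.5)]     # hypothetical construct readings
    folds = [fold_over_control(r, control) for r in replicates]
    print(folds, summarize(folds))
```

Dividing by the β-galactosidase signal first means that a plate with fewer surviving or transfected cells is not mistaken for a weaker promoter.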
The Z2 or Z3 oligonucleotides (Z2, 5′-AATTCCCTGTGTGGTATTTCAGGTCCTTTCAGGC-3′,
nt 193-226; Z3, 5′-GCAAGAGAGACCATCAAACTCCTGTCAGCCTCCT-3′, nt
68-101) for primer extension experiments or S1 analysis were end
labeled using [γ-³²P]ATP and T4 DNA kinase according to
standard protocols (21). Three ethanol precipitations were performed to
remove the residual [γ-³²P]ATP from the labeled
oligonucleotides.
Established methods were used to perform S1 analysis (27).
Single-stranded probes were made from the double-stranded 900(+) and
D(+) plasmids. Plasmids were alkali-denatured, and a
³²P-5′-end-labeled oligonucleotide primer, Z2 or Z3, was
annealed to the template, 900(+) or D(+), and extended with Klenow
(Promega). Probes were cut to the appropriate 5′ length by digestion
with restriction enzyme KpnI. The single-stranded probes
were separated from the template DNA by alkaline low-melting-point
agarose electrophoresis, and radiolabeled bands were cut out and
purified by phenol extraction and ethanol precipitation (21).
Approximately 5000 cpm of probe was hybridized to 25 µg of total RNA
from day-14 chick sternal chondrocytes. The hybridization occurred at
55 °C for 12 h in an aqueous hybridization solution (21). The
resultant RNA:DNA hybrid was digested with 200 units of S1 nuclease for
60 min. The products were electrophoresed in 6% polyacrylamide
sequencing gels.
Approximately 5000 cpm of labeled Z2 or Z3 probe was hybridized to 25 µg of RNA derived from day-14 chick sternal chondrocytes. Hybridization was done in S1 hybridization solution for 12 h at 30 °C (21). Extended products were produced by treating the hybrid RNA:primer with 40 units of avian myeloblastosis virus reverse transcriptase (Promega). Products were extracted in phenol/chloroform, precipitated in ethanol, and electrophoresed on 6% polyacrylamide sequencing gels.
To guide functional studies, the complete 1.8-kb Ag-1
sequence was determined and found to comprise 1875 bp (Fig.
3). Examination of the sequence revealed the lack of a
classical TATA box or CCAAT box. When the Ag-1 fragment was analyzed
for transcription factor binding sequences, it was found that at least
202 potential sites were present, including putative AP2 and Sp1
binding sites. The relative positions of some of these eukaryotic
transcription factor-associated sequences are indicated in Fig. 3. The
numbering of the sequence is relative to the most upstream
transcription start site (as detailed below). The Ag-1 sequence was
also compared with known promoter sequences in the eukaryotic promoter
data base (EPD) using the National Center for Biotechnology Information
BLAST server (25), and no extensive identity with other promoter
sequences was found. However, tracts of multiple dA and dT residues,
analogous to those found in Ag-1 in the ranges 250 to 280 and −144 to
−78, respectively, were seen to occur in many other described promoter regions. These dA and dT tracts, in particular the dT16
from −87 to −78 and the dA21 from 250 to 270, constitute
an inverted repeat or palindrome with the potential to give rise to a
pair of large stem-and-loop structures or a cruciform structure (28).
Hence, additional analyses were performed on the Ag-1 sequence with the aim of detecting other, less obvious, palindromic sequences.
The Ag-1 sequence from positions −300 to 340 was analyzed by
comparison to its own reverse complement sequence with the Wisconsin Package program COMPARE. The dot plot reveals a widely spaced pair of
inverted repeats centered around −100 and 250, corresponding to the dT
and dA tracts, separated by over 300 bp. However, no other potential
secondary structures of comparable scale are seen in this sequence with
the window/stringency parameters used in this analysis; a few less
prominent repeat pairs occur in the downstream third of the sequence.
Interestingly, the putative Sp1, AP2, and TFII sites in addition to
other potential factor-specific sequences, as well as all three of the
mapped start sites, lie in the putative loop portion of this potential
structure. Such secondary structures, in addition to potential
transcription factor binding sites, may be involved in mechanisms by
which the aggrecan message is developmentally regulated.
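The self-comparison used to find inverted repeats can be sketched as follows. This is not the COMPARE program itself: the function names, the default window and stringency values, and the toy sequence are assumptions chosen for illustration. A "dot" is recorded wherever a window of the sequence matches a window of its own reverse complement at or above the stringency, which is the general idea behind a window/stringency dot plot.

```python
# Sketch of inverted-repeat detection by comparing a sequence with its own
# reverse complement. Names, parameters, and the toy sequence are
# hypothetical. Runtime grows with the square of the sequence length.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def inverted_repeat_dots(seq, window=8, stringency=8):
    """Return (i, p) pairs of 0-based forward-strand window starts.

    A pair is reported when the window starting at i matches the reverse
    complement of the window starting at p in at least `stringency` of
    `window` positions. Each repeat pair appears as (i, p) and (p, i).
    """
    seq = seq.upper()
    rc = reverse_complement(seq)
    n = len(seq)
    dots = []
    for i in range(n - window + 1):
        for j in range(n - window + 1):
            matches = sum(1 for k in range(window) if seq[i + k] == rc[j + k])
            if matches >= stringency:
                # rc[j:j+window] is the reverse complement of the forward
                # window that starts at n - j - window.
                dots.append((i, n - j - window))
    return dots


if __name__ == "__main__":
    # Hypothetical toy sequence: a dA tract and a dT tract around a spacer.
    toy = "CAAAAAAAAGGTCTGATTTTTTTTC"
    for i, p in inverted_repeat_dots(toy):
        print(i, p)
```

Lowering the stringency below the window size admits imperfect repeats, at the cost of more background dots.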
Two methods, S1 analysis and primer extension, were used to locate the
sites where transcription of the aggrecan mRNA is initiated. Because the 5′-untranslated cDNA sequence previously reported by
this laboratory (11, 12) overlaps with the 3′ end of the Ag-1 genomic
isolate by 145 nucleotides, transcription initiation occurs still
farther upstream in Ag-1. Templates used to generate single-stranded
DNA probes for S1 analysis included the 900(+) and D(+) plasmid
constructs, as represented in Fig. 4C. S1
analysis with the downstream primer Z2 yielded three major protected
fragments: 226 bp, 187 bp, and a 69/70-bp doublet, corresponding to
start sites at positions 1, 40, and 157-158 (Fig. 4A, lanes
1 and 2). Position 1 in Fig. 3 is defined as the
farthest 5′ transcription starting site. These locations were obtained
with probes generated from both the 900(+) and the D(+) constructs. The
two upstream transcription start sites at positions 1 and 40 were
confirmed with the downstream primer Z3-generated probes, again using
the 900(+) and D(+) constructs as DNA templates (Fig. 4B, lanes
4 and 5). Z3-generated probes from the 900(+) and D(+)
constructs gave protected fragments of 101 and 62 bp, respectively,
confirming the position 1 and 40 transcription starting sites. The Z3
primer lies upstream of the 157/158 transcription starting site.
Primer extension experiments used the same antisense oligonucleotides, Z2 and Z3, as used in the S1 analyses. Primer extensions on RNA from cultured day-14 sternal chondrocytes gave products of the same sizes as the corresponding S1-protecting experiments, confirming the three transcription starting sites at positions 1, 40, and 157-158, as shown in Fig. 4, A and B, lanes 3 and 6. These results are represented schematically in Fig. 4D.
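The relation between fragment length and start site position can be made explicit with a short calculation. Because each antisense primer is labeled at its 5′ end, a protected or extended fragment of a given length ends that many nucleotides upstream of the labeled position, counting the labeled nucleotide itself. The primer coordinates (Z2 at nt 193-226, Z3 at nt 68-101) and fragment sizes are those given in the text; the function and variable names are hypothetical.

```python
# Worked arithmetic: transcription start position from fragment length.
# Primer coordinates and fragment lengths come from the text; names are
# hypothetical. All positions here are positive, so the absence of a
# position 0 in the numbering does not affect the calculation.

PRIMER_5PRIME_END = {"Z2": 226, "Z3": 101}  # labeled end of each antisense primer


def start_site(primer, fragment_length):
    """Position of the 5' end of the RNA implied by a fragment length."""
    return PRIMER_5PRIME_END[primer] - fragment_length + 1


if __name__ == "__main__":
    for primer, length in [("Z2", 226), ("Z2", 187), ("Z2", 70),
                           ("Z2", 69), ("Z3", 101), ("Z3", 62)]:
        print(primer, length, start_site(primer, length))
```

Both primers place the two upstream sites at the same positions, which is the sense in which the Z3 data confirm the Z2 data.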
Functional Analysis of the Aggrecan Promoter Sequence
Transient transfections of day-14 chick embryo sternal
chondrocytes with the construct Ag-1(+) (the forward orientation of the
1.8-kb insert in the promoter/enhancer-free pGL2-Basic reporter vector)
revealed a plasmid dose-dependent level of luciferase expression (Fig. 5A), i.e.
increasing concentrations of transfected construct produced increases
in luciferase activity, establishing that the 1.8-kb region contains
elements capable of promoter function. In subsequent experiments,
constructs Ag-1(+) and Ag-1(−), in addition to pGL2-Basic vector with
no insert, were transiently transfected into both 14-day-old chick
sternal chondrocytes and, to examine tissue specificity, into 10-day-old chick embryo fibroblasts. In transfected chondrocytes, the
construct Ag-1(+) produced a 45-fold increase in luciferase activity
compared with the no-insert control (Fig. 5B), whereas
transfected fibroblasts produced less than a 10-fold increase.
Transfections with either the negative control pGL2-Basic vector with
no insert or the Ag-1(−) construct resulted in much lower luciferase
expression, with activity equivalent to background in both transfected
chondrocytes and fibroblasts.
A series of constructs that progressively deleted the Ag-1(+) sequence
was used to relate the locations of potential transcription factor
binding sites and secondary structure to promoter function and tissue
specificity. The constructs and transfection results are summarized in
Fig. 6. The initial deletion removed approximately 500 bp from the upstream end of the Ag-1(+) construct, as well as a tract
of 21 dA residues from the downstream end. The resulting construct,
1300(+), produced a modest increase in luciferase activity in
chondrocytes versus that promoted by the construct Ag-1(+). Transfected fibroblasts showed little difference in luciferase activity
from Ag-1(+) to 1300(+); the latter was slightly lower. Deletion of
approximately 400 bp more from the 5′ end (including a CIIS2 site) generated the
construct 900(+); this deletion had a dramatic effect, because both
chondrocyte and fibroblast luciferase yields nearly tripled when
compared with assays of the original Ag-1(+) construct (to 140- and
30-fold, respectively). Although chondrocyte activity remained
substantially higher than that in fibroblasts, there was a greater
proportional increase in luciferase activity in fibroblasts, 260% when
compared with the 1300(+) construct in fibroblasts versus a
160% increase in chondrocytes. This increase may be due to loss of
tissue specificity or to coincidental but independent effects of
silencers in both cell types.
Removal of approximately 400 additional bp from the upstream end of the 900(+) construct (including another CIIS2 site) produced the 500(+) construct. Promoter activity in chondrocytes returned to approximately 50-fold, similar to that assayed for the constructs Ag-1(+) and 1300(+); yet in fibroblasts, luciferase activity of the 500(+) construct was only slightly lower than that seen for the 900(+) construct (Fig. 6). This finding suggests that the upstream half of 900(+) may contain enhancer elements that are used in chondrocytes.
A newly generated construct A(+), 590 bp, was made that was similar to
the 500(+) construct, except that the insert contained the 3′ stretch
of poly(dA) regions and 36 bp in the 5′ direction to include the
putative IgHC.21 site (Fig. 3). These changes produced a modest
increase in luciferase activity in chondrocytes only. Measured
luciferase activity in fibroblasts modestly decreased when compared
with the luciferase activity measured from fibroblasts transfected with
the 500(+) construct. The deletion construct B(+), 547 bp, which does
not contain the IgHC.21 site, lost approximately 40% of the activity
of the A(+) construct in chondrocytes; the activity in fibroblasts was
reduced by 70%, resulting in luciferase activity as low as that seen
for many of the (−) constructs. A further deletion construct, D(+),
376 bp, which included only the three transcription start sites and the
putative Sp1 and AP2 binding sites, produced a significant amount of
luciferase activity in chondrocytes (nearly 60-fold), and in
transfected fibroblasts luciferase activity was equivalent to the
1.8-kb Ag-1(+) construct. The D(+) construct deleted the poly(dT)
region but included the poly(dA) region. The 308-bp construct, E(+),
included the three major start sites at positions 1, 40, and 157/158
but did not include the consensus sequences Sp1-CS4, GR-MT-IIA, and
AP-2-CS4. Deletion of these potential nuclear factor binding sites
caused a 75% loss of activity in chondrocytes while not substantially altering luciferase activity in transfected fibroblasts. Construct E(+)
had comparable luciferase activity in both chondrocytes and fibroblasts
of approximately 15-fold when compared with the no-insert control
vector. The 140-bp construct F(+) did not include any of the determined
starting sites and produced modest luciferase activity in transfected
chondrocytes and baseline luciferase activity in transfected
fibroblasts. In all but one instance, the reverse orientation
constructs of all of these genomic fragments yielded minimal luciferase
activity in both transfected chondrocytes and fibroblasts. That
exception, the activities seen for the 500(−) construct, suggests that
some low-level promoter activities may result from largely accidental
sequence assemblages. In sum, the data suggest the following functional
roles for portions of the aggrecan 5′-flanking sequence in the two cell
types: 1) general repression upstream of the pr900 site, especially
between −638 and −1038 (pr1300); 2) strong chondrocyte-specific
enhancement in the pr900-pr500 interval (−638 to −247); 3) a positive
element, possibly IgHC.21, occurs in the small prA-prB interval (−283 to −240); 4) the prB-prD segment (−240 to −69) has a negative role, strongest in fibroblasts; and 5) the small (−69 to −1) prD-prE interval, bearing Sp1 and AP-2 elements, is stimulatory in
chondrocytes. It is also apparent that constructs lacking either the dT
or dA tracts (e.g. 900(+) and D(+)) are quite active;
therefore, interaction between these repeats is not required for
promoter function in this system.
We have found that a 1.8-kb genomic fragment from the 5′ end of
the chick aggrecan gene is able to drive expression of the pGL2-Basic
luciferase reporter gene in a tissue-specific manner. Determining the
sequence of this construct revealed more than 202 potential
transcription factor binding sites. This structural information allowed
us to proceed with a functional analysis of the effects of potentially
active cis elements that may confer tissue and developmental
specificity on expression of the aggrecan gene by using a series of
nested deletion constructs. These sequences ranged from the full 1.8 kb
(Ag-1(+)) to a minimal 140-bp construct (F(+)).
Of the numerous potential cis elements found in the Ag-1
sequence, several are of particular interest with respect to control of
aggrecan expression. Positions −873 and −721 in the Ag-1 sequence are
the 5′ ends of two copies of the sequence CACCTCC (CIIS2), which has
been suggested to be a silencer motif in the COL2A1 promoter
(29). This particular sequence has been shown to inhibit transcription
of the type II collagen promoter in fibroblasts while not significantly
changing expression in chondrocytes (29). Indeed, this seems to be
consistent with our results because deletion of these two motifs from
the 1300(+) to 900(+) constructs reduced the cell type specificity of
luciferase expression while the overall promoter activities increased.
This motif is also present in the promoter region of COL4A2;
however, tissue-specific regulation in fibroblasts versus
chondrocytes remains to be investigated in this system (30).
The chick aggrecan 5′-flanking region contains a second silencer
consensus sequence, (CIIS1) ACCCTCTCT (29) at position 127, which is
also found in COL2A1. The CIIS1 sequence occurs in an
interspersed rat repetitive sequence (31) and in another repetitive
sequence found in the avian genome named the CR1 element (32, 33).
Further negative regulatory functions have been shown in the chick
lysozyme gene (34), rat insulin gene (31), mouse IgH gene (35), human
β-interferon gene (36), and the human ε-globin gene (37). In the
Ag-1 sequence, this motif is located within 200 bp downstream of the
putative Sp1 site. A "push and pull" mechanism has been proposed
for transcriptional regulation in two systems, the low density
lipoprotein receptor gene and the COL2A1 gene (29, 38). This
model proposes that the sterol-dependent binding of a
protein to a consensus sequence could inhibit the positive activation
of a nearby Sp1 binding site (38); such a silencer element acting in a
"push and pull" mechanism could likewise be responsible for the
temporal and tissue-specific regulation of the aggrecan gene.
The Ag-1 sequence contains one putative NF-I site at position −1282.
The NF-I proteins are transcriptional activators derived from a
multigene protein family in the vertebrate phylum (39-42). Chick
tissues contain NF-I products that are derived from four separate genes
that have the potential of producing 12 isoforms (42). Recently, it has
been shown that the silencer SI is very similar to the NF-I/CTF family,
and an additional silencer, SII, is similar to an NF-I/CTF half site
(43). This suggests that NF-I-related proteins can mediate
transcriptional repression in cells of mesenchymal origin (42). Our
sequence does not contain the sequence motifs of SI or SII, but Szabo
et al. (43) suggest that the NF-I family of regulator
proteins can be modulated as silencers in addition to their previously
accepted role as activators. The presence of a putative NF-I site
raises the possibility of mesenchyme-specific regulation controlled by
this element in addition to possible modulation by unreported
silencers, thus creating a more dynamic system than one based solely on
NF-I activation.
From footprinting analysis, Long and Linsenmayer (44) reported a novel
transcription factor binding sequence, ACACACAGA, acting in the
regulation of COL10A1, and suggested that this factor may
act as a silencer. The proximal promoter region of COL10A1 is responsible for regulating expression in hypertrophic
chondrocytes (44). Our reported sequence contains four positions,
−1140, −491, 151, and 214, where the CACACA motif is present. Perhaps these sequences are involved in chondrocyte-specific expression of
aggrecan. The CACACA motif may also be relevant because repeats of
(CA)n are markers for Z-DNA formation, contributing to
secondary structure (45). Moreover, this motif has been shown to be a potential hot spot for recombination and can contribute to gene expression (26). Clustering of these sequences near the transcriptional start sites that have been identified for chick aggrecan may contribute to the mechanism of transcriptional regulation by altering DNA secondary structure.
The chick aggrecan promoter exhibits <40% sequence similarity to either the mouse promoter (19) or the 120-bp rat (20) promoter fragment, indicating that this promoter/enhancer region is not highly conserved across the taxa. Interestingly, the untranslated first exon in chick aggrecan contains less than 45% similarity compared with rat, mouse, or human sequences (19, 20, 46). Although the lack of identifiable similarity between the chick and mammalian aggrecan first exons might be attributable to the existence of fewer selection pressures on an untranslated sequence, this argument is not readily extended to promoter sequences. Also puzzling is that although the rat and mouse promoter sequences share 93% identity with each other, none of the described transcriptional start sites coincide with each other in these two similar promoter regions.
There are, however, similarities in TATA-binding motifs among promoters
of cartilage-specific genes. As is the case for the mouse and rat
aggrecan promoter regions, the chick 5′-flanking sequence lacks a
classical TATA box and contains multiple transcriptional start sites
(19, 20). Although a TATA-less promoter with multiple GC-rich regions
is the hallmark of many housekeeping genes (47), many other genes that
are temporally regulated have been shown to have promoters with similar
structures (48, 49). It is interesting that the 5′-flanking sequence of
the chick link protein gene also contains multiple transcription start
sites and lacks a classical TATA box (50); rather, it has a TATA
motif-like sequence TCTAA (51). The chick aggrecan sequence contains
two TCTAA motifs, one that is 31 bp and another that is 94 bp upstream of the start sites at positions 40 and 157-158, respectively (Fig. 3).
The TCTAA sequence is also present in the human and chick link protein
promoter region (50, 52) and in the serine/glycine-rich proteoglycan
(51). However, human link protein has only one transcription start site
(52). Thus, it would be interesting to determine whether the human
aggrecan sequence has only one transcription start site, which would
provide further evidence for similarity in the evolution of the link
protein and the aggrecan genes, as has been suggested (13).
Overall, this study has established the 5′-flanking sequence as having
three major transcription start sites in addition to several putative
cis elements and a potential secondary structure that may
control expression of the aggrecan gene. We have demonstrated tissue-specific promoter activity with the 1.8-kb region and have systematically mapped subregions that produce activation or repression of downstream reporter genes in two cell types in culture. This study
paves the way for more directed studies of the individual cis elements identified and their interaction with
trans-acting factors so that we may better understand the
mechanisms by which the aggrecan gene is regulated.
The nucleotide sequence reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number U83593.
We thank Dr. Miriam Domowicz for helpful suggestions during the course of this study.