(Received for publication, June 11, 1996, and in revised form, November 21, 1996)
From the Department of Molecular Genetics, The
University of Texas M. D. Anderson Cancer Center,
Houston, Texas 77030 and the § Department of Basic
Gerontology, National Institute for Longevity Sciences, 36-3, Gengo
Morioka-cho, Obu, Aichi 474, Japan
We have used the yeast one-hybrid system to
clone transcription factors that bind to specific sequences in the
proximal promoters of the type I collagen genes. We utilized as bait
the sequence between −180 and −136 in the pro-α2(I) collagen
promoter because it acts as a functional promoter element and binds
several DNA-binding proteins. Three cDNA clones were isolated that
encoded portions of the mouse SPR-2 transcription factor, whereas a
fourth cDNA contained a potential open reading frame for a
polypeptide of 775 amino acids and was designated BFCOL1. Recombinant
BFCOL1 was shown to bind to the −180 to −152 segment of the mouse
pro-α2(I) collagen proximal promoter and to two discrete sites in the
proximal promoter of the mouse pro-α1(I) gene. The N-terminal portion
of BFCOL1 contains its DNA-binding domain. DNA transfection experiments using fusion polypeptides with the yeast GAL4 DNA-binding segment indicated that the C-terminal part of BFCOL1 contained a potential transcriptional activation domain. We speculate that BFCOL1
participates in the transcriptional control of the two type I collagen
genes.
Type I collagen is a protein that is abundantly synthesized by a
discrete number of cell types including osteoblasts, odontoblasts, fibroblasts, smooth muscle cells, and mesenchymal cells. It is composed
of two α1 chains and one α2 chain forming a characteristic triple
helix. Expression of the genes for these polypeptides is coordinately
regulated in a variety of physiological and pathological situations
(1). Changes in the synthesis of type I collagen occur not only during
embryonic development in specific tissues but also
in disease states, for example during wound healing and in
fibrotic diseases such as lung fibrosis, cirrhosis, and scleroderma. In
many of these instances it is likely that the control of expression of
the two type I collagen genes is mainly exerted at the level of
transcription, but the precise mechanisms that control transcription of
these genes are still poorly understood. Our long term goal is to
identify the critical cis-acting elements in these two genes
and both the cell-specific and ubiquitous transcription factors that
presumably control their expression.
Recently, transgenic mouse studies have identified strong
tissue-specific enhancer elements in the 5′-flanking regions of both
type I collagen genes (2-5). These elements are located further
upstream than the proximal promoter elements. For instance, in the
mouse pro-α1(I) gene, a potent enhancer element for osteoblast and
odontoblast expression was localized about 1.6 kilobases
(kb) upstream of the start of
transcription, whereas another strong element for expression in tendon
and fascia fibroblasts was found between −2.3 and −3.2 kb (2).
Similar experiments from other laboratories have produced analogous
results (3, 4). These experiments strongly suggested that separate
elements control the expression of this gene in different type I
collagen-producing cells. In the pro-α2(I) gene, an element that
strongly enhanced expression in fibroblasts and mesenchymal cells was
located between 13.5 and 17.5 kb upstream of the transcription start
(5). One can speculate that proteins binding to the upstream enhancers in both type I collagen genes cooperate with transcription factors binding to the proximal promoters to activate transcription in specific
cell types.
Previous studies have identified several functional
cis-acting elements in the 350-bp (base pair) proximal
promoter of the mouse pro-α2(I) collagen gene (6). These included a
binding site for the ubiquitous heterotrimeric CCAAT-binding factor
(CBF), between −75 and −98 (7-9), redundant GC-rich binding sites
for several proteins between −65 and −105, between −114 and −131, and between −152 and −176 (10). Several classes of
mainly ubiquitous proteins bind to these redundant sites. Transient
expression and in vitro transcription experiments with wild-type and mutant templates indicated that the segment between −40
and −170 containing the three redundant elements was essential for
promoter activation. Other studies identified three short cis-acting GC-rich elements in the human pro-α2(I)
collagen gene between −323 and −264 (11) that were capable of binding
SP1. Additional studies presented evidence that a protein complex which includes SP1 binds to this segment of the human promoter and
participates in the transforming growth factor-β activation of this
promoter (12). In the mouse promoter there is also a binding site for CTF/NF1 between −305 and −290 (13).
In the mouse pro-α1(I) collagen gene, the sequence between −220 and
the TATA box presents strong homologies with the sequence of the
pro-α2(I) gene in the same region. This DNA segment contains binding
sites for DNA-binding factors that also bind to the proximal pro-α2(I) promoter. The DNA elements in the pro-α1(I) promoter include a binding site for CBF between −90 and −115, two apparently redundant sites between −190 and −170 and between −160 and −130 for
a DNA-binding protein previously designated inhibitory factor 1 (IF-1),
and two sites between −130 and −80 that flank the CBF-binding site
and are binding sites for SP1 and probably other GC-rich binding
proteins (14). DNA transfection experiments with the pro-α1(I)
promoter showed that point mutations in the CBF-binding site decreased
promoter activity, whereas small substitution mutations in some of the
other sites resulted in an increase in transcription (14, 15). It was
also shown that the sequence of the pro-α2(I) promoter between −173
and −143 was able to compete for the binding of a protein that was
forming a major DNA-protein complex with two redundant elements in the
pro-α1(I) promoter between −190 and −170 and between −160 and −130, suggesting that both type I promoters contained binding sites
for the same proteins (15). Fig. 1 summarizes binding
sites for DNA binding proteins in the two mouse proximal type I
collagen promoters.
The purpose of the present study was to identify one or more
trans-acting factors that bind to these proximal
cis-acting elements in the type I collagen promoters. We
have used the sequence between −180 and −136 of the pro-α2(I)
collagen promoter to clone cDNAs for proteins binding to this
segment using the yeast one-hybrid system (16, 17). This segment was
chosen mainly because previous experiments with the pro-α2(I)
proximal promoter showed that the DNA segment between −180 and −136
was capable of binding an array of DNA-binding proteins, many of which
also bound to the same promoter between −133 and −105 and between −105 and −65; the sequence between −180 and −136 also bound
these proteins with greater efficiency than the more proximal sequences
(10). One of the cDNAs that was cloned encodes a polypeptide of 775 amino acids, which also bound to two discrete sites in the pro-α1(I)
collagen promoter.
The yeast strain BY 164 (MATa his3Δ200 leu2-3,112 ura3-52 lys2-801a trp1a) was provided by Dr. Stevan Marcus.
The yeast reporter plasmid was constructed as follows. Six tandem
copies of a double-stranded oligonucleotide corresponding to the
sequence from −180 to −136 bp of the mouse pro-α2(I) collagen
promoter were inserted into the BamHI site of the vector
pRS315HIS containing the LEU2 gene as selectable marker (16,
18, 19) to generate pRS315HIS-6x160 (160 denotes the sequence between −180 and −136). The XbaI-SalI fragment of
pRS315HIS-6x160 was then subcloned into the
XbaI-SalI site of the vector pRS305 (16); this
plasmid was designated pRS305HIS-6x160. After digestion with
ClaI, this vector was used for transformation. Yeast
transformation was performed by the polyethylene glycol/lithium acetate
method (20). Plasmid integration in the genome of yeast strains was
confirmed by Southern blot analysis using a 32P-labeled
oligonucleotide from −180 to −136 bp. Cells were then plated on a
minimal synthetic dextrose plate without histidine to verify background
HIS3 gene activity. One of the yeast strains that had
minimal HIS3 gene activity was also selected as the strain for the transformation after the initial selection. Plasmid
pJL638-6x160 contained six tandem copies of the sequence from −180 to −136 of the mouse pro-α2(I) promoter in the pBgl-lacZ
plasmid harboring the URA3 gene as a selectable marker (17).
The yeast strain in which both pRS305HIS-6x160 and pJL638-6x160
plasmids were integrated was used for cDNA library transformation.
However, because lacZ was constitutively expressed at low
levels in this strain, 5-bromo-4-chloro-3-indolyl-β-D-galactoside staining could not be used for screening
HIS3-positive colonies. cDNAs were generated from the
mRNA of primary fibroblasts of 14-day mouse embryos by priming with
either oligo(dT) or with a random hexamer using the TimeSaver cDNA
synthesis kit (Pharmacia Biotech Inc.) and cloned into the
EcoRI-NotI site or EcoRI site of
plasmid pPC86 containing the TRP1 gene as a selectable
marker (19). The ligation products were electroporated into the
Escherichia coli strain MC1061, and the resultant
transformants (6 × 10^6 for the directionally cloned
library, 3 × 10^6 for the non-directionally cloned
plasmid library) were plated onto ampicillin plates. After scraping the
cells from the plates, the plasmid library was purified with Promega's
WizardTM Maxiprep DNA purification system with additional
phenol and chloroform extractions. Ten to 20 µg of cDNA plasmid
from the libraries were transformed into the yeast strain harboring the
two reporter plasmids integrated into the genome and plated onto plates
lacking leucine, uracil, tryptophan, histidine, and
3-amino-1,2,4-triazole. Transformation efficiency was about 2-3 × 10^5/µg cDNA plasmid. Colonies were picked after
5-7 days. Plasmid cDNAs were extracted and used for
retransformation either into the same yeast strain or the yeast strain
into which the pRS305-HIS plasmid instead of the pRS305-HIS-6x160
plasmid had been integrated.
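As a rough check on screening depth, the number of yeast transformants per transformation follows from the plasmid mass and the efficiency quoted above. The sketch below simply multiplies the two stated ranges; the function name is illustrative, and the result is an order-of-magnitude estimate rather than a figure reported in the text.

```python
def expected_transformants(plasmid_ug, efficiency_per_ug):
    """Estimated transformants = plasmid mass (µg) × efficiency (transformants/µg)."""
    return plasmid_ug * efficiency_per_ug

# Ranges quoted in the text: 10-20 µg of library plasmid,
# about 2-3 × 10^5 transformants per µg.
low = expected_transformants(10, 2e5)
high = expected_transformants(20, 3e5)

print(f"Estimated yield: {low:.0e} to {high:.0e} transformants per transformation")
```

Under these assumptions a single transformation yields on the order of a few million yeast transformants, comparable to the number of independent clones in each plasmid library.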
DNA sequencing was carried out using a
primer present in the DNA for the GAL4 transactivation domain
(5′-GGATGTTTAATACCACT-3′) or T3, T7, and SP6 primers.
Three different recombinant polypeptides corresponding to the full-length, the N-terminal part, and the C-terminal part of BFCOL1 were generated using the TNT-coupled reticulocyte lysate system (Promega Corp). For the full-length polypeptide, the SalI-NotI fragment of pPC86-BFCOL1 was inserted into the SalI-NotI site of the pBluescript KS vector (pBS-BFCOL1-full). For the N-terminal and C-terminal polypeptides, the SalI-NsiI fragment and the ScaI-NotI fragment of pPC86-BFCOL1 DNA were inserted into the SalI-PstI site of pGEM 5Zf(+) and the EcoRV-NotI site of pBluescript KS to generate p5Zf-BFCOL1-N and pBS-BFCOL1-C, respectively. 35S-Labeled polypeptide products were analyzed using 10% sodium dodecyl sulfate (SDS)-polyacrylamide gel electrophoresis. These were exposed to Fuji RX film.
Generation of Fusion Polypeptides with Glutathione S-Transferase (GST)
Three different fusion polypeptides were generated. For the full-length and N-terminal fusion polypeptides, the SalI-NotI fragment of pPC86-BFCOL1 and the SalI-NotI fragment of p5Zf-BFCOL1-N were inserted, respectively, into the XhoI-NotI site of the pGEX-4T-3 vector (Pharmacia Biotech Inc.). For the C-terminal fusion polypeptide, the SalI-NotI fragment of pBS-BFCOL1-C was inserted into the SalI-NotI site of the pGEX-4T-3 vector. Procedures for production and purification of fusion polypeptides were carried out as suggested by the manufacturer.
DNase I Footprinting
DNA-binding reactions were performed
according to the method of the Core Footprinting System (Promega Corp).
Ten femtomoles (1 µl) of the end-labeled
BamHI-NarI fragment containing the −350 to +7-bp
sequence of the pro-α2(I) collagen promoter inserted in the
HindIII site of pA3LUC (20) was used as a DNA substrate. Binding reactions were started by addition of the glutathione S-transferase (GST) protein or GST-BFCOL1 full-length
recombinant polypeptide. At the end of the reaction the samples were
heat-denatured and loaded onto a 6% polyacrylamide, 8 M
urea gel. Gels were then autoradiographed at −80 °C with an
intensifying screen.
One microliter of recombinant protein
(either products of in vitro transcription and translation
or GST-fusion polypeptides) was incubated with 5 fmol of end-labeled
double-stranded oligonucleotide in a volume of 10 µl. Incubation was
carried out at room temperature for 20 min. All binding reactions
contained 10 mM Tris-HCl (pH 7.5), 4% glycerol, 50 mM NaCl, 0.5 mM EDTA, 0.5 mM
dithiothreitol, 1 mM MgCl2, and 0.5 µg of
poly(dA-dT). Following electrophoresis in a 5% polyacrylamide Tris
borate/EDTA gel, the gel was dried and subjected to autoradiography at
room temperature. The SP1 consensus oligonucleotide was purchased from
Promega. The Krox consensus oligonucleotide (23) and other
oligonucleotides containing specific sequences of the pro-α1(I) and
pro-α2(I) collagen promoters were produced by an oligonucleotide
synthesizer. Sequences of oligonucleotides used for competition
experiments are listed in Fig. 2.
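For orientation, the probe concentration in these binding reactions can be worked out from the quantities given above. The sketch below uses the identity 1 fmol/µl = 1 nM; the helper name is illustrative and the calculation is a convenience, not part of the published protocol.

```python
def probe_concentration_nM(amount_fmol, volume_ul):
    """Concentration in nM; 1 fmol per µl equals 1 nM."""
    return amount_fmol / volume_ul

# Gel shift reaction described in the text: 5 fmol of probe in 10 µl.
gel_shift_probe_nM = probe_concentration_nM(5, 10)
print(f"Probe concentration: {gel_shift_probe_nM} nM")
```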
Transfection Experiments
The SmaI-SacI fragment of pPC86-BFCOL1, which
contains the full-length DNA of BFCOL1, and the ScaI
fragment of pPC86-BFCOL1, which contains the N-terminal portion of
BFCOL1, were subcloned into the SmaI-SacI- and
SmaI-digested pSG424 vector, respectively (24). In the case
of the plasmid containing the C-terminal part of BFCOL1, we first made
two constructs. First, the XhoI-BamHI fragment of
pAS2 (25) containing the DNA-binding domain of the yeast transcription
factor GAL4 as well as several cloning sites was subcloned into pSG424
to add supplementary cloning sites (pSG424-m). Second, the
ScaI-NotI fragment of pPC86-BFCOL1 that contained the C-terminal part of BFCOL1 was subcloned into the
EcoRV-NotI site of plasmid pCITE-2a (Novagen,
Inc.) (pCITE-2a-BFCOL1-C). Finally, the NdeI-XhoI
fragment of pCITE-2a-BFCOL1-C was subcloned into the
NdeI-SalI site of pSG424-m to make the
pSG424-BFCOL1-C. Transfections were carried out using 10 µg of the
reporter plasmid containing the GAL-binding sites upstream of an SV40
promoter linked to the chloramphenicol acetyltransferase (CAT) gene, 5 µg of pSG424-derivative plasmid, and 5 µg of SVgal plasmid into 714 BALB 3T3 fibroblast cells (26). Cells were harvested after 48 h and assayed for CAT activity (27).
β-Galactosidase activity was
measured with a resorufin-β-D-galactopyranoside substrate (Boehringer Mannheim).
Total RNA was
extracted from 714 BALB 3T3 fibroblasts, NIH 3T3 fibroblasts, S194 B
cells, and EL4 T cells using TRIzolTM solution (Life
Technologies, Inc.). About 20 µg of each total RNA preparation was
electrophoresed on 1% agarose gels containing 1.1 M
formaldehyde. The RNA was transferred to Hybond-N nylon membranes
(Amersham Corp.). mRNA was detected with a 32P-labeled
EcoRI-BamHI fragment of BFCOL1 (Multiprime
labeling system, Amersham Corp.) after hybridization for 18 h at
42 °C in 5 × SSPE (1 × SSPE is composed of 0.18 M NaCl, 10 mM sodium phosphate (pH 7.7), and 1 mM EDTA), 5 × Denhardt's solution (1 × Denhardt's solution is composed of 0.02% bovine serum albumin, 0.02%
Ficoll, and 0.02% polyvinylpyrrolidone), 50% formamide, 0.1% SDS, 50 µg/ml heat-denatured salmon testis DNA, and radioactive probe.
Membranes were washed twice for 15 min each at 65 °C in a solution
containing 2 × SSC (1 × SSC is composed of 0.15 M NaCl and 15 mM sodium citrate) and 0.1% SDS,
then once in 1 × SSC with 0.1% SDS for 30 min at 65 °C, and
finally twice for 15 min each in 0.1 × SSC with 0.1% SDS at room
temperature. The membranes were then autoradiographed at −80 °C
using Fuji RX film. Human glyceraldehyde-3-phosphate dehydrogenase
cDNA (Ambion) was used as an internal control.
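The working concentrations of the hybridization and wash buffers follow directly from the 1 × definitions given above. The sketch below scales those definitions to the strengths used in the protocol; the dictionary and function names are illustrative only.

```python
# 1 × compositions as defined in the text, in millimolar (mM).
SSPE_1X = {"NaCl": 180.0, "sodium phosphate": 10.0, "EDTA": 1.0}
SSC_1X = {"NaCl": 150.0, "sodium citrate": 15.0}

def working_strength(stock_1x, fold):
    """Scale every component of a 1 × buffer to the requested strength."""
    return {name: conc * fold for name, conc in stock_1x.items()}

for label, stock, fold in [
    ("5 × SSPE (hybridization)", SSPE_1X, 5),
    ("2 × SSC (first washes)", SSC_1X, 2),
    ("1 × SSC (second wash)", SSC_1X, 1),
]:
    parts = ", ".join(f"{c:g} mM {n}" for n, c in working_strength(stock, fold).items())
    print(f"{label}: {parts}")
```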
Total RNAs of S194 B cells, HT29 cells, and HSC34 cells
were annealed with the oligonucleotide 5′-CTCTTAATCTCCACATTCAGTGCCTG-3′
(12c) and reverse-transcribed using avian myeloblastosis virus reverse
transcriptase. The cDNA products were then subjected to PCR using
oligonucleotides 12n (5′-AATAGTAAGAGAAGTCTGAA-3′) and 12c. The sense
strands of 12n and 12c are indicated in Fig. 3, A and B.
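Basic properties of the two RT-PCR primers can be checked with a few lines of code. The sketch below reports the length, GC fraction, and reverse complement of each primer sequence as printed above; the helper functions are illustrative and are not tools named in the text.

```python
# Primer sequences as given in the text, written 5' to 3'.
PRIMERS = {
    "12n": "AATAGTAAGAGAAGTCTGAA",
    "12c": "CTCTTAATCTCCACATTCAGTGCCTG",
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]

def gc_fraction(seq):
    """Fraction of bases that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq)

for name, seq in PRIMERS.items():
    print(f"{name}: {len(seq)} nt, GC {gc_fraction(seq):.0%}, "
          f"reverse complement 5'-{reverse_complement(seq)}-3'")
```

Because 12c primes reverse transcription, its reverse complement is the form that corresponds to the sense strand of the cDNA.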
Previous studies indicated that the GC-rich DNA segment
between −180 and −136 of the mouse pro-α2(I) collagen gene was able to bind several different DNA-binding proteins in vitro and
that this segment was also able to compete for the binding of proteins to two other redundant but discrete sites closer to the transcription start site of this promoter (10). A deletion of this same segment resulted in substantial decrease in promoter activity. To begin to
identify some of the proteins that bound to this segment, we used this
DNA as bait in the yeast one-hybrid system and screened two mouse
embryo fibroblast cDNA libraries, one primed with oligo(dT) (library 1) and the other primed with a random hexamer oligonucleotide (library 2). In the yeast strain that was used for selection, plasmid
pRS305-HIS-6x160 was integrated into the genome. In this plasmid, six
tandem copies of the sequence of the pro-α2(I) gene between −180 and −136 were cloned upstream of a minimal yeast GAL1 promoter itself
linked to the HIS3 gene. After screening six million
independent colonies from library 1 and three million from library 2, an initial 81 histidine-positive colonies from library 1 and 44 histidine-positive colonies from library 2 were picked; 17 cDNA
plasmids from library 1 and 12 cDNA plasmids from library 2 gave
positive colonies upon retransformation of the parental strain.
However, most of these also gave histidine-positive colonies with the
yeast strain in which the control plasmid pRS305-HIS was integrated, a
plasmid that contains the minimal GAL1 promoter but not six tandem
copies of the −180 to −136 sequence. Only four cDNAs, all from
library 1, could specifically activate the HIS3 gene of the
yeast strain containing the pRS305-HIS-6x160 plasmid without activating
the HIS3 gene of the strain with the pRS305-HIS control
plasmid. This suggested the possibility that the recombinant fusion
polypeptides encoded by each of these four cDNAs might bind
specifically to the −180 to −136-bp segment of the pro-α2(I) collagen gene and not to the DNA sequence of the GAL1 minimal promoter.
One cDNA clone was designated as BFCOL1 (see below). Two other
cDNAs contained an almost full-length coding sequence for mouse
SPR-2 (28), and the fourth cDNA was a shorter partial cDNA for
SPR-2.
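The attrition at each stage of the screen can be tallied from the colony counts reported above. The sketch below is a simple bookkeeping aid; the dictionary keys are illustrative labels, and the zero for library 2 follows from the statement that all four specific cDNAs came from library 1.

```python
# Colony and clone counts as reported in the text.
screen = {
    "library 1 (oligo(dT))": {"picked": 81, "retransformed_positive": 17, "bait_specific": 4},
    "library 2 (random hexamer)": {"picked": 44, "retransformed_positive": 12, "bait_specific": 0},
}

def totals(counts_by_library):
    """Sum each screening stage across libraries."""
    stages = ["picked", "retransformed_positive", "bait_specific"]
    return {s: sum(lib[s] for lib in counts_by_library.values()) for s in stages}

summary = totals(screen)
for stage, n in summary.items():
    print(f"{stage}: {n} ({n / summary['picked']:.1%} of picked colonies)")
```

On these numbers only a small minority of the initially histidine-positive colonies survived both retransformation and the control for binding to the minimal promoter alone.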
Fig. 3A presents the sequence of BFCOL1 cDNA. The open
reading frame starting from the first methionine codon in the cDNA is 2328 nucleotides long and encodes a putative polypeptide of 775 amino acids. The amino acid sequence corresponding to the N-terminal
400 amino acids of BFCOL1 presents a 95% identity with human htβ, a
protein previously identified as binding to the promoter of the gene
for the Vβ8.1 chain of the human T cell receptor (29). Subsequent to
the codon for amino acid 400 in BFCOL1, the reported nucleotide
sequence of htβ DNA displays two translational frame changes compared
with that of BFCOL1 immediately followed by a termination codon (29).
The sequence of the 340 amino acids at the C terminus of BFCOL1 has no
significant amino acid sequence homology with other polypeptides
present in GenBank databases. The nucleotide sequence preceding the
initial methionine codon of BFCOL1 is also similar to that of htβ
except that the first 25 nucleotide residues at the 5′ end of the
cDNA of BFCOL1 are different from comparable residues in htβ. As
reported for htβ, the deduced amino acid sequence of BFCOL1 contains
four potential zinc finger motifs.
Since the reported nucleotide sequence of the cDNA of htβ
following the termination codon is also about 90% identical to that of
BFCOL1 DNA, we performed RT-PCR experiments in order to verify the
nucleotide sequence of the human homologue of BFCOL1 RNA. We used two
primers that bracketed the sequence containing the reported frameshifts
and stop codon in htβ and RNAs from two different human cell lines,
the colon carcinoma cell line HT29 and the stomach cell line HSC34. The
locations of the primers that were used (12n, 12c) are indicated in Fig.
3A. The sequence of the PCR product from the RNAs of these
human cells is presented in Fig. 3B and shows a continuous
open reading frame without the frameshifts that were reported
earlier.
We then asked whether the entire open reading frame shown in Fig.
3A was translated into a polypeptide of the expected size, and we constructed three plasmids for in vitro
transcription-translation. One plasmid encoded the full-length BFCOL1
(pBS-BFCOL1-full, from 1 to 2426), whereas the others encoded the
N-terminal part (p5Zf-BFCOL1-N, from 1 to 1156) and the C-terminal
segment (pBS-BFCOL1-C, from 1395 to 2426) of BFCOL1. The major product
of pBS-BFCOL1-full (Fig. 4, lane 1) was a
single polypeptide, whereas the DNA of p5Zf-BFCOL1-N (lane
2) gave rise to two major polypeptides and several fainter
species. The major polypeptide species appeared to run more slowly by
SDS-polyacrylamide gel electrophoresis than expected from their
estimated molecular sizes (expected sizes are 89 kDa for
pBS-BFCOL1-full and 43 kDa for p5Zf-BFCOL1-N), possibly because of increased
SDS binding. The major product of pBS-BFCOL1-C (Fig. 4, lane
3) was a single polypeptide that had about the expected size
(estimated size is 37 kDa).
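The sequence figures quoted for full-length BFCOL1 can be cross-checked with simple arithmetic. The sketch below assumes that the 2328-nucleotide open reading frame includes the stop codon, and it uses a generic average residue mass of 110 Da, which is a textbook approximation and not the method used to obtain the expected sizes given above.

```python
ORF_NT = 2328          # ORF length from the text; assumed here to include the stop codon
AVG_RESIDUE_DA = 110   # generic average mass per amino acid residue (approximation)

codons = ORF_NT // 3
residues = codons - 1                      # the final codon is taken to be the stop
approx_mass_kda = residues * AVG_RESIDUE_DA / 1000

print(f"{codons} codons -> {residues} amino acids, roughly {approx_mass_kda:.1f} kDa")
```

Under these assumptions the open reading frame length agrees with the 775-residue polypeptide, and the rough mass falls in the same range as the expected size quoted for the full-length product.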
DNA-binding Experiments
Gel shift experiments were performed
with the −180 to −136 DNA segment of the mouse pro-α2(I) collagen
promoter that was used in the one-hybrid screen to verify whether
BFCOL1 was able to form a DNA-protein complex under the conditions of
this assay and to determine which segment of the BFCOL1 polypeptide
contained a DNA-binding domain. In these experiments, the products of
in vitro transcription-translation shown in Fig. 4 were
tested. With the full-length BFCOL1, one major protein-DNA complex was
detected (Fig. 5, lane 4), whereas two
protein-DNA complexes were seen with the N-terminal BFCOL1 (Fig. 5,
lane 3), the upper complex being less intense than the lower
complex, although both complexes were migrating faster than the complex
with full-length BFCOL1. The presence of these two complexes could
possibly be a result of the heterogeneity of protein products observed
with the cDNA for N-terminal BFCOL1 (see Fig. 4, lane
2). With the polypeptide corresponding to the C-terminal part of
BFCOL1, no protein-DNA complexes were detected other than nonspecific
bands (Fig. 5, lane 2). These results suggested that the
N-terminal part of BFCOL1 contained a DNA-binding domain and that DNA
binding might be mediated by the four tandem zinc fingers, consistent
with previous results obtained with htβ (29).
The proximal promoter of the mouse pro-α2(I) collagen gene contains
several redundant GC-rich elements. To examine whether other segments
of the 350-bp proximal promoter of this gene contained binding sites
for BFCOL1, in vitro DNase I footprints were performed using
a recombinant GST-full-length BFCOL1 fusion polypeptide and the
promoter segment from −350 to +7. As shown in Fig. 6, the recombinant GST-full-length BFCOL1 fusion polypeptide protected the
region between −180 and −152 (lane 2) and only this
segment. When a similar DNase I footprint was performed using the
promoter labeled on the other strand (lane 4), again no
other protected regions were observed. Hence, recombinant BFCOL1 binds
only to one specific sequence in this proximal promoter and not to
other GC-rich sequences.
To further confirm the specific binding of BFCOL1 to a discrete site in
the proximal promoter of the pro-α2(I) collagen gene, gel shift
experiments were performed using a 32P-labeled −180 to −136 oligonucleotide as probe and several competitor DNA
oligonucleotides corresponding to specific sequences present in the
mouse pro-α2(I) collagen proximal promoter (Fig. 7).
In this experiment, the product of full-length BFCOL1 DNA was used, generated by in vitro transcription-translation. As
expected, the −180 to −136 oligonucleotide (lane 2)
competed for binding, as did, with somewhat less efficiency, the
shorter −176 to −152 oligonucleotide (lane 3), which is
included in the former. Other oligonucleotides from −140 to −86
(lane 4), from −135 to −104 (lane 5), from −105 to −65, which includes the CBF-binding site (lane 6),
and from −315 to −284, which includes a CTF/NF1-binding site
(lane 7), were unable to compete. An SP1 consensus
oligonucleotide had practically no effect (lane 8), and an
oligonucleotide containing a Krox consensus binding site was also
unable to compete (lane 9). When labeled −140 to −86 and −105 to −65 oligonucleotide probes were used in gel shift assays,
BFCOL1 was unable to bind to these DNAs (data not shown). These results
confirmed that BFCOL1 was specifically binding to the −180 to −152
region of the pro-α2(I) collagen promoter and indicated that this
binding site must be different from a binding site for either SP1 or
Krox and their family members.
The results of previous gel shift experiments using crude nuclear
extracts of NIH/3T3 fibroblasts were consistent with the hypothesis
that a DNA-binding protein was binding to discrete sites in both the
proximal pro-α1(I) and pro-α2(I) collagen promoters (14, 15). This
protein was tentatively designated inhibitory factor 1 (IF-1) based on
the result that substitution mutations in its binding sites in each
promoter, which inhibited its binding, resulted in an increase in
promoter activity. The binding site of IF-1 in the pro-α2(I) promoter
corresponded to the binding site for BFCOL1, whereas the binding sites
in the pro-α1(I) promoter were located between −190 and −170 and
between −160 and −130. To test whether these sites in the pro-α1(I)
promoter could also bind BFCOL1, gel shift assays were performed using
two labeled oligonucleotides from −194 to −168 and from −168 to −129 in the mouse pro-α1(I) collagen promoter in conjunction with
the GST-full-length BFCOL1 fusion polypeptide. When this GST-fusion
polypeptide was used with an oligonucleotide corresponding to the
sequence between −180 and −136 in the pro-α2(I) promoter, two
complexes were observed, a slower migrating complex and a more intense
faster migrating complex (Fig. 8, lane 1).
The difference in the pattern of DNA-protein complexes between those
observed with GST-BFCOL1 fusion polypeptides synthesized in E. coli and those seen with BFCOL1 synthesized in the reticulocyte
lysate (see Fig. 5) could possibly be due to the heterogeneity of the
GST-BFCOL1 fusion polypeptides as examined by SDS-polyacrylamide gel
electrophoresis (data not shown). Fig. 8 shows that the recombinant
GST-BFCOL1 full-length fusion polypeptide was also able to bind to the −168 to −129 pro-α1(I) oligonucleotide and with much less
efficiency to the −194 to −168 pro-α1(I) oligonucleotide DNAs (Fig.
8, lanes 9 and 17).
Mutations were then introduced into these DNA segments, i.e.
5 CGCGC
CCC 3
5
CGCGC
CCC 3
in the −194 to −168 sequence of the pro-α1(I) (sequence represents lower
strand) and 5
TCC
CCCTC 3
5
TCC
CCCTC 3
in both the −168 to −129
sequence of pro-α1(I) and the −180 to −136 sequence of the
pro-α2(I) promoter, and the mutant oligonucleotides were tested in
DNA-binding assays with recombinant BFCOL1. Lanes 8, 16, and
24 of Fig. 8 show that each of these mutations abolished the
binding of the recombinant GST-BFCOL1 full-length polypeptide. We also
performed competition experiments using the wild-type oligonucleotides
as probes. With each of the three wild-type-labeled oligonucleotide
probes, the wild-type −180 to −136 sequence of pro-α2(I) (Fig. 8,
lanes 2, 10, and 18) and the wild-type −168 to −129 sequence of pro-α1(I) (Fig. 8, lanes 4, 12, and 20) acted as strong competitors. In contrast, the mutant −180 to −136 sequence of pro-α2(I) (Fig. 8, lanes 3, 11, and 19) and the mutant −168 to −129 sequence of
pro-α1(I) (Fig. 8, lanes 5, 13, and 21) were
unable to compete. The wild-type −194 to −168 oligonucleotide of the
pro-α1(I) collagen promoter had little effect as competitor when the
other two oligonucleotides were used as probes (Fig. 8, lanes
6 and 14), confirming the notion that this sequence
binds BFCOL1 much less efficiently than the other two sites. A mutant
oligonucleotide corresponding to the −194 to −168 sequence of the
pro-α1(I) promoter had no effect as competitor with all
oligonucleotide probes (Fig. 8, lanes 7, 15, and
23). Hence, BFCOL1 binds to two sites in the pro-α1(I) proximal promoter with different efficiencies and to one site in the
pro-α2(I) collagen proximal promoter. The locations of these binding
sites are the same as those previously identified as binding to IF-1.
The same substitution mutations that inhibited IF-1 binding in crude
nuclear extracts (14, 15) also inhibited BFCOL1 binding.
To test whether BFCOL1 could either
activate or inhibit transcription, DNAs for the "full-length" and
N-terminal segment of BFCOL1 were cloned in a mammalian expression
vector, and these DNAs were cotransfected with a pro-α2(I) collagen
promoter (−350 to +54) linked to the luciferase reporter gene in BALB
3T3 fibroblasts. A plasmid containing DNA for the C-terminal part of
BFCOL1, lacking the DNA-binding domain and driven by the same mammalian
expression promoter, served as control. No activation occurred with any
of the three BFCOL1 constructions (data not shown). At higher
concentrations the full-length BFCOL1 and the C-terminal BFCOL1 segment
caused inhibition of the 350-bp pro-α2(I) collagen promoter,
presumably as a result of squelching, i.e. titration of
another transcription factor that is important for expression of this
promoter (30). Very similar results were observed when the promoter
contained six tandem repeats of the −180 to −136 pro-α2(I) sequence
cloned upstream of a minimal pro-α2(I) promoter (−40 to +54). Again
at higher concentrations of BFCOL1, inhibition occurred, but this took
place even when the reporter plasmid contained mutations that abolished
binding of BFCOL1, strongly suggesting that the inhibition was not
dependent on binding of BFCOL1 to DNA and hence presumably due to
squelching. Similar results were also obtained after cotransfection of
the BFCOL1 plasmids and the reporter plasmids in S194 B cells (data
not shown).
To determine whether segments of BFCOL1 contained a potential
transactivation domain, three mammalian expression plasmids were
constructed coding for fusion polypeptides with the yeast GAL4
DNA-binding domain. The DNAs for full-length, N-terminal, and
C-terminal fusion polypeptides were cotransfected along with a plasmid
containing a GAL4-binding site upstream of an SV40 promoter itself
linked to the CAT gene. Activation of the reporter gene occurred only
with the plasmid coding for the GAL4-BFCOL1 C-terminal fusion
polypeptide. No transcriptional activation was detected with either the
GAL4-BFCOL1 full-length or the GAL4-BFCOL1 N-terminal fusion
polypeptides (Fig. 9). This experiment suggested that
the C-terminal segment of BFCOL1 contained a potential transcription activation domain. This does not exclude the possibility that the other
segments of BFCOL1 might contain additional activation domains. Indeed,
the presence of the BFCOL1 DNA-binding domain in the two other fusion
polypeptides might interfere with binding to the
GAL4-binding site in the promoter of the reporter plasmid.
Northern Blot Analysis
To determine the size of the BFCOL1
RNA transcripts, a Northern hybridization experiment was performed
(Fig. 10). Three RNA transcripts were identified as
follows: one transcript had a size of about 9 kb, another of about 5.5 kb, and a third RNA, which migrated more slowly than the 9-kb species,
was seen in S194 B cells. These RNAs are all larger than the size of
our cDNA. This pattern of RNAs is analogous to that previously
shown to hybridize to the htβ DNA probe, although the shorter RNA was
shown to have a mobility of 4 to 4.2 kb in humans. It is possible that
in the human RNA either the 3′-untranslated segment or the
5′-untranslated segment or both are shorter than in the mouse RNA (29).
In our experiments, the two major species of 9 and 5.5 kb were seen in two fibroblast cell lines, a B cell line and a T cell line.
We have used the yeast one-hybrid system to clone a cDNA for a
protein, designated BFCOL1, that binds to the segment between −180 and
−136 in the promoter of the mouse pro-α2(I) collagen gene. This
DNA-binding protein, which appears to be a ubiquitous protein, does not
bind to other sites within the proximal 350 bp of this promoter but
binds to two discrete sites in the proximal promoter of the mouse
pro-α1(I) collagen gene. The cDNA of BFCOL1 contains an open
reading frame for a polypeptide of 775 amino acids. The sequence of the
N-terminal 400 amino acids of BFCOL1 shows 95% identity with that of
htβ, a human polypeptide that binds to the promoter of a β-subunit
of the human T cell receptor gene (29). The reported open reading frame
of htβ codes for only 454 amino acids due to two translational
frameshifts and a termination codon in the sequence compared with that
of BFCOL1. In contrast, our RT-PCR experiments indicated a continuous
open reading frame in this segment of the human RNA encoding an amino
acid sequence essentially identical to that specified by the mouse
BFCOL1 RNA. It is, therefore, almost certain that BFCOL1 and htβ
correspond to one and the same gene. The C-terminal amino acid segment
of BFCOL1 corresponds to a new unique sequence containing at least one
serine-threonine-rich segment and is overall much more hydrophobic than
the N-terminal half. Our experiments using fusion polypeptides with the
DNA-binding domain of GAL4 indicate that the C-terminal part of BFCOL1
has the potential to serve as a transcriptional activator domain.
Preliminary experiments with the yeast two-hybrid system also indicated
that the C-terminal part of BFCOL1 could interact with the TATA-binding
protein-associated factor TAF 110 (not shown). The N-terminal portion
of BFCOL1 includes its DNA-binding domain, and its sequence contains
four potential zinc fingers of the class
Cys2-X12-His2.
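As an illustration of what a Cys2-X12-His2 finger looks like at the sequence level, the following minimal Python sketch scans a protein sequence for candidate fingers. Only the 12-residue spacer between the second cysteine and the first histidine comes from the text; the spacings between the two cysteines (2 to 4 residues) and between the two histidines (3 to 5 residues) are assumptions taken from the commonly used C2H2 consensus, and the peptide shown is a hypothetical toy sequence, not the BFCOL1 sequence.

```python
import re

# Assumed consensus: C-X(2,4)-C-X(12)-H-X(3,5)-H.
# Only the X12 spacer is stated in the text; the other spacings are assumptions.
# The lookahead lets overlapping candidates be reported as well.
C2H2_PATTERN = re.compile(r"(?=(C.{2,4}C.{12}H.{3,5}H))")


def find_c2h2_fingers(protein):
    """Return (0-based start, matched segment) for each candidate finger."""
    return [(m.start(), m.group(1)) for m in C2H2_PATTERN.finditer(protein.upper())]


# Hypothetical toy peptide containing one finger-like segment.
toy_peptide = "MKCPECGKAFSRSDELTRHIRIHTG"
# Same peptide with one extra residue in the spacer (X13), which should not match.
toy_shifted = "MKCPECGKAFSRSDELTRRHIRIHTG"

if __name__ == "__main__":
    print(find_c2h2_fingers(toy_peptide))
    print(find_c2h2_fingers(toy_shifted))
```

A pattern match of this kind only flags candidates; it says nothing about whether a segment actually folds as a zinc finger or binds DNA.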
The sequences in the two type I collagen genes to which BFCOL1 binds do
not contain the CACCC box sequence to which htβ was proposed to bind
(29). The sequence between −176 and −152 in the pro-α2(I) collagen
promoter and the sequence between −168 and −129 in the pro-α1(I)
collagen promoter contain an identical 11-bp sequence
(5′-CCTCCCCCCTC-3′) that must be part of the binding site as mutations within this
sequence abolish binding of BFCOL1. The other binding site in the
pro-α1(I) promoter between −194 and −168 is a much weaker binding
site, and although it is pyrimidine-rich, it does not contain the same
11-bp sequence. The same mutations in each of the two identical 11-bp
conserved sequences abolish binding of BFCOL1 to larger pro-α2(I) and
pro-α1(I) oligonucleotides. In earlier experiments, these same
mutations were shown to increase the activity of each of the two
promoters 3- to 4-fold in transient DNA-transfection experiments (14,
15). Mutations in the −194 to −168 binding site of the pro-α1(I)
promoter also caused an increase in promoter activity, but this
increase was clearly much smaller than with mutations in the 11-bp
identical sequence of the pro-α1(I) and pro-α2(I) promoters. Hence,
there is a correlation between the efficiency of binding of BFCOL1 to
its binding sites in the two type I collagen promoters and the
previously reported increases in promoter activity caused by mutations
in these sites. Another recently identified DNA-binding protein,
cKrox (31), was also shown to bind to the same sequences in the
pro-α1(I) promoter as those to which BFCOL1 binds. However,
cKrox appeared to bind more efficiently to the −194 to −168 sequence
than to the −168 to −129 sequence.
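The shared 11-bp sequence can be located in any promoter sequence by a simple scan of both strands. The Python sketch below is only an illustration: the motif is the one given in the text, but the toy sequence, the function names, and the substitution used to disrupt the motif are hypothetical and are not the promoter sequences or the mutations used in the experiments.

```python
MOTIF = "CCTCCCCCCTC"  # 11-bp sequence shared by the two promoters (from the text)

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_motif(seq, motif=MOTIF):
    """Return sorted (0-based start, strand) pairs for every occurrence of motif."""
    seq = seq.upper()
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        start = seq.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(pattern, start + 1)  # step by 1 to keep overlaps
    return sorted(hits)


def substitute(seq, start, replacement):
    """Replace len(replacement) bases of seq beginning at start."""
    return seq[:start] + replacement + seq[start + len(replacement):]


# Hypothetical toy sequence: arbitrary flanks around the motif.
toy_promoter = "GATTACA" + MOTIF + "AGGTTGA"
# Hypothetical 3-base substitution in the middle of the motif.
toy_mutant = substitute(toy_promoter, 7 + 4, "AAA")

if __name__ == "__main__":
    print(find_motif(toy_promoter))
    print(find_motif(reverse_complement(toy_promoter)))
    print(find_motif(toy_mutant))
```

An exact-match scan of this kind would not detect the weaker, pyrimidine-rich site, which lacks the 11-bp sequence.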
Recent experiments from our laboratory have shown that several classes
of proteins could bind to the segment of the pro-α2(I) collagen
promoter between −180 and −136 (10). These proteins included SP1,
proteins different from SP1 that bind to an SP1 consensus binding site,
proteins that bind to a Krox consensus binding site, and probably
others. Many of these proteins could also bind to two discrete sites
that are closer to the start of transcription in this promoter. A
deletion of the −180 to −136 segment decreased promoter activity
about 2- to 3-fold. When the three redundant sites were deleted together, no
promoter activation occurred above that of a minimal promoter
containing a TATA box and a transcription start site (−40 to +54).
Hence, the overall activity of the −180 to −136 element in the
pro-α2(I) promoter is clearly that of an enhancer, implying that at
least some of the several proteins that bind to this element should act
as transcription activators. We speculate that as with nuclear extracts
in vitro (10), several proteins in vivo also
compete for binding to this element. In experiments using fusion
polypeptides with the yeast GAL4 DNA-binding domain, we showed that the
C-terminal part of BFCOL1 had the potential of being a transcriptional
activator domain; the degree of activation was, however, weaker than
with the activation domain of two other DNA-binding proteins that were
similarly tested as GAL4 fusion polypeptides.2 One possible explanation for
the increase in promoter activity that took place with mutations that
abolished the binding of BFCOL1 (14, 15) is that other transcription
factors, whose binding sites in this area of the promoter partly overlap
that of BFCOL1, would then bind more efficiently. If these transcription factors were more potent
activators than BFCOL1, then the net result would be an increase in
promoter activity.
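The reasoning behind this hypothesis can be made concrete with a toy calculation. The Python sketch below assumes, purely for illustration, that the factors bind mutually exclusively to one overlapping site and that promoter activity is the occupancy-weighted sum of their activation strengths. The factor name "other", the occupancies, and the strengths are arbitrary hypothetical values, not measurements.

```python
def promoter_activity(occupancy, strength):
    """Occupancy-weighted activity for factors competing for one overlapping site.

    occupancy: fraction of templates bound by each factor (must sum to <= 1)
    strength:  relative activation produced by each factor when bound
    """
    if sum(occupancy.values()) > 1.0 + 1e-9:
        raise ValueError("occupancies of mutually exclusive factors cannot exceed 1")
    return sum(occupancy[name] * strength[name] for name in occupancy)


# Hypothetical relative activation strengths: BFCOL1 is the weaker activator.
strength = {"BFCOL1": 1.0, "other": 4.0}

# Wild-type site: BFCOL1 occupies most templates (arbitrary numbers).
wild_type = promoter_activity({"BFCOL1": 0.8, "other": 0.2}, strength)

# Mutated site: BFCOL1 no longer binds, so the other factor occupies the site.
mutant = promoter_activity({"BFCOL1": 0.0, "other": 1.0}, strength)

if __name__ == "__main__":
    print(wild_type, mutant, mutant / wild_type)
```

Under these assumptions, removing the weaker activator raises activity because the stronger activator takes over the site. The size of the increase depends entirely on the chosen numbers.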
In brief, BFCOL1 appears to be one of several ubiquitous proteins that bind to discrete sites in the proximal promoters of the two type I collagen genes and probably control the activity of these promoters in conjunction with other ubiquitous DNA-binding proteins, as well as tissue-specific transcription factors.
It was interesting that in addition to BFCOL1 the yeast one-hybrid
system identified SPR2, a DNA-binding protein related to SP1. Our
earlier experiments had suggested that the −180 to −136 segment
of the pro-α2(I) gene was capable of binding SP1 and other proteins
different from SP1 that bind to a consensus SP1-binding site (10). SPR2
could be one of these proteins.
We are indebted to Xin Zhou for many useful suggestions with the yeast one-hybrid system. We thank Patricia McCauley for editorial assistance.