(Received for publication, October 11, 1995; and in revised form, December 14, 1995)
β1,4-Galactosyltransferase (β4-GT) is a constitutively
expressed enzyme that synthesizes the β4-N-acetyllactosamine structure in glycoconjugates. In
mammals, β4-GT has been recruited for a second biosynthetic
function, the production of lactose which occurs exclusively in the
lactating mammary gland. In somatic tissues, the murine β4-GT gene
specifies two mRNAs of 4.1 and 3.9 kilobases (kb), as a consequence of
initiation at two different start sites ~200 base pairs apart. We
have proposed that the region upstream of the 4.1-kb start site
functions as a housekeeping promoter, while the region adjacent to the
3.9-kb start site functions primarily as a mammary gland-specific
promoter (Harduin-Lepers, A., Shaper, J. H., and Shaper, N. L. (1993) J. Biol. Chem. 268, 14348-14359).
Using DNase I
footprinting and electrophoretic mobility shift assays, we show that
the region immediately upstream of the 4.1-kb start site is occupied
mainly by the ubiquitous factor Sp1. In contrast, the region adjacent
to the 3.9-kb start site is bound by multiple proteins which include
the tissue-restricted factor AP2, a mammary gland-specific form of
CTF/NF1, Sp1, as well as a candidate negative regulatory factor that
represses transcription from the 3.9-kb start site. These data
experimentally support our conclusion that the 3.9-kb start site has
been introduced into the mammalian β4-GT gene to accommodate the
recruited role of β4-GT in lactose biosynthesis.
β1,4-Galactosyltransferase (β4-GT) is a
trans-Golgi resident, type II membrane-bound glycoprotein that is
widely distributed in the vertebrate kingdom. It catalyzes the transfer
of galactose to N-acetylglucosamine residues, forming the
β4-N-acetyllactosamine (Galβ4-GlcNAc) or
poly-N-acetyllactosamine structure found in glycolipids and
the N- and O-linked side chains of glycoproteins and
proteoglycans (1). Since glycoconjugate biosynthesis occurs in
essentially all tissues, it can be considered a housekeeping function.
In mammals, β4-GT has been recruited for an additional
tissue-specific biosynthetic function, which is the production of
lactose (Galβ4-Glc) in the lactating mammary gland
(LMG) (2).
The synthesis of lactose is catalyzed by the
protein heterodimer, lactose synthetase (EC 2.4.1.22), which is
assembled from β4-GT and α-lactalbumin. The net result of this
association is to lower the K_m of glucose for β4-GT about three orders of magnitude, thus making glucose an
effective acceptor substrate at physiological concentration.
α-Lactalbumin is synthesized exclusively in the epithelial cells of
the mammary gland beginning in late pregnancy (3). Enzymatic
levels of β4-GT also increase in the mammary gland beginning in
mid-pregnancy, in preparation for lactose biosynthesis (3). The
expression of both α-lactalbumin and β4-GT is positively
influenced by the lactogenic hormones, insulin, hydrocortisone, and
prolactin (3).
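The effect of lowering the Michaelis constant for glucose can be made explicit with a short worked relation. This is only a sketch: it assumes that galactose transfer to glucose follows simple single-substrate Michaelis-Menten kinetics and that the maximal rate is unchanged by the association, neither of which is stated in the text. The symbols v, V_max, [Glc], and the primed quantities (values in the presence of the second subunit) are introduced here for illustration.

```latex
\begin{align}
  % Assumed rate law for galactose transfer to glucose
  v &= \frac{V_{\max}\,[\mathrm{Glc}]}{K_m + [\mathrm{Glc}]} \\
  % K_m lowered by about three orders of magnitude in the heterodimer
  K_m' &\approx 10^{-3}\,K_m \\
  % Rate ratio at a fixed glucose concentration
  \frac{v'}{v} &= \frac{K_m + [\mathrm{Glc}]}{K_m' + [\mathrm{Glc}]} \\
  % Limiting case [Glc] \ll K_m'
  \frac{v'}{v} &\approx \frac{K_m}{K_m'} \approx 10^{3}
\end{align}
```

Under these assumptions the gain is largest when the glucose concentration lies well below both constants, and it shrinks toward 1 as glucose approaches saturation of the unassociated enzyme.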
We have shown that the murine (4) and
bovine (5) β4-GT genes specify two mRNAs of 4.1 and
3.9 kb in somatic cells. The two transcripts are generated as a
result of initiation at two different start sites located on exon 1,
and separated by ~200 bp. The main difference between the two mRNAs
is the length and extent of predicted secondary structure present in
the respective 5'-untranslated region (6). Because each start
site is positioned either upstream of the first two in-frame ATGs (4.1
kb) or between these two in-frame ATGs (3.9 kb), translation of the two
mRNAs results in the synthesis of two functional, structurally related
protein isoforms that differ only in the lengths of their
NH2-terminal cytoplasmic domain (reviewed in Shaper and
Shaper (7)).
The 4.1-kb start site is predominantly used in
all somatic cells and tissues examined. An exception is found in the
mid- to late pregnant and lactating mammary gland, where the 3.9-kb
start site is preferentially utilized(6) . This switch to the
predominant use of the 3.9-kb start site is coincident with the
cellular requirement for increased levels of β4-GT enzyme for
lactose biosynthesis. These observations, combined with a promoter
deletion analysis using β4-GT/CAT hybrid constructs, led us to
propose a model for transcriptional and translational regulation of the
β4-GT gene in which the distal region upstream of the 4.1-kb start
site functions as a housekeeping promoter in all somatic cells, while
the proximal region upstream of the 3.9-kb start site serves primarily
as a mammary gland-specific promoter. In addition, we proposed that a
putative negative regulatory region identified adjacent to the 3.9-kb
start site down-regulates transcription from this start site in all
somatic tissues except the mid- to late pregnant and lactating mammary
gland. The key feature of our model is that mammals have evolved a
two-step mechanism to generate the elevated levels of β4-GT
enzymatic activity required for lactose biosynthesis. First, there is
an up-regulation of the steady state levels of β4-GT mRNA by the
predominant synthesis of the transcript (3.9 kb) that is regulated by
mammary gland-specific factors. Second, the 3.9-kb β4-GT transcript,
with its short (~20 nucleotides), less structured 5'-untranslated
region, is translated more efficiently than its housekeeping
counterpart (4.1 kb), which has a long (~200 nucleotides), highly
structured 5'-untranslated region (6).
In this study, we
have focused on verifying those predictions of our model pertaining to
the transcriptional regulation of the β4-GT gene. We have used
DNase I protection and electrophoretic mobility shift assays (EMSAs) to
identify specific cis-acting elements and the corresponding
trans-acting factors potentially involved in the expression of the
4.1-kb and the 3.9-kb β4-GT transcripts. We show that the distal
promoter region immediately upstream of the 4.1-kb start site is bound
primarily by the ubiquitous transcription factor Sp1. In contrast, the
proximal promoter region adjacent to the 3.9-kb start site is a target
for binding by multiple proteins which include a candidate negative
regulatory factor, Sp1, a mammary gland-specific form of CTF/NF1 and
the tissue-restricted factor, AP2.
We have previously shown that the cellular requirement for
β4-GT enzymatic activity correlates with the transcriptional start
site used (6). In the majority of mouse somatic tissues,
including the mammary gland from virgin mice, and
established cell lines derived from somatic tissues (e.g. L-cells), the 4.1-kb start site is predominantly used (the ratio
of the 4.1- to the 3.9-kb transcript is ~5:1). However, in brain
tissue, the N18TG2 neuroblastoma cell line, and spermatogonia, the
steady state levels of β4-GT mRNA are ~10-fold lower relative
to most somatic tissues and L-cells, and the 4.1-kb start site is
exclusively used. Additionally, in the mid- to late pregnant and
lactating mammary gland, the steady state β4-GT mRNA levels are
~10-fold higher compared to most somatic tissues and L-cells, and
the 3.9-kb start site is preferentially used (the ratio of the 4.1- to
the 3.9-kb transcript is ~1:10). This differential utilization of
the two start sites suggested that housekeeping and mammary
gland-specific transcription factors, binding to different promoter
elements, regulated the use of the 4.1- and the 3.9-kb start sites,
respectively. Therefore, to experimentally verify this prediction, the
DNA sequence flanking the two start sites was analyzed for protein
binding by DNase I footprinting and EMSAs using nuclear extracts
prepared from L-cells, brain tissue, and LMG, which represent the three
patterns of β4-GT mRNA expression described above.
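The three expression patterns can be converted into rough per-transcript levels to see what the quoted ratios imply. The sketch below takes the approximate figures at face value (about 5:1 in L-cells, exclusive use of the 4.1-kb site in brain at about one tenth of the total, about 1:10 in the lactating mammary gland at about ten times the total) and sets the L-cell total to 1 arbitrary unit. The function name `transcript_levels` and the arbitrary unit are illustrative assumptions, not quantities reported in the text.

```python
from fractions import Fraction


def transcript_levels(total, ratio_41, ratio_39):
    """Split a total mRNA level into (4.1-kb, 3.9-kb) parts using a ratio."""
    parts = ratio_41 + ratio_39
    return (total * Fraction(ratio_41, parts), total * Fraction(ratio_39, parts))


# Assumption: total steady state mRNA in L-cells = 1 arbitrary unit.
L_CELL = transcript_levels(Fraction(1), 5, 1)       # ratio about 5:1
BRAIN = transcript_levels(Fraction(1, 10), 1, 0)    # 4.1-kb site only, about 10-fold lower
LMG = transcript_levels(Fraction(10), 1, 10)        # ratio about 1:10, about 10-fold higher

for name, (m41, m39) in (("L-cells", L_CELL), ("brain", BRAIN), ("LMG", LMG)):
    print(f"{name}: 4.1 kb = {float(m41):.2f}, 3.9 kb = {float(m39):.2f}")

# Fold change of the 3.9-kb transcript in LMG relative to L-cells.
FOLD_39 = LMG[1] / L_CELL[1]
print(f"3.9-kb fold change (LMG vs L-cells): {float(FOLD_39):.1f}")
```

Under these assumptions the 4.1-kb transcript stays near the same level in L-cells and in the lactating mammary gland (5/6 versus 10/11 of a unit), so almost all of the increase comes from the 3.9-kb transcript.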
To determine whether these, or other, sequence
elements do in fact bind nuclear factors, and if this binding is
tissue-specific, a single end-labeled DNA fragment containing the
β4-GT sequence from -172 to +110 was subjected to DNase
I footprinting analysis using nuclear extracts from mouse L-cells,
brain tissue, and LMG. Five protected regions, designated FP-1 to FP-5,
were seen on the noncoding strand (Fig. 1A), and four
protected regions corresponding to FP-1 to FP-4, were observed on the
coding strand (Fig. 1B). The sequence of each protected
region was subsequently compared against the entries in the
transcription factor data base (15) . The combined results of
these analyses are summarized in Fig. 2.
Figure 1:
DNase I footprinting
analyses of the region adjacent to the 3.9-kb transcriptional start
site. A, the DNA fragment containing β4-GT sequence from
-172 to +110 was labeled at the 3'-end of the noncoding
strand, incubated with BSA (lane 1) or nuclear extract from
L-cells (L, lane 2), brain (Br, lane
3), and lactating mammary gland (LMG, lane 4),
and treated with DNase I. An A + G chemical sequencing reaction (lane 5) performed on the same probe was run in parallel with
the samples on an 8% sequencing gel. The nucleotide numbering is
relative to A (+1) of the first in-frame ATG (Fig. 2). The
areas protected from DNase I digestion are marked by brackets and designated FP-1 to FP-5. The DNase I
hypersensitive sites are indicated by arrows. B, identical to A except that the DNA fragment (-172 to +110) was
labeled at the 3'-end of the coding strand. C, identical to A except that an overlapping DNA fragment (-295 to
+55) 3'-end labeled on the noncoding strand was used. Footprints
FP-3 to FP-7 are shown.
Figure 2:
The location of the DNase I protected
regions and the nuclear factor binding motifs in the 5'-flanking region
of the β4-GT gene. The sequence of the β4-GT gene (-850
to +60) is shown; numbers are relative to A (+1) of the first
in-frame ATG. The first two in-frame ATGs are underlined. The
clusters of upward bent arrows designate the transcriptional
start sites of the 3.9-kb (+14 to +24), the 4.1-kb
(-190 to -145) and the male germ cell-specific (Gc, -732) transcripts. The sequences protected from
DNase I digestion on the coding and the noncoding strands are overlined and underlined, respectively, and are
labeled FP-1 to FP-15. Protein binding motifs,
identified by comparison to a transcription factor data base, are boxed. In the case of FP-2, FP-4, FP-7, and FP-8, the
protected region extends further than the designated Sp1 site and may
well contain an additional Sp1 site. The binding of each indicated
nuclear factor was experimentally established by
EMSA.
FP-1, a rather weak footprint seen with all three extracts, is located between +36 and +60. It contains a GC-rich element (5'-GGGCGCG-3') which is similar to a sequence motif (5'-GGGCGGC-3') found just upstream (+24 to +30) of FP-1. Although this upstream region was not protected, a hypersensitive site, indicative of a protein-DNA interaction, was seen at position +29 (Fig. 1B). Footprints FP-2 and FP-4 were also obtained with all three extracts but were more clearly observed on the coding strand (Fig. 1B) than on the noncoding strand (Fig. 1A), where the interactions were primarily characterized by the presence of hypersensitive sites (indicated by the arrows). FP-2 (-34 to +2) contains an inverted GT box (5'-CCCACCC-3') and FP-4 (-119 to -87) an inverted GA box (5'-CCCTCCC-3').
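The term "inverted" can be checked by taking the reverse complement of each motif, which gives the sequence as read on the opposite strand. The following is a minimal sketch; the helper name `reverse_complement` and the dictionary layout are illustrative choices, and only the two motif sequences come from the text.

```python
# Map each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the sequence of the opposite strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


# Motifs as written in the footprinted regions.
INVERTED_BOXES = {
    "FP-2 (inverted GT box)": "CCCACCC",
    "FP-4 (inverted GA box)": "CCCTCCC",
}

# Opposite-strand reading of each motif.
FORWARD_BOXES = {name: reverse_complement(seq) for name, seq in INVERTED_BOXES.items()}

for name, seq in FORWARD_BOXES.items():
    print(f"{name}: opposite strand reads 5'-{seq}-3'")
```

Read on the opposite strand, the two motifs become GGGTGGG and GGGAGGG, each differing from the GGGCGGG sequence at the central base only, which is where the names GT box and GA box come from.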
In contrast to footprints FP-1, FP-2, and FP-4, footprint FP-3 was most prominent with the LMG extract and footprint FP-5 was clearly LMG-specific (Fig. 1A, lane 4). FP-3 (-70 to -42) is a complex region that contains multiple overlapping protein binding motifs: a CTF/NF1 half-site (5'-TGGC-3'), a GC-rich element (5'-GGGCGGC-3') identical to that found at +24 to +30, an AP2 site (5'-GCCTGCGGG-3'), and an Sp1 site (5'-GGGCGGG-3'). The only motif noted within FP-5 (-162 to -140) is a perfect AP2 site (5'-GCCGCAGGC-3'). Because FP-5 extended to the extreme 5'-end of the DNA probe, an overlapping fragment (-295 to +55) was used to more precisely map its 5'-boundary (Fig. 1C). This analysis revealed an additional LMG-specific, DNase I-protected region (FP-6, -185 to -165) that also contains an AP2 site (5'-TCCCGCGGC-3').
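The description of the FP-5 motif as a perfect AP2 site, in contrast to the sites in FP-3 and FP-6, can be illustrated by counting mismatches against a consensus. The sketch below assumes the commonly cited palindromic AP2 consensus GCCNNNGGC, where N is any base; that consensus is not given in the text, and the names `AP2_CONSENSUS` and `mismatches` are hypothetical. Only the three site sequences are taken from the text.

```python
# Assumption: commonly cited AP2 consensus; N matches any base.
AP2_CONSENSUS = "GCCNNNGGC"

# AP2 motifs noted within the footprinted regions.
AP2_SITES = {
    "FP-3": "GCCTGCGGG",
    "FP-5": "GCCGCAGGC",
    "FP-6": "TCCCGCGGC",
}


def mismatches(site, consensus):
    """Count positions where the site differs from a non-N consensus base."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have the same length")
    return sum(1 for s, c in zip(site, consensus) if c != "N" and s != c)


SCORES = {name: mismatches(seq, AP2_CONSENSUS) for name, seq in AP2_SITES.items()}

for name, score in SCORES.items():
    print(f"{name}: {score} mismatch(es) to {AP2_CONSENSUS}")
```

Against this assumed consensus, only the FP-5 site matches at every defined position, while the FP-3 and FP-6 sites each differ at one position.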
The above data obtained from the DNase I footprinting analysis corroborate our previous studies (6) and show that the region adjacent to the 3.9-kb start site is recognized by mammary gland-specific as well as ubiquitous factors. In order to characterize the nuclear proteins interacting with the protected sites, double-stranded oligonucleotides corresponding to the footprinted regions were analyzed by EMSA using nuclear extracts from mouse L-cells, brain tissue, and LMG.
A protein-DNA complex (Fig. 3A, indicated by the solid arrow) of similar mobility was seen with all three extracts (lanes 2-4), with the brain extract giving the most intense band. It should be noted that, even though footprint FP-1 was rather weak, the protein-DNA complex as visualized by EMSA was quite strong. This is due to the fact that EMSA is a more sensitive DNA-protein binding assay than the DNase I protection assay(13) . The formation of the complex was extract-dependent, as it was not seen in the control reaction performed in the absence of the nuclear extract (Fig. 3A, lane 1). The specificity of binding was demonstrated by competition assays in which unlabeled oligo 1 was preincubated with L-cell nuclear extract followed by the addition of labeled oligo 1. As seen in Fig. 3B, preincubation with a 100-fold molar excess of unlabeled oligo 1 greatly diminished complex formation (lane 2) and preincubation with a 500-fold molar excess abolished complex formation (lane 3).
Figure 3: Characterization by EMSA of the putative negative regulatory factor that binds to the FP-1 site. A, labeled oligo 1 (+20 to +59), spanning the FP-1 site, was incubated without (-, lane 1) or with nuclear extract from L-cells (L, lane 2), brain (Br, lane 3), and lactating mammary gland (LMG, lane 4) and subjected to electrophoresis on a 5% nondenaturing, polyacrylamide gel. The solid arrow indicates the position of the specific protein-DNA complex and the open arrow that of the free probe. B, competition experiments in which none (-, lane 1) or a 100- or 500-fold molar excess of unlabeled oligo 1 (lanes 2 and 3) or oligo Sp1, containing the consensus Sp1 recognition sequence (lanes 4 and 5), was incubated with the L-cell extract prior to the addition of labeled oligo 1. C, L-cell nuclear extract and labeled oligo 1 were incubated without (-, lane 1), or with irrelevant serum (IS, lane 2) or anti-Sp1 antibodies (Sp1, lane 3).
Since the GC-rich elements (GGGCGGC and GGGCGCG) contained within oligo 1 are similar to the Sp1 recognition sequence (GGGCGGG), an oligonucleotide containing the consensus Sp1 site (oligo Sp1, Table 1) was also tested in competition assays with labeled oligo 1 as the probe. Oligo Sp1 was not an effective competitor, even at a 500-fold molar excess (Fig. 3B, lanes 4 and 5), indicating that the protein recognizing oligo 1 was not Sp1 or an Sp1 family member. This conclusion was verified by showing that polyclonal antibodies against human Sp1, which cross-react with the mouse protein, neither inhibited formation of the specific protein-DNA complex nor caused a supershift (a retardation of its mobility) (Fig. 3C, lane 3). The anti-Sp1 antibodies were shown to supershift authentic Sp1 in a control experiment (data not shown, also see Fig. 4). Analogous experiments performed using oligo 1 and nuclear extracts from brain and LMG gave results similar to those described for L-cells (data not shown).
Figure 4: Sp1 or Sp1-related nuclear factor(s) binds to the FP-2 site. A, labeled oligo 2 (-47 to -10) spanning the FP-2 site was incubated without (-, lane 1) or with nuclear extract from L-cells (L, lane 2), brain (Br, lane 3), and lactating mammary gland (LMG, lane 4) or Sp1 protein (Sp1, lane 5), and analyzed by EMSA. The position of the two protein-DNA complexes (I and II) is shown by the solid arrows (see footnote 3). The open arrow indicates the position of the free probe. B, identical to A except labeled oligo Sp1 was used. C, unlabeled oligo 2 (lanes 2 and 3) or oligo Sp1 (lanes 4 and 5) at the indicated molar excess was used as competitor for the formation of specific protein-DNA complexes between labeled oligo 2 and L-cell nuclear extract. D, L-cell nuclear extract and labeled oligo 2 were incubated without (-, lane 1) or with irrelevant serum (IS, lane 2) or anti-Sp1 antibodies (Sp1, lane 3).
We had previously identified a sequence motif between -15 and -6 with a weak similarity to a negative element described by Kageyama and Pastan(16) . However, EMSA using an oligonucleotide containing this sequence motif failed to demonstrate any protein binding (data not shown). Therefore, the protein binding to oligo 1, which we term GC binding factor (GCBF), is the candidate for the negative regulatory factor. Both GC-rich elements in oligo 1 appear to be important for high affinity binding, as two separate oligonucleotides containing either of the GC-rich elements showed very weak binding (data not shown). GCBF is predicted to have a broad tissue distribution as the 3.9-kb transcript is down-regulated in most somatic tissues. Consistent with this prediction, a preliminary survey has established that this factor is also present in liver, lung, and kidney (data not shown).
To compare the binding of nuclear factors in each nuclear extract to the consensus Sp1 site, EMSAs were conducted using oligo Sp1. As seen in Fig. 4B, an identical pattern of bands with mobilities similar to those observed with oligo 2 was obtained, except that all the bands were proportionally more intense. These results suggest that the same factor(s) that binds the GT box (oligo 2) somewhat weakly binds the GC box (oligo Sp1) strongly. To confirm this, competition experiments using oligo 2 and oligo Sp1 were performed with the L-cell extract (Fig. 4C). A 50-fold molar excess of unlabeled oligo 2 had little effect on binding of labeled oligo 2 (compare lane 2 to the reaction lacking the competitor oligonucleotide in lane 1). A 250-fold molar excess of unlabeled oligo 2 (lane 3) resulted in partial competition with a proportionate weakening of both bands. In contrast, a 50- or a 250-fold molar excess of unlabeled oligo Sp1 (lanes 4 and 5) abolished the formation of both complexes. The formation of the two protein-DNA complexes with oligo 2 was also inhibited by anti-Sp1 antibodies (Fig. 4D, lane 3). These data demonstrate that complexes I and II, obtained upon incubation of the L-cell nuclear extract with oligo 2, are specific and result from the binding of Sp1 or Sp1-like proteins, which have a greater affinity for the GC box (oligo Sp1) than the GT box (oligo 2).
Analogous experiments established that Sp1 or a related family member also binds the FP-4 site which contains an inverted GA box (CCCTCCC). Similar results were obtained when these experiments were repeated using brain and LMG nuclear extracts (data not shown).
Figure 5: Identification of mammary gland-enriched and ubiquitous transcription factors interacting at the FP-3 site. A, labeled oligo 3 (-82 to -37) spanning the FP-3 site was incubated without (-, lane 1) or with nuclear extract from L-cells (L, lane 2), brain (Br, lane 3) and lactating mammary gland (LMG, lane 4) and analyzed by EMSA. The four protein-DNA complexes (I-IV) are indicated by the solid arrows. The open arrow indicates the position of the free probe. B, a 250-fold molar excess of oligo 3 (lane 2), oligo Sp1 (lane 3), oligo AP2 containing the consensus AP2 site (lane 4), or oligo C/N containing the consensus CTF/NF1 site (lane 5), was used as competitor for the binding of nuclear factors present in the LMG extract to labeled oligo 3. C, a binding reaction containing labeled oligo 3 and LMG extract was performed in the presence of irrelevant serum (IS, lane 1), anti-Sp1 antibodies (Sp1, lane 2), anti-AP2 antibodies (AP2, lane 3), or anti-CTF/NF1 antibodies (C/N, lane 4). Reactions shown in lanes 1 and 4, and lanes 2 and 3 were from two separate experiments. D, labeled oligo C/N (lanes 1-3) or oligo 3 (lane 4) was incubated with the nuclear extract from the indicated tissue.
To determine whether the formation of complexes I-IV was specific and corresponded to the four putative protein binding motifs identified in this region (Sp1, AP2, GC-rich element, and CTF/NF1 half-site; see Fig. 2), competition assays were performed using labeled oligo 3 and, as competitors, unlabeled oligo Sp1, oligo AP2, and oligo C/N, each containing the respective consensus binding site. The results of one such assay, using the LMG extract, showed that addition of a 250-fold molar excess of unlabeled oligo 3 inhibited complexes I-III; complex IV was partially diminished (Fig. 5B, lane 2), suggesting that the formation of all four complexes is specific.
Complex I formation was abolished in the presence of unlabeled oligo Sp1 (Fig. 5B, lane 3) and anti-Sp1 antibodies (Fig. 5C, lane 2), confirming that Sp1 or an Sp1-like protein binds to the perfect Sp1 motif (GGGCGGG) at the FP-3 site. Similar results were obtained with the nuclear extract from L-cells (data not shown). It was surprising that Sp1 binding to this site was weak since both L-cells and the LMG contain relatively high levels of Sp1 (see Fig. 4B). This may be due to competition between multiple factors binding to overlapping sequence elements at the FP-3 site.
Complex II formation was abolished in the presence of a 250-fold molar excess of unlabeled oligo AP2 (Fig. 5B, lane 4) and anti-AP2 antibodies (Fig. 5C, lane 3), confirming that nuclear factor AP2 binds to the AP2 motif (GCCTGCGGG) at the FP-3 site.
A number of observations led to the conclusion that complex III, which was seen with all three nuclear extracts, may result from the binding of GCBF or a GCBF-like factor to the GC-rich sequence (GGGCGGC): (i) The GC-rich motif at the FP-3 site is identical to the GCBF binding site (+24 to +30) upstream of FP-1. (ii) The mobility of complex III is similar to that of the complex seen with oligo 1. (iii) Complex III formation is highest in brain and GCBF levels are also highest in this tissue. (iv) The formation of complex III is not inhibited by an excess of unlabeled oligo Sp1 (Fig. 5B, lane 3), nor by anti-Sp1 antibodies (Fig. 5C, lane 2), as noted for GCBF.
Complex IV formation, which is unique to the LMG, was abolished in the presence of a 250-fold molar excess of unlabeled oligo C/N (Fig. 5B, lane 5) and greatly diminished by anti-CTF/NF1 antibodies (Fig. 5C, lane 4), indicating that CTF/NF1 or a CTF/NF1-like factor binds to the CTF/NF1 half-site (TGGC). This nuclear factor has a greater affinity for the palindromic consensus CTF/NF1 site than the half-site, as oligo C/N competed more effectively for the formation of complex IV than oligo 3 (compare complex IV in lanes 5 and 2, Fig. 5B). It is noteworthy that inhibition of complex IV (either by oligo C/N or anti-CTF/NF1 antibodies) enhanced the formation of complex II, suggesting that there is competition between binding of CTF/NF1 and AP2 to their respective sites. However, CTF/NF1 appears to preferentially bind at the FP-3 site, as complex IV is the major band seen with the LMG nuclear extract.
Analogous experiments using L-cell nuclear
extract showed that complex II formation was not competed by unlabeled
oligo AP2 but it was competed by unlabeled oligo C/N (data not shown).
These results are consistent with the fact that AP2 is a
tissue-restricted transcription factor that is present in LMG but not
in L-cells or brain (23) (see Fig. 6A).
Therefore, in the LMG, complex II formation is due to AP2, whereas in
L-cells it is due to CTF/NF1. These results suggest that two different
forms of CTF/NF1, with varying mobilities, exist in the LMG and
L-cells. To test this directly, labeled oligo C/N was incubated with
each nuclear extract (Fig. 5D). An intense,
heterogeneous band with mobility comparable to complex II was observed
with all three extracts (lanes 1-3), consistent with the
widespread distribution of CTF/NF1(24) . A higher mobility
band, similar to complex IV, was seen only with the LMG extract (lane 3), suggesting the presence of a mammary gland-specific
form of CTF/NF1, which we term MG-C/N. Although CTF/NF1 is abundant in
all three tissues, its binding to oligo 3 is reduced in L-cells and
absent in brain. This may be attributed to the fact that the ubiquitous
form of CTF/NF1 has a greater affinity for the full palindromic binding
motif (TGG(C/A)(N)5GCCA) than the half-site (TGGC) present
in oligo 3 (24, 25). Alternatively, competition may
occur between multiple factors binding to overlapping sites in this
region. As noted earlier, MG-C/N also has a greater affinity for the
full site than the half-site (Fig. 5B), but it appears
to bind to the half-site (in the context of the FP-3 site) better than
the ubiquitous form of CTF/NF1.
Figure 6: Identification of AP2 as the nuclear factor that binds to the FP-5 site. A, oligo 5 (-162 to -127), spanning the FP-5 site, was labeled and incubated without (-, lane 1) or with nuclear extract from L-cells (L, lane 2), brain (Br, lane 3), and lactating mammary gland (LMG, lane 4) and analyzed by EMSA. The solid arrow designates the position of the protein-DNA complex obtained with the LMG extract. The open arrow indicates the position of the free probe. B, a 50- or a 250-fold molar excess of either oligo 5 (lanes 2 and 3) or oligo AP2 (lanes 4 and 5) was used as a competitor for the formation of the specific protein-DNA complex between labeled oligo 5 and LMG nuclear extract. C, a binding reaction with labeled oligo 5 and LMG nuclear extract was performed without (-, lane 1) or with irrelevant serum (IS, lane 2) or anti-AP2 antibodies (AP2, lane 3). The position of the supershifted band in lane 3 is shown.
In summary, the results from the EMSA are in agreement with the DNase I footprinting analysis and confirm that the major interaction at the FP-3 site is mammary gland-specific and is the result of binding a mammary gland-specific form of CTF/NF1 (MG-C/N) to the CTF/NF1 half-site.
Analogous experiments using an oligonucleotide spanning the FP-6 site gave similar results, although the intensity of the complex was reduced (data not shown). Since the regions represented by FP-5 and FP-6 were equally well protected in the DNase I protection assay (Fig. 1C), these data suggest cooperative binding to the two AP2 sites. For example, binding of AP2 at the FP-5 site may stabilize binding at the FP-6 site. More importantly, these results show that the mammary gland-specific interactions at the FP-5 and the FP-6 sites are due to the binding of AP2 or an AP2-like protein that is absent in L-cells and brain.
Six protected regions (FP-7 to FP-12), demarcated by hypersensitive sites, were seen when the DNA fragment (-474 to +55) was analyzed by the DNase I footprinting assay (Fig. 7; FP-7 is better visualized in the bottom half of Fig. 1C). FP-7, FP-8, and FP-9 were observed with nuclear extracts from all three tissues and each footprint contains an inverted GC or GT box (Fig. 2). FP-10, FP-11, and FP-12 were seen with the L-cell and LMG extracts but not with the brain extract. On the noncoding strand, FP-11 was more pronounced with the LMG extract than with the L-cell extract; on the coding strand, however, both extracts showed equivalent protection (data not shown). FP-10 and FP-11 contain an imperfect, inverted GC box and an inverted GA box, respectively (Fig. 2). The protection of the FP-12 region was qualitatively different between the L-cell and LMG extracts; the L-cell extract showed better protection at the top (3')-half of FP-12, whereas the LMG extract protected the bottom (5')-half better. The reason for this became apparent when an inspection of this protected sequence revealed overlapping binding sites for AP2 (absent in L-cells) and Sp1 (present in L-cells and LMG).
Figure 7:
DNase I footprinting analysis of the
region immediately upstream of the 4.1-kb transcriptional start site. A
DNA fragment containing the β4-GT gene sequence from -474 to
+55 was labeled at the 3'-end of the noncoding strand, incubated
with BSA (lane 1) or nuclear extract from L-cells (L, lane 2), brain (Br, lane 3), and lactating
mammary gland (LMG, lane 4) and digested with DNase
I. An A + G sequencing reaction performed on the same probe was
run in parallel with the samples on an 8% polyacrylamide-urea gel (lane 5). The regions protected from DNase I digestion are
marked by brackets and labeled FP-7 to FP-12. The
DNase I hypersensitive sites are indicated by the arrows.
Oligonucleotides corresponding to FP-7 to FP-12
were then analyzed by EMSA and protection at each site was shown to be
the result of binding by Sp1 or a related family member (data not
shown). As expected, the oligonucleotide corresponding to the FP-12
site also showed weak binding by AP2 with the LMG extract. These data
confirm that Sp1, or a family member, interacts at multiple sites in
the immediate vicinity of the 4.1-kb start site and that the region
upstream of this start site may well function as a housekeeping
promoter. Consistent with this conclusion is the correlation between
the levels of Sp1 binding activity and the 4.1-kb mRNA in the three
different tissues tested. Brain, which has ~10-fold lower steady
state levels of the 4.1-kb mRNA compared to L-cells and LMG, also shows
the lowest level of Sp1 binding activity, whereas L-cells and LMG, which
have comparable amounts of the 4.1-kb transcript, have similar levels
of Sp1 binding activity (Fig. 4 and Fig. 6). The relative
Sp1 binding activity most likely reflects the amount of Sp1 protein
present in each tissue, as the study by Saffer et al. (27) shows that Sp1 protein levels are very low in the
brain tissue.
Figure 8:
DNase I footprinting analysis of the
region between -474 and -805. A DNA fragment containing the
β4-GT gene sequence from -828 to -449 was 5'-end
labeled on the noncoding strand and incubated with BSA (lane
1) or nuclear extract from L-cells (L, lane 2),
brain (Br, lane 3), and lactating mammary gland (LMG, lane 4) and treated with DNase I. An A + G
sequencing reaction performed on the same probe was run in parallel
with the samples on an 8% polyacrylamide-urea gel (lane 5).
The regions protected from DNase I digestion are marked by brackets and designated FP-13 to FP-15.
We have recently shown that a 798-bp genomic fragment
spanning the male germ cell start site is sufficient to target
expression of the reporter gene, β-galactosidase, exclusively to
the late pachytene spermatocytes and round spermatids of transgenic
mice(30) . This fragment contains several motifs including two
CRE (cAMP-responsive element)-like elements, that have been noted in
the promoters of other genes expressed during the later stages of
spermatogenesis (see discussion in Shaper et al. (30)). CRE motifs have been shown to bind a unique
form of the CRE binding protein (CREMτ) expressed only
in postmeiotic male germ cells (31).
With respect to β4-GT expression in somatic cells, our previous promoter deletion
studies in L-cells revealed two potential promoter regions; one
upstream of the 4.1-kb start site that contained binding sites for the
ubiquitous factor Sp1, and the other adjacent to the 3.9-kb start site
that contained motifs for several positive factors (CTF/NF1, mammary
gland activating factor (MAF), Sp1) and a negative factor. Based on
these initial studies we proposed a model of transcriptional regulation
of the β4-GT gene in which expression of the 4.1-kb transcript is
governed by a housekeeping promoter, whereas expression of the 3.9-kb
transcript is regulated by a tissue-specific promoter(6) .
In the present study we have used DNase I protection and EMSAs to determine if these cis-acting elements identified by "paper analysis" do in fact bind the corresponding trans-acting factors. The results are summarized in Fig. 9 and reveal a modular arrangement of binding sites. The cluster of sites adjacent to the 3.9-kb start site binds the mammary gland-enriched factors, MG-C/N and AP2, the ubiquitous factor Sp1, and a putative negative regulatory factor, GCBF. The cluster of sites located just upstream of the 4.1-kb start site binds Sp1 or related family members. These data agree remarkably well with the model we previously proposed, although several modifications were noted. The sequence motif (-15 to -6) similar to the negative element described by Kageyama and Pastan (16) and the sequence motif (-9 to +1) similar to the binding site for MAF, a factor shown to be involved in the mammary gland-specific expression of mouse mammary tumor virus (MMTV) (32), did not show protein binding.
Figure 9: Schematic showing the sites bound by trans-acting factors as
determined by DNase I footprinting and EMSAs. The positions of the
binding sites for various nuclear factors present in the lactating
mammary gland (LMG), L-cells, and brain tissue in the β4-GT gene
sequence between -800 and +100 are shown. The upward bent arrows
indicate the locations of the 3.9- and 4.1-kb start sites; increasing
thickness of the arrow depicts increasing transcriptional activity.
GCBF is shown tightly bound to the site downstream of the 3.9-kb start
site in the brain, somewhat displaced in L-cells, and completely
displaced in the LMG. The low level of Sp1 in brain is indicated by
lightly shaded ovals, compared to higher Sp1 levels in L-cells and LMG,
indicated by dark ovals. The CTF/NF1 binding indicated by the asterisk
at -500 may not be functionally important in L-cells and brain. See
text for a more detailed discussion.
While the ubiquitous form of CTF/NF1 binds to the palindromic CTF/NF1
site at -495 to -507, this factor is unlikely to be functionally
involved in the regulation of the 4.1-kb transcript, since the promoter
deletion analysis in L-cells (6) shows that the β4-GT/CAT construct
containing both this motif and the cluster of Sp1 sites (-805/-187)
has CAT activity similar to that of the construct lacking the CTF/NF1
site (-474/-187). The tissue-restricted distribution of AP2 rules out
any role for this protein in 4.1-kb mRNA expression.
Although GCBF or a GCBF-like protein appears to bind to the GC-rich
motif at the FP-3 site (Fig. 2 and Fig. 5), this binding does not seem
to have a negative effect, as a reduction in CAT activity was not
observed when the FP-3 region was included in one of the β4-GT/CAT
constructs (-172 to -13) previously analyzed (6). Therefore, the
sequence context of the GC-rich element may determine whether GCBF
acts as a negative or positive regulator. Examples of transcription
factors exhibiting dual function are YY1 (40), Egr-1 (41), and
WT-1 (42).
Our data show that both the ubiquitous form of CTF/NF1 and MG-C/N bind with higher affinity to the palindromic sequence than to the half-site, but the half-site in FP-3 is notable in that it binds MG-C/N with higher affinity than the ubiquitous form. This may result from cooperative interaction with AP2, which also binds at the FP-3 site. It has been proposed that CTF/NF1 binding may be stabilized by interactions with factors bound to adjacent sites (25).
The FP-13 site, containing the palindromic CTF/NF1 sequence, binds both
forms and shows equivalent protection using nuclear extract from
L-cells, brain, or LMG. While this might suggest that this site is
involved in 4.1-kb mRNA expression, we think it unlikely, as β4-GT/CAT
constructs that contained or lacked this sequence exhibited similar CAT
activities (6). However, binding at this site may be important for
3.9-kb mRNA expression in the LMG, as it is located between two AP2
sites (Fig. 2 and Fig. 9).
AP2 also appears to be involved in 3.9-kb mRNA expression, as it is found only in the LMG and not in L-cells or brain. The close proximity of the three AP2 sites to the CTF/NF1 half-site just upstream of the 3.9-kb start site suggests that these factors may function cooperatively, as proposed for MMTV, to increase transcription from the 3.9-kb start site. The three additional AP2 sites and the palindromic CTF/NF1 site, located upstream of the 4.1-kb start site, may function in an enhancer-like capacity. A redundancy of cis-acting elements involved in tissue-specific expression has been noted in other genes. For example, multiple binding sites for factors (CTF/NF1 and mammary gland factor (MGF)) critical for mammary gland-specific expression of the whey acidic protein gene are found in the promoter proximal and distal regions, and it has been suggested that interaction at both sites is necessary for high level expression (52).
As discussed, the 3.9-kb β4-GT transcript is predominantly expressed in
the mid- to late pregnant and lactating mammary gland; it was therefore
of interest to compare the regulatory elements involved in its
expression with those of the milk protein genes. CTF/NF1 has been
implicated in the expression of α-lactalbumin (49) and
β-lactoglobulin (50) and has been shown to be functionally important
for the expression of whey acidic protein (52). MMTV, which is
expressed primarily in the late pregnant and lactating mammary gland,
also contains a functional CTF/NF1 site (32) in addition to a
functional AP2 site (57). Binding sites for the mammary gland-enriched
factor, MGF, are found in all milk protein genes (reviewed in Groenen
and van der Poel (58)) and have been shown to be functionally involved
in the expression of β-casein (59), whey acidic protein (52), and
β-lactoglobulin (60). However, this site is not present in the β4-GT
gene sequence analyzed.
With the recruitment of β4-GT for lactose biosynthesis, the problem
arose as to how to increase the levels of this enzyme in the LMG while
maintaining the relatively low levels of constitutively expressed
enzyme in all somatic tissues. Based on our analysis of the structure
and regulation of the murine β4-GT gene, we would argue that this was
achieved by the generation of the 3.9-kb start site and its
accompanying tissue-restricted regulatory elements. It is interesting
to note in this regard that both AP2 and GCBF, two of the transcription
factors implicated in the regulation of transcription from the 3.9-kb
start site, bind to GC-rich sequence motifs, which could have been
generated by mutations in the GC-rich regions flanking the 4.1-kb start
site.
In summary, the results presented in this study support the
conclusion that the presence of the 3.9-kb start site in the mammalian
β4-GT gene is a direct consequence of the recruitment of β4-GT
for the mammary gland-specific biosynthesis of the uniquely mammalian
disaccharide, lactose.
The nucleotide sequence reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number L16840.