(Received for publication, August 22, 1995; and in revised form, October 5, 1995)
Although many TATA-less promoters transcribed by RNA polymerase II initiate transcription at multiple sites, the regulation of multiple start site utilization is not understood. Beginning with the prediction that multiple start site promoters may share regulatory features and using the P-glycoprotein promoter (which can utilize either a single or multiple transcription start site(s)) as a model, several promoters with analogous transcription windows were grouped and searched for the presence of a common DNA element. A downstream protein-binding sequence, MED-1 (Multiple start site Element Downstream), was found in the majority of promoters analyzed. Mutation of this element within the P-glycoprotein promoter reduced transcription by selectively decreasing utilization of downstream start sites. We propose that a new class of RNA polymerase II promoters, those that can utilize a distinctive window of multiple start sites, is defined by the presence of a downstream MED-1 element.
Promoters transcribed by RNA polymerase II are divided into two classes: those that contain a canonical TATA box and those that do not. TATA-containing promoters usually direct transcription from a single initiation point, the location of which is determined by the position of TATA(1) . In promoters that lack a TATA box, start site selection is not as well understood and has been investigated primarily in genes that use a single transcription start site, where the presence of an ``initiator'' element at or near the start site appears to be responsible for localizing the preinitiation complex(2) . However, despite the fact that many TATA-less promoters utilize multiple start sites, there is little information as to how this multiple selection process occurs. One hypothesis is that the utilization of many start sites is a random or default response to the lack of a strong ``selector'' such as the TATA box(3) . Another possibility is that each site is independently regulated by a separate initiator-type element; apropos of this, the role of initiators in multiple start site selection within the thymidylate synthase promoter was investigated, but multiple initiators were not identified(4) .
We have previously shown that
transcription from the TATA-less P-glycoprotein (pgp1)
promoter can either begin at a single site (+1) or can include
multiple downstream start sites within a 70-nucleotide
window(5) , suggesting that +1 and the downstream sites
are independently regulated. In our efforts to understand the
activation of the additional downstream start sites within the pgp1 promoter, we have investigated the possibility that the
utilization of multiple start sites in TATA-less promoters is neither
random nor mediated by independent initiator elements but rather that
TATA-less promoters with a similar ``window'' of start sites
may share a common element that regulates their selection and/or
activation. In this report we show that: 1) as opposed to being
``random,'' the size and arrangement of the multiple start
site ``window'' is quite similar in many promoters; 2)
multiple start sites can be regulated as a cassette, rather than
individually; 3) a conserved sequence motif (MED-1) can be found in the
majority of these promoters downstream of the initiation window; and 4)
mutation of this motif within the pgp1 promoter decreases
transcription from the downstream start site cassette. We therefore
propose that the P-glycoprotein gene is a member of a subclass of
TATA-less promoters, which can be classified according to a
characteristic transcription ``window'' and the presence of a
common downstream regulatory element.
Figure 1:
Multiple start site TATA-less promoters
containing the MED-1 element. A, alignment of promoters with
multiple start sites. Promoters were chosen and aligned as described
under ``Materials and Methods.'' Only the 3'-half of the
alignment is shown. Identity among all five promoters is indicated by closed circles. The MED-1 element is outlined. Arrows above the sequence indicate pgp1 transcription
start sites. The most upstream start site in each promoter is numbered
+1. PGP1, P-glycoprotein(5) ; HMGCOA,
HMG-CoA reductase(7) ; TS, thymidylate
synthase(10) ; TK, thymidine kinase(11) ; HPRT, hypoxanthine phosphoribosyltransferase(12) . B, transcription initiation window size and position relative
to MED-1. Arrows indicate transcription start sites, as
described in references
noted(5, 7, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21) .
The size of the arrow does not necessarily correspond to the
relative strength of the particular transcription start site. An asterisk indicates that values for G were
approximated from the data available. Due to inherent artifacts in
assays used to identify mRNA 5'-ends, genes in which start sites were
not confirmed by at least two independent methods were excluded from
these comparisons. Only promoters with complete identity to the MED-1
consensus are shown. ACBP, bovine acyl-CoA-binding
protein(13) ; WT1, human Wilms'
tumor(14) ; MHC-A, human nonmuscle myosin heavy
chain(15) ; N-RAS, mouse N-ras(16) ; CATL, rat catalase(17) ; G, mouse G protein(18) ; AK2, bovine adenylate kinase
isozyme(19) ; GHRH, rat growth hormone-releasing
hormone(20) ; AAT, rat aspartate
aminotransferase(21) .
Figure 2:
Gel shift analyses of protein complexes
interacting with MED-1. A, 1 ng of 32P-labeled
MED-1 oligonucleotide was incubated with nuclear extracts prepared from
DC-3F/ADII cells without competitor (lane 1), with cold
wild-type (WT) oligonucleotide (lanes 2-4),
with cold mutant (MUT) oligonucleotide (lanes
5-7), or with a nonspecific (NS) oligonucleotide (lanes 8-10). 20, 80, and 160 ng of competitor were
added. Arrows designate specific complexes. B, same
as A, except competed with an oligonucleotide representing
sequences within the HMG-CoA reductase promoter (lanes
2-4) (7) or a nonspecific oligonucleotide (lanes
5-7). 100 and 200 ng of competitor oligonucleotide were
added.
To construct pgp1/globin reporters, the unique BamHI site of the pgpLUC-B plasmid was first converted into an ApaI site by site-directed mutagenesis; the resulting plasmid was designated pgpLUC-Ba. pgp1GL was created by cloning a β-globin insert (isolated from PTAG-1 (8) by ApaI/HindIII digestion) into pgpLUC-Ba, replacing the luciferase gene and the SV40 3'-untranslated region. pgpGLm was created by the same approach, using pgpLuc-Bm as vector.
6 10
cells were co-transfected by the
calcium phosphate method with 12 µg of reporter plasmid and 0.25
µg of the neomycin resistance plasmid, p308 (ATCC). After 36 h,
cells were split into dishes containing medium supplemented with 400
µg/ml G-418 (Life Technologies, Inc.). A typical experiment yielded
300-400 neomycin-resistant clones. Individual clones were
isolated after 15-18 days. The presence and integrity of the
luciferase constructs were confirmed by Southern blot analyses (data
not shown). Luciferase assays were performed using the Promega
luciferase reporter assay system, as recommended by the vendor. Protein
concentrations were determined using the bicinchoninic acid assay kit
(Pierce) using microtiter plates(9) . For analysis of pgp1/globin constructs, resulting clones were pooled prior to
RNA isolation.
A search of the literature identified 14
promoters(7, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22) similar to the pgp1 promoter with respect to the
distribution of multiple start sites within the transcription
initiation window (see ``Materials and Methods'' for search
strategy). We began with the assumption that a DNA element involved in
multiple start site selection would be common to all these promoters
and, analogous to a TATA box, would lie within a conserved distance
from the window. In order to test this hypothesis, several of the
promoters were aligned and analyzed for such an element (Fig. 1A). A hexanucleotide sequence, GCTCC(C/G), which
we have designated MED-1 (Multiple start site Element Downstream), was
identified as the only element common to these promoters (Fig. 1A, outlined). The relationship of this
element to the initiation window is shown in Fig. 1B.
MED-1 was present in 14 out of 15 promoters (it was not found in the
human Ha-ras promoter(22) ) and lies 20-45 bp
downstream of the 3'-end and a maximum of 110 bp downstream of the
5'-end of the transcription initiation window.
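The positional criteria above can be illustrated with a short sketch. It is not the alignment procedure used in this study; it only scans one strand of a sequence for the GCTCC(C/G) consensus and keeps matches that satisfy the spacing reported here. The function names, the 0-based coordinates, the choice to measure distances to the first base of the motif, and the toy sequence are all assumptions made for illustration.

```python
import re

# MED-1 consensus as given in the text: GCTCC(C/G).
# The lookahead allows overlapping matches to be reported.
MED1 = re.compile(r"(?=(GCTCC[CG]))")


def find_med1(seq):
    """Return 0-based start offsets of every MED-1 match on the given strand."""
    return [m.start() for m in MED1.finditer(seq.upper())]


def med1_downstream_of_window(seq, window_start, window_end,
                              min_gap=20, max_gap=45, max_from_5prime=110):
    """Keep MED-1 matches that lie min_gap..max_gap bp downstream of the last
    start site (window_end) and at most max_from_5prime bp downstream of the
    first start site (window_start). Distances are measured to the first base
    of the motif, which is an assumption of this sketch."""
    hits = []
    for pos in find_med1(seq):
        gap_3prime = pos - window_end
        gap_5prime = pos - window_start
        if min_gap <= gap_3prime <= max_gap and gap_5prime <= max_from_5prime:
            hits.append(pos)
    return hits


# Hypothetical toy promoter: start sites spanning positions 0..60,
# followed by filler and a single GCTCCC motif at position 90.
toy = "AT" * 45 + "GCTCCC" + "AT" * 5
print(find_med1(toy))
print(med1_downstream_of_window(toy, window_start=0, window_end=60))
```

A scan of a real promoter would also need to consider the complementary strand and the experimentally mapped start sites, neither of which is handled here.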
The striking conservation of MED-1 in multiple start site promoters suggested a role for this element in multiple start site selection and/or activation. In order to test the possibility that MED-1 was a site for protein binding, gel shift assays were performed using an oligonucleotide containing the pgp1 MED-1 sequence (Fig. 2). Two specific DNA-protein complexes were identified (Fig. 2A, lane 1). While both complexes were specifically competed with an excess of wild-type pgp1 oligonucleotide (lanes 2-4), a mutation that converted the MED-1 site from GCTCCC to CCAAGG significantly impaired competition for binding of both complexes (lanes 5-7); moreover, when used as a probe, the mutant oligonucleotide was greatly reduced in its ability to form both complexes (data not shown). We do not yet know whether the two specific complexes contain different proteins or multimers of the same protein.
In order to determine whether the sixth base of the MED-1 consensus could be either a G or C as suggested by the computer alignment (Fig. 1A) and to substantiate the importance of this binding site in other promoters, an oligonucleotide representing a comparable region of the HMG-CoA reductase gene (7) was used as competitor and found to compete for both complexes (Fig. 2B, lanes 2-3). These results are consistent with the notion that the same protein factor(s) are binding to this promoter as well as to the others identified in Fig. 1B.
The functional role of MED-1
in pgp1 transcription was assayed in DC-3F/ADII cells, in
which the endogenous pgp1 is transcribed from multiple
sites(5) . In the first set of experiments, cells were stably
transfected with one of three constructs: a wild-type pgp1 promoter/luciferase construct, a MED-1 mutant/luciferase construct
containing the mutation previously shown to reduce DNA-protein complex
formation, or luciferase vector alone. A minimum of 11 independent
transfectants was isolated and analyzed for each construct. The results
presented in Fig. 3 indicate that mutation of the MED-1 element
reduced expression from the pgp1 promoter to 25% of that of
wild type (p = 0.0001). In light of the remarkable
conservation of MED-1 in multiple start site promoters, we predicted
that this reduction in expression might be due to a specific effect on
the downstream start sites. In order to investigate this possibility,
similar experiments were performed using pgp1/globin reporter
constructs (luciferase RNA was undetectable in the previous
experiments). Following stable transfection of these reporter
constructs into DC-3F/ADII cells, two significant observations were
made. First, the endogenous multiple start site pattern was
recapitulated (Fig. 4B, lane 1), confirming
our initial observation (5) that the selection of multiple
start sites is not simply a result of a mutation in the endogenous
promoter. Second, mutation of the MED-1 element resulted in a
3-fold reduction in utilization of the downstream start sites
relative to +1 (Fig. 4, B and C),
indicating that the downstream cassette can be regulated independently.
Figure 3:
Mutation of MED-1 reduces pgp1 promoter activity. Each bar indicates an independent
clone stably expressing a pgp1/luciferase reporter construct:
wild type (hatched), n = 16, mean =
2966 ± 1497; MED-1 mutant (white), n =
14, mean = 731 ± 682; promoterless (black), n = 11, mean = 119 ± 110. Arbitrary
luciferase units were obtained after normalization of the values
against the amount of protein in each extract. The clones were rank
ordered within each group according to activity for ease of visual
comparison. Statistical significance was determined by independent t test (α = 0.05).
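As a rough arithmetic check on the summary values in this legend, the sketch below computes the mutant/wild-type ratio of mean activities and a pooled-variance two-sample t statistic. It assumes that the ± values are standard deviations and that the independent t test was the equal-variance (Student) form; neither is stated in the legend, so the result need not reproduce the reported p value. The function name is hypothetical.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's two-sample t statistic and degrees of freedom computed
    from summary statistics (assumes the +/- values are standard deviations)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    std_err = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / std_err, df


# Values taken from the Figure 3 legend.
wt_mean, wt_sd, wt_n = 2966, 1497, 16
mut_mean, mut_sd, mut_n = 731, 682, 14

ratio = mut_mean / wt_mean  # fraction of wild-type activity retained
t_stat, df = pooled_t(wt_mean, wt_sd, wt_n, mut_mean, mut_sd, mut_n)

print(f"mutant / wild type = {ratio:.3f}")
print(f"t = {t_stat:.2f}, df = {df}")
```

Converting the t statistic to a p value requires the t distribution, which the Python standard library does not provide; a statistics package would be needed for that step.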
Figure 4: Role of MED-1 in start site utilization. A, endogenous pgp1 start sites were determined by nuclease protection analysis as described previously(5, 23) . Lane 1, single start site selection in DC-3F cells; lane 2, multiple start site selection in DC-3F/ADII cells. Reproduced from (5) . B, total RNA from DC-3F/ADII cells stably transfected with wild type (W) or MED-1 mutant (M) pgp1 promoter/globin constructs (30 and 60 µg, respectively) was analyzed by nuclease protection(5) , using a riboprobe derived from the pgp1/globin construct. Lane 1, untransfected control (U), 30 µg of RNA. Position of start sites is indicated. C, quantitation of data presented in B, represented as the ratio of transcripts initiating within the downstream cassette (DSC)/transcripts initiating at +1. W, wild type; M, MED-1 mutant.
Previous efforts to understand the regulation of promoters containing multiple start sites have focused on individual genes, and the results have been largely inconclusive(4) . We therefore began with the assumption that multiple start site promoters share common regulatory features and that any sequence that is involved in start site selection would be at a conserved position relative to the transcription initiation window. It is important to emphasize that the alignment shown in Fig. 1 preceded the functional evaluation of the MED-1 element, thereby reducing the bias that can be associated with database searches for a DNA element following its identification in a single gene. Therefore, our analysis of the role of MED-1 in pgp1 transcription has strong predictive value relative to its function in the other genes in which it has been identified. Apropos of this, it is interesting to note that in earlier studies deletion of downstream sequences in both MHC-A(15) and N-ras(16) promoters significantly reduced expression from these genes; we now know that these deletions included the MED-1 element.
Whether MED-1 and its cognate binding proteins act as selectors or activators of multiple start sites is not yet known. However, it is clear that the mere presence of MED-1 is not sufficient for activation of multiple start sites since 1) we have already shown that the same pgp1 promoter that supports multiple start sites in some cells uses only the +1 site in others (5) and 2) the protein binding activity shown in Fig. 2 is also present in cells that utilize only +1 (data not shown). Therefore, we suggest that MED-1 is necessary but not sufficient for multiple start site utilization and that other, likely trans-acting, factor(s) impose a higher order of regulation on the recognition of this element.
In conclusion, we propose that a new class of RNA polymerase II promoters can be defined by 1) the size of the transcription initiation window and the arrangements of start sites therein and 2) the presence of a downstream MED-1 element. Since the criteria imposed upon selection of the promoters included in Fig. 1B were quite stringent (requiring verification of start site position by both nuclease protection and primer extension analyses, complete homology with the MED-1 element defined in Fig. 1A, as well as the spatial restrictions suggested by the initial alignment), we predict that as more is known about the spatial and sequence requirements for the MED-1 element, additional promoters will be included in this class.
This paper is dedicated to J. R. Bertino on the occasion of his birthday.