(Received for publication, September 7, 1994; and in revised form, October 20, 1994)
GPAT and AIRC encode two enzymes that catalyze
steps 1 and 6 plus 7, respectively, of the de novo purine
biosynthetic pathway. The chicken genes are closely linked and
divergently transcribed from an approximately 230-base pair intergenic region.
The promoter was scanned by deletion mutagenesis in a bireporter vector
that allowed assay of transcriptional activity in both directions in
transfected HepG2 and chicken LMH cells. Three classes of deletions
were obtained: those affecting bidirectional transcription, those
predominantly affecting GPAT transcription, and those
predominantly affecting AIRC transcription. Defects in
bidirectional transcription resulted from removal of an initiator-like
element overlapping the AIRC transcription start site, as well
as deletions removing a series of GC and CCAAT boxes from the AIRC proximal half of the promoter and a CCAAT-containing segment from
the GPAT side. Several regions in the GPAT proximal
half of the promoter, including an octamer-like motif downstream from
the transcription start site, were required predominantly for GPAT expression. Evidence for interaction of HeLa nuclear proteins with
some of these sites was obtained by gel retardation, DNase I, and
methylation interference assays. Overall, the results showed that the
intergenic region is an integrated bidirectional promoter and that a
novel initiator-like element plays a central role in coordinating
expression of the divergently transcribed AIRC and GPAT genes.
De novo biosynthesis of purine nucleotides proceeds by a 14-step branched pathway via IMP. GPAT-encoded glutamine 5`-phosphoribosylpyrophosphate amidotransferase catalyzes the first committed step of the pathway, and 5`-phosphoribosylaminoimidazole carboxylase/5`-phosphoribosyl-4-(N-succinocarboxamide)-5-aminoimidazole synthetase, encoded by AIRC, catalyzes steps 6 and 7. The approximate chromosomal locations of the seven human genes required for AMP synthesis were deduced by complementing Chinese hamster ovary mutants deficient in AMP synthesis and by subsequent cytogenetic analysis of Chinese hamster ovary-human somatic cell hybrids. GPAT and AIRC were thus mapped to overlapping regions of chromosome 4, whereas other genes of the pathway were localized on different chromosomes (Barton et al., 1991). More recently, the human GPAT-AIRC locus has been mapped by fluorescence in situ hybridization to the q12 region of chromosome 4 (Brayton et al., 1994).
In order to
set the groundwork for investigations of gene expression and regulation
of this pathway in vertebrates, we recently cloned and characterized
the chicken and human GPAT genes and the proximal AIRC genes (Brayton et al., 1994; Gavalas et al.,
1993). This work established that GPAT and AIRC are
closely linked and divergently transcribed from intergenic regions of
approximately 230 and 625 bp in chickens and humans,
respectively. Intron/exon boundaries are strictly conserved, as is the
approximate size of the GPAT gene. On the other hand, human AIRC is approximately 2-fold larger than the corresponding
chicken gene. The two promoters have also diverged significantly,
although the close linkage of the two genes has been retained along
with a high GC content and the presence of several Sp1 boxes. Both
promoters lack TATA elements. Although the functional consequences of
tight linkage between GPAT and AIRC are not known,
promoters with the capacity to direct bidirectional transcription may
provide one mechanism for the co-regulation of functionally related
genes. As such, this arrangement may constitute the eukaryotic
equivalent of a prokaryotic operon. This structural unit was named a
dioskourion (from the Greek Dioskouri, the mythological inseparable
twin sons of Zeus) (Gavalas et al., 1993). Bidirectional
promoters may also be useful as a genetic engineering tool, directing
expression of two genes in predetermined relative amounts and/or in a
tissue-specific manner.
Previous experiments (Gavalas et al., 1993) have shown that chicken GPAT-AIRC promoter strength was about 10-fold higher in the AIRC direction compared with the GPAT direction using a bireporter promoter vector in transfected HepG2 cells. In addition, the intergenic region was dissected to yield "half-promoters" having about 30% function in the GPAT direction and 80% function in the AIRC direction. In this earlier work, a bidirectional promoter was defined operationally as a short segment of DNA that initiates bidirectional transcription in vivo. The question remains, however, whether common cis-elements and assembled transcription factors are used for transcription in both directions, as in an "authentic" bidirectional promoter, or whether expression in the two directions employs distinct cis-elements. The latter case could result from juxtaposition of separate promoters. In order to identify cis-acting sites and to distinguish between the two types of promoter function, deletion mutants were constructed and tested for transcriptional activity by transfection of a bireporter vector carrying the LUC and CAT genes in divergent orientations. Two hepatoma cell lines, human HepG2 and chicken LMH, were used to evaluate the effect of these deletions. cis-Elements were found for bidirectional expression in both cell lines. One of these cis-elements, central for the expression of both genes, is a novel initiator (Inr)-like element, situated around the AIRC transcription start site. Mobility shift, DNase I, and methylation interference assays limited this element to 41 bp, showed that nuclear protein binding results in hypersensitivity to DNase I at the AIRC transcription start site, and identified nucleotides involved in specific DNA/protein contacts. cis-Elements also exist that are largely side-specific, most notable of which is an octamer-like motif found downstream from the GPAT transcription start site.
Figure 1: Map of promoter deletions. A, the plasmid pSK-PRO1 was used for initial construction before subcloning the mutated promoters into the bireporter plasmid pLUC/CAT-3. The middle box between HindIII and SalI represents the promoter. The hatched parts correspond to the 5`-untranslated regions of the two genes that were incorporated into the promoter. The LUC and CAT boxes represent 5` parts of these genes that are incorporated into this plasmid. The line between the promoter and the reporter boxes represents short polylinker sequences. B, schematic representation of the promoter. Consensus sites for cis-elements and an octamer-like sequence are shown. The lowercase letter indicates a base deviating from the consensus. Arrows denote the transcription start sites, and arrowheads indicate the positions of SmaI sites that were introduced by site-directed mutagenesis. The exact sequence of this region and the exact position of the SmaI sites are shown in Fig. 7. WT, wild type. C, schematic representation of the promoter mutations. The deletions are noted by the interrupted line and the number of deleted bp is written in the gap. Mutant 2.7i bears no deletion; instead, the region between points 2 and 7 is inverted. Constructs are named according to the region deleted.
Figure 7: Nucleotide sequence of the AIRC/GPAT bidirectional promoter. Flanking sites for HindIII at the 5` end and SalI at the 3` end that are used for subcloning are not shown. Arrowheads indicate the positions of SmaI sites used for construction of deletions. Potential GC and CCAAT elements are boxed. Large bent arrows and filled squares represent the transcription start sites as determined from transient transfection experiments and endogenous mRNA, respectively. The lowercase sequence at positions 50-90 contains the AIRC Inr-like element, the upward vertical arrow shows the position of the DNase I-hypersensitive site on the bottom (noncoding) strand, and asterisks mark the interfering nucleotides on the top strand (above the sequence) and the bottom strand (below the sequence). Half-arrows above the Inr-like element indicate the presence of an imperfect palindrome with a 3-bp spacer. A region similar to the adenovirus major late promoter Inr is noted at positions 43-55. Direct (underlined) repeats close to the GPAT transcriptional start sites are noted by Roman numerals. An octamer-like motif is shown with lowercase letters at positions 320-327, and its DNase I footprint is shown by solid lines above (top strand) and below (bottom strand) the sequence.
The resulting mutagenized promoters were subcloned as HindIII/SalI restriction fragments into the bireporter plasmid pLUC/CAT-3. This plasmid is identical to the plasmid pLUC/CAT-1 that was used earlier (Gavalas et al., 1993) with the exception that the contiguous BamHI and SacI sites at the 3` end of the chloramphenicol acetyltransferase (CAT) reporter have been replaced by a ClaI restriction site. The resulting plasmids were checked by restriction mapping and were purified through two cesium chloride gradient ultracentrifugations in preparation for the transient transfection assays.
Figure 3:
Analysis of protein binding to the AIRC Inr region by gel retardation. A, protein-DNA
complexes with promoter probe fragment shown in C. The probe
was isolated from plasmid pSma-2 by digestion with HindIII and XmaI, sites that flank the sequence shown, and was end-labeled
with [γ-32P]ATP. Arrows identify two
specific protein-DNA complexes. These complexes were competed by Sp1
oligonucleotide and by segments of the promoter proximal to the AIRC transcription start site defined in C. Plasmid
pBluescript polylinker (169 bp) was the nonspecific competitor. All
competitors were used at a 100-fold molar excess. Noncompeted unmarked
bands may be nonspecific. B, protein-DNA complexes with probe PCR #79 (see C). This probe has 5` and 3` ends
corresponding to probes PCR #7 and PCR #9,
respectively. An arrow marks the position of a protein-DNA
complex competed by the unlabeled probe. The molar excess concentration
of the competitor is given. The nonspecific competitor (100-fold molar
excess) is the same as in A. C, the sequence of the
promoter around the AIRC transcription start site is shown.
The arrow and the filled square represent the
transcription start site determined by transient transfections and
endogenous mRNA, respectively. Arrowheads mark the positions
where SmaI sites were introduced and, therefore, the end
points of deletion mutations. The large box indicates the
minimal sequence around the transcription start site that competes for
the low mobility specific complex, and the small box indicates
the Sp1 site. The box with the dotted line represents
the homology with the adenovirus major late promoter Inr (see also Fig. 7). The open bars represent DNA fragments used as
competitors. PCR #1 is the same as the sequence
shown.
Typical
protein-DNA binding reactions of 20 µl contained 5 µg of HeLa
nuclear extract (Promega), 1 µg of
poly(dI-dC)·poly(dI-dC) (Boehringer Mannheim), 25 mM HEPES, pH 7.6, 50 mM KCl, 0.1 mM EDTA, 5 mM MgCl2, 10% glycerol, 1 mM dithiothreitol, and
0.1% Nonidet P-40. Incubation was for 25 min at room temperature.
Subsequently, 20-25 fmol of labeled probe were added, and the
reaction was incubated for an additional 25 min at room temperature.
Protein-DNA complexes were resolved by electrophoresis on a 4% native
polyacrylamide gel. Competitor DNAs were prepared either by restriction
digestion or PCR and were preincubated, before the addition of the
labeled probe, in the binding reaction. Commercially available duplex
oligonucleotides (Promega) were also used as competitors: Sp1,
5`-ATTCGATCGGGGCGGGGCGAGC-3`; TFIID, 5`-GCAGAGCATATAAGGTGAGGTAGGA-3`;
OCT1, 5`-TGTCGAATGCAAATCACTAGAA-3`; CTF/NF1, 5`-CCTTTGGCATGCTGCCAATATG-3`.
For the AIRC Inr methylation interference assays, the probes used above were methylated (Ausubel et al., 1987). A 5-fold scaled-up binding reaction was performed with HeLa extract and was electrophoresed on a 4% acrylamide preparative gel. The gel was exposed at 4 °C, and then the bands representing the complex and the free probe were recovered. DNA was eluted and cleaved with piperidine to yield the G ladder (Ausubel et al., 1987). Samples were run on a 10% polyacrylamide sequencing gel, and the gel was exposed at -80 °C for up to 4 days.
Figure 2: Transcriptional activity of mutant promoters. The activities of the mutant promoters in HepG2 and LMH cells are expressed as the percentage of the wild type and are represented by solid bars. Values are the average of at least five independent transfections with standard deviations shown as error bars.
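One way to express reporter activity as a percentage of the wild type, with a mean and standard deviation over independent transfections, is sketched below. The normalization to the mean wild-type value, the use of the sample standard deviation, and all numerical readings are assumptions made for illustration; they are not the data or the exact procedure of this work.

```python
from statistics import mean, stdev


def percent_of_wild_type(mutant_values, wild_type_values):
    """Return (mean, sample SD) of mutant activity as % of mean wild type."""
    wt_mean = mean(wild_type_values)
    percents = [100.0 * value / wt_mean for value in mutant_values]
    return mean(percents), stdev(percents)


# HYPOTHETICAL reporter readings (arbitrary units), five transfections each.
wild_type = [100.0, 110.0, 90.0, 105.0, 95.0]
mutant = [20.0, 30.0, 25.0, 22.0, 28.0]

avg, sd = percent_of_wild_type(mutant, wild_type)
print(f"{avg:.1f}% +/- {sd:.1f}% of wild type")
```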
Figure 4: Methylation interference and DNase I assays for the AIRC Inr region. A, methylation interference assay. The promoter fragment shown in Fig. 3 was methylated. The piperidine cleavage reactions of the free (F), nonbound (N), and bound (B) DNA are shown. Interfering methylated nucleotides are noted with an asterisk. The position of G residues in the top strand around the binding site is noted with a line. B, DNase I cleavage of the bottom AIRC noncoding strand. Binding of nuclear extract resulted in the hypersensitive site marked by the arrow.
By virtue of its position around the AIRC transcription start site and its capacity to activate transcription at distinct positions in the absence of a TATA box, we infer that the site described above may represent an Inr-like element that affects transcription bidirectionally. This element does not have sequence similarity with known Inr elements (Azizkhan et al., 1993; Weis and Reinberg, 1992). Interestingly, an overlapping stretch of 13 nucleotides (see Fig. 7) has similarity with the adenovirus major late promoter Inr (Weis and Reinberg, 1992). However, this element does not appear to contribute to protein binding in this region, because the competition experiments showed that most of it is dispensable for binding, and a 30-bp double-stranded oligonucleotide, nucleotides 35-65 (see Fig. 3 and Fig. 7), encompassing this region of homology did not compete either.
HeLa nuclear extracts and a SmaI/AvaI restriction fragment from plasmid pSma-2 (nucleotides 90-188, see Fig. 7) were used to detect protein-DNA complexes in the 2.4 subsection of the promoter. A specific complex was detected that was readily competed with an Sp1 oligonucleotide (Fig. 5) but not by a CTF/NF1 oligonucleotide (Chodosh et al., 1988) (data not shown). This result provides evidence for the binding of Sp1 to one or more of the three Sp1 sites in the 2.4 promoter region. No other complexes were detected using this probe or a probe encompassing sequences downstream of the AvaI site up to point 7 on the GPAT side under the conditions used. Thus, specific protein complexes with CCAAT motifs were not detected.
Figure 5: Sp1 binding in the promoter. The SmaI/AvaI fragment containing 2.4 DNA from plasmid pSma-2 was used as a labeled probe, and an Sp1 oligonucleotide was used as competitor. The molar excess of the competitor is given above the lanes. The nonspecific competitor is the pBluescript polylinker used in 100-fold molar excess. The arrow points to the specific complex.
Deletion 4.5 also had a bidirectional effect, but, in contrast to the mutations described above, it resulted in increased transcription on both sides. This could result from removal of a repression element analogous to a structural control element in the dihydrofolate reductase (DHFR) promoter (Azizkhan et al., 1993) or could be because interactions required for bidirectional transcription are distance-dependent.
In order to search for protein-DNA interactions in the promoter region between SmaI sites 6 and 9, gel mobility shift assays were carried out with two DNA probes. The first probe was isolated as an AvaI/SmaI fragment from pSma-7 (see Fig. 7, nucleotides 188-262). Specific binding was not detected in this region (data not shown), even though it appears to contain the cis-acting sites needed for GPAT transcription (Fig. 2). This may reflect weak interactions in this region of the promoter that need the presence of distal sequences for stabilization. The second probe was isolated from plasmid pSma-7 as an XmaI/SalI fragment (see Fig. 7, nucleotides 262-349). Two specific complexes were detected using this probe and HeLa nuclear extract (Fig. 6, A and B). Both were competed with an octamer-containing oligonucleotide but not by nonspecific DNA. The two bands may result either from two proteins binding on this site or from protein-protein interactions resulting in the second complex that migrates more slowly. A DNase I footprint was obtained between nucleotides 314-334 on the top strand and nucleotides 320-339 on the bottom strand (Fig. 6C). These positions encompass the octamer-like motif downstream of the GPAT transcription start site (Fig. 7).
Figure 6: Protein binding to the GPAT octamer-like motif. A, gel retardation assay and competition by octamer DNA. The DNA complex was formed with HeLa nuclear extract and an XmaI/SalI end-labeled probe from plasmid pSma-7. Arrows point to specific complexes. Molar excess of an octamer-containing oligonucleotide is shown. B, effect of nonspecific pBluescript polylinker competitor (100-fold excess). C, DNase I footprints of the octamer-like site. The protein-DNA complex was formed with HeLa nuclear extract and an AvaI/SalI probe labeled on either strand. Lanes are shown for 0 (-), 50 µg (+), and 100 µg (++) of protein. Nucleotide positions were determined from an adjacent dideoxy sequencing ladder (not shown). The boundaries for the protected regions are numbered according to the sequence in Fig. 7.
Construct 2.7i is an inversion of the bidirectional promoter between points 2 and 7. With this inversion we expected to see increased transcription from the GPAT direction and a corresponding decrease from the AIRC side. This would allow an estimate of the relative contribution of Inr-like element(s) to transcription of both sides. However, transcription from both directions decreased relative to the wild type. This would appear to reflect a defined organization of sites with restricted interplay between elements that are at and that flank the sites for transcription initiation.
Earlier work has established that the chicken GPAT and AIRC genes are tightly linked and divergently transcribed from an intergenic region of about 230 bp (Gavalas et al., 1993). Operationally, the intergenic region was referred to as a bidirectional promoter. The objective of this work was to identify cis-elements that are important for promoter function, identify potential initiator elements in the TATA-less promoter that direct transcription initiation from well defined points, and distinguish between two models for bidirectional transcription: a model in which bidirectional transcription is driven by two largely independent promoters arranged back-to-back and a model in which a bidirectional promoter drives expression of both genes. In the former case, transcription of each gene should be driven using its own set of cis-acting sites, whereas in the latter case shared cis-elements would be used for transcription from both sides. In order to address these questions, potential cis-elements were inferred in the promoter sequence, and the promoter was scanned by deletion mutagenesis. The function of the mutant promoters was then tested in a bireporter vector that allowed a simultaneous assay of the effect of each deletion in both directions. The results support a model in which several important cis-elements function bidirectionally, and, therefore, expression of these genes is tightly coupled using these elements. Fig. 7 gives an overview of the basal promoter sequence and the elements to be discussed. This sequence includes the intergenic region plus approximately 60 bp encoding the 5`-untranslated region of each mRNA.
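The logic of distinguishing shared from side-specific cis-elements with a bireporter assay can be illustrated with a small sketch. The 50% cutoff, the category labels, and the example values are hypothetical choices made for this illustration; no quantitative threshold is stated in this work.

```python
def classify_deletion(gpat_pct, airc_pct, threshold=50.0):
    """Classify a deletion from reporter activities (% of wild type).

    ASSUMPTION: a side counts as 'affected' when its activity falls
    below `threshold`; the cutoff is illustrative, not from the paper.
    """
    gpat_down = gpat_pct < threshold
    airc_down = airc_pct < threshold
    if gpat_down and airc_down:
        return "bidirectional"
    if gpat_down:
        return "GPAT-predominant"
    if airc_down:
        return "AIRC-predominant"
    return "no major effect"


# HYPOTHETICAL activities (GPAT %, AIRC %) for three imaginary mutants.
examples = {
    "mutant A": (10.0, 15.0),
    "mutant B": (20.0, 90.0),
    "mutant C": (95.0, 30.0),
}
classes = {name: classify_deletion(*pcts) for name, pcts in examples.items()}
print(classes)
```

A deletion that lowers activity in both directions points to a shared element, as expected for an integrated bidirectional promoter, whereas a deletion that lowers only one side points to a side-specific element.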
An important element required for bidirectional transcription overlaps the AIRC transcription start site. The boundaries of this element within the 1.2 sequence shown in Fig. 7 were mapped by protein-DNA binding. By virtue of its requirement for transcription and a position overlapping the site for transcription initiation, we refer to this cis-element as Inr-like.
There are at least three classes of Inr elements in RNA polymerase II-transcribed promoters (Azizkhan et al., 1993; Weis and Reinberg, 1992). A 17-bp element around the transcription start site of the terminal deoxynucleotidyltransferase gene (Weis and Reinberg, 1992) directs transcription from a single nucleotide in vivo and in vitro and has a CTCANTCT consensus sequence, where the A represents the initiation site. Adenovirus major late and IVa2 promoter Inr elements (Smale and Baltimore, 1989) and the porphobilinogen deaminase gene Inr (Beaupain et al., 1990) belong to this class.
The minimal promoter elements required for expression of the dihydrofolate reductase (DHFR) gene are an Sp1 site and the DHFR Inr, which represents the second class of Inr elements. This Inr element is required for the hamster, mouse, and human DHFR genes (Azizkhan et al., 1993) and for the genes encoding hypoxanthine phosphoribosyltransferase, Ki-Ras, 3-phosphoglycerate kinase, osteonectin, and interferon regulatory factor 1 (Linton et al., 1989, and references therein).
The adeno-associated virus
type 2 p5 promoter has a third class of initiator that can function by
itself or can direct TATA- and Sp1-activated transcription (Seto et al., 1991). A similar element is found in the TATA-less
promoter of the human DNA polymerase gene (Weis and Reinberg,
1992).
The AIRC Inr-like element has no sequence similarity with the types of Inr elements mentioned above. A unique feature of this element is its bidirectionality. The AIRC Inr-like element may mediate assembly of the basal transcription machinery on both sides, or it may do so only for the AIRC side and act as a transcriptional activator for the distal side of the promoter. The experiments described here defined this element as a 41-bp region, which is unusually long compared with other types of Inr elements. Part of this element is an imperfect palindrome with a 3-bp spacer (Fig. 7). Further experiments will be needed to establish whether mutations in this region affect the selection of the transcription start site and whether this element is able to direct basal transcription in the absence of any upstream activator elements.
Apart from the AIRC Inr-like element, other sites were
shown to be important for bidirectional transcription. AIRC proximal GC boxes in region 2.4 are required for transcription of AIRC and GPAT. It is likely but was not directly
established that the two CCAAT boxes in fragment 3.4 contribute to the
function of this region in bidirectional transcription. Direct evidence
for the role of the two CCAAT boxes in fragments 3.4 and the one in
fragment 5.6 would require more precise mutational disruption or
detection of specific protein complexes with these sites. The
bidirectionality of the GC and CCAAT elements in the GPAT/AIRC promoter is not surprising, because they function in both
orientations upstream of the transcription start sites of their target
genes (Wingender, 1990). Examples of activation of bidirectional
transcription include a GC box in the center of the 130-bp intergenic
region for the α1(IV) and α2(IV) collagen genes (Heikkila et al., 1993),
a GC box in exon 1 of the α2(IV) gene
(Heikkila et al., 1993), and GC boxes between the
transcription start sites of DHFR and Rep-1 (Fujii et al., 1992).
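Because GC and CCAAT elements can function in either orientation, a search for candidate sites has to consider both strands. A minimal sketch of such a scan follows. The core motifs GGGCGG and CCAAT are used as simplified stand-ins for the full consensus sites, and the test sequences are the Sp1 and CTF/NF1 competitor oligonucleotides listed in the binding assay description; the function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_motif_both_strands(seq, motif):
    """Return (0-based position, strand) for every exact match of motif.

    A '-' hit means the reverse complement of the motif occurs at that
    position on the given (top) strand.
    """
    seq, motif = seq.upper(), motif.upper()
    rc = reverse_complement(motif)
    hits = []
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        if window == motif:
            hits.append((i, "+"))
        if window == rc and rc != motif:
            hits.append((i, "-"))
    return hits


# Competitor oligonucleotides quoted in the text.
SP1_OLIGO = "ATTCGATCGGGGCGGGGCGAGC"
CTF_NF1_OLIGO = "CCTTTGGCATGCTGCCAATATG"

# ASSUMPTION: GGGCGG and CCAAT are simplified core motifs.
gc_hits = find_motif_both_strands(SP1_OLIGO, "GGGCGG")
ccaat_hits = find_motif_both_strands(CTF_NF1_OLIGO, "CCAAT")
print(gc_hits, ccaat_hits)
```

Exact matching is the simplest choice here; a scan of a real promoter would usually tolerate mismatches or use a position weight matrix.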
Aside from sequences that activate GPAT-AIRC bidirectional transcription, several cis-elements influence only one promoter side. A GC box downstream of the AIRC transcription start site enhances expression of its cognate side. Sequences upstream and around the GPAT transcription start site in regions 6.7 and 7.8, respectively, function to activate transcription from this side. Downstream from the GPAT transcription start site an octamer-like motif, ATGTAAAT (differing by only 1 nucleotide from the consensus ATGCAAAT), was implicated in full expression of the GPAT side. The octamer motif can mediate activation of transcription by a subfamily of the POU transcription factors, the octamer-binding factors (Herr, 1992). Oct-1, the best characterized of its class, is broadly expressed and regulates transcription of small nuclear RNAs, histone H2B, and others via the ATGCAAAT motif. Other octamer-binding factors, such as Oct-2, Oct-4, and Oct-6, are expressed in a temporally and spatially restricted manner and are involved in developmental regulation (Schöler, 1991). The positioning of the element downstream of the GPAT transcription start site is peculiar to this promoter, because octamer motifs are generally found upstream of the transcription initiation site of their target gene.
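The single-nucleotide difference between the octamer-like motif and the consensus can be checked with a position-by-position comparison. The sketch below uses only the two sequences quoted above; the function name is illustrative.

```python
def mismatches(site, consensus):
    """List (1-based position, consensus base, site base) for each mismatch."""
    if len(site) != len(consensus):
        raise ValueError("sequences must have the same length")
    return [
        (i + 1, c, s)
        for i, (s, c) in enumerate(zip(site, consensus))
        if s != c
    ]


# Octamer-like motif from the GPAT side versus the consensus octamer.
diffs = mismatches("ATGTAAAT", "ATGCAAAT")
print(diffs)
```

The comparison reports one mismatch, at the fourth position, where the motif carries T in place of the consensus C.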
Deletion 4.5 in the middle of the promoter resulted in increased activity on both sides. In the absence of recognizable sequence motifs, this result may reflect a distance-dependent effect for cooperation of the two promoter sides or the presence of an uncharacterized repressor-like structural control element analogous to that of the DHFR/Rep-1 promoter (Azizkhan et al., 1993). Decreased bidirectional transcriptional activity in mutant 2.7i emphasizes the importance of correct alignment of activator sequences and Inr elements for maximal activity.
Earlier experiments (Gavalas et al., 1993) suggested that the GPAT and AIRC half-promoters retained 30 and 80% of the wild type activity for their cognate side, respectively. However, in these mutants, promoter context was altered with respect to upstream flanking sequences. These changes altered the vector substantially and made the comparison with the wild type less reliable than in the present work. Here, the sequence context was maintained in the half-promoter mutants; the vector was not altered. Therefore, we consider the present results, which indicate that the GPAT half-promoter has essentially no activity and that the AIRC half-promoter retains approximately 20-40% of the bidirectional activity, to be a better approximation of function.
A number of genes in vertebrates are divergently
transcribed from a bidirectional promoter element. These include the
housekeeping genes surf-1 and surf-2 of the surfeit locus (Colombo et al., 1992), the murine and
human α1(IV) and
α2(IV) collagen genes (Shimada et
al., 1989; Soininen et al., 1988), histone H2A and H2B
genes (Hentschel and Birnstiel, 1981), the DHFR and Rep-1 genes (Linton et al., 1989; Schilling and Farnham, 1989),
and the Wilms tumor locus (Huang et al., 1990). In other cases
of bidirectional transcription, genes were not identified for both
sides. These include the proliferating cell nuclear antigen gene
promoter (Rizzo et al., 1990), an SV40-like monkey genomic
locus (Saffer and Singer, 1984), the HTF9 CpG island (Lavia et
al., 1987), the human histidyl-tRNA synthetase gene (Tsui et
al., 1993), the c-myc oncogene promoter (Chang et
al., 1991), and the VH441 immunoglobulin heavy chain
promoter (Nguyen et al., 1991).
The AIRC/GPAT locus is the only case where two genes encoding enzymes of the same pathway are closely linked. This linkage may provide for co-regulation, but it is not a prerequisite for it, because five of seven human genes for AMP synthesis are found on different chromosomes. Given the properties of the AIRC Inr-like element, a number of models may explain how bidirectional transcription occurs from this promoter. Two basal transcription complexes (Buratowski, 1994) may assemble independently on each transcription start site. In this case, the protein(s) binding on the AIRC Inr would direct assembly of the transcription complex on the AIRC side and would act as transcriptional activator(s) for the distal GPAT side. Alternatively, transcriptional complexes having two orientations may assemble on an AIRC Inr. This would fit with the palindromic nature of the AIRC Inr and the bidirectional activity of deletion 5.9. Another possibility is that the AIRC and GPAT transcription start sites are close in space, through looping of the central promoter region, and, therefore, the AIRC Inr is able to direct assembly of the basal transcription complexes on both sites. The data currently available do not distinguish among these possibilities.
The nucleotide sequence reported in this paper has been submitted to the GenBank(TM)/EMBL Data Bank with accession number L12533.