©1995 by The American Society for Biochemistry and Molecular Biology, Inc.
DNA Binding Specificities and Pairing Rules of the Ah Receptor, ARNT, and SIM Proteins (*)

(Received for publication, May 22, 1995; and in revised form, August 24, 1995)

Hollie I. Swanson, William K. Chan, Christopher A. Bradfield (§)

From the Department of Molecular Pharmacology and Biological Chemistry, Northwestern University Medical School, Chicago, Illinois 60611

ABSTRACT

The Ah receptor (AHR), the Ah receptor nuclear translocator protein (ARNT), and single-minded protein (SIM) are members of the basic helix-loop-helix-PAS (bHLH-PAS) family of regulatory proteins. In this study, we examine the DNA half-site recognition and pairing rules for these proteins using oligonucleotide selection-amplification and coprecipitation protocols. Oligonucleotide selection-amplification revealed that a variety of bHLH-PAS protein combinations could interact, with each generating a unique DNA binding specificity. To validate the selection-amplification protocol, we demonstrated the preference of the AHR·ARNT complex for the sequence commonly found in dioxin-responsive enhancers in vivo (TNGCGTG). We then demonstrated that the ARNT protein is capable of forming a homodimer with a binding preference for the palindromic E-box sequence, CACGTG. Further examination indicated that ARNT may have a relaxed partner specificity, since it was also capable of forming a heterodimer with SIM and recognizing the sequence GT(G/A)CGTG. Coprecipitation experiments using various PAS proteins and ARNT were consistent with the idea that the ARNT protein has a broad range of interactions among the bHLH-PAS proteins, while the other members appear more restricted in their interactions. Comparison of these in vitro data with sites known to be bound in vivo suggests that the high affinity half-site recognition sequences for the AHR, SIM, and ARNT are T(C/T)GC, GT(G/A)C (5′-half-sites), and GTG (3′-half-site), respectively.


INTRODUCTION

The AHR (^1) is a bHLH protein that mediates the metabolic, carcinogenic, and teratogenic effects of compounds such as TCDD (1). In response to agonists, the AHR interacts with a related protein known as ARNT to form a dimeric (^2) complex that is capable of binding genomic enhancer elements, known as DREs, and activating transcription at adjacent promoters (2, 3, 4, 5). The AHR and ARNT have sequence similarities to two regulatory proteins found in Drosophila, SIM and PER (6, 7, 8, 9, 10). SIM is a developmentally regulated bHLH protein involved in controlling central nervous system midline gene expression (11). PER lacks a bHLH domain and thus may be an inhibitor of a related signaling pathway involved in the maintenance of circadian rhythms (12). The hallmark of this family of proteins is that they all possess homology in a sequence of 200-300 amino acids termed a PAS domain (13). In the AHR, the PAS domain has been shown to be involved in ligand binding and in interaction with Hsp90, and it may serve as a secondary surface to support ARNT dimerization (2, 14, 15, 16).

Basic/helix-loop-helix proteins are involved in a variety of tightly regulated biological processes, such as the regulation of myogenesis (MyoD/E47) (17), neurogenesis (Achaete-scute/Daughterless) (18), regulation of immunoglobulin genes (TFEC/TFE3) (19), cellular proliferation (Myc/Max) (20, 21), and xenobiotic metabolism (AHR/ARNT) (10). Biochemical and crystallographic data suggest that the HLH domains often act in concert with secondary dimerization surfaces (e.g. "leucine zippers" and possibly PAS domains) to position the two alpha helical basic regions within opposing major grooves of B-DNA, generating a "scissor grip" structure with high affinity for the core DNA sequence, CANNTG (22, 23, 24). This DNA enhancer sequence is commonly referred to as an E-box and contains either CG or GC dinucleotides at the degenerate positions (i.e. CACGTG or CAGCTG) (25, 26, 27, 28). Current models suggest that E-boxes can be viewed as containing two half-sites, with each partner's basic region determining half-site specificity (e.g. the 5′-CAN or the NTG-3′ half-sites within 5′-CANNTG-3′). The multiplicity of half-sites and potential dimerization partners may allow production of a large number of homo- or heterodimeric pairs, each with unique sequence binding specificities and consequences for cellular signaling. In contrast to the recognition sites for most bHLH dimers, the cognate response element of the AHR·ARNT complex, the DRE, usually contains TNGCGTG (5, 29, 30, 31, 32). Unlike the E-box, the DRE is not palindromic, and thus the DNA half-site specificities of each protein are not readily apparent and are probably different.
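The distinction drawn above between the palindromic E-box and the non-palindromic DRE can be illustrated with a short motif scan. The following is a minimal sketch, not part of the original study: the function names are hypothetical, only the top strand is scanned, and OL74 (one strand of the synthetic DRE listed under "Oligonucleotides") is used as the example input.

```python
import re

def motif_to_regex(motif):
    """Translate a motif into a regular expression; N matches any base."""
    return "".join("[ACGT]" if base == "N" else base for base in motif)

def find_motif(sequence, motif):
    """Return 0-based start positions of all matches, overlaps included."""
    pattern = re.compile("(?=" + motif_to_regex(motif) + ")")
    return [m.start() for m in pattern.finditer(sequence.upper())]

def is_palindromic(site):
    """True if the site equals its own reverse complement."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return site == "".join(complement[base] for base in reversed(site))

# OL74, one strand of the synthetic DRE used as a migration marker.
ol74 = "TCGAGCTGGGCCCATTGCGTGATCTAC"
dre_hits = find_motif(ol74, "TNGCGTG")   # DRE core
ebox_hits = find_motif(ol74, "CANNTG")   # generic E-box

print("DRE core positions:", dre_hits)
print("E-box positions:", ebox_hits)
print("CACGTG palindromic:", is_palindromic("CACGTG"))
print("TTGCGTG palindromic:", is_palindromic("TTGCGTG"))
```

A palindromic site reads the same on both strands, so its two half-sites are reverse complements of each other; the DRE core fails this check, which is why its half-site assignments cannot be read directly from the sequence.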

In this study, we employed a DNA selection and amplification protocol to identify those bHLH-PAS protein combinations that could form productive DNA binding species and to characterize their individual DNA recognition sites. To validate the approach, we first demonstrated that the AHR·ARNT heterodimer would select the known DRE sequence from a pool of over 10^7 sequences. We then used this selection approach to demonstrate that ARNT also has the capacity to form homodimers as well as heterodimers with SIM, with each complex generating a unique DNA sequence binding specificity. Integration of the DNA selection and coprecipitation results allowed us to deduce the half-site specificities and pairing rules for the AHR, ARNT, and SIM.


EXPERIMENTAL PROCEDURES

Materials

The SIM cDNA expression plasmid, pSIMNB40, was a gift from Dr. Stephen Crews (University of North Carolina, Chapel Hill). The plasmids pmuAHR, pmuAHRCΔ516, pmuAHRGNΔ315, and phuARNT were constructed as described previously (2, 33). The affinity-purified anti-ARNT polyclonal immunoglobulins were a gift from Dr. Alan Poland (34). The affinity-purified anti-AHR polyclonal immunoglobulins, G1295 4B, were raised in a goat against a synthetic peptide corresponding to the N-terminal sequence of the protein as described previously (35). Purified immunoglobulin was obtained from Sigma. Nickel-nitrilotriacetic acid resin was obtained from Qiagen (Chatsworth, CA).

Oligonucleotides

Oligonucleotides were synthesized at the Northwestern University Biotechnology Center using an Applied Biosystems DNA synthesizer (Foster City, CA). The commonly recognized core DRE sequences are underlined, E-box recognition sites are in bold, and SIM/ARNT recognition sites are in italics. OL73, TCGAGTAGATCACGCAATGGGCCCAGC; OL74, TCGAGCTGGGCCCATTGCGTGATCTAC; OL185, GGCGGATCCTGAGTCTGAAC; OL186, CGTCTCGAGACGCTCAGG; OL187, GGCGGATCCTGAGTCTGAACN(13)CCTGAGCGTCTCGAGACG; OL224, GGCGGATCCGATCTAGATTCN(7)GCGTGN(7)CCTGAGCGTCTCGAGACG; OL225, GGCGGATCCGATCTAGATTC; OL316, TCGAGCTGGGCAGGTCATGTGGCAAGGC; OL317, TCGAGCCTTGCCACATGACCTGCCCAGC; OL318, TCGAGCTGGGGGCATTGCGTGACATACC; OL319, TCGAGGTATGTCACGCAATGCCCCCAGC; OL321, TCGAGCTGGGCAGGTCACGTGGCAAGGC; OL322, TCGAGCCTTGCCACGTGCACCTGCCCAGC; OL323, TCGAGCTGGGCAGGTCAGCTGGCAAGGC; OL324, TCGAGCCTTGCCAGCTGACCTGCCCAGC; OL329, TCGAGCTGGGCATGTCACGTGACCGAGC; OL330, TCGAGCTCGGTCACGTGACATGCCCAGC; OL331, TCGAGCCATGGGATGTGCGTGACATTTC; OL332, TCGAGAAATGTCACGCACATCCCATGGC; OL464, TCGAGCCATGGGATGTACGTGACATTTC; OL465, TCGAGAAATGTCACGTACATCCCATGGC; OL501, TCGACTAGAAATTTGTACGTGCCACAGA; OL502, TCTGTGGCACGTACAAATTTCTAGTCGA; OL503, TCGACTAGAAATTTGTGCGTGCCACAGA; OL504, TCTGTGGCACGCACAAATTTCTAGTCGA.

Protein Expression

In vitro expression of the AHR, AHRCΔ516, AHRGNΔ315, ARNT, and SIM proteins was carried out in rabbit reticulocyte lysates (Promega) as previously reported (2). For verification of protein expression, the translation was performed in the presence of [35S]methionine, and the product was analyzed by SDS-polyacrylamide gel electrophoresis. The expressed proteins were quantitated by excising the radiolabeled proteins from the gel and scintillation counting. Baculovirus expression and purification of histidine-tagged AHR and ARNT were carried out as reported previously (36).

Gel Shift Analysis

The DNA probes were radiolabeled either with [γ-32P]ATP, by end labeling with T4 polynucleotide kinase (37), or by PCR of the appropriate template in the presence of [α-32P]dCTP, using OL186 and either OL185 or OL225 as primers (27). Unincorporated nucleotides were removed using a 1-ml G-25 Sephadex spin column. The protein combinations were incubated for 30 min at 30 °C to facilitate protein dimerization. The clone AHRCΔ516, a constitutively active form of the AHR that interacts with ARNT and binds DNA in a ligand-independent manner, was used to circumvent the use of agonist in some experiments (2). When full-length AHR was used, the incubation period was extended to 2 h in the presence of 10 µM of the AHR agonist β-naphthoflavone. To minimize nonspecific interactions, 200 ng of poly(dI-dC) was added to the protein mixture along with KCl (final concentration, 100 mM). After 10 min of incubation at room temperature, the DNA probe was added (100,000 cpm), and the sample was allowed to incubate for an additional 10 min. Samples were then subjected to 4% acrylamide nondenaturing gel electrophoresis using 0.5× TBE (45 mM Tris base, 45 mM boric acid, 1 mM EDTA, pH 8.0) as the running buffer (38).

DNA Selection and Amplification

The DNA binding site selection and amplification was performed essentially as described (27). For example, 10 ng of OL187 containing 13 sequential 4-fold degenerate nucleotides (4^13 ≈ 7 × 10^7 possible sequences) was annealed to a 5-fold molar excess of primer OL186. The complementary strand was synthesized by incubation with the Klenow fragment of DNA polymerase (5 units) at 37 °C for 1 h. The resultant double-stranded DNA was purified by agarose gel electrophoresis (NuSieve, FMC Bioproducts, Rockland, ME), electroelution, and precipitation. For the first round of selection, 10 ng of the double-stranded oligonucleotide pool and either 1 fmol of in vitro expressed protein or 20 fmol of baculovirus-expressed protein were subjected to gel shift analysis. The electrophoresis was terminated when the bromphenol blue dye marker had migrated 1.5 cm. In this manner, the protein-complexed oligomer could be efficiently recovered and the majority of unbound oligonucleotide eliminated. The protein-bound oligonucleotide was then isolated from the upper 1 cm of the gel and was eluted for 3 h at 37 °C in buffer containing 10 mM Tris-HCl (pH 8.0), 1 mM EDTA, 50 mM NaCl, and 0.2% SDS. The eluant was extracted with phenol:chloroform:isoamyl alcohol (25:24:1), 10 µg of glycogen was added, and the DNA was precipitated. One-fifth of the recovered oligonucleotide pool was amplified by PCR. PCR conditions were 95 °C (1 min), 55 °C (1 min), 72 °C (30 s) for 25 cycles. Reactions contained 10 mM Tris-HCl, 50 mM KCl, 1.5 mM MgCl2, 0.001% (w/v) gelatin, 200 µM of each deoxyribonucleotide triphosphate, 2 units of Taq polymerase, and primers OL185 and OL186 (when OL187 was the template) or OL186 and OL225 (when OL224 was the template) in a total volume of 100 µl. After ethidium bromide visualization, approximately 5 ng of amplified template was radiolabeled by PCR and subjected to gel shift analysis for the subsequent round of selection.

In these later rounds of selection, the bromphenol blue dye was allowed to migrate approximately 8 cm from the top of the gel to achieve higher resolution. The gels were then dried, the specific complexes were visualized following autoradiography, and the appropriate areas were excised. In most analyses, a double-stranded oligonucleotide corresponding to a commonly used synthetic DRE (annealed OL73/74) served as a migration marker. In initial rounds of selection and amplification, specific complex formation was determined by its migration similar to the OL73/74 complex and its dependence on the expressed protein (e.g. absent in lanes containing only one member of the heteromeric pair or unprogrammed reticulocyte lysate when analyzing for homomeric interactions). The presence of the AHR or ARNT in a complex was verified by the ability to "supershift" the complex upon polyacrylamide gel electrophoresis using either anti-AHR or anti-ARNT immunoglobulins (1 ng). Once a discrete protein-oligomer complex could be detected (typically after three or four rounds of selection and amplification), the amplified oligonucleotide was either digested with BamHI and XhoI and subcloned into pBluescript SK (Stratagene) or extracted with phenol:chloroform:isoamyl alcohol (25:24:1) and directly subcloned into pGEM-T (Promega). Individual clones were sequenced using the dideoxy chain termination method (39).
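The theoretical complexity of a degenerate oligonucleotide pool follows directly from the number of random positions. As a minimal sketch (the function name is hypothetical), the calculation for a pool with 13 random positions, such as OL187, and for one with seven random positions on each side of a fixed core, such as OL224, can be written as:

```python
def pool_complexity(random_positions, alphabet_size=4):
    """Number of distinct sequences in a fully degenerate pool."""
    return alphabet_size ** random_positions

ol187_complexity = pool_complexity(13)     # 13 contiguous random positions
ol224_complexity = pool_complexity(7 + 7)  # 7 random positions flanking each side of GCGTG

print(f"OL187: {ol187_complexity:,} possible sequences")
print(f"OL224: {ol224_complexity:,} possible sequences")
```

These values (about 6.7 × 10^7 and 2.7 × 10^8) are upper bounds on sequence diversity; they say nothing about how evenly the synthesized pool represents each sequence, which is why unselected pools were sequenced as controls.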

Dissociation Rate Analysis

The dissociation rates of each DNA binding complex (i.e. full-length AHR·ARNT, ARNT·ARNT, and SIM·ARNT) were determined by gel shift analysis using the indicated DNA sequences as probes. For each off-rate analysis, a master binding reaction equivalent to at least six reactions as described above was used with 1 ng of end-labeled probe. Following binding, a 100-200-fold molar excess of unlabeled, double-stranded oligonucleotide that was identical to the probe DNA was added. Aliquots (25 µl) were removed and analyzed at the indicated times. To determine the end point value, a 100-fold excess of unlabeled competitor was added prior to the introduction of labeled probe. Complex formation at each time point was determined using a Fuji PhosphorImager. The amount of protein-DNA complex from the end point value was subtracted from each intermediate time point value. To evaluate possible degradation of the protein-DNA complexes, a mixture containing the protein(s) and end-labeled probe was incubated for 20 min in the absence of competitor oligonucleotide (data not shown). Quantitation of this control indicated a degradation of less than 5% of the complex over the time period analyzed. Half-life (t1/2) was calculated from the slope of the linear regression curve, where t1/2 = 0.693/k and k = -(2.303)(slope).
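The half-life calculation described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' analysis code: it assumes the regression is of log10(fraction of complex remaining) against time, which is what the factor 2.303 implies, and the time points and fractions are synthetic values chosen for the example.

```python
import math

def half_life(times, fraction_bound):
    """Estimate t1/2 from a least-squares fit of log10(fraction bound) vs. time.

    Uses the relations quoted in the text: k = -(2.303)(slope), t1/2 = 0.693/k.
    """
    logs = [math.log10(f) for f in fraction_bound]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    sxx = sum((t - mean_t) ** 2 for t in times)
    slope = sxy / sxx          # change in log10(fraction) per unit time
    k = -2.303 * slope         # first-order dissociation rate constant
    return 0.693 / k

# Synthetic example: a complex that decays with a 5-min half-life.
times_min = [0, 2, 4, 6, 8, 10]
fractions = [0.5 ** (t / 5) for t in times_min]
estimated_t_half = half_life(times_min, fractions)
print(f"Estimated half-life: {estimated_t_half:.2f} min")
```

Because the end point value is subtracted from every time point before fitting, the input fractions should already be background-corrected; the fit itself assumes simple first-order dissociation.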

Coprecipitation

Sf9 soluble extract containing approximately 120 µg of baculovirus-expressed ARNT and 35S-labeled reticulocyte lysate-expressed protein (ARNT, full-length AHR, AHRCΔ516, AHRGNΔ315, or SIM) was combined with the nickel-nitrilotriacetic acid resin in wash buffer (50 mM Tris, pH 7.4, 100 mM KCl, 10% glycerol, 10 mM β-mercaptoethanol, 0.4% Tween 20, and 5 mM imidazole) and mixed gently for 2 h at 4 °C. In parallel reactions, uninfected Sf9 soluble extract containing similar amounts of total protein was substituted for ARNT soluble extract as a negative control. Samples containing oligonucleotides were incubated at room temperature for 10 min in the presence of poly(dI-dC) (10 µg) followed by the addition of the indicated oligonucleotides prior to the 2-h incubation. The resin was pelleted by centrifugation at 16,000 × g for 10 s, and the samples were washed five times using 1 ml of wash buffer. The pellets were resuspended and analyzed by SDS-polyacrylamide gel electrophoresis and autoradiography.

Statistical Analysis

The χ-square goodness of fit test was used to determine whether frequencies of nucleotides at each position of the oligonucleotide were different from expected random frequencies (40). In the case where DNA selection and amplification-derived AHR·ARNT sequences were compared with those present in bona fide (^3) DREs, two-by-two contingency tables were used to compare frequencies of nucleotides. Significance for all tests was set at p < 0.01.
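The per-position goodness of fit test can be sketched as below. This is a minimal illustration under stated assumptions, not the procedure of reference 40: it assumes the expected random frequency is 25% for each nucleotide, it uses the standard critical value for 3 degrees of freedom at p = 0.01 (11.345), and the counts shown are hypothetical.

```python
# Chi-square critical value for 3 degrees of freedom at p = 0.01.
CRITICAL_DF3_P01 = 11.345

def chi_square_uniform(counts):
    """Goodness-of-fit statistic against equal expected counts for A, C, G, T."""
    total = sum(counts.get(base, 0) for base in "ACGT")
    expected = total / 4
    return sum((counts.get(base, 0) - expected) ** 2 / expected for base in "ACGT")

def is_nonrandom(counts):
    """True if the position departs from random expectation at p < 0.01."""
    return chi_square_uniform(counts) > CRITICAL_DF3_P01

# Hypothetical counts at two positions of an alignment of 24 clones.
conserved_position = {"A": 0, "C": 0, "G": 24, "T": 0}
variable_position = {"A": 8, "C": 5, "G": 6, "T": 5}

print(chi_square_uniform(conserved_position), is_nonrandom(conserved_position))
print(chi_square_uniform(variable_position), is_nonrandom(variable_position))
```

With only about 20 clones per alignment, expected counts are small (five or six per nucleotide), so the statistic should be read as a screening criterion for conserved positions rather than a precise probability.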


RESULTS

Validation of the DNA Selection and Amplification Strategy

To validate the DNA selection and amplification technique, we first examined the nucleotide specificity of the AHR·ARNT heterodimer. We amplified a pool of oligonucleotides, derived from OL187, to generate double-stranded oligomers that contained 13 consecutive random nucleotides, theoretically encoding approximately 7 × 10^7 (4^13) unique sequences. The oligonucleotides that specifically bound to the AHR·ARNT complex were subjected to three rounds of selection and amplification. Twenty-four selected oligonucleotides were cloned and sequenced (Fig. 1A). Statistical analysis by χ-square was performed to identify those nucleotides preferentially selected by the AHR·ARNT complex (Fig. 1B). Nucleotides that occurred at greater than expected frequencies (p < 0.01) were used to derive a consensus recognition sequence, TNGCGTGC (Fig. 1C). Of these oligonucleotides, 22 contained the GCGTG core sequence that is commonly found in bona fide DREs. Two oligonucleotides, AHA23 and AHA24, contained similar core motifs, TCGTG and GTGTG, respectively. Subsequent gel shift analysis indicated that these two sequences were capable of binding AHR·ARNT complexes, albeit at lower affinities than those sites containing the complete core motif, GCGTG (results not shown). Analysis of the OL187 oligonucleotide pool that was cloned, amplified, and sequenced directly, without selection by the AHR·ARNT heterodimer, served as a control. χ-Square analysis of these sequences indicated that the AHR·ARNT selected sequence was not the result of biased oligonucleotide synthesis (Fig. 1D).


Figure 1: AHR·ARNT recognition sites selected from random sequences after three rounds of selection. A, the double-stranded oligonucleotide pool generated from OL187 was incubated with 1 fmol of reticulocyte lysate-expressed AHRCΔ516 and ARNT. The mixture was subjected to the selection and amplification protocol, and the individual clones were sequenced. The most highly conserved sequence, GCGTG, is boxed. B, tabulation of nucleotide frequencies at each position (n = 24). All frequencies were multiplied by 100. The frequencies of individual nucleotides were analyzed by χ-square at the p < 0.01 level. Frequencies above the expected random level are underlined. C, an AHR·ARNT consensus sequence derived from statistically significant nucleotides. D, a sample from the double-stranded oligonucleotide pool (OL187) was cloned and sequenced to verify equal representation of each nucleotide, and the frequencies were calculated and analyzed by χ-square (n = 19).



Analysis of Sequences Flanking the GCGTG Motif

As shown in Fig. 1, the position of the GCGTG was biased toward the 3′-end of the oligonucleotide. To determine if this bias was the result of flanking nucleotides, we constructed an additional nucleotide pool (OL224) that fixed the core motif, GCGTG, between seven random nucleotides on the 3′- and 5′-ends. This oligonucleotide pool, containing approximately 3 × 10^8 possible sequences, was subjected to three rounds of the selection and amplification protocol, and the selected oligonucleotides were sequenced (Fig. 2A). χ-Square analysis was performed (Fig. 2B) and indicated that nucleotide preference occurred at 11 of the 14 flanking positions, resulting in a consensus sequence of GGGNATYGCGTGACANNCC (underlined sequences are fixed, Fig. 2C). Again, analysis of the control oligonucleotide pool indicated that nucleotide preference was not the result of biased oligonucleotide synthesis (Fig. 2D). To confirm that our consensus sequence was highly specific for the AHR·ARNT complex, we synthesized the corresponding oligonucleotide (OL318/319) and performed gel shift analysis. As demonstrated in Fig. 3, complex formation required both proteins: neither ARNT nor AHRCΔ516 recognized this motif alone (Fig. 3, lanes 1-3). Recognition of the consensus sequence by full-length AHR and ARNT was ligand responsive (Fig. 3, lanes 4 and 5), and the complex was recognized by anti-ARNT and anti-AHR immunoglobulins (Fig. 3, lanes 6 and 7). Addition of purified immunoglobulin did not affect the migration of the AHR·ARNT complex (Fig. 3, lane 8).


Figure 2: AHR·ARNT selection analysis of sequences flanking the GCGTG core. A, the double-stranded OL224 oligonucleotide pool containing the fixed sequence, GCGTG, flanked by seven random nucleotides on each side was incubated with 1 fmol of reticulocyte lysate-expressed AHRCΔ516 and ARNT. The mixture was subjected to three rounds of DNA selection and amplification, and the individual clones were sequenced. B, tabulation of the nucleotide frequency at each position (n = 25). All frequencies were multiplied by 100. The frequencies of individual nucleotides were analyzed by χ-square at the p < 0.01 level. Frequencies above the expected level are underlined. C, nucleotides with above expected frequencies were used to derive an AHR·ARNT consensus sequence. D, the double-stranded oligonucleotide pool (OL224) was cloned and sequenced to verify equal representation of each nucleotide, and the frequencies were calculated and analyzed by χ-square (n = 23).




Figure 3: Specific recognition of the derived consensus sequence obtained from DNA selection and amplification by the AHR·ARNT complex. Approximately 0.5 fmol of reticulocyte lysate-expressed proteins were subjected to gel shift analysis using OL318/319, the derived AHR·ARNT consensus sequence (Fig. 2C), as a probe. The AHR·ARNT complex is indicated by the arrow. The incubations contained the following proteins: lane 1, ARNT alone; lane 2, AHRCΔ516 alone; lane 3, both AHRCΔ516 and ARNT; lane 4, full-length AHR and ARNT incubated with dimethyl sulfoxide (vehicle control); lane 5, full-length AHR and ARNT incubated with 10 µM β-naphthoflavone; lane 6, AHRCΔ516 and ARNT incubated with anti-ARNT immunoglobulin; lane 7, AHRCΔ516 and ARNT incubated with anti-AHR immunoglobulin (G1295); lane 8, AHRCΔ516 and ARNT incubated with purified IgG.



Comparison of the AHR·ARNT Selected Sequence With Bona Fide Enhancer Elements

To support the idea that our strategy would select for biologically relevant DNA binding motifs, we compared the consensus sequence selected by the AHR·ARNT complex in vitro to sequences known to correspond to functional enhancers in vivo. For this comparison, we first analyzed 10 bona fide DREs to determine the frequency of nucleotides at each position (Fig. 4). These frequencies were then compared to the corresponding frequencies observed in the selected and amplified oligonucleotides (Fig. 4D). The in vitro derived consensus was similar to the bona fide DREs at 14 of the 19 nucleotide positions. Statistically significant differences were detected at the outermost positions (-8, -9, 9, 10) and at the -5 position.


Figure 4: Sequence comparison of AHR·ARNT binding motifs with bona fide DREs found upstream of several regulated genes. A, the indicated binding motifs are found in the upstream regions of the following genes: sites A-F and DRE 4, mouse cytochrome P4501A1 (31, 52); rXRE1 and rXRE2, rat cytochrome P4501A1 (these are identical to sites D and E and thus were omitted from the analysis) (29); Ya DRE, glutathione S-transferase Ya (53); QR DRE, quinone reductase (54); and huXRE, human cytochrome P4501A1 (55). B, tabulation of nucleotide frequencies at each position (n = 10). All frequencies were multiplied by 100. The frequencies of individual nucleotides were analyzed by χ-square at the p < 0.01 level. Frequencies above the expected level are underlined. C, a bona fide AHR·ARNT consensus sequence derived from statistically significant nucleotides. D, nucleotide frequencies of the DRE obtained by the selection and amplification analysis (in vitro, Fig. 2) were compared to those that occur in bona fide DREs from part A. Frequencies that were statistically significantly different are indicated by a superscript. a, C occurs more frequently in the bona fide DREs than in the in vitro analysis. b, A occurs more frequently in the in vitro analysis than in bona fide DREs. c, A occurs more frequently in the bona fide DREs than in the in vitro analysis. d, C occurs more frequently in the in vitro analysis than in bona fide DREs. e, G occurs more frequently in the bona fide DREs than in the in vitro analysis.



Selection and Amplification of ARNT-Homodimer Recognition Sequences

Purified ARNT obtained from baculovirus-infected Sf9 cells (36) with the addition of unprogrammed reticulocyte lysate was subjected to the same DNA selection and amplification protocol described above, using double-stranded oligonucleotides generated from OL187. After four rounds of selection and amplification, 20 ARNT-specific sequences were aligned and analyzed by χ-square to yield a consensus sequence, CACGTG (Fig. 5). Unlike the oligonucleotides selected from OL187 by the AHR·ARNT complex, no bias was observed due to flanking nucleotides, and no statistically significant specificities were observed for nucleotides that flanked this core (Fig. 5B). Four sequences (AA17, AA18, AA19, AA20) that contained the AACGTG motif were also amplified. Gel shift analysis demonstrated that these sequences were recognized by the ARNT complex but at a lower affinity than sequences containing the CACGTG sequence (data not shown). To confirm that the derived consensus sequence, CACGTG, was specific for ARNT homodimers, we synthesized the corresponding consensus oligonucleotide and demonstrated that a specific ARNT·DNA complex was formed in gel shift analysis (Fig. 6A, lane 2). The presence of ARNT in the complex was confirmed by supershifting the complex in the presence of anti-ARNT immunoglobulin (Fig. 6A, lane 4) but not purified immunoglobulin (Fig. 6A, lane 5). In agreement with our previous results (36), purified bHLH-PAS proteins require heat denaturable factor(s) found in reticulocyte lysate for function (Fig. 6A, lanes 1-3). The addition of bovine serum albumin also stabilizes ARNT dimer formation, although to a lesser degree, demonstrating that the only bHLH-PAS protein in the complex is ARNT (Fig. 6A, lane 3). Finally, we confirmed that the ARNT·DNA binding complex could be formed at the lower concentrations of ARNT that are typically generated in the reticulocyte lysate expression system and that may also be found in cells (i.e. 1 fmol/5 µl) (Fig. 6B, lane 1).


Figure 5: Determination of ARNT homodimer DNA recognition sites. A, the double-stranded oligonucleotide pool containing 13 random nucleotides (OL187) was incubated with 20 fmol of baculovirus-expressed ARNT and 10 µg of unprogrammed reticulocyte lysate, the mixture was subjected to four rounds of DNA selection and amplification, and the individual clones were sequenced. The most highly conserved sequence, CACGTG, is boxed. Lower case letters represent nucleotides that comprise the primer annealing region of the oligonucleotide and do not represent randomly selected nucleotides. B, tabulation of the nucleotide frequency at each position (n = 20). All frequencies were multiplied by 100. The frequencies of individual nucleotides were analyzed by χ-square at the p < 0.01 level. Frequencies above the expected level are underlined. C, an ARNT consensus sequence derived from statistically significant nucleotides.




Figure 6: Specificity of ARNT homodimer recognition of the derived consensus sequence. Approximately 4 fmol of baculovirus (Bac)-expressed ARNT (A) or 1 fmol of reticulocyte lysate-expressed ARNT (B) was subjected to gel shift analysis using OL329/330, the derived ARNT consensus sequence (Fig. 5C), as a probe. A, lane 1, baculovirus-expressed ARNT alone; lane 2, baculovirus-expressed ARNT with 10 µg of unprogrammed reticulocyte lysate; lane 3, baculovirus-expressed ARNT with 10 µg of bovine serum albumin; lane 4, baculovirus-expressed ARNT with 10 µg of unprogrammed reticulocyte lysate and anti-ARNT immunoglobulin; lane 5, baculovirus-expressed ARNT with 10 µg of unprogrammed reticulocyte lysate and purified IgG. B, lane 1, reticulocyte lysate-expressed ARNT; lane 2, reticulocyte lysate-expressed ARNT with anti-ARNT immunoglobulin. Panel B was subjected to three times longer exposure than panel A. The arrow indicates the ARNT·ARNT DNA binding complex.



ARNT and SIM Interact Resulting in Unique DNA Binding Specificity

The selection and amplification protocol was performed to determine if ARNT could interact with SIM and recognize a specific DNA sequence. Using the oligonucleotide pool derived from OL187, we were unable to select and amplify a discrete SIM·ARNT·DNA complex that was dependent on the presence of both proteins. We repeated the procedure using OL224 as the oligonucleotide source (see "Discussion"). Following four rounds of selection and amplification, a pool of specific SIM·ARNT selected DNA was cloned and sequenced. Given the apparently weak interaction of the complex and the comigration of nonspecific protein-oligonucleotide species, 80 of the selected sequences were radiolabeled, and each was individually reanalyzed by gel shift analysis to confirm its interaction with both SIM and ARNT. Of the 80 amplified oligonucleotides, 19 were specific for the SIM·ARNT complex as judged by the formation of specific gel shift bands that were detected only in the presence of both proteins and that were recognized by the ARNT-specific antibodies (Fig. 7A). Nucleotides that were associated with SIM·ARNT·DNA complex formation were identified by χ-square analysis and were used to derive a consensus sequence, GNNNNGTGCGTGANNNTCC (Fig. 7, B and C). Gel shift analysis using an oligonucleotide corresponding to the derived consensus sequence (OL331/332) confirmed that it was specific for the SIM·ARNT complex (Fig. 8A). Again, complex formation required both proteins, since neither ARNT nor SIM could recognize the sequence alone (Fig. 8A, lanes 1-3), and the complex was recognized by ARNT-specific antibodies but not purified IgG (Fig. 8A, lanes 4 and 5).


Figure 7: Determination of SIM·ARNT DNA recognition sites. A, double-stranded OL224 containing the fixed sequence, GCGTG, and flanked by seven random nucleotides was incubated with 1 fmol each of reticulocyte lysate-expressed SIM and ARNT, the mixture was subjected to four rounds of DNA selection and amplification, and the individual clones were sequenced. The most highly conserved sequence, GTGCGTGA, is boxed. B, tabulation of the nucleotide frequencies at each position (n = 19). All frequencies were multiplied by 100. The frequencies of individual nucleotides were analyzed by χ-square at the p < 0.01 level. Frequencies above the expected level are underlined. C, a SIM·ARNT consensus sequence derived from statistically significant nucleotides.




Figure 8: Gel shift analysis of the SIM·ARNT DNA binding complex. Approximately 0.5 fmol of reticulocyte lysate-expressed SIM and 4 fmol of baculovirus-expressed ARNT were subjected to gel shift analysis using OL331/332, which contained the derived SIM·ARNT consensus sequence (Fig. 7C), as a probe. A, specificity of SIM·ARNT heterodimer recognition of the derived consensus sequence. The incubation mixtures contained ARNT with 5 µl of unprogrammed reticulocyte lysate (lane 1), SIM alone (lane 2), ARNT and SIM (lane 3), ARNT and SIM with the anti-ARNT immunoglobulin (lane 4), and ARNT and SIM with purified IgG (lane 5). The arrow indicates the SIM·ARNT DNA binding complex. B, gel shift analysis of SIM and ARNT using 32P-labeled oligonucleotides corresponding to different flanking and core sequences. Gel shift experiments were performed with SIM alone (lanes 2, 5, 8, and 11), ARNT alone (lanes 3, 6, 9, and 12), or both SIM and ARNT (lanes 1, 4, 7, and 10) using 32P-labeled OL331/332 (GGGATGTGCGTGACATTC, lanes 1-3), OL464/465 (GGGATGTACGTGACATTC, lanes 4-6), OL501/502 (AATTTGTACGTGCCACAGA, lanes 7-9), or OL503/504 (AATTTGTGCGTGCCACAGA, lanes 10-12). Unprogrammed reticulocyte lysate was added, if necessary, to normalize the amount of lysate in each reaction.



While this work was in review, a report described a consensus sequence found upstream of SIM-regulated genes in Drosophila, GTACGTG (41). This core sequence differed by a single nucleotide from the sequence deduced by our in vitro approach (i.e. GTACGTG versus GTGCGTG). Since our selected SIM·ARNT sequence was biased for a G at this position due to the use of oligonucleotides with a fixed GCGTG core, we chose to examine the impact of this single nucleotide difference on binding by the SIM·ARNT complex. To control for effects of adjacent sequences, we engineered oligonucleotides that contained these two core sequences into flanking sequences derived from either the SIM·ARNT consensus that was deduced in Fig. 7C (i.e. GGGATGT(A/G)CGTGACATTC; OL464/465 and OL331/332, respectively) or the SIM-dependent enhancer found upstream of the Drosophila Tl gene (i.e. AATTTGT(A/G)CGTGCCACAGA; OL501/502 and OL503/504, respectively). Gel shift analysis indicated that all four sequences were bound by the SIM·ARNT complex with a similar binding affinity. Thus, either an A or a G is well tolerated at this position, with no difference in binding observed when the core sequence is within the context of the flanking sequences derived from the Tl enhancer (Fig. 8, A and B).

Half-site Recognition of ARNT, AHR, and SIM

The experiments described above suggest that ARNT is capable of forming a homodimer that recognizes the previously described E-box sequence, CACGTG, a heterodimer with the AHR that recognizes TNGCGTG, and a heterodimer with SIM that recognizes GT(G/A)CGTG. Since all ARNT-containing complexes bind sequences with a GTG 3′-half-site and the ARNT-alone complex binds a palindrome of this site (CACGTG), we conclude that this half-site corresponds to an ARNT binding half-site. The observation that unique heteromeric partners each yield different 5′-half-sites is consistent with T(C/T)GC being the 5′-half-site of the AHR and GT(G/A)C being the 5′-half-site of SIM.
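The half-site assignments above can be illustrated with a short Python sketch. It is not part of the original analysis: the dictionary and function names are hypothetical, the degenerate positions are simply written as regular-expression character classes, and only the strand supplied is searched. The sketch joins each putative 5′-half-site to the ARNT 3′-half-site GTG and reports which predicted core occurs in an oligonucleotide.

```python
import re
from typing import Dict, List

# ARNT 3'-half-site shared by all three complexes described in the text.
ARNT_3PRIME_HALF_SITE = "GTG"

# Putative 5'-half-sites, with (X/Y) alternatives written as regex classes.
# "ARNT" stands for the second ARNT subunit of the homodimer, which reads
# GTG on the opposite strand (CAC on the strand shown).
FIVE_PRIME_HALF_SITES: Dict[str, str] = {
    "AHR": "T[CT]GC",
    "SIM": "GT[GA]C",
    "ARNT": "CAC",
}

# Hypothetical helper table: one compiled core pattern per ARNT partner.
CORE_PATTERNS = {
    partner: re.compile(half_site + ARNT_3PRIME_HALF_SITE)
    for partner, half_site in FIVE_PRIME_HALF_SITES.items()
}


def predicted_partners(oligo: str) -> List[str]:
    """Return the ARNT partners whose predicted core occurs in the oligo."""
    oligo = oligo.upper()
    return [p for p, pattern in CORE_PATTERNS.items() if pattern.search(oligo)]


print(predicted_partners("GGGATGTGCGTGACATTC"))  # OL331/332 sequence
print(predicted_partners("GGGATGTACGTGACATTC"))  # OL464/465 sequence
print(predicted_partners("CACGTG"))              # palindromic E-box
```

Pattern matching of this kind only says whether a predicted core is present; it says nothing about affinity or about the contribution of flanking nucleotides.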

Examination of Other Possible PAS-Protein DNA Complexes

In an effort to determine if additional PAS proteins could interact and generate DNA binding specificity, we attempted our selection and amplification protocol with either OL187 or OL224 and SIM, AHR, or a combination of the AHR and SIM. After several rounds of selection, neither the AHR, SIM, nor a combination of the two proteins yielded a specific DNA binding complex. To increase the sensitivity of these attempts, experiments were also performed using baculovirus-expressed AHR. All combinations were repeated three times without detection of a specific DNA binding complex. In addition, we synthesized oligonucleotides containing a palindrome of the predicted recognition half-sites of the AHR and SIM (core sequences of T(C/T)GCGC(A/G)A and GTGCGCAC, respectively). Gel shift analysis of either the AHR or SIM with these radiolabeled oligonucleotides failed to yield specific DNA binding complex formation (data not shown).
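The palindromic test sequences mentioned above follow mechanically from the half-sites: each core is a 5′-half-site joined to its own reverse complement. The Python sketch below shows that construction; the function names are hypothetical and the code is an illustration, not the procedure used to design the oligonucleotides.

```python
from itertools import product
from typing import List

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def is_palindromic(seq: str) -> bool:
    """True if the sequence equals its own reverse complement."""
    return seq == reverse_complement(seq)


def palindromes_from_half_site(positions: List[str]) -> List[str]:
    """Build half-site + reverse complement for every expansion.

    `positions` lists the allowed bases at each position of the
    5'-half-site, e.g. ["T", "CT", "G", "C"] for T(C/T)GC.
    """
    half_sites = ["".join(bases) for bases in product(*positions)]
    return [h + reverse_complement(h) for h in half_sites]


print(palindromes_from_half_site(["T", "CT", "G", "C"]))  # AHR half-site
print(palindromes_from_half_site(["G", "T", "G", "C"]))   # SIM half-site, GTGC
print(is_palindromic("CACGTG"))                            # ARNT E-box
```

The two AHR-derived sequences produced this way are the strictly palindromic members of the degenerate core T(C/T)GCGC(A/G)A, and the SIM-derived sequence is GTGCGCAC.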

DNA Binding Specificity of bHLH-PAS Dimers for Their Selected Consensus Sequences and Various E-boxes

As an additional demonstration of DNA binding specificity, we used competitive binding analysis to compare the affinities of bHLH-PAS dimers for oligonucleotides corresponding to their consensus DNA sequences and a variety of E-boxes. Competitive binding analysis with each productive bHLH-PAS pair (i.e. AHR·ARNT, ARNT·ARNT, or SIM·ARNT) demonstrated that each DNA binding complex had the greatest affinity for its derived consensus sequence over all of the E-box sequences tested (see Fig. 9, A-C). Presence of the ARNT homodimer consensus sequence (OL329/330) diminished the complex formation in all reactions that contained the ARNT protein (Fig. 9, A and C, lane 3). The ARNT homomeric species demonstrated the greatest affinity for the E-box CACGTG (Fig. 9B, lanes 3 and 5), with much lower affinity for the TNGCGTG sequence (Fig. 9B, lane 2) or the other E-boxes, CAGCTG or CATGTG (Fig. 9B, lanes 6 and 7).


Figure 9: Specificity of DNA recognition by AHR·ARNT, ARNT·ARNT, and SIM·ARNT complexes. Gel shift analyses are shown of incubation mixtures containing reticulocyte lysate-expressed ARNT and AHR (0.5 fmol of each protein) with OL318/319 as the probe (A), baculovirus-expressed ARNT (4 fmol) and 10 µg of unprogrammed reticulocyte lysate with OL329/330 as the probe (B), or reticulocyte lysate-expressed SIM and ARNT (0.5 fmol of each protein) with OL331/332 as the probe (C) and 100-fold molar excess of the indicated competing oligonucleotides: lane 1, none; lane 2, OL318/319; lane 3, OL329/330; lane 4, OL331/332; lane 5, OL321/322; lane 6, OL323/324; and lane 7, OL316/317.



Relative DNA Binding Affinities of AHR-, ARNT-, and SIM-containing Complexes

To obtain estimates of the relative DNA binding affinities of the full-length AHR·ARNT, ARNT·ARNT, and SIM·ARNT complexes, we performed dissociation rate analysis using the gel shift assay as an end point. As shown in Fig. 10, the calculated half-life values of the full-length AHR·ARNT and ARNT·ARNT complexes are similar (3.2 versus 5.06 min), while that of the SIM·ARNT complex was considerably shorter (less than 0.2 min), indicating much more rapid dissociation.


Figure 10: Dissociation rate analysis of the full-length AHR·ARNT, ARNT·ARNT, and SIM·ARNT DNA binding complexes. Each binding reaction containing the indicated proteins was allowed to come to binding equilibrium with 1 ng of the appropriate radiolabeled oligonucleotide (i.e. the derived consensus sequence of each DNA binding complex). Excess of unlabeled oligonucleotide was added to the mixture, and aliquots were removed at the indicated time points. Each value represents the average of two independent experiments ± S.E. See "Experimental Procedures" for details.
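The half-life values reported for the three complexes can be converted into dissociation rate constants if one assumes simple first-order (single-exponential) decay of the protein-DNA complex. That assumption, and the function names below, are ours for illustration; the fitting procedure actually used is described under Experimental Procedures, not here. Because the half-life of the SIM and ARNT complex is given only as less than 0.2 min, the value computed for it is a lower bound on its rate constant.

```python
import math


def dissociation_rate_constant(half_life_min: float) -> float:
    """First-order rate constant (per minute) from a half-life in minutes."""
    return math.log(2) / half_life_min


def fraction_bound(t_min: float, half_life_min: float) -> float:
    """Fraction of complex remaining after t_min minutes of first-order decay."""
    return math.exp(-dissociation_rate_constant(half_life_min) * t_min)


# Half-lives quoted in the text (minutes); the SIM value is an upper bound.
half_lives = {"AHR/ARNT": 3.2, "ARNT/ARNT": 5.06, "SIM/ARNT": 0.2}

for name, t_half in half_lives.items():
    k_off = dissociation_rate_constant(t_half)
    remaining = fraction_bound(1.0, t_half)
    print(f"{name}: k_off = {k_off:.2f} per min, "
          f"fraction bound after 1 min = {remaining:.2f}")
```

Under this assumption, roughly four-fifths of the AHR-containing and ARNT-only complexes would remain after one minute, while a complex with a half-life of 0.2 min or less would be almost entirely dissociated.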



Demonstration of PAS Protein Interactions by Coprecipitation

To further establish the interaction of bHLH-PAS proteins, we utilized a coprecipitation assay (Fig. 11). Protein-protein interactions of ARNT-AHR, ARNT-AHRCΔ516, and ARNT-ARNT, but not ARNT-SIM, were observed. Specificity of the ARNT-containing interactions was demonstrated by the lack of coprecipitation using the 35S-labeled GNΔ315 AHR construct in which most of the dimerization domain has been replaced by the DNA binding and dimerization domain of Gal4 (2). Interestingly, ARNT-ARNT interactions were observed only when the incubations contained the CACGTG-containing oligonucleotides.


Figure 11: Coprecipitation analysis of AHR-ARNT, ARNT-ARNT, and SIM-ARNT interactions. 35S-labeled full-length AHR (top left), AHRCΔ516 (top middle), AHRGNΔ315 (top right), ARNT (bottom left), and SIM (bottom right) were coprecipitated in the presence (+) or absence (-) of the baculovirus-expressed six histidine-tagged ARNT (ARNT-his) using nickel-nitrilotriacetic acid resin in the presence or absence of the following: 10 µM beta-naphthoflavone (top left), OL318/319 (top middle), OL329/330, OL316/317, OL323/324 (bottom left), and OL331/332 (bottom right).




DISCUSSION

Strategy

Our hypothesis was that bHLH-PAS proteins could form a variety of heteromeric and homomeric combinations and that each complex would display unique oligonucleotide binding specificities. We predicted that the analysis of these different recognition sites would allow us to deduce the half-site specificity of each protein. To test these ideas, we utilized a DNA selection and amplification strategy to identify the preferred recognition sequences of various AHR, ARNT, and SIM combinations (27). The oligonucleotides bound by these protein complexes were isolated from pools of millions of independent, unbound sequences. Once selected by the protein complex, the oligonucleotides were isolated from nondenaturing polyacrylamide gels and amplified by PCR. To increase the specificity of the method, the oligonucleotide pools were typically subjected to multiple rounds of selection and amplification prior to cloning and sequence analysis. The power of this method arises from the fact that it is independent of any prior knowledge or preconceptions regarding DNA binding specificity and has the potential to yield information about protein-DNA interactions not readily attainable by more conventional methods such as DNA footprinting or site-directed mutagenesis of a single oligonucleotide sequence.
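The size of such a pool can be estimated with a one-line calculation. The sketch below assumes that every randomized position is an unbiased mixture of the four bases, which is an idealization, and uses 13 randomized positions because that is the number stated for OL187 in the validation subsection; the function name is hypothetical.

```python
def pool_complexity(randomized_positions: int, alphabet_size: int = 4) -> int:
    """Number of distinct sequences in a fully randomized oligonucleotide pool."""
    return alphabet_size ** randomized_positions


# OL187 is described as having mixed bases at 13 sequential positions.
print(f"{pool_complexity(13):,} possible 13-mers")
```

A pool randomized at 13 positions therefore contains tens of millions of possible sequences, in line with the "millions of independent sequences" described above.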

Specific versus Nonspecific Interactions

A number of approaches were used to ensure that amplified sequences were specific for the protein complex and not simply sequences that were nonspecifically comigrating in the gel. First, bands of amplified oligonucleotides were analyzed (considered specific) only if the band was dependent upon the presence of all of the bHLH-PAS proteins used in the assay. Second, specificity was confirmed by the capacity of ARNT- or AHR-specific antibodies to supershift the radiolabeled complex. Third, a consensus was deduced from each set of selected oligonucleotides, and this information was used to design consensus oligonucleotides that were used in gel shift assays to confirm specificity of interaction. Only in the case of SIM·ARNT sequences was the presence of a comigrating nonspecific oligonucleotide observed. In this case, we reanalyzed each of the 80 amplified oligonucleotides independently by gel shift analysis to eliminate any nonspecific sequences (see above).

Validation of the DNA Selection and Amplification Strategy

To validate our strategy, we first employed this technique using the AHR·ARNT complex that recognizes the DRE sequence, TNGCGTG (5, 29, 30, 31, 32). We anticipated one of two outcomes. Either the AHR·ARNT complex would recognize sequences containing this known core and validate our experimental approach, or the complex would recognize a unique DNA sequence, such as the E-box motif, that is commonly recognized by most other bHLH proteins. In our initial experiment, we performed AHR·ARNT selection on a pool of oligonucleotides that had mixed bases incorporated at 13 sequential positions (OL187). Chi-square analysis of the nucleotide frequencies at various positions revealed a consensus sequence of TNGCGTGC. This sequence was essentially identical to the previously described DRE, TNGCGTG (5, 29, 30, 31, 32). No sequences conforming to E-boxes were found in any of the 24 clones that were sequenced.
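A chi-square test on the nucleotide counts at a single aligned position can be sketched as follows. This is an illustration under stated assumptions rather than the authors' exact procedure: it assumes that each of the four bases is expected at equal frequency in the unselected pool, it uses the standard 5% critical value for 3 degrees of freedom (7.815), and the counts shown are hypothetical numbers for 24 clones, not data from this study.

```python
from typing import Dict

# Chi-square critical value for 3 degrees of freedom at the 5% level.
CRITICAL_VALUE_DF3_P05 = 7.815


def chi_square_uniform(counts: Dict[str, int]) -> float:
    """Chi-square statistic against equal expected counts for each base."""
    total = sum(counts.values())
    expected = total / len(counts)
    return sum((observed - expected) ** 2 / expected
               for observed in counts.values())


def is_selected(counts: Dict[str, int]) -> bool:
    """True if base usage at this position deviates from random at p < 0.05."""
    return chi_square_uniform(counts) > CRITICAL_VALUE_DF3_P05


# Hypothetical counts at two positions among 24 sequenced clones.
core_position = {"A": 1, "C": 1, "G": 21, "T": 1}
flank_position = {"A": 7, "C": 5, "G": 6, "T": 6}

print(chi_square_uniform(core_position), is_selected(core_position))
print(chi_square_uniform(flank_position), is_selected(flank_position))
```

With only 24 clones the expected count per base is 6, so modest preferences at flanking positions are difficult to distinguish from chance, whereas a strongly conserved core position stands out clearly.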

The analysis presented in Fig. 1 indicated that the positioning of the TNGCGTG core sequence within the random 13-mer was biased by the flanking sequences required for annealing PCR primers (i.e. most core sequences were found closer to the 3′-end of the oligonucleotide, Fig. 1A). This observation led us to examine the impact of flanking sequences on AHR·ARNT DNA binding specificity. The analysis using OL224 as the oligonucleotide pool revealed a consensus binding sequence of GGGNAT(C/T)GCGTGACANNCC (Fig. 2). (^4) Nucleotides that were present at frequencies above expected random values were identified at 11 of the 14 flanking positions, including those in positions -4, -3, 4, 5, and 6. These results are consistent with those obtained using substitution mutagenesis of a DRE-containing oligonucleotide (31, 32). The selection of flanking nucleotides suggests that both the AHR and ARNT (or other proteins within this complex) are capable of DNA contacts at sites adjacent to the commonly recognized core sequence. In addition, our results suggest that positions not identified previously, the -9, -8, -7, -5, 9, and 10 positions, are selected for and thus could also play a role in the AHR·ARNT-DNA recognition.

If binding affinity is the only determinant of a functional DRE in vivo, then our consensus sequence for the AHR·ARNT complex should be identical to bona fide DREs. In an attempt to address this question, we compared our selected sequences to 10 DREs known to function upstream of TCDD-regulated genes. Since similarity is a difficult assertion to prove statistically, we identified those nucleotides that were statistically different. The most interesting discrepancy between the in vitro and in vivo consensus is the preference for an A at position -5 for the in vitro derived sequence and the lack of an A at -5 in any reported DRE. The absence of A at -5 may be an indication that inappropriate contacts are occurring in vitro, that additional proteins are required for in vivo interactions, or that some attenuation of binding affinity is required for optimal control of gene expression in vivo.

DNA Recognition by ARNT Homodimers

In an effort to determine half-site recognition of ARNT and to determine if ARNT could recognize a specific DNA sequence as a homodimer or as a heterodimer with other bHLH-PAS partners, we performed a series of selection and amplification experiments with various combinations of the AHR, ARNT, and SIM. The observation that ARNT is not found in association with Hsp90 (42) and is present at high concentrations in the nuclear compartment of hepatoma cells (34) led us to first attempt to characterize oligonucleotide sequences that were specifically bound by ARNT alone (presumably as an ARNT homodimer). We attempted to increase the sensitivity of the selection by using ARNT that had been purified from a baculovirus expression system (36). Given our previous results suggesting that the purified AHR from this expression system required uncharacterized protein factors for DNA binding, we routinely added 10 µg of unprogrammed reticulocyte lysate to the ARNT/oligonucleotide incubation mixture (36). Using these conditions, we found that ARNT recognized the sequence CACGTG. This complex migrated to a position similar to that of the AHR·ARNT and SIM·ARNT heterodimers, suggesting that ARNT recognized this sequence as an oligomer of a size similar to the other complexes, presumed to be dimeric. Further, the ability of bovine serum albumin to stabilize the ARNT complex (albeit to a lesser degree) indicates that the ARNT DNA binding complex is not the result of an interaction with an unknown protein present in the reticulocyte lysate. As shown in both Fig. 6B and Fig. 11, we could also detect this interaction using the concentrations of ARNT generated in our reticulocyte lysate system (1 fmol). This indicates that the interaction can occur at the lower ARNT concentrations that may be found in cell nuclei (34).
While this manuscript was in review, work by Sogawa et al. (43) also reported that ARNT homodimers recognize the CACGTG motif and used chimeric reporter constructs to suggest that this interaction may be capable of up-regulating endogenous promoters downstream of the corresponding E-box element in vivo. Our dissociation rate experiments indicate that the relative stabilities of the AHR·ARNT and ARNT·ARNT complexes for their respective recognition sites are similar (Fig. 10). However, analysis of these complexes by coprecipitation yielded lower amounts of complexed ARNT·ARNT than of AHR·ARNT (especially in the absence of oligonucleotides). The reason for this discrepancy is unclear, but it may indicate that ARNT·ARNT interactions are weaker than AHR·ARNT interactions in the absence of DNA. Taken together, these studies suggest that the ARNT·ARNT homodimer may act as an important transcriptional regulator through its interaction with E-box elements.

DNA Recognition by SIM·ARNT Heterodimers

The observation that ARNT homodimeric complexes could specifically interact with DNA suggested that other bHLH-PAS combinations might recognize unique DNA sequences and shed light on the half-site recognition and pairing rules of this family of transcription factors. After our initial attempts to demonstrate SIM-ARNT-DNA interactions using OL187 were unsuccessful, we initiated the selection analysis with OL224. This strategy was chosen for two reasons. First, the AHR and SIM share the highest degree of sequence similarity in their bHLH, PAS, and C-terminal domains (Fig. 12) (6); thus, we predicted these would have the most similar DNA recognition sequences. Second, our preliminary experiments led us to suspect that amplification of ARNT homodimer-specific sequences (CACGTG) was preferentially occurring in our attempts to select and amplify sequences specific for the SIM·ARNT complex. Our results presented in Fig. 5 indicated that use of OL224 would minimize DNA interactions resulting from ARNT homodimers, thus minimizing contamination by ARNT-specific sequences (i.e. CACGTG). Using this strategy, we were able to amplify oligonucleotides that bound SIM·ARNT complexes specifically, with the consensus sequence GNNNNGTGCGTGANNNTCC. Our failure to detect SIM-ARNT interactions using the coprecipitation assay (Fig. 11), combined with the rapid dissociation rate of the SIM·ARNT·DNA complex (Fig. 10), indicates that the SIM-ARNT interaction is relatively weak. The weak interaction of the SIM-ARNT complex found in this study is in contrast to that reported by Sogawa et al. (43).


Figure 12: Comparison of basic regions and recognition half-sites of bHLH-PAS proteins to other bHLH proteins. The conserved glutamic acid and arginine residues of class A and class B are indicated with a solid line. The conserved arginine that distinguishes class B is indicated with a dotted line. The letter B denotes a basic amino acid, and dashes represent highly degenerate positions.



Recently, a number of SIM-responsive elements have been cloned from Drosophila using an enhancer trapping technique (41). Sequence alignment of these regulatory elements revealed a consensus motif, (G/A)(T/A)ACGTG. This sequence differs by a single nucleotide when compared to the SIM·ARNT consensus core sequence we describe in Fig. 7, GTGCGTG. The difference exists at the -2 position within the putative SIM binding 5′-half-site (A versus G). To examine the importance of this nucleotide position, we performed a series of gel shift experiments to determine the impact that this nucleotide had on SIM·ARNT recognition. We found that both A and G at the -2 position are specifically bound by the SIM·ARNT complex (Fig. 8B). Our inability to predict an A nucleotide at this position arose from our use of OL224, which has a fixed GCGTG core (see above). Thus, we conclude that the in vitro SIM·ARNT consensus core sequence is more appropriately GT(A/G)CGTG, with GTACGTG possibly having greater relevance to SIM-responsive gene regulation in vivo.

Half-site Recognition of ARNT, AHR, and SIM

The identification of half-site recognition of ARNT, AHR, and SIM in combination with analysis of the amino acid sequences of their basic regions should provide insights into the relationships between the bHLH-PAS proteins and members of other bHLH families. Interestingly, the ARNT-specific sequence half-site is also recognized by other bHLH proteins such as Max (44), Myc (45), and USF (22, 46, 47). The bHLH proteins that bind the 3′-half-site GTG sequence (binding CACGTG as homodimers) have been denoted as class B proteins and are distinguished by the presence of an arginine (R) residue in their basic region immediately following the sequence ERRR (i.e. ERRRR) (48) (Fig. 12). The bHLH proteins that lack this C-terminal Arg residue commonly recognize the 3′-half-site CTG sequence (binding CAGCTG) and are denoted class A. Our results suggest that ARNT is a class B protein since its homomeric form recognizes the palindromic CACGTG sequence with greatest affinity, and its basic region has an Arg residue at the characteristic position. In addition, many bHLH proteins possess a critical glutamic acid residue (ERRR), which has been shown to contact the CA of the E-box sequence CANNTG (49). Although this residue is present in the basic region of ARNT, it does not occur at corresponding positions in either the AHR or SIM proteins. Thus, by predictions derived from these rules and from their primary amino acid sequences, neither the AHR nor SIM proteins would be expected to bind any known E-box half-sites. Our results support this prediction and suggest that when complexed with ARNT, the AHR has the greatest affinity for the 5′-half-site T(C/T)GC, and SIM has the greatest affinity for the half-site GT(A/G)C. We suggest that these proteins represent a unique class of bHLH proteins and designate this group as class C. While this paper was in review, another group determined the position of ARNT as the 3′-GTG half-site of the DRE (50).

Pairing Rules of bHLH-PAS DNA Binding Complexes

Our results indicate that certain rules dictate pairing and subsequent DNA binding of bHLH-PAS proteins. In contrast to the identification of DNA binding complexes formed with ARNT alone, AHR and ARNT, or SIM and ARNT, no oligonucleotide sequences could be selectively amplified when the AHR and SIM (each alone or mixed) were used as the binding species. These experiments were repeated multiple times, using either OL187 or OL224 and the higher concentrations of protein that were attainable with baculovirus-expressed AHR. The fact that heterodimeric binding of the bHLH-PAS proteins was detected only with ARNT suggests that ARNT may be a general dimerization partner for PAS proteins that respond to cellular signals. In addition, the multiplicity of productive bHLH-PAS protein combinations may have a significant impact on the spectrum of DNA binding sites, enhancer elements, and responsive genes affected by these proteins in the presence and absence of compounds such as TCDD. A second explanation for the limited number of bHLH-PAS protein pairs that were detected by this method cannot be ruled out. Our inability to detect AHR or SIM homodimeric or AHR-SIM heterodimeric interactions with DNA may be due to a failure of the method to detect weaker protein-protein or protein-DNA interactions in vitro.

Summary

These data support several important conclusions. First, ARNT is capable of forming distinct DNA binding complexes with another molecule of ARNT, the AHR, or SIM. This suggests that bHLH-PAS proteins may be involved in a combinatorial mechanism of gene regulation that involves the formation of multiple homo- or heterodimeric pairs, each with a role in controlling expression of distinct batteries of genes (51). For example, the observation that ARNT may interact with E-box elements suggests that in the absence of AHR agonists, ARNT homodimers play a role in the regulation of a second battery of genes, possibly through interactions at E-boxes that may be down-regulated in the presence of TCDD. Second, since ARNT is capable of recognizing DNA as a component of several distinct complexes, we were able to elucidate the DNA recognition half-sites of these PAS proteins. As predicted by amino acid sequence homology to other class B bHLH proteins, ARNT recognizes the 3′-half-site GTG. In contrast, the basic region amino acid sequences of both the AHR and SIM are unique and specify distinct 5′-half-sites, T(C/T)GC and GT(A/G)C, respectively. Finally, the AHR·ARNT complex displays a preference for nucleotides that flank the core T(C/T)GCGTG motif, suggesting that the protein-DNA interactions of this complex extend beyond the core motif. Other PAS protein complexes (i.e. ARNT·ARNT or SIM·ARNT) display fewer preferences for flanking nucleotides, suggesting that the sequence specificity of various PAS protein complexes may differ substantially or may be less restricted than that of the AHR·ARNT complex.


FOOTNOTES

*
This work was supported by The Pew Foundation, National Institutes of Health Grants ES-05703, ES-05660, and ES-05589, and a postdoctoral fellowship sponsored by The Colgate-Palmolive Co. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

§
To whom correspondence should be addressed: Dept. of Molecular Pharmacology and Biological Chemistry, Northwestern University Medical School, 303 East Chicago Ave., Chicago, IL, 60611. Tel.: 312-503-9855; Fax: 312-503-5349; c-bradfield@nwu.edu.

(^1)
The abbreviations used are: AHR, Ah receptor; bHLH, basic helix-loop-helix; TCDD, 2,3,7,8-tetrachlorodibenzo-p-dioxin; DRE, dioxin-responsive elements; PER, period; Hsp90, heat shock protein of 90 kDa; AHRCΔ516, AHR deletion in which 516 amino acid residues have been removed from the C terminus; AHRGNΔ315, AHR construct in which the N-terminal 316 amino acids have been replaced by the DNA binding and dimerization domain of Gal4 (3); PCR, polymerase chain reaction.

(^2)
In this manuscript, we refer to the DNA binding complexes of bHLH-PAS proteins as dimers. This assumption is based on x-ray crystallographic evidence demonstrating that the fundamental DNA binding complex of most bHLH proteins is dimeric (23, 24, 49). However, we cannot rule out the possibility that higher order complexes exist, such as dimers of dimers (tetramers), with each dimer independently interacting with separate DNA sites (22). The existence of such higher order complexes would not alter any of our conclusions.

(^3)
Bona fide enhancer elements are those defined DREs that have been shown to regulate gene transcription in a ligand-responsive manner in vivo.

(^4)
Interestingly, this consensus sequence did not contain a C at position 4 as was observed in the analysis presented in Fig. 1C. Given the proximity of this C to the primer site in OL187, the observed bias reported using that oligonucleotide, and our inability to reproduce the conservation of C at position 4 using OL224, we consider the assignments derived in Fig. 2C as our final AHR·ARNT selected consensus sequence.


ACKNOWLEDGEMENTS

We thank Alan Poland for the anti-ARNT polyclonal immunoglobulins, Stephen Crews for pSIMNB40, and Alfred Rademaker of The Biostatistics Core of the Robert Lurie Cancer Center for assistance with chi-square analysis.


REFERENCES

  1. Poland, A., and Knutson, J. C. (1982) Annu. Rev. Pharmacol. Toxicol. 22, 517-554
  2. Dolwick, K. M., Swanson, H. I., and Bradfield, C. A. (1993) Proc. Natl. Acad. Sci. U. S. A. 90, 8566-8570
  3. Hoffman, E. C., Reyes, H., Chu, F. F., Sander, F., Conley, L. H., Brooks, B. A., and Hankinson, O. (1991) Science 252, 954-958
  4. Reyes, H., Reisz-Porszasz, S., and Hankinson, O. (1992) Science 256, 1193-1195
  5. Denison, M. S., Fisher, J. M., and Whitlock, J. P., Jr. (1989) J. Biol. Chem. 264, 16478-16482
  6. Burbach, K. M., Poland, A., and Bradfield, C. A. (1992) Proc. Natl. Acad. Sci. U. S. A. 89, 8185-8189
  7. Crews, S. T., Thomas, J. B., and Goodman, C. S. (1988) Cell 52, 143-151
  8. Ema, M., Sogawa, K., Watanabe, N., Chujoh, Y., Matsushita, N., Gotoh, O., Funae, Y., and Fujii-Kuriyama, Y. (1992) Biochem. Biophys. Res. Commun. 184, 246-253
  9. Jackson, F. R., Bargiello, T. A., Yun, S. H., and Young, M. W. (1986) Nature 320, 185-188
  10. Swanson, H. I., and Bradfield, C. A. (1993) Pharmacogenetics 3, 213-230
  11. Muralidhar, M. G., Callahan, C. A., and Thomas, J. B. (1993) Mech. Dev. 41, 129-138
  12. Zeng, H., Hardin, P. E., and Rosbash, M. (1994) EMBO J. 13, 3590-3598
  13. Nambu, J. R., Lewis, J. O., Wharton, K. A., Jr., and Crews, S. T. (1991) Cell 67, 1157-1167
  14. Huang, Z. J., Edery, I., and Rosbash, M. (1993) Nature 364, 259-262
  15. Reisz-Porszasz, S., Probst, M. R., Fukunaga, B. N., and Hankinson, O. (1994) Mol. Cell. Biol. 14, 6075-6086
  16. Whitelaw, M. L., Gottlicher, M., Gustafsson, J. A., and Poellinger, L. (1993) EMBO J. 12, 4169-4179
  17. Neuhold, L. A., and Wold, B. (1993) Cell 74, 1033-1042
  18. Cabrera, C. V., and Alonso, M. C. (1991) EMBO J. 10, 2965-2973
  19. Zhao, G.-Q., Zhao, Q., Zhou, X., Mattei, M.-G., and Crombrugghe, B. (1993) Mol. Cell. Biol. 13, 4505-4512
  20. Amati, B., Brooks, M. W., Levy, N., Littlewood, T. D., Evan, G. I., and Land, H. (1993) Cell 72, 233-245
  21. Gu, W., Cechova, K., Tassi, V., and Dalla-Favera, R. (1993) Proc. Natl. Acad. Sci. U. S. A. 90, 2935-2939
  22. Ferré-D'Amaré, A. R., Pognonec, P., Roeder, R. G., and Burley, S. K. (1994) EMBO J. 13, 180-189
  23. Ferré-D'Amaré, A. R., Prendergast, G. C., Ziff, E. B., and Burley, S. K. (1993) Nature 363, 38-45
  24. Ma, P. C., Rould, M. A., Weintraub, H., and Pabo, C. O. (1994) Cell 77, 451-459
  25. Blackwell, T. K., Huang, J., Ma, A., Kretzner, L., Alt, F. A., Eisenman, R. N., and Weintraub, H. (1993) Mol. Cell. Biol. 13, 5216-5224
  26. Davis, R. L., Cheng, P.-F., Lassar, A. B., and Weintraub, H. (1990) Cell 60, 733-746
  27. Blackwell, T. K., and Weintraub, H. (1990) Science 250, 1104-1110
  28. Murre, C., McCaw, P. S., Vaessin, H., Caudy, M., Jan, L. Y., Jan, Y. N., Cabrera, C. V., Buskin, J. N., Hauschka, S. D., Lassar, A. B., Weintraub, H., and Baltimore, D. (1989) Cell 58, 537-544
  29. Fujisawa-Sehara, A., Sogawa, K., Yamane, M., and Fujii-Kuriyama, Y. (1987) Nucleic Acids Res. 15, 4179-4191
  30. Neuhold, L. A., Shirayoshi, Y., Ozato, K., Jones, J. E., and Nebert, D. W. (1989) Mol. Cell. Biol. 9, 2378-2386
  31. Shen, E. S., and Whitlock, J. P., Jr. (1992) J. Biol. Chem. 267, 6815-6819
  32. Yao, E. F., and Denison, M. S. (1992) Biochemistry 31, 5060-5067
  33. Dolwick, K. M., Schmidt, J. V., Carver, L. A., Swanson, H. I., and Bradfield, C. A. (1993) Mol. Pharmacol. 44, 911-917
  34. Pollenz, R. S., Sattler, C. A., and Poland, A. (1994) Mol. Pharmacol. 45, 428-438
  35. Poland, A., Glover, E., and Bradfield, C. A. (1991) Mol. Pharmacol. 39, 20-26
  36. Chan, W. K., Chu, R., Jain, S., Reddy, J. K., and Bradfield, C. A. (1994) J. Biol. Chem. 269, 26464-26471
  37. Sambrook, J., Fritsch, E. F., and Maniatis, T. (1989) Molecular Cloning: A Laboratory Manual, 2nd Ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY
  38. Ellington, A. (1987) in Current Protocols in Molecular Biology (Ausubel, F. A., Brent, R. B., Kingston, R. E., Moore, D. D., Seidman, J. G., Smith, J. A., and Struhl, K., eds) pp. 2.5.1-2.5.6, Greene Publishing and Wiley-Interscience, New York
  39. Sanger, F., Nicklen, S., and Coulson, A. R. (1977) Proc. Natl. Acad. Sci. U. S. A. 74, 5463-5467
  40. Milton, J. S., and Tsokos, J. O. (1983) Statistical Methods in the Biological and Health Sciences (Corrigan, J. J., and Amar, J. S., eds) McGraw-Hill, Inc., New York
  41. Wharton, K. J., Franks, R. G., Kasai, Y., and Crews, S. T. (1994) Development 120, 3563-3569
  42. McGuire, J., Whitelaw, M. L., Pongratz, I., Gustafsson, J. A., and Poellinger, L. (1994) Mol. Cell. Biol. 14, 2438-2446
  43. Sogawa, K., Nakano, R., Kobayashi, A., Kikuchi, Y., Ohe, N., Matsushita, N., and Fujii, K. Y. (1995) Proc. Natl. Acad. Sci. U. S. A. 92, 1936-1940
  44. Blackwood, E. M., and Eisenman, R. N. (1991) Science 251, 1211-1217
  45. Blackwell, T. K., Kretzner, L., Blackwood, E. M., Eisenman, R. N., and Weintraub, H. (1990) Science 250, 1149-1151
  46. Prendergast, G. C., and Ziff, E. B. (1991) Science 251, 186-189
  47. Pognonec, P., and Roeder, R. G. (1991) Mol. Cell. Biol. 11, 5125-5136
  48. Dang, C. V., Dolde, C., Gillison, M. L., and Kato, G. J. (1992) Proc. Natl. Acad. Sci. U. S. A. 89, 599-602
  49. Ellenberger, T., Fass, D., Arnaud, M., and Harrison, S. (1994) Genes and Dev. 8, 970-980
  50. Bacsi, S. G., Reisz-Porszasz, S., and Hankinson, O. (1995) Mol. Pharmacol. 47, 432-438
  51. Lahoz, E. G., Xu, L., Schreiber-Agus, N., and DePinho, R. A. (1994) Proc. Natl. Acad. Sci. U. S. A. 91, 5503-5507
  52. Lusska, A., Shen, E., and Whitlock, J. P., Jr. (1993) J. Biol. Chem. 268, 6575-6580
  53. Rushmore, T. H., King, R. G., Paulson, K. E., and Pickett, C. B. (1990) Proc. Natl. Acad. Sci. U. S. A. 87, 3826-3830
  54. Favreau, L. V., and Pickett, C. B. (1991) J. Biol. Chem. 266, 4556-4561
  55. Hines, R. N., Mathis, J. M., and Jacob, C. S. (1988) Carcinogenesis 9, 1599-1605
