©1995 by The American Society for Biochemistry and Molecular Biology, Inc.
Solution Structure of the Sequence-specific HMG Box of the Lymphocyte Transcriptional Activator Sox-4 (*)

(Received for publication, June 14, 1995; and in revised form, September 6, 1995)

Leo P. A. van Houte (1) (2)(§), Vasily P. Chuprina (2) (3), Marc van der Wetering (1), Rolf Boelens (2), Robert Kaptein (2), Hans Clevers (1)

From the (1)Department of Immunology, University Hospital, P.O. Box 85500, 3508 GA Utrecht, The Netherlands, (2)Bijvoet Center for Biomolecular Research, University of Utrecht, Padualaan 8, 3584 CH Utrecht, The Netherlands, and the (3)Institute of Mathematical Problems of Biology, Russian Academy of Sciences, 142292 Pushchino, Moscow Region, Russia

ABSTRACT

Two groups of HMG box proteins are distinguished. Proteins in the first group contain multiple HMG boxes, are non-sequence-specific, and recognize structural features as found in cruciform DNA and cross-over DNA. The abundant chromosomal protein HMG-1 belongs to this subgroup. Proteins in the second group carry a single HMG box with affinity for the minor groove of the heptamer motif AACAAAG or variations thereof. A solution structure for the non-sequence-specific C-terminal HMG box of HMG-1 has recently been proposed. Now, we report the solution structure of the sequence-specific HMG-box of the SRY-related protein Sox-4. NMR analysis demonstrated the presence of three alpha-helices (Val-Gln, Glu-Leu and Phe-Tyr) connected by loop regions (Ser-Ala and Leu-Pro). Helices I and II are positioned in an antiparallel mode and form one arm of the HMG box. Helix III is less rigid, makes an average angle of about 90° with helices I and II, and constitutes the other arm of the molecule. As in HMG1B, the overall structure of the Sox-4 HMG box is L-shaped and is maintained by a cluster of conserved, mainly aromatic residues.


INTRODUCTION

The cloning of the RNA polymerase I transcription factor UBF (^1)(1) originally led to the recognition of a novel type of DNA-binding domain, the so-called HMG box. The HMG box was named after its homology with high mobility group (HMG)-1 proteins and is defined by a loose consensus sequence of about 80 amino acids(2). To date, more than 60 proteins with one or more HMG boxes have been reported. An evolutionary study of the HMG box family indicated that two major subfamilies can be distinguished(3). One of these subfamilies contains proteins with a single HMG box, which binds with high sequence specificity to variants of the DNA sequence (A/T)(A/T)CAAAG. Members of this subfamily include products of the mammalian sex determinator Sry and related Sox genes (Sry HMG box-containing genes)(4, 5), the Schizosaccharomyces pombe transcription factor Ste11+(6), the lymphoid factors TCF-1 (7, 8) and LEF-1(9, 10), and the products of several fungal mating type genes such as Mat-Mc of S. pombe(11) and Mt a1 of Neurospora crassa(12).

DNA binding occurs in the minor groove, as was shown for TCF-1, LEF-1, Mat-Mc, SRY, and Sox-4 by methylation- and diethyl-pyrocarbonate carboxylation interference footprinting and T(C/A)I nucleotide substitutions (13, 14, 15, 16), and is accompanied by the induction of a strong bend in the DNA helix(14, 16, 17, 18). A bend-swap experiment demonstrated that LEF-1 and its specific DNA-binding motif can functionally replace the bending induced by the integration host factor at the attP locus in the phage integrase reaction(16).

The other subfamily includes proteins with multiple HMG boxes and with a rather nonspecific affinity for DNA, such as the HMG-1 and -2 proteins(19) , UBF (1) and mtTF1(20) . Characteristic of these HMG boxes is their affinity for the cis-platinated -GG- adduct in DNA (21, 22) and cruciform DNA(23, 24) , independent of sequence determinants. This suggested that the non-sequence-specific HMG boxes recognize DNA structure instead of DNA sequence(25) .

Circular dichroism measurements and secondary structure prediction methods indicated a high alpha-helical content for sequence-specific HMG domains(17) . This is consistent with NMR studies on the tertiary structure of the second HMG box of HMG1 (26, 27) and HMG-D(28) . The 60-amino acid core of these non-sequence-specific HMG boxes consists of three alpha-helices, which form an unusual L-shaped molecule. The angle between the two arms is 70-80° and is defined by a cluster of conserved, aromatic residues(26, 27, 28) . Based on an identical secondary structure observed for the HMG box of Sox-5, a similar L-shaped structure has been suggested for this sequence-specific HMG domain (29) .

Hydrophobic interaction of the HMG box of SRY with DNA by partial intercalation of an isoleucine side chain predicts the positioning of an alpha-helix in a widened minor groove and might account for sequence specificity and DNA bending (30). Using the solution structure of rat HMG1B (26), a model for the SRY-DNA complex was proposed(31).

Since a detailed structure for a sequence-specific HMG box has not yet been determined, we have pursued the elucidation of the NMR solution structure of the HMG box domain of the lymphocyte transcriptional activator Sox-4. This HMG box shows high sequence-specific binding toward the AACAAAG DNA-binding motif with a K of 10M(15) . The biological significance of the Sox-4 gene has recently been underscored in a gene disruption experiment. Mice carrying two null alleles of Sox-4 fail to develop functional valves in the heart and have a severe block in early lymphoid development. (^2)The NMR data indicate that the secondary structure of Sox-4 HMG box is closely related to that of Sox-5 (29) and that the overall fold compares well with that of HMG1B (26, 27) and HMG-D(28) .


MATERIALS AND METHODS

Plasmid Construction

The Sox-4 HMG box was cloned by PCR from pSox-4 DNA using the primers 5'-ATACATATGGCTAAGACGCCCAGTGGCCAC-3' and 5'-CCCGGATCCTACGACCTTCTTTCG-3' and inserted between the NdeI and BamHI sites of pET-3c(32, 33). The identity of the subcloned HMG box fragment was confirmed by DNA sequencing. The resulting plasmid was transformed into Escherichia coli strain BL21(DE3).

Production and Purification of the Sox-4 HMG Box Peptide

The production and purification of the Sox-4 HMG box peptide essentially followed the procedure described for the HMG box of TCF-1(17). For the production of unlabeled HMG box peptide the transformed cells were grown at 37 °C in LB, while for the production of uniformly ^15N-labeled protein the cells were grown in minimal medium containing ^15NH(4)Cl. Both media contained 100 µg/ml ampicillin. In the midlog phase the cells were induced with 0.3 mM isopropyl-1-thio-beta-D-galactopyranoside. After 3 h of induction the bacteria were harvested by centrifugation (20 min, 4000 × g, 4 °C) and resuspended in ice-cold lysis buffer (50 mM Tris-HCl, 1 mM EDTA, 10% glycerol, 250 mM NaCl, 5 mM dithiothreitol, 4 mM CaCl(2), 40 mM MgCl(2), 0.5 mM phenylmethylsulfonyl fluoride, pH 8.0). Next, Triton X-100 (final concentration 0.1% (v/v)) was added, and the cells were lysed by sonication (10 × 2 min, 4 °C). To reduce the viscosity of the cell lysate, the DNA was digested with DNase I (10 µg/ml) for 15 min at room temperature. The cell debris was pelleted by centrifugation (15 min, 15,000 × g, 4 °C). The DNA in the supernatant was precipitated with polyethyleneimine (final concentration 0.2% (v/v)). The HMG box peptide was collected by precipitation with 60% (NH(4))(2)SO(4). The precipitate was redissolved in 50 mM Tris-HCl, 1 mM EDTA, 10% glycerol, 50 mM NaCl, 1 mM NaN(3), 5 mM dithiothreitol, and 0.5 mM phenylmethylsulfonyl fluoride, pH 7.5, and dialyzed against the same buffer at 4 °C. The dialyzed protein solution was applied to a 30 × 1-cm Accell Plus CM cation exchange column (Waters). The Sox-4 HMG box was eluted from the column with a 0.1-1 M NaCl gradient. The Sox-4 HMG box fractions were pooled, concentrated, and taken up in the desired buffer by Amicon ultrafiltration.

SDS-Polyacrylamide Gel Electrophoresis

The purity of the isolated protein was checked by SDS-polyacrylamide gel electrophoresis on a Pharmacia PhastSystem using precast 20% SDS-polyacrylamide gels. The gels were developed by silver staining.

Protein Sequencing

The identity of the isolated protein was also checked by analysis of the first 10 amino acids of the protein sequence (34) using an Applied Biosystems model 476A protein sequencing system.

Gel Retardation Analysis

The biological activity was tested in a gel retardation experiment. For this purpose annealed oligonucleotides were labeled by T4 kinase with [gamma-^32P]ATP. The probes were purified by nondenaturing polyacrylamide electrophoresis. For a typical binding reaction, 10 ng of purified protein was incubated in a volume of 15 µl containing 10 mM Hepes, 60 mM KCl, 1 mM EDTA, 1 mM dithiothreitol, and 12% glycerol. After a 5-min incubation at room temperature, probe (10,000-20,000 cpm, equalling 1 ng) was added, and the mixture was incubated for an additional 20 min. The samples were then electrophoresed through a nondenaturing 8% polyacrylamide gel in 0.25 × TBE at room temperature.

The oligonucleotide probes were MW-1 (d(GGGAGACTGAGAACAAAGCGCTCTCACAC) annealed to d(CCCGTGTGAGAGCGCTTTGTTCTCAGTCT)) and MW-1sac (d(GGGAGACTGAGCCGCGGTCGCTCTCACAC) annealed to d(CCCGTGTGAGAGCGACCGCGGCTCAGTCT)).

Circular Dichroism Experiments

CD measurements were performed on a Jasco-600 spectropolarimeter equipped with a temperature-controlled water bath. The CD signal was calibrated with d-10 camphor sulfonic acid(35) . The spectrum represents an average of 10 scans. The CD spectra were fitted as described elsewhere(36) . The CD measurements were done at 293 K in 10 mM sodium phosphate, 100 mM NaCl, and 1 mM sodium azide, pH 7.4. The protein concentration of the CD sample was 48 µM.

NMR Experiments

NMR samples (95% H(2)O, 5% D(2)O) contained 1-2 mM Sox-4 HMG box in 10 mM sodium phosphate, 100 mM NaCl and 1 mM sodium azide, pH 6.5.

NMR spectra were recorded on 500 and 600 MHz Bruker AMX spectrometers at 293 and 298 K. All spectra were acquired with solvent suppression during the relaxation delay. NOESY spectra (37) were recorded with mixing times of 100 and 150 ms. TOCSY spectra (38) were recorded with a clean MLEV17 pulse sequence (39) and spin-locking times of 20, 40, 60, and 85 ms. For these two-dimensional spectra 512 t(1) increments, each consisting of 96 transients per FID of 2048 data points, were collected. Two-dimensional ^15N-^1H HSQC spectra were collected with 121-360 t(1) increments consisting of 2-144 transients per FID of 1024 data points. Three-dimensional ^15N-^1H NOESY-HSQC spectra of 184 (t(1)) × 64 (t(2)) × 1024 (t(3)) data points and 8 transients/FID with a mixing time of 150 ms and three-dimensional ^15N-^1H TOCSY-HSQC spectra of 160 (t(1)) × 64 (t(2)) × 1024 (t(3)) data points and 24 transients/FID with a spin-locking time of 50 ms and a clean MLEV17 pulse sequence were recorded. Pulsed field gradients were used for artifact suppression (40). Fast exchange of amide protons with water was identified from the difference between NH sensitivity-enhanced ^15N-HSQC experiments recorded with and without presaturation(41). In this experiment 160 (t(1)) × 1024 (t(2)) points were collected. ^15N backbone dynamics were determined using ^1H-^15N heteronuclear NOE experiments(42, 43). Gradient sensitivity-enhanced T(1) measurements (41, 43) were done with relaxation times of 6, 12, 18, 24, 36, 54, 72, 96, 120, 150, and 192 ms. The ^15N magnetization was spin-locked in the transverse plane during the relaxation period using a spin-lock field strength of 2.5 kHz. Spectra with 160 (t(1)) × 1024 (t(2)) data points were acquired.
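A series of relaxation delays such as the one listed above is commonly reduced to one relaxation time per residue by fitting a single-exponential decay to the peak intensities. The sketch below illustrates that idea in Python. It is not the authors' procedure: the log-linear least-squares method, the function name `fit_decay_time`, and the synthetic intensities are assumptions made purely for illustration.

```python
import math

# Relaxation delays quoted in the text, in milliseconds.
DELAYS_MS = [6, 12, 18, 24, 36, 54, 72, 96, 120, 150, 192]


def fit_decay_time(delays_ms, intensities):
    """Fit I(t) = I0 * exp(-t / T) by least squares on ln(I).

    Returns (I0, T), with T in the same units as the delays.
    Assumes all intensities are positive (hypothetical helper).
    """
    xs = list(delays_ms)
    ys = [math.log(value) for value in intensities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # equals -1 / T
    intercept = mean_y - slope * mean_x    # equals ln(I0)
    return math.exp(intercept), -1.0 / slope


# Synthetic, noise-free intensities for one residue (illustrative values only).
synthetic = [1000.0 * math.exp(-t / 80.0) for t in DELAYS_MS]
i0, t_relax = fit_decay_time(DELAYS_MS, synthetic)
```

With real, noisy data a weighted nonlinear fit is usually preferred; the log-linear form is used here only because it needs no external libraries.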

The spectra were processed on a Silicon Graphics workstation using the TRITON NMR software package developed at the Bijvoet Center, University of Utrecht. The two-dimensional spectra were processed using a pi/2-shifted sine-bell window for t(1) and a pi/3-shifted squared sine-bell window for t(2). The t(1) data of the two-dimensional spectra were zero-filled to 1024 points. The three-dimensional spectra were processed using a pi/2.5-shifted sine-bell window for t(1), a pi/2-shifted sine-bell window for t(2), and a pi/2.5-shifted squared sine-bell window for t(3). The t(1) and t(2) data of the three-dimensional spectra were zero-filled to 256 and 128 points, respectively. Fourth-order polynomial base-line corrections were applied in each frequency domain (44). The ^1H chemical shift values were calibrated using the H(2)O resonance with a chemical shift of 4.81 ppm relative to 3-(trimethylsilyl)propionate at 293 K; the ^15N chemical shift values were referenced to the ^15NH(4)Cl signal at 22.3 ppm at 293 K. The spectra were analyzed using the program ALISON developed at the Bijvoet Center, University of Utrecht(45).
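The window and zero-filling steps can be sketched in a few lines of Python. This is a generic illustration and not the TRITON implementation: it assumes that a "shifted" sine-bell starts at the stated phase (for example pi/2) and ends at pi, which is one common convention, and the function names are hypothetical.

```python
import math


def sine_bell(n_points, shift, squared=False):
    """Sine-bell window running from phase `shift` (radians) to pi.

    Assumes n_points >= 2. With shift = pi/2 the window is a
    quarter-period cosine that decays from 1 to 0.
    """
    window = []
    for k in range(n_points):
        phase = shift + (math.pi - shift) * k / (n_points - 1)
        value = math.sin(phase)
        window.append(value * value if squared else value)
    return window


def apodize_and_zero_fill(fid, window, final_size):
    """Multiply the FID by the window, then pad with zeros to final_size."""
    weighted = [s * w for s, w in zip(fid, window)]
    return weighted + [0.0] * (final_size - len(weighted))


# Example: 512 t(1) points weighted and zero-filled to 1024 points.
fid = [1.0] * 512                      # placeholder signal, not real data
window = sine_bell(512, math.pi / 2)
processed = apodize_and_zero_fill(fid, window, 1024)
```

A squared window is obtained with `squared=True`; the exact end phase used by a given processing package should be checked against its documentation.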

For the generation and analysis of Sox-4 HMG box structures InsightII version 2.2.0beta (Biosym Technologies Inc., San Diego, CA) was used. For distance geometry calculations we used the program DGII(46) . Triangle smoothing for sequential pairs of residues with a wobble of 10° for the peptide bond planarity was used in generating the distance bounds matrix. The structures were embedded by prospective metrization in four dimensions using a uniform probability distribution for selecting trial distances. The fit of the embedded structures was improved by a weighted least-square fit of the distances in the newly embedded coordinates to the distances in the trial distance matrix using 10 Guttman transformations with constant distance weights. For optimization the structures were submitted to 10,000 iterations of simulated annealing using an initial energy of 2500 kcal/mol, a maximum temperature of 200 K, a time step of 0.2 ps, and atomic masses of 1 kDa. Finally, the structures were submitted to 2500 iterations of conjugate gradient energy minimization.

The structures were refined further by restrained energy minimization and molecular dynamics using Discover version 2.8 (Biosym Technologies Inc., San Diego, CA). The protocol consisted of an energy minimization phase using 500 iterations of steepest descent and 3000 iterations of conjugate gradient minimization, followed by molecular dynamics at 311 K of 10,000 iterations of 0.5 fs and a final energy minimization of 500 iterations of steepest descent and 2500 iterations of conjugate gradient minimization. The consistent valence force field was used without cross-correlation terms and Morse potentials. In the calculations, the weighting factors of all physical terms were set to 1, and a distance restraint force constant of 300 kcal·mol^-1·Å^-2 with a maximum force of 2000 kcal·mol^-1·Å^-1 was used. The peptide bonds were forced to trans with a force constant of 60 kcal·mol^-1·rad^-2.

The stereochemical quality of the structures was checked with the program PROCHECK(47) .


RESULTS

Expression, Purification, and Characterization of the HMG Box of Sox-4

The HMG box of murine Sox-4 (amino acids 59-135) (15) was produced in a T7-based expression system(32, 33) . For this purpose a pET-3/Sox-4 HMG box plasmid was constructed. The identity of the inserted Sox-4 HMG box fragment was confirmed by DNA sequencing. This recombinant plasmid was transformed into E. coli BL21(DE3), where the Sox-4 HMG box was overexpressed after isopropyl-1-thio-beta-D-galactopyranoside induction (Fig. 1A). The overexpressed HMG box peptide was purified to homogeneity in a single-step cation exchange chromatographic run. A typical elution profile is presented in Fig. 1B. The procedure yielded 1-2 mg of Sox-4 HMG box protein/liter of bacterial culture with a purity greater than 95% as judged from a silver-stained SDS-polyacrylamide gel. The identity of the first 10 amino acids of the isolated peptide was confirmed by protein sequencing. The DNA binding activity of the protein was established in a gel retardation assay (Fig. 1C).


Figure 1: A, silver-stained SDS-polyacrylamide gel of the crude E. coli BL21(DE3) lysate after isopropyl-1-thio-beta-D-galactopyranoside induction (lane 2); the proteins precipitated by 60% (NH(4))(2)SO(4) (lane 3); and the purified Sox-4 HMG box after cation exchange chromatography (lane 4). Lane 1 shows the molecular mass markers. B, cation exchange elution of the Sox-4 HMG box peptide. The 60% (NH(4))(2)SO(4) pellet was redissolved, loaded onto an Accell Plus CM cation exchange column, and eluted using a linear salt gradient 0.1 M NaCl (10% buffer B) to 1 M NaCl (100% buffer B). The Sox-4 HMG box peptide elutes at 0.4 M NaCl (40% buffer B) as a single peak from the column. C, sequence-specific DNA binding of the Sox-4 HMG box peptide. Gel retardation analysis shows that the Sox-4 HMG box binds to the MW-1 DNA probe (lane 1) containing the AACAAAG heptamer motif of the CD3- enhancer and does not interact with the MW-1sac DNA probe (lane 2), in which the heptamer motif is changed to CCGCGGT.



Circular Dichroism

Fig. 2 shows the CD spectrum of the HMG box peptide of Sox-4. Deconvolution of the spectrum predicted a secondary structure with 54% alpha-helix, 11% beta-sheet, and 35% random coil. A similar high alpha-helical content was observed for the HMG boxes of TCF-1(17), HMG1B(26, 27), HMG-D(28), and Sox-5(29).


Figure 2: CD spectrum of the Sox-4 HMG box peptide. Deconvolution of the CD spectrum revealed an alpha-helical content of 54%. The CD spectrum was measured at 293 K in 10 mM sodium phosphate, 100 mM NaCl, and 1 mM sodium azide, pH 7.4. The protein concentration was 48 µM.



NMR Measurements

Assignment

Unlabeled as well as uniformly ^15N-labeled Sox-4 HMG box samples were used for NMR spectroscopy. The NMR data were collected at pH 6.5 and at temperatures of 293 and 298 K. Conditions with a pH lower than 6.5 resulted in precipitation of the protein, while at temperatures above 298 K the protein starts to unfold. The predominantly alpha-helical nature of the protein results in a limited chemical shift dispersion. This causes a severe overlap in the amide and fingerprint regions of the NOESY and TOCSY spectra and makes it difficult to assign these spectra completely. However, the ^15N signals of the various residues are well separated in the two-dimensional ^15N-^1H HSQC experiment (Fig. 3). Therefore, we collected three-dimensional ^15N-^1H NOESY-HSQC and three-dimensional ^15N-^1H TOCSY-HSQC data at 293 and 298 K. The majority of sequential assignments of the amino acid spin systems was found by comparison of the amide region of three-dimensional ^15N-^1H NOESY-HSQC and three-dimensional ^15N-^1H TOCSY-HSQC spectra. In some cases additional information from NOESY, TOCSY, and/or two-dimensional ^15N-^1H HSQC spectra was helpful. First, stretches of spin systems with sequential NH-NH and CH-NH NOE contacts were identified in the three-dimensional ^15N-^1H NOESY-HSQC spectrum (Fig. 4). Assignment of these spin systems to specific residues was done by comparison of the amino acid side-chain resonances in the three-dimensional ^15N-^1H NOESY-HSQC, three-dimensional ^15N-^1H TOCSY-HSQC, NOESY, and TOCSY spectra. Two-dimensional spectra were especially helpful for the assignment of the side chains of the aromatic residues. Using this strategy we were able to assign more than 80% of the backbone resonances of the HMG box. The ^15N and ^1H chemical shift data are deposited at the BioMagResBank (University of Wisconsin, Madison) (Table 1). Although we collected data sets at two temperatures (293 and 298 K), some residues at the N and C termini could not be identified, due to flexibility and/or overlap.
Also, the assignments of Asn, Ala, Lys, Ile, and Pro, which are located in the two loop regions, could not be established.


Figure 3: ^15N-^1H HSQC spectrum of the Sox-4 HMG box peptide. The spectrum was recorded at 600 MHz in 10 mM sodium phosphate, 100 mM NaCl, and 1 mM sodium azide, pH 6.5. The temperature was 293 K. W11 NH and W39 NH indicate the NH protons of the side chains of Trp^11 and Trp^39, respectively. One-letter amino acid codes are used.




Figure 4: Sequential NH-NH and CH-NH NOE contacts in helix I (Val-Gly) of the Sox-4 HMG box peptide. Slices from the two-dimensional NOE planes of a 600-MHz three-dimensional ^15N-^1H NOESY-HSQC spectrum of the HMG box of Sox-4 are shown. The spectrum was recorded in 10 mM sodium phosphate, 100 mM NaCl, and 1 mM sodium azide, pH 6.5. The temperature was 293 K.





Secondary Structure

In total we found 426 identifiable interresidue NOE cross-peaks in the NOESY (100- and 150-ms mixing times) and the three-dimensional ^15N-^1H NOESY-HSQC (150-ms mixing time) spectra: 256 sequential (i to (i + 1)), 106 medium range (i to ≥(i + 2) and ≤(i + 4)), and 56 long range (i to ≥(i + 5)). A similar low number of interresidue NOE contacts was found for the non-sequence-specific HMG-D box(28), while for the HMG1B box about twice as many interresidual NOEs were observed (26, 27).
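The sequence-separation classes defined above translate directly into a small lookup. The Python sketch below is only an illustration of that rule; the function name `noe_range` and the example residue pairs are hypothetical and are not taken from the assignment data.

```python
from collections import Counter


def noe_range(i, j):
    """Classify an NOE between residues i and j by sequence separation."""
    separation = abs(i - j)
    if separation == 0:
        return "intraresidue"
    if separation == 1:
        return "sequential"      # i to (i + 1)
    if separation <= 4:
        return "medium"          # i to (i + 2) ... (i + 4)
    return "long"                # i to (i + 5) or further


# Hypothetical residue pairs, for illustration only.
example_pairs = [(10, 11), (10, 13), (10, 14), (10, 15), (8, 39)]
counts = Counter(noe_range(i, j) for i, j in example_pairs)
```

Applied to a full restraint list, `counts` would give the per-class totals of the kind reported in the text.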

The observation of stretches of strong d(NN)(i, i + 1) and weak d(alphaN)(i, i + 1) connectivities in combination with d(alphaN)(i, i + 3), d(alphaN)(i, i + 4), and d(alphabeta)(i, i + 3) contacts (48) in the three-dimensional ^15N-^1H NOESY-HSQC and NOESY spectra provided evidence for the existence of three alpha-helical regions in the Sox-4 HMG box. The alpha-helices are formed by residues Val-Gln, Glu-Leu, and Phe-Tyr (Fig. 5). Based on these NMR data an alpha-helical content of 53% was calculated for the Sox-4 HMG box. This is consistent with the analysis of the CD spectrum of the Sox-4 HMG box (Fig. 2), which revealed an alpha-helical content of 54% (see "Circular Dichroism").


Figure 5: Sequential and medium range NOE contacts, NH proton exchange, backbone mobility, and alpha-helical regions in the Sox-4 HMG box. Residues with high and intermediate backbone mobility are indicated by filled and open triangles, respectively. Those residues with low backbone mobility are indicated with plus signs. Filled and open circles indicate residues with fast and intermediate exchanging NH backbone protons, respectively. Slowly exchanging NH backbone protons are indicated by times signs. Boldface amino acids are not assigned (see "Assignment").



NH protons in fast and intermediate exchange with water were identified from the difference between NH sensitivity-enhanced ^15N-HSQC experiments recorded with and without presaturation. Fast exchanging NH protons are mainly found outside the helical regions, with the exception of helix III, which also contains a number of fast and intermediate exchanging NH protons (Fig. 5). A more or less similar distribution of mobile backbone NH protons was observed in a heteronuclear NOE experiment (Figs. 5 and 6A). These observations are in accordance with a less rigid and more exposed character of helix III. The most unstable region of helix III is Glu-Arg-Leu-Arg-Leu, as is indicated by a patch of fast exchanging and mobile NH backbone protons. However, helix III is not flexible, as indicated by the T(1) relaxation times (Fig. 6B). The different time scales of NH exchange, ^1H-^15N NOE, and NH T(1) relaxation explain the seemingly contradictory results. Possible salt bridges between Arg and Glu in helix I, between Arg and Glu in helix II, and between Lys and Asp in helix III might contribute to helix stabilization(49). Loop regions are located between Ser and Ala and between Leu and Pro (Fig. 5). The N-terminal residues Asn^6-Met^9 as well as the C-terminal amino acids between Pro and Pro have an extended conformation, as was indicated by the observation of strong sequential d(alphaN) and weak d(NN) contacts and the absence of most medium range NOE contacts(48). Turns involving 4 residues are characterized by a strong d(NN)(3,4) connectivity together with a d(alphaN)(2,4) contact(48, 50). Such a pattern was found in the sequence Leu-Lys-Asp-Ser. A very strong d(NN) contact was observed between Asp and Ser. In addition, a d(alphaN)(i,i + 2) contact with medium intensity was detected between Lys and Ser. Type I and type II turns can be distinguished from each other by the intensity of the d(NN)(2,3) (strong in type I, absent in type II) and d(alphaN)(2,3) (weak in type I, strong in type II) connectivities(48).
Since Lys and Asp show a weak d(2,3) contact and a d(2,3) cross-peak with medium intensity, we were unable to classify this turn. However, in our refined structure this sequence has a type I turn conformation (see later). In the sequence Ser-Pro-Asp-Met, just after helix I, we found a very strong d contact between Asp and Met and a weak d(1,4) connectivity between Ser and Met, suggesting the presence of a turn. Unfortunately, we were unable to identify a d(2,4) contact in this sequence, since the CH proton of Pro was not assigned. It is noted that this sequence has a type I turn structure in the final model (see later).


Figure 6: A, the mobility of the backbone NH protons as indicated by the ratio of the ^1H-^15N heteronuclear NOE intensities (I/I(0)). I and I(0) are the peak intensities measured with and without saturation of the protons during the NOE delay period. Residues with an I/I(0) ratio > 0.70 are considered as immobile. Those with values between 0.70 and 0.50 have intermediate mobility, and residues with an I/I(0) ratio < 0.50 are mobile (see also Fig. 5). B, T(1) relaxation times of the backbone ^15N as determined from a series of T(1) experiments (see "Materials and Methods").
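The thresholds quoted in the legend of Fig. 6A can be written as a simple classifier. The Python sketch below is illustrative only: the function name `backbone_mobility` is hypothetical, and the treatment of values exactly equal to 0.70 or 0.50 (assigned here to the intermediate class) is an assumption, since the legend does not specify it.

```python
def backbone_mobility(noe_ratio):
    """Classify a residue from its heteronuclear NOE ratio I/I(0)."""
    if noe_ratio > 0.70:
        return "immobile"
    if noe_ratio >= 0.50:
        # Boundary values 0.70 and 0.50 fall here by assumption.
        return "intermediate"
    return "mobile"


# Hypothetical ratios for three residues, for illustration only.
example = {name: backbone_mobility(r)
           for name, r in [("res_a", 0.82), ("res_b", 0.60), ("res_c", 0.35)]}
```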



The unassigned residues Arg^3-Asn^6 and Arg-Lys at the N and C termini are most probably flexible and unstructured.

Tertiary Structure

Long distance NOEs were only observed between a limited number of residues, which are located in the hydrophobic core of the HMG box peptide. Ala^7, Phe^8, Met^9, Val, and Trp are located at the N-terminal end of helix I and contact Trp and Leu at the C-terminal end of helix II. Amino acids Phe, Glu, and Ala located in the N-terminal end of helix III show NOEs with Val and Trp of helix I.

All interresidue NOE cross peaks were classified according to their intensities as strong, medium, or weak. The corresponding distance restraints were 1.8-2.75 Å (strong), 1.8-3.75 Å (medium), and 1.8-5.25 Å (weak). The three-dimensional structure was calculated using these experimental restraints in a distance geometry (DG) calculation followed by restrained molecular dynamics and energy minimization calculations. The distribution of the NOE distance restraints against the residue number is shown in Fig. 7. In total 50 DG structures were generated. The 14 structures with the highest values of the DG error function were discarded. The 36 remaining structures were submitted to a three-phase protocol consisting of an energy minimization run (3500 iterations), molecular dynamics (5 ps, 311 K), and a final energy minimization step (3000 iterations). From the resulting structures those with the lowest energy (<3000 kcal/mol) and with ≤6 distance violations of ≤0.1 Å were selected. The stereochemical quality of the structures was evaluated with the program PROCHECK(47). Those structures with D-amino acids and/or cis peptide bonds were also discarded. A final set of 15 structures is presented in Fig. 8. The overall structure of the Sox-4 HMG box is L-shaped (Fig. 8). Helices I (Val-Gln) and II (Glu-Leu) are positioned in an antiparallel mode and form one arm of the molecule. Helix III (Phe-Tyr) varies in position, makes an average angle of 90° with helices I and II, and constitutes the other arm of the L-shaped HMG box. The average pairwise RMSD value of the backbone atoms (N, C^alpha, C') of helices I (Val-Gln) and II (Glu-Leu) is 0.84 ± 0.29 Å. As a result of the variable position of helix III this value goes up to 1.97 ± 0.77 Å when helix III is included. However, the internal average pairwise RMSD of helix III (Phe-Tyr) is 0.76 ± 0.26 Å. A similar pattern is observed when the RMSD value of the C^alpha backbone atoms is plotted against the residue number (Fig. 9).
In accordance with the T(1) relaxation times (Fig. 6B), these data indicate that in these computations helix III forms a helical element whose position varies relative to helix I and II. This variation is caused by the absence of long range NOE contacts between helix III and the other parts of the Sox-4 HMG box. Residues Ala^7, Phe^8, Met^9, Val, and Trp, Trp, Leu, and Phe form a hydrophobic core and stabilize the structure of Sox-4 HMG box. Note that these residues with the exception of Lys are conserved within the HMG box family (2) .
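The restraint classes and the selection criteria described in this section can be summarized in a short Python sketch. It is a schematic illustration rather than the DGII/Discover protocol: the helper names are hypothetical, and how a violation is measured against the 0.1 Å figure is not fully clear from the text, so the sketch simply takes the violation count as an input.

```python
# Distance bounds (in Å) for each NOE intensity class, as quoted in the text.
LOWER_BOUND = 1.8
UPPER_BOUND = {"strong": 2.75, "medium": 3.75, "weak": 5.25}


def restraint_violation(distance, intensity):
    """Return by how much (Å) a distance falls outside its restraint bounds."""
    upper = UPPER_BOUND[intensity]
    if distance > upper:
        return distance - upper
    if distance < LOWER_BOUND:
        return LOWER_BOUND - distance
    return 0.0


def accept_structure(energy_kcal_mol, n_violations,
                     max_energy=3000.0, max_violations=6):
    """Apply the energy and violation-count criteria quoted in the text."""
    return energy_kcal_mol < max_energy and n_violations <= max_violations


# Hypothetical refined structures: (energy in kcal/mol, number of violations).
candidates = [(2500.0, 6), (3200.0, 0), (2900.0, 7)]
selected = [c for c in candidates if accept_structure(*c)]
```

In the study itself, structures containing D-amino acids or cis peptide bonds were additionally discarded; that stereochemical check is not modeled here.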


Figure 7: Distribution of the number of NOE distance restraints against the residue number of the Sox-4 HMG box.




Figure 8: Final set of 15 structures of the Sox-4 HMG box. A, superposition of helix I (Val-Gln) and II (Glu-Leu). The average pairwise RMSD value of the backbone atoms of helix I and II is 0.84 ± 0.29 Å. Due to the variable position of helix III (Phe-Tyr) relative to helix I and II this value goes up to 1.97 ± 0.77 Å, when helix III is included. B, superposition of helix III (Phe-Tyr) of the Sox-4 HMG box. The internal average pairwise RMSD value of the backbone atoms is 0.76 ± 0.26 Å, indicative of a structured helix element.
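The average pairwise RMSD values quoted for the ensemble can be illustrated with the following Python sketch. It assumes that the structures have already been superimposed on the atoms of interest (the superposition step itself is not shown), that the reported spread is a sample standard deviation, and it uses made-up toy coordinates; none of these details are taken from the paper.

```python
import math
from itertools import combinations
from statistics import mean, stdev


def rmsd(coords_a, coords_b):
    """RMSD (Å) between two equally ordered lists of (x, y, z) atoms."""
    squared = [
        sum((a - b) ** 2 for a, b in zip(atom_a, atom_b))
        for atom_a, atom_b in zip(coords_a, coords_b)
    ]
    return math.sqrt(sum(squared) / len(squared))


def pairwise_rmsd(ensemble):
    """Mean and sample standard deviation of the RMSD over all structure pairs."""
    values = [rmsd(a, b) for a, b in combinations(ensemble, 2)]
    return mean(values), stdev(values)


# Toy ensemble of three two-atom "structures" (illustrative coordinates only).
toy_ensemble = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)],
    [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)],
]
average, spread = pairwise_rmsd(toy_ensemble)
```

For an ensemble of 15 structures the same function would average over 105 structure pairs.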




Figure 9: RMSD values (Å) of the C^alpha backbone atoms of the final 15 structures of the Sox-4 HMG box plotted against the residue number. RMSD calculation based on the average HMG box structure (-) and with superposition of helix I (Val-Gly) and II (Glu-Leu) (•).




DISCUSSION

Here, the NMR solution structure of the sequence-specific HMG box of Sox-4 is presented. The overall L-shaped structure compares well with that reported for the non-sequence-specific HMG boxes of HMG1B(26, 27) and HMG-D(28), which recognize structural features of DNA(25). As in HMG1B and HMG-D, three alpha-helical regions dominate the HMG box structure of Sox-4. The sequential positions of helices I and II coincide with the corresponding helices in HMG1B (26, 27) and HMG-D(28). Helix III is positioned between proline 49 and 66 and is 4 residues shorter than helix III of HMG1B (26, 27) and HMG-D(28). Apparently, this results from the helix-breaking Pro, which is unique to the sequence-specific HMG boxes (2) but is replaced by a structurally neutral alanine in HMG1B (26, 27) and by lysine in HMG-D (28). Helices I and II are followed by loops that start with type I turns (Ser-Met after helix I and Leu-Ser after helix II). The presence of such turns was not reported for HMG1B (26, 27) and HMG-D(28).

The overall HMG box fold is stabilized by a hydrophobic core involving residues Ala^7, Phe^8, Val, and Trp, Trp, Leu, and Phe. With the exception of Leu, these residues are conserved within the HMG box family, irrespective of their binding specificity(2). The structure of this hydrophobic core should be considered as the HMG box "signature."

The mechanism of binding to DNA is fundamentally different for the two types of HMG boxes. The non-sequence-specific HMG1B box binds to preexisting structures(25), such as cruciform DNA (23, 24) and DNA bent by the cis-platinum -GG- adduct(21, 22). The binding of the HMG box proteins to cruciform DNA has not been reported to induce conformational changes in the DNA. Therefore, it is likely that the rigid HMG1B-type box fits directly onto these unusual DNA structures. This is in contrast with the sequence-specific HMG box proteins, which alter the DNA conformation significantly. The binding of a monomeric sequence-specific HMG box to the minor groove of a straight DNA helix (13, 14, 15, 16) introduces a sharp bend (on the order of 90°) in the DNA helix, as determined in circular permutation assays(14, 16, 17, 18). This is supported by the dispersion of the ^31P resonances in the SRY-DNA complex (31).

Exchange of the N- and C-terminal regions of the sequence-specific HMG box of hLEF-1 with those of non-sequence-specific HMG1B showed that the sequence specificity of hLEF-1 is maintained by the N- and C-terminal residues(51). Mutations of the sequence-specific HMG box of SRY, V60L (Ile^1), M64I (Met^5), I68T (Met^9), I90M (Ile), G95R (Gly), and K106I (Lys) (41, 52, 53, 54, 55), as well as the double mutation K298E,K299E (K2R3) and the point mutation L301T (Met^5) in the sequence-specific HMG box of LEF-1 (56), affect DNA binding. The corresponding residue positions in Sox-4 are given in parentheses. These mutations are mainly located in the N-terminal part of the HMG box (Fig. 9). Gly is located in helix II, and Lys is positioned in the loop region between helices II and III (Fig. 9). Mutations in other parts of the HMG box, such as F109S (Phe) in SRY (55) and V316L (Met) and Y346S (Phe) in LEF-1 (56), do not influence the binding properties. However, they can still disrupt the biological function of the protein, as is demonstrated by the presence of the F109S (Phe) mutation in SRY of a sex-reversed XY female(55). Of special interest is mutation M64I (Met^5) in SRY, which shows an almost normal DNA-binding affinity but decreases the DNA bending by 20°(55).

The side chain of Ile (Met^9) in the N-terminal region of the sequence-specific HMG box of SRY intercalates partially from the minor groove side between the two central AT base pairs of its d(AACAATCA)·d(TGATTGTT) heptamer motif (30, 31). Note that in murine SRY and Sox-4 this interacting Ile is replaced by Met. With this information a model for the SRY-DNA complex was constructed (31). Here, the concave surface of the HMG box of SRY, whose structure was based on the NMR solution structure of the non-sequence-specific HMG1B box, faces the bent DNA with helix I, which is docked in a widened minor groove.

The effects of the mutations located in the N-terminal region of the HMG box of SRY (41, 52, 53, 54, 55) and LEF-1 (56) on DNA binding (see also above and Fig. 10) indicate that the N-terminal residues of the HMG box interact with the DNA. On the other hand, the results of methylation- and diethyl-pyrocarbonate carboxylation interference footprinting and T(C/A)I nucleotide substitutions (13, 14, 15, 16) show that the HMG box interacts in the minor groove with the first 6 base pairs of the d(AACAAAG)·d(CTTTGTT) consensus sequence.
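As an illustrative aside, the duplex notation d(X)·d(Y) used in this discussion lists both strands 5' to 3', so the second strand should be the reverse complement of the first. The short Python sketch below checks this for the two binding sequences quoted in the text. The function and variable names are hypothetical and are not part of the original analysis.

```python
# Illustrative sketch only (names are hypothetical, not from the paper):
# check that the two strands of a d(X)·d(Y) duplex, each written 5'->3',
# are reverse complements of one another.

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def is_duplex(top: str, bottom: str) -> bool:
    """True if 'bottom' is the reverse complement of 'top'."""
    return reverse_complement(top) == bottom.upper()


# Strand pairs as quoted in the Discussion.
DUPLEXES = [
    ("AACAAAG", "CTTTGTT"),    # consensus binding sequence
    ("AACAATCA", "TGATTGTT"),  # SRY binding site used in refs. 30 and 31
]

for top, bottom in DUPLEXES:
    print(f"d({top})·d({bottom}): complementary = {is_duplex(top, bottom)}")
```

Both pairs pass this check, which confirms only that the strands as printed are mutually complementary; it says nothing about binding affinity or bending.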


Figure 10: The L-shaped Sox-4 HMG box structure. The amino acid residues (one-letter codes) corresponding with mutations in SRY and/or LEF-1 are indicated (see "Discussion").



Based on the notion that the N terminus of the HMG box interacts with the first 6 base pairs of the d(AACAAAG)·d(CTTTGTT) binding sequence and the finding that Ile (Met^9) of SRY intercalates between the two central AT base pairs of the d(AACAATCA)·d(TGATTGTT) heptamer motif (30, 31), we add the proposal that in the HMG box-DNA complex the N terminus of the HMG box points in the direction of the 5' AT base pair of the d(AACAAAG)·d(CTTTGTT) consensus binding sequence. Considering the sequence homology between the HMG boxes of SRY and Sox-4, a similar model for the Sox-4 HMG box-DNA complex might be proposed. However, a definitive model awaits the experimental determination of the structure of the complex of Sox-4 and DNA.

Note Added in Proof: After this manuscript was submitted, the structures of the DNA complexes of SRY (57) and LEF-1 (58) were reported.


FOOTNOTES

*
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

§
To whom correspondence should be addressed.

(^1)
The abbreviations used are: UBF, upstream binding factor of RNA polymerase I; NOESY, nuclear Overhauser effect spectroscopy; TOCSY, total correlation spectroscopy; RMSD, root mean square deviation; HSQC, heteronuclear single quantum coherence; DG, distance geometry; LEF-1, lymphoid enhancer factor-1; SRY, (sex-determining region Y) protein encoded by the human testis-determining gene SRY; TCF-1, T cell transcription factor-1; FID, free induction decay.

(^2)
M. Schilham, P. Moerer, and H. Clevers, unpublished results.


REFERENCES

  1. Jantzen, H. M., Admon, A., Bell, S. P., and Tjian, R. (1990) Nature 344, 830-836
  2. Ner, S. S. (1992) Curr. Biol. 2, 208-210
  3. Laudet, V., Stehelin, D., and Clevers, H. (1993) Nucleic Acids Res. 21, 2493-2501
  4. Gubbay, J., Collignon, J., Koopman, P., Capel, B., Economou, A., Munsterberg, A., Vivian, N., Goodfellow, P., and Lovell-Badge, R. (1990) Nature 346, 245-250
  5. Sinclair, A. H., Berta, P., Palmer, M. S., Hawkins, J. R., Griffiths, B. L., Smith, M. J., Foster, J. W., Frischauf, A.-M., Lovell-Badge, R., and Goodfellow, P. N. (1990) Nature 346, 240-244
  6. Sugimoto, A., Iino, Y., Maeda, T., Watanabe, Y., and Yamamoto, M. (1991) Genes & Dev. 5, 1990-1999
  7. Van de Wetering, M., Oosterwegel, M., Dooijes, D., and Clevers, H. (1991) EMBO J. 10, 123-132
  8. Oosterwegel, M., van der Wetering, M., Dooijes, D., Klomp, L., Winoto, A., Georgopoulos, K., Meijlink, F., and Clevers, H. (1991) J. Exp. Med. 173, 1133-1142
  9. Travis, A., Amsterdam, A., Belanger, C., and Grosschedl, R. (1991) Genes & Dev. 5, 880-894
  10. Waterman, M. L., Fischer, W. H., and Jones, K. A. (1991) Genes & Dev. 5, 656-669
  11. Kelly, M., Burke, J., Smith, M., Klar, A., and Beach, D. (1988) EMBO J. 7, 1537-1547
  12. Staben, C., and Yanofsky, C. (1990) Proc. Natl. Acad. Sci. U. S. A. 87, 4917-4921
  13. Van de Wetering, M., and Clevers, H. (1992) EMBO J. 11, 3039-3044
  14. Dooijes, D., Van de Wetering, M., Knippels, L., and Clevers, H. (1993) J. Biol. Chem. 268, 24813-24817
  15. Van de Wetering, M., Oosterwegel, M., van Norren, K., and Clevers, H. (1993) EMBO J. 12, 3847-3854
  16. Giese, K., Cox, J., and Grosschedl, R. (1992) Cell 69, 185-195
  17. Van Houte, L., van Oers, A., van de Wetering, M., Dooijes, D., Kaptein, R., and Clevers, H. (1993) J. Biol. Chem. 268, 18083-18087
  18. Ferrari, S., Harley, V. R., Pontiggia, A., Goodfellow, P. N., Lovell-Badge, R., and Bianchi, M. E. (1992) EMBO J. 11, 4497-4506
  19. Johns, E. W. (1982) The HMG Chromosomal Proteins, Academic Press, Inc., New York
  20. Parisi, M. A., and Clayton, D. A. (1991) Science 252, 965-969
  21. Pil, P. M., and Lippard, S. J. (1992) Science 256, 234-237
  22. Pil, P. M., Chow, C. S., and Lippard, S. J. (1993) Proc. Natl. Acad. Sci. U. S. A. 90, 9465-9469
  23. Bianchi, M. E., Beltrame, M., and Paonessa, G. (1989) Science 243, 1056-1059
  24. Bianchi, M. E., Falciola, L., Ferrari, S., and Lilley, D. M. J. (1992) EMBO J. 11, 1055-1063
  25. Lilley, D. M. J. (1992) Nature 357, 282-283
  26. Weir, H. M., Kraulis, P. J., Hill, C. S., Raine, A. R., Laue, E. D., and Thomas, J. O. (1993) EMBO J. 12, 1311-1319
  27. Read, C. M., Cary, P. D., Crane-Robinson, C., Driscoll, P. C., and Norman, D. G. (1993) Nucleic Acids Res. 21, 3427-3436
  28. Jones, D. N. M., Searles, M. A., Shaw, G. L., Churchill, M. E. A., Ner, S. S., Keeler, J., Travers, A. A., and Neuhaus, D. (1994) Structure 2, 609-627
  29. Connor, F., Cary, P. D., Read, C. M., Preston, N. S., Driscoll, P. C., Denny, P., Crane-Robinson, C., and Ashworth, A. (1994) Nucleic Acids Res. 22, 3339-3346
  30. King, C.-Y., and Weiss, M. A. (1993) Proc. Natl. Acad. Sci. U. S. A. 90, 11990-11994
  31. Haqq, C. M., King, C.-Y., Ukiyama, E., Falsafi, S., Haqq, T. N., Donahoe, P. K., and Weiss, M. A. (1994) Science 266, 1494-1501
  32. Studier, F. W., and Moffatt, B. A. (1986) J. Mol. Biol. 189, 113-130
  33. Studier, F. W., Rosenberg, A. H., Dunn, J. J., and Dubendorff, J. W. (1990) Methods Enzymol. 185, 60-89
  34. Edman, P. (1950) Acta Chem. Scand. 4, 283-293
  35. Gillen, M. F., and Williams, R. E. (1975) Can. J. Chem. 53, 2351-2353
  36. De Jongh, H. H. J., and de Kruyff, B. (1990) Biochim. Biophys. Acta 1029, 104-112
  37. States, D. J., Haberkorn, R. A., and Ruben, D. J. (1982) J. Magn. Reson. 48, 286-297
  38. Davis, D. G., and Bax, A. (1985) J. Am. Chem. Soc. 107, 2820-2821
  39. Griesinger, C., Otting, G., Wüthrich, K., and Ernst, R. R. (1988) J. Am. Chem. Soc. 110, 7870-7872
  40. Bax, A., and Pochapsky, S. (1992) J. Magn. Reson. 99, 638-643
  41. Kay, L. E. (1993) J. Am. Chem. Soc. 115, 2055-2057
  42. Peng, J. W., and Wagner, G. (1992) Biochemistry 31, 8571-8586
  43. Peng, J. W., and Wagner, G. (1992) J. Magn. Reson. 98, 308-332
  44. Boelens, R., Scheek, R. M., Dijkstra, K., and Kaptein, R. (1985) J. Magn. Reson. 62, 378-386
  45. Kleywegt, G. J. (1991) Computer-assisted Assignment of 2D and 3D NMR Spectra of Proteins. Ph.D. thesis, University of Utrecht, The Netherlands
  46. Havel, T. F., and Biosym Technologies (1992) NMRchitect Users Guide, Biosym Technologies Inc., San Diego, CA
  47. Laskowski, R. A., MacArthur, M. W., Moss, D. S., and Thornton, J. M. (1993) J. Appl. Cryst. 26, 283-291
  48. Wüthrich, K. (1986) NMR of Proteins and Nucleic Acids, John Wiley and Sons, Inc., New York
  49. Marqusee, S., and Baldwin, R. (1987) Proc. Natl. Acad. Sci. U. S. A. 84, 8898-8902
  50. Chazin, W. J., and Wright, P. E. (1988) J. Mol. Biol. 202, 623-636
  51. Read, C. M., Cary, P. D., Preston, N. S., Lnenicek-Allen, M., and Crane-Robinson, C. (1994) EMBO J. 13, 5639-5646
  52. Berta, P., Hawkins, J. R., Sinclair, A. H., Taylor, A., Griffiths, B. L., Goodfellow, P. N., and Fellous, M. (1990) Nature 348, 448-450
  53. Nasrin, N., Buggs, C., Kong, X. F., Carnazza, J., Goebl, M., and Alexander-Bridges, M. (1991) Nature 354, 317-320
  54. Harley, V. R., Jackson, D. I., Hextall, P. J., Hawkins, J. R., Berkovitz, G. D., Sockanathan, S., Lovell-Badge, R., and Goodfellow, P. N. (1992) Science 255, 453-456
  55. Pontiggia, A., Rimini, R., Harley, V. R., Goodfellow, P. N., Lovell-Badge, R., and Bianchi, M. E. (1994) EMBO J. 13, 6115-6124
  56. Giese, K., Amsterdam, A., and Grosschedl, R. (1991) Genes & Dev. 5, 2567-2578
  57. Werner, M. H., Huth, J. R., Gronenborn, A. M., and Clore, G. M. (1995) Cell 81, 705-714
  58. Love, J. J., Li, X., Case, D. A., Giese, K., Grosschedl, R., and Wright, P. E. (1995) Nature 376, 791-795
