(Received for publication, June 14, 1995; and in revised form, September 6, 1995)
Two groups of HMG box proteins are distinguished. Proteins in
the first group contain multiple HMG boxes, are non-sequence-specific,
and recognize structural features as found in cruciform DNA and
cross-over DNA. The abundant chromosomal protein HMG-1 belongs to this
subgroup. Proteins in the second group carry a single HMG box with
affinity for the minor groove of the heptamer motif AACAAAG or
variations thereof. A solution structure for the non-sequence-specific
C-terminal HMG box of HMG-1 has recently been proposed. Now, we report
the solution structure of the sequence-specific HMG-box of the
SRY-related protein Sox-4. NMR analysis demonstrated the presence of
three α-helices (Val-Gln, Glu-Leu, and Phe-Tyr) connected by loop regions
(Ser-Ala and Leu-Pro). Helices I and II are positioned in
an antiparallel mode and form one arm of the HMG box. Helix III is less
rigid, makes an average angle of about 90° with helices I and II,
and constitutes the other arm of the molecule. As in HMG1B, the overall
structure of the Sox-4 HMG box is L-shaped and is maintained by a
cluster of conserved, mainly aromatic residues.
The cloning of the RNA polymerase I transcription factor UBF (1) originally led to the recognition of a
novel type of DNA-binding domain, the so-called HMG box. The HMG box
was named after its homology with high mobility group (HMG)-1 proteins
and is defined by a loose consensus sequence of about 80 amino
acids (2). At present, more than 60 proteins with one or
more HMG boxes have been reported. An evolutionary study of the HMG box
family indicated that two major subfamilies can be
discriminated(3) . One of these subfamilies contains proteins
with a single HMG box, which binds with high sequence specificity to
variants of the DNA sequence (A/T)(A/T)CAAAG. Members of this subfamily
include products of the mammalian sex determinator Sry and
related Sox genes (Sry HMG box-containing
genes) (4, 5), the Schizosaccharomyces pombe transcription factor Ste11+ (6), the lymphoid factors
TCF-1 (7, 8) and LEF-1 (9, 10), and
the products of several fungal mating type genes such as Mat-Mc of S. pombe (11) and Mt a1 of Neurospora crassa (12).
DNA binding occurs in the
minor groove, as was shown for TCF-1, LEF-1, Mat-Mc, SRY, and
Sox-4 by methylation- and diethyl-pyrocarbonate carboxylation
interference footprinting and T(C/A)I nucleotide substitutions (13, 14, 15, 16) and is accompanied
by the induction of a strong bend in the DNA
helix (14, 16, 17, 18). A bend-swap
experiment demonstrated that LEF-1 and its specific DNA-binding motif
can functionally replace the bending induced by the integration host factor
at the attP locus in the phage λ integrase
reaction (16).
The other subfamily includes proteins with multiple HMG boxes and a rather nonspecific affinity for DNA, such as the HMG-1 and -2 proteins (19), UBF (1), and mtTF1 (20). Characteristic of these HMG boxes is their affinity for the cis-platinated -GG- adduct in DNA (21, 22) and for cruciform DNA (23, 24), independent of sequence determinants. This suggested that the non-sequence-specific HMG boxes recognize DNA structure rather than DNA sequence (25).
Circular dichroism measurements and secondary
structure prediction methods indicated a high α-helical content for
sequence-specific HMG domains (17). This is consistent with NMR
studies on the tertiary structure of the second HMG box of HMG1 (26, 27) and HMG-D (28). The 60-amino acid
core of these non-sequence-specific HMG boxes consists of three
α-helices, which form an unusual L-shaped molecule. The angle
between the two arms is 70-80° and is defined by a cluster of
conserved, aromatic residues (26, 27, 28).
Based on an identical secondary structure observed for the HMG box of
Sox-5, a similar L-shaped structure has been suggested for this
sequence-specific HMG domain (29) .
Hydrophobic interaction of the SRY HMG box with DNA through partial
intercalation of an isoleucine side chain predicts the positioning of an α-helix in a widened
minor groove and might account for sequence specificity and DNA bending (30). Using the solution structure of rat HMG1B (26), a
model for the SRY-DNA complex was proposed (31).
Since a
detailed structure for a sequence-specific HMG box has not yet
been determined, we have pursued the elucidation of the NMR solution
structure of the HMG box domain of the lymphocyte transcriptional
activator Sox-4. This HMG box shows high sequence-specific binding
toward the AACAAAG DNA-binding motif with a K of 10
M(15) . The
biological significance of the Sox-4 gene has recently been
underscored in a gene disruption experiment. Mice carrying two null
alleles of Sox-4 fail to develop functional valves in the heart and
have a severe block in early lymphoid development. The NMR
data indicate that the secondary structure of the Sox-4 HMG box is closely
related to that of Sox-5 (29) and that the overall fold
compares well with those of HMG1B (26, 27) and
HMG-D (28).
The oligonucleotide probes were MW-1
(d(GGGAGACTGAGAACAAAGCGCTCTCACAC) annealed to
d(CCCGTGTGAGAGCGCTTTGTTCTCAGTCT)) and MW-1sac
(d(GGGAGACTGAGCCGCGGTCGCTCTCACAC) annealed to
d(CCCGTGTGAGAGCGACCGCGGCTCAGTCT)).
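As a quick illustration of the probe design, the short Python sketch below scans each probe for the (A/T)(A/T)CAAAG heptamer on both strands and checks how the two strands of each probe anneal. Only the sequences are taken from the text above; the helper names `revcomp`, `find_sites`, and `duplex_overhang` are hypothetical and written for this illustration only.

```python
import re

# Heptamer consensus from the text: (A/T)(A/T)CAAAG
HEPTAMER = re.compile(r"[AT][AT]CAAAG")
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Probe sequences as given above (5' -> 3')
MW1_TOP = "GGGAGACTGAGAACAAAGCGCTCTCACAC"
MW1_BOTTOM = "CCCGTGTGAGAGCGCTTTGTTCTCAGTCT"
MW1SAC_TOP = "GGGAGACTGAGCCGCGGTCGCTCTCACAC"
MW1SAC_BOTTOM = "CCCGTGTGAGAGCGACCGCGGCTCAGTCT"


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_sites(top: str) -> list[tuple[str, int]]:
    """(strand, 0-based start) of every heptamer match.

    '+' positions index into `top`; '-' positions index into revcomp(top).
    """
    sites = [("+", m.start()) for m in HEPTAMER.finditer(top)]
    sites += [("-", m.start()) for m in HEPTAMER.finditer(revcomp(top))]
    return sites


def duplex_overhang(top: str, bottom: str) -> int:
    """Length k of the 5' overhang if top[k:] pairs with the start of
    revcomp(bottom) (equal-length strands, staggered ends); -1 otherwise."""
    rc = revcomp(bottom)
    for k in range(len(top) // 2):
        if top[k:] == rc[: len(top) - k]:
            return k
    return -1


if __name__ == "__main__":
    print("MW-1 sites:", find_sites(MW1_TOP))
    print("MW-1sac sites:", find_sites(MW1SAC_TOP))
    print("overhangs:", duplex_overhang(MW1_TOP, MW1_BOTTOM),
          duplex_overhang(MW1SAC_TOP, MW1SAC_BOTTOM))
```

With these sequences, MW-1 carries a single site on the top strand starting at position 12 (1-based), while MW-1sac carries none; both probes anneal as 26-bp duplexes with 3-nt 5′ overhangs.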
NMR spectra were recorded on 500 and 600
MHz Bruker AMX spectrometers at 293 and 298 K. All spectra were
acquired with solvent suppression during the relaxation delay. NOESY
spectra (37) were recorded with mixing times of 100 and 150
ms. TOCSY spectra (38) were recorded with a clean MLEV17 pulse
sequence (39) and spin-locking times of 20, 40, 60, and 85 ms.
For these two-dimensional spectra, 512 t₁ increments,
each consisting of 96 transients per FID of 2048 data points, were
collected. Two-dimensional ¹⁵N-¹H HSQC spectra
were collected with 121-360 t₁ increments
consisting of 2-144 transients per FID of 1024 data points.
Three-dimensional ¹⁵N-¹H NOESY-HSQC spectra of
184 (t₁) × 64 (t₂) × 1024 (t₃) data points and 8 transients/FID
with a mixing time of 150 ms and three-dimensional
¹⁵N-¹H TOCSY-HSQC spectra of 160 (t₁) × 64 (t₂) × 1024 (t₃) data points and 24 transients/FID with
a spin-locking time of 50 ms and a clean MLEV17 pulse sequence were
recorded. Pulsed field gradients were used for artifact suppression (40). Amide protons in fast exchange with water were
identified from the difference between NH sensitivity-enhanced
¹⁵N-HSQC experiments recorded with and without
presaturation (41). In this experiment 160 (t₁) × 1024 (t₂) points
were collected.
¹⁵N backbone dynamics were determined using ¹H-¹⁵N heteronuclear NOE
experiments (42, 43). Gradient sensitivity-enhanced
T₁ρ measurements (41, 43) were done
with relaxation times of 6, 12, 18, 24, 36, 54, 72, 96, 120, 150, and
192 ms. The ¹⁵N magnetization was spin-locked in the
transverse plane during the relaxation period using a spin-lock field
strength of 2.5 kHz. Spectra with 160 (t₁) × 1024 (t₂) data points were acquired.
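A minimal sketch of how a single-exponential decay could be extracted from such a delay series, assuming peak intensities have already been measured at each relaxation time. It uses a log-linear least-squares fit with `numpy.polyfit`; this is a simplification chosen for brevity, and a nonlinear fit weighted by the spectral noise is usually preferable for real data. The function name `fit_t1rho` is hypothetical.

```python
import numpy as np

# Relaxation delays listed in the text (ms)
DELAYS_MS = np.array([6, 12, 18, 24, 36, 54, 72, 96, 120, 150, 192], dtype=float)


def fit_t1rho(delays_ms: np.ndarray, intensities: np.ndarray) -> tuple[float, float]:
    """Fit I(t) = I0 * exp(-t / T) by linear regression on ln I.

    Returns (T in ms, I0). All intensities must be positive.
    """
    slope, intercept = np.polyfit(delays_ms, np.log(intensities), 1)
    return float(-1.0 / slope), float(np.exp(intercept))


if __name__ == "__main__":
    # Synthetic, noise-free peak decaying with a 80 ms time constant
    peak = 2.0e5 * np.exp(-DELAYS_MS / 80.0)
    t_fit, i0_fit = fit_t1rho(DELAYS_MS, peak)
    print(f"T = {t_fit:.2f} ms, I0 = {i0_fit:.0f}")
```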
The spectra
were processed on a Silicon Graphics workstation using the TRITON NMR
software package developed at the Bijvoet Center, University of
Utrecht. The two-dimensional spectra were processed using a
π/2-shifted sine-bell window in one time domain and a
π/3-shifted squared sine-bell window in the other.
The t₁ data of the two-dimensional spectra were
zero-filled to 1024 points. The three-dimensional spectra were
processed using a π/2.5-shifted sine-bell window for t₁, a
π/2-shifted sine-bell window for t₂, and a
π/2.5-shifted squared sine-bell window for t₃. The t₁
and t₂ data of the three-dimensional spectra were also
zero-filled to 256 and 128 points, respectively. Fourth-order
polynomial base-line corrections were applied in each frequency domain (44). The
¹H chemical shift values were calibrated
using the H₂O resonance at 4.81 ppm
relative to 3-(trimethylsilyl)propionate at 293 K; the ¹⁵N
chemical shift values were referenced to the ¹⁵NH₄Cl signal at 22.3 ppm at 293 K. The spectra
were analyzed using the program ALISON developed at the Bijvoet Center,
University of Utrecht (45).
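The window functions named above can be written down explicitly. The sketch below assumes one common convention, w(k) = sin(φ + (π − φ)k/(n − 1))^p, where φ is the shift and p = 1 or 2 for the plain or squared sine bell; TRITON's exact parameterization is not given in the text, so treat this as illustrative. The hypothetical helper `apodize_and_zero_fill` applies the window to a one-dimensional FID and zero-fills, as done here for the 512 t₁ increments of the two-dimensional spectra.

```python
import numpy as np


def shifted_sine_bell(n: int, shift: float, power: int = 1) -> np.ndarray:
    """w[k] = sin(shift + (pi - shift) * k / (n - 1)) ** power, k = 0..n-1.

    shift = pi/2 gives a cosine bell (w[0] = 1, w[-1] = 0);
    power = 2 gives the squared variant. Assumed convention.
    """
    k = np.arange(n)
    return np.sin(shift + (np.pi - shift) * k / (n - 1)) ** power


def apodize_and_zero_fill(fid: np.ndarray, shift: float, power: int, size: int) -> np.ndarray:
    """Multiply a 1D FID by the window, then pad with zeros to `size` points."""
    out = np.zeros(size, dtype=complex)
    out[: len(fid)] = fid * shifted_sine_bell(len(fid), shift, power)
    return out


if __name__ == "__main__":
    # e.g. 512 t1 increments, pi/3-shifted squared sine bell, zero-filled to 1024
    t1 = apodize_and_zero_fill(np.ones(512), np.pi / 3, 2, 1024)
    print(t1.shape, t1[0].real)
```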
For the generation and analysis
of Sox-4 HMG box structures InsightII version 2.2.0 (Biosym
Technologies Inc., San Diego, CA) was used. For distance geometry
calculations we used the program DGII(46) . Triangle smoothing
for sequential pairs of residues with a wobble of 10° for the
peptide bond planarity was used in generating the distance bounds
matrix. The structures were embedded by prospective metrization in four
dimensions using a uniform probability distribution for selecting trial
distances. The fit of the embedded structures was improved by a
weighted least-squares fit of the distances in the newly embedded
coordinates to the distances in the trial distance matrix using 10
Guttman transformations with constant distance weights. For
optimization the structures were submitted to 10,000 iterations of
simulated annealing using an initial energy of 2500 kcal/mol, a maximum
temperature of 200 K, a time step of 0.2 ps, and atomic masses of 1
kDa. Finally, the structures were submitted to 2500 iterations of
conjugate gradient energy minimization.
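Triangle smoothing tightens the distance bounds so that every triple of atoms obeys the triangle inequality before embedding. The Python sketch below performs full smoothing over all atom triples in the style of the classic Floyd-type bound-smoothing algorithm; it is not DGII's implementation, and it does not reproduce the restriction to sequential residue pairs or the 10° planarity wobble described above. The function name `triangle_smooth` is hypothetical.

```python
def triangle_smooth(lower: list[list[float]], upper: list[list[float]]) -> None:
    """In-place triangle-inequality smoothing of symmetric bound matrices.

    For every triple (i, j, k):
      u_ij <= u_ik + u_kj                     (upper bounds)
      l_ij >= max(l_ik - u_kj, l_jk - u_ki)   (lower bounds)
    Raises ValueError if a lower bound ends up above its upper bound.
    """
    n = len(upper)
    for k in range(n):
        for i in range(n):
            for j in range(i + 1, n):
                # Tighten the upper bound through the intermediate atom k
                u_new = upper[i][k] + upper[k][j]
                if upper[i][j] > u_new:
                    upper[i][j] = upper[j][i] = u_new
                # Raise the lower bound through k (two orientations)
                if lower[i][j] < lower[i][k] - upper[k][j]:
                    lower[i][j] = lower[j][i] = lower[i][k] - upper[k][j]
                elif lower[i][j] < lower[j][k] - upper[k][i]:
                    lower[i][j] = lower[j][i] = lower[j][k] - upper[k][i]
                if lower[i][j] > upper[i][j]:
                    raise ValueError(f"inconsistent bounds for pair ({i}, {j})")


if __name__ == "__main__":
    big = 999.0  # effectively "no upper bound"
    lo = [[0.0, 5.0, 0.0], [5.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    up = [[0.0, 6.0, big], [6.0, 0.0, 1.0], [big, 1.0, 0.0]]
    triangle_smooth(lo, up)
    print("bounds 0-2:", lo[0][2], up[0][2])
```

Because lower bounds only increase and upper bounds only decrease, an inconsistency, once detected, cannot be resolved later in the loop, so raising immediately is safe.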
The structures were refined
further by restrained energy minimization and molecular dynamics using
Discover version 2.8 (Biosym Technologies Inc., San Diego, CA). The
protocol consisted of an energy minimization phase using 500 iterations
of steepest descent and 3000 iterations of conjugate gradient
minimization, followed by molecular dynamics at 311 K of 10,000
iterations of 0.5 fs and a final energy minimization of 500 iterations
of steepest descent and 2500 iterations of conjugate gradient
minimization. The consistent valence force field was used without
cross-correlation terms and Morse potentials. In the calculations, the
weighting factors of all physical terms were set to 1, and a distance
restraint force constant of 300 kcal mol⁻¹ Å⁻² with a
maximum force of 2000 kcal mol⁻¹ Å⁻¹ was
used. The peptide bonds were forced to trans with a force
constant of 60 kcal mol⁻¹ rad⁻².
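To make the force constants above concrete, here is a sketch of a flat-bottomed distance restraint whose harmonic force is capped at the stated maximum. The functional form (harmonic beyond the bounds, switching to a linear term once the force reaches 2000 kcal mol⁻¹ Å⁻¹) is an assumption for illustration; Discover's exact restraint function is not specified in the text. The function name `restraint_energy` is hypothetical.

```python
K_RESTRAINT = 300.0  # kcal mol^-1 A^-2, from the text
F_MAX = 2000.0       # kcal mol^-1 A^-1, from the text


def restraint_energy(d: float, lower: float, upper: float,
                     k: float = K_RESTRAINT, f_max: float = F_MAX) -> float:
    """Assumed flat-bottomed restraint energy (kcal/mol) for distance d (A).

    Zero inside [lower, upper]; k * v**2 for a violation v; linear with
    constant force f_max once the harmonic force 2*k*v would exceed f_max.
    """
    if d > upper:
        v = d - upper
    elif d < lower:
        v = lower - d
    else:
        return 0.0
    v_switch = f_max / (2.0 * k)  # violation at which the harmonic force reaches f_max
    if v <= v_switch:
        return k * v * v
    # Continuous energy and force at v_switch
    return k * v_switch ** 2 + f_max * (v - v_switch)


if __name__ == "__main__":
    for dist in (2.0, 3.75, 7.75):
        print(dist, round(restraint_energy(dist, 1.8, 2.75), 2))
```

With k = 300 kcal mol⁻¹ Å⁻², the switch to the linear regime occurs at a violation of 2000/(2 × 300) ≈ 3.33 Å, so small violations are penalized quadratically and gross ones only linearly.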
The stereochemical quality of the structures was checked with the program PROCHECK(47) .
Figure 1:
A, silver-stained SDS-polyacrylamide
gel of the crude E. coli BL21(DE3) lysate after
isopropyl-1-thio-β-D-galactopyranoside induction (lane
2); the proteins precipitated by 60% (NH₄)₂SO₄ (lane 3); and the
purified Sox-4 HMG box after cation exchange chromatography (lane
4). Lane 1 shows the molecular mass markers. B,
cation exchange elution of the Sox-4 HMG box peptide. The 60%
(NH₄)₂SO₄ pellet was redissolved,
loaded onto an Accell Plus CM cation exchange column, and eluted using
a linear salt gradient from 0.1 M NaCl (10% buffer B) to 1 M NaCl (100% buffer B). The Sox-4 HMG box peptide elutes at 0.4 M NaCl (40% buffer B) as a single peak from the column. C, sequence-specific DNA binding of the Sox-4 HMG box peptide.
Gel retardation analysis shows that the Sox-4 HMG box binds to the
MW-1 DNA probe (lane 1) containing the AACAAAG heptamer
motif of the CD3-ε enhancer and does not interact with the
MW-1sac DNA probe (lane 2), in which the heptamer motif
is changed to CCGCGGT.
Figure 2:
CD spectrum of the Sox-4 HMG box peptide.
Deconvolution of the CD spectrum revealed an α-helical content of
54%. The CD spectrum was measured at 293 K in 10 mM sodium
phosphate, 100 mM NaCl, and 1 mM sodium azide, pH
7.4. The protein concentration was 48
µM.
Figure 3:
¹⁵N-¹H HSQC spectrum
of the Sox-4 HMG box peptide. The spectrum was recorded at 600 MHz in
10 mM sodium phosphate, 100 mM NaCl, and 1 mM sodium azide, pH 6.5. The temperature was 293 K. W11 NH and W39 NH indicate the NH protons of the side chains of
Trp11 and Trp39, respectively. One-letter amino
acid codes are used.
Figure 4:
Sequential NH-NH and CαH-NH NOE contacts in helix I
(Val-Gly) of the Sox-4 HMG box peptide.
Slices from the two-dimensional NOE planes of a 600-MHz
three-dimensional ¹⁵N-¹H NOESY-HSQC spectrum of
the HMG box of Sox-4 are shown. The spectrum was recorded in 10 mM sodium phosphate, 100 mM NaCl, and 1 mM sodium
azide, pH 6.5. The temperature was 293 K.
The
observation of stretches of strong dNN(i, i + 1) and weak dαN(i, i + 1) connectivities in combination with dαN(i, i + 3), dαN(i, i + 4), and dαβ(i, i + 3) contacts (48) in
the three-dimensional ¹⁵N-¹H NOESY-HSQC and NOESY
spectra provided evidence for the existence of three α-helical
regions in the Sox-4 HMG box. The α-helices are formed by residues
Val-Gln, Glu-Leu, and Phe-Tyr (Fig. 5). Based on
these NMR data, an α-helical content of 53% was calculated for the
Sox-4 HMG box. This is consistent with the analysis of the CD spectrum
of the Sox-4 HMG box (Fig. 2), which revealed an α-helical
content of 54% (see ``Circular Dichroism'').
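The NMR-based helical content is simply the fraction of residues assigned to helices. The sketch below computes it, together with the helix boundaries, from a per-residue secondary-structure string ('H' for helix, anything else for non-helix). The function names are hypothetical, and the string in the usage example is a made-up toy, not the Sox-4 assignment, whose residue numbers are not reproduced here.

```python
def helical_content(ss: str) -> float:
    """Fraction of residues labelled 'H' in a per-residue secondary-structure string."""
    return ss.count("H") / len(ss)


def helix_segments(ss: str) -> list[tuple[int, int]]:
    """1-based (first, last) residue numbers of each contiguous run of 'H'."""
    segments: list[tuple[int, int]] = []
    start = None
    for i, c in enumerate(ss, start=1):
        if c == "H" and start is None:
            start = i
        elif c != "H" and start is not None:
            segments.append((start, i - 1))
            start = None
    if start is not None:  # helix running to the last residue
        segments.append((start, len(ss)))
    return segments


if __name__ == "__main__":
    toy = "CCHHHHCCHHHC"  # hypothetical 12-residue example
    print(helix_segments(toy), f"{helical_content(toy):.0%}")
```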
Figure 5:
Sequential and medium range NOE contacts,
NH proton exchange, backbone mobility, and α-helical regions in the
Sox-4 HMG box. Residues with high and intermediate backbone mobility
are indicated by filled and open triangles,
respectively. Those residues with low backbone mobility are indicated
with plus signs. Filled and open circles indicate residues with fast and intermediate exchanging NH
backbone protons, respectively. Slowly exchanging NH backbone protons
are indicated by
signs. Boldface amino acids are not
assigned (see ``Assignment'').
NH protons in fast and
intermediate exchange with water were identified from the
difference between NH sensitivity-enhanced ¹⁵N-HSQC experiments
with and without presaturation. Fast exchanging NH protons are mainly
found outside the helical regions, with the exception of helix III, which
also contains a number of fast and intermediate exchanging NH protons (Fig. 5). A roughly similar distribution of mobile backbone
NH protons was observed in a heteronuclear NOE experiment (Figs. 5 and 6A). These observations are in accordance with a
less rigid and more exposed character of helix III. The least stable
region of helix III is Glu-Arg-Leu-Arg-Leu,
as indicated by a patch of fast exchanging and mobile NH backbone
protons. However, helix III is not flexible, as indicated by the T₁ρ
relaxation times (Fig. 6B).
The different time scales of NH exchange, ¹H-¹⁵N
NOE, and NH T₁ρ relaxation explain the
seemingly contradictory results. Possible salt bridges between
Arg and Glu in helix I, between Arg and Glu in helix II, and between Lys
and Asp in helix III might contribute to helix
stabilization (49). Loop regions are located between Ser and Ala
and between Leu and Pro (Fig. 5). The N-terminal residues
Asn-Met, as well as the C-terminal amino
acids between Pro and Pro, have an extended
conformation, as indicated by the observation of strong sequential dαN
and weak dNN contacts
and the absence of most medium range NOE contacts (48). Turns
involving 4 residues are characterized by a strong dNN(3,4)
connectivity together with a dαN(2,4) contact (48, 50).
Such a pattern was found in the sequence
Leu-Lys-Asp-Ser. A
very strong dNN contact was observed between
Asp and Ser. In addition, a dαN(i, i + 2) contact
with medium intensity was detected between Lys and Ser. Type I and type II turns can be distinguished from
each other by the intensity of the dNN(2,3) (strong
in type I, absent in type II) and dαN(2,3)
(weak in type I, strong in type II) connectivities (48). Since
Lys and Asp show a weak dNN(2,3) contact and a dαN(2,3)
cross-peak with medium intensity, we were unable to classify this turn.
However, in our refined structure this sequence has a type I turn
conformation (see later). In the sequence Ser-Pro-Asp-Met,
just after helix I, we found a very strong dNN
contact between Asp and Met and a weak d(1,4) connectivity between Ser
and Met, suggesting the presence of a turn.
Unfortunately, we were unable to identify a dαN(2,4) contact in this sequence, since the
CαH proton of Pro was not assigned. Note that
this sequence has a type I turn structure in the final model
(see later).
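The turn criteria described above amount to a small decision rule on NOE intensity labels. The sketch below encodes them; the label vocabulary and the function names are assumptions made for illustration, and real assignments would also have to weigh spectral overlap and missing assignments (such as the unassigned Pro CαH).

```python
# Intensity labels assumed for illustration, weakest to strongest
LEVELS = ("absent", "weak", "medium", "strong")


def _check(*labels: str) -> None:
    for label in labels:
        if label not in LEVELS:
            raise ValueError(f"unknown intensity label: {label!r}")


def is_four_residue_turn(d_nn_34: str, d_an_24: str) -> bool:
    """Turn criterion from the text: strong dNN(3,4) plus an observed dalphaN(2,4)."""
    _check(d_nn_34, d_an_24)
    return d_nn_34 == "strong" and d_an_24 != "absent"


def classify_turn(d_nn_23: str, d_an_23: str) -> str:
    """Type I: strong dNN(2,3) and weak dalphaN(2,3).
    Type II: absent dNN(2,3) and strong dalphaN(2,3).
    Anything in between is left unclassified."""
    _check(d_nn_23, d_an_23)
    if d_nn_23 == "strong" and d_an_23 == "weak":
        return "type I"
    if d_nn_23 == "absent" and d_an_23 == "strong":
        return "type II"
    return "unclassified"


if __name__ == "__main__":
    print(is_four_residue_turn("strong", "medium"), classify_turn("weak", "medium"))
```

An intermediate pattern, such as one weak and one medium (2,3) cross-peak as observed for Lys-Asp, falls through to unclassified, which matches the conclusion drawn from the NOE data alone.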
Figure 6:
A, the mobility of the backbone NH protons
as indicated by the ratio of the ¹H-¹⁵N
heteronuclear NOE intensities (I/I₀). I and I₀ are the peak intensities measured
with and without saturation of the protons during the NOE delay period.
Residues with an I/I₀ ratio > 0.70 are
considered immobile. Those with values between 0.70 and 0.50 have
intermediate mobility, and residues with an I/I₀
ratio < 0.50 are mobile (see also Fig. 5). B, T₁ρ
relaxation times of the backbone ¹⁵N as determined from a series of T₁ρ
experiments (see ``Materials and
Methods'').
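The thresholds in panel A translate directly into a classifier. In this sketch, ratios of exactly 0.70 or 0.50 are treated as intermediate, which is an assumption since the caption does not say how boundary values were handled; the function names and the residue labels in the usage example are hypothetical.

```python
def noe_mobility(ratio: float) -> str:
    """Backbone mobility class from the 1H-15N NOE ratio I/I0 (Fig. 6A thresholds)."""
    if ratio > 0.70:
        return "immobile"
    if ratio >= 0.50:  # boundary values counted as intermediate (assumption)
        return "intermediate"
    return "mobile"


def classify_residues(i_sat: dict[str, float], i_ref: dict[str, float]) -> dict[str, str]:
    """Map residue label -> mobility class, given peak intensities measured
    with (i_sat) and without (i_ref) proton saturation."""
    return {res: noe_mobility(i_sat[res] / i_ref[res]) for res in i_sat}


if __name__ == "__main__":
    print(classify_residues({"res_a": 8.0, "res_b": 6.0, "res_c": 3.0},
                            {"res_a": 10.0, "res_b": 10.0, "res_c": 10.0}))
```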
The unassigned residues Arg-Asn and Arg-Lys at the N and C
termini are most probably flexible and unstructured.
All interresidue
NOE cross peaks were classified according to their intensities as
strong, medium, or weak. The corresponding distance restraints were
1.8-2.75 Å (strong), 1.8-3.75 Å (medium), and
1.8-5.25 Å (weak). The three-dimensional structure was
calculated using these experimental restraints in a distance geometry
(DG) calculation followed by restrained molecular dynamics and energy
minimization calculations. The distribution of the NOE distance
restraints against the residue number is shown in Fig. 7. In
total 50 DG structures were generated. The 14 structures with the highest
values of the DG error function were discarded. The 36 remaining
structures were submitted to a three-phase protocol consisting of an
energy minimization run (3500 iterations), molecular dynamics (5 ps,
311 K), and a final energy minimization step (3000 iterations). From
the resulting structures, those with the lowest energy (<3000
kcal/mol) and with ≤6 distance violations of >0.1 Å were
selected. The stereochemical quality of the structures was evaluated
with the program PROCHECK (47). Structures with D-amino acids and/or cis peptide bonds were also
discarded. A final set of 15 structures is presented in Fig. 8.
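The restraint classes and selection criteria above can be sketched as a small filtering pipeline. The distance bounds, the 3000 kcal/mol energy cutoff, the 0.1 Å violation tolerance, and the D-amino acid and cis-peptide checks are taken from the text; reading the violation criterion as at most six violations, as well as the `Model` record and all function names, are assumptions for illustration.

```python
from dataclasses import dataclass

# Distance bounds (A) per NOE intensity class, from the text
NOE_BOUNDS = {"strong": (1.8, 2.75), "medium": (1.8, 3.75), "weak": (1.8, 5.25)}


def count_violations(distances: list[float], classes: list[str], tol: float = 0.1) -> int:
    """Number of restraints violated by more than `tol` A."""
    n = 0
    for d, cls in zip(distances, classes):
        lo, hi = NOE_BOUNDS[cls]
        if d > hi + tol or d < lo - tol:
            n += 1
    return n


@dataclass
class Model:
    name: str
    energy: float          # kcal/mol after refinement
    violations: int        # restraints violated by > 0.1 A
    has_d_amino_acid: bool = False
    has_cis_peptide: bool = False


def lowest_dg_error(errors: dict[str, float], n_keep: int) -> list[str]:
    """Names of the n_keep embedded structures with the lowest DG error."""
    return sorted(errors, key=errors.get)[:n_keep]


def accept(m: Model, e_max: float = 3000.0, max_viol: int = 6) -> bool:
    """Final selection: low energy, few violations, correct stereochemistry."""
    return (m.energy < e_max and m.violations <= max_viol
            and not m.has_d_amino_acid and not m.has_cis_peptide)


if __name__ == "__main__":
    print(count_violations([2.0, 2.8, 4.0, 5.3], ["strong", "strong", "medium", "weak"]))
    print(accept(Model("model_01", 2500.0, 4)))
```

Applied to the protocol described here, `lowest_dg_error(errors, 36)` would keep 36 of the 50 embedded structures before refinement, and `accept` would then be applied to each refined model.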
The overall structure of the Sox-4 HMG box is L-shaped (Fig. 8).
Helix I (Val-Gln) and II
(Glu-Leu) are positioned in an
antiparallel orientation and form one arm of the molecule. The position of
helix III (Phe-Tyr) varies; it makes an
average angle of about 90° with helices I and II and constitutes
the other arm of the L-shaped HMG box. The average pairwise RMSD value
of the backbone atoms (N, Cα, C′) of helix I
(Val-Gln) and II
(Glu-Leu) is 0.84 ± 0.29
Å. As a result of the variable position of helix III, this value
goes up to 1.97 ± 0.77 Å when helix III is included.
However, the internal average pairwise RMSD of helix III
(Phe-Tyr) is 0.76 ± 0.26
Å. A similar pattern is observed when the RMSD value of the
Cα backbone atoms is plotted against the residue number (Fig. 9). In accordance with the T₁ρ
relaxation times (Fig. 6B), these data indicate that in
these computations helix III forms a helical element whose position
varies relative to helices I and II. This variation is caused by the
absence of long range NOE contacts between helix III and the other
parts of the Sox-4 HMG box. Residues Ala, Phe, Met, Val, Trp,
Trp, Leu, and Phe form a
hydrophobic core and stabilize the structure of the Sox-4 HMG box. Note
that these residues, with the exception of Lys, are
conserved within the HMG box family (2).
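Pairwise backbone RMSD values like those quoted above are typically obtained by optimally superimposing each pair of models and averaging. The sketch below uses the Kabsch algorithm (singular value decomposition of the coordinate covariance matrix) via NumPy. The `atoms` index list stands in for the backbone N, Cα, and C′ atoms of the chosen helices; those indices are hypothetical here, and this version fits and measures on the same atom subset.

```python
from itertools import combinations

import numpy as np


def kabsch_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate sets after optimal superposition (Kabsch)."""
    p = p - p.mean(axis=0)
    q = q - q.mean(axis=0)
    u, _, vt = np.linalg.svd(p.T @ q)
    d = np.sign(np.linalg.det(vt.T @ u.T))     # guard against improper rotations
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T  # rotation taking p onto q
    diff = p @ rot.T - q
    return float(np.sqrt((diff ** 2).sum() / len(p)))


def mean_pairwise_rmsd(models: list[np.ndarray], atoms: list[int]) -> tuple[float, float]:
    """Mean and standard deviation of the RMSD over all model pairs, using only `atoms`."""
    vals = [kabsch_rmsd(a[atoms], b[atoms]) for a, b in combinations(models, 2)]
    return float(np.mean(vals)), float(np.std(vals))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(20, 3))
    y = x + rng.normal(scale=0.3, size=x.shape)  # perturbed copy
    print(mean_pairwise_rmsd([x, y], list(range(20))))
```

Reproducing the "helices I and II superimposed, helix III included" comparison would require separating the fitting subset from the measurement subset, which this minimal version does not do.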
Figure 7: Distribution of the number of NOE distance restraints against the residue number of the Sox-4 HMG box.
Figure 8:
Final
set of 15 structures of the Sox-4 HMG box. A, superposition of
helix I (Val-Gln) and II
(Glu-Leu). The average pairwise RMSD
value of the backbone atoms of helix I and II is 0.84 ± 0.29
Å. Due to the variable position of helix III
(Phe-Tyr) relative to helix I and II,
this value goes up to 1.97 ± 0.77 Å when helix III is
included. B, superposition of helix III
(Phe-Tyr) of the Sox-4 HMG box. The
internal average pairwise RMSD value of the backbone atoms is 0.76
± 0.26 Å, indicative of a structured helix
element.
Figure 9:
RMSD values (Å) of the Cα backbone atoms of the final 15 structures of the Sox-4 HMG box
plotted against residue number. RMSD calculation based on the
average HMG box structure (-) and with superposition of
helix I (Val-Gly) and II
(Glu-Leu).
Here, the NMR solution structure of the sequence-specific HMG
box of Sox-4 is presented. The overall L-shaped structure compares well
with that reported for the non-sequence-specific HMG boxes of
HMG1B (26, 27) and HMG-D (28), which recognize
structural features of DNA (25). As in HMG1B and HMG-D,
three α-helical regions dominate the HMG box structure of Sox-4.
The sequential positions of helix I and II coincide with the
corresponding helices in HMG1B (26, 27) and
HMG-D (28). Helix III is positioned between proline 49 and residue 66
and is 4 residues shorter than helix III of HMG1B (26, 27) and HMG-D (28). Apparently, this
results from the helix-breaking Pro, which is unique to
the sequence-specific HMG boxes (2) but is replaced by a
structurally neutral alanine in HMG1B (26, 27) and by
lysine in HMG-D (28). Helices I and II are followed by
loops that start with type I turns (Ser-Met after helix I and Leu-Ser after helix
II). The presence of such turns was not reported for HMG1B (26, 27) and HMG-D (28).
The overall HMG
box fold is stabilized by a hydrophobic core involving residues
Ala, Phe, Val, Trp, Trp, Leu, and
Phe. With the exception of Leu, these
residues are conserved within the HMG box family, irrespective of their
binding specificity (2). The structure of this hydrophobic core
should be considered the HMG box ``signature.''
The
mechanism of binding to DNA is fundamentally different for the two
types of HMG boxes. The non-sequence-specific HMG1B box binds to
preexisting structures (25), such as cruciform DNA (23, 24) and DNA bent by the cis-platinum
-GG- adduct (21, 22). The binding of HMG box
proteins to cruciform DNA has not been reported to induce
conformational changes in the DNA. Therefore, it is likely that the
rigid HMG1B-type box fits directly onto these unusual DNA structures.
This is in contrast with the sequence-specific HMG box proteins, which
alter the DNA conformation significantly. The binding of a monomeric
sequence-specific HMG box to the minor groove of a straight DNA helix (13, 14, 15, 16) introduces a sharp
bend (on the order of 90°) in the DNA helix, as determined in
circular permutation
assays (14, 16, 17, 18). This is
supported by the dispersion of the ³¹P resonances in the
SRY-DNA complex (31).
Exchange of the N- and C-terminal
regions of the sequence-specific HMG box of hLEF-1 with those of
non-sequence-specific HMG1B showed that the sequence specificity of
hLEF-1 is maintained by the N- and C-terminal residues(51) .
Mutations V60L (Ile), M64I (Met), I68T (Met), I90M
(Ile), G95R (Gly), and K106I (Lys) in the sequence-specific HMG box of SRY (41, 52, 53, 54, 55), as
well as the double mutation K298E,K299E (K2R3) and the point mutation
L301T (Met) in the sequence-specific HMG box of LEF-1 (56), affect DNA binding. The corresponding residue
positions in Sox-4 are given in parentheses. These mutations are mainly
located in the N-terminal part of the HMG box (Fig. 10).
Gly is located in helix II, and Lys is
positioned in the loop region between helices II and III (Fig. 10). Mutations in other parts of the HMG box, such as F109S
(Phe) in SRY (55) and V316L (Met)
and Y346S (Phe) in LEF-1 (56), do not influence the
binding properties. However, they can still disrupt the biological
function of the protein, as demonstrated by the presence of the F109S
(Phe) mutation in the SRY of a sex-reversed XY
female (55). Of special interest is mutation M64I
(Met) in SRY, which shows an almost normal DNA-binding
affinity but decreases DNA bending by 20° (55).
The side chain of Ile (Met) in the
N-terminal region of the sequence-specific HMG box of SRY intercalates
partially from the minor groove side between the two central AT base
pairs of its d(AACAATCA)·d(TGATTGTT) heptamer
motif (30, 31). Note that in murine SRY and Sox-4 this
interacting Ile is replaced by Met. With this information, a model for
the SRY-DNA complex was constructed (31). In this model, the concave
surface of the HMG box of SRY, whose structure was based on the NMR
solution structure of the non-sequence-specific HMG1B box, faces the
bent DNA with helix I docked in a widened minor groove.
The effects of the mutations located in the N-terminal region of the
HMG box of SRY (41, 52, 53, 54, 55) and
LEF-1 (56) on DNA binding (see also above and Fig. 10) indicate that the N-terminal residues of the HMG box
interact with the DNA. On the other hand, the results of methylation-
and diethyl-pyrocarbonate carboxylation interference footprinting and
T(C/A)I nucleotide substitutions (13, 14, 15, 16) show that the HMG box interacts in the minor groove with
the first 6 base pairs of the d(AACAAAG)·d(CTTTGTT) consensus
sequence.
Figure 10: The L-shaped Sox-4 HMG box structure. The amino acid residues (one-letter codes) corresponding with mutations in SRY and/or LEF-1 are indicated (see ``Discussion'').
Based on the notion that the N terminus of the HMG box
interacts with the first 6 base pairs of the d(AACAAAG)·d(CTTTGTT) binding sequence and the finding that Ile
(Met) of SRY intercalates between the two central AT base
pairs of the d(AACAATCA)·d(TGATTGTT) heptamer
motif (30, 31), we further propose that in the HMG
box-DNA complex the N terminus of the HMG box points in the direction
of the 5′ AT base pair of the d(AACAAAG)·d(CTTTGTT) consensus
binding sequence. Considering the sequence homology between the HMG
boxes of SRY and Sox-4, a similar model for the Sox-4 HMG box-DNA
complex might be proposed. However, a definitive model awaits the
experimental determination of the structure of the Sox-4-DNA complex.
Note Added in Proof: After this manuscript was submitted, the structures of the DNA complexes of SRY (57) and LEF-1 (58) were reported.