(Received for publication, October 11, 1996, and in revised form, January 29, 1997)
From the Burnham Institute, La Jolla Cancer Research Center, La Jolla, California 92037
SATB1 is a cell type-specific nuclear matrix
attachment region (MAR) DNA-binding protein, predominantly expressed in
thymocytes. We identified an atypical homeodomain and two Cut-like
repeats in SATB1, in addition to the known MAR-binding domain. The
isolated MAR-binding domain recognizes a certain DNA sequence context
within MARs that is highly potentiated for base unpairing. Unlike the MAR-binding domain, the homeodomain when isolated binds poorly and with
low specificity to DNA. However, the combined action of the MAR-binding
domain and the homeodomain allows SATB1 to specifically recognize the
core unwinding element within the base-unpairing region. The core
unwinding element is critical for MAR structure, since point mutations
within this core abolish the unwinding propensity of the MAR. The
contribution of the homeodomain is abolished by alanine substitutions
of arginine 3 and arginine 5 in the N-terminal arm of the homeodomain.
Site-directed mutagenesis of the core unwinding element in the 3′ MAR
of the immunoglobulin heavy chain gene enhancer revealed the sequence
5′-(C/A)TAATA-3′ to be essential for the increase in affinity mediated
by the homeodomain. SATB1 may regulate T-cell development and function
at the level of higher order chromatin structure through the critical
DNA structural elements within MARs.
Eukaryotic chromosomes are thought to be separated into topologically independent loop domains by periodic attachment onto an intranuclear frame known as the nuclear matrix or skeleton, defined as the insoluble material left in the nucleus after a series of biochemical extraction steps (1). Specific DNA sequences that bind to the nuclear matrix in vitro are called matrix attachment regions (MARs),1 and these sequences have been postulated to form the base of chromosomal loops (reviewed in Refs. 2 and 3). MARs may be important to organize chromosomes and regulate DNA transcription and replication within the nucleus. In support of this notion, MARs often colocalize or are located in close proximity to regulatory sequences including enhancers (4-9), and some MARs can augment transcription from heterologous promoters in stable transformants (5-7, 10, 11). Recent evidence shows that MARs play a role in tissue-specific gene expression. The MARs associated with the immunoglobulin µ heavy chain locus are essential for transcription of a rearranged µ gene in transgenic B lymphocytes (12). Identification of the cell type-specific MAR-binding protein SATB1, which is predominantly expressed in thymocytes, shows that MARs can be specific targets for a cell type-specific factor (13).
SATB1 defines a novel class of DNA-binding proteins that recognize a
specific sequence context that exhibits a high base unpairing or
unwinding propensity. MARs are generally AT-rich and typically contain
a subregion(s) that exhibits a strong potential to base-unpair under
negative superhelical strain (10, 14). A high AT content, however, is
not sufficient to confer high affinity binding to SATB1; specific
mutations within MARs, which maintain the AT-richness but eradicate the
unwinding capability, substantially reduce or abolish SATB1 binding
(13). Analysis of SATB1 binding sites in MARs revealed that binding is
restricted to the subregion of MARs that has a high unwinding
propensity. This base-unpairing region consists of a cluster of
sequence stretches with a special AT-rich DNA sequence context, in
which Cs are sequestered exclusively on one strand and Gs on the other
(ATC sequences) (13). A short core unwinding element can be present
within one of these ATC sequences, which can be detected by virtue of
its most persistent base unpairing even under conditions that favor the
double-stranded DNA configuration; mutation of this element abolishes
the base-unpairing propensity of MARs (14). The unpairing potential was
demonstrated to be essential for MAR function; a concatemer, wild-type
(25)7, of the core unwinding element of the 3′ MAR of the
immunoglobulin heavy chain (IgH) enhancer displays high binding
affinity to the nuclear matrix, unwinds under superhelical strain, and
enhances transcription from a linked reporter gene. A corresponding
mutated version, mutated (24)8, has lost all of these
properties (10).
To date, three proteins with similar binding specificity have been
identified in addition to SATB1: nucleolin, a major nucleolar protein
with multiple functions (15), p114, isolated from breast carcinoma
(16), and Bright, a protein that is predominantly expressed in B-cells
(17). These proteins bind with high affinity to MARs, and we showed
that nucleolin and p114 can distinguish wild-type (25)5
from mutated (24)8. Unlike other proteins known to bind
MARs such as lamin B1 (18) and topoisomerase II (19), SATB1
binds MARs with very high affinity, exhibiting dissociation constants
(Kd) in the range of 10⁻⁹ to 10⁻¹⁰ M, comparable to many sequence-specific
transcription factors.
To understand the biological role of SATB1, it is important to delineate the functional domains in this protein. A minimum 150-amino acid MAR-binding domain that contains novel DNA-binding motifs was previously identified (20). We report here that SATB1 contains an additional domain that shares homology with known homeodomains. Homeodomains are 60-amino acid DNA-binding domains, and their amino acid sequence is highly conserved, as well as their three-dimensional structure. Homeodomain proteins function in vitro and in vivo as sequence-specific transcription factors, and they are important developmental regulators that determine position or cell-type specificity (reviewed in Refs. 21 and 22). Unlike known homeodomains that directly and independently bind DNA, the homeodomain in SATB1 does not bind to the MAR probes analyzed here nor does it bind to a dimerized sequence (RP2) that resembles the homeodomain consensus sequence (23). When associated with the MAR-binding domain, however, the SATB1 homeodomain enhances binding specificity toward the core unwinding element of a MAR.
We performed searches of the SWISS-PROT data base (release 26.0, August, 1993) using the program Blast (24) and Blitz (25). Computations were performed using the Blast server at NCBI and the Blitz server at EMBL. Best results were obtained with Blitz searches using the PAM 120 matrix and a gap penalty of 13. To lower the background of nonsignificant matches, it was necessary to remove a segment rich in glutamines and prolines from the query sequence (residues 593-619 of SATB1). The MAR-binding domain was previously delineated by the successive deletion mapping combined with gel mobility shift analysis, and the repeated regions (boxes I and II) were detected by computer-aided sequence comparisons (20).
Protein Expression
Plasmids for the fusion protein expression were constructed as follows. The desired SATB1 fragments were amplified from the human cDNA clone pAT1146 (13) by the polymerase chain reaction using Taq DNA polymerase and the appropriate primers containing a BamHI or EcoRI site. The fragments were isolated from agarose gels, purified by Elutip D columns (Schleicher & Schuell), and cloned in frame in the BamHI or BamHI/EcoRI site of the vector pGEX2T (Pharmacia Biotech Inc.). Deletion of the homeodomain was achieved by first synthesizing a fragment ranging from the N-terminal residue of the MAR domain (position 346) to the N-terminal residue of the homeodomain (position 641) and cloning it into BamHI/EcoRI-digested pGEX2T. In a second step, a fragment ranging from the C-terminal residue of the homeodomain (position 702) to the end of the cDNA was amplified and ligated in frame in the EcoRI site downstream of the insert made in the first step. Glutathione S-transferase (GST)-fusion proteins were overexpressed in Escherichia coli (XL1 Blue) and purified on glutathione-Sepharose according to standard procedures (26). Protein concentrations were determined using a Bradford protein assay kit (Bio-Rad), which was followed by quantitation of the fusion proteins by Coomassie Blue staining of SDS-polyacrylamide gels. To obtain precise comparisons, the different fusion proteins were run side by side on the same gel, and the band intensities were compared by laser densitometer scanning.
DNA-binding Assays
Gel mobility shift assays were carried out as described (13), with no poly(dI-dC)·poly(dI-dC) added or with 0.5 µg/20 µl. The 3′ MAR is identical to the IgH 3′-En fragment described previously (13). The wild-type 3′ MAR and the mutated fragments were subcloned in the EcoRI site of Bluescript (Stratagene), and the fragments were isolated by EcoRI restriction enzyme digestion and purification from an agarose gel. Pentamer repeats of binding sites V and VI were made exactly as described for wt(25)5 (13), using the following oligonucleotides: 5′-CTTAAAATTACTCTATTATTCGAAttc-3′ with its complementary strand 5′-TTCGAATAATAGAGTAATTTTAAGgaa-3′ for wt(V)5, and 5′-TTCCCTCTGATTATTGGTCTCCATGAAttc-3′ with 5′-TTCATGGAGACCAATAATCAGAGGGAAgaa-3′ for wt(VI)5. The lowercase letters indicate single-stranded overhangs used for end to end ligation of the double-stranded oligonucleotides.
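The oligonucleotide design described above can be checked with a short script. This is a minimal sketch, not part of the original methods: the function names are hypothetical, and it assumes (as stated in the text) that uppercase letters form the duplex and lowercase letters form the single-stranded overhangs.

```python
# Sketch: verify that each oligonucleotide pair forms a duplex and that the
# lowercase overhangs are mutually complementary (needed for end-to-end ligation).
# Function names are illustrative assumptions, not taken from the paper.

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (case-insensitive)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def duplex_part(oligo: str) -> str:
    """Uppercase letters: the portion that base-pairs with the partner strand."""
    return "".join(base for base in oligo if base.isupper())


def overhang(oligo: str) -> str:
    """Lowercase letters: the single-stranded overhang."""
    return "".join(base for base in oligo if base.islower())


def strands_pair(top: str, bottom: str) -> bool:
    """True if the duplex portions are exact reverse complements."""
    return reverse_complement(duplex_part(top)) == duplex_part(bottom)


def overhangs_pair(top: str, bottom: str) -> bool:
    """True if the two overhangs can anneal to each other."""
    return reverse_complement(overhang(top)) == overhang(bottom).upper()


# Sequences copied from the text (5' to 3').
OLIGOS = {
    "wt(V)5": ("CTTAAAATTACTCTATTATTCGAAttc",
               "TTCGAATAATAGAGTAATTTTAAGgaa"),
    "wt(VI)5": ("TTCCCTCTGATTATTGGTCTCCATGAAttc",
                "TTCATGGAGACCAATAATCAGAGGGAAgaa"),
}

if __name__ == "__main__":
    for name, (top, bottom) in OLIGOS.items():
        print(name, strands_pair(top, bottom), overhangs_pair(top, bottom))
```

Both pairs pass the two checks, which is consistent with the statement that the overhangs allow concatemerization by ligation.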
Probes for gel mobility shift analysis were prepared by labeling isolated restriction fragments at both ends using Klenow polymerase and [32P]dATP. Under conditions of protein excess, the concentration required for half-maximal binding may be considered an estimate of the equilibrium binding coefficient (27). Autoradiographs of the gel mobility shift experiments were scanned by laser densitometry, and the percentage of free probe remaining was plotted against the protein concentration in nM.
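The half-maximal binding estimate described above can be illustrated as follows. This is a hedged sketch rather than the authors' densitometry procedure: the function name, the interpolation on a logarithmic concentration axis, and the titration values in the usage example are all assumptions made for illustration.

```python
import math


def half_maximal_concentration(conc_nM, percent_free):
    """Estimate the protein concentration (nM) at which 50% of the probe is free.

    conc_nM: increasing protein concentrations.
    percent_free: percentage of free probe measured at each concentration.
    The crossing point is interpolated linearly on a log10 concentration axis
    (an assumption of this sketch).
    """
    points = list(zip(conc_nM, percent_free))
    for (c_lo, f_lo), (c_hi, f_hi) in zip(points, points[1:]):
        if f_lo >= 50.0 >= f_hi and f_lo != f_hi:
            fraction = (f_lo - 50.0) / (f_lo - f_hi)
            log_c = math.log10(c_lo) + fraction * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    raise ValueError("titration does not cross 50% free probe")


if __name__ == "__main__":
    # Hypothetical titration, NOT data from the paper.
    concentrations = [0.03, 0.1, 0.3, 1.0, 3.0]
    free_probe = [95.0, 80.0, 55.0, 30.0, 10.0]
    print(half_maximal_concentration(concentrations, free_probe))
```

Under protein excess the value returned approximates the dissociation constant, which is the sense in which the text uses half-maximal binding.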
DNA titration experiments were performed as described (28) with some modifications. The concentration of the DNA fragment to be labeled was determined using a TKO 100 minifluorometer (Hoefer Scientific Instruments), followed by agarose gel electrophoresis and ethidium bromide staining using a plasmid of known concentration as a standard. The concentration of protein that gave rise to a 40-70% shift at the lowest DNA concentration was determined empirically. All the DNA titrations were done in the presence of 0.5 µg/20 µl of poly(dI-dC). The binding reaction was incubated for 30 min at room temperature to ensure that equilibrium was reached. After electrophoresis the gels were dried and analyzed by a PhosphorImager (Bio-Rad).
Site-directed Mutagenesis
The single point mutations mut 2 to mut 7, and mut IV of the 3′ MAR were previously described (14). Mut V, mut VI, and mut 8 were made by a PCR-based approach using four primer sets (29). Briefly, complementary oligonucleotides containing the desired mutations were synthesized, and they were used separately as primers in two PCR reactions with either KS or SK primer from the pBluescript polylinker region flanking the 300-bp 3′ MAR. The two PCR products, one containing the desired mutation at its 5′-end and the other at its 3′-end, were mixed at an equimolar ratio, annealed, and amplified by PCR with both KS and SK primers. The amplified fragments containing the mutation were purified with a Wizard PCR preps system
(Promega), digested with EcoRI, and subcloned in the
EcoRI site of the vector Bluescript. Mut 8 was confirmed by
Sanger sequencing. Mut V and mut VI were confirmed by the presence of
an XhoI site in mut V or an SpeI site in mut VI,
which were introduced by the multiple point mutations (see Fig.
3A). Alanine substitutions were introduced in the
homeodomain following the Exsite PCR-based mutagenesis protocol
(Stratagene), with TaqPlus and Pfu polymerase (both from
Stratagene) and the pGEX2T plasmid containing the (MD + HD)-encoding insert as template. The mutations were designed to introduce a novel
restriction site and were confirmed by restriction enzyme digestion and
protein expression.
In addition
to the MAR-binding domain (residues 346-495) previously reported (20),
computer-aided homology searches of the Swiss-Prot data base (30)
identified a homeodomain homology at the C terminus of SATB1 (residues
641-702) (Fig. 1A). Many of the residues
that are most conserved among homeodomains are also found in the SATB1
homeodomain, which shares 33% identity with the engrailed class of
homeodomains (reviewed in Ref. 31, Fig. 1B). Identities are
found with residues that in the x-ray structure of other homeodomains
contribute to the hydrophobic core and residues that directly interact
with DNA (32). This putative homeodomain is, however, divergent. Major
differences include a single amino acid insertion at the end of the
first helix and a substitution of the highly conserved WFQ motif in the
third helix of known homeodomains by FFQ in both human and mouse
SATB1.
In addition to the homeodomain homology, a set of two repeats was found near the center of SATB1 (residues 370-445 and 493-568), similar to the Cut repeats of the Cut- and Clox-homeo-proteins of Drosophila and mammals, respectively. Cut proteins contain a homeodomain and three additional DNA-binding domains of 73 amino acids, called Cut repeats (33-36). The two Cut-like repeats in SATB1 (named here A and B) contain the previously documented repeats box I (residues 382-415 and 505-538) and box II (429-445 and 552-568), respectively (20) (Fig. 1, A and C). Repeat A occurs at the center of the MAR-binding domain of SATB1, but it does not include the N- and C-terminal amino acids that are mandatory for MAR binding (20). The two repeats of SATB1 are 45% identical over 75 residues with each other and display 27-35% identity with the Cut repeats. This similarity is considered to be significant, since no gaps were required for optimal alignment (Fig. 1C).
The Homeodomain Increases Binding Affinity of SATB1 to a MAR
Most homeodomain proteins contain a homeodomain as the sole
DNA-binding domain. A group of homeodomain proteins have additional domains that assist the homeodomain in DNA binding specificity (reviewed in Refs. 37 and 22). In the case of SATB1, the MAR-binding domain by itself is sufficient to recognize and bind a specific region
(base-unpairing region) within MARs that has a high propensity for base
unpairing, and the homeodomain may have a new role in DNA recognition.
To explore this possibility, glutathione S-transferase (GST)-SATB1 fusion proteins were constructed; one protein contained the
MAR domain and homeodomain linked together in their natural protein
context (GST(MD + HD)); one protein had the 60-amino acid homeodomain
specifically deleted (GST(MDΔHD)), and one fusion protein contained
the homeodomain separately (GST(HD)) (Fig.
2A). These purified fusion proteins were used
in quantitative gel mobility shift experiments with a fixed
concentration of a synthetic MAR probe, wild-type (25)5,
and increasing protein concentrations. This probe was derived from the
core unwinding element of the MAR located 3′ of the IgH enhancer, and
it has the same properties as a natural MAR (10). Fig. 2B
shows the gel mobility shift experiments and the binding curves that
were derived from these autoradiographs. Each of these and the
following gel shift experiments were repeated at least three times
giving similar results. The isolated HD showed virtually no binding
activity for the wt(25)5 probe; however, when HD was
associated with MD (MD + HD), the binding affinity was approximately
10 times higher (Kd = 0.1 nM) than for MDΔHD (Kd = 1.0 nM). The affinity of MDΔHD toward wt(25)5 was virtually identical to that of the isolated MD alone (GST(MD)), indicating that the C-terminal amino acids from 496 to 763, apart from HD, make no additional contribution toward binding to wt(25)5 (data not shown). HD weakly bound to longer MAR fragments, but this activity was mainly nonspecific, since
it could be competed by nonspecific competitors (data not shown). This
effect of the homeodomain on binding affinity was confirmed by
additional DNA titration experiments, in which the dissociation
constants were determined using a fixed protein concentration and
increasing concentrations of the DNA probe (Fig. 2C).
Dissociation constants determined in this manner are independent of
minor variations in the protein concentration determination or the
amount of active protein in the protein preparation. The results
obtained from gel mobility shift experiments were quantitated using a
PhosphorImager, and the Kd values were calculated
from the least squares fit of a Scatchard plot of bound/free DNA as a
function of bound DNA. The approximate Kd values
were estimated from the negative reciprocal of the slope, and the
Kd for MD + HD (0.06 nM) was
approximately 7 times lower than for MDΔHD (0.4 nM) (Fig.
2C). The dissociation constants determined by protein titration or DNA titration were similar, indicating that nearly all the
protein in the protein sample was in an active form.
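The Scatchard calculation described above (Kd taken as the negative reciprocal of the least-squares slope of bound/free versus bound) can be sketched in a few lines. This is an illustration only: the function name is hypothetical, and the data in the usage example are synthetic values generated from a simple one-site binding model, not measurements from the paper.

```python
def scatchard_kd(bound, free):
    """Estimate Kd from a Scatchard plot.

    bound, free: concentrations of bound and free DNA (same units, e.g. nM).
    Fits bound/free (y) against bound (x) by ordinary least squares and
    returns -1/slope, which equals Kd for a single class of binding sites.
    """
    x = list(bound)
    y = [b / f for b, f in zip(bound, free)]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    return -1.0 / slope


if __name__ == "__main__":
    # Synthetic example: one-site binding with an assumed Kd and an assumed
    # concentration of active protein (both hypothetical).
    kd_true = 0.06        # nM
    active_protein = 0.02  # nM
    free_dna = [0.01, 0.03, 0.1, 0.3, 1.0]
    bound_dna = [active_protein * f / (kd_true + f) for f in free_dna]
    print(scatchard_kd(bound_dna, free_dna))
```

Because the slope depends only on Kd, an error in the assumed amount of active protein shifts the intercept but not the fitted Kd, which matches the point made in the text about DNA titrations.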
The SATB1 Homeodomain Promotes Binding of the MAR Domain to the Core Unwinding Element of the IgH 3′ MAR
SATB1 binds a variety of
MARs from different species and selectively recognizes sites within
MARs that are prone to become stably base-unpaired under negative
superhelical strain (13).2 The structural
properties and the SATB1 binding sites of the 5′ MAR and the 3′ MAR,
which flank the IgH enhancer, were previously characterized (13, 14)
(Fig. 3A). These natural MARs were used as
probes in quantitative gel mobility shift experiments to determine
whether HD can increase binding affinity to MARs in general. When the
5′ MAR fragment was used as probe, HD had no effect on the binding affinity; both MD + HD and MDΔHD exhibited nearly equal affinity to the 5′ MAR (Kd = 7 and 10 nM, respectively) (Fig. 3B). In the case of the 3′ MAR, however, the association of HD with MD increased the affinity by 6-fold compared with MD alone; the dissociation constants (Kd) for MD + HD and MDΔHD were 2.5 and 15 nM, respectively
(Fig. 3B). This differential effect of the homeodomain could
be due to the different structural properties that distinguish these
two MARs. Both MARs contain a base-unpairing region, but only the IgH
3′ MAR has a core unwinding element. The unwinding propensity is much greater for the 3′ MAR than the 5′ MAR; in a supercoiled plasmid, significant unwinding can be detected in the 5′ MAR only when the 3′ MAR is deleted (14). The core unwinding element is defined as a short discrete site that resists base pairing even under conditions that greatly favor a double-stranded configuration, and mutation of these sites results in a complete loss of the unwinding propensity of the MAR. Previous missing nucleoside experiments (13) showed that SATB1 directly contacts three sites in the 5′ MAR (sites I, II, and III) when the isolated 5′ MAR was used as a substrate. Using the 3′ MAR as a
substrate, three adjacent ATC sequence stretches (sites IV, V, and VI)
were detected as the SATB1 contact sites (Fig. 3A). Binding
site IV overlaps with the core unwinding element and is the major
binding site, since SATB1 makes contacts with sites V and VI only when
site IV is mutated and is no longer bound (13).
To determine whether the homeodomain in SATB1 contributes to this
preferential recognition of site IV, we used mutated MAR fragments as
probes in gel mobility shift experiments with GST(MD + HD) and
GST(MDΔHD). Each of these mutated MARs had one of the three sites destroyed by mutation and two sites intact (Fig. 3A). The affinity of the MAR-binding domain alone (MDΔHD) to each of the three mutated fragments was nearly the same as to wild-type 3′ MAR with estimated Kd values of 15, 20, 22, and 12 nM for 3′ MAR, mut IV, mut V, and mut VI, respectively (Fig. 3B, only the results for wild-type 5′- and 3′ MARs and mut
IV are shown). This result indicates that the MAR domain, in the
absence of the homeodomain, cannot effectively distinguish among the
three sites in the ATC sequence cluster. Regardless of which site was
mutated, binding by (MDΔHD) was unaffected. On the other hand, the MAR-binding domain together with the homeodomain (MD + HD) exhibited a significantly reduced binding affinity to mut IV (Kd = 14 nM) compared with wild-type 3′ MAR (Kd = 2.5 nM) (Fig. 3B). No significant decrease in binding affinity was detected for MD + HD to mut V or mut VI compared with wild type, as long as site IV remained intact (data not shown). These results also show that the HD-mediated increase in affinity to the 3′ MAR does not
merely reflect a cooperativity of binding, caused by the presence of
adjacent binding sites. If this were the case, any one of the three
mutations would be expected to abolish the effect of the homeodomain
and not just mutation of site IV. In fact, binding of (MD + HD) is
virtually noncooperative, since a Hill coefficient of 1.2 was
determined. In contrast, the weaker binding of (MDΔHD) to the 3′ MAR
appears to be cooperative, with a Hill coefficient of 2.2 (data not
shown).
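For readers unfamiliar with the Hill coefficient, the standard relation can be written out as below. The notation is assumed here and does not come from the paper: θ is the fraction of probe bound, [P] the protein concentration, K_{0.5} the concentration giving half-maximal binding, and n the Hill coefficient. The slope of the logarithmic form gives n; a value near 1 (such as the 1.2 reported for MD + HD) indicates essentially noncooperative binding, whereas a value well above 1 (such as the 2.2 reported for MDΔHD) indicates positive cooperativity.

```latex
\theta = \frac{[P]^{n}}{K_{0.5}^{\,n} + [P]^{n}}
\qquad\Longrightarrow\qquad
\log\frac{\theta}{1-\theta} = n\,\log [P] - n\,\log K_{0.5}
```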
The contribution of the homeodomain in directing SATB1 to the core
unwinding element was further confirmed by employing concatemers of
each site with short surrounding sequences as probes in gel mobility
shift experiments (data not shown). The concatemer wt(IV)5 is identical to the previously described synthetic MAR wild type (25)5 (10). If the homeodomain does assist the MAR-binding domain to preferentially recognize the core unwinding element, it
should specifically increase affinity to wt(IV)5 but not to wt(V)5 or wt(VI)5. Indeed, the increase in
binding affinity observed with (MD + HD) compared with (MDΔHD) was
10-fold for wt(IV)5 but less than 2-fold for
wt(V)5 and wt(VI)5 (data not shown). Thus, the
homeodomain of SATB1 contributes to binding specificity by selectively
increasing the affinity to site IV that contains the wild-type core
unwinding element. It should be noted that, although the MAR-binding
domain alone cannot distinguish among the three sites in the natural
context of the 3′ MAR, when each binding site was concatemerized and
used separately as probe, (MDΔHD) showed moderate preference for site
IV over site V and site VI. This preference for site IV, however, was
much more pronounced when MD was associated with HD.
These results strongly indicate that in the context of the 3′ MAR
fragment, the MAR-binding domain of SATB1 is sufficient for the ATC
sequence context recognition, because it can bind to any one of the
three sites in the ATC sequence cluster of the IgH 3′ MAR with
comparable affinities. The specific recognition of the core unwinding
element within the 300-bp MAR fragment, however, requires the
association of the MAR-binding domain with the homeodomain. The
homeodomain appears to direct SATB1 toward a preferential recognition
of the core unwinding element in a cluster of ATC sequences, as
illustrated in Fig. 5.
Specific Mutations within the N-terminal Arm of the SATB1 Homeodomain Reduce Homeodomain Activity
Homeodomains generally contact DNA through two separate regions. An
N-terminal arm lies in the minor groove, where specific DNA contacts are
mediated by Arg-3 and Arg-5. The third α-helix, or recognition helix,
fits in the major groove of
the recognition site, and Gln-50 and Asn-51 were shown to specifically
contact DNA (32, 37). These residues are conserved in the SATB1
homeodomain, and we tested by site-directed mutagenesis whether these
residues are required for the homeodomain-mediated increase in
affinity. In GST-(MD + HD) Arg-3 and Arg-5 of the N-terminal arm of
the homeodomain were substituted with alanine residues
(mut R3R5), and in the putative third helix the
FQN motif (position 50-52) was replaced with alanine residues (mut FQN)
(Fig. 4A). Mut R3R5
showed a 4.4-fold decrease in binding affinity to the 3′ MAR in
comparison to that of wild-type MD + HD (Fig. 4B). The
effect of mut R3R5 is, therefore, comparable to
the effect of the homeodomain deletion that resulted in a 6-fold decrease in affinity. Mut FQN showed an intermediate effect on binding
affinity by exhibiting a 2.4-fold decrease in binding (Fig.
4B). Thus, the major contribution of the homeodomain is mediated by its N-terminal arm, most likely in the minor groove. This
binding may be supported by the interaction of the third helix of the
homeodomain in the major groove.
The Homeodomain Recognizes a Short (C/A)TAATA Motif That Colocalizes with the Core Unwinding Element
To examine if
specific residues in binding site IV are necessary for homeodomain
recognition, we analyzed a series of single point mutations as shown in
Fig. 4B, left panel. Among these, mut 4, mut 5, and mut 6 each had one of the three base substitutions made in mut IV. These
single point mutations did not alter the high unpairing propensity of
DNA sequences in the 3′ MAR (14). When Kd values were determined for (MD + HD) versus (MDΔHD) using these singly mutated fragments, it was found that the homeodomain did not increase binding affinity of the GST-fused SATB1 to mut 5, mut 6, or mut 7 (just like for mut IV). Mut 8, in which 5′-CTAATA-3′ was replaced with 5′-ATAATA-3′, had an intermediate effect; the homeodomain still increased binding affinity to mut 8, although to a lesser extent than wild type. These experiments show that the specific sequence 5′-(C/A)TAATA-3′ (742-747), located within binding site IV 5′-TTCTAATATAT-3′ (740-750), is essential for recognition by the SATB1 homeodomain. The MAR domain alone did not distinguish the point mutations in the 300-bp 3′ MAR fragment; the Kd values for (MDΔHD) were essentially the same for wild-type 3′ MAR, mut IV, and mut 2-8 (Fig. 4B). Furthermore, mut R3R5 had a similar effect on binding affinity as the homeodomain deletion (MDΔHD). This series of experiments indicates that the specificity of SATB1 toward the core unwinding element of the 3′ MAR is achieved by the presence of both MAR-binding
domain and the homeodomain. It remains to be established whether the homeodomain, when linked to the MAR-domain in the natural protein context, directly contacts DNA.
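The position of the (C/A)TAATA motif within binding site IV can be reproduced with a simple pattern search. The sketch below is illustrative only: the function and constant names are hypothetical, the mut 8 sequence is derived from the substitution described in the text, and the "disrupted" sequence is an invented example rather than one of the mutants used in the study.

```python
import re

# (C/A)TAATA written as a regular expression: C or A, followed by TAATA.
MOTIF = re.compile(r"[CA]TAATA")

# Binding site IV and its first coordinate, as given in the text (740-750).
SITE_IV = "TTCTAATATAT"
SITE_IV_START = 740


def motif_positions(seq: str, start: int):
    """Return (first, last, matched sequence) for each motif occurrence,
    using the same coordinate numbering as the sequence start position."""
    return [
        (start + m.start(), start + m.end() - 1, m.group())
        for m in MOTIF.finditer(seq)
    ]


if __name__ == "__main__":
    # Wild-type site IV.
    print(motif_positions(SITE_IV, SITE_IV_START))
    # Mut 8 (CTAATA replaced with ATAATA), derived from the text.
    print(motif_positions("TTATAATATAT", SITE_IV_START))
    # Hypothetical disrupted site, for illustration only.
    print(motif_positions("TTCTACTATAT", SITE_IV_START))
```

The wild-type search returns a single match spanning positions 742-747, in agreement with the coordinates stated in the text, and the mut 8 sequence still satisfies the degenerate motif.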
SATB1, a cell type-specific MAR-binding protein essential for T-cell development, contains a MAR-binding domain and a newly identified atypical homeodomain. These two domains act together to confer binding specificity toward the core unwinding element found within a MAR.
Multiple Domain Structure of SATB1
The MAR-binding protein SATB1 contains a MAR-binding domain, a homeodomain, and two Cut-like repeats. The SATB1 homeodomain is unique among known homeodomains; a striking feature is the replacement of the invariant tryptophan at position 49 of the homeodomain with a phenylalanine in SATB1. This may have important implications for structure and function of the protein, since tryptophan 49 is not only conserved in all the homeodomains so far identified but is also essential for homeodomain function. Mutations of the WFQ motif containing the tryptophan 49 in the oct-1 homeodomain abolished DNA binding (38), and the mutant phenotype of dwarf mice, characterized by abnormal development of the anterior pituitary gland, is caused by a single point mutation that replaces tryptophan with cysteine in the POU homeodomain of pit-1 (39).
The presence of Cut-like repeats and a homeodomain in SATB1 suggests structural similarity to the Cut proteins identified from various species (34, 36, 40, 41). Cut proteins contain a set of three cut repeats followed by a homeodomain. The phenotype of mutants in Drosophila suggests a role for cut protein in cell specification in several tissues including the wing (42), the external sensory organs (34), and Malpighian tubules (43). SATB1 may be considered a distant relative of the cut family of proteins; however, the SATB1 homeodomain shares more homology with the homeodomain of engrailed (33% identity) than with that of Cut proteins (26% identity). Furthermore, unlike known Cut repeats that were shown to be specific DNA-binding domains (33, 35, 44), the Cut-like repeats in SATB1 did not appear to bind SATB1-binding sites. It remains to be established if the SATB1 cut-like repeats recognize other DNA sequences that were not tested here.
Homeodomain Contribution to MAR Binding
The isolated SATB1 homeodomain exhibits only very weak nonspecific binding activity to base-unpairing sequences. The MAR domain, on the other hand, can bind independently with high affinity and specificity; it distinguishes MARs that can unwind from mutated MARs that have lost this capability. Thus, the homeodomain initially appeared to be nonsignificant in DNA binding. However, a unique function is now attributed to this homeodomain. When associated with the MAR domain in the natural protein context, the SATB1 homeodomain directs the MAR domain to the core unwinding element of a MAR. This distinguishes SATB1 from the way by which Paired protein, the Drosophila Cut, and the mammalian Cut-like proteins recognize their target DNA. In these proteins, the homeodomains bind DNA independently, and the associated domains contribute to binding specificity by making additional DNA contacts (33, 35, 44, 45). The SATB1 homeodomain is similar to the homeodomains in the POU transcription factors, which cannot bind independently or bind with low affinity and relaxed specificity (reviewed in Ref. 46). In the case of the POU transcription factors, both the POU domains and the homeodomains are equally required for high affinity binding, and together they form a bipartite binding domain (38). For SATB1, on the other hand, the MAR domain alone displays fully functional MAR-binding activity, and the contribution of the homeodomain results in further selection of specific elements embedded within a MAR sequence context. The contribution of the homeodomain is small, however, and was previously missed when the minimum domain that confers MAR binding was delineated (20). This in part could be due to the active protein component in the full-sized bacterially produced SATB1 not being accurately determined.
The dissection of SATB1 protein in individual components has brought to
light how these multiple levels of recognition are ultimately put
together to achieve a high degree of binding site specificity that is
unprecedented among MAR-binding proteins. This is illustrated in Fig.
5. We had previously shown that SATB1 does not bind MARs
merely on the basis of their high AT content but that it specifically
recognizes AT-rich regions in MARs that have a high propensity for base
unpairing, and within these base-unpairing regions it exhibits a
preference for binding to the core unwinding element (13). First, we
showed in a separate study using a phage display library of random
peptides that a short peptide homologous to the N-terminal arm of the
MAR-binding domain can effectively recognize AT-rich DNA (47). This
suggests that the short homologous N- and C-terminal amino acid
stretches of the MAR-binding domain are individually sufficient for
recognizing AT-rich DNA, but to distinguish between AT-rich DNA with
high unwinding propensity and DNA that lacks this property, the entire
150-amino acid MAR-binding domain is required. Within an AT-rich DNA
sequence with high unwinding propensity, the specific recognition of
the core unwinding element that is critical in affecting overall DNA
structure of the MAR (14) is achieved by the combined action of a
unique homeodomain and a MAR-binding domain. Core unwinding elements
have been identified in several other MARs, such as in the MAR at the
5′ boundary of the human β-globin locus control region
(48).2 These elements are remarkably similar to the SATB1
homeodomain recognition site of the IgH 3′ MAR, which suggests that
SATB1 may exhibit preference for core unwinding elements in
general.
The MAR-binding domain in SATB1 binds DNA in the minor
groove, making little contact with DNA bases. SATB1 presumably
recognizes DNA sequences indirectly by binding to the altered sugar
phosphate backbone structure dictated by a specific DNA sequence
context (13). Although the homeodomain in SATB1 does not bind DNA
independently, mutagenesis of the target DNA revealed that a specific
sequence, 5′-(C/A)TAATA-3′, in the SATB1 binding site IV, is necessary
for the increase in affinity mediated by the homeodomain. Furthermore, the increase in affinity was almost completely abolished by alanine substitutions of arginine residues in the N-terminal arm of the SATB1
homeodomain, which is known in other homeodomains to bind the minor
groove. The corresponding region in other homeodomains was found to be
flexible and to lack any secondary structure as shown by NMR and x-ray
crystallography (reviewed in Ref. 49). Therefore, the effect resulting
from alanine substitutions of the two arginine residues is unlikely to
be a consequence of a change in the overall protein
folding. These results taken together suggest, but do not prove, that
the homeodomain, in the context of the SATB1 protein, may directly
contact the target DNA site in the minor groove. Unlike in other
homeodomains, mutagenesis of residues in the third helix, which is
known to interact with the major groove, has only a minor effect on
SATB1 binding. This finding is consistent with previous results showing
that SATB1 is a minor groove binding protein.
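As a rough illustration of what the degenerate notation 5′-(C/A)TAATA-3′ denotes, the following Python sketch scans a DNA string for the motif on both strands. The function names and the example sequences are hypothetical and are not taken from this study. A sequence match alone says nothing about SATB1 binding, which, as described above, depends on the surrounding base-unpairing context of the MAR.

```python
import re

# Degenerate motif from the text: 5'-(C/A)TAATA-3'.
# A lookahead is used so that overlapping occurrences are also reported.
MOTIF = re.compile(r"(?=([CA]TAATA))")
MOTIF_LENGTH = 6

# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_motif_sites(seq):
    """Return (strand, start, match) tuples for every motif occurrence.

    `start` is the 0-based position of the leftmost base of the site on the
    given (top) strand; `match` is the motif as read 5'->3' on `strand`.
    """
    seq = seq.upper()
    n = len(seq)
    hits = [("+", m.start(), m.group(1)) for m in MOTIF.finditer(seq)]
    for m in MOTIF.finditer(reverse_complement(seq)):
        # Map the bottom-strand position back to top-strand coordinates.
        hits.append(("-", n - (m.start() + MOTIF_LENGTH), m.group(1)))
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    # Made-up test sequences, not the IgH enhancer 3' MAR.
    for example in ("GGCTAATAGG", "AATATTAGCC", "GGGCCC"):
        print(example, find_motif_sites(example))
```

Both strands are scanned because the motif is not palindromic, so a site read 5′ to 3′ on the bottom strand appears as TATTA(G/T) on the top strand.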
The SATB1 homeodomain recognition sequence found in site IV is similar
to the homeodomain binding site consensus, TAAT core (22, 50), and it
overlaps with the direct SATB1 contact site IV. Missing nucleoside
experiments revealed no additional contacts with (MD + HD) compared
with (MDHD) (data not shown). This result, taken together with the
fact that the sequence 5′-(C/A)TAATA-3′ in site IV is responsible for
the positive effect of the homeodomain in SATB1 binding, may suggest
that upon binding to a MAR, the SATB1 homeodomain and the MAR domain
contact the same site simultaneously, possibly from opposite sides of
the DNA helix. Crystal structural analysis must be done to determine
whether the SATB1 homeodomain in its natural protein context directly
makes contact with DNA. It is of interest that the crystal structure of
the even-skipped homeodomain showed that two homeodomains
are bound by one 10-bp consensus sequence on both faces of the DNA,
without any steric hindrance (51). This simultaneous occupation of one
site from both sides of the DNA helix could provide significant
stability to the protein-DNA complex. This protein-DNA interaction is
unusual, however. The multiple DNA-binding domains found in the POU,
Cut, and the Paired proteins bind to sites that are juxtaposed.
Similarly, in the transcription factor Oct-1, the POU-specific domain
and the homeodomain were suggested to occupy adjacent positions in the
major groove (52).
Homeodomains represent the hallmark of developmental regulatory proteins (reviewed in Ref. 21), and the presence of this domain in a MAR-binding protein is unprecedented. In this regard, SATB1 is unique among several other proteins that preferentially bind MARs in vitro including nucleolin (15), topoisomerase II (19), histone H1 (53), the high mobility group proteins HMG I/Y (54), lamin B1 (18), ARBP (55), and hnRNPU (SAF-A) (56-58). In fact, a recent study of SATB1 knockout mice showed that SATB1 ablation results in a major defect in T-cell development and alterations in expression of multiple genes.3 Genomic DNA sequences that are bound to SATB1 in vivo have recently been characterized based on cross-linking techniques. This study revealed that in the nucleus SATB1 actually binds DNA sequences containing ATC sequence clusters and that these sequences are tightly bound to the nuclear matrix, representing MARs.4 This result, together with the results from the SATB1 knockout experiments, suggests that higher order chromatin structure may be involved in T-cell-specific gene regulation. Such regulation could be directed toward MARs at the base of chromatin loops, in particular toward the core unwinding elements, as specified by the combined action of the MAR-binding domain and the homeodomain of SATB1.
We thank Dr. Yoshinori Kohwi for valuable discussions, Dr. Craig Hauser for helpful comments and critical reading of the manuscript, and Dr. Joel Gottesfeld for expert advice and constructive criticism of the manuscript.