Sequence, Structure, and Chromosomal Mapping of the Mouse
Lgals6 Gene, Encoding Galectin-6*
Michael A. Gitt§, Yu-Rong Xia¶, Robert E. Atchison, Aldons J. Lusis¶, Samuel H. Barondes, and Hakon Leffler**
From the
Center for Neurobiology and Psychiatry,
Department of Psychiatry, and
Department of Pharmaceutical
Chemistry, University of California, San Francisco, California
94143-0984 and the ¶ Department of Medicine and Department of
Microbiology and Molecular Genetics, UCLA School of Medicine,
Los Angeles, California 90024
ABSTRACT
In the accompanying paper (Gitt, M. A.,
Colnot, C., Poirier, F., Barondes, S. H., and Leffler, H. (1998) J. Biol. Chem. 273, 2954-2960), we reported
that mouse gastrointestinal tract specifically expresses two closely
related galectins, galectins-4 and -6, each with two carbohydrate
recognition domains in the same peptide. Here, we report the isolation,
characterization, and chromosomal mapping of the complete mouse
Lgals6 gene, which encodes galectin-6, and of a fragment of
a distinct gene, Lgals4, which encodes galectin-4. The
coding sequence of galectin-6 is specified by eight exons. The upstream
region contains two putative promoters. Both Lgals6 and the
closely related Lgals4 are clustered together about 3.2 centimorgans proximal to the apoE gene on mouse chromosome
7. The syntenic human region is 19q13.1-13.3.
INTRODUCTION
Galectins (1, 2) are a family of proteins that have at least one
carbohydrate recognition domain
(CRD)1 with conserved
sequence elements and affinity for β-galactosides. Although each
galectin is abundantly expressed in only a few cell types, the
distributions of the best studied galectins, galectin-1 and galectin-3,
encompass a wide range of tissues and change during embryogenesis. In
the accompanying paper (3), we have reported a much more restricted
expression of two other galectins, galectin-4 and galectin-6, to the
gastrointestinal tract both in fetal and adult mice. Galectin-4 and the
newly discovered galectin-6 (3) are closely related and belong to a
subfamily of galectins with two CRDs within one peptide chain, joined
by a link region of variable length (4), which also includes galectin-8
(5, 6) and galectin-9 (7, 8). We here report the isolation and
structure of Lgals6, the gene encoding galectin-6, and show its relationship to the structure of genes encoding galectins with a
single CRD (9-14), as well as features of the upstream region that may
account for the expression of galectin-6 in the gastrointestinal tract.
We also demonstrate that the Lgals4 gene encoding galectin-4
is distinct from Lgals6, and that these two genes are very
close together on mouse chromosome 7.
EXPERIMENTAL PROCEDURES
Materials and General Methods--
Unless otherwise indicated,
all nucleic acid enzymes were obtained from Boehringer Mannheim and all
chemicals were from Sigma. Nitrocellulose filters were from Schleicher
& Schuell, and Magnagraph nylon filters for blotting were purchased
from Micron Separations Inc. (Westboro, MA).
[α-32P]Deoxycytidine 5′-triphosphate (3000 Ci/mmol) and
[35S]deoxyadenosine 5′-(α-thio)triphosphate (1000-1500
Ci/ml, sequencing grade) were purchased from NEN Life Science Products.
For general molecular biological techniques such as hybridization
screening, restriction, gel electrophoresis, blotting, and elution, we
followed protocols collected by Maniatis et al. (15).
Oligonucleotides and Polymerase Chain Reactions
(PCR)--
Oligonucleotides are listed in Table
I. For probing of Southern blots, the
oligonucleotides were labeled with digoxigenin by 3′ tailing using
digoxigenin-11-dideoxyUTP and terminal deoxynucleotide transferase, and
visualized by chemiluminescence after treatment with conjugated
anti-digoxigenin and using reagents and procedures from Boehringer
Mannheim. Hybridization was done at 37 °C in hybridization buffer
(200 mM Na2HPO4, pH 7.2, 7% SDS,
1% bovine serum albumin, 15% formamide, 1 mM EDTA), and
blots were washed for 10 min at room temperature in 2 × SSC, 1%
SDS.
PCR was carried out using Ampli-Taq (Perkin-Elmer). One µl
of different dilutions of template was mixed with 25 pmol of each primer, buffer (10 mM Tris-HCl, pH 8.3, 50 mM
KCl, 1.5 mM MgCl2, 0.001% (w/v) gelatin), and
250 µM deoxynucleotides. Amplification consisted of 45 cycles: 40 s of denaturation at 96 °C for the first five cycles
and 94 °C for the remaining cycles, 1 min of annealing at 60 °C,
and 1-4 min of extension at 72 °C. Amplified fragments were
visualized and purified on 1% agarose gels, stained with ethidium
bromide. Excised fragments were electroeluted, phenol-extracted, and
precipitated with ethanol.
Isolation of Lgals6 and Subcloning--
A mouse genomic DNA
(strain 129/SV) library in λFIXII (Stratagene, La Jolla, CA) was
screened with a cDNA probe containing all the coding sequence but
no untranslated sequence of rat galectin-4 (16). The probe was labeled
with [α-32P]dCTP by random primer polymerization (17)
and used in hybridization screening (15) of approximately 1 × 10^6 plaques using Escherichia coli SRB as host.
The hybridization was done in hybridization buffer (see above) plus
20% dextran sulfate at 52 °C with 2.4 × 10^5
cpm/ml probe. Washes were done at the hybridization temperature, first
in 2 × SSC (15), 1% SDS, then in 0.2 × SSC, 0.1% SDS, 30 min each. After drying, the filters were autoradiographed, using X-Omat
film (Eastman Kodak Co.) and intensifying screens at
−70 °C.
One phage clone, λLgals6, was isolated by plaque
purification, and its DNA was purified from high titer liquid culture.
The lysate was centrifuged at 6000 × g for 20 min, and
the supernatant was treated with 10 µg/ml DNase and 20 µg/ml RNase,
after which the phage were precipitated for 1 h at 4 °C with
10% PEG 8000 in 5 mM Tris-HCl, pH 7.5, 0.5 M
NaCl, 5 mM MgSO4 (final concentrations). The
pellet was resuspended in 10 mM Tris-HCl, pH 7.5, 10 mM MgSO4, and extracted with phenol and
chloroform. Finally, the phage DNA was precipitated with isopropanol
and resuspended in 10 mM Tris-HCl, pH 7.5, 10 mM NaCl, 1 mM EDTA.
The purified DNA from λLgals6 was digested by
XbaI, and two of the three resulting fragments that
hybridized with the rat galectin-4 cDNA probe were subcloned into
pBluescript SK+ (Stratagene, La Jolla, CA), generating clones
pLgals6-1 and pLgals6-2. pLgals6-3
containing a DNA fragment spanning the junction between clones
pLgals6-1 and pLgals6-2 was isolated by PCR
between primers mG6M and rG4F using λLgals6 as template
followed by cloning into pCRII (Invitrogen, San Diego, CA).
Further subcloning of these XbaI fragments is
described in Fig. 1. Clones
pLgals6-1a, pLgals6-2b, pLgals6-2c, and pLgals6-2f were generated by ApaI,
HindIII, PstI, and SstI digestion,
respectively, of the appropriate plasmid, followed by religation. DNA
fragments from pLgals6-2 were subcloned into pBluescript
yielding clones pLgals6-2a (2-kb HindIII
fragment), pLgals6-2d (1-kb HincII fragment),
pLgals6-2e (400-bp NcoI fragment), and
pLgals6-2g (800-bp PvuII/PstI
fragment). Additional fragments were generated by PCR and analyzed
directly (fLgals6-1b) or cloned into pCRII
(Invitrogen) (pLgals6-1c and pLgals6-2h).
fLgals6-1b and pLgals6-2h were from PCR between
the primer pairs mG6F/rG4C and mG6N/rG4G, respectively. To generate a
clone containing upstream sequence, we used pLgals6-1 as
template in PCR with the vector-specific T3 primer and the intron
I-specific antisense primer mG6E. However, the resulting product, clone
pLgals6-1c, contains only 227 bp of the upstream sequence
because the mG6E oligonucleotide, in addition to the expected priming
site in intron 1, inadvertently primed at a site within the upstream
region (nt −227 through −223, AAGGG, identical to the 3′-end of this
primer).
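The mispriming described above can be illustrated with a minimal sketch that scans a template for every site matching the last few bases of a primer. Only the 3′-terminal AAGGG comes from the text; the function names, the five-base window, and the example sequences are assumptions made for illustration and are not the actual mG6E primer or Lgals6 sequence.

```python
def revcomp(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def three_prime_sites(template, primer, k=5):
    """List sites where the last k bases of a primer could anneal.

    Returns sorted (0-based start, strand) tuples for the given template strand:
      '+' means the primer tail is identical to the given strand at that site,
          so it would anneal to the complementary strand there;
      '-' means the reverse complement of the tail occurs in the given strand,
          so the primer would anneal to the given strand itself.
    """
    tail = primer.upper()[-k:]
    template = template.upper()
    hits = []
    for strand, motif in (("+", tail), ("-", revcomp(tail))):
        start = template.find(motif)
        while start != -1:
            hits.append((start, strand))
            start = template.find(motif, start + 1)
    return sorted(hits)


# Hypothetical template and primer; only the AAGGG tail is taken from the text.
example_hits = three_prime_sites("TTAAGGGCATCCCTTGA", "GCGTACAAGGG")
```

A scan of this kind only flags candidate sites; whether a five-base 3′ match actually primes depends on annealing conditions that the sketch does not model.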

Fig. 1.
Subcloning strategy for
Lgals6. The top bar represents the
λFIX-II clone, λLgals6, with insert (solid)
and part of the flanking sequence (hatched). Two
XbaI fragments of λLgals6 were cloned into
Bluescript generating clones pLgals6-1 and
pLgals6-2. Further subclones obtained by restriction
religation, subcloning, or PCR are shown below pLgals6-1 and
pLgals6-2. fLgals6-1b was isolated as a PCR
product without cloning. Filled boxes represent exons.
Sequencing--
The different subclones were sequenced using
primers synthesized based on rat galectin-4 sequence (16), and later,
mouse galectin-6 sequence (see Table I), as well as vector-specific primers. In most cases, we used a modification (10) of the Sanger technique (18) using Sequenase (U. S. Biochemical), as described by
the manufacturer. Denatured double-stranded DNA prepared by the method
of Kraft et al. (19) was the template. To eliminate artifactual banding caused by presumed secondary structure in intron 7, we used the method described by McCrea et al. (20) employing
a terminal deoxynucleotidyltransferase chase after the termination
reaction. All exonic regions, intron boundaries, upstream and
downstream sequences were verified by sequencing on both strands, except for the 3′
end of exon 6, which, because of the repetitive DNA
in intron 6, was confirmed by sequencing with different primers on the
same strand.
Isolation of a Fragment of Lgals4 by PCR--
Inbred mouse
strain 129/SV genomic DNA (Jackson Laboratories, Bar Harbor, ME) was
amplified by means of oligodeoxynucleotides representing sequences
distributed throughout the galectin-4 gene. Oligonucleotides mG6F and
rG4C gave clear non-cDNA-sized bands on amplification, and
therefore a sample of the reaction was ligated into plasmid pCRII
(Invitrogen). DNA of selected clones was sequenced using T7 and M13
reverse primers, and the gene-specific primers mG6H and mG6K to obtain
sequence on both strands.
Restriction Map--
The size of each intron was determined by
one of several methods. Introns 1, 4, 5, and 7 were sequenced
completely. The size of intron 3 was determined by ApaI
restriction digest analysis of clone pLgals6-1. Introns 2 and 6 were sized by PCR amplification between exonic primers
surrounding the respective intron to generate fLgals6-1b and
pLgals6-2h (Fig. 1). The identity of the PCR products was
confirmed by sequencing the ends of each fragment, and the size was
determined by gel electrophoresis. Intron sizes aided in the analysis
of restriction digest data of both pLgals6-1 and pLgals6-2.
Primer Extension--
We used a modified version of the
procedure summarized by Ausubel et al. (21). For galectin-6,
we used the antisense primer mG6Q (Table I), and as controls we used
the antisense primers corresponding to mouse β-actin (GenBank™
accession no. X03672; CACATGCCGGAGCCGTTGTCGACGACCAGC) and GAPDH
(GenBank™ accession no. M32599; TCTCCACTTTGCCACTGCAAATGGCAGCCC). The
primers were labeled with [γ-32P]ATP and polynucleotide
kinase and purified by ethanol precipitation in the presence of
ammonium acetate as described (15). After resuspension in 100 µl of
TE, 3.5 µl of the labeled primer was combined with 10 µl of mouse
colon RNA, 1.5 µl of hybridization buffer, and heated for 90 min at
65 °C and then cooled to room temperature. Buffer, dNTPs,
actinomycin D, 1 unit/µl RNasin (Promega), and avian myeloblastosis
reverse transcriptase (Boehringer Mannheim) were then added to the
hybridization mixture and incubated for 1 h at 42 °C. After
RNase digestion and phenol extraction, the cDNAs were precipitated
with ethanol, washed, then resuspended in loading buffer (47.5%
formamide, 10 mM EDTA, 0.025% bromphenol blue, 0.025%
xylene cyanol FF) and denatured for 5 min at 80 °C, before
electrophoresis on an 8 M urea 8% polyacrylamide
sequencing gel. Molecular weight marker was prepared by digesting
φX174 DNA (Life Technologies, Inc.) with HinfI and then 5′
labeling with [γ-32P]ATP (15).
Genomic Southern Blots and Chromosomal Mapping--
The
chromosomal localization of Lgals4 and Lgals6 was
mapped by restriction fragment length polymorphism (RFLP) linkage
analysis in an interspecific backcross between Mus spretus
and C57BL/6J mice ((C57BL/6J × Mus spretus) F1 × C57BL/6J) (22). At first, a Southern blot of genomic DNA from both
C57BL/6J and M. spretus digested with several different
restriction enzymes (BamHI, BglII, EcoRI, HindIII, MspI, PstI,
PvuII, SstI, TaqI, and
XbaI) was probed with either the insert from
pLgals6-1c (Fig. 1) specific for Lgals6, or the
rat galectin-4 cDNA detecting both Lgals4 and
Lgals6. MspI- and HindIII-digested DNA
resulted in different sizes of hybridizing bands from the two parental
strains (RFLPs) for the Lgals6 probe and galectin-4 cDNA
probe, respectively. DNA extracted from 66 progeny of the backcross was
cut with MspI or HindIII, electrophoresed, blotted, and hybridized with the appropriate probe. The pattern of
M. spretus-specific bands in the 66 progeny was then
compared with patterns of parental polymorphic bands observed for
other, previously mapped, genes to obtain linkage with other
markers.
RESULTS AND DISCUSSION
Cloning and Sequencing of the Gene Encoding Galectin-6--
The
clone λLgals6 was isolated by screening a mouse (strain
129/SV) genomic λFIX-II library using rat galectin-4 cDNA as
a probe, and characterized by restriction
mapping, subcloning and sequencing as shown in Figs. 1-3. The insert was split into two 4.8-kb
fragments and one 3.7-kb fragment by XbaI. One of the 4.8-kb fragments and the 3.7-kb fragment were subcloned into pBluescript SK+ (Stratagene), with resultant colonies (pLgals6-1 and
pLgals6-2, respectively) hybridizing to the rat galectin-4
cDNA probe (Fig. 1).

Fig. 2.
Restriction map of Lgals6 and
sequencing strategy. A schematic of the Lgals6 gene,
with coding sequences marked by boxes and exon number above. No cutting
was detected for ClaI, EcoRI, EcoRV,
SmaI, or XhoI. Arrows give obtained
sequence either on the sense strand (rightward arrow) or the
antisense strand (leftward arrow).
Fig. 3.
Sequence of Lgals6. Coding
regions (bold print) have their corresponding amino acid
residues written above the sequence. The underlined
nucleotides in the upstream region form a direct repeat of 29 bp,
containing the putative TATA box and transcriptional initiation site.
Asterisks are placed over E box sequences (35), and
plus signs demarcate the sequence that strongly resembles the intestinal-specific regulatory element of the apolipoprotein B gene
(32). The dollar signs and pound signs indicate
possible exon-intron and intron-exon boundary sites, respectively.
Numbering is from the first nucleotide of the translational initiation
site. Restriction sites are indicated below the pertinent sequences. Repetitive elements in introns 2, 5, 6, and 7 are designated by alternating underlines and overlines. The
consensus polyadenylation signal is indicated by asterisks
over the site.
Sequencing the ends revealed that the 4.8-kb insert of
pLgals6-1 contained λFIX-II sequence
(stippled in Fig. 1) and thus came from one end of the
λLgals6 insert, whereas the 3.7-kb insert in
pLgals6-2 lacked λFIX-II sequence and thus came from the
middle of the λLgals6 insert (Fig. 1). Moreover, the
sequence of a DNA fragment (pLgals6-3) spanning the junction
between pLgals6-1 and pLgals6-2 inserts showed
that they are joined together and no intervening fragment had been
overlooked. Probing of Southern dot blots of pLgals6-1 and
pLgals6-2 with oligonucleotides revealed that
pLgals6-1 contained the 5′ end of the gene and
pLgals6-2 contained the 3′ end of the gene.
To sequence the gene, additional subclones were generated from
pLgals6-1 and pLgals6-2 as described in Fig. 1,
and sequenced with both vector-specific and gene-specific
oligonucleotide primers (Table I). The sequencing "strategy" and
restriction map are shown in Fig. 2, and the sequence in Fig. 3.
The two characterized subclones pLgals6-1 and
pLgals6-2 together contained all the galectin-6 coding
sequence (as determined in the accompanying paper (3)) encompassing
about 5,500 bp including introns. pLgals6-1 also contained
1,100 bp of upstream sequence and pLgals6-2 contained 1,800 bp of downstream sequence. This gene is named Lgals6 in
accordance with the naming of other galectin genes (23). All of the
partial galectin-6 cDNA sequence (3) was represented within
Lgals6 and was identical to the determined gene sequence
with the exception of three base changes in exon 4 (nt 384, 447, and
461 in the cDNA), which could be ascribed to the different strain
sources of the RNA and genomic DNA.
Organization of the Galectin-6 Coding Sequence--
Galectin-6 is
encoded by eight exons. Sequence alignment with other
galectins suggests that the overall
organization of the part of the genes encoding the CRDs is conserved
(Figs. 4 and 5). Thus, exons 2-4 and
6-8 of galectin-6 correspond to exons 2-4 of galectins-1 and -2 (10,
11), galectin-10 (14), and of the chicken galectin C16 (9), and exons
4-6 of galectin-3 (12, 13).

Fig. 4.
Organization of Lgals6 compared
with other galectin genes. The coding exon sequences are denoted
by boxes, stippled for sequence that is part of
the tightly folded carbohydrate-binding domain and open for
other sequence. The exon number is given in or above each box, and the
number of nucleotides in the coding sequence below each box. The first
exon of mouse Lgals3 (not shown) does not code for any
translated amino acids (12, 13). References are as follows: human
LGALS1 (10), murine Lgals1 (47), human LGALS2 (11), murine Lgals3 (12, 13), murine
Lgals6 (this paper), human CLC encoding
galectin-10 (14), and chicken C-14 gene (9).
Fig. 5.
Comparison of exon boundaries within the
carbohydrate binding domains of several galectins. The galectin-6
amino acid sequence has been aligned to the sequence of human
galectins-1 and -2 (10, 11) and -10 (14), and mouse galectin-3 (12, 13). Exon boundaries are indicated by vertical bars.
Conserved residues interacting with bound carbohydrate (25) are
indicated with asterisks under the sequences.
This group of three exons (stippled in Fig. 4) encodes the
tightly folded canonical galectin CRDs as revealed in the crystal structures of galectins-1, -2, and -10 (2, 24-26), with the middle
exon of each group of three exons encoding all of the residues interacting directly with bound carbohydrate. The sites of the boundaries of these exons within the Lgals6 sequence appear
to be highly conserved with one exception (Fig. 5). Exons 3 of
Lgals6 (encoding part of the galectin-6 domain I) and
CLC (encoding galectin-10, Ref. 14) are 30-47 nucleotides
longer than the exons encoding the corresponding part of galectins-1
and -2 (exon 3), C16 (exon 3), galectin-3 (exon 5), and domain II of
galectin-6 (exon 7), resulting in a 30-nucleotide downstream shift of
the 3′ exon-intron boundary. Since exon 4 of Lgals6 has the
same length as the corresponding exons in the other galectin genes, its
3′ end is also shifted downstream. Hence, in addition to the last part
of the carbohydrate-binding domain I, exon 4 of Lgals6
encodes part of the link region in galectin-6.
Exons 1 and 5 of Lgals6 encode sequences that are not part
of the tightly folded carbohydrate-binding domains (open
boxes in Fig. 4). Similarly, exon 1 in LGALS1 and
LGALS2, and the CLC gene encode the first few amino acids
that are disordered in the crystal structures of galectin-1, -2, and -10 (24-26), and exons 2 and 3 in Lgals3 encode other
domains in galectin-3 with no sequence similarity to the
carbohydrate-binding domain.
The sequence encoded by exon 5 of galectin-6 forms most of the link
region between the two CRDs; the rest of the link region is, as
mentioned above, encoded by the last part of exon 4. Considering the
high amount of sequence identity between galectin-4 and galectin-6 elsewhere (3), it is notable that galectin-6 has a link region that is
24 amino acids shorter. If this marked structural difference had arisen
because of a mutation in sequences involved in splicing, then a mutated
vestige of the "missing" 72 nt should be found within intron 4. However, the complete sequencing of intron 4 gave no evidence for such
a sequence. Hence, either the galectin-6 gene underwent a deletion in
its evolution or the galectin-4 gene had an insertion or
duplication.
For another bi-CRD galectin, galectin-9, a variation of link region
length appears instead to be caused by alternative splicing. In this
case, alternative splicing was proposed to account for the insertion of
93 nucleotides coding for an additional 31 amino acids at the beginning
of the link region (Ref. 8; see also Fig. 3 of the accompanying paper
(3)).
Confirmation of the Translation Start Site and Identification of a
Primary Transcription Initiation Site--
In the accompanying paper
(3), the start site of the galectin-6 coding sequence was only
tentatively assigned based on analogy with galectin-4. To substantiate
this matter, we sought further evidence based on the genomic
sequence.
Computer analysis of the entire Lgals6 sequence using the
program FGENEH,2 which tries
to reconstruct coding sequence by searching for spliceable open reading
frames and other criteria (27), predicted nt 1 as the translation start
site. The few ATG codons in the preceding sequence are unlikely to act
as translation start sites because they are followed by multiple
in-frame stop codons. HSPL, another program available at the same web
site2 that is specifically designed to identify intron/exon
boundaries, also did not predict any splicing within the upstream
region that would remove these stop codons.
Visual identification and confirmation by the TSSW program2
located two possible promoters with TATA boxes at −475 and −79 nt.
TSSW tries to predict promoters by weighing together the
likelihood of a large number of transcription factor binding sites (28) using a modification of the method of Prestridge et
al. (29). No other promoters were predicted within the entire
Lgals6 sequence. The location of the suggested promoter at
−79 nt is consistent with a transcription initiation site at about
−50 nt and translation initiation site at ATG at nt 1-3. The location
of the promoter at −475 nt predicts transcription initiation at about
−450 nt but, as mentioned above, translation initiation at nt 1 is most likely the case here as well.
To identify the major transcription initiation site(s), we performed a
primer extension experiment. With an antisense primer (mG6Q, Table I)
hybridizing with sequence between nt 62 and 32 downstream of the
putative translational start codon, a 113-nt primer extension product
was generated (Fig. 6), which would
correspond to a transcription start site at nt −51. No longer products
were detected. In control experiments, the size of the longest primer extension products using an actin-specific primer and a GAPDH-specific primer agreed with the reported transcriptional initiation sites (Fig.
6). Moreover, the predicted transcriptional start site for galectin-6
is 24 nt downstream of the TATA box at nt −79, and conforms well with
the consensus transcriptional initiation site (30).
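The arithmetic linking the 113-nt primer extension product to the transcription start site can be written out as a short sketch. It assumes the numbering used in this paper (nt 1 is the A of the ATG, the preceding base is nt −1, and there is no nt 0); the function name is a hypothetical choice for illustration.

```python
def tss_from_primer_extension(primer_5prime_nt, product_length):
    """Position of the 5' end of the mRNA inferred from a primer extension product.

    primer_5prime_nt: position of the 5' end of the antisense primer, counted
        from nt 1 (the A of the ATG); must be >= 1.
    product_length: length in nt of the extended cDNA, primer included.
    Coordinates skip zero: the base before nt 1 is nt -1.
    """
    if primer_5prime_nt < 1 or product_length < 1:
        raise ValueError("positions and lengths must be positive")
    upstream = product_length - primer_5prime_nt
    if upstream <= 0:
        # The product ends at or downstream of nt 1.
        return primer_5prime_nt - product_length + 1
    return -upstream


# Primer 5' end at nt 62 and a 113-nt product, as in the text.
start_site = tss_from_primer_extension(62, 113)  # -51
```

Of the 113 nt, 62 nt lie at or downstream of nt 1, so the remaining 51 nt extend upstream and place the start site at nt −51.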

Fig. 6.
Primer extension analysis. Mouse colon
RNA was reverse transcribed with an antisense primer specific for actin
mRNA (lane a), galectin-6 mRNA (lane b),
and GAPDH mRNA (lane c). The cDNAs produced were
electrophoresed along with the molecular weight markers
(HinfI-digested φX174, labeled with 32P; sizes
indicated to the right). The arrowhead indicates
the major galectin-6 cDNA formed (113 nt). A schematic is shown at the top. The distance between the 5′ end of the antisense primers and
the ATG is shown to the right, and the length of the 5′
untranslated RNA is shown to the left as reported for
β-actin (48) and GAPDH (49), and deduced here for galectin-6.
Some of the primer extension product in lane b of Fig. 6 may
be due to galectin-4 because two recently reported mouse galectin-4 expressed sequence tags (GenBank™ accession numbers AA265412 and
AA499921) suggest that mouse galectin-4 is almost identical to
galectin-6 between about −50 nt and 62 nt. However, even if this were
the case, the 113-nt product must also derive from galectin-6 since no
other primer extension product was found. Moreover, the amounts of
galectin-4 and galectin-6 mRNAs are of the same order of magnitude (3)
and therefore, both would be detected in this experiment.
In conclusion, the main transcription start site for galectin-6
mRNA in normal adult colon is probably at −51 nt. Since the distal
putative promoter (at −475 nt) lies within a 29-bp direct repeat of
the sequence of the confirmed proximal promoter, it is reasonable that
it would be active as well, perhaps under other physiological
conditions or in other parts of the intestine.
The translational initiation site in the transcript from the proximal
promoter is predicted by the rules of Kozak (31) to be the ATG at nt
1-3 since this is the first ATG and it is also in a favorable context.
As with all other known galectins, we found no evidence for a signal
sequence or transmembrane sequence in the galectin-6 gene. This
indicates that galectin-6, like other galectins, is expressed mainly as
a soluble cytosolic protein, but may be secreted by non-classical
mechanisms (2).
Upstream Regulatory Elements--
In the accompanying report, we
provide extensive evidence that expression of galectin-6 is limited to
the gastrointestinal tract. We therefore searched the upstream region
for the presence of any regulatory elements that are involved in
tissue-specific expression of other intestinally expressed genes. We
found a sequence between bp −354 and bp −367 (indicated by + signs in Fig. 3) that is 72% identical to part of a 19-bp sequence
within the apolipoprotein B upstream region that has been implicated in
intestine-specific expression of this protein (32). This element is a
strongly positive inducer of expression together with other sequences, and can also by itself confer expression of a reporter gene in the
intestinal cell line Caco-2, as well as in the hepatoma HepG2. Screening of the upstream region against a data base of mammalian transcription factor binding sites using MatInspector
(33)3 revealed a wide variety
of well known possible regulatory elements. Notable among those are six
E boxes (at bp −70, −295, −336, −382, −415, and −466, indicated
by asterisks in Fig. 3). One resembled a MycMax binding
site, whereas others resembled MyoD binding sites. Such E boxes have
been implicated in the regulation of gene expression in proliferating
and differentiating epithelial cells (see, e.g., Refs. 34
and 35), but also expression of other genes in other tissues. Although
the upstream sequences of Lgals6 do not permit prediction of
the regulation of galectin-6 expression without further experiments,
these sequences are clearly different from upstream regions of the
genes encoding galectin-1 and -2 (10, 11) or galectin-3 (12, 13).
In addition, it is clear that the regulatory elements governing the two
promoters in Lgals6 differ, suggesting that they may respond
to different environmental or developmental stimuli. It is noteworthy
that the mouse Lgals3 gene encoding galectin-3 contains two
promoters as well (12, 13), generating two different mRNAs encoding
the same protein (36) but under different regulation (37).
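As an illustration of how E boxes such as those noted above can be located, the following minimal sketch scans an upstream sequence for the general E box consensus CANNTG and reports positions in the negative numbering used here. It is not the MatInspector procedure; the function name, the coordinate convention (last base supplied is nt −1), and the example sequence are assumptions for illustration only.

```python
import re

# General E box consensus CANNTG. The pattern is its own reverse complement,
# so scanning one strand is sufficient. The lookahead allows overlapping hits.
E_BOX = re.compile(r"(?=(CA[ACGT]{2}TG))")


def find_e_boxes(upstream, end_position=-1):
    """Return (position of first base, motif) for each E box in an upstream sequence.

    upstream: sense-strand sequence whose last base sits at end_position
        (default -1, the base immediately before the ATG).
    """
    upstream = upstream.upper()
    first_base = end_position - (len(upstream) - 1)
    return [(first_base + m.start(1), m.group(1)) for m in E_BOX.finditer(upstream)]


# Hypothetical 20-nt upstream sequence containing two E boxes.
boxes = find_e_boxes("GGCACGTGAATTCAGCTGCC")
```

A consensus match alone says nothing about which factor binds or whether the site is functional, which is why the text treats these elements as candidates pending further experiments.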
Untranslated 3′ Sequence--
The sequence 3′ of the stop codon in
Lgals6 is very similar to the 3′-untranslated sequence of
rat galectin-4 (Ref. 16; see also Fig. 2 in the accompanying paper (3))
up to a consensus polyadenylation signal AATAAA 51 bp after the
termination codon. Downstream of the polyadenylation signal there is a
(GT)26 dinucleotide repeat. Besides sometimes being useful
as polymorphic markers, such GT repeats have been implicated in message
processing (38). GT repeats also may form Z-DNA (39), which binds
specific proteins (40) and may modify nucleosome structure (41),
thereby affecting transcription.
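The two features described in this section, the AATAAA polyadenylation signal and the (GT)n dinucleotide repeat, can be located with a short sketch like the one below. The function names and the example sequence are hypothetical and are not the Lgals6 3′ sequence; only the AATAAA motif and the 26-copy GT repeat are taken from the text.

```python
import re


def find_polya_signal(utr3):
    """0-based offset of the first AATAAA in a 3' sequence, or None if absent."""
    index = utr3.upper().find("AATAAA")
    return None if index == -1 else index


def longest_dinucleotide_repeat(seq, unit="GT"):
    """Return (0-based start, copy number) of the longest uninterrupted run of unit."""
    runs = re.finditer(f"(?:{re.escape(unit)})+", seq.upper())
    best = max(runs, key=lambda m: len(m.group()), default=None)
    if best is None:
        return (None, 0)
    return (best.start(), len(best.group()) // len(unit))


# Hypothetical 3' sequence with a signal followed by a 26-copy GT repeat.
example = "CCAATAAACC" + "GT" * 26 + "AA"
signal_offset = find_polya_signal(example)
repeat = longest_dinucleotide_repeat(example)
```

The repeat finder counts only perfect, uninterrupted copies, so an imperfect repeat would be reported as several shorter runs.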
Introns--
When the Lgals6 sequence was plotted in a
dot matrix plot against
itself,4 several repetitive
sequences were revealed.
The last 100 bp of intron 2 consist of an almost perfect 50-bp tandem
duplication (Figs. 3 and 7,
top). The sequence of this repeat did not resemble any other
known repeated sequence. It ends at the splice acceptor site and
encodes an open reading frame, which, however, is out of frame with
exon 3.

Fig. 7.
Repeated sequences in intron 2 and intron
6. For intron 2 the first copy is shown at the top, and
for intron 6 a consensus is shown at the top. Below are
shown the repeated sequence(s) with identical residues indicated by a
dot, gaps by a dash, and indeterminate
nucleotides by an X. For intron 6, the numbers
along the left refer to repeat number (1-5 adjacent to exon
6 and n-3 to n adjacent to exon 7), and (//) indicates the part of the
intron that was not sequenced.
All the known sequence of intron 6, except for the first 3 nt and last
40 nt, consists of a 30-nt repeating sequence (Fig. 7,
bottom). This repeating sequence has not been reported
before, but it resembles a mouse mini-satellite DNA (42).
Intron 7 contains seven repeats of the pentanucleotide ACCTC. The ACCTC
sequence occurs as six tandem repeats in the opposite orientation in
intron 3 of the mouse NCAM gene (43), but the significance is unknown.
The remainder of intron 7 3′ of the pentanucleotide repeat also
contains repetitive sequence consisting of about 80% C and 20% T on
the sense strand. This region was remarkably refractory to sequencing
by the standard protocols. We were able to read this sequence only when
we used the protocol described by McCrea et al. (20), which
employs a tailing chase to dilute prematurely terminated chains.
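The idea behind the self dot matrix comparison used to find these repeats can be sketched as follows: every pair of identical k-mers in the sequence is a dot, and a tandem repeat shows up as many dots at the same off-diagonal offset, equal to the repeat period. This is a simplified stand-in, not the program actually used; the function names, the default word size, and the example sequences are assumptions for illustration.

```python
from collections import Counter


def self_dotplot_offsets(seq, k=8):
    """Count identical k-mer pairs at each off-diagonal offset (j < i, offset = i - j)."""
    seq = seq.upper()
    seen = {}            # k-mer -> list of earlier start positions
    offsets = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        for j in seen.get(word, []):
            offsets[i - j] += 1
        seen.setdefault(word, []).append(i)
    return offsets


def dominant_repeat_period(seq, k=8):
    """Offset with the most matching k-mers, or None if nothing repeats."""
    offsets = self_dotplot_offsets(seq, k)
    return offsets.most_common(1)[0][0] if offsets else None


# Seven tandem copies of ACCTC, as described for intron 7.
penta_period = dominant_repeat_period("ACCTC" * 7, k=5)

# A hypothetical 50-nt unit duplicated in tandem, mimicking the intron 2 case.
unit = "ACGTTGCAAGCTTAGGCATCGATCCGTAAGTCTGACCATGGTCAATCGGA"
dup_offsets = self_dotplot_offsets(unit * 2, k=8)
```

Exact k-mer matching is the simplest choice; an almost perfect repeat such as the one in intron 2 would still be detected, but with fewer dots wherever the two copies differ.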
Two Distinct Genes Encoding Galectin-4 and
Galectin-6--
Although galectin-4 and galectin-6 are very similar,
the distribution of differences along the whole coding sequence
suggests that they are encoded by separate genes rather than being
alleles or products of alternative splicing. We confirmed this by
isolating a fragment of the galectin-4 gene by PCR from the genomic DNA of the same homozygous mouse strain, 129/SV, from which we isolated the
galectin-6 gene. The coding sequence of the galectin-4 gene fragment
was identical to the overlapping parts of the galectin-4 cDNA
clones (3), and showed the expected differences from galectin-6 coding
sequence (Fig. 8). Surprisingly, some
intronic sequence is also remarkably similar between the two genes,
suggesting that Lgals6 and Lgals4 must have
diverged relatively recently.

Fig. 8.
Comparison of a fragment of
Lgals4 with Lgals6. The relevant part of
the Lgals6 sequence (pLgals6-1) is aligned with a
cloned fragment of Lgals4 isolated by PCR
(pLgals4-1). Residues identical to the corresponding
position in Lgals6 are indicated by a dot, and
gaps by a dash. The bottom sequence is the overlapping part
of a galectin-4 cDNA clone (pmG4-2), demonstrating that
pLgals4-1 is indeed a fragment of a gene encoding
galectin-4 and not galectin-6. (//) indicates the parts of intron 2 that were not sequenced.
Further proof of the existence of two separate genes is provided by
genomic Southern blots. When EcoRI-digested mouse DNA was
hybridized with an upstream Lgals6-specific probe (the
insert of pLgals6-1c, Fig. 1), one band was observed (Fig.
9, lanes a-c), whereas with a
rat galectin-4 cDNA probe that recognizes both genes, two bands
were observed (lanes g-i). Since there are no EcoRI sites within the Lgals6 gene (Fig. 2), the
second cDNA-detected band must correspond to Lgals4.
Again, for HindIII-digested mouse DNA, only one band is
detected by the Lgals6-specific probe (Fig. 9, lanes
d-f), whereas additional stronger bands are detected by the rat
galectin-4 cDNA probe (lanes j-l). These data can only be explained by the presence of two genes that are highly
homologous.

Fig. 9.
Southern blot of mouse genomic DNA probed
with galectin-6- and galectin-4/6-specific probes. Experimental
components are given above the lanes. Mouse genomic DNA from strain
C57BL/6J (B), M. spretus (S), or the
F1 hybrid (F), digested with EcoRI or
HindIII (indicated as Hind), was probed with an
upstream fragment of the galectin-6 gene that is specific for the
galectin-6 gene, or rat galectin-4 cDNA (prG4) that hybridizes to
both the galectin-4 and galectin-6 genes. The EcoRI
fragments were about 8 kb and 5.0 kb. The top HindIII band
hybridizing with pLgals6-1c was above 23 kb, and the other
HindIII fragments were about 4.9 kb and 3.2 kb.
Chromosomal Localization of Genes Encoding Galectin-4 and
Galectin-6--
The chromosomal location of Lgals6 was
mapped by linkage analysis of RFLPs in an interspecific backcross
between M. spretus and C57BL/6J (22). The
Lgals6-specific upstream probe detects one unique band in
EcoRI and HindIII digested DNA from either parent
or F1 hybrids (Fig. 9). An RFLP found for the restriction enzyme
MspI (not shown) was used for mapping. A Southern blot of
MspI-digested DNA from 66 offspring of backcrosses of the F1 with the C57BL/6J parental strain produced a segregation pattern that was most concordant with several markers on chromosome 7. The frequency of discordant typings (recombinants) was
used to calculate distances from Lgals6 to these
markers (Fig. 10).
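
The arithmetic behind such distance estimates can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' mapping software, which is not specified here. It assumes the standard backcross estimate, in which the recombination fraction is the number of recombinants divided by the number of informative mice, with a binomial standard error, and in which the upper 95% limit for cosegregating loci (zero recombinants) is taken as 1 - 0.05^(1/N). The function names and the example counts are hypothetical.

```python
import math


def recombination_estimate(recombinants: int, informative: int) -> tuple[float, float]:
    """Return (distance, standard error), both in centimorgans.

    Assumes a backcross in which each informative animal is scored as
    recombinant or non-recombinant for a pair of loci, and that the
    recombination fraction is small enough to equate 1% recombination
    with 1 cM (no mapping function is applied).
    """
    if informative <= 0 or not 0 <= recombinants <= informative:
        raise ValueError("need 0 <= recombinants <= informative, informative > 0")
    r = recombinants / informative
    standard_error = math.sqrt(r * (1.0 - r) / informative)
    return 100.0 * r, 100.0 * standard_error


def upper_95_limit_cm(informative: int) -> float:
    """Upper 95% confidence limit (cM) when no recombinants are observed.

    Solves (1 - r)^N = 0.05 for r, i.e. the largest recombination
    fraction still compatible with seeing zero recombinants in N animals.
    """
    if informative <= 0:
        raise ValueError("informative must be positive")
    return 100.0 * (1.0 - 0.05 ** (1.0 / informative))


if __name__ == "__main__":
    # Hypothetical counts for illustration; only the panel size of 66
    # backcross offspring comes from the text.
    distance, se = recombination_estimate(recombinants=2, informative=66)
    print(f"{distance:.1f} +/- {se:.1f} cM")
    print(f"upper 95% limit for cosegregating loci: {upper_95_limit_cm(66):.1f} cM")
```

With a panel of this size, a single additional recombinant shifts the estimate by about 1.5 cM, which is why the standard errors are reported alongside each interval.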

Fig. 10.
Lgals6 and Lgals4 are
clustered on proximal mouse chromosome 7. The figure shows a
representation of mouse chromosome 7, with the centromere at the top,
indicating locations of genetic markers typed for this study. The
markers were located by analysis of an interspecific backcross
((C57BL/6J × Mus spretus) F1 × C57BL/6J). The
ratios of the number of recombinants/the total number of informative mice, and the recombination frequencies ± standard errors (in centimorgans) for each pair of loci are indicated to the
left of the chromosome. For pairs of loci that cosegregate,
the upper 95% confidence interval is shown in parentheses.
Ucla markers were reported by Warden et al. (22)
or are unpublished data. References for other linked loci can be
obtained from the Mouse Genome Database (Mouse Genome Informatics
Project, The Jackson Laboratory, Bar Harbor, ME; available via the World
Wide Web (URL: http://www.informatics.jax.org)).
Since the galectin-4 probes we used also react with DNA encoding
galectin-6, we achieved specific mapping of Lgals4 by
analyzing a HindIII polymorphism that is detected with these
probes but not with the Lgals6-specific probe (Fig. 9,
lanes j-l), and therefore is uniquely associated with
Lgals4. Lgals4 mapped to the same region of
chromosome 7 as Lgals6. Such close linkage was previously found for the human LGALS1 and LGALS2 genes (23)
encoding galectin-1 and galectin-2, respectively, and certain C-type
lectins (44). The mapped genes in this region on mouse chromosome 7 are
syntenic with the q13.1-13.3 region of human chromosome 19, suggesting that the human homolog(s) are likely to be found there. Interestingly, the genes encoding galectin-7 (45) and galectin-10 (the Charcot-Leyden crystal protein) (46) also map to human chromosome 19. The galectin family genes that have been mapped are summarized in Table
II.
Table II
Chromosomal location of mapped genes encoding galectins
References for genes are as follows: human LGALS1 and
LGALS2 (23), murine Lgals1 (51), human
LGALS3 (52), mouse Lgals4 and Lgals6
(this paper), rat Lgals5 (50), human LGALS7 (45), and the human galectin-10 (Charcot-Leyden crystal protein) gene (46).
 |
FOOTNOTES |
* This work was supported by grants from the Cigarette and Tobacco Surtax Fund of the State of California through the Tobacco-Related Disease Research Program of the University of California (to H. L.) and by Grant HL38627 from the National Institutes of Health (to S. H. B.). The costs of publication of this article were defrayed in part by the payment of page charges. The article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number(s) AF026796, AF026797, AF026798, and AF026799.
§ Present address: United States Department of Agriculture, Agricultural Research Station, Western Regional Research Center-CIU, Albany, CA 94710.
** To whom correspondence should be addressed. Present address: Inst. of Medical Microbiology, Dept. Clinical Immunology, Sölvegatan 23, S 22362 LUND, Sweden. Tel.: 46-46-173274; Fax: 46-46-137468; E-mail: hakon.leffler@mmb.lu.se.
1 The abbreviations used are: CRD, carbohydrate recognition domain; PCR, polymerase chain reaction; RFLP, restriction fragment length polymorphism; bp, base pair(s); nt, nucleotide(s); kb, kilobase pair(s); GAPDH, glyceraldehyde-3-phosphate dehydrogenase.
2 This program is available via the World Wide Web (URL: http://dot.imgen.bcm.tmc.edu:9331/gene-finder/gf.html).
3 This program is available via the World Wide Web (URL: http://www.gsf.de/cgi-bin/matsearch.pl).
4 This program is available via the World Wide Web (URL: http://alces.med.umn.edu/rawdot.html).
 |
REFERENCES |
1. Barondes, S. H., Castronovo, V., Cooper, D. N. W., Cummings, R. D., Drickamer, K., Feizi, T., Gitt, M. A., Hirabayashi, J., Hughes, R. C., Kasai, K., Leffler, H., Liu, F.-T., Lotan, R., Mercurio, A. M., Monsigny, M., Pillai, S., Poirier, F., Raz, A., Rigby, P., Rini, J. M., and Wang, J. L. (1994) Cell 76, 597-598
2. Barondes, S. H., Cooper, D. N. W., Gitt, M. A., and Leffler, H. (1994) J. Biol. Chem. 269, 20807-20810
3. Gitt, M. A., Colnot, C., Poirier, F., Barondes, S. H., and Leffler, H. (1998) J. Biol. Chem. 273, 2954-2960
4. Leffler, H. (1997) Trends Glycosci. Glycotechnol. 9, 9-19
5. Hadari, Y. R., Paz, K., Dekel, R., Mestrovic, T., Accili, D., and Zick, Y. (1995) J. Biol. Chem. 270, 3447-3453
6. Su, Z. Z., Lin, J., Shen, R., Fisher, P. E., Goldstein, N. I., and Fisher, P. B. (1996) Proc. Natl. Acad. Sci. U. S. A. 93, 7252-7257
7. Tureci, O., Schmitt, H., Fadle, N., Pfreundschuh, M., and Sahin, U. (1997) J. Biol. Chem. 272, 6416-6422
8. Wada, J., and Kanwar, Y. S. (1997) J. Biol. Chem. 272, 6078-6086
9. Ohyama, Y., and Kasai, K. (1988) J. Biochem. (Tokyo) 104, 173-177
10. Gitt, M. A., and Barondes, S. H. (1991) Biochemistry 30, 82-89
11. Gitt, M. A., Massa, S., Leffler, H., and Barondes, S. H. (1992) J. Biol. Chem. 267, 10601-10606
12. Gritzmacher, C. A., Mehl, V. S., and Liu, F.-T. (1992) Biochemistry 31, 9533-9538
13. Rosenberg, I. M., Iyer, R., Cherayil, B., Chiodino, C., and Pillai, S. (1993) J. Biol. Chem. 268, 12393-12400
14. Dyer, K. D., Handen, J. S., and Rosenberg, H. F. (1997) Genomics 40, 217-221
15. Maniatis, T., Fritsch, E. F., and Sambrook, J. (1982) Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY
16. Oda, Y., Herrmann, J., Gitt, M. A., Turck, C., Burlingame, A. L., Barondes, S. H., and Leffler, H. (1993) J. Biol. Chem. 268, 5929-5939
17. Feinberg, A. P., and Vogelstein, B. (1984) Anal. Biochem. 137, 266-267
18. Sanger, F., Nicklen, S., and Coulson, A. R. (1977) Proc. Natl. Acad. Sci. U. S. A. 74, 5463-5467
19. Kraft, R., Tardiff, J., Krauter, K. S., and Leinwand, L. A. (1988) BioTechniques 6, 544-549
20. McCrea, K. W., Marrs, C. F., and Gilsdorf, J. R. (1993) BioTechniques 15, 843-844
21. Ausubel, F. M., Brent, R., Kingston, R. E., Moore, D. D., Seidman, J. G., Smith, J. A., and Struhl, K. (eds) (1992) Current Protocols in Molecular Biology, Suppl. 20, Greene Publishing Associates/John Wiley and Sons, Inc., New York
22. Warden, C. H., Mehrabian, M. M., He, K.-Y., Yoon, M.-Y., Diep, A., Xia, Y.-R., Svenson, K. L., Sparkes, R. S., and Lusis, A. J. (1993) Genomics 18, 295-307
23. Mehrabian, M., Gitt, M. A., Sparkes, R. S., Leffler, H., Barondes, S. H., and Lusis, A. J. (1993) Genomics 15, 418-420
24. Liao, D. I., Kapadia, G., Ahmed, H., Vasta, G. R., and Herzberg, O. (1994) Proc. Natl. Acad. Sci. U. S. A. 91, 1428-1432
25. Lobsanov, Y. D., Gitt, M. A., Leffler, H., Barondes, S. H., and Rini, J. M. (1993) J. Biol. Chem. 268, 27034-27038
26. Leonidas, D. D., Elbert, B. L., Zhou, Z., Leffler, H., Ackerman, S. J., and Acharya, K. R. (1995) Structure 3, 1379-1393
27. Solovyev, V. V., Salamov, A. A., and Lawrence, C. B. (1995) in Proceedings of the Third International Conference on Intelligent Systems for Molecular Biology (Rawling, C., Clark, D., Altman, R., Hunter, L., Lengauer, T., and Wodak, S., eds), pp. 367-375, AAAI Press, Cambridge, United Kingdom
28. Wingender, E. (1994) J. Biotechnol. 35, 273-280
29. Prestridge, D. S. (1995) J. Mol. Biol. 249, 923-932
30. Breathnach, R., and Chambon, P. (1981) Annu. Rev. Biochem. 50, 349-383
31. Kozak, M. (1991) J. Cell Biol. 115, 887-903
32. Carlsson, P., and Bjursell, G. (1989) Gene (Amst.) 77, 113-121
33. Quandt, K., Frech, K., Karas, H., Wingender, E., and Werner, T. (1995) Nucleic Acids Res. 23, 4878-4884
34. Hurlin, P. J., Foley, K. P., Ayer, D. E., Eisenman, R. N., Hanahan, D., and Arbeit, J. M. (1995) Oncogene 11, 2487-2501
35. Wheeler, M. B., Nishitani, J., Buchan, A. M. J., Kopin, A. S., Chey, W. Y., Chang, T.-M., and Leiter, A. B. (1992) Mol. Cell. Biol. 12, 3531-3539
36. Cherayil, B. J., Weiner, S. J., and Pillai, S. (1989) J. Exp. Med. 170, 1959-1972
37. Voss, P. G., Tsay, Y. G., and Wang, J. L. (1994) Glycoconjugate J. 11, 353-362
38. McLauchlan, J., Gaffney, D., Whitton, J. L., and Clements, J. B. (1985) Nucleic Acids Res. 13, 1347-1368
39. Taboury, J. A., and Taillandier, E. (1985) Nucleic Acids Res. 13, 4469-4483
40. Leith, I. R., Hay, R. T., and Russell, W. C. (1988) Nucleic Acids Res. 16, 8277-8289
41. Casanovas, J. M., and Azorin, F. (1987) Nucleic Acids Res. 15, 8899-8918
42. Beridze, T. (1986) Satellite DNA, Springer-Verlag, Berlin
43. Barbas, J. A., Chaix, J.-C., Steinmetz, M., and Goridis, C. (1988) EMBO J. 7, 625-632
44. Watson, M. L., Kingsmore, S. F., Johnston, G. I., Siegelman, M. H., Lebeau, M. M., Lemons, R. S., Bora, N. S., Howard, T. A., Weissman, I. L., McEver, R. P., and Seldin, M. F. (1990) J. Exp. Med. 172, 263-272
45. Madsen, P., Rasmussen, H. H., Flint, T., Gromov, P., Kruse, T. A., Honore, B., Vorum, H., and Celis, J. E. (1995) J. Biol. Chem. 270, 5823-5829
46. Mastrianni, D. M., Eddy, R. L., Rosenberg, H. F., Corrette, S. E., Shows, T. B., Tenen, D. G., and Ackerman, S. J. (1992) Genomics 13, 240-242
47. Chiariotti, L., Wells, V., Bruni, C. B., and Mallucci, L. (1991) Biochim. Biophys. Acta 1089, 54-60
48. Nudel, U., Zakut, R., Shani, M., Neuman, S., Levy, Z., and Yaffe, D. (1983) Nucleic Acids Res. 11, 1759-1771
49. Fort, P., Marty, L., Piechaczyk, M., el Sabrouty, S., Dani, C., Jeanteur, P., and Blanchard, J. M. (1985) Nucleic Acids Res. 13, 1431-1442
50. Gitt, M. A., Wiser, M. F., Leffler, H., Herrmann, J., Xia, Y.-R., Massa, S. M., Cooper, D. N. W., Lusis, A. J., and Barondes, S. H. (1995) J. Biol. Chem. 270, 5032-5038
51. Baldini, A., Gress, T., Patel, K., Muresu, R., Chiariotti, L., Williamson, P., Boyd, Y., Casciano, I., Wells, V., Bruni, C. B., Mallucci, L., and Siniscalco, M. (1993) Genomics 15, 216-218
52. Raimond, J., Zimonjic, D. B., Mignon, C., Mattei, M., Popescu, N. C., Monsigny, M., and Legrand, A. (1997) Mamm. Genome 8, 706-707
Copyright © 1998 by The American Society for Biochemistry and Molecular Biology, Inc.