Sequence, Structure, and Chromosomal Mapping of the Mouse Lgals6 Gene, Encoding Galectin-6*

Michael A. Gitt‡§, Yu-Rong Xia, Robert E. Atchison‡, Aldons J. Lusis, Samuel H. Barondes‡, and Hakon Leffler‡∥**

From the ‡Center for Neurobiology and Psychiatry, Department of Psychiatry, and ∥Department of Pharmaceutical Chemistry, University of California, San Francisco, California 94143-0984 and the Department of Medicine and Department of Microbiology and Molecular Genetics, UCLA School of Medicine, Los Angeles, California 90024

    ABSTRACT

In the accompanying paper (Gitt, M. A., Colnot, C., Poirier, F., Barondes, S. H., and Leffler, H. (1998) J. Biol. Chem. 273, 2954-2960), we reported that mouse gastrointestinal tract specifically expresses two closely related galectins, galectins-4 and -6, each with two carbohydrate recognition domains in the same peptide. Here, we report the isolation, characterization, and chromosomal mapping of the complete mouse Lgals6 gene, which encodes galectin-6, and of a fragment of a distinct gene, Lgals4, which encodes galectin-4. The coding sequence of galectin-6 is specified by eight exons. The upstream region contains two putative promoters. Lgals6 and the closely related Lgals4 are clustered together about 3.2 centimorgans proximal to the apoE gene on mouse chromosome 7. The syntenic human region is 19q13.1-13.3.

    INTRODUCTION

Galectins (1, 2) are a family of proteins that have at least one carbohydrate recognition domain (CRD)¹ with conserved sequence elements and affinity for β-galactosides. Although each galectin is abundantly expressed in only a few cell types, the distributions of the best studied galectins, galectin-1 and galectin-3, encompass a wide range of tissues and change during embryogenesis. In the accompanying paper (3), we reported that expression of two other galectins, galectin-4 and galectin-6, is much more restricted, being confined to the gastrointestinal tract in both fetal and adult mice. Galectin-4 and the newly discovered galectin-6 (3) are closely related and belong to a subfamily of galectins with two CRDs within one peptide chain, joined by a link region of variable length (4); this subfamily also includes galectin-8 (5, 6) and galectin-9 (7, 8). Here we report the isolation and structure of Lgals6, the gene encoding galectin-6, and show its relationship to the structure of genes encoding galectins with a single CRD (9-14), as well as features of the upstream region that may account for the expression of galectin-6 in the gastrointestinal tract. We also demonstrate that the Lgals4 gene encoding galectin-4 is distinct from Lgals6, and that these two genes are very close together on mouse chromosome 7.

    EXPERIMENTAL PROCEDURES

Materials and General Methods-- Unless otherwise indicated, all nucleic acid enzymes were obtained from Boehringer Mannheim and all chemicals were from Sigma. Nitrocellulose filters were from Schleicher & Schuell, and Magnagraph nylon filters for blotting were purchased from Micron Separations Inc. (Westboro, MA). [α-³²P]Deoxycytidine 5'-triphosphate (3000 Ci/mmol) and [³⁵S]deoxyadenosine 5'-(α-thio)triphosphate (1000-1500 Ci/ml, sequencing grade) were purchased from NEN Life Science Products. For general molecular biological techniques such as hybridization screening, restriction, gel electrophoresis, blotting, and elution, we followed protocols collected by Maniatis et al. (15).

Oligonucleotides and Polymerase Chain Reactions (PCR)-- Oligonucleotides are listed in Table I. For probing of Southern blots, the oligonucleotides were labeled with digoxigenin by 3' tailing using digoxigenin-11-dideoxyUTP and terminal deoxynucleotidyl transferase, and were visualized by chemiluminescence after treatment with conjugated anti-digoxigenin, using reagents and procedures from Boehringer Mannheim. Hybridization was done at 37 °C in hybridization buffer (200 mM Na₂HPO₄, pH 7.2, 7% SDS, 1% bovine serum albumin, 15% formamide, 1 mM EDTA), and blots were washed for 10 min at room temperature in 2 × SSC, 1% SDS.

                              
Table I
Oligonucleotides

PCR was carried out using Ampli-Taq (Perkin-Elmer). One µl of different dilutions of template was mixed with 25 pmol of each primer, buffer (10 mM Tris-HCl, pH 8.3, 50 mM KCl, 1.5 mM MgCl₂, 0.001% (w/v) gelatin), and 250 µM deoxynucleotides. Amplification consisted of 45 cycles: 40 s of denaturation at 96 °C for the first five cycles and 94 °C for the remaining cycles, 1 min of annealing at 60 °C, and 1-4 min of extension at 72 °C. Amplified fragments were visualized and purified on 1% agarose gels stained with ethidium bromide. Excised fragments were electroeluted, phenol-extracted, and precipitated with ethanol.
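The cycling conditions above can be written down as a small data structure, which makes the total programmed hold time easy to check. The following is only an illustrative sketch: the function names are hypothetical, and ramp times between temperatures, which depend on the thermal cycler, are ignored.

```python
def build_program(extension_s):
    """Return the 45-cycle program as a list of cycles.

    Each cycle is a list of (temperature_C, seconds) steps taken from the
    text: 40 s denaturation (96 C for cycles 1-5, 94 C afterwards),
    60 s annealing at 60 C, and an extension at 72 C whose length
    (60-240 s in the text) is passed in by the caller.
    """
    program = []
    for cycle in range(1, 46):
        denaturation_temp = 96 if cycle <= 5 else 94
        program.append([
            (denaturation_temp, 40),  # denaturation
            (60, 60),                 # annealing
            (72, extension_s),        # extension
        ])
    return program


def total_hold_seconds(program):
    """Sum of all programmed hold times; ramping is not included."""
    return sum(seconds for cycle in program for _, seconds in cycle)


if __name__ == "__main__":
    for extension in (60, 240):
        seconds = total_hold_seconds(build_program(extension))
        print(f"extension {extension} s: {seconds / 3600:.2f} h of hold time")
```

With the shortest and longest extension times given in the text, the programmed hold time is 2.00 h and 4.25 h, respectively.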

Isolation of Lgals6 and Subcloning-- A mouse genomic DNA (strain 129/SV) library in λFIXII (Stratagene, La Jolla, CA) was screened with a cDNA probe containing all the coding sequence but no untranslated sequence of rat galectin-4 (16). The probe was labeled with [α-³²P]dCTP by random primer polymerization (17) and used in hybridization screening (15) of approximately 1 × 10⁶ plaques using Escherichia coli SRB as host. The hybridization was done in hybridization buffer (see above) plus 20% dextran sulfate at 52 °C with 2.4 × 10⁵ cpm/ml probe. Washes were done at the hybridization temperature, first in 2 × SSC (15), 1% SDS, then in 0.2 × SSC, 0.1% SDS, 30 min each. After drying, the filters were autoradiographed using X-Omat film (Eastman Kodak Co.) and intensifying screens at -70 °C.

One phage clone, λLgals6, was isolated by plaque purification, and its DNA was purified from high titer liquid culture. The lysate was centrifuged at 6000 × g for 20 min, and the supernatant was treated with 10 µg/ml DNase and 20 µg/ml RNase, after which the phage were precipitated for 1 h at 4 °C with 10% PEG 8000 in 5 mM Tris-HCl, pH 7.5, 0.5 M NaCl, 5 mM MgSO₄ (final concentrations). The pellet was resuspended in 10 mM Tris-HCl, pH 7.5, 10 mM MgSO₄, and extracted with phenol and chloroform. Finally, the phage DNA was precipitated with isopropanol and resuspended in 10 mM Tris-HCl, pH 7.5, 10 mM NaCl, 1 mM EDTA.

The purified DNA from λLgals6 was digested with XbaI, and two of the three resulting fragments, those that hybridized with the rat galectin-4 cDNA probe, were subcloned into pBluescript SK+ (Stratagene, La Jolla, CA), generating clones pLgals6-1 and pLgals6-2. pLgals6-3, containing a DNA fragment spanning the junction between clones pLgals6-1 and pLgals6-2, was isolated by PCR between primers mG6M and rG4F using λLgals6 as template, followed by cloning into pCRII (Invitrogen, San Diego, CA).

Further subcloning of portions of these XbaI fragments is described in Fig. 1. Clones pLgals6-1a, pLgals6-2b, pLgals6-2c, and pLgals6-2f were generated by ApaI, HindIII, PstI, and SstI digestion, respectively, of the appropriate plasmid, followed by religation. DNA fragments from pLgals6-2 were subcloned into pBluescript, yielding clones pLgals6-2a (2-kb HindIII fragment), pLgals6-2d (1-kb HincII fragment), pLgals6-2e (400-bp NcoI fragment), and pLgals6-2g (800-bp PvuII/PstI fragment). Additional fragments were generated by PCR and analyzed directly (fLgals6-1b) or cloned into pCRII (Invitrogen) (pLgals6-1c and pLgals6-2h). fLgals6-1b and pLgals6-2h were from PCR between the primer pairs mG6F/rG4C and mG6N/rG4G, respectively. To generate a clone containing upstream sequence, we used pLgals6-1 as template in PCR with the vector-specific T3 primer and the intron 1-specific antisense primer mG6E. However, the resulting product, clone pLgals6-1c, contains only 227 bp of the upstream sequence because the mG6E oligonucleotide, in addition to the expected priming site in intron 1, inadvertently primed at a site within the upstream region (nt -227 through -223, AAGGG, identical to the 3'-end of this primer).


Fig. 1.   Subcloning strategy for Lgals6. The top bar represents the λFIX-II clone, λLgals6, with insert (solid) and part of the flanking λ sequence (hatched). Two XbaI fragments of λLgals6 were cloned into pBluescript, generating clones pLgals6-1 and pLgals6-2. Further subclones obtained by restriction digestion and religation, subcloning, or PCR are shown below pLgals6-1 and pLgals6-2. fLgals6-1b was isolated as a PCR product without cloning. Filled boxes represent exons.

Sequencing-- The different subclones were sequenced using primers synthesized based on rat galectin-4 sequence (16), and later, mouse galectin-6 sequence (see Table I), as well as vector-specific primers. In most cases, we used a modification (10) of the Sanger technique (18) using Sequenase (U. S. Biochemical), as described by the manufacturer. Denatured double-stranded DNA prepared by the method of Kraft et al. (19) was the template. To eliminate artifactual banding caused by presumed secondary structure in intron 7, we used the method described by McCrea et al. (20) employing a terminal deoxynucleotidyltransferase chase after the termination reaction. All exonic regions, intron boundaries, upstream and downstream sequences were verified by sequencing on both strands, except for the 3' end of exon 6, which, because of the repetitive DNA in intron 6, was confirmed by sequencing with different primers on the same strand.

Isolation of a Fragment of Lgals4 by PCR-- Inbred mouse strain 129/SV genomic DNA (Jackson Laboratories, Bar Harbor, ME) was amplified by means of oligodeoxynucleotides representing sequences distributed throughout the galectin-4 gene. Oligonucleotides mG6F and rG4C gave clear non-cDNA-sized bands on amplification, and therefore a sample of the reaction was ligated into plasmid pCRII (Invitrogen). DNA of selected clones was sequenced using T7 and M13 reverse primers, and the gene-specific primers mG6H and mG6K to obtain sequence on both strands.

Restriction Map-- The size of each intron was determined by one of several methods. Introns 1, 4, 5, and 7 were sequenced completely. The size of intron 3 was determined by ApaI restriction digest analysis of clone pLgals6-1. Introns 2 and 6 were sized by PCR amplification between exonic primers surrounding the respective intron to generate fLgals6-1b and pLgals6-2h (Fig. 1). The identity of the PCR products was confirmed by sequencing the ends of each fragment, and the size was determined by gel electrophoresis. Intron sizes aided in the analysis of restriction digest data of both pLgals6-1 and pLgals6-2.

Primer Extension-- We used a modified version of the procedure summarized by Ausubel et al. (21). For galectin-6, we used the antisense primer mG6Q (Table I), and as controls we used the antisense primers corresponding to mouse β-actin (GenBank™ accession no. X03672; CACATGCCGGAGCCGTTGTCGACGACCAGC) and GAPDH (GenBank™ accession no. M32599; TCTCCACTTTGCCACTGCAAATGGCAGCCC). The primers were labeled with [γ-³²P]ATP and polynucleotide kinase and purified by ethanol precipitation in the presence of ammonium acetate as described (15). After resuspension in 100 µl of TE, 3.5 µl of the labeled primer was combined with 10 µl of mouse colon RNA and 1.5 µl of hybridization buffer, heated for 90 min at 65 °C, and then cooled to room temperature. Buffer, dNTPs, actinomycin D, 1 unit/µl RNasin (Promega), and avian myeloblastosis reverse transcriptase (Boehringer Mannheim) were then added to the hybridization mixture and incubated for 1 h at 42 °C. After RNase digestion and phenol extraction, the cDNAs were precipitated with ethanol, washed, then resuspended in loading buffer (47.5% formamide, 10 mM EDTA, 0.025% bromphenol blue, 0.025% xylene cyanol FF) and denatured for 5 min at 80 °C, before electrophoresis on an 8 M urea 8% polyacrylamide sequencing gel. Molecular weight marker was prepared by digesting φX174 DNA (Life Technologies, Inc.) with HinfI and then 5' labeling with [γ-³²P]ATP (15).

Genomic Southern Blots and Chromosomal Mapping-- The chromosomal locations of Lgals4 and Lgals6 were mapped by restriction fragment length polymorphism (RFLP) linkage analysis in an interspecific backcross between Mus spretus and C57BL/6J mice ((C57BL/6J × Mus spretus) F1 × C57BL/6J) (22). First, a Southern blot of genomic DNA from both C57BL/6J and M. spretus digested with several different restriction enzymes (BamHI, BglII, EcoRI, HindIII, MspI, PstI, PvuII, SstI, TaqI, and XbaI) was probed with either the insert from pLgals6-1c (Fig. 1), which is specific for Lgals6, or the rat galectin-4 cDNA, which detects both Lgals4 and Lgals6. MspI- and HindIII-digested DNA gave hybridizing bands of different sizes from the two parental strains (RFLPs) for the Lgals6 probe and the galectin-4 cDNA probe, respectively. DNA extracted from 66 progeny of the backcross was cut with MspI or HindIII, electrophoresed, blotted, and hybridized with the appropriate probe. The pattern of M. spretus-specific bands in the 66 progeny was then compared with patterns of parental polymorphic bands observed for other, previously mapped, genes to obtain linkage with other markers.

    RESULTS AND DISCUSSION

Cloning and Sequencing of the Gene Encoding Galectin-6-- The clone λLgals6 was isolated by screening a mouse (strain 129/SV) genomic λFIX-II library using rat galectin-4 cDNA as a probe, and characterized by restriction mapping, subcloning, and sequencing as shown in Figs. 1-3. The insert was split into two 4.8-kb fragments and one 3.7-kb fragment by XbaI. One of the 4.8-kb fragments and the 3.7-kb fragment were subcloned into pBluescript SK+ (Stratagene), with resultant colonies (pLgals6-1 and pLgals6-2, respectively) hybridizing to the rat galectin-4 cDNA probe (Fig. 1).


Fig. 2.   Restriction map of Lgals6 and sequencing strategy. A schematic of the Lgals6 gene, with coding sequences marked by boxes and exon number above. No cutting was detected for ClaI, EcoRI, EcoRV, SmaI, or XhoI. Arrows give obtained sequence either on the sense strand (rightward arrow) or the antisense strand (leftward arrow).


Fig. 3.   Sequence of Lgals6. Coding regions (bold print) have their corresponding amino acid residues written above the sequence. The underlined nucleotides in the upstream region form a direct repeat of 29 bp, containing the putative TATA box and transcriptional initiation site. Asterisks are placed over E box sequences (35), and plus signs demarcate the sequence that strongly resembles the intestinal-specific regulatory element of the apolipoprotein B gene (32). The dollar signs and pound signs indicate possible exon-intron and intron-exon boundary sites, respectively. Numbering is from the first nucleotide of the translational initiation site. Restriction sites are indicated below the pertinent sequences. Repetitive elements in introns 2, 5, 6, and 7 are designated by alternating underlines and overlines. The consensus polyadenylation signal is indicated by asterisks over the site.

Sequencing the ends revealed that the 4.8-kb insert of pLgals6-1 contained λFIX-II sequence (stippled in Fig. 1) and thus came from one end of the λLgals6 insert, whereas the 3.7-kb insert in pLgals6-2 lacked λFIX-II sequence and thus came from the middle of the λLgals6 insert (Fig. 1). Moreover, the sequence of a DNA fragment (pLgals6-3) spanning the junction between the pLgals6-1 and pLgals6-2 inserts showed that they are joined together and that no intervening fragment had been overlooked. Probing of Southern dot blots of pLgals6-1 and pLgals6-2 with oligonucleotides revealed that pLgals6-1 contained the 5' end of the gene and pLgals6-2 contained the 3' end of the gene.

To sequence the gene, additional subclones were generated from pLgals6-1 and pLgals6-2 as described in Fig. 1, and sequenced with both vector-specific and gene-specific oligonucleotide primers (Table I). The sequencing strategy and restriction map are shown in Fig. 2, and the sequence in Fig. 3.

The two characterized subclones pLgals6-1 and pLgals6-2 together contained all the galectin-6 coding sequence (as determined in the accompanying paper (3)) encompassing about 5,500 bp including introns. pLgals6-1 also contained 1,100 bp of upstream sequence and pLgals6-2 contained 1,800 bp of downstream sequence. This gene is named Lgals6 in accordance with the naming of other galectin genes (23). All of the partial galectin-6 cDNA sequence (3) was represented within Lgals6 and was identical to the determined gene sequence with the exception of three base changes in exon 4 (nt 384, 447, and 461 in the cDNA), which could be ascribed to the different strain sources of the RNA and genomic DNA.

Organization of the Galectin-6 Coding Sequence-- Galectin-6 is encoded by eight exons. Sequence alignment with other galectins suggests that the overall organization of the part of the genes encoding the CRDs is conserved (Figs. 4 and 5). Thus, exons 2-4 and 6-8 of galectin-6 correspond to exons 2-4 of galectins-1 and -2 (10, 11), galectin-10 (14), and of the chicken galectin C16 (9), and exons 4-6 of galectin-3 (12, 13).


Fig. 4.   Organization of Lgals6 compared with other galectin genes. The coding exon sequences are denoted by boxes, stippled for sequence that is part of the tightly folded carbohydrate-binding domain and open for other sequence. The exon number is given in or above each box, and the number of nucleotides in the coding sequence below each box. The first exon of mouse Lgals3 (not shown) does not code for any translated amino acids (12, 13). References are as follows: human LGALS1 (10), murine Lgals1 (47), human LGALS2 (11), murine Lgals3 (12, 13), murine Lgals6 (this paper), human CLC encoding galectin-10 (14), and chicken C-14 gene (9).


Fig. 5.   Comparison of exon boundaries within the carbohydrate binding domains of several galectins. The galectin-6 amino acid sequence has been aligned to the sequence of human galectins-1 and -2 (10, 11) and -10 (14), and mouse galectin-3 (12, 13). Exon boundaries are indicated by vertical bars. Conserved residues interacting with bound carbohydrate (25) are indicated with asterisks under the sequences.

Each such group of three exons (stippled in Fig. 4) encodes a tightly folded canonical galectin CRD, as revealed in the crystal structures of galectins-1, -2, and -10 (2, 24-26), with the middle exon of each group encoding all of the residues that interact directly with bound carbohydrate. The positions of the exon boundaries within the Lgals6 sequence appear to be highly conserved, with one exception (Fig. 5). Exon 3 of Lgals6 (encoding part of the galectin-6 domain I) and exon 3 of CLC (encoding galectin-10, Ref. 14) are 30-47 nucleotides longer than the exons encoding the corresponding part of galectins-1 and -2 (exon 3), C16 (exon 3), galectin-3 (exon 5), and domain II of galectin-6 (exon 7), resulting in a 30-nucleotide downstream shift of the 3' exon-intron boundary. Since exon 4 of Lgals6 has the same length as the corresponding exons in the other galectin genes, its 3' end is also shifted downstream. Hence, in addition to the last part of the carbohydrate-binding domain I, exon 4 of Lgals6 encodes part of the link region in galectin-6.

Exons 1 and 5 of Lgals6 encode sequences that are not part of the tightly folded carbohydrate-binding domains (open boxes in Fig. 4). Similarly, exon 1 in LGALS1 and LGALS2, and the CLC gene encode the first few amino acids that are disordered in the crystal structures of galectin-1, -2, and -10 (24-26), and exons 2 and 3 in Lgals3 encode other domains in galectin-3 with no sequence similarity to the carbohydrate-binding domain.

The sequence encoded by exon 5 of Lgals6 forms most of the link region between the two CRDs; the rest of the link region is, as mentioned above, encoded by the last part of exon 4. Considering the high degree of sequence identity between galectin-4 and galectin-6 elsewhere (3), it is notable that galectin-6 has a link region that is 24 amino acids shorter. If this marked structural difference had arisen because of a mutation in sequences involved in splicing, then a mutated vestige of the "missing" 72 nt should be found within intron 4. However, the complete sequencing of intron 4 gave no evidence for such a sequence. Hence, either the galectin-6 gene underwent a deletion in its evolution or the galectin-4 gene had an insertion or duplication.

For another bi-CRD galectin, galectin-9, a variation of link region length appears instead to be caused by alternative splicing. In this case, alternative splicing was proposed to account for the insertion of 93 nucleotides coding for an additional 31 amino acids at the beginning of the link region (Ref. 8; see also Fig. 3 of the accompanying paper (3)).
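The two link-region lengths quoted above are related by the three-nucleotides-per-codon rule. A minimal sketch of that arithmetic follows; the function names are hypothetical and serve only to make the conversion explicit.

```python
CODON_NT = 3  # nucleotides per codon


def residues_to_nt(residues):
    """Length in nucleotides of the coding sequence for a run of residues."""
    return residues * CODON_NT


def nt_to_residues(nt):
    """Number of residues encoded by an in-frame stretch of nucleotides."""
    if nt % CODON_NT != 0:
        raise ValueError("length is not a whole number of codons")
    return nt // CODON_NT


if __name__ == "__main__":
    # Galectin-6 link region is 24 residues shorter than that of galectin-4.
    print(residues_to_nt(24))  # 72
    # Galectin-9 alternative insert of 93 nt.
    print(nt_to_residues(93))  # 31
```

Both lengths are multiples of three, so in either case the reading frame of the second CRD is preserved.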

Confirmation of the Translation Start Site and Identification of a Primary Transcription Initiation Site-- In the accompanying paper (3), the start site of the galectin-6 coding sequence was only tentatively assigned based on analogy with galectin-4. To substantiate this assignment, we sought further evidence based on the genomic sequence.

Computer analysis of the entire Lgals6 sequence using the program FGENEH,² which tries to reconstruct coding sequence by searching for spliceable open reading frames and other criteria (27), predicted nt 1 as the translation start site. The few ATG codons in the preceding sequence are unlikely to act as translation start sites because they are followed by multiple in-frame stop codons. HSPL, another program available at the same web site² that is specifically designed to identify intron/exon boundaries, also did not predict any splicing within the upstream region that would remove these stop codons.
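The argument about upstream ATG codons can be checked mechanically: for each ATG, read codons in its frame and report the first stop codon encountered. The sketch below is not the FGENEH or HSPL algorithm; it is a plain frame scan, and the example sequence is a made-up toy string, not Lgals6 sequence.

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_in_frame_stop(seq, atg_index):
    """Index of the first stop codon in the frame set by the ATG, or None."""
    for i in range(atg_index, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return None


def atgs_with_first_stop(seq):
    """Map the 0-based index of every ATG to its first in-frame stop (or None)."""
    seq = seq.upper()
    # A lookahead is used so that overlapping candidates are not skipped.
    return {m.start(): first_in_frame_stop(seq, m.start())
            for m in re.finditer(r"(?=ATG)", seq)}


if __name__ == "__main__":
    toy = "GGATGCCCTAAGGATGGCATTT"  # hypothetical sequence for illustration
    print(atgs_with_first_stop(toy))  # {2: 8, 13: None}
```

In the toy string, the ATG at index 2 runs into TAA two codons later, whereas the ATG at index 13 has an open frame to the end of the string. An upstream ATG of the first kind could initiate only a very short peptide.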

Visual identification and confirmation by the TSSW program² located two possible promoters with TATA boxes at -475 and -79 nt. TSSW tries to predict promoters by weighing together the likelihood of a large number of transcription factor binding sites (28) using a modification of the method of Prestridge et al. (29). No other promoters were predicted within the entire Lgals6 sequence. The location of the suggested promoter at -79 nt is consistent with a transcription initiation site at about -50 nt and a translation initiation site at the ATG at nt 1-3. The location of the promoter at -475 nt predicts transcription initiation at about -450 nt but, as mentioned above, translation initiation at nt 1 is most likely the case here as well.

To identify the major transcription initiation site(s), we performed a primer extension experiment. With an antisense primer (mG6Q, Table I) hybridizing with sequence between nt 62 and 32 downstream of the putative translational start codon, a 113-nt primer extension product was generated (Fig. 6), which would correspond to a transcription start site at nt -51. No longer products were detected. In control experiments, the size of the longest primer extension products using an actin-specific primer and a GAPDH-specific primer agreed with the reported transcriptional initiation sites (Fig. 6). Moreover, the predicted transcriptional start site for galectin-6 is 24 nt downstream of the TATA box at nt -79, and conforms well with the consensus transcriptional initiation site (30).
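The position of the transcription start follows from the primer position and the product length once the numbering convention is fixed. The sketch below assumes that the numbering has no position 0 (the A of the ATG is nt 1 and the nucleotide before it is nt -1); this assumption is consistent with the figures quoted above, since 62 downstream positions plus 51 upstream positions give 113 nt. The function names are hypothetical.

```python
def to_index(position):
    """Convert a no-zero coordinate (..., -2, -1, 1, 2, ...) to a contiguous index."""
    if position == 0:
        raise ValueError("position 0 does not exist in this numbering")
    return position if position < 0 else position - 1


def from_index(index):
    """Inverse of to_index."""
    return index if index < 0 else index + 1


def extension_start(primer_5prime_position, product_length):
    """Template position of the 3' end of a primer extension product.

    The product spans product_length nucleotides, the last of which (on the
    mRNA sense strand) is the position matched by the 5' end of the
    antisense primer.
    """
    end_index = to_index(primer_5prime_position)
    start_index = end_index - (product_length - 1)
    return from_index(start_index)


if __name__ == "__main__":
    print(extension_start(62, 113))  # -51
```

A product of exactly 62 nt would end at nt 1, the A of the start codon, which is a convenient sanity check on the convention.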


Fig. 6.   Primer extension analysis. Mouse colon RNA was reverse transcribed with an antisense primer specific for actin mRNA (lane a), galectin-6 mRNA (lane b), or GAPDH mRNA (lane c). The cDNAs produced were electrophoresed along with the molecular weight markers (HinfI-digested φX174, labeled with ³²P; sizes indicated to the right). The arrowhead indicates the major galectin-6 cDNA formed (113 nt). A schematic is shown at the top. The distance between the 5' end of the antisense primers and the ATG is shown to the right, and the length of the 5' untranslated RNA is shown to the left as reported for β-actin (48) and GAPDH (49), and deduced here for galectin-6.

Some of the primer extension product in lane b of Fig. 6 may be due to galectin-4 because two recently reported mouse galectin-4 expressed sequence tags (GenBank™ accession numbers AA265412 and AA499921) suggest that mouse galectin-4 is almost identical to galectin-6 between about -50 nt and 62 nt. However, even if this were the case, the 113-nt product must also derive from galectin-6 since no other primer extension product was found. Moreover, the amounts of galectin-4 and galectin-6 mRNAs are of the same order of magnitude (3), and therefore both would be detected in this experiment.

In conclusion, the main transcription start site for galectin-6 mRNA in normal adult colon is probably at -51 nt. Since the distal putative promoter (at -475 nt) lies within a 29-bp direct repeat of the sequence of the confirmed proximal promoter, it is reasonable that it would be active as well, perhaps under other physiological conditions and other parts of the intestine.

The translational initiation site in the transcript from the proximal promoter is predicted by the rules of Kozak (31) to be the ATG at nt 1-3 since this is the first ATG and it is also in a favorable context. As with all other known galectins, we found no evidence for a signal sequence or transmembrane sequence in the galectin-6 gene. This indicates that galectin-6, like other galectins, is expressed mainly as a soluble cytosolic protein, but may be secreted by non-classical mechanisms (2).

Upstream Regulatory Elements-- In the accompanying report (3), we provide extensive evidence that expression of galectin-6 is limited to the gastrointestinal tract. We therefore searched the upstream region for regulatory elements that are involved in tissue-specific expression of other intestinally expressed genes. We found a sequence spanning bp -367 through bp -354 (indicated by + signs in Fig. 3) that is 72% identical to part of a 19-bp sequence within the apolipoprotein B upstream region that has been implicated in intestine-specific expression of this protein (32). This element is a strongly positive inducer of expression together with other sequences, and can also by itself confer expression of a reporter gene in the intestinal cell line Caco-2, as well as in the hepatoma HepG2. Screening of the upstream region against a database of mammalian transcription factor binding sites using MatInspector (33)³ revealed a wide variety of well known possible regulatory elements. Notable among those are six E boxes (at bp -70, -295, -336, -382, -415, and -466, indicated by asterisks in Fig. 3). One resembled a MycMax binding site, whereas others resembled MyoD binding sites. Such E boxes have been implicated in the regulation of gene expression in proliferating and differentiating epithelial cells (see, e.g., Refs. 34 and 35), but also in the expression of other genes in other tissues. Although the upstream sequences of Lgals6 do not permit prediction of the regulation of galectin-6 expression without further experiments, these sequences are clearly different from upstream regions of the genes encoding galectin-1 and -2 (10, 11) or galectin-3 (12, 13).
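A search for E boxes of the kind described above can be approximated with a simple pattern match on the consensus CANNTG. The sketch below is not MatInspector, which scores sites against weight matrices; it only locates exact consensus matches on one strand, and the example sequence is a made-up toy string rather than Lgals6 upstream sequence. Positions are reported as negative coordinates, assuming the supplied sequence ends immediately before the A of the start codon.

```python
import re

# E box consensus CANNTG; the lookahead allows overlapping matches.
E_BOX = re.compile(r"(?=(CA[ACGT]{2}TG))")


def find_e_boxes(upstream):
    """Return (position, site) pairs for every CANNTG match.

    position is the coordinate of the first nucleotide of the site,
    counted so that the last nucleotide of `upstream` is -1.
    """
    seq = upstream.upper()
    length = len(seq)
    return [(m.start() - length, m.group(1)) for m in E_BOX.finditer(seq)]


if __name__ == "__main__":
    toy_upstream = "TTCACGTGAACAGCTGTT"  # hypothetical sequence for illustration
    for position, site in find_e_boxes(toy_upstream):
        print(position, site)
```

Because the consensus has two unconstrained positions, matches are expected fairly often by chance in any sequence, which is one reason such hits are best treated as candidates for experimental testing.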

In addition, it is clear that the regulatory elements governing the two promoters in Lgals6 differ, suggesting that they may respond to different environmental or developmental stimuli. It is noteworthy that the mouse Lgals3 gene encoding galectin-3 contains two promoters as well (12, 13), generating two different mRNAs encoding the same protein (36) but under different regulation (37).

Untranslated 3' Sequence-- The sequence 3' of the stop codon in Lgals6 is very similar to the 3'-untranslated sequence of rat galectin-4 (Ref. 16; see also Fig. 2 in the accompanying paper (3)) up to a consensus polyadenylation signal AATAAA 51 bp after the termination codon. Downstream of the polyadenylation signal there is a (GT)₂₆ dinucleotide repeat. Besides sometimes being useful as polymorphic markers, such GT repeats have been implicated in message processing (38). GT repeats also may form Z-DNA (39), which binds specific proteins (40) and may modify nucleosome structure (41), thereby affecting transcription.

Introns-- When the Lgals6 sequence was plotted against itself in a dot matrix plot,⁴ several repetitive sequences were revealed.
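The principle of a self-comparison dot matrix can be illustrated in a few lines: every pair of positions whose following windows are identical gives a dot, and a tandem duplication shows up as a run of dots on a diagonal offset from the main diagonal by the repeat length. The sketch below uses exact window matches only and a made-up toy sequence; the program actually used for the analysis is not reproduced here, and the window size is an arbitrary choice.

```python
def self_dot_plot(seq, window=4):
    """Return sorted (i, j) pairs, i < j, where seq[i:i+window] == seq[j:j+window].

    The main diagonal (i == j) is omitted because it matches trivially.
    """
    seq = seq.upper()
    starts_by_word = {}
    for i in range(len(seq) - window + 1):
        starts_by_word.setdefault(seq[i:i + window], []).append(i)

    dots = []
    for starts in starts_by_word.values():
        for a in starts:
            for b in starts:
                if a < b:
                    dots.append((a, b))
    return sorted(dots)


if __name__ == "__main__":
    toy = "ACGTTGCA" * 2  # hypothetical 8-nt unit, duplicated in tandem
    for i, j in self_dot_plot(toy):
        print(i, j, "offset", j - i)
```

For the toy sequence every dot has the same offset of 8, the length of the duplicated unit, which is the signature of a tandem repeat in such a plot.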

The last 100 bp of intron 2 consist of an almost perfect 50-bp tandem duplication (Figs. 3 and 7, top). The sequence of this repeat did not resemble any other known repeated sequence. It ends at the splice acceptor site and encodes an open reading frame, which, however, is out of frame with exon 3.


Fig. 7.   Repeated sequences in intron 2 and intron 6. For intron 2 the first copy is shown at the top, and for intron 6 a consensus is shown at the top. Below are shown the repeated sequence(s) with identical residues indicated by a dot, gaps by a dash, and indeterminate nucleotides by an X. For intron 6, the numbers along the left refer to repeat number (1-5 adjacent to exon 6 and n-3 to n adjacent to exon 7), and (//) indicates the part of the intron that was not sequenced.

All the known sequence of intron 6, except for the first 3 nt and last 40 nt, consists of a 30-nt repeating sequence (Fig. 7, bottom). This repeating sequence has not been reported before, but it resembles a mouse mini-satellite DNA (42).

Intron 7 contains seven repeats of the pentanucleotide ACCTC. The ACCTC sequence occurs as six tandem repeats in the opposite orientation in intron 3 of the mouse NCAM gene (43), but the significance is unknown. The remainder of intron 7 3' of the pentanucleotide repeat also contains repetitive sequence consisting of about 80% C and 20% T on the sense strand. This region was remarkably refractory to sequencing by the standard protocols. We were able to read this sequence only when we used the protocol described by McCrea et al. (20), which employs a tailing chase to dilute prematurely terminated chains.
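Short tandem repeats such as the ACCTC pentanucleotide in intron 7 or the GT dinucleotide run in the 3' region can be located and counted with a regular expression. The sketch below finds only perfect, uninterrupted copies on the given strand; the function name and the example sequences are hypothetical and are not Lgals6 sequence.

```python
import re


def tandem_runs(seq, unit, min_copies=2):
    """Return (start_index, copy_number) for each perfect tandem run of `unit`."""
    pattern = re.compile(r"(?:%s){%d,}" % (re.escape(unit.upper()), min_copies))
    return [(m.start(), len(m.group(0)) // len(unit))
            for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    # Toy sequences built to mimic the repeat counts quoted in the text.
    print(tandem_runs("GG" + "ACCTC" * 7 + "TT", "ACCTC"))  # [(2, 7)]
    print(tandem_runs("A" + "GT" * 26 + "C", "GT"))         # [(1, 26)]
```

Imperfect repeats, such as the 30-nt unit of intron 6 with its variant copies, would need an approximate-matching approach instead.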

Two Distinct Genes Encoding Galectin-4 and Galectin-6-- Although galectin-4 and galectin-6 are very similar, the distribution of differences along the whole coding sequence suggests that they are encoded by separate genes rather than being alleles or products of alternative splicing. We confirmed this by isolating a fragment of the galectin-4 gene by PCR from the genomic DNA of the same homozygous mouse strain, 129/SV, from which we isolated the galectin-6 gene. The coding sequence of the galectin-4 gene fragment was identical to the overlapping parts of the galectin-4 cDNA clones (3), and showed the expected differences from galectin-6 coding sequence (Fig. 8). Surprisingly, some intronic sequence is also remarkably similar between the two genes, suggesting that Lgals6 and Lgals4 must have diverged relatively recently.


Fig. 8.   Comparison of a fragment of Lgals4 with Lgals6. The relevant part of the Lgals6 sequence (pLgals6-1) is aligned with a cloned fragment of Lgals4 isolated by PCR (pLgals4-1). Residues identical to the corresponding position in Lgals6 are indicated by a dot, and gaps by a dash. The bottom sequence is the overlapping part of a galectin-4 cDNA clone (pmG4-2), demonstrating that pLgals4-1 is indeed a fragment of a gene encoding galectin-4 and not galectin-6. (//) indicates the parts of intron 2 that were not sequenced.

Further proof of the existence of two separate genes is provided by genomic Southern blots. When EcoRI-digested mouse DNA was hybridized with an upstream Lgals6-specific probe (the insert of pLgals6-1c, Fig. 1), one band was observed (Fig. 9, lanes a-c), whereas with a rat galectin-4 cDNA probe that recognizes both genes, two bands were observed (lanes g-i). Since there are no EcoRI sites within the Lgals6 gene (Fig. 2), the second cDNA-detected band must correspond to Lgals4. Again, for HindIII-digested mouse DNA, only one band is detected by the Lgals6-specific probe (Fig. 9, lanes d-f), whereas additional stronger bands are detected by the rat galectin-4 cDNA probe (lanes j-l). These data can only be explained by the presence of two genes that are highly homologous.
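
The reasoning behind the band counts can be made concrete with a toy digest in Python. This is a sketch under simplifying assumptions: a linear template, complete digestion, and a probe described only by its coordinates. The sequence `toy` and the helper names are invented; only the recognition sites (EcoRI, G^AATTC; HindIII, A^AGCTT) are real. A probe that lies entirely between two cut sites detects a single fragment per locus, so a second band implies a second locus.

```python
# Recognition sequence and top-strand cut offset for each enzyme.
SITES = {"EcoRI": ("GAATTC", 1), "HindIII": ("AAGCTT", 1)}

def cut_positions(seq: str, enzyme: str) -> list:
    """Top-strand cut coordinates of `enzyme` in a linear sequence."""
    site, offset = SITES[enzyme]
    cuts, pos = [], seq.find(site)
    while pos != -1:
        cuts.append(pos + offset)
        pos = seq.find(site, pos + 1)
    return cuts

def fragments(seq: str, enzyme: str) -> list:
    """(start, end) coordinates of fragments after complete digestion."""
    bounds = [0] + cut_positions(seq, enzyme) + [len(seq)]
    return list(zip(bounds[:-1], bounds[1:]))

def bands_detected(seq: str, enzyme: str, probe_start: int, probe_end: int) -> list:
    """Fragments overlapping the probe interval [probe_start, probe_end)."""
    return [(s, e) for s, e in fragments(seq, enzyme)
            if s < probe_end and e > probe_start]

# Invented template with one EcoRI site (at 20) and one HindIII site (at 56).
toy = "T" * 20 + "GAATTC" + "C" * 30 + "AAGCTT" + "G" * 14

print(bands_detected(toy, "EcoRI", 30, 50))  # probe between sites: one band
print(bands_detected(toy, "EcoRI", 10, 40))  # probe spanning a site: two bands
```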


Fig. 9.   Southern blot of mouse genomic DNA probed with galectin-6- and galectin-4/6-specific probes. Experimental components are given above the lanes. Mouse genomic DNA from strain C57BL/6J (B), M. spretus (S), or the F1 hybrid (F), digested with EcoRI or HindIII (indicated as Hind), was probed with an upstream fragment of the galectin-6 gene that is specific for the galectin-6 gene, or rat galectin-4 cDNA (prG4) that hybridizes to both the galectin-4 and galectin-6 genes. The EcoRI fragments were about 8 kb and 5.0 kb. The top HindIII band hybridizing with pLgals6-1c was above 23 kb, and the other HindIII fragments were about 4.9 kb and 3.2 kb.

Chromosomal Localization of Genes Encoding Galectin-4 and Galectin-6-- The chromosomal location of Lgals6 was mapped by linkage analysis of RFLPs in an interspecific backcross between M. spretus and C57BL/6J (22). The Lgals6-specific upstream probe detects one unique band in EcoRI- and HindIII-digested DNA from either parent or F1 hybrids (Fig. 9). An RFLP found for the restriction enzyme MspI (not shown) was used for mapping. A Southern blot of MspI-digested DNA from 66 offspring of backcrosses of the F1 with the C57BL/6J parental strain produced a pattern that was most coincident with several markers on chromosome 7. The frequency of differences was used to calculate distances from Lgals6 to these markers (Fig. 10).


Fig. 10.   Lgals6 and Lgals4 are clustered on proximal mouse chromosome 7. The figure shows a representation of mouse chromosome 7, with the centromere at the top, indicating locations of genetic markers typed for this study. The markers were located by analysis of an interspecific backcross ((C57BL/6J × Mus spretus) F1 × C57BL/6J). The ratios of the number of recombinants/the total number of informative mice, and the recombination frequencies ± standard errors (in centimorgans) for each pair of loci are indicated to the left of the chromosome. For pairs of loci that cosegregate, the upper 95% confidence interval is shown in parentheses. Ucla markers were reported by Warden et al. (22) or are unpublished data. References for other linked loci can be obtained from the Mouse Genome Database (Mouse Genome Informatics Project, The Jackson Laboratory, Bar Harbor, ME; available via the World Wide Web (URL: http://www.informatics.jax.org)).
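
The quantities described in this legend can be computed as in the following Python sketch. The formulas used to produce Fig. 10 are not stated in the text, so the standard binomial estimates shown here are an assumption. The count of 2 recombinants is hypothetical; only the number of backcross offspring (66) comes from the mapping experiment. For short intervals the recombination fraction is taken directly as map distance, with no mapping function applied.

```python
import math

def recombination_cM(recombinants: int, informative: int):
    """Recombination frequency and its binomial standard error, both in cM."""
    r = recombinants / informative
    se = math.sqrt(r * (1 - r) / informative)
    return 100 * r, 100 * se

def upper_95_cM(informative: int) -> float:
    """Upper one-sided 95% limit (in cM) when no recombinants are seen among
    n informative mice, obtained by solving (1 - r)**n = 0.05 for r."""
    return 100 * (1 - 0.05 ** (1 / informative))

# Hypothetical counts: 2 recombinants among 66 informative backcross mice.
distance, error = recombination_cM(2, 66)
print(f"{distance:.1f} +/- {error:.1f} cM")

# Two loci that cosegregate in all 66 mice.
print(f"upper 95% limit: {upper_95_cM(66):.1f} cM")
```

With a panel of this size, even perfect cosegregation leaves an upper limit of a few centimorgans, so cosegregation shows that two loci are close but does not resolve their order.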

Since the galectin-4 probes we used also react with DNA encoding galectin-6, we achieved specific mapping of Lgals4 by analyzing a HindIII polymorphism that is detected with these probes but not with the Lgals6-specific probe (Fig. 9, lanes j-l), and therefore is uniquely associated with Lgals4. Lgals4 mapped to the same region on chromosome 7 as Lgals6. Such close linkage was previously found for the human LGALS1 and LGALS2 genes (23) encoding galectin-1 and galectin-2, respectively, and certain C-type lectins (44). The mapped genes in this region on mouse chromosome 7 are syntenic with the q13.1-13.3 region of human chromosome 19, suggesting that the human homolog(s) are likely to be found there. Interestingly, the genes encoding galectin-7 (45) and galectin-10 (the Charcot-Leyden crystal protein) (46) also map to human chromosome 19. A summary of the galectin family genes that have been mapped is presented in Table II.

                              
Table II
Chromosomal location of mapped genes encoding galectins
References for genes are as follows: human LGALS1 and LGALS2 (23), murine Lgals1 (51), human LGALS3 (52), mouse Lgals4 and Lgals6 (this paper), rat Lgals5 (50), human LGALS7 (45), and the human galectin-10 (Charcot Leyden Crystal protein) gene (46).

    FOOTNOTES

* This work was supported by grants from the Cigarette and Tobacco Surtax Fund of the State of California through the Tobacco-Related Disease Research Program of the University of California (to H. L.) and by Grant HL38627 from the National Institutes of Health (to S. H. B.). The costs of publication of this article were defrayed in part by the payment of page charges. The article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

The nucleotide sequences reported in this paper have been submitted to the GenBank™/EMBL Data Bank with accession numbers AF026796, AF026797, AF026798, and AF026799.

§ Present address: United States Department of Agriculture, Agricultural Research Station, Western Regional Research Center-CIU, Albany, CA 94710.

** To whom correspondence should be addressed. Present address: Inst. of Medical Microbiology, Dept. Clinical Immunology, Sölvegatan 23, S 22362 LUND, Sweden. Tel.: 46-46-173274; Fax: 46-46-137468; E-mail: hakon.leffler@mmb.lu.se.

1 The abbreviations used are: CRD, carbohydrate recognition domain; PCR, polymerase chain reaction; RFLP, restriction fragment length polymorphism; bp, base pair(s); nt, nucleotide(s); kb, kilobase pair(s); GAPDH, glyceraldehyde-3-phosphate dehydrogenase.

2 This program is available via the World Wide Web (URL: http://dot.imgen.bcm.tmc.edu:9331/gene-finder/gf.html).

3 This program is available via the World Wide Web (URL: http://www.gsf.de/cgi-bin/matsearch.pl).

4 This program is available via the World Wide Web (URL: http://alces.med.umn.edu/rawdot.html).

    REFERENCES

  1. Barondes, S. H., Castronovo, V., Cooper, D. N. W., Cummings, R. D., Drickamer, K., Feizi, T., Gitt, M. A., Hirabayashi, J., Hughes, R. C., Kasai, K., Leffler, H., Liu, F.-T., Lotan, R., Mercurio, A. M., Monsigny, M., Pillai, S., Poirier, F., Raz, A., Rigby, P., Rini, J. M., and Wang, J. L. (1994) Cell 76, 597-598
  2. Barondes, S. H., Cooper, D. N. W., Gitt, M. A., and Leffler, H. (1994) J. Biol. Chem. 269, 20807-20810
  3. Gitt, M. A., Colnot, C., Poirier, F., Barondes, S. H., and Leffler, H. (1998) J. Biol. Chem. 273, 2954-2960
  4. Leffler, H. (1997) Trends Glycosci. Glycotechnol. 9, 9-19
  5. Hadari, Y. R., Paz, K., Dekel, R., Mestrovic, T., Accili, D., and Zick, Y. (1995) J. Biol. Chem. 270, 3447-3453
  6. Su, Z. Z., Lin, J., Shen, R., Fisher, P. E., Goldstein, N. I., and Fisher, P. B. (1996) Proc. Natl. Acad. Sci. U. S. A. 93, 7252-7257
  7. Tureci, O., Schmitt, H., Fadle, N., Pfreundschuh, M., and Sahin, U. (1997) J. Biol. Chem. 272, 6416-6422
  8. Wada, J., and Kanwar, Y. S. (1997) J. Biol. Chem. 272, 6078-6086
  9. Ohyama, Y., and Kasai, K. (1988) J. Biochem. (Tokyo) 104, 173-177
  10. Gitt, M. A., and Barondes, S. H. (1991) Biochemistry 30, 82-89
  11. Gitt, M. A., Massa, S., Leffler, H., and Barondes, S. H. (1992) J. Biol. Chem. 267, 10601-10606
  12. Gritzmacher, C. A., Mehl, V. S., and Liu, F.-T. (1992) Biochemistry 31, 9533-9538
  13. Rosenberg, I. M., Iyer, R., Cherayil, B., Chiodino, C., and Pillai, S. (1993) J. Biol. Chem. 268, 12393-12400
  14. Dyer, K. D., Handen, J. S., and Rosenberg, H. F. (1997) Genomics 40, 217-221
  15. Maniatis, T., Fritsch, E. F., and Sambrook, J. (1982) Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY
  16. Oda, Y., Herrmann, J., Gitt, M. A., Turck, C., Burlingame, A. L., Barondes, S. H., and Leffler, H. (1993) J. Biol. Chem. 268, 5929-5939
  17. Feinberg, A. P., and Vogelstein, B. (1984) Anal. Biochem. 137, 266-267
  18. Sanger, F., Nicklen, S., and Coulson, A. R. (1977) Proc. Natl. Acad. Sci. U. S. A. 74, 5463-5467
  19. Kraft, R., Tardiff, J., Krauter, K. S., and Leinwand, L. A. (1988) BioTechniques 6, 544-549
  20. McCrea, K. W., Marrs, C. F., and Gilsdorf, J. R. (1993) BioTechniques 15, 843-844
  21. Ausubel, F. M., Brent, R., Kingston, R. E., Moore, D. D., Seidman, J. G., Smith, J. A., and Struhl, K. (eds) (1992) Current Protocols in Molecular Biology, Suppl. 20, Greene Publishing Associates/John Wiley and Sons, Inc., New York
  22. Warden, C. H., Mehrabian, M. M., He, K.-Y., Yoon, M.-Y., Diep, A., Xia, Y.-R., Svenson, K. L., Sparkes, R. S., and Lusis, A. J. (1993) Genomics 18, 295-307
  23. Mehrabian, M., Gitt, M. A., Sparkes, R. S., Leffler, H., Barondes, S. H., and Lusis, A. J. (1993) Genomics 15, 418-420
  24. Liao, D. I., Kapadia, G., Ahmed, H., Vasta, G. R., and Herzberg, O. (1994) Proc. Natl. Acad. Sci. U. S. A. 91, 1428-1432
  25. Lobsanov, Y. D., Gitt, M. A., Leffler, H., Barondes, S. H., and Rini, J. M. (1993) J. Biol. Chem. 268, 27034-27038
  26. Leonidas, D. D., Elbert, B. L., Zhou, Z., Leffler, H., Ackerman, S. J., and Acharya, K. R. (1995) Structure 3, 1379-1393
  27. Solovyev, V. V., Salamov, A. A., and Lawrence, C. B. (1995) in Proceedings of the Third International Conference on Intelligent Systems for Molecular Biology (Rawling, C., Clark, D., Altman, R., Hunter, L., Lengauer, T., and Wodak, S., eds), pp. 367-375, AAAI Press, Cambridge, United Kingdom
  28. Wingender, E. (1994) J. Biotechnol. 35, 273-280
  29. Prestridge, D. S. (1995) J. Mol. Biol. 249, 923-932
  30. Breathnach, R., and Chambon, P. (1981) Annu. Rev. Biochem. 50, 349-383
  31. Kozak, M. (1991) J. Cell Biol. 115, 887-903
  32. Carlsson, P., and Bjursell, G. (1989) Gene (Amst.) 77, 113-121
  33. Quandt, K., Frech, K., Karas, H., Wingender, E., and Werner, T. (1995) Nucleic Acids Res. 23, 4878-4884
  34. Hurlin, P. J., Foley, K. P., Ayer, D. E., Eisenman, R. N., Hanahan, D., and Arbeit, J. M. (1995) Oncogene 11, 2487-2501
  35. Wheeler, M. B., Nishitani, J., Buchan, A. M. J., Kopin, A. S., Chey, W. Y., Chang, T.-M., and Leiter, A. B. (1992) Mol. Cell. Biol. 12, 3531-3539
  36. Cherayil, B. J., Weiner, S. J., and Pillai, S. (1989) J. Exp. Med. 170, 1959-1972
  37. Voss, P. G., Tsay, Y. G., and Wang, J. L. (1994) Glycoconjugate J. 11, 353-362
  38. McLauchlan, J., Gaffney, D., Whitton, J. L., and Clements, J. B. (1985) Nucleic Acids Res. 13, 1347-1368
  39. Taboury, J. A., and Taillandier, E. (1985) Nucleic Acids Res. 13, 4469-4483
  40. Leith, I. R., Hay, R. T., and Russell, W. C. (1988) Nucleic Acids Res. 16, 8277-8289
  41. Casanovas, J. M., and Azorin, F. (1987) Nucleic Acids Res. 15, 8899-8918
  42. Beridze, T. (1986) Satellite DNA, Springer-Verlag, Berlin
  43. Barbas, J. A., Chaix, J.-C., Steinmetz, M., and Goridis, C. (1988) EMBO J. 7, 625-632
  44. Watson, M. L., Kingsmore, S. F., Johnston, G. I., Siegelman, M. H., Lebeau, M. M., Lemons, R. S., Bora, N. S., Howard, T. A., Weissman, I. L., McEver, R. P., and Seldin, M. F. (1990) J. Exp. Med. 172, 263-272
  45. Madsen, P., Rasmussen, H. H., Flint, T., Gromov, P., Kruse, T. A., Honore, B., Vorum, H., and Celis, J. E. (1995) J. Biol. Chem. 270, 5823-5829
  46. Mastrianni, D. M., Eddy, R. L., Rosenberg, H. F., Corrette, S. E., Shows, T. B., Tenen, D. G., and Ackerman, S. J. (1992) Genomics 13, 240-242
  47. Chiariotti, L., Wells, V., Bruni, C. B., and Mallucci, L. (1991) Biochim. Biophys. Acta 1089, 54-60
  48. Nudel, U., Zakut, R., Shani, M., Neuman, S., Levy, Z., and Yaffe, D. (1983) Nucleic Acids Res. 11, 1759-1771
  49. Fort, P., Marty, L., Piechaczyk, M., el Sabrouty, S., Dani, C., Jeanteur, P., and Blanchard, J. M. (1985) Nucleic Acids Res. 13, 1431-1442
  50. Gitt, M. A., Wiser, M. F., Leffler, H., Herrmann, J., Xia, Y.-R., Massa, S. M., Cooper, D. N. W., Lusis, A. J., and Barondes, S. H. (1995) J. Biol. Chem. 270, 5032-5038
  51. Baldini, A., Gress, T., Patel, K., Muresu, R., Chiariotti, L., Williamson, P., Boyd, Y., Casciano, I., Wells, V., Bruni, C. B., Mallucci, L., and Siniscalco, M. (1993) Genomics 15, 216-218
  52. Raimond, J., Zimonjic, D. B., Mignon, C., Mattei, M., Popescu, N. C., Monsigny, M., and Legrand, A. (1997) Mamm. Genome 8, 706-707


Copyright © 1998 by The American Society for Biochemistry and Molecular Biology, Inc.