(Received for publication, October 21, 1994; and in revised form, December 22, 1994)
From the
A monomeric rat β-galactoside-binding lectin previously
purified from extracts of rat lung has been localized to erythrocytes,
and the cDNA encoding it has been isolated from a rat reticulocyte cDNA
library. The deduced amino acid sequence of the cDNA predicts a protein
with an M_r of 16,199, with no evidence of a signal
peptide. The deduced sequence is identical to the sequences of seven
proteolytic peptides derived from the purified lectin. Peptide analysis
by mass spectrometry indicates that the N-terminal methionine is
cleaved and that serine 2 is acetylated. The lectin shares all the
strictly conserved amino acid residues of other members of the
mammalian galectin family and is designated galectin-5 (GenBank
accession number L36862). Galectin-5 is a weak agglutinin of rat
erythrocytes, despite its monomeric structure. The gene encoding
galectin-5 (LGALS5) has been mapped in mouse to chromosome 11,
50 centimorgans from the centromere and 1.8 ± 1.8
centimorgans from the polymorphic marker D11Mit34n, a region
syntenic with human chromosome 17q11.
Galectins (Barondes et al., 1994a, 1994b) are a family
of animal lectins formerly known as S-type or S-Lac lectins. Members of
the galectin family are defined by two properties: shared
characteristic amino acid sequences and affinity for
β-galactoside-containing glycoconjugates. Galectins are found in
many animal species, ranging from mammals to nematodes and sponges.
Mammalian galectins have been most extensively studied, and four
(galectin-1, -2, -3, and -4) have been well characterized based on
isolation of their cDNAs (reviewed by Barondes et al. (1994b)).
In previous studies, we identified yet another
β-galactoside-binding lectin in extracts of rat lung. This putative
galectin had an apparent subunit molecular weight on SDS-polyacrylamide
gel electrophoresis of 18,000 and was called RL-18 (Cerra et
al., 1985). Like other galectins, it was purified by binding to a
β-galactoside-derivatized affinity column and eluting with lactose.
Its carbohydrate binding properties resemble those of other galectins,
but some significant differences in its specificity were observed
(Leffler and Barondes, 1986).
In subsequent work (Leffler et al., 1989), we found RL-18 in other rat tissues, but when we perfused tissues to remove blood before homogenization, the lectin was no longer present in the extracts. This suggested that RL-18 was a component of blood, which is consistent with the observation by Whitney (1988) that rat erythrocytes contained a lectin that appeared to be RL-18.
To further characterize RL-18, we purified it from rat
lung extracts, prepared peptide fragments, and determined their
sequence. To our surprise, the peptide sequences matched the deduced
amino acid sequence of an incomplete cDNA presumed to have been derived
from malaria parasites (GenBank accession number L21711).
Since the library also contained transcripts derived from the rat
reticulocytes that the parasites had infected (van Belkum et
al., 1990), it seemed likely that the actual source of the
matching cDNA was the rat cells rather than the malaria cells. This
inference was confirmed by isolation of cDNAs with identical sequence
from a rat reticulocyte cDNA library. Here we report the structure of
the cDNA that encodes this rat lectin and its deduced amino acid
sequence. Since this protein shares certain absolutely conserved amino
acid residues with other galectins and fulfills the β-galactoside
binding requirement, we designate it as galectin-5. We also determined
the chromosomal location of the mouse gene encoding the homolog of rat
galectin-5.
Figure 2: Galectin-5 cDNA sequence. The open reading frame has the translated amino acid sequence above it. Arrows under the nucleotide sequence correspond to oligonucleotides, with forward and backward arrows representing the sense and antisense directions, respectively. The putative polyadenylation signal at the 3'-end of the cDNA sequence is underlined. Numbers refer to the amino acid residue or nucleotide at the beginning of each line.
Cerra et al. (1985) reported the isolation of the
β-galactoside-binding lectin originally called RL-18 from rat lung
extracts by affinity chromatography followed by ion-exchange
chromatography. To characterize this lectin, we digested a sample with
either trypsin or clostripain, fractionated the peptides, and
determined their mass and amino acid sequences (Table 1).
We searched GenBank for cDNAs encoding galectin-like
sequences and found one (GenBank accession number L21711)
that encoded peptides identical to those isolated from RL-18. This cDNA
had been isolated from an expression library constructed from
malaria-infected rat reticulocytes and probed with monoclonal
antibodies. Out of 12 monoclonal antibodies reacting with different
malarial proteins, 11 reacted weakly but specifically with clones
containing this cDNA. Since the antibodies are each directed against
different proteins, this indicated that the reaction between the
antibodies and a protein in the plaques was probably not a specific
antigen-antibody reaction and raised the possibility that the protein
in the plaques was reacting with a common feature of the monoclonal
antibodies. Since we now have shown that the recombinant protein is a
lectin, the binding of the recombinant protein to the antibodies is
assumed to have been by association of the lectin's
carbohydrate-binding site with complementary carbohydrate chains of the
immunoglobulins.
Given the match of the RL-18 peptides with the cDNA
clone, it seemed very likely that the cDNA had been derived from rat
reticulocyte mRNA rather than malaria parasite mRNA. We verified this
by amplifying two overlapping cDNA fragments (Le1h and Le2b) (Fig. 1) out of a rat reticulocyte library that spanned the
entire sequence of the initial cDNA (Pb46). The sequences of the two
reticulocyte isolates were identical to the relevant portions of the
Pb46 cDNA. However, we discovered a small region in the 3'-portion of
the coding region that was not reported in the original GenBank submission because the original subcloning strategy had missed
the presence of a second EcoRV site 27 base pairs downstream
from the first EcoRV site. The complete sequence of the cDNA (Fig. 2) is now stored in GenBank
(accession number L36862).
Figure 1: Sequencing strategy. The galectin-5 cDNA is represented by a bar. The two EcoRV restriction sites used in subcloning are indicated. Sequences obtained from the P. berghei cDNA library isolate (Pb46) and two subclones (Pb46RV and Pb65RV) are presented as arrows under the appropriate clones. Similarly, sequences obtained from the two PCR-amplified clones from the rat reticulocyte cDNA library (Le1h and Le2b) are represented below. bp, base pairs.
The conclusion that the isolated cDNA contains the coding sequence for RL-18 was confirmed by comparing the deduced protein sequence (Fig. 2) with data about peptides derived from RL-18 (Table 1). The seven peptides whose sequences were determined match residues 29-92 and 123-145 deduced from the cDNAs exactly, as numbered in Table 1. In addition, the mass of peptide 1 (Table 1) matches that for the expected N-terminal clostripain fragment, assuming that Met-1 has been cleaved and Ser-2 has been acetylated. These are common post-translational modifications of the N terminus of cytoplasmic proteins, including all galectins that have been analyzed. This strongly supports our identification of the putative initiator methionine as residue 1 and our conclusion that the cDNA contains the full-length coding sequence. Furthermore, the presence of a consensus polyadenylation signal (AATAAA) at nucleotide 821 (with the initiator codon as residue 1) suggests that this cDNA contains most of the 3'-untranslated region.
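As an illustration of how a polyadenylation signal can be located and numbered relative to the initiator codon, the following is a minimal sketch. The function name `find_polya_signals` and the short `toy_cdna` string are hypothetical and are not taken from the galectin-5 sequence; the only convention borrowed from the text is that the first nucleotide of the initiator ATG is counted as position 1.

```python
def find_polya_signals(cdna, start_codon_index=0, motif="AATAAA"):
    """Return 1-based positions of `motif` in `cdna`.

    Positions are numbered so that the nucleotide at `start_codon_index`
    (0-based; assumed to be the A of the initiator ATG) is position 1.
    """
    seq = cdna.upper()
    positions = []
    i = seq.find(motif)
    while i != -1:
        positions.append(i - start_codon_index + 1)
        # Step forward by one so overlapping occurrences are also reported.
        i = seq.find(motif, i + 1)
    return positions


# Hypothetical toy sequence, not the galectin-5 cDNA.
toy_cdna = "GGCATGTCTGCATAAGGTTAATAAACCTTT"
atg_index = toy_cdna.find("ATG")
print(find_polya_signals(toy_cdna, atg_index))
```

Applied to the full cDNA of Fig. 2 with the same numbering convention, a scan of this kind would be expected to report the signal described above, although that has not been checked here.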
Compared with other galectin genes, galectin-5 cDNA has a long 3'-untranslated region (400 base pairs). Stretches of 150 to over 350 base pairs of this 3'-region are >50% identical to the 3'-tail regions of several other rodent and human cDNAs, including those encoding myeloperoxidase, microtubule-associated proteins, and a glucose transporter. Although the significance of these conserved sequences is unknown, there is evidence that 3'-tail regions play a role in the regulation of translation (Jackson and Standart, 1990).
The protein
sequence deduced from the cDNAs has many similarities to those of other
galectins (Fig. 3). In fact, this protein shares all the
absolutely conserved residues found in other members of the galectin
family (designated by asterisks in Fig. 3). Since the
protein meets both criteria for membership in the galectin family
(β-galactoside binding (Cerra et al., 1985; Leffler and
Barondes, 1986; this paper) and conservation of certain characteristic
amino acid residues), we designate it as galectin-5. As with the other
galectins, we found no evidence in the cDNA sequence for a signal
peptide.
Figure 3: Sequence comparison of galectin-1 to -5. All the sequences are from rat (galectin-1, Clerch et al. (1988); galectin-3, Albrandt et al. (1987); galectin-4, Oda et al. (1993)), except galectin-2, which is human (Gitt et al., 1992). Only the C-terminal partial sequences of galectin-4 (residues 180-324) and galectin-3 (residues 114-262) are shown. Shaded residues are identical to the corresponding galectin-5 residue. Dashes represent gaps introduced to aid in alignment. The dashed underline demarcates the exon that contains the majority of the conserved residues (Gitt and Barondes, 1991; Barondes et al., 1994b) that have been shown to be involved in saccharide binding (Lobsanov et al., 1993). Asterisks indicate residues that are conserved in all known galectin sequences (Barondes et al., 1994b). Residue numbers of the last residue on the line are given at the right.
The isolation of a cDNA encoding galectin-5 from a
reticulocyte library suggested that this lectin is a constituent of
erythroblasts and erythrocytes. To evaluate this further, we prepared
rat erythrocytes by separating them from plasma and leucocytes and then
applied an extract of the erythrocytes to a lactosyl-Sepharose column
and eluted with lactose to obtain galactoside-binding proteins. The
eluate from the affinity column showed one band when examined by
SDS-polyacrylamide gel electrophoresis (Fig. 4). This band had a
mobility identical to that of RL-18, which was previously assigned an M_r of 18,000 based on comparison with commercial
molecular weight markers (Cerra et al., 1985). When the band was compared
with other galectin carbohydrate-binding domains used as markers, it had
a calculated M_r of 16,200 (Fig. 4). This
mobility is consistent with the calculated M_r of
16,108 for galectin-5 from its deduced amino acid sequence (assuming
cleavage of Met-1 and acetylation of Ser-2). Furthermore, the
galectin-5 band from rat erythrocytes reacted strongly with antiserum
that had been raised against RL-18 purified from rat lung (Fig. 4). The yield of galectin-5 was 0.6 mg/2 g of protein in
the initial extract applied to the affinity column.
Figure 4: Gel electrophoresis and Western blot of galectin-5 purified from rat lung and rat erythrocytes. Purified galactoside-binding lectins from either rat lung (lane 1) or rat erythrocytes (lanes 2 and 3) were analyzed on a 20% gel and visualized with silver staining (lanes 1 and 2) or by probing with anti-RL-18 after Western blotting (lane 3). Molecular mass markers (indicated by arrows to the left) used were recombinant human galectin-3 (26.2 kDa) and its C-terminal collagenase fragment (16.0 kDa) (Massa et al., 1993) and recombinant domain I of rat galectin-4 (17.0 kDa) (Oda et al., 1993).
On gel
filtration, galectin-5 eluted with an estimated M_r of 17,000. It thus behaves as a monomer under the nondenaturing
conditions employed here, in contrast to the dimeric galectin-1 and -2
(Gitt et al., 1992). A schematic summarizing the domain and
subunit structures of the known members of the galectin family is shown
in Fig. 5.
Figure 5: Schematic of domain and quaternary structures of galectin-1 to -5. Carbohydrate-binding domains are represented by black bars above and by sectors below. The repetitive domain of galectin-3 and homologous regions in galectin-4 and -5 are white, and the N-terminal domain of galectin-3 is striped.
To our surprise, despite its monomeric state, galectin-5 at a concentration of 300 µg/ml agglutinated rat erythrocytes. Partial agglutination was observed at 150 µg/ml, while no agglutination was observed at 30 µg/ml. The agglutination by 300 µg/ml galectin-5 was completely abolished in the presence of 30 mM lactose. In contrast, complete agglutination of rat erythrocytes by galectin-1 and -3 occurred at 100 µg/ml, and partial agglutination occurred at 10 µg/ml. Hence, galectin-5 acts as an agglutinin of rat erythrocytes, but is weaker than galectin-1 and -3 in this system.
To test whether the previous isolation of galectin-5 from lung (Cerra et al., 1985) was due to the presence of blood in the lung tissue, we analyzed galectins from lung that had been extensively perfused with saline to remove blood (data not shown). Only traces of galectin-5 were detected in the perfused lung, whereas galectin-1 and -3 were present as prominent components of lung tissue. Therefore, galectin-5 is present in lung as a component of blood.
To map the chromosomal location of the mouse homolog, we
first analyzed a Southern blot of restricted genomic DNA isolated from
two widely different inbred strains (C57BL/6J and M. spretus)
and the F1 hybrid produced from a cross of these strains. The probe
hybridized to only one band in both XbaI- and EcoRI-digested DNAs, supporting the existence of a unique gene
encoding the mouse homolog (Fig. 6). Several restriction enzymes
produced restriction fragment length polymorphisms, including TaqI, which yielded a 3.2-kilobase pair band and a
9.2-kilobase pair band, specific to C57BL/6J and M. spretus,
respectively. This restriction fragment length polymorphism was mapped
in TaqI-restricted genomic DNA isolated from progeny of a
backcross of the F1 hybrid described above and the C57BL/6J parental
strain as described under ``Materials and Methods.'' The
3.2-kilobase pair TaqI band exhibited linkage to three already
mapped polymorphic markers on chromosome 11 in the region 50
centimorgans from the centromere (Fig. 7 and Tables 2 and 3). The closest marker appears to be D11Mit34n,
only 1.8 ± 1.8 centimorgans away from LGALS5.
Neighboring genes in this region of the chromosome include tipsy (a locomotion defect (Searle, 1961)), Edp1 (an
endothelial cell protein (Buckwalter et al., 1991)), Tcf2 (a T cell transcription factor (Karolyi et al., 1992)), Idd4 (insulin-dependent diabetes susceptibility (Todd et
al., 1991)), and Glut4 (an insulin-responsive glucose
transporter (Hogan et al., 1991)). LGALS5 also occurs
near a neurofibromatosis gene, Nf-1 (Seizinger, 1987), just as LGALS1 and LGALS2 occur near Nf-2 (Mehrabian et al., 1993).
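A map distance reported with an uncertainty, such as the 1.8 ± 1.8 centimorgans above, is commonly obtained in a backcross as the recombination fraction with its binomial standard error. The sketch below shows that calculation under the assumption that, for closely linked loci, the recombination fraction in percent approximates the distance in centimorgans. The function name `map_distance_cm` and the counts in the example are hypothetical; the actual progeny counts are those of Tables 2 and 3, which are not reproduced here.

```python
import math


def map_distance_cm(recombinants, progeny):
    """Estimate map distance and its standard error from backcross counts.

    r  = recombinants / progeny          (recombination fraction)
    SE = sqrt(r * (1 - r) / progeny)     (binomial standard error)
    Both are returned in centimorgans (multiplied by 100), which is a
    reasonable approximation only for small recombination fractions.
    """
    if progeny <= 0 or not 0 <= recombinants <= progeny:
        raise ValueError("counts must satisfy 0 <= recombinants <= progeny, progeny > 0")
    r = recombinants / progeny
    se = math.sqrt(r * (1.0 - r) / progeny)
    return 100.0 * r, 100.0 * se


# Hypothetical counts chosen for illustration only.
distance, error = map_distance_cm(recombinants=2, progeny=100)
print(f"{distance:.1f} +/- {error:.1f} cM")
```

With few recombinants the standard error is of the same order as the estimate itself, which is why a tightly linked marker can carry an uncertainty as large as its distance.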
Figure 6: Southern blot of genomic DNA isolated from parental strains C57BL/6J and M. spretus and the F1 hybrid of the parental cross. For each triplet of lanes (labeled A and B), DNA from C57BL/6J is in the first lane, M. spretus DNA is in the second lane, and DNA from the F1 hybrid is in the third lane. DNAs in A and B lanes were cut with XbaI and EcoRI, respectively. Approximate sizes are given in kilobase pairs on the right.
Figure 7: Schematic of the arrangement of the LGALS5 gene and three nearby polymorphic markers. cM, centimorgans.
Herein we report the cDNA sequence and deduced protein
sequence of galectin-5, the fifth protein to fulfill criteria for
membership in the mammalian galectin family. It shares all the
apparently critical amino acid residues known to be involved in
galactoside binding (Lobsanov et al., 1993; Liao et
al., 1994), and it has a demonstrated specificity for binding
β-galactosides.
Galectin-5 is found in erythrocytes, and its
mRNA is found in reticulocytes. Its cell-specific expression suggests
that it is related to a β-galactoside-binding lectin previously
observed in rabbit erythrocytes and at higher levels in erythroblasts
in bone marrow (Harrison and Chesterton, 1980; Harrison and Catt,
1986). The biochemical properties of the rabbit lectin (Harrison et
al., 1984) support this conclusion: the apparent molecular weight
of the rabbit lectin on SDS-polyacrylamide gel electrophoresis is
13,000; its isoform isoelectric points are 5.2-5.65 (compare with
5.1 for galectin-5 (Leffler et al., 1989)); and, like
galectin-5 (Cerra et al., 1985; this paper), it is monomeric.
The rabbit lectin agglutinated rabbit erythrocytes just as rat
galectin-5 agglutinated rat erythrocytes, although both lectins were
weaker agglutinins compared with the dimeric galectin-1 (Harrison et al., 1984; this paper). The rabbit lectin, originally
called erythroid developmental agglutinin, was found mainly in the
cytosol, but also at the cell surface (Harrison and Catt, 1986), and
was proposed to mediate cell-cell adhesion during erythropoiesis. In
view of that proposal, galectin-5 may well function primarily in
erythrocyte differentiation rather than in the mature red blood cell.
Galectin-5 resembles the other galectins in that it exhibits characteristics of a cytoplasmic protein: its cDNA lacks an encoded signal peptide, and the protein's N terminus is apparently blocked with an acetyl group. However, this does not necessarily mean that galectin-5 is always confined to the cytosol since galectin-1 and -3, which share these properties, nevertheless are secreted by nonclassical mechanisms under specific conditions (Cooper and Barondes, 1990; Lindstedt et al., 1993; Sato et al., 1993).
Of the other galectins that have been sequenced, galectin-5 most closely resembles galectin-4 (Fig. 3) (Oda et al., 1993). This is especially true in the protein region defined by the exon that contains the majority of the conserved residues (Gitt and Barondes, 1991; Barondes et al., 1994b) and that is known to interact directly with the carbohydrate ligand (Lobsanov et al., 1993). In this region, galectin-5 and the second domain of galectin-4 have 54% amino acid identity. In contrast, comparable domains of galectin-1, -2, and -3 show 31, 37, and 48% identities, respectively.
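Percent identities of this kind are computed from a pairwise alignment by counting identical residues over the aligned positions. The following is a minimal sketch; the function name `percent_identity` and the two short aligned strings are hypothetical and are not taken from Fig. 3. It also assumes that positions containing a gap in either sequence are excluded from the denominator, which may differ from the convention used for the values quoted above.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two equal-length aligned sequences.

    Columns where either sequence has a gap are skipped, so the
    denominator is the number of residue-to-residue comparisons.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical aligned fragments for illustration only.
print(percent_identity("HFNPRF-NE", "HFNPRFDGW"))
```

Restricting the input strings to the columns of a single exon-defined region gives a region-specific identity like those compared in the text.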
Although galectin-5 is close in size to galectin-1 and
-2 (subunit M_r = 14,840 and 14,650,
respectively) and, like them, has only one carbohydrate-binding site,
it behaves as a monomer on gel filtration under nondenaturing
conditions (Cerra et al., 1985; Leffler et al., 1989;
this paper), whereas galectin-1 and -2 are dimers under these same
conditions (Fig. 5) (Barondes et al., 1994b; Gitt et al., 1992). Despite its monomeric form and monovalency,
galectin-5 acts as a weak agglutinin of fresh rat erythrocytes. The
agglutination by galectin-5 may be through a mechanism similar to that
proposed for galectin-3, involving an induced aggregation of the lectin
at the ligand-coated surface (Hsu et al., 1992; Massa et
al., 1993), which requires at least some of the N-terminal domain
of galectin-3. Since galectin-5 contains very little sequence in
addition to its carbohydrate-binding domain and therefore lacks a
domain homologous to this galectin-3 region, the galectin-5-induced
agglutination probably occurs by protein-protein interactions different
from those employed by galectin-3.
The gene encoding the mouse
homolog of galectin-5 has been mapped to chromosome 11, 50
centimorgans from the centromere, a region syntenic with human
chromosome 17q11, suggesting that the human homolog of the galectin-5
gene (LGALS5) may be found in this region as well. Hence, LGALS5 is probably not linked to any of the other already
mapped galectin genes, LGALS1 and LGALS2 on human
chromosome 22 (Mehrabian et al., 1993) and LGALS3 on
chromosome 1 (Raz et al., 1991).