(Received for publication, July 31, 1995; and in revised form, October 17, 1995)
In this report we describe the chromosomal mapping and genomic organization of the human Galβ1,3GalNAc/Galβ1,4GlcNAc α2,3-sialyltransferase gene. The gene was localized to human chromosome 11 (q23-q24) by in situ hybridization of metaphase chromosomes. It spans more than 25 kilobases of human genomic DNA and is distributed over 14 exons that range in size from 61 to 679 base pairs. Previous characterization of cDNAs encoding the Galβ1,3GalNAc/Galβ1,4GlcNAc α2,3-sialyltransferase revealed that the gene produces at least three transcripts in human placenta, whose protein-coding sequences are identical except at their 5′ ends (Kitagawa, H., and Paulson, J. C. (1994a) J. Biol. Chem. 269, 1394-1401). Repeated screening for clones containing the 5′ end of the cDNA has identified two additional distinct mRNAs expressed in human placenta. Comparison of the genomic DNA sequence with those of the five mRNAs indicates that these transcripts are produced by a combination of alternative splicing and alternative promoter utilization. Northern analysis showed that one of them is specifically expressed in placenta, testis, and ovary, suggesting that its expression is regulated independently of the others.
Sialic acid-containing oligosaccharide structures found on glycoproteins and glycolipids are known to vary with species, tissue type, and stage of development. The structural diversity of these carbohydrates is thought to be used by cells to mediate specific recognition processes, including protein targeting, cell adhesion, and cellular differentiation and development (Kornfeld, 1987; Rademacher et al., 1988; Paulson, 1989; Brandley et al., 1990; Varki, 1992; Powell and Varki, 1995). The high degree of structural diversity observed in the terminal glycosylation sequences of glycoprotein carbohydrates is generally believed to be specified by the glycosyltransferases produced by the cell. Accumulating evidence suggests that regulated expression of these enzymes may account for the synthesis of cell type-specific carbohydrate structures (Paulson et al., 1989; Kleene and Berger, 1993; Kitagawa and Paulson, 1994b; Natsuka and Lowe, 1994). Despite the growing number of glycosyltransferase cDNAs that have been cloned, limited information is available concerning the organization of glycosyltransferase genes and the regulation of their expression (Joziasse, 1992; Kleene and Berger, 1993).
The sialyltransferase family consists of 12-15 or more glycosyltransferases grouped by their common function of transferring sialic acid from CMP-NeuAc to terminal positions on the sugar chains of glycoproteins and glycolipids. To date, 11 cDNAs of these enzymes have been cloned (Weinstein et al., 1987; Gillespie et al., 1992; Wen et al., 1992b; Livingston and Paulson, 1993; Sasaki et al., 1993; Kitagawa and Paulson, 1994a; Kurosawa et al., 1994a, 1994b; Lee et al., 1994; Nara et al., 1994; Sasaki et al., 1994; Haraguchi et al., 1994; Kojima et al., 1995; Yoshida et al., 1995; Eckhardt et al., 1995). Of these, only the β-galactoside α2,6-sialyltransferase gene has been extensively characterized (Svensson et al., 1990; Wang et al., 1990; O'Hanlon and Lau, 1992; Wen et al., 1992a; Svensson et al., 1992; Aasheim et al., 1993; Wang et al., 1993). This gene was found to be relatively large, spanning over 80 kb, and to produce at least six different messages in a tissue-specific fashion through alternative promoter usage and mRNA splicing.
In this report we have examined the gene for human Galβ1,3GalNAc/Galβ1,4GlcNAc α2,3-sialyltransferase, whose cDNA has recently been cloned from the human melanoma cell line WM266-4 and from human placenta, and whose gene has recently been partially characterized (Sasaki et al., 1993; Kitagawa and Paulson, 1994a; Chang et al., 1995). The gene was found to span more than 25 kb and to produce at least five distinct transcripts in human placenta. Northern analysis indicated that one of them is specifically expressed in placenta, testis, and ovary. These results suggest that the human α2,3-sialyltransferase gene is expressed in a tissue-specific manner through a combination of alternative splicing and alternative promoter utilization. Finally, we document that this gene and the human Galβ1,3(4)GlcNAc α2,3-sialyltransferase gene, which shows the highest homology to it, reside on different human chromosomes, 11q23-q24 and 1p34-p33, respectively.
Figure 1:
Comparison of the sequences in the 5′-untranslated region of the type B1, type B2, and type B3 forms of α2,3-sialyltransferase cDNA isolated from human placenta. Homologous sequence (only the first 16 bp) is represented by boldface letters.
Figure 2:
Genomic map of the α2,3-sialyltransferase gene. Exons are labeled E1-E14. EcoRI (E) and HindIII (H) restriction sites are shown as hash marks. Exon regions are denoted by boxes: black boxes represent coding sequence, and open boxes denote 5′- and 3′-untranslated sequences. Shown below are the splicing patterns for each type of message described here; the abbreviations for each message are used in the text. In the previous publication (Kitagawa and Paulson, 1994a), types A1, A2, and B1 were referred to as Long A, Short, and Long B, respectively.
In summary, the analysis suggests that the entire α2,3-sialyltransferase gene spans over 25 kb of human genomic DNA. It should be noted that, in contrast to the genomic organization of the α2,6-sialyltransferase (Svensson et al., 1990), the highly conserved sialylmotif, which was used to clone this α2,3-sialyltransferase cDNA (Kitagawa and Paulson, 1994a), is divided between two exons, E10 and E11 (Fig. 2). In addition, the unique sequences found at the 5′ ends of the type B2 and type B3 forms (see Fig. 1) were mapped between exon E1 and exon E3, indicating that these forms are produced by alternative promoter utilization (Fig. 2). Exon E2, which encodes the 5′ end of the type B2 form, is located 397 bp upstream of exon E3. The 5′-most transcriptional start site of the type B3 form lies only 44 bp upstream of exon E3, and this transcript continues through to the 3′ end of exon E3 without splicing. These results suggest that the five mRNA isoforms are produced by a combination of alternative splicing and alternative promoter utilization, and that the mature mRNAs are assembled from different combinations of the 14 exons of the α2,3-sialyltransferase gene.
Figure 3:
Sequence of the promoter region for the type A1, type A2, and type B3 forms of human α2,3-sialyltransferase. The start sites of transcription for each isoform are shown by arrows. The consensus binding sites for the transcription factors AP2, Sp1, LF-A1, HLH (helix-loop-helix proteins), NF-1, and MAF are underlined, and those for the transcription factor ETF are boxed.
Figure 4:
Sequence of the promoter region for the type B2 form of human α2,3-sialyltransferase. The start sites of transcription are shown by arrows. The consensus binding sites for the transcription factors AP1, AP2, MAF, CArG, HLH, PEA3, NF-1, and OCT are underlined, and those for the transcription factor ETF are boxed.
As shown in Fig. 4 and Table 4, the 5′-flanking region of the type B2 form also contains three sequence motifs similar to the MAF recognition element, one sequence similar to the AP1 binding site (Lee et al., 1987), four sequences similar to the AP2 binding site, seven ETF consensus sequences, one HLH consensus sequence, and one NF-1-like protein binding site. In addition, three further types of sequence motif were detected: three CArG consensus sites, a motif required for expression of smooth muscle-specific genes (Reddy et al., 1990); one OCT (octamer-binding transcription factor) consensus site (Cox et al., 1988), a motif recognized by octamer-related proteins that have been implicated in the control of the histone 2b gene and of the gene for the melanocyte-specific tyrosinase-related protein TRP1 (Lowings et al., 1992); and one PEA3 consensus sequence (Faisst and Meyer, 1992).
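Motif assignments of this kind are commonly made by scanning a flanking sequence on both strands for matches to a consensus written in IUPAC degenerate-base code. The Python sketch below illustrates that idea only. The consensus strings (Sp1 GC box GGGCGG, AP1 TGASTCA, CArG box CCWWWWWWGG) are textbook consensus sequences assumed for illustration; they are not necessarily the definitions or the search procedure used in this study, and the demonstration sequence is hypothetical, not taken from Fig. 3 or Fig. 4.

```python
import re

# IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Textbook consensus sequences, assumed here for illustration only.
CONSENSUS = {
    "Sp1 (GC box)": "GGGCGG",
    "AP1": "TGASTCA",
    "CArG": "CCWWWWWWGG",
}


def consensus_to_regex(consensus: str) -> re.Pattern:
    """Compile an IUPAC consensus into a regex; the lookahead allows overlapping hits."""
    body = "".join(IUPAC[base] for base in consensus.upper())
    return re.compile(f"(?=({body}))")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def scan(seq: str, consensus: str) -> list[tuple[int, str]]:
    """Return (0-based forward-strand start, strand) for every match on either strand."""
    seq = seq.upper()
    pattern = consensus_to_regex(consensus)
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    n, k = len(seq), len(consensus)
    # Matches on the reverse complement are converted back to forward-strand coordinates.
    hits += [(n - m.start() - k, "-") for m in pattern.finditer(reverse_complement(seq))]
    return sorted(hits)


if __name__ == "__main__":
    demo = "TTGGGCGGAATGACTCAGG"  # hypothetical sequence, not from the paper
    for name, motif in CONSENSUS.items():
        print(name, scan(demo, motif))
```

A palindromic consensus such as AP1 (TGASTCA) or the CArG box reports the same site on both strands, so a count of distinct sites should collapse hits that share a forward-strand position.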
Figure 5:
Tissue-specific expression of the type A1 form transcript in various human tissues. Northern blots of mRNA from various adult and fetal human tissues were hybridized with a probe corresponding to the coding sequence of the α2,3-sialyltransferase (a) or with a probe corresponding to the sequence specific to the type A1 form cDNA (b), as described under "Experimental Procedures."
Figure 6:
Distribution of labeled sites on chromosome 11 for the Galβ1,3GalNAc/Galβ1,4GlcNAc α2,3-sialyltransferase gene (a) and on chromosome 1 for the Galβ1,3(4)GlcNAc α2,3-sialyltransferase gene (b). The peak of hybridization occurs at band q23-q24 of chromosome 11 (a) and at band p34-p33 of chromosome 1 (b).
Comparison of the sequences of the cloned sialyltransferases revealed that, of the 10 other sequences, Galβ1,3GalNAc/Galβ1,4GlcNAc α2,3-sialyltransferase shares the highest homology with Galβ1,3(4)GlcNAc α2,3-sialyltransferase. Accordingly, an experiment similar to that described above was carried out to determine the chromosomal location of the Galβ1,3(4)GlcNAc α2,3-sialyltransferase gene. Of 100 metaphase cells examined for hybridization, 188 silver grains were associated with chromosomes, and 51 of these (27.1%) were located on chromosome 1. As shown in Fig. 6b, 78.4% of these 51 grains were located in the p34-p33 region of the short arm of chromosome 1. These results indicate that the Galβ1,3(4)GlcNAc α2,3-sialyltransferase gene is localized at chromosome 1p34-p33.
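The percentages above follow directly from the grain counts. The short Python check below reproduces them and back-calculates the implied number of grains in the 1p34-p33 region (about 40). That grain number is our inference from the reported 78.4%, not a count stated in the text.

```python
total_grains = 188      # silver grains on chromosomes in 100 metaphase cells
chr1_grains = 51        # grains located on chromosome 1
band_fraction = 0.784   # reported fraction of chromosome 1 grains in 1p34-p33

# Share of all chromosome-associated grains that fell on chromosome 1.
chr1_percent = 100 * chr1_grains / total_grains

# Implied whole-number grain count in 1p34-p33 (inferred, not reported).
implied_band_grains = round(band_fraction * chr1_grains)
band_percent = 100 * implied_band_grains / chr1_grains

print(f"chromosome 1: {chr1_percent:.1f}% of all grains")
print(f"implied grains in 1p34-p33: {implied_band_grains} ({band_percent:.1f}% of chromosome 1 grains)")
```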
To date, the genomic organization has been reported for several glycosyltransferases (Joziasse, 1992; Kleene and Berger, 1993; Chang et al., 1995). The rat α2,6-sialyltransferase gene is divided into at least 12 exons that span over 80 kb (Wen et al., 1992a). Similarly, the α1,3-galactosyltransferase gene is distributed over 9 exons that span over 35 kb (Joziasse et al., 1992), and the β1,4-galactosyltransferase gene is distributed over 6 exons that span over 40 kb of genomic sequence (Hollis et al., 1989). The α2,3-sialyltransferase gene described here follows the same pattern. Exceptions to this pattern include the genes for β1,2-GlcNAc-transferase I, β1,4-GlcNAc-transferase III, several α1,3-fucosyltransferases, and β1,6-GlcNAc-transferase, in which the entire coding sequence appears to be contained within a single exon (Hull et al., 1991; Lowe et al., 1991; Weston et al., 1992a, 1992b; Bierhuizen et al., 1993; Ihara et al., 1993). It is unclear whether the occurrence of these two patterns of glycosyltransferase genomic organization has evolutionary significance.
The gene for human Galβ1,3GalNAc/Galβ1,4GlcNAc α2,3-sialyltransferase is distributed over 14 exons that span at least 25 kb of genomic sequence. Transcription of this gene produces five distinct mRNAs in human placenta (the type A1, type A2, type B1, type B2, and type B3 forms), each approximately 2.0 kb in size, which are generated by a combination of alternative splicing and alternative promoter utilization. Translation of these mRNAs is predicted to yield three related protein isoforms of the α2,3-sialyltransferase, previously referred to as the Long A, Long B, and Short forms, of 332, 333, and 322 amino acids, respectively; this prediction has been confirmed by in vitro translation. Structurally, the three protein isoforms differ from each other only at their N termini, which comprise the cytoplasmic tail and part of the transmembrane domain (Kitagawa and Paulson, 1994a). The biological significance of the three different protein isoforms is presently unclear.
Multiple transcripts, as found here for the α2,3-sialyltransferase gene, have also been observed for other glycosyltransferase genes, including those of the α2,6-sialyltransferase and the β1,4-galactosyltransferase (Paulson et al., 1989; Russo et al., 1990; Wen et al., 1992a; Aasheim et al., 1993; Wang et al., 1993). In the case of the α2,6-sialyltransferase, at least six different transcripts are produced via alternative splicing and alternative promoter usage. The best characterized is a 4.3-kb mRNA found almost exclusively in the liver (Wen et al., 1992a), which is generated from six exons of the gene (Svensson et al., 1990). Two distinct forms of a 4.7-kb mRNA have also been identified, one highly expressed in B cells and the other expressed at low levels in most tissues (Aasheim et al., 1993; Wang et al., 1993). These two transcripts are produced from the same six exons as the 4.3-kb mRNA, with the addition of one or two 5′-untranslated exons (Aasheim et al., 1993; Wang et al., 1993). Thus, these three transcripts have identical coding sequences but different 5′-untranslated sequences. Because the coding sequences are identical, the different mRNA species in this case reflect cell type-specific regulation of the expression of this complex gene. In addition, three forms of a 3.6-kb mRNA have been isolated from rat kidney. Although these transcripts are generated from the α2,6-sialyltransferase gene, they retain less than 50% of the coding region and lack sialyltransferase activity (Wen et al., 1992a; Harduin-Lepers et al., 1993). Moreover, they have been detected only in rat kidney and not in human (Kitagawa and Paulson, 1994b).
In the case of the α2,3-sialyltransferase, as shown in this report, at least five transcripts are produced from a single gene locus by a combination of alternative splicing and alternative promoter usage, each of which encodes the same enzyme, Galβ1,3GalNAc/Galβ1,4GlcNAc α2,3-sialyltransferase, and differs only at its 5′ end. Why are alternative promoters and alternative splicing used to produce five mRNAs coding for the same enzyme? Northern analysis of their tissue distribution demonstrated that one of the α2,3-sialyltransferase mRNA isoforms, the type A1 form, is specifically expressed in placenta, testis, and ovary, whereas transcripts detected with the coding-sequence probe are expressed constitutively in all the tissues examined (Fig. 5). This pattern of expression is likely a consequence of differential regulation at the level of transcription, as demonstrated for the α2,6-sialyltransferase and the β1,4-galactosyltransferase (Svensson et al., 1992; Harduin-Lepers et al., 1993).
Sequence analysis of the 5′-flanking regions of the α2,3-sialyltransferase isoforms revealed heterogeneous transcriptional start sites and the absence of typical TATA and CCAAT boxes, together with the presence of GC boxes (Figs. 3 and 4). These structural features are considered typical of so-called housekeeping genes, which are expressed at low levels in essentially all tissues (Kadonaga et al., 1986), and suggest that regulation of this gene may be governed, at least in part, through Sp1 binding sites, as for the β1,4-galactosyltransferase gene (Harduin-Lepers et al., 1993). Further work is required to confirm this mechanism.
Chromosomal assignments have been reported for several glycosyltransferases, including two sialyltransferases, Galβ1,4GlcNAc α2,6-sialyltransferase and NeuAcα2,3Galβ1,4Glcβ1-1′Cer α2,8-sialyltransferase, which reside on human chromosome 3q27-q28 and human chromosome 12, respectively (Kleene and Berger, 1993; Wang et al., 1993; Sasaki et al., 1994). The present study demonstrates that two additional sialyltransferases, the Galβ1,3GalNAc/Galβ1,4GlcNAc α2,3-sialyltransferase and the Galβ1,3(4)GlcNAc α2,3-sialyltransferase, are likewise localized on different human chromosomes, 11q23-q24 and 1p34-p33, respectively, even though all four genes share the highly conserved sialylmotif. These results strongly suggest that the four sialyltransferase genes diverged from an ancestral gene early in evolution. It remains to be determined whether the remaining sialyltransferase genes are likewise dispersed in the human genome.
The nucleotide sequence reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number L29553.