From the Department of Biology, Massachusetts
Institute of Technology, Cambridge, Massachusetts 02139, the
§ Department of Medicine, Harvard Medical School, Beth
Israel Deaconess Medical Center, Boston, Massachusetts 02215, and the
Mammalian Genetics Laboratory, ABL-Basic
Research Program, National Cancer Institute, Frederick Cancer Research
and Development Center, Frederick, Maryland 21702
ABSTRACT
3-O-Sulfated glucosaminyl residues
are rare constituents of heparan sulfate and are essential for the
activity of anticoagulant heparan sulfate. Cellular production of the
critical active structure is controlled by the rate-limiting enzyme,
heparan sulfate D-glucosaminyl 3-O-sulfotransferase-1 (3-OST-1) (EC 2.8.2.23). We have
probed the expressed sequence tag data base with the carboxyl-terminal sulfotransferase domain of 3-OST-1 to reveal three novel, incomplete human cDNAs. These were utilized in library screens to isolate full-length cDNAs. Clones corresponding to predominant transcripts were obtained for the 367-, 406-, and 390-amino acid enzymes 3-OST-2, 3-OST-3A, and 3-OST-3B, respectively. These
type II integral membrane proteins are composed of a divergent
amino-terminal region and a very homologous carboxyl-terminal
sulfotransferase domain of ~260 residues. Also recovered were partial
length clones for 3-OST-4. Expression of the full-length enzymes
confirms the 3-O-sulfation of specific glucosaminyl
residues within heparan sulfate (Liu, J., Shworak, N. W.,
Sina
Heparan sulfate proteoglycans are hybrid molecules composed of a
protein core to which is attached one or more linear glycosaminoglycan chains of the heparan sulfate variety. Extreme structural diversity of
the heparan sulfate side chains enables interactions with a broad array
of protein effector molecules that modulate a wide range of biologic
processes. The specificity of any given heparan sulfate-protein
interaction is largely dictated by placement of sulfate groups along
the chain's length. Thus the order and ring position of sulfate
substituents create distinct oligosaccharide sequences (fine
structure) and define corresponding biologic activities (reviewed in
Refs. 1-3).
The profound functional diversity of heparan sulfate proteoglycans
necessitates a mechanism that can generate and independently regulate
the production of a myriad of fine structures. Such control is
predominantly imposed in a cell type-specific fashion by varying the
functional status of the Golgi apparatus, with the core proteins potentially contributing only a minor degree of influence (4, 5). Thus,
heparan sulfate biosynthetic enzymes are implicated as key components
in generating regions of defined monosaccharide sequence. The
production of the antithrombin-binding site by the enzyme heparan
sulfate D-glucosaminyl 3-O-sulfotransferase-1
(3-OST-1)1 (EC 2.8.2.23)
reveals a mechanism for the independent biosynthesis of a specific
heparan sulfate sequence that regulates an important biologic activity.
Antithrombin is a natural anticoagulant that neutralizes serine
proteases of the intrinsic blood coagulation cascade through the
formation of a 1:1 enzyme-antithrombin covalent complex. The rate of
complex formation is dramatically enhanced via interactions with
glycosaminoglycans containing the antithrombin-binding site; i.e. pharmaceutical heparin and anticoagulant heparan
sulfate. The latter is generated by endothelial cells, which line the
blood vessel wall. The importance of the anticoagulant
heparan-antithrombin interaction is evidenced by the arterial
thrombotic events that occur in patients producing antithrombin
variants defective in anticoagulant heparan binding (reviewed in Ref.
6). Given this critical role, it is not surprising that the cellular
production of anticoagulant heparan is regulated independently of the
general bulk of heparan sulfate (5, 7).
Antithrombin specifically recognizes the structure: -Glc(NS or
Ac)6S-GlcA-GlcNS3S±6S-IdoA2S-GlcNS6S-,2
which triggers a conformational change that results in the accelerated neutralization of specific coagulation proteases (reviewed in Refs. 3
and 8). The central 3-O-sulfate group is absolutely essential for induction of the conformational change and high affinity
antithrombin binding. Binding specificity additionally requires the
6-O-sulfate groups on residues 1 and 5, the amino group at
residue 5 and carboxyl groups at other sites (9). The critical role of
the 3-O-sulfate group and the extreme paucity of this
substituent within heparan sulfate (5, 7) raise the possibility of a
key regulatory role. Indeed, we have recently demonstrated that the
enzyme 3-OST-1 performs the rate-limiting biosynthetic reaction that
determines cellular production of anticoagulant heparan (10, 11). The
enzyme recognizes a specific precursor structure, corresponding to the
antithrombin-binding site devoid of just the
3-O-sulfate, and adds this rare substituent to complete the
formation of anticoagulant heparan (10). Thus, 3-OST-1 activity controls cellular anticoagulant phenotype. This example raises the
possibility that additional heparan sulfate biosynthetic enzymes may
function in an analogous fashion, controlling production of other
important heparan sulfate fine structures.
The molecular cloning of the cDNA for the precursor protein of
3-OST-1 showed that the enzyme undergoes removal of an amino-terminal leader sequence to generate a Golgi intraluminal resident of ~290 amino acids (12). Most importantly, the carboxyl-terminal ~260 residues have striking homology to the comparable region of the bifunctional biosynthetic enzymes heparan sulfate
N-deacetylase/N-sulfotransferase-1 and -2 (~50% similarity to both NST-1 and NST-2), and at least 30%
similarity to virtually every type of sulfotransferase enzyme previously identified. Consequently, this conserved structure that
spans the majority of 3-OST-1 has been presumptively designated as the
sulfotransferase domain (12).
In the present article, we have employed this conserved structure to
molecularly clone related cDNAs, which encode homologous carboxyl-terminal sulfotransferase domains but distinct amino-terminal structures. Unlike 3-OST-1, the novel 3-OST-2, 3-OST-3A,
and 3-OST-3B enzymes are predicted to have type II integral
membrane architecture. An incomplete cDNA encoding the
sulfotransferase domain portion of the enzyme 3-OST-4 was also
obtained. Comparison of the enzyme structures predicts motifs that may
govern sequence specific modification. Northern hybridizations show
isoform-specific expression patterns, whereas genomic characterizations
identified at least 7 human 3-OST genes. Thus, the 3-OST
multigene family is exquisitely suited to encode key enzymes that
regulate the production of many distinct heparan sulfate fine structures.
Isolation of 3-OST-2, 3-OST-3, and 3-OST-4 cDNA
Clones--
The National Center for Biotechnology Information data
bank of I.M.A.G.E. Consortium (Lawrence Livermore National Laboratory) expressed sequence tag cDNA clones (13) was probed with the deduced
sulfotransferase domain region of mouse 3-OST-1 (12), which identified
partial length clones that were obtained from the TIGR/ATCC Special
Collection (ATCC). Complete sequencing of the inserts revealed three
clone categories: 3-OST-2, I.M.A.G.E. Consortium (Lawrence Livermore
National Laboratory) Clone ID c-20d10 (GenBankTM accession number
F07258) (14) from a normalized library generated from total brain of a
3-month muscular atrophy female (15);
3-OST-3ACTF, Clone ID 284542 (GenBankTM
accession number N71828), from a library of 4 multiple sclerosis
lesions isolated from a 46-year-old male (13); and 3-OST-4, Clone IDs
HIBCX69 (GenBankTM accession number T33472) from human brain (16),
IB727 (GenBankTM accession number T03677) from infant brain (17, 18),
166466 (GenBankTM accession number R88592) from adult brain (13), 23279 (GenBankTM accession number T75445) from infant brain, c-3ie01
(GenBankTM accession number F13088) (from the same library as c-20d10) (14). To obtain full-length clones, we first identified cDNA regions, described below, which would function as isoform-specific probes by hybridizing Southern blots of human genomic DNA with expressed sequence tag fragments 32P-labeled by random
priming. The corresponding fragments were used to screen
Characterization of cDNA Clones--
The 5' and 3' insert
regions were enzymatically sequenced from flanking primer sites of the
respective cloning vectors. The remaining sequence of both strands was
obtained with internally priming oligonucleotides. Primers were spaced
no more than 400 base pairs (bp) apart with a 200-bp offset between + and
Computer Analysis of Sequence Data--
DNA sequence files were
aligned and compiled with the program Sequencher 3.0 (Gene Codes
Corp.). Sequence comparison searches were performed with Gapped BLAST
(19) on the data bases of GenBankTM, EMBL, DDBJ, PDB, SwissProt, PIR,
dSTS, htgs, and dbEST. The following protein features were predicted
with the corresponding programs: secondary structure, PHDsec
(20); hydrophobicity (Kyte-Doolittle), DNA Strider 1.0; membrane-spanning segments, PHDhtm
(21); O-glycosylation sites, NetOGlyc 2.0 (22). Polyadenylation
signals were detected with the Genefinder package.
All additional
manipulations were performed with the University of Wisconsin Genetics
Computer Group (GCG) sequence analysis software package.
Nucleic Acid Probes--
cDNA libraries were screened with
probes containing both sulfotransferase domain and 3'-untranslated
region sequences as follows: an EcoRI/XbaI 1.6-kb
fragment isolated from c-20d10 (nucleotides 385-1952 of 3-OST-2), an
EcoRI/XbaI 1.1-kb fragment from clone 284542 (1-1152 of 3-OST-3ACTF), and a 921-bp
EcoRI/BamHI fragment from clone 23279 (1-920 of 3-OST-4). Southern analysis was performed with the following probes which contain only sulfotransferase domain sequences: ST-1, a KpnI/BspHI 558-bp fragment from pJL30 (12)
(428-985 of human 3-OST-1); ST-2, a 521-bp polymerase chain reaction
product from pJL-2.7 (683-1203 of 3-OST-2); ST-3, a 192-bp polymerase
chain reaction product from pJL-3.7 (1712-1903 of 3-OST-3A
and 1199-1390 of 3-OST-3B); ST-4, a BstXI
575-bp fragment from clone 23279 (156-730 of 3-OST-4); and mST-1, an
EcoRI/SacII 779-bp fragment from pNWS182 (12)
(395-1173 of mouse 3-OST-1). Southern analysis also included polymerase chain reaction-generated probes containing predominantly SPLAG domain (Ser, Pro, Leu, Ala, and Gly enriched domain) sequences (SPLAG-A is 385 bp from 823-1207 of 3-OST-3A, and SPLAG-B
is 271 bp from 448-718 of 3-OST-3B) as well as probes
containing only 3'-untranslated regions (3'A is 290 bp from 2006-2295
of 3-OST-3A, 3'B is 309 bp from 1511-1819 of
3-OST-3B, and 3'BCTF is 364 bp from 718-1081 of
3-OST-3BCTF). Northern analysis for 3-OST-2 and
3-OST-4 was performed with the same probes as were used for library
screening. Northern analysis for the 3-OST-3 species included the above
described sulfotransferase domain, SPLAG domain, and 3'-untranslated
region-specific probes. All samples were random prime labeled with
[
Southern Blot Analysis--
Genomic DNA from human endothelial
cells was used for the analysis of human 3OST gene copy number and for
genomic restriction mapping studies. Genomic DNA was isolated from 76 plates (150 mm) of primary human umbilical vein endothelial cells
(Clonetics Corp., San Diego, CA) grown according to the supplier's
protocol. Cells were harvested by trypsinization, washed in
phosphate-buffered saline, and pelleted by centrifugation at 1000 × g for 3 min. Cells were lysed by vortexing for 5 min in
ice-cold 140 mM NaCl, 1.5 mM MgCl2,
10 mM Tris, pH 8.0, and 0.5% Triton X-100. Nuclei were
collected by centrifugation at 1500 × g for 4 min and
resuspended in 3 ml of 150 mM NaCl, 25 mM EDTA,
and 10 mM Tris, pH 8.0, then combined with 133 µg of
RNase A and 333 µg of proteinase K. Nuclei were lysed with the
addition of 3 ml of 0.4% SDS and then incubated at 65 °C for
16 h. Samples were extracted 5 times against 8 ml of
phenol/chloroform/isoamyl alcohol (25:24:1). The aqueous phase was combined
with 15 ml of isopropyl alcohol, and the DNA was harvested by spooling, washed
with 80% ethanol, and resuspended in 4 ml of 10 mM Tris,
pH 8.0, 1 mM EDTA. Copy number determinations for the mouse
3Ost genes were performed on genomic DNA isolated by the
above procedure from the previously described clonal mouse L cell line
LTA (10, 23, 24). Dr. Chao Sun (Whitehead Institute) generously
provided human DNA samples isolated from peripheral leukocytes of 16 unrelated male individuals for the analysis of the BstXI
pattern generated by 3-OST probes.
Typically, 10-µg samples were restriction digested, resolved by 0.8%
or 1% agarose gel electrophoresis, and transferred to GeneScreen Plus
membranes. Membranes were hybridized for 16 h in 1% SDS, 10%
dextran sulfate buffer containing SSC and formamide. The concentrations
of these latter two components were adjusted so that all homologous
hybridizations were incubated at Tm
Northern Blot Analysis--
Tissue and cell type-specific
expression of 3-OST forms was performed with human multiple tissue and
human cancer cell line Northern blots, respectively.
Endothelial expression of 3-OST forms
was tested on 10 µg of total RNA from immortalized rat fat pad
endothelial cells and primary mouse cardiac microvascular cells, as
well as 5 µg of poly(A)+ prepared from primary human
umbilical vein endothelial cells, as described previously (12). Samples
were resolved on 1.2% formaldehyde-agarose gels and capillary
transferred to GeneScreen Plus membranes. Membranes were hybridized for
16 h in 1% SDS, 10% dextran sulfate buffer containing SSC and
formamide. The concentrations of these latter two components were
adjusted so that all homologous hybridizations were incubated at
Tm
Interspecific Mouse Back-cross Mapping--
Interspecific
back-cross progeny were generated by mating (C57BL/6J × Mus
spretus)F1 females and C57BL/6J males as described (27). A total of 205 N2 mice were used to map the
3Ost loci, as described under "Results." DNA isolation,
restriction enzyme digestion, agarose gel electrophoresis, Southern
blot transfer, and hybridization were performed essentially as
described (28). All blots were prepared with Hybond-N+
nylon membrane (Amersham). The employed hybridization probes are
described above. The 3Ost1 probe, mST-1, was labeled with [
A description of the probes and restriction fragment length
polymorphisms for the loci linked to the 3Ost genes has been
reported previously. These include Adra2c, Msx1, and
Bst1, chromosome 5 (29)3; Pth, Pkcb,
Spn, and Mgmt, chromosome 7 (29, 31); and Adra1a, Csfgm, Myhsf1, and Trp53, chromosome 11 (32, 33).
Recombination distances were calculated using Map Manager, version
2.6.5. Gene order was determined by minimizing the number of
recombination events required to explain the allele distribution patterns.
Chromosomal Mapping of Human 3OST Genes--
Data base searching
identified bacterial artificial chromosome clones containing human
3OST2, 3OST3A1, and 3OST3B1 genes (GenBankTM accession numbers
AC003661, AC002287, AC005411, AC005375, AC005224). Data base
searching with a combination of genomic and cDNA sequences
identified expressed sequence tag and sequence-tagged site markers
(GenBankTM accession numbers G24436, T03677, G21216, and G03581). The
chromosomal location of these markers was then determined through the
Human Genome Sequencing Index.
Isolation and Characterization of cDNAs Encoding 3-OST-2,
3-OST-3, and 3-OST-4 Isoforms
We probed the National Center for Biotechnology Information data
base of expressed sequence tag cDNA clones (13) with the deduced
amino acid sequence of the presumptive sulfotransferase domain from the
human 3-OST-1 cDNA to reveal several human partial length cDNAs
encoding novel related species, as described under "Experimental
Procedures." Sequencing the contained cDNA inserts confirmed
three distinct forms designated as 3-OST-2,
3-OST-3ACTF, and 3-OST-4. Isotype-specific
probes were generated, 32P-labeled, and screened against
, P., Schwartz, J. J., Zhang, L., Fritze, L. M. S., and Rosenberg, R. D. (1999) J. Biol.
Chem. 274, 5185-5192). Southern analyses suggest the human
3OST1, 3OST2, and 3OST4 genes, and the corresponding mouse
isologs, are single copy. However, 3OST3A and 3OST3B genes are
each duplicated in humans and show at least one copy each in mice.
Intriguingly, the entire sulfotransferase domain sequence of the
3-OST-3B cDNA (774 base pairs) was 99.2% identical to
the same region of 3-OST-3A. Together, these data argue
that the structure of this functionally important region is actively
maintained by gene conversion between 3OST3A and 3OST3B loci.
Interspecific mouse back-cross analysis identified the loci for mouse
3Ost genes and syntenic assignments of corresponding human
isologs were confirmed by the identification of mapped sequence-tagged site markers. Northern blot analyses indicate brain exclusive and brain
predominant expression of 3-OST-4 and 3-OST-2 transcripts, respectively, whereas 3-OST-3A and 3-OST-3B
isoforms show widespread expression of multiple transcripts. The
reiteration and conservation of the 3-OST sulfotransferase domain
suggest that this structure is a self-contained functional unit.
Moreover, the extensive number of 3OST genes with diverse expression
patterns of multiple transcripts suggests that the novel 3-OST
enzymes, like 3-OST-1, regulate important biologic properties of
heparan sulfate proteoglycans.
INTRODUCTION
EXPERIMENTAL PROCEDURES
λTriplEx
brain and liver cDNA libraries, as
described previously (12). Positive plaques were purified, TriplEx-based plasmids were excised in vivo according to the
manufacturer's protocol, and inserts were sequenced as described below.
strands, thus each nucleotide was detected within 200 bp of
a primer. Automated fluorescence sequencing was performed with
Perkin-Elmer Applied Biosystems Models 373A and 477 DNA Sequencers.
Each reaction typically yielded 400 to 600 bases of high quality sequence.
α-32P]dCTP for GC-rich probes or [α-32P]dATP for AT-rich probes, as
described previously (12).
− 25 °C and all heterologous hybridizations were incubated at
Tm − 35 °C, where for a DNA:DNA hybrid
Tm = 81.5 °C + (16.6 × log([Na+])) + (41 × percentage GC) − (500/length of probe template in bp) − (62 × percentage formamide) (25).
Membranes were washed in 1% SDS and sufficient SSC to generate a final
incubation stringency of at least Tm − 25 °C or Tm − 35 °C, as
described above, respectively.
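The DNA:DNA relation quoted above can be evaluated numerically. The following is a minimal sketch rather than the authors' procedure. It assumes that the logarithm is base 10, that [Na+] is given in mol/l, that the probe-length and formamide terms are subtracted, and that "percentage GC" and "percentage formamide" are entered as fractions between 0 and 1 (the coefficients 41 and 62 suggest this, but it is an assumption). The function names and example inputs are hypothetical.

```python
import math


def tm_dna_dna(na_molar, gc_fraction, probe_length_bp, formamide_fraction):
    """Melting temperature (deg C) of a DNA:DNA hybrid.

    Assumption: gc_fraction and formamide_fraction are fractions (0 to 1),
    and the logarithm is base 10.
    """
    return (81.5
            + 16.6 * math.log10(na_molar)
            + 41.0 * gc_fraction
            - 500.0 / probe_length_bp
            - 62.0 * formamide_fraction)


def incubation_temperature(tm, homologous=True):
    """Tm - 25 for homologous probes, Tm - 35 for heterologous probes."""
    return tm - (25.0 if homologous else 35.0)


# Hypothetical inputs: 1 M Na+, 50% GC, 500-bp probe, 50% formamide.
tm = tm_dna_dna(1.0, 0.5, 500, 0.5)
print(tm, incubation_temperature(tm), incubation_temperature(tm, homologous=False))
```

In practice the relation would be used in the other direction: the hybridization temperature is fixed, and the SSC and formamide concentrations are adjusted until Tm − 25 °C or Tm − 35 °C matches it.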
− 25 °C and all heterologous hybridizations
were incubated at Tm − 35 °C, where for a DNA:RNA hybrid
Tm = 79.8 °C + (18.5 × log([Na+])) + (58.4 × percentage GC) + (11.8 × (percentage GC)²) − (820/length of probe template in bp) − (50 × percentage formamide) (26).
Membranes were washed in 1% SDS and sufficient SSC to generate a final
incubation stringency of at least Tm − 25 °C or Tm − 35 °C, as
described above, respectively.
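The DNA:RNA relation used for the Northern blots differs from the DNA:DNA form only in its coefficients and in the added quadratic GC term. A minimal sketch follows, under the same assumptions as any such calculation here: base-10 logarithm, [Na+] in mol/l, subtracted length and formamide terms, and GC and formamide entered as fractions between 0 and 1. The function name and example inputs are hypothetical.

```python
import math


def tm_dna_rna(na_molar, gc_fraction, probe_length_bp, formamide_fraction):
    """Melting temperature (deg C) of a DNA:RNA hybrid.

    Assumption: gc_fraction and formamide_fraction are fractions (0 to 1).
    """
    return (79.8
            + 18.5 * math.log10(na_molar)
            + 58.4 * gc_fraction
            + 11.8 * gc_fraction ** 2
            - 820.0 / probe_length_bp
            - 50.0 * formamide_fraction)


# Hypothetical inputs: 1 M Na+, 50% GC, 820-bp probe, 50% formamide.
tm = tm_dna_rna(1.0, 0.5, 820, 0.5)
print(round(tm, 2))       # melting temperature
print(round(tm - 25, 2))  # homologous hybridization temperature
print(round(tm - 35, 2))  # heterologous hybridization temperature
```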
α-32P]dCTP using a random primed labeling kit
(Stratagene); washing was done to a final stringency of 1.0 × SSCP, 0.1% SDS, 65 °C. A fragment of 6.3 kb was detected in
ScaI-digested C57BL/6J (B) DNA and a fragment of 8.4 kb was
detected in ScaI-digested M. spretus (S) DNA. The
3Ost2 probe, ST-2, detected major BglI fragments of 23.0 (B) and 16.0 (S) kb. The 3Ost3a probe, SPLAG-A,
detected major ScaI fragments of 16.5 and 5.4 (B) and 16.5 and 7.0 kb (S). The 3Ost3b probe, SPLAG-B, detected major
HincII fragments of 18.0 and 5.0 kb (B) and 9.0 and 5.0 kb
(S). Finally, the 3Ost4 probe, ST-4, detected
HindIII fragments of 1.7 (B) and 2.4 (S) kb. The presence or
absence of the M. spretus-specific fragments was followed in
back-cross mice.
RESULTS
λTriplEx human cDNA libraries made from brain (3 × 10⁶ plaques for 3-OST-2 and 3-OST-4) and liver (4.5 × 10⁶ plaques for 3-OST-3) to identify 7, 8, and 4 additional
clones of 3-OST-2, -3, and -4 groups, respectively. The contained
inserts of the corresponding isolates were completely sequenced, which revealed two forms for 3-OST-2 (-2 and -2CTF) and 4 kinds
of 3-OST-3 cDNAs (-3A, -3ACTF,
-3B, and -3BCTF) (Fig.
1).4
The 3-OST-4 clones overlapped with clone 23279, but were all shorter
partial-length clones and so are not presented; thus, the longest
obtained 3-OST-4 cDNA contains an incomplete coding sequence. The
primary structures of 3-OST-2, -3A, and -3B
composite cDNAs are presented in Figs.
2, 3, and
4; whereas the sequence data for the
incomplete 3-OST-4 cDNA can be obtained from the GenBankTM /EMBL
Data Bank. The accompanying article (34) describes the analysis of
recombinantly expressed 3-OST-2, -3A, and -3B cDNAs, which confirms that the encoded enzymes specifically
3-O-sulfate glucosaminyl residues within heparan
sulfate.
Fig. 1.
cDNAs encoding 3-OST isoforms.
Schematic representation of four distinct 3-OST composite cDNAs
with boxes representing the protein coding region. Protein
regions homologous to the 3-OST-1 putative sulfotransferase domain are
cross-hatched; whereas, the nonhomologous amino-terminal
coding regions encompass cytoplasmic (stippled),
hydrophobic (inverse stippled), and SPLAG- (Ser,
Pro, Leu, Ala, Gly enriched) (wavy) domains. K
indicates the position of a conserved lysine of presumed catalytic
function. Within nucleic acid sequences, the sites of putative
polyadenylation signals are shown by the inverted triangles.
The sulfotransferase domains of 3-OST-3A and
3-OST-3B cDNAs (between and
) show nearly
identical sequences that differ by only 6 point mutations (
) found
in all 3-OST-3B clones between points
and
.
Indicated in the middle of each cDNA set are the size and location
of individual cDNA inserts, and corresponding plasmids
designations. Clones obtained by library screening are designated by
the pJL- prefix, whereas clone IDs are given for expressed
sequence tag clones.
shows positions of point mutations in 3-OST-2
inserts where the sequence differs from clone pJL-2.7. The
bottom of each cDNA set shows the size, location, and
designation of hybridization probes.
Fig. 2.
Composite nucleotide and predicted amino acid
sequences of 3-OST-2. The cDNA sequence was compiled from the
individual 3-OST-2 clones displayed in Fig. 1. The presented structure
corresponds to the allelic variant represented by clone pJL-2.7; the
four point mutations found in clones pJL-2.6 and c-20d10 are indicated
(double underline). Also shown within the nucleic acid
sequence are the locations of presumptive polyadenylation signals
(single underline). Shown within the amino acid sequence are
the hydrophobic region (single underline), the start of the
putative sulfotransferase domain ( ), and predicted sites for
O-linked (*), and N-linked (dot underlined
boldface type) glycosylations.
Fig. 3.
Composite nucleotide and predicted amino acid
sequences of 3-OST-3A. The cDNA sequence was
compiled from the individual 3-OST-3A clones displayed in
Fig. 1. The sequence starting at position 2315 was appended from the
3-OST-3ACTF clone (previously described under
Footnote 4) given that this splice variant contains the complete
3'-untranslated region. Shown within the nucleic acid sequence are
positions of point mutations which differ between 3-OST-3A
and 3-OST-3B (double underline) and the location
of presumptive polyadenylation signals (single underline).
,
, and
are described in the text. Shown
within the amino acid sequence are the hydrophobic region (single
underline), the start of the putative sulfotransferase domain
(
), and predicted sites for O-linked (*), and
N-linked (dot underlined boldface type)
glycosylations.
Fig. 4.
Composite nucleotide and predicted amino acid
sequences of 3-OST-3B. The cDNA sequence was
compiled from the individual 3-OST-3B clones displayed in
Fig. 1. Shown within the nucleic acid sequence are positions of point
mutations which differ between 3-OST-3A and
3-OST-3B (double underline). ,
,
, and are described in the text. Shown within the amino acid
sequence are the hydrophobic region (single underline), the
start of the putative sulfotransferase domain (
), and predicted
sites for O-linked (*), and N-linked (dot
underlined boldface type) glycosylations.
Table I summarizes the major structural
features of all composite cDNA forms. The length of the
5'-untranslated region from the full-length cDNAs varies widely
(from 72 to 798 bp) and all ATG codons within this region are followed
by in-phase termination codons. For each full-length cDNA, the
assigned coding region is by far the longest open reading frame and
begins with an initiation ATG conforming to Kozak's consensus sequence
(a purine at −3 and/or a G at +4) (35). Moreover, each initiator
sequence is preceded by one or more in-phase termination codons. A
consensus polyadenylation signal (AATAAA) occurs within 20-30 bp of
the 3'-untranslated region termini and is followed by a poly(A) tail
for all cDNAs except 3-OST-3B (Fig. 1). This
distinction indicates the 3-OST-3B composite cDNA
contains only an incomplete 3'-untranslated region; especially since
the cDNA is 4.2 kb shorter than its corresponding transcript (Table
I). For the 3-OST-2 cDNA, an alternate site for polyadenylation is
also predicted by an extra signal occurring ~200 bp from the
3'-untranslated region termini.
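The two sequence criteria applied in this paragraph, the Kozak context of the initiator ATG and the distance of an AATAAA signal from the 3' terminus, are simple enough to check programmatically. The sketch below is illustrative only and is not the Genefinder or GCG analysis the authors used; the function names and test sequences are hypothetical. Positions are counted with the A of the ATG as +1.

```python
def has_kozak_context(seq, atg_index):
    """True if the ATG at atg_index has a purine at -3 and/or a G at +4."""
    seq = seq.upper()
    if seq[atg_index:atg_index + 3] != "ATG":
        raise ValueError("no ATG at the given index")
    # The A of ATG is +1: -3 lies three bases upstream, +4 follows the G of ATG.
    minus3 = seq[atg_index - 3] if atg_index >= 3 else None
    plus4 = seq[atg_index + 3] if atg_index + 3 < len(seq) else None
    return minus3 in ("A", "G") or plus4 == "G"


def polya_signal_distances(seq, signal="AATAAA"):
    """Distances (bp) from the end of each signal to the 3' end of seq."""
    seq = seq.upper()
    distances = []
    start = seq.find(signal)
    while start != -1:
        distances.append(len(seq) - (start + len(signal)))
        start = seq.find(signal, start + 1)
    return distances


# Hypothetical sequences, not taken from the 3-OST cDNAs.
print(has_kozak_context("GCCACCATGG", 6))
print(polya_signal_distances("CCC" + "AATAAA" + "G" * 25))
```

A signal whose distance falls in the 20-30 bp range would match the placement described for the consensus polyadenylation signal.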
The composite 3-OST-2 cDNA sequence presented in Fig. 2 was derived
from clones pJL-2.1, pJL-2.2, pJL-2.3, and pJL-2.7 (Fig. 1). However,
clones c-20d10 and pJL-2.6 both differ from the composite cDNA
sequence at four positions (G804→A, T1249→G, T1350→C, C1507→T) (Figs. 1 and
2). These two clones were isolated from different libraries and so the
sequence discrepancies could not have possibly arisen by errors in
reverse transcription or cDNA amplification. Given that the human
3OST2 gene is single copy, described below, these differences indicate
allelic variation. The G804→A transition is the
only coding region variant, but does not alter the amino acid sequence.
The remaining mutations are found in the 3'-untranslated region; thus,
all mutations may be silent.
Most importantly, significant nucleic acid sequence conservation only
occurs for the sulfotransferase domain portion of the cDNAs. Within
this span, each of the novel cDNAs exhibits ~55% identity to
3-OST-1. However, sulfotransferase domains share ~72% identity
between 3-OST-2, -3, and -4 classes. Conservation is most extreme
between 3-OST-3A and 3-OST-3B, with 99.2%
identity over 774 bp that encodes the entire sulfotransferase domain
region of 3-OST-3B (between and
in Figs. 3 and 4).
Immediately after this shared region the 3-OST-3A coding
sequence extends two codons (Gly and Stop), whereas the
3-OST-3B cDNA just encodes a Stop codon. Thus, the
predicted sulfotransferase domain of 3-OST-3A is 1 amino
acid longer than that of 3-OST-3B. The nearly identical regions could have resulted from a single genetic locus by alternative splicing, but only if the nonidentical residues stem from allelic variation. However, this possibility is statistically unlikely (p = 0.016).5
Alternative splicing is completely excluded by genomic restriction mapping, which reveals separate 3-OST-3A and
3-OST-3B genes, as described below (Fig. 6). The profound
conservation of a genomic segment between distinct loci is indicative
of gene conversion, as described below.
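The 99.2% figure follows directly from the 6 point differences over the 774-bp shared region reported for 3-OST-3A and 3-OST-3B. A minimal sketch of that arithmetic, plus a generic identity calculation for two pre-aligned, equal-length sequences, is given below; the function names are hypothetical and the short test strings are not 3-OST sequence.

```python
def identity_from_mismatches(length_bp, mismatches):
    """Percent identity given an aligned length and a mismatch count."""
    return 100.0 * (length_bp - mismatches) / length_bp


def percent_identity(seq_a, seq_b):
    """Percent identity of two pre-aligned, equal-length sequences (no gaps)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


# 6 differences over the 774-bp sulfotransferase domain region.
print(round(identity_from_mismatches(774, 6), 1))
```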
Characterization of 3OST Genomic Loci
Four 3OST3 Genes Exist-- The genomic loci of the various 3OST genes were characterized to identify the origins of these structurally related cDNAs. The copy number of all known 3OST genes was assessed by Southern blot analysis of human genomic DNA with gene specific probes (Fig. 5A). This analysis suggests that 3OST1, 3OST2, and 3OST4 only occur as single copy genes. Heterologous hybridization of the same probes to mouse genomic DNA separately digested with the same 5 restriction enzymes, described under Fig. 5, yielded comparable results (data not shown). The combined analyses strongly argue that both humans and mice possess only single copies of 3OST1, 3OST2, and 3OST4 genes.
Southern analysis targeting distinct gene regions reveals the human 3-OST-3 multigene subfamily. Sulfotransferase domain sequences common to all 3-OST-3 cDNAs were detected with the probe ST-3, whereas 3'-untranslated regions specific to 3-OST-3A or 3-OST-3B cDNAs were detected with probes 3'A or 3'B, respectively (probe locations shown in Fig. 1). The existence of at least two 3OST3 genes was initially suggested by hybridizations to EcoRI-digested genomic DNA. ST-3 revealed two bands, one exclusively detected by 3'A and the other identified only by 3'B (Fig. 5A). Indeed, we have recently identified genomic clones of two distinct genes 3OST3A1 and 3OST3B1, as noted under "Experimental Procedures." However, BstXI digestions suggest greater complexity as ST-3 displayed three bands of about 2.0, 1.1, and 0.5 kb in a 1:2:1 stoichiometry, respectively. 3'A detected both of the weak bands; whereas 3'B identified only the strong band (Fig. 5A). This pattern could only result from just two 3OST3 genes if a single copy 3OST3A gene has an allelic BstXI restriction fragment length polymorphism. Alternatively, if such an allelic polymorphism is not present, then the pattern must result from a minimum of four 3OST3 genes with BstXI sites differing in the two 3OST3A forms but being invariant in the two 3OST3B copies. The possibility of only two 3OST3 genes was excluded by ST-3 probing of BstXI-digested genomic DNA from an additional 16 unrelated individuals. In contrast to an allelic segregation pattern, all samples generated the identical 1:2:1 band pattern described above (data not shown).
Duplication of the amino-terminal portions of 3OST3A and 3OST3B genes was also documented by Southern analyses with the isoform-specific probes SPLAG-A and SPLAG-B, respectively (Fig. 5B). Only some of the detected fragments were predicted from a computer-generated restriction map of 3OST3A1 and 3OST3B1 gene sequences, described above. Consequently, these fragments (A1 and B1 in Fig. 5B) are derived from 3OST3A1 and 3OST3B1. The additional unanticipated bands (A2 and B2) document the duplicated amino-terminal regions of genes 3OST3A2 and 3OST3B2. Single bands were occasionally detected (A1 & 2 and B1 & 2 in Fig. 5), which indicates conservation of sequence within a gene pair. We conclude from the above data that the human genome contains two 3OST3A genes and two 3OST3B genes.
Inspection of genomic clone sequence also reveals that discriminating 0.5- and 2.0-kb BstXI fragments are derived from 3OST3A1 and 3OST3A2 genes, respectively. Similarly, a BamHI polymorphism between ST-3 and 3'B differentiates the gene 3OST3B1 from 3OST3B2.6 Examination of the individual cDNA inserts that encompass these defining regions shows that the 3-OST-3A clone pJL-3.4 derives from 3OST3A1, whereas the 3-OST-3B clones pJL-3.6, -3.7, and -3.9 all originate from 3OST3B1. However, the limited number of analyzed clones is insufficient to exclude functionality of 3OST3A2 and 3OST3B2. Moreover, it remains unclear whether each gene pair produces identical or distinct products.
Southern analysis does not always resolve each member of a human 3OST3 gene pair (e.g. EcoRI of Fig. 5), suggesting a high degree of sequence homology between each pair of copied genes. Accordingly, we assessed the extent of similarity by performing genomic restriction mapping on the 3'-untranslated regions of 3OST3A and 3OST3B forms, given that 3'-untranslated region sequences are typically divergent, even within multigene families (36). The data demonstrate a high degree of identity in the 3'-untranslated regions of each pair of copies; indeed, discrimination between members of each gene pair was not observed with any of the employed enzyme combinations (Fig. 6). This suggests either a very late duplication of 3OST3A and 3OST3B forms, or a concerted mechanism, i.e. gene conversion, to maintain primary structures. We note the murine genome must contain at least one copy of both forms,7 which indicates that human 3OST3A and 3OST3B genes cannot have resulted from late duplication. Accordingly, the human 3OST3 genes have apparently been subjected to gene conversion. At the minimum, gene conversion homogenizes the sulfotransferase domain sequences between human 3OST3A1 and 3OST3B1 loci. It is even possible that conversion maintains the 3' structural similarities between the 3OST3A gene pair and between the 3OST3B gene pair.
Chromosomal Localization of Mouse 3Ost Loci-- The mouse chromosomal location of each 3Ost locus was determined by interspecific back-cross analysis using progeny derived from matings of [(C57BL/6J × M. spretus)F1 × C57BL/6J] mice. This interspecific back-cross mapping panel has been typed for over 2500 loci that are well distributed among all the autosomes as well as the X chromosome (27). C57BL/6J and M. spretus DNAs were digested with several enzymes and analyzed by Southern blot hybridization for informative restriction fragment length polymorphisms using cDNA probes specific for each gene. The strain distribution pattern of each polymorphism in the interspecific back-cross mice was then determined and used to position the 3Ost loci on the interspecific map (Fig. 7).
3Ost1 mapped to the proximal region of mouse chromosome 5, 0.5 centimorgans distal of Msx1 and 3.7 centimorgans proximal of Bst1. 3Ost2 and 3Ost4 mapped to the distal region of chromosome 7: 3Ost2 did not recombine with Pkcb in 165 animals typed in common, suggesting that the two loci are within 1.8 centimorgans (upper 95% confidence limit), and 3Ost4 mapped 2.3 centimorgans distal of this cluster and 0.7 centimorgans proximal of Spn. Finally, 3Ost3a and 3Ost3b mapped to the central region of mouse chromosome 11 and did not recombine with each other in 141 mice typed in common, suggesting the two loci are within 2.1 centimorgans of each other (upper 95% confidence limit). The cluster of the two murine 3Ost3 genes is 3.8 centimorgans distal of Csfgm and 2.4 centimorgans proximal of Myhsf1 on mouse chromosome 11. The very tight linkage between 3Ost3a and 3Ost3b suggests that the genes arose by a tandem duplication event.
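The upper 95% confidence limits quoted for the non-recombining locus pairs can be reproduced with a short calculation. If the true recombination fraction were r, the chance of observing no recombinants among n back-cross animals would be (1 - r)^n, and the limit is the value of r at which this chance falls to 0.05. The Python sketch below assumes this simple binomial model and treats the recombination fraction as a map distance, which is reasonable only for short intervals; the function name is illustrative and is not taken from the original analysis.

```python
def upper_limit_no_recombinants(n_animals: int, confidence: float = 0.95) -> float:
    """Upper confidence limit on the recombination fraction when zero
    recombinants are observed among n_animals (simple binomial model)."""
    alpha = 1.0 - confidence
    # Solve (1 - r) ** n_animals = alpha for r.
    return 1.0 - alpha ** (1.0 / n_animals)


# Animal counts are those reported in the text for each locus pair.
for pair, n in (("3Ost2/Pkcb", 165), ("3Ost3a/3Ost3b", 141)):
    limit_cm = 100.0 * upper_limit_no_recombinants(n)
    print(f"{pair}: within {limit_cm:.1f} centimorgans")
```

Under these assumptions the calculation returns the 1.8 and 2.1 centimorgan limits stated for 165 and 141 animals, respectively.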
We have compared our interspecific map of chromosomes 5, 7, and 11 with a composite mouse linkage map that reports the map location of many uncloned mouse mutations (provided from Mouse Genome Data Base, a computerized data base maintained at The Jackson Laboratory, Bar Harbor, ME). The 3Ost loci mapped in regions of the composite map that lack mouse mutations with a phenotype that might be expected for an alteration in these loci (data not shown).
The proximal region of mouse chromosome 5 shares a region of homology with human chromosome 4p (Fig. 7). Our placement of 3Ost1 in this interval suggests that the human isolog 3OST1 will map to 4p, as well. The distal region of mouse chromosome 7 shares regions of homology with human chromosomes 11p, 16p, and 10q. Both Pkcb and Spn have been mapped to 16p in human chromosomes. The tight linkage in mouse between Pkcb and Spn, and 3Ost2 and 3Ost4 suggests that the human isologs 3OST2 and 3OST4 will also map to human 16p. Indeed, the identification of cloned mapping markers confirms that 3OST2 and 3OST4 localize to human 16p12 and 16p11.2, as described under "Experimental Procedures." Similarly, 3Ost3a and 3Ost3b map between Csfgm and Myhsf1 in mouse. These two latter genes have been assigned to 5q31 and 17pter-p11 in humans, respectively, which suggests the human 3OST3 genes will map to 5q or 17p. The identification of cloned markers resolves this ambiguity and shows that 3OST3A1 and 3OST3B1 both localize to 17p12-p11.2. The 3OST chromosomal regions lack human disorders with a phenotype that might be expected for an alteration in these loci (data not shown).
Tissue and Cell-type Specific Expression of Multiple Transcripts
Northern analyses with isoform-specific probes reveal tissue-specific expression for members of the multigene family (Fig. 8). Moreover, the more ubiquitously expressed members produce multiple transcripts that predominantly show coordinate regulation. 3-OST-4 exhibits the most selective pattern with only a single transcript detected in brain. The transcripts of 3-OST-2 predominantly occur in brain, but low expression is also observed in heart, placenta, lung, and skeletal muscle. Levels of the two 3-OST-1 transcripts are predominant in kidney and brain, intermediate in heart and lung, and low but detectable in the remaining analyzed organs (Fig. 8). The 3-OST-3 forms show the broadest expression pattern and the largest number of transcript forms. Although most tissues express both 3-OST-3A and 3-OST-3B, quantitative differences are evident. For example, the highest expression of 3-OST-3A occurs in heart and placenta, whereas 3-OST-3B is most abundant in liver and placenta. Furthermore, each tissue exhibits a distinct ratio of 3-OST-3A subtypes and 3-OST-3B subtypes. For 3-OST-2, 3-OST-3A, and 3-OST-3B the small transcripts of minor abundance are alternative splice variants that encode the unusual carboxyl-terminal fragments. The characterization of these and additional 3-OST-3 transcript classes shall be provided in a separate communication (as described above).
Interestingly, 3-OST-3 versus 3-OST-2 and 3-OST-4 transcripts show essentially reciprocal tissue expression. In contrast, the tissue-specific pattern of 3-OST-1 overlaps with all other types. However, Northern analysis of RNA samples from immortalized and primary endothelial cells that have previously demonstrated 3-OST-1 transcripts (12) failed to detect expression of 3-OST-2, -3, or -4 isoforms with specific probes (data not shown). Thus, 3-OST forms are also expressed in a cell type-specific fashion. Indeed, a Northern survey of several immortalized nonendothelial cell lines with 3-OST-3 probes reveals cells that express exclusively 3-OST-3A, exclusively 3-OST-3B, or varying proportions of both transcript types (Footnote 8).
Predicted Protein Structures
Extensive data base searching revealed the full-length 3-OST-2, -3A, and -3B enzymes and the partial length 3-OST-4 sequence to all be novel proteins. The 3-OST-2, -3A, and -3B cDNAs predict type II integral membrane proteins (37) of 367, 406, and 390 residues, respectively. Each is comprised of four domains beginning with a short (19-32 residues) amino-terminal cytoplasmic tail that exhibits a net positive charge (3-OST-2, -3A, and -3B contain 32, 12, and 19% basic residues but only 0, 4.2, and 3.1% acidic residues, respectively) and terminates with 2 or 3 basic residues (Figs. 2, 3, and 4). Interestingly, this segment of 3-OST-3B contains a polyproline run of 7 residues (residues 22-28).
The second domain is hydrophobic, has a high probability of forming
α-helix, and is flanked by charged residues; thus, it is anticipated to
function as a membrane spanning segment (Figs. 2, 3, and 4).
Kyte-Doolittle hydropathy analysis reveals this section to be the only
hydrophobic region of sufficient length to cross a membrane. The
lengths of the hydrophobic regions of 3-OST-2 and 3-OST-3A
(22 and 19 residues, respectively) are typical for transmembrane
domains; however, 3-OST-3B has a 33-amino acid stretch of
hydrophobic groups. Although the extent to which the 3-OST-3B hydrophobic region is buried in the membrane
is presently unclear, sequence analysis with trained neural networks
favors a transmembrane helix extending from Leu35 to
Gly53 (21). Interestingly, the hydrophobic regions contain
3, 2, and 5 Cys residues (3-OST-2, -3A, and
-3B, respectively), which is atypical of transmembrane domains.
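For readers who want to repeat the hydropathy scan, the following Python sketch shows a sliding-window Kyte-Doolittle calculation. The index values are the published Kyte-Doolittle scale. The 19-residue window and the 1.6 cutoff are the values conventionally used to flag membrane-spanning segments and are assumptions here, because the text does not state the parameters that were used. The function names are illustrative.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean Kyte-Doolittle score for every full window along seq."""
    scores = [KD[aa] for aa in seq.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def candidate_spans(seq, window=19, threshold=1.6):
    """1-based start positions of windows whose mean score exceeds threshold."""
    profile = hydropathy_profile(seq, window)
    return [i + 1 for i, score in enumerate(profile) if score > threshold]


# Toy input (hypothetical, not a 3-OST sequence): a leucine-rich core
# flanked by charged residues.
print(candidate_spans("KKRKK" + "L" * 20 + "DDEDD"))
```

Applied to a full-length sequence, consecutive start positions returned by `candidate_spans` mark a hydrophobic stretch, and the length of the run indicates whether the stretch is long enough to cross a membrane.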
The third domain ranges from 67 to 104 residues, and is designated as
the SPLAG domain due to an extreme enrichment in Ser, Pro, Leu, Ala, and Gly
(comprising 69, 62, and 70% of third domain residues in 3-OST-2,
-3A, and -3B, respectively). Consequently, this
region is predicted to be predominantly devoid of secondary structure,
with only 4.4, 10, and 13% of contained residues having potential to
form α-helix or β-sheet, for 3-OST-2, -3A, and
-3B, respectively. Thus, this segment is likely to act as a
flexible stem which links the catalytic sulfotransferase domain to the membrane anchor. Only the stem region of 3-OST-2 contains cysteines (two residues present), with Cys55 and Cys73
potentially forming a disulfide bond that generates a peptide loop of
19 amino acids (Fig. 2). Within the SPLAG domain, 3-OST-2 contains a
single potential N-glycosylation site but all enzymes harbor
potential O-glycosylation sites (5, 2, and 6 sites for 3-OST-2, -3A, and -3B) with mucin-like
clustering (Figs. 2-4). A similarly high enrichment of SPLAG residues
occurs in the amino-terminal stretch that abuts the sulfotransferase
domain of the intraluminal resident 3-OST-1, and also in the
putative stem regions of the type II structured NST-1, NST-2, and
heparan sulfate D-glucosaminyl 6-O-sulfotransferase (6-OST) but not in the stem of heparan
sulfate uronosyl 2-O-sulfotransferase (2-OST) (SPLAG
abundance 50, 59, 63, 52, and 21% in residues 21-52, 40-78, 43-83,
23-69, and 28-65, respectively, accession numbers given under Fig.
9). Despite the shared composition, the
SPLAG domains of these enzymes do not show significant homology of the
primary sequences.
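The SPLAG abundances quoted above are simple residue compositions over a stated range. A minimal Python sketch of that calculation follows. The function name and the toy input are hypothetical, and the real sequences would have to be retrieved with the accession numbers given under Fig. 9.

```python
SPLAG = frozenset("SPLAG")


def splag_percent(seq, start=1, end=None):
    """Percentage of Ser, Pro, Leu, Ala, and Gly within residues
    start..end (1-based, inclusive) of a one-letter protein sequence."""
    region = seq.upper()[start - 1:end]
    if not region:
        raise ValueError("empty region")
    hits = sum(1 for aa in region if aa in SPLAG)
    return 100.0 * hits / len(region)


# Toy input (hypothetical): residues 3-7 consist only of SPLAG residues.
print(splag_percent("KKSPLAGKK", 3, 7))
```

Passing a residue range such as 21-52 as `start` and `end` would give the figure for a stem region of that extent.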
The final region of ~260 residues extends to the carboxyl terminus
and is the putative sulfotransferase domain. Although the 3-OST-2,
-3A, and -3B enzymes all show a common regional
organization, only the primary structures of the sulfotransferase
domain show significant homology (Fig. 9A). Indeed, the
3-OST-3A and 3-OST-3B sulfotransferase domains
are almost identical, except the 3-OST-3A form contains an
additional carboxyl-terminal residue (Gly406). As described
above, this identity results from the 3-OST-3A and
3-OST-3B cDNAs exclusively sharing a common
sulfotransferase domain sequence. The entire sulfotransferase domain is
extremely basic (about 20% His, Lys, Arg versus 10% Glu
and Asp); however, this region does not exhibit previously recognized
heparin binding motifs (38). Only two cysteine residues are
present, which are closely spaced and could form a disulfide bond to
generate peptide loops of 13 amino acids, respectively (Figs. 2-4).
The 3-OST-2 and the common 3-OST-3 domains contain 3 and 2 potential
sites for N-glycosylation, respectively, but all show a single
potential O-glycosylation site. Interestingly, all
3-OST enzymes show a conserved potential N-glycosylation
signal just before the potential peptide loop (Fig. 9A,
consensus residues 214-216).
DISCUSSION |
---|
The 3-OST Multigene Family and Heparan Diversity-- Heparan sulfate proteoglycans bearing glycosaminoglycans with distinct fine structures have been implicated in a myriad of biologic roles; however, the means to independently regulate the production of such a broad array of functionally important structures has remained largely unclear. Indeed, such a mechanism is only exemplified by the rate-limiting action of 3-OST-1. To find new candidates for regulating heparan sulfate structure, we identified expressed sequence tag clones homologous to the sulfotransferase domain of 3-OST-1 and subsequently isolated human cDNAs encoding 3-OST-2, -3A, -3B, and an incomplete clone of 3-OST-4. We also obtained novel splice variants encoding carboxyl-terminal fragments, which shall be separately described. Southern analyses revealed a surprisingly extensive multigene family, with 7 human members (3OST1, 3OST2, 3OST3A1, 3OST3A2, 3OST3B1, 3OST3B2, and 3OST4). However, the functionality of 3OST3A2 and 3OST3B2 remains to be established. Localization of the mouse isologs (3Ost1, 3Ost2, 3Ost3a, 3Ost3b, and 3Ost4) and bioinformatic identification of cloned markers predict the chromosomal loci of the corresponding human genes. These analyses suggest that the human genes are not candidates for previously mapped genetic disorders.
Northern analyses show that the human 3-OST genes are differentially regulated in both tissue and cell type-specific fashions, testifying to distinct functional roles. Moreover, multiple transcript sizes occur for most isoforms. Multiplicity has also been observed for the transcripts of heparan biosynthetic enzymes NST-1, 2-OST, and uronosyl C5-epimerase (39-41). Additional mRNAs might engender enhanced regulatory control or distinct functional properties. On one hand, the two 3-OST-1 messages probably differ by alternative splicing within the 5'-untranslated region, which occurs extensively for the murine counterpart (12). Such differences in noncoding regions can provide for differential regulation of translational efficiency or message accumulation (42, 43). On the other hand, alternative splicing within the coding region produces minor transcript variants of 3-OST-2, -3A, and -3B, which encode carboxyl-terminal fragments that likely serve a nonenzymatic function. Presumably, the large number of 3-OST-3 transcripts implies participation in several biologic processes.
Distinct biologic roles for each isoform are also indicated by our elucidation that 3-OST-1, -2, and -3 forms each generate unique 3-O-sulfated structures (34). Given the paucity of 3-O-sulfated glucosaminyl residues within heparan sulfate (7, 23), the novel isoforms may mimic 3-OST-1 by functioning in a critical rate-limiting capacity (5, 10). The newly isolated enzymes should then serve as key regulatory components that enhance the functional diversity of heparan sulfate. We speculate that 3-OST-2 may play a role in the nervous system, whereas the 3-OST-3 isoforms might contribute to the permselectivity of the glomerular basement membrane (elaborated in Ref. 34). However, the extreme complexity of the multigene family suggests these enzymes may serve to modulate a rather diverse array of biologic functions.
Structural Features of the Divergent Amino-terminal Region-- Examination of the deduced structures of the novel enzymes reveals several common as well as distinctive features and provides a foundation for exploring the molecular basis of heparan sequence diversity. The 3-OST-2, -3A, and -3B enzymes are type II integral membrane proteins and so are structurally comparable to all previously cloned glycosaminoglycan biosynthetic enzymes except for 3-OST-1, which has an intraluminal resident style (12, 40, 41, 44-49). The architecture of type II enzymes is akin to that of the glycosyltransferases (46), which show two major functional regions. The large carboxyl-terminal region accounts for most of the intraluminal portion and forms a globular catalytic domain. The smaller amino-terminal region encompasses the cytoplasmic, transmembrane, and flexible stem domains; however, residues from each of these regions have been shown to direct localization to Golgi subcompartments (50). Thus, the entire amino-terminal region may be considered in terms of compartmentalization and protein-protein interactions.
The 3-OST family parallels this division via the conserved carboxyl-terminal sulfotransferase domain and the divergent amino-terminal regions. That these two regions may be functionally discrete is supported by examination of the presumptive Caenorhabditis elegans 3-OST. In this organism, we have identified only a single gene and the encoded enzyme shows features of a primordial 3-OST. Specifically, the sulfotransferase domain is most closely related to the type II enzymes (Fig. 9B); however, the amino-terminal domain shows an intraluminal resident style like 3-OST-1 (Footnote 9). If this hybrid structure represents the primordial enzyme, then the type II amino-terminal domain must have evolved long after the elaboration of a functional sulfotransferase domain. Functional distinctiveness is also favored by the determination that 3-OST-3A and 3-OST-3B generate identical 3-O-sulfated disaccharides (34). Thus, sulfation specificity corresponds to the nearly identical sulfotransferase domains and is not perturbed by the unique amino-terminal regions.
That the amino-terminal region serves a compartmentalization/protein interaction role is supported by an analysis of NST-1, which occurs in the trans-Golgi network. The amino-terminal 161 residues are sufficient for retention within the Golgi (51). Within this region of NST-1, NST-2, and 6-OST the flexible stem shows a SPLAG enrichment comparable to the 3-OST stem region (SPLAG domain). However, the absence of such enrichment in 2-OST suggests that extreme SPLAG enrichment is not strictly necessary for conveying flexibility and the SPLAG domain may thereby participate in an additional process, such as compartmentalization. Such a role could also account for the intraluminal retention of 3-OST-1, which is simply composed of an amino-terminal SPLAG domain fused to a carboxyl-terminal sulfotransferase domain. Compartmentalization/protein interactions may additionally involve residues within the transmembrane region or the cytoplasmic tail. In the first case, the unusual placement of cysteine residues within the transmembrane segment of 3-OST-2, -3A, and -3B raises the possibility of a covalent interaction with a retention partner or with biosynthetic components. Such a role has previously been proposed for the conserved cysteine residue that occurs in the membrane spanning domain of the syndecan-1 core protein (52). In the second case, the cytoplasmic tail of 3-OST-3B contains a polyproline tract. Poly-L-proline can form a rigid left-handed helix and such motifs are critical elements bound by protein interaction modules such as SH3 and WW domains (53-55). In summation, protein-protein interactions within the amino-terminal regions may control the formation of specific heparan sulfate sequences by constraining the enzyme's spatial organization or functional interactions. Consequently, the unique amino-terminal regions of 3-OST-3A and 3-OST-3B may confer distinctive biologic roles on the virtually identical sulfotransferase domains.
Structural Features of the Conserved Sulfotransferase Domain-- 3-OST family members are defined by the highly conserved sulfotransferase domain. The importance of this structure is highlighted by our finding that gene conversion maintains virtually identical sulfotransferase domains between 3OST3A1 and 3OST3B1 genes. Gene conversion occurs in the germ line as a transfer of genetic information from donor to acceptor loci without alteration of the donor material. This process can prevent mutational drift and proceeds quite efficiently between nonallelic loci on the same chromosome (reviewed in Ref. 56), which would be consistent with the proposed 3OST3 multigene cluster. It is especially striking that the limits of the converted DNA sequence correspond exactly to the limits of the sulfotransferase domain of 3-OST-3B.
We have previously employed simultaneous multiple sequence alignment to
show that the sulfotransferase domain of 3-OST-1 is homologous to a
broad range of sulfotransferases, including cytosolic and Golgi enzymes
isolated from animals, plants, and bacteria (12). Critical features are
revealed by extending this comparison to include virtually all known
adenosine 3'-phosphate 5'-phosphosulfate (PAPS) requiring
sulfotransferases found in GenBankTM. Collectively, this group modifies
a broad range of molecules, yet these enzymes show a 260-290-residue
carboxyl-terminal region with at least 25-30% similarity to each
3-OST sulfotransferase domain. Such conservation reflects common
structural and functional constraints imposed by the obligate cofactor
PAPS (57). In particular, we have observed that the consensus sequence
(L/I/V)3-4-X3-5-K-S-G-T-X1-2-(W/L) occurs near the amino terminus of the sulfotransferase domain of all
enzymes (each consensus residue occurs in at least 50 of 66 tested
sequences, minor conservative substitutions not presented). This
consensus predominantly overlaps conserved region I (of cytosolic sulfotransferases) that appears to be a critical active site component, as indicated by affinity labeling with a PAPS analog and by mutational analysis (58-60). The central basic residue, typically lysine (92%), is considered essential for stabilization of a transition state intermediate, as the Lys → Ala mutant of flavonol 3-sulfotransferase dramatically reduces enzymatic activity with minimal effect on PAPS
binding (59). These assertions are confirmed by x-ray crystallography of the estrogen sulfotransferase bound to adenosine 3'-phosphate 5'-phosphate (a PAPS analog), where the consensus region forms a
β-strand/P-loop/α-helix motif. The P-loop corresponds to the underlined tetrapeptide and amide nitrogens from each residue may
hydrogen bond with the 5'-phosphate. Moreover, Nζ from the central lysine neutralizes the negative charge of this phosphate (61). Thus,
the above consensus ascribes a fundamental sulfotransferase structure
that is critically required for both the binding of PAPS and the catalysis
of sulfate transfer. This consensus region is almost invariant
among the human 3-OST enzymes (Fig. 9A) and secondary structure analysis predicts a strand-loop-helix motif for each enzyme.
Moreover, the conserved lysine occurs in all heparan sulfate sulfotransferases and likely serves an equivalent catalytic role. Indeed, alanine mutagenesis of the conserved lysyl residue has recently been
shown to dramatically reduce sulfotransferase activity of 3-OST-1 (Footnote 10) and NST-1
(62).
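The first consensus can be expressed as a regular expression for scanning candidate sequences. The Python sketch below assumes that the numbers in (L/I/V)3-4-X3-5-K-S-G-T-X1-2-(W/L) are repeat counts and that X stands for any residue. The pattern is deliberately strict: because each consensus residue occurs in only 50 or more of the 66 tested sequences, and the central basic residue is lysine in 92% of them, a real search would need to tolerate substitutions. The names `REGION_I` and `find_region_i` and the test sequence are hypothetical.

```python
import re

# (L/I/V)3-4-X3-5-K-S-G-T-X1-2-(W/L), reading the numbers as repeat counts.
REGION_I = re.compile(r"[LIV]{3,4}.{3,5}KSGT.{1,2}[WL]")


def find_region_i(seq):
    """Return (1-based start, matched text) for the first hit, or None."""
    match = REGION_I.search(seq.upper())
    if match is None:
        return None
    return match.start() + 1, match.group()


# Synthetic sequence built to contain the motif (not a real enzyme).
print(find_region_i("AALIVVDGRTKSGTRWAA"))
```

The same approach extends to the second and third consensus sequences, although their lower conservation makes a position-weighted profile a better choice than a strict pattern.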
A second, less well conserved, consensus K-(aliphatic)5-R-N-X2-(D/E)-X3-S-X-Y forms a sheet-turn-helix structure in the estrogen sulfotransferase and side groups from underlined residues interact with oxygens of the 3'-phosphate (61). This region is predicted to form a sheet-loop-helix structure in the 3-OSTs, which would also be consistent with phosphate binding. Recently, Kakuta et al. (57) have similarly noted the importance of the above two regions; however, our analysis additionally reveals a previously unidentified structure, G-X(W/Y)-X2-H-X3-(W/L)2. We have determined that this third sequence maps to a loop-helix structure at the active site and the underlined residues are positioned to approach the 5'-sulfate of PAPS. These interactions could facilitate sulfate binding or enzymatic transfer. Of course, such contacts could not have been crystallographically observed because estrogen sulfotransferase was co-crystallized with the sulfate-free analog adenosine 3'-phosphate 5'-phosphate (61). This potential sulfate interaction region is predicted to also form a loop-helix structure in the 3-OSTs.
Comparison of just the heparan sulfate sulfotransferases allows the designation of three distinct types of sulfotransferase domains (HS1, HS2, and HS3; Fig. 9B). Although four sulfotransferase families are clearly delineated, the N- and 3-O-groupings both possess a related HS1 structure (~50% similarity between families) (Fig. 9B). Presumably, unique features of individual sulfotransferase domains enable discrimination of distinct precursor structures and thereby provide a mechanism for generating and regulating heparan sulfate sequence diversity. In this regard, the HS1 form is distinguished by a carboxyl-terminal region of ~30 residues that contains the presumptive cystine-bridged peptide loop (Fig. 9A, consensus residues 211-240). Within this highly conserved region, cysteines are invariant but the intervening 8-11 amino acids are poorly conserved. Indeed, the peptide loop is structurally distinct for each 3-OST isoform. Thus, this variable loop might serve to discriminate between different heparan sulfate structures and thereby account for the distinct sequences generated by individual 3-OST isoforms (34).
In conclusion, the multiple functions of heparan sulfate proteoglycans
necessitate a biosynthetic mechanism that tightly regulates the
generation of a myriad of distinct heparan sulfate fine structures. The
paradigm of 3-OST-1 shows that a biologic activity of heparan sulfate
can be individually regulated by controlling the level of a
sulfotransferase that contributes a rare modification to complete the
formation of a critical heparan sulfate sequence. The utility of this
mechanism may account for the large number of 3OST genes with distinct
tissue and cell type-specific expression patterns. 3-OST isoforms with
different sulfotransferase domains differentially place the rare
3-O-sulfate in different sequence contexts to presumably
regulate discrete biologic activities. This capacity of the
sulfotransferase domain to generate distinct sequences may in turn be
modulated by distinct amino-terminal domains. The elucidation of the
critical nonconserved and conserved residues which determine the
sequence specificity for sulfation and enzyme interactive
properties is fundamental groundwork toward understanding the
regulated production of defined monosaccharide sequences.
ACKNOWLEDGEMENTS |
---|
We thank Linda M. S. Fritze and Debra J. Gilbert for excellent technical assistance. We are grateful for the technical expertise of Dr. Richard D. Cook and members of the HHMI/MIT Biopolymers Lab as well as Pushba Srivastava of the Molecular Medicine Unit (Beth Israel Deaconess Medical Center) for assistance in automated DNA sequencing. We thank members of the Rosenberg laboratory for insightful comments.
FOOTNOTES |
---|
* This work is supported in part by National Institutes of Health Grant 5-PO1-HL-41484, and the National Cancer Institute, Department of Health and Human Services, under contract with Advanced BioScience Laboratories, Inc. The costs of publication of this article were defrayed in part by the payment of page charges. The article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
The nucleotide sequence(s) reported in this paper has been submitted to the GenBankTM/EMBL Data Bank with accession number(s) AF105374, AF105375, AF105376, AF105377, and AF105378.
¶ To whom correspondence and reprint requests should be addressed. Present address: Angiogenesis Research Center, Beth Israel Deaconess Medical Center, SL-418, 330 Brookline Ave., Boston, MA 02215. Fax: 617-975-5201; E-mail: nshworak{at}caregroup.harvard.edu.
Recipient of an American Heart Association, Massachusetts
Affiliate, Postdoctoral Fellowship.
** Recipient of a National Institutes of Health Postdoctoral Fellowship.
ABBREVIATIONS |
---|
The abbreviations used are: 3-OST, heparan sulfate D-glucosaminyl 3-O-sulfotransferase; NST, heparan sulfate N-deacetylase/N-sulfotransferase; 3Ost, mouse 3-OST genes; 3OST, human 3-OST genes; B, C57BL/6J; S, M. spretus; SPLAG, Ser, Pro, Leu, Ala, and Gly enriched; 2-OST, heparan sulfate uronosyl 2-O-sulfotransferase; 6-OST, heparan sulfate D-glucosaminyl 6-O-sulfotransferase; PAPS, adenosine 3'-phosphate 5'-phosphosulfate; bp, base pair(s); kb, kilobase pair(s).
2 Where -GlcN is (α1→4)D-glucosamine; -GlcA is (β1→4)D-glucuronic acid; -IdoA is (α1→4)L-iduronic acid; NS, 2S, 3S, and 6S, respectively, define sulfate substituents in amino, 2-O, 3-O, or 6-O positions; and Ac is an acetyl group.
3 J. Inazawa, M. Isobe, N. G. Copeland, T. Kaisho, T. Mori, M. Itoh, K. Ishihara, D. J. Gilbert, N. A. Jenkins, and T. Hirano, submitted for publication.
4 The CTF designation denotes alternative splice variants that encode carboxyl-terminal fragments of each respective enzyme. These unusual forms are predicted to be localized to the cytosol and to lack sulfotransferase activity. These splice variants occur as minor transcripts of 1.8 (3-OST-2CTF), 1.6 (3-OST-3ACTF), and 2.4 (3-OST-3BCTF) kb. They shall be described fully in a separate article (N. W. Shworak, J. Liu, and R. D. Rosenberg, manuscript in preparation).
5 Within all five 3-OST-3B clones the region from to contains 6 silent point mutations in wobble positions (C729 → G, A762 → C, C798 → T, C843 → T, C852 → T, and C876 → T), whereas both of the two 3-OST-3A clones lack these mutations (Fig. 1). If these differences reflect the sequences of two distinct alleles of the same gene, then the probability that both 3-OST-3A clones would be of either single allele is 1/2 and the probability that all 3-OST-3B clones would be of the opposite allele is (1/2)^5. Thus, the observed exclusive distribution would randomly occur with a frequency of 1/2 × (1/2)^5 = 1/64 = 0.016. It is therefore extremely unlikely that this exclusive distribution could have resulted from allelic variation of a single gene.
6 On BamHI digests, 3'B detects two equal intensity bands but only one co-hybridizes to ST-3. In addition, ST-3 reveals an unaccounted band that is not detected with 3'B or 3'A. Given that 3OST3B1 lacks a 3' BamHI, the data indicate that 3OST3B2 contains a BamHI site between ST-3 and 3'B (N. W. Shworak, unpublished data).
7 We have found mouse expressed sequence tag clones derived from each gene (GenBankTM accession numbers W14854, W49404, and W71608 from 3Ost3a, AA254888 and AA288201 from 3Ost3b), and have detected the corresponding genes with SPLAG-A and SPLAG-B probes by heterologous hybridization to genomic DNA, as described under "Interspecific Mouse Back-cross Mapping."
8 Exclusive expression of 3-OST-3A occurs in HeLa S3 (cervical carcinoma) and G361 (melanoma) cells; exclusive expression of 3-OST-3B in HL-60 (promyelocytic leukemia), MOLT-4 (lymphoblastic leukemia), and Raji (Burkitt's lymphoma) cells; whereas, both transcript types are found in K-562 (chronic myelogenous leukemia), SW480 (colorectal adenocarcinoma), and A549 (lung carcinoma) cells (N. W. Shworak, unpublished data).
9 C. elegans 3-OST was identified from data banks as described in the legend to Fig. 9. The amino-terminal portion (residues 1-22; MKYRLLLILHLIDLISCGVIPN) shows striking similarities to 3-OST-1. In particular, the short hydrophobic stretch with internal charged residues (single underline) and a potential signal peptidase cleavage site (63) suggest C. elegans 3-OST is an intraluminal resident just like 3-OST-1 (12). Furthermore, residues immediately preceding the sulfotransferase domain are nearly identical between C. elegans 3-OST (double underline) and human 3-OST-1 (residues 44-48, GVAPN).
10 J. Liu, unpublished data.
REFERENCES |
---|