(Received for publication, October 18, 1996, and in revised form, January 8, 1997)
From the Departments of Human Molecular Genetics and ¶ Virology, National Public Health Institute, FIN-00300 Helsinki, Finland and the § Orion Corporation, Orion Pharma, FIN-00700 Helsinki, Finland
Aspartylglucosaminidase (AGA) is a lysosomal
enzyme that catalyzes one of the final steps in the degradation of
N-linked glycoproteins. Here we have analyzed the
tissue-specific expression and regulation of the human and mouse AGA
genes. We isolated and characterized human and mouse AGA 5′-flanking
sequences including the promoter regions. Primer extension assay
revealed multiple transcription start sites in both genes,
characteristic of a housekeeping gene. The cross-species comparison
studies pinpointed an approximately 450-base pair (bp) homologous
region in the distal promoter. In the functional analysis of the human AGA
5′ sequence, the critical promoter region was defined, and an
additional upstream region of 181 bp exhibiting an inhibitory effect on
transcription was identified. Footprinting and gel shift assays
indicated protein binding to the core promoter region consisting of two
Sp1 binding sites, which were sufficient to produce basal promoter
activity in the functional studies. The results also suggested the
binding of a previously uncharacterized transcription factor to a 23-bp stretch in the inhibitory region.
Aspartylglucosaminidase (AGA, EC 3.5.1.26) is a lysosomal hydrolase that catalyzes the cleavage of the N-glycosidic bond between asparagine and N-acetylglucosamine in the degradation of glycoproteins (1). Deficiency of the enzyme leads to an autosomal recessively inherited lysosomal storage disorder, aspartylglucosaminuria (AGU) (2). The human AGA gene has been assigned to chromosome 4q34-35, corresponding to mouse syntenic region 8B, where the mouse gene is located (3). Both cDNAs, each encoding a 346-amino acid AGA polypeptide, have been previously cloned, and the genomic structures of the genes have been resolved (3-6). The 1041-bp coding regions are 84% homologous. Northern hybridization analysis of human control fibroblasts has demonstrated the presence of two mRNA species of 2.2 and 1.4 kb due to the utilization of alternative polyadenylation signals. In mouse liver, only one, even shorter, transcript has been found (3).
AGA is a ubiquitous enzyme widely distributed in mammalian tissues (7). The three-dimensional structure of human AGA has been resolved by crystallization (8). The mature enzyme was shown to be a heterotetramer representing the only known eukaryotic member of the recently described enzyme family of N-terminal hydrolases (9). Furthermore, its intracellular synthesis, assembly, and catalytic function have been well established (10-12). However, only preliminary data exist on the expression of AGA enzyme in normal tissues and in the cells of AGU patients (7). Despite the housekeeping nature of the enzyme, some variation in the expression of AGA protein and in specific AGA activity has been observed between tissues; leukocyte homogenate and liver exhibit the highest levels of AGA activity, whereas brain tissue and fibroblasts display only 10% or less of the AGA activity detected in leukocytes. The distribution of AGA polypeptides has been shown to be similar in tissues from control individuals and AGU patients with the exception of brain samples. No trace of AGA protein has been detected in the cerebral cortex of AGU patients; this finding is in agreement with the clinical phenotype of AGU, in which the most severe symptoms are due to dysfunction in the central nervous system.
The present study was undertaken to investigate the function and
regulation of expression of the AGA gene. We present for the first time
data on the expression of AGA mRNA in various human and mouse
tissues and show that both of the differentially polyadenylated human
mRNAs are translated into a polypeptide. We have also characterized the promoter region of the human AGA gene and performed comparison studies with the mouse AGA 5′ sequence. Following characterization of
the 5′ sequence, we located the areas responsible for transcriptional activity by analyzing serial deletions of the human 5′-flanking sequence in a reporter construct. The binding sites for the
trans-acting regulatory proteins were evaluated employing the DNase I
footprinting assay and gel-shift method.
A PCR-amplified DNA fragment containing the first exon and the
5′-untranslated region of the AGA gene together with the AGA cDNA
were used as 32P-labeled probes to screen a human placenta
genomic lambda phage library (Stratagene). As a result, a DNA clone
containing 400 bp of the first intron of AGA and extending 12 kb
upstream was isolated. A 4.8-kb PstI fragment from the 3′
end of the genomic clone was subcloned into pGEM3Zf(+) vector (Promega)
and sequenced from both strands. The 5′ sequence of the mouse AGA gene
was previously cloned by us (3). Sequence analysis and comparison
studies were carried out with a GCG computer program using Compare,
Dotplot, or Bestfit. Putative binding sites for transcription factors
were identified using Findpatterns and a Tfsites GCG file created by Dr. David Ghosh in a publicly accessible transcription factor
database.
Northern blot analysis was carried out by
using commercially available human and mouse poly(A)+ RNA
membranes (Clontech). The blots were hybridized with a
32P-labeled human or mouse AGA cDNA and β-actin
cDNA (Clontech). To determine the 5′ ends of human and mouse AGA
transcripts, total RNA was isolated by the guanidine thiocyanate/CsCl
method from cultured fibroblasts of normal human individuals and from
normal mouse liver tissue as described previously (13). Primer
extension of 15 µg of mouse liver and human fibroblast total RNA was
performed with 32P-end-labeled oligonucleotides
complementary to the human AGA gene region nt −138 to −169 (relative
to ATG) and to the mouse AGA gene region nt +33 to +1 as described
(14).
HeLa, N18 glioblastoma, and COS-1 cells were grown for 24 h in Dulbecco's modified Eagle's medium supplemented with 10% fetal calf serum and antibiotics. The 85% confluent cells were transfected with 5 µg of plasmid DNA using Lipofectin reagent (Life Technologies, Inc.) as described previously (15).
Construction of Human AGA/hGH Reporter Plasmids
PCR-amplified genomic fragments containing progressive
deletions of the 5′-flanking sequence were subcloned into the
HindIII/SalI site of a promoterless plasmid pØGH
(Nichols Institute, San Juan Capistrano, CA) containing the human
growth hormone (hGH) gene (16). The resulting chimeric constructs were
AGA(+232 to +279)hGH, AGA(+156 to +279)hGH, AGA(−2 to +279)hGH,
AGA(−143 to +279)hGH, AGA(−322 to +235)hGH, AGA(−474 to +235)hGH,
and AGA(−968 to +279)hGH, where +1 corresponds to the transcription
initiation site.
HeLa and glioblastoma cells were transfected with the reporter plasmid constructs. A pXGH5 (Nichols Institute) plasmid containing the mouse metallothionein-I promoter fused to hGH structural sequences was used as a control. After a 48-h incubation, aliquots of media were collected and assayed for hGH protein in duplicate by using a commercially available radioimmunoassay (Nichols Institute). Dot blot hybridization analysis with a 32P-labeled reporter plasmid was used to monitor the differences in transfection efficiencies (17).
Preparation of Nuclear Extracts
Nuclear extracts from HeLa cells were prepared as described previously (18). Protein concentration was determined according to Bradford (19). A commercially available HeLa cell nuclear extract (Promega) was used in some assays.
DNase I Footprinting
The probes for DNase I footprinting
analysis were prepared essentially as described previously (20).
Appropriate genomic regions of the AGA promoter DNA were PCR amplified
and subcloned into pGEM3Zf(+) vector (Promega). The construct AGA(−2
to +279)hGH was used for probe FP1. The regions of human AGA promoter
analyzed in DNase footprinting were FP1 (−2 to +279), FP2 (−247 to
+58), and FP3 (−471 to −199) numbered relative to the major
transcription start site. Both template and sense strand probes were
analyzed (only reactions with the sense strand are shown). DNase I
footprint assays with 5-20 µg of HeLa cell nuclear extract and 1 footprinting unit of Sp1 protein were carried out essentially as
described (20, 21). When Sp1 protein was used, nonspecific competitor poly(dI-dC) DNA was not added. The digestion was performed with 0.1-0.6 units of DNase (Promega).
Sp1 (5′-GGGCGCCAGGCGGGCGGGGC), Inh (5′-TAGGCCGTTTCTGTTTTTCTTCC), and an
unrelated competitor (5′-AGGAAGTGCTACAAAAAGCTGTGGTG) oligonucleotides
and their complementary strands were synthesized by an automatic DNA
synthesizer, purified on a 7 M urea, 15% polyacrylamide gel,
and annealed in 10 mM Tris-HCl, pH 7.4, 150 mM
NaCl, 10 mM MgCl2. The Sp1 consensus
oligonucleotide (5′-ATTCGATCGGGGCGGGGCGAGC) used in competition
assays was purchased from Promega. The probes were labeled with
[γ-32P]ATP (Amersham Life Science, Inc.) and T4
polynucleotide kinase (Pharmacia Biotech Inc.). The binding reactions
were performed as described previously except that in Sp1 assays,
instead of 5 µg of HeLa nuclear extract, 1 footprinting unit of Sp1
protein (Promega) was used (22).
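The complementary strand of each probe follows directly from the sequence listed above. As a minimal sketch (the helper name `reverse_complement` and the dictionary `OLIGOS` are illustrative and not part of the original methods), the strand annealed to each oligonucleotide could be derived as follows:

```python
# Illustrative helper; not part of the original protocol.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


# Oligonucleotide sequences exactly as given in the text (5'->3').
OLIGOS = {
    "Sp1": "GGGCGCCAGGCGGGCGGGGC",
    "Inh": "TAGGCCGTTTCTGTTTTTCTTCC",
    "unrelated": "AGGAAGTGCTACAAAAAGCTGTGGTG",
}

if __name__ == "__main__":
    for name, seq in OLIGOS.items():
        # Print each probe next to the strand it would be annealed to.
        print(name, seq, reverse_complement(seq))
```

The function only handles the unambiguous bases A, C, G, and T; any other character is passed through unchanged.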
To study AGA mRNA expression in diverse tissues,
Northern blot analysis of ten different human and mouse tissues was
performed using commercially available multitissue membranes (Fig.
1). AGA mRNA is detected in all human and mouse
tissues studied, except in mouse brain and spleen where mRNA levels
are virtually undetectable. In human brain, only the longer transcript
is expressed. Since low enzyme levels have been detected in brain (7),
we further explored whether the longer 2.2-kb mRNA is translated
into a polypeptide. The polyadenylation signals for the shorter
mRNA were destroyed by site-directed mutagenesis, and the mutant
construct coding for only the longer 2.2-kb mRNA was in
vitro expressed in COS-1 cells. A shorter construct containing AGA
cDNA was used as a control. Immunoprecipitation analysis
demonstrated that the longer mRNA also produces polypeptide (data
not shown).
Determination of Transcription Start Sites of the Human and Mouse AGA Genes
Accurate mapping of the 5′ ends of the human and mouse
AGA genes was accomplished by using primer extension (Fig. 2).
In the human AGA gene, one major transcription start
site at −298 (relative to the ATG translation start codon) and two minor
sites at −286 and −395 were detected. The initiation of transcription in
the mouse AGA gene was scattered over a larger region. Results displayed
multiple transcription start sites between nucleotides −70 and −142
(relative to ATG). No major transcription start site was present.
Isolation and Characterization of the 5′
To isolate the 5′ regions of the human AGA gene,
a human genomic λ phage library was screened using PCR-amplified
genomic and cDNA fragments of AGA as probes. Finally, a 4.8-kb
fragment upstream from the first intron of the AGA gene was subcloned
into a plasmid vector and sequenced to produce 3.9 kb of novel 5′
AGA sequence. The mouse AGA gene together with its 5′-flanking region has
been recently cloned (3). Here we have sequenced a total of 1000 bp of the
5′ upstream region of the mouse AGA gene. A computerized analysis (GCG
program) of the human AGA sequence revealed two complete Alu-repeats,
one direct and one inverted (data not shown). The sequence homologies
of the repeats to the Alu consensus sequence were 82% and 88%,
respectively. The GC content of the human AGA 5′-untranslated region
was determined to be 58%, while in the coding region of AGA it was
46%. The GC contents of the mouse AGA 5′-untranslated region and
coding region were 61% and 47%, respectively.
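GC content of the kind reported above is a simple base count. The following is a minimal sketch, not the GCG routine used in this study; the function name `gc_content` is an assumption, and ambiguous bases are ignored by choice.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C among the unambiguous bases (A, C, G, T).

    Ambiguity codes such as N are excluded from both the numerator and
    the denominator; this is a choice made for this sketch.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or T bases")
    return 100.0 * gc / total


if __name__ == "__main__":
    # The Sp1 probe from "Materials and Methods" serves as a short input.
    print(gc_content("GGGCGCCAGGCGGGCGGGGC"))
```

Applied separately to a 5′-untranslated region and a coding region, a function of this kind would give percentages comparable to those quoted in the text, provided the same sequence boundaries are used.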
The alignment of human and mouse 5′-flanking sequences by the GCG
computer program demonstrated 58.2% homology (Fig. 3A). Subsequent comparison by two different
programs, linear sequence, and dot matrix analyses displayed a region
of highest homology covering 442 bp from nt −475 to −916 (relative to the
ATG translation start codon) in the human and 453 bp from nt −550 to
−1002 in the mouse AGA gene (Fig. 3, A and B).
The sequence identity in this particular region was 76.5%.
Unexpectedly, approximately 500 bp of the human and mouse AGA gene
immediately upstream of the translation initiation site were
significantly less homologous than the sequence further upstream. To
ascertain that this was not due to any cloning artifact, the human and
mouse AGA 5′ regions were PCR amplified and sequenced from genomic DNA.
No changes as compared with the genomic λ clones could be detected
(data not shown). More detailed analysis of the promoter sequences
revealed several putative binding sites for transcription factors that are indicated in Fig. 3A.
Fig. 3. Sequence analysis of the human and mouse AGA
5′ regions. A, optimized alignment of the human and mouse 5′
AGA sequences extending approximately 900 bp upstream of the ATG
translation initiation codon. The large arrow indicates the
major transcription initiation site of the human AGA gene while two
minor start sites are indicated by smaller arrows. The human
sequence is numbered relative to the major transcription
start site. A triangle depicts 50 bp of nonhomologous
sequence in the mouse AGA gene, and dots indicate where gaps
have been placed for optimal alignment. The region of the highest
homology (76%) is boxed. The inhibitory region detected in
functional analyses overlaps with the sequence of the highest homology and is marked by bold
lettering. Putative sites for transcription factor binding found
in both the human and mouse sequences are bordered by a box.
Binding site motifs found only in either of the sequences are indicated
by brackets above (human) or below (mouse) the
sequences. Putative binding motifs indicated are TATA (45), CAAT (45),
Sp1 (45, 46), AP-1 (45), AP-2 (45, 47), Ecr (48), PEA3 (45), Ets-1 (45), HNF-5 (45), XRE (49), TFIID-EIIa (50), histone H4 (51), GH-CSE2
(52), and NF-κB (45). B, dot matrix analysis by GCG program
using Compare and Dotplot. The window size was 21 nucleotides, and the
stringency was 14. C, a diagram illustrating the overlapping
of the highest homology region and the inhibitory region detected in
the functional analyses of the human AGA gene. The sequence is
numbered relative to the major transcription start site of
the human AGA gene.
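The window and stringency parameters quoted for panel B can be illustrated with a simplified dot matrix routine. This is a sketch under stated assumptions, not the GCG Compare/Dotplot implementation: the function name `dot_matrix` is hypothetical, a dot is recorded at the window start position, and only exact base identities are scored.

```python
from typing import List, Tuple


def dot_matrix(seq_a: str, seq_b: str,
               window: int = 21, stringency: int = 14) -> List[Tuple[int, int]]:
    """Return (i, j) window start positions where at least `stringency`
    of `window` aligned bases are identical between seq_a and seq_b."""
    if window <= 0 or not 0 < stringency <= window:
        raise ValueError("require window > 0 and 0 < stringency <= window")
    a, b = seq_a.upper(), seq_b.upper()
    dots = []
    # Slide a fixed-length window along both sequences (naive O(n*m*w)).
    for i in range(len(a) - window + 1):
        for j in range(len(b) - window + 1):
            matches = sum(1 for k in range(window) if a[i + k] == b[j + k])
            if matches >= stringency:
                dots.append((i, j))
    return dots


if __name__ == "__main__":
    # Small toy input; real use would pass the two 5' flanking sequences.
    for i, j in dot_matrix("ACGTACGT", "ACGTACGT", window=4, stringency=4):
        print(i, j)
```

Plotting the returned coordinates gives a diagonal wherever the two sequences share a stretch of similarity. Lowering the stringency relative to the window size admits more background dots.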
Functional Analysis of Human AGA Promoter Region
To define
the regions accounting for transcriptional activity, seven deletion
constructs consisting of variable lengths of the 5′ region of the human
AGA gene were produced (Fig. 4). The fragments including
putative regulatory elements were inserted into a promoterless hGH
reporter plasmid. HeLa and glioblastoma cells were transiently
transfected with the fusion genes, and the transcriptional efficiency
of each construct was determined by measuring the amount of hGH
secreted into the culture medium. The highest transcriptional
efficiencies were obtained with constructs AGA(−143)hGH in HeLa cells
and AGA(+156)hGH in glioblastoma cells (Fig. 4). In HeLa cells, a
deletion extending to nt +232 completely abolished the transcriptional
activity. The construct AGA(+156)hGH containing three putative Sp1
binding sites restored 36% activity while the construct containing 143 bp upstream of the transcription initiation site was sufficient to
produce the highest promoter activity. The activity observed with
AGA(−322)hGH was only 22%, suggesting that the region spanning nt
−322 to −143 may bind a negatively acting transcription factor (Figs.
3A and 4). This region overlaps with the highest homology
area between the human and mouse sequences (Fig. 3C). In
glioblastoma cells, the inhibitory effect was milder and detected over
a relatively larger area extending from nt −474 to −143.
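Two numbering schemes are used in this report: positions relative to the ATG translation start codon and positions relative to the major transcription start site. The sketch below shows one way to convert between them. It assumes that neither scheme has a position 0, that the A of ATG is +1, and that the major start site lies at −298 relative to ATG as stated above. All function names are illustrative.

```python
MAJOR_TSS_REL_ATG = -298  # major human start site relative to ATG (from the text)


def _to_offset(pos: int) -> int:
    """Map a position in a numbering without 0 onto a continuous axis."""
    if pos == 0:
        raise ValueError("position 0 does not exist in this numbering")
    return pos if pos < 0 else pos - 1


def _from_offset(offset: int) -> int:
    """Inverse of _to_offset."""
    return offset if offset < 0 else offset + 1


def tss_to_atg(pos_tss: int, tss_rel_atg: int = MAJOR_TSS_REL_ATG) -> int:
    """Convert a start-site-relative position to an ATG-relative position."""
    return _from_offset(_to_offset(pos_tss) + _to_offset(tss_rel_atg))


def overlap(a, b):
    """Return the shared interval of two (start, end) ranges, or None."""
    lo = max(min(a), min(b))
    hi = min(max(a), max(b))
    return (lo, hi) if lo <= hi else None


if __name__ == "__main__":
    inhibitory = (tss_to_atg(-322), tss_to_atg(-143))
    homology = (-916, -475)  # human region of highest homology, relative to ATG
    print(inhibitory, overlap(inhibitory, homology))
```

Under these assumptions the inhibitory region corresponds to approximately nt −620 to −441 relative to ATG, so that it shares its upstream part with the 3′ end of the highest homology region.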
Protein Binding Elements of the AGA Promoter
To determine
whether the differences observed in the deletion analysis were related
to the actual binding of nuclear proteins, three fragments, FP1-FP3,
from the 5′-flanking region of the human AGA gene were analyzed by
DNase I footprinting assays using purified Sp1 protein or a nuclear
protein extract prepared from HeLa cells (Fig. 5,
A-D). The locations of the
protected fragments were determined from adjacent dideoxy sequencing
reactions. With probe FP1, a protected region from nt +214 to +240 was
detected using the Sp1 protein (Fig. 5B). This region
contains two overlapping Sp1 consensus binding sites (Fig.
3A). With HeLa cell nuclear extract, the footprint is seen
in a more restricted region. With probe FP2, no detectable protected
regions were observed. Probe FP3, overlapping the inhibitory region
identified in the functional analysis, revealed a protected area from
nt −321 to −292 (Fig. 5D).
Binding of nuclear proteins to the protected regions was further
assessed by gel retardation assays. A 20-bp double-stranded oligonucleotide, nt +207 to +226 (5′-GGGCGCCAGGCGGGCGGGGC), containing two Sp1 binding sites and overlapping the region protected in footprinting analysis
with FP1 was analyzed with purified human Sp1 protein. The results show
formation of a specific complex, which completely disappears in the
presence of 100-fold molar excess of an unlabeled Sp1 consensus
oligonucleotide (Fig. 6A). Analysis of the
protected region detected with probe FP3 in the inhibitory region using a 23-bp double-stranded oligonucleotide, Inh, nt −322 to −300 (5′-TAGGCCGTTTCTGTTTTTCTTCC), and HeLa cell nuclear extract also revealed one DNA-protein complex (Fig. 6B). In competition
assays with an unlabeled Inh oligonucleotide, a gradual decrease in the intensity of the complex is seen as the concentration of the
oligonucleotide is increased. In contrast, the intensity of the complex
remains unaltered with increasing amounts of an unrelated competitor. This distinct difference detected between the assays with a
self-competitor and an unrelated competitor suggests binding of
proteins to this particular area, but no precise consensus motifs for
known factors could be identified by computer analysis.
The human and mouse AGA genes were found to be expressed in diverse tissues, consistent with the housekeeping role of the enzyme. In mouse brain and spleen, the AGA mRNA was virtually undetectable as judged by the steady-state mRNA levels. However, we have previously shown that AGA-specific mRNA is also present in mouse brain (3). Northern hybridization of the human brain RNA visualized only the longer AGA transcript, which we observed to produce polypeptide as well. To further evaluate this finding, the precise half-lives of the two forms of mRNA should be analyzed. The transcription start sites of the human and mouse genes were also quite characteristic of a housekeeping gene; multiple start sites were detected. In mouse, however, start site utilization is less well defined. This could imply that the regulation of AGA has gained more importance during evolution and needs to be more strictly controlled in human.
To analyze the regulation of AGA expression, we isolated the
5′-flanking region of the human AGA gene and compared it with the
recently cloned mouse AGA 5′ sequence (3). The human sequence contained
an unusually high number of Alu repeats (23), which might be involved
in sequence rearrangements. The results of the comparison studies of
human and mouse 5′ sequences were quite surprising; no significant
sequence homology was detected up to 500 bp upstream of the translation
initiation codon. Similarly, no conserved proximal promoter
elements could be identified. Nevertheless, both human and mouse AGA 5′
regions are relatively GC-rich, containing several putative Sp1 binding
sites. In the human AGA promoter, no TATA box relative to the major
transcription start site is present, suggesting that the gene is
regulated by a housekeeping-type promoter. There is, however, a
TATA-like sequence at −28 from one of the minor start sites, but it is
probably nonfunctional, since the region was not protected in the
footprinting analysis. Conventionally, housekeeping genes involved in
the metabolic functions of the cell are considered to be GC-rich and
to lack a TATA box (24, 25). Many genes encoding lysosomal enzymes
fulfill these criteria (26-32), but the human glucocerebrosidase, mouse
β-hexosaminidase Hexb, and murine β-glucuronidase genes do have
TATA elements (31, 33, 34). The lysosomal cathepsin D gene contains a
mixed promoter, which has features of a housekeeping gene as well as a
functional TATA box, when it is under estrogen regulation (35).
A number of TATA-less genes have been reported to contain initiator
elements (Inr) for determination of the transcription initiation site.
A loose sequence consensus, 5′-YYCAYYYYY-3′ (Y is pyrimidine), for
these elements was noted several years ago (36). Smale and
Baltimore (37) further restricted the consensus to 5′-CTCANTCT-3′
(transcription initiation at A) in the murine terminal deoxynucleotidyl
transferase promoter. Two other types of initiators, YY1, binding a
consensus sequence 5′-AANATGGN(G/C)-3′ (38, 39), and E2F, which binds
the sequence 5′-TTTCGCGC-3′ in the dihydrofolate reductase promoter,
have also been identified (40-42). In two genes coding for lysosomal
enzymes, human β-glucuronidase and mouse HEXA, Inr sequence
homologies have been detected (19, 26). Some homology is seen in the
sequence at the human AGA major transcription initiation site
(5′-TTCCCAATAT-3′, initiation at the second T) as well, but the
transcription initiation takes place at T instead of A. The presence of
the major transcription initiation site in the human AGA gene would
justify the existence of an Inr element, but it would have a somewhat
modified consensus sequence.
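Comparison of a start-site sequence with a degenerate consensus such as those quoted above can be expressed as a mismatch count under the IUPAC nucleotide codes. The following is an illustrative sketch only; the function names are assumptions, and this simple ungapped scoring is not the method used to assess homology in this study.

```python
# IUPAC nucleotide codes mapped to the bases they permit.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def mismatches(consensus: str, window: str) -> int:
    """Count positions where the base is not permitted by the consensus."""
    if len(consensus) != len(window):
        raise ValueError("consensus and window must have equal length")
    return sum(1 for code, base in zip(consensus.upper(), window.upper())
               if base not in IUPAC[code])


def best_alignment(consensus: str, seq: str):
    """Return (offset, mismatch count) of the best ungapped placement of
    the consensus along seq; ties resolve to the leftmost offset."""
    n = len(consensus)
    if len(seq) < n:
        raise ValueError("sequence is shorter than the consensus")
    scored = [(mismatches(consensus, seq[i:i + n]), i)
              for i in range(len(seq) - n + 1)]
    count, offset = min(scored)
    return offset, count


if __name__ == "__main__":
    site = "TTCCCAATAT"  # human AGA major start-site sequence from the text
    for consensus in ("YYCAYYYYY", "CTCANTCT"):
        print(consensus, best_alignment(consensus, site))
```

Because the scoring is ungapped and weights every position equally, it gives only a rough measure of similarity to an Inr element.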
The functional analyses of the human AGA gene demonstrated that the
first 145 bp upstream of the translation initiation were sufficient to
produce the highest promoter activity in glioblastoma cells. In HeLa
cells, this region containing three putative Sp1 binding sites
exhibited 36% activity, which can be considered as a basal promoter
region since the activity clearly exceeded (1.5-fold, data not shown)
the activity of the pXGH5 control plasmid (see "Materials and
Methods"). However, in HeLa cells, an additional factor binding
upstream seems to be required for the highest promoter activity, which
is achieved with construct AGA(−143)hGH. The activity observed with
construct AGA(−2)hGH is not significantly lower either, implying
that the putative binding site for AP-2 could be responsible for this
enhanced activity (Fig. 4). Additional consensus sites for CAAT box,
AP-1, and Sp1 found in the upstream sequence may be contributing
factors. Moreover, the analyses pointed out a 181-bp region displaying
a strong inhibitory effect on the reporter expression in HeLa cells.
This region maps to the 3′ end of the human-mouse homologous sequence,
possibly suggesting that this stretch of DNA may play an important role
in the regulation of AGA. In glioblastoma cells, a weaker inhibitory
effect in a larger region was detected. It can be speculated that the
expression of AGA is kept low under normal conditions and only in
certain situations, when it is needed in higher amounts, will the
inhibitory control decline leading to enhanced AGA gene
transcription.
Footprinting and gel-shift assays demonstrated binding of Sp1 protein to the same region that was sufficient in the functional analyses to provide the highest promoter activity in glioblastoma cells and basal activity in HeLa cells. Pugh and Tjian have concluded that the same set of basic initiation factors are required in the presence and absence of a TATA sequence, and that Sp1 acts to recruit TFIID to TATA-less promoters (43, 44). In the inhibitory region of AGA, a protected area was identified in the footprinting analysis, and the binding of protein(s) was further supported by gel-shift assays. In the competition assays, not all of the protein was competed off, unlike in the Sp1 assays, most probably due to the more complex composition of HeLa nuclear extract. Future identification and purification of bound protein(s) is a prerequisite for detailed characterization of the inhibitory interaction. Moreover, detection of few protected regions in the footprinting assays may be due to weak DNA-protein interactions rather than to their complete absence.
In conclusion, the human aspartylglucosaminidase gene appears to be regulated by a core promoter consisting of two functionally important Sp1 binding sites and, possibly, an additional contributing AP-2 site. Moreover, a more distantly located region exhibiting inhibitory control on gene expression was detected. Subsequent studies in neuron cultures and in the AGU knock-out mouse model will be relevant to further characterize the regulation of the AGA gene, especially in neuronal tissues. The results presented here facilitate the elucidation of molecular pathogenesis of AGU disease and are essential for strategy design of potential gene therapy in the disease.
The nucleotide sequences reported in this paper have been submitted to the GenBank™/EMBL Data Bank with accession numbers U82618 for the human and U82617 for the mouse.
We thank Dr. Taina Pihlajaniemi from the University of Oulu for supplying the hGH plasmids.