(Received for publication, September 27, 1996, and in revised form, December 17, 1996)
From the Department of Pathology and Kaplan
Comprehensive Cancer Center, New York University Medical Center, New
York, New York 10016 and the § Department of Biological
Sciences, SUNY, Albany, New York 12222
We previously purified a bovine pyrimidine
hydrate-thymine glycol DNA glycosylase/AP lyase. The amino acid
sequence of tryptic bovine peptides was homologous to Escherichia
coli endonuclease III, theoretical proteins of
Saccharomyces cerevisiae and Caenorhabditis elegans, and the translated sequences of rat and human
3′-expressed sequence tags (3′-ESTs) (Hilbert, T. P., Boorstein, R. J.,
Kung, H. C., Bolton, P. H., Xing, D., Cunningham, R. P., Teebor, G. W. (1996) Biochemistry 35, 2505-2511). Now the human 3′-EST
was used to isolate the cDNA clone encoding the human enzyme,
which, when expressed as a GST-fusion protein, demonstrated thymine
glycol-DNA glycosylase activity and, after incubation with
NaCNBH3, became irreversibly cross-linked to a thymine
glycol-containing oligodeoxynucleotide, a reaction characteristic of
DNA glycosylase/AP lyases. Amino acids within the active site, DNA
binding domains, and [4Fe-4S] cluster of endonuclease III are
conserved in the human enzyme. The gene for the human enzyme was
localized to chromosome 16p13.2-.3. Genomic sequences encoding putative
endonuclease III homologues are present in bacteria, archeons, and
eukaryotes. The ubiquitous distribution of endonuclease III-like
proteins suggests that the 5,6-double bond of pyrimidines is subject to
oxidation, reduction, and/or hydration in the DNA of organisms of all
biologic domains and that the resulting modified pyrimidines are
deleterious to the organism.
When a pyrimidine residue in cellular DNA becomes modified by
oxidation, reduction, or hydration of its 5,6-double bond, repair is
initiated by a DNA-glycosylase activity that cleaves the
N-glycosyl bond of the damaged residue, releasing the
modified base and creating an abasic (AP) site in the DNA backbone.
Such DNA glycosylase activities have been identified in bacteria,
yeast, and mammalian species (1-8). The first such enzyme described was
Escherichia coli endonuclease III, which was identified not
on the basis of its DNA glycosylase activity, but rather because it
nicked UV-irradiated DNA (9). For this reason it was termed an
endonuclease, because it was thought that nicking resulted from
enzyme-catalyzed hydrolysis of internucleotide phosphodiester bonds at
sites of DNA damage. It has since been determined that the enzyme nicks
DNA not via hydrolysis, but by catalyzing β-elimination of the
3′-phosphate group at the AP site formed as a result of the enzyme's
DNA glycosylase activity (10-12). The modified base that was
enzymatically released from UV-irradiated DNA proved to be cytosine
and/or uracil hydrate (8). Enzymes that effect base release together
with strand cleavage via β-elimination are now termed DNA
glycosylase/AP lyases and, in addition to endonuclease III, include
the Fpg protein of E. coli (13), the OGG1 protein of
Saccharomyces cerevisiae (14, 15), and T4 endonuclease V
(16).
DNA glycosylase/AP lyases function through N-acylimine
(Schiff's base) enzyme-substrate intermediates (17). Such
enzyme-substrate intermediates can be chemically reduced to stable
secondary amines, resulting in irreversible cross-linking of the
enzymes to their particular substrates (13, 16-18). We previously used
this cross-linking reaction to definitively identify a pyrimidine
hydrate-thymine glycol DNA glycosylase/AP lyase purified from calf
thymus. Incubation, done under reducing conditions, of a
32P-labeled oligodeoxynucleotide containing a single
thymine glycol (5,6-dihydroxy-5,6-dihydrothymine) residue with a
5000-fold purified enzyme preparation resulted in cross-linking of a
predominant 31-kDa protein to the oligodeoxynucleotide as determined by
SDS-PAGE analysis and phosphor imaging.
Tryptic digestion of this protein, followed by microsequencing of
several of the resulting peptides demonstrated that the bovine enzyme
was homologous to theoretical proteins translated from the genomic DNA
of S. cerevisiae and Caenorhabditis elegans. Both
of these theoretical proteins, in turn, were homologues of E. coli endonuclease III. The bovine peptide amino acid sequences
were also homologous to the translated sequences of 3′-ESTs from
H. sapiens brain tissue (GenBank accession number F04657) and
Rattus sp. PC 12 cells (GenBank accession number H33255) (18). In
the current study, we used probes based upon the homologous human
3′-EST to isolate clones that encode the human homologue of E. coli endonuclease III from a splenic cDNA library. Once
determined, the cDNA sequence was used to express the enzyme as a
functional recombinant protein and to determine the chromosomal
localization of the human gene.
[α-32P]dCTP (3000 Ci/mmol), [γ-32P]dATP (3000 Ci/mmol), and
[methyl-3H]TTP (70-90 Ci/mmol) were purchased
from DuPont NEN.
Oligodeoxynucleotides based upon
the human 3′-EST sequence (GenBank accession number F04657) were used to
isolate homologous clones from a Superscript human spleen cDNA
library in the pCMV-SPORT plasmid vector (Life Technologies, Inc.)
using the GENETRAPPER cDNA positive selection system (Life
Technologies), according to the manufacturer's protocol. Briefly, the
amplified double-stranded cDNA library was made single-stranded by
treatment with the Gene II product (phage F1) endonuclease and E. coli exonuclease III and then hybridized to a biotinylated sense
strand-specific oligodeoxynucleotide, P1
(5′-GTGGCACGAGATCAATGGACTCTTG). The cDNA-oligodeoxynucleotide hybrids were captured using streptavidin paramagnetic beads.
Nonspecifically bound cDNAs were washed away at high stringency,
and specifically bound cDNAs were eluted from the paramagnetic
beads by denaturing the cDNA-oligodeoxynucleotide hybrids. Selected
cDNA clones were then made double-stranded via repair, which was
primed by a second sequence-specific oligodeoxynucleotide, P2
(5′-ATCATTGGACTCTGGGTGGGC). The selected repaired plasmids were
electroporated into the E. coli strain DH5α and plated
onto Lennox L agar plates containing 50 µg/ml ampicillin (LB/amp
agar).
After 20 h of incubation at 37 °C, colonies were analyzed for
the presence of the desired cDNA insert via colony PCR, according to the manufacturer's protocol, using a second set of 3′-EST-specific primers (P3, 5′-CAACAGGCGTGGCTTCCTGAAGCG; P4,
5′-GGTGGGCTTCGGCCAGCAGACCTGT) to maximize specificity of the selection
procedure. PCR was conducted as follows: 1 cycle of 95 °C for 2 min
and 37 cycles of 94 °C for 1 min, 60 °C for 1 min, 72 °C for 1 min, followed by a final cycle of 10 min at 72 °C. PCR products were
then analyzed by electrophoresis in a 1.2% agarose gel. Colonies that
proved positive through the first PCR, by virtue of the production of a
180-base pair product, were subjected to a second round of colony PCR
in order to determine the size of the inserts using T7 and SP6-specific
primers (5′-TAATACGACTCACTACTATAGGAGA and 5′-AGCTATTTAGGTGACACTATAG,
respectively). Of the 23 colonies obtained, 10 proved, through colony
PCR and sequencing analysis, to contain the sequence of interest.
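The colony-PCR program described above can be written down as a small data structure, which makes the cycle count and total programmed time easy to check. This is only an illustrative sketch: the class and function names are hypothetical, and the total ignores ramp times between temperatures, which depend on the thermal cycler.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One temperature hold in a thermal-cycler program."""
    temp_c: int
    minutes: int


# Values taken from the colony-PCR description in the text.
INITIAL = Step(95, 2)                            # 1 cycle of 95 °C for 2 min
CYCLE = [Step(94, 1), Step(60, 1), Step(72, 1)]  # repeated block
N_CYCLES = 37
FINAL = Step(72, 10)                             # final extension


def total_hold_minutes(initial, cycle, n_cycles, final):
    """Sum the programmed hold times (ramp times are not included)."""
    per_cycle = sum(step.minutes for step in cycle)
    return initial.minutes + n_cycles * per_cycle + final.minutes


if __name__ == "__main__":
    print(total_hold_minutes(INITIAL, CYCLE, N_CYCLES, FINAL), "min of programmed holds")
```

With the stated values the programmed holds add up to 2 + 37 × 3 + 10 = 123 min.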
In order to isolate additional cDNA clones that
contained long inserts and thus had a higher probability of containing
the full-length cDNA sequence, the GENETRAPPER cDNA selection
system was used a second time, substituting a second set of
oligodeoxynucleotides for capture (P5, 5′-ACAGAGACTGCGTGTGGCCTATGAG)
and repair (P6, 5′-AAGAGAGCCTGCAGCAGAAGC) of the selected clones. These
primers were based not upon the human 3′-EST sequence but were specific for the 3′
portion of previously sequenced cDNA inserts and
therefore were specific for the 5′
portion of the mRNA. Colonies
were again screened, and insert size was determined by PCR as described
above. However, rather than using the T7 primer, an additional
sequence-specific primer, P7 (5′-CACCTTGCTCCAGAAACC), was used as a
primer in PCR with the SP6 primer to determine the size of the plasmid
inserts. PCR-positive colonies that contained the largest inserts were sequenced.
Additionally, to confirm the sequence of
the 5′-terminus of the mRNA, the 5′-RACE System (Life Technologies)
was used to amplify the 5′-terminus of the message for sequencing. The
manufacturer's protocol for GC-rich cDNAs was followed. Briefly,
2.5 pmol of a gene-specific primer P8 (5′-CATCAGTGACAGCAGCACCT) was
hybridized to 100 ng of human spleen poly(A)+ RNA
(Clontech) and cDNA was synthesized using Superscript II Reverse
Transcriptase (Life Technologies). The RNA was then degraded with
RNase, and the cDNA was isolated. A poly(dC) tail was then added to
the 3′-terminus of the purified cDNA using dCTP and TdT, and the
cDNA region corresponding to the 5′-end of the mRNA was amplified by two successive rounds of PCR using additional
gene-specific primers P9 (5′-CATAGGCCACACGCAGTCTC) and P10
(5′-CTTCTGCTGCAGCCTCTCTTC), together with the anchor primers supplied
by the manufacturer.
The second round of PCR yielded a single amplified product that, when
analyzed by electrophoresis on a 1.2% agarose gel, corresponded in
size to what was expected on the basis of the longest
GENETRAPPER-isolated cDNA sequences. The PCR product was
gel-purified and cloned into the pCR II cloning vector (Invitrogen)
using the TA cloning kit (Invitrogen), electroporated into the E. coli strain DH5α, and plated onto LB/amp agar plates. Colonies
were used to inoculate Lennox L broth cultures containing 50 µg/ml
ampicillin (LB/amp broth), and the inserts of 10 isolated plasmids were
sequenced.
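The relationship between a sense-strand capture primer and an antisense gene-specific primer can be checked by taking a reverse complement. The sketch below, whose helper names are hypothetical, uses the P5 and P9 sequences exactly as printed in the text and tests whether the reverse complement of the 5′-RACE primer P9 lies within the capture oligodeoxynucleotide P5; it also reports the G+C fraction of P9, which is relevant when a GC-rich protocol is chosen.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_fraction(seq):
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


# Primer sequences as printed in the text (5' to 3').
P5 = "ACAGAGACTGCGTGTGGCCTATGAG"  # capture oligodeoxynucleotide
P9 = "CATAGGCCACACGCAGTCTC"       # gene-specific 5'-RACE primer

if __name__ == "__main__":
    rc = reverse_complement(P9)
    print("reverse complement of P9:", rc)
    print("contained in P5:", rc in P5)
    print("G+C fraction of P9:", gc_fraction(P9))
```

For the sequences as printed, the reverse complement of P9 is an exact substring of P5, which is consistent with P9 priming on the strand opposite to the capture oligodeoxynucleotide at the same site.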
Plasmid DNA was purified for sequencing using the QIAprep Spin Plasmid Miniprep kit (QIAGEN) from 5 ml of LB/amp broth cultures, containing 50 µg/ml ampicillin incubated for 16 h at 37 °C. DNA sequencing was carried out by the New York University Kaplan Cancer Center sequencing facility, using a model 373 automated DNA sequencer (ABI), and model 800 Lab Station (ABI).
Construction of a GST Fusion Protein in pGEX-2T
The DNA
sequence encoding amino acids 8-304 of the open reading frame (Fig.
1) was amplified via PCR from 50 ng of the purified cDNA-containing
plasmid using the following primers: P11
(5′-CTTGGATCCATGCTGACCCGGAGCCGGAGC) and P12
(5′-CTCGAATTCGAGCCATGCGGCCCTCCGAGA). These primers were designed to
incorporate BamHI and EcoRI restriction sites
into the 5′- and 3′-ends of the sense strand, respectively. PCR was
conducted as follows: 1 cycle of 95 °C for 2 min and 35 cycles of
94 °C for 1 min, 65 °C for 1 min, and 72 °C for 2 min, followed by a final cycle of 10 min at 72 °C. The resulting PCR product was digested with BamHI and EcoRI,
gel-purified, ligated into gel-purified pGEX-2T vector (Pharmacia
Biotech Inc.) that had previously been digested with BamHI
and EcoRI, and electroporated into the E. coli
strain NB42. Colonies were selected via growth on LB agar/amp plates,
and the presence of the appropriate insert was verified via colony PCR
as described above, using primers P3 and P4. Expression of the
full-length fusion protein was confirmed via the induction of log phase
(A590 = 0.6) 5-ml LB/amp broth cultures with 0.1 mM IPTG for 4 h at 37 °C. To prepare total cell SDS
lysates, 1-ml aliquots of induced and uninduced cultures were centrifuged at 5000 × g for 2 min, the supernatant was
discarded, and the pelleted bacteria were resuspended in 100 µl of
SDS-PAGE loading buffer and heated at 95 °C for 5 min. Thirty µl
of each sample was then analyzed on a 15% Tricine gel. After the gels were stained with Coomassie Blue, induced and uninduced samples were
compared to demonstrate the expression of the full-length (65-kDa)
fusion protein. Bacterial lysates produced in an identical manner were
also run on the SDS-PAGE gel in Fig. 3 in order to demonstrate
induction of the GST fusion protein.
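As a quick check on the primer design described above, the recognition sequences of BamHI (GGATCC) and EcoRI (GAATTC) can be located in P11 and P12. This is a minimal sketch with hypothetical helper names; the primer sequences are copied from the text, and the recognition sequences are the standard ones for these two enzymes.

```python
# Standard recognition sequences (both are palindromic).
SITES = {"BamHI": "GGATCC", "EcoRI": "GAATTC"}

# Primer sequences as printed in the text (5' to 3').
P11 = "CTTGGATCCATGCTGACCCGGAGCCGGAGC"
P12 = "CTCGAATTCGAGCCATGCGGCCCTCCGAGA"


def find_sites(seq, sites=SITES):
    """Map each enzyme whose site occurs in seq to the 0-based start index."""
    seq = seq.upper()
    return {name: seq.find(motif) for name, motif in sites.items() if motif in seq}


if __name__ == "__main__":
    print("P11:", find_sites(P11))
    print("P12:", find_sites(P12))
```

Each primer carries exactly one of the two sites, three nucleotides from its 5′-end, so the digested PCR product can be ligated directionally into the BamHI/EcoRI-cut vector.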
Protein Expression and Purification
600 ml of LB/amp broth were inoculated with 10 ml of overnight cultures. Bacteria were grown at 37 °C until the A590 reached 0.6. Expression of the fusion protein was induced by incubation with 0.1 mM IPTG for 5 h at 30 °C (the lower temperature was used to increase the solubility of the fusion protein). Bacteria were then placed on ice for 1 h and pelleted by centrifugation at 3200 × g in 250-ml centrifuge tubes (Corning) for 10 min. The supernatant was discarded, and the pellet was resuspended in 20 ml of sonication buffer (50 mM Tris, pH 8.0, 500 mM NaCl, 5 mM EDTA, 0.5% Triton X-100, 0.25 mM phenylmethylsulfonyl fluoride, 0.1 mg/ml aprotinin). The bacteria were transferred to a 30-ml Corex centrifuge tube and sonicated for 2 min at 70% power using a Heat Systems model W-375 sonicator equipped with a model 419 standard tapered microtip. The sonicate was then centrifuged for 15 min at 10,000 × g, and the supernatant was transferred to a 50-ml plastic centrifuge tube containing 1.2 ml of glutathione-agarose 4B affinity medium (volume of medium was measured as a slurry in 20% ethanol, as supplied by the manufacturer) prewashed with 2 × 40 ml of wash buffer (50 mM Tris, pH 8.0, 500 mM NaCl, 5 mM EDTA, 0.5% Triton X-100). The sample was incubated on ice with agitation for 30 min to allow adsorption of the fusion protein. The affinity medium was then pelleted by centrifugation for 2 min at 950 × g. The supernatant was removed by pipetting, and the affinity medium was washed once with 20 ml of sonication buffer and 4 times with 40 ml of wash buffer by thorough resuspension of the beads in the appropriate buffer followed by centrifugation at 950 × g for 1 min. After the final wash, the affinity medium was resuspended in 1 ml of wash buffer, transferred to a 2-ml plastic tube, and centrifuged again at 950 × g for 1 min to pellet the beads. 
The supernatant was removed, and the beads were resuspended in 1 ml of glutathione-agarose elution buffer (100 mM Tris, pH 8.0, 500 mM NaCl, 2.5 mM EDTA, 0.1% Triton X-100, 20 mM glutathione (Sigma)) and incubated for 12 h on ice with agitation. Beads were then quickly pelleted by centrifugation at 950 × g, and the supernatant that contained the eluted fusion protein was transferred to a fresh tube. All purification procedures from sonication through elution of the fusion protein were carried out at 4 °C. The purification yielded 9.9 mg of fusion protein.
As a control, the 26-kDa glutathione S-transferase (GST) of Schistosoma japonicum was expressed from the pGEX-2T vector (without a fusion insert) in the bacterial strain NB42 according to the same procedure described for the fusion protein. Twelve mg of GST was purified from 600 ml of induced bacterial culture.
Purification of E. coli Endonuclease III
Endonuclease III was purified from E. coli strain UC6444 carrying the plasmid pHIT1 as described previously (19).
Spectrophotometry
Spectrophotometric measurements of
proteins were made in elution buffer (100 mM Tris, pH 8.0, 500 mM NaCl, 2.5 mM EDTA, 0.1% Triton X-100,
20 mM glutathione) in a quartz cuvette. The optical absorption spectra of the GST fusion protein and the unfused GST protein were recorded between 200 and 700 nm using a Spectronic Genesys 5 spectrophotometer (Milton Roy). In order to allow comparison of the absorption spectra of the purified GST fusion protein
and purified GST (see Fig. 6), the purified proteins were diluted prior
to analysis with glutathione-agarose elution buffer to the same
absolute protein concentration (5.5 mg/ml).
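Bringing two protein preparations to the same concentration is a C1V1 = C2V2 calculation. The sketch below is illustrative only: the function name is hypothetical, and the stock concentration used in the example (9.9 mg/ml) is an assumption, since the text reports the yield of fusion protein but not the exact volume of the eluate.

```python
def dilution_volumes(stock_mg_per_ml, target_mg_per_ml, final_ml):
    """Return (stock volume, buffer volume) in ml, from C1*V1 = C2*V2."""
    if target_mg_per_ml > stock_mg_per_ml:
        raise ValueError("target concentration exceeds stock concentration")
    stock_ml = target_mg_per_ml * final_ml / stock_mg_per_ml
    return stock_ml, final_ml - stock_ml


if __name__ == "__main__":
    # Assumed stock of 9.9 mg/ml diluted to the 5.5 mg/ml used for the spectra.
    stock_ml, buffer_ml = dilution_volumes(9.9, 5.5, final_ml=1.0)
    print(f"{stock_ml:.3f} ml stock + {buffer_ml:.3f} ml elution buffer")
```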
FISH Analysis
FISH Analysis was performed by SeeDNA Biotech
Inc. (Dept. of Biology, York University, Ontario, Canada). Lymphocytes
isolated from human blood were cultured in α-minimal essential medium
supplemented with 10% fetal calf serum and phytohemagglutinin at
37 °C for 68-72 h. The lymphocyte cultures were treated with BrdUrd
(0.18 mg/ml; Sigma) to synchronize the cell
population. The synchronized cells were washed 3 times with serum-free
medium to release the block and recultured at 37 °C for 6 h in
α-minimal essential medium with thymidine (2.5 mg/ml;
Sigma). Cells were harvested and slides were made by
using standard procedures, including hypotonic treatment, fixing, and
air drying.
To produce a probe for FISH analysis, a 1.1-kilobase pair fragment containing the entire cDNA sequence was excised from an isolated cDNA clone using EcoRI and HindIII, purified, and labeled with biotin-14-dATP using the BioNick labeling kit (Life Technologies) (20). The procedure for FISH analysis was performed according to the previously reported procedures of Heng et al. (21, 22). Briefly, slides were baked at 55 °C for 1 h. After RNase treatment, the slides were denatured in 70% formamide, 2 × SSC for 2 min at 70 °C followed by dehydration with ethanol. Probes were denatured at 75 °C for 5 min in a hybridization solution containing 50% formamide, 10% dextran sulfate, and human CotI-restricted DNA. Probes were loaded on the denatured chromosomal slides. After overnight hybridization, slides were washed and analyzed. FISH signals and the DAPI banding pattern were recorded separately by taking photographs. Chromosomal localization was achieved by superimposing FISH signals with DAPI-banded chromosomes (22).
Northern Blot Analysis
Two µg of mRNA, isolated from
293T cells using the FastTrack 2.0 mRNA isolation system
(Invitrogen), 1 µg of human spleen poly(A)+ RNA
(Clontech), and 5 µg of 0.24-9.5-kilobase RNA ladder (Life
Technologies) were electrophoresed on an 11 × 14-cm 1.0% agarose-formaldehyde gel. The gel was rinsed with deionized water, and
RNA was transferred to a Nytran membrane (Schleicher & Schuell) using
the Turboblotter rapid downward transfer system (Schleicher & Schuell),
according to the manufacturer's specifications. Following transfer,
the membrane was gently washed in 2 × SSC for 5 min, dried on a
fresh sheet of filter paper, and baked at 80 °C for 1 h. The
portion of the membrane that contained the molecular weight markers was
cut away and stained by treatment with 5% acetic acid for 15 min and
0.5 M sodium acetate, pH 5.2, with 0.04% methylene blue
for 10 min, followed by destaining with water. The baked filter was
incubated in prehybridization solution (50% formamide, 3 × SSC, 0.1 M Tris, pH 7.4, 5 × Denhardt's solution)
for 4 h at 42 °C, followed by hybridization overnight at
42 °C with 2 × 10^6 cpm of radiolabeled probe/ml of
hybridization solution (50% formamide, 3 × SSC, 0.1 M Tris, pH 7.4, 5 × Denhardt's solution, 10%
dextran sulfate). Following hybridization, the membrane was washed
three times for 30 min at 50 °C, successively with 1 × SSC,
0.1% SDS; 0.5 × SSC, 0.1% SDS; and 0.1 × SSC, 0.1% SDS.
The membrane was exposed to x-ray film for 24 h at −70 °C. The
autoradiogram was matched to the prestained markers to determine the
size of the native mRNA. Before hybridization with the
cDNA-specific probe, the Northern blot membrane was analyzed by
hybridization to a β-actin-specific probe to confirm the integrity of
the mRNA. After hybridization to the β-actin probe detected an
mRNA species of the predicted size (approximately 2.1 kilobases),
the membrane was stripped by boiling for 30 min in 0.1 × SSC, 0.5% SDS and probed according to an identical procedure with the
probe specific for the human homologue of endonuclease III (Fig.
1).
The β-actin probe was produced by PCR with sequence-specific primers
(Clontech) against cDNA made from the RNA of cells taken from a
sample of a human bone marrow aspirate. PCR was conducted as follows: 1 cycle of 95 °C for 2 min and 35 cycles of 94 °C for 1 min,
60 °C for 1 min, 72 °C for 1 min, followed by a final cycle of 10 min at 72 °C. The probe was then radiolabeled using the Random
Primed DNA Labeling kit (Boehringer Mannheim) and
[α-32P]dCTP, and it was purified using Nick-Spin
columns (Pharmacia). The specific probe for the human homologue of
endonuclease III was prepared by excising the full-length cDNA
sequence shown in Fig. 1 from 2 µg of purified plasmid DNA via
restriction with EcoRI and BamHI followed by gel
purification of the restricted fragment. The probe was radiolabeled and
hybridized to the Northern blot membrane as described.
Poly(dA-[3H]dT) was
produced by nick translation of the alternating copolymer poly(dA-dT)
(Pharmacia) with [5,5-3H]TTP followed by oxidation with
osmium tetroxide to form thymine glycol residues (23). Thymine
glycol-containing poly(dA-[3H]dT) produced in this manner
had a specific activity of approximately 1.4 × 10^7
dpm/µg. Thymine glycol DNA-glycosylase assays were carried out against the oxidized DNA, and the released radioactive product was shown to be
thymine glycol by high pressure liquid chromatography analysis as
described previously (23).
A
double-stranded oligodeoxynucleotide containing a single thymine
glycol residue was prepared as described previously (18, 24). The
thymine glycol-containing strand was 5′-end-labeled with
[γ-32P]dATP, using T4 kinase (Life Technologies)
according to the manufacturer's recommendations, and purified using a
ChromaSpin-10 column (Clontech).
The purified GST fusion protein, the nonfusion GST protein, and E. coli endonuclease III were reacted with the substrate double-stranded oligodeoxynucleotide in a total volume of 50 µl under the following reaction conditions: 37.3 mM NaCNBH3, 20 mM HEPES, pH 7.5, 46.5 mM KCl, 5 mM EDTA, a 4.0 µM concentration of each oligodeoxynucleotide, and 40 ng/µl protein. In the case of E. coli endonuclease III, this represented approximately a 4-fold molar excess of substrate oligodeoxynucleotide to enzyme. After incubation at room temperature for 2 h, a 25-µl volume of 3 × SDS-PAGE loading buffer was added to each sample. Samples were then heated to 90 °C for 5 min and separated by electrophoresis on a 15% Tricine-SDS gel. Following electrophoresis, the gel was stained with Coomassie Blue, wrapped in plastic, and analyzed via autoradiography.
Gel Electrophoresis
Prior to electrophoresis all samples were incubated at 95 °C for 5 min in standard SDS-PAGE loading buffer. Fifteen percent Tricine gels (25) were prepared and run using the Mini-PROTEAN II electrophoresis system (Bio-Rad). Gels were run at 90 V for approximately 5 h, completion being determined by the progress of prestained low molecular weight electrophoresis standards (Bio-Rad). Gels were then stained with Coomassie Blue.
Fig. 1 presents the nucleotide sequence of a
cDNA of 1045 base pairs, which contains a putative open reading
frame (ORF) of 912 base pairs. This ORF encodes a protein of 304 amino
acids with a calculated molecular mass of 33,569 and a calculated pI of
9.85, which is the human homologue of E. coli endonuclease III. The nucleotide sequence data presented in Fig. 1 were obtained from two sources. The sequence of nucleotides 6-1045 was obtained by
analysis of clones isolated from a cDNA library, using probes based
upon the sequence of the previously described human 3′-EST. The
sequence of nucleotides 1-5 was obtained by sequencing the products of
5′-RACE, performed using gene-specific primers based upon the sequence
of the longest cDNA clones.
Previously we reported the sequence of four peptides obtained by proteolysis of a purified bovine pyrimidine hydrate-thymine glycol DNA glycosylase/AP lyase (18). The sequences of those four peptides as well as that of one additional peptide (GEGGEGAEHLQAP) derived from the same purified protein are also included in Fig. 1, aligned with the homologous sequences encoded within the ORF of the human cDNA.
The 1045-base pair sequence of Fig. 1 probably represents most, if not
all, of the full-length cDNA. Northern blot analysis
(Fig. 2) of human splenic mRNA and of mRNA from human 293T cells demonstrated in each case a predominant mRNA species of approximately
1.1-1.2 kilobases, which hybridized to a 32P-labeled
probe containing the entire sequence of the ORF. The difference of
approximately 50-150 nucleotides in length between the cDNA
sequence presented in Fig. 1 and the native mRNA can be explained
by the expected presence of a poly(A) tail of approximately the same
length on the native species and perhaps a few more nucleotides 5′ to
the first AUG codon.
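The size comparison in this paragraph is simple arithmetic and can be reproduced directly from the numbers given in the text. In the sketch below the variable names are hypothetical; the check on the ORF assumes only that an ORF length must be a whole number of codons, and whether the stop codon is counted within the 912 base pairs is not stated in the text.

```python
CDNA_BP = 1045                 # length of the cDNA sequence of Fig. 1
ORF_BP = 912                   # length of the putative open reading frame
MRNA_RANGE_NT = (1100, 1200)   # approximate mRNA size from the Northern blot

# Length not accounted for by the cloned cDNA, at each end of the estimate.
unaccounted = tuple(n - CDNA_BP for n in MRNA_RANGE_NT)

# Number of codons in the ORF; valid only if the length is a multiple of 3.
codons, remainder = divmod(ORF_BP, 3)

if __name__ == "__main__":
    print("nucleotides not covered by the cDNA:", unaccounted)
    print("codons in ORF:", codons, "remainder:", remainder)
```

The differences of 55 and 155 nucleotides match the stated range of approximately 50-150, and 912 base pairs correspond to 304 codons, in agreement with the 304-residue protein described above.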
Fig. 2, lane 3, which contains mRNA extracted from 293T
cells, shows a second faint band of higher Mr.
Although we think this band is nonspecific, we cannot fully exclude the
possibility that it represents mRNA encoding a protein similar to
human endonuclease III. Such a situation is present in S. cerevisiae, which contains two homologues of E. coli
endonuclease III, one of which is thought to be nuclear, the second
mitochondrial (see "Discussion" and Fig. 8).
To demonstrate that the cDNA sequence of Fig. 1 encoded a functional homologue of endonuclease III, a GST fusion protein was constructed consisting of amino acid residues 8-304 of the ORF fused to the C terminus of the 30-kDa GST protein. SDS-PAGE analysis of the IPTG-induced, affinity-purified fusion protein (Fig. 3) revealed a predominant 65-kDa full-length protein. Two additional lower molecular weight protein species were present in the purified preparation. We believe these to be fragments of the 65-kDa protein that arose through abortive synthesis of the full-length protein or proteolysis occurring before, during, and possibly after cell lysis and affinity purification, due to the action of contaminating cellular proteases.
As demonstrated previously, E. coli endonuclease III can be
specifically, irreversibly cross-linked to a thymine glycol-containing oligodeoxynucleotide via the reductive stabilization of its
characteristic enzyme-substrate intermediate (18). To further confirm
that the ORF presented in Fig. 1 encoded a fully functional homologue of E. coli endonuclease III, the cross-linking reaction, as
described under "Experimental Procedures," was applied to the
purified GST-fusion protein. The results of this reaction are
illustrated in Fig. 4. When aliquots of the purified GST
fusion protein that had been incubated with a 32P-labeled
thymine glycol-containing oligodeoxynucleotide in the absence
(lane 6) or presence (lane 7) of sodium
cyanoborohydride (NaCNBH3) were compared by SDS-PAGE
analysis, it became evident that a portion of the protein had been
irreversibly cross-linked to the oligodeoxynucleotide. This is
manifested by an increase in the apparent molecular weight of the
enzyme resulting in formation of the doublet shown in lane
7. This shift is analogous to that observed when endonuclease III
was subjected to the same reductive cross-linking reaction (lane
3) and compared with native endonuclease III (lane 2).
No shift of the major protein species was observed when the non-fusion
GST protein (lane 3) was incubated under reducing conditions with the thymine glycol-containing oligodeoxynucleotide (lane 4).
An autoradiogram of the gel in Fig. 4A is presented in Fig.
4B. As described, the thymine glycol-containing
oligodeoxynucleotide had been 5′-end-labeled with 32P prior
to incubation with the proteins. Thus, cross-linking was confirmed by
this autoradiogram in which predominant radioactive species are present
only in lanes 2 (E. coli endonuclease III plus
NaCNBH3) and 7 (GST fusion plus
NaCNBH3), which correspond in apparent
Mr to the shifted species seen on the Coomassie
Blue-stained gel. Also evident on the autoradiogram in lane
7 are two visible, but less intense, lower molecular weight bands
that correspond in position to presumed degradation products of the
fusion protein present even after affinity purification (Fig. 3).
Presumably these represent cross-linked, partially degraded fusion
protein.
After purification, the fusion protein was also analyzed for thymine
glycol-DNA glycosylase activity. Fig. 5 presents the V versus [Et] plot in which thymine
glycol release is expressed as a function of increasing content of
fusion protein. The release of thymine glycol is linear with respect to
fusion protein concentration over the amount of protein used. Based on the results of this plot, the specific enzymatic activity of the fusion
protein was calculated to be about 1-2% that of genetically engineered E. coli endonuclease III using the same assay
(latter assay data not shown). This reduced level of activity is
apparently quite common among GST fusion
proteins. GST protein that contained no
C-terminal fusion was induced and purified in a manner identical to the
fusion protein and assayed for enzymatic activity. This non-fusion GST
protein did not demonstrate detectable thymine glycol-DNA glycosylase
activity at a protein concentration 3 orders of magnitude higher than
that at which the fusion protein was assayed.
As documented previously, E. coli endonuclease III contains an iron-sulfur cluster in which a cubane [4Fe-4S] moiety is liganded by four cysteine residues. This domain produces a distinctive absorbance at 410 nm (26). Conservation of this [4Fe-4S] cluster in the human enzyme was inferred on the basis of the cDNA sequence of Fig. 1, since the putative ORF contains the appropriate four cysteine residues at amino acid positions 282, 289, 292, and 300, and confirmed by taking an absorption spectrum of the purified GST-fusion protein, which revealed that it too absorbed strongly at 410 nm (Fig. 6).
Although purified E. coli endonuclease III has a characteristic absorption peak at 410 nm and might be expected to appear blue in solution, solutions containing approximately 0.5 mg/ml or more of purified endonuclease III are typically yellow-brown (19). Similarly, a solution of the purified GST fusion protein at a similar protein concentration was also yellow, while a solution of the simultaneously purified non-fusion GST protein was colorless.
In order to determine the chromosomal localization of the gene encoding
the mammalian enzyme, FISH analysis was performed as described under
"Experimental Procedures." Under the conditions used, hybridization
efficiency for our probe was approximately 70% (i.e. among
100 mitotic spreads analyzed, 70 demonstrated binding of the probe to
one pair of chromosomes). DAPI banding was used to identify the
chromosome pair to which the probe had bound (chromosome 16). The
precise localization of the gene (16p13.2) was determined by the
summary analysis of 10 pairs of photographs in which the probe signal
was matched with the results of DAPI banding (Fig. 7).
There was no additional locus detected by FISH analysis. These results,
taken together with the presence of a single mRNA species on
Northern analysis, indicate that the gene for human endonuclease III is
a single copy gene.
The human sequence of Fig. 1 shows a remarkable similarity to that of several other putative homologues of the E. coli endonuclease III (National Center for Biotechnology Information (NCBI) sequence ID 119329) found in representative species of all three biologic domains. In bacteria they have been found in both Gram-negative (Haemophilus influenzae, NCBI sequence ID 1169526) and Gram-positive (Bacillus subtilis, NCBI sequence ID 729418) organisms; among archeons, in Methanococcus jannaschii (NCBI sequence ID 1510694); and among eukaryotes, in Schizosaccharomyces pombe (NCBI sequence ID 1065894), S. cerevisiae (NCBI sequence ID 1419843 and 401436), C. elegans (NCBI sequence ID 974795), Rattus sp. (GenBank accession number H33255), and Homo sapiens (GenBank accession number F04657). The S. cerevisiae genome encodes two distinct theoretical homologues of E. coli endonuclease III. The alignment of the nine putative homologous sequences using the program Clustal W (version 1.5) (Fig. 8) reveals that a core sequence of amino acids is remarkably well conserved. In bacteria, the core sequence comprises virtually the entire protein. In contrast, the proteins of archeons and eukaryotes have unique extensions at their N and/or C termini. For the sake of clarity, these extensions have been omitted from Fig. 8.
Based upon similarities among several bacterial DNA glycosylases,
site-directed mutagenesis studies, and molecular modeling, Thayer
et al. (26) identified several regions and residues within the core sequence of amino acids of E. coli endonuclease III
that could be involved in DNA binding and catalysis. The region
surrounding glutamine 41 (residue numbers refer to the E. coli endonuclease III amino acid sequence unless otherwise
indicated) may form a portion of the substrate binding pocket, in which
the damaged pyrimidine fits when in the "flipped out" conformation
that the enzyme recognizes. The helix-hairpin-helix (HhH) motif encoded by the residues surrounding the central LPGVG sequence (residues 114-118) is thought to function in nonspecific DNA recognition. Recently, Doherty et al. (27) have extended this analysis
and shown that similar HhH motifs occur in 14 homologous families of
DNA-binding proteins, including DNA glycosylases, DNA polymerases, and
"flap" endonucleases. Lysine 120 appears to be the nucleophile in
the active site of endonuclease III that contributes the ε-amino group necessary for the formation of the N-acylimine
enzyme-substrate intermediate, characteristic of DNA glycosylase/AP
lyases. Aspartic acid 138 has also been implicated as a functional
active site residue. All of these residues appear to be well conserved
in all of the nine sequences shown. The structure of the E. coli endonuclease III was recently solved (26), and, in light of the high degree of conservation of critical residues, it is likely that
the common core sequence of all members of the endonuclease III family
will have a similar three-dimensional structure.
In addition to the previously mentioned residues, four highly conserved cysteine residues (187, 194, 197, and 203) have been identified within this common core sequence that contribute to the [4Fe-4S] cluster of E. coli endonuclease III. Examination of the aligned sequences in Fig. 8 reveals that in E. coli endonuclease III and five of its eight putative homologues, including the human enzyme, these four cysteines are arranged according to the consensus sequence Cys-X6-Cys-X2-Cys-X5-Cys. A similar but slightly modified sequence appears in S. pombe (Cys-X6-Cys-X2-Cys-X7-Cys) and M. jannaschii (Cys-X5-Cys-X2-Cys-X7-Cys). Thayer et al. (26) suggested that basic amino acid residues between the first two cysteines of the [4Fe-4S] cluster may form a loop that functions in the nonspecific binding of DNA. While Fig. 8 does not indicate absolute conservation of these residues, some conservation is apparent, especially with respect to arginine 193.
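The cysteine spacings described above can be written as simple regular expressions. The E. coli residue numbers (187, 194, 197, and 203) leave 6, 2, and 5 intervening residues, which matches the Cys-X6-Cys-X2-Cys-X5-Cys consensus. The following Python sketch is only an illustration and was not part of the analysis reported here; the function name find_cluster_motifs and the test peptide are hypothetical, and the peptide is synthetic rather than a real protein sequence.

```python
import re

# Spacing patterns quoted in the text; X (".") stands for any residue.
MOTIFS = {
    "Cys-X6-Cys-X2-Cys-X5-Cys": re.compile(r"C.{6}C.{2}C.{5}C"),
    "Cys-X6-Cys-X2-Cys-X7-Cys": re.compile(r"C.{6}C.{2}C.{7}C"),
    "Cys-X5-Cys-X2-Cys-X7-Cys": re.compile(r"C.{5}C.{2}C.{7}C"),
}


def find_cluster_motifs(seq):
    """Return (motif name, 1-based start position) for every match in seq."""
    hits = []
    for name, pattern in MOTIFS.items():
        for match in pattern.finditer(seq.upper()):
            hits.append((name, match.start() + 1))
    return hits


# Intervening residues implied by the E. coli cysteine positions.
positions = [187, 194, 197, 203]
spacings = [b - a - 1 for a, b in zip(positions, positions[1:])]

# Synthetic peptide (NOT a real sequence) with cysteines spaced 6, 2, and 5.
example = "AAA" + "C" + "A" * 6 + "C" + "A" * 2 + "C" + "A" * 5 + "C" + "AAA"

print(spacings)
print(find_cluster_motifs(example))
```

Applied to a real protein sequence, a scan of this kind reports only where the four cysteines are correctly spaced; it says nothing about whether a [4Fe-4S] cluster is actually formed.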
As mentioned previously, the genome of S. cerevisiae encodes two putative homologues of E. coli endonuclease III, one of which (designated Sce non-Fe-S in Fig. 8 (NCBI sequence ID 1419843)) lacks the four-cysteine [4Fe-4S] motif completely and presents an obvious exception to this consensus sequence. However, this sequence also encodes a putative mitochondrial leader sequence (28). Whether pairs of endonuclease III-like proteins, with and without [4Fe-4S] clusters, are present in other eukaryotic organisms and whether the non-Fe-S proteins are mitochondrial remains to be determined.
This interesting question notwithstanding, the presence of endonuclease III-like enzymes in representative species of all three evolutionary domains suggests that the genomic DNA of organisms throughout phylogeny is subject to endogenous stresses that attack the 5,6-double bonds of pyrimidine residues. Previously well characterized substrates of endonuclease III include oxidized pyrimidines such as thymine glycol and 5-hydroxycytosine and hydrates of cytosine and uracil. The oxidation of DNA bases has been primarily attributed to reactive oxygen species formed as byproducts of oxidative metabolism and inflammation. The formation of pyrimidine hydrates has been primarily attributed to the action of UV radiation (reviewed in Ref. 29). The archeon M. jannaschii lives beneath the sea and therefore is not exposed to direct sunlight. Furthermore, it is characterized by a reducing rather than an oxidizing metabolism (30). The identification of a homologue of endonuclease III in the genome of this organism suggests that pyrimidines with reduced 5,6-double bonds such as 5,6-dihydrothymine may be formed spontaneously in archeon genomic DNA. Perhaps within this evolutionary domain, it is primarily the formation of such reduced rather than oxidized or photohydrated pyrimidine residues that has promoted the conservation of an endonuclease III-like enzyme.
At this time, the specific contribution that the human pyrimidine hydrate-thymine glycol DNA glycosylase/AP lyase activity makes to the maintenance of the genome is uncertain. The human gene encoding this enzyme was localized to the locus 16p13.2-.3 by FISH analysis (Fig. 7). The accuracy of this localization was corroborated by the identification of a genomic database nucleotide sequence (GenBank accession number L48777), obtained by exon trapping from this same region of chromosome 16 (31), which is 94.1% identical to nucleotides 699-799 of the sequence of Fig. 1. The chromosomal locus of the human endonuclease III homologue is in very close proximity to that of another DNA base excision repair enzyme, 3-methylpurine DNA glycosylase, as well as to that of the DNA nucleotide excision repair gene, ERCC-4. There is no apparent homology among these three proteins, so it seems unlikely that their localization to the same chromosomal region is the result of gene duplication and divergence. Loss of heterozygosity in this region has been reported to occur in 22% of human hepatocellular carcinomas (32). Whether any or all of these DNA repair proteins act as tumor suppressors for human hepatocarcinogenesis remains to be determined.
The nucleotide sequence reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number U81285.
We thank Philip H. Bolton for supplying us with the thymine glycol-containing oligodeoxynucleotide, Ajay Chheda for assistance with computer-assisted sequence retrieval, and Archie Cummings, Jr., for technical assistance. We thank Dr. Bernard Goldschmidt of the New York University Skirball Institute Sequencing Facility for DNA sequencing.