(Received for publication, January 26, 1995; and in revised form, March 13, 1995)
From the
Corynebacterial sarcosine oxidase, a heterotetrameric
(
Sarcosine oxidase is produced as an inducible enzyme when Corynebacterium sp. P-1 is grown with sarcosine as source of carbon and energy (1). In the presence of oxygen and tetrahydrofolate (H4
Corynebacterial sarcosine oxidase contains four different subunits (
Bacterial sarcosine oxidases have been isolated from over a dozen different organisms and fall into two major classes: heterotetramers (
In mammalian liver, oxidative cleavage of the methyl group from sarcosine is catalyzed by sarcosine dehydrogenase (94 kDa), an enzyme that exhibits many similarities with dimethylglycine dehydrogenase (96 kDa). These monomeric enzymes contain a single covalently bound flavin and use H4
The complex quaternary structure and multiple binding sites for substrates and prosthetic groups in corynebacterial sarcosine oxidase provide a particularly intriguing target for structure-function studies. We recently reported the single-step cloning and overexpression of the genes coding for sarcosine oxidase from Corynebacterium sp. P-1 and the characterization of the recombinant enzyme (13). In this paper, we report the structure and organization of the sarcosine oxidase operon and nearby gene(s).
Figure 1:
Nucleotide sequence of a 7285 base pair Corynebacterium sp. P-1 DNA fragment containing sarcosine oxidase and other nearby gene(s) and the deduced amino acid sequence of the gene products. Numbering begins at the G in the Sau3AI site at the 5′ end of the corynebacterial DNA insert in pLJC305 (13). Putative ribosomal-binding sites are indicated by double underlining. Single underlining indicates regions where the deduced amino acid sequence of the sarcosine oxidase subunits matched NH2
Figure 2:
Gene arrangement, intergenic regions, and restriction sites in the sarcosine oxidase operon and nearby gene(s). The corynebacterial DNA insert in pLJC305 is shown by the heavy line; the lighter line indicates part of the vector multicloning region. Arrows are used to indicate gene position and size. Junctions between contiguous genes are detailed in the boxes. Restriction enzymes are designated as follows: Sa, SacI; X, XbaI; S, SalI; P, PstI; K, KpnI; B, BamHI. For clarity, SalI and PstI sites are omitted in the region downstream from soxD.
Starting from the 5′ end of the corynebacterial DNA insert, the first open reading frame codes for a putative serine hydroxymethyltransferase. The next four open reading frames (soxA, soxB, soxG, and soxD) code for the subunits of sarcosine oxidase (
In contrast to the close packing observed for the first five genes, the next open reading frame is found 340 nucleotides downstream from soxG and corresponds to a putative purU gene. This gene is important in the regulation of one-carbon folate metabolism in E. coli and is also involved in purine biosynthesis (22, 23, 24). A seventh open reading frame is found 73 bases from the end of the purU gene. This open reading frame could potentially code for a 65-residue peptide fragment, but a BLAST search failed to identify any homology with known proteins. Potential ribosome-binding sites are found 6 bases upstream from the start of the purU gene and the unidentified, seventh open reading frame.
Figure 3:
Comparison of the putative serine
hydroxymethyltransferase sequence from Corynebacterium sp. P-1
with known serine hydroxymethyltransferases. A, the dendrogram
was generated by PILEUP analysis of the corynebacterial sequence with
21 known serine hydroxymethyltransferases found in the data bases.
Percent identity and similarity values, shown in parentheses,
were obtained from pairwise comparisons. B, the putative
serine hydroxymethyltransferase fragment from Corynebacterium sp. P-1 was aligned with the COOH-terminal 65% of serine
hydroxymethyltransferase from H. methylovorum(53) using the GAP program. Panel B shows the
region of the alignment surrounding the conserved lysine residue
(indicated by an arrow) that binds pyridoxal phosphate in
known serine hydroxymethyltransferases.
soxD codes for a polypeptide containing 98 residues. Peptide sequence analysis of the
A BLAST search detected no homology of the
The
Figure 4:
Alignment of the ADP-binding motif in the
The
Figure 5:
A multiple sequence alignment was
generated using the
Pairwise alignments using the GAP program
indicate about 22-25% identity and 45-47% similarity
between the
An ADP-binding motif is found near the NH2
Figure 6:
A multiple sequence alignment was generated using the NH2
Using the
Figure 7:
Homologies and motifs in the
Since the NH2
As might be expected, T-proteins were identified as high-scoring sequences in a BLAST analysis using dimethylglycine dehydrogenase as the query sequence. The region of dimethylglycine dehydrogenase that exhibits homology with the T-proteins overlaps with that observed for the dimethylglycine dehydrogenase/
A multiple sequence alignment of the COOH-terminal half of the
Figure 8:
A
multiple sequence alignment was generated using the COOH-terminal half
of the
A PILEUP comparison of the amino acid sequence deduced for the putative corynebacterial purU gene product with the products from E. coli, S. flexneri, and H. influenzae purU genes demonstrates a high degree of sequence conservation, particularly in the COOH-terminal two-thirds of the proteins (Fig. 9). The alignment suggests that the incomplete H. influenzae purU gene is missing a region coding for the first 65 amino acids at the NH2
Figure 9:
Multiple sequence alignment of the amino
acid sequence deduced for the putative corynebacterial purU
gene product with the products from E. coli, S.
flexneri, and H. influenzae purU
genes.
We have sequenced a corynebacterial DNA insert in pLJC305, a construct previously shown to express a heterotetrameric sarcosine oxidase in E. coli (13). The 5′ end of the insert contains five closely packed genes. These genes code for the subunits of sarcosine oxidase (soxA, soxB, soxG, soxD) and a putative serine hydroxymethyltransferase (glyA). They are arranged in the order glyAsoxBDAG and appear to be organized for efficient, coupled translation. The sox genes were identified by comparison of the translated nucleotide sequence with NH2
Expression of all four sarcosine oxidase subunits from the corynebacterial genes in pLJC305 is completely under the control of the vector-encoded lac promoter (13). Since the lac promoter is located just upstream from the 5′ end of the corynebacterial DNA insert, neither the glyA gene fragment nor any of the sox genes can contain a transcription terminator sequence, or at least not one that is recognized by the E. coli RNA polymerase. This indicates that the sarcosine oxidase operon probably contains the glyA gene in addition to the four sox genes. The latter result is generally expected in prokaryotes, where genes coding for subunits of an oligomeric enzyme are typically found clustered in an operon (49). That the operon also includes the glyA gene is understandable in a metabolic sense since the sarcosine oxidase reaction generates the substrates (glycine and 5,10-CH2
A putative purU gene, located downstream from the soxG gene, was identified based on its homology with known purU gene products. The purU gene codes for a 10-CHO-H4
The
The NH2
The COOH-terminal half of the
A generic folate motif has not been found, not even for folate-dependent enzymes that use the same derivatives as substrates. The observed sequence homology suggests that T-protein and the COOH-terminal domains of the
The nucleotide sequence(s) reported in this paper has been submitted to the GenBank®/EMBL Data Bank with accession number(s) U23955.
αβγδ) enzyme containing covalent and noncovalent FAD, catalyzes the oxidative demethylation of sarcosine to yield glycine, H2O2, and 5,10-CH2-tetrahydrofolate (H4folate) in a reaction requiring H4folate and O2. The sarcosine oxidase operon contains at least five closely packed genes encoding sarcosine oxidase subunits and serine hydroxymethyltransferase (glyA), arranged in the order glyAsoxBDAG. The operon status of a putative purU gene, found 340 nucleotides downstream from soxG, is not known. No homology with other proteins is observed for the smallest sarcosine oxidase subunits γ and δ. The β subunit (405 residues) contains an ADP-binding motif near its NH2 terminus, the covalent FAD attachment site (H175), and exhibits homology with the NH2-terminal half of dimethylglycine dehydrogenase (857 residues) and monomeric, bacterial sarcosine oxidases (∼388 residues), enzymes that contain a single covalent FAD. The α subunit (967 residues) contains a second ADP-binding motif within an ∼280 residue region near the NH2 terminus that exhibits homology with subunit A from octopine and nopaline oxidases, heterodimeric enzymes that catalyze analogous oxidative cleavage reactions with N-substituted arginine derivatives. An ∼380 residue region near the COOH terminus of α exhibits homology with T-protein and the COOH-terminal half of dimethylglycine dehydrogenase. These enzymes catalyze the formation of 5,10-CH2-H4folate, using different one-carbon donors. The results suggest that the α subunit and dimethylglycine dehydrogenase contain an NH2-terminal domain that binds noncovalent or covalent FAD, respectively, and a carboxyl-terminal H4folate-binding domain.
folate), the enzyme catalyzes the oxidative demethylation of sarcosine (N-methylglycine) to yield glycine, hydrogen peroxide, and 5,10-methylenetetrahydrofolate (5,10-CH2-H4folate). In the absence of H4folate, the same rate of sarcosine oxidation is observed and the oxidized methyl group is released as formaldehyde (2). In addition to sarcosine, the enzyme can also oxidize cyclic amino acids, like L-proline and L-pipecolic acid, but at slower rates (3).
α, 100 kDa; β, 42 kDa; γ, 20 kDa; δ, 6 kDa), 1 mol of noncovalently bound FAD and 1 mol of FAD covalently attached to a histidyl residue in the β subunit (1). The noncovalent flavin accepts electrons from sarcosine which are then transferred in one-electron steps to the covalent flavin where oxygen is reduced to hydrogen peroxide (3, 4). The presence of covalent and noncovalent flavin is a feature unique to the heterotetrameric sarcosine oxidases, but other physiologically important mammalian enzymes, like nitric oxide synthase and NADPH-cytochrome P450 reductase, contain two different flavins that bind noncovalently at the active site (5).
α, 96-100 kDa; β, 42-45 kDa; γ, 20-23 kDa; δ, 6-14 kDa) contain covalent (β subunit attachment) and noncovalent flavin, like the corynebacterial enzyme; monomeric enzymes (42-45 kDa) contain only covalent flavin and are similar in size to the β subunit in the heterotetrameric enzymes (6). It is not known whether H4folate can act as a substrate for the monomeric enzymes.
folate as a co-substrate, similar to that observed for corynebacterial sarcosine oxidase (7, 8, 9). Proline oxidizing enzymes are not known to contain covalently bound flavin (10, 11). However, pipecolic acid oxidase (46 kDa) from mammalian liver, an enzyme similar in size to the monomeric sarcosine oxidases and the β subunit from the tetrameric sarcosine oxidases, does contain a single covalently bound flavin (12). The presence of covalent flavin in sarcosine, dimethylglycine, and pipecolic acid oxidoreductases is noteworthy since this is a fairly uncommon feature, particularly in mammalian flavoproteins.
Materials
Restriction enzymes, T4 DNA ligase,
and calf intestinal alkaline phosphatase were purchased from New
England Biolabs and Promega and used as described by the manufacturer.
Sequenase Version 2.0 DNA Sequencing Kit was purchased from Amersham
Corp. -
P-Labeled dATP and dCTP were purchased from
ICN Biomedicals, Inc. Long Ranger Gel Solution was obtained from A. T.
Biochem. ProBlott membranes were a gift from Applied Biosystems.
Coomassie Blue R-250, TEMED, and ammonium persulfate were purchased
from Bio-Rad.
NH2-terminal Amino Acid Sequence Analysis
The subunits from 100 pmol of purified sarcosine oxidase from Corynebacterium sp. P-1 were separated by SDS-polyacrylamide gel electrophoresis using a 3-24% acrylamide gradient (13). The separated subunits were transferred to ProBlott membranes and stained using Coomassie Blue R-250 as described by the manufacturer. The cysteine residues were not protected prior to analysis. The amino termini of the subunits were sequenced by Edman degradation on an Applied Biosystems 477A protein sequencer equipped with an Applied Biosystems 120A Analyzer at the Laboratory for Macromolecular Analysis at the Albert Einstein College of Medicine.
DNA Sequencing
The entire corynebacterial insert
in pLJC305 was sequenced using a combination of the original plasmid,
deletions, and subcloned fragments. Synthetic primers (Ransom Hill
Bioscience, Inc.) were used to fill in gaps in the sequence. All the
constructs were prepared in pBluescript II phagemid vectors.
Single-stranded template was prepared from phagemids in Escherichia
coli XL1-Blue using the helper phage VCSM13 as described by
Stratagene. Sequence data were obtained using
-
P-labeled dATP or dCTP with Sequenase 2.0. Both
strands were sequenced using both dGTP and dITP chemicals as described
by the manufacturer to combat problems inherent in sequencing high G-C
content DNA. Sequences were assembled, edited, and analyzed using the
Genetics Computer Group software package (14) .
Sequence Analysis
Sequences were masked using the program XNU (15) prior to submission to the National Center for Biotechnology Information for BLAST analysis (16). Dot matrix comparisons were conducted using the DOTPLOT program (window size = 30, stringency setting = 16, unless otherwise noted) (17). Pairwise amino acid sequence alignments were made using the GAP program (18). Multiple sequence alignments were generated using the PILEUP program (19). Multiple sequence alignment results obtained using the MACAW program (20) were used to define the limits of homologous regions among three or more proteins, as described under "Results." Except for MACAW, the sequence analysis programs were included in the Genetics Computer Group package.
Mass Spectrometry
Sarcosine oxidase (6.83 mg) was
mixed with guanidine hydrochloride to a final concentration of 2 M at pH 7.5. Except as noted, the subunits were isolated by
chromatography (200 µl/run) on a Superose 12 fast protein liquid
chromatography gel filtration column (Pharmacia) pre-equilibrated with
3 M guanidine hydrochloride, 1 mM EDTA, 10 mM potassium phosphate, pH 7.5, and run at 0.5 ml/min for 60 min. The
isolated subunits were pooled, concentrated using Centricon-3
microconcentrators (Amicon), and rechromatographed until the individual
subunits were pure. After the first gel filtration step, the isolated
and
subunits were incubated for 1 h at room temperature in
column buffer containing 50 mM dithiothreitol; the column
buffer used for rechromatography of these subunits contained 10 mM dithiothreitol. The volume of the isolated subunits was adjusted
to 1.5 ml in 5% acetic acid (
and
) or 5% acetic acid
containing 10 mM dithiothreitol (
and
). The samples
were centrifuged at 33,000 × g for 20 min at 4 °C,
and the supernatant removed to a fresh tube. The pellet was resuspended
in an additional 1.5 ml of solvent and centrifuged as above. This
procedure was repeated until the entire pellet went into solution. The
pooled samples were concentrated to a final volume of approximately 100
µl and submitted for analysis. Mass spectrometry (electrospray and
MALDI) was kindly performed by Nolle Potier at the Laboratoire de
Spectrometrie de Masse Bioorganique, Faculte de Chimie, Universite
Louis Pasteur, Strasbourg, France.
Organization of the Sarcosine Oxidase Operon and Nearby
Gene(s)
We have sequenced the 7285-base pair corynebacterial DNA
insert in pLJC305, a pBluescript II SK(+) construct previously
shown to express high levels of sarcosine oxidase using E. coli XL1-Blue as the host organism(13) . The
nucleotide sequence and the deduced amino acid sequence of seven open
reading frames are shown in Fig. 1. Gene arrangement, intergenic
regions, and a restriction map are shown in Fig. 2. This section
provides an overview. Details of gene identification and sequence
analysis are described later.
-terminal peptide sequencing data; gaps mainly reflect incomplete peptide sequence data, as detailed in the text. The putative covalent flavin-binding site in the β subunit of sarcosine oxidase (His175) is indicated by an arrow.
α, β, γ, and δ, respectively) with the genes arranged in the order soxBDAG. The serine hydroxymethyltransferase gene (glyA) terminates 2 bases before the start of soxB. The soxB and soxD genes are separated by 11 bases. The stop codon of soxD overlaps with the start codon of soxA. The start codon of soxG is found 8 bases before the end of soxA. The presence of overlapping genes or genes separated by a short intergenic region has been associated with translational coupling. In this phenomenon, the same ribosome or, at least, a component thereof, serves to translate two contiguous genes without ever dissociating from the mRNA (21). Potential ribosome-binding sites are identifiable 6 to 8 bases before the start of each of the four sox genes (Fig. 2). These sites are positioned such that translation of an upstream gene will terminate within the ribosome-binding site of the corresponding downstream gene, a feature required for the coupling effect.
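The junction analysis described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Gene` record, the `junction_gaps` helper, and the toy coordinates are hypothetical and are not taken from the sequence in Fig. 1. A positive gap is the number of intergenic bases; a negative gap means the downstream gene starts inside the upstream gene.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    start: int  # 1-based position of the first base of the start codon
    end: int    # 1-based position of the last base of the stop codon


def junction_gaps(genes):
    """Return (upstream, downstream, gap) for each pair of adjacent genes.

    gap > 0: number of bases between the two genes.
    gap < 0: the genes overlap by -gap bases.
    """
    ordered = sorted(genes, key=lambda g: g.start)
    return [(a.name, b.name, b.start - a.end - 1)
            for a, b in zip(ordered, ordered[1:])]


# Toy coordinates (hypothetical), chosen only to show short gaps and overlaps.
toy_genes = [
    Gene("geneA", 1, 100),
    Gene("geneB", 103, 200),
    Gene("geneC", 212, 300),
    Gene("geneD", 300, 400),
    Gene("geneE", 393, 500),
]
toy_gaps = junction_gaps(toy_genes)
```

Short positive gaps and negative gaps are the junction patterns that the text associates with translational coupling.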
Codon Usage and G-C Content
The overall G-C content of the six genes identified in this study was 68%. Table 1 compares the codon preference of these genes with 24 genes
from other Corynebacterium sp. found in the data base,
including two genes with a high G-C content (70-71%) and 22 genes
with a moderate G-C content (41-59%). As expected, codons
containing a G or C in the third position are used preferentially in
the Corynebacterium sp. P-1 genes and in the two G-C-rich
genes from other corynebacteria. An exception is provided by the codon
usage observed for glutamic acid in the Corynebacterium sp.
P-1 genes which is evenly distributed between GAG (55%) and GAA (45%),
similar to other corynebacterial genes with a moderate G-C content
rather than the G-C-rich genes. With the inclusion of our data, the
codon usage data base for G-C-rich corynebacterial sequences will
increase about 4-fold and may facilitate the design of oligonucleotide
probes for use in these organisms.
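The quantities discussed in this section (overall G-C content, preference for G or C in the third codon position, and the split between synonymous codons such as GAG and GAA) can be computed as in the following minimal sketch. The function names and the toy open reading frame are assumptions introduced for illustration; they are not the Genetics Computer Group routines used in this study.

```python
from collections import Counter


def gc_content(seq):
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    return sum(seq.count(base) for base in "GC") / len(seq)


def codons(orf):
    """Split an in-frame open reading frame into codons, dropping any partial codon."""
    orf = orf.upper()
    usable = len(orf) - len(orf) % 3
    return [orf[i:i + 3] for i in range(0, usable, 3)]


def gc3_fraction(orf):
    """Fraction of codons that carry G or C in the third position."""
    triplets = codons(orf)
    return sum(c[2] in "GC" for c in triplets) / len(triplets)


def synonymous_usage(orf, synonymous):
    """Relative usage of each codon within one synonymous family."""
    counts = Counter(c for c in codons(orf) if c in synonymous)
    total = sum(counts.values())
    return {c: counts[c] / total for c in synonymous} if total else {}


# Toy ORF (hypothetical): ATG GAG GAA GCC GAG TGA
toy_orf = "ATGGAGGAAGCCGAGTGA"
glu_usage = synonymous_usage(toy_orf, ("GAG", "GAA"))
```

Applied to each gene in turn, `synonymous_usage` gives the kind of per-amino-acid breakdown summarized for glutamic acid in the text.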
Identification of the Serine Hydroxymethyltransferase
Gene (glyA)
A BLAST search revealed strong homology between the
open reading frame at the 5` end of the corynebacterial DNA insert in
pLJC305 and the COOH-terminal region of various serine
hydroxymethyltransferases. A multiple sequence alignment of the
corynebacterial sequence with 21 known serine hydroxymethyltransferases
was generated using the program PILEUP. The results, presented as a
dendrogram in Fig. 3A, show a clear division into two
major clusters corresponding to prokaryotic and eukaryotic sequences.
The putative corynebacterial sequence falls within the prokaryotic
cluster and exhibits highest homology with serine hydroxymethyltransferase from Hyphomicrobium methylovorum, as judged by BLAST analysis and pairwise comparisons using the program GAP. The corynebacterial sequence aligns with a COOH-terminal region containing about 65% of the hyphomicrobial serine hydroxymethyltransferase, exhibiting 57% identity and 72% similarity.
Further, a lysine residue is found in the corynebacterial sequence at a
position corresponding to a highly conserved lysine which serves as the
covalent attachment site for pyridoxal phosphate in serine
hydroxymethyltransferase (Fig. 3B).
Identification of Genes Encoding the Two Smallest
Sarcosine Oxidase Subunits (soxD and soxG)
soxG encodes a polypeptide of 203 residues. The sequence determined for the first 11 amino acids of the γ subunit by peptide sequence analysis coincides with residues 2-12 in the amino acid sequence deduced from the soxG gene sequence, indicating post-translational loss of the initial methionine (Fig. 1). The molecular weight of the γ subunit estimated from the gene sequence and corrected for loss of one methionine (20,898) is in excellent agreement with a value determined by electrospray mass spectrometry (20,899) and is consistent with values previously estimated by SDS-gel electrophoresis (Table 2).
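The comparison of a sequence-derived molecular weight with a mass spectrometry value, including the correction for loss of the initial methionine, can be illustrated as follows. This is a sketch under stated assumptions: it uses standard average residue masses, and the helper names `average_mass` and `mature_mass` are hypothetical. It is not the program used to obtain the values reported here.

```python
# Standard average masses (Da) of amino acid residues (free amino acid minus water).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one water per chain accounts for the free termini


def average_mass(sequence):
    """Average molecular mass of an unmodified polypeptide chain."""
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER


def mature_mass(sequence, remove_initial_met=True):
    """Mass after optional post-translational loss of the initial methionine."""
    if remove_initial_met and sequence[:1].upper() == "M":
        sequence = sequence[1:]
    return average_mass(sequence)
```

For a subunit that retains its initial methionine, `mature_mass(seq, remove_initial_met=False)` gives the uncorrected value; covalent cofactors would have to be added separately.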
δ subunit resulted in the identification of 18 out of the first 22 amino acids. The peptide data agree with the sequence deduced from the gene sequence, except for a mismatch in the third amino acid (Fig. 1). The peptide data indicate that the initial methionine is not lost from the δ subunit, in contrast to results obtained for all the other subunits. The molecular weight estimated from the gene sequence (11,314) matches the value obtained by electrospray mass spectrometry (11,313) whereas somewhat lower values have been estimated by SDS-gel electrophoresis (Table 2).
γ or δ subunit with any other protein in the data base. These subunits do not contain any known motifs.
Identification and Analysis of soxB, the Gene Encoding
the Sarcosine Oxidase Subunit with Covalently Bound
FAD
soxB encodes a protein with 405 residues. Peptide sequence analysis identified 34 out of the first 35 amino acids in the β subunit. The peptide data are in complete agreement with the sequence deduced from the soxB gene sequence and indicate that the initial methionine is absent in the mature protein (Fig. 1). As expected, the molecular weight of the β subunit estimated from the gene sequence (43,854) is somewhat lower than the value determined by electrospray mass spectrometry (44,314). The best match with the electrospray value is obtained when the estimated molecular weight of β includes a contribution due to covalently bound FMN (44,308). The stability of the pyrophosphate link in FAD under the analysis conditions is not known. Similar molecular weight values have been obtained by SDS-gel electrophoresis (Table 2).
β subunit exhibits an ADP-binding motif near the NH2 terminus which satisfies all of the 11 consensus sequence requirements described by Wierenga et al. (25), except for an aspartate at position 1. A BLAST search revealed sequence homology of the β subunit with the following proteins: peptide fragment data obtained for the β subunit from a similar heterotetrameric sarcosine oxidase from Corynebacterium sp. U-96 (26, 27, 28); four monomeric bacterial sarcosine oxidases (29, 30, 31, 32); and the NH2-terminal half of rat liver dimethylglycine dehydrogenase (33), an enzyme twice the size of the β subunit. All of these proteins contain an ADP-binding motif near the NH2 terminus. Fig. 4 shows an alignment of the NH2-terminal region of all seven sequences. An aspartate is found in position 1 in all of the sequences except for dimethylglycine dehydrogenase, which has a glutamate in this position. An aspartate has been observed in position 1 in several well-documented FAD-binding sites (34, 35). In addition to the six known proteins, BLAST analysis also detected homology of the β subunit with a putative 95-residue peptide fragment encoded by an unidentified open reading frame found near the cycH gene in Paracoccus denitrificans. The peptide exhibits 70% identity and 84% similarity with the β subunit and contains an ADP-binding motif near its NH2 terminus (Fig. 4).
β subunit from two corynebacterial sarcosine oxidases, various monomeric sarcosine oxidases, rat dimethylglycine dehydrogenase, and an unidentified open reading frame from P. denitrificans (accession no. Z36942). The 11 positions that define the motif as described by Wierenga et al. (25) are marked by asterisks (*), and the variable loop is indicated between the vertical lines. The flavin attachment site in dimethylglycine dehydrogenase is shown by an arrow.
β subunit exhibits greater than 80% identity with peptide fragment data encompassing about 43% of the β subunit from the Corynebacterium sp. U-96 enzyme. These fragments are readily aligned with the amino acid sequence deduced from the soxB gene sequence. His175 is tentatively identified as the covalent flavin attachment site in the β subunit, based on an alignment with a covalent flavin-containing peptide from the Corynebacterium sp. U-96 enzyme (Fig. 5). In dimethylglycine dehydrogenase, the covalent flavin is attached to
His
, a position much closer to the ADP-binding motif. The
covalent flavin attachment site in the monomeric sarcosine oxidases is
not known. The putative covalent flavin attachment site in the β
subunit aligns with a conserved asparagine in the monomeric sarcosine
oxidases (Fig. 5). On the other hand, the dimethylglycine
dehydrogenase attachment site aligns with a histidine residue that is
conserved in all of the monomeric sarcosine oxidases but not in the β
subunit, where an alanine (Ala
) is found at this
position (Fig. 4).
β subunit from Corynebacterium sp. P-1 sarcosine oxidase, various monomeric sarcosine oxidases, and β subunit peptide fragments from Corynebacterium sp. U-96 sarcosine oxidase. The figure shows the region of the alignment surrounding the previously identified covalent flavin attachment site in the Corynebacterium sp. U-96 β subunit (indicated by an asterisk).
β subunit and the monomeric sarcosine oxidases. The degree of homology among the monomeric sarcosine oxidases themselves shows greater variability (37-86% identity, 58-91% similarity). A multiple sequence alignment of the β subunit with the monomeric sarcosine oxidases reveals that 42 residues are conserved among the five polypeptides. The most highly conserved regions include the NH2-terminal dinucleotide binding motif and an ∼60-amino acid region near the COOH terminus (Thr
to
Arg
in the β subunit) (data not shown). The level of homology of the β subunit with dimethylglycine dehydrogenase (23% identity, 48% similarity) is similar to that observed for the β subunit and the monomeric sarcosine oxidases, but a somewhat lower degree of homology is observed when dimethylglycine dehydrogenase is compared with the monomeric enzymes (18-23% identity, 43-45% similarity).
Identification and Sequence Analysis of the Gene Encoding
the Largest Sarcosine Oxidase Subunit (soxA)
soxA codes for a 967-residue protein. Results obtained for the sequence of the first 29 amino acids in the α subunit by peptide analysis are in complete agreement with the sequence deduced from the soxA gene sequence, except for the loss of the initial methionine (Fig. 1). The molecular weight of the α subunit determined from the gene sequence (102,633) is in good agreement with a value determined by MALDI mass spectrometry (103,160) and consistent with values previously estimated by SDS-gel electrophoresis (Table 2).
terminus of the α subunit which meets the 11 consensus sequence requirements (25), except for an aspartate at position 1 (Fig. 6), a feature also seen with the β subunit and the monomeric sarcosine oxidases.
-terminal half of the α subunit from sarcosine oxidase and the A subunit from octopine and nopaline oxidases. The figure shows the region around the ADP-binding motif. Residues defining this motif are indicated by asterisks (*).
α subunit as the query sequence in a BLAST analysis, two groups of high-scoring sequences were identified which exhibit homology to regions in either the NH2- or the COOH-terminal half of the large α polypeptide. The limits of homologous regions involving three or more proteins were estimated based on the location of statistically significant homology blocks obtained when the sequences were aligned using the program MACAW (20). In comparing two proteins, the extent of sequence homology was estimated based on the length of the diagonal observed in a dot matrix comparison at a stringency setting of 16. Except as noted, this stringency setting eliminated most background noise and the extent of the remaining diagonal was readily discerned. Sequence analysis of the α subunit reveals additional homology between bacterial sarcosine oxidase and mammalian dimethylglycine dehydrogenase. The results are summarized in Fig. 7 and detailed below.
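The role of the window size and stringency setting in a dot matrix comparison can be illustrated with a simplified sketch. Note the assumptions: the version below scores each window by counting identical residues, whereas the program used in this study scores windows with an amino acid comparison table, and the function names are hypothetical. The principle is the same: a dot is recorded when the window score reaches the stringency, and a homologous region appears as a run of dots along a diagonal.

```python
def dot_matrix(seq_a, seq_b, window=30, stringency=16):
    """Return (i, j) window start positions whose identity count meets the stringency."""
    hits = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            matches = sum(a == b for a, b in
                          zip(seq_a[i:i + window], seq_b[j:j + window]))
            if matches >= stringency:
                hits.append((i, j))
    return hits


def longest_diagonal(hits):
    """Length of the longest run of consecutive dots along any diagonal."""
    hit_set = set(hits)
    best = 0
    for i, j in hit_set:
        if (i - 1, j - 1) in hit_set:
            continue  # not the first dot of its run
        length = 1
        while (i + length, j + length) in hit_set:
            length += 1
        best = max(best, length)
    return best


# Toy sequences (hypothetical) that differ at a single position.
toy_a = "ACDEFGHIKL"
toy_b = "ACDQFGHIKL"
strict = dot_matrix(toy_a, toy_b, window=4, stringency=4)
relaxed = dot_matrix(toy_a, toy_b, window=4, stringency=3)
```

Lowering the stringency extends the diagonal at the cost of more background, which is the trade-off involved when a weaker region of homology becomes visible only at a lower setting.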
α subunit from sarcosine oxidase and rat dimethylglycine dehydrogenase. The top bars in panels A and B represent the entire α (alpha) and dimethylglycine dehydrogenase (DMGDH) polypeptides, respectively. Shorter bars correspond to regions of homology with other proteins. Cross-hatching is used to indicate a region of weaker homology between the α subunit and dimethylglycine dehydrogenase. The numbering in panels A and B refers to positions in the α subunit and dimethylglycine dehydrogenase, respectively. Stippled regions indicate ADP-binding motifs. The covalent FAD attachment site in dimethylglycine dehydrogenase is indicated. Beta, β subunit from sarcosine oxidase; oct/nop ox, A subunit from octopine and nopaline oxidases; monomeric sox, monomeric sarcosine oxidases.
Homology of the α Subunit with Octopine and Nopaline Oxidases
Octopine and nopaline oxidases are heterodimeric enzymes that catalyze oxidative cleavage reactions with N-substituted arginine derivatives, analogous to the sarcosine oxidase reaction (36, 37, 38). The A subunit from octopine and nopaline oxidases exhibits sequence homology with an NH2-terminal region in the α subunit that encompasses about 30% of the α subunit (Fig. 7) and includes the ADP-binding motif (Fig. 6). The A subunit from octopine and nopaline oxidases (504 and 435 residues, respectively) is about half the size of the sarcosine oxidase α subunit. The region of homology begins near the NH2 terminus of the A subunit and involves about 60% of the polypeptide. In this region, the α and A subunits exhibit 30-32% identity and 49-52% similarity, as judged by pairwise comparisons using the GAP program. In the same region, the octopine and nopaline oxidase subunits exhibit 43% identity and 59% similarity. A multiple sequence alignment reveals that 50 residues are conserved among the three subunits within the region of homology (data not shown).
Homology of the α Subunit with T-protein and Dimethylglycine Dehydrogenase
T-protein, a component of the multienzyme glycine cleavage system, is less than half the size of the α subunit. (E. coli T-protein has 364 residues (39, 40); eukaryotic T-proteins contain 392-408 residues prior to mitochondrial import which may result in the loss of a presequence (41, 42, 43, 44, 45, 46).) T-proteins from E. coli and six eukaryotes are found to exhibit sequence homology with a region in the COOH-terminal half of the α subunit that includes about 40% of the polypeptide (Fig. 7). The region of homology encompasses the entire length of the T-proteins except for small segments at the NH2 (12-46 residues) and COOH termini (9-10 residues). As judged by pairwise comparisons using the GAP program, the homology with the α subunit is somewhat greater with the E. coli T-protein (28% identity, 50% similarity) than with the eukaryotic T-proteins (19-23% identity, 46-48% similarity).
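Percent identity values of the kind quoted for pairwise comparisons can be derived from a gapped alignment as sketched below. This is a simplified illustration, not the GAP program's own calculation: the function name is hypothetical, and the choice to exclude gap-containing columns from the denominator is an assumption. Percent similarity would additionally require an amino acid comparison table and is omitted.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of ungapped alignment columns in which the two residues are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where neither sequence has a gap character.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != gap and b != gap]
    if not columns:
        return 0.0
    identical = sum(a == b for a, b in columns)
    return 100.0 * identical / len(columns)


# Toy alignment (hypothetical): five ungapped columns, four of them identical.
toy_identity = percent_identity("ACD-EF", "ACEQEF")
```

Because the denominator convention varies between programs, values computed this way are comparable only within one method.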
-terminal half of dimethylglycine dehydrogenase
exhibits homology with the sarcosine oxidase
subunit (see above),
we were surprised to find that a region in the COOH-terminal half of
dimethylglycine dehydrogenase is homologous to a region in the
COOH-terminal half of the α subunit. The major homology region of
the α subunit with dimethylglycine dehydrogenase overlaps with that
observed for the α subunit/T-protein homology, but is somewhat
smaller (Fig. 7). Within this region, the α subunit and
dimethylglycine dehydrogenase exhibit 31% identity and 56% similarity.
As described, the limits of homologous regions in pairwise comparisons
could generally be estimated based on the length of the diagonal in a
dot matrix comparison observed at a stringency setting of 16. However,
in comparing the α subunit with dimethylglycine dehydrogenase, the
COOH-terminal end of the diagonal was somewhat ambiguous; small pieces
were seen in a region extending beyond the major diagonal and became
clearly visible at stringency 14. On this basis, a weaker region of
homology may be defined that extends about 100 residues beyond the
major region (Fig. 7).
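The effect of the stringency setting on the apparent length of a diagonal can be illustrated with a simplified dot matrix comparison. The sketch below is an assumption-laden toy, not the program used in this work: it scores each window by counting identical residues (the actual program and its scoring scheme and window size are not specified in this section), and the function names, window size, and test sequences are hypothetical.

```python
def dot_matrix(seq_a, seq_b, window=4, stringency=4):
    """Return (i, j) window start positions whose score meets the stringency.

    ASSUMPTION: the score is the number of identical residues in the
    window; real dot matrix programs typically use a scoring table.
    """
    dots = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            score = sum(
                1 for k in range(window) if seq_a[i + k] == seq_b[j + k]
            )
            if score >= stringency:
                dots.append((i, j))
    return dots


def longest_diagonal(dots):
    """Length, in consecutive window starts, of the longest diagonal run."""
    dot_set = set(dots)
    best = 0
    for i, j in dot_set:
        if (i - 1, j - 1) in dot_set:
            continue  # not the first dot of its run
        length = 1
        while (i + length, j + length) in dot_set:
            length += 1
        best = max(best, length)
    return best


# Hypothetical sequences sharing the segment MKVLED.
a = "GGGMKVLEDGGG"
b = "TTMKVLEDTT"
strict = dot_matrix(a, b, window=4, stringency=4)
relaxed = dot_matrix(a, b, window=4, stringency=3)
```

In this toy case the diagonal grows from 3 to 5 window positions when the stringency is lowered from 4 to 3, which mirrors how weaker flanking homology became visible here only at the lower setting.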
subunit homology, but is somewhat larger (Fig. 7). The homology with T-proteins encompasses more than 40%
of dimethylglycine dehydrogenase and nearly the entire length of the
T-proteins. In pairwise comparisons, dimethylglycine dehydrogenase and
the various T-proteins were found to exhibit 24-26% identity and
47-51% similarity.
A multiple sequence alignment of the α subunit, the COOH-terminal half of
dimethylglycine dehydrogenase, and seven T-proteins shows that 23
residues are conserved among the nine proteins. Strikingly, 5 of the
conserved residues are acidic (4 Glu, 1 Asp), and two additional sites
are occupied by either Glu or Asp. All but two of these acidic sites
and a conserved arginine are found in a 52-residue region of the
α subunit (Glu
Asp
) (Fig. 8).
When the data base was scanned using the consensus sequence for this
region, only dimethylglycine dehydrogenase and T-proteins were
identified.
Figure 8:
Alignment of the α subunit from sarcosine oxidase (alpha), the
COOH-terminal half of rat dimethylglycine dehydrogenase (DMGDH), and
various T-proteins. The figure shows a highly conserved region
containing five acidic sites.
Identification of the purU Gene
An open reading frame found 340 bases downstream from the soxG gene
encodes a peptide of 286 residues. Using this peptide as the query
sequence, a BLAST search identified purU and tgs genes from E. coli as
high scoring sequences(22, 23), along with a purU gene from Shigella
flexneri(47) and an incomplete sequence for a putative purN gene from
Haemophilus influenzae(48).
The E. coli purU and tgs genes are identical, as
judged by comparison of their sequences and chromosomal location. The E. coli purU protein exhibits sequence homology with the purN gene product (27% identity)(22) . The putative purN gene from H. influenzae is probably a purU gene, as judged by comparison of its gene product with
the E. coli purU (72% identity) and purN (27%
identity) proteins. terminus. In
pairwise comparisons, the corynebacterial sequence exhibits
42-46% identity and 63-67% similarity with the other purU gene products.
-terminal peptide sequence data and subunit molecular
weights estimated by mass spectroscopy (electrospray or MALDI) and
SDS-gel electrophoresis. The putative glyA gene was identified
based on its homology with known serine hydroxymethyltransferases.
-H4folate) for serine hydroxymethyltransferase, and the coupling of
the two reactions will result in the net conversion of sarcosine to
serine. Cellular energy demands may be met, in part, by the subsequent
conversion of serine to pyruvate, using a constitutive serine
dehydratase or possibly an isozyme that is induced by growth with
sarcosine as source of carbon and energy. Alternate fates for the
sarcosine oxidase reaction products include use of 5,10-CH2-H4folate
in various biosynthetic pathways and the catabolism of glycine to CO2
and NH3 via the glycine cleavage system, a reaction which also
generates NADH and 5,10-CH2-H4folate.
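The net conversion of sarcosine to serine can be checked by writing out the two coupled reactions. The scheme below is a sketch that assumes the generally accepted stoichiometries for sarcosine oxidase and serine hydroxymethyltransferase; these equations are not quoted from this section.

```latex
\begin{align*}
\text{sarcosine} + \mathrm{H_4folate} + \mathrm{O_2}
  &\rightarrow \text{glycine} + 5{,}10\text{-}\mathrm{CH_2}\text{-}\mathrm{H_4folate} + \mathrm{H_2O_2} \\
\text{glycine} + 5{,}10\text{-}\mathrm{CH_2}\text{-}\mathrm{H_4folate} + \mathrm{H_2O}
  &\rightarrow \text{serine} + \mathrm{H_4folate} \\[4pt]
\text{net:}\quad \text{sarcosine} + \mathrm{O_2} + \mathrm{H_2O}
  &\rightarrow \text{serine} + \mathrm{H_2O_2}
\end{align*}
```

Glycine, the folate cofactor, and its methylene derivative cancel when the two reactions are summed, so the folate acts catalytically in the coupled pathway.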
folate hydrolase. This enzyme regulates the H4folate one-carbon pool
in E. coli and also provides a source of formate for a key step in
de novo purine biosynthesis(22, 24). A transcription termination
sequence could not be identified in the intergenic region (340 bases)
between the soxG and purU genes. It is not known
whether the purU gene is part of the sarcosine oxidase operon.
The β subunit of sarcosine oxidase from Corynebacterium sp. P-1 contains
an NH2-terminal ADP-binding motif and the covalent flavin attachment
site, tentatively identified as His based on an alignment with a
covalent flavin-containing peptide from a similar corynebacterial
(Corynebacterium sp. U-96) sarcosine oxidase. The β subunit exhibits
sequence homology with monomeric sarcosine oxidases and the
NH2-terminal half of rat liver dimethylglycine dehydrogenase, a
protein twice the size of the β subunit or the monomeric enzymes.
Homology is also found between the monomeric enzymes and human liver
pipecolic acid oxidase. The results suggest that the β subunit from
corynebacterial sarcosine oxidase, monomeric sarcosine oxidases,
pipecolic acid oxidase, and the NH2-terminal half of dimethylglycine
dehydrogenase may have evolved from a common ancestral flavoprotein
that contained a covalently bound prosthetic group. This new family of
flavoproteins (or flavodomains) contains enzymes that catalyze
analogous oxidation reactions with secondary or tertiary amino acids.
We predict that a similar domain will be found in the NH2-terminal
half of mammalian sarcosine dehydrogenase since the enzyme appears to
be homologous with dimethylglycine dehydrogenase, as judged by
comparison of amino acid sequence data obtained for the rat liver
enzymes around the covalent flavin attachment site (64% identity)(7).
The NH2-terminal half of the α subunit from sarcosine oxidase
contains a second ADP-binding motif and exhibits homology with the A
subunit from octopine and nopaline oxidases. These heterodimeric
enzymes are encoded by a tumor-inducing plasmid from Agrobacterium
tumefaciens. Octopine and nopaline oxidases catalyze oxidative
cleavage reactions with N-substituted arginine derivatives
(N-(1-D-carboxyethyl)-L-arginine and
N-(1,3-D-dicarboxypropyl)-L-arginine, respectively), analogous to the
sarcosine oxidase reaction(36, 37, 38). However, sarcosine oxidase is
not reduced upon anaerobic incubation with octopine or creatine, a
sarcosine analogue and metabolic precursor containing a guanido
moiety.
The α subunit from sarcosine oxidase exhibits sequence homology with
T-protein from various organisms and the COOH-terminal half of rat
dimethylglycine dehydrogenase. T-protein is a component of the
multienzyme glycine cleavage system. Corynebacterial sarcosine oxidase,
dimethylglycine dehydrogenase, and T-protein all catalyze the synthesis
of 5,10-CH2-H4folate from H4folate and various one-carbon
donors(2, 8, 9, 50). The results suggest that the COOH-terminal halves
of the α subunit and dimethylglycine dehydrogenase contain a
H4folate-binding domain.
The α subunit and dimethylglycine dehydrogenase may have evolved from a
common H4folate-binding protein. An ancestral 10-CHO-H4folate-binding
protein has recently been proposed as an evolutionary precursor for
10-CHO-H4folate dehydrogenase and 5'-phosphoribosylglycinamide
transformylase(51, 52). The ADP-binding motif found near the NH2
terminus of the sarcosine oxidase α subunit suggests that the subunit
may contain an NH2-terminal domain that binds the enzyme's noncovalent
FAD. Dimethylglycine dehydrogenase may also contain an NH2-terminal
flavin-binding domain as judged by the presence of an ADP-binding motif
and a nearby covalent flavin attachment site. In this case, the
polypeptides may have evolved by fusion of a common ancestral gene for
a H4folate-binding protein with genes encoding different
flavin-binding proteins.
The abbreviations used are: H4folate, tetrahydrofolate; MALDI,
matrix-assisted laser desorption/ionization; 5,10-CH2-H4folate,
5,10-methylenetetrahydrofolate; 10-CHO-H4folate,
10-formyltetrahydrofolate; TEMED, N,N,N',N'-tetramethylethylenediamine.
We thank Nolle Potier for performing mass spectral
analysis.
©1995 by The American Society for Biochemistry and Molecular Biology, Inc.