©1995 by The American Society for Biochemistry and Molecular Biology, Inc.
Sequence Analysis of Sarcosine Oxidase and Nearby Genes Reveals Homologies with Key Enzymes of Folate One-carbon Metabolism (*)

(Received for publication, January 26, 1995; and in revised form, March 13, 1995)

Lawrence J. Chlumsky, Lening Zhang, and Marilyn Schuman Jorns (§)

From the Department of Biological Chemistry, Hahnemann University School of Medicine, Philadelphia, Pennsylvania 19102

ABSTRACT

Corynebacterial sarcosine oxidase, a heterotetrameric (alpha, beta, gamma, delta) enzyme containing covalent and noncovalent FAD, catalyzes the oxidative demethylation of sarcosine to yield glycine, H(2)O(2), and 5,10-CH(2)-tetrahydrofolate (H(4)folate) in a reaction requiring H(4)folate and O(2). The sarcosine oxidase operon contains at least five closely packed genes encoding sarcosine oxidase subunits and serine hydroxymethyltransferase (glyA), arranged in the order glyAsoxBDAG. The operon status of a putative purU gene, found 340 nucleotides downstream from soxG, is not known. No homology with other proteins is observed for the two smallest sarcosine oxidase subunits, gamma and delta. The beta subunit (405 residues) contains an ADP-binding motif near its NH(2) terminus and the covalent FAD attachment site (H175), and exhibits homology with the NH(2)-terminal half of dimethylglycine dehydrogenase (857 residues) and monomeric, bacterial sarcosine oxidases (388 residues), enzymes that contain a single covalent FAD. The alpha subunit (967 residues) contains a second ADP-binding motif within an approximately 280-residue region near the NH(2) terminus that exhibits homology with subunit A from octopine and nopaline oxidases, heterodimeric enzymes that catalyze analogous oxidative cleavage reactions with N-substituted arginine derivatives. An approximately 380-residue region near the COOH terminus of alpha exhibits homology with T-protein and the COOH-terminal half of dimethylglycine dehydrogenase. These enzymes catalyze the formation of 5,10-CH(2)-H(4)folate, using different one-carbon donors. The results suggest that the alpha subunit and dimethylglycine dehydrogenase contain an NH(2)-terminal domain that binds noncovalent or covalent FAD, respectively, and a carboxyl-terminal H(4)folate-binding domain.


INTRODUCTION

Sarcosine oxidase is produced as an inducible enzyme when Corynebacterium sp. P-1 is grown with sarcosine as the source of carbon and energy (1). In the presence of oxygen and tetrahydrofolate (H(4)folate),(^1) the enzyme catalyzes the oxidative demethylation of sarcosine (N-methylglycine) to yield glycine, hydrogen peroxide, and 5,10-methylenetetrahydrofolate (5,10-CH(2)-H(4)folate). In the absence of H(4)folate, the same rate of sarcosine oxidation is observed and the oxidized methyl group is released as formaldehyde (2). In addition to sarcosine, the enzyme can also oxidize cyclic amino acids, like L-proline and L-pipecolic acid, but at slower rates (3).

Corynebacterial sarcosine oxidase contains four different subunits (alpha, 100 kDa; beta, 42 kDa; gamma, 20 kDa; delta, 6 kDa), 1 mol of noncovalently bound FAD, and 1 mol of FAD covalently attached to a histidyl residue in the beta subunit (1). The noncovalent flavin accepts electrons from sarcosine which are then transferred in one-electron steps to the covalent flavin where oxygen is reduced to hydrogen peroxide (3, 4). The presence of covalent and noncovalent flavin is a feature unique to the heterotetrameric sarcosine oxidases, but other physiologically important mammalian enzymes, like nitric oxide synthase and NADPH-cytochrome P450 reductase, contain two different flavins that bind noncovalently at the active site (5).

Bacterial sarcosine oxidases have been isolated from over a dozen different organisms and fall into two major classes: heterotetramers (alpha, 96-100 kDa; beta, 42-45 kDa; gamma, 20-23 kDa; delta, 6-14 kDa) contain covalent (beta subunit attachment) and noncovalent flavin, like the corynebacterial enzyme; monomeric enzymes (42-45 kDa) contain only covalent flavin and are similar in size to the beta subunit in the heterotetrameric enzymes (6). It is not known whether H(4)folate can act as a substrate for the monomeric enzymes.

In mammalian liver, oxidative cleavage of the methyl group from sarcosine is catalyzed by sarcosine dehydrogenase (94 kDa), an enzyme that exhibits many similarities with dimethylglycine dehydrogenase (96 kDa). These monomeric enzymes contain a single covalently bound flavin and use H(4)folate as a co-substrate, similar to that observed for corynebacterial sarcosine oxidase(7, 8, 9) . Proline oxidizing enzymes are not known to contain covalently bound flavin(10, 11) . However, pipecolic acid oxidase (46 kDa) from mammalian liver, an enzyme similar in size to the monomeric sarcosine oxidases and the beta subunit from the tetrameric sarcosine oxidases, does contain a single covalently bound flavin(12) . The presence of covalent flavin in sarcosine, dimethylglycine, and pipecolic acid oxidoreductases is noteworthy since this is a fairly uncommon feature, particularly in mammalian flavoproteins.

The complex quaternary structure and multiple binding sites for substrates and prosthetic groups in corynebacterial sarcosine oxidase provide a particularly intriguing target for structure-function studies. We recently reported the single-step cloning and overexpression of the genes coding for sarcosine oxidase from Corynebacterium sp. P-1 and the characterization of the recombinant enzyme(13) . In this paper, we report the structure and organization of the sarcosine oxidase operon and nearby gene(s).


EXPERIMENTAL PROCEDURES

Materials

Restriction enzymes, T4 DNA ligase, and calf intestinal alkaline phosphatase were purchased from New England Biolabs and Promega and used as described by the manufacturer. Sequenase Version 2.0 DNA Sequencing Kit was purchased from Amersham Corp. alpha-P-Labeled dATP and dCTP were purchased from ICN Biomedicals, Inc. Long Ranger Gel Solution was obtained from A. T. Biochem. ProBlott membranes were a gift from Applied Biosystems. Coomassie Blue R-250, TEMED, and ammonium persulfate were purchased from Bio-Rad.

NH(2)-terminal Amino Acid Sequence Analysis

The subunits from 100 pmol of purified sarcosine oxidase from Corynebacterium sp. P-1 were separated by SDS-polyacrylamide gel electrophoresis using a 3-24% acrylamide gradient(13) . The separated subunits were transferred to ProBlott membranes and stained using Coomassie Blue R-250 as described by the manufacturer. The cysteine residues were not protected prior to analysis. The amino termini of the subunits were sequenced by Edman degradation on an Applied Biosystems 477A protein sequencer equipped with an Applied Biosystems 120A Analyzer at the laboratory for Macromolecular Analysis at the Albert Einstein College of Medicine.

DNA Sequencing

The entire corynebacterial insert in pLJC305 was sequenced using a combination of the original plasmid, deletions, and subcloned fragments. Synthetic primers (Ransom Hill Bioscience, Inc.) were used to fill in gaps in the sequence. All the constructs were prepared in pBluescript II phagemid vectors. Single-stranded template was prepared from phagemids in Escherichia coli XL1-Blue using the helper phage VCSM13 as described by Stratagene. Sequence data were obtained using alpha-P-labeled dATP or dCTP with Sequenase 2.0. Both strands were sequenced using both dGTP and dITP chemicals as described by the manufacturer to combat problems inherent in sequencing high G-C content DNA. Sequences were assembled, edited, and analyzed using the Genetics Computer Group software package (14).

Sequence Analysis

Sequences were masked using the program XNU (15) prior to submission to the National Center for Biotechnology Information for BLAST analysis (16). Dot matrix comparisons were conducted using the DOTPLOT program (window size = 30, stringency setting = 16, unless otherwise noted) (17). Pairwise amino acid sequence alignments were made using the GAP program (18). Multiple sequence alignments were generated using the PILEUP program (19). Multiple sequence alignment results obtained using the MACAW program (20) were used to define the limits of homologous regions among three or more proteins, as described under "Results." Except for MACAW, the sequence analysis programs were included in the Genetics Computer Group package.
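The window and stringency parameters of a dot matrix comparison can be illustrated with a minimal sketch. It assumes a simplified scoring scheme in which each window is scored by counting identical residues; the Genetics Computer Group programs score windows with a comparison table, so the numbers produced here are not equivalent to the settings quoted above. The function name `dot_matrix` and the toy sequences are hypothetical.

```python
def dot_matrix(seq_a, seq_b, window=30, stringency=16):
    """Return (i, j) window start positions that would be plotted as dots.

    Simplifying assumption: a window scores one point per identical residue.
    A dot is recorded when the score of the window starting at i in seq_a
    and j in seq_b reaches the stringency threshold.
    """
    hits = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            score = sum(
                1 for k in range(window) if seq_a[i + k] == seq_b[j + k]
            )
            if score >= stringency:
                hits.append((i, j))
    return hits


if __name__ == "__main__":
    # Toy sequences, not taken from the paper.
    a = "ABCDEFGH"
    b = "ABCXEFGH"
    # A lower stringency keeps the full diagonal; a higher one trims it.
    print(dot_matrix(a, b, window=4, stringency=3))
    print(dot_matrix(a, b, window=4, stringency=4))
```

A run of dots along a diagonal marks a region of similarity, and lowering the stringency extends weak diagonals at the cost of more background noise.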

Mass Spectrometry

Sarcosine oxidase (6.83 mg) was mixed with guanidine hydrochloride to a final concentration of 2 M at pH 7.5. Except as noted, the subunits were isolated by chromatography (200 µl/run) on a Superose 12 fast protein liquid chromatography gel filtration column (Pharmacia) pre-equilibrated with 3 M guanidine hydrochloride, 1 mM EDTA, 10 mM potassium phosphate, pH 7.5, and run at 0.5 ml/min for 60 min. The isolated subunits were pooled, concentrated using Centricon-3 microconcentrators (Amicon), and rechromatographed until the individual subunits were pure. After the first gel filtration step, the isolated alpha and beta subunits were incubated for 1 h at room temperature in column buffer containing 50 mM dithiothreitol; the column buffer used for rechromatography of these subunits contained 10 mM dithiothreitol. The volume of the isolated subunits was adjusted to 1.5 ml in 5% acetic acid (gamma and delta) or 5% acetic acid containing 10 mM dithiothreitol (alpha and beta). The samples were centrifuged at 33,000 × g for 20 min at 4 °C, and the supernatant was removed to a fresh tube. The pellet was resuspended in an additional 1.5 ml of solvent and centrifuged as above. This procedure was repeated until the entire pellet went into solution. The pooled samples were concentrated to a final volume of approximately 100 µl and submitted for analysis. Mass spectrometry (electrospray and MALDI) was kindly performed by Nolle Potier at the Laboratoire de Spectrometrie de Masse Bioorganique, Faculte de Chimie, Universite Louis Pasteur, Strasbourg, France.


RESULTS

Organization of the Sarcosine Oxidase Operon and Nearby Gene(s)

We have sequenced the 7285-base pair corynebacterial DNA insert in pLJC305, a pBluescript II SK(+) construct previously shown to express high levels of sarcosine oxidase using E. coli XL1-Blue as the host organism(13) . The nucleotide sequence and the deduced amino acid sequence of seven open reading frames are shown in Fig. 1. Gene arrangement, intergenic regions, and a restriction map are shown in Fig. 2. This section provides an overview. Details of gene identification and sequence analysis are described later.



Figure 1: Nucleotide sequence of a 7285-base pair Corynebacterium sp. P-1 DNA fragment containing sarcosine oxidase and other nearby gene(s) and the deduced amino acid sequence of the gene products. Numbering begins at the G in the Sau3AI site at the 5' end of the corynebacterial DNA insert in pLJC305 (13). Putative ribosome-binding sites are indicated by double underlining. Single underlining indicates regions where the deduced amino acid sequence of the sarcosine oxidase subunits matched NH(2)-terminal peptide sequencing data; gaps mainly reflect incomplete peptide sequence data, as detailed in the text. The putative covalent flavin-binding site in the beta subunit of sarcosine oxidase (His175) is indicated by an arrow.




Figure 2: Gene arrangement, intergenic regions, and restriction sites in the sarcosine oxidase operon and nearby gene(s). The corynebacterial DNA insert in pLJC305 is shown by the heavy line; the lighter line indicates part of the vector multicloning region. Arrows are used to indicate gene position and size. Junctions between contiguous genes are detailed in the boxes. Restriction enzymes are designated as follows: Sa, SacI; X, XbaI; S, SalI; P, PstI; K, KpnI; B, BamHI. For clarity, SalI and PstI sites are omitted in the region downstream from soxD.



Starting from the 5' end of the corynebacterial DNA insert, the first open reading frame codes for a putative serine hydroxymethyltransferase. The next four open reading frames (soxA, soxB, soxG, and soxD) code for the subunits of sarcosine oxidase (alpha, beta, gamma, and delta, respectively) with the genes arranged in the order soxBDAG. The serine hydroxymethyltransferase gene (glyA) terminates 2 bases before the start of soxB. The soxB and soxD genes are separated by 11 bases. The stop codon of soxD overlaps with the start codon of soxA. The start codon of soxG is found 8 bases before the end of soxA. The presence of overlapping genes or genes separated by a short intergenic region has been associated with translational coupling. In this phenomenon, the same ribosome or, at least, a component thereof, serves to translate two contiguous genes without ever dissociating from the mRNA (21). Potential ribosome-binding sites are identifiable 6 to 8 bases before the start of each of the four sox genes (Fig. 2). These sites are positioned such that translation of an upstream gene will terminate within the ribosome-binding site of the corresponding downstream gene, a feature required for the coupling effect.

In contrast to the close packing observed for the first five genes, the next open reading frame is found 340 nucleotides downstream from soxG and codes for a putative purU gene. This gene is important in the regulation of one-carbon folate metabolism in E. coli and is also involved in purine biosynthesis (22, 23, 24) . A seventh open reading frame is found 73 bases from the end of the purU gene. This open reading frame could potentially code for a 65-residue peptide fragment, but a BLAST search failed to identify any homology with known proteins. Potential ribosome-binding sites are found 6 bases upstream from the start of the purU gene and the unidentified, seventh open reading frame.

Codon Usage and G-C Content

The overall G-C content of the six genes identified in this study was 68%. Table 1 compares the codon preference of these genes with 24 genes from other Corynebacterium sp. found in the data base, including two genes with a high G-C content (70-71%) and 22 genes with a moderate G-C content (41-59%). As expected, codons containing a G or C in the third position are used preferentially in the Corynebacterium sp. P-1 genes and in the two G-C-rich genes from other corynebacteria. An exception is provided by the codon usage observed for glutamic acid in the Corynebacterium sp. P-1 genes, which is evenly distributed between GAG (55%) and GAA (45%), similar to other corynebacterial genes with a moderate G-C content rather than the G-C-rich genes. With the inclusion of our data, the codon usage data base for G-C-rich corynebacterial sequences will increase about 4-fold and may facilitate the design of oligonucleotide probes for use in these organisms.
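The quantities discussed in this section (overall G-C content, G or C in the third codon position, and the split between synonymous codons such as GAG and GAA) can be computed as in the following sketch. The function names and the toy open reading frame are hypothetical and are not taken from the sequence reported here; the sketch assumes an in-frame DNA string containing only A, C, G, and T.

```python
from collections import Counter


def gc_content(dna):
    """Fraction of bases that are G or C."""
    dna = dna.upper()
    return sum(base in "GC" for base in dna) / len(dna)


def third_position_gc(orf):
    """Fraction of complete codons ending in G or C (orf must be in frame)."""
    thirds = orf.upper()[2::3]  # third base of every complete codon
    return sum(base in "GC" for base in thirds) / len(thirds)


def synonymous_usage(orf, codons):
    """Relative usage of the given synonymous codons within an in-frame ORF."""
    orf = orf.upper()
    usable = len(orf) - len(orf) % 3
    counts = Counter(orf[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts[c] for c in codons)
    return {c: (counts[c] / total if total else 0.0) for c in codons}


if __name__ == "__main__":
    toy_orf = "ATGGAGGAAGAGTAA"  # illustrative only: Met-Glu-Glu-Glu-stop
    print(gc_content(toy_orf))
    print(third_position_gc(toy_orf))
    print(synonymous_usage(toy_orf, ["GAG", "GAA"]))
```

Applied to each gene in turn, the same functions would give the per-gene values from which a codon preference table of this kind is compiled.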



Identification of the Serine Hydroxymethyltransferase Gene (glyA)

A BLAST search revealed strong homology between the open reading frame at the 5' end of the corynebacterial DNA insert in pLJC305 and the COOH-terminal region of various serine hydroxymethyltransferases. A multiple sequence alignment of the corynebacterial sequence with 21 known serine hydroxymethyltransferases was generated using the program PILEUP. The results, presented as a dendrogram in Fig. 3A, show a clear division into two major clusters corresponding to prokaryotic and eukaryotic sequences. The putative corynebacterial sequence falls within the prokaryotic cluster and exhibits highest homology with serine hydroxymethyltransferase from Hyphomicrobium methylovorum, as judged by BLAST analysis and pairwise comparisons using the program GAP. The corynebacterial sequence aligns with a COOH-terminal region containing about 65% of the hyphomicrobial serine hydroxymethyltransferase, exhibiting 57% identity and 72% similarity. Further, a lysine residue is found in the corynebacterial sequence at a position corresponding to a highly conserved lysine which serves as the covalent attachment site for pyridoxal phosphate in serine hydroxymethyltransferase (Fig. 3B).


Figure 3: Comparison of the putative serine hydroxymethyltransferase sequence from Corynebacterium sp. P-1 with known serine hydroxymethyltransferases. A, the dendrogram was generated by PILEUP analysis of the corynebacterial sequence with 21 known serine hydroxymethyltransferases found in the data bases. Percent identity and similarity values, shown in parentheses, were obtained from pairwise comparisons. B, the putative serine hydroxymethyltransferase fragment from Corynebacterium sp. P-1 was aligned with the COOH-terminal 65% of serine hydroxymethyltransferase from H. methylovorum(53) using the GAP program. Panel B shows the region of the alignment surrounding the conserved lysine residue (indicated by an arrow) that binds pyridoxal phosphate in known serine hydroxymethyltransferases.



Identification of Genes Encoding the Two Smallest Sarcosine Oxidase Subunits (soxD and soxG)

soxG encodes a polypeptide of 203 residues. The sequence determined for the first 11 amino acids of the gamma subunit by peptide sequence analysis coincides with residues 2-12 in the amino acid sequence deduced from the soxG gene sequence, indicating post-translational loss of the initial methionine (Fig. 1). The molecular weight of the gamma subunit estimated from the gene sequence and corrected for loss of one methionine (20,898) is in excellent agreement with a value determined by electrospray mass spectrometry (20,899) and is consistent with values previously estimated by SDS-gel electrophoresis (Table 2).



soxD codes for a polypeptide containing 98 residues. Peptide sequence analysis of the delta subunit resulted in the identification of 18 out of the first 22 amino acids. The peptide data agree with the sequence deduced from the gene sequence, except for a mismatch in the third amino acid (Fig. 1). The peptide data indicate that the initial methionine is not lost from the delta subunit, in contrast to results obtained for all the other subunits. The molecular weight estimated from the gene sequence (11,314) matches the value obtained by electrospray mass spectrometry (11,313) whereas somewhat lower values have been estimated by SDS-gel electrophoresis (Table 2).

A BLAST search detected no homology of the gamma or delta subunit with any other protein in the data base. These subunits do not contain any known motifs.

Identification and Analysis of soxB, the Gene Encoding the Sarcosine Oxidase Subunit with Covalently Bound FAD

soxB encodes a protein with 405 residues. Peptide sequence analysis identified 34 out of the first 35 amino acids in the beta subunit. The peptide data are in complete agreement with the sequence deduced from the soxB gene sequence and indicate that the initial methionine is absent in the mature protein (Fig. 1). As expected, the molecular weight of the beta subunit estimated from the gene sequence (43,854) is somewhat lower than the value determined by electrospray mass spectrometry (44,314). The best match with the electrospray value is obtained when the estimated molecular weight of beta includes a contribution due to covalently bound FMN (44,308). The stability of the pyrophosphate link in FAD under the analysis conditions is not known. Similar molecular weight values have been obtained by SDS-gel electrophoresis (Table 2).

The beta subunit exhibits an ADP-binding motif near the NH(2) terminus which satisfies all of the 11 consensus sequence requirements described by Wierenga et al. (25), except for an aspartate at position 1. A BLAST search revealed sequence homology of the beta subunit with the following proteins: peptide fragment data obtained for the beta subunit from a similar heterotetrameric sarcosine oxidase from Corynebacterium sp. U-96 (26, 27, 28); four monomeric bacterial sarcosine oxidases (29, 30, 31, 32); and the NH(2)-terminal half of rat liver dimethylglycine dehydrogenase (33), an enzyme twice the size of the beta subunit. All of these proteins contain an ADP-binding motif near the NH(2) terminus. Fig. 4 shows an alignment of the NH(2)-terminal region of all seven sequences. An aspartate is found in position 1 in all of the sequences except for dimethylglycine dehydrogenase, which has a glutamate in this position. An aspartate has been observed in position 1 in several well-documented FAD-binding sites (34, 35). In addition to the six known proteins, BLAST analysis also detected homology of the beta subunit with a putative 95-residue peptide fragment encoded by an unidentified open reading frame found near the cycH gene in Paracoccus denitrificans. The peptide exhibits 70% identity and 84% similarity with the beta subunit and contains an ADP-binding motif near its NH(2) terminus (Fig. 4).


Figure 4: Alignment of the ADP-binding motif in the beta subunit from two corynebacterial sarcosine oxidases, various monomeric sarcosine oxidases, rat dimethylglycine dehydrogenase, and an unidentified open reading frame from P. denitrificans (accession no. Z36942). The 11 positions that define the motif as described by Wierenga et al. (25) are marked by asterisks (*), and the variable loop is indicated between the vertical lines. The flavin attachment site in dimethylglycine dehydrogenase is shown by an arrow.



The beta subunit exhibits greater than 80% identity with peptide fragment data encompassing about 43% of the beta subunit from the Corynebacterium sp. U-96 enzyme. These fragments are readily aligned with the amino acid sequence deduced from the soxB gene sequence. His175 is tentatively identified as the covalent flavin attachment site in the beta subunit, based on an alignment with a covalent flavin-containing peptide from the Corynebacterium sp. U-96 enzyme (Fig. 5). In dimethylglycine dehydrogenase, the covalent flavin is attached to a histidine residue at a position much closer to the ADP-binding motif. The covalent flavin attachment site in the monomeric sarcosine oxidases is not known. The putative covalent flavin attachment site in the beta subunit aligns with a conserved asparagine in the monomeric sarcosine oxidases (Fig. 5). On the other hand, the dimethylglycine dehydrogenase attachment site aligns with a histidine residue that is conserved in all of the monomeric sarcosine oxidases but not in the beta subunit, where an alanine is found at this position (Fig. 4).


Figure 5: A multiple sequence alignment was generated using the beta subunit from Corynebacterium sp. P-1 sarcosine oxidase, various monomeric sarcosine oxidases, and beta subunit peptide fragments from Corynebacterium sp. U-96 sarcosine oxidase. The figure shows the region of the alignment surrounding the previously identified covalent flavin attachment site in the Corynebacterium sp. U-96 beta subunit (indicated by an asterisk).



Pairwise alignments using the GAP program indicate about 22-25% identity and 45-47% similarity between the beta subunit and the monomeric sarcosine oxidases. The degree of homology among the monomeric sarcosine oxidases themselves shows greater variability (37-86% identity, 58-91% similarity). A multiple sequence alignment of the beta subunit with the monomeric sarcosine oxidases reveals that 42 residues are conserved among the five polypeptides. The most highly conserved regions include the NH(2)-terminal dinucleotide binding motif and an approximately 60-amino acid region near the COOH terminus (Thr to Arg in the beta subunit) (data not shown). The level of homology of the beta subunit with dimethylglycine dehydrogenase (23% identity, 48% similarity) is similar to that observed for the beta subunit and the monomeric sarcosine oxidases, but a somewhat lower degree of homology is observed when dimethylglycine dehydrogenase is compared with the monomeric enzymes (18-23% identity, 43-45% similarity).
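For readers unfamiliar with how a percent identity value is derived from a pairwise alignment, the following sketch shows one common convention. It assumes two already-aligned sequences of equal length with "-" marking gaps, and it counts only positions where neither sequence has a gap; whether the GAP program uses exactly this denominator is an assumption. Similarity values additionally require a residue comparison table and are not computed here. The function name and toy alignment are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of ungapped aligned positions carrying identical residues.

    Assumption: positions with a gap in either sequence are excluded
    from the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [
        (a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap
    ]
    if not pairs:
        return 0.0
    identical = sum(1 for a, b in pairs if a == b)
    return 100.0 * identical / len(pairs)


if __name__ == "__main__":
    # Toy alignment, not taken from the paper.
    print(percent_identity("ACD-EF", "ACEGEF"))
```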

Identification and Sequence Analysis of the Gene Encoding the Largest Sarcosine Oxidase Subunit (soxA)

soxA codes for a 967-residue protein. Results obtained for the sequence of the first 29 amino acids in the alpha subunit by peptide analysis are in complete agreement with the sequence deduced from the soxA gene sequence, except for the loss of the initial methionine (Fig. 1). The molecular weight of the alpha subunit determined from the gene sequence (102,633) is in good agreement with a value determined by MALDI mass spectrometry (103,160) and consistent with values previously estimated by SDS-gel electrophoresis (Table 2).
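The agreement between sequence-derived and measured molecular weights for the four subunits can be checked with simple arithmetic. The sketch below uses only the values quoted in the text for each sox gene product; the dictionary labels and function names are illustrative.

```python
# (predicted from gene sequence, measured by mass spectrometry), as quoted in the text
MASSES = {
    "soxA product (MALDI)": (102_633, 103_160),
    "soxB product (electrospray)": (43_854, 44_314),
    "soxB product + covalent FMN": (44_308, 44_314),
    "soxG product (electrospray)": (20_898, 20_899),
    "soxD product (electrospray)": (11_314, 11_313),
}


def deviation(predicted, measured):
    """Return the absolute (Da) and relative (%) deviation of measured from predicted."""
    delta = measured - predicted
    return delta, 100.0 * delta / predicted


def report(masses=MASSES):
    """Format one summary line per entry."""
    lines = []
    for label, (predicted, measured) in masses.items():
        delta, percent = deviation(predicted, measured)
        lines.append(f"{label}: {delta:+d} Da ({percent:+.3f}%)")
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

The 460-unit gap for the soxB product shrinks to 6 once the quoted FMN-containing value is used, whereas the soxG and soxD products agree with the gene sequence to within 1 unit without any correction.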

An ADP-binding motif is found near the NH(2) terminus of the alpha subunit which meets the 11 consensus sequence requirements (25) , except for an aspartate at position 1 (Fig. 6), a feature also seen with the beta subunit and the monomeric sarcosine oxidases.


Figure 6: A multiple sequence alignment was generated using the NH(2)-terminal half of the alpha subunit from sarcosine oxidase and the A subunit from octopine and nopaline oxidases. The figure shows the region around the ADP-binding motif. Residues defining this motif are indicated by asterisks (*).



Using the alpha subunit as the query sequence in a BLAST analysis, two groups of high-scoring sequences were identified which exhibit homology to regions in either the NH(2)- or the COOH-terminal half of the large alpha polypeptide. The limits of homologous regions involving three or more proteins were estimated based on the location of statistically significant homology blocks obtained when the sequences were aligned using the program MACAW (20). In comparing two proteins, the extent of sequence homology was estimated based on the length of the diagonal observed in a dot matrix comparison at a stringency setting of 16. Except as noted, this stringency setting eliminated most background noise and the extent of the remaining diagonal was readily discerned. Sequence analysis of the alpha subunit reveals additional homology between bacterial sarcosine oxidase and mammalian dimethylglycine dehydrogenase. The results are summarized in Fig. 7 and detailed below.


Figure 7: Homologies and motifs in the alpha subunit from sarcosine oxidase and rat dimethylglycine dehydrogenase. The top bars in panels A and B represent the entire alpha (alpha) and dimethylglycine dehydrogenase (DMGDH) polypeptides, respectively. Shorter bars correspond to regions of homology with other proteins. Cross-hatching is used to indicate a region of weaker homology between the alpha subunit and dimethylglycine dehydrogenase. The numbering in panels A and B refers to positions in the alpha subunit and dimethylglycine dehydrogenase, respectively. Stippled regions indicate ADP-binding motifs. The covalent FAD attachment site in dimethylglycine dehydrogenase is indicated. Beta, beta subunit from sarcosine oxidase; oct/nop ox, A subunit from octopine and nopaline oxidases; monomeric sox, monomeric sarcosine oxidases.



Homology of the alpha Subunit with Octopine and Nopaline Oxidases

Octopine and nopaline oxidases are heterodimeric enzymes that catalyze oxidative cleavage reactions with N-substituted arginine derivatives, analogous to the sarcosine oxidase reaction(36, 37, 38) . The A subunit from octopine and nopaline oxidases exhibits sequence homology with an NH(2)-terminal region in the alpha subunit that encompasses about 30% of the alpha subunit (Fig. 7) and includes the ADP-binding motif (Fig. 6). The A subunit from octopine and nopaline oxidases (504 and 435 residues, respectively) is about half the size of the sarcosine oxidase alpha subunit. The region of homology begins near the NH(2) terminus of the A subunit and involves about 60% of the polypeptide. In this region, the alpha and A subunits exhibit 30-32% identity and 49-52% similarity, as judged by pairwise comparisons using the GAP program. In the same region, the octopine and nopaline oxidase subunits exhibit 43% identity and 59% similarity. A multiple sequence alignment reveals that 50 residues are conserved among the three subunits within the region of homology (data not shown).

Homology of the alpha Subunit with T-protein and Dimethylglycine Dehydrogenase

T-protein, a component of the multienzyme glycine cleavage system, is less than half the size of the alpha subunit. (E. coli T-protein has 364 residues(39, 40) ; eukaryotic T-proteins contain 392-408 residues prior to mitochondrial import which may result in the loss of a presequence(41, 42, 43, 44, 45, 46) .) T-proteins from E. coli and six eukaryotes are found to exhibit sequence homology with a region in the COOH-terminal half of the alpha subunit that includes about 40% of the polypeptide (Fig. 7). The region of homology encompasses the entire length of the T-proteins except for small segments at the NH(2) (12-46 residues) and COOH termini (9-10 residues). As judged by pairwise comparisons using the GAP program, the homology with the alpha subunit is somewhat greater with the E. coli T-protein (28% identity, 50% similarity) than with the eukaryotic T-proteins (19-23% identity, 46-48% similarity).

Since the NH(2)-terminal half of dimethylglycine dehydrogenase exhibits homology with the sarcosine oxidase beta subunit (see above), we were surprised to find that a region in the COOH-terminal half of dimethylglycine dehydrogenase is homologous to a region in the COOH-terminal half of the alpha subunit. The major homology region of the alpha subunit with dimethylglycine dehydrogenase overlaps with that observed for the alpha subunit/T-protein homology, but is somewhat smaller (Fig. 7). Within this region, the alpha subunit and dimethylglycine dehydrogenase exhibit 31% identity and 56% similarity. As described, the limits of homologous regions in pairwise comparisons could generally be estimated based on the length of the diagonal in a dot matrix comparison observed at a stringency setting of 16. However, in comparing the alpha subunit with dimethylglycine dehydrogenase, the COOH-terminal end of the diagonal was somewhat ambiguous; small pieces were seen in a region extending beyond the major diagonal and became clearly visible at stringency 14. On this basis, a weaker region of homology may be defined that extends about 100 residues beyond the major region (Fig. 7).

As might be expected, T-proteins were identified as high-scoring sequences in a BLAST analysis using dimethylglycine dehydrogenase as the query sequence. The region of dimethylglycine dehydrogenase that exhibits homology with the T-proteins overlaps with that observed for the dimethylglycine dehydrogenase/alpha subunit homology, but is somewhat larger (Fig. 7). The homology with T-proteins encompasses more than 40% of dimethylglycine dehydrogenase and nearly the entire length of the T-proteins. In pairwise comparisons, dimethylglycine dehydrogenase and the various T-proteins were found to exhibit 24-26% identity and 47-51% similarity.

A multiple sequence alignment of the COOH-terminal half of the alpha subunit, the COOH-terminal half of dimethylglycine dehydrogenase, and seven T-proteins shows that 23 residues are conserved among the nine proteins. Strikingly, 5 of the conserved residues are acidic (4 Glu, 1 Asp), and two additional sites are occupied by either Glu or Asp. All but two of these acidic sites and a conserved arginine are found in a 52-residue region of the alpha subunit (Glu to Asp) (Fig. 8). When the data base was scanned using the consensus sequence for this region, only dimethylglycine dehydrogenase and T-proteins were identified.


Figure 8: A multiple sequence alignment was generated using the COOH-terminal half of the alpha subunit from sarcosine oxidase (alpha), the COOH-terminal half of rat dimethylglycine dehydrogenase (DMGDH), and various T-proteins. The figure shows a highly conserved region containing five acidic sites.



Identification of the purU Gene

An open reading frame found 340 bases downstream from the soxG gene encodes a peptide of 286 residues. Using this peptide as the query sequence, a BLAST search identified purU and tgs genes from E. coli as high scoring sequences(22, 23) , along with a purU gene from Shigella flexneri(47) and an incomplete sequence for a putative purN gene from Haemophilus influenzae(48) . The E. coli purU and tgs genes are identical, as judged by comparison of their sequences and chromosomal location. The E. coli purU protein exhibits sequence homology with the purN gene product (27% identity)(22) . The putative purN gene from H. influenzae is probably a purU gene, as judged by comparison of its gene product with the E. coli purU (72% identity) and purN (27% identity) proteins.

A PILEUP comparison of the amino acid sequence deduced for the putative corynebacterial purU gene product with the products from E. coli, S. flexneri, and H. influenzae purU genes demonstrates a high degree of sequence conservation, particularly in the COOH-terminal two thirds of the proteins (Fig. 9). The alignment suggests that the incomplete H. influenzae purU gene is missing a region coding for the first 65 amino acids at the NH(2) terminus. In pairwise comparisons, the corynebacterial sequence exhibits 42-46% identity and 63-67% similarity with the other purU gene products.


Figure 9: Multiple sequence alignment of the amino acid sequence deduced for the putative corynebacterial purU gene product with the products from E. coli, S. flexneri, and H. influenzae purU genes.




DISCUSSION

We have sequenced a corynebacterial DNA insert in pLJC305, a construct previously shown to express a heterotetrameric sarcosine oxidase in E. coli (13). The 5′ end of the insert contains five closely packed genes. These genes code for the subunits of sarcosine oxidase (soxA, soxB, soxG, soxD) and a putative serine hydroxymethyltransferase (glyA). They are arranged in the order glyAsoxBDAG and appear to be organized for efficient, coupled translation. The sox genes were identified by comparison of the translated nucleotide sequence with NH(2)-terminal peptide sequence data and subunit molecular weights estimated by mass spectroscopy (electrospray or MALDI) and SDS-gel electrophoresis. The putative glyA gene was identified based on its homology with known serine hydroxymethyltransferases.

Expression of all four sarcosine oxidase subunits from the corynebacterial genes in pLJC305 is completely under the control of the vector-encoded lac promoter (13). Since the lac promoter is located just upstream from the 5′ end of the corynebacterial DNA insert, neither the glyA gene fragment nor any of the sox genes can contain a transcription terminator sequence, or at least not one that is recognized by the E. coli RNA polymerase. This indicates that the sarcosine oxidase operon probably contains the glyA gene in addition to the four sox genes. The clustering of the sox genes is generally expected in prokaryotes, where genes coding for subunits of an oligomeric enzyme are typically found in an operon (49). That the operon also includes the glyA gene is understandable in a metabolic sense, since the sarcosine oxidase reaction generates the substrates (glycine and 5,10-CH(2)-H(4)folate) for serine hydroxymethyltransferase, and the coupling of the two reactions will result in the net conversion of sarcosine to serine. Cellular energy demands may be met, in part, by the subsequent conversion of serine to pyruvate, using a constitutive serine dehydratase or possibly an isozyme that is induced by growth with sarcosine as the source of carbon and energy. Alternate fates for the sarcosine oxidase reaction products include use of 5,10-CH(2)-H(4)folate in various biosynthetic pathways and the catabolism of glycine to CO(2) and NH(3) via the glycine cleavage system, a reaction which also generates NADH and 5,10-CH(2)-H(4)folate.
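The net conversion of sarcosine to serine can be written out as a reaction balance. The sarcosine oxidase stoichiometry follows the reaction described in the abstract; the serine hydroxymethyltransferase step is written in its standard reversible form in the direction of serine synthesis, with water included to balance the equation. This is a sketch of the coupling argument, not a statement about flux in vivo.

```latex
\begin{align*}
\text{sarcosine} + \mathrm{H_4folate} + \mathrm{O_2}
  &\longrightarrow \text{glycine} + \text{5,10-}\mathrm{CH_2}\text{-}\mathrm{H_4folate} + \mathrm{H_2O_2} \\
\text{glycine} + \text{5,10-}\mathrm{CH_2}\text{-}\mathrm{H_4folate} + \mathrm{H_2O}
  &\rightleftharpoons \text{serine} + \mathrm{H_4folate} \\[4pt]
\text{Net:}\quad \text{sarcosine} + \mathrm{O_2} + \mathrm{H_2O}
  &\longrightarrow \text{serine} + \mathrm{H_2O_2}
\end{align*}
```

Glycine, H(4)folate, and 5,10-CH(2)-H(4)folate cancel in the sum, so H(4)folate acts catalytically as the one-carbon carrier between the two enzymes.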

A putative purU gene, located downstream from the soxG gene, was identified based on its homology with known purU gene products. The purU gene codes for a 10-CHO-H(4)folate hydrolase. This enzyme regulates the H(4)folate one-carbon pool in E. coli and also provides a source of formate for a key step in de novo purine biosynthesis (22, 24). A transcription termination sequence could not be identified in the intergenic region (340 bases) between the soxG and purU genes. It is not known whether the purU gene is part of the sarcosine oxidase operon.

The beta subunit of sarcosine oxidase from Corynebacterium sp. P-1 contains an NH(2)-terminal ADP-binding motif and the covalent flavin attachment site, tentatively identified as His175 based on an alignment with a covalent flavin-containing peptide from a similar corynebacterial (Corynebacterium sp. U-96) sarcosine oxidase. The beta subunit exhibits sequence homology with monomeric sarcosine oxidases and the NH(2)-terminal half of rat liver dimethylglycine dehydrogenase, a protein twice the size of the beta subunit or the monomeric enzymes. Homology is also found between the monomeric enzymes and human liver pipecolic acid oxidase.(^2) The results suggest that the beta subunit from corynebacterial sarcosine oxidase, monomeric sarcosine oxidases, pipecolic acid oxidase, and the NH(2)-terminal half of dimethylglycine dehydrogenase may have evolved from a common ancestral flavoprotein that contained a covalently bound prosthetic group. This new family of flavoproteins (or flavodomains) contains enzymes that catalyze analogous oxidation reactions with secondary or tertiary amino acids. We predict that a similar domain will be found in the NH(2)-terminal half of mammalian sarcosine dehydrogenase since the enzyme appears to be homologous with dimethylglycine dehydrogenase, as judged by comparison of amino acid sequence data obtained for the rat liver enzymes around the covalent flavin attachment site (64% identity) (7).

The NH(2)-terminal half of the alpha subunit from sarcosine oxidase contains a second ADP-binding motif and exhibits homology with the A subunit from octopine and nopaline oxidases. These heterodimeric enzymes are encoded by a tumor-inducing plasmid from Agrobacterium tumefaciens. Octopine and nopaline oxidases catalyze oxidative cleavage reactions with N-substituted arginine derivatives (N^2-(1-D-carboxyethyl)-L-arginine and N^2-(1,3-D-dicarboxypropyl)-L-arginine, respectively), analogous to the sarcosine oxidase reaction (36, 37, 38). However, sarcosine oxidase is not reduced upon anaerobic incubation with octopine or with creatine, a sarcosine analogue and metabolic precursor that contains a guanido moiety.(^3)

The COOH-terminal half of the alpha subunit from sarcosine oxidase exhibits sequence homology with T-protein from various organisms and the COOH-terminal half of rat dimethylglycine dehydrogenase. T-protein is a component of the multienzyme glycine cleavage system. Corynebacterial sarcosine oxidase, dimethylglycine dehydrogenase, and T-protein all catalyze the synthesis of 5,10-CH(2)-H(4)folate from H(4)folate and various one-carbon donors (2, 8, 9, 50). The results suggest that the COOH-terminal halves of the alpha subunit and dimethylglycine dehydrogenase contain a H(4)folate-binding domain.

A generic folate motif has not been found, not even for folate-dependent enzymes that use the same derivatives as substrates. The observed sequence homology suggests that T-protein and the COOH-terminal domains of the alpha subunit and dimethylglycine dehydrogenase may have evolved from a common H(4)folate-binding protein. An ancestral 10-CHO-H(4)folate-binding protein has recently been proposed as an evolutionary precursor for 10-CHO-H(4)folate dehydrogenase and 5′-phosphoribosylglycinamide transformylase (51, 52). The ADP-binding motif found near the NH(2) terminus of the sarcosine oxidase alpha subunit suggests that the subunit may contain an NH(2)-terminal domain that binds the enzyme's noncovalent FAD. Dimethylglycine dehydrogenase may also contain an NH(2)-terminal flavin-binding domain as judged by the presence of an ADP-binding motif and a nearby covalent flavin attachment site. In this case, the polypeptides may have evolved by fusion of a common ancestral gene for a H(4)folate-binding protein with genes encoding different flavin-binding proteins.


FOOTNOTES

*
This work was supported in part by Grant GM 31704 (to M. S. J.) from the National Institutes of Health. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

The nucleotide sequence(s) reported in this paper has been submitted to the GenBank®/EMBL Data Bank with accession number(s) U23955.

§
To whom correspondence and reprint requests should be addressed. Tel.: 215-762-7946; Fax: 215-246-5836.

^1
The abbreviations used are: H(4)folate, tetrahydrofolate; MALDI, matrix-assisted laser desorption/ionization; 5,10-CH(2)-H(4)folate, 5,10-methylenetetrahydrofolate; 10-CHO-H(4)folate, 10-formyltetrahydrofolate; TEMED, N,N,N′,N′-tetramethylethylenediamine.

^2
S. Mihalik, personal communication.

^3
L. J. Chlumsky and M. S. Jorns, unpublished observations.


ACKNOWLEDGEMENTS

We thank Nolle Potier for performing mass spectral analysis.


REFERENCES

  1. Kvalnes-Krick, K., and Jorns, M. S. (1986) Biochemistry 25, 6061-6069
  2. Kvalnes-Krick, K., and Jorns, M. S. (1987) Biochemistry 26, 7391-7395
  3. Zeller, H.-D., Hille, R., and Jorns, M. S. (1989) Biochemistry 28, 5145-5154
  4. Ali, S. N., Zeller, H.-D., Calisto, M. K., and Jorns, M. S. (1991) Biochemistry 30, 10980-10986
  5. Bredt, D. S., Hwang, P. M., Glatt, C. E., Lowenstein, C., Reed, R. R., and Snyder, S. H. (1991) Nature 351, 714-718
  6. Kvalnes-Krick, K., and Jorns, M. S. (1991) in Chemistry and Biochemistry of Flavoenzymes (Muller, F., ed) Vol. 2, pp. 425-435, CRC Press, Boca Raton
  7. Cook, R. J., Misono, K. S., and Wagner, C. (1985) J. Biol. Chem. 260, 12998-13002
  8. Steenkamp, D. J., and Husain, M. (1982) Biochem. J. 203, 707-715
  9. Porter, D. H., Cook, R. J., and Wagner, C. (1985) Arch. Biochem. Biophys. 243, 396-407
  10. Despicer, P. O., and Maloy, S. (1993) Proc. Natl. Acad. Sci. U. S. A. 90, 4295-4298
  11. Adams, E., and Frank, L. (1980) Annu. Rev. Biochem. 49, 1005-1061
  12. Mihalik, S. J., McGuinness, M., and Watkins, P. A. (1991) J. Biol. Chem. 266, 4822-4830
  13. Chlumsky, L. J., Zhang, L., Ramsey, A. J., and Jorns, M. S. (1993) Biochemistry 32, 11132-11142
  14. Devereux, R., Haeberli, P., and Smithies, O. (1984) Nucleic Acids Res. 12, 387-395
  15. Claverie, J.-M., and States, D. J. (1993) Computers Chem. 17, 191-201
  16. Altschul, S. F., Gish, W., Miller, W., Myers, E. W., and Lipman, D. J. (1990) J. Mol. Biol. 215, 403-410
  17. Maizel, J. V., and Lenk, R. P. (1981) Proc. Natl. Acad. Sci. U. S. A. 78, 7665-7669
  18. Needleman, S. B., and Wunsch, C. D. (1970) J. Mol. Biol. 48, 443-453
  19. Feng, D. F., and Doolittle, A. (1990) Methods Enzymol. 183, 375-387
  20. Schuler, G. D., Altschul, S. F., and Lipman, D. J. (1991) Protein-Struct. Funct. Genet. 9, 180-190
  21. Normark, S., Bergstrom, S., Edlund, T., Grundstrom, T., Jaurin, B., Lindberg, F. P., and Olof, O. (1983) Ann. Rev. Genet. 17, 499-525
  22. Nagy, P. L., McCorkle, G. M., and Zalkin, H. (1993) J. Bacteriol. 175, 7066-7073
  23. Bosl, M., and Kersten, H. (1994) J. Bacteriol. 176, 221-231
  24. Nagy, P. L., Marolewski, A., Benkovic, S. J., and Zalkin, H. (1995) J. Bacteriol. 177, 1292-1298
  25. Wierenga, R. K., Terpstra, P., and Hol, W. G. J. (1986) J. Mol. Biol. 187, 101-107
  26. Shiga, Y., Hayashi, S., Suzuki, M., Suzuki, K., and Nakamura, S. (1983) Biochem. Int. 6, 737-742
  27. Suzuki, H., and Kawamura-Konishi, Y. (1991) J. Biochem. (Tokyo) 109, 909-917
  28. Suzuki, H., and Kawamura-Konishi, Y. (1988) Biochem. Int. 17, 577-583
  29. Nishiya, Y., and Imanaka, T. (1993) J. Ferm. Bioeng. 75, 239-244
  30. Suzuki, K., Ogishima, M., Sugiyama, M., Inouye, Y., Nakamura, S., and Imamura, S. (1992) Biosci. Biotech. Biochem. 56, 432-436
  31. Koyama, Y., Yamamoto-Otake, H., Suzuki, M., and Nakano, E. (1991) Agric. Biol. Chem. 55, 1259-1293
  32. Suzuki, K., Sagai, H., Imamura, S., and Sugiyama, M. (1994) J. Ferm. Bioeng. 77, 231-234
  33. Lang, H., Polster, M., and Brandsch, R. (1991) Eur. J. Biochem. 198, 793-799
  34. Mckie, J. H., and Douglas, K. T. (1991) FEBS Lett. 279, 5-8
  35. VanBeeumen, J. J., Demol, H., Samyn, B., Bartsch, R. G., Meyer, T. E., Dolata, M. M., and Cusanovich, M. A. (1991) J. Biol. Chem. 266, 12921-12931
  36. Zanker, H., Lurz, G., Langridge, U., Langridge, P., Kreusch, D., and Schroder, J. (1994) J. Bacteriol. 176, 4511-4517
  37. von Lintig, J., Kreusch, D., and Schroder, J. (1994) J. Bacteriol. 176, 495-503
  38. Sans, N., Schroder, G., and Schroder, J. (1987) Eur. J. Biochem. 167, 81-87
  39. Stauffer, L. T., Ghrist, A., and Stauffer, G. V. (1993) DNA Sequence 3, 339-346
  40. Okamura-Ikeda, K., Ohmura, Y., Fujiwara, K., and Motokawa, Y. (1993) Eur. J. Biochem. 216, 539-548
  41. Okamura-Ikeda, K., Fujiwara, K., Yamamoto, M., Hiraga, K., and Motokawa, Y. (1991) J. Biol. Chem. 266, 4917-4921
  42. Okamura-Ikeda, K., Fujiwara, K., and Motokawa, Y. (1992) J. Biol. Chem. 267, 18284-18290
  43. Okamura-Ikeda, K., Fujiwara, K., Yamamoto, M., Hiraga, K., and Motokawa, Y. (1991) J. Biol. Chem. 266, 4917-4921
  44. Hayasaka, L., Nanao, K., Takada, G., Okamura-Ikeda, K., and Motokawa, Y. (1993) Biochem. Biophys. Res. Commun. 192, 766-771
  45. Bourguignon, J., Vauclare, P., Merand, V., Forest, E., Neuburger, M., and Douce, R. (1993) Eur. J. Biochem. 217, 377-386
  46. Kopriva, S., and Bauwe, H. (1994) Plant Physiol. 104, 1079-1080
  47. Hromockyj, A. E., Tucker, S. C., and Maurelli, A. T. (1992) Mol. Microbiol. 6, 2113-2124
  48. Maskell, D. (1993) Gene (Amst.) 129, 155-156
  49. Bachmann, B. J. (1983) Microbiol. Rev. 47, 180-230
  50. Okamura-Ikeda, K., Fujiwara, K., and Motokawa, Y. (1987) J. Biol. Chem. 262, 6746-6749
  51. Cook, R. J., Lloyd, R. S., and Wagner, C. (1991) J. Biol. Chem. 266, 4965-4973
  52. Schirch, D., Villar, E., Maras, B., Barra, D., and Schirch, V. (1994) J. Biol. Chem. 269, 24728-24735
  53. Miyata, A., Yoshida, T., Yamaguchi, K., Yokoyama, C., Tanabe, T., Toh, H., Mitsunaga, T., and Izumi, Y. (1993) Eur. J. Biochem. 212, 745-750
