Screening genomes of Gram-positive bacteria for double-glycine-motif-containing peptides

G. Dirix1,†, P. Monsieurs2,†, K. Marchal2, J. Vanderleyden1 and J. Michiels1

1 Centre of Microbial and Plant Genetics, K. U. Leuven, Heverlee, Belgium
2 ESAT-SCD, K. U. Leuven, Heverlee, Belgium

Correspondence
J. Michiels
(Jan.Michiels@agr.kuleuven.ac.be)


†Both authors made equal contributions to this article.

Secreted peptides fulfil major functions in the physiology of eukaryotes as well as prokaryotes. Yet, in many genome-sequencing projects, small peptides either remain un-annotated or are classified as hypothetical open reading frames with no function assigned. Identifying signal peptide sequences would therefore help in finding novel peptide genes and in assigning functions to automatically annotated sequences.

In Gram-positive bacteria, the double-glycine (GG) motif plays a key role in many peptide secretion systems involved in quorum sensing and bacteriocin production. Competence-stimulating peptides and class II bacteriocins, produced by streptococci and lactic acid bacteria, respectively, are generally synthesized as inactive prepeptides containing a conserved GG-type leader sequence. This leader sequence, typically between 15 and 30 aa in length, is recognized and proteolytically removed during secretion by its cognate ABC-transporter, resulting in the release and activation of the peptide. Processed peptides vary in length from 17 to over 80 aa. The GG-type leader sequence is well conserved and possesses the following consensus: LSX2ELX2IXGG (Havarstein et al., 1994). Apart from this conserved leader sequence, GG-motif-containing peptides share no common sequence similarity. Their cognate transporters contain a specific domain of about 150 aa that is responsible for the proteolytic removal of the GG-type leader peptide and, on the basis of its sequence, has been classified as the Peptidase C39 protein family domain. The Peptidase C39 domain contains two conserved motifs, called the cysteine and the histidine motifs (Havarstein et al., 1995).
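As an illustration only, the consensus LSX2ELX2IXGG can be read literally as a regular expression in which every X is an arbitrary residue. This is a minimal sketch and not the model used in the screening: real leaders deviate from the consensus, so an exact pattern of this kind is much stricter than a profile built from known peptides. The function name `split_at_gg` and the toy precursor are hypothetical.

```python
import re

# Literal reading of the consensus LSX2ELX2IXGG, where X is any residue.
# Assumption: this exact pattern is far stricter than a profile-based model,
# because real GG-type leaders deviate from the consensus.
GG_CONSENSUS = re.compile(r"LS..EL..I.GG")


def split_at_gg(precursor):
    """Return (leader, mature) if the consensus is found, else None."""
    match = GG_CONSENSUS.search(precursor)
    if match is None:
        return None
    cut = match.end()  # the leader ends with the Gly-Gly pair
    return precursor[:cut], precursor[cut:]


# Hypothetical toy precursor, not a sequence from the study.
toy = "MKNLSAAELKKIVGGACDEFKHW"
print(split_at_gg(toy))  # ('MKNLSAAELKKIVGG', 'ACDEFKHW')
```

The split mirrors the processing step described above: everything up to and including the Gly-Gly pair is treated as leader, and the remainder as the processed peptide.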

Our aim was to detect new GG-motif-containing peptides in the fully sequenced genomes of Gram-positive bacteria. Since many peptides containing such a motif are small, many of them have probably not been annotated in genome-sequencing projects or have not been recognized as secreted peptides. Therefore, an in silico strategy was designed and applied at the nucleotide level. The 45 fully sequenced genomes of Gram-positive bacteria [status on 15 September 2003; for a complete list, see Dirix et al. (2004)] were screened both for the presence of GG-motifs and for Peptidase C39 domains. For the latter screening, a motif was available (http://www.sanger.ac.uk/Software/Pfam; accession number PF03412) (Bateman et al., 2002); for the GG-motif search, a new model was built based on already known GG-motif peptides (Dirix et al., 2004; Michiels et al., 2001). Based on our knowledge of characterized GG-motif-containing peptides, several restrictions were imposed on the GG-motif candidate genes. First, the GG-motif was forced to end with a Gly-Gly or a Gly-Ala pair. Secondly, only those peptides were selected whose coding region was located less than 10 kb from the coding region of a Peptidase C39 domain-containing gene. Finally, the length of the leader sequence and the total peptide length were set to a maximum of 30 and 150 aa, respectively. As a consequence of these restrictions, we cannot exclude the possibility that some GG-motif peptides were missed during the screening process.
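The three restrictions can be sketched as a simple filter. This is an assumed, simplified rendering rather than the actual screening code: the `Candidate` record, the coordinate convention (start and end in bp on one replicon) and the way the distance between coding regions is measured are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass

MAX_LEADER_AA = 30          # maximum leader length stated in the text
MAX_TOTAL_AA = 150          # maximum total peptide length stated in the text
MAX_DISTANCE_BP = 10_000    # "less than 10 kb" from a Peptidase C39 gene
ALLOWED_ENDS = ("GG", "GA")  # leader must end with Gly-Gly or Gly-Ala


@dataclass
class Candidate:
    leader: str  # predicted leader, ending at the cleavage site
    mature: str  # predicted processed peptide
    start: int   # coding-region coordinates (bp), hypothetical convention
    end: int


def gap_bp(a_start, a_end, b_start, b_end):
    """Distance in bp between two coding regions; 0 if they overlap."""
    return max(0, max(a_start, b_start) - min(a_end, b_end))


def passes_restrictions(cand, c39_genes):
    """Apply the three restrictions; c39_genes is a list of (start, end)."""
    if not cand.leader.endswith(ALLOWED_ENDS):
        return False
    if len(cand.leader) > MAX_LEADER_AA:
        return False
    if len(cand.leader) + len(cand.mature) > MAX_TOTAL_AA:
        return False
    return any(
        gap_bp(cand.start, cand.end, s, e) < MAX_DISTANCE_BP
        for s, e in c39_genes
    )


# Hypothetical coordinates and sequences, for illustration only.
c39 = [(5000, 7200)]
near = Candidate("MKNLSAAELKKIVGG", "ACDEFKHW", 1000, 1072)
gly_asn = Candidate("MKNLSAAELKKIVGN", "ACDEFKHW", 1000, 1072)
far = Candidate("MKNLSAAELKKIVGG", "ACDEFKHW", 50000, 50072)
print(passes_restrictions(near, c39))     # True
print(passes_restrictions(gly_asn, c39))  # False: leader ends with Gly-Asn
print(passes_restrictions(far, c39))      # False: more than 10 kb away
```

A filter of this kind makes the trade-off explicit: a leader ending in any pair other than Gly-Gly or Gly-Ala, or a gene further than 10 kb from a transporter, is rejected regardless of how well the rest of the leader fits.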

A search for the Peptidase C39 domain in 45 fully sequenced Gram-positive genomes resulted in a total of 29 hits. These hits were found in the genera Bacillus, Clostridium, Enterococcus, Lactobacillus, Lactococcus, Mycoplasma, Streptococcus, Streptomyces and Ureaplasma, but not in the genera Bifidobacterium, Corynebacterium, Deinococcus, Listeria, Mycobacterium, Oceanobacillus, Staphylococcus or Tropheryma. Interestingly, all of the screened lactic acid bacteria, with the exception of Streptococcus agalactiae (strains 2603V/R and NEM316) and Bifidobacterium longum NCC2705, contain a Peptidase C39 domain. In several strains belonging to the genera Streptococcus and Enterococcus, more than one protein containing the C39 domain was found. Apart from two protein hits that are truncated in their Peptidase C39 domain, all hits contain the conserved cysteine and histidine motifs involved in GG-motif recognition and peptidase activity (Havarstein et al., 1995), suggesting that those domains have peptidase activity.

The screening for peptides containing a GG-motif resulted in a total of 48 candidate peptides. Although only 12 of the 45 screened bacterial genomes were from lactic acid bacteria, 92 % of all GG-motif-containing hits were found in lactic acid bacteria (of which 80 % belong to streptococcal strains). The size of the peptides ranges from 29 to 126 aa, or in the mature form (i.e. without the leader peptide) from 11 to 103 aa. A list of the possible GG-motif-containing peptides, their cognate transport protein, their length, amino acid context, theoretical pI and molecular mass is given in Table 1. If available, the gene name of the GG-peptide-encoding sequence was taken from the genome annotation data and included in Table 1. Sixty-seven per cent of the candidate peptides have a high glycine content (>10 % Gly), whereas in 63 % of the peptides more than half of the amino acids are hydrophobic. Also, half of the hits have two or more cysteine residues and, in 56 % of the peptides, the theoretical pI is higher than 8. These data are consistent with the properties of previously described GG-motif-containing peptides (Ennahar et al., 2000; Jack et al., 1995). Among the 48 hits, three were not annotated in the corresponding genome sequence project. Seventeen hits, annotated as hypothetical proteins, did not display similarity to any known protein or peptide. The remaining hits are bacteriocins (n=15) or bacteriocin homologues (n=10), a conserved domain protein (n=1), a plantaricin biosynthesis protein (n=1) and a phage-related protein (n=1).
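The composition criteria mentioned above (glycine content above 10 %, more than half hydrophobic residues, two or more cysteines) can be evaluated per peptide as in the following sketch. The set of residues counted as hydrophobic is an assumption, since the text does not define it, and the theoretical pI is left out because it depends on a chosen set of pKa values. The function names and the toy peptide are hypothetical.

```python
# Assumption: one common choice of hydrophobic residues; the text does not
# specify which residues were counted as hydrophobic.
HYDROPHOBIC = frozenset("AVILMFW")


def composition(peptide):
    """Return simple composition statistics for a one-letter-code peptide."""
    n = len(peptide)
    if n == 0:
        raise ValueError("empty peptide")
    return {
        "length": n,
        "gly_fraction": peptide.count("G") / n,
        "hydrophobic_fraction": sum(aa in HYDROPHOBIC for aa in peptide) / n,
        "cys_count": peptide.count("C"),
    }


def flags(peptide):
    """Apply the thresholds quoted in the text to one peptide."""
    c = composition(peptide)
    return {
        "high_gly": c["gly_fraction"] > 0.10,
        "mostly_hydrophobic": c["hydrophobic_fraction"] > 0.5,
        "two_or_more_cys": c["cys_count"] >= 2,
    }


# Hypothetical toy peptide, not a sequence from Table 1.
print(flags("GAVLCGIWCF"))
# {'high_gly': True, 'mostly_hydrophobic': True, 'two_or_more_cys': True}
```

Applied to every candidate, counts of this kind would yield the percentages reported for the 48 hits, subject to the definition of hydrophobicity actually used.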


Table 1. List of the possible GG-motif-containing peptides

 
Physical linkage to one or more possible GG-peptide(s) was found for 21 of the 29 Peptidase C39 domains retrieved. Screening of the lactic acid bacterium Lactococcus lactis subsp. lactis strain IL1403 revealed the presence of two un-annotated putative GG-peptides. In this strain, LcnC, a Peptidase C39 domain-containing protein, was shown previously to transport lactococcin A (Bolotin et al., 2001). The bacteriocin lactococcin A is synthesized as a precursor containing a GG-type leader peptide (only produced by some Lactococcus lactis strains) and is plasmid-encoded (Holo et al., 1991). Although the lcnC and lcnD genes are present on the chromosome of strain IL1403, no gene encoding a lactococcin A homologue was found, either on the chromosome or on one of its plasmids. As suggested previously, the LcnCD proteins could secrete compounds other than bacteriocins (Venema et al., 1996). The two putative GG-peptides obtained from this screening are good candidates for substrates of LcnCD. In Lactobacillus plantarum WCFS1, six GG-peptides were found: five are the plantaricin bacteriocins PlnA, PlnE, PlnF, PlnJ and PlnN, and the sixth is PlnY, annotated as a putative plantaricin biosynthesis protein (Diep et al., 1996; Nissen-Meyer et al., 1993). Although the precursor of bacteriocin PlnK contains a GG-motif in Lactobacillus plantarum C11 (Diep et al., 1996), PlnK was not retrieved in this screening. Further analysis revealed that the plnK genes from Lactobacillus plantarum strains C11 and WCFS1 differ in two nucleotides, corresponding to one amino acid difference in the GG-motif. The ‘GG-motif’ of the Lactobacillus plantarum WCFS1 PlnK ends with a Gly-Asn pair (in contrast to Gly-Gly in Lactobacillus plantarum C11), and was therefore not identified in our screening, as only Gly-Gly or Gly-Ala pairs were allowed.

The screened streptococci can be subdivided into the naturally competent (Streptococcus pneumoniae and Streptococcus mutans) and the non-competent (Streptococcus agalactiae and Streptococcus pyogenes) species (Havarstein et al., 1997; Li et al., 2001). All screened strains belonging to the competent group have more than one Peptidase C39 domain-containing protein, one of which is the competence-stimulating peptide transporter ComA. Although the pheromone was never found in this screening (because of the 10 kb restriction), many other possible GG-peptides were retrieved. Apart from hypothetical proteins, these hits constitute bacteriocins or bacteriocin homologues (de Saizieu et al., 2000). The non-competent group can be further subdivided on the basis of the presence (Streptococcus pyogenes strains MGAS315, MGAS8232 and SSI-1) or the absence (Streptococcus pyogenes strain M1 GAS and Streptococcus agalactiae) of a Peptidase C39 domain. In the Streptococcus pyogenes strains containing a Peptidase C39 domain, several putative bacteriocins (de Saizieu et al., 2000) and a putative pheromone (Smoot et al., 2002) were retrieved, including the lantibiotic salivaricin A, which functions both as a bacteriocin and a pheromone (Ross et al., 1993; Upton et al., 2001). In Enterococcus faecalis V583, one putative GG-peptide was found, annotated as a hypothetical protein.

In addition to the lactic acid bacteria, the screening also revealed four more GG-motif-encoding genes in the strains Bacillus subtilis subsp. subtilis str. 168, Clostridium acetobutylicum ATCC 824, Streptomyces avermitilis MA-4680 and Streptomyces coelicolor A3(2), encoding a phage-related protein (Bacillus subtilis) (Kunst et al., 1997) and three hypothetical proteins.

Finally, none of the Peptidase C39 transporter genes found in Mycoplasma and Ureaplasma strains, in Bacillus halodurans and in Enterococcus faecalis (the second C39 domain only) is linked to a gene encoding a GG-peptide. The latter two proteins are involved in the transport of mersacidin and cytolysin, respectively. Mersacidin and cytolysin are two lantibiotics that are synthesized as prepropeptides with GG-type leader sequences that differ too much from the GG-motif consensus sequence (mersacidin) or end with a Gly-Ser pair (both precursors of cytolysin) and were therefore not retrieved in this screening (Altena et al., 2000; Gilmore et al., 1994). The same holds true for sublancin 168, a lantibiotic from Bacillus subtilis, whose leader sequence also ends with a Gly-Ser pair (Paik et al., 1998).

To conclude, our screening strategy led to new insights into the distribution of GG-peptide processing and secretion systems among Gram-positive bacteria. The results do not depend on previous annotations, as the screening was performed at the nucleotide level. Interestingly, for all Peptidase C39 domains identified, one or more possible GG-peptide genes were found within the 10 kb limit of the in silico screening or in the literature (see also previous paragraph), except for Mycoplasma and Ureaplasma species. Although the competence-stimulating peptides from competent Streptococcus strains were not retrieved in this analysis, other GG-motif-containing candidates were found. This could imply that cognate transporters are used to secrete multiple compounds with very different functions. More than half of the GG-hits retrieved are bacteriocins or putative bacteriocins, some of which also function as pheromones. More than 40 % of the identified peptide genes were either un-annotated or had not yet been recognized as secreted peptides in the genome-sequencing projects. These peptide genes were detected in the genomes of lactic acid bacteria, but also in the genera Bacillus, Clostridium and Streptomyces. Finally, not all known GG-motif-containing peptides were found in the current analysis. Also, for several Peptidase C39 domains, no corresponding peptide was found. This may be due to the 10 kb restriction, but this clearly is not always the case (e.g. for mersacidin, cytolysin and the plantaricin PlnK). On the other hand, the motif of the GG-leader sequence used may be too specific. Therefore, as more GG-motif-containing peptides are characterized biochemically in the future, including those with a divergent leader sequence, the algorithm could be further refined.
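The two percentages quoted in this conclusion can be checked against the category counts reported for the 48 hits. The short tally below is a sketch that assumes "un-annotated or not yet recognized" corresponds to the three un-annotated hits plus the 17 hypothetical proteins, and that "bacteriocins or putative bacteriocins" corresponds to the 15 bacteriocins plus the 10 bacteriocin homologues; the dictionary keys and the helper `share` are hypothetical names.

```python
# Category counts as reported in the text for the 48 candidate peptides.
categories = {
    "not annotated": 3,
    "hypothetical protein": 17,
    "bacteriocin": 15,
    "bacteriocin homologue": 10,
    "conserved domain protein": 1,
    "plantaricin biosynthesis protein": 1,
    "phage-related protein": 1,
}

total = sum(categories.values())


def share(*names):
    """Percentage of all hits that fall in the named categories."""
    return 100 * sum(categories[name] for name in names) / total


print(total)  # 48
print(round(share("not annotated", "hypothetical protein"), 1))  # 41.7
print(round(share("bacteriocin", "bacteriocin homologue"), 1))   # 52.1
```

Under these assumptions the counts add up to 48, and the shares (about 41.7 % and 52.1 %) agree with "more than 40 %" and "more than half".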

REFERENCES

Altena, K., Guder, A., Cramer, C. & Bierbaum, G. (2000). Biosynthesis of the lantibiotic mersacidin: organization of a type B lantibiotic gene cluster. Appl Environ Microbiol 66, 2565–2571.

Bateman, A., Birney, E., Cerruti, L. & 7 other authors (2002). The Pfam protein families database. Nucleic Acids Res 30, 276–280.

Bolotin, A., Wincker, P., Mauger, S., Jaillon, O., Malarme, K., Weissenbach, J., Ehrlich, S. D. & Sorokin, A. (2001). The complete genome sequence of the lactic acid bacterium Lactococcus lactis ssp. lactis IL1403. Genome Res 11, 731–753.

de Saizieu, A., Gardes, C., Flint, N., Wagner, C., Kamber, M., Mitchell, T. J., Keck, W., Amrein, K. E. & Lange, R. (2000). Microarray-based identification of a novel Streptococcus pneumoniae regulon controlled by an autoinduced peptide. J Bacteriol 182, 4696–4703.

Diep, D. B., Havarstein, L. S. & Nes, I. F. (1996). Characterization of the locus responsible for the bacteriocin production in Lactobacillus plantarum C11. J Bacteriol 178, 4472–4483.

Dirix, G., Monsieurs, P., Dombrecht, B., Daniels, R., Marchal, K., Vanderleyden, J. & Michiels, J. (2004). Peptide signal molecules and bacteriocins in Gram-negative bacteria: a genome-wide in silico screening for peptides containing a double-glycine leader sequence and their cognate transporters. Peptides (in press).

Ennahar, S., Sashihara, T., Sonomoto, K. & Ishizaki, A. (2000). Class IIa bacteriocins: biosynthesis, structure and activity. FEMS Microbiol Rev 24, 85–106.

Gilmore, M. S., Segarra, R. A., Booth, M. C., Bogie, C. P., Hall, L. R. & Clewell, D. B. (1994). Genetic structure of the Enterococcus faecalis plasmid pAD1-encoded cytolytic toxin system and its relationship to lantibiotic determinants. J Bacteriol 176, 7335–7344.

Havarstein, L. S., Holo, H. & Nes, I. F. (1994). The leader peptide of colicin V shares consensus sequences with leader peptides that are common among peptide bacteriocins produced by gram-positive bacteria. Microbiology 140, 2383–2389.

Havarstein, L. S., Diep, D. B. & Nes, I. F. (1995). A family of bacteriocin ABC transporters carry out proteolytic processing of their substrates concomitant with export. Mol Microbiol 16, 229–240.

Havarstein, L. S., Hakenbeck, R. & Gaustad, P. (1997). Natural competence in the genus Streptococcus: evidence that streptococci can change pherotype by interspecies recombinational exchanges. J Bacteriol 179, 6589–6594.

Holo, H., Nilssen, O. & Nes, I. F. (1991). Lactococcin A, a new bacteriocin from Lactococcus lactis subsp. cremoris: isolation and characterization of the protein and its gene. J Bacteriol 173, 3879–3887.

Jack, R. W., Tagg, J. R. & Ray, B. (1995). Bacteriocins of gram-positive bacteria. Microbiol Rev 59, 171–200.

Kunst, F., Ogasawara, N., Moszer, I. & 148 other authors (1997). The complete genome sequence of the gram-positive bacterium Bacillus subtilis. Nature 390, 249–256.

Li, Y. H., Lau, P. C., Lee, J. H., Ellen, R. P. & Cvitkovitch, D. G. (2001). Natural genetic transformation of Streptococcus mutans growing in biofilms. J Bacteriol 183, 897–908.

Michiels, J., Dirix, G., Vanderleyden, J. & Xi, C. (2001). Processing and export of peptide pheromones and bacteriocins in Gram-negative bacteria. Trends Microbiol 9, 164–168.

Nissen-Meyer, J., Larsen, A. G., Sletten, K., Daeschel, M. & Nes, I. F. (1993). Purification and characterization of plantaricin A, a Lactobacillus plantarum bacteriocin whose activity depends on the action of two peptides. J Gen Microbiol 139, 1973–1978.

Paik, S. H., Chakicherla, A. & Hansen, J. N. (1998). Identification and characterization of the structural and transporter genes for, and the chemical and biological properties of, sublancin 168, a novel lantibiotic produced by Bacillus subtilis 168. J Biol Chem 273, 23134–23142.

Ross, K. F., Ronson, C. W. & Tagg, J. R. (1993). Isolation and characterization of the lantibiotic salivaricin A and its structural gene salA from Streptococcus salivarius 20P3. Appl Environ Microbiol 59, 2014–2021.

Smoot, J. C., Barbian, K. D., Van Gompel, J. J. & 15 other authors (2002). Genome sequence and comparative microarray analysis of serotype M18 group A Streptococcus strains associated with acute rheumatic fever outbreaks. Proc Natl Acad Sci U S A 99, 4668–4673.

Upton, M., Tagg, J. R., Wescombe, P. & Jenkinson, H. F. (2001). Intra- and interspecies signaling between Streptococcus salivarius and Streptococcus pyogenes mediated by SalA and SalA1 lantibiotic peptides. J Bacteriol 183, 3931–3938.

Venema, K., Dost, M. H., Beun, P. A., Haandrikman, A. J., Venema, G. & Kok, J. (1996). The genes for secretion and maturation of lactococcins are located on the chromosome of Lactococcus lactis IL1403. Appl Environ Microbiol 62, 1689–1692.





Copyright © 2004 Society for General Microbiology.