Génétique Microbienne, Institut National de la Recherche Agronomique, Domaine de Vilvert, 78352 Jouy en Josas CEDEX, France
Correspondence
Alexander Bolotin
bolotine{at}jouy.inra.fr
ABSTRACT |
---|
The GenBank/EMBL/DDBJ accession numbers for the sequences reported in this paper are DQ072985–DQ073008.
Two tables showing the homology of the spacers with database sequences are available as supplementary material with the online version of this paper.
INTRODUCTION |
---|
METHODS |
---|
The complete sequences of PCR-amplified fragments corresponding to different CRISPR loci were obtained by primer walking. The primer sequences are available on request from A. B. For sequence analysis, the loci were divided into direct repeats and the corresponding spacer sequences. These were named using acronyms composed of the strain identifier followed by a one-letter descriptor (d for repeats and s for spacers) and a number indicating the position of the element within the locus.
Spacer sequences were analysed for homology against themselves and against the NCBI Entrez nucleotide sequence database. CLUSTAL software (Higgins & Sharp, 1989) was used for sequence alignment.
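The division of a locus into named repeats and spacers can be illustrated with a minimal sketch. It is not the procedure used in this study: the function name `split_locus`, the exact name format (strain identifier, then d or s, then position, with no separator), the Hamming-distance scan and the toy sequences are all assumptions made for illustration.

```python
from typing import List, NamedTuple


class Element(NamedTuple):
    name: str       # e.g. "X1d1" or "X1s1" (name format is an assumption)
    start: int      # 0-based start position within the locus
    sequence: str


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def split_locus(locus: str, repeat: str, strain: str,
                max_mismatch: int = 2) -> List[Element]:
    """Split a CRISPR locus into repeats (d) and spacers (s).

    A window counts as a repeat if it differs from the consensus repeat
    at no more than `max_mismatch` positions (hypothetical threshold).
    """
    n = len(repeat)
    repeat_starts = []
    i = 0
    while i + n <= len(locus):
        if hamming(locus[i:i + n], repeat) <= max_mismatch:
            repeat_starts.append(i)
            i += n          # jump past the repeat just found
        else:
            i += 1

    elements = []
    for k, start in enumerate(repeat_starts, start=1):
        elements.append(Element(f"{strain}d{k}", start, locus[start:start + n]))
        if k < len(repeat_starts):
            # The spacer lies between this repeat and the next one.
            s_start, s_end = start + n, repeat_starts[k]
            elements.append(Element(f"{strain}s{k}", s_start, locus[s_start:s_end]))
    return elements


# Toy data only: an 8 bp "repeat" and two 10 bp "spacers".
toy_repeat = "GTTTTAGA"
toy_locus = toy_repeat + "ACGTACGTAC" + toy_repeat + "TTGCAATGCC" + toy_repeat
for element in split_locus(toy_locus, toy_repeat, "X1", max_mismatch=0):
    print(element.name, element.start, element.sequence)
```

With real 36 bp repeats, a small non-zero `max_mismatch` would let slightly divergent repeats be recognized; for very short toy repeats, exact matching avoids spurious hits.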
Nucleotide and protein sequences.
Nucleotide sequences of CRISPR loci of different bacterial strains were obtained from the NCBI database (www.ncbi.nlm.nih.gov); the corresponding accession nos are given in parentheses after the systematic name of each strain. cas gene sequences were obtained from the NCBI, MBGD (http://mbgd.genome.ad.jp) and ERGO (http://ergo.integratedgenomics.com/ERGO) databases. Assignment of the genes was based on sequence conservation, the MBGD COG (clusters of orthologous groups) database and proximity on the genome, determined in most cases by using the pinned region function of ERGO or the genome comparison facility of MBGD.
GenBank sequence accession nos.
Nucleotide sequences were deposited in GenBank under the accession nos DQ072985–DQ073008.
RESULTS |
---|
The Cas5 family groups large proteins (>1100 aa) that carry an HNH motif present in various nucleases, including colicin E9, which causes cell death by introducing double-stranded breaks into DNA, and a number of restriction enzymes (Walker et al., 2002; Maté & Kleanthous, 2004; Saravanan et al., 2004). The Cas6 family groups short proteins (about 100 aa) of high pI, features also found in the Cas2 (short) and Cas1 (high pI) families, but there is no sequence homology between Cas6 and other proteins in the databases.
CRISPR spacers have homology with extant genes
CRISPR loci were found close to cas1 genes in about 50 of the 198 complete genomes available in the NCBI database. They totalled 2156 spacers, 44 of which are homologous to other genes, in addition to those from S. thermophilus and Streptococcus vestibularis, described in detail below. The 44 spacers are carried by 4 archaeal and 10 bacterial species, spanning a broad phylogenetic range (Table 1). Most (29 out of 44) are homologous to phage genes, even though they were identified on complete genomes (see Supplementary Table 1, available with the online journal, for a detailed analysis). A striking case is that of S. pyogenes, which carries two short CRISPR structures close to the cas1A and cas1B genes, containing three and six spacers, respectively. All spacers of the former and five of the latter are homologous to S. pyogenes prophage genes.
About a third of the 44 spacers are homologous with genes with no obvious extrachromosomal origin. Remarkably, six of these share homology with genes that reside in the vicinity of extrachromosomally derived genes, three share homology with genes of aberrant G+C content (up to 59 mol% G+C, compared to 47 mol% G+C over the entire Porphyromonas gingivalis W83 genome) and two share homology with genes from regions where the gene order differs from that of phylogenetically close genomes (Clostridium tetani E88). Horizontal transfer could lead to the gene organization observed in all these cases.
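The proportions quoted above can be checked with a few lines of arithmetic. The sketch below uses only the counts given in the text (2156 spacers, 44 with homology, 29 phage-related); treating the remaining 44 - 29 = 15 spacers as the group with no obvious extrachromosomal origin is an assumption, and the helper name `percent` is hypothetical.

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


total_spacers = 2156      # spacers in loci found close to cas1 genes
with_homology = 44        # spacers homologous to other genes
phage_related = 29        # of these, homologous to phage genes
other = with_homology - phage_related   # assumed to be the remaining group

print(f"spacers with homology: {percent(with_homology, total_spacers):.1f} %")
print(f"phage-related:         {percent(phage_related, with_homology):.1f} %")
print(f"no obvious origin:     {percent(other, with_homology):.1f} %")
```

Under that assumption, the remaining group corresponds to the "about a third" mentioned in the text.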
CRISPR alleles in different S. thermophilus strains
To further examine the finding that CRISPR spacers can have phage origin, or more generally extrachromosomal origin, we analysed the CRISPR alleles of 22 S. thermophilus and 2 S. vestibularis strains. This was prompted by the consideration that phages of lactic acid bacteria are among the best characterized in respect of genome data (Brussow & Hendrix, 2002), and that lactic acid bacteria are exposed frequently to phage attacks under the conditions of dairy fermentations.
PCR with primers homologous to two regions flanking the CRISPR structure (yc70 and yc31) yielded a single band of varying length for different strains (Fig. 3). The PCR products were sequenced, and the results are summarized in Table 2. The number of repeats varied between 10 and 51 for different CRISPR alleles. The repeats were strictly identical, with the exception of 38 out of a total of 632 (6 %). The slightly divergent repeats (fewer than 3 differences out of 36 bp) were mostly situated in the last position of an allele. Spacer length ranged from 28 to 32 bp, and was 30 bp for 556 of the 618 spacers (90 %). The total length of the CRISPR loci (from the first nucleotide of the first repeat to the last nucleotide of the last repeat) varied between 628 and 3404 bp. Three identical CRISPR loci were present in more than one strain (groups A, B and C, found in five, two and two strains, respectively; Table 2). Internal duplications were detected in almost a quarter of the loci, indicating a substantial level of recombination in CRISPR structures.
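As a rough consistency check on these figures, the length of a locus with n repeats can be bounded from the repeat and spacer sizes. The sketch below assumes that every repeat is 36 bp, that a locus with n repeats contains n - 1 spacers of 28 to 32 bp, and that the shortest and longest loci are those with 10 and 51 repeats; the function name `length_bounds` is hypothetical.

```python
from typing import Tuple

REPEAT_LEN = 36                  # bp, as implied by "out of 36 bp"
SPACER_MIN, SPACER_MAX = 28, 32  # bp, reported spacer length range


def length_bounds(n_repeats: int) -> Tuple[int, int]:
    """Shortest and longest possible locus (bp) with n repeats and n - 1 spacers."""
    n_spacers = n_repeats - 1
    repeats_total = n_repeats * REPEAT_LEN
    return (repeats_total + n_spacers * SPACER_MIN,
            repeats_total + n_spacers * SPACER_MAX)


for n, reported in ((10, 628), (51, 3404)):
    low, high = length_bounds(n)
    print(f"{n} repeats: {low}-{high} bp expected, {reported} bp reported, "
          f"within bounds: {low <= reported <= high}")
```

Under these assumptions, both reported extremes fall inside the computed bounds.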
DISCUSSION |
---|
An extrachromosomal origin of spacers is not limited to S. thermophilus, as it is found in many other bacteria and archaea. However, some spacers are homologous with genes that are not clearly related to extrachromosomal genes. Remarkably, these genes are frequently (in 75 % of cases) located in the vicinity of extrachromosomally derived genes, in regions potentially transferred from unrelated organisms, or in regions where gene order differs from that of phylogenetically close genomes. Horizontal gene transfer may underlie all these cases, suggesting that incorporation of gene fragments into CRISPR structures might take place upon invasion of a prokaryotic cell by foreign DNA. This invasion may most often be mediated by extrachromosomal elements, but could also be due to other processes, such as DNA transformation. Nevertheless, the overwhelming majority of CRISPR spacers (98 %) have no homology with known genes. The further accumulation of sequences in the databases may reveal the origin of these spacers.
The mechanism of CRISPR generation is not known, but the cas genes, which are invariably closely linked to the CRISPR structures, are presumably involved in this process, as was previously pointed out by Jansen et al. (2002). The process should involve the formation of segments of a defined size, destined to become spacers, and their linkage to the repeated element. The presence of exonuclease motifs of the recB type in the cas4 genes, and the HNH endonuclease motif in the cas5 genes prompts us to suggest that the segments are formed by a nucleolytic activity, but in two different ways. The Cas4 exonuclease might act from an end, as does the RecBCD enzyme complex, aided by a Cas3 helicase, which generates oligonucleotides (Singleton et al., 2004 and references therein). In contrast, the Cas5 endonuclease might excise the segments by internal DNA cleavage, possibly directed by the short conserved sequence that we identified in the extrachromosomal donor elements at a constant position relative to the spacer-matching region. A precedent for this type of activity is the action of type III restriction enzymes, which cut 25 to 27 bases from their recognition site (see Dryden et al., 2001 for a review). The type III restriction enzyme-like endonucleolytic action might be polar, as it involves tracking on the DNA, which could account for the biased orientation of the phage-derived spacers in the S. thermophilus CRISPR structures. We envisage that the Cas1 proteins, encoded by the two related genes, cas1A and cas1B, found in the two types of cas gene clusters, may be involved in the process of linking the DNA segments to the repeats. Biochemical study of Cas proteins should allow us to test this model of CRISPR formation.
The biological role of CRISPR elements is not known, although it was suggested that this element plays a role in replicon partitioning (Mojica et al., 1995; She et al., 1998). A protein that binds to the repeats was purified from Sulfolobus solfataricus, and it was suggested that it might be involved in DNA condensation of the CRISPR structures (Peng et al., 2003). Here we report a correlation between the number of spacers in a locus and the resistance of S. thermophilus to phage infection, suggesting that CRISPRs can have a different biological role, protecting the bacteria against phage attack. How could such protection be mediated? A possible mechanism is via anti-sense RNA inhibition of phage gene expression, which is supported by the following observations. First, spacers that are homologous to phage coding sequences can have either of the two orientations within a CRISPR locus, and thus give rise to anti-sense RNA, irrespective of the direction of locus transcription. Second, CRISPR loci do appear to be transcribed, as reported for Archaeoglobus fulgidus (Tang et al., 2002) and Sulfolobus solfataricus (Tang et al., 2005), and can thus generate the anti-sense RNA. It was proposed that various anti-sense short RNAs might regulate gene expression (Tang et al., 2005). Third, it was shown that anti-sense RNA inhibits phage propagation (Sturino & Klaenhammer, 2002, 2004). Studies combining fully sequenced phages and strains with characterized CRISPR loci should allow further testing of this hypothesis, notwithstanding the fact that, besides the effect of CRISPR, many other factors also contribute to phage resistance (see Coffey & Ross, 2002 for a recent review). Finally, CRISPR spacers could protect bacteria not only against phage infection, but also against invasion by other extrachromosomal elements, inhibiting expression of the genes they carry. Horizontal exchanges between CRISPR elements, which we detected by comparing different loci, could extend the protective range to extrachromosomal elements that have not yet invaded a particular strain. Such a beneficial, protective role could account for the wide spread and the apparent stability of CRISPR structures among prokaryotes.
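The orientation argument in the anti-sense hypothesis can be made concrete with a small sketch. It classifies a spacer, written as it would appear in the CRISPR transcript, relative to the coding (mRNA-like) strand of a phage gene. Exact substring matching stands in for the homology searches used in this study, and the function names and toy sequences are hypothetical.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def spacer_orientation(spacer: str, coding_strand: str) -> Optional[str]:
    """Classify a transcribed spacer relative to a phage coding strand.

    Returns "sense" if the spacer reads like the mRNA, "antisense" if it
    is complementary to the mRNA, or None if neither orientation matches.
    Exact matching is a simplification of a real homology search.
    """
    spacer = spacer.upper()
    coding_strand = coding_strand.upper()
    if spacer in coding_strand:
        return "sense"
    if reverse_complement(spacer) in coding_strand:
        return "antisense"
    return None


# Toy sequences for illustration only.
phage_gene = "ATGGCTAAGCTTGACCGTTAA"
same_strand = "GCTAAGCTTG"                          # copied from the coding strand
opposite_strand = reverse_complement(same_strand)   # same region, other strand

print(spacer_orientation(same_strand, phage_gene))
print(spacer_orientation(opposite_strand, phage_gene))
```

If the locus were transcribed in the opposite direction, the two labels would simply swap, so a locus containing spacers in both orientations would yield some anti-sense segments whichever strand is transcribed.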
ACKNOWLEDGEMENTS |
---|
REFERENCES |
---|
Brussow, H. & Hendrix, R. W. (2002). Phage genomics, small is beautiful. Cell 108, 13–16.
Coffey, A. & Ross, R. P. (2002). Bacteriophage-resistance systems in dairy starter strains, molecular analysis to application. Antonie van Leeuwenhoek 82, 303–321.
Desiere, F., Lucchini, S., Canchaya, C., Ventura, M. & Brussow, H. (2002). Comparative genomics of phages and prophages in lactic acid bacteria. Antonie van Leeuwenhoek 82, 73–91.
Dryden, D. T., Murray, N. E. & Rao, D. N. (2001). Nucleoside triphosphate-dependent restriction enzymes. Nucleic Acids Res 29, 3728–3741.
Fayard, B. (1993). Caractérisation de 69 bactériophages of Streptococcus salivarius subsp. thermophilus incluant 10 bactériophages tempérés. PhD thesis, University Nancy I, France.
Groenen, P. M., Bunschoten, A. E., van Soolingen, D. & van Embden, J. D. (1993). Nature of DNA polymorphism in the direct repeat cluster of Mycobacterium tuberculosis, application for strain differentiation by a novel typing method. Mol Microbiol 10, 1057–1065.
Higgins, D. G. & Sharp, P. M. (1989). Fast and sensitive multiple sequence alignments on a microcomputer. Comput Appl Biosci 5, 151–153.
Hoe, N., Nakashima, K., Grigsby, D. & 7 other authors (1999). Rapid molecular genetic subtyping of serotype M1 group A Streptococcus strains. Emerg Infect Dis 5, 254–263.
Jansen, R., van Embden, J. D. A., Gaastra, W. & Schouls, L. M. (2002). Identification of genes that are associated with DNA repeats in prokaryotes. Mol Microbiol 43, 1565–1575.
Kamerbeek, J., Schouls, L., Kolk, A. & 8 other authors (1997). Simultaneous detection and strain differentiation of Mycobacterium tuberculosis for diagnosis and epidemiology. J Clin Microbiol 35, 907–914.
Le Marrec, C., van Sinderen, D., Walsh, L., Stanley, E., Vlegels, E., Moineau, S., Heinze, P., Fitzgerald, G. & Fayard, B. (1997). Two groups of bacteriophages infecting Streptococcus thermophilus can be distinguished on the basis of mode of packaging and genetic determinants for major structural proteins. Appl Environ Microbiol 63, 3246–3253.
Makarova, K. S., Aravind, L., Grishin, N. V., Rogozin, I. B. & Koonin, E. V. (2002). A DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysis. Nucleic Acids Res 30, 482–496.
Maté, M. J. & Kleanthous, C. (2004). Structure-based analysis of the metal-dependent mechanism of H-N-H endonucleases. J Biol Chem 279, 34763–34769.
Mojica, F. J., Ferrer, C., Juez, G. & Rodriguez-Valera, F. (1995). Long stretches of short tandem repeats are present in the largest replicons of the Archaea Haloferax mediterranei and Haloferax volcanii and could be involved in replicon partitioning. Mol Microbiol 17, 85–93.
Mojica, F. J., Diez-Villasenor, C., Soria, E. & Juez, G. (2000). Biological significance of a family of regularly spaced repeats in the genomes of Archaea, Bacteria and mitochondria. Mol Microbiol 36, 244–246.
Peng, X., Brügger, K., Shen, B., Chen, L., She, Q. & Garrett, R. A. (2003). Genus-specific protein binding to the large clusters of DNA repeats (Short Regularly Spaced Repeats) present in Sulfolobus genomes. J Bacteriol 185, 2410–2417.
Pourcel, C., Salvignol, G. & Vergnaud, G. (2005). CRISPR elements in Yersinia pestis acquire new repeats by preferential uptake of bacteriophage DNA, and provide additional tools for evolutionary studies. Microbiology 151, 653–663.
Saravanan, M., Bujnicki, J. M., Cymerman, I. A., Rao, D. N. & Nagaraja, V. (2004). Type II restriction endonuclease R.KpnI is a member of the HNH nuclease superfamily. Nucleic Acids Res 32, 6129–6135.
Schouls, L. M., Reulen, S., Duim, B., Wagenaar, J. A., Willems, R. J. L., Dingle, K. E., Colles, F. M. & van Embden, J. D. (2003). Comparative genotyping of Campylobacter jejuni by amplified fragment length polymorphism, multilocus sequence typing, and short repeat sequencing: strain diversity, host range, and recombination. J Clin Microbiol 41, 15–26.
She, Q., Phan, H., Garrett, R. A., Albers, S. V., Stedman, K. M. & Zillig, W. (1998). Genetic profile of pNOB8 from Sulfolobus, the first conjugative plasmid from an archaeon. Extremophiles 2, 417–425.
Simpson, C. L., Giffard, P. M. & Jacques, N. A. (1993). A method for the isolation of RNA from Streptococcus salivarius and its application to the transcriptional analysis of the gtfJK locus. FEMS Microbiol Lett 108, 93–97.
Singleton, M. R., Dillingham, M. S., Gaudier, M., Kowalczykowski, S. C. & Wigley, D. B. (2004). Crystal structure of RecBCD enzyme reveals a machine for processing DNA breaks. Nature 432, 187–193.
Stanley, E., Fitzgerald, G. F. & van Sinderen, D. (1999). Characterisation of Streptococcus thermophilus CNRZ1205 and its cured and re-lysogenised derivatives. FEMS Microbiol Lett 176, 503–510.
Sturino, J. M. & Klaenhammer, T. R. (2002). Expression of antisense RNA targeted against Streptococcus thermophilus bacteriophages. Appl Environ Microbiol 68, 588–596.
Sturino, J. M. & Klaenhammer, T. R. (2004). Antisense RNA targeting of primase interferes with bacteriophage replication in Streptococcus thermophilus. Appl Environ Microbiol 70, 1735–1743.
Tang, T. H., Bachellerie, J. P., Rozhdestvensky, T., Bortolin, M. L., Huber, H., Drungowski, M., Elge, T., Brosius, J. & Huttenhofer, A. (2002). Identification of 86 candidates for small non-messenger RNAs from the archaeon Archaeoglobus fulgidus. Proc Natl Acad Sci U S A 99, 7536–7541.
Tang, T. H., Polacek, N., Zywicki, M., Huber, H., Brugger, K., Garrett, R., Bachellerie, J. P. & Huettenhofer, A. (2005). Identification of novel non-coding RNAs as potential anti-sense regulators in the archaeon Sulfolobus solfataricus. Mol Microbiol 55, 469–481.
Terzaghi, B. E. & Sandine, W. E. (1975). Improved medium for lactic streptococci and their bacteriophages. Appl Microbiol 29, 807–813.
van Belkum, A., Scherer, S., van Alphen, L. & Verbrugh, H. (1998). Short-sequence DNA repeats in prokaryotic genomes. Microbiol Mol Biol Rev 62, 275–293.
Walker, D. C., Georgiou, T., Pommer, A. J., Walker, D., Moore, G. R., Kleanthous, C. & James, R. (2002). Mutagenic scan of the H-N-H motif of colicin E9: implications for the mechanistic enzymology of colicins, homing enzymes & apoptotic endonucleases. Nucleic Acids Res 30, 3225–3234.
Received 17 March 2005;
revised 25 May 2005;
accepted 30 May 2005.