Macro-array and bioinformatic analyses reveal mycobacterial ‘core’ genes, variation in the ESAT-6 gene family and new phylogenetic markers for the Mycobacterium tuberculosis complex

Magali Marmiesse1, Priscille Brodin1, Carmen Buchrieser2, Christina Gutierrez3, Nathalie Simoes2, Veronique Vincent3, Philippe Glaser2, Stewart T. Cole1 and Roland Brosch1

1 Unité de Génétique Moléculaire Bactérienne, Institut Pasteur, 25–28 rue du Docteur Roux, 75724 Paris Cedex 15, France
2 Laboratoire de Génomique des Micro-organismes Pathogènes, Institut Pasteur, 25–28 rue du Docteur Roux, 75724 Paris Cedex 15, France
3 Centre National de Référence des Mycobactéries, Institut Pasteur, 25–28 rue du Docteur Roux, 75724 Paris Cedex 15, France

Correspondence
Roland Brosch
rbrosch@pasteur.fr


   ABSTRACT
 
To better understand the biology and the virulence determinants of the two major mycobacterial human pathogens Mycobacterium tuberculosis and Mycobacterium leprae, their genome sequences have been determined recently. In silico comparisons revealed that among the 1439 genes common to both M. tuberculosis and M. leprae, 219 genes code for proteins that show no similarity with proteins from other organisms. Therefore, the latter ‘core’ genes could be specific for mycobacteria or even for the intracellular mycobacterial pathogens. To obtain more information as to whether these genes really were mycobacteria-specific, they were included in a focused macro-array, which also contained genes from previously defined regions of difference (RD) known to be absent from Mycobacterium bovis BCG relative to M. tuberculosis. Hybridization of DNA from 40 strains of the M. tuberculosis complex and in silico comparison of these genes with the near-complete genome sequences from Mycobacterium avium, Mycobacterium marinum and Mycobacterium smegmatis were undertaken to answer this question. The results showed that among the 219 conserved genes, very few were not present in all the strains tested. Some of these missing genes code for proteins of the ESAT-6 family, a group of highly immunogenic small proteins whose presence and number is variable among the genomically highly conserved members of the M. tuberculosis complex. Indeed, the results suggest that, with few exceptions, the ‘core’ genes conserved among M. tuberculosis H37Rv and M. leprae are also highly conserved among other mycobacterial strains, which makes them interesting potential targets for developing new specific anti-mycobacterial drugs. In contrast, the genes from RD regions showed great variability among certain members of the M. tuberculosis complex, and some new specific deletions in Mycobacterium canettii, Mycobacterium microti and seal isolates were identified and further characterized during this study. 
Together with the distribution of a particular 6 or 7 bp micro-deletion in the gene encoding the polyketide synthase pks15/1, these results confirm and further extend the revised phylogenetic model for the M. tuberculosis complex recently presented.


Abbreviations: BAC, bacterial artificial chromosome; RD, region of difference

Representative sequences of the junction regions reported in this article have been deposited in the EMBL database under accession numbers AJ583832, AJ583833 and AJ583834.


   INTRODUCTION
 
The enormous efforts in the field of mycobacterial genomics in the past few years have resulted in the availability of the whole-genome sequences of Mycobacterium tuberculosis and Mycobacterium leprae, the aetiological agents of human tuberculosis and leprosy (Cole et al., 1998, 2001). In addition, the genome sequences of several other mycobacterial species have been completed recently (Garnier et al., 2003; Fleischmann et al., 2002) or are in the finishing phase (for an overview see http://www.pasteur.fr/recherche/unites/Lgmb/mycogenomics.html). These sequences are providing a huge body of information, whose analysis can provide deep insights into the biology and evolution of mycobacterial pathogens and related organisms. One efficient way of extracting relevant biological information from sequence data is comparative genomics, a discipline that is based on bioinformatics and various hybridization techniques. In the present study, we have developed and employed a focused macro-array for the analysis of strains belonging to the M. tuberculosis complex, a tight-knit complex of slow-growing mycobacteria comprising M. tuberculosis, the causative agent in the vast majority of human tuberculosis cases; Mycobacterium canettii and Mycobacterium africanum, both agents of human tuberculosis in sub-Saharan Africa; Mycobacterium microti, an agent of tuberculosis in voles; and Mycobacterium bovis, which infects a wide variety of mammalian species including humans. The members of the complex share great genetic similarity, seen by homology at the DNA level of greater than 99·9 %, but differ by some particular phenotypic characteristics, including different host preferences, that have led researchers to retain the traditional species names of these bacteria (Brosch et al., 2000). For easier data handling and because the variability of house-keeping genes is negligible, the number of genes included in the array was restricted to 500 specially selected genes from M. tuberculosis and M. bovis. The selection of genes was based on several criteria, including their presence in the genomes of M. tuberculosis and M. leprae relative to genomes of other organisms available in public databases, their potential to be involved in virulence, their genomic location close to the highly conserved IS1081 insertion elements or their affiliation to certain gene families. Of particular interest were the ‘core’ mycobacterial genes, of which there are 219. These were first identified on the basis of their restriction to M. leprae and M. tuberculosis, and their conservation by the former, despite the massive gene decay that has occurred, strongly suggests that these gene functions are essential for M. leprae and possibly other mycobacteria. Genes that were variable between M. tuberculosis and the highly related vaccine strain M. bovis BCG (Mahairas et al., 1996; Gordon et al., 1999; Behr et al., 1999) were also included. The hybridization of genomic DNAs from a large number of mycobacterial strains to the arrays designed in this way, combined with bioinformatic analyses of the sequence data available for mycobacteria, made it possible simultaneously to identify genes that were conserved throughout the mycobacteria and to determine the genes that were missing from one or more strains. Whereas highly conserved genes that are confined to mycobacteria encode potential new drug targets, variable genes may be implicated in the virulence or host range of the mycobacterium concerned. Furthermore, some of the newly identified and analysed variable regions in M. canettii, M. microti and seal isolates, as well as the occurrence of a particular micro-deletion of 6 or 7 bp in the polyketide synthase pks15/1 gene of certain strains, enable the recently proposed evolutionary scenario of the M. tuberculosis complex to be tested further and consolidated (Brosch et al., 2002).


   METHODS
 
Bacterial strains.
The 40 M. tuberculosis complex strains were composed of 18 M. tuberculosis strains and 11 M. bovis strains isolated from different organs of humans and animals, originating from different countries. Two M. bovis BCG vaccine strains (Birkhaug and Mérieux), two strains that were listed as M. africanum (001, 940946) in the collection of the Institut Pasteur as well as four M. microti (ATCC 35782, 94/2272, 005004, OV254), two M. canettii (14000059, 990161) and one M. canettii-like clinical isolate (990263) were included. The strains have been extensively characterized by reference typing methods, i.e. IS6110-RFLP typing and spoligotyping. Other tested mycobacterial species were Mycobacterium avium, Mycobacterium marinum and Mycobacterium smegmatis. For the investigation of the micro-deletion in the pks15/1 locus, 21 selected genomic DNAs were taken from a collection previously used for the study of the evolution of the M. tuberculosis complex (Brosch et al., 2002) in addition to 15 DNAs from the macro-array hybridization study.

In silico analyses of mycobacterial species.
The complete genome sequences of M. tuberculosis and M. leprae were shown to differ extensively in size and number of genes. The genome of M. tuberculosis comprises 4 411 532 bp and 3993 protein-coding genes (Cole et al., 1998; Camus et al., 2002), whereas M. leprae contains 3 268 203 bp and only 1605 genes, but numerous pseudogenes (Cole et al., 2001). In silico comparison of the predicted proteins shared by M. tuberculosis and M. leprae was done employing BLAST and FASTA alignment programs against public databases and partial genome sequences (available at web sites http://www.sanger.ac.uk/Projects/M_marinum/ and http://www.tigr.org). To be listed as a conserved mycobacterial protein, 40 % identity at the protein level between M. tuberculosis and M. leprae was used as the cut-off level (Cole, 2002). The presence or absence of these genes in M. avium, M. marinum and M. smegmatis was determined in a similar manner by using 40 % identity over at least 70 % of the complete length of the tested protein. For selected cases, to determine if proteins corresponded to orthologous proteins in two species, the bi-directional best-hit method was applied, by comparing a given protein of M. tuberculosis with the sequence of another species, e.g. M. marinum. The protein sequence from M. marinum, which showed the highest similarity, was then compared back to the M. tuberculosis database, and, in the case of an orthologous protein, showed its best hit with the protein with which the initial comparison was started. This method was particularly useful when genes or proteins from multi-gene families were compared, as high scores due to cross-hybridization may appear.
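The two criteria described above (the identity and length-coverage cut-off, and the bi-directional best-hit check) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the best-hit tables, the protein identifiers and the function names are hypothetical placeholders, not the output of the BLAST or FASTA runs used in this study, and the thresholds are applied here as "at least" 40 % and 70 %.

```python
from typing import Dict

# Hypothetical best-hit tables (query protein -> top-scoring subject protein),
# e.g. parsed from BLASTP or FASTA reports. All identifiers are placeholders.
BestHits = Dict[str, str]


def is_bidirectional_best_hit(protein: str, forward: BestHits, reverse: BestHits) -> bool:
    """True if the best hit of `protein` in the other species points back to it."""
    best = forward.get(protein)
    return best is not None and reverse.get(best) == protein


def passes_cutoff(identity_pct: float, aligned_length: int, protein_length: int,
                  min_identity: float = 40.0, min_coverage: float = 0.70) -> bool:
    """Apply the 40 % identity over at least 70 % of the protein length criterion."""
    if protein_length <= 0:
        raise ValueError("protein_length must be positive")
    coverage = aligned_length / protein_length
    return identity_pct >= min_identity and coverage >= min_coverage


# Toy example: mtb_A and mm_1 are reciprocal best hits; mtb_B is not,
# because its best hit (mm_2) points back to a different protein (mtb_C),
# as can happen within a multi-gene family.
mtb_to_mm = {"mtb_A": "mm_1", "mtb_B": "mm_2"}
mm_to_mtb = {"mm_1": "mtb_A", "mm_2": "mtb_C"}
```

In this toy data set only `mtb_A` would be called an orthologue, which is the situation the reciprocal comparison is meant to resolve for members of multi-gene families.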

PCR.
PCR amplification was used for making probes, for confirmation of the absence of genes that were suggested to be absent in certain tested strains by macro-array results, and for generating the DNA fragments of junction regions of deleted regions. According to the type of application, different volumes were used. For the production of probes, which were spotted on the macro-array, PCRs were performed in 96-well plates containing 12·5 µl of 10× PCR buffer [600 mM Tris/HCl pH 8·8, 20 mM MgCl2, 170 mM (NH4)2SO4, 100 mM β-mercaptoethanol], 12·5 µl of 20 mM nucleotide mix, 25 µl each primer at 2 µM, 10 ng template DNA, 10 % DMSO, 2 U Taq polymerase (Gibco-BRL) and sterile water to 125 µl. Amplification of junction regions and evaluation of the presence or absence of genes that showed no or weak hybridization by macro-array experiments with genomic DNA from a given strain were performed in a total reaction volume of 12·5 µl, as described previously (Brosch et al., 2002). Thermal cycling was performed on a PTC-100 amplifier (MJ) with an initial denaturation step of 90 s at 95 °C, followed by 35 cycles of 30 s at 95 °C, 1 min at 58 °C and 4 min at 72 °C.
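As a quick check of the probe-PCR recipe above, the final concentrations in the 125 µl reaction follow from simple dilution (stock concentration × stock volume / total volume). The sketch below uses only the volumes and stock concentrations quoted in the text; the function and variable names are illustrative.

```python
def final_concentration(stock_conc: float, stock_volume_ul: float, total_volume_ul: float) -> float:
    """Dilution arithmetic: C_final = C_stock * V_stock / V_total (units follow the stock)."""
    return stock_conc * stock_volume_ul / total_volume_ul


TOTAL_UL = 125.0   # final reaction volume for probe production
BUFFER_UL = 12.5   # volume of 10x PCR buffer added

# Components of the 10x buffer, in mM, as listed in the text.
buffer_stock_mM = {
    "Tris/HCl": 600.0,
    "MgCl2": 20.0,
    "(NH4)2SO4": 170.0,
    "beta-mercaptoethanol": 100.0,
}

buffer_final_mM = {
    name: final_concentration(conc, BUFFER_UL, TOTAL_UL)
    for name, conc in buffer_stock_mM.items()
}
nucleotide_final_mM = final_concentration(20.0, 12.5, TOTAL_UL)  # 20 mM nucleotide mix
primer_final_uM = final_concentration(2.0, 25.0, TOTAL_UL)       # each primer at 2 uM

for name, conc in buffer_final_mM.items():
    print(f"{name}: {conc:g} mM")
print(f"nucleotide mix: {nucleotide_final_mM:g} mM")
print(f"each primer: {primer_final_uM:g} uM")
```

The 12·5 µl of 10× buffer in 125 µl gives a 1× working concentration, so each buffer component ends up at one-tenth of its stock value.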

Macro-arrays.
The selection of 500 genes for the focused macro-array included 219 genes common to both M. tuberculosis and M. leprae that code for proteins that did not show any similarity with proteins from other organisms in the public databases. Genes that were classified in the M. tuberculosis H37Rv genome as potentially involved in virulence (Cole et al., 1998) as well as genes that belonged to certain multi-gene families (Cole et al., 1998; Tekaia et al., 1999) were also included in the selection. Differences in the copy number of the insertion element IS1081 in the M. tuberculosis complex are almost entirely restricted to M. canettii. To determine if genes that flank IS1081 elements in the genome of M. tuberculosis H37Rv were conserved throughout the members of the M. tuberculosis complex, these genes were also selected for the construction of the macro-arrays. The selection also included genes that were variable between M. tuberculosis and the highly related vaccine strain M. bovis BCG (Mahairas et al., 1996; Gordon et al., 1999; Behr et al., 1999). Several house-keeping genes and other genes for which oligonucleotides were available in the laboratory were used for control purposes. The sequences of selected genes from M. tuberculosis H37Rv and M. bovis BCG Pasteur were downloaded by using the complete genome sequence (http://genolist.pasteur.fr/TubercuList/) displayed by the ARTEMIS software (Rutherford et al., 2000) or by using in-house databases. The design of primer pairs for the amplification of ~500 bp portions of these genes was done using the PRIMER 3 software (available via http://www-genome.wi.mit.edu/cgi-bin/primer/primer3_www.cgi). The oligonucleotides used for the amplification of the probes were designed to have annealing temperatures in the range 58–60 °C. PCRs were performed in a final volume of 125 µl as described above. 
Fifty-five microlitres of the 125 µl of the PCR product were transferred from the 96-well plates into 384-well plates using a pipetting robot (Tecan). After estimation of the amount of amplification product by gel electrophoresis and ethidium-bromide staining, each PCR product was then deposited in duplicate on a 22×22 cm Q-filter N+222 mm membrane (Genetix) by a gridding robot (QPIX). Probes were fixed and denatured by putting the freshly spotted membranes on Whatman paper soaked with fixation solution (0·5 M NaOH, 1·5 M NaCl) and leaving them for 15 min. The reaction was stopped by using distilled water. Membranes were stored wet at −20 °C for further use. Membranes were pre-hybridized and hybridized in 10 ml of a solution containing SSPE buffer (750 mM NaCl, 50 mM NaH2PO4, 5 mM EDTA) with 1 % SDS, Denhardt's reagent composed of 0·01 % Ficoll, 0·01 % polyvinylpyrrolidone and 0·01 % BSA, and sonicated salmon DNA at a final concentration of 100 µg ml⁻¹. Pre-hybridization was performed for 1 h at 65 °C. Genomic DNAs from the various mycobacterial strains were labelled by random incorporation of [α-33P]dCTP into the synthesized complementary strand using the Prime-it II kit (Stratagene). Unincorporated nucleotides were removed by exclusion chromatography with the QIAquick Nucleotide Removal kit (Qiagen). Membranes were hybridized overnight at 65 °C with the labelled probe, followed by four washing steps in 10 ml of 0·5× SSPE, 0·2 % SDS solution. The first two washes were done at room temperature for 5 min, followed by two washes at 65 °C for 20 min. Membranes were sealed in Saran wrap, exposed to a screen for 2 days and scanned on a STORM phosphorimager (Molecular Dynamics); signals were quantified and visualized using the IMAGE QUANT (Molecular Dynamics) software. The hybridization signals of each spot hybridized with the genomic DNAs from the various strains were compared to a control membrane hybridized with the reference strain M. tuberculosis H37Rv using the ARRAY VISION software (Imaging Research). For normalization purposes, the intensities from the central and the surrounding area of each spot were calculated. The intensity from the surrounding area, due to non-specific background hybridization signals, was subtracted from the spot intensity. To compare spot intensities from different membranes, a mean background intensity was calculated for each membrane, which was then used for establishing a correction factor to normalize the spot intensities from individual membranes. The log10 ratios between the normalized intensities of each spot from a tested strain compared to the reference strain M. tuberculosis H37Rv were used to estimate whether a gene was present or absent in a given strain. The cut-off was determined using a Gaussian model involving the mean and the standard deviation. For confirmation purposes, the genes which were found absent by this approach were re-tested by PCR analysis in the corresponding strain.
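The normalization and presence/absence call described above can be illustrated with a minimal Python sketch. It is not the ARRAY VISION procedure itself. Several details are assumptions made for the example: the correction factor is taken as the ratio of the mean background intensities of the two membranes, corrected intensities are floored at a small positive value so that the logarithm is defined, and the cut-off is placed `k` standard deviations below the mean log10 ratio (the text states only that a Gaussian model with mean and standard deviation was used, not the value of `k`).

```python
import math
import statistics
from typing import Dict, List


def background_corrected(spot: float, surround: float, floor: float = 1.0) -> float:
    """Subtract the local background; floor the result so log10 stays defined (assumption)."""
    return max(spot - surround, floor)


def log10_ratios(test: Dict[str, float], reference: Dict[str, float],
                 test_bg_mean: float, ref_bg_mean: float) -> Dict[str, float]:
    """log10(test / reference) per gene after scaling the test membrane.

    The correction factor (ratio of mean membrane backgrounds) is an assumed
    form of the normalization described in the text.
    """
    factor = ref_bg_mean / test_bg_mean
    return {gene: math.log10(test[gene] * factor / reference[gene]) for gene in test}


def candidate_absent_genes(ratios: Dict[str, float], k: float = 2.0) -> List[str]:
    """Genes whose log10 ratio lies more than k standard deviations below the mean."""
    values = list(ratios.values())
    cutoff = statistics.mean(values) - k * statistics.stdev(values)
    return sorted(gene for gene, ratio in ratios.items() if ratio < cutoff)


# Toy data: nine genes hybridize equally in both strains, one gives a 100-fold weaker signal.
reference = {f"gene{i}": 1000.0 for i in range(10)}
test = dict(reference, gene9=10.0)
ratios = log10_ratios(test, reference, test_bg_mean=50.0, ref_bg_mean=50.0)
flagged = candidate_absent_genes(ratios)
```

With the toy data only `gene9` is flagged. As in the study, any gene flagged in this way would still need to be confirmed by PCR in the corresponding strain.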

Sequencing of junction regions.
For genes that were missing from certain strains, PCR confirmation was done as described above. Then, new primers that were situated in the flanking regions of the missing gene(s) were designed. After amplification of the fragment containing the junction region of a given new deleted region, the fragment was purified by using a QIAquick PCR purification kit. For sequencing, we used 500 ng purified amplification product, 3 µl Big Dye sequencing mix (Applied Biosystems), 2 µl (2 µM) flanking primer and 3 µl of 5× buffer (5 mM MgCl2, 200 mM Tris/HCl pH 8·8). Thermal cycling was performed on a PTC-100 amplifier (MJ), with an initial denaturation step of 1 min at 96 °C, followed by 35 cycles of 30 s at 96 °C, 15 s at 56 °C and 4 min at 60 °C. The products were then precipitated with 80 µl of 76 % ethanol, centrifuged, washed with 70 % ethanol and dried. Then, 2 µl of formamide/EDTA buffer were added and, after denaturation, the samples were loaded onto 4 % polyacrylamide gels (48 cm). Electrophoresis lasted for 10–12 h on a model 377 automated DNA sequencer (Applied Biosystems). The sequences obtained were compared to the genome sequence of M. tuberculosis H37Rv using the TubercuList server at the Institut Pasteur, allowing the size and exact location of deleted regions to be determined. Representative sequences of the junction regions were deposited in the EMBL database under accession numbers AJ583832, AJ583833 and AJ583834.


   RESULTS
 
Bioinformatic analysis
Comparative genome analysis was carried out by screening all proteins in public databases for similarities with putative proteins from M. tuberculosis and M. leprae using the BLASTP and FASTA programs. Initially, this approach identified 219 genes encoding orthologous proteins in M. tuberculosis and M. leprae, but which showed no appreciable similarity with other proteins (Cole, 2002). The conservation of these genes by M. leprae in the face of extensive reductive evolution strongly suggests that they encode essential functions. These genes appear to code for proteins that are restricted to mycobacteria and possibly to closely related actinobacteria, whose sequences were not available in the public databases at the time of writing. Many of these genes were classified in the original genome analysis (Cole et al., 1998) in the group for which no function could be predicted. However, during the re-annotation of the M. tuberculosis H37Rv genome (Camus et al., 2002), based on similarities with new sequence data from other organisms or experimental data (Rosenkrands et al., 2000), the majority of these genes were re-grouped into categories for which a higher level of functional information is now available. Of the 219 genes specific for M. tuberculosis and M. leprae, 102 belong to the cell-wall and cell-processes class and 97 to the class of conserved hypothetical proteins. Ten are PE and PPE proteins with the remainder belonging to other classes (Fig. 1, Table 1). To test whether these genes were indeed conserved throughout the genus Mycobacterium, in silico comparison of the translated sequences with the predicted open reading frames (ORFs) from the almost-finished genome sequences of M. marinum, M. avium and M. smegmatis were undertaken. This analysis showed that, with a few exceptions, the great majority of these genes had orthologues present in M. marinum, M. avium and M. smegmatis (Table 1). 
Most of the genes which were not conserved among these species belong to the PE/PPE families, suggesting that extensive variation in number and sequence of these genes exists among the mycobacteria. M. marinum, one of the closest relatives of M. tuberculosis, as determined by 16S rRNA analysis (Springer et al., 1996), was missing only nine of the 219 conserved genes, whereas M. avium lacked 20, including the genes of the RD1 region, which are also absent from M. bovis BCG and M. microti. In the fast-grower M. smegmatis, 18 of the conserved genes did not have a counterpart.
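The counts given above can be tallied in a short sketch, which makes explicit how many of the 219 genes fall outside the three named classes and what fraction has a counterpart in each of the three comparison species. Only numbers stated in the text are used; the variable names are illustrative.

```python
TOTAL_CORE_GENES = 219

# Functional classes named in the text; the remainder belongs to other classes.
named_classes = {
    "cell wall and cell processes": 102,
    "conserved hypothetical proteins": 97,
    "PE and PPE proteins": 10,
}
other_classes = TOTAL_CORE_GENES - sum(named_classes.values())

# Number of the 219 genes without a counterpart in each species, as reported.
missing = {"M. marinum": 9, "M. avium": 20, "M. smegmatis": 18}
conserved = {species: TOTAL_CORE_GENES - n for species, n in missing.items()}
conserved_pct = {species: 100.0 * n / TOTAL_CORE_GENES for species, n in conserved.items()}

print(f"genes in other classes: {other_classes}")
for species in missing:
    print(f"{species}: {conserved[species]} of {TOTAL_CORE_GENES} "
          f"({conserved_pct[species]:.1f} %) conserved")
```

On these figures, more than 90 % of the 'core' genes have an orthologue in each of the three species.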



 
Fig. 1. Classification of 219 orthologous genes of M. tuberculosis H37Rv and M. leprae which did not show appreciable homology to genes of other organisms in public databases.

 

 
Table 1. Conservation of ‘core’ mycobacterial genes in silico

Presence or absence of the 219 orthologous genes of M. tuberculosis H37Rv (Mt) and M. leprae (Ml) in the mycobacterial species M. avium (Ma), M. marinum (Mm) and M. smegmatis (Ms). Genes that were found to share >40 % amino acid identity over >70 % of the predicted protein were considered as orthologous genes and their presence is represented on the table by the sign +, whereas absence of the gene is indicated by the sign -. Genes of the PE/PPE families are boxed.

 
Macro-array analysis of the M. tuberculosis complex
The focused macro-array contained 500 gene fragments chosen according to criteria outlined above. To test the specificity of the array, hybridizations were undertaken with labelled plasmid DNA from a bacterial artificial chromosome (BAC) clone (Mi10C12) containing a 100 kb fragment from M. microti OV254 corresponding to genes Rv3802 to Rv3884 in M. tuberculosis H37Rv. Analysis of the hybridization signals quantified by using a phosphorimager and the ARRAY VISION software and generic file management programs allowed a large dataset to be established (Table 2). Absence of genes was further confirmed by PCR analysis and, where possible, by in silico comparison with finished or unfinished genome sequences. As expected, most of the 219 genes conserved between M. tuberculosis and M. leprae hybridized with the set of strains from the M. tuberculosis complex. Among the 102 genes presumably implicated in cell-wall structure and cell processes, three genes, embC–A–B (Rv3793–94–95), code for arabinosyltransferases, which are targets for ethambutol, a front-line anti-tuberculosis drug (Belanger et al., 1996; Telenti et al., 1997). Apart from results of bioinformatic analyses, confirming the presence of the three genes in M. avium, M. marinum and M. smegmatis, these three genes were also identified by the macro-array hybridizations as being present among all tested strains of the M. tuberculosis complex. This finding is encouraging and suggests that, among the other conserved mycobacterial genes of this group, new drug targets may also be discovered. Only for seven of the 219 genes was variability detected in the genomes of the tested strains from the M. tuberculosis complex and most of these genes are situated in RD regions (Table 2). Interestingly, several of these genes belong to the ESAT-6 family, which has 23 members on the M. tuberculosis H37Rv chromosome at 11 distinct sites (Cole et al., 1998; Tekaia et al., 1999). Some of these genes were located in previously defined regions of difference, such as RD1, RD5 and RD8 (Brosch et al., 2002; Gordon et al., 1999; Tekaia et al., 1999), whereas others such as esxR (Rv3019c) and esxS (Rv3020c) were identified in this study for the first time as being absent from several strains of the M. tuberculosis complex.


 
Table 2. Presence or absence of the tested genes in 40 strains of the M. tuberculosis complex

Presence of an RD region is indicated by the sign +, whereas absence of the region is indicated by the sign -. Note that for regions RD1, RD2 and RD12 in some strains junction sequences are not identical, as outlined in the results section.

 
In silico and macro-array analyses of the RD1 region showed that this region is of particular interest because the gene content, as well as the gene order, at this locus is highly conserved among several, sometimes rather distant, mycobacterial species such as M. tuberculosis, M. marinum, M. leprae and M. smegmatis (Fig. 2), whereas portions of this region were found to be absent from M. avium (Gey Van Pittius et al., 2001), M. bovis BCG (Mahairas et al., 1996) and M. microti. In fact, the hybridization experiments presented here (Fig. 2), using DNA from BAC clone MiBAC10C12 (from 4252·3 to 4367·9 kb relative to M. tuberculosis H37Rv), as well as genomic DNAs from two M. microti strains, have contributed to the identification of the RD1mic deleted region, a segment of 14 kb comprising genes Rv3864–Rv3876 that was deleted from the genome of M. microti strains (Brodin et al., 2002). The ESAT-6 family genes esxO–esxP (Rv2346c–47c) located in the RD5 region, as well as esxV–esxW (Rv3619c–20c) from the RD8 region, were absent from all tested M. bovis and M. bovis BCG strains (Table 2), whereas M. microti lacks the genes in the RD8 region but has the RD5 genes esxO–esxP (Rv2346c–47c) present. For esxR–esxS (Rv3019c–20c), we found that three M. tuberculosis strains (950530, 950531, 950532) and four M. microti strains (OV254, ATCC 35782, 005004 and 94/2272) did not harbour these genes, whereas they were present in all other tested strains. PCR amplification experiments using sequences flanking esxR–esxS showed that parts of the neighbouring genes PPE46 (Rv3018c) and PPE47 (Rv3021c) were also absent from the strains lacking esxR–esxS. Inspection of the sequence of this region in M. tuberculosis H37Rv showed that in this segment numerous genes contain highly repetitive sequences [PPE46 (Rv3018c), PE27A (Rv3018A), PPE47 (Rv3021c), PPE48 (Rv3022c), PE29 (Rv3022A)]. It appears that in M. microti and the three M. tuberculosis strains the deletion of esxR–esxS was mediated independently by recombination between the highly similar PPE genes PPE46 (Rv3018c) and PPE47 (Rv3021c), which share stretches of 364 and 408 bp of identical sequences, removing a 2·4 kb fragment (Fig. 3). This study clearly shows that the variability is greater among the members of the ESAT-6 family than for the other conserved mycobacteria-specific genes. The strong immunogenic character of the ESAT-6 proteins during host–pathogen interactions may be related to this greater variability.



 
Fig. 2. (a) Macro-array hybridization results from genes in the RD1 region. (b) Representation of the genes (shown by arrows) and gene order of the RD1 region in M. marinum, M. leprae, M. tuberculosis and M. smegmatis. The genomic position of the locus relative to the M. tuberculosis H37Rv and M. leprae genomes, as well as percentage amino acid identities among the various predicted proteins, are shown. Gene identification and annotation were performed using the ARTEMIS software (Rutherford et al., 2000).

 


 
Fig. 3. Representation of the genes (shown by arrows) and gene order of the genomic region containing the ESAT-6 family members esxR–esxS (Rv3019c–20c), absent from several M. tuberculosis strains and M. microti strains. The sequence of the junction region was deposited in the EMBL database under accession number AJ583832 and relates to AJ550619. Note that genes PPE46 and PPE47 share large portions of identical nucleotide sequences.

 
Identification and description of deleted regions
In contrast to the conserved genes between M. tuberculosis and M. leprae, the genes from the known RD regions showed much more variability in the 40 isolates of the M. tuberculosis complex, confirming the finding that certain lineages of strains (e.g. M. bovis) have successively lost genetic material during evolution. In agreement with previously published studies (Brosch et al., 2001, 2002; Niobe-Eyangoh et al., 2003), the RD9 region was absent from the tested M. africanum, M. microti and M. bovis strains, whereas it was present in M. tuberculosis and M. canettii strains. It represents a key element that defines one evolutionary lineage within the M. tuberculosis complex that has separated from the M. tuberculosis lineage and comprises M. africanum, M. microti and M. bovis. Within this lineage, the M. bovis BCG substrains tested, Birkhaug and Mérieux, showed the greatest number of deleted regions, followed by M. bovis strains and M. microti (Table 2). Some deletions were found to be characteristic for certain subspecies, for example, the RD1mic deletion for M. microti strains (Brodin et al., 2002).

Similarly, we identified a specific deletion, RD2seal, for strains which were isolated from infected seals in different parts of the world. Hybridization results suggested that genes Rv1978 and Rv1979 were absent from the seal isolates. Sequence analysis of the junction region in the four tested seal isolates confirmed this finding and showed that in these strains a 1941 bp deletion has removed parts of genes Rv1978 and Rv1979 (Fig. 4a). We named this region RD2seal as it overlaps the 10·7 kb RD2 region, which is missing from some but not all BCG substrains (Mahairas et al., 1996; Behr et al., 1999; Gordon et al., 1999). In addition, strains isolated from seals were deleted for regions RD7, RD8, RD9 and RD10, whereas regions RD4, RD5, RD6, RD11, RD12 and RD13, usually missing from M. bovis (Brosch et al., 2002; Mostowy et al., 2002), were present. As the RD2seal junction regions in the four seal isolates were identical, but different from the RD2 deletion of BCG strains, it appears that this deletion is a specific evolutionary marker for strains prevalent in seals and sea lions (Fig. 4d).



 
Fig. 4. (a) Representation of the genes (shown by arrows) and gene order of the RD2 region in M. tuberculosis, BCG and seal isolates (accession no. AJ583834). Deleted regions are delineated by red and blue lanes. (b) Representation of the genes (represented by arrows) and gene order of the RD12 region in M. tuberculosis, M. bovis and M. canettii (accession no. AJ583833). Deleted regions are delineated by purple and brown lanes. The positions of the locus relative to the M. tuberculosis H37Rv genome are shown. (c) Representation of the nucleotide polymorphism observed in the pks15/1 gene in different members of the M. tuberculosis complex, which are specified in the evolutionary scheme shown in (d). (d) Refined evolutionary scheme after Brosch et al. (2002), showing presence or absence of conserved RD regions in members of the M. tuberculosis complex, as well as the pks15/1 polymorphism. Red arrows indicate that all tested strains belonging to the phylogenetic groups 2 and 3 (Sreevatsan et al., 1997) showed a 7 bp deletion in their pks15/1 gene, whereas all tested strains from M. africanum, M. microti and M. bovis that had RD7–RD10 deleted were characterized by a 6 bp deletion (green arrows).

 
The selection of strains used in this study also included two isolates of M. canettii, which have been proposed to represent the most distant phylogenetic variant presently known within the M. tuberculosis complex (Brosch et al., 2002; Gutacker et al., 2002). We were particularly interested in the unusually low copy number of IS1081 (1 copy) in these strains, as all other members of the M. tuberculosis complex harbour 5–6 copies and display very homogeneous IS1081 RFLP (van Soolingen et al., 1997). In this respect, one key question was whether the low copy number of IS1081 in the M. canettii strains is due to a low rate of IS1081 transposition in this group of strains or whether IS1081 copies have been deleted from the genome. Bioinformatic comparisons showed that the IS1081 insertion sites were conserved in all completely or partially sequenced strains from the M. tuberculosis complex (i.e. M. tuberculosis strains H37Rv, CDC1551, Beijing 210, M. microti OV254, M. bovis AF2122/97 and M. bovis BCG) and that the genes flanking IS1081 copies were present in these strains. Furthermore, more distantly related mycobacteria, such as M. avium, M. marinum or M. smegmatis, possess orthologues of these flanking genes, without harbouring IS1081 insertion elements (data not shown). In contrast, hybridization results obtained with genomic DNAs from the two M. canettii strains showed that these strains lacked most of the genes that flank the IS1081 copies in the other members of the M. tuberculosis complex (Table 3), suggesting that in M. canettii strains the number of IS1081 copies is low because of deletion events in this lineage. As an example, Fig. 4(b) shows the genomic region containing genes Rv3113–Rv3124 from M. tuberculosis H37Rv, whereas in M. canettii 14000059 a 12 436 bp deletion was observed that truncated genes Rv3111 (moaD) at position 3 491 865 and Rv3127 at position 3 479 429 relative to M. tuberculosis H37Rv, removing the intervening genomic region that carries a copy of IS1081. Interestingly, this deletion in M. canettii partially overlaps deleted region RD12 from M. bovis strains, which is 2·4 kb in size and has not removed the IS1081 copy (Fig. 4b). As for the other copies of IS1081 that are missing from M. canettii 14000059 and 990161, the absence of many flanking genes suggests that they may have been removed by deletion events as well (Table 3).
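The reported size of the M. canettii deletion can be checked against the two breakpoint coordinates quoted above. The short Python sketch below does only that arithmetic. The function name `deletion_length` is illustrative, and the sketch assumes that the size is the plain difference between the two H37Rv coordinates, since the coordinate convention is not stated in the text.

```python
# Breakpoint coordinates quoted in the text, relative to M. tuberculosis H37Rv.
# Assumption: the deletion size is the plain difference between the two
# positions (the text does not state an inclusive/exclusive convention).
BREAKPOINT_LOW = 3_479_429
BREAKPOINT_HIGH = 3_491_865


def deletion_length(low: int, high: int) -> int:
    """Return the length in bp of the segment lying between two breakpoints."""
    if high < low:
        raise ValueError("high breakpoint must not be smaller than low breakpoint")
    return high - low


length_bp = deletion_length(BREAKPOINT_LOW, BREAKPOINT_HIGH)
print(f"Deleted segment: {length_bp} bp")
```

Under this assumption the difference matches the 12 436 bp stated for M. canettii 14000059.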


Table 3. Presence or absence of IS1081 flanking genes in some strains from the M. tuberculosis complex

In M. tuberculosis H37Rv, ORFs Rv1047, Rv1199, Rv2512, Rv2666, Rv3023 and Rv3115 code for IS1081 transposases. Strain/species: 1, M. tuberculosis CDC1551; 2, M. tuberculosis strain 210; 3, M. microti; 4, M. bovis; 5, M. bovis BCG; 6, M. canettii.

 
In the selection of strains were three belonging to the M. tuberculosis Beijing type, showing the characteristic spoligotype and IS6110 insertion in the dnaA–N region. For these strains, macro-array results suggested that they all lacked regions RvD2 and RvD3, which are also absent from M. tuberculosis H37Rv, as shown previously (Gordon et al., 1999). However, for the Beijing strains, a single deletion of 15·5 kb relative to M. bovis AF2122/97 was detected that has removed both RvD regions, situated next to each other on the chromosome, apparently by homologous recombination between two copies of IS6110. This observation is in agreement with the findings of Ho and colleagues, who showed that deletions in the RvD2 region can be as large as 20 kb (Ho et al., 2000).

Micro-deletions in gene pks15/1
In a recent study, Guilhot and colleagues showed that several well characterized M. tuberculosis strains (H37Rv, Erdman, CDC1551 and MT106) lack a particular phenolglycolipid (PGL) that is produced by M. tuberculosis 210 (Beijing type) and M. canettii 14000059, and they linked this observation to a deletion of 7 bp that introduces a frameshift in a gene encoding a polyketide synthase (pks15/1) in these strains (Constant et al., 2002). Interestingly, in M. bovis AF2122/97 and M. bovis BCG, a 6 bp deletion was observed at the same locus of the pks15/1 gene. As M. bovis and M. bovis BCG both produce PGL, it seems likely that this 6 bp deletion, which does not cause a frameshift in the pks15/1 gene, does not influence the enzymic activity of the resulting gene product for the synthesis of the particular PGL (Constant et al., 2002). However, as this interesting polymorphism has direct phenotypic consequences, we were interested to determine at what stage of the phylogenetic diversification the deletion of 7 or 6 bp occurred. Therefore, we sequenced PCR products from the polymorphic locus in the pks15/1 gene (Fig. 4c) in 15 strains from the present study and in 21 additional strains used previously (Brosch et al., 2002). The results of this approach showed that the 7 bp deletion only occurred in a particular subgroup of M. tuberculosis strains that show the katG463 mutation CGG and, according to the nomenclature of Sreevatsan and colleagues, belong to genetic group 2 or 3 (Sreevatsan et al., 1997). All these strains had region TbD1 deleted and also lacked spacers 33–36 in their spoligotype, which is a characteristic feature of these genetic groups (Brosch et al., 2002; Soini et al., 2000). In contrast, no M. tuberculosis strains of genetic group 1 (katG463 CTG), including strains of the ancestral type that have the TbD1 region present as well as strains of the Beijing type cluster, which lack the TbD1 region (Brosch et al., 2002), showed a deletion in the polymorphic locus of the pks15/1 gene.
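The different consequences of the 7 bp and 6 bp deletions follow from the triplet nature of the genetic code: a deletion whose length is not a multiple of three shifts the reading frame, whereas a multiple of three removes whole codons and leaves the downstream frame intact. A minimal Python sketch of this reasoning is given below. The function name `causes_frameshift` is illustrative, and the sketch ignores other effects, such as the creation of a stop codon at the deletion junction.

```python
CODON_LENGTH = 3  # nucleotides per codon


def causes_frameshift(deletion_bp: int) -> bool:
    """Return True if a deletion of this length shifts the reading frame."""
    if deletion_bp <= 0:
        raise ValueError("deletion length must be a positive number of bp")
    return deletion_bp % CODON_LENGTH != 0


# The two pks15/1 deletions discussed in the text.
for size in (7, 6):
    effect = "frameshift" if causes_frameshift(size) else "in-frame (whole codons lost)"
    print(f"{size} bp deletion: {effect}")
```

The 6 bp case corresponds to the loss of two complete codons, which is consistent with the observation that strains carrying it still produce PGL.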

As for the 6 bp deletion previously observed for M. bovis AF2122/97 and M. bovis BCG (Constant et al., 2002), in the present study we found the same 6 bp deletion in the pks15/1 gene of all tested M. bovis strains, seal isolates, M. microti and M. africanum lacking regions RD7–RD10. Only M. africanum strains that lack the RD9 region but have retained regions RD7, RD8 and RD10 did not show the 6 bp deletion, suggesting that it occurred after the RD9 deletion, at about the same period as the deletion of regions RD7, RD8 and RD10 in the M. africanum→M. bovis lineage (Fig. 4d). These findings fit well with the proposed evolutionary scenario of the M. tuberculosis complex (Brosch et al., 2002) and suggest that two independent deletion events have occurred in the pks15/1 gene in two distinct branches of the phylogenetic tree of the M. tuberculosis complex. The 7 bp deletion, which inactivated the pks15/1 gene, occurred in the branch of TbD1-deleted ‘modern’ M. tuberculosis strains at about the same time as the katG463 mutation (CTG→CGG), whereas the 6 bp deletion occurred after the RD9 and before the RD10 deletion in the M. africanum→M. bovis lineage. Considering the clonal structure of the M. tuberculosis complex (Supply et al., 2003; Fleischmann et al., 2002), it seems that this 6 bp deletion in gene pks15/1 was then inherited by the other members of this branch, and can therefore be found in M. microti, seal isolates, M. bovis and BCG strains (Fig. 4d).
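The correlations described in this section can be summarized as a small lookup rule. The Python sketch below is a simplification of the observations reported here for the tested strains, not a validated typing scheme. The function name `expected_pks15_1_deletion` and its two inputs are illustrative choices.

```python
def expected_pks15_1_deletion(katg463_codon: str, rd7_to_rd10_deleted: bool) -> int:
    """Return the deletion size (bp) expected at the pks15/1 polymorphic locus.

    Simplified from the results described in the text:
      - strains lacking RD7-RD10 (M. africanum subset, M. microti,
        seal isolates, M. bovis, BCG) carry the 6 bp deletion;
      - M. tuberculosis strains with katG463 CGG (genetic groups 2 and 3)
        carry the 7 bp deletion;
      - strains with katG463 CTG and RD7-RD10 present show no deletion.
    """
    codon = katg463_codon.upper()
    if codon not in ("CTG", "CGG"):
        raise ValueError(f"unexpected katG463 codon: {katg463_codon!r}")
    if rd7_to_rd10_deleted:
        return 6
    return 7 if codon == "CGG" else 0


# Illustrative marker profiles taken from the descriptions in the text.
profiles = {
    "M. tuberculosis, genetic group 2 or 3": ("CGG", False),
    "M. tuberculosis, genetic group 1": ("CTG", False),
    "M. bovis": ("CTG", True),
}
for name, (codon, rd_deleted) in profiles.items():
    print(f"{name}: {expected_pks15_1_deletion(codon, rd_deleted)} bp deletion expected")
```

A disagreement between this rule and a sequenced pks15/1 locus would flag a strain for closer inspection rather than settle its classification.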


   DISCUSSION
 
In this study, we evaluated the extent of genetic variability among mycobacteria and in particular those belonging to the M. tuberculosis complex, by using bioinformatic comparisons of newly available mycobacterial sequences together with the macro-array technology that allows the simultaneous screening of many more genes than is possible with previously used PCR-based strategies. In this perspective, it was particularly interesting to determine if genes that are conserved between the two major mycobacterial pathogens M. tuberculosis and M. leprae were present in all M. tuberculosis complex members and other more distantly related mycobacterial species. As shown in Results, the combined approach of bioinformatic analyses, macro-array hybridizations and sequencing of selected genes has resulted in a more-refined picture of the M. tuberculosis complex, indicating that within the M. tuberculosis complex and other tested mycobacteria, a very high degree of conservation exists for the genes shared by M. tuberculosis H37Rv and M. leprae.

Among the few exceptions, members of the ESAT-6 family were most prominent. The members of this family are characterized by a small size (~100 aa) (Cole et al., 1998) and common amino acid motifs (Tekaia et al., 1999), and several of them are organized in genomic loci with similar organization, suggesting that the neighbouring genes may have some function in the transport of these proteins out of the bacterial cell (Cole et al., 1998; Tekaia et al., 1999; Pallen, 2002). The first experimental proof for this hypothesis was recently obtained for the RD1 region of M. tuberculosis (Fig. 2), which is absent from BCG and M. microti (Pym et al., 2003). In the same study, it was shown that recombinant vaccine strains that appropriately exported ESAT-6 and CFP10 induced better protection against tuberculosis in animal models. This finding may be linked to the highly immunogenic character that was demonstrated in several studies for ESAT-6 and other members of this family (Skjot et al., 2002). Indeed, most ESAT-6 proteins, deleted from one or more strains as identified in the present study, are strongly recognized by the immune system of the host (Skjot et al., 2002). Furthermore, two additional members of the ESAT-6 family (Rv3809c, Rv3905c) are reported to be altered in the sequenced M. bovis AF2122/97 strain (Garnier et al., 2003). Taken together, it seems plausible that variation of ESAT-6 family proteins in strains of M. tuberculosis and/or members of the M. tuberculosis complex could contribute to antigenic variation, ultimately helping the bacteria to escape immune recognition by the host. To elucidate the biological function of this protein family, further studies are necessary. The finding that the RD1 region is highly conserved in gene content and gene order in several pathogenic and non-pathogenic mycobacterial species (Fig. 2) suggests that ESAT-6 systems may play a fundamental role in survival in specific environments. This knowledge, together with appropriate cosmid and BAC libraries from these species (Brosch et al., 1998), should now enable very focused studies on the role of these proteins in the various mycobacteria.

In the tight-knit M. tuberculosis complex, where single nucleotide substitutions do not seem to be a substantial source of genetic diversity between strains, the presence or absence of certain regions of difference may play important roles in the varying phenotypes, host range and virulence of these bacteria (Pym et al., 2002; Lewis et al., 2003). Analyses of these RD regions in well-defined strains from the M. tuberculosis complex have allowed us to describe distinct phylogenetic lineages within the M. tuberculosis complex (Brosch et al., 2002) that have evolved from a common ancestor. In this study, we describe RD regions that are characteristic for certain subpopulations of the M. tuberculosis complex. One of the regions (RD2seal) is restricted to strains that were isolated from seals. In the past, seals have been described as susceptible to tuberculosis, but it was not always clear if the infections in seals were caused by M. tuberculosis and/or M. bovis (Zumarraga et al., 1999). However, by the use of macro-arrays and sequencing strategies we show here that the tested strains, which were isolated from seals in different geographical regions (Argentina, France), lack RD7, RD8, RD9 and RD10 and the particular region RD2seal that seems to be specific for tubercle bacilli hosted by seals. The analysis of all available genetic markers (RDs, mmpL6 polymorphism, pks15/1 polymorphism and spoligotype) showed that the seal isolates are phylogenetically more closely related to M. bovis than to M. tuberculosis. Their position in the established evolutionary scheme (Fig. 4d) is somewhere close to M. microti, which also lacks RD7–RD10, shares the mmpL6 codon 551 single nucleotide polymorphism (SNP) of M. bovis (AAG) and presents a particular deletion (RD1mic) that is restricted to this subspecies. The position of the seal isolates in the phylogenetic scheme of the members of the M. tuberculosis complex shown in Fig. 4(d) is in good agreement with a recent SNP analysis by Musser and colleagues (Gutacker et al., 2002), who also placed these isolates as intermediate between M. tuberculosis and M. bovis. In a very recent study, the seal isolates were considered as sufficiently distant from M. bovis and M. tuberculosis to place them in a separate subspecies of the M. tuberculosis complex (Cousins et al., 2003). In this respect, the marker RD2seal is a valuable tool for the rapid identification of such strains.

The analysis of the sequence polymorphism in the pks15/1 gene, which abolishes production of a particular phenolglycolipid in a large group of M. tuberculosis strains (Constant et al., 2002), showed excellent agreement of the observed polymorphism with all other evolutionary markers available and confirmed the phylogenetic position of the strains used in this study (Fig. 4c, d). These results suggest that the 6 bp deletion in the pks15/1 gene in the M. africanum→M. bovis lineage arose independently from the 7 bp deletion observed for M. tuberculosis strains of Sreevatsan's group 2 and 3. Closer inspection of the flanking sequences of this polymorphic locus (Fig. 4c) in the pks15/1 gene showed that this genomic region is very GC-rich. In genes that code for PE and PPE proteins, such GC-rich regions have previously been associated with increased sequence polymorphism between strains (Cole et al., 1998; Banu et al., 2002). Interestingly, the pks15/1 sequence polymorphism is not the only example where independent deletion events have occurred in different evolutionary lineages of the tubercle bacilli in the same genomic regions. Other examples are the RD1 region of BCG (9·7 kb) and M. microti (14 kb), the RD2 region of BCG (10·7 kb) and seal strains (2 kb), or the RD12 region in M. bovis (2·7 kb) and M. canettii (12·4 kb). The size of the deletions, as well as the junction sequences of these regions, are clearly distinct from each other, indicating that no direct phylogenetic relationship exists between them. This observation raises an important point for the interpretation of micro- and macro-array data: the junction regions of deleted regions identified by arrays (Fig. 4a, b) need to be sequenced before the presence or absence of these marker genes can be used in the construction of evolutionary schemes. From a practical point of view, the pks15/1 polymorphism may serve as an important additional marker for the identification and classification of members of the M. tuberculosis complex, as well as for the characterization of mycobacterial DNAs amplified from mummified human remains. Recent studies have shown that, according to their spoligotype and their katG463 SNP, M. tuberculosis strains resembling TbD1-deleted M. tuberculosis strains of Sreevatsan's genetic group 2 and 3 were present in former human populations (Zink et al., 2003; Fletcher et al., 2003). As shown in the present study, it appears that a strict correlation exists between these characteristics and the frameshift mutation (deletion of 7 bp) in the pks15/1 gene.

The situation of mycobacterial research has considerably changed in the last few years due to the information contained in the whole-genome sequence of M. tuberculosis H37Rv, the paradigm strain of tuberculosis research. However, genomic variation may exist among different strains and, for the mycobacteria, only very few studies have addressed this question by the use of DNA arrays and then for a limited number of strains (Behr et al., 1999; Kato-Maeda et al., 2001). We therefore evaluated the extent of the conserved gene pool relative to the flexible gene pool in a collection of strains from the M. tuberculosis complex and for some other mycobacterial species; this has led to a better understanding of the genetic criteria that may have played a role in the selection of the most successful M. tuberculosis strains during the evolution of the pathogen. This information is of importance for the development of new therapeutic and preventive strategies in the fight against tuberculosis.


   ACKNOWLEDGEMENTS
 
We are grateful to Lionel Frangeul, Thierry Garnier and Aboubakar Maitournam for help in primer design and data comparison, and Stephen Gordon, Sarah Ngo Niobe-Eyangoh and Alexander Pym for fruitful discussions. Preliminary sequence data were obtained from The Institute for Genomic Research (TIGR) web site (http://www.tigr.org) and the M. marinum sequence database at the Wellcome Trust Sanger Institute (http://www.sanger.ac.uk/Projects/M_marinum/). Sequencing of M. avium and M. smegmatis at TIGR was accomplished with support from NIAID, and sequencing of M. marinum at the Sanger Institute with support from Beowulf Genomics. This study received financial support from the Institut Pasteur (PTR 35), the Génopole Programme and the Association Française Raoul Follereau.


   REFERENCES
 
Banu, S., Honore, N., Saint-Joanis, B., Philpott, D., Prevost, M. C. & Cole, S. T. (2002). Are the PE-PGRS proteins of Mycobacterium tuberculosis variable surface antigens? Mol Microbiol 44, 9–19.

Behr, M. A., Wilson, M. A., Gill, W. P., Salamon, H., Schoolnik, G. K., Rane, S. & Small, P. M. (1999). Comparative genomics of BCG vaccines by whole-genome DNA microarray. Science 284, 1520–1523.

Belanger, A. E., Besra, G. S., Ford, M. E., Mikusova, K., Belisle, J. T., Brennan, P. J. & Inamine, J. M. (1996). The embAB genes of Mycobacterium avium encode an arabinosyl transferase involved in cell wall arabinan biosynthesis that is the target for the antimycobacterial drug ethambutol. Proc Natl Acad Sci U S A 93, 11919–11924.

Brodin, P., Eiglmeier, K., Marmiesse, M., Billault, A., Garnier, T., Niemann, S., Cole, S. T. & Brosch, R. (2002). Bacterial artificial chromosome-based comparative genomic analysis identifies Mycobacterium microti as a natural ESAT-6 deletion mutant. Infect Immun 70, 5568–5578.

Brosch, R., Gordon, S. V., Billault, A., Garnier, T., Eiglmeier, K., Soravito, C., Barrell, B. G. & Cole, S. T. (1998). Use of a Mycobacterium tuberculosis H37Rv bacterial artificial chromosome library for genome mapping, sequencing, and comparative genomics. Infect Immun 66, 2221–2229.

Brosch, R., Gordon, S. V., Pym, A., Eiglmeier, K., Garnier, T. & Cole, S. T. (2000). Comparative genomics of the mycobacteria. Int J Med Microbiol 290, 143–152.

Brosch, R., Pym, A. S., Gordon, S. V. & Cole, S. T. (2001). The evolution of mycobacterial pathogenicity: clues from comparative genomics. Trends Microbiol 9, 452–458.

Brosch, R., Gordon, S. V., Marmiesse, M. & 12 other authors (2002). A new evolutionary scenario for the Mycobacterium tuberculosis complex. Proc Natl Acad Sci U S A 99, 3684–3689.

Camus, J. C., Pryor, M. J., Medigue, C. & Cole, S. T. (2002). Re-annotation of the genome sequence of Mycobacterium tuberculosis H37Rv. Microbiology 148, 2967–2973.

Cole, S. T. (2002). Comparative mycobacterial genomics as a tool for drug target and antigen discovery. Eur Respir J Suppl 36, 78–86.

Cole, S. T., Brosch, R., Parkhill, J. & 39 other authors (1998). Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence. Nature 393, 537–544.

Cole, S. T., Eiglmeier, K., Parkhill, J. & 41 other authors (2001). Massive gene decay in the leprosy bacillus. Nature 409, 1007–1011.

Constant, P., Perez, E., Malaga, W., Laneelle, M. A., Saurel, O., Daffe, M. & Guilhot, C. (2002). Role of the pks15/1 gene in the biosynthesis of phenolglycolipids in the Mycobacterium tuberculosis complex. Evidence that all strains synthesize glycosylated p-hydroxybenzoic methyl esters and that strains devoid of phenolglycolipids harbor a frameshift mutation in the pks15/1 gene. J Biol Chem 277, 38148–38158.

Cousins, D. V., Bastida, R., Cataldi, A. & 16 other authors (2003). Tuberculosis in seals caused by a novel member of the Mycobacterium tuberculosis complex: Mycobacterium pinnipedii sp. nov. Int J Syst Evol Microbiol 53, 1305–1314.

Fleischmann, R. D., Alland, D., Eisen, J. A. & 23 other authors (2002). Whole-genome comparison of Mycobacterium tuberculosis clinical and laboratory strains. J Bacteriol 184, 5479–5490.

Fletcher, H. A., Donoghue, H. D., Taylor, G. M., van der Zanden, A. G. & Spigelman, M. (2003). Molecular analysis of Mycobacterium tuberculosis DNA from a family of 18th century Hungarians. Microbiology 149, 143–151.

Garnier, T., Eiglmeier, K., Camus, J. C. & 19 other authors (2003). The complete genome sequence of Mycobacterium bovis. Proc Natl Acad Sci U S A 100, 7877–7882.

Gey Van Pittius, N. C., Gamieldien, J., Hide, W., Brown, G. D., Siezen, R. J. & Beyers, A. D. (2001). The ESAT-6 gene cluster of Mycobacterium tuberculosis and other high G+C Gram-positive bacteria. Genome Biol 2, RESEARCH0044.1-0044.18.

Gordon, S. V., Brosch, R., Billault, A., Garnier, T., Eiglmeier, K. & Cole, S. T. (1999). Identification of variable regions in the genomes of tubercle bacilli using bacterial artificial chromosome arrays. Mol Microbiol 32, 643–655.

Gutacker, M. M., Smoot, J. C., Migliaccio, C. A. & 7 other authors (2002). Genome-wide analysis of synonymous single nucleotide polymorphisms in Mycobacterium tuberculosis complex organisms. Resolution of genetic relationships among closely related microbial strains. Genetics 162, 1533–1543.

Ho, T. B., Robertson, B. D., Taylor, G. M., Shaw, R. J. & Young, D. B. (2000). Comparison of Mycobacterium tuberculosis genomes reveals frequent deletions in a 20 kb variable region in clinical isolates. Yeast 17, 272–282.

Kato-Maeda, M., Rhee, J. T., Gingeras, T. R., Salamon, H., Drenkow, J., Smittipat, N. & Small, P. M. (2001). Comparing genomes within the species Mycobacterium tuberculosis. Genome Res 11, 547–554.

Lewis, K. N., Liao, R., Guinn, K. M., Hickey, M. J., Smith, S., Behr, M. A. & Sherman, D. R. (2003). Deletion of RD1 from Mycobacterium tuberculosis mimics bacille Calmette-Guerin attenuation. J Infect Dis 187, 117–123.

Mahairas, G. G., Sabo, P. J., Hickey, M. J., Singh, D. C. & Stover, C. K. (1996). Molecular analysis of genetic differences between Mycobacterium bovis BCG and virulent M. bovis. J Bacteriol 178, 1274–1282.

Mostowy, S., Cousins, D., Brinkman, J., Aranaz, A. & Behr, M. A. (2002). Genomic deletions suggest a phylogeny for the Mycobacterium tuberculosis complex. J Infect Dis 186, 74–80.

Niobe-Eyangoh, S. N., Kuaban, C., Sorlin, P., Cunin, P., Thonnon, J., Sola, C., Rastogi, N., Vincent, V. & Gutierrez, M. C. (2003). Genetic biodiversity of Mycobacterium tuberculosis complex strains from patients with pulmonary tuberculosis in Cameroon. J Clin Microbiol 41, 2547–2553.

Pallen, M. J. (2002). The ESAT-6/WXG100 superfamily – and a new Gram-positive secretion system? Trends Microbiol 10, 209–212.

Pym, A. S., Brodin, P., Brosch, R., Huerre, M. & Cole, S. T. (2002). Loss of RD1 contributed to the attenuation of the live tuberculosis vaccines Mycobacterium bovis BCG and Mycobacterium microti. Mol Microbiol 46, 709–717.

Pym, A. S., Brodin, P., Majlessi, L. & 7 other authors (2003). Recombinant BCG exporting ESAT-6 confers enhanced protection against tuberculosis. Nat Med 9, 533–539.

Rosenkrands, I., King, A., Weldingh, K., Moniatte, M., Moertz, E. & Andersen, P. (2000). Towards the proteome of Mycobacterium tuberculosis. Electrophoresis 21, 3740–3756.

Rutherford, K., Parkhill, J., Crook, J., Horsnell, T., Rice, P., Rajandream, M.-A. & Barrell, B. (2000). ARTEMIS: sequence visualisation and annotation. Bioinformatics 16, 944–945.

Skjot, R. L., Brock, I., Arend, S. M., Munk, M. E., Theisen, M., Ottenhoff, T. H. & Andersen, P. (2002). Epitope mapping of the immunodominant antigen TB10.4 and the two homologous proteins TB10.3 and TB12.9, which constitute a subfamily of the esat-6 gene family. Infect Immun 70, 5446–5453.

Soini, H., Pan, X., Amin, A., Graviss, E. A., Siddiqui, A. & Musser, J. M. (2000). Characterization of Mycobacterium tuberculosis isolates from patients in Houston, Texas, by spoligotyping. J Clin Microbiol 38, 669–676.

Springer, B., Stockman, L., Teschner, K., Roberts, G. D. & Bottger, E. C. (1996). Two-laboratory collaborative study on identification of mycobacteria: molecular versus phenotypic methods. J Clin Microbiol 34, 296–303.

Sreevatsan, S., Pan, X., Stockbauer, K. E., Connell, N. D., Kreiswirth, B. N., Whittam, T. S. & Musser, J. M. (1997). Restricted structural gene polymorphism in the Mycobacterium tuberculosis complex indicates evolutionarily recent global dissemination. Proc Natl Acad Sci U S A 94, 9869–9874.

Supply, P., Warren, R. M., Banuls, A. L. & 7 other authors (2003). Linkage disequilibrium between minisatellite loci supports clonal evolution of Mycobacterium tuberculosis in a high tuberculosis incidence area. Mol Microbiol 47, 529–538.

Tekaia, F., Gordon, S. V., Garnier, T., Brosch, R., Barrell, B. G. & Cole, S. T. (1999). Analysis of the proteome of Mycobacterium tuberculosis in silico. Tuber Lung Dis 79, 329–342.

Telenti, A., Philipp, W. J., Sreevatsan, S., Bernasconi, C., Stockbauer, K. E., Wieles, B., Musser, J. M. & Jacobs, W. R., Jr (1997). The emb operon, a gene cluster of Mycobacterium tuberculosis involved in resistance to ethambutol. Nat Med 3, 567–570.

van Soolingen, D., Hoogenboezem, T., de Haas, P. E. & 9 other authors (1997). A novel pathogenic taxon of the Mycobacterium tuberculosis complex, Canetti: characterization of an exceptional isolate from Africa. Int J Syst Bacteriol 47, 1236–1245.

Zink, A. R., Sola, C., Reischl, U., Grabner, W., Rastogi, N., Wolf, H. & Nerlich, A. G. (2003). Characterization of Mycobacterium tuberculosis complex DNAs from Egyptian mummies by spoligotyping. J Clin Microbiol 41, 359–367.

Zumarraga, M. J., Bernardelli, A., Bastida, R. & 10 other authors (1999). Molecular characterization of mycobacteria isolated from seals. Microbiology 145, 2519–2526.

Received 22 July 2003; revised 30 September 2003; accepted 2 October 2003.