The diversity within an expanded and redefined repertoire of phase-variable genes in Helicobacter pylori

Laurence Salaün1, Bodo Linz2, Sebastian Suerbaum3 and Nigel J. Saunders1

1 The Sir William Dunn School of Pathology, University of Oxford, South Parks Road, Oxford OX1 3RE, UK
2 Max-Planck-Institut fuer Infektionsbiologie, Dept. Molecular Biology, Schumannstrasse 21/22, D-10117 Berlin, Germany
3 Medizinische Hochschule Hannover, Institut für Medizinische Mikrobiologie und Krankenhaushygiene, Carl-Neuberg-Strasse 1, D-30625 Hannover, Germany

Correspondence
Nigel Saunders
Nigel.Saunders@pathology.oxford.ac.uk


   ABSTRACT
Phase variation is a common mechanism used by pathogenic bacteria to generate intra-strain diversity that is important in niche adaptation and is strongly associated with virulence determinants. Previous analyses of the complete sequences of the Helicobacter pylori strains 26695 and J99 have identified 36 putative phase-variable genes among the two genomes through their association with homopolymeric tracts and dinucleotide repeats. Here a comparative analysis of the two genomes is reported and an updated and expanded list of 46 candidate phase-variable genes in H. pylori is described. These have been systematically investigated by PCR and sequencing for the presence of the genes, and the presence and variability in length of the repeats in strains 26695 and J99 and in a collection of unrelated H. pylori strains representative of the main global subdivisions recently suggested. This provides supportive evidence for the phase variability of 30 of the 46 candidates. Other differences in this subset of genes were observed (i) in the repeats, which can be present or absent among the strains, or stabilized in different strains and (ii) in the gene-complements of the strains. Differences between genes were not consistently correlated with the geographic population distribution of the strains. This study extends and provides new evidence for variation of this type in H. pylori, and of the high degree of diversity of the repertoire of genes which display phase-variable switching within individual strains.


Abbreviations: CDS, coding sequence; MLST, multilocus sequence typing

A table of primers used for amplification and sequencing of the repeat-containing regions is available as supplementary data with the online version of this paper at http://mic.sgmjournals.org.


   INTRODUCTION
Helicobacter pylori infection is one of the most common bacterial infections in the world. The organism infects 80–90 % of the population in developing countries and half of the population over the age of 60 in developed countries (Marshall, 1994; Parsonnet, 1995). H. pylori can persist for decades in the stomach of infected patients, who may remain asymptomatic. However, H. pylori infection is associated with gastritis, peptic ulcers, gastric carcinoma and MALT lymphoma (Marshall & Warren, 1984; Nomura et al., 1994; Parsonnet et al., 1994).

As the organism is transmitted between human hosts it may also have to survive conditions in the lower intestinal tract, the environment through which it passes, and must transiently survive luminal acid stress before becoming established in a new stomach mucus layer. Although it is still debated, there are also colonization niches outside of the stomach, such as the oral cavity (Majmudar et al., 1990; N. J. Saunders, unpublished culture data), and possibly in the environment (Park et al., 2001; Bunn et al., 2002). The ability of this organism to generate diversity within and between strains is likely to be an important component of its ability to cause prolonged colonization and its association with different diseases (Logan & Berg, 1996; Atherton et al., 1997; Blaser, 1997). In addition, prolonged survival may require either alteration of surface antigens as a mechanism of immune evasion or as a means to generate host-mimicking structures.

Phase variation is a mechanism of gene-switching that is widely used in adaptation to altering environmental conditions and immune evasion (Saunders, 2003; Salaün et al., 2003). It is a process of high-frequency, reversible, switching between phenotypes, that is mediated by genetic reorganization, mutation or modification and results in the continuous generation of alternate phenotypes within a population. The availability of complete genome sequences has facilitated the identification of the complete repertoires of potentially phase-variable genes in those species that use instability within simple sequence repeats as a gene-switching mechanism. This has been used as a means to identify those genes likely to be involved in these critical host–environmental adaptations. This informatics-based approach has been applied to several species, including Haemophilus influenzae (Hood et al., 1996), Helicobacter pylori (Tomb et al., 1997; Saunders et al., 1998; Alm et al., 1999), Campylobacter jejuni (Parkhill et al., 2000) and pathogenic Neisseria species (Saunders et al., 2000; Tettelin et al., 2000; Snyder et al., 2001). These studies have undergone progressive refinement as the tools have been improved and as genome comparisons have become possible. Several themes have emerged as to the nature of the switched genes, and the significantly better data obtained from comparative studies suggest additional levels of diversity within the candidate gene sets. However, to date, none of these studies has been linked to a systematic analysis of informatics predictions, although a partial study has been conducted for N. meningitidis (Martin et al., 2003). This is necessary to test the predictions and more robustly define a whole organism repertoire of phase-varied genes. The inter-strain diversity in gene presence and phase variability suggested by comparisons of small numbers of genomes has also not been fully addressed.

Since the identification of candidate phase-variable genes in genomes (Tomb et al., 1997; Saunders et al., 1998; Alm et al., 1999), there have been several studies highlighting the importance of individual identified genes and confirmation of their variability. The first of these were associated with LPS phenotypic expression control (Appelmelk et al., 1998, 1999, 2000; Wang et al., 1999; Logan et al., 2000); others have been associated with the control of flagellar expression (FliP) (Josenhans et al., 2000), with adhesion properties (Ilver et al., 1998; Peck et al., 1999; Yamaoka et al., 2000; Mahdavi et al., 2002) or with adaptation to an acidic environment (Tannaes et al., 2001). This is now very clearly an important mechanism of phenotypic control in H. pylori, possibly the most important. Defining its scope on a whole-organism level offers the opportunity to consider the diversification potential, nature of adaptability and possible interaction of switching genes of this organism in a holistic fashion.

This paper describes an updated genomic analysis of candidate phase-variable genes in H. pylori based upon a comparative approach of the complete sequences of the H. pylori strains 26695 and J99, as realized for Neisseria spp. (Snyder et al., 2001). The candidate phase-variable genes were identified through their association with homopolymeric tracts and dinucleotide repeats, but considering shorter repeats than Tomb et al. (1997) and Saunders et al. (1998), with full consideration of the sequence contexts. By using such a comparative approach and by reducing the cut-off for inclusion, an expanded list of H. pylori phase-variable candidates has been identified. The revised candidate list of genes has been assessed in a representative collection of strains comprising the two sequenced strains and 21 unrelated strains to determine repeat-associated polymorphisms as additional evidence of functional repeat instability, conservation of potentially unstable repeat regions and conservation of the switching gene repertoire. This provides a significantly expanded insight into the scope of phase variation as a mechanism of diversification in this species. This is also the first systematic study of such a bioinformatics-based analysis and provides a framework for the assessment and refinement of similar analyses in other species.


   METHODS
Repetitivity search method.
The complete genome sequences of H. pylori strains 26695 (Tomb et al., 1997) and J99 (Alm et al., 1999) were analysed using previously described whole-genome analysis methodology (Saunders et al., 1998) and displayed using an ACEDB graphical interface [http://www.acedb.org (Durbin & Thierry-Mieg, 1991)] to create a comparative sequence database as described previously (Snyder et al., 2001). The lengths, variations between and sequence contexts of simple DNA repeats were used to determine an expanded repertoire of candidate phase-variable genes. All homopolymeric tracts greater than or equal to G7 or C7, and to A9 or T9, all dinucleotide repeats greater than or equal to four copies, and potentially frame-shifted genes were assessed to determine the likely significance of repeat-length variation on the expression of the associated reading frame. The functions of the putative phase-variable genes have been addressed using the homologies identified within the ACEDB from BLASTN and BLASTX searches and the revised annotations of the H. pylori genomes (http://genolist.pasteur.fr/PyloriGene/) (Boneca et al., 2003).
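The repeat thresholds stated above can be illustrated with a minimal sketch. This is not the ACEDB-based methodology used in the study; it is a simple regular-expression scan, and the function name `find_simple_repeats` and the toy sequence in the usage note are hypothetical. It applies only the length cut-offs given in the text (G7 or C7, A9 or T9, and four or more dinucleotide copies) and does not assess sequence context or frame-shifted genes.

```python
import re

# Cut-offs taken from the text: G/C tracts >= 7 nt, A/T tracts >= 9 nt,
# dinucleotide repeats >= 4 copies.
HOMOPOLYMER_MIN = {"G": 7, "C": 7, "A": 9, "T": 9}
DINUCLEOTIDE_MIN_COPIES = 4


def find_simple_repeats(seq):
    """Return sorted (start, unit, count) tuples for repeats meeting the cut-offs.

    For homopolymeric tracts, count is the tract length in nucleotides.
    For dinucleotide repeats, count is the number of copies of the unit.
    The scan is non-overlapping, so two adjacent repeats that share a base
    may be under-counted; this is a limitation of the sketch.
    """
    seq = seq.upper()
    hits = []
    # Homopolymeric tracts: one base followed by one or more copies of itself.
    for m in re.finditer(r"([ACGT])\1+", seq):
        base = m.group(1)
        length = m.end() - m.start()
        if length >= HOMOPOLYMER_MIN[base]:
            hits.append((m.start(), base, length))
    # Dinucleotide repeats: two different bases, repeated as a unit.
    for m in re.finditer(r"(([ACGT])(?!\2)[ACGT])\1+", seq):
        unit = m.group(1)
        copies = (m.end() - m.start()) // 2
        if copies >= DINUCLEOTIDE_MIN_COPIES:
            hits.append((m.start(), unit, copies))
    return sorted(hits)
```

As a usage example, scanning the made-up sequence `"ATG" + "C" * 7 + "TTG" + "A" * 8 + "G" + "CT" * 4 + "GG"` reports the C7 tract and the (CT)4 repeat but not the A8 tract, which falls below the A9 cut-off.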

H. pylori strains and DNA preparation.
Twenty-three H. pylori strains were selected, representing diverse ethnic groups and countries of origin (Table 1 and Fig. 1), including the sequenced strains 26695 (Tomb et al., 1997) and J99 (Alm et al., 1999), and the mouse-adapted strain SS1 (Lee et al., 1997). Strains were isolated from gastric biopsy samples. The sources of the strains have been described by Falush et al. (2003), except for three strains from Ladakh and strains B225, VZ21 and JP96-9 (Table 1). Apart from the two strains representing the hpAfrica2 population, which were grown in a different laboratory, all H. pylori strains were grown on Columbia agar (Oxoid) containing 10 % laked horse blood (Oxoid) with Dent supplement (Oxoid). The two hpAfrica2 strains were grown on GC agar with peptone supplemented with 10 % inactivated horse serum, vitamin mix and antibiotics (vancomycin, 10 µg ml–1; polymyxin, 25 U ml–1; trimethoprim, 5 µg ml–1; amphotericin B, 4 µg ml–1). Cultures were incubated for 3–6 days at 37 °C under microaerobic conditions (CampyGen; Oxoid). Strains were stored in 20 % (v/v) glycerol in BHI broth (Oxoid) at –80 °C. DNA was purified from plate cultures using the AquaPure Genomic DNA Isolation Kit (Bio-Rad) according to the manufacturer's instructions. All the strains included within the study are cagA+, except the two hpAfrica2 strains, as assessed using the primers designed by Akopyants et al. (1998).


Table 1. Geographic distribution of the 23 H. pylori strains used in this study

 


Fig. 1. Neighbour-joining tree and population distributions of the 23 H. pylori strains in this study among the multilocus haplotypes described by Falush et al. (2003). All distances are based on Kimura two-parameter estimates and are to scale (scale bar, lower left). The isolates were assigned to populations using STRUCTURE V2.0. The code number refers to the strains as indicated in Table 1.

 
Multilocus sequence typing (MLST)-based strain selection and additional assessment.
Following an initial assessment in a subset of strains, an expanded set selected on the basis of diversity and representing the major population groupings according to MLST was used. In addition, MLST was performed to determine the population characteristics of the initial strains used. Seven housekeeping genes (atpA, efp, mutY, ppa, trpC, ureI and yphC) and the virulence-associated gene vacA were assessed by MLST as described previously (Achtman et al., 1999). A complete list of primer sequences used for the MLST study is available at http://helicobacter.mlst.net/info/primers.htm. PCR and automated sequencing of PCR products from both strands were performed using standard protocols. Isolates were assigned to populations using STRUCTURE V2.0, assuming four populations (K=4), as described previously (Falush et al., 2003).

The strain set used included five hpEastAsia strains, five hpAfrica1 strains (including J99), two hpAfrica2 strains, eleven hpEurope strains (including 26695) and three strains from Ladakh (North India) (Table 1). The neighbour-joining population tree (Fig. 1) shows that the strains from Ladakh form a separate branch from the European strains, although STRUCTURE does not see the Ladakh strains as a separate population due to the similar sequence components in the European strains that arose from a common ancestor (Falush et al., 2003).

Amplification and sequencing of the repeat-containing regions.
PCRs were performed to amplify regions of the putative phase-variable genes containing potentially variable repeats. Primers were designed using the published sequences of H. pylori strains 26695 (Tomb et al., 1997) and J99 (Alm et al., 1999) (see supplementary table available with the online version of this paper at http://mic.sgmjournals.org), using conserved regions as identified using ACEDB. PCRs were carried out using Taq DNA polymerase (Invitrogen) according to the manufacturer's instructions. PCR products were directly sequenced using the PCR primers. In a small number of instances direct sequencing yielded poor trace data; these products were TOPO TA-cloned into pCR2.1 (Invitrogen) according to the manufacturer's instructions. E. coli strain DH5α transformants were grown on LB agar (Oxoid) plates containing 50 µg kanamycin ml–1 (Sigma) and plasmids were extracted using the Concert Rapid Plasmid Miniprep System (Gibco-BRL). Inserts were sequenced from the reverse primer site flanking the cloning site. Automated sequencing was performed using ABI Prism BigDye Terminator cycle sequencing, version 2.0 (Applied Biosystems) and was resolved on an ABI Prism 377 or 3100 DNA sequencer (Applied Biosystems). The trace-viewing program, Trev, of the Staden Package was used to read and edit the generated sequences (Bonfield et al., 2002). Sequences were aligned using Seqlab from the Wisconsin Package, version 10.2 (Genetics Computer Group, Madison, WI, USA) through the Oxford University Bioinformatics Centre.

Validation of the observed variations in the length of the repeats.
To determine the relative contributions to repeat length variability within the genomic DNA preparations and any PCR slippage-generated variation, representative PCR products used for sequencing were TOPO TA-cloned into pCR2.1 (Invitrogen) and sequencing reactions were conducted using two different approaches: (i) sequencing directly from the plasmid, and (ii) PCR reamplification from the plasmid template and sequencing. Sequences obtained with both approaches were compared.


   RESULTS AND DISCUSSION
Whole-genome comparative analysis of H. pylori strains 26695 and J99 for putative phase-variable genes
A search for dinucleotide repeats and homopolymeric tracts in the first completed H. pylori genome sequence (strain 26695) found 17 intragenic repeats, described as potentially associated with phase variation, plus an additional 17 genes associated with intergenic repeats (of which five were also associated with an intragenic repeat) (Tomb et al., 1997). A different analysis of the strain 26695 sequence used simple sequence repeats to identify 27 potentially phase-variable genes (of which one was associated with an intergenic repeat) (Saunders et al., 1998). This second analysis included 11 additional candidates to those in the initial description, but did not include any of the genes associated only with intergenic repeats indicated originally (and unintentionally omitted one gene, HP0009). A comparative analysis was performed using the second completed genome sequence (strain J99) to identify potentially phase-variable genes in this genome, which identified 26 candidates, of which seven were new (Alm et al., 1999). An additional gene, phospholipase A (pldA), has also been identified by Tannaes et al. (2001). Thus, so far, 36 putative phase-variable genes have been proposed. Through comparative genome analysis, our search has now identified a total of 46 phase-variable gene candidates (Table 2).


Table 2. The 46 putative phase-variable genes identified from the in silico comparative analysis of sequenced strains 26695 and J99

 
The list of 46 candidates (Table 2) retains some genes from our previous analysis of strain 26695 that were not included by Alm et al. (1999). Alm et al. (1999) did not include these genes because (i) in J99 a ‘stabilized’ motif (an alternative sequence with no potentially unstable repeat) replaces the repeat observed in 26695 (HP0211, HP0298, HP0585/6, HP0335, HP1369), (ii) the candidate gene in 26695 has no homologue in J99 (HP0855, HP0051), (iii) the repeat is located in the promoter (HP0103), or (iv) because of a problem with annotation (HP0058). In HP0058 the annotation suggests that the coding sequence (CDS) starts 123 bp 3' of the homopolymeric C repeat, whereas this CDS should begin 5' of the repeat and this repeat leads to a frameshift mutation resulting in an OFF status of this gene in both 26695 and J99. However, four of the additional genes identified by Alm et al. (1999) that were not described in our original analysis have been included and investigated further. This includes one of the rfaJ paralogues (jhp0820), which has a C14 repeat in strain J99, for which no homologue exists in strain 26695. Three other CDSs (HP1366, HP0143, HP1433) have been assessed because the length of the repeat in these genes varies between strains 26695 and J99 (Alm et al., 1999), although the length of repeats in these genes is below the usual cut-off for inclusion.
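The link between the length of an intragenic repeat and the ON or OFF status of a gene, as in the HP0058 frameshift described above, can be sketched as follows. This is an illustrative toy model only: the function `expression_state` and the sequences used with it are hypothetical and are not taken from any H. pylori gene. A gene is called ON when the reading frame runs from the start of the sequence to a stop codon that is the final codon, and OFF when a change in tract length shifts the frame and exposes a premature stop or leaves no in-frame stop.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def expression_state(upstream, tract_base, tract_len, downstream):
    """Return "ON" or "OFF" for a toy coding sequence containing a homopolymeric tract.

    upstream   -- coding sequence from the start codon up to the tract (hypothetical)
    tract_base -- the repeated nucleotide, e.g. "C"
    tract_len  -- number of copies of tract_base
    downstream -- coding sequence after the tract, ending in a stop codon
    """
    cds = upstream + tract_base * tract_len + downstream
    # Split into complete codons, reading from the first base.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    for index, codon in enumerate(codons):
        if codon in STOP_CODONS:
            # ON only if the first in-frame stop is the final codon of the sequence.
            is_last = index == len(codons) - 1 and len(cds) % 3 == 0
            return "ON" if is_last else "OFF"
    return "OFF"  # frame-shifted with no in-frame stop


# Toy usage: a C9 tract keeps the frame; C8 or C10 shift it onto a premature stop.
for n in (8, 9, 10, 12):
    print(n, expression_state("ATGAAA", "C", n, "GGACTGACTAGCTAA"))
```

In this toy sequence the frame is restored whenever the tract length changes by a multiple of three, which is why such repeats can act as reversible switches.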

The 12 genes associated with only intergenic repeats from the initial assessment of strain 26695 (Tomb et al., 1997), and HP1105 and HP1397 have not been included or assessed because the associated intergenic repeats are not clearly associated with candidate promoter components and it would not be possible to interpret the potential phenotypic consequences of length variation. In addition, among the 26 candidates suggested by Alm et al. (1999), the three genes for which the repeat is located at the 3' end of the CDS (HP0164, HP1074, HP1499) have not been included because variations in the length of these repeats are not likely to alter the expression state of the associated genes.

All the 13 new putative phase-variable genes described here (Table 2) harbour a homopolymeric tract. The identification of 11 of them has been possible using a reduced cut-off to that used previously (Saunders et al., 1998) from C9 or G9 homopolymeric tracts to C7 or G7. This revision was made for three reasons: (i) there is evidence from pldA of H. pylori (Tannaes et al., 2001) and from siaD of Neisseria meningitidis (Hammerschmidt et al., 1996) that such repeats can be functionally unstable; (ii) initial sequencing in this study showed that longer repeats can vary to this shorter length (Table 3); (iii) differences in the length of short repeats were observed between the published sequence of strains 26695 and J99. This has inevitably increased the number of candidate genes included at this initial stage, but this has been addressed by comparative sequencing. In addition, an annotated CDS encoding a hypothetical protein in strain J99 (jhp0540), which has no homologue in strain 26695, was considered as a candidate on the basis of a long poly-A tract, A14, located upstream of a perfect TATAAT –10 consensus sequence in its probable promoter region. Variation in the length of this repeat would probably influence the relative spacing of the –10 and other promoter components, as seen in the Hae. influenzae fimA/B switch (van Ham et al., 1993) and vlp genes of Mycoplasma hyorhinis (Yogev et al., 1991). Another CDS encoding a short hypothetical protein (51 aa) (HP0767 in strain 26695; not annotated but located between jhp0704 and jhp0705 in strain J99) was included on the basis of a long poly-G tract, G11, associated with a probable frameshift mutation in strain 26695.


Table 3. Investigation of the repertoire of putative phase-variable genes in unrelated strains

pro, Promoter-located repeat [as a consequence the ON/OFF cannot be determined (ND)]; S, stabilized (e.g. A7S is a stabilized tract equivalent to a repeat of A7); NR, no repeat.

 
When babA and babB genes are considered, the relative locations of these paralogous genes have been swapped in the two published genomes (HP0896/jhp1164 encoding babB, and HP1243/jhp0833 encoding babA). Only the babB gene has the intragenic dinucleotide CT repeat (Alm et al., 1999) and is as such a candidate phase-variable gene. Different primer pairs were needed to amplify this gene from its two potential locations (see supplementary table available with the online version of this paper at http://mic.sgmjournals.org).

A phase-variable gene described after the inception of this project (de Vries et al., 2002) was not included in this study. This gene, identified in strain J99 (jhp1297), encodes a type III restriction-modification enzyme and was identified after correction of the publicly available annotation of strain J99 (de Vries et al., 2002); it has no homologue in strain 26695.

To assess the range of functions associated with the putative phase-variable genes, the recently revised annotation of H. pylori genomes has been used (Boneca et al., 2003) in addition to the homologies identified within the ACEDB from BLASTN and BLASTX searches. Most of the 46 putative phase-variable genes encode proteins involved in either LPS biosynthesis (seven candidates) or are cell-surface-associated proteins (22 candidates) (Table 2), highlighting the importance of phase variation in controlling the expression of cell-surface components directly interacting with the environment. The other putative phase-variable genes encode proteins of restriction-modification systems (nine candidates) and for hypothetical proteins (five candidates). Three other putative phase-variable genes encode proteins with functions associated with metabolic processes: electron transport (HP0642), degradation of protein (HP0657) and pyrimidine ribonucleotide synthesis (HP0919). A similar range of phase-variable phenotypes has been proposed in other phase-variable species such as Hae. influenzae (Hood et al., 1996; Saunders, 1999), Neisseria spp. (Saunders et al., 2000; Snyder et al., 2001) and Campylobacter jejuni (Parkhill et al., 2000).

The size and the nature of the observed repeats differ with their location. Long poly-A and poly-T repeats (consisting of more than 10 nt in strains 26695 and J99) are only found in intergenic regions where no consensus promoter components can be readily identified, except in the case of jhp0540 for which the A14 tract is located between the candidate –35 and –10 boxes. On the other hand, long poly-G and poly-C tracts are almost always intragenic, HP0103 being the only example for which a poly-G repeat is found in a promoter region. The absence of intragenic poly-A and poly-T repeats may reflect the amino acids they would encode (lysine and phenylalanine), whereas the poly-G and poly-C repeats encode runs of glycine and proline, respectively, which are small, weakly charged amino acids. Dinucleotide repeats are only found in intragenic locations. The most common are the long CT and GA repeats, whereas the AT and CA repeats are only found in two CDSs (HP0211 and HP0744) and are shorter [(AT)5 and (AG)5–7, respectively] (Table 3). Only the CT and GA repeats show variation in length when the genomes of strains 26695 and J99 are compared.
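The relationship between a homopolymeric tract and the amino acid run it encodes follows directly from the standard genetic code, and can be shown with a short sketch. The function name `residues_from_tract` and its `offset` parameter are assumptions introduced for illustration; the sketch only counts the complete homopolymeric codons that fit within a tract.

```python
# Amino acid encoded by each homopolymeric codon in the standard genetic code.
HOMOPOLYMER_CODON = {"AAA": "Lys", "TTT": "Phe", "GGG": "Gly", "CCC": "Pro"}


def residues_from_tract(base, tract_len, offset=0):
    """List the residues encoded by the complete codons lying inside a tract.

    offset is the number of tract bases (0, 1 or 2) used up by a codon that
    began upstream of the tract; those bases are not counted here.
    """
    if offset not in (0, 1, 2):
        raise ValueError("offset must be 0, 1 or 2")
    full_codons = max(tract_len - offset, 0) // 3
    return [HOMOPOLYMER_CODON[base.upper() * 3]] * full_codons
```

For example, an in-frame G9 tract yields three glycine residues, whereas an A9 tract would introduce three consecutive lysines, whatever the surrounding sequence.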

Variations in the length of the repeats
Before the role of phase variation can be addressed at a whole-system level, additional evidence of functional repeat length variation is needed. When computational approaches are applied as hypothesis-generating exercises, these have to be designed with selection criteria that are inclusive. As such, properly constructed candidate lists will tend to include some genes that are not actually phase-variable. It is only by investigation of the variability of these repeats that the thresholds for selection of candidates in bioinformatics studies can be assessed. The advantages of sequence comparisons between unrelated strains in this regard are significant, as described more fully in the comparative analysis of the pathogenic Neisseria (Snyder et al., 2001).

The regions containing potentially functional repeat length variation have been sequenced from strains 26695 and J99, with a different passage history to those used in the genome sequencing projects, and a collection of 19 diverse strains representative of the main global subdivisions currently recognized (Fig. 1) (Falush et al., 2003). From the 21 strains studied, evidence of variation in the length of the repeats was obtained for 30 genes (Table 3). When they became available after the end of the investigation of the 46 candidates, two strains of the hpAfrica2 population were added to the study to address these 30 genes. Among these, 27 were candidates previously proposed and three were additional candidates identified through comparative analysis (HP0642, HP0767 and jhp0540). For seven of the candidate genes in strain 26695 and 14 in strain J99, the length of the repeat observed in this study is different from the length of the repeat in the published sequences (Table 3), showing microevolution over a short time scale without being under selective pressure from the host. The polymorphisms observed in some sequences downstream of the repeat are suggestive of repeat instability in vivo, indicating that other compensatory mutations may occasionally revert the varied phenotypes.

Mixed populations in plate cultures
In many instances variation in the length of the repeat sequences could be seen in the direct sequencing traces from the genomic DNA preparations. Two possible contributory factors could account for this: (1) variation in the genomic DNA being used as a template, providing evidence of relatively high frequency changes in repeat length, and/or (2) variation in the repeat length generated by polymerase slippage during the PCR and/or sequencing reactions. Given the number of cycles involved, PCR has the greatest potential to artefactually generate such variation. This has been described by Jennings et al. (1995) and some stability has been found in shorter repeats during PCR (Wassenaar et al., 2002). Increased variability between genomic extractions and similar plasmid templates has been used to suggest in vivo variation for one gene with a particularly long repeat (Rocha et al., 2002). Variability seen during amplification tends to increase with repeat length, so to determine the relative contributions of in vivo and artefactual length variation, a number of repeat regions representing the different repeat lengths and compositions being studied were cloned, and the diluted plasmids were used as templates for direct and PCR amplified sequencing. These plasmids are likely to be homogeneous because they have been minimally passaged prior to use as PCR templates, and the cloned regions do not contain complete genes, and therefore are not under any functional selective pressure to diversify. The results of diversity generation from the plasmid templates were then compared with the genomic DNA prepared from H. pylori cultures. Representative results are shown in Fig. 2.



Fig. 2. Electropherograms of the sequencing reaction of H. pylori strains performed (a) directly from the plasmid with the cloned PCR product, (b) from the PCR product obtained from the plasmid as template and (c) from the PCR product obtained from the genomic DNA as template. Variations in the length of the homopolymeric tract observed for HP0619/JP96-9 (c) are attributable to repeats of different length in the template and not to a PCR artefact.

 
Results for the dinucleotide repeats were the same between plasmid and genomic templates whatever the length of the repeat, suggesting that all observed variation in repeat lengths within genomic extractions originates in in vivo variation. In contrast, mixed profiles were obtained from cloned templates for homopolymeric tracts longer than 11 nt (Fig. 2). However, for repeats longer than 11 nt, the sequence of the PCR products from the plasmid is more diverse than the sequences obtained directly from the genomic DNA, while the repeat and the region downstream from the repeat are still readable, although mixed. Thus, those repeats below 11 bp showing variation in the genomic DNA template sequences probably reflect in vivo variation and these are indicated in Table 3. The threshold at which variation is generated in PCR repeat amplification can vary with the template, buffer and enzyme used (N. J. Saunders, unpublished data) and the results described here should not be extrapolated without specifically addressing the length thresholds at which length variation occurs in a different system.

In one case (HP1353-4, a CDS encoding a methyltransferase) the combined effects of a problematic sequence context and a relatively long repeat prevented interpretable repeat-length determinations with extension into the flanking sequence in eight of the 15 strains with this gene. In this instance PCR products of these regions were cloned and sequencing was performed directly on the plasmid from two or, if different results were obtained, three clones. The results shown in Table 4 suggest significant variations in the template (in addition to that generated by amplification) and clearly show that repeat lengths cannot be reliably determined in this way.


Table 4. Evidence obtained from cloning experiments of variations in the length of the poly-C repeat located 5' of the HP1353-4 CDS

PCR products corresponding to the repeat-containing region were cloned and two to three of the clones obtained were sequenced. Clones with different repeat lengths were obtained from the same strain, suggesting that the length of the repeat varies within the template.

 
Differences in which genes are variable between strains
Comparative genome analyses have revealed some features of phase-variable genes in addition to ON-OFF switching that are likely to contribute to differences in behaviour between strains (Saunders et al., 2000; Snyder et al., 2001). The presence of the repeats associated with gene switching sometimes also varies between strains, due to the absence of the repeat, shortening of the repeat to a less unstable length or stabilization of the repeat by mutations within the repeated sequence. The repertoire of phase-variable genes therefore differs between strains, even when the gene complements are similar; strains in which potential phase-variable genes do not contain putatively unstable repeat tracts are shown as hatched boxes in Table 5.


Table 5. Presence and absence of the 30 putative phase-variable genes among the 23 strains studied

Black box, gene present and harbouring the repeat; hatched box, gene present, but repeat absent or stabilized (i.e. repeat interrupted by one or more bases or a homopolymeric tract <7 bp or dinucleotide tracts with <5 repeats); white box, gene absent.

 
Repeat presence and absence.
In 16 of 18 strains the gene encoding the hypothetical protein HP1433 is potentially phase-variable, having a long C repeat; the repeat is absent in two strains of the hpEurope population (26695 and 111UK) (Table 5). This gene also displays other divergence, which is a fairly common feature of the phase-variable gene repertoire. Comparison of the published sequences, between which the genes have about 84 % identity, with the sequences obtained in this study shows that many recombination events have occurred within this gene, generating a mosaic structure affecting a region of 825 bp (in strain 26695) to 612 bp (in strain J99).

Repeat shortening.
HP1471, a gene encoding a type II restriction-modification enzyme, is an example of a gene in which severe repeat shortening occurs. The length of its poly(G) tract normally ranges between 10 and 14 bp, but in strain L72 the number of Gs is reduced to two. Less extreme shortening is also seen in other genes. For example, a lower limit of 7 bp, a length that is probably relatively stable, has been used to indicate the presence of a potentially variable repeat in Table 3, whereas shorter repeats such as C6, seen in HP0379 in strain L67 together with an additional T that returns the gene in-frame, are considered to be stable (Table 5).
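
The remark above that an additional T returns HP0379 to the correct frame can be illustrated with a minimal Python sketch. It assumes, purely for illustration, a homopolymeric tract lying within the coding sequence, where a change in tract length that is not a multiple of 3 bp shifts the downstream reading frame and is treated as switching the gene OFF. The function names and the in-frame length of 9 bp are hypothetical and are not taken from this study.

```python
def retains_reading_frame(tract_bp: int, in_frame_bp: int) -> bool:
    """True when the tract differs from the in-frame length by a multiple of 3 bp."""
    return (tract_bp - in_frame_bp) % 3 == 0


def expression_state(tract_bp: int, in_frame_bp: int) -> str:
    """Assumed rule: a frameshift switches expression OFF, otherwise the gene is ON."""
    return "ON" if retains_reading_frame(tract_bp, in_frame_bp) else "OFF"


if __name__ == "__main__":
    IN_FRAME_BP = 9  # hypothetical tract length at which the gene is in-frame
    for tract_bp in range(6, 13):
        print(f"tract of {tract_bp} bp: {expression_state(tract_bp, IN_FRAME_BP)}")
```

This simple rule would not apply to repeats located 5' of a CDS, such as the poly-C tract of HP1353-4 (Table 4), where any effect on expression would not act through the reading frame.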

Repeat stabilization.
Stabilization is observed in more than one gene with long C repeats (jhp0820, HP0684-5) and also within dinucleotide repeats (HP0638), as shown by Yamaoka et al. (2000). For fliP (HP0684-5), three kinds of repeats are found: C9, C8 and ‘C8 stabilized’ (CCCCACCC in 14 strains or CCCCTCCC in strain 111UK), so this gene is unlikely to be phase-variable in all strains. FliP is required for flagella expression and is essential for colonization (Josenhans et al., 2000). Understanding the adaptive advantage of phase variation of this phenotype, for those strains that are able to switch between flagellate and non-flagellate states, will require identification of the niche in which the phase OFF state confers a fitness advantage.
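
The criteria given in the legend of Table 5 (a homopolymeric tract of at least 7 bp, or a dinucleotide tract of at least five repeats, with interrupted tracts treated as stabilized) can be expressed as a short Python sketch. This is an illustration only, not the procedure used in this study; the function names are hypothetical, and the input is assumed to be the sequence of the repeat region alone rather than a whole gene.

```python
import re

HOMOPOLYMER_MIN_BP = 7      # threshold from the Table 5 legend
DINUCLEOTIDE_MIN_UNITS = 5  # threshold from the Table 5 legend


def longest_homopolymer(seq: str) -> int:
    """Length in bp of the longest uninterrupted run of a single base."""
    runs = re.finditer(r"(.)\1*", seq.upper())
    return max((len(m.group(0)) for m in runs), default=0)


def longest_dinucleotide(seq: str) -> int:
    """Largest number of tandem copies of a two-base unit such as CT."""
    seq = seq.upper()
    best = 0
    for start in range(len(seq) - 1):
        unit = seq[start:start + 2]
        if unit[0] == unit[1]:
            continue  # a unit such as CC belongs to a homopolymer, not a dinucleotide tract
        copies, pos = 0, start
        while seq[pos:pos + 2] == unit:
            copies += 1
            pos += 2
        best = max(best, copies)
    return best


def classify_tract(seq: str) -> str:
    """Apply the two length thresholds; an interruption splits a run and shortens it."""
    if (longest_homopolymer(seq) >= HOMOPOLYMER_MIN_BP
            or longest_dinucleotide(seq) >= DINUCLEOTIDE_MIN_UNITS):
        return "potentially variable"
    return "absent or stabilized"


if __name__ == "__main__":
    for tract in ("CCCCCCCCC", "CCCCCCCC", "CCCCACCC", "CCCCTCCC", "CCCCCCT"):
        print(tract, "->", classify_tract(tract))
```

Under these thresholds the C9 and C8 forms of the fliP tract are classed as potentially variable, whereas the interrupted forms CCCCACCC and CCCCTCCC, and the C6 tract followed by a T, fall below the homopolymer limit.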

The relative contributions of repeat length and repeat context to instability have yet to be determined. In other words, it is not currently clear to what extent the presence of a repeat reflects a region prone to slippage, or whether the repeat is the sole source of local variation. It is therefore not currently possible to extrapolate simply from repeat length to frequency of gene switching. This has been addressed using a model with tetrameric repeats in Hae. influenzae (De Bolle et al., 2000), but a similar analysis is not currently available for homopolymeric tracts. The shorter repeats have been shown to be associated with phase variation (Josenhans et al., 2000; Tannaes et al., 2001). However, it is likely that the length of these repeats influences the rate of phase variation. Some genes seem to have a fairly restricted repeat-length range, such as C8 to C9 for HP0684-5 and G7 to G9 for HP0499, whereas other genes vary substantially in repeat length between strains (Table 3); this is probably associated with different switching rates in different strains over time.

Gene complement differences between the strains
The second major source of variation between strains is differences in gene complements. Comparison of two N. meningitidis genome sequences (strains MC58 and Z2491) showed a greater proportion of gene-complement differences in phase-variable genes than in other genes (Saunders et al., 2000). This was more evident when comparing meningococcal and gonococcal sequences (Snyder et al., 2001). This may in part reflect the fact that phase-variable genes cannot be essential under all conditions, or else organisms would not be viable when the genes were switched OFF. It probably also reflects a role in conferring strain-specific behavioural characteristics. The presence of the phase-variable genes in the strains investigated is shown in Tables 3 and 5.

Among the 30 genes that show variation in the length of the repeat, eight (HP0651, HP0379, HP0093-4, HP0217, HP0638, HP0143, HP0499 and HP0464) are present in all 23 strains under study, whereas 14 of the 17 genes that show no repeat-length variation are present in the 21 strains in which they have been investigated (Table 3). This is the clearest evidence to date that this subset of genes is particularly associated with inter-strain differences. The genes showing gene-complement differences are a subset consistent with those described as strain-specific by Björkholm et al. (2002), as assessed using microarrays.

Phase-variable genes and population structure
The various forms of inter-strain differences in the genes were compared with the population-structure relatedness of the strains as assessed by MLST. There were no consistent associations between the population structure and the distribution of any particular allelic marker, repeat sequence stabilization or gene complement features. This probably reflects a much higher degree of recombination in these genes, which are under diversifying selection, than in the housekeeping genes, which are supposed to be under neutral (functional) selection. Whatever the functional class of the proteins encoded by the putative phase-variable genes, no correlation was observed between the geographic population associations of the strains and the presence or absence of the genes, the presence of the repeat, or the variation in length of the repeat.

The findings described in this paper represent a new and significantly more robust foundation for the experimental pursuit of the phase-variable genes of H. pylori. The comparative analysis improved the starting list for analysis, and the sequencing has identified a set of 30 genes that can now be considered to be phase-variable with a reasonably high degree of certainty. It may be that other strains harbour longer and potentially unstable repeats in the remaining 16 genes, but on the basis of the available evidence it is likely that these are not phase-variable. The additional levels of inter-strain variability highlight the combined effects of a variable subset of genes, which has been suggested from other genome analyses, but not previously extended in this way. This has to be understood in the context of the particular abilities of H. pylori to generate diversity by other means, particularly to undergo inter-strain recombination (Go et al., 1996; Salaün et al., 1998; Suerbaum et al., 1998; Falush et al., 2001). Awareness and consideration of this complexity will be necessary for the proper understanding of this organism and, with particular reference to its phase-variable genes, its interactions with the host.


   ACKNOWLEDGEMENTS
 
A Wellcome Trust Advanced Research Fellowship awarded to N. J. S. supports N. J. S. and L. S.


   REFERENCES
 
Achtman, M., Azuma, T., Berg, D. E. & 7 other authors (1999). Recombination and clonal groupings within Helicobacter pylori from different geographical regions. Mol Microbiol 32, 459–470.

Akopyants, N. S., Clifton, S. W., Kersulyte, D. & 7 other authors (1998). Analyses of the cag pathogenicity island of Helicobacter pylori. Mol Microbiol 28, 37–53.

Alm, R. A., Ling, L. S., Moir, D. T. & 20 other authors (1999). Genomic-sequence comparison of two unrelated isolates of the human gastric pathogen Helicobacter pylori. Nature 397, 176–180.

Appelmelk, B. J., Shiberu, B., Trinks, C. & 10 other authors (1998). Phase variation in Helicobacter pylori lipopolysaccharide. Infect Immun 66, 70–76.

Appelmelk, B. J., Martin, S. L., Monteiro, M. A. & 10 other authors (1999). Phase variation in Helicobacter pylori lipopolysaccharide due to changes in the lengths of poly(C) tracts in alpha3-fucosyltransferase genes. Infect Immun 67, 5361–5366.

Appelmelk, B. J., Martino, M. C., Veenhof, E. & 7 other authors (2000). Phase variation in H type I and Lewis a epitopes of Helicobacter pylori lipopolysaccharide. Infect Immun 68, 5928–5932.

Atherton, J. C., Peek, R. M., Jr, Tham, K. T., Cover, T. L. & Blaser, M. J. (1997). Clinical and pathological importance of heterogeneity in vacA, the vacuolating cytotoxin gene of Helicobacter pylori. Gastroenterology 112, 92–99.

Björkholm, B. M., Guruge, J. L., Oh, J. D. & 8 other authors (2002). Colonization of germ-free transgenic mice with genotyped Helicobacter pylori strains from a case-control study of gastric cancer reveals a correlation between host responses and HsdS components of type I restriction-modification systems. J Biol Chem 277, 34191–34197.

Blaser, M. J. (1997). Ecology of Helicobacter pylori in the human stomach. J Clin Invest 100, 759–762.

Boneca, I. G., de Reuse, H., Epinat, J. C., Pupin, M., Labigne, A. & Moszer, I. (2003). A revised annotation and comparative analysis of Helicobacter pylori genomes. Nucleic Acids Res 31, 1704–1714.

Bonfield, J. K., Beal, K. F., Betts, M. J. & Staden, R. (2002). Trev: a DNA trace editor and viewer. Bioinformatics 18, 194–195.

Bunn, J. E., MacKay, W. G., Thomas, J. E., Reid, D. C. & Weaver, L. T. (2002). Detection of Helicobacter pylori DNA in drinking water biofilms: implications for transmission in early life. Lett Appl Microbiol 34, 450–454.

De Bolle, X., Bayliss, C. D., Field, D., van de Ven, T., Saunders, N. J., Hood, D. W. & Moxon, E. R. (2000). The length of a tetranucleotide repeat tract in Haemophilus influenzae determines the phase variation rate of a gene with homology to type III DNA methyltransferases. Mol Microbiol 35, 211–222.

de Vries, N., Duinsbergen, D., Kuipers, E. J., Pot, R. G., Wiesenekker, P., Penn, C. W., Van Vliet, A. H., Vandenbroucke-Grauls, C. M. & Kusters, J. G. (2002). Transcriptional phase variation of a type III restriction-modification system in Helicobacter pylori. J Bacteriol 184, 6615–6623.

Durbin, R. & Thierry-Mieg, J. T. (1991). A C. elegans DataBase. Documentation, code and data available from http://www.acedb.org

Falush, D., Kraft, C., Taylor, N. S., Correa, P., Fox, J. G., Achtman, M. & Suerbaum, S. (2001). Recombination and mutation during long-term gastric colonization by Helicobacter pylori: estimates of clock rates, recombination size, and minimal age. Proc Natl Acad Sci U S A 98, 15056–15061.

Falush, D., Wirth, T., Linz, B. & 15 other authors (2003). Traces of human migrations in Helicobacter pylori populations. Science 299, 1582–1585.

Go, M. F., Kapur, V., Graham, D. Y. & Musser, J. M. (1996). Population genetic analysis of Helicobacter pylori by multilocus enzyme electrophoresis: extensive allelic diversity and recombinational population structure. J Bacteriol 178, 3934–3938.

Hammerschmidt, S., Muller, A., Sillmann, H. & 7 other authors (1996). Capsule phase variation in Neisseria meningitidis serogroup B by slipped-strand mispairing in the polysialyltransferase gene (siaD): correlation with bacterial invasion and the outbreak of meningococcal disease. Mol Microbiol 20, 1211–1220.

Hood, D. W., Deadman, M. E., Jennings, M. P., Bisercic, M., Fleischmann, R. D., Venter, J. C. & Moxon, E. R. (1996). DNA repeats identify novel virulence genes in Haemophilus influenzae. Proc Natl Acad Sci U S A 93, 11121–11125.

Ilver, D., Arnqvist, A., Ogren, J. & 7 other authors (1998). Helicobacter pylori adhesin binding fucosylated histo-blood group antigens revealed by retagging. Science 279, 373–377.

Jennings, M. P., Hood, D. W., Peak, I. R., Virji, M. & Moxon, E. R. (1995). Molecular analysis of a locus for the biosynthesis and phase-variable expression of the lacto-N-neotetraose terminal lipopolysaccharide structure in Neisseria meningitidis. Mol Microbiol 18, 729–740.

Josenhans, C., Eaton, K. A., Thevenot, T. & Suerbaum, S. (2000). Switching of flagellar motility in Helicobacter pylori by reversible length variation of a short homopolymeric sequence repeat in fliP, a gene encoding a basal body protein. Infect Immun 68, 4598–4603.

Lee, A., O'Rourke, J., De Ungria, M. C., Robertson, B., Daskalopoulos, G. & Dixon, M. F. (1997). A standardized mouse model of Helicobacter pylori infection: introducing the Sydney strain. Gastroenterology 112, 1386–1397.

Logan, R. P. & Berg, D. E. (1996). Genetic diversity of Helicobacter pylori. Lancet 348, 1462–1463.

Logan, S. M., Conlan, J. W., Monteiro, M. A., Wakarchuk, W. W. & Altman, E. (2000). Functional genomics of Helicobacter pylori: identification of a beta-1,4 galactosyltransferase and generation of mutants with altered lipopolysaccharide. Mol Microbiol 35, 1156–1167.

Mahdavi, J., Sonden, B., Hurtig, M. & 20 other authors (2002). Helicobacter pylori SabA adhesin in persistent infection and chronic inflammation. Science 297, 573–578.

Majmudar, P., Shah, S. M., Dhunjibhoy, K. R. & Desai, H. G. (1990). Isolation of Helicobacter pylori from dental plaques in healthy volunteers. Indian J Gastroenterol 9, 271–272.

Marshall, B. (1994). Helicobacter pylori. Am J Gastroenterol 89, S116–S118.

Marshall, B. J. & Warren, J. R. (1984). Unidentified curved bacilli in the stomach of patients with gastritis and peptic ulceration. Lancet 1, 1311–1315.

Martin, P., Van De Ven, T., Mouchel, N., Jeffries, A. C., Hood, D. W. & Moxon, E. R. (2003). Experimentally revised repertoire of putative contingency loci in Neisseria meningitidis strain MC58: evidence for a novel mechanism of phase variation. Mol Microbiol 50, 245–257.

Nomura, A., Stemmermann, G. N., Chyou, P. H., Perez-Perez, G. I. & Blaser, M. J. (1994). Helicobacter pylori infection and the risk for duodenal and gastric ulceration. Ann Intern Med 120, 977–981.

Park, S. R., Mackay, W. G. & Reid, D. C. (2001). Helicobacter sp. recovered from drinking water biofilm sampled from a water distribution system. Water Res 35, 1624–1626.

Parkhill, J., Wren, B. W., Mungall, K. & 18 other authors (2000). The genome sequence of the food-borne pathogen Campylobacter jejuni reveals hypervariable sequences. Nature 403, 665–668.

Parsonnet, J. (1995). The incidence of Helicobacter pylori infection. Aliment Pharmacol Ther 9, 45–51.

Parsonnet, J., Hansen, S., Rodriguez, L., Gelb, A. B., Warnke, R. A., Jellum, E., Orentreich, N., Vogelman, J. H. & Friedman, G. D. (1994). Helicobacter pylori infection and gastric lymphoma. N Engl J Med 330, 1267–1271.

Peck, B., Ortkamp, M., Diehl, K. D., Hundt, E. & Knapp, B. (1999). Conservation, localization and expression of HopZ, a protein involved in adhesion of Helicobacter pylori. Nucleic Acids Res 27, 3325–3333.

Rocha, E. P., Pradillon, O., Bui, H., Sayada, C. & Denamur, E. (2002). A new family of highly variable proteins in the Chlamydophila pneumoniae genome. Nucleic Acids Res 30, 4351–4360.

Salaün, L., Audibert, C., Le Lay, G., Burucoa, C., Fauchere, J. L. & Picard, B. (1998). Panmictic structure of Helicobacter pylori demonstrated by the comparative study of six genetic markers. FEMS Microbiol Lett 161, 231–239.

Salaün, L., Snyder, L. A. & Saunders, N. J. (2003). Adaptation by phase variation in pathogenic bacteria. Adv Appl Microbiol 52, 263–301.

Saunders, N. J. (1999). Bacterial phase variation associated with repetitive DNA. PhD thesis, The Open University.

Saunders, N. J. (2003). Evasion of antibody responses: bacterial phase variation. In Bacterial Evasion of Host Immune Responses, pp. 103–124. Edited by B. Henderson & P. C. F. Oyston. Cambridge: Cambridge University Press.

Saunders, N. J., Peden, J. F., Hood, D. W. & Moxon, E. R. (1998). Simple sequence repeats in the Helicobacter pylori genome. Mol Microbiol 27, 1091–1098.

Saunders, N. J., Jeffries, A. C., Peden, J. F., Hood, D. W., Tettelin, H., Rappuoli, R. & Moxon, E. R. (2000). Repeat-associated phase variable genes in the complete genome sequence of Neisseria meningitidis strain MC58. Mol Microbiol 37, 207–215.

Snyder, L. A., Butcher, S. A. & Saunders, N. J. (2001). Comparative whole-genome analyses reveal over 100 putative phase-variable genes in the pathogenic Neisseria spp. Microbiology 147, 2321–2332.

Suerbaum, S., Smith, J. M., Bapumia, K., Morelli, G., Smith, N. H., Kunstmann, E., Dyrek, I. & Achtman, M. (1998). Free recombination within Helicobacter pylori. Proc Natl Acad Sci U S A 95, 12619–12624.

Tannaes, T., Dekker, N., Bukholm, G., Bijlsma, J. J. & Appelmelk, B. J. (2001). Phase variation in the Helicobacter pylori phospholipase A gene and its role in acid adaptation. Infect Immun 69, 7334–7340.

Tettelin, H., Saunders, N. J., Heidelberg, J. & 39 other authors (2000). Complete genome sequence of Neisseria meningitidis serogroup B strain MC58. Science 287, 1809–1815.

Tomb, J. F., White, O., Kerlavage, A. R. & 39 other authors (1997). The complete genome sequence of the gastric pathogen Helicobacter pylori. Nature 388, 539–547.

van Ham, S. M., van Alphen, L., Mooi, F. R. & van Putten, J. P. (1993). Phase variation of Haemophilus influenzae fimbriae: transcriptional control of two divergent genes through a variable combined promoter region. Cell 73, 1187–1196.

Wang, G., Rasko, D. A., Sherburne, R. & Taylor, D. E. (1999). Molecular genetic basis for the variable expression of Lewis Y antigen in Helicobacter pylori: analysis of the alpha (1,2) fucosyltransferase gene. Mol Microbiol 31, 1265–1274.

Wassenaar, T. M., Wagenaar, J. A., Rigter, A., Fearnley, C., Newell, D. G. & Duim, B. (2002). Homonucleotide stretches in chromosomal DNA of Campylobacter jejuni display high frequency polymorphism as detected by direct PCR analysis. FEMS Microbiol Lett 212, 77–85.

Yamaoka, Y., Kwon, D. H. & Graham, D. Y. (2000). A M(r) 34,000 proinflammatory outer membrane protein (oipA) of Helicobacter pylori. Proc Natl Acad Sci U S A 97, 7533–7538.

Yamaoka, Y., Kikuchi, S., el-Zimaity, H. M., Gutierrez, O., Osato, M. S. & Graham, D. Y. (2002a). Importance of Helicobacter pylori oipA in clinical presentation, gastric inflammation, and mucosal interleukin 8 production. Gastroenterology 123, 414–424.

Yamaoka, Y., Kita, M., Kodama, T., Imamura, S., Ohno, T., Sawai, N., Ishimaru, A., Imanishi, J. & Graham, D. Y. (2002b). Helicobacter pylori infection in mice: role of outer membrane proteins in colonization and inflammation. Gastroenterology 123, 1992–2004.

Yogev, D., Rosengarten, R., Watson-McKown, R. & Wise, K. S. (1991). Molecular basis of Mycoplasma surface antigenic variation: a novel set of divergent genes undergo spontaneous mutation of periodic coding regions and 5' regulatory sequences. EMBO J 10, 4069–4079.

Received 17 December 2003; accepted 17 December 2003.



Copyright © 2004 Society for General Microbiology.