Genetic variation of dTDP-L-rhamnose pathway genes in Salmonella enterica

Qun Li1 and Peter R. Reeves1

Department of Microbiology (G08), University of Sydney, NSW 2006, Australia1

Author for correspondence: Peter R. Reeves. Tel: +612 9351 2536. Fax: +612 9351 4571. e-mail: reeves{at}angis.usyd.edu.au


   ABSTRACT
 
The genetic variation in the dTDP-L-rhamnose pathway gene set (rmlB, rmlD, rmlA, rmlC) in Salmonella enterica was examined after sequencing the four genes from 11 rml-containing gene clusters encoding seven O antigens, and a 903 bp rmlB segment from another 23 strains representing the seven subspecies. There was considerable sequence variation and strong polarity in the nature and level of variation among rml genes. The 5' end of the rml gene set, including rmlB, rmlD and most of rmlA, is in general subspecies specific. In contrast, the 3' end, including part of rmlA and all of rmlC, is O antigen specific. The G+C content of the 3' end is lower than that of the 5' end. The variation in the 3' end of the gene set is much greater than that of the 5' end. It is apparent that the rml gene set of S. enterica includes genes with two different evolutionary histories. In addition, there has been extensive recombination in the gene set, probably related to O antigen transfer between subspecies. These findings provide evidence for the lateral transfer of O antigen genes between species and among subspecies of S. enterica. The results have also shown that conserved genes at the end of an O antigen gene cluster play a major role in mediating exchange of the central serogroup-specific regions.

Keywords: rml genes, rhamnose pathway, Salmonella enterica, lateral gene transfer

Abbreviations: CPS, capsular polysaccharide

The GenBank accession numbers for the sequences reported in this paper are AF279615–AF279625 for the rml gene sets and AF279626–AF279648 for the rmlB gene fragments.


   INTRODUCTION
 
The O antigen is a major surface component contributing to the variation of the Gram-negative bacterial cell wall. It is the exposed part of the lipopolysaccharide that replaces phospholipid in the outer layer of the outer membrane. Structurally, O antigen consists of repeated oligosaccharide units which typically comprise 3–6 monosaccharide residues. The variation in O antigen lies in the sugar composition and the linkages between monosaccharides. This variation is used as one of the criteria in the Kauffmann–White scheme for serotyping Salmonella enterica and Escherichia coli. Forty-six types of O antigen are recognized in S. enterica and 180 types in E. coli (including Shigella strains), among which only three are common to the two species, indicating that there have been extensive changes in O antigen since the two species diverged (Reeves, 1997). The S. enterica O antigens were first described by types (A, B, C, etc.), each with a combination of serological specificities (1, 2, etc.). Popoff & Le Minor (1992) proposed use of only the dominant epitope as the name of the O antigen, and we have done this except for those (A, B, C1, C2, D1, D2, D3, E1 and E4) very well known by their alphabetical names.

The genes involved in the biosynthesis of O antigen are clustered at 45 min on the chromosome in S. enterica, flanked by the galF and gnd genes at the 5' and 3' ends, respectively. The G+C content of O antigen gene clusters is generally atypical, suggesting that they have transferred from other species relatively recently (Reeves, 1993 ). The 2435 serovars of S. enterica are classified into seven subspecies (I, II, IIIa, IIIb, IV, V, VI) based on their biochemical characteristics (Popoff & Le Minor, 1997 ). This classification has been confirmed by multilocus enzyme electrophoresis and nucleotide sequencing of several housekeeping genes: gapA (glyceraldehyde-3-phosphate dehydrogenase), mdh (malate dehydrogenase) and putP (proline permease) (Boyd et al., 1994 ; Nelson & Selander, 1992 ; Nelson et al., 1991 ). Both methods have given discrete groups for each subspecies, suggesting very little recombination between subspecies. However, O antigen genes appear to have transferred among subspecies, as the majority of S. enterica O antigens are found in at least two subspecies with a mean of 3·5 subspecies per O antigen (Popoff & Le Minor, 1997 ; Reeves, 1995 ). The relatively higher level of transfer of O antigen gene clusters within or between species is believed to reflect the advantage from time to time of an alternative O antigen for bacterial adaptation to new niches, followed by natural selection for recombinants (Reeves, 1997 ).

Although research in S. enterica, E. coli and other species makes it evident that O antigen gene clusters have been subject to gene transfer between species (Comstock et al., 1995 ; Jiang et al., 1991 ; Stevenson et al., 1994 ), there are few studies on the origin of the O antigen gene clusters and the extent of their transfer between species, due to the limited availability of O antigen and other polysaccharide gene cluster sequences.

The genes in O antigen gene clusters generally fall into three classes: pathway genes for the biosynthesis of nucleotide sugars; transferase genes, mostly glycosyl transferase genes, for the synthesis and modification of the O unit; and processing genes, such as wzx and wzy, for the polymerization and transport of O units (Reeves et al., 1996). Compared with transferase genes and processing genes, which are very heterogeneous among different O antigens due to the wide range of linkages involved, pathway genes are generally homologous throughout all species. Thus, pathway genes are good candidates for studying the relationships and lateral gene transfer of O antigen gene clusters.

L-Rhamnose (rhamnose) is a 6-deoxyhexose sugar which is widely distributed in O antigens of Gram-negative bacteria and is also commonly present in the capsular polysaccharides (CPSs) of Gram-positive bacteria. dTDP-L-rhamnose is the activated precursor of the rhamnose moiety in O antigens and CPSs. rmlA, rmlB, rmlC and rmlD encode (in biosynthetic pathway order) glucose-1-phosphate thymidylyl-transferase, dTDP-D-glucose-4,6-dehydratase, dTDP-6-deoxy-D-glucose-3,5-epimerase and dTDP-6-deoxy-L-mannose dehydrogenase, respectively, which are responsible for the four-step biosynthesis of dTDP-L-rhamnose from glucose 1-phosphate. Some of the sugars commonly found in O antigens and CPSs are also involved in general metabolism, and the biosynthetic pathway genes for UDP-glucose and UDP-galactose, for example, are generally on the chromosome outside of the O antigen or CPS gene clusters. However, rhamnose is commonly present in bacteria only as a component of a surface polysaccharide and the four rml genes are generally arranged as a separate group within the O antigen or CPS gene cluster. The four rml genes have been identified in a range of species and are clearly homologous, although the gene order may vary from species to species (DeShazer et al., 1998 ; Guidolin et al., 1994 ; Koplin et al., 1993 ; Mitchison et al., 1997 ; Tsukioka et al., 1997 ). Therefore rml genes could be useful in studying the relationships of O antigen and other gene clusters.

Before investigating rml genes from a broad range of species, we first inspected the variation within a species and in this report focus on genetic variation in the four rml genes in S. enterica. Lüderitz et al. (1966) determined the major polysaccharide components of the then 37 known LPS forms in S. enterica and found that 12 have rhamnose in their O antigens. Among the nine O antigens which were found after Lüderitz’s study, three (E4, D3 and O62) were found to include rhamnose by DNA hybridization (Xiang et al., 1993 ) or determination of their structure (Vinogradov et al., 1994 ). O67 is a variant of B (Lei & Reeves, unpublished), O61 has no rhamnose by structural study (Vinogradov et al., 1992 ) and nothing is known about the sugar constituents of O60, O63, O65 and O66.

Of the eight previously studied and closely related O antigens of S. enterica (A, B, C2, D1, D2, D3, E1 and E4) the rml gene set of O antigen B has been fully sequenced (Jiang et al., 1991) and that of O antigen E1 nearly fully sequenced (Wang et al., 1992). It was found that the rml genes for both O antigens are arranged in the order rmlB, rmlD, rmlA and rmlC at the 5' end of the O cluster. rmlB, rmlD and most of rmlA for O antigen E1 are very similar to those of O antigen B. However, similarity of rml genes of E1 with those of B falls sharply from near the end of rmlA to the end of rmlC (Wang et al., 1992). The four rml genes for O antigens A, C2, D1, D2 and D3 are almost identical to those of O antigen B, based on restriction mapping (Brown et al., 1992; Liu et al., 1991; Reeves, 1993; Xiang, 1995). Likewise, the gene clusters for E1 and E4 have the same hybridization patterns, even for glycosyl transferase genes (Xiang et al., 1993). O antigens E2 and E3 have already been incorporated into E1 (Popoff & Le Minor, 1997) as they have the same chromosomal gene cluster with the differences known to be due to genes on converting phages, and the difference between E1 and E4 is proposed to be also due to the presence of a gene(s) on a converting phage in E4, although the phage has not been observed (Xiang et al., 1993). For other rhamnose-containing O antigens (O11, O28, O42, O53, O57 and O59), a gradient of divergence was observed when probes of the four rml genes of O antigen B strain LT2 were used for hybridization (Xiang et al., 1993). The rmlC genes of O11, O28, O42, O53, O57 and O59 strains, like that of E1, all failed to hybridize. For some of these O antigens, rmlA showed only weak hybridization, whereas all were strongly positive for rmlB and rmlD.

In this study, we sequenced the rml genes from a representative strain for each O antigen containing rhamnose, including an O62 strain, other than for those already sequenced or, as discussed above, known to be almost identical to those of the B or E1 O antigen gene clusters. We also sequenced the 5' end of the O antigen gene clusters of O60, O63, O65 and O66 to see if they contain rml genes. A further analysis of sequence and phylogeny was performed based on rml genes from all rhamnose-containing O antigens of S. enterica.


   METHODS
 
Bacterial strains and phage.
The Salmonella strains (Table 1) were kindly provided by C. Murray from the Institute of Medical and Veterinary Science, Adelaide, Australia and M. Y. Popoff from the Pasteur Institute, Paris, France. The strains are referred to in the text by our laboratory number followed by the O antigen and the subspecies in parentheses. rmlC mutant strain P5435 was constructed by our laboratory (data not shown). Phage Ffm was kindly provided by C. Whitfield from the Department of Microbiology, University of Guelph, Canada.


Table 1. S. enterica strains

 
Molecular methods.
Chromosomal DNA of each strain was prepared as described previously (Bastin & Reeves, 1995 ). Primers used for PCR and sequencing are listed in Table 2. For the strains known to have rhamnose in their O antigens, rmlB, rmlD and the 5' half of rmlA were amplified by using primers designed previously for sequencing the corresponding genes of P9003 (LT2) (Jiang et al., 1991 ): most of rmlB was amplified and sequenced for all strains other than M1952 (62,IIIa) using primer pair 63/39. To amplify and sequence the 3' end of rmlB and rmlD, primer pair 53/61 was used for M287 (42,II), M303 (53,II) and M293 (57,II); two primer pairs, 62/40 and 62/61, were used for M285 (59,II), and 53/61 for M324 (11,VI). To amplify and sequence the 5' half of rmlA, primer pair 52/72 was used for M285 (59,II) and M293 (57,II), 187/72 for M287 (42,II) and M303 (53,II), and 53/72 for M324 (11,VI). To obtain the sequence of rmlC and the 3' end of rmlA from those strains which showed considerable divergence from LT2, the segment from the middle of rmlA to the beginning of gnd was first amplified by long range PCR using primers 719 and 575. Primer 719 is located in the middle of rmlA and primer 575 is located at the 5' end of gnd. Using these PCR products as templates, the sequences from the middle of rmlA to the end of rmlC were obtained by primer walking.


Table 2. Oligonucleotide primers used for PCR and sequencing

 
The above primers could not amplify rml genes from M1952 (62,IIIa). To obtain the rml gene sequences from this strain, the O antigen gene cluster was first amplified by long range PCR using primers 412 and 482, located at the JUMPstart sequence and gnd gene, respectively. The JUMPstart sequence is a 39 bp conserved element present in the upstream region of polysaccharide gene clusters of many species (Hobbs & Reeves, 1994 ). Using the PCR product as template, sequencing from the JUMPstart primer into the O antigen gene cluster showed that the rml genes of M1952 (62,IIIa) were also present at the 5' end of the O antigen gene cluster: the rml gene sequences were obtained by primer walking.

The same method was used to sequence the 5' end sequences of M1939 (60,V), M1953 (63,IIIa), M330 (65,IIIb) and M322 (66,V) O antigen gene clusters.

Amplification of long range PCR products was carried out by using the expand long-range PCR kit from Boehringer Mannheim according to the manufacturer’s instructions. Sequencing was carried out by the Sydney University and Prince Alfred Macromolecular Analysis Centre (SUPAMAC) using an Applied Biosystems model 377A automated DNA sequencing system and the Applied Biosystems dye terminator cycle sequencing kit.

Cloning of the rmlC gene from M269 (28,I).
No rmlC gene was found immediately downstream of rmlA in M269 (28,I) and two O53 strains. A selection method for cloning rmlC genes was devised by modification of the method described by Clarke & Whitfield (1992). Bacteriophage Ffm (Wilkinson et al., 1972) lyses E. coli strains with rough LPS (Schmidt et al., 1974). P5435 is an E. coli K-12 strain that was constructed so that it contains all the genes necessary for the biosynthesis of K-12 O antigen with the exception of rmlC, which was deleted (data not shown). Plasmid gene banks of M269 (28,I) and the two O53 strains were constructed as follows: the chromosomal DNA was partially digested with Sau3AI. DNA fragments of 2–8 kb were collected from an agarose gel and ligated to BamHI-digested pGEM7zf(+) vector. The ligation mix was then transformed into P5435 by electroporation and the transformants applied to plates pre-seeded with 10⁵ p.f.u. phage Ffm. Bacteria transformed with an rmlC-containing plasmid would make smooth LPS, which would prevent them from being lysed by phage Ffm, and such bacteria were obtained by selection for resistance to the phage. The bacteria were further examined by agglutination with O16 antisera and the plasmids were confirmed by sequencing of the insert.

Computer analysis.
DNA sequence data were assembled and edited using programs from ANGIS (The Australian National Genomic Information Service) at the University of Sydney. Pairwise comparisons and polymorphism analysis of DNA sequence data were conducted using the MULTICOMP package (Reeves et al., 1994 ), which incorporates a number of programs for DNA sequence and phylogeny analysis. Phylogenetic trees were constructed both by the parsimony method using PAUP (version 4.0) and the neighbour-joining method (Saitou & Nei, 1987 ) using PHYLIP (version 3.4, written by J. Felsenstein, Department of Genetics, University of Washington, Seattle, USA). Except for some minor differences, the trees generated have the same topology, and only neighbour-joining trees are presented in this study. Intragenic recombination was detected by the Stephens test (Stephens, 1985 ) and the Maximum chi-squared program (version 1.0, written by B. Spratt and N. Ross, School of Biological Sciences, University of Sussex, UK) (Smith, 1992 ).


   RESULTS
 
Sequencing rml genes
We sequenced the four rml genes from 11 strains with O11, O28, O42, O53, O57, O59 or O62, being the rhamnose-containing O antigens for which we did not have rml sequence data. For O11, O42, O53 and O57, strains were selected from two subspecies to examine any relationship between variation of rml genes and subspecies. In all strains except for M269 (28,I) and the two O53 strains, the four rml genes are clustered in the order rmlB, rmlD, rmlA, rmlC at the 5' end of the O cluster as previously observed for B and closely related O antigens.

There are four O antigens for which we have no information on whether they contain rhamnose (see Introduction). As all rml gene sets found so far in S. enterica are located at the 5' end of the O antigen gene cluster, we sequenced the region from strains M1939 (60,V), M1953 (63,IIIa), M330 (65,IIIb) and M322 (66,V). M1939 (60,V) has rmlB and rmlA at the 5' end of the O antigen gene cluster, but no rmlD or rmlC genes were found downstream of rmlA. As rmlB and rmlA are also involved in the biosynthesis of dTDP-N-acetyl fucosamine (Kuhn et al., 1984 ) and dTDP-N-acetyl viosamine (Reeves et al., 1996 ), we suggest that the two rml genes of M1939 (60,V) are involved in the biosynthesis of a sugar other than rhamnose. The O63, O65 and O66 strains lacked rml genes at this region. We presume that they contain no rhamnose in their O antigens and did not work further with them.

Attempted cloning of rmlC genes from M269 (28,I) and the two O53 strains
In M269 (28,I), M303 (53,II) and M1891 (53,IIIb), no rmlC gene was found immediately downstream of rmlA. We attempted to clone rmlC genes from plasmid gene banks of these strains, using an E. coli K-12 host strain with rmlC deleted and selecting for O antigen synthesis using phage Ffm (see Methods). One plasmid, pPR1996, carrying a 5 kb fragment starting from residue 301 of rmlB was obtained from M269 (28,I). Further sequencing showed that the rmlC gene of M269 (28,I) was in the O antigen gene cluster but separated from the rmlA gene by a complete ORF of 271 aa that has considerable similarity to a putative glycosyl-transferase gene (amsE) in the amylovoran gene cluster of Erwinia amylovora (Koplin et al., 1993 ). There are 28 bp and 25 bp intergenic regions upstream and downstream of this ORF, respectively. No rmlC clones were obtained with DNA from the two O53 strains.

Sequence variation
The new rml sequences include representatives of all rhamnose-containing O antigens other than B and E1, which are already sequenced, and C2, D1, D2, D3 and E4, which are known to be very similar to the O antigen B or E1 genes (see above). For the analysis of variation in rml genes, we included the previously published sequences from P9003 (B,I) and M32 (E1,I). Analysis revealed that all strains except M1952 (62,IIIa) are related in that their 5' end rml genes are very similar. All four rml genes of M1952 (62,IIIa), however, are very divergent from those of the other strains. We first focus on the sequence variation among strains other than M1952 (62,IIIa).

The average G+C contents are 0·43, 0·50, 0·45 and 0·34 mol% for rmlB, rmlD, rmlA and rmlC, respectively and in general lower than that of the S. enterica genome (0·52 mol%) (Ochman & Lawrence, 1996 ). Note that the average G+C content of rmlC is much lower than that of the other three rml genes and that of the third codon base of rmlC is only 0·21 mol% (0·18–0·28 mol%).
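The G+C values above, including the third-codon-position figure for rmlC, can be reproduced from a coding sequence with a few lines of code. The following is a minimal sketch only: the function names are illustrative, the input is assumed to be an in-frame coding sequence, and values are returned as fractions to match the way they are quoted in the text.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C among the unambiguous A/C/G/T bases of seq."""
    counted = [b for b in seq.upper() if b in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no A/C/G/T bases")
    return sum(b in "GC" for b in counted) / len(counted)


def third_position_gc(cds: str) -> float:
    """G+C fraction at the third base of each complete codon.

    Assumes cds starts at the first base of a codon (hypothetical input);
    any trailing partial codon is ignored.
    """
    usable = len(cds) - len(cds) % 3
    return gc_fraction(cds[2:usable:3])


# Toy in-frame sequence (not real rml data): ATG GCA TTA AAA TAA
toy = "ATGGCATTAAAATAA"
overall = gc_fraction(toy)        # 3 of 15 bases are G or C
third = third_position_gc(toy)    # third bases are G, A, A, A, A
```

Comparing the overall value with the third-position value for each gene is one simple way to see the kind of contrast reported for rmlC.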

The sequences were aligned and alignments generated for all polymorphic and informative-only sites. Informative sites are those that can affect the branching of trees: at least two different nucleotides occur at the site, each in at least two sequences. There are 802 polymorphic sites including 505 informative sites. In general, non-informative polymorphic sites are distributed similarly to informative sites (shown in Fig. 1), but there is a cluster of 35 such sites at the 3' end of the rmlB gene (positions 726–870, see Fig. 4) varying only in M285 (59,II). Among rmlB sequences, there were 140 polymorphic sites, including 33 amino acid replacement sites. rmlD had 111 polymorphic sites, of which 25 were amino acid replacement sites. For rmlA, there were 218 polymorphic sites, including 45 amino acid replacement sites, while rmlC had 351 polymorphic sites, 119 of which result in amino acid replacement.
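The distinction between polymorphic and informative sites can be made concrete with a short sketch. This is an illustration of the definition given above, not the MULTICOMP implementation; the function name and the decision to ignore gaps and ambiguity codes are assumptions.

```python
from collections import Counter


def classify_sites(alignment):
    """Return (polymorphic, informative) lists of 0-based column indices.

    A column is polymorphic if at least two different nucleotides occur in it,
    and informative if at least two nucleotides each occur in two or more
    sequences. Gaps and ambiguity codes are ignored (an assumption).
    """
    lengths = {len(s) for s in alignment}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to equal length")
    polymorphic, informative = [], []
    for col in range(lengths.pop()):
        counts = Counter(s[col].upper() for s in alignment)
        counts = {b: n for b, n in counts.items() if b in "ACGT"}
        if len(counts) < 2:
            continue  # monomorphic column
        polymorphic.append(col)
        if sum(n >= 2 for n in counts.values()) >= 2:
            informative.append(col)
    return polymorphic, informative


# Toy alignment (not real rml data): column 4 varies in one sequence only,
# so it is polymorphic but not informative.
toy = ["ACGTA", "ACGTA", "ATGAA", "ATGAC"]
poly, info = classify_sites(toy)  # poly == [1, 3, 4], info == [1, 3]
```

A run of sites that vary in a single strain only, such as the cluster reported for M285 (59,II), would appear in the first list but not the second.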



Fig. 1. Aligned informative sites of the rml genes of 12 S. enterica strains exclusive of M1952 (62,IIIa). Positions of the sites, numbered vertically, are listed above the sequence. The arrow between positions 2674 and 2710 indicates the region where M1911 (57,I) diverges from M269 (28,I), M303 (53,II) diverges from M287 (42,II), and where two O57 strains, M1911 (57,I) and M293 (57,II), and two O42 strains, M1790 (42,I) and M287 (42,II), become similar. The arrow between positions 2758 and 2761 indicates the region where M32 (E1,I) and P9003 (B,I) diverge. Note that the gene lengths of rmlA and rmlC vary. rmlA of O53 strains is 873 bp and that of M269 (28,I) is 876 bp, while that of the other strains is 882 bp. rmlC of M285 (59,II) and M32 (E1,I) is 531 bp, and it is 534 bp, 540 bp, 549 bp and 552 bp for two O42 strains, two O11 strains, two O57 strains and M269 (28,I), respectively.

 


Fig. 4. Distribution of polymorphic sites in a 903 bp rmlB segment for 35 S. enterica strains. The small-segment replacements are highlighted in bold.

 
Average pairwise difference increases from rmlB to rmlC, with 3·7%, 3·7%, 9·5% and 28·4% for rmlB, rmlD, rmlA and rmlC, respectively. The divergence level for rmlC is extremely high. The difference between rmlC of P9003 (B,I), M269 (28,I) or M293 (57,II) and that of any other strain ranges from 33·5% to 44·9%. For the remaining comparisons, the differences for rmlC were up to 15·4%. In contrast, pairs of strains with the same O antigen all have very similar rmlC genes; the differences range from 1·1% to 4·4%.
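For readers who wish to compute comparable figures from their own alignments, the average pairwise difference can be sketched as below. The sketch assumes an uncorrected proportion of differing sites (p-distance) averaged over all pairs of sequences; whether MULTICOMP applies any correction is not stated here, and the function names are illustrative.

```python
from itertools import combinations


def percent_difference(a: str, b: str) -> float:
    """Uncorrected percentage of sites at which two aligned sequences differ.

    Columns with a gap or ambiguity code in either sequence are skipped
    (an assumption of this sketch).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = [(x, y) for x, y in zip(a.upper(), b.upper())
                if x in "ACGT" and y in "ACGT"]
    if not compared:
        raise ValueError("no comparable sites")
    return 100.0 * sum(x != y for x, y in compared) / len(compared)


def mean_pairwise_difference(alignment) -> float:
    """Mean of percent_difference over all unordered pairs of sequences."""
    pairs = list(combinations(alignment, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(percent_difference(a, b) for a, b in pairs) / len(pairs)


# Toy alignment (not real rml data): pairwise differences of 10%, 20% and 10%.
toy = ["AAAAAAAAAA", "AAAAAAAAAT", "AAAAAAAATT"]
mean_diff = mean_pairwise_difference(toy)
```

Applying the same function separately to each gene, or to subsets of strains sharing an O antigen, would give per-gene and within-serogroup values of the kind quoted above.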

For M1952 (62,IIIa), the percentage difference from other strains at the nucleotide level ranges from 24·1 to 25·1%, 28·2 to 29·7%, 21·1 to 24·6% and 24·7 to 35·0% for rmlB, rmlD, rmlA and rmlC, respectively. Amino acid sequence comparison of the four rml genes of M1952 (62,IIIa), S. enterica LT2 and E. coli K-12 shows similar levels of difference for all three pairwise comparisons (data not shown).

The rmlB and rmlA genes of M1939 (60,V) are thought to be involved in the biosynthesis of a dTDP-sugar other than rhamnose (see above). We compared these with those involved in rhamnose pathways. rmlB of M1939 (60,V) shows 5·3% to 9·4% difference from that of the other 10 O antigens if M1952 (62,IIIa) is excluded, while rmlA has 36·7% to 38·1% difference. It seems that rmlB of M1939 (60,V) has a recent common ancestor with that involved in the rhamnose pathway of S. enterica, while rmlA of M1939 (60,V) comes from a different source.

Evidence for recombination in the rml gene set
The neighbour-joining method was used to construct individual rml gene trees (Fig. 2). rml genes of three E. coli strains are also included (Marolda et al., 1999; Rajakumar et al., 1994; Stevenson et al., 1994). If the rml gene sets have all evolved from the same ancestral set and there have been no recombination events, it would be expected that the four gene trees would have similar topology. The most notable feature of this analysis is, however, the variation among the four rml trees. In the rmlB tree, strains from subspecies I and II are clustered in two separate branches except for M285 (59,II), which is apart from the other strains mainly because of a cluster of 35 sites at which M285 differs from the others (see above). The rmlD tree is similar to the rmlB tree except that M269 (28,I) and M1911 (57,I) no longer cluster with subspecies I strains. In the rmlA tree, two O57 strains of subspecies I and II are grouped with M269 (28,I) in a separate branch, and subspecies II strains are no longer clustered in one branch. The level of variation is greater for rmlA, but if we exclude the 3' 136 bp sequence, it resembles that for the rmlB and rmlD trees (data not shown). In the rmlC tree, genes from strains of the same O antigen but different subspecies group together, and genes of O antigen B, O28 and O57 strains are very divergent from those of other O antigens. As expected, in all four rml trees, M1952 (62,IIIa) forms a deep branch away from the other S. enterica strains. The rmlB, rmlD and rmlA genes of the three E. coli strains form a separate branch and are more divergent than those of S. enterica. As observed in S. enterica, the difference levels increase remarkably in rmlC, with E. coli O7 and Flexneri grouped with S. enterica M1952 (62,IIIa) in a deep branch and rmlC of E. coli K-12 extremely divergent from those of the other two strains. The non-congruence of the four rml gene trees of S. enterica, as discussed below, is attributed to recombination events.



Fig. 2. Inferred gene trees based on complete sequences of rmlB (17 alleles), rmlD (17 alleles), rmlA (17 alleles) and 480 bp of rmlC (15 alleles; the 3' end of rmlC from position 481 is excluded because of poor alignment). The trees were constructed by the neighbour-joining method from a matrix of pairwise distances based on all nucleotide sites. The number adjacent to a node indicates the percentage of 1000 bootstrap trees that contain the node. Only numbers bigger than 50% are indicated. rml genes (AB002668) of Haemophilus actinomycetemcomitans (Hac) are used as outgroups for each tree. EcO7, E. coli O7; EcK-12, E. coli K-12; Ecfl, E. coli Flexneri.

 
As noted above, inspection of the informative sites shows that rmlB, rmlD and most of rmlA have a strong tendency to be subspecies specific (Fig. 1). We can identify consensus sequences for subspecies I and II, although subspecies I or II sequence may sometimes be found in other subspecies. This is summarized in Fig. 3. It can be seen firstly that the junctions in subspecies specificity are generally not the junctions between genes, and secondly that correlation of sequence with subspecies is in the order rmlB>rmlD>rmlA. Four of six subspecies I strains have ‘subspecies I’ sequence for rmlB, rmlD and most of rmlA while the other two subspecies I strains have subspecies I sequence for only part of rmlB or for rmlB and part of rmlD. All four subspecies II strains have ‘subspecies II’ sequence for rmlD and one or both of rmlB and most of rmlA.



Fig. 3. Diagrammatic representation of polymorphic sites in rml gene sets of 13 S. enterica strains. Each pattern represents a group of closely related sequences. // indicates sequences that show extreme divergence from corresponding regions in any other strain under study. The numbers I, II, etc. indicate the inferred subspecies-specific DNA. Arrows indicate the location of the chi sites. The arrow above the box indicates a chi site in the same orientation as transcription and arrows below the box indicate the reverse orientation.

 
In contrast, the sequence of rmlC and the 3' end of rmlA is not correlated with subspecies. For the three pairs of strains with the same O antigen but of different subspecies, this part of the sequence is O antigen specific.

The situation is complex in that there are often other breaks in the level of similarity and pattern of polymorphic sites, which indicate recombination. The first 1330 bp of M324 (11,VI) (Fig. 1) is apparently from a subspecies other than I or II. Evidence from two other subspecies VI strains, as we show below, suggests that this sequence is subspecies VI-specific. The remaining sequence of M324 (11,VI) has subspecies I sequence. In M269 (28,I), the first 700 bp of rmlB has typical subspecies I sequence, but from around 700 to 1735 it has subspecies II sequence and from positions 1758 to 1870, it is like subspecies I again. The next segment, from position 1873 to 2710, has sequence which does not resemble that of subspecies I or II but is shared by corresponding regions of M1911 (57,I) and M293 (57,II), although the first 1450 bp of M1911 (57,I) and first 2065 bp of M293 (57,II) have sequences typical of the subspecies to which they belong. For M1911 (57,I), the segment between positions 1451 and 2071 is subspecies II-like. The mosaic structures of these strains account for their abnormal groupings in the rml trees as discussed above. The two O53 strains are similar throughout rmlB, rmlD and rmlA and are subspecies II-specific except for the 3' end of rmlA. Both lack the rmlC gene.

Recombination is also proposed to account for the abrupt similarity changes at the 3' end of rmlC of E1, O11 and O42 strains. The sequences for these strains are highly similar from the 3' end of rmlA to the point 180 bp from the 3' end of rmlC, with a mean difference of 1·9% (0·8–3·3%) for the 5' 370 bp of rmlC. In contrast, the similarity level in the remaining 3' end rmlC is much lower, with the differences ranging from 33·3% to 42·0%.

It is noteworthy that there are chi-like sequences, indicated in Fig. 3, located adjacent to the junctions of the segments of different origins in M269 (28,I) (1727 5'-TCTGGTGG-3' 1734, 1841 5'-GCTGGTCG-3' 1834), M293 (57,II) (1841 5'-GCTGGTCG-3' 1834) and M1911 (57,I) (1841 5'-ACTGGTTG-3' 1834). The orientation is correct for chi-stimulated homologous recombination. The segments separated by chi are also shown by the Stephens test (Stephens, 1985 ) to be highly significant partitions for clustered polymorphic sites (data not shown).
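A search for chi-like octamers of the kind listed above can be sketched as a simple mismatch scan on both strands. The sketch assumes the canonical E. coli chi octamer 5'-GCTGGTGG-3' as the reference; the function names and the mismatch threshold are illustrative choices, not the procedure used in this study.

```python
CHI = "GCTGGTGG"  # canonical E. coli chi octamer, used here as the reference

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_chi_like(seq: str, max_mismatches: int = 1):
    """List (start, strand, octamer, mismatches) for chi-like octamers.

    start is the 0-based leftmost coordinate on the given strand; strand "-"
    means the octamer reads 5'->3' on the complementary strand.
    """
    seq = seq.upper()
    k = len(CHI)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        d = hamming(window, CHI)
        if d <= max_mismatches:
            hits.append((i, "+", window, d))
        rc = reverse_complement(window)
        d = hamming(rc, CHI)
        if d <= max_mismatches:
            hits.append((i, "-", rc, d))
    return hits


# The octamers quoted in the text differ from the reference at 1, 1 and 2 positions.
mismatches = [hamming(s, CHI) for s in ("TCTGGTGG", "GCTGGTCG", "ACTGGTTG")]
```

With this reference, a threshold of one mismatch recovers the first two octamers quoted in the text, and a threshold of two is needed for the third.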

Subspecies variation and evolutionary relationships of S. enterica rmlB genes
As discussed above for subspecies I and II, the 5' end of the rml gene set, particularly rmlB, is commonly subspecies specific. To examine if the phenomenon applies more generally, a 903 bp segment of the rmlB gene extending from positions 16 to 918 was sequenced for 23 strains from seven subspecies. Most of these strains have O antigens 11, 42, 53, 57 or 59, each present in three to six subspecies (Popoff & Le Minor, 1997 ). Where possible, strains from different subspecies were chosen for each O antigen. We also included rmlB of M1939 (60,V) which we believe is involved in synthesis of a sugar other than rhamnose. Together with rmlB from the 11 previously sequenced rml gene sets, a total of 35 rmlB sequences are available for analysis.

If we exclude the extremely divergent strain M1952 (62,IIIa), there are 204 polymorphic sites and 152 informative sites. Analysis of the distribution of polymorphic sites (Fig. 4) revealed that sequences from subspecies I, II and the 5' half of IV each have distinct subspecies-specific polymorphic sites regardless of O antigen. All nine subspecies I strains representing seven O antigens have subspecies I-specific sequences. Four out of seven subspecies II strains representing four O antigens have a common sequence, which we treat as subspecies II specific. Four out of five subspecies IV strains representing four O antigens have a similar sequence for the 5' half of the gene, then M1642 (11,IV) and M1894 (53,IV) become subspecies I-like from around position 500 while M1917 (57,IV) and M1799 (42,IV) are subspecies II-like. The three strains from subspecies VI have almost identical polymorphic sites that are unique to these strains and hence very likely subspecies VI-specific, but note that all have antigen O11 which is the only rhamnose-containing O antigen present in subspecies VI.

There is no sequence specific for subspecies IIIa or IIIb strains, for which the entire rmlB genes resemble those from other subspecies (Table 3, Fig. 4). This is also the case for a few subspecies II and IV strains. Evidence for intragenic recombination is also found for a number of strains (Table 3). Some of these segments are confirmed by the Stephens test to be partitions with significant P values for clustered polymorphic sites (data not shown). Others, although not detected by the Stephens test, were found to contain clustered polymorphic sites typical for other subspecies when examined by the Maximum chi-squared program, which allows one to compare the distribution of polymorphic sites in two ‘parental’ sequences and a potential ‘recombinant’ sequence with that expected to occur by chance. For most of these segments, the high level of similarity and the distinct subspecies-specific sequences suggested they were generated as a result of inter-subspecies recombination. Interestingly, the cluster of 35 polymorphic sites mentioned above in the rmlB gene of M285 (59,II) is also shared by M1914 (57,II) and M1915 (57,IIIb) (positions 726–870). The high level of divergence of this segment and of another segment shared by M1795 (42,IIIa) and M1796 (42,IIIb) at positions 102–151, suggests that they have been derived from other species.
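The idea behind comparing a potential recombinant with two parental sequences can be illustrated with a simplified maximum chi-squared scan. The sketch below is not the Maximum chi-squared program used in the study. It assumes three aligned sequences, considers only sites where the parents differ and the candidate matches one of them, and for every possible breakpoint computes a 2 × 2 chi-squared statistic on which parent the candidate resembles to the left and to the right. All names are hypothetical.

```python
def chi2_2x2(a, b, c, d):
    """Pearson chi-squared for the 2x2 table [[a, b], [c, d]] (no correction)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def max_chi2_breakpoint(parent_a, parent_b, candidate):
    """Return (max_chi2, position) for the best single breakpoint.

    position is the 1-based alignment coordinate of the first discriminating
    site to the right of the breakpoint, or None if no breakpoint scores > 0.
    """
    # True where the candidate matches parent A, False where it matches B
    sites = [
        (pos, c == a)
        for pos, (a, b, c) in enumerate(zip(parent_a, parent_b, candidate), start=1)
        if a != b and c in (a, b)
    ]
    best = (0.0, None)
    for k in range(1, len(sites)):
        left = [like_a for _, like_a in sites[:k]]
        right = [like_a for _, like_a in sites[k:]]
        a_left, a_right = sum(left), sum(right)
        stat = chi2_2x2(a_left, len(left) - a_left, a_right, len(right) - a_right)
        if stat > best[0]:
            best = (stat, sites[k][0])
    return best
```

In practice the significance of the maximum would be judged against the distribution obtained by repeatedly shuffling the discriminating sites, which is the comparison with chance expectation referred to above; that step is omitted here for brevity.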


Table 3. Inferred recombination events in rmlB genes of 22 strains

 
We used the 5' segment from positions 1 to 476 to construct a tree to show subspecies relationships (Fig. 5), as the 3' half of rmlB of subspecies IV strains is from either subspecies I or II. The subspecies VI branch is close to that of subspecies I, followed by those of subspecies IV and II. The one strain from subspecies V forms a deep branch next to the subspecies I cluster. Two O57 strains, M1914 (57,II) and M1915 (57,IIIb), are the most divergent, forming a branch apart from others.



Fig. 5. Inferred gene tree for the 5' 476 bp rmlB segment for 35 S. enterica strains, constructed by the neighbour-joining method from a matrix of pairwise distances. E. coli K-12 (EcK-12) was used as the outgroup. The values adjacent to a node indicate the percentage of 1000 bootstrap trees that contain the node. Only those greater than 50% are shown except those on the node of two subspecies branches. * indicates the sequences used to evaluate the variation of rml genes for comparison with the variation in housekeeping genes.
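The neighbour-joining construction named in the Fig. 5 legend can be sketched as below. This is a bare-bones illustration of the Saitou & Nei (1987) algorithm working from a symmetric distance matrix; it does not perform outgroup rooting or bootstrapping, and it is not the software used to produce Fig. 5. The function name, the internal node labels and the five-taxon matrix are assumptions made for the example.

```python
def neighbour_joining(names, matrix):
    """Build an unrooted tree from a symmetric distance matrix.

    Returns (joins, final_edge). Each join is
    (child1, branch1, child2, branch2, new_node); final_edge is
    (node_a, node_b, length) connecting the last two nodes.
    """
    d = {a: {b: float(matrix[i][j]) for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    nodes = list(names)
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        # net divergence of each node from all others
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        best = None
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, f, g = best
        len_f = 0.5 * d[f][g] + (r[f] - r[g]) / (2 * (n - 2))
        len_g = d[f][g] - len_f
        u = f"node{len(joins)}"
        d[u] = {u: 0.0}
        for k in nodes:
            if k not in (f, g):
                d[u][k] = d[k][u] = 0.5 * (d[f][k] + d[g][k] - d[f][g])
        nodes = [k for k in nodes if k not in (f, g)] + [u]
        joins.append((f, len_f, g, len_g, u))
    a, b = nodes
    return joins, (a, b, d[a][b])


# Illustrative additive distances for five hypothetical taxa
taxa = ["a", "b", "c", "d", "e"]
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
joins, final_edge = neighbour_joining(taxa, dist)
```

For sequence data, the matrix would hold pairwise distances between the aligned 476 bp segments, and bootstrap support would come from repeating the construction on resampled alignment columns.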

 

   DISCUSSION
In this study, we have sequenced the O antigen rml genes from 11 S. enterica strains representing seven O antigens. Including published data, we now have sequences of the four rml genes for at least one representative of each rhamnose-containing O antigen. We have also sequenced a 903 bp segment of the 5' gene rmlB from another 23 strains to give good subspecies representation for several O antigens.

Variation at the 5' and 3' ends of the rml gene set
There is a gradient in the nature of variation along the rml gene set, but we will focus first on genes at the two ends. S. enterica has a distinct subspecies structure, with housekeeping gene sequence variation correlating with subspecies (see Introduction). The rml genes are not housekeeping genes, but at least for subspecies I, II, IV and VI, the variation at the 5' end of the rml gene set is subspecies specific. To compare the level of variation in rmlB genes with that in housekeeping genes, we chose for analysis two sequences from each of the four well-represented subspecies (marked * in Fig. 5), as housekeeping gene studies usually have equal representation from each subspecies. The mean difference is 3·69%, which is comparable to the range (3·8–4·6%) previously estimated for several housekeeping genes (Boyd et al., 1994; Nelson & Selander, 1992, 1994; Nelson et al., 1991; Thampapillai et al., 1994). A comparison of the rmlB tree with a tree based on the combined coding sequences of five housekeeping genes (Selander et al., 1996) shows that the relationships for subspecies I, II and VI are similar in the two trees. The exception is subspecies V which, on all previously used criteria, is the most divergent; in the rmlB tree, however, the subspecies V strain M1939 (60,V) is closest to the subspecies I cluster, branching from it after the other subspecies diverge. It should be noted that no subspecies V serovar has rhamnose in its O antigen and that the rmlB gene of M1939 (60,V) is from a different, unknown, sugar pathway.
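A mean pairwise difference of the kind quoted above can be computed as in the following sketch. It assumes uncorrected percentage differences over aligned positions where both sequences carry an unambiguous nucleotide; whether the published figure was corrected for multiple substitutions is not stated here, so the calculation is illustrative only. The strain labels and sequences in the example are invented.

```python
from itertools import combinations


def percent_difference(seq1, seq2):
    """Uncorrected percentage of differing sites between two aligned sequences."""
    compared = differing = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x in "ACGT" and y in "ACGT":
            compared += 1
            differing += x != y
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * differing / compared


def mean_pairwise_difference(sequences):
    """Mean of percent_difference over all pairs in a {name: sequence} dict."""
    values = [
        percent_difference(sequences[a], sequences[b])
        for a, b in combinations(sorted(sequences), 2)
    ]
    return sum(values) / len(values)


example = {
    "strain1": "AAAAAAAAAA",
    "strain2": "AAAAAAAAAC",
    "strain3": "AAAAAAAACC",
}
mean_diff = mean_pairwise_difference(example)
```

With two sequences from each of four subspecies, as in the comparison above, the same function would average over all 28 pairs.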

The three available E. coli rmlB genes group together but vary more than those of S. enterica (Fig. 2). The level of rmlB difference between E. coli and S. enterica is about 20%, with a similar value for rmlD and rmlA (excluding the 3' end of rmlA). This is only a little greater than the 15% difference estimated for the two species (Sharp, 1991) and is consistent with the 5' rml genes having been in E. coli and S. enterica since the two species diverged. Thus the variation within S. enterica and between S. enterica and E. coli is consistent with the genes having diverged as the species and subspecies diverged.

However, the rmlB genes of the two S. enterica O57 strains form a highly divergent separate branch, but are still closer to other S. enterica sequences than to the E. coli sequences, suggesting that they came from an as yet unidentified subspecies of S. enterica.

The situation at the 3' end of the rml gene set is quite different. The 3' end of rmlA and all of rmlC are much more variable than are the genes discussed above, and the variation at this end of the gene set is clearly O antigen specific and not subspecies specific. The G+C content is also much lower than that of the 5' end, and very close to that of the central serogroup-specific region as determined for gene clusters of O antigens B, C2, D1, D2, D3 and E1 (Brown et al., 1991; Jiang et al., 1991; Liu et al., 1991; Wang et al., 1992; Xiang, 1995). Apparently, the 3' rml genes have a different evolutionary history from the 5' rml genes. We suggest that rmlC and the 3' end of rmlA are commonly transferred with the glycosyl-transferase and O antigen processing genes, which determine O antigen specificity and are generally in the central region of the gene cluster. The high level of sequence variation in the 3' rmlC gene indicates the divergent sources for these O antigen gene clusters (see below and Fig. 6).
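The comparison of G+C content between the two ends of the gene set amounts to a simple per-segment calculation, sketched below. The segment names and coordinates are placeholders chosen for the example and do not correspond to the actual gene boundaries.

```python
def gc_content(seq):
    """Percentage of G and C among the unambiguous nucleotides in seq."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * sum(b in "GC" for b in bases) / len(bases)


def gc_by_segment(seq, segments):
    """G+C percentage for each named segment.

    segments: {name: (start, end)} with 1-based inclusive coordinates.
    """
    return {
        name: gc_content(seq[start - 1:end])
        for name, (start, end) in segments.items()
    }


# Toy sequence with a G+C-rich 5' half and an A+T-rich 3' half
toy_seq = "GGGGCCCCAATTAATT"
toy_gc = gc_by_segment(toy_seq, {"five_prime": (1, 8), "three_prime": (9, 16)})
```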



Fig. 6. Inferred tree based on 60 aligned amino acid sequences at the 3' end of rmlC of 20 species. Se, S. enterica; EcO7, E. coli O7; EcK-12, E. coli K-12; Ecfl, E. coli Flexneri; Kpn, Klebsiella pneumoniae (AF097519); Bps, Burkholderia pseudomallei (AAD05456); Hac, Haemophilus actinomycetemcomitans (AB002668); Nme, Neisseria meningitidis (L09188); Sma, Serratia marcescens (AF038816); Syn, Synechocystis (D90915); Pho, Pyrococcus horikoshii (BAA29502); Bfr, Bacteroides fragilis (AAD40710); Efa, Enterococcus faecalis (AAC35921); Sph, Sphingomonas (U51197); Mtu, Mycobacterium tuberculosis (Z95390); Lin, Leptospira interrogans (U61226); Smu, Streptococcus mutans (D78182); Spn, Streptococcus pneumoniae (U09239); Xca, Xanthomonas campestris (L23941); Sme, Sinorhizobium meliloti (Z79692); Rhi, Rhizobium (AE000074); Mth, Methanothermobacter thermoautotrophicus (AE000933).

 
Mechanisms for the generation of the subspecies-specific 5' rmlB gene
The correlation of rmlB sequence variation with subspecies indicates that this gene was in S. enterica prior to subspeciation and diverged with the subspecies. However, the distribution of O antigen forms among subspecies suggests that O antigens transfer readily between subspecies, and this is supported by the correlation of variation in the 3' rml genes with O antigen specificities.

We see three possibilities for the maintenance of subspecies specificity of some 5' rml genes given the high level of lateral transfer of O antigen gene clusters:

1. It could be that rml-containing O antigen gene clusters commonly transfer between subspecies by recombination within the rml genes, the 5' end of the gene cluster gaining subspecies-specific sequence in the process.

2. The gene clusters may transfer as complete gene clusters, with the 5' rml gene set being replaced later by genes from another strain of the same subspecies.

3. The O antigen gene clusters currently present in each subspecies have been there since subspecies divergence.

In support of option 1 and against option 3 in particular is the observation that the gnd gene, located downstream of the O antigen gene cluster, often has a chimeric structure with the 5' and 3' parts of the gene from different subspecies, suggesting that the 5' end of the gene had been transferred between subspecies together with the O antigen gene cluster (Nelson & Selander, 1994; Thampapillai et al., 1994). This is similar to the situation we now observe for the 5' end of the rml gene set located at the other end of some O antigen gene clusters. The presence of both gnd and rml genes with sequence of mixed subspecies origins argues strongly in favour of the movement of O antigen genes between subspecies and argues against option 3. In addition, option 3 cannot account for the high level of similarity of rmlC genes from strains which have the same O antigen but come from different subspecies, while option 1 can.

Option 2 seems highly improbable. There appears to be no obvious selection pressure maintaining the subspecies-specificity of the 5' rml genes, as indicated by the comparable ratios of synonymous to nonsynonymous nucleotide substitutions among rmlB genes within and between subspecies (data not shown).
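The distinction between synonymous and nonsynonymous substitutions used in this argument can be illustrated with a crude codon-by-codon count. This sketch is not the method behind the unpublished ratios; it assumes in-frame aligned coding sequences, uses the standard genetic code, counts only codon pairs that differ at exactly one position, and sets aside all others. A proper estimate would normalise by the numbers of synonymous and nonsynonymous sites. All names are illustrative.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def count_substitutions(seq1, seq2):
    """Return (synonymous, nonsynonymous, skipped) codon-level counts.

    Codon pairs differing at more than one position, or containing
    symbols other than A, C, G and T, are counted as skipped.
    """
    syn = nonsyn = skipped = 0
    usable = min(len(seq1), len(seq2)) // 3 * 3
    for i in range(0, usable, 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if c1 == c2:
            continue
        n_diff = sum(x != y for x, y in zip(c1, c2))
        if n_diff != 1 or c1 not in CODON_TABLE or c2 not in CODON_TABLE:
            skipped += 1
        elif CODON_TABLE[c1] == CODON_TABLE[c2]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn, skipped
```

Comparing the counts obtained for pairs within a subspecies with those for pairs from different subspecies would give a first impression of whether the two ratios are similar.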

We conclude that option 1, in which lateral transfer of rml-containing O antigen gene clusters generally involves recombination within the rml gene set, fits all the known data and provides the best explanation for the 5' end of the rml gene set being subspecies specific.

Recombination involving rml gene-containing O antigen gene clusters
The chimeric structures of rml genes in S. enterica throw light on the origins of the extant O antigen gene clusters. The junctions between segments shown in Fig. 3 are presumably the sites of homologous recombination between the donor and the recipient O antigen gene clusters involved in lateral transfer. In general, recombination products that survive will be those that give a clone a new O antigen, which is favoured by natural selection. This will be achieved by the transfer of the central O antigen-specific region, which will often bring with it part of the adjacent common DNA, in this case the 3' portion of the rml gene set. We see two levels of recombination: inter-species and intra-species. The scenario illustrated in Fig. 7 could explain the results observed in this study. Clusters Ise and IIse represent two rml-containing O antigen gene clusters present in different subspecies of S. enterica since subspecies divergence. Clusters α and β are two rml-containing polysaccharide gene clusters recently transferred to S. enterica from other species. The inter-species recombination between clusters Ise and α is shown at the 3' end of rmlA as this appears to be the most common site, resulting in cluster IVse. The O antigen is still of type α but most of the rml genes are those of S. enterica.



Fig. 7. Hypothetical recombination events involving rml gene-containing O antigen gene clusters. The diagrams represent only recombination in rml genes at the upstream end of the O antigen gene cluster; recombination at the downstream end, presumably occurring in genes flanking the O antigen gene cluster, is not shown. Clusters Ise, IIse and IIIse are original O antigen gene clusters from different subspecies of S. enterica, with clusters Ise and IIse containing rml genes while cluster IIIse does not. Clusters α and β are from polysaccharide clusters of other species. Clusters IVse and Vse are S. enterica clusters formed as a result of inter-species recombination between Ise or IVse and α or β. Clusters VIse, VIIse and VIIIse are formed as a result of inter-subspecies recombination.

 
After capture by S. enterica, these new O antigen gene clusters could move among subspecies by homologous exchange within the rml gene set as illustrated for cluster IVse and IIse. If we apply the model to our data (Fig. 3), we find different recombination points among the rml genes at this level. For two O42 strains, M1790 (42,I) and M287 (42,II), the junction is at the 3' end of rmlA (represented by cluster VIse of Fig. 7). For two O11 strains, M1635 (11,I) and M324 (11,VI), the junction is in rmlD and apparently M324 (11,VI) retrieved rmlA, rmlC and most of rmlD from subspecies I (represented by cluster VIIse). For M1911 (57,I) and M269 (28,I), it seems that multiple recombination events have occurred in a series of lateral-transfer events.

For O antigens E1, O11 and O42, the gene clusters have highly similar sequences from the 3' end of rmlA to near the 3' end of rmlC, after which they become highly divergent. One possibility is that after one of these O antigen gene clusters (represented by IVse) became established in S. enterica, there was recombination at the 3' end of rmlC with other incoming polysaccharide gene clusters (represented by cluster β in Fig. 7), resulting in gene clusters which differ only in the 3' end of rmlC (represented by cluster Vse).

In the situation where only the donor cluster contains the rml gene set, as illustrated by the recombination between cluster IVse and IIIse (Fig. 7), we suggest that the exchange of O antigen clusters occurs by recombination within housekeeping genes upstream of the O antigen cluster, for example within galF, resulting in the replacement of the whole rml gene set (VIIIse of Fig. 7). We have not looked at the galF sequence to confirm this, but movement of O antigen by recombination within genes adjacent to an O antigen cluster has already been observed in the gnd gene (Thampapillai et al., 1994). There are many cases from subspecies IIIa and IIIb as well as a few from other subspecies in this study showing inter-subspecies recombination involving entire rmlB segments and most probably other rml genes as well. An example is seen in the rml gene set of M1891 (53,IIIb) which appears to have come from subspecies II. The only subspecies V strain in this study might have also derived its rmlB gene from subspecies I and then diverged within the subspecies.

Although it is obvious that O antigen gene clusters of S. enterica have undergone lateral gene transfer, the mechanism for such transfer is not clear. The finding of plasmid-borne O antigens in S. enterica O54 (Popoff & Le Minor, 1985) and E. coli Sonnei (Viret et al., 1993) suggests that plasmids could be the vehicle.

The origins of the O62 gene cluster
The rml gene set of M1952 (62,IIIa) is atypical as its rml genes are all as divergent from those of other S. enterica or E. coli O antigens as are those of S. enterica and E. coli from each other. It appears that the rmlB, rmlD and rmlA genes of M1952 (62,IIIa), E. coli and S. enterica O antigens diverged at approximately the same time. It could be that O62 is one of the O antigens originally in S. enterica but that its rmlB, rmlD and rmlA genes have since evolved independently of those of the other S. enterica O antigens. O62 is only present in subspecies IIIa and it is also possible that its rml gene set was captured from a species closely related to S. enterica and E. coli.

Concluding remarks
The gene clusters of several polysaccharide antigens have been found to have a cassette structure with a central set of variable serogroup-specific genes flanked by highly homologous pathway genes or other genes common to all clusters of that antigen class. This pattern has been observed in the gene clusters of several structurally related O antigens in S. enterica (Brown et al., 1992; Jiang et al., 1991; Liu et al., 1991; Wang et al., 1992) and CPS (Coffey et al., 1998; Frosch et al., 1989; Kroll & Moxon, 1990; Kroll et al., 1989) as well as E. coli group II K antigens (Roberts, 1996). It has been hypothesized, based on these studies, that the outer conserved genes play a role in mediating the exchange of central serogroup-specific genes. This investigation of rml genes of S. enterica is, as far as we know, the first study focusing on the evolution of the outer conserved genes, which is necessary for our understanding of the evolution of O antigen gene clusters.

We have shown that the rml genes of S. enterica do play a role in mediating transfer of the serogroup-specific genes and through this carry a record of the history of the O antigen clusters as they are transferred between subspecies. The 5' part of the rml gene set has subspecies specificity, indicating that it is only rarely included in inter-subspecies transfer. Only the 3' end of the rml gene set appears to represent DNA from the donor species, but even in the highly divergent sequences of the 3' end of rmlC, the S. enterica strains tend to group together (Fig. 6): some group with the E. coli sequences, but only the O57 and O28 sequences group with sequences from more distantly related species. It may be that transfer within the Enterobacteriaceae is much easier than from more distantly related species and that many gene clusters in S. enterica have come via other enterobacterial species, in the process losing all of the rml sequences from their original sources in one of the many recombination events involved.

The rml genes are extremely useful in following the history of O antigen gene clusters within S. enterica, but because of recombination in these genes, they do not appear to be very useful for our original aim of determining the ultimate source of the gene cluster.


   ACKNOWLEDGEMENTS
 
This work was supported by a grant from the Australian Research Council. We especially thank Dr R. Lan and S. Jensen for many helpful discussions and suggestions. Q. Li is a recipient of a T. C. Turland Scholarship from the University of Sydney.


   REFERENCES
Bastin, D. A. & Reeves, P. R. (1995). Sequence and analysis of the O antigen gene (rfb) cluster of Escherichia coli O111. Gene 164, 17-23.

Boyd, E. F., Nelson, K., Wang, F.-S., Whittam, T. S. & Selander, R. K. (1994). Molecular genetic basis of allelic polymorphism in malate dehydrogenase (mdh) in natural populations of Escherichia coli and Salmonella enterica. Proc Natl Acad Sci USA 91, 1280-1284.

Brown, P. K., Romana, L. K. & Reeves, P. R. (1991). Cloning of the rfb gene cluster of a group C2 Salmonella: comparison with the rfb regions of groups B and D. Mol Microbiol 5, 1873-1881.

Brown, P. K., Romana, L. K. & Reeves, P. R. (1992). Molecular analysis of the rfb gene cluster of Salmonella serovar Muenchen (strain M67): genetic basis of the polymorphism between groups C2 and B. Mol Microbiol 6, 1385-1394.

Clarke, B. R. & Whitfield, C. (1992). Molecular cloning of the rfb region of Klebsiella pneumoniae serotype O1:K20: the rfb gene cluster is responsible for synthesis of the D-galactan I O polysaccharide. J Bacteriol 174, 4614-4621.

Coffey, T. J., Enright, M. C., Daniels, M., Morona, J. K., Morona, R., Hryniewicz, W., Paton, J. C. & Spratt, B. G. (1998). Recombinational exchanges at the capsular polysaccharide biosynthetic locus lead to frequent serotype changes among natural isolates of Streptococcus pneumoniae. Mol Microbiol 27, 73-83.

Comstock, L. E., Maneval, D., Jr, Panigrahi, P., Joseph, A., Levine, M. M., Kaper, J. B., Morris, J. G., Jr & Johnson, J. A. (1995). The capsule and O antigen in Vibrio cholerae O139 Bengal are associated with a genetic region not present in Vibrio cholerae O1. Infect Immun 63, 317-323.

DeShazer, D., Brett, P. J. & Woods, D. E. (1998). The type II O-antigen polysaccharide moiety of Burkholderia pseudomallei lipopolysaccharide is required for serum resistance and virulence. Mol Microbiol 30, 1081-1100.

Frosch, M., Weisgerber, C. & Meyer, T. F. (1989). Molecular characterization and expression in Escherichia coli of the gene complex encoding the polysaccharide capsule of Neisseria meningitidis group B. Proc Natl Acad Sci USA 86, 1669-1673.

Guidolin, A., Morona, J. K., Morona, R., Hansman, D. & Paton, J. C. (1994). Nucleotide sequence analysis of genes essential for capsular polysaccharide biosynthesis in Streptococcus pneumoniae type 19F. Infect Immun 62, 5384-5396.

Hobbs, M. & Reeves, P. R. (1994). The JUMPstart sequence: a 39 bp element common to several polysaccharide gene clusters. Mol Microbiol 12, 855-856.

Jiang, X. M., Neal, B., Santiago, F., Lee, S. J., Romana, L. K. & Reeves, P. R. (1991). Structure and sequence of the rfb (O antigen) gene cluster of Salmonella serovar typhimurium (strain LT2). Mol Microbiol 5, 695-713.

Koplin, R., Wang, G., Hotte, B., Priefer, U. B. & Puhler, A. (1993). A 3·9 kb DNA region of Xanthomonas campestris pv. campestris that is necessary for lipopolysaccharide production encodes a set of enzymes involved in the synthesis of dTDP-rhamnose. J Bacteriol 175, 7786-7792.

Kroll, J. S. & Moxon, E. R. (1990). Capsulation in distantly related strains of Haemophilus influenzae type b: genetic drift and gene transfer at the capsulation locus. J Bacteriol 172, 1347-1379.

Kroll, J. S., Zamze, S., Loynds, B. & Moxon, E. R. (1989). Common organization of chromosomal loci for production of different capsular polysaccharides in Haemophilus influenzae. J Bacteriol 174, 3343-3347.

Kuhn, H. M., Meier, U. & Mayer, H. (1984). ECA, das gemeinsame Antigen der Enterobacteriaceae – Stiefkind der Mikrobiologie. Forum Microbiol 7, 274-285.

Liu, D., Verma, N. K., Romana, L. K. & Reeves, P. R. (1991). Relationships among the rfb regions of Salmonella serovars A, B, and D. J Bacteriol 173, 4814-4819.

Lüderitz, O., Staub, A. M. & Westphal, O. (1966). Immunochemistry of O and R antigens of Salmonella and related Enterobacteriaceae. Bacteriol Rev 30, 192-255.

Marolda, C. L., Feldman, M. F. & Valvano, M. A. (1999). Genetic organization of the O7-specific lipopolysaccharide biosynthesis cluster of Escherichia coli VW187 (O7:K1). Microbiology 145, 2485-2495.

Mitchison, M., Bulach, D. M., Vinh, T., Rajakumar, K., Faine, S. & Adler, B. (1997). Identification and characterization of the dTDP-rhamnose biosynthesis and transfer genes of the lipopolysaccharide-related rfb locus in Leptospira interrogans serovar Copenhageni. J Bacteriol 179, 1262-1267.

Nelson, K. & Selander, R. K. (1992). Evolutionary genetics of the proline permease gene (putP) and the control region of the proline utilization operon in populations of Salmonella and Escherichia coli. J Bacteriol 174, 6886-6895.

Nelson, K. & Selander, R. K. (1994). Intergenic transfer and recombination of the 6-phosphogluconate dehydrogenase gene (gnd) in enteric bacteria. Proc Natl Acad Sci USA 91, 10227-10231.

Nelson, K., Whittam, T. S. & Selander, R. K. (1991). Nucleotide polymorphism and evolution in the glyceraldehyde-3-phosphate dehydrogenase gene (gapA) in natural populations of Salmonella and Escherichia coli. Proc Natl Acad Sci USA 88, 6667-6671.

Ochman, H. & Lawrence, J. G. (1996). Phylogenetics and the amelioration of bacterial genomes. In Escherichia coli and Salmonella typhimurium: Cellular and Molecular Biology, pp. 2627–2648. Edited by F. C. Neidhardt and others. Washington, DC: American Society for Microbiology.

Ornellas, E. P. & Stocker, B. A. D. (1974). Relation of lipopolysaccharide character to P1 sensitivity in Salmonella typhimurium. Virology 60, 491-502.

Popoff, M. Y. & Le Minor, L. (1985). Expression of antigenic factor O:54 is associated with the presence of a plasmid in Salmonella. Ann Inst Pasteur Microbiol 136B, 169-179.

Popoff, M. Y. & Le Minor, L. (1992). Antigenic Formulas of the Salmonella Serovars, 6th revision. Paris: WHO Collaborating Centre for Reference and Research on Salmonella, Institut Pasteur.

Popoff, M. Y. & Le Minor, L. (1997). Antigenic Formulas of the Salmonella Serovars, 7th revision. Paris: WHO Collaborating Centre for Reference and Research on Salmonella, Institut Pasteur.

Rajakumar, K., Jost, B. H., Sasakawa, C., Okada, N., Yoshikawa, M. & Adler, B. (1994). Nucleotide sequence of the rhamnose biosynthetic operon of Shigella flexneri 2a and role of lipopolysaccharide in virulence. J Bacteriol 176, 2362-2373.

Reeves, P. R. (1993). Evolution of Salmonella O antigen variation by interspecific gene transfer on a large scale. Trends Genet 9, 17-22.

Reeves, P. R. (1995). Role of O-antigen variation in the immune response. Trends Microbiol 3, 381-386.

Reeves, P. R. (1997). Specialized clones and lateral transfer in pathogens. In Ecology of Pathogenic Bacteria: Molecular and Evolutionary Aspects, pp. 237–254. Edited by B. A. M. van der Zeijst and others. Amsterdam: Elsevier.

Reeves, P. R., Farnell, L. & Lan, R. (1994). MULTICOMP: a program for preparing sequence data for phylogenetic analysis. Comput Appl Biosci 10, 281-284.

Reeves, P. R., Hobbs, M., Valvano, M. & 8 other authors (1996). Bacterial polysaccharide synthesis and gene nomenclature. Trends Microbiol 4, 495-503.

Roberts, I. S. (1996). The biochemistry and genetics of capsular polysaccharide production in bacteria. Annu Rev Microbiol 50, 285-315.

Saitou, N. & Nei, M. (1987). The neighbour-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol 4, 406-425.

Schmidt, G., Jann, B. & Jann, K. (1974). Genetic and immunochemical studies on Escherichia coli O14:K7:H-. Eur J Biochem 42, 303-309.

Selander, R. K., Li, J. & Nelson, K. (1996). Evolutionary genetics of Salmonella enterica. In Escherichia coli and Salmonella: Cellular and Molecular Biology, 2nd edn, pp. 2691–2707. Edited by F. C. Neidhardt and others. Washington, DC: American Society for Microbiology.

Sharp, P. M. (1991). Determinants of DNA sequence divergence between Escherichia coli and Salmonella typhimurium: codon usage, map position, and concerted evolution. J Mol Evol 33, 23-33.

Smith, J. M. (1992). Analysing the mosaic structure of genes. J Mol Evol 34, 126-129.

Stephens, J. C. (1985). Statistical methods of DNA sequence analysis: detection of intragenic recombination or gene conversion. Mol Biol Evol 2, 539-556.

Stevenson, G., Neal, B., Liu, D., Hobbs, M., Packer, N. H., Batley, M., Redmond, J. W., Lindquist, L. & Reeves, P. R. (1994). Structure of the O-antigen of E. coli K-12 and the sequence of its rfb gene cluster. J Bacteriol 176, 4144-4156.

Thampapillai, G., Lan, R. & Reeves, P. R. (1994). Molecular evolution in the gnd locus of Salmonella enterica. Mol Biol Evol 11, 813-828.

Tsukioka, Y., Yamashita, Y., Oho, T., Nakano, Y. & Koga, T. (1997). Biological function of the dTDP-rhamnose synthesis pathway in Streptococcus mutans. J Bacteriol 179, 1126-1134.

Vinogradov, E. V., Shashkov, A. S., Knirel, Y. A., Kochetkov, N. K., Dabrowski, J., Grosskurth, H., Stanislavsky, E. S. & Kholodkova, E. V. (1992). The structure of the O-specific polysaccharide chain of the lipopolysaccharide of Salmonella arizonae O61. Carbohydr Res 231, 1-11.

Vinogradov, E. V., Knirel, Y. A., Kochetkov, N. K., Schlecht, S. & Mayer, H. (1994). The structure of the O-specific polysaccharide of Salmonella arizonae O62. Carbohydr Res 253, 101-110.

Viret, J.-F., Cryz, S. J., Jr, Lang, A. B. & Favre, D. (1993). Molecular cloning and characterization of the genetic determinants that express the complete Shigella serotype D (Shigella sonnei) lipopolysaccharide in heterologous live attenuated vaccine strains. Mol Microbiol 7, 239-252.

Wang, L., Romana, L. K. & Reeves, P. R. (1992). Molecular analysis of a Salmonella enterica group E1 rfb gene cluster: O antigen and the genetic basis of the major polymorphism. Genetics 130, 429-443.

Wilkinson, R. G., Gemski, P., Stocker, J. & Stocker, B. A. D. (1972). Non-smooth mutants of Salmonella typhimurium: differentiation by phage sensitivity and genetic mapping. J Gen Microbiol 70, 527-554.

Xiang, S.-H. (1995). Variation in rfb gene clusters of Salmonella enterica and origin of group D2. PhD thesis, University of Sydney.

Xiang, S.-H., Haase, A. M. & Reeves, P. R. (1993). Variation of the rfb gene clusters in Salmonella enterica. J Bacteriol 175, 4877-4884.

Received 16 February 2000; revised 1 May 2000; accepted 31 May 2000.