The Influence of Mutation, Recombination, Population History, and Selection on Patterns of Genetic Diversity in Neisseria meningitidis

K. A. Jolley*, D. J. Wilson†, P. Kriz‡, G. McVean† and M. C. J. Maiden*

* Peter Medawar Building for Pathogen Research and Department of Zoology, University of Oxford, Oxford, UK; † Peter Medawar Building for Pathogen Research and Department of Statistics, University of Oxford, Oxford, UK; and ‡ National Reference Laboratory for Meningococcal Infections, National Institute of Public Health, Prague, Czech Republic

Correspondence: E-mail: martin.maiden@zoo.ox.ac.uk.


    Abstract
Patterns of genetic diversity within populations of human pathogens, shaped by the ecology of host-microbe interactions, contain important information about the epidemiological history of infectious disease. Exploiting this information, however, requires a systematic approach that distinguishes the genetic signal generated by epidemiological processes from the effects of other forces, such as recombination, mutation, and population history. Here, a variety of quantitative techniques were employed to investigate multilocus sequence information from isolate collections of Neisseria meningitidis, a major cause of meningitis and septicemia worldwide. This allowed quantitative evaluation of alternative explanations for the observed population structure. A coalescent-based approach was employed to estimate the rate of mutation, the rate of recombination, and the size distribution of recombination fragments from samples of disease-associated and carried meningococci obtained in the Czech Republic in 1993 and from a collection of disease-associated isolates obtained worldwide between 1937 and 1996. The parameter estimates were used to reject a model in which genetic structure arose by chance in small populations, and analysis of molecular variation showed that geographically restricted gene flow was unlikely to be the cause of the genetic structure. The genetic differentiation between disease and carriage isolate collections indicated that, whereas certain genotypes were overrepresented among the disease-isolate collections (the "hyperinvasive" lineages), disease-associated and carried meningococci exhibited remarkably little differentiation at the level of individual nucleotide polymorphisms. In combination, these results indicated the repeated action of natural selection on meningococcal populations, possibly arising from the coevolutionary dynamic of host-pathogen interactions.

Key Words: genetic diversity • multilocus sequence typing • Neisseria meningitidis • nucleotide polymorphisms


    Introduction
Neisseria meningitidis, the meningococcus, is a globally distributed cause of bacterial meningitis and septicemia (Rosenstein et al. 2001). Despite its reputation as an aggressive pathogen (van Deuren, Brandtzaeg, and van der Meer 2000), this encapsulated gram-negative bacterium is a common commensal of the upper respiratory tract of humans that causes disease only infrequently, relative to its carriage prevalence of between 5% and 40% in human populations (Wenzel et al. 1973; Broome 1986). Invasive disease is not a route for transmission between hosts, and the meningococcus appears in this respect to be an "accidental pathogen" (Maiden 2002) that derives no long-term evolutionary benefit from the pathology that it causes (Levin 1996). The meningococcus is both antigenically and genetically highly diverse, but comparisons of disease-associated and carried meningococci have established that most invasive disease is caused by a minority of serological groups, as defined by the expression of particular capsular polysaccharides, and genotypes, as defined by combinations of alleles at housekeeping loci distributed around the chromosome (Caugant 1998; Maiden et al. 1998).

Multilocus sequence typing (MLST) (Maiden et al. 1998), a development of multilocus enzyme electrophoresis (Selander et al. 1986), is currently the most widely employed approach to cataloging genetic variation in the meningococcus. This nucleotide sequence–based technique examines seven housekeeping gene fragments, approximately 450 bp in length, derived from loci distributed around the single haploid chromosome. Each unique sequence is assigned an allele number in order of discovery, and each combination of seven allele numbers a sequence type (ST; equivalent to haplotype) number for identification. Surveys of meningococcal isolate collections have revealed clusters of related STs, referred to as clonal complexes. These clusters are thought to correspond to lineages of bacteria that share a recent common ancestor (Urwin and Maiden 2003).
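The numbering scheme described above can be illustrated with a minimal sketch. It assumes local, in-memory numbering in order of first appearance; in practice allele and ST numbers are assigned by reference to the curated database, so the numbers produced here are purely illustrative. The function name assign_types is hypothetical, and the locus list is the standard Neisseria MLST set (adk, aroE, fumC, and pdhC are named in the text; the remaining names are taken from the standard scheme).

```python
# Illustrative sketch only: local numbering in order of discovery,
# not the curated numbering used by the MLST database.
LOCI = ("abcZ", "adk", "aroE", "fumC", "gdh", "pdhC", "pgm")


def assign_types(isolates, loci=LOCI):
    """Assign allele numbers and sequence types (STs) to isolates.

    isolates: list of dicts mapping locus name -> nucleotide sequence.
    Returns a list of (allele_profile, st_number) tuples, one per isolate.
    """
    allele_ids = {locus: {} for locus in loci}  # sequence -> allele number
    st_ids = {}                                 # allele profile -> ST number
    results = []
    for isolate in isolates:
        profile = []
        for locus in loci:
            table = allele_ids[locus]
            seq = isolate[locus]
            if seq not in table:
                # Each unique sequence receives the next free allele number.
                table[seq] = len(table) + 1
            profile.append(table[seq])
        profile = tuple(profile)
        if profile not in st_ids:
            # Each unique combination of allele numbers receives a new ST.
            st_ids[profile] = len(st_ids) + 1
        results.append((profile, st_ids[profile]))
    return results
```

Two isolates with identical sequences at all seven loci receive the same ST, whereas a single nucleotide difference at any one locus produces a new allele number and therefore a new ST.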

Analyses of MLST data from isolates from both carriage and disease collections have identified a number of apparent paradoxes. Certain meningococcal STs, especially those belonging to the ST-1, ST-4, ST-5, ST-8, ST-11, ST-32, and ST-41/44 clonal complexes, are overrepresented in collections of disease-associated isolates relative to their frequency in collections of asymptomatically carried isolates. These meningococci, some of which have persisted over periods of decades and during global spread (Caugant 1998; Maiden et al. 1998), are referred to as "hyperinvasive." Given that pathogenicity does not aid meningococcal transmission, this persistence is unexpected. That is, the hyperinvasive lineages would be expected to die out quickly unless there is some other force acting; for example, if increased pathogenicity is an unavoidable consequence of increased carriage transmission efficacy. Furthermore, phylogenetic analyses have demonstrated evidence for frequent recombination in meningococcal populations (Feil et al. 1999; Holmes, Urwin, and Maiden 1999; Jolley et al. 2000). In the presence of frequent recombination, the high frequency of certain genotypes, as represented by STs and clonal complexes, is also unexpected, because recombination should act to introduce genetic novelty into existing genotypes.

The persistence of hyperinvasive lineages and the maintenance of clonal complexes in the presence of frequent recombination suggest that genetic diversity within meningococcal populations is repeatedly structured by selective forces. However, under simple population genetic models of recombination, mutation, and genetic drift, genetic structuring can occur simply by chance. Furthermore, geographically restricted gene flow can lead to structuring of genetic data, as would happen if transmission between hosts is typically limited to immediate neighbors. In the absence of estimates of the rates of mutation and recombination in natural populations, and without an understanding of the influence of geography on genetic variation, inferences about selection are premature.

Here, a systematic approach is taken to test hypotheses concerning the nature of the evolutionary forces that structure genetic variation in N. meningitidis populations. Three meningococcal isolate collections were analyzed, representing (1) isolates obtained from the carriage population in the Czech Republic, (2) a temporally and geographically equivalent set of isolates from individuals with meningococcal disease, and (3) the global diversity of meningococcal disease isolates in the latter part of the twentieth century. First, a coalescent approach was employed to estimate underlying parameters of mutation and recombination, providing estimates of the relative contribution of mutation and recombination to patterns of meningococcal diversity, and the size distribution of recombination fragments. These estimates allowed rejection of the null model, in which structure was a consequence of purely neutral processes. Furthermore, there was no evidence for differentiation among the carried meningococcal isolates found in different regions of the Czech Republic, so limitations in gene flow could not explain the observed level of genetic structuring, at least in this data set. Finally, although there were marked differences in sequence type and alleles at each locus between the Czech disease-associated and carried meningococci, these isolate collections exhibited very little differentiation at the level of individual nucleotide polymorphisms. In combination, these results demonstrated that the disease-associated and carried isolate collections represented different combinations of polymorphisms from a common gene pool, rather than different gene pools. In conclusion, we suggest that patterns of diversity in N. meningitidis may reflect a dynamic in which hyperinvasive lineages arise repeatedly and spread to moderate frequency through an increased transmission advantage but persist in a given transmission system for only short periods of evolutionary time.


    Methods
Bacterial Isolates
Three isolate collections were examined. The first collection consisted of 217 carried meningococci obtained from unrelated 15-year-old to 24-year-old individuals in the Czech Republic during 1993 (Jolley et al. 2000, 2002). Although there was an outbreak of meningococcal disease in this country during the early 1990s (Krizova and Musilek 1995), none of the isolates in this collection originated from an individual suffering from, or who had contact with a case of, invasive meningococcal disease. The second collection was a previously unpublished collection, comprising 53 of the 55 meningococci isolated from cases of invasive disease and submitted to the Czech National Reference Laboratory for Meningococcal Infections during 1993. This collection was geographically and temporally coincident with the isolates obtained from asymptomatic carriers. The third collection included 107 mainly disease-associated meningococci, isolated over a period spanning 1937 to 1996 from a variety of locations around the world. This collection was assembled with an intentional overrepresentation of meningococci belonging to the major disease-associated genotypes described by 1996 (Maiden et al. 1998).

Genetic Characterization
Each of the isolates from all three collections was characterized by multilocus sequence typing (MLST) employing previously published methods (Maiden et al. 1998; Holmes, Urwin, and Maiden 1999). Briefly, nucleotide sequences of seven fragments of housekeeping genes were determined on each strand, and each unique sequence was assigned an arbitrary "allele" number by reference to the Neisseria MLST database (http://pubmlst.org/neisseria/) (Jolley, Chan, and Maiden 2004). The combination of allele numbers for all seven loci of a given isolate was assigned an arbitrary sequence type (ST) number. Each ST was equivalent to a unique haplotype. The MLST data for the Czech carriage and global-disease–isolate collections have been published previously (Maiden et al. 1998; Jolley et al. 2000, 2002); the Czech disease-associated isolates are reported here for the first time.

Estimation of Population Mutation and Recombination Rates
Levels of genetic variation and the degree of association between alleles at different loci, or linkage disequilibrium (LD), are generated by the interaction between population history, mutation, and recombination. Under simple models of population history, the key parameters that determine patterns of variation are not the mutation and recombination rates themselves, but their products with the effective population size, expressed as the compound parameters θ = 2Neµ and ρ = 2Ner, where Ne is the effective population size and µ and r are the per-site, per-generation (in this case, per-transmission-cycle) rates of mutation and recombination. Estimates of the scaled parameters can be obtained from empirical data using a variety of statistical methods. The moment estimator of Watterson (1975) was employed to estimate θ, and a composite likelihood method, referred to as LDhat (McVean, Awadalla, and Fearnhead 2002), was employed to estimate ρ and the recombination tract length (the LDhat software is available from http://www.stats.ox.ac.uk/~mcvean). For collections with more than 100 sequences, multiple random subsets of 100 sequences were analyzed for computational tractability and the average across subsets reported. In addition, nonparametric, permutation-based tests implemented in LDhat were used to test the significance of the evidence for recombination (McVean, Awadalla, and Fearnhead 2002).
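As an illustration of the moment estimator, the sketch below computes a per-site Watterson estimate from aligned sequences of equal length. It assumes the simple infinite-sites form, in which the estimate is S / (a_n L), with S the number of segregating sites, L the alignment length, and a_n the sum of 1/i for i from 1 to n - 1. The function names are hypothetical, and this is not the code used in the study.

```python
def count_segregating_sites(seqs):
    """Number of alignment columns showing more than one nucleotide."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def watterson_theta_per_site(seqs):
    """Watterson's moment estimator of theta, per site.

    seqs: list of aligned sequences (strings of equal length).
    Assumes the infinite-sites form: theta = S / (a_n * L).
    """
    n = len(seqs)
    if n < 2:
        raise ValueError("at least two sequences are required")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    a_n = sum(1.0 / i for i in range(1, n))  # harmonic correction for sample size
    return count_segregating_sites(seqs) / (a_n * length)
```

For a sample of three sequences of four sites with two segregating sites, a_n is 1.5 and the per-site estimate is 2 / (1.5 × 4), or one third.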

In bacteria, recombination typically occurs by the nonreciprocal incorporation of short DNA fragments. Over short physical distances, such recombination generates patterns of LD similar to those observed over short ranges in species with reciprocal crossover. However, over longer physical distances, the predictions of the two models are different in that LD in bacteria will tend to asymptote to a nonzero level, whereas LD between two distant loci in organisms with crossing over will tend to zero (McVean, Awadalla, and Fearnhead 2002). This difference gives power to estimate the average size of DNA fragments incorporated during recombination events. The fragment size incorporated was assumed to be drawn from a geometric distribution following Falush et al. (2001), and the parameter of the distribution (the average size of incorporated fragments, t) that maximized the composite likelihood in an analysis of all seven loci was found. For distant loci, the key parameter in determining levels of LD is the product of the per-site population recombination rate and the average size of the DNA fragment incorporated in recombination events, 2Nert. The average tract length was estimated from a concatenation of 3,284 bp from the Czech carriage population, comprising all seven gene fragments assembled in the order in which they occurred in the genome sequence of N. meningitidis Z2491 and in which the physical distance between loci was preserved (Parkhill et al. 2000).
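The plateau in LD under nonreciprocal recombination can be sketched numerically. Assuming tract lengths are geometrically distributed on 1, 2, 3, ... with mean t, a tract that covers one site fails to reach a second site d bp away with probability 1 - (1 - 1/t)^d. The rate at which two sites are separated therefore rises with distance toward a constant maximum rather than increasing without limit, as it would under crossing over. The function below returns this fraction of the plateau; its name is hypothetical, and the derivation is an assumption of this sketch rather than a description of the LDhat implementation.

```python
def separation_fraction(distance_bp, mean_tract_bp):
    """Fraction of the long-distance (plateau) separation rate reached
    by two sites distance_bp apart, assuming geometric tract lengths
    with mean mean_tract_bp (an assumption of this sketch)."""
    if mean_tract_bp < 1:
        raise ValueError("mean tract length must be at least 1 bp")
    if distance_bp < 0:
        raise ValueError("distance must be non-negative")
    continue_prob = 1.0 - 1.0 / mean_tract_bp  # chance a tract extends one more site
    return 1.0 - continue_prob ** distance_bp


# Sites within one MLST fragment versus sites at distant loci,
# using the 1.1-kb mean tract length reported in the text.
within_locus = separation_fraction(450, 1100)
between_loci = separation_fraction(100_000, 1100)
```

Under these assumptions, sites within a single fragment of about 450 bp experience only a fraction of the maximum separation rate, whereas distant loci are effectively at the plateau. This contrast is the source of information on the average tract length.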

Testing the Neutral Model
Estimates of the population mutation rate, population recombination rate, and average recombination fragment size were obtained under the assumption of a simple neutral model. To assess whether the patterns of genetic diversity observed differed significantly from the expectations of the standard neutral model, the data were summarized, for each locus and for the combined set, in terms of the Tajima D statistic (Tajima 1989), which measures departures from neutrality in the frequency spectrum of polymorphic sites. The Tajima D statistic is constructed such that it has an approximately normal distribution; however, significance levels were obtained by simulation using the estimated mutation and recombination rates (table 2).
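For reference, the Tajima D statistic can be computed from an alignment as in the sketch below, which follows the standard published formula (Tajima 1989). The function name is hypothetical. The sketch returns only the statistic; the significance levels reported here came from coalescent simulation with recombination, which is not reproduced.

```python
from itertools import combinations
from math import sqrt


def tajima_d(seqs):
    """Tajima's D for a list of aligned sequences of equal length."""
    n = len(seqs)
    if n < 4:
        raise ValueError("at least four sequences are required")
    # S: number of segregating sites.
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if s == 0:
        raise ValueError("no segregating sites; D is undefined")
    # pi: mean number of pairwise differences over all pairs of sequences.
    diffs = [sum(x != y for x, y in zip(a, b)) for a, b in combinations(seqs, 2)]
    pi = sum(diffs) / len(diffs)
    # Constants from Tajima (1989).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))
```

Samples dominated by intermediate-frequency variants give positive values, and samples dominated by rare variants give negative values.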


Table 2 Estimates of Recombination and Mutation Rates

 
To assess the level of genetic structuring, we compared the number of STs (i.e., unique haplotypes) observed in the empirical data from the Czech carriage population to the distribution expected under the neutral model (Fu 1996), using the estimates of per-locus population mutation rate, population recombination rate, and the average tract length of recombination events obtained previously. Monte Carlo coalescent simulations were performed using software written by G.Mc.V. and available on request.

Geographical Structuring Within the Population
The degree of genetic variation among carried meningococcal isolates sampled from different locations within the Czech Republic was quantified by Wright's statistic FST, which is defined as the correlation between alleles within a subpopulation relative to alleles within the total population (Wright 1943, 1951; Balding 2003). Isolates were sampled from seven locations within the Czech Republic (Jolley et al. 2000), but given the uneven distribution of numbers of isolates among sampling centers, these isolates were pooled into three geographic regions covering the whole country. Region A comprised Praha, Plzeň, and Hradec Králové; region B comprised České Budějovice and Kutná Hora; and region C comprised Olomouc and Opava. Analysis of molecular variation (AMOVA) (Excoffier, Smouse, and Quattro 1992), implemented in the software package ARLEQUIN version 2.0 (Schneider, Roessli, and Excoffier 2000), was used to assess the significance of the observed values of FST by means of a permutation test. The analysis was performed separately for each of the seven loci.
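The logic of the permutation test can be sketched as follows. This is not the AMOVA procedure implemented in ARLEQUIN; it assumes a simpler heterozygosity-based estimator, FST = (HT - HS) / HT, computed from allele numbers at a single locus, with region labels shuffled among isolates to generate the null distribution. All function names are hypothetical.

```python
import random
from collections import Counter


def heterozygosity(alleles):
    """Gene diversity: 1 minus the sum of squared allele frequencies."""
    n = len(alleles)
    return 1.0 - sum((count / n) ** 2 for count in Counter(alleles).values())


def fst(groups):
    """Heterozygosity-based FST for a list of per-region allele lists."""
    pooled = [allele for group in groups for allele in group]
    h_total = heterozygosity(pooled)
    if h_total == 0.0:
        return 0.0  # no variation at this locus
    # Within-region diversity, weighted by sample size.
    h_within = sum(len(g) * heterozygosity(g) for g in groups) / len(pooled)
    return (h_total - h_within) / h_total


def fst_permutation_test(groups, n_perm=1000, seed=0):
    """Return (observed FST, one-sided permutation P value)."""
    rng = random.Random(seed)
    observed = fst(groups)
    pooled = [allele for group in groups for allele in group]
    sizes = [len(group) for group in groups]
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign isolates to regions at random
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if fst(shuffled) >= observed - 1e-12:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)
```

Applied to each locus in turn, with one list of allele numbers per region, a P value near 1 indicates that the observed regional differentiation is no greater than expected from random assignment of isolates to regions.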

Assessing Differentiation Between Disease and Carriage Populations
We wished to assess both the nature and extent of genetic differentiation between disease and carriage isolates in the Czech sample. At one extreme, disease and carriage populations may have near-identical distributions of STs (unique haplotypes across all loci), locus haplotypes (unique haplotypes at a single MLST locus), and individual polymorphisms. At the other extreme, disease and carriage samples may represent entirely differentiated populations that share no common polymorphisms. To assess differentiation, it is possible to compare the frequency of STs, locus haplotypes, and SNPs between the populations. A simple metric of differentiation, which we refer to as the classification index, was devised:

where pij is the frequency of haplotype or polymorphism i in population j, and p̄i is the average frequency of haplotype or polymorphism i across the populations. When two populations have identical haplotype or polymorphism frequencies, the statistic is 0, and when there is no overlap between populations, the statistic is 1. For the case of biallelic loci, this is equivalent to FST, and for multiallelic loci or haplotype data, the statistic is more sensitive to differentiation than FST, as it compares allele/haplotype with allele/haplotype rather than a summary of diversity such as homozygosity. The statistic can intuitively be thought of as the probability of correctly classifying an allele into one population where classification is based on the difference in allele frequency between the populations. Whether the statistic was significantly different from 0 was assessed through permutation. Because of finite sample size, the statistic is expected to be positive under the null hypothesis of no differentiation, an effect that is greater for multiallelic loci. We, therefore, bias-corrected the statistic by subtracting the expectation under the null hypothesis as assessed by permutation. Note that this does not affect significance levels.
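The permutation-based bias correction can be sketched for two populations. The form of the index used below is an assumption, chosen because it satisfies the three properties stated above (0 for identical frequencies, 1 for no overlap, and equality with FST at a biallelic locus): C = Σi (pi1 - pi2)² / [2(pi1 + pi2)]. It should not be read as the authors' expression, and the function names are hypothetical.

```python
import random
from collections import Counter


def classification_index(pop1, pop2):
    """Assumed two-population index: sum_i (p_i1 - p_i2)^2 / (2 (p_i1 + p_i2)).

    pop1, pop2: lists of allele, haplotype, or ST labels, one per isolate.
    """
    counts1, counts2 = Counter(pop1), Counter(pop2)
    n1, n2 = len(pop1), len(pop2)
    total = 0.0
    for label in set(counts1) | set(counts2):
        p1 = counts1[label] / n1
        p2 = counts2[label] / n2
        total += (p1 - p2) ** 2 / (2.0 * (p1 + p2))
    return total


def bias_corrected_index(pop1, pop2, n_perm=1000, seed=0):
    """Return (bias-corrected index, one-sided permutation P value)."""
    rng = random.Random(seed)
    observed = classification_index(pop1, pop2)
    pooled = list(pop1) + list(pop2)
    n1 = len(pop1)
    null_values = []
    for _ in range(n_perm):
        rng.shuffle(pooled)  # break any association with population label
        null_values.append(classification_index(pooled[:n1], pooled[n1:]))
    exceed = sum(1 for value in null_values if value >= observed - 1e-12)
    p_value = (exceed + 1) / (n_perm + 1)
    # Subtract the null expectation; the P value is unaffected by this shift.
    corrected = observed - sum(null_values) / n_perm
    return corrected, p_value
```

Because the same constant is subtracted from the observed value and from every permuted value, the rank of the observed statistic within the null distribution is unchanged, which is why the correction leaves significance levels unaffected.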


    Results
Genetic Diversity of Meningococcal Isolate Collections
The nucleotide sequence data are available at http://pubmlst.org/neisseria/. The 153 STs identified were unevenly distributed among the isolate collections, with only ST-11 present in all three data sets. The analyzed subsets of these data contained between 10 and 24 alleles per locus, with the number of segregating sites per locus ranging among loci from 16 (adk, Czech carriage collection) to 166 (aroE, global-disease collection). Levels of diversity were consistent for a given locus across the three isolate collections, indicating similar effective population sizes. Similar results were obtained with several different subsets of the data. Tajima's D statistic (Tajima 1989) was estimated for each locus and for the 3,284-bp concatenated sequences (table 1).


Table 1 Genetic Diversity of Loci Analyzed

 
Estimates of Recombination and Mutation
On the basis of the number of segregating sites, the population mutation rate (θ = 2Neµ) was estimated to be between 6.80 × 10⁻³ and 80.50 × 10⁻³ per site; these estimates were consistent within loci across the three data sets (table 2). In all isolate collections, there was highly significant evidence for recombination at all loci with the exception of the least-diverse locus, adk, for which none of the estimates of recombination were statistically significant. Estimates of the population recombination rate (ρ = 2Ner) ranged from 2.70 × 10⁻³ to 34.04 × 10⁻³ per site. These estimates were used to calculate the ratio of recombination to mutation rates (r/µ), which varied from 0.08 to 2.66 across the three isolate collections and from 0.16 to 1.83 within the Czech carriage data set. As with the per-locus analyses, estimates of the per-site population mutation rate 2Neµ from the concatenated sequences varied little among the isolate collections, with values of 29.10 × 10⁻³, 30.50 × 10⁻³, and 30.59 × 10⁻³ for the carriage, Czech disease, and global-disease collections, respectively.

Estimates of Recombination Fragment Size
In analysis of the Czech carriage collection, the rate at which distant loci were separated by gene-conversion events was estimated as 2Nert = 28.2, and the average tract length was estimated to be 1.1 kb (fig. 1). Within LDhat, assessing confidence intervals was problematic because of nonindependence. However, because the log composite likelihood surface could be approximated as a multiple of the true log-likelihood surface, tract lengths of 500 bp and 2.5 kb could be identified as equally less likely than the estimate of 1.1 kb. Moreover, 500 bp, the approximate size of the loci examined, was probably a lower bound on the average fragment size; otherwise, LD would not decay consistently within loci (data not shown).



FIG. 1.— Relative log composite likelihood curve for estimation of the average tract length of recombination events. The fitted model assumes a geometric distribution of tract lengths, which is estimated to have an average of 1.1 kb.

 
Testing the Neutral Model
The hypothesis that patterns of genetic variation in the samples reflected chance events, arising from the interaction of mutation and recombination in a population of constant size and in the absence of natural selection, was examined with two tests of departure from the neutral model: the Tajima D statistic, which summarizes the allele frequency spectrum of biallelic polymorphisms, and the difference between the observed and expected number of STs. The values of Tajima's D statistic for each sample collection are shown in table 1. Across loci, and in all sample collections (Czech carriage, Czech disease, and global disease), there was a tendency for positive statistics, indicative of a dearth of low-frequency variants; however, only for pdhC was the statistic significantly different from that expected under the coalescent accounting for recombination. Using concatenated sequences, only the statistic for Czech carriage was significant (P < 0.01). In contrast, the number of sequence types observed in the Czech carriage population (50) was significantly lower than expected (fig. 2) under the neutral model for the mutation and recombination parameters estimated. The positive, genome-wide Tajima D statistic was indicative of a low level of population structure (arising from either geographical restriction of gene flow or selective processes such as balancing selection) or a recent, weak (or strong but old) population bottleneck. Population growth or complete selective sweeps would, in contrast, tend to generate negative Tajima D values. Likewise, the relative deficit of STs implies that genetic structure is stronger than expected under the neutral model, again as might be expected under models with geographical restriction of gene flow, selective processes, or population bottlenecks.



FIG. 2.— Comparison of the observed number of STs in the Czech sample of carried meningococci to the distribution expected under the neutral coalescent model using parameters estimated from the data.

 
Estimates of Geographic Differentiation
To test the hypothesis that geographically restricted gene flow was the cause of the structuring observed in meningococcal diversity, the evidence for differentiation between the isolates obtained from the seven sampling sites in the Czech carriage collection (Jolley et al. 2000) was assessed. FST values were not significantly different from 0 at any locus except fumC, and the fumC result was not significant after Bonferroni correction (table 3). This indicates that geographically restricted gene flow could not explain the observed level of genetic structuring.


Table 3 Geographic Differentiation Among Carried Meningococci in the Czech Republic in 1993

 
Differentiation Between Disease and Carriage Samples
The classification index was calculated for comparisons of the Czech carriage and disease-isolate collections for STs, locus haplotypes, and nucleotide polymorphisms. Strong differentiation was observed at the ST level, with weaker differentiation at the level of locus haplotypes, and low or no significant differentiation at the nucleotide-polymorphism level (table 4). These results indicated that disease and carriage isolate collections represented significantly different sets of STs, as established by the existence of hyperinvasive lineages and, to a lesser extent, haplotypes at individual loci; however, individual polymorphisms were not only shared between disease and carriage populations but also occurred at very similar frequencies. In conclusion, disease and carriage populations, although significantly differentiated, represented different combinations of individual polymorphisms from a common gene pool.


Table 4 Differentiation Between the Czech Carriage and Disease Isolate Collections

 

    Discussion
A systematic analysis of the factors influencing genetic diversity in the pathogen N. meningitidis has been undertaken. The estimated parameters for mutation and recombination demonstrated that patterns of diversity, as measured by absolute levels of polymorphism, allele frequency spectra, and linkage disequilibrium, were very similar in collections of disease-associated and carried meningococci, despite the dominance of the disease-associated isolate collections by hyperinvasive lineages. In addition, the rates of mutation and recombination, scaled by effective population size, and the average size of DNA fragments incorporated by recombination were estimated. Although there was no strong signal of major changes in population size, as demonstrated by the largely nonsignificant Tajima D statistics, the presence of a few high-frequency STs and their associated clonal complexes in the carriage population was incompatible with the level of recombination observed. The analysis of the geographical distribution of diversity demonstrated that this structuring could not be explained by geographically restricted gene flow. We, therefore, propose that selective forces, most likely associated with the repeated origin of hyperinvasive lineages, act repeatedly to structure meningococcal populations over evolutionary time.

Structuring of Genetic Variation by Natural Selection
A number of selective forces might act to structure genetic variation. Genetic variation among isolates in factors such as transmission route or resistance to host immune genotypes would lead to cryptic stratification within populations; that is, the pathogen population could be divided into isolated subpopulations, each specializing in the colonization of different groups of hosts. In this scenario, gene flow between subpopulations would be most greatly limited at sites closely genetically linked to genes responsible for host specialization. This process could, however, exert an influence over the entire genome through hitchhiking (Maynard Smith and Haigh 1974). An extension of these ideas is that stratification is a dynamic process, emerging from the ongoing coevolution between the pathogen and the host immune system. New variants that are more efficient at transmitting (e.g., because they have novel antigenic repertoires) may spread through the population rapidly at first, but subsequently transmission rates decrease as potential hosts become resistant as a consequence of prior infection.

These ideas are related to the epidemic-clone model (Maynard Smith et al. 1993), in which genetic variation in natural pathogen populations is repeatedly structured by the origin of novel pathogen types that cause epidemics of limited duration. In the case of the meningococcus, this model is apparently inapplicable, as transmission is not disease associated (Levin and Bull 1994; Maiden 2000); however, increased disease risk, although not adaptive to isolates, could be an unavoidable consequence of increased transmission efficiency. The ST-11 complex and the other hyperinvasive lineages could, therefore, be recent in evolutionary origin and associated with increased transmission rates, perhaps through the possession of novel genotypes at antigen-encoding loci, such as the capsule operon. Such variants would tend to rapidly increase in frequency, generating a hitchhiking event at linked sites and, therefore, generating high-frequency STs. As host immunity increased at the population level, the selective advantage of the novel variant would decrease, preventing any single variant from dominating the entire population. This dynamic would lead to repeated structuring of genetic variation and differentiation between disease-associated and carrier isolates at the ST level, as reported here. The repeated and independent origin of such variants would mean that disease-associated and carriage populations would share a common gene pool at the level of nucleotide polymorphisms; this is also consistent with our findings. More extensive genome-wide analyses of variation would identify candidate genes for disease association through showing elevated levels of structuring.

Evolutionary Parameter Estimates
Estimates of the population-mutation rate, the population-recombination rate, and the average size of DNA fragments introduced by recombination enabled a comparison of the relative influence of recombination and mutation on patterns of diversity observed. The relative rate of recombination to mutation has been proposed to be of fundamental importance in determining the degree of clonal structure present in a given bacterial population (Feil et al. 1999, 2000, 2001). In terms of intralocus diversity, novel haplotypes of a locus of length L sites can be generated either by mutation, at rate µL, or by a recombination event in which at least one end of the incorporated fragment lies within the locus, at rate rL. The ratio r/µ, therefore, places an upper bound on the relative contribution of recombination to mutation in generating haplotype diversity. For the Czech carriage collection, this ratio, as calculated from the ratio of estimates, ρ/θ, ranges from 0.16 to 1.83, which indicates that recombination and mutation are of roughly equal importance in generating allelic diversity within a locus.

It is also informative to consider the relative role of recombination and mutation in generating diversity at the interlocus or genome level. A given single-nucleotide site in the genome will change by mutation at rate µ and by recombination at the rate at which gene conversion events that include that site occur, rt/2, multiplied by the probability that the incorporated site has a different nucleotide, the average pairwise difference (π). Using the Czech carriage data set, the relative impact of recombination to mutation can be estimated by multiplying π, which ranges from 0.009 for adk to 0.067 for aroE, by the tract length, t, estimated at 1.1 kb (fig. 1), and by the r/µ value for the appropriate locus divided by 2. Across loci, estimates of this ratio lie in the remarkably narrow range of 6.2 to 16.8, which is within the range of "r/m" values of 4 to 100 calculated previously for the set of 107 global-disease isolates with counting techniques (Feil et al. 1999, 2000, 2001). In summary, in terms of generating novel genomes, recombination is roughly 10 times as important as mutation.
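The arithmetic of this per-site comparison can be written out as a short sketch. The ratio of change by recombination to change by mutation at a single site is (rt/2)π / µ, which equals π × t × (r/µ) / 2, with r/µ estimated as ρ/θ. The function name is hypothetical, and the input values in the example are illustrative rather than the per-locus estimates of table 2.

```python
def per_site_recombination_to_mutation(pi, tract_length_bp, rho, theta):
    """Relative per-site impact of recombination versus mutation.

    pi: average pairwise difference per site at the locus.
    tract_length_bp: average recombination tract length, t, in bp.
    rho, theta: per-site population recombination and mutation rates,
        whose ratio estimates r/mu.
    Returns pi * t * (rho / theta) / 2.
    """
    if theta <= 0:
        raise ValueError("theta must be positive")
    r_over_mu = rho / theta
    return pi * tract_length_bp * r_over_mu / 2.0


# Illustrative inputs only (not the published per-locus estimates):
# pi = 0.02, t = 1,100 bp, rho = theta, so r/mu = 1.
example_ratio = per_site_recombination_to_mutation(0.02, 1100, 0.03, 0.03)
```

With these illustrative inputs the ratio is 0.02 × 1,100 × 1 / 2 = 11, which shows how a per-locus r/µ near 1 can still correspond to recombination changing roughly ten times as many sites as mutation.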

The average tract length in recombination events was estimated to be 1.1 kb. This is considerably less than the values of 7.6 kb (Feil et al. 2000) and 5 to 10 kb (Linz et al. 2000) estimated from direct observation. The discrepancy is most likely a result of the difference between those events that occur and those that persist over evolutionary time. Larger tracts will introduce more nucleotide differences into the existing genome and, if epistatic interactions are important, are more likely to lead to a decrease in fitness (Zhu et al. 2001) than will shorter tracts. Shorter recombination tracts are less likely to lead to fitness decreases but are also harder to detect by direct methods. It is also worth noting that the average size of coding sequences in the meningococcal genome, at 852 bp (Parkhill et al. 2000), is smaller than the current estimate of the average size of imported fragments. Consequently, many recombination events will include complete coding sequences. In other words, relative to the generation of new alleles with mosaic structure, gene replacement events should be more common in meningococci than in bacteria whose recombination fragments are smaller than the average coding sequence, such as Helicobacter pylori (Falush et al. 2001).

Origin of Hyperinvasive Lineages
The emergence of pathogenic variants within populations of commensal organisms can be rationalized when the disease syndrome contributes to the spread of the pathogen (May and Anderson 1983). It is more difficult to explain the emergence and persistence of pathogenic variants in populations of bacteria such as N. meningitidis, in which the disease syndromes do not generate opportunities for host-to-host spread and, indeed, are inimical to it (Levin and Bull 1994; Maiden 2000). As suggested above, this apparent paradox is resolved if disease is a consequence of increased transmission efficacy in recently arisen or introduced variants. The behavior of the ST-11 clonal complex is particularly illustrative of this effect. In both the Czech disease and carriage collections, the most common ST was ST-11, the principal, or "central" (Urwin and Maiden 2003), ST of the ST-11 complex (previously called the ET-37 complex), a hyperinvasive lineage that has caused disease globally for at least the past 40 years (Wang et al. 1993; Maiden et al. 1998). Previous studies indicated that these meningococci normally exhibit very low point prevalence in carriage, even during disease outbreaks (Caugant et al. 1988; Feavers et al. 1999; Kellerman et al. 2002), and before 1993 this hyperinvasive lineage was not found in the Czech Republic (Krizova and Musilek 1995). At the time of the 1993 sample, however, the Czech Republic was experiencing a major epidemic of serogroup C ST-11 complex meningococci. In short, a novel variant to which the population was naïve was sweeping through the carriage population and generating a large number of disease cases.


    Acknowledgements
This work was funded by the Wellcome Trust. Part of the work performed in the Czech Republic was supported by grant No. NI/6882-3 from the Internal Grant Agency of the Ministry of Health of the Czech Republic. M.C.J.M. is a Wellcome Trust Senior Research Fellow and G.Mc.V. is a Royal Society University Research Fellow. D.J.W. is funded by a BBSRC research studentship.


    Footnotes
 
Pierre Capy, Associate Editor


    References

    Balding, D. J. 2003. Likelihood-based inference for genetic correlation coefficients. Theor. Popul. Biol. 63:221–230.

    Broome, C. V. 1986. The carrier state: Neisseria meningitidis. J. Antimicrob. Chemother. 18(Suppl. A):25–34.

    Caugant, D. A. 1998. Population genetics and molecular epidemiology of Neisseria meningitidis. APMIS 106:505–525.

    Caugant, D. A., B. E. Kristiansen, L. O. Frøholm, K. Bovre, and R. K. Selander. 1988. Clonal diversity of Neisseria meningitidis from a population of asymptomatic carriers. Infect. Immun. 56:2060–2068.

    Excoffier, L., P. E. Smouse, and J. M. Quattro. 1992. Analysis of molecular variance inferred from metric distances among DNA haplotypes: application to human mitochondrial DNA restriction data. Genetics 131:479–491.

    Falush, D., C. Kraft, N. S. Taylor, P. Correa, J. G. Fox, M. Achtman, and S. Suerbaum. 2001. Recombination and mutation during long-term gastric colonization by Helicobacter pylori: estimates of clock rates, recombination size, and minimal age. Proc. Natl. Acad. Sci. USA 98:15056–15061.

    Feavers, I. M., S. J. Gray, R. Urwin, J. E. Russell, J. A. Bygraves, E. B. Kaczmarski, and M. C. J. Maiden. 1999. Multilocus sequence typing and antigen gene sequencing in the investigation of a meningococcal disease outbreak. J. Clin. Microbiol. 37:3883–3887.

    Feil, E. J., E. C. Holmes, D. E. Bessen et al. (12 co-authors). 2001. Recombination within natural populations of pathogenic bacteria: short-term empirical estimates and long-term phylogenetic consequences. Proc. Natl. Acad. Sci. USA 98:182–187.

    Feil, E. J., M. C. J. Maiden, M. Achtman, and B. G. Spratt. 1999. The relative contributions of recombination and mutation to the divergence of clones of Neisseria meningitidis. Mol. Biol. Evol. 16:1496–1502.

    Feil, E. J., J. Maynard Smith, M. C. Enright, and B. G. Spratt. 2000. Estimating recombinational parameters in Streptococcus pneumoniae from multilocus sequence typing data. Genetics 154:1439–1450.

    Fu, Y. X. 1996. New statistical tests of neutrality for DNA samples from a population. Genetics 143:557–570.

    Holmes, E. C., R. Urwin, and M. C. J. Maiden. 1999. The influence of recombination on the population structure and evolution of the human pathogen Neisseria meningitidis. Mol. Biol. Evol. 16:741–749.

    Jolley, K. A., M. S. Chan, and M. C. Maiden. 2004. mlstdbNet – distributed multi-locus sequence typing (MLST) databases. BMC Bioinformatics 5:86.

    Jolley, K. A., J. Kalmusova, E. J. Feil, S. Gupta, M. Musilek, P. Kriz, and M. C. Maiden. 2000. Carried meningococci in the Czech Republic: a diverse recombining population. J. Clin. Microbiol. 38:4492–4498.

    Jolley, K. A., J. Kalmusova, E. J. Feil, S. Gupta, M. Musilek, P. Kriz, and M. C. Maiden. 2002. Carried meningococci in the Czech Republic: a diverse recombining population. J. Clin. Microbiol. 40:3549–3550.

    Kellerman, S. E., K. McCombs, M. Ray et al. (11 co-authors). 2002. Genotype-specific carriage of Neisseria meningitidis in Georgia counties with hyper- and hyposporadic rates of meningococcal disease. J. Infect. Dis. 186:40–48.

    Krizova, P., and M. Musilek. 1995. Changing epidemiology of meningococcal invasive disease in the Czech Republic caused by new clone Neisseria meningitidis C:2a:P1.2(P1.5), ET-15/37. Central Eur. J. Public Health 3:189–194.

    Levin, B. R. 1996. The evolution and maintenance of virulence in microparasites. Emerg. Infect. Dis. 2:93–102.

    Levin, B. R., and J. J. Bull. 1994. Short-sighted evolution and the virulence of pathogenic microorganisms. Trends Microbiol. 2:76–81.

    Linz, B., M. Schenker, P. Zhu, and M. Achtman. 2000. Frequent interspecific genetic exchange between commensal neisseriae and Neisseria meningitidis. Mol. Microbiol. 36:1049–1058.

    Maiden, M. C. 2002. Population structure of Neisseria meningitidis. Pp. 151–170 in C. Ferreirós, M. T. Criado, and J. Vázquez, eds. Emerging strategies in the fight against meningitis: molecular and cellular aspects. Horizon Scientific Press, Wymondham, Norfolk, United Kingdom.

    Maiden, M. C. J. 2000. High-throughput sequencing in the population analysis of bacterial pathogens. Int. J. Med. Microbiol. 290:183–190.

    Maiden, M. C. J., J. A. Bygraves, E. Feil et al. (13 co-authors). 1998. Multilocus sequence typing: a portable approach to the identification of clones within populations of pathogenic microorganisms. Proc. Natl. Acad. Sci. USA 95:3140–3145.

    May, R. M., and R. M. Anderson. 1983. Epidemiology and genetics in the coevolution of parasites and hosts. Proc. R. Soc. Lond. B Biol. Sci. 219:281–313.

    Maynard Smith, J., and J. Haigh. 1974. The hitch-hiking effect of a favourable gene. Genet. Res. 23:23–35.

    Maynard Smith, J., N. H. Smith, M. O'Rourke, and B. G. Spratt. 1993. How clonal are bacteria? Proc. Natl. Acad. Sci. USA 90:4384–4388.

    McVean, G., P. Awadalla, and P. Fearnhead. 2002. A coalescent-based method for detecting and estimating recombination from gene sequences. Genetics 160:1231–1241.

    Parkhill, J., M. Achtman, K. D. James et al. (28 co-authors). 2000. Complete DNA sequence of a serogroup A strain of Neisseria meningitidis Z2491. Nature 404:502–506.

    Rosenstein, N. E., B. A. Perkins, D. S. Stephens, T. Popovic, and J. M. Hughes. 2001. Meningococcal disease. N. Engl. J. Med. 344:1378–1388.

    Schneider, S., D. Roessli, and L. Excoffier. 2000. Arlequin version 2.000: software for population genetic data analysis. University of Geneva, Geneva, Switzerland.

    Selander, R. K., D. A. Caugant, H. Ochman, J. M. Musser, M. N. Gilmour, and T. S. Whittam. 1986. Methods of multilocus enzyme electrophoresis for bacterial population genetics and systematics. Appl. Environ. Microbiol. 51:837–884.

    Tajima, F. 1989. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123:585–595.

    Urwin, R., and M. C. Maiden. 2003. Multi-locus sequence typing: a tool for global epidemiology. Trends Microbiol. 11:479–487.

    van Deuren, M., P. Brandtzaeg, and J. W. van der Meer. 2000. Update on meningococcal disease with emphasis on pathogenesis and clinical management. Clin. Microbiol. Rev. 13:144–166.

    Wang, J.-F., D. A. Caugant, G. Morelli, B. Koumaré, and M. Achtman. 1993. Antigenic and epidemiological properties of the ET-37 complex of Neisseria meningitidis. J. Infect. Dis. 167:1320–1329.

    Watterson, G. A. 1975. On the number of segregating sites. Theor. Popul. Biol. 7:256.

    Wenzel, R. P., J. A. Davies, J. R. Mitzel, and W. E. Beam Jr. 1973. Non-usefulness of meningococcal carriage-rates. Lancet 2:205.

    Wright, S. 1943. Isolation by distance. Genetics 28:114–138.

    Wright, S. 1951. The genetical structure of populations. Annal. Eugen. 15:323–354.

    Zhu, P., A. van der Ende, D. Falush et al. (16 co-authors). 2001. Fit genotypes and escape variants of subgroup III Neisseria meningitidis during three pandemics of epidemic meningitis. Proc. Natl. Acad. Sci. USA 98:5234–5239.

Accepted for publication October 26, 2004.