Evidence for Recent Population Expansion in the Evolutionary History of the Malaria Vectors Anopheles arabiensis and Anopheles gambiae

Martin J. Donnelly, Monica C. Licht and Tovi Lehmann

Division of Parasitic Diseases, Centers for Disease Control and Prevention, Atlanta, Georgia
Division of Parasite and Vector Biology, Liverpool School of Tropical Medicine, Liverpool, England


    Abstract
Gene flow in malaria vectors is usually estimated based on differentiation indices (e.g., FST) in order to predict the contemporary spread of genes such as those conferring resistance to insecticides. This approach is reliant on a number of assumptions, the most crucial, and the one most likely to be violated in these species, being mutation-migration-drift equilibrium. Tests of this assumption for the African malaria vectors Anopheles gambiae and Anopheles arabiensis are the focus of this study. We analyzed variation at 18 microsatellite loci and the ND5 region of the mitochondrial genome in two populations of each species. Equilibrium was rejected by six of eight tests for the A. gambiae population from western Kenya and by three tests in eastern Kenya. In western Kenya, all departures from equilibrium were consistent with a recent population expansion, but in eastern Kenya, there were traces of a recent expansion and a bottleneck. Equilibrium was also rejected by two of the eight tests for both A. arabiensis populations; the departure from equilibrium was consistent with an expansion. These multiple-locus tests detected a genomewide effect and therefore a demographic event rather than a locus-specific effect, as would be caused by selection. Disequilibrium due to a recent expansion in these species implies that rates of gene flow, as inferred from differentiation indices, are overestimates as they include a historical component. We argue that the same effect applies to the majority of pest species due to the correlation of their demography with that of humans.


    Introduction
The two African malaria vectors Anopheles gambiae and Anopheles arabiensis have recently become the subject of intensive molecular genetic studies to determine patterns of gene flow and population structure (Lehmann et al. 1996; Besansky et al. 1997; Lanzaro et al. 1998; Donnelly et al. 1999; Simard et al. 2000). A primary objective of these studies has been to estimate processes that influence the spread of genes, such as those that confer insecticide resistance, between populations. These studies have used mitochondrial or microsatellite loci to calculate statistics such as Wright's (1978) FST or Slatkin's (1995) RST, which measure differentiation between populations based on the interpopulation component of the total genetic variation. Derivation of gene flow from such statistics is dependent on a number of assumptions, including negligible selection at the study loci and many subpopulations, each at mutation-migration-drift equilibrium, which are all identical in their demography and history and in the fraction of migrants they exchange. While this approach is relatively independent of mutation rate and robust to certain deviations from neutrality, it remains sensitive to departures from equilibrium (see Slatkin 1994a). Theoretically, mutation-drift equilibrium should be reached if the effective population size (Ne) has remained stable for 2Ne–4Ne generations (Nei and Li 1976), while migration-drift equilibrium should be reached if population size and migration rate (m) are stable for 1/m or 2Ne generations, whichever is larger (Takahata 1983). Population stability for such periods is perhaps unrealistic for these species of mosquitoes because both are highly dependent on human abundance and climate, conditions that have changed drastically over the last millennia (see Cavalli-Sforza, Menozzi, and Piazza 1994; Nicholson 1995 and references therein).

These mosquitoes characteristically breed in temporary sunlit pools, mostly created by humans and domestic animals; the adults feed primarily on humans or domestic animals and rest in human-made shelters. This dependence on humans has led to the conjecture that an expansion of A. gambiae populations, and possibly speciation, may be linked to the agrarian revolution in sub-Saharan Africa approximately 10,000–4,000 years ago (Coluzzi 1982). Nonetheless, bottlenecks cannot be entirely ruled out, since severe droughts have also periodically affected large areas, e.g., during the Würm Glaciation 18,000 years ago (Reader 1997). A recent demographic change may throw populations out of equilibrium and thus preclude a meaningful derivation of gene flow from differentiation indices. The violation of equilibrium is a critical concern, especially for "pest species" (broadly defined as all organisms that depend on humans and their resources, e.g., disease agents of humans, domestic animals, plants) whose current Ne is large and whose recent demographic history has probably been unstable.

Recently, methods have been developed to test for mutation-drift equilibrium (MDE), which should be approached more slowly than migration-drift equilibrium, and to trace past population demography from mitochondrial (e.g., Rogers and Harpending 1992) and microsatellite loci (e.g., Kimmel et al. 1998; Reich and Goldstein 1998). In this study, we apply these methods to mtDNA and 18 microsatellite loci from populations of A. gambiae and A. arabiensis to address the following questions: (1) Are populations of A. gambiae and A. arabiensis at MDE? (2) If not, was disequilibrium a result of population expansion or contraction? (3) Do populations within a species and/or across species show similar historical demographics? (4) What is the likely influence of the observed demographic history on estimates of gene flow? The answers to these questions not only will elucidate the historical demographics of these species, but will also clarify the utility of the estimates of gene flow derived from differentiation indices.

To maximize independence of populations, we selected two populations for each species that had been shown by earlier work to exhibit the greatest differentiation (Lehmann et al. 1999; Donnelly and Townson 2000). The two populations of A. gambiae within Kenya showed higher differentiation than did the same western Kenyan population and one from Senegal (Lehmann et al. 1999). We selected analytical methods that exploited different aspects of the data in order to maximize independence between tests. Together, these tests can provide a comprehensive picture of the past demographics of these species. Selection can produce patterns of variation that are indistinguishable from those produced by demographic changes (e.g., Tajima 1989; Fu and Li 1993). To avoid confusing selection (which is locus-specific) with demographic instability, we relied on the composite signature of all 19 loci, representing two marker systems, mtDNA and microsatellites.


    Materials and Methods
Sample Sites, Species Identification, DNA Extraction, and Locus Selection
The mosquito samples used in this study were collected from four locations in Africa (fig. 1) and have previously been described in detail (Lehmann et al. 1997, 1998; Donnelly and Townson 2000). Adult mosquitoes were obtained from houses within a village by resting and pyrethrum knockdown collections. A sample consisted of specimens collected in an area with a radius of <1 km. In eastern Africa, only one cytotype, the Savanna form, of A. gambiae is present, and there is no evidence for segregating cytological forms of A. arabiensis. Previous studies revealed no population subdivision within and among adjacent villages separated by 10–50 km and showed that mosquitoes within a house represented a random sample of the population (Petrarca and Beier 1992; Besansky et al. 1997; Lehmann et al. 1997, 1998; Donnelly et al. 1999). Species identification, collection location and date, and sample size were as follows: A. gambiae: Asembo, western Kenya (June 1996; 55), and Jego, eastern Kenya (June 1996; 56); A. arabiensis: Mkali, Malawi (August 1997; 61), and Harosha, Ethiopia (August 1997; 57). After DNA extraction (see Lehmann et al. 1997; Donnelly et al. 1999) and PCR-based species identification (Scott, Brogdon, and Collins 1993), all individuals were genotyped for variation at 18 microsatellite loci. Detailed descriptions of nine loci for A. gambiae and eight loci for A. arabiensis have previously been published (Lehmann et al. 1998, 1999; Donnelly and Townson 2000), as has the A. gambiae mtDNA data from Asembo Bay (Lehmann et al. 1997). Additional microsatellite loci (table 1) were selected for the present study. Loci distributed on both arms of each autosome and on the sex chromosome were selected based on physical and linkage maps of A. gambiae (Zheng et al. 1996) to ensure genomewide coverage. All but locus Ag2H147 in A. gambiae and loci 33C1, Ag2H26, and Ag2H786 in A. arabiensis were located outside chromosomal inversions, which are polymorphic in these populations, to maximize the independence of loci (Lanzaro et al. 1998; Petrarca et al. 2000). A subset of the specimens from each location was sequenced over a stretch of the ND5 region of the mitochondrial genome following the protocols of Besansky et al. (1997). An exception was A. gambiae from western Kenya, for which sequences of 45 specimens collected in 1994 were used (Lehmann et al. 1997, 2000). DNA sequencing and microsatellite allele sizing were performed on an ABI 377 machine (Perkin Elmer) using standard protocols. As there were no insertions or deletions, the sequences were unambiguously aligned using Sequence Navigator, version 1.0.1 (ABI Systems, Calif.).



Fig. 1.—Locations of sample sites. Species ranges are marked for Anopheles arabiensis (stippled) and for Anopheles gambiae (gray)

 

Table 2 Summary Statistics for Polymorphism at the ND5 Region of mtDNA

 
Methods to Infer Historic Population Demographics from mtDNA
The tests of Tajima (1989) and Fu and Li (1993) compare two estimates of the parameter θ (for haploid, maternally inherited mitochondrial DNA, θ = 2Neµ, where Ne is the female effective population size and µ is the mutation rate), which should be equal under MDE provided that polymorphism is neutral and an infinite-sites model applies. However, under selection or nonequilibrium, the two estimators will differ, and this difference reflects the mode of selection or the direction of change in population size. Tajima's test contrasts the estimate of θ based on the mean number of pairwise differences between sequences (also termed nucleotide diversity, or π) and that based on the number of segregating sites given the sample size. After a change in population size, the estimate of θ calculated from the number of segregating sites is more rapidly affected by the new (present) population size, while the estimate from nucleotide diversity would reflect the past population size for a longer time (Tajima 1989). Fu and Li's (1993) tests contrast estimates of θ based on mutations on internal and external branches of the genealogy. The tests, originally designed to assess the neutrality of polymorphism, assume that since selection will purge deleterious mutations, those mutations present are likely to have arisen recently and are found close to the tips of the genealogy. Similarly, mutations in the internal branches are likely to be older and selectively neutral, although a recent mutation conferring a selective advantage could increase to a high frequency and therefore appear internal. If purifying selection is acting on a locus, there will be an excess of mutations in external branches, as deleterious alleles will be present at low frequencies, resulting in negative D* and F* values. A recent population expansion will also result in an excess of external mutations and would produce negative values for these statistics. All statistics were calculated using DnaSP, version 3 (Rozas and Rozas 1999).
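The contrast that Tajima's test draws between the two estimates of θ can be illustrated with a short Python sketch. It is not the DnaSP implementation; it uses the standard published constants of Tajima (1989), assumes aligned sequences of equal length without gaps, and the function name and toy sequences are hypothetical.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Return (pi, S, D) for a list of aligned, equal-length sequences."""
    n = len(seqs)
    if n < 4:
        raise ValueError("Tajima's D needs at least 4 sequences")
    # S: number of segregating sites (alignment columns with more than one state).
    s = sum(1 for col in zip(*seqs) if len(set(col)) > 1)
    # pi: mean number of pairwise differences between sequences.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    if s == 0:
        return pi, s, float("nan")  # D is undefined without polymorphism
    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    # Difference between the two theta estimates, scaled by its standard deviation.
    d = (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))
    return pi, s, d


# Toy sample in which every mutation is a singleton, as expected after an expansion.
pi, s, d = tajimas_d(["AAAA", "AAAT", "AATA", "ATAA"])
print(pi, s, d < 0)
```

In the toy sample every segregating site is carried by a single sequence, so the estimate based on segregating sites exceeds the estimate based on pairwise differences and D is negative.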

Slatkin (1994b) demonstrated that the probability of detecting linkage disequilibrium between closely linked (neutral) loci is greatly diminished in a recently expanded population. This is a result of the accumulation of new mutations when haplotype loss is minimal. Ancestral haplotypes will persist in the population and will be indistinguishable from putative recombination events, thereby confounding tests of linkage disequilibrium. Calculations of linkage disequilibrium between polymorphic sites in the mtDNA were performed using DnaSP, version 3.

Analyses of the mismatch distribution (the frequency distribution of pairwise differences in mtDNA sequences) as proposed by Slatkin and Hudson (1991), Rogers and Harpending (1992), and Rogers (1995) distinguish between the smooth unimodal distribution of a recently expanded population that is shaped by accumulation of mutations with minimal lineage loss and the "ragged" multimodal distributions that are shaped by mutations in equilibrium with stochastic lineage loss. Harpending et al. (1993) suggested a "raggedness" statistic based on the sum of the squared differences between the frequencies of successive entries (the number of mutational differences between sequences) in the distribution. The statistical significance of this value may be determined from the distribution of the statistic determined by simulations. All calculations were performed using DnaSP, version 3.
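The mismatch distribution and the raggedness statistic described above can be sketched in a few lines of Python. This is an illustration only, not the DnaSP routine: the function names are hypothetical, and closing the distribution with one empty class is an assumption about how the endpoint is handled.

```python
from collections import Counter
from itertools import combinations


def mismatch_distribution(seqs):
    """Relative frequency of sequence pairs differing at 0, 1, 2, ... sites."""
    diffs = [sum(a != b for a, b in zip(x, y)) for x, y in combinations(seqs, 2)]
    counts = Counter(diffs)
    return [counts[i] / len(diffs) for i in range(max(diffs) + 1)]


def raggedness(freqs):
    """Sum of squared differences between successive entries of the distribution."""
    # Assumption: one empty class is appended so the last entry is also compared.
    padded = list(freqs) + [0.0]
    return sum((padded[i] - padded[i - 1]) ** 2 for i in range(1, len(padded)))


smooth = [0.1, 0.2, 0.4, 0.2, 0.1]   # unimodal, as expected after an expansion
ragged = [0.4, 0.0, 0.2, 0.0, 0.4]   # multimodal, as expected near equilibrium
print(raggedness(smooth) < raggedness(ragged))
```

A smooth unimodal distribution yields a small value and a multimodal one a large value; significance would still have to come from simulation, as stated in the text.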

Methods to Infer Historic Population Demographics from Microsatellite Data
Cornuet and Luikart (1996) have extended the single-locus homozygosity test (Watterson 1978) to multiple loci under a range of mutation models, including the infinite-alleles model (IAM), the stepwise mutation model (SMM), and the two-phase model (TPM). This approach, analogous to the Tajima test, compares the homozygosity (or its complement, expected heterozygosity) calculated on the basis of allele frequencies with that calculated on the basis of the number of alleles and the sample size, which are expected to be identical in a neutral locus in a population at MDE. To evaluate the sensitivity of the results to the mutation model, we performed the tests under the SMM, the TPM with mutations of more than one repeat occurring at frequencies of 10%, 20%, and 30%, and even under the IAM. Significant departure between the estimates of heterozygosity under the correct mutation model implies that the population is not at MDE. Tests were performed using the Bottleneck program (Cornuet and Luikart 1996).
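As a rough illustration of why the outcome depends on the mutation model, the sketch below computes the frequency-based expected heterozygosity for one locus and the classical equilibrium heterozygosity for a given θ under the IAM and the SMM. It does not reproduce the Bottleneck program, which derives the expectation conditional on the observed number of alleles; the function names and the toy sample are hypothetical.

```python
from collections import Counter
from math import sqrt


def expected_heterozygosity(alleles):
    """Unbiased expected heterozygosity from a sample of allele sizes."""
    n = len(alleles)
    sum_p2 = sum((c / n) ** 2 for c in Counter(alleles).values())
    return n / (n - 1) * (1 - sum_p2)


def he_iam(theta):
    """Equilibrium heterozygosity under the infinite-alleles model."""
    return theta / (1 + theta)


def he_smm(theta):
    """Equilibrium heterozygosity under the strict stepwise mutation model."""
    return 1 - 1 / sqrt(1 + 2 * theta)


sample = [10, 10, 11, 11]            # repeat counts for four gene copies
print(expected_heterozygosity(sample))
print(he_iam(1.0), he_smm(1.0))      # the same theta gives different expectations
```

For the same θ the IAM predicts a higher equilibrium heterozygosity than the SMM, so a locus can appear to deviate under one model and not under the other.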

Kimmel et al.'s (1998) approach follows a rationale similar to that of Tajima (1989), Fu and Li (1993), and Cornuet and Luikart (1996) in that it contrasts an estimate of θ (= 4Neµ; diploid autosomes) calculated from allele frequencies with an estimate calculated on the basis of the variance in repeat numbers. In neutral loci in a population at MDE, the estimates will be equal. The quotient of the two estimates, termed the imbalance index (β = θvar/θfreq), will depart from 1 after a demographic change. β and its 95% confidence intervals, estimated by bootstrapping over loci, were calculated using programs written in the SAS language (SAS Institute 1990).
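A minimal Python sketch of the imbalance index (the ratio of the variance-based to the frequency-based estimate of θ) follows. It is not the SAS code used in the study. It assumes a strict stepwise mutation model, under which the variance in repeat number has expectation θ/2 and homozygosity has expectation 1/sqrt(1 + 2θ); taking the ratio of the across-locus means and the percentile bootstrap are further assumptions, and all names are hypothetical.

```python
import random
from collections import Counter
from statistics import mean, variance


def theta_var(alleles):
    """Theta from the variance in repeat number (E[Var] = theta / 2)."""
    return 2 * variance(alleles)


def theta_freq(alleles):
    """Theta from homozygosity P0, inverting P0 = 1 / sqrt(1 + 2 * theta)."""
    n = len(alleles)
    sum_p2 = sum((c / n) ** 2 for c in Counter(alleles).values())
    p0 = (n * sum_p2 - 1) / (n - 1)  # unbiased estimate of homozygosity
    if p0 <= 0:
        raise ValueError("homozygosity estimate must be positive")
    return (1 / p0**2 - 1) / 2


def imbalance_index(loci):
    """Ratio of the mean variance-based to the mean frequency-based theta."""
    return mean(theta_var(a) for a in loci) / mean(theta_freq(a) for a in loci)


def bootstrap_ci(loci, reps=1000, seed=0):
    """Percentile 95% interval obtained by resampling loci with replacement."""
    rng = random.Random(seed)
    betas = sorted(
        imbalance_index([rng.choice(loci) for _ in loci]) for _ in range(reps)
    )
    return betas[int(0.025 * reps)], betas[int(0.975 * reps) - 1]


loci = [[10, 10, 11, 11], [5, 5, 5, 6, 7, 7]]  # toy repeat counts per locus
print(imbalance_index(loci), bootstrap_ci(loci))
```

The sketch requires polymorphic loci; a data set consisting only of monomorphic loci would give a frequency-based estimate of zero and an undefined ratio.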

The k-test of Reich and Goldstein (1998) exploits differences between the expected distributions of alleles in populations at MDE and populations that have recently expanded. The expected distribution of a recently expanded population tends to be unimodal and more peaked than the multimodal and heavier-tailed distribution of a population at MDE (Reich and Goldstein 1998). The g-test of Reich and Goldstein (1998) compares the between-loci variance in the number of repeats with a theoretical expectation derived assuming that the loci follow an SMM and that the population size is stable. We performed both the k- and the g-tests using programs written in the SAS language. k-statistics were calculated for each locus, and the significance of the proportion of positive k values was based on a binomial distribution with the probability of a positive k set conservatively as 0.515 (Reich, Feldman, and Goldstein 1999). Significance levels for the g-test are given in Reich, Feldman, and Goldstein (1999).
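The binomial step of the k-test can be written out as a short Python sketch. It covers only the significance calculation, not the computation of k itself, and it assumes that the test is one-tailed, with few positive k values taken as evidence of expansion; the function name is hypothetical.

```python
from math import comb


def k_test_p_value(n_positive, n_loci, p_positive=0.515):
    """Probability of observing n_positive or fewer positive k values.

    Assumption: lower-tail binomial test, with p_positive the probability
    of a positive k at a single locus in a population of stable size.
    """
    return sum(
        comb(n_loci, i) * p_positive**i * (1 - p_positive) ** (n_loci - i)
        for i in range(n_positive + 1)
    )


# Hypothetical example: 3 positive k values among 15 informative loci.
print(k_test_p_value(3, 15))
```

Because 0.515 is the lowest probability of a positive k in the range cited in the text, using it makes the lower tail as large as possible and the test conservative.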


    Results
Mitochondrial DNA
Complete sequences of at least 599-bp (199 codons) or 790-bp (263 codons) sections of the ND5 region of the mitochondrial genome were obtained for 58 A. gambiae specimens and 55 A. arabiensis specimens, respectively (GenBank accession numbers AY009952–AY010064). Summary statistics are given in table 2.


Table 3 Tests for Nonrandom Associations Between Polymorphic Sites in mtDNA

 
Significant departure from equilibrium, as determined by Tajima's (1989) test, was found only in the western Kenyan population of A. gambiae (table 2); however, all populations had negative D values. Correspondingly, plots of the frequency spectrum showed excesses of mutations that appeared in only one or two individuals and a deficiency of mutations shared by a large number of specimens (fig. 2). This pattern was most extreme in the western Kenyan population of A. gambiae, but it was apparent in all populations (fig. 2). Fu and Li's (1993) D* and F* statistics, which contrast estimates of θ based on the number of mutations in external versus internal branches of the genealogy, produced similar results.



Fig. 2.—Frequency spectra for mtDNA sequences showing the distribution of segregating sites. A, Anopheles arabiensis. B, Anopheles gambiae Jego, eastern Kenya. C, Anopheles gambiae Asembo, western Kenya

 
Tests of a Recent Population Expansion
Tests of linkage disequilibrium (Slatkin 1994b) in the mtDNA data revealed differences between the species (table 3). Despite (presumed) complete linkage between sites in the mtDNA, no (significant) linkage disequilibrium was detected in either population of A. gambiae, whereas linkage disequilibrium was high (and highly significant) in both populations of A. arabiensis. Excluding singleton mutations (table 3, y = 2) and mutations that were shared by two individuals in the sample (table 3, y = 3) did not increase the disequilibrium in the large sample of A. gambiae from western Kenya, whereas it increased disequilibrium in A. arabiensis, suggesting that this recent expansion of A. gambiae was not preceded by an "ancient" stable population, unless it was a very small ancient population (see below).


Table 1 Summary Data for Microsatellite Loci

 
The mismatch distributions of the A. gambiae populations were unimodal, and visually, they fit well with their corresponding distributions expected under expansion (fig. 3). The western Kenyan population of A. gambiae presented the closest fit to its expected distribution under expansion, yet the raggedness index for eastern Kenya and not western Kenya was significantly lower than that expected for a population in MDE (table 2). The mismatch distributions of A. arabiensis, on the other hand, were bimodal, and they fit poorly with their corresponding distributions expected under expansion (fig. 3).



Fig. 3.—Frequency distributions of pairwise sequence differences between individual haplotypes for all four sample populations. A, Anopheles gambiae, Asembo, western Kenya. B, Anopheles gambiae, Jego, eastern Kenya. C and D, Anopheles arabiensis. In all cases, the y-axis denotes frequency and the x-axis denotes the number of segregating sites between pairwise comparisons of haplotypes. Expected distributions are plotted following the three-parameter model of Rogers and Harpending (1992); θ0 and θ1 were set at 0 and 1,000, respectively, and τ was estimated from the data following Rogers (1995)

 
Altogether, these mtDNA results suggest that populations of A. gambiae are not at MDE due to a recent expansion. However, mtDNA provides no compelling evidence for departure from MDE or for a recent expansion in A. arabiensis. The pre-expansion value of θ (θ0) can be estimated using the simplified two-parameter model of mismatch distributions (Harpending et al. 1993). Assuming this model, the value of θ0 in both A. gambiae populations was 0 (table 2), corroborating the results of the linkage analysis in suggesting a very small pre-expansion population size or even a recent speciation followed by an expansion. Alternatively, a selective sweep, resulting in complete replacement of ancient mitochondrial lineages, can also explain these results.

Microsatellites
The polymorphism of the 18 microsatellite loci in A. gambiae and A. arabiensis was moderate to high (table 1). The higher genetic diversity of A. gambiae may reflect an ascertainment bias, as the loci were originally isolated from this species, or a lower effective population size in A. arabiensis, as suggested by previous studies (Taylor et al. 1993; Lehmann et al. 1998; Simard et al. 2000). Genetic diversity of the A. gambiae population from eastern Kenya was lower than that of western Kenya (expected heterozygosity: Wilcoxon signed-ranks test, n = 18, P < 0.02; number of alleles: sign test, n = 18, P < 0.001), in accordance with previous reports based on a subset of nine of the loci (Lehmann et al. 1998, 1999). No significant differences in genetic diversity were detected between the populations of A. arabiensis. Exact tests of linkage disequilibrium, using the sequential Bonferroni correction to accommodate the number of tests, showed no significant departure from equilibrium between any locus pair in any population, thereby demonstrating the independence of loci. Using 18 independent loci allows us to distinguish between a locus-specific effect, such as that caused by selection, and a genomewide effect, caused by a demographic change.
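The sequential Bonferroni correction applied to the pairwise linkage disequilibrium tests can be sketched as follows. This is the usual step-down procedure, written as an illustration and not taken from the study's own scripts; the function name and the P values are hypothetical. If all pairs of 18 loci are tested, there are 153 tests per population, so the smallest P value must fall below 0.05/153 to be declared significant.

```python
def sequential_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans marking tests rejected by the step-down rule."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The smallest P is compared with alpha/m, the next with alpha/(m-1), ...
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger P values are retained
    return reject


print(sequential_bonferroni([0.01, 0.04, 0.03]))
print(18 * 17 // 2)  # number of locus pairs among 18 loci
```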

The results of the homozygosity test (Cornuet and Luikart 1996) were dependent on the mutation model (table 4). We emphasize the results based on the SMM and TPM models, since the consensus is that they better approximate the mutation process at microsatellite loci than the IAM (e.g., Weber and Wong 1993; Di Rienzo et al. 1994; Primmer et al. 1998). Higher heterozygosity based on the number of alleles was significant for A. gambiae populations under the SMM and the TPM with multiple repeat mutations at frequencies of 10% and 20%. Similarly, A. arabiensis remained significant, with up to 10% multiple repeat mutations (table 4). Higher heterozygosity based on the number of alleles across many independent loci indicates a recent expansion of the population. Another possible cause, a recent influx of rare alleles from genetically distinct populations (Cornuet and Luikart 1996), is unlikely given that the same signature was observed in all populations and that allele frequency distributions are relatively homogeneous between populations (Lehmann et al. 1996, 1999; Donnelly and Townson 2000). Similarly to mtDNA, the pattern of departure from MDE due to a recent expansion was stronger in A. gambiae, as the average deviation from expectation under MDE was larger and departures from equilibrium persisted under a wider range of mutation models (table 4).


Table 4 Analysis of Microsatellite Data Using Homozygosity Tests (Cornuet and Luikart 1996)

 
The imbalance index β (Kimmel et al. 1998) of the western Kenyan population was the lowest (0.75; table 5), and close to significance (P < 0.07). Calculating the index for dinucleotide loci (excluding the trinucleotide loci) to maximize homogeneity among loci (Chakraborty and Kimmel 1999) resulted in a significant value (β = 0.737, P < 0.05; table 5). An index <1 would be expected in a population that has recently expanded (from MDE), as the variance in repeat number increases more slowly than the expected heterozygosity. The imbalance index of the eastern Kenyan population, on the other hand, was the highest (β = 3.285; table 5), and highly significant. The high imbalance index persisted after the trinucleotide loci were excluded (β = 3.183; table 5). Kimmel et al. (1998) demonstrated by simulation that β > 1 in a population that has expanded after a bottleneck. We argue that β may also be >1 if the population has experienced a severe bottleneck after expansion, as the variance in allele size, reflecting the prebottleneck population, would be retained for a longer period, whereas expected heterozygosity would decline faster, reflecting the new Ne. The imbalance indices of the A. arabiensis populations did not depart significantly from 1 (table 5), consistent with MDE.


Table 5 Analysis of Microsatellite Data Using the Imbalance Index β (Kimmel et al. 1998) and the k-Test and g-Test (Reich, Feldman, and Goldstein 1999)

 
Reich, Feldman, and Goldstein (1999) demonstrated that the probability of a positive value of k is constrained between 0.515 and 0.55 for loci when the sample size is greater than 10 and θ > 2. The value of θ for each locus can be estimated from the relationship θ/2 = E[Var] (Zhivotovsky and Feldman 1995). Following the decision rule of Reich, Feldman, and Goldstein (1999), we applied the k-test to those loci with variances >1 (table 5). The k-test showed a significant signal of expansion in the western Kenyan population of A. gambiae and in both A. arabiensis populations, but not in the eastern Kenyan sample (table 5).
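The decision rule above amounts to estimating θ for each locus as twice the variance in repeat number and retaining only loci whose variance exceeds 1 (that is, θ > 2). A minimal Python sketch, with hypothetical locus names and toy data, might look like this:

```python
from statistics import variance


def loci_for_k_test(loci, min_variance=1.0):
    """Map each retained locus to its theta estimate (2 x variance in repeats).

    loci: dict of locus name -> list of repeat counts, one per gene copy.
    """
    selected = {}
    for name, alleles in loci.items():
        v = variance(alleles)          # sample variance in repeat number
        if v > min_variance:           # equivalent to theta > 2
            selected[name] = 2 * v
    return selected


toy = {"locus_a": [10, 10, 11, 11], "locus_b": [8, 10, 12, 14]}
print(loci_for_k_test(toy))  # only locus_b passes the threshold
```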

The interlocus g-test showed no evidence for deviation from equilibrium in any of the populations (table 5 ). This may reflect the decreased power of this test with extensive variation in mutation rate across loci as may be the case with our data set, which combines dinucleotide and trinucleotide microsatellite loci.


    Discussion
This study provides evidence that populations of A. gambiae and A. arabiensis do not exist in MDE and that the departure from equilibrium is a result of a recent population expansion. Detection of this pattern in multiple independent loci allows us to distinguish it from selection, which is locus-specific, and attribute it to a past demographic change. This is consistent with the low levels of population differentiation across the range of both species (e.g., Lehmann et al. 1996; Donnelly and Townson 2000). Within populations, the results of the different tests agreed with each other, and their minor differences probably reflect their differing sensitivities to various aspects of the size change. The k-test is maximally sensitive to an expansion that occurred approximately 5Ne generations ago (pre-expansion Ne), whereas the g-test is maximally sensitive to the same event after approximately 15Ne generations. Power analysis also showed that a minimum of 30 loci are desirable to detect a 100-fold expansion (Reich, Feldman, and Goldstein 1999). Therefore, the detection of a strong signature of expansion using only 18 microsatellite loci suggests that the expansion occurred closer to 5Ne generations ago. The failure of mtDNA-based tests to detect expansion in A. arabiensis may reflect the shorter time to equilibrium of mitochondrial markers. In a population with an equal sex ratio, the time until MDE for mtDNA is expected to be one fourth that for a nuclear locus, because mtDNA is haploid and maternally inherited, so that its effective number of gene copies (Ne/2) is one fourth that of an autosomal locus (2Ne).

MDE was rejected by six of the eight tests in the A. gambiae populations from western Kenya and by three of these tests in eastern Kenya. In both populations, departures were detected in mtDNA- and microsatellite-based tests (table 6). All departures from equilibrium in the western Kenyan population were consistent with a recent population expansion, and the same trend was also apparent in the two nonsignificant tests. Departures from equilibrium in the eastern Kenyan population showed traces of both a recent expansion and a bottleneck (table 6). Indeed, previous studies have suggested that a bottleneck had occurred in eastern Kenya, based on its lower genetic diversity, the presence of all eastern Kenyan microsatellite alleles (with frequency > 5%) in western Kenya but the absence of several western Kenyan alleles in eastern Kenya, and evidence that differentiation in mtDNA and microsatellites was generated primarily by drift and not by mutation-drift (Lehmann et al. 1998, 1999, 2000). The lack of significant linkage disequilibrium in the mtDNA, even between mutations shared by three or more individuals (table 3), and the unimodal, relatively smooth, and narrow-tailed mismatch distributions of these populations (fig. 3) suggest that expansion started from a virtually monomorphic population, and the expansion, whose traces are found in both populations, preceded the bottleneck in eastern Kenya.


Table 6 Summary Table of All Tests

 
MDE was rejected by two of the eight tests in both A. arabiensis populations, providing less decisive evidence for deviation from equilibrium in this species. A conservative view would be that a single multilocus test detecting a significant departure from MDE is sufficient to reject equilibrium. Accordingly, the two A. arabiensis populations are not at MDE. There is some evidence for possible mitochondrial introgression in these species, although retention of ancestral polymorphism is also a plausible explanation (Besansky et al. 1997). If introgression is mostly unidirectional, from A. gambiae into A. arabiensis, it may explain the weaker evidence for expansion in A. arabiensis. Introgression could increase linkage disequilibrium (table 3) between sites and may well increase raggedness (fig. 3). However, introgression does not explain why raggedness and linkage disequilibrium are highest in Ethiopia, where A. gambiae does not exist (fig. 1) and contemporary introgression is unlikely to occur. Furthermore, excluding putatively introgressed sequences (i.e., haplotypes shared by both species) from the analysis resulted in no significant changes. All microsatellite loci were isolated from A. gambiae, and ascertainment bias, defined as decreased length and variability of loci in nontarget species, needs to be considered as a cause of the differences between species. If the lower average variability observed in A. arabiensis populations reflects large variations in mutation rates between loci, the power of the g-test and the imbalance index β to detect an expansion event is reduced (King, Kimmel, and Chakraborty 2000). However, this variation will not affect tests that are applied to each locus separately and which do not assume equal mutation rates. Hence, ascertainment bias will not affect either the homozygosity test or the k-test (after excluding loci with variance lower than the threshold), which are the only tests that detected departures from equilibrium in A. arabiensis.

The weaker signal of expansion in A. arabiensis may reflect an earlier expansion, a smaller change in effective population size between the pre- and postexpansion populations, and/or a smaller current population size. The expansion detected in these species may be contemporaneous with the agricultural revolution in sub-Saharan Africa (4,000–10,000 years ago). Coluzzi (1982) proposed that because these species were dependent on humans for feeding and breeding sites, mosquito populations may have mirrored the growth in populations of humans and domestic animals during this period. Estimates of current Ne for A. arabiensis (Taylor et al. 1993; Simard et al. 2000) are an order of magnitude lower than those for A. gambiae (Lehmann et al. 1998). A lower effective population size would mean that A. arabiensis would approach MDE more rapidly after an expansion. Whether the expansion was a result of the agricultural revolution or was, for example, associated with ameliorating conditions after an extensive drought remains to be resolved.
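The link between a smaller Ne and a faster return to equilibrium can be illustrated with a minimal sketch. It assumes the textbook recursion for expected homozygosity under drift and infinite-alleles mutation, F' = (1 - u)^2 [1/(2N) + (1 - 1/(2N)) F], chosen only because it is simple to iterate; it is not the analysis performed in this study, and the mutation rate and the two Ne values are hypothetical illustrations, not estimates for either species. Both populations start monomorphic (F = 1).

```python
def equilibrium_homozygosity(n_e, mu):
    """Fixed point of the recursion used below (close to 1 / (1 + 4 * n_e * mu))."""
    a = (1.0 - mu) ** 2
    return (a / (2.0 * n_e)) / (1.0 - a * (1.0 - 1.0 / (2.0 * n_e)))


def generations_to_equilibrium(n_e, mu, f_start=1.0, rel_tol=0.05, max_gen=1_000_000):
    """Count generations until expected homozygosity is within rel_tol of equilibrium.

    Assumed recursion: F' = (1 - mu)^2 * [1/(2N) + (1 - 1/(2N)) * F].
    """
    f_eq = equilibrium_homozygosity(n_e, mu)
    f = f_start
    for gen in range(max_gen + 1):
        if abs(f - f_eq) <= rel_tol * f_eq:
            return gen
        f = (1.0 - mu) ** 2 * (1.0 / (2.0 * n_e) + (1.0 - 1.0 / (2.0 * n_e)) * f)
    raise RuntimeError("did not converge within max_gen generations")


# Hypothetical values: same mutation rate, tenfold difference in postexpansion Ne.
MU = 1e-4
small_ne = generations_to_equilibrium(1e4, MU)
large_ne = generations_to_equilibrium(1e5, MU)
print("generations to approach equilibrium:", small_ne, "(Ne = 1e4)", large_ne, "(Ne = 1e5)")
```

Under these assumptions the population with the smaller Ne reaches the neighborhood of its equilibrium in fewer generations, which is the qualitative point made above. The absolute numbers depend on the assumed mutation rate and tolerance.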

Dependence on Assumed Mutation Models
Microsatellite-based tests of past demographic stability assume a certain mutation model, and an incorrectly specified model can influence the outcomes of tests (e.g., the homozygosity test; table 4). Most empirical and theoretical work suggests that the SMM and the TPM are more appropriate mutation models for microsatellite loci than is the IAM (Shriver et al. 1993; Di Rienzo et al. 1994; Schlötterer et al. 1998). If the mutation process approximates an IAM, then the k-test may falsely reject a stable population size (Reich, Feldman, and Goldstein 1999). Conversely, as demonstrated by King, Kimmel, and Chakraborty (2000), the imbalance index β will become more conservative as loci approach an IAM, because the estimate of θ derived from variance in allele size will be higher than that estimated from allele frequency. Therefore, even under an IAM, MDE would be rejected in A. gambiae as a result of a significant imbalance index β and mtDNA tests, but no significant departure from MDE would be detected in A. arabiensis populations. However, an IAM, or a TPM with a high frequency of multiple steps, is unlikely for our microsatellite data because allele arrays have virtually no gaps in the series of allele size, and such gaps would be expected under these models (for allele arrays, see Lehmann et al. 1999; Donnelly and Townson 2000).
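The two estimates of θ that the imbalance index compares can be sketched as follows. This is a minimal illustration that assumes the standard single-step SMM expectations, E[V] = θ/2 for the variance in allele size and E[P0] = 1/sqrt(1 + 2θ) for homozygosity; the function names, the bias-corrected homozygosity estimator, and the allele sizes are hypothetical, and the published index may differ in how loci are averaged and corrected.

```python
import math
from collections import Counter
from statistics import mean, variance


def theta_from_variance(allele_sizes):
    """Estimate theta from allele-size variance, assuming E[V] = theta / 2 under the SMM."""
    return 2.0 * variance(allele_sizes)


def theta_from_homozygosity(allele_sizes):
    """Estimate theta from homozygosity, assuming E[P0] = 1 / sqrt(1 + 2 * theta)."""
    n = len(allele_sizes)
    sum_sq = sum((count / n) ** 2 for count in Counter(allele_sizes).values())
    p0 = (n * sum_sq - 1.0) / (n - 1.0)  # sample-size corrected homozygosity (assumed form)
    return (1.0 / p0 ** 2 - 1.0) / 2.0


def log_imbalance(loci):
    """Log ratio of the mean variance-based to the mean homozygosity-based estimate."""
    theta_v = mean([theta_from_variance(sizes) for sizes in loci])
    theta_p0 = mean([theta_from_homozygosity(sizes) for sizes in loci])
    return math.log(theta_v) - math.log(theta_p0)


# Hypothetical repeat counts scored at one locus in eight chromosomes.
example_locus = [10, 10, 11, 11, 12, 12, 10, 11]
print(theta_from_variance(example_locus), theta_from_homozygosity(example_locus))
print(log_imbalance([example_locus]))
```

Because the two estimators draw on different features of the same allele array (spread in size versus evenness of frequencies), a mutation process that departs from the SMM shifts them by different amounts, which is why the choice of model matters for this index.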

The current findings help reconcile the discrepancy between ecological studies, suggesting limited dispersal (Adams 1940), and indirect genetic studies, suggesting high rates of migration across vast distances (Lehmann et al. 1996; Donnelly and Townson 2000). Estimates of migration derived from differentiation indices are inflated by a recent expansion. The opposite effect may apply to populations to the east of the eastern Rift Valley, where a recent bottleneck resulted in an underestimation of gene flow (see Lehmann et al. 1999, 2000). The degree of bias in estimates of gene flow that the demographic changes cause is unknown, which highlights the need for new methods to infer contemporary gene flow in nonequilibrium populations. Large current Ne values and recent dramatic population size changes are likely to be common in many "pest species," and therefore tests of MDE should be performed before gene flow is inferred from differentiation indices.
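The direction of this bias can be illustrated with a small worked example. Indirect estimates of gene flow are commonly obtained from Wright's island-model relationship at equilibrium, FST ≈ 1/(4Nm + 1); the sketch below assumes that relationship, and the FST values in it are hypothetical and not taken from this study. If differentiation has not yet risen to its equilibrium level after an expansion, inverting the formula returns a larger Nm than the contemporary one.

```python
def equilibrium_fst(nm):
    """Island-model expectation at equilibrium: FST = 1 / (4 * Nm + 1)."""
    return 1.0 / (4.0 * nm + 1.0)


def nm_from_fst(fst):
    """Invert the island-model expectation: Nm = (1 / FST - 1) / 4."""
    if not 0.0 < fst < 1.0:
        raise ValueError("FST must lie strictly between 0 and 1")
    return (1.0 / fst - 1.0) / 4.0


# Hypothetical scenario: contemporary Nm is 1, so equilibrium FST would be 0.2,
# but a recent expansion has left observed differentiation at only 0.02.
true_nm = 1.0
fst_at_equilibrium = equilibrium_fst(true_nm)
fst_observed = 0.02
apparent_nm = nm_from_fst(fst_observed)
print(fst_at_equilibrium, apparent_nm)
```

In this illustration the apparent Nm exceeds the contemporary value by roughly an order of magnitude, although, as noted above, the size of the bias in real populations is unknown.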


    Conclusions
 
MDE was rejected in both species, reflecting primarily a recent population expansion. The eastern African populations that were studied here may not represent the entire species, and in particular may not represent West African populations. However, these findings indicate that equilibrium should not be assumed for any population of these species without supporting evidence. The expansion observed in these populations will upwardly bias estimates of gene flow based on differentiation indices. Therefore, earlier work suggesting extensive gene flow in A. gambiae and A. arabiensis should be viewed with caution, and better estimates of contemporary gene flow are needed to evaluate the potential to control malaria transmission by introducing refractory genes into mosquito populations.


    Acknowledgements
 
This manuscript was improved by the insightful comments of Fred Simard, Fernando Monteiro, Matthew Stephens, David Reich, the Editor, and two anonymous referees. We thank Asefaw Getachew for providing specimens of A. arabiensis from Ethiopia and Brian Holloway and the staff of the NCID Biotechnology Core Facility for synthesizing some of the oligonucleotide primers. M.J.D. was supported by a Wellcome Trust Biodiversity Studentship (with Harold Townson) and by a postdoctoral fellowship from the American Society for Microbiology. This work received financial support from the UNDP/World Bank/WHO Special Programme for Research and Training in Tropical Diseases and from an NIH grant (A140631-01).


    Footnotes
 
Keith Crandall, Reviewing Editor

1 Keywords: Anopheles, malaria, mosquitoes, population genetics, expansion

2 Address for correspondence and reprints: Martin J. Donnelly, Centers for Disease Control and Prevention, MS F22, 4770 Buford Highway, Chamblee, Georgia 30341. E-mail: mpd7{at}cdc.gov


    References
 

    Adams P. C. G., 1940 Some observations on the flight of stained anophelines in N'kana, Northern Rhodesia Ann. Trop. Med. Parasitol 34:35-43

    Besansky N. J., T. Lehmann, G. T. Fahey, D. Fontenille, L. E. O. Braak, W. A. Hawley, F. H. Collins, 1997 Patterns of mitochondrial variation within and between African malaria vectors, Anopheles gambiae and A. arabiensis, suggest extensive gene flow Genetics 147:1817-1828

    Cavalli-Sforza L. L., P. Menozzi, A. Piazza, 1994 The history and geography of human genes Princeton University Press, Princeton, N.J

    Chakraborty R., M. Kimmel, 1999 Statistics of microsatellite: estimation of mutation rate and pattern of population expansion Pp. 139–150 in D. B. Goldstein and C. Schlötterer, eds. Microsatellites: evolution and applications. Oxford University Press, Oxford, England

    Coluzzi M., 1982 Spatial distribution of chromosomal inversions and speciation in anopheline mosquitoes Pp. 143–153 in C. Bargozzi, ed. Mechanisms of speciation. Alan R. Liss, New York

    Cornuet J.-M., G. Luikart, 1996 Description and power analysis of two tests for detecting recent population bottlenecks from allele frequency data Genetics 144:2001-2014

    Di Rienzo A., A. C. Peterson, J. C. Garza, A. M. Valdes, M. Slatkin, N. B. Freimer, 1994 Mutational processes of simple-sequence repeat loci in human populations Proc. Natl. Acad. Sci. USA 91:3166-3170

    Donnelly M. J., N. Cuamba, J. D. Charlwood, F. H. Collins, H. Townson, 1999 Population structure in the malaria vector, Anopheles arabiensis Patton, in East Africa Heredity 83:408-417

    Donnelly M. J., H. Townson, 2000 Evidence for extensive genetic differentiation among populations of the malaria vector, Anopheles arabiensis in East Africa Insect Mol. Biol 9:357-367

    Fu Y.-X., W.-H. Li, 1993 Statistical tests of neutrality of mutations Genetics 133:693-709

    Harpending H. C., S. T. Sherry, A. R. Rogers, M. Stoneking, 1993 The genetic structure of ancient human populations Curr. Anthropol 34:483-496

    Kimmel M., R. Chakraborty, J. P. King, M. Bamshad, W. S. Watkins, L. B. Jorde, 1998 Signatures of population expansion in microsatellite repeat data Genetics 148:1921-1930

    King J. P., M. Kimmel, R. Chakraborty, 2000 A power analysis of microsatellite-based statistics for inferring past population growth Mol. Biol. Evol 17:1859-1868

    Lanzaro G. C., Y. T. Toure, J. Carnahan, L. Zheng, G. Dolo, S. Traor, V. Petrarca, K. D. Vernick, C. E. Taylor, 1998 Complexities in the genetic structure of Anopheles gambiae populations in west Africa as revealed by microsatellite DNA analysis Proc. Natl. Acad. Sci. USA 95:14260-14265

    Lehmann T., N. J. Besansky, W. A. Hawley, T. G. Fahey, L. Kamau, F. H. Collins, 1997 Microgeographic structure of Anopheles gambiae in western Kenya based on mtDNA and microsatellite loci Mol. Ecol 6:243-253

    Lehmann T., C. R. Blackston, N. J. Besansky, A. A. Escalante, W. A. Hawley, F. H. Collins, 2000 The Rift Valley complex as a barrier to gene flow for Anopheles gambiae in Kenya: the mtDNA perspective J. Hered 91:165-168

    Lehmann T., W. A. Hawley, H. Grebert, F. H. Collins, 1998 The effective population size of Anopheles gambiae in Kenya: implications for population structure Mol. Biol. Evol 15:264-276

    Lehmann T., W. A. Hawley, H. Grebert, M. Danga, F. Atelli, F. H. Collins, 1999 The Rift Valley complex as a barrier to gene flow for Anopheles gambiae in Kenya J. Hered 90:613-621

    Lehmann T., W. A. Hawley, L. Kamau, D. Fontenille, F. Simard, F. H. Collins, 1996 Genetic differentiation of Anopheles gambiae populations from East and West Africa: comparison of microsatellite and allozyme loci Heredity 77:192-200

    Nei M., W.-H. Li, 1976 The transient distribution of allele frequencies under mutation pressure Genet. Res 28:205-214

    Nicholson S. E., 1995 Environmental change within the historical period Pp. 60–75 in A. S. Goudie, W. M. Adams, and A. Orme, eds. The physical geography of Africa. Oxford University Press, Oxford, England

    Petrarca V., J. C. Beier, 1992 Intraspecific chromosomal polymorphism in the Anopheles gambiae complex as a factor affecting malaria transmission in the Kisumu area of Kenya Am. J. Trop. Med. Hyg 46:229-237

    Petrarca V., A. D. Nugud, M. A. Ahmed, A. M. Haridi, M. A. Di Deco, M. Coluzzi, 2000 Cytogenetics of the Anopheles gambiae complex in Sudan, with special reference to An. arabiensis: relationships with East and West African populations Med. Vet. Entomol 14:149-164

    Primmer C. R., N. Saino, A. P. Møller, H. Ellegren, 1998 Unraveling the process of microsatellite evolution through analysis of germ line mutations in barn swallows Hirundo rustica. Mol. Biol. Evol 15:1047-1054

    Reader J., 1997 Africa: a biography of the continent Vintage, New York

    Reich D. E., M. W. Feldman, D. B. Goldstein, 1999 Statistical properties of two tests that use multilocus data sets to detect population expansion Mol. Biol. Evol 16:453-466

    Reich D. E., D. B. Goldstein, 1998 Genetic evidence for a Paleolithic human population expansion in Africa Proc. Natl. Acad. Sci. USA 95:8119-8123

    Rogers A. R., 1995 Genetic evidence for a Pleistocene population explosion Evolution 49:608-615

    Rogers A. R., H. Harpending, 1992 Population growth makes waves in the distribution of pairwise genetic differences Mol. Biol. Evol 9:552-569

    Rozas J., R. Rozas, 1999 DnaSP version 3: an integrated program for molecular population genetics and molecular evolution analysis Bioinformatics 15:174-175

    SAS Institute. 1990 SAS language: references Version 6. 1st edition. SAS, Cary, N.C

    Schlötterer C., R. Ritter, B. Harr, G. Brem, 1998 High mutation rate of a long microsatellite allele in Drosophila melanogaster provides evidence for allele-specific mutation rates Mol. Biol. Evol 15:1269-1274

    Scott J. A., W. G. Brogdon, F. H. Collins, 1993 Identification of single specimens of the Anopheles gambiae complex by the polymerase chain reaction Am. J. Trop. Med. Hyg 49:520-529

    Shriver M. D., L. Jin, R. Chakraborty, E. Boerwinkle, 1993 VNTR allele frequency distributions under the stepwise mutation model: a computer simulation approach Genetics 134:983-993

    Simard F., T. Lehmann, J. J. Lemasson, M. Diatta, D. Fontenille, 2000 Persistence of Anopheles arabiensis during the severe dry season conditions in Senegal: an indirect approach using microsatellite loci Insect Mol. Biol 9:467-479

    Slatkin M., 1994a. Gene flow and population structure Pp. 3–17 in L. R. Real, ed. Ecological genetics. Princeton University Press, Princeton, N.J

    ———. 1994b. Linkage disequilibrium in growing and stable populations Genetics 137:331-336

    ———. 1995 A measure of population subdivision based on microsatellite allele frequencies Genetics 139:457-462

    Slatkin M., R. R. Hudson, 1991 Pairwise comparisons of mitochondrial DNA sequences in stable and exponentially growing populations Genetics 129:555-562

    Tajima F., 1989 The effect of change in population size on DNA polymorphism Genetics 123:597-601

    Takahata N., 1983 Gene identity and genetic differentiation of populations in the finite island model Genetics 104:497-512

    Taylor C. E., Y. T. Toure, M. Coluzzi, V. Petrarca, 1993 Effective population size and persistence of Anopheles arabiensis during the dry season in west Africa Med. Vet. Entomol 7:351-357

    Watterson G. A., 1978 The homozygosity test of neutrality Genetics 88:405-417

    Weber J. L., C. Wong, 1993 Mutation of human short tandem repeats Hum. Mol. Genet 2:1123-1128

    Wright S., 1978 Evolution and the genetics of populations, Vol. 4 Variability among and within populations. 2nd edition. University of Chicago Press, Chicago

    Zheng L., M. Q. Benedict, A. J. Cornel, F. H. Collins, F. C. Kafatos, 1996 An integrated genetic map of the African human malaria vector mosquito, Anopheles gambiae. Genetics 143:941-952

    Zhivotovsky L. A., M. W. Feldman, 1995 Microsatellite variability and genetic distances Proc. Natl. Acad. Sci. USA 92:11549-11552

Accepted for publication March 13, 2001.