Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
Abstract
Key Words: selection, humans, genetic distance, STR
Introduction
One approach, suggested many years ago, is based on the idea that genes subjected to local selection should exhibit larger-than-average genetic distances between populations (Cavalli-Sforza 1966). Because this should also hold for marker loci closely linked to the selected locus, screening random marker loci across the genome for large genetic distances might be a useful way to identify genomic regions under local selection. Indeed, Lewontin and Krakauer (1973) proposed a statistical test, based on the expected variance in Fst values for a sample of loci, to detect loci with significantly large Fst values. Unfortunately, the test was flawed (Lewontin and Krakauer 1975; Nei and Maruyama 1975; Robertson 1975), and the approach was largely abandoned. However, although the specific test is not valid, the general idea may still have merit. Other authors have suggested that selection may be responsible for particular loci that exhibit large genetic distances between populations (Bowcock et al. 1991), and recently there has been a resurgence of interest in methods to detect such loci (Beaumont and Nichols 1996; Baer 1999; Vitalis, Dawson, and Boursot 2001; Akey et al. 2002; Balloux and Goudet 2002; Payseur, Cutter, and Nachman 2002; Schlötterer 2002a). In this paper we investigate the feasibility of a genome scan approach to detect marker loci that exhibit large genetic distances between human populations, as a means of identifying candidate genes that have experienced local selection.
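The outlier idea can be illustrated with a minimal sketch: compute a per-locus differentiation statistic between two populations and rank loci by it. The sketch assumes Nei's G_ST = (H_T - H_S)/H_T as the Fst analogue (our choice for illustration only; the analyses in this paper use Rst and ln RV), equal weighting of the two populations, no sample-size correction, and hypothetical data given as lists of allele labels, one entry per gene copy.

```python
from collections import Counter


def allele_freqs(gene_copies):
    """Relative frequency of each allele in a list of gene copies."""
    counts = Counter(gene_copies)
    n = len(gene_copies)
    return {allele: c / n for allele, c in counts.items()}


def gene_diversity(freqs):
    """Expected heterozygosity 1 - sum(p_i^2), without sample-size correction."""
    return 1.0 - sum(p * p for p in freqs.values())


def gst(pop1, pop2):
    """Nei's G_ST for two equally weighted populations (an Fst analogue)."""
    f1, f2 = allele_freqs(pop1), allele_freqs(pop2)
    pooled = {a: (f1.get(a, 0.0) + f2.get(a, 0.0)) / 2 for a in set(f1) | set(f2)}
    h_s = (gene_diversity(f1) + gene_diversity(f2)) / 2  # mean within-population diversity
    h_t = gene_diversity(pooled)                          # diversity of the pooled frequencies
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


def rank_by_differentiation(loci):
    """loci: {name: (pop1_copies, pop2_copies)} -> names, most differentiated first."""
    scores = {name: gst(p1, p2) for name, (p1, p2) in loci.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical example: locus "B" is fixed for different alleles in the two samples.
example = {"A": ([1, 2, 1, 2], [1, 2, 2, 1]), "B": ([1, 1, 1, 1], [2, 2, 2, 2])}
print(rank_by_differentiation(example))  # ['B', 'A']
```

In a real genome scan the loci at the top of such a ranking would be the candidates; the difficulty, discussed below, lies in deciding how extreme a value must be before it is unlikely under neutrality.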
Materials and Methods
Genotyping was carried out at the Swedish Genome Center, Uppsala, using loci and methods as described previously (Lindqvist et al. 1996), with the following exceptions: amplifications were carried out in a 9-µl volume; PCR amplification, pooling, and dilution of PCR products were performed with an ABI 877 Integrated Thermal Cycler (PE Applied Biosystems, Inc.); and fragment length analysis was carried out with an ABI PRISM 3700 DNA Analyzer and GeneScan software (PE Applied Biosystems, Inc.). Genotypes are available from the authors upon request.
Additional candidate STR loci in particular genomic regions of interest were identified by screening the DNA sequence with the UCSC Genome Browser Gateway (Kent et al. 2002; http://genome.ucsc.edu/cgi-bin/hgGateway) for 14 or more tandem copies of a dinucleotide repeat unit, or six or more tandem copies of a trinucleotide or tetranucleotide repeat unit. Primers were designed with the Primer3 program (http://www-genome.wi.mit.edu/genome_software/other/primer3.html), and fluorescently labeled PCR was performed using standard conditions. The PCR products from up to three nonoverlapping STR loci were pooled and analyzed on an ABI 377 DNA Sequencer with GeneScan software. All loci have been submitted to the Human Genome Database (http://www.gdb.org), from which additional marker and typing details can be obtained. For some loci, direct DNA sequence analysis was performed using the Big Dye Terminator Cycle Sequencing Ready Reaction Kit and an ABI 377 DNA Sequencer (PE Applied Biosystems, Inc.).
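The repeat-copy thresholds above can be expressed as a simple sequence scan. The following is a minimal sketch, assuming perfect (uninterrupted) tandem repeats on a plain A/C/G/T string; it does not reproduce whatever search the UCSC browser performs, and the function names are hypothetical.

```python
import re

# Thresholds taken from the text: >= 14 tandem copies of a dinucleotide unit,
# >= 6 tandem copies of a trinucleotide or tetranucleotide unit.
MIN_COPIES = {2: 14, 3: 6, 4: 6}


def is_primitive(unit):
    """False if the unit is itself a repeat of a shorter motif (e.g. 'AA', 'ACAC')."""
    k = len(unit)
    return all(unit != unit[:d] * (k // d) for d in range(1, k) if k % d == 0)


def find_strs(seq):
    """Return sorted (start, unit, copies) for perfect repeats meeting MIN_COPIES."""
    seq = seq.upper()
    hits = []
    for k, min_copies in MIN_COPIES.items():
        # A k-mer followed by at least (min_copies - 1) exact copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_copies - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            if is_primitive(unit):  # skip e.g. 'ACAC', which is really an 'AC' repeat
                hits.append((m.start(), unit, len(m.group(0)) // k))
    return sorted(hits)


demo = "GG" + "AC" * 14 + "TT" + "GATA" * 6 + "CC"
print(find_strs(demo))  # [(2, 'AC', 14), (32, 'GATA', 6)]
```

Requiring the repeat unit to be primitive keeps a long dinucleotide run from also being reported as a spurious tetranucleotide repeat; imperfect or interrupted repeats would need a more tolerant search.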
The average heterozygosity, number of alleles per locus, and Rst values were calculated, and tests for goodness of fit to Hardy-Weinberg proportions were performed, with the software FSTAT 2.9.3 (http://www.unil.ch/izea/softwares/fstat.html). Rst is analogous to Fst but is based on a stepwise mutation model (Slatkin 1995), which is generally considered more appropriate for STR loci; large Rst values indicate large genetic distances between populations. We also calculated for each locus the ln RV value, the natural log of the ratio of the allele-size variances in two populations (Schlötterer 2002a), with the African variance in the numerator. Large positive or negative values of ln RV for a particular STR locus indicate that one population has a much smaller allele-size variance, which in turn might reflect a recent selective sweep at a nearby locus in that population. One-way ANOVA and nonparametric tests were carried out with STATISTICA (Statsoft, Inc.).
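The two distance measures can be sketched directly from their definitions. This is a minimal illustration, not FSTAT's estimator: it assumes allele sizes coded as repeat counts, uses the variance of the pooled sample for Slatkin's S-bar and the mean within-sample variance for S_W (the factor of 2 in Slatkin's definition cancels), and applies no sample-size correction. The function names are hypothetical.

```python
import math
from collections import Counter
from statistics import pvariance


def expected_heterozygosity(gene_copies):
    """Nei's unbiased gene diversity: n/(n-1) * (1 - sum p_i^2)."""
    n = len(gene_copies)
    freqs = [c / n for c in Counter(gene_copies).values()]
    return n / (n - 1) * (1.0 - sum(p * p for p in freqs))


def rst(sizes1, sizes2):
    """Moment form of Slatkin's R_ST = (S_bar - S_W) / S_bar for two samples."""
    v_total = pvariance(list(sizes1) + list(sizes2))  # pooled allele-size variance
    if v_total == 0:
        return 0.0  # monomorphic locus: no differentiation to measure
    v_within = (pvariance(sizes1) + pvariance(sizes2)) / 2
    return (v_total - v_within) / v_total


def ln_rv(sizes_num, sizes_den):
    """ln RV = ln(Var_num / Var_den); positive if the denominator sample is less variable."""
    v_num, v_den = pvariance(sizes_num), pvariance(sizes_den)
    if v_num == 0 or v_den == 0:
        raise ValueError("ln RV is undefined when either sample is monomorphic")
    return math.log(v_num / v_den)
```

With the African sample passed as `sizes_num`, as in the convention described above, a large positive ln RV points to reduced allele-size variance in Europeans.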
Results
The 332 loci included 54 dinucleotide repeats, 41 trinucleotide repeats, and 237 tetranucleotide repeats. In both Africans and Europeans, average heterozygosity and the number of alleles differed significantly among repeat types (table 1), and one-way ANOVA showed that all measures of variation were significantly higher for dinucleotide repeats. The Africans had significantly higher average heterozygosity and number of alleles per locus than the Europeans (Wilcoxon matched pairs test: heterozygosity, , ; number of alleles, , ). The difference remained significant when the Ethiopians and South Africans were analyzed separately (results not shown).
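A paired comparison of this kind can be sketched as a Wilcoxon matched-pairs signed-rank test with the large-sample normal approximation. This is a textbook version, not necessarily STATISTICA's exact procedure: zero differences are dropped, tied absolute differences receive average ranks, and no tie or continuity correction is applied to the variance. The input values in the usage lines are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks of values, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_signed_rank(x, y):
    """Return (T, z, two-sided p); T is the smaller of the signed rank sums."""
    d = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    n = len(d)
    if n == 0:
        raise ValueError("all paired differences are zero")
    ranks = average_ranks([abs(v) for v in d])
    w_plus = sum(r for r, v in zip(ranks, d) if v > 0)
    w_minus = sum(r for r, v in zip(ranks, d) if v < 0)
    t = min(w_plus, w_minus)
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (t - mu) / sigma
    return t, z, math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value


# Hypothetical per-locus heterozygosities for the same loci in two populations.
het_africa = [0.82, 0.77, 0.69, 0.88, 0.74, 0.80]
het_europe = [0.78, 0.75, 0.70, 0.81, 0.69, 0.76]
print(wilcoxon_signed_rank(het_africa, het_europe))
```

Pairing by locus is what makes the matched-pairs test appropriate here: each locus contributes one African and one European value, so differences between loci in intrinsic variability cancel out.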
To investigate the properties of these unusual genomic regions further, we selected one such region and searched for additional STR loci. The target locus, D2S1400, has both a high Rst value (0.317) and a high ln RV value (2.30) and maps 0.9 kb from the gene for E2F transcription factor 6 (E2F6; RefSeq ID NM_001952). Four additional STR loci were characterized in this region (fig. 6); of these, only D2S3021, which lies in the immediate vicinity of the E2F6 gene, also has high Rst and ln RV values, suggesting that this gene might have been a target of local selection.
Discussion
There were no significant differences between autosomal and X-linked loci in within-population variability. This contrasts with estimates of nucleotide diversity based on DNA sequence analysis, in which lower variability was found for X-linked loci (Yu et al. 2002), although the statistical significance of that difference was not reported. Similarly, Payseur, Cutter, and Nachman (2002) recently found weak (i.e., statistically nonsignificant) evidence of lower diversity on the X chromosome in published STR data for Europeans. Because the effective population size of the X chromosome is smaller than that of the autosomes, diversity is expected to be lower for X-linked loci, although the size of this reduction also depends on how unequal reproductive success is among males (Caballero 1995): the fewer males that reproduce each generation, the more similar the effective sizes of X-linked and autosomal loci. Our failure to detect differences in variability between X-linked and autosomal loci may reflect the high intrinsic mutation rate of STR loci, or other factors such as differential reproduction among males.
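The dependence on male reproduction can be made concrete with Wright's classical effective-size formulas for a population with separate sexes: N_e = 4 N_m N_f / (N_m + N_f) for autosomal loci and N_e = 9 N_m N_f / (4 N_m + 2 N_f) for X-linked loci, where N_m and N_f are the numbers of breeding males and females. The sketch assumes random mating, constant population size, and Poisson variation in offspring number; Caballero (1995) gives more general expressions. The breeding numbers used are hypothetical.

```python
def ne_autosomal(n_males, n_females):
    """Wright's effective size for autosomal loci with separate sexes."""
    return 4 * n_males * n_females / (n_males + n_females)


def ne_x_linked(n_males, n_females):
    """Corresponding effective size for X-linked loci (males hemizygous)."""
    return 9 * n_males * n_females / (4 * n_males + 2 * n_females)


def x_to_autosome_ratio(n_males, n_females):
    """Ratio of X-linked to autosomal effective size."""
    return ne_x_linked(n_males, n_females) / ne_autosomal(n_males, n_females)


# Hypothetical breeding numbers: 1,000 females and progressively fewer breeding males.
for n_males in (1000, 500, 250, 100):
    print(n_males, round(x_to_autosome_ratio(n_males, 1000), 3))
```

With equal numbers of breeding males and females the ratio is 0.75; it rises to 1 when breeding females outnumber breeding males seven to one, and approaches 9/8 as the number of breeding males becomes very small.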
Similarly, we found no significant differences between autosomal and X-linked loci in genetic differentiation between populations, although both Rst and ln RV values were larger for the X-linked loci. However, three of the 15 loci that we identified for further study because of high Rst and/or ln RV values are on the X chromosome, significantly more than expected by chance (, , ). Payseur, Cutter, and Nachman (2002) also found evidence for more departures from neutrality for STR loci on the X chromosome. They interpreted this as evidence for more positive selection on the X chromosome, but because the data they analyzed came from a single population source (Europeans), their results could reflect either local selection (i.e., affecting only Europeans) or positive selection (i.e., affecting all human populations). Larger genetic differences due to drift would be expected for X-linked loci if their effective size is indeed smaller than that of autosomal loci. In addition, local selection involving recessive advantageous mutations would be expected to occur more frequently on the X chromosome (Charlesworth, Coyne, and Barton 1987), because recessive mutations at autosomal loci are dominated by drift early in their history, until they reach a frequency high enough for appreciable numbers of recessive homozygotes to appear, at which point selection can act. By contrast, the phenotype of a recessive advantageous mutation on the X chromosome is immediately exposed in hemizygous males, so selection is much more effective. That diversity is not lower for X-linked loci, whereas genetic differentiation is larger (albeit not significantly so), suggests that differences in effective population size alone cannot explain these patterns. More loci need to be examined to determine whether genetic differentiation between human populations is indeed larger for X-linked loci and, if so, to what extent a reduced effective population size or a greater propensity for local selection is responsible.
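One way to ask whether an excess of X-linked loci among the candidates exceeds chance is a one-sided hypergeometric (Fisher-type) tail probability: the chance of drawing at least k X-linked loci when the candidates are treated as a random sample, without replacement, from all screened loci. This is an illustrative choice; the test actually used is not recoverable from this section, and the number of X-linked loci in the screen below is a placeholder, not a value from the study.

```python
from math import comb


def hypergeom_upper_tail(k, n_drawn, n_x, n_total):
    """P(X >= k) when n_drawn loci are sampled without replacement from
    n_total screened loci, n_x of which are X-linked."""
    total = comb(n_total, n_drawn)
    return sum(
        comb(n_x, i) * comb(n_total - n_x, n_drawn - i)
        for i in range(k, min(n_drawn, n_x) + 1)
    ) / total


# From the text: 3 X-linked loci among 15 candidates; 332 loci were screened.
# PLACEHOLDER: the number of X-linked loci in the screen is not given here;
# substitute the real count before interpreting the result.
N_X_SCREENED_PLACEHOLDER = 20
print(hypergeom_upper_tail(3, 15, N_X_SCREENED_PLACEHOLDER, 332))
```

Sampling without replacement is the natural model because each screened locus can be selected as a candidate at most once; for a screen this large, a binomial approximation would give a similar answer.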
To detect candidate genomic regions under local selection, we calculated two measures of genetic distance and looked for outliers (i.e., loci with unusually large genetic distance values). The rationale for this approach, first proposed by Cavalli-Sforza (1966) and Lewontin and Krakauer (1973), is that because local selection by definition inflates allele frequency differences between populations, marker loci with unusually large genetic distance values are good candidates for local selection. However, whether an STR locus closely linked to a gene under local selection shows an unusually large genetic distance value depends on a number of factors, including the strength of selection, the time elapsed since selection began, the amount of recombination between the marker and the selected locus, the mutation rate of the STR locus, and the particular STR allele carried by the selected haplotype. These factors affect Rst and ln RV values differently. Following the appearance of a new favorable mutation, the frequency of the selected allele will increase, as will the frequency of the STR allele on the same haplotype. For ln RV, local selection will reduce the allele-size variance at the STR locus in the population undergoing the local selective sweep, relative to a population that is not, leading to an unusually large ln RV value. Over time, new mutations at the STR locus, as well as recombination between the STR locus and the selected gene, will restore allele-size variance. Eventually the variances in the selected and nonselected populations will equalize, and the signal of local selection will no longer be evident in the ln RV value.
The impact of local selection on Rst values will depend on the particular STR allele that is on the haplotype of the selected allele. If the STR allele happens to be a common allele, then during the selective sweep Rst values will first increase moderately (due to the decrease in allelic variance) but then decrease as new mutations and/or recombination regenerate allelic variation at the STR locus. However, if the STR allele on the selected haplotype happens by chance to be a rare allele, then there will be a large increase in the Rst value, as the mode of the allele frequency distribution at the STR locus has shifted. The large Rst value will be maintained even as new alleles are generated by mutation, as the stepwise mutation process will generate new alleles around the new modal allele, and hence the entire allele frequency distribution at the STR locus will shift. Thus, the power of ln RV values to detect local selection will be highest immediately following the onset of selection and will then decline over time, whereas the power of Rst values should not decline with time, but instead will depend on the extent to which new modal alleles were produced as a consequence of selection.
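The contrast between a common and a rare hitchhiking STR allele can be made concrete with invented allele sizes. In this sketch (illustrative numbers only, not data from the study) the same reference sample is compared with two post-sweep samples that have identical allele-size variance but carry either the reference sample's modal allele or one of its rarest alleles; Rst and ln RV are computed in the simple moment forms, without sample-size corrections.

```python
import math
from statistics import pvariance


def rst(a, b):
    """Moment form of Slatkin's R_ST for two samples of allele sizes."""
    v_total = pvariance(a + b)
    return (v_total - (pvariance(a) + pvariance(b)) / 2) / v_total


def ln_rv(num, den):
    """Natural log of the ratio of allele-size variances (num / den)."""
    return math.log(pvariance(num) / pvariance(den))


# Invented allele sizes (repeat counts), for illustration only.
africa = [8, 9, 10, 10, 10, 11, 11, 12, 13, 14]  # unswept reference sample
common_sweep = [10] * 9 + [11]  # hitchhiking STR allele was the common one (10)
rare_sweep = [14] * 9 + [13]    # hitchhiking STR allele was a rare one (14)

for label, europe in (("common", common_sweep), ("rare", rare_sweep)):
    print(label, round(rst(africa, europe), 3), round(ln_rv(africa, europe), 2))
```

Both swept samples give the same ln RV (about 3.5), but Rst is only about 0.07 when the hitchhiking allele was common and about 0.61 when it was rare, because in the latter case the mode of the allele-size distribution has shifted.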
We assumed that local selection would primarily influence Europeans, as modern humans originated in Africa, and hence new opportunities for local selection would have occurred as modern human populations spread out of Africa. Some support for this assumption comes from the distribution of ln RV values (fig. 3), in which there is an excess of extreme positive values (i.e., in the right-hand tail); since the variance in Africans appears in the numerator of the ln RV value, this indicates that there are many more loci showing significantly reduced variation in Europeans (relative to Africans) than in Africans (relative to Europeans). However, this should be interpreted cautiously, as an extreme bottleneck in Europeans, which is suggested by some genetic data (Tishkoff et al. 1996; Yu et al. 2002), could also lead to an excess of loci with significantly reduced variation in Europeans relative to Africans (Schlötterer 2002a).
Ideally, to identify significant outliers, the observed distributions of Rst and ln RV values would be compared with those expected under neutrality. Although some progress has been made toward understanding the statistical properties of Rst and ln RV values (Balloux and Goudet 2002; Schlötterer 2002a), the models' underlying assumptions about demographic history, migration, and mutation raise questions about whether analytical approaches based on these models can reliably identify human loci in genomic regions that have been subjected to local selection. A key issue that has not been adequately addressed is the extent to which lack of independence among population samples influences the expected distribution of these statistics under neutrality versus selection (Nei and Chakravarti 1977; Nei, Chakravarti, and Tateno 1977). Further theoretical work on the statistical properties of Rst and ln RV values, as well as the development of new methods for detecting local selection (e.g., Sabeti et al. 2002), is required to make full use of the approach and the data.
We therefore adopted an empirical approach, based on the simple assumption that the loci with the largest Rst and/or ln RV values are the most likely candidates, and that if local selection has indeed operated on the genomic regions containing such loci, then other loci in these regions should also exhibit unusually large Rst and/or ln RV values. This was indeed the case (table 2 and fig. 6): the additional STR loci that we characterized near the target loci had Rst and ln RV values that were significantly larger than average. However, although the average values for the nearby loci were significantly larger, some individual nearby loci did not exhibit unusually large Rst and/or ln RV values. This could indicate that some of these genomic regions do not in fact exhibit unusually large genetic distances. Alternatively, a particular nearby STR locus may have less power than the original locus to detect unusually large genetic distances, because power depends on the number of alleles and the overall heterozygosity of the locus. Indeed, average heterozygosity and the number of alleles were both lower for the nearby loci than for the original target loci (table 2), significantly so for average heterozygosity (Mann-Whitney U test, ) and nearly significantly so for the number of alleles (). Thus, the failure of a nearby locus to confirm the large Rst and/or ln RV value of a particular target locus does not rule out local selection on that genomic region.
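The empirical approach amounts to flagging loci in the upper tail of the genome-wide distributions. A minimal sketch follows, assuming statistics stored as a mapping from locus name to (Rst, ln RV); the 5% default cutoff is hypothetical (the cutoff used in the study is not stated in this section), and both tails of ln RV are treated as extreme, following the Methods. Using `v` instead of `abs(v)` would restrict the scan to reduced variance in the denominator (European) sample.

```python
def upper_tail_cutoff(values, top_fraction):
    """Smallest value among the top `top_fraction` of values (at least one value)."""
    ranked = sorted(values, reverse=True)
    k = max(1, int(len(ranked) * top_fraction))
    return ranked[k - 1]


def candidate_loci(stats, top_fraction=0.05):
    """stats maps locus name -> (rst, ln_rv).

    A locus is flagged if its Rst, or the absolute value of its ln RV, lies in
    the empirical upper tail. The 5% default is a hypothetical choice.
    """
    rst_cut = upper_tail_cutoff([r for r, _ in stats.values()], top_fraction)
    lnrv_cut = upper_tail_cutoff([abs(v) for _, v in stats.values()], top_fraction)
    return sorted(
        name for name, (r, v) in stats.items() if r >= rst_cut or abs(v) >= lnrv_cut
    )


# Hypothetical values for four loci.
example_stats = {"a": (0.01, 0.1), "b": (0.5, 0.0), "c": (0.02, -3.0), "d": (0.03, 0.2)}
print(candidate_loci(example_stats, top_fraction=0.25))  # ['b', 'c']
```

Because the cutoff is set by the observed distribution rather than a neutral model, this ranking identifies candidates but says nothing by itself about statistical significance, which is why the follow-up typing of nearby loci matters.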
As a further check on the reproducibility of our results, we compared them with other recent studies. Schlötterer (2002a) applied his ln RV statistic to 94 loci that had been typed in 10 African and non-African populations. Although the number of outliers was consistent with neutral expectations, he considered four loci to be possible candidates for local selection. One of these four loci, D6S305, was also included in our study; Schlötterer (2002a) found highly reduced variation at this locus in African populations, as did we (ln RV = −0.97, compared with the average ln RV value of 0.20). In addition, Rosenberg et al. (2002) recently analyzed 377 STR loci in 1052 individuals from 52 populations. Five of the 11 loci that we identified with high Rst and/or ln RV values were also analyzed in their study. We obtained their data for the Bantu and French samples (the most comparable to our African and European samples) and calculated Rst and ln RV values for these five loci (table 3). The mean Rst and ln RV values for these five loci were nearly identical in the two studies, and in the Bantu/French comparison they were significantly higher than the values for all of the loci in our study (Mann-Whitney tests: Rst, , ; ln RV, , ). Thus, loci that we identify as outliers in our study are also outliers when other samples are analyzed.
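Comparisons of a small set of loci against all loci, as here and in the heterozygosity comparison of target and nearby loci, can be sketched as a Mann-Whitney U test with the normal approximation. This is a textbook form, not necessarily STATISTICA's implementation: tied values receive average ranks, and no tie or continuity correction is applied. The values in the usage line are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks of values, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U, z, two-sided p) for independent samples x and y."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))  # rank the combined sample
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2  # U statistic for sample x
    u = min(u1, n1 * n2 - u1)
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    return u, z, math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value


# Hypothetical Rst values: a few outlier loci versus a larger background set.
print(mann_whitney_u([0.31, 0.28, 0.25], [0.02, 0.05, 0.01, 0.04, 0.07, 0.03]))
```

A rank-based test suits these statistics because the genome-wide distributions of Rst and ln RV are skewed and their neutral shape is not known.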
Further extensions to this approach include screening additional populations and incorporating additional loci into the genome scan (especially as developing technology enables genome scans based on SNPs). Moreover, loci under local selection in other species could also be identified by genome scans. Our results indicate that genome scans should aid in the identification of candidate regions under local selection in human populations, which will increase our knowledge of the selective factors and forces that have shaped human genetic variation.
Acknowledgements
Footnotes
Naruya Saitou, Associate Editor
Literature Cited
Akey, J. M., G. Zhang, K. Zhang, L. Jin, and M. D. Shriver. 2002. Interrogating a high-density SNP map for signatures of natural selection. Genome Res. 12:1805-1814.
Allison, A. C. 1954. Protection afforded by sickle-cell trait against subtertian malarial infection. Br. Med. J. 1:290-294.
Baer, C. F. 1999. Among-locus variation in Fst: fish, allozymes and the Lewontin-Krakauer test revisited. Genetics 152:653-659.
Balloux, F., and J. Goudet. 2002. Statistical properties of population differentiation estimators under stepwise mutation in a finite island model. Mol. Ecol. 11:771-783.
Beaumont, M. A., and R. A. Nichols. 1996. Evaluating loci for use in the genetic analysis of population structure. Proc. Roy. Soc. Lond. B Biol. Sci. 263:1619-1626.
Bowcock, A. M., J. R. Kidd, J. L. Mountain, J. M. Hebert, L. Carotenuto, K. K. Kidd, and L. L. Cavalli-Sforza. 1991. Drift, admixture, and selection in human evolution: a study with DNA polymorphisms. Proc. Natl. Acad. Sci. USA 88:839-843.
Caballero, A. 1995. On the effective size of populations with separate sexes, with particular reference to sex-linked genes. Genetics 139:1007-1011.
Cavalli-Sforza, L. L. 1966. Population structure and human evolution. Proc. Roy. Soc. Lond. B Biol. Sci. 164:362-379.
Chakraborty, R., M. Kimmel, D. N. Stivers, L. J. Davison, and R. Deka. 1997. Relative mutation rates at di-, tri-, and tetranucleotide microsatellite loci. Proc. Natl. Acad. Sci. USA 94:1041-1046.
Charlesworth, B., J. A. Coyne, and N. H. Barton. 1987. The relative rates of evolution of sex chromosomes and autosomes. Am. Nat. 130:113-146.
Hamblin, M. T., E. E. Thompson, and A. Di Rienzo. 2002. Complex signatures of natural selection at the Duffy blood group locus. Am. J. Hum. Genet. 70:369-383.
Hill, A. V. S., C. E. M. Allsopp, D. Kwiatkowski, N. M. Anstey, P. Twumasi, P. A. Rowe, S. Bennett, D. Brewster, A. J. McMichael, and B. M. Greenwood. 1991. Common West African HLA antigens are associated with protection from severe malaria. Nature 352:595-600.
Johnson, D. G., and R. Schneider-Broussard. 1998. Role of E2F in cell cycle control and cancer. Frontiers Biosci. 3:447-458.
Jorde, L. B., W. S. Watkins, M. J. Bamshad, M. E. Dixon, C. E. Ricker, M. T. Seielstad, and M. A. Batzer. 2000. The distribution of human genetic diversity: a comparison of mitochondrial, autosomal, and Y-chromosome data. Am. J. Hum. Genet. 66:979-988.
Kent, W. J., C. W. Sugnet, T. S. Furey, K. M. Roskin, T. H. Pringle, A. M. Zahler, and D. Haussler. 2002. The human genome browser at UCSC. Genome Res. 12:996-1006.
Lahr, M. M., and R. A. Foley. 1998. Towards a theory of modern human origins: geography, demography, and diversity in recent human evolution. Yrbk. Phys. Anthropol. 41:137-176.
Lewontin, R. C., and J. Krakauer. 1973. Distribution of gene frequency as a test of the theory of the selective neutrality of polymorphisms. Genetics 74:175-195.
Lewontin, R. C., and J. Krakauer. 1975. Letters to the editors: testing the heterogeneity of F values. Genetics 80:397-398.
Lindqvist, A. K., P. K. Magnusson, J. Balciuniene, C. Wadelius, E. Lindholm, M. E. Alarcon-Riquelme, and U. B. Gyllensten. 1996. Chromosome-specific panels of tri- and tetranucleotide microsatellite markers for multiplex fluorescent detection and automated genotyping: evaluation of their utility in pathology and forensics. Genome Res. 6:1170-1176.
Luzzatto, L., E. Usanga, and S. Reddy. 1969. Glucose-6-phosphate dehydrogenase deficient red cells: resistance to infection by malarial parasites. Science 164:839-841.
Miller, L. H., S. J. Mason, D. F. Clyde, and M. H. McGinniss. 1976. The resistance factor to Plasmodium vivax in blacks. The Duffy-blood-group genotype, FyFy. N. Engl. J. Med. 295:302-304.
Mountain, J. L. 1998. Molecular evolution and modern human origins. Evol. Anthropol. 7:21-37.
Nei, M., and A. Chakravarti. 1977. Drift variances of Fst and Gst statistics obtained from a finite number of isolated populations. Theor. Popul. Biol. 11:307-325.
Nei, M., A. Chakravarti, and Y. Tateno. 1977. Mean and variance of Fst in a finite number of incompletely isolated populations. Theor. Popul. Biol. 11:291-306.
Nei, M., and T. Maruyama. 1975. Letters to the editors: Lewontin-Krakauer test for neutral genes. Genetics 80:395.
Ogawa, H., K. Ishiguro, S. Gaubatz, D. M. Livingston, and Y. Nakatani. 2002. A complex with chromatin modifiers that occupies E2F- and Myc-responsive genes in G0 cells. Science 296:1132-1136.
Payseur, B. A., A. D. Cutter, and M. W. Nachman. 2002. Searching for evidence of positive selection in the human genome using patterns of microsatellite variability. Mol. Biol. Evol. 19:1143-1153.
Robertson, A. 1975. Gene frequency distributions as a test of selective neutrality. Genetics 81:775-785.
Rosenberg, N. A., J. K. Pritchard, J. L. Weber, H. M. Cann, K. K. Kidd, L. A. Zhivotovsky, and M. W. Feldman. 2002. Genetic structure of human populations. Science 298:2381-2385.
Sabeti, P. C., D. E. Reich, and J. M. Higgins, et al. (17 co-authors). 2002. Detecting recent positive selection in the human genome from haplotype structure. Nature 419:832-837.
Schlötterer, C. 2002a. A microsatellite-based multilocus screen for the identification of local selective sweeps. Genetics 160:753-763.
Schlötterer, C. 2002b. Towards a molecular characterization of adaptation in local populations. Curr. Opin. Genet. Dev. 12:683-687.
Slatkin, M. 1995. A measure of population subdivision based on microsatellite allele frequencies. Genetics 139:457-462.
Stoneking, M. 1993. DNA and recent human evolution. Evol. Anthropol. 2:60-73.
Stoneking, M., J. J. Fontius, S. L. Clifford, H. Soodyall, S. S. Arcot, N. Saha, T. Jenkins, M. A. Tahir, P. L. Deininger, and M. A. Batzer. 1997. Alu insertion polymorphisms and human evolution: evidence for a larger population size in Africa. Genome Res. 7:1061-1071.
Tishkoff, S. A., E. Dietzsch, and W. Speed, et al. (15 co-authors). 1996. Global patterns of linkage disequilibrium at the CD4 locus and modern human origins. Science 271:1380-1387.
Tishkoff, S. A., R. Varkonyi, and N. Cahinhinan, et al. (17 co-authors). 2001. Haplotype diversity and linkage disequilibrium at human G6PD: recent origin of alleles that confer malarial resistance. Science 293:455-462.
Trimarchi, J. M., B. Fairchild, J. Wen, and J. A. Lees. 2001. The E2F6 transcription factor is a component of the mammalian Bmi1-containing polycomb complex. Proc. Natl. Acad. Sci. USA 98:1519-1524.
Vigilant, L., M. Stoneking, H. Harpending, K. Hawkes, and A. C. Wilson. 1991. African populations and the evolution of human mitochondrial DNA. Science 253:1503-1507.
Vitalis, R., K. Dawson, and P. Boursot. 2001. Interpretation of variation across marker loci as evidence of selection. Genetics 158:1811-1823.
Yu, N., F. C. Chen, S. Ota, L. B. Jorde, P. Pamilo, L. Patthy, M. Ramsay, T. Jenkins, S. K. Shyue, and W. H. Li. 2002. Larger genetic differences within Africans than between Africans and Eurasians. Genetics 161:269-274.