Graduate Institute of Epidemiology, College of Public Health, National Taiwan University. No. 1, Jen-Ai Rd, 1st Sec, Taipei, Taiwan. E-mail: wenchung{at}ha.mc.ntu.edu.tw
Sir,

As genetic studies of complex human diseases rely more and more on the epidemiological association paradigm,1 it becomes crucial to determine whether a population is homogeneous or has hidden structure within it. If the former is true, one can infer the genomic location(s) of the putative susceptibility gene(s) for a particular disease simply by comparing marker allele frequencies between cases and controls recruited from the population. If the latter is true, however, a naive case-control approach will produce an excess of false-positive results.2 Here I propose a method to detect population stratification (subdivision) using a panel of single nucleotide polymorphisms (SNP).3 SNP are the most abundant type of human genetic marker. SNP genotyping lends itself to automation, and its cost is expected to fall in the future.
Assume that a total of $p$ SNP (indexed by $i$) have been genotyped in $n$ subjects recruited from a population. For the $i$th SNP, let $b_i$ denote the number of subjects with the heterozygous genotype Mm, and let $a_i$ and $c_i$ denote the numbers of subjects with the homozygous genotypes MM and mm, respectively ($a_i + b_i + c_i = n$). Further define $D_i = 4a_ic_i - b_i(b_i - 1)$. The expectation of $D_i$ can be written as the sum of two terms:
$$E(D_i) = E\left(4a_ic_i - b_i^2\right) + E(b_i).$$
Under the null hypothesis of a Hardy-Weinberg population, the first term becomes

$$-n\,[2f_i(1 - f_i)]$$

(see ref. 4), and the second term becomes $n\,[2f_i(1 - f_i)]$, where $f_i$ is the frequency of allele M at the $i$th SNP. Thus $E(D_i) = 0$ under the null.
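The claim that $E(D_i) = 0$ under the null can be checked exactly for small $n$. The Python sketch below enumerates every genotype configuration $(a_i, b_i, c_i)$ for one SNP, weights $D_i$ by its multinomial probability using exact rational arithmetic, and returns the expectation. The function names and the mixture example are illustrative assumptions, not part of the letter.

```python
from fractions import Fraction
from math import comb

def d_statistic(a, b, c):
    """D = 4ac - b(b - 1) for genotype counts a (MM), b (Mm), c (mm)."""
    return 4 * a * c - b * (b - 1)

def exact_expected_d(n, p_MM, p_Mm, p_mm):
    """Exact E(D) when (a, b, c) ~ Multinomial(n; p_MM, p_Mm, p_mm)."""
    total = Fraction(0)
    for a in range(n + 1):
        for b in range(n + 1 - a):
            c = n - a - b
            # multinomial coefficient n! / (a! b! c!) = C(n, a) * C(n - a, b)
            weight = comb(n, a) * comb(n - a, b) * p_MM**a * p_Mm**b * p_mm**c
            total += d_statistic(a, b, c) * weight
    return total

def hwe_probs(f):
    """Genotype probabilities (MM, Mm, mm) under Hardy-Weinberg, M-allele frequency f."""
    return f * f, 2 * f * (1 - f), (1 - f) * (1 - f)

print(d_statistic(3, 2, 1))  # 10
for n in (2, 5, 10):
    for f in (Fraction(1, 10), Fraction(1, 3), Fraction(1, 2)):
        print(n, f, exact_expected_d(n, *hwe_probs(f)))  # last column is exactly 0

# Hypothetical contrast: 50:50 mixture of two HWE subpopulations with f = 0.1 and 0.9
mixture = (Fraction(41, 100), Fraction(18, 100), Fraction(41, 100))
print(exact_expected_d(2, *mixture))  # 32/25
```

The positive value for the mixture illustrates the excess of homozygotes that hidden structure produces; with exact fractions, the null result is an identity rather than a rounding coincidence.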
One can choose the $p$ SNP to be unlinked, or at least in linkage equilibrium (widely spaced, say more than about 10 cM apart), so that the $D_i$ are mutually independent. Therefore, we have
$$Z = \frac{\sum_{i=1}^{p} D_i}{\sqrt{\sum_{i=1}^{p} D_i^2}}$$
which is distributed asymptotically (for large $p$) as a standard normal variable under the null. Under the alternative hypothesis that the subjects are recruited from a population with hidden structure, $E(D_i) > 0$ because of the well-known Wahlund phenomenon (an excess of homozygotes).5 Thus, an upper one-sided test based on $Z$, which pools the information from the whole panel of SNP, can be used to detect population stratification: at significance level α, the null is rejected when $Z$ exceeds the upper α quantile of the standard normal distribution (e.g., 1.645 for α = 0.05).
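A short supporting derivation (a sketch added for clarity, not taken from the letter) shows why the sign of $E(D_i)$ separates the two hypotheses. Write the genotype probabilities at SNP $i$ in terms of the Hardy-Weinberg disequilibrium coefficient $\Delta_i = P_{MM} - f_i^2$, and use the multinomial moments $E(a_ic_i) = n(n-1)P_{MM}P_{mm}$ and $E[b_i(b_i-1)] = n(n-1)P_{Mm}^2$. For the last line, assume the subjects are drawn at random from a mixture of $K$ subpopulations, each in Hardy-Weinberg equilibrium, with weights $w_k$ and allele frequencies $f_{ik}$, so that $f_i = \sum_k w_k f_{ik}$ and $P_{MM} = \sum_k w_k f_{ik}^2$.

```latex
\begin{aligned}
&P_{MM} = f_i^2 + \Delta_i, \quad P_{Mm} = 2f_i(1-f_i) - 2\Delta_i, \quad P_{mm} = (1-f_i)^2 + \Delta_i, \\
&E(D_i) = 4E(a_ic_i) - E[b_i(b_i-1)] = n(n-1)\left(4P_{MM}P_{mm} - P_{Mm}^2\right) = 4n(n-1)\,\Delta_i, \\
&\Delta_i = \sum_{k=1}^{K} w_k f_{ik}^2 - f_i^2 = \sum_{k=1}^{K} w_k \left(f_{ik} - f_i\right)^2 \ge 0 .
\end{aligned}
```

Hence $E(D_i) = 0$ exactly when $\Delta_i = 0$, and under such a mixture $E(D_i) > 0$ whenever the subpopulation allele frequencies at SNP $i$ differ, which is the Wahlund excess of homozygotes expressed through $D_i$.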
This approach has three important characteristics: (1) it does not require the allele frequencies of the SNP panel to be known in advance; (2) it requires the number of typed markers ($p$) to be large, whereas the number of subjects recruited ($n$) can be small (even $n = 2$ will do, provided $p$ is large); and (3) it involves nothing more than simple arithmetic. The contingency-table χ² test6 and the Hardy-Weinberg test7 have previously been proposed to test for population stratification. However, these two methods can lead to false rejection or false acceptance when the number of subjects recruited is small or when some cell frequencies are small or zero. Another alternative is the structured association approach.8-10 However, it demands computer-intensive modelling that is beyond the reach of most epidemiologists.
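A small simulation can illustrate characteristics (2) and (3). The Python sketch below assumes that $Z$ takes the self-normalized form $\sum_i D_i / \sqrt{\sum_i D_i^2}$ (an assumption of this sketch), codes each genotype as the number of M alleles (0, 1 or 2), and draws every SNP independently, mimicking unlinked markers. The scenario, two subjects typed at 1000 SNP and drawn either from one Hardy-Weinberg population or from two subpopulations with f = 0.1 and f = 0.9, is hypothetical and chosen only for illustration.

```python
import math
import random

def d_statistic(a, b, c):
    """D = 4ac - b(b - 1) for genotype counts a (MM), b (Mm), c (mm)."""
    return 4 * a * c - b * (b - 1)

def draw_genotype(f, rng):
    """Number of M alleles (0, 1 or 2) for one subject under HWE with M-allele frequency f."""
    return int(rng.random() < f) + int(rng.random() < f)

def z_statistic(d_values):
    """Assumed self-normalized form: Z = sum(D_i) / sqrt(sum(D_i ** 2))."""
    scale = math.sqrt(sum(d * d for d in d_values))
    return sum(d_values) / scale if scale > 0 else 0.0

def upper_p_value(z):
    """One-sided p-value P(N(0, 1) > z)."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def simulate_z(freqs_by_subject, rng):
    """Simulate one SNP panel; freqs_by_subject[j][i] is the M-allele frequency
    at SNP i in the (sub)population that subject j comes from."""
    p = len(freqs_by_subject[0])
    d_values = []
    for i in range(p):
        genos = [draw_genotype(freqs[i], rng) for freqs in freqs_by_subject]
        a, b, c = genos.count(2), genos.count(1), genos.count(0)
        d_values.append(d_statistic(a, b, c))
    return z_statistic(d_values)

rng = random.Random(2003)
p = 1000
scenarios = {
    "homogeneous": [[0.5] * p, [0.5] * p],  # n = 2, one HWE population
    "stratified": [[0.1] * p, [0.9] * p],   # n = 2, one subject from each subpopulation
}
for label, freqs in scenarios.items():
    z = simulate_z(freqs, rng)
    print(f"{label}: Z = {z:.2f}, one-sided p = {upper_p_value(z):.3g}")
```

In the homogeneous scenario $Z$ should behave like a standard normal draw, while in the stratified scenario it should lie far in the upper tail; exact values depend on the seed. Because $Z \le \sqrt{p}$ by the Cauchy-Schwarz inequality under this form, a large panel is needed for power, in line with point (2).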
If a global survey were to be conducted to detect possible hidden structure in human populations, cost and time constraints would probably force one to be content with just a few subjects for each racial/ethnic group. The present approach is a viable alternative in such a scenario.
References
1 Risch N, Merikangas K. The future of genetic studies of complex human diseases. Science 1996;273:1516-17.
2 Ewens WJ, Spielman RS. The transmission/disequilibrium test: history, subdivision and admixture. Am J Hum Genet 1995;57:455-64.
3 The International SNP Map Working Group. A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature 2001;409:928-33.
4 Hernández JL, Weir BS. A disequilibrium coefficient approach to Hardy-Weinberg testing. Biometrics 1989;45:53-70.
5 Li CC. Population subdivision with respect to multiple alleles. Ann Hum Genet 1969;33:23-29.
6 Pritchard JK, Rosenberg NA. Use of unlinked genetic markers to detect population stratification in association studies. Am J Hum Genet 1999;65:220-28.
7 Deng HW, Chen WM, Recker RR. Population admixture: detection by Hardy-Weinberg test and its quantitative effects on linkage-disequilibrium methods for localizing genes underlying complex traits. Genetics 2001;157:885-97.
8 Pritchard JK, Stephens M, Rosenberg NA, Donnelly P. Association mapping in structured populations. Am J Hum Genet 2000;67:170-81.
9 Pritchard JK, Stephens M, Donnelly P. Inference of population structure using multilocus genotype data. Genetics 2000;155:945-59.
10 Satten GA, Flanders WD, Yang Q. Accounting for unmeasured population substructure in case-control studies of genetic association using a novel latent-class model. Am J Hum Genet 2001;68:466-77.