Detecting population stratification using a panel of single nucleotide polymorphisms

Wen-Chung Lee

Graduate Institute of Epidemiology, College of Public Health, National Taiwan University. No. 1, Jen-Ai Rd, 1st Sec, Taipei, Taiwan. E-mail: wenchung@ha.mc.ntu.edu.tw

Sirs: As genetic studies of complex human diseases rely more and more on the epidemiological association paradigm [1], it becomes crucial to determine whether a population is homogeneous or has hidden structure within it. If it is homogeneous, one can infer the genomic location(s) of the putative susceptibility gene(s) for a particular disease simply by comparing marker allele frequencies between ‘cases’ and ‘controls’ recruited from the population. If it is structured, however, a naive case-control approach will produce an excess of false positive results [2]. Here I propose a method to detect population stratification (subdivision) using a panel of single nucleotide polymorphisms (SNPs) [3]. SNPs are the most abundant type of human genetic marker. SNP genotyping has the potential for automation, and its cost is expected to fall.

Assume that a total of $p$ SNPs (indexed by $i$) have been genotyped in $n$ subjects recruited from a population. For the $i$th SNP marker, let $b_i$ represent the number of subjects with the heterozygous ‘Mm’ genotype, and let $a_i$ and $c_i$ represent the numbers of subjects with the homozygous ‘MM’ and ‘mm’ genotypes, respectively ($a_i + b_i + c_i = n$). Further define $D_i = 4a_ic_i - b_i(b_i - 1)$. The expectation of $D_i$ can be written as the sum of two terms:

$$E(D_i) = E(4a_ic_i - b_i^2) + E(b_i).$$

Under the null hypothesis of a Hardy-Weinberg population, the first term becomes

$$E(4a_ic_i - b_i^2) = -n \cdot [2f_i(1 - f_i)]$$

(see ref. 4), and the second term becomes $n \cdot [2f_i(1 - f_i)]$, where $f_i$ is the allele frequency of the $i$th SNP. Thus $E(D_i) = 0$ under the null.
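The claim that the expectation of Di vanishes under Hardy-Weinberg proportions can be checked numerically. The following is a minimal sketch, not part of the original letter: it assumes the n subjects are sampled independently, so that the genotype counts (ai, bi, ci) follow a multinomial distribution, and it computes the exact expectation by enumerating every possible outcome. The function names `d_statistic`, `expected_d` and `hardy_weinberg` are illustrative.

```python
from math import comb


def d_statistic(a, b, c):
    """D = 4ac - b(b - 1) for genotype counts a (MM), b (Mm), c (mm)."""
    return 4 * a * c - b * (b - 1)


def expected_d(n, p_major_hom, p_het, p_minor_hom):
    """Exact E(D) for n independent subjects (assumed multinomial sampling).

    Enumerates every outcome (a, b, c) with a + b + c = n and weights
    D by the multinomial probability of that outcome.
    """
    total = 0.0
    for a in range(n + 1):
        for b in range(n - a + 1):
            c = n - a - b
            prob = (comb(n, a) * comb(n - a, b)
                    * p_major_hom ** a * p_het ** b * p_minor_hom ** c)
            total += prob * d_statistic(a, b, c)
    return total


def hardy_weinberg(f):
    """Genotype probabilities (MM, Mm, mm) for allele frequency f of M."""
    return f * f, 2 * f * (1 - f), (1 - f) ** 2


# Homogeneous population with allele frequency 0.3: E(D) is zero.
e_null = expected_d(10, *hardy_weinberg(0.3))

# Hypothetical pooled population: equal mix of two Hardy-Weinberg
# subpopulations with allele frequencies 0.1 and 0.9, giving pooled
# genotype probabilities (0.41, 0.18, 0.41).
e_mixed = expected_d(10, 0.41, 0.18, 0.41)
```

Under these assumptions `e_null` is zero up to rounding error, while `e_mixed` is about 57.6. The positive value reflects the excess of homozygotes in the pooled sample.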

One can choose the $p$ SNPs to be unlinked or in linkage equilibrium (widely spaced, say more than about 10 cM apart), so that the $D_i$ are independent of one another. Therefore, the statistic

$$Z = \frac{\sum_{i=1}^{p} D_i}{\sqrt{\sum_{i=1}^{p} D_i^2}}$$

is distributed asymptotically (for large $p$) as the standard normal distribution under the null. Under the alternative hypothesis that the subjects are recruited from a population with hidden structure, $E(D_i) > 0$ because of the well-known Wahlund phenomenon (an excess of homozygotes) [5]. Thus, an upper one-sided test based on $Z$, which combines the information from a panel of SNPs, can be used to detect population stratification.
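The upper one-sided test can be sketched in a few lines of code. This sketch assumes the statistic takes the self-normalised form Z = (sum of Di) / sqrt(sum of Di squared), which uses only the observed genotype counts. That form is an assumption of this example, as is the function name `stratification_z`. The p-value is the upper tail of the standard normal distribution, computed with the standard library.

```python
from math import erfc, sqrt


def stratification_z(counts):
    """Return (Z, upper one-sided p-value) for a panel of SNPs.

    counts: iterable of (a, b, c) genotype counts, one tuple per SNP,
            where a = MM, b = Mm, c = mm.
    Assumes Z = sum(D) / sqrt(sum(D^2)) with D = 4ac - b(b - 1).
    """
    d_values = [4 * a * c - b * (b - 1) for a, b, c in counts]
    denom = sqrt(sum(d * d for d in d_values))
    if denom == 0:
        raise ValueError("all D values are zero; Z is undefined")
    z = sum(d_values) / denom
    # Upper tail of the standard normal: P(N(0, 1) > z).
    p_value = 0.5 * erfc(z / sqrt(2))
    return z, p_value


# Toy panel of four SNPs typed in n = 2 subjects (made-up counts).
z, p = stratification_z([(1, 0, 1), (1, 0, 1), (0, 2, 0), (2, 0, 0)])
```

For the toy panel, the D values are 4, 4, -2 and 0, so Z = 6 / 6 = 1 and the one-sided p-value is about 0.159. A real application would need far more than four markers for the normal approximation to be reasonable.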

There are three important characteristics of such an approach: (1) the method does not require knowing the allele frequencies of the panel of SNPs in advance; (2) the method is contingent on the number of typed markers ($p$) being large, whereas the number of subjects recruited ($n$) can be small (even $n = 2$ will do, provided $p$ is large); and (3) the method involves nothing more than simple arithmetic. The contingency table χ² test [6] and the Hardy-Weinberg test [7] have previously been proposed to test for population stratification. However, these two methods can falsely reject or falsely accept the null hypothesis when the number of subjects recruited is small or when some cell frequencies are small or zero. Another alternative is the ‘structured association’ approach [8–10]. However, it demands computer-intensive modelling, which is beyond the reach of most epidemiologists.
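Point (2) can be made concrete by listing every genotype configuration for a pair of subjects (n = 2) and the value of Di that each one yields. The sketch below is illustrative only, and the helper name `d_table_for_pair` is hypothetical.

```python
from itertools import combinations_with_replacement


def d_statistic(a, b, c):
    """D = 4ac - b(b - 1) for genotype counts a (MM), b (Mm), c (mm)."""
    return 4 * a * c - b * (b - 1)


def d_table_for_pair():
    """Map each unordered genotype pair (n = 2 subjects) to its D value."""
    table = {}
    for pair in combinations_with_replacement(("MM", "Mm", "mm"), 2):
        a = pair.count("MM")
        b = pair.count("Mm")
        c = pair.count("mm")
        table[pair] = d_statistic(a, b, c)
    return table


table = d_table_for_pair()
```

Only two of the six configurations are informative: opposite homozygotes (MM, mm) give D = 4, and a pair of heterozygotes (Mm, Mm) gives D = -2. Under Hardy-Weinberg proportions with allele frequency f, the first occurs with probability 2f²(1 - f)² and the second with probability 4f²(1 - f)², so the two contributions to the expectation, 8f²(1 - f)² and -8f²(1 - f)², cancel exactly. Stratification makes opposite homozygotes more common relative to heterozygote pairs, which pushes the sum of D over many markers upwards.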

If a global survey is to be conducted to detect possible hidden structure in human populations, cost and time constraints would probably limit it to just a few subjects for each racial/ethnic group. The present approach is a viable alternative in such a scenario.

References

1 Risch N, Merikangas K. The future of genetic studies of complex human diseases. Science 1996;273:1516–17.

2 Ewens WJ, Spielman RS. The transmission/disequilibrium test: history, subdivision and admixture. Am J Hum Genet 1995;57:455–64.

3 The International SNP Map Working Group. A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature 2001;409:928–33.

4 Hernández JL, Weir BS. A disequilibrium coefficient approach to Hardy-Weinberg testing. Biometrics 1989;45:53–70.

5 Li CC. Population subdivision with respect to multiple alleles. Ann Hum Genet 1969;33:23–29.

6 Pritchard JK, Rosenberg NA. Use of unlinked genetic markers to detect population stratification in association studies. Am J Hum Genet 1999;65:220–28.

7 Deng HW, Chen WM, Recker RR. Population admixture: detection by Hardy-Weinberg test and its quantitative effects on linkage-disequilibrium methods for localizing genes underlying complex traits. Genetics 2001;157:885–97.

8 Pritchard JK, Stephens M, Rosenberg NA, Donnelly P. Association mapping in structured populations. Am J Hum Genet 2000;67:170–81.

9 Pritchard JK, Stephens M, Donnelly P. Inference of population structure using multilocus genotype data. Genetics 2000;155:945–59.

10 Satten GA, Flanders WD, Yang Q. Accounting for unmeasured population substructure in case-control studies of genetic association using a novel latent-class model. Am J Hum Genet 2001;68:466–77.




