Searching for Disease-Susceptibility Loci by Testing for Hardy-Weinberg Disequilibrium in a Gene Bank of Affected Individuals

Wen-Chung Lee 

From the Graduate Institute of Epidemiology, College of Public Health, National Taiwan University, Taipei, Taiwan.

Received for publication June 4, 2002; accepted for publication February 11, 2003.


    ABSTRACT
 
The future of genetic studies of complex human diseases will rely more and more on the epidemiologic association paradigm. The author proposes to scan the genome for disease-susceptibility gene(s) by testing for deviation from Hardy-Weinberg equilibrium in a gene bank of affected individuals. A power formula is presented, which is very accurate as revealed by Monte Carlo simulations. If the disease-susceptibility gene is recessive with an allele frequency of ≤0.5 or dominant with an allele frequency of ≥0.5, the number of subjects needed by the present method is smaller than that needed by using a case-parents design (using either the transmission/disequilibrium test or the 2-df likelihood ratio test). However, the method cannot detect genes with a multiplicative mode of inheritance, and the validity of the method relies on the assumption that the source population from which the cases arise is in Hardy-Weinberg equilibrium. Thus, it is prone to produce false positive and false negative results. Nevertheless, the method enables rapid gene hunting in an existing gene bank of affected individuals with no extra effort beyond simple calculations.

disease susceptibility; epidemiologic methods; gene library; genetics; genome; Hardy-Weinberg equilibrium; Monte Carlo method; polymorphism, single nucleotide

Abbreviation: HWT, Hardy-Weinberg disequilibrium test.


    INTRODUCTION
 
Editor’s note: An invited commentary on this article appears on page 401, and the author’s response appears on page 404.

Whereas linkage analysis has been successfully used to localize disease-causing genes for many monogenic diseases, it has been argued that genetic analysis of complex human diseases calls for a new approach: the epidemiologic association paradigm (1, 2). In particular, the application of the transmission/disequilibrium test in a case-parents study has received much attention (3, 4). However, a transmission/disequilibrium test analysis requires parental genotypes, which can pose serious problems in practice. Parents (serving as the control group) may live elsewhere and be hard to reach, may refuse to participate, or may already have died. This is particularly true when the disease under study has an age at onset in adulthood, such as non-insulin-dependent diabetes, cardiovascular diseases, Alzheimer’s disease, many forms of cancer, and so on.

Feder et al. (5) have suggested a control-free "case-only" approach to test for deviation from Hardy-Weinberg equilibrium among affected individuals. (A biallelic marker with alleles A and a is in Hardy-Weinberg equilibrium when its genotype frequencies are $q^2$ (AA), $2q(1 - q)$ (Aa), and $(1 - q)^2$ (aa), where $q$ is the frequency of allele A in the population. A population is in Hardy-Weinberg equilibrium when all the markers are in Hardy-Weinberg equilibrium.) They and subsequent researchers (6, 7) are concerned mainly with the problem of precise localization of a disease-susceptibility locus. Here, I propose to use the principle as a genome-wide screening tool. The method is especially suited for use in a large referral center, where genotyping is done routinely for affected individuals but where a control group, either the population control or the parental control, is difficult to obtain.


    TESTING FOR HARDY-WEINBERG DISEQUILIBRIUM IN A GENE BANK OF AFFECTED INDIVIDUALS
 
Suppose that a gene bank of marker genotypes across the whole genome for a total of $n$ affected individuals has been established for a disease in a particular population. For a particular marker (e.g., the A marker, with alleles A and a), the number of cases with genotype AA is denoted as $n_{11}$, the number with genotype Aa as $n_{12}$, and the number with genotype aa as $n_{22}$ ($n_{11} + n_{12} + n_{22} = n$). (This paper considers markers that are biallelic, because a dense map of biallelic single nucleotide polymorphisms will be ready for use in the very near future (8, 9).) The statistic to test for deviation from Hardy-Weinberg equilibrium using this marker, that is, the Hardy-Weinberg disequilibrium test (HWT), is the following (10):

$$\mathrm{HWT} = \frac{\sqrt{n}\,\hat{D}}{\hat{p}(1 - \hat{p})},$$

where $\hat{p} = (2n_{11} + n_{12})/(2n)$ is the allele frequency of A in the sample, and $\hat{D} = \hat{p}(1 - \hat{p}) - n_{12}/(2n)$ is the sample estimate of the disequilibrium coefficient measuring the departure of the frequency of heterozygotes (Aa) from the Hardy-Weinberg expected frequency. Note that the square of HWT is the usual goodness-of-fit statistic, that is,

$$\mathrm{HWT}^2 = \frac{n\hat{D}^2}{\hat{p}^2(1 - \hat{p})^2} = \sum_{g \in \{AA,\,Aa,\,aa\}} \frac{(O_g - E_g)^2}{E_g},$$

where $O_g$ and $E_g$ are the observed and the Hardy-Weinberg expected counts of genotype $g$.

Under Hardy-Weinberg equilibrium (i.e., under $H_0$: $D = 0$), HWT is asymptotically distributed as a standard normal.
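As an illustration of the calculation involved, the following is a minimal Python sketch that computes the HWT from the three genotype counts of one marker. The function names and the example counts are hypothetical, and the two-sided p value simply uses the asymptotic standard normal distribution stated above.

```python
import math


def hwt_statistic(n11: int, n12: int, n22: int) -> float:
    """Signed HWT from the counts of AA, Aa, and aa cases."""
    n = n11 + n12 + n22
    p_hat = (2 * n11 + n12) / (2 * n)      # sample frequency of allele A
    if p_hat in (0.0, 1.0):
        raise ValueError("marker is monomorphic in the sample")
    d_hat = n11 / n - p_hat ** 2           # equals p_hat*(1 - p_hat) - n12/(2n)
    return math.sqrt(n) * d_hat / (p_hat * (1 - p_hat))


def two_sided_p_value(z: float) -> float:
    """Two-sided p value under the asymptotic standard normal null."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical counts: 30 AA, 40 Aa, 30 aa (heterozygote deficiency, so D-hat > 0).
z = hwt_statistic(30, 40, 30)
print(z, z ** 2, two_sided_p_value(z))     # z is 2.0 up to rounding for these counts
```

A positive value indicates fewer heterozygotes than expected, and a negative value indicates heterozygote excess. In a genome scan the function would be applied marker by marker, with the significance level adjusted for multiple testing.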

Assume that the source population from which the affected individuals arise is a population in Hardy-Weinberg equilibrium. The question here is whether the affected individuals themselves are also in Hardy-Weinberg equilibrium with respect to the A marker. Denote the genotype relative risks as $\Psi_1$ (Aa/aa) and $\Psi_2$ (AA/aa), respectively. (Here the A marker is assumed to be a disease-susceptibility gene.) Among the affected individuals, the expected genotypic frequencies of the A marker are

$$\Pr(AA \mid \text{affected}) = \frac{q^2 \Psi_2}{R},$$

$$\Pr(Aa \mid \text{affected}) = \frac{2q(1 - q)\Psi_1}{R},$$

$$\Pr(aa \mid \text{affected}) = \frac{(1 - q)^2}{R},$$

and the expected allele frequency of A is

$$p = \frac{q^2 \Psi_2 + q(1 - q)\Psi_1}{R},$$

where $q$ is the allele frequency of A in the source population, and $R$ is a normalizing constant equal to $q^2\Psi_2 + 2q(1 - q)\Psi_1 + (1 - q)^2$, ensuring that the three probabilities sum to 1. The disequilibrium coefficient in the affected population is

$$D = \Pr(AA \mid \text{affected}) - p^2 = \frac{q^2(1 - q)^2(\Psi_2 - \Psi_1^2)}{R^2}.$$

It is clear that $D \neq 0$ when $\Psi_2 \neq \Psi_1^2$. Thus, by testing for deviation from Hardy-Weinberg equilibrium in a sample of affected individuals (using the above HWT) marker by marker with proper multiple-testing adjustment, one can screen for susceptibility gene(s) for the disease under study, provided that the gene(s) displays a nonmultiplicative mode of inheritance.
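The dependence of the disequilibrium coefficient on the mode of inheritance can be checked numerically. The sketch below is illustrative only: the function name and the parameter values (q = 0.1 and relative risks of 2 or 4) are assumptions chosen for the example, and D is computed as the frequency of AA among cases minus the squared allele frequency among cases.

```python
def case_genotype_distribution(q: float, psi1: float, psi2: float):
    """Return (P_AA, P_Aa, P_aa, p, D) among cases, assuming the source
    population is in Hardy-Weinberg equilibrium with allele frequency q."""
    w_hom_a = q * q * psi2                 # AA weight: HWE frequency times relative risk
    w_het = 2 * q * (1 - q) * psi1         # Aa weight
    w_hom_b = (1 - q) ** 2                 # aa weight (reference genotype, risk 1)
    r = w_hom_a + w_het + w_hom_b          # normalizing constant R
    p_hom_a, p_het, p_hom_b = w_hom_a / r, w_het / r, w_hom_b / r
    p = p_hom_a + p_het / 2                # allele frequency of A among cases
    d = p_hom_a - p * p                    # disequilibrium coefficient among cases
    return p_hom_a, p_het, p_hom_b, p, d


# Illustrative (hypothetical) parameter values.
for label, psi1, psi2 in [("recessive", 1, 4), ("dominant", 4, 4), ("multiplicative", 2, 4)]:
    *_, p, d = case_genotype_distribution(0.1, psi1, psi2)
    print(f"{label:15s} p = {p:.4f}  D = {d:+.5f}")
```

In this example the recessive model gives D > 0 (heterozygote deficiency among cases), the dominant model gives D < 0 (heterozygote excess), and the multiplicative model gives D = 0 up to rounding error, which is why such genes cannot be detected.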


    POWER FORMULA AND POWER COMPARISON
 
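The distribution of the HWT (under both $H_0$ and $H_1$) can be approximated by a normal distribution. Using the multivariate delta method (11), it can be shown that such a distribution has a mean of

$$\mu = \frac{\sqrt{n}\,D}{p(1 - p)}$$

and a variance of

$$\sigma^2 = (1 - F)^2(1 - 2F) + \frac{F(1 - F)(2 - F)}{2p(1 - p)}, \qquad F = \frac{D}{p(1 - p)},$$

where $p$ and $D$ are the expected allele frequency of A and the disequilibrium coefficient among the affected individuals. (Under $H_0$, $F = 0$ and the variance reduces to 1.) Let $z_x$ denote the $x$-quantile of a standard normal distribution. Then

$$\text{Power of the HWT} = \Pr\left(Z > \frac{z_{1 - \alpha/2} - \mu}{\sigma}\right) + \Pr\left(Z < \frac{z_{\alpha/2} - \mu}{\sigma}\right),$$

with $Z$ being a standard normal-distributed random variable.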
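The power approximation can be sketched in Python as follows. This is a sketch under stated assumptions rather than the author's own code: the variance used is the first-order delta-method variance of the ratio of the disequilibrium coefficient to p(1 - p) under multinomial sampling, which may differ in algebraic form from the published expression, and the function names and example parameter values are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def case_p_and_d(q: float, psi1: float, psi2: float):
    """Allele frequency p and disequilibrium coefficient D among cases,
    assuming Hardy-Weinberg equilibrium in the source population."""
    r = q * q * psi2 + 2 * q * (1 - q) * psi1 + (1 - q) ** 2
    p = (q * q * psi2 + q * (1 - q) * psi1) / r
    return p, q * q * psi2 / r - p * p


def hwt_power(n: int, q: float, psi1: float, psi2: float, alpha: float = 1e-7) -> float:
    """Approximate power of the two-sided HWT with n cases."""
    p, d = case_p_and_d(q, psi1, psi2)
    f = d / (p * (1 - p))
    mu = math.sqrt(n) * f
    # Assumed form: first-order delta-method variance (equals 1 when f = 0).
    var = (1 - f) ** 2 * (1 - 2 * f) + f * (1 - f) * (2 - f) / (2 * p * (1 - p))
    sigma = math.sqrt(var)
    z_crit = -STD_NORMAL.inv_cdf(alpha / 2)            # z_{1 - alpha/2}
    upper = STD_NORMAL.cdf((mu - z_crit) / sigma)      # Pr(Z > (z_{1-alpha/2} - mu)/sigma)
    lower = STD_NORMAL.cdf((-z_crit - mu) / sigma)     # Pr(Z < (z_{alpha/2} - mu)/sigma)
    return upper + lower


# Hypothetical example: recessive gene, relative risk 4, q = 0.1.
for n in (250, 500, 1000, 2000):
    print(n, round(hwt_power(n, 0.1, 1, 4), 4))
```

Under a multiplicative model the function returns the type I error rate itself, because the mean is 0 and the variance is 1.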

To compare the powers between the present method and the case-parents designs, we considered the same modes of inheritance as used by Knapp (12) (excluding the multiplicative mode of inheritance): 1) the "additive" model ($\Psi_1 = \gamma$ and $\Psi_2 = 2\gamma$, according to Camp’s definition (13) of additive mode of inheritance for the sake of comparability), 2) the recessive model ($\Psi_1 = 1$ and $\Psi_2 = \gamma$), and 3) the dominant model ($\Psi_1 = \Psi_2 = \gamma$). The test was two sided with the $\alpha$-level set at $10^{-7}$. This corresponds to $\alpha = 5 \times 10^{-8}$ for the genome-wide, one-sided transmission/disequilibrium tests used by Risch and Merikangas (1). (If allele A is positively associated with the disease and $\alpha$ is small, the power of the one-sided transmission/disequilibrium test with a type I error rate of $\alpha$ is very near the power of the two-sided transmission/disequilibrium test with a type I error rate of $2\alpha$.) For each combination of mode of inheritance, risk parameter $\gamma$ ($\gamma$ = 1.5, 2, 4), and allele frequency of A in the source population $q$ ($q$ = 0.01, 0.1, 0.5, 0.8), the sample sizes necessary to gain 80 percent power for the (two-sided) HWT were calculated by solving the above power formula using a bisection method. (Note that Camp’s additive model with $\gamma = 2$ was not considered here, because it is actually a multiplicative mode of inheritance.)
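The bisection step can be sketched as below. The sketch is self-contained and therefore repeats a power function; that function uses an assumed first-order delta-method variance, and all function names, the search bound `n_max`, and the example parameters are hypothetical. It is not guaranteed to reproduce the published sample sizes exactly.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def hwt_power(n, q, psi1, psi2, alpha=1e-7):
    """Approximate two-sided HWT power (assumed delta-method variance)."""
    r = q * q * psi2 + 2 * q * (1 - q) * psi1 + (1 - q) ** 2
    p = (q * q * psi2 + q * (1 - q) * psi1) / r
    f = (q * q * psi2 / r - p * p) / (p * (1 - p))
    mu = math.sqrt(n) * f
    sigma = math.sqrt((1 - f) ** 2 * (1 - 2 * f)
                      + f * (1 - f) * (2 - f) / (2 * p * (1 - p)))
    z_crit = -STD_NORMAL.inv_cdf(alpha / 2)
    return STD_NORMAL.cdf((mu - z_crit) / sigma) + STD_NORMAL.cdf((-z_crit - mu) / sigma)


def required_sample_size(q, psi1, psi2, alpha=1e-7, target=0.80, n_max=10**9):
    """Smallest n with approximate power >= target, found by integer bisection."""
    if abs(psi2 - psi1 ** 2) < 1e-12:
        raise ValueError("multiplicative model: D = 0, so the HWT has no power")
    lo, hi = 0, 1                      # power at n = 0 is alpha, assumed below target
    while hwt_power(hi, q, psi1, psi2, alpha) < target:
        lo, hi = hi, hi * 2            # grow the bracket until it contains the answer
        if hi > n_max:
            raise ValueError("required sample size exceeds n_max")
    while hi - lo > 1:                 # invariant: power(lo) < target <= power(hi)
        mid = (lo + hi) // 2
        if hwt_power(mid, q, psi1, psi2, alpha) >= target:
            hi = mid
        else:
            lo = mid
    return hi


# Hypothetical examples: recessive and dominant genes with relative risk 4.
print("recessive, q = 0.1:", required_sample_size(0.1, 1, 4))
print("dominant,  q = 0.5:", required_sample_size(0.5, 4, 4))
```

Bisection is valid here because the approximate power increases monotonically with n whenever D is nonzero.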

To check the precision of power approximation, 100,000 simulated data sets at the above-calculated sample sizes were generated. For each round of simulation, the HWT was calculated, and the true power was estimated as the proportion of simulations rejecting the null hypothesis at $\alpha = 10^{-7}$.
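A scaled-down version of such a simulation check might look like the following sketch. To keep the run time short in pure Python, it uses far fewer replicates than 100,000 and a much larger significance level (0.05) than the genome-wide level; these choices, the function names, the seed, and the parameter values are all assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist


def case_genotype_probs(q, psi1, psi2):
    """Expected (AA, Aa, aa) frequencies among cases under HWE in the source population."""
    w = [q * q * psi2, 2 * q * (1 - q) * psi1, (1 - q) ** 2]
    r = sum(w)
    return [x / r for x in w]


def hwt_statistic(n11, n12, n22):
    n = n11 + n12 + n22
    p_hat = (2 * n11 + n12) / (2 * n)
    if p_hat in (0.0, 1.0):
        return 0.0                     # monomorphic sample: treated as no evidence
    d_hat = n11 / n - p_hat ** 2
    return math.sqrt(n) * d_hat / (p_hat * (1 - p_hat))


def empirical_power(n, q, psi1, psi2, alpha, reps, seed=0):
    """Proportion of simulated case samples in which the two-sided HWT rejects."""
    rng = random.Random(seed)
    probs = case_genotype_probs(q, psi1, psi2)
    z_crit = -NormalDist().inv_cdf(alpha / 2)
    rejections = 0
    for _ in range(reps):
        draws = rng.choices((0, 1, 2), weights=probs, k=n)   # 0 = AA, 1 = Aa, 2 = aa
        n11 = draws.count(0)
        n12 = draws.count(1)
        if abs(hwt_statistic(n11, n12, n - n11 - n12)) > z_crit:
            rejections += 1
    return rejections / reps


# Hypothetical scenario: recessive gene, relative risk 4, q = 0.3, 400 cases.
print(empirical_power(400, 0.3, 1, 4, alpha=0.05, reps=500))
```

With a multiplicative model the same function estimates the type I error rate, which offers a quick sanity check of the normal approximation.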

Table 1 presents the calculated sample sizes necessary to gain 80 percent power for an effect at a single locus by the HWT under various conditions. The empirical powers based on simulations (in parentheses) match very well with the expected value of 0.80, indicating that the power formula is quite accurate. The same table also presents the sample sizes needed by the case-parents design (using the conventional transmission/disequilibrium test as well as the 2-df likelihood ratio test that assumes Hardy-Weinberg equilibrium in the source population (14)). The sample sizes (numbers of study subjects) needed by the two-sided transmission/disequilibrium test were taken from table 3 of the article by Knapp (12) and were multiplied by three before presentation. (Knapp’s paper presented the numbers of case-parents "triads" instead.) The numbers of study subjects needed by the likelihood ratio test were calculated using the method of Longmate (15). It is of particular interest to see that, if the disease-susceptibility gene is recessive with an allele frequency of ≤0.5 or dominant with an allele frequency of ≥0.5, the sample sizes needed by the HWT can be smaller than the corresponding sample sizes needed by the transmission/disequilibrium test and the likelihood ratio test (table 1).


TABLE 1. Sample size necessary to gain 80% power ($\alpha = 10^{-7}$) for the Hardy-Weinberg disequilibrium test (number of affected individuals needed), the transmission/disequilibrium test (affected individuals plus their parents), and the likelihood ratio test (affected individuals plus their parents) under various modes of inheritance
 

    ASSUMPTIONS AND LIMITATIONS
 
Although this method makes a convenient gene-searching tool especially for the screening of dominant and recessive genes, it has no power at all to detect genes that display a multiplicative mode of inheritance. (The Hardy-Weinberg equilibrium is preserved under such a scenario.) The sample size requirement may further jeopardize its use to search for genes with an additive mode of inheritance. Therefore, even if a very dense array of markers were genotyped across the genome, one should not expect the method to produce a complete catalog of all the susceptibility genes of a disease.

The method assumes Hardy-Weinberg equilibrium in the source population from which the cases arise and also random sampling of cases (or at least that the missing cases are noninformatively missing). A population can deviate from the Hardy-Weinberg equilibrium for various reasons, biologic or nonbiologic (16). Differential survival of individuals with different genotypes is one of the biologic reasons. It causes the adult and elderly segments of a population to deviate from Hardy-Weinberg equilibrium, even if the newborns of that population are in Hardy-Weinberg equilibrium. Another biologic reason is assortative mating in the population, where the probability of mating between two individuals is related to their phenotypic similarity. A positive signal in a case-only HWT analysis should thus be interpreted with caution, for it could imply that the marker being tested is in linkage disequilibrium with a gene that contributes to disease susceptibility (true positive in the present context), with a gene that affects survival, or with a gene that is associated with the choice of mates.

A more subtle, nonbiologic reason for deviation from Hardy-Weinberg equilibrium is "population stratification," whereby the population has a mating substructure and mating is restricted to subjects in the same stratum. In this case, even if mating is random within each stratum, the population as a whole may deviate from Hardy-Weinberg equilibrium (16). A positive signal due to population stratification is a bona fide false alarm. Following such a lead, a genomic search in the vicinity of the marker(s) with a significant HWT will most likely be fruitless, finding no gene that affects survival or mating choice, let alone a gene that contributes to disease susceptibility. If such a mating substructure can be delineated using variables such as ethnicity or race, one can perform a stratified analysis to adjust for the bias (e.g., by defining a "pooled" HWT statistic, such as

where s is the stratum indicator (17)). If not, one can specify the HWT to be one sided, testing exclusively for heterozygote excess ($D < 0$). (A population substructure in itself can produce a deviation from Hardy-Weinberg equilibrium only in the direction of heterozygote deficiency, the Wahlund principle (18).) However, such a one-sided approach will fail to detect recessive genes or genes with $\Psi_2 \geq \Psi_1^2$.
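The direction of the bias produced by population substructure can be illustrated numerically. The sketch below pools strata that are each in Hardy-Weinberg equilibrium and computes the disequilibrium coefficient of the mixture, which equals the weighted variance of the stratum allele frequencies and is therefore never negative. The function name, allele frequencies, and mixing weights are hypothetical.

```python
def pooled_genotype_summary(allele_freqs, weights):
    """Pool strata that are each in HWE; return (p, P_Aa, D) for the mixture."""
    total = sum(weights)
    w = [x / total for x in weights]                   # mixing proportions
    p = sum(wi * qi for wi, qi in zip(w, allele_freqs))
    p_hom = sum(wi * qi * qi for wi, qi in zip(w, allele_freqs))            # AA frequency
    p_het = sum(wi * 2 * qi * (1 - qi) for wi, qi in zip(w, allele_freqs))  # Aa frequency
    d = p_hom - p * p                                  # weighted variance of the q's
    return p, p_het, d


# Two hypothetical strata of equal size with allele frequencies 0.2 and 0.8.
p, p_het, d = pooled_genotype_summary([0.2, 0.8], [1, 1])
print("pooled allele frequency:", p)
print("heterozygotes observed vs. HWE expected:", p_het, 2 * p * (1 - p))
print("D:", d)
```

Because stratification alone can only push D above zero, a significant heterozygote excess (D < 0) cannot be attributed to it, which is the rationale for the one-sided test described above.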


    CONCLUSION
 
If a gene bank for a disease has been established in a particular population, one can scan the genome for possible disease-susceptibility gene(s) by testing marker by marker for deviation from Hardy-Weinberg equilibrium. This involves no extra effort beyond some simple calculations. The method thus enables rapid gene hunting. However, one should be aware of the potential for false positive and false negative results.


    ACKNOWLEDGMENTS
 
This study was partly supported by a grant from the National Science Council, Republic of China.


    NOTES
 
Reprint requests to Dr. Wen-Chung Lee, Graduate Institute of Epidemiology, National Taiwan University, No. 1, Jen-Ai Road, Section 1, Taipei 100, Taiwan (e-mail: wenchung@ha.mc.ntu.edu.tw).


    REFERENCES
 

  1. Risch N, Merikangas K. The future of genetic studies of complex human diseases. Science 1996;273:1516–17.
  2. Khoury MJ, Yang Q. The future of genetic studies of complex human diseases: an epidemiologic perspective. Epidemiology 1998;9:350–4.
  3. Spielman RS, McGinnis RE, Ewens WJ. Transmission test for linkage disequilibrium: the insulin gene region and insulin-dependent diabetes mellitus (IDDM). Am J Hum Genet 1993;52:506–16.
  4. Ewens WJ, Spielman RS. The transmission/disequilibrium test: history, subdivision and admixture. Am J Hum Genet 1995;57:455–64.
  5. Feder JN, Gnirke A, Thomas W, et al. A novel MHC class I-like gene is mutated in patients with hereditary haemochromatosis. Nat Genet 1996;13:399–408.
  6. Nielsen DM, Ehm MB, Weir BS. Detecting marker-disease association by testing for Hardy-Weinberg disequilibrium at a marker locus. Am J Hum Genet 1999;63:1531–40.
  7. Jiang R, Dong J, Wang D, et al. Fine-scale mapping using Hardy-Weinberg disequilibrium. Ann Hum Genet 2001;65:207–19.
  8. Wang DG, Fan JB, Siao CJ, et al. Large-scale identification, mapping, and genotyping of single-nucleotide polymorphisms in the human genome. Science 1998;280:1077–82.
  9. Sachidanandam R, Weissman D, Schmidt SC, et al. A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature 2001;409:928–33.
  10. Hernández JL, Weir BS. A disequilibrium coefficient approach to Hardy-Weinberg testing. Biometrics 1989;45:53–70.
  11. Agresti A. Categorical data analysis. New York, NY: John Wiley & Sons, 1990.
  12. Knapp M. A note on power approximation for the transmission/disequilibrium test. Am J Hum Genet 1999;64:1177–85.
  13. Camp NJ. Genomewide transmission/disequilibrium testing—consideration of the genotypic relative risks at disease loci. Am J Hum Genet 1997;61:1424–30.
  14. Weinberg CR, Wilcox AJ, Lie RT. A log-linear approach to case-parent-triad data: assessing effects of disease genes that act either directly or through maternal effects and that may be subject to parental imprinting. Am J Hum Genet 1998;62:969–78.
  15. Longmate JA. Complexity and power in case-control association studies. Am J Hum Genet 2001;68:1229–37.
  16. Sham P. Statistics in human genetics. New York, NY: Oxford University Press, 1998.
  17. Nam JM. Testing a genetic equilibrium across strata. Ann Hum Genet 1997;61:163–70.
  18. Hartl DL, Clark AG. Principles of population genetics. 3rd ed. Sunderland, MA: Sinauer Associates, Inc, 1997.

Related articles in Am. J. Epidemiol.:

Invited Commentary: Testing for Hardy-Weinberg Disequilibrium Using a Genome Single-Nucleotide Polymorphism Scan Based on Cases Only
Clarice R. Weinberg and Richard W. Morris
Am. J. Epidemiol. 2003 158: 401-403.

Lee Responds to "Testing for Hardy-Weinberg Disequilibrium"
Wen-Chung Lee
Am. J. Epidemiol. 2003 158: 404-405.