From the Graduate Institute of Epidemiology, College of Public Health, National Taiwan University, Taipei, Taiwan.
Received for publication June 4, 2002; accepted for publication February 11, 2003.
ABSTRACT |
---|
disease susceptibility; epidemiologic methods; gene library; genetics; genome; Hardy-Weinberg equilibrium; Monte Carlo method; polymorphism, single nucleotide
Abbreviation: HWT, Hardy-Weinberg disequilibrium test.
INTRODUCTION |
---|
Whereas linkage analysis has been successfully used to localize disease-causing genes for many monogenic diseases, it has been argued that genetic analysis of complex human diseases calls for a new approach: the epidemiologic association paradigm (1, 2). In particular, the application of the transmission/disequilibrium test in a case-parents study has received much attention (3, 4). However, a transmission/disequilibrium test analysis requires parental genotypes, which can pose serious problems in practice. Parents (serving as the control group) may live elsewhere and be hard to reach, may refuse to participate, or may simply have died. This is particularly true when the disease under study has an age at onset in adulthood, such as non-insulin-dependent diabetes, cardiovascular diseases, Alzheimer's disease, many forms of cancer, and so on.
Feder et al. (5) have suggested a control-free "case-only" approach to test for deviation from Hardy-Weinberg equilibrium among affected individuals. (A biallelic marker with alleles A and a is in Hardy-Weinberg equilibrium when its genotype frequencies are $q^2$ (AA), $2q(1-q)$ (Aa), and $(1-q)^2$ (aa), where $q$ is the frequency of allele A in the population. A population is in Hardy-Weinberg equilibrium when all the markers are in Hardy-Weinberg equilibrium.) They and subsequent researchers (6, 7) are concerned mainly with the problem of precise localization of a disease-susceptibility locus. Here, I propose to use the principle as a genome-wide screening tool. The method is especially suited for use in a large referral center, where genotyping is done routinely for affected individuals but where a control group, either the population control or the parental control, is difficult to obtain.
TESTING FOR HARDY-WEINBERG DISEQUILIBRIUM IN A GENE BANK OF AFFECTED INDIVIDUALS |
---|
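For a biallelic marker genotyped in a sample of $n$ affected individuals, the HWT statistic is

$$\mathrm{HWT} = \frac{\sqrt{n}\,\hat{D}}{\hat{p}(1-\hat{p})},$$

where $\hat{p}$ is the allele frequency of A in the sample, and

$$\hat{D} = \hat{p}(1-\hat{p}) - \tfrac{1}{2}\hat{P}_{Aa} = \hat{P}_{AA} - \hat{p}^2$$

is the sample estimate of the disequilibrium coefficient measuring the departure of the frequency of heterozygotes (Aa) from the Hardy-Weinberg expected frequency, with $\hat{P}_{AA}$ and $\hat{P}_{Aa}$ denoting the observed genotype proportions. Note that the square of HWT is the usual goodness-of-fit statistic, that is,

$$\mathrm{HWT}^2 = \sum_{g \in \{AA,\,Aa,\,aa\}} \frac{(O_g - E_g)^2}{E_g},$$

where $O_g$ is the observed count of genotype $g$ and $E_g$ is its expected count under Hardy-Weinberg equilibrium, computed from $\hat{p}$.
Under Hardy-Weinberg equilibrium (i.e., under $H_0: D = 0$), HWT is asymptotically distributed as a standard normal.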
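As a rough illustration of the statistic described above, the following Python sketch computes a Hardy-Weinberg disequilibrium test from genotype counts. It assumes the form HWT = sqrt(n) * D_hat / (p_hat * (1 - p_hat)) with D_hat taken as the observed AA proportion minus p_hat squared, so that a negative value indicates heterozygote excess. The function names (`hwt_statistic`, `goodness_of_fit`, `two_sided_p_value`) are hypothetical and are not taken from the article.

```python
import math


def hwt_statistic(n_AA, n_Aa, n_aa):
    """Return (p_hat, d_hat, hwt) for genotype counts at a biallelic marker.

    Assumed form: HWT = sqrt(n) * d_hat / (p_hat * (1 - p_hat)),
    with d_hat = observed AA proportion - p_hat**2.
    """
    n = n_AA + n_Aa + n_aa
    p_hat = (2 * n_AA + n_Aa) / (2 * n)   # sample frequency of allele A
    d_hat = n_AA / n - p_hat ** 2         # estimated disequilibrium coefficient
    hwt = math.sqrt(n) * d_hat / (p_hat * (1 - p_hat))
    return p_hat, d_hat, hwt


def goodness_of_fit(n_AA, n_Aa, n_aa):
    """Usual chi-square goodness-of-fit statistic for Hardy-Weinberg equilibrium."""
    n = n_AA + n_Aa + n_aa
    p_hat = (2 * n_AA + n_Aa) / (2 * n)
    expected = [n * p_hat ** 2, 2 * n * p_hat * (1 - p_hat), n * (1 - p_hat) ** 2]
    observed = [n_AA, n_Aa, n_aa]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def two_sided_p_value(hwt):
    """Two-sided p value from the asymptotic standard normal distribution."""
    return math.erfc(abs(hwt) / math.sqrt(2))


# Hypothetical counts: 30 AA, 40 Aa, 30 aa (fewer heterozygotes than expected).
p_hat, d_hat, hwt = hwt_statistic(30, 40, 30)
chi2 = goodness_of_fit(30, 40, 30)
print(p_hat, d_hat, hwt, chi2, two_sided_p_value(hwt))
```

With these made-up counts, the squared HWT and the goodness-of-fit statistic agree, which is the identity stated in the text. The counts are illustrative only and do not come from the article.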
Assume that the source population from which the affected individuals arise is a population in Hardy-Weinberg equilibrium. The question here is whether the affected individuals themselves are also in Hardy-Weinberg equilibrium with respect to the A marker. Denote the genotype relative risks as $\psi_1$ (Aa/aa) and $\psi_2$ (AA/aa), respectively. (Here the A marker is assumed to be a disease-susceptibility gene.) Among the affected individuals, the expected genotypic frequencies of the A marker are

$$P(\mathrm{AA}) = \frac{q^2\psi_2}{R},$$

$$P(\mathrm{Aa}) = \frac{2q(1-q)\psi_1}{R},$$

$$P(\mathrm{aa}) = \frac{(1-q)^2}{R},$$

and the expected allele frequency of A is

$$p = \frac{q^2\psi_2 + q(1-q)\psi_1}{R},$$

where $q$ is the allele frequency of A in the source population, and $R$ is a normalizing constant equal to $q^2\psi_2 + 2q(1-q)\psi_1 + (1-q)^2$, ensuring that the three probabilities sum to 1. The disequilibrium coefficient in the affected population is

$$D = P(\mathrm{AA}) - p^2 = \frac{q^2(1-q)^2\left(\psi_2 - \psi_1^2\right)}{R^2}.$$

It is clear that $D \neq 0$ when $\psi_2 \neq \psi_1^2$. Thus, by testing for deviation from Hardy-Weinberg equilibrium in a sample of affected individuals (using the above HWT) marker by marker with proper multiple-testing adjustment, one can screen for susceptibility gene(s) for the disease under study, provided that the gene(s) displays a nonmultiplicative mode of inheritance.
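The dependence of the disequilibrium coefficient on the mode of inheritance can be checked numerically. The sketch below weights the Hardy-Weinberg genotype frequencies of the source population by the genotype relative risks, written here as `psi1` (Aa/aa) and `psi2` (AA/aa), and renormalizes. The function names and variable names are assumptions for illustration, and the closed form used for comparison is simple algebra on the normalized frequencies rather than a formula quoted from the article.

```python
def case_genotype_freqs(q, psi1, psi2):
    """Expected (AA, Aa, aa) frequencies among affected individuals.

    q    : frequency of allele A in the source population (assumed in HWE)
    psi1 : genotype relative risk of Aa versus aa
    psi2 : genotype relative risk of AA versus aa
    """
    r = q ** 2 * psi2 + 2 * q * (1 - q) * psi1 + (1 - q) ** 2  # normalizing constant
    return (q ** 2 * psi2 / r, 2 * q * (1 - q) * psi1 / r, (1 - q) ** 2 / r)


def case_disequilibrium(q, psi1, psi2):
    """Disequilibrium coefficient among cases: P(AA) minus squared allele frequency."""
    p_AA, p_Aa, _ = case_genotype_freqs(q, psi1, psi2)
    p = p_AA + p_Aa / 2          # allele frequency of A among cases
    return p_AA - p ** 2


def case_disequilibrium_closed_form(q, psi1, psi2):
    """Same quantity via q^2 (1-q)^2 (psi2 - psi1^2) / R^2."""
    r = q ** 2 * psi2 + 2 * q * (1 - q) * psi1 + (1 - q) ** 2
    return q ** 2 * (1 - q) ** 2 * (psi2 - psi1 ** 2) / r ** 2


# Illustrative values only: q = 0.5 and a risk parameter of 2.
for label, psi1, psi2 in [("multiplicative", 2, 4), ("recessive", 1, 2), ("dominant", 2, 2)]:
    print(label, case_disequilibrium(0.5, psi1, psi2))
```

Under these assumptions, the multiplicative model gives a coefficient of zero, the recessive model a positive value (heterozygote deficiency among cases), and the dominant model a negative value (heterozygote excess).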
POWER FORMULA AND POWER COMPARISON |
---|
and a variance of
Let $z_x$ denote the $x$-quantile of a standard normal distribution. Then
Power of the
with Z being a standard normal-distributed random variable.
To compare the power of the present method with that of the case-parents designs, we considered the same modes of inheritance as used by Knapp (12) (excluding the multiplicative mode of inheritance): 1) the "additive" model ($\psi_1 = \gamma$ and $\psi_2 = 2\gamma$, according to Camp's definition (13) of additive mode of inheritance for the sake of comparability), 2) the recessive model ($\psi_1 = 1$ and $\psi_2 = \gamma$), and 3) the dominant model ($\psi_1 = \psi_2 = \gamma$). The test was two sided with the $\alpha$-level set at $10^{-7}$. This corresponds to $\alpha = 5 \times 10^{-8}$ for the genome-wide, one-sided transmission/disequilibrium tests used by Risch and Merikangas (1). (If allele A is positively associated with the disease and $\alpha$ is small, the power of the one-sided transmission/disequilibrium test with a type I error rate of $\alpha$ is very near the power of the two-sided transmission/disequilibrium test with a type I error rate of $2\alpha$.) For each combination of mode of inheritance, risk parameter $\gamma$ ($\gamma$ = 1.5, 2, 4), and allele frequency of A in the source population $q$ ($q$ = 0.01, 0.1, 0.5, 0.8), the sample sizes necessary to gain 80 percent power for the (two-sided) HWT were calculated by solving the above power formula using a bisection method. (Note that Camp's additive model with $\gamma = 2$ was not considered here, because it is actually a multiplicative mode of inheritance.)
To check the precision of power approximation, 100,000 simulated data sets at the above-calculated sample sizes were generated. For each round of simulation, the HWT was calculated, and the true power was estimated as the proportion of simulations rejecting the null hypothesis at $\alpha = 10^{-7}$.
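The simulation check described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the article's program: case genotypes are drawn from the risk-weighted Hardy-Weinberg frequencies, the statistic is taken as sqrt(n) * D_hat / (p_hat * (1 - p_hat)), and power is estimated as the rejection proportion. The function name `simulate_hwt_power`, the default of 1,000 replicates, and the example significance level of 0.05 are illustrative choices made so the code runs quickly. The article used 100,000 replicates at a level of 10^-7.

```python
import math
import random
from statistics import NormalDist


def simulate_hwt_power(n, q, psi1, psi2, alpha, n_sims=1000, seed=0):
    """Monte Carlo estimate of the power of a two-sided HWT among n cases."""
    rng = random.Random(seed)
    # Expected genotype frequencies among affected individuals.
    r = q ** 2 * psi2 + 2 * q * (1 - q) * psi1 + (1 - q) ** 2
    probs = [q ** 2 * psi2 / r, 2 * q * (1 - q) * psi1 / r, (1 - q) ** 2 / r]
    critical = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value
    rejections = 0
    for _ in range(n_sims):
        draws = rng.choices(("AA", "Aa", "aa"), weights=probs, k=n)
        n_AA = draws.count("AA")
        n_Aa = draws.count("Aa")
        p_hat = (2 * n_AA + n_Aa) / (2 * n)
        if p_hat == 0.0 or p_hat == 1.0:
            continue  # monomorphic sample: statistic undefined, counted as no rejection
        d_hat = n_AA / n - p_hat ** 2
        hwt = math.sqrt(n) * d_hat / (p_hat * (1 - p_hat))
        if abs(hwt) > critical:
            rejections += 1
    return rejections / n_sims


# Illustrative run: recessive model with risk parameter 4 and q = 0.5.
print(simulate_hwt_power(n=300, q=0.5, psi1=1, psi2=4, alpha=0.05))
# Null case (no genotype effect): the rejection rate should sit near alpha.
print(simulate_hwt_power(n=300, q=0.5, psi1=1, psi2=1, alpha=0.05))
```

Estimating a rejection rate at a level as small as 10^-7 requires far more replicates, or a vectorized sampler, than this sketch uses.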
Table 1 presents the calculated sample sizes necessary to gain 80 percent power for an effect at a single locus by the HWT under various conditions. The empirical powers based on simulations (in parentheses) match very well with the expected value of 0.80, indicating that the power formula is quite accurate. The same table also presents the sample sizes needed by the case-parents design (using the conventional transmission/disequilibrium test as well as the 2-df likelihood ratio test that assumes Hardy-Weinberg equilibrium in the source population (14)). The sample sizes (numbers of study subjects) needed by the two-sided transmission/disequilibrium test were taken from table 3 of the article by Knapp (12) and were multiplied by three before presentation. (Knapp's paper presented the numbers of case-parents "triads" instead.) The numbers of study subjects needed by the likelihood ratio test were calculated using the method of Longmate (15). It is of particular interest to see that, if the disease-susceptibility gene is recessive with an allele frequency of ≤0.5 or dominant with an allele frequency of ≥0.5, the sample sizes needed by the HWT can be smaller than the corresponding sample sizes needed by the transmission/disequilibrium test and the likelihood ratio test (table 1).
ASSUMPTIONS AND LIMITATIONS |
---|
The method assumes Hardy-Weinberg equilibrium in the source population from which the cases arise and also random sampling of cases (or at least that the missing cases are noninformatively missing). A population can deviate from the Hardy-Weinberg equilibrium for various reasons, biologic or nonbiologic (16). Differential survival of individuals with different genotypes is one of the biologic reasons. It causes the adult and elderly segments of a population to deviate from Hardy-Weinberg equilibrium, even if the newborns of that population are in Hardy-Weinberg equilibrium. Another biologic reason is assortative mating in the population, where the probability of mating between two individuals is related to their phenotypic similarity. A positive signal in a case-only HWT analysis should thus be interpreted with caution, for it could imply that the marker being tested is in linkage disequilibrium with a gene that contributes to disease susceptibility (true positive in the present context), with a gene that affects survival, or with a gene that is associated with the choice of mates.
Yet a more subtle nonbiologic reason for deviation from Hardy-Weinberg equilibrium is "population stratification," whereby the population has a mating substructure and mating is restricted to subjects in the same stratum. In this case, even if mating is random within each stratum, the population as a whole may deviate from Hardy-Weinberg equilibrium (16). A positive signal due to population stratification is a bona fide false alarm. Following up on such a lead, a genomic search in the vicinity of the marker(s) with a significant HWT will most likely be fruitless: no gene will be found that affects survival or mating choice, let alone one that contributes to disease susceptibility. If such a mating substructure can be delineated using variables such as ethnicity or race, one can perform a stratified analysis to adjust for the bias (e.g., by defining a "pooled" HWT statistic over the strata, with $s$ as the stratum indicator (17)). If not, one can specify the HWT to be one sided, testing exclusively for heterozygote excess ($D < 0$). (A population substructure in itself can produce a deviation from Hardy-Weinberg equilibrium only in the direction of heterozygote deficiency, the Wahlund principle (18).) However, such a one-sided approach will fail to detect recessive genes or genes with $\psi_2 > \psi_1^2$.
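The direction of the stratification effect can be illustrated with a short calculation. The Python sketch below pools several strata that are each in Hardy-Weinberg equilibrium and computes the disequilibrium coefficient of the pooled population, defined here as the pooled AA frequency minus the squared pooled allele frequency. Under that definition the coefficient equals the weighted variance of the allele frequency across strata, so it cannot be negative. The function name `pooled_disequilibrium` and the example weights and frequencies are hypothetical.

```python
def pooled_disequilibrium(weights, allele_freqs):
    """Disequilibrium coefficient of a mixture of strata, each in HWE.

    weights      : relative sizes of the strata (need not sum to 1)
    allele_freqs : frequency of allele A within each stratum
    """
    total = sum(weights)
    w = [x / total for x in weights]
    p_AA = sum(wi * qi ** 2 for wi, qi in zip(w, allele_freqs))   # pooled AA frequency
    p_bar = sum(wi * qi for wi, qi in zip(w, allele_freqs))       # pooled allele frequency
    return p_AA - p_bar ** 2


# Two equally sized strata with allele frequencies 0.2 and 0.8 (made-up values).
print(pooled_disequilibrium([1, 1], [0.2, 0.8]))
# A single stratum, or strata sharing one allele frequency, gives zero.
print(pooled_disequilibrium([1, 3], [0.4, 0.4]))
```

A positive coefficient corresponds to heterozygote deficiency in the sign convention used here, which is why a one-sided test for heterozygote excess is not triggered by stratification alone.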
CONCLUSION |
---|
ACKNOWLEDGMENTS |
---|
NOTES |
---|
REFERENCES |
---|