Estimating Genotype Relative Risks in Case-Parental Control Studies: An Optimal Weighting Approach
Wen-Chung Lee and Chin-Hao Chang
From the Graduate Institute of Epidemiology, College of Public Health, National Taiwan University, Taipei, Taiwan, Republic of China.
ABSTRACT
The "case-parental control study" is a novel study design. It can quantify the relations between candidate genes and disease risk. Previous authors have proposed noniterative methods for estimating "genotype relative risks" (GRRs) in case-parental control studies. Here the authors propose yet another noniterative method. The new method is simple. It involves taking certain weighted averages with weights chosen according to one's educated guess about the likely values of the true GRRs. Monte Carlo simulation shows that the new estimators are approximately unbiased and that they have smaller variances than the previous estimators. Am J Epidemiol 2000;152:487–92.
case-control studies; epidemiologic methods; genotype; risk
Abbreviations:
CPG, conditional on parental genotype; FK, Flanders and Khoury; GRR, genotype relative risk; K, Khoury; SFYK, Sun, Flanders, Yang, and Khoury
INTRODUCTION
Searching for disease susceptibility genes is of interest to geneticists and epidemiologists alike (1, 2). To this end, an association approach aimed at examining and quantifying the relation between candidate genes and disease risk can be useful. In particular, a novel study design, the "case-parental control" design (3–5), has attracted much attention recently. It is a variant of a type of case-control study, the case-only study (6). The case-parental control study design requires only the case group and their parents and can do without a control group entirely. This not only makes the design more cost-efficient but also eliminates the possibility of bias from inappropriate selection of controls whose genetic backgrounds differ systematically from those of cases (3–6).
Khoury (K) (3), Flanders and Khoury (FK) (4), and more recently Sun et al. (SFYK) (5) have proposed noniterative methods for estimating "genotype relative risks" (GRRs) in case-parental control studies. Their methods produce GRRs that are approximately unbiased, but the variances (instability) of the different methods vary (K > FK > SFYK). In this paper, we propose yet another noniterative method of estimating GRRs. The new method is simple. It involves taking certain weighted averages with weights chosen according to one's educated guess about the likely values of the true GRRs. In other words, the method incorporates a priori information to optimize the estimation. We demonstrate using Monte Carlo simulation that the resulting estimators are approximately unbiased and have smaller variances than the K and FK estimators and even the SFYK estimator.
BACKGROUND AND BASIC NOTATION
Assume that a candidate locus for disease susceptibility has two alleles, M and N, with M the mutant (or high-risk) allele and N the normal allele. In the case-parental control design, all newly diseased subjects (or a random sample of them) are recruited. These case subjects and their parents are genotyped at the candidate locus. There are three possible genotypes for each subject: NN, MN, and MM, denoted by g = 0, 1, and 2, respectively, corresponding to the number of mutant alleles in the genotype. Parental matings of the types NN x NN (g = 0 x g = 0), NN x MM (g = 0 x g = 2), and MM x MM (g = 2 x g = 2) produce only one offspring genotype and are not informative. These noninformative "case-parent triads" are excluded. The other triads can be cross-tabulated according to the genotypes of case subjects and the genotypes of hypothetical control subjects carrying the nontransmitted parental alleles. We let mij (i, j = 0, 1, 2; 1 ≤ i + j ≤ 3) denote the number of such informative case-parent triads, with g = i in the case (transmitted) subject and g = j in the control (nontransmitted) subject. Note that this notation is in accordance with that of Sun et al. (5).
Our interest here is to estimate the GRRs, based on the mij. Two such GRRs can be defined to describe the differential susceptibility to disease for the three genotypes. We let r1 represent the relative risk for individuals with g = 1 versus those with g = 0 and r2 the relative risk of g = 2 versus g = 1. Note that we did not use a common baseline to define the two GRRs. The r1 and r2 defined in this way characterize more succinctly the possible action modes of the candidate gene. If both are close to 1, we conclude that the gene is probably not associated with the disease. If r1 > 1 and r2 ≈ 1, an autosomal dominant gene is suspected. If r1 is close to 1 but r2 is not, the gene possibly conforms to an autosomal recessive model. If r1 ≈ r2 > 1, a "gene-dose effect" (additive or codominant model) is demonstrated, in which disease susceptibility increases precisely in proportion to the number of mutant alleles involved.
The K, FK, and SFYK estimators mentioned above have the following general form:
$$\hat{r}_1 = \frac{w_1 m_{10} + m_{11}}{w_1 m_{01} + 2 m_{02}} \qquad (1)$$
and
$$\hat{r}_2 = \frac{w_2 m_{21} + 2 m_{20}}{w_2 m_{12} + m_{11}} \qquad (2)$$
Essentially, $\hat{r}_1$ is a weighted average of two ratios, m10/m01 and m11/(2m02), and $\hat{r}_2$ is a weighted average of m21/m12 and 2m20/m11. By setting the weighting constants w1 = w2 = 1, we have the SFYK estimators. By setting w1 = (m02 + m11 + m20)/(m10 + m01) and w2 = (m02 + m11 + m20)/(m12 + m21), we obtain the FK estimators. By letting w1 and w2 both go to infinity, we have the K estimators.
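The weighted-average form can be illustrated with a short Python sketch. It assumes the estimators take the form (w1 m10 + m11)/(w1 m01 + 2 m02) and (w2 m21 + 2 m20)/(w2 m12 + m11), which is consistent with the two pairs of ratios and the three special cases named above. The function names and the triad counts are hypothetical and serve only as an illustration.

```python
# Informative triad counts m[(i, j)]: case genotype i, control genotype j.
# These numbers are hypothetical, chosen only for illustration.
counts = {
    (0, 1): 20, (1, 0): 100,              # from MN x NN matings
    (0, 2): 10, (1, 1): 100, (2, 0): 50,  # from MN x MN matings
    (1, 2): 40, (2, 1): 40,               # from MN x MM matings
}

def weighted_grr(m, w1, w2):
    """Weighted-average estimates of r1 and r2 (assumed general form)."""
    r1 = (w1 * m[(1, 0)] + m[(1, 1)]) / (w1 * m[(0, 1)] + 2 * m[(0, 2)])
    r2 = (w2 * m[(2, 1)] + 2 * m[(2, 0)]) / (w2 * m[(1, 2)] + m[(1, 1)])
    return r1, r2

def fk_weights(m):
    """Data-dependent FK weighting constants as described in the text."""
    middle = m[(0, 2)] + m[(1, 1)] + m[(2, 0)]
    return middle / (m[(1, 0)] + m[(0, 1)]), middle / (m[(1, 2)] + m[(2, 1)])

def k_grr(m):
    """K estimates: the limit of the weighted form as w1 and w2 go to infinity."""
    return m[(1, 0)] / m[(0, 1)], m[(2, 1)] / m[(1, 2)]

sfyk = weighted_grr(counts, 1.0, 1.0)           # SFYK: w1 = w2 = 1
fk = weighted_grr(counts, *fk_weights(counts))  # FK: weights computed from the data
k = k_grr(counts)                               # K: infinite weights
print(sfyk, fk, k)
```

The hypothetical counts were chosen to be exactly proportional to their expectations under r1 = 5 and r2 = 1, so the three estimators agree here. With real data they generally differ, and the choice of weights then matters for precision.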
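Treating mij (i, j = 0, 1, 2; 1 ≤ i + j ≤ 3) as a multinomial random vector and w1 and w2 as fixed constants, the variances of the logarithms of $\hat{r}_1$ and $\hat{r}_2$ can be derived using the delta method (7):
$$\mathrm{Var}(\log \hat{r}_1) \approx \frac{w_1^2 m_{10} + m_{11}}{(w_1 m_{10} + m_{11})^2} + \frac{w_1^2 m_{01} + 4 m_{02}}{(w_1 m_{01} + 2 m_{02})^2} \qquad (3)$$
and
$$\mathrm{Var}(\log \hat{r}_2) \approx \frac{w_2^2 m_{21} + 4 m_{20}}{(w_2 m_{21} + 2 m_{20})^2} + \frac{w_2^2 m_{12} + m_{11}}{(w_2 m_{12} + m_{11})^2} \qquad (4)$$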
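As a rough illustration of how the variance formulae feed into interval estimation, the sketch below evaluates delta-method variances for the log GRRs and builds confidence intervals on the log scale. The variance expressions are an assumption: they are what the delta method gives for a ratio of two linear combinations of multinomial counts of the weighted-average form. The counts and function names are hypothetical.

```python
from math import exp, sqrt

# Hypothetical informative triad counts m[(i, j)] (case genotype i, control genotype j).
counts = {
    (0, 1): 20, (1, 0): 100,
    (0, 2): 10, (1, 1): 100, (2, 0): 50,
    (1, 2): 40, (2, 1): 40,
}

def log_grr_variances(m, w1, w2):
    """Delta-method variances of log r1-hat and log r2-hat (assumed form)."""
    v1 = ((w1**2 * m[(1, 0)] + m[(1, 1)]) / (w1 * m[(1, 0)] + m[(1, 1)])**2
          + (w1**2 * m[(0, 1)] + 4 * m[(0, 2)]) / (w1 * m[(0, 1)] + 2 * m[(0, 2)])**2)
    v2 = ((w2**2 * m[(2, 1)] + 4 * m[(2, 0)]) / (w2 * m[(2, 1)] + 2 * m[(2, 0)])**2
          + (w2**2 * m[(1, 2)] + m[(1, 1)]) / (w2 * m[(1, 2)] + m[(1, 1)])**2)
    return v1, v2

def log_scale_ci(r_hat, var_log, z=1.959964):
    """Confidence interval for a GRR, symmetric on the log scale (z = 1.96 gives 95 percent)."""
    half_width = z * sqrt(var_log)
    return r_hat * exp(-half_width), r_hat * exp(half_width)

v1, v2 = log_grr_variances(counts, 1.0, 1.0)  # variances under SFYK weights
lower, upper = log_scale_ci(5.0, v1)          # interval around an estimate of 5
print(v1, v2, lower, upper)
```

Working on the log scale keeps the interval inside the positive range and makes it symmetric around the point estimate in ratio terms.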
THE OPTIMAL WEIGHTING APPROACH
We show in the Appendix that the variances of the logarithms of $\hat{r}_1$ and $\hat{r}_2$ are minimized if the weighting constants are chosen to be w1 = 1 + r1/(r1 + 1) and w2 = 1 + 1/(r2 + 1). Estimation of GRRs using these optimized constants will be approximately unbiased and most precise. The above equations imply that the optimal weightings always lie between 1 and 2. This effectively rules out the K estimators, which have weighting constants of infinity, and the FK estimators, which do not guarantee weighting constants that are between 1 and 2. As for the SFYK estimators, we see that they are optimal only when r1 = 0 and r2 = ∞, that is, when the homozygotes (NN and MM) are prone to disease and the heterozygotes (MN) are immune to disease.
In practice, r1 and r2 are unknown and the optimal weighting constants cannot be determined in advance. However, one can make an approximate guess about the two GRRs. If previous studies have suggested that the gene under study is only weakly associated with the disease (r1, r2 ≈ 1), we can use w1 = w2 = 1.5. If the gene is suspected of being autosomal dominant, we assign a suitable value between 1.5 and 2 for w1 and a value of 1.5 for w2. If the gene is suspected of being autosomal recessive, we assign 1.5 for w1 and a value between 1 and 1.5 for w2. For situations in which the gene is not clearly dominant or recessive, a reasonable choice would be to set w1 slightly above 1.5 and w2 slightly below 1.5.
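The mapping from guessed GRRs to weighting constants is simple enough to tabulate. The following minimal sketch applies the formulas w1 = 1 + r1/(r1 + 1) and w2 = 1 + 1/(r2 + 1) to a few guesses. The function name and the example guesses are illustrative assumptions.

```python
def optimal_weights(r1_guess, r2_guess):
    """Weighting constants implied by guessed values of the two GRRs."""
    w1 = 1 + r1_guess / (r1_guess + 1)
    w2 = 1 + 1 / (r2_guess + 1)
    return w1, w2

# Illustrative guesses for three modes of gene action.
scenarios = {
    "weak association (r1 = r2 = 1)": (1, 1),
    "dominant (r1 = 5, r2 = 1)": (5, 1),
    "recessive (r1 = 1, r2 = 5)": (1, 5),
}
for label, (r1, r2) in scenarios.items():
    w1, w2 = optimal_weights(r1, r2)
    print(f"{label}: w1 = {w1:.3f}, w2 = {w2:.3f}")
```

Because both formulas are bounded between 1 and 2 for any positive GRR, even a crude guess keeps the weights in the range where the optimum must lie.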
SIMULATION STUDIES
In this section, we perform a simulation study on the statistical properties of the K, FK, SFYK, and optimal weighting methods. For comparison, we also present simulation results for the "conditional on parental genotype" (CPG) method (8). This likelihood-based method requires an iterative algorithm to obtain the estimates, yet standard likelihood theory predicts that it will have maximal stability. To simplify the presentation, we adopt an approach similar to that of Sun et al. (5); i.e., we assume a Hardy-Weinberg equilibrium and random mating in the parental population in the simulation. We emphasize again that the results should hold even if these conditions are not met. The simulation considers an autosomal dominant gene (r1 = 5, r2 = 1), an autosomal recessive gene (r1 = 1, r2 = 5), and a gene with a gene-dose effect (r1 = 5, r2 = 5) under gene frequencies (f, the prevalence of allele N) of 0.2, 0.5, and 0.8, respectively. For each situation, 10,000 simulations are performed. We chose a sample size (the number of informative case-parent triads) such that the minimum expected value of mij would be about 10 for each situation.
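A simulation along these lines can be sketched in a few lines of Python. This is not the authors' program; it is a minimal illustration under stated assumptions: Hardy-Weinberg equilibrium and random mating, a parameter q taken here to be the frequency of the mutant allele M, cell expectations proportional to the mating probability times the Mendelian transmission probability times the relative risk of the case genotype, and the assumed weighted-average form of the estimators. The sample size and number of replicates are arbitrary defaults, smaller than the 10,000 replicates described above.

```python
import random
from collections import Counter
from math import log
from statistics import mean, variance

CELLS = [(0, 1), (1, 0), (0, 2), (1, 1), (2, 0), (1, 2), (2, 1)]

def cell_probabilities(q, r1, r2):
    """Probabilities of the seven informative cells; q = frequency of allele M (assumption)."""
    p = 1.0 - q
    h1 = 4 * q * p**3     # mating MN x NN
    h2 = 4 * q**2 * p**2  # mating MN x MN
    h3 = 4 * q**3 * p     # mating MN x MM
    raw = [h1 / 2, h1 * r1 / 2,
           h2 / 4, h2 * r1 / 2, h2 * r1 * r2 / 4,
           h3 * r1 / 2, h3 * r1 * r2 / 2]
    total = sum(raw)
    return [x / total for x in raw]

def simulate(q, r1, r2, w1, w2, n_triads=1000, n_reps=500, seed=1):
    """Mean and variance of the log GRR estimates over simulated data sets."""
    rng = random.Random(seed)
    probs = cell_probabilities(q, r1, r2)
    logs1, logs2 = [], []
    for _ in range(n_reps):
        m = Counter(rng.choices(CELLS, weights=probs, k=n_triads))
        num1, den1 = w1 * m[(1, 0)] + m[(1, 1)], w1 * m[(0, 1)] + 2 * m[(0, 2)]
        num2, den2 = w2 * m[(2, 1)] + 2 * m[(2, 0)], w2 * m[(1, 2)] + m[(1, 1)]
        if min(num1, den1, num2, den2) <= 0:
            continue  # skip rare replicates where an estimate is undefined
        logs1.append(log(num1 / den1))
        logs2.append(log(num2 / den2))
    return {"mean_log_r1": mean(logs1), "var_log_r1": variance(logs1),
            "mean_log_r2": mean(logs2), "var_log_r2": variance(logs2)}

# Dominant gene (r1 = 5, r2 = 1): optimal weights versus SFYK weights.
optimal = simulate(0.5, 5, 1, w1=1 + 5 / 6, w2=1.5)
sfyk = simulate(0.5, 5, 1, w1=1.0, w2=1.0)
print(optimal)
print(sfyk)
```

Comparing the two printed summaries gives a quick impression of bias (mean log estimate against the true log GRR) and stability (variance of the log estimate) for different weightings.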
Table 1 presents the simulation results for the autosomal dominant gene. It can be seen that all of the methods considered yield log GRRs that are approximately unbiased, though the log r1 estimation is slightly above the true value of log r1. For the stability of the point estimates, one finds, as expected, that the CPG method has the smallest variance. However, the optimal weighting method as proposed in this paper produces the most stable estimates among the noniterative methods: It has smaller variances than the K and FK methods, and even the SFYK method. Since the true GRRs are unknown in practice and the weighting constants must be based on guesswork, our method may become "suboptimal" when certain incorrect values of r1 and r2 are selected. We simulated two cases of incorrect specification: r1 = 1, r2 = 3 and r1 = 3, r2 = 1. The former case corresponds to wrongly assuming a recessive model, and the latter to wrongly assuming a dominant model with underestimated effects. We see that such "suboptimal methods" are still better than the K, FK, and SFYK methods. Table 1 also presents the coverage probabilities and the average lengths of the 95 percent confidence intervals for the various methods. The coverages are close to the nominal 95 percent for all of the methods considered. We notice again that the average length for the optimal weighting method is shortest among the noniterative methods.
TABLE 1. Simulation results* for estimators of genotype relative risk, using an autosomal dominant gene (r1 = 5, r2 = 1), under different gene frequencies (f = 0.2, 0.5, 0.8)
Tables 2 and 3 present, respectively, the simulation results for the autosomal recessive gene and the gene with a gene-dose effect. The basic findings are similar to those for the autosomal dominant gene, though the superiority of the optimal weighting method over the SFYK method is not as striking.
TABLE 2. Simulation results* for estimators of genotype relative risk, using an autosomal recessive gene (r1 = 1, r2 = 5), under different gene frequencies (f = 0.2, 0.5, 0.8)
TABLE 3. Simulation results* for estimators of genotype relative risk, using a gene with a gene-dose effect (r1 = 5, r2 = 5), under different gene frequencies (f = 0.2, 0.5, 0.8)
DISCUSSION
In this paper, we present a new approach to estimating the GRRs for a disease susceptibility gene. The new method produces GRRs that are approximately unbiased and that have variances smaller than those of the other noniterative methods proposed to date. The method requires a priori information about the likely values of the GRRs. However, this is not crucial. To simplify matters and to be on the safe side, one can set w1 slightly above 1.5 and w2 slightly below 1.5. The results, though less optimal, are still better than those of the other noniterative methods.
Recently, it was discovered that the likelihood-based CPG method is identical to the use of a log-linear Poisson regression model and that it can therefore be implemented using standard software (9, 10). However, it still requires some computational effort (feeding the data into a computer, specifying appropriate program code for analysis, etc.). By contrast, the calculation in our method is so simple that it can be done with pencil and paper. This could be useful for a speedy initial assessment. The estimates derived from our method can also serve as reasonable starting values for Poisson regression analysis.
In this paper, the variance formulae are used to construct the confidence intervals around the point estimates. However, they can also be used to perform hypothesis testing about the GRRs. This implies that the optimal weighting approach as proposed in this paper may lead to an alternative "transmission/disequilibrium test" (11). Such a test is particularly useful when one is dealing with a "marker gene" rather than a "susceptibility gene." In such situations, one's interest lies in whether or not the "marker gene" under study is in linkage disequilibrium with the disease gene, rather than in the magnitude of the GRRs of the marker gene itself. The prospect of an optimal weighting approach to transmission/disequilibrium testing is currently under investigation.
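One simple way to turn a point estimate and its variance into a hypothesis test is a Wald-type test on the log scale. The sketch below is a generic illustration of that idea and is not the alternative transmission/disequilibrium test that the authors say is under investigation. The function name and the example inputs are hypothetical.

```python
from math import erfc, log, sqrt

def wald_test_log_grr(r_hat, var_log, r_null=1.0):
    """Two-sided Wald test of H0: GRR = r_null, using a normal approximation for log r-hat."""
    z = (log(r_hat) - log(r_null)) / sqrt(var_log)
    p_value = erfc(abs(z) / sqrt(2.0))  # two-sided tail area of the standard normal
    return z, p_value

# Hypothetical inputs: an estimated GRR of 5 with log-scale variance 0.0425.
z, p = wald_test_log_grr(5.0, 0.0425)
print(z, p)
```

A large absolute z (small p) would suggest that the GRR differs from 1, subject to the adequacy of the normal approximation for the sample at hand.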
APPENDIX
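To obtain the optimal weighting, we differentiate equation 3 in the text with respect to w1 and equation 4 with respect to w2, and equate them to zero. That is,
$$\frac{2 m_{10} m_{11} (w_1 - 1)}{(w_1 m_{10} + m_{11})^3} + \frac{4 m_{01} m_{02} (w_1 - 2)}{(w_1 m_{01} + 2 m_{02})^3} = 0$$
and
$$\frac{4 m_{20} m_{21} (w_2 - 2)}{(w_2 m_{21} + 2 m_{20})^3} + \frac{2 m_{11} m_{12} (w_2 - 1)}{(w_2 m_{12} + m_{11})^3} = 0.$$
After rearrangement, we have
$$\frac{w_1 - 1}{2 - w_1} = \frac{2 m_{01} m_{02}}{m_{10} m_{11}} \left( \frac{w_1 m_{10} + m_{11}}{w_1 m_{01} + 2 m_{02}} \right)^3 \qquad (5)$$
$$\frac{w_2 - 1}{2 - w_2} = \frac{2 m_{20} m_{21}}{m_{11} m_{12}} \left( \frac{w_2 m_{12} + m_{11}}{w_2 m_{21} + 2 m_{20}} \right)^3 \qquad (6)$$
The expectations (E) of the mij are proportional to the "probability of mating" (the mating probability of any two given genotypes in the population) and the GRRs. That is,
$$E(m_{01}) \propto \tfrac{1}{2} h_1, \quad E(m_{10}) \propto \tfrac{1}{2} h_1 r_1, \quad E(m_{02}) \propto \tfrac{1}{4} h_2, \quad E(m_{11}) \propto \tfrac{1}{2} h_2 r_1, \quad E(m_{20}) \propto \tfrac{1}{4} h_2 r_1 r_2, \quad E(m_{12}) \propto \tfrac{1}{2} h_3 r_1, \quad E(m_{21}) \propto \tfrac{1}{2} h_3 r_1 r_2,$$
where hk (k = 1, 2, 3) is the probability of mating: h1 is the probability of mating between genotype MN and genotype NN (MN x NN); h2 is the probability of MN x MN; and h3 is the probability of MN x MM. r1 and r2 are the GRRs, as defined in the text. Replacing mij in equations 5 and 6 with its corresponding expected value and rearranging and canceling out hk, we arrive at
$$w_1 = 1 + \frac{r_1}{r_1 + 1}$$
and
$$w_2 = 1 + \frac{1}{r_2 + 1}.$$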
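The closed-form weights can be checked numerically. The sketch below plugs expected counts into delta-method variance expressions (an assumed form, derived for weighted-average estimators of the kind described in the text) and searches a grid of weights between 1 and 2 for the minimum. The scale constants a, b, and c stand in for h1/2, h2/4, and h3/2 and are arbitrary hypothetical values; the minimizer does not depend on them.

```python
def var_log_r1(w, a, b, r1):
    """Variance of log r1-hat at expected counts m01 = a, m10 = a*r1, m02 = b, m11 = 2*b*r1."""
    m01, m10, m02, m11 = a, a * r1, b, 2 * b * r1
    return ((w * w * m10 + m11) / (w * m10 + m11)**2
            + (w * w * m01 + 4 * m02) / (w * m01 + 2 * m02)**2)

def var_log_r2(w, b, c, r1, r2):
    """Variance of log r2-hat at expected counts m11 = 2*b*r1, m20 = b*r1*r2, m12 = c*r1, m21 = c*r1*r2."""
    m11, m20, m12, m21 = 2 * b * r1, b * r1 * r2, c * r1, c * r1 * r2
    return ((w * w * m21 + 4 * m20) / (w * m21 + 2 * m20)**2
            + (w * w * m12 + m11) / (w * m12 + m11)**2)

def grid_argmin(f, lo=1.0, hi=2.0, steps=10000):
    """Weight on an evenly spaced grid over [lo, hi] that minimizes f."""
    best = min(range(steps + 1), key=lambda k: f(lo + (hi - lo) * k / steps))
    return lo + (hi - lo) * best / steps

r1, r2 = 5.0, 3.0             # hypothetical true GRRs
a, b, c = 30.0, 12.0, 20.0    # hypothetical scale constants
w1_numeric = grid_argmin(lambda w: var_log_r1(w, a, b, r1))
w2_numeric = grid_argmin(lambda w: var_log_r2(w, b, c, r1, r2))
print(w1_numeric, 1 + r1 / (r1 + 1))
print(w2_numeric, 1 + 1 / (r2 + 1))
```

Changing a, b, or c changes the size of the variance but should leave the location of the minimum where the closed-form expressions put it.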
ACKNOWLEDGMENTS
This study was partly supported by a grant from the Taiwan National Science Council.
NOTES
Reprint requests to Dr. Wen-Chung Lee, Graduate Institute of Epidemiology, National Taiwan University, No. 1, Jen-Ai Road, 1st Sec., Taipei, Taiwan, Republic of China (e-mail: wenchung@ha.mc.ntu.edu.tw).
REFERENCES
1. Risch N, Merikangas K. The future of genetic studies of complex human diseases. Science 1996;273:1516–17.
2. Khoury MJ, Yang Q. The future of genetic studies of complex human diseases: an epidemiologic perspective. Epidemiology 1998;9:350–4.
3. Khoury MJ. Case-parental control method in the search for disease-susceptibility genes. (Letter). Am J Hum Genet 1994;55:414–15.
4. Flanders WD, Khoury MJ. Analysis of case-parental control studies: method for the study of associations between disease and genetic markers. Am J Epidemiol 1996;144:696–703.
5. Sun F, Flanders WD, Yang Q, et al. A new method for estimating the risk ratio in studies using case-parental control design. Am J Epidemiol 1998;148:902–9.
6. Greenland S. A unified approach to the analysis of case-distribution (case-only) studies. Stat Med 1999;18:1–15.
7. Agresti A. Categorical data analysis. New York, NY: John Wiley and Sons, Inc, 1990.
8. Schaid DJ, Sommer SS. Genotype relative risks: methods for design and analysis of candidate-gene association studies. Am J Hum Genet 1993;53:1114–26.
9. Wilcox AJ, Weinberg CR, Lie RT. Distinguishing the effects of maternal and offspring genes through studies of "case-parent triads." Am J Epidemiol 1998;148:893–901.
10. Weinberg CR, Wilcox AJ, Lie RT. A log-linear approach to case-parent-triad data: assessing effects of disease genes that act either directly or through maternal effects and that may be subject to parental imprinting. Am J Hum Genet 1998;62:969–78.
11. Spielman RS, McGinnis RE, Ewens WJ. Transmission test for linkage disequilibrium: the insulin gene region and insulin-dependent diabetes mellitus (IDDM). Am J Hum Genet 1993;52:506–16.
Received for publication May 7, 1999.
Accepted for publication September 30, 1999.