Invited Commentary: Making the Most of Genotype Asymmetries

Clarice Weinberg 

From the National Institute of Environmental Health Sciences, Research Triangle Park, NC.

Received for publication August 14, 2003; accepted for publication August 28, 2003.


    INTRODUCTION
Designs that use family members as genetic controls offer appealing advantages. Family members are often easy to contact and more than willing to participate. They also tend to be well matched to the case on other factors, such as ethnicity. At the same time, unrelated population-based controls may become harder to reach and to recruit for genetic studies, because of the increasing popularity of screening devices such as answering machines and increased concerns about potential abuses of genetic data. Consequently, a population-based case-control design can let the investigator down by failing to provide a high-enough control recruitment rate for the findings to be trustworthy.

One family-based approach genotypes cases and their parents, relying on the fact that any allele related to increased risk (either directly or through linkage disequilibrium with a nearby disease-susceptibility allele) will appear to have been transmitted too often (i.e., more than half the time) by heterozygous parents to affected offspring (1). One can use such a design to estimate genotype relative risks (penetrance ratios (2, 3)) and to study departures from multiplicative joint effects of exposures and genetic factors on risk (4), although one cannot estimate the main effects of exposures with this design. Missing parents can be handled easily if the missingness mechanism is ignorable (5), and with a little extra work even if ignorability cannot be assumed (6). Maternally mediated genetic effects that are expressed during gestation and may influence the later health of the child can also be detected and characterized (7). This design can thus be very useful for studying genetic contributions to early-onset diseases, such as insulin-dependent diabetes and schizophrenia. Recent work has shown that grandparent controls can yield more power for conditions with very early onset, such as birth defects (8).
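A minimal sketch of the transmission-counting idea behind the case-parent design (1), written as the classical transmission/disequilibrium test (TDT). It assumes a diallelic marker coded as the number of copies (0, 1, 2) of the variant allele in the mother, father, and case; the function name `tdt` and the tuple layout are hypothetical choices made for illustration.

```python
import math
from typing import Iterable, Tuple

# Each trio is (mother, father, case), coded as the number of copies (0, 1, 2)
# of the variant allele. This layout is a hypothetical convention for the sketch.
Trio = Tuple[int, int, int]


def tdt(trios: Iterable[Trio]) -> Tuple[float, float]:
    """Return (chi-square statistic, p-value) for the transmission/disequilibrium test.

    b counts variant alleles transmitted by heterozygous parents to the case;
    c counts variant alleles those parents did not transmit.
    """
    b = c = 0
    for mother, father, case in trios:
        parents = (mother, father)
        n_het = sum(1 for g in parents if g == 1)
        # Homozygous parents transmit a known allele: 0 copies if g == 0, 1 copy if g == 2.
        from_homozygotes = sum(g // 2 for g in parents if g != 1)
        transmitted = case - from_homozygotes  # variant alleles from heterozygous parents
        if not 0 <= transmitted <= n_het:
            raise ValueError(f"Mendelian inconsistency in trio {(mother, father, case)}")
        b += transmitted
        c += n_het - transmitted
    if b + c == 0:
        raise ValueError("no informative (heterozygous) parents")
    stat = (b - c) ** 2 / (b + c)
    # Upper tail of a chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```

For the four trios in `tdt([(1, 0, 1), (1, 0, 0), (1, 1, 2), (1, 2, 2)])`, heterozygous parents transmit the variant 4 times and withhold it once, so the statistic is (4 - 1)^2 / (4 + 1) = 1.8.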

All this is well and good. However, most noninfectious diseases have their onset later in life, when parents may be available only rarely. For such diseases, the case-parent design is of limited use, but alternative methods based on comparisons of affected and unaffected siblings are available (9), methods that exploit the genotype symmetry that one would expect if the gene is unrelated to risk. The paper by Lee in this issue (10) provides a different extension, by using the spouse as the genetic control when the parents are unavailable or by using the offspring of the case when neither parents nor spouse is available.

The notion that spouses can be used as genetic controls was proposed by Wilcox et al. (7) in the context of identifying maternally mediated genetic effects on disease susceptibility in the offspring. They reasoned that, if mothers who carry one or two copies of a particular allele have a modified phenotype during pregnancy, which somehow increases the risk of a later health problem in the infant, then this maternally mediated genetic effect should produce a distortion away from symmetry between the mother’s genotype and that of her husband. The mothers of affected children will be more likely to carry the risk-conferring allele than will the fathers.

The mating disequilibrium test proposed in this issue by Lee provides a more general application. One begins by calculating the difference between the number of copies of the variant allele carried by the $i$th case, $C_i$, and the number carried by the spouse, $S_i$, or the number imputed for the spouse, $2O_i - C_i$, if only offspring are available, where $O_i$ is the average allele count for the offspring genotyped. Thus, we calculate $D_i = C_i - S_i$ when the spouse is available and $D_i = C_i - (2O_i - C_i) = 2(C_i - O_i)$ when only one (or more) of the case’s children is available. Under very plausible symmetry assumptions, these differences have expectation zero under the null that the marker under study is not related to risk of the disease. Therefore, the proposed mating disequilibrium test statistic, the square of the sum of the $D_i$ divided by the sum of the squares of the $D_i$, $T = (\sum_i D_i)^2 / \sum_i D_i^2$, should be approximately chi-squared with 1 degree of freedom under the null.
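The calculation described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Lee’s code: allele counts are 0, 1, or 2 copies of the variant, each family contributes either a spouse genotype or one or more offspring genotypes, no Mendelian checks are made, and the helper names `d_value` and `mating_disequilibrium_test` are hypothetical. The p-value uses the chi-squared approximation with 1 degree of freedom.

```python
import math
from statistics import mean
from typing import Optional, Sequence, Tuple


def d_value(case: int, spouse: Optional[int] = None,
            offspring: Sequence[int] = ()) -> float:
    """Case-minus-control difference for the case-spouse / case-offspring designs."""
    if spouse is not None:
        return float(case - spouse)            # D = C - S
    if offspring:
        o_bar = mean(offspring)                # O = average offspring allele count
        return 2.0 * (case - o_bar)            # D = C - (2O - C) = 2(C - O)
    raise ValueError("need a spouse or at least one offspring genotype")


def mating_disequilibrium_test(ds: Sequence[float]) -> Tuple[float, float]:
    """T = (sum D)^2 / sum D^2, referred to a chi-square with 1 df."""
    num = sum(ds) ** 2
    den = sum(d * d for d in ds)
    if den == 0:
        raise ValueError("all differences are zero; statistic undefined")
    t = num / den
    return t, math.erfc(math.sqrt(t / 2))      # upper-tail chi-square(1) p-value
```

For example, two spouse families with (C, S) = (2, 1) and (1, 0), plus two offspring families with C = 1 and offspring [1], and C = 2 and offspring [1, 2], give differences 1, 1, 0, and 1, so T = 3^2 / 3 = 3.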

Under the assumptions stated by Lee, and provided that we avoid loci on sex chromosomes, I believe the spouse should provide a near-ideal genetic control. Lee is more cautious, however, and demonstrates that, under scenarios with extreme admixture, the type I error rate can become very high with either the case-spouse or the case-offspring design. (Exactly the same distortions could affect the test for maternal effects proposed by Wilcox et al. (7).) Intuitively, this distortion happens because, for example, there may be a subpopulation with both a higher risk of the disease and a higher prevalence of a particular allele; intermarriage across the disparate subpopulations will then produce an artifactual asymmetry, in which the spouse who is the case is much more likely to carry the variant allele than is the spouse serving as control, even if the allele is not related to risk.

Such extreme scenarios should be rare in practice. Consider a highly stratified population with random mating, that is, the simulation corresponding to the admixing proportion of 1 in Lee’s figure 3. Such a population will, within a generation or two, mix itself. In effect, cross-marriages, which are needed to cause artifactual asymmetries across spouses, themselves quickly obliterate the stratification. Consequently, in any reasonably steady-state population, such a scenario will be present with probability approaching zero.

For most genes, it seems similarly artificial to think that people destined to develop the disease select their spouse based on the allele at the particular gene under study. Thus, although Lee’s correction, which rescales the chi-square statistic multiplicatively using unlinked markers, works well in his simulations, this level of protection against potential bias may be overkill when spouses are used as controls. Nevertheless, if the disease is more common in one sex than the other, the investigator may want to confirm that the joint distribution of $C_i$ and $S_i$ is similar whether the case is the female or the male. If the disease is more common in a particular ethnic group, one might carry out a similar test that focuses on mixed marriages and compares the two distributions based on the ethnicity of the case.
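The commentary does not spell out how the multiplicative scaling is estimated. One common way to rescale a 1-df chi-square using unlinked null markers is the median-based "genomic control" estimate, sketched below as an assumption; Lee’s estimator may differ. The inflation factor is the median statistic at the unlinked markers divided by the median of a chi-square with 1 degree of freedom (about 0.455), floored at 1, and the test statistic is divided by it. The function names are hypothetical.

```python
from statistics import NormalDist, median
from typing import Sequence

# Median of a chi-square distribution with 1 df: (Phi^{-1}(0.75))^2, about 0.4549.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def inflation_factor(unlinked_stats: Sequence[float]) -> float:
    """Median-based inflation estimate from chi-square statistics at unlinked markers.

    Floored at 1 so that the correction never inflates the test statistic.
    """
    lam = median(unlinked_stats) / CHI2_1DF_MEDIAN
    return max(lam, 1.0)


def scaled_statistic(stat: float, unlinked_stats: Sequence[float]) -> float:
    """Divide the chi-square statistic at the test marker by the inflation factor."""
    return stat / inflation_factor(unlinked_stats)
```

The median is used rather than the mean so that a few unlinked markers with genuinely large statistics do not dominate the estimate.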

One advantage to using spouses is that they provide a whole, diploid control. By contrast, a single offspring effectively provides only half of a person, because that offspring will have inherited just one allele at each locus from the missing spouse, the allele that we infer by subtraction. This intuition is supported by Lee’s sample size calculations, which demonstrate the need for approximately twice as many families when single offspring are used, compared with the corresponding scenario with spouses.
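The "half a person" intuition can be checked with a short variance calculation. This is a sketch under assumptions not stated in the commentary: a diallelic marker with variant-allele frequency $p$ ($q = 1 - p$), Hardy-Weinberg proportions, a spouse whose genotype is independent of the case’s, Mendelian transmission, and a local alternative in which the case’s mean allele count exceeds $2p$ by a small $\delta$ (so null variances can be used). The single offspring’s count is written as $O = t + s$, where $t$ is the allele transmitted by the case and $s$ the allele transmitted by the spouse, and $u = C - t$ is the case’s nontransmitted allele; these symbols are introduced only for this calculation.

```latex
\begin{aligned}
\operatorname{E}(C - S) &= \delta, \qquad \operatorname{Var}(C - S) = 2pq + 2pq = 4pq,\\
C - O &= (t + u) - (t + s) = u - s, \qquad \operatorname{Var}(C - O) = pq + pq = 2pq,\\
\operatorname{E}(C - O) &= \operatorname{E}(C) - \tfrac{1}{2}\operatorname{E}(C) - p = \tfrac{\delta}{2},
  \qquad \operatorname{Var}\{2(C - O)\} = 8pq,\\
\frac{[\operatorname{E}(C - S)]^2}{\operatorname{Var}(C - S)} &= \frac{\delta^2}{4pq},
  \qquad \frac{[\operatorname{E}\{2(C - O)\}]^2}{\operatorname{Var}\{2(C - O)\}} = \frac{\delta^2}{8pq}.
\end{aligned}
```

Under these assumptions, each single-offspring family contributes about half the noncentrality of a spouse family, consistent with the roughly doubled number of families in Lee’s sample size calculations.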

Consequently, I wonder whether, when the statistic is to be calculated based on a mix of family types, some with spouse controls and some with offspring controls, a smaller relative weighting for offspring controls could yield a more powerful test. One should perhaps calculate $D_i$ as $C_i - O_i$ rather than $2(C_i - O_i)$, to reflect that the pairs based on offspring provide about half as much information as pairs based on the spouse. I am assuming here that typically there will be only one or two offspring genotypes available.
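Whether down-weighting offspring families helps can be explored with an approximate noncentrality calculation. This sketch is not from Lee’s paper; it assumes Hardy-Weinberg proportions, one genotyped offspring per offspring family, and a local alternative in which a spouse family’s $C_i - S_i$ has mean $\delta$ and variance $4pq$, while $C_i - O_i$ has mean $\delta/2$ and variance $2pq$ (these moments follow from writing the offspring count as one allele from the case plus one from the spouse). The function name `expected_noncentrality` and its parameters are hypothetical.

```python
def expected_noncentrality(n_spouse: int, n_offspring: int, delta: float,
                           p: float, offspring_scale: float) -> float:
    """Approximate noncentrality of T = (sum D)^2 / sum D^2 for a mix of families.

    Spouse families use D = C - S (mean delta, variance 4pq). Offspring families
    use D = offspring_scale * (C - O), with mean offspring_scale * delta / 2 and
    variance offspring_scale**2 * 2pq. Null variances are used throughout.
    """
    pq = p * (1.0 - p)
    mean_sum = n_spouse * delta + n_offspring * offspring_scale * delta / 2.0
    var_sum = n_spouse * 4.0 * pq + n_offspring * offspring_scale ** 2 * 2.0 * pq
    return mean_sum ** 2 / var_sum


# Lee's weighting, 2(C - O), versus the reduced weighting, C - O (hypothetical inputs).
lee = expected_noncentrality(100, 100, delta=0.1, p=0.3, offspring_scale=2.0)
reduced = expected_noncentrality(100, 100, delta=0.1, p=0.3, offspring_scale=1.0)
```

Under these assumptions, the weighting that maximizes the noncentrality gives each family a weight proportional to its mean divided by its variance. $C_i - O_i$ and $C_i - S_i$ share the same ratio, $\delta/(4pq)$, so the unscaled offspring difference is already on the right footing, whereas $2(C_i - O_i)$ doubles the offspring weight. With equal numbers of spouse and offspring families, the noncentrality rises by a factor of 9/8 when moving from $2(C_i - O_i)$ to $C_i - O_i$.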

Likewise, parental controls should provide about as much power as spousal controls, because a single diploid genotype (i.e., one hypothetical human being) is provided by the parents, namely, the pair of alleles that were not transmitted to the case. This can be thought of as the hypothetical perfect complement to the case—a sibling who would have inherited the other allele from each parent at each locus. This intuition too is supported by Lee’s simulations, where the number of families required for a case-parents analysis is similar to the number of families required for the case-spouse analysis. (This result is also in line with the simulations provided by Wilcox et al. (7)).
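The claim that parental controls carry about as much information as a spouse can be checked by simulation. The sketch below is a hypothetical Monte Carlo under the null, assuming Hardy-Weinberg proportions, random mating, an allele frequency of 0.3, and a spouse drawn independently from the same population; it compares the variance of $2C_i - (M_i + F_i)$, the case minus its complement sibling, with that of $C_i - S_i$. The function name is illustrative.

```python
import random
from statistics import pvariance
from typing import List, Tuple


def simulate_null_differences(n: int, p: float,
                              seed: int = 1) -> Tuple[List[int], List[int]]:
    """Simulate n case families when the allele is unrelated to risk.

    Returns the lists of 2C - (M + F) and C - S, assuming Hardy-Weinberg
    proportions, random mating, and a spouse drawn from the same population.
    """
    rng = random.Random(seed)

    def allele() -> int:
        # One allele drawn with variant-allele frequency p.
        return 1 if rng.random() < p else 0

    d_parents: List[int] = []
    d_spouse: List[int] = []
    for _ in range(n):
        mother = (allele(), allele())
        father = (allele(), allele())
        # Under the null, the case inherits a random allele from each parent.
        case = rng.choice(mother) + rng.choice(father)
        spouse = allele() + allele()
        complement = sum(mother) + sum(father) - case  # nontransmitted alleles
        d_parents.append(case - complement)             # equals 2C - (M + F)
        d_spouse.append(case - spouse)                  # C - S
    return d_parents, d_spouse


d_par, d_sp = simulate_null_differences(200_000, p=0.3)
print(pvariance(d_par), pvariance(d_sp), 4 * 0.3 * 0.7)
```

Under these assumptions both variances should be close to $4pq = 0.84$: each heterozygous parent contributes $\pm 1$ to the complement difference, and the spouse difference is the difference of two independent genotypes.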

In an actual study, case families often offer options beyond using either the spouse or the offspring as controls: one may sometimes be able to study the two parents, and sometimes unaffected siblings of the case (11). Lee has accordingly also proposed a more general strategy, which we should perhaps call a disequilibrium test rather than a mating disequilibrium test. When both parents are available, $D_i$ becomes the difference between the case allele count and that of the case’s hypothetical sibling complement, constructed from the two nontransmitted parental alleles. This leads to $D_i = C_i - (M_i + F_i - C_i) = 2C_i - (M_i + F_i)$, where $M_i$ ($F_i$) is the number of copies of the variant allele carried by the case’s mother (father). For families without parents but with available unaffected siblings, if the siblings have a mean allele count of $B_i$, we use $D_i = C_i - B_i$. Like the $D_i$ based on the spouse or the offspring, both of these new $D_i$ have expectation 0 under the null hypothesis that the allele under study is unrelated to risk, so they can simply be folded into the calculation of the disequilibrium test statistic.

Lee proposes the following hierarchy for selecting a family-based control to use in constructing $D_i$ for a given case (except that I here impose the reduced weighting suggested above for the offspring): We calculate $D_i$ as $C_i$ minus the following:

1. Allele counts for the hypothetical complement sibling, that is, the parental alleles not transmitted to the case; if parents are not available, then use

2. Average of the counts of unaffected sibling(s); if not, then use

3. Allele count for the spouse; if not, then use

4. Average of the allele counts for the offspring.
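The hierarchy above can be written as a small dispatch function. This is a sketch under assumptions: variant-allele counts are 0, 1, or 2; a family with only one genotyped parent falls through to the next option (the commentary does not say how such families are handled); the offspring option uses the reduced weighting $C_i - O_i$, with $O_i$ the average offspring count; and the `Family` class and `hierarchical_d` function are hypothetical names. The check on the complement catches some, but not all, Mendelian inconsistencies.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List, Optional


@dataclass
class Family:
    """Variant-allele counts (0, 1, 2) for a case and whichever relatives were genotyped."""
    case: int
    mother: Optional[int] = None
    father: Optional[int] = None
    siblings: List[int] = field(default_factory=list)   # unaffected siblings
    spouse: Optional[int] = None
    offspring: List[int] = field(default_factory=list)


def hierarchical_d(fam: Family) -> Optional[float]:
    """D = C minus the first available control in the order 1-4.

    Returns None when no family-based control is available.
    """
    if fam.mother is not None and fam.father is not None:
        complement = fam.mother + fam.father - fam.case     # nontransmitted alleles
        if not 0 <= complement <= 2:                         # partial Mendelian check
            raise ValueError("Mendelian inconsistency between case and parents")
        return float(fam.case - complement)                  # = 2C - (M + F)
    if fam.siblings:
        return float(fam.case - mean(fam.siblings))          # C - B
    if fam.spouse is not None:
        return float(fam.case - fam.spouse)                  # C - S
    if fam.offspring:
        return float(fam.case - mean(fam.offspring))         # reduced weighting: C - O
    return None
```

Families for which `hierarchical_d` returns None contribute nothing to the test statistic.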

I believe there is an even more efficient strategy for a disequilibrium test. Consider that the siblings are in effect serving only as surrogates for the independent parental information (the parental alleles not transmitted to the case) and the offspring are serving as surrogates for the spouse. In fact, if we had infinitely many unaffected siblings, the only information they could be adding is the exact genotype for the parents. Thus, once we have the parents, those siblings provide no additional genetic information. Similarly, once we have the spouse, the offspring provide no additional genetic information.

Because the unaffected siblings serve only to provide information about the parents, and the offspring serve only to provide information about the spouse, we should be able to improve the power of the disequilibrium test by subtracting from $C_i$ the average of two controls: result number 1 or number 2 (but not both), and result number 3 or number 4 (but not both). Parents (as in number 1) are preferred over siblings when both are available, and the spouse (as in number 3) is preferred over offspring when both are available. By averaging over more than one control in families where more than one is available, this modification should improve the power of the disequilibrium test by reducing the variance of the component $D_i$.
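The averaging proposal can be sketched as follows, together with a hypothetical null simulation comparing the variance of the averaged difference with that of $C_i - S_i$. The sketch assumes Hardy-Weinberg proportions, random mating, Mendelian transmission, and an allele frequency of 0.3; the names `averaged_d` and `null_variance_ratio` are illustrative. Under these assumptions the complement difference and the spouse difference each have variance $4pq$ and covariance $2pq$, so their average has variance $3pq$, and the simulated ratio should come out near 0.75.

```python
import random
from statistics import mean, pvariance
from typing import List, Optional, Sequence


def averaged_d(case: int, mother: Optional[int] = None, father: Optional[int] = None,
               siblings: Sequence[int] = (), spouse: Optional[int] = None,
               offspring: Sequence[int] = ()) -> Optional[float]:
    """C minus the average of one parental-side and one spouse-side control."""
    controls: List[float] = []
    # Parental side: the parents' complement is preferred over unaffected siblings.
    if mother is not None and father is not None:
        controls.append(mother + father - case)
    elif siblings:
        controls.append(mean(siblings))
    # Spouse side: the spouse is preferred over the offspring.
    if spouse is not None:
        controls.append(spouse)
    elif offspring:
        controls.append(mean(offspring))
    if not controls:
        return None
    return case - sum(controls) / len(controls)


def null_variance_ratio(n: int, p: float, seed: int = 2) -> float:
    """Var(averaged D) / Var(C - S) for families with parents and spouse, under the null."""
    rng = random.Random(seed)

    def allele() -> int:
        return 1 if rng.random() < p else 0

    d_avg, d_sp = [], []
    for _ in range(n):
        mother, father = (allele(), allele()), (allele(), allele())
        case = rng.choice(mother) + rng.choice(father)   # random transmission
        spouse = allele() + allele()
        d_avg.append(averaged_d(case, sum(mother), sum(father), spouse=spouse))
        d_sp.append(case - spouse)
    return pvariance(d_avg) / pvariance(d_sp)
```

For example, `null_variance_ratio(200_000, 0.3)` should be close to 0.75 under these assumptions, which is the variance reduction the averaging is meant to deliver.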

More refined (and complicated) strategies might weight the $D_i$ according to their estimated variances. Teng and Risch (12) considered families with $r$ affected and $s$ unaffected offspring and regarded such sibships as providing inference about the genotypes of the unobserved parents, so that their $D_i$ is the average allele count for affected offspring minus the estimated average allele count for the unobserved parents. Depending on the sizes of $r$ and $s$ and the genotypes observed, one might have a great deal of information about the two missing parents and, hence, a lower variance for the corresponding $D_i$. Perhaps a more powerful test can be developed by weighting each family’s $D_i$ by the inverse of its variance, if reasonable variance estimates can be found.
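An inverse-variance-weighted version of the statistic can be sketched by replacing each $D_i$ with $D_i/v_i$, where $v_i$ is an estimate of $\operatorname{Var}(D_i)$. How to obtain the $v_i$ (for example, from a sibship model like that of Teng and Risch) is left open here, so the variance estimates are hypothetical inputs, and the function name is illustrative. With equal $v_i$ the statistic reduces to $(\sum_i D_i)^2 / \sum_i D_i^2$.

```python
import math
from typing import Sequence, Tuple


def weighted_disequilibrium_test(ds: Sequence[float],
                                 variances: Sequence[float]) -> Tuple[float, float]:
    """Inverse-variance-weighted analogue of T = (sum D)^2 / sum D^2.

    ds: per-family differences D_i; variances: hypothetical estimates v_i of
    Var(D_i). Uses weights w_i = 1 / v_i and returns (statistic, p-value)
    referred to a chi-square with 1 df.
    """
    if len(ds) != len(variances):
        raise ValueError("ds and variances must have the same length")
    if any(v <= 0 for v in variances):
        raise ValueError("variance estimates must be positive")
    weighted = [d / v for d, v in zip(ds, variances)]
    num = sum(weighted) ** 2
    den = sum(x * x for x in weighted)
    if den == 0:
        raise ValueError("all differences are zero; statistic undefined")
    t = num / den
    return t, math.erfc(math.sqrt(t / 2))
```

Using the squared weighted differences in the denominator keeps the statistic self-normalizing, as in the unweighted version, so a common scale error in the $v_i$ cancels.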

Other approaches for improving the power of the disequilibrium test are worth exploring. Lee’s method is intended for studies in which the whole genome is scanned for single nucleotide polymorphisms, and hence his alpha level is very small, $10^{-7}$, to allow for multiple testing. It has recently been appreciated that linkage disequilibrium is preserved across long stretches of the chromosome (13), and that the number of haplotypes is much smaller than would be predicted from the number of single nucleotide polymorphisms. Thus, it is likely that a much smaller number of sentinel single nucleotide polymorphisms will be identified that can capture most of the variability across the human genome. Testing fewer markers would permit a less stringent alpha level, so the sample sizes provided in Lee’s tables may be larger than what will actually be required, once we know how to select single nucleotide polymorphisms that can represent the existing haplotypes. With genetic analyses becoming more rapid and affordable, epidemiologists will soon be able to search the whole genome for susceptibility loci for complex diseases.
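The link between the number of tests and the required sample size can be illustrated with a normal-approximation calculation. Under a Bonferroni-style correction, fewer sentinel markers permit a larger per-test alpha; for a 1-df chi-square test, the required noncentrality is approximately $(z_{1-\alpha/2} + z_{1-\beta})^2$, and the number of families scales with it when each family’s contribution is held fixed. Only the $10^{-7}$ level comes from the text; the other alpha value and the function names are hypothetical.

```python
from statistics import NormalDist


def required_noncentrality(alpha: float, power: float) -> float:
    """Approximate noncentrality a 1-df chi-square test needs for the given power.

    Uses (z_{1 - alpha/2} + z_{power})^2, ignoring the negligible opposite tail.
    """
    z = NormalDist()
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2


def relative_sample_size(alpha_new: float, alpha_old: float, power: float = 0.8) -> float:
    """Families needed at alpha_new relative to alpha_old, holding the per-family
    noncentrality fixed (required families scale with required noncentrality)."""
    return required_noncentrality(alpha_new, power) / required_noncentrality(alpha_old, power)
```

For example, `relative_sample_size(5e-7, 1e-7)` compares a hypothetical five-fold reduction in the number of tests with the $10^{-7}$ level; the ratio falls below 1, though the saving is modest because critical values grow slowly as alpha shrinks.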


    ACKNOWLEDGMENTS
 
Drs. Norm Kaplan and David Umbach provided useful comments on the manuscript.


    NOTES
 
Correspondence to Dr. Clarice Weinberg, National Institute of Environmental Health Sciences, P.O. Box 12233, Research Triangle Park, NC 27709 (weinberg@niehs.nih.gov).


    REFERENCES

  1. Spielman R, McGinnis R, Ewens W. Transmission test for linkage disequilibrium: the insulin gene region and insulin-dependent diabetes mellitus (IDDM). Am J Hum Genet 1993;52:506–16.
  2. Self S, Longton G, Kopecky K, et al. On estimating HLA-disease association with application to a study of aplastic anemia. Biometrics 1991;47:53–61.
  3. Weinberg C, Wilcox A, Lie R. A log-linear approach to case-parent triad data: assessing effects of disease genes that act directly or through maternal effects, and may be subject to parental imprinting. Am J Hum Genet 1998;62:969–78.
  4. Umbach D, Weinberg C. Using case-parent triads to study joint effects of genotype and exposure. Am J Hum Genet 2000;66:251–61.
  5. Weinberg CR. Allowing for missing parents in genetic studies of case-parent triads. Am J Hum Genet 1999;64:1186–93.
  6. Allen AS, Rathouz PJ, Satten GA. Informative missingness in genetic association studies: case-parent designs. Am J Hum Genet 2003;72:671–80.
  7. Wilcox AJ, Weinberg CR, Lie RT. Distinguishing the effects of maternal and offspring genes through studies of "case-parent triads." Am J Epidemiol 1998;148:893–901.
  8. Weinberg CR. Studying parents and grandparents to assess genetic contributions to early-onset disease. Am J Hum Genet 2003;72:438–47.
  9. Horvath S, Laird N. A discordant-sibship test for disequilibrium and linkage: no need for parental data. Am J Hum Genet 1999;63:1886–97.
  10. Lee WC. Genetic association studies of adult-onset diseases using the case-spouse and case-offspring designs. Am J Epidemiol 2003;158:1023–32.
  11. Gauderman WJ, Witte JS, Thomas DC. Family-based association studies. J Natl Cancer Inst Monogr 1999;26:31–7.
  12. Teng J, Risch N. The relative power of family-based and case-control designs for linkage disequilibrium studies of complex human diseases. II. Individual genotyping. Genome Res 1999;9:234–41.
  13. Gabriel S, Schaffner SF, Nguyen H, et al. The structure of haplotype blocks in the human genome. Science 2002;296:2225–9.

Related articles in Am. J. Epidemiol.:

Genetic Association Studies of Adult-Onset Diseases Using the Case-Spouse and Case-Offspring Designs
Wen-Chung Lee
Am. J. Epidemiol. 2003 158: 1023-1032.

Lee Responds to "Making the Most of Genotype Asymmetries"
Wen-Chung Lee
Am. J. Epidemiol. 2003 158: 1036-1038.