RE: "ASYMPTOTIC BIAS AND EFFICIENCY IN CASE-CONTROL STUDIES OF CANDIDATE GENES AND GENE-ENVIRONMENT INTERACTIONS: BASIC FAMILY DESIGNS"

Clarice Weinberg

Biostatistics Branch, National Institute of Environmental Health Sciences, Research Triangle Park, NC 27709

A paper in the Journal by Witte et al. (1) considered family-based designs for studying genetic relative risks and gene-by-environment interactions. The authors assessed three options for controls: unaffected siblings of cases, unaffected cousins of cases, and "pseudosiblings" of cases. Pseudosiblings are the three other genetically possible siblings the case proband could equally likely have had, based on the genotypes of the proband's two parents. Witte et al. provided statistical relative efficiencies for these designs compared with a population-based case-control design (1). Their results should be valuable to epidemiologists planning a study involving genetic factors and possible gene-by-environment interaction.
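As a minimal sketch of how the pseudosibling set is formed, the following enumerates the four equally likely offspring genotypes for a pair of parents and removes the proband's own genotype. It assumes a single diallelic locus with Mendelian transmission; the function names and the allele labels "A" and "a" are hypothetical and chosen only for illustration.

```python
from itertools import product


def possible_offspring(mother, father):
    """Return the four equally likely offspring genotypes.

    Each parent is a pair of alleles; each offspring genotype is
    returned as a sorted allele pair so that ("a", "A") and
    ("A", "a") compare equal.
    """
    return [tuple(sorted((m, f))) for m, f in product(mother, father)]


def pseudosibs(case, mother, father):
    """Return the three pseudosibling genotypes for a case proband.

    One copy of the case's genotype is removed from the four
    possible offspring; list.remove raises ValueError if the case
    genotype is incompatible with the parental genotypes.
    """
    offspring = possible_offspring(mother, father)
    offspring.remove(tuple(sorted(case)))
    return offspring


if __name__ == "__main__":
    # Both parents heterozygous, proband heterozygous (hypothetical family).
    print(pseudosibs(("A", "a"), ("A", "a"), ("A", "a")))
```

Note that a pseudosibling may carry the same genotype as the proband, as happens here when both parents are heterozygous; what is removed is one of the four transmission outcomes, not every copy of the genotype.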

Some readers may be unfamiliar with the pseudosib design. With this approach, cases and their parents are sampled and genotyped, and one affected proband per family is enrolled. Other authors have referred to this as the case-parent triad (or trio) or case-parents design (2–5). In my view, the latter terminologies are preferable on the grounds that "pseudosib" really refers to a method of analysis rather than the design per se.

Witte et al. assert that the case-parents pseudosib design yields biased estimates of the genetic relative risks (1), except when the disease is rare. For example, they describe a scenario in which the true genetic relative risk associated with homozygosity is 20 while the asymptotic estimate is 7.4 (1). Although they propose a method for correcting this bias, their correction would seem to require knowledge of the family-specific baseline risks, which generally are not knowable. But my larger point is that such a correction is not needed.

No correction is necessary because there is no asymptotic bias. Usually, when cases are matched to controls and conditional logistic modeling is used, the risk parameters that can be estimated are the odds ratios. Presenting the case-parents design as a pseudosib design and analyzing it by using conditional logistic regression seems to have raised the unfortunate expectation that one must be estimating odds ratios.

However, when a case-parents design is used, use of conditional logistic regression is a computational gimmick that allows estimation of the same risk parameters that would be estimated via a log-linear model, that is, the relative risks. The case-parents design does not permit odds ratios to be estimated, only relative risks. The parameters listed in table 1 of Witte et al. (1) under the heading "RR" are in fact odds ratios, not relative risks, and the "expected" values being estimated, as shown in the body of the table, are in fact the corresponding relative risks. For the example just cited, 20 is the odds ratio corresponding to the relative risk of 7.4. (This connection is easily verified algebraically by using the parameters given in table 1.) Thus, the authors should not have concluded that there is bias for the case-parents design when the disease is not rare. The sibling-control design (and the cousin-control design) has the symmetric deficiency, in that it allows the within-family odds ratios to be estimated but not the corresponding relative risks. Seen in this light, none of the three designs yields estimates that are asymptotically biased: they simply permit estimation of different parameters. Of course, if the disease is rare in all families (unlike some of the scenarios considered by Witte et al.), such a distinction would be unimportant. However, the explanation offered by Witte et al. for the discrepancy is misleading, and their use of "bias" is unfortunate.
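The following sketch illustrates numerically why the genotype distribution of affected offspring, conditional on parental genotypes, identifies risk ratios rather than odds ratios even when the disease is common. It assumes Mendelian transmission and that disease risk depends on the offspring's own genotype; the penetrance values are hypothetical and are not the parameters of table 1 in Witte et al., and the function names are introduced here only for illustration.

```python
def risk_ratio(risk, reference_risk):
    """Ratio of two absolute risks."""
    return risk / reference_risk


def odds_ratio(risk, reference_risk):
    """Ratio of the odds implied by two absolute risks."""
    return (risk / (1 - risk)) / (reference_risk / (1 - reference_risk))


def case_genotype_probs(penetrance, mendelian):
    """P(genotype of affected child | parental genotypes), by Bayes' rule.

    penetrance: absolute disease risk for each genotype.
    mendelian:  probability of each genotype given the parents.
    """
    weights = {g: penetrance[g] * mendelian[g] for g in mendelian}
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}


# Hypothetical risks by number of variant alleles (a common disease).
penetrance = {0: 0.05, 1: 0.10, 2: 0.40}
# Offspring genotype distribution when both parents are heterozygous.
mendelian = {0: 0.25, 1: 0.50, 2: 0.25}

probs = case_genotype_probs(penetrance, mendelian)
rr_homozygous = risk_ratio(penetrance[2], penetrance[0])
or_homozygous = odds_ratio(penetrance[2], penetrance[0])

# Genotypes 0 and 2 are equally likely under Mendelian transmission,
# so the ratio of their frequencies among cases equals the risk ratio.
ratio_among_cases = probs[2] / probs[0]

if __name__ == "__main__":
    print(rr_homozygous, or_homozygous, ratio_among_cases)
```

Under these assumed risks the ratio among cases matches the risk ratio and not the larger odds ratio, which is the distinction drawn in the paragraph above.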

REFERENCES

  1. Witte JS, Gauderman WJ, Thomas DC. Asymptotic bias and efficiency in case-control studies of candidate genes and gene-environment interactions: basic family designs. Am J Epidemiol 1999;149:693–705.
  2. Santos J, Schaid D, Perez-Bravo F, et al. Applicability of the case-parent design in the etiological research of type I diabetes in Chile and other genetically mixed populations. Diabetes Res Clin Pract 1999;43:143–6.
  3. Geller B, Cook EH Jr. Serotonin transporter gene (HTTLPR) is not in linkage disequilibrium with prepubertal and early adolescent bipolarity. Biol Psychiatry 1999;45:1230–3.
  4. Weinberg CR, Wilcox AJ, Lie RT. A log-linear approach to case-parent triad data: assessing effects of disease genes that act either directly or through maternal effects and that may be subject to parental imprinting. Am J Hum Genet 1998;62:969–78.
  5. Weinberg CR. Allowing for missing parents in genetic studies of case-parent triads. Am J Hum Genet 1999;64:1186–93.

 

THE AUTHORS REPLY

John S. Witte, W. James Gauderman and Duncan C. Thomas

Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH 44109-1998
Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90033

We commend Dr. Weinberg for pointing out that the case-sibling and case-pseudosib (or case-parents) designs estimate different parameters: the odds ratio for the former, the risk ratio for the latter (1). We have no fundamental difference of opinion about this point. For comparison's sake across different study designs, we focused on estimating a single parameter and chose the odds ratio for this purpose (2). Of course, the relation between the two population parameters depends on the background disease frequency, irrespective of the study design used to estimate it, and the point of our Appendix was to show how the odds ratio could be estimated by using the case-pseudosib design, given knowledge of the population rate.
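The dependence of the odds ratio and risk ratio on background disease frequency can be sketched with the standard two-group relation below. This is not the correction described in the Appendix of reference 2; it assumes a single known risk in the reference genotype group, and the function names and numerical inputs are hypothetical.

```python
def rr_to_or(rr, baseline_risk):
    """Convert a risk ratio to an odds ratio for a given reference risk."""
    exposed_risk = rr * baseline_risk
    if not 0 < baseline_risk < 1 or not 0 < exposed_risk < 1:
        raise ValueError("risks must lie strictly between 0 and 1")
    exposed_odds = exposed_risk / (1 - exposed_risk)
    baseline_odds = baseline_risk / (1 - baseline_risk)
    return exposed_odds / baseline_odds


def or_to_rr(odds_ratio, baseline_risk):
    """Convert an odds ratio to a risk ratio for a given reference risk."""
    if not 0 < baseline_risk < 1:
        raise ValueError("baseline risk must lie strictly between 0 and 1")
    return odds_ratio / (1 - baseline_risk + baseline_risk * odds_ratio)


if __name__ == "__main__":
    # The two measures nearly coincide when the reference risk is small
    # and diverge as it grows (illustrative values only).
    for baseline_risk in (0.001, 0.01, 0.1):
        print(baseline_risk, or_to_rr(20.0, baseline_risk))
```

The two functions are inverses of each other for a fixed reference risk, so either parameter can be recovered from the other only when that risk is known.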

It is worth noting that the case-sibling and case-pseudosib designs refer to slightly different source populations. The case-pseudosib design requires only a representative series of cases, with no restriction on the disease status of other family members (other than that parents be available for genotyping). Under the assumption that availability of parents is unrelated to genotype, the risk ratio (or odds ratio using the correction method described in our Appendix (2)) estimated by using this design is the same as would be estimated by using population controls. On the other hand, the case-sibling design also requires the availability of an unaffected sibling; the odds ratio estimated by using this design is thus strictly referable only to the population of cases with unaffected siblings. If the odds ratio is truly homogeneous across the population, then this distinction is of no consequence. Refer to Kraft and Thomas (3) for further discussion of the implications of heterogeneity in baseline risks and odds ratios.

As noted by Dr. Weinberg (1), the terms "case-parent triad" or simply "case-parents" provide a concise summary of the design, in the sense of who is sampled and genotyped. However, we still prefer "case-pseudosib" or even "case-pseudocontrol," as previously suggested (4, 5). This terminology clarifies that it is not the parents' genotypes per se that form the "controls" but rather the set of sibling genotypes they could have transmitted. In addition, it emphasizes that the comparison is against the theoretical distribution of potential sibling genotypes, that is, what would be obtained if an essentially infinite number of siblings were available (assuming Mendelian inheritance and no association between genotype and survival).

Finally, we thank Dr. Weinberg for pointing out to us the additional subtleties that arise with either design when there are more than two siblings in a family and the gene under study is not the causal factor but only in linkage disequilibrium with the real causal gene (C. Weinberg, personal communication, 1999). In this situation, neither conditional logistic regression for n:m matched case-control studies (in the case-sibling design with n affected and m unaffected persons in a sibship) nor treatment of each affected person as a separate 1:3 matched case-pseudosib comparison is valid; both will tend to lead to liberal significance tests and to confidence intervals that are too narrow. There has been increasing discussion of this problem in the genetics literature recently. Siegmund et al. (6) provide an overview of this literature and a discussion of the use of robust variance estimators that allow for this dependency within sibships.

REFERENCES

  1. Weinberg C. Re: "Asymptotic bias and efficiency in case-control studies of candidate genes and gene-environment interactions: basic family designs." (Letter). Am J Epidemiol 2000;152:689–90.
  2. Witte JS, Gauderman WJ, Thomas DC. Asymptotic bias and efficiency in case-control studies of candidate genes and gene-environment interactions: basic family designs. Am J Epidemiol 1999;149:693–705.
  3. Kraft P, Thomas DC. Bias and efficiency in family-matched gene-characterization studies: conditional, prospective, retrospective, and joint likelihoods. Am J Hum Genet 2000;66:1119–31.
  4. Graham J, Kockum I, Breslow N, et al. A comparison of three statistical models for IDDM associations with HLA. Tissue Antigens 1996;48:1–14.
  5. Greenland S. A unified approach to the analysis of case-distribution (case-only) studies. Stat Med 1999;18:1–15.
  6. Siegmund KD, Langholz B, Kraft P, et al. Testing linkage disequilibrium in sibships. Am J Hum Genet 2000;67:244–8.