Invited Commentary: On Studying the Joint Effects of Candidate Genes and Exposures

David M. Umbach

From the Biostatistics Branch, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, NC.


    INTRODUCTION
 
For a long time, epidemiologists have recognized that human disease arises from the interplay of environmental exposures and host susceptibilities. With recent advances in molecular biology, assessment of genetic contributions to susceptibility has progressed from indirect measures based on family history to direct measures of an individual's genotype at particular loci. With a draft sequence for the entire human genome in hand, medical science is poised for rapid advances in the study of how genetic susceptibility modulates risk from environmental exposures.

The goal of gene-environment studies in epidemiology is to learn how the risk of a disease changes as a joint function of genotype and exposure. Such studies promise new insights into etiology. They also promise the eventual capability to tailor interventions more precisely, whether at a clinical level, where the therapeutic agent prescribed or its dose may be chosen in light of an individual's genetic makeup, or at a public health level, where programs may be targeted at high-risk subpopulations. Of course, the study of candidate genes and exposures presents methodological challenges. This commentary reviews some of these issues and indicates how successfully the report by Wang et al. (1) has addressed them. These authors study the interrelation of benzene exposure, maternal polymorphisms in the CYP1A1 and GSTT1 genes, and length of gestation and find that low-dose maternal benzene exposure is associated with shortened pregnancy among a genetically defined subset of mothers.


    CONFOUNDING
 
Confounding arises in some novel ways in gene-environment studies. One way is through unrecognized linkage, when the candidate locus is closely linked to a susceptibility locus. Whether polymorphisms at the candidate locus are involved in the disease process or not, the genotype at a nearby susceptibility locus will act as a confounder, and the estimated effect of a variant at the candidate locus will be biased. Wang et al. (1) make readers aware of this potential difficulty. Better knowledge of the human genome will eventually reduce this problem.

Another potential source of confounding, particularly in studies of perinatal outcomes, is to measure the mother's genotype when the child's genotype is actually the relevant one (or vice versa). The basic biology of allele transmission ensures that a mother and child share one allele at each locus so that their genotypes are correlated. Consequently, the mother's genotype and the child's genotype at the same locus may each be viewed as a potential confounder of the other. In an analysis that uses the mother's genotype at the susceptibility locus when the child's alone is relevant, estimates of the risk associated with a variant allele can be severely biased (2).
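The size of the mother-child genotype correlation can be illustrated with a short calculation. The sketch below is not taken from the cited work; it assumes a diallelic locus, random mating, Mendelian transmission, and genotypes coded as counts of the variant allele, and the function name is hypothetical. Under those assumptions it enumerates every combination of maternal and paternal alleles and computes the correlation exactly.

```python
from fractions import Fraction
from itertools import product


def mother_child_genotype_correlation(p):
    """Exact correlation between maternal and child variant-allele counts.

    Assumes a diallelic locus with variant-allele frequency p (0 < p < 1),
    random mating, and Mendelian transmission. Illustrative only.
    """
    p = Fraction(p)
    allele_prob = {1: p, 0: 1 - p}
    joint = {}  # (mother's allele count, child's allele count) -> probability

    # Enumerate the mother's two alleles and the allele contributed by the father.
    for m1, m2, father in product((0, 1), repeat=3):
        base = allele_prob[m1] * allele_prob[m2] * allele_prob[father]
        # Each maternal allele is transmitted with probability 1/2.
        for transmitted in (m1, m2):
            key = (m1 + m2, transmitted + father)
            joint[key] = joint.get(key, 0) + base * Fraction(1, 2)

    mean_m = sum(pr * m for (m, c), pr in joint.items())
    mean_c = sum(pr * c for (m, c), pr in joint.items())
    cov = sum(pr * (m - mean_m) * (c - mean_c) for (m, c), pr in joint.items())
    var_m = sum(pr * (m - mean_m) ** 2 for (m, c), pr in joint.items())
    var_c = sum(pr * (c - mean_c) ** 2 for (m, c), pr in joint.items())

    # Under these assumptions the two variances coincide, so the correlation
    # cov / sqrt(var_m * var_c) reduces to cov / var_m (no square root needed).
    assert var_m == var_c
    return cov / var_m


if __name__ == "__main__":
    print(mother_child_genotype_correlation(Fraction(1, 5)))
```

Under these assumptions the correlation works out to one half regardless of the allele frequency, which is why an analysis based on the wrong individual's genotype can still show an apparent association.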

The same phenomenon can bias estimates of genotype-exposure interactions. As the mother's tissue encounters the toxicant first, her genotype likely plays a key role in activating or deactivating potentially toxic exposures before they reach the fetus. Yet, without evidence that the toxicant does not cross from the maternal to the child's circulation, ruling out any role for the child's genotype in the same biochemical processes is difficult. Wang et al. (1) point out the existence of an operating cytochrome P-450 enzyme system in the fetoplacental unit but do not raise the question of whose genotype may be most relevant here. A potential role for the child's genotype is difficult to judge, however, without a mechanistic link between benzene exposure and the initiation of parturition.

One goal of studies involving genetic susceptibility and perinatal exposures should be to disentangle the separate contributions of maternal and offspring genotypes. Because the correlation between maternal and offspring genotypes often leads to correlations between the corresponding risk estimates, the relative importance of these two interrelated risk factors (or of their interactions with exposures) may be difficult to assess with commonly used case-control or cohort designs. Specialized designs, such as case-parents designs, can help, at least when disease status (affected vs. unaffected) is the outcome of interest. With a case-parents design and appropriate analysis, the risk estimates associated with the mother's genotype are uncorrelated with those associated with the offspring's genotype (3, 4). Similarly, estimates of the mother's genotype-exposure interactions are uncorrelated with estimates of the offspring's genotype-exposure interactions in case-parents designs (unpublished).

Still another source of potential genetic confounding is hidden population structure. The sampled population may consist of several genetically distinct subpopulations that are incompletely mixed (admixture, population stratification). If those subpopulations differ in both the prevalence of a variant allele at the candidate locus and the prevalence or magnitude of a trait, apparent associations between the allele and trait may simply reflect confounding of the allele's effect by subpopulation identity. When an investigator recognizes relevant subpopulations, for example, ethnic groups, and can correctly classify respondents into them, controlling such admixture-induced confounding is straightforward. Population structure, however, can be subtler than overt ethnic differences. Barriers to gene flow within an apparently homogeneous population could lead to a subpopulation structure that is easily overlooked. Even when admixture is known to exist, identifying relevant subpopulations and accurately assessing an individual's membership therein can be difficult. Since exposure prevalence may also vary among genetically distinct subpopulations, exposure and gene-exposure interaction effects can also be biased by subpopulation structure.
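A small numerical illustration may make the mechanism concrete. The sketch below uses entirely hypothetical numbers: two equal-sized subpopulations that differ in both carrier frequency and trait prevalence, with no effect of the allele inside either one. It compares the crude odds ratio from the pooled table with a Mantel-Haenszel odds ratio stratified on subpopulation; the function names and all frequencies are assumptions chosen for illustration.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a = carriers with trait, b = carriers without trait,
    c = noncarriers with trait, d = noncarriers without trait.
    """
    return (a * d) / (b * c)


def stratum_table(n, carrier_freq, risk):
    """Expected 2x2 counts when carrier status does not affect risk in the stratum."""
    carriers = n * carrier_freq
    noncarriers = n - carriers
    return (
        carriers * risk,
        carriers * (1 - risk),
        noncarriers * risk,
        noncarriers * (1 - risk),
    )


def mantel_haenszel_or(tables):
    """Mantel-Haenszel summary odds ratio across strata."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den


# Hypothetical subpopulations: (size, carrier frequency, trait prevalence).
tables = [
    stratum_table(1000, 0.10, 0.05),
    stratum_table(1000, 0.50, 0.20),
]

# Pooling ignores subpopulation identity: add the tables cell by cell.
crude = tuple(sum(cells) for cells in zip(*tables))
crude_or = odds_ratio(*crude)
mh_or = mantel_haenszel_or(tables)

if __name__ == "__main__":
    print(f"crude OR = {crude_or:.2f}, stratified OR = {mh_or:.2f}")
```

With these made-up inputs the pooled odds ratio is about 1.8 even though the allele has no effect within either subpopulation, and stratifying on subpopulation returns the odds ratio to 1. The stratified analysis is possible only because subpopulation membership is known here, which is the condition the text identifies as hard to meet in practice.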

That confounding from genetic population structure is capable of biasing inference is inescapable; what is less clear is how severe a problem it presents. For diverse populations such as that of the United States, admixture-induced bias may be relatively small for common variants (5). Wang et al. (1) report that all the workers were Chinese but offer no information about the possibility of subtler ethnic distinctions among their subjects. Consequently, the potential for such structure is difficult to evaluate.

Clever study designs can help cope with biases that arise from population structure. The basic idea is to use data from family members of affected individuals in ways that, in essence, stratify possible confounding away. Case-parents designs, mentioned earlier, can eliminate spurious associations in studies of genetic effects alone (6, 7) or in studies of gene-environment interactions (8, 9). Matched case-control designs that use siblings or other relatives of cases as controls can also eliminate admixture-induced confounding (10). Weinberg and Umbach (11) provide a discussion of the pros and cons of various retrospective epidemiologic designs for gene-environment studies of disease incidence. Analogous study designs are applicable to studies of quantitative traits (12–15).
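One familiar analysis for case-parents data is the transmission/disequilibrium test of Spielman et al. (6), which compares how often heterozygous parents transmit the variant allele to an affected child with how often they transmit the other allele. The sketch below shows only the basic McNemar-type statistic; the function name and the counts in the usage line are hypothetical, and the log-linear and interaction analyses cited above are not reproduced here.

```python
import math


def tdt(transmitted, untransmitted):
    """Basic transmission/disequilibrium test statistic.

    transmitted   -- number of heterozygous parents who passed the variant
                     allele to the affected child
    untransmitted -- number of heterozygous parents who passed the other allele

    Returns (chi-square statistic with 1 df, two-sided p value).
    """
    b, c = transmitted, untransmitted
    if b + c == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (b - c) ** 2 / (b + c)
    # Upper-tail probability of a 1-df chi-square variate, via the normal tail.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    statistic, p = tdt(60, 40)
    print(f"chi-square = {statistic:.2f}, p = {p:.4f}")
```

Because each heterozygous parent serves as his or her own control, the comparison is made within families, which is the sense in which such designs stratify population structure away.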

A different complication is that factors that confound interaction effects need not be the same as those that confound exposure effects. Put another way, adjusting for variables that confound exposure will not guarantee that gene-exposure interaction effects will be free from bias. Investigators tend to avoid explicit consideration of factors that may confound gene-exposure interactions. Perhaps this oversight reflects the relatively recent origin of studies in which detection of interaction is a primary goal. As a specialty, we may not have enough experience thinking about the kinds of factors that may confound interaction effects. On the other hand, despite its theoretical possibility, perhaps confounding of interactions rarely occurs in practice or is adequately controlled by adjustment for confounders of exposure or of genotype. Still, if some factor were a confounder of exposure, prudence would dictate checking whether a term for the interaction of that factor with genotype appeared to confound the gene-exposure interaction of interest. Although the authors adjusted for a number of variables related to gestational age, they did not explicitly consider variables that might be related to genotype-benzene interactions. Again, imagining what set of variables might be relevant is problematic without having some mechanism of action in mind. Because the CYP1A1 and GSTT1 enzymes have been related to the metabolic processing of benz[a]pyrene in cigarette smoke, however, one possibility might be to include CYP1A1-passive smoking or GSTT1-passive smoking interaction terms and look for possible changes in the corresponding gene-benzene interaction coefficients. This sort of check seems particularly apt since passive smoking was somewhat associated with benzene exposure among these workers.
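The check suggested above can be written as a comparison of two regression models. The notation here is an assumption introduced for illustration and is not the model fitted by Wang et al.: Y denotes gestational age, G an indicator of the variant genotype, E an indicator of benzene exposure, and S an indicator of passive smoking, with other covariates omitted for brevity.

```latex
\begin{align*}
\text{Model A:}\quad E[Y] &= \beta_0 + \beta_G G + \beta_E E + \beta_{GE}\, G E + \gamma_S S \\
\text{Model B:}\quad E[Y] &= \beta_0' + \beta_G' G + \beta_E' E + \beta_{GE}'\, G E + \gamma_S' S + \gamma_{GS}'\, G S
\end{align*}
```

Model A adjusts for passive smoking as a confounder of exposure only; Model B adds the genotype-by-passive-smoking term. A material difference between the estimates of \beta_{GE} and \beta_{GE}' would suggest that the gene-benzene interaction in Model A was confounded by the gene-passive smoking interaction.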


    BIOLOGIC PLAUSIBILITY
 
Biologic plausibility is an essential element in causal inference, but plausibility of mechanism lies in the eye of the beholder. What mechanisms are regarded as biologically plausible depends on the current state of scientific knowledge and on the interpretation of "plausible" itself (16). While strong causal inferences from observational studies require consistent results from a number of studies, individual investigations are evaluated using a parallel standard. An association seen in a single study and supported by a solid mechanistic explanation is more likely to be regarded as causal than one lacking mechanistic support.

Field studies are undertaken because some previous data or mechanistic rationale suggests that the exposure may be related to the disease outcome. Candidate genes are nominated for study analogously. Some connection between the candidate gene and the disease or the exposure under study is desirable. Polymorphic genes make stronger candidates when the different alleles lead to differential catalytic activity of enzymes or differential expression levels. The higher the biologic plausibility of any mechanisms that implicate the candidate gene in the disease process, the more causally relevant any observed associations become.

Criteria like those, while easy to enunciate, are not always easy to evaluate. Knowledge about functional polymorphisms is restricted to a relatively small number of well-studied genes. The Human Genome Project (http://www.nhgri.nih.gov/HGP/) will facilitate gene discovery as well as the identification of polymorphisms. Additional efforts such as the Environmental Genome Project (http://www.niehs.nih.gov/envgenom/) will evaluate selected genes for the existence, and eventually the function, of polymorphisms. As such efforts proceed, the choice of candidate genes for a given exposure-disease setting will become increasingly enlightened.

Science needs activities that generate hypotheses as well as those that test them, and valuable exploratory studies often have little a priori evidence for genetic or exposure effects. Wang et al. (1) offer no biologic mechanism relating benzene exposure to parturition. The prima facie plausibility of the observed interaction rests on the fact that the candidate genes, CYP1A1 and GSTT1, are involved in the detoxification of organic solvents. The authors are somewhat less definite about direct connections of either gene to benzene. The CYP1A1 variants do not seem to exhibit differential enzyme activity (17) but do, as the authors point out, show differential inducibility. Is benzene a substrate or an inducer for CYP1A1? Substrates for the CYP1A1 enzyme are generally polycyclic molecules such as benz[a]pyrene from cigarette smoke, whereas the phase I metabolism of benzene has been attributed to CYP2E1 (18). Even granting that biologic plausibility is a mercurial concept, the authors might have delved more deeply into mechanisms through which benzene might influence parturition or into the roles of these candidate genes in benzene metabolism to provide a stronger foundation for assessing the biologic plausibility of their results.


    A FINAL REMARK
 
The capacity to genotype individuals at thousands of loci quickly and accurately is available now. Realizing this capability at low per-subject cost is just over the horizon. This burgeoning volume of genetic data represents both an unprecedented opportunity and an unprecedented challenge. At present, gene-environment studies focus on a single exposure and on one or a few candidate loci while adjusting for confounders and covariates. The paper by Wang et al. (1) and this commentary exemplify the reductionist approach usual in analytical sciences. Such an approach may soon be overwhelmed by the sheer volume of data that will be available. Even as we struggle to learn how to cope methodologically with challenges posed by gene-environment studies, the landscape is changing. The near future will be an exciting, and daunting, time for methodological and epidemiologic research.


    NOTES
 
Correspondence to Dr. David M. Umbach, Biostatistics Branch, Mail Drop A3-03, National Institute of Environmental Health Sciences, P.O. Box 12233, Research Triangle Park, NC 27709-2233 (e-mail: umbach@niehs.nih.gov).


    REFERENCES
 

  1. Wang X, Chen D, Niu T, et al. Genetic susceptibility to benzene and shortened gestation: evidence of gene-environment interaction. Am J Epidemiol 2000;152:693–700.
  2. Weinberg CR. What can we learn about genetic risk factors from studies of cases and their parents? Presented at the Bernard G. Greenberg Distinguished Lecture Series, University of North Carolina, Chapel Hill, North Carolina, May 14, 1998.
  3. Weinberg CR, Wilcox AJ, Lie RT. A log-linear approach to case-parent triad data: assessing effects of disease genes that act directly or through maternal effects, and may be subject to parental imprinting. Am J Hum Genet 1998;62:969–78.
  4. Wilcox AJ, Weinberg CR, Lie RT. Distinguishing the effects of maternal and offspring genes through studies of "case-parent triads." Am J Epidemiol 1998;148:893–901.
  5. Wacholder S, Rothman N, Caporaso N. Population stratification in epidemiologic studies of common genetic variants and cancer: quantification of bias. J Natl Cancer Inst 2000;92:1151–8.
  6. Spielman RS, McGinnis RE, Ewens WJ. Transmission test for linkage disequilibrium: the insulin gene region and insulin-dependent diabetes mellitus (IDDM). Am J Hum Genet 1993;52:506–16.
  7. Schaid DJ, Sommer SS. Genotype relative risks: methods for design and analysis of candidate-gene association studies. Am J Hum Genet 1993;53:1114–26.
  8. Schaid DJ. Case-parents design for gene-environment interaction. Genet Epidemiol 1999;16:261–73.
  9. Umbach DM, Weinberg CR. The use of case-parent triads to study joint effects of genotype and exposure. Am J Hum Genet 2000;66:251–61.
  10. Witte JS, Gauderman WJ, Thomas DC. Asymptotic bias and efficiency in case-control studies of candidate genes and gene-environment interactions: basic family designs. Am J Epidemiol 1999;149:693–705.
  11. Weinberg CR, Umbach DM. Choosing a retrospective design to assess joint genetic and environmental contributions to risk. Am J Epidemiol 2000;152:197–203.
  12. Allison DB. Transmission-disequilibrium tests for quantitative traits. Am J Hum Genet 1997;60:676–90.
  13. Rabinowitz D. A transmission-disequilibrium test for quantitative trait loci. Hum Hered 1997;47:342–50.
  14. Fulker DW, Cherny SS, Sham PC, et al. Combined linkage and association sib-pair analysis for quantitative traits. Am J Hum Genet 1999;64:259–67.
  15. van den Oord EJGC. Method to detect genotype-environment interactions for quantitative trait loci in association studies. Am J Epidemiol 1999;150:1179–87.
  16. Weed DL, Hursting SD. Biologic plausibility in causal inference: current method and practice. Am J Epidemiol 1998;147:415–25.
  17. Waterman MR, Guengerich FP. Enzyme regulation. In: Sipes IG, McQueen CA, Gandolfi AJ, eds. Comprehensive toxicology. Vol 3. Biotransformation. New York, NY: Elsevier Science, Ltd, 1997:7–14.
  18. Guengerich FP. Cytochrome P450 enzymes. In: Sipes IG, McQueen CA, Gandolfi AJ, eds. Comprehensive toxicology. Vol 3. Biotransformation. New York, NY: Elsevier Science, Ltd, 1997:37–68.
Received for publication July 7, 2000. Accepted for publication July 11, 2000.