Commentary: Gene-environment interactions: fundamental yet elusive

Paul Burton

Department of Epidemiology and Public Health, University of Leicester, 22–28 Princess Road West, Leicester LE1 6TP, UK. E-mail: pb51{at}le.ac.uk

The article by Luan et al.1 is most timely. A primary aim of a large National Cohort study currently being designed under the auspices of the United Kingdom Medical Research Council, the Wellcome Trust and the Department of Health is to investigate the role of gene-environment interactions in the aetiology of complex diseases. The study currently aims to recruit and track approximately 500 000 middle-aged subjects from across the UK. It represents a major investment in the future of British biomedical science, and it is critical that we properly understand the ‘science’ underpinning the detection and interpretation of gene-environment interactions. However, there are a number of significant problems.2–4 One of these is the dependence of statistical interaction on ‘the scale one chooses to measure effects’.2 If environmental exposure E causes disease D only in the presence of abnormal protein P* encoded by allele A* of gene G, then this represents an interaction between E and G. It is not only present in the underlying biology but, given adequate power, it should also be detectable as a statistical interaction (variation in the estimated association between E and D across genotypes at G), regardless of the scale of analysis. Strachan refers to this all-or-nothing phenomenon as ‘effect concentration’.5 However, if different genotypes at G produce merely quantitative changes in the association between E and a continuous or categorical trait, then the scale of analysis becomes critical in determining whether or not a statistical interaction is detected. For example, interactions can be created or obscured by changing from the natural scale to a logarithmic scale or vice versa.2 Realistically, it is probable that most true biological interactions cause quantitative rather than all-or-nothing effects at the phenotypic level, and Luan et al. concentrate on this quantitative scenario in their article.1
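The scale dependence described above can be made concrete with a small numerical sketch. The trait means below are invented purely for illustration (they are not taken from Luan et al. or from any study) and assume a strictly multiplicative model in which exposure doubles the mean trait value and the risk genotype triples it. The helper name interaction_contrast is likewise hypothetical.

```python
import math

# Hypothetical mean trait values, keyed by (exposure E, genotype G).
# Assumed multiplicative model: E doubles the mean, G triples it.
means = {
    (0, 0): 10.0,  # unexposed, reference genotype
    (1, 0): 20.0,  # exposed, reference genotype
    (0, 1): 30.0,  # unexposed, risk genotype
    (1, 1): 60.0,  # exposed, risk genotype
}


def interaction_contrast(cell_means, transform=lambda value: value):
    """Return the difference between the E effect in G=1 and in G=0,
    with effects measured as differences on the transformed scale."""
    effect_in_g1 = transform(cell_means[(1, 1)]) - transform(cell_means[(0, 1)])
    effect_in_g0 = transform(cell_means[(1, 0)]) - transform(cell_means[(0, 0)])
    return effect_in_g1 - effect_in_g0


natural_scale = interaction_contrast(means)        # (60 - 30) - (20 - 10)
log_scale = interaction_contrast(means, math.log)  # log(2) - log(2)

print(f"interaction contrast, natural scale: {natural_scale:.3f}")
print(f"interaction contrast, log scale:     {log_scale:.3f}")
```

On the natural scale the effect of E differs by 20 units between genotypes, whereas on the logarithmic scale the same data show essentially no interaction. Neither answer is wrong; which one matters depends on the scale on which the biology actually operates.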

Ideally, we need to know more about the underlying biology. If a model properly reflecting the biology (including the true scale of biological action) demonstrates that the association between E and the phenotype does vary with genotype at G, then this is valuable aetiological information. Unfortunately, a lack of biological knowledge currently impairs our ability to interpret statistical interactions which in turn hinders attempts to learn more about the biology: ‘Catch-22’.6

Two scientific positions would seem tenable. On the one hand, it could be argued that epidemiology has already reached its limits.7 Rather than incurring the opportunity cost associated with an expensive cohort study aimed at teasing out subtle gene-environment interactions of uncertain biological relevance, an argument could be made for cheaper studies to detect large genetic effects (taking advantage of ‘Mendelian Randomisation’8), with the aim of furthering our understanding of the biology at a simple level.3,4 Alternatively, it could be argued that a cohort study based on 500 000 middle-aged individuals will become ever more valuable over several decades, both as a direct source of information and as a sampling frame for nested sub-studies. The real question is not whether we have enough biological knowledge to definitively interpret gene-environment interactions now, but whether our state of knowledge several decades hence will enable carefully crafted models, faithfully reflecting interactions in the underlying biology, to be used to make useful causal inferences based upon data generated directly or indirectly from the proposed cohort. If so, the prospective and longitudinal nature of the assessment of both exposures and phenotypes should prove invaluable, and the benefits accruing from the cohort design could substantially exceed its immediate and on-going costs. From this viewpoint, the key issues, currently being worked on by the Protocol Development Committee, are to ensure that the design and conduct of the proposed study are properly thought out, that its aims are realistic given its funding, and that the design is flexible enough to cope with the unpredictable course of bioscience over the next few decades.

Given this background, the paper by Luan et al.1 provides a valuable contribution. Firstly, the power estimates are of direct value in their own right, particularly as most previous papers9,10 have focused on binary rather than quantitative traits. Scientists designing studies with the aim of detecting gene-environment interactions for quantitative traits will be able to make direct, or indirect, use of the documented power profiles. In this regard, it is helpful that the profiles are presented for regression coefficients standardized for dispersion of the trait and of the environmental exposure. Secondly, there is an intrinsic value in performing one's own power calculations from first principles, particularly for large or expensive studies. This paper clearly outlines the methods used, and should enable an equivalent approach to be used in other settings. Thirdly, it is helpful that the authors illustrate how one might use information from previous studies to properly inform such a power calculation. Finally, the paper will be helpful to those designing the UK National Cohort study. Specifically, it will provide provisional power estimates pertinent to the study of gene-environment interaction in the component of the study that will produce results most rapidly: the study of quantitative traits measured at recruitment.
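As a rough illustration of what a power calculation from first principles can look like for a quantitative trait, the following sketch estimates power by simulation. It is not the method of Luan et al.; it is a deliberately simple stand-in that assumes a binary genotype, a standardized normally distributed exposure, unit-variance normal residuals, no genetic main effect, and a z-test comparing the slope of the trait on E between the two genotype groups. The function names and all parameter values are hypothetical.

```python
import math
import random
from statistics import NormalDist


def slope_and_se(x, y):
    """Least-squares slope of y on x and its standard error."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return slope, math.sqrt(rss / (n - 2) / sxx)


def simulated_power(n, carrier_freq, beta_e, beta_gxe,
                    alpha=0.05, n_sims=200, seed=1):
    """Proportion of simulated studies in which the slopes of the trait
    on E differ significantly between genotype groups (two-sided z-test).
    Assumes n is large enough that both groups hold at least 3 subjects."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_sims):
        exposure = {0: [], 1: []}
        trait = {0: [], 1: []}
        for _ in range(n):
            g = 1 if rng.random() < carrier_freq else 0
            e = rng.gauss(0.0, 1.0)  # standardized exposure
            # The E slope is beta_e in non-carriers, beta_e + beta_gxe in carriers.
            y = (beta_e + beta_gxe * g) * e + rng.gauss(0.0, 1.0)
            exposure[g].append(e)
            trait[g].append(y)
        b0, se0 = slope_and_se(exposure[0], trait[0])
        b1, se1 = slope_and_se(exposure[1], trait[1])
        z = (b1 - b0) / math.sqrt(se0 ** 2 + se1 ** 2)
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_sims


if __name__ == "__main__":
    for n in (250, 500, 1000):
        power = simulated_power(n, carrier_freq=0.3, beta_e=0.2, beta_gxe=0.2)
        print(f"n = {n}: estimated power = {power:.2f}")
```

Every input to such a calculation (carrier frequency, effect sizes, residual variance, and above all the scale on which the trait is analysed) is an explicit assumption that a reader can inspect and challenge.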

That said, power calculations for studies of gene-environment interaction should still carry a cautionary ‘Health Warning’. It is the biology, not the statistics, that will ultimately tell us the appropriate analytical scale for any given interaction, and until we know whether it is X, log(X) or 1/X that we should be investigating, the link between biological and statistical interaction must remain tenuous. Furthermore, as our knowledge of complex diseases grows, apparently reasonable power calculations reported for earlier studies look increasingly overoptimistic as new layers of complexity emerge and negative or non-reproducible studies predominate. This is as true for gene-environment interactions as it is for genetic main effects. Nevertheless, one of the distinct advantages of presenting a comprehensive power profile in a form equivalent to that of Luan et al.1 is that it makes all assumptions explicit. An assessor can then decide whether these assumptions are credible or unrealistic given current knowledge.

Notes

Disclaimer: Paul Burton is a member of the Protocol Development Committee for the MRC, Wellcome Trust, DOH National Cohort Study referred to in this commentary. However, the views stated are his own and do not necessarily reflect the views of the Committee or anybody else on the Committee.

References

1 Luan JA, Wong MY, Day NE, Wareham NJ. Sample size determination for studies of gene-environment interaction. Int J Epidemiol 2001;30: 1049–54.

2 Greenland S, Rothman KJ. Concepts of interaction. In: Rothman KJ, Greenland S (eds). Modern Epidemiology. 2nd edn. Philadelphia: Lippincott-Raven, 1998, pp.329–42.

3 Clayton DG. Biostatistics, epidemiology, and the post-genome challenge. Bradford Hill Memorial Lecture. London: Royal Statistical Society, May 2000.

4 Clayton DG, McKeigue P. Epidemiological methods for the study of genes and environmental factors in complex diseases. Lancet 2001 (In press).

5 Strachan DP. The environment and disease: association and causation across three centuries. Bradford Hill Memorial Lecture. London: Royal Statistical Society, April 2001.

6 Heller J. Catch-22. London: Corgi, 1964.

7 Taubes G. Epidemiology faces its limits. Science 1995;269:164–69.

8 Youngman LD, Keavney BD, Palmer A, Clark S, Danesh J, Delephine M et al. Plasma fibrinogen genotypes in 4685 cases of myocardial infarction and in 6002 controls: test of causality by ‘Mendelian Randomisation’. Circulation 2000;102(Suppl.II):31–32.

9 Hwang SJ, Beaty TH, Liang KY, Coresh J, Khoury MJ. Minimum sample-size to detect gene environment interaction in case-control designs. Am J Epidemiol 1994;140:1029–37.

10 Garcia-Closas M, Lubin JH. Power and sample size calculations in case-control studies of gene-environment interaction: comments on different approaches. Am J Epidemiol 1999;149:689–92.




