Imperial College London and University of Torino
Correspondence: Prof. P Vineis, Department of Epidemiology and Public Health, Imperial College London, St Mary's Campus, Norfolk Place, London W2 1PG, UK. E-mail: p.vineis@imperial.ac.uk
Serious mistakes have been made in the past by underestimating the effects of environment and overestimating the effects of genes.1-3 A seminal 1972 paper by Lewontin1 drew researchers' attention to the mistakes made in partitioning nature and nurture. More recently, Kittles and Weiss, considering the definition of race, showed the lack of an obvious correspondence between genotypes and phenotypes.3
Many investigations of gene-environment interactions (GEI) are under way in different parts of the world, and the subject appears as one of the leading items in grant calls from the National Institutes of Health (NIH) and the European Union (EU). Some ongoing studies are extremely large (e.g. the European Prospective Investigation into Cancer and Nutrition [EPIC], UK Biobank). All of them employ similar methods for genotyping, while exposure assessment is extremely variable, being, for example, state-of-the-art for dietary intake in EPIC, but not in other studies or for other exposures. Studying GEI implies measuring both environmental exposures (e.g. to pesticides or environmental tobacco smoke) and the genetic variants that are supposed to modulate the effects of the former. However, there is an asymmetry between the two. Genotyping is in fact much more accurate than the vast majority of methods used to measure environmental exposures. This implies a lower degree of classification error, which in turn means that associations with disease are easier to identify. A further difficulty relates to the rarity of many environmental exposures (which may nevertheless have an important impact on human health), whereas several of the polymorphic alleles under investigation are extremely common (e.g. 40-50% for NAT2 or GSTM1). This, again, increases the probability of detecting an association with genotypes (if one is real), but not with environmental exposures.
Let us consider the example in Table 1, which shows the implications of measurement error for the estimation of relative risks. Classification error is expressed by the correlation coefficient between each assessor and a reference standard (r = 1 means no error, r = 0.9 means a 10% classification error). For three different expected relative risks associating exposure with disease (1.5, 2.0, and 2.5), the Table shows the observed relative risks under different degrees of classification error. For example, a classification error of 10% means that a relative risk of 2.5 drops to 2.3, i.e. little change. With a classification error of 90% (assessor 1), however, even a relative risk of 2.5 becomes 1.1, i.e. undetectable with common epidemiological methods. Unfortunately, while in genotyping we are usually in the situation of assessor 4, implying a small underestimation of risks, in the field of environmental exposures we are more often in the situation of assessor 3 or even assessor 2.
[Table 1: observed relative risks, for expected relative risks of 1.5, 2.0, and 2.5, under increasing degrees of exposure classification error (assessors 1-4)]
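The figures quoted from Table 1 are consistent with a simple attenuation model in which the observed relative risk equals the true relative risk raised to the power of the validity correlation r. The following minimal sketch reproduces the behaviour described above; the four r values are illustrative choices, not necessarily those of the published Table.

```python
# Sketch: attenuation of relative risks under exposure measurement error.
# Assumes the simple model RR_obs = RR_true ** r, where r is the validity
# correlation between an assessor and a reference standard. This model
# reproduces the two figures quoted in the text (2.5 -> 2.3 at r = 0.9,
# and 2.5 -> 1.1 at r = 0.1).

def observed_rr(true_rr: float, r: float) -> float:
    """Relative risk observed when exposure is measured with validity r."""
    return true_rr ** r

for r in (0.1, 0.4, 0.7, 0.9):  # hypothetical assessors 1-4
    row = [round(observed_rr(rr, r), 1) for rr in (1.5, 2.0, 2.5)]
    print(f"r = {r:.1f}: observed RRs for true 1.5 / 2.0 / 2.5 -> {row}")
```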
According to estimates, the common TaqMan genotyping method has 96% sensitivity and 98% specificity, thus allowing little classification error. By contrast, sensitivity in environmental exposure assessment is quite often lower than 70%, and specificity even lower. This situation is not due to a poor state of the art in environmental science, but to objective problems in reconstructing exposures in free-living populations with great variability and changes over time. Exposure assessment is usually difficult, involving recall of complex information (such as diet) or extrapolation from a few points in space and time (such as air pollution data). If we aim to understand the impact of rare exposures (which are nevertheless important for those exposed), the difficulties increase further.
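To see how sensitivity and specificity of this order translate into bias, consider a minimal sketch of non-differential misclassification of a binary exposure. The sensitivity/specificity pairs follow the figures quoted above; the true odds ratio of 2.5 and the 20% exposure prevalence among controls are illustrative assumptions.

```python
# Sketch: bias toward the null from non-differential misclassification of
# a binary exposure, comparing genotyping-grade accuracy (96%/98%) with a
# weak exposure measure (taken here as 70%/70%).

def observed_or(true_or, prev_controls, sens, spec):
    """Odds ratio observed after non-differential exposure misclassification."""
    odds_cases = true_or * prev_controls / (1 - prev_controls)
    prev_cases = odds_cases / (1 + odds_cases)

    def apparent(p):  # apparent exposure prevalence after misclassification
        return sens * p + (1 - spec) * (1 - p)

    p1, p0 = apparent(prev_cases), apparent(prev_controls)
    return (p1 / (1 - p1)) / (p0 / (1 - p0))

print(observed_or(2.5, 0.20, 0.96, 0.98))  # genotyping: ~2.3, little bias
print(observed_or(2.5, 0.20, 0.70, 0.70))  # exposure:   ~1.4, severe bias
```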
The situation in genetics is further complicated for two reasons. First, multiple genetic testing is becoming the norm. Thus, with the usual P-value thresholds (0.05 or 0.01), a large number of false positives will be found. Alternatives to the common P-value thresholds, suitable for multiple genetic testing, have been discussed by Colhoun et al.5 Second, some authors (e.g. ref. 4) hold the view that case-control studies, or even case-only studies, are perfectly adequate for investigating genetic susceptibility. This is correctly based on the assumption that the genotype does not change over time or with the onset of disease, and is not affected by recall bias. However, the weakest aspect of such study designs, at least for diseases with long latency, is again exposure assessment.
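A back-of-the-envelope calculation shows why conventional thresholds mislead under multiple testing. The number of tests, the prior probability of a true association, and the statistical power below are illustrative assumptions, chosen only to make the arithmetic concrete.

```python
# Sketch: expected yield of true and false positives in multiple genetic
# testing at a conventional P-value threshold (the problem discussed by
# Colhoun et al.). All four quantities are illustrative assumptions.

n_tests = 1000   # candidate polymorphisms tested
prior   = 0.01   # fraction of tested variants truly associated
power   = 0.80   # probability of detecting a true association
alpha   = 0.05   # conventional P-value threshold

true_pos  = n_tests * prior * power        # ~8 real findings
false_pos = n_tests * (1 - prior) * alpha  # ~50 false findings
ppv = true_pos / (true_pos + false_pos)

print(f"expected true positives:  {true_pos:.0f}")
print(f"expected false positives: {false_pos:.0f}")
print(f"probability a 'significant' result is real: {ppv:.2f}")  # ~0.14
```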
The calculations above are common sense in epidemiology, but seem to be of little concern to those who plan large studies of gene-environment interactions. It can be predicted that such studies will come up with a number of genetic associations, little affected by classification error, and very few credible environmental associations. The problem is compounded by the fact that the vast majority of genetic polymorphisms are believed to act through interaction with exposures, so that it will also be difficult to make sense of the genetic observations if the environmental component is weak. What is the solution? The only solution I foresee is to strengthen exposure assessment, by investing (much more than many investigators have done until now) in robust and validated exposure assessment procedures. This means that epidemiologists should collaborate not only with geneticists, but also with environmental scientists. Large efforts have been made in the field of nutrition, but not yet in all other areas of human exposure. Goals for activities aimed at improving exposure assessment include: repeated measures, allowing assessment of regression dilution bias (a sketch of the expected gain follows); and validation of novel research methods, for example metabonomics or the identification of specific DNA adducts, to detect signatures left in body fluids by metabolic processes and/or external exposures.
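As an illustration of the repeated-measures goal, the sketch below combines the attenuation model used for Table 1 with the classical Spearman-Brown formula for the reliability of a mean of k replicate measurements; the single-measurement validity of 0.5 is an illustrative assumption.

```python
# Sketch: how repeated exposure measurements mitigate regression dilution
# bias. Assumes classical measurement error, so the reliability of one
# measurement equals the square of its validity correlation, and the
# reliability of the mean of k replicates follows Spearman-Brown.

def validity_of_mean(r_single: float, k: int) -> float:
    """Validity correlation of the mean of k independent replicate measures."""
    rho = r_single ** 2                    # reliability of one measurement
    rho_k = k * rho / (1 + (k - 1) * rho)  # Spearman-Brown prophecy formula
    return rho_k ** 0.5

true_rr = 2.5
for k in (1, 2, 4, 8):
    r_k = validity_of_mean(0.5, k)
    print(f"k = {k}: validity = {r_k:.2f}, observed RR ~ {true_rr ** r_k:.2f}")
```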
On the other hand, the study of genetic influences can be used to shed light on relevant environmental exposures, and the magnitude of their effects in population subgroups. This is the Mendelian randomization paradigm, i.e. the idea that:

the association between a disease and a polymorphism that mimics the biological link between a proposed exposure and disease is not generally susceptible to the reverse causation or confounding that may distort interpretations of conventional observational studies ... Mendelian randomization (the random assortment of genes from parents to offspring that occurs during gamete formation and conception) provides one method for assessing the causal nature of some environmental exposures.6

This would be a different way of looking at the contribution of genes, although a limitation of this approach is that one needs to know much about the functional meaning of genetic variants in order to use them for the interpretation of environmental exposures. So, Mendelian randomization is not exactly an antidote to the concerns I have expressed.
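To make the quoted idea concrete, the following toy simulation shows how a genetic variant that influences only the exposure can recover a causal effect that confounding distorts in the conventional analysis. All effect sizes, the continuous-outcome setting, and the simple Wald-ratio estimator are illustrative assumptions, not the design of any study cited here.

```python
# Sketch: Mendelian randomization in miniature. An unmeasured confounder U
# biases the direct exposure-outcome regression, but the genotype G (which
# affects the outcome only through the exposure) serves as an instrument.

import random

random.seed(1)
n = 100_000
causal_effect = 0.5  # true effect of exposure on outcome

G = [random.choice([0, 1, 2]) for _ in range(n)]              # allele count
U = [random.gauss(0, 1) for _ in range(n)]                    # confounder
X = [0.4 * g + u + random.gauss(0, 1) for g, u in zip(G, U)]  # exposure
Y = [causal_effect * x + u + random.gauss(0, 1) for x, u in zip(X, U)]

def slope(a, b):
    """Ordinary least-squares slope of b regressed on a."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    var = sum((ai - ma) ** 2 for ai in a)
    return cov / var

print(f"conventional (Y on X):     {slope(X, Y):.2f}")  # confounded, ~0.97
print(f"Wald ratio (G->Y / G->X):  {slope(G, Y) / slope(G, X):.2f}")  # ~0.50
```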
References
1 Lewontin RC. The apportionment of human diversity. Evol Biol 1972;6:381-98.
2 Kahn J. Getting the numbers right. Perspect Biol Med 2003;46:473-83.
3 Kittles RA, Weiss KM. Race, ancestry, and genes: implications for defining disease risk. Annu Rev Genomics Hum Genet 2003;4:33-67.
4 Clayton D, McKeigue PM. Epidemiological methods for studying genes and environmental factors in complex diseases. Lancet 2001;358:1356-60.
5 Colhoun HM, McKeigue PM, Davey Smith G. Problems of reporting genetic associations with complex outcomes. Lancet 2003;361:865-72.
6 Davey Smith G, Ebrahim S. Mendelian randomization: can genetic epidemiology contribute to understanding environmental determinants of disease? Int J Epidemiol 2003;32:1-22.
7 Hankinson SE, Manson JE, London SJ, Willett WC, Speizer FE. Laboratory reproducibility of endogenous hormone levels in postmenopausal women. Cancer Epidemiol Biomarkers Prev 1994;3:51-56.