EDITORIAL

Gene–Environment Interactions: How Many False Positives?

Giuseppe Matullo, Marianne Berwick, Paolo Vineis

Affiliations of authors: Foundation and University of Torino, Italy (GM); Division of Epidemiology, Department of Internal Medicine, University of New Mexico, Albuquerque, NM (MB); Department of Epidemiology and Public Health, Imperial College, London, UK (PV)

Correspondence to: Marianne Berwick, PhD, MD, MSC 08 4630, Room 103A, 1 University of New Mexico, Albuquerque, NM 87131 (e-mail: mberwick@salud.unm.edu).

In this issue of the Journal, Hung et al. (1) describe the results of a large study on the association between base excision repair gene polymorphisms and lung cancer risk. Their study illustrates problems typical of investigations of gene–environment interactions; in particular, among the generally negative results, some seemingly noteworthy associations emerge in subgroups of subjects defined by tumor histology or smoking habits. The unique aspect of this study is that the authors have estimated the probability that the associations found are attributable to chance (i.e., false positives). The idea of evaluating new observations in the light of existing evidence is not novel and belongs to at least two traditions: Bayesian statistics and clinical epidemiology. In clinical epidemiology, the "proof" supplied by prior evidence is treated formally: a prior probability is updated into a posterior probability according to Bayes' theorem. This approach has proved extremely useful because (a) it has revealed that diagnosis (and, by extension, clinical and scientific reasoning) develops in a sequence of steps that incorporate, explicitly or implicitly, previous knowledge (2) and (b) it has trained physicians to assign probabilities to the different components of such reasoning, namely, prior knowledge and the "likelihood," that is, the probability of a positive test result in the presence of disease.

Wacholder et al. (3) have proposed these same approaches to assess the validity of the increasing number of associations that are being identified among genetic variants, environmental exposures, and disease (i.e., gene–environment and gene–gene interactions). A large number of such associations have been reported, and even more are expected to emerge in the future. Tens of thousands of single nucleotide polymorphisms (SNPs) are being or will be investigated for their association with cancer, and many of the observed results will be false positives. The challenge is to distinguish the false-positive associations from the true positives. Wacholder et al. (3) proposed a simple Bayesian approach that is based on the estimation of a prior probability and the calculation of a posterior probability. The prior probability can be estimated from results of previous studies, from biochemical or molecular information (e.g., gene expression data) that supports the function of a SNP, or from other types of evidence such as sequence homology (4). The general idea is to weight a new observation with the available prior evidence to derive a posterior probability. Recently, in an extension of the model proposed by Wacholder et al. (3), Ioannidis (5) has incorporated selective reporting and other biases, as well as the fact that specific hypotheses, including gene–disease associations and gene–environment interactions, are usually tested by many teams worldwide. By taking these parameters into account when scrutinizing all the evidence, one can show that probably very few of the identified gene–disease associations and gene–environment interactions are real.
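The arithmetic behind the approach of Wacholder et al. (3) can be sketched in a few lines. The sketch below uses the false-positive report probability in its basic form, alpha(1 - prior) / [alpha(1 - prior) + power x prior]; the function name, the parameter names, and the numerical inputs are illustrative assumptions and are not taken from Hung et al. (1).

```python
def false_positive_report_probability(alpha, power, prior):
    """Probability that a statistically significant finding is a false positive.

    alpha: significance threshold (probability of a positive result under the null)
    power: probability of a positive result when the association is real
    prior: prior probability that the association is real
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    if not (0.0 < alpha <= 1.0 and 0.0 < power <= 1.0):
        raise ValueError("alpha and power must lie in (0, 1]")
    false_positives = alpha * (1.0 - prior)  # null is true, yet the test is positive
    true_positives = power * prior           # association is real and is detected
    return false_positives / (false_positives + true_positives)


if __name__ == "__main__":
    # Hypothetical grid of priors; alpha and power are assumed values.
    for prior in (0.5, 0.1, 0.01, 0.001):
        fprp = false_positive_report_probability(alpha=0.05, power=0.8, prior=prior)
        print(f"prior = {prior:<6} FPRP = {fprp:.3f}")
```

With these assumed inputs, a result that is significant at the 0.05 level has a false-positive probability of about 6% when the prior is 0.5 but about 86% when the prior is 0.01, which is why the choice of prior dominates the conclusion.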

The approach proposed by Wacholder et al. is ingenious and more convincing than other approaches, such as those based on Bonferroni's correction for multiple comparisons or similar statistical methods (6,7). Although the study by Hung et al. is a clever application of the proposal by Wacholder et al., execution of that proposal is not free from problems. The main problem is how best to calculate the prior probabilities. First, there is often no information available to use. Second, any available information may not be easy to evaluate, either because it is indirect (for example, from sequence homology) or because it is contradictory. The latter is very often the case with the epidemiologic evidence of gene–environment interactions (see below). Third, methods to quantify prior probabilities are not available, even when we have good evidence. The attempt made by Hung et al. is a good step forward, but it is still imperfect. The authors have considered and applied five categories of prior probability (from 50% to 0.1%); however, the basis for these numbers and categories is unclear.

A further step forward might consist of using results of well-done meta-analyses or pooled analyses of the available studies. In this case, we have good examples from the clinical literature: a good meta-analysis can allow clinicians to judge how much a new study changes the estimate of efficacy of a drug. For example, if a meta-analysis provides an odds ratio of 0.9 and a new study reports an odds ratio of 0.2, then the real contribution of the new study to the advancement of knowledge will be judged by its ability to change the odds ratio provided by the meta-analysis. This ability will depend on both qualitative considerations (such as potential biases and the response rate) and the size of the study. Meta-analyses can be extremely useful for obtaining prior estimates against which the new evidence can be challenged because they include (a) an overall odds ratio from virtually all the available studies, (b) an evaluation of the quality of the studies (a quality score is usually assigned to each study included in a meta-analysis), (c) a weight that depends on the study size, and (d) a measure of heterogeneity. In a meta-analysis, a large and well-conducted positive study can thus outweigh several small negative studies.
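As a rough illustration of how study size governs the weight of a new result, the following sketch pools odds ratios with a fixed-effect, inverse-variance model and reports Cochran's Q as a heterogeneity measure. It is only one of several pooling methods and is not necessarily the one used for Figure 1; the function names and all study values are hypothetical, and quality scores are not modeled.

```python
import math


def log_or_and_se(odds_ratio, ci_low, ci_high, z=1.96):
    """Recover the log odds ratio and its standard error from a 95% CI."""
    log_or = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z)
    return log_or, se


def pool_fixed_effect(studies):
    """Inverse-variance fixed-effect pooling.

    studies: list of (odds_ratio, ci_low, ci_high) tuples.
    Returns the pooled odds ratio, its 95% CI, and Cochran's Q.
    """
    estimates = [log_or_and_se(*study) for study in studies]
    weights = [1.0 / se ** 2 for _, se in estimates]  # precise studies weigh more
    total_weight = sum(weights)
    pooled_log = sum(w * y for w, (y, _) in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    q = sum(w * (y - pooled_log) ** 2 for w, (y, _) in zip(weights, estimates))
    return {
        "odds_ratio": math.exp(pooled_log),
        "ci_low": math.exp(pooled_log - 1.96 * pooled_se),
        "ci_high": math.exp(pooled_log + 1.96 * pooled_se),
        "cochran_q": q,
    }


if __name__ == "__main__":
    # Hypothetical inputs: an existing pooled estimate treated as one precise
    # "study" (OR 0.9) and a small new study reporting OR 0.2 with a wide CI.
    existing = (0.9, 0.8, 1.0125)
    new_study = (0.2, 0.05, 0.8)
    print(pool_fixed_effect([existing]))
    print(pool_fixed_effect([existing, new_study]))
```

With these invented numbers, the small new study moves the pooled odds ratio only from 0.90 to roughly 0.89, because its wide confidence interval gives it little weight; a larger study with the same point estimate would shift the pooled value much further.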

Figure 1 shows an example of a meta-analysis of some of the genes included in the Hung et al. article that we performed using data on gene–environment interactions, with a specific focus on DNA repair polymorphisms (summarized at http://perseus.isi.it/huge [last accessed: March 24, 2005]). A network of investigative groups involved in human genome epidemiology research has been developed to combine data from many studies to overcome the deficiencies of currently available, relatively small datasets (Ioannidis JPA, Altman R, Boffetta P, Danesh J, Hartge P, Little J, et al.: unpublished data). Overall, the results shown in the figure support the choices made by Hung et al. in assigning prior probabilities. For example, we found an association between the XRCC1 194Trp allele and lung cancer among African-American and white heavy smokers, as was also reported in one previous investigation (15), as well as among all African-American lung cancer patients, whereas the association between the OGG1 326 Cys/Cys genotype and lung cancer does not seem to be higher for adenocarcinoma patients than for the whole lung cancer group (Fig. 1). As Hung et al. reported, the discrepancy between their findings and published data might reflect the fact that their population included subjects with different ethnicities and environmental exposures that may have modified the association. However, more data need to accumulate before even meta-analyses will be useful for the estimation of prior probabilities.



Fig. 1. Odds ratios for the relation between lung cancer risk and genotypes. A) OGG1 326 Cys/Cys versus OGG1 326 Ser/Ser, all histotypes. B) OGG1 326 Cys/Cys versus OGG1 326 Ser/Ser, adenocarcinoma. C) XRCC1 194 (Trp/Trp + Trp/Arg versus Arg/Arg), whole study population. D) XRCC1 194 (Trp/Trp + Trp/Arg versus Arg/Arg), heavy smokers. For each study, the odds ratio estimate is plotted with a box; the area of each box is inversely proportional to the estimated square root of the standard deviation in the study. Diamond and dashed vertical lines represent pooled odds ratios; horizontal lines represent 95% confidence intervals. Left side = populations and references of the studies.

 
In conclusion, although Hung et al. are appropriately cautious in drawing conclusions, it is clear that more research and thought are needed to develop robust methods to assign prior probabilities to analyses and that meta-analysis techniques should be incorporated when possible.

NOTES

This work was funded by Compagnia di San Paolo and Associazione Italiana per la Ricerca sul Cancro, AIRC (Torino).

The authors thank Maurizio Manuguerra for maintaining the DNA repair website at ISI Foundation and for performing the meta-analyses, Federica Saletta for her contribution to update the database, and John Ioannidis for comments on the manuscript.

REFERENCES

(1) Hung RJ, Brennan P, Canzian F, Szeszenia-Dabrowska N, Zaridze D, Lissowska J, et al. Large-scale investigation of base excision repair genetic polymorphisms and lung cancer risk in a multicenter study. J Natl Cancer Inst 2005;97:567–76.

(2) Howson C, Urbach P. Scientific reasoning. The Bayesian approach. 2nd ed. Chicago (IL): Open Court; 1993.

(3) Wacholder S, Chanock S, Garcia-Closas M, El Ghormli L, Rothman N. Assessing the probability that a positive report is false: an approach for molecular epidemiology studies. J Natl Cancer Inst 2004;96:434–42.

(4) Zhu Y, Spitz MR, Amos CI, Lin J, Schabath MB, Wu X. An evolutionary perspective on single-nucleotide polymorphism screening in molecular cancer epidemiology. Cancer Res 2004;64:2251–7.

(5) Ioannidis JP. Why most published research findings are false. PLoS Med. In press.

(6) Brennan P. Design and analysis issues in case-control studies addressing genetic susceptibility. IARC Sci Publ 1999;148:123–32.

(7) Colhoun HM, McKeigue PM, Davey Smith G. Problems of reporting genetic associations with complex outcomes. Lancet 2003;361:865–72.

(8) Wikman H, Risch A, Klimek F, Schmezer P, Spiegelhalder B, Dienemann H, et al. hOGG1 polymorphism and loss of heterozygosity (LOH): significance for lung cancer susceptibility in a caucasian population. Int J Cancer 2000;88:932–7.

(9) Sunaga N, Kohno T, Yanagitani N, Sugimura H, Kunitoh H, Tamura T, et al. Contribution of the NQO1 and GSTT1 polymorphisms to lung adenocarcinoma susceptibility. Cancer Epidemiol Biomarkers Prev 2002;11:730–8.

(10) Le Marchand L, Donlon T, Lum-Jones A, Seifried A, Wilkens LR, et al. Association of the hOGG1 Ser326Cys polymorphism with lung cancer risk. Cancer Epidemiol Biomarkers Prev 2002;11:409–12.

(11) Sugimura H, Kohno T, Wakai K, Nagura K, Genka K, Igarashi H, et al. hOGG1 Ser326Cys polymorphism and lung cancer susceptibility. Cancer Epidemiol Biomarkers Prev 1999;8:669–74.

(12) Ito H, Hamajima N, Takezaki T, Matsuo K, Tajima K, Hatooka S, et al. A limited association of OGG1 Ser326Cys polymorphism for adenocarcinoma of the lung. J Epidemiol 2002;12:258–65.

(13) Ratnasinghe D, Yao SX, Tangrea JA, Qiao YL, Andersen MR, Barrett MJ, et al. Polymorphisms of the DNA repair gene XRCC1 and lung cancer risk. Cancer Epidemiol Biomarkers Prev 2001;10:119–23.

(14) David-Beabes GL, London SJ. Genetic polymorphism of XRCC1 and lung cancer risk among African-Americans and Caucasians. Lung Cancer 2001;34:333–9.

(15) Chen S, Tang D, Xue K, Xu L, Ma G, Hsu Y, et al. DNA repair gene XRCC1 and XPD polymorphisms and risk of lung cancer in a Chinese population. Carcinogenesis 2002;23:1321–5.


Copyright © 2005 Oxford University Press (unless otherwise stated)