Laboratory for Genetic Epidemiology, Western Australian Institute for Medical Research, Centre for Medical Research, University of Western Australia, QE-II Medical Centre, B Block, Hospital Avenue, Nedlands WA 6009, Australia; and School of Population Health, University of Western Australia, 35 Stirling Highway, Crawley WA 6009, Australia. E-mail: lyle.palmer@cyllene.uwa.edu.au
The genomics revolution continues to transform epidemiology, clinical medicine, and drug discovery.1–7 The generation of new genomic knowledge and its integration into epidemiological, molecular, and clinical research projects in industry and academia are increasing exponentially.8 The genetic basis of disease susceptibility, disease progression and severity, and response to therapy for many complex conditions has been increasingly emphasized in clinical research and epidemiology, with the ultimate goal of improving preventive strategies, diagnostic tools, and therapies.4,5,9,10
These dynamic trends in epidemiology are embodied in this special issue of the International Journal of Epidemiology, which has the theme "Genetic Epidemiology". Khoury et al.11 and Jablonka12 provide outstanding reviews of the many ways in which genetic and genomic knowledge is being incorporated into mainstream epidemiology and of the growing integration of these disparate disciplines. Iliadou et al.,13 Sparks et al.,14 Lewis and Brunner,15 McKeown-Eyssen et al.,16 Morita et al.,17 Keavney et al.,18 and Gatto et al.19 provide a number of interesting and instructive illustrations of the praxis of adding genetic information to studies established to investigate epidemiological hypotheses. McNeill et al.20 describe an investigation of the interrelationships between genetic and environmental determinants of complex phenotypes, a further area of growing interest for both genetics and epidemiology.
Some historical perspective
Following the completion of the Human Genome Project (www.ornl.gov/sci/techresources/Human_Genome/home.shtm), three key advances are creating unprecedented opportunities for understanding the pathogenic basis of common human diseases:21 (i) extensive catalogues of DNA sequence variants across the human genome are being compiled; (ii) dramatic progress is occurring in molecular genetic technologies for evaluating the polymorphic sites in human samples with increasing efficiency and decreasing cost; and (iii) large-scale, population-based human samples are becoming increasingly available, for example, the European Prospective Investigation into Cancer and Nutrition (EPIC),22 the Busselton Health Study,23 and many others. Large national cohorts such as the Medical Research Council/Wellcome Trust UK Biobank (www.ukbiobank.ac.uk/) are being established, with initiatives planned or under way in a number of countries in Western Europe, Scandinavia, North America, and Australasia. Much of this activity has focused on common complex human conditions such as obesity and cancer that are determined by multiple genetic and environmental factors; such diseases constitute the principal health burden in the developed nations.1,4,5,24
Despite this manifest progress, in many ways we are only at the beginning of our ability to map complex disease genes. The completion of the sequencing of the human genome was the key enabling event in this enterprise. However, the main focus of the Human Genome Project was on the consensus human sequence, which by definition does not contain information about individual differences of medical relevance.25 To complement the consensus sequence, the SNP Consortium (TSC) was formed in 1999, alongside many other public and private projects, with the aim of discovering common single nucleotide polymorphism (SNP) sites in the human genome.26 The increasingly complete catalogue of common genetic variants that is now being widely applied to association studies of complex phenotypes is a direct extension of this early project. The natural follow-on from the SNP discovery phase was to genotype the discovered SNPs in multiple individuals to begin to assess their potential utility for disease mapping; this enterprise is ongoing in the International HapMap Project (www.hapmap.org/). The next logical stage in the progression from consensus sequence to SNP identification to SNP genotyping will involve applications to gene discovery, clinical medicine, and epidemiology, representing the culmination of the initial human genomic framework studies.
The completion of hundreds of family-based genome-wide scans for linkage to complex disease genes, coupled with the availability of high-density SNP maps across the genome and decreasing genotyping costs, is beginning to shift emphasis away from linkage analysis and microsatellite markers towards SNP genotyping and different analytical strategies based on allelic association.27–31 A small but growing number of genes associated with complex diseases have been discovered using association-based genetic mapping.32,33
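As an illustration of the allelic association strategy mentioned above, here is a minimal sketch of the most basic case/control test: a Pearson χ² test with one degree of freedom on a 2×2 table of allele counts. It assumes Hardy–Weinberg equilibrium, so that the two alleles carried by each subject can be treated as independent observations; the function name and the counts are hypothetical and are not taken from any of the cited studies.

```python
import math


def allelic_chi2(case_a: int, case_b: int, ctrl_a: int, ctrl_b: int) -> tuple[float, float, float]:
    """Pearson chi-square (1 df) allelic test on a 2x2 table of allele counts.

    Counts are alleles (two per subject), so this assumes Hardy-Weinberg
    equilibrium. Returns (chi2, two-sided P value, allelic odds ratio).
    """
    table = [[case_a, case_b], [ctrl_a, ctrl_b]]
    n = case_a + case_b + ctrl_a + ctrl_b
    row_totals = [sum(r) for r in table]
    col_totals = [case_a + ctrl_a, case_b + ctrl_b]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 df, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    odds_ratio = (case_a * ctrl_b) / (case_b * ctrl_a)
    return chi2, p_value, odds_ratio


# Hypothetical example: 500 cases and 500 controls (1000 alleles each),
# risk-allele frequency 24% in cases versus 20% in controls.
chi2, p, odds_ratio = allelic_chi2(240, 760, 200, 800)
print(f"chi2={chi2:.2f}, P={p:.3f}, OR={odds_ratio:.2f}")
```

With these made-up counts the test gives χ² ≈ 4.66 (P ≈ 0.03) and an allelic odds ratio of about 1.26: nominally significant, but exactly the kind of marginal result that is easily overinterpreted and often fails to replicate.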
Epidemiology and genetics: a growing union
In the process of transforming epidemiology, genomics is itself being transformed. The development of large, population-based resources for the joint investigation of environmental and genetic hypotheses is a key advance for the growing integration of epidemiology with genetics. These resources are inherently epidemiological in nature as they aim to promote understanding of disease aetiology at a population level. A better understanding of how aetiological factors act at a population level will be a critical step for the clinical utilization of new genomic knowledge and tools to improve health outcomes.4,34,35 Ultimately, genetic knowledge will become useful in the clinical arena only if it is placed back into an epidemiological and medical/public health context.5–7,9,36 It is therefore clear that very large, well-characterized population-based studies drawn from multiple ethnic groups will play a central role in the future implementation of SNP-based gene discovery and in diagnostic tests for complex phenotypes in the outbred, highly admixed populations that increasingly characterize modern human societies.37
Another important way in which genetics is being actively enriched by mainstream epidemiology is in the area of study design. A growing number of articles have begun to address the features of a good genetic association study.8,33,37–39 This increasing focus on study design reflects the realization that genetic association studies of complex phenotypes have tended either to fail to detect susceptibility loci or to fail to replicate the positive findings of earlier studies.8,37,40–44 Despite the widespread use of genetic case/control studies, this lack of consistency is a generally recognized limitation.43,44 The lack of reproducibility is often ascribed to small samples with inadequate statistical power, biological and phenotypic complexity, population-specific linkage disequilibrium, effect-size bias, and population stratification.27,43,45,46 Additional potential reasons for the nonreplication of truly positive association results include inter-investigator and interpopulation heterogeneity in study design, analytic method, phenotype definition, genetic structure, environmental exposures, and the markers genotyped. It is now routinely argued that large sample sizes (generally thousands rather than hundreds of subjects), rigorous P-value thresholds, and replication in multiple independent datasets are necessary for reliable results.4,33,39,42,43
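To make the phrase "thousands rather than hundreds" concrete, the following sketch applies the standard normal-approximation sample-size formula for comparing two proportions to risk-allele frequencies in cases and controls. It assumes Hardy–Weinberg equilibrium (two independent alleles per subject), equal numbers of cases and controls, and a multiplicative allelic odds ratio, and it ignores continuity corrections; the function names, the 20% control allele frequency, and the odds ratio of 1.2 are illustrative choices, not values from the text.

```python
from math import ceil, sqrt
from statistics import NormalDist


def case_allele_freq(p0: float, odds_ratio: float) -> float:
    """Risk-allele frequency in cases implied by control frequency p0 and an allelic OR."""
    odds = odds_ratio * p0 / (1.0 - p0)
    return odds / (1.0 + odds)


def individuals_per_group(p0: float, odds_ratio: float,
                          alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate number of cases (= number of controls) for a two-sided allelic test.

    Normal approximation for two proportions applied to allele counts;
    the allele count per group is halved to give individuals (HWE assumed).
    """
    p1 = case_allele_freq(p0, odds_ratio)
    p_bar = (p0 + p1) / 2.0
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    numerator = (z_alpha * sqrt(2.0 * p_bar * (1.0 - p_bar))
                 + z_beta * sqrt(p0 * (1.0 - p0) + p1 * (1.0 - p1)))
    alleles_per_group = numerator ** 2 / (p1 - p0) ** 2
    return ceil(alleles_per_group / 2.0)


# Hypothetical scenario: control allele frequency 0.2, allelic odds ratio 1.2.
print(individuals_per_group(0.2, 1.2))              # conventional alpha = 0.05
print(individuals_per_group(0.2, 1.2, alpha=1e-4))  # a more stringent threshold
```

Under these assumptions, an odds ratio of 1.2 needs roughly 1,400 cases and 1,400 controls for 80% power at α = 0.05, and roughly 4,000 of each at α = 10⁻⁴, which is consistent with the sample sizes that are now routinely called for.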
A new epidemiology?
The last decade has been a tumultuous and exciting time in human genetics. Explosive growth in technical capabilities and genomic knowledge has been tempered by initial failures to find genes for complex phenotypes using any strategy, whether linkage or association. Our statistical methods and our ability to process and interpret data still lag far behind our technical capacity to generate very large amounts of genomic data. What have we learnt over the last decade of attempts at gene discovery in complex human disease? One important lesson is that everything in human genetics is context specific: specific to the population, environmental exposures, genomic region, and gene under investigation. There is no single paradigm for gene discovery, study design, or analytic approach that will be optimal in all situations. For example, some complex phenotypes may be modulated by many rare alleles, while others may be modulated by a smaller number of common alleles. Despite the large number of reviews and guru statements on optimal study design and analytic strategies, it has become increasingly clear that flexible, mixed approaches and hypothesis-free study designs are desirable. An unfortunate feature of the genomics revolution has been a tendency towards hyperbole in the promise of human gene discovery. This has led to unrealistic expectations regarding the scope of the deliverables and the timeline for integrating disease-gene discovery into clinical medicine and epidemiology, and, in turn, to exaggerated cynicism and pessimism within the academic community. For researchers interested in the pathogenesis of complex human diseases, one of the most important tasks in the coming years is not to add to the hyperbole surrounding genetic epidemiology, but to establish and communicate carefully a realistic set of expectations.
Where do we stand at present with regard to gene discovery by linkage disequilibrium (LD) mapping? For most complex human diseases, the reality of multiple disease-predisposing genes of modest individual effect, gene–gene interactions, gene–environment interactions, interpopulation heterogeneity of both genetic and environmental determinants of disease, and the concomitant low statistical power have made it clear that both initial detection and replication will probably be very difficult.8,44,47 However, in addition to an improved understanding of the complexity of the task at hand, we have some important new tools and knowledge that offer considerable prospects for the future success of gene discovery efforts. The technology for detecting SNPs has developed rapidly, and increasingly complete catalogues of SNPs across the human genome have been constructed. Many groups are actively addressing methodological problems in LD mapping and haplotype-based approaches. Our growing understanding of the architecture of the human genome and the extent of human genetic variability, aided by projects such as HapMap, will probably accelerate our ability to use the tools at hand to map genes for many common conditions. We stand at the threshold of the availability of numerous very large cohorts throughout the world. All these recent developments, taken together with a small but growing number of successful gene localization studies for complex phenotypes, suggest that we should be cautiously optimistic about our potential to discover the genes underlying common human diseases.
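LD mapping relies on measures of allelic association between nearby markers. As a minimal sketch, the classic pairwise statistics D, |D′| and r² for two biallelic loci can be computed from one two-locus haplotype frequency and the two allele frequencies. The haplotype frequency is assumed to be known here; with unphased genotype data it would normally have to be estimated first (for example, by an EM algorithm), which this sketch does not attempt. The function name is hypothetical.

```python
def ld_stats(p_ab: float, p_a: float, p_b: float) -> tuple[float, float, float]:
    """Return (D, |D'|, r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A (locus 1) and allele B (locus 2)
    p_a, p_b: frequencies of allele A and allele B; both must lie strictly in (0, 1)
    """
    d = p_ab - p_a * p_b  # deviation from independence (linkage equilibrium)
    # D' rescales D by its maximum attainable magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    # r^2 is the squared correlation between the alleles at the two loci.
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Illustrative frequencies only.
print(ld_stats(0.4, 0.6, 0.5))
```

The two measures answer different questions: |D′| = 1 indicates no historical recombination between the markers, whereas r² is the quantity that governs how well one marker can stand in for another in an association study.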
An important recent trend that also gives rise to considerable hope has been the assimilation of genetic epidemiology into mainstream epidemiology and public health in many academic institutions. The growing engagement of epidemiologists in genetic research should ameliorate some of the problems with discovery and nonreplication that have plagued complex disease genetics, many of which can be blamed on poor epidemiological study design and overinterpretation of marginal results. At the same time, observational epidemiology has begun to benefit from new genetic approaches to causal inference regarding exposures and disease. One such approach is Mendelian randomization,48 which rests on the plausible proposition that the association between a disease and a genetic polymorphism that mimics the biological link between a proposed exposure and the disease is not generally susceptible to the reverse causation or confounding that can distort interpretations of conventional observational studies, because alleles are allocated at conception, before disease onset and largely independently of later lifestyle and environmental factors. The growing use of genetic data in epidemiological investigations in novel and creative ways represents fresh hope for a discipline beleaguered by the potential for reverse causality and many forms of confounding. Both genetics and epidemiology have had real difficulties in investigating the aetiology of complex human disease with respect to defining true risk factors, replicating results across studies, and providing useful information for the appropriate targeting of preventive or therapeutic measures. Each discipline has much to learn from the other, and there is much to be gained from active collaboration. Our understanding of complex disease pathophysiology has already begun to enter the realm of clinical genetics,49 and we have every reason to anticipate that the impact of genomics on clinical practice and on our understanding of biology and epidemiology will continue to accelerate.
This issue of the International Journal of Epidemiology with the special theme Genetic Epidemiology is both a testament to this fact and a promise for the future of both epidemiology and genetics.
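The logic of Mendelian randomization described above can be illustrated with a toy simulation; it is not drawn from this issue or from reference 48. The sketch assumes a linear model in which a genotype G (0, 1 or 2 copies of an allele with frequency 0.3) raises an exposure X, an unmeasured confounder U affects both X and the outcome Y, and G is independent of U and affects Y only through X. Under these assumptions the naive regression slope of Y on X is biased by U, whereas the Wald ratio cov(G, Y)/cov(G, X), one standard instrumental-variable estimator used in Mendelian randomization, recovers the true causal effect. All effect sizes and names are hypothetical.

```python
import random


def cov(a: list[float], b: list[float]) -> float:
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)


def simulate_mr(n: int = 200_000, true_effect: float = 0.5, seed: int = 1) -> tuple[float, float]:
    """Return (naive OLS slope of Y on X, Wald-ratio estimate) from simulated data.

    Assumed model (hypothetical): G independent of U; G affects Y only via X.
    """
    rng = random.Random(seed)
    g, x, y = [], [], []
    for _ in range(n):
        gi = (rng.random() < 0.3) + (rng.random() < 0.3)  # 0, 1 or 2 alleles (HWE, freq 0.3)
        u = rng.gauss(0.0, 1.0)                           # unmeasured confounder
        xi = 0.4 * gi + u + rng.gauss(0.0, 1.0)           # exposure depends on G and U
        yi = true_effect * xi + u + rng.gauss(0.0, 1.0)   # outcome depends on X and U only
        g.append(gi)
        x.append(xi)
        y.append(yi)
    ols = cov(x, y) / cov(x, x)   # conventional observational estimate, confounded by U
    wald = cov(g, y) / cov(g, x)  # gene-outcome association / gene-exposure association
    return ols, wald


ols, wald = simulate_mr()
print(f"naive OLS slope: {ols:.2f}   Wald ratio: {wald:.2f}   (true effect 0.5)")
```

The estimate is only as good as its assumptions: if G affected Y through a pathway other than X (pleiotropy), or were correlated with U (for example, through population stratification), the Wald ratio would be biased as well.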
References
1 Khoury MJ. Genetic epidemiology and the future of disease prevention and public health. Epidemiol Rev 1997;19:175–80.
2 Nagy A, Perrimon N, Sandmeyer S, Plasterk R. Tailoring the genome: the power of genetic approaches. Nat Genet 2003;33(Suppl.):276–84.
3 Zerhouni E. Medicine. The NIH Roadmap. Science 2003;302:63–72.
4 Goldstein DB, Tate SK, Sisodiya SM. Pharmacogenetics goes genomic. Nat Rev Genet 2003;4:937–47.
5 Merikangas KR, Risch N. Genomic priorities and public health. Science 2003;302:599–601.
6 Kelada SN, Eaton DL, Wang SS, Rothman NR, Khoury MJ. The role of genetic polymorphisms in environmental health. Environ Health Perspect 2003;111:1055–64.
7 Shostak S. Locating gene-environment interaction: at the intersections of genetics and public health. Soc Sci Med 2003;56:2327–42.
8 Cardon LR, Bell JI. Association study designs for complex diseases. Nat Rev Genet 2001;2:91–99.
9 Burke W. Genomics as a probe for disease biology. N Engl J Med 2003;349:969–74.
10 Johnson JA. Pharmacogenetics: potential for individualized drug therapy through genetics. Trends Genet 2003;19:660–66.
11 Khoury MJ, Millikan R, Little J, Gwinn M. The emergence of epidemiology in the genomics age. Int J Epidemiol 2004;33:936–44.
12 Jablonka E. Epigenetic epidemiology. Int J Epidemiol 2004;33:929–35.
13 Iliadou A, Cnattingius S, Lichtenstein P. Low birthweight and type 2 diabetes: a study on 11 162 Swedish twins. Int J Epidemiol 2004;33:948–53.
14 Sparks R, Bigler J, Sibert JG, Potter JD, Yasui Y, Ulrich CM. TGFβ1 polymorphism (L10P) and risk of colorectal adenomatous and hyperplastic polyps. Int J Epidemiol 2004;33:955–61.
15 Lewis SJ, Brunner EJ. Methodological problems in genetic association studies of longevity: the apolipoprotein E gene as an example. Int J Epidemiol 2004;33:962–70.
16 McKeown-Eyssen G, Baines C, Cole DEC et al. Case-control study of genotypes in multiple chemical sensitivity: CYP2D6, NAT1, NAT2, PON1, PON2, and MTHFR. Int J Epidemiol 2004;33:971–78.
17 Morita A, Iki M, Dohi M et al. Prediction of bone mineral density from vitamin D receptor polymorphisms is uncertain in representative samples of the Japanese Women: Japanese Population-based Osteoporosis (JPOS) Study. Int J Epidemiol 2004;33:979–88.
18 Keavney B, Palmer A, Parish S et al. Lipid-related genes and myocardial infarction in 4685 cases and 3460 controls: discrepancies between genotype, blood lipid concentrations and coronary disease risk. Int J Epidemiol 2004;33:1002–13.
19 Gatto NM, Campbell UB, Rundle AG, Ahsan H. Further development of the case-only design for assessing gene–environment interaction: evaluation of and adjustment for bias. Int J Epidemiol 2004;33:1014–24.
20 McNeill G, Tuya C, Smith WCS. The role of genetic and environmental factors in the association between birthweight and blood pressure: evidence from meta-analysis of twin studies. Int J Epidemiol 2004;33:995–1001.
21 Venter JC, Levy S, Stockwell T, Remington K, Halpern A. Massive parallelism, randomness and genomic advances. Nat Genet 2003;33(Suppl.):219–27.
22 Riboli E. Nutrition and cancer: background and rationale of the European Prospective Investigation into Cancer and Nutrition (EPIC). Ann Oncol 1992;3:783–91.
23 Palmer LJ, Knuiman MW, Divitini ML et al. Familial aggregation and heritability of adult lung function: results from the Busselton Health Study. Eur Respir J 2001;17:696–702.
24 Botstein D, Risch N. Discovering genotypes underlying human phenotypes: past successes for mendelian disease, future approaches for complex disease. Nat Genet 2003;33(Suppl.):228–37.
25 Cardon LR, Watkins H. Waiting for the working draft from the human genome project: a huge achievement, but not of immediate medical use. Br Med J 2000;320:1221–22.
26 Sachidanandam R, Weissman D, Schmidt SC et al. A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature 2001;409:928–33.
27 Risch NJ. Searching for genetic determinants in the new millennium. Nature 2000;405:847–56.
28 Schork NJ, Fallin D, Lanchbury JS. Single nucleotide polymorphisms and the future of genetic epidemiology. Clin Genet 2000;58:250–64.
29 Gray IC, Campbell DA, Spurr NK. Single nucleotide polymorphisms as tools in human genetics. Hum Mol Genet 2000;9:2403–8.
30 Keavney B. Genetic association studies in complex diseases. J Hum Hypertens 2000;14:361–67.
31 Altmuller J, Palmer LJ, Fischer G, Scherb H, Wjst M. Genomewide scans of complex human diseases: true linkage is hard to find. Am J Hum Genet 2001;69:936–50.
32 The International HapMap Consortium. The International HapMap Project. Nature 2003;426:789–96.
33 Zondervan KT, Cardon LR. The complex interplay among factors that influence allelic association. Nat Rev Genet 2004;5:89–100.
34 Ohlstein EH, Ruffolo RR Jr, Elliott JD. Drug discovery in the next millennium. Annu Rev Pharmacol Toxicol 2000;40:177–91.
35 Chanda SK, Caldwell JS. Fulfilling the promise: drug discovery in the post-genomic era. Drug Discov Today 2003;8:168–74.
36 Khoury MJ, McCabe LL, McCabe ER. Population screening in the age of genomic medicine. N Engl J Med 2003;348:50–58.
37 Goldstein DB, Ahmadi KR, Weale ME, Wood NW. Genome scans and candidate gene approaches in the study of common diseases and variable drug responses. Trends Genet 2003;19:615–22.
38 Silverman EK, Palmer LJ. Case-control association studies for the genetics of complex respiratory diseases. Am J Respir Cell Mol Biol 2000;22:645–48.
39 Dahlman I, Eaves IA, Kosoy R et al. Parameters for reliable results in genetic association studies in common disease. Nat Genet 2002;30:149–50.
40 Ioannidis JP, Ntzani EE, Trikalinos TA, Contopoulos-Ioannidis DG. Replication validity of genetic association studies. Nat Genet 2001;29:306–9.
41 Lohmueller KE, Pearce CL, Pike M, Lander ES, Hirschhorn JN. Meta-analysis of genetic association studies supports a contribution of common variants to susceptibility to common disease. Nat Genet 2003;33:177–82.
42 Tabor HK, Risch NJ, Myers RM. Opinion. Candidate-gene approaches for studying complex genetic traits: practical considerations. Nat Rev Genet 2002;3:391–97.
43 Weiss KM, Terwilliger JD. How many diseases does it take to map a gene with SNPs? Nat Genet 2000;26:151–57.
44 Terwilliger JD, Goring HH. Gene mapping in the 20th and 21st centuries: statistical methods, data analysis, and experimental design. Hum Biol 2000;72:63–132.
45 Ioannidis JP, Ntzani EE, Trikalinos TA, Contopoulos-Ioannidis DG. Replication validity of genetic association studies. Nat Genet 2001;29:306–9.
46 Goring HH, Terwilliger JD, Blangero J. Large upward bias in estimation of locus-specific effects from genomewide scans. Am J Hum Genet 2001;69:1357–69.
47 Risch N, Merikangas K. The future of genetic studies of complex human diseases. Science 1996;273:1516–17.
48 Davey Smith G, Ebrahim S. Mendelian randomization: can genetic epidemiology contribute to understanding environmental determinants of disease? Int J Epidemiol 2003;32:1–22.
49 Mallal S, Nolan D, Witt C et al. Association between presence of HLA-B*5701, HLA-DR7, and HLA-DQ3 and hypersensitivity to HIV-1 reverse-transcriptase inhibitor abacavir. Lancet 2002;359:727–32.