From the Center for Perinatal, Pediatric, and Environmental Epidemiology, Yale University School of Medicine, New Haven, CT
Correspondence to Dr. Michael B. Bracken, Center for Perinatal, Pediatric, and Environmental Epidemiology, Yale University School of Medicine, One Church Street, 6th Floor, New Haven, CT 06510 (e-mail: michael.bracken{at}yale.edu).
Received for publication April 29, 2005. Accepted for publication May 18, 2005.
ABSTRACT |
---|
epidemiology; genes; genetics; genome; genome, human; meta-analysis; research; review, systematic
INTRODUCTION |
---|
WHY REPLICATION IS INCREASINGLY DIFFICULT IN GENOMIC EPIDEMIOLOGY |
---|
The large volumes of genetic data being produced by genome-wide screening require new statistical methodologies that extend beyond traditional hypothesis-driven analyses of one or two candidate genes. Genome-wide screening is being conducted not only for large numbers of candidate genes, for example, for atopic asthma (8) and myocardial infarction (9), but also in hypothesis-free form. The hypothesis-free approach must account for the simultaneous effects of multiple alleles (10, 11) while managing the statistical problems inherent in multiple comparisons (12-14). Epidemiologic studies have already taken advantage of this new strategy to identify SNPs associated with age-related macular degeneration (15).
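The scale of the multiple-comparisons problem can be illustrated with a short calculation. The sketch below is only an illustration under stated assumptions: the tests are treated as independent (SNPs in linkage disequilibrium are not), the figure of 100,000 SNPs is a hypothetical example rather than a number taken from any cited study, and the function names are invented here. The Bonferroni correction shown is one common, conservative choice and is not necessarily the approach taken in references 12-14.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise error rate <= alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def expected_false_positives(alpha: float, n_tests: int) -> float:
    """Expected number of false positives if every one of n_tests null hypotheses is true."""
    return alpha * n_tests


def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive among n_tests independent null tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


if __name__ == "__main__":
    n_snps = 100_000  # hypothetical number of SNPs tested in one scan (assumption)
    alpha = 0.05
    print("Expected false positives at alpha:", expected_false_positives(alpha, n_snps))
    print("Family-wise error rate, uncorrected:", familywise_error_rate(alpha, n_snps))
    print("Bonferroni per-test threshold:", bonferroni_threshold(alpha, n_snps))
```

Because neighboring markers are correlated, the Bonferroni threshold is generally stricter than necessary; it is shown here only to convey the order of magnitude involved.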
Many polymorphisms have quite small independent effects (relative risks of <1.5) on complex disease diagnoses (the phenotype), and they exert their effects principally by interacting with other polymorphisms or environmental risk factors (16-18). The effects of a polymorphism on disease causation are often further obscured by complex biologic mechanisms, some only recently discovered, which have evolved to protect or "buffer" the genome from environmental change (19), as well as by other epigenetic forces (20-22).
Penetrance, the proportion of individuals carrying a polymorphism who express the expected phenotype (usually by a specified age), varies. Genes with low penetrance pose problems in genomic epidemiology. If penetrance varies across families, estimates of penetrance from the families in which a gene was first identified may be higher than those in the general population. For example, the BRCA1 allele causing breast cancer had 85 percent penetrance in the original families studied but 40-60 percent penetrance in the broader population by age 70 years (23).
Gene expression may be influenced by the parental origin of the polymorphism, or imprinting. For example, the insulin-like growth factor-2 gene (IGF-2) is active only if derived from the father (24). It is often not known whether parent-of-origin effects are due to imprinting or to placental or breast milk transfer of immune factors. To disentangle parent-of-origin effects, studies must collect DNA from parents, a difficult task for late-onset disease when parents may be deceased.
Polymorphisms having opposite effects on a disease may be present in the same gene. Unless the polymorphism of interest is precisely specified, studies may report different findings for the same gene. For example, in the β2 adrenoceptor gene (β2AR), a Gly16 mutation increases the risk of nocturnal asthma, but another mutation at Glu27 protects against bronchial hyperreactivity (25).
DNA sequences are predetermined at conception, but gene function is not. Environmental factors can switch gene functions on or off. Methylation is the most widely studied of these epigenetic processes (22). Epigenetic phenomena reduce specificity of polymorphism-disease associations and lower the power of genomic studies to detect real effects.
A NEED FOR ELECTRONIC EVIDENCE-BASED SYSTEMATIC REVIEWS OF GENOMIC EPIDEMIOLOGY |
---|
Publication bias
The genomic epidemiologic literature is plagued with problems of publication bias, particularly toward selectively publishing positive studies. Systematic reviews permit the formal identification and exploration of publication bias. Colhoun et al. (26) describe 19 case-control studies of angiotensin-converting enzyme gene (ACE) DD polymorphisms and coronary heart disease that show apparent publication bias in favor of positive studies, with odds ratios of up to 3.0 in a series of small studies compared with an estimate of 1.1 from a much larger database. Ioannidis et al. (27) have reported on several disease areas, showing that initial studies tend to report large risk ratios for specific polymorphisms that are either much smaller or not replicated in later investigations.
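One widely used formal check for the small-study asymmetry that publication bias tends to produce is Egger's regression, in which each study's standardized effect (log odds ratio divided by its standard error) is regressed on its precision (the reciprocal of the standard error). An intercept far from zero suggests that smaller studies report systematically different effects. The sketch below is a minimal illustration and is not the method used in references 26 or 27: the function name is hypothetical, the input numbers are made up for demonstration, and no significance test of the intercept is performed.

```python
from typing import Sequence, Tuple


def egger_regression(log_effects: Sequence[float],
                     std_errors: Sequence[float]) -> Tuple[float, float]:
    """Return (intercept, slope) of Egger's regression.

    Regresses the standardized effect (effect / SE) on precision (1 / SE)
    by ordinary least squares. The slope estimates the underlying effect;
    the intercept measures funnel-plot asymmetry.
    """
    if len(log_effects) != len(std_errors) or len(log_effects) < 3:
        raise ValueError("need at least three studies with matching inputs")
    precision = [1.0 / se for se in std_errors]
    standardized = [e / se for e, se in zip(log_effects, std_errors)]
    n = len(precision)
    mean_x = sum(precision) / n
    mean_y = sum(standardized) / n
    sxx = sum((x - mean_x) ** 2 for x in precision)
    if sxx == 0:
        raise ValueError("all studies have identical precision")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(precision, standardized))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


if __name__ == "__main__":
    # Made-up log odds ratios and standard errors (illustrative only):
    # the least precise studies report the largest effects.
    log_or = [0.1, 0.3, 0.6, 1.0]
    se = [0.05, 0.2, 0.4, 0.6]
    intercept, slope = egger_regression(log_or, se)
    print("Egger intercept:", intercept, "slope:", slope)
```

With only a handful of studies the intercept is estimated imprecisely, so a check of this kind is best read alongside a funnel plot rather than on its own.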
Study replication
As in classical epidemiology, replication is fundamental for deciding that an observed association is likely not due to chance (28). In genomic epidemiology, replication should be exact for the alleles studied and as similar as possible in terms of population and environment. Even exact replication is susceptible to high rates of false-positive results. In one study, simulation of random fluctuations in a whole genome scan with no trait-causing loci in 100 sib pairs and parents produced 22 regions significant at the p = 0.05 level, of which four remained significant at p = 0.05 in the replication (29). The effect of population differences causing heterogeneity can be explored formally within the context of a systematic review.
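The way chance findings can survive a replication attempt can be sketched with a toy simulation. The code below is a deliberately simplified assumption-laden illustration: it treats markers as independent tests under a global null hypothesis, so each p value is uniformly distributed, and it does not model sib pairs, linkage, or correlation between neighboring markers. Its output is therefore not comparable to the figures reported in reference 29. The function name and the marker count are hypothetical.

```python
import random
from typing import Tuple


def simulate_null_scan(n_markers: int, alpha: float = 0.05,
                       seed: int = 0) -> Tuple[int, int]:
    """Simulate a discovery scan and a replication when no marker has any effect.

    Under the null hypothesis a p value is uniform on [0, 1], so a marker is
    'significant' with probability alpha in each stage, independently.
    Returns (number significant in discovery, number of those that replicate).
    """
    rng = random.Random(seed)
    # Stage 1: markers that reach significance by chance alone.
    discovery_hits = [m for m in range(n_markers) if rng.random() < alpha]
    # Stage 2: re-test only the stage 1 hits in an independent sample.
    replicated = [m for m in discovery_hits if rng.random() < alpha]
    return len(discovery_hits), len(replicated)


if __name__ == "__main__":
    hits, replicated = simulate_null_scan(n_markers=10_000)  # hypothetical scan size
    print("Chance findings in discovery:", hits)
    print("Chance findings that also replicate:", replicated)
```

Under these assumptions roughly a fraction alpha of the chance findings is expected to replicate, which is why a single successful replication of one marker out of many screened carries limited weight.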
Subgroup analyses
The problem of interpretation found in subgroup analysis in classical epidemiology (30) applies equally in genomic studies. If an association of a polymorphism with disease is found only in a subgroup (for example, after multiple subgroup analyses of gene, microsatellite markers, or SNPs evaluated by gender, age, or ethnic group), then the association is likely to be spurious unless supported by exact replication. Many reported gene-environment interactions are derived from subgroup analyses, and it is difficult to ascertain whether these analyses were testing a priori hypotheses; however, it seems likely that many were not. Individual patient data meta-analysis (see below) can be a particularly powerful approach to identifying subgroups at differential risk of disease.
Meta-analysis and estimating typical effect sizes
There are two principal approaches to statistically pooling data. Most commonly, summary measures of association from individual studies are weighted by their inverse variance and are analyzed to derive a "typical" estimate of risk by using a Mantel-Haenszel procedure (31). An example was published for the IL-10 -1082 G/G genotype, showing increased risk for recurrent pregnancy loss (32). Exploration of effect modification from covariates may be possible with meta-regression techniques (33).
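The idea of weighting study summaries by their inverse variance can be made concrete with a small sketch. The code below implements the generic fixed-effect inverse-variance estimator on the log odds ratio scale; it is not the Mantel-Haenszel procedure cited above, which works from the 2 x 2 tables directly. It assumes each study reports an odds ratio with a 95 percent confidence interval from which a standard error can be recovered, the function name is hypothetical, and the example figures are invented for illustration.

```python
import math
from typing import Sequence, Tuple


def pooled_odds_ratio(studies: Sequence[Tuple[float, float, float]],
                      z: float = 1.96) -> Tuple[float, float, float]:
    """Fixed-effect inverse-variance pooling of odds ratios.

    studies: (odds_ratio, ci_lower, ci_upper) per study, with 95 percent CIs.
    Returns (pooled_or, pooled_ci_lower, pooled_ci_upper).
    """
    if not studies:
        raise ValueError("at least one study is required")
    weight_sum = 0.0
    weighted_effect_sum = 0.0
    for odds_ratio, lower, upper in studies:
        log_or = math.log(odds_ratio)
        # Recover the standard error from the width of the CI on the log scale.
        std_error = (math.log(upper) - math.log(lower)) / (2.0 * z)
        weight = 1.0 / std_error ** 2  # inverse-variance weight
        weight_sum += weight
        weighted_effect_sum += weight * log_or
    pooled_log_or = weighted_effect_sum / weight_sum
    pooled_se = math.sqrt(1.0 / weight_sum)
    return (math.exp(pooled_log_or),
            math.exp(pooled_log_or - z * pooled_se),
            math.exp(pooled_log_or + z * pooled_se))


if __name__ == "__main__":
    # Invented example: one small study with a large effect, one large study
    # with a precise estimate close to the null.
    example = [(3.0, 1.2, 7.5), (1.1, 0.9, 1.35)]
    print(pooled_odds_ratio(example))
```

Because the weights are proportional to precision, the pooled estimate is pulled toward the larger study. A fixed-effect model of this kind assumes a single common effect; where heterogeneity is present, a random-effects model may be more appropriate.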
The second pooling method uses individual subject data from studies being analyzed (34). This method is preferred because it allows some control of confounding factors in the reanalysis as well as subgroup analysis, but it requires the collaboration of many investigators and their willingness to share data. Ioannidis et al. (35) used this method to summarize the protective effects of CCR5-Δ32, an allele found in 15 percent of Caucasians, in slowing down disease progression in individuals infected with human immunodeficiency virus.
Limitations of current databases
Several large electronic databases collect or "bank" genomic data for research purposes. The Environmental Genome Project, which is targeting SNPs in 200 "environmentally responsive" genes (36), and the GenBank database, the National Institutes of Health's genetic sequence database and part of the International Nucleotide Sequence Database Collaboration (37), are important repositories of SNP-level information. ALFRED is a useful resource documenting the prevalence of polymorphisms (38). Other databases describing polymorphism prevalence include the Centers for Disease Control and Prevention's Genotype Prevalence Database (39), Allele Frequencies in Worldwide Populations (40), and the National Cancer Institute's SNP500Cancer database (41). The International HapMap Project is developing a map of haplotypes with a prevalence of more than 5 percent in the human genome, based on genotyping of 1 million SNPs in 270 individuals (42). These gene banks are important for furthering research into disease associations with candidate polymorphisms, but they do not claim to be, and are not, a substitute for systematic reviews of epidemiologic studies of gene-disease associations.
Systematic reviews
Evidence-based medicine provides a paradigm for explicitly and systematically searching, collating, and synthesizing a complete body of evidence on a research topic (43). Systematic reviews include publication of detailed and transparent literature-searching methods, searching for possible bias in that evidence base, evaluation of study validity and heterogeneity, and consideration of the importance of effect size and precision. They have been widely adopted in clinical research as the "gold standard" for synthesizing and drawing conclusions from an extant body of research evidence (44). All scientific literature is amenable to systematic review, a process that does not necessarily require meta-analysis. Many systematic reviews identify a body of data not amenable to meta-analysis because of heterogeneity in study methodology or the populations studied.
In genomic epidemiology, there has been a concerted effort by the Human Genome Epidemiology (HuGE) review group, organized by the Centers for Disease Control and Prevention (45, 46), to conduct systematic reviews of gene-disease associations. Explicit criteria are being developed to assess the validity of genomic epidemiology publications (47). These criteria include recommendations for how studies may be scored for validity, data integration for calculating typical effect estimators, and reviews constructed according to common genotypes and genotyping methodology (48). A recent example, in which individual subject data were used, considered polymorphisms in the alcohol dehydrogenase gene (ADH) and the aldehyde dehydrogenase gene (ALDH) associated with the risk of head and neck cancer (49).
HuGE reviews do not always include details of analyses to identify possible publication bias, exact accounts of literature search strategies, descriptions of "excluded" studies, or details of actual assessments used to judge individual study validity, nor are they updated routinely and frequently. All of these are standard features of a modern, electronically based systematic review. Although electronic files of HuGE publications from several journals, including the American Journal of Epidemiology, are accessible on the HuGE website, the reviews are not published online initially, a process that would speed up publication, lead to more uniformity in the reports, and ease updating. Furthermore, electronic publication would allow standard statistical programs for meta-analysis to be embedded within the software for creating the systematic review.
Importantly, online publication would permit access to a full protocol describing the scope and objectives of the planned review. At present, protocols are listed simply as titles on the HuGE website. Protocol publication explicitly documents the planned objectives of the review, including subgroups to be considered in any analysis and other criteria subject to bias while the review is being constructed. The importance of publishing protocols to avoid ex post facto manipulation of primary outcome selection has recently been demonstrated in the clinical literature (50).
Examples of electronic databases for online systematic reviews are already well established in medicine, notably the Cochrane Library and its REVMAN and METAVIEW software for creating systematic reviews (51), and in the social sciences (the Campbell Collaboration (48)). All review groups within these collaborations require frequent updating of their reviews. While the Cochrane Collaboration is currently focused on reviewing randomized clinical trials, efforts are under way to expand this focus to observational studies of similar design to genomic epidemiology. The HuGE research network, with its developed consensus guidelines (47), appears best positioned to move to full electronic creation and publication of systematic reviews of genomic epidemiology as exemplified by the Cochrane Collaboration.
SUMMARY |
---|
ACKNOWLEDGMENTS |
---|
The author is grateful to Geir Jacobsen, Josephine Hoh, and several anonymous reviewers for their comments on earlier drafts of this paper.
Michael Bracken is an editor of the Cochrane Neonatal Review Group and a member of the Cochrane Methods Group.
References |
---|