University of Newcastle, Institute of Human Genetics, Central Parkway, Newcastle upon Tyne, UK. E-mail: b.d.keavney{at}newcastle.ac.uk
Over the last scientific generation, observational epidemiology and clinical trials have revolutionized our understanding of causal risk factors predisposing to a variety of common diseases, perhaps most strikingly cardiovascular disease. Almost every member of the public now knows that smoking, high blood pressure, high levels of blood cholesterol, and diabetes predispose to the development of coronary heart disease (CHD), and yet one does not have to venture far back into the last century to find a time when all of this was unknown. The extraordinary power of large blood-based observational epidemiological studies to identify associations between risk factors and complex diseases has been one of medical science's great recent success stories. No less important have been the data from clinical trials confirming that associations found in observational studies are causal, by showing that treating particular risk factors with suitably specific therapeutic agents diminishes the risk of developing disease. The third important strand of evidence confirming (albeit indirectly) the causality of a risk factor consists of studies in animal models of disease; the technology to create transgenic and gene-targeted animals has resulted in an explosion of activity in this field. Although such models often confirm the importance of risk factors in humans (for example, the development of stroke in certain hypertensive rat models, or of atheromatous disease in genetically engineered hyperlipidaemic mice), importantly, they do not do so in all cases.
Investigators studying genetic susceptibility to complex traits such as CHD have looked on somewhat enviously at the steady flow of scientifically robust data from epidemiology and clinical trials groups, given the extreme difficulties encountered in attempting to identify genes that contribute significantly to the population burden of conditions such as atherosclerosis, cancer, and obesity. The principal reason for this contrast is, of course, that classical epidemiologists have so far been studying effects that are much larger than the effect of any single genetic locus is likely to be. For example, in the large International Study of Infarct Survival (ISIS) case-control study of myocardial infarction (MI), which included 2554 cases of premature MI (males aged <55 years and females aged <65 years) and 4831 controls, current smoking was associated with a relative risk for MI of 4.6 (95% CI: 4.1, 5.3), whereas in the same study the relative risk for MI among those with the ε3/ε4 genotype at the apolipoprotein E ε2/ε3/ε4 polymorphism, relative to those with the ε2/ε3 genotype, was only 1.17 (95% CI: 1.09, 1.25).1 The apolipoprotein E ε2/ε3/ε4 polymorphism is the only common genetic variant for which convincing large-scale evidence of an association with MI risk exists to date.
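The contrast in effect sizes can be made concrete with a short calculation on the ISIS figures quoted above. This is a sketch under an assumption not stated in the source: that each 95% CI was computed symmetrically on the log scale, as exp(ln RR ± 1.96 × SE). It back-calculates the standard error and z statistic for each estimate and, as a rough approximation that holds other design features (such as how common the exposure is) fixed, the relative number of cases needed to detect each effect with equal power.

```python
import math

Z95 = 1.959964  # two-sided 95% standard normal quantile


def log_se_from_ci(lower, upper, z=Z95):
    """SE of ln(RR), assuming the CI was computed as exp(ln(RR) +/- z * SE)."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def summarise(rr, lower, upper):
    """Return (SE of ln RR, z statistic for the null hypothesis ln RR = 0)."""
    se = log_se_from_ci(lower, upper)
    return se, math.log(rr) / se


smoking = summarise(4.6, 4.1, 5.3)   # ISIS: current smoking
apoe = summarise(1.17, 1.09, 1.25)   # ISIS: apolipoprotein E genotype comparison

# Other things being equal, the number of cases needed for a given power
# scales roughly with 1 / (ln RR)^2.
size_ratio = (math.log(4.6) / math.log(1.17)) ** 2

print(f"smoking: SE={smoking[0]:.3f}, z={smoking[1]:.1f}")
print(f"apoE:    SE={apoe[0]:.3f}, z={apoe[1]:.1f}")
print(f"relative number of cases for equal power: about {size_ratio:.0f}")
```

On these assumptions the smoking estimate has a z statistic of about 23, against about 4.5 for the apolipoprotein E genotype, and an effect of the apolipoprotein E size would need roughly 94 times as many cases to detect with the same power.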
Despite the outstanding successes of observational epidemiology in recent decades, however, there is some concern that much of the low-hanging fruit has now been picked, and that identifying novel causal risk factors using classical methodology will become increasingly difficult. In a recent International Journal of Epidemiology review, Davey Smith and Ebrahim drew attention to several instances (mostly concerning vitamin intake and cardiovascular or cancer risk) where the findings of observational epidemiology had not been confirmed by subsequent clinical trials.2 To these examples should be added those hypothesized causal associations which essentially cannot be validated in humans by classical means at present because no suitably specific agent has yet been developed for clinical trial purposes (a good example being the association between plasma C-reactive protein and cardiovascular risk).3 There are three principal reasons why further such difficulties might lie ahead. Firstly, the effect sizes being claimed for novel risk factors are smaller than those of the classical risk factors, so ever larger studies will be required to produce robust results. Secondly, the associations between novel risk factors and disease might be confounded by other inaccurately measured or unmeasured factors which are themselves related to the risk of disease. For example, plasma fibrinogen, a hypothesized novel risk factor for CHD, is very strongly associated with smoking, a causal factor. Although it is possible to adjust statistically for smoking when assessing the relationship between fibrinogen and CHD risk, error in the measurement of smoking would make such an adjustment likely to underestimate the effect of smoking, leaving residual confounding of the association between fibrinogen and CHD (Figure 1).
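The residual-confounding argument can be illustrated with a small simulation. This is a minimal sketch under entirely hypothetical assumptions: standardized, normally distributed variables; a linear "CHD risk" score in place of a binary outcome; fibrinogen raised by smoking but with no causal effect of its own; and a recorded smoking measure carrying substantial random error. Adjusting for true smoking should remove the fibrinogen association, whereas adjusting for the error-prone record should leave a spurious fibrinogen coefficient and shrink the smoking coefficient.

```python
import random


def ols_two(x1, x2, y):
    """Least-squares slopes of y on x1 and x2, with an intercept (handled by centring)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


def residual_confounding(n=20_000, smoking_error_sd=1.0, seed=1):
    """Return fibrinogen and smoking coefficients, adjusting for true vs recorded smoking.

    All effect sizes below are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    smoking = [rng.gauss(0.0, 1.0) for _ in range(n)]                   # true smoking exposure
    fibrinogen = [0.6 * s + rng.gauss(0.0, 0.8) for s in smoking]       # raised by smoking only
    chd_risk = [s + rng.gauss(0.0, 1.0) for s in smoking]               # fibrinogen has no causal role
    recorded = [s + rng.gauss(0.0, smoking_error_sd) for s in smoking]  # error-prone smoking record

    fib_true, smk_true = ols_two(fibrinogen, smoking, chd_risk)
    fib_meas, smk_meas = ols_two(fibrinogen, recorded, chd_risk)
    return fib_true, smk_true, fib_meas, smk_meas


fib_true, smk_true, fib_meas, smk_meas = residual_confounding()
print(f"adjusted for true smoking:     fibrinogen {fib_true:.2f}, smoking {smk_true:.2f}")
print(f"adjusted for recorded smoking: fibrinogen {fib_meas:.2f}, smoking {smk_meas:.2f}")
```

With these settings the fibrinogen coefficient is close to zero after adjustment for true smoking, but roughly 0.37 after adjustment for recorded smoking, while the smoking coefficient falls from about 1 to roughly 0.4. These values follow from the chosen parameters, not from any real data.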
Thirdly, for some diseases, reverse causality may be a problem. In the case of atherosclerosis, the process is known to begin in early life, and pathological studies clearly show its inflammatory component; those with higher baseline levels of high-sensitivity C-reactive protein or other inflammatory markers may therefore have subclinical disease causing their inflammatory marker profile, rather than the other way round (Figure 2).
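Reverse causality of this kind can be sketched as a toy simulation. The assumptions are hypothetical: a fixed fraction of people have subclinical atherosclerosis at baseline, that disease raises C-reactive protein (CRP, in arbitrary units), CRP itself has no effect on risk, and clinical events during follow-up arise only from subclinical disease. CRP is nevertheless higher in those who go on to have events, while among people with subclinical disease it does not distinguish those who have events from those who do not.

```python
import random
from statistics import mean


def reverse_causation(n=50_000, seed=2):
    """Return mean CRP in future cases, in non-cases, and in subclinical non-cases."""
    rng = random.Random(seed)
    crp_cases, crp_noncases, crp_subclinical_noncases = [], [], []
    for _ in range(n):
        # Hypothetical: 20% have subclinical atherosclerosis at baseline.
        subclinical = rng.random() < 0.20
        # The disease process raises CRP; CRP has no effect on risk in this model.
        crp = rng.gauss(1.0, 0.5) + (0.7 if subclinical else 0.0)
        # Clinical events during follow-up arise only from subclinical disease.
        event = subclinical and rng.random() < 0.30
        if event:
            crp_cases.append(crp)
        else:
            crp_noncases.append(crp)
            if subclinical:
                crp_subclinical_noncases.append(crp)
    return mean(crp_cases), mean(crp_noncases), mean(crp_subclinical_noncases)


cases, noncases, subclinical_noncases = reverse_causation()
print(f"mean CRP: cases {cases:.2f}, non-cases {noncases:.2f}, "
      f"subclinical non-cases {subclinical_noncases:.2f}")
```

Under these assumptions mean CRP is roughly 1.7 in future cases and about 1.1 in non-cases, an association generated entirely by the disease acting on the marker.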
Katan was among the first to recognize that genetics could contribute importantly to the debate regarding causality. His brief contribution to the Lancet correspondence pages in 1986 directly addresses a topic of increasing debate among geneticists and epidemiologists in 2003. The problem he addresses is the association between low serum cholesterol levels and cancer, and the main obstacle to interpreting it is reverse causality: in this case, the possibility that a pre-existing occult tumour might cause lower cholesterol levels, rather than lower cholesterol levels causing cancer. In the early 1980s, the central role of the apolipoprotein E molecule in cholesterol metabolism was discovered, and associations between the E2, E3, and E4 isoforms of that molecule (determined by the ε2, ε3, and ε4 alleles of the apolipoprotein E gene) and blood levels of low density lipoprotein cholesterol were observed in a number of populations. Katan reasoned that, since apolipoprotein E genotypes are determined at conception, they would determine long-term differences in blood cholesterol between individuals and could not be altered by the subsequent development of disease. Thus, if the causal arrow pointed from low cholesterol to cancer, there would be a higher frequency of the allele predisposing to lower cholesterol (ε2), and a correspondingly lower frequency of the allele predisposing to higher cholesterol (ε4), among cancer cases; whereas if it pointed in the other direction, genotype frequencies would not differ between cases and controls (Figure 3). So far as I am aware, Katan's hypothesis was never tested in the way that he proposed.
Subsequent to Katan's Lancet letter, Gray and Wheatley in 1991 proposed a similar use of genetic data to avoid bias when comparing bone marrow transplantation (BMT) with chemotherapy, and coined the term Mendelian randomization.4 If anything, Gray and Wheatley's approach was even more ingenious than Katan's. In leukaemia treatment, by the time their article was written, it was already thought to be unethical to withhold BMT from patients who had human leukocyte antigen (HLA)-matched donors (a minority of patients). A randomized trial comparing allogeneic BMT with no BMT (or extra chemotherapy) might therefore not be possible on ethical grounds, and comparing these treatments using observational data would be subject to major biases, chief among them selection bias. Gray and Wheatley proposed that comparing the survival of patients who had HLA-compatible siblings with that of patients who did not would constitute an unbiased assessment of the value of allogeneic BMT. This is because allocation to either group would have been made years before the onset of disease, so no selection bias could occur. Here, the random segregation of HLA alleles in the meioses producing the affected individual and any siblings produces a de facto randomization entirely analogous to that employed in the clinical trial setting. The approach assumed that complete data on HLA typing would, in general, be available, and that most suitable patients with donors would go on to receive BMT. A number of subsequent studies have adopted Gray and Wheatley's approach in acute leukaemia, and consistently better survival has been observed among those with HLA-matched donors.5,6
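Katan's proposed test, described above, can be sketched as a simulation comparing two hypothetical causal models. Every parameter value (allele frequency, per-allele effect on cholesterol, baseline cancer risk, the size of a tumour's effect on measured cholesterol) is an illustrative assumption, not an estimate for apolipoprotein E. In model A, low long-term cholesterol raises cancer risk; in model B, cancer risk is unrelated to cholesterol, but an occult tumour lowers the cholesterol measured at baseline. Both models reproduce the observational finding of lower cholesterol in cases; only model A shifts the frequency of the cholesterol-lowering allele among cases.

```python
import math
import random


def katan_test(low_chol_causes_cancer, n=200_000, p_low=0.10, seed=3):
    """Simulate one hypothetical causal model.

    Returns (mean measured cholesterol in cases, in controls,
             lowering-allele frequency in cases, in controls).
    """
    rng = random.Random(seed)
    chol_sum = {True: 0.0, False: 0.0}
    allele_sum = {True: 0, False: 0}
    count = {True: 0, False: 0}
    for _ in range(n):
        # Genotype fixed at conception: 0, 1 or 2 copies of a hypothetical
        # cholesterol-lowering allele (playing the role of e2 in Katan's argument).
        g = (rng.random() < p_low) + (rng.random() < p_low)
        # Long-term cholesterol (arbitrary units): each allele lowers it by 0.6.
        chol = 5.5 - 0.6 * g + rng.gauss(0.0, 1.0)
        if low_chol_causes_cancer:
            # Model A: lower long-term cholesterol raises cancer risk.
            risk = 0.02 * math.exp(-0.8 * (chol - 5.5))
        else:
            # Model B: cancer risk is unrelated to cholesterol or genotype.
            risk = 0.02
        cancer = rng.random() < risk
        measured = chol
        if cancer and not low_chol_causes_cancer:
            # Model B only: an occult tumour lowers baseline measured cholesterol,
            # but it cannot change the genotype.
            measured -= 0.8
        chol_sum[cancer] += measured
        allele_sum[cancer] += g
        count[cancer] += 1
    return (
        chol_sum[True] / count[True],
        chol_sum[False] / count[False],
        allele_sum[True] / (2 * count[True]),
        allele_sum[False] / (2 * count[False]),
    )


model_a = katan_test(low_chol_causes_cancer=True)
model_b = katan_test(low_chol_causes_cancer=False)
for label, r in (("A (low cholesterol -> cancer)", model_a), ("B (cancer -> low cholesterol)", model_b)):
    print(f"{label}: cholesterol {r[0]:.2f} vs {r[1]:.2f}; allele freq {r[2]:.3f} vs {r[3]:.3f}")
```

Under model A the lowering allele's frequency rises to roughly 0.15 in cases against about 0.10 in controls; under model B the two frequencies agree to within sampling error, even though measured cholesterol is lower in cases under both models. That asymmetry is the discrimination Katan's argument relies on.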
As proteomic technology becomes progressively more applicable to large sample sizes, it will eventually be possible to study the expression of many thousands of proteins in many thousands of individuals. Establishing the causality of any of the thousands of associations likely to emerge will be extremely challenging, since it will be impossible to develop animal models or therapeutic agents to test each of them. A potential next step might be the early investigation of these associations by Mendelian randomization, to focus attention on those with more substantial evidence of causality. The development of animal models and/or the conduct of intervention trials could then be restricted to those proteins that have passed through this genetic sieve.
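One way such a genetic sieve might be implemented is sketched below using the Wald ratio (a variant's association with disease divided by its association with the protein), a standard instrumental-variable estimator that the text does not itself specify. The protein names and summary statistics are hypothetical; each protein is assumed to have a single valid genetic instrument, and a Bonferroni correction is assumed to allow for the number of proteins screened.

```python
from dataclasses import dataclass
from statistics import NormalDist


@dataclass
class ProteinSummary:
    name: str
    beta_gx: float  # per-allele effect of the variant on protein level (hypothetical)
    beta_gy: float  # per-allele log odds ratio of disease for the same variant
    se_gy: float    # standard error of beta_gy


def wald_ratio(s):
    """Wald-ratio estimate of the protein's effect on disease (log OR per unit protein)."""
    estimate = s.beta_gy / s.beta_gx
    # First-order standard error, ignoring uncertainty in beta_gx.
    se = s.se_gy / abs(s.beta_gx)
    return estimate, se


def genetic_sieve(summaries, alpha=0.05):
    """Keep proteins whose Wald ratio passes a Bonferroni-corrected two-sided test."""
    z_crit = NormalDist().inv_cdf(1 - alpha / (2 * len(summaries)))
    passed = []
    for s in summaries:
        estimate, se = wald_ratio(s)
        if abs(estimate / se) >= z_crit:
            passed.append(s.name)
    return passed


candidates = [
    ProteinSummary("protein_A", beta_gx=0.50, beta_gy=0.10, se_gy=0.02),
    ProteinSummary("protein_B", beta_gx=0.40, beta_gy=0.01, se_gy=0.02),
    ProteinSummary("protein_C", beta_gx=0.30, beta_gy=-0.09, se_gy=0.02),
]
print(genetic_sieve(candidates))  # prints ['protein_A', 'protein_C']
```

With this first-order standard error, the Wald-ratio z statistic has the same magnitude as the z statistic for the variant's association with disease, so at the screening stage the sieve amounts to asking whether a variant known to alter a protein's level is itself associated with disease; the ratio matters once the size of the protein's effect is to be estimated.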
Nearly 20 years on, we now have the genetic technology to make widespread use of the insights of Katan, Gray, and Wheatley. Such is the promise of this approach that, although I doubt that either Katan's letter or Gray and Wheatley's paper is currently a citation classic, I would be willing to bet that both will be within 5 years.
References
2 Davey Smith G, Ebrahim S. Mendelian randomization: can genetic epidemiology contribute to understanding environmental determinants of disease? Int J Epidemiol 2003;32:1–22.
3 Danesh J, Whincup P, Walker M et al. Low grade inflammation and coronary heart disease: prospective study and updated meta-analyses. BMJ 2000;321:199–204.
4 Gray R, Wheatley K. How to avoid bias when comparing bone marrow transplantation with chemotherapy. Bone Marrow Transplant 1991;7(Suppl.3):9–12.
5 Burnett AK, Wheatley K, Goldstone AH et al. The value of allogeneic bone marrow transplant in patients with acute myeloid leukaemia at differing risk of relapse: results of the UK MRC AML 10 trial. Br J Haematol 2002;118:385–400.
6 Harrison G, Richards S, Lawson S et al. Comparison of allogeneic transplant versus chemotherapy for relapsed childhood acute lymphoblastic leukaemia in the MRC UKALL R1 trial. MRC Childhood Leukaemia Working Party. Ann Oncol 2000;11:999–1006.
7 Youngman L, Keavney B, Palmer A et al. Plasma fibrinogen and fibrinogen genotypes in 4685 cases of myocardial infarction and 6002 controls: test of causality by Mendelian randomisation. Circulation 2000;102(Suppl.II):3132.
8 Keavney B, McKenzie CA, Connell JM et al. Measured haplotype analysis of the angiotensin-I converting enzyme gene. Hum Mol Genet 1998;7:1745–51.
9 Clayton D, McKeigue PM. Epidemiological methods for studying genes and environmental factors in complex diseases. Lancet 2001;358:1356–60.
10 Little J, Khoury MJ. Mendelian randomisation: a new spin or real progress? Lancet 2003;362:930–31.
11 Vickers MA, Green FR, Terry C et al. Genotype at a promoter polymorphism of the interleukin-6 gene is associated with baseline levels of plasma C-reactive protein. Cardiovasc Res 2002;53:1029–34.
12 Cardon LR, Abecasis GR. Using haplotype blocks to map human complex trait loci. Trends Genet 2003;19:135–40.
13 Couzin J. Human genome. HapMap launched with pledges of $100 million. Science 2002;298:941–42.
14 Ding C, Cantor CR. A high-throughput gene expression analysis technique using competitive PCR and matrix-assisted laser desorption ionization time-of-flight MS. Proc Natl Acad Sci USA 2003;100:3059–64.
15 Knight JC, Keating BJ, Rockett KA, Kwiatkowski DP. In vivo characterization of regulatory polymorphisms by allele-specific quantification of RNA polymerase loading. Nat Genet 2003;33:469–75.