Mendelian randomization: prospects, potentials, and limitations

George Davey Smith and Shah Ebrahim

Department of Social Medicine, Canynge Hall, Whiteladies Road, Bristol BS8 2PR. E-mail: zetkin{at}bristol.ac.uk

In this issue of the International Journal of Epidemiology we reprint a letter to the Lancet by Martijn Katan1 and several commentaries2–7 concerning what has become known as ‘Mendelian randomization’8–12—the use of genotype–disease associations to make inferences about environmentally modifiable causes of disease. Here we will reflect on the prospects, potentials, and limitations of Mendelian randomization.


    Where can Mendelian randomization help observational epidemiology?
 
Mendelian randomization is the term applied to the random assortment of alleles at the time of gamete formation. This results in population distributions of genetic variants that are generally independent of behavioural and environmental factors that typically confound epidemiological associations between putative risk factors and disease. In some circumstances this can provide a study design akin to randomized comparisons.

The principles of Mendelian randomization can serve to limit several potential problems in observational epidemiology (Table 1). The avoidance of confounding is clearly a key advantage, and in view of this, Martin Tobin and colleagues6 have suggested that the approach should be termed ‘Mendelian deconfounding’. However, there are several additional and perhaps equally important ways in which Mendelian randomization can strengthen inferences drawn from observational studies. In the example Katan originally presented—that of the association between low serum cholesterol levels and cancer—the most plausible bias would be introduced by reverse causation. The early stages of cancer could lead to a decrease in circulating cholesterol levels, generating an inverse association between cholesterol levels and cancer morbidity or mortality.1 Early stages of cancer will not, however, change inherited genetic variants that are associated with cholesterol levels. Thus if low cholesterol level were a cause of increased cancer risk then individuals with genetic variants associated with lower cholesterol levels should have a higher cancer risk. If, on the other hand, reverse causation is responsible for the association between cholesterol level and cancer, there should be no association between genetic variants related to cholesterol level and cancer risk. Biological forms of reverse causation may influence many epidemiological associations—for example, those between markers of inflammation and coronary heart disease, where existing atherosclerosis may influence the level of factors such as fibrinogen and C-reactive protein.13 Reverse causation can also occur through exposure assignment—for example, people with early stages of coronary heart disease may take vitamin supplements because they believe these will reduce their risk of cardiovascular events. This will tend to generate a positive association between vitamin intake and disease. 
A form of reverse causation can also occur through reporting bias, with the presence of disease influencing reporting disposition. In case-control studies people with the disease under investigation may report on their prior exposure history in a different way than do controls—perhaps because the former will think harder about potential reasons that account for why they have developed the disease. In this situation the association between genetic variants related to the exposure and disease outcome will not usually be biased.


 
Table 1 Problems in observational epidemiology where Mendelian randomization may help

 
In observational studies associations between an exposure and disease will generally be biased if there is selection according to an exposure–disease combination in case-control studies, or according to an exposure–disease risk combination in prospective studies. If, for example, people with an exposure and at low risk of disease for other reasons were differentially excluded from a study the exposure would appear to be positively related to disease outcome, even if there were no such association in the underlying population. This is a form of ‘Berkson's bias’, well known to epidemiologists.14 A possible example of such associative selection bias relates to the finding in the large American Cancer Society volunteer cohort that high alcohol consumption was associated with a reduced risk of stroke.15 This is somewhat counter-intuitive as the outcome category included haemorrhagic stroke (for which there is no obvious mechanism through which alcohol would reduce risk) and because alcohol is known to increase blood pressure16,17—a major causal factor for stroke.18 Population-based studies have found that alcohol tends to increase stroke risk.19–21 Heavy drinkers who volunteer for a study known to be about the health effects of their lifestyle are likely to be very unrepresentative of all heavy drinkers in the population, in ways that place them at low risk of stroke. Moderate and non-drinkers who volunteer may be more representative of moderate and non-drinkers in the underlying population. Thus the low risk of stroke in the heavy drinkers who volunteer for the study could erroneously make it appear that alcohol reduces the risk of stroke.

Perhaps because it is difficult to directly test for the presence of associative selection it is traditional to claim in the discussion section of epidemiological papers that such selection is unlikely to have occurred. We confess to having used such reasoning ourselves on several occasions, but clearly the use of methods not susceptible to such bias would be preferable. As in the example of heavy drinking and stroke, social processes will often underlie any associative selection bias. In contrast, it is improbable that such social processes will lead to selection of study participants on the basis of associations between a genetic variant and behavioural or other genetic risk factors. Therefore inferences drawn from associations between genetic variants and disease are less likely to be distorted by associative selection bias than is the case in conventional observational epidemiology.
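The heavy-drinking example above can be made concrete with a small expected-value calculation. The sketch below is purely illustrative: the population sizes, stroke risks, and volunteering probabilities are hypothetical numbers chosen for the illustration, not data from any of the studies cited. It assumes that heavy drinking raises stroke risk in the source population, and that heavy drinkers who volunteer come disproportionately from a low-risk stratum.

```python
def risk(strata):
    """Average disease risk over a list of (n_people, risk) strata."""
    total = sum(n for n, _ in strata)
    return sum(n * r for n, r in strata) / total


def select(strata, probs):
    """Expected strata sizes after each stratum volunteers with its own probability."""
    return [(n * p, r) for (n, r), p in zip(strata, probs)]


# Hypothetical source population: (number of people, stroke risk over follow-up).
# Each drinking group contains a low-risk and a high-risk stratum.
heavy = [(5000, 0.02), (5000, 0.06)]
other = [(45000, 0.02), (45000, 0.04)]

# Hypothetical volunteering probabilities: high-risk heavy drinkers rarely volunteer,
# whereas other participants volunteer at the same rate whatever their risk.
heavy_volunteer = [0.30, 0.05]
other_volunteer = [0.20, 0.20]

rr_population = risk(heavy) / risk(other)
rr_cohort = risk(select(heavy, heavy_volunteer)) / risk(select(other, other_volunteer))

print(f"risk ratio in source population: {rr_population:.2f}")
print(f"risk ratio in volunteer cohort:  {rr_cohort:.2f}")
```

Under these assumed inputs the risk ratio is about 1.33 in the source population but about 0.86 among volunteers, so selection alone reverses the apparent direction of the association.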

The strength of associations between causal risk factors and disease in observational studies will generally be underestimated due to random measurement imprecision in indexing the exposure. A century ago Charles Spearman demonstrated mathematically how such measurement imprecision would lead to what he termed the ‘attenuation by errors’ of associations.22,23 This has more recently been renamed ‘regression dilution bias’.24 Genetic variants associated with a difference in intermediate phenotypes such as homocysteine levels will index lifetime differences in such exposures and therefore produce estimates that are not susceptible to such attenuation.12 Indeed in the case of homocysteine and coronary heart disease (CHD) the association between MTHFR genotype and CHD, combined with the association between genotype and homocysteine levels, gives an imputed estimate of the homocysteine–CHD association similar to that seen in observational studies after they have been statistically adjusted for attenuation by errors.12,25,26
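The two calculations described in this paragraph can be sketched as follows. This is a minimal illustration under stated assumptions, not the method used in the cited analyses: the first function assumes the classical model of random measurement error, in which the observed slope equals the true slope multiplied by the reliability of the measurement; the second assumes that an imputed exposure–disease estimate is obtained as a simple ratio of the genotype–disease association to the genotype–exposure difference. The function names and all input values are hypothetical.

```python
def attenuated_slope(true_slope, var_true, var_error):
    """Slope expected when the exposure is measured with random error.

    Assumes the classical error model: observed = true + independent noise.
    """
    reliability = var_true / (var_true + var_error)
    return true_slope * reliability


def ratio_estimate(log_or_genotype_disease, exposure_diff_by_genotype):
    """Imputed log odds ratio per unit of exposure from two genotype associations."""
    return log_or_genotype_disease / exposure_diff_by_genotype


# Hypothetical inputs: a true log odds ratio of 0.10 per unit of exposure, with
# measurement error variance equal to the true between-person variance.
observed = attenuated_slope(true_slope=0.10, var_true=4.0, var_error=4.0)

# Hypothetical genotype raising mean exposure by 2 units, with a genotype-disease
# log odds ratio of 0.20.
imputed = ratio_estimate(log_or_genotype_disease=0.20, exposure_diff_by_genotype=2.0)

print(f"observed (attenuated) slope: {observed:.3f}")
print(f"genotype-based imputed slope: {imputed:.3f}")
```

With these assumed inputs the conventionally observed slope is half the true value, whereas the genotype-based ratio recovers the full slope, because random error in individual measurements does not bias the mean difference in exposure between genotype groups.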


    Categories of inference from Mendelian randomization
 
The principles of Mendelian randomization can be applied in a variety of ways to making inferences about environmentally modifiable determinants of disease. Table 2 provides a provisional categorization of these types of inference. First, a genetic variant could influence exposure through an effect on dispositional propensities. For example, variants that influence the tendency to drink alcohol or milk have been identified, and, as discussed later, these allow investigation of the health effects of alcohol and milk consumption.


 
Table 2 Categories of inferences from Mendelian randomization

 
Second, a genetic variant could influence an intermediate phenotype, such as cholesterol or fibrinogen levels. This provides a method for assessment of the causal nature of observed associations between the intermediate phenotype and disease, and thus whether interventions to modify the intermediate phenotype could be expected to influence disease risk. A further elaboration of this second approach is where intermediate phenotypes are related to each other—as in the case of the strong inverse association between triglyceride levels and HDL cholesterol. In such situations it is statistically difficult—if not impossible—to determine which of the factors has causal priority with respect to CHD risk.27,28 Genetic variants associated with differences in one, but not others, of the correlated intermediate phenotypes could be related to disease risk and, in principle, in this way causal factors can be identified. However, in the case of HDL cholesterol, triglyceride, and CHD risk this has so far proved difficult to achieve.29

Third, genetic variants that modify the biological response to an environmental exposure, such as genetic variants related to alcohol metabolism30 or detoxification of organophosphates (as contained in sheep dip, for example31), can be taken as indicators of the effects of different levels of exposure (including some exposure versus no exposure), as we have discussed previously.12 Fourth, in some circumstances genetic variants can better characterize potentially modifiable exposures that are difficult to measure, either because of technical difficulty (for example, measuring colonic bile salt exposure—see below) or because life-time exposure is the relevant measure (for example, homocysteine or blood cholesterol levels). Fifth, maternal genotype can be studied as a determinant of intrauterine exposure acting on a developing fetus, in a form of ‘intergenerational Mendelian randomization’. Again we give examples of this below. Finally, genetic variants can be taken in a non-quantitative way to indicate specific or general categories of exposure that may influence disease risk. For example, the identification of a strong link between a frameshift mutation of the NOD2 gene and Crohn's disease32,33 has been taken to strengthen the view that bacterial components of the intestinal flora contribute to Crohn's disease risk. This is because the mutated gene is involved in sensing and responding to gram-negative bacteria.34 Perhaps (and in line with data regarding cohort effects in Crohn's disease incidence35) exposure early in life, during colonization by the intestinal flora, is of importance.36

We discuss the limitations of Mendelian randomization later in this commentary, but it must be remembered that the inferences outlined above are predicated on establishing robust associations between genetic variants and health outcomes, something which has so far proved difficult in the genetic association study field.37 It is also necessary to be aware that any association between a genetic variant and health outcome may reflect linkage disequilibrium between the variant under study and another variant influencing disease risk. Lastly, interpretation of associations between a genetic variant and health outcomes may not be straightforward if the variant has pleiotropic effects.


    Mendelian randomization in action
 
Perhaps a better sense for the potentials of Mendelian randomization comes from considering a few examples of the application of these principles. In several of the cases we discuss below we utilize data that were not originally presented in a way that emphasized the inferential strengths of the Mendelian randomization approach. While we categorize these examples according to the framework shown in Table 2, it is clear that some of them provide illustrations of more than one category of inference. These examples also provide practical insights into the limitations of Mendelian randomization, which we have discussed at length elsewhere.12

Exposure propensity
Lactose intolerance and the health effects of drinking milk
The letter by Katan that we reprint in this issue of the International Journal of Epidemiology was an early example of development of the concept of Mendelian randomization. Other—including earlier—examples exist, but the reasoning for the use of genetic variant–disease associations in making inferences about modifiable causes of disease was generally not as clearly formulated as in Katan's letter. One important early example relates to the longstanding interest in the role of dietary calcium—particularly from milk products—in protection against low bone mineral density, osteoporosis, and fractures. It has been reasoned that people who are lactose intolerant—a state associated with lower intake of the dairy products that provide a high proportion of calcium in the diets of people in North America and Europe—should have an increased risk of these conditions, if low milk (and thus calcium) intake is an important aetiological factor.

Several studies have explicitly tested this proposition.38,39 The notion that genetic variation in lactose tolerance may serve as a better index of calcium intake than direct dietary assessment—an example of our category ‘characterizing difficult to measure environmental exposures’—was proposed by Honkanen and colleagues:40

Calcium intake relates only modestly to bone mineral density. One of the main reasons why the verification of this association has been difficult is the lack of accuracy of calcium intake measurements. Self-reported lactose intolerance, as an indicator of long-term low calcium intake, might, therefore, help to detect this slight bone effect better than food frequency enquiry alone.

The association between milk intake and CHD risk has been highly controversial,41 with some studies suggesting that higher consumption may protect people from CHD.42,43 However it has been argued that people with existing heart disease might avoid milk as part of general lifestyle modifications (and this would, of course, generate a non-causal association between milk drinking and lower CHD event rate).44 In some studies there is also evidence of confounding by socio-economic and lifestyle factors, such that people who drink more milk come from more favourable socio-economic backgrounds and exhibit more healthy behavioural profiles.43,44 Adjustment for these factors attenuates the apparent protective effect of milk.43,44 Given the high fat content of milk it might be anticipated that milk would increase coronary risk, but the degree of potential reverse causation and confounding in the observational epidemiological studies leaves the true situation unclear. Segall reported an inverse association between the prevalence of lactose intolerance and CHD mortality in an ecological study, indicating that milk consumption increases risk.45 While this 1980 study could be considered an early example of Mendelian randomization—in that a genetic variant is used as a proxy for a modifiable environmental exposure—ecological studies are prone to confounding by the many factors that differ between countries and could be related to both prevalence of lactose intolerance and CHD risk. 
Segall proposed that the hypothesis could be tested by direct comparison of the CHD risk in lactose tolerant and lactose intolerant people.45 A case-control study found myocardial infarction risk to be lower in those who were lactose intolerant: odds ratio 0.45 (95% confidence interval 0.19, 1.08; our calculation).46 While full details of the results were not given, the authors concluded that lactose tolerance ‘does not seem to be a risk factor for myocardial infarction by itself but is a precondition for the ability to drink a lot of milk without getting complaints’.46 While these data suggest that milk drinking could increase the risk of CHD it is clear that more powerful studies are required.
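The imprecision of the case-control estimate quoted above can be gauged from the published odds ratio and confidence interval alone. The sketch below assumes that the interval was constructed on the log odds scale using a normal approximation; this is an assumption for the illustration, since the calculation method is not stated in the text. The function names are hypothetical.

```python
import math


def se_and_z_from_ci(or_point, lower, upper, z_crit=1.96):
    """Back-calculate the standard error of log(OR) and the z statistic.

    Assumes a symmetric normal-approximation interval on the log odds scale.
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(or_point) / se
    return se, z


def two_sided_p(z):
    """Two-sided p value for a standard normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


# Odds ratio for myocardial infarction in lactose intolerant people, as quoted above.
se, z = se_and_z_from_ci(0.45, 0.19, 1.08)
p = two_sided_p(z)

print(f"SE of log(OR) = {se:.2f}, z = {z:.2f}, two-sided p = {p:.3f}")
```

Under this assumption the estimate corresponds to a z statistic of about -1.8 and a two-sided p value of about 0.07, which is in keeping with the statement that more powerful studies are required.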

Recently a genetic variant underlying lactose intolerance (or at least in tight linkage disequilibrium with such a variant) has been identified.47 Given that some lactose intolerance could be an acquired phenomenon (and therefore potentially related to existing disease and/or confounding factors) and that it takes considerable time and expense to evaluate the lactose intolerance phenotype, the identification of this variant allows for a more reliable and feasible use of lactose intolerance within the Mendelian randomization framework. One situation in which the study of acquired lactose intolerance could confuse interpretation is with respect to inflammatory bowel disease. An increased prevalence of lactose intolerance has been reported in patients with Crohn's disease,48 but this could be a secondary phenomenon due to the mucosal inflammation consequent on the disease. A recent study demonstrated that this appears to be the case, since there is no difference in prevalence of the genotype related to lactose tolerance between Crohn's disease patients and controls.49 This is an example of Mendelian randomization being used to study the phenomenon of reverse causation, in this case suggesting that reverse causation was indeed instrumental in generating the observed phenotype–disease association. The genetic variant associated with lactose tolerance has recently been reported to be related both to milk consumption and to the risk of fractures,50 although larger studies are required to confirm this finding.

Lactose intolerance differs markedly between populations, and thus there could be confounding between genotype, population of ancestral origin, and disease outcome. In the USA, for example, people with more recent African roots will have a higher prevalence of lactose intolerance than the European-derived population, and these populations will experience different risks of disease because of environmental or perhaps other genetic factors. Confounded associations between genotype and disease could be generated in this way. This form of confounding is referred to as population stratification, and while there is debate regarding the extent to which this can generate spurious associations,51–53 it remains important to bear this potential problem—common to all population genetic association studies, not just ones within a Mendelian randomization framework—in mind. Furthermore, with the considerable variation in lactose intolerance prevalence between populations, the associations between genotype and milk consumption may not be the same in all places. For example, in populations where the vast majority of people are lactose tolerant, this may generate cultural pressures such that lactose intolerant people drink at least some milk. This implies that it is always necessary to demonstrate that genetic variants are, indeed, related to the potentially modifiable environmental factor they are taken to proxy for when carrying out studies within the Mendelian randomization framework.
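How population stratification can generate a confounded genotype–disease association may be illustrated with expected counts. In the sketch below, two hypothetical subpopulations differ in both genotype frequency and disease risk, but within each subpopulation the genotype has no effect on disease. All numbers are invented for the illustration and do not describe any real population.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a, b: cases and non-cases with the genotype; c, d: cases and non-cases without it.
    """
    return (a * d) / (b * c)


def stratum(n, genotype_freq, disease_risk):
    """Expected 2x2 counts when genotype is unrelated to disease within the stratum."""
    with_g = n * genotype_freq
    without_g = n - with_g
    return (with_g * disease_risk, with_g * (1 - disease_risk),
            without_g * disease_risk, without_g * (1 - disease_risk))


# Hypothetical subpopulations: (size, frequency of the genotype, disease risk).
pop_a = stratum(10000, 0.80, 0.05)
pop_b = stratum(10000, 0.20, 0.10)

# Analysing both subpopulations together, ignoring ancestry.
pooled = tuple(x + y for x, y in zip(pop_a, pop_b))

print(f"odds ratio within population A: {odds_ratio(*pop_a):.2f}")
print(f"odds ratio within population B: {odds_ratio(*pop_b):.2f}")
print(f"odds ratio in pooled data:      {odds_ratio(*pooled):.2f}")
```

With these assumed inputs the odds ratio is 1.00 within each subpopulation but about 0.65 in the pooled data, an association produced entirely by the mixing of populations. Stratifying the analysis by population of ancestral origin removes it in this simplified setting.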

Intermediate phenotypes
Familial hypercholesterolaemia: estimating the cholesterol–CHD association
Another early example of the use of the principles of Mendelian randomization involves genetic variation in cholesterol levels and CHD risk. Familial hypercholesterolaemia is a dominantly inherited condition in which many rare mutations (over 700 DNA sequence variations54–56) of the low density lipoprotein receptor gene (about 10 million people affected world-wide, a prevalence of around 0.2%), lead to high circulating cholesterol levels.57 The high risk of premature CHD in people with this condition was readily appreciated, with an early UK report demonstrating that by age 50 half of men and 12% of women had suffered from CHD.58 Compared with the population of England and Wales (mean total cholesterol 6.0 mmol/l), people with familial hypercholesterolaemia (mean total cholesterol 9 mmol/l) suffered a 3.9-fold increased risk of CHD mortality, although very high relative risks among those aged less than 40 years have been observed.59 These observations, regarding genetically determined variation in risk, provided strong evidence that the associations between blood cholesterol and CHD seen in general populations reflected a causal relationship. However, as Ole Færgeman discusses,60 this evidence was not accepted, for a variety of reasons, by many clinical and public health practitioners.

With the advent of effective means of reducing blood cholesterol through statin treatment61 there remains no serious doubt that the cholesterol-CHD relationship is causal. Among people without CHD, reducing total cholesterol levels with statin drugs by around 1–1.5 mmol/l reduces CHD mortality by around 25% over 5 years.62–64 Assuming a linear relationship between blood cholesterol and CHD risk, and given the difference in cholesterol of 3.0 mmol/l between people with familial hypercholesterolaemia and the general population,59 the randomized controlled trial evidence on lowering total cholesterol and reducing CHD mortality would predict a relative risk for CHD of around 2, as opposed to 3.9, for people with familial hypercholesterolaemia. However, the trials also demonstrate that the relative reduction in CHD mortality increases over time from randomization—and thus time with lowered cholesterol—as would be expected if elevated levels of cholesterol operate over decades to influence the development of atherosclerosis. People with familial hypercholesterolaemia will have had high total cholesterol levels throughout their lives and this would be expected to generate a greater risk than that predicted by the results of lowering cholesterol levels for only 5 years. Furthermore, ecological studies relating cholesterol levels to CHD demonstrate that the strength of association increases as the lag period between cholesterol level assessment and CHD mortality increases,65 again suggesting that long-term differences in cholesterol level are the important aetiological factor in CHD. As discussed above, Mendelian randomization is one method of assessing the effects of long-term differences in exposures on disease risk, free from the diluting problems of both measurement error and of only having short-term assessment of risk factor levels. 
This approach may provide an indication that cholesterol-lowering efforts should be lifelong, rather than limited to the period for which RCT evidence with respect to CHD outcomes is available.
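The extrapolation from the trial evidence to familial hypercholesterolaemia can be reproduced approximately as follows. The text assumes a linear relationship between blood cholesterol and CHD risk; the sketch below instead assumes, for the purposes of illustration only, a log-linear (multiplicative) relationship, which is one common way of scaling a relative risk to a different exposure contrast. The function name and its parameters are hypothetical.

```python
def predicted_relative_risk(chol_excess, trial_rr=0.75, trial_reduction=1.25):
    """Relative risk predicted for a cholesterol excess of chol_excess mmol/l.

    Assumes a log-linear relationship: a reduction of trial_reduction mmol/l
    multiplies CHD mortality by trial_rr, and effects compound multiplicatively.
    """
    return (1 / trial_rr) ** (chol_excess / trial_reduction)


# A 25% reduction in CHD mortality for a 1 to 1.5 mmol/l reduction in cholesterol,
# extrapolated to the 3.0 mmol/l excess in familial hypercholesterolaemia.
rr_if_trial_reduction_is_1_5 = predicted_relative_risk(3.0, trial_reduction=1.5)
rr_if_trial_reduction_is_1_0 = predicted_relative_risk(3.0, trial_reduction=1.0)

print(f"predicted relative risk: {rr_if_trial_reduction_is_1_5:.2f} "
      f"to {rr_if_trial_reduction_is_1_0:.2f}")
```

Under this assumption the predicted relative risk lies between about 1.8 and 2.4, in line with the figure of around 2 given above and well below the observed 3.9.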

More recently, mutations in the gene coding for apolipoprotein B (apoB) have been found to produce a syndrome phenotypically indistinguishable from familial hypercholesterolaemia—familial defective ApoB.66–68 In a recent study of the Arg3500Gln mutation of the APOB gene, the basic principle behind Mendelian randomization can be demonstrated, in that Arg3500Gln heterozygotes had higher levels of total cholesterol but other CHD risk factors (including triglycerides, fibrinogen, glucose, body mass index and waist-hip ratio) did not differ from non-heterozygotes in the general population.69 The Arg3500Gln heterozygotes had a median 2.6 mmol/l higher blood cholesterol level and a high (but imprecise) odds ratio for CHD of 7.0 (95% CI 2.2, 22) compared with the general population.69 As in the case of familial hypercholesterolaemia this is greater than that predicted by the randomized controlled trial data, but again the differences in cholesterol by genotype will have been life-long, and the elevated CHD risk probably reflects the effects of long-term differences in cholesterol level.

Indicating the category of exposure causing disease risk
Vitamin D, sunlight, tuberculosis, and multiple sclerosis
In the 18th century, fish oil was a recommended treatment for tuberculosis and in the 1940s reports of use of vitamin D to treat skin tuberculosis (Lupus vulgaris) appeared.70 Solaria—rooms to concentrate the sun's rays—were also prevalent as a means of treating tuberculosis (see box and Figure 1).71 More recently, deficiency of vitamin D (25-hydroxycholecalciferol) has been shown to be associated with increased risk of infection with Mycobacterium tuberculosis, with the proposed mechanism implicating 1,25-dihydroxycholecalciferol (the active metabolite of vitamin D) in activating mononuclear phagocytosis of intracellular M. tuberculosis.72 The epidemiological evidence supporting this association comprises the following observations: an increased risk of tuberculosis among south Asians on migration to Britain's less sunny climate, which would lead to a reduction in serum vitamin D levels,73 particularly among vegetarian south Asians living in Britain;74 equivocal case-control studies comparing serum vitamin D in those with and without tuberculosis;75,76 and the annual seasonality of increased incidence of tuberculosis in spring and early summer following the low levels of vitamin D arising during the winter months.77 Clearly, such observations may be highly confounded by the various known socially and culturally patterned risk factors for tuberculosis. Evidence from a randomized controlled trial of vitamin D supplementation could establish causality and the need for a policy of widespread dietary supplementation of those at high risk.78 In the absence of such evidence, the principles of Mendelian randomization may be helpful.




 
Figure 1 The solarium at Jamnagar, India (above). Lenses and concentrators increase the efficiency of the solarium (below). Photographs: George Davey Smith

 

Box The solarium at Jamnagar

The widespread belief that sunlight was beneficial to health was shared by physicians working in many areas of medicine. Sunlight exposure was recommended for several disorders, and ways of increasing exposure were contrived. The solarium at Jamnagar, Gujarat, India (Figure 1), is an impressive example of this. The solarium was built on the initiative of the ruler of Nawanagar state, the cricketer Ranjitsinghi. It was designed by a French engineer, Dr Jean Saidman (who built three of these solariums), and was operational from 1934. The Jamnagar solarium is 40 feet tall and the treatment rooms are located in the rotating top section, which is 114 feet long and takes an hour to rotate fully. Maximal light exposure can be ensured by rotation. Some treatment rooms are equipped with filters which allow through only rays of wavelengths considered suitable for the various diseases treated in the solarium, and lenses concentrate the light to two and a half times its natural intensity. The solarium no longer works because most of the lenses and concentrators were broken during a cyclone and replacements cannot be found. A detailed photographic library provides before and after views of people treated for various conditions, including lymphoid hyperplasias, tuberculosis, and several skin conditions.

 

Several polymorphisms of the vitamin D receptor (VDR) gene have been identified—of which TaqI, BsmI, ApaI and FokI polymorphisms have been most studied in the context of examining associations with common phenotypes.79 Rarer mutations of the VDR gene are responsible for the autosomal recessive hereditary vitamin D-resistant rickets, but the functional importance, in terms of VDR expression and function in vitro and vitamin D levels and calcium absorption in vivo, of the common polymorphisms is less clear, with some variants likely to be non-functional (BsmI and ApaI), and with evidence of linkage disequilibrium between TaqI, BsmI, ApaI variants but not FokI.79,80 A strong association of the TaqI polymorphism with tuberculosis has been reported among Gambian adults, with the rarer homozygous genotype related odds ratio of tuberculosis being 0.53 (95% CI 0.31, 0.88).81 This study was interpreted as indicating that this genotype conferred resistance to tuberculosis, prompting the investigators to suggest that a trial of vitamin D supplementation in tuberculosis was warranted. In a further case-control study82 among Gujarati adults in London, which examined the association between this same TaqI polymorphism and tuberculosis, the rarer homozygous genotype was less frequent in people with tuberculosis, although the relationship was statistically not robust. A very strong association was found between vitamin D levels and odds of tuberculosis, and apparent interactions between vitamin D deficiency and the non-protective wild type and heterozygous genotypes were found.83,84 The FokI combined rarer homozygous and heterozygous genotypes also showed weak evidence of association with tuberculosis risk (odds ratio 1.32, 95% CI 0.72, 2.41) in this study.

Our re-analysis of these data82 provides a Mendelian randomization assessment of the association between genotype and disease and between vitamin D and disease. Median vitamin D levels were 5 nmol/l higher in controls compared with tuberculosis patients, and the odds ratio for tuberculosis among those with higher levels of vitamin D was 0.17 (95% CI 0.07, 0.39), indicating a strong protective effect. The vitamin D levels between the TaqI genotypes were compared, and a statistically uncertain 3 nmol/l higher level among rarer homozygous and the heterozygous (denoted tt and Tt) genotypes compared with the wild type (denoted TT) genotype was found. Relating the TaqI polymorphism to risk of tuberculosis yields an odds ratio of 0.85 (95% CI 0.47, 1.53) for tt/Tt versus TT, providing weak evidence of a protective effect among those with genotypes conferring a higher vitamin D level. Comparisons of vitamin D levels between the tt and TT genotypes were not reported but might be expected to be larger than 3 nmol/l, and the odds ratio for tuberculosis of tt versus TT genotypes reflects this, with a greater protective effect: odds ratio 0.53 (95% CI 0.15, 1.69), a very similar estimate to that obtained by the Gambian study.81 Furthermore, the wild type genotype of the FokI polymorphism was associated with a vitamin D level 3.5 nmol/l higher than that of the other genotypes, and a similar protective effect on risk of tuberculosis as that observed with the TaqI genotype was found: odds ratio 0.76 (95% CI 0.41, 1.38). These findings are very imprecise, but consistent with the notion that vitamin D deficiency is causally related to increased risk of tuberculosis.
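The FokI odds ratio of 0.76 for the wild type genotype appears to be the same comparison as the odds ratio of 1.32 quoted earlier for the combined rarer homozygous and heterozygous genotypes, with the reference group reversed. A brief check, assuming that both figures derive from the same 2x2 table, is sketched below; the function name is hypothetical.

```python
def invert_odds_ratio(or_point, lower, upper):
    """Odds ratio and confidence limits after swapping the reference category.

    Taking reciprocals reverses the order of the limits.
    """
    return 1 / or_point, 1 / upper, 1 / lower


# FokI rarer homozygous/heterozygous genotypes versus wild type, as quoted earlier.
point, lower, upper = invert_odds_ratio(1.32, 0.72, 2.41)

print(f"wild type versus other genotypes: {point:.2f} ({lower:.2f}, {upper:.2f})")
```

The reciprocal is 0.76 (0.41, 1.39), which agrees with the reported 0.76 (0.41, 1.38) up to rounding of the published figures.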

Similar approaches may be applied in examining the hypothesis that vitamin D deficiency is of aetiological importance in multiple sclerosis, where observational epidemiology suggests that sunlight exposure, and hence high vitamin D status, may reduce risk.85 A report examining the association between the BsmI polymorphism of the VDR gene and multiple sclerosis has shown an odds ratio of 2.38 (95% CI 1.03, 5.60) comparing the rarer homozygous variant (bb) with the heterozygous and wild type variants,86 giving supportive evidence that vitamin D deficiency may be a causal factor. Vitamin D status and ultraviolet radiation have been implicated in other autoimmune diseases,87,88 and Mendelian randomization may prove helpful in elucidating causality.

Little can currently be concluded from studies relating vitamin D receptor polymorphisms to disease outcomes, partly because whether there is any functional effect of most of the variants that have been studied is uncertain. Furthermore, interpretation is clouded by the fact that a balance may exist between VDR sensitivity and vitamin D levels, both of which may be related to VDR genotype (perhaps in an inter-connected fashion). As in other areas it will be necessary to conduct much bigger and more powerful studies and to better characterize the functional effects of the VDR polymorphisms before any firm conclusions can be reached.

Aspirin and colon cancer
In recent years there has been considerable interest in the possibility that aspirin reduces the risk of colon cancer. This interest originated in an unhypothesized finding in a case-control study exploring a large number of potential risk factors.89 This type of data-derived finding from an observational epidemiological study is particularly unreliable as a clue to aetiology, but several subsequent studies confirmed the finding.90 Confounding or various potential biases could nevertheless have generated the association. The consistency and strength of the basic finding are such, however, that randomized controlled trials are planned. Lin and colleagues approached this issue by examining variants in the gene coding for prostaglandin H synthase 2 (PTGS2), an enzyme involved in conversion of arachidonic acid to prostaglandin H2.91 This enzyme is inhibited by aspirin. By searching for naturally occurring variants, Lin and colleagues detected a single nucleotide polymorphism that produces an amino acid change in the enzyme.91 This variant was seen among African-Americans, but not among the other ethnic groups studied. In three small case-control studies Lin et al. found evidence that the variant was associated with a 30–50% reduced risk of colon cancer or colorectal adenomas among African-Americans. They hypothesized that naturally occurring PTGS2 variants might mimic long-term aspirin use. Although a similar association was seen in all three case-control studies, the sample sizes were small, particularly given the variant allele frequency of less than 5% among African-Americans. A larger study is required to confirm these exciting preliminary data. The data do, however, provide supportive evidence that aspirin (and other PTGS2 inhibitors) protect against colon cancer.

Modifiers of environmental exposures
Alcohol intake, aldehyde dehydrogenase genotype, and coronary heart disease
The possible protective effect of moderate alcohol consumption on CHD risk remains controversial.92–94 Non-drinkers may be at a higher risk of CHD because health problems (perhaps induced by previous alcohol abuse) dissuade them from drinking.95 As well as this form of reverse causation, confounding could play a role, with non-drinkers being more likely to display an adverse profile of socioeconomic or other behavioural risk factors for CHD.96 Alternatively, alcohol may have a direct biological effect that lessens the risk of CHD—for example by increasing the levels of protective high density lipoprotein (HDL) cholesterol.97 It is, however, unlikely that an RCT of alcohol intake, able to test whether there is a protective effect of alcohol on CHD events, will be carried out.

We have previously discussed the investigation of this issue through relating genetic variants associated with alcohol metabolism to CHD risk and HDL cholesterol levels.12 In this earlier case,30 the ADH3 variant was taken to be an indicator of the ‘active’ level of alcohol, given an effect on clearance of alcohol from the system. This association between the ADH3 variant and coronary heart disease is an example of a genetic variant that modifies an environmental exposure, and through this serves as an indicator of the effects of different levels of exposure. However, a more direct way of investigating whether alcohol influences the risk of disease is through the use of genetic variants that influence level of alcohol consumption; an example of ‘exposure propensity’ Mendelian randomization.

Alcohol is oxidized to acetaldehyde, which in turn is oxidized by aldehyde dehydrogenases (ALDHs) to acetate. Half of Japanese people are heterozygotes or homozygotes for a null variant of ALDH2, and peak blood acetaldehyde concentrations after an alcohol challenge are 18 times and 5 times higher among homozygous null variant and heterozygous individuals, respectively, than among homozygous wild type individuals.98 This renders the consumption of alcohol unpleasant through inducing facial flushing, palpitations, drowsiness, and other symptoms. As Table 3 shows, there are very considerable differences in alcohol consumption according to genotype.99 The principles of Mendelian randomization are seen to apply: two factors that would be expected to be associated with alcohol consumption, age and cigarette smoking, which would confound conventional observational associations between alcohol and disease, are not related to genotype, despite the strong association of genotype with alcohol consumption.


Table 3 Relationship between characteristics and ALDH2 genotype: the 2*2/2*2 and 2*2/2*1 genotypes are associated with avoidance of alcohol consumption99

 
It would be expected that ALDH2 genotype influences diseases known to be related to alcohol consumption, and as proof of principle it has been shown that ALDH2 null variant homozygosity (associated with low alcohol consumption) is indeed related to a lower risk of liver cirrhosis.100 Considerable evidence, including data from randomized controlled trials, suggests that alcohol increases HDL cholesterol levels101,102 (which should protect against CHD) and blood pressure (which should mitigate or reverse the protective effect of alcohol).103,104 In line with this, ALDH2 genotype is strongly associated with HDL cholesterol and hypertension in the expected direction (Table 3). Given the apparent protective effect of alcohol against CHD risk seen in observational studies, possession of the null ALDH2 allele (associated with lower alcohol consumption) should be associated with a greater risk of myocardial infarction, and this is what was seen in a case-control study.99 Men either homozygous or heterozygous for null ALDH2 were at twice the risk of myocardial infarction. Supporting the reasoning that the HDL cholesterol elevating effects of alcohol are what render it protective against coronary heart disease, statistical adjustment for HDL cholesterol greatly attenuated the association between ALDH2 genotype and CHD.

The ALDH2 example also illustrates some of the complexity in interpreting studies utilizing the Mendelian randomization approach. ALDH2 genotype could be used to study the association between alcohol intake and cancers for which alcohol is a putative cause. In the case of squamous cell carcinoma of the oesophagus the predictions from data suggesting that alcohol increases risk are borne out. In a case-control study, men homozygous for the ALDH2 null variant (and thus likely to drink much less alcohol than their counterparts) had a markedly reduced risk (OR 0.12; 95% CI 0.01, 0.46, our calculation)105 compared with the other men. Taken on its own, this finding provides strong confirmatory evidence that alcohol drinking increases risk of squamous cell cancer of the oesophagus. However, it is thought that acetaldehyde, an established animal carcinogen, might increase the risk of this cancer.106 At a given level of alcohol consumption men with one or two null ALDH2 alleles will have higher acetaldehyde levels, and indeed the data suggest that when drinking is held constant, null variant heterozygotes have an increased risk of oesophageal cancer.105 When a genetic variant has effects on a variety of processes, attribution of cause may not be straightforward. In some cases the processes may be linked, as when decreased clearance of acetaldehyde leads both to less alcohol consumption and to higher levels of a circulating carcinogen at a particular level of alcohol intake, and the ambiguity that Mendelian randomization is intended to resolve then returns.

Characterizing environmental exposures
Bile salts and colorectal cancer
Bile acids are created as part of the metabolism of cholesterol and are grouped into primary bile acids produced by the liver and stored in the gall bladder (these aid digestion of fats and are secreted after a meal) and secondary bile acids produced in the colon by the action of bacteria on primary bile acids. These secondary bile acids are returned to the liver in the entero-hepatic circulation via the portal vein with only a small proportion not being reabsorbed and appearing in the faeces. Active transport at the terminal ileum is the main means by which this reabsorption of bile acids occurs and genetic variations in the ileal sodium-dependent bile acid transporter gene may result in differences in the efficiency of the entero-hepatic reabsorption of bile acids. Epidemiological studies have examined the role of secondary bile acids in the aetiology of colon cancer, as they have the potential to damage cell membranes of the colonic mucosa and might thereby stimulate epithelial proliferation, increasing the risk of adenoma and cancers. A high risk of colonic cancer associated with elevated levels of faecal bile acids has been reported in a number of studies, using both prospective107–108 and case-control methods,109–111 although findings have been equivocal. One of the major problems for studies of this nature is characterizing the exposure variable—levels of secondary bile acids in the colonic lumen—accurately. In addition, many potential confounding factors, such as age, intestinal transit time, stool weight and hepatic function would need to be taken into account. Furthermore, in case-control designs it is highly likely that modifications in diet as a result of colonic disease would make assessments of faecal secondary bile acids unrepresentative of typical levels prior to the onset of disease. 
In these circumstances, Mendelian randomization may provide a method by which unbiased and unconfounded estimates of the effects on colon cancer risk of lifetime higher levels of colonic bile acids may be made, by comparing the risk of cancer in people with and without variants of the ileal sodium-dependent bile acid transporter gene. Such a study has been conducted examining the effects of polymorphisms of the SLC10A2 gene encoding an ileal sodium-dependent bile acid transporter.112 A case-control design with 458 colorectal adenomas and 504 controls reported an odds ratio of 2.06 (95% CI 1.10, 3.83) for a heterozygous 169 C→T variant compared with homozygous wild type. Interpretation is complicated by the lack of functional significance of the 169 C→T variant, but it is assumed to be in linkage disequilibrium with a functional variant, and functional variants of the SLC10A2 gene have been shown to cause primary bile acid malabsorption.113

Intergenerational influences
Methylenetetrahydrofolate reductase (MTHFR) polymorphisms and neural tube defects
Examining the effects of genotype of parents on the health outcomes of their children introduces the idea of ‘intergenerational’ Mendelian randomization. In these circumstances, exposures of interest relate to aspects of the intrauterine environment that are difficult to measure, but are modified by parental genotype. For example, folate deficiency in pregnancy is now known to be a cause of neural tube defects (NTDs), an effect confirmed by randomized controlled trial evidence.114,115 The MTHFR 677 C→T polymorphism is associated with increased blood levels of homocysteine (equivalent to the situation resulting from lower levels of folate intake) and in a meta-analysis of case-control studies of NTDs, TT mothers had a 2-fold higher risk of having an infant with a neural tube defect than CC mothers.116 The relative risk of a neural tube defect associated with the TT genotype in the infant was less than that observed with respect to maternal genotype, and there was no effect of paternal genotype on offspring neural tube defect risk. This suggests that it is the intrauterine environment (influenced by maternal TT genotype), rather than the genotype of offspring, that increases the risk of NTD, as we have discussed previously in more detail.12

Glucokinase polymorphisms and birthweight of offspring
A mutation in the glucokinase gene related to raised blood glucose has been identified, and pregnant women who are heterozygous for this variant provide an intrauterine environment that will, on average, expose the fetus to higher glucose levels than is the case for the fetuses of pregnant women not carrying the mutation.117 This permits examination of the role of an environmental factor, intrauterine glucose level, on outcomes of importance for the health of the infant. Among offspring who themselves are not carriers of the mutation, birthweight is over 600 g higher if the mother carries the mutation (and thereby on average has higher glucose levels) than if the mother does not carry the mutation,117 demonstrating a clear effect of intrauterine exposure to maternal glucose levels on birthweight. The long-term effects of intrauterine glucose could be studied through comparing such offspring at a later stage of life with respect to growth, insulin resistance and risk of diabetes. Preliminary data suggest that while maternal hyperglycaemia generated by the glucokinase mutation has a large influence on birthweight of offspring, it does not have any substantial impact on weight, body mass index, height and various markers of the insulin resistance syndrome among adult offspring.118 However, this comparison is based on a total of only 81 adult offspring, so the power to detect effects of plausible magnitude is low, and further investigation using this unique model would be valuable.


    Mendelian randomization—proving a negative?
 
Mendelian randomization as a means of testing whether environmental exposures are causal or not has considerable strengths, in that it provides estimates that are largely unconfounded and free from the effects of reverse causality. However, statistical power is low when Mendelian randomization is used in an attempt to refute a causal association. Very large sample sizes may be needed to give sufficiently precise estimates of genotype–phenotype effect to exclude differences of public health and clinical importance.

Consider the example of the association between fibrinogen and CHD, which we have discussed previously.12,13 Youngman et al.8 examined a polymorphism in the promoter region of the β-fibrinogen gene which had a per-allele influence on plasma fibrinogen levels of 0.12 g/l. In their large case-control study, fibrinogen was related to CHD risk, with 0.12 g/l higher fibrinogen being associated with a relative risk of CHD of 1.20 (95% CI 1.13, 1.26). Since 0.12 g/l is the per-allele difference in fibrinogen, it would therefore be predicted that there should be a per-allele effect on CHD, with a relative risk of approximately 1.20. However, when genotype was related to CHD risk, essentially no relationship was seen, with a per-allele relative risk of 1.03 (95% CI 0.96, 1.10), excluding the estimate obtained by examining the fibrinogen–CHD association. Uncontrolled confounding is a plausible explanation for the difference between the phenotype–CHD and genotype–CHD associations, as fibrinogen levels are higher in smokers, those from deprived backgrounds or with low educational achievement, shorter people (who have higher risk of CHD), and those with raised serum cholesterol levels.119 Reverse causality may also play a part here, as early, pre-symptomatic atheroma can itself lead to raised fibrinogen levels, generating an automatic (but non-causal) prospective association between fibrinogen and risk of CHD events.
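The comparison between the predicted and observed per-allele relative risks can be sketched numerically. The Python fragment below is an illustration under stated assumptions, not the analysis performed by Youngman et al.: it treats the reported 95% CIs as Wald intervals on the log scale and, as a simplification, treats the two estimates as independent even though they come from the same study. All variable names are hypothetical.

```python
import math

def log_se_from_ci(lower, upper, z_crit=1.96):
    """Approximate SE of a log relative risk from its 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * z_crit)

# Predicted per-allele RR: the fibrinogen-CHD association per 0.12 g/l
predicted_rr, pred_lo, pred_hi = 1.20, 1.13, 1.26
# Observed per-allele RR: the genotype-CHD association
observed_rr, obs_lo, obs_hi = 1.03, 0.96, 1.10

# Does the genotype CI exclude the predicted value?
predicted_excluded = not (obs_lo <= predicted_rr <= obs_hi)

# Rough z-test of the difference (independence is an assumption here)
diff = math.log(predicted_rr) - math.log(observed_rr)
se_diff = math.hypot(log_se_from_ci(pred_lo, pred_hi),
                     log_se_from_ci(obs_lo, obs_hi))
z = diff / se_diff
p = math.erfc(abs(z) / math.sqrt(2))

print(f"predicted RR excluded by genotype CI: {predicted_excluded}")
print(f"z = {z:.2f}, two-sided P = {p:.2g}")
```

On these figures the upper confidence limit of the genotype estimate (1.10) lies below the predicted relative risk of 1.20, which is the sense in which the genotype data exclude the within-study fibrinogen–CHD estimate.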

Very large sample sizes are, however, required to provide sufficiently precise estimates of genotype–disease associations to exclude a clinically relevant effect, owing to the small effect sizes expected. The relationships between genotype and disease risk and genotype and fibrinogen levels observed in the study by Youngman et al.8 imply a risk ratio of 1.3 (95% CI 0.7, 2.2) for a 1 g/l increase in fibrinogen, assuming a linear relationship between fibrinogen and log-risk of CHD. This can be compared with the relative risk of 1.8 (95% CI 1.6, 2.0) for a 1 g/l difference in fibrinogen reported in a meta-analysis of fibrinogen–CHD observational epidemiological studies;120 a test of the difference between these two estimates yields a P value of 0.24. The reason that the genotype–CHD association in the Youngman study is incompatible with the fibrinogen–CHD association in the same study is that the latter is of considerably greater magnitude than is seen in the systematic review of this issue. This could reflect greater reverse causation in the Youngman case-control study than in prospective studies (all patients had experienced a heart attack before fibrinogen was measured), or the considerable differential response rates between cases and controls,121 which could have generated a strongly confounded association. A sample size of around 30 000 cases (and the same number of controls) would be required to have 80% power to exclude a relative risk of CHD of 1.5 for the difference in fibrinogen between the top and bottom thirds of the population if there is no true fibrinogen–CHD association.13
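The rescaling from a per-allele relative risk to a relative risk per 1 g/l of fibrinogen can be reproduced approximately as follows. This Python sketch assumes the log-linear relationship stated in the text, treats the per-allele fibrinogen difference of 0.12 g/l as known without error, and approximates standard errors from the published CIs. It is not the authors' code, and the names are hypothetical.

```python
import math

PER_ALLELE_FIBRINOGEN = 0.12  # g/l per allele, as quoted in the text

def rescale_rr(rr, per_unit=PER_ALLELE_FIBRINOGEN):
    """Convert an RR per `per_unit` g/l into an RR per 1 g/l
    (assumes log-risk is linear in fibrinogen)."""
    return math.exp(math.log(rr) / per_unit)

def log_se_from_ci(lower, upper, z_crit=1.96):
    return (math.log(upper) - math.log(lower)) / (2 * z_crit)

# Genotype-based estimate: per-allele RR 1.03 (95% CI 0.96, 1.10)
geno_rr, geno_lo, geno_hi = (rescale_rr(x) for x in (1.03, 0.96, 1.10))

# Within-study phenotype estimate: RR 1.20 per 0.12 g/l
pheno_rr = rescale_rr(1.20)

# Meta-analysis of observational studies: RR 1.8 (95% CI 1.6, 2.0) per 1 g/l
meta_rr, meta_lo, meta_hi = 1.8, 1.6, 2.0

# Approximate test of the difference between genotype and meta-analysis
diff = math.log(meta_rr) - math.log(geno_rr)
se_diff = math.hypot(log_se_from_ci(geno_lo, geno_hi),
                     log_se_from_ci(meta_lo, meta_hi))
p_diff = math.erfc(abs(diff / se_diff) / math.sqrt(2))

print(f"genotype-based RR per 1 g/l: {geno_rr:.1f} ({geno_lo:.1f}, {geno_hi:.1f})")
print(f"within-study phenotype RR per 1 g/l: {pheno_rr:.1f}")
print(f"P for difference vs meta-analysis: {p_diff:.2f}")
```

Because the inputs are rounded published figures, the P value from this approximation is close to, rather than identical with, the reported 0.24. The sketch also shows how much larger the within-study phenotype association is than the meta-analytic estimate once both are expressed per 1 g/l.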

Limitations of Mendelian Randomization
We dealt extensively with the limitations of Mendelian randomization in our earlier paper12 (see Table 4), and these have also been discussed elsewhere.4–6,122 Interestingly, several of these limitations were recognized during early discussions of what has come to be known as Mendelian randomization. For example, Newcomer et al., who related lactose intolerance to osteoporosis, recognized both confounding by ethnic/racial group (i.e. population stratification) and pleiotropy as potential problems.39 With respect to Katan's original example, of apolipoprotein E (apo E) polymorphisms and cancer as a test of the cholesterol–cancer association, some recent (but under-powered) data have suggested that an association may exist.123 However, as we discussed earlier,12 apo E variants have highly pleiotropic effects, which complicates interpretation. The APOE gene is associated with longevity (ε2 alleles have an increased frequency in aged populations), although the mechanisms are unclear.124 Associations with blood LDL cholesterol and increased CHD125 and stroke126 risk are found with possession of ε4 alleles. The association between the ε4 allele and Alzheimer's disease has been widely replicated, with a meta-analysis of studies showing a 15-fold increased risk in those with the ε4ε4 genotype.127 APOE gene variants have been associated with a wide range of other diseases, including modification of the response to head injury, gall stones, osteoporosis, herpes simplex virus type 1 cold sores, and retinitis pigmentosa.128 Many of these associations may simply represent false positives, as this gene has been more widely characterized than most in population surveys capable of sustaining genetic association studies. However, there is certainly suggestive evidence that the polymorphism (or variants in linkage disequilibrium with it) influences a variety of pathophysiological processes. 
For example, ε2 alleles are associated with lower cholesterol level (and thus an expectation of reduced CHD risk) but also with greater postprandial lipaemia, which could (and appears to) counteract this protective effect.12 Therefore interpretation of associations between the polymorphism and disease outcomes as reflecting the effects of cholesterol differences between APOE genotypes is problematic. The various pleiotropic effects of APOE genotype thus provide difficulties in, for example, interpreting the association between APOE genotypes and Alzheimer's disease in terms of an association between cholesterol and Alzheimer's disease,129 and the same applies to the association of cholesterol and cancer originally discussed by Katan.1


Table 4 Limitations of Mendelian randomisation12

 
Confounding (as in population stratification or linkage disequilibrium) or pleiotropy as problems for Mendelian randomization can, at least, be studied through measuring potential confounding factors (as has been done for ALDH2 genotype in Table 3, and for MTHFR genotype elsewhere12). Canalization or developmental compensation is more difficult to study. These processes are considered to reflect developmental buffering against the effect of a polymorphism during fetal development (and perhaps post-natal development). However, different categories of Mendelian randomization are differentially susceptible to this problem. As in the earlier example of ALDH2 polymorphisms and alcohol, if a genotype is associated with a behaviour which is generally only adopted after development has ceased (as is the case with alcohol consumption), canalization is not an issue: in these circumstances genotype–disease associations should reflect associations between the behaviour of interest and health outcomes. Furthermore, intergenerational Mendelian randomization is not susceptible to this problem, since the phenotypic expression (and any canalization) of the parental genotype occurs in the parents, not in the offspring. Adaptation of offspring during development would be the same whether the intrauterine environment was influenced by maternal genotype or by maternal environmental factors. Consequently, the effect of the genotype of the parent on the disease risk of the offspring can be considered free of possible effects of canalization.

Population attributable risk and non-removable genetic factors
Many critiques of the conceits of genetic epidemiology focus on two features of findings from genetic association studies: that the population attributable risk of the genetic variants is low, and that in any case the influence of genetic factors is not reversible. Illustrating both of these criticisms, Terwilliger and Weiss130 suggest as reasons for considering that many of the current claims regarding genetic epidemiology are hype (1) that alleles identified as increasing the risk of common diseases ‘tend to be involved in only a small subset of all cases of such diseases’ and (2) that in any case ‘while the concept of attributable risk is an important one for evaluating the impact of removable environmental factors, for non-removable genetic risk factors, it is a moot point’.

These evaluations of the role of genetic epidemiology are not relevant when considering the potential contributions of Mendelian randomization. This approach is not concerned with the population attributable risk of any particular genetic variant, but with the degree to which associations between the genetic variant and disease outcomes can demonstrate the importance of environmentally modifiable factors as causes of disease. Consider, for example, the case of familial hypercholesterolaemia or familial defective apo B. The genetic mutations associated with these conditions will only account for a trivial percentage of cases of CHD within the population, i.e. the population attributable risk will be low. For example, in a Danish population, the frequency of familial defective apo B is 0.08% and, despite the 7-fold increased risk of CHD associated with it, this will only generate a population attributable risk of 0.5%.69 However, by identifying blood cholesterol levels as a causal factor for CHD, the triangular association between genotype, blood cholesterol and CHD risk identifies an environmentally modifiable factor with a very high population attributable risk: assuming that 50% of the population have raised blood cholesterol above 6.0 mmol/l and that this is associated with a 2-fold relative risk, a population attributable risk of 33% is obtained. The same logic applies to the other examples discussed above: the attributable risk of the genotype is low, but the population attributable risk of the modifiable environmental factor identified as causal through the genotype–disease associations is large.
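The two population attributable risk figures quoted above can be checked with a short calculation. The Python sketch below assumes Levin's formula, PAR = p(RR − 1) / [1 + p(RR − 1)], where p is the prevalence of the exposure and RR the associated relative risk. The text does not state which formula was used, so this is an assumption, although it reproduces both quoted values. The function name is hypothetical.

```python
def population_attributable_risk(prevalence, relative_risk):
    """Levin's formula for the population attributable risk (a fraction).

    prevalence: proportion of the population exposed (0 to 1)
    relative_risk: risk in exposed relative to unexposed
    """
    excess = prevalence * (relative_risk - 1.0)
    return excess / (1.0 + excess)

# Familial defective apo B: frequency 0.08%, 7-fold risk of CHD
par_apo_b = population_attributable_risk(0.0008, 7.0)

# Raised blood cholesterol: 50% of the population, 2-fold risk
par_cholesterol = population_attributable_risk(0.50, 2.0)

print(f"familial defective apo B: {par_apo_b:.1%}")
print(f"raised blood cholesterol: {par_cholesterol:.0%}")
```

The contrast between the two results illustrates the argument in the text: a rare variant with a large relative risk contributes little to the population burden, while the common modifiable exposure it points to contributes a great deal.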

The same reasoning applies when considering the suggestion that since genotype cannot be modified, genotype–disease associations are not of public health importance.130 The point of Mendelian randomization approaches is not to attempt to modify genotype, but to utilize genotype–disease associations to strengthen inferences regarding modifiable environmental risks for disease, and then reduce disease risk in the population through applying this knowledge.


    Conclusions
 
Mendelian randomization approaches have the potential to contribute to an improved understanding of the aetiological importance of environmental factors in common chronic diseases, through reducing the influence on estimated associations of confounding, reverse causation, and various other sources of bias. Categories of inference from Mendelian randomization studies involve propensity to being exposed to a risk factor, proxies for intermediate phenotypes, modifiers of environmental exposures, studying intergenerational exposures, and identifying the broad categories of exposure that may be aetiologically important, and thus should be investigated further. Limitations of the approach must be acknowledged; most are common to any genetic epidemiology enterprise and some should be reduced through application of the growing knowledge provided by the study of functional genomics.



Figure 2 Gregor Mendel: progenitor of Mendelian randomization. Photograph courtesy of Mendel Center, Brno, Czech Republic, printed with permission.

 

    Acknowledgments
 
Ian Day, Debbie Lawlor, Sarah Lewis, John Lynch, Neil Pearce, Jonathan Sterne, and Nic Timpson provided helpful comments on earlier drafts of the manuscript.


    References
 
1 Katan MB. Apolipoprotein E isoforms, serum cholesterol, and cancer. Lancet 1986;i:507–8. (Reprinted Int J Epidemiol 2004;33:9.)

2 Katan MB. Commentary: Mendelian randomization, 18 years on. Int J Epidemiol 2004;33:10–11.

3 Wheatley K, Gray R. Commentary: Mendelian randomization—an update on its use to evaluate allogenic stem cell transplantation in leukaemia. Int J Epidemiol 2004;33:15–17.

4 Brennan P. Commentary: Mendelian randomization and gene–environment interaction. Int J Epidemiol 2004;33:17–21.

5 Thomas DC, Conti DV. Commentary: The concept of ‘Mendelian Randomization’. Int J Epidemiol 2004;33:21–25.

6 Tobin MD, Minelli C, Burton PR, Thompson JR. Commentary: Development of Mendelian randomization: from hypothesis test to ‘Mendelian deconfounding’. Int J Epidemiol 2004;33:26–29.

7 Keavney B. Commentary: Katan's remarkable foresight: genes and causality 18 years on. Int J Epidemiol 2004;33:11–14.

8 Youngman LD, Keavney BD, Palmer A et al. Plasma fibrinogen and fibrinogen genotypes in 4685 cases of myocardial infarction and in 6002 controls: test of causality by ‘Mendelian randomization’. Circulation 2000;102 (Suppl II):31–32.

9 Clayton D, McKeigue PM. Epidemiological methods for studying genes and environmental factors in complex diseases. Lancet 2001;358:1356–60.

10 Fallon UB. Homocysteine and coronary heart disease—Author's reply. Heart 2001, published online March 14 (http://heart.bmjjournals.com/cgi/eletters/85/2/153).

11 Keavney B. Genetic epidemiological studies of coronary heart disease. Int J Epidemiol 2002;31:730–36.

12 Davey Smith G, Ebrahim S. ‘Mendelian randomization’: can genetic epidemiology contribute to understanding environmental determinants of disease? Int J Epidemiol 2003;32:1–22.

13 Davey Smith G, Harbord R, Ebrahim S. Fibrinogen, C-reactive protein and CHD: Does Mendelian randomisation suggest the associations are non-causal? Q J Med 2004;97:163–66.

14 Berkson J. Limitations of the application of fourfold table analysis to hospital data. Biometrics Bull 1946;2:47–53.

15 Thun MJ, Peto R, Lopez AD et al. Alcohol consumption and mortality among middle-aged and elderly U.S. adults. New Engl J Med 1997;337:1705–14.

16 Langer RD, Criqui MH, Reed DM. Lipoproteins and blood pressure as biological pathways for effect of moderate alcohol consumption on coronary heart disease. Circulation 1992;85:910–15.

17 Klatsky AL. Alcohol, coronary disease and hypertension. Ann Rev Med 1996;47:149–60.

18 Prospective Studies Collaboration. Age-specific relevance of usual blood pressure to vascular mortality: a meta-analysis of individual data for one million adults in 61 prospective studies. Lancet 2002;360:1903–13.

19 Gill JS, Shipley MJ, Tsementzis SA et al. Alcohol consumption—a risk factor for hemorrhagic and non-hemorrhagic stroke. Am J Med 1991;90:489–97.

20 Gill JS, Zezulka AV, Shipley MJ, Gill SK, Beevers DG. Stroke and alcohol consumption. New Engl J Med 1986;315:1041–46.

21 Hart C, Davey Smith G, Hole D, Hawthorne V. Alcohol consumption and mortality from all causes, coronary heart disease, and stroke: results from a prospective cohort study of Scottish men with 21 years of follow up. Br Med J 1999;318:1725–29.

22 Spearman C. The proof and measurement of association between two things. Am J Psychol 1904;15:72–101.

23 Davey Smith G, Phillips AN. Inflation in epidemiology: ‘The proof and measurement of association between two things’ revisited. Br Med J 1996;312:1659–61.[Free Full Text]

24 Macmahon S, Peto R, Cutler J et al. Blood pressure, stroke, and coronary heart disease. Part 1, prolonged differences in blood pressure: prospective observational studies corrected for the regression dilution bias. Lancet 1990;335:765–74.[ISI][Medline]

25 The Homocysteine Studies Collaboration. Homocysteine and risk of ischemic heart disease and stroke. A meta-analysis. JAMA 2002;288:2015–22.[Abstract/Free Full Text]

26 Klerk M, Verhoef P, Clarke R et al. MTHFR 677C->T polymorphism and risk of coronary heart disease. A meta-analysis. JAMA 2002;288:2023–31.[Abstract/Free Full Text]

27 Phillips A, Davey Smith G. How independent are ‘independent’ effects? Relative risk estimation when correlated exposures are measured imprecisely. J Clin Epidemiol 1991;44:1223–31.[ISI][Medline]

28 Egger M, Davey Smith G, Pfluger D, Altpeter E, Elwood PC. Triglyceride as a risk factor for ischaemic heart disease in British men: effect of adjusting for measurement error. Atherosclerosis 1999;143:275–84.[CrossRef][ISI][Medline]

29 Prediman K, Shah MD, Kaul S, Nilsson J, Carcek B. Exploiting the vascular protective effects of high-density lipoprotein and its apolipoproteins: an idea whose time for testing is coming, part 1. Circulation 2001;104:2376–83.[Free Full Text]

30 Hines LM, Stampfer MJ, Ma J et al. Genetic variation in alcohol dehydrogenase and the beneficial effect of moderate alcohol consumption on myocardial infarction. N Engl J Med 2001; 344:549–55.[Abstract/Free Full Text]

31 Cherry N, Mackness M, Durrington P et al. Paraoxonase (PON1) polymorphisms in farmers attributing ill health to sheep dip. Lancet 2002;359:763–64.[CrossRef][ISI][Medline]

32 Hugot J-P, Chamaiilard M, Zouali H et al. Association of NOD2 leucine-rich repeat variants with susceptibility to Crohn's disease. Nature 2001;411:599–603.[CrossRef][ISI][Medline]

33 Ogura Y, Bonen DK, Inohara N et al. A frameshift mutation in NOD2 associated with susceptibility to Crohn's disease. Nature 2001;411:603–6.[CrossRef][ISI][Medline]

34 Fiocchi C. Inflammatory bowel disease: dogmas and heresies. Digest Liver Dis 2002;34:306–11.[CrossRef][ISI]

35 Sonnenberg A, Cucino C, Bauerfeind P. The unresolved mystery of birth-cohort phenomena in gastroenterology. Int J Epidemiol 2002;31:23–26.[Free Full Text]

36 Didierlaurent A, Sirard J-C, Kraehenbuhl J-P, Neutra MR. How the gut senses its content. Cellular Microbiology 2002;4:61–72.

37 Colhoun H, McKeigue PM, Davey Smith G. Problems of reporting genetic associations with complex outcomes. Lancet 2003;361:865–72.

38 Birge SJ, Keutmann HT, Cuatrecasas P, Whedon GD. Osteoporosis, intestinal lactase deficiency and low dietary calcium intake. N Engl J Med 1967;276:445–48.

39 Newcomer AD, Hodgson SF, Douglas MD, Thomas PJ. Lactase deficiency: prevalence in osteoporosis. Annals Internal Med 1978;89:218–20.

40 Honkanen R, Pulkkinen P, Järvinen R et al. Does lactose intolerance predispose to low bone density? A population-based study of perimenopausal Finnish women. Bone 1996;19:23–28.

41 Elwood PC. Milk, coronary disease and mortality. J Epidemiol Comm Health 2001;55:375.

42 Elwood PC, Yarnell JW, Burr ML et al. Epidemiological studies of cardiovascular disease: progress report VII. MRC Epidemiology Unit: Cardiff, 1991.

43 Ness AR, Davey Smith G, Hart C. Milk, coronary heart disease and mortality. J Epidemiol Comm Health 2001;55:379–82.

44 Shaper AG, Wannamethee G, Walker M. Milk, butter and heart disease. Br Med J 1991;302:785–86.

45 Segall JJ. Hypothesis. Is lactose a dietary risk factor for ischaemic heart disease? Int J Epidemiol 1980;9:271–76.

46 Lember M, Tamm A. Lactose absorption and milk drinking habits in Estonians with myocardial infarction. Br Med J 1988;296:95–96.

47 Enattah NS, Sahi T, Savilahti E, Terwilliger JD, Peltonen L, Järvelä I. Identification of a variant associated with adult-type hypolactasia. Nature Genet 2002;30:233–37.

48 Mishkin B, Yalovsky M, Mishkin S. Increased prevalence of lactose malabsorption in Crohn's disease patients at low risk for lactose malabsorption based on ethnic origin. Am J Gastroenterol 1997;92:1148–53.

49 Büning C, Ockenga J, Krüger S et al. The C/C-13910 and G/G-22018 Genotypes for adult-type hypolactasia are not associated with inflammatory bowel disease. Scand J Gastroenterol 2003;5:538–42.

50 Obermayer-Pietsch BM, Bonelli CM, Walter DE et al. Genetic predisposition for adult lactose intolerance and relation to diet, bone density, and bone fractures. Journal of Bone and Mineral Research 2004;19:42–47.

51 Thomas DC, Witte JS. Point: population stratification: a problem for case-control studies of candidate–gene associations. Cancer Epidemiol Biomarkers Prevention 2002;11:505–12.

52 Wacholder S, Rothman N, Caporaso N. Counterpoint: Bias from population stratification is not a major threat to the validity of conclusions from epidemiological studies of common polymorphisms and cancer. Cancer Epidemiol Biomarkers Prevention 2002;11:513–20.

53 Cardon LR, Palmer LJ. Wagging the dog? Population stratification and spurious allelic association. Lancet 2003;361:598–604.

54 LDL receptor mutation catalogue. http://www.ucl.ac.uk/fh (accessed December 16, 2003).

55 Day IN, Whittall RA, O'Dell SD et al. Spectrum of LDL receptor gene mutations in heterozygous familial hypercholesterolemia. Human Mutation 1997;10:116–27.

56 Hobbs HH, Brown MS, Goldstein JL. Molecular genetics of the LDL receptor gene in familial hypercholesterolemia. Human Mutation 1992;1:445–66.

57 Marks D, Thorogood M, Neil HAW, Humphries SE. A review on diagnosis, natural history and treatment of familial hypercholesterolaemia. Atherosclerosis 2003;168:1–14.

58 Slack J. Risks of ischaemic heart disease in familial hyperlipoproteinaemic states. Lancet 1969;2:1380–82.

59 Scientific Steering Committee on behalf of the Simon Broome Register Group. Risk of fatal coronary heart disease in familial hypercholesterolaemia. Br Med J 1991;303:893–96.

60 Færgeman O. Coronary Artery Disease: Genes Drugs and the Agricultural Connection. Elsevier: Netherlands, 2003.

61 Ebrahim S, Davey Smith G, McCabe C et al. Cholesterol and coronary heart disease: screening and treatment. Qual Health Care 1998;7:232–39.

62 Randomised trial of cholesterol lowering in 4444 patients with coronary heart disease: the Scandinavian Simvastatin Survival Study (4S). Lancet 1994;344:1383–89.

63 Heart Protection Study Collaborative Group. MRC/BHF Heart Protection Study of cholesterol lowering with simvastatin in 20 536 high-risk individuals. Lancet 2002;360:7–22.

64 Shepherd J, Cobbe SM, Ford I et al. for the West of Scotland Coronary Prevention Study Group. Prevention of coronary heart disease with pravastatin in men with hypercholesterolemia. N Engl J Med 1995;333:1301–07.

65 Rose G. Incubation period of coronary heart disease. Br Med J 1982;284:1600–1.

66 Soria LF, Ludwig EH, Clarke HRG, Vega GL, Grundy SM, McCarthy BJ. Association between a specific apolipoprotein B mutation and familial defective apolipoprotein B-100. Proc Natl Acad Sci U S A 1989;86:587–91.

67 Tybjaerg-Hansen A, Humphries SE. Familial defective apolipoprotein B-100: a single mutation that causes hypercholesterolemia and premature coronary artery disease. Atherosclerosis 1992;96:91–107.

68 Myant NB. Familial defective apolipoprotein B-100: a review, including some comparisons with familial hypercholesterolaemia. Atherosclerosis 1993;104:1–18.

69 Tybjærg-Hansen A, Steffenson R, Meinertz H, Schnohr P, Nordestgaard BG. Association of mutations in the apolipoprotein B gene with hypercholesterolemia and the risk of ischemic heart disease. N Engl J Med 1998;338:1577–84.

70 Rook GAW. The role of vitamin D in tuberculosis. Am Rev Respir Disease 1988;138:768–70.

71 Ness AR, Frankel SJ, Gunnell DJ, Davey Smith G. Are we really dying for a tan? Br Med J 1999;319:114–16.

72 Rook GAW, Steele J, Fraher L et al. Vitamin D, gamma interferon, and control of proliferation of Mycobacterium tuberculosis by human monocytes. Immunology 1986;57:159–63.

73 Davies PDO. A possible link between vitamin D deficiency and impaired host defence to Mycobacterium tuberculosis. Tubercle 1985;66:301–6.

74 Strachan D, Powell KJ, Thaker A, Millard FJ, Maxwell JD. Vegetarian diet as a risk factor for tuberculosis in immigrant south London Asians. Thorax 1995;50:175–80.

75 Davies PDO, Brown RC, Woodhead JS. Serum concentrations of vitamin D metabolites in untreated tuberculosis. Thorax 1985;40:187–90.

76 Grange JM, Davies PDO, Brown RC, Woodhead JS, Kardjito T. A study of vitamin D levels in Indonesian patients with untreated pulmonary tuberculosis. Tubercle 1985;66:187–91.

77 Douglas A, Strachan D, Maxwell JD. Seasonality of tuberculosis: the reverse of other respiratory diseases in the UK. Thorax 1996;51:944–46.

78 Bellamy R. Evidence for gene–environment interaction in development of tuberculosis. Lancet 2000;355:588–89.

79 Zmuda JM, Cauley JA, Ferrell RE. Molecular epidemiology of vitamin D receptor gene variants. Epidemiol Reviews 2000;22:203–17.

80 Uitterlinden AG, Fang Y, Bergink AP, van Meurs JB, van Leeuwen HP, Pols HA. The role of vitamin D receptor gene polymorphisms in bone biology. Mol Cellular Endocrinol 2002;197:15–21.

81 Bellamy R, Ruwende C, Corrah T et al. Tuberculosis and chronic hepatitis B virus infection in Africans and variation in the vitamin D receptor gene. J Infect Dis 1999;179:721–24.

82 Wilkinson RJ, Llewelyn M, Toosi Z et al. Influence of vitamin D deficiency and vitamin D receptor polymorphisms on tuberculosis among Gujarati Asians in west London: a case-control study. Lancet 2000;355:618–21.

83 Berrington A, Green J, Newton R. Vitamin D deficiency and tuberculosis. Lancet 2000;356:73–74.

84 Wilkinson RJ, Davidson RN. Vitamin D deficiency and tuberculosis. Lancet 2000;356:74–75.

85 Ponsonby A-L, McMichael A, van der Mei I. Ultraviolet radiation and autoimmune disease: insights from epidemiological research. Toxicology 2002;181:71–78.

86 Fukazawa T, Yabe I, Kikuchi S et al. Association of vitamin D receptor gene polymorphism with multiple sclerosis in Japanese. J Neurol Sci 1999;166:47–52.

87 Motohashi Y, Yamada S, Yanagawa T et al. Vitamin D receptor gene polymorphism affects onset pattern of type I diabetes. J Clin Endocrinol Metab 2003;88:3137–40.

88 Ogunkolade B-W, Boucher BJ, Prahl JM et al. Vitamin D receptor (VDR) mRNA and VDR protein levels in relation to vitamin D status, insulin secretory capacity and VDR genotype in Bangladeshi Asians. Diabetes 2002;51:2294–312.

89 Kune GA, Kune S, Watson LF. Colorectal cancer risk, chronic illnesses, operations and medications: case control results from the Melbourne Colorectal Cancer Study. Cancer Res 1988;48:4399–404.

90 Sandler RS, Galanko JC, Murray SC, Helm JF, Woosley JT. Aspirin and nonsteroidal anti-inflammatory agents and risk for colorectal adenomas. Gastroenterology 1998;114:441–47.

91 Lin HJ, Lakkides KM, Keku TO et al. Prostaglandin H Synthase 2 variant (Val511Ala) in African Americans may reduce the risk for colorectal neoplasia. Cancer Epidemiol Biomarkers Prevention 2002;11:1305–15.

92 Marmot M. Commentary: Reflections on alcohol and coronary heart disease. Int J Epidemiol 2001;30:729–34.

93 Bovet P, Paccaud F. Commentary: Alcohol, coronary heart disease and public health: which evidence-based policy? Int J Epidemiol 2001;30:734–37.

94 Klatsky AL. Commentary: Could abstinence from alcohol be hazardous to your health? Int J Epidemiol 2001;30:739–42.

95 Shaper AG. Editorial: alcohol, the heart, and health. Am J Public Health 1993;83:799–801.

96 Hart CL, Davey Smith G, Hole DJ, Hawthorne VM. Alcohol consumption and mortality from all causes, coronary heart disease, and stroke: results from a prospective cohort study of Scottish men with 21 years of follow up. Br Med J 1999;318:1725–29.

97 Rimm E. Commentary: Alcohol and coronary heart disease—laying the foundation for future work. Int J Epidemiol 2001;30:738–39.

98 Enomoto N, Takase S, Yasuhara M, Takada A. Acetaldehyde metabolism in different aldehyde dehydrogenase-2 genotypes. Alcohol Clin Exp Res 1991;15:141–44.

99 Takagi S, Iwai N, Yamauchi R et al. Aldehyde dehydrogenase 2 gene is a risk factor for myocardial infarction in Japanese men. Hypertens Res 2002;25:677–81.

100 Chao Y-C, Liou S-R, Chung Y-Y et al. Polymorphism of alcohol and aldehyde dehydrogenase genes and alcoholic cirrhosis in Chinese patients. Hepatology 1994;19:360–66.

101 Haskell WL, Camargo C, Williams PT et al. The effect of cessation and resumption of moderate alcohol intake on serum high-density-lipoprotein subfractions. N Engl J Med 1984;310:805–10.

102 Burr ML, Fehily AM, Butland BK, Bolton CH, Eastham RD. Alcohol and high-density-lipoprotein cholesterol: a randomised controlled trial. Br J Nutrition 1986;56:81–86.

103 Dyer AR, Stamler J, Paul O, Berkson DM. Alcohol, cardiovascular risk factors and mortality: the Chicago experience. Circulation 1981;64(Suppl III):III-20–III-27.

104 Wallace RB, Lynch CF, Pomrehn PR et al. Alcohol and hypertension: epidemiologic and experimental considerations. Circulation 1981;64(Suppl III):III-41–III-47.

105 Yokoyama A, Kato H, Yokoyama T et al. Genetic polymorphisms of alcohol and aldehyde dehydrogenases and glutathione S-transferase M1 and drinking, smoking, and diet in Japanese men with esophageal squamous cell carcinoma. Carcinogenesis 2002;11:1851–59.

106 Yokoyama A, Omori T. Genetic polymorphisms of alcohol and aldehyde dehydrogenases and risk for esophageal and head and neck cancers. Jpn J Clin Oncol 2003;33:111–21.

107 Hill MJ, Crowther JS, Drasar BS. Bacteria and etiology of cancer of the large bowel. Lancet 1971;i:95–100.

108 Wynder EL, Reddy BS. Metabolic epidemiology of colorectal cancer. Cancer 1974;34:801–6.

109 Imray CH, Radley S, Barker G et al. Faecal unconjugated bile acids in patients with colorectal cancer or polyps. Gut 1992;33:1239–45.

110 Breuer NF, Dommes P, Jaekel S, Goebell H. Fecal bile acid excretion pattern in colonic cancer patients. Dig Dis Sci 1985;30:852–59.

111 De Kok TM, van Faassen A, Glinghammar B et al. Bile acid concentrations, cytotoxicity and pH of fecal water from patients with colorectal adenomas. Dig Dis Sci 1999;44:2218–25.

112 Wang W, Xue S, Ingles SA et al. An association between genetic polymorphisms in the ileal sodium-dependent bile acid transporter gene and the risk of colorectal adenomas. Cancer Epidemiol Biomarkers Prevention 2001;10:931–36.

113 Oelkers P, Kirby LC, Heubi JE, Dawson PA. Primary bile acid malabsorption caused by mutations in the ileal sodium-dependent bile acid transporter gene (SLC10A2). J Clin Investig 1997;99:1880–87.

114 MRC Vitamin Study Research Group. Prevention of neural tube defects: results of the Medical Research Council vitamin study. Lancet 1991;338:131–37.

115 Czeizel AE, Dudás I. Prevention of the first occurrence of neural-tube defects by periconceptional vitamin supplementation. N Engl J Med 1992;327:1832–35.

116 Botto LD, Yang Q. 5,10-Methylenetetrahydrofolate reductase gene variants and congenital anomalies: a HuGE Review. Am J Epidemiol 2000;151:862–77.

117 Hattersley AT, Beards F, Ballantyne E, Appleton M, Harvey R, Ellard S. Mutations in the glucokinase gene of the fetus result in reduced birth weight. Nature Genetics 1998;19:268–70.

118 Velho G, Hattersley AT, Froguel P. Maternal diabetes alters birth weight in glucokinase-deficient (MODY2) kindred but has no influence on adult weight, height, insulin secretion or insulin sensitivity. Diabetologia 2000;43:1060–63.

119 Brunner E, Davey Smith G, Marmot M, Canner R, Beksinska M, O'Brien J. Childhood social circumstances and psychosocial and behavioural factors as determinants of plasma fibrinogen. Lancet 1996;347:1008–13.

120 Danesh J, Collins R, Appleby P, Peto R. Association of fibrinogen, C-reactive protein, albumin or leucocyte count with coronary heart disease. Meta-analysis of prospective studies. JAMA 1998;279:1477–82.

121 Danesh J, Youngman L, Clark S, Parish S, Peto R, Collins R. Helicobacter pylori infection and early onset myocardial infarction: case-control and sibling pairs study. Br Med J 1999;319:1157–62.

122 Little J, Khoury MJ. Mendelian randomization: a new spin or real progress? Lancet 2003;362:930–31.

123 Watson MA, Gay L, Stebbings WSL, Speakman CTM, Bingham SA, Loktionov A. Apolipoprotein E gene polymorphism and colorectal cancer: gender specific modulation of risk and prognosis. Clinical Science 2003;104:537–45.

124 Schachter F, Faure-Delanef L, Guenot F et al. Genetic associations with human longevity at the apoE and ACE loci. Nature Genetics 1994;6:29–32.

125 Wilson PW, Myers RH, Larson MG et al. Apolipoprotein E alleles, dyslipidemia, and coronary heart disease. JAMA 1994;272:1666–71.

126 McCarron MO, Delong D, Alberts MJ. APOE genotype as a risk factor for ischaemic cerebrovascular disease: a meta-analysis. Neurology 1999;53:1308–11.

127 Farrer LA, Cupples LA, Haines JL et al. Effects of age, sex and ethnicity on the association between apolipoprotein E genotype and Alzheimer Disease. A meta-analysis. APOE and Alzheimer Disease Meta Analysis Consortium. JAMA 1997;278:1349–56.

128 Smith JD. Apolipoprotein E4: an allele associated with many diseases. Ann Med 2000;32:118–27.

129 Simons M, Keller P, Dichgans J, Schulz JB. Cholesterol and Alzheimer's disease: is there a link? Neurology 2001;57:1089–93.

130 Terwilliger JD, Weiss KM. Confounding, ascertainment bias, and the blind quest for a genetic ‘fountain of youth’. Ann Med 2003;35:532–44.