Divisions of Genetics and Endocrinology (J.N.H.), Children's Hospital, Boston, Massachusetts 02115; Departments of Genetics (J.N.H., D.A.) and Medicine (D.A.), Harvard Medical School, Boston, Massachusetts 02114; Whitehead/MIT Center for Genome Research (J.N.H., D.A.), Cambridge, Massachusetts 02139; and Department of Molecular Biology and Diabetes Unit (D.A.), Massachusetts General Hospital, Boston, Massachusetts 02114
Address all correspondence and requests for reprints to: Joel N. Hirschhorn, M.D., Ph.D., Children's Hospital, Enders 561, 300 Longwood Avenue, Boston, Massachusetts 02115.
Over the last 15 yr, genes responsible for hundreds of inherited human diseases have been identified, enabling clinical diagnosis and the potential for therapeutic intervention. Until very recently, however, success has been limited to so-called monogenic disorders, diseases in which mutation of a single gene is both necessary and sufficient to cause disease in any given individual. Because such mutations are strictly co-inherited with disease, it is possible to use linkage analysis to identify their chromosomal location by analyzing which of a genome-wide set of markers segregates with disease in families. Genes contained within such linked regions become positional "candidates" and are next examined for mutations in affected individuals. For any such candidate gene, proof of causality typically depends on two additional lines of evidence. First, the putative causal changes should be found only in affected individuals. Second, one hopes for a "smoking gun": evidence that the disease-associated mutations are obviously deleterious to protein function (due to truncation or deletion of a coding region or alteration of a highly conserved residue). Success is typically declared when these criteria are all satisfied: 1) the putative disease gene is located in a chromosomal region that co-segregates with disease in affected families, 2) it contains multiple independent mutations that are perfectly associated with disease status in the families, and 3) those mutations have characteristics that obviously alter protein function.
Most common diseases are strongly influenced by inheritance, but, to date, relatively few genes have been identified that are responsible for familial clustering of these diseases. Success has been elusive because common diseases are almost all complex disorders, where multiple genes and environmental factors collaborate to cause disease. Because no single gene segregates tightly with disease, it has proven very difficult to confidently localize putative disease genes to chromosomal locations. For this reason, optimistic gene hunters have leapt directly to the latter stage of examining candidate genes for mutations that show association to disease. Typically, however, these candidate genes are based on a biological hypothesis, rather than chromosomal position relative to a linkage study.
Association studies can, in theory, succeed where linkage fails, because association can offer much greater statistical power (1), thus providing a rationale for circumventing linkage analysis. But several problems bedevil such association studies. First, it is expected that the causal mutations are neither necessary nor sufficient to cause disease. That is, some people will have the mutation but not disease, and others will have disease without a causal mutation of that gene. Because of this imperfect correlation, association studies must compare the frequencies of a putative causal mutation in individuals with disease and in appropriate controls. If the mutation is found at a statistically significantly higher frequency in affected individuals, the mutation is said to be associated with disease. However, determining appropriate thresholds of significance is challenging because the a priori likelihood that any given candidate gene plays a role in disease is unknown, but certainly low (in following up a solid linkage peak, one at least begins with the knowledge that one or more genes in the region is responsible for the disease). In addition, these more subtle genetic risk factors need not be premature stop codons or protein truncation mutants. Rather, they may be innocuous to the scientist's eye and yet cause disease by altering the in vivo regulation, expression, stability, activity, or interactions of the encoded protein. Because of these and other difficulties (see, for example, Ref. 2), it is important that association studies be performed and scrutinized carefully, especially when different investigators reach different conclusions as to whether mutations or genetic variation in a gene is associated with disease. Here, we discuss the relevant points in light of a report in this issue questioning the relationship of mutations in the MC4R gene and severe obesity (3).
Most association studies have focused on common genetic variation: by convention, common genetic variants (or polymorphisms) are those for which two or more alleles each exist in 1% or more of the population at large. There are many practical advantages to studying common variants. Because they are present at high frequency, common variants can be discovered in any modest-sized group of individuals. This facilitates cataloging of common variants. Over the last 3 yr, millions of common human sequence variants have been identified and placed in databases (4). Moreover, because strong correlations are typically observed between neighboring variants (linkage disequilibrium), most common variations in the genome can be tested for a role in disease using a subset of carefully chosen "tag" single nucleotide polymorphisms (see Ref. 5 and references therein). Finally, testing common variants for association to disease is technically straightforward. The frequencies of each variant can be accurately estimated in modest-sized collections of patients with and without the disease.
Of course, there is no reason to presume that the mutations responsible for common diseases will themselves be common. Certainly, many rare monogenic disorders are due to a heterogeneous collection of variants that are individually very rare. In thinking about the allele spectrum of common diseases, it is important to consider both the overall characteristics of human genetic variation and the particular evolutionary features of each disease (6). The characteristic features of human genetic variation have been well described (see Ref. 4 and references therein). Numerically, rare variants outnumber common variants, but the vast majority of variant alleles in the population (heterozygosity) are attributable to the small number of common variants. Thus, for disease phenotypes that had a neutral effect on human evolutionary fitness, the spectrum of alleles causing disease should resemble this overall pattern: most of the genetic burden of disease in the population will be due to common variants. In contrast, where disease was disadvantageous from an evolutionary perspective (e.g. diseases that are lethal in childhood), rare variants will predominate, because variants that lower reproductive fitness generally do not drift up to high frequency. Finally, disease phenotypes that experienced balancing selection (e.g. sickle cell disease, where disease is balanced by resistance to malaria in carriers) or that may even have been evolutionarily advantageous (as has been proposed for obesity under the thrifty gene hypothesis) should be due to variants that are even more common than those found throughout the genome as a whole.
Given the speculative nature of such evolutionary hypotheses, as well as the experience from rare monogenic disorders, there has been great attention to the importance of studying rare genetic variants for a role in common disease (see, for example, Ref. 6). However, studying rare variants introduces important methodological challenges. First, investigators must discover the variants in each of their populations, often by directly resequencing affected individuals. Moreover, because they are rare, much larger samples must be examined before an accurate estimate of frequency can be obtained for the comparison of cases and controls. In making such comparisons, it is critical that controls be scrutinized for variation in the exact same manner as cases, because resequencing only of cases leads to a significant problem called ascertainment bias. In brief, sequencing a large group of individuals will nearly always identify a particular collection of vanishingly rare variants (including missense variants) that will be absent from any second collection of individuals that is tested. Thus, finding a few rare, apparently deleterious mutations in affected individuals does not signify a role in disease, unless controls have been examined with the same intensity as have cases, and the preferential presence of variants in affected individuals is strong and statistically convincing.
In fact, the statistical analysis of rare variation almost always requires that collections of different rare variants be considered as a group, because no individual variant is sufficiently common to permit an accurate assessment of its frequency in realistically sized disease or control populations. It is important that the grouping of rare variants not be done post hoc in a subjective (and, therefore, potentially biased) manner. Rather, the grouping must be on the basis of obvious sequence characteristics (e.g. all missense variants, all nonsense and frameshift mutations, etc.), or on the basis of a valid functional assay. To avoid a biased assignment of functional importance to variants identified in affected individuals, such a functional assay should be developed and validated independently of the results of association analysis.
Despite these additional challenges, association testing of rare and common variants is fundamentally similar. Once a relevant common variant, or group of rare variants, is identified, the frequency of variants must be rigorously compared in affected individuals and in controls. The choice of a threshold for declaring a significant association has been a matter of some debate. The P value for differences in frequencies between affected individuals and controls reflects the likelihood of observing an association by chance if the true frequency were the same in both groups. This obviously is not equivalent, however, to the likelihood that the experimenter's hypothesis of association is in error, because this latter calculation requires knowledge of the a priori probability that the variant in question was associated with disease. In most such studies, this prior probability is extremely low: there are hundreds of candidate genes, each of which contains a hundred or more commonly varying sites; moreover, many causal genes will not be obvious candidates, and the entire genome contains 10 million common variants and many more rare variants.
In this Bayesian framework, the P values for most association studies are not low enough to meet a conservative threshold for declaring significance that minimizes type 1 errors (false positive studies). Thus, a single report of association is almost always inadequate to prove causation. Rather, replication, preferably in multiple independent studies, is required. However, consistent replication of associations has been difficult to achieve. Indeed, a review of associations between common variants and disease found that the vast majority of such associations have not been consistently reproduced (2). The possible reasons for the inconsistency include false positives due to type 1 error, false positives due to population stratification, false negatives due to lack of power in potential replication studies, and true differences between study populations (e.g. different phenotypes, or different environmental or genetic modifiers). These explanations probably all contribute to the lack of reproducibility and are relevant to both studies of individual common variants and of grouped rare variants. However, to interpret an association that has not been consistently replicated, it is important to try to distinguish which of these explanations is truly relevant to that particular association.
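The Bayesian argument above can be made concrete with a small calculation. The sketch below applies Bayes' rule to ask how likely a nominally significant association is to be real, given a prior probability, the study's power, and the significance threshold. The function name and all of the numerical inputs are hypothetical, chosen only to illustrate the reasoning; none are taken from the studies discussed here.

```python
def posterior_true_association(prior, power, alpha):
    """Probability that an association is real, given a 'significant' result.

    prior: assumed a priori probability that the variant is truly associated
    power: probability of reaching significance when the association is real
    alpha: probability of reaching significance when there is no association
    """
    true_positives = power * prior
    false_positives = alpha * (1.0 - prior)
    return true_positives / (true_positives + false_positives)


# Hypothetical priors and thresholds, for illustration only
for prior in (1e-2, 1e-3, 1e-4):
    for alpha in (0.05, 1e-4, 1e-6):
        post = posterior_true_association(prior, power=0.8, alpha=alpha)
        print(f"prior={prior:g}  alpha={alpha:g}  posterior={post:.3f}")
```

Under these assumed inputs, a result that only just reaches P = 0.05 leaves the posterior probability of a true association small whenever the prior is low, which is why a much stricter threshold, or independent replication, is needed.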
Lack of power for true associations with modest effects can clearly contribute to inconsistent replication. For example, several studies failed to detect an association of the Pro12Ala variant with type 2 diabetes and concluded that no such role was likely to exist. In fact, this PPARG variant has a much lower relative risk (1.25-fold) than was initially estimated, and consequently many studies were underpowered to detect the true effect of the allele on diabetes risk (7). To assess power, then, the range of genetic effects that are consistent with the negative data should be compared with the genetic effect estimated by considering all of the previous studies together.
In this context, Jacobson et al. (3) report in this journal their failure to replicate the previously described association of rare functional variants in the MC4R gene with morbid human obesity. MC4R was originally implicated in obesity by mouse studies (see, for example, Ref. 8); two groups then independently identified severely obese individuals (one in each study) with a frameshift mutation in the MC4R gene (9, 10). Several additional studies subsequently found at least 34 additional missense or frameshift mutations in MC4R in obese individuals, but no functional missense variants were identified in control individuals who were resequenced (see Refs. 18–24 in Ref. 3). It is worth noting that there are three somewhat common missense variants that are present at approximately equal frequencies in obese and nonobese individuals (Val103Ile, Thr112Met, and Ile251Leu). These variants have been distinguished from the putative functional variants because they do not affect MC4R function in in vitro tests (11, 12). If these common variants were included in statistical analysis, they would swamp out any signal due to the rarer, apparently functional mutations. Thus, the statistical arguments in favor of association rely critically on the relevance of the in vitro assay used to assess protein function.
In total, at least 34 putative functional MC4R mutations have been identified in 1187 obese individuals (2.9% frequency) as compared with zero in 827 controls. Nearly all of the obese individuals who were found to carry mutations are either severely morbidly obese [body mass index (BMI), >50] or had onset of obesity in childhood or early adolescence. However, some carriers and relatives were identified who also carry MC4R mutations but have less severe phenotypes (12). These findings led to the hypothesis that mutations in MC4R might be a common cause of morbid obesity (12). It was the intent of Jacobson et al. (3) to test this hypothesis.
To study the role of MC4R mutations in obesity, they resequenced the MC4R gene in over 200 obese white subjects and 47 obese black subjects, plus a similar number of controls. None of the previously described putative functional variants were identified. Three new missense variants were identified, each in one black control (Ile102Thr, Phe202Leu, and Asn240Ser). Unfortunately, no functional evaluation of these variants was performed. In addition, one frameshift mutation leading to a predicted premature termination codon was discovered in an obese white female. The authors interpret these data as failing to replicate the previous findings of a high rate of MC4R mutations in obese individuals. To evaluate whether the authors have failed to replicate the previous finding, or have simply asked a different question, we consider below each of the main possible explanations for the failure to replicate: false negative study (inadequate power), a falsely positive original report, and true differences between study populations.
Jacobson et al. (3) state that they can strongly reject the hypothesis that mutations in MC4R account for 4% of cases of obesity, with 80–85% power in blacks and greater than 99.9% power in whites. These arguments are based on not having observed any of the previously reported functional variants. However, the spectrum of variants that can affect MC4R function is apparently quite diverse (few or none of the functional variants have been identified independently in separate studies), so the expectation should be that most functional variants in MC4R would be novel. Indeed, Jacobson et al. (3) did discover one such variant (the frameshift mutation, although they did not test it for function in vitro). Thus, a more appropriate interpretation of their data is that they identified one apparently functional mutation in approximately 200 unrelated obese white individuals and 47 obese black individuals. Furthermore, 4% is the highest estimate of MC4R mutations in the literature (12). A more appropriate figure to test would be 2.9% (the estimate from all available data), and even this value may be too high because some of the rare missense changes identified in obese individuals do not impair function in vitro (see Ref. 12) and, thus, may not represent functional MC4R mutations.
What range of frequencies of MC4R mutations could be consistent with the data in Jacobson et al. (3)? Given the observation of one mutation in 200 individuals, the 95% confidence interval for the frequency of MC4R mutations is quite wide; values from 0.12–2.75% are all consistent with the data. Even if the data from blacks and whites are pooled, the 95% confidence interval extends from 0.1–2.2%; the data from blacks alone are actually consistent with mutation frequencies as high as 7% in this population. If power is set at a more conservative 80%, the data can reject frequencies of mutations greater than 1.5% in whites, 3.3% in blacks, and 1.2% in the pooled sample.
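The upper ends of these intervals, and the 80% exclusion thresholds, can be approximated with an exact binomial calculation. The sketch below is illustrative only: it assumes an exact (Clopper-Pearson-style) binomial treatment, which the text does not specify, and it uses the counts described above (one carrier in roughly 200 white subjects, none in 47 black subjects, one in 247 pooled). The helper names are hypothetical. Lower bounds are omitted because they depend more strongly on the interval method chosen.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) when X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def frequency_with_tail(k, n, tail, tol=1e-9):
    """Bisect for the frequency p at which P(X <= k) equals `tail`.

    The CDF falls as p rises, so any p above the returned value makes
    observing k or fewer carriers less likely than `tail`.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binom_cdf(k, n, mid) > tail:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# (carriers observed, subjects sequenced), counts as described in the text
samples = {"white": (1, 200), "black": (0, 47), "pooled": (1, 247)}

for name, (k, n) in samples.items():
    upper95 = frequency_with_tail(k, n, 0.025)  # upper end of a two-sided 95% interval
    reject80 = frequency_with_tail(k, n, 0.20)  # higher frequencies are rejected with 80% power
    print(f"{name}: upper 95% bound {upper95:.2%}, 80% exclusion {reject80:.2%}")
```

Under these assumptions the results should land close to the figures quoted above; small differences would be expected if the original calculation used an approximate rather than an exact interval.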
Although the confidence intervals are wide, the data are apparently inconsistent with the 2.9% frequency of mutations found in previous studies. Thus, additional explanations are required other than inadequate power of the current study. The second possibility is that the previous studies were false positives: all of the functional variants were found in the obese individuals by chance or due to a biased selection of which variants were considered functional. However, this explanation also seems unlikely. The P value for finding 34 functional variants in 1187 obese individuals but no functional variants in 827 controls is under 10⁻⁶. In addition, many of the variants are obviously deleterious, and there is no reason to suspect that the functional assay is biased. Thus, one must consider the possibility that the authors of the present study may have asked a different question than that posed before, specifically that the populations used in previous studies and the populations used in Jacobson et al. (3) are in some way fundamentally different from each other.
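The P value quoted above can be checked with a one-sided Fisher exact test on the pooled counts (34 carriers among 1187 obese individuals, none among 827 controls). This is a minimal sketch: the text does not say which test was used, so the choice of Fisher's exact test is an assumption, and the function name is hypothetical.

```python
from math import comb


def fisher_one_sided(case_carriers, n_cases, control_carriers, n_controls):
    """One-sided Fisher exact P value.

    Probability, with the table margins held fixed, that at least
    `case_carriers` of all carriers fall among the cases by chance.
    """
    total = n_cases + n_controls
    carriers = case_carriers + control_carriers
    denom = comb(total, carriers)
    most_extreme = min(carriers, n_cases)
    tail = sum(
        comb(n_cases, x) * comb(n_controls, carriers - x)
        for x in range(case_carriers, most_extreme + 1)
    )
    return tail / denom


p = fisher_one_sided(34, 1187, 0, 827)
print(f"one-sided P = {p:.2e}")
```

With these counts the result falls well below 10⁻⁶, in line with the statement above. Pooling carriers across studies in this way treats all 34 variants as a single predefined class, which is exactly the grouping step that depends on an unbiased functional assay.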
The possibility of differences in populations seems quite plausible if one compares the characteristics of the populations used in previous studies with those of the obese individuals in this most recent study. In previous studies, almost all MC4R carriers were found among severely obese individuals (BMI, >50) and/or individuals with early-onset obesity (usually in childhood). By contrast, only 42% of the white individuals in the study by Jacobson et al. (3) had early-onset obesity (childhood or adolescence), and less than 10% had a BMI greater than 50. If one only considers these approximately 100 subjects in the study, there is no longer 80% power to reject a mutation frequency of 2.9%, although the estimate of the frequency (now 1% rather than 0.5%) is still much lower than the 2.9% estimated by previous studies.
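The loss of power in the smaller subgroup can be illustrated with a short binomial calculation. The sketch below asks how often a sample would show more than one carrier if the true mutation frequency were 2.9%, so that observing only one carrier would count as evidence against that frequency. The sample sizes of 100 and 200 are the approximate figures from the text; the function names and the framing of power as "probability of seeing more than the observed count" are illustrative assumptions.

```python
from math import comb


def prob_at_most(k, n, p):
    """P(X <= k) when X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def power_to_reject(n, p, observed=1):
    """Chance of seeing more than `observed` carriers among n subjects
    if the true carrier frequency is p."""
    return 1.0 - prob_at_most(observed, n, p)


for n in (100, 200):
    print(f"n={n}: power to reject a 2.9% frequency = {power_to_reject(n, 0.029):.1%}")
```

On these assumptions the power for about 100 subjects sits just below 80%, whereas about 200 subjects give a comfortable margin, consistent with the comparison drawn above.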
However, a potentially more important conclusion is suggested from these data: Jacobson et al. (3) observe no MC4R mutations in 100 individuals with a BMI under 50 and onset of obesity in adulthood. Indeed, the only other studies to examine populations not selected or enriched for severe or early-onset obesity also failed to find mutations in MC4R despite screening 90 individuals (13, 14). This suggests that although MC4R mutation might be a rare (2–3%) but significant cause of early-onset or severe obesity, it is likely to be a less common cause of obesity in the general population. This could be explained (post hoc) by invoking the possible selective disadvantage of relatively penetrant alleles that cause severe, early-onset obesity (where rare MC4R mutations are observed) compared with the less easily predicted evolutionary history of the more common form of the disease.
How can one reconcile this apparent absence of MC4R mutations in cases of less severe obesity with the fact that some relatives of severely affected probands carry MC4R mutations but have milder obesity (12)? Shouldn't some of these relatives have turned up in the studies of "typical" obesity? The answer probably lies in the fact that there are many, many more people with typical obesity than there are relatives of MC4R carriers. This theory predicts that if one studied people with mild obesity who had a relative with severe or early-onset obesity, it would be possible to enrich for carriers of MC4R mutations. This dichotomy is reminiscent of the situation for BRCA1 and breast cancer, where mutations are found at appreciable frequencies in highly familial cases, early-onset cases, or women with multiple cancers. Where family members of such women are tested, carriers with later onset breast cancer (or no cancer at all) are found at appreciable rates. Nevertheless, when more typical cases of breast cancer are examined, BRCA1 mutations are found at much lower rates.
More generally, the current study highlights many of the important issues in association studies of genetic variants and disease. Replication is paramount, because prior probabilities of true association are low and, thus, all genetic associations demand a high level of proof. Nevertheless, failure to replicate can be due to many possible explanations beyond false positive studies and type 1 error. Lack of power to exclude modest effects and different phenotypes of patients in different collections can both contribute to apparent heterogeneity among studies.
Moreover, genetic variants both rare and common are certain to play a role in most diseases, but their relative contributions in any given case are hard to predict. Given the expense of resequencing and the analytic challenges of grouping rare variants into "functional" and "benign" categories, an exhaustive exploration of the "rare variant" hypothesis is daunting even for high-priority candidate genes. Nevertheless, rare variants such as are found in MC4R will no doubt teach us a great deal about physiology and the ways in which human biology can be altered by severe genetic insults. Whether rare variants also turn out to explain much of the population burden of disease is an important question that will challenge us for some time to come.
Footnotes
Abbreviation: BMI, Body mass index.
Received August 19, 2002.
Accepted August 19, 2002.
References