Human and Molecular Genetics Center Department of Physiology Medical College of Wisconsin Milwaukee, Wisconsin 53226
For the past several years, human biologists have eagerly awaited the completion of the human genome sequence. The draft sequence, an incomplete yet remarkably useful tool to geneticists, has been available for almost two years, and the finished version is expected by April 2003. While this sequence will still contain gaps, it will be highly accurate (fewer than one sequencing error every 10,000 bases), and the remaining gaps will be mainly due to limitations of current sequencing technology and approaches. For example, heterochromatin regions such as the highly repetitive sequences around the centromeres will not be sequenced. In addition to the actual sequence, the Human Genome Project has also devoted considerable effort to the identification of sequence variants throughout the genome. On average, any two human genomes are more than 99.9% identical, but a difference in the sequence can be detected every 1,000–1,500 bp (28). The vast majority of these differences are single base pair changes in the DNA sequence, such as deletions, insertions, or substitutions of individual bases. These polymorphisms are collectively called "single nucleotide polymorphisms," or SNPs. It has been estimated that the entire human population harbors 10 million so-called "common" SNPs with a minor allele frequency [i.e., the proportion of all copies of the genome in the human population that carry the rarer nucleotide (allele) at this SNP, as opposed to the other, more frequent nucleotide] of greater than 5% (15). Since each human being carries two complete copies of the human genome (the diploid chromosome set), roughly 10% or more of the world's population carries the rarer nucleotide of a "common" SNP in at least one of their two genome copies. 
To date, over 4 million of these SNPs have been identified across the human genome and are available in public and private databases (see dbSNP, a public database at National Center for Biotechnology Information http://www.ncbi.nlm.nih.gov/SNP/index.html, and the Celera SNP database http://www.celeradiscoverysystem.com/glance/home.cfm).
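The link between a minor allele frequency and the share of people who carry that allele can be checked with a few lines of arithmetic. The sketch below assumes Hardy-Weinberg proportions (random mating), which the text does not state explicitly; under that assumption a person lacks the minor allele only if neither genome copy carries it, so a 5% minor allele frequency gives 1 - 0.95² = 9.75% carriers. The function name `carrier_fraction` is purely illustrative.

```python
def carrier_fraction(maf: float) -> float:
    """Share of diploid individuals carrying at least one copy of the minor
    allele, assuming Hardy-Weinberg proportions (random mating)."""
    if not 0.0 <= maf <= 0.5:
        raise ValueError("minor allele frequency must lie between 0 and 0.5")
    # Probability that neither of a person's two genome copies carries the allele
    no_copy = (1.0 - maf) ** 2
    return 1.0 - no_copy


for maf in (0.05, 0.10, 0.20):
    print(f"MAF {maf:.2f}: {carrier_fraction(maf):.2%} of individuals are carriers")
```

Because carriers are counted per person rather than per chromosome, the carrier share is always somewhat less than twice the allele frequency.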
Because of their abundance and density, SNPs have been proposed as ideal polymorphic markers for disease association studies and for fine-mapping efforts to identify disease-causing genes and their mutations (3). However, so far the use of SNPs has been mostly limited to the analysis of small candidate gene regions, mainly for two reasons. First, the cost associated with assaying a comprehensive set of SNPs in large study cohorts is still significant despite recent technological advances. This has been prohibitive for using SNPs in genome-wide association studies. Second, the alternative, an analysis of a representative subset of SNPs across the entire human genome rather than all SNPs, requires extensive knowledge of the relationships of SNPs across the genome. As will be explained in the following sections, SNPs in close proximity to each other are not completely independent, and knowing the allele of one SNP allows predictions about the allele present at the neighboring SNP position. If all of these relationships were known, then efficient analytical approaches could identify a representative, nonredundant subset of SNPs by eliminating SNPs that can be predicted by other neighboring SNPs. Unfortunately, studies have only recently begun to shed light on these relationships between SNPs (2, 5, 7, 19, 22, 23, 25, 30, 31).
Although the technical limitations and cost associated with current SNP genotyping methods are still significant, it is likely that technologies will be greatly improved over the next several years. Therefore, the focus of this editorial will be on strategies being used to analyze the complex relationship between SNPs in the human genome, as well as the approaches proposed to identify an informative subset of SNPs, the primary goal of the recently initiated Haplotype Map (HapMap) Project (4). This international effort of nine groups in the United States, Canada, the United Kingdom, Japan, and China will generate detailed information on the common haplotype structure of the human genome over the next 3 years and help generate a complete haplotype map of the human genome. At a cost of $100 million, a dense set of SNPs will be analyzed in 200–400 individuals from four geographically distinct populations: the Yoruba in Nigeria, the Japanese, the Han Chinese, and US residents with Northern European ancestry. The ultimate goal of the effort is to identify a subset of 300,000–600,000 "tag" SNPs, a set of nonredundant SNPs (no SNP in this set can be predicted from another SNP in the set) that are representative of and capture the haplotype variation in the human genome. While the project is under way, several questions about the current approach and the most efficient design for such a study are still unanswered, and these will be raised in this editorial. The discussion will focus on currently existing data on the haplotype structure of the human genome, the application of haplotype information to disease mapping and disease gene identification, and a preliminary but by no means comprehensive discussion of the unresolved questions about the usefulness of this haplotype-based approach for identifying a subset of informative "tag SNPs" to be used in fine-mapping efforts and whole-genome association analyses of genetic causes of common human disorders. 
There are certainly additional statistical issues that need to be addressed in the context of SNP vs. haplotype-based association studies to help resolve the controversial discussion about appropriate significance levels, but due to the complexity of the arguments, this editorial will not touch on these issues.
Linkage disequilibrium and haplotypes.
The vast majority of SNPs in the human genome probably arose from individual single mutation events at an early time during human history. From the time of each mutation event, the new allele (i.e., the new "mutant" nucleotide) is located on an individual chromosome that has specific alleles of other SNPs that arose previously on the same chromosome. This physical arrangement of SNP alleles along a chromosome is called a "haplotype." Over multiple successive generations, recombination and novel mutation events will lead to a rearrangement of the ancestral haplotype around the new SNP allele. As a consequence, the new allele remains on the same portion of the ancestral haplotype only with other SNP alleles that are in close physical proximity. This nonrandom association of alleles at adjacent loci, i.e., the maintenance of a small segment of the ancestral haplotype, is called "linkage disequilibrium" (LD) or allelic association. The concept of disease association analyses is based on the idea that it should be possible to identify the effect of any common disease-causing variant by determining the ancient haplotype segment on which it first arose. By using a sufficient number of SNP markers in a genetic study, any common variant, even if it was not assayed directly, should display significant LD with neighboring marker SNPs. This would indicate that the two markers (and the DNA segment between them) represent the ancestral haplotype, and therefore the marker allele should show association with the disease phenotype if the disease-causing sequence variant was located between the two SNP markers. This approach has been used successfully in the past to identify genes responsible for Mendelian disorders (e.g., Hirschsprung's disease, cystic fibrosis) (13, 24). More recently, studies have used the analysis of LD and haplotype structures in the identification of genes affecting common human disorders with more complex inheritance patterns, such as Crohn's disease (9, 21, 27). 
It has been proposed that a set of evenly spaced SNPs at high resolution across the genome should permit whole-genome association studies (14). However, it has become clear that the extent of LD and haplotypes in the human genome is not simply a function of distance between SNPs. Rather, the size of regions of significant LD is highly variable in different regions of the genome and reflects the complex history and interplay of recombination and mutation on different regions of the genome.
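To make "significant LD between two SNPs" concrete, the sketch below computes the standard pairwise measures from phased two-SNP haplotype counts: the disequilibrium coefficient D, Lewontin's normalized D' (the measure used by Gabriel et al. (7)), and r², the squared allelic correlation. r² is not discussed in the text and is included as a commonly used companion measure; the function name and argument order are illustrative choices.

```python
def ld_stats(n_AB: int, n_Ab: int, n_aB: int, n_ab: int):
    """Return (D, D', r^2) for two biallelic SNPs from phased haplotype counts.

    Alleles A/a sit at the first SNP and B/b at the second; each argument is
    the number of chromosomes carrying that two-SNP haplotype.
    """
    n = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / n
    p_A = (n_AB + n_Ab) / n   # frequency of allele A at SNP 1
    p_B = (n_AB + n_aB) / n   # frequency of allele B at SNP 2
    d = p_AB - p_A * p_B      # zero when alleles combine at random
    # D' rescales D by its largest possible magnitude given the allele frequencies
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    # r^2 is the squared correlation between the alleles at the two SNPs
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Only three of the four possible haplotypes observed (no "ab" chromosome):
# no evidence of recombination (D' = 1), yet one SNP predicts the other only partly.
print(ld_stats(40, 30, 30, 0))
```

The example illustrates why the choice of measure matters: D' reaches 1 whenever one of the four haplotypes is missing, whereas r² (about 0.18 here) reflects how well one SNP's allele predicts the other's.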
Analysis of LD patterns in the human genome.
Studies aiming to understand the complex relationships between SNPs across extended regions of the human genome initially focused on the analysis of LD, to assess its variability and the resulting haplotype structure. However, most initial studies either examined only small genomic intervals, used only widely spaced SNPs, or analyzed only small numbers of chromosomes. Nevertheless, all of these studies suggest that LD between SNPs does not simply decline with increasing distance between SNPs. Rather, the LD pattern is much more complex and difficult to predict: while some studies estimate a genome-wide average extent of LD between SNPs, the degree and extent of LD are highly variable.
In one of the earliest studies, Clark et al. (2) analyzed 9.7 kb of sequence around the LPL gene by sequencing 142 independent chromosomes (i.e., chromosomes from unrelated individuals) and identified 88 SNPs in this region. They described regions of high LD between SNPs that ended sharply at boundaries with neighboring regions that had little or no LD between markers. Later analyses showed that these boundaries coincided with a recombinational hot spot in this gene region (33). Subsequent studies by Taillon-Miller et al. (31), who analyzed SNPs with an average distance of 100 kb between markers on human chromosome X, and Abecasis et al. (1), who investigated LD on human chromosomes 2, 13, and 14 with a marker density of 1 SNP per 33–110 kb, described similar, complex patterns of LD, with detectable LD between SNPs up to 1 Mb apart. Physical distance between markers accounted for only a small portion of the overall variation seen in LD between SNPs. Reich et al. (25) reported on an analysis of LD between SNPs in 160-kb regions around 19 different genes. Here, blocks of significant LD ranged from 6 to 155 kb, with the average around 60 kb in North American individuals of European descent. However, LD was only determined between an initial SNP located within a gene of interest and additional SNPs at various distances from this initial polymorphism. No dense, contiguous map of SNPs was developed for this analysis, nor was LD determined among the additional SNPs independently of the initial coding SNP. In our previous studies (22), we used a small number of separated individual human chromosomes to analyze over 2.5 Mb of sequence on human chromosome 21 with an average SNP density of one SNP per 2 kb. Here, we observed segments with high LD between SNPs extending up to 81 kb (average 21.7 kb), disrupted by segments of similar or larger size with no significant LD between SNPs.
These studies clearly illustrate that the LD pattern reflects the complex recombination history of the human genome. Therefore, an average LD estimate for the entire human genome will not help determine the number of SNPs necessary to capture the majority of genetic variation across the human genome.
Analysis of human haplotype structures.
If SNPs are not in LD, their alleles occur in seemingly random combinations on individual chromosomes. As a consequence, the alleles of neighboring SNPs that are not in LD can form a large number of different haplotypes (up to 2^n for n biallelic SNPs). In contrast, in regions where neighboring SNPs are in significant LD, only a small number of haplotypes are observed. These haplotypes are representative of the ancient haplotypes that have not been broken up by recombination. Therefore, an analysis of haplotype patterns would not only identify regions of significant LD between SNPs but would also identify those common (presumably ancient) haplotype patterns that represent the majority of chromosomes in that particular genomic interval. Knowledge of these common haplotypes would then permit the identification of "tag" SNPs, individual representative nonredundant SNPs that would unambiguously differentiate all major haplotypes without analyzing all SNPs in that particular region.
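The tag SNP idea can be sketched as a small search problem: find the fewest SNP positions whose alleles still tell every haplotype in a block apart. The sketch below assumes haplotypes are encoded as strings of "0"/"1" alleles (an encoding chosen only for illustration) and uses exhaustive search over subsets of increasing size, which is practical for the handful of SNPs in one block but not genome-wide; `minimal_tag_snps` is a hypothetical helper, not a published tool.

```python
from itertools import combinations


def minimal_tag_snps(haplotypes):
    """Return the smallest list of SNP positions whose alleles distinguish
    every distinct haplotype in a block.

    haplotypes: non-empty list of equal-length strings, one character per SNP
    (e.g. "0110"); repeated entries simply represent more chromosomes.
    Exhaustive search over subsets of increasing size: fine for the handful
    of SNPs in one block, exponential in general.
    """
    distinct = sorted(set(haplotypes))
    n_snps = len(distinct[0])
    for k in range(n_snps + 1):
        for cols in combinations(range(n_snps), k):
            # Project every haplotype onto the candidate tag positions
            projected = {tuple(h[c] for c in cols) for h in distinct}
            if len(projected) == len(distinct):
                return list(cols)
    return list(range(n_snps))  # unreachable: all SNPs always suffice


# Six SNPs carried on four haplotypes: two tag SNPs are enough to tell them apart.
block = ["000000"] * 6 + ["111111"] * 5 + ["000111"] * 3 + ["110000"] * 2
print(minimal_tag_snps(block))
```

Here positions 0 and 3 separate all four haplotypes, so genotyping two of the six SNPs would recover the full haplotype of every chromosome in this toy block.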
Only a handful of studies have attempted to analyze haplotype patterns across extended regions of the human genome, and each study has proposed a different approach to defining haplotype blocks, based on varying degrees of stringency. In our analysis of human chromosome 21, we defined haplotype blocks exclusively based on the detection of significant LD between groups of at least five neighboring SNPs (22). Regions of low LD were not investigated further to define haplotype substructures within these regions. In contrast to this LD-based approach, Patil et al. (23) at Perlegen Sciences directly analyzed the haplotype patterns along the entire chromosome 21. This analysis was made possible by the same experimental approach used in our studies (22), the formation of single-chromosome hybrid cell lines, which allows direct ascertainment of the haploid genome rather than the diploid genome present in human cells. First, 24,047 common SNPs were used to identify regions where haplotypes found in at least two samples (i.e., common haplotypes with a frequency of at least 10%) account for at least 80% of all haplotypes found in the entire sample set of 20 chromosomes. Greedy algorithms (algorithms that always take the best immediate, or local, solution while finding an answer) were then used to maximize the ratio of the size of resulting haplotype blocks to the number of SNPs needed to differentiate between them. The resulting blocks contain anywhere from 2 to 114 SNPs, and the entire chromosome is covered by adjacent nonoverlapping blocks. Only 2,793 SNPs need to be genotyped to differentiate all common haplotypes for blocks that contain at least three SNPs, less than 12% of the total number of common SNPs on this chromosome.
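The following is a simplified sketch of the block-partitioning logic described for Patil et al. (23); it is not Perlegen's actual implementation. It assumes phased chromosomes encoded as "0"/"1" strings, counts a haplotype as common if it appears on at least two chromosomes, accepts an interval as a block if common haplotypes cover at least 80% of chromosomes, and measures block size in SNPs rather than kilobases. Each greedy round keeps the non-overlapping interval with the most SNPs per tag SNP. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def common_haplotypes(chroms, start, end, min_count=2):
    """Haplotypes over SNPs [start, end) seen on at least `min_count` chromosomes."""
    counts = Counter(c[start:end] for c in chroms)
    return {h: n for h, n in counts.items() if n >= min_count}


def n_tags(haps):
    """Size of the smallest SNP subset that tells all given haplotypes apart."""
    haps = list(haps)
    width = len(haps[0])
    for k in range(width + 1):
        for cols in combinations(range(width), k):
            if len({tuple(h[c] for c in cols) for h in haps}) == len(haps):
                return k
    return width


def greedy_blocks(chroms, coverage=0.8):
    """Greedy partition of SNP positions into non-overlapping blocks.

    A candidate interval qualifies if its common haplotypes cover at least
    `coverage` of all chromosomes; each round keeps the qualifying interval
    with the most SNPs per tag SNP.
    """
    n_snps = len(chroms[0])
    used = [False] * n_snps
    blocks = []
    while True:
        best = None  # (ratio, start, end)
        for start in range(n_snps):
            for end in range(start + 1, n_snps + 1):
                if any(used[start:end]):
                    break  # every longer interval from here overlaps too
                common = common_haplotypes(chroms, start, end)
                if sum(common.values()) < coverage * len(chroms):
                    continue
                ratio = (end - start) / max(n_tags(common), 1)
                if best is None or ratio > best[0]:
                    best = (ratio, start, end)
        if best is None:
            return sorted(blocks)
        _, start, end = best
        blocks.append((start, end))
        used[start:end] = [True] * (end - start)


# Nine chromosomes: SNPs 0-2 and 3-5 each carry two haplotypes, but the
# rare cross-over combinations keep a single 6-SNP block below 80% coverage.
chroms = ["000000"] * 3 + ["111111"] * 4 + ["000111", "111000"]
print(greedy_blocks(chroms))
```

Because the search is greedy, the first good block it commits to constrains all later choices; this is the trade-off between speed and optimality that the dynamic-programming alternatives discussed later address.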
In another haplotype analysis of a 500-kb region on human chromosome 5q31, Daly et al. (5) focused on the degree of heterozygosity seen in their sample. They analyzed 103 SNPs with a minor allele frequency of greater than 5%. For each set of five consecutive SNPs, they calculated the ratio of observed to expected heterozygosity. Around local minima of this ratio, i.e., in regions where the observed degree of heterozygosity was smaller than expected, neighboring SNPs were added in stepwise fashion to determine the maximum number of SNPs in this region of low haplotype heterozygosity. Using this approach, they identified blocks of up to 100 kb with two to four haplotypes. Although the approach to define the boundaries of each block was different, the resulting picture of the block structure is strikingly similar to the structure reported by Patil et al. (23), a general pattern repeatedly found in other studies as well (11, 30). Interestingly, both Daly et al. (5) and Jeffreys et al. (11) describe recombination events between haplotype block regions that lead to this observed genomic haplotype structure. While the degree of recombination within each block is low, an increased number of recombination events can be observed between adjacent haplotype blocks. To exploit this difference in recombination, Gabriel et al. (7) recently analyzed 13.4 Mb of human sequence using SNPs at a density of one SNP per 7.8 kb. Blocks were defined by calculating confidence intervals for D', a commonly used measure of LD and recombination between pairs of SNPs. Only groups of SNPs uninterrupted by recombination were considered haplotype blocks. For their study, Gabriel et al. (7) examined samples of European ancestry, Japanese and Chinese samples, as well as samples from West Africa and of African descent (African-American samples). Three to six percent of pairwise comparisons in European samples were strongly suggestive of recombination. 
An analysis of Japanese and Chinese samples yielded similar data. In contrast, African-American and West African samples showed evidence of recombination in 14–18% of all SNP pairs. As a consequence, the average haplotype block size in European and Asian samples was 18 kb, but only 9 kb for African samples. Surprisingly, 77–95% of block boundaries coincided between the different samples, with discordant results mainly due to additional recombination events in African samples. This similarity across the different samples is also evident in the haplotype diversity: 51% of haplotypes within blocks were present in European, Asian, and African populations, and 72% were present in two of the three population groups. Twenty-eight percent of haplotypes were specific to West African samples. This analysis provides a first glimpse of the potential of genome-wide haplotype maps and their usefulness across different samples and populations.
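The window statistic used by Daly et al. (5) can be illustrated as follows, under one plausible reading that the text does not spell out: observed haplotype heterozygosity is taken as 1 minus the sum of squared haplotype frequencies, and the expected value is what that heterozygosity would be if the SNPs in the window combined independently (linkage equilibrium). Their exact definition may differ, and the stepwise extension around local minima is not shown. Chromosomes are assumed to be "0"/"1" strings; the function names are hypothetical.

```python
from collections import Counter
from itertools import product


def observed_heterozygosity(haps):
    """Haplotype heterozygosity 1 - sum(h_i^2) over observed haplotype frequencies."""
    n = len(haps)
    return 1.0 - sum((count / n) ** 2 for count in Counter(haps).values())


def expected_heterozygosity(haps):
    """Heterozygosity expected if the SNPs combined independently
    (linkage equilibrium): 1 - prod_j (p_j^2 + q_j^2)."""
    n = len(haps)
    homozygosity = 1.0
    for j in range(len(haps[0])):
        p = sum(h[j] == "1" for h in haps) / n
        homozygosity *= p * p + (1 - p) * (1 - p)
    return 1.0 - homozygosity


def heterozygosity_ratios(chroms, window=5):
    """Observed/expected ratio for every run of `window` consecutive SNPs;
    values well below 1 flag candidate low-diversity (block-like) regions."""
    ratios = []
    for start in range(len(chroms[0]) - window + 1):
        sub = [c[start:start + window] for c in chroms]
        expected = expected_heterozygosity(sub)
        observed = observed_heterozygosity(sub)
        ratios.append(observed / expected if expected > 0 else float("nan"))
    return ratios


independent = ["".join(bits) for bits in product("01", repeat=5)]  # all 32 combinations
linked = ["00000", "11111"] * 16                                    # only two haplotypes
print(heterozygosity_ratios(independent), heterozygosity_ratios(linked))
```

With every allele combination present the ratio is 1; when only two haplotypes survive, it drops to about 0.52, the kind of local minimum around which blocks would be grown.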
Overall, these few studies shed an initial light on the complexity of the human haplotype structure. However, the results are too few to allow a conclusive prediction about the entire human genome, and several questions will need to be answered before haplotype block information can be confidently used to select an efficient subset of "tag SNPs" for future whole-genome association studies. Below, I will address some of these questions, in particular as they pertain to the HapMap project.
How do we define haplotype blocks, and what is the confidence in defining the block boundaries?
As the descriptions of the initial studies above show, vastly different analytical approaches have been used to divide the genome into haplotype blocks. Each approach yields a set of blocks of different sizes, but to date no dataset has been analyzed with multiple approaches to compare the resulting patterns. If the primary goal of a haplotype map of the human genome is to define a set of tag SNPs that can be used to capture the majority of haplotype diversity for association studies without interrogating all SNPs in the genome, then approaches similar to the one used by Patil et al. (23) seem advisable. Since this analysis focuses on striking a balance between the number of haplotypes found in a particular genomic region and the number of SNPs needed to differentiate between them, it will identify a subset of tag SNPs that capture the haplotype variation efficiently. Similar approaches using different computational strategies were presented at the recent Annual Meeting of the American Society of Human Genetics in Baltimore, MD, October 15–19, 2002 (Wang N, Akey JM, Zhang K, and Chakraborty R, "Haplotype block definitions and their relationship with population history, recombination rate and SNP density"; Zhang K, Calabrese P, Chen T, Deng M, Nordborg M, Waterman M, and Sun F, "A dynamic programming algorithm for haplotype block partitioning and its application in association studies"; and Perola M, Koivisto M, Varilo T, Hennah W, Ekelund J, Lugg M, Peltonen L, Ukkonen E, and Mannila H, "A method to find and compare the strength of haplotype block boundaries and its application in populations with different settlement histories"). When applied to the Perlegen data set, the approach by Zhang et al. results in an even smaller number of SNPs needed for a complete analysis of human chromosome 21.
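Why a dynamic-programming formulation can beat a greedy one is easy to see in code. The sketch below is written in the spirit of the Zhang et al. abstract cited above but is not their published algorithm: it assumes the same illustrative block rule as a Patil-style analysis (haplotypes seen at least twice must cover at least 80% of chromosomes) and searches over every partition of the SNPs for the one with the fewest tag SNPs in total, breaking ties in favor of fewer blocks. Chromosomes are "0"/"1" strings, and `min_tag_partition` is a hypothetical name.

```python
from collections import Counter
from functools import lru_cache
from itertools import combinations


def min_tag_partition(chroms, coverage=0.8, min_count=2):
    """Partition SNPs into blocks so that the total number of tag SNPs is minimal.

    Returns (total_tags, blocks), blocks being (start, end) half-open intervals.
    Ties are broken in favour of fewer blocks.
    """
    n_snps, n_chroms = len(chroms[0]), len(chroms)

    def block_cost(start, end):
        """Tag SNPs needed for block [start, end), or None if it fails the coverage rule."""
        counts = Counter(c[start:end] for c in chroms)
        common = [h for h, n in counts.items() if n >= min_count]
        if sum(counts[h] for h in common) < coverage * n_chroms:
            return None
        width = end - start
        for k in range(width + 1):
            for cols in combinations(range(width), k):
                if len({tuple(h[c] for c in cols) for h in common}) == len(common):
                    return k
        return width

    @lru_cache(maxsize=None)
    def best(start):
        """Optimal (total_tags, n_blocks, blocks) for SNPs [start, n_snps)."""
        if start == n_snps:
            return 0, 0, ()
        options = []
        for end in range(start + 1, n_snps + 1):
            cost = block_cost(start, end)
            if cost is None:
                continue
            rest_tags, rest_n, rest_blocks = best(end)
            options.append((cost + rest_tags, rest_n + 1, ((start, end),) + rest_blocks))
        return min(options) if options else (float("inf"), 0, ())

    total, _, blocks = best(0)
    return total, list(blocks)


chroms = ["000000"] * 3 + ["111111"] * 4 + ["000111", "111000"]
print(min_tag_partition(chroms))
```

Each suffix of the SNP list is solved once and reused, so the search is exact for a given block rule; the cost grows with the number of candidate intervals and with the exhaustive tag search inside each block.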
However, as biologists, we prefer to use biologically "meaningful" approaches to divide the genome into blocks over approaches based primarily on reducing the number of SNPs needed for comprehensive analysis. Therefore, methods based on the analysis of recombination as the primary force in creating the haplotype block structure are appealing. In particular, knowing the location of recombinational hot spots may help in fine-mapping analyses of quantitative trait loci (QTL) regions. A segment of the genome uninterrupted by recombination contains only a few genes at most, and thus the functional identification and analysis of disease-causing genes in regions previously identified by linkage analysis can be greatly accelerated and targeted. It seems that an ideal approach for defining haplotype blocks would incorporate both ideas to generate a map useful not only for genome-wide disease association studies but also for QTL fine mapping. It remains to be seen whether this is possible, and no clear strategy has been proposed for the HapMap project to address these issues.
Another recently proposed approach uses metric LD maps rather than arbitrary block definitions in the analysis of LD in extended genomic regions (18). Such maps can be constructed from both haplotype data and phase-unknown genotype data. The approach identifies the recombinational hot spots described in the original studies (5, 10, 35) and has been applied successfully to other datasets (18). Future comparisons will show how this method performs relative to the approaches described above when applied to large genomic intervals.
An additional question is the confidence of different studies in defining the boundaries of haplotype blocks. Blocks that end with a specific SNP and lead to the next haplotype block starting with the next adjacent SNP seem to have well-defined boundaries that can be determined very accurately and precisely, but variation of the parameters chosen to determine these boundaries quickly reveals the difficulty of this task. While large blocks that include multiple SNPs overlap significantly between the results using varying parameters, boundaries between short blocks including only few SNPs diverge. Since these short regions represent areas of low or short LD, this result is not surprising. If the goal of a haplotype analysis of a genomic region is solely to identify a subset of SNPs that are highly informative and capture the vast majority of genotypic and haplotype variation, then this problem may not have a significant effect. However, if haplotypes are to be used directly in disease association studies, then borders should be defined accurately and confidently to permit the exclusion of neighboring haplotype regions as regions harboring potentially causal mutations. For this, studies will need to compare the different statistical approaches on a common extensive set of SNP genotyping data. Hopefully, the data generated by the HapMap consortium will provide this resource.
How are neighboring blocks related to each other?
Even though the definition of distinct haplotype blocks may imply otherwise, neighboring blocks are not independent. In fact, one may still find significant LD between SNPs residing in different haplotype blocks, and often individual haplotypes in one block occur exclusively in conjunction with one particular haplotype in the neighboring block, while other haplotypes in the two blocks show more random association. Several studies have reported observations to this effect (5, 22). This relationship is essential for the interpretation of disease association studies, as has been shown in the analysis of haplotypes in Crohn's disease (27) and familial combined hyperlipidemia (26), and it will need to be investigated thoroughly in a variety of samples. If regions of significant LD that are separated by interspersed sequence routinely show significant LD with each other, this may complicate the interpretation of haplotype block-based disease association studies. Our current hope is that the use of haplotype block information will identify a defined interval where the functional gene variant is to be found. However, complex LD relationships of individual haplotype blocks to other nonadjacent blocks will expand the region that needs to be analyzed for potential functional variants. Whether this complex relationship is common between haplotype blocks or rather the exception remains to be seen.
What is the difference between populations?
While studies of LD have looked at differences between populations and ethnic groups (8, 12, 20, 32), haplotype analyses so far have primarily been performed using family-based samples of European descent (e.g., CEPH families, triads from Quebec) or samples of unknown ethnicity (Polymorphism Discovery Resource; Coriell Cell Repositories, Camden, NJ). Little is known about the actual differences in haplotype block boundaries for populations of different evolutionary history, and given the observed differences in the extent of LD between North American and African samples, it is conceivable that differences will be found in the resulting haplotype structures. Only the recent study by Gabriel et al. (7) provides preliminary results on the haplotype structure in different populations, and the populations used in the analysis were not selected to capture the divergence of human populations. Although the cores of haplotype blocks in the examined populations may coincide with haplotype blocks in other populations, we do not know how the block boundaries match. It has been shown for small regions and a limited number of populations that most common haplotypes are shared among major populations (6). Their frequencies may vary, and the most common haplotype may not be identical in all ethnic groups, but data from several laboratories suggest that this relationship holds at least for major populations. However, this relationship becomes significantly more complex when isolated populations are investigated. A population bottleneck and the resulting small effective population size produce a much larger extent of LD and can drastically alter allele frequencies compared with other populations; i.e., a SNP allele that is rare in most populations of the world may have a significantly higher frequency in an isolated population. 
This change in allele frequency can in turn alter haplotype frequencies in this isolated population and even produce "common" haplotypes that are extremely rare anywhere else in the world. Also, haplotype blocks in these populations may be much larger due to increased LD between SNPs. Since several studies have favored the use of recent population isolates in attempts to identify genetic causes of complex disorders, a haplotype map based on North American, European, African, or Asian populations may not advance these studies. This problem will also not be addressed by the HapMap consortium. Here, the focus will be exclusively on the Yoruba in Nigeria, the Japanese, the Han Chinese, and US residents with Northern European ancestry. The proposal clearly states that these populations were selected solely on the basis of availability of samples and not on their usefulness as representative samples for worldwide population diversity. It remains to be seen whether the resulting haplotype map will have universal usefulness or will merely serve as an initial scaffold for future, redefined population-specific haplotype maps.
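How a small effective population size can shift allele frequencies is illustrated by the textbook Wright-Fisher model of genetic drift. The sketch below is only an illustration under stated assumptions: a constant diploid population size, no mutation, selection, migration, or recombination, and hypothetical sizes of 25 and 1,000 individuals. It does not model the accompanying increase in LD and is not a model of any real isolate.

```python
import random
from statistics import pvariance


def wright_fisher(p0, pop_size, generations, rng):
    """Allele frequency after `generations` of pure genetic drift in a diploid
    population of constant size (Wright-Fisher model: no mutation, selection
    or migration)."""
    n_copies = 2 * pop_size
    p = p0
    for _ in range(generations):
        # Each of the 2N gene copies of the next generation is drawn at random
        # from the current pool, so the new frequency is a binomial sample
        p = sum(rng.random() < p for _ in range(n_copies)) / n_copies
    return p


rng = random.Random(1)
for size in (25, 1000):  # hypothetical effective population sizes
    finals = [wright_fisher(0.05, size, 10, rng) for _ in range(50)]
    print(f"N={size}: mean {sum(finals) / len(finals):.3f}, "
          f"variance {pvariance(finals):.5f}, max {max(finals):.2f}")
```

Starting from a 5% allele, replicate runs in the small population scatter far more widely after only ten generations, so some end up with the allele at several times its starting frequency while others lose it; in the large population the frequency stays close to 5%.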
Furthermore, the use of either family-based or independent individual samples in the different population samples complicates the comparison. Extensive efforts are under way to improve our ability to derive haplotypes from random unrelated individuals. Recent developments certainly suggest that random samples can be used in a comparable manner for haplotype reconstruction (17); however, additional data will be required to validate these results.
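Deriving haplotypes from unrelated individuals means resolving phase: a person heterozygous at two SNPs could carry either 00/11 or 01/10. One classic approach, shown here for just two SNPs, is expectation-maximization (EM) of haplotype frequencies; it is not necessarily the method of reference (17). The sketch assumes genotypes are coded as the number of "1" alleles (0, 1, or 2) at each SNP; the function names and the fixed iteration count are illustrative choices.

```python
from itertools import product

HAPS = ["00", "01", "10", "11"]


def compatible_pairs(g1, g2):
    """Unordered haplotype pairs consistent with genotype (g1, g2), where each
    g is the number of '1' alleles (0, 1 or 2) an individual carries at a SNP."""
    pairs = set()
    for h, k in product(HAPS, repeat=2):
        if int(h[0]) + int(k[0]) == g1 and int(h[1]) + int(k[1]) == g2:
            pairs.add(tuple(sorted((h, k))))
    return sorted(pairs)


def em_haplotype_freqs(genotypes, iterations=50):
    """Estimate two-SNP haplotype frequencies from unphased genotypes by EM."""
    freqs = {h: 0.25 for h in HAPS}
    for _ in range(iterations):
        counts = {h: 0.0 for h in HAPS}
        for g1, g2 in genotypes:
            pairs = compatible_pairs(g1, g2)
            # E-step: weight each possible phase by its probability under the
            # current frequencies (heterozygous pairs can arise two ways)
            weights = [freqs[h] * freqs[k] * (1 if h == k else 2) for h, k in pairs]
            total = sum(weights)
            if total == 0:
                continue
            for (h, k), w in zip(pairs, weights):
                counts[h] += w / total
                counts[k] += w / total
        # M-step: new frequencies are the expected haplotype counts / 2n
        n_copies = 2 * len(genotypes)
        freqs = {h: c / n_copies for h, c in counts.items()}
    return freqs


# Double homozygotes reveal 00 and 11; EM then resolves the double
# heterozygotes (1, 1) as 00/11 rather than 01/10.
genotypes = [(2, 2)] * 10 + [(0, 0)] * 10 + [(1, 1)] * 5
print({h: round(f, 3) for h, f in em_haplotype_freqs(genotypes).items()})
```

Phase is inferred from the population as a whole: the unambiguous individuals make 00 and 11 common, which in turn makes 00/11 the far more likely phase for the ambiguous ones. With many SNPs and rare haplotypes this inference becomes much less certain, which is why additional validation data matter.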
What is the "right" allele frequency and SNP density?
Almost all studies analyzing LD and haplotype structures in the human genome have focused on common SNPs with a minor allele frequency of 5% or greater. The main rationale for this approach is the "common allele-common disease hypothesis," which suggests that the majority of common genetic disorders (the analysis of which may require or at least greatly benefit from efficient genome-wide association studies) are caused by genetic variants common in the general population (16). In this context, Gabriel et al. (7) have shown that a SNP density of one common SNP every 5–10 kb captures the majority of common haplotype variation. Adding further common SNPs in the same region does not significantly increase this efficiency. This suggests that, likewise, any common disease-causing allele not directly ascertained would be captured in one of the haplotypes identified through the analysis of common SNPs.
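A back-of-the-envelope calculation relates this marker density to genome-wide SNP numbers. The sketch assumes a round haploid genome size of 3 × 10^9 bp and perfectly even spacing, both simplifications. One SNP every 5–10 kb then corresponds to roughly 300,000–600,000 SNPs, which is consistent with the tag SNP target quoted for the HapMap project, although the text does not say that the target was derived this way.

```python
GENOME_SIZE_BP = 3.0e9  # assumed round figure for the haploid human genome


def snps_needed(spacing_kb: float, genome_bp: float = GENOME_SIZE_BP) -> int:
    """Number of evenly spaced SNPs at one SNP per `spacing_kb` kilobases."""
    return round(genome_bp / (spacing_kb * 1_000))


for spacing in (5, 10):
    print(f"one common SNP every {spacing} kb -> about {snps_needed(spacing):,} SNPs")
```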
There is considerable debate, however, about the best "cutoff" allele frequency. It is clear that including rarer SNPs in a haplotype analysis will increase the number of haplotypes found, thereby increasing the diversity, and, depending on the algorithm used to define them, reduce the extent of haplotype blocks. It remains to be seen, however, whether it is best to focus on SNPs with an allele frequency of greater than 10% for the construction of a common haplotype map, and how block boundaries will shift when this restriction is varied. Setting the cutoff at a specific value at this point may be practical, but whether such a cutoff is biologically meaningful and useful is still unclear. Recent data from the analysis of the ApoE region suggest that significant effects on plasma lipid levels would have been missed if a haplotype-based analysis had solely focused on haplotypes constructed using common SNPs (29).
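The effect of the frequency cutoff on haplotype diversity can be shown on a toy sample. The sketch assumes phased chromosomes coded as "0"/"1" strings and a hypothetical third SNP whose minor allele occurs on only one of ten chromosomes (frequency 0.1). Lowering the cutoff from 0.2 to 0.05 admits that SNP and splits one common haplotype into two, raising the haplotype count from two to three. The helper names are illustrative.

```python
from collections import Counter


def minor_allele_frequency(chroms, j):
    """Frequency of the rarer allele at SNP j (alleles coded '0'/'1')."""
    p = sum(c[j] == "1" for c in chroms) / len(chroms)
    return min(p, 1 - p)


def haplotypes_with_cutoff(chroms, cutoff):
    """Keep SNPs whose minor allele frequency is at least `cutoff` and count
    the haplotypes those SNPs define."""
    kept = [j for j in range(len(chroms[0]))
            if minor_allele_frequency(chroms, j) >= cutoff]
    return kept, Counter("".join(c[j] for j in kept) for c in chroms)


chroms = ["000"] * 5 + ["110"] * 4 + ["111"]  # SNP 2 has minor allele frequency 0.1
for cutoff in (0.2, 0.05):
    kept, haps = haplotypes_with_cutoff(chroms, cutoff)
    print(cutoff, kept, dict(haps))
```

The new haplotype is carried by a single chromosome, so it would fall below most "common haplotype" thresholds even though it could tag a functionally relevant rare variant.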
How useful is the resulting information for disease gene identification? What is the power of haplotype-based association studies, especially when based on "tag" SNPs?
Although the primary incentive for constructing a haplotype map of the human genome is its application to disease genetics, it is by no means clear how successful this approach will be. Genetic studies focusing on the underlying causes of common human disorders using traditional linkage-based or candidate gene approaches have mostly failed to produce the results scientists had hoped for. The field is by no means devoid of successful results, but the major breakthroughs expected with advances of the Human Genome Project have not yet materialized. A potential use of a haplotype map was shown by Rioux et al. (27) in their analysis of Crohn's disease. Here, a haplotype-based analysis of a region on human chromosome 5q31 initially identified by linkage analysis further reduced the number of potential candidate genes. However, it did not conclusively identify the gene responsible for the initial linkage results, and it has not yet led to the identification of the genetic defect in this genomic region underlying the disorder. While these results do suggest that a haplotype map is useful for fine-mapping linkage results, we currently have no data to support the claim that a "tag SNP"-based genome-wide association study design will help us achieve our goal of identifying the genetic causes of common human ailments. In the Crohn's disease study, this approach clearly reduced the number of candidate genes for functional analyses compared with the initial QTL region. However, the study also illustrates that haplotype-based association or fine-mapping studies cannot and will not replace extensive experimental functional analyses of candidate genes and potential disease-causing mutations. Clearly, more efficient approaches for these functional evaluations will be needed to ultimately solve the genetic puzzle of complex human disorders and to indicate potential novel approaches for effective treatment.
Despite the lack of experimental data, recent simulations at least suggest that the use of representative tag SNPs reduces the power of association analyses by only 4% compared with assaying every single SNP, assuming that 25% of all SNPs are included as "tag" SNPs (34). In comparison, when a quarter of the SNPs were selected at random, power was reduced by 12%. Similar results were obtained when only every seventh SNP was included in the tag SNP set. Although these simulations raise hopes for genome-wide haplotype approaches, the ultimate usefulness of a haplotype map for disease association will depend not only on the map itself but even more on the study design and the patient cohort to which it is applied.
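For readers less familiar with what an "association analysis" at a single tag SNP involves, the sketch below shows one of the simplest forms: a Pearson chi-square test (1 degree of freedom) comparing allele counts between cases and controls. Whether the simulations in (34) used this statistic is not stated in the text, and the counts in the example are invented for illustration. The p-value uses the identity that the upper tail of a 1-df chi-square distribution equals erfc(sqrt(x/2)).

```python
import math


def allelic_chi_square(case_alleles, control_alleles):
    """Pearson chi-square test (1 df) on a 2x2 table of allele counts.

    Each argument is (copies of allele 1, copies of allele 2) counted over all
    chromosomes of cases or controls. Returns (chi2, p_value).
    """
    table = [list(case_alleles), list(control_alleles)]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper tail of the chi-square distribution with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Invented counts: allele 1 on 60 of 100 case chromosomes vs. 40 of 100 control chromosomes
print(allelic_chi_square((60, 40), (40, 60)))
```

When a tag SNP rather than the causal variant is tested, the allele-frequency difference between cases and controls is diluted to the extent that the two SNPs are not perfectly correlated. This dilution is why the choice of tag SNPs, and not just their number, affects power.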
To date, we have made significant progress in understanding the haplotype structure of the human genome. Only a few years ago, it was generally assumed that LD declined steadily with increasing distance between SNPs, and our estimate of the number of SNPs required for genome-wide association studies was based on a map of evenly spaced SNPs (14). We have come to accept the more complex haplotype and LD structure of the human genome, but there is no consensus on how to interpret and analyze SNP haplotype data and on the "right" way to construct a genome-wide haplotype map. Private and public efforts are under way to generate such maps, and it remains to be seen whether the results will provide the needed resource to accelerate human disease association studies in the future. Hopefully, the open and as of yet unanswered questions will be addressed quickly before the conclusion of the 3-year HapMap effort, and the answers can be used to guide this major effort in studying the complex and intricate haplotype structure of the human genome. Although a haplotype map of the human genome will not help us in answering all questions pertaining to the genetics of common human disorders, it will certainly provide the knowledge necessary for meaningful SNP-based association studies, and as such it will assist basic researchers and clinicians alike in unraveling the complex genetic etiology of human ailments that affect the majority of people at some point during their lifespan. 
Rather than selecting SNPs at random for association studies, or selecting them based on our limited ability to identify "functional" SNPs (for the vast majority of SNPs, we do not know whether they actually have a direct function; this is true even for "obvious" candidates such as nonsynonymous SNPs altering the amino acid sequence of a protein), a set of "tag" SNPs identified through the analysis of the haplotype structure of the human genome would provide an invaluable resource for any researcher whether he wants to use SNPs to interrogate a possible role of a candidate gene (or region) in a specific disease or to perform a genome-wide association study in his patient cohort. Whether this approach will indeed require the estimated 300,000–600,000 tag SNPs can only be answered once more experimental data become available.
FOOTNOTES
Article published online before print. See web site for date of publication (http://physiolgenomics.physiology.org).
Address for reprint requests and other correspondence: M. Olivier, Human and Molecular Genetics Center, Dept. of Physiology, Medical College of Wisconsin, 8701 Watertown Plank Road, Milwaukee, WI 53226 (E-mail: molivier{at}mcw.edu)
References