NEWS

Latest HapMap Update Aims to Direct Researchers to Genetic Basis of Disease

Charlie Schmidt

When scientists decoded the human genome, they produced the genetic blueprint for an abstract, generic individual who doesn't really exist. Compiled from a few anonymous samples, the DNA sequence derived by the Human Genome Project represented a typical human but offered few insights into the genetic variation that drives individual features, such as physical appearance and disease susceptibility. Because everyone on the planet is more than 99.9% genetically identical, this abstract template was a logical place to start.

More recently, however, scientists have focused on the residual variation, hoping to identify new genetic markers for disease and targets for drug therapy. The variations of most interest are simple DNA "spelling errors" called single-nucleotide polymorphisms (SNPs). The precise number of SNPs, although unknown, likely approaches 10 million, scientists say. Fortunately, scientists don't have to assess each SNP independently to establish its role in disease. This is because related SNPs tend to clump together in discrete, inherited blocks called haplotypes, which break the genome up into navigable chunks. Most humans share the same haplotypes, and the common SNPs within them are found in many of the world's populations.

Phase 2 HapMap Introduced

Efforts to use SNPs in biomedical research recently achieved a major milestone. On October 26, scientists with the International HapMap Project—a multinational consortium of nine research groups from five countries—unveiled their most recent, and likely final, version of the haplotype map (HapMap) at the annual meeting of the American Society of Human Genetics in Salt Lake City. Known as the Phase 2 HapMap, this depiction of SNP locations on the genome was compiled with samples taken from 269 people from four populations: the Yoruba tribe of Nigeria, Japanese people from Tokyo, Han Chinese from Beijing, and U.S. citizens in Utah with western and northern European ancestry. Phase 2 updates an earlier version of the HapMap (Phase 1) that was introduced by the consortium in March of this year. That version mapped the locations of roughly 1 million SNPs. Phase 2 raised the number of mapped SNPs to 3.5 million and captured most of the genetic variation found in the human race, according to David Altshuler, M.D., Ph.D., who directs the Program in Medical and Population Genetics at the Broad Institute of MIT and Harvard University, one of the consortium members. From start to finish, the HapMap took 3 years and cost $138 million.

[Photo: David Altshuler]

Scientists maneuver through the HapMap by referring to nearly 400,000 "tag" SNPs that act as signposts for individual haplotypes. Each tag is associated with a particular haplotype, which obviates the need to assess every SNP on the genome independently. This feature speeds analysis and makes it easier to assess how neighboring SNPs that are inherited together influence a person's traits or disease risk.
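
The idea that a few tag SNPs can stand in for many correlated neighbors can be sketched in a few lines of code. This is a minimal illustration, not the consortium's actual procedure. It assumes phased haplotypes coded as 0/1 alleles, measures how tightly two SNPs travel together with r² (the squared correlation used as a measure of linkage disequilibrium), and picks tags greedily using a hypothetical r² threshold of 0.8. The function names and toy data are invented for illustration.

```python
from itertools import combinations


def r_squared(a, b):
    """Squared correlation (r^2) between two biallelic SNPs.

    a, b: lists of 0/1 alleles observed on the same phased haplotypes.
    """
    n = len(a)
    p_a = sum(a) / n
    p_b = sum(b) / n
    p_ab = sum(x & y for x, y in zip(a, b)) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom > 0 else 0.0  # monomorphic SNP: r^2 undefined, use 0


def greedy_tags(haplotypes, threshold=0.8):
    """Choose tags so every SNP has r^2 >= threshold with at least one tag.

    haplotypes: dict mapping a SNP name to its list of 0/1 alleles.
    threshold: hypothetical r^2 cutoff for "this tag stands in for that SNP".
    """
    snps = list(haplotypes)
    # Each SNP stands in for itself plus any SNP it is highly correlated with.
    covers = {s: {s} for s in snps}
    for s, t in combinations(snps, 2):
        if r_squared(haplotypes[s], haplotypes[t]) >= threshold:
            covers[s].add(t)
            covers[t].add(s)
    untagged = set(snps)
    tags = []
    while untagged:
        # Greedy step: take the SNP that captures the most untagged SNPs
        # (ties go to the SNP listed first).
        best = max(snps, key=lambda s: len(covers[s] & untagged))
        tags.append(best)
        untagged -= covers[best]
    return tags


if __name__ == "__main__":
    # Hypothetical toy data: eight phased haplotypes, five SNPs.
    toy = {
        "snp1": [0, 0, 0, 0, 1, 1, 1, 1],
        "snp2": [0, 0, 0, 0, 1, 1, 1, 1],
        "snp3": [0, 0, 0, 0, 1, 1, 1, 1],
        "snp4": [0, 1, 0, 1, 0, 1, 0, 1],
        "snp5": [0, 1, 0, 1, 0, 1, 0, 1],
    }
    print(greedy_tags(toy))  # -> ['snp1', 'snp4']
```

In the toy data, snp1 stands in for snp2 and snp3, and snp4 stands in for snp5, so two tags cover five SNPs. A greedy choice is simple and fast, though it does not guarantee the smallest possible set of tags.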

Scientists typically apply the HapMap in two kinds of disease-association studies. In hypothesis-driven studies, scientists use the HapMap to identify tag SNPs for candidate genes putatively linked to a given disease. The tags are then screened in case–control studies to compare their frequency among people with and without the disease under investigation. Scientists can then take a closer look at the associated genomic regions to seek out other variants that might also contribute to the risk.

The second class of studies falls in the realm of exploratory research. In these studies—known as whole-genome scans—scientists survey many SNPs in case and control DNA samples. The goal is to identify entirely new candidate genes, which can then be queried in the HapMap to find more SNPs for investigation.
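
The case–control comparison at the heart of both kinds of study can be illustrated with a simple allelic test. This is a sketch under stated assumptions: each SNP is treated as having two alleles, alleles are counted (two per person) in cases and in controls, and a Pearson chi-square test with one degree of freedom is applied, which is one common choice among several. The function name and the allele counts are hypothetical, and real studies add safeguards such as correcting for the number of SNPs tested.

```python
import math


def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test (1 degree of freedom) on allele counts.

    The 2x2 table has rows = cases, controls and columns = alternate,
    reference allele. Assumes every count is nonzero.
    Returns (chi_square, p_value, odds_ratio).
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_sums = [case_alt + case_ref, ctrl_alt + ctrl_ref]
    col_sums = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected if the allele frequency were equal in both groups.
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper-tail probability of a chi-square variable with 1 df:
    # P(Z^2 > x) = erfc(sqrt(x / 2)) for a standard normal Z.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    return chi2, p_value, odds_ratio


if __name__ == "__main__":
    # Hypothetical counts: 1,000 cases and 1,000 controls (2,000 alleles each).
    chi2, p, odds = allelic_chi_square(600, 1400, 500, 1500)
    print(f"chi2={chi2:.2f} p={p:.1e} OR={odds:.2f}")
```

In this made-up example, the allele appears on 30% of case chromosomes versus 25% of control chromosomes, an odds ratio of about 1.29. Modest effects like this are the reason such associations need large samples to detect and replicate.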

Lisa Brooks, Ph.D., who directs the Genetic Variation Program at the National Human Genome Research Institute (NHGRI), emphasized the HapMap's utility for studying complex genetic diseases that arise from the small contributions of many genes. "If you're looking at Mendelian diseases that come from variations in just one gene, you can probably get those from linkage studies in affected families," she explained. "But linkage studies don't work well for complex diseases like cancer, where you need to look at the interactions of many genes in unrelated groups of people."

[Photo: Lisa Brooks]

A Focus on Common Variants

Among the HapMap's notable features is that it contains only common SNPs, which the consortium defined as those occurring in at least 5% of the chromosomes in the sampled populations. The 5% cutoff was warranted for two reasons, Altshuler contends: common SNPs lie at the heart of most inherited, complex diseases, and findings that link common SNPs with disease can be more easily replicated in additional cohorts. Replication is key to establishing causality in disease-association studies, Altshuler says, particularly because many disease-associated genes contribute only a small fraction of the overall risk. Moreover, genes that are associated with a disease in one population may not be associated with the same disease in another. The only way to be sure of a gene's suspected role is to study it in many people, and depending on the strength of the effect, this could require sample sizes numbering in the hundreds or even thousands of cases.

Findings with rare SNPs are even harder to replicate. "It's difficult to make a case for a rare genetic variant unless it produces a totally obvious functional change," Altshuler said. "I'd want to be able to say this rare variant is causal by simply staring at it, but I can't do that. That's not to say that rare SNPs might not be important in some cases, but the technology isn't ready to make sense of them yet. So, it's reasonable to focus on the straightforward, common cases, and as the technology develops, we'll consider the rare variants further. For now, we need to walk before we can run."
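
The consortium's 5% rule amounts to a filter on minor allele frequency, the share of chromosomes carrying the less common version of a SNP. A minimal sketch, assuming genotypes are coded as the number of copies (0, 1, or 2) of one allele per person and that the cutoff is applied within a single sample; the function names and data are hypothetical.

```python
def minor_allele_frequency(genotypes):
    """Minor allele frequency for one SNP.

    genotypes: per-person counts (0, 1, or 2) of one allele; each person
    carries two copies of the chromosome, so n people give 2n chromosomes.
    """
    freq = sum(genotypes) / (2 * len(genotypes))
    return min(freq, 1 - freq)


def common_snps(snp_genotypes, cutoff=0.05):
    """Names of SNPs whose minor allele is on at least `cutoff` of chromosomes."""
    return [name for name, genotypes in snp_genotypes.items()
            if minor_allele_frequency(genotypes) >= cutoff]


if __name__ == "__main__":
    # Hypothetical sample of 20 people (40 chromosomes).
    sample = {
        "snpA": [1] + [0] * 19,      # allele on 1 of 40 chromosomes (2.5%)
        "snpB": [1, 1] + [0] * 18,   # 2 of 40 (5%): exactly at the cutoff
        "snpC": [2] * 15 + [0] * 5,  # 30 of 40, so the minor allele is at 25%
    }
    print(common_snps(sample))  # -> ['snpB', 'snpC']
```

In a sample this small, a single extra carrier moves a variant across the line, which hints at why rare variants are so much harder to study with confidence.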

Limited Diversity?

Even so, critics contend that because only four populations were sampled for the HapMap, common SNPs from some regions, particularly Africa, may not be reflected by its contents. Humans evolved in Africa, and so that is where most of our genetic diversity is concentrated. Even the Yoruba Nigerians sampled for the HapMap are genetically distinct from populations on the rest of the continent. "If the HapMap was weighted by the amount of genetic variation, we'd sample a lot more African populations and a lot fewer from outside of Africa," says Gonçalo Abecasis, D.Phil., associate professor of biostatistics at the University of Michigan's School of Public Health. "Clearly, this is not what happened, and the choice partly reflects the fact that most genetic association studies aren't carried out there."

The consequence, argues Sarah Tishkoff, Ph.D., assistant professor of biology at the University of Maryland, is that SNPs common to African populations might be rare elsewhere in the world. Therefore, this pool of common genetic variation that exists in billions of people might not be accessible to those using the HapMap for disease-association research.

Altshuler acknowledges that the assessment of tag SNP performance using DNA samples from around the world is a priority. Researchers are now working independently to compare SNP profiles from their own study populations to those in the HapMap, he said. Preliminary results appearing in the peer-reviewed literature are encouraging, he added.

But Kenneth Kidd, Ph.D., a well-known HapMap critic, says his own research on 38 different populations shows that HapMap findings are concordant only among non-Africans. Kidd, professor of genetics and psychiatry at Yale University, insists that the HapMap is best suited for studies of people of Japanese, Chinese, or European descent, who tend to be genetically homogeneous.

Other scientists agree. Peter Kraft, Ph.D., an assistant professor of epidemiology and biostatistics at the Harvard School of Public Health, offers a "buyer beware" warning to those using the HapMap for their own research. "Scientists should understand their own African population might not look anything like the Yorubans," he said. "Similarly, for those doing studies in the United States, there's probably a big difference between Hispanic Latinos and any of the four sampled populations. So, it's worth your time and money to do some genotyping in your own samples to ensure what you see is consistent with what you find in the HapMap."

Kraft currently uses the HapMap for cancer research in several major Harvard studies, including the Nurses' Health Study, which was launched in 1976, and the Health Professionals Follow-up Study, which dates back to 1986. Within these studies, the main function of the HapMap project is to select SNPs for candidate genes, such as those implicated in breast, prostate, endometrial, and colorectal cancer. For these studies, the HapMap represents a major advance, Kraft said.

Until recently, it would have been difficult to assess most of the common variation in a given gene, but the HapMap has captured and validated almost all of it. As for whole-genome studies of cancer, "We wouldn't be able to do them without something like this," he said. "It would be far too expensive to go after every SNP, and furthermore a lot of that energy would be wasted because so many SNPs are redundant in the genome."

Toward the Future

According to NHGRI's Brooks, consortium members have plans to validate the HapMap's content with samples from seven new populations: the Luhya and Maasai peoples from Kenya; African Americans in Oklahoma; Mexican Americans in California; Gujarati Indians in Texas; Tuscans in Italy; and Chinese in Denver. (Funded investigators have not yet been identified.) But she emphasizes that the HapMap was never intended to be a study of global variation patterns. "This wasn't an anthropological study," she said. "It was designed to create a useful resource for biomedical research."

Meanwhile, the genomic landscape is changing rapidly. New chips used in genotyping already contain 500,000 SNPs; those expected within the next few years could contain several million more. Similarly, sequencing costs might drop so far that scientists bypass genotyping altogether and instead sequence deep into the genomes of their own study populations to find increasingly rare variants. Some predict that in a world of ultracheap sequencing, the HapMap could become outdated. But Altshuler insists that genotyping will stay proportionally cheaper than sequencing, even as both prices fall. There will be a long-standing need for the HapMap, he says, and the resource will benefit from analytical progress and absorb new information as it becomes available.


Copyright © 2005 Oxford University Press (unless otherwise stated)