1 Research Center for Genetic Medicine, Children's National Medical Center, Washington, District of Columbia 20010
2 Deputy Editor, Physiological Genomics
SINGLE NUCLEOTIDE POLYMORPHISMS (SNPs) can contribute directly to disease predisposition by modifying a gene's function, or they can be used as genetic markers to detect nearby disease-causing mutations through association or linkage studies. There are three general classes of single nucleotide variants: those with strong functional significance that dramatically alter a gene's behavior (classic mutations); those with more subtle functional effects that predispose to disease in concert with an individual's genetic background or environment (functional SNPs); and those that are completely silent with respect to function (nonfunctional SNPs). Single nucleotide mutations cause a detectable phenotype on their own and can result in the familiar Mendelian inheritance diseases. Functional SNPs are by far the most interesting class of variants since they are thought to occur at high frequencies within the general population and, when present in disadvantageous combinations, can result in disease. Examples of such common multigenic diseases are thought to include diabetes, cancer progression, and heart disease (the subject of this editorial). The distinction between a mutation and a functional SNP is often vague and can be boiled down to the level of penetrance of the nucleotide variant. For example, single nucleotide mutations have very high penetrance and cause disease on their own in most cases, whereas functional SNPs have lower penetrance and raise an individual's risk for disease by a small amount. This risk can be raised further when combined with another functional SNP in the genome. Finally, SNPs can be completely silent functionally. This class of single nucleotide variant is by far the most common, with a polymorphic base estimated to occur approximately once every 1,300 nucleotides throughout the genome (1). These types of SNPs are useful as genetic markers to localize nearby disease-causing events through association analyses, linkage disequilibrium studies in founder populations, and classic linkage analyses.
It is extremely difficult to assign a pathogenic role to a common functional SNP. Almost exclusively, nonsynonymous (amino acid changing) SNPs have been assumed to be the sole type of functional SNPs that predispose to multigenic disease. This is in part because it is immediately obvious that a protein alteration exists which could change function, especially when the nucleotide change resides in an important peptide motif (9). Even so, one is ultimately forced to validate any presumed causative nonsynonymous SNP finding in vitro or in vivo, a substantial task. This undertaking is further complicated if multiple variants are presumed to work in concert to produce an altered phenotype and must therefore be validated together. Functional SNPs may also reside within regulatory elements, which may lie within introns, promoters, or distant enhancer or repressor elements. These functional SNPs, in most cases, will not result in an amino acid change (synonymous SNPs), but could alter splicing, regulation, transcript stability, etc. Examples of these are rare, simply because it is much more difficult to assess the mechanism of action of these events. Thus synonymous SNPs are largely ignored with respect to disease causality.
The great, yet unrealized, promise of nonfunctional SNP markers lies in being able to perform whole-genome high-density SNP typing to identify blocks of haplotypes (as a first step to identifying the functional changes) that come together in affected individuals to contribute to common multigenic diseases. The technology for this type of whole-genome SNP typing analysis in large numbers of individuals is still several years away, even though significant effort is being invested in identifying a minimum number of informative SNPs which would ascertain all human haplotypes (3, 5). In the interim, we are forced to take a candidate gene approach to multigenic disease. This entails preselecting genes based on function and then typing SNPs in these genes in affected individuals and controls to identify those SNPs which have skewed frequencies in affected individuals. Presumably, if the control population is matched correctly on ethnicity and geography, this would indicate that a certain "flavor" of gene has a pathogenic role.
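The comparison at the heart of the candidate gene approach can be sketched in a few lines. The following is a minimal illustration, not a method taken from any study discussed here: it assumes a single biallelic SNP, counts alleles rather than genotypes, and applies a Pearson chi-square test with 1 degree of freedom. The function name and all counts are hypothetical.

```python
import math

def allele_association(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square test (1 d.f.) on a 2x2 table of allele counts.

    Rows are cases and controls; columns are the alternate and reference
    alleles. Assumes every row and column total is non-zero, and that
    case_ref and control_alt are non-zero for the odds ratio.
    """
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    row_totals = [sum(row) for row in table]
    col_totals = [case_alt + control_alt, case_ref + control_ref]
    total = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom the upper-tail probability has a closed form.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (case_alt * control_ref) / (case_ref * control_alt)
    return chi2, p_value, odds_ratio

# Hypothetical counts: 144 cases and 144 controls, two alleles per person.
chi2, p_value, odds_ratio = allele_association(60, 228, 35, 253)
print(f"chi2={chi2:.2f}  p={p_value:.4f}  OR={odds_ratio:.2f}")
```

A skewed allele frequency flagged this way is only a starting point. Poorly matched controls can produce the same signal, and testing many SNPs would require correction for multiple comparisons, which this sketch does not attempt.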
To facilitate SNP identification for use as a tool for disease gene identification, the Human Genome Project (HGP) and the Celera genome sequencing endeavor have established large databases of SNPs which have been identified primarily through expressed sequence tag (EST) sequencing projects (HGP) or redundant genome sequencing from 10 alleles (Celera) (6, 13). There are acknowledged problems with these data repositories. The SNPs identified by the HGP are largely biased toward the last exons of genes and probably include many false positives due to the single-read nature of the EST sequencing data used to annotate these SNPs (10, 11). The Celera SNP database has a more even distribution of SNPs across the genome, but is derived from a small number of chromosomes and thus probably does not capture the rarer SNPs in the representative populations sequenced. It is clear that there are both sequencing errors and undetected SNPs in the SNP repositories. There are many ongoing efforts to remedy inherent errors in the repositories so that they are accurate as well as comprehensive. The only true way to do this is to sequence the genomes of many individuals from all ethnicities and geographies, and of course this is not yet feasible.
A number of functional SNPs in a variety of genes (such as angiotensinogen) have been found to be associated with heart failure. In this release of Physiological Genomics, Lynch et al. (Ref. 7; see page 159 in this release) seek to identify additional functional SNPs that may contribute to congestive heart failure (CHF), acting either independently or in concert. The targets of investigation in this study are the small G proteins (7) and their downstream signal transducers. This set of candidate genes was chosen based on previous reports that perturbation of this signal transduction pathway can contribute to heart failure (reviewed in Ref. 8).
Lynch and colleagues (7) used both denaturing high-performance liquid chromatography (dHPLC) and double-stranded sequencing to screen for functional SNPs in a number of genes in a cohort of 144 white and African-American heart failure patients. The sensitivity of the dHPLC screening method for a subset of exons was first verified by direct sequencing. By sequencing exons from the heterotrimeric G proteins Gq, G11, and Gs; the Rab small G proteins; the signaling factors Ras and Rad; and the MAP kinase, Erk2, the authors identified a number of novel SNPs in addition to those previously characterized by the Celera and HGP SNP databases. For example, in the Rad gene, Lynch et al. found a novel SNP resulting in an amino acid substitution in exon 2. The functional role of this nonsynonymous SNP and of the many synonymous SNPs that were identified was not interrogated. In this sense, the study is strictly a mutational analysis of several candidate genes for CHF with a negative outcome.
As a byproduct of the mutation screening, the authors were able to draw some conclusions regarding the accuracy of the SNP databases, at least with respect to the several genes that they screened. Interestingly, 69% of the SNPs found in the heart failure patients in this study were not annotated in either the Celera or National Center for Biotechnology Information (NCBI) databases. It is not surprising that Lynch et al. identified novel SNPs in their cohort of whites and African Americans by thorough sequencing of representative alleles for these genes. This has been found to be the case in numerous SNP identification studies and relates to the strategies employed by the large consortia building the repositories as described above (2, 5, 12). In addition, population-specific SNPs were never the stated focus of the consortia. What is a bit more disconcerting for the end users of the repositories is the high false-positive rate reported by Lynch and colleagues; 56% of SNPs in the Celera and HGP databases were not found in the cohort of patients screened. As an example, previously reported database SNPs for a total of six exons of two of the heterotrimeric G proteins, Gq and Gs, were not detected. The ramifications of this high false-positive rate on the selection of SNPs from the databases for genotyping efforts are obvious. However, drawing conclusions regarding the integrity of the SNP data across the entire genome from a small study of ethnically homogeneous individuals is probably premature (4). What will be required before the SNP data repositories become user friendly is annotation of the SNP frequencies across all human populations. However, this would be a very difficult task in the absence of large-scale sequencing in a large cohort of diverse individuals at many loci.
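The two percentages quoted above are fractions of different sets, which is easy to lose sight of. As a rough sketch only: if each SNP is represented by a (gene, position) pair and the database entries are restricted to the regions actually screened, the novel and unconfirmed fractions follow from simple set differences. The function name, identifiers, and data below are hypothetical and are not drawn from Lynch et al. or from either database.

```python
def compare_to_database(observed, database):
    """Compare SNPs observed in a cohort with SNPs annotated in a database.

    Both arguments are iterables of hashable SNP identifiers, here
    hypothetical (gene, position) tuples. The database entries should be
    limited to the regions that were actually screened.
    """
    observed, database = set(observed), set(database)
    novel = observed - database        # seen in the cohort, not annotated
    unconfirmed = database - observed  # annotated, not seen in the cohort
    confirmed = observed & database
    return {
        "confirmed": confirmed,
        "novel": novel,
        "unconfirmed": unconfirmed,
        # Denominator is the number of SNPs observed in the cohort.
        "novel_fraction": len(novel) / len(observed) if observed else 0.0,
        # Denominator is the number of database SNPs in screened regions.
        "unconfirmed_fraction": (
            len(unconfirmed) / len(database) if database else 0.0
        ),
    }

# Hypothetical identifiers, for illustration only.
observed_snps = [("GENE_A", 101), ("GENE_A", 250), ("GENE_B", 77), ("GENE_B", 310)]
database_snps = [("GENE_A", 101), ("GENE_B", 77), ("GENE_B", 500)]

summary = compare_to_database(observed_snps, database_snps)
print(summary["novel_fraction"], summary["unconfirmed_fraction"])
```

An unconfirmed database entry in such a comparison is not necessarily a database error. It may simply be a rare or population-specific variant absent from the cohort, which is one reason extrapolating from a small study is risky.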
There may be significant information embedded in the study that is still untapped. Polymorphisms that were identified were not utilized in a case-control type fashion to determine whether any of the SNP alleles were over- or underrepresented in the CHF patients. This would entail enlisting a similarly sized cohort of matched controls and genotyping them for the SNPs that showed variation. It may be the case that a synonymous SNP is overwhelmingly present in the cases and not present in controls, which would lead to an assumption of causality (or a variant nearby causing disease). Conversely, a functional SNP may be absent in the cohort of patients studied and thereby be causative. Clearly, we are still at the beginning of our journey toward an accurate catalog of all human variation. Similarly, we do not yet appreciate the multitude of ways that synonymous SNPs can affect the function of a gene to contribute to disease risk. Finally, we still have no way to cost-effectively type SNPs across the whole genome or analyze multiple locus interactions to come up with truly comprehensive diagnostics for multigenic disease caused by interacting functional SNPs. When bolstered by careful and thorough explorations of genetic variation, studies such as the one presented here by Lynch and colleagues may pave the way to understanding devastating common diseases such as congestive heart failure.
FOOTNOTES
Article published online before print. See web site for date of publication (http://physiolgenomics.physiology.org).
Address for reprint requests and other correspondence: D. A. Stephan, Research Center for Genetic Medicine, Children's National Medical Center, 111 Michigan Ave., NW, Washington, DC 20010 (E-mail: DStephan{at}cnmcresearch.org).
10.1152/physiolgenomics.00103.2002.
REFERENCES