Containing multitudes: Focus on "Novel and nondetected human signaling protein polymorphisms"

Dietrich A. Stephan1 and Susan B. Glueck2

1 Research Center for Genetic Medicine, Children’s National Medical Center, Washington, District of Columbia 20010
2 Deputy Editor, Physiological Genomics

SINGLE NUCLEOTIDE POLYMORPHISMS (SNPs) can contribute directly to disease predisposition by modifying a gene’s function, or they can be used as genetic markers to detect nearby disease-causing mutations through association or linkage studies. There are three general classes of single nucleotide variants: those with strong functional significance which dramatically alter a gene’s behavior (classic mutations); those with more subtle functional effects that predispose to disease in concert with an individual’s genetic background or environment (functional SNPs); and those which are completely silent with respect to function (nonfunctional SNPs). Single nucleotide mutations cause a detectable phenotype on their own and can result in the familiar Mendelian inheritance diseases. Functional SNPs are by far the most interesting class of variants since they are thought to occur at high frequencies within the general population and, when present in disadvantageous combinations, can result in disease. Examples of such common multigenic diseases are thought to include diabetes, cancer progression, and heart disease (the subject of this editorial). The distinction between a mutation and a functional SNP is often vague and can be boiled down to the level of penetrance of the nucleotide variant. For example, single nucleotide mutations have very high penetrance and cause disease on their own in most cases, whereas functional SNPs have lowered penetrance and enhance an individual’s risk for disease by a small amount. This risk can be raised when combined with another functional SNP in the genome. Finally, SNPs can be completely silent functionally. This class of single nucleotide variant is by far the most common, with a polymorphic base estimated to occur approximately once every 1,300 nucleotides throughout the genome (1). 
These types of SNPs are useful as genetic markers to localize nearby disease-causing events through association analyses, linkage disequilibrium studies in founder populations, and classic linkage analyses.

It is extremely difficult to assign a pathogenic role to a common functional SNP. Almost exclusively, nonsynonymous (amino acid changing) SNPs have been assumed to be the sole type of functional SNPs that predispose to multigenic disease. This is in part because it is immediately obvious that a protein alteration exists which could change function, especially in the case where the nucleotide change resides in an important peptide motif (9). Even so, one is ultimately forced to validate any presumed causative nonsynonymous SNP finding in vitro or in vivo, a substantial task. This undertaking is further complicated if there are multiple variants which are presumed to work in concert to produce an altered phenotype, and which must be validated together. Functional SNPs may also reside within regulatory elements, which may lie within introns, promoters, or distant enhancer or repressor elements. In most cases these functional SNPs will not result in an amino acid change (synonymous SNPs), but they could alter splicing, regulation, or transcript stability. Examples of these are rare, simply because it is much more difficult to assess the mechanism of action of these events. Thus synonymous SNPs are largely ignored with respect to disease causality.
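The distinction between synonymous and nonsynonymous coding SNPs can be made concrete with a short sketch. The Python below is illustrative only and is not drawn from the study under discussion: it assumes the standard genetic code, a reference codon given on the coding strand, and a hypothetical helper name, classify_coding_snp. For simplicity, a change to or from a stop codon is reported as nonsynonymous.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering
# (first base varies slowest, third base fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_snp(codon, position, alt_base):
    """Classify a single-base change within one codon (hypothetical helper).

    codon:    reference codon, three DNA letters on the coding strand
    position: 0-based index of the changed base within the codon
    alt_base: the variant base
    """
    codon = codon.upper()
    variant = codon[:position] + alt_base.upper() + codon[position + 1:]
    if CODON_TABLE[codon] == CODON_TABLE[variant]:
        return "synonymous"
    # Includes changes to or from a stop codon ('*').
    return "nonsynonymous"


if __name__ == "__main__":
    print(classify_coding_snp("GAA", 2, "G"))  # Glu -> Glu
    print(classify_coding_snp("GAG", 1, "T"))  # Glu -> Val
```

A classifier of this kind addresses only the coding consequence. It says nothing about the splicing, regulatory, or transcript-stability effects described above, which is precisely why such variants are harder to evaluate.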

The great, yet unrealized, promise of nonfunctional SNP markers lies in being able to perform whole-genome high-density SNP typing to identify blocks of haplotypes (as a first step to identifying the functional changes) that come together in affected individuals to contribute to common multigenic diseases. The technology for this type of whole-genome SNP typing analysis in large numbers of individuals is still several years away, even though significant effort is being invested in identifying a minimum number of informative SNPs which would ascertain all human haplotypes (3, 5). In the interim, we are forced to take a candidate gene approach to multigenic disease. This entails preselecting genes based on function and then typing SNPs in these genes in affected individuals and controls to identify those SNPs which have skewed frequencies in affected individuals. Presumably, if the control population is matched correctly on ethnicity and geography, this would indicate that a certain "flavor" of gene has a pathogenic role.
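The case-control comparison at the heart of the candidate gene approach can be sketched as a test on allele counts. The Python below is a minimal illustration, not a method taken from any study cited here: it assumes a single biallelic SNP, counts of minor and major alleles in cases and controls, and a two-sided Fisher's exact test as the (assumed) choice of statistic. The function names and the counts in the usage lines are hypothetical.

```python
from math import comb


def allele_frequency(minor, major):
    """Minor allele frequency from allele counts."""
    return minor / (minor + major)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are cases and controls; columns are minor and major allele counts.
    Returns the probability of a table at least as unlikely as the observed
    one, given fixed row and column totals.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x minor alleles among the cases.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(
        p for p in (prob(x) for x in range(lo, hi + 1))
        if p <= observed * (1 + 1e-9)
    )


if __name__ == "__main__":
    # Hypothetical allele counts, for illustration only.
    cases_minor, cases_major = 30, 70
    controls_minor, controls_major = 15, 85
    print(allele_frequency(cases_minor, cases_major))
    print(allele_frequency(controls_minor, controls_major))
    print(fisher_exact_two_sided(cases_minor, cases_major,
                                 controls_minor, controls_major))
```

A small P value here indicates only a skewed allele frequency. As noted above, the inference is meaningful only if the controls are properly matched, and a real study testing many SNPs would also need to correct for multiple comparisons.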

To facilitate SNP identification for use as a tool for disease gene identification, the Human Genome Project (HGP) and the Celera genome sequencing endeavor have established large databases of SNPs which have been identified primarily through expressed sequence tag (EST) sequencing projects (HGP) or redundant genome sequencing from 10 alleles (Celera) (6, 13). There are acknowledged problems with these data repositories. The SNPs identified by the HGP are largely biased toward the last exons of genes and probably have many false-positives due to the single-read nature of EST sequencing data used to annotate these SNPs (10, 11). The Celera SNP database has a more even distribution of SNPs across the genome, but is derived from a small number of chromosomes and thus probably does not capture the rarer SNPs in the representative populations sequenced. It is clear that there are both sequencing errors and undetected SNPs in the SNP repositories. There are many ongoing efforts to remedy inherent errors in the repositories so that they are accurate as well as comprehensive. The only true way to do this is to sequence the genomes of many individuals from all ethnicities and geographies, and of course this is not yet feasible.

A number of functional SNPs in a variety of genes (angiotensinogen, for example) have been found to be associated with heart failure. In this release of Physiological Genomics, Lynch et al. (Ref. 7; see page 159 in this release) seek to identify additional functional SNPs that may contribute to congestive heart failure (CHF), acting either independently or in concert. The targets of investigation in this study are the small G proteins (7) and their downstream signal transducers. This set of candidate genes was chosen based on previous reports that perturbation of this signal transduction pathway can contribute to heart failure (reviewed in Ref. 8).

Lynch and colleagues (7) used both denaturing high-performance liquid chromatography (dHPLC) and double-stranded sequencing to screen for functional SNPs in a number of genes in a cohort of 144 white and African-American heart failure patients. The sensitivity of the dHPLC screening method for a subset of exons was first verified by direct sequencing. By sequencing exons from the heterotrimeric G proteins Gαq, Gα11, and Gαs; the Rab small G proteins; the signaling factors Ras and Rad; and the MAP kinase, Erk2, the authors identified a number of novel SNPs in addition to those previously characterized by the Celera and HGP SNP databases. For example, in the Rad gene, Lynch et al. found a novel SNP resulting in an amino acid substitution in exon 2. The functional role of this nonsynonymous SNP and of the many synonymous SNPs that were identified was not interrogated. In this sense, the study is strictly a mutational analysis of several candidate genes for CHF with a negative outcome.

As a byproduct of the mutation screening, the authors were able to draw some conclusions regarding the accuracy of the SNP databases, at least with respect to the several genes that they screened. Interestingly, 69% of the SNPs found in the heart failure patients in this study were not annotated in either the Celera or National Center for Biotechnology Information (NCBI) databases. It is not surprising that Lynch et al. identified novel SNPs in their cohort of whites and African Americans by thorough sequencing of representative alleles for these genes. This has been found to be the case in numerous SNP identification studies and relates to the strategies employed by the large consortia building the repositories as described above (2, 5, 12). In addition, population-specific SNPs were never the stated focus of the consortia. What is a bit more disconcerting for the end users of the repositories is the high false-positive rate reported by Lynch and colleagues; 56% of SNPs in the Celera and HGP databases were not found in the cohort of patients screened. As an example, previously reported database SNPs for a total of six exons of two of the heterotrimeric G proteins, Gαq and Gαs, were not detected. The ramifications of this high false-positive rate on the selection of SNPs from the databases for genotyping efforts are obvious. However, drawing conclusions regarding the integrity of the SNP data across the entire genome from a small study of ethnically homogeneous individuals is probably premature (4). What will be required before the SNP data repositories become user friendly is annotation of the SNP frequencies across all human populations. However, this would be a very difficult task in the absence of large-scale sequencing in a large cohort of diverse individuals at many loci.
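The two percentages quoted above are, in effect, the two halves of a set comparison between the SNPs observed in a cohort and those annotated in a database. The Python below is a minimal sketch of that bookkeeping under stated assumptions: each SNP is reduced to a comparable identifier (a hypothetical gene-and-position string here), and the function name compare_snp_sets and the example identifiers are invented for illustration, not taken from the study.

```python
def compare_snp_sets(observed, database):
    """Compare SNPs seen in a cohort with SNPs annotated in a database.

    Returns the novel SNPs (observed but not annotated), the undetected
    SNPs (annotated but not observed), and each as a fraction of its
    own parent set.
    """
    observed, database = set(observed), set(database)
    novel = observed - database
    undetected = database - observed
    return {
        "novel": novel,
        "undetected": undetected,
        "novel_fraction": len(novel) / len(observed) if observed else 0.0,
        "undetected_fraction": (
            len(undetected) / len(database) if database else 0.0
        ),
    }


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    cohort = {"GENE_A:101", "GENE_A:245", "GENE_B:12", "GENE_B:77"}
    annotated = {"GENE_B:12", "GENE_B:77", "GENE_C:300"}
    result = compare_snp_sets(cohort, annotated)
    print(result["novel_fraction"])       # share of cohort SNPs not annotated
    print(result["undetected_fraction"])  # share of annotated SNPs not seen
```

An undetected SNP in this sense is not necessarily a database error: it may be rare or specific to a population absent from the cohort, which is the reason for caution in generalizing from a small study.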

There may be significant information embedded in the study that is still untapped. Polymorphisms that were identified were not utilized in a case-control type fashion to determine whether any of the SNP alleles were over- or underrepresented in the CHF patients. This would entail enlisting a similarly sized cohort of matched controls and genotyping them for the SNPs that showed variation. It may be the case that a synonymous SNP is overwhelmingly present in the cases and not present in controls, which would lead to an assumption of causality (or a variant nearby causing disease). Conversely, a functional SNP may be absent in the cohort of patients studied and thereby be causative. Clearly, we are still at the beginning of our journey toward an accurate catalog of all human variation. Similarly, we do not yet appreciate the multitude of ways that synonymous SNPs can affect the function of a gene to contribute to disease risk. Finally, we still have no way to cost-effectively type SNPs across the whole genome or analyze multiple locus interactions to come up with truly comprehensive diagnostics for multigenic disease caused by interacting functional SNPs. When bolstered by careful and thorough explorations of genetic variation, studies such as the one presented here by Lynch and colleagues may pave the way to understanding devastating common diseases such as congestive heart failure.

FOOTNOTES

Article published online before print. See web site for date of publication (http://physiolgenomics.physiology.org).

Address for reprint requests and other correspondence: D. A. Stephan, Research Center for Genetic Medicine, Children’s National Medical Center, 111 Michigan Ave., NW, Washington, DC 20010 (E-mail: DStephan@cnmcresearch.org).

10.1152/physiolgenomics.00103.2002.

REFERENCES

  1. Cargill M, Altshuler D, Ireland J, Sklar P, Ardlie K, Patil N, Shaw N, Lane CR, Lim EP, Kalyanaraman N, Nemesh J, Ziaugra L, Friedland L, Rolfe A, Warrington J, Lipshutz R, Daley GQ, and Lander ES. Characterization of single-nucleotide polymorphisms in coding regions of human genes. Nat Genet 22: 231–238, 1999.
  2. Douabin-Gicquel V, Soriano N, Ferran H, Wojcik F, Palierne E, Tamim S, Jovelin T, McKie AT, Le Gall JY, David V, and Mosser J. Identification of 96 single nucleotide polymorphisms in eight genes involved in iron metabolism: efficiency of bioinformatic extraction compared with a systematic sequencing approach. Hum Genet 109: 393–401, 2001.
  3. Gabriel SB, Schaffner SF, Nguyen H, Moore JM, Roy J, Blumenstiel B, Higgins J, DeFelice M, Lochner A, Faggart M, Liu-Cordero SN, Rotimi C, Adeyemo A, Cooper R, Ward R, Lander ES, Daly MJ, and Altshuler D. The structure of haplotype blocks in the human genome. Science 296: 2225–2229, 2002.
  4. Iwasaki H, Shinohara Y, Ezura Y, Ishida R, Kodaira M, Kajita M, Nakajima T, Shiba T, and Emi M. Thirteen single-nucleotide polymorphisms in the human osteopontin gene identified by sequencing of the entire gene in Japanese individuals. J Hum Genet 46: 544–546, 2001.
  5. Johnson GC, Esposito L, Barratt BJ, Smith AN, Heward J, Di Genova G, Ueda H, Cordell HJ, Eaves IA, Dudbridge F, Twells RC, Payne F, Hughes W, Nutland S, Stevens H, Carr P, Tuomilehto-Wolf E, Tuomilehto J, Gough SC, Clayton DG, and Todd JA. Haplotype tagging for the identification of common disease genes. Nat Genet 29: 233–237, 2001.
  6. Lander ES et al. (International Human Genome Sequencing Consortium). Initial sequencing and analysis of the human genome. Nature 409: 860–921, 2001.
  7. Lynch RA, Wagoner L, Shunan L, Sparks L, Molkentin J, and Dorn GW II. Novel and nondetected human signaling protein polymorphisms. Physiol Genomics 10: 159–168, 2002. First published July 9, 2002; 10.1152/physiolgenomics.00030.2002.
  8. Molkentin JD and Dorn GW II. Cytoplasmic signaling pathways that regulate cardiac hypertrophy. Annu Rev Physiol 63: 391–426, 2001.
  9. Ng PC and Henikoff S. Accounting for human polymorphisms predicted to affect protein function. Genome Res 12: 436–446, 2002.
  10. Sherry ST, Ward M, and Sirotkin K. Use of molecular variation in the NCBI dbSNP database. Hum Mutat 15: 68–75, 2000.
  11. Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, and Sirotkin K. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res 29: 308–311, 2001.
  12. Small KM, Seman CA, Castator A, Brown KM, and Liggett SB. False positive non-synonymous polymorphisms of G-protein coupled receptor genes. FEBS Lett 516: 253–256, 2002.
  13. Venter JC et al. (Celera Genomics). The sequence of the human genome. Science 291: 1304–1351, 2001.



