Department of Biosciences, Karolinska Institute, Novum, S-141 57 Huddinge, Sweden and
1 Cancer Genetics and Epidemiology Program, Lombardi Cancer Center, Georgetown University Medical Center, Washington DC, USA
Abstract |
---|
Abbreviations: SNP, single nucleotide polymorphism.
Introduction |
---|
The International SNP Map Working Group has located 1.42 million SNPs and Celera 2.1 million (6,7). Estimates of the number of SNPs in exonic DNA range from 20 000 to 60 000. Assuming that there are 30 000 genes in the human genome, an average gene could contain one to two SNPs (6,7). About one-half of the SNPs change the amino acid, and in 45% of the cases the change is non-conservative (7). SNPs are not equally distributed among genes coding for different classes of proteins, but if they were, a large proportion of proteins would harbor an amino acid change. However, highly conserved proteins may not contain any polymorphisms (8,9). On the other hand, even some non-coding polymorphisms, including those in introns, promoters and other regulatory sequences, may be of importance.
Traditionally, the approach to identifying SNPs as cancer risk factors was to select candidate genes and polymorphisms within them, and then to test these in an epidemiological study. Great weight would be given to those SNPs that result in a functional phenotypic effect (i.e. altered protein levels or function), known a priori and tested in the context of a biologically based hypothesis. However, it is now recognized that this approach, while the most rational, is yielding to the power of new technologies that have identified a large number of SNPs whose phenotypic effects have not yet been identified. This has resulted in a very different approach to the identification of genetic risk factors for cancer.
Large laboratories are planning or carrying out extensive SNP analyses of disease candidate genes, and there will be an explosion of genotype data in the near future. There will be a large number of positive associations, and the recent history of conflicting polymorphism studies in the few metabolic genes is frightening. Lander and Kruglyak have warned about genetic analysis of complex traits: 'Scientific disciplines erode their credibility when a substantial proportion of claims cannot be replicated – even more so when the claims reach not only the professional journals but also the evening news' (10). To illustrate what is to come, let us assume that there are 100 distinct types of cancer (1). Almost all of them are multifactorial, caused by gene–environment interactions (11,12). SNP analysis of candidate genes in cases and controls is a preferred approach for such diseases (3). For any cancer, candidate genes may number in the hundreds, including those involved in cell-cycle control, signal transduction, DNA repair, cell-to-cell communication and metabolism. Assuming that 100 critical genes will be selected for SNP analysis in any cancer, these may harbor an equal number of relevant SNPs. Thus, across all 100 types of cancer, 10 000 polymorphisms will be tested. Usually the study population will be stratified by age, sex, ethnicity, life-style and exposures; gene–gene and gene–environment interactions may also be tested. The number of comparisons may become huge in a single study: three variables with two categories each yield eight strata, and five such variables yield 32 strata. Thus, even in one cancer, assayed for 100 genes and analyzed for only five variables, 3200 statistical tests (100 × 32) will be carried out, of which about 160 would be expected to reach significance at the 5% level by chance alone. This is a typical multiple testing problem.
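The arithmetic in this illustration can be reproduced with a minimal sketch. The function name and the independence assumption in the last step are ours, introduced only for illustration; the input figures (100 genes, five two-category variables, a 5% level) are the assumed values from the text, not data.

```python
def multiple_testing_burden(n_genes, n_binary_variables, alpha):
    """Count strata, tests and chance findings for the illustration in the text.

    Assumes one relevant SNP per gene and two categories per variable,
    as in the commentary's worked example.
    """
    n_strata = 2 ** n_binary_variables           # e.g. 2^5 = 32 strata
    n_tests = n_genes * n_strata                 # one test per gene per stratum
    expected_by_chance = n_tests * alpha         # nominally significant results under the null
    # Chance of at least one false positive; assumes independent tests (a simplification).
    p_at_least_one = 1.0 - (1.0 - alpha) ** n_tests
    return n_strata, n_tests, expected_by_chance, p_at_least_one


strata, tests, by_chance, p_any = multiple_testing_burden(100, 5, 0.05)
print(f"{strata} strata, {tests} tests, about {by_chance:.0f} significant by chance")
```

Under these assumptions, a study of this kind is practically certain to produce at least one spurious association.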
There are various ways of addressing the multiple testing problem imminent in SNP analyses. In mutation analysis and whole genome linkage analysis, the significance levels have been adjusted based on the number of tests carried out, e.g. by the Bonferroni correction (10,13). Such statistical corrections may be useful and, sample size permitting, it may be advisable to carry out the analysis in two halves, one for generating and the other for testing the hypothesis. However, in SNP analysis there are four biologically motivated considerations for the interpretation of the results: (i) selection of the genes and pathways has to be based on mechanistic knowledge; (ii) a known functional effect induced by the polymorphism makes positive results persuasive, as do SNPs that induce non-conservative amino acid changes in important functional domains of the protein; (iii) as a first approximation, an additive model of gene dosage effect can be assumed, i.e. the effect in heterozygotes should lie between those in the two homozygotes, a test that can always be done. The sample size should be large enough to allow proper testing of all three genotypes. Gene dosage effects may be non-linear, but heterozygotes should lie at or between homozygotes; (iv) finally, demonstration of loss of heterozygosity (LOH) affecting the wild-type allele in the tumor would be strong evidence of an effect (14). However, this requires a phenotypic effect that offsets the normal function, such as loss of function, as well as the availability of tumor samples.
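Two of these checks are simple enough to sketch in code: the Bonferroni-adjusted significance level and the gene dosage consistency test of point (iii). The function names and the odds ratios in the usage lines are hypothetical, chosen only to illustrate the logic; they are not results from any study.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def heterozygote_is_intermediate(effect_hom_ref, effect_het, effect_hom_var):
    """Return True if the heterozygote effect lies at or between the two homozygote effects.

    The effects can be any comparable measure (e.g. odds ratios); linearity is not required.
    """
    low, high = sorted((effect_hom_ref, effect_hom_var))
    return low <= effect_het <= high


# With the 3200 tests of the earlier illustration, the per-test level becomes very strict.
print(bonferroni_threshold(0.05, 3200))

# Hypothetical odds ratios for the three genotypes (illustrative values only).
print(heterozygote_is_intermediate(1.0, 1.5, 2.2))   # consistent with a dosage effect
print(heterozygote_is_intermediate(1.0, 2.5, 2.2))   # heterozygote outside the homozygote range
```

The dosage check is deliberately a plain ordering test, and it ignores sampling error; in practice the genotype-specific estimates would need confidence intervals, which is why the text asks for a sample size large enough to test all three genotypes.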
As new SNPs are screened and associations identified, they become candidate gene polymorphisms. The results are subject to interpretation and caveats, as are any new epidemiological findings. Specifically, no single study is definitive, each has its own limitations, and all results await replication in different populations. Moreover, once a candidate gene polymorphism has been found, complementary studies should begin to identify a phenotype associated with the nucleotide base change. Only the combination of similar results replicated in different populations and supportive laboratory evidence will justify the belief that a candidate gene polymorphism is indeed a cancer risk factor. Therefore, pending additional studies, we as scientists need to inform the media and the public that new candidate gene polymorphism reports are preliminary. Also, following the publication of positive results, journals should be amenable to publishing null results. Otherwise, publication bias can result (15).
We believe that the large-scale genotyping of samples from cancer patients will lead to important breakthroughs in understanding gene–environment and gene–gene interactions as the mechanistic basis for the common polygenic cancers. Several authors have considered design issues in molecular epidemiology studies, and we concur that these are fundamental to the success of genotyping studies (16). However, we are concerned about the overwhelming volume of data that are about to be generated. To guarantee scientific progress, the results need to be skillfully and responsibly processed by the individual research teams before publication.
Notes |
---|
References |
---|