a Human Genetics Research Division, Southampton University School of Medicine, Southampton SO16 6YD, UK.
b Cardiovascular Institute, Fu Wai Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, 167 Beilishi Road, Beijing 100037, China.
Prof. Ian NM Day, Human Genetics Research Division, Southampton University School of Medicine, Duthie Building (MP 808), Southampton General Hospital, Tremona Road, Southampton SO16 6YD, UK. E-mail: inmd{at}soton.ac.uk
"Expt. 1: Form of seed. From 253 hybrids 7324 seeds were obtained in the second trial year. Among them were 5474 round or roundish ones and 1850 angular wrinkled ones. Therefore the ratio 2.96:1 is deduced." These words derive from Gregor Mendel's obscure paper of 1865, now most readily accessible in translation on the Internet.1 Of course he was counting peas, not people, but he was finding out about fundamental modes of determination of inherited (genetic) characteristics. Traditional epidemiology took its roots around the same time from the study of infectious epidemics such as the 1854 cholera outbreak mapped by Dr John Snow to the Broad Street pump in London. In the absence of many definable genetic variables, or indeed recognition of genetic factors, the study of environmental variables with immediate implications for public health became prevalent. Genetic epidemiology has been defined as a science that deals with aetiology, distribution and control of disease in groups of relatives and with inherited causes in populations.2 The dimensions of physical linkage of genetic markers on chromosomes, occurrence of two copies of each marker per individual because we are diploid, and utility of studying transmission in families on the grounds of marker heritability, are but some of the new demands faced by the synthesis of environmental and genetic epidemiology.
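Mendel's counts quoted above can be checked with a few lines of arithmetic. The sketch below recomputes the round:wrinkled ratio and, as an addition that is not in the source, a chi-square goodness-of-fit statistic against the 3:1 ratio expected for a single dominant factor. The function names are illustrative only.

```python
def segregation_ratio(dominant: int, recessive: int) -> float:
    """Observed ratio of dominant to recessive phenotypes (x in 'x:1')."""
    return dominant / recessive


def chi_square_3_to_1(dominant: int, recessive: int) -> float:
    """Chi-square goodness-of-fit statistic (1 degree of freedom) against 3:1."""
    total = dominant + recessive
    expected_dominant = 0.75 * total
    expected_recessive = 0.25 * total
    return ((dominant - expected_dominant) ** 2 / expected_dominant
            + (recessive - expected_recessive) ** 2 / expected_recessive)


# Counts from Mendel's Expt. 1 as quoted in the text.
round_seeds, wrinkled_seeds = 5474, 1850

ratio = segregation_ratio(round_seeds, wrinkled_seeds)
chi2 = chi_square_3_to_1(round_seeds, wrinkled_seeds)

print(f"observed ratio {ratio:.2f}:1")
# 3.841 is the 5% critical value of chi-square with 1 degree of freedom.
print(f"chi-square = {chi2:.3f}; consistent with 3:1 at the 5% level: {chi2 < 3.841}")
```

The statistic falls well below the 5% critical value, so the counts are compatible with simple 3:1 segregation.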
The field of molecular genetics spans little more than two decades and has been underpinned by dramatic advances in molecular cloning, DNA sequencing and bioinformatics. Industrial-scale efforts are in progress to define the 3 × 10⁹ b.p. Human Genome Sequence encoding ~100 000 genes (near completion was announced by private and public domain enterprises in June 2000) and, by 2002, to identify 3 × 10⁵ of an estimated 3–10 × 10⁶ single nucleotide polymorphisms which distinguish individuals and their disease traits and risks. Classical family, twin and adoptee studies have shown substantial heritabilities for many disease traits, but except in rare instances the pattern is polygenic rather than monogenic. Knowledge and technology have become sufficient to enable molecular geneticists to study megaphenic disorders in single families, with the expectation of isolating the genes, understanding the pathology and deriving clinically applicable tests of status (e.g. prognosis such as from early breast cancer, proof of clinical diagnosis and prenatal testing to go with non-directive counselling for reproductive choice).
For polygenic traits, knowledge is sufficient to initiate tests of hypotheses, but the technology is as yet insufficient to measure the contribution of genetic diversity to disease liabilities. It will come and the epidemiologist will be able to obtain, through molecular genetics, an assay of the genome-wide contributions to a particular disease or risk factor. The field of molecular genetic epidemiology will count people rather than peas and will have at its disposal a million or more molecular genetic markers rather than a few visible traits such as those studied by Mendel, wrinkles and colour, in his pea-breeding analyses. So what is currently useful in epidemiology to the molecular geneticist and what else do we need?
Existing population resources offer considerable potential to examine the contribution of defined genetic variants or genes to the population burden of disease. Classical examples predating DNA-based genotyping include association studies of HLA and ABO blood group. Modern possibilities fall into several categories:
(i) Functional variants identified by biochemical and cellular studies
An example of this category is the recognition of (rare) individuals (1/200–1/1000) with familial defective apoB (FDB),3 originally by ligand/receptor binding studies of hypercholesterolaemia. These subjects can be severely hypercholesterolaemic, although their plasma cholesterol level may be within reference range and may fluctuate substantially over time. In the large Copenhagen Heart Study (9255 subjects, 948 with ischaemic heart disease), it was possible to measure relative risk for coronary disease for FDB subjects; this turns out to be seven times the population average.4 As another example, a mutation in CYP2A6 inactivates the enzyme's catalytic conversion of nicotine to cotinine and was shown in specialized addiction studies to determine cigarette consumption. Follow-up studies in a population sample have found that having half the normal complement of functional alleles leads to almost twice the average propensity among smokers to quit.5
(ii) Functional variants identified by linkage and positional cloning
In the most classical linkage design, a large family displaying a clear-cut segregation pattern for a disease is examined at polymorphic sites representing each part of each chromosome. In the simple case of a single gene disorder, only one region of one chromosome will show an allele always cosegregating with the disease, enabling the mapping by linkage of the chromosomal region where the defective gene lies. Detailed characterization of the region by cloning and sequencing is used to try to positionally clone the culprit gene. With the availability of a fairly complete human genome sequence, this approach is reducing to searching the sequence for the possible culprit gene on criteria such as its tissue expression pattern, apparent function predicted from sequence, etc. (positional candidate approach). For example, haemochromatosis was shown by positional cloning to be attributable to an HLA-related gene6 and it has immediately been possible to examine the relationship between genetic and iron status diagnostics, population prevalence of mutations, and prevalence of HFE mutations in haemochromatosis-associated disease groups such as diabetics.6,7 The HFE carrier frequency is approximately one in ten among Caucasians.6 Given the widespread roles of iron, phenome scanning against HFE mutations may generate new hypotheses for other phenotypes, and investigation of HFE interactions in other disorders (such as porphyrias, haemoglobinopathies and coronary disease) and with environmental variables (such as diet, lead poisoning and infections) is now proceeding at a great pace.8–10 These represent examples in which epidemiologists' study frameworks can be used to explore gene-environment and gene-gene interactions.
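To give a feel for what a carrier frequency of about one in ten implies, the following sketch converts it into an allele frequency and an expected homozygote frequency. It assumes Hardy-Weinberg proportions and pools all mutant HFE alleles into a single class; both are simplifying assumptions made here for illustration, not statements from the source.

```python
import math


def allele_freq_from_carrier_freq(carrier_freq: float) -> float:
    """Solve 2q(1 - q) = carrier_freq for the rarer allele frequency q.

    Assumes Hardy-Weinberg proportions and a single pooled mutant allele class.
    """
    if not 0.0 <= carrier_freq <= 0.5:
        raise ValueError("heterozygote frequency must lie between 0 and 0.5")
    return (1.0 - math.sqrt(1.0 - 2.0 * carrier_freq)) / 2.0


carrier_freq = 0.10          # "approximately one in ten" from the text
q = allele_freq_from_carrier_freq(carrier_freq)
homozygote_freq = q * q      # expected frequency of people with two mutant alleles

print(f"mutant allele frequency q = {q:.4f}")
print(f"expected homozygotes: about 1 in {1 / homozygote_freq:.0f}")
```

Under these assumptions, roughly one person in a few hundred would carry two mutant alleles, which illustrates why population-scale samples are needed to study homozygotes directly.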
(iii) Disease-associated gene variants identified by candidate gene studies
Type I diabetes mellitus provides a good example of this approach. The functional relevance of the HLA region and insulin (INS) gene products to type I diabetes led to the investigation of their genotypic diversity as candidate genes for possible associations of specific alleles with disease risk. Some original and elegant association designs provided conclusive proof of allele-specific susceptibility.11 In a prior era, the HLA association had represented strong genetic evidence that type I and type II diabetes were truly distinct disease entities; an early success of genetics in providing a new taxonomy of disease.12 In the absence of a functional variant, any genetic polymorphism stands a chance of being in linkage disequilibrium (LD) with a functional variant. Linkage disequilibrium implies the likelihood of finding one allele at one locus specifically associated with a particular allele at a nearby locus. This confers the practical advantage of being able to rely on fewer of all the polymorphisms in a physical region (there may be one per 250–1000 base pairs) since some will act as proximity markers for others. Given the extent of LD over the genome, it will have great utility in reducing the numbers of sites to be tested to implicate gene regions in a trait, but it may cause great difficulty in identifying truly functional sites and their mechanism of effect. Practical and theoretical approaches to studies capitalizing on LD are very topical at present and are pertinent both to strong candidate genes and to cases where more relaxed stringency for candidacy (see following) is to be considered.13 Some LD mapping designs demand new synthesis of statistics and algebra and consideration of genetic and environmental factors. Linkage disequilibrium mapping and the search for complex disease genes have recently been reviewed.14
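The non-random association between alleles at two loci described above is usually summarized by the coefficient D and its normalized forms D' and r². The sketch below computes them from a haplotype frequency and two allele frequencies; the function name and the example frequencies are hypothetical and chosen only to show the two extremes.

```python
def ld_stats(p_ab: float, p_a: float, p_b: float) -> tuple[float, float, float]:
    """Return (D, D', r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a:  frequency of allele A at the first locus
    p_b:  frequency of allele B at the second locus
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b  # zero at linkage equilibrium
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Complete disequilibrium: allele A is always found with allele B.
print(ld_stats(p_ab=0.5, p_a=0.5, p_b=0.5))
# Linkage equilibrium: the haplotype frequency equals the product of allele frequencies.
print(ld_stats(p_ab=0.25, p_a=0.5, p_b=0.5))
```

When r² is close to 1, one marker is an almost perfect proxy for the other, which is the property that allows a subset of polymorphisms to stand in for the rest of a region.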
(iv) Disease-associated gene variants identified by genome-wide studies
This approach is under development and likely to be possible within the next decade. It will rely on using polymorphic markers throughout the genome in very high throughput genotyping laboratories (possibly covering millions of polymorphisms). It will not be necessary to examine every polymorphism that exists though, on account of the existence of LD (see above). Without a specific hypothesis, the number of markers examined will either place stringent demands on the Bonferroni corrections used, or force the initial studies to be considered solely as hypothesis-forming rather than hypothesis-testing. It should be obvious that confirmatory studies for measurement of population effect, primary studies in formats enabling analysis of gene-environment and gene-gene interaction, and many new hypothesis generating studies, will all have to be considered. In addition, prospective studies will enable the identification of protective alleles more readily than may be feasible in studies focused on cases (i.e. disease alleles).
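The burden that a Bonferroni correction places on a genome-wide scan is easy to quantify: the per-marker significance threshold is the overall error rate divided by the number of markers tested. A minimal sketch follows, with hypothetical marker names and p-values.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def significant_markers(p_values: dict[str, float], alpha: float = 0.05) -> list[str]:
    """Names of markers whose p-value survives Bonferroni correction."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [name for name, p in p_values.items() if p < threshold]


# One million markers at an overall 5% error rate.
print(bonferroni_threshold(0.05, 1_000_000))

# Hypothetical three-marker scan; the threshold is 0.05 / 3.
print(significant_markers({"marker_1": 0.001, "marker_2": 0.04, "marker_3": 0.2}))
```

Because Bonferroni treats every test as independent, it is conservative when markers are correlated through LD; that is one reason initial scans may be better regarded as hypothesis-forming.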
Approaches (i) to (iv) need epidemiologists:
(1) To gather clinical material suitable for genetic analyses.15 A 5–10 ml sample of potassium EDTA blood, frozen and stored at −20 or −70°C until DNA extraction, will achieve a very adequate long-term DNA bank (£10–20 per subject). Where more extensive funding can be obtained (£60–100 per subject followed by usual costs of DNA extraction), lymphocyte EB transformation from heparinized blood may be feasible from all subjects to obtain permanent cell lines. Much lower DNA yields, but with greater convenience of preparation, can be obtained from buccal washes (or brushes), which can be collected by mail and without venesector costs. The mailshot epidemiologist can thus put some genetic studies in place, whereas biochemical studies generally need a venesector and/or nurse and/or clinician to meet the subject in person. Intermediate traits at the RNA and cellular levels require tissue, but there is growing interest in developing these resources also, to enhance the resolution of analyses and to attempt to understand genetic pathways. RNA (notoriously unstable) in small tissue samples seems to be efficiently preserved at ambient temperature for up to 24 hours, for longer at 4°C, or permanently at −20°C by RNALater (Ambion, USA), which may simplify the collection logistics for some tissue samples. The commitment to investment in the genetic arm of any study will differ according to perception of genetic utility. However, given that the molecular approaches to gene analysis are rapidly becoming more sensitive and sophisticated, the epidemiologist will be wise to obtain consent for retention of material and study continuation (and to ensure that DNA bank exhaustion does not occur, by setting aside half the extracted DNA as a reserve that is never used). In a few years' time, genome-wide analysis of marker sets using much less template DNA could become the norm.
Without due attention to long-term preservation of an established DNA bank (conservative approaches to use of DNA in assays, protection from contamination, regulation and co-ordination of use for assays), it would be easy to lose resources which are finite, which are based on often laborious and costly phenotypic studies, and which are in some cases impossible to re-establish. Aspects of clinical sample acquisition and preparation have recently been considered in detail15 (Table 1). A further facet of DNA is that it undergoes (differentially between different tissues and blood) somatic changes and changes of ageing such as: shortening of the ends (telomeres) of chromosomes; methylation at CpG sites both marking parent-of-origin-specific gene expression patterns and changes associated with ageing and cancer change; individual cells can lose chromosomal material with age; DNA can oxidize and form adducts with environmental agents and mutagens; and somatic sequence changes occur during ageing, most notably deletions and point mutations in the mitochondrial genome which is distinct from the genome represented by the complement of chromosomes resident in the nucleus. As geneticists and epidemiologists push the boundaries of genetic, epigenetic and transcriptome knowledge of populations further, approaches to sampling and analysis and consequent modifications to study designs (e.g. parent-of-origin questions for a trait) and clinical sampling will be needed (Table 2).
(3) To facilitate or participate in additions to the clinical arm of some environmental epidemiologists' studies, particularly in coming to regard each study subject as a proband for a family.16 Robust, although less powerful, designs use non-transmitted parental alleles as an artificial control, evading the possible risks of genetic stratification or admixture confounding a purely population-based case-control design. The degree of hidden genetic stratification, such that cases may possess both the disease predisposing alleles plus other alleles on other chromosomes unique to the case group but not disease causing and merely reflecting the heterogeneous history and evolution of human populations, is at present uncertain. The classical example cited is the association between blue eyes and incompetence with chopsticks in California. One could infer that there exists a gene which determines both eye colour and chopstick competence. It is much more plausible, though, that the association merely reflects the interdependent ethnic and cultural origins of the two factors: blue-eyed Caucasians may well have genetic factors causing their different eye colour from their Oriental neighbours, but their incompetence with chopsticks generally reflects their lack of practice compared with their Oriental neighbours whose cultural background usually leads them to chopstick practice from early childhood. Stratifications in admixed populations may result in (non-causal) associations between unlinked alleles, between groups of phenotypes, and between alleles and phenotypes. A variety of sib-based association designs (in contrast with affected sib pair linkage studies), largely new and untested, also attempt to do the same thing as parents-child trios, for diseases of late onset where parents are not available.
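One widely used family-based design of the kind described above is the transmission disequilibrium test (TDT), which compares how often heterozygous parents transmit a candidate allele to an affected child with how often they do not. The source does not name a specific statistic, so the sketch below is offered only as an illustration, and the counts are hypothetical.

```python
def tdt_chi_square(transmitted: int, not_transmitted: int) -> float:
    """McNemar-type TDT statistic (1 degree of freedom).

    transmitted:     times heterozygous parents passed the candidate allele
                     to an affected child
    not_transmitted: times they passed the other allele instead
    """
    informative = transmitted + not_transmitted
    if informative == 0:
        raise ValueError("no informative (heterozygous) parents")
    return (transmitted - not_transmitted) ** 2 / informative


# Hypothetical counts from 100 heterozygous parents in parents-child trios.
statistic = tdt_chi_square(transmitted=60, not_transmitted=40)
# 3.841 is the 5% critical value of chi-square with 1 degree of freedom.
print(f"TDT chi-square = {statistic:.2f}; exceeds 5% critical value: {statistic > 3.841}")
```

Because the non-transmitted allele of each parent serves as that parent's own control, an excess of transmissions cannot be produced by population stratification alone.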
The epidemiologist will need to be flexible and adaptable from the planning stage to achieve full successful integration with genetic epidemiology. Some of the wide range of genetic epidemiological study designs are summarized in Table 3.
(5) To respond to the ethical and genetic issues in clinical sample and especially DNA banking which demand explanation and written consent, attention to aspects of data protection, anonymization versus name linking, etc. This may be especially important where it is planned to return to subjects of a particular genotype for more detailed studies, and also where a genotype of potential clinical importance to the individual or potentially to his or her family is to be studied. For example, the APOE gene allele 4 was originally of interest as a relatively minor contributor to lipid levels, but later turned out unexpectedly to have significant predictive value for late onset Alzheimer's disease.17 While a two- to threefold risk for the 15% or so of 30-year-olds with one E4 allele may not be viewed as too worrisome, the tenfold or greater risk plus 8-year advancement of age of onset for the two per cent of people with two E4 alleles is more disconcerting. The basis of consent for genetic studies (e.g. anonymized, no reporting back versus subject to receive information) clearly matters. These criteria will differ by country, culture, funding level available and type of genetic study. A recent UK-based public survey gives useful insight into some of the concerns and issues.18
Reflection on some of the successes, of which a few have been cited above, in determining gene variants influencing disease burden derived in whole or part from population, rather than from family studies, emphasizes the exciting future available to collaborations between epidemiologists and molecular geneticists. One marker which is already marketed19 as a commercial test founded both on scientific literature and patents, is the angiotensinogen gene.20 A rather complex mixture of association, linkage and then functional studies in vitro, led to evidence implicating alleles of this gene in the aetiology of hypertension. The full US patent (5763168) makes interesting reading and is itself an excellent review: as for other US patents, the full document is available online.21 Alleles of this gene seem to predict, at least in small part, whose hypertension is responsive to low salt diet or some drugs, although not all studies have confirmed these findings and the precise cellular mechanism remains unclear. It may involve an amino acid sequence variation in the hormone itself, or it may involve a polymorphic locus in the promoter (switch) of the gene, which is in complete linkage disequilibrium with the polymorphic coding locus. The rationale for applying the test in determining risk and therapeutic choice, i.e. as a prognostic marker, is obvious in the context of general cardiovascular risk reduction for a population. It remains to be seen whether this risk marker will prove to be of significant utility in clinical management, but over the next decade, thousands more markers like it will enter the scientific and patent literature and at least a handful are likely to change clinical practice in some way. 
Other successes have been in relation to understanding susceptibility to, or consequences of, infectious diseases of global importance such as HIV22 and malaria.23 They follow studies several decades old24 examining cholera risk for ABO blood groups (one of the substrates of the classical era of human genetic research) which showed O group predisposition to cholera. Group O is very under-represented, possibly due to selective disadvantage, in the Ganges Delta. Blood group genetics was totally unknown at the time of the Broad Street pump epidemic. These are good examples of the holistic synthesis of understanding to be gained from studying both environmental factors and genetic factors, an opportunity which has exploded upon us for the new Millennium with the determination of the Human Genome Sequence. In addition to understanding the pathological basis of diseases in populations, a fraction of the markers identified are likely to guide drug development, enabling smaller, faster, cheaper trials by focusing on those who possess higher disease risk or who are more likely responders and they are likely to guide future therapeutic choices for drug response and drug metabolism on a genotype-specific basis.
Glossary
alleles: the alternative genetic forms of a system (gene, locus or genomic region)
b.p. (base pairs): the fundamental nature of DNA is a duplex of two strands paired at each base within their complementary sequences, adenine with thymine, cytosine with guanine. Duplex length, for example of a gene, is therefore measured in base pairs from the start of that gene to the end of that gene
Comet assay: a method for measuring DNA damage to individual cells, based on the technique of microelectrophoresis. Cells embedded in agarose are lysed, subjected briefly to an electric field, stained with a fluorescent DNA-binding stain, and viewed using a fluorescence microscope. Broken DNA migrates farther in the electric field, and the cell then resembles a comet with a brightly fluorescent head and a tail region which increases as damage increases
CpG sites: the base sequence CG (traditionally denoted CpG, the p representing the phosphoribosyl backbone) is specifically recognized by an enzyme which puts a methyl group on the C. However, this is chemically unstable, rendering CpG sites one to two orders of magnitude more susceptible to mutation than the average for all single bases in the mammalian genome
epigenetic: modifications to DNA sequence such as CpG methylation, which may have transmissible patterns during cellular duplication or even through reproductive meiosis, are called epigenetic
GC: gas chromatography
haplotype: if a system (gene or genomic region) contains two or more loci each displaying alternative genetic forms, then its alternative forms are called haplotypes and alleles are restricted to products of single loci
HFE: designation of the hereditary haemochromatosis gene on chromosome 6
HPLC: high performance liquid chromatography
imprinting: for some genes and genomic regions, the alleles from both parents are unequally expressed in some or all tissues. Indeed, often the allele from one parent is not expressed at all, for example only the paternal copy of the insulin-like growth factor II gene is active in the fetus. Such imprinted loci usually show allele-specific methylation patterns at CpG sites (see CpG)
linkage studies: studies in which, classically, a large family displaying a clear-cut segregation pattern for a disease is examined at polymorphic sites representing each part of each chromosome. In the simple case of a single gene disorder, only one region of one chromosome will show an allele always cosegregating with the disease, enabling the mapping by linkage of the chromosomal region where the defective gene lies. For diseases considered oligogenic, the same principle has been used in large collections of affected sibling pairs, to try to identify regions shared by sibs selected for affected status more frequently than chance predicts. Locus heterogeneity (i.e. multiple genes contributing to the disease) may be a major limitation on power for sib pair studies
linkage disequilibrium (LD) mapping: alleles (q.v.) of loci which are physically closely linked are often in coupling or repulsion, that is, the alleles of one locus are nonrandomly distributed relative to the other locus. Given an infinite time in a randomly mating population, meiotic recombinations between the loci would create linkage equilibrium, but that is not the case of human history and so linkage disequilibrium often exists between loci even beyond one million base pairs apart
megaphenic disorders: disorders in which a gene has a major effect relative to the standard deviation of that phenotype
phenome scanning: the converse of genome scanning. Given a genotype which may have wide-ranging effects, it may, on a hypothesis forming basis, be appropriate to research genotypic association against all phenotypes available for a cohort
positional cloning: linkage studies (q.v.) identify a region of a chromosome where a gene with a substantial phenotypic effect lies. Detailed characterization of the region by cloning and sequencing is used to try to positionally clone the culprit gene. With the availability of a fairly complete human genome sequence, this approach is reducing to searching the sequence for the possible culprit gene on criteria such as its tissue expression pattern, apparent function, etc. (positional candidate approach)
RT-PCR: reverse transcription polymerase chain reaction. In this process messenger RNA from a tissue is reverse transcribed by an enzyme to a complementary DNA strand. This is then amplified in vitro using sequential cycles of polymerase chain reaction in which chemically synthesized primers complementary to the ends of the duplex sequence, in conjunction with a DNA polymerase, achieve an exponential amplification. The RT-PCR product may represent a specific message or, using more general primers, many messages. It enables analysis of messenger RNA expression levels
transcriptome: the genome refers to the collective sequence of all chromosomes of an organism. The transcriptome refers to that part of the genome represented in sequences transcribed to RNA, i.e. includes all genes
Acknowledgments
Ian NM Day is a Lister Institute Professor (1996–2000). Dongfeng Gu was a Royal Society Visiting Professor (1999–2000). Our research is supported by grants from the Medical Research Council of the UK, from the British Heart Foundation and from the Wessex Medical Trust (Hope).
References
1 http://www.netspace.org/MendelWeb/Mendel.html. Accessed 31 October 2000.
2 Morton NE. Outline of Genetic Epidemiology. Basel, Switzerland: Karger, 1982.
3 http://www.ncbi.nlm.nih.gov/htbin-post/Omim/dispmim?107730. Accessed 31 October 2000.
4 Tybjaerg-Hansen A, Steffensen R, Meinertz H, Schnohr P, Nordestgaard BG. Association of mutations in the apolipoprotein B gene with hypercholesterolemia and the risk of ischemic heart disease. N Engl J Med 1998;338:1577–84.
5 Gu DF, Hinks LJ, Morton NE, Day INM. The use of long PCR to confirm three common alleles at the CYP2A6 locus and the relationship between genotype and smoking habit. Ann Hum Genet 2000;64:383–90.
6 http://www.ncbi.nlm.nih.gov/htbin-post/Omim/dispmim?235200. Accessed 31 October 2000.
7 Sampson MJ, Williams T, Heyburn PJ et al. Prevalence of HFE (hemochromatosis gene) mutations in unselected male patients with type 2 diabetes. J Lab Clin Med 2000;135:170–73.
8 Stuart KA, Busfield F, Jazwinska EC et al. The C282Y mutation in the haemochromatosis gene (HFE) and hepatitis C virus infection are independent cofactors for porphyria cutanea tarda in Australian patients. J Hepatol 1998;28:404–09.
9 Piperno A, Mariani R, Arosio C et al. Haemochromatosis in patients with beta-thalassaemia trait. Br J Haematol 2000;111:908–14.
10 Tuomainen TP, Kontula K, Nyyssonen K, Lakka TA, Helio T, Salonen JT. Increased risk of acute myocardial infarction in carriers of the hemochromatosis gene Cys282Tyr mutation: a prospective cohort study in men in eastern Finland. Circulation 1999;100:1274–79.
11 Julier C, Hyer RN, Davies J et al. Insulin-IGF2 region on chromosome 11p encodes a gene implicated in HLA-DR4-dependent diabetes susceptibility. Nature 1991;354:155–59.
12 Bell J. How genetics will change medicine. In: Medicine: Genetics. Ch. Eds. Bell J, Phillips R. 1998;26(3):67. Oxford: The Medicine Publishing Company Ltd.
13 Weiss KM, Terwilliger JD. How many diseases does it take to map a gene with SNPs? Nat Genet 2000;26:151–57.
14 Jorde LB. Linkage disequilibrium and the search for complex disease genes. Genome Res 2000;10:1435–44.
15 Spanakis E. Human DNA sampling and banking. In: Day INM (ed.). Molecular Genetic Epidemiology: A Laboratory Perspective. Berlin: Springer, Ch.2. (in press).
16 Editorial. Freely associating. Nat Genet 1999;22:1–2.
17 http://www.ncbi.nlm.nih.gov/entrez/dispomim.cgi?id=107741. Accessed 31 October 2000.
18 http://www.wellcome.ac.uk/en/1/biovenpopcolspe.html. Accessed 31 October 2000.
19 http://www.myriad.com/gtpatc.html. Accessed 14 January 2001.
20 http://www3.ncbi.nlm.nih.gov:80/htbin-post/Omim/dispmim?605286. Accessed 14 January 2001.
21 http://www.uspto.gov/patft/index.html. Accessed 14 January 2001.
22 Samson M, Libert F, Doranz BJ et al. Resistance to HIV-1 infection in Caucasian individuals bearing mutant alleles of the CCR-5 chemokine receptor gene. Nature 1996;382:722–25.
23 McGuire W, Hill AVS, Allsopp CEM, Greenwood BM, Kwiatkowski D. Variation in the TNF-alpha promoter region associated with susceptibility to cerebral malaria. Nature 1994;371:508–11.
24 Glass RI, Holmgren J, Haley CE et al. Predisposition for cholera of individuals with O blood group. Possible evolutionary significance. Am J Epidemiol 1985;121:791–96.