Peaks and troughs in linkage mapping for the rheumatic diseases

J. S. Lanchbury and N. J. Schork1

Department of Rheumatology, Division of Medicine, 5th Floor Thomas Guy House, Guy's, King's and St Thomas' Hospitals School of Medicine, King's College, Guy's Hospital Campus, London SE1 9RT, UK and
1 Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, Ohio; The Program for Population Genetics and Department of Biostatistics, Harvard University School of Public Health, Boston, Massachusetts; The Jackson Laboratory, Bar Harbor, Maine and The Genset Corporation, La Jolla, California, USA.

Concrete progress in the identification of genes influencing susceptibility and severity in complex rheumatic diseases such as rheumatoid arthritis (RA) and systemic lupus erythematosus (SLE) has been elusive. The last 10 yr have seen the development of the tools and resources for this identification effort, which has focused on gene mapping via linkage and association approaches. These ongoing approaches can be crudely reduced to four components: semi-automated electrophoresis apparatus for sizing simple sequence repeats in human DNA; more or less accurate maps of the relative locations of these microsatellite genetic markers across the three billion bases of the human genome; collections of DNA from families with multiple affected cases; and statistical methods for linking genetic map positions to disease affection status. RA and SLE are typical examples of diseases of complex aetiology in which the inherited component is only one part of the picture [1]. Risk for disease development is dependent on interactions between host genes and environmental factors which may be specific or random and of which we are almost totally ignorant. Investigators are therefore attempting to map genetic effects of unknown and, in all likelihood, small magnitude with crude genetic markers in family collections of inadequate size and with inevitable limitations due to clinicogenetic heterogeneity. Although the theory behind this approach suggests it should work, there are few practical successes. Thus, it is likely that an optimal approach to locus discovery will also involve integration of animal model and other data [2]. What cannot be avoided is that, despite the considerable resources devoted to mapping studies in RA and other diseases with a complex basis, there are no examples of the approach yet delivering a causative polymorphism. The authors wish the validity of this statement to be short-lived.
It is, however, important at this point in development to evaluate critically the factors which limit the effectiveness of current approaches and to consider the solutions.

Diagnosis and genetic heterogeneity

Any biological experiment is only as good as its material and its design. In human genetic terms ‘good’ means large numbers of subjects with homogeneous, unambiguous phenotypes. Neither of these criteria is met by the current family collections of RA patients because there is no diagnostic test for RA and the disease is only weakly ‘familial’. Thus, studies are limited simply by the lack of available family resources. The augmentation of a genetic study by inclusion of family material sampled from distinct population groups is problematical. Population history and local genetic differentiation, as well as important differences in environmental conditions, are likely to contribute to a great deal of heterogeneity with respect to causal factors for the disease. The net result of combining individuals from disparate populations is either a loss of power or at best a modest increment in power at the cost of considerable logistical and perhaps experimental effort. Given our inability to define disease phenotype unambiguously, there are the inevitable complications of including patients whose clinical spectrum is distinct from that of the base population. Some years ago, colleagues carried out a comparative clinical, radiological and serological study of a cohort of UK and Greek caucasoid RA patients [3]. The differences in patient characteristics between northern and southern Europe were profound, lending weight to the supposition that, if these populations represented the spectrum of RA in Europe, pooling any patient group across Europe would be problematical. The dilution effect was prominent when we examined the only verified genetic determinant of the disease. HLA-DR4 was found in around 80% of UK patients and only 25% of those from northern Greece [4, 5]. The ‘shared epitope’ fared only slightly better [6].
Frank clinicogenetic heterogeneity is not confined to comparisons between the UK and southern Europe: several studies of central and northern European groups have failed to record DR4 frequencies comparable to those of the UK [7, 8]. There are practical and theoretical objections to the pooling of data sets even if investigators are organized enough to use the same genetic marker sets for their primary studies.

Problems with family approaches: non-parametric analysis is inefficient

Those of you who spend your time with patients will know better than the authors that RA does not breed true in families in the same way as the dominant Huntington's chorea or other simple Mendelian conditions. The classification of RA as a disease with complex multi-factorial aetiology encourages the use of the non-parametric (i.e. assumption-free) approach to linkage analysis [9, 10]. As has been discussed before, this approach is not subject to the problems of definition of health or disease status in siblings for a condition of relatively late onset. The advantage of freeing the investigator from having to build statistical models simulating the inheritance of the disease needs to be offset against a loss of power associated with most non-parametric methods [11]. Thus, the optimal way in which the data can be analysed at the whole genome level is statistically inefficient. The investigator's response may be to pool families across population boundaries with the consequences outlined above.

Choice of genetic markers

The current mapping sets of simple sequence repeat or microsatellite genetic markers have superseded RFLP (restriction fragment length polymorphism) and VNTR (variable number tandem repeat) markers by virtue of their frequency in the genomes of mammals and the convenience of genotyping large numbers of fluorescently labelled polymerase chain reaction (PCR) products from multiple loci in single electrophoresis gel lanes [12]. However, approaches exploiting these markers are problematical and demanding of human intervention. DNA from full nuclear families is not always available (especially in diseases of late onset like RA or those which are socially stigmatized), so genotyping errors may go undetected, and few laboratories take the precaution of duplicate genotyping. Furthermore, when fine mapping, the loci which investigators are obliged to use may not be as technically amenable as those used for the primary mapping. The statistical and investigative consequences of such errors have still to be quantified in full [13, 14]. The inability to automate microsatellite genotyping fully, and the associated technical problems, have stimulated the development of new technologies for assessing new kinds of genetic marker. The commonest class of genetic variant is the single nucleotide polymorphism (SNP). These take the form of nucleotide exchanges or micro-deletions or insertions and can be readily genotyped by hybridization to oligonucleotides. Such SNPs have formed the substrate for most of the HLA genotyping reports for the last 12 yr. Reversing the more usual format, so that huge numbers of hybridization probes are bound to a solid matrix, enables parallel interrogation of mixed populations of amplified DNA encoding the sequence variability of interest. With computer-interpreted readout, simultaneous genotyping of tens of thousands of genetic markers per individual becomes possible, with the potential for pooling of populations still to be explored [15, 16].
From August 1999, SNP genotyping for more than 2000 markers across the genome became available in a chip-based format as a commercial service. The statistical basis for the use of SNPs in linkage analysis was discussed by Kruglyak, who showed by simulation that a map of 1500–3000 biallelic markers provided more linkage information than the currently used 400-marker microsatellite maps [17]. This fresh approach addresses the issue of high-throughput genotyping error rates and removes the subjectivity of the laboratory investigator. We are still left with the power problems due to insufficient numbers of high-quality families. One way forward is to use the considerable population resources for association mapping studies, an approach which is not without potential pitfalls [17–19].

Current and developing choices

Genetic case–control association studies in RA have a long history. The discovery of the association between the HLA-Dw4 marker (later shown to equate to the DNA sequence DRB1*0401) and RA dates from 1974 [20]. This candidate gene approach has been powerful and reproducible at the DNA level because the effect is both true and relatively strong in most populations. The same has not held true for many other association studies (see [21]). Populations of patients and controls have several advantages over families. They are more numerous and there are fewer logistical constraints on collection. Consequently, studies can be better powered and clinical/physiological subgroups may be more readily studied. In addition, where a causal or related polymorphism exists, the sensitivity of association studies is greater than for the linkage approach.

At this point we must address the intrinsic differences between family-based approaches and approaches using unrelated individuals when considering the application of a genetic map. Genetic recombination is the process through which the genetic uniqueness of our offspring is ensured. In the generation of a pair of siblings, a single opportunity for meiosis and recombination has occurred in each parent in the generation of gametes for each child. The number of meioses distinguishing a chromosome from two ‘unrelated’ individuals is determined by the number of generations since they shared a common ancestor. This may be of the order of 100–400 in normal outbred populations which have not undergone drastic reduction in population size at some point in their history. Such dramatic events are known as genetic bottlenecks. The consequence of this is that the genomic area flanking a genetic marker or a disease locus in siblings is more likely to be identical over a greater distance than an equivalent region in two unrelated people. A comprehensive genetic survey of the genome in families therefore requires fewer genetic markers compared with populations.
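
The contrast between siblings and ‘unrelated’ individuals can be illustrated with the standard population-genetic result that allelic association between a marker and a nearby disease locus is eroded by a factor of (1 − θ) per generation, where θ is the recombination fraction. The following is a minimal sketch only: the function name `ld_remaining` and the choice of θ = 0.01 (roughly 1 cM) are assumptions made for illustration, the model ignores drift, selection and admixture, and the generation counts are simply those quoted above.

```python
def ld_remaining(theta: float, generations: int) -> float:
    """Fraction of the initial marker-disease association left after
    `generations` rounds of meiosis, assuming decay by (1 - theta) each time."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5]")
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return (1.0 - theta) ** generations


if __name__ == "__main__":
    theta = 0.01  # assumed: marker about 1 cM from the disease locus
    # 1 generation stands in for a sibship; 100 and 400 for 'unrelated' people
    for g in (1, 100, 400):
        print(f"{g:>3} generations: {ld_remaining(theta, g):.3f} of association retained")
```

Under these assumptions almost all of the association survives a single generation, whereas only a small fraction survives several hundred, which is why a family-based survey can tolerate much wider marker spacing.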

An equivalent comprehensive survey of the genome in unrelated patients and healthy controls is currently not possible. However, the limitations on availability of large family collections of even relatively common complex diseases provide a stimulus for the development of novel analytical approaches which take advantage of the extensive population material which is already available [18]. By alteration of the scale of the SNP typing formats already available, it is theoretically possible to devise a screening system which allows whole genome association studies to be carried out in outbred populations. The key factors are the intermarker distance and the physical effort of ascertaining and mapping up to a million informative SNPs [18]. This latter effort is already underway, led by private companies, academic groups and a private consortium which intends to make the resulting SNP map publicly available to avoid future difficulties regarding proprietary rights [22–25]. Pilot studies of coding sequence SNPs (cSNPs), which are the most attractive targets, have covered selections of candidate genes and have reported the unsurprising observation that polymorphisms involving amino acid change have relatively low allele frequencies compared with silent substitutions [26–28]. This is consistent with classical evolutionary theory, which suggests that the vast majority of changes made to a protein will be deleterious [29]. The limited number of SNP screens which have been carried out prove the availability of markers; the current debate is how many are needed, how useful they will be and what the necessary study designs are.

The issue of how many markers are needed to survey the genome of a population of outbred individuals is dependent on the extent of population-specific linkage disequilibrium (LD). Few empirical data exist to inform the debate, but there are educated and somewhat alarming guesses. Kruglyak estimates from simulation studies that genome-wide LD may not extend further than an average of 3 kb requiring 500 000 markers for a genome-wide association approach [30]. Critically, he suggests that even small, isolated founder populations such as the Finns may not help much, a challenge to a number of groups which have adopted this strategy [31]. Of encouragement to those developing SNP maps is the alternative view proposed by Thompson and Neel [32]. By analogy with post-agricultural Amerindian groups, they suggest that extensive LD of up to 0.5 cM should exist in European-derived populations [32]. These remarks are made on the assumption of no influence of natural selection. Many of the rheumatic diseases we focus on are characterized by immune dysregulation. It is possible that current immune dysfunction genes will have been useful at some time in evolutionary history. Should the limited LD view prove the rule, selection may well have maintained extensive LD around key disease susceptibility genes. A reverse of the mapping strategy may be appropriate—find the LD and there will reside your biologically important polymorphisms. Thus, while the maps may have general application (and the systems will be so costly to develop that they will need it), more targeted versions might be generated to serve better the rheumatology community's needs.
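
The marker numbers in this debate follow from simple arithmetic. As a rough sketch, assuming a genome of three billion bases, that each marker is informative for the stated LD distance on either side of itself (one reading that reproduces the quoted figure of 500 000), and a coarse conversion of 1 cM to about one million bases (an assumption, not a figure from the text), the two views imply very different map densities. The function name and constants are illustrative.

```python
import math

GENOME_BP = 3_000_000_000  # three billion bases, as quoted in the introduction
BP_PER_CM = 1_000_000      # assumed average: 1 cM is roughly 1 Mb in humans


def markers_needed(ld_extent_bp: float, genome_bp: float = GENOME_BP) -> int:
    """Markers required if each one covers `ld_extent_bp` on both sides of itself."""
    if ld_extent_bp <= 0:
        raise ValueError("LD extent must be positive")
    spacing_bp = 2 * ld_extent_bp  # one marker per window of twice the LD extent
    return math.ceil(genome_bp / spacing_bp)


if __name__ == "__main__":
    # Limited-LD view: useful LD extends about 3 kb
    print("LD of 3 kb  :", markers_needed(3_000), "markers")
    # Extensive-LD view: LD of up to 0.5 cM, converted with the assumed BP_PER_CM
    print("LD of 0.5 cM:", markers_needed(0.5 * BP_PER_CM), "markers")
```

On these assumptions the limited-LD view requires 500 000 markers and the extensive-LD view only a few thousand, a difference of more than two orders of magnitude in genotyping effort.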

Having arrived at a map of sufficient density, what does one study? The hegemony of family studies over the last few years has obscured the fact that despite stratification effects on the distribution of marker- and disease-related allelic variation, polymorphisms would be best examined using case–control population studies. Transmission disequilibrium test (TDT) cohorts of sufficient size are unlikely to be available within useful time scales for replication studies and are likely to occupy an interim investigation phase. An additional complication is that for diseases of late onset, the classical TDT family which includes two parents of a proband is less likely to be available. Development of robust statistical tools that compensate for disease-unrelated human population variation is underway. The problems of population stratification have begun to be addressed by systematically evaluating the degree of variation at selected loci which act as sensors for underlying substructure [33]. This will be a critical area for both practical and theoretical development in the next 5 yr.
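
For readers unfamiliar with the TDT, its simplest biallelic form is a McNemar-type statistic computed only from heterozygous parents: if b is the number of times such parents transmit the allele of interest to an affected child and c the number of times they do not, (b − c)²/(b + c) is referred to a chi-squared distribution with one degree of freedom. A minimal sketch follows; the function name `tdt` and the counts in the usage lines are hypothetical and are not data from any study cited here.

```python
import math
from typing import Tuple


def tdt(transmitted: int, not_transmitted: int) -> Tuple[float, float]:
    """Return (chi-squared statistic, p-value) for a biallelic TDT.

    `transmitted` and `not_transmitted` count, over heterozygous parents only,
    how often the allele of interest was or was not passed to an affected child.
    """
    b, c = transmitted, not_transmitted
    if b < 0 or c < 0 or b + c == 0:
        raise ValueError("need non-negative counts and at least one heterozygous parent")
    chi2 = (b - c) ** 2 / (b + c)
    # Upper tail of chi-squared with 1 d.f.: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical counts: 60 transmissions versus 40 non-transmissions
    chi2, p = tdt(60, 40)
    print(f"chi-squared = {chi2:.2f}, p = {p:.4f}")
```

Because only heterozygous parents contribute, each family missing a parent removes information, which is the practical difficulty for late-onset disease noted above.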

The scale of availability of populations of unrelated cases and controls and the inherent sensitivity advantages over linkage-based approaches argue that controlled population association will shortly be the preferred mode of disease locus discovery and replication. However, a key determinant of the success of this approach will be the allelic heterogeneity of a disease locus within a population. This has been critically addressed by Terwilliger and Weiss [34]. Intensive investigation of polymorphism in 9.7 kb of the lipoprotein lipase gene led to the conclusion that random ascertainment and use of a small number of the total 88 SNPs described would not have been sufficiently informative to track the known intermediate phenotype of low HDL cholesterol levels in premature atherosclerosis [35, 36]. Currently we are unaware of the degree to which allele and locus heterogeneity (in the latter case, perhaps part of the same biochemical pathway) may confer a similar rheumatic disease phenotype. Sewall Wright suggested that individual disease alleles which supported the development of complex disease would of necessity exist at high frequency in the healthy population. If these high-frequency alleles are in fact composites of multiple complex disease-related variants, the task of detecting them with linked SNP markers becomes increasingly difficult.

The engine for development of robust SNP use is the correlation and eventual predictive use of functional SNPs in disease outcome, drug toxicity and efficacy. There is a great deal of promotion of so-called genetically tailored therapies and the new field of pharmacogenomics [37–39]. The challenge for the immediate future is to take on the degree of existing human genetic diversity, to design studies which are statistically robust to multiple comparisons and to distinguish causal pathways from merely population-associated variation.
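
One simple and conservative way of making a genome-wide screen robust to multiple comparisons is the Bonferroni correction, in which the overall significance level is divided by the number of tests. The sketch below is illustrative only: it assumes independent tests and a conventional overall level of 0.05 (neither is specified in the text), and the function name is hypothetical. The marker counts are those discussed earlier.

```python
def bonferroni_threshold(n_tests: int, overall_alpha: float = 0.05) -> float:
    """Per-test significance level that keeps the family-wise error rate
    at or below `overall_alpha` across `n_tests` comparisons."""
    if n_tests < 1:
        raise ValueError("need at least one test")
    if not 0.0 < overall_alpha < 1.0:
        raise ValueError("overall_alpha must lie strictly between 0 and 1")
    return overall_alpha / n_tests


if __name__ == "__main__":
    # 400-marker microsatellite map, 500 000 SNPs, one million SNPs
    for n in (400, 500_000, 1_000_000):
        print(f"{n:>9} markers: per-test threshold {bonferroni_threshold(n):.1e}")
```

With half a million markers the per-test threshold falls to about one in ten million, which gives a sense of the sample sizes that whole genome association studies would demand.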

Notes

Correspondence to: J. S. Lanchbury.

References

  1. Vyse TJ, Todd JA. Genetic analysis of autoimmune disease. Cell 1996;85:311–8.
  2. Schork NJ, Lanchbury JS. Integrated phenotyping, disease models, and pathophysiologic databases. Trends Genet 2000;16:in press.
  3. Drosos AA, Lanchbury JS, Panayi GS, Moutsopoulos HM. Rheumatoid arthritis in Greek and British patients. A comparative clinical, radiologic, and serologic study. Arthritis Rheum 1992;35:745–8.
  4. Boki KA, Panayi GS, Vaughan RW, Drosos AA, Moutsopoulos HM, Lanchbury JS. HLA class II sequence polymorphisms and susceptibility to rheumatoid arthritis in Greeks. The HLA-DR beta shared-epitope hypothesis accounts for the disease in only a minority of Greek patients. Arthritis Rheum 1992;35:749–55.
  5. Wordsworth BP, Lanchbury JSS, Sakkas LI, Welsh KI, Panayi GS, Bell JI. HLA-DR4 subtype frequencies in rheumatoid arthritis indicate that DRB1 is the major susceptibility locus within the HLA class II region. Proc Natl Acad Sci USA 1989;86:10049–53.
  6. Gregersen PK, Silver J, Winchester RJ. The shared epitope hypothesis. An approach to understanding the molecular genetics of susceptibility to rheumatoid arthritis. Arthritis Rheum 1987;30:1205–13.
  7. Van Jaarsveld CH, Otten HG, Jacobs JW, Kruize AA, Brus HL, Bijlsma JW. Association of HLA-DR with susceptibility to and clinical expression of rheumatoid arthritis: re-evaluation by means of genomic tissue typing. Br J Rheumatol 1998;37:411–6.
  8. Salvarani C, Macchioni PL, Mantovani W et al. HLA-DRB1 alleles associated with rheumatoid arthritis in Northern Italy: correlation with disease severity. Br J Rheumatol 1998;37:165–9.
  9. Kruglyak L, Lander ES. Complete multipoint sib-pair analysis of qualitative and quantitative traits. Am J Hum Genet 1995;57:439–54.
  10. Kruglyak L. Nonparametric linkage tests are model free. Am J Hum Genet 1997;61:254–5.
  11. Abreu PC, Greenberg DA, Hodge SE. Direct power comparisons between simple LOD scores and NPL scores for linkage analysis in complex diseases. Am J Hum Genet 1999;65:847–57.
  12. Reed PW, Davies JL, Copeman JB et al. Chromosome-specific microsatellite sets for fluorescence-based, semi-automated genome mapping. Nat Genet 1994;7:390–5.
  13. Brzustowicz LM, Merette C, Xie X, Townsend L, Gilliam TC, Ott J. Molecular and statistical approaches to the detection and correction of errors in genotype databases. Am J Hum Genet 1993;53:1137–45;54:1132.
  14. Feakes R, Sawcer S, Chataway J et al. Exploring the dense mapping of a region of potential linkage in complex disease: an example in multiple sclerosis. Genet Epidemiol 1999;17:51–63.
  15. Lipshutz RJ, Morris D, Chee M et al. Using oligonucleotide probe arrays to access genetic diversity. Biotechniques 1995;19:442–7.
  16. Lipshutz RJ, Fodor SP, Gingeras TR, Lockhart DJ. High density synthetic oligonucleotide arrays. Nat Genet 1999;21(1 Suppl.):20–4.
  17. Kruglyak L. The use of a genetic map of biallelic markers in linkage studies. Nat Genet 1997;17:21–4.
  18. Risch N, Merikangas K. The future of genetic studies of complex human diseases. Science 1996;273:1516–7.
  19. Collins FS, Brooks LD, Chakravarti A. A DNA polymorphism discovery resource for research on human genetic variation. Genome Res 1998;8:1229–31;9:210.
  20. Stastny P. Mixed lymphocyte culture typing cells from patients with rheumatoid arthritis. Tissue Antigens 1974;4:572–9.
  21. Fife M, Coakley G, Lanchbury JS. Current perspectives on the genetics of rheumatoid arthritis. In: Wollheim F, Panayi GS, Firestein G, eds. Rheumatoid arthritis: The new frontiers in pathogenesis and treatment. Oxford: Oxford University Press, 1999.
  22. Marshall A. Genset–Abbott deal heralds pharmacogenomics era. Nat Biotechnol 1997;15:829–30.
  23. Buetow KH, Edmonson MN, Cassidy AB. Reliable identification of large numbers of candidate SNPs from public EST data. Nat Genet 1999;21:323–5.
  24. Sherry ST, Ward M, Sirotkin K. dbSNP-database for single nucleotide polymorphisms and other classes of minor genetic variation. Genome Res 1999;9:677–9.
  25. Masood E. As consortium plans free SNP map of human genome. Nature 1999;398:545–6.
  26. Cargill M, Altshuler D, Ireland J et al. Characterization of single-nucleotide polymorphisms in coding regions of human genes. Nat Genet 1999;22:231–8.
  27. Hacia JG, Fan JB, Ryder O et al. Determination of ancestral alleles for human single-nucleotide polymorphisms using high-density oligonucleotide arrays. Nat Genet 1999;22:164–7.
  28. Halushka MK, Fan JB, Bentley K et al. Patterns of single-nucleotide polymorphisms in candidate genes for blood-pressure homeostasis. Nat Genet 1999;22:239–47.
  29. Gillespie JH. The causes of molecular evolution. Oxford: Oxford University Press, 1991.
  30. Kruglyak L. Prospects for whole-genome linkage disequilibrium mapping of common disease genes. Nat Genet 1999;22:139–44.
  31. Peltonen L. Molecular background of the Finnish disease heritage. Ann Med 1997;29:553–6.
  32. Thompson EA, Neel JV. Allelic disequilibrium and allele frequency distribution as a function of social and demographic history. Am J Hum Genet 1997;60:197–204.
  33. Pritchard JK, Rosenberg NA. Use of unlinked genetic markers to detect population stratification in association studies. Am J Hum Genet 1999;65:220–8.
  34. Terwilliger JD, Weiss KM. Linkage disequilibrium mapping of complex disease: fantasy or reality? Curr Opin Biotechnol 1998;9:578–94.
  35. Nickerson DA, Taylor SL, Weiss KM et al. DNA sequence diversity in a 9.7-kb region of the human lipoprotein lipase gene. Nat Genet 1998;19:233–40.
  36. Reymer PW, Gagne E, Groenemeyer BE et al. A lipoprotein lipase mutation (Asn291Ser) is associated with reduced HDL cholesterol levels in premature atherosclerosis. Nat Genet 1995;10:28–34.
  37. Persidis A. Pharmacogenomics and diagnostics. Nat Biotechnol 1998;16:791–2.
  38. Ledley FD. Can pharmacogenomics make a difference in drug development? Nat Biotechnol 1999;17:731.
  39. Collins FS. Genetics: an explosion of knowledge is transforming clinical practice. Geriatrics 1999;54:41–7.



