Progress and problems in defining susceptibility genes for rheumatic diseases

J. Lanchbury, M. Hall and S. Steer

Molecular Immunogenetics Unit, Department of Rheumatology, Division of Medicine, 5th Floor, Thomas Guy House, Guy's Hospital Campus, Guy's, King's and St Thomas’ Hospitals School of Medicine, London SE1 9RT, UK

It is thought with some reason that there is no more dreadful punishment than futile and hopeless labour.

Albert Camus

Hypothesis-free genetic mapping of complex disease genes has been relatively unsuccessful to date and the rheumatic diseases are no exception. Weiss and Terwilliger in a recent article asked how many diseases it took to map a gene using single nucleotide polymorphisms (SNPs) [1; see also 2]. This ironic comment was directed at the raising of the SNP to the status of icon and saviour of human genetics. Behind the jibes lies very real disappointment at the inability of current technology and approaches to deliver verifiable linkages and consequent associations between genetic markers and diseases or intermediate disease phenotypes. If the disillusionment is palpable in the academic sector, it has been crippling for the private and publicly quoted specialized human genetics companies. In this short article we will summarize some of the findings of our recent studies in rheumatoid arthritis (RA), which offer insights into the shortcomings of the standard human genetic approach as currently practised. We hope that these lessons can be combined with the experience of others to inform the next phase of human genetics research and to speed the discovery of gene differences responsible for rheumatic diseases.

The sector of human genetics which uses the genetic variation found associated with genes has two basic approaches to link variation with complex disease phenotypes. These are the whole-genome scan and the candidate gene approach. The first uses a set of markers on a genome map which are selected on the basis of geography and utility, while the second concentrates on genetic changes in and around a gene which have a possible causative role in the observed variation in phenotype. This can be the presence or absence of disease or differences in a quantity, such as autoantibody levels, which are influenced by heritable factors, presumably polymorphisms. Both these approaches have been well reviewed [3, 4]. Family and population human genetic methodologies are the first step in identifying a gene whose role in physiology and pathology can be investigated exhaustively. Here we have excluded discussion of animal studies, in which the science moves from findings in animals directly to human biology, missing out the human genetics step. However, studies in rodents may prove useful in bolstering confidence in weak human linkage to an area where there is clear synteny between species [5].

Significant progress in understanding the rheumatic diseases has not so far been achieved as a result of human genetic approaches. This negative statement takes into account the many discoveries of HLA associations but also recognizes that these have yet to be translated into direct therapeutic advances. However, it is outside the HLA system where most energies have been focused over the last 10 yr and it is in this sector where there has been a failure to deliver disease genes. None of the current therapeutic advances based on TNF-α and IL-1β biology have been derived from the several papers devoted to linkage and association studies of these gene regions.

Here we will concentrate on genome screens rather than candidate association studies, as the latter are currently difficult to evaluate critically and the laboratory and analytical approaches are evolving rapidly [6]. It is relevant that until these approaches are fine-tuned it will be difficult, but not impossible, to move from linkage via positional candidates to associated disease genes. However, in most cases investigation is mired at the stage of replication of initial findings rather than this latter phase. Of the major rheumatic diseases, RA has been the subject of three published whole-genome screens [7–9]. The genome screen from the British consortium is expected shortly. Several scans have been reported for systemic lupus erythematosus (SLE), with a number of follow-up studies by the same groups [10–17], two in ankylosing spondylitis (AS) [18, 19] and two in osteoarthritis [20–22]. None of the scans and their follow-up surveys have yet delivered disease genes which have been shown to be relevant either to familial complex disease or in the unrelated disease population at large. This last point begs the question of whether the disease which is found in multicase families is influenced to the same extent by genes and by the same gene loci as the sporadic disease found at the population level. There are many instructive examples of familial forms of complex diseases, such as breast cancer [23] and Parkinson's disease [24], in which the familial disease loci do not appear to play a direct role in the general population. However, in these cases disease, while clinically similar to the sporadic form, is often of earlier onset and usually inherited in a Mendelian pattern. One relevant example is lupus, where there is a rich literature on lupus associated with genetic complement deficiencies [25]. It would appear that sporadic lupus is not generally associated with polymorphic complement variants and in this sense, while informing us of relevant pathological mechanisms, the rare Mendelian form stands apart. Similarly, the relevance of linkages and genes isolated by linkage disequilibrium approaches in founder populations needs to be examined carefully in the commoner non-founder groups.

The general relevance of gene differences mapped in families to outbred disease populations will require the reduction of valid linkages to associated genetic variation. If the question is reversed we may ask whether the HLA genomic region which is strongly associated with AS, and less so with RA and SLE, is detected consistently in genome scans of multicase families. In essence, the answer is ‘yes’, with the lod score or equivalent roughly proportional to the amount of genetic variance accounted for by the locus at the population level in AS and RA [8, 9, 18, 19, 26]. In SLE the results are much less clear-cut, perhaps reflecting the ethnic diversity of the multiple genome scans together with the fact that association of HLA variants with lupus is often stronger with clinical features than disease itself [27, 28]. This heterogeneity is enhanced when multiple population groups are examined, as has been discussed previously [29].

The difficulties inherent in the replication of genome scan data have been discussed widely [29–33] and approaches such as meta-analysis have been suggested and methodologies formalized and implemented [34, 35]. There is a lack of empirical data to address the causes of non-replication, but the essence is power limitation due to any combination of inadequate initial sample size, genetic and clinical heterogeneity, epistasis, genotyping error, genetic map errors, insufficiently informative genetic markers and genetic marker frequency variation. The difficulties which beset geneticists are therefore biological, technical and sociological. The fact that there is an inability or unwillingness amongst investigators to collaborate to return optimal sample sizes is an issue that national and transnational funding bodies should address with vigour.

To shed light on some of the issues mentioned above, we would like to consider what may be learned from study of the CRH genomic region in RA. While CRH is an excellent candidate gene for RA, as has been reviewed, we took a wider genetic approach which combined aspects of a regional genome scan with a candidate gene approach [36, 37]. Data were then viewed as representative of a portion of a genome scan analysed by non-parametric methods. In the initial survey we examined nine simple tandem repeat (STR) or microsatellite markers over a 20 centimorgan (cM) region centring on CRH at 8q13. One of these markers, CRHRA1, was located on an unsequenced cosmid containing CRH and thus placed at a maximum of 40 kilobases (kb) from the structural locus itself. The initial screen was conducted in a presumably homogeneous collection of 295 UK Caucasoid families with at least two siblings affected with RA. The maximum single-point lod score obtained was 1.8 for marker D8S1723, which was located approximately 9 cM from the CRHRA1 marker and the candidate. Multipoint analysis lowered the lod score slightly, which for Mendelian traits usually raises the question of whether the finding is valid. However, with few verified complex disease genes to compare, it is interesting that lod scores around the APOE gene in Alzheimer's disease behaved similarly [38]. APOE is a well-verified gene for Alzheimer's disease in Caucasoid populations, encouraging the view that similar multipoint results should not discourage investigation of a particular complex disease locus [39, 40]. Ordinarily, the next stage would have been replication of this finding in a large set of RA-affected sibling pairs (ASP). These were not available at the time and were unlikely to be for several years, though the situation is currently being improved by the European Consortium on Rheumatoid Arthritis Families and the North American Rheumatoid Arthritis Consortium [9, 41]. As a consequence, we proceeded directly with the next step, which is one of the most problematical in human genetics: the transition from genetic linkage to association. Fortunately, the CRHRA1 STR was significantly associated with RA, as evidenced by distorted transmission in a transmission/disequilibrium test analysis. A single allele, CRHRA1*10, at a control population frequency of 25%, accounted for the effect. This meant the marker was within a linkage disequilibrium group which included the disease-relevant variation. However, the size of this interval is difficult to define prospectively, as the controversial literature on linkage disequilibrium makes clear [42, 43]. The reason why CRH region linkage was not detected in the two Caucasoid RA genome scans [8, 9] is unclear, but it is worth reflecting on factors such as choice and informativeness of markers and success rate of genotyping as well as possible population differences. In our study, markers D8S1833 and D8S1767 showed no excess allele-sharing despite being close to or within the implicated region.
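For readers unfamiliar with the transmission/disequilibrium test mentioned above, it compares how often heterozygous parents transmit, versus fail to transmit, a given allele to an affected child. The following is a minimal Python sketch, assuming the usual McNemar-type statistic with one degree of freedom. The function name and the counts in the usage line are hypothetical illustrations and are not data from the CRH study.

```python
from math import erfc, sqrt


def tdt_statistic(transmitted, untransmitted):
    """Return (chi-square, p-value) for a McNemar-type TDT with 1 df.

    transmitted / untransmitted: counts of the allele of interest passed,
    or not passed, from heterozygous parents to affected offspring.
    """
    informative = transmitted + untransmitted
    if informative == 0:
        raise ValueError("no informative transmissions")
    chi2 = (transmitted - untransmitted) ** 2 / informative
    # Upper-tail probability of a chi-square variable with 1 df.
    p_value = erfc(sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts chosen only to illustrate the arithmetic.
chi2, p_value = tdt_statistic(60, 40)
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}")
```

Only heterozygous parents contribute, which is why the frequency of the allele under test matters so much for the number of informative transmissions.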

Rapid advances in human genome sequencing mean that it is now relatively easy to move from map position or candidate gene to draft-quality assembled sequence [44] (http://genome.ucsc.edu). However, in the summer of 2000 only 3% of chromosome 8 sequence was available and the nearest sequence appeared to be approximately 20 cM distant. To localize the CRHRA1 marker, to derive new polymorphic markers in the region and to search for adjacent genes, we screened bacterial artificial chromosome clones containing CRHRA1. A contiguous sequence of around 90 kb was obtained and we established the genomic context of the CRH structural gene for the first time. A second polymorphic microsatellite (CRHRA2), located 6 kb nearer CRH, was characterized. Two haplotypes carrying the allele CRHRA1*10, which was overtransmitted in the 295 ASP set, were observed (CRHRA1*10-CRHRA2*14 and CRHRA1*10-CRHRA2*15). Due to the lack of additional ASP families, we were obliged to replicate the genetic association with the region rather than the linkage. Replication was achieved in a second set of 130 Caucasoid single-case RA families for CRHRA1*10, and both sets showed transmission distortion of the CRHRA1*10-CRHRA2*14 haplotype. While the CRHRA1*10-CRHRA2*15 haplotype was associated with RA in the ASP set, this was not the case in the single-case RA families. There is a direct relationship between power in a genetic association study and frequency of susceptibility alleles. CRHRA1*10 and CRHRA1*10-CRHRA2*14 are the second most common allele and haplotype respectively in unaffected parents of RA patients, at 25% and 9%. Had they been less frequent, the power to show transmission distortion and replication in our studies would have been reduced, perhaps resulting in a negative finding. On such arbitrary grounds are disease effects recorded or not.
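The dependence of power on allele frequency can be illustrated with a rough calculation. The Python sketch below is not the analysis used in the studies described; it assumes a biallelic simplification (the allele of interest versus all others), Hardy-Weinberg proportions, a multiplicative genotype relative risk and single-case families with both parents typed. The relative risk of 1.5, the frequency of 0.05 and the function names are hypothetical; the frequencies 0.25 and 0.09 and the family count of 130 are taken from the text.

```python
def informative_parent_fraction(p, gamma):
    """Probability that a parent of an affected child is heterozygous.

    Assumes risk-allele frequency p, Hardy-Weinberg proportions and a
    multiplicative genotype relative risk gamma (an illustrative model).
    """
    q = 1.0 - p
    return p * q * (gamma + 1.0) / (p * gamma + q)


def approx_tdt_signal(n_families, p, gamma):
    """Approximate expected TDT chi-square for single-case families."""
    # Two parents per family; only heterozygous parents are informative.
    informative = 2 * n_families * informative_parent_fraction(p, gamma)
    # Under the multiplicative model a heterozygous parent transmits the
    # risk allele to an affected child with probability gamma / (1 + gamma).
    t = gamma / (1.0 + gamma)
    # (b - c)^2 / (b + c) evaluated at expected counts b = n*t, c = n*(1 - t).
    return informative * (2.0 * t - 1.0) ** 2


# gamma = 1.5 and the 0.05 frequency are hypothetical values.
for freq in (0.25, 0.09, 0.05):
    print(freq, round(approx_tdt_signal(130, freq, 1.5), 2))
```

Under these assumptions the expected signal falls as the allele becomes rarer, simply because fewer parents are heterozygous and therefore informative.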

Much has been written on measures of genetic effect for complex disease and the λS and λR quantities of Risch are a widely traded currency [45]. In RA, estimates of λS vary between 4 and 10, with λS(HLA) estimated at around 1.8 or around one-third of the total. Our estimates show λS(CRH) to be about 1.14 or just less than 10% of the total genetic component [36]. This is important because arguments continue about the effect size of polygenes in autoimmune, rheumatic and other complex diseases. Clearly, the effect size of HLA is likely to be the exception and CRH may be a typical polygene. Effect sizes of less than that seen for CRH will be difficult to establish with the limited numbers of current ASP cohorts for RA and other rheumatic diseases. CRH may be typical in another sense as only around one-sixth of the families are linked. This establishes the possibility that large numbers of polygenes may contribute to a disease like RA, with extensive locus heterogeneity making the task of analysis more complicated. At least for two genomic regions, HLA and CRH, which have been shown to be linked and associated with RA, the effects hold true for both familial and sporadic forms, as the second group of simplex families were recruited as single, sporadic cases, providing a partial answer to the question posed earlier. Epistasis exists between CRH and HLA and there appears to be some evidence for sex-limitation, though this could be an effect of limited power in the male cohort in what is predominantly a female disease (unpublished observations). On current evidence, CRH is still the best candidate for RA susceptibility in the region. The draft human genome sequence shows the presence of a new gene, the transcription factor MURF2 or muscle-specific ring finger protein 2, which is close to CRH, but it is difficult to fit this gene into any pathogenetic framework for RA. Current efforts are directed at improving the genetic map around CRH by the discovery of SNP genetic markers, which are more abundant than STRs. This should enable refinement of disease haplotypes and, importantly, will provide a set of markers which may be transferred to other laboratories more reliably than the STRs, which were used in the initial work. We have commented elsewhere on the difficulties in sizing of STR alleles across different hardware and software platforms [46]. Once standardization has been achieved, the relevance of CRH genetic variation may be investigated in other RA cohorts and populations and in other immune diseases.
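The proportions quoted for HLA and CRH can be reproduced approximately under one common convention, in which locus-specific sibling recurrence risk ratios multiply to give the total, so that each locus's share is measured on the logarithmic scale. That convention is an assumption here, since the text does not state how the fractions were derived. In the Python sketch below, the locus values 1.8 and 1.14 and the range 4 to 10 come from the text; the function name is hypothetical.

```python
from math import log


def share_of_sibling_risk(lambda_locus, lambda_total):
    """Fraction of total sibling risk attributed to one locus.

    Assumes a multiplicative model across loci, so shares are
    log(lambda_locus) / log(lambda_total). This model is an assumption.
    """
    return log(lambda_locus) / log(lambda_total)


# Total sibling risk ratios spanning the range of estimates quoted for RA.
for total in (4, 5, 10):
    hla = share_of_sibling_risk(1.8, total)
    crh = share_of_sibling_risk(1.14, total)
    print(f"total = {total}: HLA share = {hla:.2f}, CRH share = {crh:.2f}")
```

With a total λS near 5, this gives roughly one-third for HLA and a little under 10% for CRH; the shares shift with the assumed total, which is one reason such fractions should be read as approximate.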

In genetic studies of rheumatic and other complex diseases, we are unable to modify many of the variables, such as allele frequency, locus heterogeneity and epistasis, which will affect study power. We can collaborate openly to maximize our sample sizes to avoid false-negative findings and share technology to enable rapid verification or otherwise of findings. The road ahead in human genetics has few legible signposts and most of us have limited fuel in our tanks. Good luck as well as prudence will be necessary if we are to avoid too many unhelpful detours.

Notes

Correspondence to: J. Lanchbury.

References

  1. Weiss KM, Terwilliger JD. How many diseases does it take to map a gene with SNPs? Nat Genet 2000;26:151–7.
  2. Terwilliger JD, Weiss KM. Linkage disequilibrium mapping of complex disease: fantasy or reality? Curr Opin Biotechnol 1998;9:578–94.
  3. Lander ES, Schork NJ. Genetic dissection of complex traits [published erratum appears in Science 1994;266:353]. Science 1994;265:2037–48.
  4. Schork NJ, Cardon LR, Xu X. The future of genetic epidemiology. Trends Genet 1998;14:266–72.
  5. Morel L, Blenman KR, Croker BP, Wakeland EK. The major murine systemic lupus erythematosus susceptibility locus, Sle1, is a cluster of functionally related genes. Proc Natl Acad Sci USA 2001;98:1787–92.
  6. Schork NJ, Fallin D, Lanchbury JS. Single nucleotide polymorphisms and the future of genetic epidemiology. Clin Genet 2000;58:250–64.
  7. Shiozawa S, Hayashi S, Tsukamoto Y et al. Identification of the gene loci that predispose to rheumatoid arthritis. Int Immunol 1998;10:1891–5.
  8. Cornelis F, Faure S, Martinez M et al. New susceptibility locus for rheumatoid arthritis suggested by a genome-wide linkage study. Proc Natl Acad Sci USA 1998;95:10746–50.
  9. Jawaheer D, Seldin MF, Amos CI et al. A genomewide screen in multiplex rheumatoid arthritis families suggests genetic overlap with other autoimmune diseases. Am J Hum Genet 2001;68:927–36.
  10. Gaffney PM, Kearns GM, Shark KB et al. A genome-wide search for susceptibility genes in human systemic lupus erythematosus sib-pair families. Proc Natl Acad Sci USA 1998;95:14875–9.
  11. Gaffney PM, Ortmann WA, Selby SA et al. Genome screening in human systemic lupus erythematosus: results from a second Minnesota cohort and combined analyses of 187 sib-pair families. Am J Hum Genet 2000;66:547–56.
  12. Gray-McGuire C, Moser KL, Gaffney PM et al. Genome scan of human systemic lupus erythematosus by regression modeling: evidence of linkage and epistasis at 4p16-15.2. Am J Hum Genet 2000;67:1460–9.
  13. Moser KL, Neas BR, Salmon JE et al. Genome scan of human systemic lupus erythematosus: evidence for linkage on chromosome 1q in African-American pedigrees. Proc Natl Acad Sci USA 1998;95:14869–74.
  14. Shai R, Quismorio FP Jr, Li L et al. Genome-wide screen for systemic lupus erythematosus susceptibility genes in multiplex families. Hum Mol Genet 1999;8:639–44.
  15. Johanneson B, Steinsson K, Lindqvist AK et al. A comparison of genome-scans performed in multicase families with systemic lupus erythematosus from different population groups. J Autoimmun 1999;13:137–41.
  16. Magnusson V, Lindqvist AK, Castillejo-Lopez C et al. Fine mapping of the SLEB2 locus involved in susceptibility to systemic lupus erythematosus. Genomics 2000;70:307–14.
  17. Lindqvist AK, Steinsson K, Johanneson B et al. A susceptibility locus for human systemic lupus erythematosus (hSLE1) on chromosome 2q. J Autoimmun 2000;14:169–78.
  18. Brown MA, Pile KD, Kennedy LG et al. A genome-wide screen for susceptibility loci in ankylosing spondylitis. Arthritis Rheum 1998;41:588–95.
  19. Laval SH, Timms A, Edwards S et al. Whole-genome screening in ankylosing spondylitis: evidence of non-MHC genetic-susceptibility loci. Am J Hum Genet 2001;68:918–26.
  20. Loughlin J, Mustafa Z, Irven C et al. Stratification analysis of an osteoarthritis genome screen: suggestive linkage to chromosomes 4, 6 and 16. Am J Hum Genet 1999;65:1795–8.
  21. Leppavuori J, Kujala U, Kinnunen J et al. Genome scan for predisposing loci for distal interphalangeal joint osteoarthritis: evidence for a locus on 2q. Am J Hum Genet 1999;65:1060–7.
  22. Loughlin J, Mustafa Z, Smith A et al. Linkage analysis of chromosome 2q in osteoarthritis. Rheumatology 2000;39:377–81.
  23. Peto J, Collins N, Barfoot R et al. Prevalence of BRCA1 and BRCA2 gene mutations in patients with early-onset breast cancer. J Natl Cancer Inst 1999;91:943–9.
  24. Polymeropoulos MH. Genetics of Parkinson's disease. Ann NY Acad Sci 2000;920:28–32.
  25. Pickering MC, Botto M, Taylor PR, Lachmann PJ, Walport MJ. Systemic lupus erythematosus, complement deficiency and apoptosis. Adv Immunol 2000;76:227–324.
  26. Hardwick LJ, Walsh S, Butcher S et al. Genetic mapping of susceptibility loci in the genes involved in rheumatoid arthritis. J Rheumatol 1997;24:197–8.
  27. Harley JB, Sestak AL, Willis LG, Fu SM, Hansen JA, Reichlin M. A model for disease heterogeneity in systemic lupus erythematosus. Relationships between histocompatibility antigens, autoantibodies, and lymphopenia or renal disease. Arthritis Rheum 1989;32:826–36.
  28. Stephens HA, McHugh NJ, Maddison PJ, Isenberg DA, Welsh KI, Panayi GS. HLA class II restriction of autoantibody production in patients with systemic lupus erythematosus. Immunogenetics 1991;33:276–80.
  29. Risch N. Searching for genes in complex diseases: lessons from systemic lupus erythematosus. J Clin Invest 2000;105:1503–6.
  30. Bell JI, Lathrop GM. Multiple loci for multiple sclerosis. Nat Genet 1996;13:377–8.
  31. Risch NJ. Searching for genetic determinants in the new millennium. Nature 2000;405:847–56.
  32. Schork NJ, Fallin D, Thiel B et al. The future of genetic case–control studies. Adv Genet 2001;42:191–212.
  33. Lanchbury JS, Schork NJ. Peaks and troughs in linkage disequilibrium mapping for the rheumatic diseases. Rheumatology 2000;39:435–56.
  34. Wise LH, Lanchbury JS, Lewis CM. Meta-analysis of genome searches. Ann Hum Genet 1999;63:263–72.
  35. Merriman TR, Cordell HJ, Eaves IA et al. Suggestive evidence for association of human chromosome 18q12-q21 and its orthologue on rat and mouse chromosome 18 with several autoimmune diseases. Diabetes 2001;50:184–94.
  36. Fife MS, Fisher SA, John S et al. Multipoint linkage analysis of a candidate gene locus in rheumatoid arthritis demonstrates significant evidence of linkage and association with the corticotropin-releasing hormone genomic region. Arthritis Rheum 2000;43:1673–8.
  37. Fife MS, Steer S, Fisher SA et al. A single corticotropin releasing hormone (CRH) genomic region (8q13) haplotype is associated with both familial and sporadic rheumatoid arthritis. Arthritis Rheum 2002;46:75–82.
  38. Martin ER, Lai EH, Gilbert JR et al. SNPing away at complex diseases: analysis of single-nucleotide polymorphisms around APOE in Alzheimer disease. Am J Hum Genet 2000;67:383–94.
  39. Roses AD. A model for susceptibility polymorphisms for complex diseases: apolipoprotein E and Alzheimer disease. Neurogenetics 1997;1:3–11.
  40. Fallin D, Cohen A, Essioux L et al. Genetic analysis of case/control data using estimated haplotype frequencies: application to APOE locus variation and Alzheimer's disease. Genome Res 2001;11:143–51.
  41. Balsa A, Barrera P, Westhovens R et al. Clinical and immunogenetic characteristics of European multicase rheumatoid arthritis families. Ann Rheum Dis 2001;60:573–6.
  42. Kruglyak L. Prospects for whole-genome linkage disequilibrium mapping of common disease genes. Nat Genet 1999;22:139–44.
  43. Collins A, Lonjou C, Morton NE. Genetic epidemiology of single-nucleotide polymorphisms. Proc Natl Acad Sci USA 1999;96:15173–7.
  44. Lander ES, Linton LM, Birren B et al. Initial sequencing and analysis of the human genome. Nature 2001;409:860–921.
  45. Risch N. Assessing the role of HLA-linked and unlinked determinants of disease. Am J Hum Genet 1987;40:1–14.
  46. Lanchbury JS, Hall MA, Fife MS. Interferon gamma gene in rheumatoid arthritis. Lancet 2000;356:2192.