Molecular Immunogenetics Unit, Department of Rheumatology, Division of Medicine, 5th Floor, Thomas Guy House, Guy's Hospital Campus, Guy's, King's and St Thomas Hospitals School of Medicine, London SE1 9RT, UK
It is thought with some reason that there is no more dreadful punishment than futile and hopeless labour.
Albert Camus
Hypothesis-free genetic mapping of complex disease genes has been relatively unsuccessful to date and the rheumatic diseases are no exception. Weiss and Terwilliger in a recent article asked how many diseases it took to map a gene using single nucleotide polymorphisms (SNPs) [1; see also 2]. This ironic comment was directed at the raising of the SNP to the status of icon and saviour of human genetics. Behind the jibes lies very real disappointment at the inability of current technology and approaches to deliver verifiable linkages and consequent associations between genetic markers and diseases or intermediate disease phenotypes. If the disillusionment is palpable in the academic sector, it has been crippling for the private and publicly quoted specialized human genetics companies. In this short article we will summarize some of the findings of our recent studies in rheumatoid arthritis (RA), which offer insights into the shortcomings of the standard human genetic approach as currently practised. We hope that these lessons can be combined with the experience of others to inform the next phase of human genetics research and to speed the discovery of gene differences responsible for rheumatic diseases.
The sector of human genetics which uses the genetic variation found associated with genes has two basic approaches to link variation with complex disease phenotypes. These are the whole-genome scan and the candidate gene approach. The first uses a set of markers on a genome map which are selected on the basis of geography and utility, while the second concentrates on genetic changes in and around a gene which have a possible causative role in the observed variation in phenotype. This can be the presence or absence of disease or differences in a quantity, such as autoantibody levels, which are influenced by heritable factors, presumably polymorphisms. Both these approaches have been well reviewed [3, 4]. Family and population human genetic methodologies are the first step in identifying a gene whose role in physiology and pathology can be investigated exhaustively. Here we have excluded discussion of animal studies, in which the science moves from findings in animals directly to human biology, missing out the human genetics step. However, studies in rodents may prove useful in bolstering confidence in weak human linkage to an area where there is clear synteny between species [5].
Significant progress in understanding the rheumatic diseases has not so far been achieved as a result of human genetic approaches. This negative statement takes into account the many discoveries of HLA associations but also recognizes that these have yet to be translated into direct therapeutic advances. However, it is outside the HLA system where most energies have been focused over the last 10 years and it is in this sector where there has been a failure to deliver disease genes. None of the current therapeutic advances based on TNF-α and IL-1β biology have been derived from the several papers devoted to linkage and association studies of these gene regions.
Here we will concentrate on genome screens rather than candidate association studies, as the latter are currently difficult to evaluate critically and the laboratory and analytical approaches are evolving rapidly [6]. It is relevant that until these approaches are fine-tuned it will be difficult, but not impossible, to move from linkage via positional candidates to associated disease genes. However, in most cases investigation is mired at the stage of replication of initial findings rather than this latter phase. Of the major rheumatic diseases, RA has been the subject of three published whole-genome screens [7–9]. The genome screen from the British consortium is expected shortly. Several scans have been reported for systemic lupus erythematosus (SLE), with a number of follow-up studies by the same groups [10–17], two in ankylosing spondylitis (AS) [18, 19] and two in osteoarthritis [20–22]. None of the scans and their follow-up surveys have yet delivered disease genes which have been shown to be relevant either to familial complex disease or in the unrelated disease population at large. This last point raises the question of whether the disease which is found in multicase families is influenced to the same extent by genes and by the same gene loci as the sporadic disease found at the population level. There are many instructive examples of familial forms of complex diseases, such as breast cancer [23] and Parkinson's disease [24], in which the familial disease loci do not appear to play a direct role in the general population. However, in these cases disease, while clinically similar to the sporadic form, is often of earlier onset and usually inherited in a Mendelian pattern. One relevant example is lupus, where there is a rich literature on lupus associated with genetic complement deficiencies [25].
It would appear that sporadic lupus is not generally associated with polymorphic complement variants and in this sense, while informing us of relevant pathological mechanisms, the rare Mendelian form stands apart. Similarly, the relevance of linkages and genes isolated by linkage disequilibrium approaches in founder populations needs to be examined carefully in the commoner non-founder groups.
The general relevance of gene differences mapped in families to outbred disease populations will require the reduction of valid linkages to associated genetic variation. If the question is reversed we may ask whether the HLA genomic region which is strongly associated with AS, and less so with RA and SLE, is detected consistently in genome scans of multicase families. In essence, the answer is yes, with the lod score or equivalent roughly proportional to the amount of genetic variance accounted for by the locus at the population level in AS and RA [8, 9, 18, 19, 26]. In SLE the results are much less clear-cut, perhaps reflecting the ethnic diversity of the multiple genome scans together with the fact that association of HLA variants with lupus is often stronger with clinical features than disease itself [27, 28]. This heterogeneity is enhanced when multiple population groups are examined, as has been discussed previously [29].
The difficulties inherent in the replication of genome scan data have been discussed widely [29–33] and approaches such as meta-analysis have been suggested and methodologies formalized and implemented [34, 35]. There is a lack of empirical data to address the causes of non-replication, but the essence is power limitation due to any combination of inadequate initial sample size, genetic and clinical heterogeneity, epistasis, genotyping error, genetic map errors, insufficiently informative genetic markers and genetic marker frequency variation. The difficulties which beset geneticists are therefore biological, technical and sociological. The fact that there is an inability or unwillingness amongst investigators to collaborate to return optimal sample sizes is an issue that national and transnational funding bodies should address with vigour.
To shed light on some of the issues mentioned above, we would like to consider what may be learned from study of the CRH genomic region in RA. While CRH is an excellent candidate gene for RA, as has been reviewed, we took a wider genetic approach which combined aspects of a regional genome scan with a candidate gene approach [36, 37]. Data were then viewed as representative of a portion of a genome scan analysed by non-parametric methods. In the initial survey we examined nine simple tandem repeat (STR) or microsatellite markers over a 20 centimorgan (cM) region centring on CRH at 8q13. One of these markers, CRHRA1, was located on an unsequenced cosmid containing CRH and thus placed at a maximum of 40 kilobases (kb) from the structural locus itself. The initial screen was conducted in a presumably homogeneous collection of 295 UK Caucasoid families with at least two siblings affected with RA. The maximum single-point lod score obtained was 1.8 for marker D8S1723, which was located approximately 9 cM from the CRHRA1 marker and the candidate. Multipoint analysis lowered the lod score slightly, which for Mendelian traits usually raises the question of whether the finding is valid. However, with few verified complex disease genes to compare, it is interesting that lod scores around the APOE gene in Alzheimer's disease behaved similarly [38]. APOE is a well-verified gene for Alzheimer's disease in Caucasoid populations, encouraging the view that similar multipoint results should not discourage investigation of a particular complex disease locus [39, 40]. Ordinarily, the next stage would have been replication of this finding in a large set of RA-affected sibling pairs (ASP). These were not available at the time and were unlikely to be for several years, though the situation is currently being improved by the European Consortium on Rheumatoid Arthritis Families and the North American Rheumatoid Arthritis Consortium [9, 41].
As a consequence, we proceeded directly with the next step, which is one of the most problematical in human genetics: the transition from genetic linkage to association. Fortunately, the CRHRA1 STR was significantly associated with RA, as evidenced by distorted transmission in a transmission/disequilibrium test analysis. A single allele, CRHRA1*10, at a control population frequency of 25%, accounted for the effect. This meant the marker was within a linkage disequilibrium group which included the disease-relevant variation. However, the size of this interval is difficult to define prospectively as the controversial literature on linkage disequilibrium makes clear [42, 43]. The reason why CRH region linkage was not detected in the two Caucasoid RA genome scans [8, 9] is unclear, but it is worth reflecting on factors such as choice and informativeness of markers and success rate of genotyping as well as possible population differences. In our study, markers D8S1833 and D8S1767 showed no excess allele-sharing despite being close to or within the implicated region.
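To make the transmission/disequilibrium test concrete, the following is a minimal sketch of its standard biallelic (McNemar-type) form, in which one allele is contrasted with all others pooled. The function name `tdt` and the transmission counts are hypothetical illustrations and are not data from our study; the analysis actually used for the multiallelic CRHRA1 marker may have differed in detail.

```python
import math

def tdt(transmitted: int, not_transmitted: int) -> tuple[float, float]:
    """Biallelic TDT on counts taken from heterozygous parents only.

    transmitted:     times the test allele was passed to an affected child
    not_transmitted: times the other allele was passed instead
    Returns (chi-square statistic with 1 degree of freedom, p-value).
    """
    total = transmitted + not_transmitted
    if total == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (transmitted - not_transmitted) ** 2 / total
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical counts, chosen only to illustrate the calculation.
chi2, p = tdt(transmitted=60, not_transmitted=40)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

Under the null hypothesis a heterozygous parent transmits either allele with probability one-half, so only the imbalance between the two counts carries information; homozygous parents do not contribute.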
Rapid advances in human genome sequencing mean that it is now relatively easy to move from map position or candidate gene to draft-quality assembled sequence [44] (http://genome.ucsc.edu). However, in the summer of 2000 only 3% of chromosome 8 sequence was available and the nearest sequence appeared to be approximately 20 cM distant. To localize the CRHRA1 marker, to derive new polymorphic markers in the region and to search for adjacent genes, we screened bacterial artificial chromosome clones containing CRHRA1. A contiguous sequence of around 90 kb was obtained and we established the genomic context of the CRH structural gene for the first time. A second polymorphic microsatellite (CRHRA2), located 6 kb nearer CRH, was characterized. Two haplotypes carrying the allele CRHRA1*10, which was overtransmitted in the 295 ASP set, were observed (CRHRA1*10-CRHRA2*14 and CRHRA1*10-CRHRA2*15). Due to the lack of additional ASP families, we were obliged to replicate the genetic association with the region rather than the linkage. Replication was achieved in a second set of 130 Caucasoid single-case RA families for CRHRA1*10, and both sets showed transmission distortion of the CRHRA1*10-CRHRA2*14 haplotype. While the CRHRA1*10-CRHRA2*15 haplotype was associated with RA in the ASP set, this was not the case in the single-case RA families. There is a direct relationship between power in a genetic association study and frequency of susceptibility alleles. CRHRA1*10 and CRHRA1*10-CRHRA2*14 are the second most common allele and haplotype respectively in unaffected parents of RA patients, at 25 and 9%. Had they been less frequent, the power to show transmission distortion and replication in our studies would have been reduced, perhaps resulting in a negative finding. On such arbitrary grounds are disease effects recorded or not.
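One way to see why allele frequency matters is to count the parents who can contribute to a transmission test, since only heterozygotes are informative. The sketch below assumes Hardy-Weinberg proportions, treats the variant as "test allele versus all others" and assumes two genotyped parents per family; these are simplifying assumptions for illustration, not a power calculation from our study, and the function names are hypothetical.

```python
def heterozygote_fraction(freq: float) -> float:
    """Expected fraction of parents carrying exactly one copy of the
    test allele, assuming Hardy-Weinberg proportions (an assumption)."""
    if not 0.0 <= freq <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    return 2.0 * freq * (1.0 - freq)

def informative_parents(freq: float, families: int) -> float:
    """Expected number of informative parents, two parents per family."""
    return 2 * families * heterozygote_fraction(freq)

# Frequencies quoted in the text (25% allele, 9% haplotype) plus a rarer
# hypothetical variant, evaluated for a set of 130 single-case families.
for label, freq in [("25%", 0.25), ("9%", 0.09), ("2% (hypothetical)", 0.02)]:
    n = informative_parents(freq, families=130)
    print(f"frequency {label}: about {n:.0f} informative parents")
```

The expected number of informative parents falls steeply as the variant becomes rarer, which is the sense in which a less frequent allele or haplotype would have reduced the chance of detecting and replicating transmission distortion.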
Much has been written on measures of genetic effect for complex disease and the λ_S and λ_R quantities of Risch are a widely traded currency [45]. In RA, estimates of λ_S vary between 4 and 10, with λ_S(HLA) estimated at around 1.8 or around one-third of the total. Our estimates show λ_S(CRH) to be about 1.14 or just less than 10% of the total genetic component [36]. This is important because arguments continue about the effect size of polygenes in autoimmune, rheumatic and other complex diseases. Clearly, the effect size of HLA is likely to be the exception and CRH may be a typical polygene. Effect sizes of less than that seen for CRH will be difficult to establish with the limited numbers of current ASP cohorts for RA and other rheumatic diseases. CRH may be typical in another sense as only around one-sixth of the families are linked. This establishes the possibility that large numbers of polygenes may contribute to a disease like RA, with extensive locus heterogeneity making the task of analysis more complicated. At least for two genomic regions, HLA and CRH, which have been shown to be linked and associated with RA, the effects hold true for both familial and sporadic forms, as the second group of simplex families were recruited as single, sporadic cases, providing a partial answer to the question posed earlier. Epistasis exists between CRH and HLA and there appears to be some evidence for sex-limitation, though this could be an effect of limited power in the male cohort in what is predominantly a female disease (unpublished observations). On current evidence, CRH is still the best candidate for RA susceptibility in the region. The draft human genome sequence shows the presence of a new gene, the transcription factor MURF2 or muscle-specific ring finger protein 2, which is close to CRH, but it is difficult to fit this gene into any pathogenetic framework for RA. Current efforts are directed at improving the genetic map around CRH by the discovery of SNP genetic markers, which are more abundant than STRs. This should enable refinement of disease haplotypes and, importantly, will provide a set of markers which may be transferred to other laboratories more reliably than the STRs, which were used in the initial work.
We have commented elsewhere on the difficulties in sizing of STR alleles across different hardware and software platforms [46]. Once standardization has been achieved, the relevance of CRH genetic variation may be investigated in other RA cohorts and populations and in other immune diseases.
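The proportions quoted for HLA (about one-third) and CRH (just less than 10%) can be illustrated with a small calculation. The sketch assumes a multiplicative model in which locus-specific sibling recurrence risk ratios multiply to give the total, so that a locus's share is the ratio of logarithms. That model and the function name `locus_share` are assumptions made here for illustration; the text does not state how the quoted proportions were derived.

```python
import math

def locus_share(lambda_locus: float, lambda_total: float) -> float:
    """Share of the total sibling recurrence risk ratio attributable to
    one locus, assuming locus effects combine multiplicatively."""
    if lambda_locus < 1.0 or lambda_total <= 1.0:
        raise ValueError("expected lambda_locus >= 1 and lambda_total > 1")
    return math.log(lambda_locus) / math.log(lambda_total)

# Locus-specific values quoted in the text, evaluated across the quoted
# range of total estimates for RA (4 to 10).
for total in (4.0, 5.0, 10.0):
    hla = locus_share(1.8, total)
    crh = locus_share(1.14, total)
    print(f"total = {total:>4}: HLA {hla:.1%}, CRH {crh:.1%}")
```

Under this assumption the quoted shares correspond to a total towards the lower end of the 4 to 10 range, and both shrink as the assumed total rises, which is one reason such proportions should be read as approximate.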
In genetic studies of rheumatic and other complex diseases, we are unable to modify many of the variables, such as allele frequency, locus heterogeneity and epistasis, which will affect study power. We can collaborate openly to maximize our sample sizes to avoid false-negative findings and share technology to enable rapid verification or otherwise of findings. The road ahead in human genetics has few legible signposts and most of us have limited fuel in our tanks. Good luck as well as prudence will be necessary if we are to avoid too many unhelpful detours.
Notes
Correspondence to: J. Lanchbury.
References