1 Juvenile Diabetes Research Foundation/Wellcome Trust Diabetes and Inflammation Laboratory, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, U.K
2 Oxagen, Abingdon, Oxon, U.K
3 Molecular Genetics Program, Benaroya Research Center and Department of Immunology, University of Washington School of Medicine, Seattle, Washington
4 Imperial College Genetics and Genomics Research Institute, Imperial College Faculty of Medicine, Hammersmith Hospital, London, U.K
5 Department of Medical Genetics, Queens University Belfast, Belfast City Hospital, Belfast, U.K
6 Clinic of Diabetes, Institute of Diabetes, Nutrition and Metabolic Diseases "N. Paulescu" Bucharest, Romania
7 Institute of Medical Genetics, Ulleval University Hospital, University of Oslo, Oslo, Norway
8 Laboratory of Molecular Epidemiology, Division of Epidemiology, Norwegian Institute of Public Health, Oslo, Norway
9 Diabetes and Metabolism, Division of Medicine, University of Bristol, U.K
10 Diabetes and Genetic Epidemiology Unit, National Public Health Institute, University of Helsinki, Helsinki, Finland
11 Department of Public Health, University of Helsinki, Helsinki, Finland
ABSTRACT |
---|
The insulin/IGF-2 variable number tandem repeat (INS-IGF2 VNTR) is situated 600 bp 5' of the INS transcription start site and comprises 14–15 bp tandemly repeated sequences with the consensus ACAGGGGTSYGGGG (1). In Caucasian populations, this VNTR has two main size classes: the shorter class I alleles have between 26 and 63 repeat units, and the class III alleles have between 141 and 209 repeat units. Intermediately sized class II alleles are rare in white European populations (1,2). Early studies (1,3,4) of INS association with type 1 diabetes reported that the class I VNTR homozygous genotype was more frequent in case subjects than in control subjects. Subsequent studies of flanking polymorphisms reported that this association was restricted to markers within a 19-kb interval (5) and, later, to a 4.1-kb interval spanning INS (6). Within this interval there were 10 candidate causal common variants: the VNTR and nine single nucleotide polymorphisms (SNPs) (6). Of these 10 candidates, the strongest disease associations were with the VNTR and the −23HphI and +1140A/C SNPs (relative risk [RR] for the homozygous genotype = 4.5 for all three polymorphisms). A number of additional studies (7–10) excluded the SNPs in the 4.1-kb region and led to the VNTR being proposed as the etiological variant (11). This proposal was principally based on the differential association of two class III VNTR-bearing haplotypes, the protective haplotype (PH) and very protective haplotype (VPH) (9), a phenomenon most easily explained by sequence variation within the VNTR rather than by cis effects of SNP alleles on these haplotypes. In addition, expression studies (12,13) suggested that the VNTR could directly alter the transcription of INS and IGF2. However, the fine mapping of IDDM2 susceptibility to the VNTR relied heavily on the premise that the association was restricted to the 4.1-kb interval. Thus, when Doria et al. (14) reported disease association with a SNP in the 5' region of the tyrosine hydroxylase gene (TH) and reinterpreted the findings of Lucassen et al. (6), it became possible that the IDDM2 causal variant mapped outside the 4.1-kb region. Here we have performed a more detailed genetic analysis of the IDDM2 region to address the uncertainties surrounding the location of the causal variant.
We identified novel polymorphisms in the region by extensive sequencing and resequencing (online appendix Tables 1 and 2 [available from http://diabetes.diabetesjournals.org]) and, together with additional polymorphisms from published literature and public databases, compiled a much more extensive and dense variant map than had been previously used (in total, 177 polymorphisms) (online appendix Table 3). By genotyping polymorphisms in up to 434 affected sibpair families from the U.K. and combining data with those analyzed by Bennett et al. (9), we assembled genotypes for a total of 75 markers across the region. Analysis of parental chromosomes revealed five main regions of strong LD in the TH/INS/IGF2 region and at a neighboring gene, H19 (Fig. 1). In the central region of 28.5 kb, 21-marker diversity is limited to 16 haplotypes with a frequency >1%, accounting for 84% of all chromosomes. Within this central region, the 4.1-kb region (6) is identifiable, with diversity limited to four haplotypes with frequency >1%, accounting for 98% of all chromosomes (Fig. 2).
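To make the notion of "strong LD" concrete, the following is a minimal Python sketch of the pairwise D' statistic for two biallelic markers, computed from phased chromosomes. It is an illustration only and not the pwld routine named in the methods; the function name `d_prime` and the example alleles are hypothetical, and the sketch returns the absolute value of D'.

```python
def d_prime(haplotypes):
    """Return |D'| for two biallelic markers.

    `haplotypes` is a list of (allele_at_marker_1, allele_at_marker_2)
    tuples, one tuple per phased chromosome (hypothetical input format).
    """
    n = len(haplotypes)
    # Use the alleles on the first chromosome as the reference alleles.
    a_ref, b_ref = haplotypes[0]
    p_a = sum(1 for h in haplotypes if h[0] == a_ref) / n
    p_b = sum(1 for h in haplotypes if h[1] == b_ref) / n
    p_ab = sum(1 for h in haplotypes if h == (a_ref, b_ref)) / n

    # D is the excess of the reference haplotype over its expected
    # frequency under independence of the two markers.
    d = p_ab - p_a * p_b

    # D is scaled by its largest possible magnitude given the allele
    # frequencies; the bound depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:
        return 0.0  # at least one marker is monomorphic in this sample
    return abs(d) / d_max


# Illustrative data only: three of the four possible haplotypes observed.
complete_ld = [("A", "C")] * 50 + [("T", "G")] * 30 + [("T", "C")] * 20
# Illustrative data only: haplotype frequencies equal to the product of
# the allele frequencies, i.e. no LD.
no_ld = ([("A", "C")] * 35 + [("A", "G")] * 15
         + [("T", "C")] * 35 + [("T", "G")] * 15)

print(d_prime(complete_ld), d_prime(no_ld))
```

When only three of the four possible two-marker haplotypes are present, |D'| reaches its maximum, which is the pattern expected inside a block of limited haplotype diversity.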
Previously, both the −23HphI and +1140A/C SNPs have been excluded as causal variants (9), based on the observation that these SNPs had the same alleles on both the PH and VPH, and the association of these haplotypes with type 1 diabetes had been shown to differ (P = 0.048). However, because this P value was marginal and no correction for the relatedness of affected individuals was made, we sought to replicate this finding in a larger dataset. Further families from Finland, Romania, Norway, the U.S., and the Barts Oxford study, and additional simplex families from the U.K., were genotyped at −23HphI and +1428FokI, since the haplotypes of these two SNPs distinguish between the susceptible haplotype (A allele at −23HphI), PH (T allele at −23HphI and A allele at +1428FokI), and VPH (T allele at −23HphI and G allele at +1428FokI). In total, 3,722 fully genotyped affected offspring (distributed among 3,056 pedigrees) also had both parents fully genotyped at both loci.
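The two-SNP rule for distinguishing the three haplotype groups can be written out as a short Python sketch. The function name `classify_haplotype` and the string labels are hypothetical and chosen only for illustration; the allele assignments are those stated in the preceding paragraph, and the input is assumed to be already phased.

```python
def classify_haplotype(hph1_allele, fok1_allele):
    """Classify one phased chromosome from its alleles at the HphI SNP
    and the +1428FokI SNP (hypothetical helper, not from the paper)."""
    if hph1_allele == "A":
        # The A allele at the HphI SNP marks the susceptible haplotype,
        # whatever the allele at +1428FokI.
        return "susceptible"
    if hph1_allele == "T" and fok1_allele == "A":
        return "PH"   # protective haplotype
    if hph1_allele == "T" and fok1_allele == "G":
        return "VPH"  # very protective haplotype
    raise ValueError(
        f"unexpected alleles: HphI={hph1_allele!r}, FokI={fok1_allele!r}"
    )


for alleles in [("A", "A"), ("T", "A"), ("T", "G")]:
    print(alleles, classify_haplotype(*alleles))
```

Because the PH and VPH differ only at +1428FokI, a single additional SNP is enough to separate them once the HphI allele is known.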
To investigate the PH and VPH association, case and pseudo-control sets were generated in which the phase of the transmitted −23HphI and +1428FokI alleles was also determined. Of the 3,722 cases in 3,056 pedigrees, phase was unambiguously determinable for 3,585 cases (in 2,960 pedigrees). The resulting haplotypes were then assigned as either PH or VPH, as previously defined (9), or as class I VNTR-bearing (principally the susceptible haplotype). Parental haplotype frequencies and the numbers of families for which phase was determinable in each population are shown in Table 1. Haplotype risks for the PH and VPH, relative to the class I-bearing haplotypes, were estimated by conditional logistic regression and found to be nearly identical. For the PH, the haplotype risk was 0.46 (P = 2.0 × 10⁻⁴⁰, 95% CI 0.41–0.52), whereas for the VPH, the haplotype risk was 0.43 (P = 3.0 × 10⁻²², 95% CI 0.37–0.51). There was no evidence for population heterogeneity for these haplotype risks using either a seven-population categorization (populations as in Table 2, 12 df, P = 0.36) or a five-population categorization (U.K. Warren, U.K. simplex, and Barts Oxford study grouped, 8 df, P = 0.44). The haplotype risks in each population individually are shown in Table 2. Further tests were constructed to estimate the risks of the six possible haplotype combinations, but no significant evidence for a difference in association between PH homozygotes and VPH homozygotes or between class I/PH and class I/VPH heterozygotes was observed (data not shown). Taken together, these data provide no significant evidence for a difference in association between the PH and VPH.
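As a rough back-of-envelope check on how similar the two haplotype risks are, the Python sketch below recovers approximate standard errors from the reported 95% CIs and forms a Wald-type comparison on the log scale. This is not the analysis performed in the paper: it assumes the two estimates are independent (they share a reference group and come from the same families, so they are not), it works from rounded published values, and the function names are hypothetical.

```python
import math


def log_se_from_ci(lower, upper, z_crit=1.96):
    """Approximate the standard error of a log risk from its 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * z_crit)


def wald_difference(risk_1, ci_1, risk_2, ci_2):
    """Return (z, two-sided p) for the difference of two log risks,
    treating the two estimates as independent (a simplification)."""
    diff = math.log(risk_1) - math.log(risk_2)
    se = math.sqrt(log_se_from_ci(*ci_1) ** 2 + log_se_from_ci(*ci_2) ** 2)
    z = diff / se
    # Two-sided p value under a standard normal reference distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Reported haplotype risks and 95% CIs for the PH and the VPH.
z, p = wald_difference(0.46, (0.41, 0.52), 0.43, (0.37, 0.51))
print(f"z = {z:.2f}, p = {p:.2f}")
```

Under these simplifying assumptions the z statistic falls well below 1.96, which agrees with the conclusion that the PH and VPH risks are not distinguishable in these data.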
We can conclude that type 1 diabetes susceptibility in this region does map to one (or perhaps a combination) of three common polymorphisms in a 2-kb region at INS, but cannot be mapped precisely to the VNTR. In the absence of a VPH effect, it is unlikely that the VNTR, −23HphI, and +1140A/C can be resolved by association studies in European-derived populations, owing to the strength of LD between these markers. A study (2) of INS haplotypes indicates that the three polymorphisms are also in very strong LD in subjects from diverse populations. However, if class II VNTR risk were to differ from class III VNTR risk, resolution of these three polymorphisms might be achievable in African populations if sufficient sample sizes were available, perhaps using SNPs on class II-bearing haplotypes as surrogate markers (2). Despite the current lack of genetic evidence that would resolve the three remaining candidates for IDDM2, the VNTR remains the best candidate. Functionally, it contains multiple binding sites for transcription factors such as Pur-1 (13,17), and the type 1 diabetes susceptibility at INS has been proposed to arise from different levels of thymic expression (18–20), whereas there is no obvious functional role for either of the candidate SNPs. We also note that our previous conclusion that the IDDM2 locus was a dominant protective trait (11), which was based on the existence of a difference in risk between the PH and VPH, is no longer valid.
More generally, in the context of fine mapping susceptibility loci in common multifactorial diseases, our results confirm, as we found for the CTLA-4 gene in Graves' disease (21), that, despite strong LD, small discrete regions can be pinpointed, provided that sufficient sample sizes are used. This level of mapping resolution greatly reduces the number of polymorphisms that have to be analyzed for functional effects.
RESEARCH DESIGN AND METHODS |
---|
Sequence data.
Sequence data for PCR primer design were obtained from GenBank (accession nos. L15440, M32053, AC004556, AF087017, M23597, AC006408, and AH010044) and from shotgun sequencing performed by Incyte Genomics of RPCI11 BAC clone number "bA"94F12 (supplied by the Wellcome Trust Sanger Institute, Cambridge, U.K.). Contigs generated from the shotgun sequencing were joined by the generation and sequencing of PCR products spanning the gaps between contigs, and unresolved contig positions and orientations were then resolved by comparison with sequence data obtained using the Celera Discovery System. Data for the region covered by the novel sequence have since been submitted independently by researchers from the Whitehead Institute/MIT Center for Genome Research as accession no. AC132217.
SNP identification.
SNPs were identified either by denaturing high-performance liquid chromatography as previously described (25) or by sequencing of 32 individuals using BigDye Terminator chemistry and ABI 3700 instrumentation (Applied Biosystems, Foster City, CA). The primers used for SNP identification are shown in online appendix Tables 1 and 2. Sequence data were processed in the Staden package software (www.mrc-lmb.cam.ac.uk/pubseq). One hundred sixty-eight polymorphisms were identified, 24 of which had previously been reported in the literature. Combined with seven other SNPs identified from the literature, the VNTR minisatellite and TH microsatellite, a total of 177 polymorphisms were mapped to the TH-INS-IGF2-H19 region (online appendix Table 3).
Genotyping.
Genotype data were obtained from Invader assays (Third Wave Technologies, Madison, WI), TaqMan chemistry (Applied Biosystems), Pyrosequencing chemistry (Pyrosequencing, Uppsala, Sweden), or PCR restriction fragment-length polymorphism digests. The estimated error rate for these technologies was 1% (26,27). Additional TH microsatellite genotypes were generated from fluorescently labeled PCR products and sized using ABI 3700 instrumentation and software. Data were combined with those that were previously published (9,16).
Data analysis.
Data were examined for misinheritances using PedCheck (Jeff O'Connell, 1997, 1999, University of Pittsburgh, Pittsburgh, PA) and for recombinations using GAS (Genetic Analysis System [http://users.ox.ac.uk/ayoung/gas.html]), and potential genotyping errors were removed. Pairwise intermarker LD (D') was estimated with Stata 7 (Stata, San Mateo, CA) using pwld, which is part of the Genassoc package available at www.mrc-bsu.cam.ac.uk/pub/methodology/genetics. SNPs with a parental allele frequency <5% and multiallelic markers with rare alleles (TH microsatellite and INS-VNTR) were excluded from D' estimates to prevent inaccurate estimates due to sparse data. LD blocks were assigned by visual inspection of the matrix of pairwise D' estimates. Pseudo-controls were generated and conditional logistic regression analyses performed in Stata 7 using routines from the Genassoc package, according to the method described by Cordell and Clayton (15). For each affected subject, the corresponding pseudo-controls are assigned all of the other possible genotypes of offspring that could have been generated from the parents. In the subsequent conditional logistic regression analysis, case subjects and pseudo-controls are matched according to the parent-case set from which they were generated. In analyses of SNP haplotypes representing the PH and VPH, phase was determined using the phase option of the pseudocc routine of the Genassoc package. In all analyses, association was evaluated by fitting a conditional logistic regression model (as used in regular matched case-control studies), in which the log RR of disease is given by βᵢxᵢ + … + βⱼxⱼ, where xᵢ is an indicator variable for the genotypes (or combinations of phased haplotypes) at locus i of the j loci included in the test and βᵢ … βⱼ are the parameters to be estimated by maximum likelihood. The fitted model is compared with the appropriate null hypothesis model in which all βᵢ = 0, correcting for nonindependence of sibs by use of robust variance estimation. These analyses were carried out using the rclogit command (from the Genassoc package) within Stata 7. In the case of the TH microsatellite, the P values plotted for the most associated allele (Z-16/106) were uncorrected for the number of alleles tested. P values for association and the positions of exons and SNPs were plotted for the figures using the Generic Genome Browser available at www.gmod.org.
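The case/pseudo-control construction and the matched-set likelihood can be illustrated with a minimal Python sketch for a single biallelic locus. It is not the Genassoc pseudocc or rclogit code: the function names, the phased-input format, and the single-parameter additive coding are assumptions made for illustration, and the robust variance correction for sibs is not included.

```python
import math
from itertools import product


def case_and_pseudo_controls(father, mother, transmitted):
    """Build one matched set from a parent-case trio.

    father, mother: 2-tuples of alleles, one per parental chromosome.
    transmitted: tuple (father_index, mother_index) giving which
    chromosome each parent passed to the affected offspring.
    Returns the case genotype and the three genotypes the parents
    could have produced but did not (the pseudo-controls).
    """
    case = (father[transmitted[0]], mother[transmitted[1]])
    pseudo = [
        (father[i], mother[j])
        for i, j in product((0, 1), repeat=2)
        if (i, j) != transmitted
    ]
    return case, pseudo


def conditional_log_likelihood(beta, matched_sets, score):
    """Conditional logistic log-likelihood over matched sets.

    score(genotype) returns the covariate x, so the log relative
    risk of a genotype is beta * x (one-parameter illustration).
    """
    total = 0.0
    for case, pseudo in matched_sets:
        denominator = sum(
            math.exp(beta * score(g)) for g in [case] + pseudo
        )
        total += beta * score(case) - math.log(denominator)
    return total


def count_t(genotype):
    """Hypothetical additive coding: number of T alleles carried."""
    return sum(1 for allele in genotype if allele == "T")


# Illustrative trio: both parents heterozygous, affected child inherits
# the first chromosome from each parent.
case, pseudo = case_and_pseudo_controls(("A", "T"), ("A", "T"), (0, 0))
print(case, pseudo)
print(conditional_log_likelihood(0.0, [(case, pseudo)], count_t))
```

With the parameter set to zero, each set of one case and three pseudo-controls contributes log(1/4) to the log-likelihood, which is the null model against which the fitted model is compared.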
ACKNOWLEDGMENTS |
---|
We thank the members of the DNA resource team and Neil Walker of the JDRF/WT Diabetes and Inflammation Laboratory for sample and data services. We thank Diabetes U.K., the Human Biological Data Interchange, and the Norwegian Study Group for Childhood Diabetes for collection of the U.K., U.S., and Norwegian families, respectively.
FOOTNOTES |
---|
Additional information for this article can be found in an online appendix at http://diabetes.diabetesjournals.org. Further information on JDRF/WT Diabetes and Inflammation Laboratory research, including gene annotations and polymorphisms, is available at http://dil-gbrowse.cimr.cam.ac.uk/cgi-bin/DIL_GenomeView.cgi.
Address correspondence and reprint requests to Bryan J. Barratt, JDRF/WT Diabetes and Inflammation Laboratory, Cambridge Institute for Medical Research, University of Cambridge, Wellcome Trust/MRC Building, Hills Road, Cambridge, CB2 2XY, U.K. E-mail: bryan.barratt{at}cimr.cam.ac.uk
Received for publication January 28, 2004 and accepted in revised form March 29, 2004
LD, linkage disequilibrium; PH, protective haplotype; SNP, single nucleotide polymorphism; VNTR, variable number tandem repeat; VPH, very protective haplotype
REFERENCES |
---|