1 Department of Genetics, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania
2 Molecular Genetics Program, Virginia Mason Research Center, Seattle, Washington
3 Department of Immunology, University of Washington School of Medicine, Seattle, Washington
4 The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, U.K.
INTRODUCTION
We have previously reported the results of a genome-wide multipoint linkage analysis of 438 microsatellite markers in type 1 diabetic families. This genome scan was performed in two stages. In the initial stage, 212 affected sib pairs (ASPs) were studied for linkage with markers spaced at 10-cM intervals across the genome. In the second stage, markers in regions that displayed nominal evidence of linkage to type 1 diabetes in the first stage screen, or that had been identified as possibly harboring type 1 diabetes genes in other studies, were genotyped in an independent panel of 467 ASPs (5). The highest multipoint LOD score observed, excluding the HLA region, was 3.31 and occurred near the marker D1S1617 on chromosome 1q. Five additional markers flanking D1S1617 were genotyped in the full panel of 679 ASPs and all yielded LODs >2.0. Other than this and the HLA region, there were no other markers in this genome scan that yielded LODs >1.8.
Even after genotyping the additional 5 markers and 467 ASPs, the region of localization on chromosome 1 was still quite large. In a subsequent study (6), we merged the raw genotype data from our genome scan with those derived from a collection of type 1 diabetic multiplex families of U.K. origin (4). This merged dataset contained more families than our initial genome scan (831 ASPs in 767 families, 667 with full genome-scan data) but did not greatly increase the density of markers genotyped in the chromosome 1q42 region. The maximum LOD score in the region was reduced to 2.2, and there were two peaks with this LOD score located within 5 cM.
In the current study, we sought to refine the localization of the putative type 1 diabetes susceptibility locus on 1q42 by genotyping additional microsatellite and single-nucleotide polymorphism (SNP) markers in the same collection of ASPs previously used (5,6). To establish the correct marker order, we constructed a genomic map of an approximately 10.5-Mb interval from this region. We resequenced 13 potential candidate genes and 42 randomly chosen fragments of DNA in the region, identified 60 SNPs, and tested 30 of these markers as well as the previously genotyped microsatellite markers for allelic association with type 1 diabetes. From these studies, we identified a haplotype, defined by three consecutive markers spanning 600 kb, that was preferentially transmitted to affected offspring in type 1 diabetic families. These results provide support for a type 1 diabetes susceptibility locus on chromosome 1q42 and identify a candidate region for positional cloning.
RESEARCH DESIGN AND METHODS
Sequence-tagged site markers and isolation of bacterial artificial chromosome and P1 artificial chromosome clones.
A total of 36 sequence-tagged site (STS) markers mapping to the 7-cM region surrounding D1S1617 on chromosome 1q42 were identified using the Stanford RH map (http://www-shgc.stanford.edu/Mapping/) and Whitehead Institute/MIT Center for Genome Research YAC contig maps (http://www-genome.wi.mit.edu). Sites at The Wellcome Trust Sanger Institute (http://www.sanger.ac.uk/HGP/Chr1/) and NCBI Genome View (http://www.ncbi.nlm.nih.gov/cgi-bin/Entrez/map_search) were used for the selection of additional markers mapping to this region.
Bacterial artificial chromosome (BAC) clones (n = 104) were isolated by screening high-density filters of the Roswell Park Cancer Institute (RPCI)-11 segment 2 Human Male BAC library (http://www.chori.org/bacpac/) with probes prepared from the initial 36 markers. Filter hybridizations were carried out as described by Cheung et al. (12). P1 artificial chromosome (PAC) clones (n = 309) that map to the region of chromosome 1 bounded by SHGC-30224 and D1S437 were isolated from the RPCI-4 and -5 Human Male PAC libraries and provided to us by The Wellcome Trust Sanger Institute.
Additional BAC clones were isolated from the California Institute of Technology (CITB) library by PCR screening of clone pools as per the manufacturer's instructions (Research Genetics). Initial screening was performed using the D1S1617 marker. All identified clones were end-sequenced. New primer sets were generated from end sequences and used to screen all previously isolated clones. Primer sets that failed to amplify from previously isolated clones were used in subsequent rounds of library screening. Additional contigs were initiated by library screening with the following microsatellites: D1S1644, D1S439, D1S1656, and D1S2712.
Sequencing of BAC and PAC clones.
DNA for end-sequencing was prepared from 200 ml of overnight cultures using the QIAfilter Plasmid Midi DNA purification kit (Qiagen). For some clones for which end sequences were difficult to determine, additional purification using CsCl sedimentation was carried out. Automated dideoxy-terminator cycle sequencing was carried out with SP6 and T7 primers on 1 µg BAC or PAC DNA using the ABI Big Dye Terminator sequencing kit according to the manufacturer's protocols (Applied Biosystems). Reaction products were purified on G50 spin columns and analyzed on an ABI 377 automated sequencer. Sequences were analyzed with Phred (13,14) and checked for repeat elements using RepeatMasker (http://ftp.genome.washington.edu/cgi-bin/RepeatMasker).
Internal sequence from BAC and PAC clones for which the corresponding genomic sequence was not available in GenBank at the time of isolation was obtained by sample sequencing. Briefly, BAC or PAC clones were separately digested to completion with several different restriction enzymes having 6-nucleotide recognition sequences (EcoRI, BamHI, and HindIII). The digested products were size fractionated on agarose gels and products >2 kb in size were cloned into plasmids. Colonies (n = 30–40) were picked at random from each library, and single-pass sequences were determined using flanking SP6 and T7 primers. Sequences were used for STS and SNP development as described below. All sequences were also repeat-masked and used in BLASTN and BLASTX searches (15) of GenBank to identify possible coding regions.
STS content mapping.
BAC and PAC clones were grown overnight on LB plates containing 170 µg/ml chloramphenicol or 25 µg/ml kanamycin. Single colonies from each clone were then grown overnight in LB liquid cultures supplemented with chloramphenicol or kanamycin. Membranes for dot blot hybridizations were prepared by spotting 2 µl of overnight liquid culture onto Hybond N+ nylon membranes (Amersham Pharmacia Biotech). STS probes for hybridization were generated by PCR amplification. PCR primers based on published STS sequence information were obtained from Research Genetics and Operon Technologies. The PCR mixture contained 100 ng genomic DNA, 0.4 mmol/l primers, 200 µmol/l dNTPs, 2.5 units Taq polymerase (Promega), 2.5 mmol/l MgCl2, and 1x reaction buffer A (Promega) in a final volume of 50 µl. Amplifications were carried out as follows: denaturation at 95°C for 5 min, 35 cycles of 94°C for 45 s, 55°C for 45 s, 72°C for 45 s, followed by 72°C for 10 min. The PCR-generated probes were labeled with [α-32P]dCTP and hybridized to the dot blot membranes as described (12). The STS content of the clones was also verified by PCR using 5 µl of the overnight LB culture (diluted 1:10 in H2O) using the PCR conditions described above. PCR products were detected on 2% agarose gels.
SNP identification.
For SNP discovery, PCR primers were selected from available sequence and were predicted to amplify fragments of 600–800 nucleotides in length. These primer sets were used to amplify fragments from eight individuals with type 1 diabetes. Five of the individuals were members of ASPs that shared both parental haplotypes identical by descent at microsatellite markers spanning the region of interest. The remaining three individuals were selected from ASPs for which parental haplotypes were not shared in the same region. All PCR products were sequenced in both directions, and the sequences were compared between individuals to identify SNPs. Only those SNPs for which all three possible genotypes were observed among the eight screening samples were tested for allelic association with type 1 diabetes by genotyping in additional samples.
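The selection rule in the last sentence can be illustrated with a minimal sketch. It assumes each screening genotype is recorded as a two-character string of allele codes; that representation, the function name, and the example calls are hypothetical and are not taken from the study.

```python
def all_three_genotypes_observed(genotypes):
    """Return True if both homozygotes and the heterozygote of a
    biallelic SNP occur among the screening samples.

    `genotypes` is a list of two-character strings such as "AG"
    (an assumed representation, not the study's data format).
    """
    # Normalize allele order so that "GA" and "AG" count as the same genotype.
    observed = {"".join(sorted(g)) for g in genotypes}
    alleles = sorted({allele for g in observed for allele in g})
    if len(alleles) != 2:
        return False  # monomorphic or not biallelic in these samples
    a, b = alleles
    return {a + a, a + b, b + b} <= observed


# Hypothetical genotypes for eight screening samples.
print(all_three_genotypes_observed(["AA", "AG", "GA", "GG", "AA", "AG", "AG", "GG"]))
print(all_three_genotypes_observed(["AA", "AG", "AG", "AA", "AA", "AG", "AA", "AA"]))
```

The first call passes the filter and the second does not, because no sample in it is homozygous for the G allele. Requiring all three genotypes among eight samples favors SNPs with a reasonably common minor allele.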
Gene resequencing.
Candidate genes were resequenced to identify polymorphisms following the same strategy as outlined above for SNP discovery. PCR primers were designed to amplify all exons, 20–50 nucleotides of flanking intronic sequence for each exon, and 500–1,000 nucleotides upstream of the first known transcribed nucleotide (i.e., the putative promoter region). PCR products were generated from the same eight individuals used for SNP discovery. Nucleotide sequences were determined and compared to identify polymorphisms.
Testing for association.
The markers used for linkage analysis (Table 1) were also tested for association by the transmission/disequilibrium test (TDT) (16). The test was performed using Genehunter version 2.01 (17) and the method of Martin et al. (18). Pairwise LD statistics for selected markers were also obtained from the output of Genehunter. Genotyping of individual SNPs was performed by single-strand conformation polymorphism, PCR restriction fragment length polymorphism, or primer extension with high-performance liquid chromatography detection, depending on which technique yielded optimal results for a given marker. The frequencies of SNP alleles from type 1 diabetic subjects and control subjects were compared using Fisher's exact test (Table 2). Any SNP yielding nominally significant evidence of allelic association (P = 0.05) in the comparison of type 1 diabetic subjects and control subjects was genotyped in nuclear families and retested for association and linkage by TDT.
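As an illustration of the case-control comparison, the following is a minimal sketch of a two-sided Fisher's exact test on a 2x2 table of allele counts. The function name, the two-sided definition used (summing all tables no more probable than the observed one), and the example counts are assumptions made for illustration; the paper does not report which software or sidedness was used.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]].

    Rows could be cases and controls; columns could be counts of
    allele 1 and allele 2 (an assumed layout).
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x copies of allele 1 in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))


# Hypothetical allele counts: 30 vs. 18 copies of allele 1 among
# 96 case and 96 control chromosomes.
print(fisher_exact_two_sided(30, 66, 18, 78))
```

An exact test is a natural choice here because some SNP alleles are rare enough that the expected cell counts would be too small for a chi-square approximation.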
RESULTS
A detailed view of the map covering 10.5 Mb on 1q42 corresponding to the region of maximum LOD scores is available at http://genomics.med.upenn.edu/clonedb/index.htm. The portion of this map containing the region between D1S1617 and D1S251 appears in Fig. 1 (also see the section below entitled "Multilocus TDT Analyses"). The figure indicates the relative positions of the 17 microsatellite markers and SNPs and 21 known or hypothetical genes mapping to this region.
Additional SNP markers (labeled as anonymous in Table 2) identified by resequencing but not located within any known or predicted transcription unit were genotyped in type 1 diabetes cases and control subjects. Two of these markers displayed nominal evidence of association, which was not significant after correction for the number of markers tested.
Multilocus TDT analyses.
None of the candidate genes in the immediate region around the peak multipoint LOD score yielded significant evidence for association with type 1 diabetes. We therefore expanded our search by carrying out TDT analyses on all microsatellite and SNP markers used for our linkage analyses in the 1q42 region. We first examined three microsatellite markers located in genes: ADPRT, ITPKB (SHGC30224), and SERPINA8 (more commonly known as AGT). No significant evidence of allelic association with type 1 diabetes was seen for any of these markers.
We next performed the TDT on all markers in the 20-cM interval surrounding the peak in our multipoint linkage analysis and found modest evidence of linkage disequilibrium at five markers whose locations span 5 cM (Table 1). For two of these markers (D1S439 and AFM267xa5), the results reflect a small number (10) of transmissions of very rare alleles. The remaining three markers are adjacent to each other in our physical map (Fig. 1), and for each marker the allele that displays preferential transmission is either the most common at that locus (D1S225, frequency = 0.27 and D1S2383, frequency = 0.6) or the second most common (D1S251, frequency = 0.22). D1S225 and D1S251 are located 470 kb apart, with D1S2383 centrally located 235 kb from each. Of these three markers, D1S225 is the nearest to the location of the peak LOD score on 1q42 and is 735 kb distal to D1S1617. At D1S225, allele 4 (of 15 alleles detected) was preferentially transmitted to affected offspring (54.4% of 768 transmissions, χ² = 6.02, P = 0.014). At D1S2383, allele 2 displayed excess transmission to affected offspring (54.4% of 722 transmissions, χ² = 5.67, P = 0.017). Finally, at D1S251, allele 1 of 14 alleles was preferentially transmitted to affected offspring (54.9% of 745 transmissions, χ² = 7.15, P = 0.007).
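The single-allele TDT statistics above can be checked with a short sketch of the standard McNemar-type statistic, χ² = (T − U)² / (T + U) with 1 degree of freedom, where T and U are the numbers of times heterozygous parents did and did not transmit the allele. The integer counts below are not given in the text; they are back-calculated here from the reported percentages and totals, and should be read as an assumption.

```python
from math import erfc, sqrt


def tdt(transmitted, untransmitted):
    """Return (chi-square, P) for a single-allele TDT with 1 df."""
    chi2 = (transmitted - untransmitted) ** 2 / (transmitted + untransmitted)
    # Survival function of a 1-df chi-square: P(X > x) = erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# (transmitted, untransmitted) counts inferred from the reported
# percentages: 54.4% of 768, 54.4% of 722, and 54.9% of 745.
inferred_counts = {
    "D1S225 allele 4": (418, 350),
    "D1S2383 allele 2": (393, 329),
    "D1S251 allele 1": (409, 336),
}

for marker, (t, u) in inferred_counts.items():
    chi2, p = tdt(t, u)
    print(f"{marker}: chi2 = {chi2:.2f}, P = {p:.3f}")
```

Under these inferred counts the statistic reproduces the reported χ² values of 6.02, 5.67, and 7.15, which suggests the published figures come from this form of the test.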
For the original TDT to provide a valid test for association (as opposed to linkage), families with more than one affected offspring must not be included. To solve this problem, we used an alternative formulation (18) that treats each sib pair as a unit. In the 445 sibships with exactly two affected sibs, the results were: allele 4 at D1S225: χ² = 3.61, allele 2 at D1S2383: χ² = 6.23, and allele 1 at D1S251: χ² = 5.31. Despite the reduction in sample size with these analyses, the results, with the possible exception of D1S225, do not differ greatly from the original results and are consistent with the earlier impression that the greatest preferential transmission and the strongest evidence of association occur at D1S251.
We also considered the possibility that preferential transmission occurs in the region studied but is unrelated to type 1 diabetes. We genotyped members of 40 large families from CEPH (Centre d'Étude du Polymorphisme Humain), in which there are no known familial diseases. For the alleles of interest, the values of χ² were 0.60 at D1S225, 2.59 at D1S2383, and 2.67 at D1S251; the smallest P value was 0.10. Thus, we see no significant evidence of preferential transmission in the absence of type 1 diabetes.
An examination of founder haplotypes in our collection of pedigrees revealed significant evidence of linkage disequilibrium overall both between alleles at D1S225 and D1S2383 and between alleles at D1S2383 and D1S251 (P < 10⁻⁷). To explore the relationship between type 1 diabetes and these associated markers as a block, we assessed the transmission of the three-marker haplotype containing the 4, 2, and 1 alleles at D1S225, D1S2383, and D1S251, respectively, by the TDT. This haplotype was preferentially transmitted to affected offspring (65.6% of 102 transmissions, χ² = 10.04, P = 0.0015). By the alternative formulation of Martin et al. (18), with much smaller sample size, preferential transmission of this haplotype to affected sib pairs as a unit is also significant (χ² = 7.54, P = 0.006).
DISCUSSION
We have carried out two previous linkage analyses in type 1 diabetic families. In the first (5), with 679 ASPs, the only region besides HLA (IDDM1) with LOD scores >1.8 was a novel region on chromosome 1q42 (LOD = 3.31). A second study (6) added more families to the analysis but did not increase the marker density in the 1q42 region. In this second analysis, with data from 831 ASPs, the region of localization on 1q42 broadened and there were two peaks with LOD scores of 2.2 within a 5-cM region. Because these results were obtained with the 831 ASPs that constitute essentially all of the multiplex type 1 diabetic families available in public repositories, it is unlikely that we can soon obtain a collection of families with sufficient power to confirm our initial finding. Therefore, in the current study, we focused on two other ways to extend the findings on chromosome 1. First, we increased the information content for linkage by genotyping a dense map of markers spanning the region. Second, we sought evidence of allelic association with type 1 diabetes at these and other markers.
In the current study, we constructed a physical map of the 1q42 region and used it to establish the map order for 35 new markers (31 microsatellites and 4 SNPs). With the addition of genotypes for these markers, the information content statistic for this region, as calculated by Genehunter, ranges from 0.83 to 0.91 for the full collection of 831 ASPs. Multipoint linkage analyses in this dataset revealed a single peak in the region with a maximum LOD score of 2.46 (P = 0.0004). A 1-LOD support interval for the localization spans 4.1 cM. Whereas the maximum LOD score in the region has declined slightly since our first report (5), it remains suggestive of linkage by recommended criteria (P < 7.4 × 10⁻⁴, LOD ≥ 2.2) (19) and identifies a limited region of elevated LOD scores amenable to linkage disequilibrium mapping.
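The correspondence between the LOD scores and the pointwise P values quoted above can be sketched with the usual asymptotic conversion: 2 ln(10) × LOD is treated as a 1-df chi-square statistic and the tail probability is halved because the linkage test is one-sided. The paper does not state how its P values were derived, so this conversion is an assumption offered only as a consistency check.

```python
from math import erfc, log, sqrt


def lod_to_pointwise_p(lod):
    """Approximate one-sided pointwise P value for a LOD score.

    Assumes the asymptotic relation chi2 = 2 * ln(10) * LOD with
    1 degree of freedom, halved for a one-sided test.
    """
    chi2 = 2 * log(10) * lod
    # erfc(sqrt(x / 2)) is the upper tail of a 1-df chi-square at x.
    return 0.5 * erfc(sqrt(chi2 / 2))


for lod in (2.2, 2.46, 3.31):
    print(f"LOD {lod}: P ~ {lod_to_pointwise_p(lod):.1e}")
```

With this approximation a LOD of 2.46 gives P close to 0.0004 and a LOD of 2.2 gives P close to 7 × 10⁻⁴, in line with the values in the text.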
To undertake a systematic search of the region for genes that might be involved in type 1 diabetes susceptibility, we constructed a physical map spanning the 7 cM that flanked D1S1617, which had the peak LOD in our linkage analysis. In the immediate surrounding region, we resequenced the coding regions of nine genes in affected and unaffected individuals in order to identify polymorphisms. With the exception of SPHAR and CAPN9, all of these genes (Table 3) are expressed in either pancreas or lymphoid tissue (blood, bone marrow, thymus, and spleen), where it might be anticipated that a gene involved in autoimmune destruction of islet cells would be expressed. A function is known for seven of the genes, but none suggests an obvious connection with type 1 diabetes. Therefore, we also considered a broader region and carried out partial resequencing of four additional genes for which some functional rationale as type 1 diabetes candidates could be found. For example, ADPRT is expressed in pancreas, and it has been reported that null alleles of Adprt in mice protect against streptozotocin-induced diabetes (20–22). CHS1 is mutated in Chediak-Higashi syndrome, a disorder with immune manifestations (23). These results were also negative.
Because our physical mapping efforts had identified 34 putative transcripts in the region of interest, and because a survey of the nine genes closest to the site of the peak LOD score in the region did not yield evidence of association with type 1 diabetes, we broadened our search for linkage disequilibrium to include anonymous markers spanning the 1q42 region. In addition to testing the SNP markers found by resequencing, we also tested for linkage disequilibrium at each of the markers genotyped in families for our linkage studies. Since these latter markers had already been genotyped in >600 families, we could test for linkage disequilibrium by the TDT, eliminating concerns about population structure. Three of these markers yielded nominally significant results for a common allele. Whereas none of these findings would be significant if corrected for the 40 markers tested, it is striking that the three markers are adjacent in our physical map (Table 1 and Fig. 1). This finding suggested that the TDT results might reflect the inheritance of a three-marker haplotype that contains these specific alleles and confers elevated risk of type 1 diabetes. A test for transmission of the entire haplotype to affected offspring in type 1 diabetic families was consistent with this possibility.
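The statement that none of the three single-marker results would survive correction for 40 markers can be made concrete with a small sketch. The text does not name the correction method; a Bonferroni adjustment is assumed here, applied to the nominal P values reported in RESULTS.

```python
def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted P value, capped at 1.0."""
    return min(1.0, p_value * n_tests)


N_MARKERS = 40  # number of markers tested, as stated in the text

# Nominal TDT P values reported for the three adjacent markers.
nominal = {"D1S225": 0.014, "D1S2383": 0.017, "D1S251": 0.007}

for marker, p in nominal.items():
    adjusted = bonferroni(p, N_MARKERS)
    print(f"{marker}: nominal P = {p}, adjusted P = {adjusted:.2f}, "
          f"significant at 0.05: {adjusted < 0.05}")
```

Even the smallest nominal value (0.007) becomes 0.28 after this adjustment, which is why the argument rests on the physical adjacency of the markers and on the haplotype test rather than on any single marker.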
The three markers and the haplotype that show association with type 1 diabetes are located 735–1,200 kb telomeric to the region (D1S1617, D1S2847) with the maximum multipoint LOD, although still within a 1-LOD support interval surrounding the peak. Unlike mapping in a Mendelian disease, where recombinants can be identified and provide precise, though possibly broad, localization, the peak LOD for a disease like type 1 diabetes defines a region that is both broad and imprecise. Thus, a single putative susceptibility locus for type 1 diabetes could be responsible for the association found near D1S225-D1S251 and also account for the evidence for linkage seen as an LOD with a maximum that occurs 735–1,200 kb away. From the current findings, it is not possible to determine whether the association observed near D1S225-D1S251 accounts for all, or only some, of the evidence of linkage observed in the 1q42 region. Analysis of the linkage data for chromosome 1 in the 56 families segregating for the associated (4-2-1) haplotype yields a regional maximum LOD score of 0.94, compared with the LOD of 2.46 obtained in the full panel of 767 families. None of the three markers that make up the associated haplotype is likely to contribute directly to type 1 diabetes susceptibility. Therefore, the association of this haplotype with one or more putative etiologic variants in the region is likely to be incomplete, and analysis with just the associated haplotype might well underestimate the contribution of a putative type 1 diabetes locus in this region to the evidence for linkage.
Genes in the vicinity of D1S251 have been studied previously because of the report of cosegregation between a chromosomal translocation in the region and major psychiatric disorders in an extended Scottish pedigree (24,25). As a result of these studies, two genes in the D1S225 through D1S251 interval that are expressed in the pancreas have been described: TSNAX, a translin-associated factor (26), and EGLN1, a putative prolyl hydroxylase (27). Neither these genes nor the one other known gene in the interval, GNPAT (glyceronephosphate-O-acyltransferase) (28), are obvious candidates for type 1 diabetes susceptibility genes. However, current genome sequence data suggest that there may be as many as seven additional transcription units in this interval and several more immediately flanking it (NCBI, Ensembl, and Celera). These genes will need to be evaluated as candidate type 1 diabetes susceptibility genes in future studies.
ACKNOWLEDGMENTS
The authors thank Melissa Arcaro for sequencing some BAC ends, Nancy Cox and Warren Ewens for advice regarding biostatistical issues, Bob Hemphill and Lucy Southworth for help with computing, and Mary West for expert assistance with manuscript preparation. In addition we thank the Human Biological Data Interchange, the British Diabetic Association, and the many type 1 diabetic patients and their families who contributed to these repositories.
FOOTNOTES
Received for publication 5 April 2002 and accepted in revised form 9 July 2002.
C.S. is employed by Merck Pharmaceuticals.
ASP, affected sib pair; BAC, bacterial artificial chromosome; LOD, logarithm of odds; PAC, P1 artificial chromosome; RPCI, Roswell Park Cancer Institute; SNP, single-nucleotide polymorphism; STS, sequence-tagged site; TDT, transmission/disequilibrium test.