1 Department of Medicine, University of Arkansas for Medical Sciences, Little Rock, Arkansas
2 Department of Human Genetics, University of Utah Health Sciences Center, Salt Lake City, Utah
3 Department of Medicine, Central Arkansas Veterans Healthcare System, Little Rock, Arkansas
ABSTRACT |
---|
Type 2 diabetes (MIM125853) likely encompasses a diverse set of diseases marked by elevated levels of plasma glucose. Among Caucasian populations, individuals with type 2 diabetes, individuals with the intermediate phenotype of impaired glucose tolerance, and likely individuals at risk of diabetes are all characterized by variable degrees of both decreased insulin action, particularly resistance to insulin-mediated muscle glucose uptake, and impaired insulin secretion in response to that decreased insulin action (1). Defects of both insulin action and insulin secretion among individuals with normal glucose tolerance predict later onset of diabetes (2). Despite the diverse phenotypic nature of type 2 diabetes, monozygotic and dizygotic twin studies, family studies, and marked differences in disease prevalence across populations all provide convincing evidence for an important role of genetic susceptibility loci in type 2 diabetes pathogenesis (1). Based on epidemiological data, the total sibling relative risk (λs) has been estimated at 3–4 (3), although the number of loci that contribute to this risk is unclear.
Based on these data supporting type 2 diabetes susceptibility genes, genome scans for both type 2 diabetes and type 2 diabetes-related traits have been undertaken by multiple laboratories in Caucasian, Pima Indian, African-American, and Asian populations (1,4,5), among others. These scans have identified possible susceptibility loci throughout the genome, but to date only the NIDDM1 locus on chromosome 2q in Mexican-American subjects has been mapped to a single gene, the calpain 10 gene (6). Calpain 10 plays a small role in most other populations, however, and has been inconsistently replicated by linkage and association. Other regions with evidence for replication include chromosome 12q (7–9) and chromosome 20 (10–12). A region on chromosome 1q21-q23 was identified independently among Pima Indian sib-pairs discordant for type 2 diabetes or Pima Indian sib-pairs with onset of diabetes before age 25 years (13) and in studies from our laboratory of 42 multiplex kindreds of Northern European ancestry ascertained in Utah (14). Subsequent studies in French families (15), English sib-pairs (16), and Amish families (17) and in preliminary studies of Chinese sib-pairs (18) have identified linkage of type 2 diabetes to this same region, very near the original Pima and Utah linkage peaks. Furthermore, this region was linked to HbA1c in the Framingham Offspring Study (19), to metabolic syndrome traits in nuclear families from Hong Kong (20), and to the possibly related phenotype of familial combined hyperlipidemia (21,22). Given the difficulty in replicating linkage in complex diseases, the finding of linkage of diabetes and related traits to this region in at least 10 studies from diverse populations is striking. However, the exact map location of the linkage peaks, the specific trait or disease definition for the study, and the subgroup providing the evidence for linkage differ among studies.
In previous studies from our laboratory (23), the most significant linkage peak (logarithm of odds [LOD] = 4.3) was found using pedigrees trimmed to fit into the Genehunter program (24) under a partially penetrant recessive parametric model. The linkage peak was quite broad, with a 1 LOD CI that extended from between D1S305 and CRP to D1S196, or 20 cM. A similar location, albeit with lower significance, was identified with both sib-pair analysis (Mapmaker/Sibs) and nonparametric linkage (NPL). Studies in Pima Indians and in French families placed the linkage peak within 5 cM of our data, although initial Amish and English studies placed the peak centromeric or telomeric, respectively. In post hoc analyses from our laboratory, the LOD score was reduced when full families were used for fewer markers, when unaffected individuals were removed from the analysis, and when individuals with intermediate diagnoses were removed. In contrast, removal of two families that segregated hepatocyte nuclear factor 1α variants increased the LOD score to 4.87 in the remaining 40 families (23,25). Finally, we found no linkage to chromosome 1 in either 21 smaller replication families or when all 63 families were analyzed together without heterogeneity (23). The goal of the present study was to localize the well-replicated type 2 diabetes susceptibility gene in this region using a dense microsatellite map across a 37-cM region for linkage, case-control association studies, and family-based association studies.
RESEARCH DESIGN AND METHODS |
---|
We included two closely related study populations. Both linkage and family-based association studies were conducted in samples from previously described families (23). Briefly, the primary studies were conducted on 618 members of 42 families (526 nonfounders). The mean number of individuals tested was 13.3 per family, and the mean number of affected individuals per family was 4.0, with a mean age of onset of 50.6 years. An additional 27 smaller families (mean number of individuals tested: 6.6), which included six families that were not previously typed, were used as a replication set and were typed for all markers in the present study. The replication families were ascertained under the same criteria as the initial families, but the families were smaller and had fewer available members for testing. All families were ascertained for at least two siblings with type 2 diabetes diagnosed before the age of 65 years and with no more than one parent known to have type 2 diabetes. All subjects were ascertained in Utah for Northern European ancestry. All available parents and siblings of the index sib-pair, as well as all available offspring of diabetic siblings, were studied. All nondiabetic individuals underwent a 75-g oral glucose tolerance test. Subjects were classified as affected if they had a previous diagnosis of type 2 diabetes and were on medical therapy. To incorporate young-onset impaired glucose tolerance into the affection status, individuals were considered affected if the fasting glucose exceeded 7.8 mmol/l or if the 2-h postchallenge glucose was >7.8 mmol/l for participants under age 45 years, >11.1 mmol/l for participants aged 45–64 years, or >13.3 mmol/l for those over age 64 years. All other individuals with abnormal glucose tolerance tests were considered to be of unknown affection status. This scheme closely follows the World Health Organization criteria for impaired glucose tolerance (under age 45 years) and type 2 diabetes (age 45–64 years) but raises the postchallenge glucose for elderly subjects based on epidemiological data. All diagnoses were the same as in our previous study (23). Uncertainty was programmed into parametric models for individuals considered affected but who did not meet the criteria for type 2 diabetes.
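The age-dependent classification above can be summarized as a small decision rule. The following is a minimal sketch, not the authors' code: the function names are hypothetical, and the cutoff used to flag a glucose tolerance test as "abnormal" (2-h glucose of at least 7.8 mmol/l) is an assumption, since the text does not define it.

```python
def two_hour_threshold(age):
    """2-h postchallenge glucose cutoff (mmol/l) for 'affected', by age band."""
    if age < 45:
        return 7.8
    if age <= 64:
        return 11.1
    return 13.3


def affection_status(age, fasting, two_hour, treated_diabetes=False,
                     abnormal_cutoff=7.8):
    """Return 'affected', 'unknown', or 'unaffected'.

    abnormal_cutoff is an ASSUMED definition of an abnormal tolerance test;
    the source text does not state one.
    """
    # Previously diagnosed type 2 diabetes on medical therapy
    if treated_diabetes:
        return "affected"
    # Fasting glucose above 7.8 mmol/l or 2-h glucose above the age cutoff
    if fasting > 7.8 or two_hour > two_hour_threshold(age):
        return "affected"
    # Abnormal test that does not reach the age-specific cutoff
    if two_hour >= abnormal_cutoff:
        return "unknown"
    return "unaffected"
```

Under these assumptions, a 50-year-old with a 2-h glucose of 9.0 mmol/l is classified as unknown, whereas the same value in a 40-year-old is classified as affected.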
Case-control association studies were conducted on 150 unrelated individuals with known type 2 diabetes and 150 ethnically matched, unrelated control individuals. Of the type 2 diabetic individuals, 70 were selected from the linkage families and 80 additional individuals were selected from the same population for type 2 diabetes and a family history of type 2 diabetes in a first-degree relative. Control individuals included spouses from linkage families who had normal glucose tolerance tests (108 subjects) and Caucasian individuals ascertained in Utah or Arkansas (42 subjects) who had normal glucose levels or glucose tolerance tests and no family history of diabetes in a sibling, parent, or grandparent.
All individuals provided written informed consent under protocols approved by the University of Utah Institutional Review Board (diabetic kindreds and case-control population) or the University of Arkansas for Medical Sciences Institutional Review Board (additional case-control samples).
Marker selection and typing.
For linkage studies of chromosome 1, we added 37 microsatellite markers to the 38 markers previously typed (23), with 29 new markers in the region between D1S305 and D1S212, where previous linkage signals were found. Marker order and spacing were derived from published maps (29,30) with reference to the physical map to establish the order and distance for closely spaced markers (National Center for Biotechnology Information [NCBI] build 33). The average marker distance between D1S305 and D1S212 was 1.17 cM. For the population-based case-control association study, we typed 46 microsatellite markers between markers D1S305 and D1S212, with an average inter-marker distance of 0.52 Mb.
Microsatellite markers were amplified in the presence of universal M13 forward primers that were labeled with LI-COR IR700 and IR800 dyes, and the products were separated and detected on LI-COR 4200 sequencers using standard methods (LI-COR, Lincoln, NE). Genotypes were scored either automatically using SAGAGT software (31) (LI-COR) or semiautomatically using GeneImage IR 3.56 software (Scanalytics, Fairfax, VA). All readings were reviewed independently, and between 30 and 50 blinded duplicate samples were included for all markers for both linkage and association studies. All gels included at least two additional samples from selected grandparents of CEPH (Centre d'Etude du Polymorphisme Humain) families as an additional quality control. Before linkage analysis, all data were checked for inconsistencies in size, inconsistencies between duplicates, and inconsistencies in Mendelian inheritance using the PEDCHECK program (v. 1.1) (32). All blinded duplicates were in agreement with the exception of four samples that were consistently discordant and appeared to be misidentified duplicate samples. We identified 0.98% genotyping errors (251 of 28,095 genotypes that were automatically read without reference to pedigree data) that resulted in noninheritance and were changed to unknown before analysis.
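To illustrate what a Mendelian inheritance check involves, the sketch below tests a single parent-offspring trio at one marker. It is a deliberately simplified stand-in: PEDCHECK works on whole pedigrees, and the function name and genotype representation (a pair of allele sizes, or `None` when untyped) are assumptions made for this example.

```python
def trio_is_consistent(father, mother, child):
    """Check whether a child's genotype is compatible with the parents.

    Each genotype is a tuple of two allele sizes, or None if untyped.
    An untyped parent is treated as compatible with any allele.
    """
    if child is None:
        return True
    paternal = set(father) if father is not None else None
    maternal = set(mother) if mother is not None else None

    def compatible(from_father, from_mother):
        ok_father = paternal is None or from_father in paternal
        ok_mother = maternal is None or from_mother in maternal
        return ok_father and ok_mother

    first, second = child
    # Either allele may be the paternal one, so try both assignments
    return compatible(first, second) or compatible(second, first)
```

A genotype that fails such a check would be set to unknown before analysis, as described above for the noninherited genotypes.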
Linkage analysis.
The marker map used for all multipoint studies was derived from primary reference to the Marshfield map (http://research.marshfieldclinic.org/genetics), which included all of the typed markers. To properly space markers that were too close to be resolved on the Marshfield linkage map, we set the distance between markers with a recombination fraction of 0 to 0.5 cM, with marker order based on the physical map. Consequently, our map over the region from D1S305 to D1S212 was expanded by 3 cM from the Marshfield map and by 4 cM from the recently published DeCode map (33). Thus, exact locations used in the current study differ slightly from those cited in the most recent Marshfield map.
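The spacing adjustment can be pictured as follows. This is a minimal sketch under stated assumptions: markers are already ordered by the physical map, positions are in cM, and the function name and example positions are hypothetical rather than taken from the study.

```python
def expand_map(positions_cm, min_gap=0.5, tol=1e-9):
    """Insert a minimum gap between adjacent markers at zero map distance.

    positions_cm must already be in physical-map order. Every zero-length
    interval is replaced by min_gap, and all downstream markers shift
    accordingly, so the total map length grows.
    """
    if not positions_cm:
        return []
    expanded = [positions_cm[0]]
    for previous, current in zip(positions_cm, positions_cm[1:]):
        gap = current - previous
        if abs(gap) < tol:          # unresolved on the linkage map
            gap = min_gap
        expanded.append(expanded[-1] + gap)
    return expanded


# Hypothetical example: three markers at the same position, then one 2 cM away
print(expand_map([10.0, 10.0, 10.0, 12.0]))  # [10.0, 10.5, 11.0, 13.0]
```

Each unresolved interval adds 0.5 cM to the total length, which is why the adjusted map is longer than the published maps.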
Despite careful quality control and retyping of markers with excess recombination events, recombination between closely linked markers exceeded expectations for many intervals. Inspection of genotypes failed to identify errors leading to increased recombination. Consequently, before multipoint analysis we used a mistyping analysis implemented in SimWalk2 (v. 2.82) (26) to remove all genotypes that had a 25% or greater posterior probability of error based on excess recombination. These genotypes were considered missing for all multipoint analyses. We removed a total of 882 of 48,017 genotypes for all 62 families (1.8%). Expected and observed recombination rates for each interval are shown in the online supplemental data (Table 1).
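The filtering step amounts to masking any genotype whose posterior error probability reaches the threshold. The sketch below assumes the posterior probabilities have already been extracted into a dictionary keyed by (individual, marker); that data layout and the function name are hypothetical, and no SimWalk2 file format is parsed here.

```python
def mask_likely_errors(genotypes, error_prob, threshold=0.25):
    """Set genotypes with a high posterior mistyping probability to missing.

    genotypes:  dict mapping (individual, marker) -> (allele1, allele2) or None
    error_prob: dict mapping (individual, marker) -> posterior error probability
    Returns the cleaned dict and the number of genotypes removed.
    """
    cleaned = {}
    removed = 0
    for key, genotype in genotypes.items():
        if genotype is not None and error_prob.get(key, 0.0) >= threshold:
            cleaned[key] = None     # treated as missing in multipoint analysis
            removed += 1
        else:
            cleaned[key] = genotype
    return cleaned, removed


# Proportion removed in the study, from the counts reported in the text
print(round(100 * 882 / 48017, 1))  # 1.8
```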
We conducted multipoint linkage analysis under a recessive parametric model that provided the maximum LOD score in our previous studies using Genehunter version 2.1_r3 beta (24,34) and families trimmed to fit this program. Nonparametric analyses were performed using statistics A through E in SimWalk2 (v. 2.82) (26). Additionally, based on previous results showing the highest LOD score under a sib-pair analysis, we performed sib-pair linkage analysis using Genehunter (v. 2.1_r3) under models of dominance variance and no dominance variance (35). The recessive parametric model set the disease allele frequency at 0.25 and included a linear, age-dependent penetrance function that varied from 0.02 below age 30 years to 0.60 over age 65 years (23). The allele frequency of each microsatellite marker used for linkage analysis was estimated from unrelated pedigree members, assuming Hardy-Weinberg equilibrium. Linkage studies were conducted on the full 69-family set (original families and replication families) and on the 40 families that provided the maximum evidence for linkage in our previous study. These 40 families were selected from the 42 families of the previous study but excluded two families that segregated hepatocyte nuclear factor 1α variants (25). To fit the large families into Genehunter Plus, individuals who were unaffected or of unknown affection status were trimmed before analysis as described previously (23). The location score was also calculated in SimWalk2 using full families. Parametric recessive LOD scores were calculated assuming homogeneity (α = 1) and allowing for heterogeneity. The maximum likelihood estimate of alleles shared identical by descent (IBD) among sib-pairs from the 40 kindreds that were primarily responsible for earlier linkage findings was calculated both with and without weighting to correct for multiple sib-pairs and both with and without dominance variance (λs = λo).
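The age-dependent penetrance in the recessive model can be sketched as a clamped linear function. The text gives only the end points (0.02 below age 30 years, 0.60 over age 65 years); straight-line interpolation between those ages is an assumption here, as are the function and parameter names.

```python
def penetrance(age, lo_age=30.0, hi_age=65.0, lo=0.02, hi=0.60):
    """Penetrance for the susceptible genotype as a function of age.

    Constant at `lo` up to lo_age and at `hi` from hi_age onward;
    linear interpolation in between is ASSUMED, not stated in the text.
    """
    if age <= lo_age:
        return lo
    if age >= hi_age:
        return hi
    return lo + (hi - lo) * (age - lo_age) / (hi_age - lo_age)


# With a disease allele frequency of 0.25, the expected frequency of the
# homozygous susceptible genotype under Hardy-Weinberg equilibrium is q^2.
disease_allele_freq = 0.25
print(disease_allele_freq ** 2)  # 0.0625
```

Under this sketch, a carrier of two susceptibility alleles aged 47.5 years, midway between the two ages, would have a penetrance midway between 0.02 and 0.60.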
Because of the increased recombination observed in this study despite elimination of clear genotyping errors and to minimize the impact of map errors, particularly between closely spaced markers, we supplemented the multipoint analyses with a two-point linkage analysis of the 40-family set under the recessive model using the FASTLINK program (36). To further minimize the errors in recombination fractions resulting from sex-averaged estimates of recombination, we incorporated sex-specific recombination fractions in these analyses.
Tests of association.
Population association tests for microsatellite alleles were conducted for 43 markers using CLUMP v. 1.9 software (37). We report the maximized χ² test (T4 statistic), which calculates the maximum χ² value found by collapsing the contingency tables over each allele in turn to form 2 × 2 contingency tables. The significance was assessed using a Monte Carlo approach with 10,000 simulations. Family-based associations with type 2 diabetes were tested in 69 families using a modification of the transmission disequilibrium test (TDT) (38), as implemented in the Pedigree Analysis Package (39) and described previously (40). This analysis tests the probability that a heterozygous parent transmits an allele to an affected offspring more often than expected by chance, similar to the gamete-competition model described by Sinsheimer et al. (41). Increased transmission from parents to affected offspring was tested by maximum likelihood analysis against equal transmission of the alleles. All alleles at a marker were tested simultaneously with k-1 df, where k represents the number of alleles. The pedigree is analyzed as an intact unit, so that trios and nuclear families were not examined separately. Because linkage in this region was established, this likelihood test was a test of association. Data are presented without correction for the number of markers tested. In a case-control study of a two-allele marker, our power for a test of allelic association with 150 individuals in each group exceeds 80% for differences in allele frequency of 12% or greater. Linkage disequilibrium between adjacent microsatellite markers was calculated from the case-control study of unrelated individuals (both case and control subjects included) as a multilocus D' value using the expectation maximization algorithm as implemented in the 2LD program (http://linkage.rockefeller.edu).
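The maximized χ² idea can be sketched in a few lines. This example follows the description in the text literally (each allele in turn against all others pooled); CLUMP's own T4 statistic may group alleles differently, and the label-permutation scheme used for the Monte Carlo step is an assumption of this sketch, as are all function names.

```python
import random
from collections import Counter


def chi2_2x2(a, b, c, d):
    """Pearson chi-square for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denominator


def max_allele_chi2(case_counts, control_counts):
    """Largest chi-square over tables of one allele versus all the rest."""
    n_case = sum(case_counts.values())
    n_control = sum(control_counts.values())
    best = 0.0
    for allele in set(case_counts) | set(control_counts):
        a = case_counts.get(allele, 0)
        c = control_counts.get(allele, 0)
        best = max(best, chi2_2x2(a, n_case - a, c, n_control - c))
    return best


def monte_carlo_p(case_alleles, control_alleles, n_sim=10000, seed=0):
    """Empirical P value by shuffling case/control labels over alleles."""
    observed = max_allele_chi2(Counter(case_alleles), Counter(control_alleles))
    pooled = list(case_alleles) + list(control_alleles)
    n_case = len(case_alleles)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        rng.shuffle(pooled)
        simulated = max_allele_chi2(Counter(pooled[:n_case]),
                                    Counter(pooled[n_case:]))
        if simulated >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero
    return observed, (hits + 1) / (n_sim + 1)
```

Because the maximum is taken over all alleles, the statistic does not follow a standard χ² distribution, which is why significance is assessed by simulation rather than from tables.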
Haplotype estimation and haplotype sharing analysis.
Haplotypes were inferred in 40 large pedigrees using 33 ordered markers from D1S305 to D1S212 and SimWalk2 (v. 2.82) following the methods of Saarela et al. (42). Sharing of maternal and paternal haplotypes between affected siblings within each family was determined by manual inspection (42).
RESULTS |
---|
Parametric linkage analysis.
As in our previous report of 21 replication families (23), we found no evidence for linkage in the 27 replication families despite the dense map. Using the full available pedigree set (69 families), we found evidence for linkage only under models that incorporated heterogeneity, with a maximum heterogeneity LOD (HLOD) score of 1.42 with 25% of families linked at position 168.5 cM (marker D1S484). When the 40 families from the original linkage study that did not segregate hepatocyte nuclear factor 1α variants were tested, the maximum LOD score using families trimmed to fit Genehunter requirements was 5.28 at the same location (position 168.5 cM; marker D1S484), which was increased from 4.89 in our previous study. In contrast to the full family set, we found little evidence for heterogeneity (HLOD = 5.29; α = 0.96, 168.5 cM) using the 40 families that were trimmed of many unaffected individuals. As in our previous analysis (23), inclusion of all unaffected individuals using the SimWalk2 program dropped the nonheterogeneity location score to 2.98 and the heterogeneity LOD score to 4.07 (α = 0.65) without moving the location of the peak (marker D1S484; 168.5 cM) (Fig. 1). Based on the Genehunter analysis, the 1 LOD CI was narrowed to 167.6–170.6 cM, corresponding to locations 156.8–158.9 Mb on the physical map (NCBI Build 33).
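For readers unfamiliar with heterogeneity LOD scores, the standard admixture formulation sums, over families, the log of a mixture of linked and unlinked likelihoods, where α is the proportion of linked families. The sketch below uses that textbook formula with a simple grid search; the per-family LOD values are invented for illustration, and the linkage programs used in the study may differ in numerical detail.

```python
import math


def hlod(family_lods, alpha):
    """Admixture HLOD at one map position.

    family_lods: per-family LOD scores at that position
    alpha:       assumed proportion of linked families (0 to 1)
    """
    return sum(math.log10(alpha * 10 ** z + (1.0 - alpha))
               for z in family_lods)


def max_hlod(family_lods, steps=100):
    """Grid search over alpha; returns (best HLOD, alpha at the maximum)."""
    return max((hlod(family_lods, k / steps), k / steps)
               for k in range(steps + 1))


# Hypothetical per-family LODs: two families support linkage, two do not
example = [2.0, 2.0, -3.0, -3.0]
best, alpha = max_hlod(example)
print(hlod(example, 1.0))   # -2.0, the homogeneity LOD (alpha = 1)
print(best > 0, 0 < alpha < 1)
```

At α = 1 the HLOD reduces to the ordinary summed LOD, so an estimate of α close to 1 with an HLOD close to the LOD indicates little evidence for heterogeneity.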
Association studies.
To further localize the type 2 diabetes susceptibility locus, we tested association in a case-control population comprising diabetic case subjects and nondiabetic control subjects ascertained in Utah or Arkansas for 46 microsatellite markers. We also tested the 33 markers used in the linkage studies for excess transmission of any allele from parents to affected offspring using maximum likelihood methods. In case-control studies, markers D1S194 (178 cM, 162.1 Mb) and D1S1677 (176 cM, 160.2 Mb) were nominally significant at P = 0.003 and P = 0.012, respectively, based on Monte Carlo assessment of significance tested using the CLUMP statistic T4 to examine all alleles simultaneously (37). Marker ATA38A05 at 179 cM (162.5 Mb) was most strongly associated by TDT (P = 0.002). These markers fall under the second linkage peak, with both D1S194 and ATA38A05 falling within the 1 LOD CI for the sib-pair analysis. The data for all 46 microsatellites are shown in Table 2 of the online Supplemental Data. Multipoint linkage disequilibrium between adjacent pairs of markers ranged from not significantly different from 0 to the highest D' value of 0.483 (Table 3 of online Supplemental Data).
Haplotype sharing.
We followed the methods of Saarela et al. (42) to establish shared haplotypes for the 33 markers that spanned the 37-cM region between markers D1S305 and D1S212. Haplotypes were inferred in SimWalk2 and were examined manually for sharing among the 58 sibships that had two or more affected individuals from the 40 families. Although no single haplotype was shared by all sibships, a 1.16-cM region centered on the first linkage peak and flanked by markers D1S2771 and D1S2705 was shared by 32 of 58 sibships (Table 4 of online Supplemental Data).
DISCUSSION |
---|
We focused the current study on the 40 multiplex families that provided the majority of the evidence for linkage in our initial report and that did not segregate other known mutations. Unlike the original study, the dense map of approximately one marker every centiMorgan has resolved the broad linkage peak observed initially into at least two narrow peaks. The first of these peaks has moved slightly centromeric from the original peak at APOA2 (170 cM) to the present location of D1S484 (168.5 cM). With additional markers, the LOD score has increased to 5.28 under the recessive model and using Genehunter-sized pedigrees. Similarly, using multipoint sib-pair analysis, the MLS has increased from 2.98 in the original study to 6.07 in the present study. Despite the large variation in significance levels for the first peak with different analytical methods, the location of this peak was remarkably consistent. Both the SimWalk2 statistic A and the parametric analysis continue to support a recessive-like mode of inheritance for the susceptibility gene or genes that account for the first peak. Based on the present analyses, we have narrowed the 1 LOD CI for this peak to a region from 167.6 cM (156.8 Mb) to 170.6 cM (158.9 Mb). This peak includes at least 60 RefSeq genes, including a number of strong candidate genes, many of which have been evaluated by our laboratory and others. Among the candidate genes previously evaluated in this region are apolipoprotein A2 (APOA2) at 170 cM (157.9 Mb) (44); phosphoprotein enriched in astrocytes (PEA15), which may be involved in insulin action (45); C-reactive protein, which may be involved in inflammation (46); and two inwardly rectifying potassium channel genes, KCNJ9 and KCNJ10 (47,48). None of the reported associations of single nucleotide polymorphisms (SNPs) in these genes can convincingly account for the strong linkage signal in our families, however.
In contrast, we have identified two regions within the 1 LOD support interval in which a cluster of SNPs shows strong associations with type 2 diabetes in case-control studies. These associations thus appear to support the linkage findings. Neither region falls close to a strong candidate gene, but work is in progress to identify additional polymorphisms within these regions and to evaluate nearby coding genes. Additional support for an association under this peak has come from other groups with linkage in this region (17).
Unlike our original report, the present study suggests a second peak at 180 cM, 10 cM from the first peak at 169 cM. Based on the 40-family sib-pair analysis, the 1 LOD support interval is 177.7–181.6 cM, or approximately 162.5 to 164.7 Mb. Unlike the first peak, this second region is much less prominent using the recessive parametric models and is most prominent using multipoint sib-pair analysis, under which this peak nearly equals the first peak with an MLS of 5.247. These data suggest that the susceptibility locus accounting for the second peak acts less like a recessive locus. Furthermore, this peak has a higher MLS than the first peak when all 69 families are considered, thus suggesting that the susceptibility allele accounting for this peak may be more prevalent than that accounting for the first peak. The most prominent candidate genes for type 2 diabetes in the 1 LOD support interval are RXRγ (49), for which we found an association with lipid abnormalities but a less prominent association with type 2 diabetes, and the overlapping homeobox transcription factor LMX1A (50). The microsatellite associations found in the present study also support one or more susceptibility genes that account for this peak. Marker D1S194, which was associated with type 2 diabetes in the case-control study, lies just telomeric to RXRγ (162.06 Mb), whereas marker D1S1677 lies nearly 2 Mb telomeric to the 1 LOD CI (160.2 Mb). However, the only marker identified as overtransmitted in family members in a TDT-like test, marker ATA38A05, also lies within this second peak (162.5 Mb). We cannot exclude the possibility that one or more of the associations are spurious, particularly given the modest P values and the span of nearly 2.5 Mb between associated microsatellite markers. Additional SNP typing in these regions will be needed to confirm these associations and to narrow the genes responsible for these associations.
Although this study narrowed the most prominent linkage and association signals to the region between 156 and 168 Mb, we have previously demonstrated an association of multiple noncoding SNPs within the PKLR gene with type 2 diabetes (51), which is centromeric to the first linkage peak. This association would fall under the most centromeric linkage peak that was observed only on the unweighted sib-pair analysis (Fig. 3). The physical distance encompassed by this peak might extend from 117 Mb to at least 152.5 Mb. Among possible candidates in this region besides PKLR are RORC (52), α-endosulfine (ENSA) (53), and interleukin-6 receptor (S.C.E., unpublished data). Of these candidates, a prominent association in this population was observed only with PKLR. Because of unusually strong linkage disequilibrium extending for large distances in this centromeric region, the actual genes accounting for the linkage peak and the association may lie at some physical distance from the observed association.
Were a single variant responsible for our linkage signal on 1q21-q24, we would expect to identify one haplotype of the microsatellite markers across the linkage peak that was shared among affected individuals. In contrast, in the region between D1S305 and D1S212, no single haplotype was shared. This finding is consistent with the existence of at least two and possibly three linkage peaks, suggesting more than one susceptibility gene in this region. The finding of several association peaks in this region offers further support for multiple susceptibility loci. We did identify a 1.16-cM region flanked by markers D1S2771 and D1S2705 in which affected siblings of 55% of the 58 sibships from the 40 families shared the same haplotype, but no single haplotype was shared, even in this narrow region. This finding is consistent with other common disease susceptibility genes and suggests that even within this first linkage peak, multiple at-risk haplotypes contribute to the linkage signal.
In summary, using combined linkage mapping, haplotype sharing, and association studies with a dense marker map, we were able to confirm and narrow our original peak of linkage to a 3.3-cM region or 2.1 Mb. We have resolved a second linkage peak that is approximately 10 cM telomeric to our largest peak but in a region of both association and linkage in other studies. Our analysis strongly suggests that the replication in this region comes in part from the coalescence of several susceptibility loci in a region that could not be resolved on a 10-cM genome scan. The region harbors many strong candidate genes for type 2 diabetes, as well as a large number of poorly characterized transcripts that may also be good candidates. International collaborative efforts are underway to map these loci using positional candidate and linkage disequilibrium approaches in the populations with linkage to this region.
ACKNOWLEDGMENTS |
---|
FOOTNOTES |
---|
Address correspondence and reprint requests to Steven C. Elbein, MD, Professor of Medicine, University of Arkansas for Medical Sciences, Endocrinology 111J-1/LR, John L. McClellan Memorial Veterans Hospital, 4700 W. 7th St., Little Rock, AR 72205. E-mail: elbeinstevenc{at}uams.edu
Received for publication August 26, 2003 and accepted in revised form November 10, 2003
HLOD, heterogeneity LOD; IBD, identical by descent; LOD, logarithm of odds; MLS, maximum likelihood score; NPL, nonparametric linkage; PKLR, liver- and red cell-type pyruvate kinase; RORC, retinoid-related orphan receptor γ; SNP, single nucleotide polymorphism; TDT, transmission disequilibrium test
REFERENCES |
---|