1 Division of Endocrinology, Diabetes and Metabolism, Washington University School of Medicine, St. Louis, Missouri
2 Department of Psychiatry, Washington University School of Medicine, St. Louis, Missouri
3 Endocrinology and Metabolism Service, Internal Medicine Department, The Hadassah-Hebrew University Medical Center, Jerusalem, Israel
4 Department of Genetics, Washington University School of Medicine, St. Louis, Missouri
ABSTRACT |
---|
Regions defined by linkage to complex diseases typically encompass >10 cM, an area that may harbor hundreds of genes, thereby making the identification of disease-causing loci a tedious process. The study of populations that have undergone genetic isolation, as have the Ashkenazim, is thought to be useful in mapping complex disease genes. Because the larger American and European Caucasian populations originated from the Mediterranean basin, as did the Ashkenazim, genetic risk factors identified in the Ashkenazi Jewish population may be important in other Caucasian populations (1,2). In a genome-wide scan of 267 multiplex type 2 diabetic Ashkenazi Jewish families, regions on chromosome 20 that exhibited nominal evidence for linkage (P < 0.05) were identified (3). The strongest linkage signal on chromosome 20q was observed at D20S195; a weaker signal on chromosome 20p was seen at D20S103. Several other type 2 diabetes studies have also identified linkage to chromosome 20q13.1-13.2 in Caucasian (4–7) and Japanese (8) families. Interestingly, the linkage peaks in these studies overlap at a region near the hepatocyte nuclear factor-4α (HNF4α) gene. The gene spans ~29 kb with 12 exons on chromosome 20q13.1-13.2 (9). It encodes an orphan receptor member of the nuclear receptor superfamily 1.
HNF4α variants have been shown to cosegregate in an autosomal-dominant manner in families with an atypical form of type 2 diabetes known as maturity-onset diabetes of the young (MODY)-1. MODY is a clinically and genetically heterogeneous form of nonketotic diabetes that presents before age 25 years, usually in nonobese, asymptomatic, hyperglycemic individuals (10–12). HNF4α's role in MODY stems from its function as a β-cell transcription factor that influences glucose-induced insulin secretion (13). In contrast to MODY, type 2 diabetes usually occurs between ages 40 and 60 years, with the exception of obesity-related pediatric type 2 diabetes, regardless of family history (14). Both MODY and type 2 diabetic patients have reduced insulin secretion as a result of pancreatic islet β-cell dysfunction. In addition, HNF4α has been shown to influence lipid transport and metabolism (15,16).
HNF4α is differentially expressed in mammalian liver, kidney, small intestine, colon, stomach, and pancreas from as many as nine different transcripts (17,18). An alternative promoter, P2, lies 45.6 kb upstream of the proximal P1 promoter (18–20). P2-driven transcripts have been described as the predominant splice variant in pancreatic β-cells (18–21). Although HNF4α intragenic and/or proximal P1 promoter single nucleotide polymorphisms (SNPs) have been described in previous type 2 diabetes studies (4,22–26), a thorough examination of the P2 region has not been reported; thus, association mapping was designed to examine the P2 region in this study.
Case-control studies of unrelated individuals have become the methodology of choice to follow up on linkage findings. The working hypothesis is that variants in linkage disequilibrium (LD) with the susceptibility locus will define the genomic region responsible for the original linkage signal. However, the extent of LD in various regions of genomic DNA has been shown to be highly variable (27–29). Recently, we reported (30) the ongoing evaluation of SNPs across a 7.3-Mb region near microsatellite D20S107 in an association study using pooled DNAs from 150 Ashkenazi Jewish type 2 diabetic patients and 150 control subjects. In the absence of a strong positive association between any of these SNPs and type 2 diabetes, we implemented a more direct candidate gene approach involving the HNF4α gene in this study. We determined the patterns of LD and haplotype block structure to identify the number of haplotype-tag SNPs (htSNPs) required to capture the most common haplotypes across a 78-kb region encompassing HNF4α and P2 preparatory to case-control analysis. An htSNP in the P2 region was associated with type 2 diabetes and appeared to be responsible for the previously defined linkage peak in families in which the probands carried at least one risk allele. Notably, similar findings were independently observed in a Finnish sample from the concurrent FUSION (Finland-United States Investigation of NIDDM Genetics) study (31; see this issue of Diabetes).
RESEARCH DESIGN AND METHODS |
---|
Genotyping and single nucleotide polymorphism discovery.
All genotypes in this study were assessed by PCR amplification of genomic DNA and Pyrosequencing technology (Pyrosequencing, Uppsala, Sweden), as previously described (32). To establish an SNP map spanning the 78 kb encompassing HNF4α and P2, SNPs were ascertained by searching the public database, WAVE analysis, and/or sequencing. In all, 35 SNPs were identified and tested for validation through our efforts described below. To achieve an SNP density of 1 SNP every 5 kb, SNPs were dropped if they occurred <2 kb apart. In all, 19 SNPs, resulting in an average density of 1 SNP every 4.1 kb, were tested for Hardy-Weinberg equilibrium (HWE), assessment of minor allele frequency, and characterization of LD structure across the region in the sample of 68 unrelated Ashkenazi Jewish individuals (sample 3). SNPs with a minor allele frequency of ≤0.09 in sample 3 were eliminated from further study.
From the public database, 12 SNPs (rs736821, rs10480819, rs3212210, rs1535337, rs1028583, rs2273618, rs3818247, rs1884614, rs2425639, rs4810424, rs1885088, and rs3761186) were selected and validated in samples of previously described DNA pools (30). Of these, nine SNPs (rs736821, rs3212210, rs1535337, rs1028583, rs3818247, rs1884614, rs2425639, rs1885088, and rs3761186) were further characterized in the 68 independent Ashkenazi Jewish individuals (sample set 3) for this study. An additional five public database SNPs (rs3761184, rs1884613, rs2144908, rs2425637, and rs2425640) were provided by Silander et al. (31).
To screen for SNPs by WAVE analysis (denaturing high-performance liquid chromatography [dHPLC]; Transgenomic, Omaha, NE), 15 primer sets were used to amplify by PCR all 12 exons (1,589 bp), flanking 5' and 3' intronic sequences (3,364 bp), the proximal promoter (800 bp), and 415 bp of the P2 promoter of the HNF4α gene in 96 type 2 diabetic Ashkenazi Jewish probands (from sample 1). These PCR fragments were subsequently screened using a modification of the WAVE analysis. Because SNP-specific heteroduplex and homoduplex controls were not available, dHPLC peaks were visually scored for differences in WAVE patterns. Subsequently, one or two patients representative of each WAVE pattern were sequenced by dye-terminator chemistry (Applied Biosystems, Foster City, CA). A total of 14 variants were identified, of which 7 were novel (Table 1). Because many of the SNPs were <1 kb apart, only five (rs1800963, rs2071197, rs736823, rs3212195, and lg8100208) were further characterized.
Statistical analysis.
Statistical significance for type 2 diabetes SNP association was determined by Fisher's exact test, and the 95% CI was calculated using the approximation of Woolf (InStat version 3; GraphPad Software, San Diego, CA). P values were corrected for multiple tests using the Benjamini-Hochberg method (33). SNP genotype departures from HWE were tested using a χ² test with 1 degree of freedom.
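As an illustration of the calculations named above, the following is a minimal Python sketch, not the InStat implementation used in the study. It assumes that Woolf's approximation means the usual log odds ratio interval, that the Benjamini-Hochberg correction is reported as step-up adjusted P values, and that the HWE test compares observed genotype counts with the counts expected from the estimated allele frequency. The function names and the input layout are hypothetical.

```python
import math

def woolf_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate 95% CI from a 2x2 table (Woolf's log method).

    a, b: counts of the two alleles (or genotype classes) in cases
    c, d: the corresponding counts in controls; all cells must be non-zero.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) for departure from Hardy-Weinberg equilibrium.

    Takes the three genotype counts; both alleles must be present.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele "a"
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper-tail probability of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

For example, `hwe_chi_square(30, 40, 30)` compares the observed counts with the expected 25/50/25 split, giving a χ² of 4.0 on 1 degree of freedom.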
Haplotypes were inferred using the Bayesian method as implemented in PHASE v1.0.1 (34). PHASE-formatted data were run as a single file (case and control subjects combined) to allow for a more conservative estimation of haplotype frequency than would be obtained by separate case and control sample analyses. The program has the potential to optimize to each file separately, possibly skewing the haplotype frequencies. Several runs of PHASE were performed using the following parameters: iterations = 10 and 20 K, thinning intervals = 100 and 1,000, and burn-ins = 10 and 20 K.
Haplotype block structure was inferred by the greedy algorithm as implemented in HaploBlockFinder (35). In this program, the extent of LD was measured in terms of D', d², and r² (36,37). The significance of LD was assessed by the log likelihood ratio statistic under the assumption of HWE. HaploBlockFinder selects sets of SNPs defining 80% of the haplotypes (i.e., htSNPs) within a block based on r², which represents absolute levels of LD. The parameters for HaploBlockFinder were as follows: block definition = minimum LD range; minimum D' = 0.80; genotype quality filter = 0.50 (ambiguous genotypes at a given locus can affect block partitioning, thus loci with ambiguous-to-total ratio genotypes with a threshold >0.50 are ignored); minor allele frequency (lowerbound) = 0.10; and coverage of htSNPs = 0.80–0.90.
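To make the LD measures concrete, the sketch below computes |D'| and r² for a pair of biallelic SNPs from phased haplotype counts, using the standard textbook definitions. It is not the HaploBlockFinder code; the function name and argument layout are hypothetical, and d² is left out because its definition is not given here.

```python
def ld_measures(n11, n12, n21, n22):
    """Return (|D'|, r^2) for two biallelic SNPs from haplotype counts.

    n11: haplotypes carrying allele 1 at both SNPs
    n12: allele 1 at SNP 1, allele 2 at SNP 2
    n21: allele 2 at SNP 1, allele 1 at SNP 2
    n22: allele 2 at both SNPs
    """
    total = n11 + n12 + n21 + n22
    p11 = n11 / total
    p1 = (n11 + n12) / total  # frequency of allele 1 at SNP 1
    q1 = (n11 + n21) / total  # frequency of allele 1 at SNP 2

    # Raw disequilibrium coefficient.
    d = p11 - p1 * q1

    # Largest |D| attainable given the allele frequencies.
    if d >= 0:
        d_max = min(p1 * (1 - q1), (1 - p1) * q1)
    else:
        d_max = min(p1 * q1, (1 - p1) * (1 - q1))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    # Squared correlation between the two loci.
    denom = p1 * (1 - p1) * q1 * (1 - q1)
    r2 = d * d / denom if denom > 0 else 0.0
    return d_prime, r2
```

With counts of 40, 10, 10, and 40, D is 0.15 against a maximum of 0.25, so |D'| is 0.6 and r² is 0.36. A pair can show |D'| near 1 with a much lower r² when allele frequencies differ, which is why both measures are reported.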
Linkage analysis.
To test the hypothesis that the "A" allele at SNP rs1884614 (or an allele in strong LD with it) is a risk factor for type 2 diabetes, we partitioned a sample of multiplex type 2 diabetic families that had previously been genotyped for up to 40 chromosome 20 microsatellites into subgroups according to the proband's genotype at rs1884614. The average heterozygosity of these 40 markers was 0.72. To protect against inadvertently including families with MODY, we required that the age at diagnosis of all affected pedigree members be >35 years. In all, 199 multiplex nuclear families met this inclusion criterion. Of these, 4 were maternal half-sibling families, 152 contained a pair of affected full siblings, 37 contained three affected siblings, and 6 contained an affected sibling quartet. A small number of additional non–first-degree genotyped relatives (two affected half-siblings and three affected first cousins) as well as 86 unaffected relatives (mostly siblings) were included in the linkage analysis. All linkage analyses were performed with Genehunter Plus using the "pairs" option under the exponential model (38,39). All linkage analyses were carried out using marker positions as determined on the Marshfield map (http://research.marshfieldclinic.org/genetics).
Linkage analysis was initially performed on three subgroups, depending on the proband's genotype at rs1884614 ("AA" [n = 8], "AG" [n = 78], and "GG" [n = 113]). Visual inspection of the resulting logarithm of odds (LOD) score curves revealed that the "AA" and "AG" partitions were virtually identical (data not shown). Because the sample size of the "AA" subgroup was small, we decided to pool it with the "AG" partition. Accordingly, the comparison was between families with a proband who had at least one "A" allele and families where the proband lacked the "A" allele.
We carried out a randomization test to determine the significance of the partitioning. Subsamples of families (n = 86 and its complement [n = 199 − 86 = 113]) were drawn at random. For each subsample, we re-estimated the allele frequencies and performed a Genehunter Plus analysis, as above. A total of 10,000 randomizations were used to obtain empirical P values.
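The logic of the randomization test can be sketched as follows. This is a simplified illustration, not the pipeline that was run: the real statistic came from a Genehunter Plus analysis of each random subsample, whereas `stat_fn` below is a hypothetical stand-in supplied by the caller. The "+1" in the estimator is a common convention assumed here; the text does not state which estimator was used.

```python
import random

def randomization_p_value(units, subset_size, stat_fn, observed,
                          n_rand=10000, seed=0):
    """Empirical P value for a partition statistic.

    units:       one entry per family (any per-family data stat_fn understands)
    subset_size: size of the random subsample (86 in the text)
    stat_fn:     hypothetical callable(subset, complement) -> float
    observed:    statistic obtained from the real genotype-based partition
    """
    rng = random.Random(seed)
    indices = list(range(len(units)))
    hits = 0
    for _ in range(n_rand):
        # Draw a random subsample; every other family forms the complement.
        chosen = set(rng.sample(indices, subset_size))
        subset = [units[i] for i in indices if i in chosen]
        complement = [units[i] for i in indices if i not in chosen]
        if stat_fn(subset, complement) >= observed:
            hits += 1
    # Add-one estimator (assumed convention) so the P value is never zero.
    return (hits + 1) / (n_rand + 1)

def sum_difference(subset, complement):
    """Toy statistic for illustration only: difference of per-family sums."""
    return sum(subset) - sum(complement)
```

A call such as `randomization_p_value(scores, 86, sum_difference, observed_difference)` would mirror the 86 versus 113 split, where `scores` is a hypothetical list of 199 per-family values. Linkage statistics are generally not simple per-family sums, so the toy statistic only demonstrates the resampling mechanics.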
RESULTS |
---|
As shown in Fig. 1, the remaining 15 informative SNPs were used to determine the pattern of LD in this region. PHASE software was used to estimate haplotypes, which were distributed into blocks and "tagged" according to user-defined parameters in HaploBlockFinder. The LD plot, shown in Fig. 2, illustrates seven haplotype blocks spanning the 78-kb region that were identified in the Ashkenazim. These seven blocks included three "singleton" blocks. A singleton refers to a single SNP that is not in LD with neighboring variants. As is seen in Fig. 2, the LD plot of D' indicates strong LD among neighboring SNPs across the P2 region and HNF4α; however, LD tended to decay within the ~45-kb gap separating HNF4α from its alternative upstream promoter. Although the pattern of LD across this region was not striking due to the presence of the singleton blocks, this was not an unusual finding considering that LD and distance are semi-independent over short distances (40). In a separate study, we compared the LD pattern by genotyping these 15 SNPs in a sample of Centre d'Etude du Polymorphisme Humain individuals (n = 34) and found that the block structure was not significantly different. P2 and HNF4α were distributed in the same blocks identified in the Ashkenazim, and the singleton blocks located within the 45-kb region were also observed (data not shown).
In an independent association analysis of type 2 diabetes in a Finnish sample from the FUSION study (31), several additional SNPs (Fig. 1) were observed to be associated with type 2 diabetes. We tested five of these SNPs and an additional SNP, rs3761184, to further define the length of the associated haplotype block in the Ashkenazim (Table 1). The two P2 proximal SNPs (rs1884613 and rs2144908) were found to be associated with type 2 diabetes in the Ashkenazim. Furthermore, these SNPs were found to be in strong LD with rs1884614 (D' = 0.98, r² = 0.95, and d² = 0.95 between rs1884613 and rs1884614 in the probands; D', r², and d² = 1.0 between rs2144908 and rs1884614 in the probands). Consequently, results from both studies identified a haplotype block spanning >10 kb of DNA that was associated with type 2 diabetes. In contrast, the FUSION-associated SNPs located near P1 (promoter proximal to the gene) were not associated with type 2 diabetes in the Ashkenazi sample.
Figure 3 reports the LOD score curves for the total sample of 199 multiplex families and the two subpartitions. As can be seen, the profiles for the two partitions were dramatically different. Indeed, it appears that the "A+" partition accounted for virtually all of the linkage signal on 20q12-13 present in our earlier analysis (3). For all 199 families, the maximum LOD score of 2.01 was located at D20S195. In the "A+" partition (families with the risk allele), the maximum LOD (2.72) also occurred at D20S195. By contrast, the "A−" partition (families without the risk allele) attained a LOD score of 0.17 at D20S195.
DISCUSSION |
---|
Significant SNPs were then tested to determine if they could resolve the etiologic heterogeneity by partitioning the families that provided the original linkage signal into homogeneous subgroups. The demonstration that partitioning our sample of multiplex families according to the proband's genotype at rs1884614 gave rise to significantly different LOD score profiles is prima facie evidence that the "A" allele (or an allele in strong LD with it) is a potent genetic risk factor predisposing to type 2 diabetes. We note that the P2 promoter of HNF4α was not located directly under our peak LOD score in the A+ partition. It would, indeed, be remarkable if the partitioning event enhanced the LOD score in the HNF4α interval (bounded in our data by D20S107 and D20S119) to a greater extent than in the more centromeric region where our original signal was maximized in these same families.
The remarkable similarity between our linkage partitioning findings and those reported by Silander et al. (31) for the FUSION study allows us to speculate that chromosome 20 may actually contain two distinct type 2 diabetes-predisposing regions. In our families, the strongest signal and the most significant partitioning occurred on 20q. The signal on 20p was less persuasive in terms of the absolute LOD scores. In addition, the interval on 20p is sufficiently distant from the location of the partitioning event at HNF4α that it is unlikely that the effects of the partitioning could propagate over such a large distance. Nonetheless, partitioning based on rs1884614 in our study or, equivalently given the degree of LD, on rs2144908 in the FUSION study appears to account for the original signal on 20q, which suggests that the region immediately upstream of the P2 promoter of HNF4α is an important contributor to risk.
It is reasonable to suggest that any of the four associated SNPs flanking the P2 promoter could have functional implications. For example, the expression of HNF4α P2-driven transcripts may be affected. Similarly, expression of adjacent hypothetical genes in the region may be affected. According to the current Entrez MapViewer (build 33), there are at least three predicted genes within the 78-kb region examined in this study (Fig. 1). These SNPs could be coding variants in yet-to-be-defined genes in this region. For example, SNP rs2144908 is positioned within the untranslated region of a predicted gene (FLJ39654) in which expressed sequence tags have been isolated from liver, kidney, and spleen. However, these predicted genes have not been described in pancreas.
In conclusion, it appears more likely that the four associated SNPs are regulatory variants or are in LD with a coding or regulatory variant that predisposes to type 2 diabetes. These SNPs do not appear to be in linkage disequilibrium with coding variants within HNF4α, as extensive dHPLC and sequence analysis failed to identify common nonsynonymous SNPs. A likely explanation for our results is that these SNPs are in fact markers for a chromosomal region that regulates expression of either HNF4α or one of the neighboring genes. This hypothesis can now be tested by measuring allele-specific transcription.
ACKNOWLEDGMENTS |
---|
We are greatly indebted to the investigators of the FUSION study for sharing their data before publication and for helpful discussions of the manuscript. We thank Dr. Anthony Hinrichs for access to his 120-processor Beowulf-class computer cluster used to obtain the empiric P values from 10,000 randomizations. We also thank Mark Daly and Jeffrey Barrett of the Whitehead Institute for Biomedical Research for assistance with computing the measures of LD and Kun Zhang of the Center for Genome Information at University of Cincinnati School of Medicine for assistance with the HaploBlockFinder software. Finally, the authors would like to thank Gary Skolnick for assistance with preparation of the manuscript.
FOOTNOTES |
---|
Additional information for this article can be found in an online appendix available at http://diabetes.diabetesjournals.org.
Address correspondence and reprint requests to M. Alan Permutt, MD, Division of Endocrinology, Diabetes and Metabolism, Washington University School of Medicine, St. Louis, MO 63110. E-mail: apermutt{at}im.wustl.edu
Received for publication September 10, 2003 and accepted in revised form January 13, 2004
dHPLC, denaturing high-performance liquid chromatography; FUSION, Finland-United States Investigation of NIDDM Genetics; htSNP, haplotype-tag single nucleotide polymorphism; HNF, hepatocyte nuclear factor; HWE, Hardy-Weinberg equilibrium; LD, linkage disequilibrium; MODY, maturity-onset diabetes of the young; SNP, single nucleotide polymorphism
REFERENCES |
---|