* Department of Genomics and Proteomics, Beijing Institute of Radiation Medicine
Chinese National Human Genome Center at Beijing, Beijing, China
Correspondence: E-mail: hefc{at}nic.bmi.ac.cn.
Abstract |
---|
Key Words: interleukin-13, single nucleotide polymorphism, positive selection
Introduction |
---|
An important step in studies of the role of genetic variation in disease risk is a description of the quality, quantity, and organization of genetic variation within and between human populations. To further understand genetic variability at the IL13 locus, we undertook a continuous scan for sequence variation patterns in unrelated individuals. The corresponding chimpanzee (Pan troglodytes) sequence was also completed, to serve as an outgroup for evolutionary and population genetic tests. Our study provides insight into the likely relative roles of selection and population history in establishing present-day genetic variation at the IL13 locus.
Materials and Methods |
---|
Laboratory Analysis
Overlapping primer sets covering the genomic sequence of IL13 (GenBank accession number AC074127.2), which spans more than 5 kb, were designed on the basis of size and overlap of PCR amplicons using Primer 3.0, release 0.9 (http://www-genome.wi.mit.edu/cgi-bin/primer/primer3_www.cgi). We performed PCR in a total volume of 25 µl, containing 0.2 mM dNTPs, 4.5 mM MgCl2, 150 pM of each primer, 1× standard buffer, 1× Q-solution, 1 U Hotstar Taq polymerase (Qiagen), and 10 ng of genomic DNA. PCR conditions were as follows: 41 cycles of denaturation at 95°C, annealing at 56°C, and primer extension at 72°C, each step for 45 s. The first step of denaturation and the last step of extension were 15 min and 5 min, respectively. PCR products were purified using a 96-well microtitre purification plate according to the manufacturer's protocol (Millipore). Cycle sequencing was performed on PCR products in both directions according to the manufacturer's instructions using ABI PRISM Dye Terminator Sequencing Kits with Amplitaq DNA polymerase, FS (PerkinElmer). Excess dye terminators were removed by ethanol precipitation. The extension products were evaporated to dryness under vacuum (Savant Instruments), resuspended in 10 µl ddH2O, heated for 2 min at 92°C, and loaded onto an Applied Biosystems 3700 sequencer.
SNP Identification
The ABI sequence software version 1.1 was used for lane tracking and first-pass base-calling (PerkinElmer). The Phred-Phrap-Consed-PolyPhred package was used to assemble the sequences and identify SNPs. Specific descriptions and documentation on the above programs are available at http://www.genome.washington.edu. Once identified by PolyPhred, SNPs were visually inspected by at least two observers. Each SNP position and each individual's genotype were confirmed by reamplifying and resequencing the SNP site from the same or opposite strand. Furthermore, because of the sequence overlap within the analyzed regions, more than one call for each genotype was often obtained for each position in a sample.
Data Analysis
Allele frequencies for each SNP were determined by gene counting, and the significance of deviations from Hardy-Weinberg equilibrium was tested using the random-permutation procedure implemented in the Arlequin package (http://lgb.unige.ch/arlequin/).
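As an illustration of the gene-counting step and of a permutation-style Hardy-Weinberg check, the following is a minimal sketch for a single biallelic SNP. It is not Arlequin's procedure; the function names `allele_frequency` and `hwe_permutation_p`, the choice of heterozygote count as the test statistic, and the default number of permutations are assumptions made for this example only.

```python
import random


def allele_frequency(n_AA, n_Aa, n_aa):
    """Frequency of allele A by gene counting: each AA carries two copies, each Aa one."""
    n_individuals = n_AA + n_Aa + n_aa
    return (2 * n_AA + n_Aa) / (2 * n_individuals)


def hwe_permutation_p(n_AA, n_Aa, n_aa, n_perm=2000, seed=0):
    """Monte Carlo p-value for departure from Hardy-Weinberg proportions.

    Alleles are shuffled and re-paired at random into genotypes; the statistic
    is the absolute deviation of the heterozygote count from its expectation.
    (Hypothetical helper for illustration, not the Arlequin implementation.)
    """
    rng = random.Random(seed)
    n_individuals = n_AA + n_Aa + n_aa
    p = allele_frequency(n_AA, n_Aa, n_aa)
    expected_het = 2 * p * (1 - p) * n_individuals
    observed_dev = abs(n_Aa - expected_het)

    # Pool of all sampled alleles, to be randomly re-paired into diploid genotypes.
    alleles = ["A"] * (2 * n_AA + n_Aa) + ["a"] * (2 * n_aa + n_Aa)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(alleles)
        het = sum(alleles[i] != alleles[i + 1] for i in range(0, 2 * n_individuals, 2))
        if abs(het - expected_het) >= observed_dev:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)
```

A sample with no heterozygotes at all, such as `hwe_permutation_p(20, 0, 20)`, gives a very small p-value, whereas genotype counts that match Hardy-Weinberg proportions exactly give a p-value of 1.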
IL13 haplotypes were assigned by the computer program PHASE (Stephens, Smith, and Donnelly 2001). The Network 3.0 package (http://www.fluxus-technology.com/sharenet.htm) was used to construct the minimum-mutation network, which reflects the mutational relationships among the inferred haplotypes and the evolutionary history of genetic changes at the IL13 locus, by means of the Reduced Median (RM) algorithm (Bandelt et al. 1995), with the common chimpanzee as an outgroup species. Trees were also constructed using the neighbor-joining, parsimony, and maximum-likelihood methods implemented in the PHYLIP software package (http://evolution.genetics.washington.edu/phylip.html). Two different statistical tests of the null hypothesis of neutrality, the haplotype partition test (HP test) (Hudson et al. 1994) and the haplotype diversity test (Hd test) (Depaulis and Veuille 1998), were carried out, assuming randomly generated haplotypes. Significance values for each of these two test statistics were estimated from $10^4$ coalescent simulations using the program ALLELIX (http://www.snv.jussieu.fr/mousset/), conditioned on the observed sample size and number of segregating sites, under a conservative assumption of no recombination.
To test for differentiation between populations, FST (Wright 1951), which reflects differences in allele frequencies among samples and increases as allele frequency differences between population samples become more pronounced, was calculated using the program Arlequin.
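To make the behavior of FST concrete, here is a minimal sketch of Wright's heterozygosity-based definition for one biallelic SNP, FST = (H_T - H_S) / H_T. It assumes equally weighted populations and ignores sampling corrections, so it is not the estimator that Arlequin computes; the function name `fst_biallelic` is hypothetical.

```python
def fst_biallelic(pop_freqs):
    """Wright's FST for one biallelic site from per-population allele frequencies.

    pop_freqs: frequency of one allele in each population (equal weights assumed).
    H_S is the mean within-population expected heterozygosity; H_T is the
    expected heterozygosity at the mean allele frequency.
    """
    if not pop_freqs:
        raise ValueError("at least one population frequency is required")
    k = len(pop_freqs)
    p_bar = sum(pop_freqs) / k
    h_t = 2 * p_bar * (1 - p_bar)
    h_s = sum(2 * p * (1 - p) for p in pop_freqs) / k
    if h_t == 0:
        # The site is monomorphic in the pooled sample, so there is no differentiation to measure.
        return 0.0
    return (h_t - h_s) / h_t
```

With identical frequencies in every population the function returns 0, and with alternative alleles fixed in two populations it returns 1, matching the description of FST growing as allele frequency differences become more pronounced.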
The human sequences were aligned with the chimpanzee sequence to identify fixed differences and the ancestral allele/haplotype. Sequence divergence was measured by aligning the chimpanzee outgroup sequence with the human polymorphic sample.
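The use of the chimpanzee sequence as an outgroup can be sketched as follows. The example assumes gap-free, equal-length aligned sequences and treats the chimpanzee base as ancestral whenever it matches one of two human alleles; the function names `polarize_site` and `fixed_differences` are hypothetical and do not come from any of the packages named in this section.

```python
def polarize_site(human_alleles, chimp_base):
    """Return (ancestral, derived) for a biallelic human SNP, or None if it cannot be polarized.

    The chimpanzee base is taken as the ancestral state when it matches one
    of the two alleles observed in the human sample.
    """
    alleles = sorted(set(human_alleles))
    if len(alleles) != 2 or chimp_base not in alleles:
        return None
    derived = alleles[0] if alleles[1] == chimp_base else alleles[1]
    return chimp_base, derived


def fixed_differences(human_seqs, chimp_seq):
    """Alignment positions (0-based) where all human sequences agree but differ from chimpanzee."""
    fixed = []
    for pos, chimp_base in enumerate(chimp_seq):
        column = {seq[pos] for seq in human_seqs}
        if len(column) == 1 and chimp_base not in column:
            fixed.append(pos)
    return fixed
```

Sites where the chimpanzee base matches neither human allele are left unpolarized here rather than guessed.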
Genetree software (http://www.maths.monash.edu.au/mbahlo/mpg/gtree.html) was used to estimate the likelihood that the observed outside-Africa haplotype distribution could arise through neutral evolution under a simple out-of-Africa model (Griffiths and Tavaré 1994).
Three measures of diversity were computed for each of the three population samples and the pooled samples: (1) Watterson's $\theta_W$ (Watterson 1975), based on the number of segregating sites in the sample, an estimate of the expected per-site nucleotide heterozygosity, theoretically equal to the neutral mutation parameter $4N_e\mu$; (2) $\pi$ (Nei and Li 1979), the direct estimate of per-site heterozygosity derived from the average number of pairwise sequence differences in the sample; and (3) $\theta_H$ (Fay and Wu 2000), a summary that gives more weight to high-frequency derived variants. To test whether the frequency spectrum of mutations conformed to the expectations of the standard neutral model, we calculated the values of two test statistics: (1) Tajima's D statistic (Tajima 1989), which considers the difference between $\theta_W$ and $\pi$, and (2) Fay and Wu's H statistic, which considers the difference between $\theta_H$ and $\pi$. Under neutrality, the two test statistics should be close to 0. Fu and Li's D statistic (Fu and Li 1993) was used to compare the observed number of singleton polymorphisms with those expected under a neutral model. Significance values for each of the above three test statistics were estimated from $10^4$ coalescent simulations of a Wright-Fisher equilibrium model that condition on the sample size and level of polymorphism as the observed data, with no recombination (Hudson 1990), using DnaSP Software (Rozas and Rozas 1999) and Fay's H-test version 1.0, respectively. The Hudson/Kreitman/Aguadé (HKA) test (Hudson 1987) was used to compare diversity patterns in the IL13 region with diversity patterns found at two other loci: β-globin (Harding et al. 1997) and DMD intron 44 (Nachman and Crowell 2000a), using the HKA computer program (http://lifesci.rutgers.edu/~heylab).
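The diversity measures and frequency-spectrum statistics described above can be computed from the derived-allele count at each segregating site. The sketch below uses the standard textbook formulas and returns per-locus values (divide by the sequence length for per-site values). It is not the code of DnaSP or Fay's H-test program, it does not perform the coalescent simulations used for significance testing, and the function name `diversity_stats` is an assumption of this example.

```python
import math


def diversity_stats(derived_counts, n):
    """Per-locus theta_W, pi, theta_H, Tajima's D, and Fay and Wu's H.

    derived_counts: derived-allele count (1..n-1) at each segregating site.
    n: number of sampled chromosomes.
    """
    if n < 2 or any(not 0 < k < n for k in derived_counts):
        raise ValueError("each derived count must lie strictly between 0 and n")
    s = len(derived_counts)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    pairs = n * (n - 1)

    theta_w = s / a1                                            # Watterson (1975)
    pi = sum(2.0 * k * (n - k) for k in derived_counts) / pairs  # mean pairwise differences
    theta_h = sum(2.0 * k * k for k in derived_counts) / pairs   # Fay and Wu (2000)

    # Variance terms for Tajima's D (Tajima 1989).
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)
    var_d = e1 * s + e2 * s * (s - 1)
    tajima_d = (pi - theta_w) / math.sqrt(var_d) if var_d > 0 else float("nan")

    return {
        "theta_W": theta_w,
        "pi": pi,
        "theta_H": theta_h,
        "tajima_D": tajima_d,
        "fay_wu_H": pi - theta_h,  # negative when high-frequency derived variants are in excess
    }
```

For example, `diversity_stats([1, 2, 3], n=4)` describes three segregating sites in a sample of four chromosomes; the site with a derived count of 3 contributes most to $\theta_H$ and pulls H below zero.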
Results and Discussion |
---|
To further validate the hypothesis of positive selection suggested by the locus-specific pattern of unusual population differentiation at the IL13 region, a series of neutral model tests was applied to each population and to the pooled populations (table 2). Tajima's D statistic suggested no statistically significant deviation from selective neutrality. Fu and Li's D test indicated a significant excess of singletons in the overall population but not in the individual or pooled outside-Africa populations. In addition, in a comparison of the level of IL13 polymorphism and divergence with that observed at two other sampled human loci, β-globin (Harding et al. 1997) and DMD intron 44 (Nachman and Crowell 2000a), the HKA test suggested no significant differences (P > 0.05, data not shown).
However, when we applied the more recent Fay and Wu's H test, which compares the fit of the observed derived allele frequency spectrum of the SNPs with that expected under neutrality (fig. 1), the result was significant in the Chinese (P = 0.043), Caucasian (P = 0.010), and pooled non-African (P = 0.032) populations but not in the African population (P = 0.783). A significant H test is thought to be the unique signature of a very recent sweep (Fay and Wu 2000; Otto 2000).
Having found evidence that positive selection may have operated at the IL13 locus in the Caucasian and Chinese populations, we attempted to clarify how selection might have worked in this system. We constructed haplotypes on the basis of the genotype data from 11 SNPs selected to span most of IL13 using the PHASE program. A 97% phase assignment was made with greater than 90% certainty. Thirty-eight haplotypes were identified. Consistent with the substantial FST differences among populations noted above, haplotype diversity, the probability that two haplotypes randomly chosen from the sample will be different, also appeared quite different, at 0.979 in the Africans, 0.857 in the Chinese, and 0.551 in the Caucasians.
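Haplotype diversity, defined here as the probability that two haplotypes randomly chosen from the sample are different, can be computed directly from the list of inferred haplotypes. The following is a minimal sketch assuming sampling without replacement; the function name `haplotype_diversity` and the labels in the usage note are hypothetical.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Probability that two distinct chromosomes drawn from the sample carry different haplotypes.

    haplotypes: one label (or sequence string) per sampled chromosome.
    Equivalent to n / (n - 1) * (1 - sum of squared haplotype frequencies).
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two chromosomes are required")
    counts = Counter(haplotypes).values()
    return (n * n - sum(c * c for c in counts)) / (n * (n - 1))
```

A sample dominated by one haplotype, such as `["h3"] * 8 + ["h1", "h4"]`, yields a much lower value than a sample in which nearly every chromosome carries a different haplotype.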
Further, the outside-Africa population data sets suggested an unusual haplotype distribution. In the Caucasian population, haplotype 3 was the only predominant haplotype, with a frequency of 0.67. In the Chinese population, haplotypes 1, 3, and 4 were most common, with frequencies of 0.20, 0.26, and 0.19, respectively, compared with frequencies of only 0.02, 0.07, and 0.02, respectively, in the African population (table 3). To determine whether the observed haplotype distribution was consistent with the expectations of a neutral equilibrium model, we applied two tests of neutrality: Hd and HP. The Hd test suggested a significant reduction, relative to the neutral expectation, in the haplotype diversity of the Caucasian sample (table 2). The HP test further revealed that polymorphic sites are not uniformly distributed among the sequences in the Caucasian sample (table 2). The three most common haplotypes may have expanded outside Africa because of natural selection, which may fix different haplotypes in different populations.
The likely genealogical history of the observed SNPs was inferred from the RM network (Bandelt et al. 1995) constructed for 12 common haplotypes (fig. 2), using the Network 3.0 software package. As shown in the RM network, the picture was dominated by two main clusters that differ at five SNP sites (T1922C, A2043G, A2524G, C2579A, and T2748C). One cluster contained haplotypes 1 and 10 (referred to here as "cluster A"), and the other contained haplotypes 2, 3, 4, 7, and 8 (referred to here as "cluster B"). The same clusters and root position were also seen in trees constructed by the neighbor-joining and parsimony methods implemented in the PHYLIP package. Haplotypes from cluster A and cluster B were found in approximately 12% and 54% of sampled chromosomes, respectively. Both clusters were found in the Chinese, Caucasian, and African samples. The presence of two distinct haplotype clusters with broadly overlapping geographical distributions provided further evidence for positive selection on the IL13 region. Furthermore, the RM network offers more detailed information and indicates that positive selection may have acted differently in different populations. Indeed, there is an accumulating number of examples in which distinct selective pressures appear to apply in different environments (Harris and Hey 1999; Rana et al. 1999; Hamblin and Di Rienzo 2000). The genetic variation pattern in the Caucasian population can be explained by the simple selective sweep hypothesis (Maynard Smith and Haigh 1974; Hudson 1990). However, in the Chinese population, the situation may be more complex. It is possible that an older selective sweep drove cluster B (including haplotypes 2, 3, 4, 7, and 8) to high frequency. The higher haplotype diversity of cluster B is consistent with its relative antiquity.
Afterwards, a more recent hitchhiking event may have taken place and driven cluster A (including haplotypes 1 and 10) to high frequency. This second selective sweep in the Chinese population may still be ongoing, given the very low frequency of cluster A outside of the Chinese population (table 3). Alternatively, the focal point of positive selection may lie somewhat farther away, allowing recombination to occur before haplotypes 1 and 10 rise above a frequency of 0.20.
Another plausible factor that could explain the present-day variation pattern at the IL13 locus in the Chinese population is balancing selection. Balancing selection can maintain old evolutionary lineages at low frequencies by protecting them from genetic drift (Lewontin and Hubby 1966) and can also cause an excess of high-frequency derived variants. However, the strong LD (data not shown) and the small number of haplotypes (eight) in the Chinese population indicate that the balancing selection hypothesis is unlikely.
Although selection is one primary explanation for the observed variation patterns in the Chinese and Caucasian populations, the effect of population history cannot be ruled out completely. In fact, SNPs and haplotypes found outside Africa were mostly a subset of those found within Africa (tables 1 and 3). It is conceivable that a reduction in population size (e.g., a founder effect caused by migration out of Africa) would increase the frequencies of some chromosomes (haplotypes), leaving one predominant haplotype (i.e., haplotype 3) in the Caucasian population and three common haplotypes (i.e., haplotypes 1, 3, and 4) in the Chinese population. On inspection, however, it seems improbable that the outside-Africa haplotype distribution could be obtained by sampling from the African population. Coalescent modeling of this process using the Genetree program (data not shown) shows that the three most common outside-Africa haplotypes are among the least likely to become common after migration. This suggests that population growth alone is an unlikely explanation for the unusual population-specific haplotype distribution pattern at the IL13 locus. In addition, under population growth, high-frequency derived alleles are expected to be less abundant than under a constant population size model (Fay and Wu 2000). Thus, population growth alone would not be expected to make Fay and Wu's H statistic significant, although it can do so for Fu and Li's D and Tajima's D statistics.
Conclusions |
---|
Acknowledgements |
---|
Footnotes |
---|
Literature Cited |
---|
Bandelt, H. J., P. Forster, B. C. Sykes, and M. B. Richards. 1995. Mitochondrial portraits of human populations using median networks. Genetics 141:743-753.
Cavalli-Sforza, L. L., P. Menozzi, and A. Piazza. 1994. The history and geography of human genes. Princeton University Press, Princeton.
Clark, A. G., K. M. Weiss, and D. A. Nickerson, et al. (11 co-authors). 1998. Haplotype structure and population genetic inferences from nucleotide-sequence variation in human lipoprotein lipase. Am. J. Hum. Genet. 63:595-612.
Depaulis, F., and M. Veuille. 1998. Neutrality tests based on the distribution of haplotypes under an infinite-site model. Mol. Biol. Evol. 15:1788-1790.
Fay, J. C., and C. I. Wu. 2000. Hitchhiking under positive Darwinian selection. Genetics 155:1405-1413.
Fu, Y. X., and W. H. Li. 1993. Statistical tests of neutrality of mutations. Genetics 133:693-709.
Fullerton, S. M., A. Bartoszewicz, G. Ybazeta, Y. Horikawa, G. I. Bell, K. K. Kidd, N. J. Cox, R. R. Hudson, and A. Di Rienzo. 2002. Geographic and haplotype structure of candidate type 2 diabetes susceptibility variants at the calpain-10 locus. Am. J. Hum. Genet. 70:1096-1106.
Glazier, A. M., J. H. Nadeau, and T. J. Aitman. 2002. Finding genes that underlie complex traits. Science 298:2345-2349.
Graves, P. E., M. Kabesch, M. Halonen, C. J. Holberg, M. Baldini, C. Fritzsch, S. K. Weiland, R. P. Erickson, E. von Mutius, and F. D. Martinez. 2000. A cluster of seven tightly linked polymorphisms in the IL-13 gene is associated with total serum IgE levels in three populations of white children. J. Allergy Clin. Immunol. 105:506-513.
Griffiths, R. C., and S. Tavaré. 1994. Ancestral inference in population genetics. Stat. Sci. 9:307-319.
Hamblin, M. T., and A. Di Rienzo. 2000. Detection of the signature of natural selection in humans: evidence from the Duffy blood group locus. Am. J. Hum. Genet. 66:1669-1679.
Harding, R. M., S. M. Fullerton, R. C. Griffiths, and J. B. Clegg. 1997. A gene tree for beta-globin sequences from Melanesia. J. Mol. Evol. 44(suppl 1):S133-138.
Harris, E. E., and J. Hey. 1999. X chromosome evidence for ancient human histories. Proc. Natl. Acad. Sci. USA 96:3320-3324.
Heinzmann, A., X. Q. Mao, and M. Akaiwa, et al. (32 co-authors). 2000. Genetic variants of IL-13 signalling and human asthma and atopy. Hum. Mol. Genet. 9:549-559.
Howard, T. D., P. A. Whittaker, A. L. Zaiman, G. H. Koppelman, J. Xu, M. T. Hanley, D. A. Meyers, D. S. Postma, and E. R. Bleecker. 2001. Identification and association of polymorphisms in the interleukin-13 gene with asthma and atopy in a Dutch population. Am. J. Respir. Cell Mol. Biol. 25:377-384.
Hudson, R. R. 1987. Estimating the recombination parameter of a finite population model without selection. Genet. Res. 50:245-250.
Hudson, R. R. 1990. Gene genealogies and the coalescent process. Oxf. Surv. Evol. Biol. 7:1-44.
Hudson, R. R., K. Bailey, D. Skarecky, J. Kwiatowski, and F. J. Ayala. 1994. Evidence for positive selection in the superoxide dismutase Sod region of Drosophila melanogaster. Genetics 136:1329-1340.
International SNP Map Working Group. 2001. A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature 409:928-933.
Jaruzelska, J., E. Zietkiewicz, M. Batzer, D. E. Cole, J. P. Moisan, R. Scozzari, S. Tavare, and D. Labuda. 1999. Spatial and temporal distribution of the neutral polymorphisms in the last ZFX intron: analysis of the haplotype structure and genealogy. Genetics 152:1091-1101.
Jobling, M. A., G. A. Williams, G. A. Schiebel, G. A. Pandya, G. A. McElreavey, G. A. Salas, G. A. Rappold, N. A. Affara, and C. Tyler-Smith. 1998. A selective difference between human Y-chromosomal DNA haplotypes. Curr. Biol. 8:1391-1394.
Kaessmann, H., F. Heissig, A. von Haeseler, and S. Paabo. 1999. DNA sequence variation in a non-coding region of low recombination on the human X chromosome. Nat. Genet. 22:78-81.
Lalouel, J. M. 2001. From genetics to mechanism of disease liability. Adv. Genet. 42:517-533.
Leung, T. F., N. L. Tang, I. H. Chan, A. M. Li, G. Ha, and C. W. Lam. 2001. A polymorphism in the coding region of interleukin-13 gene is associated with atopy but not asthma in Chinese children. Clin. Exp. Allergy 31:1515-1521.
Lewontin, R. C., and J. L. Hubby. 1966. A molecular approach to the study of genic heterozygosity in natural populations. II. Amount of variation and degree of heterozygosity in natural populations of Drosophila pseudoobscura. Genetics 54:595-609.
Liu, X., R. Nickel, K. Beyer, U. Wahn, E. Ehrlich, L. R. Freidhoff, B. Bjorksten, T. H. Beaty, and S. K. Huang. 2000. An IL13 coding region variant is associated with a high total serum IgE level and atopic dermatitis in the German multicenter atopy study MAS-90. J. Allergy Clin. Immunol. 106:167-170.
Maynard Smith, J. M., and J. Haigh. 1974. The hitchhiking effect of a favourable gene. Genet. Res. 23:23-35.
Nachman, M. W. 1998. Y chromosome variation of mice and men. Mol. Biol. Evol. 15:1744-1750.
Nachman, M. W., and S. L. Crowell. 2000a. Contrasting evolutionary histories of two introns of the duchenne muscular dystrophy gene, Dmd, in humans. Genetics 155:1855-1864.
Nachman, M. W., and S. L. Crowell. 2000b. Estimate of the mutation rate per nucleotide in humans. Genetics 156:297-304.
Nei, M., and W. H. Li. 1979. Mathematical model for studying genetic variation in terms of restriction endonucleases. Proc. Natl. Acad. Sci. USA 76:5269-5273.
Otto, S. P. 2000. Detecting the form of selection from DNA sequence data. Trends Genet. 16:526-529.
Rana, B. K., D. Hewett-Emmett, and L. Jin, et al. (12 co-authors). 1999. High polymorphism at the human melanocortin 1 receptor locus. Genetics 151:1547-1557.
Rieder, M. J., S. L. Taylor, A. G. Clark, and D. A. Nickerson. 1999. Sequence variation in the human angiotensin converting enzyme. Nat. Genet. 22:59-62.
Rosenberg, N. A., J. K. Pritchard, J. L. Weber, H. M. Cann, K. K. Kidd, L. A. Zhivotovsky, and M. W. Feldman. 2002. Genetic structure of human populations. Science 298:2381-2385.
Rozas, J., and R. Rozas. 1999. DnaSP version 3: an integrated program for molecular population genetics and molecular evolution analysis. Bioinformatics 15:174-175.
Simonsen, K. L., G. A. Churchill, and C. F. Aquadro. 1995. Properties of statistical tests of neutrality for DNA polymorphism data. Genetics 141:413-429.
Stephens, M., N. J. Smith, and P. Donnelly. 2001. A new statistical method for haplotype reconstruction from population data. Am. J. Hum. Genet. 68:978-989.
Subrahmanyan, L., M. A. Eberle, A. G. Clark, L. Kruglyak, and D. A. Nickerson. 2001. Sequence variation and linkage disequilibrium in the human T-cell receptor beta TCRB locus. Am. J. Hum. Genet. 69:381-395.
Tajima, F. 1989. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123:585-595.
Taylor, M. F., Y. Shen, and M. E. Kreitman. 1995. A population genetic test of selection at the molecular level. Science 270:1497-1499.
van der Pouw Kraan, T. C., A. van Veen, L. C. Boeije, S. A. van Tuyl, E. R. de Groot, S. O. Stapel, A. Bakker, C. L. Verweij, L. A. Aarden, and J. S. van der Zee. 1999. An IL-13 promoter polymorphism associated with increased risk of allergic asthma. Genes Immunol. 1:61-65.
Watterson, G. A. 1975. On the number of segregating sites in genetical models without recombination. Theor. Popul. Biol. 7:256-276.
Wright, S. 1951. The genetical structure of populations. Ann. Eugenics 15:323-354.
Yu, N., Z. Zhao, and Y. X. Fu, et al. (11 co-authors). 2001. Global patterns of human DNA sequence variation in a 10-kb region on chromosome 1. Mol. Biol. Evol. 18:214-222.
Zhao, Z., L. Jin, and Y. X. Fu, et al. (13 co-authors). 2000. Worldwide DNA sequence variation in a 10-kilobase noncoding region on human chromosome 22. Proc. Natl. Acad. Sci. USA 97:11354-11358.