* Computational Genomics and Department of Agronomy, Purdue University; Department of Statistics, Purdue University
Correspondence: E-mail: simonsen{at}stat.purdue.edu.
Abstract
Key Words: linkage disequilibrium; recombination; haplotype blocks; coalescent simulations
Introduction
Because recombination is the key factor that gradually disrupts allelic associations after an LD-generating event (such as a population bottleneck or population admixture), clustering of recombination in small hot spots could define the boundaries of larger haplotype block segments of reduced recombination (Goldstein 2001; Nachman 2002; Jeffreys et al. 2004). This intuitively attractive hypothesis is supported by several lines of evidence: the demonstration of spatial variation in recombination rates by comparing genetic and physical maps (Yu et al. 2001); the coincidence of fine-scale recombination hot spots with haplotype block boundaries in sperm-typing studies (Jeffreys, Kauppi, and Neumann 2001; May et al. 2002; Kauppi, Sajantila, and Jeffreys 2003); and, in simulation studies, a better fit to empirically observed LD patterns for models that incorporate recombination clustering than for uniform recombination models (Reich et al. 2002; Wall and Pritchard 2003a).
However, simulation studies also show that the LD block structure can be affected by a range of factors other than recombination rate variability, such as marker density (Wang et al. 2002; Phillips et al. 2003; Wall and Pritchard 2003a), population demographic history (Wang et al. 2002; Stumpf and Goldstein 2003; Zhang et al. 2003; Anderson and Slatkin 2004), gene conversion (Przeworski and Wall 2001), ascertainment bias in marker selection (Akey et al. 2003; Phillips et al. 2003; Zhang et al. 2003), and, conceivably, stochastic clustering of random recombinations (Subrahmanyan et al. 2001). Moreover, although modeling recombination hot spots does produce corresponding LD block boundaries, blocks also arise in areas of low recombination (Wall and Pritchard 2003b; Zhang et al. 2003). These studies show that a block-like LD structure can develop in the absence of recombination rate heterogeneity. The recombination hot spot hypothesis implies a strong relationship between recombination localization and haplotype block boundaries, but the observation that block boundaries arise readily under uniform recombination challenges the assumption that hot spots are responsible for most of the observed block-like LD patterns. This finding is relevant for the haplotype-tagging approach to gene mapping. That approach is most useful when the block structure is consistent within and between populations. Such consistency can be expected if the structure is biologically determined (via recombination hot spots), but not if it is mainly determined stochastically (Wang et al. 2002; Phillips et al. 2003); in the latter case, the structure is shared only to the extent that the populations have a common history.
Using a new coalescent-based program that allows modeling of the genealogy of haplotypes consisting of many markers at fixed distances, we simulated the evolution of haplotypes under a uniform recombination rate in order to investigate the relationship between historic recombination and present-day haplotype blocks. We show that the relation between historic recombination frequency and block boundaries can be quite weak, which emphasizes the potential of other stochastic events to shape LD blocks. We stress the impact on block structure of the timing of mutation relative to recombination events, and we illustrate this impact via the effects of historic population bottlenecks and of marker frequency constraints.
Materials and Methods
Coalescent Program
We simulated genealogies based on a coalescent model with recombination (Kaplan and Hudson 1985; Simonsen and Churchill 1997). This process has been made efficient enough to simulate thousands of loci by exploiting the sparsity of the Markov chain structure (Simonsen et al. in preparation). Our implementation differs from that of Hudson (1983) in that a recombination rate is specified for each pair of adjacent loci, and multiple recombination events are permitted at any given position. This permits a more realistic simulation of a genome with markers at prespecified locations. For every genealogy, we recorded the actual number of recombination events that occurred at each position.
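The per-interval bookkeeping of recombination events can be illustrated with a minimal Python sketch of the standard finite-loci coalescent with recombination (an ancestral recombination graph traced backwards in time, in the spirit of Hudson's algorithm). It is not the authors' program and does not use the sparse Markov chain formulation. The function name `count_recombinations`, the per-interval scaled rates `rho[i]` = 4Nr, and the choice to simulate only the sequence of events (not their times, which are not needed to count events) are assumptions made for illustration.

```python
import random
from typing import List, Set


def count_recombinations(n: int, rho: List[float], seed: int = 0) -> List[int]:
    """Hypothetical sketch: return the number of recombination events in each
    interval between adjacent loci in the history of n sampled haplotypes.

    rho[i] is the scaled recombination rate 4*N*r between loci i and i + 1,
    so there are len(rho) + 1 loci. Only the embedded jump chain of the
    coalescent is simulated, until every locus has reached its MRCA.
    """
    rng = random.Random(seed)
    m = len(rho) + 1
    # Each lineage is the set of loci at which it carries ancestral material
    # whose MRCA has not yet been reached.
    lineages: List[Set[int]] = [set(range(m)) for _ in range(n)]
    carriers = [n] * m  # number of lineages carrying each locus
    counts = [0] * (m - 1)

    while len(lineages) > 1:
        k = len(lineages)
        # A lineage can only recombine between its outermost ancestral loci;
        # per-lineage rate is rho / 2 in units of 2N generations.
        rates = [sum(rho[min(a):max(a)]) / 2 for a in lineages]
        coal_rate = k * (k - 1) / 2
        if rng.random() * (coal_rate + sum(rates)) < coal_rate:
            # Coalescence of a random pair of lineages.
            i, j = rng.sample(range(k), 2)
            a, b = lineages[i], lineages[j]
            for locus in a & b:
                carriers[locus] -= 1
            # Loci now carried by a single lineage have reached their MRCA.
            merged = {locus for locus in a | b if carriers[locus] > 1}
            for idx in sorted((i, j), reverse=True):
                lineages.pop(idx)
            if merged:
                lineages.append(merged)
        else:
            # Recombination: pick a lineage in proportion to its rate, then a
            # breakpoint between loci b and b + 1 in proportion to rho[b].
            idx = rng.choices(range(k), weights=rates)[0]
            a = lineages[idx]
            lo, hi = min(a), max(a)
            b = rng.choices(range(lo, hi), weights=rho[lo:hi])[0]
            counts[b] += 1
            left = {locus for locus in a if locus <= b}
            lineages[idx] = left
            lineages.append(a - left)
    return counts
```

Because events are tallied per interval, an interval with `rho[i] = 0` never accumulates recombinations, while several events may fall in the same interval.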
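Population bottlenecks were imposed on the genealogies by rescaling branch lengths in the standard way (Griffiths and Tavaré 1994). Briefly, if U is a coalescence time generated under a constant population size model, then the corresponding time T under a bottleneck model is the solution of

$$\int_0^{T} \frac{N_0}{N(s)}\,ds = U,$$

where N_0 is the constant (reference) effective population size, N(s) is the effective population size at time s before the present, and U and T are measured in the same time units.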
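For a single bottleneck of constant size, this time change has a simple piecewise form. The hypothetical `bottleneck_time` function below sketches it, assuming times are counted in generations before the present, that the genealogy was simulated under constant size `n_ref`, and that the bottleneck (size `n_bottleneck`, lasting `duration` generations) begins `onset` generations ago; `onset` is a free parameter here, not a value taken from the text.

```python
def bottleneck_time(u: float, onset: float, duration: float,
                    n_ref: float, n_bottleneck: float) -> float:
    """Map a coalescence time u (generations ago, constant size n_ref) to the
    corresponding time under a single bottleneck (hypothetical sketch)."""
    # Coalescence runs n_ref / n_bottleneck times faster inside the bottleneck.
    speedup = n_ref / n_bottleneck
    if u <= onset:
        return u  # more recent than the bottleneck: unchanged
    # Amount of constant-size time that the bottleneck absorbs.
    compressed = duration * speedup
    if u <= onset + compressed:
        return onset + (u - onset) / speedup
    # Older than the bottleneck: shifted toward the present.
    return u - duration * (speedup - 1)
```

With `n_ref = 10000`, `n_bottleneck = 10`, and `duration = 50`, 50,000 generations of constant-size coalescent time are squeezed into the 50 bottleneck generations, and events older than that are shifted toward the present by 49,950 generations. Because the map is monotone, the topology of each genealogy is preserved; only branch lengths change.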
Biallelic markers were simulated by imposing exactly one mutation at each locus, on a branch chosen with probability proportional to its length, subject to specified constraints on marker frequencies. For example, if the minor allele frequency was required to be at least 0.2, then branches leading to fewer than 0.2n or more than 0.8n of the n sampled sequences were not eligible for mutation.
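The constrained placement of a single mutation per locus can be sketched as follows. The tree representation (a list of `(length, descendants)` pairs for the branches of one local genealogy) and the function name `place_mutation` are hypothetical; the eligibility rule follows the example in the text.

```python
import random
from typing import List, Tuple


def place_mutation(branches: List[Tuple[float, int]], n: int, min_maf: float,
                   rng: random.Random) -> int:
    """Return the index of the branch that receives the single mutation.

    A branch with k sampled descendants is eligible only if
    min_maf * n <= k <= n - min_maf * n, so the derived allele frequency
    k / n lies in [min_maf, 1 - min_maf]. Among eligible branches, the
    choice is proportional to branch length.
    """
    eligible = [i for i, (_, k) in enumerate(branches)
                if min_maf * n <= k <= n - min_maf * n]
    if not eligible:
        raise ValueError("no branch satisfies the allele-frequency constraint")
    weights = [branches[i][0] for i in eligible]
    return rng.choices(eligible, weights=weights)[0]
```

With `min_maf = 0` every branch is eligible, which recovers unconstrained placement proportional to branch length.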
The coalescent simulation program is available at http://www.stat.purdue.edu/simonsen/Argos/.
Simulation Settings
Unless stated otherwise, we simulated genealogies for 1-Mb segments with equidistant markers at 1-kb (1,000 markers) or 5-kb (200 markers) intervals, assuming a uniform recombination rate of 1 cM/Mb (Kruglyak 1999). In all simulations the baseline effective population size was 10,000. To prevent differences between models from being confounded by stochastic variation among genealogies, the effects of marker frequency and population demographic history on haplotype blocks were assessed using the same underlying genealogies (20 replicates) for all models. Genealogies simulated under the constant population size model were compressed over part of the time range to fit the bottleneck scenarios and were subsequently subjected to the various mutation models that constrained the minor allele frequency. During a bottleneck, the effective population size was reduced to 10 for 50 generations. Note that these severe bottleneck models are not intended to reflect human demographic history, but to illustrate how the relative timing of mutation and recombination events affects haplotype blocks.
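To relate these settings to the scaled parameters of coalescent theory, the population recombination rate per marker interval can be worked out as below. This assumes the common diploid convention ρ = 4N_e r with N_e = 10,000, and that over such short distances 1 cM corresponds to a recombination fraction of 0.01; the text does not state the program's internal parameterization.

$$
\begin{aligned}
r_{1\,\mathrm{kb}} &= 1\ \mathrm{cM/Mb} \times 0.001\ \mathrm{Mb} = 0.001\ \mathrm{cM} \approx 1 \times 10^{-5},\\
\rho_{1\,\mathrm{kb}} &= 4 N_e\, r_{1\,\mathrm{kb}} = 4 \times 10^{4} \times 10^{-5} = 0.4,\\
\rho_{5\,\mathrm{kb}} &= 5 \times 0.4 = 2.0, \qquad \rho_{1\,\mathrm{Mb}} = 1000 \times 0.4 = 400.
\end{aligned}
$$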
Linkage Disequilibrium and Haplotype Blocks
We used Lewontin's D' (Lewontin 1964) as a measure of pairwise LD, and we defined a haplotype block as a contiguous region of at least three loci in which |D'| ≥ 0.9 for all pairs of loci within the block (following Phillips et al. 2003). In the simulated example with a small sample size, we also report Hill and Robertson's r² measure of LD (Hill and Robertson 1968), which is less sensitive to missing genotypes. To avoid overlapping blocks, we used a greedy algorithm (Zhang and Jin 2003) that maximizes block length. With loci spaced at 1 kb, block length (in kb) was taken to be equal to the number of loci contained in the block.
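The LD measures and the block definition can be sketched in Python as below. D' and r² follow their textbook definitions for two biallelic loci; the block step is a generic longest-block-first greedy search written for illustration and may differ in detail from HaploBlockFinder (Zhang and Jin 2003). Function names and the 0/1 haplotype encoding are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def dprime_r2(haps: Sequence[Sequence[int]], i: int, j: int) -> Tuple[float, float]:
    """Lewontin's D' and Hill-Robertson r^2 for loci i and j (alleles 0/1)."""
    n = len(haps)
    p_a = sum(h[i] for h in haps) / n          # freq. of allele 1 at locus i
    p_b = sum(h[j] for h in haps) / n          # freq. of allele 1 at locus j
    p_ab = sum(h[i] & h[j] for h in haps) / n  # freq. of the 1-1 haplotype
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return math.nan, math.nan              # monomorphic locus: undefined
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return d / d_max, d * d / denom


def abs_dprime_matrix(haps: Sequence[Sequence[int]]) -> List[List[float]]:
    """Symmetric matrix of |D'| for all pairs of loci."""
    m = len(haps[0])
    mat = [[1.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            mat[i][j] = mat[j][i] = abs(dprime_r2(haps, i, j)[0])
    return mat


def greedy_blocks(dp: List[List[float]], threshold: float = 0.9,
                  min_loci: int = 3) -> List[Tuple[int, int]]:
    """Non-overlapping blocks of >= min_loci contiguous loci in which every
    pair has |D'| >= threshold, taking the longest free block first."""
    m = len(dp)
    # ok[i][j]: every pair of loci within i..j meets the threshold.
    ok = [[False] * m for _ in range(m)]
    for i in range(m):
        ok[i][i] = True
        for j in range(i + 1, m):
            ok[i][j] = ok[i][j - 1] and all(dp[k][j] >= threshold for k in range(i, j))
    used = [False] * m
    blocks = []
    while True:
        best = None
        for i in range(m):
            for j in range(i + min_loci - 1, m):
                if ok[i][j] and not any(used[i:j + 1]):
                    if best is None or j - i > best[1] - best[0]:
                        best = (i, j)
        if best is None:
            return sorted(blocks)
        blocks.append(best)
        for k in range(best[0], best[1] + 1):
            used[k] = True
```

With 1-kb marker spacing, the length in kb of a returned block `(i, j)` is `j - i + 1`, the number of loci it contains.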
Results and Discussion
Our simulations show that haplotype block structure can be only weakly related to local historic recombination frequency, because only a small proportion of recombination events may have a discernible effect on present-day LD. Observing haplotype blocks therefore does not by itself justify inferring recombination hot spots; additional evidence is needed to support their role in shaping the haplotype block structure.
Compelling evidence for recombination hot spots shaping LD patterns, albeit at a limited scale, is provided by sperm-typing studies that show concordance between population-level LD block boundaries and meiotic recombination clustering within single individuals (Jeffreys, Kauppi, and Neumann 2001; May et al. 2002; Kauppi, Sajantila, and Jeffreys 2003). Simulation studies also suggest a role for recombination hot spots, as some characteristics of human LD block patterns, such as occasional very long blocks or the degree of genomic block coverage, are not easily reproduced under uniform recombination (Wang et al. 2002; Wall and Pritchard 2003a; Zhang et al. 2003; but see Phillips et al. 2003). On the other hand, our results suggest that local LD patterns may be affected more by the relative timing of historic recombination and mutation events in their genomic region than by the total number of recombinations, and infrequent recombination events can easily produce block boundaries in areas of reduced recombination (Wall and Pritchard 2003b). Furthermore, the human LD block structure appears sensitive to population demographic history in a way that is consistent with stochastic determination of block boundaries under uniform recombination. For instance, African samples consistently show less extensive haplotype blocks than samples from the more recently founded European population (Gabriel et al. 2002; Wall and Pritchard 2003b). In some studies, block boundaries have been reported to correspond well across populations (Gabriel et al. 2002; Kauppi, Sajantila, and Jeffreys 2003), consistent with the idea that recombination hot spots shape the haplotype block structure. However, these studies considered mainly old mutations (that is, markers with high minor allele frequencies) that predate some or all of the population separations. This strategy can detect shared block structures that derive from shared population history, but it is not efficient at detecting population-specific block structures that developed after the populations separated (Wang et al. 2002; Zhang et al. 2003).
Our simulations assume selective neutrality of mutations and therefore do not capture the effect of selection on LD patterns. Selection can affect haplotype block structure: a beneficial mutation can sweep through the population together with linked alleles of the haplotype on which it arose (the hitchhiking effect; Maynard Smith and Haigh 1974), creating a region of reduced haplotype diversity and increased LD (that is, a haplotype block) around the selected locus (Sabeti et al. 2002; Wootton et al. 2002; Palaisa et al. 2004). The affected region can be large initially (Sabeti et al. 2002), but it may erode rapidly as a result of recombination (Przeworski 2002). Recent work in coalescent theory explores the spatial effects of such selective sweeps along recombining chromosomes (e.g., Kim and Stephan 2002; Kim and Nielsen 2004) and may improve our understanding of how selective sweeps affect haplotype block patterns. Note that LD blocks arising from selective sweeps can develop in the absence of recombination hot spots. Like the stochastically developing haplotype blocks in our simulations, they are affected by recombination, but their structure does not necessarily reflect intrinsic spatial variation in recombination intensity. Rather, they are thought to vary between populations according to population-specific selection pressures (Storz, Payseur, and Nachman 2004).
Conclusions
Acknowledgements
Footnotes
References
Akey, J. M., K. Zhang, M. M. Xiong, and L. Jin. 2003. The effect of single nucleotide polymorphism identification strategies on estimates of linkage disequilibrium. Mol. Biol. Evol. 20:232-242.
Anderson, E. C., and M. Slatkin. 2004. Population-genetic basis of haplotype blocks in the 5q31 region. Am. J. Hum. Genet. 74:40-49.
Daly, M. J., J. D. Rioux, S. E. Schaffner, T. J. Hudson, and E. S. Lander. 2001. High-resolution haplotype structure in the human genome. Nat. Genet. 29:229-232.
Dawson, E., G. R. Abecasis, S. Bumpstead et al. (29 co-authors). 2002. A first-generation linkage disequilibrium map of human chromosome 22. Nature 418:544-548.
Gabriel, S. B., S. F. Schaffner, H. Nguyen et al. (18 co-authors). 2002. The structure of haplotype blocks in the human genome. Science 296:2225-2229.
Goldstein, D. B. 2001. Islands of linkage disequilibrium. Nat. Genet. 29:109-111.
Griffiths, R. C., and S. Tavaré. 1994. Sampling theory for neutral alleles in a varying environment. Philos. Trans. R. Soc. Lond. Ser. B Biol. Sci. 344:403-410.
Hill, W. G., and A. Robertson. 1968. Linkage disequilibrium in finite populations. Theor. Appl. Genet. 38:226-231.
Hudson, R. R. 1983. Properties of a neutral allele model with intragenic recombination. Theor. Popul. Biol. 23:183-201.
Hudson, R. R. 1990. Gene genealogies and the coalescent process. Pp. 1-44 in D. Futuyma and J. Antonovics, eds. Oxford surveys in evolutionary biology. Oxford University Press, New York.
International HapMap Consortium (118 co-authors). 2003. The International HapMap Project. Nature 426:789-796.
Jeffreys, A. J., J. K. Holloway, L. Kauppi, C. A. May, R. Neumann, M. T. Slingsby, and A. J. Webb. 2004. Meiotic recombination hot spots and human DNA diversity. Philos. Trans. R. Soc. Lond. Ser. B Biol. Sci. 359:141-152.
Jeffreys, A. J., L. Kauppi, and R. Neumann. 2001. Intensely punctate meiotic recombination in the class II region of the major histocompatibility complex. Nat. Genet. 29:217-222.
Johnson, G. C. L., L. Esposito, B. J. Barratt et al. (21 co-authors). 2001. Haplotype tagging for the identification of common disease genes. Nat. Genet. 29:233-237.
Kaplan, N., and R. R. Hudson. 1985. The use of sample genealogies for studying a selectively neutral m-loci model with recombination. Theor. Popul. Biol. 28:382-396.
Kauppi, L., A. Sajantila, and A. J. Jeffreys. 2003. Recombination hotspots rather than population history dominate linkage disequilibrium in the MHC class II region. Hum. Mol. Genet. 12:33-40.
Kim, Y., and R. Nielsen. 2004. Linkage disequilibrium as a signature of selective sweeps. Genetics 167:1513-1524.
Kim, Y., and W. Stephan. 2002. Detecting a local signature of genetic hitchhiking along a recombining chromosome. Genetics 160:765-777.
Kruglyak, L. 1999. Prospects for whole-genome linkage disequilibrium mapping of common disease genes. Nat. Genet. 22:139-144.
Lewontin, R. C. 1964. The interaction of selection and linkage. I. General considerations; heterotic models. Genetics 49:49-67.
May, C. A., A. C. Shone, L. Kalaydjieva, A. Sajantila, and A. J. Jeffreys. 2002. Crossover clustering and rapid decay of linkage disequilibrium in the Xp/Yp pseudoautosomal gene SHOX. Nat. Genet. 31:272-275.
Maynard Smith, J., and J. Haigh. 1974. The hitch-hiking effect of a favourable gene. Genet. Res. 23:23-35.
Nachman, M. W. 2002. Variation in recombination rate across the genome: evidence and implications. Curr. Opin. Genet. Dev. 12:657-663.
Nordborg, M., and S. Tavaré. 2002. Linkage disequilibrium: what history has to tell us. Trends Genet. 18:83-90.
Palaisa, K., M. Morgante, S. Tingey, and A. J. Rafalski. 2004. Long-range patterns of diversity and linkage disequilibrium surrounding the maize Y1 gene are indicative of an asymmetric selective sweep. Proc. Natl. Acad. Sci. USA 101:9885-9890.
Patil, N., A. J. Berno, D. A. Hinds et al. (22 co-authors). 2001. Blocks of limited haplotype diversity revealed by high-resolution scanning of human chromosome 21. Science 294:1719-1723.
Phillips, M. S., R. Lawrence, R. Sachidanandam et al. (35 co-authors). 2003. Chromosome-wide distribution of haplotype blocks and the role of recombination hot spots. Nat. Genet. 33:382-387.
Przeworski, M. 2002. The signature of positive selection at randomly chosen loci. Genetics 160:1179-1189.
Przeworski, M., and J. D. Wall. 2001. Why is there so little intragenic linkage disequilibrium in humans? Genet. Res. 77:143-151.
Reich, D. E., S. F. Schaffner, M. J. Daly, G. McVean, J. C. Mullikin, J. M. Higgins, D. J. Richter, E. S. Lander, and D. Altshuler. 2002. Human genome sequence variation and the influence of gene history, mutation and recombination. Nat. Genet. 32:135-142.
Sabeti, P. C., D. E. Reich, J. M. Higgins et al. (17 co-authors). 2002. Detecting recent positive selection in the human genome from haplotype structure. Nature 419:832-837.
Simonsen, K. L., and G. A. Churchill. 1997. A Markov chain model of coalescence with recombination. Theor. Popul. Biol. 52:43-59.
Storz, J. F., B. A. Payseur, and M. W. Nachman. 2004. Genome scans of DNA variability in humans reveal evidence for selective sweeps outside of Africa. Mol. Biol. Evol. 21:1800-1811.
Stumpf, M. P. H., and D. B. Goldstein. 2003. Demography, recombination hotspot intensity, and the block structure of linkage disequilibrium. Curr. Biol. 13:1-8.
Subrahmanyan, L., M. A. Eberle, A. G. Clark, L. Kruglyak, and D. A. Nickerson. 2001. Sequence variation and linkage disequilibrium in the human T-cell receptor beta (TCRB) locus. Am. J. Hum. Genet. 69:381-395.
Wall, J. D., and J. K. Pritchard. 2003a. Assessing the performance of the haplotype block model of linkage disequilibrium. Am. J. Hum. Genet. 73:502-515.
Wall, J. D., and J. K. Pritchard. 2003b. Haplotype blocks and linkage disequilibrium in the human genome. Nat. Rev. Genet. 4:587-597.
Wang, N., J. M. Akey, K. Zhang, R. Chakraborty, and L. Jin. 2002. Distribution of recombination crossovers and the origin of haplotype blocks: the interplay of population history, recombination, and mutation. Am. J. Hum. Genet. 71:1227-1234.
Wiuf, C., T. Christensen, and J. Hein. 2001. A simulation study of the reliability of recombination detection methods. Mol. Biol. Evol. 18:1929-1939.
Wootton, J. C., X. Feng, M. T. Ferdig, R. A. Cooper, J. Mu, D. I. Baruch, A. J. Magill, and X. Su. 2002. Genetic diversity and chloroquine selective sweeps in Plasmodium falciparum. Nature 418:320-323.
Yu, A., C. F. Zhao, Y. Fan et al. (11 co-authors). 2001. Comparison of human genetic and sequence-based physical maps. Nature 409:951-953.
Zhang, K., J. M. Akey, N. Wang, M. Xiong, R. Chakraborty, and L. Jin. 2003. Randomly distributed crossovers may generate block-like patterns of linkage disequilibrium: an act of genetic drift. Hum. Genet. 113:51-59.
Zhang, K., and L. Jin. 2003. HaploBlockFinder: haplotype block analyses. Bioinformatics 19:1300-1301.