* Center for Genome Information, University of Cincinnati, Cincinnati, Ohio
Human Genetics Center, University of Texas-Houston, Houston, Texas.
Abstract |
---|
Key Words: ascertainment bias, linkage disequilibrium, SNPs, coalescent
Introduction |
---|
The prospect of systematically determining genome-wide patterns of background LD has been greatly facilitated by the recent construction of a human high-density SNP marker map (Sachidanandam et al. 2001). It is important to realize, however, that many of these SNPs were identified by in silico methods that ascertained SNPs from a small number of chromosomes in a limited number of populations (Taillon-Miller et al. 1998; Marth et al. 1999; Altshuler et al. 2000; Irizarry et al. 2000; Mullikin et al. 2000). Therefore, inferences drawn from studies using such SNPs may be influenced by ascertainment bias (AB). In fact, several recent studies have demonstrated that the SNP discovery process introduces bias into estimates of various population genetic parameters such as the population migration rate (Wakeley et al. 2001), the population mutation rate (Kuhner et al. 2000; Nielsen 2000), and the population recombination rate (Nielsen 2000).
To date, the effect of AB on estimates of background LD has not been rigorously investigated, although Weiss and Clark (2002) have succinctly described the problem in a recent review article. Therefore, the purpose of this article is to demonstrate how the ascertainment strategy of SNP markers affects estimates of LD.
Methods |
---|
Outline of Study Design
There are many sources of bias that can potentially affect an estimate of LD. In this article, we restrict our analysis to AB, that is, bias attributable to the ascertainment protocol of SNP markers. Thus, we developed a study design that allows us to specifically characterize the effect of AB on estimates of background LD; it consists of four distinct steps. First, we generated the raw SNP data by simulating a genomic region from a subdivided population with two subpopulations that symmetrically exchange M migrants/generation. Second, we modeled a SNP identification experiment. Here two strategies were considered: (1) a hierarchical approach in which n1 chromosomes from subpopulation 1 are sampled and (2) a balanced design in which an equal number of chromosomes, denoted n1 = n2, are sampled from both subpopulations. Third, we calculated two LD statistics, D'all and D'id. D'all is the average D' between all pairs of SNPs produced in a particular realization of a coalescent simulation, whereas D'id is the average D' between only the SNPs identified in the SNP discovery experiment. Finally, in the fourth step, AB was measured as the mean absolute fractional error (MAFE). These steps are explained in more detail in the paragraphs that follow.
Step 1: Coalescent Simulations
The coalescent is a stochastic process that provides a powerful technique for rapidly simulating population genetic data (reviewed by Fu and Li 1999). We used a coalescent model to simulate a genomic region from two subpopulations connected by gene flow whose patterns of sequence variation and LD were influenced by genetic drift, mutation, population demography, recombination, and migration among subpopulations (http://home.uchicago.edu/~rhudson1/; Hudson 1993).
The major parameters of the coalescent simulations were: M = Nm (one-quarter the population migration rate, where N is the effective population size and m is the migration rate/generation), θ = 4Nµ (the population mutation rate, where µ is the mutation rate/locus/generation), and ρ = 4Nc (the population recombination rate, where c is the recombination rate/gamete across the entire region). Migration was modeled according to an island model in which each subpopulation symmetrically exchanges M migrants/generation.
A major objective of simulation studies is to make the study as realistic as possible by carefully selecting parameter values that are consistent with empirical and theoretical data. In many cases, however, exact parameter values are either unknown or, as in our case, vary throughout the genome. We therefore adopted the strategy of performing simulations over a wide range of plausible parameter values culled from the empirical literature. Values of θ and ρ were assumed to be of the same order of magnitude (Clark et al. 1998; Nachman 2001; Nordborg and Tavaré 2002). Specifically, the values of θ and ρ considered were θ = ρ = [1, 2, ... , 12], which is in the range reported by many empirical and theoretical studies (Long and Langley 1999; Templeton et al. 2000; Yu et al. 2001b). Moreover, the values of M considered were M = [0.0125, 0.125, 1.25], which again are consistent with available empirical data (Santos, Epplen, and Epplen 1997; Wakeley 1999). For each parameter combination, 1000 simulations were performed and analyzed as described below. For each simulation replicate, 200 chromosomes were simulated from each subpopulation.
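The parameter sweep described above can be enumerated in a few lines. The following is a minimal sketch, assuming that every value of θ = ρ is crossed with every value of M; the names `parameter_grid`, `THETA_RHO_VALUES`, and `M_VALUES` are illustrative and do not come from the original study.

```python
from itertools import product

# Values quoted in the text; theta and rho are varied together (theta = rho).
THETA_RHO_VALUES = range(1, 13)        # 1, 2, ..., 12
M_VALUES = (0.0125, 0.125, 1.25)       # M = Nm, migrants per generation
REPLICATES = 1000                      # simulations per parameter combination
CHROMOSOMES_PER_SUBPOP = 200           # chromosomes simulated per subpopulation


def parameter_grid():
    """Yield one dict per (theta = rho, M) combination (assumed full cross)."""
    for value, m in product(THETA_RHO_VALUES, M_VALUES):
        yield {"theta": value, "rho": value, "M": m}


grid = list(parameter_grid())
print(len(grid), "parameter combinations")
print(len(grid) * REPLICATES, "replicates per demographic model")
```

Each dictionary would then be handed to whatever coalescent simulator is in use; that step is deliberately left out here because its interface is not described in the text.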
Furthermore, we considered two demographic models: (1) a model of constant population size and (2) a model of recent population expansion where each subpopulation expanded to 1000 N at 1.1 N generations in the past. If we assume N = 10,000 individuals, this model corresponds to a population expansion approximately 11,000 years ago, and thus the agricultural revolution is modeled (Zollner and von Haeseler 2000).
Step 2: Model SNP Identification Strategies
After simulating a genomic region, we modeled a SNP identification experiment. Here we considered two strategies: (1) a hierarchical approach in which chromosomes from a single subpopulation are sampled and (2) a balanced study design in which an equal number of chromosomes from each subpopulation are sampled for SNP discovery. In the hierarchical SNP identification strategy, we sampled n1 = [2, 4, 8, 16] chromosomes from subpopulation 1, whereas in the balanced strategy we sampled n1 = n2 = [2, 4, 8] chromosomes from each subpopulation. In all cases, chromosomes were "aligned" and candidate SNPs were identified. Candidate SNPs were accepted for further analysis if the minor allele frequency was ≥ 2% in the total sample of 200 chromosomes. This somewhat arbitrary threshold was selected for two reasons: (1) it is consistent with previous frequency-dependent definitions of SNPs (Brookes 1999) and (2) imposing a frequency threshold is necessary to avoid spuriously high estimates of D' (see Mateu et al. 2001).
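The discovery step can be sketched as two small functions: one that finds candidate SNPs in the aligned discovery chromosomes, and one that keeps only candidates reaching the 2% minor allele frequency threshold in a genotyped sample. This is an illustration under stated assumptions, not the authors' code: chromosomes are represented as lists of 0/1 alleles, and the function names and the toy data are hypothetical.

```python
def candidate_snps(panel):
    """Sites at which the aligned discovery chromosomes carry both alleles."""
    n_sites = len(panel[0])
    return [s for s in range(n_sites) if len({chrom[s] for chrom in panel}) == 2]


def minor_allele_frequency(sample, site):
    """Frequency of the rarer allele at `site` in the genotyped sample."""
    freq = sum(chrom[site] for chrom in sample) / len(sample)
    return min(freq, 1.0 - freq)


def accepted_snps(candidates, sample, min_maf=0.02):
    """Candidates whose minor allele frequency in `sample` is at least `min_maf`."""
    return [s for s in candidates if minor_allele_frequency(sample, s) >= min_maf]


# Toy data (hypothetical): 4 chromosomes x 3 sites per subpopulation.
subpop1 = [[0, 1, 0], [1, 1, 0], [0, 0, 0], [1, 0, 1]]
subpop2 = [[0, 0, 0], [0, 0, 1], [0, 0, 1], [0, 0, 0]]

# Hierarchical design: discover in subpopulation 1 only (n1 = 2).
hier = candidate_snps(subpop1[:2])
print(accepted_snps(hier, subpop1), accepted_snps(hier, subpop2))

# Balanced design: one chromosome from each subpopulation (n1 = n2 = 1).
bal = candidate_snps([subpop1[0], subpop2[1]])
print(accepted_snps(bal, subpop1), accepted_snps(bal, subpop2))
```

In the toy data, the site found by the hierarchical panel is monomorphic in subpopulation 2, which mirrors the situation in which no informative SNPs are available in the population that was not used for discovery.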
Step 3: Calculate LD
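Following SNP identification, we calculated two LD statistics, D'all and D'id. Let xi and yi denote the number of total and identified SNPs in subpopulation i (where i = 1 or 2), respectively, and let D'jk denote D' between SNPs j and k. Then, D'all in the ith subpopulation is
$$D'_{\mathrm{all},i} = \frac{2}{x_i (x_i - 1)} \sum_{j=1}^{x_i - 1} \sum_{k=j+1}^{x_i} D'_{jk}$$
and D'id is the corresponding average over the y_i identified SNPs,
$$D'_{\mathrm{id},i} = \frac{2}{y_i (y_i - 1)} \sum_{j=1}^{y_i - 1} \sum_{k=j+1}^{y_i} D'_{jk}$$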
In words, D'all is the average D' between all pairs of SNPs produced in the coalescent simulations, whereas D'id is the average D' between only the SNPs identified in the SNP discovery experiment. Note that D'all and D'id are calculated in the total number of chromosomes in each subpopulation (200), which eliminates any potential bias attributable to sample size. In some replicates, none of the candidate SNPs identified in a small number of chromosomes were sufficiently polymorphic when genotyped in the entire population (i.e., all had a minor allele frequency < 2%). When this occurred, D'id was set to be equal to 0. Although in practice one would likely attempt to discover more SNPs in these circumstances and re-estimate D'id, setting D'id equal to 0 provides a useful theoretical approach for decomposing the total amount of AB into its component parts.
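The averaging described here can be illustrated with a short sketch. It assumes biallelic sites coded 0/1, uses the standard normalization D' = |D| / Dmax (Lewontin 1964), and returns 0 when fewer than two SNPs are available, in line with the convention of setting D'id to 0. Taking the absolute value and the function names `d_prime` and `mean_d_prime` are assumptions made for this example.

```python
from itertools import combinations


def d_prime(sample, a, b):
    """|D'| between biallelic sites a and b in a list of 0/1 chromosomes."""
    n = len(sample)
    p_a = sum(chrom[a] for chrom in sample) / n
    p_b = sum(chrom[b] for chrom in sample) / n
    p_ab = sum(1 for chrom in sample if chrom[a] == 1 and chrom[b] == 1) / n
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:
        raise ValueError("D' is undefined when a site is monomorphic")
    return abs(d) / d_max


def mean_d_prime(sample, sites):
    """Average D' over all pairs of `sites`; 0.0 if there are no pairs."""
    pairs = list(combinations(sites, 2))
    if not pairs:
        return 0.0  # assumed convention, mirroring D'id = 0 in the text
    return sum(d_prime(sample, a, b) for a, b in pairs) / len(pairs)


# Toy sample (hypothetical): sites 0 and 1 are perfectly associated,
# site 2 is independent of both.
sample = [[1, 1, 1], [1, 1, 0], [0, 0, 1], [0, 0, 0]]
d_all = mean_d_prime(sample, [0, 1, 2])   # all SNPs
d_id = mean_d_prime(sample, [0, 1])       # only the "identified" SNPs
print(d_all, d_id)
```

Passing all polymorphic sites gives the analogue of D'all, and passing only the sites that survived discovery gives the analogue of D'id, both computed in the same full sample of chromosomes.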
Step 4: Measure AB
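Ascertainment bias was measured as the mean absolute fractional error (MAFE), which is in general defined as the absolute value of the difference between theoretical (t) and observed (t̂) values divided by the theoretical value (t). Here, t = D'all and t̂ = D'id. Thus, the AB in subpopulation i, averaged over the R simulation replicates, is
$$AB_i = \frac{1}{R} \sum_{r=1}^{R} \frac{\left| D'_{\mathrm{all},i}(r) - D'_{\mathrm{id},i}(r) \right|}{D'_{\mathrm{all},i}(r)}$$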
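As a worked illustration of the MAFE, the sketch below averages the absolute fractional error over a handful of replicates. The replicate values are invented for the example, and the function names are hypothetical; the only thing taken from the text is the definition |D'all - D'id| / D'all and the convention that D'id = 0 when no informative SNP is identified.

```python
def absolute_fractional_error(d_all, d_id):
    """|theoretical - observed| / theoretical for a single replicate."""
    if d_all == 0:
        raise ValueError("fractional error is undefined when D'all is 0")
    return abs(d_all - d_id) / d_all


def ascertainment_bias(replicates):
    """Mean absolute fractional error over (D'all, D'id) pairs."""
    errors = [absolute_fractional_error(d_all, d_id) for d_all, d_id in replicates]
    return sum(errors) / len(errors)


# Hypothetical replicates: (D'all, D'id). The last one has D'id = 0,
# i.e. no informative SNP was identified, so its error is 1.
replicates = [(0.8, 0.6), (1.0, 1.0), (0.5, 0.0)]
print(round(ascertainment_bias(replicates), 4))
```

Note that a replicate with D'id = 0 always contributes an error of exactly 1, which is why such replicates can dominate the average when uninformative discovery is common.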
ρ as a Measure of LD
The effect of AB on ρ as a measure of LD was also studied. Data were simulated and SNPs were discovered as described above. Simulation parameters were set at θ = ρ = 3, M = 0.125, n1 = [2, 4], and n1 = n2 = 2. For each parameter combination, 100 data sets were simulated.
For each data set, ρ was estimated by a recently developed method that uses importance sampling to approximate the joint likelihood surface of θ and ρ assuming a coalescent model (http://www.stats.ox.ac.uk/~fhead/index.html; Fearnhead and Donnelly 2001). Parameter estimates are obtained by specifying "driving values" that parameterize the prior distribution of θ and ρ. Here we considered four driving values and simulated 50,000 genealogies/driving value to approximate the likelihood surface of the data and estimate ρ (see Fearnhead and Donnelly 2001 for details).
Results |
---|
Distribution of D'all and D'id
If AB affects estimates of LD, then the distributions of D'all and D'id will differ. Therefore, we begin by comparing the distribution of D'all and D'id under a hierarchical SNP identification strategy (n1 = 4). Figure 1 clearly demonstrates that D'all and D'id differ and that the shape of these distributions is influenced by M, θ, and ρ. For example, in subpopulation 1 when θ = ρ = 6, approximately 65% of simulations result in values of D'all between 0.90 and 1, whereas only 36% of simulations yield values of D'id in this range. The shift in distribution is even more pronounced in subpopulation 2, where the percentage of simulations in the range of 0.90 to 1 decreases from approximately 63% for D'all to 30% for D'id. A similar pattern is observed for each parameter value presented in figure 1, and thus, in general, AB causes LD to be underestimated. It is also interesting to note that when there was little gene flow between subpopulations (M = 0.125), a significant proportion of replicates resulted in D'id values of 0 in subpopulation 2 under a hierarchical SNP identification strategy. In other words, SNPs that are informative in the population in which they are discovered are not necessarily informative in other populations. This result has important implications for the utility of SNP resources in general, and those constructed by hierarchical study designs in particular (see Discussion). Although comparing the distribution of D'all and D'id provides an intuitive feel for how AB affects measures of LD, we now quantitatively address this question.
Third, the magnitude of AB varies as a function of θ and ρ. As we demonstrate in the section Decomposing AB, θ and ρ contribute to AB in distinct ways. This result is important because it suggests that the magnitude of AB may vary as a function of the particular genomic locus considered, because θ and ρ likely vary across the genome (Yu et al. 2001a; Zavolan and Kepler 2001).
Fourth, as the number of chromosomes used for SNP identification increases, AB decreases in both subpopulations, although the decline is slower in subpopulation 2. In fact, under extreme cases of population differentiation AB remains uniformly high in subpopulation 2 (data not shown). Although this amount of differentiation is not, in general, characteristic of human populations, it may still be possible that certain genomic regions demonstrate such deep subdivision (Hamblin and Di Rienzo 2000; Hamblin et al. 2002).
AB Under Balanced SNP Identification Strategies
Next, we investigated the effect of AB under a balanced SNP discovery strategy. As expected, the AB under a balanced study design is nearly identical in each subpopulation (see fig. 3). For example, when θ = ρ = 1, M = 0.125, and n1 = n2 = 2, the AB in subpopulations 1 and 2 is 0.25 and 0.23, respectively. Furthermore, when comparing the magnitude of AB under hierarchical and balanced SNP identification strategies, it is clear that the latter leads to SNP resources that are more broadly applicable across subpopulations. For example, when θ = ρ = 6, M = 0.125, and four chromosomes are used for SNP identification, AB in subpopulation 2 is reduced from 0.29 under a hierarchical design (n1 = 4) to 0.06 under a balanced design (n1 = n2 = 2) (fig. 2 and fig. 3B). Importantly, AB in subpopulation 1 is only marginally affected under the two different sampling schemes (hierarchical, 0.07; balanced, 0.05). Intuitively, this result is expected, because the rationale for a balanced study design is to minimize AB from population substructure. In the absence of subdivision, sampling n chromosomes from each of two potential subpopulations is equivalent to sampling 2n chromosomes from one subpopulation. Although a balanced SNP identification strategy is preferred over hierarchical approaches, it is important to note that AB can still be strong, particularly when a small number of chromosomes are used for SNP discovery.
Formally, we denote the total amount of AB as ABT, the AB attributable to identifying uninformative SNPs as ABI, and the AB attributable to sampling SNPs whose patterns of LD are unrepresentative of the region-at-large as ABR. If we assume under our heuristic model that ABI and ABR are independent, ABT is simply ABT = ABI + ABR. We estimated ABT by analyzing all simulation replicates regardless of whether informative SNPs were identified (i.e., allowing D'id = 0; see Methods). We estimated ABR by analyzing only the simulation replicates in which informative markers were identified (i.e., excluding replicates where D'id = 0). Finally, we estimated ABI by subtracting ABR from ABT. In summary, ABT is the total amount of bias introduced into an estimate of LD through the process of SNP ascertainment, which can then be partitioned into two component parts ABI and ABR.
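The decomposition can be made concrete with a small numerical sketch. It follows the recipe in the paragraph above: ABT from all replicates, ABR from replicates with informative SNPs only, and ABI as their difference. The replicate values are invented, the function names are hypothetical, and a replicate is treated as uninformative exactly when D'id equals 0, which is an assumption of this example.

```python
def mafe(pairs):
    """Mean absolute fractional error over (D'all, D'id) pairs; 0.0 if empty."""
    errors = [abs(d_all - d_id) / d_all for d_all, d_id in pairs]
    return sum(errors) / len(errors) if errors else 0.0


def decompose_ab(replicates):
    """Split total AB into AB_I and AB_R under the additive heuristic."""
    ab_t = mafe(replicates)                                   # all replicates
    informative = [(a, i) for a, i in replicates if i != 0.0]
    ab_r = mafe(informative)                                  # informative only
    ab_i = ab_t - ab_r                                        # AB_T = AB_I + AB_R
    return {"AB_T": ab_t, "AB_I": ab_i, "AB_R": ab_r}


# Hypothetical replicates: (D'all, D'id); D'id = 0 marks an uninformative one.
replicates = [(0.8, 0.6), (1.0, 1.0), (0.5, 0.0)]
parts = decompose_ab(replicates)
print({name: round(value, 4) for name, value in parts.items()})
```

In this toy case one uninformative replicate out of three accounts for most of the total, so ABI exceeds ABR; with more informative replicates the balance would shift toward ABR.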
Figure 5 shows the contribution of ABI and ABR to ABT over a broad range of values for θ and ρ for both constant and recently expanded demographic models. Several interesting observations emerge from these graphs. First, for both demographic models, the relationship of ABI and ABR to θ and ρ is diametrically opposed. Specifically, ABR increases and ABI decreases as θ and ρ increase. Therefore, for small values of θ and ρ, ABI dominates ABT, whereas for larger values of θ and ρ, ABR dominates ABT. Second, the magnitude of ABI, and to a lesser extent ABR, is higher for the recently expanded demographic model than for the constant size model. This observation is attributable to a higher proportion of uninformative (i.e., rare) SNPs in expanded populations (data not shown).
Discussion |
---|
In addition, we decomposed the total amount of AB (ABT) into two component parts: ABI and ABR, which capture different aspects of the data. Recall that ABI is the AB attributable to identifying uninformative SNPs and ABR is the AB attributable to sampling SNPs unrepresentative of the region-at-large. Therefore, our finding that the magnitude of ABI is dictated by θ is intuitive, as one would expect the level of sequence variation to mediate the probability of identifying informative SNPs. Previous studies have also suggested that a significant fraction of candidate SNPs identified in a small number of chromosomes will be uninformative (Eberle and Kruglyak 2000; Yang et al. 2000), although to our knowledge we are the first to address this question in a subdivided population.
In contrast, ABR provides an estimate of AB conditional on the SNPs being informative. Hence, it is not surprising that the magnitude of ABR is determined primarily by and proportional to ρ. More specifically, ABR increases as ρ increases, because the sampling variation of LD is greater for larger values of ρ. Thus, the greater variability in LD increases the probability of sampling SNPs whose patterns of LD are not representative of the region-at-large.
It is important to note that in our heuristic model to decompose ABT we assumed ABI and ABR to be independent. This assumption may not be completely accurate, however, as θ and ρ are positively correlated (Nachman 2001; Yu et al. 2001a), and warrants further investigation. Nonetheless, in the context of our heuristic model, we believe that our assumption that ABI and ABR are independent is a reasonable approximation that allows a deeper understanding of how population genetic parameters contribute to AB.
The results presented here are subject to several limitations. For instance, we have focused on how AB affects a commonly used measure of LD, D', although other measures exist (see Akey et al. 2001). Nevertheless, we also have demonstrated that AB affects estimates of ρ (table 1). We therefore believe that our results are general and capture the important details of how AB affects inferences of LD. In future studies, it would be interesting to investigate how the magnitude of AB varies as a function of the particular statistic used to estimate LD.
Moreover, the simulation model is an obvious simplification of human population history. Specifically, we have assumed an island model of population structure, which posits a constant and symmetrical migration rate between subpopulations. Obviously, patterns of human migration are more complex, although it is difficult to predict systematically how these deviations affect our results. It is interesting to note, however, how well our simple model fits the empirical data (table 2).
Furthermore, few empirical data are available regarding estimates of M between human populations. Available data from autosomal DNA (Santos, Epplen, and Epplen 1997; Wakeley 1999) and mitochondrial DNA (Seielstad, Minch, and Cavalli-Sforza 1998; Beerli and Felsenstein 2001) suggest that M is approximately 1, with one study reporting a 95% confidence interval of 0.61 to 1.43 (Wakeley 1999). However, many estimates of M are based primarily on FST, which has been criticized as an unreliable method for inference that is accurate to only a few orders of magnitude (Whitlock and McCauley 1999).
Moreover, as FST = 1/(1 + 4M), it is plausible that estimates of FST may vary across the genome (for example, selection could result in regionally restricted changes in N), which may lead to a nonuniform distribution of AB. In other words, some genomic regions may be strongly affected by AB, whereas others may be minimally affected, a phenomenon that can be observed in the empirical data (table 2). The Duffy blood group locus is a good example of a genomic region where AB may be particularly strong because of natural selection. The Fy*O allele is nearly fixed in sub-Saharan African populations but is rare outside Africa, leading to the largest observed FST of any allele in humans (Hamblin and Di Rienzo 2000; Hamblin, Thompson, and Di Rienzo 2002).
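To give a feel for the island-model relation between FST and M = Nm, written here as FST = 1/(1 + 4M), the sketch below evaluates it for the three migration values used in the simulations and inverts it to recover M from FST. The function names are illustrative, and the relation is the usual equilibrium approximation rather than anything estimated from data.

```python
def fst_from_m(m):
    """Equilibrium island-model approximation: FST = 1 / (1 + 4M)."""
    return 1.0 / (1.0 + 4.0 * m)


def m_from_fst(fst):
    """Inverse of the same relation: M = (1 / FST - 1) / 4."""
    return (1.0 / fst - 1.0) / 4.0


# Migration values used in the simulations (M = Nm).
for m in (0.0125, 0.125, 1.25):
    print(f"M = {m:<6} FST = {fst_from_m(m):.3f}")
```

Because FST changes steeply when M is small, modest regional differences in N or m translate into large differences in expected differentiation, which is consistent with the argument that AB need not be uniform across the genome.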
Recently, a series of articles were published that suggest LD is arranged in blocklike structures such that within a block, limited haplotype diversity is observed (Daly et al. 2001; Johnson et al. 2001; Jeffreys, Kauppi, and Neumann 2001; Patil et al. 2001). Hence, it would be of tremendous interest to systematically identify haplotype blocks throughout the genome. To this end, a large-scale project is under consideration (Robertson 2001). Our results have important implications for the study design in constructing a genome-wide haplotype map. For example, our data intimate that identifying informative SNPs from the currently available collection of markers may be problematic (see ABI in fig. 6). Consistent with this hypothesis, Johnson et al. (2001), in their study of a European population, remarked that most SNP markers in dbSNP proved to be insufficient for identifying haplotype blocks. Nevertheless, our results are encouraging. If haplotype blocks are determined primarily by limited recombination (as opposed to demographic history), then our data demonstrate that AB within a haplotype block may not be very significant, because ABR is minimal for small values of ρ. However, in regions outside of haplotype blocks ABR may be quite high, because ρ will be large.
Although AB may complicate accurate parameter estimates with the available SNP markers, it is important to realize that approaches exist to account for and correct this bias: for example, if the SNP ascertainment strategy is known and appropriately modeled, analytical methods have been developed to accurately estimate θ (Kuhner et al. 2000; Nielsen 2000) and M (Wakeley et al. 2001). In the context of LD studies, there is a crucial need to move beyond pairwise LD measures and develop statistics to estimate the overall LD of a genomic region. Future work in this area should pay close attention to the aforementioned methods that allow AB to be modeled and corrected. As an example, Nielsen (2000) demonstrated that AB could be corrected for in estimates of ρ, which we used as an overall measure of LD (table 1).
Conclusion |
---|
Acknowledgements |
---|
Footnotes |
---|
Jeffrey Long, Associate Editor
Literature Cited |
---|
Abecasis, G. R., E. Noguchi, A. Heinzmann, et al. (9 co-authors). 2001. Extent and distribution of linkage disequilibrium in three genomic regions. Am. J. Hum. Genet. 68:191-197.
Akey, J., L. Jin, and M. Xiong. 2001. Haplotypes vs single marker linkage disequilibrium tests: what do we gain? Eur. J. Hum. Genet. 29:291-300.
Akey, J. M., K. Zhang, M. Xiong, P. Doris, and L. Jin. 2001. The effect that genotyping errors have on the robustness of common linkage-disequilibrium measures. Am. J. Hum. Genet. 68:1447-1456.
Altshuler, D., V. J. Pollar, C. R. Cowles, W. J. Van Etten, J. Baldwin, L. Linton, and E. S. Lander. 2000. A SNP map of the human genome generated by reduced representation shotgun sequencing. Nature 407:513-516.
Beerli, P., and J. Felsenstein. 2001. Maximum likelihood estimation of a migration matrix and effective population sizes in n subpopulations by using a coalescent approach. Proc. Natl. Acad. Sci. USA 98:4563-4568.
Brookes, A. J. 1999. The essence of SNPs. Gene 234:177-186.
Cho, R. J., M. Mindrinos, D. R. Richards, et al. (18 co-authors). 1999. Genome-wide mapping with biallelic markers in Arabidopsis thaliana. Nat. Genet. 23:203-207.
Clark, A. G., K. M. Weiss, D. A. Nickerson, et al. (11 co-authors). 1998. Haplotype structure and population genetic inferences from nucleotide-sequence variation in human lipoprotein lipase. Am. J. Hum. Genet. 63:595-612.
Daly, M. J., J. D. Rioux, S. F. Schaffner, T. J. Hudson, and E. S. Lander. 2001. High-resolution haplotype structure in the human genome. Nat. Genet. 29:229-232.
Eberle, M. A., and L. Kruglyak. 2000. An analysis of strategies for discovery of single-nucleotide polymorphisms. Genet. Epidemiol. 19:S29-S35.
Fearnhead, P., and P. Donnelly. 2001. Estimating recombination rates from population genetic data. Genetics 159:1299-1318.
Fu, Y. X., and W. H. Li. 1999. Coalescing into the 21st century: an overview and prospects of coalescent theory. Theor. Popul. Biol. 56:1-10.
Goddard, K. A., P. J. Hopkins, J. M. Hall, and J. S. Witte. 2000. Linkage disequilibrium and allele-frequency distributions for 114 single-nucleotide polymorphisms in five populations. Am. J. Hum. Genet. 66:216-234.
Hamblin, M. T., and A. Di Rienzo. 2000. Detection of the signature of natural selection in humans: evidence from the Duffy blood group locus. Am. J. Hum. Genet. 66:1669-1679.
Hamblin, M. T., E. E. Thompson, and A. Di Rienzo. 2002. Complex signatures of natural selection at the Duffy blood group locus. Am. J. Hum. Genet. 70:369-383.
Hudson, R. R. 1993. The how and why of generating gene genealogies. Pp. 23-36 in N. Takahata and A. G. Clark, eds. Mechanisms of molecular evolution. Japan Scientific Societies, Tokyo.
Irizarry, K., V. Kustanovich, C. Li, N. Brown, S. Nelson, W. Wong, and C. J. Lee. 2000. Genome-wide analysis of single-nucleotide polymorphisms in human expressed sequences. Nat. Genet. 26:233-236.
Jeffreys, A. J., L. Kauppi, and R. Neumann. 2001. Intensely punctate meiotic recombination in the class II region of the major histocompatibility complex. Nat. Genet. 29:217-222.
Johnson, G. C., L. Esposito, B. J. Barratt, et al. (18 co-authors). 2001. Haplotype tagging for the identification of common disease genes. Nat. Genet. 29:233-237.
Kidd, J. R., A. J. Pakstis, H. Zhao, et al. (12 co-authors). 2000. Haplotypes and linkage disequilibrium at the phenylalanine hydroxylase locus, PAH, in a global representation of populations. Am. J. Hum. Genet. 66:1882-1899.
Kuhner, M. K., P. Beerli, J. Yamato, and J. Felsenstein. 2000. Usefulness of single nucleotide polymorphism data for estimating population parameters. Genetics 156:439-447.
Lewontin, R. C. 1964. The interaction of selection and linkage. I. General considerations: heterotic models. Genetics 49:49-67.
Lindblad-Toh, K., E. Winchester, M. J. Daly, et al. (15 co-authors). 2000. Large-scale discovery and genotyping of single-nucleotide polymorphisms in the mouse. Nat. Genet. 24:381-386.
Long, A. D., and C. H. Langley. 1999. The power of association studies to detect the contribution of candidate genetic loci to variation in complex traits. Genome Res. 9:720-731.
Marth, G. T., I. Korf, M. D. Yandell, R. T. Yeh, Z. Gu, H. Zakeri, N. O. Stitziel, L. Hillier, P. Y. Kwok, and W. R. Gish. 1999. A general approach to single-nucleotide polymorphism discovery. Nat. Genet. 23:452-456.
Mateu, E., F. Calafell, O. Lao, B. Bonne-Tamir, J. R. Kidd, A. Pakstis, K. K. Kidd, and J. Bertranpetit. 2001. Worldwide genetic analysis of the CFTR region. Am. J. Hum. Genet. 68:103-117.
Moffatt, M. F., J. A. Traherne, G. R. Abecasis, and W. O. Cookson. 2000. Single nucleotide polymorphism and linkage disequilibrium within the TCR alpha/delta locus. Hum. Mol. Genet. 9:1011-1019.
Mullikin, J. C., S. E. Hunt, C. G. Cole, et al. (40 co-authors). 2000. An SNP map of human chromosome 22. Nature 407:516-520.
Nachman, M. B. 2001. Single nucleotide polymorphisms and recombination rate in humans. Trends Genet. 17:481-485.
Nielsen, R. 2000. Estimation of population parameters and recombination rates from single nucleotide polymorphisms. Genetics 154:931-942.
Nordborg, M., and S. Tavaré. 2002. Linkage disequilibrium: what history has to tell us. Trends Genet. 18:83-90.
Patil, N., A. J. Berno, D. A. Hinds, et al. (19 co-authors). 2001. Blocks of limited haplotype diversity revealed by high-resolution scanning of human chromosome 21. Science 294:1719-1723.
Pritchard, J. K., and M. Przeworski. 2001. Linkage disequilibrium in humans: models and data. Am. J. Hum. Genet. 69:1-14.
Robertson, D. 2001. Racially defined haplotype project debated. Nat. Biotechnol. 19:795-796.
Sachidanandam, R., D. Weissman, S. C. Schmidt, et al. (38 co-authors). 2001. A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature 409:928-933.
Santos, E. J., J. T. Epplen, and C. Epplen. 1997. Extensive gene flow in human populations as revealed by protein and microsatellite DNA markers. Hum. Hered. 47:165-172.
Seielstad, M. T., E. Minch, and L. L. Cavalli-Sforza. 1998. Genetic evidence for a higher female migration rate in humans. Nat. Genet. 20:278-280.
Taillon-Miller, P., I. Bauer-Sardina, N. L. Saccone, J. Putzel, T. Laitinen, A. Cao, J. Kere, G. Pilia, J. P. Rice, and P. Y. Kwok. 2000. Juxtaposed regions of extensive and minimal linkage disequilibrium in human Xq25 and Xq28. Nat. Genet. 25:324-328.
Taillon-Miller, P., Z. Gu, Q. Li, L. Hillier, and P. Y. Kwok. 1998. Overlapping genomic sequences: a treasure trove of single-nucleotide polymorphisms. Genome Res. 8:748-754.
Templeton, A. R., A. G. Clark, K. M. Weiss, D. A. Nickerson, E. Boerwinkle, and C. F. Sing. 2000. Recombinational and mutational hotspots within the human lipoprotein lipase gene. Am. J. Hum. Genet. 66:69-83.
Tishkoff, S. A., E. Dietzsch, W. Speed, A. J. Pakstis, J. R. Kidd, K. Cheung, B. Bonne-Tamir, A. S. Santachiara-Benerecetti, P. Moral, and M. Krings. 1996. Global patterns of linkage disequilibrium at the CD4 locus and modern human origins. Science 271:1380-1387.
Wakeley, J. 1999. Nonequilibrium migration in human history. Genetics 153:1863-1871.
Wakeley, J., R. Nielsen, S. N. Liu-Cordero, and K. Ardlie. 2001. The discovery of single-nucleotide polymorphisms and inferences about human demographic history. Am. J. Hum. Genet. 69:1332-1347.
Weiss, K. M., and A. G. Clark. 2002. Linkage disequilibrium and the mapping of complex human traits. Trends Genet. 18:19-24.
Whitlock, M. C., and D. E. McCauley. 1999. Indirect measures of gene flow and migration: FST not equal to 1/4Nm + 1. Heredity 82:117-125.
Wilson, J. F., M. E. Weale, A. C. Smith, F. Gratrix, B. Fletcher, M. G. Thomas, N. Bradman, and D. B. Goldstein. 2001. Population genetic structure of variable drug response. Nat. Genet. 29:265-269.
Yang, Z., G. K. Wong, M. A. Eberle, M. Kibukawa, D. A. Passey, W. R. Hughes, L. Kruglyak, and J. Yu. 2000. Sampling SNPs. Nat. Genet. 26:13-14.
Yu, A., C. Zhao, Y. Fan, W. Jang, A. J. Mungall, P. Deloukas, A. Olsen, N. A. Doggett, N. Ghebranious, K. W. Broman, and J. L. Weber. 2001a. Comparison of human genetic and sequence-based physical maps. Nature 409:951-953.
Yu, N., Z. Zhao, X. Y. Fu, N. Sambuughin, M. Ramsay, T. Jenkins, E. Leskinen, L. Patthy, L. B. Jorde, T. Kuromori, and W. H. Li. 2001b. Global patterns of human DNA sequence variation in a 10-kb region on chromosome 1. Mol. Biol. Evol. 18:214-222.
Zavolan, M., and T. B. Kepler. 2001. Statistical inference of sequence-dependent mutation rates. Curr. Opin. Genet. Dev. 11:612-615.
Zollner, S., and A. von Haeseler. 2000. A coalescent approach to study linkage disequilibrium between single-nucleotide polymorphisms. Am. J. Hum. Genet. 66:615-628.