Department of Ecology and Evolutionary Biology, Biosciences West Building, University of Arizona
Population genetics theory indicates that natural selection will affect levels and patterns of genetic variation at closely linked loci. Background selection (Charlesworth, Morgan, and Charlesworth 1993) proposes that the removal of recurrent deleterious mutations and associated neutral variants will cause a reduction of nucleotide variation in low-recombination regions. The strength of background selection depends on the deleterious mutation rate, the magnitude of selection and dominance, and the recombination rate. Genetic hitchhiking (Maynard Smith and Haigh 1974), the fixation of advantageous alleles and the associated fixation of linked neutral alleles, can also decrease nucleotide diversity in low-recombination regions. The extent of genetic hitchhiking depends on the strength of selection and the rate of recombination. Therefore, under both background selection and genetic hitchhiking, theory predicts that genomic regions that rarely recombine may be subject to reductions in nucleotide diversity. Furthermore, if the rate of deleterious mutation or selective sweeps (or both) is sufficiently high, background selection (Hudson and Kaplan 1995) and genetic hitchhiking (Wiehe and Stephan 1993) models predict an overall positive correlation between nucleotide polymorphism and recombination rate.
Empirical investigations of nucleotide variation support these predictions. In Drosophila melanogaster, regions of the genome with little recombination show reduced heterozygosity (Aguade, Miyashita, and Langley 1989; Begun and Aquadro 1991; Berry, Ajioka, and Kreitman 1991). Furthermore, there is evidence that nucleotide variation and recombination rate are positively correlated in several taxa, including fruit flies (Begun and Aquadro 1992), house mice (Nachman 1997), goatgrasses (Dvorak, Luo, and Yang 1998), sea beets (Kraft et al. 1998), tomatoes (Stephan and Langley 1998), humans (Nachman et al. 1998; Przeworski, Hudson, and Di Rienzo 2000; Nachman 2001), and maize (Tenaillon et al. 2001). The combination of theoretical and empirical results indicates that selection acting at linked sites is likely to be a major force shaping genomic patterns of nucleotide variation.
The documented relationship between nucleotide variation and recombination rate raises the question of whether other measurable variables can explain additional variation in nucleotide polymorphism in the context of selection at linked sites. We predict that the effects of selection at linked sites will depend on local gene density. If selection acts primarily on genes, genomic regions with high gene density will harbor more potential selective targets than genomic regions with low gene density. This prediction should be valid irrespective of whether positive or purifying selection is driving observed patterns. Humans provide a good system in which this prediction can be tested, for two reasons. First, gene density varies substantially across the genome (International Human Genome Sequencing Consortium 2001; Venter et al. 2001). For example, sequence data suggest that chromosome 19 has an average of 23 genes per Mbp, whereas chromosome 4 averages only 6 genes per Mbp (Venter et al. 2001). Second, estimates of nucleotide polymorphism assessed using reasonable sample sizes are available for multiple loci across the human genome. Here, we demonstrate that nucleotide diversity and gene density are negatively correlated in humans. This result provides further evidence for the importance of selection at linked sites and suggests that the number of genes in a genomic region is a reasonable indicator of selective intensity.
We assessed the relationship between nucleotide polymorphism (measured by Watterson's [1975] θ) and gene density using data from sequence-based studies of variation that sampled more than 10 chromosomes (table 1). The variance in θ can be quite large with sample sizes smaller than 10 (Pluzhnikov and Donnelly 1996).
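As an illustration of the polymorphism measure used here, the following minimal sketch computes Watterson's θ per site as the number of segregating sites divided by the harmonic sum a1 = 1 + 1/2 + ... + 1/(n - 1) and by the sequence length. The function name `wattersons_theta` and the four toy sequences are assumptions made for illustration; they are not data from the studies in table 1.

```python
def wattersons_theta(sequences):
    """Return Watterson's theta per site for aligned, equal-length sequences."""
    n = len(sequences)
    if n < 2:
        raise ValueError("need at least two sequences")
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned to the same length")
    # A site is segregating if more than one state is observed in the sample.
    segregating = sum(1 for column in zip(*sequences) if len(set(column)) > 1)
    # a1 = 1 + 1/2 + ... + 1/(n - 1) corrects for the number of chromosomes sampled.
    a1 = sum(1.0 / i for i in range(1, n))
    return segregating / (a1 * length)


# Toy alignment (hypothetical): 4 chromosomes, 10 sites, 2 segregating sites.
toy_sample = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC", "ACGTACGTAC"]
theta_hat = wattersons_theta(toy_sample)
```

Because a1 grows only logarithmically with n, adding chromosomes to a small sample changes the estimate and its variance considerably, which is consistent with restricting the analysis to studies that sampled more than 10 chromosomes.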
Nucleotide polymorphism and recombination rate are strongly, positively correlated (R² = 0.63; P = 0.0002; fig. 1a) for these data, despite no evidence for a positive relationship between divergence and recombination rate (P > 0.05). Comparing the residuals of the regression of nucleotide polymorphism on recombination rate with gene density reveals a significant negative association (R² = 0.25; P = 0.04; fig. 1b). As predicted, nucleotide polymorphism is reduced in regions with higher gene density, once recombination rate variation is taken into account. A model including both recombination rate and gene density as independent variables explains 68% (adjusted R²; P = 0.0001; recombination rate: P = 0.0001; gene density: P = 0.05) of the variation in nucleotide polymorphism. There is weak evidence for a negative association between nucleotide polymorphism and gene density alone (R² = 0.17; P = 0.10). There is no evidence of a statistical interaction between recombination rate and gene density, although such an interaction would be difficult to detect with our small sample size. We also asked whether an alternative measure of nucleotide variation, the average pairwise divergence between sequences, π (Nei and Li 1979), is associated with gene density. There is a slight trend toward a negative relationship, but it is not statistically significant (P > 0.05 in all tests).
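The two-step analysis described above (regress polymorphism on recombination rate, then compare the residuals with gene density) and the two-predictor model can be sketched as follows. This is only an illustration under stated assumptions: the six data points, the units, and the helper names `r_squared` and `simple_regression` are hypothetical, the values are not those of table 1, and significance tests are omitted.

```python
import numpy as np


def r_squared(y, fitted):
    """Proportion of the variance in y explained by the fitted values."""
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def simple_regression(x, y):
    """Least-squares line of y on x; returns (slope, residuals, R^2)."""
    slope, intercept = np.polyfit(x, y, 1)
    fitted = intercept + slope * x
    return slope, y - fitted, r_squared(y, fitted)


# Hypothetical values for six loci, NOT the data analyzed in this study.
recombination = np.array([0.2, 0.8, 1.1, 1.9, 2.4, 3.0])   # assumed units: cM/Mb
gene_density = np.array([22.0, 6.0, 15.0, 9.0, 18.0, 5.0])  # assumed units: genes/Mb
theta = np.array([0.03, 0.07, 0.07, 0.11, 0.11, 0.16])      # assumed units: percent

# Step 1: polymorphism on recombination rate.
slope_rec, residuals, r2_rec = simple_regression(recombination, theta)
# Step 2: residuals from step 1 on gene density.
slope_gd, _, r2_resid = simple_regression(gene_density, residuals)

# Model with both predictors, summarized by adjusted R^2.
design = np.column_stack([np.ones_like(theta), recombination, gene_density])
coefficients, *_ = np.linalg.lstsq(design, theta, rcond=None)
r2_full = r_squared(theta, design @ coefficients)
n_obs, n_predictors = len(theta), 2
adjusted_r2 = 1.0 - (1.0 - r2_full) * (n_obs - 1) / (n_obs - n_predictors - 1)
```

Working with residuals removes the part of the variation in polymorphism that recombination rate already explains, so a negative slope in the second step reflects an effect of gene density beyond that of recombination.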
θ and π incorporate different aspects of the data in their estimates of nucleotide diversity. Whereas θ is estimated by counting the number of segregating sites in the total sample, π is estimated by comparing all the pairwise sequence combinations and calculating the average number of differences. As a result, π contains information about allele frequencies and θ does not. However, θ has a lower sampling variance than π. Using the average number of sampled chromosomes (n = 124) and the average θ or π value (approximately 0.1%) for the studies included in our analysis, under an infinite sites model with no recombination, the sampling variance of π (0.034%) is nearly twice that of θ (0.019%). Although this effect may be ameliorated by recombination (Pluzhnikov and Donnelly 1996), the increased statistical difficulty in estimating π may contribute to our failure to detect an association between gene density and π.
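The contrast between the two estimators can be illustrated with a short sketch. It computes π from a toy alignment and evaluates the standard infinite-sites, no-recombination variance expressions for Watterson's estimator (Watterson 1975) and for the average number of pairwise differences (Tajima's 1983 result, which is not cited in this paper). Both expressions take θ per locus, so a locus length is needed; the 5,000-bp length, the toy sequences, and all function names are assumptions for illustration, and the sketch is not meant to reproduce the figures quoted above.

```python
from itertools import combinations


def pairwise_pi(sequences):
    """Average number of pairwise differences per site."""
    length = len(sequences[0])
    pairs = list(combinations(sequences, 2))
    differences = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return differences / (len(pairs) * length)


def harmonic_sums(n):
    """a1 = sum of 1/i and a2 = sum of 1/i^2 for i = 1 .. n - 1."""
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    return a1, a2


def var_theta_w(n, theta):
    """Variance of Watterson's estimator; theta is per locus."""
    a1, a2 = harmonic_sums(n)
    return theta / a1 + a2 * theta ** 2 / a1 ** 2


def var_pi(n, theta):
    """Variance of the mean pairwise difference; theta is per locus."""
    linear = (n + 1) / (3.0 * (n - 1)) * theta
    quadratic = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1)) * theta ** 2
    return linear + quadratic


# Toy alignment (hypothetical): 4 chromosomes, 10 sites.
toy_sample = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC", "ACGTACGTAC"]
pi_hat = pairwise_pi(toy_sample)

# n = 124 and theta = 0.1% per site come from the text; the locus length is assumed.
n_chromosomes = 124
assumed_locus_length = 5000
theta_locus = 0.001 * assumed_locus_length
variance_ratio = var_pi(n_chromosomes, theta_locus) / var_theta_w(n_chromosomes, theta_locus)
```

Under these assumptions the ratio exceeds 1, in line with the statement that π is the noisier of the two estimators when recombination is absent.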
Several conclusions follow from these results. First, natural selection at the molecular level has a pronounced effect on the levels of nucleotide heterozygosity in humans. Even if the total number of sites under selection is relatively modest, it is clear that the effects on linked, neutral variation can be substantial. It remains to be seen whether the patterns depicted in figure 1 are driven mainly by positive selection and associated genetic hitchhiking, purifying selection, or some combination of both. Background selection and genetic hitchhiking are not mutually exclusive, and it seems likely that both processes may be contributing to observed patterns (Kim and Stephan 2000). Second, these results suggest that the density of genes is a reasonable indicator of the potential for selection and that genes are likely the targets of selection in many cases. However, the high degree of sequence conservation between human and mouse in intergenic regions suggests that many of these intergenic regions may also be functional, possibly containing important cis-regulatory elements (Shabalina et al. 2001). The degree to which the densities of genes and cis-regulatory elements covary is therefore an interesting question for further investigation. Finally, our results indicate that levels of human nucleotide polymorphism can be predicted with reasonable precision, given the knowledge about local recombination rate and gene density. Because recombination rate and gene density can now be measured throughout the human genome, this predictive ability could assist efforts to map genes underlying complex diseases.
Acknowledgements
We thank Bruce Walsh for helpful discussions. We also acknowledge the useful comments of Adam Eyre-Walker and two anonymous reviewers.
Footnotes
Adam Eyre-Walker, Reviewing Editor
Address for correspondence and reprints: Bret A. Payseur, Department of Ecology and Evolutionary Biology, Biosciences West Building, University of Arizona, Tucson, Arizona 85721. payseur@email.arizona.edu.
References
Aguade M., N. Miyashita, C. H. Langley, 1989 Reduced variation in the yellow-achaete-scute region in natural populations of Drosophila melanogaster Genetics 122:607-615
Alonso S., J. A. L. Armour, 2001 A highly variable segment of human subterminal 16p reveals a history of population growth for modern humans outside Africa Proc. Natl. Acad. Sci. USA 98:864-869
Badge R. M., J. Yardley, A. J. Jeffreys, J. A. L. Armour, 2000 Crossover breakpoint mapping identifies a subtelomeric hotspot for male meiotic recombination Hum. Mol. Genet 9:1239-1244
Begun D. J., C. F. Aquadro, 1991 Molecular population genetics of the distal portion of the X chromosome in Drosophila: evidence for genetic hitchhiking of the yellow-achaete-scute region Genetics 129:1147-1158
Begun D. J., C. F. Aquadro, 1992 Levels of naturally occurring DNA polymorphism correlate with recombination rates in D. melanogaster Nature 356:519-520
Berry A. J., W. Ajioka, M. Kreitman, 1991 Lack of polymorphism on the Drosophila fourth chromosome resulting from selection Genetics 129:1111-1117
Charlesworth B., M. T. Morgan, D. Charlesworth, 1993 The effect of deleterious mutations on neutral molecular variation Genetics 134:1289-1303
Clark A. G., K. M. Weiss, D. A. Nickerson, et al. (11 co-authors) 1998 Haplotype structure and population genetic inferences from nucleotide sequence variation in human lipoprotein lipase Am. J. Hum. Genet 63:595-612
Deinard A., K. Kidd, 1999 Evolution of a HOXB6 intergenic region within the great apes and humans J. Hum. Evol 36:687-703
Dvorak J., M. C. Luo, Z. L. Yang, 1998 Restriction fragment length polymorphism and divergence in the genomic regions of high and low recombination in self-fertilizing and cross-fertilizing Aegilops species Genetics 148:423-434
Fullerton S. M., A. Bernardo Carvalho, A. G. Clark, 2001 Local rates of recombination are positively correlated with GC content in the human genome Mol. Biol. Evol 18:1139-1142
Fullerton S. M., K. M. Weiss, A. G. Clark, et al. (11 co-authors) 2000 Apolipoprotein E variation at the sequence haplotype level: implications for the origin and maintenance of a major human polymorphism Am. J. Hum. Genet 67:881-900
Gilad Y., D. Segre, K. Skorecki, M. W. Nachman, D. Lancet, D. Sharon, 2000 Dichotomy of single-nucleotide polymorphism haplotypes in olfactory receptor genes and pseudogenes Nat. Genet 26:221-224
Hamblin M. T., A. Di Rienzo, 2000 Detection of the signature of natural selection in humans: evidence from the Duffy blood group locus Am. J. Hum. Genet 66:1669-1679
Harding R. M., S. M. Fullerton, R. C. Griffiths, J. Bond, M. J. Cox, J. A. Schneider, D. S. Moulin, J. B. Clegg, 1997 Archaic African and Asian lineages in the genetic ancestry of modern humans Am. J. Hum. Genet 60:772-789
Harris E. E., J. Hey, 1999 X chromosome evidence for ancient human histories Proc. Natl. Acad. Sci. USA 96:3320-3324
Harris E. E., J. Hey, 2001 Human populations show reduced DNA sequence variation at the factor IX locus Curr. Biol 11:774-778
Hudson R. R., N. L. Kaplan, 1995 Deleterious background selection and recombination Genetics 141:1605-1617
International Human Genome Sequencing Consortium. 2001 Initial sequencing and analysis of the human genome Nature 409:860-921
International SNP Map Working Group. 2001 A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms Nature 409:928-933
Jaruzelska J., E. Zietkiewicz, M. Batzer, D. E. C. Cole, J. P. Moisan, R. Scozzari, S. Tavare, D. Labuda, 1999 Spatial and temporal distribution of the neutral polymorphisms in the last Zfx intron: analysis of haplotype structure and genealogy Genetics 152:1091-1101
Kaessmann H., F. Heibig, A. von Haeseler, S. Paabo, 1999 DNA sequence variation in a non-coding region of low recombination on the human X chromosome Nat. Genet 22:78-81
Kim Y., W. Stephan, 2000 Joint effects of genetic hitchhiking and background selection on neutral variation Genetics 155:1415-1427
Kraft T., T. Sall, I. Magnusson-Rading, N. O. Nilsson, C. Halden, 1998 Positive correlation between recombination rates and levels of genetic variation in natural populations of sea beet (Beta vulgaris subsp. maritima) Genetics 150:1239-1244
Maynard Smith J., J. Haigh, 1974 The hitch-hiking effect of a favourable gene Genet. Res 23:23-35
Nachman M. W., 1997 Patterns of DNA variability at X-linked loci in Mus domesticus Genetics 147:1303-1316
Nachman M. W., 2001 Single nucleotide polymorphisms and recombination rate in humans Trends Genet 17:481-485
Nachman M. W., V. L. Bauer, S. L. Crowell, C. F. Aquadro, 1998 DNA variability and recombination rates at X-linked loci in humans Genetics 147:1133-1141
Nachman M. W., S. L. Crowell, 2000 Contrasting evolutionary histories of two introns of the Duchenne muscular dystrophy locus, Dmd, in humans Genetics 155:1855-1864
Nei M., W.-H. Li, 1979 Mathematical model for studying genetic variation in terms of restriction endonucleases Proc. Natl. Acad. Sci. USA 76:5269-5273
Payseur B. A., M. W. Nachman, 2000 Microsatellite variation and recombination rate in the human genome Genetics 156:1285-1298
Pluzhnikov A., P. Donnelly, 1996 Optimal sequencing strategies for surveying molecular genetic diversity Genetics 144:1247-1262
Przeworski M., R. R. Hudson, A. Di Rienzo, 2000 Adjusting the focus on human variation Trends Genet 16:296-302
Rana B. K., D. Hewett-Emmett, L. Jin, et al. (12 co-authors) 1999 High polymorphism at the human melanocortin 1 receptor locus Genetics 151:1547-1557
Rieder M. J., S. L. Taylor, A. G. Clark, D. A. Nickerson, 1999 Sequence variation in the human angiotensin converting enzyme Nat. Genet 22:59-62
Shabalina A. S., A. Y. Ogurtsov, V. A. Kondrashov, A. S. Kondrashov, 2001 Selective constraint in intergenic regions of human and mouse genomes Trends Genet 17:373-376
Stephan W., C. H. Langley, 1998 DNA polymorphism in Lycopersicon and crossing-over per physical length Genetics 150:1585-1593
Tenaillon M. I., M. C. Sawkins, A. D. Long, R. L. Gaut, J. F. Doebley, B. S. Gaut, 2001 Patterns of DNA sequence polymorphism along chromosome 1 of maize (Zea mays ssp. Mays L.) Proc. Natl. Acad. Sci. USA 98:9161-9166
Venter J. C., M. D. Adams, E. W. Myers, et al. (274 co-authors) 2001 The sequence of the human genome Science 291:1304-1351
Watterson G. A., 1975 On the number of segregating sites in genetical models without recombination Theor. Popul. Biol 7:256-276
Wiehe T. H. E., W. Stephan, 1993 Analysis of a genetic hitchhiking model and its application to DNA polymorphism data from Drosophila melanogaster Mol. Biol. Evol 10:842-854
Yu N., Y.-X. Fu, N. Sambuughin, M. Ramsay, T. Jenkins, E. Leskinen, L. Patthy, L. B. Jorde, T. Kuromori, W.-H. Li, 2001 Global patterns of human DNA sequence variation in a 10-kb region on chromosome 1 Mol. Biol. Evol 18:214-222
Zhao Z., L. Jin, Y.-X. Fu, et al. (13 co-authors) 2000 Worldwide DNA sequence variation in a 10-kb noncoding region on human chromosome 22 Proc. Natl. Acad. Sci. USA 97:11354-11358