Gene Density and Human Nucleotide Polymorphism

Bret A. Payseur and Michael W. Nachman

Department of Ecology and Evolutionary Biology, Biosciences West Building, University of Arizona

Population genetics theory indicates that natural selection will affect levels and patterns of genetic variation at closely linked loci. Background selection (Charlesworth, Morgan, and Charlesworth 1993) proposes that the removal of recurrent deleterious mutations and associated neutral variants will cause a reduction of nucleotide variation in low-recombination regions. The strength of background selection depends on the deleterious mutation rate, the magnitude of selection and dominance, and the recombination rate. Genetic hitchhiking (Maynard Smith and Haigh 1974), the fixation of advantageous alleles and the associated fixation of linked neutral alleles, can also decrease nucleotide diversity in low-recombination regions. The extent of genetic hitchhiking depends on the strength of selection and the rate of recombination. Therefore, under both background selection and genetic hitchhiking, theory predicts that genomic regions that rarely recombine may be subject to reductions in nucleotide diversity. Furthermore, if the rate of deleterious mutation or selective sweeps (or both) is sufficiently high, background selection (Hudson and Kaplan 1995) and genetic hitchhiking (Wiehe and Stephan 1993) models predict an overall positive correlation between nucleotide polymorphism and recombination rate.

Empirical investigations of nucleotide variation support these predictions. In Drosophila melanogaster, regions of the genome with little recombination show reduced heterozygosity (Aguade, Miyashita, and Langley 1989; Begun and Aquadro 1991; Berry, Ajioka, and Kreitman 1991). Furthermore, there is evidence that nucleotide variation and recombination rate are positively correlated in several taxa, including fruit flies (Begun and Aquadro 1992), house mice (Nachman 1997), goatgrasses (Dvorak, Luo, and Yang 1998), sea beets (Kraft et al. 1998), tomatoes (Stephan and Langley 1998), humans (Nachman et al. 1998; Przeworski, Hudson, and Di Rienzo 2000; Nachman 2001), and maize (Tenaillon et al. 2001). The combination of theoretical and empirical results indicates that selection acting at linked sites is likely to be a major force shaping genomic patterns of nucleotide variation.

The documented relationship between nucleotide variation and recombination rate raises the question of whether other measurable variables can explain additional variation in nucleotide polymorphism in the context of selection at linked sites. We predict that the effects of selection at linked sites will depend on local gene density. If selection acts primarily on genes, genomic regions with high gene density will harbor more potential selective targets than genomic regions with low gene density. This prediction should be valid irrespective of whether positive or purifying selection is driving observed patterns. Humans provide a good system in which this prediction can be tested, for two reasons. First, gene density varies substantially across the genome (International Human Genome Sequencing Consortium 2001; Venter et al. 2001). For example, sequence data suggest that chromosome 19 has an average of 23 genes per Mbp, whereas chromosome 4 averages only 6 genes per Mbp (Venter et al. 2001). Second, estimates of nucleotide polymorphism assessed using reasonable sample sizes are available for multiple loci across the human genome. Here, we demonstrate that nucleotide diversity and gene density are negatively correlated in humans. This result provides further evidence for the importance of selection at linked sites and suggests that the number of genes in a genomic region is a reasonable indicator of selective intensity.

We assessed the relationship between nucleotide polymorphism (measured by Watterson's [1975] θ) and gene density using data from sequence-based studies of variation that sampled more than 10 chromosomes (table 1). The variance in θ can be quite large with sample sizes smaller than 10 (Pluzhnikov and Donnelly 1996).
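As an illustration of the polymorphism measure used here, the following minimal Python sketch computes Watterson's θ per site from a set of aligned sequences, together with the average pairwise difference per site (Nei and Li 1979) for comparison. The function names and the toy alignment are hypothetical and are not taken from the studies in table 1; the sketch assumes equal-length sequences with no gaps or missing data.

```python
from itertools import combinations


def harmonic_number(n):
    """Return a_n = sum of 1/i for i = 1..n-1, the scaling term in Watterson's estimator."""
    return sum(1.0 / i for i in range(1, n))


def segregating_sites(seqs):
    """Count alignment columns that carry more than one allele."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def watterson_theta(seqs):
    """Per-site Watterson's theta: S / (a_n * L) for n sequences of length L."""
    n, length = len(seqs), len(seqs[0])
    return segregating_sites(seqs) / (harmonic_number(n) * length)


def pairwise_diversity(seqs):
    """Per-site average number of differences over all pairs of sequences."""
    n, length = len(seqs), len(seqs[0])
    total = sum(
        sum(a != b for a, b in zip(s1, s2))
        for s1, s2 in combinations(seqs, 2)
    )
    n_pairs = n * (n - 1) / 2
    return total / (n_pairs * length)


# Hypothetical toy alignment: 4 chromosomes, 10 sites, 2 segregating sites.
toy = ["AAAAAAAAAA", "AAAAAAAAAT", "AACAAAAAAT", "AACAAAAAAA"]
print(segregating_sites(toy), watterson_theta(toy), pairwise_diversity(toy))
```

Because the scaling term a_n grows only slowly with the number of sampled chromosomes, θ from very small samples rests on few segregating sites, which is one way to see why small samples give noisy estimates.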


Table 1 Data used for analyses in this study

 
For X-linked loci, nucleotide diversity was multiplied by 4/3 to account for the fact that the effective population size of the X chromosome is 3/4 of that of the autosomes (assuming a sex ratio of 1). Sequence-based maps of the human genome (http://www.ncbi.nlm.nih.gov/genome/guide/human/, June 2001 Version) were used to estimate the base pair position of each locus. Gene density was estimated by counting the number of genes in a window, including 1 Mbp of sequence on either side of each locus (http://www.ncbi.nlm.nih.gov/genome/guide/human/). Gene density estimates based on a 10-Mbp window gave similar results. Recombination rates were taken from Payseur and Nachman (2000), who compared the genetic and physical positions of microsatellites spaced at approximately 2-Mbp intervals. Recombination rates for X-linked loci were multiplied by 2/3 to correct for differences in population recombination rates. All variables were approximately normally distributed (visual inspection of histograms; Shapiro-Wilk goodness-of-fit test; P > 0.05). Additionally, the residuals from the regression of θ on all variables were normally distributed (P > 0.05). All analyses were done using least-squares linear regression.
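The window-based gene count and the X-chromosome adjustments described above can be sketched in Python as follows. This is only an illustration under stated assumptions: each gene is represented by a single base pair coordinate (the text does not say how gene positions were assigned), and the function names and example coordinates are hypothetical.

```python
def genes_in_window(locus_bp, gene_positions_bp, flank_bp=1_000_000):
    """Count genes lying within flank_bp on either side of the locus.

    Assumption: every gene is summarized by one coordinate in base pairs.
    """
    lo, hi = locus_bp - flank_bp, locus_bp + flank_bp
    return sum(1 for pos in gene_positions_bp if lo <= pos <= hi)


def adjust_for_x(theta, recomb_cm_per_mb, x_linked):
    """Rescale X-linked values: diversity by 4/3, recombination rate by 2/3."""
    if x_linked:
        return theta * 4.0 / 3.0, recomb_cm_per_mb * 2.0 / 3.0
    return theta, recomb_cm_per_mb


# Hypothetical locus at 5 Mbp with invented gene coordinates.
genes = [3_900_000, 4_000_000, 5_500_000, 6_000_000, 6_000_001]
print(genes_in_window(5_000_000, genes))   # genes within 1 Mbp on either side
print(adjust_for_x(0.09, 1.5, x_linked=True))
```

Passing a larger `flank_bp` gives the wider window used as a robustness check; for a 10-Mbp window centered on the locus the flank would be 5,000,000 bp, assuming the window is symmetric.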

Nucleotide polymorphism and recombination rate are strongly, positively correlated (R² = 0.63; P = 0.0002; fig. 1a) for these data, despite no evidence for a positive relationship between divergence and recombination rate (P > 0.05). Comparing the residuals of the regression of nucleotide polymorphism on recombination rate with gene density reveals a significant negative association (R² = 0.25; P = 0.04; fig. 1b). As predicted, nucleotide polymorphism is reduced in regions with higher gene density, once recombination rate variation is taken into account. A model including both recombination rate and gene density as independent variables explains 68% (adjusted R²; P = 0.0001; recombination rate: P = 0.0001; gene density: P = 0.05) of the variation in nucleotide polymorphism. There is weak evidence for a negative association between nucleotide polymorphism and gene density alone (R² = 0.17; P = 0.10). There is no evidence of a statistical interaction between recombination rate and gene density, although such an interaction would be difficult to detect with our small sample size. We also asked whether an alternative measure of nucleotide variation, the average pairwise divergence between sequences, π (Nei and Li 1979), is associated with gene density. There is a slight trend toward a negative relationship, but it is not statistically significant (P > 0.05 in all tests). θ and π incorporate different aspects of the data in their estimates of nucleotide diversity. Whereas θ is estimated by counting the number of segregating sites in the total sample, π is estimated by comparing all the pairwise sequence combinations and calculating the average number of differences. As a result, π contains information about allele frequencies and θ does not. However, θ has a lower sampling variance than π. Using the average number of sampled chromosomes (n = 124) and the average θ or π value (approximately 0.1%) for the studies included in our analysis, under an infinite sites model with no recombination, the sampling variance of π (0.034%) is nearly twice that of θ (0.019%). Although this effect may be ameliorated by recombination (Pluzhnikov and Donnelly 1996), the increased statistical difficulty in estimating π may contribute to our failure to detect an association between gene density and π.
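The two-step analysis described in this paragraph (regress polymorphism on recombination rate, then compare the residuals with gene density) can be sketched in Python as below. The data values are invented purely for illustration and are not the table 1 data; the helper names are hypothetical, and the sketch reports only slopes and R², not the P values given in the text.

```python
def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = (sxy * sxy) / (sxx * syy)
    return intercept, slope, r_squared


def residuals(x, y):
    """Observed y minus the value predicted by the least-squares line."""
    intercept, slope, _ = fit_line(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Hypothetical loci (invented values, for illustration only).
recomb = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]          # cM/Mb
density = [20, 5, 18, 6, 15, 4]                  # genes per window
theta = [0.03, 0.07, 0.07, 0.11, 0.11, 0.15]     # percent

# Step 1: remove the effect of recombination rate on polymorphism.
resid = residuals(recomb, theta)

# Step 2: ask whether what is left over is associated with gene density.
_, slope_density, r2_density = fit_line(density, resid)
print(slope_density, r2_density)
```

A negative slope in the second step corresponds to the pattern reported here: lower polymorphism than expected from recombination rate alone in gene-dense regions. A multiple regression with both predictors, as also used in the text, would be the natural alternative to this residual approach.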



Fig. 1. (a) The relationship between nucleotide polymorphism and recombination rate. Scatterplot of nucleotide polymorphism (θ, expressed as a percentage) versus recombination rate (cM/Mb) for 17 loci surveyed in humans. R² = 0.63; P = 0.0002. (b) The relationship between nucleotide polymorphism, corrected for variation in recombination rate, and gene density. Scatterplot of the residuals from a regression of nucleotide polymorphism on recombination rate versus gene density. R² = 0.25; P = 0.04.

 
An alternative interpretation of our results is that nucleotide polymorphism is shaped by other variables that are correlated with gene density or recombination rate. GC content is positively correlated with both gene density (International Human Genome Sequencing Consortium 2001) and recombination rate (Fullerton, Bernardo Carvalho, and Clark 2001) in humans. Consequently, we asked whether GC content was associated with nucleotide polymorphism alone or once gene density and recombination rate had been taken into account. There is no evidence that GC content affects levels of polymorphism in these data (P > 0.05, bivariate and multiple linear regression analyses), although a weak correlation between SNP (single nucleotide polymorphism) heterozygosity and GC content in humans has been reported (International SNP Map Working Group 2001). This discrepancy may reflect the relatively small number of loci used in our study.

Several conclusions follow from these results. First, natural selection at the molecular level has a pronounced effect on the levels of nucleotide heterozygosity in humans. Even if the total number of sites under selection is relatively modest, it is clear that the effects on linked, neutral variation can be substantial. It remains to be seen whether the patterns depicted in figure 1 are driven mainly by positive selection and associated genetic hitchhiking, purifying selection, or some combination of both. Background selection and genetic hitchhiking are not mutually exclusive, and it seems likely that both processes may be contributing to observed patterns (Kim and Stephan 2000). Second, these results suggest that the density of genes is a reasonable indicator of the potential for selection and that genes are likely the targets of selection in many cases. However, the high degree of sequence conservation between human and mouse in intergenic regions suggests that many of these intergenic regions may also be functional, possibly containing important cis-regulatory elements (Shabalina et al. 2001). The degree to which the densities of genes and cis-regulatory elements covary is therefore an interesting question for further investigation. Finally, our results indicate that levels of human nucleotide polymorphism can be predicted with reasonable precision, given the knowledge about local recombination rate and gene density. Because recombination rate and gene density can now be measured throughout the human genome, this predictive ability could assist efforts to map genes underlying complex diseases.

Acknowledgements

We thank Bruce Walsh for helpful discussions. We also acknowledge the useful comments of Adam Eyre-Walker and two anonymous reviewers.

Footnotes

Adam Eyre-Walker, Reviewing Editor

Address for correspondence and reprints: Bret A. Payseur, Department of Ecology and Evolutionary Biology, Biosciences West Building, University of Arizona, Tucson, Arizona 85721. payseur@email.arizona.edu.

References

    Aguade M., N. Miyashita, C. H. Langley. 1989. Reduced variation in the yellow-achaete-scute region in natural populations of Drosophila melanogaster. Genetics 122:607-615.

    Alonso S., J. A. L. Armour. 2001. A highly variable segment of human subterminal 16p reveals a history of population growth for modern humans outside Africa. Proc. Natl. Acad. Sci. USA 98:864-869.

    Badge R. M., J. Yardley, A. J. Jeffreys, J. A. L. Armour. 2000. Crossover breakpoint mapping identifies a subtelomeric hotspot for male meiotic recombination. Hum. Mol. Genet. 9:1239-1244.

    Begun D. J., C. F. Aquadro. 1991. Molecular population genetics of the distal portion of the X chromosome in Drosophila: evidence for genetic hitchhiking of the yellow-achaete-scute region. Genetics 129:1147-1158.

    Begun D. J., C. F. Aquadro. 1992. Levels of naturally occurring DNA polymorphism correlate with recombination rates in D. melanogaster. Nature 356:519-520.

    Berry A. J., J. W. Ajioka, M. Kreitman. 1991. Lack of polymorphism on the Drosophila fourth chromosome resulting from selection. Genetics 129:1111-1117.

    Charlesworth B., M. T. Morgan, D. Charlesworth. 1993. The effect of deleterious mutations on neutral molecular variation. Genetics 134:1289-1303.

    Clark A. G., K. M. Weiss, D. A. Nickerson, et al. (11 co-authors). 1998. Haplotype structure and population genetic inferences from nucleotide sequence variation in human lipoprotein lipase. Am. J. Hum. Genet. 63:595-612.

    Deinard A., K. Kidd. 1999. Evolution of a HOXB6 intergenic region within the great apes and humans. J. Hum. Evol. 36:687-703.

    Dvorak J., M. C. Luo, Z. L. Yang. 1998. Restriction fragment length polymorphism and divergence in the genomic regions of high and low recombination in self-fertilizing and cross-fertilizing Aegilops species. Genetics 148:423-434.

    Fullerton S. M., A. Bernardo Carvalho, A. G. Clark. 2001. Local rates of recombination are positively correlated with GC content in the human genome. Mol. Biol. Evol. 18:1139-1142.

    Fullerton S. M., K. M. Weiss, A. G. Clark, et al. (11 co-authors). 2000. Apolipoprotein E variation at the sequence haplotype level: implications for the origin and maintenance of a major human polymorphism. Am. J. Hum. Genet. 67:881-900.

    Gilad Y., D. Segre, K. Skorecki, M. W. Nachman, D. Lancet, D. Sharon. 2000. Dichotomy of single-nucleotide polymorphism haplotypes in olfactory receptor genes and pseudogenes. Nat. Genet. 26:221-224.

    Hamblin M. T., A. Di Rienzo. 2000. Detection of the signature of natural selection in humans: evidence from the Duffy blood group locus. Am. J. Hum. Genet. 66:1669-1679.

    Harding R. M., S. M. Fullerton, R. C. Griffiths, J. Bond, M. J. Cox, J. A. Schneider, D. S. Moulin, J. B. Clegg. 1997. Archaic African and Asian lineages in the genetic ancestry of modern humans. Am. J. Hum. Genet. 60:772-789.

    Harris E. E., J. Hey. 1999. X chromosome evidence for ancient human histories. Proc. Natl. Acad. Sci. USA 96:3320-3324.

    Harris E. E., J. Hey. 2001. Human populations show reduced DNA sequence variation at the factor IX locus. Curr. Biol. 11:774-778.

    Hudson R. R., N. L. Kaplan. 1995. Deleterious background selection and recombination. Genetics 141:1605-1617.

    International Human Genome Sequencing Consortium. 2001. Initial sequencing and analysis of the human genome. Nature 409:860-921.

    International SNP Map Working Group. 2001. A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature 409:928-933.

    Jaruzelska J., E. Zietkiewicz, M. Batzer, D. E. C. Cole, J. P. Moisan, R. Scozzari, S. Tavare, D. Labuda. 1999. Spatial and temporal distribution of the neutral polymorphisms in the last Zfx intron: analysis of haplotype structure and genealogy. Genetics 152:1091-1101.

    Kaessmann H., F. Heissig, A. von Haeseler, S. Paabo. 1999. DNA sequence variation in a non-coding region of low recombination on the human X chromosome. Nat. Genet. 22:78-81.

    Kim Y., W. Stephan. 2000. Joint effects of genetic hitchhiking and background selection on neutral variation. Genetics 155:1415-1427.

    Kraft T., T. Sall, I. Magnusson-Rading, N. O. Nilsson, C. Halden. 1998. Positive correlation between recombination rates and levels of genetic variation in natural populations of sea beet (Beta vulgaris subsp. maritima). Genetics 150:1239-1244.

    Maynard Smith J., J. Haigh. 1974. The hitch-hiking effect of a favourable gene. Genet. Res. 23:23-35.

    Nachman M. W. 1997. Patterns of DNA variability at X-linked loci in Mus domesticus. Genetics 147:1303-1316.

    Nachman M. W. 2001. Single nucleotide polymorphisms and recombination rate in humans. Trends Genet. 17:481-485.

    Nachman M. W., V. L. Bauer, S. L. Crowell, C. F. Aquadro. 1998. DNA variability and recombination rates at X-linked loci in humans. Genetics 150:1133-1141.

    Nachman M. W., S. L. Crowell. 2000. Contrasting evolutionary histories of two introns of the Duchenne muscular dystrophy locus, Dmd, in humans. Genetics 155:1855-1864.

    Nei M., W.-H. Li. 1979. Mathematical model for studying genetic variation in terms of restriction endonucleases. Proc. Natl. Acad. Sci. USA 76:5269-5273.

    Payseur B. A., M. W. Nachman. 2000. Microsatellite variation and recombination rate in the human genome. Genetics 156:1285-1298.

    Pluzhnikov A., P. Donnelly. 1996. Optimal sequencing strategies for surveying molecular genetic diversity. Genetics 144:1247-1262.

    Przeworski M., R. R. Hudson, A. Di Rienzo. 2000. Adjusting the focus on human variation. Trends Genet. 16:296-302.

    Rana B. K., D. Hewett-Emmett, L. Jin, et al. (12 co-authors). 1999. High polymorphism at the human melanocortin 1 receptor locus. Genetics 151:1547-1557.

    Rieder M. J., S. L. Taylor, A. G. Clark, D. A. Nickerson. 1999. Sequence variation in the human angiotensin converting enzyme. Nat. Genet. 22:59-62.

    Shabalina S. A., A. Y. Ogurtsov, V. A. Kondrashov, A. S. Kondrashov. 2001. Selective constraint in intergenic regions of human and mouse genomes. Trends Genet. 17:373-376.

    Stephan W., C. H. Langley. 1998. DNA polymorphism in Lycopersicon and crossing-over per physical length. Genetics 150:1585-1593.

    Tenaillon M. I., M. C. Sawkins, A. D. Long, R. L. Gaut, J. F. Doebley, B. S. Gaut. 2001. Patterns of DNA sequence polymorphism along chromosome 1 of maize (Zea mays ssp. mays L.). Proc. Natl. Acad. Sci. USA 98:9161-9166.

    Venter J. C., M. D. Adams, E. W. Myers, et al. (274 co-authors). 2001. The sequence of the human genome. Science 291:1304-1351.

    Watterson G. A. 1975. On the number of segregating sites in genetical models without recombination. Theor. Popul. Biol. 7:256-276.

    Wiehe T. H. E., W. Stephan. 1993. Analysis of a genetic hitchhiking model and its application to DNA polymorphism data from Drosophila melanogaster. Mol. Biol. Evol. 10:842-854.

    Yu N., Y.-X. Fu, N. Sambuughin, M. Ramsay, T. Jenkins, E. Leskinen, L. Patthy, L. B. Jorde, T. Kuromori, W.-H. Li. 2001. Global patterns of human DNA sequence variation in a 10-kb region on chromosome 1. Mol. Biol. Evol. 18:214-222.

    Zhao Z., L. Jin, Y.-X. Fu, et al. (13 co-authors). 2000. Worldwide DNA sequence variation in a 10-kb noncoding region on human chromosome 22. Proc. Natl. Acad. Sci. USA 97:11354-11358.

Accepted for publication October 8, 2001.