Selective Sweeps in the Human Genome: A Starting Point for Identifying Genetic Differences Between Modern Humans and Chimpanzees

Karl C. Diller, William A. Gilbert, and Thomas D. Kocher

Hubbard Center for Genome Studies and Graduate Program in Genetics, University of New Hampshire

Despite more than a century of interest in the evolution of humans from our close relatives the great apes, the genes responsible for phenotypic differences between humans and chimpanzees have remained elusive. Sequencing of the chimpanzee genome is expected to identify some 42 million nucleotide differences between humans and chimpanzees. How can we identify the small proportion of these differences that are the essential elements of being human? We have analyzed the draft human genome to find regions that may have experienced recent strong selection in the human line. The identified regions include several genes for neural development and function, skeletal development, and fat metabolism. These observations provide a starting point in the search to identify the salient genetic differences between modern humans and our immediate hominid ancestors.

Strong directional selection for a favorable new allele can cause a "selective sweep." As the new mutant rises in frequency, adjacent chromosomal regions are also swept to fixation in a process sometimes called genetic hitchhiking (Maynard-Smith and Haigh 1974). These events can be recognized as regions of low nucleotide diversity. The size of such a region depends both on the length of time required to bring the mutation to fixation in the population and on the local recombination rate. With time, the record of the selective sweep is gradually erased by new mutations and the continual turnover of neutral variation. The average persistence time for a human single nucleotide polymorphism (SNP) is 4N generations, or about 1 Myr (assuming an effective population size [N] of 10,000 and a generation time of 25 years). A region devoid of SNPs because of a selective sweep will therefore regain 50% of its normal level of polymorphism in 1 Myr and 75% within 2 Myr. The current areas of low nucleotide diversity in the human genome may therefore reflect recent selective sweeps surrounding genes that were important for the evolution of modern humans some 200,000 years ago, but they are much less likely to be associated with the rise of Homo erectus 2 MYA.
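The arithmetic in the preceding paragraph can be sketched in a few lines of Python. The function names are hypothetical, and the recovery curve is an assumption: a simple halving model (half of the remaining deficit is regained per persistence time) chosen only because it reproduces the 50% and 75% figures quoted above. It is not necessarily the model the authors used.

```python
def persistence_time_years(population_size: int, generation_years: float) -> float:
    """Average persistence time of a neutral SNP: 4N generations, in years."""
    return 4 * population_size * generation_years


def fraction_recovered(elapsed_years: float, persistence_years: float) -> float:
    """Fraction of the normal polymorphism level regained after a sweep.

    ASSUMPTION: half of the remaining deficit is recovered per persistence
    time. This matches the 50% (1 Myr) and 75% (2 Myr) values in the text
    but is an illustrative model, not one stated by the authors.
    """
    return 1.0 - 0.5 ** (elapsed_years / persistence_years)


# N = 10,000 and a 25-year generation time, as assumed in the text.
t_persist = persistence_time_years(10_000, 25)

for label, years in [("200,000 yr", 2e5), ("1 Myr", 1e6), ("2 Myr", 2e6)]:
    print(f"{label}: {fraction_recovered(years, t_persist):.0%} of normal diversity regained")
```

Under this assumed model, a sweep dating to the origin of modern humans would still show most of its diversity deficit, whereas a sweep 2 Myr old would be largely erased, which is the contrast the paragraph draws.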

We examined the data of the International SNP Map Working Group (2001) to identify areas of low nucleotide diversity in the human genome. In these data, the genome is divided into consecutive bins of 200,000 base pairs, and the density of SNPs is recorded from a panel of 24 ethnically diverse individuals. Over the whole genome, 2.5% of the bins had nucleotide diversity (π) less than 2.0 × 10⁻⁴. Regions of low SNP density are most prevalent on the sex chromosomes: 89% of bins in the nonrecombining region of the Y and 15% of bins on the X chromosome have π < 2.0 × 10⁻⁴.
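As a minimal sketch of this screening step, the following Python code flags bins whose diversity falls below the threshold. The `Bin` record, the function names, and the toy values are all hypothetical; the sketch assumes that a per-bin estimate of π is already available and does not reproduce how the SNP Map Working Group derived it.

```python
from typing import List, NamedTuple

BIN_SIZE_BP = 200_000   # bin width described in the text
PI_THRESHOLD = 2.0e-4   # low-diversity cutoff described in the text


class Bin(NamedTuple):
    chrom: str
    index: int   # 0-based position of the bin along the chromosome
    pi: float    # nucleotide diversity estimate for the bin (assumed given)


def bin_index(position_bp: int) -> int:
    """Map a 0-based base-pair coordinate to its 200-kb bin."""
    return position_bp // BIN_SIZE_BP


def low_diversity_bins(bins: List[Bin]) -> List[Bin]:
    """Return the bins whose diversity is strictly below the threshold."""
    return [b for b in bins if b.pi < PI_THRESHOLD]


def fraction_low(bins: List[Bin]) -> float:
    """Fraction of bins that fall below the threshold (0.0 for no bins)."""
    return len(low_diversity_bins(bins)) / len(bins) if bins else 0.0


# Made-up values for illustration only; these are not data from the study.
toy_bins = [
    Bin("chr21", 0, 7.5e-4),
    Bin("chr21", 1, 1.2e-4),
    Bin("chr21", 2, 1.9e-4),
    Bin("chr21", 3, 2.0e-4),
]
print([b.index for b in low_diversity_bins(toy_bins)], fraction_low(toy_bins))
```

The comparison is strict, so a bin with π exactly equal to the cutoff is not flagged, matching the "less than" wording above.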

Our analysis focused on the autosomes. While any bin where π < 2.0 × 10⁻⁴ is a likely place to look for a selective sweep, areas where we find two or more consecutive bins are of special interest because they may indicate an episode of strong selection. The 192 low-diversity autosomal bins are not distributed randomly. One would expect no more than five bins to be clustered with other bins by chance, but we found 31 bins clustered into 12 runs of two or more consecutive bins. This clumping is highly unlikely (Runs test, t = -10.57, P ≪ 10⁻¹⁶; see table 1) (Sokal and Rohlf 1981, pp. 782–784). There are two runs of four consecutive low-diversity bins, three runs of three consecutive bins, and seven runs of two consecutive bins. None of these runs are on chromosomes containing "recombination deserts" described by Yu et al. (2001). Because of continuing changes in the assembly of the human genome, we used BLAST to match the sequences of the original bins to the current annotated assembly (February 2002) and scanned these chromosomal regions to identify genes which might be responsible for a selective sweep.
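The clustering analysis can be illustrated with a short Python sketch. It assumes the bins along a chromosome are represented as an ordered list of booleans (True for a low-diversity bin), and it uses the standard large-sample approximation for a runs test on dichotomized data. The function names are hypothetical, and the calculation may differ in detail from the one summarized in table 1.

```python
import math
from itertools import groupby
from typing import List, Sequence


def low_runs(flags: Sequence[bool], min_length: int = 2) -> List[int]:
    """Lengths of runs of consecutive low-diversity bins of at least min_length."""
    lengths = []
    for is_low, group in groupby(flags):
        n = len(list(group))
        if is_low and n >= min_length:
            lengths.append(n)
    return lengths


def runs_test_statistic(flags: Sequence[bool]) -> float:
    """Standardized deviation of the observed number of runs from expectation.

    Uses the usual normal approximation for a two-category runs test.
    A strongly negative value means fewer runs than expected, i.e. clumping.
    """
    n1 = sum(1 for f in flags if f)      # low-diversity bins
    n2 = len(flags) - n1                 # all other bins
    if n1 == 0 or n2 == 0:
        raise ValueError("both categories must be present")
    n = n1 + n2
    observed = sum(1 for _ in groupby(flags))   # runs of either category
    expected = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    return (observed - expected) / math.sqrt(variance)


# Toy chromosome for illustration only; not data from the study.
toy = [False, True, True, False, True, False, True, True, True]
print(low_runs(toy), round(runs_test_statistic(toy), 3))
```

Treating each chromosome as its own ordered sequence avoids counting bins at the ends of different chromosomes as adjacent; how the per-chromosome results are then combined is a design choice this sketch leaves open.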


Table 1 Runs Test for Distribution of Low-Diversity Bins on Human Autosomes

 
The run of four consecutive bins on chromosome 21 contains two genes involved in neural development and function. SYNJ1 (synaptojanin 1) encodes a presynaptic protein with a role in synaptic vesicle recycling and a possible link to bipolar disorder (Saito et al. 2001). OLIG2 (oligodendrocyte lineage transcription factor 2) specifies the development of oligodendrocytes (the cells that form the myelin sheath) from the stem cell precursors of neurons and glial cells. It is required for motor neuron development and contributes to neural patterning (Zhou and Anderson 2002).

The run of four consecutive low-diversity bins on chromosome 16 contains ABCC11 and ABCC12 (also known as MRP8 and MRP9). These genes encode members of the multidrug resistance protein group of the ATP-binding cassette transporter superfamily. They are most closely related to ABCC5, which is a transporter of nucleosides in a variety of tissues (Dean, Rzhetsky, and Allikmets 2001). ABCC11 is expressed in most tissues, whereas expression of ABCC12 is limited to testis, ovary, and prostate. Their chromosomal location identifies them as candidates for two inherited forms of convulsive disorders (PKC and ICCA) (Tammur et al. 2001).

Eighteen of the 89 named genes found in the low SNP density bins are involved in neural development and function. Two are related in function to OLIG2: GCMB, a binary switch between neural and glial cell determination, and EDN3, which induces reversion of melanocytes to their bipotential neural crest stem cell precursors, allowing them to develop into either glial cells or melanocytes (Dupin et al. 2000). In runs of two and three consecutive low-diversity bins on chromosome 21, in the area involved in Down syndrome, there are genes for central nervous system development and function (DSCR1, ITSN1). Also of note are nardilysin (NRD1), which in early mouse development is expressed almost exclusively in neural tissue (Fumagalli et al. 1998); members of the protocadherin beta family, which are involved in synapse formation and neuron-neuron recognition and interaction; and two glutamate receptors (GRM1/3) involved in neurotransmission.

At least fifteen other genes on our list are involved in general skeletal development, including a bone morphogenetic protein receptor (BMPR2) and a cartilage-derived morphogenetic protein (GDF5). Dysmorphic anatomical features result from the Down syndrome genes in trisomy 21 and from mutations in glucosidase (GCS1), the polydactyly locus PAPA-1, and TRIM37 (the syndrome of mulibrey nanism [Perpheentupa et al. 1973]).

One long-noted basic anatomical difference between humans and chimpanzees is the human subcutaneous layer of fat, which is lacking in other primates (Wood-Jones 1929, p. 309). In one of our three-bin runs we find the gene AGRP, agouti-related protein homolog, which regulates body weight, obesity, and fat distribution, including the layer of subcutaneous fat. Other lipid-related genes are an intracellular lipid receptor (OSBPL9) and two apolipoprotein genes (APOL5/6).

Although multiple adjacent bins of low SNP density suggest recent selective sweeps, isolated bins may indicate relatively ancient selective sweeps or instances of weaker selection. In total, the 192 autosomal bins with low SNP density contain 18 genes related to neural development and function, 15 other genes relating to structural development or growth factors, and 56 other genes of miscellaneous function (see table 2). There are also 470 hypothetical genes whose functions may be important to human evolution.


Table 2 Genes Found in Autosomal Low SNP Density Bins

 


 
Our analysis is based on data from the draft sequence of the human genome and a relatively small sampling of human genetic diversity. We can therefore expect that a few of the low-density bins are artifacts of a low depth of coverage and hence a low power to detect SNPs; we cannot rule out the possibility that a few gene-rich bins might have low SNP density because of background (negative) selection; and we must expect a few bins to have low SNP density purely by chance because of the inherently stochastic nature of mutation and drift. Nevertheless, the majority of these bins fulfill the criteria of likely selective sweeps, and they provide a principled and plausible starting point in the search for functionally significant differences between the genomes of modern humans and chimpanzees. Important next steps are the sequencing of homologous regions of the chimpanzee genome and further assessment of human variation in these candidate regions.

Footnotes

Naruya Saitou, Reviewing Editor

Keywords: selective sweeps, human evolution, human genome, chimpanzee genome, genetic hitchhiking

Address for correspondence and reprints: Karl C. Diller, Hubbard Center for Genome Studies, Environmental Technology Building, 430, University of New Hampshire, Durham, New Hampshire 03824. E-mail: karl.diller@unh.edu.

References

    Dean M., A. Rzhetsky, R. Allikmets, 2001 The human ATP-binding cassette (ABC) transporter superfamily Genome Res 11:1156-1166

    Dupin E., C. Glavieux, P. Vaigot, N. M. Le Douarin, 2000 Endothelin 3 induces the reversion of melanocytes to glia through a neural crest-derived glial-melanocytic progenitor Proc. Natl. Acad. Sci. USA 97:7882-7887

    Fumagalli P., M. Accarino, A. Egeo, et al. (12 co-authors) 1998 Human NRD convertase: a highly conserved metalloendopeptidase expressed at specific sites during development and in adult tissues Genomics 47:238-245

    Maynard-Smith J., J. Haigh, 1974 The hitchhiking effect of a favorable gene Genet. Res 23:23-35

    Perpheentupa J., S. Autio, S. Leisti, C. Raitta, L. Tuuteri, 1973 Mulibrey nanism, an autosomal recessive syndrome with pericardial constriction Lancet 2:351-355

    Saito T., F. Guan, D. F. Papolos, S. Lau, M. Klein, C. S. Fann, H. M. Lachman, 2001 Mutation analysis of SYNJ1: a possible candidate gene for chromosome 21q22-linked bipolar disorder Mol. Psychiatry 6:387-395

    Sokal R. R., F. J. Rohlf, 1981 Biometry: the principles and practice of statistics in biological research W. H. Freeman and Co, San Francisco

    Tammur J., C. Prades, I. Arnould, et al. (12 co-authors) 2001 Two new genes from the human ATP-binding cassette transporter superfamily, ABCC11 and ABCC12, tandemly duplicated on chromosome 16q12 Gene 273:89-96

    The International SNP Map Working Group, 2001 A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms Nature 409:928-933

    Wood-Jones F., 1929 Man's place among the mammals Edward Arnold, London

    Yu A., C. Zhao, Y. Fan, et al. (11 co-authors) 2001 Comparison of human genetic and sequence-based physical maps Nature 409:951-953

    Zhou Q., D. J. Anderson, 2002 The bHLH transcription factors OLIG2 and OLIG1 couple neuronal and glial subtype specification Cell 109:61-73

Accepted for publication August 26, 2002.