Y-Chromosome Mismatch Distributions in Europe

Luísa Pereira, Isabelle Dupanloup, Zoë H. Rosser, Mark A. Jobling and Guido Barbujani

Instituto de Patologia e Imunologia Molecular da Universidade do Porto (IPATIMUP) and Faculdade de Ciências da Universidade do Porto, Porto, Portugal
Dipartimento di Biologia, Università di Ferrara, Ferrara, Italy
Department of Genetics, University of Leicester, Leicester, England


    Abstract
Ancient demographic events can be inferred from the distribution of pairwise sequence differences (or mismatches) among individuals. We analyzed a database of 3,677 Y chromosomes typed for 11 biallelic markers in 48 human populations from Europe and the Mediterranean area. Unlike what is observed for mitochondrial polymorphisms, Tajima's test was not significant for most Y-chromosome samples, and in 47 populations the mismatch distributions had multiple peaks. Taken at face value, these results would suggest either (1) that the size of the male population stayed essentially constant over time while the female population size increased, or (2) that different selective regimes have shaped mitochondrial and Y-chromosome diversity, leading to an excess of rare alleles only in the mitochondrial genome. An alternative explanation would be that the 11 variable sites of the Y chromosome do not provide sufficient statistical power, so that a comparison with mitochondrial data (where more than 200 variable sites are studied in Europe) is impossible at present. To discriminate among these possibilities, we repeatedly analyzed a European mitochondrial database, each time considering only 11 variable sites, and we estimated mismatch distributions in stable and growing populations generated by simulating coalescent processes. Along with theoretical considerations, these tests suggest that the differences between the mismatch distributions inferred from mitochondrial and Y-chromosome data are not a statistical artifact. Therefore, the observed mismatch distributions appear to reflect different underlying demographic histories and/or selective pressures for maternally and paternally transmitted loci.


    Introduction
Analyses of mitochondrial and Y-chromosome polymorphisms in humans tend to suggest that the female- and the male-transmitted gene pools evolved under somewhat different conditions. Although some studies (notably, Poloni et al. 1997; see Bertranpetit 2000) found congruent results, Y-chromosome data seem to be characterized by lower diversity within populations and more significant spatial structure than mitochondrial data (see, e.g., Jorde et al. 2000). It is unclear why this is so. Selection (Excoffier 1990; Wise et al. 1998; Wyckoff, Wang, and Wu 2000) and different demographic histories for females and males (Sajantila et al. 1996; Seielstad, Minch, and Cavalli-Sforza 1998; Perez-Lezaun et al. 1999) are two popular types of explanation.

Changes in population size tend to leave recognizable signatures in the patterns of nucleotide diversity. Therefore, the distribution of pairwise sequence differences in a sample (or, simply, the mismatch distribution) contains information on the population's history (Rogers and Harpending 1992). The genealogy of a population of constant size is expected to have long deep branches (Donnelly 1996); mutations occurring along these branches will be shared by several lineages, which will result in an irregular, or ragged, distribution of pairwise sequence differences. Conversely, the genealogy of a population that has grown substantially in size has long terminal branches, and the mutations that have occurred along these branches, i.e., most mutations, will be specific to a single lineage (Donnelly 1996). Under these conditions, one expects unimodal mismatch distributions (Harpending et al. 1993; Harpending 1994; Marjoram and Donnelly 1994), whose means, under an infinite-sites mutation model, increase with the time elapsed since population growth (Sherry et al. 1994). However, some selective regimes may mimic the effects of changes in population size.
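As a minimal illustration (not the ARLEQUIN implementation used in this study), a mismatch distribution can be tabulated by counting the differing sites in every pair of aligned sequences and dividing each count by the number of pairs. The function name and the toy haplotypes below are hypothetical.

```python
from collections import Counter
from itertools import combinations


def mismatch_distribution(haplotypes):
    """Proportion of sequence pairs showing 0, 1, 2, ... differences."""
    if len(haplotypes) < 2:
        raise ValueError("need at least two sequences")
    if len({len(h) for h in haplotypes}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    # Count differing sites for every unordered pair of sequences
    counts = Counter(
        sum(a != b for a, b in zip(h1, h2))
        for h1, h2 in combinations(haplotypes, 2)
    )
    n_pairs = sum(counts.values())
    return {d: c / n_pairs for d, c in sorted(counts.items())}


# Hypothetical sample: five haplotypes over four sites (0 = ancestral, 1 = derived)
sample = ["0000", "0000", "1100", "1110", "0001"]
print(mismatch_distribution(sample))
# → {0: 0.1, 1: 0.3, 2: 0.2, 3: 0.3, 4: 0.1}
```

Only distances that actually occur appear as keys; a ragged, multipeaked output of this kind is what the constant-size argument above predicts, whereas a growing population tends to yield a single smooth hump.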

In addition, recombination acts as a confounding factor, because it brings together chromosome regions that evolved independently. For that reason, with only one exception (Alonso and Armour 2001), human mismatch distributions have so far been studied only at the mitochondrial level, based on both restriction fragment length polymorphisms (RFLPs) (Rogers and Harpending 1992; Harpending 1994) and hypervariable region I (HVRI) sequences (Excoffier and Schneider 1999). Almost all of these distributions are unimodal. In general, both the mean and the variance are highest (and the curve is therefore smoothest) in African populations, lower among Asians and Americans, and lowest among Europeans. The mean mismatch is related to the time of the expansion through the mutation rate (Rogers and Harpending 1992; Rogers and Jorde 1995), so the main demographic expansions are dated to around 110,000 years ago in East Africa, 70,000 years ago in the rest of Africa and in Asia, 55,000 years ago in America, and 40,000 years ago in Europe and the Middle East (Excoffier and Schneider 1999). The few exceptions are populations that may have undergone recent bottlenecks, thus presumably losing the typical genetic features of expanding populations (Excoffier and Schneider 1999).
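Under the sudden-expansion model of Rogers and Harpending (1992), τ = 2µt, where µ is the mutation rate per sequence per generation and t is the number of generations since the expansion. The sketch below simply inverts that relation and converts generations to years; the function name and the τ, µ, and generation-time values are hypothetical placeholders, not estimates from the studies cited.

```python
def expansion_age_years(tau, mu_per_sequence, years_per_generation):
    """Invert tau = 2 * mu * t (t in generations) and convert to years."""
    if mu_per_sequence <= 0 or years_per_generation <= 0:
        raise ValueError("mutation rate and generation time must be positive")
    generations = tau / (2.0 * mu_per_sequence)
    return generations * years_per_generation


# Hypothetical inputs chosen only to show the arithmetic:
# tau = 4, mu = 1e-3 per sequence per generation, 20 years per generation
age = expansion_age_years(tau=4.0, mu_per_sequence=1e-3, years_per_generation=20.0)
print(f"{age:,.0f} years")  # → 40,000 years
```

Because t scales as 1/µ, any uncertainty in the mutation rate carries over proportionally into the inferred date.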

Because only mitochondrial mismatch distributions have been studied so far, it is not yet known whether the inferred expansions affected entire populations or only their female components. In this study, we calculated mismatch distributions for a data set of Y-chromosome biallelic polymorphisms in 48 population samples from Europe and the Mediterranean area (data from Rosser et al. 2000). The results differ sharply from those observed for mitochondrial data. To understand the causes of that discrepancy, we reanalyzed a database of European mitochondrial sequences and studied the mismatch distributions of computer-simulated populations, with the aim of determining the effects of expansions versus constant population sizes.


    Materials and Methods
Databases
The database of Y-chromosome biallelic markers we considered (Rosser et al. 2000) comprised data on 48 populations (listed in table 1), for a total of 3,677 individuals. Most were European, but populations from the eastern and southern shores of the Mediterranean Sea, the Caucasus region, and Greenland were also included.


Table 1 Measures of Genetic Diversity Estimated from Y-Chromosome Data

 
The 11 biallelic markers considered, namely, 2 insertion/deletion and 9 single-nucleotide polymorphisms (SNPs), defined 10 alleles, 6 of them polymorphic or subpolymorphic (frequency > 0.01) and 4 of them rare in Europe. For consistency with other studies, and because micro- and minisatellite variation has been observed within most such alleles (Jobling and Tyler-Smith 1995, 2000; Karafet et al. 1999), we refer to each of them as a "haplogroup" (De Knijff 2000). A single minimum-spanning tree could be constructed from those 10 haplogroups (fig. 1). Therefore, each of the 11 polymorphisms of interest could reasonably be assumed to represent a single, unique mutational event. Full haplogroup descriptions and frequencies are given in Rosser et al. (2000).



Fig. 1.—Network summarizing the evolutionary relationships among the 10 haplogroups observed in Europe. Each arrow represents one mutational event and points in its probable direction (from Rosser et al. 2000).

 
A database of mitochondrial DNA HVRI sequences from Europe (updated from Simoni et al. 2000) was used to compare patterns of genetic diversity in maternally transmitted genes. Of the 48 population samples available there, we selected the 21 that approximately matched the geographic locations of the samples considered in this analysis of Y-chromosome diversity.

Mismatch Distributions and Neutrality Tests
Mismatch distributions and gene diversity, i.e., the probability that two randomly sampled chromosomes differ from each other (Nei 1987), were estimated for both databases with ARLEQUIN, version 2.0 (Schneider, Roessli, and Excoffier 2000). The fit of the observed mismatch distribution to a model of population expansion was tested by a bootstrap approach, also implemented in ARLEQUIN. Note that in this method the null hypothesis is one of population expansion, because there is no quantitative expectation for the shape of the mismatch distribution in a stationary population (Harpending 1994), whereas under the hypothesis of expansion a parameter τ estimated from the data allows one to predict the average mismatch.
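Both quantities can be written compactly. The sketch below computes Nei's unbiased gene diversity from haplotype labels and, for illustration only, a method-of-moments estimate of τ. The moment equations are an assumption stated here, not ARLEQUIN's procedure (which fits the expansion model differently): after a sudden expansion from an ancestral population with parameter θ0 to a much larger one, a pairwise difference is the sum of a Poisson count with mean τ (mutations since the expansion) and a geometric count with mean θ0 (mutations before it), giving mean θ0 + τ and variance θ0 + θ0² + τ. The function names are hypothetical.

```python
import math
from collections import Counter


def gene_diversity(haplotypes):
    """Nei's (1987) unbiased gene diversity: n / (n - 1) * (1 - sum of p_i squared)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two chromosomes")
    freqs = [count / n for count in Counter(haplotypes).values()]
    return n / (n - 1) * (1.0 - sum(p * p for p in freqs))


def tau_by_moments(mismatch):
    """Moment estimates (tau, theta0) under an assumed sudden-expansion model.

    `mismatch` maps number of differences -> count or proportion of pairs.
    Uses mean = theta0 + tau and variance = theta0 + theta0**2 + tau.
    """
    total = sum(mismatch.values())
    mean = sum(d * w for d, w in mismatch.items()) / total
    var = sum((d - mean) ** 2 * w for d, w in mismatch.items()) / total
    theta0 = math.sqrt(max(var - mean, 0.0))  # variance below the mean: theta0 = 0
    tau = max(mean - theta0, 0.0)             # negative estimates clamped to 0
    return tau, theta0


print(gene_diversity(["H1", "H1", "H2", "H2"]))  # 4/3 * (1 - 0.5)
```

For a ragged, multipeaked distribution the fitted τ has little meaning, which is why a test of fit to the expansion model comes first.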

Within each population sample (including simulated samples; see below), departures from mutation-drift or mutation-selection equilibrium were tested by means of Tajima's D and Fu's FS. In Tajima's (1989a, 1989b) test, the parameter θ = 2Nµ (where N is the population size and µ is the mutation rate) is estimated in two ways: once from the number of polymorphic sites and once from the average number of pairwise differences in the sample. Differences between the two estimates are then attributed to selection or to the demographic history of the population studied. Similarly, Fu's (1997) FS statistic compares the observed number of alleles in a sample with the number of alleles expected if the population has remained at constant size.
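For concreteness, Tajima's D can be computed from three summary numbers: the sample size n, the number of segregating sites S, and the mean number of pairwise differences π. The sketch below uses the standard constants from Tajima (1989a); the function name and the sample values are hypothetical.

```python
import math


def tajimas_d(n, seg_sites, pi):
    """Tajima's D from sample size n, segregating sites S, and mean pairwise differences pi."""
    if n < 4:
        raise ValueError("the variance term is zero for n < 4")
    if seg_sites == 0:
        raise ValueError("D is undefined when no site is polymorphic")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Numerator: theta from pairwise differences minus theta from segregating sites
    return (pi - seg_sites / a1) / math.sqrt(e1 * seg_sites + e2 * seg_sites * (seg_sites - 1))


# Hypothetical sample of 80 chromosomes with 10 segregating sites
print(tajimas_d(80, 10, pi=1.0))  # few pairwise differences relative to S: D < 0
print(tajimas_d(80, 10, pi=4.0))  # many pairwise differences relative to S: D > 0
```

An excess of rare variants, as expected after growth, inflates S relative to π and drives D negative; a few well-separated, frequent haplogroups do the opposite.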

The significance of D and FS was tested by randomization. Separately for each sample studied, we used the coalescent simulation program implemented in the ARLEQUIN package (Schneider, Roessli, and Excoffier 2000) to generate random samples from a hypothetical stationary population whose parameter θ = 2Nµ was equal to the average number of observed pairwise differences. For each sample, the procedure was repeated 1,000 times, each time recomputing the D and FS statistics, so as to obtain empirical null distributions of these statistics and hence the probability of the observed D and FS values under the hypothesis of demographic stationarity.
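The FS statistic itself has a closed form (Fu 1997): S′ is the probability, under Ewens' sampling formula with θ set equal to the mean number of pairwise differences, of observing at least as many distinct alleles as were actually seen, and FS = ln[S′/(1 − S′)]. The standard-library sketch below computes only the statistic; in this study its significance came from the coalescent simulations described above. Function names and the example numbers are hypothetical.

```python
import math


def unsigned_stirling_first(n):
    """Row n of unsigned Stirling numbers of the first kind, |s(n, k)| for k = 0..n."""
    row = [1]  # |s(0, 0)| = 1
    for m in range(1, n + 1):
        new = [0] * (m + 1)
        for k in range(1, m + 1):
            # |s(m, k)| = |s(m-1, k-1)| + (m-1) * |s(m-1, k)|
            new[k] = row[k - 1] + (m - 1) * (row[k] if k < m else 0)
        row = new
    return row


def fu_fs(n, k_obs, theta_pi):
    """Fu's FS = ln(S' / (1 - S')), with S' = P(K >= k_obs) under Ewens' formula."""
    if not 1 <= k_obs <= n or theta_pi <= 0:
        raise ValueError("need 1 <= k_obs <= n and theta_pi > 0")
    stirling = unsigned_stirling_first(n)
    log_denominator = sum(math.log(theta_pi + i) for i in range(n))
    # P(K = k) = |s(n, k)| * theta^k / [theta (theta + 1) ... (theta + n - 1)], in logs
    prob = [0.0] + [
        math.exp(math.log(stirling[k]) + k * math.log(theta_pi) - log_denominator)
        for k in range(1, n + 1)
    ]
    s_prime = sum(prob[k_obs:])
    below = sum(prob[1:k_obs])  # 1 - S', summed directly to limit rounding error
    if below == 0.0:
        return math.inf
    if s_prime == 0.0:
        return -math.inf
    return math.log(s_prime / below)


# Hypothetical sample: 80 chromosomes, 30 distinct haplotypes, mean mismatch 4
print(fu_fs(80, 30, 4.0))  # negative: more haplotypes than expected at equilibrium
```

Working in logarithms keeps the large Stirling numbers and the rising factorial from overflowing for sample sizes like those in table 1.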

Reanalysis of Reduced Mitochondrial Data Sets
To understand whether the Y-chromosome mismatch distributions could reflect an insufficient number of sites considered, we reanalyzed the mtDNA database in two ways.

  1. From one randomly selected European population sample, we repeatedly calculated the mismatch distribution, each time removing 10 nucleotide sites from the initial 360 (starting from positions 16024–16033 of the Cambridge reference sequence; Anderson et al. 1981), until 10 sites were left.
  2. In the 21 European samples, four sets of 11 polymorphic sites were selected, so that the same number of sites was analyzed for mtDNA and for the Y chromosome. The selection criteria were as follows: in two runs of the analysis (data sets A and B), 11 sites were selected at random; in one run (data set C), 11 highly variable sites were used; and in one run (data set D), the sites were selected among poorly polymorphic sites.
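Item 2 can be sketched as follows. The script measures variability as the number of sequences not carrying the most common nucleotide at a site and ranks sites on the pooled sequences; that criterion, the tie-breaking, and all function names are assumptions, because the text does not specify how "highly variable" and "poorly polymorphic" sites were identified. Ranking on pooled data would be one way for a sample to end up with no polymorphic site, as happened in three data set D analyses.

```python
import random
from collections import Counter


def minor_count(seqs, site):
    """Sequences not carrying the most common nucleotide at `site` (0 = monomorphic)."""
    column = Counter(s[site] for s in seqs)
    return len(seqs) - max(column.values())


def choose_sites(seqs, k=11, mode="random", rng=None):
    """Pick k polymorphic sites: at random, the most variable, or the least variable."""
    polymorphic = [i for i in range(len(seqs[0])) if minor_count(seqs, i) > 0]
    if len(polymorphic) < k:
        raise ValueError("fewer polymorphic sites than requested")
    if mode == "random":
        return sorted((rng or random.Random()).sample(polymorphic, k))
    ranked = sorted(polymorphic, key=lambda i: minor_count(seqs, i))  # stable sort
    if mode == "high":
        return sorted(ranked[-k:])
    if mode == "low":
        return sorted(ranked[:k])
    raise ValueError(f"unknown mode: {mode}")


def restrict(seqs, sites):
    """Keep only the chosen sites in every sequence."""
    return ["".join(s[i] for i in sites) for s in seqs]


# Toy pooled alignment (invented); choose 2 sites instead of 11
pooled = ["AAAAA", "AAAAC", "AACAC", "ACCAC"]
print(choose_sites(pooled, k=2, mode="high"), choose_sites(pooled, k=2, mode="low"))
print(restrict(pooled, [2, 4]))
```

The restricted sequences can then be passed to any mismatch or neutrality routine exactly as full-length sequences would be.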

Coalescent Simulations
We generated samples from stationary and expanding populations by Monte Carlo simulation to see whether the shapes of the Y-chromosome mismatch distributions described in this study were compatible with some form of demographic expansion. The simulation algorithm was based on the coalescent process with superimposed mutations, as described by Hudson (1990). Each sample was obtained by first generating its genealogy. Mutations were then placed at random on the genealogy, assuming that they occurred as a uniform, constant-rate Poisson process.
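A compact sketch of this kind of simulation (not the program actually used) is shown below: the genealogy is built backward with exponentially distributed waiting times, and mutations are then scattered along each branch as a Poisson process, each hitting a new site. It assumes a haploid Wright-Fisher population, so that θ = 2Nµ is the expected number of differences between two sequences. Because Python's standard `random` module offers no Poisson sampler, the helper counts unit-rate exponential arrivals. All function names are hypothetical.

```python
import random


def poisson(lam, rng):
    """Poisson draw: number of unit-rate exponential arrivals before time lam."""
    count, elapsed = 0, rng.expovariate(1.0)
    while elapsed < lam:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count


def simulate_sample(n, theta, rng):
    """One sample of n haploid sequences under a constant-size coalescent.

    Time is in units of N generations, so k lineages coalesce at rate
    k(k-1)/2 and a branch of length t carries Poisson(theta * t / 2) mutations.
    Returns, for each sampled sequence, the set of mutation labels it carries.
    """
    lineages = [[i] for i in range(n)]  # sampled leaves below each lineage
    carried = [set() for _ in range(n)]
    next_label = 0
    while len(lineages) > 1:
        k = len(lineages)
        t = rng.expovariate(k * (k - 1) / 2.0)
        for leaves in lineages:  # mutations on each branch during this interval
            for _ in range(poisson(theta * t / 2.0, rng)):
                for leaf in leaves:
                    carried[leaf].add(next_label)
                next_label += 1
        i, j = sorted(rng.sample(range(k), 2))
        merged = lineages[i] + lineages[j]
        del lineages[j], lineages[i]  # larger index first keeps i valid
        lineages.append(merged)
    return carried


def mean_pairwise(carried):
    n = len(carried)
    diffs = [len(carried[a] ^ carried[b]) for a in range(n) for b in range(a + 1, n)]
    return sum(diffs) / len(diffs)


rng = random.Random(1)
reps = [mean_pairwise(simulate_sample(20, 2.0, rng)) for _ in range(200)]
print(sum(reps) / len(reps))  # should be close to theta = 2
```

Representing each sequence by the set of mutations it carries enforces the infinite-sites assumption directly: two sequences differ at exactly the mutations in the symmetric difference of their sets.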

First, we simulated 1,000 samples of genes under the assumption of a large and constant population size in a single panmictic deme. Each sample was composed of 80 individuals, i.e., 80 sets of 300 potentially variable sites. The size of the deme was 5,000 haploid individuals, and the mutation rate was 2 × 10⁻⁴ per generation for the whole sequence, i.e., 6.7 × 10⁻⁷ per site. Although a plausible mutation rate for Y-chromosome biallelic polymorphisms seems to be around 5 × 10⁻⁷ (Hammer 1995; Jobling, Pandya, and Tyler-Smith 1997) or less (Thomson et al. 2000), we chose this higher rate so as to obtain similar mean pairwise differences in the simulated and in the real samples. The initial number of sites, 300, was also chosen because, with those mutation rates, most sites (>97% in stationary populations) were monomorphic at the end of each simulation.
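A quick check shows that these parameter values are mutually consistent. Under the haploid convention θ = 2Nµ used in this paper, the stationary simulations have θ = 2, which is therefore also the expected mean mismatch at equilibrium; dividing the sequence rate among 300 sites gives the per-site rate quoted. The variable names are illustrative.

```python
N = 5_000            # haploid deme size
mu_sequence = 2e-4   # mutation rate per generation for the whole 300-site sequence
n_sites = 300

theta = 2 * N * mu_sequence      # expected mean number of pairwise differences
mu_site = mu_sequence / n_sites  # per-site rate

print(f"theta = {theta:.2f}, per-site rate = {mu_site:.2e}")
# → theta = 2.00, per-site rate = 6.67e-07
```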

Second, four processes of exponential population expansion were simulated using the same coalescent approach, each with a 100-fold increase in population size: (1) expansion from 50,000 years ago until now, with a final effective population size N0 = 50,000; (2) expansion from 50,000 years ago until now, with N0 = 100,000; (3) expansion from 50,000 years ago until now, with N0 = 200,000; and (4) expansion from 100,000 years ago until now, with N0 = 100,000. The mutation rate was the same as that used for the stationary populations. One thousand samples of 80 individuals were generated in this way for each of the four processes.
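Exponential growth changes only the timing of coalescences. Going back in time, the coalescence rate for k lineages is k(k − 1)/[2N(t)], with N(t) = N0·exp(−rt) during growth and a constant ancestral size N0/100 before it; each waiting time can be drawn by rescaling a unit exponential against the integrated rate. The sketch below is one standard way to do this, not necessarily the implementation used here. It treats N0 as a haploid size and assumes 25 years per generation; neither assumption comes from the source, and the function name is hypothetical.

```python
import math
import random


def coalescence_times(n, n0, fold, growth_generations, rng):
    """Coalescence times (generations before present) for n lineages.

    Backward in time the haploid size is N(t) = n0 * exp(-r t) during growth,
    with r chosen so it falls by `fold` over `growth_generations`; before
    that it stays constant at n0 / fold.
    """
    if fold <= 1:
        raise ValueError("fold must exceed 1 for a growth phase")
    r = math.log(fold) / growth_generations
    n_ancestral = n0 / fold
    t, times = 0.0, []
    for k in range(n, 1, -1):
        pairs = k * (k - 1) / 2.0
        hazard = rng.expovariate(1.0)  # unit-rate exponential, rescaled below
        if t < growth_generations:
            # Integrated coalescence rate remaining in the growth phase
            left = pairs / (n0 * r) * (math.exp(r * growth_generations) - math.exp(r * t))
            if hazard < left:
                t = math.log(math.exp(r * t) + hazard * n0 * r / pairs) / r
                times.append(t)
                continue
            hazard -= left
            t = float(growth_generations)
        t += hazard * n_ancestral / pairs  # constant-size ancestral phase
        times.append(t)
    return times


# Scenario (2): 100-fold growth over 50,000 years to N0 = 100,000,
# assuming 25 years per generation (not stated in the source)
rng = random.Random(7)
tmrca = coalescence_times(80, n0=100_000, fold=100, growth_generations=50_000 // 25, rng=rng)
print(round(tmrca[-1]))
```

Mutations can then be superimposed on the resulting branch lengths exactly as in the constant-size case.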

To make the simulation results easier to compare, we defined three basic shapes of the mismatch distribution, namely, unimodal with a maximum at 0 (type 0), unimodal with a maximum >0 (type 1), and bimodal (type 2); examples are shown in figure 5.
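The text does not give an operational rule for assigning a distribution to one of these types. One possible rule, offered only as an assumption, counts local maxima after merging runs of equal neighboring values; a real classification might also ignore very small fluctuations. The function names are hypothetical.

```python
def peak_positions(freqs):
    """Positions of local maxima, merging runs of equal neighbouring values."""
    runs = []  # (first index of run, value)
    for i, f in enumerate(freqs):
        if not runs or f != runs[-1][1]:
            runs.append((i, f))
    peaks = []
    for j, (start, value) in enumerate(runs):
        left = runs[j - 1][1] if j > 0 else float("-inf")
        right = runs[j + 1][1] if j + 1 < len(runs) else float("-inf")
        if value > left and value > right:
            peaks.append(start)
    return peaks


def shape_type(freqs):
    """0: single peak at 0; 1: single peak above 0; 2: more than one peak."""
    if not freqs:
        raise ValueError("empty distribution")
    peaks = peak_positions(freqs)
    if len(peaks) > 1:
        return 2
    return 0 if peaks == [0] else 1


print(shape_type([0.5, 0.3, 0.15, 0.05]),       # geometric-like
      shape_type([0.1, 0.3, 0.4, 0.2]),         # bell-shaped
      shape_type([0.3, 0.1, 0.05, 0.4, 0.15]))  # bimodal
```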



Fig. 5.—Scheme of the most common shapes of the mismatch distribution observed in the simulations

 

    Results
Mismatch Distributions and Neutrality Tests
Mismatch distributions obtained for Y-chromosome biallelic markers were multimodal, with one exception: the Chuvash population from Russia (fig. 2). Each of the other distributions had at least two peaks, one at 0 and the other at a number of differences that varied among populations. These shapes reflect the fact that in many populations most Y chromosomes belong to two frequent haplogroups, whereas other haplogroups occur at lower frequencies. Therefore, the peak at 0 differences corresponded to comparisons between individuals sharing the same haplogroup, and the second peak was located at the mismatch value equal to the number of mutational steps separating the most frequent haplogroups. Where more than two haplogroups occurred at intermediate or high frequencies, there was a third, and sometimes a fourth, peak.
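The link between haplogroup frequencies and peak positions can be made explicit: pairs within a haplogroup contribute to 0 differences, and pairs drawn from haplogroups i and j contribute to the number of mutational steps separating them. The haplogroup labels, counts, and distances below are hypothetical, as is the function name.

```python
from itertools import combinations


def haplogroup_mismatch(counts, steps):
    """Mismatch distribution implied by haplogroup counts.

    counts: {haplogroup: number of chromosomes}
    steps:  {frozenset({h1, h2}): mutational steps between h1 and h2}
    Returns pair proportions indexed by number of differences.
    """
    pairs_at = {0: sum(c * (c - 1) // 2 for c in counts.values())}
    for h1, h2 in combinations(counts, 2):
        d = steps[frozenset((h1, h2))]
        pairs_at[d] = pairs_at.get(d, 0) + counts[h1] * counts[h2]
    total = sum(pairs_at.values())
    return [pairs_at.get(d, 0) / total for d in range(max(pairs_at) + 1)]


# Two frequent haplogroups 3 steps apart (A, B) plus a rare one (C) 1 step from A
counts = {"A": 5, "B": 4, "C": 1}
steps = {frozenset("AB"): 3, frozenset("AC"): 1, frozenset("BC"): 4}
print([round(p, 3) for p in haplogroup_mismatch(counts, steps)])
# → [0.356, 0.111, 0.0, 0.444, 0.089]  (peaks at 0 and 3)
```

With only a handful of frequent haplogroups, the distribution can take nonzero values only at the few distances present in the tree, so a ragged, multipeaked shape follows almost automatically.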



Fig. 2.—Mismatch distributions for Y-chromosome biallelic markers in 48 populations.

 
For 13 populations, the hypothesis of expansion could be rejected at the P < 0.05 level (fifth column of table 1). Although these probabilities were only nominal, Bonferroni's correction for multiple tests (Sokal and Rohlf 1995) confirmed a significant overall departure from expansion expectations for the samples in this study. Conversely, all 21 mismatch distributions of the mtDNA samples appeared compatible with the effects of a population increase (fifth column of table 2). One parameter of the mismatch distribution, τ, estimates the time elapsed since population expansion (Rogers and Jorde 1995; Rogers 1995). This parameter is not reported in table 1 because we found little or no evidence of expansions in the shapes of the Y-chromosome mismatch distributions.
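Applied to a family of m tests, Bonferroni's correction compares each nominal P with α/m, and the family-wide null hypothesis is rejected if at least one P falls below that threshold. With the 48 samples here, α = 0.05 corresponds to roughly 0.001 per test. The p-values in the example are invented and are not those of table 1; the function name is hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Per-test threshold alpha / m and indices of tests that remain significant."""
    m = len(p_values)
    if m == 0:
        raise ValueError("no p-values given")
    threshold = alpha / m
    return threshold, [i for i, p in enumerate(p_values) if p < threshold]


# Hypothetical p-values for 48 samples (not the values in table 1)
p_values = [0.0004, 0.03, 0.2] + [0.5] * 45
threshold, kept = bonferroni(p_values)
print(f"threshold = {threshold:.5f}; significant after correction: {kept}")
# → threshold = 0.00104; significant after correction: [0]
```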


Table 2 Measures of Genetic Diversity Estimated from mtDNA Data

 
Y-chromosome distributions can be grouped by shape, and the clusters obtained in this way corresponded to sets of geographically close populations (fig. 3). This seems to be a consequence of the clinal variation shown by most nuclear markers in Europe (Chikhi et al. 1998; Casalotti et al. 1999; Quintana-Murci et al. 1999; Rosser et al. 2000; Barbujani and Bertorelle 2001). Most Western and Central European populations (British Isles, France, Belgium, and the Netherlands) showed a peak at three differences, i.e., the mutational distance between haplogroups 1 and 2, which represented the largest fraction of haplogroups there. Iberian populations (except Basques) showed an additional peak at five differences, resulting from the presence of a substantial number of haplogroup 21 chromosomes, which differ from those of haplogroup 1 by five mutational steps. The Southern-Central European and Turkish samples showed smoother distributions, reflecting larger numbers of different haplogroups. The increased within-population diversity seems to be largely due to the convergence in that area of differently oriented clines, from the southeast to the northwest and from south to north. As a consequence, haplogroups that were very rare or absent elsewhere tended to reach substantial frequencies in these populations. In northeastern samples, other peaks at three, four, and five differences were evident, resulting from differences between haplogroups 2 and 16, haplogroups 2 and 3, and haplogroups 3 and 16, respectively.



Fig. 3.—Mismatch distributions for Y-chromosome biallelic markers in the various population groups.

 
Mismatch distributions appeared to be bi- or multimodal, regardless of whether single samples (fig. 2) or groups thereof (fig. 3) were analyzed. Accordingly, insufficient sample size does not seem to be a plausible explanation for that finding. Bimodal distributions were also observed in the few Asian and North African samples available, and among Greenlanders. We do not know of any suitable Y-chromosome data set that would allow comparison with other continents.

Mitochondrial and Y-chromosome variation also differed when summarized by means of Tajima's D and Fu's FS (seventh and ninth columns of tables 1 and 2). For mitochondrial data, both statistics were negative and significant (the only exception being the Saami), and all FS values were significant at the 0.001 level. Conversely, when estimated from Y-chromosome data, most values of FS, and especially of D, were not significant, and D values were even positive in 12 cases. These positive D values were not associated with any spatial pattern that we could recognize. By contrast, the four negative and significant values occurred in linguistic (Basques) or geographic isolates (Sardinians, Scots, Cornish; note, however, the small sample size in Sardinia), which also showed low gene diversity. Gene diversity seems to be patterned in space (sixth column of table 1), with comparatively high values in the south and in the east, as also observed for mitochondrial variation (Comas et al. 1997).

By and large, taken at face value, mismatch distributions, Tajima's D, and Fu's FS would suggest that the European male population has had a different history from the female population and that only the latter has increased substantially in numbers. Before drawing any conclusions, however, it is worth asking whether these apparent differences between the sexes may simply be a statistical artifact. Only 11 Y-chromosome polymorphic sites were studied, versus more than 200 for mtDNA. Might that have biased the results?

Reanalysis of Reduced Mitochondrial Data Sets
Initially, we repeatedly estimated the mitochondrial mismatch distribution and related statistics in one randomly chosen sample, the Cornish sample, each time considering a decreasing number of sites, from 360 to 10. Figure 4 shows that the characteristic unimodal pattern of the mismatch distribution is maintained throughout the repeated reductions of molecular information in the 69 HVRI sequences considered. As expected, the mean moved left and the variance decreased as fewer and fewer nucleotide positions were considered. The reduction in diversity was not linear: it was slow in the first steps and accelerated later, probably reflecting the fact that the 5' and 3' ends of HVRI are less variable than its central segment.



Fig. 4.—Mismatch distributions observed through analysis of subsets of data obtained by progressively removing 10 (constant or polymorphic) sites from the mitochondrial hypervariable region I of the Cornish population

 
As the number of sites considered decreased, D always remained negative but lost significance when 50 sites were left, whereas FS was significant even when 30 nucleotide sites were analyzed (the last 20 sites were monomorphic; table 3). This may indicate that, although D is more robust than FS for small sample sizes, FS is more sensitive when few polymorphic positions are considered.


Table 3 Diversity Parameters Estimated in the Analysis of Mitochondrial Hypervariable Region I in the Cornish Population (N = 69), Considering a Decreasing Number of Sites Each Time

 
As a second test, we estimated the mismatch distributions from four different sets of 11 mtDNA polymorphic sites in the 21 populations for which both mtDNA and Y-chromosome data were available (table 4). The shape of the mismatch distribution was almost always unimodal, with only three vaguely bimodal shapes in a total of 81 mismatch distributions computed (in the remaining three cases, all in data set D, no site was polymorphic among sequences, and therefore no statistic could be estimated). These results did not depend on the way polymorphic sites were selected, i.e., at random or on the basis of their levels of variation. Predictably, analyses of highly variable sites (data set C) yielded the highest means and variances.


Table 4 Median Observed Values in the Analysis of Reduced mtDNA Data Sets Comprising 11 Sites in the 21 Population Samples of Table 2

 
Once again, Fu's FS and Tajima's D values were negative, with one exception (Saami). Also, confirming the results of the previous reanalysis, negative FS values were statistically significant in the vast majority of cases. In contrast, negative D values seldom reached significance, although they tended to do so in data set D.

It is evident that even when the number of sites considered is the same as that available for the Y chromosome, mtDNA mismatch distributions are unimodal and therefore different from those calculated for the Y chromosome.

Simulations
In simulated stationary populations, the average mismatch was higher than that for expanding populations (table 5) and close to the expected value, i.e., the parameter θ used to generate the simulated samples. This is what one expects under mutation-drift equilibrium (Rogers and Harpending 1992; Rogers et al. 1996). Also, the observed standard deviation (1.22) was close to the expectation (1.26) derived by Tajima (1983, eq. 30). In expanding populations, conversely, the average mismatch and its standard deviation were reduced, but both increased with the size of the population after the expansion and with the time since the expansion event.
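Assuming that the cited expectation is the standard equilibrium variance of the mean number of pairwise differences, b1θ + b2θ² (the same b1 and b2 constants that enter Tajima's D), the 1.26 figure can be reproduced for the stationary simulations, with θ = 2 and samples of 80 sequences. The function name is hypothetical.

```python
import math


def sd_mean_pairwise(theta, n):
    """SD of the average number of pairwise differences at equilibrium,
    from variance = b1 * theta + b2 * theta**2."""
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    return math.sqrt(b1 * theta + b2 * theta ** 2)


# Stationary simulations of this study: theta = 2, samples of 80 sequences
print(round(sd_mean_pairwise(theta=2.0, n=80), 2))  # → 1.26
```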


Table 5 Simulation Results

 
Less than 12% of the mismatch distributions in the samples generated under stationarity showed a geometric form (type 0), and <20% had a single peak (type 1). Around 70% of the distributions showed multiple peaks (type 2). Conversely, for expanding populations, the number of bell-shaped mismatch distributions increased with the size of the population after the expansion and with the time since the expansion event, and, correspondingly, the number of type 0 distributions decreased. Also, the proportion of distributions with multiple (generally two) peaks increased with the time since expansion and with the size of the population after expansion. The largest value was observed for populations that had been expanding for 50,000 years, reaching an effective size N0 of 200,000, where 21.8% of the mismatch distributions had two peaks (type 2).

In summary, the bimodal mismatch distributions observed in European Y-chromosome samples can be generated in simulations of both stationary and postexpansion populations. However, depending on the mode of the simulated expansion, bimodality is 3 to 10 times as frequent in stationary populations as in expanding populations.

In the simulated stationary populations, both D and FS showed a wide distribution centered on 0 (552 negative values out of 1,000 simulations for D, and 577 negative values for FS). Only in 3.9% and 8.7% of the cases, respectively, were these values significant. Conversely, when expansions were simulated, FS was always negative and was significant at the 5% level in >95% of the iterations. Tajima's D was also negative in nearly all cases of expansion and reached statistical significance in >85% of the simulations.


    Discussion
The mismatch distributions inferred in this study from Y-chromosome biallelic markers were bimodal and did not resemble those inferred in the same populations from mtDNA data, which were unimodal. Statistical tests failed to reject a neutral equilibrium model for European Y-chromosome variation, whereas there was highly significant departure from equilibrium for mtDNA data (Merriwether et al. 1991; Excoffier and Schneider 1999) (table 5).

Models of population expansion do not predict, even transitorily, the presence of multiple peaks in the mismatch distribution (Rogers and Harpending 1992; Rogers and Jorde 1995). In Slatkin and Hudson's (1991) simulations, bimodal distributions with a peak at 0 were observed only for stationary populations. Conversely, expanding populations showed no instance of bimodality, and there were virtually no observations for mismatch = 0 (Slatkin and Hudson 1991, p. 560). Similar results were obtained by Harpending et al. (1998), who also showed that gene trees with a few well-differentiated alleles, much like those described in this study for the Y chromosome, are the rule in populations whose size has stayed constant or contracted.

Three lines of evidence suggest that the results of this study are not simply a statistical artifact:

  1. By analyzing biallelic variation as we did in this study, one neglects other possible, but so far undetected, polymorphisms at other sites. However, that also applies to the mitochondrial RFLP studies which, as we have seen, show very different, unimodal mismatch distributions (Harpending 1994).
  2. When we reanalyzed subsets of mitochondrial HVRI data, the shape of the distributions remained unimodal, and Tajima's D and Fu's FS remained negative (although only the latter was always significant).
  3. In our simulations, bimodal distributions appeared much more frequently in stationary than in expanding populations. We did not estimate a likelihood ratio because the numerical results depended on admittedly approximate expansion parameters and mutation rates. However, all factors considered, constant population sizes seemed roughly 3–10 times as likely as expansions.

The demographic scenarios we simulated were very simple. It is customary to model expansions as either instantaneous (see, e.g., Rogers and Jorde 1995) or exponential (this study; see also Excoffier and Schneider 1999), but populations may have grown in other ways. To mention just one, temporary contractions may have punctuated periods of general expansion, and that may have had an impact on levels and patterns of genetic diversity (Excoffier and Schneider 1999). However, in the absence of more sophisticated testable models, nothing in this study suggests that the Y-chromosome and mitochondrial mismatch distributions differ from each other merely because of the limited resolution offered by the available Y-chromosome data.

Nonunimodal distributions (and nonsignificant Tajima's and Fu's statistics) are regarded as evidence that populations evolved under neutrality without significantly increasing in size (Harpending et al. 1993). Departures from the expected shape can reflect adaptation (if one assumes constant population size), demographic changes (if one assumes neutrality), or both, and they can also occur when mutation rates vary across nucleotide sites (Aris-Brosou and Excoffier 1996). In principle, therefore, the different results obtained in the analysis of maternally and paternally transmitted genes in Europe may be due to differences in mutation mechanisms, in selective regimes, or in past demographic history, or to combinations thereof.

Might some sort of distorted mutational process have generated spurious multimodal mismatch distributions? Each of the 11 polymorphisms considered in this study probably results from a mutation that occurred only once (Rosser et al. 2000), and therefore these polymorphisms meet the assumptions of the model underlying the theory of mismatch distributions, the infinite-sites model (Rogers and Harpending 1992). In addition, simulations suggest that the shape of the mismatch distribution tends to reflect the demographic history of a population faithfully, even despite substantial violations of the infinite-sites model (Rogers et al. 1996). Finally, recurrent mutation at some sites, reflecting mutation rate heterogeneity, may mimic the effects of population growth (Aris-Brosou and Excoffier 1996), but here the problem is the opposite, i.e., how to explain the apparent constancy of population size suggested by the data. In short, we cannot rule out yet-to-be-discovered peculiarities of the Y-chromosome mutation process, but even if they existed, it is at present hard to imagine how they could account for the results of this study.

Adaptation is the second factor. Tests based on the comparison of within-species and between-species nucleotide diversity have so far failed to reject the hypothesis of neutrality for Y-chromosome markers in comparisons between humans and mice (Nachman 1998), but they have rejected it in comparisons between humans and Old World monkeys (Wyckoff, Wang, and Wu 2000). In addition, some loci of the Y chromosome are known to affect male fertility (Vogt 1997), and some detrimental mutations have been shown to occur more frequently on a particular Y-chromosome background (Jobling et al. 2000). Therefore, some role of selective pressures appears probable. However, Nachman's (1998) results and other simple calculations suggest that selection can explain, per se, only a small fraction of human Y-chromosome variation (Bertranpetit 2000).

The differences described here between mitochondrial and Y-chromosome data therefore seem to reflect, at least in part, the effects of past demographic phenomena. There are a few complications, though. Mismatch distributions from chromosomes subject to recombination contain little unambiguous evolutionary information. However, Tajima's D and Fu's FS have been estimated within autosomal regions with no apparent recombination (Harding et al. 1997; Hey 1997; Zietkiewicz et al. 1998), and their values do not appear to depart from neutral, stationary expectations. Fay and Wu (1999) proposed that, because the effective population size of mtDNA is smaller (one fourth that of autosomal genes), past population bottlenecks had a stronger impact on mitochondrial variation. That interpretation seems at odds with the results of this study, because indices of Y-chromosome diversity resemble those estimated at other nuclear loci, even though, in principle, the Y-chromosome and mitochondrial effective population sizes should be the same (both are haploid and transmitted through one sex only, so each is expected to be one fourth of the autosomal value when the sexes are equally numerous).
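For reference, Tajima's D (Tajima 1989a) compares two estimates of the population mutation parameter: the mean number of pairwise differences, π, and the number of segregating sites, S, divided by a1 = Σ 1/i (i = 1, ..., n - 1). Under neutrality and constant population size D is expected to be close to zero, whereas an excess of rare variants, as expected after population growth, makes it negative. The Python sketch below uses the normalizing constants of Tajima (1989a). It assumes that sequences are equal-length strings, computes only the statistic (not its statistical significance), and uses hypothetical example data.

```python
import math
from itertools import combinations


def tajimas_d(haplotypes):
    """Tajima's D for equal-length sequences (one string per individual)."""
    n = len(haplotypes)
    if n < 4:
        # For n = 2 or 3 the variance constants are zero and D is undefined.
        raise ValueError("need at least four sequences")
    length = len(haplotypes[0])
    if any(len(h) != length for h in haplotypes):
        raise ValueError("all sequences must have the same length")
    # S: number of sites that are polymorphic in the sample.
    s = sum(1 for k in range(length) if len({h[k] for h in haplotypes}) > 1)
    if s == 0:
        raise ValueError("D is undefined when there are no segregating sites")
    # pi: mean number of differences over all n(n - 1)/2 pairs.
    diffs = [sum(a != b for a, b in zip(x, y))
             for x, y in combinations(haplotypes, 2)]
    pi = sum(diffs) / len(diffs)
    # Normalizing constants from Tajima (1989a).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical samples: only singletons (excess of rare variants) versus
# intermediate-frequency variants.
star = ["10000", "01000", "00100", "00010", "00001"]
balanced = ["00", "01", "10", "11"]
print(tajimas_d(star) < 0)      # True
print(tajimas_d(balanced) > 0)  # True
```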

It thus seems necessary to envisage either different demographic histories for males and females, with the male history leaving the stronger mark on autosomal variation, or some combination of demographic changes and selective processes. Schematically, three hypotheses appear compatible with the available data:

  1. The European female population increased in size; the male population did not.
  2. Neither population increased in size, but there was disruptive selection for mtDNA.
  3. Both populations increased in size, and there was purifying selection on the Y chromosome.

The data we analyzed do not, at present, allow us to discriminate among these hypotheses. However, it is worth noting that a small effective population size does not necessarily mean a small number of individuals (of males, in the present case). A high variance of reproductive success among individuals reduces the effective population size (Crow 1958). If the number of offspring has generally been more variable among males than among females, the effective population size inferred from Y-chromosome diversity is expected to be smaller (which is what this study suggests), and genetic differences between populations are expected to be greater (which has been shown by, among others, Seielstad, Minch, and Cavalli-Sforza [1998] and Perez-Lezaun et al. [1999]). In other words, European men may have been approximately as numerous as European women, but in each generation some men may have left many descendants and others few or none. The correlation in family size across generations demonstrated in Canadian pedigrees (Austerlitz and Heyer 1998) would increase the evolutionary impact of this effect.
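The effect of reproductive variance on the Y chromosome can be made concrete with a standard one-generation coalescent argument. This is used here only as an illustration; it is neither Crow's (1958) formulation nor a calculation performed in this study. Suppose that N fathers leave k1, ..., kN sons, with Σki = N so that population size stays constant. The probability that two sons drawn at random share a father is Σki(ki - 1)/[N(N - 1)], and its reciprocal, an effective size, equals (N - 1)/V, where V is the variance of ki. The son counts below are hypothetical.

```python
def y_effective_size(sons_per_father):
    """Effective number of Y chromosomes implied by one generation of son counts.

    Assumes a constant-size male population: the counts must sum to the
    number of fathers, so the mean number of sons per father is 1.
    """
    n = len(sons_per_father)
    if n < 2 or sum(sons_per_father) != n:
        raise ValueError("need >= 2 fathers and counts summing to their number")
    # Ordered pairs of sons that share a father.
    pairs_sharing = sum(k * (k - 1) for k in sons_per_father)
    if pairs_sharing == 0:
        return float("inf")  # zero variance: no coalescence in one generation
    # n(n - 1) ordered pairs of sons in total (there are n sons).
    return n * (n - 1) / pairs_sharing


moderate = [2] * 50 + [0] * 50   # variance 1, close to Wright-Fisher sampling
skewed = [10] * 10 + [0] * 90    # 10% of men father all the sons, variance 9

print(y_effective_size(moderate))  # 99.0, i.e. about N
print(y_effective_size(skewed))    # 11.0 = (N - 1) / V = 99 / 9
```

In both hypothetical populations there are 100 men in every generation, yet the effective size differs ninefold, which is the sense in which men may have been as numerous as women while leaving a weaker demographic signal.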

The first two hypotheses appear to contrast with the population expansions inferred from Y-chromosome microsatellite (Pritchard et al. 1999) and sequence (Shen et al. 2000) variation. Those studies, however, considered largely non-European samples and different Y-chromosome polymorphisms. The demographic history of Europe may have been peculiar, or the biallelic polymorphisms we considered may offer insight into a different period. Biallelic Y-chromosome markers, with their low mutation rates (<10⁻⁸ per site per year; Hammer 1995; Jobling, Pandya, and Tyler-Smith 1997; Thomson et al. 2000), may only be able to reveal ancient population growth (see Takahata 1995). Conversely, fast-evolving markers in the mitochondrial genome (estimates of the mutation rate per site per year are 8.6 × 10⁻⁵ for the hypervariable region [Stoneking et al. 1992] and about 4.5 × 10⁻⁵ for RFLPs [Rogers and Harpending 1992]) may contain information on more recent demographic changes.

At any rate, it is not impossible to reconcile the findings of Pritchard et al. (1999) and Shen et al. (2000) with those of the present study. If hypothesis 3 proved correct, the apparent constancy of the European male population size, as inferred from mismatch distributions, Tajima's tests, and Fu's tests, would be due to some form of purifying selection. This selection would ultimately conceal the effects of the demographic growth recognized in previous studies of the Y chromosome.
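A mismatch distribution of the kind discussed in this section is the relative frequency of each number of pairwise differences among all pairs of individuals in a sample. The Python sketch below computes it and counts its strict local maxima as a crude indicator of multimodality. The haplotypes and the peak rule are illustrative assumptions, not the data or the criteria used in this study.

```python
from collections import Counter
from itertools import combinations


def mismatch_distribution(haplotypes):
    """Relative frequency of 0, 1, ..., L pairwise differences (L = sequence length)."""
    if len(haplotypes) < 2:
        raise ValueError("need at least two haplotypes")
    length = len(haplotypes[0])
    counts = Counter(
        sum(a != b for a, b in zip(x, y)) for x, y in combinations(haplotypes, 2)
    )
    n_pairs = sum(counts.values())
    return [counts.get(d, 0) / n_pairs for d in range(length + 1)]


def count_peaks(freqs):
    """Number of strict local maxima (flat-topped peaks are not counted)."""
    padded = [0.0] + list(freqs) + [0.0]
    return sum(
        1
        for i in range(1, len(padded) - 1)
        if padded[i] > padded[i - 1] and padded[i] > padded[i + 1]
    )


# Hypothetical samples: two distant clusters versus one compact cluster.
two_clusters = ["000000"] * 3 + ["111111"] * 3
one_cluster = ["000", "001", "010", "011"]

print(count_peaks(mismatch_distribution(two_clusters)))  # 2
print(count_peaks(mismatch_distribution(one_cluster)))   # 1
```

On real samples the number of local maxima is sensitive to sample size and sampling noise, so a peak count is a descriptive aid rather than a test of any demographic model.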


    Acknowledgements
 
We are grateful to Giorgio Bertorelle, Laurent Excoffier, Antonio Amorim, and Chris Tyler-Smith, who discussed the results of this study with us and critically read the manuscript. Chris Tyler-Smith and Tatiana Zerjal also gave us access to unpublished data, and we thank them for that. This research was supported by funds from the Italian Ministry of the Universities (MURST COFIN 99) and from the University of Ferrara. L.P. was supported by a Ph.D. grant from Fundação para a Ciência e a Tecnologia (PRAXIS XXI/BD/13632/97), I.D. by a grant from the Swiss National Research Council (FNRS) for Perspective Investigators, Z.H.R. by a BBSRC Studentship, and M.A.J. by a Wellcome Trust Senior Fellowship in Basic Biomedical Science (grant number 057559).


    Footnotes
 
Jeffrey Long, Reviewing Editor

1 Abbreviations: HVRI, hypervariable region I; RFLP, restriction fragment length polymorphism.

2 Keywords: Y chromosome, polymorphism, human, mismatch distribution, population expansion

3 Address for correspondence and reprints: Guido Barbujani, Dipartimento di Biologia, Università di Ferrara, via L. Borsari 46, I-44100 Ferrara, Italy. E-mail: bjg@unife.it


    References

    Alonso S., J. A. Armour. 2001. A highly variable segment of human subterminal 16p reveals a history of population growth for modern humans outside Africa. Proc. Natl. Acad. Sci. USA 98:864-869.

    Anderson S., A. T. Bankier, B. G. Barrell, et al. (14 co-authors). 1981. Sequence and organization of the human mitochondrial genome. Nature 290:457-465.

    Aris-Brosou S., L. Excoffier. 1996. The impact of population expansion and mutation rate heterogeneity on DNA sequence polymorphism. Mol. Biol. Evol. 13:494-504.

    Austerlitz F., E. Heyer. 1998. Social transmission of reproductive behavior increases frequency of inherited disorders in a young-expanding population. Proc. Natl. Acad. Sci. USA 95:15140-15144.

    Barbujani G., G. Bertorelle. 2001. Genetics and the population history of Europe. Proc. Natl. Acad. Sci. USA 98:22-25.

    Bertranpetit J. 2000. Genome, diversity, and origins: the Y chromosome as a storyteller. Proc. Natl. Acad. Sci. USA 97:6927-6929.

    Casalotti R., L. Simoni, M. Belledi, G. Barbujani. 1999. Y-chromosome polymorphisms and the origins of the European gene pool. Proc. R. Soc. Lond. B Biol. Sci. 266:1959-1965.

    Chikhi L., G. Destro-Bisol, V. Pascali, V. Baravelli, M. Dobosz, G. Barbujani. 1998. Clinal variation in the DNA of Europeans. Hum. Biol. 70:643-657.

    Comas D., F. Calafell, E. Mateu, A. Perez-Lezaun, E. Bosch, J. Bertranpetit. 1997. Mitochondrial DNA variation and the origin of the Europeans. Hum. Genet. 99:443-449.

    Crow J. F. 1958. Some possibilities for measuring selection intensities in man. Hum. Biol. 30:1-13.

    De Knijff P. 2000. Messages through bottlenecks: on the combined use of slow and fast evolving polymorphic markers on the human Y chromosome. Am. J. Hum. Genet. 67:1055-1061.

    Donnelly P. 1996. Interpreting genetic variability: the effects of shared evolutionary history. Pp. 25-50 in K. Weiss, ed. Variation in the human genome. Wiley, Chichester, England (CIBA Foundation Symposium).

    Excoffier L. 1990. Evolution of human mitochondrial DNA: evidence for departure from a pure neutral model of populations at equilibrium. J. Mol. Evol. 30:125-139.

    Excoffier L., S. Schneider. 1999. Why hunter-gatherer populations do not show signs of Pleistocene demographic expansions. Proc. Natl. Acad. Sci. USA 96:10597-10602.

    Fay J. C., C. I. Wu. 1999. A human population bottleneck can account for the discordance between patterns of mitochondrial versus nuclear DNA variation. Mol. Biol. Evol. 16:1003-1005.

    Fu Y.-X. 1997. Statistical tests of neutrality of mutations against population growth, hitchhiking and background selection. Genetics 147:915-925.

    Hammer M. F. 1995. A recent common ancestry for human Y chromosomes. Nature 378:376-378.

    Harding R. M., S. M. Fullerton, R. C. Griffiths, J. Bond, M. J. Cox, J. A. Schneider, D. S. Moulin, J. B. Clegg. 1997. Archaic African and Asian lineages in the genetic ancestry of modern humans. Am. J. Hum. Genet. 60:772-789.

    Harpending H. C. 1994. Signature of ancient population growth in a low-resolution mitochondrial DNA mismatch distribution. Hum. Biol. 66:591-600.

    Harpending H. C., M. A. Batzer, M. A. Gruven, L. B. Jorde, A. R. Rogers, S. T. Sherry. 1998. Genetic traces of ancient demography. Proc. Natl. Acad. Sci. USA 95:1961-1967.

    Harpending H. C., S. T. Sherry, A. R. Rogers, M. Stoneking. 1993. The genetic structure of ancient human populations. Curr. Anthropol. 34:483-496.

    Hey J. 1997. Mitochondrial and nuclear genes present conflicting portraits of human origins. Mol. Biol. Evol. 14:166-172.

    Hudson R. R. 1990. Gene genealogies and the coalescent process. Pp. 1-44 in D. Futuyma and J. Antonovics, eds. Oxford surveys in evolutionary biology. Oxford University Press, Oxford, England.

    Jobling M. A., A. Pandya, C. Tyler-Smith. 1997. The Y chromosome in forensic analysis and paternity testing. Int. J. Legal Med. 110:118-124.

    Jobling M. A., C. Tyler-Smith. 1995. Fathers and sons: the Y chromosome and human evolution. Trends Genet. 11:449-456.

    Jobling M. A., C. Tyler-Smith. 2000. New uses for new haplotypes: the human Y chromosome, disease, and selection. Trends Genet. 16:356-362.

    Jobling M. A., G. Williams, K. Schiebel, A. Pandya, K. McElreavey, L. Salas, G. A. Rappold, N. A. Affara, C. Tyler-Smith. 2000. A selective difference between human Y-chromosomal DNA haplotypes. Curr. Biol. 8:1391-1394.

    Jorde L. B., W. S. Watkins, M. J. Bamshad, M. E. Dixon, C. E. Ricker, M. T. Seielstad, M. A. Batzer. 2000. The distribution of human genetic diversity: a comparison of mitochondrial, autosomal, and Y-chromosome data. Am. J. Hum. Genet. 66:979-988.

    Karafet T. M., S. L. Zegura, O. Posukh, et al. (14 co-authors). 1999. Ancestral Asian source(s) of New World Y-chromosome founder haplotypes. Am. J. Hum. Genet. 64:817-831.

    Marjoram P., P. Donnelly. 1994. Pairwise comparisons of mitochondrial DNA sequences in subdivided populations and implications for early human evolution. Genetics 136:673-683.

    Merriwether D. A., A. G. Clark, S. W. Ballinger, T. G. Schurr, H. Soodyall, T. Jenkins, S. T. Sherry, D. C. Wallace. 1991. The structure of human mitochondrial DNA variation. J. Mol. Evol. 33:543-555.

    Nachman M. W. 1998. Y-chromosome variation of mice and men. Mol. Biol. Evol. 15:1744-1750.

    Nei M. 1987. Molecular evolutionary genetics. Columbia University Press, New York.

    Perez-Lezaun A., F. Calafell, D. Comas, et al. (12 co-authors). 1999. Sex-specific migration patterns in central Asian populations, revealed by the analysis of Y-chromosome short tandem repeats and mtDNA. Am. J. Hum. Genet. 65:208-219.

    Poloni E. S., G. Passarino, A. S. Santachiara-Benerecetti, O. Semino, A. Langaney, L. Excoffier. 1997. Human genetic affinities for Y chromosome p49a,f/Taq I haplotypes show strong correspondence with linguistics. Am. J. Hum. Genet. 61:1015-1035.

    Pritchard J. K., M. T. Seielstad, A. Perez-Lezaun, M. W. Feldman. 1999. Population growth of human Y chromosomes: a study of Y chromosome microsatellites. Mol. Biol. Evol. 16:1791-1798.

    Quintana-Murci L., O. Semino, E. Minch, G. Passarino, A. Brega, A. S. Santachiara-Benerecetti. 1999. Further characteristics of proto-European Y chromosomes. Eur. J. Hum. Genet. 7:603-608.

    Rogers A. R. 1995. Genetic evidence for a Pleistocene population explosion. Evolution 49:608-615.

    Rogers A. R., A. E. Fraley, M. J. Bamshad, W. S. Watkins, L. B. Jorde. 1996. Mitochondrial mismatch analysis is insensitive to the mutational process. Mol. Biol. Evol. 13:895-902.

    Rogers A. R., H. Harpending. 1992. Population growth makes waves in the distribution of pairwise genetic differences. Mol. Biol. Evol. 9:552-569.

    Rogers A. R., L. B. Jorde. 1995. Genetic evidence on modern human origins. Hum. Biol. 67:1-36.

    Rosser Z. H., T. Zerjal, M. E. Hurles, et al. (60 co-authors). 2000. Y-chromosomal diversity within Europe is clinal and influenced primarily by geography rather than language. Am. J. Hum. Genet. 67:1526-1543.

    Sajantila A., A. H. Salem, P. Savolainen, K. Bauer, C. Gierig, S. Pääbo. 1996. Paternal and maternal DNA lineages reveal a bottleneck in the founding of the Finnish population. Proc. Natl. Acad. Sci. USA 93:12035-12039.

    Schneider S., D. Roessli, L. Excoffier. 2000. ARLEQUIN: a software for population genetics data analysis. Version 2.0. Department of Anthropology, University of Geneva, Switzerland.

    Seielstad M., E. Minch, L. L. Cavalli-Sforza. 1998. Genetic evidence for a higher female migration rate in humans. Nat. Genet. 20:278-280.

    Shen P., F. Wang, P. A. Underhill, et al. (13 co-authors). 2000. Population genetics implications from sequence variation in four Y chromosome genes. Proc. Natl. Acad. Sci. USA 97:7354-7359.

    Sherry S. T., A. R. Rogers, H. C. Harpending, H. Soodyall, T. Jenkins, M. Stoneking. 1994. Pairwise differences of mtDNA reveal recent human population expansions. Hum. Biol. 66:761-776.

    Simoni L., F. Calafell, D. Pettener, J. Bertranpetit, G. Barbujani. 2000. Geographic patterns of mtDNA diversity in Europe. Am. J. Hum. Genet. 66:262-278.

    Slatkin M., R. R. Hudson. 1991. Pairwise comparisons of mitochondrial DNA sequences in stable and exponentially growing populations. Genetics 129:555-562.

    Sokal R. R., F. J. Rohlf. 1995. Biometry. 3rd edition. Freeman, San Francisco.

    Stoneking M., S. Sherry, L. Vigilant. 1992. Geographic origin of human mtDNA revisited. Syst. Biol. 41:384-391.

    Tajima F. 1983. Evolutionary relationship of DNA sequences in finite populations. Genetics 105:437-460.

    Tajima F. 1989a. Statistical method for testing the neutral mutation hypothesis by DNA polymorphisms. Genetics 123:585-595.

    Tajima F. 1989b. The effect of change in population size on DNA polymorphism. Genetics 123:597-601.

    Takahata N. 1995. A genetic perspective on the origin and history of humans. Annu. Rev. Ecol. Syst. 26:343-372.

    Thomson R., J. K. Pritchard, P. Shen, P. J. Oefner, M. W. Feldman. 2000. Recent common ancestry of human Y chromosomes: evidence from DNA sequence data. Proc. Natl. Acad. Sci. USA 97:7360-7365.

    Vogt P. H. 1997. Molecular basis of male (in)fertility. Int. J. Androl. 20 (Suppl. 3):2-10.

    Wise C. A., M. Sraml, D. C. Rubinsztein, S. Easteal. 1998. Comparative nuclear and mitochondrial genome diversity in humans and chimpanzees. Mol. Biol. Evol. 14:707-716.

    Wyckoff G. J., W. Wang, C. Wu. 2000. Rapid evolution of male reproductive genes in the descent of man. Nature 403:304-308.

    Zietkiewicz E., V. Yotova, M. Jarnik, et al. (11 co-authors). 1998. Genetic structure of the ancestral population of modern humans. J. Mol. Evol. 47:146-155.

Accepted for publication March 15, 2001.