Dipartimento di Biologia, Università di Ferrara, Ferrara, Italy
Abstract |
---|
Introduction |
---|
In populations that have a long history of separation (and especially if mutation rates are high), different mutations may be present, so useful information lies not only in allele frequency differences between populations but also in the amount of molecular differentiation between alleles. Under these conditions, it seems desirable to also take the molecular differences between alleles into account when admixture proportions are estimated.
The first attempt along these lines (Pinto et al. 1996) was followed by the derivation of an estimator (called mY) that can be applied to any type of molecular data, as long as their amount of molecular diversity can be simply related to coalescence times (Bertorelle and Excoffier 1998).
The application of mY (e.g., Hammer et al. 2000) has been limited to cases of admixture in which only two parental populations (PPs) contribute alleles to the gene pool of the hybrid population (HP). Here, we derive a system of linear equations that allows a simple extension of this model to the case in which the HP received a genetic contribution from an arbitrary number d of PPs. The behavior of our d-parental estimators of admixture proportions and, in particular, the effect of increasing the number of estimated parameters on their errors are evaluated by simulation. Finally, we apply this method to the study of a human HP with three putative PPs.
Derivation of the Multiparental Estimators |
---|
The mean coalescence time between a gene drawn from the HP and a gene drawn from the ith PP, t̄h,i, is given by
| (1) |
Noting that for d PPs there are d mean coalescence times t̄h,i and (d - 1) contributions to estimate (because Σi µi = 1), a least-squares estimator for µi can be computed by minimizing the sum of the squares of the differences between the left- and the right-hand sides of equation (1):
| (2) |
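One way to write out the least-squares step, offered as a sketch rather than as the published equations: assume that equation (1) has the linear-mixture form in which a gene sampled from the HP descends from the jth PP with probability µj. The symbols t̄h,i (HP versus ith PP) and t̄i,j (ith versus jth PP, with t̄i,i the within-population value) are notation assumed here for the mean coalescence times.

```latex
% Assumed form of equation (1), for i = 1, ..., d
\bar{t}_{h,i} = \sum_{j=1}^{d} \mu_j \, \bar{t}_{i,j}

% Least-squares criterion built from it (cf. equation (2)),
% with the constraint \mu_d = 1 - \sum_{j=1}^{d-1} \mu_j
S(\mu_1, \dots, \mu_{d-1})
  = \sum_{i=1}^{d} \Bigl( \bar{t}_{h,i} - \sum_{j=1}^{d} \mu_j \, \bar{t}_{i,j} \Bigr)^{2}
```

Under this assumed form, S is quadratic in the (d - 1) free contributions, so its minimum is found from a linear system.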
The estimators are thus computed by solving the (d - 1) linear equations obtained by differentiating equation (2) with respect to each of the (d - 1) unknowns and setting the derivatives to zero. This system is described simply by the general kth equation:
| (1) |
| (2) |
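As a minimal numerical sketch of this least-squares step, the code below assumes that each mean coalescence time between the HP and the ith PP is the µ-weighted average of the mean coalescence times between the ith PP and every PP (an assumed linear-mixture form, not a quotation of equation (1)). The function names and the inputs `t_h` and `t_pp` are hypothetical; the last contribution is eliminated through the constraint that the contributions sum to 1, and the resulting (d - 1) normal equations are solved by Gaussian elimination.

```python
from typing import List, Sequence


def solve_linear_system(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [[float(v) for v in a[r]] + [float(b[r])] for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: the PPs may be undifferentiated")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def admixture_least_squares(t_h: Sequence[float],
                            t_pp: Sequence[Sequence[float]]) -> List[float]:
    """Least-squares admixture proportions for d parental populations.

    t_h[i]     : mean coalescence time, HP versus PP i (hypothetical input)
    t_pp[i][j] : mean coalescence time, PP i versus PP j (hypothetical input)
    """
    d = len(t_h)
    last = d - 1
    # Substitute mu_d = 1 - (mu_1 + ... + mu_{d-1}) into the assumed model.
    y = [t_h[i] - t_pp[i][last] for i in range(d)]
    x = [[t_pp[i][j] - t_pp[i][last] for j in range(last)] for i in range(d)]
    # Normal equations: one equation per free contribution mu_k.
    a = [[sum(x[i][k] * x[i][j] for i in range(d)) for j in range(last)]
         for k in range(last)]
    b = [sum(x[i][k] * y[i] for i in range(d)) for k in range(last)]
    mu = solve_linear_system(a, b)
    return mu + [1.0 - sum(mu)]


# Usage with made-up values: within-PP time 500, between-PP time 1000.
t_pp_example = [[500.0, 1000.0, 1000.0],
                [1000.0, 500.0, 1000.0],
                [1000.0, 1000.0, 500.0]]
mu_example = [0.6, 0.3, 0.1]
t_h_example = [sum(m * t for m, t in zip(mu_example, row)) for row in t_pp_example]
estimates = admixture_least_squares(t_h_example, t_pp_example)
```

Because the example inputs are built exactly from the assumed model, the estimates recover the proportions used to construct them up to floating-point error; with estimated coalescence times the fit would only be approximate, and nothing in this sketch constrains the estimates to lie between 0 and 1.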
Testing the Estimators |
---|
Monte Carlo Simulation of HP and PPs
The genealogies of samples of 60 DNA sequences from the HP and each PP were simulated following a coalescent approach (Hudson 1990). Mutations were then introduced assuming an infinite-sites model with θ = 2Nu = 10 (N = haploid effective population size; u = mutation rate per locus per generation). For various combinations of the parameters µ, τ (the divergence time of the PPs), tA, and d, 1,000 genealogies were generated, thus allowing an empirical evaluation of the bias and the standard error of the estimators. Mean coalescence times were estimated from the average number of nucleotide differences.
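The last step, converting an average number of nucleotide differences into a mean coalescence time, can be sketched as follows. Under an infinite-sites model, two lineages that coalesce T generations ago are separated by 2T generations of mutation, so the expected number of differences is 2u times the mean coalescence time. The function names and the representation of sequences as aligned strings are assumptions made for illustration.

```python
from itertools import product
from typing import Sequence


def mean_pairwise_differences(sample_a: Sequence[str], sample_b: Sequence[str]) -> float:
    """Average number of differing sites over all pairs (one sequence per sample)."""
    total = 0
    pairs = 0
    for s, t in product(sample_a, sample_b):
        if len(s) != len(t):
            raise ValueError("sequences must be aligned to the same length")
        total += sum(1 for x, y in zip(s, t) if x != y)
        pairs += 1
    if pairs == 0:
        raise ValueError("both samples must contain at least one sequence")
    return total / pairs


def mean_coalescence_time(sample_a: Sequence[str], sample_b: Sequence[str], u: float) -> float:
    """Estimate the mean coalescence time (in generations) between two samples.

    u is the mutation rate per locus per generation; the estimate divides the
    mean number of differences by 2u, as expected under infinite sites.
    """
    return mean_pairwise_differences(sample_a, sample_b) / (2.0 * u)
```

This version compares sequences between two different samples; a within-population estimate would instead average over distinct pairs drawn from a single sample.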
In general, the d-parental estimators seemed to retain the properties of the two-parental estimator of admixture proportions (Bertorelle and Excoffier 1998). The estimator bias was almost negligible unless very short divergence times among parental populations were assumed (results not reported). On the other hand, the standard error became reasonably low only when the PPs had diverged for a number of generations τ in the range of the population size or higher (see fig. 1). As observed for the two-parental estimator, the results reported in figure 1 also suggest that the age of the admixture event (tA) affects the precision of the estimates. For example, for PPs with an effective population size of 500, reliable estimates of admixture proportions are expected if the PPs diverged at least 500 generations ago. Even in this case, however, if the admixture event occurred 25 generations ago, the errors in the estimates can increase by a factor of 3 or 4. As noted earlier (Bertorelle and Excoffier 1998), these results suggest some conditions (τ > 1N, tA < 0.05N) for the applicability of the estimator mYi to single-locus data. If these conditions are not fulfilled, several loci with similar mutation rates should be analyzed simultaneously.
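The two applicability conditions can be written as a small check. This is only an illustration of the thresholds stated above (divergence time greater than 1N generations, admixture age less than 0.05N generations); the function and parameter names are hypothetical.

```python
def single_locus_conditions_met(divergence_gens: float,
                                admixture_gens: float,
                                n_eff: float) -> bool:
    """True when the PPs diverged more than 1N generations ago and the
    admixture event is younger than 0.05N generations (N = n_eff)."""
    if n_eff <= 0:
        raise ValueError("effective population size must be positive")
    return divergence_gens > n_eff and admixture_gens / n_eff < 0.05
```

For an effective size of 500, for instance, the thresholds correspond to more than 500 generations of divergence and fewer than 25 generations since admixture.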
Artificial Hybrid Populations
We also simulated artificial HPs by pooling individuals extracted from real samples of human populations. On the basis of a multidimensional analysis of 61 samples of human populations typed for hypervariable region I of mtDNA (Excoffier and Schneider 1999), we chose one group of genetically rather homogeneous samples in Europe (Bavarians, Cornish, English, Germans, Welsh) and one group of genetically differentiated samples (!Kung, Australians, Japanese, Nootka, Saami). Using the sample allele frequencies as probabilities, and separately for each group of samples, we generated artificial PP samples of 100 sequences and a sample of 100 sequences (the artificial HP) drawn from the PPs in fixed relative proportions. This procedure was repeated 1,000 times, and the mean and standard errors of the estimators were then analyzed. Unlike in the previous Monte Carlo simulations, the coalescent structure did not change among replicates; the standard errors we compute in this section therefore do not include the stochastic factors associated with the gene genealogy.
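One replicate of this resampling scheme might look like the sketch below. It assumes that each observed sample is a list of haplotypes, that drawing with replacement from that list is equivalent to using the sample allele frequencies as probabilities, and that the artificial HP is drawn from the observed frequencies rather than from the artificial PP samples (the text leaves this last point open). All names are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple


def resample(sample: Sequence[str], n: int, rng: random.Random) -> List[str]:
    """Draw n haplotypes with replacement, i.e. with the sample frequencies as probabilities."""
    return [rng.choice(sample) for _ in range(n)]


def make_artificial_replicate(pp_samples: Sequence[Sequence[str]],
                              proportions: Sequence[float],
                              n: int = 100,
                              rng: Optional[random.Random] = None
                              ) -> Tuple[List[List[str]], List[str]]:
    """Return (artificial PP samples, artificial HP sample) for one replicate."""
    rng = rng or random.Random()
    if len(pp_samples) != len(proportions):
        raise ValueError("one proportion is needed per parental sample")
    # Fixed contributions: e.g. 0.6 of 100 sequences means exactly 60 draws.
    counts = [round(p * n) for p in proportions]
    if sum(counts) != n:
        raise ValueError("proportions must yield whole counts summing to n")
    artificial_pps = [resample(sample, n, rng) for sample in pp_samples]
    hybrid: List[str] = []
    for sample, count in zip(pp_samples, counts):
        hybrid.extend(resample(sample, count, rng))
    return artificial_pps, hybrid
```

Calling this function 1,000 times and applying the estimators to each replicate would give the distribution from which a mean and a standard error can be computed.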
Figure 2 shows the results obtained for the two groups of populations when the relative contributions were fixed at 0.6 for one PP and 0.1 for each of the four others. The bias was virtually absent in both cases, but the error of our estimators seemed reasonably low only when very different populations were used as artificial PPs. In other words, single-locus analysis of admixture processes does not seem feasible for human populations if they are only slightly differentiated, as is the case for most European groups. Again, we expect that only the simultaneous analysis of several loci could provide more reliable estimates, as is also the case, for example, for trees summarizing the evolutionary relationships of populations (Mountain and Cavalli-Sforza 1997). Finally, the results of this analysis seem to indicate a positive relationship between the error of the estimated contribution of a PP and its level of genetic variability. A set of simulations in which parental populations had different effective sizes (results not reported in detail) supported this conclusion.
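The bias and error discussed here are summaries over replicates. As a sketch, assuming the standard error of an estimator is taken as the standard deviation of its estimates across replicates, they could be computed as follows (the function name is hypothetical).

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def bias_and_standard_error(estimates: Sequence[float],
                            true_value: float) -> Tuple[float, float]:
    """Empirical bias (mean estimate minus true value) and standard error
    (sample standard deviation of the estimates across replicates)."""
    if len(estimates) < 2:
        raise ValueError("at least two replicates are needed")
    return mean(estimates) - true_value, stdev(estimates)
```

With a contribution fixed at 0.6, for example, `true_value` would be 0.6 and `estimates` would hold the 1,000 replicate estimates for that PP.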
Our results suggest, therefore, that a large majority of Canarian mtDNAs have a North African Berber origin and that the Spanish contribution was limited. These results seem more consistent than previous ones with the known history of the Spanish occupation and the presumed relationship between the pre-occupation people and the North African Berbers. We nevertheless note the large standard error associated with the estimates.
Conclusions |
---|
A Monte Carlo simulation study shows that the multiparental estimators behave in a way very similar to that of the two-parental estimator. The number of parameters to estimate increases with the number of PPs, but so does the information contained in the data. This probably explains why the standard errors remained constant across the numbers of PPs considered.
By simulating artificial HPs from human mtDNA sequences, we showed that, in our species, the level of divergence between populations from different continents is probably large enough to allow reliable estimates of admixture proportions based on a single locus. This is certainly not true for closely related populations, such as those in the Canarian example, where the analysis of several loci seems necessary.
Finally, it is important to remember that the estimators of admixture proportions proposed here were derived assuming a specific population model, and departures from that model can bias the estimates. For example, suppose two parental populations contributed the same proportion of genes to the hybrid population, but at different times. Larger numbers of mutations are expected to accumulate between the hybrid population and the parental population that contributed its genes earlier; this may decrease the similarity between them and lead to an underestimation of the more ancient contribution. Small deviations from the model probably have limited effects on the admixture proportion estimates, but this point needs to be clarified further.
Acknowledgements |
---|
Footnotes |
---|
1 Keywords: admixture coefficients, least-squares method, coalescent, Monte Carlo simulations, human populations, mtDNA sequences
2 Address for correspondence and reprints: Isabelle Dupanloup de Ceuninck, Dipartimento di Biologia, Università di Ferrara, via L. Borsari 46, 44100 Ferrara, Italy. dpi{at}dns.unife.it
Literature Cited |
---|
Bertorelle, G., and L. Excoffier. 1998. Inferring admixture proportions from molecular data. Mol. Biol. Evol. 15:1298-1311
Cavalli-Sforza, L. L., and W. F. Bodmer. 1971. The genetics of human populations. W. H. Freeman and Company, San Francisco
Chakraborty, R. 1986. Gene admixture in human populations: models and predictions. Yearb. Phys. Anthropol. 29:1-43
Estoup, A., J. M. Cornuet, F. Rousset, and R. Guyomard. 1999. Juxtaposed microsatellite systems as diagnostic markers for admixture: theoretical aspects. Mol. Biol. Evol. 16:898-908
Excoffier, L., and S. Schneider. 1999. Why hunter-gatherers do not show signs of Pleistocene demographic expansions. Proc. Natl. Acad. Sci. USA 96:10597-10602
Hammer, M. F., A. J. Redd, E. T. Wood et al. (12 co-authors). 2000. Jewish and Middle Eastern non-Jewish populations share a common pool of Y-chromosome biallelic haplotypes. Proc. Natl. Acad. Sci. USA 97:6769-6774
Hudson, R. R. 1990. Gene genealogies and the coalescent process. Pp. 1-44 in D. Futuyma and J. Antonovics, eds. Oxford surveys in evolutionary biology. Oxford University Press, Oxford, England
Long, J. C. 1991. The genetic structure of admixed populations. Genetics 127:417-428
Mountain, J. L., and L. L. Cavalli-Sforza. 1997. Multilocus genotypes, a tree of individuals, and human evolutionary history. Am. J. Hum. Genet. 61:705-718
Parra, E. J., A. Marcini, J. Akey et al. (11 co-authors). 1998. Estimating African American admixture proportions by use of population-specific alleles. Am. J. Hum. Genet. 63:1839-1851
Pinto, F., V. M. Cabrera, A. M. Gonzalez, J. M. Larruga, A. Noya, and M. Hernandez. 1994. Human enzyme polymorphism in the Canary Islands. VI. Northwest African influence. Hum. Hered. 44:156-161
Pinto, F., A. M. Gonzalez, M. Hernandez, J. M. Larruga, and V. M. Cabrera. 1996. Genetic relationship between the Canary Islanders and their African and Spanish ancestors inferred from mitochondrial DNA sequences. Ann. Hum. Genet. 60:321-330
Roberts, D. F., M. Evans, E. W. Ikin, and A. E. Mourant. 1966. Blood groups and the affinities of the Canary Islanders. Man 1:512