School of Biological Sciences, Queen Mary, University of London, London, England
Abstract |
---|
Introduction |
---|
These data illustrate the principle that undetected substructure can be confused with consanguinity, as they both produce excess homozygosity over Hardy-Weinberg (HW) expectations when genotype proportions are given by the product of allele proportions. Population substructure is often viewed as the consequence of geographic subdivision, but there are several alternatives. There may be incomplete admixture between components of the population because of assortative mating, niche specialization, or, particularly in the case of humans, cultural differences.
Our method detects current population subdivision. Past episodes of admixture can be detected by other methods because components of linkage disequilibrium persist for several generations, even in the face of random mating (Hartl and Clark 1997, p. 105). Our method differs in making use of the patterns of excess homozygosity and linkage disequilibrium, which do not persist beyond one generation of admixture.
If we had perfect knowledge of the population subdivision, it would be relatively simple to distinguish between the two causes of excess homozygosity. This can be achieved using a hierarchical approach incorporating F statistics (Wright 1921; Weir and Cockerham 1984). These measures deal with the apportionment of allelic identity within individuals and populations, which are sometimes further divided into subpopulations.
Consanguinity produces excess homozygosity over that expected from the subpopulation allele frequencies. This is quantified by the statistic FIS, which is the correlation measuring the increased probability of a match between a pair of alleles from the same individual (I) compared with pairs drawn from the subpopulation (S) (Cockerham 1973).
If the population is divided into partially isolated subpopulations, individuals from the same subpopulation have an increased probability of sharing a common ancestor and hence an increased probability of homozygosity. It is possible to use F statistics to measure this increase at one hierarchical level relative to one more inclusive (Hartl and Clark 1997, p. 117). For example, if a population is divided into regional populations (R) comprising distinct subpopulations (S), the differentiation between subpopulations is measured as FSR. However, if we are unaware of the finer subdivisions and pool a sample of individuals from different subpopulations, we will observe an apparent excess of homozygosity that might be falsely attributed to consanguinity. This amounts to a confusion of FSR with FIS.
In many instances, investigators might not detect the finer population subdivision. In the case of human populations, they could be unaware of restrictions on marriage that are subtle or unreported. Our method is a likelihood-based approach that can detect such situations by making use of the patterns found in multilocus data.
Materials and Methods |
---|
The probability of drawing two alleles of type A from one of these subpopulations is given by the standard equation
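$$P(AA) = p_A[\theta + (1 - \theta)p_A] \qquad (1a)$$
For two different alleles, A and B, the probability is
$$P(AB) = 2p_Ap_B(1 - \theta) \qquad (1b)$$
Here $p_A$ and $p_B$ are the population frequencies of alleles A and B, and θ measures the differentiation among subpopulations.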
In this situation, where consanguinity is assumed to be zero, the genealogies of genes at different loci are essentially independent (Balding and Nichols 1994). Nevertheless, the drift between subpopulations will result in random associations within these subpopulations, between alleles at different loci. Although there are correlations between loci, the probability of a multilocus genotype can be calculated by taking the simple product of the single-locus probabilities given in equations (1) with an appropriate value of θ. Because this calculation takes into account the effects of drift on the allele frequencies at each locus, it allows for the linkage disequilibrium that drift has generated between unlinked loci (Balding and Nichols 1994).
Next, we need to consider the unions of the individuals within the subpopulations. Within populations that traditionally practice consanguinity, such as the U.K. Pakistani communities (Darr and Modell 1988; Overall 1998), consanguinity is essentially due to unions between first cousins. The probability that the offspring of first cousins will inherit two identical-by-descent (IBD) genes at any one locus is approximately 1/16. We will call this probability R. The proportion of the population that practice consanguinity will be represented by the symbol C.
In the offspring of consanguineous individuals, the probability of observing two copies of an allele A which trace their ancestry to an allele in the grandparental generation is $p_A R$. With probability (1 - R), the offspring of first cousins will not inherit alleles IBD. However, if the sample population is also substructured, then the two ancestral lineages are drawn from the same subpopulation (ignoring migration in the last two generations) so that the probabilities of identity given by equations (1) apply. Putting these two expressions together gives
$$P(AA) = p_A R + (1 - R)\,p_A[\theta + (1 - \theta)p_A], \qquad P(AB) = (1 - R)\,2p_Ap_B(1 - \theta) \qquad (2)$$
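The single-locus probabilities described above can be checked numerically. The following is a minimal sketch, assuming the forms implied by the text: with probability R the two alleles are IBD through consanguinity, and otherwise the substructure probabilities of equations (1) apply. The function name `genotype_prob` and the example allele frequencies are hypothetical.

```python
from itertools import combinations_with_replacement


def genotype_prob(a, b, freqs, theta=0.0, R=0.0):
    """Single-locus probability of the unordered genotype (a, b).

    freqs: dict mapping allele -> population frequency
    theta: substructure parameter of equations (1)
    R:     probability of IBD through consanguinity (0 for random-mating parents)
    """
    pa, pb = freqs[a], freqs[b]
    if a == b:
        # IBD through consanguinity, otherwise identity as in equations (1)
        return R * pa + (1 - R) * pa * (theta + (1 - theta) * pa)
    # Heterozygotes require that the two alleles are not IBD
    return (1 - R) * 2 * pa * pb * (1 - theta)


# Hypothetical three-allele locus
freqs = {"A": 0.2, "B": 0.3, "C": 0.5}
rm = genotype_prob("A", "A", freqs)              # random mating
fc = genotype_prob("A", "A", freqs, R=1 / 16)    # offspring of first cousins
total = sum(
    genotype_prob(a, b, freqs, theta=0.03, R=1 / 16)
    for a, b in combinations_with_replacement(freqs, 2)
)
print(rm, fc, total)
```

For an allele at frequency 0.2 the two homozygote probabilities are 0.04 and 0.05, and the genotype probabilities at a locus sum to 1 for any θ and R.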
Again, this single-locus approach can be extended to the evaluation of multilocus genotype probabilities by taking the product of single-locus probabilities. Here, R, the magnitude of excess homozygosity through alleles IBD, is the measure of association between loci caused by consanguinity.
We wish to estimate the parameters θ and C using l, the likelihood of the sample. This likelihood function incorporates equations (1) and (2) to calculate the probability of the observed data given the parameter values θ and C. The product is over loci and individuals:
|
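A likelihood of this kind can be sketched in code. The sketch below is not a transcription of equation (3). It assumes one plausible reading of the surrounding text: each individual is, with probability C, the offspring of first cousins (R applies at all of its loci) and otherwise the offspring of unrelated parents, with the product taken over loci within each class and over individuals. All function names and the toy data are hypothetical.

```python
import math


def genotype_prob(a, b, freqs, theta, R):
    """Single-locus genotype probability under substructure (theta) and IBD (R)."""
    pa, pb = freqs[a], freqs[b]
    if a == b:
        return R * pa + (1 - R) * pa * (theta + (1 - theta) * pa)
    return (1 - R) * 2 * pa * pb * (1 - theta)


def log_likelihood(sample, freqs_by_locus, theta, C, R=1 / 16):
    """Log likelihood of a sample; each individual is a list of (a, b) per locus."""
    total = 0.0
    for individual in sample:
        p_cons = 1.0  # product over loci if the parents were consanguineous
        p_rm = 1.0    # product over loci if the parents were unrelated
        for (a, b), freqs in zip(individual, freqs_by_locus):
            p_cons *= genotype_prob(a, b, freqs, theta, R)
            p_rm *= genotype_prob(a, b, freqs, theta, 0.0)
        # Assumed mixture over the two classes of individual
        total += math.log(C * p_cons + (1 - C) * p_rm)
    return total


def grid_search(sample, freqs_by_locus, thetas, Cs):
    """Return (log likelihood, theta, C) at the best point of a parameter grid."""
    return max(
        (log_likelihood(sample, freqs_by_locus, t, c), t, c)
        for t in thetas
        for c in Cs
    )


# Toy data: three individuals typed at two hypothetical loci
freqs_by_locus = [{"A": 0.5, "B": 0.5}, {"A": 0.2, "B": 0.8}]
sample = [
    [("A", "A"), ("A", "B")],
    [("A", "B"), ("B", "B")],
    [("B", "B"), ("B", "B")],
]
ll_cons = log_likelihood(sample, freqs_by_locus, theta=0.0, C=1.0)
ll_sub = log_likelihood(sample, freqs_by_locus, theta=1 / 16, C=0.0)
print(ll_cons, ll_sub)
```

Under this reading, a fully consanguineous sample with R = 1/16 and a substructured sample with θ = 1/16 have identical likelihoods, so the two parameters can be separated only when C is below 100%.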
In the present situation, we are concerned specifically with cousin marriages. In other cases, the proportion of the population not mating at random will have a variety of different degrees of relatedness, which cannot be represented by a single scalar, R. Instead, we require a vector with elements $R_i$ corresponding to the relatedness produced by each of n categories of union (i = 1, ... , n). In that case, the frequency of parents in each class can be represented by the corresponding element of another vector D. A specific example of this type of calculation is described below (Discussion) for a selfing population, in which the various values of $D_k$ can be calculated from a single parameter C: $D_k = C^k(1 - C)$. In other cases, it is possible to jointly estimate all the values of D using Markov chain Monte Carlo methods such as Gibbs sampling or the Metropolis-Hastings algorithm (Gelman et al. 1995).
Equation (3) does ignore any weak correlation that may occur between individuals, particularly the likelihood that individuals from the same subpopulation will be homozygous at the same loci. This issue appears to be important only when there are very few subpopulations of unequal size, and it is addressed by the simulation study below.
The ability of this method to identify the joint likelihood of C and θ relies on the observation that consanguinity and population subdivision can generate distinctive patterns of homozygosity in multilocus data. For example, in the case of a population practicing consanguineous unions, a sample may contain a mixture of offspring: those from consanguineous unions and those not from consanguineous unions. The former will be expected to have excess homozygosity due to IBD alleles at each of their loci, whereas the latter, randomly mating, individuals will be expected to have none.
On the other hand, a sample drawn from a subdivided population is expected to contain a mixture of individuals from different subpopulations. Here, we consider the case in which we know the average allele frequencies for only the whole population, because we are unaware of the subpopulation origin of each individual. If different subpopulations have different allele frequencies at their loci, then individuals drawn from the different subpopulations are expected to be homozygous at different loci.
An example to illustrate the different patterns is given in figure 1. Figure 1a shows 10-locus genotypes for a randomly mating population. The black circles represent the homozygote loci, and the open circles represent the heterozygote loci, each in HW proportions. Figure 1b represents the case in which a population is subdivided into four subpopulations. The global allele frequencies are the same as those in the panmictic population (fig. 1a). Each subpopulation is in HW equilibrium, but there is an excess of homozygosity due to variation in allele frequencies between the subpopulations. The excess is spread across loci and across subpopulations. Figure 1c shows a population with high consanguinity, of which 50% are the offspring of first-cousin unions. Again, the global allele frequencies are the same as those in figure 1a, and the excess homozygosity is of the same magnitude as that in figure 1b. The excess in this case, however, is confined to that portion of the population that is consanguineous. Each consanguineous individual is therefore more likely to be homozygous at every locus than are individuals from figure 1b, so these individuals tend to be homozygous at a larger number of loci, whereas the nonconsanguineous individuals are in the HW proportions.
To further illustrate the underlying differences between the pattern generated by substructure and that generated by consanguinity, we contrast the two cases by plotting the probability of s homozygous loci per individual, where s = 0, 1, 2, ... , 10. The number of loci at which an individual is expected to be homozygous can be plotted as a binomial distribution (Sokal and Rohlf 1995, p. 71) where p, the probability of homozygosity, is given by equation (1) or (2), depending on the scenario considered. In the case of substructure, the number of homozygous loci per individual is distributed as B(10, p), where, for t alleles, $p = \sum_{i=1}^{t} p_i[\theta + (1 - \theta)p_i]$, with θ = 1/32. In the case of 50% consanguinity, half the population is distributed with $p = \sum_{i=1}^{t} p_i[R + (1 - R)p_i]$, where R = 1/16, and the other half is distributed with $p = \sum_{i=1}^{t} p_i^2$. The homozygosity through IBD is therefore equivalent in both scenarios (50% × 1/16 = 1/32). The probability of homozygosity is clearly a function of the allele frequency, so the distributions are given for alleles at global frequencies of $p_i$ = 0.25 (fig. 2a) and $p_i$ = 0.05 (fig. 2b). The distributions are noticeably different, particularly at lower frequencies, demonstrating the benefits of using highly polymorphic markers. The figure also gives some idea of the sample size needed to detect the difference between the two cases. In figure 2b, for example, the largest differences between the two distributions occur at the 0 and 1 homozygous loci per individual category. Consequently, the sample size would need to be large enough to detect frequencies that differ, here, by less than 0.05. In addition to the number of alleles considered, the discrepancy between the distributions becomes more marked with increasing numbers of loci.
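The comparison of the two distributions can be reproduced with a few lines of code. This is a minimal sketch under the assumptions stated in the text: 10 loci, equifrequent alleles at each locus, θ = 1/32 for the substructured case, and a 50:50 mixture of R = 1/16 and random-mating individuals for the consanguineous case. The function names are hypothetical.

```python
from math import comb


def hom_prob(freqs, theta=0.0, R=0.0):
    """Probability that a locus is homozygous, summed over alleles."""
    return sum(R * p + (1 - R) * p * (theta + (1 - theta) * p) for p in freqs)


def binom_pmf(n, p):
    """Binomial probabilities of s = 0..n homozygous loci."""
    return [comb(n, s) * p**s * (1 - p) ** (n - s) for s in range(n + 1)]


def compare(freqs, n_loci=10):
    """Return the distributions for substructure and for 50% consanguinity."""
    sub = binom_pmf(n_loci, hom_prob(freqs, theta=1 / 32))
    cons = binom_pmf(n_loci, hom_prob(freqs, R=1 / 16))
    rm = binom_pmf(n_loci, hom_prob(freqs))
    mix = [0.5 * c + 0.5 * r for c, r in zip(cons, rm)]
    return sub, mix


# Alleles at frequency 0.25 (4 alleles) and 0.05 (20 alleles)
for freqs in ([0.25] * 4, [0.05] * 20):
    sub, mix = compare(freqs)
    for s in range(4):
        print(s, round(sub[s], 4), round(mix[s], 4))
```

Both distributions have the same mean number of homozygous loci, because the average IBD probability is 1/32 in each case. They differ in shape, which is the signal the method exploits.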
Simulation Study
As a starting point for our simulations, we used the U.S. Caucasian frequency distributions for the 10 SGM Plus loci (PE Biosystems 1999), which have an average of 11 alleles per locus. For the first study, samples from populations with 50% first-cousin offspring and no substructure were simulated. The U.S. Caucasian allele frequency distributions were used to represent the distributions of a single homogeneous population. From these frequency distributions, arrays of all possible single-locus genotypes were generated in proportions according to equations (1) and (2) for random mating (RM) and consanguineous populations, respectively. For example, the probability of drawing a homozygote for an allele with frequency 0.2 would be 0.04 under the RM model, but it would be 0.05 under the consanguineous model.
When the results for a consanguineously mating population were generated, the first 50% of the sample represented individuals from nonconsanguineous parents. For each of these individuals, a random draw was made from the RM genotypic arrays for each of the 10 loci. The remaining 50% of the sample were consanguineous individuals, and for each of these, a random draw was made from the consanguineous genotypic arrays for each locus. Each sample consisted of 1,000 individuals with genotypes at 10 loci. This sampling procedure was repeated 10 times. Each resulting likelihood surface peaked at the desired parameter combination, but often with broad confidence intervals. The 10 replicate data sets were pooled to produce the narrow likelihood surface shown in figure 3.
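The sampling procedure just described can be sketched as follows. This is an illustration only: the allele frequencies are hypothetical (10 equifrequent alleles per locus rather than the U.S. Caucasian distributions), and the function names are assumptions.

```python
import random


def genotype_array(freqs, R=0.0, theta=0.0):
    """All unordered single-locus genotypes with their probabilities."""
    alleles = list(freqs)
    genotypes, probs = [], []
    for i, a in enumerate(alleles):
        for b in alleles[i:]:
            pa, pb = freqs[a], freqs[b]
            if a == b:
                pr = R * pa + (1 - R) * pa * (theta + (1 - theta) * pa)
            else:
                pr = (1 - R) * 2 * pa * pb * (1 - theta)
            genotypes.append((a, b))
            probs.append(pr)
    return genotypes, probs


def simulate_sample(freqs_by_locus, n=1000, C=0.5, R=1 / 16, seed=0):
    """First (1 - C) of the sample from RM arrays, the rest from consanguineous arrays."""
    rng = random.Random(seed)
    rm_arrays = [genotype_array(f) for f in freqs_by_locus]
    cons_arrays = [genotype_array(f, R=R) for f in freqs_by_locus]
    n_rm = round(n * (1 - C))
    sample = []
    for k in range(n):
        arrays = rm_arrays if k < n_rm else cons_arrays
        sample.append([rng.choices(g, weights=w)[0] for g, w in arrays])
    return sample


def homozygosity(individuals):
    """Proportion of homozygous genotypes across individuals and loci."""
    calls = [a == b for ind in individuals for a, b in ind]
    return sum(calls) / len(calls)


# Hypothetical loci: 10 loci, each with 10 equifrequent alleles
freqs_by_locus = [{allele: 0.1 for allele in range(10)} for _ in range(10)]
sample = simulate_sample(freqs_by_locus)
print(homozygosity(sample[:500]), homozygosity(sample[500:]))
```

The second half of the sample shows the higher homozygosity expected of first-cousin offspring, while the first half stays close to HW proportions.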
The third study simulated samples from a population that was both substructured (θ = 0.03) and consanguineous (C = 50%), combining the procedures outlined in both previous simulation studies. The likelihood surface is shown in figure 5.
In addition to the previous simulations, our method performed well on simulated data in which the population consisted of numerous (>2) differentiated subpopulations of unequal sizes (it is assumed throughout this section that these subpopulations were represented in the sample in their naturally occurring proportions). The method performed less well when the population was subdivided into just two unequal-sized subpopulations (e.g., 0.9:0.1), for the simple reason that the average of the allele frequencies was very similar to that of the larger of the two sampled subpopulations. In this case, a true θ = 0.03 was incorrectly estimated as 0.003. Because θ was underestimated, individuals from the smaller subpopulation appeared to have additional excess homozygosity, which the procedure interpreted as the result of consanguinity. In this example, C was incorrectly estimated to be 5%, rather than 0.
Population Study
The use of equation (3) can be demonstrated by its application to microsatellite data collected from two Asian communities (Overall 1998). The U.K. Asian population presents a situation in which high levels of consanguineous unions have been correlated with high rates of morbidity and mortality, particularly within the Pakistani communities (Terry et al. 1985; Darr and Modell 1988; Chitty and Winter 1989; Bundey et al. 1991). It is not unusual to observe a proportion of first-cousin marriages of around 50% (Darr and Modell 1988). High rates of consanguinity are therefore common, and because of the implications for inherited disorders, genetic counseling in such situations has been recommended (Modell 1991).
A recent study of two U.K. Asian communities (Overall 1998; Ayres and Overall 1999) observed evidence for similar levels of excess homozygosity. One of these populations, the Mirpuri, comprised Moslems of Pakistani descent, of whom 50% were born to first cousins. Excess homozygosity was expected within this population as a result. The other population, the Jullunduri, were of Indian Sikh origin with no tradition of consanguinity. Interestingly, excess homozygosity was also observed within this sample. There are a number of explanations for this result. It is possible that there has been an increase in consanguinity resulting from disruptions to traditional marriage practices during migration. In addition, restrictions imposed by U.K. immigration law may have played a part in forcing unorthodox unions. Such an event would increase the incidence of recessive disorders within this community and warrant a comprehensive study of this Asian subpopulation. Alternatively, the Jullunduri population could be substructured, in which case the implications for deleterious recessive traits would depend on the history of population size and isolation. Our method was used to investigate the relative plausibility of these two explanations.
Two sample populations were collected from each of the Nottingham Mirpuri and Jullunduri communities on two separate occasions. The earlier collections of Mirpuri and Jullunduri samples (N = 45 and 45, respectively) were typed for six short tandem repeat (STR) microsatellite loci using the SGM multilocus primer set (Kimpton et al. 1996). The later collections (N = 48 and 32, respectively) were typed for 10 loci using the SGM Plus primer set (PE Biosystems 1999). In total, 10% of the samples were rerun to check for typing error. There were no typing discrepancies. Both SGM and SGM Plus amplification kits are optimized to minimize the preferential amplification of smaller alleles and loci. Consequently, the potential for incorrectly assigning homozygosity, which could augment estimates of both consanguinity and substructure, is minimal. Each individual was typed with only one of the two kits. These primers were fluorescent-tagged and facilitated automated scoring using GeneScan software.
Results |
---|
The results of applying equation (3) to each of the two data sets are shown in the likelihood surfaces of figures 6 and 7, which support contrasting causes for the excess homozygosity observed within each sample. For the Mirpuri data (fig. 6), the maximum-likelihood value implies that 50% of the subjects were born to consanguines. This is in accordance with sociological data about U.K. Pakistani Moslem populations. There is less support for a high value of θ. The Jullunduri surface (fig. 7) differs from the Mirpuri surface, with two maxima giving quite different interpretations of the data. Two maxima are given for the Jullunduri data because the two scenarios, θ = 1/16 and C = 100% with R = 1/16, have the same probability distribution if represented as in figure 2, and both explain the excess homozygosity of around 1/16. With alternative values of R, the maximum-likelihood value favors the substructure scenario (not shown). Clearly, then, this method works more effectively when there is variation in the degree to which the individuals of the population sampled are inbred (i.e., C < 100%). Alternatively, the nongenetic data may help rule out one option. Knowledge of the Jullunduri population makes the interpretation of complete consanguinity implausible (R. Ballard, personal communication). The substructure alternative is quite plausible given that caste endogamy is still practiced in many Sikh communities in India (Mukherjee et al. 1999) as well as in the United Kingdom (Ballard 1994).
Randomizing the alleles among individuals disrupts the correlation between alleles at the same locus and hence removes the homozygote excess. The effect on our estimates when this was done with the Jullunduri data was, as would be predicted, that the maximum-likelihood value went to 0 for both parameters (not shown).
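The randomization can be sketched as a per-locus shuffle of alleles among individuals. This is a minimal illustration, assuming genotypes are stored as pairs of alleles; the function name `shuffle_alleles` and the toy sample are hypothetical, and the exact procedure used for the Jullunduri data may have differed in detail.

```python
import random
from collections import Counter


def shuffle_alleles(sample, seed=0):
    """Reassign alleles at random among individuals, separately at each locus.

    sample: list of individuals, each a list of (a, b) genotypes, one per locus.
    Allele counts at every locus are preserved; pairing within individuals is not.
    """
    rng = random.Random(seed)
    n_loci = len(sample[0])
    shuffled = [[None] * n_loci for _ in sample]
    for locus in range(n_loci):
        # Pool both alleles of every individual at this locus
        pool = [allele for ind in sample for allele in ind[locus]]
        rng.shuffle(pool)
        for i in range(len(sample)):
            shuffled[i][locus] = (pool[2 * i], pool[2 * i + 1])
    return shuffled


def allele_counts(sample, locus):
    """Count alleles at one locus across the whole sample."""
    return Counter(allele for ind in sample for allele in ind[locus])


# Toy sample with an extreme homozygote excess at two loci
sample = [[("A", "A"), ("C", "C")], [("B", "B"), ("D", "D")]] * 20
shuffled = shuffle_alleles(sample)
print(allele_counts(sample, 0) == allele_counts(shuffled, 0))
```

Because only the pairing of alleles within individuals changes, the allele frequencies used in the likelihood are unaffected while the within-locus correlation is removed.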
Discussion |
---|
Another area in which the distinction between consanguinity and subdivision is of practical importance is forensic science. The Forensic Science Service's (FSS) Asian database consists of individuals of mixed Asian origin, each typed for six STR loci (Gill and Evett 1995). Within our Jullunduri sample, there are 32 individuals with full profiles at these loci. When our Jullunduri and FSS Asian allele frequency distributions are compared, differentiation appears low (θ = 0.01; Ayres and Overall 1999), implying that the FSS database allele frequencies are representative of the Jullunduri subpopulation. However, our analysis suggests that the excess homozygosity observed within the Jullunduri sample, estimated at around 0.06 (Ayres and Overall 1999), actually reflects substructure within this population at a lower level.
The implications for forensic interpretation can be illustrated by calculating match probabilities for the 32 individuals using the FSS database. Match probabilities quantify how likely it is to observe a matching STR profile in an alternative suspect, given that the defendant matches the crime scene profile. If we interpret the 6% observed homozygosity excess as the consequence of consanguinity, then we can build this into our match probability calculation using the method of Ayres and Overall (1999), setting FIS = 0.06. On the other hand, if it is due to population subdivision, then the probability of matching an alternative suspect from the same subpopulation is elevated. This probability can be calculated using the method of Balding and Nichols (1994) and setting FST = 0.06.
This issue of the nonindependence of suspect and offender genotypes is also addressed in detail by Evett and Weir (1998, p. 211). In particular, their calculation of the probability of a crime scene profile, given that some person other than the suspect left the DNA evidence, incorporates allelic dependencies, quantified as FST. The authors arrive at their equations (Evett and Weir 1998, eqs. 8.1) by a route different from that of Balding and Nichols (1994, eqs. 1 and 2). Essentially, both treatments avoid the need to assume allelic independence, as well as the need to specify individual subpopulations.
The outcomes of treating excess homozygosity as inbreeding or as substructure in match probability calculations are very different, however. With a correction of FIS = 0.06, the match probabilities are elevated only 1.1 times on average, with the most extreme result being 3 times the uncorrected value. With a correction of FST = 0.06, however, the match probabilities are as much as 82,000 times as high, with a mean of 234 times the uncorrected value. Our conclusion that the homozygous excess does indeed reflect subdivision is therefore of clear importance.
A specific case has been presented here in which consanguines are first cousins. Using R = 1/16, however, ignores the possible consanguineous history of the grandparents. To see that this effect is very small, consider the case in which all preceding generations showed the same marriage patterns. We wish to calculate the probability of IBD in the common grandparents, plus the additional probability due to consanguinity in those grandparents. It can be shown that this probability is $C/(2^n - C)$, where n is the number of ancestors in each line of descent through each common ancestor (by extension of eqs. 5.1 and 5.15 in Falconer [1989, pp. 87-97]). If we consider C = 50%, ignoring consanguinity in the grandparental generation increases the probability of identity by less than 1/1,000. The use of R = 1/16 in the above calculations is a reasonable approximation. The increase in identity over generations becomes less trivial, however, when unions are between closer relatives. To incorporate closer degrees of relatedness into the estimation procedure, we need to modify R.
With regular systems of inbreeding (e.g., see Falconer 1989, p. 91), the inbreeding coefficient increases in magnitude with each generation of successive inbreeding. We consider a slightly different situation, in which the inbreeding individuals form only a portion of the population, with this proportion remaining constant from one generation to the next. The proportion of individuals born to consanguines (C) then is partitioned into individuals that have experienced 0, 1, 2, ... , n successive generations of inbreeding. The overall inbreeding coefficient of a population is calculated assuming that a proportion $C(1 - C)$ have experienced one generation of inbreeding with coefficient r, a proportion $C^2(1 - C)$ have experienced two successive generations of inbreeding and have, therefore, an inbreeding coefficient of $r + r^2$, and so forth, giving the series
$$C(1 - C)\,r + C^2(1 - C)(r + r^2) + \cdots + C^n(1 - C)(r + r^2 + \cdots + r^n).$$
In the limit of n → ∞, this converges to $C/[(1/r) - C]$, which was previously arrived at by the extension of Falconer's equations, detailed above, and which is a general treatment of equations given in Li (1976, p. 243). For practical purposes, it may be noted that for most systems of close inbreeding, the calculation need not be extended beyond 20 generations (Falconer 1989, p. 93).
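The convergence of this series is easy to verify numerically. The sketch below sums the terms described in the text, with a proportion C^k(1 - C) of individuals carrying the coefficient r + r^2 + ... + r^k, and compares the partial sum with the limit C/[(1/r) - C]. The function names are hypothetical.

```python
def series_inbreeding(C, r, n_gen):
    """Partial sum of the series over 1..n_gen generations of inbreeding."""
    total = 0.0
    for k in range(1, n_gen + 1):
        # Coefficient after k successive generations: r + r^2 + ... + r^k
        coeff_k = sum(r**j for j in range(1, k + 1))
        total += C**k * (1 - C) * coeff_k
    return total


def limit_inbreeding(C, r):
    """Closed-form limit of the series as the number of generations grows."""
    return C / (1 / r - C)


# First cousins (r = 1/16) with C = 50%, and selfing (r = 1/2) with C = 70%
for C, r in ((0.5, 1 / 16), (0.7, 1 / 2)):
    print(C, r, series_inbreeding(C, r, 20), limit_inbreeding(C, r))
```

Swapping the order of summation gives a geometric series in Cr, whose sum Cr/(1 - Cr) equals C/[(1/r) - C], which is why the partial sums approach the closed form quickly.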
This value can be substituted into equation (3) to give the probability for single-locus data. For multilocus data and close inbreeding, this is not appropriate because it takes the product over loci. This implies that the loci are independent, which they are not. Some individuals have been produced by one generation of inbreeding, and the appropriate inbreeding coefficient will apply to all of their loci. Other individuals will be the products of two generations of inbreeding, and this will apply to all of their loci, and so forth. In general, a proportion of individuals $C^k(1 - C)$ will have an inbreeding coefficient, $R_k$, that will apply to all loci. For M loci, therefore, the probability of a multilocus genotype $(1_i1_j, 2_i2_j, \ldots, M_iM_j)$ becomes the sum over generations
$$P(1_i1_j, 2_i2_j, \ldots, M_iM_j) = \sum_{k=0}^{\infty} C^k(1 - C)\prod_{m=1}^{M} P(m_im_j \mid R_k) \qquad (4)$$
where $P(m_im_j \mid R_k)$ is the single-locus probability of equation (2) with R replaced by $R_k$.
For close degrees of inbreeding, this equation gives appreciably different results from the naïve approach of substituting this value into equations (2) and (3). The importance of this discrepancy can be illustrated by an example in which we score eight unlinked loci in a species that can self-fertilize. Consider observing one individual homozygous at six loci ($p_m$ = 0.1). Under equation (3), a selfing rate of C = 20% is essentially ruled out, with a likelihood more than 2,000 times lower than that at the maximum-likelihood value of C, around 70%, whereas under equation (4), C = 20% is plausible, being only 3.2 times less likely.
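The sum over generations can be sketched as follows. This is an illustration under stated assumptions rather than the exact calculation behind the figures quoted above: the coefficient after k generations is taken to be r + r^2 + ... + r^k as in the series given earlier, the sum is truncated at `max_gen` generations, and the function names, allele labels, and example genotype are hypothetical.

```python
def genotype_prob(a, b, freqs, theta, R):
    """Single-locus genotype probability under substructure (theta) and IBD (R)."""
    pa, pb = freqs[a], freqs[b]
    if a == b:
        return R * pa + (1 - R) * pa * (theta + (1 - theta) * pa)
    return (1 - R) * 2 * pa * pb * (1 - theta)


def multilocus_prob(genotypes, freqs_by_locus, C, r, theta=0.0, max_gen=60):
    """Probability of a multilocus genotype, summed over generations of inbreeding."""
    total = 0.0
    R_k = 0.0  # coefficient after k successive generations of inbreeding
    for k in range(max_gen + 1):
        if k > 0:
            R_k += r**k
        # The same R_k applies to every locus of an individual in class k
        product = 1.0
        for (a, b), freqs in zip(genotypes, freqs_by_locus):
            product *= genotype_prob(a, b, freqs, theta, R_k)
        total += C**k * (1 - C) * product
    return total


# Hypothetical selfing example: eight loci, ten equifrequent alleles each,
# one individual homozygous at six loci and heterozygous at two
freqs = {allele: 0.1 for allele in range(10)}
freqs_by_locus = [freqs] * 8
genotypes = [(0, 0)] * 6 + [(0, 1)] * 2
for C in (0.2, 0.7):
    print(C, multilocus_prob(genotypes, freqs_by_locus, C, r=0.5))
```

For a single locus the sum reduces to the single-locus probability evaluated at the averaged coefficient C/[(1/r) - C], because that probability is linear in R. With several loci the two calculations diverge, which is the discrepancy discussed above.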
Although we have presented our method in the context of human medical genetics and forensic science, excess homozygosity has been observed in a wide range of plant and animal studies (Shapcott 1994; Premoli 1996; Freeland, Noble, and Okamura 2000). Population substructure has frequently been suspected as a contributory factor but has not been clearly identified. These examples include species which breed with closer relatives (than cousins) or self-fertilize. For these situations, equation (4) should be preferred. In either case, it is necessary to have some understanding of the underlying pattern of mating that is being compared with substructure. In particular, a value of R must be specified, which requires knowledge of the species' reproductive biology and ecology.
Our approach could be extended to include parameters that specify the subpopulation from which each individual is drawn, following the approach of Pritchard, Stephens, and Donnelly (2000). Although they did not model consanguineous populations, they did show that it is possible to use Markov chain Monte Carlo methods to probabilistically assign individuals to subpopulations. In many studies, however, the objective is to distinguish consanguinity from population substructure or to quantify the magnitude of substructure among inbred individuals (Shapcott 1994; Premoli 1996; Freeland, Noble, and Okamura 2000). In such cases, the computational costs of increasing the number of parameters may not be justified. Given the abundance of multilocus data currently being produced, our method could find application in a variety of both human and nonhuman population studies.
Acknowledgements |
---|
Footnotes |
---|
1 Present address: Institute of Cell, Animal and Population Biology, University of Edinburgh, Edinburgh, Scotland.
2 Keywords: consanguinity, inbreeding, population substructure, short tandem repeat loci, U.K. Asian population, forensic match probabilities
3 Address for correspondence and reprints: A. D. J. Overall, Institute of Cell, Animal and Population Biology, University of Edinburgh, Ashworth Laboratories, King's Buildings, Edinburgh EH9 3JT, United Kingdom. andy.overall{at}ed.ac.uk.
References |
---|
Ayres K. L., A. D. J. Overall, 1999 Allowing for within-subpopulation inbreeding in forensic match probabilities Forensic Sci. Int 103:207-216
Balding D. J., R. A. Nichols, 1994 DNA profile match probability calculations: how to allow for population stratification, relatedness, database selection and single bands Forensic Sci. Int 64:125-140
Ballard R., 1994 Desh Pardesh: the south Asian presence in Britain Oxford University Press, Oxford
Brookfield J. F. Y., 1996 A simple new method for estimating null allele frequency from heterozygote deficiency Mol. Ecol 5:453-455
Bundey S., H. Alam, A. Kaur, S. Mir, R. Lancashire, 1991 Why do UK-born Pakistani babies have high perinatal and neonatal mortality rates? Paediatr. Per. Epidemiol 5:101-114
Chitty L. S., R. M. Winter, 1989 Perinatal mortality in different ethnic groups Arch. Dis. Child 64:1036-1041
Cockerham C. C., 1973 Analysis of gene frequencies Genetics 74:679-700
Corte-Real F., L. Souto, M. J. Anjos, M. Carvalho, D. N. Vieira, A. Carracedo, M. C. Vide, 1999 Population distribution of six PCR-amplified loci in Madeira Archipelago (Portugal) Forensic Sci. Int 100:93-99
Darr A., B. Modell, 1988 The frequency of consanguineous marriage among British Pakistanis J. Med. Genet 25:186-190
Evett I. W., P. D. Gill, J. A. Lambert, N. Oldroyd, R. Frazier, S. Watson, S. Panchal, A. Connolly, C. Kimpton, 1997 Statistical analysis of data for three British ethnic groups from a new STR multiplex Int. J. Legal Med 110:5-9
Evett I. W., B. S. Weir, 1998 Interpreting DNA evidence Sinauer, Sunderland, Mass
Falconer D. S., 1989 Introduction to quantitative genetics. 3rd edition Longman Scientific and Technical, Harlow, United Kingdom
Freeland J. R., L. R. Noble, B. Okamura, 2000 Genetic consequences of the metapopulation biology of a facultatively sexual freshwater invertebrate J. Evol. Biol 13:383-395
Gelman A., J. B. Carlin, H. S. Stern, D. B. Rubin, 1995 Bayesian data analysis Chapman and Hall, London
Gill P., I. Evett, 1995 Population genetics of short tandem repeat (STR) loci Genetica 96:69-87
Hartl D. L., A. G. Clark, 1997 Principles of population genetics. 3rd edition Sinauer, Sunderland, Mass
Kimpton C. P., N. C. Oldroyd, S. K. Watson, R. R. E. Frazier, P. E. Johnson, E. S. Millican, A. Urquhart, R. L. Sparkes, P. Gill, 1996 Validation of highly discriminating multiplex short tandem repeat amplification systems for individual identification Electrophoresis 17:1283-1293
Li C. C., 1976 First course in population genetics Boxwood, Pacific Grove, Calif
Modell B., 1991 Social and genetic implications of customary consanguineous marriage among British Pakistanis J. Med. Genet 28:720-723
Mukherjee N., P. P. Majumder, B. Roy, M. Roy, B. Dey, M. Chakraborty, S. Banerjee, 1999 Variation at 4 short tandem repeat loci in 8 population groups in India Hum. Biol 71:439-446
Nagylaki T., 1998 Fixation indices in subdivided populations Genetics 148:1325-1332
Nei M., 1987 Molecular evolutionary genetics Columbia University Press, New York
Overall A. D. J., 1998 The geographic scale of human genetic differentiation at short tandem repeat loci Ph.D. thesis, University of London, London
PE Biosystems. 1999 AmpFSTR SGM Plus user manual Perkin-Elmer, Foster City, Calif
Premoli A. C., 1996 Allozyme polymorphisms, outcrossing rates, and hybridization of South American Nothofagus Genetica 97:55-64
Pritchard J. K., M. Stephens, P. Donnelly, 2000 Inference of population structure using multilocus genotype data Genetics 155:945-959
Shapcott A., 1994 Genetic and ecological variation in Atherosperma moschatum and the implications for conservation and biodiversity Aust. J. Bot 42:663-686
Sokal R. R., F. J. Rohlf, 1995 Biometry. 3rd edition W. H. Freeman, San Francisco, Calif
Terry P. B., J. G. Bissenden, R. G. Condie, P. M. Mathew, 1985 Ethnic differences in congenital malformations Arch. Dis. Child 62:866-879
Weir B. S., C. C. Cockerham, 1984 Estimating F statistics for the analysis of population structure Evolution 38:1358-1370
Wright S., 1921 Systems of mating Genetics 6:111-178