Vavilov Institute of General Genetics, Moscow, Russia;
Department of Biology, University College London, London, England;
Department of Biological Sciences, Stanford University
Abstract |
---|
Introduction |
---|
The usual way to analyze a set of microsatellite loci from individuals sampled in two populations is to compute $(\delta\mu)^2$ for each locus and average across loci. If the mutation rate (or the effective mutation rate) is the same at all loci, and is known, then simple division gives an estimate of the expected time since separation of the populations. Variation across loci in the mutation rate affects the variance of $(\delta\mu)^2$ (but not its expectation).
The evolutionary process involves genetic sampling error due to random genetic drift and mutation, and thus the variance of the distance among possible evolutionary replicates is an important issue. Zhivotovsky and Feldman (1995) implied that among replicates, the distance follows a chi-square distribution. In fact, the distance does asymptotically satisfy the most important property of the chi-square distribution, namely, that its variance approaches twice the square of its expectation as time increases (Zhivotovsky, Feldman, and Grishechkin 1997), but the actual distribution is not exactly chi-square.
From their analysis of properties of $(\delta\mu)^2$ in a study of more than 200 human microsatellite loci, Cooper et al. (1999) found strong evidence for variation among loci in the mutation rate. Our purpose with this paper is to obtain an analytical expression for the variance of $(\delta\mu)^2$ when the mutation rate is variable. An important application of this analytical expression could be estimation of the extent of variation in mutation rate among microsatellite loci. Our analysis also allows us to compute the time-dependent dynamics of the variance of $(\delta\mu)^2$ and to assess how sensitive these dynamics are to the assumption of a fixed mutation rate that is constant across loci.
Results |
---|
The within-population variation at a microsatellite locus can be characterized by the mean allele size ($r$), the variance of allele size (the second central moment) ($V$), and the unnormalized kurtosis (the fourth central moment) ($K$) (Zhivotovsky and Feldman 1995). The between-population variation can be measured by analogs of $F_{ST}$ (Slatkin 1995; see also Michalakis and Excoffier 1996; Rousset 1996; Feldman, Kumm, and Pritchard 1999). For two populations, the $(\delta\mu)^2$ distance is defined as the squared difference of the mean values of their repeat scores: $(\delta\mu)^2 = (r_1 - r_2)^2$ (Goldstein et al. 1995).
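The definitions above translate directly into a few lines of code. The following is a minimal sketch, assuming each population sample at a locus is given as a list of allele repeat scores and that population (not bias-corrected sample) moments are wanted; the function names and the example data are hypothetical and are not taken from the paper.

```python
from statistics import fmean


def allele_size_moments(repeat_scores):
    """Return (r, V, K) for one population at one locus.

    r is the mean allele size, V the second central moment, and K the
    fourth central moment (the "unnormalized kurtosis"). These are plain
    population moments with no small-sample bias correction (an assumption
    of this sketch).
    """
    r = fmean(repeat_scores)
    deviations = [score - r for score in repeat_scores]
    V = fmean(d ** 2 for d in deviations)
    K = fmean(d ** 4 for d in deviations)
    return r, V, K


def delta_mu_squared(scores_pop1, scores_pop2):
    """Squared difference of the mean repeat scores of two populations."""
    r1 = fmean(scores_pop1)
    r2 = fmean(scores_pop2)
    return (r1 - r2) ** 2


# Hypothetical repeat scores at a single locus.
pop1 = [10, 11, 11, 12]
pop2 = [13, 14, 14, 15]
print(allele_size_moments(pop1))
print(delta_mu_squared(pop1, pop2))
```

In practice the distance would be computed locus by locus and then averaged, as described in the Introduction.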
Suppose that two populations diverged from an ancestral population at initial time $t = 0$, at which the profile of allele frequencies was represented by $\phi_0$ (i.e., $\phi_0$ produced specific values of the variance $V_0$, unnormalized kurtosis $K_0$, etc.), and then evolved independently under random genetic drift and multistep mutation. Given $\phi_0$, $E_r\{S \mid \phi_0\}$ is an expectation operator that averages the statistic $S$ over all possible realizations (replicates) of the drift-mutation process. Averaging with $E_r$ may then be followed by the operator $E_0$, which averages over all possible genetic structures $\phi_0$ of the unknown ancestral population, i.e., values of $V_0$, $K_0$, etc. Thus, $E_r$ averages over loci having identical mutation parameters and identical starting conditions, and $E_0$ averages over the different initial conditions. We assume that prior to divergence, the ancestral population had attained mutation-drift equilibrium, where the expectations of the within-locus variances, the between-locus variance of variances, and the unnormalized within-locus kurtosis are
|
After $\tau$ generations of divergence, the expected distance, $E_0 E_r((\delta\mu)^2)$, equals $2w\tau$ (Zhivotovsky and Feldman 1995; see also Feldman, Kumm, and Pritchard 1999; Zhivotovsky 2001), which becomes $2\mu\tau$ with one-step symmetric mutation (Goldstein et al. 1995).
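Because the expected distance grows linearly in time at rate twice the effective mutation rate, a divergence-time estimate follows by dividing the locus-averaged distance by that rate. The sketch below illustrates the arithmetic only; the function name, the per-locus distances, and the mutation rate value are hypothetical, and the calculation assumes a single known effective mutation rate shared by all loci.

```python
def estimate_divergence_time(distances_per_locus, effective_mutation_rate):
    """Estimate generations since divergence from per-locus distances.

    Uses the linear relation E[distance] = 2 * w * tau, so that
    tau_hat = mean(distance) / (2 * w). Assumes the same known effective
    mutation rate w at every locus (an assumption of this sketch).
    """
    if not distances_per_locus:
        raise ValueError("at least one locus is required")
    if effective_mutation_rate <= 0:
        raise ValueError("effective mutation rate must be positive")
    mean_distance = sum(distances_per_locus) / len(distances_per_locus)
    return mean_distance / (2.0 * effective_mutation_rate)


# Hypothetical per-locus distances and a hypothetical rate of 1e-3 per generation.
distances = [9.0, 4.0, 1.0, 2.0]
print(estimate_divergence_time(distances, 1e-3))
```

The result is in generations; converting to years requires a generation time, which this sketch does not assume.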
The square of the genetic sampling error of a statistic is its variance over replicates (Weir 1996). Therefore, the within-locus variance of $(\delta\mu)^2$ is defined as
|
|
Suppose the effective mutation rate varies across loci with mean $\bar{w}$, variance $\sigma_w^2$, and $\bar{k}$, the mean value of $k$ over loci. It is proved in the appendix that
|
|
|
As time increases, $\sigma_w^2$ and the coefficient of variation of $w$, $C_w = \sigma_w/\bar{w}$, asymptotically satisfy
|
|
Discussion |
---|
|
Earlier, Zhivotovsky and Feldman (1995) pointed out that hundreds of loci are required to estimate the genetic distance $(\delta\mu)^2$ with reasonable accuracy, and with variable mutation rates, the number of loci must be even greater. Indeed, as follows from equation (5), the coefficient of variation of genetic distance $(\delta\mu)^2$ averaged over $L$ loci, which can be used as a measure of the relative accuracy ($R$) of estimation of the genetic distance, is approximated by $R \approx [(2 + 3C_w^2)/L]^{1/2}$, or $L = (2 + 3C_w^2)/R^2$. For instance, if the relative accuracy is 10%, i.e., $R = 0.1$, then 200 loci with identical mutation rates would be needed, whereas 500 loci are required to estimate genetic distance with the same precision if the relative variation in mutation rates is 100%, i.e., if $C_w = 1$. As an example, using combined data on 131 di-, tri-, and tetranucleotide microsatellite loci, Zhivotovsky (2001, table 1) estimated the accuracy of genetic distances between African and non-African populations at approximately 14%. It should be noted, however, that in the analyses of Jin et al. (2000), $(\delta\mu)^2$ was not able to reliably distinguish continental groups in trees made using the 28 loci of Bowcock et al. (1994), although its performance was comparable with other distance measures with 64 microsatellite loci. Again, this reinforces our view that several hundred loci would be needed to produce satisfactory estimates of $(\delta\mu)^2$ and $C_w$.
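The locus-count arithmetic in this paragraph can be reproduced with a short calculation. The sketch below encodes the approximation that the relative accuracy is the square root of (2 + 3 Cw^2)/L and its inverse for L; the function names are hypothetical, and rounding up to a whole number of loci is a choice made here rather than something stated in the text.

```python
import math


def relative_accuracy(num_loci, cv_mutation_rate=0.0):
    """Approximate coefficient of variation of the locus-averaged distance.

    Implements R = sqrt((2 + 3 * Cw**2) / L), where Cw is the coefficient
    of variation of the effective mutation rate across loci.
    """
    return math.sqrt((2.0 + 3.0 * cv_mutation_rate ** 2) / num_loci)


def loci_required(target_accuracy, cv_mutation_rate=0.0):
    """Smallest whole number of loci giving at most the target relative accuracy.

    Inverts the approximation above: L = (2 + 3 * Cw**2) / R**2. The value is
    rounded to 9 decimals before taking the ceiling to absorb floating-point
    noise (a design choice of this sketch).
    """
    raw = (2.0 + 3.0 * cv_mutation_rate ** 2) / target_accuracy ** 2
    return math.ceil(round(raw, 9))


print(loci_required(0.1, 0.0))  # identical mutation rates
print(loci_required(0.1, 1.0))  # Cw = 1
```

With R = 0.1 these calls give 200 and 500 loci, matching the figures quoted in the text.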
It should be strongly emphasized that expression (2), as well as expressions (4) and (5) derived from it, is only valid for reproductively isolated populations of constant size at mutation-drift equilibrium. Otherwise, if we consider a process of subdivision of a parental population into two populations that subsequently evolve under mutation and genetic drift, the genetic distance $(\delta\mu)^2$ becomes a nonlinear function of time; in particular, it underestimates the divergence time if the two populations are growing in size and/or are connected by gene flow (Zhivotovsky 2001). Therefore, our estimates in table 1 have to be regarded with caution.
Acknowledgements |
---|
Appendix |
---|
|
The Within-Locus Variance of $(\delta\mu)^2$
Using the expression for the within-locus variance Var{(t)} (Zhivotovsky, Feldman, and Grishechkin 1997, p. 932, right column), which remains valid with the moments taken with respect to zero, and taking the limit as the regression coefficient $\beta \to +0$, we obtain
|
The Between-Locus Variance of $(\delta\mu)^2$
From equations (4) and (14) of Zhivotovsky and Feldman (1995), the changes in the expected values of the distance and the variance are $E_r((\delta\mu)^2(\tau + 1) \mid \phi_0) - E_r((\delta\mu)^2(\tau) \mid \phi_0) \approx (1/N)\,E_r(V(\tau) \mid \phi_0)$ and $E_r(V(\tau + 1) \mid \phi_0) - E_r(V(\tau) \mid \phi_0) \approx w - (1/2N)\,E_r(V(\tau) \mid \phi_0)$, neglecting terms of order less than $1/N$ and recalling that $w$ is defined by equation (6). Replacing the differences in the left-hand sides of these approximations with corresponding differentials and solving the resulting linear differential equations, we have
|
As follows from the definition of the between-locus variance, $\mathrm{Var}_B$ is equal to the expectation $E_0$ of the square of $2(V_0 - \bar{V})(1 - e^{-\tau/2N})$. Then, using equation (1), we obtain
|
Variation in Mutation Rate
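The well-known partitioning of conditional variance (e.g., Rice 1995) can be extended to the case of three random values: for an arbitrary function $f(x, y, z)$, its variance, $E_z E_y E_x (f - E_z E_y E_x f)^2$, is

$$E_z E_y \mathrm{Var}_x(f) + E_z \mathrm{Var}_y(E_x f) + \mathrm{Var}_z(E_y E_x f), \qquad (11)$$

where $\mathrm{Var}_x$, $\mathrm{Var}_y$, and $\mathrm{Var}_z$ denote variances taken over $x$, $y$, and $z$, respectively, with the remaining variables held fixed.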
Now, consider $E_x$, $E_y$, and $E_z$, respectively, as $E_r$, $E_0$, and the expectation operator averaging over varying values of the mutation parameters, $E_m$, and take the distance $(\delta\mu)^2$ as function $f$. The first two terms in the right-hand side of equation (11) represent the expectation $E_m$ of $\mathrm{Var}_W$ in equation (7) and $\mathrm{Var}_B$ in equation (10), respectively. The third term is $\mathrm{Var}_m(E_0 E_r((\delta\mu)^2))$, the variance of the expected distance in equation (9) with respect to mutation parameters. Taking the expectations and summing in equation (11), we obtain equation (2).
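The three-level partition of variance used here can be checked numerically. The sketch below enumerates a small discrete example exactly: the function `f` and the value grids for x, y, and z are hypothetical stand-ins (for replicates, initial conditions, and mutation parameters) and are not taken from the paper. It confirms that the total variance equals the sum of three terms: the expected variance over x, the expected variance over y of the x-averaged function, and the variance over z of the fully averaged function.

```python
from itertools import product
from statistics import fmean, pvariance


def f(x, y, z):
    """Arbitrary illustrative function of the three random levels."""
    return z * (x + y) ** 2 + x * y


# Hypothetical, equally likely outcomes at each level.
xs = [-1.0, 0.0, 2.0]  # replicate level (role of the innermost average)
ys = [0.5, 1.5]        # initial-condition level
zs = [1.0, 3.0, 4.0]   # mutation-parameter level


def decompose():
    """Return (total, term1, term2, term3) of the three-level partition."""
    # Total variance over the joint (uniform, independent) distribution.
    total = pvariance([f(x, y, z) for x, y, z in product(xs, ys, zs)])
    # E_z E_y Var_x(f): variance over x, averaged over y and z.
    term1 = fmean(
        fmean(pvariance([f(x, y, z) for x in xs]) for y in ys) for z in zs
    )
    # E_z Var_y(E_x f): variance over y of the x-average, averaged over z.
    term2 = fmean(
        pvariance([fmean(f(x, y, z) for x in xs) for y in ys]) for z in zs
    )
    # Var_z(E_y E_x f): variance over z of the average over x and y.
    term3 = pvariance(
        [fmean(fmean(f(x, y, z) for x in xs) for y in ys) for z in zs]
    )
    return total, term1, term2, term3


total, term1, term2, term3 = decompose()
print(total, term1 + term2 + term3)
```

The two printed numbers agree up to floating-point rounding, whatever function and grids are substituted, because the identity holds for any function with finite variance.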
Additionally, note that at mutation-drift equilibrium, the within-locus variance, the unnormalized within-locus kurtosis, and the between-locus variance of variances in the case of varying mutation rate become (using the same notation as in eq. 1)
|
Di Rienzo et al. (1998) obtained the same expression for Var(V).
Footnotes |
---|
Keywords: microsatellite loci, mutation rate, genetic distance
Address for correspondence and reprints: Marcus W. Feldman, Department of Biological Sciences, Stanford University, Stanford, California 94305. marc{at}charles.stanford.edu.
References |
---|
Bowcock A. M., A. Ruiz-Linares, J. Tomfohrde, E. Minch, J. R. Kidd, L. L. Cavalli-Sforza, 1994 High resolution of human evolutionary trees with polymorphic microsatellites Nature 368:455-457
Cooper G., W. Amos, R. Bellamy, M. R. Siddiqui, A. Frodsham, A. V. S. Hill, D. C. Rubinsztein, 1999 An empirical exploration of the $(\delta\mu)^2$ genetic distance for 213 human microsatellite markers Am. J. Hum. Genet 65:1125-1133
Di Rienzo A., P. Donnelly, C. Toomajian, B. Sisk, A. Hill, M. L. Petzl-Erler, G. K. Haines, D. H. Barch, 1998 Heterogeneity of microsatellite mutations within and between loci, and implications for human demographic histories Genetics 148:1269-1281
Feldman M. W., J. Kumm, J. K. Pritchard, 1999 Mutation and migration in models of microsatellite evolution Pp. 98-115 in D. B. Goldstein and C. Schlotterer, eds. Microsatellites: evolution and applications. Oxford University Press, Oxford.
Forster P., A. Rohl, P. L. Lunnermann, C. Brinkmann, T. Zerjal, C. Tyler-Smith, B. Brinkmann, 2000 A short tandem repeat-based phylogeny for the human Y chromosome Am. J. Hum. Genet 67:182-196
Goldstein D. B., A. R. Linares, L. L. Cavalli-Sforza, M. W. Feldman, 1995 Genetic absolute dating based on microsatellites and the origin of modern humans Proc. Natl. Acad. Sci. USA 92:6723-6727
Hudson R. R., 1990 Gene genealogies and the coalescent process Oxf. Surv. Evol. Biol 7:1-45
Jin L., M. L. Baskett, L. L. Cavalli-Sforza, L. A. Zhivotovsky, M. W. Feldman, N. A. Rosenberg, 2000 Microsatellite evolution in modern humans: a comparison of two data sets from the same populations Ann. Hum. Genet 64:117-134
Jorde L. B., A. R. Rogers, M. Bamshad, W. S. Watkins, P. Krakowiak, S. Sung, J. Kere, H. Harpending, 1997 Microsatellite diversity and the demographic history of modern humans Proc. Natl. Acad. Sci. USA 94:3100-3103
Kimmel M., R. Chakraborty, 1996 Measures of variation at DNA repeat loci under a general stepwise mutation model Theor. Popul. Biol 50:345-367
Michalakis Y., L. A. Excoffier, 1996 Generic estimation of population subdivision using distances between alleles with special reference for microsatellite loci Genetics 142:1061-1064
Rice J. A., 1995 Mathematical statistics and data analysis. 2nd edition Duxbury Press, Belmont, Calif
Rousset F., 1996 Equilibrium values of measures of population subdivision for stepwise mutation processes Genetics 142:1357-1362
Slatkin M., 1995 A measure of population subdivision based on microsatellite allele frequencies Genetics 139:457-462
Weir B. S., 1996 Genetic data analysis II Methods for discrete population genetic data. Sinauer, Sunderland, Mass
Zhivotovsky L. A., 2001 Estimating divergence time with the use of microsatellite genetic distances: impacts of population growth and gene flow Mol. Biol. Evol 18:700-709
Zhivotovsky L. A., M. W. Feldman, 1995 Microsatellite variability and genetic distances Proc. Natl. Acad. Sci. USA 92:11549-11552
Zhivotovsky L. A., M. W. Feldman, S. A. Grishechkin, 1997 Biased mutations and microsatellite variation Mol. Biol. Evol 14:926-933