* School of Animal and Microbial Sciences
School of Computer Science
Department of Applied Statistics
Department of Agricultural Botany, University of Reading, United Kingdom
Abstract
Key Words: microsatellite evolution, replication slippage, dinucleotide repeats, human AC
Introduction
To calculate the predictions of the slippage/point-mutation theory, we follow earlier treatments in four important respects. Firstly, we restrict attention to models in which microsatellite lengths change by slippage mutations of one repeat, which seems a reasonable simplification given the evidence stated above. Secondly, we suppose that the expansion rate at any length is the same as the contraction rate at that length. Without this assumption, the theory would not be clearly differentiated from the mutation bias theory described above. Thirdly, we assume that current distributions of microsatellite lengths represent an equilibrium between the expansionary tendencies of slippage mutation and the splitting effects of point mutations breaking microsatellites into smaller units. It seems reasonable to assume such evolutionary processes are at equilibrium given the long periods for which some microsatellite loci are known to have existed. Lastly, we assume that the point mutation rate does not vary within the genome, despite some evidence to the contrary (Wolfe, Sharp, and Li 1989; Santibanez-Koref, Gangeswaran, and Hancock 2002). For peripheral segments of microsatellites, it is possible to use the slippage/point-mutation theory to calculate analytically the equilibrium distributions of slippage models that specify the relationships between segment length, slippage rate, and the point mutation rate (Kruglyak et al. 1998, 2000; Sibly, Whittaker, and Talbot 2001). These equilibrium distributions can then be compared with observed distributions of lengths obtained from a genome search, and the best parameter values of the slippage models can be found using maximum likelihood (Sibly, Whittaker, and Talbot 2001). These methods have also been used to compare nested linear models of the relationship between slippage rate and length.
The results suggest that slippage rates increase with length for dinucleotide microsatellites in humans, mice, and fruit flies and that no or very little slippage occurs in very short segments comprising one to four repeats (Sibly, Whittaker, and Talbot 2001).
Here we derive and test the predictions of the slippage/point-mutation theory for the complete structure of interrupted microsatellites. No assumptions are made as to the form of the relationship between slippage rate and microsatellite length. Predictions are made using a combination of analytical methods and computer simulations. Predictions are tested using data from human AC microsatellites obtained from the human genome, taking one allele per locus.
Methods
CA, GT, and TG microsatellites were not recorded. Within microsatellites satisfying the above criteria we refer to uninterrupted sequences as segments. Thus, in the example given under criterion (3), there are two segments, the first (5'-edge) two repeats in length and the second of five repeats in length.
Models
The version of the slippage/point-mutation theory implemented here follows earlier treatments in presuming that (1) slippage mutations cause an increase or decrease of one repeat; (2) the expansion and contraction rates are identical at any length; (3) the frequency distributions of microsatellite segment lengths in the genome are in equilibrium; and (4) the point mutation rate is invariant throughout the genome. The underlying framework is a discrete time Markov chain, the states of the chain being the positive integers, 1, 2, 3 ... , each of which corresponds to the number of repeats at a microsatellite locus. Under the terms of the model, for each segment each generation three types of transition may occur:
Analytical Calculation of the Equilibrium Distribution of 5' Peripheral Segments
For peripheral segments, the equilibrium frequency distribution of a given slippage model can be calculated analytically (Sibly, Whittaker, and Talbot 2001). Here we show how the methods previously used to analyze straight-line models are readily extended to larger models that place few restrictions on the form of the relationship between microsatellite length and slippage rate. Note that when point mutation breaks a 5' segment into two, the one at the 5' side of the point mutation becomes the new 5' segment. Letting $p_i$ be the probability that a randomly selected 5' segment is of length $i$, $i \geq 1$, the equilibrium values of $p_i$ satisfy the following equations (Sibly, Whittaker, and Talbot 2001):
In practice, following Sibly, Whittaker, and Talbot (2001), we estimated the ratios $s_i/a$, and $c$ was not estimated because the fitting procedure was conditioned on the absence of microsatellites smaller than five repeats in the data set. Under the equilibrium assumption, the observed frequencies of microsatellite lengths have a multinomial distribution with parameters given by the above equilibrium distribution and the sample size $n$, so the likelihood of the data can be written in terms of $n$ and the $p_i$ of the equilibrium distribution. Since the $p_i$ are functions of the model parameters, this gives the likelihood as a function of the model parameters. Maximum likelihood methods are then used to estimate parameter values and their standard errors as previously described (Sibly, Whittaker, and Talbot 2001).
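As a rough illustration of the likelihood just described, the following Python sketch evaluates a multinomial log-likelihood conditioned on lengths of at least five repeats. The function name `conditional_log_likelihood`, the dictionary inputs, and the toy uniform distribution are assumptions made for illustration only and are not taken from Sibly, Whittaker, and Talbot (2001). The multinomial coefficient is omitted because it does not depend on the model parameters.

```python
import math


def conditional_log_likelihood(counts, p, min_length=5):
    """Multinomial log-likelihood (up to an additive constant), conditioned on
    observing only lengths >= min_length.

    counts: dict mapping length (in repeats) -> observed number of segments.
    p:      dict mapping length -> equilibrium probability under some model.
    """
    # Probability mass of the lengths that can appear in the data set.
    tail = sum(prob for length, prob in p.items() if length >= min_length)
    loglik = 0.0
    for length, count in counts.items():
        if length < min_length:
            raise ValueError("data must not contain lengths below min_length")
        # Renormalize each probability so the retained lengths sum to one.
        loglik += count * math.log(p[length] / tail)
    return loglik


# Toy check (made-up numbers): a uniform distribution over lengths 1..10.
# Conditioning on length >= 5 leaves six equally likely lengths.
toy_p = {length: 0.1 for length in range(1, 11)}
toy_counts = {5: 2, 7: 1}
toy_loglik = conditional_log_likelihood(toy_counts, toy_p)
```

In a fitting procedure, `p` would be recomputed from the candidate parameter values at each step and the resulting log-likelihood passed to a numerical optimizer.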
Computer Simulations
Since the available analytical methods only apply to first segments (Kruglyak et al. 1998, 2000; Sibly, Whittaker, and Talbot 2001), we employed computer simulations to find the equilibrium distributions of second and later segments. For this purpose we needed a sample of independently evolving microsatellites, which we obtained by following the evolution of a single microsatellite, and taking samples sufficiently sparsely that they were not autocorrelated. To see that it is sufficient to model the evolution of a single microsatellite, note that all current microsatellites at any locus are descended from a single ancestor, the most recent common ancestor (MRCA), and that sampling a single microsatellite from the population is equivalent to choosing at random from the possible descendants of the MRCA. The simulation model was run for $2.5 \times 10^{11}$ generations using the parameters, illustrated in figure 1 (right), that were derived from the frequency distribution of 5'-edge segments. Microsatellite characteristics were recorded only every $10^7$ generations, to remove autocorrelation from the data set. Thus, although the simulation followed the evolution of a single microsatellite, by taking samples sufficiently sparsely that they were not autocorrelated, we obtained a sample equivalent to what would be obtained from independently evolving microsatellites.
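The sparse-sampling idea can be sketched in a few lines of Python. This is a minimal illustration, not the program used for the results reported here: the transition rules in `step`, the placeholder `slippage_rate` function, the point mutation rate `a = 0.001`, the reseeding rule, and the short run length are all assumptions chosen so that the example runs quickly.

```python
import random


def slippage_rate(length):
    """Hypothetical per-generation chance of expansion (and, equally, of
    contraction) for a segment of the given length. Placeholder values only."""
    return 0.01 * length


def step(segments, a, rng):
    """Advance one generation; returns the new segment lengths (5' to 3')."""
    new_segments = []
    for length in segments:
        s = slippage_rate(length)
        u = rng.random()
        if u < s:                          # expansion by one repeat
            new_segments.append(length + 1)
        elif u < 2 * s:                    # contraction by one repeat
            if length > 1:                 # a one-repeat segment is lost
                new_segments.append(length - 1)
        elif u < 2 * s + a * length:       # point mutation hits one repeat
            hit = rng.randrange(length)    # 0-based position of mutated repeat
            left, right = hit, length - hit - 1
            new_segments.extend(piece for piece in (left, right) if piece > 0)
        else:                              # no change this generation
            new_segments.append(length)
    return new_segments or [1]             # assumption: reseed if all is lost


def simulate(generations, thin, a=0.001, seed=0):
    """Follow one microsatellite and record its state every `thin` generations."""
    rng = random.Random(seed)
    segments = [10]
    samples = []
    for generation in range(1, generations + 1):
        segments = step(segments, a, rng)
        if generation % thin == 0:
            samples.append(list(segments))
    return samples


def lag1_autocorrelation(values):
    """Lag-1 autocorrelation, used to check that the thinning is sparse enough."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values)
    if var == 0:
        return 0.0
    cov = sum((values[i] - mean) * (values[i + 1] - mean)
              for i in range(len(values) - 1))
    return cov / var


samples = simulate(generations=20_000, thin=100)
first_segment_lengths = [sample[0] for sample in samples]
print(len(samples), lag1_autocorrelation(first_segment_lengths))
```

If the lag-1 autocorrelation of the recorded values is not close to zero, the interval `thin` would need to be increased before treating the samples as independent.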
Results and Discussion
The estimated slippage rates shown in figure 1 (right) are relatively low for the first four repeats, and then rise initially roughly linearly with length, as found previously (Rose and Falush 1998; Sibly, Whittaker, and Talbot 2001). However, in contrast to the theoretical expectation that slippage rate increases linearly with length (Kruglyak et al. 1998), the more complex model used here suggests that slippage rate peaks at a length of 10 repeats and then declines, roughly linearly, until 20 repeats. The decline in slippage rates after 10 repeats is necessary, in the slippage/point-mutation model, to keep the frequency-length distribution of figure 1 (left) close to horizontal between 10 and 20 repeats.
The frequency length distributions of the first five segments of interrupted AC microsatellites in the human genome, read in the 5' to 3' direction, are shown in figure 2. The distributions show some similarities, but the later segments are shorter. The equilibrium distributions of segments other than the first cannot be calculated analytically, and we employed simulation to discover the implications for later segments of the slippage model shown in figure 1 (right). The frequency distributions so obtained approximate equilibrium distributions, and are shown in the bottom row of figure 2. The frequency distributions of the different segments of the simulated distributions are very similar to each other, and there is none of the shortening of later segments seen in the genome. Furthermore, there are fewer later segments in the genome than in the simulation.
The slippage/point-mutation theory also fails to explain adequately the frequency distributions of the lengths of the various segments (figs. 2 and 5). The theory predicts that the various segments will have similar distributions, but in the genome the later segments are shorter, counting in the 5' to 3' direction (fig. 2), and, counting inwards from the periphery, interior segments are shorter than those on the periphery (figs. 4 and 5). The fact that segment lengths decrease towards the center of interrupted microsatellites (fig. 4) suggests that microsatellites become wholly or partly stabilized by processes that act more strongly on interior than on peripheral segments. There are several plausible mechanisms by which this might be achieved. Interruptions may stabilize the microsatellite (Petes, Greenwell, and Dominska 1997) and perhaps block expansions (Rolfsmeier, Dixon, and Lahue 2000; Rolfsmeier and Lahue 2000) or lead to segment shortening (Taylor, Durkin, and Breden 1999).
Stabilization of the interior segments of microsatellites would also explain the otherwise puzzling geometric decline in the frequency of microsatellites with number of interruptions shown in figure 3. The geometric decline suggests that there is a constant chance of occurrence of one further duplication, and from figure 3 this chance is 0.337. Thus, the chance of obtaining one duplication is 0.337, of obtaining two duplications is $0.337^2$, of obtaining three duplications is $0.337^3$, and so on. At first sight, it appears odd that the chance of obtaining one further duplication does not increase with the number of interruptions already in the microsatellite. If, for instance, the microsatellite contains five interruptions, one would have thought it five times more prone to slippage mutation than a microsatellite with one interruption. This can be explained, however, if internal interruptions are stabilized as discussed above, so that only peripheral interruptions are prone to duplication.
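The arithmetic of the geometric decline can be checked with a few lines of Python. The sketch assumes only the constant chance of 0.337 read from figure 3; the function name `chance_of_duplications` is hypothetical.

```python
def chance_of_duplications(k, q=0.337):
    """Chance of obtaining k duplications when each further duplication
    occurs with the same constant chance q."""
    return q ** k


chances = [chance_of_duplications(k) for k in range(1, 5)]
# Successive ratios all equal q, which is what makes the decline geometric.
ratios = [chances[i + 1] / chances[i] for i in range(len(chances) - 1)]

print([round(c, 4) for c in chances])  # [0.337, 0.1136, 0.0383, 0.0129]
print([round(r, 3) for r in ratios])   # [0.337, 0.337, 0.337]
```

A chance that rose with the number of interruptions already present would instead give ratios that increase along the list.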
We conclude that the detailed structure of interrupted microsatellites is incompatible with the slippage/point-mutation theory in the simple form in which it has been tested here. The inferred pattern of mutation rates (fig. 1, right) is contrary to expectation, the ratio of slippage to point mutation rates is two orders of magnitude less than it should be, there is unexplained variation between the frequency distributions and sizes of the various segments (figs. 2, 4, and 5), and the frequency distribution of interruptions falls off faster than predicted (fig. 3). Reconciliation of some of these results with the theory might be possible by invoking slippage to duplicate/remove interruptions within microsatellites as described above, but it is uncertain whether this would have the desired quantitative effects.
In deriving predictions from the slippage/point-mutation theory it was assumed that the point mutation rate is invariant within a microsatellite locus. Could the slippage/point-mutation theory be rescued by relaxing this assumption while constraining the slippage mutation rate to increase with segment length in a more plausible manner? Variation of the point mutation rate with segment length might arise as a result of the action of slippage duplicating or stabilizing interruptions as described above. It can be incorporated into the models by making the parameter a a function of segment length j, so that equation (2) becomes
Derivation of predictions also relied on the assumption that the point mutation rate is invariant throughout the genome. There is, however, some evidence to the contrary (Wolfe, Sharp, and Li 1989; Santibanez-Koref, Gangeswaran, and Hancock 2002). Recently, indeed, it has been pointed out that the slippage/point-mutation theory predicts a negative correlation between microsatellite length and the local point mutation rate, and a negative correlation has been reported between microsatellite lengths and the substitution rate in their flanking sequences, comparing orthologous loci in mouse and rat (Wolfe, Sharp, and Li 1989; Santibanez-Koref, Gangeswaran, and Hancock 2002). It was suggested on this basis that length differences are to some extent caused by differences in the local point mutation rate. It is therefore worth considering how the analysis that produced figure 1 (right) would be affected by local variation in point mutation rate. The easiest way to think about this is to suppose that loci are classified by their local point mutation rates, the frequency of each such class being known. Each class then corresponds to a particular value of $a$ in equations (2) and (3). The equilibrium distribution for each value of $a$ can be calculated as before, and the overall frequency of each microsatellite length can then be obtained by summing over classes, weighting each class's equilibrium distribution by the class frequency. Realization of this analysis in practice will have to await information as to the distribution of local point mutation rates.
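The class-based calculation described above amounts to a weighted mixture of equilibrium distributions. A minimal Python sketch follows; the function name `mixture_distribution`, the two classes, and all numerical values are made up for illustration, since the real distribution of local point mutation rates is not yet known.

```python
def mixture_distribution(class_frequencies, class_distributions):
    """Combine per-class equilibrium distributions into one overall distribution.

    class_frequencies:   frequency of each class of local point mutation rate
                         (must sum to 1).
    class_distributions: one list per class; entry i - 1 is the equilibrium
                         probability of length i under that class's value of a.
    """
    if abs(sum(class_frequencies) - 1.0) > 1e-9:
        raise ValueError("class frequencies must sum to 1")
    n_lengths = len(class_distributions[0])
    if any(len(dist) != n_lengths for dist in class_distributions):
        raise ValueError("all class distributions must cover the same lengths")
    overall = [0.0] * n_lengths
    for frequency, dist in zip(class_frequencies, class_distributions):
        for i, prob in enumerate(dist):
            overall[i] += frequency * prob   # weight by class frequency
    return overall


# Toy example with two hypothetical classes and lengths 1..4.
low_rate_class = [0.1, 0.2, 0.3, 0.4]    # made-up distribution, longer segments
high_rate_class = [0.4, 0.3, 0.2, 0.1]   # made-up distribution, shorter segments
overall = mixture_distribution([0.25, 0.75], [low_rate_class, high_rate_class])
```

The resulting `overall` list could then be compared with observed length frequencies in the same way as a single equilibrium distribution.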
Given that the results presented here are incompatible with the existing slippage/point-mutation theory, could they be explained by the alternative theory mentioned in the Introduction? According to this theory, longer microsatellites experience more contractions than expansions, so that there is a length-dependent mutation bias (Xu et al. 2000). This "mutation-bias" theory can certainly produce frequency distributions like that shown in figure 1a, but when we attempted to incorporate its effects, the parameter estimates failed to converge. Thus, we have not so far been able to combine the slippage/point-mutation and mutation-bias theories into a single theory with estimable parameters. Note also that such a theory will still have difficulties accounting for the observed differences between peripheral and interior segments (figs. 4 and 5), so further factors will still be needed to explain the stabilization of interior segments.
In conclusion, we have shown that the results presented here are incompatible with the existing slippage/point-mutation theory, and, in addition, important information has been gained about the detailed structure of interrupted microsatellites. We now know that segment lengths decrease towards the center of interrupted microsatellites (fig. 4), and that microsatellite frequency declines geometrically with microsatellite size measured in number of segments (fig. 3). An intriguing implication of these findings is that peripheral segments essentially behave as two isolated and perfect microsatellites, whereas neighboring segments heavily influence the growth of internal segments. The mechanistic explanations of these results probably involve stabilizing effects of interruptions, perhaps blocking expansions or enabling segment shortening, together with removal of interruptions through slippage.
Acknowledgements
Footnotes
E-mail: r.m.sibly{at}rdg.ac.uk.
Diethard Tautz, Associate Editor
Literature Cited
Bell, G. I., and J. Jurka. 1997. The length distribution of perfect dimer repetitive DNA is consistent with its evolution by an unbiased single-step mutation process. J. Mol. Evol. 44:414-421.
Brinkmann, B., M. Klintschar, F. Neuhuber, J. Huhne, and B. Rolf. 1998. Mutation rate in human microsatellites: influence of the structure and length of the tandem repeat. Am. J. Human Genet. 62:1408-1415.
Brohede, J., C. R. Primmer, A. P. Moller, and H. Ellegren. 2002. Heterogeneity in the rate and pattern of germline mutation at individual microsatellite loci. Nucleic Acids Res. 30:1997-2003.
Calabrese, P. P., R. Durrett, and C. F. Aquadro. 2001. Dynamics of microsatellite divergence under stepwise mutation and proportional slippage/point mutation models. Genetics 159:839-852.
Crow, J. F. 1993. How much do we know about spontaneous human mutation rates? Environ. Mol. Mutagen. 21:122-129.
Ellegren, H. 2000. Microsatellite mutations in the germline: implications for evolutionary inference. Trends Genet. 16:551-558.
Harr, B., B. Zangerl, and C. Schlotterer. 2000. Removal of microsatellite interruptions by DNA replication slippage: phylogenetic evidence from Drosophila. Mol. Biol. Evol. 17:1001-1009.
Huang, Q. Y., F. H. Xu, H. Shen, H. Y. Deng, and Y. J. Liu, et al. 2002. Mutation patterns at dinucleotide microsatellite loci in humans. Am. J. Human Genet. 70:625-634.
Kruglyak, S., R. T. Durrett, M. D. Schug, and C. F. Aquadro. 1998. Equilibrium distributions of microsatellite repeat length resulting from a balance between slippage events and point mutations. Proc. Nat. Acad. Sci. USA 95:10774-10778.
Kruglyak, S., R. T. Durrett, M. D. Schug, and C. F. Aquadro. 2000. Distribution and abundance of microsatellites in the yeast genome can be explained by a balance between slippage events and point mutations. Mol. Biol. Evol. 17:1210-1219.
Petes, T. D., P. W. Greenwell, and M. Dominska. 1997. Stabilization of microsatellite sequences by variant repeats in the yeast Saccharomyces cerevisiae. Genetics 146:491-498.
Rolfsmeier, M. L., M. J. Dixon, and R. S. Lahue. 2000. Mismatch repair blocks expansions of interrupted trinucleotide repeats in yeast. Mol. Cell 6:1501-1507.
Rolfsmeier, M. L., and R. S. Lahue. 2000. Stabilizing effects of interruptions on trinucleotide repeat expansions in Saccharomyces cerevisiae. Mol. Cell Biol. 20:173-180.
Rose, O., and D. Falush. 1998. A threshold size for microsatellite expansion. Mol. Biol. Evol. 15:613-615.
Santibanez-Koref, M. F., R. Gangeswaran, and J. M. Hancock. 2002. A relationship between lengths of microsatellites and nearby substitution rates in mammalian genomes. Mol. Biol. Evol. 18:2119-2123.
Sibly, R. M., J. C. Whittaker, and M. Talbot. 2001. A maximum-likelihood approach to fitting equilibrium models of microsatellite evolution. Mol. Biol. Evol. 18:413-417.
Tautz, D. 1993. Notes on the definition and nomenclature of tandemly repetitive DNA sequences. Pp. 21-28 in S. D. J. Pena, R. Chakraborty, J. T. Epplen, and A. J. Jeffreys, eds. DNA fingerprinting: state of the science. Birkhauser, Basel, Switzerland.
Taylor, J. S., J. M. H. Durkin, and F. Breden. 1999. The death of a microsatellite: a phylogenetic perspective on microsatellite interruptions. Mol. Biol. Evol. 16:567-572.
Wolfe, K. H., P. M. Sharp, and W. H. Li. 1989. Mutation rates differ among regions of the mammalian genome. Nature 337:283-285.
Xu, X., M. Peng, Z. Fang, and X. Xu. 2000. The direction of microsatellite mutations is dependent upon allele length. Nat. Genet. 24:396-399.