Comparative Sequence Analysis Group, MRC Clinical Sciences Centre, Imperial College School of Medicine, Hammersmith Hospital, London, England
A number of studies have indicated an influence of sequences flanking tandem repeats on repeat stability (Monckton et al. 1994; Shimizu et al. 1996; Bowater et al. 1997; Jeffreys, Murray, and Neumann 1998; Kruglyak et al. 1998; Brock, Anderson, and Monckton 1999). In particular, Kruglyak et al. (1998) have proposed a model to explain differences in average microsatellite length between species. According to this model, the average microsatellite length in a genome depends on two parameters: the tendency of microsatellites to undergo slippage-like mutation and the rate of base substitution. We hypothesized that such a model might also apply within genomes, so that, for example, local variations in point substitution rate (Wolfe, Sharp, and Li 1989) could give rise to differences in average microsatellite length between genomic locations. To test this hypothesis, we investigated the relationship between microsatellite length and the substitution rate in flanking sequences.
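The balance between slippage and point substitution that Kruglyak et al. (1998) describe can be illustrated with a toy model. The sketch below is not their published parameterization. It assumes, hypothetically, that slippage lengthens or shortens an array of n units by one unit at rate s·n in each direction, and that a point substitution, occurring at rate p·n, interrupts the array and leaves a pure tract of uniformly random length 1 to n - 1. It computes the stationary length distribution of this continuous-time Markov chain on a truncated range of lengths. Because every rate in this toy is proportional to n, the equilibrium depends only on the ratio s/p. Lowering p, which corresponds to a locally lower substitution rate in the hypothesis above, shifts the distribution toward longer arrays.

```python
import numpy as np


def stationary_length_distribution(slip_rate, point_rate, max_units=200):
    """Stationary distribution of array length (1..max_units repeat units)
    under a hypothetical toy slippage/point-substitution model."""
    n_states = max_units
    q = np.zeros((n_states, n_states))  # row i describes an array of n = i + 1 units
    for i in range(n_states):
        n = i + 1
        if n < max_units:
            q[i, i + 1] += slip_rate * n  # slippage expansion (blocked at the cap)
        if n >= 2:
            q[i, i - 1] += slip_rate * n  # slippage contraction
            # point substitution: surviving pure tract uniform on 1..n-1 units
            q[i, :i] += point_rate * n / (n - 1)
        q[i, i] = -q[i].sum()  # rows of a rate matrix sum to zero
    # Solve pi @ q = 0 with sum(pi) = 1 by replacing one balance equation.
    a = q.T.copy()
    a[-1, :] = 1.0
    b = np.zeros(n_states)
    b[-1] = 1.0
    return np.linalg.solve(a, b)


def mean_length(pi):
    """Mean number of repeat units under distribution pi."""
    return float(np.dot(np.arange(1, len(pi) + 1), pi))


for p in (0.05, 0.1, 0.2):
    pi = stationary_length_distribution(slip_rate=1.0, point_rate=p)
    print(f"point_rate={p}: mean length {mean_length(pi):.1f} units")
```

The printed means fall as point_rate rises. The cap at max_units is only a numerical convenience; for these rates the probability mass near the cap is negligible.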
We first examined the relationship between flanking-sequence divergence and array length for CA microsatellites, which are common in a variety of eukaryotes (Hamada, Petrino, and Kakunaga 1982). We compared homologous loci in the rat and the mouse, two species for which a substantial amount of sequence information is available and which are closely enough related to share microsatellites frequently at orthologous positions. Sequences were retrieved from GenBank (release 99.0) or EMBL (release 49.0) using the GCG package. The databases were screened for rat or mouse sequences containing at least (CA)10 or (TG)10. The 50 bases 5' of the start of the CA or TG block, in the orientation given in the database entry, were used to search for homologs in the other species. We excluded entries containing composite repeats or other microsatellites within 150 bp of the array, since these would complicate the determination of the flanking-sequence boundaries. The sequences were aligned with the program BESTFIT (GCG 1994) using default parameter values.
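The screening step can be sketched with regular expressions. This is a minimal sketch under assumptions. The original screen was run with the GCG package, whose commands are not reproduced here. The rule used to recognize "other microsatellites or composite repeats" (any 2-4 bp motif repeated in tandem at least five times within 150 bp of the array) and the function name screen_entry are hypothetical. Under this rule, imperfect repeats in the transitional zone may also cause an entry to be rejected.

```python
import re

# Criterion from the text: a pure run of at least ten CA or ten TG units.
CORE = re.compile(r"(?:CA){10,}|(?:TG){10,}")
# Hypothetical stand-in for "other microsatellites or composite repeats":
# any 2-4 bp motif repeated in tandem at least five times.
OTHER_REPEAT = re.compile(r"([ACGT]{2,4}?)\1{4,}")


def screen_entry(seq, probe_len=50, window=150):
    """Return the probe (probe_len bases 5' of the first qualifying CA/TG
    block, in database orientation), or None if the entry is rejected."""
    seq = seq.upper()
    core = CORE.search(seq)
    if core is None:
        return None  # no (CA)10 or (TG)10 block
    start, end = core.span()
    if start < probe_len:
        return None  # too little 5' flank to build a probe
    # Join the two windows with N so that no repeat can span the junction.
    flanks = seq[max(0, start - window):start] + "N" + seq[end:end + window]
    if OTHER_REPEAT.search(flanks):
        return None  # a nearby repeat would blur the flank boundaries
    return seq[start - probe_len:start]
```

The returned probe would then be used as the query when searching for the homolog in the other species.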
As shown in figure 1A, the boundaries between a microsatellite and its flanking sequences are not always easily defined. The (CA)n tract is often surrounded by a region containing reiterations of the repeat motif interrupted by other sequences. An unambiguous alignment of this region is therefore often difficult to obtain, which makes estimates of sequence divergence there unreliable. Previous analyses have suggested that substitution rates in regions immediately adjacent to microsatellite arrays are elevated (Djian, Hancock, and Chana 1996; Brohede and Ellegren 1999; Hancock, Worthey, and Santibáñez-Koref 2001). Sequence changes in these regions may reflect mutational events involving the repeats themselves rather than processes acting on the surrounding genome. For this reason, we excluded the region immediately adjacent to the microsatellite when estimating the divergence of flanking sequences. We designated this excluded region the transitional zone (see fig. 1). We defined its limits as the first two identical residues 5' or 3' of the microsatellite core that were not involved in a reiteration of the microsatellite motif in either of the two aligned sequences.
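The rule for delimiting the transitional zone can be sketched as a scan outward from the core of a pairwise alignment. This is a minimal sketch that rests on three interpretive assumptions. First, "the first two identical residues" is read as two consecutive alignment columns holding identical, ungapped residues. Second, a residue counts as "involved in a reiteration of the motif" if it lies in a tandem run of at least two copies of the motif, in any rotation, in that sequence. Third, those two residues are treated as the first flanking residues rather than as part of the zone. The function names and the min_copies threshold are hypothetical.

```python
import re


def reiteration_mask(gapped, motif, min_copies=2):
    """Flag alignment columns where this sequence's residue lies in a tandem
    run of at least min_copies copies of motif (any rotation). Gaps: False."""
    motif = motif.upper()
    columns = [i for i, ch in enumerate(gapped) if ch != "-"]
    ungapped = gapped.replace("-", "").upper()
    mask = [False] * len(gapped)
    for r in range(len(motif)):
        rotation = motif[r:] + motif[:r]
        pattern = f"(?:{re.escape(rotation)}){{{min_copies},}}"
        for m in re.finditer(pattern, ungapped):
            for j in range(m.start(), m.end()):
                mask[columns[j]] = True  # map ungapped index back to its column
    return mask


def transitional_zone(seq_a, seq_b, core_start, core_end, motif):
    """Return (left, right): alignment columns [left, right) covering the core
    (columns [core_start, core_end)) plus its transitional zone."""
    mask_a = reiteration_mask(seq_a, motif)
    mask_b = reiteration_mask(seq_b, motif)

    def clean_match(i):
        # identical, ungapped, and not part of a motif reiteration in either sequence
        x, y = seq_a[i].upper(), seq_b[i].upper()
        return x == y and x != "-" and not mask_a[i] and not mask_b[i]

    left = 0  # default: no clean pair found, so the whole 5' side is transitional
    for i in range(core_start - 1, 0, -1):
        if clean_match(i) and clean_match(i - 1):
            left = i + 1  # columns i-1 and i are the first flanking residues
            break
    right = len(seq_a)
    for i in range(core_end, len(seq_a) - 1):
        if clean_match(i) and clean_match(i + 1):
            right = i  # columns i and i+1 are the first flanking residues
            break
    return left, right


# Toy ungapped alignment (hypothetical); the perfect (CA)4 core is columns 9-16.
a = "GGATCCAGACACACACACTCATTGGA"
b = "GGATCCTGACACACACACACATTGGA"
left, right = transitional_zone(a, b, core_start=9, core_end=17, motif="CA")
flank_5, flank_3 = a[:left], a[right:]
```

In the toy alignment the imperfect CA copies on both sides are absorbed into the zone, and a[:left] and a[right:] are the 5' and 3' flanks that would enter a divergence estimate.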
To test whether this observation holds for other classes of repeats, we investigated whether a similar correlation could be observed for CAG arrays. These arrays are often found in coding regions (Stallings 1994) and have recently attracted considerable attention because of their involvement in a number of inherited human diseases (Rubinsztein 1999). We examined a set of homologous mouse and rat coding sequences containing CAG arrays, selecting sequences with a CAG or CTG tract of more than five repeat units in either the rat or the mouse homolog. To limit the influence of possible constraints at the amino acid level, only sequences coding for polyglutamine tracts were included. Total (K), synonymous (Ks), and nonsynonymous (Ka) divergences were then calculated by the method of Li, Wu, and Luo (1985) using the program LI93 (K. H. Wolfe, unpublished). The transitional zone was delimited using the criteria defined above for CA repeats. The calculations were again based on 150 bp of flanking sequence on either side of the array where possible, but introns, 5' and 3' untranslated regions, and regions in which the alignments became ambiguous were excluded. Twelve sequence pairs were included in the analysis. Because many of these sequences were available only as cDNAs, the results (see table 1) are probably confounded by two factors we could not take into account: the distances separating array and flanking sequence at the genomic level, and any effects of intronic sequences on flanking-sequence evolution.
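The Li, Wu, and Luo (1985) method weights sites by degeneracy class and corrects for multiple substitutions, which is beyond a short example. The sketch below shows only a simpler first step common to such analyses: classifying the observed codon differences between two in-frame aligned coding sequences as synonymous or nonsynonymous under the standard genetic code. The frame argument plays the role of a reading frame taken from a reference sequence. Codons that differ at more than one position, or that contain gaps or ambiguity codes, are simply skipped here, whereas the published method considers the possible pathways between them. The function name is hypothetical.

```python
BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    x + y + z: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, x in enumerate(BASES)
    for j, y in enumerate(BASES)
    for k, z in enumerate(BASES)
}


def classify_codon_differences(seq_a, seq_b, frame=0):
    """Count (synonymous, nonsynonymous, skipped) codon differences between two
    aligned, ungapped coding sequences read in the same frame."""
    syn = nonsyn = skipped = 0
    n = min(len(seq_a), len(seq_b))
    for s in range(frame, n - 2, 3):
        ca, cb = seq_a[s:s + 3].upper(), seq_b[s:s + 3].upper()
        if ca == cb:
            continue
        if ca not in CODE or cb not in CODE or sum(p != q for p, q in zip(ca, cb)) != 1:
            skipped += 1  # gaps, ambiguity codes, or multiple differences
            continue
        if CODE[ca] == CODE[cb]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn, skipped


print(classify_codon_differences("CAGCAGCTGAAACAG", "CAACAGCTCGAAGTG"))  # → (2, 1, 1)
```

In the example, CAG/CAA (Gln/Gln) and CTG/CTC (Leu/Leu) are synonymous, AAA/GAA (Lys/Glu) is nonsynonymous, and CAG/GTG differs at two positions and is skipped.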
The observed correlation between flanking-sequence divergence and repeat length raised the possibility that changes in the local rate of sequence change could give rise to changes in repeat length. Such changes should be observable in a phylogenetic analysis of a region containing a repeat known to have changed in length during evolution. Examples of such repeats are the CAG repeats found in genes involved in human neurological diseases (reviewed by Rubinsztein 1999). To investigate whether this is the case, we carried out a phylogenetic analysis of changes within primate orthologs of the human dentatorubral-pallidoluysian atrophy (DRPLA) locus (Nagafuchi et al. 1994). We PCR-amplified 435 bp of genomic sequence 3' of the DRPLA CAG repeat. This region corresponds to positions 1735-2169 of the published human sequence (accession number D31840; Nagafuchi et al. 1994) and lies within exon 5 of the gene. The region was amplified from six nonhuman primate species: the bonobo (Pan paniscus, accession number AJ133270), the gorilla (Gorilla gorilla, AJ133271), the orangutan (Pongo pygmaeus, AJ133272), the gibbon (Hylobates lar, AJ133273), the cynomolgus monkey (Macaca fascicularis, AJ133274), and the tufted capuchin (Cebus apella, AJ133275). These sequences were compared with the published human sequence. We compared substitution rates rather than divergences in order to take into account the divergence times between species. Synonymous and nonsynonymous changes were identified by assigning the reading frame from the human sequence, and rates were calculated using the divergence times published by Kumar and Hedges (1998). The values in table 1 are the correlation coefficients derived from the 21 pairwise comparisons among the seven species. To estimate the significance of these coefficients given the phylogenetic relationships and array sizes, we simulated a set of sequences related by the corresponding phylogenetic tree (Kumar and Hedges 1998) 10,000 times, assuming a constant substitution rate of 1.3 × 10⁻³ per residue per million years (the average rate observed for these species at this locus). For each simulated data set, we then calculated the correlation coefficient between estimated substitution rate and array length. The analyses again showed a significant inverse correlation between substitution rate and array length.
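The significance test described above can be sketched as a parametric simulation. This is a minimal sketch, not the authors' code, and it makes several assumptions of its own. Sites evolve independently under the Jukes-Cantor model at the constant per-lineage rate given in the text, and rates are estimated from Jukes-Cantor distances rather than split into synonymous and nonsynonymous components. The tree is ultrametric with branch lengths in millions of years, so a pairwise rate is the distance divided by the path length between the two taxa (2T). Each pairwise rate is paired with the mean array length of the two taxa, because the text does not say how array lengths were assigned to pairs. Significance is a one-sided empirical p-value. The toy tree, array lengths, and observed_r in the usage lines are hypothetical placeholders, not the primate data.

```python
import itertools

import numpy as np


def leaf_path_lengths(edges):
    """edges: (parent, child, branch_length_my) tuples, parents before children.
    Returns the leaf names and the path length (My) between each leaf pair."""
    parent_of = {child: (par, t) for par, child, t in edges}
    internal = {par for par, _, _ in edges}
    leaves = [child for _, child, _ in edges if child not in internal]

    def distances_to_ancestors(node):
        dist, total = {node: 0.0}, 0.0
        while node in parent_of:
            node, t = parent_of[node]
            total += t
            dist[node] = total
        return dist

    up = {leaf: distances_to_ancestors(leaf) for leaf in leaves}
    paths = {}
    for a, b in itertools.combinations(leaves, 2):
        shared = up[a].keys() & up[b].keys()
        paths[(a, b)] = min(up[a][x] + up[b][x] for x in shared)  # via the MRCA
    return leaves, paths


def simulate_leaves(edges, n_sites, rate, rng):
    """Evolve a random root sequence (0-3 = A, C, G, T) down the tree under
    Jukes-Cantor; the first edge's parent is taken to be the root."""
    seqs = {edges[0][0]: rng.integers(0, 4, size=n_sites)}
    for par, child, t in edges:
        p_change = 0.75 * (1.0 - np.exp(-4.0 * rate * t / 3.0))
        changed = rng.random(n_sites) < p_change
        shift = rng.integers(1, 4, size=n_sites)  # one of the three other bases
        seqs[child] = np.where(changed, (seqs[par] + shift) % 4, seqs[par])
    return seqs


def rate_length_correlation(seqs, leaves, paths, lengths):
    rates, pair_lengths = [], []
    for a, b in itertools.combinations(leaves, 2):
        p = np.mean(seqs[a] != seqs[b])
        d = -0.75 * np.log(1.0 - 4.0 * p / 3.0)  # Jukes-Cantor distance
        rates.append(d / paths[(a, b)])  # d / (2T) on an ultrametric tree
        pair_lengths.append((lengths[a] + lengths[b]) / 2.0)  # assumed pairing
    return np.corrcoef(rates, pair_lengths)[0, 1]


def simulation_test(edges, lengths, observed_r, n_sites=435, rate=1.3e-3,
                    n_sim=10_000, seed=0):
    """Null distribution of r under a constant rate, plus a one-sided p-value
    for an inverse correlation at least as strong as observed_r."""
    rng = np.random.default_rng(seed)
    leaves, paths = leaf_path_lengths(edges)
    null = np.array([
        rate_length_correlation(simulate_leaves(edges, n_sites, rate, rng),
                                leaves, paths, lengths)
        for _ in range(n_sim)
    ])
    p_value = (1 + np.count_nonzero(null <= observed_r)) / (n_sim + 1)
    return null, p_value


# Hypothetical toy tree (ultrametric, depth 10 My) and array lengths.
edges = [("root", "n1", 4.0), ("root", "D", 10.0), ("n1", "A", 6.0),
         ("n1", "n2", 2.0), ("n2", "B", 4.0), ("n2", "C", 4.0)]
lengths = {"A": 20, "B": 15, "C": 12, "D": 8}
null, p = simulation_test(edges, lengths, observed_r=-0.5, n_sim=1000)
print(f"one-sided p = {p:.3f}")
```

Simulating on the real tree, rather than permuting array lengths, is what lets the null distribution reflect the non-independence of the 21 pairwise comparisons.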
Our findings suggest that a relationship between flanking-sequence divergence and repeat length applies to both noncoding CA microsatellites and coding CAG repeats. The observation of this relationship at synonymous sites and in noncoding sequences suggests that it does not result from selection at the peptide level and may instead reflect genuine differences in mutation rates. While some investigators have not detected any difference between the substitution rates of sequences adjacent to microsatellites and those of other sequences (Brohede and Ellegren 1999), others have reported surprisingly low divergence, or conservation, of microsatellite loci across different species (Schlötterer, Amos, and Tautz 1991; FitzSimmons, Moritz, and Moore 1995; Rico, Rico, and Hewitt 1996; Ezenwa et al. 1998; Zhu, Queller, and Strassmann 2000). Observations of this kind would be consistent with our results: higher conservation of the flanking sequences of long microsatellites would improve the ability of primers to amplify across species, provided the primer sites are located far enough from the array, i.e., outside the transitional zone.
Although the wider generality of these phenomena remains to be tested, the observations reported here are consistent with the hypothesis that local variations in point substitution rate across a genome could influence the lengths of microsatellites. This represents an extension of the model proposed by Kruglyak et al. (1998), who suggested that such a relationship gives rise to the genome-wide length distribution of microsatellites. On this basis, we suggest that long microsatellites are likely to be more stable (and/or to form preferentially) in regions of mammalian genomes with low point mutation rates. The results also raise the possibility that evolutionary processes that affect substitution rates at a genomic locality also affect the lengths of microsatellites in that region.
Our results could also be explained by the reverse causal relationship, in which array expansion reduces the rate of sequence change in flanking regions. The presence of a microsatellite has been associated with effects on adjacent sequences, such as alteration of chromatin structure (Wang et al. 1994; Otten and Tapscott 1995), interference with transcription (Bidichandani, Ashizawa, and Patel 1998), and stimulation of gene conversion (Wahls, Wallace, and Moore 1990). These or similar processes could affect the mutation rate by, for example, modulating the accessibility of DNA to damaging agents or to components of the repair machinery. However, any influence of these processes on the substitution rate remains to be established.
In summary, our findings provide evidence for an association between the size of CA and CAG microsatellites and the rate of evolutionary change of the adjacent DNA. They suggest that the presence of this widespread class of sequence elements may signal the presence of genome regions with relatively low point mutation rates.
Acknowledgements
We thank the U.K. Medical Research Council for financial support, Andy Porter for supplying orangutan DNA, Hector Seuanez for C. apella DNA, Philippe Djian for DNA from the other primates, and Ken Wolfe for LI93.
Footnotes
Jeffrey Long, Reviewing Editor
1 Present address: Department of Bioinformatics, Max Delbruck Centrum, Berlin, Germany.
2 Present address: Department of Computer Science, Royal Holloway University of London, Egham, Surrey, England.
3 Keywords: microsatellites, genome evolution, flanking sequences, transitional zone, DRPLA, mutation rate
4 Address for correspondence and reprints: John M. Hancock, Department of Computer Science, Royal Holloway University of London, Egham, Surrey TW20 0EX, United Kingdom. j.hancock@cs.rhul.ac.uk.
References
Amos, W., S. J. Sawcer, R. W. Feakes, and D. C. Rubinsztein. 1996. Microsatellites show mutational bias and heterozygote instability. Nat. Genet. 13:390-391.
Bidichandani, S. I., T. Ashizawa, and P. I. Patel. 1998. The GAA triplet-repeat expansion in Friedreich ataxia interferes with transcription and may be associated with an unusual DNA structure. Am. J. Hum. Genet. 62:111-121.
Bowater, R. P., A. Jaworski, J. E. Larson, P. Parniewski, and R. D. Wells. 1997. Transcription increases the deletion frequency of long CTG·CAG triplet repeats from plasmids in Escherichia coli. Nucleic Acids Res. 25:2861-2868.
Brock, G. J., N. H. Anderson, and D. G. Monckton. 1999. Cis-acting modifiers of expanded CAG/CTG triplet repeat expandability: associations with flanking GC content and proximity to CpG islands. Hum. Mol. Genet. 8:1061-1067.
Brohede, J., and H. Ellegren. 1999. Microsatellite evolution: polarity of substitutions within repeats and neutrality of flanking sequences. Proc. R. Soc. Lond. B Biol. Sci. 266:825-833.
Djian, P., J. M. Hancock, and H. S. Chana. 1996. Codon repeats in genes associated with human diseases: fewer repeats in the genes of nonhuman primates and nucleotide substitutions concentrated at the sites of reiteration. Proc. Natl. Acad. Sci. USA 93:417-421.
Ezenwa, V. O., J. M. Peters, Y. Zhu, E. Arevalo, M. D. Hastings, P. Seppa, J. S. Pedersen, F. Zacchi, D. C. Queller, and J. E. Strassmann. 1998. Ancient conservation of trinucleotide microsatellite loci in polistine wasps. Mol. Phylogenet. Evol. 10:168-177.
FitzSimmons, N. N., C. Moritz, and S. S. Moore. 1995. Conservation and dynamics of microsatellite loci over 300 million years of marine turtle evolution. Mol. Biol. Evol. 12:432-440.
GCG. 1994. Program manual for the Wisconsin package. Version 8. Genetics Computer Group, Madison, Wis.
Hamada, H., M. G. Petrino, and T. Kakunaga. 1982. A novel repeated element with Z-DNA-forming potential is widely found in evolutionarily diverse eukaryotic genomes. Proc. Natl. Acad. Sci. USA 79:6465-6469.
Hancock, J. M., E. A. Worthey, and M. F. Santibáñez-Koref. 2001. A role for selection in regulating the evolutionary emergence of disease-causing and other coding CAG repeats in humans and mice. Mol. Biol. Evol. 18:1014-1023.
Jeffreys, A. J., J. Murray, and R. Neumann. 1998. High-resolution mapping of crossovers in human sperm defines a minisatellite-associated recombination hotspot. Mol. Cell 2:267-273.
Kruglyak, S., R. T. Durrett, M. D. Schug, and C. F. Aquadro. 1998. Equilibrium distributions of microsatellite repeat length resulting from a balance between slippage events and point mutations. Proc. Natl. Acad. Sci. USA 95:10774-10778.
Kumar, S., and S. B. Hedges. 1998. A molecular timescale for vertebrate evolution. Nature 392:917-920.
Li, W. H., C. I. Wu, and C. C. Luo. 1985. A new method for estimating synonymous and nonsynonymous rates of nucleotide substitution considering the relative likelihood of nucleotide and codon changes. Mol. Biol. Evol. 2:150-174.
Monckton, D. G., R. Neumann, T. Guram, N. Fretwell, K. Tamaki, A. MacLeod, and A. J. Jeffreys. 1994. Minisatellite mutation rate variation associated with a flanking DNA sequence polymorphism. Nat. Genet. 8:162-170.
Nagafuchi, S., H. Yanagisawa, E. Ohsaki, T. Shirayama, K. Tadokoro, T. Inoue, and M. Yamada. 1994. Structure and expression of the gene responsible for the triplet repeat disorder, dentatorubral and pallidoluysian atrophy (DRPLA). Nat. Genet. 8:177-182.
Otten, A. D., and S. J. Tapscott. 1995. Triplet repeat expansion in myotonic dystrophy alters the adjacent chromatin structure. Proc. Natl. Acad. Sci. USA 92:5465-5469.
Primmer, C. R., N. Saino, A. P. Moller, and H. Ellegren. 1996. Directional evolution in germline microsatellite mutations. Nat. Genet. 13:391-393.
Rico, C., I. Rico, and G. Hewitt. 1996. 470 million years of conservation of microsatellite loci among fish species. Proc. R. Soc. Lond. B Biol. Sci. 263:549-557.
Rubinsztein, D. C. 1999. Trinucleotide expansion mutations cause diseases which do not conform to classical Mendelian expectations. Pp. 80-97 in D. B. Goldstein and C. Schlötterer, eds. Microsatellites: evolution and applications. Oxford University Press, Oxford, England.
Rubinsztein, D. C., W. Amos, J. Leggo, S. Goodburn, S. Jain, S. H. Li, R. L. Margolis, C. A. Ross, and M. A. Ferguson-Smith. 1995. Microsatellite evolution: evidence for directionality and variation in rate between species. Nat. Genet. 10:337-343.
Schlötterer, C., B. Amos, and D. Tautz. 1991. Conservation of polymorphic simple sequence loci in cetacean species. Nature 354:63-65.
Shimizu, M., R. Gellibolian, B. A. Oostra, and R. D. Wells. 1996. Cloning, characterization and properties of plasmids containing CGG triplet repeats from the FMR-1 gene. J. Mol. Biol. 258:614-626.
Stallings, R. L. 1994. Distribution of trinucleotide microsatellites in different categories of mammalian genomic sequence: implications for human genetic diseases. Genomics 21:116-121.
Wahls, W. P., L. J. Wallace, and P. D. Moore. 1990. The Z-DNA motif d(TG)30 promotes reception of information during gene conversion events while stimulating homologous recombination in human cells in culture. Mol. Cell. Biol. 10:785-793.
Wang, Y. H., S. Amirhaeri, S. Kang, R. D. Wells, and J. D. Griffith. 1994. Preferential nucleosome assembly at DNA triplet repeats from the myotonic dystrophy gene. Science 265:669-671.
Wolfe, K. H., and P. M. Sharp. 1993. Mammalian gene evolution: nucleotide sequence divergence between mouse and rat. J. Mol. Evol. 37:441-456.
Wolfe, K. H., P. M. Sharp, and W. H. Li. 1989. Mutation rates differ among regions of the mammalian genome. Nature 337:283-285.
Zhu, Y., D. C. Queller, and J. E. Strassmann. 2000. A phylogenetic perspective on sequence evolution in microsatellite loci. J. Mol. Evol. 50:324-338.