MRC Clinical Sciences Centre, Imperial College School of Medicine, Hammersmith Hospital, London, England;
Department of Computer Science, Royal Holloway University of London, Egham, Surrey, England
Leishmania Genome Group, Seattle Biomedical Research Institute, Seattle, Washington
Introduction
Two explanations for these observations have been proposed. The first suggests that the evolutionary expansion of these repeats reflects a genomewide expansion of repeats along the primate lineage and especially in humans (Rubinsztein et al. 1995a). The reality of such lineage-specific, genomewide effects remains uncertain, despite a number of subsequent analyses (reviewed in Amos 1999; Rubinsztein, Amos, and Cooper 1999). This is primarily because of the confounding effect of ascertainment bias (Ellegren, Primmer, and Sheldon 1995), that is, the expectation that repeats isolated in one species will be longer than their homologs in other species because they were isolated for their polymorphic nature, and long repeats are more polymorphic than short repeats. Ascertainment bias confounds even the relatively well studied comparison between humans and chimpanzees, while evidence for such differences between humans and other primates is lacking, and indeed there is some evidence to the contrary (e.g., Morin et al. 1998). There is also evidence for very long CAG repeats in mice (King et al. 1998). A number of explanations have been suggested for the human-chimpanzee difference (Amos 1999; Rubinsztein, Amos, and Cooper 1999), but these rely on characteristics of human and chimpanzee evolutionary history and therefore cannot provide an explanation for changes in repeat length over long periods of evolution.
The second possible explanation for the evolutionary expansion of CAG repeats in these genes is that forces or processes that are specific to individual genes and/or genomic locations act on particular genes in particular evolutionary lineages to give rise to locus- and lineage-specific expansions. One prominent candidate for such an influence is local base (and nucleotide motif) composition. Different isochores in mammalian genomes have different GC compositions, and genes within these regions show correlated base compositions, notably at third codon positions (Mouchiroud, Gautier, and Bernardi 1995). Thus, genes within GC-rich isochores will tend to accumulate concentrations of codons with G and C at their third positions, which might act as seeds for replication slippage and predispose genes to accumulating codon repeats. In the extreme, such biases could even bias amino acid compositions of proteins, again predisposing genes to seeding of codon repeats (Nakachi et al. 1997; Nishizawa and Nishizawa 1998; Brock, Anderson, and Monckton 1999). Brock, Anderson, and Monckton (1999) have even suggested that local base composition affects the frequency of indel mutations at CAG repeats. Another possibility is an effect of local mutation rate. Kruglyak et al. (1998) have suggested that the equilibrium length of microsatellites is a consequence of the balance between the rates of point and slippage mutation. Incorporation of point mutations into repeats reduces their rate of length change during evolution (Albà, Santibáñez-Koref, and Hancock 1999a). If either or both of these parameters varied across a genome, this could affect the accumulation of tandem repeats. Finally, Djian, Hancock, and Chana (1996) have suggested that codon repeats in disease genes are flanked by regions with a relatively high frequency of acceptance of point mutations. Mutational instability of regions immediately flanking CA microsatellites has also been suggested by Brohede and Ellegren (1999). High rates of sequence change could reflect a relatively low level of purifying selection in the vicinity of repeats. Selective forces could differ between genes and subregions of genes, depending on the phenotypic consequences of mutations in these different locations. These differences could affect the probability of tandem repeats arising, and, in particular, expanding, during evolution (Nishizawa, Nishizawa, and Kim 1999). The recent demonstration for Saccharomyces cerevisiae that transcription factors and protein kinases are significantly overrepresented among proteins that contain polyglutamine repeats (Albà, Santibáñez-Koref, and Hancock 1999b) also indicates a role for selective constraints in the evolution of these structures, although their functional significance remains unclear (Schmid and Tautz 1999).
Here, we addressed the question of the forces giving rise to the evolutionary expansion of CAG repeats in triplet expansion disease and other genes by comparing the lengths of CAG repeats in humans and mice and by considering the base and codon compositions and rates of synonymous and nonsynonymous substitution in CAG repeat-containing genes. We found no evidence of a preferential accumulation of CAG repeats in the human genome relative to the mouse genome or of differences in the nature of the selection acting on genic positioning of CAG repeats in the two species. When we considered pairs of proteins that contained a CAG repeat in one species but not the other, we found no differences in the properties of surrounding sequences. However, we did find an overrepresentation, relative to the average amino acid usage in humans and mice, of the amino acids proline, glutamine, histidine, and serine, which may have given rise to biases in the gene sequences and predisposed them to accumulating repeats. We also observed locally high levels of nonsynonymous base substitution in the neighborhood of repeats in genes containing a repeat in only one species, but low levels in genes in which repeats were conserved between humans and mice. We combine these observations to propose a hypothesis to explain the evolution of these repeats.
Materials and Methods
For comparative analysis of database sequences containing CAG repeats of length 7 or more, the GenBank and EMBL DNA databases, including EST and STS subgroups, were analyzed using routines from the GCG package, version 9.1 (Genetics Computer Group 1997), unless otherwise noted. The databases were searched using the pattern recognition routine FINDPATTERNS. Entries showing >95% identity to one another upon multiple sequence alignments using PILEUP (Genetics Computer Group 1997), CLUSTAL W, version 1.7 (Thompson, Higgins, and Gibson 1994), and FASTA (Pearson and Lipman 1988) were considered to represent the same sequence and grouped together. This allowed for sequencing errors without grouping members of gene families together as single loci. The sequence with the longest array was again taken as the representative from each of these groups. Database entries were again obtained using ENTREZ. The genic locations of repeats were identified using sequence annotations where these were available.
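The pattern search described above can be illustrated with a minimal sketch. It is not the FINDPATTERNS routine itself; it only assumes that a repeat means an uninterrupted run of at least seven CAG units on the strand given, and the function name `find_cag_repeats` is hypothetical.

```python
import re

def find_cag_repeats(seq, min_units=7):
    """Return (start, n_units) for each uninterrupted CAG run.

    start is a 0-based index into seq; only the given strand is
    scanned, so CTG runs on the opposite strand are not reported.
    """
    pattern = re.compile(r"(?:CAG){%d,}" % min_units)
    hits = []
    for match in pattern.finditer(seq.upper()):
        n_units = (match.end() - match.start()) // 3
        hits.append((match.start(), n_units))
    return hits

# Hypothetical toy sequence: two flanking bases, eight CAG units, two more bases.
toy = "TT" + "CAG" * 8 + "AA"
print(find_cag_repeats(toy))  # [(2, 8)]
```

Because the quantifier is greedy, each hit covers the longest run starting at that position, which is convenient when the longest array in a group of entries is to be taken as the representative.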
Sequence Analysis Methods
Tandem codon arrays of length 5 were identified using ARRAYFINDER (Hancock et al. 1999). A modified version of ARRAYFINDER (PROTARRAY) allowed identification of all amino acid tandem repeats of this length. cDNA codon frequencies were calculated using the GCG program CODONFREQUENCY (Genetics Computer Group 1997). These frequencies were used to calculate overall and third-codon-position base compositions using a commercially available spreadsheet, which was also used to carry out most statistical tests. Other statistical tests were carried out using the SPSS package and the VassarStats web server (http://faculty.vassar.edu/lowry/VassarStats.html). Significance thresholds were subjected to Bonferroni adjustment to take into account multiple testing. Significance values quoted in the text are also Bonferroni-adjusted. Expected amino acid frequencies in cDNAs were calculated on the basis of overall codon frequency tables for mice and humans obtained from the CUTG database server (Nakamura, Gojobori, and Ikemura 2000) at http://www.kazusa.or.jp/codon/. To calculate synonymous and nonsynonymous DNA sequence divergences (Ks and Ka), sequence pairs were aligned using the LaserGene program MEGALIGN (DNASTAR, Madison, Wis.). Alignments were calculated by translating cDNAs into protein sequences and using the method of Hein (1990), which coped better with sequences of unequal length than the Clustal algorithm (Higgins and Sharp 1989) as implemented in MEGALIGN. Ks and Ka for sequence pairs were calculated using MEGA, version 1.01 (Kumar, Tamura, and Nei 1993), using the Jukes-Cantor correction for saturation (Jukes and Cantor 1969). We excluded all repetitive regions from the analysis. Regions to be excluded were initially identified by length difference between species (i.e., presence of an indel in the alignment). The limits of the repeat region were then defined by extending the repeat as far as the last codon adjacent to the repeat that was identical in two out of three positions to the tandemly repeated codon in either species. This excluded not only CAG repeats, but also all other length-varying codon repeats.
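Two steps in this paragraph can be sketched in a few lines. The sketch below is one possible reading of the boundary rule (extend outward while the neighboring codon matches the repeated codon at two or more of three positions), applied to a single sequence, together with the generic Jukes-Cantor formula d = -(3/4) ln(1 - 4p/3). It does not reproduce MEGA's full procedure for Ks and Ka, and all function names are hypothetical.

```python
import math

def matches_two_of_three(codon, repeat_codon):
    """True if the codon agrees with repeat_codon at 2 or 3 positions."""
    return sum(a == b for a, b in zip(codon, repeat_codon)) >= 2

def extend_repeat(codons, start, end, repeat_codon="CAG"):
    """Widen the half-open codon interval [start, end) around a repeat.

    codons[start:end] is assumed to be the core tandem repeat; flanking
    codons are absorbed while they match at two of three positions.
    """
    while start > 0 and matches_two_of_three(codons[start - 1], repeat_codon):
        start -= 1
    while end < len(codons) and matches_two_of_three(codons[end], repeat_codon):
        end += 1
    return start, end

def jukes_cantor(p):
    """Jukes-Cantor corrected distance for an observed proportion p of differences."""
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)

# Hypothetical toy coding sequence, already split into codons.
codons = ["ATG", "CAA", "CAG", "CAG", "CAG", "CCG", "TTT"]
print(extend_repeat(codons, 2, 5))  # (1, 6): CAA and CCG are absorbed
```

In the analysis described above the rule is applied with respect to either species, so in practice the excluded region would be the union of the intervals found in the two aligned sequences.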
Results
To further investigate whether the lengths of human and mouse CAG repeats differed, we screened databases for tandem CAG repeats of length 7 or more in the two species. We identified all repeats, irrespective of their locations within genes, and did not restrict our search to pairs of homologous sequences. Mean lengths (in base pairs) for these repeats were 29.06 (median 27, N = 205) for humans and 36.05 (median 33, N = 63) for mice. The length distributions were significantly different (P < 0.001, Mann-Whitney U test), with mice tending to have longer CAG repeats than humans. We therefore found no bias toward longer CAG repeats in humans versus mice at the whole-genome level and, indeed, found evidence of the opposite bias.
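For readers unfamiliar with the test, the Mann-Whitney U statistic can be computed directly from its pairwise definition, as in the minimal sketch below. The repeat lengths used are invented toy values, not the data analyzed here, and the function name is hypothetical; a P value would additionally require the exact or normal-approximation distribution of U.

```python
def mann_whitney_u(xs, ys):
    """Return (U_x, U_y) for two samples.

    U_x counts the (x, y) pairs in which x exceeds y, with ties
    contributing one half; U_x + U_y always equals len(xs) * len(ys).
    """
    u_x = 0.0
    for x in xs:
        for y in ys:
            if x > y:
                u_x += 1.0
            elif x == y:
                u_x += 0.5
    return u_x, len(xs) * len(ys) - u_x

# Hypothetical repeat lengths in base pairs (toy values only).
human_toy = [21, 24, 27, 30]
mouse_toy = [27, 33, 36, 39]
print(mann_whitney_u(human_toy, mouse_toy))
```

A small U for one sample relative to the product of the sample sizes indicates that its values tend to be lower than those of the other sample.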
There is no a priori reason to expect tandem repeats of CAG to lie in any particular reading frame of an exon unless selection has constrained the reading frames in which these repeats have been able to expand. Frame specificity of this kind has been reported previously (Stallings 1994). To test for any global difference in this pattern (and therefore in the selection causing it) between humans and mice, we investigated the locations of the identified repeats that lay within adequately annotated database sequences (table 2). CAG repeats were preferentially found in the reading frame encoding glutamine (reading frame 1 in table 2) in both humans and mice (P < 0.0001 for mice, humans, and overall; chi-square against an even distribution in all six reading frames, df = 5). There was no significant difference in repeat distribution between species (chi-square test for inhomogeneity in the 2 × 9 contingency table; P > 0.05; df = 8). Thus, there appear to be no strong differences in the selective forces acting on the locations of CAG repeats in the human and mouse genomes.
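The two chi-square calculations can be sketched as follows. The counts are invented for illustration and are not those of table 2; the function names are hypothetical. The sketch returns only the test statistic and degrees of freedom, leaving the P value to a chi-square table or statistics package.

```python
def chi_square_uniform(counts):
    """Goodness-of-fit statistic against an even distribution.

    Returns (statistic, degrees_of_freedom) with df = categories - 1,
    which gives df = 5 for six reading frames.
    """
    total = sum(counts)
    expected = total / len(counts)
    statistic = sum((c - expected) ** 2 / expected for c in counts)
    return statistic, len(counts) - 1

def contingency_df(n_rows, n_cols):
    """Degrees of freedom of an r x c contingency table."""
    return (n_rows - 1) * (n_cols - 1)

# Hypothetical counts of repeats in six reading frames (toy values only).
toy_frames = [40, 6, 5, 4, 3, 2]
print(chi_square_uniform(toy_frames))
print(contingency_df(2, 9))  # 8, matching the df quoted for the 2 x 9 table
```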
Base, Codon, and Amino Acid Composition
As base composition has been proposed to be an important factor in driving CAG repeat evolution (Brock, Anderson, and Monckton 1999), we attempted to identify common sequence properties of genes containing disease-causing CAG repeats and consistent changes in homologs containing repeats relative to homologs not containing repeats by analyzing the base compositions of the cDNA sequences for the 28 gene pairs. For both mouse and human homologs and for all gene groups, G+C compositions were on average higher than expected compositions calculated from the CUTG table of codon frequencies (table 3). The overall mean G+C composition (i.e., for groups B, M, and H pooled) deviated significantly from expectation in mice and humans (P < 0.05; two-tailed t-test). Third-codon-position base compositions were also higher than expected for all groups, but the pooled difference did not approach significance. Interspecies differences in base composition were not statistically significant. Thus, we found a generally high G+C content in the set of genes in both species, even when the gene did not contain a repeat.
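Overall and third-codon-position G+C compositions are simple to compute directly from a coding sequence, as in the sketch below. This is an illustration that works from the sequence itself rather than from codon frequency tables, and it assumes an in-frame coding sequence whose length is a multiple of three; the function names are hypothetical.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)

def gc3_fraction(cds):
    """Fraction of G and C at third codon positions of an in-frame CDS."""
    third_positions = cds.upper()[2::3]
    return gc_fraction(third_positions)

# Hypothetical toy coding sequence (two codons: ATG GCC).
toy_cds = "ATGGCC"
print(gc_fraction(toy_cds), gc3_fraction(toy_cds))
```

Comparing GC3 with overall G+C in this way is what allows a bias caused by synonymous codon choice to be separated from one caused by amino acid composition.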
As these analyses indicated significantly biased amino acid compositions, at least for groups B and H, we then calculated the relative representations of amino acids within the 28 proteins, again calculating expectations based on species codon frequencies (table 4). Significances of the observed/expected (O/E) values so calculated were estimated using the same set of sequences as above, calculating O/E values for the same numbers of random groups of 8 or 10 proteins. Confidence levels were estimated for each amino acid separately after adjusting for multiple tests. In both human and mouse data sets, four amino acids (Gln, Pro, His, and Ser) showed a significant overall excess (P < 0.05) and showed an excess in all three groups.
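The O/E calculation and a resampling estimate of its significance might look like the sketch below. It assumes that significance is judged by drawing random groups of proteins from a reference pool and asking how often their O/E value is at least as large as the observed one; the pool, the expected frequency, and all function names are hypothetical, and the toy sequences are not real proteins.

```python
import random

def observed_over_expected(proteins, amino_acid, expected_fraction):
    """O/E for one amino acid, pooled over a group of protein sequences."""
    total = sum(len(p) for p in proteins)
    observed = sum(p.count(amino_acid) for p in proteins) / total
    return observed / expected_fraction

def resampled_p_value(pool, group_size, amino_acid, expected_fraction,
                      observed_oe, n_draws=1000, seed=0):
    """Fraction of random groups whose O/E is >= the observed O/E."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    hits = 0
    for _ in range(n_draws):
        group = rng.sample(pool, group_size)
        if observed_over_expected(group, amino_acid, expected_fraction) >= observed_oe:
            hits += 1
    return hits / n_draws

# Hypothetical toy data: one-letter amino acid strings, assumed expected Gln frequency 0.25.
pool = ["QQAA", "QAAA", "AAAA", "AAQA", "AAAA", "QQQA"]
oe = observed_over_expected(["QQAA", "QQQA"], "Q", 0.25)
print(oe, resampled_p_value(pool, 2, "Q", 0.25, oe))
```

With several amino acids tested, the resulting P values would still need the multiple-test adjustment described above.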
If a low Ka value is indicative of relatively strong selection acting on a protein, this might also influence the rate of change of the lengths of repeat regions. We therefore investigated the relationship between Ka and the difference in the length of the longest CAG repeat present in each gene, irrespective of the species in which it was found. Ka correlated positively and significantly with this difference (r = 0.420, P < 0.05).
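Assuming that r here denotes Pearson's product-moment correlation coefficient (the text does not say so explicitly), the calculation can be sketched as below. The values are invented toy numbers, not the Ka values or length differences of this study, and the function name is hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical toy values: Ka per gene and repeat length difference in codons.
toy_ka = [0.02, 0.05, 0.10, 0.20]
toy_length_difference = [0, 3, 2, 9]
print(pearson_r(toy_ka, toy_length_difference))
```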
In summary, these results indicate an association of new repeats with regions of high Ka (corresponding to regions of low purifying selection) and no association with regions of high Ks (corresponding to a high local mutation rate).
Discussion
Our data also do not support the suggestion that local base composition has driven the accumulation of repeats within the 28 pairs of homologous repeat-containing genes we considered (Jurka and Pethiyagoda 1995; Nakachi et al. 1997; Nishizawa and Nishizawa 1998; Brock, Anderson, and Monckton 1999). Although we found higher GC and GC3 contents than expected for all of the gene groups studied here, this reflected solely the biased amino acid compositions of the gene products and was not the result of any preferential use of synonymous codons with GC-rich third positions, as would be expected if mutation toward a biased base composition were the force driving the observed biases. We also did not find any difference in base composition between genes containing repeats and genes not containing repeats, which would be expected if changes in base composition drove repeat evolution.
Finally, we found no relationship between mutation rate, as indicated by the synonymous substitution rate, and the emergence of repeats during evolution. This is not consistent with a model whereby repeat evolution in a genomic locality reflects the balance between point and slippage mutation rates there (Kruglyak et al. 1998). However, there is evidence that substitution rates in regions flanking CA microsatellites correlate inversely with repeat length in a larger data set (unpublished data). It is therefore possible that effects of this kind also contribute to the evolution of CAG repeats in genes but that these effects are relatively weak in this data set and/or could not be detected here because of the data set's relatively small size and the correlation between Ka and Ks.
We found three strong patterns in our data set: overrepresentations of certain amino acids, differences in the nonsynonymous substitution rates observed in group B genes compared with group H and M genes, and elevated nonsynonymous substitution rates in the vicinity of repeats in group H and M genes. At the level of amino acid composition, we observed significant overrepresentation of four amino acids, Gln, Pro, Ser, and His, in all genes studied. Along with Gln repeats, we also observed numerous Pro repeats in these proteins. It is likely that the biased amino acid compositions of these genes reflect in some way functional selection on these genes. As these amino acid composition biases are similar in human and mouse proteins, this selection must have taken place before the divergence of the two lineages, one of the most ancient eutherian divergences. The shared overrepresentation of these amino acids between species also indicates that changes in amino acid bias have not driven repeat accumulation. However, the biased amino acid compositions of repeat-containing proteins indicate that such bias might provide a breeding ground for new repeats because new repeats contain an unusual concentration of Gln codons and related codons such as CCG (Pro). The preference for polyglutamine repeats to occur in proteins with these amino acid composition biases could therefore reflect either selection favoring polyglutamine repeats in these proteins as part of a selection for a high Gln content, preferential seeding of CAG repeats in genes with high concentrations of Gln and high GC-content, or both.
We also found a significant difference in overall Ka (but not Ks) between group B proteins and other proteins and a significant bias toward higher Ka (but not Ks) near the Gln repeat in group H+M but not group B proteins. The Ka values for regions flanking repeats in group H+M genes were twice the average for human-mouse sequence pairs calculated by Makalowski and Boguski (1998), 0.201 compared with 0.090, consistent with our suggestion of high rates of sequence change near disease-causing repeats (Djian, Hancock, and Chana 1996), although this difference was not significant (Mann-Whitney U test). These observations indicate that there have been considerably larger differences in strength of selection than in mutation rate in these proteins. If a high Ka value indicates a low level of purifying selection, polyglutamine repeats in proteins in groups H and M could have evolved as effectively neutral structures in a low-purifying-selection environment. Repeats in the group B genes, on the other hand, may have been conserved in a high-purifying-selection environment. The significant correlation between Ka and CAG length difference between species is consistent with this.
The stronger purifying selection acting on the polyglutamine repeats in group B proteins is also consistent with the observation of a significant difference in the lengths of polyglutamine repeats of humans and mice in these genes: there may be differences in the strength or type of selection acting on these repeats between the two species. This, in turn, may reflect in some way the functions of these structures in the two species. However, this difference in repeat length appears to be a special property of genes that have a repeat in both species, as lengths of CAG repeats did not show any evidence of significant difference between species overall. This difference would therefore not appear to be relevant to neutrally evolving repeats, such as those found in the human disease genes.
Whether or not polyglutamine repeats in proteins affect function remains unclear. Sequence analysis has not provided clear evidence for their functional importance (Treier, Pfeifle, and Tautz 1989; Green and Wang 1994; Karlin and Burge 1996; Michalakis and Veuille 1996; Tautz and Nigro 1998; Schmid and Tautz 1999), but biochemical studies have indicated effects on protein-protein interactions (Kazemi-Esfarjani, Trifiro, and Pinsky 1995; Lanz et al. 1995; Pinto and Lobe 1996; Schwechheimer, Smith, and Bevan 1998). Our data may explain this apparent discrepancy, as they suggest that polyglutamine repeats may be neutral in some proteins and not in others and that rapidly evolving repeats are more nearly neutral than conserved repeats. Searches for a functional role for polyglutamine repeats in proteins should therefore focus on proteins, such as those in our group B, that show conservation of Gln repeats over long periods of evolutionary time.
In conclusion, we suggest that the following interplay of forces influences the emergence of polyglutamine repeats. Glutamine repeats emerge preferentially in a sequence environment biased toward an overrepresentation of Gln codons (and possibly also related codons such as CCG). These concentrations occur in a class of proteins enriched in these codons by selection for a high content of Gln (as well as Pro, His, and Ser). Repeats emerge in regions of proteins that are subject to lower-than-average levels of purifying selection (Nishizawa, Nishizawa, and Kim 1999), as indicated by their nonsynonymous divergence rate, although the whole proteins are not subject to atypically low levels of purifying selection. We therefore propose that emerging repeats evolve as essentially neutral structures. As such, we would expect them to be gained or lost in a manner that reflects the underlying dynamics of the mutational process, thought to be predominantly replication slippage. Recent evidence suggests that slippage shows a bias toward expansion for short repeats coupled with shortening of longer repeats (Ellegren 2000; Xu et al. 2000), which would give rise to net expansion of new repeats. However, changes in the strength of purifying selection acting on the region of the protein containing the repeat may result in the repeat ceasing to be a neutral structure and becoming fixed in length, as appears to have happened in the proteins in our group B, which contain a repeat in both species. Fixation of repeats, or the susceptibility of proteins to incorporation of them, may reflect the general functional class of the protein concerned, as certain classes of proteins in Saccharomyces cerevisiae, notably transcription factors and protein kinases, are significantly enriched in Gln repeats (Albà, Santibáñez-Koref, and Hancock 1999b). If purifying selection plays an important role in regulating the emergence of CAG repeats in proteins, the recent suggestion that nonsynonymous substitution rates may vary systematically around mammalian genomes (Williams and Hurst 2000), perhaps reflecting variation in recombination frequency along chromosomes, may have implications for the chromosomal distribution of repeat-containing proteins.
Footnotes
1 Keywords: CAG repeats, triplet expansion diseases, simple sequences, natural selection
2 Address for correspondence and reprints: John M. Hancock, Department of Computer Science, Royal Holloway University of London, Egham, Surrey TW20 0EX, United Kingdom. E-mail: j.hancock@dcs.rhul.ac.uk
Literature Cited
Abbott, C., and D. Chambers. 1994. Analysis of CAG trinucleotide repeats from mouse cDNA sequences. Ann. Hum. Genet. 58:87–94.
Albà, M. M., M. F. Santibáñez-Koref, and J. M. Hancock. 1999a. Conservation of polyglutamine tract size between mice and humans depends on codon interruption. Mol. Biol. Evol. 16:1641–1644.
Albà, M. M., M. F. Santibáñez-Koref, and J. M. Hancock. 1999b. Amino acid reiterations in yeast are overrepresented in particular classes of proteins and show evidence of a slippage-like mutational process. J. Mol. Evol. 49:789–797.
Albanese, V., S. Holbert, C. Saada et al. (14 co-authors). 1998. CAG/CTG and CGG/GCC repeats in human brain reference cDNAs: outcome in searching for new dynamic mutations. Genomics 47:414–418.
Altschul, S. F., W. Gish, W. Miller, E. W. Myers, and D. J. Lipman. 1990. Basic local alignment search tool. J. Mol. Biol. 215:403–410.
Amos, W. 1999. A comparative approach to the study of microsatellite evolution. Pp. 66–79 in D. B. Goldstein and C. Schlötterer, eds. Microsatellites: evolution and applications. Oxford University Press, Oxford, England.
Aoki, M., L. Koranyi, A. C. Riggs et al. (11 co-authors). 1996. Identification of trinucleotide repeat-containing genes in human pancreatic islets. Diabetes 45:157–164.
Brock, G. J. R., N. H. Anderson, and D. G. Monckton. 1999. Cis-acting modifiers of expanded CAG/CTG triplet repeat expandability: associations with flanking GC content and proximity to CpG islands. Hum. Mol. Genet. 8:1061–1067.
Brohede, J., and H. Ellegren. 1999. Microsatellite evolution: polarity of substitutions within repeats and neutrality of flanking sequences. Proc. R. Soc. Lond. B Biol. Sci. 266:825–833.
Bulle, F., N. Chiannilkulchai, A. Pawlak, J. Weissenbach, G. Gyapay, and G. Guellaen. 1997. Identification and chromosomal localization of human genes containing CAG/CTG repeats expressed in testis and brain. Genome Res. 7:705–715.
Chambers, D. M., and C. M. Abbott. 1996. Isolation and mapping of novel mouse brain cDNA clones containing trinucleotide repeats, and demonstration of novel alleles in recombinant inbred strains. Genome Res. 6:715–723.
Djian, P., J. M. Hancock, and H. S. Chana. 1996. Codon repeats in genes associated with human diseases: fewer repeats in the genes of nonhuman primates and nucleotide substitutions concentrated at the sites of reiteration. Proc. Natl. Acad. Sci. USA 93:417–421.
Ellegren, H. 2000. Heterogeneous mutation processes in human microsatellite DNA sequences. Nat. Genet. 24:400–402.
Ellegren, H., C. R. Primmer, and B. C. Sheldon. 1995. Microsatellite evolution: directionality or bias? Nat. Genet. 11:360–362.
Genetics Computer Group. 1997. Wisconsin package. Version 9.1. GCG, Madison, Wis.
Graur, D. 1985. Amino acid composition and the evolutionary rates of protein-coding genes. J. Mol. Evol. 22:53–62.
Green, H., and N. Wang. 1994. Codon reiteration and the evolution of proteins. Proc. Natl. Acad. Sci. USA 91:4298–4302.
Hancock, J. M., P. J. Shaw, F. Bonneton, and G. A. Dover. 1999. High sequence turnover in the regulatory regions of the developmental gene hunchback in insects. Mol. Biol. Evol. 16:253–265.
Hein, J. J. 1990. Unified approach to alignment and phylogenies. Methods Enzymol. 183:626–645.
Higgins, D. G., and P. M. Sharp. 1989. Fast and sensitive multiple sequence alignments on a microcomputer. Comput. Appl. Biosci. 5:151–153.
Jiang, J. X., R. H. Deprez, E. C. Zwarthoff, and P. H. Riegman. 1995. Characterization of four novel CAG repeat-containing cDNAs. Genomics 30:91–93.
Jukes, T. H., and C. R. Cantor. 1969. Evolution of protein molecules. Pp. 21–132 in H. N. Munro, ed. Mammalian protein metabolism. Academic Press, New York.
Jurka, J., and C. Pethiyagoda. 1995. Simple repetitive DNA sequences from primates: compilation and analysis. J. Mol. Evol. 40:120–126.
Karlin, S., and C. Burge. 1996. Trinucleotide repeats and long homopeptides in genes and proteins associated with nervous system disease and development. Proc. Natl. Acad. Sci. USA 93:1560–1565.
Kazemi-Esfarjani, P., M. A. Trifiro, and L. Pinsky. 1995. Evidence for a repressive function of the long polyglutamine tract in the human androgen receptor: possible pathogenetic relevance for the (CAG)n-expanded neuronopathies. Hum. Mol. Genet. 4:523–527.
Kim, S. J., B. H. Shon, J. H. Kang, K. S. Hahm, O. J. Yoo, Y. S. Park, and K. K. Lee. 1997. Cloning of novel trinucleotide-repeat (CAG) containing genes in mouse brain. Biochem. Biophys. Res. Commun. 240:239–243.
King, B. L., G. Sirugo, J. H. Nadeau, T. J. Hudson, K. K. Kidd, B. M. Kacinski, and M. Schalling. 1998. Long CAG/CTG repeats in mice. Mamm. Genome 9:392–393.
Kruglyak, S., R. T. Durrett, M. D. Schug, and C. F. Aquadro. 1998. Equilibrium distributions of microsatellite repeat length resulting from a balance between slippage events and point mutations. Proc. Natl. Acad. Sci. USA 95:10774–10778.
Kumar, S., T. Tamura, and M. Nei. 1993. MEGA: molecular evolutionary genetics analysis. Version 1.01. Pennsylvania State University, University Park.
Lanz, R. B., S. Wielands, M. Hug, and S. Rusconi. 1995. A transcriptional repressor obtained by alternative translation of a trinucleotide repeat. Nucleic Acids Res. 23:138–145.
Li, S. H., M. G. McInnis, R. L. Margolis, S. E. Antonarakis, and C. A. Ross. 1993. Novel triplet repeat containing genes in human brain: cloning, expression, and length polymorphisms. Genomics 16:572–579.
Makalowski, W., and M. S. Boguski. 1998. Evolutionary parameters of the transcribed mammalian genome: an analysis of 2,820 orthologous rodent and human sequences. Proc. Natl. Acad. Sci. USA 95:9407–9412.
Margolis, R. L., M. R. Abraham, S. B. Gatchell, S. H. Li, A. S. Kidwai, T. S. Breschel, O. C. Stine, C. Callahan, M. G. McInnis, and C. A. Ross. 1997. cDNAs with long CAG trinucleotide repeats from human brain. Hum. Genet. 100:114–122.
Matassi, G., P. M. Sharp, and C. Gautier. 1999. Chromosomal location effects on gene sequence evolution in mammals. Curr. Biol. 9:786–791.
Michalakis, Y., and M. Veuille. 1996. Length variation of CAG/CAA trinucleotide repeats in natural populations of Drosophila melanogaster and its relation to the recombination rate. Genetics 143:1713–1725.
Morin, P. A., P. Mahboubi, S. Wedel, and J. Rogers. 1998. Rapid screening and comparison of human microsatellite markers in baboons: allele size is conserved, but allele number is not. Genomics 53:12–20.
Mouchiroud, D., C. Gautier, and G. Bernardi. 1995. Frequencies of synonymous substitutions in mammals are gene-specific and correlated with frequencies of nonsynonymous substitutions. J. Mol. Evol. 40:107–113.
Nakachi, Y., T. Hayakawa, H. Oota, K. Sumiyama, L. Wang, and S. Ueda. 1997. Nucleotide compositional constraints on genomes generate alanine-, glycine-, and proline-rich structures in transcription factors. Mol. Biol. Evol. 14:1042–1049.
Nakamura, Y., T. Gojobori, and T. Ikemura. 2000. Codon usage tabulated from international DNA sequence databases: status for the year 2000. Nucleic Acids Res. 25:244–245.
Neri, C., V. Albanese, A. S. Lebre et al. (23 co-authors). 1996. Survey of CAG/CTG repeats in human cDNAs representing new genes: candidates for inherited neurological disorders. Hum. Mol. Genet. 5:1001–1009.
Nishizawa, M., and K. Nishizawa. 1998. Biased usages of arginines and lysines in proteins are correlated with local-scale fluctuations of the G + C content of DNA sequences. J. Mol. Evol. 47:385–393.
Nishizawa, K., M. Nishizawa, and K. S. Kim. 1999. Tendency for local repetitiveness in amino acid usages in modern proteins. J. Mol. Biol. 294:937–953.
Ohta, T., and Y. Ina. 1995. Variation in synonymous substitution rates among mammalian genes and the correlation between synonymous and nonsynonymous divergences. J. Mol. Evol. 41:717–720.
Pawlak, A., N. Chiannilkulchai, W. Ansorge, F. Bulle, J. Weissenbach, G. Gyapay, and G. Guellaen. 1998. Identification and mapping of 26 human testis mRNAs containing CAG/CTG repeats. Mamm. Genome 9:745–748.
Pearson, W. R., and D. J. Lipman. 1988. Improved tools for biological sequence comparison. Proc. Natl. Acad. Sci. USA 85:2444–2448.
Pinto, M., and C. G. Lobe. 1996. Products of the grg (Groucho-related gene) family can dimerize through the amino-terminal Q domain. J. Biol. Chem. 271:33026–33031.
Reddy, P. H., E. Stockburger, P. Gillevet, and D. A. Tagle. 1997. Mapping and characterization of novel (CAG)n repeat cDNAs from adult human brain derived by the oligo capture method. Genomics 46:174–182.
Riggins, G. J., L. K. Lokey, J. L. Chastain, H. A. Leiner, S. L. Sherman, K. D. Wilkinson, and S. T. Warren. 1992. Human genes containing polymorphic trinucleotide repeats. Nat. Genet. 2:186–191.
Rubinsztein, D. C. 1999. Trinucleotide expansion mutations cause diseases which do not conform to classical Mendelian expectations. Pp. 80–97 in D. B. Goldstein and C. Schlötterer, eds. Microsatellites: evolution and applications. Oxford University Press, Oxford, England.
Rubinsztein, D. C., B. Amos, and G. Cooper. 1999. Microsatellite and trinucleotide-repeat evolution: evidence for mutational bias and different rates of evolution in different lineages. Philos. Trans. R. Soc. Lond. B Biol. Sci. 354:1095–1099.
Rubinsztein, D. C., W. Amos, J. Leggo, S. Goodburn, S. Jain, S. H. Li, R. L. Margolis, C. A. Ross, and M. A. Ferguson-Smith. 1995a. Microsatellite evolution - evidence for directionality and variation in rate between species. Nat. Genet. 10:337–343.
Rubinsztein, D. C., W. Amos, J. Leggo, S. Goodburn, R. S. Ramesar, J. Old, R. Bontrop, R. McMahon, D. E. Barton, and M. A. Ferguson-Smith. 1994. Mutational bias provides a model for the evolution of Huntington's disease and predicts a general increase in disease prevalence. Nat. Genet. 7:525–530.
Rubinsztein, D. C., J. Leggo, G. A. Coetzee, R. A. Irvine, M. Buckley, and M. A. Ferguson-Smith. 1995b. Sequence variation and size ranges of CAG repeats in the Machado-Joseph disease, spinocerebellar ataxia type 1 and androgen receptor genes. Hum. Mol. Genet. 4:1585–1590.
Schmid, K. J., and D. Tautz. 1999. A comparison of homologous developmental genes from Drosophila and Tribolium reveals major differences in length and trinucleotide repeat content. J. Mol. Evol. 49:558–566.
Schwechheimer, C., C. Smith, and M. W. Bevan. 1998. The activities of acidic and glutamine-rich transcriptional activation domains in plant cells: design of modular transcription factors for high-level expression. Plant Mol. Biol. 36:195–204.
Stallings, R. L. 1994. Distribution of trinucleotide microsatellites in different categories of mammalian genomic sequence: implications for human genetic diseases. Genomics 21:116–121.
Tautz, D., and L. Nigro. 1998. Microevolutionary divergence pattern of the segmentation gene hunchback in Drosophila. Mol. Biol. Evol. 15:1403–1411.
Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673–4680.
Ticher, A., and D. Graur. 1989. Nucleic acid composition, codon usage, and the rate of synonymous substitution in protein-coding genes. J. Mol. Evol. 28:286–298.
Treier, M., C. Pfeifle, and D. Tautz. 1989. Comparison of the gap segmentation gene hunchback between Drosophila melanogaster and Drosophila virilis reveals novel modes of evolutionary change. EMBO J. 8:1517–1525.
Williams, E. J. B., and L. D. Hurst. 2000. The proteins of linked genes evolve at similar rates. Nature 407:900–903.
Xu, X., M. Peng, Z. Fang, and X. Xu. 2000. The direction of microsatellite mutations is dependent upon allele length. Nat. Genet. 24:396–399.
Zuhlke, C., R. Kiehl, A. Johannsmeyer, K. H. Grzeschik, and E. Schwinger. 1999. Isolation and characterization of novel CAG repeat containing genes expressed in human brain. DNA Seq. 10:1–6.