UMR 5558 CNRS, Université Lyon I, Villeurbanne, France
Correspondence: E-mail: meunier{at}biomserv.univ-lyon1.fr.
Abstract |
---|
Key Words: isochore; recombination; biased gene conversion; GC-content; human genome
Introduction |
---|
The question of the evolutionary forces and molecular processes responsible for this peculiar genomic landscape of mammalian chromosomes remains highly debated (Eyre-Walker and Hurst 2001; Galtier et al. 2001). It has been recently conjectured that recombination might influence the evolution of base composition (Galtier et al. 2001). In mammals, several observations support this hypothesis. First, there is a positive correlation between the GC-content of genomic DNA and the local rate of crossover (Kong et al. 2002). Second, multigenic families that are subject to concerted evolution (i.e., ectopic recombination) are GC-rich (Galtier et al. 2001; Galtier 2003). Finally, it has been shown that the translocation of the Fxy gene into the pseudoautosomal region (where the recombination rate is extremely high) has resulted in a striking increase of its GC-content in the mouse lineage (Montoya-Burgos, Boursot, and Galtier 2003). The positive correlation between recombination rate and GC-content seems to be a widespread phenomenon, because it has also been observed in yeast, Drosophila, and nematode (Gerton et al. 2000; Marais, Mouchiroud, and Duret 2001; Birdsell 2002; Piganeau and Marais 2002).
Note, however, that such a correlation does not indicate the direction of causality between the two variables. According to some authors, the correlation observed in yeast does not occur because recombination affects substitution patterns, but rather because high GC-contents promote recombination (Gerton et al. 2000; Blat et al. 2002; Petes and Merker 2002). Moreover, it should be stressed that the correlation between crossover rate and GC-content is relatively weak in humans (r² = 0.15; Kong et al. 2002). Thus, at first sight, one might conclude that recombination is, at most, a minor determinant of base substitution patterns. Yet, it is important to note that this correlation was measured between two variables that reflect processes operating on different time scales: the rate of crossover is derived from linkage maps measured on extant populations, whereas the GC-content results from the average substitution pattern to which sequences have been subject over a long evolutionary period. Furthermore, many observations indicate that recombination rates may change rapidly during evolution as a consequence of inversions, translocations, and possibly point mutations (True, Mercer, and Laurie 1996; Depaulis et al. 2000; Jeffreys and Neumann 2002; Montoya-Burgos, Boursot, and Galtier 2003). For example, the human linkage map is 30% larger than that of baboon (Rogers et al. 2000). Variations of recombination rate have also been reported among more closely related species, and even within populations (True, Mercer, and Laurie 1996; Depaulis et al. 2000; Kong et al. 2002; Navarro and Barton 2003). To determine whether recombination drives the evolution of base composition, it is therefore necessary to use estimates of recombination rates and substitution patterns measured on similar time scales.
For this purpose, we analyzed the recent pattern of neutral substitutions, which occurred after the human and chimpanzee divergence, in 14.3 Mb of orthologous noncoding DNA sequences. This analysis demonstrates for the first time a genome-wide correlation between the substitution pattern and the rate of crossover, which indicates that recombination is the major determinant of the evolution of isochores in the genome of primates. The mechanism by which recombination affects the GC-content, as well as the consequences of variation of recombination rate on the evolution of base composition in the genome of sexual organisms, is discussed.
Methods |
---|
Recombination Rate
The rates of crossover in the human genome were taken from Kong et al. (2002). The average recombination rates in mouse chromosomes were computed from the Whitehead Mouse Genetic Map (URL: http://carbon.wi.mit.edu:8000/cgi-bin/mouse/index#genetic).
Inferring Substitution Rates
Substitutions were inferred in human and chimpanzee lineages using unweighted parsimony on informative sites, with the baboon as outgroup. In our analyses, we only considered substitutions occurring in noncoding regions (as defined by Ensembl annotations). It is known that because of multiple substitutions, parsimony may be misleading. Given the evolutionary distances considered here, only hyper-mutable 5'-CpG-3' dinucleotides (hereafter noted CpG) are expected to generate homoplasy. We therefore considered three classes of sites: (1) sites not immediately preceded by a C in 5' or followed by a G in 3' in any of the human, chimpanzee, or baboon sequences, i.e., sites that are expected never to have been part of a CpG doublet since the last common ancestor of the three species (CpG-free sites); (2) sites for which the ancestral human/chimpanzee state inferred by parsimony was part of a CpG (CpG-anc sites); (3) other sites. All simulations (see below) revealed that (1) sites that we inferred as CpG-free truly evolved without being part of a CpG; (2) sites that we inferred as CpG-anc were truly part of an ancestral CpG before the human/chimpanzee split; (3) all substitution rates could be accurately estimated on CpG-free and CpG-anc sites, simply by dividing the number of observed changes by the number of inferred ancestral sites (rate estimation errors <3%). Using the first site category, we inferred by parsimony six rates (pooling together complementary rates): 4 transversion rates (A→T + T→A, G→C + C→G, A→C + T→G, C→A + G→T) and 2 transition rates (G→A + C→T, A→G + T→C). The transition rate at CpG sites (C→T + G→A) was estimated using the second site category. For better rate estimates, we pooled substitutions in human and chimpanzee lineages.
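The parsimony step described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names (`infer_substitution`, `pooled_class`, `count_substitutions`) are hypothetical, the input is assumed to be three already-aligned sequences of equal length, and the CpG-free/CpG-anc site classification is deliberately left out. A site is treated as informative when human and chimpanzee differ and the baboon outgroup matches exactly one of them.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def infer_substitution(human, chimp, baboon):
    """Return (lineage, ancestral, derived) for an informative site, else None.

    The baboon base is taken as the ancestral human/chimpanzee state
    (unweighted parsimony with an outgroup).
    """
    if any(base not in COMPLEMENT for base in (human, chimp, baboon)):
        return None  # gap or ambiguous base
    if human == chimp:
        return None  # no change inferred on the human or chimpanzee branch
    if baboon == human:
        return ("chimpanzee", baboon, chimp)
    if baboon == chimp:
        return ("human", baboon, human)
    return None  # three different states: not resolved by parsimony


def pooled_class(ancestral, derived):
    """Pool a change with its complementary change, e.g. G>A with C>T."""
    forward = f"{ancestral}>{derived}"
    reverse = f"{COMPLEMENT[ancestral]}>{COMPLEMENT[derived]}"
    return " + ".join(sorted((forward, reverse)))


def count_substitutions(human_seq, chimp_seq, baboon_seq):
    """Count pooled substitution classes, pooling both lineages."""
    counts = Counter()
    for h, c, b in zip(human_seq, chimp_seq, baboon_seq):
        event = infer_substitution(h, c, b)
        if event is not None:
            _, ancestral, derived = event
            counts[pooled_class(ancestral, derived)] += 1
    return counts


# Toy alignment (made-up data): one G>A change on the chimpanzee branch.
toy_counts = count_substitutions("ACGTA", "ACATA", "ACGTA")
print(dict(toy_counts))
```

Dividing each count by the number of inferred ancestral sites of the corresponding type would then give a rate estimate, as described in the text.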
Simulations
We checked the quality of parsimony inferences on the three classes of sites by performing simulations with the following parameters: transition/transversion ratio: 2.75; increase of the transition rate at CpG: ×10; human/chimpanzee divergence: 1%; human/baboon divergence: 5%. Simulations were run as follows: we first let a sequence evolve until each site had experienced an average of 10 substitutions (i.e., to reach equilibrium), using 0.36, 0.42, and 0.52 as GC-bias values; we then simulated the sequence evolution on the human/chimpanzee/baboon phylogenetic tree topology, either leaving the GC-bias value unchanged or setting it to 0.35 (i.e., mimicking a non-equilibrium situation).
We also used simulations to estimate the time needed to reach the equilibrium GC-content. As above, we started with sequences at equilibrium (initial GC-content: GCi), and then changed the GC-bias parameter and let sequences evolve until they reached their new equilibrium (GCn). We plotted the GC-content as a function of the number of substitutions (d), and we measured on this graph the value d1/2 corresponding to half of the distance between GCi and GCn. We repeated simulations for different values of GCi (0.35, 0.40, 0.45, 0.50, and 0.55) and of GCn (0.35 or 0.40), similar to those observed in our data. The average value of d1/2 was 0.75 substitutions per site (varying from 0.7 to 0.8, depending on the values of GCi and GCn). Given the per-year substitution rate observed between human and chimpanzee, this value of d1/2 corresponds approximately to 750 Myr.
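The conversion from d1/2 to years is a simple division, sketched below in Python. The per-year substitution rate is not given in the text; the value of 1e-9 substitutions per site per year used here is an assumption chosen for illustration, and the function name is hypothetical.

```python
def substitutions_to_years(d_per_site, rate_per_site_per_year):
    """Convert a number of substitutions per site into elapsed years."""
    if rate_per_site_per_year <= 0:
        raise ValueError("the substitution rate must be positive")
    return d_per_site / rate_per_site_per_year


D_HALF = 0.75        # substitutions per site, value reported in the text
ASSUMED_RATE = 1e-9  # substitutions per site per year (assumption, not from the text)

half_time_myr = substitutions_to_years(D_HALF, ASSUMED_RATE) / 1e6
print(f"d1/2 corresponds to about {half_time_myr:.0f} Myr")
```

With a different assumed rate the figure scales proportionally, so the result should be read as an order of magnitude.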
Analysis of Regional Substitution Patterns
Human autosomal chromosomes were divided into 1 Mb non-overlapping windows (referred to as loci). A locus was retained for the analysis only if (1) it was associated with a marker from the genetic map (if a locus was associated with more than one marker, we used the mean crossover rate value for the analysis); (2) it contained alignments with more than 100 AT↔GC (i.e., from A or T to G or C or the opposite) substitutions in human and chimpanzee lineages; (3) the distributions of AT→GC (i.e., from A or T to G or C) and GC→AT substitutions did not differ significantly between the two lineages (Fisher exact test, P > 0.05). A total of 33 loci were finally selected for the analysis, with a mean of 1,743 inferred AT↔GC substitutions in human and chimpanzee lineages.
It is possible to compute the GC-content toward which a sequence is evolving, given its present substitution pattern (the equilibrium GC-content, hereafter denoted as GC*). GC* is generally computed by using the model proposed by Sueoka (1962): under the assumption that all sites within a sequence evolve independently of each other, the GC-content should converge to GC* = u/(u + v), where u and v are, respectively, the AT→GC and GC→AT substitution rates. This assumption is, however, not valid in vertebrates, where it is known that the rate of mutation of a given base depends on the nature of its neighboring bases, essentially because of the hypermutability of CpG dinucleotides (Arndt, Petrov, and Hwa 2003). We therefore used the seven substitution rates described above as input to derive GC* using the sequence evolution model of Arndt, Burge, and Hwa (2003), which takes into account not only the different transversion and transition rates, but also the high transition rate at CpG dinucleotides. The error in GC* estimates resulting from the use of parsimony appeared negligible in all our simulations (less than 1% error).
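For reference, the simple Sueoka (1962) estimate mentioned above can be written in a few lines of Python. This sketch covers only the site-independent formula GC* = u/(u + v); it is not the neighbor-dependent model of Arndt, Burge, and Hwa (2003) that was actually used for the analysis. The function names and the counts in the example are hypothetical.

```python
def rates_from_counts(n_at_to_gc, n_gc_to_at, n_at_sites, n_gc_sites):
    """Estimate u (AT>GC) and v (GC>AT) as changes per ancestral site."""
    if n_at_sites <= 0 or n_gc_sites <= 0:
        raise ValueError("site counts must be positive")
    u = n_at_to_gc / n_at_sites
    v = n_gc_to_at / n_gc_sites
    return u, v


def sueoka_equilibrium_gc(u, v):
    """Equilibrium GC-content under the site-independent Sueoka model."""
    if u < 0 or v < 0 or u + v == 0:
        raise ValueError("rates must be non-negative and not both zero")
    return u / (u + v)


# Made-up counts for a single 1 Mb locus, for illustration only.
u, v = rates_from_counts(n_at_to_gc=600, n_gc_to_at=900,
                         n_at_sites=600_000, n_gc_sites=400_000)
gc_star = sueoka_equilibrium_gc(u, v)
print(f"u = {u:.5f}, v = {v:.5f}, GC* = {gc_star:.3f}")
```

Because this formula ignores CpG hypermutability, it should be expected to differ from the estimate obtained with a neighbor-dependent model.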
Results and Discussion |
---|
It is known that the two sexes contribute differently to the number of mutations in the human genome: the rate of mutation is 4 to 6 times higher in males than in females (reviewed in Li, Yi, and Makova 2002). There are also some important variations in the rate of crossover according to sex. In humans, the average rate of crossover is 1.65 times higher in the autosomes of females than in those of males. Thus, females probably contribute more than males to the increase in GC-content. Interestingly, GC* correlates more strongly with sex-averaged crossover rates (r² = 0.61, fig. 1) or female crossover rates (r² = 0.69) than with male crossover rates (r² = 0.21), suggesting that the impact of recombination on base composition might be female-driven. However, more data will be necessary to determine whether this difference is significant or not (the correlation is significantly weaker in males, but the test becomes nonsignificant after the removal of only one highly recombining locus [data not shown]).
How Does Recombination Influence GC-Content?
Three hypotheses have been proposed to explain how recombination might influence base composition: (1) recombination promotes the creation of new GC-alleles via mutation; (2) recombination favors the spread of GC-alleles when both AT and GC alleles are present in a population (i.e., there is a fixation bias toward GC alleles), via biased gene conversion (BGC) (Galtier et al. 2001); (3) there is a selective pressure in favor of a high GC-content, and GC* increases with high crossover rates because selection is more efficient (Charlesworth 1994). Among these three models, the selectionist one appears very unlikely. Indeed, the increase in GC-content induced by a single AT→GC mutation, occurring in megabase-long noncoding sequences, is extremely small. Given the limited population sizes of mammals, it is totally unrealistic that this tiny increase in GC-content could be detected by natural selection. Conversely, various observations support the BGC model. Notably, there is a fixation bias toward GC-alleles (Duret et al. 2002; Webster, Smith, and Ellegren 2003), which is consistent with the BGC hypothesis, but not with the simple mutational model. Moreover, in vivo repair of GC:AT mismatches is indeed biased in favor of GC-alleles (Birdsell 2002), and such bias is expected to induce BGC (Galtier et al. 2001). It should be borne in mind that the BGC model does not exclude the mutational model. Indeed, various observations suggest that recombination has an influence on the mutation rate in humans (Lercher and Hurst 2002; Hellmann et al. 2003); yet, it is not clear whether recombination favors the creation of new GC-alleles. More polymorphism data will be needed to assess the role of mutational bias related to recombination with respect to GC-content evolution.
It is important to note that the crossover rate is not necessarily perfectly correlated with the recombination rate. Each recombination event involves nonreciprocal DNA exchange (i.e., gene conversion), but only a fraction of these recombination events are resolved to give a crossover. There is evidence that the proportion of recombination events that lead to crossover varies according to species (Kleckner 1996) and within genomes (Langley et al. 2000). If the BGC model is correct, we expect that GC* will correlate more strongly with the rate of recombination than with the rate of crossover. Thus, the correlation presented in figure 1 might in fact underestimate the real relationship between GC* and recombination.
Evolution of the Isochore Structure in Human and Chimpanzee
The second important result is presented in figure 3, which shows the relationship between the present base composition of a locus (GC), and the composition toward which it is evolving (GC*). GC and GC* are only weakly correlated, which indicates that the present substitution pattern is very different from what it had been in the past. Thus, the present pattern is profoundly modifying the isochore organization of our genome, which is evolving toward a more homogeneous and GC-poor base composition (GC: 35%-55%; GC*: 33%-42%). This observation confirms recent works showing a strong erosion of GC-rich isochores (Duret et al. 2002; Arndt, Petrov, and Hwa 2003; Webster, Smith, and Ellegren 2003). Note that contrary to our findings, Webster, Smith, and Ellegren (2003) reported an increase of GC-content in GC-poor isochores. However, this observation appears to be an artifact, because their estimate of GC* was based on the simple Sueoka model (Sueoka 1962), and did not take into account the effect of CpGs (data not shown).
The relative independence of GC from both the current crossover rates (r² = 0.21; fig. 2) and GC* (r² = 0.16; fig. 3) is probably due to changes of recombination rates during evolutionary time. As previously mentioned, rates of recombination are highly variable (True, Mercer, and Laurie 1996; Depaulis et al. 2000; Jeffreys and Neumann 2002; Montoya-Burgos, Boursot, and Galtier 2003). For example, the rate of crossover is twice as high in the human genome as in mice (fig. 4). Furthermore, under the BGC model, the overall GC* also depends on the effective population size (Ne) (Nagylaki 1983), another parameter expected to vary rapidly during evolution. Conversely, the evolution of base composition is a relatively slow process: we estimated that the time needed to reduce the distance from equilibrium in base composition by half is about 750 Myr (Methods). It seems therefore likely that the GC* of most loci will significantly change before equilibrium is reached. This suggests that equilibrium in base composition would be the exception rather than the rule in genomes whose GC-content is driven by recombination. Interestingly, after only 6 Myr of divergence, about 8% of human and chimpanzee orthologous loci (3 out of 36) have experienced drastic changes in GC*. We propose that these changes are a consequence of variations in recombination rates.
Concluding Remarks
Our analysis is restricted to primates, but the results presented here are likely to apply to a wide range of organisms. Indeed, GC-biased DNA repair has been demonstrated in yeasts and vertebrates and conjectured in most organisms, possibly reflecting an adaptation to frequent GC→AT mutations (Birdsell 2002; Marais 2003). Moreover, positive correlations between recombination rate and GC-content have been observed not only in mammals but also in yeast, Drosophila, and nematode (Gerton et al. 2000; Marais, Mouchiroud, and Duret 2001; Birdsell 2002; Kong et al. 2002; Piganeau and Marais 2002). We therefore expect recombination to drive base composition in genomes of a broad range of sexual organisms with GC-biased DNA repair. Such an evolutionary mode for base composition profoundly modifies our views of neutral evolution. Indeed, it violates fundamental population genetic statements, stipulating that under neutral evolution, the substitution rate equals the mutation rate and that the dynamics of neutral alleles depend only on genetic drift (Otto 2000). These assumptions are the basis of many tests for selection extensively used in the literature (Otto 2000). Thus, this study highlights the importance of neutral side-effects associated with recombination on genome evolution (Marais 2003) and the need to incorporate them into current evolutionary models to accurately detect natural selection within genomes.
Acknowledgements |
---|
Footnotes |
---|
Literature Cited |
---|
Arndt, P. F., C. B. Burge, and T. Hwa. 2003. DNA sequence evolution with neighbor-dependent mutation. J. Comput. Biol. 10:313-322.
Arndt, P. F., D. A. Petrov, and T. Hwa. 2003. Distinct changes of genomic biases in nucleotide substitution at the time of mammalian radiation. Mol. Biol. Evol. 20:1887-1896.
Bernardi, G. 2000. Isochores and the evolutionary genomics of vertebrates. Gene 241:3-17.
Birdsell, J. A. 2002. Integrating genomics, bioinformatics, and classical genetics to study the effects of recombination on genome evolution. Mol. Biol. Evol. 19:1181-1197.
Birdsell, J. A., and C. Wills. 2003. The evolutionary origin and maintenance of sexual recombination: a review of contemporary models. Evol. Biol. 33:27-38.
Blat, Y., R. U. Protacio, N. Hunter, and N. Kleckner. 2002. Physical and functional interactions among basic chromosome organizational features govern early steps of meiotic chiasma formation. Cell 111:791-802.
Burt, D. W. 2002. Origin and evolution of avian microchromosomes. Cytogenet. Genome Res. 96:97-112.
Charlesworth, B. 1994. Genetic recombination: patterns in the genome. Curr. Biol. 4:182-184.
de Massy, B. 2003. Distribution of meiotic recombination sites. Trends Genet. 19:514-522.
Depaulis, F., L. Brazier, S. Mousset, A. Turbe, and M. Veuille. 2000. Selective sweep near the In(2L)t inversion breakpoint in an African population of Drosophila melanogaster. Genet. Res. 76:149-158.
Duret, L., D. Mouchiroud, and C. Gautier. 1995. Statistical analysis of vertebrate sequences reveals that long genes are scarce in GC-rich isochores. J. Mol. Evol. 40:308-317.
Duret, L., M. Semon, G. Piganeau, D. Mouchiroud, and N. Galtier. 2002. Vanishing GC-rich isochores in mammalian genomes. Genetics 162:1837-1847.
Eyre-Walker, A., and L. D. Hurst. 2001. The evolution of isochores. Nat. Rev. Genet. 2:549-555.
Galtier, N. 2003. Gene conversion drives GC content evolution in mammalian histones. Trends Genet. 19:65-68.
Galtier, N., G. Piganeau, D. Mouchiroud, and L. Duret. 2001. GC content evolution in mammalian genomes: the biased gene conversion hypothesis. Genetics 159:907-911.
Gerton, J. L., J. DeRisi, R. Shroff, M. Lichten, P. O. Brown, and T. D. Petes. 2000. Global mapping of meiotic recombination hotspots and coldspots in the yeast Saccharomyces cerevisiae. Proc. Natl. Acad. Sci. USA 97:11383-11390.
Hellmann, I., I. Ebersberger, S. Ptak, S. Pääbo, and M. Przeworski. 2003. A neutral explanation for the correlation of diversity with recombination rates in humans. Am. J. Hum. Genet. 72:1527-1535.
Höhl, M., S. Kurtz, and E. Ohlebusch. 2002. Efficient multiple genome alignment. Bioinformatics S1:S312-S320.
Jabbari, K., and G. Bernardi. 1998. CpG doublets, CpG islands and Alu repeat elements in long human DNA sequences from different isochores families. Gene 224:123-128.
Jeffreys, A. J., L. Kauppi, and R. Neumann. 2001. Intensely punctate meiotic recombination in the class II region of the major histocompatibility complex. Nat. Genet. 29:217-222.
Jeffreys, A. J., and R. Neumann. 2002. Reciprocal crossover asymmetry and meiotic drive in a human recombination hot spot. Nat. Genet. 31:267-271.
Kleckner, N. 1996. Meiosis: how could it work? Proc. Natl. Acad. Sci. USA 93:8167-8174.
Kong, A., D. F. Gudbjartsson, G. M. Jonsdottir, S. A. Gudjonsson, B. Richardsson, S. Sigurdardottir, J. Barnard, B. Hallbeck, G. Masson, and A. Shlien, et al. 2002. A high-resolution recombination map of the human genome. Nat. Genet. 31:241-247.
Lander, E. S., L. M. Linton, B. Birren, C. Nusbaum, M. C. Zody, J. Baldwin, K. Devon, K. Dewar, M. Doyle, and W. FitzHugh, et al. 2001. Initial sequencing and analysis of the human genome. Nature 409:860-921.
Langley, C. H., B. P. Lazzaro, W. Phillips, E. Heikkinen, and J. M. Braverman. 2000. Linkage disequilibria and the site frequency spectra in the su(s) and su(w(a)) regions of the Drosophila melanogaster X chromosome. Genetics 156:1837-1852.
Lercher, M. J., and L. D. Hurst. 2002. Human SNP variability and mutation rate are higher in regions of high recombination. Trends Genet. 18:337-340.
Li, W., S. Yi, and K. Makova. 2002. Male-driven evolution. Curr. Opin. Genet. Dev. 12:650-656.
Marais, G. 2003. Biased gene conversion: implications for genome and sex evolution. Trends Genet. 19:330-338.
Marais, G., D. Mouchiroud, and L. Duret. 2001. Does recombination improve selection on codon usage? Lessons from nematode and fly complete genomes. Proc. Natl. Acad. Sci. USA 98:5688-5692.
Montoya-Burgos, J. I., P. Boursot, and N. Galtier. 2003. Recombination explains isochores in mammalian genomes. Trends Genet. 19:128-130.
Mouchiroud, D., G. D'Onofrio, B. Aissani, G. Macaya, C. Gautier, and G. Bernardi. 1991. The distribution of genes in the human genome. Gene 100:181-187.
Mouchiroud, D., C. Gautier, and G. Bernardi. 1988. The compositional distribution of coding sequences and DNA molecules in humans and murids. J. Mol. Evol. 27:311-320.
Nagylaki, T. 1983. Evolution of a finite population under gene conversion. Proc. Natl. Acad. Sci. USA 80:6278-6281.
Navarro, A., and N. H. Barton. 2003. Chromosomal speciation and molecular divergence-accelerated evolution in rearranged chromosomes. Science 300:321-324.
Nekrutenko, A., and W. H. Li. 2000. Assessment of compositional heterogeneity within and between eukaryotic genomes. Genome Res. 10:1986-1995.
Otto, S. P. 2000. Detecting the form of selection from DNA sequence data. Trends Genet. 16:526-529.
Pardo-Manuel de Villena, F., and C. Sapienza. 2001. Female meiosis drives karyotypic evolution in mammals. Genetics 159:1179-1189.
Petes, T. D., and J. D. Merker. 2002. Context dependence of meiotic recombination hotspots in yeast. The relationship between recombination activity of a reporter construct and base composition. Genetics 162:2049-2052.
Piganeau, G., and G. Marais. 2002. Hill-Robertson interference is a minor determinant of variations in codon bias across Drosophila melanogaster and Caenorhabditis elegans genomes. Mol. Biol. Evol. 19:1399-1406.
Reich, D. E., S. F. Schaffner, M. J. Daly, G. McVean, J. C. Mullikin, J. M. Higgins, D. J. Richter, E. S. Lander, and D. Altshuler. 2002. Human genome sequence variation and the influence of gene history, mutation and recombination. Nat. Genet. 32:135-142.
Rogers, J., M. C. Mahaney, S. M. Witte, S. Nair, D. Newman, S. Wedel, L. A. Rodriguez, K. S. Rice, S. H. Slifer, and A. Perelygin, et al. 2000. A genetic linkage map of the baboon (Papio hamadryas) genome based on human microsatellite polymorphisms. Genomics 67:237-247.
Smit, A. F. A. 1999. Interspersed repeats and other mementos of transposable elements in mammalian genomes. Curr. Opin. Genet. Dev. 9:657-663.
Sueoka, N. 1962. On the genetic basis of variation and heterogeneity of DNA composition. Proc. Natl. Acad. Sci. USA 48:582-592.
True, J. R., J. M. Mercer, and C. C. Laurie. 1996. Differences in cross-over frequency and distribution among three sibling species of Drosophila. Genetics 142:507-523.
Watanabe, Y., A. Fujiyama, Y. Ichiba, M. Hattori, T. Yada, Y. Sakaki, and T. Ikemura. 2002. Chromosome-wide assessment of replication timing for human chromosomes 11q and 21q: disease-related genes in timing-switch regions. Hum. Mol. Genet. 11:13-21.
Waterston, R. H., K. Lindblad-Toh, E. Birney, J. Rogers, J. F. Abril, P. Agarwal, R. Agarwala, R. Ainscough, M. Alexandersson, and P. An, et al. 2002. Initial sequencing and comparative analysis of the mouse genome. Nature 420:520-562.
Webster, M. T., N. G. C. Smith, and H. Ellegren. 2003. Compositional evolution of non-coding DNA in the human and chimpanzee genomes. Mol. Biol. Evol. 20:278-286.