Pôle BioInformatique Lyonnais, Laboratoire BBE-UMR Centre National de la Recherche Scientifique 5558, Université Claude Bernard Lyon 1, Villeurbanne, France;
and
Department of Biology and Biochemistry, University of Bath, Claverton Down, Bath, England
Abstract |
---|
Introduction |
---|
Different selectionist models have been proposed. Notably, isochore patterns were observed to be remarkably different in fishes and amphibians (cold-blooded vertebrates) compared with birds and mammals (warm-blooded vertebrates). As a general rule, GC-rich isochores are absent from fish and amphibian genomes (Bernardi et al. 1985; Bernardi 2000). It is known that GC content affects the thermodynamic stability of DNA and RNA and also, possibly, that of proteins (hydrophobic amino acids are encoded by relatively GC-rich codons). It was therefore proposed that the acquisition of GC-rich isochores in birds and mammals reflected an adaptation to homeothermy (Bernardi and Bernardi 1986; Bernardi 2000). However, a recent analysis, albeit based on a small sample size, has shown that the GC content of crocodile and turtle genes (cold-blooded reptiles) was as high as that of their orthologs in birds and mammals (Hughes, Zelus, and Mouchiroud 1999). Therefore, the acquisition of GC-rich isochores occurred before the divergence between mammals and sauropsids (birds and reptiles), i.e., before the acquisition of homeothermy.
It was also postulated that GC-rich isochores were particularly rich in housekeeping genes, suggesting that these isochores could correspond to a particular chromatin structure, favorable for gene expression (Bernardi 1995). However, we found that housekeeping genes were less frequent in GC-rich than in GC-poor isochores (Gonçalves, Duret, and Mouchiroud 2000), and we failed to find any evidence of selection on codon usage in housekeeping genes (Duret and Mouchiroud 2000). Interestingly, some polymorphism data are consistent with selection on silent sites (Eyre-Walker 1999). However, biased gene conversion might also explain the data. It should be stressed that the particular base composition of GC-rich isochores concerns not only coding regions, but also introns and intergenic regions and, notably, pseudogenes (Francino and Ochman 1999). Therefore, if the acquisition of GC-rich isochores results from natural selection, the selective advantage must operate not at the RNA or the protein level, but at the DNA level. In our opinion, it is hard to see what this selective advantage might be. Moreover, it is often noted that mammalian populations are too small for selection to act on a trait as weakly deleterious as a single silent mutation (Sharp et al. 1995).
The alternative neutralist hypotheses that have been proposed to account for the acquisition and maintenance of GC-rich isochores point to mutational processes that might vary around the genome: DNA repair (Filipski 1987), mutational bias (Sueoka 1988), changes in nucleotide pools during replication (Wolfe, Sharp, and Li 1989), or biased gene conversion associated with recombination (Eyre-Walker 1993). Although these models are attractive, it should be noted that, for now, there are no data that directly demonstrate that mutation patterns vary according to the isochores.
One important observation has been recently put forward to try to distinguish between selectionist and neutralist hypotheses (Hughes and Yeager 1997; Eyre-Walker 1999; Bernardi 2000). This observation is that the GC content at exonic fourfold-redundant sites (GC4) tends to be greater than the GC content of flanking introns (GCi) (D'Onofrio et al. 1991). According to Eyre-Walker (1999), this pattern is consistent with a selectionist model. Let us suppose that selection might favor a particular local GC content: as the first two codon sites (GC12) in exons are constrained, selection favors exaggerated GC content at the third site to provide local compensation (Eyre-Walker 1999). If GCi represents a selective optimum, the difference between GC12 and GCi should predict the amount of compensation that is necessary at GC4. The pattern of GC4 tending to be greater than GCi appears to exist only in GC-rich isochores. In GC-poor regions, the opposite pattern is found (fig. 1). This is also consistent with the selectionist model. In GC-poor regions, GCi might be lower than GC12, so selection might favor GC4 being lower still to compensate.
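The three quantities compared throughout (GC12, GC4, and GCi) can be made concrete with a minimal sketch. It assumes the standard genetic code, an in-frame coding sequence, and counts every complete codon (including a stop codon, if present) toward GC12; the function names and the toy sequences are illustrative only and are not taken from the analysis described here.

```python
# Fourfold-degenerate codon families of the standard genetic code, identified
# by their first two bases: Leu CTN, Val GTN, Ser TCN, Pro CCN, Thr ACN,
# Ala GCN, Arg CGN, Gly GGN.
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}


def gc_fraction(sequence):
    """Fraction of G or C among unambiguous bases; None if there are none."""
    bases = [b for b in sequence.upper() if b in "ACGT"]
    if not bases:
        return None
    return sum(b in "GC" for b in bases) / len(bases)


def gc12_and_gc4(cds):
    """Return (GC12, GC4) for an in-frame coding sequence.

    GC12 uses the first two positions of every complete codon.
    GC4 uses the third position of fourfold-degenerate codons only.
    """
    cds = cds.upper()
    first_two = []
    fourfold_third = []
    usable_length = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    for i in range(0, usable_length, 3):
        codon = cds[i:i + 3]
        if any(b not in "ACGT" for b in codon):
            continue  # skip codons containing ambiguous bases
        first_two.append(codon[:2])
        if codon[:2] in FOURFOLD_PREFIXES:
            fourfold_third.append(codon[2])
    return gc_fraction("".join(first_two)), gc_fraction("".join(fourfold_third))


# Toy sequences (hypothetical, for illustration only).
toy_cds = "ATGGCCCTGGGATAA"    # codons: ATG GCC CTG GGA TAA
toy_intron = "GTAAGTATTTCAG"

gc12, gc4 = gc12_and_gc4(toy_cds)
gci = gc_fraction(toy_intron)
print(f"GC12={gc12:.3f} GC4={gc4:.3f} GCi={gci:.3f}")
```

In practice GCi would be computed over all introns of a gene, and a real pipeline would also need to handle annotation boundaries; those details are outside this sketch.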
For this to be a likely explanation, TEs must be adequately common. This seems to be the case. Recognizable TEs make up more than 40% of the human genome (Smit 1999). In our data set of human introns (16 Mb from 1,396 genes), TEs constitute, on average, 35% of the introns. This is necessarily a lower-bound estimate, as the programs used to identify TEs fail to recognize them when they are >40% divergent. Hence, it is difficult to identify TEs resulting from ancient insertion events. Insertions and deletions also contribute to TE decay. However, such events are rare compared with substitutions: deletions are 40 times and insertions 100 times less frequent than substitutions (Ophir and Graur 1997). Thus, it is expected that TEs keep on contributing significantly to the length and G+C content of noncoding sequences long after they are no longer recognizable. In summary, it is likely that TE insertion is the major factor in the expansion of noncoding regions (both introns and intergenic sequences).
To see more precisely the effects that TEs have on GC content, it is necessary to consider a few more details of their biology. Consider, for example, Alu and L1 insertions. These insertions contribute to the isochore compartmentalization of the human genome because they are differentially located: Alu's (about 50% G+C) are found preferentially in GC-rich isochores, whereas L1 elements (about 37% G+C) are found preferentially in GC-poor isochores (Duret, Mouchiroud, and Gautier 1995). Recognizable Alu's make up 20% of GC-rich isochores and 7% of GC-poor isochores (Smit 1999). By contrast, recognizable L1 elements make up 5% of GC-rich isochores and 20% of GC-poor isochores (Smit 1999). The reasons for the biases in distribution are unclear, but they probably do not reflect biases in patterns of insertion. They most likely represent biases in patterns of decay of fixed elements (see Discussion).
To see the effects that the observed pattern of TE distribution has on GC content, let us consider a noncoding region where the GC content at equilibrium would be 30% according to the background pattern of point mutation, with 20% L1 and 7% Alu. For illustrative purposes, we shall assume that selection does not act on GC content. The global GC content in this region would be
$$\mathrm{GC} = 0.73 \times 30\% + 0.20 \times 37\% + 0.07 \times 50\% = 32.8\% \tag{1}$$
| (2) |
Importantly, if we suppose that GC4 is a reflection of local point-mutational pressures alone, then the above calculations suggest that this set of biases could explain why GCi is higher than GC4 in GC-poor isochores and why GCi is lower than GC4 in GC-rich isochores. Note that the impact of TE insertions on GC content is probably stronger than suggested by the above calculation. Indeed, besides Alu and L1 (which make up about 50%–60% of all recognizable TEs; Smit 1999), other TEs also affect introns' GC content. Moreover, many old TEs that are no longer recognizable also contribute to GC content. Therefore, we appear to have a model that can compete with the compensatory selectionist model as an explanation for the pattern shown in figure 1. We now examine the TE model in more detail to determine whether it is a likely explanation.
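The weighted-average reasoning behind the TE model can be sketched as a short calculation. The GC-poor inputs (30% background, 20% L1 at 37% G+C, 7% Alu at 50% G+C) are the figures quoted in the text. For the GC-rich case, the TE proportions (20% Alu, 5% L1) are those quoted from Smit (1999), but the 60% background value is a hypothetical number chosen here purely for illustration. The function name `mixture_gc` is likewise an assumption of this sketch.

```python
def mixture_gc(background_gc, te_components):
    """GC content of a region made of unique sequence plus TE-derived sequence.

    background_gc: equilibrium GC content of the unique (non-TE) sequence.
    te_components: mapping name -> (fraction of region, GC content of element).
    """
    te_fraction = sum(fraction for fraction, _ in te_components.values())
    if not 0.0 <= te_fraction <= 1.0:
        raise ValueError("TE fractions must sum to a value between 0 and 1")
    unique_fraction = 1.0 - te_fraction
    te_gc = sum(fraction * gc for fraction, gc in te_components.values())
    return unique_fraction * background_gc + te_gc


# GC-poor region: all values as quoted in the text.
gc_poor = mixture_gc(0.30, {"L1": (0.20, 0.37), "Alu": (0.07, 0.50)})

# GC-rich region: TE proportions as quoted from Smit (1999);
# the 0.60 background is a HYPOTHETICAL value for illustration only.
gc_rich = mixture_gc(0.60, {"L1": (0.05, 0.37), "Alu": (0.20, 0.50)})

print(f"GC-poor region: background 30.0% -> with TEs {gc_poor:.1%}")
print(f"GC-rich region: background 60.0% -> with TEs {gc_rich:.1%}")
```

Under these inputs the TE contribution pulls the GC-poor region slightly above its background value and the GC-rich region below it, which is the direction of the GCi versus GC4 difference that the model is meant to explain.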
Materials and Methods |
---|
Predictions and Results |
---|
The compensatory selectionist model makes two predictions. First, if GC12 is much lower than GCi, then GC4 must be especially high to compensate (and vice versa if GC12 is higher than GCi). Therefore, there should be a positive correlation between (x = GCi - GC12) and (y = GC4 - GCi). Second, if GC12 is equal to GCi, then no compensation is necessary, and GC4 should be the same as GC12 and GCi.
Consistent with the first prediction, we found a positive correlation between x and y (fig. 2). However, this correlation, although statistically significant, is very weak (R² = 0.046). Moreover, this is not a discriminating prediction because the mutational model also predicts this correlation. According to this mutational model, GC4 - GCi (and GC12 - GCi, but to a lesser extent because of selective constraints on the encoded protein) indicates the difference in the mutation pattern between coding and noncoding regions. As detailed above, in GC-rich regions, the difference between the GC content of TEs and that of the background point-mutational processes in unique sequences is large, whereas in GC-poor regions, the difference is very much less pronounced. Thus, in GC-rich regions, we expect a relatively large positive difference between both GC4 and GC12 on the one hand and GCi on the other. By contrast, in GC-poor regions, the background GC pressures are quite similar to the GC content of the TEs. Hence GC4 and GC12 should both be only slightly less than GCi. This model therefore predicts that, considered over all isochores, x and y should positively covary.
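The first prediction amounts to computing, per gene, x = GCi - GC12 and y = GC4 - GCi and then measuring how strongly the two covary. A minimal sketch of that calculation follows, using a plain Pearson correlation; the per-gene values are invented toy numbers, not data from this study, and the helper names are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    return sxy / sqrt(sxx * syy)


def compensation_axes(genes):
    """Return the lists x = GCi - GC12 and y = GC4 - GCi, one value per gene."""
    x = [g["GCi"] - g["GC12"] for g in genes]
    y = [g["GC4"] - g["GCi"] for g in genes]
    return x, y


# Toy per-gene values (hypothetical, for illustration only).
toy_genes = [
    {"GC12": 0.48, "GC4": 0.42, "GCi": 0.40},
    {"GC12": 0.50, "GC4": 0.55, "GCi": 0.47},
    {"GC12": 0.52, "GC4": 0.70, "GCi": 0.55},
    {"GC12": 0.55, "GC4": 0.85, "GCi": 0.62},
]

x, y = compensation_axes(toy_genes)
r = pearson_r(x, y)
print(f"r = {r:.3f}, R^2 = {r * r:.3f}")
```

A positive r on real data would be compatible with both the compensatory and the mutational models, which is why the text treats this test as non-discriminating.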
We can provide two further tests of the TE model. We may suppose that long introns are more likely to have received a relatively recent TE insertion (and hence are far from mutational equilibrium). This supposition is based on two points of logic. First, a small intron with a recent TE insertion would no longer be small. Second, if TEs are fixed at random, for which there is some evidence (see Discussion), young ones are more likely to be found in long introns than in short ones, simply because a given long intron represents a larger target area. We can then predict that GC4 - GCi should tend to 0 as intron size tends to 0. This is equivalent to arguing that an intron that is too small to have any TEs should have the same GC content as flanking fourfold-degenerate sites. This is found to be true (fig. 3).
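One way to examine the intron-length prediction is to group genes by intron length and look at the mean of GC4 - GCi in each group. The sketch below does this with fixed length boundaries; the bin edges and the per-gene values are hypothetical and chosen only to show the mechanics, not to reproduce the analysis behind figure 3.

```python
from bisect import bisect_right


def mean_gc4_minus_gci_by_length(genes, bin_edges):
    """Mean of GC4 - GCi per intron-length bin.

    bin_edges must be sorted ascending. Bin 0 holds lengths below the first
    edge, bin k holds lengths from edge k-1 up to (but excluding) edge k, and
    the last bin holds lengths at or above the last edge. Empty bins give None.
    """
    n_bins = len(bin_edges) + 1
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for g in genes:
        k = bisect_right(bin_edges, g["intron_length"])
        sums[k] += g["GC4"] - g["GCi"]
        counts[k] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


# Toy per-gene values (hypothetical, for illustration only).
toy_genes = [
    {"intron_length": 100, "GC4": 0.50, "GCi": 0.50},
    {"intron_length": 150, "GC4": 0.60, "GCi": 0.58},
    {"intron_length": 2000, "GC4": 0.70, "GCi": 0.60},
    {"intron_length": 50000, "GC4": 0.80, "GCi": 0.55},
]

bin_means = mean_gc4_minus_gci_by_length(toy_genes, bin_edges=[1000, 10000])
for label, value in zip(["<1 kb", "1-10 kb", ">=10 kb"], bin_means):
    print(label, None if value is None else round(value, 3))
```

Under the TE model, the mean for the shortest-intron bin is expected to be closest to zero; a regression of GC4 - GCi on intron length would be an alternative to binning.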
It should be noted, however, that this sample is very small (5 genes, only 68 informative substitutions), maybe too small to detect any significant difference. Unfortunately, the other species for which genomic sequences are available (rodents, bovines) are too distantly related for their introns to be aligned with those of humans. More data will therefore be necessary to more thoroughly test the TE model.
Discussion |
---|
Additionally, we can probably reject a selectionist model that assumes that GCi is the selective optimum. While we struggle to imagine selective explanations for the other patterns that we have described, we are cautious about rejecting a selectionist model outright. What we can safely conclude is that a viable alternative mutationist model exists to account for the patterns of covariance of GC4 and GCi. The assertion that one must invoke selection to explain GC4 > GCi because repair enzymes cannot act differently in introns and in exons therefore seems to be unsafe, as it makes no allowance for TE insertions.
Our findings, however, say relatively little about the evolution of isochores. Importantly, we have not explained why it is that the patterns of point mutations/substitutions vary within the genome. We have, therefore, failed to account for the strong correlation between GC4 and GCi. All that we have done is provide some evidence that the slope of the regression is not equal to 1 because of the impact of TEs on local intronic GC content.
Perhaps even more crucially, we have not given an account of the underlying causes of the patterns in TE distribution. The fact that young Alu's (<4.5% substitution) are distributed uniformly in all isochores (Smit 1999) suggests that Alu's are inserted randomly. Why, then, do older Alu's show the classical bias toward GC-rich isochores? As the young Alu's are fixed, it cannot be owing to selection against them once they have inserted. Instead, the pattern seems most parsimoniously explained by different decay rates in different genomic regions, for which other evidence is available (Casane et al. 1997). Whether this is due to differences in the strength of stabilizing selection or differences in the rate of mutation has not been resolved. For this reason, additionally, our analysis fails to contribute to the selectionist-versus-neutralist debate over the maintenance of isochores.
The question of whether isochores are the result of selection or mutation therefore remains open. Our results have nonetheless made clear that if there is selection, this selection occurs at the genomic level (not the RNA or the protein level), in both coding and noncoding regions. Hence, any model of selection on isochores should take TEs into account.
Acknowledgements |
---|
Footnotes |
---|
1 Abbreviations: GC4, GC content at fourfold-redundant sites in exons; GC12, GC content at the first two positions in codons; GCi, GC content of introns; TE, transposable element.
2 Keywords: GC content; isochores; transposable elements; introns
3 Address for correspondence and reprints: Laurence D. Hurst, Department of Biology and Biochemistry, University of Bath, Claverton Down, Bath BA2 7AY, United Kingdom. l.d.hurst{at}bath.ac.uk
Literature Cited |
---|
Bernardi, G. 1995. The human genome: organization and evolutionary history. Annu. Rev. Genet. 29:445–476.
Bernardi, G. 2000. Isochores and the evolutionary genomics of vertebrates. Gene 241:3–17.
Bernardi, G., and G. Bernardi. 1986. Compositional constraints and genome evolution. J. Mol. Evol. 24:1–11.
Bernardi, G., B. Olofsson, J. Filipski, M. Zerial, J. Salinas, G. Cuny, M. Meunier-Rotival, and F. Rodier. 1985. The mosaic genome of warm-blooded vertebrates. Science 228:953–958.
Casane, D., S. Boissinot, B. H. J. Chang, L. C. Shimmin, and W. H. Li. 1997. Mutation pattern variation among regions of the primate genome. J. Mol. Evol. 45:216–226.
D'Onofrio, G., D. Mouchiroud, B. Aissani, C. Gautier, and G. Bernardi. 1991. Correlations between the compositional properties of human genes, codon usage, and amino-acid-composition of proteins. J. Mol. Evol. 32:504–510.
Duret, L., and D. Mouchiroud. 2000. Determinants of substitution rates in mammalian genes: expression pattern affects selection intensity but not mutation rate. Mol. Biol. Evol. 17:68–74.
Duret, L., D. Mouchiroud, and C. Gautier. 1995. Statistical-analysis of vertebrate sequences reveals that long genes are scarce in GC-rich isochores. J. Mol. Evol. 40:308–317.
Duret, L., D. Mouchiroud, and M. Gouy. 1994. HOVERGEN: a database of homologous vertebrate genes. Nucleic Acids Res. 22:2360–2365.
Eyre-Walker, A. 1993. Recombination and mammalian genome evolution. Proc. R. Soc. Lond. B Biol. Sci. 252:237–243.
Eyre-Walker, A. 1999. Evidence of selection on silent site base composition in mammals: potential implications for the evolution of isochores and junk DNA. Genetics 152:675–683.
Filipski, J. 1987. Correlation between molecular clock ticking, codon usage, fidelity of DNA-repair, chromosome-banding and chromatin compactness in germline cells. FEBS Lett. 217:184–186.
Francino, H. P., and H. Ochman. 1999. Isochores result from mutation not selection. Nature 400:30–31.
Gonçalves, I., L. Duret, and D. Mouchiroud. 2000. Nature and structure of human genes that generate retropseudogenes. Genome Res. 10:672–678.
Gouy, M., F. Milleret, C. Mugnier, M. Jacobzone, and C. Gautier. 1984. ACNUC: a nucleic-acid sequence data-base and analysis system. Nucleic Acids Res. 12:121–127.
Hattori, M., A. Fujiyama, T. D. Taylor et al. (60 co-authors). 2000. The DNA sequence of human chromosome 21. Nature 405:311–319.
Hughes, A. L., and M. Yeager. 1997. Comparative evolutionary rates of introns and exons in murine rodents. J. Mol. Evol. 45:125–130.
Hughes, S., D. Zelus, and D. Mouchiroud. 1999. Warm-blooded isochore structure in Nile crocodile and turtle. Mol. Biol. Evol. 16:1521–1527.
Jareborg, N., E. Birney, and R. Durbin. 1999. Comparative analysis of noncoding regions of 77 orthologous mouse and human gene pairs. Genome Res. 9:815–824.
Matassi, G., P. M. Sharp, and C. Gautier. 1999. Chromosomal location effects on gene sequence evolution in mammals. Curr. Biol. 9:786–791.
MHC Sequencing Consortium. 1999. Complete sequence and gene map of a human major histocompatibility complex. Nature 401:921–923.
Ophir, R., and D. Graur. 1997. Patterns and rates of indel evolution in processed pseudogenes from humans and murids. Gene 205:191–202.
Sharp, P. M., M. Averof, A. T. Lloyd, G. Matassi, and J. F. Peden. 1995. DNA-sequence evolution: the sounds of silence. Philos. Trans. R. Soc. Lond. B Biol. Sci. 349:241–247.
Smit, A. F. 1999. Interspersed repeats and other mementos of transposable elements in mammalian genomes. Curr. Opin. Genet. Dev. 9:657–663.
Sueoka, N. 1988. Directional mutation pressure and neutral molecular evolution. Proc. Natl. Acad. Sci. USA 85:2653–2657.
Williams, E. J. B., and L. D. Hurst. 2000. The proteins of linked genes evolve at similar rates. Nature 407:900–903.
Wolfe, K. H., P. M. Sharp, and W.-H. Li. 1989. Mutation rates differ among regions of the mammalian genome. Nature 337:283–285.