The Elevated GC Content at Exonic Third Sites Is Not Evidence Against Neutralist Models of Isochore Evolution

Laurent Duret and Laurence D. Hurst

Pole BioInformatique Lyonnais, Laboratoire BBE-UMR Centre National de la Recherche Scientifique 5558, Universite Claude Bernard–Lyon 1, Villeurbanne, France; and
Department of Biology and Biochemistry, University of Bath, Claverton Down, Bath, England


    Abstract
 
The human genome is divided into isochores, large stretches (>>300 kb) of genomic DNA with more or less consistent GC content. Mutational/neutralist and selectionist models have been put forward to explain their existence. A major criticism of the mutational models is that they cannot account for the higher GC content at fourfold-redundant silent sites within exons (GC4) than in flanking introns (GCi). Indeed, it has been asserted that it is hard to envisage a mutational bias explanation, as it is difficult to see how repair enzymes might act differently in exons and their flanking introns. However, this rejection, we note, ignores the effects of transposable elements (TEs), which are a major component of introns and tend to cause them to have a GC content different from (usually lower than) that dictated by point mutational processes alone. As TEs tend not to insert at the extremities of introns, this model predicts that GC content at the extremities of introns should be more like that at GC4 than are the intronic interiors. This we show to be true. The model also correctly predicts that small introns should have a composition more like that at GC4 than large introns. We conclude that the logic of the previous rejection of neutralist models is unsafe.


    Introduction
 
Mammalian and bird genomes are divided into discrete blocks of distinct GC content, so-called isochores (Bernardi et al. 1985), that are thought to range from 300 kb to over 1,000 kb in size. The existence of isochores has been recently confirmed by the study of the GC content of linked genes in rodents and humans (Matassi, Sharp, and Gautier 1999; Williams and Hurst 2000), as well as by the analysis of large genomic sequences from human chromosomes (MHC Sequencing Consortium 1999; Hattori et al. 2000). Understanding the evolutionary forces responsible for the evolution of isochores has been one of the foci of the debate between neutralists and selectionists but remains unresolved.

Different selectionist models have been proposed. Notably, it was noticed that isochore patterns were remarkably different in fishes and amphibians (cold-blooded vertebrates) compared with birds and mammals (warm-blooded vertebrates). As a general rule, GC-rich isochores are absent from fish and amphibian genomes (Bernardi et al. 1985; Bernardi 2000). It is known that GC content affects the thermodynamic stability of DNA and RNA and also, possibly, that of proteins (hydrophobic amino acids are encoded by relatively GC-rich codons). It was therefore proposed that the acquisition of GC-rich isochores in birds and mammals reflected an adaptation to homeothermy (Bernardi and Bernardi 1986; Bernardi 2000). However, a recent analysis, albeit based on a small sample size, has demonstrated that the GC content of crocodile and turtle genes (cold-blooded reptiles) was as high as that of their orthologs in birds and mammals (Hughes, Zelus, and Mouchiroud 1999). Therefore, the acquisition of GC-rich isochores occurred before the divergence between mammals and sauropsids (birds and reptiles), i.e., before the acquisition of homeothermy.

It was also postulated that GC-rich isochores were particularly rich in housekeeping genes, suggesting that these isochores could correspond to a particular chromatin structure, favorable for gene expression (Bernardi 1995). However, we found that housekeeping genes were less frequent in GC-rich than in GC-poor isochores (Gonçalves, Duret, and Mouchiroud 2000), and we failed to find any evidence of selection on codon usage in housekeeping genes (Duret and Mouchiroud 2000). Interestingly, some polymorphism data are consistent with selection on silent sites (Eyre-Walker 1999). However, biased gene conversion might also explain the data. It should be stressed that the particular base composition of GC-rich isochores concerns not only coding regions, but also introns and intergenic regions and, notably, pseudogenes (Francino and Ochman 1999). Therefore, if the acquisition of GC-rich isochores results from natural selection, this is due to a selective advantage not at the RNA or the protein level, but at the DNA level. In our opinion, it is hard to see what this selective advantage might be. Moreover, it is often noted that mammalian populations are so small that it is not possible that selection could act on such a weakly deleterious trait as a single silent mutation (Sharp et al. 1995).

The alternative neutralist hypotheses that have been proposed to account for the acquisition and maintenance of GC-rich isochores point to some potential mutational processes that might vary around the genome: DNA repair (Filipski 1987), mutational bias (Sueoka 1988), changes in nucleotide pools during replication (Wolfe, Sharp, and Li 1989), or biased gene conversion associated with recombination (Eyre-Walker 1993). Although these models are attractive, it should be noted that, for now, there are no data that directly demonstrate that mutation patterns vary according to the isochores.

One important observation has been recently put forward to try to distinguish between selectionist and neutralist hypotheses (Hughes and Yeager 1997; Eyre-Walker 1999; Bernardi 2000). This observation is that the GC content at exonic fourfold-redundant sites (GC4) tends to be greater than the GC content of flanking introns (GCi) (D'Onofrio et al. 1991). According to Eyre-Walker (1999), this pattern is consistent with a selectionist model. Let us suppose that selection might favor a particular local GC content: as the first two codon sites (GC12) in exons are constrained, selection favors exaggerated GC content at the third site to provide local compensation (Eyre-Walker 1999). If GCi represents a selective optimum, the difference between GC12 and GCi should predict the amount of compensation that is necessary at GC4. The pattern of GC4 tending to be greater than GCi appears to exist only in GC-rich isochores. In GC-poor regions, the opposite pattern is found (fig. 1). This is also consistent with the selectionist model. In GC-poor regions, GCi might be lower than GC12, so selection might favor GC4 being lower still to compensate.



Fig. 1. The relationship between GC4 and GCi for 1,396 human genes using orthogonal regression. GC4 = -0.427 + 2.118 GCi; R² = 0.586, P < 0.0001.
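
For readers unfamiliar with orthogonal regression, the fit minimizes perpendicular rather than vertical distances to the line, so neither GC4 nor GCi is treated as error-free. The following Python sketch shows one common closed form. It assumes equal error variances on the two axes, which the text does not state, and the function name is ours.

```python
import math

def orthogonal_regression(xs, ys):
    """Fit y = intercept + slope * x by minimizing perpendicular distances.

    Assumption (not stated in the paper): both variables carry equal
    error variance, so the Deming variance ratio is 1.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxy == 0:
        raise ValueError("slope is undefined when the covariance is zero")
    # Closed-form slope of the major axis of the scatter.
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Toy data lying exactly on y = 1 + 2x (not data from this study).
slope, intercept = orthogonal_regression([0, 1, 2, 3], [1, 3, 5, 7])
```

When the points scatter around the line, this slope is steeper than the ordinary least-squares slope of y on x, which is worth bearing in mind when comparing a fitted slope with the expectation of 1.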

 
It has been asserted that it is hard to envisage a mutational bias explanation for the difference between GC4 and GCi: since mutations affect exons and introns equally, one expects a priori that the base composition should be the same in all silent sites (introns and synonymous codon positions). The observation that GC4 is generally greater than GCi was therefore considered evidence against the mutational-bias model (Hughes and Yeager 1997; Bernardi 2000). We certainly agree that it is difficult to see how repair enzymes might act differently in exons and in flanking introns. However, we show here that this argument is not sufficient to reject the mutational-bias model because it overlooks the influence of transposable elements (TEs) on the base composition of noncoding sequences. While TEs can insert into introns, they cannot affect GC4. As TEs tend to have GC contents lower than those of the more GC-rich genes, their insertion must almost inevitably reduce the GC content of introns below that of the fourfold-degenerate sites in exons.

For this to be a likely explanation, TEs must be adequately common. This seems to be the case. Recognizable TEs make up more than 40% of the human genome (Smit 1999). In our data set of human introns (16 Mb from 1,396 genes), TEs constitute, on average, 35% of the introns. This must be a lower-bound estimate, as the programs that identify TEs fail to recognize them when they are >40% divergent. Hence, it is difficult to identify TEs resulting from ancient insertion events. Insertions and deletions also contribute to TE decay. However, such events are rare compared with substitutions: deletions are 40 times and insertions 100 times less frequent than substitutions (Ophir and Graur 1997). Thus, it is expected that TEs keep on contributing significantly to the length and G+C content of noncoding sequences long after they are no longer recognizable. In summary, it is likely that TE insertion is the major factor in the expansion of noncoding regions (both introns and intergenic sequences).

To see more precisely the effects that TEs have on GC content, it is necessary to consider a few more details of their biology. Consider, for example, Alu and L1 insertions. These insertions contribute to the isochore compartmentalization of the human genome because they are differentially located: Alu's (about 50% G+C) are found preferentially in GC-rich isochores, whereas L1 elements (about 37% G+C) are found preferentially in GC-poor isochores (Duret, Mouchiroud, and Gautier 1995). Recognizable Alu's make up 20% of GC-rich isochores and 7% of GC-poor isochores (Smit 1999). By contrast, recognizable L1 elements make up 5% of GC-rich isochores and 20% of GC-poor isochores (Smit 1999). The reasons for the biases in distribution are unclear, but they probably do not reflect biases in patterns of insertion. They most likely represent biases in patterns of decay of fixed elements (see Discussion).

To see the effects that the observed pattern of TE distribution has on GC content, let us consider a noncoding region where the GC content at equilibrium would be 30% according to the background pattern of point mutation, with 20% L1 and 7% Alu. For illustrative purposes, we shall assume that selection does not act on GC content. The global GC content in this region would be


$$\mathrm{GC} = (0.73 \times 0.30) + (0.20 \times 0.37) + (0.07 \times 0.50) = 0.328 \tag{1}$$

In such a sequence, GCi would be around 33%, while GC4 would be lower, at 30%. In a region in which the GC content at equilibrium would be 80% according to the pattern of mutation, with 5% L1 and 20% Alu, the global GC content in this region would be:


$$\mathrm{GC} = (0.75 \times 0.80) + (0.05 \times 0.37) + (0.20 \times 0.50) \approx 0.72 \tag{2}$$

In this sequence, GCi would be around 72%, while GC4 would be higher, at 80%.
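
The two illustrative calculations are weighted averages of the background GC content and the GC content of each TE family. A minimal Python sketch, using only the percentages quoted in the text (the function and variable names are ours), reproduces them:

```python
def mixed_gc(background_gc, te_components):
    """GC content of a region made of unique sequence plus TE insertions.

    background_gc: equilibrium GC fraction set by point mutation alone.
    te_components: list of (fraction_of_region, gc_fraction) pairs, one per TE family.
    """
    te_fraction = sum(fraction for fraction, _ in te_components)
    if not 0.0 <= te_fraction <= 1.0:
        raise ValueError("TE fractions must sum to a value between 0 and 1")
    te_gc = sum(fraction * gc for fraction, gc in te_components)
    return (1.0 - te_fraction) * background_gc + te_gc

L1_GC = 0.37   # approximate G+C content of L1 elements, as quoted in the text
ALU_GC = 0.50  # approximate G+C content of Alu elements, as quoted in the text

# GC-poor example: 30% background, 20% L1, 7% Alu.
gc_poor = mixed_gc(0.30, [(0.20, L1_GC), (0.07, ALU_GC)])
# GC-rich example: 80% background, 5% L1, 20% Alu.
gc_rich = mixed_gc(0.80, [(0.05, L1_GC), (0.20, ALU_GC)])

print(f"{gc_poor:.0%} {gc_rich:.0%}")  # 33% 72%
```

Under the assumption that GC4 tracks the background value, the sign of GC4 - GCi flips between the two examples: 30% versus about 33% in the GC-poor region, and 80% versus about 72% in the GC-rich region.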

Importantly, if we suppose that GC4 is a reflection of local point-mutational pressures alone, then the above calculations suggest that this set of biases could explain why GCi is higher than GC4 in GC-poor isochores and why GCi is lower than GC4 in GC-rich isochores. Note that the impact of TE insertions on GC content is probably stronger than suggested by the above calculation. Indeed, besides Alu and L1 (which make up about 50%–60% of all recognizable TEs; Smit 1999), other TEs also affect the GC content of introns. Moreover, many old TEs that are no longer recognizable also contribute to GC content. Therefore, we appear to have a model that can compete with the compensatory selectionist model as an explanation for the pattern shown in figure 1. We now examine the TE model in more detail to determine whether it is a likely explanation.


    Materials and Methods
 
Human genes were extracted from HOVERGEN (Duret, Mouchiroud, and Gouy 1994), release 36 (June 15, 1999), using the ACNUC sequence retrieval system (Gouy et al. 1984). First, we selected all complete human CDSs with introns (excluding pseudogenes). Those containing partial introns were excluded, as were multiple copies of the same gene and alternative splicing variants. Genes for which introns have not been entirely sequenced but exons and partial introns have been concatenated in a single GenBank entry were also excluded. For each gene, the minimum and the average intron sizes were examined, and the shortest introns were checked by eye and removed if suspect. Furthermore, cDNAs containing CDSs with a "join" were excluded, as these generally correspond to alternative splice forms. The set of genes was also contaminated with partial genes annotated as "complete." These were eliminated by examining the shortest CDSs. We also excluded immunoglobulin and TCR genes. The list of 1,396 genes that we selected is available at http://pbil.univ-lyon1.fr/datasets/Duret_Hurst.html.


    Predictions and Results
 
A simple prediction of the TE model is that the difference between GC4 and the GC content of repeats within introns should be small and negative in GC-poor isochores but large and positive in GC-rich isochores. We applied RepeatMasker to isolate TEs and found these predictions to be upheld: the difference between GC4 and intronic GC content within recognizable repeats in GC-poor isochores was -0.0262 (SD = 0.0928, N = 292, GC4 < 57%), but it was +0.269 (SD = 0.0658, N = 267, GC4 > 75%) in GC-rich ones. As expected, in GC-rich isochores, the difference between GC4 and intronic GC content was lower outside (0.219, SD = 0.0613) than within recognizable repeats (t = 17.3, P < 0.0001). However, the difference between GCi and GC4 remains relatively large, possibly because introns contain many unrecognizable TEs. Alternatively, there may be some truth in the compensatory selection explanation, which would also predict a large residual difference.

The compensatory selectionist model makes two predictions. First, if GC12 is much lower than GCi, then GC4 must be especially high to compensate (and vice versa if GC12 is higher than GCi). Therefore, there should be a positive correlation between (x = GCi - GC12) and (y = GC4 - GCi). Second, if GC12 is equal to GCi, then no compensation is necessary, and GC4 should be the same as GC12 and GCi.
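
To make the two axes concrete, the sketch below computes GC12, GC4, and GCi for a single gene and returns x and y as defined above. It is illustrative only and is not the pipeline used for the 1,396 genes: it assumes the standard genetic code, an in-frame CDS, and introns supplied as one concatenated string, and the function names are ours.

```python
# First two bases of the eight fourfold-degenerate codon families in the
# standard genetic code (Leu CTN, Val GTN, Ser TCN, Pro CCN, Thr ACN,
# Ala GCN, Arg CGN, Gly GGN).
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}

def gc_fraction(bases):
    """Fraction of G or C among unambiguous bases; NaN if there are none."""
    valid = [b for b in bases if b in "ACGT"]
    if not valid:
        return float("nan")
    return sum(b in "GC" for b in valid) / len(valid)

def codon_gc_stats(cds):
    """Return (GC12, GC4) for an in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3          # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    first_two = [base for codon in codons for base in codon[:2]]
    fourfold_third = [codon[2] for codon in codons if codon[:2] in FOURFOLD_PREFIXES]
    return gc_fraction(first_two), gc_fraction(fourfold_third)

def compensation_axes(cds, intron_seq):
    """Return (x, y) with x = GCi - GC12 and y = GC4 - GCi."""
    gc12, gc4 = codon_gc_stats(cds)
    gci = gc_fraction(intron_seq.upper())
    return gci - gc12, gc4 - gci

# Toy sequences, not taken from the data set.
x, y = compensation_axes("ATGGCCGCGCTT", "AATTGGCC")
```

In the toy example, the codons GCC, GCG, and CTT contribute fourfold-degenerate third positions, whereas ATG does not.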

Consistent with the first prediction, we found a positive correlation between x and y (fig. 2). However, this correlation, although statistically significant, is very weak (R² = 0.046). Moreover, this is not a discriminating prediction because the mutational model also predicts this correlation. According to this mutational model, GC4 - GCi (and GC12 - GCi, but to a lesser extent because of selective constraints on the encoded protein) indicates the difference in the mutation pattern between coding and noncoding regions. As detailed above, in GC-rich regions, the difference between the GC content of TEs and that of the background point-mutational processes in unique sequences is large, whereas in GC-poor regions, the difference is very much less pronounced. Thus, in GC-rich regions, we expect a relatively large positive difference between both GC4 and GC12 on the one hand and GCi on the other. By contrast, in GC-poor regions, the background GC pressures are quite similar to the GC content of the TEs. Hence GC4 and GC12 should both be only slightly less than GCi. This model therefore predicts that, considered over all isochores, x and y should positively covary.



Fig. 2. The relationship between x = GCi - GC12 and y = GC4 - GCi, using orthogonal regression: y = 0.167 + 3.399x; R² = 0.046, P < 0.0001.

 
The second prediction of the compensatory selectionist model is that the regression line between x and y should run through the origin (x = 0, y = 0). The intercept is, however, significantly different from 0 (the intercept at x = 0 is 0.167; fig. 2), and the 95% confidence intervals do not run close to the origin (see fig. 2). This is not consistent with the selectionist model. However, the model that we can reject is one that supposes that GCi is the same as the putative selective optimum GC level. While we can reject this model, we cannot reject more complicated compensatory models that might suppose that GCi is not precisely the same as the putative selective optimum.

We can provide two further tests of the TE model. We may suppose that long introns are more likely to have received a relatively recent TE insertion (and hence are far from mutational equilibrium). This supposition is based on two points of logic. First, a small intron with a recent TE insertion would no longer be small. Second, if TEs are fixed at random, for which there is some evidence (see Discussion), young ones are more likely to be found in long introns than in short ones, simply because a given long intron represents a larger target area. We can then predict that GC4 - GCi should tend to 0 as intron size tends to 0. This is equivalent to arguing that an intron that is too small to have any TEs should have the same GC content as flanking fourfold-degenerate sites. This is found to be true (fig. 3).



Fig. 3. The relationship between log(mean intron size) and GC4 - GCi for GC-rich genes. Note that small introns have less discrepancy between GC4 and GCi: Y = -0.26 + 0.085X, R² = 0.26, P < 0.0001, N = 452.

 
Furthermore, it is known that TEs are found at the extremities of introns less often than would be expected by chance (Jareborg, Birney, and Durbin 1999). Therefore, regardless of intron size, the extremities should show less of a difference between GC4 and GCi than the internal parts of the introns. We do not expect intronic extremities to have a GC content precisely the same as GC4, as some TE contamination is possible. In the GC-moderate and GC-rich regions, we find that the extremities of introns have a much lower difference between GC4 and GCi, as predicted (fig. 4). We cannot conceive of a selective reason for this, especially since we removed the constrained parts of the sequence.



Fig. 4.—The difference between GC4 and GCi in the three isochore types for the extremities of the introns and the interior of introns. Note that the discrepancy between GC4 and GCi is much reduced at the extremities. The extremities are the first and last 50 bp of introns except for signals for splice donor (positions +1 to +6) and splice acceptor (positions -20 to -1 relative to the intron/exon junction). The average number of introns in the data set (excluding genes with only one intron) is 7.6, and the average length of extremities is 551 nt
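
The partition of an intron described in the legend can be sketched as follows. This is an illustration under our own assumptions rather than the code used for the analysis: introns shorter than twice the flank length are rejected, and the function name and parameter names are ours.

```python
def split_intron(intron, flank=50, donor_signal=6, acceptor_signal=20):
    """Split an intron (5' to 3') into (extremities, interior).

    Extremities are the first and last `flank` bases with the splice donor
    signal (positions +1 to +donor_signal) and the splice acceptor signal
    (the last `acceptor_signal` bases) removed. The interior is everything
    between the two flanks.
    """
    if len(intron) < 2 * flank:
        # Assumption: very short introns have no interior and are skipped.
        raise ValueError("intron is shorter than the two flanks combined")
    five_prime = intron[donor_signal:flank]
    three_prime = intron[len(intron) - flank:len(intron) - acceptor_signal]
    interior = intron[flank:len(intron) - flank]
    return five_prime + three_prime, interior

def gc_content(seq):
    """Fraction of G or C in a non-empty sequence of A, C, G, and T."""
    return sum(base in "GC" for base in seq.upper()) / len(seq)

# Synthetic 200-bp intron: donor signal, 5' flank, interior, 3' flank, acceptor signal.
toy_intron = "A" * 6 + "G" * 44 + "T" * 100 + "C" * 30 + "A" * 20
extremities, interior = split_intron(toy_intron)
```

With the default settings, each intron contributes 44 + 30 = 74 nt of extremity sequence.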

 
According to the TE model, the pattern of substitution should be similar in introns and coding sequences (NB: because of neighboring effects, mutation patterns may in fact be slightly different in the two compartments). We tested this prediction by looking at the substitution pattern in GC-rich genes (i.e., genes for which the difference between GC4 and GCi was the most striking). This analysis required orthologous genes (with intron sequences) from at least three species (two species plus an outgroup to orient substitutions) that were close enough for their introns to be alignable. We used HOVERGEN to search for triplets of orthologous genes (for which at least one intron was available) in humans, a nonhuman hominid (e.g., the chimpanzee), and another catarrhine as outgroup (e.g., a macaque). We found eight genes that fit these criteria, among which five were GC-rich (GC3 ≥ 75%): c-myc, protamine 1, alpha-globin, theta-globin, and insulin. The alignments of these five genes were pooled (935 codons, GC3 = 81%; 2,865 bp from eight introns, GCi = 58%). We measured base-specific substitution rates u (AT→GC) and v (GC→AT) in introns and third codon positions. In agreement with the TE model, we found that patterns of substitution (u/(u + v)) were very similar in introns and third codon positions: for third codon positions, u = 0.0237, v = 0.0235, and u/(u + v) = 0.50; for introns, u = 0.0165, v = 0.0165, and u/(u + v) = 0.51.
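
One way to obtain u and v from such a triplet is to orient each difference between the two ingroup species by parsimony against the outgroup. The Python sketch below illustrates that idea. It is an assumption about the procedure rather than a description of the exact method used here: sites where all three species differ are skipped, rates are expressed per ancestral AT or GC site, and all names are ours. The ratio u/(u + v) is the GC content expected at equilibrium under a two-state mutation model.

```python
def oriented_rates(seq_a, seq_b, outgroup):
    """Estimate u (AT->GC) and v (GC->AT) from three aligned sequences.

    seq_a and seq_b are the two ingroup species. The ancestral state at a
    site is inferred by parsimony: it is the shared ingroup base when the
    ingroup agrees, and otherwise the ingroup base matching the outgroup.
    Gapped, ambiguous, or unorientable sites are ignored.
    """
    at_sites = gc_sites = at_to_gc = gc_to_at = 0
    for a, b, o in zip(seq_a.upper(), seq_b.upper(), outgroup.upper()):
        if any(base not in "ACGT" for base in (a, b, o)):
            continue
        if a == b:
            ancestral, derived = a, None      # no change within the ingroup
        elif a == o:
            ancestral, derived = o, b         # change on the lineage to b
        elif b == o:
            ancestral, derived = o, a         # change on the lineage to a
        else:
            continue                          # three different bases
        ancestral_is_gc = ancestral in "GC"
        if ancestral_is_gc:
            gc_sites += 1
        else:
            at_sites += 1
        if derived is not None:
            derived_is_gc = derived in "GC"
            if ancestral_is_gc and not derived_is_gc:
                gc_to_at += 1
            elif derived_is_gc and not ancestral_is_gc:
                at_to_gc += 1
    if at_sites == 0 or gc_sites == 0:
        raise ValueError("need at least one ancestral AT site and one GC site")
    return at_to_gc / at_sites, gc_to_at / gc_sites

def equilibrium_gc(u, v):
    """GC content expected at equilibrium for rates u (AT->GC) and v (GC->AT)."""
    return u / (u + v)

# Toy alignment, not real data: one AT->GC and one GC->AT change.
u, v = oriented_rates("AAGGCT", "AGGACT", "AAGGTT")
```

Because u and v are each ratios of small counts when few substitutions are available, the resulting estimate of u/(u + v) should be interpreted with corresponding caution.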

It should be noted, however, that this sample is very small (5 genes, only 68 informative substitutions), maybe too small to detect any significant difference. Unfortunately, the other species for which genomic sequences are available (rodents, bovines) are too distantly related for their introns to be aligned with those of humans. More data will therefore be necessary to more thoroughly test the TE model.


    Discussion
 
We have shown that the correlation of x = GCi - GC12 and y = GC4 - GCi across different isochores is consistent with the TE model. We have also shown that this model correctly predicts that small introns have a minimal difference in GC4 and GCi and that the degree of difference between GC4 and GCi should be attenuated when one examines the extremities of introns only. In short, we have failed to falsify the TE model.

Additionally, we can probably reject a selectionist model that assumes that GCi is the selective optimum. While we struggle to imagine selective explanations for the other patterns that we have described, we are cautious about rejecting a selectionist model outright. What we can safely conclude is that there exists a viable alternative mutationist model to account for the patterns of covariance of GC4 and GCi. The assertion that one must invoke selection to explain GC4 > GCi because repair enzymes cannot act differently in introns and in exons therefore seems to be unsafe, as this rejection makes no allowance for TE insertions.

Our findings, however, say relatively little about the evolution of isochores. Importantly, we have not explained why it is that the patterns of point mutations/substitutions vary within the genome. We have, therefore, failed to account for the strong correlation between GC4 and GCi. All that we have done is provide some evidence that the slope of the regression is not equal to 1 because of the impact of TEs on local intronic GC content.

Perhaps even more crucially, we have not given an account of the underlying causes of the patterns in TE distribution. The fact that young Alu's (<4.5% substitution) are distributed uniformly in all isochores (Smit 1999) suggests that Alu's are inserted randomly. Why, then, do older Alu's show the classical bias toward GC-rich isochores? As the young Alu's are fixed, it cannot be owing to selection against them once they have inserted. Instead, the pattern seems most parsimoniously explained by different decay rates in different genomic regions, for which other evidence is available (Casane et al. 1997). Whether this is due to differences in the strength of stabilizing selection or differences in the rate of mutation has not been resolved. For this reason, additionally, our analysis fails to contribute to the selectionist-versus-neutralist debate over the maintenance of isochores.

The question of whether isochores are the result of selection or mutation therefore remains open. Our results have nonetheless made clear that if there is selection, this selection occurs at the genomic level (not the RNA or the protein level), in both coding and noncoding regions. Hence, any model of selection on isochores should take TEs into account.


    Acknowledgements
 
We thank Dominique Mouchiroud, Manolo Gouy, Adam Eyre-Walker, James Randerson, Dan Graur, and Elizabeth Williams for their helpful comments. We also thank two anonymous referees for comments on an earlier version of this paper. This work was supported by the Centre National de la Recherche Scientifique. L.D.H. was funded by the Royal Society.


    Footnotes
 
Dan Graur, Reviewing Editor

1 Abbreviations: GC4, GC content at fourfold-redundant sites in exons; GC12, GC content at the first two positions in codons; GCi, GC content of introns; TE, transposable element.

2 Keywords: GC content, isochores, transposable elements, introns

3 Address for correspondence and reprints: Laurence D. Hurst, Department of Biology and Biochemistry, University of Bath, Claverton Down, Bath BA2 7AY, United Kingdom. l.d.hurst{at}bath.ac.uk


    Literature Cited
 

    Bernardi, G. 1995. The human genome: organization and evolutionary history. Annu. Rev. Genet. 29:445–476.

    Bernardi, G. 2000. Isochores and the evolutionary genomics of vertebrates. Gene 241:3–17.

    Bernardi, G., and G. Bernardi. 1986. Compositional constraints and genome evolution. J. Mol. Evol. 24:1–11.

    Bernardi, G., B. Olofsson, J. Filipski, M. Zerial, J. Salinas, G. Cuny, M. Meunier-Rotival, and F. Rodier. 1985. The mosaic genome of warm-blooded vertebrates. Science 228:953–958.

    Casane, D., S. Boissinot, B. H. J. Chang, L. C. Shimmin, and W. H. Li. 1997. Mutation pattern variation among regions of the primate genome. J. Mol. Evol. 45:216–226.

    D'Onofrio, G., D. Mouchiroud, B. Aissani, C. Gautier, and G. Bernardi. 1991. Correlations between the compositional properties of human genes, codon usage, and amino-acid-composition of proteins. J. Mol. Evol. 32:504–510.

    Duret, L., and D. Mouchiroud. 2000. Determinants of substitution rates in mammalian genes: expression pattern affects selection intensity but not mutation rate. Mol. Biol. Evol. 17:68–74.

    Duret, L., D. Mouchiroud, and C. Gautier. 1995. Statistical-analysis of vertebrate sequences reveals that long genes are scarce in GC-rich isochores. J. Mol. Evol. 40:308–317.

    Duret, L., D. Mouchiroud, and M. Gouy. 1994. HOVERGEN: a database of homologous vertebrate genes. Nucleic Acids Res. 22:2360–2365.

    Eyre-Walker, A. 1993. Recombination and mammalian genome evolution. Proc. R. Soc. Lond. B Biol. Sci. 252:237–243.

    Eyre-Walker, A. 1999. Evidence of selection on silent site base composition in mammals: potential implications for the evolution of isochores and junk DNA. Genetics 152:675–683.

    Filipski, J. 1987. Correlation between molecular clock ticking, codon usage, fidelity of DNA-repair, chromosome-banding and chromatin compactness in germline cells. FEBS Lett. 217:184–186.

    Francino, H. P., and H. Ochman. 1999. Isochores result from mutation not selection. Nature 400:30–31.

    Gonçalves, I., L. Duret, and D. Mouchiroud. 2000. Nature and structure of human genes that generate retropseudogenes. Genome Res. 10:672–678.

    Gouy, M., F. Milleret, C. Mugnier, M. Jacobzone, and C. Gautier. 1984. ACNUC: a nucleic-acid sequence data-base and analysis system. Nucleic Acids Res. 12:121–127.

    Hattori, M., A. Fujiyama, T. D. Taylor et al. (60 co-authors). 2000. The DNA sequence of human chromosome 21. Nature 405:311–319.

    Hughes, A. L., and M. Yeager. 1997. Comparative evolutionary rates of introns and exons in murine rodents. J. Mol. Evol. 45:125–130.

    Hughes, S., D. Zelus, and D. Mouchiroud. 1999. Warm-blooded isochore structure in Nile crocodile and turtle. Mol. Biol. Evol. 16:1521–1527.

    Jareborg, N., E. Birney, and R. Durbin. 1999. Comparative analysis of noncoding regions of 77 orthologous mouse and human gene pairs. Genome Res. 9:815–824.

    Matassi, G., P. M. Sharp, and C. Gautier. 1999. Chromosomal location effects on gene sequence evolution in mammals. Curr. Biol. 9:786–791.

    MHC Sequencing Consortium. 1999. Complete sequence and gene map of a human major histocompatibility complex. Nature 401:921–923.

    Ophir, R., and D. Graur. 1997. Patterns and rates of indel evolution in processed pseudogenes from humans and murids. Gene 205:191–202.

    Sharp, P. M., M. Averof, A. T. Lloyd, G. Matassi, and J. F. Peden. 1995. DNA-sequence evolution: the sounds of silence. Philos. Trans. R. Soc. Lond. B Biol. Sci. 349:241–247.

    Smit, A. F. 1999. Interspersed repeats and other mementos of transposable elements in mammalian genomes. Curr. Opin. Genet. Dev. 9:657–663.

    Sueoka, N. 1988. Directional mutation pressure and neutral molecular evolution. Proc. Natl. Acad. Sci. USA 85:2653–2657.

    Williams, E. J. B., and L. D. Hurst. 2000. The proteins of linked genes evolve at similar rates. Nature 407:900–903.

    Wolfe, K. H., P. M. Sharp, and W.-H. Li. 1989. Mutation rates differ among regions of the mammalian genome. Nature 337:283–285.

Accepted for publication January 5, 2001.