* International Institute of Molecular and Cell Biology
Institute of Biochemistry and Biophysics, Warsaw, Poland
Correspondence: E-mail: grzesiek{at}iimcb.gov.pl.
Abstract
To investigate the mechanisms regulating nucleotide usage in mammalian genes, we analyzed the sequences of three physically linked Hsp70 paralogs in human and mouse. We report that the sequences of the HSPA1A and HSPA1B genes are almost identical, whereas the HSPA1L gene contains some regions very similar to HSPA1A/B and some regions with much higher divergence. Phylogenetic analysis reveals that gene conversion has homogenized the entire coding regions of HSPA1A/B and several fragments of HSPA1L. The regions undergoing conversion are all very GC rich, in contrast to the regions not subject to conversion. The pattern of nucleotide substitution in mammalian orthologs suggests that the mechanism increasing the GC content is still functioning. To test the possibility that high GC content facilitates the expression of Hsp70 during heat-shock, we performed in vitro translation experiments. We failed to detect any effect of GC content on translation efficiency at high temperatures. Taken together, our data strongly support the biased gene conversion hypothesis of GC-content evolution.
Key Words: GC content, gene conversion, heat-shock
Introduction
The variability of nucleotide composition is one of the most mysterious characteristics of mammalian genomes. Bernardi et al. (1985) performed density gradient centrifugations of sheared mammalian DNA and observed several fractions with distinct average densities. The fractions were found to correspond to DNA fragments with different average GC contents, and the corresponding genome regions were called isochores. Although the question of whether isochores are truly "iso" has recently raised much controversy (Bernardi 2001; Lander et al. 2001; Li et al. 2003), there is no doubt that mammalian genomes are far from homogeneous and that the reasons for this are not well understood.
In prokaryotic genomes, the main source of GC-content heterogeneity is the presence of highly expressed genes with strongly biased codon usage. Many studies have shown that the use of optimal codons is a decisive factor determining the level of gene expression in bacteria, yeast, flies, and worms and that genes requiring high translation levels are forced by selection to adopt a particular set of codons (Gouy and Gautier 1982; Grosjean and Fiers 1982; Sharp and Li 1986; Bulmer 1987; Powell and Moriyama 1997; Duret and Mouchiroud 1999).
The situation is more complicated in mammals. The GC content of large genome fragments (isochores) ranges from 30% to 60%, and the GC content at the third codon positions of genes (GC3) ranges from 25% to more than 90% (Bernardi 1995). Unlike in lower organisms, no clear correlation has been found between the codon usage of genes and their expression levels. Consequently, several other hypotheses have been put forward to explain the origin of GC-rich isochores and genes in mammals. For example, based on the observation that the genomes of homeothermic vertebrates have more GC-rich isochores than the poikilothermic ones, it has been proposed that the isochore structure of mammalian genomes is an adaptation to higher body temperatures (Bernardi et al. 1985). Although some data from genomic sequence analysis support the selective hypotheses of mammalian GC-content evolution (Zoubak et al. 1995; Hughes and Yeager 1997; Eyre-Walker 1999), most recent analyses argue against the thermal stability version of the selectionist view (Hughes, Zelus, and Mouchiroud 1999; Hamada et al. 2003; Ream, Johns, and Somero 2003).
An alternative set of hypotheses proposes that the GC content of mammalian isochores and the codon usage of genes have no selective meaning. The neutral factors that were proposed to account for the GC-content variation include mutation bias and biased gene conversion (BGC) (Sueoka 1988; Wolfe, Sharp, and Li 1989; Holmquist 1992; Eyre-Walker 1993; Eyre-Walker and Hurst 2001; Birdsell 2002). The latter theory, which has recently gained much interest (Galtier 2003; Marais 2003; Montoya-Burgos, Boursot, and Galtier 2003), states that high GC content is a consequence of a GC-biased repair of mismatches during recombination. If this theory is true, then frequently recombining genes, as well as those undergoing concerted evolution by gene conversion, should experience an increase of GC content. Indeed, it is known that GC content correlates with the local recombination rate (Eyre-Walker 1993). There is also evidence that the GC content of histone genes covaries with the number of their close paralogs in the genome, potentially indicative of gene conversion (Galtier 2003). However, the evidence for gene conversion of the histone genes is rather indirect. A model locus was needed that provided good evidence for gene conversion and in which some genes or gene fragments were known not to undergo conversion. Such a locus would enable the direct investigation of the relationship between conversion and GC content.
Here we investigate a triplet of closely linked mammalian Hsp70-family genes. Two of the genes undergo frequent conversions, and they are more GC-rich than the third gene, which is subject only to partial conversions. This supports the BGC hypothesis of GC-content evolution. In addition, the Hsp70-family genes are differentially expressed at high temperatures, which provides a useful framework for testing the thermal hypothesis of isochore evolution. Our initial experiments suggest that high GC content is not required for the efficient translation of mammalian genes at high temperatures.
Materials and Methods
Phylogenetic Analyses
The following sequences were used in this study: human (Homo sapiens) HSPA1A (M59828), HSPA1B (M59830), HSPA1L (D85730); mouse (Mus musculus) Hspa1a (M76613), Hspa1b (M35021), Hspa1l (M32218); rat (Rattus norvegicus) Hspa1a (X77207), Hspa1b (X77208), Hspa1l (X77209); pig (Sus scrofa) HSP70 (M69100); and bovine (Bos taurus) HSP70-2 (U02892). The sequences were aligned using ClustalW at http://www.ebi.ac.uk/clustalw/ with the default parameters. Further analyses were performed on the coding regions of the genes using the MEGA2 package (Kumar et al. 2001). The synonymous and nonsynonymous substitution rates were calculated using the Nei and Gojobori method (Nei and Gojobori 1986). The phylogenetic trees were built by the neighbor-joining method with Tamura's three-parameter distance measure (Tamura 1992), which corrects for the GC content and transition/transversion rate biases. The bootstrap tests of phylogeny were performed using 500 replicates. To estimate the transition/transversion rate ratios between the HSPA1A orthologs, Tamura's three-parameter model was used.
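For readers unfamiliar with the distance measure, the following is a minimal Python sketch of the Tamura (1992) three-parameter distance for one pair of aligned sequences. It is an illustration only, not the MEGA2 implementation used in this study. The function name `tamura_3p_distance`, the pairwise removal of gapped or ambiguous sites, and the use of the average GC content of the two sequences as theta are assumptions made for this example.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def tamura_3p_distance(seq1, seq2):
    """Tamura (1992) three-parameter distance for two aligned sequences.

    d = -h*ln(1 - P/h - Q) - 0.5*(1 - h)*ln(1 - 2Q), with h = 2*theta*(1 - theta),
    where P and Q are the proportions of transitions and transversions and
    theta is the GC content (assumed here: average over both sequences).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = gc = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        # Assumption: sites with a gap or ambiguous base in either sequence are skipped.
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        gc += (a in "GC") + (b in "GC")
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    theta = gc / (2 * sites)
    h = 2 * theta * (1 - theta)
    # With h == 0 every base is A/T or every base is G/C, so no transition can occur.
    first = 0.0 if h == 0 else -h * math.log(1 - p / h - q)
    # math.log raises ValueError for saturated pairs (argument <= 0).
    return first - 0.5 * (1 - h) * math.log(1 - 2 * q)
```

When theta is 0.5 the expression reduces to the Kimura two-parameter distance, which is a convenient sanity check for the sketch.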
Nucleotide Usage Analyses
To investigate local GC contents within and around the genes, the genomic fragments containing the human and mouse MHCIII loci were downloaded from the NCBI site (http://www.ncbi.nlm.nih.gov). To compare the local GC content and the local similarity between HSPA1A and HSPA1B, 9 kb of genomic sequence centered at the two genes was aligned using the default settings of ClustalW. The local similarity was estimated as the percentage of nucleotide matches in sliding windows of 100 bp (counting gaps as mismatches), and the local GC content was also measured in 100-bp windows. To investigate the similarity between HSPA1A and HSPA1L, only the coding regions of the genes were used.
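The sliding-window measurement described above can be sketched in Python as follows. This is a minimal illustration, not the script used in the study. The function names, the `step` parameter (the step size is not stated in the text), and the choice to measure GC content on the first sequence of the pair with gap characters excluded are assumptions of this example.

```python
def gc_fraction(window):
    """Fraction of G or C among the unambiguous bases of a window (gaps ignored)."""
    bases = [b for b in window.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    return sum(b in "GC" for b in bases) / len(bases)


def sliding_window_profile(aln1, aln2, window=100, step=1):
    """Return a list of (start, percent_identity, gc_fraction) per window.

    aln1 and aln2 are the two rows of a pairwise alignment (equal length,
    '-' for gaps). A column counts as a match only if both rows carry the
    same nucleotide, so gapped columns are scored as mismatches.
    """
    if len(aln1) != len(aln2):
        raise ValueError("alignment rows must have equal length")
    a, b = aln1.upper(), aln2.upper()
    profile = []
    for start in range(0, len(a) - window + 1, step):
        wa, wb = a[start:start + window], b[start:start + window]
        matches = sum(x == y and x in "ACGT" for x, y in zip(wa, wb))
        identity = 100.0 * matches / window
        # Assumption: GC content is taken from the first sequence only.
        profile.append((start, identity, gc_fraction(wa)))
    return profile
```

Plotting the identity and GC columns of the returned profile against the window start gives the kind of side-by-side comparison of local similarity and local GC content described in this section.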
In Vitro Translation
Human HSPA1A and HSPA8 were cloned into the pET3c plasmid (Novagen) containing a T7 polymerase promoter and a T7 terminator. After linearization, the plasmids were used as templates for the production of capped mRNA with the T7 Cap Scribe kit (Roche). Equal amounts of mRNA for HSPA1A and HSPA8 were translated for 1 h using the Reticulocyte Translation Kit Type II (Roche) and 35S-labeled methionine (Amersham Biosciences) in a gradient thermocycler in the temperature range 26°C to 42°C.
Half of the reaction mixture was resolved on a 10% SDS-polyacrylamide gel, dried and exposed overnight to an autoradiography film (Kodak BioMax). The other half was TCA-precipitated on GFC filters (Whatman) according to the protocol enclosed in the Reticulocyte Translation Kit Type II and counted in a liquid scintillation counter.
Results and Discussion
Around 10 genes from the Hsp70 family are present in the genomes of humans and other mammals (Tavaria, Kola, and Anderson 1997). We focused our attention on the three human genes located in the MHCIII locus. Two of those genes, HSPA1A and HSPA1B, are intronless, and their expression is strongly increased in most tissues after heat-shock or other types of stress. The third gene, HSPA1L, possesses one intron, is testis specific, and is constitutively expressed even in the absence of heat-shock. An orthologous triplet of genes also exists in the MHCIII complex of the mouse (fig. 1a) and rat (Walter, Rauh, and Gunther 1994; Tavaria, Kola, and Anderson 1997; Ito et al. 1998). It is, therefore, assumed that the duplications that led to the formation of the HSPA1A, HSPA1B, and HSPA1L genes must have taken place before the split of the rodent and primate lineages.
To locate more precisely the regions undergoing conversion between the paralogous genes, we performed pairwise alignments of the human (or mouse) genomic regions containing the genes of interest. The results of this analysis are summarized in table 1. Human HSPA1A and HSPA1B are more than 99% identical, as are mouse Hspa1a and Hspa1b. The regions of high similarity begin around 500 nucleotides upstream of the start of the open reading frames and stop near the end of the translated regions. Assuming an evolutionary rate of 2.5 silent substitutions per site per billion years (Lynch and Conery 2000), we can estimate that the last conversion event between human HSPA1A and HSPA1B took place 2 MYA. Similarly, the last conversion event in the mouse can be dated to 3 MYA. We also observed several regions of high local similarity between the mouse Hspa1a and Hspa1l genes (table 1). The average identity at fourfold degenerate sites between Hspa1a and Hspa1l is 57%, but there are some stretches of 50 to 90 codons with more than 90% identity at fourfold degenerate sites. A similar result was obtained for the human HSPA1A and HSPA1L genes (table 1), as well as for the rat genes (data not shown). When we repeated the phylogenetic analysis using those gene fragments, the conversion between Hspa1a/b and Hspa1l became apparent (fig. 1c and d). Interestingly, we found that similar regions undergo independent conversions in humans and mice (table 1 and fig. 1d). The extent of similarity suggests that the most similar fragments of HSPA1A/B and HSPA1L underwent conversion around 15 MYA. We conclude that frequent gene conversion homogenizes the entire coding regions of HSPA1A and HSPA1B and several fragments of HSPA1A (or HSPA1B) and HSPA1L.
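The dating arithmetic can be illustrated with a short Python sketch. It assumes a simple molecular clock in which both gene copies accumulate silent substitutions independently after the last conversion, so that divergence K = 2rt and t = K / (2r). The function name, the factor of 2, the use of uncorrected divergence, and the 99% identity in the usage note are assumptions of this example; the text states only the rate (2.5 silent substitutions per site per billion years) and the resulting dates.

```python
def years_since_conversion(silent_identity, rate_per_site_per_year=2.5e-9):
    """Estimate the time since two gene copies were last homogenized.

    silent_identity: fraction of identical silent sites (0..1).
    rate_per_site_per_year: silent substitution rate per lineage;
        2.5e-9 corresponds to 2.5 substitutions per site per billion years.

    Uses uncorrected divergence, which is only reasonable when the copies
    are still very similar (no correction for multiple hits is applied).
    """
    if not 0.0 <= silent_identity <= 1.0:
        raise ValueError("identity must be a fraction between 0 and 1")
    divergence = 1.0 - silent_identity
    # Both copies diverge from the converted state, hence the factor of 2.
    return divergence / (2.0 * rate_per_site_per_year)
```

Under these assumptions, an illustrative silent-site identity of 99% gives `years_since_conversion(0.99)` of roughly 2 million years. For strongly diverged fragments a multiple-hit correction would be needed before applying the same relation.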
Acknowledgements
We thank M. Zylicz, M. Bochtler, E. Bartnik, R. I. Morimoto, J. M. Bujnicki, M. Cheetham, and the people from the M. Zylicz lab and from the Polish Children's Fund for helpful discussions. G.K. is the recipient of a scholarship from the Postgraduate School of Molecular Medicine affiliated with the Medical University of Warsaw. This work was supported by the State Committee for Scientific Research Grant number 6P04A4219 and the Foundation for Polish Science.
Literature Cited
Bernardi, G. 1995. The human genome: organization and evolutionary history. Annu. Rev. Genet. 29:445-476.
Bernardi, G. 2001. Misunderstandings about isochores. Part 1. Gene 276:3-13.
Bernardi, G., B. Olofsson, J. Filipski, M. Zerial, J. Salinas, G. Cuny, M. Meunier-Rotival, and F. Rodier. 1985. The mosaic genome of warm-blooded vertebrates. Science 228:953-958.
Bielawski, J. P., K. A. Dunn, and Z. Yang. 2000. Rates of nucleotide substitution and mammalian nuclear gene evolution: approximate and maximum-likelihood methods lead to different conclusions. Genetics 156:1299-1308.
Birdsell, J. A. 2002. Integrating genomics, bioinformatics, and classical genetics to study the effects of recombination on genome evolution. Mol. Biol. Evol. 19:1181-1197.
Bulmer, M. 1987. Coevolution of codon usage and transfer RNA abundance. Nature 325:728-730.
Duret, L., and D. Mouchiroud. 1999. Expression pattern and, surprisingly, gene length shape codon usage in Caenorhabditis, Drosophila, and Arabidopsis. Proc. Natl. Acad. Sci. USA 96:4482-4487.
Duret, L., M. Semon, G. Piganeau, D. Mouchiroud, and N. Galtier. 2002. Vanishing GC-rich isochores in mammalian genomes. Genetics 162:1837-1847.
Eyre-Walker, A. 1993. Recombination and mammalian genome evolution. Proc. R. Soc. Lond. B Biol. Sci. 252:237-243.
Eyre-Walker, A. 1999. Evidence of selection on silent site base composition in mammals: potential implications for the evolution of isochores and junk DNA. Genetics 152:675-683.
Eyre-Walker, A., and L. D. Hurst. 2001. The evolution of isochores. Nat. Rev. Genet. 2:549-555.
Galtier, N. 2003. Gene conversion drives GC content evolution in mammalian histones. Trends Genet. 19:65-68.
Galtier, N., and J. R. Lobry. 1997. Relationships between genomic G+C content, RNA secondary structures, and optimal growth temperature in prokaryotes. J. Mol. Evol. 44:632-636.
Gouy, M., and C. Gautier. 1982. Codon usage in bacteria: correlation with gene expressivity. Nucleic Acids Res. 10:7055-7074.
Grosjean, H., and W. Fiers. 1982. Preferential codon usage in prokaryotic genes: the optimal codon-anticodon interaction energy and the selective codon usage in efficiently expressed genes. Gene 18:199-209.
Hamada, K., T. Horiike, H. Ota, K. Mizuno, and T. Shinozawa. 2003. Presence of isochore structures in reptile genomes suggested by the relationship between GC contents of intron regions and those of coding regions. Genes Genet. Syst. 78:195-198.
Holmquist, G. P. 1992. Chromosome bands, their chromatin flavors, and their functional features. Am. J. Hum. Genet. 51:17-37.
Hughes, A. L., and M. Yeager. 1997. Comparative evolutionary rates of introns and exons in murine rodents. J. Mol. Evol. 45:125-130.
Hughes, S., D. Zelus, and D. Mouchiroud. 1999. Warm-blooded isochore structure in Nile crocodile and turtle. Mol. Biol. Evol. 16:1521-1527.
Hurst, L. D., and A. R. Merchant. 2001. High guanine-cytosine content is not an adaptation to high temperature: a comparative analysis amongst prokaryotes. Proc. R. Soc. Lond. B Biol. Sci. 268:493-497.
International Human Genome Sequencing Consortium. 2001. Initial sequencing and analysis of the human genome. Nature 409:860-921.
Ito, Y., A. Ando, H. Ando, J. Ando, Y. Saijoh, H. Inoko, and H. Fujimoto. 1998. Genomic structure of the spermatid-specific hsp70 homolog gene located in the class III region of the major histocompatibility complex of mouse and man. J. Biochem. (Tokyo) 124:347-353.
Kumar, S., K. Tamura, I. B. Jakobsen, and M. Nei. 2001. MEGA2: molecular evolutionary genetics analysis software. Bioinformatics 17:1244-1245.
Li, W., P. Bernaola-Galvan, P. Carpena, and J. L. Oliver. 2003. Isochores merit the prefix iso. Comput. Biol. Chem. 27:5-10.
Lynch, M., and J. S. Conery. 2000. The evolutionary fate and consequences of duplicate genes. Science 290:1151-1155.
Marais, G. 2003. Biased gene conversion: implications for genome and sex evolution. Trends Genet. 19:330-338.
Montoya-Burgos, J. I., P. Boursot, and N. Galtier. 2003. Recombination explains isochores in mammalian genomes. Trends Genet. 19:128-130.
Nei, M., and T. Gojobori. 1986. Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol. 3:418-426.
Oliver, J. L., P. Bernaola-Galvan, P. Carpena, and R. Roman-Roldan. 2001. Isochore chromosome maps of eukaryotic genomes. Gene 276:47-56.
Powell, J. R., and E. N. Moriyama. 1997. Evolution of codon usage bias in Drosophila. Proc. Natl. Acad. Sci. USA 94:7784-7790.
Ream, R. A., G. C. Johns, and G. N. Somero. 2003. Base compositions of genes encoding alpha-actin and lactate dehydrogenase-A from differently adapted vertebrates show no temperature-adaptive variation in G + C content. Mol. Biol. Evol. 20:105-110.
Sharp, P. M., and W. H. Li. 1986. An evolutionary perspective on synonymous codon usage in unicellular organisms. J. Mol. Evol. 24:28-38.
Sueoka, N. 1988. Directional mutation pressure and neutral molecular evolution. Proc. Natl. Acad. Sci. USA 85:2653-2657.
Tamura, K. 1992. Estimation of the number of nucleotide substitutions when there are strong transition-transversion and G+C-content biases. Mol. Biol. Evol. 9:678-687.
Tavaria, M., I. Kola, and R. L. Anderson. 1997. The hsp70 genes of mice and men. Pp. 49-52 in M. J. Gething, ed. Guidebook to molecular chaperones and protein-folding catalysts. Oxford University Press, Oxford, UK.
Walter, L., F. Rauh, and E. Gunther. 1994. Comparative analysis of the three major histocompatibility complex-linked heat shock protein 70 (Hsp70) genes of the rat. Immunogenetics 40:325-330.
Wolfe, K. H., P. M. Sharp, and W. H. Li. 1989. Mutation rates differ among regions of the mammalian genome. Nature 337:283-285.
Yang, Z., and A. D. Yoder. 1999. Estimation of the transition/transversion rate bias and species sampling. J. Mol. Evol. 48:274-283.
Zoubak, S., G. D'Onofrio, S. Caccio, and G. Bernardi. 1995. Specific compositional patterns of synonymous positions in homologous mammalian genes. J. Mol. Evol. 40:293-307.