Danish University of Pharmaceutical Sciences, Institute of Pharmacology, Universitetsparken, Copenhagen, Denmark
Correspondence: E-mail anfu{at}dfh.dk.
Abstract
Key Words: sequence alignment; Shine-Dalgarno regions; mutation; molecular evolution; codon usage bias; expressivity
Introduction
Given that mutations and selective forces display regional preferences, is it possible to identify regions of DNA that evolve faster than others? An answer may come from pairwise alignment studies of homologous genes from closely related species (i.e., aligning homologs and counting divergences). There is little information in, say, two lacZ genes representing any two species, but if we study many such orthologous pairs from the same two species, patterns may emerge. One problem is that gene lengths also evolve, so codon-by-codon alignment must rely either on orthologs of equal length or on parsing algorithms that can handle sequence gaps. We have chosen the former approach because gapped alignments give rise to some uncertainty about intragenic positions and because gapped alignment is much more difficult to implement on a large scale. On a small subset of orthologs of bacterial genes of Escherichia coli and Salmonella typhimurium, Eyre-Walker and Bulmer (1993) found a decreased tendency for synonymous substitution near the gene starts. Other studies conducted on orthologs of eukaryotes have focused more on the specific nucleotides and codon positions involved in the changes than on regional variations of these phenomena (Alvarez-Valin, Jabbari, and Bernardi 1998; Alvarez-Valin et al. 1999). Studies encompassing whole genomes are lacking. This study, therefore, aims to expand the knowledge by using a whole-genome approach to quantify the divergences of orthologous pairs in relation to positions inside and outside genes.
Materials and Methods
Based on these considerations, E. coli strain O157:H7 EDL933 (GenBank accession number 002655 [Perna et al. 2001]) was chosen as the close relative of E. coli K12, and Yersinia pestis CO92 (GenBank accession number NC003143 [Parkhill et al. 2001]) and Salmonella typhimurium LT2 (GenBank accession number NC003197 [McClelland et al. 2001]) were selected as more distant relatives. Phylogenetically, the Escherichia genus is a member of the Enterobacteriaceae group in the γ subdivision of the proteobacteria. Salmonella also belongs to the Enterobacteriaceae, whereas Yersinia is more distant from E. coli, belonging to the Pasteurellaceae group in the γ subdivision. All three strains are roughly comparable in GC content. (Table 1 lists the number of homologs and their GC content.) There are, thus, three data sets in this study: the first contains homologous genes of equal length from the two E. coli strains; the second contains homologous genes of equal length from E. coli K12 and S. typhimurium LT2; and the third contains homologous genes of equal length from E. coli K12 and Y. pestis CO92.
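Because each data set contains only homologs of equal length, the codon-by-codon comparison can be illustrated without any gap handling. The following is a minimal sketch, not the code used in this study: the function name `codon_divergences` and the labels `same`, `syn`, and `nonsyn` are hypothetical, and the standard genetic code is assumed for translating sense codons.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (an assumption here;
# start-codon special cases in bacteria are ignored).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def codon_divergences(seq_a, seq_b):
    """Label each aligned codon pair as 'same', 'syn', or 'nonsyn'.

    Both sequences must be ungapped, in frame, and of equal length.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must have equal length divisible by 3")
    labels = []
    for i in range(0, len(seq_a), 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        if codon_a == codon_b:
            labels.append("same")
        elif CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            labels.append("syn")      # different codon, same amino acid
        else:
            labels.append("nonsyn")   # amino acid changed
    return labels
```

Summing the labels at each codon index over all orthologous pairs in a data set would give divergence counts as a function of intragenic position.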
Extragenic Evolution Rates
To study the extragenic evolution rates, the data sets were expanded with information about the 50 nucleotides upstream of start codons and downstream of stop codons. Nucleotide differences were then counted and plotted as a function of the nucleotide position upstream of the start codon or downstream of the stop codon.
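The counting step can be sketched as follows. This is an illustrative example under stated assumptions rather than the original implementation: the function name `divergence_by_position` is hypothetical, and the flanking sequences of each orthologous pair are assumed to have been extracted already, to be ungapped, and to be of identical length (for example, 50 nucleotides).

```python
def divergence_by_position(flank_pairs):
    """Return the fraction of ortholog pairs that differ at each flank position.

    flank_pairs: list of (flank_a, flank_b) tuples of equal-length strings.
    Index 0 of the result corresponds to the first character of each flank.
    """
    if not flank_pairs:
        raise ValueError("no flanking sequences given")
    width = len(flank_pairs[0][0])
    differences = [0] * width
    for flank_a, flank_b in flank_pairs:
        if len(flank_a) != width or len(flank_b) != width:
            raise ValueError("all flanks must have the same length")
        for i, (x, y) in enumerate(zip(flank_a.upper(), flank_b.upper())):
            if x != y:
                differences[i] += 1
    return [d / len(flank_pairs) for d in differences]
```

The resulting list can be plotted directly against distance from the start or stop codon, provided the orientation of the flanks is kept consistent across genes.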
Conservation in the Shine-Dalgarno Region
It has been shown previously that a distinct pattern of nucleotide nonrandomness can be seen at around N = 9 nucleotides upstream of bacterial start codons, indicating usage of Shine-Dalgarno regions (Fuglsang and Engberg 2003). Furthermore, E. coli is reported to display more pronounced selection for translational efficiency than Bacillus subtilis (Shields and Sharp 1987; Sharp et al. 1988). This led to the investigation of nucleotide nonrandomness in data sets for high-expressivity genes versus low-expressivity genes, and it was shown that in E. coli, the nonrandomness is more pronounced in the Shine-Dalgarno region for the high-expressivity genes than for the low-expressivity genes, whereas no difference was observed in B. subtilis (Fuglsang 2003). On this basis, and because the data sets for S. typhimurium and Y. pestis displayed markedly lower divergence rates in this region (see Results and Discussion), it was decided to perform the same analysis on S. typhimurium and Y. pestis. For both species, two data sets were constructed, one consisting of the 500 genes with the lowest expressivity and the other consisting of the 500 genes with the highest expressivity. Note that expressivity is not the same as expression level; expressivity denotes the codon adaptation index (CAI [Sharp and Li 1987]), which is only a surrogate value for expression level. Nonrandomness analysis (Fuglsang and Engberg 2003; Fuglsang 2003) was performed on these two fractions individually.
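For readers unfamiliar with the CAI, the ranking of genes by expressivity can be sketched as below. This is a simplified illustration, not the procedure used here: the CAI is taken as the geometric mean of the relative adaptiveness values of a gene's codons, with relative adaptiveness derived from a reference set of genes. The function names, the choice of reference set, and the substitution of 0.5 for codons absent from the reference set are all assumptions of this sketch.

```python
import math
from collections import Counter, defaultdict
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def codons(seq):
    """Split an in-frame sequence into codons, dropping any trailing bases."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_genes):
    """Compute w = count / (highest count in the synonymous family)."""
    counts = Counter(c for gene in reference_genes for c in codons(gene))
    families = defaultdict(list)
    for codon, amino in CODON_TABLE.items():
        families[amino].append(codon)
    w = {}
    for amino, family in families.items():
        if amino == "*" or len(family) == 1:
            continue  # stop codons and single-codon amino acids are excluded
        # Assumption: unseen codons get a count of 0.5 to avoid log(0).
        adjusted = {c: counts[c] or 0.5 for c in family}
        top = max(adjusted.values())
        for c in family:
            w[c] = adjusted[c] / top
    return w


def cai(gene, w):
    """Geometric mean of w over the informative codons of one gene."""
    logs = [math.log(w[c]) for c in codons(gene) if c in w]
    if not logs:
        raise ValueError("gene has no informative codons")
    return math.exp(sum(logs) / len(logs))


def split_by_expressivity(genes, w, n=500):
    """Return (names of the n lowest-CAI genes, names of the n highest)."""
    ranked = sorted(genes, key=lambda name: cai(genes[name], w))
    return ranked[:n], ranked[-n:]
```

Here `genes` is assumed to be a dictionary mapping gene names to coding sequences; the two returned lists correspond to the low-expressivity and high-expressivity fractions.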
Statistical Analysis
Spearman's rank correlation analysis was performed using GraphPad Prism version 3.0 (GraphPad Inc., Calif.) to test whether divergence rates show an increasing or decreasing trend near gene ends. When linear relationships were expected, linear regression was used. P values below 0.05 were considered significant.
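The analyses in this study were run in GraphPad Prism; the sketch below only illustrates what the rank correlation coefficient measures. It computes Spearman's rho as the Pearson correlation of average ranks, which handles ties. The function names are hypothetical, and no P value is computed; that would require a significance table or a statistics package.

```python
import math


def average_ranks(values):
    """Rank values from 1 upward, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))
```

With `x` as the position relative to the gene end and `y` as the divergence frequency at that position, a rho near +1 or -1 indicates a monotonic increasing or decreasing trend.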
Results and Discussion
Extragenic Observations and Expressivity
In figure 3a-c, the extragenic divergences are shown for nucleotides upstream of start codons and downstream of stop codons. There is no clear pattern in the divergence frequencies downstream of stop codons. However, in figure 3b and c (and perhaps figure 3a as well), a marked drop in divergence is seen in the region around 10 nucleotides upstream of start codons. This is, in my interpretation, clear evidence of the importance of functional Shine-Dalgarno regions. Previously published results showed that the degree of nonrandomness in this region, centered on N = 9 nucleotides upstream of start codons in E. coli, is very pronounced (Fuglsang and Engberg 2003) and that the nucleotides are, furthermore, more nonrandom in the highly expressed genes than in the lowly expressed genes (Fuglsang 2003). As figure 4a and b show, the same is true for S. typhimurium and for Y. pestis. Both clearly make extensive use of Shine-Dalgarno regions, and these seem to be positioned just as in E. coli. Table 2 lists the 3' ends of the 16S rRNA for the species and strains in this study plus some more distantly related examples. This 3' end is extremely well conserved across a wide range of eubacteria and archaebacteria. The natural role of the Shine-Dalgarno region is to facilitate ribosomal binding and translational initiation. The higher the complementarity between a gene's Shine-Dalgarno region and the 3' end of the 16S rRNA, the higher the chance of translational initiation. Therefore, highly expressed genes tend to have a highly complementary Shine-Dalgarno region. Figure 5a shows a plot of synonymous mutations between E. coli K12 and S. typhimurium orthologs versus the E. coli K12 codon adaptation index. Note that the two codon adaptation indexes are well correlated (inset in figure 5a). The figure shows that highly expressed genes tend to undergo synonymous evolution at a slower rate than lowly expressed genes. The same holds true for the nonsynonymous changes (fig. 5b).
Data for the other data sets reveal the same tendency (not shown). Thus, the higher the expression of a gene, the slower it evolves, both synonymously and nonsynonymously. A study by Alff-Steinberger (2001) concluded that rare codons tend to be more prone to evolution. Because rare codons give lower expressivities, the findings of this study are in good agreement with Alff-Steinberger's (2001) results and also with Sharp and Li (1987), who concluded that codon bias and substitution rate vary inversely. All in all, these findings also accord well with those of Eyre-Walker and Bulmer (1993), who concluded that the synonymous substitution rate is lower near start codons. This study expands those findings considerably, and, thus, the following are concluded:
The data presented here represent only bacteria in the γ subdivision of the proteobacteria. Therefore, one should be careful not to conclude that molecular evolution works the same way regardless of phylogenetic position. It would thus be interesting to repeat the analysis for orthologous gene pairs taken from bacteria located elsewhere on the phylogenetic tree. This has so far been very difficult; for example, generation of similar data sets with the firmicutes B. subtilis (GenBank accession number NC000964) and B. halodurans (GenBank accession number NC002570) gives only 132 orthologous pairs and, thus, does not yield conclusive figures.
Literature Cited
Alff-Steinberger, C. 2001. A comparative study of mutations in Escherichia coli and Salmonella typhimurium shows that codon conservation is strongly correlated with codon usage. J. Theor. Biol. 206:307-311.
Alvarez-Valin, F., K. Jabbari, and G. Bernardi. 1998. Synonymous and nonsynonymous substitutions in mammalian genes: intragenic correlations. J. Mol. Evol. 46:37-44.
Alvarez-Valin, F., K. Jabbari, N. Carels, and G. Bernardi. 1999. Synonymous and nonsynonymous substitutions in genes from Gramineae: intragenic correlations. J. Mol. Evol. 49:330-342.
Bentley, S. D., K. F. Chater, and A. M. Cerdeno-Tarraga, et al. (40 co-authors). 2002. Complete genome sequence of the model actinomycete Streptomyces coelicolor A3(2). Nature 417:141-147.
Blattner, F. R., G. Plunkett, and C. A. Bloch, et al. (14 co-authors). 1997. The complete genome sequence of Escherichia coli K-12. Science 277:1453-1474.
Bulmer, M. 1987. Coevolution of codon usage and transfer RNA abundance. Nature 325:728-730.
Bulmer, M. 1988. Codon usage and intragenic position. J. Theor. Biol. 133:67-71.
Chen, G. F., and M. Inouye. 1990. Suppression of the negative effect of minor arginine codons on gene expression: preferential usage of minor codons within the first 25 codons of the Escherichia coli genes. Nucleic Acids Res. 18:1465-1473.
Chen, G. F., and M. Inouye. 1994. Role of the AGA/AGG codons, the rarest codons in global gene expression in Escherichia coli. Genes Dev. 8:2641-2652.
Eyre-Walker, A., and M. Bulmer. 1993. Reduced synonymous substitution rate at the start of enterobacterial genes. Nucleic Acids Res. 21:4599-4603.
Fuglsang, A. 2003. Association of the nucleotide with codon bias, amino acid usage and expressivity: differences between Bacillus subtilis and Escherichia coli. APMIS 111:926-930.
Fuglsang, A., and J. Engberg. 2003. Non-randomness in Shine-Dalgarno regions: links to gene characteristics. Biochem. Biophys. Res. Commun. 302:296-301.
Gouy, M., and C. Gautier. 1982. Codon usage in bacteria: correlation with gene expressivity. Nucleic Acids Res. 10:7055-7074.
Herbeck, J. T., D. P. Wall, and J. J. Wernegreen. 2003. Gene expression level influences amino acid usage, but not codon usage, in the tsetse fly endosymbiont Wigglesworthia. Microbiology 149:2585-2596.
Ikemura, T. 1985. Codon usage and tRNA content in unicellular and multicellular organisms. Mol. Biol. Evol. 2:13-34.
Ikemura, T., and H. Ozeki. 1983. Codon usage and transfer RNA contents: organism-specific codon-choice patterns in reference to the isoacceptor contents. Cold Spring Harb. Symp. Quant. Biol. 47:1087-1097.
Kanaya, S., Y. Yamada, M. Kinouchi, Y. Kudo, and T. Ikemura. 2001. Codon usage and tRNA genes in eukaryotes: correlation of codon usage diversity with translation efficiency and with CG-dinucleotide usage as assessed by multivariate analysis. J. Mol. Evol. 53:290-298.
McClelland, M., K. E. Sanderson, and J. Spieth, et al. (23 co-authors). 2001. Complete genome sequence of Salmonella enterica serovar Typhimurium LT2. Nature 413:852-856.
McInerney, J. O. 1998. Replicational and transcriptional selection on codon usage in Borrelia burgdorferi. Proc. Natl. Acad. Sci. USA 95:10698-10703.
Moll, I., M. Huber, S. Grill, P. Sairafi, F. Mueller, R. Brimacombe, P. Londei, and U. Blasi. 2001. Evidence against an interaction between the mRNA downstream box and 16S rRNA in translation initiation. J. Bacteriol. 183:3499-3505.
Osada, Y., R. Saito, and M. Tomita. 1999. Analysis of base-pairing potentials between 16S rRNA and 5' UTR for translation initiation in various prokaryotes. Bioinformatics 15:578-581.
Parkhill, J., B. W. Wren, and N. R. Thomson, et al. (32 co-authors). 2001. Genome sequence of Yersinia pestis, the causative agent of plague. Nature 413:523-527.
Perna, N. T., G. Plunkett, and V. Burl, et al. (25 co-authors). 2001. Genome sequence of enterohaemorrhagic Escherichia coli O157:H7. Nature 409:529-533.
Rocha, E. P., and A. Danchin. 2001. Ongoing evolution of strand composition in bacterial genomes. Mol. Biol. Evol. 18:1789-1799.
Rocha, E. P., A. Viari, and A. Danchin. 1998. Oligonucleotide bias in Bacillus subtilis: general trends and taxonomic comparisons. Nucleic Acids Res. 26:2971-2980.
Sakai, H., C. Imamura, Y. Osada, R. Saito, T. Washio, and M. Tomita. 2001. Correlation between Shine-Dalgarno sequence conservation and codon usage of bacterial genes. J. Mol. Evol. 52:164-170.
Sharp, P. M., E. Cowe, D. G. Higgins, D. C. Shields, K. H. Wolfe, and F. Wright. 1988. Codon usage patterns in Escherichia coli, Bacillus subtilis, Saccharomyces cerevisiae, Schizosaccharomyces pombe, Drosophila melanogaster, and Homo sapiens; a review of the considerable within-species diversity. Nucleic Acids Res. 16:8207-8211.
Sharp, P. M., and W.-H. Li. 1987. The codon adaptation index--a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res. 15:1281-1295.
Shields, D. C., and P. M. Sharp. 1987. Synonymous codon usage in Bacillus subtilis reflects both translational selection and mutational biases. Nucleic Acids Res. 15:8023-8040.
Shine, J., and L. Dalgarno. 1974. The 3'-terminal sequence of Escherichia coli 16S ribosomal RNA: complementarity to nonsense triplets and ribosome binding sites. Proc. Natl. Acad. Sci. USA 71:1342-1346.
Sprengart, M. L., H. P. Fatscher, and E. Fuchs. 1990. The initiation of translation in E. coli: apparent base pairing between the 16S rRNA and downstream sequences of the mRNA. Nucleic Acids Res. 18:1719-1723.
Sprengart, M. L., E. Fuchs, and A. G. Porter. 1996. The downstream box: an efficient and independent translation initiation signal in Escherichia coli. EMBO J. 15:665-674.
Stenstrom, C. M., E. Holmgren, and L. A. Isaksson. 2001. Cooperative effects by the initiation codon and its flanking regions on translation initiation. Gene 273:259-265.