*Department of Ecology and Evolutionary Biology, University of Arizona;
Department of Biochemistry and Molecular Biophysics, University of Arizona
Abstract |
---|
Introduction |
---|
Recent completion of E. coli and Salmonella genome sequencing (Blattner et al. 1997; McClelland et al. 2001; Parkhill et al. 2001; Perna et al. 2001) allows a reexamination of this distance effect on synonymous substitution rates based on the entire complement of homologous genes in these organisms. In addition, the full sequences of many bacterial genomes have been completed, including many pairs of closely related species, allowing a test of whether the effect of chromosome position on sequence divergence is a general feature of bacterial genomes. Because substitution rates at synonymous sites are less influenced by selection than at nonsynonymous positions, changes at these sites can provide substantial information about the underlying rates of mutations. We have examined the relationship between synonymous substitution rates and chromosomal position in 14 bacterial species pairs, each sharing a large number of gene homologs, in order to study differences in sequence divergence along the chromosome.
The distance effect could be attributable to increased mutation rates or decreased repair capabilities for genes situated farther from the replication origin. Although the molecular basis of these differences in mutation rates has not been addressed experimentally, the effect was originally hypothesized to be the outcome of more frequent recombinational repair or biased gene conversion (Sharp et al. 1989; Sharp 1991; Birky and Walsh 1992), which might arise from the higher gene dosage near the origin produced by multiple replication forks. Because the growth conditions and the number of coincident replication forks per cell vary among species, the strength of the distance effect in different taxa could lend support to this explanation. In addition, we have determined the patterns of individual substitutions at synonymous positions in order to elucidate the potential causes of differences in substitution rates at different positions of the chromosome.
Materials and Methods |
---|
Determining Positions of Replication Origin and Terminus
The location of the origin of replication in each genome is based on information present in the databases, as derived from experimental evidence (Weigel et al. 1997; Barekzi et al. 2001), the presence of the dnaA box sequence (Salazar et al. 1996; Gasc et al. 1998), and shifts in G+C-skew at third codon positions (Lobry 1996; Read et al. 2000; Kuroda et al. 2001; Ogata et al. 2001), shifts in skewed oligonucleotides (Salzberg et al. 1998), or both. The replication terminus for each genome was located at the position most distant from the origin and usually coincided with a second shift in G+C-skew (Lobry 1996; Salzberg et al. 1998). Gene positions were estimated as the distance from the origin of replication to the start of the open reading frame, regardless of coding strand. Homologous genes were excluded from the analysis if they occupied genome positions that differed by more than one-tenth of the length of the genome in either species. This elimination of homologs was especially relevant to the analyses of the Pseudomonads, which have undergone high levels of genome rearrangement, and to the analysis of the two Mycobacteria, which differ by one megabase in genome size (Cole et al. 2001).
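The positional rules above can be illustrated with a short sketch. It assumes that "distance from the origin" means the shorter arc along a circular chromosome, which is consistent with the terminus lying at the most distant position, and that the one-tenth threshold is applied to the smaller of the two genomes. Both readings are assumptions, and the function names are hypothetical rather than taken from the original analysis.

```python
def distance_from_origin(gene_start: int, origin: int, genome_length: int) -> int:
    """Shortest distance (bp) along a circular chromosome from the origin to a gene start."""
    offset = (gene_start - origin) % genome_length
    return min(offset, genome_length - offset)


def positions_conserved(dist_ref: int, dist_subj: int,
                        len_ref: int, len_subj: int,
                        max_fraction: float = 0.1) -> bool:
    """True if the two homologs lie at comparable distances from the origin.

    Assumption: the allowed difference is max_fraction of the smaller genome,
    so the rule holds "in either species".
    """
    return abs(dist_ref - dist_subj) <= max_fraction * min(len_ref, len_subj)
```

For example, a gene starting 100 bp past coordinate zero in a 4.6-Mb genome whose origin sits at 4.0 Mb is about 0.6 Mb from the origin, because the shorter arc wraps around the coordinate origin.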
Sequence Analysis and Codon Usage
Genes of a reference species were used as queries in BLAST similarity searches (Altschul et al. 1997) against the full sequence of a subject genome. A gene in the subject species was considered a homolog when it shared at least 60% sequence identity over at least 80% of the length of the reference gene. Genes shorter than 150 bp were excluded from the analysis, and genes with different orientations in the two species of a compared pair were also eliminated. Homologous sequences were aligned using the Gap command of the GCG package (Devereux, Haeberli, and Smithies 1984). Divergence rates at silent sites (Ks) were obtained through Diverge in GCG, which applies the algorithm of Li (1993) and Pamilo and Bianchi (1993). The accuracy of this measure of Ks for estimating synonymous divergence has been validated in the pair E. coli-S. typhimurium (Smith and Eyre-Walker 2001), but the assumptions underlying this distance statistic might be violated in some genomes with extreme G+C compositions. Substitutions were identified as one of six types: A↔T, A↔G, A↔C, C↔G, C↔T, and G↔T. Because the ancestral state of sequences is unknown in pairwise comparisons, the directionality of nucleotide substitutions was not determined. Codon usage bias was estimated by the χ2 measure (Shields et al. 1998) using the publicly available DNA Master program from J. G. Lawrence (http://cobamide2.bio.pitt.edu/computer.htm).
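The homolog criteria stated above can be written as a small filter. This is a minimal sketch, not the authors' pipeline: the `Hit` record and its field names are hypothetical, and it assumes that percent identity, alignment length, and strand have already been parsed from the BLAST output.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical summary of the best BLAST hit for one reference gene."""
    ref_length: int      # length of the reference gene (bp)
    aligned_length: int  # length of the reference gene covered by the alignment (bp)
    identity: float      # percent sequence identity over the aligned region
    ref_strand: str      # "+" or "-" orientation in the reference genome
    subj_strand: str     # "+" or "-" orientation in the subject genome


def is_homolog(hit: Hit, min_identity: float = 60.0,
               min_coverage_pct: int = 80, min_length: int = 150) -> bool:
    """Apply the thresholds given in the text: identity, coverage, length, orientation."""
    long_enough = hit.ref_length >= min_length
    similar = hit.identity >= min_identity
    # Integer arithmetic avoids floating-point edge cases at exactly 80% coverage.
    covered = hit.aligned_length * 100 >= min_coverage_pct * hit.ref_length
    same_orientation = hit.ref_strand == hit.subj_strand
    return long_enough and similar and covered and same_orientation
```

Keeping each criterion as a named boolean makes it easy to report which filter removed a given gene pair.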
Results |
---|
Although silent sites may not be entirely neutral because many species show a nonrandom choice of codons (Gouy and Gautier 1982; Ikemura 1985), the distance effect observed in the species pairs presented in figure 1 is not affected by codon usage bias. This is supported by two results. First, individual regression analyses of distance from the origin as a predictor of Ks, performed on different codon usage bias categories (low, intermediate, and high), yielded consistent results for each species comparison. Figure 2 shows this result for homologs from E. coli and S. typhimurium. The strength of the distance effect did not differ significantly among the three codon bias categories (F = 0.97, P = 0.38, ANCOVA). Second, there was no effect of distance from the origin on codon usage bias in any species pair (simple regression analysis, data not shown); in other words, highly biased genes were distributed equally throughout the chromosome (fig. 2), and the distance effect was not a by-product of low-biased genes being clustered near the terminus.
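The per-category regressions described above can be sketched as follows. The code fits an ordinary least-squares line of Ks on distance from the origin separately for each codon-bias category. It does not perform the ANCOVA comparison of slopes, and the tuple layout of the input is an assumption made for illustration.

```python
def linear_regression(x, y):
    """Ordinary least-squares fit of y on x; returns (slope, intercept, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


def regress_by_category(genes):
    """genes: iterable of (bias_category, distance_from_origin, ks) tuples.

    Returns a dict mapping each category to its (slope, intercept, r_squared).
    """
    grouped = {}
    for category, distance, ks in genes:
        xs, ys = grouped.setdefault(category, ([], []))
        xs.append(distance)
        ys.append(ks)
    return {cat: linear_regression(xs, ys) for cat, (xs, ys) in grouped.items()}
```

Similar slopes across the low, intermediate, and high categories would correspond to the consistency reported here; a formal test of slope homogeneity would still require an ANCOVA.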
Discussion |
---|
We examined whether the distance effect is also present in archaebacteria. Although the mechanisms of DNA replication have not been fully elucidated in archaebacteria, recent research demonstrates that replication occurs from a single origin in some species such as Pyrococcus (Myllykallio et al. 2001; Smith et al. 1997; Salzberg et al. 1998). However, in a comparison of homologs between Pyrococcus abyssi and P. horikoshii, there is no significant relationship between Ks and the distance from the putative replication origin (r² = 0.001, P > 0.1).
Possible Causes
Changes in substitution rates with distance from the replication origin could result from differences in either mutation rates or repair rates at different positions of the chromosome. The distance effect was originally hypothesized to result from more frequent recombinational repair or biased gene conversion near the origin (Sharp et al. 1989; Sharp 1991; Birky and Walsh 1992), made possible by multiple replication forks, which produce multiple copies of sequences closer to the origin. However, because the distance effect preferentially acts on transversions, it is difficult to see how such a substitutional pattern could arise from a nondiscriminating repair process, such as gene conversion or homologous exchange. The number of replication forks within a cell is largely influenced by the growth rate, which is highly variable among the bacteria analyzed in the present study (Mira, Moran, and Ochman 2001). Although growth rates are sometimes difficult to estimate, there is an association between the number of ribosomal RNA operons and growth rate across bacterial species (Asai et al. 1999; Klappenbach, Dunbar, and Schmidt 2000). When the strength of the distance effect is compared with the number of ribosomal operons across species (including the Chlamydia, Listeria, Rickettsia, Mycobacterium, and Pseudomonas pairs, together with the comparisons S. pyogenes-S. pneumoniae, S. aureus-S. epidermidis, and E. coli-S. typhimurium), there is a positive relationship (r² = 0.66): the enteric bacteria have the strongest distance effect and also the highest number of ribosomal RNA operons (seven), whereas Chlamydia and Mycobacterium, which display no distance effect, have only one or two ribosomal operons. However, this result is equivocal: Rickettsia contains only a single ribosomal operon yet shows a strong distance effect. In addition, this relationship between rRNA operons and the strength of the distance effect is based on a small set of phylogenetically distant, but not completely independent, comparisons.
To gain additional insights into the potential mechanisms involved in the distance effect, we examined the frequency of individual substitutions at different parts of the chromosome. In the cases where a positive distance effect was detected, transitions generally increased with distance from the origin, but to a significantly lower extent than transversions. When the different transitions and transversions were evaluated, the substitutions contributing most to the distance effect varied according to the specific bacterial pair. For example, G↔T and A↔C transversions are most prevalent in Escherichia, Salmonella, and Rickettsia, in contrast to A↔T transversions in Listeria.
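A tally of this kind can be sketched as below. The code classifies each differing position of two aligned sequences into one of the six unordered substitution types and separates transitions from transversions. It is an illustration only: it assumes the caller supplies sites that are already restricted to synonymous positions, and the `A<->G` labels are a hypothetical naming choice.

```python
from collections import Counter

BASES = {"A", "C", "G", "T"}
TRANSITIONS = {"A<->G", "C<->T"}  # purine-purine or pyrimidine-pyrimidine changes


def substitution_type(a: str, b: str):
    """Return the unordered type (e.g. 'A<->G'), or None for identical or non-ACGT sites."""
    a, b = a.upper(), b.upper()
    if a == b or a not in BASES or b not in BASES:
        return None
    first, second = sorted((a, b))
    return f"{first}<->{second}"


def count_substitutions(seq1: str, seq2: str) -> Counter:
    """Count each unordered substitution type across two aligned, equal-length sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have equal length")
    counts = Counter()
    for a, b in zip(seq1, seq2):
        kind = substitution_type(a, b)
        if kind is not None:
            counts[kind] += 1
    return counts


def transitions_and_transversions(counts: Counter):
    """Split a tally into (number of transitions, number of transversions)."""
    ts = sum(n for kind, n in counts.items() if kind in TRANSITIONS)
    tv = sum(n for kind, n in counts.items() if kind not in TRANSITIONS)
    return ts, tv
```

Because direction is not inferred in pairwise comparisons, sorting the two bases gives a single label for each pair, so A to G and G to A are counted together.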
Gene orientation does not influence the distance effect: genes of forward and of reverse orientation show a similar increase in Ks values with distance from the replication origin. In species pairs for which there is a significant distance effect, the GC content of homologs differs most in genes situated away from the origin of replication. The extent to which this is a cause or a result of the distance effect is not known. It is possible that some of the mechanisms affecting GC composition, such as mutational bias, are intensified when a gene is located closer to the replication terminus. Bacterial chromosomes are thought to move through a stationary machinery for replicating DNA (Lemon and Grossman 1998), and the newly formed replication origins appear to move toward the poles of cells (Webb et al. 1998). It is possible that this replication process creates differences in enzyme activity (and mutation rates) along different parts of the chromosome. For example, the DNA polymerase may tend to fall off the replicating DNA strand as replication progresses, and the reassembly of the polymerase can be error-prone (Goodman 2000; Courcelle and Hanawalt 2001).
Seeming Exceptions
Focusing on the cases where a distance effect on Ks was not detected offers additional insights into its possible causes. For example, when homologs from strains within a single species were compared, as is possible in E. coli, Neisseria meningitidis, and Helicobacter pylori, no significant distance effects were observed, and the same was true for the pair N. meningitidis-N. gonorrhoeae. Thus, in comparisons where there are low levels of sequence divergence between homologs, we detected no effect of distance from the replication origin on substitution rates (table 1). In these cases, there is probably insufficient variation to detect a change in substitution frequencies across the chromosome, particularly if rarer transversions are responsible for the phenomenon. In addition, recombination between such closely related strains might diminish the overall amount of detected divergence.
Another case in which there is no significant association between distance from the replication origin and Ks is the C. muridarum-C. trachomatis comparison. The relatively small chromosome of these parasitic bacteria (Read et al. 2000) might contribute to the absence of a distance effect. In these genomes, a gene can be, at most, 500 kilobases (kb) from the replication origin, a distance that may not be sufficient to produce a significant effect in these species. For example, when analyzing only the genes in the initial 500 kb of E. coli and S. typhi chromosomes, there is no significant distance effect (t = -0.67, P = 0.55). However, in Rickettsia, which has approximately the same genome size as Chlamydia, a distance effect is apparent.
During the process of genome reduction, both Rickettsia and Chlamydia have lost several DNA repair genes (Stephens et al. 1998; Andersson and Andersson 2001), and if any are uniquely involved in the preferential repair of close-to-the-origin genes, their absence might eliminate a distance effect.
Whatever mechanism underlies the distance effect, the increase in synonymous divergence with distance from the replication origin should be apparent in spontaneous mutation or substitution rates measured under experimental conditions. However, Hudson et al. (2002) failed to detect an effect of distance from the replication origin on the mutation rate of lacZ alleles inserted at four sites in the Salmonella genome. Instead, they found the highest mutation rate at a locus of intermediate position between the replication origin and terminus. The basis for this discrepancy could be that laboratory conditions produce a mutational spectrum different from that under natural conditions (Hudson et al. 2002). Although the distance effect was not apparent in this experimental setting, it has influenced the rates and patterns of molecular evolution across a wide range of bacterial genomes.
Acknowledgements |
---|
Footnotes |
---|
1 Present address: Department of Molecular Evolution, Evolutionary Biology Centre, Uppsala University, Norbyvägen 18C, SE-752 36 Uppsala, Sweden
Keywords: substitution rates, replication origin, genome evolution, Escherichia coli
Address for correspondence and reprints: Howard Ochman, Department of Biochemistry and Molecular Biophysics, University of Arizona, P.O. Box 210088, Tucson, Arizona 85721. hochman{at}email.arizona.edu
References |
---|
Altschul S. F., T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller, D. J. Lipman, 1997 Gapped BLAST and PSI-BLAST: a new generation of protein database search programs Nucleic Acids Res 25:3389-3402
Andersson J. O., S. G. E. Andersson, 2001 Pseudogenes, junk DNA, and the dynamics of Rickettsia genomes Mol. Biol. Evol 18:829-839
Asai T., C. Condon, J. Voulgaris, D. Zaporojets, B. Shen, M. Al-Omar, C. Squires, C. L. Squires, 1999 Construction and initial characterization of Escherichia coli strains with few or no intact chromosomal rRNA operons J. Bacteriol 181:3803-3809
Barekzi N., K. Beinlich, T. T. Hoang, X. Q. Pham, R. Karkhoff-Schweizer, H. P. Schweizer, 2001 High-frequency flp recombinase-mediated inversions of the oriC-containing region of the Pseudomonas aeruginosa genome J. Bacteriol 182:7070-7074
Birky C. W. Jr., J. B. Walsh, 1992 Biased gene conversion, copy number, and apparent mutation rate differences within chloroplast and bacterial genomes Genetics 130:677-783
Blattner F. R., G. Plunkett III, C. A. Bloch, et al. (17 co-authors) 1997 The complete genome sequence of Escherichia coli K-12 Science 277:1453-1474
Cole S. T., K. Eiglmeier, J. Parkhill, K. D. James, N. R. Thomson, P. R. Wheeler, N. Honore, T. Garnier, C. Churcher, D. Harris, 2001 Massive gene decay in the leprosy bacillus Nature 409:1007-1011
Courcelle J., P. C. Hanawalt, 2001 Participation of recombination proteins in rescue of arrested replication forks in UV-irradiated Escherichia coli need not involve recombination Proc. Natl. Acad. Sci. USA 98:8196-8202
Devereux J., P. Haeberli, O. Smithies, 1984 A comprehensive set of sequence analysis programs for the VAX Nucleic Acids Res 12:387-395
Francino M. P., H. Ochman, 2001 Deamination as the basis of strand-asymmetric evolution in transcribed Escherichia coli sequences Mol. Biol. Evol 18:1147-1150
Gasc A. M., P. Giammarinaro, S. Richter, M. Sicard, 1998 Organization around the dnaA gene of Streptococcus pneumoniae Microbiology 144:433-439
Gojobori T., K. Ishii, M. Nei, 1982 Patterns of nucleotide substitution in pseudogenes and functional genes J. Mol. Evol 18:360-369
Goodman M. F., 2000 Coping with replication 'train wrecks' in Escherichia coli using Pol V, Pol II and RecA proteins Trends Biochem. Sci 25:189-195
Gouy M., C. Gautier, 1982 Codon usage in bacteria: correlation with gene expressivity Nucleic Acids Res 10:7055-7074
Hudson R. E., U. Bergthorsson, J. R. Roth, H. Ochman, 2002 Effect of chromosome location on bacterial mutation rates Mol. Biol. Evol 19:85-92
Ikemura T., 1981 Correlation between the abundance of Escherichia coli transfer RNAs and the occurrence of the respective codons in its protein genes: a proposal for a synonymous codon choice that is optimal for the E. coli translational system J. Mol. Biol 151:389-409
Ikemura T., 1985 Codon usage and tRNA content in unicellular and multicellular organisms Mol. Biol. Evol 2:13-34
Klappenbach J. A., J. M. Dunbar, T. M. Schmidt, 2000 rRNA operon copy number reflects ecological strategies of bacteria Appl. Environ. Microbiol 66:1328-1333
Kuroda M., T. Ohta, I. Uchiyama, et al. (37 co-authors) 2001 Whole genome sequencing of meticillin-resistant Staphylococcus aureus Lancet 357:1225-1240
Lemon K. P., A. D. Grossman, 1998 Localization of bacterial DNA polymerase: evidence for a factory model of replication Science 282:1516-1519
Li W. H., 1993 Unbiased estimation of the rates of synonymous and non-synonymous substitution J. Mol. Evol 36:96-99
Lobry J. R., 1996 Asymmetric substitution patterns in the two DNA strands of bacteria Mol. Biol. Evol 13:660-665
McClelland M., K. E. Sanderson, J. Spieth, et al. (26 co-authors) 2001 Complete genome sequence of Salmonella enterica serovar Typhimurium LT2 Nature 413:852-856
Mira A., N. A. Moran, H. Ochman, 2001 Deletional bias and the evolution of bacterial genomes Trends Genet 17:589-596
de Miranda A. B., F. Alvarez-Valin, K. Jabbari, W. M. Degrave, G. Bernardi, 2000 Gene expression, amino acid conservation, and hydrophobicity are the main factors shaping codon preferences in Mycobacterium tuberculosis and Mycobacterium leprae J. Mol. Evol 50:45-55
Moran N. A., J. J. Wernegreen, 2000 Lifestyle evolution in symbiotic bacteria: insights from genomics Trends Ecol. Evol 15:321-326
Myllykallio H., P. Lopez, P. Lopez-Garcia, R. Heilig, W. Saurin, Y. Zivanovic, H. Philippe, P. Forterre, 2001 Bacterial mode of replication with eukaryotic-like machinery in a hyperthermophilic archaeon Science 288:2212-2215
Ogata H., S. Audic, P. Renesto-Audiffren, et al. (11 co-authors) 2001 Mechanisms of evolution in Rickettsia conorii and R. prowazekii Science 293:2093-2098
Pamilo P., N. O. Bianchi, 1993 Evolution of the zfx and zfy genes: rates and interdependence between the genes Mol. Biol. Evol 10:271-281
Parkhill J., G. Dougan, K. D. James, et al. (41 co-authors) 2001 Complete genome sequence of a multiple drug resistant Salmonella enterica serovar Typhi CT18 Nature 413:848-852
Perna N. T., G. Plunkett III, V. Burland, et al. (28 co-authors) 2001 Genome sequence of enterohaemorrhagic Escherichia coli O157:H7 Nature 409:529-533
Read T. D., R. C. Brunham, C. Shen, et al. (25 co-authors) 2000 Genome sequences of Chlamydia trachomatis MoPn and Chlamydia pneumoniae AR39 Nucleic Acids Res 28:1397-1406
Salazar L., H. Fsihi, E. de Rossi, G. Riccardi, C. Rios, S. T. Cole, H. E. Takiff, 1996 Organization of the origins of replication of the chromosomes of Mycobacterium smegmatis, Mycobacterium leprae and Mycobacterium tuberculosis and isolation of a functional origin from M. smegmatis Mol. Microbiol 20:283-293
Salzberg S. L., A. J. Salzberg, A. R. Kerlavage, J. F. Tomb, 1998 Skewed oligomers and origins of replication Gene 217:57-67
Sharp P. M., 1991 Determinants of DNA sequence divergence between Escherichia coli and Salmonella typhimurium: codon usage, map position, and concerted evolution J. Mol. Evol 33:23-33
Sharp P. M., W. H. Li, 1987 The rate of synonymous substitution in enterobacterial genes is inversely related to codon usage bias Mol. Biol. Evol 4:222-230
Sharp P. M., D. C. Shields, K. H. Wolfe, W. H. Li, 1989 Chromosomal location and evolutionary rate variation in enterobacterial genes Science 246:808-810
Shields D. C., P. M. Sharp, D. G. Higgins, F. Wright, 1998 "Silent" sites in Drosophila genes are not neutral: evidence of selection among synonymous codons Mol. Biol. Evol 5:704-716
Smith D. R., L. A. Doucette-Stamm, C. Deloughery, et al. (37 co-authors) 1997 Complete genome sequence of Methanobacterium thermoautotrophicum ΔH: functional analysis and comparative genomics J. Bacteriol 179:7135-7155
Smith N. G., A. Eyre-Walker, 2001 Nucleotide substitution rate estimation in enterobacteria: approximate and maximum-likelihood methods lead to similar conclusions Mol. Biol. Evol 18:2124-2126
Stephens R. S., S. Kalman, C. Lammel, et al. (12 co-authors) 1998 Genome sequence of an obligate intracellular pathogen of humans: Chlamydia trachomatis Science 282:754-759
Webb C. D., P. L. Graumann, J. A. Kahana, A. A. Teleman, P. A. Silver, R. Losick, 1998 Use of time-lapse microscopy to visualize rapid movement of the replication origin region of the chromosome during the cell cycle in Bacillus subtilis Mol. Microbiol 28:883-892
Weigel C., A. Schmidt, B. Ruckert, R. Lurz, W. Messer, 1997 DnaA protein binding to individual DnaA boxes in the Escherichia coli replication origin, oriC EMBO J 16:6574-6583