Department of Biology, University of Rochester
Department of Ecology and Evolutionary Biology, University of Arizona
Analyses of sequence evolution in Escherichia coli and Salmonella enterica have revealed that the pattern of nucleotide substitutions in enterobacterial genes is asymmetric. The incidence of C→T transitions is strongly biased toward the nontranscribed strand of DNA, which accumulates such changes at a two- to threefold higher rate than the complementary transcribed strand. We previously proposed that the asymmetric distribution of C→T substitutions was caused by strand-specific biases in the occurrence and repair of DNA damage during transcription (Francino et al. 1996; Francino and Ochman 1997). Two processes render mutations less likely to originate on the transcribed template strand than on its complement: (1) transcription-coupled repair is induced by RNA polymerases stalled at lesions on the template strand (Hanawalt 1995), and (2) cytosine deamination is less frequent on the template strand, which is shielded by the RNA polymerase and the nascent mRNA, than on the more exposed nontranscribed strand (Beletskii and Bhagwat 1996). However, the pattern of nucleotide substitutions in coding regions may reflect not only the underlying mutational process, but also the action of natural selection. Given that the majority of C→T substitutions in bacterial sequences occur at synonymous sites, selection on codon usage could potentially contribute to the generation of the observed asymmetry.
To determine whether transcription alone, without the intervention of natural selection on codon usage, produces substitutional asymmetry in bacterial sequences, we analyzed patterns of substitution in two different noncoding regions: a transcribed but untranslated region, and the adjacent nontranscribed sequence. The detection of substitutional bias in the transcribed but untranslated region and the absence of bias in the nontranscribed sequence would confirm that transcription is necessary and sufficient to generate asymmetry, without a requirement for selection on codon usage. Furthermore, an increase in C→T substitutions with transcription would implicate deamination events in the nontranscribed strand as the principal cause of asymmetry, whereas a decrease in G→A substitutions with transcription would implicate transcription-coupled repair of pyrimidine dimers on the complementary template strand.
Because bacterial chromosomes are so tightly packed with genes, noncoding sequences of appropriate length for analysis of rates and patterns of substitution are scarce. Intergenic regions in E. coli are typically very short, averaging only 118 bp in length, but the complete genomic sequence of E. coli MG1655 has revealed some intergenic regions of much larger size (Blattner et al. 1997). However, when we investigated sequence variation in several of the longest untranslated regions and nontranscribed sequences among natural strains of E. coli, we found that surprisingly low levels of divergence in most of these regions precluded the analysis of substitutional patterns (data not shown). Therefore, we restricted our analysis to a region that contained sufficient nucleotide sequence diversity to investigate the effect of transcription on the pattern of nucleotide substitutions. This region, the cysB-acnA region at 28.7 min on the E. coli chromosome, is well suited for this type of analysis because the transcription start points of acnA have been experimentally established (Cunningham, Gruer, and Guest 1997) and a very likely rho-independent terminator has been located near the end of cysB (Prodromou, Artymiuk, and Guest 1992). These transcription signals clearly delineate the transcribed and nontranscribed sequences between the cysB and acnA genes (fig. 1A).
The pattern of nucleotide substitutions was reconstructed on the phylogeny relating the cysB-acnA sequences as estimated by the neighbor-joining method. The sequence alignment and tree topology were analyzed with MACCLADE, version 3.0 (Maddison and Maddison 1992), to reconstruct the most parsimonious ancestral states and the directionality of the nucleotide substitutions. Minimal numbers of substitutions were used in subsequent analyses, i.e., substitutions that occurred with certainty but whose localization within the tree topology may or may not be determined. Substitution frequencies were obtained by dividing the observed occurrences of a given substitution by the total number of nucleotides of the type undergoing the substitution among all sequences (×1,000).
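The frequency calculation described above can be sketched in a few lines of Python. This is only an illustration of the stated ratio (substitutions divided by the number of nucleotides of the type undergoing the change, summed over all sequences, times 1,000); the function name `substitution_frequency` and the toy sequences are hypothetical and are not taken from the study.

```python
def substitution_frequency(n_substitutions, sequences, base, scale=1000):
    """Substitutions per `scale` nucleotides of type `base`.

    n_substitutions: observed count of one substitution type (e.g., C->T)
    sequences: aligned sequences for the region (gaps are simply not counted)
    base: the nucleotide undergoing the substitution (e.g., "C")
    """
    # Total number of nucleotides of the given type among all sequences.
    total = sum(seq.upper().count(base.upper()) for seq in sequences)
    if total == 0:
        raise ValueError(f"no {base!r} nucleotides in the supplied sequences")
    return scale * n_substitutions / total


# Hypothetical toy alignment: four C's in total, one inferred C->T change.
toy_sequences = ["ACCGT", "ACCGA"]
freq = substitution_frequency(1, toy_sequences, "C")
```

With the toy input, one change over four cytosines gives a frequency of 250 per 1,000 C's.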
The nontranscribed intergenic region and the adjacent mRNA leader underwent different patterns of nucleotide substitution. The transcribed but untranslated leader showed a significant excess of C→T over G→A transitions, similar to that in the sequenced portions of the surrounding coding sequences, cysB and acnA. In contrast, C→T-over-G→A asymmetry was not apparent in the nontranscribed intergenic sequence. For each region, C→T and G→A substitution frequencies are graphed in fig. 1B, which also specifies the absolute numbers of C→T and G→A substitutions, over the numbers of C's and G's. Although the number of changes in the nontranscribed region was too small to affirm that no asymmetry existed, the fact that this region underwent a significantly lower frequency of C→T changes than transcribed regions containing fewer C's strongly suggests a lack of asymmetry in the region. Therefore, the situation in the cysB-acnA region supports the hypothesis that the C→T-versus-G→A asymmetry is generated during transcription by indicating (1) that the asymmetry is not apparent in regions that are not transcribed, and (2) that translation and selection on amino acid or codon usage are not necessary for asymmetry.
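The text does not state which statistical test underlies the reported significance. Purely as an illustration, and as an assumption rather than the authors' method, an excess of C→T over G→A changes could be examined with a one-sided Fisher's exact test on a 2×2 table of changed versus unchanged sites. The function name and the counts below are hypothetical.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the table [[a, b], [c, d]].

    Rows: C sites, G sites. Columns: changed, unchanged.
    Returns P(X >= a) under the hypergeometric null, i.e., the probability
    of seeing at least `a` changed C sites given the fixed margins.
    """
    row1 = a + b          # total C sites
    col1 = a + c          # total changed sites
    n = a + b + c + d     # all sites
    denom = comb(n, col1)
    upper = min(row1, col1)
    # math.comb returns 0 when k > n, so impossible tables contribute nothing.
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, upper + 1))
    return tail / denom


# Hypothetical counts, not data from the study.
p_value = fisher_exact_greater(3, 0, 0, 3)
```

For this small hypothetical table the tail probability is 1/20. Real counts would be taken from the numbers of substitutions and of C's and G's reported for each region.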
During transcription, two processes have been shown experimentally to affect the generation of mutations in an asymmetric manner between the two DNA strands: transcription-coupled repair (TCR) and cytosine deamination. TCR corrects bulky lesions, particularly UV-induced pyrimidine dimers, which block transcription when present on the template strand by causing the RNA polymerase to stall. Given that C→T transitions are the primary mutations induced by pyrimidine dimers, they are recovered at a much higher frequency on the nontranscribed strand of active genes (Oller et al. 1992). C→T transitions due to deamination are also more frequent on the nontranscribed strand, presumably because cytosines on this strand remain unpaired longer as the transcription bubble proceeds (Beletskii and Bhagwat 1996; Beletskii et al. 2000). However, TCR and cytosine deamination have different effects on the rates of C→T and G→A transitions generated during transcription: deamination causes an increase in C→T transitions (and no change in G→A), whereas transcription-coupled repair causes a decrease in G→A transitions (and no change in C→T) when the nontranscribed strand of regions that undergo transcription is compared with regions that are not transcribed (fig. 2). Hence, comparison of C→T (and G→A) frequencies between transcribed and nontranscribed regions can reveal which of the two strand-asymmetric processes is the main generator of the asymmetric substitutional pattern observed in transcribed sequences. The frequencies of C→T substitutions in each of the sequences that undergo transcription were all significantly higher than the frequency of C→T substitutions in the nontranscribed intergenic sequence (fig. 1B). Numbers of G→A substitutions in both the transcribed and the nontranscribed sequences were similarly low, although there were too few changes for statistical comparison. Nevertheless, the steep increase in C→T substitutions that accompanies transcription suggests that the C→T-versus-G→A asymmetry is generated by excessive deamination of cytosines on the nontranscribed strand rather than by TCR of lesions on the complementary strand.
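The reasoning in this paragraph amounts to a small decision rule, sketched below. The function name and the `fold` threshold are hypothetical; the threshold is a crude stand-in for a proper statistical comparison and is not part of the original analysis.

```python
def implicated_process(ct_transcribed, ct_nontranscribed,
                       ga_transcribed, ga_nontranscribed, fold=1.5):
    """Classify which strand-asymmetric process the frequencies point to.

    Frequencies are substitutions per 1,000 sites, scored on the
    nontranscribed strand. `fold` is an arbitrary illustrative cutoff.
    """
    # Deamination predicts more C->T with transcription, G->A unchanged.
    ct_increased = ct_transcribed > fold * ct_nontranscribed
    # TCR predicts fewer G->A with transcription, C->T unchanged.
    ga_decreased = ga_transcribed * fold < ga_nontranscribed

    if ct_increased and ga_decreased:
        return "both"
    if ct_increased:
        return "deamination"
    if ga_decreased:
        return "transcription-coupled repair"
    return "neither"


# Hypothetical frequencies: C->T rises with transcription, G->A is flat.
result = implicated_process(30.0, 5.0, 4.0, 4.0)
```

In the hypothetical call, only the C→T frequency differs between region types, so the rule returns "deamination", mirroring the qualitative argument made for the cysB-acnA region.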
Footnotes
1 Present address: Laboratoire de Génétique Moléculaire Evolutive et Médicale, E9916 INSERM Faculté de Médecine-"Necker-Enfants malades", Université René Descartes-Paris V, Paris, France.
2 Keywords: cytosine deamination, substitution patterns, strand asymmetry, transcription-induced mutations, noncoding DNA, Escherichia coli.
3 Address for correspondence and reprints: Howard Ochman, Department of Ecology and Evolutionary Biology, 233 Life Sciences South, University of Arizona, Tucson, Arizona 85721. hochman{at}email.arizona.edu
Literature Cited
Beletskii, A., A. S. Bhagwat. 1996. Transcription-induced mutations: increase in C to T mutations in the nontranscribed strand during transcription in Escherichia coli. Proc. Natl. Acad. Sci. USA. 93:13919–13924
Beletskii, A., A. Grigoriev, S. Joyce, A. S. Bhagwat. 2000. Mutations induced by bacteriophage T7 RNA polymerase and their effects on the composition of the T7 genome. J. Mol. Biol. 300:1057–1065
Blattner, F. R., G. Plunkett III, C. A. Bloch et al. (14 co-authors). 1997. The complete nucleotide sequence of Escherichia coli K-12. Science. 277:1453–1462
Cunningham, L., M. J. Gruer, J. R. Guest. 1997. Transcriptional regulation of the aconitase genes (acnA and acnB) of Escherichia coli. Microbiology. 143:3795–3805
Francino, M. P., L. Chao, M. A. Riley, H. Ochman. 1996. Asymmetries generated by transcription-coupled repair in enterobacterial genes. Science. 272:107–109
Francino, M. P., H. Ochman. 1997. Strand asymmetries in DNA evolution. Trends Genet. 13:240–245
Francino, M. P., H. Ochman. 1999. A comparative genomics approach to DNA asymmetry. Ann. N.Y. Acad. Sci. 870:428–431
Frank, A. C., J. R. Lobry. 1999. Asymmetric substitution patterns: a review of possible underlying mutational or selective mechanisms. Gene. 238:65–77
Freeman, J. M., T. N. Plasterer, T. F. Smith, S. C. Mohr. 1998. Patterns of genome organization in bacteria. Science. 279:1827
Fryxell, K. J., E. Zuckerkandl. 2000. Cytosine deamination plays a primary role in the evolution of mammalian isochores. Mol. Biol. Evol. 17:1371–1383
Grigoriev, A. 1999. Strand-specific compositional asymmetries in double-stranded DNA viruses. Virus Res. 60:1–19
Hanawalt, P. C. 1995. DNA repair comes of age. Mutat. Res. 336:101–113
Herzer, P. J., S. Inouye, M. Inouye, T. S. Whittam. 1990. Phylogenetic distribution of branched RNA-linked multicopy single-stranded DNA among natural isolates of Escherichia coli. J. Bacteriol. 172:6175–6181
Mackiewicz, P., A. Gierlik, M. Kowalczuk, M. R. Dudek, S. Cebrat. 1999. How does replication-associated mutational pressure influence amino acid composition of proteins? Genome Res. 9:409–416
McLean, M., K. H. Wolfe, K. M. Devine. 1998. Base composition skews, replication orientation and gene orientation in 12 prokaryotic genomes. J. Mol. Evol. 47:691–696
Maddison, W. P., D. R. Maddison. 1992. MacClade v3.0: Analysis of phylogeny and character evolution. Sinauer, Sunderland, Mass
Mrazek, J., S. Karlin. 1998. Strand compositional asymmetry in bacterial and large viral genomes. Proc. Natl. Acad. Sci. USA. 95:3720–3725
Ochman, H., R. K. Selander. 1984. Standard reference strains of Escherichia coli from natural populations. J. Bacteriol. 157:690–693
Oller, A. R., I. J. Fijalkowska, R. L. Dunn, R. M. Schaaper. 1992. Transcription-repair coupling determines the strandedness of ultraviolet mutagenesis in Escherichia coli. Proc. Natl. Acad. Sci. USA. 88:11036–11040
Prodromou, C., P. J. Artymiuk, J. R. Guest. 1992. The aconitase of Escherichia coli. Eur. J. Biochem. 204:599–609
Reyes, A., C. Gissi, G. Pesole, C. Saccone. 1998. Asymmetrical directional mutation pressure in the mitochondrial genome of mammals. Mol. Biol. Evol. 15:957–966
Rocha, E. P., A. Danchin, A. Viari. 1999. Universal replication biases in bacteria. Mol. Microbiol. 32:11–16
Tillier, E. R., R. A. Collins. 2000. The contributions of replication orientation, gene direction, and signal sequences to base-composition asymmetries in bacterial genomes. J. Mol. Evol. 50:249–257