1 Center for Adaptation Genetics and Drug Resistance, Department of Molecular Biology and Microbiology, Tufts University School of Medicine, Boston, MA 02111, USA
2 Department of Plant Sciences, University of Oxford, South Parks Road, Oxford OX1 3RB, UK
3 School of Biological Sciences, University of Auckland, Private Bag 92019, Auckland, New Zealand
4 Department of Medicine, Tufts University School of Medicine, Boston, MA 02111, USA
Correspondence
Stuart B. Levy
(stuart.levy{at}tufts.edu)
Despite the vast amount of useful data that has come from the use of bioinformatic and genomic technologies, there are intrinsic limitations to the type of information that can be obtained simply by standard computational analysis of DNA sequences. Bioinformatic tools must be trained to look for sequence features associated with genes or specific motifs. Thus, computational approaches may fail to find expressed sequences that differ significantly from those characterized previously. For example, Wong et al. (2000) found a gene specifying a 41-residue protein that affects susceptibility to inhibitors of cell-wall synthesis, within a 602 bp region of the Escherichia coli genome that had been annotated as an intergenic sequence. Similarly, genome arrays designed to include all annotated ORFs of a particular genome sequence will fail to identify novel expressed sequences that have escaped the annotation process. Examples include the growing number of regulatory RNA molecules that are being identified in prokaryotes. These limitations do not seriously diminish the value of genomic approaches, but do indicate that genomics should be just one of a number of tools used to study complex systems.
A complementary approach to identify and understand functional genes in bacteria, even in complex environments, is the promoter-trapping strategy IVET (in vivo expression technology) (Mahan et al., 1993; Osbourn et al., 1987). Promoters active in the wild, but inactive under laboratory conditions, can be isolated on a genome-wide scale on the basis of their ability to drive expression of a gene that is essential for growth. IVET (and various derivatives) has been widely used to examine the genes induced in pathogens during infection of hosts and has also been used to identify genes induced during colonization of plants, of plant-pathogenic fungi (Lee & Cooksey, 2000), during infection of Arabidopsis thaliana (Boch et al., 2002) and during Rhizobium-legume symbiosis (Oke & Long, 1999).
We have used IVET to identify Pseudomonas fluorescens promoters (and the genes they control) induced in complex soil-based environments. Screening IVET libraries constructed from the genomes of both SBW25 and Pf0-1 has identified more than 50 P. fluorescens promoters (Gal et al., 2003; Rainey, 1999; M. W. Silby & S. B. Levy, unpublished). Analysis of the trapped DNA sequences shows that most contain a recognizable promoter oriented in the correct direction with respect to transcription (Fig. 1a). However, 10 out of 22 fusions in Pf0-1 lack a discernible promoter and are organized such that the captured DNA has a known gene oriented opposite to that necessary for transcription of the promoterless reporter (Fig. 1b). That these fusion strains do indeed possess trapped promoters and are not merely false-positive isolates is demonstrated by the increased survival of such strains in the wild relative to a negative control. Various groups have reported IVET fusions that are in the opposite orientation to annotated gene(s) (Camilli & Mekalanos, 1995; Mahan et al., 1993; Rainey, 1999; Wang et al., 1996). Although only one such fusion was reported for SBW25 in a screen for rhizosphere-activated promoters (Rainey, 1999), approximately 20 % of fusions recovered in that screen fell into the opposite-orientation class but were not published (P. B. Rainey, unpublished). We term these 'cryptic fusions' to reflect the fact that the active promoters and the sequences under their transcriptional control (oriented correctly with respect to the reporter genes) have not been recognized.
Functional cryptic promoters can conceivably drive transcription with one of two outcomes: production of a non-coding RNA molecule that is not translated, or transcription of mRNA that will subsequently direct production of a protein. In the first case, the cryptic promoter might normally be responsible for the transcription of a regulatory RNA molecule, which would be antisense to the transcript of any oppositely oriented, overlapping gene (Fig. 1b) and so control its expression. An increasing number of regulatory RNAs have been described among the prokaryotes, adding to the complexity of the current view of gene regulation (Johansson & Cossart, 2003; Masse et al., 2003). Given the emerging view that regulatory RNA molecules are important for environmental adaptation (Repoila et al., 2003), it is conceivable that the cryptic promoter drives transcription of an environment-specific regulatory molecule to allow appropriate modulation of activity of the target gene in a given environment, in a metabolically affordable manner. An antisense molecule would be the simplest possibility, but the diversity of regulatory RNAs means that a more broadly active regulator cannot be ruled out. That the unknown sequences might be involved in antisense regulation has been suggested (Osorio & Camilli, 2003).
If the cryptic promoters drive production of mRNA, the findings would indicate the existence of a number of overlapping protein-coding genes that run in opposite directions. Although overlapping genes of this nature are thought to be rare in prokaryote genomes (Rogozin et al., 2002), examination of sequences available from the draft genome of Pf0-1 reveals the presence of an ORF in the same orientation as the reporter gene in each cryptic fusion construct, and on the opposite strand to a gene that gives a clear hit in BLAST searches (see Fig. 1c, for example). A closer examination shows that these ORFs range in size from 213 to 1908 bp and the predicted translation products do not match any known or hypothetical protein in the public databases. In terms of codon usage, the mean differences when compared to the P. fluorescens codon-usage table (http://www.kazusa.or.jp/codon/cgi-bin/showcodon.cgi?species=Pseudomonas+fluorescens+[gbbct]) range from 14·81 to 27·47 %, with the shorter ORFs tending to have the greater mean difference. While the codon usage may appear to be considerably different from that expected for P. fluorescens, the fact that known genes from Pf0-1 also show high mean differences suggests that such differences are not significant. For example, RecA, DapB, GlnA and FlgB have mean differences of 14·33, 11·84, 14·69 and 16·8 %, respectively. What has led to the retention of such gene arrangements through evolution is unknown, but the potentially large numbers of such loci could indicate a functional significance. For example, there may be a competition between the two genes for optimal transcription, the outcome of which is determined by environmental factors. An alternative speculation is that there is some special relationship between the protein products of the overlapping genes. Regardless, the correlation of overlapping ORFs with the cryptic fusions is striking. When considered alongside the fact that the IVET reporter is expressed in these strains, this arrangement suggests hitherto unrecognized genes that are active only under natural environmental conditions (or repressed under laboratory conditions).
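The kind of analysis described above can be illustrated with a minimal sketch. It is not the procedure used to obtain the values reported here: the text does not state how the mean difference was calculated, so the sketch assumes it to be the mean absolute difference, over all 64 codons, between the per-amino-acid codon fractions of an ORF and those of a reference table, expressed as a percentage. The function names, the simple ATG-to-stop ORF definition and the minimum length are likewise illustrative assumptions.

```python
from collections import Counter

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
STOPS = {codon for codon, aa in CODON_TABLE.items() if aa == "*"}


def reverse_complement(seq):
    """Return the opposite strand, read 5' to 3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq, min_len=213):
    """Return (start, end) for ATG..stop ORFs in the three frames of seq.

    Assumption: an ORF runs from the first ATG in a frame to the next
    in-frame stop; alternative start codons are ignored.
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                if i + 3 - start >= min_len:
                    orfs.append((start, i + 3))
                start = None
    return orfs


def codon_fractions(orf):
    """Fraction of each codon among the codons for the same amino acid."""
    codons = (orf[i:i + 3] for i in range(0, len(orf) - 2, 3))
    counts = Counter(c for c in codons if c in CODON_TABLE)
    totals = Counter()
    for codon, n in counts.items():
        totals[CODON_TABLE[codon]] += n
    return {
        codon: counts[codon] / totals[aa] if totals[aa] else 0.0
        for codon, aa in CODON_TABLE.items()
    }


def mean_difference(orf, reference):
    """Assumed metric: mean absolute difference in codon fraction, in %."""
    observed = codon_fractions(orf)
    total = sum(abs(observed[c] - reference[c]) for c in CODON_TABLE)
    return 100 * total / len(CODON_TABLE)
```

To look for candidate ORFs on the strand opposite an annotated gene, one would call `find_orfs(reverse_complement(region))` on the trapped region and pass each ORF sequence to `mean_difference`, with `reference` being a dictionary of codon fractions derived from a species codon-usage table. Comparing the result with values for known genes of the same organism, as done above for RecA, DapB, GlnA and FlgB, gives a baseline for judging whether a difference is unusual.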
While there is yet considerable effort required to fully understand and appreciate the significance of these findings, we are moved to communicate the results for two reasons. First, to alert other researchers using similar screens not to simply discard such sequences in the belief that they represent false-positive findings. Second, and perhaps more importantly, these findings serve as a timely reminder of the ongoing potential for discovery using genetic approaches. Genetics and genomics should progress side by side to maximize the potential of both.
Acknowledgements
We are grateful to Laura McMurry for comments on the manuscript. This work was supported by a grant from the Department of Energy to S. B. L. (DE-FG02-97ER62493) and a BBSRC (UK) Research Fellowship to P. B. R.
REFERENCES
Boch, J., Joardar, V., Gao, L., Robertson, T. L., Lim, M. & Kunkel, B. N. (2002). Identification of Pseudomonas syringae pv. tomato genes induced during infection of Arabidopsis thaliana. Mol Microbiol 44, 73–88.
Camilli, A. & Mekalanos, J. J. (1995). Use of recombinase gene fusions to identify Vibrio cholerae genes induced during infection. Mol Microbiol 18, 671–683.
Gal, M., Preston, G. M., Massey, R. C., Spiers, A. J. & Rainey, P. B. (2003). Genes encoding a cellulosic polymer contribute toward the ecological success of Pseudomonas fluorescens SBW25 on plant surfaces. Mol Ecol 12, 3109–3121.
Johansson, J. & Cossart, P. (2003). RNA-mediated control of virulence gene expression in bacterial pathogens. Trends Microbiol 11, 280–285.
Lee, S. W. & Cooksey, D. A. (2000). Genes expressed in Pseudomonas putida during colonization of a plant-pathogenic fungus. Appl Environ Microbiol 66, 2764–2772.
Mahan, M. J., Slauch, J. M. & Mekalanos, J. J. (1993). Selection of bacterial virulence genes that are specifically induced in host tissues. Science 259, 686–688.
Masse, E., Majdalani, N. & Gottesman, S. (2003). Regulatory roles for small RNAs in bacteria. Curr Opin Microbiol 6, 120–124.
Oke, V. & Long, S. R. (1999). Bacterial genes induced within the nodule during the Rhizobium-legume symbiosis. Mol Microbiol 32, 837–849.
Osbourn, A. E., Barber, C. E. & Daniels, M. J. (1987). Identification of plant-induced genes of the bacterial pathogen Xanthomonas campestris pathovar campestris using a promoter-probe plasmid. EMBO J 6, 23–28.
Osorio, G. & Camilli, A. (2003). Hidden dimensions of Vibrio cholerae pathogenesis. ASM News 69, 396–401.
Rainey, P. B. (1999). Adaptation of Pseudomonas fluorescens to the plant rhizosphere. Environ Microbiol 1, 243–257.
Repoila, F., Majdalani, N. & Gottesman, S. (2003). Small non-coding RNAs, co-ordinators of adaptation processes in Escherichia coli: the RpoS paradigm. Mol Microbiol 48, 855–861.
Rogozin, I. B., Spiridonov, A. N., Sorokin, A. V., Wolf, Y. I., Jordan, I. K., Tatusov, R. L. & Koonin, E. V. (2002). Purifying and directional selection in overlapping prokaryotic genes. Trends Genet 18, 228–232.
Wang, J., Mushegian, A., Lory, S. & Jin, S. (1996). Large-scale isolation of candidate virulence genes of Pseudomonas aeruginosa by in vivo selection. Proc Natl Acad Sci U S A 93, 10434–10439.
Wong, R. S., McMurry, L. M. & Levy, S. B. (2000). Intergenic blr gene in Escherichia coli encodes a 41-residue membrane protein affecting intrinsic susceptibility to certain inhibitors of peptidoglycan synthesis. Mol Microbiol 37, 364–370.