1 Pacific Northwest National Laboratory, Computational Biology and Bioinformatics Group, PO Box 999, MS: K7-90, Richland, WA 99352, USA
2 Department of Microbiology and Molecular Genetics, The University of Texas Health Science Center, Medical School, Houston, TX 77030, USA
Correspondence
Haluk Resat
haluk.resat{at}pnl.gov
INTRODUCTION
In the PrrB/PrrA (photosynthetic response regulator) two-component system, PrrA serves as a response regulator, and PrrB (Lee & Kaplan, 1992) is a membrane-localized sensor kinase/phosphatase which phosphorylates PrrA upon O2 deprivation (Eraso & Kaplan, 1994). In addition to regulating photosynthesis-gene expression, PrrA acts as a global regulator, affecting the expression of genes encoding electron-transport components, genes involved in CO2 and N2 fixation, and genes involved in hydrogen oxidation, among others (Elsen et al., 2004; Joshi & Tabita, 1996; Qian & Tabita, 1996). Although the importance of the role played by PrrA in gene regulation is clear, the DNA sequence to which it binds remains poorly defined.
In the AppA/PpsR antirepressor/repressor system (Gomelsky & Kaplan, 1997), AppA (activation of photopigment and puc expression) serves as an antirepressor and modulates the repressor activity of PpsR (photopigment suppression) (Penfold & Pemberton, 1994) such that PpsR becomes more active upon the oxidation of the quinone pool (Braatsch et al., 2002; Oh & Kaplan, 2001). The antirepressor AppA is also responsible for blue-light photoreception, which can affect its activity toward PpsR (Braatsch et al., 2002; Masuda & Bauer, 2002). PpsR functions as a tetramer with a helix–turn–helix (HTH) domain at the carboxy-terminal region that genetic analysis suggests binds to a conserved DNA sequence, TGTN12ACA, where N represents a non-specific nucleotide (Gomelsky et al., 2000). This DNA motif is found in the region upstream of the genes bch and crt, as well as the puc operon, all of which encode products required for photosynthesis, i.e. bacteriochlorophyll, carotenoids and structural proteins, respectively (Zeilstra-Ryalls et al., 1998).
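The TGTN12ACA pattern described above can be illustrated with a short scan. The following is a minimal sketch only: the function name and the toy sequence are hypothetical, and it is not the search procedure used in this study. Because TGT and ACA are reverse complements of each other, a site on one strand also matches the pattern on the other strand, so scanning a single strand is sufficient for this particular pattern.

```python
import re

# TGT, then any 12 nucleotides, then ACA. The lookahead lets overlapping
# sites be reported as well.
PPSR_PATTERN = re.compile(r"(?=(TGT[ACGT]{12}ACA))")

def find_ppsr_sites(sequence):
    """Return (start, site) for every TGTN12ACA match (0-based start)."""
    sequence = sequence.upper()
    return [(m.start(), m.group(1)) for m in PPSR_PATTERN.finditer(sequence)]

# Toy upstream sequence (hypothetical, for illustration only).
toy_upstream = "CC" + "TGTCAACTTAAGTTGACA" + "GG"
sites = find_ppsr_sites(toy_upstream)
```

A pattern this permissive will match many sites by chance in a real genome, which is one reason a more refined consensus is useful.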
The R. sphaeroides regulator FnrL is considered to be a homologue of the Escherichia coli anaerobic regulatory protein FNR (fumarate and nitrate reduction regulatory protein) (Zeilstra-Ryalls & Kaplan, 1995). This hypothesis is based in part on the FnrL amino acid sequence, which shows homology to known functional domains of the FNR protein. By analogy, it has also been hypothesized that FnrL may recognize the FNR consensus sequence TTGATN4ATCAA (Zeilstra-Ryalls & Kaplan, 1998). This consensus sequence has been found in the sequences upstream of hemA, hemN and hemZ, genes involved in the tetrapyrrole biosynthetic pathway, the bchE gene, and the puc operon (Choudhary & Kaplan, 2000; Zeilstra-Ryalls & Kaplan, 1995). Regions upstream of the ccoNOQP operon encoding the cbb3 oxidase, the rdxBHIS operon, and the structural genes encoding the aa3 cytochrome oxidase (Zeilstra-Ryalls et al., 1998) also contain the FnrL consensus sequence, suggesting that FnrL indirectly regulates the volume of electron flow toward different terminal oxidases and to the Rdx redox centre by changing their gene-expression levels.
The purpose of this study was to predict and identify DNA motifs present in the R. sphaeroides 2.4.1 genome that may bind the transcription factors PrrA, PpsR and FnrL, and thereby identify which genes in the genome may be influenced by these regulators. The rationale behind our methodological approach was as follows: if genes a, b, c, d and e show high levels of expression under condition x, low levels under condition y, and no expression under condition z, then it is plausible that the expression of these genes is controlled by the same regulatory protein. If so, then this regulator is hypothesized to recognize the same signature within the DNA sequence upstream of each gene. In brief, we carried out hierarchical clustering of R. sphaeroides genes using microarray mRNA expression data to follow which genes showed concomitant increased/decreased expression patterns under seven different experimental conditions. We then searched loci, i.e. the regions upstream of these genes or their operons, for signature sites that suggest co-regulation. These sites were then used to generate a predicted consensus sequence.
The application of both microarray data clustering and motif-finding approaches to a large dataset has allowed us to independently find putative PpsR binding sites that are in good agreement with the previously published PpsR binding consensus. It has also allowed us to predict refinements to that consensus. Our results for FnrL binding sites are also in agreement with the previous predictions for the FnrL consensus sequence, although here we extend the likely number of target genes. For PrrA, our predictions suggest a PrrA DNA binding sequence comprising two blocks with an internal gap of variable length, again consistent with previously published predictions. We have also calculated the statistical distribution of the variable gap widths between the conserved block elements of the binding motif for PrrA. Using the predicted PpsR, FnrL and PrrA consensus sequences deduced from this study, we were able to predict the genes that are potentially regulated by these transcription factors throughout the genome. We note that due to the statistical filtering approach which was used and the limited amount of data available, our findings are likely to contain false-positive and false-negative binding sites. However, our analysis of microarray data from a PrrA mutant suggests that our method is sufficiently robust to assist in the prediction of genes controlled by this regulator. These newly identified target genes and their mode of regulation are now more amenable to study using classic genetic and biochemical approaches; in other words, our findings will be used to design new experiments for the next round of studies.
METHODS
For validation of the methodology, R. sphaeroides 2.4.1 and a prrA mutant (PRRA2) were grown under anaerobic dark DMSO conditions. Both strains were grown as independent triplicate cultures and treated as described above.
Strains.
R. sphaeroides strain 2.4.1 (wild-type, ATCC BAA-808) and two in-frame deletion mutation-containing strains, ccoNOQP (Oh & Kaplan, 2002) and rdxB (Oh & Kaplan, 1999), were used in this study. The mutant strains are defective in part of the known signal-transduction pathway for photosynthesis gene expression. In the wild-type, the photosynthesis genes are only expressed under anaerobic conditions; however, in these two mutant strains, these genes are expressed under aerobic conditions. We included the mutant data in our analysis because, from a statistical standpoint, any data pertaining to photosynthesis gene expression add information by enlarging the dataset.
The prrA mutation used for validation was created by deletion of part of the prrA gene and has been described previously (Eraso & Kaplan, 1997).
R. sphaeroides growth conditions.
Briefly, wild-type cells were grown under the following five growth conditions: aerobic (30 % O2), photosynthetic (3, 10 and 100 W m−2 light intensity), and DMSO with 10 W m−2 light intensity. The two mutant strains were only grown under aerobic, i.e. 30 % O2 conditions. For validation of the study, both wild-type and PRRA2 were grown under anaerobic dark DMSO conditions.
In detail, the strains were grown at 29±1 °C on Sistrom's minimal medium A (SIS) containing succinate as carbon source (Sistrom, 1962). Aerobic cultures were grown while sparging with a gas mixture of 30 % O2/69 % N2/1 % CO2 and harvested at a low OD600 of 0·18±0·02 in order to ensure oxygen saturation. Photosynthetic cultures were grown at light intensities of 3, 10 and 100 W m−2 (measured at the surface of the growth vessel) while sparging with 95 % N2/5 % CO2, and harvested at OD600 0·45±0·05 to prevent self-shading. For cultures grown with DMSO at 10 W m−2, the cells were cultivated in the presence of 60 mM DMSO (to change the redox state of the cells), and were also sparged with a gas mixture of 95 % N2/5 % CO2 (to generate anaerobic conditions) and harvested at OD600 0·45±0·05. All light intensities were measured using a YSI-Kettering model 65A radiometer (Simpson Electric Co.).
For validation, wild-type and PRRA2 were grown in Sistrom's medium containing a final concentration of 60 mM DMSO. The medium was sparged with 95 % N2/5 % CO2. Cells were harvested at the densities described above.
RNA manipulation.
A previously described RNA isolation procedure (Roh & Kaplan, 2002) was modified to optimize the isolation of intact mRNA for microarray analysis (Roh et al., 2004). We modified the earlier procedure by eliminating cell collection by centrifugation. A volume of cells grown as described above was directly pipetted into an equal volume of 2x lysis buffer (100 °C). After thorough mixing, lysed cells were immediately transferred to an equal volume of hot phenol solution (65 °C). The total time required to transfer from the culture vessel to hot phenol was kept to less than 1 min to minimize mRNA degradation and to maximize the yield of intact mRNA. The remainder of the RNA purification procedure was identical to that described previously (Roh & Kaplan, 2002). Each isolated RNA sample was treated with 50 µl RQ1 RNase-free DNase (1 unit µl−1, Promega) and 50 µl 10x buffer in a total volume of 500 µl. Samples were incubated for 1 h at 37 °C, extracted with acidic phenol, acidic phenol/chloroform, and chloroform, then precipitated by adding 1 ml ethanol. The pellet was washed with 75 % ethanol and suspended in diethylpyrocarbonate (DEPC)-treated water. Total RNA was pelleted again by adding the same volume of 4 M LiCl, washed with 75 % ethanol, and resuspended in DEPC-treated water. Chromosomal DNA contamination was tested by PCR amplification using the rdxB-specific primers (a and b), as described previously (Roh & Kaplan, 2002).
Microarray experiments.
The R. sphaeroides 2.4.1 GeneChip was custom designed and manufactured by Affymetrix Inc. (Pappas et al., 2004). In most cases, one probe set was designed to represent one gene/ORF. However, in some cases more than one probe set with the same RSP number (e.g. RSP1556_f_at and RSP1556_r_at) was used to represent the same gene/ORF. Total RNA was prepared from three independent cultures of R. sphaeroides. cDNA synthesis, fragmentation, labelling and hybridization were adapted, with few modifications, from the methods optimized for the GeneChip designed for the Pseudomonas aeruginosa Genome Array by Affymetrix, Inc. (http://www.affymetrix.com/support/technical/manuals.affx). Briefly, 10 µg total RNA was annealed with 750 ng of random primers (New England Biolabs) and incubated at 70 °C for 10 min, and then at 25 °C for 1 h. First-strand cDNA was synthesized with 200 units µl−1 SuperScript II with 5x 1st strand buffer (Invitrogen Life Technologies) in the presence of 10 mM DTT, 0·5 mM dNTPs and 0·5 units µl−1 SUPERase In RNase inhibitor (Ambion) (25 °C for 20 min, 37 °C for 1 h, 42 °C for 1 h, 70 °C for 10 min). After removal of RNA by alkaline treatment and neutralization, the cDNA synthesis product was purified using the QIAquick PCR purification kit (Qiagen). For fragmentation, 7–9 µg cDNA and 1 unit of RQ1 DNase I (Promega) were incubated at 37 °C. After 1 min, one-third of the cDNA/DNase mixture was removed and heat-inactivated at 100 °C for 5 min. Further one-third aliquots were removed at 2 and 3 min and similarly heat-inactivated. The desired cDNA size range of 50–200 bases was selected after 3 % agarose gel electrophoresis using 200 ng fragmented cDNA. The fragmented cDNA was 3'-end labelled using the Enzo BioArray Terminal Labelling kit (Affymetrix) with biotin-ddUTP.
Target hybridization, washing, staining and scanning were performed according to the protocol supplied by the manufacturer using a GeneChip Hybridization Oven 640, a Fluidics Station 400, and the Agilent GeneArray Scanner under the control of Affymetrix Microarray Suite 5.0.
Data files were analysed using the MAS 5.0 (Affymetrix Inc.) and dChip 1.2 software (Li & Hung Wong, 2001; Li & Wong, 2001). Raw intensity values from different experiments were normalized against a target intensity value for across-experiment comparison. Probe intensities of the triplicate array experiments for every condition were then further intensity-normalized using the total array intensity of the chips. The mean of triplicate measurements was used to describe the expression level of a gene for that particular condition, and the mean expression values for the seven experimental conditions were then used in the clustering analysis.
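The two steps described above (scaling each array towards a common target intensity, then averaging triplicates) can be sketched as follows. This is a deliberately simplified illustration and not the MAS 5.0 or dChip algorithm: the function names, the use of a plain arithmetic mean for scaling, and the target value of 500 are all assumptions made for the example.

```python
def scale_to_target(intensities, target):
    """Rescale one array so that its mean intensity equals `target`.

    Assumption: a plain mean is used here; the actual software may use a
    trimmed mean or another robust estimate.
    """
    array_mean = sum(intensities) / len(intensities)
    factor = target / array_mean
    return [value * factor for value in intensities]

def mean_of_replicates(replicates):
    """Average probe-set values across replicate arrays (one list per array)."""
    n = len(replicates)
    return [sum(values) / n for values in zip(*replicates)]

# Toy data: three replicate arrays with two probe sets each (hypothetical).
raw = [[100.0, 300.0], [200.0, 600.0], [50.0, 150.0]]
scaled = [scale_to_target(array, target=500.0) for array in raw]
condition_means = mean_of_replicates(scaled)
```

After scaling, the three toy replicates become identical, which shows why between-array normalization must precede averaging.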
Clustering analysis.
Genes were clustered according to their expression patterns in the seven different experiments using the dChip software (Li & Hung Wong, 2001; Li & Wong, 2001). The hierarchical clustering method used within dChip has been described elsewhere (Eisen et al., 1998). Before clustering, genes that showed a relative expression variation (ratio of the standard deviation to mean value) of less than 0·5 over the seven experiments were identified in order to filter out the genes whose expression change across the studied conditions was insignificant. Of the original 4490 probe sets, 3583 fell into this class and were removed from further analysis. The remaining 907 probe sets that showed significant changes were used in the clustering analysis. It should be noted that the cutoff used in the selection is somewhat arbitrary, and this approach has the potential to be weighted towards genes expressed at low levels. To verify that our filtering approach does not introduce a serious selection bias, we calculated the distribution of the intensities of the genes for the total and the selected sets. The computed distributions (Supplementary Fig. S1) showed that the filtering scheme does not introduce a noticeable statistical bias.
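The filtering criterion above (standard deviation divided by mean, with a cutoff of 0·5) can be sketched as follows. This is an illustrative sketch rather than the dChip implementation: the function names and toy probe-set names are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev

def relative_variation(values):
    """Ratio of standard deviation to mean across conditions."""
    m = mean(values)
    if m == 0:
        return 0.0  # avoid division by zero for probe sets with no signal
    return stdev(values) / m  # assumption: sample standard deviation

def keep_variable_probe_sets(expression, cutoff=0.5):
    """Keep probe sets whose relative variation is at least `cutoff`."""
    return {name: values for name, values in expression.items()
            if relative_variation(values) >= cutoff}

# Toy expression values over seven conditions (hypothetical).
toy = {
    "flat_gene": [100, 102, 98, 101, 99, 100, 100],
    "variable_gene": [10, 12, 11, 300, 280, 9, 10],
}
selected = keep_variable_probe_sets(toy)
```

Because the ratio divides by the mean, weakly expressed genes with small absolute fluctuations can still pass the cutoff, which is the bias the text cautions about.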
The expression values of the 907 probe sets used in the clustering analysis were further preprocessed such that they had a zero mean and unit standard deviation over the seven experiments. The analysis utilized the average linkage method, in which the distance between pairs of genes is defined as 1−R, where R is the correlation coefficient between the expression patterns.
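The standardization and the 1−R distance can be made concrete with a small sketch. The code below is an illustration under stated assumptions, not the dChip implementation: the function names are hypothetical, R is taken to be the Pearson correlation coefficient, and only the distance between two existing clusters is shown rather than the full agglomerative procedure.

```python
from statistics import mean, pstdev

def standardize(profile):
    """Shift and scale a profile to zero mean and unit standard deviation.

    Profiles with zero variance would divide by zero; such genes are assumed
    to have been removed by the variation filter beforehand.
    """
    m, s = mean(profile), pstdev(profile)
    return [(x - m) / s for x in profile]

def correlation_distance(a, b):
    """Distance 1 - R, with R the Pearson correlation of two profiles."""
    za, zb = standardize(a), standardize(b)
    r = sum(x * y for x, y in zip(za, zb)) / len(za)
    return 1.0 - r

def average_linkage(cluster_a, cluster_b):
    """Mean pairwise distance between the members of two clusters."""
    pairs = [(a, b) for a in cluster_a for b in cluster_b]
    return sum(correlation_distance(a, b) for a, b in pairs) / len(pairs)
```

With this definition, perfectly co-varying genes are at distance 0 and perfectly anti-correlated genes are at distance 2, regardless of their absolute expression levels.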
DNA motif search.
The MEME (Bailey & Elkan, 1994) and BioProspector (Liu et al., 2001) programs were used to search the DNA sequences upstream of genes for DNA binding motifs. In this work, we often refer to the sequences upstream of genes and operons collectively as loci. We use this term for convenience, but the reader should realize that it can mean that the sequence originated from upstream of a gene or an operon. Up to 1 kb of sequence upstream from an individual gene or from the first gene in each operon was extracted from the genomic sequence and used in these motif searches. When available, the structures of operons were obtained from the literature (Oh & Kaplan, 2001); otherwise they were predicted based on the relative chromosomal positions of the genes, their putative transcription directions, their intergenic sequence lengths or their functions.
Given a group of related DNA or protein sequences, the MEME program (Bailey & Elkan, 1994) uses a statistical expectation maximization technique to find different fixed-width motifs. The BioProspector program (Liu et al., 2001) uses a Gibbs sampling strategy to detect sequence motifs, and the motifs can be allowed to have variable widths. Using the putative DNA binding motifs detected by MEME and BioProspector, the MAST program (Bailey & Gribskov, 1998) was then applied to scan sequences upstream of the target genes to search for matches to the detected motifs. For our analysis, we modified the MAST program so that it could search for motifs with variable widths.
The two chromosomes of R. sphaeroides are predicted to encode about 3980 genes, 2095 (53 %) of which have intergenic upstream sequences (loci) with lengths ≥50 bp. We collected the intergenic upstream sequences of these 2095 genes and, in addition, we collected the sequence upstream of pucB (2096 upstream sequences in total). This latter sequence was dealt with separately because its 5' end overlaps with the 3' end of an upstream hypothetical gene (RSP0313). This gene organization, i.e. the lack of an intergenic sequence, would have prevented the pucB upstream sequence from being captured by the ≥50 bp cutoff described above. It was known that the pucB promoter is embedded in the coding sequence of the upstream gene. In addition, pucB has been shown experimentally to be regulated by PpsR/FnrL/PrrA (Lee & Kaplan, 1992); it was therefore important to deal with it as a special case and thus include it in the analysis.
The R. sphaeroides genome sequence, the chromosomal locations of the encoded genes, their upstream sequences and annotations can be accessed at the website http://genome.ornl.gov/microbial/rsph/.
Consensus diagrams (cf. Figs 2 and 3) were created using the WebLogo program (Crooks et al., 2004) through the website at http://weblogo.berkeley.edu/.
RESULTS
Chromosome I of R. sphaeroides contains a contiguous 67 kb region that encompasses the photosynthesis gene cluster and encodes the puc and puf operons, bch genes, crt genes, photosynthesis gene regulators and other photosynthesis-related genes (Choudhary & Kaplan, 2000). Seventy-nine of the probe sets on the microarray chip represent the genes located in this 67 kb photosynthesis region. Thirty-seven of these were included among the 907 probe elements selected for clustering, and constituted 4·1 % of the probe elements.
Two of the clusters generated by clustering analysis contained a significant number of genes and operons that lie within the 67 kb region of chromosome I and are functionally integrated with photosynthesis. These clusters will be referred to as clusters with high photosynthesis content (CHPC) (see Fig. 1). CHPC1 is composed of 65 probe sets, of which 21 (32 %) lie within the 67 kb photosynthesis gene region. CHPC2 is composed of 44 probe sets, of which 10 (23 %) lie within the 67 kb photosynthesis region. Probe sets relating to photosynthesis that are contained within these two CHPCs are listed in Table 1. In our analysis, 38 and 23 loci were derived from CHPC1 and CHPC2, respectively. Since four loci were common to CHPC1 and CHPC2, the combined clusters contained 57 loci. The complete list of genes and operons contained in the two clusters is included in Supplementary Table S1.
DNA binding motif search
Since CHPCs contain a large number of genes functionally related to photosynthesis, investigating the regulation of the genes belonging to these clusters can help us understand the transcriptional regulation of photosynthesis gene expression in R. sphaeroides. The loci within the CHPCs were searched using the MEME and BioProspector programs. We first searched for motifs in the loci of each individual CHPC and then repeated the searches after combining the clusters. We first present the motifs detected in the loci and then discuss the properties of the predicted DNA recognition motifs that were detected. We particularly emphasize the detection of the putative PrrA DNA binding motif specific to R. sphaeroides.
CHPC1.
The MEME program was used to search and generate the six most statistically significant motifs in the loci belonging to CHPC1. To be inclusive, a window of 6–50 bp for the motif length was used in the search. Each upstream sequence was allowed to contain any number of occurrences of each detected motif. Among the top six detected motifs, three were found to be of particular interest.
The first detected motif, TGTCA[A/G][C/A]NNAANTTGACA, has a 6 bp inverted repeat form and reproduces the known less-restrictive DNA binding sequence pattern that has been suggested to be recognized by PpsR: TGTN12ACA (Gomelsky et al., 2000). This was the highest-ranking motif in the search; its probability score matrix is reported in Supplementary Table S2. The second motif (ranked second; Supplementary Table S3), TTGA[T/C][C/A]C[G/A/T][G/C][A/G]TCAA, also has a palindromic structure and matches the hypothesized FnrL consensus sequence TTGATN4ATCAA (Zeilstra-Ryalls & Kaplan, 1998). We note that the detected PpsR and FnrL DNA binding motifs have perfect inverted-repeat forms, even though a palindromic structure was not imposed in the search. The third detected motif (ranked sixth), GC[G/T][G/T/C]C[C/A/T]C[T/G]CT[G/T]CC[G/T]C, has a 5 bp inverted-repeat region that is poorly conserved and resembles DNA recognition sequences that have previously been proposed for PrrA (Supplementary Table S4). The identification and possible biological significance of this predicted highly degenerate PrrA binding motif will be discussed in depth later.
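The inverted-repeat property noted above can be checked directly: two half-sites form an inverted repeat when one is the reverse complement of the other. The sketch below is illustrative only; the function names are hypothetical, and the half-sites are taken from the conserved flanks of the motifs quoted in the text, choosing A at the degenerate [A/G] position of the first motif.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(sequence.upper()))

def is_inverted_repeat(left_half, right_half):
    """True if the right half-site is the reverse complement of the left."""
    return reverse_complement(left_half) == right_half.upper()

# Flanks of the predicted PpsR motif (TGTCA[A/G] ... TTGACA) and of the
# hypothesized FnrL consensus (TTGAT N4 ATCAA).
ppsr_is_palindromic = is_inverted_repeat("TGTCAA", "TTGACA")
fnrl_is_palindromic = is_inverted_repeat("TTGAT", "ATCAA")
```

Such symmetry is commonly associated with regulators that bind as dimers or higher-order oligomers, with each subunit contacting one half-site.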
CHPC2.
MEME searching parameters for CHPC2 were identical to those for CHPC1, except that the length of the motif was set more stringently to vary between 14 and 20 bp to limit the lengths of detected motifs. Supplementary Tables S2–S4 list the three top-ranked motifs detected by MEME. Comparison of the motif results for CHPC2 with the results for CHPC1 shows that the motifs detected for the individual clusters are in good agreement (Supplementary Tables S2–S4). We note that, as it contains more elements and has a higher content of photosynthesis-related genes and operons, the predictions for CHPC1 may be more reliable than those for CHPC2 for predicting motifs involved in photosynthesis gene regulation.
Combined CHPCs.
Since, in general, increasing the sample size can be expected to lead to an increase in the statistical information content, we merged the sets of loci for the two CHPC clusters and then searched for motifs in the combined upstream sequence set. Search parameters for the combined clusters were the same as those used for CHPC1. Not surprisingly, transcription-factor binding motifs found for the combined clusters are very similar to the motifs found using the data for individual CHPC clusters (Supplementary Tables S2–S4). As we expect the predictions based on larger sample sizes to have better statistical relevance, we base our subsequent discussion mostly on the results for the combined clusters.
PpsR binding motif.
Earlier studies suggest that PpsR binds to the nucleotide sequence TGTN12ACA (Gomelsky et al., 2000; Lee & Kaplan, 1992). One of the motifs found during our searches of the combined clusters, TGTCA[A/G]NN[A/C][A/T][A/T/C]N[T/C]TGACA (Supplementary Table S2), is in agreement with this earlier finding, but is significantly more refined than the previously published sequence (TGTN12ACA). We therefore assigned this motif as the new predicted PpsR consensus sequence (Supplementary Table S2 and Fig. 2).
Using the set of 2096 upstream region sequences, the MAST program (Methods) was applied to search within the genome for genes potentially regulated by PpsR. The search resulted in the detection of 11 genes whose upstream sequences contain the new PpsR DNA binding motif. These 11 predicted PpsR-targeted genes, together with their fold changes between expression levels at 10 W m−2 light intensity (without DMSO) versus expression under aerobic conditions (30 % O2), are listed in Table 2. Ten of the 11 predicted genes are known to be regulated by PpsR (Choudhary & Kaplan, 2000; Moskvin et al., 2005; Zeng et al., 2003), an observation that supports our approach. Strikingly, with the exceptions of argD and bchC, nine of the genes are encoded in operons or by genes belonging to the two CHPC clusters. We therefore conclude that PpsR is not a major regulator outside the photosynthesis genes in R. sphaeroides.
PrrA binding motif.
The PrrA DNA binding motifs that were detected when the two clusters were examined independently and the motifs detected when the two clusters were combined are compared in Table 4. Although the motifs found in different loci sets are similar, there are noticeable differences at some of the nucleotide positions between detected motifs. The PrrA recognition motif found for the combined clusters is a mixture of the motifs observed in the loci searches for the individual CHPCs (Table 4). The first eight nucleotide positions of the motif for the combined clusters, T[G/A/C]CGACA[C/G], and the subsequent eight positions, [T/A][C/A]TGTCG[C/A], show best matches with the motifs from CHPC2 and CHPC1, respectively.
Further analysis of the PrrA recognition motif
The predicted DNA binding sequence of PrrA found using the MEME program is highly degenerate (Table 4). To determine if our results depended on the motif search algorithm, we utilized another program, BioProspector (Liu et al., 2001), to repeat the search for the PrrA motif. An advantage of the BioProspector program is that it allows for variable-width pattern searches in which the investigated motif can have the form block1–gap–block2, where block1 and block2 refer to the two recognition elements directly contacted by a regulator. Both blocks have fixed widths and the intervening gap can be of variable length. In the search for the PrrA motif, a 6-[0-10]-5 search parameter was used, i.e. the widths of block1 and block2 were 6 and 5 bp, respectively, and the intervening spacing (gap) had a range of 0–10 bp. One of the detected motifs (Supplementary Table S8), [C/T][G/C]CGG[C/G]-gap-G[T/A]C[G/A][C/A], is almost identical to the PrrA motif that was found using the MEME program (Table 4). We therefore assigned this motif as the putative PrrA DNA binding sequence with a variable width. The only notable disagreement between the MEME and BioProspector results is for the fifth position, where A dominates the MEME motif while the BioProspector result is dominated by G (Table 4). Thus, the fifth position of the PrrA motif is inconclusive from our results and, as both programs seem to perform equally well, we predict that both A and G are probable.
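The block1–gap–block2 form can be illustrated by an exact-match scan for the bracket-notation motif quoted above, trying every gap width from 0 to 10 bp. This is a minimal sketch and not BioProspector or the modified MAST program, both of which score sites probabilistically; the function names and the toy sequence are hypothetical, and only one strand is scanned.

```python
import re

def bracket_to_regex(block):
    """Convert '[C/T][G/C]CGG[C/G]' into the regex '[CT][GC]CGG[CG]'."""
    return block.replace("/", "")

def find_gapped_sites(sequence, block1, block2, min_gap=0, max_gap=10):
    """Return sorted (start, gap_width, site) for block1-gap-block2 matches."""
    b1, b2 = bracket_to_regex(block1), bracket_to_regex(block2)
    sequence = sequence.upper()
    hits = []
    for gap in range(min_gap, max_gap + 1):
        # One pattern per gap width, so every width is reported separately.
        pattern = re.compile(f"(?=({b1}[ACGT]{{{gap}}}{b2}))")
        for match in pattern.finditer(sequence):
            hits.append((match.start(), gap, match.group(1)))
    return sorted(hits)

BLOCK1 = "[C/T][G/C]CGG[C/G]"
BLOCK2 = "G[T/A]C[G/A][C/A]"
# Toy sequence with a single site and a 5 bp gap (hypothetical).
toy = "AA" + "CGCGGC" + "TTTTT" + "GTCGC" + "AA"
hits = find_gapped_sites(toy, BLOCK1, BLOCK2)
```

In a genome with a high G+C content, exact matching against a G/C-rich pattern such as this one would be expected to return many chance hits, so the output of such a scan is best read as a list of candidates.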
To further probe the characteristics of the predicted PrrA consensus sequence, we have also compared the PrrA DNA binding motifs that were found in our analysis with the consensus sequences that have been predicted by other groups in earlier biochemical studies (Emmerich et al., 2000; Laguri et al., 2003; Swem et al., 2001). As shown in Table 4, PrrA DNA binding motifs that were detected in our analysis for the combined cluster dataset are in good agreement with the predictions made by other groups using different approaches (Emmerich et al., 2000; Laguri et al., 2003; Swem et al., 2001). The most significant difference between our predictions and those of earlier studies is that rather than being non-specific, our analysis specifies that position 13 in the motif is either T or A. Implications of this close agreement between our new results and these earlier published results will be discussed later.
In our motif search using the BioProspector program, we looked for variable gap motifs where the gap ranged between 0 and 10 bp. The detected motif [C/T][G/C]CGG[C/G]-gap-G[T/A]C[G/A][C/A] was observed at 170 different DNA sites in loci belonging to the CHPC clusters. These 170 putative PrrA DNA binding sites were distributed among upstream sequences of 51 out of the 57 operons belonging to the clusters. We note that results obtained using the MEME and the BioProspector programs are in good agreement (Table 4), and therefore this observation is unlikely to be an artifact of an individual algorithm.
Fig. 4 shows the percentage distribution of the widths of intervening spacers among the 170 predicted PrrA binding sites. The most probable gap width is 5 bp (17 %), which coincides with the distance (from position 7 to 11) in Table 4. This lies between the two inverted repeats in the fixed-gap motif detected by MEME for the combined clusters. Although a gap that varies between 0 and 10 bp is probably too variable to be real, we opted for a large gap range in our motif search to be inclusive in the searches. As shown in Fig. 4, predictions for PrrA DNA binding motifs with very small and very large gaps occur less frequently than motifs with a 5 bp gap. However, these very large and small gaps still exist at a statistically significant number of places.
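A percentage distribution of gap widths of the kind plotted in Fig. 4 can be tabulated from a list of per-site gap widths as sketched below. The function names and the toy input are hypothetical; the actual gap widths of the 170 predicted sites are not reproduced here.

```python
from collections import Counter

def gap_width_percentages(gap_widths, min_gap=0, max_gap=10):
    """Percentage of sites at each gap width in the searched range."""
    counts = Counter(gap_widths)
    total = len(gap_widths)
    return {width: 100.0 * counts.get(width, 0) / total
            for width in range(min_gap, max_gap + 1)}

def most_probable_gap(distribution):
    """Gap width with the highest percentage."""
    return max(distribution, key=distribution.get)

# Toy gap widths for four sites (hypothetical).
distribution = gap_width_percentages([5, 5, 3, 10])
mode = most_probable_gap(distribution)
```

Reporting every width in the searched range, including those with zero sites, keeps distributions from different locus sets directly comparable.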
BioProspector detected putative PrrA binding sites in 51 loci. For each of these loci, the putative PrrA binding sequence that showed the best match to the motif [C/T][G/C]CGG[C/G]-gap-G[T/A]C[G/A][C/A] (Supplementary Table S8) was selected and the gap width analysed. The distributions of the gap widths for these best matches are depicted in Fig. 5. Among the 51 best-matching sites, the 5 bp gap had the highest frequency; however, other statistically significant gap widths also occur. Among the 51 loci, 11 were predicted by BioProspector to have only one putative PrrA binding site. The statistical distribution of gap widths of these 11 PrrA binding sites is reported in Supplementary Fig. S2. Again, no single gap width was dominant.
To predict the relative importance of the interplay of PpsR, FnrL and PrrA on a genome-wide scale, we combined our binding-site data for the three regulators. Fig. 6 shows the predicted overlap in their regulatory roles. Of the 11 predicted PpsR regulons, eight were among the 1285 possible PrrA targets, whereas for the 40 genes likely to be regulated by FnrL, 32 were potential PrrA targets. Of the 2096 loci examined, two genes, namely pucB and bchE, are predicted, solely on the basis of the motif searches, to be regulated by all three regulators. Genetic approaches support this conclusion (Oh et al., 2000).
We then determined how many of the 1285 genes predicted to be regulated by PrrA showed changes in their expression pattern in PRRA2 when compared to wild-type. We found that 523 genes showed a change of expression pattern of ≥1·5-fold (significant), and 520 genes showed a change of expression pattern of <1·5-fold (considered insignificant in our selection scheme). The remaining genes were called absent by the analysis software, i.e. their level of expression was too low to be detected. This suggests that our methodology predicted 523 of the 850 genes that showed a significant change in expression, while falsely identifying 520 genes.
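The counts reported above can be turned into simple fractions as a worked check. The sketch below uses only the numbers stated in the text; the variable names, and the framing of the two ratios as "recovered" and "confirmed" fractions, are ours and are not measures defined in this study.

```python
predicted_targets = 1285        # genes with a predicted PrrA site
changed_and_predicted = 523     # predicted, change of at least 1.5-fold
unchanged_and_predicted = 520   # predicted, change below 1.5-fold
changed_total = 850             # all genes with a significant change

# Predicted genes called absent (expression too low to be detected).
absent = predicted_targets - changed_and_predicted - unchanged_and_predicted

# Fraction of significantly changed genes that carried a predicted site.
recovered_fraction = changed_and_predicted / changed_total

# Fraction of detectable predicted targets that changed significantly.
confirmed_fraction = changed_and_predicted / (
    changed_and_predicted + unchanged_and_predicted)
```

On these numbers, roughly 62 % of the significantly changed genes carried a predicted site, and about half of the detectable predicted targets changed significantly.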
Of the 523 genes considered significant, 193 showed decreased expression in PRRA2 (predicted to be PrrA acting as an activator) but, surprisingly, 329 genes showed increased expression (PrrA acting as a repressor). This result was a surprise, as PrrA is generally thought to act as an activator. These microarray results have been confirmed in part by randomly selecting seven genes and carrying out Northern blot analysis (J. M. Eraso & S. Kaplan, unpublished results).
DISCUSSION AND CONCLUSION
Using a combination of structure determination and sequence analysis methods, Laguri et al. (2003) presented a convincing argument that PrrA binds to the DNA as a homodimer, and that the width of the gap between the DNA sequences GCGNC and GNCGC directly contacted by the two monomers can be variable. The variable gap in the DNA recognition motif is supported by our results. However, the R. sphaeroides genome has a 69 % G+C content and the consensus sequence of PrrA [C/T][G/C]CGG[C/G]-gap-G[T/A]C[G/A][C/A] is also dominated by G and C, which makes the search for putative PrrA binding sites less reliable. Thus, as discussed by Laguri et al. (2003), although the gap width of the PrrA binding motif may be variable, one should be cautious about predicted binding sites with very large or small gap widths. Such sites should be treated as either false-positive detections or very-low-affinity binding sites until confirming biological data are collected.
In addition to determining a variable-gap PrrA consensus sequence, we also refined the putative PpsR consensus sequence, and confirmed the putative FnrL consensus sequence. The detected consensus sequences for PpsR, FnrL and PrrA were then used to predict their potential regulons on a genome-wide scale. Of 11 genes detected by MAST to be regulated by PpsR (Table 2), 10 have been reported earlier to be potentially regulated by PpsR (Choudhary & Kaplan, 2000; Moskvin et al., 2005; Zeng et al., 2003). These 10 genes are indicated in Table 2 and their encoded products are all related to the photosynthetic metabolic function; examples are bacteriochlorophyll biosynthesis (bch genes), carotenoid biosynthesis (crt genes), light-harvesting complex (puc genes), and the regulation of photosynthesis genes (ppaA). Not surprisingly, these 10 genes showed increased mRNA abundance under anaerobic conditions with 10 W m⁻² light intensity compared with aerobic growth conditions (Table 2). This result can be explained in part by the observation that under anaerobic conditions with light, AppA inhibits the repression activity of PpsR and thus derepresses PpsR-targeted genes (Braatsch et al., 2002; Masuda & Bauer, 2002). Among these 10 genes, we found that the upstream sequences of four genes contain two predicted PpsR binding sites. Interestingly, for pucB and puc2B, the two predicted PpsR binding sites are separated by 7 bp, whereas for the two divergently transcribed genes, bchF and ppaA, the two sites are separated by 126 bp. The presence of two binding sites in the upstream sequence might be explained by the tetrameric structure of PpsR (Gomelsky et al., 2000). However, binding of a PpsR tetramer to two sites that are separated by 126 bp probably requires either DNA looping or possibly the binding of two tetramers to the target sites. The latter possibility has been described in Bradyrhizobium, where, under oxidizing conditions, PpsR binds as an octamer (Jaubert et al., 2004). Thus it would be interesting to see the effect of deleting one of the two predicted binding sites in R. sphaeroides.
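The spacing between paired PpsR sites can be illustrated with a short Python sketch that locates exact matches to the TGTN12ACA motif and reports the number of base pairs between consecutive sites. This is a minimal sketch under stated assumptions: exact matching stands in for the MAST search used in this work, "separation" is taken here to mean the distance from the end of one site to the start of the next, and the function names and toy sequence are hypothetical.

```python
import re

# PpsR motif from the text: TGT, 12 unspecified nucleotides, ACA.
# A lookahead is used so that overlapping occurrences are not skipped.
PPSR_PATTERN = re.compile(r"(?=(TGT[ACGT]{12}ACA))")
SITE_LENGTH = 18  # 3 + 12 + 3 bp


def ppsr_sites(sequence):
    """Return the 0-based start position of every exact motif match."""
    return [m.start() for m in PPSR_PATTERN.finditer(sequence.upper())]


def site_spacings(starts):
    """Base pairs between the end of one site and the start of the next.

    Overlapping sites yield a negative value.
    """
    return [b - (a + SITE_LENGTH) for a, b in zip(starts, starts[1:])]


# Toy upstream region (hypothetical): two sites separated by 7 bp.
site = "TGT" + "C" * 12 + "ACA"
toy_upstream = "GG" + site + "GGCCGGC" + site + "GG"

starts = ppsr_sites(toy_upstream)
print(starts, site_spacings(starts))
```

Because the specified positions of TGTN12ACA are their own reverse complement, a sequence matches the pattern exactly when its reverse complement does, so scanning a single strand is sufficient for this particular motif.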
Very recently, two divergently transcribed genes involved in the early steps of tetrapyrrole biosynthesis, hemE (RSP0680) and hemC (RSP0679), were experimentally identified to be targeted by PpsR (Moskvin et al., 2005). Their observed PpsR binding sites are all located within the coding regions of these two genes. As we chose the upstream regions of the genes to search for the regulator DNA binding sites (see Methods), these sites were not included in our set of upstream region sequences and therefore could not be detected with our approach. These, however, are not real false negatives; their lack of detection is due purely to the DNA-region selection criteria used. Compared with the 10 detected photosynthesis-related genes, argD, which encodes acetylornithine aminotransferase, does not show an apparent link with photosynthesis. The expression level of argD under aerobic conditions is similar to that of puc2B. However, under photosynthetic conditions at 10 W m⁻², puc2B mRNA abundance increases 25-fold, whereas argD mRNA abundance decreases (1·5-fold), which cannot be explained by the role of AppA in mediating the repression activity of PpsR under anaerobic photosynthetic conditions. Thus, we suggest that argD may be a false-positive detection, or that PpsR may under certain conditions function as an activator (Jaubert et al., 2004). Based on our motif-searching results, we conclude that PpsR in R. sphaeroides is to a large extent specific for the regulation of photosynthesis-related genes (Moskvin et al., 2005).
Compared with the genes regulated by PpsR, the 40 genes that are predicted to be regulated by FnrL exhibit a much broader range of biological functions, including photosynthesis, signal transduction, electron transport, redox homeostasis and translation elongation (Supplementary Table S5). Owing to their diverse functions, the expression patterns of these 40 genes under photosynthetic growth conditions, compared with aerobic conditions, vary considerably (Supplementary Table S5) (Kang et al., 2005). For example, all five photosynthesis-related genes (bchE, pucB, hemN, hemZ and hemA) show a significant increase in mRNA abundance (6- to 16-fold), the two aa3 oxidase subunits (coxI and coxII) show a ninefold decrease in mRNA abundance, and the expression of fnrL itself shows little change. Thus, FnrL can exert its anaerobic regulation both positively and negatively.
When compared with the genome-wide predictions of regulation by PpsR and FnrL, a much larger number of genes (1285 of 2096 genes) have the potential to be influenced by PrrA. Noting that we lack the benchmark data to compute the possible false-positive detection percentages, we stress that a sizeable percentage of the predictions of putative regulation by PrrA may be false. The large number of genes predicted to be candidates for PrrA regulation is probably influenced by the variable gap widths (which were allowed to vary between 3 and 7 bp) and the high G+C content of the R. sphaeroides genome. If the predictions are not false positives, it also suggests that PrrA may require a much less stringent sequence pattern for binding compared with the binding requirements for PpsR and FnrL. Such a finding may also suggest that PrrA has the potential to act in a much more global fashion than the other two regulators, FnrL and PpsR. This suggestion has been reinforced by DNA microarray experiments, in which the gene expression of prrA mutant PRRA2 and wild-type cells grown under dark DMSO conditions was compared. In these experiments, the transcription of at least 850 genes (showing a transcription difference of >1·5-fold) was found to be affected by the absence of PrrA (J. M. Eraso & S. Kaplan, unpublished results). Although the number of detected genes depends on the fold-ratio cutoff used, the finding that 523 of these genes were captured by our predictions confirms the expected global regulatory role for PrrA and in part validates the methodology described here. The finding that for 60 % of these genes PrrA may act as a repressor turns on its head the conventional idea of the role of PrrA as an activator. These findings clearly suggest that PrrA performs as both a repressor and an activator, with a slight leaning in favour of repression.
It is interesting to note that in the microarray comparison between wild-type and PRRA2, only 523 genes were captured by prediction compared to the 850 genes found experimentally. It might be expected that because of possible false positives our method would capture more, not fewer, than the 850 genes found by microarray analysis. However, in our method, we scanned sequences upstream of operons. There are always fewer upstream sequences than genes; for example, an operon of four genes will only have one upstream sequence. Therefore, even in the highly unlikely event that our predictions were perfect, we would always underestimate the number of genes regulated by PrrA. This problem is compounded by the fact that binding sites buried within the coding regions of stand-alone genes and operons (Moskvin et al., 2005) will also be missed by our method, resulting in an underestimation of the genes controlled by a regulator.
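The counting argument can be illustrated with a toy example. In the Python sketch below, the operon map and gene names are entirely hypothetical placeholders; the point is only that scanning one upstream sequence per operon yields fewer scanned sequences than genes, and that a hit upstream of an operon implicates every gene in it.

```python
# Hypothetical operon map, for illustration only (names are placeholders).
operon_map = {
    "operon_1": ["gene_a", "gene_b", "gene_c", "gene_d"],
    "operon_2": ["gene_e"],
    "operon_3": ["gene_f", "gene_g"],
}


def count_upstream_and_genes(operons):
    """Return (number of upstream sequences, number of genes)."""
    n_upstream = len(operons)  # one upstream sequence per operon
    n_genes = sum(len(genes) for genes in operons.values())
    return n_upstream, n_genes


def expand_hits_to_genes(hit_operons, operons):
    """List every gene in the operons whose upstream region had a hit."""
    return [gene for name in hit_operons for gene in operons[name]]


n_upstream, n_genes = count_upstream_and_genes(operon_map)
regulated = expand_hits_to_genes(["operon_1"], operon_map)
print(n_upstream, n_genes, len(regulated))
```

Counting predictions per upstream sequence rather than per gene therefore gives a smaller number than a gene-level microarray comparison, even when every prediction is correct.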
As with all prediction methods, the user should be aware of the possibilities for error. In the case of PrrA, overestimation of binding sites may occur as a result of the genome G+C composition and a highly degenerate PrrA binding sequence that itself has a high G+C composition. Underestimation in this case can occur because the number of upstream sequences scanned, one per operon, is always smaller than the number of coding regions in the genome. In addition, our method suggests only genes that may be directly controlled by regulator binding. It misses completely all genes where the effect of the regulator is indirect, i.e. where the regulator is the first step or an intermediate in a longer regulatory pathway.
PrrA, FnrL and PpsR are three major transcription regulators that control the expression of photosynthesis genes of R. sphaeroides in response to environmental stimuli. By selecting two clusters enriched for photosynthesis genes from the microarray clustering results and analysing the loci belonging to these two clusters, we obtained PpsR and FnrL consensus sequences, as well as a variable-gap motif that is predicted to be recognized by PrrA of R. sphaeroides. By applying this approach to other clusters derived from the microarray data, it should be feasible to determine the consensus sequences recognized by transcription factors involved in regulating other biological processes. One of the main aims of this study is to use computational methods to identify a small number of targets to be investigated in future experimental studies. As our results show, the ability to determine the DNA binding sequences of the regulators of interest and the ability to do a whole-genome-level search for putative regulatory targets are useful filtering tools to direct future experiments towards a limited number of genes. Such computational approaches are also useful in putatively distinguishing the profile of the transcriptional regulators, i.e. whether they control a small or large number of genes. This work is being extended in two ways: the first involves gene expression patterns obtained by microarray analysis of R. sphaeroides PpsR, FnrL and PrrA mutants; the second involves the direct examination by biochemical and genetic techniques of genes identified in this study as being subject to regulation by each of the three regulators. Such studies are now under way.
ACKNOWLEDGEMENTS |
---|
REFERENCES |
---|
Bailey, T. L. & Gribskov, M. (1998). Combining evidence using p-values: application to sequence homology searches. Bioinformatics 14, 48–54.
Braatsch, S., Gomelsky, M., Kuphal, S. & Klug, G. (2002). A single flavoprotein, AppA, integrates both redox and light signals in Rhodobacter sphaeroides. Mol Microbiol 45, 827–836.
Choudhary, M. & Kaplan, S. (2000). DNA sequence analysis of the photosynthesis region of Rhodobacter sphaeroides 2.4.1. Nucleic Acids Res 28, 862–867.
Comolli, J. C., Carl, A. J., Hall, C. & Donohue, T. (2002). Transcriptional activation of the Rhodobacter sphaeroides cytochrome c2 gene P2 promoter by the response regulator PrrA. J Bacteriol 184, 390–399.
Crooks, G. E., Hon, G., Chandonia, J. M. & Brenner, S. E. (2004). WEBLOGO: a sequence logo generator. Genome Research 14, 1188–1190.
Du, S. & Bauer, C. E. (1999). DNA binding characteristics of RegA. A constitutively active anaerobic activator of photosynthesis gene expression in Rhodobacter capsulatus. J Biol Chem 274, 16343–16348.
Dubbs, J. M. & Tabita, F. R. (2003). Interactions of the cbbII promoter-operator region with CbbR and RegA (PrrA) regulators indicate distinct mechanisms to control expression of the two cbb operons of Rhodobacter sphaeroides. J Biol Chem 278, 16443–16450.
Dubbs, J. M., Bird, T. H., Bauer, C. E. & Tabita, F. R. (2000). Interaction of CbbR and RegA* transcription regulators with the Rhodobacter sphaeroides cbbI promoter-operator region. J Biol Chem 275, 19224–19230.
Eisen, M. B., Spellman, P. T., Brown, P. O. & Botstein, D. (1998). Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci U S A 95, 14863–14868.
Elsen, S., Swem, L. R., Swem, D. L. & Bauer, C. E. (2004). RegB/RegA, a highly conserved redox-responding global two-component regulatory system. Microbiology and Molecular Biology Reviews 68, 263–279.
Emmerich, R., Strehler, P., Hennecke, H. & Fischer, H. M. (2000). An imperfect inverted repeat is critical for DNA binding of the response regulator RegR of Bradyrhizobium japonicum. Nucleic Acids Res 28, 4166–4171.
Eraso, J. M. & Kaplan, S. (1994). prrA, a putative response regulator involved in oxygen regulation of photosynthesis gene expression in Rhodobacter sphaeroides. J Bacteriol 176, 32–43.
Eraso, J. M. & Kaplan, S. (1997). Oxygen-insensitive synthesis of the photosynthetic membranes of Rhodobacter sphaeroides: a mutant histidine kinase. J Bacteriol 177, 2695–2706.
Gomelsky, M. & Kaplan, S. (1997). Molecular genetic analysis suggesting interactions between AppA and PpsR in regulation of photosynthesis gene expression in Rhodobacter sphaeroides 2.4.1. J Bacteriol 179, 128–134.
Gomelsky, M., Horne, I. M., Lee, H. J., Pemberton, J. M., McEwan, A. G. & Kaplan, S. (2000). Domain structure, oligomeric state, and mutational analysis of PpsR, the Rhodobacter sphaeroides repressor of photosystem gene expression. J Bacteriol 182, 2253–2261.
Jaubert, M., Zappa, S., Fardoux, J. & 7 other authors (2004). Light and redox control of photosynthesis gene expression in Bradyrhizobium. Dual roles of two PpsR*. J Biol Chem 279, 44407–44416.
Joshi, H. M. & Tabita, F. R. (1996). A global two component signal transduction system that integrates the control of photosynthesis, carbon dioxide assimilation, and nitrogen fixation. Proc Natl Acad Sci U S A 93, 14515–14520.
Kammler, M., Schon, C. & Hantke, K. (1993). Characterization of the ferrous iron uptake system of Escherichia coli. J Bacteriol 175, 6212–6219.
Kang, Y., Weber, K. D., Qiu, Y., Kiley, P. J. & Blattner, F. R. (2005). Genome-wide expression analysis indicates that FNR of Escherichia coli K-12 regulates a large number of genes of unknown function. J Bacteriol 187, 1135–1160.
Karls, R. K., Wolf, J. R. & Donohue, T. J. (1999). Activation of the cycA P2 promoter for the Rhodobacter sphaeroides cytochrome c2 gene by the photosynthesis response regulator. Mol Microbiol 34, 822–835.
Laguri, C., Phillips-Jones, M. K. & Williamson, M. P. (2003). Solution structure and DNA binding of the effector domain from the global regulator PrrA (RegA) from Rhodobacter sphaeroides: insights into DNA binding specificity. Nucleic Acids Res 31, 6778–6787.
Lee, J. K. & Kaplan, S. (1992). cis-acting regulatory elements involved in oxygen and light control of puc operon transcription in Rhodobacter sphaeroides. J Bacteriol 174, 1158–1171.
Li, C. & Hung Wong, W. (2001). Model-based analysis of oligonucleotide arrays: model validation, design issues and standard error application. Genome Biology 2, 1–11.
Li, C. & Wong, W. H. (2001). Model-based analysis of oligonucleotide arrays: expression index computation and outlier detection. Proc Natl Acad Sci U S A 98, 31–36.
Liu, X., Brutlag, D. L. & Liu, J. S. (2001). BioProspector: discovering conserved DNA motifs in upstream regulatory regions of co-expressed genes. In Pacific Symposium on Biocomputing, pp. 127–138.
Masuda, S. & Bauer, C. E. (2002). AppA is a blue light photoreceptor that antirepresses photosynthesis gene expression in Rhodobacter sphaeroides. Cell 110, 613–623.
Masuda, S., Matsumoto, Y., Nagashima, K. V., Shimada, K., Inoue, K., Bauer, C. E. & Matsuura, K. (1999). Structural and functional analyses of photosynthetic regulatory genes regA and regB from Rhodovulum sulfidophilum, Roseobacter denitrificans, and Rhodobacter capsulatus. J Bacteriol 181, 4205–4215.
Moskvin, O. V., Gomelsky, L. & Gomelsky, M. (2005). Transcriptome analysis of the Rhodobacter sphaeroides PpsR regulon: PpsR as a master regulator of photosystem development. J Bacteriol 187, 2148–2156.
Oh, J. I. & Kaplan, S. (1999). The cbb3 terminal oxidase of Rhodobacter sphaeroides 2.4.1: structural and functional implications for the regulation of spectral complex formation. Biochemistry 38, 2688–2696.
Oh, J. I. & Kaplan, S. (2001). Generalized approach to the regulation and integration of gene expression. Mol Microbiol 39, 1116–1123.
Oh, J. I. & Kaplan, S. (2002). Oxygen adaptation. The role of the CcoQ subunit of the cbb3 cytochrome c oxidase of Rhodobacter sphaeroides 2.4.1. J Biol Chem 277, 16220–16228.
Oh, J. I., Eraso, J. M. & Kaplan, S. (2000). Interacting regulatory circuits involved in orderly control of photosynthesis gene expression in Rhodobacter sphaeroides 2.4.1. J Bacteriol 182, 3081–3087.
Pappas, C. T., Sram, J., Moskvin, O. V. & 7 other authors (2004). Construction and validation of the Rhodobacter sphaeroides 2.4.1 DNA microarray: transcriptome flexibility at diverse growth modes. J Bacteriol 186, 4748–4758.
Penfold, R. J. & Pemberton, J. M. (1994). Sequencing, chromosomal inactivation, and functional expression in Escherichia coli of ppsR, a gene which represses carotenoid and bacteriochlorophyll synthesis in Rhodobacter sphaeroides. J Bacteriol 176, 2869–2876.
Qian, Y. & Tabita, F. R. (1996). A global signal transduction system regulates aerobic and anaerobic CO2 fixation in Rhodobacter sphaeroides. J Bacteriol 178, 12–18.
Roh, J. H. & Kaplan, S. (2002). Interdependent expression of the ccoNOQP-rdxBHIS loci in Rhodobacter sphaeroides 2.4.1. J Bacteriol 184, 5330–5338.
Roh, J. H., Smith, W. E. & Kaplan, S. (2004). Effects of oxygen and light intensity on transcriptome expression in Rhodobacter sphaeroides 2.4.1. Redox active gene expression profile. J Biol Chem 279, 9146–9155.
Sistrom, W. R. (1962). The kinetics of the synthesis of photopigments in Rhodopseudomonas sphaeroides. J Gen Microbiol 28, 607–616.
Swem, L. R., Elsen, S., Bird, T. H., Swem, D. L., Koch, H. G., Myllykallio, H., Daldal, F. & Bauer, C. E. (2001). The RegB/RegA two-component regulatory system controls synthesis of photosynthesis and respiratory electron transfer components in Rhodobacter capsulatus. J Mol Biol 309, 121–138.
Zeilstra-Ryalls, J. H. & Kaplan, S. (1995). Aerobic and anaerobic regulation in Rhodobacter sphaeroides 2.4.1: the role of the fnrL gene. J Bacteriol 177, 6422–6431.
Zeilstra-Ryalls, J. H. & Kaplan, S. (1998). Role of the fnrL gene in photosystem gene expression and photosynthetic growth of Rhodobacter sphaeroides 2.4.1. J Bacteriol 180, 1496–1503.
Zeilstra-Ryalls, J. H., Gabbert, K., Mouncey, N. J., Kaplan, S. & Kranz, R. G. (1997). Analysis of the fnrL gene and its function in Rhodobacter capsulatus. J Bacteriol 179, 7264–7273.
Zeilstra-Ryalls, J., Gomelsky, M., Eraso, J. M., Yeliseev, A., O'Gara, J. & Kaplan, S. (1998). Control of photosystem formation in Rhodobacter sphaeroides. J Bacteriol 180, 2801–2809.
Zeng, X., Choudhary, M. & Kaplan, S. (2003). A second and unusual pucBA operon of Rhodobacter sphaeroides 2.4.1: genetics and function of the encoded polypeptides. J Bacteriol 185, 6171–6184.
Received 29 April 2005;
revised 25 July 2005;
accepted 26 July 2005.