Relationship between codon biased genes, microarray expression values and physiological characteristics of Streptococcus pneumoniae

Antonio J. Martín-Galiano1,†, Jerry M. Wells2 and Adela G. de la Campa1

1 Unidad de Genética Bacteriana (CSIC), Centro Nacional de Microbiología, Instituto de Salud Carlos III, 28220, Majadahonda, Madrid, Spain
2 Bacterial Infection and Immunity Group, Institute of Food Research, Norwich Research Park, Norwich NR4 7UA, UK

Correspondence
Antonio J. Martín-Galiano
a.martin@wzw.tum.de


   ABSTRACT
 
A codon-profile strategy was used to predict gene expression levels in Streptococcus pneumoniae. Predicted highly expressed (PHE) genes included those encoding glycolytic and fermentative enzymes, sugar-conversion systems and carbohydrate-transporters. Additionally, some genes required for infection that are involved in oxidative metabolism and hydrogen peroxide production were PHE. Low expression values were predicted for genes encoding specific regulatory proteins like two-component systems and competence genes. Correspondence analysis localized 484 ORFs which shared a distinctive codon profile in the right horn. These genes had a mean G+C content (33·4 %) that was lower than the bulk of the genome coding sequences (39·7 %), suggesting that many of them were acquired by horizontal transfer. Half of these genes (242) were pseudogenes, ORFs shorter than 80 codons or without assigned function. The remaining genes included several virulence factors, such as capsular genes, iga, lytB, nanB, pspA, choline-binding proteins, and functions related to DNA acquisition, such as restriction-modification systems and comDE. In order to compare predicted translation rate with the relative amounts of mRNA for each gene, the codon adaptation index (CAI) values were compared with microarray fluorescence intensity values following hybridization of labelled RNA from laboratory-grown cultures. High mRNA amounts were observed in 32·5 % of PHE genes and in 64 % of the 25 genes with the highest CAI values. However, high relative amounts of RNA were also detected in 10·4 % of non-PHE genes, such as those encoding fatty acid metabolism enzymes and proteases, suggesting that their expression might also be regulated at the level of transcription or mRNA stability under the conditions tested. The effects of codon bias and mRNA amount on different gene groups in S. pneumoniae are discussed.


Abbreviations: CAI, codon adaptation index; COA, correspondence analysis; FU, fluorescence units; Nc, effective number of codons; PHE, predicted highly expressed; RP, ribosomal protein; RSCU, relative synonymous codon usage

CAI values for all genes of strain TIGR4 are available as supplementary data with the online version of this paper at http://mic.sgmjournals.org.

†Present address: Lehrstuhl für Genomorientierte Bioinformatik, Wissenschaftszentrum Weihenstephan, Am Forum 1, 85354 Freising, Germany.


   INTRODUCTION
 
Streptococcus pneumoniae, commonly known as the pneumococcus, is one of the most important human pathogens worldwide, causing a number of diseases including pneumonia, meningitis, otitis media and sinusitis. The increasing number of clinical isolates found to be antibiotic-resistant (and multidrug-resistant) highlights the importance of research on the molecular biology of this organism. The availability of genome sequence data for S. pneumoniae strain JNR7/87 (TIGR4) of serotype 4 (Tettelin et al., 2001), strain R6 (an unencapsulated laboratory derivative of a serotype 2 strain) (Hoskins et al., 2001), and strain G54 of serotype 19F (Dopazo et al., 2001) provides a wealth of untapped information with which to analyse codon usage and its relationship to gene expression and mutational bias. Besides other mechanisms, codon bias can influence gene expression by optimization of the translation rate (Chavancy & Garel, 1981). This optimization is based on the selection of the third codon position to adapt coding sequences to the most abundant tRNAs in the cell (Ikemura, 1981) or to those with more efficient codon–anticodon interaction kinetics (Grosjean et al., 1978). Although this gene adaptation is species-specific, close similarities can be found in organisms of the same genus (Sharp, 1991). Highly restrictive codon patterns exist in genes encoding abundant polypeptides, probably due to a low tolerance to synonymous substitutions that slow down the translation elongation process (Sharp & Li, 1987b).

The approximate expression level of a gene can be predicted by comparing its codon bias with the profile of universally highly expressed genes, such as the ribosomal protein (RP) genes, which are commonly used as a reference set. Algorithms developed for this purpose (Sharp & Li, 1987a; Karlin & Mrazek, 2000) are adequate for deciphering the general pattern of gene expression in the cell, and to detect special enhanced functions in some micro-organisms, such as DNA and protein repair in Deinococcus radiodurans and flagellar motility in Treponema pallidum (Karlin & Mrazek, 2000). There is a good correlation of predicted highly expressed (PHE) genes with high two-dimensional gel abundances in Bacillus subtilis and Escherichia coli (Karlin et al., 2001). However, these algorithms do not allow the detection of genes encoding proteins that are abundant due to their high stability rather than to a high translation rate (Karlin et al., 2001) and, given the large translation capacity of ribosomes, codon usage restrictions of highly expressed genes should operate only at critical stages of rapid growth (Kurland, 1991). In accordance with these ideas, the slow-growing Mycobacterium tuberculosis (24–36 h doubling time) exhibits almost no alternative codon bias among genes that are PHE in other, fast-growing eubacteria (Andersson & Sharp, 1996; Karlin & Mrazek, 2000).

Codon bias could be an important factor in S. pneumoniae since its cell-division time under laboratory growth conditions is typically less than 45 min. However, to the best of our knowledge, systematic studies of the effect of codon usage on gene expression levels and gene function have not been reported for the lactic acid group of bacteria. In addition, there is only one report on the correlations between codon usage bias and microarray data, for E. coli (dos Reis et al., 2003). Given the medical significance of S. pneumoniae, Streptococcus pyogenes, and the viridans group streptococci, and the industrial importance of the food lactic acid bacteria, such as Lactococcus lactis and Lactobacillus acidophilus, a study of the relationship between codon usage, gene expression and gene function is required. The objective of this study was to analyse the relationships between the predicted level of gene expression based on codon usage, actual microarray expression values and gene function at the genomic level in S. pneumoniae.


   METHODS
 
Synonymous codon usage and statistical analysis.
The genomic sequences of S. pneumoniae strain JNR7/87 (TIGR4; Tettelin et al., 2001) and strain R6 (Hoskins et al., 2001) were obtained from The Institute for Genomic Research (TIGR, http://www.tigr.org). Three parameters were calculated, essentially according to the method of Sharp and Li (1987a): RSCU (relative synonymous codon usage), w (relative adaptiveness of a codon) and CAI (codon adaptation index). The RSCU value of a codon is its observed frequency divided by the frequency expected if all synonymous codons for that amino acid were used equally. Therefore, RSCU values close to 1·0 indicate a lack of bias for that codon. w is a normalized version of RSCU, calculated as the quotient of the RSCU value of a specific codon and the highest RSCU value for codons encoding the same amino acid. The CAI value of a gene is the geometric mean of the w values from all its codons. A w value of 0·001 was assigned to codons never used in the reference set to avoid CAI values of 0 for genes having those codons. CAI values for all genes of strain TIGR4 are available as supplementary data with the online version of this paper (http://mic.sgmjournals.org). Programs for calculating CAI and the effective number of codons (Nc) values were written in Visual Basic. Correspondence analysis (COA) of RSCU values was performed using the GCUA program (available at http://bioinf.may.ie/gcua/download.html; McInerney, 1998). Briefly, this method plots genes according to their codon usage in a 59-dimensional space (not including the five non-variant codons), and then identifies the major trends in codon usage as those axes through this multidimensional hyperspace which account for the largest fractions of the variation among genes.
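The three parameters defined above can be illustrated with a minimal Python sketch. This is not the Visual Basic program used in the study; the function names (split_codons, rscu, relative_adaptiveness, cai) are hypothetical, the standard genetic code is assumed, and w values are taken over all sense codons as the text states (some implementations exclude the single-codon amino acids Met and Trp).

```python
import math
from collections import Counter

# Standard genetic code in TCAG order; "*" marks stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Synonymous codon families, keyed by amino acid (stop codons excluded).
FAMILIES = {}
for _codon, _aa in CODON_TABLE.items():
    if _aa != "*":
        FAMILIES.setdefault(_aa, []).append(_codon)


def split_codons(seq):
    """Return the in-frame sense codons of a coding sequence."""
    seq = seq.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    codons = (seq[i:i + 3] for i in range(0, usable, 3))
    # Stop codons and codons with ambiguous bases are dropped.
    return [c for c in codons if CODON_TABLE.get(c, "*") != "*"]


def rscu(counts):
    """Observed count divided by the count expected under equal synonymous usage."""
    values = {}
    for codons in FAMILIES.values():
        total = sum(counts[c] for c in codons)
        for c in codons:
            values[c] = counts[c] * len(codons) / total if total else 0.0
    return values


def relative_adaptiveness(reference_seqs, floor=0.001):
    """w = RSCU / max RSCU within the family; unused codons get the floor value."""
    counts = Counter(c for s in reference_seqs for c in split_codons(s))
    values = rscu(counts)
    w = {}
    for codons in FAMILIES.values():
        best = max(values[c] for c in codons)
        for c in codons:
            w[c] = values[c] / best if values[c] > 0 else floor
    return w


def cai(seq, w):
    """Geometric mean of the w values of all sense codons in the gene."""
    codons = split_codons(seq)
    if not codons:
        raise ValueError("sequence contains no sense codons")
    return math.exp(sum(math.log(w[c]) for c in codons) / len(codons))
```

With a toy reference set consisting of the single sequence AAAAAAAAG (two AAA and one AAG codon for Lys), w is 1·0 for AAA and 0·5 for AAG, so a gene AAAAAG would have a CAI of about 0·707.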

Culture conditions, RNA extraction and microarray experiments.
S. pneumoniae R6 was grown in Todd–Hewitt medium (Difco) with 0·5 % yeast extract, adjusted to pH 7·8 (THYE medium). Cells corresponding to 50 ml cultures were collected at mid-exponential phase (OD620=0·25), washed with cold 0·9 % NaCl and stored at –80 °C. Pellets were thawed and cells lysed for 15 min at 37 °C in 10 mM Tris, 1 mM EDTA (pH 8·0), 0·1 % sodium deoxycholate. RNA was extracted with the RNeasy midi kit (QIAGEN), including a DNase treatment according to the manufacturer's instructions, precipitated with ethanol, washed, and suspended in 40 µl H2O. Concentration and purity of the RNA samples were measured using the 2100 Bioanalyser (Agilent). Details of the construction of the microarrays used in this study have been described previously (Dagkessamanskaia et al., 2004). The microarrays included probes for all strain TIGR4 annotated genes (2236) and probes for 117 R6-specific genes (i.e. less than 90 % similarity, as deduced by BLAST analysis). To obtain labelled cDNA, a 25 µl mixture was made with 15 µg RNA, 5 µg random primers (obtained with the Bioprime DNA labelling kit, Invitrogen), 12 µM DTT, 500 µM each dNTP (except for CTP, which was 240 µM), 2 nM Cy3- or Cy5-labelled CTP, and 200 units Stratascript (Stratagene) reverse transcriptase, in the buffer supplied by the manufacturer. The mixture was incubated overnight at 37 °C and the reaction stopped by addition of 1·5 µl 20 mM EDTA plus 15 µl 0·1N NaOH. After 15 min incubation at 70 °C, 15 µl 0·1N HCl was added. Labelled cDNA was treated with the QIAquick PCR purification kit (QIAGEN), the volume was reduced to 10 µl by lyophilization, and then 6·1 µg Cot1 human DNA was added, as well as 3x SSC, 0·2 % SDS, 0·02 M HEPES and 4x Denhardt's solution, to a final volume of 90 µl. Samples were treated for 2 min at 100 °C and 10 min at room temperature, centrifuged twice, and 40 µl of the supernatant was applied to a microarray slide. 
After overnight incubation at 63 °C, microarrays were washed and scanned with an Axon 4000A apparatus, using GenePix Pro 3.0 software. Fluorescence values, taken as the median of the intensity of all the pixels after subtracting the surrounding background, corresponded to the mean of three independent samples, each having four replicates for each gene.


   RESULTS AND DISCUSSION
 
Multivariate statistics: correspondence analysis
To examine the codon usage heterogeneity among S. pneumoniae genes, COA of the RSCU values of all ORFs in strain TIGR4 (2236) was performed. Scatter plots revealed a core region and two ascending horns, as reported previously for other eubacteria, such as E. coli (Médigue et al., 1991). The left horn was less dispersed than the right one (Fig. 1A). A total of 484 ORFs localized in the right horn (axis 1 values >0 and axis 2 values >0·02). Half of these genes (242 of 484) were pseudogenes (usually transposases), were shorter than 80 codons, or encoded unassigned hypothetical proteins (Fig. 1B). A total of 242 functional genes were present in the right horn (Fig. 1C), including several genes encoding phosphotransferase systems, restriction-modification systems, choline-binding proteins, competence proteins and most genes of the blp operon (related to toxin production). These genes associated with the right horn are potentially foreign genes acquired by horizontal transfer, which have not yet evolved a codon profile matched to the translation machinery of S. pneumoniae. They had a mean G+C content (33·4 %) lower than that of the coding sequences of the whole genome (39·7 %). Most of the PHE (see below) and RP genes were localized in the left horn (Fig. 1B, C), indicating that they share a similar codon bias that is rather different from the rest of the ORFs. In the first two COA axes, at least, no significant differences in codon usage were observed between coding sequences complementary to the leading strand (79 % of the genes) and those complementary to the lagging strand (21 % of the genes) (Fig. 1D).
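The two criteria used in this paragraph, the right-horn thresholds on the COA axes and the G+C content of a coding sequence, can be written as a short Python sketch. The function names are hypothetical, and the COA coordinates are assumed to come from a separate program such as GCUA; the sketch does not perform the correspondence analysis itself.

```python
def gc_content(seq):
    """Percentage of G and C among the unambiguous bases of a sequence."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def in_right_horn(axis1, axis2, axis1_min=0.0, axis2_min=0.02):
    """Right-horn criterion from the text: axis 1 > 0 and axis 2 > 0.02."""
    return axis1 > axis1_min and axis2 > axis2_min


def mean_gc(seqs):
    """Unweighted mean G+C content (%) over a set of coding sequences."""
    values = [gc_content(s) for s in seqs]
    return sum(values) / len(values)
```

Whether the reported means (33·4 % and 39·7 %) were weighted by gene length is not stated in the text; the unweighted mean here is an assumption.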



 
Fig. 1. (A) Plot of the first two axes generated by COA of RSCU values for 2236 ORFs of S. pneumoniae TIGR4. Gene function symbols in (B): *, RP genes; ▽, genes with less than 80 codons; •, degenerated and truncated genes. Symbols in (C): ○, PHE genes; □, pathogenicity genes; +, modification-restriction enzyme genes; ▼, two-component systems; ■, DNA transformation. Symbols in (D): ■, lagging-strand genes; dots, leading-strand genes.

 
Synonymous codon usage: CAI-value calculations
Given the similarity in the correspondence analysis plots for E. coli (Médigue et al., 1991) and S. pneumoniae (Fig. 1A), we assumed that highly expressed genes in S. pneumoniae would have a codon-usage bias which was positively correlated with the abundance of the isoacceptor tRNA levels, as occurs in E. coli (Ikemura, 1981). For the construction of a reference table of w values, 52 of the 56 RP genes were chosen, the four excluded RP genes (prmA, rpsN, sp0555 and sp0973) having a different codon profile (CAI<0·5). In accordance with the high A+T content (60 %) of the genome (Tettelin et al., 2001), A- and U-ending codons were favoured in both gene sets. There was also a selection for A- and C-ending codons in amino acids encoded by two codons (Phe, Tyr, His, Gln, Asn, Lys, Asp, Glu) or three codons (Ile). Considering the only type of tRNA detected for these nine amino acids (or the most represented in the case of Lys, Table 1), the results suggested a selection for a codon–anticodon interaction without wobble. While 21 out of 61 codons in the RP set of highly expressed genes had a codon usage bias and w values below 0·1 (10-fold less than the preferred isocodon), only one codon in the data for the whole genome set had a w value less than 0·1 (Table 1).


 
Table 1. w values for S. pneumoniae codons in the whole genome (GEN) and for ribosomal protein (RP) genes, together with the number of tRNA genes

Codon usage for the whole genome and tRNA gene data were downloaded from http://www.tigr.org.

 
The CAI algorithm was applied to 1802 non-RP full-length gene sequences from the TIGR4 strain, all with between 80 and 1500 codons (not including the stop codon). The distribution of CAI values (0·156–0·866) was unimodal, with the majority of genes (78·4 %) having CAI values between 0·200 and 0·400 (Fig. 2A), and the mean and median CAI values were 0·338 and 0·312, respectively. The CAI value was found to be independent of gene sequence length (r2=0·0003, Fig. 2B), suggesting that codon bias is not a major mechanism directed towards the efficient translation of long genes. Genes were classified into three groups with high (CAI>0·500), medium (0·500>CAI>0·250) and low (CAI<0·250) levels of predicted expression. The PHE genes represented 7·3 % (131 genes) of the total, a figure compatible with those found (4–10 %) in other eubacteria (Karlin & Mrazek, 2000). Predicted medium- and lowly-expressed genes represented 78·7 % (1419 genes) and 14·0 % (252 genes) of the total, respectively. The 131 PHE genes were grouped into functional classes and subclasses (Table 2). As in other fast-growing bacteria (Karlin et al., 2001), genes of glycolytic enzymes and translation elongation factors (Table 2, Fig. 2B) were among the 25 genes with the highest CAI values. Of the 10 most abundantly expressed proteins in Streptococcus mutans, which is phylogenetically close to S. pneumoniae (Wilkins et al., 2002), eight homologues are found in the group of the 25 genes with the highest CAI values in S. pneumoniae (Table 2), and the remaining two are encoded by RP genes.
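The three-way classification and the reported group sizes can be checked with a minimal Python sketch. The function names are hypothetical, and the handling of values falling exactly on a cut-off (0·250 or 0·500), which the text does not specify, is an assumption: both are placed in the medium group here.

```python
def predicted_expression_group(cai_value, high_cutoff=0.5, low_cutoff=0.25):
    """Assign a gene to the high (PHE), medium or low predicted-expression group."""
    if cai_value > high_cutoff:
        return "high"
    if cai_value < low_cutoff:
        return "low"
    return "medium"  # includes exact boundary values (assumption)


def percentage(part, total):
    """Share of a group within the analysed gene set, to one decimal place."""
    return round(100.0 * part / total, 1)


# Group sizes reported in the text for the 1802 analysed TIGR4 genes.
GROUP_SIZES = {"high": 131, "medium": 1419, "low": 252}
TOTAL_GENES = sum(GROUP_SIZES.values())
GROUP_SHARES = {name: percentage(n, TOTAL_GENES) for name, n in GROUP_SIZES.items()}
```

The three group sizes add up to the 1802 analysed genes, and the shares reproduce the percentages quoted above (7·3 %, 78·7 % and 14·0 %).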



 
Fig. 2. Distribution of CAI values (A), and relationship between CAI and gene length (B). Regression line in B illustrates the lack of association between CAI and gene length. Gene function symbols: △, glycolysis; x, elongation factors; ◊, initiation factors/aminoacyl tRNA synthetases/RNA polymerase subunits; +, chaperones; ▼, two-component systems; ■, DNA transformation. All other genes are indicated by dots.

 

 
Table 2. S. pneumoniae PHE genes listed by role and subrole

ABCT, ATP-binding cassette transporter; B/D, biosynthesis and degradation; DH, dehydrogenase; DP, diphosphate; MT, methyl-transferase; MTHPTG, methyltetrahydropteroyltriglutamate; NT, nucleotidyltransferase; P, phosphate; PEP, phosphoenol pyruvate; PMF, proton motive force; PTS, phosphotransferase system; TF, transferase. The 25 genes with the highest CAI values are asterisked.

 
PHE genes are expected to use a small number of different codons. This value, known as the Nc variable (Wright, 1990) can have values from 20 (when one codon is exclusively used for each amino acid) to 61 (when the use of alternative synonymous codons is equally likely). Analysis of the 1802 TIGR4 genes revealed Nc values ranging from 26 to 61. On average, genes with CAI>0·6 had Nc values 13 units lower than genes with CAI<0·210, and 9·5 units lower than genes of the whole genome set of the same length (data not shown).
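For illustration, Nc can be estimated with the following Python sketch of Wright's (1990) formula, Nc = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6, where Fk is the mean codon homozygosity of the amino acids with k synonymous codons. This is not the program used in the study: the function name is hypothetical, the standard genetic code is assumed, and the handling of sparse data (amino acids with non-positive homozygosity are skipped, and None is returned when a degeneracy class cannot be estimated) is a simplification.

```python
from collections import Counter

# Standard genetic code in TCAG order; "*" marks stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def effective_number_of_codons(seq):
    """Estimate Nc for one coding sequence, or None if it cannot be estimated."""
    seq = seq.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))

    families = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families.setdefault(aa, []).append(codon)

    # Homozygosity values grouped by degeneracy class (2, 3, 4 or 6 codons).
    f_by_class = {2: [], 3: [], 4: [], 6: []}
    for codons in families.values():
        k = len(codons)
        n = sum(counts[c] for c in codons)
        if k == 1 or n < 2:
            continue  # Met, Trp, or too few observations
        f = (n * sum((counts[c] / n) ** 2 for c in codons) - 1) / (n - 1)
        if f > 0:
            f_by_class[k].append(f)

    if any(not values for values in f_by_class.values()):
        return None  # a degeneracy class is missing (simplification)

    mean_f = {k: sum(values) / len(values) for k, values in f_by_class.items()}
    nc = 2 + 9 / mean_f[2] + 1 / mean_f[3] + 5 / mean_f[4] + 3 / mean_f[6]
    return min(nc, 61.0)  # values above 61 are capped at the theoretical maximum
```

A sequence that uses a single codon for every amino acid gives the minimum value of 20, and a long sequence using all synonymous codons equally reaches the cap of 61, matching the range described above.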

Comparison of CAI and microarray fluorescence values
As 85 % of the genes of the R6 and TIGR4 strains have a similarity above 90 %, and a good correlation (r2=0·99) of CAI values among their homologous genes was observed (data not shown), Cy3- (two replicates) and Cy5- (one replicate) labelled cDNA obtained from R6 grown to mid-exponential phase (OD620=0·25) was hybridized to the microarrays, as described in Methods, and the mean fluorescence measurements for each gene were used to estimate the relative mRNA transcript levels. Fluorescence was detected for 1513 homologues of R6 and TIGR4. Given the median (1675 FU, fluorescence units) of the fluorescence distribution, and the proportion (12·56 %, 190 of 1513) of genes with values higher than 6000 FU (Fig. 3A), that value was chosen as the cut-off to assign highly expressed genes. Among the 114 PHE genes (CAI>0·5), 32·5 % showed high (>6000 FU), 33·3 % medium (2000–6000 FU), and 34·2 % low (<2000 FU) relative levels of expression (Fig. 3B). Among the 25 genes with the highest CAI values (CAI>0·680), the majority (16 of 25, 64 %) gave high fluorescence values on the microarray, revealing a correlation between the levels of transcription and translation among a substantial proportion of highly expressed genes. A similar relationship has been recently observed in E. coli (dos Reis et al., 2003). An increase in the proportion of genes with fluorescence values above 6000 FU was observed in groups of genes with CAI values of 0·4 to 0·6 (21–25 %) compared to the genes with CAI values lower than 0·4 (4–10 %). The lowest median (949 FU) and the lowest percentage of genes over 6000 FU (4 %) corresponded to the group of genes with CAI values lower than 0·2. Therefore, despite the fact that it is widely accepted that low-abundance polypeptides do not necessarily have low CAI values, in our experiments there was also a relationship between CAI and FU in genes with low CAI values.
On the other hand, 10·4 % of non-PHE genes had high fluorescence values (>6000 FU), possibly reflecting the fact that these genes are upregulated under laboratory culture conditions. For instance, 55 % of the fatty-acid-metabolism genes (with medium or low CAI values) had values higher than 6000 units.
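The fluorescence classes used in this comparison can be expressed as a short Python sketch. The names are hypothetical; the cut-offs (2000 and 6000 FU) are those given in the text, while the assignment of values falling exactly on a cut-off to the medium class is an assumption.

```python
HIGH_FU = 6000.0    # cut-off for highly expressed genes
MEDIUM_FU = 2000.0  # lower bound of the medium class


def fluorescence_class(fu):
    """Classify a mean fluorescence value (FU) as high, medium or low."""
    if fu > HIGH_FU:
        return "high"
    if fu >= MEDIUM_FU:
        return "medium"  # exactly 2000 or 6000 FU counted as medium (assumption)
    return "low"


def percent_high(fu_values):
    """Percentage of genes in a group with fluorescence above the high cut-off."""
    if not fu_values:
        raise ValueError("empty group")
    n_high = sum(1 for fu in fu_values if fu > HIGH_FU)
    return 100.0 * n_high / len(fu_values)
```

Applied to the counts reported above, 190 of 1513 genes over 6000 FU corresponds to 12·56 %, and 16 of the 25 genes with the highest CAI values corresponds to 64 %.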



 
Fig. 3. (A) Distribution of microarray fluorescence values. (B) Median microarray fluorescence values (hatched bars, left-hand axis) and percentage of genes over 6000 FU (black bars, right-hand axis) in CAI groups.

 
Although a general relationship was observed between the CAI and microarray fluorescence value when all genes were considered (Fig. 4A), a low value of r2 (0·09) was obtained when both variables were compared. This low r2 value could be explained by the stability of the CAI value (due to a long-term optimization to the fluctuating environment in vivo) and the dynamic nature of the amount of mRNA (taken from laboratory cultures growing under defined conditions). However, there was a significant relationship for genes of glycolysis (r2=0·46), of fatty acid metabolism (r2=0·37), and proteases (r2=0·27). Genes were classified in four categories: genes with CAI and fluorescence values higher than 0·5 and 6000, respectively; genes with CAI higher than 0·5; genes with fluorescence values higher than 6000; and genes with CAI and fluorescence values lower than the cut-off points. Most genes (80·6 %, 1220 out of 1513) corresponded to the last category. In order to rule out any possible effects of variations in probe length on the selection of genes with a relatively high amount of mRNA transcripts, the data shown in Fig. 3 were recalculated using FU values corrected for probe length. This did not appreciably affect the profile of PHE genes, except in the case of the ribosomal genes, due to their very short probe length (data not shown). Furthermore, the use of these corrected fluorescence values did not generate any perceptible changes to Fig. 4.
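The r2 values and the four gene categories described above can be reproduced in outline with the following Python sketch. The function names are hypothetical; r2 is computed as the squared Pearson correlation of a simple linear regression, which is assumed to be the statistic reported, and the second and third categories are read as genes exceeding only one of the two cut-offs.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between paired values (e.g. CAI and FU)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long series with at least two points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a series with zero variance has no defined correlation")
    return sxy * sxy / (sxx * syy)


def expression_category(cai_value, fu, cai_cutoff=0.5, fu_cutoff=6000.0):
    """Place a gene in one of the four CAI/fluorescence categories."""
    high_cai = cai_value > cai_cutoff
    high_fu = fu > fu_cutoff
    if high_cai and high_fu:
        return "high CAI and high FU"
    if high_cai:
        return "high CAI only"
    if high_fu:
        return "high FU only"
    return "below both cut-offs"
```

As a check on the figures in the text, 1220 of 1513 genes below both cut-offs corresponds to 80·6 %.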



 
Fig. 4. Global comparison between CAI and fluorescence microarray values in the whole genome (A), low- and middle-expressed gene groups (B), and PHE gene groups (C). The line corresponds to the linear regression analysis for the whole genome and is also shown in (B) and (C). Gene function symbols: ▼, two-component systems; ■, DNA transformation; +, fatty acid metabolism; ○, proteases; △, glycolysis; x, elongation factors; ◊, aminoacyl-tRNA synthetases; *, RP genes.

 
Energetic metabolism
S. pneumoniae, which has an anaerobic metabolism, lacks the genes that encode functions of the tricarboxylic acid cycle. Therefore, energetic metabolism relies on glycolysis and fermentation. Accordingly, most genes of glycolytic enzymes and two enzymes of fermentative metabolism (ldh and pfl) were among the 25 genes with the highest CAI values (Fig. 2B, Table 2), and also had high fluorescence values (>5700 units). Likewise, genes for alternative fermentation pathways, such as sp0285 and sp2026, encoding alcohol dehydrogenases, and sp1161, encoding a subunit of the enzyme that converts pyruvate into acetyl-CoA, were also PHE (Table 2).

Some of the genes involved in the complex pneumococcal network of sugar conversions were also PHE, such as the gene of the enzyme that cleaves lactose (lacG), genes of enzymes that convert galactose into glycolytic intermediates (lacA, lacB and lacD), and malQ, which encodes an enzyme involved in the degradation of maltodextrins, the first digestion product of starch. S. pneumoniae would be able to obtain energy easily under starvation conditions from glycogen, since the genes of glycogen phosphorylase (sp2106) and phosphoglucomutase (pgm) were PHE (Table 2). Additionally, the PHE gene sp1804 (Table 2) shows high similarity (>70 %) with the Enterococcus hirae gls24 gene that encodes a stress protein playing an important role during glucose starvation (Giard et al., 2000). In addition to the glycolytic and the two fermentation enzymes described above, malQ, sp2106 and pgm also showed high mRNA amounts.

Transcription and protein synthesis
As expected, genes of translation elongation factors were PHE genes (Fig. 2B, Table 2), as well as others involved in translation and transcription, such as those of aminoacyl-tRNA synthetases and RNA polymerase subunits (Fig. 2B, Table 2). Most of these PHE genes also had high median fluorescence values in the microarray experiments. Aminoacyl-tRNA synthetases had a median FU value of 4762, whereas the value for RNA polymerase subunits was 12 506 FU. In contrast, the genes of proteolytic enzymes, although they generally had very high fluorescence values (median FU of 3853), were not PHE (Fig. 4B), suggesting that proteolysis is enhanced under the laboratory culture conditions. Among the genes encoding chaperones, only dnaK and tig showed both high CAI and fluorescence values, being the only chaperones included in the 25 genes with the highest CAI values.

On the other hand, most RP genes had fluorescence values higher than the genome median but lower than 6000 FU (median 3210), plotting in the lower-right quadrant of Fig. 4C, indicating that codon bias might be a more important factor than the amount of mRNA for the general abundance of RP proteins. Genes involved in amino-acid biosynthesis had quite homogeneous CAI values (generally <0·400), in accordance with the general tendency of the genome. However, much higher values were found in specific genes (ilvC, gdhA, metE, asd, cysM and glnA; Table 2), a feature that has been associated with control-pathway enzyme genes (Karlin & Mrazek, 2000). None of these genes showed high fluorescence values, which may be related to the abundance of casein-derived amino acids in THYE medium.

Transporters
S. pneumoniae has one of the highest proportions (30 %) of sugar transporter genes among the prokaryotic genomes (Tettelin et al., 2001), seeming to be highly adapted to compete for sugar nutrients with other respiratory tract micro-organisms. Several genes of sugar transporters and phosphotransferase systems were PHE (Table 2). However, under the rich and stable sugar environment of the THYE medium, only a few of these genes showed high mRNA amounts: ptsH, ptsI and sp0758 of the phosphotransferase system, and maltosaccharide transporter malX.

In addition, some genes for Fe and Mn transporters were also PHE (Table 2), possibly reflecting an adaptation to pathogenicity, given the vital importance of the acquisition of these elements inside the host (Jakubovics & Jenkinson, 2001). Among them, the psaABC operon encoding the Mn transporter also had relatively high levels of transcripts.

Oxidative metabolism
Genes involved in oxidant species detoxification and other redox reactions were PHE (trx, nox, sodA, fld and trxB; Table 2). Likewise, four of the genes classified as oxidoreductases in the unknown-specificity enzyme group (sp1325, sp1471, sp1472 and sp1588), and psaA, part of an Mn transporter involved in anti-oxidative defence, were also strongly PHE (Table 2). Taken together, these data suggest that defence against oxidative species is highly developed in S. pneumoniae, possibly as a consequence of its ability to colonize and persist in the nasopharynx, where partial oxygen pressure is high. Consistent with this hypothesis, nox, sodA and psaA, which are essential for infection (Auzat et al., 1999; Yesilkaya et al., 2000; Tseng et al., 2002) also appeared to be transcribed at high levels (11 200, 5630 and 12 987 FU, respectively). In spite of the anaerobic metabolism of S. pneumoniae, one of the highest CAI and fluorescence values (0·738 and 11 683 units, respectively) corresponded to the pyruvate oxidase gene, spxB, which is one of the more abundant polypeptides of the transparent variants of S. pneumoniae (Overweg et al., 2000). This enzyme is also essential for infection (Spellerberg et al., 1996), and produces, in the presence of oxygen, acetyl-phosphate and hydrogen peroxide. The latter is an important pneumococcal virulence factor (Duane et al., 2000), which additionally could cause an inhibitory effect on the growth of competitive microbes in the upper respiratory tract (Pericone et al., 2000).

Genes expressed at low levels
Low CAI values were calculated for genes with a putative regulatory function, which included 27 genes of two-component systems (TCS) (Fig. 2B) and 62 general regulators, with mean CAI values of 0·247 and 0·281, respectively. Additionally, low CAI values were also calculated for the 35 genes involved in prosthetic group/cofactor biosynthesis and the 19 genes of aromatic amino-acid biosynthesis with mean CAI values of 0·292 and 0·294, respectively. Some of these gene groups also had low median fluorescence values: regulators (914 FU, n=50), TCS (1399 FU, n=26) (Fig. 4B) and cofactor-vitamin biosynthesis (1542 FU, n=34).

Low CAI values were also calculated for 24 competence genes (mean CAI of 0·269; Fig. 2B), and most also had low fluorescence values (median 868 FU, n=21) (Fig. 4B). These genes localized in the central part of the COA plot, with the exception of comD, comE and comF, which localized in the right horn, and had G+C contents of 32·0 %, 30·7 % and 36·2 %, respectively. Consequently, they could be recently acquired genes. It is worth emphasizing that S. pneumoniae becomes naturally competent for only a few minutes, resulting in rapid changes in its protein profile (Morrison & Baker, 1979), and that constitutive activation of the competence regulon could be deleterious for the cell (Martin et al., 2000). Thus it is possible that the presence of rare codons in competence genes could be a mechanism that limits translation, thereby minimizing adverse physiological stresses prior to induction of competence-gene expression, as suggested in the case of some E. coli regulatory genes (Kronigsberg & Codson, 1983). In accordance with this hypothesis, other mechanisms negatively controlling expression of competence involve the cleavage of competence factors by the ClpP protease (Chastanet et al., 2001), and the action of the inhibitor of the competence-stimulator peptide (Berge et al., 2001). In contrast, the recA gene had a moderately high CAI (0·489) and a high fluorescence value (8286 FU), being the only competence gene that appears in the left horn of the COA plot, probably because it is involved in multiple cellular processes.

Virulence factors
Virulence factors include capsule and cell-wall biosynthesis enzymes, pneumolysin, autolysin, neuraminidase, IgA1 protease, and some surface proteins (Paton et al., 1993). Nearly all these genes had CAI values of 0·250 to 0·350, and could be considered medium-expressed genes. Nevertheless, psaA was PHE. Most genes of capsule biosynthesis, as well as nanB, pspA, iga, genes of choline-binding proteins (cbpC and cbpF), and lytB appear in the right horn (Fig. 1C) of the COA, suggesting a recent acquisition by horizontal transfer. In agreement with this idea, the G+C contents of the cps4EFGH capsular genes, nanB and pspA were 27·8 % to 33·5 %, 33·4 %, and 35·0 %, respectively, which is lower than that of the bulk of the genome coding sequences (39·7 %).

Apparently there are two mechanisms that determine the persistence/virulence of S. pneumoniae, operating on different time scales. One is the optimization of codon usage, as detected by CAI analysis for sugar-transporter and oxidative-metabolism genes, possibly reflecting a long-term progressive adaptation to persistence in carrier hosts. The other is the recent acquisition of new virulence factors by horizontal transfer, as detected by COA and G+C content.


   ACKNOWLEDGEMENTS
 
A. J. M.-G. gratefully acknowledges receipt of a fellowship from the Comunidad Autónoma de Madrid, Spain. This study was supported by grant BIO2002-01398 from the Ministerio de Ciencia y Tecnología. J. M. W. acknowledges financial support for the microarray construction from EC contract QLK2-CT-2000-00543. We wish to thank Karin Overweg and Mark Reuter for advice and discussion concerning the microarray work.


   REFERENCES
 
Andersson, S. G. E. & Sharp, P. M. (1996). Codon usage in the Mycobacterium tuberculosis complex. Microbiology 142, 915–925.

Auzat, I., Chapuy-Regaud, S., Le Bras, G., Dos Santos, D., Ogunniyi, A. D., Le Thomas, I., Garel, J. R., Paton, J. C. & Trombe, M. C. (1999). The NADH oxidase of Streptococcus pneumoniae: its involvement in competence and virulence. Mol Microbiol 34, 1018–1028.

Berge, M., Garcia, P., Iannelli, F., Prere, M. F., Granadel, C., Polissi, A. & Claverys, J. P. (2001). The puzzle of zmpB and extensive chain formation, autolysis defect and non-translocation of choline-binding proteins in Streptococcus pneumoniae. Mol Microbiol 39, 1651–1660.

Chastanet, A., Prudhomme, M., Claverys, J. P. & Msadek, T. (2001). Regulation of Streptococcus pneumoniae clp genes and their role in competence development and stress survival. J Bacteriol 183, 7295–7307.

Chavancy, G. & Garel, J. P. (1981). Does quantitative tRNA adaptation to codon content in mRNA optimize the ribosomal translation efficiency? Proposal for a translation system model. Biochimie 63, 187–195.

Dagkessamanskaia, A., Moscoso, M., Hénard, V., Guiral, S., Overweg, K., Reuter, M., Wells, J. M. & Claverys, J. P. (2004). Interconnection of competence, stress and CiaR regulons in Streptococcus pneumoniae: competence triggers stationary phase autolysis of ciaR mutant cells. Mol Microbiol 51, 1071–1086.

Dopazo, J., Mendoza, A., Herrero, J. & 13 other authors (2001). Annotated draft genomic sequence from a Streptococcus pneumoniae type 19F clinical isolate. Microb Drug Resist 7, 99–125.

Dos Reis, M., Wernisch, L. & Savva, R. (2003). Unexpected correlations between gene expression and codon usage bias from microarray data for the whole Escherichia coli K-12 genome. Nucleic Acids Res 31, 6976–6985.

Duane, P. G., Rubins, J. B., Weisel, H. R. & Janoff, E. N. (1993). Identification of hydrogen peroxide as a Streptococcus pneumoniae toxin for rat alveolar epithelial cells. Infect Immun 61, 4392–4397.

Giard, J. C., Rince, A., Capiaux, H., Auffray, Y. & Hartke, A. (2000). Inactivation of the stress- and starvation-inducible gls24 operon has a pleiotropic effect on cell morphology, stress sensitivity, and gene expression in Enterococcus faecalis. J Bacteriol 182, 4512–4520.

Grosjean, H., Sankoff, D., Jou, W. M., Fiers, W. & Cedergren, R. J. (1978). Bacteriophage MS2 RNA: a correlation between the stability of the codon : anticodon interaction and the choice of code words. J Mol Evol 12, 113–119.

Hoskins, J., Alborn, W. E., Arnold, J. & 37 other authors (2001). Genome of the bacterium Streptococcus pneumoniae strain R6. J Bacteriol 183, 5709–5717.

Ikemura, T. (1981). Correlation between the abundance of Escherichia coli transfer RNAs and the occurrence of the respective codons in its protein genes. J Mol Biol 146, 1–21.

Jakubovics, N. S. & Jenkinson, H. F. (2001). Out of the iron age: new insights into the critical role of manganese homeostasis in bacteria. Microbiology 147, 1709–1718.

Karlin, S. & Mrazek, J. (2000). Predicted highly expressed genes of diverse prokaryotic genomes. J Bacteriol 182, 5238–5250.

Karlin, S., Mrazek, J., Campbell, A. & Kaiser, D. (2001). Characterization of highly expressed genes of four fast-growing bacteria. J Bacteriol 183, 5025–5040.

Kronigsberg, W. & Codson, G. N. (1983). Evidence for use of rare codons in the dnaG gene and other regulatory genes of Escherichia coli. Proc Natl Acad Sci U S A 80, 687–691.

Kurland, C. G. (1991). Codon bias and gene expression. FEBS Lett 285, 165–169.

Martin, B., Prudhomme, M., Alloing, G., Granadel, C. & Claverys, J. P. (2000). Cross-regulation of competence pheromone production and export in the early control of transformation in Streptococcus pneumoniae. Mol Microbiol 38, 867–878.

McInerney, J. O. (1998). GCUA (General Codon usage Analysis). Bioinformatics 14, 372–373.

Médigue, C., Rouxel, T., Vigier, P., Hénaut, A. & Danchin, A. (1991). Evidence for horizontal gene transfer in Escherichia coli speciation. J Mol Biol 222, 851–856.

Morrison, D. A. & Baker, M. F. (1979). Competence for genetic transformation in pneumococcus depends on synthesis of a small set of proteins. Nature 282, 215–217.

Overweg, K., Pericone, C. D., Verhoef, G. G., Weiser, J. N., Meiring, H. D., De Jong, A. P., De Groot, R. & Hermans, P. W. (2000). Differential protein expression in phenotypic variants of Streptococcus pneumoniae. Infect Immun 68, 4604–4610.

Paton, J. C., Andrew, P. W., Boulnois, G. J. & Mitchell, T. J. (1993). Molecular analysis of the pathogenicity of Streptococcus pneumoniae: the role of pneumococcal proteins. Annu Rev Microbiol 47, 89–115.

Pericone, C. D., Overweg, K., Hermans, P. W. M. & Weiser, J. N. (2000). Inhibitory and bactericidal effects of hydrogen peroxide production by Streptococcus pneumoniae on other inhabitants of the upper respiratory tract. Infect Immun 68, 3990–3997.

Sharp, P. M. (1991). Determinants of DNA sequence divergence between Escherichia coli and Salmonella typhimurium: codon usage, map position, and concerted evolution. J Mol Evol 33, 23–33.

Sharp, P. M. & Li, W. H. (1987a). The codon adaptation index - a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res 15, 1281–1295.

Sharp, P. M. & Li, W. H. (1987b). The rate of synonymous substitution in enterobacterial genes is inversely related to codon usage bias. Mol Biol Evol 4, 222–230.

Spellerberg, B., Cundell, D. R., Sandros, J., Pearce, B. J., Idanpaan-Heikkila, I., Rosenow, C. & Masure, H. R. (1996). Pyruvate oxidase, as a determinant of virulence in Streptococcus pneumoniae. Mol Microbiol 19, 803–813.

Tettelin, H., Nelson, K. E., Paulsen, I. T. & 36 other authors (2001). Complete genome sequence of a virulent isolate of Streptococcus pneumoniae. Science 293, 498–506.

Tseng, H. J., McEwan, A. G., Paton, J. C. & Jennings, M. P. (2002). Virulence of Streptococcus pneumoniae: PsaA mutants are hypersensitive to oxidative stress. Infect Immun 70, 1635–1639.

Wilkins, J. C., Homer, K. A. & Beighton, D. (2002). Analysis of Streptococcus mutans proteins modulated by culture under acidic conditions. Appl Environ Microbiol 68, 2382–2390.

Wright, F. (1990). The ‘effective number of codons’ used in a gene. Gene 87, 23–29.

Yesilkaya, H., Kadioglu, A., Gingles, N., Alexander, J. E., Mitchell, T. J. & Andrew, P. W. (2000). Role of manganese-containing superoxide dismutase in oxidative stress and virulence of Streptococcus pneumoniae. Infect Immun 68, 2819–2826.

Received 13 February 2004; revised 26 April 2004; accepted 28 April 2004.



Copyright © 2004 Society for General Microbiology.