Predicted highly expressed genes in the genomes of Streptomyces coelicolor and Streptomyces avermitilis and the implications for their metabolism

Gang Wu1, David E. Culley2 and Weiwen Zhang2

1 Department of Biological Sciences, University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, MD 21250, USA
2 Microbiology Department, Pacific Northwest National Laboratory, 902 Battelle Boulevard, PO Box 999, Mail Stop P7-50, Richland, WA 99352, USA

Correspondence
Weiwen Zhang
Weiwen.Zhang{at}pnl.gov


   ABSTRACT
Highly expressed genes in bacteria often have a stronger codon bias than genes expressed at lower levels, due to translational selection. In this study, a comparative analysis of predicted highly expressed (PHX) genes in the Streptomyces coelicolor and Streptomyces avermitilis genomes was performed using the codon adaptation index (CAI) as a numerical estimator of gene expression level. Although it has been suggested that there is little heterogeneity in codon usage in G+C-rich bacteria, considerable heterogeneity was found among genes in these two G+C-rich Streptomyces genomes. Using ribosomal protein genes as references, ~10 % of the genes were predicted to be PHX genes using CAI cutoff values of greater than 0·78 and 0·75 in S. coelicolor and S. avermitilis, respectively. The PHX genes showed good agreement with the experimental data on expression levels obtained from proteomic analysis by previous workers. Among the 724 and 730 PHX genes identified from S. coelicolor and S. avermitilis, respectively, 368 are orthologous genes present in both genomes, most of which are ‘housekeeping’ genes involved in cell growth. In addition, 61 orthologous gene pairs with unknown functions were identified as PHX. Only one polyketide synthase gene from each Streptomyces genome was predicted as PHX. Nevertheless, several key genes responsible for producing precursors for secondary metabolites, such as crotonyl-CoA reductase and propionyl-CoA carboxylase, and genes necessary for initiation of secondary metabolism, such as adenosylmethionine synthetase, were among the PHX genes in the two Streptomyces species. The PHX genes exclusive to each genome, and what they imply regarding cellular metabolism, are also discussed.


Abbreviations: CAI, codon adaptation index; GC3s, G+C content at synonymously variable third positions of sense codons (except Met and Trp codons); Nc, effective number of codons; PHX, predicted highly expressed; SAV, S. avermitilis; SCO, S. coelicolor

The complete lists of all predicted highly expressed genes and their calculated CAI values for each Streptomyces genome, and the conserved PHX genes, are provided in Supplementary Tables 1, 2 and 3 with the online version of this paper.


   INTRODUCTION
The streptomycetes are among the most numerous and ubiquitous soil bacteria. In addition to their broad range of metabolic abilities, important for carbon recycling in soil environments, these Gram-positive bacteria are characterized by their complex morphological differentiation, resembling that of filamentous fungi (aerial mycelium), and the ability to produce a wide variety of secondary metabolites (Hopwood, 1999). The genomes of two species of streptomycetes, Streptomyces coelicolor and Streptomyces avermitilis, were recently sequenced (Bentley et al., 2002; Ikeda et al., 2003). Analysis of the genome sequences of these species revealed that both contain linear chromosomes with coding densities very similar to those of other bacteria. The large size (8–9 Mb) of Streptomyces genomes translates into a higher number of predicted genes (7825 for S. coelicolor and 7574 for S. avermitilis) than that of simple eukaryotes such as Saccharomyces cerevisiae (Bentley et al., 2002; Ikeda et al., 2003; Hopwood, 2003). Comparative analysis of the S. avermitilis and S. coelicolor genomes revealed a 6·5 Mb highly conserved internal core region where most essential genes are located, with similar order and direction in these two species. The chemically diverse antibiotics are typically synthesized at the late exponential to stationary growth phases in these species (Hopwood, 1999; Huang et al., 2001).

Most of what is currently known about gene expression in Streptomyces has been obtained through investigations of individual genes in specific pathways (Hopwood, 1999). The availability of the complete genomic sequences from these organisms has made it possible for researchers to develop approaches that focus on the systemic properties of regulatory and metabolic networks, and to investigate gene expression and regulation in the context of a global cellular network. In several recent studies whole-genome DNA microarray or proteomics technologies have been applied to the study of expression patterns of genes and proteins associated with primary and secondary metabolism in S. coelicolor (Huang et al., 2001; Hesketh et al., 2002). Similar approaches have also been used to identify regulatory networks governing the induction of heat-shock genes in S. coelicolor (Bucca et al., 2003).

Codon preferences vary considerably within and between organisms (Grantham et al., 1981; Sharp et al., 1988; Karlin et al., 1998). Across genomes, the G+C composition resulting from mutational bias has been hypothesized to determine the major trends in codon usage of high- or low-G+C organisms (Knight et al., 2001). Within a genome, codon bias tends to be much stronger in highly expressed genes than in genes expressed at lower levels (Sharp & Li, 1986, 1987; Lafay et al., 2000; dos Reis et al., 2003). Selection for translational efficiency and accuracy has been suggested to be responsible for the stronger codon bias in the highly expressed genes of Saccharomyces cerevisiae and Escherichia coli (Ikemura, 1981, 1982). To dissect the patterns and causality of codon usage, many indices have been proposed to measure the degree and direction of codon bias (Sharp & Li, 1987; Wright, 1990). Among these, the ‘codon adaptation index’ (CAI) was proposed as a measure of codon usage in a gene relative to that in a reference set of genes (Sharp & Li, 1987). This index has been shown to correlate better with mRNA expression levels than other codon usage indices, such as the frequency of optimal codons (Ikemura, 1985) or the effective number of codons (Wright, 1990; Friberg et al., 2004). Therefore, CAI has been widely applied to the prediction of highly expressed genes in various organisms (Pan et al., 1998; Coghlan & Wolfe, 2000; dos Reis et al., 2003; Martin-Galiano et al., 2004). However, it has been suggested that for the bacteria with high G+C content, codon usage may not correlate well with gene expression level because there is generally little heterogeneity in codon usage among genes in these species, and all genes feature a similar, extremely biased codon usage (Ohama et al., 1990; Lafay et al., 2000). 
With the complete genomes of two Streptomyces species available, we decided to revisit the topic with the objectives to: (1) analyse the heterogeneity of codon bias among genes in Streptomyces genomes; (2) determine the feasibility of applying the CAI to predict highly expressed genes in two Streptomyces species as an alternative to experimental approaches; and (3) comparatively analyse the genes from each genome that are predicted to be highly expressed and interpret what this implies regarding cellular metabolisms in Streptomyces.


   METHODS
Sequences.
Complete genome sequences of S. coelicolor and S. avermitilis, including their linear chromosomes and plasmids, were downloaded from the Comprehensive Microbial Resource of the Institute for Genomic Research (TIGR; http://www.tigr.org/tigr-scripts/CMR2/CMRHomePage.spl) (Bentley et al., 2002; Ikeda et al., 2003). The S. coelicolor genome includes 7513 ORFs in its chromosome and 384 ORFs in the SCP1 and SCP2 plasmids. The S. avermitilis genome contains 7576 ORFs in its chromosome and 96 ORFs in the SAP1 plasmid. The incomplete ORFs in both genomes were excluded in this study. Primary annotations done by the original researchers were also downloaded from TIGR (Bentley et al., 2002; Ikeda et al., 2003). The lists of genes conserved between the two Streptomyces genomes were retrieved from the Genome Project of Streptomyces avermitilis_AverGenome (http://avermitilis.ls.kitasato-u.ac.jp), which included 4864 orthologous genes with pair-wise comparison E-value <0·3 (Ikeda et al., 2003). The proteins detected by proteomic approaches, which represented the highly expressed genes under the tested conditions, were downloaded from StreptoBase Proteomics Database (http://dbkweb.ch.umist.ac.uk/StreptoBASE/s_coeli/referencegel/) (Hesketh et al., 2002). E. coli sequences (4211 ORFs) were downloaded from the ECOGENE database (http://bmb.med.miami.edu/EcoGene/EcoWeb/) (Rudd, 2000).

Analysis.
Three indices of codon usage bias were calculated for all genes in the two genomes. The first was the G+C content at synonymously variable third positions of sense codons except Met and Trp codons (GC3s), which can potentially vary from 0 to 1·0. The second was the ‘effective number of codons’ (Nc) used in a gene (Wright, 1990). This is a measure of general non-uniformity of codon usage within groups of codon synonyms, which can vary from 20 (in a gene with extreme bias, where only one codon is used for each amino acid) to 61 (random codon usage). These two indices were calculated with CodonW (http://codonw.sourceforge.net//). The third was the ‘codon adaptation index’ (CAI), which was calculated using CAI Calculator 2 (http://www.evolvingcode.net/codon/CalculateCAIs.php). The CAI value varies from 0 to 1·0 (Sharp & Li, 1987), with higher CAI values indicating that the gene of interest has a codon usage pattern more similar to that in the reference genes. Statistical analysis, including F-test of variance, t-test and Pearson correlation, was performed as described previously by Perriere & Thioulouse (2002).
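
The CAI calculation described above can be sketched in a few lines. The following Python is a minimal illustration of the Sharp & Li (1987) definition, in which the CAI of a gene is the geometric mean of the relative adaptiveness of its codons and the weights are derived from a reference set such as the ribosomal protein genes. It is not the code of CAI Calculator 2. The function names, the use of the standard genetic code and the pseudocount of one half for codons absent from the reference set are assumptions made for this sketch.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code (assumed here), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
# Met, Trp and stop codons offer no synonymous choice and are not scored.
EXCLUDED = {"M", "W", "*"}


def split_codons(seq):
    """Split an in-frame coding sequence into complete codons."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_seqs, pseudocount=0.5):
    """Weight of a codon = its count / count of the most used synonym."""
    counts = Counter()
    for seq in reference_seqs:
        counts.update(c for c in split_codons(seq) if c in CODON_TABLE)
    weights = {}
    for aa in set(AMINO) - EXCLUDED:
        family = [c for c, a in CODON_TABLE.items() if a == aa]
        # Assumption: unseen codons get a pseudocount so that log() is defined.
        fam = {c: counts[c] if counts[c] > 0 else pseudocount for c in family}
        top = max(fam.values())
        for c in family:
            weights[c] = fam[c] / top
    return weights


def cai(seq, weights):
    """Geometric mean of codon weights over the scored codons of one gene."""
    logs = [math.log(weights[c]) for c in split_codons(seq) if c in weights]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")


# Toy reference set (hypothetical sequence, for illustration only).
toy_weights = relative_adaptiveness(["CTGCTGCTGCTA"])
toy_cai = cai("CTGCTA", toy_weights)  # one preferred and one rarer Leu codon
```

In practice the reference sequences would be the ribosomal protein genes of the genome under study, and `cai` would be applied to every complete ORF.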


   RESULTS AND DISCUSSION
Heterogeneity in codon usage in two Streptomyces genomes
For most bacteria with a balanced AT/GC genomic base composition, there is considerable heterogeneity in codon usage patterns among genes. This heterogeneity is usually associated with gene expression level, such that highly expressed genes have much higher frequencies of codons that are translationally optimal (Ikemura, 1981, 1982; Lafay et al., 2000). However, it is still unknown whether a similar correlation between gene expression and codon usage also holds for organisms with extremely biased genomic base composition, such as the GC-rich Streptomyces genomes, although previous evidence of distinct codon usage in the highly expressed EF-Tu gene of S. coelicolor suggests that weak translational selection may also be operating in Streptomyces (Wright & Bibb, 1992).

A two-step approach was taken to determine whether codon heterogeneity exists among genes in Streptomyces. First, the G+C content at the third positions of codons (GC3s) and the effective number of codons (Nc) for all genes in both genomes were calculated. The results from Nc versus GC3s plots, which have been suggested to be an effective means to investigate the codon usage variations among genes in the same genome (Wright, 1990), showed that the Nc values of the genes range from 22 to 60 for both Streptomyces genomes (Fig. 1), suggesting that considerable heterogeneity is present in these GC-rich genomes. The genes encoding ribosomal proteins, which are expected to be expressed at high levels during rapid cell growth in Streptomyces (Blanco et al., 1994), were identified and are highlighted in the Nc plots. The clustering of most of the ribosomal protein genes of the two Streptomyces genomes at the low-Nc end of the plots is similar to that in the Nc versus GC3s plot of the E. coli genome (data not shown). The significantly stronger codon bias in the ribosomal protein genes (S. coelicolor: t=3·09, one-tailed P=0·0015; S. avermitilis: t=–5·94, one-tailed P=1·328×10⁻⁷) suggests that the codon usage in these highly expressed genes is a result of selection for translational efficiency as well.



Fig. 1. Effective number of codons used (Nc) in each gene plotted against the G+C content at synonymous third positions of codons (GC3s) for the S. coelicolor and S. avermitilis genomes. The continuous curve in each of the plots represents the null hypothesis that the GC bias at the synonymous site is due solely to mutation and not to selection (Wright, 1990). The ribosomal protein genes that are expected to be highly expressed are represented by black triangles.
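
The null-hypothesis curve in such a plot is commonly drawn from the approximation given by Wright (1990) for the expected Nc of a gene whose codon usage is shaped only by G+C content at synonymous third positions. The Python sketch below illustrates that formula together with a direct GC3s calculation. The function names are hypothetical, the standard genetic code is assumed, and the GC3s convention follows the definition in the Abbreviations (sense codons excluding Met and Trp), which may differ in detail from the CodonW implementation.

```python
from itertools import product

# Standard genetic code (assumed here), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def gc3s(seq):
    """Fraction of G or C at third positions of sense codons, excluding Met and Trp."""
    seq = seq.upper()
    thirds = [
        seq[i + 2]
        for i in range(0, len(seq) - len(seq) % 3, 3)
        # Unrecognized codons are treated like stops and skipped.
        if CODON_TABLE.get(seq[i:i + 3], "*") not in {"M", "W", "*"}
    ]
    if not thirds:
        return float("nan")
    return sum(base in "GC" for base in thirds) / len(thirds)


def expected_nc(s):
    """Wright (1990) approximation: Nc = 2 + s + 29 / (s^2 + (1 - s)^2), s = GC3s."""
    return 2 + s + 29 / (s ** 2 + (1 - s) ** 2)


# Toy sequence (hypothetical): Leu, Leu, Met (skipped), Ala.
toy_gc3s = gc3s("CTGCTAATGGCC")
```

Genes lying well below the curve at a given GC3s have a stronger codon bias than G+C content alone would predict.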

 
As a second step to determine codon heterogeneity between genes in these species, the CAI values were calculated for all genes in S. coelicolor, using the ribosomal protein genes as references, and plotted against GC3s. In this step, experimentally determined data on protein abundance were incorporated. In a previous study, approximately 10 % of the proteins from the S. coelicolor genome were detected using 2D gel electrophoresis and matrix-assisted laser desorption ionization time-of-flight (MALDI-TOF) mass spectrometry (Hesketh et al., 2002). The gene IDs of proteins detected by these proteomic approaches were downloaded and categorized based on the functional groups designated by TIGR. The proteins involved in primary metabolism, such as amino acid biosynthesis and energy metabolism, were identified and plotted as CAI versus GC3s along with all other genes in the S. coelicolor genome (Fig. 2a). Analysis of the frequency distribution of CAI values of the proteins involved in primary metabolism showed that most of them have a CAI >0·65 (Fig. 2b) and are clustered at the upper end with higher CAI and GC3s values, while the CAI values of all genes are distributed over a very wide range (from 0·118 to 0·95; Fig. 2a). This indicates a good correlation between the experimentally determined ‘highly expressed genes’ and the ‘highly expressed genes’ predicted using CAI values (Fig. 2). These analyses demonstrated that there is considerable heterogeneity in codon usage within these Streptomyces genomes, and that the CAI value could be a useful quantitative indicator for gene expression in Streptomyces.



Fig. 2. Agreement of the calculated CAI values with proteomics data experimentally determined in S. coelicolor. (a) Calculated CAI values for all coding genes in the S. coelicolor genome plotted against GC3s, as indicated by grey circles. Several major functional categories of the genes whose protein products were detected in a previous proteomic study (Hesketh et al., 2002) are indicated by different coloured symbols. The cluster including most of the proteins identified by the proteomics approach is ringed. (b) Frequency distribution of CAI values for the genes that are involved in primary metabolism (genes belonging to the functional categories indicated by colour in a) and whose protein products were detected in the previous proteomic study (Hesketh et al., 2002).

 
Identification of PHX genes in two Streptomyces genomes
The CAI values for all genes in both Streptomyces genomes were calculated, and their distributions are shown in Fig. 3. The CAI values range from 0·118 to 0·95, but the majority of genes have CAI values between 0·4 and 0·8. Only about 5 % showed CAI values greater than 0·8 (see Supplementary Tables 1 and 2 with the online version of this paper). The median CAI values for genes located on the chromosomes are 0·65 for S. coelicolor and 0·61 for S. avermitilis. No obvious correlation between CAI values and gene length was found (S. coelicolor, R=0·207; S. avermitilis, R=0·134), suggesting that codon bias is not likely to be the major mechanism determining the efficient translation of long genes in Streptomyces (Martin-Galiano et al., 2004). In addition, the genes located in plasmids were found to have relatively low median CAI values: 0·51 for genes carried on SCP1 and 0·61 for genes on SCP2 of S. coelicolor, and 0·54 for genes on SAP1 of S. avermitilis. The top 10 % of the genes, in terms of CAI value, were defined as the predicted highly expressed (PHX) genes. This corresponded to CAI cutoffs of 0·78 in S. coelicolor and 0·75 in S. avermitilis, and included 724 and 730 genes for S. coelicolor and S. avermitilis, respectively.
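
The top-10 % rule used above to define PHX genes amounts to ranking genes by CAI and reading off the value at the chosen rank. A minimal Python sketch follows. The function name, the input mapping and the handling of ties (ignored here) are assumptions made for illustration, not the authors' procedure.

```python
def phx_by_rank(cai_by_gene, top_fraction=0.10):
    """Return (cutoff, gene ids) for the top fraction of genes ranked by CAI."""
    ranked = sorted(cai_by_gene.items(), key=lambda kv: kv[1], reverse=True)
    n_top = max(1, round(len(ranked) * top_fraction))
    top = ranked[:n_top]
    cutoff = top[-1][1]  # lowest CAI value still counted as PHX
    return cutoff, [gene for gene, _ in top]


# Hypothetical CAI values for 20 genes, for illustration only.
toy_values = {f"gene{i:02d}": i / 20 for i in range(1, 21)}
toy_cutoff, toy_phx = phx_by_rank(toy_values)
```

Because the cutoff is defined by rank within each genome, it differs between genomes whose CAI distributions differ, as seen for the two species here.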



Fig. 3. Frequency distribution of CAI values for all coding genes in the genomes of S. coelicolor (grey bars) and S. avermitilis (black bars).

 
Functional analysis showed that in both genomes, the PHX genes involved in energy metabolism were the largest functional class, followed by genes involved in transport functions. In addition, the functional classes of amino acid biosynthesis, DNA metabolism, central intermediary metabolism, fatty acid and phospholipid metabolism, and purine, pyrimidine, nucleoside and nucleotide metabolism each also included 20–30 PHX genes. However, only 9 and 11 genes involved in regulatory functions and signal transduction were predicted as PHX genes in S. coelicolor and S. avermitilis, out of a total of 765 and 731 genes in these categories according to functional annotation by TIGR. Additionally, none of the genes responsible for biosynthesis of the red-pigmented tripyrrole undecylprodigiosin (Red), the lipopeptide calcium-dependent antibiotic (CDA) and the deep-blue-pigmented polyketide actinorhodin (Act) in S. coelicolor, and avermectins in S. avermitilis, were predicted as PHX (Supplementary Tables 1 and 2).

The top 20 PHX genes of S. coelicolor and S. avermitilis included five ribosomal protein genes and seven others: genes encoding translation elongation factor Ts; the 60 kDa chaperonin, involved in protein fate; ketol-acid reductoisomerase (ilvC), involved in the isoleucine and valine biosynthetic pathway (Cordes et al., 1992); aconitate hydratase, involved in the tricarboxylic acid cycle (Fisher & Magasanik, 1984); enolase and triosephosphate isomerase, involved in the glycolytic pathway (Leyva-Vazquez & Setlow, 1994); and serine hydroxymethyltransferase, which catalyses the reversible interconversion of serine and tetrahydrofolate to glycine and methylenetetrahydrofolate required for cytoplasmic one-carbon metabolism (Schirch et al., 1985). Among the top 20 PHX genes in S. coelicolor, 13 were also detected on 2D gels (Hesketh et al., 2002). In comparison with the list of top 20 PHX genes identified from the genomes of E. coli, Vibrio cholerae, Haemophilus influenzae and Bacillus subtilis (Karlin et al., 2001), the Streptomyces genomes shared the 60 kDa chaperonin, ketol-acid reductoisomerase and enolase, as well as five ribosomal proteins.

Comparison of the S. avermitilis and S. coelicolor genomes has previously revealed that a 6·5 Mb, highly conserved internal core region contains most of the ‘housekeeping’ genes (1·0–7·5 Mb for S. coelicolor and 2·0–8·5 Mb for S. avermitilis), while most of the laterally acquired genes are present in both arms outside the core region (Bentley et al., 2002; Ikeda et al., 2003). We have previously found that the majority of Streptomyces PPM-family protein phosphatases, whose origin involved lateral acquisition, are located outside the conserved core region (Shi & Zhang, 2004). Analysis of the distribution of PHX genes in the linear chromosomes of these two Streptomyces species showed a preferred location in the conserved cores, while only a few PHX genes were located in the arm regions (Fig. 4).



Fig. 4. Distribution of PHX genes across the linear chromosomes of S. coelicolor (grey bars) and S. avermitilis (black bars). The numbers represent the ratio of PHX genes versus the total number of genes in each 0·5 Mb chromosomal fragment. The 6·5 Mb conserved cores in each Streptomyces genome are indicated by the bars below the figure. The inverted S. avermitilis genome sequence was used for comparison.
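
The per-fragment ratio described in this caption can be computed by binning genes by chromosomal position. The Python sketch below assumes that each gene is represented by its start coordinate in base pairs and a PHX flag. The function name and input format are hypothetical.

```python
from collections import Counter


def phx_ratio_by_window(genes, window_bp=500_000):
    """Ratio of PHX genes to all genes in each fixed-size chromosomal window.

    genes: iterable of (start_bp, is_phx) tuples.
    Returns a dict mapping window index (0 = first window) to the ratio.
    """
    totals, phx = Counter(), Counter()
    for start_bp, is_phx in genes:
        window = start_bp // window_bp
        totals[window] += 1
        phx[window] += int(bool(is_phx))
    return {w: phx[w] / totals[w] for w in sorted(totals)}


# Hypothetical gene positions, for illustration only.
toy_genes = [(100, True), (200_000, False), (600_000, False), (700_000, False)]
toy_ratios = phx_ratio_by_window(toy_genes)
```

Windows that contain no genes are simply absent from the result, so a plotting step would need to fill them in explicitly.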

 
Orthologous PHX genes in two Streptomyces genomes
A total of 368 orthologous gene pairs were identified as PHX in both genomes (Ikeda et al., 2003), with 77 PHX genes involved in protein biosynthesis and fate, 67 PHX genes involved in energy metabolism and 18 PHX genes in amino acid biosynthesis (Supplementary Table 3). About 42 % of the total PHX gene products were previously detected on 2D gels, while none of the genes predicted to be expressed at lower levels were detectable (CAI value <0·5 in both genomes, ~20 % of total genes). For PHX genes in amino acid biosynthesis and purine and pyrimidine metabolism, more than 60 % of their products were detected on 2D gels (Supplementary Table 3).

Protein synthesis.
Twenty-two ribosomal protein genes were among the PHX genes in both Streptomyces genomes, most of them encoding proteins of the small ribosomal subunit (Fig. 1, Supplementary Table 3). Five genes encoding translation factors, including elongation factors P, G, Tu and Ts and peptide chain release factor I, were also PHX in both genomes. Among genes functioning in translation, amino-acyl tRNA synthetases are generally not identified as PHX genes in many bacteria (Mrazek et al., 2001; Karlin et al., 2001). However, our results showed that in Streptomyces a total of 16 genes encoding various amino-acyl tRNA synthetases with different substrate specificities are predicted as PHX genes.

Amino acid biosynthesis.
Eighteen genes involved in the biosynthesis of amino acids were predicted as PHX genes, including genes in the biosynthetic pathways of tryptophan, threonine, serine and branched chain amino acids (Table 1).


Table 1. Orthologous PHX pairs involved in amino acid metabolism

 
Argininosuccinase and argininosuccinate synthase are involved in the arginine biosynthetic pathway (Redshaw et al., 1979). An early study indicated that disruption of argininosuccinate synthase blocked the formation of aerial mycelium in several Streptomyces strains (Williams & Rogers, 1987). Methionine synthase is a vitamin B12-dependent enzyme that catalyses the final step in methionine biosynthesis, the conversion of homocysteine to methionine. A recent study showed that the disruption of the metH gene in S. coelicolor not only affected vegetative growth on minimal medium, but also blocked the conversion of aerial hyphae into chains of spores on rich medium (Gehring et al., 2004). Glutamine synthetase is responsible for the ATP-dependent synthesis of glutamine from ammonium and glutamate. Since the amide moiety of glutamine serves as the nitrogen donor for the synthesis of many primary and secondary metabolites, glutamine synthetase occupies a crucial position in nitrogen metabolism in S. coelicolor (Fisher & Wray, 1989).

Central intermediary metabolism.
Several PHX genes in this functional class are also involved in secondary metabolism in Streptomyces (Table 2). The roles of S-adenosylmethionine synthetase (SAM-s), an intracellular factor in both cellular differentiation and antibiotic production in Streptomyces species, were previously established by showing that the overexpression of the SAM-s gene in Streptomyces lividans TK23 inhibited sporulation and aerial mycelium formation, but enhanced the production of actinorhodin (Okamoto et al., 2003). In addition, the SAM-s gene was found to be highly expressed in the actinorhodin-overproducing S. coelicolor mutant KO-179 (Kim et al., 2003).


Table 2. Orthologous PHX pairs involved in central intermediary metabolism

 
In Streptomyces, methylmalonyl-CoA and ethylmalonyl-CoA are two of the most common chain extender units for the biosynthesis of many polyketide antibiotics, and several pathways for their generation have been proposed (Hopwood & Sherman, 1990; Zhang & Reynolds, 2001). Three generally accepted routes to methylmalonyl-CoA are: (i) the isomerization of succinyl-CoA, an intermediate in the TCA cycle, catalysed by the coenzyme B12-dependent methylmalonyl-CoA mutase; (ii) carboxylation of propionyl-CoA, catalysed by propionyl-CoA carboxylase; and (iii) a multistep conversion of acetoacetyl-CoA by enzymes including crotonyl-coenzyme A reductase and isobutyryl-CoA mutase (Birch et al., 1993; Zhang et al., 1999; Li et al., 2004). In addition, a meaA gene, which shares similarity with the large subunit of methylmalonyl-CoA mutase, is involved in an unknown pathway to methylmalonyl-CoA formation (Zhang & Reynolds, 2001). Crotonyl-coenzyme A reductase and isobutyryl-CoA mutase are also involved in two pathways for production of ethylmalonyl-CoA, from valine degradation or acetoacetyl-CoA, respectively (Zerbe-Burkhardt et al., 1998; Liu & Reynolds, 1999). Our analysis identified the key genes in all three pathways, encoding crotonyl-coenzyme A reductase, propionyl-CoA carboxylase and the methylmalonyl-CoA mutase homologue, as PHX genes in both Streptomyces genomes. This result, which initially appears to be inconsistent with our finding that almost no gene responsible for biosynthesis of secondary metabolites was PHX, can be explained by the fact that these genes, in addition to their functions in providing precursors for secondary metabolism, may also be involved in many other primary metabolic activities. For example, propionyl-CoA carboxylase may be involved in fatty acid metabolism (Rodriguez & Gramajo, 1999), and crotonyl-coenzyme A reductase has been suggested to be involved in a novel pathway for the growth of Streptomyces collinus on acetate (Han & Reynolds, 1997).

Energy metabolism.
The genes involved in energy metabolism can be divided into four groups: glycolysis, pyruvate metabolism, the pentose phosphate pathway and the TCA cycle (Table 3). The genes in glycolysis and pyruvate metabolism are predominantly PHX in most fast-growing bacteria (Karlin et al., 2001). This holds true for S. coelicolor and S. avermitilis as well, where almost all genes involved in glycolysis and pyruvate metabolism were PHX genes in both genomes. These included the genes for 6-phosphofructokinase, fructose-1,6-bisphosphate aldolase, triosephosphate isomerase, glyceraldehyde-3-phosphate dehydrogenase, phosphoglycerate kinase, phosphoglycerate mutase, enolase, pyruvate kinase, pyruvate dehydrogenase and dihydrolipoamide dehydrogenase. Most genes in the pentose phosphate pathway are not PHX in other fast-growing bacteria (Karlin et al., 2001). In agreement with this observation, only transketolase and transaldolase, the genes encoding the last two steps of the pentose phosphate pathway, were PHX in the Streptomyces species. In spite of the vital role the TCA cycle plays in energy metabolism, previous studies have shown that the TCA genes are generally not PHX in B. subtilis, H. influenzae and Synechocystis, and only the genes before the succinyl-CoA synthetase step are PHX in E. coli (Karlin et al., 2001; Mrazek et al., 2001). However, in S. coelicolor and S. avermitilis all of the genes in the TCA cycle were predicted to be PHX genes, including citrate synthase, aconitate hydratase, isocitrate dehydrogenase, 2-oxoglutarate dehydrogenase, succinyl-CoA synthetase, succinate dehydrogenase, fumarate hydratase and malate dehydrogenase (in order of their action in the TCA cycle) (Table 3). 
One possible reason for the high proportion of PHX genes among the TCA cycle genes of Streptomyces may be that members of this genus depend on the TCA cycle not only for ATP production, but also as a major source of carbon chain precursors to various primary and secondary metabolites, such as methylmalonyl-CoA (Karlin et al., 2001; Zhang & Reynolds, 2001). Several genes encoding cytochromes, electron-transfer flavoproteins and ATP synthase are also among the PHX genes in the two Streptomyces species.


Table 3. Orthologous PHX pairs involved in energy metabolism

 
Fatty acid metabolism.
Fatty acid metabolism in Streptomyces is crucial because not only does it provide various fatty acids and phospholipids necessary for cell growth, but it also serves as a source of precursors for biosynthesis of secondary metabolites (Hopwood, 1993). The PHX genes involved in fatty acid biosynthesis in S. coelicolor and S. avermitilis included two genes encoding 3-oxoacyl-ACP reductases (FabG and FabG5), genes for enoyl-(ACP) reductases, acyl carrier protein (ACP) and fatty acid biosynthesis condensing enzyme (FabB). In addition, three genes involved in oxidation of fatty acids and two genes involved in activation of free fatty acids were identified as PHX in the Streptomyces genomes (Supplementary Table 3).

PHX genes with unknown function.
Sixty-one orthologous PHX gene pairs from the two Streptomyces genomes were annotated as hypothetical proteins or enzymes without known substrate specificity; however, one-third of them were detected in previous 2D gel experiments with S. coelicolor, suggesting that they may play important roles in cellular metabolism (Hesketh et al., 2002) (Supplementary Table 3).

PHX genes exclusive to each Streptomyces genome
A total of 356 genes were identified as PHX exclusively in S. coelicolor, and 362 genes as PHX exclusively in S. avermitilis. They were divided into two categories. The first category included genes that have an orthologue in the other genome but are PHX in only one of the two genomes: 252 PHX genes in S. coelicolor and 225 PHX genes in S. avermitilis belong to this category. The second category included PHX genes without any orthologue in the other genome: 103 PHX genes in S. coelicolor and 137 PHX genes in S. avermitilis belong to this category. In this paper, we will focus only on the analysis of the second category. The functionally known PHX genes exclusive to each genome are given in Table 4 and Table 5. Several interesting observations are discussed below.
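
The categories described above follow from simple set operations on the two PHX lists and the orthologue table. The Python sketch below is an illustration under the assumption of one-to-one orthologue pairs. The function name, the dictionary keys and the toy gene identifiers are hypothetical.

```python
def partition_phx(phx_a, phx_b, orthologue_pairs):
    """Split the PHX genes of genomes A and B by orthology and PHX status.

    orthologue_pairs: iterable of (gene_in_A, gene_in_B), assumed one-to-one.
    """
    phx_a, phx_b = set(phx_a), set(phx_b)
    a_to_b = dict(orthologue_pairs)
    b_to_a = {b: a for a, b in a_to_b.items()}
    return {
        # Orthologous pairs that are PHX in both genomes.
        "shared_pairs": {(a, a_to_b[a]) for a in phx_a if a_to_b.get(a) in phx_b},
        # First category: an orthologue exists but is not PHX in the other genome.
        "a_orthologue_not_phx": {
            a for a in phx_a if a in a_to_b and a_to_b[a] not in phx_b
        },
        "b_orthologue_not_phx": {
            b for b in phx_b if b in b_to_a and b_to_a[b] not in phx_a
        },
        # Second category: no orthologue in the other genome.
        "a_no_orthologue": {a for a in phx_a if a not in a_to_b},
        "b_no_orthologue": {b for b in phx_b if b not in b_to_a},
    }


# Hypothetical identifiers, for illustration only.
toy_split = partition_phx(
    phx_a={"A1", "A2", "A3"},
    phx_b={"B1", "B4"},
    orthologue_pairs=[("A1", "B1"), ("A2", "B2")],
)
```

For each genome, the shared pairs and the two exclusive categories together account for every PHX gene exactly once.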


Table 4. PHX genes exclusive to S. coelicolor

 

Table 5. PHX genes exclusive to S. avermitilis

 
Extracellular enzymes with glucanase and chitinase activities are important for obtaining energy from the hydrolysis of glucans that are widely present in soil. In addition, they are an important component of the antagonism of Streptomyces species against fungi in natural environments. Different genes involved in these activities were predicted to be highly expressed in S. coelicolor and S. avermitilis. Two genes encoding glucanases, SAV2109 for a β-glucanase and SAV2568 for an endoglucanase, were predicted as PHX in S. avermitilis, while SCO7263 encoding chitinase and SCO7575 encoding laminarinase were PHX in S. coelicolor.

Only one polyketide synthase gene from each Streptomyces genome was predicted as PHX (SCO6431 and SAV3649), which is consistent with the fact that most secondary metabolites are synthesized after growth slows down. In addition, SAV3159, encoding a non-ribosomal peptide synthetase, was identified as PHX in S. avermitilis.

While different Streptomyces strains may share the conserved genes for certain key ‘housekeeping’ functions, the regulatory systems that control the expression of these conserved genes may be different (Chater & Horinouchi, 2003). It is thus worth noting that several genes involved in regulatory function were predicted to be highly expressed exclusively in one or the other of the two Streptomyces genomes. Two regulatory genes were identified as PHX in S. coelicolor. SCO4008 encodes a putative TetR family regulatory protein and SCO5338 a putative regulatory protein with 83 % identity to regulatory protein Pra in Streptomyces ambofaciens, which has been suggested as an activator of replication, integration and excision of the site-specific integrative element pSAM2 (Sezonov et al., 1998). Four regulatory genes were identified as PHX in S. avermitilis. SAV7267 encodes a protein with 60 % identity to a MalR repressor protein regulating maltose metabolism of S. coelicolor. SAV3638 is a syrP-like gene encoding a regulatory protein that participates in a phosphorylation cascade controlling syringomycin production and virulence in Pseudomonas syringae pv. syringae (Zhang et al., 1997); the gene is located in a non-ribosomal peptide synthetase gene cluster (nrps2 gene cluster). In addition, SAV1195 encodes a putative RNA polymerase ECF-subfamily sigma factor, and SAV1199 encodes an AraC-type transcriptional regulator.

Conclusions
Although the concept of predicting gene expression from codon usage bias was proposed decades ago (Sharp & Li, 1986, 1987), only recently have these methods been successfully applied to the identification of highly expressed genes in various bacteria and eukaryotic organisms (Karlin & Mrazek, 2000; Karlin et al., 2001; Mrazek et al., 2001; Martin-Galiano et al., 2004). One reason for the earlier lack of success is that the approach was proposed before whole-genome sequences were available and was based on analyses of small sets of genes. With such limited datasets, it was difficult to evaluate the correlation between codon usage and expression potential in a global context. However, with recent progress in whole-genome analysis technologies, such as DNA microarrays and proteomics, it is now possible to compare predictions based on codon usage with experimental data on protein and mRNA expression in a more quantitative way (dos Reis et al., 2003; Jansen et al., 2003; Friberg et al., 2004).

In this study, various approaches to estimating gene expression levels from codon usage were applied to two industrially important Streptomyces strains, with the objective of testing this alternative method of studying whole-genome gene expression. Our results demonstrated significant heterogeneity in codon usage among genes in the two Streptomyces genomes. Furthermore, the gene expression levels predicted using the quantitative measure CAI were found to correlate well with the highly abundant proteins detected by a 2D gel proteomics approach (Hesketh et al., 2002). In addition, since the expression levels measured by current DNA microarray and proteomics technologies represent the accumulated results of expression and degradation, the results from this computational approach could be used as reference data for calibrating and better interpreting experimental data. For example, observation of low levels of expression in proteomic or microarray data for a gene with a high predicted expression level might suggest the involvement of degradation in regulating the expression level of that gene. Although most of the predicted PHX genes are ‘housekeeping’ genes, the study also identified a number of genes of unknown function as PHX based on their codon profile (Supplementary Tables 1, 2 and 3). Further investigation of these genes by an integrated computational and experimental approach will enhance our knowledge of the metabolism of Streptomyces species.
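As an illustration of the quantitative measure referred to above, the following is a minimal sketch of a CAI calculation in the sense of Sharp & Li (1987): each codon receives a relative adaptiveness w (its count in a reference set of highly expressed genes divided by the count of the most frequent synonymous codon), and the CAI of a gene is the geometric mean of w over its codons. The three-amino-acid codon table, the function names, the toy reference sequence and the pseudocount of 0.5 for unobserved codons are assumptions made for illustration and are not taken from the analysis reported here; the default cutoff corresponds to the value of 0·78 used for S. coelicolor.

```python
import math
from collections import Counter

# Partial synonymous-codon table, for illustration only. A real analysis
# would cover every degenerate amino acid of the standard genetic code.
SYNONYMS = {
    "Ala": ["GCT", "GCC", "GCA", "GCG"],
    "Gly": ["GGT", "GGC", "GGA", "GGG"],
    "Lys": ["AAA", "AAG"],
}
CODON_TO_AA = {codon: aa for aa, group in SYNONYMS.items() for codon in group}


def codons(seq):
    """Split an in-frame coding sequence into codons, dropping any partial codon."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def relative_adaptiveness(reference_genes, pseudocount=0.5):
    """Compute w for each codon from a reference set (e.g. ribosomal protein genes).

    w = count(codon) / count(most frequent synonymous codon).
    Unobserved codons are given a small pseudocount (an assumption) so that
    a single rare codon does not force the CAI of a gene to zero.
    """
    counts = Counter(
        c for gene in reference_genes for c in codons(gene) if c in CODON_TO_AA
    )
    w = {}
    for group in SYNONYMS.values():
        best = max(max(counts[c] for c in group), pseudocount)
        for c in group:
            w[c] = max(counts[c], pseudocount) / best
    return w


def cai(gene, w):
    """Geometric mean of w over the scorable codons of a gene."""
    logs = [math.log(w[c]) for c in codons(gene) if c in w]
    if not logs:
        raise ValueError("gene contains no scorable codons")
    return math.exp(sum(logs) / len(logs))


def is_phx(gene, w, cutoff=0.78):
    """Label a gene as predicted highly expressed if its CAI exceeds the cutoff."""
    return cai(gene, w) > cutoff


# Hypothetical usage with a toy reference sequence (not real genome data).
w = relative_adaptiveness(["GCCGCCGCGGGCGGCAAGAAG"])
print(cai("GCCGGCAAG", w), is_phx("GCCGGCAAG", w))
```

In practice the reference set would be the ribosomal protein genes of the genome under study, and the cutoff would be chosen per genome, as was done for the two Streptomyces species.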


   ACKNOWLEDGEMENTS
 
We would like to thank Dr Liang Shi of Pacific Northwest National Laboratory for his critical reading of this manuscript, and Dr Lei Nie of the Department of Mathematics and Statistics, University of Maryland-Baltimore County, for his help with statistical analysis. Pacific Northwest National Laboratory is operated by Battelle Memorial Institute for the US Department of Energy through contract DE-AC06-76RLO 1830.


   REFERENCES
Bentley, S. D., Chater, K. F., Cerdeno-Tarraga, A. M. & 40 other authors (2002). Complete genome sequence of the model actinomycete Streptomyces coelicolor A3(2). Nature 417, 141–147.

Birch, A., Leiser, A. & Robinson, J. A. (1993). Cloning, sequencing, and expression of the gene encoding methylmalonyl-coenzyme A mutase from Streptomyces cinnamonensis. J Bacteriol 175, 3511–3519.

Blanco, G., Rodicio, M. R., Puglia, A. M., Mendez, C., Thompson, C. J. & Salas, J. A. (1994). Synthesis of ribosomal proteins during growth of Streptomyces coelicolor. Mol Microbiol 12, 375–385.

Bucca, G., Brassington, A. M., Hotchkiss, G., Mersinias, V. & Smith, C. P. (2003). Negative feedback regulation of dnaK, clpB and lon expression by the DnaK chaperone machine in Streptomyces coelicolor, identified by transcriptome and in vivo DnaK-depletion analysis. Mol Microbiol 50, 153–166.

Chater, K. F. & Horinouchi, S. (2003). Signalling early developmental events in two highly diverged Streptomyces species. Mol Microbiol 48, 9–15.

Coghlan, A. & Wolfe, K. H. (2000). Relationship of codon bias to mRNA concentration and protein length in Saccharomyces cerevisiae. Yeast 16, 1131–1145.

Cordes, C., Mockel, B., Eggeling, L. & Sahm, H. (1992). Cloning, organization and functional analysis of ilvA, ilvB and ilvC genes from Corynebacterium glutamicum. Gene 112, 113–116.

Dos Reis, M., Wernisch, L. & Savva, R. (2003). Unexpected correlations between gene expression and codon usage bias from microarray data for the whole Escherichia coli K-12 genome. Nucleic Acids Res 31, 6976–6985.

Fisher, S. H. & Magasanik, B. (1984). 2-Ketoglutarate and the regulation of aconitase and histidase formation in Bacillus subtilis. J Bacteriol 158, 379–382.

Fisher, S. H. & Wray, L. V., Jr (1989). Regulation of glutamine synthetase in Streptomyces coelicolor. J Bacteriol 171, 2378–2383.

Friberg, M., von Rohr, P. & Gonnet, G. (2004). Limitations of codon adaptation index and other coding DNA-based features for prediction of protein expression in Saccharomyces cerevisiae. Yeast 21, 1083–1093.

Gehring, A. M., Wang, S. T., Kearns, D. B., Storer, N. Y. & Losick, R. (2004). Novel genes that influence development in Streptomyces coelicolor. J Bacteriol 186, 3570–3577.

Grantham, R., Gautier, C., Gouy, M., Jacobzone, M. & Mercier, R. (1981). Codon catalog usage is a genome strategy modulated for gene expressivity. Nucleic Acids Res 9, r43–r74.

Han, L. & Reynolds, K. A. (1997). A novel alternate anaplerotic pathway to the glyoxylate cycle in streptomycetes. J Bacteriol 179, 5157–5164.

Hesketh, A. R., Chandra, G., Shaw, A. D., Rowland, J. J., Kell, D. B., Bibb, M. J. & Chater, K. F. (2002). Primary and secondary metabolism, and post-translational protein modifications, as portrayed by proteomic analysis of Streptomyces coelicolor. Mol Microbiol 46, 917–932.

Hopwood, D. A. (1993). Genetic engineering of Streptomyces to create hybrid antibiotics. Curr Opin Biotechnol 4, 531–537.

Hopwood, D. A. (1999). Forty years of genetics with Streptomyces: from in vivo through in vitro to in silico. Microbiology 145, 2183–2202.

Hopwood, D. A. (2003). The Streptomyces genome – be prepared! Nat Biotechnol 21, 505–506.

Hopwood, D. A. & Sherman, D. H. (1990). Molecular genetics of polyketides and its comparison to fatty acid biosynthesis. Annu Rev Genet 24, 37–66.

Huang, J., Lih, C. J., Pan, K. H. & Cohen, S. N. (2001). Global analysis of growth phase responsive gene expression and regulation of antibiotic biosynthetic pathways in Streptomyces coelicolor using DNA microarrays. Genes Dev 15, 3183–3192.

Ikeda, H., Ishikawa, J., Hanamoto, K. & 7 other authors (2003). Complete genome sequence and comparative analysis of the industrial microorganism Streptomyces avermitilis. Nat Biotechnol 21, 526–531.

Ikemura, T. (1981). Correlation between the abundance of Escherichia coli transfer RNAs and the occurrence of the respective codons in its protein genes: a proposal for a synonymous codon choice that is optimal for the E. coli translational system. J Mol Biol 151, 389–409.

Ikemura, T. (1982). Correlation between the abundance of yeast transfer RNAs and the occurrence of the respective codons in protein genes. Differences in synonymous codon choice patterns of yeast and Escherichia coli with reference to the abundance of isoaccepting transfer RNAs. J Mol Biol 158, 573–597.

Ikemura, T. (1985). Codon usage and tRNA content in unicellular and multicellular organisms. Mol Biol Evol 2, 13–34.

Jansen, R., Bussemaker, H. J. & Gerstein, M. (2003). Revisiting the codon adaptation index from a whole-genome perspective: analyzing the relationship between gene expression and codon occurrence in yeast using a variety of models. Nucleic Acids Res 31, 2242–2251.

Karlin, S. & Mrazek, J. (2000). Predicted highly expressed genes of diverse prokaryotic genomes. J Bacteriol 182, 5238–5250.

Karlin, S., Campbell, A. M. & Mrazek, J. (1998). Comparative DNA analysis across diverse genomes. Annu Rev Genet 32, 185–225.

Karlin, S., Mrazek, J., Campbell, A. & Kaiser, D. (2001). Characterizations of highly expressed genes of four fast-growing bacteria. J Bacteriol 183, 5025–5040.

Kim, D. J., Huh, J. H., Yang, Y. Y., Kang, C. M., Lee, I. H., Hyun, C. G., Hong, S. K. & Suh, J. W. (2003). Accumulation of S-adenosyl-L-methionine enhances production of actinorhodin but inhibits sporulation in Streptomyces lividans TK23. J Bacteriol 185, 592–600.

Knight, R. D., Freeland, S. J. & Landweber, L. F. (2001). A simple model based on mutation and selection explains trends in codon and amino-acid usage and GC composition within and across genomes. Genome Biol 2, research0010.1–0010.13; doi:10.1186/gb-2001-2-4-research0010.

Lafay, B., Atherton, J. C. & Sharp, P. M. (2000). Absence of translationally selected synonymous codon usage bias in Helicobacter pylori. Microbiology 146, 851–860.

Leyva-Vazquez, M. A. & Setlow, P. (1994). Cloning and nucleotide sequences of the genes encoding triose phosphate isomerase, phosphoglycerate mutase, and enolase from Bacillus subtilis. J Bacteriol 176, 3903–3910.

Li, C., Florova, G., Akopiants, K. & Reynolds, K. A. (2004). Crotonyl-coenzyme A reductase provides methylmalonyl-CoA precursors for monensin biosynthesis by Streptomyces cinnamonensis in an oil-based extended fermentation. Microbiology 150, 3463–3472.

Liu, H. & Reynolds, K. A. (1999). Role of crotonyl coenzyme A reductase in determining the ratio of polyketides monensin A and monensin B produced by Streptomyces cinnamonensis. J Bacteriol 181, 6806–6813.

Martin-Galiano, A. J., Wells, J. M. & de la Campa, A. G. (2004). Relationship between codon biased genes, microarray expression values and physiological characteristics of Streptococcus pneumoniae. Microbiology 150, 2313–2325.

Mrazek, J., Bhaya, D., Grossman, A. R. & Karlin, S. (2001). Highly expressed and alien genes of the Synechocystis genome. Nucleic Acids Res 29, 1590–1601.

Ohama, T., Muto, A. & Osawa, S. (1990). Role of GC-biased mutation pressure on synonymous codon choice in Micrococcus luteus, a bacterium with a high genomic GC-content. Nucleic Acids Res 18, 1565–1569.

Okamoto, S., Lezhava, A., Hosaka, T., Okamoto-Hosoya, Y. & Ochi, K. (2003). Enhanced expression of S-adenosylmethionine synthetase causes overproduction of actinorhodin in Streptomyces coelicolor A3(2). J Bacteriol 185, 601–609.

Pan, A., Dutta, C. & Das, J. (1998). Codon usage in highly expressed genes of Haemophillus influenzae and Mycobacterium tuberculosis: translational selection versus mutational bias. Gene 215, 405–413.

Perriere, G. & Thioulouse, J. (2002). Use and misuse of correspondence analysis in codon usage studies. Nucleic Acids Res 30, 4548–4555.

Redshaw, P. A., McCann, P. A., Pentella, M. A. & Pogell, B. M. (1979). Simultaneous loss of multiple differentiated functions in aerial mycelium-negative isolates of streptomycetes. J Bacteriol 137, 891–899.

Rodriguez, E. & Gramajo, H. (1999). Genetic and biochemical characterization of the alpha and beta components of a propionyl-CoA carboxylase complex of Streptomyces coelicolor A3(2). Microbiology 145, 3109–3119.

Rudd, K. E. (2000). ECOGENE: a genome sequence database for Escherichia coli K-12. Nucleic Acids Res 28, 60–64.

Schirch, V., Hopkins, S., Villar, E. & Angelaccio, S. (1985). Serine hydroxymethyltransferase from Escherichia coli: purification and properties. J Bacteriol 163, 1–7.

Sezonov, G., Duchene, A. M., Friedmann, A., Guerineau, M. & Pernodet, J. L. (1998). Replicase, excisionase, and integrase genes of the Streptomyces element pSAM2 constitute an operon positively regulated by the pra gene. J Bacteriol 180, 3056–3061.

Sharp, P. M. & Li, W. H. (1986). An evolutionary perspective on synonymous codon usage in unicellular organisms. J Mol Evol 24, 28–38.

Sharp, P. M. & Li, W. H. (1987). The codon adaptation index – a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res 15, 1281–1295.

Sharp, P. M., Cowe, E., Higgins, D. G., Shields, D. C., Wolfe, K. H. & Wright, F. (1988). Codon usage patterns in Escherichia coli, Bacillus subtilis, Saccharomyces cerevisiae, Schizosaccharomyces pombe, Drosophila melanogaster and Homo sapiens; a review of the considerable within-species diversity. Nucleic Acids Res 16, 8207–8211.

Shi, L. & Zhang, W. (2004). Comparative analysis of eukaryotic-type protein phosphatases in two streptomycete genomes. Microbiology 150, 2247–2256.

Williams, M. G. & Rogers, P. (1987). Expression of arg genes of Escherichia coli during arginine limitation dependent upon stringent control of translation. J Bacteriol 169, 1644–1650.

Wright, F. (1990). The ‘effective number of codons’ used in a gene. Gene 87, 23–29.

Wright, F. & Bibb, M. J. (1992). Codon usage in the G+C-rich Streptomyces genome. Gene 113, 55–65.

Zerbe-Burkhardt, K., Ratnatilleke, A., Philippon, N., Birch, A., Leiser, A., Vrijbloed, J. W., Hess, D., Hunziker, P. & Robinson, J. A. (1998). Cloning, sequencing, expression, and insertional inactivation of the gene for the large subunit of the coenzyme B12-dependent isobutyryl-CoA mutase from Streptomyces cinnamonensis. J Biol Chem 273, 6508–6517.

Zhang, J. H., Quigley, N. B. & Gross, D. C. (1997). Analysis of the syrP gene, which regulates syringomycin synthesis by Pseudomonas syringae pv. syringae. Appl Environ Microbiol 63, 2771–2778.

Zhang, W. & Reynolds, K. A. (2001). MeaA, a putative coenzyme B12-dependent mutase, provides methylmalonyl coenzyme A for monensin biosynthesis in Streptomyces cinnamonensis. J Bacteriol 183, 2071–2080.

Zhang, W., Yang, L., Jiang, W., Zhao, G., Yang, Y. & Chiao, J. (1999). Molecular analysis and heterologous expression of the gene encoding methylmalonyl-coenzyme A mutase from rifamycin SV-producing strain Amycolatopsis mediterranei U32. Appl Biochem Biotechnol 82, 209–225.

Zhang, Y. X., Denoya, C. D., Skinner, D. D. & 7 other authors (1999). Genes encoding acyl-CoA dehydrogenase (AcdH) homologues from Streptomyces coelicolor and Streptomyces avermitilis provide insights into the metabolism of small branched-chain fatty acids and macrolide antibiotic production. Microbiology 145, 2323–2334.

Received 14 December 2004; revised 21 February 2005; accepted 1 April 2005.



Copyright © 2005 Society for General Microbiology.