Only a Small Subset of the Horizontally Transferred Chromosomal Genes in Escherichia coli Are Translated into Proteins*,S
Masato Taoka, Yoshio Yamauchi, Takashi Shinkawa, Hiroyuki Kaji, Wakana Motohashi, Hiroshi Nakayama¶, Nobuhiro Takahashi||, and Toshiaki Isobe**
From the Department of Chemistry, Graduate School of Science, Tokyo Metropolitan University, Minami-osawa 1-1, Hachioji-shi, Tokyo 192-0397, Japan; Integrated Proteomics System Project, Pioneer Research on Genome the Frontier, MEXT, c/o Department of Chemistry, Graduate School of Science, Tokyo Metropolitan University, Minami-osawa 1-1, Hachioji-shi, Tokyo 192-0397, Japan; ¶ RIKEN (The Institute of Physical and Chemical Research), Hirosawa 2-1, Wako-shi, Saitama 351-0198, Japan; || Department of Applied Biological Science, Tokyo University of Agriculture and Technology, Saiwaicho 3-5-8, Fuchu-shi, Tokyo 183-8509, Japan; and ** Division of Proteomics Research, The Institute of Medical Science, The University of Tokyo, Shiroganedai 4-6-1, Minato-ku, Tokyo 108-8639, Japan
ABSTRACT
Horizontally transferred genes are believed to play a critical role in the divergence of bacterial strains from a common ancestor, but whether all of these genes express functional proteins in the cell remains unknown. Here, we used an integrated LC-based protein identification technology to analyze the proteome of Escherichia coli strain K12 (JM109) and identified 1,480 expressed proteins, equivalent to approximately 35% of the total open reading frames predicted in the genome. This subset contained proteins with cellular abundances ranging from several dozen to hundreds of thousands of copies, and included nearly all types of proteins in terms of chemical characteristics, subcellular distribution, and function. Interestingly, the subset also contained 138 of the 164 gene products that are currently known to be essential for bacterial viability (84% coverage). However, the subset contained only a very small fraction (10%) of the protein products from genes mapped within K-loops, which are "hot spots" for the integration of foreign DNA within the K12 genome. On the other hand, DNA microarray analysis indicated that the genes in K-loops are transcribed to RNA almost as efficiently as the native genes in the bacterial cell, raising the possibility that most of the recently acquired foreign genes are inadequate for the translational machinery used for the native genes and do not generate functional proteins within the cell.
It is generally accepted that the genetic diversity of organisms arose from a number of mechanisms for obtaining new genes, including lateral gene transfer from other species, creation of mosaic genes from parts of other genes, duplication of pre-existing genes, and de novo invention of genes from DNA that was previously noncoding sequence (1). Unlike eukaryotes, which evolve principally through the modification of pre-existing genetic information, bacteria have obtained a significant proportion of their genetic diversity through the acquisition of genes from distantly related organisms (1–5). This lateral, or horizontal, gene transfer has been studied by genetic approaches that compare gene content in a variety of genomes (6–8) or find genes with atypical G+C content and patterns of codon usage (9, 10). These studies showed that the genetic diversity of bacteria results not only from errors in DNA replication and repair but also from horizontal exchange and recombination of DNA sequences from similar and disparate species. It is believed that horizontally transferred genes account for the rapid adaptation of bacteria to novel environments and effectively change the pathogenic and ecological character of bacterial species (1–5). In Escherichia coli, for example, several hundred of the 4,289 predicted genes in the genome (11) were acquired from other organisms after this species diverged from the Salmonella lineage 100 million years ago (12). Also, the subsequent multistep transfer of gene clusters conferring virulence characteristics, such as O-loops, transformed a benign strain of E. coli into a pathogenic strain (13). Thus, horizontal gene transfer appears to have a large impact on bacterial evolution; however, whether all of these genes contribute to bacterial diversity by expressing functional proteins in the cell remains unknown.
Proteomic technologies powered by advancements in MS and bioinformatics and coupled with accumulated genome sequence data allow a comprehensive study of the protein constituents of cells and tissues. In particular, integrated multidimensional LC-based protein identification technology is powerful for large-scale and systematic protein identification in very complex biological samples (14). We constructed one such system by combining a fully automated microscale multidimensional LC and a high-resolution hybrid MS with a data analysis system (15) and used the system for large-scale identification of proteins expressed in Caenorhabditis elegans (16) and for development of a novel strategy to identify N-glycoproteins (17). Here, we applied this technology to the comprehensive analysis of the proteome of E. coli strain K12 (JM109) and identified a protein subset corresponding to approximately 35% of the total ORFs predicted in the genome. Based on this analysis, together with an analysis of gene expression by DNA microarrays, we propose that most of the recently acquired foreign genes do not express protein products and may therefore be either pseudogenes or genes that serve to generate functional RNAs.
EXPERIMENTAL PROCEDURES
Sample Preparation
E. coli K12 JM109 (endA1, gyrA96, thi, hsdR17, supE44, relA1, Δ(lac-proAB), recA1, F′[traD36, proAB+, lacIq, lacZΔM15]) cells were grown in shaking culture at 37 °C in a Luria-Bertani medium containing 1% tryptone (Becton Dickinson, Sparks, MD), 0.5% yeast extract (Becton Dickinson), and 1% NaCl. The mixed late logarithmic and early stationary phase cells were harvested from growing bacteria (OD600 = 1.5) by centrifugation at 10,000 × g for 10 min at room temperature. The precipitate was solubilized in 6 M guanidinium hydrochloride and S-alkylated with iodoacetamide as described (15, 16). The S-carbamoylmethylated proteins were dialyzed against 10 mM ammonium bicarbonate (pH 8.0) to remove the excess reagents and digested overnight at 37 °C with sequence-grade modified trypsin (Promega, Madison, WI) at an enzyme-substrate ratio of 1:100 (w/w). The digest was acidified to pH 2.0 with 1 M HCl, and the resulting precipitate was removed by centrifugation. The supernatant was adjusted to pH 8.0 with aqueous ammonia (8 M) and was analyzed immediately for protein identification.
Automated Two-dimensional (2D)1 LC-MS/MS Analysis of Peptides
The tryptic digest was analyzed by an automated microscale 2DLC-MS/MS system as described (15, 16). Briefly, the peptide mixture was separated by a combination of first-dimensional anion-exchange LC on a bioassist-Q column (2-mm ID x 35 mm, 10-µm particles; TOSOH, Tokyo, Japan) and second-dimensional reversed-phase LC on a Mightysil-C18 column (320-µm ID x 100 mm, 3-µm particles; Kanto Chemicals, Tokyo, Japan), which was synchronized by a computer program. The system was also equipped with a small "trap" precolumn packed with Mightysil-C18 (1-mm ID x 5 mm) that was inserted between the two analytical columns through a six-way column-switching valve to remove salts from the anion-exchange LC. The eluted peptides were sprayed directly into a Q-TOF hybrid mass spectrometer (Q-Tof2; Micromass UK Ltd., Manchester, United Kingdom). The peptides were detected in the MS mode to select a set of precursor ions for a data-dependent, CID mass spectrometric (MS/MS) analysis, and every 4 s the largest four signals selected were subjected to the MS/MS analysis. The total analysis time for a single 2DLC-MS/MS process was 16 h.
Protein Identification by MS/MS and Data Analyses
The large volume of MS/MS data generated was converted to text files listing mass values and intensities of fragment ions and was processed by the Mascot algorithm (Matrix Science Ltd., London, United Kingdom) for peptide assignment with reference to the E. coli sequence database (11) (m52p) maintained by the University of Wisconsin-Madison genome project (www.genome.wisc.edu/pub/analysis/m52p.fap). The parameters for the database search were as previously described (16). We first screened the candidate peptides for probability-based Mowse scores that exceeded their thresholds (p < 0.05) and then applied stricter criteria for final assignment (16, 18): (i) when the match score exceeded the threshold by 10 or more, the identification was accepted without further consideration; (ii) when the score was less than 10 above the threshold, or when the identification was based on a single matched MS/MS spectrum, we manually inspected the raw data for confirmation prior to acceptance.
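The two-tier acceptance rule described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the software used in the study: the `ProteinHit` record, its field names, and the `classify` function are hypothetical, and the scores are assumed to be Mascot match scores compared against a per-match significance threshold (p < 0.05).

```python
from dataclasses import dataclass


@dataclass
class ProteinHit:
    """Hypothetical summary of the evidence for one candidate protein."""
    best_score: float  # highest peptide match score for this protein
    threshold: float   # significance threshold (p < 0.05) for that match
    n_spectra: int     # number of matched MS/MS spectra


def classify(hit: ProteinHit, margin: float = 10.0) -> str:
    """Apply the screening step, then the two stricter acceptance criteria."""
    # Initial screen: the score must exceed its significance threshold.
    if hit.best_score <= hit.threshold:
        return "reject"
    # Criterion (ii): narrow margin or single-spectrum evidence goes to
    # manual inspection of the raw data.
    if hit.n_spectra == 1 or hit.best_score - hit.threshold < margin:
        return "manual_inspection"
    # Criterion (i): score exceeds the threshold by the margin or more.
    return "accept"
```

A hit scoring 60 against a threshold of 40 with three matched spectra would be accepted directly, whereas the same score supported by a single spectrum would be routed to manual inspection.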
Database Search and Annotation
Protein annotations, such as cellular role and subcellular localization, were obtained from the knowledge databases EcoCyc (19) (biocyc.org/ecocyc), COGs (20) (www.ncbi.nlm.nih.gov/COG/), PEC (Ver2.26; shigen.lab.nig.ac.jp/ecoli/pec/index.jsp), and m52orfs (11) (www.genome.wisc.edu/pub/analysis/m52orfs.txt). The transmembrane segments and signal peptides of proteins were predicted by the SOSUI program (21) maintained at Tokyo University of Agriculture and Technology (sosui.proteome.bio.tuat.ac.jp/sosuiframe0.html).
DNA Microarray Analysis
E. coli strain K12 (JM109) was grown as described above, and total RNA (400 µg) was isolated from 50 ml of the cells at OD600 = 1.5 by a hot phenol purification protocol. RNA (10 µg) was labeled with the Bioarray terminal labeling kit (Affymetrix, Santa Clara, CA) for microarray analysis. The analysis was performed on Affymetrix E. coli Antisense GeneChip arrays as described (22). Raw data files were analyzed by the statistical algorithm (23) in the Microarray Analysis Suite 5.1 (Affymetrix) using the default parameters and were exported as text files for further sorting with Excel 2000 software (Microsoft, Redmond, WA). For the absolute ("present") analysis of mRNA, we set a stringent threshold of 0.03 for the detection p value (23). Under these conditions, we typically obtained a "present" call for between 2,600 and 3,000 genes. The analysis was repeated twice, and the reproducible signals were assigned as the "present" RNAs.
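The final filtering step can be illustrated with a short sketch. It assumes that "reproducible" means a gene received a present call in both replicate analyses and that a present call corresponds to a detection p value strictly below 0.03; the function name, the dictionary layout, and the example p values are hypothetical, and the actual calls in the study were made by the Affymetrix software.

```python
def present_genes(replicate_a, replicate_b, p_cutoff=0.03):
    """Return genes called present (detection p < p_cutoff) in both replicates.

    Each replicate is a dict mapping a gene identifier to its detection p value.
    """
    called_a = {gene for gene, p in replicate_a.items() if p < p_cutoff}
    called_b = {gene for gene, p in replicate_b.items() if p < p_cutoff}
    # Keep only the signals that reproduce across the two analyses.
    return called_a & called_b


# Illustrative (made-up) detection p values for three genes.
run_1 = {"ompT": 0.001, "dicA": 0.020, "b1550": 0.500}
run_2 = {"ompT": 0.002, "dicA": 0.200, "b1550": 0.010}
reproducible = present_genes(run_1, run_2)
```

With these made-up values only `ompT` passes in both runs, so it is the only gene assigned as present.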
RESULTS AND DISCUSSION
Comprehensive Analysis of E. coli Proteome
We used laboratory strain K12 (JM109) (24) as a source of E. coli, because much information, including its genome sequence (11), has accumulated in a variety of databases. To achieve a comprehensive analysis of the proteome, whole bacterial cells collected from a mixed late logarithmic and early stationary phase culture were dissolved in 6 M guanidinium hydrochloride, S-carbamoylmethylated under reducing conditions, and digested with trypsin. The resulting peptide mixture was then analyzed directly by an integrated multidimensional protein identification system (15–17), and spectral data were automatically processed to search the E. coli sequence database m52p (11) for protein identification.
After removing redundant peptide assignments derived from any single protein, we identified approximately 850 proteins in a single analysis in which an average of about four peptides was assigned per protein. In LC-based protein identification technologies, however, multiple measurements of the same preparation generally increase the number of proteins identified, because the complexity of the sample peptide mixture often exceeds the separation capacity of the LC-MS system and because the selection of a peptide for MS/MS analysis is data dependent and somewhat irregular (16, 25). Thus, we analyzed the same peptide preparation repeatedly under the same conditions to maximally cover the E. coli proteome (Fig. 1). After repeating the analysis 10 times, a composite proteome of 1,480 proteins was obtained (Supplemental Table I), assigned using more than 58,700 peptides derived from approximately 162,000 MS/MS spectra. The identified proteins corresponded to approximately 35% of the total 4,289 ORFs predicted in the E. coli genome (11).
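The bookkeeping behind a composite proteome built from repeated runs can be sketched as a running set union. The function below is a hypothetical illustration (the run contents are made up, and it is not the data analysis system used in the study); it returns, for each run, the number of newly identified proteins and the cumulative total, which are the two quantities plotted in Fig. 1.

```python
def accumulate(runs):
    """For each run, report (newly identified, cumulative total) protein counts."""
    seen = set()
    rows = []
    for proteins in runs:
        new = set(proteins) - seen  # proteins not found in any earlier run
        seen |= new
        rows.append((len(new), len(seen)))
    return rows


# Made-up identifications from three repeated analyses of one preparation.
example_runs = [{"rpsV", "tufA", "fusA"}, {"tufA", "dnaN"}, {"fusA", "dnaN"}]
progress = accumulate(example_runs)

# Coverage of the predicted ORFs using the totals reported in the text.
coverage_percent = 100 * 1480 / 4289
```

The number of new proteins per run shrinks as the cumulative set grows, which is the saturation behaviour the repeated analyses were designed to reach.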

FIG. 1. Identification of E. coli proteins by LC-based protein identification technology. The analysis was repeated 10 times as described (15–17), and the graph shows the number of newly identified proteins in each analysis (triangles) and the total number identified (circles).
This protein subset contained a wide range of proteins with respect to physico-chemical characteristics such as pI and molecular mass (Mr). The most acidic protein identified was the msyB gene product (pI 3.42), while the most basic was the rpmH gene product (pI 13.1). The smallest protein identified was the product of rpsV (Mr = 5.1 kDa), and the largest was the product of b2520 (Mr = 182 kDa). A 2D visualization of the pI and Mr of the 1,480 proteins and of the E. coli proteome predicted from the ORFs (Fig. 2) suggests that our analysis covered > 99% of the bacterial proteome (with respect to Mr and pI). We also note that this protein subset contained not only abundant cellular proteins, such as the ribosomal proteins and the elongation factors Tu and G, which exist at approximately 10^4 to 10^5 copies per cell (26–28), but also very minor protein components such as the subunits of DNA polymerase III (dnaE, dnaX, dnaN, and holD), which are present at not more than 100 copies per cell (29). Thus, the analysis appears to have covered E. coli proteins whose cellular abundance ranges from 100 to 100,000 copies per cell (Table I).

FIG. 2. 2D display of the proteome that was experimentally obtained in this study (orange; 1,480 proteins) or predicted from the genome sequence (m52p) (yellow; 4,289 entries), and the proteome predicted from the genes encoded within K-loops (blue; 499 proteins). Mr and pI of the entire proteome were calculated from amino acid sequences without considering post-translational modifications. The y-axis is presented as a logarithmic scale.
Characterization of E. coli Proteome
Of the 1,480 proteins detected in this study, the cellular roles of 1,177 proteins (approximately 80%) are known or could be predicted according to the classification of the Clusters of Orthologous Groups (COGs) database (20) maintained by the National Center for Biotechnology Information (NCBI). A survey of functional annotation indicated that this subset contained proteins with a variety of functions covering most cellular processes, including minor components of the transcriptional machinery such as sigma factors and repressors. In fact, a comparison of functional annotation with the equivalent data for all ORFs predicted from the E. coli genome suggests that the identified proteins covered 5–80% of the proteins in each classified functional group (Table II). For instance, we found most (80%) of the bacterial proteins with potential cellular roles in the "translation" group and 35% of the proteins with potential roles in "signal transduction mechanism." The coverage of proteins within the category "cell motility" was much lower (5%) than the average (35%), presumably because laboratory strain K12 (JM109) has functional defects in cell motility (30).
On the other hand, the subcellular distribution of 684 of the 1,480 proteins could be annotated on the basis of the EcoCyc knowledge database (19). Comparison of the subcellular distribution of these proteins with the equivalent data for all E. coli ORFs suggests that the identified proteins covered 58% of the cytoplasmic and 44% of the periplasmic protein repertoire, as well as 23 and 30% of the insoluble proteins that reside in the inner or outer membrane, respectively (Table III). Meanwhile, the E. coli proteome, consisting of 4,289 gene products or ORFs, contains 1,134 proteins (approximately 27%) that have one or more putative transmembrane (TM) segments as predicted by the SOSUI program (21). We found 200 TM proteins (approximately 14%) among the total of 1,480. These results suggest that the protein identification technology employed here could identify almost any type of protein regardless of its chemical characteristics. The analysis appears to slightly favor the identification of soluble proteins, presumably because membrane-spanning segments negatively influence the efficiency of protease digestion and/or subsequent LC-MS detection.
Previous genetic studies using a series of conditional lethal mutants of E. coli identified the genes essential for bacterial growth and viability. To date, 164 such genes have been identified and cited in the PEC knowledge database. A search of this database revealed that our protein subset comprised the products of 138 essential genes (84% coverage) (Supplemental Table I). If we assume that the same 84% coverage applies to the whole set of proteins identified in this study, we speculate that the proteome of E. coli consists of approximately 1,760 proteins under the culture conditions used. This putative proteome size lies within the range suggested by transcriptome analyses, one of which suggested 1,100 active E. coli genes (31), while another suggested that almost all (3,700) of the bacterial genes are expressed (32).
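The extrapolation above is a simple proportion, which can be written out as a worked calculation. The sketch assumes, as the text does, that the coverage observed for the essential genes carries over to the expressed proteome as a whole; the function name is hypothetical.

```python
def estimate_proteome_size(identified, essential_found, essential_total):
    """Scale the identified protein count by the essential-gene coverage."""
    coverage = essential_found / essential_total  # 138 / 164, about 0.84
    return identified / coverage


# Counts reported in the text: 1,480 proteins, 138 of 164 essential genes.
estimated_size = estimate_proteome_size(1480, 138, 164)
rounded_estimate = round(estimated_size, -1)  # nearest ten
```

Using the unrounded coverage (138/164) rather than 84% shifts the estimate by only a few proteins, so the figure of roughly 1,760 is insensitive to that choice.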
Protein Products from Genes Within K-loops and Other Horizontally Transferred Genes
"K-loops" (also called K-islands) are well characterized as clusters of the most recently immigrated genes within the E. coli K12 genome (13, 33). K-loops were identified by comparing the genomes of E. coli strains K12 and O157 (13, 33). K-loops were integrated into the bacterial genome by horizontal gene transfer, show codon usage that is extremely different from that of the backbone of the bacterial genome, and also contain genes from cryptic phages (33). To examine the relationship between K-loops and the proteome, the proteins identified in this study were plotted on a circular genome map (Fig. 3). Although the identified proteins were almost uniformly spread around the map, active genes were infrequently found within K-loops. Statistical estimation indicated that proteins were expressed from only 49 of the 499 K-loop genes (approximately 10%). This percentage is significantly lower than that for active genes in the entire genome (35%) as estimated in this study (including 84% of the known essential genes).
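The text does not state which statistical procedure was used. As one way to gauge how unlikely 49 of 499 would be if K-loop genes were expressed at the genome-wide rate, the sketch below computes an exact hypergeometric lower-tail probability. It assumes that the 499 K-loop genes are counted among the 4,289 predicted ORFs and treats them as a random draw from that set; the function name is hypothetical, and this is an illustrative calculation rather than the authors' method.

```python
from fractions import Fraction
from math import comb


def hypergeom_lower_tail(k, population, successes, draws):
    """P(X <= k) when `draws` items are taken without replacement from a
    population containing `successes` items of interest."""
    total = comb(population, draws)
    favorable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k + 1)
    )
    # Exact integer arithmetic; convert to float only at the end.
    return float(Fraction(favorable, total))


# 4,289 ORFs, 1,480 with detected proteins, 499 K-loop genes, 49 detected.
p_value = hypergeom_lower_tail(49, 4289, 1480, 499)
expected_if_random = 499 * 1480 / 4289  # about 172 genes
```

Under this assumption roughly 172 K-loop genes would be expected to yield detected proteins, and the probability of observing 49 or fewer by chance is vanishingly small.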

FIG. 3. Circular representation of the K12 chromosome. The circle indicates the chromosomal location in base pairs (each tick = 100 kb). The red zones indicate the positions of ORFs that encode proteins identified in this study, and the blue zones indicate the positions of K-loops. Note that protein products are found infrequently within K-loops.
Previously, Gevaert et al. (34) characterized E. coli K strain proteins during the late logarithmic growth phase (K12, HB2151 strain grown in Luria-Bertani medium) by a unique strategy of concentrating methionine-containing peptides. Among the 882 proteins identified in that subset, we found that only 5.8% were products of genes located on K-loops. In a more recent study by Corbin et al. (35) that characterized 1,147 E. coli proteins from a mid-logarithmic growth phase culture (K12, MG1655 strain grown in minimal medium), we also found that only 5.0% of the protein products were from K-loops. The composite proteome resulting from this study and those by Gevaert et al. (34) and Corbin et al. (35) consists of 1,867 proteins, which slightly increases the coverage of both proteins (44% of the total ORFs) and essential genes (89%; 146 of 164 genes) but still contains only 76 protein products derived from K-loops (15% coverage; data not shown). Because the ORFs predicted from the K-loops encode putative proteins that are representative of the entire genome in terms of chemical characteristics (Fig. 2), subcellular localization, and number of TM segments (data not shown), these results indicate that the recently immigrated genes may be inadequate for the expression machinery of the native genes.
In most LC-based protein identification technologies, including the one reported here, the number of "peptide hits" used to identify a particular protein shows a positive, semi-quantitative relationship to the abundance of that protein in the sample (16). In this study, an average of 39.7 hits per protein was used to identify the 1,480 proteins, and almost 50% of the proteins were identified by more than 10 hits. By contrast, the gene products from K-loops were identified by 6.7 hits on average, and only 10 products, from the genes ompT, b1550, glcB, fecA, rfbB, rfbC, glf, wbbI, hsdM, and mrr, were identified by more than 10 hits. These data suggest that K-loops are relatively inactive in terms of protein expression. Among the 49 genes mapped to K-loops whose products were detected in this study, the cellular roles of 38 genes are known or could be predicted by searching the COG database, while the remaining 11 genes were either hypothetical or functionally undefined in the database. The genes with known or predicted cellular roles are classified into twenty categories (summarized in Supplemental Table II) and include those for proteins involved in cell wall biogenesis such as the synthesis of O-antigen (rfbA, rfbB, rfbC, glf, and wbbI clustered on K-loop 136, as discussed below), defense mechanisms such as DNA restriction-modification (hsdS, hsdM, hsdR, and mrr, K-loop 319), inorganic ion transport such as citrate-dependent iron transport (fecA and fecB, K-loop 313), transcription (dicA, dicC, and rpiR), energy production (glcB), and DNA recombination or transposition (trs5s). Interestingly, all these genes except dicA, dicC, and rpiR are so-called operational genes (8).
This fact may be compatible with the complexity hypothesis of horizontal gene transfer (36), which states that transfer occurred more frequently for operational genes (involved in housekeeping) than for informational genes (involved in transcription, translation, and related processes) because informational genes are typically members of large, complex systems, whereas operational genes are not. Thus, horizontal transfer of informational genes is less probable (36). Interestingly, E. coli K12 still expresses a number of enzymes in the synthetic pathway for O-antigen, a repeated unit of lipopolysaccharide in the outer bacterial membrane, even though the strain lost its ability to synthesize O-antigen during approximately 80 years of adaptation to laboratory environments (24, 37). Nevertheless, our data suggest that most of the recently immigrated genes are nonfunctional.
The K12 and O157 strains diverged from a common ancestor approximately 4.5 million years ago (38). Assuming that the K-loops were integrated at that time, the rate for acquiring a new functional gene is estimated at one per 100,000 years. On the other hand, it was postulated that a total of 755 E. coli genes are relics of horizontal transfer events after the divergence of E. coli from the Salmonella lineage 100 million years ago (10). However, most of the horizontally transferred genes in the E. coli chromosome are of relatively recent origin, with an average age of 6.7 million years (10). We searched for the products of the 755 genes in our list of E. coli proteins and found 90 gene products (12% of 755 genes; Supplemental Table I). Thus, again the horizontally transferred genes appear to be much less active than the native genes and may become functional at a rate of only one in 100,000 years. In addition to those genes and the K-loop genes, 13 further genes, polA (39), polB (40), putP (41), icd (42), trpC (43), gapA (44), pabB (45), gnd (46), crr (47), mutS (48), mdh (49), uvrD (39), and aceK (50), underwent horizontal transfer as determined by analyses of nucleotide polymorphism patterns, levels of phylogenetic incongruence among gene sequences, and large-scale chromosomal measurement. We found that 9 of these 13 genes (69%) are expressed in E. coli. However, these genes appear to be exceptional not only because of their extraordinarily high frequency of protein expression but also because they have homologous genes in pathogenic E. coli including O157:H7 (13, 33) and CFT073 (51) as well as in the neighboring lineages Salmonella (52, 53) and Yersinia (54). Presumably, these genes are either of very old origin or were transferred from bacteria that have codon usage very similar to that of E. coli.
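The rate quoted for K-loops follows from dividing the time since divergence by the number of K-loop genes with detected protein products. The sketch below reproduces that arithmetic under the stated assumption that all K-loops were integrated about 4.5 million years ago; the function name is hypothetical, and the time base behind the second estimate (for the 755 genes) is not given in the text, so it is not reproduced here.

```python
def years_per_functional_gene(elapsed_years, expressed_genes):
    """Average interval between acquisitions of genes that express proteins."""
    return elapsed_years / expressed_genes


# 4.5 million years since the K12/O157 split; 49 K-loop genes with products.
k_loop_interval = years_per_functional_gene(4.5e6, 49)
order_of_magnitude = round(k_loop_interval, -5)  # nearest 100,000 years
```

The interval comes out just under 92,000 years, which rounds to the one-per-100,000-years figure given above.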
CONCLUDING REMARKS
It is generally believed that most immigrated genes play a critical role in bacteria by providing functional proteins important for diversification subsequent to strain divergence. Here, we propose that a large proportion of horizontally transferred genes in E. coli do not express protein products. This, however, raises the question as to how these genes have been conserved within the bacterial genome. It is widely accepted that if genes are not translated into proteins and do not contribute to cell survival, then they tend to decay during evolution via insertion of internal stop codons, partial deletions, etc. Of course, the "nonfunctional" horizontally transferred E. coli genes may be undergoing such decay presently, but the range of time over which these genes were acquired (10) argues against this idea. It is possible that some of these genes are pseudogenes that lack the structural elements for transcription and/or translation, or they may be genes that do not produce proteins but serve to generate functional RNAs (55, 56). To obtain further insight into this problem, we performed transcriptome analysis on a DNA microarray that carried 4,241 unique E. coli genes including 458 genes from K-loops. Under the conditions described in "Experimental Procedures," we found the transcripts of 2,674 genes (63.1% of total genes on the array) in our bacterial culture, which included the transcripts of 208 genes from K-loops (Supplemental Table I). This suggests that about half of the K-loop genes (45.4%) were transcribed to RNA, whereas our proteome analysis detected protein products for only 25 of these transcripts (12%). On the other hand, we found a total of 1,138 proteins derived from the 2,674 transcripts detected on the array (42.6%). Thus, the genes in K-loops appear to be transcribed almost as efficiently as the native genes but may be translated to proteins much less frequently than the native genes.
Although the argument needs to be refined by more quantitative gene expression analysis such as quantitative RT-PCR, our results imply that the recently immigrated genes may be inadequate for the translational machinery of the bacterial cell. Nevertheless, whether some of the horizontally transferred genes serve to generate functional RNAs in the cell remains to be investigated.
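The comparison between transcription and translation rests on four ratios, which can be checked directly from the counts reported above. The helper name and dictionary keys below are hypothetical; the numbers are those given in the text.

```python
def percent(part, whole):
    """Percentage of `whole` represented by `part`, to one decimal place."""
    return round(100 * part / whole, 1)


summary = {
    # Transcribed fraction: genes with transcripts / genes on the array.
    "all_genes_transcribed": percent(2674, 4241),
    "k_loop_genes_transcribed": percent(208, 458),
    # Translated fraction: detected proteins / detected transcripts.
    "all_transcripts_with_protein": percent(1138, 2674),
    "k_loop_transcripts_with_protein": percent(25, 208),
}
```

The transcribed fractions differ by less than a factor of 1.5 (63.1% versus 45.4%), whereas the fraction of transcripts with a detected protein differs by more than a factor of 3 (42.6% versus 12.0%), which is the asymmetry the argument relies on.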
Finally, the argument exists that horizontally transferred genes might be expressed under specific environmental conditions, for instance during times of stress that threaten cell survival. A search of the EcoCyc database (19) for stress-induced genes indicated that only seven of a total of 54 known stress-related proteins are encoded by horizontally transferred genes, indicating that they are not particularly rich in "survival" genes. We also note that the present study identified the products of many of these genes (29 of 54; 54%) that are induced by various stresses such as heat shock, cold shock, osmotic shock, and infection by phages. These genes include the product of the immigrated gene dinJ that is induced by DNA damage. Thus, while additional proteomic sets need to be defined under a variety of environmental conditions and by the analysis of other strains of E. coli, it appears likely that proteomic analyses will provide alternatives to the current paradigms of bacterial evolution.
ACKNOWLEDGMENTS
We thank Russell Doolittle (University of California San Diego) for valuable advice and for critically reading the manuscript. We also thank Yukiko Yamazaki (National Institute of Genetics) for technical assistance and Jun-ichi Kato (Tokyo Metropolitan University) for helpful discussions.
FOOTNOTES
Received, February 23, 2004, and in revised form, April 28, 2004.
Published, MCP Papers in Press, April 28, 2004, DOI 10.1074/mcp.M400030-MCP200
1 The abbreviations used are: 2D, two-dimensional; TM, transmembrane. 
* This work was supported in part by grants for the Integrated Proteomics System Project, Pioneer Research on Genome the Frontier from the Ministry of Education, Culture, Sports, Science, and Technology (MEXT) of Japan. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact. 
S The on-line version of this manuscript (available at http://www.mcponline.org) contains supplemental material. 

To whom correspondence should be addressed: Department of Chemistry, Graduate School of Science, Tokyo Metropolitan University, Minami-osawa 1-1, Hachioji-shi, Tokyo 192-0397, Japan. Fax: 81-426-77-2525; E-mail: isobe-toshiaki{at}c.metro-u.ac.jp
REFERENCES
- Wolfe, K. H., and Li, W. H. (2003) Molecular evolution meets the genomics revolution. Nat. Genet. 33 (suppl.), 255–265
- Doolittle, R. F. (2002) Biodiversity: Microbial genomes multiply. Nature 416, 697–700
- Koonin, E. V., Makarova, K. S., and Aravind, L. (2001) Horizontal gene transfer in prokaryotes: Quantification and classification. Annu. Rev. Microbiol. 55, 709–742
- Ochman, H., Lawrence, J. G., and Groisman, E. A. (2000) Lateral gene transfer and the nature of bacterial innovation. Nature 405, 299–304
- Boucher, Y., Douady, C. J., Papke, R. T., Walsh, D. A., Boudreau, M. E., Nesbo, C. L., Case, R. J., and Doolittle, W. F. (2003) Lateral gene transfer and the origins of prokaryotic groups. Annu. Rev. Genet. 37, 283–328
- Nelson, K. E., Clayton, R. A., Gill, S. R., Gwinn, M. L., Dodson, R. J., Haft, D. H., Hickey, E. K., Peterson, J. D., Nelson, W. C., Ketchum, K. A., McDonald, L., Utterback, T. R., Malek, J. A., Linher, K. D., Garrett, M. M., Stewart, A. M., Cotton, M. D., Pratt, M. S., Phillips, C. A., Richardson, D., Heidelberg, J., Sutton, G. G., Fleischmann, R. D., Eisen, J. A., White, O., Salzberg, S. L., Smith, H. O., Venter, J. C., and Fraser, C. M. (1999) Evidence for lateral gene transfer between Archaea and bacteria from genome sequence of Thermotoga maritima. Nature 399, 323–329
- Aravind, L., Tatusov, R. L., Wolf, Y. I., Walker, D. R., and Koonin, E. V. (1998) Evidence for massive gene exchange between archaeal and bacterial hyperthermophiles. Trends Genet. 14, 442–444
- Rivera, M. C., Jain, R., Moore, J. E., and Lake, J. A. (1998) Genomic evidence for two functionally distinct gene classes. Proc. Natl. Acad. Sci. U. S. A. 95, 6239–6244
- Lawrence, J. G., and Ochman, H. (1997) Amelioration of bacterial genomes: Rates of change and exchange. J. Mol. Evol. 44, 383–397
- Lawrence, J. G., and Ochman, H. (1998) Molecular archaeology of the Escherichia coli genome. Proc. Natl. Acad. Sci. U. S. A. 95, 9413–9417
- Blattner, F. R., Plunkett, G., 3rd, Bloch, C. A., Perna, N. T., Burland, V., Riley, M., Collado-Vides, J., Glasner, J. D., Rode, C. K., Mayhew, G. F., Gregor, J., Davis, N. W., Kirkpatrick, H. A., Goeden, M. A., Rose, D. J., Mau, B., and Shao, Y. (1997) The complete genome sequence of Escherichia coli K-12. Science 277, 1453–1474
- Doolittle, R. F., Feng, D. F., Tsang, S., Cho, G., and Little, E. (1996) Determining divergence times of the major kingdoms of living organisms with a protein clock. Science 271, 470–477
- Perna, N. T., Plunkett, G., 3rd, Burland, V., Mau, B., Glasner, J. D., Rose, D. J., Mayhew, G. F., Evans, P. S., Gregor, J., Kirkpatrick, H. A., Posfai, G., Hackett, J., Klink, S., Boutin, A., Shao, Y., Miller, L., Grotbeck, E. J., Davis, N. W., Lim, A., Dimalanta, E. T., Potamousis, K. D., Apodaca, J., Anantharaman, T. S., Lin, J., Yen, G., Schwartz, D. C., Welch, R. A., and Blattner, F. R. (2001) Genome sequence of enterohaemorrhagic Escherichia coli O157:H7. Nature 409, 529–533
- Takahashi, N., Kaji, H., Yanagida, M., Hayano, T., and Isobe, T. (2003) Proteomics: Advanced technology for the analysis of cellular function. J. Nutr. 133, 2090S–2096S
- Isobe, T., Yamauchi, Y., Taoka, M., and Takahashi, N. (2002) Automated two-dimensional LC-MS/MS for large-scale protein analysis, in Proteins and Proteomics (Simpson, R. J., ed) pp. 869–876, Cold Spring Harbor Press, Cold Spring Harbor, NY
- Mawuenyega, K. G., Kaji, H., Yamauchi, Y., Shinkawa, T., Saito, H., Taoka, M., Takahashi, N., and Isobe, T. (2003) Large-scale identification of Caenorhabditis elegans proteins by multidimensional liquid chromatography-tandem mass spectrometry. J. Proteome Res. 2, 23–35
- Kaji, H., Saito, H., Yamauchi, Y., Shinkawa, T., Taoka, M., Hirabayashi, J., Kasai, K., Takahashi, N., and Isobe, T. (2003) Lectin affinity capture, isotope-coded tagging and mass spectrometry to identify N-linked glycoproteins. Nat. Biotechnol. 21, 667–672
- Natsume, T., Yamauchi, Y., Nakayama, H., Shinkawa, T., Yanagida, M., Takahashi, N., and Isobe, T. (2002) A direct nanoflow liquid chromatography-tandem mass spectrometry system for interaction proteomics. Anal. Chem. 74, 4725–4733
- Karp, P. D., Riley, M., Saier, M., Paulsen, I. T., Collado-Vides, J., Paley, S. M., Pellegrini-Toole, A., Bonavides, C., and Gama-Castro, S. (2002) The EcoCyc database. Nucleic Acids Res. 30, 56–58
- Tatusov, R. L., Natale, D. A., Garkavtsev, I. V., Tatusova, T. A., Shankavaram, U. T., Rao, B. S., Kiryutin, B., Galperin, M. Y., Fedorova, N. D., and Koonin, E. V. (2001) The COG database: New developments in phylogenetic classification of proteins from complete genomes. Nucleic Acids Res. 29, 22–28
- Hirokawa, T., Boon-Chieng, S., and Mitaku, S. (1998) SOSUI: Classification and secondary structure prediction system for membrane proteins.
Bioinformatics
14, 378
379[Abstract]
- Soupene, E., van Heeswijk, W. C., Plumbridge, J., Stewart, V., Bertenthal, D., Lee, H., Prasad, G., Paliy, O., Charernnoppakul, P., and Kustu, S.
(2003) Physiological studies of Escherichia coli strain MG1655: Growth defects and apparent cross-regulation of gene expression.
J. Bacteriol.
185, 5611
5626[Abstract/Free Full Text]
- Liu, W. M., Mei, R., Di, X., Ryder, T. B., Hubbell, E., Dee, S., Webster, T. A., Harrington, C. A., Ho, M. H., Baid, J., and Smeekens, S. P.
(2002) Analysis of high density expression microarrays with signed-rank call algorithms.
Bioinformatics
18, 1593
1599[Abstract/Free Full Text]
- Stevenson, G., Neal, B., Liu, D., Hobbs, M., Packer, N. H., Batley, M., Redmond, J. W., Lindquist, L., and Reeves, P.
(1994) Structure of the O antigen of Escherichia coli K-12 and the sequence of its rfb gene cluster.
J. Bacteriol.
176, 4144
4156[Abstract]
- Davis, M. T., Beierle, J., Bures, E. T., McGinley, M. D., Mort, J., Robinson, J. H., Spahr, C. S., Yu, W., Luethy, R., and Patterson, S. D.
(2001) Automated LC-LC-MS-MS platform using binary ion-exchange and gradient reversed-phase chromatography for improved proteomic analyses.
J. Chromatogr. B. Biomed Sci. Appl.
752, 281
291[CrossRef][Medline]
- Pederson, S., Bloch, P. L., Reeh, S., and Neidhardt, F. C.
(1978) Patterns of protein synthesis in
E. coli: A catalog of the amount of 140 individual proteins at different growth rates. Cell
14, 179
190[Medline]
- Howe, J. G., and Hershey, W. J. B.
(1983) Initiation factor and ribosome levels are coordinately controlled in Escherichia coli growing at different rates.
J. Biol. Chem.
258, 1954
1959[Abstract/Free Full Text]
- Bremer, H., and Dennis, P. P.
(1996) Modulation of chemical composition and other parameters of the cell growth rate, in
Escherichia coli and Salmonella, Cellular and Molecular Biology (Neidhardt, F., Curtiss, I. R., Ingraham, J., Lin, E., Low, J. K. B., Magasanik, B., Reznikoff, W., Riley, M., Schaechter, M., and Umbarger, H., eds) Vol. 2, 2nd Ed., pp.1533
1569, American Society for Microbiology, Washington, D.C.
- Meeser, W., and Weigel, C.
(1996) Initiation of chromosome replication, in
Escherichia coli and Salmonella, Cellular and Molecular Biology (Neidhardt, F., Curtiss, I. R., Ingraham, J., Lin, E., Low, J. K. B., Magasanik, B., Reznikoff, W., Riley, M., Schaechter, M., and Umbarger, H., eds) Vol. 2, 2nd Ed., pp.1579
1601, American Society for Microbiology, Washington, D.C.
- Boudeau, J., Barnich, N., and Darfeuille-Michaud, A.
(2001) Type 1 pili-mediated adherence of Escherichia coli strain LF82 isolated from Crohns disease is involved in bacterial invasion of intestinal epithelial cells.
Mol. Microbiol.
39, 1272
1284[CrossRef][Medline]
- Richmond, C. S., Glasner, J. D., Mau, R., Jin, H., and Blattner, F. R.
(1999) Genome-wide expression profiling in Escherichia coli K-12.
Nucleic Acids Res.
27, 3821
3835[Abstract/Free Full Text]
- Selinger, D. W., Cheung, K. J., Mei, R., Johansson, E. M., Richmond, C. S., Blattner, F. R., Lockhart, D. J., and Church, G. M.
(2000) RNA expression analysis using a 30 base pair resolution Escherichia coli genome array.
Nat. Biotechnol.
18, 1262
1268[CrossRef][Medline]
- Hayashi, T., Makino, K., Ohnishi, M., Kurokawa, K., Ishii, K., Yokoyama, K., Han, C. G., Ohtsubo, E., Nakayama, K., Murata, T., Tanaka, M., Tobe, T., Iida, T., Takami, H., Honda, T., Sasakawa, C., Ogasawara, N., Yasunaga, T., Kuhara, S., Shiba, T., Hattori, M., and Shinagawa, H.
(2001) Complete genome sequence of enterohemorrhagic Escherichia coli O157:H7 and genomic comparison with a laboratory strain K-12.
DNA Res.
8, 11
22[Medline]
- Gevaert, K., Van Damme, J., Goethals, M., Thomas, G. R., Hoorelbeke, B., Demol, H., Martens, L., Puype, M., Staes, A., and Vandekerckhove, J.
(2002) Chromatographic isolation of methionine-containing peptides for gel-free proteome analysis: identification of more than 800 Escherichia coli proteins.
Mol. Cell Proteomics
1, 896
903[Abstract/Free Full Text]
- Corbin, R. W., Paliy, O., Yang, F., Shabanowitz, J., Platt, M., Lyons, C. E., Jr., Root, K., McAuliffe, J., Jordan, M. I., Kustu, S., Soupene, E., and Hunt, D. F.
(2003) Toward a protein profile of Escherichia coli: Comparison to its transcription profile.
Proc. Natl. Acad. Sci. U. S. A.
100, 9232
9237[Abstract/Free Full Text]
- Jain, R., Rivera, M. C., and Lake, J. A.
(1999) Horizontal gene transfer among genomes: The complexity hypothesis.
Proc. Natl. Acad. Sci. U. S. A.
96, 3801
3806[Abstract/Free Full Text]
- Liu, D., and Reeves, P. R.
(1994) Escherichia coli K12 regains its O antigen.
Microbiology
140, 49
57[Medline]
- Reid, S. D., Herbelin, C. J., Bumbaugh, A. C., Selander, R. K., and Whittam, T. S.
(2000) Parallel evolution of virulence in pathogenic Escherichia coli.
Nature
406, 64
67[CrossRef][Medline]
- Brown, E. W., LeClerc, J. E., Kotewicz, M. L., and Cebula, T. A.
(2001) Three Rs of bacterial evolution: How replication, repair, and recombination frame the origin of species.
Environ. Mol. Mutagen.
38, 248
260[CrossRef][Medline]
- Filee, J., Forterre, P., Sen-Lin, T., and Laurent, J.
(2002) Evolution of DNA polymerase families: Evidences for multiple gene exchange between cellular and viral proteins.
J. Mol. Evol.
54, 763
773[CrossRef][Medline]
- Nelson, K., and Selander, R. K.
(1992) Evolutionary genetics of the proline permease gene (putP) and the control region of the proline utilization operon in populations of Salmonella and
Escherichia coli. J. Bacteriol.
174, 6886
6895[Abstract]
- Wang, F. S., Whittam, T. S., and Selander, R. K.
(1997) Evolutionary genetics of the isocitrate dehydrogenase gene (icd) in
Escherichia coli and Salmonella enterica. J. Bacteriol.
179, 6551
6559[Abstract]
- Lecointre, G., Rachdi, L., Darlu, P., and Denamur, E.
(1998) Escherichia coli molecular phylogeny using the incongruence length difference test.
Mol. Biol. Evol.
15, 1685
1695[Abstract/Free Full Text]
- Nelson, K., Whittam, T. S., and Selander, R. K.
(1991) Nucleotide polymorphism and evolution in the glyceraldehyde-3-phosphate dehydrogenase gene (gapA) in natural populations of Salmonella and Escherichia coli.
Proc. Natl. Acad. Sci. U. S. A.
88, 6667
6671[Abstract]
- Guttman, D. S., and Dykhuizen, D. E.
(1994) Detecting selective sweeps in naturally occurring
Escherichia coli. Genetics
138, 993
1003[Abstract/Free Full Text]
- Bisercic, M., Feutrier, J. Y., and Reeves, P. R.
(1991) Nucleotide sequences of the gnd genes from nine natural isolates of Escherichia coli: Evidence of intragenic recombination as a contributing factor in the evolution of the polymorphic gnd locus.
J. Bacteriol.
173, 3894
3900[Medline]
- Hall, B. G., and Sharp, P. M.
(1992) Molecular population genetics of Escherichia coli: DNA sequence diversity at the celC, crr, and gutB loci of natural isolates.
Mol. Biol. Evol.
9, 654
665[Abstract]
- Brown, E. W., LeClerc, J. E., Li, B., Payne, W. L., and Cebula, T. A.
(2001) Phylogenetic evidence for horizontal transfer of mutS alleles among naturally occurring Escherichia coli strains.
J. Bacteriol.
183, 1631
1644[Abstract/Free Full Text]
- Boyd, E. F., Nelson, K., Wang, F. S., Whittam, T. S., and Selander, R. K.
(1994) Molecular genetic basis of allelic polymorphism in malate dehydrogenase (mdh) in natural populations of Escherichia coli and Salmonella enterica.
Proc. Natl. Acad. Sci. U. S. A.
91, 1280
1284[Abstract]
- Nelson, K., Wang, F. S., Boyd, E. F., and Selander, R. K.
(1997) Size and sequence polymorphism in the isocitrate dehydrogenase kinase/phosphatase gene (aceK) and flanking regions in Salmonella enterica and Escherichia coli.
Genetics
147, 1509
1520[Abstract/Free Full Text]
- Welch, R. A., Burland, V., Plunkett, G., 3rd, Redford, P., Roesch, P., Rasko, D., Buckles, E. L., Liou, S. R., Boutin, A., Hackett, J., Stroud, D., Mayhew, G. F., Rose, D. J., Zhou, S., Schwartz, D. C., Perna, N. T., Mobley, H. L., Donnenberg, M. S., and Blattner, F. R.
(2002) Extensive mosaic structure revealed by the complete genome sequence of uropathogenic Escherichia coli.
Proc. Natl. Acad. Sci. U. S. A.
99, 17020
17024[Abstract/Free Full Text]
- McClelland, M., Sanderson, K. E., Spieth, J., Clifton, S. W., Latreille, P., Courtney, L., Porwollik, S., Ali, J., Dante, M., Du, F., Hou, S., Layman, D., Leonard, S., Nguyen, C., Scott, K., Holmes, A., Grewal, N., Mulvaney, E., Ryan, E., Sun, H., Florea, L., Miller, W., Stoneking, T., Nhan, M., Waterston, R., and Wilson, R. K.
(2001) Complete genome sequence of Salmonella enterica serovar Typhimurium LT2.
Nature
413, 852
856[CrossRef][Medline]
- Parkhill, J., Dougan, G., James, K. D., Thomson, N. R., Pickard, D., Wain, J., Churcher, C., Mungall, K. L., Bentley, S. D., Holden, M. T., Sebaihia, M., Baker, S., Basham, D., Brooks, K., Chillingworth, T., Connerton, P., Cronin, A., Davis, P., Davies, R. M., Dowd, L., White, N., Farrar, J., Feltwell, T., Hamlin, N., Haque, A., Hien, T. T., Holroyd, S., Jagels, K., Krogh, A., Larsen, T. S., Leather, S., Moule, S., OGaora, P., Parry, C., Quail, M., Rutherford, K., Simmonds, M., Skelton, J., Stevens, K., Whitehead, S., and Barrell, B. G.
(2001) Complete genome sequence of a multiple drug resistant Salmonella enterica serovar Typhi CT18.
Nature
413, 848
852[CrossRef][Medline]
- Parkhill, J., Wren, B. W., Thomson, N. R., Titball, R. W., Holden, M. T., Prentice, M. B., Sebaihia, M., James, K. D., Churcher, C., Mungall, K. L., Baker, S., Basham, D., Bentley, S. D., Brooks, K., Cerdeno-Tarraga, A. M., Chillingworth, T., Cronin, A., Davies, R. M., Davis, P., Dougan, G., Feltwell, T., Hamlin, N., Holroyd, S., Jagels, K., Karlyshev, A. V., Leather, S., Moule, S., Oyston, P. C., Quail, M., Rutherford, K., Simmonds, M., Skelton, J., Stevens, K., Whitehead, S., and Barrell, B. G.
(2001) Genome sequence of Yersinia pestis, the causative agent of plague.
Nature
413, 523
527[CrossRef][Medline]
- Eddy, S. R.
(2001) Non-coding RNA genes and the modern RNA world.
Nat. Rev. Genet.
2, 919
929[CrossRef][Medline]
- Wassarman, K. M., Repoila, F., Rosenow, C., Storz, G., and Gottesman, S.
(2001) Identification of novel small RNAs using comparative genomics and microarrays.
Genes Dev.
15, 1637
1651[Abstract/Free Full Text]