Genomic analysis of the histidine kinase family in bacteria and archaea

Dong-jin Kim1 and Steven Forst1

Department of Biological Sciences, University of Wisconsin, PO Box 413, Milwaukee, WI 53201, USA1

Author for correspondence: Steven Forst. Tel: +1 414 229 6373. Fax: +1 414 229 3926. e-mail: sforst@uwm.edu


   ABSTRACT
Two-component signal transduction systems, consisting of histidine kinase (HK) sensors and DNA-binding response regulators, allow bacteria and archaea to respond to diverse environmental stimuli. HKs possess a conserved domain (H-box region) which contains the site of phosphorylation and an ATP-binding kinase domain. In this study, a genomic approach was taken to analyse the HK family in bacteria and archaea. Based on phylogenetic analysis, differences in the sequence and organization of the H-box and kinase domains, and the predicted secondary structure of the H-box region, five major HK types were identified. Of the 336 HKs analysed, 92% could be assigned to one of the five major HK types. The Type I HKs were found predominantly in bacteria while Type II HKs were not prevalent in bacteria but constituted the major type (13 of 15 HKs) in the archaeon Archaeoglobus fulgidus. Type III HKs were generally more prevalent in Gram-positive bacteria and were the major HK type (14 of 15 HKs) in the archaeon Methanobacterium thermoautotrophicum. Type IV HKs represented a minor type found in bacteria. The fifth HK type was composed of the chemosensor HKs, CheA. Several bacterial genomes contained all five HK types. In contrast, archaeal genomes either contained a specific HK type or lacked HKs altogether. These findings suggest that the different HK types originated in bacteria and that specific HK types were acquired in archaea by horizontal gene transfer.

Keywords: classification scheme, phylogenetic analysis, secondary structure analysis, horizontal gene transfer

Abbreviations: HK, histidine kinase; RR, response regulator


   INTRODUCTION
In bacteria and archaea, two-component signal transduction systems (Kofoid & Parkinson, 1988; Parkinson & Kofoid, 1992; Forst & Roberts, 1994; Hoch & Silhavy, 1995), also referred to as His–Asp phosphorelay systems (Egger et al., 1997), mediate adaptive responses to changes in environmental conditions. A typical two-component signal transduction system consists of a membrane-bound sensor histidine kinase (HK) and a cognate regulatory protein referred to as a response regulator (RR). HKs usually function as dimeric proteins that undergo transautophosphorylation on a conserved histidine residue in response to specific stimuli (Dutta et al., 1999). The phosphoryl group is subsequently transferred to an Asp residue in the receiver domain of the RR. Modulation of the phosphorylated state of the RR controls either expression of target genes or cellular behaviour, such as swimming motility (Hoch & Silhavy, 1995). The detection of environmental stimuli by highly divergent sensory input domains provides specificity for the signal transduction pathway and controls the level of the phosphorylated state of the RR. Bacteria can possess more than 30 different two-component signal transduction systems (Mizuno, 1997). Bacteria that possess a large number of HKs are generally able to adapt to a broad spectrum of environmental stimuli. While HKs have also been identified in fungi (Ota & Varshavsky, 1993; Loomis et al., 1998), amoeba (Schuster et al., 1996; Chang et al., 1998), Neurospora (Alex et al., 1996) and Arabidopsis (Chang et al., 1993; Suzuki et al., 1998), the HK content in eukaryotic genomes is much lower than that found in bacterial genomes.

HKs consist of an ATP-binding kinase domain and the H-box domain which includes the histidine site of phosphorylation. The kinase domain consists of three conserved consensus motifs called the N-, G1- and G2-boxes, and a fourth, more variable sequence, the F-box (Kofoid & Parkinson, 1988 ; Stock et al., 1988 , 1995 ). In most HKs, the kinase domain is directly connected to the C-terminal side of the H-box domain. In contrast, in the chemosensor CheA, the H-box (P1 domain) resides at the N terminus of the protein and is separated from the kinase domain by the intervening P2 and P3 modules (Garzon & Parkinson, 1996 ; Robinson & Stock, 1999 ).

The structures of the H-box domain of the osmosensor EnvZ (Tomomori et al., 1999) and the P1 domain of CheA (Zhou & Dahlquist, 1997) have been determined. The H-box region of EnvZ consists of a four-helix bundle structure formed by the dimeric association of two identical subunits while the P1 domain is a monomeric four-helix bundle structure. While the structures of the H-box domains differ, the structures of the kinase domains of EnvZ of Escherichia coli (Tanaka et al., 1998) and CheA of Thermotoga maritima (Bilwes et al., 1999) were shown to be homologous to each other and to the ATP-binding domains of DNA gyrase B and Hsp90. The phosphotransfer reaction can be reconstituted using liberated H-box and kinase domains (Garzon & Parkinson, 1996; Park et al., 1998), indicating that the individual domains can be obtained as functionally intact modules.

Besides the typical two-component organization, multistep His–Asp–His–Asp phosphorelay systems can be composed of individual phosphotransfer proteins. This modular organization has been extensively investigated in the multi-step pathway controlling sporulation in Bacillus subtilis (Appleby et al., 1996 ; Fabret et al., 1999 ; Hoch, 1995 ; Perraud et al., 1999 ). Multistep phosphotransfer reactions can also occur within a single HK. These so-called hybrid HKs contain additional phosphotransfer modules referred to as the D1 receiver and the HPt phosphotransfer domains that are attached to the C-terminal side of the kinase domain (Appleby et al., 1996 ).

The number of HKs recognized has expanded enormously with the advent of microbial genomic sequencing projects. The HK superfamily has been classified by numerous criteria. Recently, Grebe & Stock (1999) separated the HK family into 11 different subtypes based on cluster analysis of 348 HKs. In the present study, the HK families in the completed genomes of 22 bacteria and 4 archaea were analysed. This genomic analysis divided the HK family into five major types. The HK type distribution differed markedly between bacteria and archaea.


   METHODS
Analysis of bacterial and archaeal HK families.
The HK family of each genome was assembled using the gene tables of the completed genomes listed in TIGR (http://www.tigr.org). Additionally, BLAST analysis using the transmitter domains (H-box plus kinase domain) of EnvZ, CheA, NarX, YehU and DcuS as the search sequences was performed. During this analysis ORFs were retrieved which lacked either H-box or kinase consensus motifs. Also, proteins such as HipA of E. coli and SpoIIAB of B. subtilis, which contained kinase domains but lacked an identifiable H-box domain were retrieved. Only those proteins that contained both an identifiable H-box and complete kinase domain were included in the HK gene family.

Alignment of H-box and kinase domains.
The transmitter domains were initially aligned using the multi-sequence alignment program MSA version 2.1. Refinement of the alignments was aided by BLAST search analysis and visual inspection.

Phylogenetic analysis.
A distance dendrogram of each HK family was constructed using the unweighted pair-group method with arithmetic means (UPGMA) algorithm. Using this method, five different HK types were identified in E. coli. For each genome analysed, a dataset, which included the HKs from E. coli, was created and subsequently analysed using the UPGMA method. The assignment of HK types and subtypes within each genome analysed was accomplished using this approach.
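The clustering step can be illustrated with a minimal UPGMA sketch. It is not the authors' software: the function name `upgma`, the nested-tuple tree representation and the input format (a dictionary of pairwise distances) are assumptions made for illustration, and the text does not state which distance measure was derived from the alignments.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster labelled items by UPGMA.

    names: list of labels (e.g. HK names).
    dist:  dict mapping frozenset({a, b}) -> pairwise distance.
    Returns (tree, merges): the tree as nested tuples and a list of
    (cluster, height) pairs in the order the clusters were joined.
    """
    # Active clusters, keyed by their tree representation, valued by size.
    clusters = {name: 1 for name in names}
    d = dict(dist)  # work on a copy so the caller's matrix is untouched
    merges = []
    while len(clusters) > 1:
        # Join the two closest active clusters.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        height = d[frozenset((a, b))] / 2.0
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        new = (a, b)
        # Distance to the new cluster is the size-weighted mean of the
        # distances to its two children (the "arithmetic means" of UPGMA).
        for c in clusters:
            d[frozenset((new, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[new] = size_a + size_b
        merges.append((new, height))
    (tree,) = clusters
    return tree, merges
```

In the procedure described above, the labels would be the HKs of one genome plus the E. coli HKs as references, so that each new HK can be assigned to the type of the E. coli branch it joins.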

Secondary structure analysis.
The PredictProtein server (http://www.embl-heidelberg.de/predictprotein/predictprotein.html) was used to predict the secondary structure of the H-box region in each HK retrieved. A predicted secondary structure was assigned to sequences that possessed a reliability value of greater than 7.


   RESULTS
Characterization of the HK family of E. coli K-12
HKs were retrieved using the BLAST server within each genome listed in the TIGR microbial database. Various transmitters, which included both the H-box region and kinase domain, were used as search probes. Retrieved ORFs containing both an H-box region and a kinase domain were included in the HK gene family. We initially analysed the genomes containing large HK families. The HK family of the completed genomic sequence of E. coli K-12 (Blattner et al., 1997) was analysed first since it has been extensively studied and the functions of 18 of its HKs are presently known (Egger et al., 1997).

Twenty-nine HKs were retrieved from the genome of E. coli. Fig. 1(a) shows the amino acid sequence alignment of the H-box and X regions and Fig. 1(b) shows the alignment of the kinase domains. During the BLAST analysis, proteins which contained phosphorelay subdomains but lacked kinase domains were retrieved. For example, YojN contains an HPt domain but no identifiable kinase domain. These types of proteins were not included in the HK gene family.



Fig. 1. (a) Alignment of the H-box region of the HKs of E. coli. H-box consensus sequences are boxed. Bold characters depict invariable residues and characteristic conserved residues of a given subtype are represented by white letters. The predicted helix–loop–helix structures are underlined. The helix–loop–helix structure of EnvZ is shown by thick lines above the EnvZ sequence. ‘+’ denotes the position of the conserved positive amino acid residue at the end of helix 2. Proline residues in the putative loop region are shaded. (b) Alignment of the kinase domain of the HKs of E. coli. The conserved motifs are enclosed in boxes and shaded. Bold characters depict invariable residues and characteristic conserved residues of a given subtype are represented by white letters. The number of amino acid residues between motifs is represented by a single dot (fewer than 10 residues), by double dots (10–25 residues) or by double open circles (more than 25 residues). Dashes represent deleted residues.

 
Phylogenetic analysis separated the HK gene family into five major branches (Fig. 2). The analysis reveals that Type I and II HKs are related to each other while Type III and IV HKs occupy separate branches. Type I and II HKs both possess orthodox kinase domains, which contain the N, G1, F and G2 consensus motifs. Type III and IV HKs possess so-called unorthodox kinase domains in which N1 of the N-box motif is either a glycine (Type III) or a proline (Type IV) residue, the F-box is absent and the G2 motif is truncated (Fig. 1b). The conserved glycine residue identified on the C-terminal side of the G2 motif in almost all kinase domains is referred to as the G3 site. The Type I group contained the largest number of members (72%). Within this group, three separate subtypes could be distinguished. The Type IA group contained 12 HKs, the Type IB group contained the hybrid HKs and the Type IC group contained three HKs, including the nitrogen regulator, NtrB. While PhoQ contained a Type I H-box motif and an orthodox kinase domain, it did not branch within the Type IA, IB and IC subgroups.
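The kinase-domain distinctions described in this paragraph can be written as a small decision rule. This is only an illustrative sketch: the function name `kinase_domain_class` and its three inputs are hypothetical, the features are assumed to have been read off an alignment by hand, and in the study itself type assignment also relied on the phylogenetic analysis and the H-box region.

```python
def kinase_domain_class(n1_residue, has_f_box, g2_truncated):
    """Return a coarse kinase-domain label from three observed features.

    n1_residue:   one-letter code of the residue at the N1 position of the N-box.
    has_f_box:    True if an F-box motif is present.
    g2_truncated: True if the G2 motif is truncated.
    """
    # Orthodox domains carry the full set of N, G1, F and G2 motifs.
    if has_f_box and not g2_truncated:
        return "orthodox (Type I or II)"
    # Unorthodox domains lack the F-box and have a truncated G2 motif;
    # the N1 residue then separates Type III (Gly) from Type IV (Pro).
    if not has_f_box and g2_truncated:
        if n1_residue == "G":
            return "unorthodox (Type III)"
        if n1_residue == "P":
            return "unorthodox (Type IV)"
    # Any other combination is left for manual inspection.
    return "undetermined"
```

The rule deliberately returns "undetermined" for mixed feature combinations rather than guessing a type.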



Fig. 2. Phylogenetic analysis of the HKs of E. coli. Dendrogram of the 29 HKs of E. coli analysed using the UPGMA algorithm. The alignments shown in Fig. 1(a) and (b) were used in the analysis.

 
Each of the five HK types contained a characteristic H-box motif (Fig. 1a). The H-box of Type I HKs contained the consensus motif HEhRTPh. Secondary structure analysis predicted a helix–loop–helix structure (underlined in Fig. 1) in the H-box region. This structure was shown to exist in the NMR solution structure of EnvZ (Tomomori et al., 1999 ). Proline residues were present in the loop region of numerous HKs and positively charged residues were found at the end of the X-region. The H-box motif of the Type II HKs possessed a conserved asparagine residue at position 5 (the invariant H residue is defined as position 1) and lacked the positively charged residue and the proline at positions 4 and 6, respectively. These HKs lacked a predicted helix–loop–helix structure. The H-box motif of Type III HKs was characterized by an R-E-L sequence on the N-terminal side of the histidine site of phosphorylation and lacked the conserved positively charged and proline residues. A helix–loop–helix structure was predicted in these molecules. The H-box motif of Type IV HKs contained a proline residue on the N-terminal side and the conserved sequence FLFNAL on the C-terminal side of the histidine site of phosphorylation. Finally, the P1 domain of CheA contained the consensus motif HSIKG and exists at the N terminus of the protein.

The distance between the conserved histidine residue of the H-box and the conserved asparagine residue of the kinase domain (H to N distance) was characteristic for the different HK subtypes. The mean H to N distance was approximately 116 residues and 96 residues for the Type I and II HKs, respectively (Table 1). The mean H to N distance was 110 residues and 92 residues for the Type III and IV HKs, respectively. The kinase domain of CheA was characterized by insertions between the N and G1 boxes and the G1 and F boxes. The N-box of CheA contained a histidine residue at the N1 position. The localization of the H-box at the N terminus of CheA created an H to N distance of 325 residues (Table 1; Kofoid & Parkinson, 1988 ). Finally, it has been shown that 26 of the 29 HKs of E. coli are organized in operons with cognate RRs (Mizuno, 1997 ).
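The H to N distance is a simple positional difference, and the per-type means can be computed as in the sketch below. The helper names and the record format are assumptions, and any positions passed in are expected to come from the aligned sequences; no values from Table 1 are reproduced here.

```python
from collections import defaultdict
from statistics import mean


def h_to_n_distance(his_pos, asn_pos):
    """Number of residues from the conserved His (H-box) to the conserved Asn (N-box)."""
    if asn_pos <= his_pos:
        raise ValueError("the conserved Asn is expected downstream of the conserved His")
    return asn_pos - his_pos


def mean_distance_by_type(records):
    """Average the H to N distance within each HK type.

    records: iterable of (hk_type, his_pos, asn_pos) tuples, with
    1-based residue positions in the full-length protein.
    """
    grouped = defaultdict(list)
    for hk_type, his_pos, asn_pos in records:
        grouped[hk_type].append(h_to_n_distance(his_pos, asn_pos))
    return {hk_type: mean(values) for hk_type, values in grouped.items()}
```

Because CheA carries its H-box in the N-terminal P1 domain, the same calculation yields a much larger value for CheA than for the other types.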


Table 1. Characteristic features of HK types

 
HK family of Pseudomonas aeruginosa
To determine whether the five major HK types found in E. coli were present in other bacteria, a phylogenetic analysis of the HK gene family of Psd. aeruginosa (Stover et al., 2000) using the UPGMA method was performed (Fig. 3). Psd. aeruginosa possesses 63 HKs. The five HK types found in E. coli were also identified in Psd. aeruginosa. The majority of the family members (86%) were Type I HKs, containing typical orthodox kinase domains and H-box motifs. One cluster within the Type IA group (PA1396, 1976, 1992, 3271 and 4936) contained orthodox kinase domains while the H-box motifs contained a non-polar residue at position 4 and a glutamine residue at position 5. This clade of HKs formed a distinct branch within the Type IA group. In addition, a cluster of HKs in the Type IC group possessed the consensus H-box motif HDLNQPL in which the asparagine residue replaced the typical positively charged residue at position 4 and the glutamine residue at position 5 was highly conserved. Psd. aeruginosa lacked Type II HKs and possessed four CheAs. Two HKs, PA3078 and PA4380, could not be assigned to a defined type and so were categorized as unclassified. Helix–loop–helix structures were predicted in the H-box region of the Type I and III HKs and the H to N distances for each of the HK types were similar to those found in the different HK types of E. coli. Finally, the majority of the HKs were found in operons with cognate RRs.



Fig. 3. Phylogenetic analysis of the HKs of Psd. aeruginosa. Dendrogram showing the analysis of a combined dataset of E. coli and Psd. aeruginosa HKs. HKs of Psd. aeruginosa are represented by bold type.

 
HK families in bacterial genomes
The HK gene families of bacterial genomes listed as completed in the TIGR database were characterized using the cluster analysis approach. We began the analysis with free-living micro-organisms containing relatively large genomes.

A total of 44 HKs were identified in the genome of Vibrio cholerae (Heidelberg et al., 2000 ), which included three new proteins (VCA0705, VC0694, VCA0851) not previously listed in the TIGR gene table. V. cholerae possessed a large Type I group which included 7 Type IA, 9 Type IB and 12 Type IC molecules (Table 2). We noted that a cluster of HKs within the Type IC group possessed the H-box motif HDLNNP in which the typical positively charged residue at position 4 was substituted by an asparagine residue. The Type I HKs possessed helix–loop–helix structures in the H-box region and a mean H to N distance of 110 residues. Type II, III, IV and CheA HKs were also present in V. cholerae. Additionally, four HKs did not cluster with a defined group and were therefore placed in an unclassified category. Twenty-eight of the HKs were found on the large chromosome of V. cholerae (2·96 Mb) while 16 were found on the small chromosome (1·07 Mb). The majority of the HKs existed in operons with cognate RRs.


Table 2. Distribution of HK types in bacteria

 
Xylella fastidiosa (Simpson, 2000), like E. coli and V. cholerae, is a member of the γ-subclass of the Proteobacteria. X. fastidiosa contained predominantly Type I HKs (Table 2) and lacked Type II and III HKs. The HKs in this bacterium existed in operons with cognate RRs. The HK family of the Gram-positive bacterium B. subtilis (Kunst, 1997) contained all five HK types. The Type I group, containing six Type IA, five Type IC HKs and no Type IB HKs, was not as large as that found in the Gram-negative bacteria analysed above, while 25% of the HKs belonged to the Type III group. B. subtilis contained a relatively large number of HKs which did not cluster within the five defined HK types. Three unclassified HKs (YtsB, YvcQ, YxdK) clustered together into one clade while three other unclassified HKs (YbdK, YrkQ, YccG) formed a separate clade (see Table 5). All of the HKs of B. subtilis existed in operons with cognate RRs, with the exception of the Type IC group, whose members were orphans (Fabret et al., 1999).


Table 5. Classification of HK family of completed genomes

 
A markedly different HK distribution was found in the cyanobacterium, Synechocystis sp. (Kaneko et al., 1996 ). This bacterium possessed a large Type IB group but lacked Type IC HKs. Several of the Type IB HKs did not contain D1 or HPt modules. Whether these Type IB molecules represent HKs to which additional phosphorelay modules had not been added or are hybrid types which lost the phosphorelay modules, remains to be determined. Interestingly, the two CheA proteins identified in Synechocystis sp. contained additional phosphorelay modules (Mizuno et al., 1996 ). Synechocystis sp. contains two type III HKs, slr0331 and slr1212. While slr0331 was a typical Type III HK, slr1212 contained an atypical H-box sequence (HHRhKNNLQ) connected to a typical unorthodox kinase domain. Type II and IV HKs were not present in Synechocystis sp. Finally, only 13 of the HKs were organized in operons with cognate RRs (Mizuno et al., 1996 ).

The Gram-positive bacterium Deinococcus radiodurans (White et al., 1999) contained 20 HKs, 13 of which belonged to the Type IA group. The proportion of Type III HKs (4 out of 20) was relatively high. The genome of this bacterium possesses two chromosomes (2·6 and 0·41 Mb), a megaplasmid (0·18 Mb) and a small plasmid (45 kb). Thirteen of the HKs were present on the large chromosome, three were located on the smaller chromosome and four were located on the megaplasmid. In the thermophilic bacterium Thermotoga maritima (Nelson et al., 1999) the majority of the HKs (5 out of 8) belonged to the Type I group while two HKs remained unclassified. Seven of the eight HKs of T. maritima existed in operons with cognate RRs. Finally, the hyperthermophilic bacterium Aquifex aeolicus, which is considered to be one of the earliest diverging eubacteria (Deckert et al., 1998), possessed three HKs, none of which could be classified. This organism is motile and possesses polytrichous flagella but does not contain an identifiable CheA protein.

In summary, 92% of the HKs analysed could be assigned to one of the five major HK types. Several bacteria contained all five HK types. The majority of HKs (63%) belonged to the Type I group while the distribution of the various subtypes varied considerably. In the bacteria, most of the HKs were organized in operons with cognate RRs, with the notable exception of Synechocystis.

HK families of human pathogens
The size of the genomes of pathogenic bacteria is generally smaller than that of free-living bacteria (Table 3). The mean HK content of the pathogenic bacteria was 0·26% as compared with 0·65% for the free-living bacteria. The human pathogenic bacteria contained predominantly Type I HKs (Table 3). Interestingly, the Gram-positive bacterium Mycobacterium tuberculosis (Davies et al., 1998) contained a relatively high content (4 of 13) of Type III HKs. One of the Type III HKs, Rv3220, contained a typical unorthodox kinase domain and the atypical H-box sequence HHRhKNNLQ, which was similar to the H-box of slr1212 of Synechocystis sp. These proteins are referred to as Type IIIB HKs (see Table 5). The majority of the HKs in these bacteria were organized in operons with cognate RRs.


Table 3. Distribution of HK types in pathogenic bacteria

 
Helicobacter pylori and Campylobacter jejuni, both of which belong to the δ-subclass of the Proteobacteria, contained several unclassified HKs. This finding suggests that this group of bacteria may possess a unique HK type that is not yet identified in the current HK dataset.

Analysis of the HK family of archaeal genomes
The amino acid sequence alignment of the H-box regions and kinase domains of the 15 HKs of Archaeoglobus fulgidus (Klenk et al., 1998 ) is shown in Fig. 4(a) and (b), respectively. Thirteen of the HKs belonged to the Type II subtype while only one belonged to the Type I group. The H-box module of Type II HKs contained the characteristic asparagine residue at position 5. The H-box of the Type II HKs lacked a predictable secondary structure. Highly conserved glutamic acid and positively charged residues were identified downstream of the H-box (shaded in Fig. 4a). The kinase domains (Fig. 4b) possessed a conserved glycine residue in the F-box and the mean H to N distance was 96 residues. Cluster analysis revealed that the Type II group of Archaeoglobus formed a separate clade, designated the IIB subtype, within the Type II group of E. coli (Fig. 5). The 13 HKs did not exist in operons with the nine RRs identified in Archaeoglobus.



Fig. 4. Alignment of the kinase and H-box regions of Arc. fulgidus. (a) Sequence alignment of the H-box regions. Shadings and symbols are as in Fig. 1. Only AF1483 and CheA possessed predicted helix–loop–helix structures. (b) Sequence alignment of the kinase domains.

 


Fig. 5. Phylogenetic analysis of the HKs of Arc. fulgidus and Mbc. thermoautotrophicum. Dendrogram showing the analysis of a combined dataset of HKs of E. coli, Arc. fulgidus and Mbc. thermoautotrophicum. HKs of Arc. fulgidus and Mbc. thermoautotrophicum are represented by bold type.

 
Earlier analysis of the Methanobacterium thermoautotrophicum genome identified 15 HKs (Smith et al., 1997 ). The H-box region of 14 of these HKs contained the motif HHRVKNNLQ which was identical to the H-box sequence found in Mycobacterium and Synechocystis. HKs containing this sequence belong to the Type IIIB group. Cluster analysis also showed that the HKs of Mbc. thermoautotrophicum branched within the Type III group forming a clade that was distinct from the E. coli group (Fig. 5). The mean H to N distance of the Mbc. thermoautotrophicum HKs was 110 residues. Mbc. thermoautotrophicum possessed one orthodox HK subtype and did not contain semi-orthodox, minor, CheA, tripartite or hybrid HKs. Finally, most of the HKs were not organized in operons with cognate RRs.

Methanococcus jannaschii (Bult et al., 1996) and Aeropyrum pernix (Kawarabayasi et al., 1999) were previously found to lack HKs. A re-examination of these genomes confirmed that HKs were missing in these organisms. The genome of Pyrococcus horikoshii has been completed recently (Kawarabayasi et al., 1998). The only HK found in this genome was CheA (Table 4). The HK and RR organization in archaea was markedly different from that found in bacteria. Most of the HKs in Arc. fulgidus and Mbc. thermoautotrophicum were not organized in operons with a cognate RR. Only AF0450 of Arc. fulgidus and MTH0902 and MTH0444 of Mbc. thermoautotrophicum (Smith et al., 1997) were located in operons with a cognate RR.


Table 4. Distribution of HK types in archaea

 
The genome sequence of the archaeon Halobacterium sp. was recently completed (Ng et al., 2000 ). Of the 14 reported HKs, we retrieved 12 HKs, 8 of which formed a clade within the Type I group but were distinct from the IA, IB and IC subtypes. These HKs appear to represent a subtype that is so far unique to Halobacterium. Three Type II HKs and one CheA were also identified. Finally, as found in Archaeoglobus, there were more HKs (14) than RRs (6) in Halobacterium.


   DISCUSSION
Phylogenetic analysis of HKs in numerous bacterial and archaeal genomes led to the identification of five major HK types. Of the 336 HKs analysed in this study 92% could be assigned to one of the five HK types. Type I and II HKs possessed orthodox kinase domains while Type III and IV HKs possessed unorthodox kinase domains. The predicted secondary structure of the different H-box and X regions and the H to N distances in the different HKs were characteristic for the various HK types. The fifth HK type was composed of CheA molecules in which the H-box (P1) domain is located at the N terminus of the protein. Based on these findings, several conclusions can be made concerning the HK family of bacteria and archaea. (i) All bacteria sequenced to date, with the exception of mycoplasmas, contain HKs. The HK content was found to increase as the size of the genome increased. In free-living bacteria possessing larger genomes the HK content was relatively high while in pathogenic bacteria possessing smaller genomes, the HK content was relatively low. On the other hand, the HK content in archaea was highly variable with some archaeal genomes completely lacking HKs. (ii) Type I HKs were predominant in bacteria with the content of the different Type I subtypes varying greatly. For example, V. cholerae contained 12 Type IC HKs while Synechocystis lacked these HKs. Similarly, Synechocystis sp. contained 19 Type IB HKs while none was present in B. subtilis. In contrast, Type I HKs were not prevalent in archaea. (iii) Unlike bacterial HKs, archaeal HKs were generally not organized in operons with RRs. Some archaeal genomes possessed significantly more HKs than RRs. (iv) The Gram-positive bacteria analysed in this study contained a relatively high content of Type III HKs. Similarly, Grebe & Stock (1999) found that four of the eight HKs of the Gram-positive bacterium Streptomyces coelicolor belonged to the Type III (HPK7) group. (v) Hybrid HKs were found in bacteria but have not yet been identified in archaea. Interestingly, all known eukaryotic HKs belong to the hybrid (Type IB) HK group (Grebe & Stock, 1999). (vi) Finally, the HKs of Aqu. aeolicus and many of the HKs in Helicobacter, Campylobacter and Halobacterium remain unclassified. As more bacterial genomes are sequenced, new HK types may be established that will encompass these as yet unclassified HKs.

In this study, a genomics approach was taken to analyse the HKs of bacterial and archaeal genomes. A different approach was taken by Grebe & Stock (1999) in which cluster analysis of 348 HKs led to a classification scheme consisting of 11 HPK (histidine protein kinase) types. A primary difference in the respective classification schemes is found in the Type I group which was separated into four different HPK types (HPK 1–4) in the Grebe & Stock (1999) study. For example, the NtrB-related HKs were placed in the HPK 4 group while phylogenetic analysis (Fig. 2) placed these HKs within the Type I (Type IC) group. In addition, Type II HKs were separated into a bacterial group (HPK 5) and an Arc. fulgidus group (HPK 6) by Grebe & Stock (1999). Similarly, the Type III HKs were separated into a bacterial group (HPK 7) and an Mbc. thermoautotrophicum group (HPK 11). HKs that did not cluster within a defined HK group remained unclassified in our study while Grebe & Stock (1999) either did not include these HKs or gathered them into a separate subgroup. Thus, we identified 36 HKs in B. subtilis with YvcQ, YxdK and YtsB remaining unclassified (Table 5), while Grebe & Stock (1999) identified 31 HKs and placed YvcQ, YxdK and YtsB in their own subgroup (HPK3i).

We show that bacteria possessing larger genomes contained several different HK types while archaeal genomes either lacked HKs or possessed an HK family consisting of a specific type. Arc. fulgidus and Mbc. thermoautotrophicum possessed one Type I HK and a large family of either Type II or III HKs, respectively. These findings raise the question of why Type II and III HKs, rather than Type I HKs, have expanded in different archaea. Furthermore, it appears that the different HK types arose in bacteria and were acquired by archaea via lateral gene transfer (Grebe & Stock, 1999). Presumably, Arc. fulgidus acquired a Type II HK gene from one bacterial source while Mbc. thermoautotrophicum acquired a Type III HK from a different bacterium. It is of interest to consider whether different HK types possess distinct functions that allow micro-organisms to exploit specific ecological niches. Biochemical studies have almost exclusively focused on the Type I HKs. A comparison of the biochemical and structural properties of the various HK types may reveal differences that could further our understanding of the role that HKs play in allowing micro-organisms to adapt to specific environmental conditions.


   ACKNOWLEDGEMENTS
 
We are grateful to A. Wolfe and B. Weisblum for their helpful discussions and critical reading of this work. We thank D. Saffarini, C. Wimpee and B. Boylan for their suggestions and discussions during the course of this study.


   REFERENCES
Alex, L. A., Borkovich, K. A. & Simon, M. I. (1996). Hyphal development in Neurospora crassa: involvement of a two-component histidine kinase. Proc Natl Acad Sci USA 93, 3416-3421.[Abstract/Free Full Text]

Appleby, J. L., Parkinson, J. S. & Bourret, R. B. (1996). Signal transduction via the multistep phosphorelay: not necessarily a road less traveled. Cell 86, 845-848.[Medline]

Bilwes, A. M., Alex, L. A., Crane, B. R. & Simon, M. I. (1999). Structure of CheA, a signal-transducing histidine kinase. Cell 96, 131-141.[Medline]

Blattner, F. R., Plunkett, G., Bloch, C. A. & 14 other authors (1997). The complete genome sequence of Escherichia coli K-12. Science 277, 1453–1474.[Abstract/Free Full Text]

Bult, C. J., White, O., Olsen, G. J. & 37 other authors (1996). Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii. Science 273, 1058–1073.[Abstract]

Chang, C., Kwok, S. F., Bleecker, A. B. & Meyerowitz, E. M. (1993). Arabidopsis ethylene-response gene ETR1: similarity of product to two-component regulators. Science 262, 539-544.[Medline]

Chang, W. T., Thomason, P. A., Gross, J. D. & Newell, P. C. (1998). Evidence that the RdeA protein is a component of a multistep phosphorelay modulating rate of development in Dictyostelium. EMBO J 17, 2809–2816.

Davies, R., Devlin, K., Feltwell, T. & 39 other authors (1998). Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence. Nature 393, 537–544.

Deckert, G., Warren, P. V., Gaasterland, T. & 12 other authors (1998). The complete genome of the hyperthermophilic bacterium Aquifex aeolicus. Nature 392, 353–358.

Dutta, R., Qin, L. & Inouye, M. (1999). Histidine kinases: diversity of domain organization. Mol Microbiol 34, 633–640.

Egger, L. A., Park, H. & Inouye, M. (1997). Signal transduction via the histidyl-aspartyl phosphorelay. Genes Cells 2, 167–184.

Fabret, C., Feher, V. A. & Hoch, J. A. (1999). Two-component signal transduction in Bacillus subtilis: how one organism sees its world. J Bacteriol 181, 1975–1983.

Forst, S. A. & Roberts, D. L. (1994). Signal transduction by the EnvZ–OmpR phosphotransfer system in bacteria. Res Microbiol 145, 363–373.

Garzon, A. & Parkinson, J. S. (1996). Chemotactic signaling by the P1 phosphorylation domain liberated from the CheA histidine kinase of Escherichia coli. J Bacteriol 178, 6752–6758.

Grebe, T. W. & Stock, J. B. (1999). The histidine protein kinase superfamily. Adv Microb Physiol 41, 139–227.

Heidelberg, J. F., Eisen, J. A., Nelson, W. C. & 29 other authors (2000). DNA sequence of both chromosomes of the cholera pathogen Vibrio cholerae. Nature 406, 477–483.

Hoch, J. A. (1995). Control of cellular development in sporulating bacteria by the phosphorelay two-component signal transduction system. In Two-Component Signal Transduction, pp. 129–144. Edited by J. A. Hoch & T. J. Silhavy. Washington, DC: American Society for Microbiology Press.

Hoch, J. A. & Silhavy, T. J. (eds) (1995). Two-Component Signal Transduction. Washington, DC: American Society for Microbiology Press.

Kaneko, T., Sato, S., Kotani, H. & 21 other authors (1996). Sequence analysis of the genome of the unicellular cyanobacterium Synechocystis sp. strain PCC6803. II. Sequence determination of the entire genome and assignment of potential protein-coding regions. DNA Res 3, 109–136.

Kawarabayasi, Y., Sawada, M., Horikawa, H. & 29 other authors (1998). Complete sequence and gene organization of the genome of a hyper-thermophilic archaebacterium, Pyrococcus horikoshii OT3. DNA Res 5, 147–155.

Kawarabayasi, Y., Hino, Y., Horikawa, H. & 22 other authors (1999). Complete genome sequence of an aerobic hyper-thermophilic crenarchaeon, Aeropyrum pernix K1. DNA Res 6, 83–101.

Klenk, H.-P., Clayton, R. A., Tomb, J.-F. & 48 other authors (1998). The complete genome sequence of the hyperthermophilic, sulphate-reducing archaeon, Archaeoglobus fulgidus. Nature 390, 364–370.

Kofoid, E. C. & Parkinson, J. S. (1988). Transmitter and receiver modules in bacterial signaling proteins. Proc Natl Acad Sci USA 85, 4981–4985.

Kunst, F., Ogasawara, N., Moszer, I. & 148 other authors (1997). The complete genome sequence of the gram-positive bacterium Bacillus subtilis. Nature 390, 249–256.

Loomis, W. F., Kuspa, A. & Shaulsky, G. (1998). Two-component signal transduction systems in eukaryotic microorganisms. Curr Opin Microbiol 1, 643–648.

Mizuno, T. (1997). Compilation of all genes encoding two-component phosphotransfer signal transducers in the genome of Escherichia coli. DNA Res 4, 161–168.

Mizuno, T., Kaneko, T. & Tabata, S. (1996). Compilation of all genes encoding bacterial two-component signal transducers in the genome of the cyanobacterium, Synechocystis sp. strain PCC 6803. DNA Res 3, 407–414.

Nelson, K. E., Clayton, R. A., Gill, S. R. & 23 other authors (1999). Evidence for the lateral gene transfer between archaea and bacteria from the sequence of Thermotoga maritima. Nature 399, 323–329.

Ng, V. W., Kennedy, P. S., Mahairas, G. G. & 40 other authors (2000). Genome sequence of Halobacterium species NRC-1. Proc Natl Acad Sci USA 97, 12176–12181.

Ota, I. M. & Varshavsky, A. (1993). A yeast protein similar to bacterial two-component regulators. Science 262, 566–569.

Park, H., Saha, S. K. & Inouye, M. (1998). Two-domain reconstitution of a functional protein histidine kinase. Proc Natl Acad Sci USA 95, 6728–6732.

Parkinson, J. S. & Kofoid, E. C. (1992). Communication modules in bacterial signaling proteins. Annu Rev Genet 26, 71–112.

Perraud, A.-L., Weiss, V. & Gross, R. (1999). Signalling pathways in two-component phosphorelay systems. Trends Microbiol 7, 115–120.

Robinson, V. L. & Stock, A. M. (1999). High energy exchange: proteins that make or break phosphoramidate bonds. Struct Fold Des 7, R47–R53.

Schuster, S. S., Noegel, A. A., Oehme, F., Gerisch, G. & Simon, M. I. (1996). The hybrid histidine kinase DokA is part of the osmotic response system in Dictyostelium. EMBO J 15, 3880–3889.

Simpson, A. J. G., Reinach, F. C., Arruda, P. & 113 other authors (2000). The genome sequence of the plant pathogen Xylella fastidiosa. Nature 406, 151–157.

Smith, D. R., Doucette-Stamm, L. A., Deloughery, C. & 34 other authors (1997). Complete genome sequence of Methanobacterium thermoautotrophicum delta H: functional analysis and comparative genomics. J Bacteriol 179, 7135–7155.

Stock, A., Chen, T., Welsh, D. & Stock, J. (1988). CheA protein, a central regulator of bacterial chemotaxis, belongs to a family of proteins that control gene expression in response to changing environmental conditions. Proc Natl Acad Sci USA 85, 1403–1407.

Stock, J. B., Surette, M. G., Levit, M. & Park, P. (1995). Two-component signal transduction systems: structure function relationships and mechanisms of catalysis. In Two-Component Signal Transduction, pp. 25–51. Edited by J. A. Hoch & T. J. Silhavy. Washington, DC: American Society for Microbiology Press.

Stover, C. K., Pham, X. Q., Erwin, A. L. & 28 other authors (2000). Complete genome sequence of Pseudomonas aeruginosa PAO1, an opportunistic pathogen. Nature 406, 959–964.

Suzuki, T., Imamura, A., Ueguchi, C. & Mizuno, T. (1998). Histidine-containing phosphotransfer (HPt) signal transducers implicated in His-to-Asp phosphorelay in Arabidopsis. Plant Cell Physiol 39, 1258–1268.

Tanaka, T., Saha, S. K. & Inouye, M. (1998). NMR structure of the histidine kinase domain of the E. coli osmosensor EnvZ. Nature 396, 88–92.

Tomomori, C., Tanaka, T., Dutta, R. & 12 other authors (1999). Solution structure of the homodimeric core domain of Escherichia coli histidine kinase EnvZ. Nature Struct Biol 6, 729–734.

White, O., Eisen, J. A., Heidelberg, J. F. & 29 other authors (1999). Genome sequence of the radioresistant bacterium Deinococcus radiodurans R1. Science 286, 1571–1577.

Zhou, H. & Dahlquist, F. W. (1997). Phosphotransfer site of the chemotaxis-specific protein kinase CheA as revealed by NMR. Biochemistry 36, 699–710.

Received 13 March 2000; revised 8 January 2001; accepted 25 January 2001.