Sample sequencing of a selected region of the genome of Erwinia carotovora subsp. atroseptica reveals candidate phytopathogenicity genes and allows comparison with Escherichia coli

Kenneth S. Bell1, Anna O. Avrova1, Maria C. Holeva1, Linda Cardle2, Wayne Morris1, Walter De Jong2, Ian K. Toth1, Robbie Waugh2, Glenn J. Bryan2 and Paul R. J. Birch1

Unit of Mycology, Bacteriology and Nematology1 and Unit of Genomics2, Scottish Crop Research Institute, Invergowrie, Dundee DD2 5DA, UK

Author for correspondence: Paul R. J. Birch. Tel: +44 1382 562731. Fax: +44 1382 562426. e-mail: pbirch{at}scri.sari.ac.uk


   ABSTRACT
 
Genome sequencing is making a profound impact on microbiology. Currently, however, only one plant pathogen genome sequence is publicly available and no genome-sequencing project has been initiated for any species of the genus Erwinia, which includes several important plant pathogens. This paper describes a targeted sample sequencing approach to study the genome of Erwinia carotovora subsp. atroseptica (Eca), a major soft-rot pathogen of potato. A large insert DNA (approx. 115 kb) library of Eca was constructed using a bacterial artificial chromosome (BAC) vector. Hybridization and end-sequence data revealed two overlapping BAC clones that span an entire hrp gene cluster. Random subcloning and one-fold sequence coverage (>200 kb) across these BACs identified 25 (89%) of 28 hrp genes predicted from the orthologous hrp cluster of Erwinia amylovora. Regions flanking the hrp cluster contained orthologues of known or putative pathogenicity operons from other Erwinia species, including dspEF (E. amylovora), hecAB and pecSM (E. chrysanthemi), sequences similar to genes from the plant pathogen Xylella fastidiosa, including haemagglutinin-like genes, and sequences similar to genes involved in rhizobacterium–plant interactions. Approximately 10% of the sequences showed strongest nucleotide similarities to genes in the closely related model bacterium and animal pathogen Escherichia coli. However, the positions of some of these genes were different in the two genomes. Approximately 30% of sequences showed no significant similarity to any database entries. A physical map was made across the genomic region spanning the hrp cluster by hybridization to the BAC library and to digested BAC clones, and by PCR between sequence contigs. 
A multiple genome coverage BAC library and one-fold sample sequencing are an effective combination for extracting useful information from important regions of the Eca genome, providing a wealth of candidate novel pathogenicity genes for functional analyses. Other genomic regions could be similarly targeted.

Keywords: draft sequencing, plant pathogen, bacterial genomics, enterobacterium

Abbreviations: BAC, bacterial artificial chromosome; Eca, Erwinia carotovora subsp. atroseptica; Ecc, Erwinia carotovora subsp. carotovora

The GenBank accession numbers for the 424 sequences determined in this work are BH614193 to BH614616.


   INTRODUCTION
 
In recent years, the emergence of complete genome sequences has made a significant impact on many fields of biology. The Institute for Genomic Research (TIGR) (http://www.tigr.org) states that more than 40 complete eubacterial genome sequences have been obtained and made publicly available, with more than 100 others in progress (as of September 2001). Nevertheless, the only complete, published sequence from a plant pathogen is that of Xylella fastidiosa (Simpson et al., 2000). The pathogenicity of this bacterium has been poorly studied, but generation of the full genome sequence has rapidly provided a plethora of candidate genes involved in metabolism and pathogenicity that could not readily have been obtained by other approaches (Simpson et al., 2000). Genome sequencing projects are ongoing for representatives of the major and most widely studied genera of bacterial phytopathogens (Agrobacterium, Ralstonia, Pseudomonas and Xanthomonas) with the notable exception of Erwinia.

The genus Erwinia includes several important phytopathogens such as E. amylovora, cause of fire-blight on apples, and the soft-rot erwinias E. chrysanthemi and E. carotovora (Alfano & Collmer, 1996). Additional interest in erwinias derives from their place in the Enterobacteriaceae and their close relationship with the model bacterium Escherichia coli. This affinity is confirmed by molecular taxonomy: Erwinia carotovora and Escherichia coli share greater than 95% 16S rDNA sequence identity (Hauben et al., 1998). With the entire E. coli genome sequence available (Blattner et al., 1997), and ongoing sequencing of several more enterobacterial species, there is now an excellent opportunity for comparisons of genome structure and content.

E. carotovora subspecies atroseptica (Eca) and E. carotovora subsp. carotovora (Ecc) are economically important pathogens of crops, particularly potato, worldwide. Whereas Eca has a host-range restricted to potato, on which it causes both soft-rot of tubers and stem rot (blackleg), Ecc does not cause blackleg but elicits soft-rot on a wide range of hosts (Pérombelon & Salmond, 1995). E. carotovora degrades plant cell walls using a variety of extracellular enzymes (particularly pectic enzymes), secreted via the type I and type II pathways (Py et al., 1998). This ability, along with phenotypic and molecular phylogenetic heterogeneity in the genus, has led to proposals to reclassify the soft-rot Erwinia spp. as Pectobacterium (Hauben et al., 1998). Soft-rot can be considered a rather crude and opportunistic form of pathogenesis but the discovery of type III protein secretion systems and of diverse regulatory mechanisms controlling extracellular enzyme synthesis in both E. chrysanthemi and E. carotovora shows that soft-rot pathogenesis is more complex than previously thought (Bauer et al., 1994; Barras et al., 1994; Mukherjee et al., 1997). The type III secretion system, encoded by hrp genes, is thought to allow the delivery of proteins directly into host cells via a pilus-like structure (Galán & Collmer, 1999). The function of hrp genes in E. carotovora remains unclear.

Almost all genes known to influence virulence in E. carotovora have been identified by either direct gene cloning or transposon mutagenesis, and demonstration of their role in virulence has largely depended on simple plant assays, usually involving stem or tuber inoculation tests (Andersson et al., 1999 ; Hinton et al., 1989 ; Pérombelon & Salmond, 1995 ). Although this approach has been very successful over the past 15 years, there are many aspects of the plant–Erwinia interaction that are poorly understood. This is partly due to the limitations of the conventional routes of analysis. Relatively little is understood about the in planta behaviour of E. carotovora in terms of the genes involved in establishing infection, or the control of gene expression and temporal order of induction or repression of specific virulence genes and their regulators later in infection.

Genome sequencing offers an alternative approach to better understanding of Erwinia pathogenicity. However, the expense and resources required for whole-genome sequencing projects are often prohibitive. By the nature of the strategies required (Frangeul et al., 1999 ), a large proportion of the cost and effort will inevitably be directed to generation of redundant sequence information (from multiple coverage random sequencing) and gap closure. In this study we used a sample-sequencing strategy for gene discovery that is focused on a particular region of the Eca genome thought to contain putative pathogenicity genes, and which is contained on two large, overlapping DNA fragments cloned into a bacterial artificial chromosome (BAC) vector. Although sample sequencing from whole bacterial genomes has been described previously (e.g. McLelland & Wilson, 1998 ; Viprey et al., 2000 ), sample sequencing of a targeted region of a bacterial genome has not. The contiguous BAC clones were chosen as they hybridize to the hrpN gene from Ecc, and should thus span a hrp gene cluster. We address the following questions. 1. Based on the structure of hrp gene clusters sequenced in other species (Galán & Collmer, 1999 ), does one-fold coverage sample sequencing of the BAC clones provide data on all predicted ORFs from the Eca hrp gene cluster? 2. What genes flank the Eca hrp gene cluster and how are they ordered? 3. Is one-fold sample sequencing of contiguous BACs a cost-effective route to provide sequence information from a high percentage of ORFs that may feed directly into downstream gene functional analyses? 4. What comparisons can be made between the genomes of Eca and the model bacterium and animal pathogen E. coli?


   METHODS
 
General.
Molecular biological techniques were carried out according to standard methods (Sambrook et al., 1989 ). BAC clones were grown in Luria–Bertani broth (LB), or on LB agar with IPTG and X-Gal (LBIX), and 12·5 µg chloramphenicol ml-1. Subclones were grown in LBIX or terrific broth (TB) with ampicillin (100 µg ml-1). Oligonucleotide primers were obtained from MWG Biotech and are listed in Table 1.


 
Table 1. Oligonucleotide primers used for sequencing and PCR

 
The strain of Eca used throughout was Eca SCRI 1039, which has been extensively used in epidemiological and molecular studies at SCRI (e.g. Bain et al., 1990 ; Dellagi et al., 2000a , b ).

High-molecular-mass Eca DNA preparation.
A 30 ml culture of Eca was grown to an OD600 of 0·9 in LB at 27 °C with shaking (300 r.p.m.). The culture was cooled on ice for 15 min and cells were harvested by centrifugation at 2500 r.p.m. for 10 min at 4 °C. Cells were resuspended in 4 ml buffer 1 (200 mM NaCl, 10 mM Tris/HCl pH 7·2, 100 mM EDTA), harvested and resuspended in 1 ml buffer 2 (200 mM NaCl, 10 mM Tris/HCl pH 7·2, 10 mM EDTA). Cells were incubated at 37 °C for 5 min then 42 °C for 5 min before 1·5 ml of molten 1·5% LMP agarose in distilled H2O equilibrated to 42 °C was added and mixed. This was aliquoted into 100 µl plug moulds and cooled to room temperature. Plugs were removed and incubated in 20 ml GET (50 mM glucose, 10 mM EDTA, 25 mM Tris/HCl pH 8·0, 2·5 mg lysozyme ml-1) at 4 °C for 24 h. GET was replaced with 50 ml EPS (0·5 M EDTA pH 9·2, 1% w/v Sarcosyl, 1 mg proteinase K ml-1) at 50 °C and incubated for 24 h at 50 °C with gentle shaking, washed in EPS and incubated for a further 24 h at 50 °C. Plugs were washed twice in 10 mM Tris/HCl pH 8·0, 10 mM EDTA, 1 mM PMSF for 1 h at room temperature, four times in 30 ml 10 mM Tris/HCl pH 8·0, 10 mM EDTA at 50 °C, once in 30 ml 0·5 M EDTA pH 9·2 at 50 °C for 1 h and once in 0·05 M EDTA pH 8·0 for 1 h at 4 °C. Plugs were stored in 0·05 M EDTA pH 8·0 at 4 °C until use.

BAC library construction.
Five 100 µl plugs were chopped to a slurry with a clean razor blade. The slurry was washed with 1 ml 0·1% Triton X-100, centrifuged briefly and the supernatant removed. To 50 µl slurry were added 7 µl 40 mM spermidine, 7 µl 10x endonuclease buffer and 0·7 µl 10 mg BSA ml-1 followed by incubation for 30 min on ice. HindIII enzyme (0·05 U per tube in 5 µl) was added and the samples were incubated for 30 min on ice then 30 min at 37 °C. The reaction was stopped with 7 µl 0·5 M EDTA pH 8·0. The entire sample was loaded on a 1% w/v low-melting-point (LMP) agarose gel and electrophoresed on a Bio-Rad CHEF Mapper for 18 h, 6 V cm-1, 20 s switch time and 120° included angle at 11 °C, alongside suitable high-molecular-mass markers. Gel slices containing DNA fragments >100 kb were excised from the adjacent lanes (without exposure to ethidium bromide or UV) and washed three times in TE buffer for 30 min at 4 °C. Gel slices (100 mg) were incubated in 1 ml TES buffer (TE containing 50 mM NaCl) for 1 h. TES was removed and the gel slices melted at 65 °C for 10 min, then 40 °C for 10 min. One microlitre of ß-agarase (New England Biolabs) was added and the mix was incubated at 40 °C for a further 1 h. DNA concentration was estimated by comparison with lambda concentration standards. Finally, Erwinia DNA was ligated into HindIII-digested, dephosphorylated pBeloBAC11 (Kim et al., 1996 ) (100 ng DNA, 10 ng vector, 1x ligation buffer [including ATP], 1·33 U DNA ligase, volume 100 µl) overnight at 12 °C. Ligations were desalted by drop dialysis and 1 µl aliquots were electroporated into electrocompetent E. coli DH10B cells (prepared by standard methods) using a Bio-Rad E. coli Pulser. Transformants were diluted 1:20 with SOC (Sambrook et al., 1989 ) and shaken gently at 37 °C for 1 h prior to spread-plating on LBIX. After overnight incubation at 37 °C, recombinant (white) colonies were picked and insert sizes estimated by CHEF gel analysis after digestion with NotI. 
Clones were transferred to two 384-well microtitre plates with 70 µl freezing medium (LB with 36 mM K2HPO4, 13·2 mM KH2PO4, 1·7 mM sodium citrate, 0·4 mM MgSO4, 6·8 mM (NH4)2SO4, 4·4%, v/v, glycerol), grown overnight at 37 °C and stored at -80 °C.
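
The depth of a library of this kind can be estimated with simple arithmetic. The following Python sketch is illustrative only: it assumes that both 384-well plates were completely filled (768 clones) and that the Eca genome is similar in size to that of E. coli (4·6 Mb); neither value is reported in this section, and the function names are hypothetical.

```python
# Sketch only: clone count and genome size are assumptions, not reported values.


def library_coverage(n_clones: int, mean_insert_kb: float, genome_kb: float) -> float:
    """Fold coverage = total cloned DNA divided by genome size."""
    return n_clones * mean_insert_kb / genome_kb


def prob_locus_cloned(n_clones: int, mean_insert_kb: float, genome_kb: float) -> float:
    """Clarke-Carbon estimate that a given locus lies in at least one clone."""
    return 1.0 - (1.0 - mean_insert_kb / genome_kb) ** n_clones


N_CLONES = 2 * 384        # assumption: both 384-well plates completely filled
MEAN_INSERT_KB = 115.0    # mean insert size reported for the library
GENOME_KB = 4600.0        # assumption: genome size similar to E. coli (4.6 Mb)

coverage = library_coverage(N_CLONES, MEAN_INSERT_KB, GENOME_KB)
p_locus = prob_locus_cloned(N_CLONES, MEAN_INSERT_KB, GENOME_KB)
print(f"estimated coverage: {coverage:.1f}-fold; P(locus cloned) = {p_locus:.6f}")
```

Under these assumptions the library would represent the genome many times over, so any single-copy locus would be expected to occur on several independent clones.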

Hybridization screening of the BAC library.
BAC library clones were transferred to two nylon membranes (saturated with LB) using a sterile 384-pin plastic replicator. The membranes were then placed on LB agar, incubated at 37 °C overnight to allow colony growth and DNA was blotted by standard methods (Sambrook et al., 1989 ).

The primers hrpN-1 and hrpN-2 were selected from the Ecc hrpN gene (GenBank accession no. L78834, 679 bp to 1022 bp). Other primer pairs were selected from BAC end or subclone sequences generated in this study. PCRs were performed using either Ecc (SCRI 193) or BAC clones 2B8 or 1C22 DNA as template, Boehringer Taq polymerase and reaction mix with a thermal profile of 94 °C for 2 min followed by 35 cycles of 94 °C for 30 s, 61·4 °C for 30 s and 72 °C for 1 min. PCR products were purified (Promega Wizard PCR Prep), 32P-labelled using High-Prime kit (Boehringer), purified through a Nick Column (Pharmacia) and hybridized under high stringency (Sambrook et al., 1989) to the BAC library.

End-sequencing BAC clones.
BAC DNA for end-sequencing was obtained using the method of Kelley et al. (1999) . Sequencing reactions were performed using a Big Dye Sequencing Kit (Perkin Elmer) with approx. 0·5 µg BAC DNA, 2 pmol primer (T7 or SP6) and 4 mM added MgCl2 per reaction. The thermal profile was 98 °C for 2 min followed by 100 cycles of 96 °C for 30 s, 50 °C for 20 s and 60 °C for 4 min.

BAC subcloning.
BAC DNA was prepared from 4 l of culture using large-scale plasmid preps (Qiagen) according to the manufacturer’s instructions. The DNA was incubated overnight with PlasmidSafe (Cambiolab), using at least 20 µl enzyme per 100 µg DNA, to remove E. coli chromosomal DNA. Nebulization was performed using 30–50 µg DNA in 2 ml aliquots in a pre-chilled ‘Sidestream’ Nebuliser (Medic-Aid) on ice. Two samples were nebulized, one at 1 bar for 90 s in 10% glycerol, 10 mM Tris/HCl pH 8, 5 mM NaCl and the other at 1 bar for 20 s in 10 mM Tris/HCl pH 8, 5 mM NaCl using H2 or N2 gas. Nebulized DNA was transferred to 2 ml microtubes, precipitated with 2-propanol, redissolved in 250 µl 1x T4 DNA polymerase buffer containing 50 U T4 DNA polymerase and 500 nM dNTPs, and incubated for 30 min at room temperature for end repair. Following electrophoresis, gel slices containing DNA of the desired size fractions were excised and digested with Gelase (Cambiolab) overnight, according to the manufacturer’s instructions. Following sample extraction with phenol/chloroform and 2-propanol precipitation, samples were phosphorylated using T4 kinase according to the manufacturer’s instructions (Gibco). DNA samples were phenol/chloroform extracted twice, ethanol precipitated, dissolved in 30–50 µl TE and quantified by comparison with standards. Alternatively, DNA was fragmented by partial digestion using Sau3AI (approx. 40 ng BAC DNA with 1 U enzyme for 30 min). DNA fragments of 600–800 bp were size-fractionated for cloning.

A high-copy-number cloning vector, pGEM3zf (Promega), was digested with HincII or BamHI (for insertion of sheared or Sau3AI-digested fragments respectively) and then dephosphorylated with shrimp alkaline phosphatase (New England Biolabs). Ligations were performed using 20–50 ng vector with 20–100 ng of insert DNA and 1 µl aliquots were used to transform electro-competent E. coli DH10B cells as described previously. Recombinant colonies were individually transferred to 1·3 ml TB in a 96-deep-well block. Blocks were covered with a gas-permeable seal and shaken for approximately 22 h prior to harvesting cells (3000 g for 7 min). The supernatant was removed and plasmid DNA was extracted by alkaline lysis using a Biomek 2000 robot, according to the University of Oklahoma Advanced Center for Genome Technology (http://www.genome.ou.edu/Biomek2000_dsisol_v1.html). DNA was dissolved in 50 µl H2O and 1 or 2 µl aliquots were used in sequencing reactions (Big Dye sequencing kit, Perkin Elmer) with SP6 or T7 primers (protocol at http://www.genome.ou.edu/big_dyes_plasmid.html). Reaction products were analysed using an ABI Prism 377 DNA Sequencer with 96-lane upgrade, according to the manufacturer’s instructions. Sequences were edited to remove vector sequence and regions of poor quality and searched using BLASTX or BLASTN (Altschul et al., 1997 ) against the GenBank ‘nr’ database (http://www.ncbi.nlm.nih.gov/).

Investigating the presence of sequences 95–100% similar to E. coli by PCR.
Primers were chosen from 20 subclone sequences that were 95–100% similar at the nucleotide level to E. coli (Table 1). Each primer pair was tested by PCR (as previously but with an annealing temperature of 60 °C or 64 °C) using either Eca or E. coli cells to provide DNA template.


   RESULTS
 
Hybridization screening and end-sequencing of BAC clones reveals a putative hrp gene cluster in Eca
To facilitate gene discovery and investigations of genome structure in Eca, a BAC library was made with a mean insert size of 115 kb. BAC end-sequences are being used to design hybridization probes to construct a physical map of the genome (data not shown). Amongst the probes was a portion of the hrpN gene from Ecc, PCR amplified from Ecc genomic DNA using primers shown in Table 1. The hrpN probe hybridized to 15 clones in the BAC library, including 2B8 and 1C22, the insert sizes of which are approximately 112 kb and 90 kb respectively. Of two probes derived from each end-sequence of 2B8, only 2B8SP6 hybridized to both 2B8 and 1C22. Similarly, only the 1C22T7 probe hybridized to both 1C22 and 2B8. The 2B8SP6 and 1C22T7 end-sequences are homologous to dspE and hrcJ respectively, both of which lie in the E. amylovora hrp–dsp cluster (see Table 2). This suggests that 2B8 and 1C22 span an entire hrp–dsp cluster, overlapping each other by approximately 9 kb (based on the distance between dspE and hrcJ in the E. amylovora hrp–dsp gene cluster) and thus predicted to cover almost 200 kb of the Eca genome. To test whether the BACs do indeed span the Eca hrp–dsp gene cluster, one-fold coverage sample (draft) sequencing of each BAC clone (i.e. 112 kb for 2B8 and 90 kb for 1C22) was conducted.
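
The predicted span of the two overlapping clones follows directly from the insert sizes and the estimated overlap. A minimal Python sketch of that arithmetic is given here; the function name is hypothetical and the inputs are the approximate values quoted above.

```python
def combined_span_kb(insert_a_kb: float, insert_b_kb: float, overlap_kb: float) -> float:
    """Length of genome covered by two clones that overlap by overlap_kb."""
    return insert_a_kb + insert_b_kb - overlap_kb


# Approximate values from the text: 2B8 (112 kb), 1C22 (90 kb), overlap (9 kb).
span_kb = combined_span_kb(112, 90, 9)
print(f"predicted span: {span_kb} kb")
```

The result, 193 kb, is the basis for describing the region as almost 200 kb.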


 
Table 2. hrp–dsp gene-like sequences from Eca BAC clones 2B8 and 1C22

 
Sample sequencing of BAC clones 2B8 and 1C22
Following removal of vector sequences and BLASTN searches against the GenBank ‘nr’ database, some sequences showed 95–100% nucleotide identity to E. coli sequences. PCR using primers designed to anneal to 11 such sequences gave the expected amplification products when using E. coli DNA as template but not when using Eca DNA. This indicates that these sequences were contaminating E. coli genomic DNA rather than cloned Eca DNA. All sequences showing 95–100% nucleotide identity to E. coli were thus eliminated from further analyses.
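
The elimination step described above amounts to a threshold filter on the best BLASTN hit for each read. The Python sketch below illustrates one way such a filter could be written; the record layout, field names and example reads are hypothetical and are not taken from the study, and the 95% cut-off is the only value drawn from the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BestHit:
    """Hypothetical summary of the best BLASTN hit for one subclone read."""
    read_id: str
    organism: str
    percent_identity: float


def is_probable_contaminant(hit: BestHit, threshold: float = 95.0) -> bool:
    """Flag reads whose best hit is E. coli at or above the identity threshold."""
    return hit.organism == "Escherichia coli" and hit.percent_identity >= threshold


def remove_contaminants(hits: List[BestHit]) -> List[BestHit]:
    """Keep only reads that are not flagged as probable host contamination."""
    return [h for h in hits if not is_probable_contaminant(h)]


# Toy records for illustration only; these are not data from the paper.
toy_hits = [
    BestHit("read_001", "Escherichia coli", 99.2),
    BestHit("read_002", "Escherichia coli", 81.0),
    BestHit("read_003", "Erwinia amylovora", 88.5),
]
kept = remove_contaminants(toy_hits)
print([h.read_id for h in kept])
```

A read with a weaker E. coli match (here 81%) is retained, which corresponds to the distinction drawn in the text between contaminating host DNA and genuine Eca sequences that resemble E. coli genes.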

The remaining 364 sequences from 2B8, yielding a total of 118150 bp, and 256 sequences from 1C22, yielding a total of 99537 bp, represented a slight excess of the target one-fold coverage in each case. The sequences were assembled into contigs using CAP3 software (http://genome.cs.mtu.edu/cap/cap3.html). Only 84 sequences did not fall into contigs (singletons). In contrast, 536 sequences fell into 143 contigs, which ranged in size from 52 bp to 2346 bp, with a mean size of 651 bp. Taken together, the contigs and singletons comprise 119325 bp of independent sequence, approximately 60% of the predicted region covered by the BAC clones.
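
The figures in this paragraph can be cross-checked with a short calculation. The Python sketch below recomputes the raw coverage and the fraction of the region represented by independent sequence, and compares the latter with the fraction expected from an idealized random-read model (1 − e^−c for c-fold coverage, as in Lander–Waterman-type estimates). The 200 kb region size is the approximate value used in the text; the comparison with the idealized model is an illustration added here, not an analysis reported by the authors.

```python
import math

READS_2B8, BP_2B8 = 364, 118150
READS_1C22, BP_1C22 = 256, 99537
SINGLETONS, READS_IN_CONTIGS = 84, 536
INDEPENDENT_BP = 119325
REGION_BP = 200_000  # approximate size of the region spanned by the two BACs

total_reads = READS_2B8 + READS_1C22
total_bp = BP_2B8 + BP_1C22

# Raw fold coverage of the region by all reads combined.
fold_coverage = total_bp / REGION_BP

# Fraction of the region represented by non-redundant sequence.
observed_fraction = INDEPENDENT_BP / REGION_BP

# Idealized expectation for uniformly random reads at this coverage.
expected_fraction = 1.0 - math.exp(-fold_coverage)

print(f"reads: {total_reads}, total bp: {total_bp}")
print(f"fold coverage: {fold_coverage:.2f}")
print(f"observed fraction: {observed_fraction:.2f}, expected: {expected_fraction:.2f}")
```

The read counts are internally consistent (84 singletons plus 536 assembled reads equals 620), and the observed fraction of about 60% is of the same order as the roughly 66% predicted by the idealized model at this coverage.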

BLASTX/N searches were conducted on all contigs and singletons, and these sequences were compiled into categories according to the organism providing the strongest database match (Fig. 1). Sequences were categorized as ‘no match’ if the BLAST search found no similar sequences. The ‘no match’ category would be expected to include non-coding sequences in addition to novel ORFs. Perhaps unsurprisingly, given that the locus contains a putative hrp gene cluster, similarities to Erwinia genes constituted a significant percentage of the sequences from each BAC. In addition, a high percentage of sequences from each BAC showed no similarities to database sequences. The majority of strong matches to sequences from enterobacteria, including E. coli, were found on 2B8. Conversely, the majority of sequences with strongest similarities to genes in X. fastidiosa, the only plant pathogen to be completely sequenced, were found on 1C22 (Fig. 1).
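
The categorization described above can be expressed as a simple tally over best-match organisms, with unmatched sequences assigned to a ‘no match’ class. The following Python sketch shows the idea; the sequence identifiers and assignments are invented for illustration and do not reproduce the data in Fig. 1.

```python
from collections import Counter
from typing import Dict, Optional

NO_MATCH = "no match"


def categorize(best_match_organism: Optional[str]) -> str:
    """Return the organism of the strongest match, or 'no match' if none."""
    return best_match_organism if best_match_organism else NO_MATCH


def tally(best_matches: Dict[str, Optional[str]]) -> Counter:
    """Count sequences per best-match category."""
    return Counter(categorize(org) for org in best_matches.values())


# Toy input for illustration only; not data from the study.
toy_matches = {
    "contig_A": "Erwinia amylovora",
    "contig_B": None,
    "singleton_C": "Xylella fastidiosa",
    "contig_D": "Erwinia amylovora",
}
counts = tally(toy_matches)
for category, n in counts.most_common():
    print(category, n)
```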



 
Fig. 1. Subcloned sequences from Eca BAC clones 2B8 and 1C22 categorized according to the origin of their most similar matching sequence by BLASTX search against the GenBank ‘nr’ database.

 
Sequences showing similarities to Erwinia genes
The majority of Erwinia-like sequences show significant similarity to genes in the completely sequenced hrp–dsp gene cluster of E. amylovora. However, in some cases the closest similarity was to hrp–dsp genes in other Erwinia species, e.g. to the recently published partial sequence of the hrp cluster from Ecc (Rantakari et al., 2001 ). All E. amylovora hrp and dsp genes are listed in Table 2, in the order in which they occur in this species (deduced from GenBank accession numbers and Bogdanove et al., 1998 ), along with details of the orthologous sequences obtained from Eca BAC clones 2B8 and 1C22.

In addition to hrp–dsp genes, sequences similar to Erwinia hec, pec, pel and peh genes were found (Table 3). In E. chrysanthemi, the hecAB genes are located within the hrp cluster (Kim et al., 1998 ). PCR with primers designed to anneal to each of the hec-like Eca sequences generated amplification products from 1C22 but not from 2B8. As the entire hrp cluster resides on 2B8, it thus appears that the hecAB genes of Eca are located outside the hrp cluster and are therefore arranged differently in Eca and E. chrysanthemi.


 
Table 3. Non-hrp sequences from BAC clones 2B8 and 1C22 with similarities to genes from Erwinia or other plant-associated microbes, or encoding putative pathogenicity factors

 
Sequence similarities to genes from other plant-associated microbes or to genes with a likely role in pathogenesis
Details of sequences similar to genes from other plant-associated microbes, or with a possible role in pathogenesis, are shown in Table 3. These include sequences similar to rhizobacterial genes involved in catabolism of opines (chrysopine, agropine, mannopine and rhizopine). In addition, sequence similarities were observed to two att genes (involved in attachment to plant cells; Matthysse et al., 2000 ) from Agrobacterium tumefaciens. In most cases, both opine-catabolism-like and att-like genes also show strongest similarities to ABC-transporter-like genes in P. aeruginosa, with the exception of attK-like and mocR-like sequences.

In addition to the hecAB orthologues, there are other sequences from 1C22 with similarity to adhesin, haemagglutinin and haemolysin genes from mammalian pathogens such as Neisseria meningitidis and Pseudomonas aeruginosa, and the plant pathogen X. fastidiosa (Simpson et al., 2000).

Sample sequencing allows a comparison of genome structure between Eca and E. coli
A number of Eca sequences show strongest similarities to E. coli ORFs. These sequences, and their position in the E. coli genome, are shown in Table 4. The E. coli genome (4·6 Mb) has been divided into 100 minutes. Each minute thus represents approximately 46 kb. Eca sequence similarities were observed to genes throughout the E. coli genome, from 23 to 99 min. Assuming a similar genome size, we would predict the region of the Eca genome covered by the two overlapping BAC clones to be no more than 5 min. These results suggest rearrangement between the genomes. Nevertheless, the majority of E. coli-like sequences in 2B8 reside between 27 and 36·5 min on the E. coli chromosome. In particular, the ORFs ychM, chaA, chaC and goaG are adjacent to nar genes (involved in nitrite/nitrate metabolism) in both Eca and E. coli, suggesting some conservation in gene order between these enterobacteria. In addition, ORFs b1598, ydgB, rstA and rstB are all clustered between 36 and 36·3 min in the E. coli genome and also appear to be clustered in the Eca genome.
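
The conversion between map minutes and physical distance used in this argument can be made explicit. The Python sketch below uses the genome size and map length quoted in the text, with the approximate 200 kb region size; the function names are hypothetical.

```python
ECOLI_GENOME_KB = 4600.0   # 4.6 Mb, as stated in the text
MAP_MINUTES = 100.0        # the E. coli map is divided into 100 min


def kb_per_minute(genome_kb: float = ECOLI_GENOME_KB, minutes: float = MAP_MINUTES) -> float:
    """Physical length represented by one map minute."""
    return genome_kb / minutes


def kb_to_minutes(length_kb: float) -> float:
    """Convert a physical length to map minutes on the E. coli scale."""
    return length_kb / kb_per_minute()


region_min = kb_to_minutes(200.0)   # approximate region spanned by 2B8 and 1C22
observed_range_min = 99 - 23        # range over which E. coli matches were found

print(f"{kb_per_minute():.0f} kb per min; 200 kb = {region_min:.1f} min")
print(f"matches span {observed_range_min} min of the E. coli map")
```

A 200 kb region corresponds to fewer than 5 min, whereas the matching E. coli genes are spread over 76 min, which is the discrepancy that suggests rearrangement.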


 
Table 4. Sequences from Eca BAC clones 2B8 and 1C22 with closest similarity to E. coli sequences

 
A physical map across the 1C22/2B8 region of Eca
To provide a physical map of the order of sequences flanking the hrp–dsp gene cluster in Eca, five sequences specific to 1C22 and six specific to 2B8 were used as probes. The probes were sequences similar to pelB, hecA, pehA, a probable haemagglutinin gene from P. aeruginosa and a haemagglutinin-like protein gene from X. fastidiosa (all from 1C22) and sequences similar to hrcU, pecS, mocR, narX, agtA and a chitin synthase gene (all from 2B8). Each sequence was PCR amplified (using primers shown in Table 1) then probed to the BAC library and to Southern blots of either 1C22 or 2B8 DNA (restriction-digested with BamHI, BglI, EcoRI, EcoRV, HindIII, KpnI, PstI, SalI or XhoI). Hybridizations to individual BAC clones were confirmed by PCR with each gene in turn, using each relevant BAC clone as a template. The order of these sequences across the genomic region spanned by 2B8 and 1C22 was determined by examining the differential hybridization patterns shown by the different probes (Fig. 2A).



 
Fig. 2. (A) A physical map of the 200 kb region of Eca spanned by BAC clones 1C22 and 2B8. Names across the top of this region refer to sequences that were PCR amplified using primers in Table 1 and used as probes to hybridize both to the BAC library and to Southern blots of restriction-digested 1C22 or 2B8 (as appropriate). Some of these names are standard gene names but in addition, hem refers to a sequence similar to a probable haemagglutinin gene from P. aeruginosa; xhem to a sequence similar to X. fastidiosa haemagglutinin-like genes and chi to a sequence similar to a chitin synthase gene from Aspergillus fumigatus (all in Table 3). The positions of these sequences show their relative order across this region of the genome but not the distances between them. The relative positions of BAC clones (1C22, 2B8, 1J17, 1I1, 2C21, 1A24 and 2H1) are shown beneath the 200 kb region of Eca. (B) The relative positions and sizes of each of the hrp, dsp and hec genes are shown, as determined from the content of sequence contigs (Tables 2 and 3) and by PCR between contigs (amplification products P1–P12 are indicated beneath the diagram of the hrp–dsp cluster) using primers indicated in Table 1. P13–P16 represent unidirectional sequence reads from dspF to hecB using a series of primers in Table 1. PCR products P3 and P8 were sequenced to confirm the presence of hrpO and hrpA/hrpB respectively.

 
Of the 30 genes in the E. amylovora hrp–dsp cluster, 27 Eca orthologues were identified by sample sequencing. Assembling the sequences into contigs revealed, in many cases, that hrp–dsp gene order was conserved between E. amylovora and Eca (e.g. see contigs 36 and 99, Table 2). To assess whether the order of all hrp–dsp genes was conserved between these species, primers designed to anneal to each of the Eca hrp and dsp genes were used to PCR amplify between each of the sequence contigs. In all cases, PCR amplification products of the expected size were generated, indicating that gene order was conserved in the E. amylovora and Eca clusters (Fig. 2B). Only three Eca hrp orthologues, hrpO, hrpA and hrpB, were not identified by sample sequencing (Table 2). PCR amplification products spanning the regions predicted to contain these ‘missing’ ORFs were directly sequenced, and this revealed that each ORF was indeed present and in the expected position relative to flanking genes. In addition a sequence walk-out from dspF revealed that hecB was adjacent (Fig. 2B). The hecAB genes were determined to be adjacent to each other because sequence reads showing strong similarities to each of these genes were obtained from opposite ends of a single 1C22 subclone (Table 3).


   DISCUSSION
 
Physical mapping and one-fold sample sequencing reveal an intact hrp–dsp gene cluster and other candidate pathogenicity genes in Eca
Many virulence genes in E. carotovora have been isolated by transposon mutagenesis, e.g. those involved in plant cell wall degradation, enzyme regulation and export (Andersson et al., 1999 ; Hinton et al., 1989 ; Pérombelon & Salmond, 1995 ; Py et al., 1998 ). Alterations to the virulence phenotype are then assessed using simple plant assays involving inoculations of stems or tubers, often with artificially high concentrations of bacterial cells (Pérombelon & Salmond, 1995 ). However, the molecular bases for establishing an infection (where more subtle interactions involving lower numbers of bacteria may be taking place) or of host-range between closely related erwinias, are poorly understood and do not readily lend themselves to such approaches. The primary aim of this work was thus to present an alternative approach to gene discovery in Erwinia spp. by demonstrating that the combination of physical mapping, using a BAC library, and targeted sample sequencing was an effective way to identify novel candidate pathogenicity genes from the potato pathogen Eca.

An Eca BAC library was made, and is being used to generate a physical map of the Eca genome (results not shown). A region of two overlapping BACs, predicted to span an entire hrp–dsp gene cluster, was identified by a combination of BAC end sequencing and hybridization probing. The hrp–dsp gene cluster in E. amylovora has been sequenced and comprises 30 ORFs, covering approximately 30 kb of DNA. Despite the independent sequence coverage obtained after contiging (119325 bp) only representing 60% of the region spanned by the BACs, 27 (90%) of the predicted 30 Eca hrp–dsp ORFs were identified. Thirteen of these ORFs (approx. 9 kb) are included in the overlap between the BACs, where two-fold sequence coverage would be expected. Nevertheless, one-fold sequencing of 2B8 alone still revealed 25 (86%) of the 29 hrp–dsp genes on this BAC. The three ORFs that were not initially identified, hrpA, hrpB and hrpO, were relatively small (<446 bp) and thus less likely to be detected by one-fold sample sequencing. PCR amplification across the regions predicted to contain these ORFs revealed that they were indeed present in Eca. Extrapolation of this PCR strategy facilitated a detailed map of the Eca hrp–dsp gene cluster and showed that, whereas the Eca cluster is similar in content and arrangement to that of E. amylovora, it is different to that in the more closely related soft-rot potato pathogen E. chrysanthemi. Moreover, it is different to the hrp–dsp cluster of E. herbicola, which lacks the hrpW and hrpW-chaperone genes identified here (Mor et al., 2001 ). In addition, genes encoding two Hrp-dependent effector molecules identified in this work, dspE and hrpW, have not yet been observed in other soft-rot erwinias. The conservation of the hrp–dsp gene clusters between E. amylovora and Eca has thus provided a useful test of the number of ORFs that may be detected and identified by low-level draft sequencing of overlapping BAC clones.
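
The observation that small ORFs are more likely to be missed at one-fold coverage can be illustrated with a simple random-read model. The Python sketch below is not the authors' analysis: it assumes reads start uniformly at random across the 2B8 insert, ignores edge effects and the additional coverage within the BAC overlap, and assumes that a read must overlap an ORF by at least 100 bp to yield a recognizable database match (a hypothetical threshold). Read number and mean read length are derived from the 2B8 totals reported in the Results.

```python
def detection_probability(orf_bp: float, n_reads: int, read_bp: float,
                          region_bp: float, min_overlap_bp: float = 1) -> float:
    """P(at least one random read overlaps the ORF by >= min_overlap_bp)."""
    # Number of read start positions giving a sufficient overlap with the ORF.
    window = orf_bp + read_bp - 2 * min_overlap_bp + 1
    if window <= 0:
        return 0.0
    p_single = min(1.0, window / region_bp)
    return 1.0 - (1.0 - p_single) ** n_reads


N_READS_2B8 = 364
MEAN_READ_BP = 118150 / 364   # mean read length on 2B8, about 325 bp
REGION_BP = 112_000           # approximate 2B8 insert size
MIN_OVERLAP_BP = 100          # assumption: overlap needed for a recognizable match

p_small = detection_probability(446, N_READS_2B8, MEAN_READ_BP, REGION_BP, MIN_OVERLAP_BP)
p_large = detection_probability(3000, N_READS_2B8, MEAN_READ_BP, REGION_BP, MIN_OVERLAP_BP)

print(f"P(detect 446 bp ORF)  = {p_small:.2f}")
print(f"P(detect 3000 bp ORF) = {p_large:.2f}")
print(f"27/30 = {27 / 30:.0%}; 25/29 = {25 / 29:.0%}")
```

Under these assumptions an ORF of about 450 bp has a noticeably lower chance of being sampled than one of several kilobases, which is consistent with the three undetected ORFs all being small.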

In addition to the hrp and dsp genes, pel, peh, pec and hec genes were identified. E. chrysanthemi is the only Erwinia species previously reported to possess pecSM genes, which encode repressors of pel and cel gene expression (Praillet et al., 1997) and inducers of peh gene expression (Nasser et al., 1999). The hecAB genes have also only previously been observed in E. chrysanthemi; they are of unknown function but show strong sequence similarity to haemolysin genes and are predicted to be co-regulated with operons involved in pectinolysis (Kim et al., 1998). A haemolysin-like gene in Pseudomonas putida is involved in adhesion of the bacterium to plant seeds (Espinosa-Urgel et al., 2000). In addition to the hec-like sequences, several other sequences were observed with strong similarities to adhesin, haemagglutinin and haemolysin genes in mammalian pathogens and the plant pathogen X. fastidiosa. This finding is perhaps surprising, as X. fastidiosa is not an enterobacterium, has no known pathogenicity factors in common with Erwinia (it lacks pectinolytic and hrp genes) and is considered a specialized plant pathogen rather than a saprophyte or opportunistic pathogen. All of the haemagglutinin- and haemolysin-like sequences were obtained from the 1C22 BAC clone, although they seem to lie at different loci along the length of this BAC. A number of sequences similar to rhizobacterium genes were observed, including attachment and opine catabolism genes from Agrobacterium tumefaciens, and opine catabolism genes from rhizobacteria. These sequences were all located on 2B8 and the sequences similar to agtA and mocR were shown by hybridization to reside on a single EcoRI fragment and a single SalI fragment, both of approximately 23 kb in size. It is thus possible that they are derived from a single operon. Agrobacterium and other rhizobacteria induce opine synthesis in planta to provide a specialized nutrient source (Kemp, 1982; Murphy et al., 1993). Although opine catabolism is known in other bacteria, including Pseudomonas spp. (Beauchamp et al., 1991; Gardener & de Bruijn, 1998), it has not been reported in Erwinia spp. or other enterobacteria.

The functions of all of the Eca sequences identified above are unknown. However, Rantakari et al. (2001) have recently shown that type III secretion plays a role in pathogenesis in Ecc. As Eca is amenable to genetic manipulation, the information gained from sample sequencing could feed directly into gene functional studies through, for example, PCR identification of specific gene knock-outs from a Tn5 mutation library or microarrays for genome-scale analyses of gene expression.

Comparisons of genome organization between Eca and E. coli
Erwinia spp. are the closest plant-pathogenic relatives of the model bacterium and animal pathogen E. coli. Comparisons between the genomes of Eca and E. coli will thus contribute to our understanding of evolution within the Enterobacteriaceae and may also indicate common or distinct mechanisms in animal and plant pathogenesis. In a preliminary assessment of structural organization between Eca and E. coli, we compared the locations of only those sequences showing the strongest similarity to E. coli ORFs. Approximately 10% of the Eca sequences showed strongest similarity to those of E. coli. The majority of these were obtained from clone 2B8 and comprise the nar gene equivalents of these two species. The close association of ychM, chaA, chaC and goaG gene orthologues with nar genes in both E. coli and Eca suggests a degree of conservation in genome organization. Moreover, the b1598, ydgB, rstA and rstB ORFs are apparently clustered in both E. coli and Eca. The order of orthologous genes on the chromosomes of different enterobacteria is usually conserved (Brunder & Karch, 2000), or else large regions may be rearranged but with conserved gene order within each region (Liu & Sanderson, 1996). However, many Eca sequences from the sampled BACs were strongly similar to sequences located throughout the E. coli genome, indicating considerably different gene order between these species. Such a lack of conservation in genome organization has been reported between the closely related Bacillus cereus and Bacillus subtilis (Økstad et al., 1999).

In conclusion, sample sequencing of overlapping BAC clones from a region of interest has yielded numerous novel candidate pathogenicity sequences and allowed a preliminary comparison of structural organization between the genomes of Eca and E. coli. Moreover, a multiple genome coverage BAC library of Eca, in conjunction with Southern blotting, facilitated the generation of a physical map across the 200 kb genomic region spanned by 2B8 and 1C22. This will allow comparisons with other erwinias to assess conservation of gene order, content, and the relative location of the hrp–dsp cluster.

Information from a high percentage of ORFs may be obtained by one-fold sample sequencing, as was demonstrated by the identification of 90% of expected hrp–dsp genes. As a physical map of Eca develops, this approach could be extended to the entire genome. One-fold sequence coverage of a minimum tiling path of BACs from around the genome would rapidly generate information for high-throughput analyses of gene expression and function at relatively low cost: a strategy that could be applied to other bacteria where genomic information is lacking and where its acquisition is limited by funds and resources.
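The idea of a minimum tiling path can be made concrete with a small sketch. The code below is not the procedure used in this work; it is a generic greedy interval-cover algorithm, assuming each clone has already been placed on the physical map as a (name, start, end) interval. The clone names and coordinates in the example are invented for illustration.

```python
from typing import List, Tuple

Clone = Tuple[str, int, int]  # (name, start_bp, end_bp) on the physical map


def minimum_tiling_path(clones: List[Clone],
                        region_start: int,
                        region_end: int) -> List[Clone]:
    """Greedy interval cover: at each step, among clones that start at or
    before the current position, take the one that extends furthest right."""
    ordered = sorted(clones, key=lambda c: c[1])
    path: List[Clone] = []
    pos = region_start
    i = 0
    while pos < region_end:
        best = None
        # Consider every not-yet-examined clone that overlaps the position.
        while i < len(ordered) and ordered[i][1] <= pos:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= pos:
            raise ValueError(f"gap in clone coverage at position {pos}")
        path.append(best)
        pos = best[2]
    return path


# Hypothetical clones (coordinates in bp); not real Eca BACs.
example_clones = [
    ("A", 0, 110_000),
    ("B", 40_000, 150_000),
    ("C", 90_000, 205_000),
    ("D", 100_000, 180_000),
]
tiling = minimum_tiling_path(example_clones, 0, 200_000)
print([name for name, _, _ in tiling])  # ['A', 'C']
```

In this toy example two clones suffice to span the 200 kb region, so only those two would need to be sample sequenced. The greedy choice is optimal for interval covering on a linear map; a circular chromosome would additionally require choosing a starting point.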


   ACKNOWLEDGEMENTS
 
The authors thank Clare McQuade for running sequencing gels. The Scottish Crop Research Institute is grant-aided from the Scottish Executive Environment and Rural Affairs Department (SEERAD) and this project was funded by SEERAD (flexible fund SCR/98/523).


   REFERENCES
 
Alfano, J. R. & Collmer, A. (1996). Bacterial pathogens in plants: life up against the wall. Plant Cell 8, 1683-1698.

Altschul, S. F., Madden, T. L., Schäffer, A. A., Zhang, J., Zhang, Z., Miller, W. & Lipman, D. J. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25, 3389-3402.

Andersson, R. A., Palva, E. T. & Pirhonen, M. (1999). The response regulator expM is essential for the virulence of Erwinia carotovora subsp. carotovora and acts negatively on the sigma factor RpoS (sigma(S)). Mol Plant–Microbe Interact 12, 575-584.

Bain, R. A., Pérombelon, M. C. M., Tsror, L. & Nachmias, A. (1990). Blackleg development and tuber yield in relation to numbers of Erwinia carotovora subsp. atroseptica on seed tubers. Plant Pathol 39, 125-133.

Barras, F., van Gijsegem, F. & Chatterjee, A. K. (1994). Extracellular enzymes and pathogenesis of soft-rot Erwinia. Annu Rev Phytopathol 32, 201-234.

Bauer, D. W., Bogdanove, A. J., Beer, S. V. & Collmer, A. (1994). Erwinia chrysanthemi hrp genes and their involvement in soft rot pathogenesis and elicitation of the hypersensitive response. Mol Plant–Microbe Interact 7, 573-581.

Beauchamp, C. J., Kloepper, J. W., Lifshitz, R., Don, P. & Antoun, H. (1991). Frequent occurrence of the ability to utilize octopine in rhizobacteria. Can J Microbiol 37, 158-164.

Blattner, F. R., Plunkett, G., III, Bloch, C. A. & 14 other authors (1997). The complete genome sequence of Escherichia coli K-12. Science 277, 1453-1462.

Bogdanove, A. J., Kim, J. F., Wei, Z., Kolchinsky, P., Charkowski, A. O., Conlin, A. K., Collmer, A. & Beer, S. V. (1998). Homology and functional similarity of an hrp-linked pathogenicity locus, dspEF, of Erwinia amylovora and the avirulence locus avrE of Pseudomonas syringae pathovar tomato. Proc Natl Acad Sci USA 95, 1325-1330.

Brunder, W. & Karch, H. (2000). Genome plasticity in Enterobacteriaceae. Int J Med Microbiol 290, 153-165.

Dellagi, A., Heilbronn, J., Avrova, A. O. & 7 other authors (2000a). A potato gene encoding a WRKY-like transcription factor is induced in interactions with Erwinia carotovora subsp. atroseptica and Phytophthora infestans and is co-regulated with class I endochitinase expression. Mol Plant–Microbe Interact 13, 1092-1101.

Dellagi, A., Birch, P. R. J., Heilbronn, J., Lyon, G. & Toth, I. K. (2000b). cDNA-AFLP analysis of differential gene expression in the bacterial plant pathogen Erwinia carotovora. Microbiology 146, 165-171.

Espinosa-Urgel, M., Salido, A. & Ramos, J.-L. (2000). Genetic analysis of functions involved in adhesion of Pseudomonas putida to seeds. J Bacteriol 182, 2363-2369.

Frangeul, L., Nelson, K. E., Buchrieser, C., Danchin, A., Glaser, P. & Kunst, F. (1999). Cloning and assembly strategies in microbial genome projects. Microbiology 145, 2625-2634.

Galán, J. E. & Collmer, A. (1999). Type III secretion machines: bacterial devices for protein delivery into host cells. Science 284, 1322-1328.

Gardener, B. B. M. & de Bruijn, F. J. (1998). Detection and isolation of novel rhizopine-catabolizing bacteria from the environment. Appl Environ Microbiol 64, 4944-4949.

Hauben, L., Moore, E. R. B., Vauterin, L., Steenackers, M., Mergaert, J., Verdonck, L. & Swings, J. (1998). Phylogenetic position of phytopathogens within the Enterobacteriaceae. Syst Appl Microbiol 21, 384-397.

Hinton, J. C. D., Sidebottom, J. M., Hyman, L. J., Pérombelon, M. C. M. & Salmond, G. P. C. (1989). Isolation and characterisation of transposon-induced mutants of Erwinia carotovora subsp. atroseptica exhibiting reduced virulence. Mol Gen Genet 217, 141-148.

Kelley, J. M., Field, C. E., Craven, M. B., Bocskai, D., Kim, U.-J., Rounsley, S. D. & Adams, M. D. (1999). High throughput direct end sequencing of BAC clones. Nucleic Acids Res 27, 1539-1546.

Kemp, J. D. (1982). Plant pathogens that engineer their hosts. In Phytopathogenic Prokaryotes, pp. 443-457. Edited by M. S. Mount & G. H. Lacy. New York: Academic Press.

Kim, J. F., Ham, J. H., Bauer, D. W., Collmer, A. & Beer, S. V. (1998). The hrpC and hrpN operons of Erwinia chrysanthemi EC16 are flanked by plcA and homologs of hemolysin/adhesin genes and accompanying activator/transporter genes. Mol Plant–Microbe Interact 11, 563-567.

Kim, U. J., Birren, B. W., Slepak, T., Mancino, V., Boysen, C., Kang, H. L., Simon, M. I. & Shizuya, H. (1996). Construction and characterization of a human bacterial artificial chromosome library. Genomics 34, 213-218.

Liu, S.-L. & Sanderson, K. E. (1996). Highly plastic genome organization in Salmonella typhi. Proc Natl Acad Sci USA 93, 10303-10308.

Matthysse, A. G., Yarnall, H., Boles, S. B. & McMahan, S. (2000). A region of the Agrobacterium tumefaciens chromosome containing genes required for virulence and attachment to host cells. Biochim Biophys Acta 1490, 208-212.

McClelland, M. & Wilson, R. K. (1998). Comparison of sample sequences of the Salmonella typhi genome to the sequence of the complete Escherichia coli K-12 genome. Infect Immun 66, 4305-4312.

Mor, H., Manulis, S., Zuck, M., Nizan, M., Coplin, D. L. & Barash, I. (2001). Genetic organization of the hrp gene cluster and dspAE/BF operon in Erwinia herbicola pv. gypsophilae. Mol Plant–Microbe Interact 14, 431-436.

Mukherjee, A., Cui, Y., Liu, Y. & Chatterjee, A. K. (1997). Molecular characterization and expression of the Erwinia carotovora hrpNEcc gene, which encodes an elicitor of the hypersensitive reaction. Mol Plant–Microbe Interact 10, 462-471.

Murphy, P. J., Trenz, S. P., Grzemski, W., de Bruijn, F. J. & Schell, J. (1993). The Rhizobium meliloti rhizopine mos locus is a mosaic structure facilitating its symbiotic regulation. J Bacteriol 175, 5193-5204.

Nasser, W., Shevchik, V. E. & Hugouvieux-Cotte-Pattat, N. (1999). Analysis of three clustered polygalacturonase genes in Erwinia chrysanthemi 3937 revealed an anti-repressor function for the PecS regulator. Mol Microbiol 34, 641-650.

Økstad, O. A., Hegna, I., Lindbäck, T., Rishovd, A.-L. & Kolstø, A.-B. (1999). Genome organization is not conserved between Bacillus cereus and Bacillus subtilis. Microbiology 145, 621-631.

Pérombelon, M. C. M. & Salmond, G. P. C. (1995). Bacterial soft rots. In Pathogenesis and Host Specificity in Plant Diseases, pp. 1-17. Edited by U. S. Singh, R. P. Singh & K. Kohmoto. Oxford: Pergamon Press.

Praillet, T., Reverchon, S., Robert-Baudouy, J. & Nasser, W. (1997). The PecM protein is necessary for the DNA-binding capacity of the PecS repressor, one of the regulators of virulence-factor synthesis in Erwinia chrysanthemi. FEMS Microbiol Lett 154, 265-270.

Py, B., Barras, F., Harris, S., Robson, N. & Salmond, G. P. C. (1998). Extracellular enzymes and their role in Erwinia virulence. Methods Microbiol 27, 157-168.

Rantakari, A., Virtaharju, S., Taira, S., Palva, E. T., Saarilahti, H. T. & Romantschuk, M. (2001). Type III secretion contributes to the pathogenesis of the soft-rot pathogen Erwinia carotovora: partial characterisation of the hrp gene cluster. Mol Plant–Microbe Interact 14, 962-968.

Sambrook, J., Maniatis, T. & Fritsch, E. F. (1989). Molecular Cloning: a Laboratory Manual. Cold Spring Harbor, NY: Cold Spring Harbor Laboratory.

Simpson, A. J. G., Reinach, F. C., Arruda, P. & 113 other authors (2000). The genome sequence of the plant pathogen Xylella fastidiosa. Nature 406, 151-159.

Viprey, V., Rosenthal, A., Broughton, W. J. & Perret, X. (2000). Genetic snapshots of the Rhizobium species NGR234 genome. Genome Biol 1(6), research0014.–0014.17. http://www.genomebiology.com/2000/1/6/research/0014/.

Received 18 October 2001; revised 8 January 2002; accepted 14 January 2002.