Whole genome analysis of the Epiphyas postvittana nucleopolyhedrovirus

Otto Hyink¹, Ross A. Dellow¹, Michael J. Olsen¹, Katherine M. B. Caradoc-Davies¹, Kylie Drake¹, Elisabeth A. Herniou²,³, Jennifer S. Cory³, David R. O’Reilly² and Vernon K. Ward¹

¹ Department of Microbiology, Otago School of Medical Sciences, University of Otago, PO Box 56, Dunedin, New Zealand
² Department of Biological Sciences, Imperial College of Science, Technology and Medicine, London SW7 2AZ, UK
³ Ecology and Biocontrol Group, Centre for Ecology and Hydrology, Mansfield Road, Oxford OX1 3SR, UK

Author for correspondence: Vernon Ward. Fax +64 3 4798540. e-mail vernon.ward{at}stonebow.otago.ac.nz


   Abstract
The nucleotide sequence of the Epiphyas postvittana nucleopolyhedrovirus (EppoMNPV) genome has been determined and analysed. The circular dsDNA genome contains 118584 bp, making it the smallest group I NPV sequenced to date. The genome has a G+C content of 40·7% and encodes 136 predicted open reading frames (ORFs), five homologous repeat regions and one unique repeat region. Predicted ORFs account for 92·9% of the genome and repeat regions for 2·2%; the remaining 4·9% comprises nonrepeat intergenic regions. EppoMNPV encodes homologues of 126 Orgyia pseudotsugata MNPV (OpMNPV) ORFs and 120 Autographa californica MNPV (AcMNPV) ORFs, with average identities of 64·7 and 53·5%, respectively. A total of 116 ORFs are conserved among the four sequenced group I NPVs, whereas 86 ORFs are conserved among all fully sequenced NPVs. Sixty-two ORFs are present in all baculoviruses sequenced to date; EppoMNPV lacks a homologue of the superoxide dismutase (sod) gene, which has been found in all other fully sequenced baculoviruses. Whole genome phylogenetic analyses of the ten fully sequenced baculoviruses, based on the sequences of the 62 shared genes and on gene content and gene order data sets, confirmed that EppoMNPV clusters tightly with OpMNPV in the group I NPVs. The main variation between EppoMNPV and OpMNPV occurs where OpMNPV carries extra clusters of genes, one of which contains sod. EppoMNPV encodes one truncated baculovirus repeated ORF (bro) gene. The only repeated ORFs are the four iap genes. Eight unique, randomly distributed ORFs were identified on EppoMNPV, none of which shows any significant homology to genes in GenBank.


   Introduction
Baculoviruses are being studied for their use in the expression of recombinant proteins and for their application as biological control agents for insect pests. The improvement of baculoviruses for both these applications requires a detailed knowledge of baculovirus infection and replication. The general features of baculoviruses are reviewed by Blissard et al. (2000) . The Baculoviridae are a large family of rod-shaped, occluded viruses with circular dsDNA genomes. They range in size from 80 to 180 kbp and are pathogenic for invertebrates. Baculoviruses form occlusion bodies (OBs) of a crystalline protein matrix and are divided into two genera, based, in part, on how the virions are packaged in this matrix. Granulovirus (GV) virions are occluded as single virions per OB, while nucleopolyhedrovirus (NPV) virions are packaged with multiple virions per OB. A further phenotypic distinction can be made for NPVs based on the number of nucleocapsids packaged into each virion, being either single capsid NPVs (SNPVs) or multiple capsid NPVs (MNPVs), although this appears to have no phylogenetic value. NPVs have been subdivided into two distinct groups based on molecular phylogenies (Zanotto et al., 1993 ). Group I NPVs are closely related, while group II NPVs appear to be more divergent.

Epiphyas postvittana MNPV (EppoMNPV) is a multiply embedded NPV (unpublished data) pathogenic for the light brown apple moth, Epiphyas postvittana. This moth is part of a major horticultural pest complex comprising seven species of leafroller insects present in New Zealand. EppoMNPV has a genome of under 120 kbp, making it the smallest group I NPV genome characterized to date (Hyink et al., 1998 ). The small size of the EppoMNPV genome and the economic importance of its host make EppoMNPV an important virus to study.

To better understand the evolution of baculoviruses and the molecular mechanisms behind baculovirus infection and replication, the sequencing of baculovirus genomes has been undertaken by a number of research groups. Six NPVs, Autographa californica MNPV (AcMNPV) (Ayres et al., 1994 ), Bombyx mori NPV (BmNPV) (Gomi et al., 1999 ), Orgyia pseudotsugata MNPV (OpMNPV) (Ahrens et al., 1997 ), Lymantria dispar MNPV (LdMNPV) (Kuzio et al., 1999 ), Spodoptera exigua MNPV (SeMNPV) (IJkel et al., 1999 ) and Helicoverpa armigera SNPV (HaSNPV) (Chen et al., 2001 ), and three GVs, Xestia c-nigrum GV (XecnGV) (Hayakawa et al., 1999 ), Plutella xylostella GV (PxGV) (Hashimoto et al., 2000 ) and Cydia pomonella GV (CpGV) (Luque et al., 2001 ), have now been completely sequenced. The genomes of these baculoviruses range in size from 101 to 179 kbp.

Phylogenetic analyses using the EppoMNPV polyhedrin (polh) and ecdysteroid UDP-glucosyltransferase (egt) genes place EppoMNPV in the group I NPVs (Caradoc-Davies et al., 2001 ; Hyink et al., 1998 ). Baculovirus phylogeny has traditionally been performed using the sequences of single genes (Zanotto et al., 1993 ) and conflicts between phylogenies based on different genes are often observed (Federici & Hice, 1997 ). The growing number of whole genome sequences available has allowed the development of phylogenetic methods based on these complete genomes using genes common to all viruses, gene order and total gene content as the base data sets. Phylogeny studies using these data sets support the divisions between the GVs and the NPVs and between the group I and II NPVs (Herniou et al., 2001 ).

We have undertaken a project to characterize EppoMNPV, including the sequencing of its entire genome. In this report, we present the sequence of the EppoMNPV genome and the analysis of the encoded open reading frames (ORFs). Whole genome phylogenetic analyses are presented for the ten baculoviruses that have now been completely sequenced.


   Methods
■ Viral DNA, cloning and vectors.
EppoMNPV DNA was purified as described previously (Hyink et al., 1998 ). A complete HindIII restriction fragment library and a partial EcoRI restriction fragment library were isolated as described previously (Hyink et al., 1998 ). Vectors used for subcloning and sequencing were pBluescript SK(+) (Stratagene), pGEM-5Zf(+/-) (Promega) and pLITMUS28 (New England Biolabs).

■ Sequencing strategy.
A targeted sequencing strategy was used to sequence the EppoMNPV genome. The genome-priming system (New England Biolabs) sequencing method was used for HindIII fragments B–D, G, H, K and P and EcoRI fragments D, H and K. Subcloning from existing restriction fragments and sequencing using universal primers, or sequencing using specifically designed primers, was used to sequence the remainder of the EppoMNPV genome. Where restriction fragment junctions could not be confirmed through sequencing from an overlapping restriction fragment, PCR primers were designed to amplify the junction region from EppoMNPV genomic DNA. The PCR products were then sequenced. Both strands of the entire genome of EppoMNPV were completely sequenced. Sequencing was carried out at the Centre for Gene Research, University of Otago, New Zealand using an ABI 377 DNA Sequencer.

■ Sequence analysis.
Assembly and analysis of sequence information were performed using DNASTAR. ORFs were identified using the BLAST algorithm and the PREDICTPROTEIN application (http://cubic.bioc.columbia.edu/predictprotein/) (Rost, 1996 ). A signal peptide prediction program was used to identify ORFs encoding proteins with probable secretion signals (http://www.cbs.dtu.dk/services/SignalP/) (Nielsen et al., 1997 ). The EppoMNPV repeat regions were identified using the FINDPATTERNS and TANDEM applications in GCG (http://angis.otago.ac.nz).

■ Phylogenetic analyses.
The baculovirus repeated ORF (bro) genes were not included in any of these analyses because of the difficulty of establishing orthology between family members.

■ Phylogeny based on gene sequences.
The amino acid sequences of each of the 62 genes common to all ten baculovirus genomes were aligned using CLUSTALW and the alignments refined by eye. These 62 alignments were then concatenated to form a single alignment of 25682 amino acids. This was analysed using both maximum parsimony (heuristic search with 20 random additions) and neighbour joining in PAUP* (Felsenstein, 1995 ). Branch support was evaluated by 1000 bootstrap replicates.
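As an illustration of the supermatrix approach only, the Python sketch below concatenates per-gene alignments and draws one bootstrap pseudo-replicate by resampling alignment columns with replacement. It is not the pipeline used here (alignment was done in CLUSTALW and tree searches in PAUP*). It assumes that each alignment is a dict mapping a taxon name to a gap-padded sequence; the function names and toy sequences are hypothetical.

```python
import random


def concatenate(alignments):
    """Join per-gene alignments end to end, taxon by taxon.

    Assumes each alignment is a dict {taxon: aligned sequence} and that every
    taxon occurs in every gene (true for genes shared by all genomes).
    """
    taxa = sorted(alignments[0])
    for aln in alignments:
        if sorted(aln) != taxa:
            raise ValueError("every gene alignment must contain the same taxa")
        if len({len(seq) for seq in aln.values()}) != 1:
            raise ValueError("sequences within one gene alignment differ in length")
    return {t: "".join(aln[t] for aln in alignments) for t in taxa}


def bootstrap_replicate(matrix, rng):
    """Resample alignment columns with replacement (one pseudo-replicate)."""
    taxa = list(matrix)
    n_cols = len(matrix[taxa[0]])
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {t: "".join(matrix[t][c] for c in cols) for t in taxa}


if __name__ == "__main__":
    # Toy two-gene data set (invented sequences, for illustration only).
    gene_a = {"EppoMNPV": "MKT-L", "OpMNPV": "MKTAL", "AcMNPV": "MRT-L"}
    gene_b = {"EppoMNPV": "QW", "OpMNPV": "QW", "AcMNPV": "EW"}
    supermatrix = concatenate([gene_a, gene_b])
    print(supermatrix["EppoMNPV"])  # MKT-LQW
    rng = random.Random(1)
    replicates = [bootstrap_replicate(supermatrix, rng) for _ in range(1000)]
    print(len(replicates))  # 1000
```

Each pseudo-replicate would be analysed in the same way as the original matrix. The percentage of replicates that recover a given clade is its bootstrap support.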

■ Phylogeny based on gene order.
Breakpoint analysis was undertaken between all sequenced baculoviruses. The number of breakpoints (points in the gene order where there is a discontinuity) was calculated for every possible pair of genomes and then divided by the number of genes shared by each pair to obtain a matrix of relative breakpoint distances. A phylogenetic tree was calculated based on this matrix using the program NEIGHBOR (from the PHYLIP software package) (Felsenstein, 1995 ).
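The relative breakpoint distance defined above can be sketched in Python as follows. This assumes that each genome is given as a list of gene names in map order around the circular chromosome, that each gene occurs once, that orientation is ignored, and that a breakpoint is an adjacency between two shared genes in one genome that is broken in the other. These are our illustrative assumptions, and the function names are hypothetical.

```python
def adjacencies(order):
    """Unordered neighbour pairs around a circular gene order."""
    n = len(order)
    if n < 2:
        return set()
    return {frozenset((order[i], order[(i + 1) % n])) for i in range(n)}


def relative_breakpoint_distance(genome_a, genome_b):
    """Breakpoints between two genomes divided by the number of shared genes."""
    shared = set(genome_a) & set(genome_b)
    if not shared:
        raise ValueError("genomes share no genes")
    # Keep only the shared genes, preserving each genome's own order.
    a = [g for g in genome_a if g in shared]
    b = [g for g in genome_b if g in shared]
    adj_b = adjacencies(b)
    # Count adjacencies of genome A that are not adjacencies in genome B.
    breakpoints = sum(1 for pair in adjacencies(a) if pair not in adj_b)
    return breakpoints / len(shared)


def distance_matrix(genomes):
    """Pairwise relative breakpoint distances for a dict {name: gene order}."""
    names = sorted(genomes)
    return [[relative_breakpoint_distance(genomes[x], genomes[y]) for y in names]
            for x in names]


if __name__ == "__main__":
    # Hypothetical orders: two neighbouring genes are swapped in V2.
    genomes = {
        "V1": ["polh", "orf1629", "pk1", "lef-1", "egt"],
        "V2": ["polh", "pk1", "orf1629", "lef-1", "egt"],
    }
    print(distance_matrix(genomes))  # [[0.0, 0.4], [0.4, 0.0]]
```

A matrix of this kind would then be passed to a distance-based tree program such as NEIGHBOR.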

Neighbour pair analysis of baculovirus genes was undertaken as described by Herniou et al. (2001) . Gene order among the 62 shared genes was evaluated by examining each genome for the presence of all possible neighbouring pairs of genes. For each genome, the presence or absence of each possible gene pair was coded as either 1 or 0, respectively. Neighbour gene pairs resulting in constant characters (present in all genomes or absent from all genomes) were not taken into account. This gave a binary matrix of 99 characters, of which 71 were parsimony informative. A maximum-parsimony analysis (exhaustive search) of this data matrix was performed using PAUP*. Branch support was evaluated by 1000 bootstrap replicates.
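A minimal Python sketch of this neighbour pair coding is shown below, assuming that two core genes count as neighbours when no other core gene lies between them on the circular map. This is our reading of the method, not a confirmed detail. Pairs never observed in any genome would be constant (absent everywhere), so only observed pairs are scored. A binary character is treated as parsimony informative when each state occurs in at least two genomes. All names are hypothetical.

```python
def core_adjacencies(order, core):
    """Neighbour pairs among core genes in one circular genome (non-core skipped)."""
    core_order = [g for g in order if g in core]
    n = len(core_order)
    return {frozenset((core_order[i], core_order[(i + 1) % n])) for i in range(n)}


def neighbour_pair_matrix(genomes, core):
    """Binary characters (1 = pair present) for every non-constant gene pair."""
    names = sorted(genomes)
    adj = {name: core_adjacencies(genomes[name], core) for name in names}
    observed = {pair for pairs in adj.values() for pair in pairs}
    matrix = {}
    for pair in sorted(observed, key=sorted):
        column = tuple(int(pair in adj[name]) for name in names)
        if 0 < sum(column) < len(names):  # drop constant characters
            matrix[tuple(sorted(pair))] = column
    return names, matrix


def is_parsimony_informative(column):
    """A binary character is informative if each state occurs in >= 2 taxa."""
    ones = sum(column)
    return ones >= 2 and len(column) - ones >= 2


if __name__ == "__main__":
    # Toy genomes; "x" is a non-core gene and is ignored.
    genomes = {"G1": list("abxcde"), "G2": list("abcde"),
               "G3": list("acbde"), "G4": list("acbde")}
    names, matrix = neighbour_pair_matrix(genomes, set("abcde"))
    print(len(matrix), sum(map(is_parsimony_informative, matrix.values())))
```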

■ Phylogeny based on gene content.
A matrix was generated recording the presence or absence of each baculovirus gene in each genome. A total of 417 distinct genes were recorded in this matrix. Of these, 144 were parsimony informative. Phylogenetic analyses were performed using maximum parsimony in PAUP* (exhaustive search). Branch support was evaluated by 1000 bootstrap replicates.
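The gene content matrix can be illustrated with a short Python sketch. This assumes that orthology has already been resolved, so each genome is a set of shared gene labels. It also assumes the usual definition of a parsimony-informative binary character, in which each state occurs in at least two genomes. The gene labels and function names are hypothetical.

```python
def gene_content_matrix(gene_sets):
    """Presence (1) or absence (0) of every gene across genomes."""
    names = sorted(gene_sets)
    genes = sorted(set().union(*gene_sets.values()))
    return names, {g: tuple(int(g in gene_sets[n]) for n in names) for g in genes}


def count_parsimony_informative(matrix):
    """Count characters in which both states occur in at least two genomes."""
    def informative(column):
        ones = sum(column)
        return ones >= 2 and len(column) - ones >= 2
    return sum(informative(col) for col in matrix.values())


if __name__ == "__main__":
    # Toy data: p1 is constant, p5 is autapomorphic, p2-p4 are informative.
    gene_sets = {"A": {"p1", "p2", "p3"}, "B": {"p1", "p2", "p3"},
                 "C": {"p1", "p4"}, "D": {"p1", "p4", "p5"}}
    names, matrix = gene_content_matrix(gene_sets)
    print(len(matrix), count_parsimony_informative(matrix))  # 5 3
```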

■ GenBank accession numbers.
The GenBank accession numbers for the NPV and GV sequences reported previously are as follows: AcMNPV, L22858 (Ayres et al., 1994 ); BmNPV, L33180 (Gomi et al., 1999 ); CpGV, U53466 (Luque et al., 2001 ); HaSNPV, AF271059 (Chen et al., 2001 ); LdMNPV, AF081810 (Kuzio et al., 1999 ); OpMNPV, U75930 (Ahrens et al., 1997 ); PxGV, AF270937 (Hashimoto et al., 2000 ); SeMNPV, AF169823 (IJkel et al., 1999 ); and XecnGV, AF162221 (Hayakawa et al., 1999 ).


   Results and Discussion
The EppoMNPV genome
The dsDNA genome of the baculovirus pathogenic for Epiphyas postvittana has been entirely sequenced. Sequence data were assembled into a contiguous 118584 bp sequence, which is consistent with the predicted size of 118·5 to 119 kbp (Hyink et al., 1998 ). The zero point identified previously (Hyink et al., 1998 ) was retained as the EcoRI site of the EcoRI L fragment, which contains the polh gene, with polh in the reverse orientation for consistent alignment with other group I NPVs. Table 1 summarizes the sizes and some of the features of the ten baculoviruses that have now been sequenced, all of which infect lepidopteran insects. The two extremes in size are PxGV, with the smallest genome at 101 kbp (Hashimoto et al., 2000 ), and XecnGV, with the largest genome at 179 kbp (Hayakawa et al., 1999 ); these may represent the two extremes of genome size among GVs. Seven NPV genomes have now been sequenced, and EppoMNPV is the smallest of these. The sizes of a number of other lepidopteran NPVs have been determined by restriction fragment analysis (Chen et al., 1996 ; Cheng & Carner, 2000 ; Das & Prasad, 1996 ; Hu et al., 1998 ; Li et al., 1997 ; Richards et al., 1999 ; Sadler et al., 2000 ) and all are larger than EppoMNPV. The EppoMNPV genome has a G+C content of 40·7%, similar to that of all the other sequenced baculovirus genomes except OpMNPV and LdMNPV (Table 1).


Table 1. Comparison of the fully sequenced baculoviruses

 
Two criteria were used to identify potential ORFs on the EppoMNPV genome. First, by convention (Ahrens et al., 1997 ; Ayres et al., 1994 ), the minimum size was set at 150 bp, which identified 302 methionine-initiated ORFs. Second, ORFs located predominantly within larger ORFs were eliminated, leaving 136 putative ORFs on the EppoMNPV genome. The orientations of the EppoMNPV ORFs were evenly split, with 69 in the forward (clockwise) orientation and 67 in the reverse orientation. The predicted ORFs span 92·9% of the EppoMNPV genome, leaving very little noncoding DNA. A search for promoter motifs in the 150 bp upstream of the methionine initiation codons identified 79 ORFs with the consensus baculovirus late promoter sequence (DTAAG). It also identified 37 ORFs with the consensus early promoter motif (TATA) followed by a cap site (CABH) 25–35 bp downstream of the TATA sequence. The presence on the EppoMNPV genome of other promoter elements, such as upstream-activating regions and initiator-containing regions, was not determined. Six repeat regions were also identified on the EppoMNPV genome: five homologous repeat (hr) regions and one region unique to EppoMNPV. The predicted repeat regions account for 2·2% of the EppoMNPV genome, leaving just 4·9% as noncoding, intergenic DNA. Fig. 1 presents the circular genome of EppoMNPV with all the identified ORFs and repeat regions. Based on the sequence data, some restriction fragments have been renamed relative to the published restriction map of the EppoMNPV genome (Hyink et al., 1998 ). The previous HindIII fragment R is now fragment S (and vice versa), and the previous BamHI fragments D, F and G are now fragments F, G and D, respectively. Table 2 summarizes current knowledge about the ORFs encoded by EppoMNPV.
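The two ORF-calling criteria can be illustrated with a short Python sketch. This is not the software used in this study. It treats the genome as a linear string, so ORFs spanning the zero point of the circular map are missed. It takes the longest ATG-initiated ORF ending at each stop codon. It also uses a hypothetical 50% overlap cut-off to decide that an ORF lies "predominantly within" a longer ORF. All function names are our own.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


def orfs_on_strand(seq, min_len):
    """Longest ATG-initiated ORF ending at each stop codon, in all three frames.

    Returns half-open (start, end) coordinates that include the stop codon.
    """
    found = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start >= min_len:
                    found.append((start, i + 3))
                start = None
    return found


def find_orfs(genome, min_len=150):
    """ORFs of at least min_len bp on both strands, in forward coordinates."""
    genome = genome.upper()
    n = len(genome)
    orfs = [(s, e, "+") for s, e in orfs_on_strand(genome, min_len)]
    orfs += [(n - e, n - s, "-") for s, e in orfs_on_strand(revcomp(genome), min_len)]
    return orfs


def drop_nested(orfs, max_overlap=0.5):
    """Discard an ORF if more than max_overlap of it lies within a longer kept ORF."""
    kept = []
    for s, e, strand in sorted(orfs, key=lambda o: o[1] - o[0], reverse=True):
        inside = any(min(e, ke) - max(s, ks) > max_overlap * (e - s)
                     for ks, ke, _ in kept)
        if not inside:
            kept.append((s, e, strand))
    return sorted(kept)
```

For example, `drop_nested(find_orfs(sequence))` would return a first-pass ORF set for a sequence string. Homology searches, as described in the Methods, would still be needed to refine it.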



Fig. 1. Circular map of the EppoMNPV genome. The sites for the restriction enzymes HindIII (outer circle) and EcoRI (inner circle) are shown with fragments named according to size. Fragments R and S (*) are swapped relative to the original map published by Hyink et al. (1998) . Arrows indicate the size, location and direction of the 136 ORFs identified on the EppoMNPV genome. Open arrows indicate ORFs conserved among all the sequenced baculoviruses, shaded arrows indicate ORFs that have homologues in some but not all baculoviruses and black arrows indicate ORFs unique to EppoMNPV (u1–u8). The positions of repeat regions are indicated by black boxes. The unique repeat is identified as UR.

 

Table 2. Potentially expressed ORFs encoded by EppoMNPV

 


 
Gene content of baculoviruses
EppoMNPV shares 119–126 ORFs with the sequenced group I NPVs, 90–97 with the sequenced group II NPVs and 67–74 with the three sequenced GVs. A recent review identified 67 genes as ‘core’ baculovirus genes from seven fully sequenced baculoviruses (Hayakawa et al., 2000 ). Since then, the genomes of HaSNPV (Chen et al., 2001 ) and CpGV (Luque et al., 2001 ) have been completed and this number has fallen to 63, because HaSNPV lacks a homologue of ORF ac150 and CpGV lacks homologues of ORFs ac76, p10 and ie0. In this study, we identified 62 ORFs common to the ten baculoviruses sequenced to date, because EppoMNPV lacks a homologue of the auxiliary gene superoxide dismutase (sod). The only published study of the sod gene encoded by AcMNPV (Tomalski et al., 1991 ) found that it gave the wild-type virus no advantage over a sod knock-out virus. We also identified 86 ORFs common to all NPVs, some of which also occur in one of the three GVs. Similarly, 116 genes were common to all four group I NPVs, reiterating the close phylogenetic relationship between these viruses.
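The shrinking of the shared gene set as genomes are added (67, then 63, then 62 in the text above) is simply a running set intersection. The Python sketch below shows this using a deliberately tiny, hypothetical seven-gene set. Only the absences of ac150, ac76, p10, ie0 and sod follow the text; the gene sets are not the real genome contents, and the function names are our own.

```python
def core_genes(gene_sets):
    """Genes present in every genome: the intersection of the per-genome sets."""
    sets = [set(genes) for genes in gene_sets.values()]
    return set.intersection(*sets) if sets else set()


def core_history(gene_sets, order):
    """Size of the shared gene set as genomes are added one at a time."""
    history, core = [], None
    for name in order:
        genes = set(gene_sets[name])
        core = genes if core is None else core & genes
        history.append((name, len(core)))
    return history


if __name__ == "__main__":
    # Toy seven-gene "core"; each later genome lacks the genes named in the text.
    toy = {"lef-8", "lef-9", "ac150", "ac76", "p10", "ie0", "sod"}
    genomes = {
        "AcMNPV": toy,
        "HaSNPV": toy - {"ac150"},
        "CpGV": toy - {"ac76", "p10", "ie0"},
        "EppoMNPV": toy - {"sod"},
    }
    print(core_history(genomes, ["AcMNPV", "HaSNPV", "CpGV", "EppoMNPV"]))
    print(sorted(core_genes(genomes)))  # ['lef-8', 'lef-9']
```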

Phylogenetic analysis of baculoviruses based on gene sequences, order and content
Baculovirus phylogenies have usually been inferred from the highly conserved polh gene. Some of the limitations of using this single gene have been discussed previously (Federici & Hice, 1997 ). A phylogenetic analysis based on gene sequences for the ten baculovirus genomes available to date was performed using a concatenated alignment of all 62 genes shared by all the viruses. This analysis yielded a single most parsimonious tree, strongly supported by high bootstrap scores (Fig. 2A). The tree divides the ten baculoviruses into three groups: the group I NPVs, the group II NPVs and the GVs. EppoMNPV belongs to the group I NPVs and is most closely related to OpMNPV. The relationships among the other baculoviruses agree with the conclusions of a previous study of baculovirus phylogeny using complete genomes (Herniou et al., 2001 ).



Fig. 2. Whole genome phylogenetic trees. (A) Most parsimonious tree obtained from the concatenated 62-gene sequence data set. Numbers indicate percentage bootstrap support for maximum-parsimony and neighbour-joining analyses (1000 replicates). (B) Tree obtained with gene content analysis. Numbers indicate percentage bootstrap support (1000 replicates). Scale indicates the number of changes in gene content. (C) Tree obtained from the relative breakpoint distance analysis. Scale indicates the relative breakpoint distance given in Table 3. (D) Tree obtained from the neighbour pair analysis. Numbers indicate bootstrap support (1000 replicates). Scale indicates changes in the order of the 62 conserved genes.

 

Table 3. Relative breakpoint distance matrix

 
Phylogenies were also inferred from differences in gene content (Fig. 2B) and gene order (Fig. 2C, D) between these genomes. Two methods were used to infer phylogenies based on gene order. First, pairwise comparison of all ten baculovirus genomes was used to determine the minimum number of breakpoints that could account for the changes in gene order between each pair of viruses. These breakpoint numbers were then used to compile a matrix of relative breakpoint distances (Table 3) for phylogenetic analysis. Second, neighbour pair analysis examined gene order among the 62 conserved genes by scoring each genome for the presence of all possible neighbouring gene pairs. These analyses gave very similar trees. Each agreed on the subdivision of these viruses into group I NPVs, group II NPVs and GVs, and each indicated that EppoMNPV is most closely related to OpMNPV (Fig. 2). There was some disagreement on the precise relationships among the group II NPVs and among the GVs. As discussed previously (Herniou et al., 2001 ), the tree based on the combined 62 gene sequences is, in our opinion, the most plausible representation of the relationships between these ten viruses. The gene order- and gene content-based methods cannot yet resolve the relationships among the group II NPVs or among the GVs, because few genome sequences are available and these baculovirus groups are more divergent. The data obtained here confirm the close phylogenetic relatedness of EppoMNPV and OpMNPV indicated by previous studies based on phylogenies of the polh and egt genes (Caradoc-Davies et al., 2001 ; Hyink et al., 1998 ). These studies and the information given in Table 2 suggest that EppoMNPV is even more closely related to Choristoneura fumiferana MNPV (CfMNPV). Individual EppoMNPV ORFs frequently show higher homology to CfMNPV ORFs (where sequence information is available for CfMNPV). The hosts of EppoMNPV and CfMNPV both belong to the family Tortricidae.

Gene content comparison of EppoMNPV with other group I NPVs
Table 1 summarizes the gene content of EppoMNPV in comparison with the other fully sequenced baculoviruses. EppoMNPV is the fourth group I NPV to be sequenced, and a total of 116 ORFs are conserved among all four members of this group. EppoMNPV shares 126 ORFs with OpMNPV and 120 with AcMNPV, with average identities of 64·7 and 53·5%, respectively. It shares 119 ORFs with BmNPV, the remaining fully sequenced group I NPV. A number of ORFs have now been identified as unique to this group of NPVs (Table 4), including some that encode well-characterized proteins. For example, gp64 is an essential glycoprotein found on the surface of budded virions, where it functions in the fusion of these virus particles with cells (Blissard & Rohrmann, 1989 ). Previous research has suggested that the acquisition of this gene by the group I NPVs has promoted virus diversification (Pearson et al., 2000 ). Also unique to group I NPVs are the ie2 and lef-7 genes, which are involved in viral gene expression (Carson et al., 1991 ; Kool et al., 1994 ; Morris et al., 1994 ). The iap-1 gene, a member of the inhibitor of apoptosis family, has been shown to inhibit apoptosis (Maguire et al., 2000 ). The remaining ORFs unique to group I NPVs have not been functionally characterized. EppoMNPV ORF 39, named gta on the basis of database homologies, contains seven motifs common to the SNF2/SWI2 family of proteins, which are involved in chromatin remodelling (for reviews see Pazin & Kadonaga, 1997 ; Peterson, 1996 ; Tsukiyama & Wu, 1997 ). EppoMNPV ORF 30 shows homology to a set of repeated ORFs, designated the tryptophan repeat family of proteins, in the Amsacta moorei entomopoxvirus (Bawden et al., 2000 ). The protein encoded by EppoMNPV ORF 108 is predicted to contain membrane-spanning regions, suggesting that it is a transmembrane protein.


Table 4. Comparison of gene content in group I NPVs

 
Only ten of the 136 EppoMNPV ORFs identified are absent from the OpMNPV genome. Eight of these ORFs, labelled u1 to u8 on the genome map (Fig. 1) and in Table 2, have no homology to any genes in the databases, making them unique to EppoMNPV. The motifs recognized in these unique ORFs are listed in Table 2. ORF 113 has a predicted signal sequence, ORFs 28, 43, 132 and 134 have early promoter motifs (TATA and CABH), and ORFs 9, 132 and 134 have late promoter motifs (DTAAG). The presence of such motifs suggests that these ORFs encode proteins, although this remains to be demonstrated. The other two ORFs without homologues in OpMNPV are ORFs 62 and 101. ORF 62 has 57·2% identity to AcMNPV ORF 69 and is required for the activity of EppoMNPV IAP-2 (ORF 63) (Maguire et al., 2000 ). The N-terminal region of ORF 101 has 38·8% identity to the whole of XecnGV ORF 31, while the C-terminal region has 28·1% identity to the whole of XecnGV ORF 30. ORF 101 therefore appears to be a fusion of these two XecnGV genes (Hayakawa et al., 1999 ). To rule out a sequencing artefact, ORF 101 was amplified by PCR from EppoMNPV genomic DNA and sequenced, which confirmed its presence and size. Interestingly, CpGV has a homologue of XecnGV ORF 30 but not of ORF 31 (Luque et al., 2001 ). No functional information is available for either of these XecnGV ORFs, and no motifs that could be attributed to function were identified in them or in EppoMNPV ORF 101.
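A promoter-motif scan of the kind described in this study can be sketched in Python with regular expressions, using IUPAC codes (D = A/G/T, B = C/G/T, H = A/C/T). Several details are our assumptions and are not taken from the text. The 150 bp upstream window is read on the ORF's coding strand. The 25–35 bp spacing is measured from the first base of TATA to the first base of CABH. ORF coordinates are half-open forward-strand positions. The function names are hypothetical.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")
LATE_MOTIF = re.compile(r"[AGT]TAAG")   # DTAAG
TATA_BOX = re.compile(r"(?=TATA)")      # lookahead so overlapping hits are found
CAP_SITE = re.compile(r"CA[CGT][ACT]")  # CABH


def upstream(genome, start, end, strand, width=150):
    """Sequence upstream of the start codon, read on the ORF's coding strand."""
    if strand == "+":
        return genome[max(0, start - width):start]
    # Minus-strand ORF: its ATG is the reverse complement of genome[end-3:end],
    # so its upstream region lies just beyond `end` on the forward strand.
    return genome[end:end + width].translate(COMPLEMENT)[::-1]


def has_late_motif(region):
    return LATE_MOTIF.search(region) is not None


def has_early_motif(region, min_gap=25, max_gap=35):
    """TATA followed by a CABH cap site starting 25-35 bp after the TATA start."""
    for hit in TATA_BOX.finditer(region):
        i = hit.start()
        for j in range(i + min_gap, i + max_gap + 1):
            if CAP_SITE.match(region, j):
                return True
    return False
```

Because the spacing convention and motif definitions here are assumptions, counts from this sketch would not necessarily reproduce the 79 late and 37 early motif-containing ORFs reported above.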

Six ORFs identified previously as being unique to OpMNPV have homologues on the EppoMNPV genome (Table 4). The EppoMNPV ORF 27 and 99 gene products both have multiple predicted membrane-spanning regions, indicating that they could encode transmembrane proteins. The EppoMNPV ORF 92 gene product, as described by Ahrens et al. (1997) , has a truncated BIR domain and a RING-finger motif and was therefore designated iap-4. The predicted proteins encoded by EppoMNPV ORFs 2, 3 and 29 have no recognizable motifs that can be attributed to function. Overall, the EppoMNPV genome shows close similarity to the genome of OpMNPV, with phylogenetic analysis consistently and strongly supporting the grouping of these two viruses.

The main differences between the EppoMNPV and OpMNPV genomes occur at two points on the OpMNPV genome where OpMNPV has additional genes (Fig. 3). The larger of these clusters comprises OpMNPV ORFs 28–34, which include the sod and conotoxin-like (ctl-2) genes, plus a dUTPase and two ribonucleotide reductase genes. Comparison of the OpMNPV and AcMNPV genomes reveals a larger cluster of genes, OpMNPV ORFs 30–37, all of which are absent from AcMNPV. The last three of these (ORFs 35–37) do have homologues in EppoMNPV. This raises the question of whether AcMNPV and EppoMNPV have lost these genes or OpMNPV has gained them. One possible model for the loss of sod from EppoMNPV is as follows. The OpMNPV/EppoMNPV lineage contained the OpMNPV ORF 30–37 gene cluster, and EppoMNPV then lost the homologues of OpMNPV ORFs 28–34, including sod. OpMNPV ORFs 147–150 also lack homologues in EppoMNPV. The proteins encoded by OpMNPV ORFs 147–149 (opep-3, opep-2 and p8.9) have been described previously by Wu et al. (1993) and Shippam et al. (1997) . These ORFs lie between the odv-e56 and ie2 genes in OpMNPV, where ie2 is followed by an hr region. In EppoMNPV, the odv-e56 gene is followed by an hr region, and ie2 faces in the opposite orientation to that of ie2 on the OpMNPV genome. Interestingly, AcMNPV has the odv-e56 and ie2 genes in the same orientation as EppoMNPV, with two small ORFs between them instead of an hr region. This suggests that a rearrangement occurred in OpMNPV together with the acquisition of the ORF 147–150 gene cluster. The BmNPV genome has the same gene order as AcMNPV in these two regions, except for the bro-a gene found upstream of the sod homologue. These two gene clusters account for 7·9 kbp of the 13·4 kbp size difference between EppoMNPV and OpMNPV. The remainder of the difference can be attributed to extra ORFs at apparently random locations around the OpMNPV genome and to the OpMNPV I-R and GT repeat regions.



Fig. 3. Two regions of variability between EppoMNPV, OpMNPV and AcMNPV. (A) Variation at the group I NPV sod locus. The regions from fgf to ORF 30 are shown for the three viruses. (B) The region from the odv-e56 gene to the ie2 gene is shown for the three viruses. Horizontal brackets indicate possible insertions/deletions and a double arrow indicates an inversion. Single arrows indicate gene conservation; open arrows indicate ORFs found in all three viruses, dotted arrows indicate ORFs conserved between EppoMNPV and OpMNPV, shaded arrows indicate ORFs conserved between OpMNPV and AcMNPV and black arrows are ORFs in OpMNPV but not in EppoMNPV or AcMNPV. EppoMNPV ORF 28 (grey arrow) is unique to EppoMNPV.

 
Why is the EppoMNPV genome smaller than other NPV genomes?
As the smallest group I NPV genome sequenced to date, EppoMNPV is the closest current approximation to a minimal group I NPV genome. A comparison of gene content between EppoMNPV, the other group I NPVs and the other baculoviruses highlights some key reasons for the small size of EppoMNPV. As discussed above, the main differences between EppoMNPV and OpMNPV are in regions where OpMNPV has clusters of additional genes. In addition, EppoMNPV lacks several other ORFs found in at least two of the other group I NPVs (Table 4). The sod gene is the only gene conserved in all the other fully sequenced baculoviruses that EppoMNPV does not encode. Other baculoviruses also lack certain auxiliary genes. PxGV lacks homologues of chitinase (chi) and cathepsin (v-cath) (EppoMNPV ORFs 110 and 111), and XecnGV lacks a homologue of egt (EppoMNPV ORF 12) (Hashimoto et al., 2000 ; Hayakawa et al., 1999 ).

EppoMNPV lacks homologues of three other genes common to all the other group I NPVs: AcMNPV ORFs 47, 87 (p15) and 122. The protein encoded by BmNPV p15 has been proposed to be capsid-associated (Lu & Iatrou, 1997 ). EppoMNPV also lacks homologues of 13 ORFs that are found in two of the other three group I NPVs. As in BmNPV, no homologues of the ctl gene, pcna or AcMNPV ORF 85 were identified on the EppoMNPV genome. AcMNPV and OpMNPV both have these ORFs, and OpMNPV encodes two ctl genes (Ahrens et al., 1997 ; Ayres et al., 1994 ). Homologues of the ctl ORF have also been identified in LdMNPV, XecnGV and Buzura suppressaria SNPV (Hayakawa et al., 1999 ; Hu et al., 1998 ; Kuzio et al., 1999 ). Ten ORFs shared between AcMNPV and BmNPV have no homologues in either EppoMNPV or OpMNPV (Table 4).

The size of some of the larger baculoviruses has been attributed to the presence of repeated genes (Hayakawa et al., 2000 ). Approximately 10% of the genome of LdMNPV, the largest NPV genome, encodes copies of the bro genes (Kuzio et al., 1999 ). The XecnGV genome has four copies of an enhancin gene, three variants of p10, seven copies of bro genes, four ORFs with homology to the AcMNPV ORF 145–150 group and a further five ORFs that have similarity to each other but not to other genes in GenBank (Hayakawa et al., 1999 ). EppoMNPV has just one gene, ORF 103, with homology to the bro genes. This gene is 174 amino acids in length and shows 72·7% identity to OpMNPV ORF 116, which is even smaller at only 88 amino acids in length. The EppoMNPV bro homologue shows weak homology to other bro genes, with the best identity being 24·1% to one of the BmNPV bro genes. The iap genes have been classified as repeated ORFs (Hayakawa et al., 2000 ) and EppoMNPV, despite its small size, has four of these (iap1–4) (Maguire et al., 2000 ). The functions of the different classes of iap have still to be elucidated but both IAP1 and IAP2 are anti-apoptotic (Maguire et al., 2000 ) and IAP3 from OpMNPV is also a functional apoptosis inhibitor.

Repeat regions identified on EppoMNPV
EppoMNPV, like most other baculoviruses, has hr regions dispersed throughout its genome. Five hr regions have been identified. Each contains a 30 bp imperfect palindrome within a directly repeated sequence, and the number of palindromes per hr region varies from two to eight (Fig. 4A). The consensus palindrome sequence was obtained by taking the base that occurred at a frequency greater than 70% at each position. Where no single base reached this level, the two most frequent bases were taken if together they exceeded 70%. The EppoMNPV hr regions show considerably more variation than those of OpMNPV and AcMNPV (Fig. 4B). The similarity between the consensus palindrome sequences of EppoMNPV, AcMNPV and OpMNPV correlates poorly with the overall relatedness of these viruses: the EppoMNPV palindromes are more similar to those of AcMNPV than to those of OpMNPV. The hr sequences have been identified as enhancers of gene expression and can act as origins of replication (Leisy & Rohrmann, 1993 ). The baculovirus early gene transactivator IE1 has been shown to bind an 8 bp sequence located three bases in from each end of the palindrome in AcMNPV (Fig. 4B) (Rodems & Friesen, 1995 ). The IE1 proteins of EppoMNPV and OpMNPV would be expected to bind the equivalent regions of the palindromes in these viruses. Fig. 3 highlights two highly variable regions between EppoMNPV, OpMNPV and AcMNPV. Because hr regions are present in or around both of these regions, they may play a role in recombination events in baculoviruses. LdMNPV, the most divergent of the NPVs sequenced to date, has more hr regions than any of the other completely sequenced baculoviruses (Hayakawa et al., 2000 ).
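The 70% consensus rule can be sketched in Python as follows. This assumes the palindrome copies have already been aligned to equal length. It also assumes that gaps count towards the total but are never reported, and that "N" marks positions where neither a single base nor the top two bases exceed the threshold. These conventions, and the function name, are ours. When two bases tie for second place, the one encountered first is kept.

```python
from collections import Counter


def consensus(aligned, threshold=0.7):
    """Column-wise consensus of equal-length aligned repeat copies.

    Reports a base if it occurs in more than `threshold` of the copies;
    otherwise reports the two most frequent bases as "[XY]" if together they
    exceed `threshold`; otherwise reports "N".
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    out = []
    for column in zip(*aligned):
        n = len(column)
        ranked = Counter(b for b in column if b != "-").most_common(2)
        if ranked and ranked[0][1] / n > threshold:
            out.append(ranked[0][0])
        elif len(ranked) == 2 and (ranked[0][1] + ranked[1][1]) / n > threshold:
            out.append("[" + "".join(sorted(b for b, _ in ranked)) + "]")
        else:
            out.append("N")
    return "".join(out)


if __name__ == "__main__":
    # Invented copies, for illustration only.
    print(consensus(["ACGT", "ACGA", "ACTT", "ACGT", "GCTT"]))  # AC[GT]T
```

Note that a base present at exactly 70% does not qualify on its own, because the rule requires a frequency greater than 70%.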



Fig. 4. The repeat regions of EppoMNPV. (A) The sequences of each of the hr regions identified on the EppoMNPV genome. The shaded area indicates the consensus palindrome sequence, in which a nucleotide is called when it is present at a given position in more than 70% of the palindromes. Where two bases are indicated, they are the two most frequent bases at that position and together account for more than 70% of occurrences. (B) The consensus palindrome sequence aligned with those of AcMNPV and OpMNPV. The eight bases shown to bind the IE1 transactivator in AcMNPV are underlined, as are the equivalent regions of the OpMNPV and EppoMNPV palindromes. (C) The aligned sequence of the direct UR region identified on HindIII fragment C. Lines indicate gaps introduced to aid repeat alignment.
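Two operations underlie the palindrome comparison in Fig. 4(B): scoring how closely a sequence matches its own reverse complement (an imperfect palindrome has a few mismatched pairs), and locating the 8 bp stretches three bases in from each end that correspond to the IE1-binding positions in AcMNPV. The sketch below assumes "three bases in" means that three terminal bases are skipped before each 8 bp window. That reading, the function names and the synthetic test sequence are assumptions for illustration, not taken from the paper.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence (other symbols raise KeyError)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def palindrome_mismatches(seq: str) -> int:
    """Number of mismatched pairs when the sequence is folded on its centre.

    Position i is paired with position len(seq) - 1 - i; a perfect
    (reverse-complement) palindrome scores 0. For odd lengths the central
    base is unpaired and ignored.
    """
    s = seq.upper()
    n = len(s)
    return sum(1 for i in range(n // 2) if s[i] != COMPLEMENT[s[n - 1 - i]])


def end_windows(seq: str, offset: int = 3, width: int = 8) -> tuple[str, str]:
    """The `width`-bp windows starting `offset` bases in from each end.

    Both windows are returned as read on the given strand; in a perfect
    palindrome the right window is the reverse complement of the left one.
    """
    if len(seq) < 2 * (offset + width):
        raise ValueError("sequence too short for two non-overlapping windows")
    n = len(seq)
    return seq[offset:offset + width], seq[n - offset - width:n - offset]


if __name__ == "__main__":
    half = "ACGTTGCAAGGTCAT"  # synthetic 15 bp half, not an hr sequence
    site = half + reverse_complement(half)  # a perfect 30 bp palindrome
    print(palindrome_mismatches(site))  # 0
    print(end_windows(site))
```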

 
A unique repeat (UR) region (Fig. 4C) has been identified on the HindIII C fragment, upstream of the gp64 gene and downstream of ORF 113, one of the unique ORFs of EppoMNPV. The 301 bp region consists of direct repeats of a 28 bp sequence and shows no similarity to repeats found in other baculoviruses. OpMNPV has a unique G/T repeat region at the same genomic location (Ahrens et al., 1997).
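If the copies lie head to tail, as a 301 bp region built from a 28 bp direct repeat suggests, 301 bp corresponds to about 10·75 unit lengths, or roughly 10 to 11 copies when indels are ignored. One simple way to estimate a repeat unit length, used here purely for illustration and not necessarily the approach taken in this study, is to compare the sequence with itself shifted by each candidate period p and record the fraction of matching positions; the unit length and its multiples score highest. The sketch does not model insertions or deletions, which the gaps in the Fig. 4(C) alignment show are present, so a real repeat will score below 1. Function names and the test sequence are hypothetical.

```python
def period_scores(seq: str, min_period: int = 2, max_period: int = 50) -> dict[int, float]:
    """Fraction of positions i with seq[i] == seq[i + p], for each period p."""
    s = seq.upper()
    scores: dict[int, float] = {}
    for p in range(min_period, min(max_period, len(s) - 1) + 1):
        pairs = len(s) - p
        matches = sum(1 for i in range(pairs) if s[i] == s[i + p])
        scores[p] = matches / pairs
    return scores


def best_period(seq: str, min_period: int = 2, max_period: int = 50) -> int:
    """Smallest period with the highest self-match score.

    In a clean repeat, multiples of the unit length score as well as the unit
    itself, so ties are broken in favour of the shortest period.
    """
    scores = period_scores(seq, min_period, max_period)
    if not scores:
        raise ValueError("no candidate periods for this sequence length")
    top = max(scores.values())
    return min(p for p, score in scores.items() if score == top)


if __name__ == "__main__":
    unit = "GATTACAGC"  # synthetic 9 bp unit, not the EppoMNPV UR sequence
    print(best_period(unit * 5, max_period=20))  # 9
```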

Summary
EppoMNPV is the tenth baculovirus to have been fully sequenced and is the smallest group I NPV genome sequenced to date. EppoMNPV shares 116 ORFs with the three other fully sequenced group I NPVs and 62 ORFs with all other sequenced baculoviruses. The EppoMNPV genome contains eight randomly dispersed, unique ORFs and lacks a homologue of the sod gene, which is present in all nine other fully sequenced baculoviruses. The only repeated genes present are the four iap genes. EppoMNPV has five hr regions, two of which are located in the regions showing the most variability relative to OpMNPV. Phylogenetic analysis of whole baculovirus genomes places EppoMNPV in the group I NPVs, closely related to OpMNPV.
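The gene-content comparisons summarised above (ORFs shared by all group I NPVs, ORFs shared by all baculoviruses, and sod being absent only from EppoMNPV) reduce to set operations once each ORF has been assigned to a homologue group. Assigning homologues is the substantive step and is not shown here. The sketch below assumes that step has been done and uses invented genome names and toy gene sets, not the real gene contents of any virus.

```python
def shared_genes(gene_content: dict[str, set[str]], genomes: list[str]) -> set[str]:
    """Homologue groups present in every listed genome (set intersection)."""
    if not genomes:
        raise ValueError("need at least one genome")
    return set.intersection(*(gene_content[g] for g in genomes))


def missing_from(gene_content: dict[str, set[str]], genome: str, others: list[str]) -> set[str]:
    """Genes shared by all `others` but absent from `genome`."""
    return shared_genes(gene_content, others) - gene_content[genome]


if __name__ == "__main__":
    # Toy data: genome names are hypothetical and gene sets are illustrative.
    toy = {
        "virusA": {"polh", "lef-8", "sod", "gp64"},
        "virusB": {"polh", "lef-8", "sod"},
        "virusC": {"polh", "lef-8", "gp64"},
    }
    print(sorted(shared_genes(toy, ["virusA", "virusB", "virusC"])))  # ['lef-8', 'polh']
    print(missing_from(toy, "virusC", ["virusA", "virusB"]))  # {'sod'}
```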


   Acknowledgments
 
This research was supported by the Marsden Fund (grant UOO602) and by the Foundation for Research, Science and Technology, New Zealand (CO6X0001). E.A.H. was supported by a NERC CASE studentship (GT04/99/TS/142).


   Footnotes
 
The GenBank accession number of the EppoMNPV genome sequence is AY043265.


   References
Ahrens, C. H., Russell, R. L., Funk, C. J., Evans, J. T., Harwood, S. H. & Rohrmann, G. F. (1997). The sequence of the Orgyia pseudotsugata multinucleocapsid nuclear polyhedrosis virus genome. Virology 229, 381-399.

Ayres, M. D., Howard, S. C., Kuzio, J., Lopez-Ferber, M. & Possee, R. D. (1994). The complete DNA sequence of Autographa californica nuclear polyhedrosis virus. Virology 202, 586-605.

Bawden, A. L., Glassberg, K. J., Diggans, J., Shaw, R., Farmerie, W. & Moyer, R. W. (2000). Complete genomic sequence of the Amsacta moorei entomopoxvirus: analysis and comparison with other poxviruses. Virology 274, 120-139.

Blissard, G. W. & Rohrmann, G. F. (1989). Location, sequence, transcriptional mapping, and temporal expression of the gp64 envelope glycoprotein gene of the Orgyia pseudotsugata multicapsid nuclear polyhedrosis virus. Virology 170, 537-555.

Blissard, G. W., Black, B., Crook, N., Keddie, B. A., Possee, R. D., Rohrmann, G. F., Theilmann, D. & Volkman, L. (2000). Baculoviridae. In Virus Taxonomy: Seventh Report of the International Committee on Taxonomy of Viruses, pp. 195-202. Edited by M. H. V. van Regenmortel, C. M. Fauquet, D. H. L. Bishop, E. B. Carstens, M. K. Estes, S. M. Lemon, J. Maniloff, M. A. Mayo, D. J. McGeoch, C. R. Pringle & R. B. Wickner. San Diego: Academic Press.

Caradoc-Davies, K. M. B., Graves, S., O’Reilly, D. R., Evans, O. P. & Ward, V. K. (2001). Identification and in vivo characterization of the Epiphyas postvittana nucleopolyhedrovirus ecdysteroid UDP-glucosyltransferase. Virus Genes 22, 255-264.

Carson, D. D., Summers, M. D. & Guarino, L. A. (1991). Molecular analysis of a baculovirus regulatory gene. Virology 182, 279-286.

Chen, C.-J., Leisy, D. J. & Thiem, S. M. (1996). Physical map of Anagrapha falcifera multinucleocapsid nuclear polyhedrosis virus. Journal of General Virology 77, 167-171.

Chen, X., IJkel, W. F. J., Tarchini, R., Sun, X., Sandbrink, H., Wang, H., Peters, S., Zuidema, D., Lankhorst, R. K., Vlak, J. M. & Hu, Z. (2001). The sequence of the Helicoverpa armigera single nucleocapsid nucleopolyhedrovirus genome. Journal of General Virology 82, 241-257.

Cheng, X. W. & Carner, G. R. (2000). Characterization of a single-nucleocapsid nucleopolyhedrovirus of Thysanoplusia orichalcea L. (Lepidoptera: Noctuidae) from Indonesia. Journal of Invertebrate Pathology 75, 279-287.

Das, R. H. & Prasad, Y. D. (1996). Restriction endonuclease analysis of a Spodoptera litura nuclear polyhedrosis virus (NPV) isolate. Biochemistry and Molecular Biology International 39, 1-11.

Federici, B. A. & Hice, R. H. (1997). Organization and molecular characterization of genes in the polyhedrin region of the Anagrapha falcifera multinucleocapsid NPV. Archives of Virology 142, 333-348.

Felsenstein, J. (1995). PHYLIP: Phylogeny Inference Package, version 3.5c. University of Washington, Seattle, WA, USA.

Gomi, S., Majima, K. & Maeda, S. (1999). Sequence analysis of the genome of Bombyx mori nucleopolyhedrovirus. Journal of General Virology 80, 1323-1337.

Hashimoto, Y., Hayakawa, T., Ueno, Y., Fujita, T., Sano, Y. & Matsumoto, T. (2000). Sequence analysis of the Plutella xylostella granulovirus genome. Virology 275, 358-372.

Hayakawa, T., Ko, R., Okano, K., Seong, S. I., Goto, C. & Maeda, S. (1999). Sequence analysis of the Xestia c-nigrum granulovirus genome. Virology 262, 277-297.

Hayakawa, T., Rohrmann, G. F. & Hashimoto, Y. (2000). Patterns of genome organization and content in lepidopteran baculoviruses. Virology 278, 1-12.

Herniou, E. A., Luque, T., Chen, X., Vlak, J. M., Winstanley, D., Cory, J. S. & O’Reilly, D. R. (2001). Use of whole genome sequence data to infer baculovirus phylogeny. Journal of Virology 75, 8117-8126.

Hu, Z. H., Arif, B. M., Jin, F., Martens, J. W. M., Chen, X. W., Sun, J. S., Zuidema, D., Goldbach, R. W. & Vlak, J. M. (1998). Distinct gene arrangement in the Buzura suppressaria single-nucleocapsid nucleopolyhedrovirus genome. Journal of General Virology 79, 2841-2851.

Hyink, O., Graves, S., Fairbairn, F. M. & Ward, V. K. (1998). Mapping and polyhedrin gene analysis of the Epiphyas postvittana nucleopolyhedrovirus genome. Journal of General Virology 79, 2853-2862.

IJkel, W. F. J., van Strien, E. A., Heldens, J. G. M., Broer, R., Zuidema, D., Goldbach, R. W. & Vlak, J. M. (1999). Sequence and organization of the Spodoptera exigua multicapsid nucleopolyhedrovirus genome. Journal of General Virology 80, 3289-3304.

Kool, M., Ahrens, C. H., Goldbach, R. W., Rohrmann, G. F. & Vlak, J. M. (1994). Identification of genes involved in DNA replication of the Autographa californica baculovirus. Proceedings of the National Academy of Sciences, USA 91, 11212-11216.

Kuzio, J., Pearson, M. N., Harwood, S. H., Funk, C. J., Evans, J. T., Slavicek, J. M. & Rohrmann, G. F. (1999). Sequence and analysis of the genome of a baculovirus pathogenic for Lymantria dispar. Virology 253, 17-34.

Leisy, D. J. & Rohrmann, G. F. (1993). Characterization of the replication of plasmids containing hr sequences in baculovirus-infected Spodoptera frugiperda cells. Virology 196, 722-730.

Li, S., Erlandson, M., Moody, D. & Gillott, C. (1997). A physical map of the Mamestra configurata nucleopolyhedrovirus genome and sequence analysis of the polyhedrin gene. Journal of General Virology 78, 265-271.

Lu, M. & Iatrou, K. (1997). Characterization of a domain of the genome of BmNPV containing a functional gene for a small capsid protein and harboring deletions eliminating three open reading frames that are present in AcNPV. Gene 185, 69-75.

Luque, T., Finch, R., Crook, N., O’Reilly, D. R. & Winstanley, D. (2001). The complete sequence of the Cydia pomonella granulovirus genome. Journal of General Virology 82, 2531-2547.

Maguire, T., Harrison, P., Hyink, O., Kalmakoff, J. & Ward, V. K. (2000). The inhibitors of apoptosis of Epiphyas postvittana nucleopolyhedrovirus. Journal of General Virology 81, 2803-2811.

Morris, T. D., Todd, J. W., Fisher, B. & Miller, L. K. (1994). Identification of lef-7: a baculovirus gene affecting late gene expression. Virology 200, 360-369.

Nielsen, H., Engelbrecht, J., Brunak, S. & von Heijne, G. (1997). Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. Protein Engineering 10, 1-6.

Pazin, M. J. & Kadonaga, J. T. (1997). SWI2/SNF2 and related proteins: ATP-driven motors that disrupt protein–DNA interactions? Cell 88, 737-740.

Pearson, M. N., Groten, C. & Rohrmann, G. F. (2000). Identification of the Lymantria dispar nucleopolyhedrovirus envelope fusion protein provides evidence for a phylogenetic division of the Baculoviridae. Journal of Virology 74, 6126-6131.

Peterson, C. L. (1996). Multiple switches to turn on chromatin? Current Opinion in Genetics and Development 6, 171-175.

Richards, A., Speight, M. & Cory, J. (1999). Characterization of a nucleopolyhedrovirus from the vapourer moth, Orgyia antiqua (Lepidoptera Lymantriidae). Journal of Invertebrate Pathology 74, 137-142.

Rodems, S. M. & Friesen, P. D. (1995). Transcriptional enhancer activity of hr5 requires dual-palindrome half sites that mediate binding of a dimeric form of the baculovirus transregulator IE1. Journal of Virology 69, 5368-5375.

Rost, B. (1996). PHD: predicting one-dimensional protein structure by profile-based neural networks. Methods in Enzymology 266, 525-539.

Sadler, T. J., Glare, T. R., Ward, V. K. & Kalmakoff, J. (2000). Physical and genetic map of the Wiseana nucleopolyhedrovirus genome. Journal of General Virology 81, 1127-1133.

Shippam, C., Wu, X., Stewart, S. & Theilmann, D. A. (1997). Characterization of a unique OpMNPV-specific early gene not required for viral infection in tissue culture. Virology 227, 447-459.

Tomalski, M. D., Eldridge, R. & Miller, L. K. (1991). A baculovirus homolog of a Cu/Zn superoxide dismutase gene. Virology 184, 149-161.

Tsukiyama, T. & Wu, C. (1997). Chromatin remodeling and transcription. Current Opinion in Genetics and Development 7, 182-191.

Wu, X., Stewart, S. & Theilmann, D. A. (1993). Characterization of an early gene coding for a highly basic 8·9K protein from the Orgyia pseudotsugata multicapsid nuclear polyhedrosis virus. Journal of General Virology 74, 1591-1598.

Zanotto, P. M., Kessing, B. D. & Maruniak, J. E. (1993). Phylogenetic interrelationships among baculoviruses: evolutionary rates and host associations. Journal of Invertebrate Pathology 62, 147-164.

Received 26 September 2001; accepted 14 December 2001.