Natural variation among human adenoviruses: genome sequence and annotation of human adenovirus serotype 1

Kim P. Lauer1, Isabel Llorente1, Eric Blair1, Jason Seto1, Vladimir Krasnov1, Anjan Purkayastha1,4,5, Susan E. Ditty2,5, Ted L. Hadfield2,5, Charles Buck3, Clark Tibbetts4,5 and Donald Seto1,4,5

1 Bioinformatics and Computational Biology, School of Computational Sciences, George Mason University, 10900 University Boulevard, MSN 5B3, Manassas, VA 20110, USA
2 Division of Microbiology, Department of Infectious and Parasitic Diseases Pathology, Armed Forces Institute of Pathology, 5300 Georgia Avenue NW, Washington, DC 20306, USA
3 Department of Virology, American Type Culture Collection (ATCC), Manassas, VA 20108, USA
4 HQ USAF Surgeon General Office, Directorate of Modernization (SGR), 5201 Leesburg Pike, Suite 1401, Falls Church, VA 22041, USA
5 Epidemic Outbreak Surveillance (EOS) Consortium, 5201 Leesburg Pike, Suite 1401, Falls Church, VA 22041, USA

Correspondence
Donald Seto
dseto{at}gmu.edu


   ABSTRACT
The 36 001 bp DNA sequence of human adenovirus serotype 1 (HAdV-1) has been determined using a ‘leveraged primer sequencing strategy’ to generate high-quality sequence economically. This annotated genome (GenBank AF534906) confirms the anticipated similarity to the closely related species C (formerly subgroup C) human adenoviruses HAdV-2 and -5, and near identity with earlier reports of sequences representing parts of the HAdV-1 genome. A first round of HAdV-1 sequence data acquisition used PCR amplification and sequencing primers derived from sequences common to the genomes of HAdV-2 and -5. Subsequent rounds of sequencing used primers derived from the newly generated data. Corroborative re-sequencing with primers selected from this HAdV-1 dataset generated sparsely tiled arrays of high-quality sequencing ladders spanning both complementary strands of the HAdV-1 genome. These strategies allow rapid and accurate low-pass sequencing of genomes. Such rapid genome determinations facilitate the development of specific probes for differentiating family, serotype, subtype and strain (e.g. pathogen genome signatures). These probes will be used by the Epidemic Outbreak Surveillance (EOS) project to monitor epidemic outbreaks of acute respiratory disease in a defined test bed.


   INTRODUCTION
Through five decades, since the first detailed characterizations of human adenoviruses (HAdVs) (Rowe et al., 1953; Hillemann & Werner, 1954; Buescher, 1967; Benko et al., 2000), this host–pathogen system has repeatedly served to catalyse insights into molecular biology and genetics, as well as complex epidemiology and pathogenesis (Wadell, 1984).

Recognized diversity among the HAdV serotypes came to be understood in terms of six clades, or species (formerly subgroups). Among these, members of species HAdV-C (serotypes 1, 2, 5 and 6) typically cause benign respiratory and gastrointestinal infections, in an endemic pattern, among most human hosts in early childhood. The serotypes of this species can establish lifelong, persistent shedding infections of lymphoid tissues, which accounts for their early identification as outgrowths of human cell cultures exposed to tissue extracts from even apparently healthy individuals.

A strikingly different pattern of epidemiology and pathogenesis is shared by the species B1 and E serotypes HAdV-3, -4, -7 and -21. These adenoviruses (AdVs) instead affect otherwise healthy young adults, causing epidemic outbreaks of acute respiratory disease (ARD) among basic military trainees (Dudding et al., 1972). These costly outbreaks were controlled from about 1970 onwards by the introduction of effective live virus vaccines. However, the manufacture of HAdV vaccines ceased in 1996, and frequent HAdV-associated ARD epidemics have since recurred at military basic training venues (Gray et al., 2000). Such outbreaks are characterized by extensive morbidity and occasional mortality, adding a human toll to high economic costs, and have revived the earlier broad interest in HAdV (Ryan et al., 2001). The Epidemic Outbreak Surveillance (EOS) Consortium has undertaken to integrate genome sequence-based advanced diagnostic platforms and medical bioinformatics for near real-time detection and accelerated identification of the aetiology of specific ARD outbreaks.

The first reported genome sequences of human adenoviruses were those of HAdV-2 and -5 (Roberts et al., 1986; Chroboczek et al., 1992), both members of species HAdV-C. These were presented as mosaic genome sequences assembled from data generated in different laboratories, with different methodologies and at different times. Subsequently, the genomes of serotypes representing species A (HAdV-12), D (HAdV-17) and F (HAdV-40) were deposited in GenBank. Renewed interest in HAdV genomes has led to a recent ‘third party annotation’ of these genomes (Davison et al., 2003), to the determination of two species B2 genomes, HAdV-11 and -35, each sequenced twice (Mei et al., 2003; Stone et al., 2003; Gao et al., 2003; Vogels et al., 2003), and to a redetermination of the HAdV-5 genome (Sugarman et al., 2003).

This genome sequence of the ‘first human’ adenovirus, HAdV-1, was obtained with an efficient methodology for acquiring diverse and accurate virus genome sequences with which to identify serotypes, strains and variants unambiguously and to distinguish among them. HAdV-C serotypes are likely to be isolated from persistently infected military basic trainees, amid the background of other pathogens. The leveraged sequencing methodology has also assisted in the acquisition of genome sequences of the HAdV serotypes that are aetiological agents of ARD outbreaks (HAdV-4 and -7), neither of which had previously been reported.


   METHODS
Cells and virus.
HAdV-1 (ATCC VR-1; strain Adenoid 71) was obtained from the American Type Culture Collection (ATCC) and grown in monolayer cultures. The stocks were initially expanded in MRC-5 cells, and propagation was subsequently switched to A-549 cells (ATCC CCL-185).

Viral DNA preparation.
DNA was prepared as described previously, with slight modifications (Scarpini et al., 1999). Suspensions were subjected to seven or eight freeze–thaw–vortex cycles between 37 °C and –70 °C to release the viral DNA. CsCl density-gradient centrifugation to further purify the viral DNA was performed in a Beckman VTi65.2 rotor at 40 000 r.p.m.

PCR strategy and methodology.
Standard PCR methodologies were used to amplify the regions to be sequenced. Each 50 µl reaction contained 1·25 U Pfu Turbo DNA polymerase (Stratagene) in 1× polymerase buffer (Stratagene), dNTPs (200 µM; from 1 mM stock solutions), oligonucleotide primers (0·2 µM; from 10 µM stock solutions) and template DNA (165·8 ng; from a 33·16 ng µl–1 stock solution). Cycling was performed in a PE-Applied Biosystems (ABI) GeneAmp PCR System 9700 thermocycler: an initial denaturation at 96 °C for 2 min, followed by 25 amplification cycles of 94 °C for 30 s, 55 °C for 1 min and 72 °C for 1 min. A final extension at 72 °C for 10 min followed the last cycle, after which the samples were stored at 4 °C.
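The reaction set-up above follows the dilution relation C1V1 = C2V2. As a sketch, assuming the 50 µl figure is the total reaction volume, the stock volumes and the nominal program length can be worked out as follows; the helper names are hypothetical, and ramp times are ignored.

```python
def stock_volume_ul(final_conc, stock_conc, total_volume_ul):
    """Volume of stock to add so that C1 * V1 = C2 * V2 (concentrations in the same units)."""
    return final_conc * total_volume_ul / stock_conc


def program_minutes(initial_s, cycle_steps_s, n_cycles, final_s):
    """Nominal thermocycler program length in minutes, ignoring ramp and hold times."""
    return (initial_s + n_cycles * sum(cycle_steps_s) + final_s) / 60.0


TOTAL_UL = 50.0  # assumption: 50 µl is the total reaction volume

dntp_ul = stock_volume_ul(200.0, 1000.0, TOTAL_UL)  # 200 µM from a 1 mM (1000 µM) stock
primer_ul = stock_volume_ul(0.2, 10.0, TOTAL_UL)    # 0.2 µM from a 10 µM stock, per primer
template_ul = 165.8 / 33.16                         # ng divided by ng per µl

# 96 °C 2 min; 25 x (94 °C 30 s, 55 °C 1 min, 72 °C 1 min); 72 °C 10 min
run_min = program_minutes(120, [30, 60, 60], 25, 600)

print(f"dNTPs {dntp_ul:.1f} µl, each primer {primer_ul:.1f} µl, template {template_ul:.1f} µl")
print(f"nominal program length {run_min:.1f} min")
```

Under these assumptions the stocks contribute 10 µl of dNTPs, 1 µl of each primer and 5 µl of template, with buffer, enzyme and water making up the remainder of the 50 µl.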

DNA sequencing
Thermocycle sequencing of PCR products.
Fluorescence-based Sanger DNA sequencing protocols were used to determine the genome sequence as described (Seto et al., 1994). PCR products were treated with 0·5 U shrimp alkaline phosphatase (USB) and 1 U exonuclease I (USB) (37 °C for 1 min; 72 °C for 15 min) and then used directly for DNA cycle sequencing on an ABI 377 sequencer. Either the bracketing PCR primers or internal primers were used as sequencing primers to obtain overlapping and complementary sequences, with a minimum threefold coverage according to the 2+1 rule. Finally, the entire HAdV-1 genome was re-sequenced using primers designed from the initial completed consensus sequence. These reactions were performed as a sparsely tiled array of PCR and sequencing primers and run on an ABI 3100 capillary array sequencer.
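The text does not define the ‘2+1 rule’. One plausible reading, assumed here, is that every base must be covered by at least three reads with both strands represented (for example, two reads on one strand and one on the complementary strand). The hypothetical function below flags positions that fail that criterion, given read placements in 1-based inclusive genome coordinates.

```python
from dataclasses import dataclass


@dataclass
class Read:
    start: int   # 1-based, inclusive genome coordinate
    end: int     # 1-based, inclusive genome coordinate
    strand: str  # "+" for the reference strand, "-" for the complementary strand


def positions_failing_2plus1(reads, genome_length):
    """Positions with fewer than 3 reads in total, or with no read on one of the strands."""
    # Difference arrays give per-base depth for each strand in linear time.
    plus = [0] * (genome_length + 2)
    minus = [0] * (genome_length + 2)
    for r in reads:
        depth = plus if r.strand == "+" else minus
        depth[r.start] += 1
        depth[r.end + 1] -= 1
    failing = []
    p = m = 0
    for pos in range(1, genome_length + 1):
        p += plus[pos]
        m += minus[pos]
        if p + m < 3 or p == 0 or m == 0:
            failing.append(pos)
    return failing


reads = [Read(1, 10, "+"), Read(1, 10, "+"), Read(1, 5, "-")]
print(positions_failing_2plus1(reads, 10))  # -> [6, 7, 8, 9, 10]: no complementary-strand read
```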

Direct genomic cycle sequencing of ends.
Viral ends were sequenced directly, using whole-genome DNA as template with the appropriate primers. In brief, 16 µl BigDye sequencing solution, 20 pmol of the appropriate primer and 1 µg repurified DNA (22 µl recovered from purification with a Microcon 30 microconcentrator; Amicon) were combined in a total volume of 40 µl. The sample was denatured initially at 95 °C for 5 min and then cycled 80 times at 95 °C for 30 s and 50 °C for 20 s, followed by a final extension at 60 °C for 4 min; it was then stored at 4 °C.
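Direct sequencing from whole-genome template supplies relatively few template molecules per reaction, which helps explain the large DNA input and the 80-cycle program. A back-of-envelope estimate, assuming an average mass of about 650 g/mol per base pair of double-stranded DNA (a common rule of thumb, not a figure from this study):

```python
AVOGADRO = 6.02214076e23  # molecules per mole
BP_MASS = 650.0           # assumed average g/mol per base pair of dsDNA (rule of thumb)


def dsdna_pmol(mass_ng, length_bp):
    """Picomoles of a double-stranded DNA molecule of the given length."""
    return (mass_ng * 1e-9) / (length_bp * BP_MASS) * 1e12


template_pmol = dsdna_pmol(1000.0, 36001)        # 1 µg of the 36 001 bp genome
template_copies = template_pmol * 1e-12 * AVOGADRO
primer_excess = 20.0 / template_pmol             # 20 pmol primer per reaction

print(f"{template_pmol * 1000:.1f} fmol template, {template_copies:.2e} genome copies")
print(f"primer:template molar ratio about {primer_excess:.0f}:1")
```

On this estimate, 1 µg of genomic DNA is roughly 43 fmol of genome ends for the primer to anneal to, with the primer in several-hundred-fold molar excess.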

Sequence analysis and genome annotation.
DNA sequences were assembled using a beta version of Sequencher 4.1.1 (Gene Codes Corporation). Features of the DNA sequence were analysed using the Wisconsin GCG package (SeqWeb v.2).

The genome sequence was annotated by splitting it into non-overlapping 1 kb segments. These were queried systematically against the NCBI non-redundant database using BLASTX from the BLAST suite of sequence-alignment software (Altschul et al., 1990, 1997). Searches used the default parameters: word size 3, expectation value 10, the BLOSUM62 substitution matrix and gap penalties of 11 (existence) and 1 (extension). Low-complexity sequences were filtered out of the queries.
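The segmentation step can be sketched as follows. This is a minimal illustration, assuming 1-based inclusive coordinates in the record names; the function names and the toy sequence are hypothetical. Note that a 36 001 bp genome yields 36 full segments plus a 1 bp remainder.

```python
def split_segments(seq, size=1000):
    """Non-overlapping segments as (start, end, subsequence); coordinates 1-based inclusive."""
    return [(i + 1, min(i + size, len(seq)), seq[i:i + size])
            for i in range(0, len(seq), size)]


def segments_to_fasta(name, seq, size=1000, width=60):
    """Format each segment as a FASTA record named <name>_<start>-<end>."""
    lines = []
    for start, end, sub in split_segments(seq, size):
        lines.append(f">{name}_{start}-{end}")
        lines.extend(sub[j:j + width] for j in range(0, len(sub), width))
    return "\n".join(lines) + "\n"


toy_genome = "ACGT" * 9000 + "A"  # placeholder 36 001 bp sequence, not HAdV-1
segments = split_segments(toy_genome)
print(len(segments), segments[-1][:2])
```

With the current BLAST+ command-line tools, a roughly equivalent search could be run as `blastx -query segments.fa -db nr -matrix BLOSUM62 -gapopen 11 -gapextend 1 -word_size 3 -evalue 10 -seg yes`; this is an assumed modern equivalent, not the command used in the study.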

For gene prediction, GenomeScan was used to identify exons in coding sequences where exon–intron borders were difficult to determine. This algorithm combines exon–intron identification with similarity searches against a sequence database to predict coding sequences in a given DNA fragment (Yeh et al., 2001). Novel sequences (‘hypothetical proteins’) were also found with GeneMark, gene-prediction software based on a fifth-order hidden Markov model (HMM) (Besemer & Borodovsky, 1999). In the course of this annotation, GeneMark was slightly more accurate than GenomeScan, but neither program produced a completely accurate or comprehensive list of genes.
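GeneMark's models are more elaborate than can be shown here (its coding models depend on codon position), but the core idea of scoring a sequence with a fifth-order Markov model can be sketched as a log-odds ratio between a ‘coding’ and a ‘non-coding’ model. This is an illustrative simplification, not the GeneMark algorithm, and the training sequences below are toy placeholders.

```python
import math
from collections import defaultdict


def train_markov(seqs, order=5, pseudocount=1.0):
    """Return log P(base | preceding `order` bases), with additive smoothing."""
    counts = defaultdict(lambda: defaultdict(float))
    for s in seqs:
        for i in range(order, len(s)):
            counts[s[i - order:i]][s[i]] += 1.0

    def log_prob(context, base):
        row = counts[context] if context in counts else {}
        total = sum(row.values()) + 4 * pseudocount
        return math.log((row.get(base, 0.0) + pseudocount) / total)

    return log_prob


def log_odds(seq, coding, noncoding, order=5):
    """Sum of per-base log-likelihood ratios; positive scores favour the coding model."""
    return sum(coding(seq[i - order:i], seq[i]) - noncoding(seq[i - order:i], seq[i])
               for i in range(order, len(seq)))


coding_model = train_markov(["ATGGCCGCTGCAGCC" * 20])    # toy "coding" training set
noncoding_model = train_markov(["TTTATAAATTTAAT" * 20])  # toy "non-coding" training set
print(log_odds("ATGGCCGCTGCAGCCATGGCC", coding_model, noncoding_model) > 0)
```

Contexts never seen in training fall back to a uniform 1/4 probability through the pseudocounts, so unfamiliar sequence neither strongly favours nor disfavours a model.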

Artemis, an annotation tool from the Sanger Center (Berriman & Rutherford, 2003; Rutherford et al., 2000) was used to expedite genome annotation (see Table 2). Both the genome sequence and annotation of the HAdV-1 genome are accessible from GenBank (AF534906).


Table 2. HAdV-1 genome gene coding annotation

Fifty-three coding regions are identified for HAdV-1 as described in detail in the text. Their proteins and functions are indicated. The nucleotide positions of the start and stop codons, and of the applicable splice sites, are noted (5' to 3' direction). Coding sequences transcribed from the complementary strand are designated by ‘c,’ e.g. ‘(30923–31090)c’.

 
Whole genome comparisons.
Whole-genome comparisons between HAdV-1 and serotypes HAdV-2, -5, -12 and -40 were performed with several alignment programs: PipMaker (http://bio.cse.psu.edu/pipmaker) (Schwartz et al., 2000); FLAG (FAST Local Alignment for Gigabases; http://flag.itri.org.tw/index.html); the Wisconsin GCG package program ‘compare’; and the multiple alignment program MAP (Huang, 1994). Gene order and synteny were assessed using GeneOrder2.0 (Zafar et al., 2001).


   RESULTS AND DISCUSSION
Confirmation of serotype
The infectivity of the sequenced virus was specifically neutralized by NIAID HAdV-1 neutralizing rabbit immune serum V-201-501-565 (ATCC VR-1078AS/Rab). When the virus TCID50 was determined in the presence and absence of neutralizing antisera, the NIAID antiserum against HAdV-1 reduced the titre by 4 log10 units, whereas the NIAID antiserum against the closely related HAdV-2 cross-reacted only slightly, giving a 1 log10 reduction in titre. This confirms the identity of the starting material as HAdV-1.
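A log10 reduction compares the titre measured without antiserum with the titre measured in its presence. The sketch below uses hypothetical TCID50 values chosen only to reproduce the 4 log10 and 1 log10 reductions described above; they are not the study's measurements.

```python
import math


def log10_reduction(titre_without, titre_with):
    """log10 of the fold reduction in titre caused by the antiserum."""
    return math.log10(titre_without / titre_with)


# Hypothetical TCID50 per ml values, for illustration only.
no_serum = 10 ** 6.5
anti_hadv1 = 10 ** 2.5  # homologous antiserum
anti_hadv2 = 10 ** 5.5  # heterologous antiserum

print(round(log10_reduction(no_serum, anti_hadv1), 2))
print(round(log10_reduction(no_serum, anti_hadv2), 2))
```

A 4 log10 reduction corresponds to a 10 000-fold drop in titre, and a 1 log10 reduction to a 10-fold drop.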

PCR and initial DNA sequencing strategy
A leveraging strategy was developed to maximize useful data acquisition while minimizing time, effort and cost. PCR primers were designed from a consensus sequence derived from HAdV-2 and -5 using the Multiple Alignment Program (MAP) (Huang, 1994). Potential primers for PCR amplification and sequencing were identified with PrimOU (http://www.genome.ou.edu/informatics/primou.html). Large PCR products, of about 10 kb, were used initially; smaller products were used to cover gaps and to obtain complementary-strand sequence.
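PrimOU's own selection criteria are not described here. As a hypothetical illustration of drawing primers from sequence shared by HAdV-2 and HAdV-5, the sketch scans two pre-aligned sequences for gap-free windows in which they are identical and keeps windows within an assumed G+C range. The window length and G+C limits are placeholders; a real design tool would also check melting temperature, self-complementarity and uniqueness.

```python
def conserved_primer_windows(aln_a, aln_b, length=20, gc_min=0.4, gc_max=0.6):
    """Return (column, window) pairs where two aligned sequences agree over a gap-free window.

    Columns are 0-based alignment positions; `length`, `gc_min` and `gc_max` are
    placeholder design parameters, not values from the study.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("sequences must be aligned to the same length")
    hits = []
    for i in range(len(aln_a) - length + 1):
        window = aln_a[i:i + length].upper()
        # Identical windows with no gap in one sequence have no gap in the other either.
        if "-" in window or window != aln_b[i:i + length].upper():
            continue
        gc = (window.count("G") + window.count("C")) / length
        if gc_min <= gc <= gc_max:
            hits.append((i, window))
    return hits


print(conserved_primer_windows("ACGTTTTT", "ACGTTATT", length=4))
```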

Re-sequencing with a sparsely tiled array
Given the newly determined HAdV-1 consensus genome, primers were selected for efficient, low-cost genome re-sequencing based on a sparsely tiled overlap of PCR and sequencing primers. This independent, corroborating determination of the genome supports the use of rapid sequencing to examine related genomes of immediate and urgent interest, for example those of the same species or serotype, of ‘field strains’ or even of unknown serotype.

Genome size, structure and heterogeneity
The genome of HAdV-1 is 36 001 bp, compared with 35 937 bp for HAdV-2 (Roberts et al., 1986) and 35 935 bp for HAdV-5 (Chroboczek et al., 1992), its fellow members of species HAdV-C. A recent redetermination of HAdV-5 gives its genome size as 35 934 bp (Sugarman et al., 2003). The genomes of HAdVs from other species are consistently smaller: HAdV-12 (A), 34 125 bp; HAdV-17 (D), 35 100 bp; HAdV-40 (F), 34 214 bp; HAdV-11 (B2), 34 794 bp; and HAdV-35 (B2), 34 794 bp. It is anticipated that this genome and others forthcoming will contribute to the understanding of HAdV genomics (Davison et al., 2003).

Genome comparison
Dot plot analyses.
Whole-genome alignments show the expected high degree of identity among the HAdV-C members (HAdV-1, -2 and -5) (data not shown). Alignments of HAdV-1 against HAdV-12 (A) and HAdV-40 (F) display more variation. Most of these differences (indels and SNPs) lie in the region spanning the late genes and the early E3 and E4 genes. Multiple alignment methods (see Methods) were used to verify these observations. GeneOrder2.0 (Zafar et al., 2001) analyses indicate that gene order and synteny are maintained at the protein level among the HAdV-C members.
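A dot plot marks every pair of positions at which two sequences share an identical word of length k. The minimal exact-match sketch below assumes a default word length of 12; the programs used in the study apply their own, more sensitive, scoring. In a self-comparison all matches lie on the main diagonal unless the sequence contains internal direct repeats, which appear as off-diagonal dots; detecting inverted repeats, such as the ITRs, would additionally require comparing against the reverse complement.

```python
from collections import defaultdict


def kmer_dots(seq_x, seq_y, k=12):
    """All (i, j) with seq_x[i:i+k] == seq_y[j:j+k] (0-based word start positions)."""
    index = defaultdict(list)
    for j in range(len(seq_y) - k + 1):
        index[seq_y[j:j + k]].append(j)
    dots = []
    for i in range(len(seq_x) - k + 1):
        dots.extend((i, j) for j in index.get(seq_x[i:i + k], []))
    return dots


def off_diagonal(dots):
    """In a self-comparison, dots off the main diagonal indicate internal direct repeats."""
    return [(i, j) for i, j in dots if i != j]


self_dots = kmer_dots("ACGTACGT", "ACGTACGT", k=4)
print(off_diagonal(self_dots))
```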

Repetitive sequences.
Analysis of HAdV-1 against itself reveals no internal repetitive sequences. There is a perfect inverted terminal repeat at the ends of the genome, a hallmark of adenovirus genomes.

Inverted terminal repeat
Structural features.
Inverted terminal repeats (ITRs) are characteristic features of AdV genomes (Sprengel et al., 1994). They play important roles in the initiation of AdV DNA replication by providing binding sites for the DNA polymerase complex (Dan et al., 2001). The HAdV-1 ITRs are perfect 103 bp inverted repeats at the two ends of the genome. This contrasts with the ITRs of other HAdVs: 161 bp for HAdV-12 (Sprengel et al., 1994), 163 bp for HAdV-40 (Davison et al., 1993), 146 bp for HAdV-17 (Sequencher alignment with GenBank data) and 137 bp for HAdV-11 (Mei et al., 2003; Stone et al., 2003), but is equivalent to the 102 bp ITR of HAdV-2 (Roberts et al., 1986) and the 103 bp ITR of HAdV-5 (Sequencher alignment with GenBank data). Alignment suggests that the 1 bp deletion in the HAdV-2 ITR may be a sequencing error.
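A perfect ITR of length n means that the first n bases of the genome equal the reverse complement of the last n bases. A minimal sketch of measuring n, assuming an uppercase genome string without ambiguity codes; the toy genome is constructed only for illustration.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def itr_length(genome):
    """Length of the perfect inverted terminal repeat.

    The left end must read identically to the reverse complement of the right end;
    counting stops at the first mismatch (or at half the genome length).
    """
    rc = revcomp(genome)
    n = 0
    while n < len(genome) // 2 and genome[n] == rc[n]:
        n += 1
    return n


left = "CATCATCAATAATATACC"                   # toy left end
toy_genome = left + "G" * 10 + revcomp(left)  # right end is the reverse complement
print(itr_length(toy_genome))
```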

HAdV ITRs are considerably shorter than the 368 bp ITR from bovine adenovirus BAdV-10, the longest AdV ITR sequence presently known (Dan et al., 2001). However, these 100+ bp ITR sequences are in line with ones found in other members of the genus Mastadenovirus, in contrast to the much shorter ITRs of the proposed genus Atadenovirus (Dan et al., 2001).

Functional features.
An analysis of the left ITR of HAdV-1 reveals several expected binding sites for AdV replication factors (Dan et al., 2001). Both repeats start with the CATCATCAAT motif that is conserved in other HAdV genomes (Stone et al., 2003). The ‘core origin’, originally defined as the minimal DNA sequence required for the initiation of AdV replication, is present in its highly conserved form, AATAATATACC (Table 1), at nt 8–18 (Temperley & Hay, 1992); this sequence binds the pre-terminal protein–DNA polymerase complex. It aligns perfectly with its counterparts in the other sequenced HAdV genomes.
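Motif coordinates such as those in Table 1 are 1-based and inclusive, with complementary-strand hits marked ‘c’. A minimal exact-match scanner in that style is sketched below; it does not handle imperfect matches such as the imperfect Sp1 copies. The 18 nt test string is only the shortest sequence consistent with the CATCATCAAT start and the nt 8–18 core origin stated above, not a verified HAdV-1 sequence.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_motif(genome, motif):
    """Exact matches on both strands as (start, end, strand), 1-based inclusive.

    Strand "c" marks a match on the complementary strand, reported in
    forward-strand coordinates as in Table 1.
    """
    genome, motif = genome.upper(), motif.upper()
    rc_motif = revcomp(motif)
    k = len(motif)
    hits = []
    for i in range(len(genome) - k + 1):
        window = genome[i:i + k]
        if window == motif:
            hits.append((i + 1, i + k, "+"))
        elif window == rc_motif:  # palindromic motifs are reported once, as "+"
            hits.append((i + 1, i + k, "c"))
    return hits


def table_style(hit):
    """Format a hit like the Table 1 notation, e.g. '(4089–4094)c'."""
    start, end, strand = hit
    return f"({start}–{end})" + ("c" if strand == "c" else "")


left_end = "CATCATCAATAATATACC"
print([table_style(h) for h in find_motif(left_end, "AATAATATACC")])
```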


Table 1. HAdV-1 genome non-coding motifs annotation

DNA sequence motifs are identified for HAdV-1. Their nucleotide signatures and putative functions are indicated. The nucleotide positions of their location are noted in the 5' to 3' orientation. Functionality embedded within the complementary strand is designated by ‘c,’ e.g. ‘(4089–4094)c’.

 
Several eukaryotic transcription factor-binding sequences are present in the HAdV-1 ITR. These factors strongly enhance HAdV replication, presumably by allowing the virus to exploit host proteins to expedite both replication and transcription of the HAdV genome (Mul et al., 1990; Hatfield & Hearing, 1991). Two blocks of sequences that are required for efficient replication are present in HAdV-1. The first block includes two DNA-binding sites, for nuclear factors I and III (NFI and NFIII) (Leegwater et al., 1985; Pruijn et al., 1988; Hatfield & Hearing, 1991). NFI is also known as cellular transcription factor (CTF). Oct-1 binds to the NFIII motif and stimulates transcription initiation 6–8-fold (Evans & Hearing, 2002). These two motifs enhance HAdV replication independently (Mul et al., 1990).

A second critical block contains binding sites for the cellular transcription factors Sp1 and ATF, which contribute to the efficiency of viral DNA replication (Hatfield & Hearing, 1993). The terminus-proximal Sp1 and ATF sites are responsible for ITR promoter activity and for ITR-dependent, E1A-stimulated transcription (Hatfield & Hearing, 1991).

DNA-binding sites for ATF are present at two locations within this ITR as perfectly conserved matches (TGACGT), at nt 64–69 and 96–101. The first is highly conserved across all of the HAdV genomes noted above; the second is present in the HAdV-C members but not in HAdV-12, -17 and -40.

The composition of the HAdV-1 ITR shows a ratio of (G+C) to (A+T) of 50 %. This sequence, although not GC-rich, nevertheless contains the putative GC-rich binding sites for Sp1. There is a perfect match to the Sp1 recognition site at nt 50–57 (GGGGGTGG) and an imperfect match at nt 76–83 (GGGCGTGG). The first Sp1 site is conserved across the genomes noted above; the second is identical in the other HAdV-C members but is not conserved in HAdV-12, -17 and -40. A third, imperfect copy of this Sp1-binding site is present at nt 87–94 (CGGGGCGG). It is also conserved within the HAdV-C members and is present in HAdV-40 (GGCGGGCGG).
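Base composition can be checked directly from sequence. A (G+C):(A+T) ratio and a G+C fraction are different quantities: equal amounts of G+C and A+T give a ratio of 1 and a G+C fraction of 50 %. The minimal sketch below computes both and, as a simplifying assumption, ignores ambiguity codes such as N.

```python
def composition(seq):
    """G+C fraction and (G+C):(A+T) ratio, counting only unambiguous bases."""
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    at = s.count("A") + s.count("T")
    return {
        "gc_fraction": gc / (gc + at),
        "gc_to_at_ratio": gc / at if at else float("inf"),
    }


print(composition("GGCCAATT"))
```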

Genome annotation
The overall similarity of the HAdV-1 genome to those of other mastadenoviruses suggests that it too is organized into early, intermediate and late transcription regions (Wold & Gooding, 1991). Annotation algorithms (see Methods) identified a total of 53 coding sequences in the genome. These are identified and annotated using the archived GenBank HAdV gene annotations as reference (Table 2). Several predicted or hypothetical genes are identified.

Non-coding features
Sequence motifs.
The ITR and late-region control elements have been analysed extensively in the literature. Non-coding DNA sequence motifs play important roles in the biology of the adenovirus. For example, the adenoviral late genes are transcribed from a single promoter known as the major-late promoter (MLP). Analysis of HAdV-1 yields motifs consistent with those reported. Based on genome sequence comparisons, regulatory elements in the HAdV-1 MLP are identified (Table 1). These include an inverted CAAT box (Reach et al., 1991), the upstream element (Reach et al., 1991), the TATA box (Concino et al., 1984) and the MAZ/Sp1-binding sites flanking the TATA box (Parks & Shenk, 1997). Additionally, the initiator element of the major late transcript is located at nt 6058–6064 (Lee et al., 1988). Two downstream elements, corresponding to sites recognized by the IVa2 protein, are identified in the HAdV-1 genome (Leong et al., 1990; Reach et al., 1991).

These and other non-coding DNA sequence motifs, including DNA replication sites, gene expression regulation sites, DNA–protein-binding sites and polyA sites, are detailed in Table 1. These DNA features are also discussed in the following text within the context of the appropriate genes.

Splicing sites.
Fifteen putative spliced genes are identified in the HAdV-1 genome (Table 2). All have the canonical GT-AG donor–acceptor splicing motifs. Spliced genes are found on both strands: eight of the 18 genes encoded on the complementary strand are spliced. Six of the 14 annotated hypothetical genes appear to be spliced.
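Checking the GT-AG rule amounts to reading the two intron bases after the donor and the two before the acceptor. The hypothetical function below takes the last base of the upstream exon and the first base of the downstream exon as 1-based positions, in the style of the splice-site positions quoted for the L4 and IVa2 genes. For genes on the complementary strand it reverse-complements the genome and maps the coordinates before checking; the toy sequence is for illustration only.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_gt_ag(genome, exon1_end, exon2_start, strand="+"):
    """True if the intron between two exons starts with GT and ends with AG.

    exon1_end: 1-based position of the last base of the upstream exon (donor side).
    exon2_start: 1-based position of the first base of the downstream exon (acceptor side).
    For strand "c", pass forward-strand coordinates (donor > acceptor); they are
    mapped onto the reverse complement, where the gene reads 5' to 3'.
    """
    if strand == "c":
        n = len(genome)
        genome = revcomp(genome)
        exon1_end, exon2_start = n - exon1_end + 1, n - exon2_start + 1
    donor = genome[exon1_end:exon1_end + 2]             # first two intron bases
    acceptor = genome[exon2_start - 3:exon2_start - 1]  # last two intron bases
    return donor.upper() == "GT" and acceptor.upper() == "AG"


toy = "AAAA" + "GTCCCCAG" + "TTTT"                # exon, intron, exon (toy sequence)
print(is_gt_ag(toy, 4, 13))                       # gene on the forward strand
print(is_gt_ag(revcomp(toy), 13, 4, strand="c"))  # same gene on the complementary strand
```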

VA RNA.
The virus-associated (VA) RNA species are non-protein-coding transcripts that have been shown to repress the antiviral activity of host interferons (Mathews & Shenk, 1991). The VA RNA I and II coding sequences are located at nt 10 628–10 787 and 10 885–11 042, respectively. Both RNAs are 98–100 % identical to their counterparts in the other members of HAdV-C (HAdV-2, -5 and -6).

Gene coding features
Early genes: E1A.
The E1A gene is the first transcription unit to be expressed after infection. E1A proteins function as transcriptional regulators within the host cell, modulating both viral and cellular gene expression (Flint & Shenk, 1997). These proteins lack sequence-specific DNA-binding activity (Zu et al., 1992) and apparently control gene expression by interacting with cellular components of the transcription machinery. Multiple E1A proteins are generated by alternative splicing of a common RNA precursor transcribed from a constitutively active promoter. Three putative E1A proteins, of 6·1, 26·4 and 31·8 kDa, are identified in HAdV-1. Comparison of the HAdV-1 sequence with that of HAdV-2 identifies the putative E1A promoter element TATTTA at nt 468–473.

E1B.
The E1B region contains four apparent coding sequences. One encodes a putative 21 kDa protein with high BLAST similarity to the small T antigen, which is conserved in other AdVs. Another encodes a 55·4 kDa protein with identity to the large T antigen protein, which has been shown to inhibit cellular p53-mediated host defence mechanisms (Yew et al., 1994). The large T antigen protein also plays a role in regulating viral late gene expression. Additionally, two putative coding sequences with BLAST hits against hypothetical proteins in the HAdV-C genome annotations are identified. These correspond to proteins encoded by 1·26 and 1·36 kb mRNAs; they were identified by conceptual translation and have partial identity to the large T antigen protein.

E2.
The E2 transcription unit encodes proteins required for viral DNA replication. Along with host cellular proteins, HAdV DNA replication requires three virus-encoded factors: the ‘terminal protein’ precursor, DNA polymerase and DNA-binding protein (De Jong et al., 2003). The E2 transcription unit is divided into two regions, E2A and E2B. Comparison of the HAdV-1 and HAdV-2 sequences identifies the E2A late promoter, TACAAATTT, at nt 25 987–25 995. This transcript is transcribed from the complementary strand. A 59 kDa DNA-binding protein is identified within the E2A transcript, and a 135·6 kDa DNA polymerase and a 74·5 kDa terminal protein precursor are identified in the E2B transcript.

E3.
The E3 region is of special interest because it is the insertion site for foreign gene constructs in AdV gene therapy vectors (Russell, 2000). It apparently exists only in members of the genus Mastadenovirus, with no homologous genes detected in the other four genera (Benko & Harrach, 2003). The HAdV E3 transcription region encodes proteins that are not required for efficient virus growth in vitro but that antagonize the host immune response (Wold & Gooding, 1991).

BLAST analysis identified seven putative proteins in the HAdV-1 E3 region, of 12·3, 6·8, 18·5, 10·6, 10·2, 14·9 and 14·7 kDa. The 12·3 kDa protein has significant similarity to an immunomodulatory E3 protein counterpart in HAdV-2. The 18·5 kDa protein appears to be similar to the 19 kDa glycoprotein of HAdV-2. The 10·2 kDa protein is similar to an E3 protein that might have a role in down-regulating the epidermal growth factor (EGF) receptor. The 14·7 kDa protein is similar to an E3 protein known to protect virus-infected cells against TNF-induced cytolysis (Horton et al., 1990).

E4.
Proteins encoded in the E4 transcription unit perform a range of functions (Leppard, 1997), for example, viral RNA export and stabilization. E4 Orf6 protein combines with the E1B 55 kDa protein to inhibit cellular p53. E4 Orf6/7 protein regulates cellular transcription factor E2F, while E4 Orf4 controls protein phosphorylation in infected cells. A total of eight putative coding sequences in the HAdV-1 E4 region are identified. These include a 34 kDa putative counterpart of the E4 Orf6 protein, a 17·4 kDa protein similar to Orf6/7 protein, a 13·3 kDa putative counterpart of Orf4 protein, a 13·2 kDa putative counterpart of Orf3 protein, a nuclear-binding protein and a 28·2 kDa protein with a dUTPase domain.

Intermediate genes: IX.
Two proteins are produced from the intermediate gene region. Protein IX plays a critical but poorly understood role in controlling DNA packaging. It functions as a structural protein as well as a transcriptional activator; in HAdV-5, protein IX acts as a transcriptional activator for the MLP and other viral and cellular promoters. An ORF encoding a 14·4 kDa protein IX is identified at nt 3621–4043.

The second intermediate protein, IVa2, plays a serotype-specific role in packaging viral DNA during HAdV assembly (Zhang et al., 2001). It also acts as a transcription factor for the major-late genes (Binger & Flint, 1984). A protein IVa2 coding sequence is identified at nt 4102–5729 on the complementary strand. The IVa2 splice sites, located at nt 5438 (acceptor) and 5717 (donor) on the complementary strand, match the GT-AG consensus splice signals.

Late genes.
The HAdV late genes are initially transcribed as a single primary transcript from the major-late promoter (MLP) (Young, 2003). Multiple polyA signals are used to produce the various distinct mRNA species, which are grouped into five families, L1–L5. All of these late mRNAs carry the 5' ‘tripartite leader sequence’.

L1.
Two L1 proteins, the 52 kDa protein (11 059–12 306) and protein IIIa (12 327–14 084), are identified. The 52 kDa protein acts as a scaffold during virus assembly (Hasson et al., 1989), whereas protein IIIa is found on the outer surface of the virion and seems to function in holding the viral facets together (San Martin & Burnett, 2003).

L2.
There are four L2 coding sequences in the HAdV-1 genome. The penton base, protein III, forms part of each of the capsid's 12 pentagonal vertices, and its gene is located at nt 14 166–15 890. The penton base binds to host integrins via a conserved RGD sequence to trigger virus internalization (Wickham et al., 1993). The RGD motif in the HAdV-1 penton is encoded at nt 15 183–15 191. The coding sequences for the precursors of proteins VII and V, which are found in the viral core, are located at nt 15 897–16 493 and 16 563–17 669, respectively. A coding sequence for the 8·8 kDa protein X was identified at nt 17 697–17 939. Protein X, also known as the mu protein, has no defined function.

L3.
There are three L3 coding sequences: the minor capsid protein precursor (pVI), the hexon protein and a 23 kDa protease. Minor capsid protein VI is found on the inner surface of the capsid and might act as a structural intermediate between the capsid and the viral core. The coding sequence for the protein VI precursor is located at nt 18 022–18 774. The 108·7 kDa HAdV-1 hexon is encoded at nt 18 861–21 755. Hexon is the major structural component of the AdV capsid, making up about 63 % of the virion mass. Because of its importance and abundance, its 3D structure has been studied by X-ray crystallography, molecular modelling and sequence-based methods (Rux et al., 2003). The HAdV-1 hexon is 964 aa in length. Comparison with other HAdV-C hexons shows that the HAdV-1 hexon is 89 % identical to the HAdV-2 hexon and 86 % identical to the HAdV-5 hexon. The reported hexon monomer structure comprises two eight-stranded β-barrels and three extended loops. Multiple sequence alignment using CLUSTAL (Thompson et al., 1997) revealed four major regions of variation (VR A–D) among the HAdV-C hexons (Fig. 1). When mapped onto the reported 3D structure of the HAdV-2 hexon, all four variable regions of the HAdV-1 hexon fall on a series of outer loops. These four variable loops probably represent the serotype-specific epitopes. Little sequence variation was observed outside these regions, underscoring the importance of these four regions in defining the virion. Finally, the L3 region encodes a 23 kDa protease at nt 21 788–22 402. This protease is required for the cleavage of viral proteins during virus assembly and maturation.
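Reported identity figures depend on how alignment columns are counted. A minimal sketch, assuming identity is the number of identical aligned residues divided by the number of alignment columns that are not gaps in both sequences; other conventions divide by the length of the shorter sequence or by gap-free columns only, and can give noticeably different percentages for the same alignment.

```python
def percent_identity(aln_a, aln_b):
    """Identical residues / alignment columns that are not gaps in both sequences, in percent."""
    if len(aln_a) != len(aln_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(aln_a, aln_b) if not (x == "-" and y == "-")]
    # Gap-gap columns are already excluded, so x == y implies a shared residue.
    identical = sum(1 for x, y in columns if x == y)
    return 100.0 * identical / len(columns)


print(round(percent_identity("MKV-LA", "MRVQLA"), 1))
```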



Fig. 1. Multiple sequence alignment of the HAdV-C hexons. CLUSTAL alignment of the amino acid sequences of the HAdV-C hexons reveals four major regions of variation (noted as ‘VR A–D’). All variable regions map onto a series of loops in the 3D structure of the HAdV-2 (Ad2) hexon. In the CLUSTAL annotation, an asterisk (*) marks a conserved amino acid, a full stop (.) a position where either size or hydropathy is conserved, and a colon (:) a position where both size and hydropathy are conserved.

 
L4.
Four coding sequences are identified in L4, corresponding to the 100 kDa protein (24 118–26 541), the 22 kDa protein (26 252–26 836), the 33 kDa protein (26 252–26 567, 26 770–27 137) and the pVIII protein (27 225–27 908). The splice sites of the 33 kDa protein at nt 26 567 (donor) and 26 770 (acceptor) match the GT-AG consensus splice signals. Studies suggest that the 100 kDa non-structural protein might have a role in hexon assembly and that it is required for the efficient translation of late viral mRNAs (Oosterom-Dragon & Ginsberg, 1981; Hayes et al., 1990). The functions of the 22 and 33 kDa proteins are yet to be defined. Protein VIII is found on the interior of the capsid and may form a bridge between the capsid and the viral core elements (San Martin & Burnett, 2003). Like its adenoviral homologues, HAdV-1 protein VIII is rich in proline, arginine and lysine (approximately 15 %) and probably has a less ordered structure.
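Because Table 2 gives start and stop codon positions (and splice sites), the length of each predicted protein follows directly from its coordinates. The hypothetical helper below assumes 1-based inclusive exon ranges that together span the start codon through the stop codon, and uses the rough rule of thumb of about 110 Da per residue; that mass estimate is approximate and will not reproduce the masses reported here exactly.

```python
def cds_summary(exons):
    """exons: (start, end) pairs, 1-based inclusive, together spanning start to stop codon."""
    nt = sum(end - start + 1 for start, end in exons)
    if nt % 3:
        raise ValueError(f"{nt} nt is not a whole number of codons")
    aa = nt // 3 - 1                # exclude the stop codon
    approx_kda = aa * 110 / 1000    # ~110 Da per residue, a rough rule of thumb
    return nt, aa, approx_kda


for label, exons in [
    ("hexon", [(18861, 21755)]),
    ("protein IX", [(3621, 4043)]),
    ("protein X", [(17697, 17939)]),
    ("spliced L4 CDS", [(26252, 26567), (26770, 27137)]),
]:
    nt, aa, kda = cds_summary(exons)
    print(f"{label}: {nt} nt, {aa} aa, ~{kda:.1f} kDa")
```

For example, the hexon coordinates (18 861–21 755) give 2895 nt and 964 aa, matching the hexon length given for the L3 region, and the spliced L4 coding sequence (26 252–26 567, 26 770–27 137) gives 684 nt and 227 aa.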

L5.
The L5 region encodes the 61·8 kDa fibre protein. A trimeric fibre protrudes from each of the 12 pentagonal vertices of the icosahedral adenoviral capsid. Its N-terminal domain attaches non-covalently to the penton base protein, while the globular C-terminal ‘knob’ domain binds host cells. The crystal structure of the HAdV-12 (A) knob domain bound to the ‘coxsackie and adenovirus receptor’ (CAR) has defined certain key fibre residues that are required for binding (Howitt et al., 2003). These residues include Asp415, Pro417 and Pro418. Another apparently important residue, Lys429, is conserved throughout all HAdV species except species F. Multiple sequence alignment of the fibre knob sequences of the HAdV-C members and HAdV-12 shows that the aspartate residue is conserved in HAdV-1 and HAdV-2 but replaced by alanine in HAdV-5 (Fig. 2, under 1). At the position corresponding to Pro417, all of the HAdV-C fibres carry a serine in place of the proline found in HAdV-12 (P417S) (Fig. 2, under 2). Pro418 is absolutely conserved in these four serotypes (Fig. 2, under 3), as is Lys429 (Fig. 2, under 4). This conservation of key residues between HAdV-12 and the HAdV-C members is expected, given that, like HAdV-12, the HAdV-C members also bind CAR.



Fig. 2. Multiple sequence alignment of the knob region of HAdV fibres. Amino acid sequences of the HAdV-C members (HAdV-1, -2 and -5) are aligned with the HAdV-12 fibre sequence. The 3D structure of the HAdV-12 fibre knob has been solved, and the key residues involved in CAR binding have been mapped. Some of these key residues are numbered above the alignment to show their conservation among the CAR-binding HAdV-C adenoviruses and relative to HAdV-12: D415 (1), P417 (2), P418 (3) and K429 (4). CLUSTAL annotates alignment columns as follows: asterisk (*), identical residue; colon (:), both size and hydropathy conserved; full stop (.), either size or hydropathy conserved.
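The asterisk, colon and full-stop annotations described in the caption can be reproduced with a short function. Clustal implements the colon and the full stop through fixed ‘strong’ and ‘weak’ residue groups rather than through explicit size and hydropathy scores. The group lists below follow the Clustal W/X documentation and should be checked against it. Treating any gapped column as unconserved is a simplifying assumption of this sketch.

```python
# Sketch only: residue groups as listed in the Clustal W/X documentation;
# any column containing a gap is treated as unconserved here.
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND", "SNDEQK",
               "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def column_symbol(residues):
    """Return '*', ':', '.' or ' ' for one alignment column."""
    present = set(residues)
    if "-" in present:
        return " "
    if len(present) == 1:
        return "*"  # identical residue in every sequence
    if any(present <= set(group) for group in STRONG_GROUPS):
        return ":"  # all residues fall within one strong group
    if any(present <= set(group) for group in WEAK_GROUPS):
        return "."  # all residues fall within one weak group
    return " "


def conservation_line(aligned_seqs):
    """Build the conservation line printed under a CLUSTAL alignment."""
    if len({len(s) for s in aligned_seqs}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    return "".join(column_symbol(col) for col in zip(*aligned_seqs))


if __name__ == "__main__":
    print(repr(conservation_line(["DPPK", "DSPK", "ASPK"])))
```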

 
Predicted proteins: miscellaneous proteins.
A short ORF with a high BLAST score against the ‘U exon’ lies at positions 30 923–31 090, within the E3 region, and reads on the complementary strand. This is consistent with the similar ORFs found between the pVIII and fibre genes in the four adenovirus genera (Benko & Harrach, 2003). The ‘U exon’ was originally reported in HAdV-40 as a small coding region extending from an initiation codon to a splice donor site, and it appears to represent the N-terminal exon of a protein (Davison et al., 1993). However, no downstream exon has yet been identified, and no function has been reported for the ‘U exon’.
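Features on the complementary strand, such as the U exon ORF at 30 923–31 090 (168 nt), are conventionally given as forward-strand coordinates. Their sequence, read 5′ to 3′, is the reverse complement of that span. The sketch below is minimal and assumes 1-based inclusive coordinates and unambiguous A/C/G/T bases. The toy genome and the function names are invented for illustration.

```python
# Sketch only: 1-based inclusive coordinates; A/C/G/T only (no IUPAC codes).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement, preserving case."""
    return seq.translate(COMPLEMENT)[::-1]


def extract_minus_strand(genome, low, high):
    """Sequence (5'->3') of a complementary-strand feature spanning low..high."""
    if not 1 <= low <= high <= len(genome):
        raise ValueError("coordinates fall outside the genome")
    return reverse_complement(genome[low - 1:high])


if __name__ == "__main__":
    toy = "GGTTACATCC"  # invented sequence
    print(extract_minus_strand(toy, 3, 8))
```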

Hypothetical proteins.
Hypothetical proteins are predicted in this genome by BLAST analysis of ORFs and with the aid of gene prediction software. Eleven hypothetical proteins are identified across the entire genome, on both strands: two are identified by GeneMark, and the remaining nine by BLAST analysis as counterparts of hypothetical proteins of AdVs already deposited in GenBank. Of the eleven, four localize to the 5 kb stretch between the L1 52 kDa protein coding sequence and the MLP initiator element, four are found in the E2B region and one is located downstream of the pIVa2 coding sequence. The remaining two are encoded by the E1B transcript and are presumably expressed from the 1·26 and 1·31 kb mRNAs; both share partial identity with the E1B 55 kDa protein. The two hypothetical proteins predicted by GeneMark have no similarity to any protein in the GenBank database. The presence of so many hypothetical proteins suggests that the complete set of proteins encoded by the adenoviral genome has not yet been characterized.
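Before BLAST or gene prediction software is applied, candidate ORFs are typically enumerated on both strands. The sketch below is a deliberately simple six-frame scan. It reports the first ATG in a frame to the next in-frame stop and uses standard-code stop codons only. It is not GeneMark's method, and the `min_codons` cut-off is a hypothetical parameter.

```python
# Sketch only: standard-code stop codons; an ORF runs from the first ATG
# in a frame to the next in-frame stop (nested ATGs are not reported).
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_one_strand(seq, min_codons):
    """Yield 0-based, half-open (start, end) ORFs on one strand, stop included."""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i
            elif codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    yield start, i + 3
                start = None


def find_orfs(genome, min_codons=30):
    """Return sorted (low, high, strand) ORFs in 1-based forward coordinates."""
    genome = genome.upper()
    n = len(genome)
    hits = [(s + 1, e, "+") for s, e in orfs_one_strand(genome, min_codons)]
    # Scan the reverse complement, then map positions back to the forward strand.
    for s, e in orfs_one_strand(reverse_complement(genome), min_codons):
        hits.append((n - e + 1, n - s, "-"))
    return sorted(hits)


if __name__ == "__main__":
    print(find_orfs("GGCTATTTCAT", min_codons=3))  # invented toy sequence
```

Each candidate would then be translated and compared with known proteins. The cut-off trades sensitivity for a shorter list of spurious short ORFs.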

Conclusion
The complete HAdV-1 genome has been sequenced and its content comprehensively annotated. The methods used (instruments, software and protocols) are compatible with large-scale, high-throughput DNA sequencing and can readily be leveraged to obtain the genomes of related organisms rapidly, rather than being limited to a single organism. For example, the data and strategies generated in this report establish tiled primer sets for the HAdV-1 genome (‘leveraged primer sequencing strategy’) that can facilitate genome sequencing of any serotype, strain or variant of species HAdV-C. These strategies are an example of ‘applied genomics’ and allow rapid and accurate low-pass sequencing of genomes. Rapid genome sequence determinations allow the identification and development of specific probes to differentiate family, serotype, subtype and strain (pathogen DNA sequence signatures). These are being used to monitor aetiologies in epidemic outbreaks of acute respiratory disease (ARD) in a defined test bed (EOS consortium). Additionally, rapid, accurate and cost-effective sequencing of HAdV genomes should facilitate further studies of the natural history of AdVs and of the biology, epidemiology, phylogeny and evolution of virus families. Adenoviruses remain an important model system for many applications.
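The idea of a tiled primer set spanning both complementary strands can be pictured with simple arithmetic. The sketch below assumes that each primer gives a usable read of `read_len` bases running away from the primer and that neighbouring reads on a strand overlap by `overlap` bases. Both values are hypothetical and are not the parameters used in this study. Real primer design (melting temperature, uniqueness, conservation across HAdV-C genomes) is ignored.

```python
# Sketch only: read_len and overlap are hypothetical illustrative values.


def primer_sites(genome_len, read_len=600, overlap=100):
    """Return (forward_sites, reverse_sites) as 1-based genome positions.

    Forward reads run from a site towards higher coordinates and reverse
    reads towards lower ones; consecutive reads on a strand overlap.
    """
    if genome_len < 1 or not 0 <= overlap < read_len:
        raise ValueError("need genome_len >= 1 and 0 <= overlap < read_len")
    step = read_len - overlap
    forward = [1]
    # Add sites until the last forward read reaches the end of the genome.
    while forward[-1] + read_len - 1 < genome_len:
        forward.append(forward[-1] + step)
    # Mirror the layout so reverse reads start at the far end and run back.
    reverse = [genome_len - s + 1 for s in forward]
    return forward, reverse


def read_spans(forward, reverse, read_len):
    """Convert primer sites to (low, high) read spans on forward coordinates."""
    fwd = [(s, s + read_len - 1) for s in forward]
    rev = [(s - read_len + 1, s) for s in reverse]
    return fwd, rev


def covers(spans, genome_len):
    """True if the union of spans covers positions 1..genome_len without gaps."""
    reach = 0
    for low, high in sorted(spans):
        if low > reach + 1:
            return False
        reach = max(reach, high)
    return reach >= genome_len


if __name__ == "__main__":
    fwd, rev = primer_sites(36001)
    f_spans, r_spans = read_spans(fwd, rev, 600)
    print(len(fwd), len(rev), covers(f_spans, 36001), covers(r_spans, 36001))
```

For the 36 001 bp HAdV-1 genome and these illustrative defaults, the layout gives 72 primer sites per strand. Because the read length is the same on both strands, the reverse layout is simply the forward layout mirrored.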


   ACKNOWLEDGEMENTS
 
J. S. and D. S. dedicate this work to the memory of Dante Ferrini (26 March 1915–26 January 2004). Mi Ha Yuen provided a preliminary partial annotation. A portion of this work was supported by a grant from the NIH-NHGRI, R01 HG00562 (C. T., PI). Partial support was also provided through the Epidemic Outbreak Surveillance project (EOS), funded through HQ USAF Surgeon General Office (SGR) and the Defense Threat Reduction Agency. The opinions and assertions contained herein are the private ones of the authors and are not to be construed as official or reflecting the views of the Department of Defense.

During the course of this work, the membership of the EOS Consortium was as follows.

Sponsorship: Col. Peter F. Demitry and Lt. Col. Theresa Lynn Difato (USAF/SGR).

Executive Board and Principal Investigators: Maj. Eric H. Hanson and Capt. Robb K. Rowley (USAF/SGR); Clark Tibbetts (The George Washington University, IPA); Rosana R. Holliday (USAF/SGR, Ctr).

Operational Board and Senior Scientists: Curtis White (Lackland AFB, TX); David A. Stenger (Naval Research Laboratory); Donald Seto and Jennifer Weller (George Mason University, IPA); Elizabeth A. Walter (Texas A&M University San Antonio, IPA); Jerry Diao (USAF/SGR, Ctr); Maj. Brian K. Agan (Wilford Hall Medical Center); Russell P. Kruzelock (Virginia Tech, IPA).

Technical Advisors and Collaborating Investigators: Cdr. Kevin Russell, David Metzgar and Jianguo Wu (Navy Health Research Center); Ted Hadfield (Armed Forces Institute of Pathology).

Research and Clinical Staff: Anjan Purkayastha and Jing Su (George Mason University); Chris Olsen (USAF/SGR, Ctr); Baochuan Lin, Dzung Thach, Gary J. Vora, Joseph P. Pancrazio and Zheng Wang (Naval Research Laboratory); Dong Xia, Robert Crawford, Sue Ditty and John McGraw (Armed Forces Institute of Pathology); John Gomez, Jose J. Santiago, Margaret Jesse and Sue A. Worthy (Lackland AFB, TX); Linda Canas (Air Force Institute of Operational Health); Mi Ha Yuen (George Mason University); TSgt. Michael Jenkins (Wilford Hall Medical Center).

Operations Support Staff: Cheryl J. James, Kathy Ward and Kenya Grant (USAF/SGR, Ctr); Kindra Nix (Lackland AFB, TX).


   REFERENCES
Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. (1990). Basic local alignment search tool. J Mol Biol 215, 403–410.

Altschul, S. F., Madden, T. L., Schaffer, A. A., Zhang, J., Zhang, Z., Miller, W. & Lipman, D. J. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25, 3389–3402.

Benko, M. & Harrach, B. (2003). Molecular evolution of adenoviruses. In Adenoviruses: Model and Vectors in Virus–Host Interactions, pp. 3–35. Edited by W. Doerfler & P. Bohm. Berlin: Springer.

Benko, M., Harrach, B. & Russell, W. C. (2000). The Adenoviridae. In Virus Taxonomy. Seventh Report of the International Committee on Taxonomy of Viruses, pp. 227–238. Edited by M. H. V. van Regenmortel, C. M. Fauquet, G. H. L. Bishop, E. B. Carstens, M. K. Estes, S. M. Lemon, J. Maniloff, M. A. Mayo, D. J. McGeoch, C. R. Pringle & R. B. Wickner. San Diego: Academic Press.

Berriman, M. & Rutherford, K. (2003). Viewing and annotating sequence data with Artemis. Brief Bioinform 4, 124–132.

Besemer, J. & Borodovsky, M. (1999). Heuristic approach to deriving models for gene finding. Nucleic Acids Res 27, 3911–3920.

Binger, M. H. & Flint, S. J. (1984). Accumulation of early and intermediate mRNA species during subgroup C adenovirus productive infections. Virology 136, 387–403.

Buescher, E. L. (1967). Respiratory disease and the adenoviruses. Med Clin North Am 51, 769–779.

Chroboczek, J., Bieber, F. & Jacrot, B. (1992). The sequence of the genome of adenovirus type 5 and its comparison with the genome of adenovirus type 2. Virology 186, 280–285.

Concino, M., Goldman, R. A., Caruthers, M. H. & Weinmann, R. (1984). Point mutations of the adenovirus major late promoter with different transcriptional efficiencies in vitro. J Biol Chem 258, 8493–8496.

Dan, A., Elo, P., Harrach, B., Zadori, Z. & Benko, M. (2001). Four new inverted terminal repeat sequences from bovine adenoviruses reveal striking differences in the length and content of the ITRs. Virus Genes 22, 175–179.

Davison, A. J., Telford, E. A., Watson, M. S., McBride, K. & Mautner, V. (1993). The DNA sequence of adenovirus type 40. J Mol Biol 234, 1308–1316.

Davison, A. J., Benko, M. & Harrach, B. (2003). Genetic content and evolution of adenoviruses. J Gen Virol 84, 2895–2908.

De Jong, R. N., Van Der Vliet, P. C. & Brenkman, A. B. (2003). Adenovirus DNA replication: protein priming, jumping back and the role for the DNA binding protein DBP. In Adenoviruses: Model and Vectors in Virus–Host Interactions, pp. 187–211. Edited by W. Doerfler & P. Bohm. Berlin: Springer.

Dudding, B. A., Wagner, S. C., Zeller, J. A., Gmelich, J. T., French, G. R. & Top, F. H., Jr (1972). Fatal pneumonia associated with adenovirus type 7 in three military trainees. N Engl J Med 286, 1289–1292.

Evans, J. D. & Hearing, P. (2002). Adenovirus replication. In Adenoviral Vectors for Gene Therapy, pp. 39–70. Edited by D. T. Curiel & J. T. Douglas. San Diego: Academic Press.

Flint, J. & Shenk, T. (1997). Viral transactivating proteins. Annu Rev Genet 31, 177–212.

Gao, W., Robbins, P. D. & Gambotto, A. (2003). Human adenovirus type 35: nucleotide sequence and vector development. Gene Ther 10, 1941–1949.

Gray, G. C., Goswami, P. R., Malasig, M. D., Hawksworth, A. W., Trump, D. H., Ryan, M. A. & Schnurr, D. P. (2000). Adult adenovirus infections: loss of orphaned vaccines precipitates military respiratory disease epidemics. For the Adenovirus Surveillance Group. Clin Infect Dis 31, 663–670.

Hasson, T. B., Soloway, P. D., Ornelles, D. A., Doerfler, W. & Shenk, T. (1989). Adenovirus L1 52- and 55-kilodalton proteins are required for assembly of virions. J Virol 63, 3612–3621.

Hatfield, L. & Hearing, P. (1991). Redundant elements in the adenovirus type 5 inverted terminal repeat promote bidirectional transcription in vitro and are important for virus growth in vivo. Virology 184, 265–276.

Hatfield, L. & Hearing, P. (1993). The NFIII/OCT-1 binding site stimulates adenovirus DNA replication in vivo and is functionally redundant with adjacent sequences. J Virol 67, 3931–3939.

Hayes, B. W., Telling, G. C., Myat, M. M., Williams, J. F. & Flint, S. J. (1990). The adenovirus L4 100-kilodalton protein is necessary for efficient translation of viral late mRNA species. J Virol 64, 2732–2742.

Hillemann, M. R. & Werner, J. R. (1954). Recovery of new agent from patients with acute respiratory illness. Proc Soc Exp Biol Med 85, 183–188.

Horton, T. M., Tollefson, A. E., Wold, W. S. & Gooding, L. R. (1990). A protein serologically and functionally related to the group C E3 14,700-kilodalton protein is found in multiple adenovirus serotypes. J Virol 64, 1250–1255.

Howitt, J., Anderson, C. W. & Freimuth, P. (2003). Adenovirus interaction with its cellular receptor CAR. In Adenoviruses: Model and Vectors in Virus–Host Interactions, pp. 331–364. Edited by W. Doerfler & P. Bohm. Berlin: Springer.

Huang, X. (1994). On global sequence alignment. Comput Appl Biosci 10, 227–235.

Lee, R. F., Concino, M. F. & Weinmann, R. (1988). Genetic profile of the transcriptional signals from the adenovirus major late promoter. Virology 165, 51–56.

Leegwater, P. A., van Driel, W. & van der Vliet, P. C. (1985). Recognition site of nuclear factor I, a sequence-specific DNA-binding protein from HeLa cells that stimulates adenovirus DNA replication. EMBO J 4, 1515–1521.

Leong, K., Lee, W. & Berk, A. J. (1990). High-level transcription from the adenovirus major late promoter requires downstream binding sites for late phase-specific factors. J Virol 64, 51–60.

Leppard, K. N. (1997). E4 gene function in adenovirus, adenovirus vector and adeno-associated virus infections. J Gen Virol 78, 2131–2138.

Mathews, M. B. & Shenk, T. (1991). Adenovirus virus-associated RNA and translation control. J Virol 65, 5657–5662.

Mei, Y. F., Skog, J., Lindman, K. & Wadell, G. (2003). Comparative analysis of the genome organization of human adenovirus 11, a member of the human adenovirus species B, and the commonly used human adenovirus 5 vector, a member of species C. J Gen Virol 84, 2061–2071.

Mul, Y. M., Verrijzer, C. P. & van der Vliet, P. C. (1990). Transcription factors NFI and NFIII/oct-1 function independently, employing different mechanisms to enhance adenovirus DNA replication. J Virol 64, 5510–5518.

Oosterom-Dragon, E. A. & Ginsberg, H. S. (1981). Characterization of two temperature-sensitive mutants of type 5 adenovirus with mutations in the 100,000-dalton protein gene. J Virol 40, 491–500.

Parks, C. L. & Shenk, T. (1997). Activation of the adenovirus major late promoter by transcription factors MAZ and Sp1. J Virol 71, 9600–9607.

Pruijn, G. J., van Miltenburg, R. T., Claessens, J. A. & van der Vliet, P. C. (1988). Interaction between the octamer-binding protein nuclear factor III and the adenovirus origin of DNA replication. J Virol 62, 3092–3102.

Reach, M., Xu, L.-X. & Young, C. S. H. (1991). Transcription from the adenovirus major late promoter uses redundant activating elements. EMBO J 10, 3439–3446.

Roberts, R. J., Akusjarvi, G., Alestrom, P., Gelinas, R. E., Gingeras, T. R., Sciaky, D. & Pettersson, U. (1986). A consensus sequence for the adenovirus-2 genome. In Adenovirus DNA, pp. 1–51. Edited by W. Doerfler. Boston: Martinus Nijhoff.

Rowe, W. P., Huebner, R. J., Gilmore, L. K., Parrott, R. H. & Ward, T. G. (1953). Isolation of a cytopathogenic agent from human adenoids undergoing spontaneous degeneration in tissue culture. Proc Soc Exp Biol Med 84, 570–573.

Russell, W. C. (2000). Update on adenovirus and its vectors. J Gen Virol 81, 2573–2604.

Rutherford, K., Parkhill, J., Crook, J., Horsnell, T., Rice, P., Rajandream, M. A. & Barrell, B. (2000). Artemis: sequence visualization and annotation. Bioinformatics 16, 944–945.

Rux, J. J., Kuser, P. R. & Burnett, R. M. (2003). Structural and phylogenetic analysis of adenovirus hexons by use of high-resolution X-ray crystallographic, molecular modeling, and sequence-based methods. J Virol 77, 9553–9566.

Ryan, M. A. K., Gray, G. C., Malasig, M. D., Binn, L. N., Asher, L. V., Cute, D., Kehl, S. C., Dunn, B. E. & Yund, A. J. (2001). Two fatal cases of adenovirus-related illness in previously healthy young adults - Illinois, 2000. Morb Mortal Wkly Rep 50, 553–555.

San Martin, C. & Burnett, R. M. (2003). Structural studies on adenoviruses. In Adenoviruses: Model and Vectors in Virus–Host Interactions, pp. 57–94. Edited by W. Doerfler & P. Bohm. Berlin: Springer.

Scarpini, C., Arthur, J., Efstathiou, S., McGrath, Y. & Wilkinson, G. (1999). Herpes simplex virus and adenovirus vectors. In DNA Viruses, a Practical Approach, pp. 267–306. Edited by A. J. Cann. New York: Oxford University Press.

Schwartz, S., Zhang, Z., Frazer, K. A., Smit, A., Riemer, C., Bouck, J., Gibbs, R., Hardison, R. & Miller, W. (2000). PipMaker - a web server for aligning two genomic DNA sequences. Genome Res 10, 577–586.

Seto, D., Koop, B. F., Deshpande, P., Howard, S., Seto, J., Wilk, E., Wang, K. & Hood, L. (1994). Organization, sequence, and function of 34·5 kb of genomic DNA encompassing several murine T-cell receptor α/δ variable gene segments. Genomics 20, 258–266.

Sprengel, J., Schmitz, B., Heuss-Neitzel, D., Zock, C. & Doerfler, W. (1994). Nucleotide sequence of human adenovirus type 12 DNA: comparative functional analysis. J Virol 68, 379–389.

Stone, D., Furthmann, A., Sandig, V. & Lieber, A. (2003). The complete nucleotide sequence, genome organization, and origin of human adenovirus type 11. Virology 309, 152–165.

Sugarman, B. J., Hutchins, B. M., McAllister, D. L., Lu, F. & Thomas, B. K. (2003). The complete nucleic acid sequence of the adenovirus type 5 reference material (ARM) genome. Bioprocessing J September/October, 27–32.

Temperley, S. M. & Hay, R. T. (1992). Recognition of the adenovirus type 2 origin of DNA replication by the virally encoded DNA polymerase and preterminal proteins. EMBO J 11, 761–768.

Thompson, J. D., Gibson, T. J., Plewniak, F., Jeanmougin, F. & Higgins, D. G. (1997). The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res 25, 4876–4882.

Vogels, R., Zuijdgeest, D., van Rijnsoever, R. & 20 other authors (2003). Replication-deficient human adenovirus type 35 vectors for gene transfer and vaccination: efficient human cell infection and bypass of preexisting adenovirus immunity. J Virol 77, 8263–8271.

Wadell, G. (1984). Molecular epidemiology of human adenoviruses. Curr Top Microbiol Immunol 110, 191–220.

Wickham, T. J., Mathias, P., Cheresh, D. A. & Nemerow, G. R. (1993). Integrins αvβ3 and αvβ5 promote adenovirus internalization but not virus attachment. Cell 73, 309–319.

Wold, W. S. M. & Gooding, L. R. (1991). Region E3 of adenovirus: a cassette of genes involved in host immunosurveillance and virus–cell interactions. Virology 184, 1–8.

Yeh, R. F., Lim, L. P. & Burge, C. B. (2001). Computational inference of homologous gene structures in the human genome. Genome Res 11, 803–816.

Yew, P. R., Liu, X. & Berk, A. J. (1994). Adenovirus E1B oncoprotein tethers a transcriptional repression domain to p53. Genes Dev 8, 190–202.

Young, C. S. H. (2003). The structure and function of the adenovirus major late promoter. In Adenoviruses: Model and Vectors in Virus–Host Interactions, pp. 213–249. Edited by W. Doerfler & P. Bohm. Berlin: Springer.

Zafar, N., Mazumder, R. & Seto, D. (2001). Comparisons of gene colinearity in genomes using GeneOrder2.0. Trends Biochem Sci 26, 514–516.

Zhang, W., Low, J. A., Christensen, J. B. & Imperiale, M. J. (2001). Role for the adenovirus IVa2 protein in packaging of viral DNA. J Virol 75, 10446–10454.

Zu, Y. L., Takamatsu, Y., Zhao, M. J., Maekawa, T., Handa, H. & Ishii, S. (1992). Transcriptional regulation by a point mutant of adenovirus-2 E1a product lacking DNA-binding activity. J Biol Chem 267, 20181–20187.

Received 18 March 2004; accepted 10 May 2004.