Conserved RNA secondary structures in Flaviviridae genomes

Caroline Thurner1, Christina Witwer1, Ivo L. Hofacker1 and Peter F. Stadler1,2,3

1 Institut für Theoretische Chemie und Molekulare Strukturbiologie, Universität Wien, Währingerstraße 17, A-1090 Wien, Austria
2 Bioinformatik, Institut für Informatik, Universität Leipzig, Kreuzstraße 7b, D-04103 Leipzig, Germany
3 The Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, USA

Correspondence
Ivo L. Hofacker
ivo@tbi.univie.ac.at


   ABSTRACT
Presented here is a comprehensive computational survey of evolutionarily conserved secondary structure motifs in the genomic RNAs of the family Flaviviridae. This virus family consists of the three genera Flavivirus, Pestivirus and Hepacivirus and the group of GB virus C/hepatitis G virus with a currently uncertain taxonomic classification. Based on the control of replication and translation, two subgroups were considered separately: the genus Flavivirus, with its type I cap structure at the 5' untranslated region (UTR) and a highly structured 3' UTR, and the remaining three groups, which exhibit translation control by means of an internal ribosomal entry site (IRES) in the 5' UTR and a much shorter less-structured 3' UTR. The main findings of this survey are strong hints for the possibility of genome cyclization in hepatitis C virus and GB virus C/hepatitis G virus in addition to the flaviviruses; a surprisingly large number of conserved RNA motifs in the coding regions; and a lower level of detailed structural conservation in the IRES and 3' UTR motifs than reported in the literature. An electronic atlas organizes the information on the more than 150 conserved, and therefore putatively functional, RNA secondary structure elements.

Supplementary figures are supplied in JGV Online.


   INTRODUCTION
Viral RNA genomes not only code for proteins but in many instances also carry RNA motifs that play a crucial role in the viral life-cycle. Well-known examples are the internal ribosomal entry site (IRES), the RRE motif in human immunodeficiency virus, or the CRE hairpin in Picornaviridae. The detection of such functional motifs in the viral genome is a difficult task because almost all RNA molecules form secondary structures and the functional structures are not significantly different from the structures formed by random sequences (Fontana et al., 1993; Rivas & Eddy, 2000).

RNA secondary structures have been shown to be very sensitive to mutations (Fontana et al., 1993; Schuster et al., 1994): mutations in about 10 % of the sequence positions almost surely lead to unrelated structures if the mutated positions are chosen randomly. Secondary structure elements that are consistently present in a group of sequences with less than, say, 95 % mean pairwise identity are therefore most likely the result of stabilizing selection, not a consequence of the high degree of sequence homology. This fact can be exploited to design algorithms that reliably detect conserved RNA secondary structure elements in a small sample of related RNA sequences (Hofacker et al., 1998; Hofacker & Stadler, 1999). This method was recently applied quite successfully to a survey of the genomes of Picornaviridae (Witwer et al., 2001) and the RNA pre-genome of Hepadnaviridae (Stocsits et al., 1999).

Here we report a comprehensive survey of members of the family Flaviviridae, which possess a single-stranded positive-sense RNA (ss+RNA) genome. The family is subdivided into the three genera Flavivirus, Pestivirus and Hepacivirus and the group of GB virus C/hepatitis G viruses (GBV-C) with a currently uncertain taxonomic classification (van Regenmortel et al., 2000). The RNA genome, which has a size of 9·6–12·3 kb, is characterized by a similar organization (Fig. 1) in all genera and acts as the only mRNA found in infected cells. It contains one single long open reading frame flanked by 5' and 3' untranslated regions (UTR). These are known to form into specific secondary structures required for genome replication and translation. Viral proteins are synthesized as one single polyprotein, which is co- and post-translationally cleaved by viral and cellular proteinases.



Fig. 1. Genome map for the Flavivirus species dengue virus, Japanese encephalitis virus, yellow fever virus and tick-borne encephalitis virus, and for hepatitis C virus, GB virus C/hepatitis G virus and non-cytopathic pestiviruses. Putative conserved secondary structures are indicated by the boxes above the RNA sequence.

 
Based on the control of replication and translation it is useful to consider two subgroups of Flaviviridae. The first group is formed by the genus Flavivirus and is characterized by a type I cap structure at the 5' UTR (Brinton & Dispoto, 1988) and a highly structured 3' UTR. In this group there is evidence that the 5' and 3' ends stack together to cause a cyclization of the genome (sometimes referred to as a ‘panhandle structure’) that might be an important feature for RNA replication (Hahn et al., 1987; Khromykh et al., 2001).

The second group, consisting of Hepacivirus (hepatitis C virus; HCV), Pestivirus (PESTI) and GBV-C, controls translation by means of an IRES in the 5' UTR and has a short, less-structured 3' UTR. Pestivirus and Hepacivirus have very similar IRES regions (Pestova et al., 1998); the IRES of GBV-C is 50 % longer and structurally quite different (Simons et al., 1996). Therefore we treat these two groups separately.

While the 5' and 3' UTRs of Flaviviridae have been the object of several studies, very little is known about the secondary structures of the coding regions despite some evidence that the coding region might also contain functional RNA motifs (Simmonds & Smith, 1999; Tuplin et al., 2002).


   METHODS
Sequence data were obtained from the NCBI genome database. The phylogenetic distribution and the pairwise sequence similarity of the available data were such that the following groups of Flaviviridae could be investigated in detail: the genera Pestivirus and Hepacivirus, the unclassified group GBV-C and some species of the genus Flavivirus. These are dengue virus (DEN), Japanese encephalitis virus (JEV), yellow fever viruses (YFV) and tick-borne encephalitis virus (TBE). Statistical information on these sequence data is compiled in Table 1.


Table 1. Number of analysed sequences n, length of our alignments, length of 5' UTR, IRES (if present), coding region and 3' UTR with the respective mean pairwise sequence identities σ

 
The genus Pestivirus can be subdivided into two groups, the cytopathic pestiviruses, which cause cell shrinkage, membrane blebbing and cell death, and the non-cytopathic ones. Cytopathic viruses develop from non-cytopathic viruses by RNA recombination, resulting in genome duplicates, rearrangements, deletions and insertions (Myers & Thiel, 1996). Characteristically, they have at least one additional copy of the NS3 protein, isolated from NS2/NS3 through ubiquitin (Ub) or cIns insertions (Myers & Thiel, 1996; Tautz et al., 1999). In this study we only use full-length genomes of non-cytopathic pestiviruses, because insertions of Ub and cIns cause extended gaps in the multiple alignments that interfere with the analysis. On the other hand, a detailed study of cytopathic viruses and particularly the effects of the extended insertions on the secondary structure of pestiviruses was not possible because there were too few sequences available in the public databases.

Multiple sequence alignments were calculated using CLUSTAL W (Thompson et al., 1994). All sequence positions reported here refer to the multiple sequence alignments that are available as part of the supplemental material.
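The mean pairwise sequence identity σ reported in Table 1 can be illustrated with a minimal sketch. The gap treatment below is an assumption (the text does not specify it), and the toy alignment is hypothetical rather than taken from the data set.

```python
from itertools import combinations

def pairwise_identity(a, b):
    """Fraction of identical columns between two aligned rows.

    Assumption: columns where both rows carry a gap are skipped, and a
    gap opposite a base counts as a mismatch.
    """
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    compared = same = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            same += 1
    return same / compared if compared else 0.0

def mean_pairwise_identity(rows):
    """Average of pairwise_identity over all unordered pairs of rows."""
    pairs = list(combinations(rows, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)

# Hypothetical toy alignment, for illustration only.
toy_alignment = ["ACGU-ACGU", "ACGU-ACGA", "ACGA-ACGA"]
print(f"{100 * mean_pairwise_identity(toy_alignment):.1f} %")
```

Other conventions (for example, ignoring every gapped column) would give slightly different values for gap-rich alignments.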

RNA genomes were folded in their entirety using McCaskill's partition function algorithm (McCaskill, 1990) as implemented in the VIENNA RNA package (Hofacker et al., 1994), based on the energy parameters published in Mathews et al. (1999). The result of this computation is a matrix of base pairing probabilities for each potential base pair (i, j) of the genomic RNA.
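The idea behind a base pairing probability matrix can be sketched with a strongly simplified partition function calculation. This is not the VIENNA RNA package implementation and does not use the Mathews et al. (1999) parameters: it assumes that every base pair contributes the same hypothetical energy (pair_energy), ignores stacking and loop energies, and uses an O(n^4) probability recursion that is only suitable for short toy sequences.

```python
import math

CAN_PAIR = {"AU", "UA", "GC", "CG", "GU", "UG"}
MIN_LOOP = 3  # a hairpin must enclose at least three unpaired bases

def pair_probabilities(seq, pair_energy=-1.0, kT=0.616):
    """Equilibrium base pair probabilities in a toy energy model.

    Assumption: each pair contributes pair_energy (kcal/mol); kT is
    about 0.616 kcal/mol at 37 degrees C. Returns {(i, j): probability}
    with 0-based positions i < j.
    """
    seq = seq.upper()
    n = len(seq)
    w = math.exp(-pair_energy / kT)          # Boltzmann weight of one pair
    Q = [[1.0] * n for _ in range(n)]        # all structures on seq[i..j]
    Qb = [[0.0] * n for _ in range(n)]       # structures in which i pairs with j

    def q(i, j):
        return Q[i][j] if i <= j else 1.0    # an empty segment has weight 1

    for span in range(MIN_LOOP + 1, n):
        for i in range(n - span):
            j = i + span
            if seq[i] + seq[j] in CAN_PAIR:
                Qb[i][j] = w * q(i + 1, j - 1)
            total = q(i, j - 1)              # position j unpaired
            for k in range(i, j - MIN_LOOP): # position j paired with k
                total += q(i, k - 1) * Qb[k][j]
            Q[i][j] = total

    Z = q(0, n - 1)
    P = {}
    for span in range(n - 1, MIN_LOOP, -1):  # widest pairs first
        for i in range(n - span):
            j = i + span
            if Qb[i][j] == 0.0:
                continue
            # (i, j) is not enclosed by any other pair
            p = q(0, i - 1) * Qb[i][j] * q(j + 1, n - 1) / Z
            # (i, j) sits directly inside an enclosing pair (k, l)
            for (k, l), p_kl in P.items():
                if k < i and j < l:
                    p += (p_kl * q(k + 1, i - 1) * Qb[i][j]
                          * q(j + 1, l - 1) / q(k + 1, l - 1))
            P[(i, j)] = p
    return P

probs = pair_probabilities("GGGAAAUCCC")     # hypothetical toy sequence
```

With a realistic nearest-neighbour energy model the recursions gain additional terms for hairpin, interior and multi-branch loops, but the output has the same form: one probability for every potential base pair (i, j).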

The ALIDOT algorithm (Hofacker et al., 1998; Hofacker & Stadler, 1999) was used to search the base pairing probability matrix for conserved secondary structure patterns. This method requires an independent prediction of the secondary structure for each of the sequences and a multiple sequence alignment that is obtained without any reference to the predicted secondary structures. The algorithm ranks base pairs using both the thermodynamic information contained in the base pairing probability matrix and the information on compensatory, consistent (e.g. GC->GU) and inconsistent mutations contained in the multiple sequence alignment. The approach is different from efforts to simultaneously compute alignment and secondary structures (Gorodkin et al., 1997; Sankoff, 1985) and from programs such as CONSTRUCT (Lück et al., 1996, 1999) and ALIFOLD (Hofacker et al., 2002) because it does not assume that the sequences have a single common structure. An implementation of this algorithm is available from http://www.tbi.univie.ac.at/RNA/. The ALIFOLD algorithm (Hofacker et al., 2002) is used to obtain consensus structures of regions with significant structural conservation.
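The three kinds of sequence change distinguished above can be made concrete with a small sketch. It is not the ALIDOT implementation; it only assumes the usual reading of the terms: a consistent mutation changes one side of a pair and keeps it able to pair (e.g. GC->GU), a compensatory mutation changes both sides (e.g. GC->AU), and an inconsistent mutation leaves two bases that cannot pair. The function names are hypothetical.

```python
from collections import Counter

CAN_PAIR = {"AU", "UA", "GC", "CG", "GU", "UG"}

def classify_pair_change(ref_pair, other_pair):
    """Compare the bases at two paired alignment columns in two sequences.

    Both arguments are 2-letter strings: base at column i, base at column j.
    """
    ref_pair, other_pair = ref_pair.upper(), other_pair.upper()
    if ref_pair not in CAN_PAIR:
        raise ValueError("reference bases do not pair")
    if other_pair not in CAN_PAIR:
        return "inconsistent"
    changed = sum(a != b for a, b in zip(ref_pair, other_pair))
    return ("unchanged", "consistent", "compensatory")[changed]

def tally_mutations(col_i, col_j):
    """Tally change types for all sequences relative to the first one.

    col_i and col_j are the two alignment columns, one character per sequence.
    """
    ref = col_i[0] + col_j[0]
    return Counter(classify_pair_change(ref, a + b)
                   for a, b in zip(col_i[1:], col_j[1:]))

# Hypothetical columns from a four-sequence alignment: GC, GU, AU, GA.
print(tally_mutations("GGAG", "CUUA"))
```

A ranking scheme in the spirit of the text would combine such tallies with the thermodynamic pair probabilities; how the two are weighted is not specified here and is left open in this sketch.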

Computational results are shown as Hogeweg-style mountain plots (Hogeweg & Hesper, 1984) with colour codes indicating sequence covariations.
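A mountain profile can be derived from pair probabilities as in the following sketch. It assumes that each pair (i, j) raises the profile by its probability at every position from i to j inclusive; plotting programs may use slightly different boundary conventions. The input values are hypothetical.

```python
def mountain_profile(pair_probs, length):
    """Height at each position k: summed probability of pairs with i <= k <= j.

    pair_probs maps 0-based (i, j) with i < j to a probability.
    """
    steps = [0.0] * (length + 1)
    for (i, j), p in pair_probs.items():
        if not 0 <= i < j < length:
            raise ValueError(f"pair {(i, j)} lies outside the sequence")
        steps[i] += p        # the slab starts at i
        steps[j + 1] -= p    # and ends after j
    heights, level = [], 0.0
    for k in range(length):
        level += steps[k]
        heights.append(level)
    return heights

# Hypothetical hairpin: an outer pair with p = 0.5 and an inner pair with p = 0.25.
print(mountain_profile({(0, 4): 0.5, (1, 3): 0.25}, 5))
```

Helices appear as slopes, loops as plateaus, and nested substructures as peaks on top of the enclosing helix.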


   RESULTS
In the survey reported here we found many putative structural elements, as indicated in Fig. 1. A complete description of each one of them cannot be displayed in print because of space constraints, but see Fig. 6 for selected examples. The complete material, including positional information, sequences, accession numbers, multiple sequence alignments, structure predictions, structure drawings and information on the sequence covariation, is available as supplemental material in electronic form in our Viral RNA Structure Database at http://rna.tbi.univie.ac.at/virus/. This web site can also be used to retrieve the computational results for regions that we have not identified as structurally conserved. Unless noted otherwise, the names used to denote individual conserved helices follow the scheme used on the web site.



Fig. 6. Examples of conserved secondary structures in the coding region of GBV-C, HCV and PESTI. (a) to (c) and (e) conserved structures in GBV-C coding region; (c) and (e) were already proposed by Cuceanu et al. (2001) (SLII and SLIII, respectively). (d) and (f) examples from HCV, (d) was first proposed by Tuplin et al. (2002). (g) and (h) proposed conserved structures in PESTI coding region.

 
Genus Flavivirus
The genus Flavivirus is a widespread genus containing very diverse species. Since at least six sufficiently diverged genomic sequences of each species are necessary for our analysis method, we focused on the species DEN, JEV, YFV and TBE. In Fig. 2 we show an overview of the conserved secondary structure elements of the 5' and 3' ends and the cyclization domains called P1', P1, P2 and CS or CS "A", respectively, which we found by applying our algorithms.



Fig. 2. The minimum free energy structure of one sequence of the respective virus species is represented. Coloured backgrounds mark regions that our folding algorithm and selection criteria allowed for all sequences of the respective species. The same colour is used for equivalent structures in different species, grey motifs are conserved only within a single species. The nomenclature of the structures corresponds to the website atlas of structures, see rna.tbi.univie.ac.at. Conserved secondary structures where 5' and 3' UTRs are involved in genome cyclization are called P1', P1, and P2; CS and CS "A" are taken from Hahn et al. (1987) and Khromykh et al. (2001) respectively. Distances along the x-axis are not to scale; the exact positions of the structure elements are given in supplemental material B.

 
Genome cyclization.
Hahn et al. (1987) found complementary sequences [cyclization sequences (CS)] close to the 5' and the 3' end of the genome and concluded that the two ends of the genome of Flavivirus stick together in a panhandle-like structure. Recently, it has been shown that RNA synthesis in vitro requires both 5' and 3' ends present, either connected in the same RNA sequence, or added in trans (You & Padmanabhan, 1999). Another piece of evidence for the cyclization of the genomic RNA is the finding that the first stem in 5' UTR and the last stem in 3' UTR together with the CS are necessary and sufficient for virus translation and replication (Khromykh et al., 2001).
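Complementary stretches of this kind can be located with a simple scan, sketched here under stated assumptions: only uninterrupted helices are considered (no bulges or interior loops), GU pairs are allowed, and the two segments are hypothetical toy sequences rather than viral data. The thermodynamic folding used in this study is more general than this scan.

```python
CAN_PAIR = {"AU", "UA", "GC", "CG", "GU", "UG"}

def longest_complementary_stretch(five_prime, three_prime):
    """Longest uninterrupted antiparallel helix between two segments.

    Returns (length, start_5, end_3): base five_prime[start_5 + t] pairs
    with three_prime[end_3 - t] for t in range(length). Positions are 0-based.
    """
    five = five_prime.upper()
    rev = three_prime.upper()[::-1]   # read the 3' segment backwards
    m = len(rev)
    best = (0, 0, 0)
    prev = [0] * (m + 1)              # run lengths for the previous 5' base
    for a, x in enumerate(five):
        cur = [0] * (m + 1)
        for b, y in enumerate(rev):
            if x + y in CAN_PAIR:
                run = prev[b] + 1     # extend the run ending at (a - 1, b - 1)
                cur[b + 1] = run
                if run > best[0]:
                    best = (run, a - run + 1, m - 1 - (b - run + 1))
        prev = cur
    return best

# Hypothetical segments: GGCUC in the first can pair with GAGCC in the second.
print(longest_complementary_stretch("AAAGGCUCAAA", "AAAGAGCCAAA"))
```

Such a scan only proposes candidates; whether a candidate helix actually forms in the context of the complete genome depends on the competing local structures.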

The mean pairwise sequence identity of all four species of the genus Flavivirus (less than 50 %) was too low to yield good alignments. The species TBE differs most from the other species in both sequence and structure. From an alignment of the remaining species (DEN, JEV and YFV) we obtained a common structure for the CS (Fig. 2), which supports the prediction of Hahn et al. (1987). We then compared only DEN and JEV. In our data the CS contained no sequence variation but was predicted with pair probabilities close to one. Adjacent to the CS we found a further stem which participates in genome cyclization and which contains several sites of sequence variation (P2'' in supplemental material A). Between CS and P2'' there is a well-conserved hairpin structure supported by numerous compensatory mutations (DV2/JE2 in supplemental material A).

TBE.
The conserved cyclization motifs first reported by Hahn et al. (1987) for mosquito-borne viruses are absent in the TBE group. Putative CSs were proposed for Powassan virus RNA (Mandl et al., 1993), for TBE and cell fusing agent (Khromykh et al., 2001). In all proposed motifs for genome cyclization we did not find any mutations in the sequences; thus we could not use our method to confirm the predicted structure by means of sequence covariation. Thermodynamic folding, however, provided strong evidence for the CS "A"-motif (Fig. 2) (Khromykh et al., 2001; Mandl et al., 1993) because these base pairs appeared with probabilities close to one in the folds of the complete genome. Khromykh's region CS"B" was folded only by one single sequence (TEU27491) and thus could not be considered as a common motif for all members of TBE.

5' UTR.
The 5' UTRs of DEN, JEV and YFV form into a very similar secondary structure, while the structure for TBE is significantly different (Fig. 3); see DV1, JE1, YF1 and TB1, respectively. For DEN we found structural conservation, while the sequences of JEV and YFV were highly conserved. A manually improved alignment of this region for JEV and YFV to DEN showed that there was significant structure conservation among all three species. Furthermore, all structures contained an interior loop of three Us (one on the 5'- and two on the 3'-strand; DV1, JE1 and YF1 in Fig. 3).



Fig. 3. Conserved secondary structures of the Flavivirus species DEN, JEV, YFV and TBE in the 5' UTR (first column) and 3' UTR (second and third column). Mountain plots (Hogeweg & Hesper, 1984) faithfully represent secondary structures: each base pair (i, j) is represented by a slab ranging from position i to j; its height is proportional to the base pairing probability in thermodynamic equilibrium, computed with McCaskill's algorithm. Colours indicate the number of different types of base pairs (red 1, ocher 2, green 3, turquoise 4, blue 5, violet 6). Saturated colour indicates that all sequences can form the base pair, while two levels of pale colour mean that 1 or 2 input sequences have non pairing bases at positions i and j. If there are more than 2 non-compatible sequences the pair is not displayed. In the conventional drawings consistent and compensatory mutations are indicated by circles around bases that have mutations. Grey letters indicate inconsistent mutations.
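The colour code described in the legend of Fig. 3 can be written down as a small lookup. This is a sketch of the legend as stated, not the plotting code used for the figures; the shade names and the treatment of gaps as non-pairing are assumptions.

```python
CAN_PAIR = {"AU", "UA", "GC", "CG", "GU", "UG"}
COLOURS = {1: "red", 2: "ocher", 3: "green", 4: "turquoise", 5: "blue", 6: "violet"}
SHADES = ("saturated", "pale", "paler")   # 0, 1 or 2 non-pairing sequences

def colour_for_columns(col_i, col_j):
    """Colour and shade for the base pair formed by two alignment columns.

    col_i and col_j hold one character per sequence. Returns None when no
    sequence can form the pair or more than two sequences cannot form it.
    """
    pairs = [a + b for a, b in zip(col_i.upper(), col_j.upper())]
    types = {p for p in pairs if p in CAN_PAIR}
    non_pairing = sum(p not in CAN_PAIR for p in pairs)
    if not types or non_pairing > 2:
        return None
    return COLOURS[len(types)], SHADES[non_pairing]

# Hypothetical columns: GC, GU, AU and GC, i.e. three pair types, all pairing.
print(colour_for_columns("GGAG", "CUUC"))
```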

 
For DEN, a structure similar to DV1 was proposed by Leitmeyer et al. (1999) and Khromykh et al. (2001). A stem–loop structure from positions 80–105 reported in Leitmeyer et al. (1999) is predicted thermodynamically, but has a conserved sequence and thus is not supported by sequence covariation.

A stem carrying the initiator AUG proposed by Hahn et al. (1987) for YFV was found to fold in all sequences but is not supported by sequence covariation. The 5' UTR structure proposed by Khromykh et al. (2001) for TBE was inconsistent with the available sequence data. We found a different structure that was confirmed by several mutations, both consistent and compensatory (TB1 in Fig. 3).

Coding region.
Several conserved secondary structures were found in the coding regions of DEN, JEV, YFV and TBE. These structures are available on the web site. So far, no functions have been proposed for these regions. Hahn et al. (1987) already proposed the stem–loop DV2 for Den-2 virus.

3' UTR.
Conserved structures in the 3' UTR are shown in Fig. 3: the structures show strong similarity between species. Sequence variation in the stem DV6a was high in DEN and present in JEV. For YFV, we found a stem corresponding to DV6a and to JE7a.

Structures similar to DV6, JE7, YF27 or TB19 were also proposed by Hahn et al. (1987) for DEN 2 and YFV, by Khromykh et al. (2001) for DEN, YFV, JEV and TBE, by Rauscher et al. (1997) (B for DEN, YFV, and JEV and I, II and III for TBE), by Proutski et al. (1999) (TL1/RCS2 or TL2/CS2 for DEN and JEV, and "stem–loop 1 in subregion I" for YFV), and by Leitmeyer et al. (1999) for DEN.

DEN.
For the DEN 3' UTR, we found the same structures as Rauscher et al. (1997) where the analysis was restricted to the isolated 3' UTR. None of the long-range interactions interfered with any of these structural motifs. Leitmeyer et al. (1999) propose additional base pairings that we could not find because they conflicted with the cyclization domains.

We only found parts of the secondary structures proposed by Proutski et al. (1997) for DEN2 as conserved for all DEN species. In particular, we did not find structures I2 and I3, II1 and III except region 3' LSH (our DV7). DV6 and DV7 are also discussed by Proutski et al. (1999) for DEN4. All other structures that are reported in that study are disrupted by the cyclization of the viral genome. Assuming that cyclization of the genome is vital, we can reinterpret the deletion studies reported by Men et al. (1996) in the following way: the deletion of DV6a (TL2) yields a delayed and reduced growth in simian and mosquito cells. When the deletions were extended more, to the 3' end of the sequence, the CS region was destroyed [mutant 3' 172–83 of Men et al. (1996)] and hence no viable viruses were found. A non-viable mutant 3'd 172–107 may be explained by the importance of the sequence motif CAAAAA for virus propagation (Men et al., 1996). Our data indicate that, in this case, the sequence motif is important rather than any structure associated with it. For the mutants 3'd 333–183 and 3'd 384–183, Men et al. (1996) measure a greatly delayed and reduced growth in living cells. We would argue that these deletions destroy a possible prolongation of the cyclization region that we found for dengue viruses (data not shown). Our data indicate that each single sequence allows additional stems for cyclization in this region even though their exact positions vary slightly among the different sequences. It is plausible that such an extended cyclization region adds to the efficiency of virus replication but is not necessarily essential for its viability.

YFV and JEV.
The sequences in our dataset had about σ=91·7 % pairwise identity in the 3' UTR. We observed only a small number of compensatory mutations to verify structural features predicted based on our thermodynamic algorithm. We essentially found the same structures as Rauscher et al. (1997); again none of the structures reported by Rauscher et al. (1997) conflicted with CS regions. YF28 was shorter by 9 bp than reported by Hahn et al. (1987). YF28 and YF27 corresponded to 3' LSH and I1, respectively, JE7 and JE8 to 3' LSH and II2, respectively, as proposed by Proutski et al. (1997). Further structures could not be found, for reasons similar to those explained for DEN above.

TBE.
We recovered structures very similar to those reported by Mandl et al. (1998) and Rauscher et al. (1997). In particular, TB17 and TB18 correspond to IV and VI of Mandl et al. (1998) and Rauscher et al. (1997), respectively, TB19 contains stem III of Mandl et al. (1998) and Rauscher et al. (1997) and TB16 corresponds to VII, VIII and IX of Mandl et al. (1998) and Rauscher et al. (1997). Structure A1 reported by Mandl et al. (1998) was shorter because of conflicts with cyclization sequences P1' and CS "A". Structure A2 (Mandl et al., 1998) did not seem to be conserved. For structures MS and V of Mandl et al. (1998) we had evidence from thermodynamic folding. However, these two structures conflict with P2. TB16 to TB21 conform with structures proposed by Proutski et al. (1997).

Pestivirus, Hepacivirus and GBV-C
5' UTR.
The 5' UTRs of these virus groups contain an IRES. For parts of the HCV IRES even tertiary structure studies are available (Kieft et al., 2002; Lukavsky et al., 2001). The sequences of 5' UTRs of GBV-C and HCV are significantly more conserved than the rest of their respective genomes (Table 1). For these two virus groups we found that the secondary structure of the 5' UTR is less conserved than we expected (due to the few sequence covariations); an overview is given in Fig. 4. This was consistent with the data reported by Witwer et al. (2001) for Picornaviridae. In contrast, the IRES of PESTI turned out to be highly conserved.



Fig. 4. Schematic illustration of 5' UTRs of GBV-C, HCV and PESTI. Conserved structures are discussed in the text. Notations in parentheses correspond in (a) to Simons et al. (1996), (b) to Honda et al. (1996a) and (c) to Brown et al. (1992).

 
The IRES structures of HCV and PESTI shared a common overall structure despite the fact that they were not comparable at the sequence level. Nevertheless, they shared a few significant details: the IIIa stem carried a completely conserved loop sequence and stem IIIc was conserved in its sequence.

GBV-C.
The 5' UTR sequences of GBV-C were more highly conserved than the rest of the genome (Table 1). Most of the sequence variation occurred around nucleotide (nt) positions 410–437, which comprised the structural element HG6 (IVb) (Fig. 5a). This motif was also predicted in previous studies (Simons et al., 1996; Smith et al., 1997).



Fig. 5. (a) GBV-C: 5' UTR nt: 410–437, IRES conserved element HG6(IVb). (b) Pestivirus 5' UTR nt:1–420; the IRES is supposed to begin with stem PV2(II).

 
We found a stem, HG2 (Fig. 4a), that was shorter and more shifted to the 5' end of the IRES than stem–loop II reported by Simons et al. (1996). Our prediction was supported by compensatory mutations (data not shown). The reason for the discrepancy was the formation of a panhandle-like structure by means of a base pairing interaction from nt 163–175 with nt 9213–9201 (discussed later).

The sequences were too conserved in the remainder of the 5' UTR to support predicted structures by means of sequence variation. The thermodynamic prediction, however, found structures similar to those previously proposed (Katayama et al., 1998; Simons et al., 1996; Smith et al., 1997).

HCV.
The 5' UTR of HCV comprises 341–342 nt. The RNA fold algorithm recovered structures similar to those reported in previous studies (Collier et al., 2002; Honda et al., 1996b; Kalliampakou et al., 2002; Kieft et al., 2001; Kolupaeva et al., 2000a; Odreman-Macchioli et al., 2000; Psaridi et al., 1999; Spahn et al., 2001; Tang et al., 1999; Fig. 4b). Our algorithm was not designed to predict pseudoknots. However, we made sure that nucleotides that are known to be involved in pseudoknots (Pestova et al., 1998) do not pair with other parts of the sequence.

Due to high sequence conservation (Table 1) we found only two sites with compensatory mutations in HC3 (called IIIa, b and c by Honda et al., 1996b) in our dataset of nine complete genomic sequences. When additional sequences of the IRES region were included in the analysis, the structure was well supported by compensatory mutations (data not shown). This structure, HC3, has received considerable attention since it appears to act as a binding site for the eIF3–40S complex. It has an internal loop, which is twisted in itself (Collier et al., 2002). Even though we found a mean identity of 98·3 % in this region, there were two compensatory mutations just before and after this highly structured part of the HCV IRES. This confirms Collier's interpretation that the shape of the backbone rather than the sequence composition is important for translation initiation.

We found a stem, HC2, which corresponds to IIa proposed by Honda et al. (1996a). For the nucleotides following stem IIa, the prediction favoured long-range interactions with nt 8571–8552 (NS5B); see HCVCS2 (discussed later). When the isolated IRES region (i.e. nt 44–357) was folded separately, stems IIa and IIb were recovered as proposed by Honda et al. (1996a).

Pestivirus.
As with HCV and GBV-C the sequence of the 5' UTR region was more conserved than the rest of the genome (Table 1) but we still found a considerable amount of consistent and compensatory mutations.

Stem PV1 was proposed as Ia by Brown et al. (1992) and as domain A by Deng & Brock (1993) (Fig. 5b).

Fletcher & Jackson (2002) observed that a deletion of nucleotides comprising stem PV2 (II in Brown et al., 1992; domain C in Deng & Brock, 1993) decreased the activity of IRES to 19 %. Though the pair probabilities in stem PV2 were small (Fig. 5b), we found no inconsistencies and a considerable amount of compensatory mutations. This might indicate that the structure, rather than the sequence, is important for IRES function in this region.

As in previous studies (Deng & Brock, 1993; Fletcher & Jackson, 2002; Kolupaeva et al., 2000b; Moser et al., 2001) our method detected stem PV3 as an important feature of Pestivirus IRES structure. Even though our algorithm does not allow pseudoknots, both stems of the pseudoknot reported by Pestova et al. (1998) show up in the base pairing probabilities.

Coding region
GBV-C.
We found two significantly conserved stems (HG9 and HG10) in the E1 region, which were previously proposed by Simmonds & Smith (1999) based on a different algorithm (data presented in the supplemental material).

Conserved secondary structures seemed to be concentrated in the NS5A and NS5B region of the GBV-C genome (Fig. 6). Some of these had already been proposed by Cuceanu et al. (2001) (Fig. 6c, e). Furthermore, HG38 corresponds to SLV and HG39 to SLIV. In our data the SLI motif is completely conserved in the sequence. In SLVI we found more inconsistent mutations than compensatory mutations (data not shown) and the proposed SLVII structure could not be found with our method.

HCV.
Again, we found most of the conserved structures in the NS5A and NS5B regions. Some of these have been previously reported as important for the efficiency of the IRES function (Tuplin et al., 2002; Zhao & Wimmer, 2001). One of the motifs detected by Tuplin et al. (2002) is HC4, shown in Fig. 6(d). Tuplin et al. (2002) further found HC6 as SL443, HC27 as SL8828 and HC28 as SL9011. According to our data there was no evidence for the existence of SL7730 and SL9118. SL8926 showed too many inconsistencies in our data and SL8376 was not folded because of interactions of this region with the 3' UTR (discussed later).

Ray et al. (1999) argued that HCV persistence is associated with sequence variability in putative envelope genes E1 and E2. We found a conserved RNA structure, HC7, in the E1 region (Fig. 6f).

Pestivirus.
All putative conserved secondary structural elements in the coding region of PESTI were very short. A stem–loop downstream of the initiator AUG appears in our data to have too many inconsistencies and thus cannot be considered as a conserved feature of PESTI, in agreement with the analysis of Myers et al. (2001). The most prominent stems found in the coding region are shown in Fig. 6(g) and (h).

3' UTR
GBV-C.
The 3' UTR sequences of GBV-C are highly conserved (σ=96·7 %). Not surprisingly, we predicted structures similar to those previously reported (Katayama et al., 1998; Okamoto et al., 1997; Xiang et al., 2000) but not all of them were supported by sequence covariation (data not shown). Some of the previously proposed structures conflict with long-range interactions to the 5' UTR predicted by our method (discussed later). One example well supported by sequence covariation is the structure HG43 that was also proposed by Cuceanu et al. (2001) and Xiang et al. (2000).

HCV.
The 3' UTR consists of a short sequence of variable length and composition (variable region), a U-rich stretch (poly-U-UC region) variable in its length and a highly conserved sequence of approximately 100 nt at the 3' end (conserved region, X-tail) (Kolykhalov et al., 1996; Tanaka et al., 1996; Yamada et al., 1996). Within this X-tail we found only a single mutation, which is compatible with the predicted structure. Our stem HC29 corresponds to SL1 as previously reported (Blight & Rice, 1997; Ito & Lai, 1997; Yamada et al., 1996). Stems SL2 and SL3, as proposed by Blight & Rice (1997) and Ito & Lai (1997), compete in our data with the formation of two long-range interactions, LR1 and LR2. The probability of base pairs in LR1 was around P=0·54, significantly higher than HC29 (SL1). The elements SL2 and SL3 were thermodynamically unfavourable in the genomic context and could only be detected when a sequence window was used that was too small to contain the long-range interactions. More recently, Yi & Lemon (2003) introduced several point mutations in the X-tail of the 3' UTR of HCV. Their results could not provide proof for the existence of SL2 or SL3 but indicated that there are stringent requirements for the sequence in this region.

Pestivirus.
Pestiviruses are very heterogeneous in their 3' UTR region, due to extended AU-rich insertions in some strains. The only RNA feature that was shared among all available sequences is the terminal stem PV15 that was originally described by Deng & Brock (1993) and also by Becher et al. (1998) and Yu et al. (1999).

Genome cyclization.
Surprisingly, we discovered strong evidence for genome cyclization not only in the genus Flavivirus, where this effect has already been described in the literature, but also within HCV and GBV-C. The most prominent of the predicted cyclization regions are shown in Fig. 7.



Fig. 7. Putative cyclization regions HCVCS3 and HGVCS3 in the genomes of HCV and GBV-C, respectively. The boxed areas point out sequences that might be read as palindrome sequences and may play a functional role in replication processes.

 
In the GBV-C genome, cyclization is localized to base pairings of nt 33–48 with nt 9367–9353 (HGCS1; pair probabilities ≈ 0·6), of nt 128–140 with nt 9224–9214 (HGVCS2) and of nt 163–175 with nt 9213–9201 (HGVCS3) (both with pair probabilities ≈ 0·7) (Fig. 7). These domains are highly conserved in sequence. We found only one consistent mutation at base pair (42,9357). On the other hand there was one sequence carrying an inconsistent mutation at base pair (130,9222).
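The range notation used here can be expanded into explicit base pairs when both strands have the same length, as in the following sketch. It assumes perfect antiparallel stacking without bulges; where the two ranges differ in length (for example nt 33–48 with nt 9367–9353), the position of the unpaired bases is not given in the text, so the hypothetical helper refuses to guess.

```python
def expand_helix(five_start, five_end, three_start, three_end):
    """List the base pairs of a helix given as two nucleotide ranges.

    The 5' strand runs upwards from five_start to five_end; the 3' strand
    runs downwards from three_start to three_end (1-based, inclusive).
    """
    five = range(five_start, five_end + 1)
    three = range(three_start, three_end - 1, -1)
    if len(five) != len(three):
        raise ValueError("strands differ in length; unpaired bases must be placed by hand")
    return list(zip(five, three))

# nt 163-175 with nt 9213-9201, as quoted in the text
pairs = expand_helix(163, 175, 9213, 9201)
print(pairs[0], pairs[-1], len(pairs))
```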

In HCV, putative cyclization domains comprised base pairs of nt 1–3 with nt 8627–8625 (HCVCS1), 88–92 with 8602–8606 (HCVCS2) and 95–110 with 8556–8571 (HCVCS3). Within HCVCS3 we found two sites of compensatory mutations (Fig. 7). In HCV, nucleotides from the IRES region (nt 1–3, 88–92 and 95–110) are paired with nucleotides within the coding region for the protein NS5B. At the same time we observed that two regions of the 3' UTR fold back to the NS5B region as well: (i) LR1: nt 8628–8661 (NS5B) paired with nt 9599–9633 (3' UTR) and (ii) LR2: nt 8978–8995 (NS5B) paired with nt 9583–9598 (3' UTR). This brought the 5' and 3' regions into very close proximity, as illustrated in supplemental material C. Sequence position 8627 is involved in the interaction with the IRES; the adjacent nt 8628 pairs with the 3' UTR.
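
The proximity argument can be checked directly from the coordinates quoted above. The following Python sketch is only a bookkeeping aid: the dictionary layout and the helper name gap are hypothetical, and every segment is written with its smaller coordinate first regardless of strand orientation.

```python
# Coordinates (nt) quoted in the text. Each interaction pairs an upstream
# segment with a downstream segment, both written as (start, end), start <= end.
HCV_INTERACTIONS = {
    "HCVCS1": ((1, 3), (8625, 8627)),
    "HCVCS2": ((88, 92), (8602, 8606)),
    "HCVCS3": ((95, 110), (8556, 8571)),
    "LR1": ((8628, 8661), (9599, 9633)),
    "LR2": ((8978, 8995), (9583, 9598)),
}


def gap(seg_a, seg_b):
    """Number of nucleotides lying between two non-overlapping segments."""
    (_, first_end), (second_start, _) = sorted([seg_a, seg_b])
    return second_start - first_end - 1


# NS5B side: the segment pairing with the 5' end (HCVCS1) directly abuts
# the segment pairing with the 3' UTR (LR1).
ns5b_gap = gap(HCV_INTERACTIONS["HCVCS1"][1], HCV_INTERACTIONS["LR1"][0])

# 3' UTR side: the partner segments of LR2 and LR1 are also contiguous.
utr_gap = gap(HCV_INTERACTIONS["LR2"][1], HCV_INTERACTIONS["LR1"][1])
```

Both gaps evaluate to zero for the coordinates given in the text, which is the numerical counterpart of the statement that position 8627 pairs with the IRES while its neighbour pairs with the 3' UTR.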

All of the mutations (15 point mutations and 6 double mutations) studied by Yi & Lemon (2003) exhibit reduced or no replication activity. Most of them would disrupt base pairs in either LR1 or LR2, supporting our proposed interactions. However, five of the point mutations are in predicted loop regions and would be expected to cause only minor secondary structural changes. This could indicate that there are sequence constraints beyond conservation of secondary structure. To prove or disprove the existence of LR1 and LR2, more mutation experiments would be needed.


   DISCUSSION
We have employed a combination of structure prediction based on thermodynamic rules and the evaluation of consistent and compensatory mutations to search Flaviviridae genomes for functional RNA structure motifs. While the UTRs of some of these viruses have been previously studied, this contribution reports a comprehensive survey of structural features across the full genomes of the whole family Flaviviridae. Furthermore, instead of using a ‘sliding window’ technique, all predictions were carried out for the complete genomic RNA sequences. This enables our algorithm to find long-range interactions; in particular we found significant probability for cyclization in all genera except Pestivirus.
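
Why a sliding window cannot recover such interactions follows from simple arithmetic: a helix can only be predicted if both of its strands fit inside the window. The Python sketch below computes the minimum window length for three of the HCV interactions, using the coordinates reported in the Results. The function names and the window sizes tried are hypothetical choices for illustration and do not describe the settings of any particular program.

```python
# Outermost coordinates (nt) of three HCV interactions as quoted in the Results:
# smallest position of the upstream strand, largest position of the downstream strand.
SPANS = {
    "HCVCS3": (95, 8571),
    "LR1": (8628, 9633),
    "LR2": (8978, 9598),
}


def min_window(first, last):
    """Smallest window length (nt) containing positions first..last inclusive."""
    return last - first + 1


def detectable(spans, window):
    """Names of interactions whose two strands fit into a window of this length."""
    return [name for name, (first, last) in spans.items()
            if min_window(first, last) <= window]


required = {name: min_window(*ends) for name, ends in SPANS.items()}

# Hypothetical window lengths, chosen only to show the effect.
found_500 = detectable(SPANS, 500)
found_1000 = detectable(SPANS, 1000)
found_full = detectable(SPANS, 10000)
```

Under these assumptions LR2 needs a window of at least 621 nt, LR1 one of at least 1006 nt and HCVCS3 one of at least 8477 nt, so only folding of (nearly) complete genomes can reveal the putative cyclization domains.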

In the genus Flavivirus, cyclization of the genome had already been described in the literature and localized to highly conserved cyclization sequences. Apart from recovering these known cyclization sequences, we detected further sequences which took part in cyclization for all species in this study (P1', P1 and P2). These sequences varied considerably in sequence, length and position. Men et al. (1996) showed that deleting these sequences led to greatly delayed and reduced growth in simian and mosquito cells. It is possible that these additional cyclization domains are not strictly necessary for virus viability, but only support and stabilize viral genome cyclization.

Most surprisingly, we also found viral genome cyclization in GBV-C and HCV, which had not been reported before, although Yi & Lemon (2003) suggested a cyclization of the HCV genome mediated by some cellular protein. Our algorithm yielded base pair probabilities both for the previously reported secondary structures in the 5' and 3' UTRs and for genome cyclization. In both cases, our data revealed no inconsistencies. Thus the known structures compete with genome cyclization. Our evaluation conditions favoured genome cyclization based both on thermodynamic prediction, in the case of HCV, and on sequence covariation. This result can be interpreted either as a relic of a common ancestor shared by these genera and the genus Flavivirus or, more speculatively, as a switch providing different functions in different states of the viral life-cycle (e.g. a switch between replication and translation states of the virus).

While in Flavivirus and GBV-C the 5' and 3' ends pair within the untranslated regions, in HCV we found that both the 5' and the 3' ends pair with a region some 1000 nt upstream of the 3' end (i.e. a region within the NS5B coding sequence). More interestingly, we observed that, in this way, the 5' and 3' ends were brought closely together. This could be a reason for the particular importance of the NS5B region as assumed in the literature (Oh et al., 1999, 2000). It may also explain the results of Friebe et al. (2001) and Kim et al. (2002), who observed that domains HC1(I) and HC2(II) in the 5' UTR are essential for replication, while domain HC3(III) helps to facilitate replication but is not absolutely required.

Furthermore, in this report (and in the supplementary material available online), we present a large number of secondary structure elements that have not been described before, most importantly within the coding region. This information could be used to identify additional regions that might be important for virus viability and propagation, and thus to gain more insight into the life-cycle of the members of the family Flaviviridae.


   ACKNOWLEDGEMENTS
 
C. T. and C. W. are supported by the Austrian Fonds zur Förderung der Wissenschaftlichen Forschung, project no. P-13545-MAT. The work of P. F. S. is supported in part by the DFG Bioinformatics Initiative, BIZ-6/1-2. We would like to thank the anonymous reviewers for their helpful comments.


   REFERENCES
Becher, P., Orlich, M. & Thiel, H. J. (1998). Complete genomic sequence of border disease virus, a pestivirus from sheep. J Virol 72, 5165–5173.

Blight, K. J. & Rice, C. M. (1997). Secondary structure determination of the conserved 98-base sequence at the 3' terminus of hepatitis C virus genome RNA. J Virol 71, 7345–7352.

Brinton, M. A. & Dispoto, J. H. (1988). Sequence and secondary structure analysis of the 5'-terminal region of flavivirus genome RNA. Virology 162, 290–299.

Brown, E. A., Zhang, H., Ping, L. H. & Lemon, S. M. (1992). Secondary structure of the 5' nontranslated regions of hepatitis C virus and pestivirus genomic RNAs. Nucleic Acids Res 20, 5041–5045.

Collier, A. J., Gallego, J., Klinck, R., Cole, P. T., Harris, S. J., Harrison, G. P., Aboul-Ela, F., Varani, G. & Walker, S. (2002). A conserved RNA structure within the HCV IRES eIF3-binding site. Nat Struct Biol 9, 375–380.

Cuceanu, N. M., Tuplin, A. & Simmonds, P. (2001). Evolutionarily conserved RNA secondary structures in coding and non-coding sequences at the 3' end of the hepatitis G virus/GB-virus C genome. J Gen Virol 82, 713–722.

Deng, R. & Brock, K. V. (1993). 5' and 3' untranslated regions of pestivirus genome: primary and secondary structure analyses. Nucleic Acids Res 21, 1949–1957.

Fletcher, S. P. & Jackson, R. J. (2002). Pestivirus internal ribosome entry site (IRES) structure and function: elements in the 5' untranslated region important for IRES function. J Virol 76, 5024–5033.

Fontana, W., Konings, D. A. M., Stadler, P. F. & Schuster, P. (1993). Statistics of RNA secondary structures. Biopolymers 33, 1389–1404.

Friebe, P., Lohmann, V., Krieger, N. & Bartenschlager, R. (2001). Sequences in the 5' nontranslated region of hepatitis C virus required for RNA replication. J Virol 75, 12047–12057.

Gorodkin, J., Heyer, L. J. & Stormo, G. D. (1997). Finding common sequences and structure motifs in a set of RNA molecules. In Proceedings of the ISMB-97, pp. 120–123. Edited by T. Gaasterland, P. Karp, K. Karplus, C. Ouzounis, C. Sander & A. Valencia. Menlo Park, CA: AAAI Press.

Hahn, C. S., Hahn, Y. S., Rice, C. M., Lee, E., Dalgarno, L., Strauss, E. G. & Strauss, J. H. (1987). Conserved elements in the 3' untranslated region of flavivirus RNAs and potential cyclization sequences. J Mol Biol 198, 33–41.

Hofacker, I. L. & Stadler, P. F. (1999). Automatic detection of conserved base pairing patterns in RNA virus genomes. Comput Chem 23, 401–414.

Hofacker, I. L., Fontana, W., Stadler, P. F., Bonhoeffer, S., Tacker, M. & Schuster, P. (1994). Fast folding and comparison of RNA secondary structures. Monatsh Chem 125, 167–188.

Hofacker, I. L., Fekete, M., Flamm, C., Huynen, M. A., Rauscher, S., Stolorz, P. E. & Stadler, P. F. (1998). Automatic detection of conserved RNA structure elements in complete RNA virus genomes. Nucleic Acids Res 26, 3825–3836.

Hofacker, I. L., Fekete, M. & Stadler, P. F. (2002). Secondary structure prediction for aligned RNA sequences. J Mol Biol 319, 1059–1066.

Hogeweg, P. & Hesper, B. (1984). Energy directed folding of RNA sequences. Nucleic Acids Res 12, 67–74.

Honda, M., Brown, E. A. & Lemon, S. M. (1996a). Stability of a stem–loop involving the initiator AUG controls the efficiency of internal initiation of translation on hepatitis C virus RNA. RNA 2, 955–968.

Honda, M., Ping, L. H., Rijnbrand, R. C., Amphlett, E., Clarke, B., Rowlands, D. & Lemon, S. M. (1996b). Structural requirements for initiation of translation by internal ribosome entry within genome-length hepatitis C virus RNA. Virology 222, 31–42.

Ito, T. & Lai, M. M. C. (1997). Determination of the secondary structure of and cellular protein binding to the 3'-untranslated region of the hepatitis C virus RNA genome. J Virol 71, 8698–8706.

Kalliampakou, K. I., Psaridi-Linardaki, L. & Mavromara, P. (2002). Mutational analysis of the apical region of domain II of the HCV IRES. FEBS Lett 511, 79–84.

Katayama, K., Kageyama, T., Fukushi, S., Hoshino, F. B., Kurihara, C., Ishiyama, N., Okamura, H. & Oya, A. (1998). Full-length GBV-C/HGV genomes from nine Japanese isolates: characterization by comparative analysis. Arch Virol 143, 1–13.

Khromykh, A. A., Meka, H., Guyatt, K. J. & Westaway, E. G. (2001). Essential role of cyclization sequences in flavivirus RNA replication. J Virol 75, 6719–6728.

Kieft, J. S., Zhou, K., Jubin, R. & Doudna, J. A. (2001). Mechanism of ribosome recruitment by hepatitis C IRES RNA. RNA 7, 194–206.

Kieft, J. S., Zhou, K., Grech, A., Jubin, R. & Doudna, J. A. (2002). Crystal structure of an RNA tertiary domain essential to HCV IRES-mediated translation initiation. Nat Struct Biol 9, 370–374.

Kim, Y. K., Kim, C. S., Lee, S. H. & Jang, S. K. (2002). Domains I and II in the 5' nontranslated region of the HCV genome are required for RNA replication. Biochem Biophys Res Commun 290, 105–112.

Kolupaeva, V. G., Pestova, T. V. & Hellen, C. U. (2000a). An enzymatic footprinting analysis of the interaction of 40S ribosomal subunits with the internal ribosomal entry site of hepatitis C virus. J Virol 74, 6242–6250.

Kolupaeva, V. G., Pestova, T. V. & Hellen, C. U. (2000b). Ribosomal binding to the internal ribosomal entry site of classical swine fever virus. RNA 6, 1791–1807.

Kolykhalov, A. A., Feinstone, S. & Rice, C. M. (1996). Identification of a highly conserved sequence element at the 3' terminus of hepatitis C virus genome RNA. J Virol 70, 3363–3371.

Leitmeyer, K. C., Vaughn, D. W., Watts, D. M., Salas, R., Villalobos, I., de Chacon, I. V., Ramos, C. & Rico-Hesse, R. (1999). Dengue virus structural differences that correlate with pathogenesis. J Virol 73, 4738–4747.

Lück, R., Steger, G. & Riesner, D. (1996). Thermodynamic prediction of conserved secondary structure: application to the RRE element of HIV, the tRNA-like element of CMV, and the mRNA of prion protein. J Mol Biol 258, 813–826.

Lück, R., Gräf, S. & Steger, G. (1999). ConStruct: a tool for thermodynamic controlled prediction of conserved secondary structure. Nucleic Acids Res 27, 4208–4217.

Lukavsky, P. J., Kim, I., Otto, G. A. & Puglisi, J. D. (2003). Structure of HCV IRES domain II determined by NMR. Nat Struct Biol 10, 1033–1038.

Mandl, C. W., Holzmann, H., Kunz, C. & Heinz, F. X. (1993). Complete genomic sequence of Powassan virus: evaluation of genetic elements in tick-borne versus mosquito-borne flaviviruses. Virology 194, 173–184.

Mandl, C. W., Holzmann, H., Meixner, T., Rauscher, S., Stadler, P. F., Allison, S. L. & Heinz, F. X. (1998). Spontaneous and engineered deletions in the 3' noncoding region of tick-borne encephalitis virus: construction of highly attenuated mutants of a flavivirus. J Virol 72, 2132–2140.

Mathews, D. H., Sabina, J., Zuker, M. & Turner, D. H. (1999). Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J Mol Biol 288, 911–940.

McCaskill, J. S. (1990). The equilibrium partition function and base pair binding probabilities for RNA secondary structure. Biopolymers 29, 1105–1119.

Men, R., Bray, M., Clark, D., Chanock, R. M. & Lai, C. J. (1996). Dengue type 4 virus mutants containing deletions in the 3' noncoding region of the RNA genome: analysis of growth restriction in cell culture and altered viremia pattern and immunogenicity in rhesus monkeys. J Virol 70, 3930–3937.

Meyers, G. & Thiel, H. J. (1996). Molecular characterization of pestiviruses. Adv Virus Res 47, 53–118.

Moser, C., Bosshart, A., Tratschin, J. D. & Hofmann, M. A. (2001). A recombinant classical swine fever virus with a marker insertion in the internal ribosome entry site. Virus Genes 23, 63–68.

Myers, T. M., Kolupaeva, V. G., Mendez, E., Baginski, S. G., Frolov, I., Hellen, C. U. & Rice, C. M. (2001). Efficient translation initiation is required for replication of bovine viral diarrhea virus subgenomic replicons. J Virol 75, 4226–4238.

Odreman-Macchioli, F. E., Tisminetzky, S. G., Zotti, M., Baralle, F. E. & Buratti, E. (2000). Influence of correct secondary and tertiary RNA folding on the binding of cellular factors to the HCV IRES. Nucleic Acids Res 28, 875–885.

Oh, J. W., Ito, T. & Lai, M. M. (1999). A recombinant hepatitis C virus RNA-dependent RNA polymerase capable of copying the full-length viral RNA. J Virol 73, 7694–7702.

Oh, J. W., Sheu, G. T. & Lai, M. M. (2000). Template requirement and initiation site selection by hepatitis C virus polymerase on a minimal viral RNA template. J Biol Chem 275, 17710–17717.

Okamoto, H., Nakao, H., Inoue, T., Fukuda, M., Kishimoto, J., Iizuka, H., Tsuda, F., Miyakawa, Y. & Mayumi, M. (1997). The entire nucleotide sequences of two GB virus C/hepatitis G virus isolates of distinct genotypes from Japan. J Gen Virol 78, 737–745.

Pestova, T. V., Shatsky, I. N., Fletcher, S. P., Jackson, R. J. & Hellen, C. U. (1998). A prokaryotic-like mode of cytoplasmic eukaryotic ribosome binding to the initiation codon during internal translation initiation of hepatitis C and classical swine fever virus RNAs. Genes Dev 12, 67–83.

Proutski, V., Gould, E. A. & Holmes, E. C. (1997). Secondary structure of the 3' untranslated region of flaviviruses: similarities and differences. Nucleic Acids Res 25, 1194–1202.

Proutski, V., Gritsun, T. S., Gould, E. A. & Holmes, E. C. (1999). Biological consequences of deletions within the 3'-untranslated region of flaviviruses may be due to rearrangements of RNA secondary structure. Virus Res 64, 107–123.

Psaridi, L., Georgopoulou, U., Varaklioti, A. & Mavromara, P. (1999). Mutational analysis of a conserved tetraloop in the 5' untranslated region of hepatitis C virus identifies a novel RNA element essential for the internal ribosome entry site function. FEBS Lett 453, 49–53.

Rauscher, S., Flamm, C., Mandl, C. W., Heinz, F. X. & Stadler, P. F. (1997). Secondary structure of the 3'-noncoding region of flavivirus genomes: comparative analysis of base pairing probabilities. RNA 3, 779–791.

Ray, S. C., Wang, Y. M., Laeyendecker, O., Ticehurst, J. R., Villano, S. A. & Thomas, D. L. (1999). Acute hepatitis C virus structural gene sequences as predictors of persistent viremia: hypervariable region 1 as a decoy. J Virol 73, 2938–2946.

Rivas, E. & Eddy, S. R. (2000). Secondary structure alone is generally not statistically significant for the detection of noncoding RNAs. Bioinformatics 16, 583–605.

Sankoff, D. (1985). Simultaneous solution of the RNA folding, alignment, and proto-sequence problems. SIAM J Appl Math 45, 810–825.

Schuster, P., Fontana, W., Stadler, P. F. & Hofacker, I. L. (1994). From sequences to shapes and back: a case study in RNA secondary structures. Proc R Soc Lond B Biol Sci 255, 279–284.

Simmonds, P. & Smith, D. B. (1999). Structural constraints on RNA virus evolution. J Virol 73, 5787–5794.

Simons, J. N., Desai, S. M., Schultz, D. E., Lemon, S. M. & Mushahwar, I. K. (1996). Translation initiation in GB viruses A and C: evidence for internal ribosome entry and implication for genome organization. J Virol 70, 6126–6135.

Smith, D. B., Cuceanu, N., Davidson, F., Jarvis, L. M., Mokili, J. L., Hamid, S., Ludlam, C. A. & Simmonds, P. (1997). Discrimination of hepatitis G virus/GBV-C geographical variants by analysis of the 5' non-coding region. J Gen Virol 78, 1533–1542.

Spahn, C. M., Kieft, J. S., Grassucci, R. A., Penczek, P. A., Zhou, K., Doudna, J. A. & Frank, J. (2001). Hepatitis C virus IRES RNA-induced changes in the conformation of the 40S ribosomal subunit. Science 291, 1959–1962.

Stocsits, R., Hofacker, I. L. & Stadler, P. F. (1999). Conserved secondary structures in hepatitis B virus RNA. In Computer Science in Biology, pp. 73–79. Univ. Bielefeld, Bielefeld, Germany. Proceedings of the GCB'99, Hannover, Germany.

Tanaka, T., Kato, N., Cho, M. J., Sugiyama, K. & Shimotohno, K. (1996). Structure of the 3' terminus of the hepatitis C virus genome. J Virol 70, 3307–3312.

Tang, S., Collier, A. J. & Elliott, R. M. (1999). Alterations to both the primary and predicted secondary structure of stem-loop IIIc of the hepatitis C virus 1b 5' untranslated region (5'UTR) lead to mutants severely defective in translation which cannot be complemented in trans by the wild-type 5'UTR sequence. J Virol 73, 2359–2364.

Tautz, N., Harada, T., Kaiser, A., Rinck, G., Behrens, S. & Thiel, H. J. (1999). Establishment and characterization of cytopathogenic and noncytopathogenic pestivirus replicons. J Virol 73, 9422–9432.

Thompson, J. D., Higgins, D. G. & Gibson, T. J. (1994). CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22, 4673–4680.

Tuplin, A., Wood, J., Evans, D. J., Patel, A. H. & Simmonds, P. (2002). Thermodynamic and phylogenetic prediction of RNA secondary structures in the coding region of hepatitis C virus. RNA 8, 824–841.

van Regenmortel, M. H. V., Fauquet, C., Bishop, D. & 8 other authors (2000). Virus Taxonomy: The Classification and Nomenclature of Viruses. The Seventh Report of the International Committee on Taxonomy of Viruses. San Diego: Academic Press. http://www.ncbi.nlm.nih.gov/ICTVdb/

Witwer, C., Rauscher, S., Hofacker, I. L. & Stadler, P. F. (2001). Conserved RNA secondary structures in Picornaviridae genomes. Nucleic Acids Res 29, 5079–5089.

Xiang, J., Wunschmann, S., Schmidt, W., Shao, J. & Stapleton, J. T. (2000). Full-length GB virus C (Hepatitis G virus) RNA transcripts are infectious in primary CD4-positive T cells. J Virol 74, 9125–9133.

Yamada, N., Tanihara, K., Takada, A., Yorihuzi, T. T., Tsutsumi, M., Shimomura, H., Tsuji, T. & Date, T. (1996). Genetic organization and diversity of the 3' noncoding region of the hepatitis C virus genome. Virology 223, 255–261.

Yi, M. K. & Lemon, S. M. (2003). 3' nontranslated RNA signals required for replication of hepatitis C virus RNA. J Virol 77, 3557–3568.

You, S. & Padmanabhan, R. (1999). A novel in vitro replication system for dengue virus. Initiation of RNA synthesis at the 3'-end of exogenous viral RNA templates requires 5'- and 3'-terminal complementary sequence motifs of the viral RNA. J Biol Chem 274, 33714–33722.

Yu, H., Grassmann, C. W. & Behrens, S. E. (1999). Sequence and structural elements at the 3' terminus of bovine viral diarrhea virus genomic RNA: functional role during RNA replication. J Virol 73, 3638–3648.

Zhao, W. D. & Wimmer, E. (2001). Genetic analysis of a poliovirus/hepatitis C virus chimera: new structure for domain II of the internal ribosomal entry site of hepatitis C virus. J Virol 75, 3719–3730.

Received 26 June 2003; accepted 10 December 2003.