A short open reading frame terminating in front of a stable hairpin is the conserved feature in pregenomic RNA leaders of plant pararetroviruses

Mikhail M. Pooggin¹,², Johannes Fütterer³, Konstantin G. Skryabin² and Thomas Hohn¹

¹ Friedrich Miescher Institute, PO Box 2543, CH-4002 Basel, Switzerland
² Centre 'Bioengineering', Russian Academy of Sciences, 117312 Moscow, Russia
³ Institute for Plant Sciences, ETH Zentrum, CH-8092 Zürich, Switzerland

Author for correspondence: Thomas Hohn. Fax +41 61 697 39 76. E-mail thomas.hohn{at}fmi.ch


   Abstract
In plant pararetroviruses, pregenomic RNA (pgRNA) directs synthesis of circular double-stranded viral DNA and serves as a polycistronic mRNA. Using computer-aided analysis, we compared the 14 plant pararetroviruses sequenced so far with respect to the structural organization of their pgRNA 5'-leader. The results revealed that the pgRNA of all these viruses carries a long leader sequence that contains several short ORFs (sORFs) and has the potential to form a large stem–loop structure; both features are known to inhibit downstream translation. Formation of the structure brings the first long ORF into close spatial proximity to a 5'-proximal sORF that terminates 5 to 10 nt upstream of the stable structural element. The first long ORF on the pgRNA is translated by a ribosome shunt mechanism discovered in cauliflower mosaic virus (CaMV) and rice tungro bacilliform virus (RTBV), representatives of the two major groups of plant pararetroviruses. Both the sORF and the structure have been implicated in the shunt process for CaMV pgRNA translation. The conservation of these elements among all plant pararetroviruses suggests that the ribosome shunt mechanism is also conserved. For some of the less well-studied viruses, the location of the conserved elements also allowed us to predict the pgRNA promoter region and the translation start site of the first long ORF.


   Introduction
To date, the genomes of 14 plant pararetroviruses have been sequenced (see Table 1 for abbreviations and references). They can be classified into two major groups, the icosahedral caulimoviruses and the bacilliform badnaviruses (Table 1), plus two distinct viruses, CVMV and PVCV, which differ in genome organization both from these groups and from each other (Hohn & Fütterer, 1997 ; Richert-Pöggeler & Shepherd, 1997 ). PVCV is also characterized by core sequences for a putative integrase function (Richert-Pöggeler & Shepherd, 1997 ), which are absent from other pararetroviruses.


Table 1. Sequenced plant pararetroviruses
In these viruses, the circular double-stranded DNA, 7161 (in CSSV) to 8175 (in SoyCMV) bp in length, contains all the long ORFs exclusively on one strand. These ORFs are closely spaced or slightly overlapping, with the exception of one large intergenic region (LIGR) of 497 (in SoyCMV) to 1027 (in ScBV) nt. The LIGR contains, or is thought to contain, a transcription start site and a poly(A) signal positioned so as to allow the production of a terminally redundant pregenomic RNA (pgRNA). In the cytoplasm of infected cells, the pgRNA has at least two functions: it is both the template for replication by reverse transcription and a complex polycistronic mRNA (reviewed by Rothnie et al., 1994 ). Promoters for pgRNAs have been isolated and characterized for seven viruses (see Table 1). In five of these cases, the pgRNA 5'-end has been mapped by primer extension (see Table 1) and lies 31 to 34 nt downstream of a TATA-box. In SVBV, PVCV, CSSV, SoyCMV and CERV, only putative TATA-boxes have been described (see Table 1 for references). In general, the pgRNA promoter overlaps the coding region, with the TATA-box located within the LIGR, close to or (in FMV) at its left border. Consequently, the pgRNA of plant pararetroviruses begins with an unusually long leader that carries several short ORFs preceding the first long viral ORF (Fig. 1). Moreover, in CaMV, FMV, CERV (Fütterer et al., 1988 ) and RTBV (Hay et al., 1991 ), the leader sequence has been predicted to form a large stem–loop structure. For CaMV, formation of such a structure has been confirmed in vitro (Hemmings-Mieszczak et al., 1997 , 1998 ).
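The coordinate arithmetic behind a putative pgRNA 5'-end and the resulting leader length can be written out explicitly. The following is a minimal sketch, assuming 1-based genome coordinates and a leader counted from the pgRNA 5'-end up to, but not including, the first nucleotide of the first long ORF's start codon; the function names are hypothetical. Because the viral DNA is circular, a leader can span the coordinate origin, which the modulo handles.

```python
def putative_pgrna_start(tata_pos: int, offset: int = 31) -> int:
    """Place a putative pgRNA 5'-end `offset` nt downstream of a TATA-box.

    Mapped 5'-ends lie 31 to 34 nt downstream of the TATA-box; 31 nt is the
    offset used for unmapped promoters. (Does not wrap past the origin.)
    """
    return tata_pos + offset


def leader_length(pgrna_start: int, orf_start: int, genome_len: int) -> int:
    """Leader length on a circular genome with coordinates 1..genome_len.

    Counts the nt from the pgRNA 5'-end up to (not including) the first
    nucleotide of the start codon; the modulo handles leaders that wrap
    past the coordinate origin of the circular DNA.
    """
    return (orf_start - pgrna_start) % genome_len


# TATA-box positions quoted in the text for BSV and DaBV
print(putative_pgrna_start(7231))
print(putative_pgrna_start(7267))
# Hypothetical wrap-around example on an 8000 bp circle
print(leader_length(7900, 150, 8000))
```

With these conventions, the BSV and DaBV TATA-boxes give the putative starts 7262 and 7298 quoted later in the text.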



Fig. 1. Comparison of the primary structures in the pgRNA leader of plant pararetroviruses. On the left, a phylogenetic tree shows the relationship of their reverse transcriptases (according to Richert-Pöggeler & Shepherd, 1997 ; note that BSV and DaBV were not included in that comparison). Caulimoviruses and badnaviruses are grouped (as indicated on the right). The leader sequence preceding the first long ORF (ORF VII or ORF I) is depicted as a thick line; the sORFs are indicated by boxes, with internal start codons indicated by vertical lines. The numbered genome position of the pgRNA 5'-end is enclosed within an ellipse if mapped by primer extension or not enclosed if putative. The numbering within the leader is from the 5'-end (except for SoyCMV where the latter is unclear). Red arrows under the leader define the complementary sequences that form the base of the large stem–loop structures shown in Figs 2–4. The conserved sORFs preceding the structures are also in red. A green triangle indicates a putative or, in the case of CaMV, FMV and RTBV (Rothnie et al., 1996 ), mapped poly(A) signal. An arrowhead adjacent to a vertical line (in blue) shows the location of the PBS for reverse transcription.

The complex organization of the pgRNA leader raises questions about the mode of translation initiation. According to the scanning model of translation initiation in eukaryotes (Kozak, 1989 ), which postulates linear migration of ribosomes from the mRNA 5'-end, upstream AUG codons and strong secondary structure are inhibitory elements that should preclude downstream translation. Studies of translational control in CaMV (Fütterer et al., 1990 , 1993 ) and in RTBV (Fütterer et al., 1996 ) have led to the hypothesis of non-linear ribosome migration (ribosome shunt). According to this hypothesis, ribosomes start conventional scanning at the capped 5'-end of the pgRNA and, after a short distance, some of them are translocated (shunted) to a landing sequence near the 3'-end of the leader, where they resume scanning and initiate translation of the first long ORF. Further investigation of the shunt mechanism in CaMV has revealed two cis-elements in the leader that are required for efficient shunting: a strong structural element at the base of the stem–loop structure, called stem-section 1, and the most 5'-proximal sORF, sORF A (Dominguez et al., 1998 ; Hemmings-Mieszczak et al., 1998 ). However, the roles of these elements and the mechanistic details of the ribosome shunt remain unclear.
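Under the scanning model, every upstream AUG in the leader is a potential obstacle. As a minimal sketch (not the software used in this study), the following Python enumerates AUG-initiated sORFs in the three reading frames of a leader, grouping internal in-frame AUGs with the sORF that contains them, as in Fig. 1. The codon count includes the stop codon, as for sORF A. ORFs that run off the end of the input (such as an sORF overlapping the first long ORF) are not reported. The class and function names and the toy leader are hypothetical.

```python
from dataclasses import dataclass, field

STOP_CODONS = {"UAA", "UAG", "UGA"}


@dataclass
class SORF:
    start: int      # 0-based index of the first AUG of the sORF
    stop_end: int   # 0-based index just past the stop codon
    internal_augs: list = field(default_factory=list)  # internal in-frame AUGs

    @property
    def n_codons(self) -> int:
        """Codon count including the stop codon (CaMV sORF A would give 4)."""
        return (self.stop_end - self.start) // 3


def find_sorfs(leader: str) -> list:
    """Return AUG-initiated, stop-terminated sORFs in all three frames."""
    rna = leader.upper().replace("T", "U")
    sorfs = []
    for frame in range(3):
        current = None
        for i in range(frame, len(rna) - 2, 3):
            codon = rna[i:i + 3]
            if codon == "AUG":
                if current is None:
                    current = SORF(start=i, stop_end=-1)
                else:
                    current.internal_augs.append(i)
            elif codon in STOP_CODONS and current is not None:
                current.stop_end = i + 3
                sorfs.append(current)
                current = None
        # An sORF still open here runs past the input and is not reported.
    return sorted(sorfs, key=lambda s: s.start)


# Toy leader: an sORF with one internal AUG, then a one-codon sORF (AUGUAA)
for s in find_sorfs("GGAUGAUGCCCUGAGGAUGUAAGG"):
    print(s.start, s.stop_end, s.n_codons, s.internal_augs)
```

Grouping by stop codon rather than by start codon mirrors the way the leaders are drawn in Fig. 1, where internal start codons are marked inside a single sORF box.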


   Methods
For the analysis described here, the 5'-region of pgRNA considered comprised the leader sequence plus at least 300 nt of downstream sequence. The 5'-end of the leader was defined according to the primer extension results (referred to in Table 1) or, where these were not available, at a position 31 nt downstream of a putative TATA-box (see Fig. 1). RNA secondary structures at 25 °C were predicted with the MFold program (Wisconsin Package, version 6.0; Genetics Computer Group, Madison, WI, USA), and the resulting optimal structures are drawn schematically in Figs 2, 3 and 4.
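MFold predicts minimum-free-energy structures using thermodynamic nearest-neighbour parameters, so its output cannot be reproduced in a few lines of code. To illustrate only the underlying idea, that an optimal secondary structure can be found by dynamic programming over subsequences, the sketch below substitutes the much simpler Nussinov algorithm, which maximizes the number of Watson–Crick and G·U pairs subject to a minimum hairpin-loop size. This is a swapped-in teaching technique, not MFold: it yields no stabilities in kcal/mol and will not reproduce the structures in Figs 2–4.

```python
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def nussinov_fold(rna: str, min_loop: int = 3) -> tuple:
    """Return (number of base pairs, dot-bracket string) maximizing pairs."""
    n = len(rna)
    if n == 0:
        return 0, ""
    best = [[0] * n for _ in range(n)]  # best[i][j]: max pairs in rna[i..j]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]                # case 1: j is unpaired
            for k in range(i, j - min_loop):      # case 2: j pairs with k
                if (rna[k], rna[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + best[k + 1][j - 1] + 1)
            best[i][j] = score

    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:                                  # traceback of one optimum
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if best[i][j] == best[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (rna[k], rna[j]) in PAIRS:
                left = best[i][k - 1] if k > i else 0
                if left + best[k + 1][j - 1] + 1 == best[i][j]:
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return best[0][n - 1], "".join(structure)


print(nussinov_fold("GGGAAAUCCC"))
```

Energy-based programs such as MFold replace the pair count with stacking and loop free energies, which is why they favour long, GC-rich helices like the stable elements at the stem bases described below.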



Fig. 2. Structural organization in the pgRNA leaders of caulimoviruses. MFold-predicted, large stem–loop structures of the leaders are schematically drawn with thick lines. Stability of the structures in kcal/mol is given. 5'- and 3'-sequences flanking the main structure are shown in open conformation, since they do not form any extensive structures. (Note that addition of longer upstream or downstream sequences into the MFold program did not disturb the formation of the main structure.) The stable structural element (circled at the stem base) and adjacent regions are enlarged alongside. Short ORFs are boxed and/or numbered. AUG codons are in bold; potential non-AUG initiator codons within the shunt landing sequence are in italics. C-terminal amino acids of the conserved 5'-proximal sORF are indicated below. The nucleotide numbering is from the leader 5'-end, except for SoyCMV where the genome positions are given.

Fig. 3. Structural organization in the pgRNA leaders of badnaviruses. For details see the legend to Fig. 2.

Fig. 4. Structural organization in the pgRNA leaders of distinct members of plant pararetroviruses. For details see the legend to Fig. 2.

   Results and Discussion
In the present study, we used a computer-aided comparison of all sequenced plant pararetroviruses with regard to the structural organization of their pgRNA leaders, in order to identify phylogenetically conserved structural features that might be essential for the ribosome shunt. In most cases, the 5'-part of the leader was predicted to anneal to its 3'-part to form an extended stem–loop structure. Inspection of these structures revealed a further striking similarity: a 5'-proximal sORF terminating a few nucleotides upstream of a stable helix located at or near the base of the stem–loop structure (close-ups in Figs 2, 3 and 4). In the following sections we describe the results of this comparison in detail for the different groups, subgroups and distinct members of plant pararetroviruses, together with the implications for the ribosome shunt mechanism and the overall translational strategy of these viruses.

Caulimoviruses
In CaMV, the type member of the caulimoviruses, sORF A (four codons including the stop codon) is the most 5'-proximal sORF in the leader (Fig. 1). It terminates 6 nt upstream of stem-section 1, the most stable element, located at the base of the large stem–loop structure (Fig. 2). Mutation analysis has previously shown that formation of stem-section 1 or a similar structure is essential for ribosome shunt in vitro (Dominguez et al., 1998 ). Furthermore, a knock-out mutation of sORF A nearly abolished shunt-mediated translation (Fütterer et al., 1993 ), and shifting its position back or forth relative to stem-section 1 reduced shunt efficiency considerably (Dominguez et al., 1998 ). Further evidence for the important role of sORF A came from the high reversion rate of various sORF A mutations in infectivity studies in planta (Pooggin et al., 1998 ). Taken together, these results led us to hypothesize (Pooggin et al., 1998 ) that sORF A is translated and that, during or just after termination of this translational event, the released 40S subunit associates with the shunt landing sequence. The latter is brought into close spatial proximity to the sORF A stop codon by the formation of stem-section 1 (see Fig. 2). Under this hypothesis, the positions of the 5'-proximal sORF stop codon and of the shunt landing sequence, which flank the base of the strong structure, should be critical for efficient shunting.
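The geometry invoked by this hypothesis, a stop codon just 5' of the stem base and a landing sequence just 3' of it, can be read directly off a predicted structure in dot-bracket notation. The sketch below is a minimal illustration under stated assumptions: the "base" of the main stem–loop is taken to be the outermost base pair enclosing the largest top-level domain, and each gap is counted as the number of nucleotides between the element and that outermost pair. The function names and the toy structure are hypothetical.

```python
def pair_table(dot_bracket: str) -> dict:
    """Map each paired position to its partner (0-based)."""
    stack, pairs = [], {}
    for pos, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            opener = stack.pop()
            pairs[opener], pairs[pos] = pos, opener
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


def main_stem_base(dot_bracket: str) -> tuple:
    """Outermost pair of the largest top-level domain, as (5' pos, 3' pos)."""
    pairs = pair_table(dot_bracket)
    outer, pos = [], 0
    while pos < len(dot_bracket):
        if dot_bracket[pos] == "(":
            outer.append((pos, pairs[pos]))
            pos = pairs[pos] + 1   # skip over the whole enclosed domain
        else:
            pos += 1
    if not outer:
        raise ValueError("structure contains no base pairs")
    return max(outer, key=lambda p: p[1] - p[0])


def flanking_gaps(stop_end: int, start_codon: int, base: tuple) -> tuple:
    """Nt between the sORF stop codon and the stem base (5' side), and
    between the stem base and a downstream start codon (3' side)."""
    five_prime, three_prime = base
    return five_prime - stop_end, start_codon - (three_prime + 1)


# Hypothetical toy: open 5' region, main stem-loop, small 3' hairpin
toy = "........" + "((((....))))" + ".." + "((...))"
base = main_stem_base(toy)
print(base, flanking_gaps(stop_end=2, start_codon=21, base=base))
```

In this toy, an sORF ending at index 2 sits 6 nt before the stem base, the same spacing reported for CaMV sORF A and stem-section 1.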

The pgRNA leaders of FMV and CERV, the closest relatives of CaMV (see Fig. 1), are structurally very similar. As in CaMV, these leaders are around 600 nt long and contain seven to nine sORFs (Fig. 1). Their most 5'-proximal sORF consists of four codons and terminates 7 and 8 nt, respectively, in front of the base of the structure (Fig. 2). This similarity suggests that ribosome shunt may also operate in both of these viruses. The only difference is that in CERV the first long ORF, ORF VII, is separated from the structural element by a 148 nt sequence containing two sORFs. These sORFs might significantly reduce downstream translation, unless shunting ribosomes recognize them poorly because their start codons are in unfavourable contexts, or unless they allow efficient reinitiation. In CaMV and FMV, ORF VII lies closer to the structure (55 and 36 nt away, respectively), with no AUG in between (Figs 1 and 2).

In SVBV, the next most closely related caulimovirus (Fig. 1), folding of the 530 nt LIGR revealed an extended stem–loop structure situated 36 nt upstream of ORF VII (Fig. 2). Moreover, an sORF of five codons terminates 6 nt in front of this structure. Further upstream of this sORF there is a typical TATA-box (at position 7220 of the genome), which Petrzik et al. (1998) assigned to the pgRNA promoter. The corresponding transcription start site should be around position 7251, which makes the described sORF the most 5'-proximal one in the pgRNA leader (Fig. 1). Furthermore, a putative poly(A) signal was identified at position 7325, downstream of the putative transcription start. Thus, SVBV strongly resembles the viruses described above with respect to pgRNA leader organization and may therefore also use ribosome shunt to control pgRNA translation.

PCSV and SoyCMV form a subgroup of the caulimoviruses because their genome organization differs. In place of ORF II of other caulimoviruses, both have two consecutive ORFs, designated A and B. In addition, the primer binding site (PBS) for reverse transcription is located within ORF A, in contrast to all other plant pararetroviruses, where the PBS is upstream of ORF I (see Fig. 1) (Mushegian et al., 1995 ; Hohn & Fütterer, 1997 ). In PCSV, a region with promoter activity has recently been identified (Maiti & Shepherd, 1998 ), suggesting that the pgRNA 5'-end is around position 6080. The resulting pgRNA leader is 345 nt long and contains four AUGs (Fig. 1). With MFold, this leader could be folded into a rod-like structure (Fig. 2). The first three AUGs are in-frame within one sORF, whose stop codon lies 6 nt upstream of the most stable helix near the base of the structure (Fig. 2). It is plausible that this sORF is translated (regardless of which AUG is recognized) and that shunting, promoted by the structure, follows this translational event. The shunting ribosomes would then resume scanning just upstream of the ORF VII AUG, located 36 nt downstream of the stable helix (Fig. 2). We therefore conclude that the structural organization of the PCSV pgRNA leader is also conserved and is consistent with the shunt model.

SoyCMV is the only caulimovirus that so far does not fit our scheme. Its putative TATA-box at position 6147 within the LIGR (Hasegawa et al., 1989 ) is identical to that of the PCSV promoter (TATAAATAAG). The resulting pgRNA leader would be 317 nt long and would contain only one sORF, which overlaps ORF VII (Fig. 1). With MFold, no extensive secondary structure could be predicted in this leader (not shown). However, when the whole 497 nt LIGR was folded, a complex, branched structure appeared 14 nt upstream of the ORF VII start codon (Fig. 2). Although we could not recognize any typical TATA-box further upstream, a one-codon sORF, AUGUAA, was present 8 nt in front of the structure (Fig. 2). It should be noted, however, that such a one-codon sORF cannot undergo a normal translation process. When introduced into the CaMV leader in place of sORF A, it strongly reduces both the infectivity of the virus (Pooggin et al., 1998 ) and the efficiency of ribosome shunt (M. M. Pooggin, T. Hohn & J. Fütterer, unpublished results). Nevertheless, in the presence of the downstream hairpin, translation of a longer sORF may start at an in-frame, non-AUG initiator codon, which in SoyCMV is the AUA codon immediately upstream of the AUG.

Badnaviruses
Badnaviruses differ from caulimoviruses in several aspects of genome organization and gene expression strategy (Rothnie et al., 1994 ; Hohn & Fütterer, 1997 ). In particular, caulimoviruses possess a translational transactivator protein (TAV), the product of ORF VI, which allows translation of the polycistronic pgRNA via a reinitiation mechanism (Bonneville et al., 1989 ; Fütterer & Hohn, 1991 ; Scholthof et al., 1992 ; reviewed by Fütterer & Hohn, 1996 ). In contrast, badnaviruses lack TAV and seem to use a leaky scanning mechanism to express ORFs II and III from pgRNA, as recently shown for RTBV (Fütterer et al., 1997 ). Despite this difference in strategy, both RTBV and CaMV use the ribosome shunt to deal with a complex leader and to initiate translation of the first long ORF (ORF I and ORF VII, respectively) (Fütterer et al., 1993 , 1996 ).

The comparison with caulimoviruses revealed that pgRNA of badnaviruses begins with a longer leader containing even more sORFs (Fig. 1). The RTBV leader is 697 nt long with 13 upstream AUGs. The MFold-predicted structure of this leader showed the same conserved features as described above for caulimoviruses: the most 5'-proximal sORF of seven codons, sORF 1, terminates 7 nt upstream of the base of a large stem–loop structure, with the proposed shunt landing sequence immediately downstream of the structure (Fig. 3). The shunt landing sequence has been precisely defined (Fütterer et al., 1996 ) as a 12 nt sequence between two AUU triplets at positions 686 and 698 of the leader (underlined in Fig. 3). The second AUU, being in a favourable sequence context for translation initiation, serves as an authentic start codon of ORF I, which is recognized by a fraction of the shunting ribosomes (Fütterer et al., 1996 ). By analogy with CaMV and based on the structural conservation, we propose here that the shunting ribosomes are delivered to the AUU start codon from a take-off point located near the sORF 1 stop codon, following termination of a translational event at this 5'-proximal sORF.

In CoYMV, as well as in RTBV, the transcriptional start site has been mapped and the promoter characterized (Medberry et al., 1990 , 1992 ). The structural prediction revealed that the CoYMV leader (Fig. 1) could be folded into an elongated rod-like structure (Fig. 3). Closer inspection indicated that sORF 2 (of 15 codons including two internal, in-frame AUGs) terminates 7 nt upstream of a very stable helix located near the base of the rod. On the complementary side, 52 nt away from this helix, the initiator AUG of ORF I is situated (Fig. 3; note that the position of the ORF I start codon was corrected by Cheng et al., 1996 ). It is tempting to propose that ribosome shunt occurs upon translation of sORF 2, which delivers ribosomes towards the ORF I start codon.

ScBV is the closest, but least well-characterized, relative of CoYMV (Bouhida et al., 1993 ) (Fig. 1). The two viruses have nearly identical genome organizations (Hohn & Fütterer, 1997 ; Fütterer et al., 1997 ). Folding of the ScBV LIGR resulted in an extended, hairpin-like structure formed just in front of ORF I. A typical TATA-box is located at position 7373, upstream of the region forming this structure, suggesting a putative transcription start at around 7403. Consistent with this, the region 5999 to 7420 has recently been shown (Tzafrir et al., 1998 ) to possess promoter activity. Counting from the putative transcription start, we identified a second sORF of nine codons, including one internal, in-frame AUG, that terminates 7 nt upstream of a stable helix. The adjacent 3'-sequence contains the ORF I AUG start codon 41 nt downstream of the helix (Fig. 3). Thus, the same conserved features are evident, which allows us to propose a shunt mechanism that delivers ribosomes to the 3'-end of the leader following translation of sORF 2.

It should be mentioned that both the CoYMV and ScBV leaders contain another sORF, sORF 1, upstream of the conserved sORF 2 (Figs 1 and 3). However, in both cases the sORF 1 start codon is in a weak context, so ribosomes could reach sORF 2 by leaky scanning.

In CSSV, a branched stem–loop structure can be predicted. Its base is formed by pairing of part of the LIGR with the beginning of the ORF I coding region (Figs 1 and 3). A typical TATA-box in the LIGR has been recognized at position 6962 (Hagen et al., 1993 ). This suggests a putative 5'-end of the pgRNA leader at around position 6993, about 200 nt upstream of this structure. The most stable helix near the base of the structure is preceded by three sORFs, the first two having weak AUGs and the third (sORF 3) a strong start codon and one internal, in-frame AUG. sORF 3 terminates 7 nt upstream of the stable helix, suggesting that this sORF might be part of the shunt donor (Fig. 3). Downstream of the structured region, up to the ORF III AUG, there is no other AUG except the ORF II start codon, which is in a weak context; this situation is reminiscent of RTBV and therefore consistent with the leaky scanning strategy of polycistronic translation (discussed by Fütterer et al., 1997 ). By analogy with RTBV, it can further be proposed that CSSV ORF I starts with a non-AUG initiator codon, since its upstream, in-frame AUG is deeply buried within the structured region (Fig. 1), which would be bypassed by a shunt. A good candidate for such a codon is a GUG triplet in a moderate context, located 11 nt away from the structural element (Fig. 3), at a location similar to that of the AUU start codon of RTBV. In our scenario, after translation of sORF 3, the shunting ribosomes could be delivered directly to this codon; a small fraction of them could recognize it as an initiator, whereas the others could scan further downstream to reach ORF II and eventually ORF III.

Two new badnaviruses, BSV (Harper & Hull, 1998 ) and DaBV (R. Briddon, S. Phillips, A. Brunt & R. Hull, unpublished results), have just been sequenced. BSV has a LIGR of 956 nt between ORFs III and I. The folding of this region revealed a large hairpin structure 37 nt in front of the ORF I start codon. Strikingly, an sORF of six codons including one internal, in-frame AUG terminates 5 nt upstream of this hairpin (Fig. 3). Inspection of the further upstream sequence revealed a typical TATA-box (position 7231 of the genome). Given a putative transcription start 31 nt downstream of this TATA-box (at 7262), the resulting pgRNA leader is 644 nt long and contains 13 AUGs in total. The described sORF is the most 5'-proximal in the BSV leader (Fig. 1) and might be implicated in a shunt process.

Folding of the DaBV LIGR plus 300 nt of ORF I revealed an extended hairpin structure (Fig. 3) that involves 74 nt of ORF I in the formation of its base, thus resembling the situation in CSSV. An sORF of four codons terminates 9 nt in front of this hairpin. Further upstream, we found a typical TATA-box (position 7267), suggesting a pgRNA start at around 7298. The resulting leader is therefore 582 nt long and contains six sORFs. The sORF terminating in front of the hairpin is then the most 5'-proximal in the leader and, by analogy with the cases above, might be involved in ribosome shunt. As in CSSV, we envisage that, owing to the structure, shunting ribosomes arrive at a landing sequence downstream of the ORF I AUG and then resume scanning in search of an initiator codon. Indeed, the region of the DaBV genome between the AUGs of ORF I and ORF III contains only the AUG of ORF II, in a weak context, which is reminiscent of RTBV and CSSV and consistent with the leaky scanning strategy for translation of badnavirus pgRNAs. In the DaBV ORF I sequence, we found several potential in-frame non-AUG start codons downstream of the structure that might be recognized by shunting ribosomes (one of them, in a strong context, is shown in Fig. 3). It can be suggested that, in DaBV and CSSV, the first and only AUG of ORF I might still be reached via a scanning-reinitiation mechanism, which would result in expression of a full-length product. This protein and its truncated version(s), produced via the shunt mechanism, may play different roles in the virus life-cycle.

We conclude that the structural organization of pgRNA leaders of the newly sequenced viruses, BSV and DaBV, is similar to other badnaviruses and also caulimoviruses and therefore implies a ribosome shunt mechanism in their translation strategy.

It is worth mentioning that most badnaviruses (except RTBV) possess another striking feature. In contrast to caulimoviruses, PVCV and CVMV, they have the PBS for reverse transcription in the 5'-part of the leader, in front of the region forming the large stem–loop structure (see Fig. 1; the PBS is shown as an arrowhead adjacent to a vertical line). Furthermore, a putative poly(A) signal is located very close to the transcription start site (Fig. 1; indicated by a triangle), which would result in a very short terminal redundancy on the pgRNA. The significance of such an organization for the regulation of reverse transcription remains to be investigated.

CVMV and PVCV, distinct plant pararetroviruses
Based on transcription start site mapping (Verdaguer et al., 1996 ), the pgRNA leader of CVMV can be deduced to be 584 nt long, containing 16 upstream AUGs (Fig. 1). Folding revealed that most of these codons are located in a large hairpin-like structure (Fig. 4). Here, a structural organization similar to that of the badnaviruses CoYMV and ScBV could be recognized. The second 5'-proximal sORF, consisting of 11 codons including two internal, in-frame AUGs, terminates 7 nt upstream of a very stable helix near the base of the hairpin (Fig. 4), and the ORF I AUG start codon is located 137 nt downstream of this helix. Again, such an organization suggests a shunt mechanism delivering ribosomes that have translated sORF 2 to a landing sequence near the 3'-end of the leader.

In PVCV, the position of the pgRNA 5'-end can be deduced from the location of a typical TATA-box in the LIGR at position 6877 (Richert-Pöggeler & Shepherd, 1997 ) and from the conserved structural organization reported here. Indeed, folding of the LIGR revealed an extended structure formed in the region between the putative promoter and ORF I (Fig. 4). This structure brings into close spatial proximity the most 5'-proximal sORF of four codons, which terminates 10 nt in front of the base of the structure, and the ORF I start codon, located 9 nt downstream of the base.

In conclusion, the more distantly related members of plant pararetroviruses also possess a structural organization of their pgRNA leader similar to that found in caulimo- and badnaviruses and consistent with the ribosome shunt model.

The shunt consensus
Overall comparison of the primary and secondary structures presented in Figs 1–4 strongly suggests that the consensus cis-elements essential for the ribosome shunt mechanism have been maintained during the evolution of the whole family of plant pararetroviruses.

The pgRNA leaders of all these viruses contain several sORFs whose coding potential is not conserved and which are distributed rather randomly along the leader sequence (Fig. 1). We have recently shown that removing most of the sORFs from the CaMV leader by point mutations did not abolish infectivity (Pooggin et al., 1998 ). However, some of the mutations resulted in a delayed phenotype and the appearance of revertants. In the revertants, the local secondary structure elements affected by some of those mutations were restored, and of all the sORFs only the 5'-proximal sORF A was frequently restored. Moreover, analysis of various sORF A mutations and the ensuing reversions suggested that the position of the sORF A stop codon near stem-section 1 is particularly critical, while shifts of its start codon and modification of its coding content are tolerated to a certain extent (Pooggin et al., 1998 ; M. M. Pooggin, T. Hohn & J. Fütterer, unpublished observations). In the present study, phylogenetic comparison revealed that a 5'-proximal sORF terminating 5 to 10 nt in front of a stable structural element is indeed a consensus feature, whereas other sORFs are not conserved. The length and coding content of this sORF are not maintained across the virus family, although some preference for Cys, Ser, Glu and Gly at the C terminus was noticed, with Glu or Gly as the last codon and Cys or Ser at the penultimate position (see Figs 2–4). Experiments are now in progress to investigate whether proper termination of sORF translation in front of the stable structure is a prerequisite for efficient ribosome shunting.
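The consensus described here can be phrased as two simple checks on a candidate 5'-proximal sORF. The sketch below is illustrative only: the 5 to 10 nt window comes from this comparison, while the C-terminal test encodes the observed preference (Cys/Ser penultimate, Glu/Gly last) as a yes/no flag even though the text describes it only as a tendency. All names and the toy sORF are hypothetical; translation uses the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... GGG
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def c_terminal_residues(rna: str, start: int, stop_end: int, n: int = 2) -> str:
    """Translate the last `n` sense codons of the sORF rna[start:stop_end]."""
    sense = rna[start:stop_end - 3]   # drop the stop codon
    codons = [sense[i:i + 3] for i in range(0, len(sense), 3)]
    return "".join(CODON_TABLE[c] for c in codons[-n:])


def stop_gap_in_consensus(gap: int, min_gap: int = 5, max_gap: int = 10) -> bool:
    """True if the stop codon ends 5 to 10 nt before the stable element."""
    return min_gap <= gap <= max_gap


def c_terminus_in_consensus(residues: str) -> bool:
    """Cys/Ser at the penultimate position and Glu/Gly as the last residue."""
    return len(residues) == 2 and residues[0] in "CS" and residues[1] in "EG"


sorf = "AUGCCCUGUGAAUAA"                  # toy sORF: Met-Pro-Cys-Glu-stop
tail = c_terminal_residues(sorf, 0, len(sorf))
print(tail, c_terminus_in_consensus(tail), stop_gap_in_consensus(6))
```

Because the coding content is not conserved across the family, a sORF that fails the C-terminal check should not be discarded on that basis; the stop-codon spacing is the more consistent feature.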

In most cases, AUG start codons of the consensus 5'-proximal sORFs do not possess an optimal initiation context, i.e. a critical G at position +4 is missing. However, the stable hairpin at an optimal distance downstream of some of those AUGs is expected to strongly increase the rate of initiation. Such hairpins – or even weaker ones – have been shown to retard scanning and to increase the recognition efficiency of the start codons located at the decoding site of a paused 40S subunit around 15 nt upstream of the hairpin (Kozak, 1990 ). Interestingly, when the consensus sORF is longer (as in PCSV, BSV, ScBV, CoYMV, CSSV and CVMV), it always contains an internal, in-frame AUG positioned at the optimal distance from the hairpin.
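The context rule invoked here can be made explicit. The following is a minimal sketch, assuming the common simplification in which the A of the AUG is position +1 and a purine at -3 and a G at +4 are the two key positions; calling one match "moderate" and none "weak" is a labelling convention chosen for this illustration, not a definition taken from this study. The function name and test sequences are hypothetical.

```python
def kozak_context(rna: str, aug_pos: int) -> str:
    """Classify the AUG whose A is at 0-based index `aug_pos` (+1 position).

    Checks for a purine at -3 and a G at +4; positions that fall outside
    the sequence count as non-matching.
    """
    if rna[aug_pos:aug_pos + 3] != "AUG":
        raise ValueError(f"no AUG at position {aug_pos}")
    minus3 = rna[aug_pos - 3] if aug_pos >= 3 else None
    plus4 = rna[aug_pos + 3] if aug_pos + 3 < len(rna) else None
    matches = (minus3 in ("A", "G")) + (plus4 == "G")
    return ("weak", "moderate", "strong")[matches]


for seq in ("GCCACCAUGG", "GCCGCCAUGA", "CCCUUUAUGC"):
    print(seq, kozak_context(seq, 6))
```

A classifier of this kind only scores the primary sequence; the enhancing effect of a downstream hairpin described above would have to be considered separately.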

Another consensus feature can be found in the sequence following the sORF stop codon, which contains a CU-stretch of 5 to 9 nt (with the exception of FMV, CERV and CSSV where this stretch is shorter) (see Figs 2–4). Interestingly, in the yeast GCN4 RNA leader, the sORF 4 that precludes translation reinitiation (reviewed by Hinnebusch, 1997 ) is also followed by a CU-rich sequence. In contrast, the sequence downstream of the GCN4 sORF 1 that promotes efficient reinitiation is, and must be, essentially AU-rich (Grant & Hinnebusch, 1994 ). Strikingly, our comparison revealed that a region of about 20 nt downstream of the stable structural element is also AU-rich (see Figs 2–4). By analogy with the GCN4 case, we assume that the CU-rich sequence following the 5'-proximal sORF stop codon might promote loosening of the contact between the RNA and the shunting ribosome, whereas the AU-rich sequence located in a close spatial vicinity might then receive the shunting ribosomes, which resume scanning and reinitiate translation.
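Both sequence features can be quantified with a few lines of code. In this minimal sketch, which is an illustration rather than the procedure used for Figs 2–4, the spacer between the sORF stop codon and the stem base is scanned for its longest C/U run, and the 20 nt following the stem base are scored for A/U content. The index conventions, names and toy sequence are assumptions.

```python
def longest_cu_run(seq: str) -> int:
    """Length of the longest uninterrupted run of C and U."""
    best = run = 0
    for nt in seq:
        run = run + 1 if nt in "CU" else 0
        best = max(best, run)
    return best


def au_fraction(seq: str) -> float:
    """Fraction of A and U in seq (0.0 for an empty sequence)."""
    return sum(nt in "AU" for nt in seq) / len(seq) if seq else 0.0


def flank_profile(rna: str, stop_end: int, base: tuple, window: int = 20) -> tuple:
    """CU run in the stop-to-stem spacer and AU content just 3' of the stem.

    `base` holds the 0-based (5', 3') positions of the stem's outermost pair.
    """
    five_prime, three_prime = base
    spacer = rna[stop_end:five_prime]
    downstream = rna[three_prime + 1:three_prime + 1 + window]
    return longest_cu_run(spacer), au_fraction(downstream)


# Hypothetical toy: sORF, CU-rich spacer, GC stem-loop, AU-rich landing region
toy = "AUGGCAUAA" + "ACUCUUG" + "GGGGAAACCCC" + "AUUAAUAUUAGC"
print(flank_profile(toy, stop_end=9, base=(16, 26)))
```

Scoring the two regions separately keeps the comparison aligned with the proposed roles: release of the ribosome after the sORF stop codon, and capture of the shunting ribosome downstream of the structure.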

We noticed that in most cases the consensus AU-rich sequence possesses a potential non-AUG start codon in a good or moderate context, situated some 9 to 14 nt downstream of the structure base (italics in Figs 2–4). In RTBV, such a codon is in fact recognized by a fraction of shunting ribosomes as the start codon of ORF I (see above). Thus, we can propose that a non-AUG start codon located within an AU-rich context is an important cis-element, which may constitute the shunt landing sequence. It should be noted that in PVCV and SoyCMV this position is occupied by the AUG start codon of the first long ORF (Figs 2 and 4).
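Candidate non-AUG initiators of the kind highlighted here (for example AUU in RTBV, GUG in CSSV, AUA in SoyCMV) differ from AUG at a single position. The following minimal sketch assumes that "near-cognate" means exactly one mismatch to AUG and that offsets are counted from the 3'-most paired nucleotide of the stem base; it lists such codons in the 9 to 14 nt window and ignores initiation context, which the text also takes into account. Names and the toy sequence are hypothetical.

```python
from itertools import product


def near_cognate_codons() -> set:
    """All codons that differ from AUG at exactly one position."""
    return {
        "".join(codon)
        for codon in product("ACGU", repeat=3)
        if sum(a != b for a, b in zip(codon, "AUG")) == 1
    }


def non_aug_candidates(rna: str, stem_end: int, first: int = 9, last: int = 14) -> list:
    """(position, codon) for near-cognate codons starting `first`..`last` nt
    downstream of `stem_end`, the 0-based index of the stem's 3'-most paired nt."""
    near = near_cognate_codons()
    hits = []
    for offset in range(first, last + 1):
        pos = stem_end + offset
        codon = rna[pos:pos + 3]
        if codon in near:
            hits.append((pos, codon))
    return hits


print(sorted(near_cognate_codons()))
toy = "C" * 13 + "AUU" + "C" * 10   # hypothetical: stem ends at index 4
print(non_aug_candidates(toy, stem_end=4))
```

Combining this window scan with a context score would narrow the list further, as done by eye for the codons in italics in Figs 2–4.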

Biological significance of the leader structure and ribosome shunt
In this study, we show that the pgRNA leader of all the plant pararetroviruses has the potential to form a large stem–loop structure. A biological significance for this structure can be inferred from the dual role of pgRNA in the cytoplasm of infected cells. Besides its translational function, this RNA is believed to be packaged into a previrion particle where replication should occur (Mesnard & Carriere, 1995 ). The sorting of pgRNA for translation or packaging must be regulated by cis-elements in the pgRNA. In the caulimoviruses CaMV, FMV and CERV and in the badnavirus RTBV, a conserved purine-rich sequence located in the upper part of the predicted leader structures (the so-called 'bowl' region) has been identified and implicated in packaging in particular, as part of an encapsidation signal (Fütterer et al., 1988 ; Hay et al., 1991 ). The ribosome shunt process would leave most of the leader structure intact, i.e. not melted by scanning ribosomes, thus exposing the putative encapsidation signal for interaction with the viral coat protein. Indeed, the CaMV coat protein can physically interact with the CaMV bowl region, as has been shown with the yeast three-hybrid system (O. Guerra, M. Hemmings-Mieszczak & T. Hohn, unpublished results). This resembles the situation in human immunodeficiency virus, where the Gag protein binds the viral RNA encapsidation signal residing in the RNA leader (Bacharach & Goff, 1998 ). Accordingly, accumulation of the coat protein at later stages of infection would lead to sequestering of pgRNA into the previrion. Conversely, a low efficiency of ribosome shunt would increase the flow of scanning ribosomes to the centre of the leader, which in turn would disrupt the structure and prevent packaging.

The primary sequence of the bowl is not strictly conserved among the plant pararetroviruses: besides CaMV, FMV, CERV and RTBV, it could be identified only in ScBV and DaBV, at positions 458 to 494 and 399 to 433 of their leaders, respectively. In the other viruses, however, similar purine-rich patterns could be found in the region forming the upper part of the structure. We therefore assume that the primary and/or secondary structure elements involved in packaging have diverged during the evolution of different groups and subgroups of these viruses.
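How "similar purine-rich patterns" might be located can be illustrated with a sliding-window screen. This is a hypothetical sketch, not the method used here: the window size (12 nt) and purine threshold (75%) are arbitrary assumptions, and a purine-rich stretch is at best a crude proxy for a bowl-like element. Positions are reported 1-based and inclusive, matching the style of the leader coordinates given in the text.

```python
# Hypothetical screen for purine-rich stretches (a crude proxy for a
# "bowl"-like motif); window size and threshold are arbitrary assumptions.

def purine_rich_spans(seq: str, window: int = 12, min_frac: float = 0.75):
    """Return 1-based inclusive (start, end) spans covered by windows whose
    purine (A/G) fraction is >= min_frac, trimmed to the outermost purines."""
    seq = seq.upper().replace("T", "U")
    is_purine = [1 if nt in "AG" else 0 for nt in seq]
    spans = []
    count = sum(is_purine[:window])  # purines in the first window
    for i in range(len(seq) - window + 1):
        if i > 0:  # slide the window one nucleotide to the right
            count += is_purine[i + window - 1] - is_purine[i - 1]
        if count >= min_frac * window:
            start, end = i + 1, i + window
            if spans and start <= spans[-1][1] + 1:
                spans[-1] = (spans[-1][0], end)  # merge overlapping windows
            else:
                spans.append((start, end))
    trimmed = []
    for start, end in spans:
        purine_pos = [k + 1 for k in range(start - 1, end) if is_purine[k]]
        if purine_pos:
            trimmed.append((purine_pos[0], purine_pos[-1]))
    return trimmed


print(purine_rich_spans("C" * 10 + "GAGAGGAAGAGG" + "C" * 10))  # [(11, 22)]
```

A screen like this only flags candidate stretches; whether a candidate lies in the upper part of the folded leader would still have to be checked against the predicted structure.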

Many animal viruses, including some complex retroviruses and picornaviruses, possess RNA genomes that also begin with long leaders containing multiple AUGs, which regulate both gene expression and replication. Phylogenetic comparison of 13 avian leukosis/sarcoma retroviruses revealed three conserved sORFs and similar structural conformations in their RNA leaders (Hackett et al., 1991). It has further been suggested that translation-reinitiation events at these sORFs induce structural rearrangements required for binding of the Gag protein to an encapsidation signal located in the leader (Donzé & Spahr, 1992; Donzé et al., 1995; Sonstegard & Hackett, 1996). Our own inspection of those structures did not reveal an organization similar to that of the plant pararetroviruses described here. The RNA leaders of picornaviruses (e.g. poliovirus and encephalomyocarditis virus), hepatitis C virus and pestiviruses (e.g. classical swine fever virus) contain complex, structured internal ribosome entry elements with several sORFs (Le et al., 1996; reviewed by Stewart & Semler, 1998). When inspected with MFold, these leaders did not reveal a conserved 5'-proximal sORF adjacent to a stable hairpin (not shown). We therefore conclude that the structural organization reported here is a distinct and biologically significant feature of plant pararetroviruses.
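The criterion applied in these comparisons (a 5'-proximal sORF terminating 5 to 10 nt upstream of a stable hairpin) can be expressed as a small check. This is a hypothetical sketch, not the authors' pipeline. It assumes the hairpin position has already been obtained from a folding program such as MFold and is supplied as `hairpin_5p`, the 0-based index of the first nucleotide of the hairpin's 5' arm. The gap is counted as the nucleotides strictly between the last stop-codon nucleotide and the hairpin, and the 50-codon upper limit for an "sORF" is an arbitrary assumption.

```python
# Hypothetical check for an AUG-initiated sORF whose stop codon ends 5-10 nt
# upstream of a hairpin. `hairpin_5p` (0-based index of the first nucleotide
# of the hairpin's 5' arm) is assumed to come from a folding program such as
# MFold; it is not predicted here.

STOP_CODONS = {"UAA", "UAG", "UGA"}


def find_sorfs(seq: str, max_codons: int = 50):
    """Return (start, stop_end) for AUG-initiated ORFs that reach an in-frame
    stop within max_codons sense codons; stop_end is the stop's last nt."""
    seq = seq.upper().replace("T", "U")
    orfs = []
    for i in range(len(seq) - 2):
        if seq[i:i + 3] != "AUG":
            continue
        for j in range(i + 3, len(seq) - 2, 3):  # walk in frame with the AUG
            if seq[j:j + 3] in STOP_CODONS:
                if (j - i) // 3 <= max_codons:  # sense codons, AUG included
                    orfs.append((i, j + 2))
                break
    return orfs


def sorfs_before_hairpin(seq: str, hairpin_5p: int, min_gap: int = 5, max_gap: int = 10):
    """Return (start, stop_end, gap) for sORFs leaving min_gap..max_gap nt
    between the stop codon and the hairpin."""
    hits = []
    for start, stop_end in find_sorfs(seq):
        gap = hairpin_5p - stop_end - 1  # nucleotides strictly in between
        if min_gap <= gap <= max_gap:
            hits.append((start, stop_end, gap))
    return hits


leader = "CC" + "ATG" + "AAA" + "TAA" + "C" * 7 + "GGGG"
print(sorfs_before_hairpin(leader, hairpin_5p=18))  # [(2, 10, 7)]
```

The function returns every qualifying sORF; picking the 5'-proximal one is left to the caller, since the first hit in the list is the one with the most upstream AUG.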


   Acknowledgments
 
We thank Rob Briddon (John Innes Centre, Norwich, UK) for providing a complete sequence of DaBV prior to publication, and Vitaly Boyko, Helen Rothnie and Orlene Guerra for critical reading of the manuscript and helpful discussions.

M.M.P. was partially supported by INTAS fellowship YCF98-111.


   References
 
Bacharach, E. & Goff, S. P. (1998). Binding of the human immunodeficiency virus type 1 Gag protein to the viral RNA encapsidation signal in the yeast three-hybrid system. Journal of Virology 72, 6944-6949.

Bao, Y. & Hull, R. (1993). Mapping the 5'-terminus of rice tungro bacilliform viral genomic RNA. Virology 197, 445-448.

Bhattacharyya-Pakrasi, M., Peng, J., Elmer, J. S., Laco, G., Shen, P., Kaniewska, M. B., Kononowicz, H., Wen, F., Hodges, T. K. & Beachy, R. N. (1993). Specificity of a promoter from the rice tungro bacilliform virus for expression in phloem tissues. Plant Journal 4, 71-79.

Bonneville, J.-M., Sanfaçon, H., Fütterer, J. & Hohn, T. (1989). Posttranscriptional trans-activation in cauliflower mosaic virus. Cell 59, 1135-1143.

Bouhida, M., Lockhart, B. E. L. & Olszewski, N. E. (1993). An analysis of the complete sequence of a sugarcane bacilliform virus genome infectious to banana and rice. Journal of General Virology 74, 15-22.

Calvert, L. A., Ospina, M. D. & Shepherd, R. J. (1995). Characterization of cassava vein mosaic virus: a distinct plant pararetrovirus. Journal of General Virology 76, 1271-1278.

Chen, G., Müller, M., Potrykus, I., Hohn, T. & Fütterer, J. (1994). Rice tungro bacilliform virus: transcription and translation in protoplasts. Virology 204, 91-100.

Cheng, C.-P., Lockhart, B. E. L. & Olszewski, N. E. (1996). The ORF I and II proteins of Commelina yellow mottle virus are virion-associated. Virology 223, 263-271.

de Kochko, A., Verdaguer, B., Taylor, N., Carcamo, R., Beachy, R. N. & Fauquet, C. (1998). Cassava vein mosaic virus (CsVMV), type species for a new genus of plant double stranded DNA viruses? Archives of Virology 143, 945-962.

Dominguez, D. I., Ryabova, L. A., Pooggin, M. M., Schmidt-Puchta, W., Fütterer, J. & Hohn, T. (1998). Ribosome shunt in cauliflower mosaic virus: identification of an essential and sufficient structural element. Journal of Biological Chemistry 273, 3669-3678.

Donzé, O. & Spahr, P.-F. (1992). Role of the open reading frames of Rous sarcoma virus leader RNA in translation and genome packaging. EMBO Journal 11, 3747-3757.

Donzé, O., Damay, P. & Spahr, P.-F. (1995). The first and third uORFs in RSV leader RNA are efficiently translated: implications for translation regulation and packaging. Nucleic Acids Research 23, 861-868.

Franck, A., Guilley, H., Jonard, G., Richards, K. & Hirth, L. (1980). Nucleotide sequence of cauliflower mosaic virus DNA. Cell 21, 285-294.

Fütterer, J., Gordon, K., Bonneville, J.-M., Sanfaçon, H., Pisan, B., Penswick, J. & Hohn, T. (1988). The leading sequence of caulimovirus large RNA can be folded into a large stem–loop structure. Nucleic Acids Research 16, 8377-8390.

Fütterer, J. & Hohn, T. (1991). Translation of a polycistronic mRNA in the presence of the cauliflower mosaic virus transactivator protein. EMBO Journal 10, 3887-3896.

Fütterer, J. & Hohn, T. (1996). Translation in plants – rules and exceptions. Plant Molecular Biology 32, 159-189.

Fütterer, J., Gordon, K., Sanfaçon, H., Bonneville, J.-M. & Hohn, T. (1990). Positive and negative control of translation by the leader sequence of cauliflower mosaic virus pregenomic 35S RNA. EMBO Journal 9, 1697-1707.

Fütterer, J., Kiss-László, Z. & Hohn, T. (1993). Non-linear ribosome migration on cauliflower mosaic virus 35S RNA. Cell 73, 789-802.

Fütterer, J., Potrykus, I., Bao, Y., Li, L., Burns, T. M., Hull, R. & Hohn, T. (1996). Position-dependent ATT initiation during plant pararetrovirus rice tungro bacilliform virus translation. Journal of Virology 70, 2999-3010.

Fütterer, J., Rothnie, H. M., Hohn, T. & Potrykus, I. (1997). Rice tungro bacilliform virus open reading frames II and III are translated from polycistronic pregenomic RNA by leaky scanning. Journal of Virology 71, 7984-7989.

Grant, C. M. & Hinnebusch, A. G. (1994). Effect of sequence context at stop codons on efficiency of reinitiation in GCN4 translational control. Molecular and Cellular Biology 14, 606-618.

Guilley, H., Dudley, R. K., Jonard, G., Balazs, E. & Richards, K. E. (1982). Transcription of cauliflower mosaic virus DNA: detection of promoter sequences, and characterization of transcripts. Cell 30, 763-773.

Hackett, P. B., Dalton, M. W., Johnson, D. P. & Petersen, R. B. (1991). Phylogenetic and physical analysis of the 5'-leader RNA sequences of avian retroviruses. Nucleic Acids Research 19, 6929-6934.

Hagen, L. S., Jacquemond, M., Lepingle, A., Lot, H. & Tepfer, M. (1993). Nucleotide sequence and genomic organization of cacao swollen shoot virus. Virology 196, 619-628.

Harper, G. & Hull, R. (1998). Banana streak virus, complete genome. EMBL accession no. AF002234.

Hasegawa, A., Verver, J., Shimada, A., Saito, M., Goldbach, R., Van Kammen, A., Miki, K., Kameya-Iwaki, M. & Hibi, T. (1989). The complete sequence of soybean chlorotic mottle virus DNA and the identification of a novel promoter. Nucleic Acids Research 17, 9993-10013.

Hay, J. M., Jones, M. C., Blakebrough, M. L., Dasgupta, I., Davies, J. W. & Hull, R. (1991). An analysis of the sequence of an infectious clone of rice tungro bacilliform virus, a plant pararetrovirus. Nucleic Acids Research 19, 2615-2621.

Hemmings-Mieszczak, M., Steger, G. & Hohn, T. (1997). Alternative structures of the cauliflower mosaic virus 35S RNA leader: implications for viral expression and replication. Journal of Molecular Biology 267, 1075-1088.

Hemmings-Mieszczak, M., Steger, G. & Hohn, T. (1998). Regulation of CaMV 35S RNA translation is mediated by a stable hairpin in the leader. RNA 4, 101-111.

Hinnebusch, A. G. (1997). Translation regulation of yeast GCN4: a window on factors that control initiator tRNA binding to the ribosome. Journal of Biological Chemistry 272, 21661-21664.

Hohn, T. & Fütterer, J. (1997). The proteins and functions of plant pararetroviruses: knowns and unknowns. Critical Reviews in Plant Sciences 16, 133-161.

Hull, R., Sadler, J. & Longstaff, M. (1986). The sequence of carnation etched ring virus DNA: comparison with cauliflower mosaic virus and retroviruses. EMBO Journal 5, 3083-3090.

Kozak, M. (1989). The scanning model for translation: an update. Journal of Cell Biology 108, 229-241.

Kozak, M. (1990). Downstream secondary structure facilitates recognition of initiator codons by eukaryotic ribosomes. Proceedings of the National Academy of Sciences, USA 87, 8301-8305.

Le, S., Siddiqui, A. & Maizel, J. V., Jr (1996). A common structural core in the internal ribosome entry sites of picornavirus, hepatitis C virus, and pestivirus. Virus Genes 12, 135-147.

Maiti, I. B. & Shepherd, R. J. (1998). Isolation and expression analysis of peanut chlorotic streak caulimovirus (PClSV) full-length transcript (FLt) promoter in transgenic plants. Biochemical and Biophysical Research Communications 244, 440-444.

Medberry, S. L., Lockhart, B. E. L. & Olszewski, N. E. (1990). Properties of Commelina yellow mottle virus's complete DNA sequence, genomic discontinuities and transcript suggest that it is a pararetrovirus. Nucleic Acids Research 18, 5505-5513.

Medberry, S. L., Lockhart, B. E. L. & Olszewski, N. E. (1992). The Commelina yellow mottle virus promoter is a strong promoter in vascular and reproductive tissues. Plant Cell 4, 185-192.

Mesnard, J. M. & Carriere, C. (1995). Comparison of packaging strategy of retroviruses and pararetroviruses. Virology 213, 1-6.

Mushegian, A. R., Wolff, J. A., Richins, R. D. & Shepherd, R. J. (1995). Molecular analysis of the essential and nonessential genetic elements in the genome of peanut chlorotic streak caulimovirus. Virology 206, 823-834.

Odell, J. T., Nagy, F. & Chua, N. H. (1985). Identification of DNA sequences required for activity of the cauliflower mosaic virus 35S promoter: combinatorial regulation of transcription in plants. Nature 313, 810-812.

Petrzik, K., Benes, V., Mráz, I., Honetslegrová-Fránová, J., Ansorge, W. & Spak, J. (1998). Strawberry vein banding virus – definitive member of the genus Caulimovirus. Virus Genes 16, 303-305.

Pooggin, M. M., Hohn, T. & Fütterer, J. (1998). Forced evolution reveals the importance of short ORF A and secondary structure in the cauliflower mosaic virus 35S RNA leader. Journal of Virology 72, 4157-4169.

Richert-Pöggeler, K. R. & Shepherd, R. J. (1997). Petunia vein-clearing virus: a plant pararetrovirus with the core sequences for an integrase function. Virology 236, 137-146.

Richins, R. D., Scholthof, H. B. & Shepherd, R. J. (1987). Sequence of figwort mosaic virus DNA (caulimovirus group). Nucleic Acids Research 15, 8451-8466.

Rothnie, H. M. (1996). Plant mRNA 3'-end formation. Plant Molecular Biology 32, 43-61.

Rothnie, H. M., Chapdelaine, Y. & Hohn, T. (1994). Pararetroviruses and retroviruses: a comparative review of viral structure and gene expression strategies. Advances in Virus Research 44, 1-67.

Sanfaçon, H. (1994). Analysis of figwort mosaic virus (plant pararetrovirus) polyadenylation signal. Virology 198, 39-49.

Sanger, M., Daubert, S. & Goodman, R. M. (1990). Characteristics of a strong promoter from figwort mosaic virus: comparison with the analogous 35S promoter from cauliflower mosaic virus and the regulated mannopine synthase promoter. Plant Molecular Biology 14, 433-443.

Scholthof, H. B., Gowda, S., Wu, F. C. & Shepherd, R. J. (1992). The full-length transcript of a caulimovirus is a polycistronic mRNA whose genes are trans-activated by the product of gene VI. Journal of Virology 66, 3131-3139.

Sonstegard, T. S. & Hackett, P. B. (1996). Autogenous regulation of RNA translation and packaging by Rous sarcoma virus Pr76gag. Journal of Virology 70, 6642-6652.

Stewart, S. R. & Semler, B. L. (1998). RNA determinants of picornavirus cap-independent translation initiation. Seminars in Virology 8, 242-255.

Tzafrir, I., Torbert, K. A., Lockhart, B. E. L., Somers, D. A. & Olszewski, N. E. (1998). The sugarcane bacilliform badnavirus promoter is active in both monocots and dicots. Plant Molecular Biology 38, 347-356.

Verdaguer, B., de Kochko, A., Beachy, R. N. & Fauquet, C. (1996). Isolation and expression in transgenic tobacco and rice plants, of the cassava vein mosaic virus (CVMV) promoter. Plant Molecular Biology 31, 1129-1139.

Received 19 January 1999; accepted 8 April 1999.