Evidence for diversifying selection in Potato virus Y and in the coat protein of other potyviruses

Benoît Moury1,2, Caroline Morel1, Elisabeth Johansen2 and Mireille Jacquemond1

Station de Pathologie Végétale, Institut National de la Recherche Agronomique, F-84143 Montfavet cedex, France1
Danish Institute of Agricultural Sciences, Biotechnology Group, Thorvaldsensvej 40 1, DK-1871 Frederiksberg C, Denmark2

Author for correspondence: Benoît Moury (at Institut National de la Recherche Agronomique). Fax +33 4 32 72 28 42. e-mail moury{at}avignon.inra.fr


   Abstract
The modes of evolution of the proteins of Potato virus Y were investigated with a maximum-likelihood method based on estimation of the ratio between non-synonymous and synonymous substitution rates. Evidence for diversifying selection was obtained for the 6K2 protein (one amino acid position) and coat protein (24 amino acid positions). Amino acid sites in the coat proteins of other potyviruses (Bean yellow mosaic virus, Yam mosaic virus) were also found to be under diversifying selection. Most of the sites belonged to the N-terminal domain, which is exposed to the exterior of the virion particle. Several of these amino acid positions in the coat proteins were shared between some of these three potyviruses. Identification of diversifying selection events in these different proteins will help to unravel their biological functions and is essential to an understanding of the evolutionary constraints exerted on the potyvirus genome. The hypothesis of a link between evolutionary constraints due to host plants and occurrence of diversifying selection is discussed.


   Introduction
Potato virus Y (PVY) is the type member of the genus Potyvirus in the family Potyviridae (Shukla et al., 1994 ). The monopartite genome is composed of a single-stranded, positive-sense RNA molecule of about 9·7 kb. During the infection process, this RNA is translated into a large precursor polyprotein that is cleaved co- and post-translationally into 10 mature proteins (Dougherty & Carrington, 1988 ; Riechmann et al., 1992 ). PVY is the cause of major diseases in solanaceous crops including potato, tobacco, pepper and tomato, and also infects many solanaceous and non-solanaceous weeds. The symptoms induced by PVY and its host range are highly variable. These traits are used for the classification of PVY strains into different groups. For example, strains from potato are classified into three main subgroups (O, N and C) according to symptomatology and serology (with monoclonal antibodies).

PVY subgroups obtained by phylogenetic analyses of sequence data correlate only partially with the biological traits of the strains. With the exception of the NIa proteinase of PVY, which is required for elicitation of Ry-mediated resistance in potato (Mestre et al., 2000 ), no molecular determinants of symptomatology or host range have yet been identified. Knowledge of the modes of evolution in the different genome regions and phylogenetic lineages would help to identify molecular determinants of symptomatology and host range, as well as the external (host plants, insect vectors and physical environment) and internal (biological functions of PVY proteins) evolutionary constraints exerted on PVY.

The ratio (ω) of non-synonymous (amino acid-altering) to synonymous (silent) substitution rates provides an estimate of the selective pressure at the protein level (Kimura, 1983 ). A value of ω>1 means that non-synonymous mutations confer a fitness advantage and have higher fixation probabilities than synonymous mutations (diversifying or positive selection). Conversely, ω values close to 0 mean that the protein is essentially conserved at the amino acid level (purifying or negative selection), and ω=1 corresponds to neutral evolution. Averaging ω over all codons in the gene and over the entire evolutionary time that separates the sequences provides only limited information and, in many cases, prevents the detection of diversifying selection. Indeed, many proteins appear to be under purifying selection most of the time (Li, 1997 ), and a large proportion of their amino acids are nearly invariable (ω close to 0) because of structural constraints. When the functional domains of the protein are unknown, or when only a few codons or a few lineages undergo diversifying selection, a better approach is to use statistical models that allow ω to vary among codons or among lineages (Yang, 1998 ; Nielsen & Yang, 1998 ). Such models can be fitted by maximum likelihood (Yang, 1997 ; Yang et al., 2000 ).
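To make the ratio concrete, the sketch below computes a single ω for one pair of sequences with a simple counting approach: the proportions of non-synonymous and synonymous differences per site are corrected for multiple hits with the Jukes-Cantor formula, and ω = dN/dS. This is an illustration only, not the codon-based maximum-likelihood method used in this study; the function names and the counts in the example are hypothetical, and the numbers of sites and differences are assumed to have been counted beforehand.

```python
import math


def jukes_cantor(p: float) -> float:
    """Correct a proportion of differing sites p for multiple substitutions (Jukes-Cantor)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("Jukes-Cantor correction is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def omega_from_counts(nonsyn_diffs: float, nonsyn_sites: float,
                      syn_diffs: float, syn_sites: float) -> float:
    """Hypothetical helper: omega = dN / dS from counted differences and sites."""
    dn = jukes_cantor(nonsyn_diffs / nonsyn_sites)  # non-synonymous substitutions per site
    ds = jukes_cantor(syn_diffs / syn_sites)        # synonymous substitutions per site
    if ds == 0.0:
        raise ValueError("no synonymous change: omega is undefined")
    return dn / ds


# Hypothetical pair: 10 differences over 600 non-synonymous sites, 20 over 200 synonymous sites.
print(round(omega_from_counts(10, 600, 20, 200), 3))  # prints 0.157 (purifying selection)
```

A single pairwise value like this averages over all codons, which is exactly the limitation that motivates the site- and lineage-specific models.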

Most of the available PVY genomic sequences are from potato strains. To obtain more information about the diversity and evolution of PVY, we cloned and sequenced the genomes of two PVY strains, one from tomato and the other from pepper. Using a maximum-likelihood method, we searched for positive selection events in each of the PVY-encoded proteins. Because substitution saturation and recombination can interfere with the detection of positive selection, we also tested the sequences for both. Amino acid sites in two PVY proteins [the 6K2 protein and the coat or capsid protein (CP)] were found to be under positive selection. We also examined the CPs of other potyviruses to confirm these findings and to evaluate the involvement of the CP in host adaptation.


   Methods
■ Nucleotide sequences.
PVY nucleotide sequences available in the GenBank database were used (Table 1). Strain LYE84.2 was isolated from tomato and has been repeatedly passaged through Lycopersicon spp. genotypes (Legnani, 1995 ). Strain SON41 was isolated from the weed Solanum nigrum and has been repeatedly passaged through Capsicum annuum cultivar ‘Florida VR2’ (Gebré-Selassié et al., 1985 ). Degenerate oligonucleotide primers were designed in regions conserved among PVY strains. Overlapping cDNA fragments covering the whole genome of SON41 were amplified by PCR using Taq DNA polymerase and cloned in pCR2.1-TOPO (Invitrogen). Sequencing reactions (ABI Prism BigDye Terminator cycle sequencing ready reaction kit, Applied Biosystems) were analysed on an automated ABI 377XL DNA sequencer (Applied Biosystems). Specific PCR primers were designed, and overlapping PCR fragments were amplified with Pfu DNA polymerase (Stratagene) and cloned in pCR-bluntII-TOPO (Invitrogen). The nucleotide sequences of these clones were determined as described above. Two strategies were used to obtain the cDNA of strain LYE84.2. A 3·8 kb clone covering the 3'-terminal region was obtained using the procedure of Gubler & Hoffman (1983) . The remaining part of the genome was amplified by PCR and cloned in pZErO-2 (Invitrogen). Sequencing was performed by Genome Express (Grenoble, France). Because the nucleotide sequences at the 5' termini of the SON41 and LYE84.2 RNAs had not been determined, a consensus primer corresponding to the first 28 nucleotides was used as the 5' primer for cDNA amplification of this part of the genomes.


Table 1. PVY sequences used for phylogenetic analyses

 
Nucleotide sequences of the coat proteins of Bean yellow mosaic virus (BYMV), Lettuce mosaic virus (LMV), Potato virus A (PVA), Potato virus V (PVV), Turnip mosaic virus (TuMV) and Yam mosaic virus (YMV) were obtained from the GenBank database.

■ Sequence alignments.
Because the number of available sequences varied along the PVY genome (Table 1), each of the 10 proteins resulting from the cleavage of the PVY polyprotein was analysed separately, unless otherwise indicated. The nucleotide sequences were aligned with the CLUSTAL method (MegAlign program of the DNASTAR package for the Apple Macintosh, version 1.02, DNASTAR Inc.). No gaps were observed, except in P3 and NIb. In P3, sequence X12456 shows a one-nucleotide deletion after nucleotide 569 followed by a one-nucleotide insertion after nucleotide 612. In the NIb coding region of the same sequence, a similar local frame shift results from a one-nucleotide deletion after nucleotide 266 followed by a one-nucleotide insertion after nucleotide 311. In both regions, these local frame shifts make the deduced amino acid sequence of X12456 highly divergent from those of the five other strains. These one-nucleotide deletions and insertions are very probably sequencing errors (C. Robaglia, personal communication) and were treated as such in the following analyses. In addition, the NIb sequence of X12456 has a two-codon insertion after codon 45 relative to all other sequences; these positions were discarded from the following analyses. For the other potyviruses, only CP sequences without gaps were retained for further analysis.
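Discarding whole codons rather than single alignment columns keeps the remaining sequences in frame, which codon-based models require. A minimal sketch, assuming an in-frame nucleotide alignment held as a dictionary of equal-length strings with '-' as the gap character (the function name is hypothetical and not part of MegAlign or PAML):

```python
def drop_gapped_codons(alignment: dict[str, str]) -> dict[str, str]:
    """Remove every codon (three alignment columns) in which any sequence has a gap."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    length = lengths.pop()
    if length % 3 != 0:
        raise ValueError("alignment length is not a multiple of 3")
    # Start positions of codons that are gap-free in every sequence.
    keep = [i for i in range(0, length, 3)
            if all("-" not in seq[i:i + 3] for seq in alignment.values())]
    return {name: "".join(seq[i:i + 3] for i in keep)
            for name, seq in alignment.items()}


# Hypothetical two-codon insertion in one sequence, similar to the NIb case of X12456.
aln = {"X": "ATGAAACCCGGGTTT", "Y": "ATGAAA------TTT"}
print(drop_gapped_codons(aln))  # {'X': 'ATGAAATTT', 'Y': 'ATGAAATTT'}
```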

■ Phylogenetic relationships among PVY strains.
As the genetic distance between sequences increases, the number of observed transitions (substitutions between T and C and between A and G) relative to that of transversions (all other changes) gradually decreases because of substitution ‘saturation’. This can lead to the detection of artefactual diversifying selection events: because transitions are, overall, more likely than transversions to be synonymous, a lower transition/transversion ratio may be mistaken for a positive, diversifying selective pressure on the region (and vice versa). Simulation studies show that phylogenetic information is essentially lost when the observed saturation is equal to or greater than half of full substitution saturation (Xia, 1999 ). The presence of substitution saturation in our data sets was tested with DAMBE version 4.0.43 (Xia & Xie, 2001 ). For each protein, the observed saturation index was compared by a t test with the saturation index expected under half of full saturation. A plot of the transition and transversion rates (estimated with DAMBE) against the divergence in pairwise comparisons between the six full-length ORFs also gave a visual display of substitution saturation.
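Classifying a difference as a transition or a transversion only requires knowing whether both bases are purines (A, G) or both are pyrimidines (C, T). The sketch below counts raw differences between two aligned sequences; DAMBE's rate estimates and saturation index involve model-based corrections that are not reproduced here, and the function name is hypothetical.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def count_transitions_transversions(seq1: str, seq2: str) -> tuple[int, int]:
    """Return (transitions, transversions) between two aligned nucleotide sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned")
    ts = tv = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue  # identical site, gap or ambiguous base
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            ts += 1   # A<->G or C<->T
        else:
            tv += 1   # purine <-> pyrimidine
    return ts, tv


print(count_transitions_transversions("ACGTAC", "GCATCC"))  # (2, 1)
```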

Phylogenies were constructed and evaluated with the neighbour-joining (NJ), Fitch and Margoliash and maximum-likelihood methods implemented in the PHYLIP software package (Felsenstein, 1993 ). One thousand bootstrap replicates were performed to place confidence estimates on the groups contained in the resulting unrooted trees. Nodes with low reliability (bootstrap support below 70%; Hillis & Bull, 1993 ) were collapsed, and the resulting tree topology was used for the maximum-likelihood analyses of codon substitution.
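Collapsing a weakly supported node means removing that internal branch and attaching its descendants directly to its parent, which produces a multifurcation. A minimal sketch on a hypothetical in-memory tree (the Node class and function names are illustrative, not PHYLIP's; bootstrap values are assumed to be stored as percentages on internal nodes):

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    name: str = ""
    support: Optional[float] = None          # bootstrap %, internal nodes only
    children: list[Node] = field(default_factory=list)


def collapse_weak_nodes(node: Node, threshold: float = 70.0) -> Node:
    """Dissolve internal nodes whose bootstrap support is below threshold (post-order)."""
    new_children = []
    for child in node.children:
        collapse_weak_nodes(child, threshold)
        weak = child.children and child.support is not None and child.support < threshold
        if weak:
            new_children.extend(child.children)  # graft grandchildren onto this node
        else:
            new_children.append(child)
    node.children = new_children
    return node


def _newick(node: Node) -> str:
    if not node.children:
        return node.name
    label = "" if node.support is None else f"{node.support:g}"
    return "(" + ",".join(_newick(c) for c in node.children) + ")" + label


def to_newick(node: Node) -> str:
    return _newick(node) + ";"


tree = Node(children=[
    Node(support=95, children=[Node("A"), Node("B")]),
    Node(support=40, children=[Node("C"), Node("D")]),
])
print(to_newick(collapse_weak_nodes(tree)))  # ((A,B)95,C,D);
```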

■ Determination of recombination events.
Because recombination causes different regions of a gene to have different evolutionary histories, it is important to detect it before analysing evolutionary rates and searching for positive selection. The initial search for recombination in PVY was conducted using split decomposition, a method that depicts all the shortest pathways linking sequences, including those that produce an interconnected network, as expected under recombination (Bandelt & Dress, 1992 ). This analysis was performed with the program SplitsTree2 (Huson, 1998 ). When putative recombinant sequences were detected, Lard version 2.2 (Holmes et al., 1999 ) and SiScan version 1.01 (Smith et al., 2000 ) were used to map the recombination events and to assess their statistical support. Further analyses were performed either after discarding the putative recombinant strains or on sequence regions that did not contain any recombination breakpoints.
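The idea behind scanning methods such as SiScan can be illustrated by comparing a query sequence with candidate parental sequences in sliding windows and recording which candidate is most similar in each window; a switch of best match along the sequence suggests a recombination breakpoint. This is a simplified sketch, not SiScan's Z-score procedure or Lard's likelihood test: it uses raw percent identity, assumes gap-free aligned sequences of equal length, and the function names and toy sequences are hypothetical.

```python
def window_identity(query: str, ref: str, window: int, step: int) -> list[tuple[int, float]]:
    """Percent identity between aligned query and ref in each sliding window."""
    if len(query) != len(ref):
        raise ValueError("sequences must be aligned to the same length")
    scores = []
    for start in range(0, len(query) - window + 1, step):
        same = sum(a == b for a, b in zip(query[start:start + window],
                                          ref[start:start + window]))
        scores.append((start, 100.0 * same / window))
    return scores


def best_parent_by_window(query: str, parents: dict[str, str],
                          window: int, step: int) -> list[tuple[int, str]]:
    """For each window start, name the candidate parent most similar to the query."""
    tracks = {name: window_identity(query, seq, window, step) for name, seq in parents.items()}
    first_track = next(iter(tracks.values()))
    result = []
    for i, (start, _) in enumerate(first_track):
        best = max(tracks, key=lambda name: tracks[name][i][1])
        result.append((start, best))
    return result


# Toy mosaic: the middle third of the query resembles parent "N", the rest parent "O".
parents = {"N": "A" * 60, "O": "C" * 60}
query = "C" * 20 + "A" * 20 + "C" * 20
print(best_parent_by_window(query, parents, window=20, step=20))
# [(0, 'O'), (20, 'N'), (40, 'O')]
```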

■ Tests for positive selection.
To identify regions of the PVY genome subjected to positive selection, we used PAML version 3.0c, which estimates the occurrence of ω values greater than 1·0 and has proven useful for documenting positive selection in other viruses (Yang et al., 2000 ). Using the program CODEML and the tree topologies obtained for each of the 10 PVY-encoded proteins, we employed two categories of maximum-likelihood models. The first category aims to identify codon sites subjected to positive selection. These models assume that the non-synonymous/synonymous substitution rate ratio, ω, is constant across all lineages but varies across codons. Different evolutionary models were tested (Yang et al., 2000 ): a model (M0) assuming a single ω ratio, models (M1 and M7) assuming that amino acid sites are either neutral or deleterious (ω≤1) and models (M3 and M8) allowing the occurrence of positively selected sites (ω>1). Several other models available in PAML were not used because of their lack of power or their particularly slow convergence, following the recommendations of Yang et al. (2000) . For each codon, the probability of observing the data was computed as the average of its probabilities under the different ω categories, weighted by the proportion of sites in each category. The log likelihood is the sum of the logarithms of these probabilities over all codons in the sequence. When ω>1 for some codons, a likelihood ratio test between two nested models (M3 versus M1 or M8 versus M7) assesses whether the positive selection model fits the data significantly better than the null hypothesis: twice the difference in log likelihood between the two models is compared with a χ² distribution with n degrees of freedom, n being the difference between the numbers of parameters of the two models. An empirical Bayesian approach implemented in CODEML was used to infer to which category (neutral, deleterious or advantageous) each codon most likely belongs. Codons with posterior probabilities above 95% were considered significant.
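Once the likelihood of each codon's data under each ω category has been computed on the tree (the computationally demanding step performed by CODEML, not shown here), the mixture log likelihood and the empirical Bayes posterior probabilities follow directly. A minimal sketch with hypothetical per-codon likelihoods, proportions and ω values; the function names are illustrative only.

```python
import math


def mixture_loglik_and_posteriors(site_liks, proportions):
    """site_liks[h][k]: likelihood of codon h's data under omega category k.
    Returns the total log likelihood and, for each codon, the posterior
    probability of each category (empirical Bayes, parameters fixed at their estimates)."""
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("category proportions must sum to 1")
    loglik = 0.0
    posteriors = []
    for liks in site_liks:
        weighted = [p * f for p, f in zip(proportions, liks)]
        total = sum(weighted)            # likelihood of this codon under the mixture
        loglik += math.log(total)        # log likelihood = sum of per-codon logs
        posteriors.append([w / total for w in weighted])
    return loglik, posteriors


def positively_selected(posteriors, omegas, cutoff=0.95):
    """1-based codon positions whose posterior mass on categories with omega > 1 exceeds cutoff."""
    return [h + 1 for h, post in enumerate(posteriors)
            if sum(p for p, w in zip(post, omegas) if w > 1.0) > cutoff]


omegas = [0.05, 1.0, 2.0]                 # hypothetical M3-like categories
proportions = [0.7, 0.2, 0.1]
site_liks = [[1e-3, 1e-4, 1e-5],          # codon 1: best explained by omega = 0.05
             [1e-6, 1e-5, 1e-3]]          # codon 2: best explained by omega = 2.0
lnl, post = mixture_loglik_and_posteriors(site_liks, proportions)
print(positively_selected(post, omegas))  # [2]
```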

The second category of models aims to identify lineages undergoing positive selection. These models assume that ω is constant across all codons but varies across the N branches of the tree. Comparing the scenario in which each branch has its own ω value (‘N-ratio’ model) with the null hypothesis, in which ω is constant across all branches and all codons (model M0; Yang et al., 2000 ), constitutes a likelihood ratio test of the constancy of ω among evolutionary lineages. When branches with ω>1 were obtained, we tested whether the ω values associated with those branches were significantly greater than 1·0 using Fisher’s exact test of homogeneity (Zhang et al., 1997 ).
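The likelihood ratio tests described in this section reduce to comparing 2ΔlnL with a χ² distribution whose degrees of freedom equal the difference in the number of free parameters; for the branch test, one ω per branch against a single ω gives N − 1 degrees of freedom. A self-contained sketch using only the standard library, so the χ² upper tail is computed from the series expansion of the regularized incomplete gamma function; the log-likelihood values in the example are hypothetical.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with df degrees of freedom."""
    if x <= 0.0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a             # series for the lower incomplete gamma function
    n = 1
    while term > total * 1e-16:
        term *= z / (a + n)
        total += term
        n += 1
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total  # regularized P(a, z)
    return max(0.0, 1.0 - lower)


def likelihood_ratio_test(lnl_null: float, lnl_alt: float, df: int) -> tuple[float, float]:
    """Return (2 * delta lnL, P value) for nested models, e.g. M7 (null) versus M8."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf(stat, df)


# Hypothetical values: lnL(M7) = -2000.0, lnL(M8) = -1994.0, 2 degrees of freedom.
stat, p = likelihood_ratio_test(-2000.0, -1994.0, df=2)
print(stat, round(p, 5))  # 12.0 0.00248
```

With 2 degrees of freedom the upper tail is simply exp(−x/2), which gives a quick check on the series.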

To avoid local maxima, PAML was run for each model with several initial ω values (at least 0·2, 1·0, 1·2 and 2·1). In almost all analyses, at least three of these initial values led to the same maximum likelihood, which was higher than the value obtained with the fourth initial value when the latter differed. When ambiguity remained, three additional initial values were tested, which allowed the maximum likelihood to be identified with satisfactory confidence.
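The logic of restarting the optimization from several initial ω values can be illustrated with a toy, one-parameter surface that has two peaks. The sketch below uses a simple step-halving hill climb as a stand-in for CODEML's optimizer (it is not the algorithm PAML uses); the surface, the function names and the bounds are hypothetical, and only the starting values 0·2, 1·0, 1·2 and 2·1 are taken from the text.

```python
import math


def toy_loglik(w: float) -> float:
    """Hypothetical bimodal log likelihood: a local peak near 0.3, the global peak near 1.8."""
    return math.log(math.exp(-(w - 0.3) ** 2 / 0.05)
                    + 1.5 * math.exp(-(w - 1.8) ** 2 / 0.1))


def hill_climb(f, x, step=0.1, tol=1e-6, lo=0.0, hi=5.0):
    """Move to a better neighbour while one exists; otherwise halve the step."""
    fx = f(x)
    while step > tol:
        for cand in (x + step, x - step):
            if lo <= cand <= hi and f(cand) > fx:
                x, fx = cand, f(cand)
                break
        else:
            step /= 2.0
    return x, fx


def multi_start(f, starts):
    """Optimize from every start; return the best optimum and how many starts reached it."""
    results = [hill_climb(f, s) for s in starts]
    best_x, best_f = max(results, key=lambda r: r[1])
    n_agree = sum(abs(fx - best_f) < 1e-6 for _, fx in results)
    return best_x, best_f, n_agree


best_w, best_lnl, n_agree = multi_start(toy_loglik, (0.2, 1.0, 1.2, 2.1))
print(round(best_w, 3), n_agree)  # 1.8 3
```

Here the start at 0·2 is trapped on the lower peak, while the other three agree on the higher one, mirroring the "at least three identical maxima" criterion.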

A maximum-likelihood method, also implemented in PAML, allowed the estimation of ω values in pairwise sequence comparisons (Yang & Nielsen, 2000 ). Significant departures from neutral evolution (ω estimates significantly greater than 1·0) were assessed with Fisher’s exact test of homogeneity (Zhang et al., 1997 ).
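One common way to set up Fisher's exact test for such comparisons is a 2 × 2 table with rows for non-synonymous and synonymous sites and columns for sites that differ and sites that do not; the one-sided P value is then the hypergeometric probability of at least as many non-synonymous differences given the margins. The sketch below assumes that construction and integer-rounded counts; the exact table used by Zhang et al. (1997) may differ in detail, and the function names and example numbers are hypothetical.

```python
from math import comb


def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """P(X >= a) for the 2x2 table [[a, b], [c, d]] with all margins fixed (hypergeometric)."""
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    hits = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return hits / comb(n, col1)


def excess_nonsynonymous(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites) -> float:
    """One-sided test for more non-synonymous change than expected from the site numbers."""
    nd, sd = round(nonsyn_diffs), round(syn_diffs)
    n_sites, s_sites = round(nonsyn_sites), round(syn_sites)
    return fisher_one_sided(nd, n_sites - nd, sd, s_sites - sd)


print(round(fisher_one_sided(3, 1, 1, 3), 4))  # 0.2429 (= 17/70)
```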


   Results
Sequence data
We have cloned and sequenced the entire genomes of the LYE84.2 and SON41 PVY strains. No differences were observed between the clones obtained with Taq or Pfu DNA polymerases (SON41) or between the different sequence reactions (LYE84.2). The nucleotide sequences and deduced amino acid sequences of LYE84.2 and SON41 have been submitted to the EMBL database under accession numbers AJ439545 and AJ439544, respectively. These are the first full-length sequences of non-potato strains and the first sequences of non-potato strains for several PVY proteins (P3, 6K1, CI, 6K2 and NIa).

Analysis of the phylogenetic signal
Evidence for the lack of substitution saturation comes from linear regression of the transition and transversion rates against sequence divergence (Fig. 1). Under saturation, the number of observed transitions relative to that of transversions gradually decreases with increasing divergence (Xia, 1999 ). The plot of pairwise comparisons between the six full-length ORFs of PVY shows that this is not the case here (Fig. 1). Expected and observed saturation indices were also compared with DAMBE for each data set (i.e. for each protein); the tests of their differences gave no evidence of substitution saturation (data not shown).
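Linearity in such a plot can be summarized by an ordinary least-squares fit of each rate against distance; a transition series that bends towards a plateau would fit a straight line poorly. The sketch below only illustrates that idea: using R² as the summary is an assumption of this sketch (linearity was judged from the plot and from DAMBE's saturation test), and the data points are hypothetical values in arbitrary units.

```python
def linear_fit(x: list[float], y: list[float]) -> tuple[float, float, float]:
    """Ordinary least-squares line y = slope * x + intercept, and its R^2."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least two points")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("distances are all identical")
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = sxy ** 2 / (sxx * syy) if syy else 1.0
    return slope, intercept, r2


# Hypothetical pairwise distances and transition rates that grow linearly.
distances = [0.0, 1.0, 2.0, 3.0]
transitions = [1.0, 3.0, 5.0, 7.0]
print(linear_fit(distances, transitions))  # (2.0, 1.0, 1.0)
```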



Fig. 1. Plot of the transition rate (♦) and transversion rate (○) against the Tamura & Nei (1993) (TN93) distance for pairwise comparisons of the coding sequences of the six full-length PVY strains. Linearity indicates a lack of saturation of the phylogenetic signal.

 
Topology of the PVY trees
As expected, the trees obtained with the different PVY genes can be divided into three main parts (illustrated for the CP gene in Fig. 2): the N strains, the O strains and a third group composed mainly of non-potato strains. For the CP, strains of the C group either join the non-potato strains (C1 subgroup) or form a separate group (C2 subgroup) between the non-potato strains and the N and O strains. No other sequences are available for strains of the C groups, except one for the P1 protein, which clusters with SON41 and LYE84.2. Tobacco strains can fall into any group. This clustering was supported by high bootstrap values and was consistent for all PVY genes, except where recombination events occurred (see below). The different methods used to investigate the PVY phylogeny gave the same topologies, with slight differences in bootstrap values (data not shown).



Fig. 2. Unrooted neighbour-joining phylogenetic tree of the CP gene of PVY. Bootstrap analysis was applied using 1000 bootstrap samples. Some bootstrap percentages at internal nodes are reported. The different subgroups of PVY strains from potato plants are indicated. Underlined sequences and sequences in boxes are respectively from tobacco and pepper strains.

 
Detection of recombination in the PVY genome
Several putative recombination events were identified among the sequences examined and were found to be statistically significant (Fig. 3A). The recombination events in the CP gene and in the 3' untranslated region have already been documented (Revers et al., 1996 ). The recombination events affecting strain N-Wilga (in the P1 protein) and strain M95491 (in the P3 and VPg proteins) have also been identified (Glais et al., 2002 ) and caused changes of phylogenetic group (between the PVY N and O subgroups) that were supported by high bootstrap values (70%). Finally, two recombination events in the NIb gene were significant for strain X12456, with the central part of NIb linked to the N group and the rest of the genome linked to the O group of PVY (Fig. 3B). Reanalysis of the data with SplitsTree2 after removal of these recombinant strains did not reveal any other putative recombinant strains.



Fig. 3. (A) Map of recombination breakpoints identified in the PVY genome. These recombination events were also reported by Revers et al. (1996) and Glais et al. (2002) , except for the two recombination breakpoints in the NIb gene. (B) Z scores of total nucleotide identity in the NIb gene, calculated with SiScan for two sets of comparisons: strains X12456 and M95491 (dashed line) and strains X12456 and D12539 (unbroken line), showing two putative recombination events for strain X12456. Z values above 3·67 are significant (P<5%).

 
Non-synonymous/synonymous substitution rate analyses in PVY
Heterogeneous evolution across amino acid sites.
Log likelihood values and parameter estimates for the best-fit model among M0, M1 and M3 (see Methods) are listed in Table 2 for each PVY protein. Overall, the evolution of the PVY proteins appears to be conservative, with mean ω values ranging from 0·03 (CI) to 0·23 (P1). There was no evidence of selection on codon usage (the effective number of codons was 52–58). Sites subjected to positive selection were found in the 6K2 protein and the CP. The ω values of the positively selected categories were modest (2·0 and 1·1, respectively), and these categories comprised one amino acid position (position 14 of the 6K2 protein) and 24 amino acid positions (CP; Fig. 4), respectively, with a posterior probability greater than 95% of belonging to the positive selection ω category. Comparison of models M8 and M7 (Yang et al., 2000 ) confirmed the significance of positive selection for the CP, but was slightly above the 5% threshold for the 6K2 protein (Table 2). For the 6K2 protein, the significance of positive selection (comparisons of M3 with M1 and of M8 with M7) increased when a larger region (from the end of CI to the end of VPg) was analysed with the six sequences available in the data set (data not shown). Detection of positive selection in the CP is unlikely to be due to the large number of sequences used for the analysis: positive selection was still detected with only the six CP sequences corresponding to the full-length PVY genomes [PAML detected a category of codon positions with ω=1·5 (model M8), and model M8 fitted the data significantly better than model M7 (P=0·02)]. A proportion of amino acid sites with relatively high ω values was also observed for NIa-Pro (2·6% of sites with ω=0·86), NIb (4·2% of sites with ω=0·85) and especially P3, with 21·3% of sites undergoing neutral evolution (ω=1).
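The effective number of codons is Wright's statistic, which ranges from 20 (one codon used per amino acid) to 61 (all sense codons used equally), so values of 52–58 indicate weak codon bias. A minimal sketch of the classic formula under the standard genetic code; the function and variable names are hypothetical, amino acids seen fewer than twice are skipped, and published implementations may differ in small corrections.

```python
from collections import Counter, defaultdict

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
        for i, b1 in enumerate(BASES)
        for j, b2 in enumerate(BASES)
        for k, b3 in enumerate(BASES)}
FAMILY_SIZE = Counter(aa for aa in CODE.values() if aa != "*")  # e.g. L -> 6, I -> 3
N_AMINO_ACIDS_IN_CLASS = {2: 9, 3: 1, 4: 5, 6: 3}


def effective_number_of_codons(seq: str) -> float:
    """Wright's effective number of codons for an in-frame coding sequence."""
    seq = seq.upper().replace("U", "T")
    usage = defaultdict(Counter)                       # amino acid -> codon counts
    for i in range(0, len(seq) - 2, 3):
        aa = CODE.get(seq[i:i + 3])
        if aa is not None and aa != "*":
            usage[aa][seq[i:i + 3]] += 1
    homozygosity = defaultdict(list)                   # family size -> F values
    for aa, counts in usage.items():
        k, n = FAMILY_SIZE[aa], sum(counts.values())
        if k > 1 and n > 1:
            sum_p2 = sum((c / n) ** 2 for c in counts.values())
            homozygosity[k].append((n * sum_p2 - 1) / (n - 1))
    mean_f = {k: sum(v) / len(v) for k, v in homozygosity.items()}
    if 3 not in mean_f and 2 in mean_f and 4 in mean_f:
        mean_f[3] = (mean_f[2] + mean_f[4]) / 2        # Wright's rule when Ile is absent
    nc = 2.0                                           # Met and Trp
    for k, n_aa in N_AMINO_ACIDS_IN_CLASS.items():
        if mean_f.get(k, 0.0) <= 0.0:
            raise ValueError(f"too little data for {k}-fold degenerate amino acids")
        nc += n_aa / mean_f[k]
    return min(nc, 61.0)


# Hypothetical, maximally biased sequence: one codon per amino acid, each used twice.
biased = "".join(codon * 2 for codon in
                 ["TTT", "TTA", "TCT", "TAT", "CAT", "CAA", "CGT", "ATT", "ACT",
                  "AAT", "AAA", "GTT", "GCT", "GAT", "GAA", "GGT", "TGT", "CCT"])
print(effective_number_of_codons(biased))  # 20.0
```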


Table 2. Maximum-likelihood analysis of the evolution of PVY proteins with models allowing {omega} to vary across amino acid sites

 


Fig. 4. Alignment of the CP sequences of PVY, PVA, BYMV and YMV, with amino acid sites (shaded) that are putatively subjected to diversifying selection (posterior probability >95% of belonging to the ω categories 1·09, 1·55, 1·80 and 1·05, respectively). Positions subjected to positive selection and shared by several potyviruses are indicated by arrows.

 
Heterogeneous evolution across lineages.
For all proteins in the PVY genome except the P1, 6K1 and 6K2 proteins, the ‘N-ratio’ model (see above) fitted the data significantly better than M0, indicating heterogeneity in the mode of evolution along lineages (data not shown). In all 10 proteins, branches were detected with ω estimates greater than 1 (data not shown), but these particular branches involved only a very small number of substitutions (a maximum of eight substitutions was observed for a branch in the P3 tree with ω=1·9). The ω estimates of these branches were never significantly greater than 1, essentially because of the small number of substitutions along these branches in comparison with the background branches (data not shown).

Evolution modes of the CP of other potyviruses
Few sequence data are available for the 6K2 proteins of other potyviruses. However, the numerous available CP sequences can be used to see whether the CPs of other potyviruses share the evolutionary pattern of the PVY CP. Hence, the same analyses were performed with the CP genes of seven potyviruses with either wide or narrow host ranges (Table 3). For Plum pox virus (PPV), TuMV and YMV, several sequences were discarded because of large gaps in the alignments. Most of the potyviruses showed a varying proportion of fast-evolving amino acid sites, except PVV and TuMV (Table 3). Diversifying selection was statistically significant for YMV and BYMV when models M3 and M1 or M8 and M7 were compared in PAML, and for PVA when models M3 and M1 were compared (Table 3). Some of the amino acid positions subjected to diversifying selection (P>95%) coincide in an alignment of the CPs of PVY, PVA, BYMV and YMV (Fig. 4).


Table 3. Occurrence of diversifying selection in the CP of potyviruses with varying host-range properties

 
As for the PVY proteins, models allowing the ω ratio to vary across lineages showed significant heterogeneity (‘N-ratio’ model versus M0) in most cases, but did not reveal any branch with ω significantly greater than 1·0 (data not shown).

Estimates of ω in pairwise sequence comparisons (Yang & Nielsen, 2000 ) were rarely greater than 1·0 in any data set and, when they were, involved only small numbers of substitutions. An exception was found for PPV, where pairwise comparisons between the complete CP sequences of two strains [accession numbers X81078 (unknown host origin) and X81083 (sour cherry origin)] and several other strains gave ω estimates greater than 1·0 with relatively large numbers of substitutions. The estimate between CP sequences X81078 and X81083 is ω=3·09, with 16·3 non-synonymous versus 2·0 synonymous estimated substitutions, very near the 5% significance threshold (P=0·06).


   Discussion
Occurrence of positive selection in potyviruses
Sequence analysis has allowed the detection of genome regions undergoing positive selection in several animal viruses (reviewed in Yang & Bielawski, 2000 ), mainly in domains that interact with the host immune system. It is difficult, however, to obtain evidence for positive selection from sequence data when ω ratios are estimated for entire genes. Indeed, ω ratios for plant virus genes are usually low, except for some overlapping ORFs (reviewed in García-Arenal et al., 2001 ). Positive selection has, however, been suggested to occur in natural populations of Tobacco mild green mosaic virus (Fraile et al., 1996 ).

Our study showed that amino acids in two PVY proteins were subjected to positive selection: a single amino acid position in the 6K2 protein and several amino acid positions in the CP (especially in the N terminus). Evidence for diversifying selection was also obtained for the CPs of BYMV and YMV. Several of these positions were shared by several potyviruses or lay close to one another in the alignment, strengthening their significance. Heterogeneous modes of evolution were detected not only across amino acid sites but also across lineages: for most of the PVY proteins and for the CPs of other potyviruses, the ω ratio differed significantly among lineages. However, no specific branch with an ω ratio significantly greater than 1·0 was detected, because of the relatively small numbers of substitutions. Diversifying selection was also suspected for the CP of PVA and for the CPs of two PPV isolates in comparison with several other isolates. It should be noted that the test of molecular adaptation used in PAML is highly conservative: it will fail if positive selection affects only a few amino acid sites along a few lineages of the phylogeny. Models that allow the selective pressure to vary both among lineages and among codons would be more realistic for detecting adaptive molecular evolution and should have greater power.

Evolutionary constraints on the potyvirus genome
We obtained evidence for positive selection in two proteins of the PVY genome and in the CPs of other potyviruses. In the absence of three-dimensional structural models, and with only a very partial delineation of the functional domains of these proteins, it is difficult to identify the evolutionary constraints responsible for these events. It is plausible that the regions identified are ligand-binding domains and that positive selection reflects gains or losses of affinity for these ligands. In this respect, interactions with plant or vector-insect factors, as well as with other PVY proteins, could be the driving forces of this evolution.

The N-terminal part of the CP is non-essential for replication and cell-to-cell movement of the potyviruses Tobacco etch virus (TEV) (Dolja et al., 1993 , 1994 ) and Zucchini yellow mosaic virus (Arazi et al., 2001 ). Mutations in this region can, however, affect some virus functions quantitatively. Deletion of 25 amino acids in the CP of TEV (8 of which correspond to amino acid positions subjected to positive selection in PVY; Fig. 4) reduced the speed of cell-to-cell movement in tobacco and completely abolished systemic movement (Dolja et al., 1994 ). Substitution of the amino acid immediately following the DAG motif in the CP of tobacco vein mottling potyvirus (the corresponding amino acid in PVY is potentially subjected to positive selection; Fig. 4) reduced the efficiency of aphid transmission of the virus (Atreya et al., 1995 ). This illustrates that amino acid substitutions in this region can quantitatively affect the fitness of various potyviruses and can consequently be the target of selection. The N-terminal part of the CP is a striking example of the multifunctionality of viral proteins. It is exposed on the virion surface (and is thus potentially involved in binding ligands) and is involved in aphid transmission (Atreya et al., 1990 , 1991 , 1995 ) and in cell-to-cell and long-distance movement (Dolja et al., 1994 , 1995 ; Rojas et al., 1997 ). It is therefore difficult to know which virus function is involved in these positive selection events.

A three-dimensional structural model was recently proposed for the CP of PVA (Baratova et al., 2001 ). Using this model as a reference and the CP sequence alignment in Fig. 4, we located the amino acid positions identified above, keeping in mind that only weak evidence of diversifying selection was obtained for PVA (Table 3). In all four potyviruses, a large proportion of the sites belonging to the positive selection category lie in the N-terminal part of the CP, which comprises the regions most accessible at the particle surface (Baratova et al., 2001 ). Several positively selected amino acid sites also occur in the central and C-terminal (internal) parts of the capsid, without coinciding with any particular domain of the putative protein structure.

The central hydrophobic domain of the 6K2 protein is responsible for binding to the ER (Schaad et al., 1997 ) and it has consequently been proposed that the 6K2 protein is required for genome amplification and that it anchors the replication apparatus to ER membranes. Chu et al. (1997) showed that substitutions in a region spanning the end of CI, 6K2 and the beginning of VPg of TEV were sufficient (but not absolutely necessary) to induce a wilting response in Tabasco pepper. Since most of the TEV field isolates belong to the ‘wilting type’, the determinants of the wilting response in Tabasco could confer a selective advantage to TEV. However, the two amino acid changes in the 6K2 peptide that distinguish a non-wilting from a wilting isolate of TEV do not coincide with amino acid position 14 of PVY, which we suspect to be under positive selection.

Host adaptation and evolution of the potyvirus genome
Many environmental factors can exert evolutionary constraints on PVY, but one of the most important is certainly the host plant. The occurrence of diversifying selection in the CP of potyviruses did not correlate with the breadth of the host range (Table 3). The three potyviruses that showed evidence of positive selection (PVY, BYMV and YMV) had the CP sequences with the greatest diversity at the nucleotide level (Table 3). Strong conservation of the nucleotide sequences of the other potyviruses can be interpreted as an effect of purifying selection and may also explain why positive selection was not significant when ω estimates greater than 1·0 were obtained (PVA, PPV and LMV). The phylogenetic trees were not structured according to the host species of the isolates (data not shown), except partially for PVY, which suggests that the CP is not a host-range determinant. However, it should be emphasized that very limited host-range data are available for most of the strains used in these sequence analyses.

Biological evidence for host adaptation has already been reported for PPV in cherry trees versus other Prunus species (Dosba et al., 1987; Nemchinov et al., 1996) and, especially, for PVY. PVY isolates from pepper usually do not infect potato plants systemically when inoculated manually (Gebré-Selassié et al., 1985; McDonald & Kristjansson, 1993; d’Aquino et al., 1995). Conversely, manual inoculation did not result in systemic infection of pepper plants with PVYN strains (Gebré-Selassié et al., 1985; McDonald & Kristjansson, 1993; Valkonen et al., 1996), whereas some strains in the O and C groups could infect some pepper cultivars (McDonald & Kristjansson, 1993; Valkonen et al., 1996; Blanco-Urgoiti et al., 1998). Tomato (Stobbs et al., 1994; Legnani, 1995) and tobacco (Blancard, 1998) plants lacking specific resistance genes can be infected by most, if not all, PVY isolates, including those originating from potato and pepper. Taken together, these data suggest that some host plants (potato and pepper), but not others (tomato and tobacco), influence the selection and evolution of PVY isolates. There is also partial evidence that host range varies with the plant of origin for strains of BYMV (Wada et al., 2000) and TuMV (e.g. Stavolone et al., 1998). PVV, PVA, YMV and LMV, which all have very narrow host ranges, do not seem to undergo adaptation or diversification according to host plant (apart from overcoming specific cultivar resistance genes, or the spread of particular virus strains, which could result from the vegetative, clonal multiplication of cultivars). However, two sources of bias should be emphasized. (i) As noted above, very limited host-range data are available for the majority of the strains used in these sequence analyses. (ii) PAML analyses are based on substitution models applied to a sequence alignment; very divergent CP sequences (e.g. some cherry isolates of PPV and some YMV and TuMV isolates) could not be included because they introduced numerous gaps into the alignments. This reduces the possibility of detecting divergent evolution events and may bias the comparison of CP evolution among the different potyviruses.

In conclusion, the evidence of diversifying selection in several potyviruses cannot yet be interpreted in terms of evolutionary constraints. Further biological characterization is needed, and the role of the amino acid substitutions in these different proteins should be investigated through molecular analyses. Comparing the biological data with the modes of evolution of the CP makes this protein an interesting candidate determinant of host adaptation in some potyviruses, including PVY, BYMV, YMV and, possibly, PPV.


   Acknowledgments
 
B.M. and C.M. contributed equally to this work and should be considered as joint first authors. B.M. was the recipient of a post-doctoral NATO fellowship and C.M. of a PhD fellowship from the French Research Ministry. We thank F. García-Arenal for improving the manuscript, N. Galtier for helpful comments, V. Marie-Jeanne Tordo for unpublished sequence data and L. Guilbaud and B. Olsen for efficient technical help.


   Footnotes
 
The EMBL accession numbers of the sequences reported in this paper are AJ439544 and AJ439545.


   References
 
Arazi, T., Shiboleth, Y. M. & Gal-On, A. (2001). A nonviral peptide can replace the entire N terminus of zucchini yellow mosaic potyvirus coat protein and permits viral systemic infection. Journal of Virology 75, 6329-6336.

Atreya, C. D., Raccah, B. & Pirone, T. P. (1990). A point mutation in the coat protein abolishes aphid transmissibility of a potyvirus. Virology 178, 161-165.

Atreya, P. L., Atreya, C. D. & Pirone, T. P. (1991). Amino acid substitutions in the coat protein result in loss of insect transmissibility of a plant virus. Proceedings of the National Academy of Sciences, USA 88, 7887-7891.

Atreya, P. L., Lopez-Moya, J. J., Chu, M., Atreya, C. D. & Pirone, T. P. (1995). Mutational analysis of the coat protein N-terminal amino acids involved in potyvirus transmission by aphids. Journal of General Virology 76, 265-270.

Bandelt, H. J. & Dress, A. W. M. (1992). Split decomposition: a new and useful approach to phylogenetic analysis of distance data. Molecular Phylogenetics and Evolution 1, 242-252.

Baratova, L. A., Efimov, A. V., Dobrov, E. N., Fedorova, N. V., Hunt, R., Badun, G. A., Ksenofontov, A. L., Torrance, L. & Järvekülg, L. (2001). In situ spatial organization of potato virus A coat protein subunits as assessed by tritium bombardment. Journal of Virology 75, 9696-9702.

Blancard, D. (1998). Maladies du Tabac. Observer, Identifier, Lutter. Edited by INRA. Paris, France.

Blanco-Urgoiti, B., Sánchez, F., Pérez de San Román, C., Dopazo, J. & Ponz, F. (1998). Potato virus Y group C isolates are a homogeneous pathotype but two different genetic strains. Journal of General Virology 79, 2037-2042.

Chu, M., Lopez-Moya, J. J., Llave-Correas, C. & Pirone, T. P. (1997). Two separate regions in the genome of tobacco etch virus contain determinants of the wilting response of Tabasco pepper. Molecular Plant–Microbe Interactions 10, 472-480.

d’Aquino, L., Dalmay, T., Burgyan, J., Ragozzino, A. & Scala, F. (1995). Host range and sequence analysis of an isolate of potato virus Y inducing veinal necrosis in pepper. Plant Disease 79, 1046-1050.

Dolja, V. V., Herndon, K. L., Pirone, T. P. & Carrington, J. C. (1993). Spontaneous mutagenesis of a plant potyvirus genome after insertion of a foreign gene. Journal of Virology 67, 5968-5975.

Dolja, V. V., Haldeman, R., Robertson, N. L., Dougherty, W. G. & Carrington, J. C. (1994). Distinct functions of capsid protein in assembly and movement of tobacco etch potyvirus in plants. EMBO Journal 13, 1482-1491.

Dolja, V. V., Haldeman-Cahill, R., Montgomery, A. E., Vandenbosch, K. A. & Carrington, J. C. (1995). Capsid protein determinants involved in cell-to-cell and long distance movement of tobacco etch potyvirus. Virology 206, 1007-1016.

Dosba, F., Maison, P., Lansac, M. & Massonie, G. (1987). Experimental transmission of plum pox virus (PPV) to Prunus mahaleb and Prunus avium. Journal of Phytopathology 120, 199-204.

Dougherty, W. G. & Carrington, J. C. (1988). Expression and function of potyviral gene products. Annual Review of Phytopathology 26, 123-143.

Fakhfakh, H., Makni, M., Robaglia, C., Elgaaied, A. & Marrakchi, M. (1995). Polymorphisme des régions capside et 3' NTR de 3 isolats tunisiens du virus Y de la pomme de terre (PVY). Agronomie 15, 569-579.

Felsenstein, J. (1993). PHYLIP: phylogenetic inference package, version 3.5c. Distributed by the author. Department of Genetics, University of Washington, Seattle, USA.

Fraile, A., Malpica, J. M., Aranda, M. A., Rodríguez-Cerezo, E. & García-Arenal, F. (1996). Genetic diversity in tobacco mild green mosaic tobamovirus infecting the wild plant Nicotiana glauca. Virology 223, 148-155.

García-Arenal, F., Fraile, A. & Malpica, J. M. (2001). Variability and genetic structure of plant virus populations. Annual Review of Phytopathology 39, 157-186.

Gebré-Selassié, K., Marchoux, G., Delecolle, B. & Pochard, E. (1985). Variabilité naturelle des souches du virus Y de la pomme de terre dans les cultures de piment du sud-est de la France. Caractérisation et classification en pathotypes. Agronomie 5, 621-630.

Glais, L., Tribodet, M. & Kerlan, C. (2002). Genomic variability in potato potyvirus Y (PVY): evidence that PVYNW and PVYNTN variants are single to multiple recombinants between PVYO and PVYN isolates. Archives of Virology 147, 363-378.

Gubler, U. & Hoffman, B. J. (1983). A simple and very efficient method for generating cDNA libraries. Gene 25, 263-269.

Hillis, D. M. & Bull, J. J. (1993). An empirical test of bootstrapping as a method for assessing confidence in phylogenetic analysis. Systematic Biology 42, 182-192.

Holmes, E. C., Worobey, M. & Rambaut, A. (1999). Phylogenetic evidence for recombination in dengue virus. Molecular Biology and Evolution 16, 405-409.

Huson, D. H. (1998). SplitsTree: analyzing and visualizing evolutionary data. Bioinformatics 14, 68-73.

Kimura, M. (1983). The Neutral Theory of Molecular Evolution. Cambridge: Cambridge University Press.

Legnani, R. (1995). Analyse, comparaison et exploitation des résistances au virus Y de la pomme de terre (PVY) et au tobacco etch virus (TEV) chez la tomate. PhD thesis, University of Montpellier II, France.

Li, W.-H. (1997). Molecular Evolution. Sunderland, MA: Sinauer Associates.

McDonald, J. G. & Kristjansson, G. T. (1993). Properties of strains of potato virus YN in North America. Plant Disease 77, 87-89.

Marie-Jeanne Tordo, V., Chachulska, A. M., Fakhfakh, H., Le Romancer, M., Robaglia, C. & Astier-Manifacier, S. (1995). Sequence polymorphism in the 5'NTR and in the P1 coding region of potato virus Y genomic RNA. Journal of General Virology 76, 939-949.

Mestre, P., Brigneti, G. & Baulcombe, D. C. (2000). An Ry-mediated resistance response in potato requires the intact active site of the NIa proteinase from potato virus Y. Plant Journal 23, 653-661.

Nei, M. (1987). Molecular Evolutionary Genetics. New York: Columbia University Press.

Nemchinov, L., Hadidi, A., Maiss, E., Cambra, M., Candresse, T. & Damsteegt, V. (1996). Sour cherry strain of plum pox potyvirus (PPV): molecular and serological evidence for a new subgroup of PPV strains. Phytopathology 86, 1215-1221.

Nielsen, R. & Yang, Z. (1998). Likelihood models for detecting positively selected amino acid sites and application to the HIV-1 envelope gene. Genetics 148, 929-936.

Revers, F., Le Gall, O., Candresse, T., Le Romancer, M. & Dunez, J. (1996). Frequent occurrence of recombinant potyvirus isolates. Journal of General Virology 77, 1953-1965.

Riechmann, J. L., Laín, S. & García, J. A. (1992). Highlights and prospects of potyvirus molecular biology. Journal of General Virology 73, 1-16.

Rojas, M. R., Zerbini, F. M., Allison, R. F., Gilbertson, R. L. & Lucas, W. J. (1997). Capsid protein and helper component-proteinase function as potyvirus cell-to-cell movement proteins. Virology 237, 283-295.

Schaad, M. C., Jensen, P. E. & Carrington, J. C. (1997). Formation of plant RNA virus replication complexes on membranes: role of an endoplasmic reticulum-targeted viral protein. EMBO Journal 16, 4049-4059.

Shukla, D. D., Ward, C. W. & Brunt, A. A. (1994). Genome structure, variation and function. In The Potyviridae, pp. 74–110. Wallingford, UK: CAB International.

Smith, G. R., Borg, Z., Lockhart, B. E. L., Braithwaite, K. S. & Gibbs, M. J. (2000). Sugarcane yellow leaf virus: a novel member of the Luteoviridae that probably arose by inter-species recombination. Journal of General Virology 81, 1865-1869.

Stavolone, L., Alioto, D., Ragozzino, A. & Laliberté, J.-F. (1998). Variability among turnip mosaic potyvirus isolates. Phytopathology 88, 1200-1204.

Stobbs, L. W., Poysa, V. & Van Schagen, J. G. (1994). Susceptibility of cultivars of tomato and pepper to a necrotic strain of potato virus Y. Canadian Journal of Plant Pathology 16, 43-48.

Tamura, K. & Nei, M. (1993). Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Molecular Biology and Evolution 10, 512-526.

Valkonen, J. P. T., Kyle, M. M. & Slack, S. A. (1996). Comparison of resistance to potyviruses within Solanaceae: infection of potatoes with tobacco etch potyvirus and peppers with potato A and Y potyviruses. Annals of Applied Biology 129, 25-38.

Wada, Y., Iwai, H., Ogawa, Y. & Arai, K. (2000). Comparison of pathogenicity and nucleotide sequences of 3'-terminal regions of Bean yellow mosaic virus isolates from Gladiolus. Journal of General Plant Pathology 66, 345-352.

Xia, X. (1999). DAMBE (Software Package for Data Analysis in Molecular Biology and Evolution). User Manual. Hong Kong: Department of Ecology and Biodiversity, University of Hong Kong.

Xia, X. & Xie, Z. (2001). DAMBE: software package for data analysis in molecular biology and evolution. Journal of Heredity 92, 371-373.

Yang, Z. (1997). PAML: a program package for phylogenetic analysis by maximum likelihood. Computer Applications in the Biosciences 13, 555-556.

Yang, Z. (1998). Likelihood ratio tests for detecting positive selection and application to primate lysozyme evolution. Molecular Biology and Evolution 15, 568-573.

Yang, Z. & Bielawski, J. P. (2000). Statistical methods for detecting molecular adaptation. Trends in Ecology & Evolution 15, 496-503.

Yang, Z. & Nielsen, R. (2000). Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Molecular Biology and Evolution 17, 32-43.

Yang, Z., Nielsen, R., Goldman, N. & Pedersen, A.-M. K. (2000). Codon-substitution models for heterogeneous selection pressure at amino acid sites. Genetics 155, 431-449.

Zhang, J., Kumar, S. & Nei, M. (1997). Small-sample tests of episodic adaptive evolution: a case study of primate lysozymes. Molecular Biology and Evolution 14, 1335-1338.

Received 27 March 2002; accepted 17 June 2002.