Evidence for the Non-quasispecies Evolution of RNA Viruses

Gareth M. Jenkins, Michael Worobey, Christopher H. Woelk and Edward C. Holmes

Department of Zoology, University of Oxford, Oxford, England


    Abstract
 
The quasispecies model of RNA virus evolution differs from those formulated in conventional population genetics in that neutral mutations do not lead to genetic drift of the population, and natural selection acts on the mutant distribution as a whole rather than on individual variants. By computer simulation, we show that this model could be inappropriate for many RNA viruses because the neutral sequence space may be too large to allow the formation of a quasispecies distribution. This view is supported by our analysis of gene sequences from vesicular stomatitis virus, which is considered a prototype RNA virus quasispecies. Our results are relevant to the evolution of RNA systems in general.


    Introduction
 
The extent to which the mutation rates of RNA viruses exceed those of their DNA microbial counterparts is striking. While DNA microbes have mutation rates close to 0.003 mutations per genome per replication, values for RNA viruses typically lie between 0.1 and 10 mutations per genome per replication (Drake et al. 1998). The consequence of such high mutation rates is that populations of RNA viruses can exist as heterogeneous mutant swarms rather than as copies of one or a few dominant sequences. This, in turn, has led to the labeling of many RNA viruses as "quasispecies" of variable genome sequences. In particular, the term is commonly applied to HIV (Wain-Hobson 1992), hepatitis C virus (Forns, Purcell, and Bukh 1999), foot-and-mouth disease virus (Domingo et al. 1992), and vesicular stomatitis virus (Steinhauer et al. 1989).

The quasispecies concept was introduced by M. Eigen and co-workers as a formal mathematical model of early life forms based on chemical kinetics (Eigen and Schuster 1977). The quasispecies model is an equilibrium mutation-selection process and describes a heterogeneous distribution of whole genomes ordered around one or a degenerate set of fittest sequences known as "master" sequences (Eigen 1987, 1993, 1996a, 1996b; Nowak 1992). The master sequences continually generate mutants upon replication but maintain a stable frequency in the population rather than diverging and diffusing over neutral sequence space. This is in contrast to conventional population genetic models in which neutral mutations would lead to genetic drift of the population. The quasispecies model does not deny the existence of neutral mutations, but argues that the concept can be ignored for organisms with sufficiently small genomes, large population sizes, and high mutation rates because at any one time, the neutral space surrounding a fitness peak may be completely explored so that the population does not undergo genetic drift (Eigen 1987, 1996b). A further difference between the quasispecies model and those of conventional population genetics is that in the former, the frequency of any variant in a population is a function not only of its ability to replicate without errors, but also of the probability that it will arise by the erroneous replication of other templates within the mutant distribution. Consequently, genomes are not independent entities due to mutational coupling among variants, and instead, the entire mutant distribution forms an organized cooperative structure which acts like (quasi) a single unit (species), hence its name.
The quasispecies evolves to maximize the average rate of replication of the entire mutant distribution rather than the frequency of the single fittest sequence in the population, and thus the target of natural selection has been proposed to be "the mutant distribution as a whole," as opposed to individual variants (Eigen 1987, 1993, 1996a, 1996b). One consequence of this is that under quasispecies theory, individual variants of lower fitness can outcompete those of higher fitness by being surrounded by better-adapted mutants. Finally, the quasispecies model can also be defined mathematically as a Markov process representing the set of genome frequencies that maximizes the average replication rate of the population given a mutation matrix containing the probabilities of genomes mutating to other variants within the population and the relative fitness of each individual variant.

Domingo et al. (1978) were the first to suggest that RNA viruses might have quasispecies distributions. By T1 fingerprinting, they showed that a multiply passaged Qβ phage population was an equilibrium distribution of closely related mutants, the hallmark of a quasispecies. Not only was the consensus (average) sequence of the population stable over multiple passages, but mutants only differed from the consensus sequence by an average of one or two nucleotides. While the formation of an equilibrium state is often unlikely in real infections due to continual fluctuations in adaptive environment such as those caused by immune pressure, cellular tropism, or drug therapy (Eigen 1996a; Domingo and Holland 1997), the quasispecies model has nevertheless become the dominant paradigm for RNA virus evolution (Domingo et al. 1985; Steinhauer and Holland 1987; Domingo 1992; Holland, De La Torre, and Steinhauer 1992; Domingo and Holland 1997). Confusingly, though, the term "quasispecies" is also currently used by many virologists to loosely describe any genetic heterogeneity within viral populations, rather than to refer to the precise evolutionary model (Eigen 1996a). Here, we use the term in its formal sense.

Despite its widespread usage, there have been few studies testing whether the quasispecies is an appropriate evolutionary model for real RNA viruses. This was our aim, since it is of fundamental importance for understanding viral evolution to know whether genetic drift is absent and whether genomes are not independently evolving entities, the two central tenets of quasispecies theory. First, we used computer simulation to illustrate the conditions under which a viral mutant distribution can form a quasispecies. We then analyzed gene sequences of vesicular stomatitis virus (VSV), a supposedly archetypal quasispecies, to investigate whether these conditions are met by real RNA viruses. Experiments similar to those described on Qβ phage have been carried out on VSV and have revealed high levels of heterogeneity within clonal populations of this virus, as well as a stable consensus sequence over multiple passages in cell culture (Steinhauer et al. 1989). From these observations, it was concluded that VSV was a quasispecies. In a subsequent study, it was shown that a highly fit variant would only rise to dominance if seeded above a certain threshold, leading to the suggestion that such variants were suppressed by interactions within the quasispecies and that the competitive ability of any variant depends on the mutant spectrum which surrounds it (De La Torre and Holland 1990). We investigated whether such observations could be explained only by the quasispecies model.


    Materials and Methods
 
Simulations
A simple model of RNA virus evolution was produced as follows. Genomes were composed of 100 monomers, each of which could exist as state A, C, G, or U. A randomized sequence of even base composition was defined as the master sequence and was assigned a relative fitness of 1.0. All other mutants had a relative fitness of 0.1. This was similar to the fitness landscape used in a previous computer model of a quasispecies (Swetina and Schuster 1982). The initial population consisted of 200,000 copies of the master sequence. Replication was initiated by subjecting these sequences to mutation according to a Poisson process in which all substitutions were equally likely and then randomly sampling sequences and including a number of copies of each, proportional to 10 times its fitness score, in the next generation. Sampling was continued until the new generation reached a size of 200,000, and this population was in turn subjected to mutation and used to sample genomes for inclusion in the next generation. This recursive mutation-selection process was continued for the required number of replication cycles, and the population size was kept constant at 200,000. The equilibrium frequencies of the master sequence and surrounding mutants were then calculated under a variety of mutation rates (described in the Results section).
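
The mutation-selection cycle described above can be sketched in a few lines. This is a minimal Python illustration, not the authors' C++ program: the function names, the small default population size, and the use of weighted resampling with replacement (in place of copying each sampled genome 10 or 1 times until the quota is filled) are all assumptions made for brevity.

```python
import math
import random

BASES = "ACGU"

def poisson(rng, lam):
    """Draw a Poisson variate with mean lam (Knuth's multiplication method)."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def mutate(genome, rate, rng):
    """Apply a Poisson-distributed number of equally likely substitutions."""
    seq = list(genome)
    for _ in range(poisson(rng, rate)):
        site = rng.randrange(len(seq))
        seq[site] = rng.choice([b for b in BASES if b != seq[site]])
    return "".join(seq)

def next_generation(population, master, rate, rng):
    """One replication cycle: mutate every genome, then resample by fitness."""
    mutated = [mutate(g, rate, rng) for g in population]
    # Weight = 10 x fitness: 10 for the master (fitness 1.0), 1 for mutants (0.1).
    weights = [10 if g == master else 1 for g in mutated]
    return rng.choices(mutated, weights=weights, k=len(population))

def master_frequency(length=100, pop_size=2000, rate=1.0, cycles=50, seed=1):
    """Frequency of the master after `cycles` cycles (length divisible by 4)."""
    rng = random.Random(seed)
    master_bases = list(BASES * (length // 4))  # even base composition
    rng.shuffle(master_bases)
    master = "".join(master_bases)
    population = [master] * pop_size
    for _ in range(cycles):
        population = next_generation(population, master, rate, rng)
    return population.count(master) / pop_size
```

Calling `master_frequency(rate=1.0)` and `master_frequency(rate=5.0)` contrasts a regime in which the master persists with one in which it is lost. The frequency returned here is measured just after the selection step, so it need not match frequencies measured at other points in the cycle.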

Using these parameters, the numbers of possible genomes with one, two, or three mutations away from a defined master sequence of this length and composition are 300, 4.5 × 10^4, and 4.4 × 10^6, respectively. The population size used was thus greater than the number of two-error mutants but less than the number of three-error mutants. This is comparable to a typical RNA virus population of size 10^10 (Eigen 1996b) and length 10^4; for such a virus, the population size is also greater than the number of two-error mutants but less than the number of three-error mutants. Therefore, while the population size in our model was smaller than those of real RNA virus populations, the relative amount of sequence space explored was similar.
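
These counts follow from choosing which k of the L sites are mutated and which of the three alternative bases each takes, i.e., C(L, k) × 3^k. The short Python check below reproduces the figures; the function name is illustrative only.

```python
from math import comb

def mutant_class_size(length, k, alphabet=4):
    """Number of distinct sequences exactly k substitutions from a master."""
    return comb(length, k) * (alphabet - 1) ** k

# Model genome (L = 100) against a population of 200,000
for k in (1, 2, 3):
    print(k, mutant_class_size(100, k))

# Genome of length 10^4 against a population of 10^10
for k in (2, 3):
    print(k, mutant_class_size(10**4, k))
```

For L = 100 the three classes contain 300, 44,550, and 4,365,900 sequences, so a population of 200,000 lies between the two-error and three-error classes. For L = 10^4 the two-error and three-error classes contain roughly 4.5 × 10^8 and 4.5 × 10^12 sequences, which bracket a population of 10^10 in the same way.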

The effects of neutral evolution were added to the model by classifying sites within the master sequence as either neutral or negatively selected. Mutations at neutral sites did not alter fitness, but mutations at negatively selected sites decreased fitness from 1.0 to 0.1 in one step. The basic process of mutation and selection was the same as in the original model, and the population was likewise initiated as 200,000 copies of the master sequence. To test whether the population could reach an equilibrium, 200 genomes were randomly sampled from the mutant distribution after 1,000 replication cycles, and their frequencies in the population were calculated at this and subsequent time points. The frequencies of the 200 sampled genomes would remain unchanged if the mutant distribution was at equilibrium, and 1,000 replication cycles was deemed sufficient to allow an equilibrium to form if one was possible, since each site in every genome would be expected to mutate on average at least once under the mutation rates used (e.g., 1.0 mutations per genome per replication). The consensus sequence of the population was measured at 200 replication cycle intervals by calculating the average sequence of the whole population.
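
As a rough illustration of the two ingredients of this extended model, the Python sketch below gives a fitness function that ignores neutral sites and a per-site base-frequency profile from which a consensus can be read. The names and the representation of negatively selected sites as a list of indices are assumptions for the example, not details taken from the original program.

```python
from collections import Counter

def fitness(genome, master, selected_sites):
    """1.0 if every negatively selected site matches the master, else 0.1."""
    intact = all(genome[i] == master[i] for i in selected_sites)
    return 1.0 if intact else 0.1

def site_profile(population):
    """Per-site base frequencies across a population of equal-length genomes."""
    n = len(population)
    return [{base: count / n for base, count in Counter(column).items()}
            for column in zip(*population)]

def consensus(population):
    """Most frequent base at each site (ties resolved arbitrarily)."""
    return "".join(max(p, key=p.get) for p in site_profile(population))

# Toy example: site 1 is neutral, sites 0, 2 and 3 are negatively selected.
master = "ACGU"
print(fitness("AAGU", master, [0, 2, 3]))   # neutral change only
print(fitness("CCGU", master, [0, 2, 3]))   # change at a selected site
print(consensus(["ACGU", "ACGA", "ACGU"]))
```

Tracking `site_profile` over time, rather than only the majority base, is what distinguishes a fixed nucleotide at a selected site from an even mixture at a neutral one.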

Simulation programs were written in C++, and source code is available on request.

Sequence Analysis
Thirty-four complete VSV glycoprotein (G) gene sequences were obtained from GenBank with the accession numbers M27165, M23450, M21416–M21421, M21423–M21427, M21428–M21437, and M21558–M21568. To check the validity of quasispecies theory in another RNA virus, 16 complete foot-and-mouth disease virus (FMDV) polyprotein 1 (P1) gene sequences were collected from the Aphthovirus sequence database (http://www.iah.bbsrc.ac.uk/virus/picornaviridae/SequenceDatabase/) with the accession numbers U82271, M55287, AF154271, X00871, M72587, X74811, M10975, X00429, M90372, L29078, M90376, L29061, M90368, M60118, M90367, and AJ007347. Sequences were then aligned using CLUSTAL W (Thompson, Higgins, and Gibson 1994).

The average ratio of synonymous (dS) to nonsynonymous (dN) substitutions in both viruses was calculated using the PAML package (Nielsen and Yang 1998), employing a maximum-likelihood method based on an explicit model of codon substitution (Goldman and Yang 1994). A sliding-window analysis of the distribution of synonymous and nonsynonymous substitutions along each gene was performed using the SNAP program (available at http://hiv-web.lanl.gov).

Codon bias, which may limit the size of neutral space, was measured using the effective number of codons, NC (Wright 1990). This index is a measure of overall codon bias, i.e., that produced by both mutation bias and translational selection, analogous to the effective-number-of-alleles measure used in population genetics, and was calculated using the codonW program (available at http://www.molbiol.ox.ac.uk/cu/codonW.html). The reported value of NC is always between 20 (when only one codon is effectively used for each amino acid) and 61 (when codons are used randomly) and is a reliable estimator of codon bias for input sequences longer than 100 codons.

Finally, the folding energies of possible RNA secondary structures, another possible constraint on neutral space, were calculated using the program MFOLD (Mathews et al. 1999). This program calculates the optimal thermodynamic folding energies of RNA structures and gives a visual impression of the ability of an RNA molecule to fold into a well-defined structure. The significance of energy values, i.e., whether these secondary structures are likely to be functionally important, was tested by comparing these values against a distribution of optimal folding energies for 50 randomized sequences of the same length and base composition. This procedure was recently used to test whether significant RNA secondary structure exists within cellular mRNAs (Seffens and Digby 1999).
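
The randomization test can be outlined as follows. This Python sketch is a generic permutation test and not the exact statistic used in the analysis: `energy_fn` is a hypothetical stand-in for a call to a folding program such as MFOLD, and the (hits + 1)/(n + 1) form of the empirical P value is one common convention adopted here as an assumption.

```python
import random

def shuffled(seq, rng):
    """Random permutation of seq: same length and base composition."""
    chars = list(seq)
    rng.shuffle(chars)
    return "".join(chars)

def empirical_p(observed, null_values):
    """Share of randomized energies at least as low as the observed energy."""
    hits = sum(1 for value in null_values if value <= observed)
    return (hits + 1) / (len(null_values) + 1)

def randomization_test(seq, energy_fn, n_random=50, seed=0):
    """Compare the energy of seq with that of n_random shuffled versions."""
    rng = random.Random(seed)
    null = [energy_fn(shuffled(seq, rng)) for _ in range(n_random)]
    return empirical_p(energy_fn(seq), null)

def toy_energy(seq):
    """Placeholder score only; NOT a thermodynamic folding energy."""
    return -float(seq.count("GC"))

print(randomization_test("GCGCGCAUAUAUGCGCAUAU", toy_energy))
```

A large P value indicates that the native sequence folds no more stably than randomized sequences of the same composition, which is the pattern interpreted as a lack of significant secondary structure.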


    Results
 
Simulation Studies
To illustrate the conditions under which an RNA virus population can form a quasispecies distribution, a simple computer simulation was performed incorporating a unique master sequence of relative fitness 1.0 and a set of surrounding sequences with fitness 0.1. The population initially consisted of identical copies of the master sequence, and the structure of the mutant distribution is shown following 1,000 and 2,000 replication cycles at a mutation rate of one mutation per genome per replication (fig. 1). This analysis demonstrated that the mutant distribution reached a steady state in which the frequency of the master sequence and each class of mutant remained constant over time. This population could be viewed as a quasispecies because it was both heterogeneous (there were over three times as many mutants as copies of the master sequence) and at equilibrium, in that the master sequence persisted in the population at a constant frequency, and random drifting of the mutant distribution was restricted.



 
Fig. 1.—Frequencies of the master sequence and surrounding mutants following 1,000 and 2,000 replication cycles at a mutation rate of one mutation per genome per replication

 
As the mutation rate was increased, the frequency of the master sequence after 1,000 replication cycles decreased until reaching 0 at a mutation rate of 2.3 mutations per genome per replication (fig. 2). At this point, the mutation rate was too high for the master sequence to survive in the population over time. Of 200 genomes randomly sampled from the population after 1,000 replication cycles at this mutation rate, none was still present after a further 10 cycles. Therefore, above an error threshold, the mutant distribution ceased to be centered around the master sequence and instead drifted stochastically over sequence space. The resulting population was not a quasispecies, since natural selection was unable to maintain the presence of any single variant in the population over time, and consequently the equilibrium condition was not fulfilled. Increasing the ratio of the fitness of wild-type to mutant sequences in our simulation was found to increase the error threshold, as did increasing the population size. This is in agreement with theoretical predictions (Swetina and Schuster 1982; Demetrius, Schuster, and Sigmund 1985).
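
For orientation, the position of the error threshold can be compared with the classic single-peak approximation. This is standard quasispecies theory rather than a result of the simulations, and it assumes an infinite population, no back mutation to the master, and a Poisson number of mutations per replication. With a fitness ratio σ between master and mutants and a genomic mutation rate U, the master is copied without error with probability e^(-U), its deterministic equilibrium frequency is (σe^(-U) - 1)/(σ - 1), and it is lost once U exceeds ln σ. The function names in this Python sketch are illustrative.

```python
import math

def master_equilibrium(rate, sigma=10.0):
    """Deterministic master frequency (no back mutation); 0 above threshold."""
    x = (sigma * math.exp(-rate) - 1.0) / (sigma - 1.0)
    return max(x, 0.0)

def error_threshold(sigma=10.0):
    """Genomic mutation rate above which the master cannot be maintained."""
    return math.log(sigma)

print(error_threshold(10.0))
for rate in (0.5, 1.0, 2.0, 3.0):
    print(rate, master_equilibrium(rate))
```

For σ = 10 the approximation gives a threshold of ln 10, about 2.30 mutations per genome per replication, close to the simulated value of 2.3. Finite populations and the point in the cycle at which frequencies are measured can shift the simulated frequencies away from the deterministic curve.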



 
Fig. 2.—Frequency of the master sequence as a function of mutation rate. Each simulation was run for 1,000 replication cycles

 
The model of selection used in the previous simulations had two important biologically unrealistic features. First, as sites within genomes were assumed to act independently, the adaptive landscape contained only one peak, whereas in real viral populations, epistatic interactions may exist among sites, giving rise to local optima. The steepness and frequency of local optima increases as the number of interactions among sites increases (Kauffman 1993), and steep fitness peaks would potentially favor the formation of a quasispecies by trapping populations within small regions of sequence space, thereby preventing individual variants from diffusing randomly. Burch and Chao (1999) have proposed that the fitness landscape of bacteriophage φ6 is likely to be rugged in such a way, hence lending support to a quasispecies model for this virus.

Second, selection was assumed to act equally on all sites within a genome, whereas in reality, sites range from neutral to highly adaptive. However, if an RNA virus genome contains a set of sites which both have little or no effect on fitness and do not interact epistatically, the number of possible sequences close to maximum fitness may greatly exceed the population size. This in itself would prevent the formation of a quasispecies, because the population would not be able to explore all the neutral space surrounding a fitness peak and would be subject to continual genetic drift and therefore, by definition, unable to reach an equilibrium. Under these conditions, the mutant distribution will diverge and become dispersed over the neutral space, thereby preventing extensive mutational coupling among variants during replication and hence preventing natural selection from acting on the mutant distribution as a whole. Since every neutral site within a genome increases the size of the neutral space by a factor of four, it takes very few independent neutral sites for the number of possible genomes to exceed realistic population sizes: 20 independent neutral sites would be sufficient for the neutral space to exceed a population size of 10^10, which is typical of estimates of the census population sizes of RNA viruses within individual hosts during natural infections. For example, this value has been estimated to be on the order of 10^8 for HIV (Leigh Brown 1997) and between 10^9 and 10^12 for FMDV (Domingo et al. 1992), although effective population sizes, that is, the number of viruses contributing progeny to the next generation, may be several orders of magnitude lower (Leigh Brown 1997).
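
The arithmetic behind this argument is simply 4^n for n independent neutral sites. The Python sketch below (function names are illustrative) computes the size of the neutral space and the smallest number of neutral sites at which it exceeds a given population size.

```python
def neutral_space(n_sites, alphabet=4):
    """Number of distinct genomes that differ only at n_sites neutral sites."""
    return alphabet ** n_sites

def min_sites_to_exceed(pop_size, alphabet=4):
    """Smallest number of neutral sites whose neutral space exceeds pop_size."""
    n = 0
    while alphabet ** n <= pop_size:
        n += 1
    return n

print(neutral_space(20))            # about 1.1 x 10^12
print(min_sites_to_exceed(10**10))  # fewer than 20 sites already suffice
print(neutral_space(20) / 200_000)  # relative to the simulated population
```

With 20 neutral sites the neutral space holds about 1.1 × 10^12 genomes, which exceeds a population of 10^10 by two orders of magnitude and the simulated population of 200,000 by more than 10^6-fold.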

We therefore added neutral sites to our previous model of an RNA virus population. Initially, the master sequence was divided into 10 neutral sites and 90 negatively selected sites. After 1,000 replication cycles, no copies of the master sequence were present and the structure of the mutant distribution changed rapidly upon subsequent replication (fig. 3). Out of 200 genomes randomly sampled from the population after 1,000 replication cycles, no copies of these sequences were present 100 cycles later. The equilibrium condition was not satisfied, since no variants survived within the population over time, and as a result, the population could not be considered a quasispecies. There was a clear lack of any organization within the mutant distribution, and individual variants appeared to be independent entities undergoing a random walk through sequence space. Changing the mutation rate, genome length, or ratio of the fitness of the master sequence to sequences with mutations at negatively selected sites, while keeping the population size and number of neutral sites constant, did not alter the population's inability to reach an equilibrium (results available on request). Critically, though, while the mutant distribution was in a permanent state of genetic drift, the consensus sequence was stable over the course of 1,000 replication cycles. At negatively selected sites, this sequence contained only the favored nucleotide, but at neutral sites it contained an approximately even mixture of all four nucleotides, where the proportion of each was in the range 0.15–0.35. A stable consensus sequence was also obtained when the number of neutral sites was increased to 20. Under such parameters, the neutral space was over 10^6 times as large as the population size.



 
Fig. 3.—a, The frequency of 200 randomly sampled genomes after 1,000 replication cycles under a simple neutral model of sequence evolution in which the ratio of neutral to negatively selected sites was 10:90 and the mutation rate was one mutation per genome per replication. b, The frequency of these genomes after a further 10 replication cycles. c, The frequency of these genomes after a further 100 replication cycles

 
Empirical Analysis
Since genomes of RNA viruses are among the most compact of all organisms, it is unlikely that many nucleotides are wasted as nonfunctional nucleic acids. Even synonymous sites within protein-coding regions may not be neutral due to biases in codon usage or constraints imposed by RNA secondary structure. To determine whether real RNA viruses have neutral sites, which we have shown is critical to the rejection of the quasispecies model, we carried out an analysis of VSV glycoprotein (G) sequences. Since VSV causes acute infections, samples of sequences taken from single hosts would be expected to be highly homogeneous because there has been little time to generate substantial variation. Consequently, sequences from temporally and geographically distinct isolates were used instead to estimate the number of neutral sites. Our assumption was that sites which were neutral over a long period of evolutionary time would also be neutral during the course of a single infection and so could be used to determine whether the formation of a quasispecies in a single host was possible if enough time and a constant adaptive environment were provided. The G gene was chosen because, in addition to its being relatively long (1.5 kb), there are a comparatively large number of sequences available for it.

An alignment of 34 complete G sequences from natural isolates of the New Jersey serotype contained 338 synonymous sites and 531 polymorphic sites, of which 419 were at third codon positions. The ratio of synonymous to nonsynonymous substitutions (dS/dN) across the protein-coding sequence was 20.4 ± 1.89, and a sliding-window analysis of their distribution is shown in figure 4. The large excess of synonymous substitutions and their relatively even distribution strongly suggested that most synonymous sites were under very little selective constraint. This was confirmed by a Tajima (1989) D test, for which the null hypothesis that the genetic variation in the data set (all sites) was entirely due to neutral mutation was not rejected (D = 1.54, P > 0.05).
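
For readers unfamiliar with the statistic, Tajima's D compares two estimates of diversity: the mean number of pairwise differences and the number of segregating sites scaled by a sample-size constant. The Python sketch below implements the standard formula for gap-free aligned sequences. It is an illustration only; it is not the software used for the analysis reported here, and the toy alignments are invented.

```python
from itertools import combinations
from math import sqrt

def tajimas_d(seqs):
    """Tajima's D for equal-length aligned sequences without gaps (n >= 4)."""
    n = len(seqs)
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if n < 4 or s == 0:
        raise ValueError("need at least 4 sequences and 1 segregating site")
    pairs = list(combinations(seqs, 2))
    # Mean number of pairwise differences
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))

# Invented toy alignments: rare variants give D < 0, balanced variants D > 0.
print(tajimas_d(["AAAA", "CAAA", "ACAA", "AACA"]))
print(tajimas_d(["AA", "AA", "CC", "CC"]))
```

Values of D near zero are what the neutral model predicts for a population of constant size; significance is judged against the distribution of D for the given sample size.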



 
Fig. 4.—Average numbers of synonymous and nonsynonymous substitutions per site (dS and dN per site) for a 20-codon sliding window along the VSV G gene

 
We also found no evidence of strong codon bias or extensive RNA secondary structure within the VSV G gene. Codon bias was calculated using the effective number of codons (NC), and an average value of 51.4 was found in our data (range 48.8–53.6), suggesting only a slight bias in codon choice. Likewise, an analysis of RNA folding potential along the genomic and coding sense of this gene using the MFOLD program revealed that while the existence of some local secondary structures could not be ruled out, there appeared to be long regions which lacked the ability to fold into a well-defined structure. For example, the optimal folding energy of the 500-nt region from position 1,000 to position 1,500 of this gene (coding sense) was similar to that of randomized sequences of the same length and base composition (P > 0.2 with a confidence level of >0.99).

Overall, these results suggest that some synonymous sites within the VSV G gene are neutral and may not interact epistatically due to lack of extensive RNA secondary structure. Given that synonymous sites account for 20% of sites along this gene, that a 500-nt region lacks significant RNA secondary structure, and that there is little evidence for codon bias due to either mutation pressure or selection, we propose that a conservative estimate for the number of independent neutral sites within this gene is 50. Even if this were the total number of such sites within the entire genome, the neutral space surrounding a fitness peak would still be far too large to allow a population of realistic size to form a quasispecies. For example, the neutral space would be 10^20 times as large as a population size of 10^10. In reality, of course, the number of independent neutral sites within the whole genome may be far greater, and it is also likely that some nonsynonymous and noncoding sites may be both neutral and independent of each other.

A similar sequence analysis was carried out on 16 sequences of the foot-and-mouth disease virus (FMDV) polyprotein 1 (P1) gene, since FMDV is also widely considered an archetypal quasispecies virus (Domingo et al. 1992). Our findings were similar to those obtained for VSV. Out of 2,038 sites, 976 were polymorphic, of which 609 were located at third codon positions. The ratio of synonymous to nonsynonymous substitutions (dS/dN) was 12.5 ± 0.6, and a sliding-window analysis showed that synonymous changes were evenly distributed along the gene (not shown). The average effective number of codons was 54.0 (range 52.6–55.6), and regions lacking significant RNA secondary structure in both the genomic and the antigenomic sense were also found, for example, the 500-nt stretch beginning 1,225 nt downstream of the start codon. Therefore, it is also possible that the neutral space for FMDV exceeds realistic population sizes and thus prevents the formation of a quasispecies distribution. However, in contrast to VSV, a Tajima D test of this data set found some evidence of selection (D = 2.21, 0.01 < P < 0.05), which was expected, since the P1 gene of FMDV is known to undergo positive selection at antigenic sites (Martin et al. 1998).


    Discussion
 
Our simulation study illustrated that two factors are critical in determining whether a viral population will form a quasispecies: the mutation rate and the relative sizes of the population and neutral space surrounding a fitness peak. The mutation rate should be high enough to generate a heterogeneous distribution of mutants but low enough for the master sequence(s) to survive within the population over time, and the population size should be large enough to enable the viral population to completely explore the neutral space surrounding a fitness peak.

Our simulations also showed that a stable consensus sequence was not in itself sufficient for a population to qualify as a quasispecies. In the Qβ phage experiments (Domingo et al. 1978), mutants differed from the consensus sequence by an average of only one to two nucleotides, so the neutral space may indeed be small enough to be completely explored by populations of this virus. In contrast, variants within a multiply passaged population of VSV grown from wild-type virus were roughly estimated to differ from each other at "dozens" of positions, indicating that this virus has an extremely wide mutant distribution (Steinhauer et al. 1989). Moreover, many synonymous sites within VSV appear to be evolving in a neutral manner, so that the neutral space surrounding a fitness peak is far too large for populations of realistic size to explore fully. We therefore propose that a "random walk" model, in which single genomes are independent evolutionary entities, as predicted by conventional population genetic models, is a more appropriate description of VSV evolution than is the quasispecies theory.

Ironically, the experiments of Steinhauer et al. (1989), which led to VSV being classified as a quasispecies, support our hypothesis in showing the mutant distribution of VSV to be extremely dispersed, making it implausible that such a distribution could realistically form a quasispecies. Our results also clearly demonstrate that a stable consensus sequence, as observed by Steinhauer et al. (1989), is not sufficient to conclude that a population is at equilibrium and hence a quasispecies. Finally, the observation that a high-fitness variant of VSV isolated from a quasispecies distribution can only rise to dominance if seeded above a certain threshold (De La Torre and Holland 1990) can also be explained by a nonquasispecies model. Specifically, if genetic drift is in operation, then the probability that a high-fitness variant will rise to dominate a population is partially a function of its initial frequency. Hence, most rare variants will be lost by drift despite having superior fitness, particularly if the population size is small. Since in the VSV experiments passages were performed at a low multiplicity of infection, this is perhaps a more reasonable interpretation of these observations. It is further possible that the suppression of high-fitness variants could occur through clonal interference, in which beneficial mutations that occur in different lineages of an asexual population compete with one another for fixation (Miralles et al. 1999). A lower-frequency mutant will have less chance of accumulating new beneficial mutations than higher-frequency variants, so a high-frequency, low-fitness variant could prevent the fixation of a low-frequency, high-fitness variant.
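
The dependence of fixation on initial frequency under drift can be made concrete with Kimura's diffusion approximation from conventional population genetics. The Python sketch below is an illustration added under stated assumptions and is not part of the analyses reported here: it assumes a haploid Wright-Fisher population of constant effective size n, a constant selective advantage s, and no clonal interference, and the parameter values are arbitrary.

```python
import math

def fixation_probability(p, s, n):
    """Diffusion approximation for a haploid population of effective size n.

    p: initial frequency of the variant; s: its selective advantage.
    """
    if s == 0:
        return p  # a neutral variant fixes with probability equal to p
    return math.expm1(-2 * n * s * p) / math.expm1(-2 * n * s)

n, s = 1000, 0.01
for p in (1 / n, 0.01, 0.1, 0.5):
    print(p, fixation_probability(p, s, n))
```

Under these assumptions a single copy of a variant with a 1% advantage fixes with probability of only about 2s, whereas the same variant seeded at a frequency of 10% is far more likely to fix, so a seeding threshold can arise without any interaction among members of the mutant distribution.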

The key question which now arises is how many other RNA viruses like VSV are inappropriately classified as quasispecies? Is the evolution of RNA viruses really different from that of DNA-based organisms, or are RNA viruses simply capable of more variation without being fundamentally distinct from their DNA counterparts? It is particularly important to address this question with respect to HIV and hepatitis C virus, both of which are of great medical significance and routinely described as quasispecies. As yet, there is no evidence from either virus that individual variants of lower fitness can outcompete those with higher fitness by being surrounded by better adapted mutants, a key departure of the quasispecies theory from conventional population genetics. Moreover, although continual selective sweeps may purge the variation in such populations, resulting in a narrow mutant distribution, a quasispecies must also possess a stable consensus sequence, which will not be possible in the face of constantly changing adaptive environments. More fundamentally, as the quasispecies model was intended to refer to a precise evolutionary theory rather than to merely be a surrogate term for variation, the onus is on those wishing to describe RNA viruses as quasispecies to rigorously show that the conditions of the theory are fulfilled by real RNA viruses. This is particularly pertinent given the contrary evidence that RNA viruses can be comfortably housed within conventional population genetic theory (Moya et al. 2000).

To date, Domingo et al.'s (1978) classic experiments on Qβ phage are probably the most convincing evidence that RNA viruses can form quasispecies. However, it is also possible that Qβ phage is not a good representative of RNA viruses in general due to its small size (4.2 kb); such a highly compact genetic organization with overlapping open reading frames and extensive functionally important RNA secondary structure could mean that the neutral space surrounding fitness peaks is indeed small enough to be completely explored by typical populations. In contrast, larger RNA viruses like VSV may have enough genomic flexibility to make the formation of an equilibrium state impossible even in a constant adaptive environment. Specifically, the genomes of larger RNA viruses would be expected to contain more neutral sites, as they are longer and have a greater proportion of neutral sites due to their weaker structural/functional constraints. Therefore, if the population sizes of smaller and larger RNA viruses are similar, the size of the neutral space relative to the population size would be much greater in the case of the latter, and hence the evolution of smaller and larger RNA viruses may be different. Our results are also relevant to subviral RNA pathogens like viroids and satellite RNAs, which form extensive secondary structures but lack open reading frames. Since extensive neutral networks are proposed to exist in RNA molecules, where a neutral change is considered one that does not alter RNA structure (Schuster et al. 1994), the number of sequences close to maximum fitness could also potentially exceed the population size for these pathogens, thereby preventing the formation of quasispecies.
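The scaling of neutral space with the number of neutral sites can be made concrete with a short counting sketch. It is an illustration only, under the simplifying assumption that neutral sites are independent and that each can take any of the three alternative nucleotides; the function names, the site counts, and the population size of 10^9 are hypothetical and are not estimates for any particular virus.

```python
from math import comb
from typing import Optional


def neutral_neighborhood_size(n_sites: int, max_mutations: int) -> int:
    """Count sequences within max_mutations changes of a reference sequence.

    Only n_sites neutral sites are allowed to vary, and each mutated site
    can take one of 3 alternative nucleotides (simplifying assumption).
    """
    return sum(comb(n_sites, k) * 3**k for k in range(max_mutations + 1))


def smallest_radius_exceeding(n_sites: int, population_size: int) -> Optional[int]:
    """Smallest mutational radius whose neutral neighborhood outnumbers the population."""
    for radius in range(n_sites + 1):
        if neutral_neighborhood_size(n_sites, radius) > population_size:
            return radius
    return None  # the whole neutral space is no larger than the population


# Hypothetical values, for illustration only.
POPULATION_SIZE = 10**9
for n_sites in (100, 1000):
    radius = smallest_radius_exceeding(n_sites, POPULATION_SIZE)
    print(f"{n_sites} neutral sites: population outnumbered at radius {radius}")
```

Because the count grows combinatorially with the number of neutral sites, a genome with more such sites outnumbers a population of fixed size within fewer mutational steps of the fittest sequence, which is the sense in which the neutral space of larger genomes is harder to explore completely.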

Recently, it has been suggested that a quasispecies can exist at a phenotypic level even if the conditions are not met at a genotypic level, since a stable RNA secondary structure can survive within a population of randomly drifting genotypes (Huynen, Stadler, and Fontana 1996; Schuster and Stadler 1999). With regard to RNA viruses, the "phenotypic quasispecies" could correspond to a backbone of nucleotide sites that are essential for virus replication. However, such reformulation of quasispecies theory makes it convergent with conventional population genetic theory in that genetic drift is now allowed to play a role in sequence evolution. The only difference is that this phenotypic-quasispecies model still allows a phenotype of lower individual fitness to outcompete a phenotype of higher individual fitness if the former is surrounded by higher-fitness phenotypes than the latter. This could occur if the error rate in replicating a phenotype were high, and if the phenotypic fitness landscape contained both high, sharp peaks and low, broad peaks. Critically, there is as yet no empirical evidence from RNA viruses which supports such a model.
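How a low, broad peak might displace a high, sharp one at a high error rate can be sketched with a deliberately crude calculation. This is a toy approximation, not the quasispecies equations and not a model used in this study: each peak is summarized by a replication rate and by the fraction of mutant offspring that retain the phenotype, back mutation is ignored, and all names and numerical values are hypothetical.

```python
def effective_growth(replication_rate: float,
                     error_rate: float,
                     neutral_fraction: float) -> float:
    """Rate at which a phenotype produces offspring that keep the phenotype.

    error_rate is the probability that an offspring carries a mutation;
    neutral_fraction is the share of those mutants still on the peak.
    Offspring that leave the peak are treated as lost (toy assumption).
    """
    lost = error_rate * (1.0 - neutral_fraction)
    return replication_rate * (1.0 - lost)


def favored_peak(error_rate: float,
                 sharp=(2.0, 0.0),
                 broad=(1.5, 0.8)) -> str:
    """Return which hypothetical peak has the higher effective growth rate.

    Each peak is given as (replication_rate, neutral_fraction).
    """
    sharp_growth = effective_growth(sharp[0], error_rate, sharp[1])
    broad_growth = effective_growth(broad[0], error_rate, broad[1])
    return "sharp" if sharp_growth > broad_growth else "broad"


# Hypothetical error rates, for illustration only.
for error_rate in (0.1, 0.3, 0.5):
    print(f"error rate {error_rate}: {favored_peak(error_rate)} peak favored")
```

With these illustrative values the sharp peak is favored at low error rates and the broad peak at high ones, so the outcome depends on the error rate and the shape of the landscape rather than on individual replication rate alone.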

That the term "quasispecies" has become so deeply ingrained in RNA virus terminology is at best unhelpful. Remarkably, the term has even been applied to Epstein-Barr virus (Gutierrez et al. 1998) and Helicobacter pylori (Blaser 1997), both DNA-based organisms with genome sizes several-fold greater than those of RNA viruses and with demonstrably lower mutation rates. Clearly, many biologists use the term but do not fully understand the implications of the quasispecies theory. As we have shown in this study, extreme heterogeneity within clonal populations of finite size may actually be evidence against the quasispecies model. In sum, it would seem unwise to uncritically use quasispecies theory as a model of RNA virus evolution.


    Acknowledgements
 
This work was supported by research grants from the Royal Society, the Wellcome Trust, and the Rhodes Trust. We are also grateful to Lin Chao for valuable comments on an earlier draft of this manuscript.


    Footnotes
 
Adam Eyre-Walker, Reviewing Editor

1 Keywords: RNA viruses, quasispecies, sequence space, neutral evolution, vesicular stomatitis virus

2 Address for correspondence and reprints: Edward C. Holmes, Department of Zoology, University of Oxford, South Parks Road, Oxford OX1 3PS, United Kingdom. edward.holmes{at}zoo.ox.ac.uk


    literature cited
 

    Blaser, M. J. 1997. Heterogeneity of Helicobacter pylori. Eur. J. Gastroenterol. Hepatol. 9:S3–S6

    Burch, C. L., L. Chao. 1999. Evolution by small steps and rugged landscapes in the RNA virus φ6. Genetics. 151:921–927

    De La Torre, J. C., J. J. Holland. 1990. RNA virus quasispecies populations can suppress vastly superior mutant progeny. J. Virol. 64:6278–6281

    Demetrius, L., P. Schuster, K. Sigmund. 1985. Polynucleotide evolution and branching processes. Bull. Math. Biol. 47:239–262

    Domingo, E. 1992. Genetic variation and quasispecies. Curr. Opin. Genet. Dev. 2:61–63

    Domingo, E., C. Escarmis, M. A. Martinez, E. Martinez-Salas, M. G. Mateu. 1992. Foot-and-mouth disease virus populations are quasispecies. Curr. Top. Microbiol. Immunol. 176:33–47

    Domingo, E., J. J. Holland. 1997. RNA virus mutations and fitness for survival. Annu. Rev. Microbiol. 51:151–178

    Domingo, E., E. Martinez-Salas, F. Sobrino et al. (14 co-authors). 1985. The quasispecies (extremely heterogeneous) nature of viral RNA genome populations: biological relevance—a review. Gene. 40:1–8

    Domingo, E., D. Sabo, T. Taniguchi, C. Weissmann. 1978. Nucleotide sequence heterogeneity of an RNA phage population. Cell. 13:735–744

    Drake, J. W., B. Charlesworth, D. Charlesworth, J. F. Crow. 1998. Rates of spontaneous mutation. Genetics. 148:1667–1686

    Eigen, M. 1987. New concepts for dealing with the evolution of nucleic acids. Cold Spring Harb. Symp. Quant. Biol. 52:307–319

    ———. 1993. The origin of genetic information: viruses as models. Gene. 135:37–47

    ———. 1996a. On the nature of viral quasispecies. Trends Microbiol. 4:216–218

    ———. 1996b. Steps towards life. Oxford University Press, New York

    Eigen, M., P. Schuster. 1977. A principle of natural self-organization. Naturwissenschaften. 64:541–565

    Forns, X., R. H. Purcell, J. Bukh. 1999. Quasispecies in viral persistence and pathogenesis of hepatitis C virus. Trends Microbiol. 7:402–410

    Goldman, N., Z. Yang. 1994. A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol. Biol. Evol. 11:725–736

    Gutierrez, M. I., G. Spangler, D. Kingma, M. Raffeld, I. Guerrero, O. Misad, E. S. Jaffe, I. T. Magrath, K. Bhatia. 1998. Epstein-Barr virus in nasal lymphomas contains multiple ongoing mutations in the EBNA-1 gene. Blood. 92:600–606

    Holland, J. J., J. C. De La Torre, D. A. Steinhauer. 1992. RNA virus populations as quasispecies. Curr. Top. Microbiol. Immunol. 176:1–20

    Huynen, M. A., P. F. Stadler, W. Fontana. 1996. Smoothness within ruggedness: the role of neutrality in adaptation. Proc. Natl. Acad. Sci. USA. 93:397–401

    Kauffman, S. A. 1993. The origins of order. Oxford University Press, New York

    Leigh Brown, A. J. 1997. Analysis of HIV-1 env gene sequences reveals evidence for a low effective number in the viral population. Proc. Natl. Acad. Sci. USA. 94:1862–1865

    Martin, M. J., J. I. Nunez, F. Sobrino, J. Dopazo. 1998. A procedure for detecting positive selection in highly variable genomes: evidence of positive selection in antigenic regions of capsid protein VP1 of foot-and-mouth disease virus. J. Virol. Methods. 74:215–221

    Mathews, D. H., J. Sabina, M. Zuker, D. H. Turner. 1999. Expanded sequence dependence of thermodynamic parameters provides robust prediction of RNA secondary structure. J. Mol. Biol. 288:911–940

    Miralles, R., P. J. Gerrish, A. Moya, S. F. Elena. 1999. Clonal interference and the evolution of RNA viruses. Science. 285:1745–1747

    Moya, A., S. F. Elena, A. Bracho, R. Miralles, E. Barrio. 2000. The evolution of RNA viruses: a population genetics view. Proc. Natl. Acad. Sci. USA. 97:6967–6973

    Nielsen, R., Z. Yang. 1998. Likelihood models for detecting positively selected amino acid sites and applications to the HIV-1 envelope gene. Genetics. 148:929–936

    Nowak, M. A. 1992. What is a quasispecies? Trends Ecol. Evol. 7:118–121

    Schuster, P., P. F. Stadler. 1999. Nature and evolution of early replicons. Pp. 1–24 in E. Domingo, R. Webster, and J. Holland, eds. Origin and evolution of viruses. Academic Press, London

    Schuster, P., W. Fontana, P. F. Stadler, I. L. Hofacker. 1994. From sequences to shapes and back: a case study in RNA secondary structure. Proc. R. Soc. Lond. B Biol. Sci. 255:279–284

    Seffens, W., D. Digby. 1999. mRNAs have greater negative folding free energies than shuffled or codon choice randomized sequences. Nucleic Acids Res. 27:1578–1584

    Steinhauer, D. A., J. C. De La Torre, E. Meier, J. J. Holland. 1989. Extreme heterogeneity in populations of vesicular stomatitis virus. J. Virol. 63:2072–2080

    Steinhauer, D. A., J. J. Holland. 1987. Rapid evolution of RNA viruses. Annu. Rev. Microbiol. 41:409–433

    Swetina, J., P. Schuster. 1982. A model for polynucleotide replication. Biophys. Chem. 16:329–345

    Tajima, F. 1989. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics. 123:585–595

    Thompson, J. D., D. G. Higgins, T. J. Gibson. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673–4680

    Wain-Hobson, S. 1992. Human immunodeficiency virus type 1 quasispecies in vivo and ex vivo. Curr. Top. Microbiol. Immunol. 176:181–193

    Wright, F. 1990. The effective number of codons used in a gene. Gene. 87:23–29

Accepted for publication February 8, 2001.