Phylogenetic analysis reveals a low rate of homologous recombination in negative-sense RNA viruses

Elizabeth R. Chare1, Ernest A. Gould2 and Edward C. Holmes1

1 Department of Zoology, University of Oxford, South Parks Road, Oxford OX1 3PS, UK
2 Centre for Ecology and Hydrology, Mansfield Road, Oxford, UK

Correspondence
Edward Holmes
Edward.Holmes@zoo.ox.ac.uk


   ABSTRACT
Recombination is increasingly seen as an important means of shaping genetic diversity in RNA viruses. However, observed recombination frequencies vary widely among those viruses studied to date, with only sporadic occurrences reported in RNA viruses with negative-sense genomes. To determine the extent of homologous recombination in negative-sense RNA viruses, phylogenetic analyses of 79 gene sequence alignments from 35 negative-sense RNA viruses (a total of 2154 sequences) were carried out. Strong evidence for recombination, in the form of incongruent phylogenetic trees between different gene regions, was found in only five sequences from Hantaan virus, Mumps virus and Newcastle disease virus. This is the first report of recombination in these viruses. More tentative evidence for recombination, where conflicting phylogenetic trees were observed (but were without strong bootstrap support) and/or where putative recombinant regions were very short, was found in three alignments from La Crosse virus and Puumala virus. Finally, patterns of sequence variation compatible with the action of recombination, but not definitive evidence for this process, were observed in a further ten viruses: Canine distemper virus, Crimean-Congo haemorrhagic fever virus, Influenza A virus, Influenza B virus, Influenza C virus, Lassa virus, Pirital virus, Rabies virus, Rift Valley Fever virus and Vesicular stomatitis virus. The possibility of recombination in these viruses should be investigated further. Overall, this study reveals that rates of homologous recombination in negative-sense RNA viruses are very much lower than those of mutation, with many viruses seemingly clonal on current data. Consequently, recombination rate is unlikely to be a trait that is set by natural selection either to create advantageous genotypes or to purge deleterious mutations.


   INTRODUCTION
RNA viruses are notorious for their ability to evolve rapidly, which can be attributed to their high rates of mutation and replication and large population sizes (Domingo & Holland, 1997). However, another evolutionary process, recombination, is being recognized increasingly as a potentially important means of generating and shaping genetic diversity in these infectious agents (Lai, 1992; Worobey & Holmes, 1999).

Despite the mounting evidence for recombination in RNA viruses, it is also apparent that recombination rates vary extensively. Recombination appears to be relatively frequent in retroviruses (Hu & Temin, 1990; Jung et al., 2002) and in viruses with segmented genomes in the guise of reassortment (Gorman et al., 1992). Furthermore, there is now compelling evidence for more sporadic recombination in a variety of positive-sense RNA viruses from animals (Ball, 1997; Lai, 1992; Liao & Lai, 1992; Strauss & Strauss, 1997; Worobey et al., 1999), plants (Aaziz & Tepfer, 1999; Roossinck, 1997; Simon & Bujarski, 1994) and bacteria (Chetverin, 1997; Mindich, 1996). In contrast, there are few reports of recombination (aside from reassortment) in negative-sense RNA viruses, which has led some authors to suggest that these viruses are characterized by low rates of recombination (Pringle & Parry, 1982). To date, evidence for recombination has only been documented in some ambisense arenaviruses (Archer & Rico-Hesse, 2002; Charrel et al., 2001), hantaviruses (Klempa et al., 2003; Sibold et al., 1999; Sironen et al., 2001) and Influenza A virus (Gibbs et al., 2001). Moreover, the case for recombination in Influenza A virus is controversial, with a re-analysis of the critical data indicating that the unusual pattern of evolution in an isolate of the ‘Spanish flu’ from 1918 was more likely due to variation in nucleotide substitution rates among different regions of the haemagglutinin (HA) gene than recombination between human and pig strains (Worobey et al., 2002).

There are two general explanations for why recombination rates are so variable in RNA viruses. One hypothesis is that recombination rate is a characteristic set by natural selection, since this process confers two types of fitness advantage that might benefit some viruses more than others. First, recombination can create and spread advantageous genotypes more rapidly than occurs in clonal systems. As an example, it is likely that frequent recombination in Human immunodeficiency virus facilitates the spread of drug-resistant mutants (Burke, 1997; Moutouh et al., 1996). The second major fitness benefit is that deleterious mutations can be removed by recombination with mutation-free parts of different genomes, thereby enabling weak or non-replicative mutant strains to recover viable genome sequences. When the population sizes of RNA viruses are small, which may occur if there are major bottlenecks at transmission, such purging of deleterious changes by recombination may allow viruses to escape the long-term reduction in fitness associated with clonality (‘Muller's ratchet’), as demonstrated in some experimental systems (Chao, 1990; Duarte et al., 1992). Given that the rate of deleterious mutation in RNA viruses may be ~1 per genome per replication (Drake & Holland, 1999; Elena & Moya, 1999), selective pressure to preserve RNA viruses from fitness losses could be considerable.

Alternatively, there may be purely mechanistic and/or ecological explanations for differing rates of recombination in RNA viruses. In this case, recombination rate is not a selectively determined trait in itself but rather the natural outcome of a particular genome structure or virus ecology. For example, there are several steps in the replication process that influence recombination rate. Most obviously, multiple virus strains must infect the same host cell. This may only occur at a very low rate if distinct virus strains are ecologically or geographically isolated, if there is a potent immune response that limits superinfection or if specific host factors block the entry of multiple viruses into single cells (Danis et al., 1993; Simon et al., 1990; Worobey & Holmes, 1999). Even if co-infection of individual cells is successful, physical constraints, such as the degree of sequence dissimilarity, may prevent the formation of viable hybrids and so inhibit recombination (Bujarski & Nagy, 1996). Finally, most recombinants, like most point mutations, are likely to be deleterious and so will be removed from the population by purifying selection. Indeed, the large-scale loss of recombinants by purifying selection has been observed in coronaviruses (Banner & Lai, 1991).

Given the potential importance of recombination in RNA virus evolution, it is imperative to determine the frequency with which it occurs and the factors that control its rate. To this end, we analysed the extent of recombination in all those negative-sense RNA viruses for which population samples are currently available in GenBank; this comprised 35 viruses from six families and totalled 2154 individual gene sequences. We focused only on RNA recombination occurring within genomic segments, which most likely occurs through copy-choice replication (Cooper et al., 1974; Lai, 1992). This process is distinct from reassortment in segmented RNA viruses, which is not considered in this paper. Furthermore, we consider only homologous recombination occurring within individual virus species, although rare non-homologous recombination has been reported in Influenza A virus (Orlich et al., 1994). Our analysis involved the use of three phylogenetic (bioinformatic) methods to detect recombination. Although these methods look for different signatures of recombination in gene sequence alignments, we regard the occurrence of phylogenetic incongruence, in which regions of a sequence alignment have significantly different evolutionary histories, as the only definitive evidence for recombination. The observation of phylogenetic incongruence was the first indication that recombination occurred in Human immunodeficiency virus (Robertson et al., 1995) and is regarded generally as the best evidence for recombination at the level of gene sequence analysis (Posada et al., 2002).


   METHODS
Sequence data.
A total of 79 gene sequence alignments comprising 2154 individual sequences and representing 35 negative-sense RNA viruses from six virus families was collected from GenBank (Table 1). Since the probability of finding recombination within a sequence alignment grows with increasing sequence length and the number of isolates analysed, the alignments for each gene comprised the longest contiguous stretch of sequence for which several isolates were available from individual virus species. Consequently, 47 of 79 alignments contained 20 or more isolates (average 27 sequences) and 56 alignments were more than 1000 bp in length (average 1502 bp). All sequences were aligned using the CLUSTALX program (Thompson et al., 1997) and sites with gaps were removed in most cases. The sequence alignments used in this study are available at http://evolve.zoo.ox.ac.uk/.
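The gap-removal step described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `strip_gap_columns`, the dictionary representation of an alignment and the toy sequences are all assumptions made for illustration, and the gap character is assumed to be `-`.

```python
def strip_gap_columns(alignment, gap_chars="-"):
    """Return a copy of an alignment with every gap-containing column removed.

    `alignment` maps isolate names to aligned sequences of equal length
    (a hypothetical in-memory representation, not a CLUSTALX file format).
    """
    names = list(alignment)
    seqs = [alignment[name] for name in names]
    if not seqs:
        return {}
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("sequences must be aligned to equal length")
    # Keep only the columns in which no sequence carries a gap character.
    keep = [i for i in range(len(seqs[0]))
            if all(s[i] not in gap_chars for s in seqs)]
    return {name: "".join(s[i] for i in keep) for name, s in zip(names, seqs)}


# Toy alignment (invented sequences): columns 2 and 4 contain gaps.
toy = {"iso1": "ATG-CA", "iso2": "ATGTCA", "iso3": "A-GTCA"}
cleaned = strip_gap_columns(toy)
```

Removing whole columns, rather than individual gap characters, keeps the remaining sites homologous across all isolates.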


Table 1. Sequence alignments of negative-sense RNA viruses used in this study

 
Testing for recombination
(i) Phylogenetic incongruence.
Our main approach to assessing the extent of recombination in negative-sense RNA viruses was to document cases in which different regions of a sequence alignment produced phylogenetic trees with significantly different topologies, so-called phylogenetic incongruence. This analysis involved three steps. First, for each alignment, we constructed neighbour-joining (NJ) trees for windows of 300–400 bp, moving the window along the sequence alignment in steps of the same size. This simple method enabled us to detect individual sequences that changed phylogenetic position. In all cases, this analysis was conducted assuming the HKY85 model of nucleotide substitution and a gamma distribution of rate variation among sites in the sequence alignment (four rate categories, shape parameter α=0·5). Next, for those sequences that changed phylogenetic position, we determined whether the differences in tree topology were greater than expected by chance alone and also identified the most likely break-points. This was achieved using a maximum-likelihood (ML) method (LARD) (Holmes et al., 1999). Here, a three-sequence alignment comprising the putative recombinant and the two closest ‘parental’ lineages identified in the NJ trees is split in two at every position and the branch lengths are then estimated for both regions. The break-point that gives the highest likelihood is identified and a likelihood ratio (LR) test is used to compare the likelihood of this ‘recombination tree’ to that of a tree in which recombination is not allowed. Significance values were determined using 200 Monte Carlo-simulated sequences (SEQ-GEN) (Rambaut & Grassly, 1997), which were simulated without recombination and subjected to the same break-point analysis as the real data. Finally, ML phylogenetic trees were estimated on either side of the break-point determined above and a bootstrap analysis was undertaken to assess the support for the conflicting phylogenetic positions of the putative recombinant. Strong evidence for phylogenetic incongruence, and hence recombination, is deemed to be present if the conflicting tree positions are each supported by >75 % of bootstrap replicates. In all cases, ML trees were found by successive rounds of tree bisection-reconnection branch-swapping in which the parameter values for the general time-reversible model of nucleotide substitution, the gamma distribution of among-site rate variation (with eight rate categories) and the base composition were estimated from the data. Bootstrap re-sampling analysis involved 1000 replicate NJ trees reconstructed under the ML substitution model determined previously. All these analyses were undertaken using PAUP* (Swofford, 2002).
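The bookkeeping around this three-step procedure can be sketched in Python. The sketch does not reproduce LARD, SEQ-GEN or PAUP*; it only shows how non-overlapping windows might be laid out, how a Monte Carlo P value could be read from simulated LR statistics, and how the >75 % bootstrap criterion is applied. All function names are hypothetical. Two conventions are assumptions rather than statements from the text: a final window shorter than the window size is kept, and the P value is taken as the fraction of simulated LR statistics at least as large as the observed one.

```python
def window_bounds(alignment_length, window=300):
    """Non-overlapping windows as 1-based inclusive (start, end) pairs.

    The step equals the window size; a shorter final window is kept
    (an assumption made for this sketch).
    """
    if alignment_length < 1 or window < 1:
        raise ValueError("alignment_length and window must be positive")
    bounds = []
    start = 1
    while start <= alignment_length:
        end = min(start + window - 1, alignment_length)
        bounds.append((start, end))
        start = end + 1
    return bounds


def monte_carlo_p(observed_lr, simulated_lrs):
    """Fraction of null simulations with an LR statistic >= the observed one.

    This convention is assumed here; the LR values themselves would come
    from the break-point analysis of real and simulated data.
    """
    if not simulated_lrs:
        raise ValueError("at least one simulated LR statistic is required")
    hits = sum(1 for lr in simulated_lrs if lr >= observed_lr)
    return hits / len(simulated_lrs)


def strong_incongruence(bootstrap_left, bootstrap_right, threshold=75.0):
    """True if both conflicting positions have bootstrap support > threshold (%)."""
    return bootstrap_left > threshold and bootstrap_right > threshold


windows = window_bounds(1000, window=400)
p_value = monte_carlo_p(10.0, [1.0] * 199 + [12.0])  # invented LR values
```

Under this convention, 200 simulations cannot resolve P values below 1/200 = 0·005, so a result in which no simulated statistic reaches the observed one would be reported as P<0·005.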

(ii) Sawyer's runs test.
The second method used to detect recombination, Sawyer's runs test (GENECONV) (Sawyer, 1989), also looks for distinct recombination break-points in sequence alignments. In this case, a search is made for unusually long fragments within an alignment over which a pair of sequences are identical, or nearly identical, even though these sequences do not share common ancestry (hence the output in this case consists of pairs of recombinant sequences). Whether these runs of similarity are longer than expected by chance is assessed using randomly permuted data sets derived from the same data. For each putative recombinant, a ‘global’ permutation P value is calculated, which reflects the proportion of permuted alignments for which some fragment from some pair of sequences has a higher score than for the reference sequences. In all cases, the default parameters were used with the following exceptions: insertions and deletions were skipped and the threshold for the global P value was set at 0·05. This method was shown recently in simulation studies to be a powerful, although conservative, method to detect recombination (Posada et al., 2002).
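The idea behind the runs test can be illustrated with a deliberately simplified Python sketch. It is not the GENECONV algorithm, whose fragment scoring is more elaborate: here the "score" of a pair is simply the longest run of consecutive polymorphic sites at which the two sequences agree, and the null distribution is obtained by shuffling the order of the polymorphic columns. The function names, the scoring rule and the use of ">=" in the comparison are all assumptions made for illustration.

```python
import random
from itertools import combinations


def polymorphic_columns(seqs):
    """Columns (as tuples) at which at least two sequences differ."""
    return [col for col in zip(*seqs) if len(set(col)) > 1]


def longest_shared_run(columns, i, j):
    """Longest run of consecutive columns where sequences i and j agree."""
    best = current = 0
    for col in columns:
        if col[i] == col[j]:
            current += 1
            best = max(best, current)
        else:
            current = 0
    return best


def max_run_over_pairs(columns, n_seqs):
    """Largest shared run found for any pair of sequences."""
    return max(longest_shared_run(columns, i, j)
               for i, j in combinations(range(n_seqs), 2))


def global_p_value(seqs, i, j, n_perm=1000, seed=0):
    """Proportion of column-shuffled alignments in which some pair has a
    run at least as long as the observed run for the pair (i, j)."""
    if len(seqs) < 2:
        raise ValueError("at least two sequences are required")
    rng = random.Random(seed)
    columns = polymorphic_columns(seqs)
    observed = longest_shared_run(columns, i, j)
    exceed = 0
    for _ in range(n_perm):
        shuffled = columns[:]
        rng.shuffle(shuffled)
        if max_run_over_pairs(shuffled, len(seqs)) >= observed:
            exceed += 1
    return observed, exceed / n_perm


# Invented toy data: sequence 1 matches sequence 0 on the left half
# and sequence 2 on the right half.
toy = ["AAAAAA", "AAATTT", "TTTTTT"]
observed_run, p_global = global_p_value(toy, 0, 1, n_perm=200)
```

Taking the maximum over all pairs in each permuted data set is what makes the P value "global": it accounts for the fact that many pairs are examined at once.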

(iii) Informative sites test.
Although searching for break-points is a powerful way to detect recombination in sequence alignments, the sequences analysed must be sufficiently divergent so that conflicting phylogenetic signals can be detected (Posada et al., 2002). Therefore, we employed a third method, the informative sites test, which can detect recombination among more closely related sequences and which also performed well in simulation studies (PIST) (Worobey, 2001). The test detects recombination by distinguishing the ‘apparent rate heterogeneity’ among nucleotide sites caused by recombination from the ‘real rate heterogeneity’ among sites caused by mutation. Such a distinction is possible because mutation and recombination affect the pattern of variability at polymorphic sites in different ways, as reflected in the proportion of two-state parsimony informative sites (that is, those variable sites that partition the sequence alignment into two phylogenetic groups) relative to all polymorphic sites, denoted by the ‘q’ parameter. The values of q computed for the real data are then compared to a null distribution (qc) obtained by simulating 200 data sets along a ML tree linking these data (with the ML substitution parameters) assuming no recombination. A P value is defined as the proportion of simulated alignments that satisfy the condition qc<q, with P<0·05 taken as significant evidence for recombination. To reduce the possibility that the results are affected by natural selection, which might also determine patterns of sequence variation at polymorphic sites, the analysis was only run on third codon position sites from non-overlapping reading frames.
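The q parameter defined above can be computed directly from an alignment. The following Python sketch is not the PIST program; it assumes that a two-state parsimony-informative site is one with exactly two character states, each present in at least two sequences, and that sequences start at the first codon position of an in-frame coding region. The function names and toy sequences are hypothetical.

```python
from collections import Counter


def third_codon_positions(seq):
    """Third codon positions of an in-frame coding sequence (assumed to
    begin at codon position 1)."""
    return seq[2::3]


def q_statistic(seqs):
    """Proportion of two-state parsimony-informative sites among all
    polymorphic sites in an alignment of equal-length sequences."""
    polymorphic = 0
    two_state_informative = 0
    for col in zip(*seqs):
        counts = Counter(col)
        if len(counts) < 2:
            continue  # invariant site
        polymorphic += 1
        # Exactly two states, each seen at least twice: the site splits
        # the sequences into two groups.
        if len(counts) == 2 and min(counts.values()) >= 2:
            two_state_informative += 1
    if polymorphic == 0:
        raise ValueError("alignment has no polymorphic sites")
    return two_state_informative / polymorphic


# Invented toy alignment: one invariant site, two two-state informative
# sites and one three-state site.
toy = ["AAGA", "AAGC", "ATCA", "ATCT"]
q_observed = q_statistic(toy)
```

In the test itself, this observed value would be compared with the values of qc computed in the same way from each simulated, recombination-free alignment.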


   RESULTS
Analysis of phylogenetic incongruence
Although alignments from a number of negative-sense RNA viruses contained sequences that changed phylogenetic position, in only five cases (from three different viruses) were these changes in tree topology both significant in the ML test and accompanied by strongly conflicting bootstrap support values for the putative recombinant on either side of the ML break-point (Table 2). These sequences were as follows: the G2 gene from Hantaan virus, strain 84FLi (AF366569); the F gene from Mumps virus, strain MP 93-AK (Kashiwagi et al., 1999), and the HN gene from Mumps virus, strain 4972 (AF448528); the HN gene from Newcastle disease virus, strain GPMV/QY97-1 (AF192406), and the N gene from Newcastle disease virus, strain chicken/Mexico/37821/96 (Seal et al., 2002). ML phylogenetic trees revealing the clear topological movement of these sequences are shown in Figs 1 to 5. There was no evidence for strong phylogenetic incongruence indicative of recombination in any of the remaining 74 sequence alignments.


Table 2. Sequence alignments with significant evidence for recombination under the ML analysis of phylogenetic incongruence

 


Fig. 1. ML tree for the G2 gene from Hantaan virus showing the movement of strain 84FLi (boxed), indicative of recombination. Bootstrap values (percentage from 1000 replications) are shown for the relevant nodes. Sequence positions are numbered according to the specific alignment used. The tree is mid-point-rooted for purposes of clarity only and all horizontal branch lengths are drawn to scale.

 


Fig. 2. ML tree for the F gene from Mumps virus showing the movement of strain MP 93-AK (boxed). Bootstrap values are shown for the relevant nodes. Sequence positions are numbered according to the specific alignment used. The tree is mid-point-rooted for purposes of clarity only and all horizontal branch lengths are drawn to scale.

 


Fig. 3. ML tree for the HN gene from Mumps virus showing the movement of strain 4972 (boxed). Bootstrap values are shown for the relevant nodes. Sequence positions are numbered according to the specific alignment used. Where strain names are not available, GenBank accession numbers are used in their place. The tree is mid-point-rooted for purposes of clarity only and all horizontal branch lengths are drawn to scale.

 


Fig. 4. ML tree for the HN gene from Newcastle disease virus showing the movement of strain GPMV/QY97-1 (boxed). Bootstrap values are shown for the relevant nodes. Sequence positions are numbered according to the specific alignment used. Where strain names are not available, GenBank accession numbers are used in their place. The tree is mid-point-rooted for purposes of clarity only and all horizontal branch lengths are drawn to scale.

 


Fig. 5. ML tree for the N gene from Newcastle disease virus showing the movement of strain chicken/Mexico/37821/96 (boxed). Bootstrap values are shown for the relevant nodes. Sequence positions are numbered according to the specific alignment used. Where strain names are not available, GenBank accession numbers are used in their place. The tree is mid-point-rooted for purposes of clarity only and all horizontal branch lengths are drawn to scale.

 
In the case of Hantaan virus, although the precise phylogenetic position of strain 84FLi in the first part of the G2 gene (bases 1–445) could not be resolved in the bootstrap analysis, the change in tree topology with respect to the second region (bases 446–1638) was so well supported that this sequence is considered to contain strong evidence for recombination. Conversely, although conflicting phylogenetic trees were observed in the M gene from Measles virus, isolate Yamagata-1 (Haga et al., 1992) (break-point at position 438, P<0·005), the putative recombinant sequence and one of the parental lineages were both clones of the same virus (Yamagata-1) and have been subject to substantial laboratory manipulation. As this may indicate that the recombination event occurred in the laboratory rather than in nature, this putative recombinant was excluded from subsequent discussion. Furthermore, this same data set was negative for recombination in both the Sawyer's runs and the informative sites tests, and no other recombinants were observed in the relatively large sample of measles viruses analysed here. Finally, in the case of the N gene from Newcastle disease virus, it may seem anomalous that the parents of the recombinant chicken/Mexico/37821/96 strain were sampled 50 years apart and on different continents. However, it is important to remember that these represent parental lineages and not necessarily the actual sequences that recombined. Indeed, it is highly unlikely that we will sample the exact parents in each case. Hence, as long as the parental chicken/Italy/Milano/45 strain of Newcastle disease virus produced descendants that could be transported over long distances, the inference that recombination has occurred in this data set is reasonable.

Sawyer's runs test
The analysis of recombination using Sawyer's runs test produced very similar results to those obtained using the ML phylogenetic incongruence method. Indeed, of the five sequence alignments with major differences in tree topology, four were also found to have global P values <0·05 under Sawyer's runs test (Table 3). The only exception was the F gene from Mumps virus, which was the most weakly supported recombinant in the ML incongruence test (that is, it had the lowest LR statistic) and was not significant in Sawyer's runs test.


Table 3. Sequence alignments with significant evidence for recombination under Sawyer's runs test

 
There were, however, some differences in the isolates identified as recombinant and in the location of the break-points between the incongruence and Sawyer's runs tests. In the case of the G2 gene from Hantaan virus, incongruence analysis identified isolate 84FLi as recombinant at position 445, with strains H8205 and A16 being the closest parental lineages. However, GENECONV identified H8205 and A16 as recombinant, although at the same break-point, while a second recombination event was identified at position 1002, this time involving 84FLi. Consequently, the precise history of recombination in the G2 gene from Hantaan virus is difficult to reconstruct, although there is clear evidence of phylogenetic incongruence involving three isolates of this virus. A similar situation was seen in the N gene from Newcastle disease virus. Here, the incongruence test pinpointed isolate chicken/Mexico/37821/96 as a recombinant, with strains chicken/Italy/Milano/45 and chicken/Mexico/37822/96 as the parental lineages. However, Sawyer's runs test suggested that the two parents were, in fact, the recombinants, although the break-points were in the same region. Finally, Sawyer's runs test suggested that multiple recombination events had occurred in the HN gene from Newcastle disease virus, with similar break-points to those identified in the incongruence analysis. In all cases, isolate GPMV/QY97-1 was identified as a recombinant, as it was in the incongruence test, along with both closest parental lineages (strains F48E9 and GD/1/98/Go). The only other putative recombinant sequence identified by Sawyer's runs test was ZJ/1/00/Go but, as this sequence is related very closely to strain GD/1/98/Go, this is likely to represent the same recombination event. Indeed, the fact that the incongruence and Sawyer's runs tests identified essentially the same isolates and break-points suggests strongly that the same recombination events have been detected by both methods.

Sawyer's runs test also found evidence for recombination in three data sets that were not identified in the incongruence test: the G gene from La Crosse virus and the G1 and NP genes of Puumala virus (Table 3). In the case of La Crosse virus, GENECONV suggested that a 99 bp region from isolates M87664 and 74-32813 was recombinant. Visual inspection of trees constructed from the respective gene regions (region 1, bases 1–3306 and 3406–4323 combined; region 2, bases 3307–3405) revealed that they were indeed incongruent, with sequence M87664 grouping with 74-32813 or LAC30928-31 in different parts of the alignment, in both cases with 100 % bootstrap support. However, because the putative recombinant region is so short, the tree is resolved poorly and this recombination event was not detected using the incongruence test. Similar reservations apply to the G1 gene from Puumala virus, where GENECONV suggested that strains Vindeln/L20Cg/83 and Sotkamo were recombinant in an 84 bp region. Analysis of phylogenetic trees from the appropriate regions (region 1, bases 1–1323 and 1408–2442 combined; region 2, bases 1324–1407) indicated that although strain Sotkamo (Vapalahti et al., 1992) changed phylogenetic position, bootstrap levels were extremely low; thus, the evidence for recombination in these data is ambiguous. In the case of the NP gene from Puumala virus, Sawyer's runs test suggested that recombination had occurred in the Gomselga and Karhumaki strains, but with marginal levels of significance. Construction of trees on either side of the break-point (region 1, bases 1–381; region 2, bases 382–1299) revealed that isolate Karhumaki shares common ancestry with both the Gomselga and the Kolodozero isolates (all strains provided by Asikainen et al., 2000), although without strong bootstrap support. Therefore, the evidence for recombination in this alignment is also ambiguous.
The full results of the phylogenetic analyses described in this section are available from the authors on request.
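The region definitions quoted above use 1-based, inclusive base positions, with the flanking segments combined into a single region. A short Python sketch shows how such regions might be extracted from an aligned sequence and confirms the stated fragment lengths. The function names and the dictionary layout are assumptions for illustration; only the coordinates come from the text.

```python
def region_length(ranges):
    """Total length of a list of 1-based inclusive (start, end) ranges."""
    return sum(end - start + 1 for start, end in ranges)


def extract_regions(seq, ranges):
    """Concatenate the 1-based inclusive (start, end) ranges of a sequence."""
    parts = []
    for start, end in ranges:
        if start < 1 or end > len(seq) or start > end:
            raise ValueError(f"invalid range {start}-{end} for length {len(seq)}")
        parts.append(seq[start - 1:end])  # convert to 0-based slicing
    return "".join(parts)


# Coordinates as given in the text for the two short putative
# recombinant regions and their combined flanking regions.
LA_CROSSE_G = {"region1": [(1, 3306), (3406, 4323)], "region2": [(3307, 3405)]}
PUUMALA_G1 = {"region1": [(1, 1323), (1408, 2442)], "region2": [(1324, 1407)]}

la_crosse_len = region_length(LA_CROSSE_G["region2"])   # 99 bp
puumala_len = region_length(PUUMALA_G1["region2"])      # 84 bp
flanks = extract_regions("ABCDEFGHIJ", [(1, 3), (7, 10)])  # toy string
```

The two regions of each gene together account for the full alignment length (4323 bp for La Crosse virus G and 2442 bp for Puumala virus G1), so no sites are dropped or counted twice.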

Informative sites test
A different picture of the extent of recombination in negative-sense RNA viruses was revealed by the informative sites test. Under this analysis, 24 gene sequence alignments representing 15 different viruses gave significant evidence for recombination (P<0·05) (Table 4). More notably, evidence for recombination was found in 14 new data sets, representing ten negative-sense RNA viruses. These viruses were: Canine distemper virus, Crimean-Congo haemorrhagic fever virus, Influenza A virus, Influenza B virus, Influenza C virus, Lassa virus, Pirital virus, Rabies virus, Rift Valley Fever virus and Vesicular stomatitis virus. The best support for recombination (P<0·001) was found in the three data sets from Hantaan virus (G1, G2 and N), the HA gene from Influenza A virus (human, swine and avian, subtype H1N1), the NP gene from Lassa virus, the P gene from Newcastle disease virus, the N gene from Pirital virus, the G1 and NP genes from Puumala virus, and the N gene from Rabies virus. Conversely, the P values for Canine distemper virus, Influenza B virus and Influenza C virus were only marginally significant (P=0·044, 0·048 and 0·048, respectively). With the exception of the N gene from Newcastle disease virus, all those data sets with evidence of recombination under the incongruence or Sawyer's runs tests also gave positive results in the informative sites test.


Table 4. Sequence alignments with significant evidence for recombination under the informative sites test

 

   DISCUSSION
We have performed a comprehensive phylogenetic analysis of the extent of recombination in 79 gene sequence alignments from 35 negative-sense RNA viruses. In only five sequences from a total of 2154 was there clear evidence for recombination, as indicated by regions of the alignment with conflicting (incongruent) phylogenetic histories. These sequences came from three viruses: Hantaan virus (Bunyaviridae), Mumps virus (Paramyxoviridae) and Newcastle disease virus (Paramyxoviridae). The case for recombination in these viruses is, therefore, compelling and this is the first time it has been reported.

Similar results were obtained using Sawyer's runs test, which also identifies specific break-points in sequence alignments, although this test additionally detected recombination in three alignments from La Crosse virus and Puumala virus. In the case of La Crosse virus, the conflicting phylogenetic trees in the G gene were supported strongly but the putative recombinant region was so short that caution must be exercised. Similar reservations apply to Puumala virus, for which the action of recombination has been suggested previously (Sironen et al., 2001): evidence for recombination was found in the G1 and NP genes, but without strong bootstrap support for the conflicting phylogenies. Hence, in the case of La Crosse virus and Puumala virus, we conclude that the case for recombination is suggestive, but not proven, and that more studies are required. Indeed, because our study has necessarily used gene sequence data taken from GenBank, we are unable to determine whether each instance of recombination we document is natural or a laboratory artefact.

Somewhat different results were obtained using the informative sites test, which looks for patterns of sequence variation (polymorphism) indicative of recombination. In this case, significant evidence of recombination was found in 15 viruses, ten of which were apparently clonal under both the incongruence and the Sawyer's runs tests. However, with the exception of the N gene from Newcastle disease virus, all data sets with evidence for recombination in the incongruence and Sawyer's runs tests also gave positive results in the informative sites test. There was no evidence for recombination in the remaining 20 negative-sense RNA viruses using any of the analytical methods employed here.

Although some sequences were found to be recombinant in all analyses, the informative sites test suggested far more frequent recombination than the two methods that detected specific break-points. There are two possible explanations for this difference. First, it is possible that recombination has occurred in all those viruses highlighted by the informative sites test but that the effects on tree topology are so slight that they have not resulted in clear phylogenetic incongruence or in ‘mosaic’ sequence alignments. This may be the case if the recombinant and parental lineages are closely related, so that phylogenetic resolution is low. Alternatively, it is possible that the informative sites test has been misled by another evolutionary process, so that false-positive results are common. In particular, positive natural selection, such as that involved in immune escape, could cause similar sequences to evolve in unrelated strains but on such a localized scale as not to cause widespread incongruence. As a number of the data sets with significant results under the informative sites test encode surface antigens, it is possible that the effects of selection are relatively strong in these cases. Indeed, although the informative sites test performed well in simulation studies (Worobey, 2001), the confounding effects of selection were not explored. We attempted to minimize this effect by analysing only the third positions of codons, where selection pressure is expected to be weakest, but it cannot be excluded entirely. Consequently, it is impossible to confirm or exclude the action of recombination in those viruses that only give positive results under the informative sites test. We conclude, therefore, that these viruses have evolved in a manner that is compatible with recombination but that unequivocal evidence for this process, in the form of incongruent phylogenetic trees, has yet to be provided.

The difficulty of confirming the presence of recombination in some RNA viruses is most notable with respect to human Influenza A virus. Previously, it was proposed that recombination had occurred in a strain isolated in 1918 (South Carolina/1918), with most of the globular domain (HA1) deriving from a swine lineage, while the ancestry of the stalk region (HA2) was human (Gibbs et al., 2001). Although this recombination event was supported in a number of analytical tests, it was demonstrated later that this apparent recombination could be explained better by a substantial difference in substitution rate between HA1 and HA2, which occurred in the human, but not the swine, form of the virus (Worobey et al., 2002). According to our informative sites test, the alignment used by Gibbs et al. (2001) does contain evidence for recombination. Furthermore, the ML incongruence test provided significant evidence (P<0·005) for a recombination event in the South Carolina/1918 strain at the junction of the HA1 and HA2 regions and at a very similar position in isolate Mongolia/88 (full results are available from the authors on request). However, these incongruent phylogenetic trees did not receive strong bootstrap support and Sawyer's runs test found no evidence for recombination in these data. At present, therefore, the case for RNA recombination in Influenza A virus remains unproven, with extensive rate variation producing a similar phylogenetic signal in this case.

Overall, our study reveals that recombination is unlikely to be a frequent process in negative-sense RNA viruses, with only a few clear-cut examples in the 79 gene sequence alignments studied here. While we were unable to estimate precise recombination rates from our analyses, it is clear that these rates must be lower than those of mutation, which is not the case in some other viruses (Jung et al., 2002). Indeed, the absence of any detectable recombination in 20 of 35 negative-sense RNA viruses suggests that they may be entirely clonal organisms, although this will clearly need to be confirmed with much larger sequence data sets. Why is recombination so rare in negative-sense RNA viruses? One possibility is that the superinfection of host cells by different virus strains only occurs at a low rate because negative-sense RNA viruses often produce acute infections with relatively short recovery periods. Consequently, there is only a small probability that hosts are multiply infected with different strains, thereby limiting detectable recombination. Indeed, the best evidence for recombination in negative-sense RNA viruses prior to this study comes from some arenaviruses and hantaviruses that establish persistent infections in their rodent hosts (Archer & Rico-Hesse, 2002; Charrel et al., 2001; Sibold et al., 1999; Sironen et al., 2001). While the rapidity of infection may limit the rate of recombination in some cases, it cannot be the sole explanation since reassortment is well documented in segmented negative-sense RNA viruses that cause acute infections, such as in Influenza A virus.

A second possible reason for the low rate of recombination in negative-sense RNA viruses is that this process is hampered by the presence of the ribonucleoprotein complex (RNP), which never disassembles from the RNA and may, therefore, affect the ability of RNA polymerase to switch templates during replication (Conzelmann, 1998). Indeed, the RNP is associated with unique properties of some negative-sense RNA viruses. For example, it appears that encapsidation by the RNP is required for replication in paramyxoviruses (Kolakofsky et al., 1998). Electron microscopy indicates further that in these paramyxoviruses each nucleoprotein encapsidates six bases of genomic RNA (Egelman et al., 1989), suggesting that the viral polymerase will only replicate the genome efficiently when the last six nucleotides are completely encapsidated by a single nucleoprotein. This phenomenon, known as the ‘rule of six’, also dictates that efficient replication will only be achieved when the total number of nucleotides is a multiple of six (Calain & Roux, 1993). On the other hand, the considerable evidence for the production of defective RNAs in negative-sense viruses indicates that the polymerase is able to jump off and on to the template of the same (or different) RNA molecules relatively frequently (Calain et al., 1999; Duhaut & Dimmock, 1998; Jennings et al., 1983; Li & Pattnaik, 1997; Murphy & Parks, 1997), so that the RNP may not be a serious barrier to RNA recombination. In summary, the factors that constrain recombination in negative-sense RNA viruses have yet to be fully elucidated.
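The arithmetic behind the ‘rule of six’ is simple enough to state as a short sketch. The function names and the lengths used below are illustrative assumptions, not values from this study. The check says nothing about whether a given genome actually replicates efficiently, only whether its length is a whole number of hexamers.

```python
def obeys_rule_of_six(genome_length: int) -> bool:
    """True if the genome length (in nucleotides) is a multiple of six."""
    if genome_length <= 0:
        raise ValueError("genome length must be a positive integer")
    return genome_length % 6 == 0


def nucleotides_to_next_hexamer(genome_length: int) -> int:
    """Number of nucleotides that would have to be added to reach a multiple of six."""
    if genome_length <= 0:
        raise ValueError("genome length must be a positive integer")
    return (-genome_length) % 6


# Hypothetical lengths chosen only to illustrate the calculation.
for length in (1200, 1201, 1205):
    print(length, obeys_rule_of_six(length), nucleotides_to_next_hexamer(length))
```

Under this rule, a recombinant or defective genome produced by template switching would be expected to replicate efficiently only if the resulting length remained divisible by six.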

Whatever the reason for the low rate of homologous recombination in negative-sense RNA viruses, it is unlikely to be a trait that is selected because of its possible fitness benefits (although it is possible that genome segmentation has evolved as a strategy to increase fitness by reassortment in some cases) (Chao, 1994). Specifically, the mutation rate in negative-sense RNA viruses is similar to that in other viruses replicated by RNA-dependent RNA polymerase (Jenkins et al., 2002), so that fitness-compromising deleterious mutations will appear frequently, and these viruses will commonly face a variety of adaptive challenges. If recombination allowed viruses to purge deleterious mutations and/or create advantageous mutations, at least some negative-sense RNA viruses would therefore be expected to show higher rates of recombination. This is clearly not the case; those negative-sense RNA viruses that recombine are perhaps simply those that have the greatest opportunity to do so.

While our study suggests that recombination rates are not high enough in negative-sense RNA viruses to confer a substantial fitness advantage, we cannot rule out that these rates have been minimized by natural selection. Although this idea may sound paradoxical, it follows directly from the observation that recombination reduces the error threshold, the point beyond which so many deleterious mutations are made in each replication cycle that viral genomes are unable to reproduce themselves faithfully and extinction occurs (Boerlijst et al., 1996). In these circumstances, recombination has a negative effect on virus fitness by bringing together combinations of deleterious mutations more rapidly than can occur by mutation alone. Given recent developments in using ‘lethal mutagenesis’ as a basis for antiviral therapy, in which the application of mutagens increases the virus mutation rate beyond the error threshold (Crotty et al., 2001; Loeb et al., 1999; Sierra et al., 2000), the accurate estimation of recombination rates in RNA viruses is clearly of fundamental importance.


   ACKNOWLEDGEMENTS
 
We thank the Royal Society (London) and the Ian Karten Scholarship (University of Oxford) for financial support. Two anonymous referees and Dr John McCauley also made very helpful comments.


   REFERENCES
 
Aaziz, R. & Tepfer, M. (1999). Recombination in RNA viruses and in virus-resistant transgenic plants. J Gen Virol 80, 1339–1346.

Archer, A. M. & Rico-Hesse, R. (2002). High genetic divergence and recombination in arenaviruses from the Americas. Virology 304, 274–281.

Asikainen, K., Hanninen, T., Henttonen, H. & 7 other authors (2000). Molecular evolution of puumala hantavirus in Fennoscandia: phylogenetic analysis of strains from two recolonization routes, Karelia and Denmark. J Gen Virol 81, 2833–2841.

Ball, L. A. (1997). Nodavirus RNA recombination. Semin Virol 8, 95–100.

Banner, L. R. & Lai, M. M. C. (1991). Random nature of coronavirus RNA recombination in the absence of selection pressure. Virology 185, 441–445.

Boerlijst, M. C., Bonhoeffer, S. & Nowak, M. A. (1996). Viral quasi-species and recombination. Proc R Soc Lond B Biol Sci 263, 1577–1584.

Bujarski, J. J. & Nagy, P. D. (1996). Different mechanisms of homologous and nonhomologous recombination in brome mosaic virus: role of RNA sequences and replicase proteins. Semin Virol 7, 363–372.

Burke, D. S. (1997). Recombination in HIV: an important viral evolutionary strategy. Emerg Infect Dis 3, 253–259.

Calain, P. & Roux, L. (1993). The rule of six, a basic feature for efficient replication of Sendai virus defective interfering RNA. J Virol 67, 4822–4830.

Calain, P., Monroe, M. C. & Nichol, S. T. (1999). Ebola virus defective interfering particles and persistent infection. Virology 262, 114–128.

Chao, L. (1990). Fitness of RNA virus decreased by Muller's ratchet. Nature 348, 454–455.

Chao, L. (1994). Evolution of genetic exchange in RNA viruses. In The Evolutionary Biology of Viruses, pp. 233–250. Edited by S. S. Morse. New York: Raven Press.

Charrel, R. N., de Lamballerie, X. & Fulhorst, C. F. (2001). The Whitewater Arroyo virus: natural evidence for genetic recombination among Tacaribe serocomplex viruses (family Arenaviridae). Virology 283, 161–166.

Chetverin, A. B. (1997). Recombination in bacteriophage Qβ and its satellite RNAs: the in vivo and in vitro studies. Semin Virol 8, 121–129.

Conzelmann, K.-K. (1998). Nonsegmented negative-strand RNA viruses: genetics and manipulation of viral genomes. Annu Rev Genet 32, 123–162.

Cooper, P. D., Steiner-Pryor, S., Scotti, P. D. & Delong, D. (1974). On the nature of poliovirus genetic recombinants. J Gen Virol 23, 41–49.

Crotty, S., Cameron, C. E. & Andino, R. (2001). RNA virus error catastrophe: direct molecular test by using ribavirin. Proc Natl Acad Sci U S A 98, 6895–6900.

Danis, C., Mabrouk, T., Garzon, S. & Lemay, G. (1993). Establishment of persistent reovirus infection in SC1 cells: absence of protein synthesis inhibition and increased level of double-stranded RNA-activated protein kinase. Virus Res 27, 253–265.

Domingo, E. & Holland, J. J. (1997). RNA virus mutations for fitness and survival. Annu Rev Microbiol 51, 151–178.

Drake, J. W. & Holland, J. J. (1999). Mutation rates among RNA viruses. Proc Natl Acad Sci U S A 96, 13910–13913.

Duarte, E., Clarke, D., Moya, A., Domingo, E. & Holland, J. J. (1992). Rapid fitness losses in mammalian RNA virus clones due to Muller's ratchet. Proc Natl Acad Sci U S A 89, 6015–6019.

Duhaut, S. D. & Dimmock, N. J. (1998). Heterologous protection of mice from a lethal human H1N1 influenza A virus infection by H3N8 equine defective interfering virus: comparison of defective RNA sequences isolated from the DI inoculum and mouse lung. Virology 248, 241–253.

Egelman, E., Wu, S., Amrein, M., Portner, A. & Murti, G. (1989). The Sendai virus nucleocapsid exists in at least four different helical states. J Virol 63, 2233–2243.

Elena, S. F. & Moya, A. (1999). Rate of deleterious mutation and the distribution of its effects on fitness in vesicular stomatitis virus. J Evol Biol 12, 1078–1088.

Gibbs, M. J., Armstrong, J. S. & Gibbs, A. J. (2001). Recombination in the hemagglutinin gene of the 1918 ‘Spanish flu’. Science 293, 1842–1845.

Gorman, O. T., Bean, W. J. & Webster, R. G. (1992). Evolutionary processes in influenza viruses: divergence, rapid evolution, and stasis. Curr Top Microbiol Immunol 176, 75–97.

Haga, T., Komase, K., Yoshikawa, Y. & Yamanouchi, K. (1992). Molecular analysis of virus-producing and non-producing clones derived from a defective SSPE virus Yamagata-1 strain. Microbiol Immunol 36, 257–267.

Holmes, E. C., Worobey, M. & Rambaut, A. (1999). Phylogenetic evidence for recombination in dengue virus. Mol Biol Evol 16, 405–409.

Hu, W. S. & Temin, H. M. (1990). Retroviral recombination and reverse transcription. Science 250, 1227–1233.

Jenkins, G. M., Rambaut, A., Pybus, O. G. & Holmes, E. C. (2002). Rates of molecular evolution in RNA viruses: a quantitative phylogenetic analysis. J Mol Evol 54, 152–161.

Jennings, P. A., Finch, J. T., Winter, G. & Robertson, J. S. (1983). Does the higher order structure of the influenza virus ribonucleoprotein guide sequence rearrangements in influenza viral RNA? Cell 34, 619–627.

Jung, A., Maier, R., Vartanian, J. P., Bocharov, G., Jung, V., Fischer, U., Meese, E., Wain-Hobson, S. & Meyerhans, A. (2002). Multiply infected spleen cells in HIV patients. Nature 418, 144.

Kashiwagi, Y., Takami, T., Mori, T. & Nakayama, T. (1999). Sequence analysis of F, SH, and HN genes among mumps virus strains in Japan. Arch Virol 144, 593–599.

Klempa, B., Schmidt, H. A., Ulrich, R., Kaluz, S., Labuda, M., Meisel, H., Hjelle, B. & Krüger, D. H. (2003). Genetic interaction between distinct Dobrava hantavirus subtypes in Apodemus agrarius and A. flavicollis in nature. J Virol 77, 804–809.

Kolakofsky, D., Pelet, T., Garcin, D., Hausmann, S., Curran, J. & Roux, L. (1998). Paramyxovirus RNA synthesis and the requirement for hexamer genome length: the rule of six revisited. J Virol 72, 891–899.

Lai, M. M. C. (1992). RNA recombination in animal and plant viruses. Microbiol Rev 56, 61–79.

Li, T. & Pattnaik, A. K. (1997). Replication signals in the genome of the vesicular stomatitis virus and its defective interfering particles: identification of a sequence element that enhances DI RNA replication. Virology 232, 248–259.

Liao, C.-L. & Lai, M. M. C. (1992). RNA recombination in a coronavirus: recombination between viral genomic RNA and transfected RNA fragments. J Virol 66, 6117–6124.

Loeb, L. A., Essigmann, J. M., Kazazi, F., Zhang, J., Rose, K. D. & Mullins, J. I. (1999). Lethal mutagenesis of HIV with mutagenic nucleoside analogs. Proc Natl Acad Sci U S A 96, 1492–1497.

Mindich, L. (1996). Heterologous recombination in the segmented dsRNA genome of bacteriophage φ6. Semin Virol 7, 389–397.

Moutouh, L., Corbeil, J. & Richman, D. D. (1996). Recombination leads to the rapid emergence of HIV-1 dually resistant mutants under selective drug pressure. Proc Natl Acad Sci U S A 93, 6106–6111.

Murphy, S. K. & Parks, G. D. (1997). Genome nucleotide lengths that are divisible by six are not essential but enhance replication of defective interfering RNAs of the paramyxovirus simian virus 5. Virology 232, 145–157.

Orlich, M., Gottwald, H. & Rott, R. (1994). Nonhomologous recombination between the hemagglutinin gene and the nucleoprotein gene of an influenza virus. Virology 204, 462–465.

Posada, D., Crandall, K. A. & Holmes, E. C. (2002). Recombination in evolutionary genomics. Annu Rev Genet 36, 75–97.

Pringle, C. R. & Parry, J. E. (1982). Measurement of surface antigen by specific bacterial adherence and scanning electron microscopy (SABA/SEM) in cells infected by vesiculovirus ts mutants. J Gen Virol 59, 207–211.

Rambaut, A. & Grassly, N. C. (1997). SEQ-GEN: an application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees. Comput Appl Biosci 13, 235–238.

Robertson, D. L., Sharp, P. M., McCutchan, F. E. & Hahn, B. H. (1995). Recombination in HIV-1. Nature 374, 124–126.

Roossinck, M. J. (1997). Mechanisms of plant virus evolution. Annu Rev Phytopathol 35, 191–209.

Sawyer, S. (1989). Statistical tests for detecting gene conversion. Mol Biol Evol 6, 526–538.

Seal, B. S., Crawford, J. M., Sellers, H. S., Locke, D. P. & King, D. J. (2002). Nucleotide sequence analysis of the Newcastle disease virus nucleocapsid protein gene and phylogenetic relationships among the Paramyxoviridae. Virus Res 83, 119–129.

Sibold, C., Meisel, H., Krüger, D. H., Labuda, M., Lysy, J., Kozuch, O., Pejcoch, M., Vaheri, A. & Plyusnin, A. (1999). Recombination in Tula hantavirus evolution: analysis of genetic lineages from Slovakia. J Virol 73, 667–675.

Sierra, S., Dávila, M., Lowenstein, P. R. & Domingo, E. (2000). Response of foot-and-mouth disease virus to increased mutagenesis: influence of viral load and fitness in loss of infectivity. J Virol 74, 8316–8323.

Simon, A. E. & Bujarski, J. J. (1994). RNA–RNA recombination and evolution in virus-infected plants. Annu Rev Phytopathol 32, 337–362.

Simon, K. O., Cardamone, J. J., Jr, Whitaker-Dowling, P. A., Youngner, J. S. & Widnell, C. C. (1990). Cellular mechanisms in the superinfection exclusion of vesicular stomatitis virus. Virology 177, 375–379.

Sironen, T., Vaheri, A. & Plyusnin, A. (2001). Molecular evolution of Puumala hantavirus. J Virol 75, 11803–11810.

Strauss, J. H. & Strauss, E. G. (1997). Recombination in alphaviruses. Semin Virol 8, 85–94.

Swofford, D. L. (2002). PAUP*: Phylogenetic Analysis Using Parsimony (*and Other Methods), version 4. Sunderland, MA: Sinauer Associates.

Thompson, J. D., Gibson, T. J., Plewniak, F., Jeanmougin, F. & Higgins, D. G. (1997). The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res 25, 4876–4882.

Vapalahti, O., Kallio-Kokko, H., Salonen, E. M., Brummer-Korvenkontio, M. & Vaheri, A. (1992). Cloning and sequencing of Puumala virus Sotkamo strain S and M RNA segments: evidence for strain variation in hantaviruses and expression of the nucleocapsid protein. J Gen Virol 73, 829–838.

Worobey, M. (2001). A novel approach to detecting and measuring recombination: new insights into evolution in viruses, bacteria, and mitochondria. Mol Biol Evol 18, 1425–1434.

Worobey, M. & Holmes, E. C. (1999). Evolutionary aspects of recombination in RNA viruses. J Gen Virol 80, 2535–2543.

Worobey, M., Rambaut, A. & Holmes, E. C. (1999). Widespread intra-serotype recombination in natural populations of dengue virus. Proc Natl Acad Sci U S A 96, 7352–7357.

Worobey, M., Rambaut, A., Pybus, O. G. & Robertson, D. L. (2002). Questioning the evidence for genetic recombination in the 1918 ‘Spanish flu’ virus. Science 296, 211.

Received 7 April 2003; accepted 10 June 2003.