A Codon-Based Model of Host-Specific Selection in Parasites, with an Application to the Influenza A Virus

Roald Forsberg1 and Freddy Bugge Christiansen

Bioinformatics Research Center (BiRC), Department of Genetics and Ecology, The Institute of Biological Sciences, University of Århus, Århus, Denmark

Correspondence: E-mail: forsberg@stats.ox.ac.uk.


    Abstract
 
Parasites sometimes expand their host range by acquiring a new host species. After a host change event, the selective regime acting on a given parasite gene may change as a result of host-specific adaptive alterations of protein functionality or host-specific immune-mediated selection. We present a codon-based model that attempts to include these effects by allowing the position-specific substitution process to change in conjunction with a host change event. Following maximum-likelihood parameter estimation, we employ an empirical Bayesian procedure to identify candidate sites potentially involved in host-specific adaptation. We discuss the applicability of the model to the more general problem of ascertaining whether the selective regime differs in two groups of related organisms. The utility of the model is illustrated on a data set of nucleoprotein sequences from the influenza A virus obtained from avian and human hosts.

Key Words: host-specific adaptation • influenza evolution • codon model


    Introduction
 
Parasite species are capable of expanding their host range. These host radiations may have a detrimental effect on the new host species. Well-known examples include the influenza pandemic of 1918–1920 known as the Spanish Influenza (Gorman et al. 1991), the recent epidemic of the Nipah virus (Chua et al. 2000), and the current epidemic of acquired immunodeficiency syndrome in humans (Gao et al. 1999). A successful host radiation consists of three steps: interspecies transmission to the new host, replication within individuals of the new host, and transmission between individuals of the new host species. The intensity of interspecies transmissions is determined by ecology and behavior and must be quite high among coexisting host species with intimate contact, such as predators and their prey. Yet, host radiation events remain rare, and most parasites have a clearly defined and constant host range. The host radiation process must therefore be limited by the latter two steps of replication and transmission. An explanation for this limitation is offered by the general theory of ecological specialization (Turner and Elena 2000). This holds that adaptations to different habitats are antagonistic so that the adaptation of a parasite to replication and transmission in its native host creates a barrier against replication and transmission in other host species. In support of this theory, it has been demonstrated experimentally that parasite adaptation indeed occurs upon the introduction to a new host, and that this happens at the cost of maladaptation to the original host (Crill et al. 2000; Turner and Elena 2000), a phenomenon that has been exploited for decades in the production of live attenuated vaccines through repeated passage on non-host cells (Ebert 1998; Nielsen et al. 2001).

When a host radiation event occurs, it is therefore expected to be accompanied by adaptive evolutionary changes that establish a new phenotype and allow the parasite to complete its life cycle in the new host. Thus, identification of the positions in the parasite genome underlying the phenotypic difference between host-specific strains may provide information about the molecular basis for species-specific adaptation.

Previous studies aimed at this adaptation have focused on identifying fixed species-specific changes in the genomes of parasites from different host species (Hughes et al. 2000; Chang, Sgro, and Parrish 1992). These are natural candidates readily identified from alignment data. However, adaptive alteration of protein functionality may not occur only through fixation of different amino acids in different hosts. It may present itself as a more subtle change in the suite of amino acids that are allowed to occur, resulting not in fixation, but in a different substitution process in the two species. Additionally, host-specific adaptation can occur continuously as a result of host-specific immune selection that continuously selects for protein variants with new antigenic configurations. Fixation of genetic variants is therefore too rigid a criterion for the identification of positions involved in species-specific adaptation. Instead, a comparative methodology which focuses on changes in the evolutionary process in the different host environments may provide a fruitful approach.

To this end, we elaborate a simple codon-based model of nucleotide substitution which describes changes in the selective regime at the protein level that may affect the genes of a parasite confronted with a new host environment. The purpose of the model is twofold: First, we aim to gain a better understanding of the evolutionary processes in parasites that undergo species change. Second, we wish to use the model to identify positions in parasite genomes that may be involved in host-specific adaptations. These may then act as candidates for further experimental work on the molecular biology of species-specific adaptation.

The fundamental idea of the model is to infer how selection acts on variants of parasite genes in different hosts. To do so, we use a site-specific comparison of the substitution process of synonymous (silent) and nonsynonymous (amino acid altering) nucleotide substitutions in the parasite populations of different hosts. This type of comparison has previously been shown to be a useful analytical tool in the study of molecular evolution (Kimura 1983; Yang et al. 2000). The model is based on evolutionary comparisons on a genealogy and thus exploits the evolutionary information contained in the data while incorporating the correlation by common ancestry. The method is illustrated with an application to data from the influenza A virus.


    Theory and Model
 
Background
We assume that synonymous mutations are selectively neutral and that their rate of substitution is constant along the sequence, so that the expected number of synonymous substitutions is equal in all positions and provides an estimate of the number of available mutations for the fixation process on a given branch of the tree. We further assume that the forces of selection acting at the protein level are constant within the different hosts of the parasite and that they remain unaffected by any potential changes in the rate of mutation. The latter is an often unstated assumption of codon-based models, which implies that the rate of fixation of nonsynonymous and synonymous changes scales identically with changes in the mutation rate. Under these assumptions, the ratio of the fixation rates of nonsynonymous and synonymous changes (ω) in a given codon position provides a measure of the selective regime acting on the encoded amino acid (Kimura 1983; Nielsen and Yang 1998; Yang et al. 2000).

Upon introduction to the new host, changes in the position-specific selective regime that acts on parasite protein variants may occur through several mechanisms: In the adaptation to the new host, parts of the protein may acquire new functions, or different chemical configurations, in order to interact properly with the new and biochemically different host-cell environment. Hence, the functional constraints on the amino acid positions involved in such adaptation may differ in the two hosts, resulting in a change in the site-specific substitution process of amino acids. Alternatively, positions that are important for protein function in the original host may lose their function in the new host and therefore experience relaxed functional constraints after host radiation. Finally, the new host environment may alter the pattern of immune-mediated selection. The location of antigenic regions may change with ensuing changes in the substitution pattern along the sequence. Also, the epidemiological dynamics may change in the new host, thereby altering the strength of immune-mediated selection. Suppose the dynamics of the parasite in the original host is that of a childhood disease where the parasite most commonly infects immunologically naive individuals. Immune-mediated selection may therefore be limited or absent. Properties of the new host may, however, dictate a different epidemiological dynamic, where the parasite is dependent on continuous reinfection of previously infected individuals and therefore subjected to a strong immune-mediated selection that favors changes in its antigenic properties.

These host-mediated changes in the selective regime will potentially result in an observable host-mediated difference in the position-specific ratio of the two kinds of nucleotide substitutions, and hence will allow for the identification of codon positions involved in host-specific adaptation.

Data
We consider a sample of coding sequences from homologous genes taken from a virus that infects two different species (fig. 1). At some point in its evolutionary history the parasite acquired a new host species. Assuming no recombination, the relationship between the sequences can be described by a tree in which one clade represents sequences evolved in the new host N and the rest of the tree represents parasites evolved in the original host O. In the tree shown in figure 1 and in the following formulae the root of the tree represents the point of host radiation, but this could equally well be represented by an internal node in the tree. The differentiation between the original host and the new host is used throughout this article for the sake of clarity. It is not necessary, however, to know which species is the original host as long as the point of divergence between the two groups of sequences can be identified in the tree.



 
FIG. 1. A model tree of parasites from two different hosts. At the node of divergence the virus was transmitted from the original host to the new host

 
Basic Markov Model of Codon Substitution
Suppose the gene sequences in question consist of n codons. The data at site h (h = 1, ..., n) are represented by two vectors o_h and n_h, where o_h is a vector of codons from the sequences in the original host species at site h, and n_h is a vector of codons from the sequences in the new host species at site h.

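The nucleotide-substitution process is described by a codon-based model of substitution, which is a modification of the model proposed by Goldman and Yang (1994). This is given by a 61 × 61 matrix (stop codons not allowed) of the relative instantaneous substitution rates, where the rate of transition from codon i to codon j at site h is

$$
q_{ij} =
\begin{cases}
0, & \text{if } i \text{ and } j \text{ differ at more than one position},\\
\nu\,\pi_j, & \text{for a synonymous transversion},\\
\nu\,\kappa\,\pi_j, & \text{for a synonymous transition},\\
\nu\,\omega\,\pi_j, & \text{for a nonsynonymous transversion},\\
\nu\,\omega\,\kappa\,\pi_j, & \text{for a nonsynonymous transition}.
\end{cases}
$$
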
The transition probability matrix of codon substitution over a branch in the tree of length t is then calculated as P(t) = e^{Qt}. The parameter ω modifies the ratio between the rates of nonsynonymous and synonymous substitution and describes the propensity of sites to accept amino acid–altering substitutions. The parameter κ represents the transition/transversion rate ratio, π_j is the equilibrium frequency of codon j (estimated as the observed codon frequency in the data), and the parameter ν is a scaling factor defined by the requirement that the average rate of substitution be one:

$$
\sum_{i}\sum_{j \neq i} \pi_i\, q_{ij} = 1 .
$$

This scaling means that branch lengths can be interpreted as the expected number of nucleotide substitutions per codon averaged over all sites.
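
As an illustration, the rate categories of this substitution model can be written out in a short Python sketch. It assumes the standard genetic code and the usual Goldman and Yang-style classification of single-nucleotide changes into synonymous or nonsynonymous transitions and transversions. The names `relative_rate` and `scaling_factor` are hypothetical and are not taken from the authors' program, and the scaling is computed here for a single ω rather than averaged over sites.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
SENSE_CODONS = [c for c in CODE if CODE[c] != "*"]  # the 61 sense codons
PURINES = {"A", "G"}


def relative_rate(i, j, omega, kappa, pi_j, nu=1.0):
    """Relative instantaneous rate from sense codon i to sense codon j."""
    diffs = [(a, b) for a, b in zip(i, j) if a != b]
    if len(diffs) != 1:
        return 0.0  # identical codons, or more than one differing position
    a, b = diffs[0]
    rate = nu * pi_j
    if (a in PURINES) == (b in PURINES):
        rate *= kappa  # transition: purine<->purine or pyrimidine<->pyrimidine
    if CODE[i] != CODE[j]:
        rate *= omega  # nonsynonymous (amino acid altering) change
    return rate


def scaling_factor(omega, kappa, pi):
    """Value of nu for which sum_i sum_{j != i} pi_i q_ij equals one.

    Assumption: a single omega; pi maps each sense codon to its frequency.
    """
    total = sum(
        pi[i] * relative_rate(i, j, omega, kappa, pi[j])
        for i in SENSE_CODONS
        for j in SENSE_CODONS
        if i != j
    )
    return 1.0 / total
```

With equal codon frequencies, for example, `scaling_factor(0.5, 2.0, {c: 1 / 61 for c in SENSE_CODONS})` returns the ν that sets the average substitution rate to one for that single ω class.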

Allowing for Heterogeneous Selection Pressure
The selective regime may differ at different positions in the gene, and potentially in different hosts. To reflect this difference, we allow a statistical distribution p(ω) of ω ratios among sites (Yang et al. 2000). The probability of observing the data in a site is then obtained by integration,

$$
P(o_h, n_h) = \int_0^{\infty} p(\omega)\, P(o_h, n_h \mid \omega)\, d\omega .
$$

Constant Selection Pressure After Host Radiation
Consider a codon in the alignment. We assume that the selective regime is constant in the clade that represents viruses from the original host. The simplest event that may occur at the time of host radiation is the event c, that the selective regime in this codon position (and therefore ω) remains the same. Conditional on event c we have that the total probability of the data in site h is given by

$$
P(o_h, n_h \mid M, c) = \int_0^{\infty} p(\omega) \sum_{x_j} \pi_{x_j}\, P(x_j, o_h \mid M, \omega)\, P(x_j, n_h \mid M, \omega)\, d\omega .
$$

Here, x_j is the codon state in the node of divergence (fig. 1), and the sum runs over the 61 sense codons. M represents the model parameters κ, π, and the tree, including topology and branch lengths. The probabilities associated with codon x_j at the node of divergence given the subtrees from the two different species, P(x_j, o_h | M, ω) and P(x_j, n_h | M, ω), are found by traversing the subtrees according to Felsenstein's pruning algorithm (Felsenstein 1981). We can now construct the simplest hypothesis, H1, that no changes occur after host radiation. The probability of the data under this hypothesis is simply

$$
P(o_h, n_h \mid H_1) = P(o_h, n_h \mid M, c),
$$

and under the assumption that sites evolve independently, the full likelihood of the data is

$$
L = \prod_{h=1}^{n} P(o_h, n_h \mid H_1). \tag{1}
$$

Describing Host-Mediated Changes in the Selective Regime
To allow for the selective regime to change after the introduction to the new host, we include in our model the event d, which specifies a divergence in the selective regime affecting a site after host radiation. This means that if d occurs, then the ratio of the rate of nonsynonymous to synonymous substitutions (ω) on the branches of the tree that represent evolution in the new host is permitted to differ from that in the original host. Lacking any prior assumptions concerning the direction of the change, we simply say that if d occurs, then the choices of the ω parameters in the two hosts become independent of each other. Conditional on d, the probability of observing o_h and n_h is therefore obtained by independently integrating over the ω distribution in the two parts of the tree and multiplying

$$
P(o_h, n_h \mid M, d) = \sum_{x_j} \pi_{x_j} \left[\int_0^{\infty} p(\omega)\, P(x_j, o_h \mid M, \omega)\, d\omega\right] \left[\int_0^{\infty} p(\omega)\, P(x_j, n_h \mid M, \omega)\, d\omega\right].
$$

However, the selective regime on all sites in a protein is not likely to change after host radiation. Therefore, in the construction of a hypothesis H0 that allows for host-specific selection we let the event d occur for a site with probability p_d. We then have that the total probability of observing the data under H0 is

$$
P(o_h, n_h \mid H_0) = (1 - p_d)\, P(o_h, n_h \mid M, c) + p_d\, P(o_h, n_h \mid M, d),
$$

where p_d is a parameter that describes the propensity of a site to experience a change in the selective regime. The total likelihood is found by a similar expression to (1).
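
The site probabilities under events c and d can be sketched in Python as follows, with the ω distribution taken to be discrete so that the integrals become weighted sums over classes. This is a minimal sketch under stated assumptions, not the authors' implementation: `toy_transition` is a 2-state stand-in for the 61-state codon matrix P(t) = e^{Qt}, in which ω merely scales the exchange rate, and all function names and the nested-list tree encoding are hypothetical.

```python
import math


def toy_transition(t, omega):
    """2-state stand-in for P(t) = exp(Q(omega) t); NOT the 61-state codon model."""
    same = 0.5 + 0.5 * math.exp(-2.0 * omega * t)
    return [[same, 1.0 - same], [1.0 - same, same]]


def prune(node, omega, n_states=2):
    """Felsenstein pruning: P(tips below node | state x at node), for each x.

    A node is an int (observed tip state) or a list of (child, branch_length).
    """
    if isinstance(node, int):
        return [1.0 if x == node else 0.0 for x in range(n_states)]
    cond = [1.0] * n_states
    for child, t in node:
        p = toy_transition(t, omega)
        below = prune(child, omega, n_states)
        for x in range(n_states):
            cond[x] *= sum(p[x][y] * below[y] for y in range(n_states))
    return cond


def site_probabilities(orig_tree, new_tree, pi, omegas, weights, p_d):
    """Return (P(data | c), P(data | d), P(data | H0)) for one site.

    orig_tree and new_tree hang from the node of divergence; omegas and
    weights describe the discrete omega classes and their proportions.
    """
    states = range(len(pi))
    cond_o = [prune(orig_tree, w) for w in omegas]
    cond_n = [prune(new_tree, w) for w in omegas]
    # Event c: a single omega class is shared by both subtrees.
    p_c = sum(
        wk * sum(pi[x] * co[x] * cn[x] for x in states)
        for wk, co, cn in zip(weights, cond_o, cond_n)
    )
    # Event d: omega is drawn independently in each subtree.
    avg_o = [sum(wk * co[x] for wk, co in zip(weights, cond_o)) for x in states]
    avg_n = [sum(wk * cn[x] for wk, cn in zip(weights, cond_n)) for x in states]
    p_div = sum(pi[x] * avg_o[x] * avg_n[x] for x in states)
    return p_c, p_div, (1.0 - p_d) * p_c + p_d * p_div
```

Setting `p_d = 0` makes the third returned value equal to the first, which reflects the nesting of H1 within H0.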

Comparing Models by Likelihood Ratio Tests
The two hypotheses are nested in that H1 ⊂ H0. Comparison of the models, however, corresponds to fixing the parameter p_d at the boundary of its range (p_d = 0). In this situation, evaluation of the test statistic using the χ² distribution will provide a conservative test of significance. To avoid this effect, the simple χ² distribution can be replaced with a mixed distribution when comparing test statistics that are close to the critical value (Ota et al. 2000).
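
The comparison can be sketched in Python as follows. The sketch assumes that the hypotheses differ by the single parameter p_d and that the mixed distribution is the 50:50 mixture of a point mass at zero and a χ² distribution with one degree of freedom, which is the standard result for one parameter on the boundary; whether this is exactly the mixture intended here is an assumption. The function name and the log-likelihood values in the usage line are hypothetical.

```python
import math


def lrt_p_values(loglik_h1, loglik_h0):
    """Likelihood ratio test of H1 (null, p_d = 0) against H0 (alternative).

    Returns the test statistic, the plain chi-square (1 df) P value, and the
    P value under the assumed 50:50 mixture of chi-square(0) and chi-square(1).
    """
    stat = max(2.0 * (loglik_h0 - loglik_h1), 0.0)
    # Survival function of chi-square with 1 df: P(Z^2 > stat) = erfc(sqrt(stat / 2)).
    p_chi2 = math.erfc(math.sqrt(stat / 2.0))
    # Half of the mixture mass sits at zero, so the tail probability is halved.
    p_mixed = 0.5 * p_chi2 if stat > 0.0 else 1.0
    return stat, p_chi2, p_mixed


# Hypothetical log-likelihoods, for illustration only.
stat, p_chi2, p_mixed = lrt_p_values(-1010.0, -1000.0)
```

Because the plain χ² P value is twice the mixed one for any positive statistic, using it alone errs on the conservative side, as noted above.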

Identification of Candidate Sites
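To identify sites where the selective regime has changed after host radiation, we use an empirical Bayes approach (Nielsen and Yang 1998; Yang et al. 2000). Following maximum likelihood estimation of the parameters in the model ($\hat{M}$, $\hat{p}_d$), we can calculate the probability of event d in a given site h as

$$
P(d \mid o_h, n_h) = \frac{\hat{p}_d\, P(o_h, n_h \mid \hat{M}, d)}{(1 - \hat{p}_d)\, P(o_h, n_h \mid \hat{M}, c) + \hat{p}_d\, P(o_h, n_h \mid \hat{M}, d)} .
$$
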

    Implementation and Results
 
A computer program written in JAVA that implements these models was constructed using parts of the PAL library (Drummond and Strimmer 2001) and can be downloaded from the Web site http://birc.dk.

To allow for heterogeneous selection regimes, we first implemented a discrete version of a continuous Gamma distribution following Yang et al. (2000). However, as have others, we encountered several problems with this approach, and it was therefore abandoned in favor of a discrete distribution (Yang et al. 2000). Say we have K classes of sites in the alignment with the ω ratios and proportions given as

$$
\omega_1, \omega_2, \ldots, \omega_K \quad \text{and} \quad p_1, p_2, \ldots, p_K, \qquad \sum_{k=1}^{K} p_k = 1 .
$$

This adds K rate-ratio and K - 1 class-probability parameters to the model.

To demonstrate the utility of our model, we analyzed a data set of nucleoprotein (NP) sequences from the influenza A virus taken from avian and human hosts. The NP gene encodes a protein with 498 amino acids whose primary function is the binding of viral RNA segments to form the nucleocapsid of the virus particle (Lamb and Krug 1996). The NP gene has been assigned a putative role as a determinant of host range (Murphy and Webster 1996), but the identification of residues important for species-specific adaptation has been hindered by the lack of a resolved three-dimensional protein structure (Lamb and Krug 1996). Molecular biological analysis, however, has revealed regions of the NP that are of functional importance and several epitopes that are supposedly recognized by cytotoxic T lymphocytes (CTL) of the cellular immune response (Neumann, Castrucci, and Kawaoka 1997; DiBrino et al. 1993; Davey, Dimmock, and Colman 1985; Voeten et al. 2000; Boon et al. 2002). The present-day form of the human influenza A nucleoprotein is believed to have entered the human population when an avian virus was transferred to humans immediately before the onset of the Spanish Influenza pandemic of 1918–1920 (Gorman et al. 1991). Since then, the nucleoprotein gene has evolved independently in human and avian hosts. In contrast, other genetic elements were exchanged between avian and human influenza viruses prior to each of the two pandemics in the last half of the twentieth century (Murphy and Webster 1996).

Sequences were downloaded from the Influenza Sequence Database (Macken et al. 2001) and include all available avian NP sequences and an equal number of human NP sequences that were chosen to cover the largest possible time span. Figure 2 shows the genealogy of the chosen isolates. This was inferred under the maximum likelihood (ML) criterion by the fastDNAml algorithm (Olsen et al. 1994) provided at http://bioweb.pasteur.fr/seqanal/interfaces/fastdnaml-simple.htm. A very similar genealogy was inferred with the Neighbor-Joining (NJ) algorithm as implemented in the PAUP software (Swofford 2002). Furthermore, a Bayesian estimate of the posterior genealogy distribution was obtained using the MrBayes program (Huelsenbeck and Ronquist 2001) with a general time reversible (GTR) model of substitution and a gamma distribution on rate heterogeneity. The estimation involved 3 million Markov Chain Monte Carlo samples and showed that besides the ordering of a few short external branches, there was high confidence in the structure of the ML tree (results not shown). The position of the root was determined by the inclusion of an equine influenza virus (Gorman et al. 1991), and because this isolates the human influenza lineages as a monophyletic group, the root was used as the time of the transmission event. Branch lengths were estimated using the CODEML program of the PAML package under a model that allows three discrete classes of ω ratios (Yang 2000).



 
FIG. 2. Genealogy of the nucleoprotein sequences used in the study. Sequences are listed with standard names and the branch length in units of expected substitutions per codon is indicated by the scale bar

 
Analyses were performed using the genealogies estimated by ML and NJ. This yielded very similar parameter estimates and site-specific probabilities, indicating that the methodology is robust to minor differences in the estimated genealogy. For this reason we have listed only results obtained using the ML tree. Results of the analysis under the different hypotheses, using three classes of ω ratios, are shown in table 1. Hypothesis H0 allows the ratio between the rates of nonsynonymous and synonymous substitution (ω) to differ in the two host species, and it provides a significantly better fit to the data than H1, which does not include this feature (P << 0.01). It is also evident from table 1 that the estimated distribution of ω values is wider under H0 than under H1, and that it also encompasses a class of positively selected sites (ω > 1). The empirical Bayes assignment of site-specific probabilities of ω divergence identified 12 sites with a high probability (P > 0.90) of ω divergence. These are listed in table 2, which also lists molecular biological results on NP function and immunogenicity. One of these sites (334) is located in a region proposed to be involved in the transport of NP over the nuclear membrane; another site (423) is located in a proposed CTL epitope; and two sites (32, 343) are located in regions that are proposed to be involved in both nuclear transport and CTL recognition. Furthermore, two sites (350, 353) are located in the flanking region of a proposed CTL epitope where mutations are known to disrupt cytosolic processing of viral CTL epitopes. Six sites, however, are located in regions of NP where no biological role has yet been proposed. Interestingly, two of these (101, 102) are neighbors, whereas the remaining four are spaced further apart (62, 131, 214, 290).
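
The empirical Bayes assignment used to select candidate sites can be sketched in Python as follows. The sketch assumes that the per-site probabilities of the data under events c and d have already been evaluated at the maximum likelihood estimates; the function names, the dictionary layout, and the numbers in the usage line are hypothetical and are not results from this study.

```python
def posterior_divergence(prob_c, prob_d, p_d):
    """Posterior probability of event d at one site, by Bayes' rule."""
    numerator = p_d * prob_d
    return numerator / ((1.0 - p_d) * prob_c + numerator)


def candidate_sites(site_probs, p_d, threshold=0.90):
    """Sites whose posterior probability of divergence exceeds the threshold.

    site_probs maps a site index to the pair (P(data | c), P(data | d)).
    """
    return [
        site
        for site, (prob_c, prob_d) in sorted(site_probs.items())
        if posterior_divergence(prob_c, prob_d, p_d) > threshold
    ]


# Hypothetical per-site probabilities, for illustration only.
example = {1: (1e-6, 1e-3), 2: (1e-3, 1e-3)}
selected = candidate_sites(example, p_d=0.1)
```

A site is selected only when the data are much better explained by independent ω values in the two hosts than by a shared one, since the prior weight p_d must be overcome by the ratio of the two site probabilities.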


 
Table 1 Analysis of the Nucleoprotein Gene from the Influenza A Virus.

 

 
Table 2 Identification of Codon Positions Potentially Involved in Host Adaptation.

 

    Discussion
 
The application of our model to the NP data showed that the hypothesis that allows the position-specific substitution process to differ between parasites occupying different hosts provides the better description of the data. We believe that our model captures a relevant biological phenomenon in the evolution of this host-changing parasite: namely, that selection favors parasite proteins with different properties in different hosts and therefore creates different patterns of evolution. Furthermore, we were encouraged to find that several of the sites identified by our method correspond to regions of the NP gene known to be of functional or immunological importance in the human host. An additional six sites were identified in regions of unknown function and constitute interesting targets for molecular biological analysis, such as directed mutagenesis, to evaluate their effect on host-specific adaptation in the influenza virus.

This approach focuses on identifying sites where the selective regime has changed, either as a result of host-specific alterations in immune-mediated selection or as a result of altered functional constraints after an adaptive move in sequence space. In the latter case, we are not inferring the adaptive evolutionary change itself, but rather its signature in the ensuing evolutionary process. We would, however, expect these adaptive changes to create an excess of nonsynonymous substitutions in the time immediately following the host radiation event. In an attempt to include this possibility, we also implemented a model that allows for a burst of nonsynonymous substitutions on the uppermost branch of the subtree relating the sequences from the new (human) host. This was done by multiplying all ω values in the distribution by a positive number (f_b) when calculating the substitution probabilities on this branch, thus increasing their value. The results of this series of calculations were, however, not easily interpretable, as the model seemed to favor lower ω values on this branch (f_b < 1), contrary to our expectation.

The inference of parameters like f_b that have their effect deep within an evolutionary tree is an intrinsically difficult problem because the uncertainty of the states at the inner nodes of a tree increases as one progresses upward from the leaves, making it difficult to determine whether the above-mentioned result is an actual phenomenon or an artifact of the methodology. A potential explanation is that the chosen node of divergence is a bad estimate of the true time of divergence. A branch in a tree is an arbitrary evolutionary unit affected by sampling effects. Hence, the inclusion of further avian lineages might break up the branch leading to the clade of human influenzas, signifying that part of the evolution on this branch occurred in the original avian hosts rather than in the new human host. In this case, a generally higher rate of nonsynonymous substitution on the human part of the branch caused by, e.g., immune-mediated selection would give a poor description of the evolutionary regime predominant in the true avian host. A way to counter this effect would be to lower the f_b parameter and thus the number of nonsynonymous substitutions.

There are two interesting implications of this that we did not pursue. It may be possible to infer the maximum likelihood position of the species transmission event along this branch, which in conjunction with a molecular clock model could provide an estimated date of species transmission. Furthermore, it may be possible to infer which host species is most likely to be new to the parasite if the history of the parasite is unknown.

The motivation for developing the model presented here was to study the evolution of parasites undergoing host radiation, but the model addresses the more general problem of describing the substitution process in two groups of related organisms and determining whether and where the selective regime may differ. In our formulation, the evolutionary event that is believed to have changed the way natural selection acts is the introduction to a new host environment. A closely related problem, and one that has received considerable attention, is that of functional divergence in paralogous genes, where the decisive evolutionary event is a duplication of the genetic element.

It is therefore not surprising that two recent models of functional divergence in paralogous genes are conceptually similar to our model in using models of amino acid substitution that allow the position-specific rate of amino acid substitution to change after gene duplication (Gu 2001; Knudsen and Miyamoto 2001). However, these models infer changes in the selective regime through a change in the absolute rate of substitution, whereas the codon-based approach presented here uses a comparison between the two different types of nucleotide substitution, thus enabling use of the full information in the nucleotide sequence and inclusion of the known biological phenomenon of transition-transversion bias. A potential advantage of the amino acid–based models is that they use empirically based transition matrices, which may include the effect that the physicochemical properties of amino acids have on the substitution process.

To simplify our model, we have not included any such empirical priors on substitution rates, but such factors could easily be included in the codon-based substitution model (see Goldman and Yang 1994). In relation to this aspect of the model, the ω parameter used is a rather crude indicator of the selective regime, and it would be interesting to consider a model that allowed additional indicators such as shifts between biochemically different groups of amino acids.

There are additional ways in which the present approach could be improved. An apparent and readily made generalization is to extend the model to allow for several independent host change events, i.e., a tree in which several clades of the new host are embedded in a larger tree of isolates from the original host. In the construction of the model, we have made the implicit assumption that the equilibrium codon frequencies do not change after host change. The validity of this assumption is not clear, but changes in the host cell environment could potentially change the direction of any codon bias. A simple way to relax this assumption could be to estimate the equilibrium codon frequencies from the two groups separately and then apply separate transition matrices to the different parts of the tree.
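
Estimating the equilibrium codon frequencies separately for the two groups could be sketched in Python as follows. This is an illustration of the suggested extension only, assuming in-frame sequences without stop codons; the function name and the two short example alignments are hypothetical.

```python
from collections import Counter


def codon_frequencies(sequences):
    """Observed codon frequencies in a group of in-frame coding sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        usable = len(seq) - len(seq) % 3  # ignore any trailing partial codon
        for k in range(0, usable, 3):
            counts[seq[k:k + 3]] += 1
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


# Hypothetical sequences standing in for the two host groups.
pi_original = codon_frequencies(["ATGAAA", "ATGAAG"])
pi_new = codon_frequencies(["ATGAAG", "ATGAAG"])
```

Each frequency vector would then parameterize its own rate matrix, one applied to the branches of the original host and one to the branches of the new host.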

Another assumption is that the distribution of ω is the same in both species, i.e., that the change in ω is symmetric so that an upward change in one site is balanced by a downward change in another site. This assumption is justified by the recognition that there is limited information in the data to estimate two independent distributions and that any bias it induces on the identification procedure will be toward more conservative estimates. A more serious assumption is that of independence between codon positions. This is a commonly made assumption in molecular evolutionary analysis of proteins, but it is inherently wrong because of the known three-dimensional nature of biological molecules. Some efforts have been made to address this issue (Pedersen and Jensen 2001; Pollock, Taylor, and Goldman 1999), but both approaches are computationally very demanding and were therefore not pursued here.

Rates of mutation may change after a viral host change because of alterations in, e.g., the rate of replication. As stated previously, we have assumed that the rates of synonymous and nonsynonymous substitution scale identically with such changes in the mutation rate. The validity of this assumption depends on the unknown population genetic mechanisms underlying the process of fixation. For sites where all amino acid substitutions are either neutral or strongly deleterious, we expect this scaling to hold. This may not be the case, however, for sites undergoing positive selection if the process of fixation is limited by factors other than the availability of mutations. Should this assumption be violated, a change in the mutation rate could change the ratio of the rates of synonymous and nonsynonymous substitution without any change in the underlying selective regime acting at the protein level, and could therefore bias analyses like the one presented here and lead to the identification of false positives. In our future work, we intend to explore and potentially eliminate this bias through the construction of a model that allows for simultaneous changes in both the selective regime acting at the protein level and the underlying rate of mutation as represented by the rate of synonymous substitution.

All this having been said, we still believe that the results from the analysis of the NP gene data show promise for our comparative approach and that this type of analysis, correlated with structural, immunological, and other information, may provide a useful way to identify candidate sites for the molecular biological investigation of species-specific adaptation in parasites.


    Acknowledgements
 
We thank Jakob Skou Pedersen, Mikkel Heide Schierup, Bjarne Knudsen, and Alexei Drummond for useful comments and discussions. This study was supported by grants 21-02-0206 and 51-00-0392 from the Danish Natural Science Research Council; by grant 1-R01-GM60729-01 from the National Institutes of Health, USA; by grant HAMJW from EPSRC; and by grant HAMKA from MRC.


    Footnotes
 
1 Present address: Bioinformatics Group, Department of Statistics, University of Oxford, Oxford, United Kingdom.

Keith Crandall, Associate Editor


    Literature Cited
 

    Boon, A. C., G. de Mutsert, Y. M. Graus, R. A. Fouchier, K. Sintnicolaas, A. D. Osterhaus, and G. F. Rimmelzwaan. 2002. Sequence variation in a newly identified HLA-B35-restricted epitope in the influenza A virus nucleoprotein associated with escape from cytotoxic T lymphocytes. J. Virol. 76:2567-2572.

    Chang, S. F., J. Y. Sgro, and C. R. Parrish. 1992. Multiple amino acids in the capsid structure of canine parvovirus coordinately determine the canine host range and specific antigenic and hemagglutination properties. J. Virol. 66:6858-6867.

    Chua, K. B., W. J. Bellini, and P. A. Rota, et al. (22 co-authors). 2000. Nipah virus: a recently emergent deadly paramyxovirus. Science 288:1432-1435.

    Crill, W. D., H. A. Wichman, and J. J. Bull. 2000. Evolutionary reversals during viral adaptation to alternating hosts. Genetics 154:27-37.

    Davey, J., N. J. Dimmock, and A. Colman. 1985. Identification of the sequence responsible for the nuclear accumulation of the influenza virus nucleoprotein in Xenopus oocytes. Cell 40:667-675.

    DiBrino, M., T. Tsuchida, R. V. Turner, K. C. Parker, J. E. Coligan, and W. E. Biddison. 1993. HLA-A1 and HLA-A3 T cell epitopes derived from influenza virus proteins predicted from peptide binding motifs. J. Immunol. 151:5930-5935.

    Drummond, A., and K. Strimmer. 2001. PAL: an object-oriented programming library for molecular evolution and phylogenetics. Bioinformatics 17:662-663.

    Ebert, D. 1998. Experimental evolution of parasites. Science 282:1432-1435.

    Felsenstein, J. 1981. Evolutionary trees from DNA sequences: a maximum likelihood approach. J. Mol. Evol. 17:368-376.

    Gao, F., E. Bailes, and D. Robertson, et al. (12 co-authors). 1999. Origin of HIV-1 in the chimpanzee Pan troglodytes troglodytes. Nature 397:436-441.

    Goldman, N., and Z. Yang. 1994. A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol. Biol. Evol. 11:725-736.

    Gorman, O. T., W. J. Bean, Y. Kawaoka, I. Donatelli, Y. J. Guo, and R. G. Webster. 1991. Evolution of influenza A virus nucleoprotein genes: implications for the origins of H1N1 human and classical swine viruses. J. Virol. 65:3704-3714.

    Gu, X. 2001. A site-specific measure for rate difference after gene duplication or speciation. Mol. Biol. Evol. 18:2327-2330.

    Huelsenbeck, J. P., and F. Ronquist. 2001. MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics 17:754-755.

    Hughes, M. T., M. Matrosovich, M. E. Rodgers, M. McGregor, and Y. Kawaoka. 2000. Influenza A viruses lacking sialidase activity can undergo multiple cycles of replication in cell culture, eggs, or mice. J. Virol. 74:5206-5212.

    Kimura, M. 1983. The Neutral Theory of Molecular Evolution, 1st edition. Cambridge University Press, Cambridge.

    Knudsen, B., and M. M. Miyamoto. 2001. A likelihood ratio test for evolutionary rate shifts and functional divergence among proteins. Proc. Natl. Acad. Sci. USA 98:14512-14517.

    Lamb, R. A., and R. M. Krug. 1996. Orthomyxoviridae: the viruses and their replication. Pp. 1353–1395 in B. N. Fields, ed. Fields Virology, 3rd Edition. Raven Press, New York.

    Macken, C., H. Lu, J. Goodman, and L. Boykin. 2001. The value of a database in surveillance and vaccine selection. Pp. 103–106 in A. Osterhaus, N. Cox, and A. Hampson, eds. Options for the Control of Influenza IV. Elsevier Science, New York.

    Murphy, B. R., and R. G. Webster. 1996. Orthomyxoviruses. Pp. 1397–1445 in B. N. Fields, ed. Fields Virology, 3rd Edition. Raven Publishers, New York.

    Neumann, G., M. R. Castrucci, and Y. Kawaoka. 1997. Nuclear import and export of influenza virus nucleoprotein. J. Virol. 71:9690-9700.

    Nielsen, H. S., M. B. Oleksiewicz, R. Forsberg, T. Stadejek, A. Botner, and T. Storgaard. 2001. Reversion of a live porcine reproductive and respiratory syndrome virus vaccine investigated by parallel mutations. J. Gen. Virol. 82(Pt 6):1263-1272.

    Nielsen, R., and Z. Yang. 1998. Likelihood models for detecting positively selected amino acid sites and applications to the HIV-1 envelope gene. Genetics 148:929-936.

    Olsen, G. J., H. Matsuda, R. Hagstrom, and R. Overbeek. 1994. fastDNAmL: a tool for construction of phylogenetic trees of DNA sequences using maximum likelihood. Comput. Appl. Biosci. 10:41-48.

    Ota, R., P. J. Waddell, M. Hasegawa, H. Shimodaira, and H. Kishino. 2000. Appropriate likelihood ratio tests and marginal distributions for evolutionary tree models with constraints on parameters. Mol. Biol. Evol. 17:798-803.

    Pedersen, A. M., and J. L. Jensen. 2001. A dependent-rates model and an MCMC-based methodology for the maximum-likelihood analysis of sequences with overlapping reading frames. Mol. Biol. Evol. 18:763-776.

    Pollock, D. D., W. R. Taylor, and N. Goldman. 1999. Coevolving protein residues: maximum likelihood identification and relationship to structure. J. Mol. Biol. 287:187-198.

    Swofford, D. L. 2002. PAUP*, phylogenetic analysis using parsimony (*and other methods). Version 4. Sinauer Associates, Sunderland, Mass.

    Turner, P. E., and S. F. Elena. 2000. Cost of host radiation in an RNA virus. Genetics 156:1465-1470.

    Voeten, J. T., T. M. Bestebroer, N. J. Nieuwkoop, R. A. Fouchier, A. D. Osterhaus, and G. F. Rimmelzwaan. 2000. Antigenic drift in the influenza A virus (H3N2) nucleoprotein and escape from recognition by cytotoxic T lymphocytes. J. Virol. 74:6800-6807.

    Yang, Z. 2000. Phylogenetic Analysis by Maximum Likelihood (PAML), 3rd edition. University College London.

    Yang, Z., R. Nielsen, N. Goldman, and A. M. Pedersen. 2000. Codon-substitution models for heterogeneous selection pressure at amino acid sites. Genetics 155:431-449.

Accepted for publication March 31, 2003.