* Department of Statistics, University of California, Berkeley; The Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia;
Swiss Institute for Experimental Cancer Research, NCCR Molecular Oncology, Bioinformatics, Lausanne, Switzerland
Correspondence: E-mail: terry{at}stat.berkeley.edu.
Key Words: Plasmodium, serine repeat antigen (SERA), reconciliation, orthology, paralogy, GC content
Introduction
Recently, it has become clear that SERA5 and another well-known P. falciparum SERA5 paralog known as SERPH, or SERA6, belong to a large gene family (Knapp et al. 1991). This family includes at least nine members in P. falciparum, four members in P. yoelii, and numerous members in other Plasmodium genomes that have not yet been completely sequenced (Kiefer et al. 1996; Gor, Li, and Rosenthal 1998; Carlton et al. 2002; Gardner et al. 2002).
Orthologous gene family members that have diverged as a result of speciation are more likely to conserve protein function than paralogous family members, which diverged as a result of gene duplication (Thornton and DeSalle 2000; Zmasek and Eddy 2001). In this paper, we aimed to shed light on aspects of SERA function by establishing the evolutionary history of the members of the SERA gene family. In particular, we sought to clarify the relationship among the P. falciparum SERAs and to identify orthologs to SERA5 and SERA6 in Plasmodium species affecting rodents.
Both objectives required inference about the pattern of duplication, speciation, and gene loss events in the SERA gene family history. "Reconciliation" is one approach for making such inferences (Goodman et al. 1979; Mirkin, Muchnik, and Smith 1995; Guigo, Muchnik, and Smith 1996; Page and Charleston 1997; Page 1998; Thornton and DeSalle 2000; Zmasek and Eddy 2001). The reconciliation algorithm relies, however, upon a trusted or hypothesized gene tree and a species tree. (Here and throughout, we use the phrase "gene tree" to denote the evolutionary history of a gene family, which may be inferred directly from the DNA sequences or from the amino acid sequences of the derived proteins. "Species tree," in contrast, will refer to the evolutionary relationships among the species in which members of the gene family are found.)
To produce a set of candidate gene trees, we considered SERA proteins from seven species of Plasmodium. All possess a central domain that shows homology to the papain family of cysteine proteases, although some exhibit an unusual cysteine-to-serine substitution at the active site cysteine residue (Bzik et al. 1988; Kiefer et al. 1996; Gor, Li, and Rosenthal 1998; Hodder et al. 2003). We first used the amino acid sequences from this domain to infer an unrooted gene tree.
At the nucleotide level, GC content is relatively consistent among SERA sequences within a single Plasmodium species, but it differs widely between species. Such compositional differences have previously been shown to have an impact on phylogenetic inference, with similarity of GC content creating "spurious attraction." It has been further suggested that GC content differences may also affect phylogenetic analysis based on the corresponding amino acid sequences (Galtier and Gouy 1995, 1998; Foster and Hickey 1999). Galtier and Gouy have proposed both distance (1995) and maximum-likelihood (1998) approaches that explicitly address the issue, and to assess whether spurious attraction may have influenced the amino acid tree, we also applied their distance approach to the SERA nucleotide sequences.
For candidate species trees, we reviewed the literature. In previous studies that have included some or all of these species, the use of different genes, and even of different inference methods applied to the same gene, has produced two contradictory results, shown in figure 1 (Escalante and Ayala 1994; McCutchan et al. 1996; Templeton and Kaslow 1997; Escalante et al. 1998; Rich and Ayala 2000; Rathore et al. 2001; Michon et al. 2002; Perkins and Schall 2002). It is not clear that a single species tree need apply to the entire Plasmodium genome, and we, therefore, considered both alternatives as competing hypotheses.
Methods
Gene Tree Inference
The SERA proteins possess a central domain that shows strong homology to the papain family of cysteine proteases (Bzik et al. 1988; Kiefer et al. 1996; Gor, Li, and Rosenthal 1998). All amino acid sequence data for the 33 putative homologs were aligned using ClustalW with default parameters (Thompson, Higgins, and Gibson 1994), and approximately 266 residues from this readily alignable protease domain were used for subsequent analysis.
Pairwise distances were computed as maximum-likelihood estimates of the expected number of substitutions per site, under the Jones-Taylor-Thornton (JTT) matrix substitution model (Jones, Taylor, and Thornton 1992). These distances were calculated with PROTDIST, and unrooted trees were constructed with the neighbor-joining algorithm (Saitou and Nei 1987) as implemented in NEIGHBOR, both in the PHYLIP package (Felsenstein 1996). Except for choice of substitution model, default parameters were used here and below.
To assess the degree to which the inferred topology depended on choice of substitution model, an amino acid pairwise distance matrix was also computed using the Dayhoff PAM model (Dayhoff 1979). The resulting neighbor-joining tree was topologically identical to the JTT tree.
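The neighbor-joining step described above can be illustrated with a minimal sketch. This is not the PHYLIP NEIGHBOR implementation; it is a bare-bones version of the Saitou and Nei (1987) joining criterion that reports only the tree topology (as a set of splits), with no branch lengths. The function name, the taxon labels, and the toy distance matrix are all assumptions made for illustration.

```python
from itertools import combinations


def neighbor_joining_splits(names, dist):
    """Return the nontrivial splits of the neighbor-joining tree.

    names: list of taxon labels (hypothetical toy labels in the example).
    dist:  dict mapping frozenset({a, b}) to the distance between a and b.
    Each split is reported as the side that does not contain names[0].
    """
    d = dict(dist)
    members = {name: frozenset([name]) for name in names}
    all_taxa = frozenset(names)
    nodes = list(names)
    splits = set()
    while len(nodes) > 3:
        n = len(nodes)
        # Net divergence of each node from all other current nodes.
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i) for i in nodes}
        # Join the pair that minimizes the neighbor-joining criterion.
        i, j = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]],
        )
        new = ("join", i, j)  # label for the new internal node
        # Distances from the new node to every remaining node.
        for k in nodes:
            if k not in (i, j):
                d[frozenset((new, k))] = 0.5 * (
                    d[frozenset((i, k))] + d[frozenset((j, k))] - d[frozenset((i, j))]
                )
        members[new] = members[i] | members[j]
        side = members[new]
        if names[0] in side:
            side = all_taxa - side
        splits.add(side)
        nodes = [k for k in nodes if k not in (i, j)] + [new]
    return splits


# Toy additive distances for the unrooted tree ((A,B),C,(D,E)), all edges of length 1.
names = ["A", "B", "C", "D", "E"]
raw = {
    ("A", "B"): 2, ("A", "C"): 3, ("A", "D"): 4, ("A", "E"): 4,
    ("B", "C"): 3, ("B", "D"): 4, ("B", "E"): 4,
    ("C", "D"): 3, ("C", "E"): 3, ("D", "E"): 2,
}
dist = {frozenset(pair): value for pair, value in raw.items()}
splits = neighbor_joining_splits(names, dist)
```

For this toy matrix the recovered splits are AB|CDE and ABC|DE, matching the tree from which the distances were generated.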
Neighbor-joining trees were also constructed for two additional distance matrices computed from the corresponding DNA sequences. The GC content for the aligned regions differed substantially from one species to the next (table 1 and fig. 2). To assess the impact, if any, of this compositional bias on the JTT gene tree, both the Galtier and Gouy substitution model, which explicitly addresses varying GC content, and the PHYLIP default substitution model for DNA sequence data were applied (Galtier and Gouy 1995; Felsenstein 1996). We will refer to the two DNA substitution models as "GG" and "F84." (Note that nucleotide data were available for only one of the five P. vivax sequences.)
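As a small illustration of the quantity being compared across species, GC content for an aligned nucleotide sequence can be computed as follows. This is a sketch only: the function name is hypothetical, and the choice to ignore gap and ambiguity characters is an assumption, not something stated in the text.

```python
def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T) in seq.

    Gap characters and ambiguity codes are skipped (an assumption made
    for this sketch).
    """
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    return sum(b in "GC" for b in bases) / len(bases)


# Toy aligned fragment, not taken from the SERA data.
fraction = gc_content("GGCC--AATT")
```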
Reconciliation of Gene and Species Trees
Several authors have presented a straightforward technique for reconciling a rooted gene tree with a trusted or hypothesized species tree (Goodman et al. 1979; Guigo, Muchnik, and Smith 1996; Page and Charleston 1997; Thornton and DeSalle 2000; Zmasek and Eddy 2001). This technique assumes that (1) the evolutionary histories of both the gene family and the species from which the genes derive can be represented as binary trees, and (2) members of the gene family have arisen through duplication, speciation, and deletion events only, not through horizontal gene transfer.
The algorithm provides a means of selecting the set of internal nodes in the gene tree that represent duplication events. In addition, unobserved genes (genes that have disappeared through deletion, have mutated to such an extent that they are no longer recognizable members of the gene family, or have simply been overlooked or omitted from analysis) are inferred as well. A minimal set of such genes, together with their most-parsimonious placement on the gene tree, is returned along with the list of inferred duplication events. The reconciliation algorithm was applied to the SERA gene family using the GeneTree software package (Page 1998).
Although a detailed description of the reconciliation algorithm can be found in the references, we give a brief overview here for convenience. We first present the reconciliation algorithm informally, in a way that highlights the intuition upon which it is based:
Under assumptions (1) and (2) above, a split in the gene family tree that resulted from a speciation event must, by definition, lead to two subtrees composed of genes from disjoint sets of species. Splits that correspond to duplication events, on the other hand, lead to two subtrees composed of genes from identical sets of species, provided, of course, that no genes are deleted or missing. Even in the more realistic case, where gene deletion and incomplete gene family data are permitted, if the sets of species included in the two subtrees arising from a given internal node contain at least one common member, we infer that the node represents a duplication event. If, on the other hand, these two sets are disjoint and, moreover, they reflect a division of species that is consistent with the hypothesized species tree, then we infer that the node represents a speciation event.
The reconciliation algorithm can be stated formally as follows: Let $S$ and $G$ denote the set of nodes of the species tree and gene tree, respectively. (Both trees are assumed to be rooted and binary.) For any $g \in G$, define $\sigma(g)$ to be the set of species contained in the subtree beginning at node $g$ and more recent than $g$. Define $\sigma(s)$ similarly for any $s \in S$.
We may now define a map from $G$ to $S$: for every $g \in G$, let $M(g)$ be the lowest (most recent) $s \in S$ for which $\sigma(g) \subseteq \sigma(s)$. Now for any internal $g \in G$, with child nodes $g_1$ and $g_2$, we infer that $g$ represents a duplication event if and only if $M(g)$ is equal to either $M(g_1)$ or $M(g_2)$; that is, if the node $g$ maps to the same position in the species tree as one of its children.
As discussed previously, in the simplest case in which no genes are missing or have been deleted, a duplication at node $g$ produces identical sets of species in both subsequent subtrees. Thus, $\sigma(g) = \sigma(g_1) = \sigma(g_2)$, and so $M(g) = M(g_1) = M(g_2)$. It is straightforward to show that if $\sigma(g_1) \cap \sigma(g_2)$ is nonempty, then $M(g)$ equals either $M(g_1)$ or $M(g_2)$, and the informal description given above is equivalent to the formal algorithm.
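The formal definition above translates fairly directly into code. The following is a minimal sketch, not the GeneTree implementation: trees are represented as nested 2-tuples with string leaves, each species-tree node is identified by its species set, and the map M is computed by brute force as the smallest species-tree clade containing the species set of the gene-tree node. All function names, the toy species tree, and the gene labels are hypothetical.

```python
def clades(species_tree):
    """List the species set of every node of a rooted binary species tree.

    The last entry is always the set for the root of the (sub)tree passed in.
    """
    if isinstance(species_tree, str):
        return [frozenset([species_tree])]
    left, right = species_tree
    cl, cr = clades(left), clades(right)
    return cl + cr + [cl[-1] | cr[-1]]


def lca_map(species_set, all_clades):
    """M(g): the smallest species-tree clade that contains species_set."""
    return min((c for c in all_clades if species_set <= c), key=len)


def reconcile(gene_tree, species_of, all_clades, events):
    """Label each internal gene-tree node as a duplication or a speciation.

    Returns (genes below the node, species set of the node, M(node)) and
    appends (sorted gene names, event label) to `events` in post-order.
    """
    if isinstance(gene_tree, str):  # leaf: a single gene
        sigma = frozenset([species_of[gene_tree]])
        return [gene_tree], sigma, lca_map(sigma, all_clades)
    left, right = gene_tree
    genes1, sigma1, m1 = reconcile(left, species_of, all_clades, events)
    genes2, sigma2, m2 = reconcile(right, species_of, all_clades, events)
    sigma = sigma1 | sigma2
    m = lca_map(sigma, all_clades)
    # Duplication if and only if the node maps to the same place as a child.
    kind = "duplication" if m in (m1, m2) else "speciation"
    events.append((tuple(sorted(genes1 + genes2)), kind))
    return genes1 + genes2, sigma, m


# Toy example with hypothetical gene labels (not the SERA gene tree).
species_tree = (("falciparum", "reichenowi"), ("vivax", "knowlesi"))
species_of = {"fA": "falciparum", "rA": "reichenowi",
              "fB": "falciparum", "rB": "reichenowi", "vA": "vivax"}
gene_tree = ((("fA", "rA"), ("fB", "rB")), "vA")
events = []
reconcile(gene_tree, species_of, clades(species_tree), events)
```

In this toy case the node joining the two falciparum/reichenowi pairs is labeled a duplication, because its two subtrees cover the same species, while the remaining internal nodes are labeled speciations.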
In addition to inferring which nodes in the gene tree represent speciation events and which represent duplication events, the algorithm also permits inference of a minimal set of missing or deleted genes, as well as the most-parsimonious placement of such genes in the gene tree. Briefly, a node $g \in G$, which is inferred to be a duplication, is assumed to propagate all species in $\sigma(M(g))$ into both child subtrees; any members of $\sigma(M(g))$ not found in the subtree children of $g$ are inferred to be lost or missing. The inferred deletion events are then placed so as to account for multiple missing genes with fewer deletion events and to require fewer missing genes. A complete explanation of the parsimonious placement of such lost or missing genes can be found in the references.
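The first step of this loss inference, for a single duplication node, amounts to a pair of set differences. The sketch below covers only that step; it does not attempt the parsimonious placement of deletion events, for which the references should be consulted. The function name and the toy species sets are assumptions.

```python
def missing_after_duplication(clade_species, child1_species, child2_species):
    """Species expected in each child subtree of a duplication node but absent.

    clade_species:  species below the species-tree node the duplication maps to.
    childN_species: species actually observed in each child subtree.
    Returns a pair of frozensets, one per child subtree.
    """
    clade = frozenset(clade_species)
    return clade - frozenset(child1_species), clade - frozenset(child2_species)


# Toy example: the duplication maps to the falciparum/reichenowi clade,
# but the second copy was observed only in falciparum.
lost1, lost2 = missing_after_duplication(
    {"falciparum", "reichenowi"},
    {"falciparum", "reichenowi"},
    {"falciparum"},
)
```

Here the second subtree is inferred to be missing a reichenowi gene, which could reflect either a true deletion or a sequence that has not yet been found.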
The set of inferred duplications and missing or deleted genes is of inherent interest, and it also provides a parsimony-based criterion for rooting unrooted trees and for selecting among alternative gene and species tree topologies. If duplication and true deletions (as opposed to genes that are present in the organism but were overlooked in the analysis) are assumed to be rare, then it is sensible, ceteris paribus, to minimize the total number of such events. The use of such a criterion amounts to a maximum-parsimony approach to duplication and gene loss events. Although we do not utilize it here, an explicit-probability model for such events has recently been proposed, permitting a maximum-likelihood approach as well (Arvestad et al. 2003). Maximum-likelihood inference has several advantages over cruder parsimony arguments, such as the ability to compute standard errors and confidence intervals. The validity of such computed quantities, however, hangs on the assumption of homogeneity of the birth-death process throughout the evolutionary history of the gene family and on our ability to accurately model this process.
Results
Figure 3A shows the unrooted gene tree returned by the neighbor-joining algorithm when applied to the JTT distance matrix. Bootstrap support values (percentages, with B = 200) for several important internal edges are superimposed. In addition, the amino acids present at two sites that are important for proteolytic activity are indicated. Note that the cysteine and serine protease families segregated, with 100% bootstrap support. Two further subsets of the serine family also feature a second active site mutation: histidine to methionine in one case, and to leucine in the other. Again, there was 100% bootstrap support for segregation by residue at this second active site. The fact that the inferred gene tree neatly accommodated these unusual mutations at two sites with functional significance in the papain homolog lends support to its core topology.
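Bootstrap support percentages of this kind can be sketched as follows, assuming the standard nonparametric bootstrap in which alignment columns are resampled with replacement (Felsenstein 1985). The tree-building step is passed in as a function; `infer_splits` is a hypothetical placeholder for whatever pipeline turns an alignment into a set of splits (here, distance estimation followed by neighbor-joining), and the function names and default seed are likewise assumptions.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample the columns of an alignment with replacement.

    alignment: dict mapping sequence name to an aligned string (equal lengths).
    """
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}


def bootstrap_support(alignment, infer_splits, split, B=200, seed=0):
    """Percentage of B bootstrap replicates whose inferred tree contains `split`.

    infer_splits: caller-supplied function (hypothetical here) mapping an
    alignment to the set of splits of the tree inferred from it.
    """
    rng = random.Random(seed)
    hits = sum(
        split in infer_splits(bootstrap_alignment(alignment, rng)) for _ in range(B)
    )
    return 100.0 * hits / B
```

Seeding the generator makes the replicates reproducible; B = 200 matches the number of replicates reported in the text.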
Another point worth noting is that bootstrap support for the relationship among the three main subtrees of the serine family, namely the P. falciparum/P. reichenowi subtree (I), the serine/methionine subtree (II), and the P. vivax/P. knowlesi subtree (III), was substantially weaker than that seen in the amino acid tree. The relative location of these subtrees did not change, but the small edge separating I and II in this tree had only 43% bootstrap support. Further, table 2 shows that among the bootstrap trees considered, two alternative configurations for the three subtrees appeared with nearly equal frequency. Thus, nucleotide data used in combination with the Galtier and Gouy substitution model were insufficient for resolution of the core topology in this region of the gene tree.
We next note the suspicious exchange of the k3-k4 and v2-k5 branches in subtree III. GC content for all sequences in the v1-v3-v4-v5b group is above 50%. GC content for v2 and k5 is 51% and 45%, respectively, compared with only 37% for both k3 and k4. One is tempted, therefore, to attribute this swap to "attraction" between sequences of similar GC content. It is doubtful, however, that the F84 tree correctly represents the configuration of sequences in this subtree: as mentioned above, the active site histidine-to-leucine mutation is unusual. In the absence of good evidence to the contrary, parsimony suggests that a gene tree requiring such a mutation to occur twice in the gene family's evolutionary history (as is the case for the F84 tree shown) is less plausible than a gene tree that requires only one such mutation.
The exchange of position of subtrees I and II is interesting (the possibility of such an exchange will come up in the context of gene and species tree reconciliation), but bootstrap support for this exchange was very poor. In fact, the core topology of the F84 gene tree proved to be highly unstable among the bootstrap trees. As can be seen in table 2, the depicted F84 gene tree was not even the consensus tree among bootstrap results. The consensus tree (not shown) placed subtree III at B (fig. 3C), in close proximity to the P. vivax and P. knowlesi cysteine proteases. That subtree III should move to A or B, adjacent to the other high-GC P. vivax and P. knowlesi sequences, is again what one would expect if GC content induced attraction among sequences of similar composition. As will be discussed below, placement of subtree III at A creates problems for reconciliation unless a new, third species tree previously unreported in the literature is entertained. Placement of subtree III at B is even less justified. It requires both a novel species tree for satisfactory reconciliation and that the unusual cysteine-to-serine mutation has occurred twice in the SERA family's evolutionary history.
Reconciliation: Root Placement, Duplications, and Deletions
Parsimony suggests that needless inference of gene duplication or loss should be avoided, and this criterion provides a basis for rooting the gene trees produced by neighbor-joining. (More specifically, it suggests an optimal internal edge but does not provide for exact placement of the root along this edge.) For the topology found in the JTT and GG gene trees, placement of the root on the long edge connecting the cysteine and serine protease families substantially reduced the number of lost or missing genes implied by reconciliation (using either of the two species trees). This edge was also selected by the traditional "midpoint" method.
Figure 4 shows the pattern of duplication and missing or lost genes inferred by reconciliation of the so-rooted JTT tree with species trees #1 and #2. In both cases, a substantial number of lost genes were required. Replacing the JTT tree with the GG tree produced qualitatively similar results (not shown), and a summary of the number of implied duplications and deletions is given in table 3.
With respect to the missing or deleted genes, gene deletion is commonplace in the evolutionary history of many organisms. In addition, it is highly likely that some SERA family members were overlooked, particularly from species such as P. vinckei, P. chabaudi, and P. reichenowi, for which genome data is still incomplete. Nonetheless, the significant number of lost or missing genes inferred for the better-studied genomes suggests that we may be able to improve on this result:
Modifications to Improve Reconciliation
A substantially more parsimonious reconciliation of the gene and species trees is possible if we consider configurations around low confidence internal edges of the inferred gene trees to be flexible. In particular, bootstrap support for the placement of f6 and the f7/r7 pair in all gene trees was poor, and the placement of these genes was seen to be sensitive to inference method. If we accept the JTT/GG configuration of major subtrees in the serine family as correct but reposition f6 and f7/r7 as shown in figure 3D, the number of missing or deleted genes required for reconciliation with species tree #2 is substantially reduced, from as many as 25 to eight, all but one of which are in less-studied or incomplete genomes. If we consider instead the minimum number of deletion events required for reconciliation (loss of a single gene early in the SERA family's evolutionary history can produce multiple missing genes among present-day sequences), a comparable improvement is obtained, from as many as 14 to just seven. Figure 4C depicts the inferred pattern of duplication and missing or deleted genes after the suggested modifications were made to the JTT gene tree.
If species tree #1 is to be used instead, a similar improvement in parsimony of reconciliation can also be achieved by exchanging subtrees I and II. (The f6 branch must also be repositioned slightly.) However, such a modification seems less justifiable because this configuration for the serine protease family subtrees appeared in only 14% of JTT bootstrap trees. (It did appear with slightly higher frequency among F84 bootstrap trees and just under half the time among GG bootstrap trees, but the low bootstrap support for any particular configuration of serine family subtrees in the GG and F84 analyses indicates that the nucleotide sequence data were insufficient for resolution of the subtree configuration.)
Although it has little bearing on the larger analysis, one might also consider minor adjustments among f2/r2 and f4/r4 to reduce the number of inferred duplication events by one: bootstrap support for the f2/f4 clade shown in the JTT tree was a modest 64%.
For completeness, we note that if subtree III is moved to A in figure 3C (or even to B, as suggested by the F84 consensus tree, albeit with a different root location), comparable improvements in reconciliation parsimony can be obtained by using a species tree that features a P. vivax/P. knowlesi outgroup. (Again, f6 and f7/r7 must be repositioned slightly.) Because we have not seen such a species tree in the literature, and because there is little support for the required subtree swap in any of the inferred gene trees, we discard this configuration.
Discussion
Although the JTT tree improved on both nucleotide-based trees, it could not be reconciled with either candidate species tree unless we accept that many SERA genes, including some in the better-studied species, have disappeared during the history of the gene family or have yet to be turned up among available sequence data, and that there are nontrivial hidden duplications and unanticipated paralogies among the SERA family.
If, instead, we admit the existence of small errors in the inferred gene tree and make appropriate modifications, a substantially more satisfactory reconciliation can be achieved. Given that our stochastic substitution models are approximations at best, it makes sense to regard fine detail and small internal edges in inferred phylogenies with skepticism, and to entertain reasonable modifications to such features.
To achieve satisfactory reconciliation with species tree #1, we had to exchange subtrees I and II in the serine protease family, and also reposition f6 slightly. The latter modification is easily justified, but bootstrap analysis of the JTT gene tree provided moderately strong evidence against the former and, thus, against use of species tree #1. For satisfactory reconciliation with species tree #2, on the other hand, only the f6 and f7/r7 nodes required minor repositioning. Because none of the models could place these nodes definitively, we feel that substantial gains in parsimony of reconciliation justify such modifications. (Curiously, the inferred set of missing genes under the species tree #1 scenario is identical to that produced when species tree #2 is used, so the composition of this set provides no further basis for selecting one scenario over the other. It is also interesting to note that the quantity of available Plasmodium sequence data in public databases increased significantly during the writing of this paper. Although they were not included in the present analysis, sequences that appear to correspond to three of the missing P. chabaudi genes in figure 4C, the predicted P. vivax ortholog of k2, and two of the three missing P. reichenowi genes are now available.)
In summary, we conclude that the SERA family gene tree should be rooted as shown in figure 5, with f6 and f7/r7 repositioned as suggested by reconciliation. In repositioning these nodes, an effort was made to preserve relative pairwise distances from the original JTT tree, but depicted edge lengths in this region of the tree remain somewhat arbitrary. Parsimony of reconciliation provides no information about evolutionary distances, and future work to develop a method for smoothly integrating the results of reconciliation is still required.
Our analysis has several implications for SERA biology. Clearly, there are two distinct evolutionary groupings of SERA proteins that separate according to the residue (cysteine or serine) in the catalytic position, and it would appear that at least two of each type are present in each Plasmodium species. It is also evident that there are very different numbers of SERA proteins in the different Plasmodia. P. falciparum, for example, has nine genes, whereas species affecting rodents (P. yoelii, P. vinckei, and P. chabaudi) appear to have only four. Furthermore, there has been substantially more duplication and sequence divergence among the serine family than among the cysteine family. This implies either that the former group is under greater selection pressure or that the latter is under greater functional constraint.
Several questions remain. In many instances, it is still not clear which SERA proteins are likely to perform the same biological role across the species. Among species affecting rodents, for example, what are the functional orthologs of the dominantly expressed and apparently essential P. falciparum SERA5 and SERA6 genes? (The gene tree in figure 4C gives two alternatives for each.) Why do different species exhibit different numbers of SERA genes? What selective forces are promoting gene duplication (especially among the serine family), and to what degree are such forces species specific? Do the SERA paralogs within a single species all have distinct biological roles? In P. falciparum, expression profiling has shown that, although coregulated, different SERA genes are expressed to very different levels and that this profile is common across laboratory lines; that is, SERA genes are not differentially expressed like members of some other antigen-encoding gene families such as the var and rifin genes (Miller et al. 2002; Aoki et al. 2002). Furthermore, gene knockout analysis has shown that only strongly expressed SERA genes appear to be important for blood stage growth (Miller et al. 2002).
Such evidence, together with the phylogenetic analysis described in this paper, indicates that the number of different functions performed by the SERA proteins in blood stages is likely to be far more limited than the total number of family members would suggest, perhaps to as few as two different roles that separate according to the nature of the catalytic residue in the active site. As dominantly expressed blood-stage antigens, SERA proteins, most notably SERA5 and SERA6, are established vaccine candidates and/or emerging drug targets. The likelihood that other P. falciparum SERAs may have precisely the same biological function (perhaps simply requiring expression to the appropriate level to be fully functionally complementary) suggests that effective vaccines or drugs targeting the SERAs may need to target multiple family members.
References
Aoki, S., J. Li, S. Itagaki, B. A. Okech, T. G. Egwang, H. Matsuoka, N. M. Q. Palacpa, T. Mitamura, and T. Horii. 2002. Serine repeat antigen (SERA5) is predominantly expressed among the SERA multigene family of Plasmodium falciparum, and the acquired antibody titers correlate with serum inhibition of the parasite growth. J. Biol. Chem. 277:47533–47540.
Arvestad, L., A. C. Berglund, J. Lagergren, and B. Sennblad. 2003. Bayesian gene/species tree reconciliation and orthology analysis using MCMC. Bioinformatics 19(suppl 1):i7–i15.
Bahl, A., B. Brunk, R. L. Coppel et al. (17 co-authors). 2002. PlasmoDB: the Plasmodium genome resource. An integrated database providing tools for accessing and analyzing mapping, expression, and sequence data (both finished and unfinished). Nucleic Acids Res. 30:87–90.
Bzik, D. J., W. B. Li, T. Horii, and J. Inselburg. 1988. Amino acid sequence of the serine-repeat antigen (SERA) of Plasmodium falciparum determined from cloned cDNA. Mol. Biochem. Parasitol. 30:279–288.
Carlton, J. M., S. V. Angiuoli, B. B. Suh et al. (44 co-authors). 2002. Genome sequence and comparative analysis of the model rodent malaria parasite Plasmodium yoelii yoelii. Nature 419:512–519.
Conway, D. J., and J. Baum. 2002. In the blood - the remarkable ancestry of Plasmodium falciparum. Trends Parasitol. 18:351–355.
Dayhoff, M. O. 1979. Atlas of protein sequence and structure, Vol. 5, 1978. National Biomedical Research Foundation, Washington, DC.
Delplace, P., B. Fortier, G. Tronchin, J. F. Dubremetz, and A. Vernes. 1987. Localization, biosynthesis, processing and isolation of a major 126 kDa antigen of the parasitophorous vacuole of Plasmodium falciparum. Mol. Biochem. Parasitol. 23:193–201.
Escalante, A. A., and F. J. Ayala. 1994. Phylogeny of the malarial genus Plasmodium, derived from rRNA gene sequences. Proc. Natl. Acad. Sci. USA 91:11373–11377.
Escalante, A. A., D. E. Freeland, W. E. Collins, and A. A. Lal. 1998. The evolution of primate malaria parasites based on the gene encoding cytochrome b from the linear mitochondrial genome. Proc. Natl. Acad. Sci. USA 95:8124–8129.
Felsenstein, J. 1985. Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39:783–791.
Felsenstein, J. 1996. Inferring phylogenies from protein sequences by parsimony, distance, and likelihood methods. Methods Enzymol. 266:418–427.
Foster, P. G., and D. A. Hickey. 1999. Compositional bias may affect both DNA-based and protein-based phylogenetic reconstructions. J. Mol. Evol. 48:284–290.
Galtier, N., and M. Gouy. 1995. Inferring phylogenies from DNA sequences of unequal base compositions. Proc. Natl. Acad. Sci. USA 92:11317–11321.
Galtier, N., and M. Gouy. 1998. Inferring pattern and process: maximum-likelihood implementation of a nonhomogeneous model of DNA sequence evolution for phylogenetic analysis. Mol. Biol. Evol. 15:871–879.
Gardner, M. J., N. Hall, E. Fung et al. (45 co-authors). 2002. Genome sequence of the human malaria parasite Plasmodium falciparum. Nature 419:498–511.
Goodman, M., J. Czelusniak, G. W. Moore, A. E. Romero-Herrera, and G. Matsuda. 1979. Fitting the gene lineage into its species lineage, a parsimony strategy illustrated by cladograms constructed from globin sequences. Syst. Zool. 28:132–168.
Gor, D. O., A. C. Li, and P. J. Rosenthal. 1998. Protective immune responses against protease-like antigens of the murine malaria parasite Plasmodium vinckei. Vaccine 16:1193–1202.
Guigo, R., I. Muchnik, and T. F. Smith. 1996. Reconstruction of ancient phylogenies. Mol. Phylogenet. Evol. 6:189–213.
Hodder, A. N., D. R. Drew, V. C. Epa et al. (12 co-authors). 2003. Enzymic, phylogenetic, and structural characterization of the unusual papain-like protease domain of Plasmodium falciparum SERA5. J. Biol. Chem. 278:48169–48177.
Jones, D. T., W. R. Taylor, and J. M. Thornton. 1992. The rapid generation of mutation data matrices from protein sequences. Comput. Appl. Biosci. 8:275–282.
Kiefer, M. C., K. A. Crawford, L. J. Boley, K. E. Landsberg, H. L. Gibson, D. C. Kaslow, and P. J. Barr. 1996. Identification and cloning of a locus of serine repeat antigen (SERA)-related genes from Plasmodium vivax. Mol. Biochem. Parasitol. 78:55–65.
Knapp, B., E. Hundt, U. Nau, and H. A. Kupper. 1989. Molecular cloning, genomic structure and localization in a blood stage antigen of Plasmodium falciparum characterized by a serine stretch. Mol. Biochem. Parasitol. 32:73–83.
Knapp, B., U. Nau, E. Hundt, and H. A. Kupper. 1991. A new blood stage antigen of Plasmodium falciparum highly homologous to the serine-stretch protein SERP. Mol. Biochem. Parasitol. 44:1–13.
McCutchan, T. F., J. C. Kissinger, M. G. Touray, M. J. Rogers, and J. Li. 1996. Comparison of circumsporozoite proteins from avian and mammalian malarias: biological and phylogenetic implications. Proc. Natl. Acad. Sci. USA 93:11889–11894.
Michon, P., J. R. Stevens, O. Kaneko, and J. H. Adams. 2002. Evolutionary relationships of conserved cysteine-rich motifs in adhesive molecules of malaria parasites. Mol. Biol. Evol. 19:1128–1142.
Miller, S. K., R. T. Good, D. R. Drew, M. Delorenzi, P. R. Sanders, A. N. Hodder, T. P. Speed, A. F. Cowman, T. F. de Koning-Ward, and B. S. Crabb. 2002. A subset of Plasmodium falciparum SERA genes are expressed and appear to play an important role in the erythrocytic cycle. J. Biol. Chem. 277:47524–47532.
Mirkin, B., I. Muchnik, and T. F. Smith. 1995. A biologically consistent model for comparing molecular phylogenies. J. Comput. Biol. 2:493–507.
Page, R. D. M. 1998. GeneTree: comparing gene and species phylogenies using reconciled trees. Bioinformatics 14:819–820.
Page, R. D. M., and M. A. Charleston. 1997. Reconciled trees and incongruent gene and species trees. Pp. 57–70 in B. Mirkin, F. R. McMorris, F. S. Roberts, and A. Rzhetsky, eds. Mathematical hierarchies in biology, DIMACS series in discrete mathematics and theoretical computer science, Vol. 37. American Mathematical Society, Providence, RI.
Perkins, S. L., and J. J. Schall. 2002. A molecular phylogeny of malarial parasites recovered from cytochrome b gene sequences. J. Parasitol. 88:972–978.
Rathore, D., A. M. Wahl, M. Sullivan, and T. F. McCutchan. 2001. A phylogenetic comparison of gene trees constructed from plastid, mitochondrial and genomic DNA of Plasmodium species. Mol. Biochem. Parasitol. 114:89–94.
Rich, S. M., and F. J. Ayala. 2000. Population structure and recent evolution of Plasmodium falciparum. Proc. Natl. Acad. Sci. USA 97:6994–7001.
Saitou, N., and M. Nei. 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406–425.
Templeton, T. J., and D. C. Kaslow. 1997. Cloning and cross-species comparison of the thrombospondin-related anonymous protein (TRAP) gene from Plasmodium knowlesi, Plasmodium vivax and Plasmodium gallinaceum. Mol. Biochem. Parasitol. 84:13–24.
Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673–4680.
Thornton, J. W., and R. DeSalle. 2000. Gene family evolution and homology: genomics meets phylogenetics. Annu. Rev. Genomics Hum. Genet. 1:41–73.
Zmasek, C. M., and S. R. Eddy. 2001. A simple algorithm to infer gene duplication and speciation events on a gene tree. Bioinformatics 17:821–828.