Fungal Zuotin Proteins Evolved from MIDA1-like Factors by Lineage-Specific Loss of MYB Domains

Edward L. Braun and Erich Grotewold

Department of Plant Biology and Plant Biotechnology Center, Ohio State University


    Abstract
 
Proteins are often characterized by the presence of multiple domains, which make specific contributions to their cellular function. While the gain of domains in proteins by duplication and shuffling is well established, domain loss is poorly documented. Here, we provide evidence that domain loss has played an important role in the evolution of protein architecture and function by demonstrating that fungal Zuotin proteins evolved from MIDA1-like proteins, present in animals and plants, by complete loss of the carboxyl-terminal MYB domains. Phylogenetic analyses of the DnaJ motif (the J domain) present in both Zuotin and MIDA1 proteins were complicated by the limited length and profound differences in evolutionary rates exhibited by this domain. To rigorously examine J domain phylogeny, we combined the nonparametric bootstrap with Monte Carlo simulation. This method, which we have designated the resampled parametric bootstrap, allowed us to assess type I and type II error associated with these analyses. These results revealed significant support for domain loss rather than domain gain or gene loss involving paralogs. The absence of sequences related to the MIDA1 MYB domains in Saccharomyces cerevisiae further indicates that the domains have been completely lost, consistent with known functional differences between Zuotin and MIDA1 proteins. These analyses suggest that the description of additional examples of complete domain loss may provide a method to identify orthologous proteins exhibiting functional differences using genomic sequence data.


    Introduction
 
Proteins often consist of multiple domains (Doolittle 1995; Henikoff et al. 1997), and evolutionary changes in the domain structures of these proteins have obvious functional implications. Indeed, domain structure differences have been used to infer protein-protein interactions (Enright et al. 1999; Marcotte et al. 1999). Similarity of domain structures has also been used as information for assigning orthology (Huynen and Bork 1998), a method especially helpful for large-scale genomic comparisons (e.g., Rubin et al. 2000). Variation in domain organization has largely been attributed to domain shuffling, intramolecular duplications, the fusion of distinct proteins to form multifunctional polypeptides, and pathways leading to the acquisition of novel domains during evolution (Doolittle 1995; Henikoff et al. 1997; Li 1997).

In principle, domain loss provides an additional mechanism by which diversity in domain organization has been generated, but there have been few attempts to document domain loss, in part because of the difficulties associated with distinguishing domain loss from domain gain or gene loss. Domain loss can reflect the fission of specific sequence modules (Snel, Bork, and Huynen 2000), resulting in the dispersal of the domains present in a single protein into two or more polypeptides. This mechanism is conservative, since orthologs of each domain remain after the fission. Alternatively, specific sequence modules present in proteins may undergo deletion or sufficient sequence divergence in some lineages to be eliminated (e.g., Rodriguez-Monge, Ouzounis, and Kyrpides 1998; Braun and Grotewold 1999). This form of domain loss may also be conservative, since it can occur after gene duplications as part of functional divergence between paralogs (e.g., Braun and Grotewold 1999). In contrast, DNA sequences encoding specific domains can be completely deleted in some lineages (complete domain loss). This type of change might result in proteins with orthologous domains likely to have undergone functional changes as profound as those typically characterizing paralogs with similar differences in domain structure.

Unequivocal demonstration of complete loss of domains requires entire genome sequences, since the existence of paralogs with the ancestral domain organization must be excluded. Even when complete genome sequences are available, it is important to examine the phylogenetic relationships among sequences containing the more broadly shared domain to establish whether the change in domain organization reflects loss or gain. However, the short lengths of many domains and the high degree of divergence between many multidomain proteins that exhibit differing domain organizations present substantial challenges for phylogenetic reconstruction (Nei, Kumar, and Takahashi 1998; Philippe and Laurent 1998). Since proteins that have undergone complete domain loss are unlikely to be sampled from a broad variety of organisms, limited taxon sampling may also have an impact on the estimates of phylogenetic relationships for these proteins.

The MIDA/Zuotin family of proteins provides an interesting example of variation in the domain structure associated with particular groups of organisms. MIDA1-like proteins, found in animals and plants, are characterized by the presence of an amino-terminal J domain embedded in a region of similarity to fungal Z-DNA binding proteins (Zuotins), along with two carboxyl-terminal MYB domains (fig. 1). The J domain is present in a large number of prokaryotic and eukaryotic proteins (Kelley 1998), but the MYB domains are related to a group of transcription factors that have been found almost exclusively in the eukaryotes (Ouzounis and Papavassiliou 1997). Thus, the MIDA1 homologs are multidomain proteins in which the ancient J domain, present in both prokaryotes and eukaryotes, has been fused to eukaryote-specific MYB domains.



 
Fig. 1. Structure of MIDA1 and Zuotin proteins. A, Schematic representation of MIDA1 and Zuotin homologs, highlighting the relevant domains. B, Alignment of the mouse MIDA1 (Shoji et al. 1995), budding yeast (Saccharomyces cerevisiae) Zuotin (Zhang et al. 1992), and Arabidopsis F24K9.12 (MIDA1-like) proteins. Identical residues are indicated in black, gaps in the alignment are indicated with dashes, and the positions of the J domain and the MYB domains are shaded.

 
The mouse MIDA1 protein was identified because it interacts physically with the helix-loop-helix protein Id1 (Shoji et al. 1995), and overexpression suggests a role in the regulation of growth and proliferation. This function is consistent with the identification of the human MIDA1 ortholog as an M-phase phosphoprotein (MPP11; Matsumoto-Taniura et al. 1996) and the defect in asymmetric cell division evident in Volvox carteri GlsA (MIDA1) mutants (Miller and Kirk 1999). Thus, the ancestral function of MIDA1 in animals and plants is likely to involve the regulation of cell division, although the specific targets of this protein remain unknown. MIDA1-like proteins are absent from the fungi despite a large number of phylogenetic analyses supporting an animal-fungal clade that excludes the plants (reviewed by Braun et al. 1998; Baldauf 1999). However, the fungal Zuotins (Zhang et al. 1992) correspond to MIDA1 homologs lacking the conserved carboxyl-terminal MYB domains. Thus far, Zuotins have not been found in the plant or animal kingdoms. Saccharomyces cerevisiae Zuotin, encoded by the ZUO1 gene, is a ribosome-associated DnaJ (Yan et al. 1998) that exhibits tRNA- and Z DNA-binding activities (Zhang et al. 1992; Wilhelm et al. 1994). These and other findings (Yan et al. 1998) suggest that Zuo1p is the DnaJ partner of the Ssb-type Hsp70 proteins (Nelson et al. 1992), and thus a component of the fungal translation machinery.

Here, we provide evidence that complete loss of conserved domains is one mechanism by which diversity in the domain distribution of multidomain proteins is established. By examining the origin of fungal Zuotin homologs, we determined that these DnaJ proteins originated from MIDA1-like proteins, present in animals and plants, by the complete loss of carboxyl-terminal MYB domains. We present an estimate of phylogenetic relationships among J domains (shared by MIDA1 proteins, Zuotins, and more distantly related DnaJ/Hsp40 homologs), using the results to distinguish between domain loss and more complex patterns of gene duplication and gene loss that could produce similar end results. To examine the reliability of J domain phylogeny, we employed the "resampled parametric bootstrap," combining the nonparametric bootstrap (Felsenstein 1985) with Monte Carlo simulation. This tool permitted us to place our confidence in the phylogenetic reconstruction in a substantially more rigorous framework, allowing the impact of limited sequence length and unequal evolutionary rates on both type I and type II error to be assessed. Our findings provide the first described example of complete domain loss, consistent with observed functional differences between Zuotins and MIDA1-like proteins, which we demonstrate to be orthologs. Domain loss may represent a general mechanism underlying variation in the structure of multidomain proteins, and the complete domain loss exemplified by the MIDA1/Zuotin proteins may also contribute to functional differences between proteins.


    Materials and Methods
 
Selection and Alignment of Sequences
Homologs of the MIDA1 protein were identified using BLASTP (Altschul et al. 1997), and the positions of protein sequence motifs were established by searching the Pfam database (Bateman et al. 2000). Sequences were aligned using CLUSTAL W (Thompson, Higgins, and Gibson 1994) followed by manual adjustment. Full alignments are available on request.

The protein sequences used in these analyses were as follows: (1) MIDA1 homologs from Arabidopsis thaliana (AAF00659 [F24K9.12] and BAA98200 [K16F4.7]), Caenorhabditis elegans (AAB09157), Drosophila melanogaster (AAF51678), Homo sapiens (CAA66913), Mus musculus (BAA09854), and Volvox carteri (AAD26632); (2) Zuotin homologs from S. cerevisiae (CAA45156) and Schizosaccharomyces pombe (CAB10796); (3) eukaryotic J domain proteins (closely related to MIDA1/Zuotin) from A. thaliana (AAD12011 [T3K9.23] and T00641 [F3I6.4]), Babesia bovis (AAC27389), D. melanogaster (AAF43627), Hevea brasiliensis (AAD120055), H. sapiens (Q99615), Nicotiana tabacum (AAD09516), S. cerevisiae (AAB67594), and Salix gilgiana (BAA35121 and BAA76888); (4) eukaryotic J domain proteins (more distantly related to MIDA1/Zuotin) from C. elegans (T15851 [C56C10.11], T22648 [F54D5.8], and T24396 [T03F6.2]), D. melanogaster (Q24133 [DNAJ1] and AAF47301 [GH03108]), H. sapiens (BAA88769 [DNAJ], BAA02656 [DNAJ-2], NP_036398 [HSP40–3], and NP_006136 [HDJ1]), M. musculus (NP_031895), Pisum sativum (T06594), Plasmodium falciparum (F71623 [PFB0090c] and G71610 [PFB0595w]), S. cerevisiae (NP_014172), and S. pombe (T41362); (5) prokaryotic J domain proteins from Borrelia burgdorferi (F70181), Buchnera sp. (BAB12870), Chlamydia pneumoniae (G72128), Escherichia coli (P08622), Haemophilus ducreyi (P48298), Haemophilus influenzae (P43735), Lactobacillus sakei (CAA06942), Mycoplasma genitalium (P47265), Mycoplasma pneumoniae (P78004), Nitrosomonas europaea (O06431), Rhodothermus marinus (AAD37973), Synechocystis sp. (P50027), Thermus thermophilus (Q56237), and Thermotoga maritima (B72327).

Phylogenetic Reconstruction
Phylogenetic analyses using amino acid sequences were conducted using a combination of neighbor joining (NJ; Saitou and Nei 1987), maximum parsimony (MP), and maximum likelihood (ML). Analyses conducted using the full alignment and after deleting regions that were difficult to align yielded similar results. Results obtained using the full alignment are reported.

NJ trees were inferred using either ML distance estimates or p-distances (uncorrected distances) and constraining branch lengths to be nonnegative, using either the appropriate option in PAUP* 4.0b4a (Swofford 2000) or nnneighbor (W. J. Bruno, Los Alamos National Laboratories). ML distances were estimated using protml from MOLPHY 2.3b3 (J. Adachi and M. Hasegawa, Institute of Statistical Mathematics, Japan) with options "-D" (distance matrix) and "-df" (for PAM distances; Dayhoff, Schwartz, and Orcutt 1978). ML distance estimates under the BLOSUM model (Henikoff and Henikoff 1992) were calculated using a modified version of protml.
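As an illustration of the uncorrected distance mentioned above, the following minimal Python sketch computes a p-distance for one pair of aligned sequences. It is not the PAUP* implementation; the function name `p_distance` and the choice to skip any site where either sequence has a gap or `?` (pairwise deletion) are assumptions made for this example.

```python
def p_distance(seq_a: str, seq_b: str, skip_chars: str = "-?") -> float:
    """Uncorrected p-distance: the fraction of compared aligned sites that differ.

    Assumption for this sketch: sites where either sequence carries a gap
    or '?' are excluded from the comparison (pairwise deletion).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in skip_chars or b in skip_chars:
            continue  # site not comparable for this pair
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites between the two sequences")
    return differing / compared
```

Unlike the ML (PAM or BLOSUM) distances, this measure applies no correction for multiple substitutions at the same site, so it tends to underestimate divergence between distantly related sequences.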

Phylogenetic reconstruction by MP used PAUP* 4.0b4a, weighting amino acid changes equally and collapsing branches if the minimum branch length was 0. ML trees were estimated using protml and the PAM model, which exhibited the best fit to the data (see Results and Discussion). MP tree searches in PAUP* used 10 random-addition sequence replicates and TBR branch swapping. Amino acid substitutions at specific sites were examined using the "trace character" option in MACLADE 3.05 (Maddison and Maddison 1992), and overall patterns of amino acid substitution were inferred using the "state changes and stasis" option in MACLADE, counting unambiguous events only. Inferred amino acid changes were classified as conservative (changes to another amino acid within the same class, using the six classes of amino acids described by Dayhoff, Schwartz, and Orcutt [1978]) or radical (changes to another class of amino acid). For the PAM model, the expected numbers of amino acid substitutions in each class were estimated by simulations, as described below. Among-sites rate heterogeneity was examined using an eight-category discrete approximation to a Γ distribution (Yang 1994), as implemented in TREE-PUZZLE 4.0.2 (K. Strimmer, University of Oxford, and A. von Haeseler, Max Planck Institute for Evolutionary Anthropology). Improvements in model fit for nested models were evaluated using the likelihood ratio test (Felsenstein 1988; Goldman and Whelan 2000), comparing the test statistic (2δ, where δ is the difference between the ln L values of the two models) with the appropriate χ² or χ̄² distribution.
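The conservative/radical classification described above can be sketched in a few lines of Python. The six groups used here (AGPST, C, DENQ, FWY, HKR, ILMV) are the grouping commonly attributed to Dayhoff, Schwartz, and Orcutt; that this matches the exact classes used in the analysis is an assumption, and the function names are illustrative only.

```python
# Six amino acid classes commonly attributed to Dayhoff et al. (1978).
# Assumption: this is the same grouping used in the analyses described above.
DAYHOFF_CLASSES = ("AGPST", "C", "DENQ", "FWY", "HKR", "ILMV")
CLASS_OF = {aa: idx for idx, group in enumerate(DAYHOFF_CLASSES) for aa in group}


def classify_change(old: str, new: str) -> str:
    """Label an inferred change as 'stasis', 'conservative', or 'radical'."""
    old, new = old.upper(), new.upper()
    if old == new:
        return "stasis"
    return "conservative" if CLASS_OF[old] == CLASS_OF[new] else "radical"


def conservative_fraction(changes) -> float:
    """Fraction of actual substitutions (stasis excluded) that stay within a class."""
    kinds = [classify_change(old, new) for old, new in changes]
    substitutions = [k for k in kinds if k != "stasis"]
    if not substitutions:
        raise ValueError("no substitutions to classify")
    return sum(k == "conservative" for k in substitutions) / len(substitutions)
```

For example, a lysine-to-arginine change stays within the HKR class and counts as conservative, whereas lysine-to-glutamate crosses classes and counts as radical.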

The reliability of phylogenetic analyses was assessed by nonparametric bootstrap resampling (Felsenstein 1985) using 500 replicates and adding sequences in random orders for each replicate. MP bootstrap analyses used 10 random additions and no branch swapping.
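The nonparametric bootstrap resamples alignment columns with replacement and repeats the tree inference on each pseudoreplicate. The sketch below shows only the resampling and counting steps; `clade_supported` is a hypothetical callback standing in for tree inference (NJ or MP in PAUP*, for instance) followed by a check for the clade of interest, and is not part of any real package.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement.

    alignment: dict mapping sequence name -> aligned sequence (equal lengths).
    Returns a pseudoreplicate alignment of the same dimensions.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[name]) != length for name in names):
        raise ValueError("all aligned sequences must have equal length")
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][c] for c in columns) for name in names}


def bootstrap_proportion(alignment, clade_supported, replicates=500, seed=0):
    """Fraction of pseudoreplicates in which the clade of interest is recovered.

    clade_supported: hypothetical callable taking an alignment and returning
    True when the tree inferred from it contains the clade.
    """
    rng = random.Random(seed)
    hits = sum(
        bool(clade_supported(bootstrap_alignment(alignment, rng)))
        for _ in range(replicates)
    )
    return hits / replicates
```

Each column is drawn as a unit, so the residues of all sequences at a sampled site stay together in the pseudoreplicate.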

To examine the probability of obtaining specific bootstrap proportions, 500 data sets were simulated based on the null hypothesis (H0) tree using PSeq-Gen 1.1 (Grassly, Adachi, and Rambaut 1997) and the PAM+Γ model of evolution with parameters estimated from the data assuming H0. (The H0 topologies, branch lengths, and other simulation parameters are available on request.) Then, the nonparametric bootstrap support for the relevant group was estimated from each simulated data set as described above. The probability of observing a given bootstrap value if H0 were correct corresponds to the proportion of simulated data sets with bootstrap values greater than or equal to the value, with the critical bootstrap value given by the upper 5% of this null distribution (5% probability of type I error). The power of a specific analysis can be assessed using simulations under the alternative hypothesis. Power was calculated using the proportion of simulations under the alternative hypothesis (HA) with bootstrap support in the upper 5% of the distribution generated using the null hypothesis (since simulations under HA with support under the critical bootstrap value would reflect cases of type II error, and power = 1 − β, where β is the type II error probability). Like the data for H0 simulations, all data relevant to the HA simulations are available on request. Unix shell scripts used to conduct these analyses are also available on request.
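Once bootstrap support has been computed for every simulated data set, the quantities defined above reduce to simple counting. The following Python sketch assumes the support values are already available as plain lists of numbers; the function names are introduced here for illustration and do not correspond to the authors' shell scripts.

```python
def null_p_value(null_support, observed):
    """Proportion of H0 simulations with bootstrap support >= the observed value."""
    return sum(s >= observed for s in null_support) / len(null_support)


def critical_value(null_support, alpha=0.05):
    """Smallest simulated support value c with P(support >= c | H0) <= alpha.

    Returns None when even the largest simulated value is too common
    (for example, because of heavy ties at the maximum).
    """
    for candidate in sorted(set(null_support)):
        if null_p_value(null_support, candidate) <= alpha:
            return candidate
    return None


def power(alt_support, critical):
    """Proportion of HA simulations whose support reaches the critical value.

    Simulations falling below the critical value are type II errors,
    so this equals 1 - beta.
    """
    return sum(s >= critical for s in alt_support) / len(alt_support)
```

With 500 simulations under H0, an observed value matched or exceeded by 23 of them yields P = 23/500 = 0.046.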


    Results and Discussion
 
Primary Structure and Molecular Evolution of MIDA1 Homologs
The primary question we wanted to address was whether the fungal Zuotins exhibit an orthologous or paralogous relationship to MIDA1 proteins. Given the large number of distinct analyses supporting the existence of an animal-fungal clade (Baldauf and Palmer 1993; Wainright et al. 1993; Nikoh et al. 1994; reviewed by Braun et al. 1998; Baldauf 1999), an orthologous relationship between MIDA1-type proteins and the Zuotins would suggest that the difference in their domain organizations reflects domain loss. Searches of genomic sequences from S. cerevisiae and S. pombe did not reveal the presence of any MYB domains closely related to those present in the animal and plant MIDA1 homologs (data not shown). This indicates that the domain loss event would be complete, rather than a more conservative gene fission.

To investigate evolutionary relationships among MIDA1-like proteins from animals, plants, and fungi, we estimated the phylogeny of these proteins using the Zuotin homologous region (fig. 1B). These analyses revealed a high level of bootstrap support for each eukaryotic kingdom (fig. 2). In fact, this estimate of MIDA1/Zuotin phylogeny is compatible with current estimates of animal phylogeny, which suggest the existence of a clade (the Ecdysozoa; Aguinaldo et al. 1997) containing the arthropods (D. melanogaster) and nematodes (C. elegans). The only gene duplication implied by this phylogeny occurred within the land plant lineage after the divergence of the chlorophyte algae, resulting in the A. thaliana MIDA1 homologs encoded by paralogous genes on chromosomes 3 (F24K9.12) and 5 (K16F4.7).



 
Fig. 2. Phylogeny of the MIDA1/Zuotin proteins based on the Zuotin homology region. Branch lengths were estimated by maximum likelihood using the PAM+Γ model of sequence evolution and are presented as expected amino acid substitutions per site (EAASS). Bootstrap support based on neighbor joining of PAM distances (500 replicates) is presented above branches as percentages.

 
Although other MIDA1 or Zuotin paralogs containing the J domain could not be identified, searches using the MYB domains revealed the existence of proteins containing MYB domains related to those in MIDA1 in both C. elegans and A. thaliana (data not shown). In addition, a distinct class of plant MYB homologs, including the tomato I-box binding factor (Rose, Meier, and Wienand 1999), is characterized by the presence of a MIDA1-like MYB domain, in addition to a more distantly related MYB domain. The existence of these proteins can be explained by domain shuffling and the more conservative form of domain loss in which gene duplication precedes domain loss.

The estimate of MIDA1/Zuotin phylogeny obtained using the Zuotin homology region (fig. 2) does not allow the reconstruction of the pattern of evolution for the MYB domains present in MIDA1 homologs, because the tree cannot be rooted in the absence of an outgroup containing the Zuotin region. However, this analysis provided valuable information regarding evolutionary relationships within this group and their patterns of sequence evolution. We found that accommodating the unequal amino acid composition of these proteins resulted in a significant improvement in model fit (ln L = -3,408.69 [proportional model]; ln L = -3,549.09 [Poisson model]; P < 0.001). Use of the PAM model of evolution (Dayhoff, Schwartz, and Orcutt 1978) to accommodate differences in the probability for different types of amino acid substitution resulted in additional improvement (ln L = -3,259.26) and further revealed that the PAM model was better than the related JTT and BLOSUM models (ln L = -3,261.02 [JTT]; ln L = -3,298.20 [BLOSUM]). The Zuotin region exhibits modest among-sites rate heterogeneity, and the assumption that rates follow a Γ distribution (with shape parameter α = 1.07) resulted in a significant improvement to the PAM model of evolution (ln L = -3,196.22; P < 0.001). Similar results were obtained using the J domain alone, regardless of the specific set of sequences analyzed (data not shown). Despite the improvement in likelihood score when the PAM model of sequence evolution was used, all models (and methods of phylogenetic inference) supported the same relationships among these sequences.
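The likelihood ratio comparison of the PAM model with and without among-sites rate heterogeneity can be reproduced from the ln L values reported above. This is a minimal sketch that assumes a plain chi-square reference distribution with one degree of freedom (for the single added shape parameter); the helper names are illustrative, and a different reference distribution would change the exact P value.

```python
import math


def lrt_statistic(lnl_null: float, lnl_alt: float) -> float:
    """Likelihood ratio statistic: twice the gain in ln L of the richer model."""
    return 2.0 * (lnl_alt - lnl_null)


def chi2_sf_df1(x: float) -> float:
    """P(X >= x) for a chi-square variable with 1 degree of freedom.

    Uses the identity P(X >= x) = erfc(sqrt(x / 2)), valid for df = 1 only.
    """
    return math.erfc(math.sqrt(x / 2.0))


# ln L values reported in the text: PAM without and with rate heterogeneity.
statistic = lrt_statistic(lnl_null=-3259.26, lnl_alt=-3196.22)
p_value = chi2_sf_df1(statistic)  # assumption: 1 df, plain chi-square reference
print(f"2*delta = {statistic:.2f}, P = {p_value:.3g}")
```

The statistic is 126.08, far beyond the 0.001 critical value of about 10.83 for one degree of freedom, which agrees with the reported P < 0.001.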

Two alternative positions for the root of the MIDA1/Zuotin phylogeny (fig. 2) are compatible with current estimates of eukaryotic phylogeny (reviewed by Braun et al. 1998; Baldauf 1999). Placement of the root between the plants and an animal-fungal clade (α in fig. 2) is compatible with organismal phylogeny in the absence of ancient gene duplications, and it is also suggested by midpoint rooting (data not shown). This placement suggests the complete loss of MYB domains in MIDA1-like proteins during the evolution of the fungi. Placement of the root between MIDA1 and Zuotin proteins (β in fig. 2) could be reconciled with current estimates of organismal phylogeny by postulating (minimally) the loss of the MIDA1 ortholog in the S. cerevisiae lineage along with independent losses of Zuotin orthologs in the Ecdysozoa and the Arabidopsis lineage. There is evidence for gene loss in S. cerevisiae, C. elegans, and D. melanogaster based on both large-scale surveys (Aravind et al. 2000; Braun et al. 2000; Rubin et al. 2000) and detailed analyses of specific gene families (Braun et al. 1998; Steele, Stover, and Sakaguchi 1999). Since members of at least one other class of MYB homologs (the c-MYB-like transcription factors found in plants and animals; Braun and Grotewold 1999) have been lost in S. cerevisiae and C. elegans, it is especially evident that the gene loss model cannot be dismissed. Alternatively, support for root β could indicate that previous estimates of eukaryotic phylogeny supporting the existence of an animal-fungal clade are incorrect (Wang, Kumar, and Hedges [1999] suggest that support for this clade is overstated), although we believe this is unlikely given the distinct lines of evidence supporting the animal-fungal clade (summarized by Braun et al. 1998; Baldauf 1999).

Thus, the observed phylogeny and the absence of MYB domains in the fungal Zuotins can be reconciled with either gene loss or complete domain loss models. We believe that the difficulties in distinguishing between these two models with a high degree of confidence are a major reason why examples of complete domain loss have not previously been reported. Thus, it is essential to precisely define the position of the root in the MIDA1/Zuotin phylogeny (fig. 2).

Placement of the Root in the MIDA1/Zuotin Subgroup
The placement of the root for the MIDA1/Zuotin phylogeny could be reliably inferred using an outgroup. However, use of a DnaJ outgroup required limiting the alignment to the much shorter region of sequence homology corresponding to the J domain, only 75 residues in length (fig. 1B). To determine the position of the root within the MIDA1/Zuotin subgroup and establish the identity of the DnaJ protein most closely related to this subgroup, we retrieved a large set of DnaJ homologs (including prokaryotic sequences) and estimated the phylogeny of J domains related to MIDA1 and Zuotin (fig. 3).



 
Fig. 3. Phylogeny of J domains related to MIDA1 and Zuotin. Branch lengths were estimated by ML using the PAM+Γ model of evolution. Bootstrap support based on neighbor joining of PAM distances (500 replicates) is indicated below branches when it exceeds 50%. The MIDA1/Zuotin group is shaded for emphasis. Prokaryotic J domains are presented in capital letters, the sequences in the subgroup used in the eukaryotic data set are indicated with asterisks, and the rapidly evolving J domains ("fast" sequences) are indicated with bold letters. The paralogous MIDA1-like proteins from Arabidopsis thaliana are differentiated by the chromosomal locations of the genes encoding these proteins (chromosome 3 or chromosome 5).

 
In phylogenetic trees obtained by NJ of PAM distances estimated using the J domain alignment, there is 75% bootstrap support for an animal-fungal clade containing MIDA1 and Zuotin proteins (fig. 3). Lower levels of bootstrap support for an animal-fungal J domain clade were evident when using NJ of p-distances and equally weighted MP (table 1), although neither analysis provided any support for placement of the root in position β. The data exhibited a good fit to the PAM+Γ model of sequence evolution, suggesting that the best method of phylogenetic estimation used was NJ of PAM distances, which provided support for root α in figure 2. Indeed, this method appeared to represent the most powerful test of this hypothesis (see below).


 
Table 1 The Impact of Taxon Sampling and Different Methods of Phylogenetic Analysis on Support for Grouping Animal MIDA1 Homologs and Fungal Zuotins

 
Since the exact set of sequences examined in phylogenetic analyses can have a profound effect on estimates of evolutionary relationships (Lecointre et al. 1993), we wanted to assess the impact of the specific set of J domain sequences used. Analyses using a subset of eukaryotic J domains that appeared more closely related to the MIDA1 group (designated the eukaryotic data set) were conducted, providing results virtually identical to those obtained with the larger data set (table 1). Since the MIDA1 homologs from the Ecdysozoa (C. elegans and D. melanogaster) and V. carteri and Zuotin from S. cerevisiae appeared divergent regardless of the region analyzed (see branch lengths in figs. 2 and 3), we examined the impact of removing these rapidly evolving ("fast") sequences. Although there are clear limitations imposed by the small number of MIDA1 and Zuotin sequences available, these analyses substantially increased bootstrap support for an animal-fungal J domain clade (table 1) for NJ (of both PAM and p-distances) and MP analyses, providing additional evidence for root α.

Assessing Support for Placement of the Root Using the Resampled Parametric Bootstrap
Placement of the root at position α in figure 2 suggests that fungal Zuotin proteins evolved from MIDA1-like proteins through loss of the MYB domains, providing the first reported example of this type of domain loss in a specific group of organisms. However, this placement of the root is contingent on the 75% bootstrap support for an animal-fungal J domain clade (fig. 3, branch α). This level of bootstrap support is often considered fairly high, since a number of studies have demonstrated that the bootstrap provides underestimates of the actual confidence level in a phylogenetic hypothesis under a number of conditions (Zharkikh and Li 1992, 1995; Felsenstein and Kishino 1993; Hillis and Bull 1993). However, given the implications for the origin of Zuotin proteins that root α has, we wanted to examine support for an animal-fungal J domain clade in a more rigorous fashion.

The complex relationship between bootstrap proportion and the probability that a specific clade exists (Hillis and Bull 1993; Zharkikh and Li 1995; Huelsenbeck, Hillis, and Jones 1996) represents a frustrating aspect of bootstrap analyses. Since Monte Carlo simulation allows the analysis of a variety of informative statistics when their distributions cannot be obtained from theory (Besag and Diggle 1977), we envisioned that it might provide a solution to this problem. Indeed, Monte Carlo simulation was one method used to determine the tendency of the bootstrap to underestimate the support for specific groups (Hillis and Bull 1993). Monte Carlo simulation has also been used extensively to examine bias in phylogenetic analyses (e.g., Huelsenbeck, Hillis, and Jones 1996; Huelsenbeck 1998; Chang and Campbell 2000; Sanderson et al. 2000), so an additional benefit of this approach is its ability to determine whether the position of the J domains from plant MIDA1 homologs might be driven by differences in their evolutionary rates.

To determine the probability of observing a bootstrap value supporting root α as extreme as that observed (fig. 3 and table 1) given the null hypothesis (H0) of placing the root in position β, 500 data sets were simulated assuming H0. Bootstrap support for monophyly of animal-fungal MIDA1/Zuotin J domains, the alternative hypothesis (HA) in this analysis, was then estimated for each simulation. Analyses of the eukaryotic data set using NJ of PAM distances allowed us to reject H0 (root position β) whether or not the fast sequences were included (23 simulations ≥ 75% bootstrap support, P = 0.046 [all sequences]; 4 simulations ≥ 98% bootstrap support, P = 0.008 [fast sequences removed]), although support was stronger when fast sequences were removed. Similar results were obtained using the full data set (P = 0.06 [all sequences] and P = 0.02 [fast sequences removed]), although analyses using the broader set of sequences required substantially longer times to run, and the bootstrap support using all sequences was slightly lower than necessary for significance. Thus, depending on the specific data set analyzed, the support for placement of the root of the MIDA1/Zuotin subgroup in position α was usually significant and sometimes highly significant. Since the methodology we employed involves nonparametric bootstrap resampling of data sets generated by Monte Carlo simulation (also called the parametric bootstrap), we call this Monte Carlo approach the "resampled parametric bootstrap."
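The P values for the eukaryotic data set follow directly from the simulation counts. As a worked illustration (the symbols N, b_i, and b_obs are notation introduced here, denoting the number of H0 simulations, the bootstrap support in simulation i, and the observed support):

```latex
\[
P = \frac{\#\{\, i : b_i \ge b_{\mathrm{obs}} \,\}}{N},
\qquad
P_{\text{all sequences}} = \frac{23}{500} = 0.046,
\qquad
P_{\text{fast removed}} = \frac{4}{500} = 0.008 .
\]
```

Here b_obs is 75% for the analysis including all sequences and 98% for the analysis with fast sequences removed.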

The value of employing the resampled parametric bootstrap in this study rather than applying a "rule-of-thumb" critical value for the bootstrap (such as 70%) is revealed by differences in the critical bootstrap value (the value defining the upper 5% of the null distribution) calculated by simulation. For this analysis, the critical value ranges from as little as 73% (for the eukaryotic data set) to as much as 94% (for the full data set excluding fast sequences). Thus, the inclusion of different sequences changed both the observed bootstrap proportions and the meaning of specific bootstrap values. This variation in the correspondence between bootstrap values and confidence in specific clades given differences in factors such as branch lengths is consistent with the results of previous simulations (Huelsenbeck, Hillis, and Jones 1996), suggesting that application of the resampled parametric bootstrap to various problems can provide a much more rigorous test of support for specific phylogenetic hypotheses.

The resampled parametric bootstrap can provide a rigorous test for bias due to factors such as long-branch attraction (Felsenstein 1978), allowing one to determine whether long branches in a given phylogeny are long enough to both attract and exhibit the level of bootstrap support observed from the data. Since the H0 topology is characterized by a very short internal branch uniting the animals and the plants, as well as long branches to the outgroup and plant MIDA1 subgroups (data not shown), long-branch attraction might explain the observed support for HA (fig. 3 and table 1). Indeed, phylogenetic analyses of some data sets simulated assuming H0 revealed relationships consistent with HA (table 2). However, the observed bias did not appear to be extremely strong, with fewer than 60% of simulated replicates supporting HA in analyses based on either NJ of corrected distances or MP. Since the resampled parametric bootstrap is a method of establishing expected bootstrap proportions given a specific null hypothesis, it provides a more conservative means to assess the impact of long-branch attraction on phylogenetic analyses. In fact, this approach was recently used specifically to examine potential bias in estimates of phylogeny based on vertebrate rhodopsin sequences (Chang and Campbell 2000). The ability of the resampled parametric bootstrap to provide information regarding the impact of bias on phylogenetic analyses might represent an advantage over other methods available for establishing the statistical significance of bootstrap support (e.g., Zharkikh and Li 1995; Efron, Halloran, and Holmes 1996).


Table 2. Bias in Phylogenetic Estimation for MIDA1/Zuotin Phylogeny

 
However, our analyses of J domain phylogeny suggest a need to extend the resampled parametric bootstrap beyond analyses using simulation under H0, since analyses conducted using either MP or NJ of p-distances did not provide significant support for HA (P = 0.118 [MP] and P = 0.3 [NJ of p-distances]). Although these results are consistent with the lower bootstrap proportions observed using these methods (table 1), they raise the question of why only one method provided significant support. To address this issue, 500 data sets were simulated assuming HA (monophyly of animal and fungal J domains), and the fraction of the resulting bootstrap proportions that fell below the critical value estimated by simulation under H0 was determined, since this fraction corresponds to the type II error probability. The power of a statistical analysis is defined as the probability that it can reject the null hypothesis (Cohen 1977), i.e., one minus the probability of type II error. We found that analyses using NJ of PAM distances (power = 0.614) were more powerful than analyses using MP (power = 0.442) or NJ of p-distances (power = 0.3). The results of this dual simulation strategy suggest that the failure of analyses using either MP or NJ of p-distances to reject H0 could reflect the lower power of these analyses.
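
The dual simulation calculation described above can be sketched in a few lines of Python. This is a minimal illustration, assuming that the P value is the fraction of H0-simulated bootstrap proportions at least as large as the observed one and that type II error is the fraction of HA-simulated bootstrap proportions below the critical value; the function names and all numbers in the usage lines are hypothetical and are not results from this study.

```python
def monte_carlo_p_value(observed, null_values):
    """Fraction of null (H0-simulated) bootstrap proportions >= the observed one."""
    return sum(1 for v in null_values if v >= observed) / len(null_values)


def type_ii_error(alt_values, critical_value):
    """Fraction of HA-simulated bootstrap proportions that fall below the critical value."""
    return sum(1 for v in alt_values if v < critical_value) / len(alt_values)


def power(alt_values, critical_value):
    """Power is one minus the type II error probability."""
    return 1.0 - type_ii_error(alt_values, critical_value)


# Hypothetical toy inputs (NOT data from this study).
toy_null = list(range(1, 101))   # bootstrap supports simulated under H0
toy_alt = [50, 96, 99, 100]      # bootstrap supports simulated under HA
p = monte_carlo_p_value(97, toy_null)
pw = power(toy_alt, 96)
```

Comparing the power obtained in this way for each method of analysis (for example MP, NJ of p-distances, and NJ of PAM distances) indicates which method is most likely to reject H0 when HA is true.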

The incorporation of prior knowledge about the types of amino acid substitutions likely to occur during protein evolution, based on analysis of many homologous proteins and using evolutionary models such as the PAM model (Dayhoff, Schwartz, and Orcutt 1978), may increase the power of phylogenetic analyses. However, it is important to note that the current study assumed that the PAM+Γ model of evolution was a sufficiently accurate representation of the actual process of J domain evolution to provide useful estimates of both type I and type II error for phylogenetic analyses using this region. The PAM+Γ model of evolution had the best fit to the J domain data (see above), and the observed proportion of conservative amino acid substitutions in the J domain was close to that expected under the PAM+Γ model (see below), suggesting that it is a reasonable representation of J domain evolution despite the fact that the actual process of amino acid substitution in proteins is almost certainly more complex than the process assumed by the PAM+Γ model.

Despite these concerns regarding differences between the PAM+Γ model of sequence evolution and the actual process of amino acid substitution for J domains, we believe that the simulations conducted as part of this study provide reasonable estimates of both type I and type II error because the PAM model of evolution with among-sites rate heterogeneity captures many notable aspects of J domain evolution. In fact, the examination of power using this dual simulation approach and the resampled parametric bootstrap may provide valuable information regarding the behavior of different methods of phylogenetic estimation. Simulations have shown that the best methods for examining specific phylogenetic problems can be difficult to predict by evaluating model fit alone. For example, NJ of p-distances may be more efficient than NJ of distances estimated using a more complex model even when the data were generated using the more complex model (Takahashi and Nei 2000). Likewise, MP or ML using a simple model may also be more efficient than ML using the true model for some trees (Hillis, Huelsenbeck, and Swofford 1994; Yang 1997; Huelsenbeck 1998). Thus, NJ or ML analysis using the best-fitting model (based on ML analyses) may not necessarily represent the best method of phylogenetic inference. Indeed, the dual simulation approach can even provide information about the best method of phylogenetic estimation when proteins exhibit patterns of evolution that have not been implemented as models in an ML framework.

Evaluating the probability of type II error for phylogenetic analyses using different models, distance transformations, or weighting schemes could represent an excellent method of choosing among different analyses. Likewise, the impact of changing the set of sequences included in a phylogenetic analysis can be evaluated. We found a slight decrease in power for analyses using the eukaryotic data set with the fast sequences removed (power = 0.552). This suggests that inclusion of J domain sequences from the Ecdysozoa, S. cerevisiae, and V. carteri was beneficial for these analyses, despite the increased bootstrap support for the animal-fungal clade (table 1) and the clear rejection of H0 using NJ of PAM distances after eliminating divergent J domain sequences from the fast taxa (see above). Furthermore, some of the fast J domain sequences were involved in rearrangements within the animal MIDA1 and fungal Zuotin group, with some analyses suggesting that S. cerevisiae Zuotin is the outgroup of the animal and fungal sequences (also note the very short branch uniting the fungi in fig. 3). The strong support for monophyly of the fungi and other eukaryotic kingdoms (fig. 2) suggests that the inference of different topologies for the MIDA1/Zuotin J domains reflects the limited length of the J domain. This further emphasizes the need for caution in placing the root of this group and underscores the need to conduct the simulations we used to examine the position of the root. Analyses excluding the divergent J domain sequences exhibited a greater degree of bias in phylogenetic reconstruction (table 2), probably explaining the decrease in power.

Despite the limited power of all analytical approaches used, which highlighted the need for caution when interpreting phylogenetic analyses of relatively short sequence alignments, such as the J domain alignment analyzed here, we believe that the rejection of H0 using NJ of PAM distances provided evidence for root α. The limited power of analyses using either MP or NJ of p-distances also provided a reasonable explanation for the inability of these analyses to support root α, and it is important to note that none of these analyses supported root β. Taken as a whole, the analyses we conducted provided evidence that fungal Zuotin proteins were derived from MIDA1-like proteins present in the common ancestor of animals, plants, and fungi by complete loss of the carboxyl-terminal MYB domains.

Comparison Between Patterns of Evolution for the J Domain and the PAM Model of Evolution
An often untested aspect of parametric tests of evolutionary hypotheses is the impact of model inaccuracy on their results. Although the PAM model of sequence evolution exhibited a better fit to the evolution of the J domain than the other models tested (the JTT and BLOSUM models; see above), an untested model that fits the data even better may exist.

To examine the fit of the PAM model, the types of amino acid changes that occurred in the J domain were reconstructed by equally weighted parsimony. As we expected, a substantial bias toward conservative changes was evident in the J domain region (213 of 416 unambiguous changes were conservative substitutions, using the categories described by Dayhoff, Schwartz, and Orcutt [1978]). This number of conservative changes is significantly greater than that expected under an unconstrained model (60.3 conservative substitutions expected, χ² = 440.23, P < 0.001) or a model constrained to amino acid substitutions involving single-nucleotide substitutions (145.6 conservative substitutions expected, χ² = 48.0, P < 0.001). However, the number of observed conservative changes (213) was only slightly greater than expected under the PAM+Γ model of evolution (191 conservative substitutions expected, χ² = 4.63, P = 0.032). Comparison of observed and expected numbers of substitutions in each amino acid category revealed that the deviation from the PAM model resulted from greater numbers of conservative changes involving the hydrophobic (55 observed and 27.3 expected, χ² = 30.13, P < 0.001) and aromatic (20 observed and 10.3 expected, χ² = 9.5, P = 0.002) groups than expected under the PAM+Γ model. The observed numbers of substitutions in other categories did not differ from expectation under the PAM+Γ model (P > 0.1). Thus, the PAM+Γ model of evolution appeared to exhibit a relatively good fit to the J domain data with the exception of the excess substitutions in two specific amino acid classes.
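
The comparison of observed and expected conservative substitutions can be illustrated with a short Python sketch. It assumes a two-category goodness-of-fit test (conservative versus nonconservative changes, 1 degree of freedom), which is an assumption about how the reported statistics were computed rather than something stated in the text; under that assumption, the counts reported for the single-nucleotide-constrained model (213 observed and 145.6 expected of 416 changes) give a χ² of about 48.0. Because the expected counts in the text are rounded, other comparisons may differ slightly from the reported values. The function name is hypothetical.

```python
import math


def conservative_chi_square(observed_conservative, total_changes, expected_conservative):
    """Two-category chi-square goodness-of-fit test (assumed form, 1 df).

    Compares observed conservative/nonconservative counts with the counts
    expected under a model and returns (chi-square statistic, P value).
    """
    observed = (observed_conservative, total_changes - observed_conservative)
    expected = (expected_conservative, total_changes - expected_conservative)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 degree of freedom, P(X > chi2) equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Counts taken from the text: 213 of 416 changes were conservative.
chi2_single, p_single = conservative_chi_square(213, 416, 145.6)  # single-nucleotide model
chi2_pam, p_pam = conservative_chi_square(213, 416, 191)          # PAM+Gamma model
```

With the rounded expectation of 191, the PAM+Γ comparison gives a statistic near 4.7 and a P value close to 0.03, in line with the marginal deviation described above.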

To examine the robustness of phylogenetic analyses to the observed deviation from the PAM model, we examined bootstrap support for an animal-fungal J domain clade (fig. 3, branch α) based on NJ of distance estimates obtained using a model of evolution based on the BLOSUM matrix (Henikoff and Henikoff 1992). The BLOSUM model was more tolerant of substitutions involving both hydrophobic and aromatic amino acids than the PAM model, probably explaining the poorer fit of this model to the data in ML analyses (see above). Although NJ analyses using BLOSUM distances resulted in an increased probability of type II error (power = 0.494), the observed level of bootstrap support was similar to that obtained using NJ of PAM distances (table 1), and we found significant support for the existence of an animal-fungal J domain clade (P = 0.046 with the eukaryotic data set). Likewise, NJ of BLOSUM distances did not result in substantially increased bias in phylogenetic reconstruction (table 2). These results suggest that the results of NJ of corrected distances for the J domain are relatively robust to deviations between the model of sequence evolution used to obtain distance estimates and the actual pattern of evolution, providing additional support for our conclusions regarding the origin of fungal Zuotins by the process of complete domain loss in the fungal lineage.

Despite the support for root α provided by parametric approaches to phylogenetic analyses (see above), we wanted to examine the amino acid changes that supported different placements of the root for MIDA1 and Zuotin phylogeny. Thus, we looked for potential synapomorphies (shared derived character states) uniting either the animal MIDA1 and fungal Zuotin J domains (root α) or the animal and plant MIDA1 J domains (root β). Examination of ancestral state reconstructions for all sites in the J domain alignment revealed the presence of two sites (aligned to Val 124 and Lys 150 of the S. cerevisiae Zuotin sequence) supporting root α and the absence of sites supporting root β, consistent with our phylogenetic analyses supporting root α (fig. 3 and table 1). The first site supporting root α involved a change from Ala in the plant and outgroup J domains to Val in the animals and fungi, while the second involved a change from Glu in the plant and outgroup J domains to Lys in the animals and fungi. Volvox carteri GlsA differs from animal, fungal, plant, and outgroup J domains at both of these sites (with Cys and Asp residues aligned to Val 124 and Lys 150 of the S. cerevisiae Zuotin sequence, respectively). Since V. carteri GlsA is one of the divergent sequences (see branch lengths in figs. 2 and 3), the existence of unique (autapomorphic) character states in this sequence is not surprising. We believe that the support for root α from parametric analyses, coupled with the absence of any sites potentially supporting monophyly of J domains from animal and plant MIDA1-like proteins, provides strong evidence for our hypothesis that fungal Zuotin proteins evolved from MIDA1-like ancestors through complete domain loss.

Additional Applications of the Resampled Parametric Bootstrap
The use of a dual simulation approach to examine both type I and type II error represents an excellent method for examining specific questions in molecular evolution. Previous simulation studies to examine power in phylogenetic analyses have focused on establishing the number of sites necessary to have a given probability of reconstructing the correct topology (Hillis, Huelsenbeck, and Swofford 1994; Huelsenbeck, Hillis, and Jones 1996). The dual simulation approach we employed here could extend this approach to power analysis by establishing the number of sites necessary to reject a null hypothesis with specific type I and type II error probabilities, such as the values used by convention in many studies (probability of type I error = 0.05 and power = 0.8; see Cohen 1977). However, we want to emphasize that the dual simulation approach used here is not limited to the resampled parametric bootstrap. Indeed, it could also be applied to tests that compare differences in ML, MP, or minimum evolution (ME) scores with a null distribution generated by simulation (e.g., Swofford et al. 1996; a Monte Carlo approach called the "SOWH test" by Goldman, Anderson, and Rodrigo [2000]).

The use of the resampled parametric bootstrap and either NJ or MP with a fast search algorithm (e.g., the methods used by Takahashi and Nei 2000) may have benefits relative to the SOWH test for some phylogenetic problems. The SOWH test requires the identification of the optimal topology using ML, MP, or ME for each simulated data set, which may not be tractable from a computational standpoint for data sets with a large number of sequences. Since increased taxon sampling has been suggested to improve the accuracy of phylogenetic estimation in some cases (Lecointre et al. 1993; Hillis 1996), the ability to conduct simulation-based analyses of such large data sets is clearly desirable. In addition, searches for optimal topologies may not result in the identification of the true topology when applied to data sets with a limited number of aligned sites, like the J domain (fig. 1). In fact, simulation studies have demonstrated that the true topology often has a slightly suboptimal ML, MP, or ME score in analyses of short alignments (see Nei, Kumar, and Takahashi 1998). Even if the true topology has the optimal ML, MP, or ME score, it may be one of multiple equally optimal topologies identified using these criteria when the alignment is short. Thus, it has been suggested that phylogenetic analyses of short alignments should focus on establishing support for specific groups rather than conducting extensive tree searches in order to identify optimal topologies (Nei, Kumar, and Takahashi 1998). Since the resampled parametric bootstrap uses the support for a specific group as a test statistic, it may have advantages over the SOWH test (which uses differences between the values of optimality criteria as a test statistic) for analyses of short alignments.

In this study, all simulations were focused on testing a single null hypothesis developed by combining information about the domain structures of MIDA1 and Zuotin proteins, previous estimates of eukaryotic phylogeny, and estimates of phylogeny obtained using the larger Zuotin homology region. Multiple simulations were used to assess the impact of changing both the set of sequences analyzed and the parameters describing the evolutionary process on our tests of this null hypothesis. Thus, the null hypothesis was rejected in this study with P-critical = 0.05, with the results of different simulations interpreted as evidence that differences in taxon selection and parameter estimates had limited impact on the null distribution of bootstrap proportions. The similarities of the null distributions obtained with different simulations suggest that our conclusions regarding J domain phylogeny are fairly robust (see above). However, we stress the importance of conducting an appropriate correction for global type I error (e.g., Rice 1989) when multiple independent phylogenetic hypotheses are examined. Since corrections of type I error for conducting multiple independent tests inflate type II error, conducting simulations under the relevant alternative hypotheses in order to assess the power of phylogenetic analyses will also provide valuable information when these corrections are used.
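
One correction of the kind referred to above is the sequential Bonferroni procedure. The Python sketch below is a minimal illustration of that procedure, assuming it is the correction chosen; the function name and the P values in the usage lines are hypothetical and do not correspond to tests conducted in this study.

```python
def sequential_bonferroni(p_values, alpha=0.05):
    """Sequential Bonferroni test for a table of k independent P values.

    P values are examined from smallest to largest; the i-th smallest
    (i = 0, 1, ...) is compared with alpha / (k - i), and testing stops at
    the first P value that fails. Returns a list of booleans, in the input
    order, that is True where the null hypothesis is rejected.
    """
    k = len(p_values)
    order = sorted(range(k), key=lambda i: p_values[i])
    rejected = [False] * k
    for rank, index in enumerate(order):
        if p_values[index] <= alpha / (k - rank):
            rejected[index] = True
        else:
            break  # all larger P values are also treated as nonsignificant
    return rejected


# Hypothetical P values for three independent phylogenetic hypotheses.
decisions = sequential_bonferroni([0.01, 0.04, 0.03])
```

In this toy example only the first hypothesis is rejected, even though all three P values are below 0.05, which illustrates how such corrections reduce power and why simulation under the alternative hypotheses remains informative.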


    Conclusions
The role of domain loss in producing variation in the distribution of domains in multidomain proteins is not unexpected, and proteome scale studies of the distribution of multidomain proteins have suggested that specific processes have acted to balance the formation of multidomain proteins (Wolf et al. 1999). However, the relative contributions of protein fission, domain loss after gene duplication, and complete domain loss remain unclear. In fact, the complete domain loss apparently involved in the origin of Zuotin from MIDA1-like proteins can only be demonstrated using complete genome sequence information for relevant organisms.

The limited sequence length and high degree of divergence between homologous domains present in many multidomain proteins that exhibit different domain structures present substantial challenges for the phylogenetic analyses required to distinguish between gene loss, domain gain, and domain loss. The dual simulation strategy described here allows examination of the impact that both limited sequence length and high degrees of sequence divergence have on type I and type II error in phylogenetic estimation. This method could be combined with broader surveys of complete domain loss as a useful strategy for the identification of proteins that have undergone functional changes during evolution. As genomic sequence data accumulate, it should become possible to conduct these types of analyses in a framework that explicitly considers the probabilities of domain gain and loss, as well as the probabilities of gene duplication and gene loss. Indeed, considering the possibility of complete domain loss may alter the interpretation of previous surveys of gene loss in particular groups of organisms (e.g., Aravind et al. 2000; Braun et al. 2000), since some putative gene loss events revealed by these surveys may correspond only to (complete) domain losses.


    Acknowledgements
We thank Rebecca Kimball for helpful discussions regarding both phylogenetic analyses and the use of Monte Carlo simulation and for careful reading of this manuscript. We are also grateful for the comments of two anonymous reviewers, which improved this manuscript. This work was supported by the USDA (fellowship 1999-01582 to E.L.B.), the National Science Foundation (grant MCB-9896111 to E.G.), and grants to E.G. from Pioneer Hi-Bred International and the Ohio State University Office of Research.


    Footnotes
 
Elizabeth Kellogg, Reviewing Editor

1 Abbreviations: ML, maximum likelihood; MP, maximum parsimony; NJ, neighbor joining.

2 Keywords: domain evolution, gene loss, protein evolution, statistical power analysis, comparative genomics, Monte Carlo simulation

3 Address for correspondence and reprints: Edward L. Braun, Department of Plant Biology and Plant Biotechnology Center, 206 Rightmire Hall, 1060 Carmack Road, Ohio State University, Columbus, Ohio 43210. E-mail: braun.83@osu.edu


    References

    Aguinaldo A. M. A., J. M. Turbeville, L. S. Linford, M. C. Rivera, J. R. Garey, R. A. Raff, J. A. Lake. 1997. Evidence for a clade of nematodes, arthropods and other moulting animals. Nature 387:489-493.

    Altschul S. F., T. L. Madden, A. A. Schäffer, J. H. Zhang, Z. Zhang, W. Miller, D. J. Lipman. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25:3389-3402.

    Aravind L., H. Watanabe, D. J. Lipman, E. V. Koonin. 2000. Lineage-specific loss and divergence of functionally linked genes in eukaryotes. Proc. Natl. Acad. Sci. USA 97:11319-11324.

    Baldauf S. L. 1999. A search for the origins of animals and fungi: comparing and combining molecular data. Am. Nat. 154:S178-S188.

    Baldauf S. L., J. D. Palmer. 1993. Animals and fungi are each other's closest relatives: congruent evidence from multiple proteins. Proc. Natl. Acad. Sci. USA 90:11558-11562.

    Bateman A., E. Birney, R. Durbin, S. R. Eddy, K. L. Howe, E. L. L. Sonnhammer. 2000. The PFAM protein families database. Nucleic Acids Res. 28:263-266.

    Besag J., P. J. Diggle. 1977. Simple Monte Carlo tests for spatial patterns. Appl. Stat. 26:327-333.

    Braun E. L., E. Grotewold. 1999. Newly discovered plant c-myb-like genes rewrite the evolution of the plant myb gene family. Plant Physiol. 121:21-24.

    Braun E. L., A. L. Halpern, M. A. Nelson, D. O. Natvig. 2000. Large-scale comparison of fungal sequence information: mechanisms of innovation in Neurospora crassa and gene loss in Saccharomyces cerevisiae. Genome Res. 10:416-430.

    Braun E. L., S. Kang, M. A. Nelson, D. O. Natvig. 1998. Identification of the first fungal annexin: analysis of annexin gene duplications and implications for eukaryotic evolution. J. Mol. Evol. 47:531-543.

    Chang B. S. W., D. L. Campbell. 2000. Bias in phylogenetic reconstructions of vertebrate rhodopsin sequences. Mol. Biol. Evol. 17:1220-1231.

    Cohen J. 1977. Statistical power analysis for the behavioral sciences. Academic Press, New York.

    Dayhoff M. O., R. M. Schwartz, B. C. Orcutt. 1978. A model of evolutionary change in proteins. Pp. 345–352 in M. O. Dayhoff, ed. Atlas of protein sequence and structure. National Biomedical Research Foundation, Silver Springs, Md.

    Doolittle R. F. 1995. The multiplicity of domains in proteins. Annu. Rev. Biochem. 64:287-314.

    Efron B., E. Halloran, S. Holmes. 1996. Bootstrap confidence levels for phylogenetic trees. Proc. Natl. Acad. Sci. USA 93:13429-13434.

    Enright A. J., I. Iliopoulos, N. C. Kyrpides, C. A. Ouzounis. 1999. Protein interaction maps for complete genomes based on gene fusion events. Nature 402:86-90.

    Felsenstein J. 1978. Cases in which parsimony or compatibility methods will be positively misleading. Syst. Zool. 27:401-410.

    Felsenstein J. 1985. Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39:783-791.

    Felsenstein J. 1988. Phylogenies from molecular sequences: inference and reliability. Annu. Rev. Genet. 22:521-565.

    Felsenstein J., H. Kishino. 1993. Is there something wrong with the bootstrap on phylogenies: a reply. Syst. Biol. 42:193-200.

    Goldman N., J. P. Anderson, A. G. Rodrigo. 2000. Likelihood-based tests of topologies in phylogenetics. Syst. Biol. 49:652-670.

    Goldman N., S. Whelan. 2000. Statistical tests of gamma-distributed rate heterogeneity in models of sequence evolution in phylogenetics. Mol. Biol. Evol. 17:975-978.

    Grassly N. C., J. Adachi, A. Rambaut. 1997. PSeq-Gen: an application for the Monte Carlo simulation of protein sequence evolution along phylogenetic trees. Comput. Appl. Biosci. 13:559-560.

    Henikoff S., E. A. Greene, S. Pietrokovski, P. Bork, T. K. Attwood, L. Hood. 1997. Gene families: the taxonomy of protein paralogs and chimeras. Science 278:609-614.

    Henikoff S., J. G. Henikoff. 1992. Amino acid substitution matrices from protein blocks. Proc. Natl. Acad. Sci. USA 89:10915-10919.

    Hillis D. M. 1996. Inferring complex phylogenies. Nature 383:130-131.

    Hillis D. M., J. J. Bull. 1993. An empirical test of bootstrapping as a method for assessing confidence in phylogenetic analysis. Syst. Biol. 42:182-192.

    Hillis D. M., J. P. Huelsenbeck, D. L. Swofford. 1994. Hobgoblin of phylogenetics? Nature 369:363-364.

    Huelsenbeck J. P. 1998. Systematic bias in phylogenetic analysis: is the Strepsiptera problem solved? Syst. Biol. 47:519-537.

    Huelsenbeck J. P., D. M. Hillis, R. Jones. 1996. Parametric bootstrapping in molecular phylogenies: applications and performance. Pp. 19–45 in J. D. Ferraris and S. R. Palumbi, eds. Molecular zoology. Wiley-Liss, New York.

    Huynen M. A., P. Bork. 1998. Measuring genome evolution. Proc. Natl. Acad. Sci. USA 95:5849-5856.

    Kelley W. L. 1998. The J-domain family and the recruitment of chaperone power. Trends Biochem. Sci. 23:222-227.

    Lecointre G., H. Philippe, H. L. Vân Lê, H. Le Guyader. 1993. Species sampling has a major impact on phylogenetic inference. Mol. Phylogenet. Evol. 2:205-224.

    Li W.-H. 1997. Molecular evolution. Sinauer, Sunderland, Mass.

    Maddison W. P., D. R. Maddison. 1992. MacClade: analysis of phylogeny and character evolution. Sinauer, Sunderland, Mass.

    Marcotte E. M., M. Pellegrini, H. L. Ng, D. W. Rice, T. O. Yeates, D. Eisenberg. 1999. Detecting protein function and protein-protein interactions from genome sequences. Science 285:751-753.

    Matsumoto-Taniura N., F. Pirollet, R. Monroe, L. Gerace, J. M. Westendorf. 1996. Identification of novel M phase phosphoproteins by expression cloning. Mol. Biol. Cell 7:1455-1469.

    Miller S. M., D. L. Kirk. 1999. GlsA, a Volvox gene required for asymmetric division and germ cell specification, encodes a chaperone-like protein. Development 126:649-658.

    Nei M., S. Kumar, K. Takahashi. 1998. The optimization principle in phylogenetic analysis tends to give incorrect topologies when the number of nucleotides or amino acids used is small. Proc. Natl. Acad. Sci. USA 95:12390-12397.

    Nelson R. J., T. Ziegelhoffer, C. Nicolet, M. Werner-Washburne, E. A. Craig. 1992. The translation machinery and 70 kD heat shock protein cooperate in protein synthesis. Cell 71:97-105.

    Nikoh N., N. Hayase, N. Iwabe, K. Kuma, T. Miyata. 1994. Phylogenetic relationship of the kingdoms Animalia, Plantae, and Fungi, inferred from 23 different protein species. Mol. Biol. Evol. 11:762-768.

    Ouzounis C. A., A. G. Papavassiliou. 1997. DNA-binding motifs of eukaryotic transcription factors. Pp. 1–21 in A. G. Papavassiliou, ed. Transcription factors in eukaryotes. R. G. Landes, Austin, Tex.

    Philippe H., J. Laurent. 1998. How good are deep phylogenetic trees? Curr. Opin. Genet. Dev. 8:616-623.

    Rice W. R. 1989. Analyzing tables of statistical tests. Evolution 43:223-225.

    Rodriguez-Monge L., C. A. Ouzounis, N. C. Kyrpides. 1998. A ferredoxin-like domain in RNA polymerase 30/40-kDa subunits. Trends Biochem. Sci. 23:169-170.

    Rose A., I. Meier, U. Wienand. 1999. The tomato I-box binding factor LeMYBI is a member of a novel class of Myb-like proteins. Plant J. 20:641-652.

    Rubin G. M., M. D. Yandell, J. R. Wortman, et al. (55 co-authors). 2000. Comparative genomics of the eukaryotes. Science 287:2204-2215.

    Saitou N., M. Nei. 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406-425.

    Sanderson M. J., M. F. Wojciechowski, J. M. Hu, T. S. Khan, S. G. Brady. 2000. Error, bias, and long-branch attraction in data for two chloroplast photosystem genes in seed plants. Mol. Biol. Evol. 17:782-797.

    Shoji W., T. Inoue, T. Yamamoto, M. Obinata. 1995. MIDA1, a protein associated with Id, regulates cell growth. J. Biol. Chem. 270:24818-24825.

    Snel B., P. Bork, M. Huynen. 2000. Genome evolution. Gene fusion vs. gene fission. Trends Genet. 16:9-11.

    Steele R. E., N. A. Stover, M. Sakaguchi. 1999. Appearance and disappearance of Syk family protein-tyrosine kinase genes during metazoan evolution. Gene 239:91-97.

    Swofford D. L. 2000. PAUP*: phylogenetic analysis using parsimony (*and other methods). Version 4.0b4a. Sinauer, Sunderland, Mass.

    Swofford D. L., G. J. Olsen, P. J. Waddell, D. M. Hillis. 1996. Phylogenetic inference. Pp. 407–514 in D. M. Hillis, C. Moritz, and B. K. Mable, eds. Molecular systematics. Sinauer, Sunderland, Mass.

    Takahashi K., M. Nei. 2000. Efficiencies of fast algorithms of phylogenetic inference under the criteria of maximum parsimony, minimum evolution, and maximum likelihood when a large number of sequences are used. Mol. Biol. Evol. 17:1251-1258.

    Thompson J. D., D. G. Higgins, T. J. Gibson. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673-4680.

    Wainright P. O., G. Hinkle, M. L. Sogin, S. K. Stickel. 1993. Monophyletic origins of the metazoa: an evolutionary link with fungi. Science 260:340-342.

    Wang D. Y. C., S. Kumar, S. B. Hedges. 1999. Divergence time estimates for the early history of animal phyla and the origin of plants, animals and fungi. Proc. R. Soc. Lond. B Biol. Sci. 266:163-171.

    Wilhelm M. L., J. Reinbolt, J. Gangloff, G. Dirheimer, F. X. Wilhelm. 1994. Transfer RNA binding protein in the nucleus of Saccharomyces cerevisiae. FEBS Lett. 349:260-264.

    Wolf Y. I., S. E. Brenner, P. A. Bash, E. V. Koonin. 1999. Distribution of protein folds in the three superkingdoms of life. Genome Res. 9:17-26.

    Yan W., B. Schilke, C. Pfund, W. Walter, S. Kim, E. A. Craig. 1998. Zuotin, a ribosome-associated DnaJ molecular chaperone. EMBO J. 17:4809-4817.

    Yang Z. 1994. Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods. J. Mol. Evol. 39:306-314.

    Yang Z. 1997. How often do wrong models produce better phylogenies? Mol. Biol. Evol. 14:105-108.

    Zhang S., C. Lockshin, A. Herbet, E. Winter, A. Rich. 1992. Zuotin, a putative Z-DNA binding protein in Saccharomyces cerevisiae. EMBO J. 11:3787-3796.

    Zharkikh A., W.-H. Li. 1992. Statistical properties of bootstrap estimation of phylogenetic variability from nucleotide sequences. II. Four taxa without a molecular clock. J. Mol. Evol. 35:356-366.

    Zharkikh A., W.-H. Li. 1995. Estimation of confidence in phylogeny: the complete-and-partial bootstrap technique. Mol. Phylogenet. Evol. 4:44-63.

Accepted for publication March 23, 2001.