Department of Organismic and Evolutionary Biology, Harvard University
Abstract
Two spurious nodes were found in phylogenetic analyses of vertebrate rhodopsin sequences in comparison with well-established vertebrate relationships. These spurious reconstructions were well supported in bootstrap analyses and occurred independently of the method of phylogenetic analysis used (parsimony, distance, or likelihood). Use of this data set of vertebrate rhodopsin sequences allowed us to exploit established vertebrate relationships, as well as the considerable amount known about the molecular evolution of this gene, in order to identify important factors contributing to the spurious reconstructions. Simulation studies using parametric bootstrapping indicate that it is unlikely that the spurious nodes in the parsimony analyses are due to long branches or other topological effects. Rather, they appear to be due to base compositional bias at third positions, codon bias, and convergent evolution at nucleotide positions encoding the hydrophobic residues isoleucine, leucine, and valine. LogDet distance methods, as well as maximum-likelihood methods which allow for nonstationary changes in base composition, reduce but do not entirely eliminate support for the spurious resolutions. Inclusion of five additional rhodopsin sequences in the phylogenetic analyses largely corrected one of the spurious reconstructions while leaving the other unaffected. The additional sequences not only were more proximal to the corrected node, but were also found to have intermediate levels of base composition and codon bias as compared with neighboring sequences on the tree. This study shows that the spurious reconstructions can be corrected either by excluding third positions, as well as those encoding the amino acids Ile, Val, and Leu (which may not be ideal, as these sites can contain useful phylogenetic signal for other parts of the tree), or by the addition of sequences that reduce problems associated with convergent evolution.
Introduction
Phylogenetic analysis is a complex problem in inference. It is therefore not surprising that all existing phylogenetic methods are known to fail under some conditions and for a variety of reasons. In recent years, several issues have emerged as particularly thorny. Use of an oversimplified model of molecular evolution or strong violation of the assumptions of a model can result in convergence to an incorrect topology with greater certainty as sequence length increases (i.e., inconsistency). This type of problem is particularly relevant to parsimony analyses, especially in cases in which some branches are much longer than others, a problem which has been dubbed "long-branch attraction" (Felsenstein 1978). Phylogenetic methods based on explicit models of evolution, such as distance and maximum likelihood, tend to be less vulnerable to this type of problem, but even these are known to display inconsistency under conditions where their assumptions are strongly violated (Hillis, Huelsenbeck, and Cunningham 1994; Gaut and Lewis 1995; Yang 1996; Huelsenbeck 1997; Sullivan and Swofford 1997; Huelsenbeck 1998). In addition, although taxon sampling has long been an important issue in phylogenetic analyses, it remains difficult to establish reasonable guidelines for sampling and to assess the effects that it may have on the accuracy of tree topologies (Hillis 1996, 1998; Poe 1998; Rannala et al. 1998).
Determining the particular conditions under which phylogenetic methods fail is critical to both understanding their limitations and developing new, improved models and algorithms better suited to the analysis of molecular data. For example, third positions have been thought to be problematic in many data sets due to the effects of base compositional bias (Saccone, Pesole, and Preparata 1989; Sidow and Wilson 1990; Sogin, Hinkle, and Leipe 1993). This has led to the development of models that incorporate nonstationary changes in base composition for both distance and likelihood phylogenetic methods (Lockhart et al. 1994; Steel 1994; Galtier and Gouy 1995, 1998). However, in practice, it is often difficult to identify concrete examples of failure of phylogenetic methods in real data sets and to pinpoint the reasons for that failure. Another common feature of molecular data sets that may cause phylogenetic methods to fail is variation in codon bias across the tree, but examples of this in real data sets have yet to be isolated, and the challenges they pose for phylogenetic reconstruction have only just begun to be addressed (Goldman and Yang 1994; Muse 1996; Yang 1997).
Rhodopsin is an ideal genetic system for exploring issues in phylogenetic reconstruction, because it has been cloned from a variety of species, and much is known about its function and molecular evolution (Chang et al. 1995, 1996; Baylor 1996; Baylor and Burns 1998; Bowmaker 1998; Sakmar 1998; Townson et al. 1998). Rhodopsin is a single-copy nuclear gene encoding a seven-transmembrane G-protein-coupled receptor which forms the first step in the visual transduction cascade in the photoreceptors of the eye (Nathans 1992). In vertebrates, it is expressed at high levels in a single cell type, rod photoreceptor cells (Khorana 1992; Chang et al. 1996; Baylor and Burns 1998; Sakmar 1998). Rhodopsin has been found to exist in more than one copy only in rare instances, for example, in polyploid animals such as the carp, Cyprinus carpio (Larhammar and Risinger 1994). Most important for this study, phylogenetic relationships among the vertebrates for which rhodopsin sequences are available have been well characterized using fossil, morphological, and molecular data (Carroll 1997).
This study takes advantage of well-established vertebrate relationships to examine in detail molecular evolutionary forces which result in spurious reconstructions in a data set of vertebrate rhodopsin sequences. Once these factors have been identified, methods are explored to reduce their effects and eliminate the spurious reconstructions.
Materials and Methods
Rhodopsin sequences were obtained from the GenBank database via NCBI's website (http://www.ncbi.nlm.nih.gov/genbank/). GenBank accession numbers for all the sequences used are given in table 1. Rhodopsin cDNA sequences were aligned using CLUSTAL W and modified by hand to allow gaps only between codons. This file was then translated to yield an equivalently aligned amino acid rhodopsin data set. Parsimony, distance, and maximum-likelihood phylogenetic analyses were performed using a beta-test version of PAUP*, version 4 (Swofford 1999). Trees were rooted using the lamprey sequence as an outgroup. In addition, many of the analyses also included four paralogous rodlike cone opsin genes (GenBank accession numbers: gekko blue, M92035; chick green, M92038; goldfish green1, L11865; goldfish green2, L11866) as outgroup sequences in order to confirm the position of the root (Chang et al. 1995). The results of these analyses confirmed the position of the lamprey as the most basally diverging vertebrate rhodopsin.
In addition to equally weighted parsimony analyses, 2:1 transversion (Tv) : transition (Ts) weighting was also used. Although other weighting schemes were explored, they produced less reliable trees (data not shown). In addition, this weighting scheme reflects the likelihood estimate of the Tv/Ts ratio (1.5). Distance bootstrap analyses were performed using the HKY85+Γ, HKY85, and K2P models and the neighbor-joining algorithm.
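The distinction between transitions and transversions that underlies this weighting can be illustrated with a minimal Python sketch. It only classifies pairwise differences between two aligned sequences and applies a 2:1 Tv:Ts cost; it is not a reconstruction of how PAUP* counts weighted steps on a tree, and the function names `count_ts_tv` and `weighted_cost` are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_ts_tv(seq_a, seq_b):
    """Count transitions and transversions between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    ts = tv = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Skip identical sites, gaps, and ambiguity codes.
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue
        # A change within purines or within pyrimidines is a transition.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            ts += 1
        else:
            tv += 1
    return ts, tv


def weighted_cost(ts, tv, tv_weight=2, ts_weight=1):
    """Total cost of the observed differences under a Tv:Ts weighting."""
    return tv_weight * tv + ts_weight * ts


ts, tv = count_ts_tv("AACGT", "GATGA")
print(ts, tv, weighted_cost(ts, tv))
```

With equal weights every difference costs one step; under the 2:1 scheme a transversion counts twice as much as a transition.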
In order to assess phylogenetic signal in the data set, 10,000 random trees were generated in PAUP* to calculate g1 statistics (Hillis and Huelsenbeck 1992). In addition, two measures of codon bias, scaled χ² and effective number of codons (ENC) (Shields et al. 1988; Wright 1990), were calculated to assess codon usage in each taxon. Measures of nucleotide and codon bias were calculated using the program MEA (generously provided by its author, E. Moriyama).
To test for long-branch attraction (Huelsenbeck 1998), 100 data sets were simulated by parametric bootstrapping using the program SIMINATOR (Huelsenbeck, Hillis, and Jones 1996) with parameters estimated from the original rhodopsin sequence data set. The simulated data sets were subsequently analyzed using equally weighted maximum parsimony in PAUP*, version 4, with 100 replications of (nonparametric) bootstrapping, 10 random-addition replicates each.
Results
Phylogenetic Analyses
Phylogenetic analyses were performed on a data set of 20 vertebrate rhodopsin nucleotide sequences (table 1). Although this data set showed high levels of genetic variation (table 2) and generally performed well in reconstructing traditional relationships among vertebrates, phylogenetic analyses consistently showed substantial bootstrap support for two groupings which contradict established vertebrate relationships: reptiles and amphibians form a clade (fig. 1B), instead of the more traditional reptiles and mammals (fig. 1A), and alligator and anolis form a clade (fig. 2B), instead of alligator and chicken (fig. 2A). In parsimony analysis with equal weights (table 3), bootstrap support was 86% for the reconstruction of amphibians as the sister group to reptiles (this node is hereinafter referred to as rept+amph) and 72% for the grouping of alligator + anolis as the sister lineage to the chicken (this node is hereinafter referred to as gator+anol). Support for these resolutions was robust to changes in the relative weightings of transversions and transitions: 91% for rept+amph and 65% for gator+anol with 2-to-1 Tv/Ts (table 3). Less than 5% bootstrap support was seen for the more accepted resolutions of these nodes. This is in contrast to the robust support for established relationships elsewhere in the tree (fig. 3). Note that this data set, like many other molecular data sets, does not recover the Glires clade (rodents + rabbits), but instead places the rodents basal to a clade containing artiodactyls and other mammals. On the other hand, most morphological data recover the Glires (de Jong 1998). The Glires controversy is beyond the scope of this paper and does not influence its major observations.
When analyzed alone, third positions showed substantial support for the spurious resolutions (68% for rept+amph, 41% for gator+anol) and no support for the well-corroborated relationships, an effect which was robust to changes in Tv/Ts weighting (table 3 and fig. 4). Analyses of the amino acid sequences, which should be free of the base compositional and codon bias effects particularly problematic for third-base positions and transitions, did not show any support for the incongruent relationships (table 3). However, the bootstrap phylogeny based on amino acids was rather poorly resolved in general (fig. 5).
Given the variation in base composition in this data set, especially at third positions (see table 1), analyses using LogDet/paralinear distance methods (Lake 1994; Lockhart et al. 1994; Steel 1994) were performed. These methods allow for nonstationary changes in base composition among sequences in a phylogeny and would be expected to perform better for data sets where this is a problem. Phylogenetic bootstrap analyses using LogDet distances did show reduced support for the problematic reconstructions (56% for rept+amph and 38% for gator+anol; table 3 and fig. 6). Moreover, for one of the problematic nodes, there was also slightly increased support for the correct reconstruction (38% for chick+gator; table 3).
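To make the idea behind these distances concrete, the following Python sketch computes a pairwise paralinear distance in the form commonly given for Lake (1994): d = -(1/4) ln[det J / sqrt(det Dx det Dy)], where J is the 4 x 4 matrix of joint base frequencies for the two sequences and Dx, Dy are diagonal matrices of their marginal base frequencies. This is a bare-bones illustration, not the PAUP* implementation used in the study; the helper names are hypothetical, and refinements such as handling of invariant sites are omitted.

```python
import math

BASES = "ACGT"


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result  # a row swap flips the sign
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return result


def paralinear_distance(seq_x, seq_y):
    """Paralinear (LogDet-type) distance between two aligned sequences."""
    pairs = [(a, b) for a, b in zip(seq_x.upper(), seq_y.upper())
             if a in BASES and b in BASES]
    if not pairs:
        raise ValueError("no comparable sites")
    n = len(pairs)
    # Joint frequency matrix J: rows index bases in x, columns bases in y.
    joint = [[0.0] * 4 for _ in range(4)]
    for a, b in pairs:
        joint[BASES.index(a)][BASES.index(b)] += 1.0 / n
    row_freq = [sum(joint[i]) for i in range(4)]
    col_freq = [sum(joint[i][j] for i in range(4)) for j in range(4)]
    d = det(joint)
    if d <= 0 or min(row_freq) == 0 or min(col_freq) == 0:
        raise ValueError("distance undefined for these sequences")
    log_marginals = sum(math.log(p) for p in row_freq + col_freq)
    return -0.25 * (math.log(d) - 0.5 * log_marginals)


x = "ACGT" * 5
y = "ACGT" * 4 + "ACGA"
print(paralinear_distance(x, x), paralinear_distance(x, y))
```

Because the marginal frequencies of each sequence enter the normalization separately, the distance does not assume that the two sequences share a single stationary base composition.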
Finally, it has been suggested that hydrophobic amino acids may be less useful for phylogenetic reconstruction than other amino acids (Naylor and Brown 1997). To explore the effects of hydrophobic amino acids in the rhodopsin data set, nucleotide positions encoding the hydrophobic amino acids Ile, Leu, and Val were excluded in a parsimony analysis (189 nucleotide positions excluded, representing 63 amino acids). This analysis showed greatly reduced bootstrap support for the spurious resolutions (<5% for rept+amph and 33% for gator+anol, 2:1 Tv/Ts; table 3), indicating that positions encoding these amino acids may underlie the spurious signal. If the spurious signal were due mainly to functional constraints on these hydrophobic amino acids, then excluding only the third positions of their codons should not reduce support for the spurious nodes. This was not the case: support remained reduced even when only the third positions of codons for Ile, Leu, and Val were excluded (table 3).
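The exclusion of sites encoding Ile, Leu, and Val can be sketched in Python as follows. The text does not state the exact rule used to decide which codon columns count as encoding these residues, so this sketch assumes a column is flagged if any sequence carries an Ile, Leu, or Val codon there (standard genetic code); the function names are hypothetical.

```python
ILV_CODONS = {
    "ATT", "ATC", "ATA",                        # Ile
    "CTT", "CTC", "CTA", "CTG", "TTA", "TTG",   # Leu
    "GTT", "GTC", "GTA", "GTG",                 # Val
}


def ilv_codon_indices(alignment):
    """Indices of codon columns where any sequence encodes Ile, Leu, or Val.

    Assumption: 'any sequence' is an illustrative criterion, not necessarily
    the one used in the original analysis.
    """
    length = len(alignment[0])
    if length % 3 or any(len(seq) != length for seq in alignment):
        raise ValueError("alignment must be in frame and of equal length")
    hits = []
    for start in range(0, length, 3):
        if any(seq[start:start + 3].upper() in ILV_CODONS for seq in alignment):
            hits.append(start // 3)
    return hits


def exclude_codons(alignment, codon_indices):
    """Return the alignment with the given codon columns removed."""
    drop = set(codon_indices)
    return [
        "".join(seq[i:i + 3] for i in range(0, len(seq), 3) if i // 3 not in drop)
        for seq in alignment
    ]


aln = ["ATGCTGAAA", "ATGGCCAAA"]
flagged = ilv_codon_indices(aln)
print(flagged, exclude_codons(aln, flagged))
```

Excluding only third positions of the flagged codons would follow the same pattern, dropping one nucleotide per flagged column instead of three.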
Statistical Tests Comparing Trees
Several statistical tests were performed using the rhodopsin nucleotide data set in order to determine if phylogenies with and without the two spurious reconstructions were significantly different. The Templeton (1983) test and the "winning sites" test (Prager and Wilson 1988) compare trees under the parsimony criterion, whereas the Kishino-Hasegawa test (Kishino and Hasegawa 1989) was formulated to compare trees under either likelihood or parsimony. Tests under the parsimony criterion are shown in table 5, and tests under the likelihood criterion are shown in table 6. Each of the two spurious reconstructions (rept+amph, gator+anol) was tested separately in pairwise tests of trees with and without each spurious reconstruction. These tests confirmed the results of the phylogenetic bootstrap analysis, pinpointing third positions and nucleotides encoding Ile, Leu, and Val as the sites supporting the spurious reconstructions. Although neither spurious reconstruction (rept+amph, gator+anol) was significantly better with all nucleotide sites included, when only third positions and sites encoding Ile, Leu, or Val were considered, trees with the spurious reconstructions became significantly better than those without. This was true under both parsimony (table 5) and likelihood (table 6). Conversely, when only first and second positions, excluding those sites encoding Ile, Leu, or Val, were considered, the tree without spurious reconstructions was found to be better than either one of the trees with the spurious reconstructions. This result was significant under parsimony, but not under likelihood (tables 5 and 6).
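The logic of the "winning sites" test can be sketched as a sign test: each site that requires fewer steps on one tree than on the other counts as a win for that tree, ties are ignored, and the split of wins is compared with a binomial expectation of 0.5. The Python sketch below assumes per-site step counts have already been obtained from a parsimony program; the function name and the input values are hypothetical and are not taken from the rhodopsin data.

```python
from math import comb


def winning_sites_test(steps_tree1, steps_tree2):
    """Two-tailed sign test on per-site parsimony step counts for two trees.

    Returns (sites favoring tree 1, sites favoring tree 2, P value).
    """
    if len(steps_tree1) != len(steps_tree2):
        raise ValueError("step counts must cover the same sites")
    wins1 = sum(1 for a, b in zip(steps_tree1, steps_tree2) if a < b)
    wins2 = sum(1 for a, b in zip(steps_tree1, steps_tree2) if b < a)
    n = wins1 + wins2  # tied sites carry no information
    if n == 0:
        return wins1, wins2, 1.0
    k = min(wins1, wins2)
    # Probability of a split at least this uneven under p = 0.5.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return wins1, wins2, min(1.0, 2 * tail)


# Toy step counts for eight sites (illustrative values only).
print(winning_sites_test([1, 1, 1, 1, 1, 1, 2, 1], [2, 2, 2, 2, 2, 2, 1, 1]))
```

The Templeton test differs in that it also uses the magnitude of the per-site differences (a Wilcoxon signed-rank test) rather than only their sign.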
Results from parsimony bootstrap analysis of the 100 simulated data sets are graphed in figure 8B and C, representing the expected null distribution of parsimony bootstrap values for each spurious reconstruction (rept+amph, gator+anol). Note that support for these spurious clades was being examined under conditions where the data were simulated from topologies reflecting the more established relationships (rept+mamm, gator+chick). The median level of bootstrap support for the incorrect rept+amph clade was 10.5% and that for the gator+anol clade was 19% in the simulated data sets. In the real rhodopsin data set, bootstrap support for both spurious resolutions was significantly higher than expected from the null distribution of simulated data sets generated by parametric bootstrapping (86% for rept+amph and 72% for gator+anol; P < 0.05 in both cases). This indicates that the level of support seen for the problematic reconstructions is higher than would be expected given the conditions of the simulations, and therefore unlikely to be due to long-branch attraction.
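The comparison of an observed bootstrap value with a simulated null distribution can be expressed as a one-tailed empirical P value: the fraction of simulated data sets whose support for the spurious clade is at least as high as the observed support. The Python sketch below uses a small set of made-up support values purely for illustration (they are not the values from the 100 simulated data sets), and the function name is hypothetical.

```python
from statistics import median


def parametric_bootstrap_p(observed, simulated):
    """Fraction of simulated support values at least as large as observed."""
    if not simulated:
        raise ValueError("need at least one simulated value")
    exceed = sum(1 for value in simulated if value >= observed)
    return exceed / len(simulated)


# Toy null distribution of bootstrap support (%), illustrative only.
null_support = [5, 10, 11, 20, 90]
print(median(null_support), parametric_bootstrap_p(86, null_support))
```

A more conservative variant adds one to both numerator and denominator so that the estimated P value can never be exactly zero.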
Base Composition and Codon Bias Measures
Since the results of the phylogenetic analyses and statistical tests comparing phylogenies implied that third positions, as well as transitions, underlie the bootstrap support of the spurious reconstructions, base composition and codon bias measures were examined for evidence of convergent evolution. First- and second-position nucleotide compositions were fairly homogeneous across all sequences. However, at third positions, reptile and amphibian rhodopsins tended to have lower %GC than other sequences (table 1). This pattern of convergent evolution may confound phylogenetic analyses and result in the spurious grouping, as shown by mapping the GC content on the phylogeny (fig. 9). Furthermore, amphibian and reptile rhodopsins are less biased in their codon usage, as shown by scaled χ² and ENC codon bias measures, than are the rhodopsins of other vertebrate groups (table 1). Not only are there convergences in the overall degree of codon bias, but there are also convergences in the usage frequencies of specific codons that reflect the spurious groupings. This convergent pattern was evident when the codon usage frequencies were mapped on a tree. For example, convergences in the frequency of GGC, one of four codons coding for glycine, are shown mapped on the tree in figure 9.
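The per-codon usage frequencies referred to here can be illustrated with a short Python sketch that tabulates how often each of the four glycine codons is used, relative to all glycine codons in a coding sequence. This is a minimal illustration rather than the MEA program used in the study; the function name is hypothetical, and the sequence is assumed to be in frame.

```python
from collections import Counter

GLY_CODONS = ("GGT", "GGC", "GGA", "GGG")


def glycine_codon_frequencies(cds):
    """Relative usage of each glycine codon in an in-frame coding sequence."""
    cds = cds.upper()
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    total = sum(counts[codon] for codon in GLY_CODONS)
    if total == 0:
        raise ValueError("sequence contains no glycine codons")
    return {codon: counts[codon] / total for codon in GLY_CODONS}


print(glycine_codon_frequencies("GGCGGCGGTAAAGGC"))
```

Computing such frequencies per taxon and mapping them onto the tree is one way to look for lineages that share a codon preference without sharing recent ancestry.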
Effect of Increased Sampling
If the spurious reconstructions seen in this data set were due to convergent evolution, perhaps better sampling across the tree could ameliorate this effect. Rhodopsin sequences from five basally diverging taxa that were recent additions to GenBank were added to the data set: sea lamprey, Conger eel, Anguilla eel, skate, and Myripristis berndti, a holocentrid marine fish (table 1). It is important to note that not only do these sequences represent basal species poorly sampled in the original data set, but several of them also display values of base composition at third positions and/or codon bias quite different from their closest neighbors on the tree, and are thus more likely to "break up" convergent effects.
Myripristis berndti and Anguilla rhodopsins have only 67.81% and 73.65% GC content at third positions, as compared with other fish rhodopsins, which average 80.16% (table 1). Similarly, skate rhodopsin has much lower %GC at third positions (70.70%) than the nearest basal lineage, lamprey rhodopsin (87.57%). The two measures of codon bias, scaled χ² and ENC, also showed the M. berndti, skate, and Conger rhodopsins to be atypically low in codon bias compared with neighboring fish and lamprey sequences (table 1).
For this expanded data set, equally weighted parsimony analysis of all positions showed reduced bootstrap support for the spurious rept+amph clade (48%) as compared with the original data set (86% without the additional sequences) and increased support for the correct rept+mamm clade, which rose from <5% in the original data set (table 3) to 25% in the expanded data set (table 7). Unlike analyses of the original data set, in which there was virtually no difference between equal weights and 2-to-1 Tv/Ts weighting schemes, analysis of the expanded data set was highly sensitive to differences in weighting, particularly in the resolution of the reptile-mammal-amphibian node. When Tv/Ts weighting was used, bootstrap support for the correct rept+mamm clade jumped from 25% (equal weights) to 70% (2:1 Tv/Ts; fig. 10). In contrast, bootstrap support for the spurious gator+anol clade remained substantial in the analysis of the expanded data set (73%), and the high degree of sensitivity to differences in Tv/Ts weighting was not seen here (table 7).
The patterns of bootstrap support in distance analyses of the expanded data set (table 7) remained very similar to those of the original data set (table 3), with very little difference in support between the models used, showing neither decreased support for spurious resolutions nor increased support for correct resolutions. Maximum-likelihood reconstructions under HKY85+Γ in the expanded data set also gave results similar to those found for the original data set, showing neither reduced support for the spurious nodes nor heightened support for the correct nodes (table 7).
Statistical comparisons of trees with and without the spurious reconstructions (rept+amph, gator+anol) were consistent with the phylogenetic bootstrap analyses. A tree with the gator+anol clade was still better than one without this spurious reconstruction when only third positions and sites encoding Ile, Leu, and Val were considered. This result was significant under the parsimony criterion (table 5) and not quite significant under the likelihood criterion (P = 0.07; table 6). However, the sites which clearly supported the spurious gator+anol reconstruction in both the original and extended data sets and also supported the spurious rept+amph reconstruction in the original data set were no longer capable of distinguishing between a tree with the spurious rept+amph reconstruction and one without in the extended data set (tables 5 and 6). This result is again consistent with the phylogenetic bootstrap analyses, which suggest that the additional sequences aid in breaking up convergences among the sequences, but only for the spurious rept+amph reconstruction, which is more proximal to the additional sequences, leaving the spurious gator+anol reconstruction largely unaffected.
Discussion
Our results indicate that the two problematic reconstructions in the original rhodopsin data set were probably not the result of topological effects such as long-branch attraction. This is demonstrated by the persistence of these spurious nodes when maximum-likelihood methods were used and by the fact that the bootstrap support for these spurious nodes was well outside of the distribution of support obtained for each node from simulated data sets generated by parametric bootstrapping. Rather, these spurious reconstructions were most likely due to convergences in base compositional bias at third positions, in codon bias, and in positions encoding for the hydrophobic amino acids Ile, Val, and Leu, which tend to group unrelated sequences. This represents a strong violation of phylogenetic model assumptions of stationary base composition and codon frequencies across the tree, which would cause methods not directly addressing these problems to fail under these conditions.
Base compositional bias at third positions has often been found to be problematic for phylogenetic reconstruction, and several methods have been developed in an attempt to address this problem (Lockhart et al. 1994; Galtier and Gouy 1995, 1998). Although these methods did reduce support for the spurious reconstructions in the rhodopsin data set, they were not completely effective in eliminating the problematic nodes, and it seems clear that base compositional bias is not the only reason for the spurious nodes. In fact, simulation studies on a data set of bat sequences have shown that levels of base compositional bias must be extremely high (>90% AT) in order to show any evidence of spurious reconstructions (Van Den Bussche et al. 1998). Although fairly high, levels of base compositional bias are not so extreme in the rhodopsin data set.
In addition to base compositional bias, convergent effects in codon bias and in positions encoding hydrophobic amino acids also appear to be supporting the spurious reconstructions in the rhodopsin data set. Other phylogenetic studies that have also found problematic reconstructions have attributed these to various problems such as not incorporating rate heterogeneity across sites into the phylogenetic model (Takezaki and Gojobori 1999), which is clearly not the case here. However, there is growing evidence that convergent or parallel evolution at the level of nucleotides (or amino acids) is a common feature of many molecular data sets and may pose a significant challenge in attempting to reconstruct unbiased phylogenies (Naylor and Brown 1997, 1998; Cao et al. 1998; Foster and Hickey 1999; Lee 1999). In particular, nucleotide sites encoding the hydrophobic amino acids Ile, Leu, and Val have been shown in other studies to display lower retention indices than other sites (Naylor and Brown 1997), and the analyses of the rhodopsin data set presented here provide more evidence of the importance of this effect. The reasons for it remain unclear but may be related to relaxed constraints on hydrophobic amino acids contained within transmembrane domains.
There are several ways to address these problems of bias in base composition, codon frequencies, and sites encoding hydrophobic amino acids. All of these positions could be excluded from a parsimony phylogenetic analysis. This method can be effective in principle, but in fact may not be ideal, as these positions often contain useful phylogenetic signal in addition to the spurious signal, and excluding them can result in loss of resolution in the phylogenetic reconstructions (e.g., see Campbell, Brower, and Pierce 2000). Another way of addressing this problem would be to develop more complex models of evolution which incorporate these assumptions about base composition, codon bias, and amino acid composition. However, this may require the addition of many more parameters to the model, which may become problematic.
In addition to advances in phylogenetic methodology, this problem may be effectively addressed, albeit indirectly, via better sampling of species. Note that here "better sampling" means the addition of sequences not only proximal to problematic nodes, but also intermediate in base composition and codon bias. In other words, it is not only important when considering sampling issues to "break up" long branches that can lead to the failure of methods such as parsimony, but even more important to "break up" convergences in base composition and codon bias that can cause all types of phylogenetic methods, not just parsimony, to fail. In fact, it should be noted that of all the phylogenetic methods used here, only weighted parsimony methods were able to recover the correct topology once appropriately sampled sequences were included in the analysis, and thus these methods outperformed both distance and maximum-likelihood methods in this regard. This may reflect greater sensitivity of maximum-likelihood and distance methods to incorrect assumptions in the underlying models (with respect to nonstationary nucleotide and codon bias and hydrophobic sites) in comparison with parsimony methods, which sometimes may prove more robust to violations of these assumptions despite the fact that maximum-likelihood methods are known to be consistent over a larger set of conditions than are parsimony methods (Hillis, Huelsenbeck, and Cunningham 1994; Huelsenbeck 1997; Sullivan and Swofford 1997).
Acknowledgements
We thank Z. Yang, R. Honeycutt, and two anonymous reviewers for many helpful comments on the manuscript, and N. Pierce and M. Donoghue for discussion and advice. B.S.W.C. is an NSF/Alfred P. Sloan Fellow in Molecular Evolution.
Footnotes
Rodney Honeycutt, Reviewing Editor
1 Present address: Department of Molecular Biology and Biochemistry, Rockefeller University.
2 Present address: Department of Biology, University of Maryland, College Park.
3 Keywords: molecular evolution, hydrophobic amino acids, base compositional bias, codon bias, parametric bootstrapping
4 Address for correspondence and reprints: Belinda S. W. Chang, Rockefeller University, 1230 York Ave., Box 284, New York, New York 10021. E-mail: changb{at}rockvax.rockefeller.edu
Literature Cited
Baylor, D. 1996. How photons start vision. Proc. Natl. Acad. Sci. USA 93:560–565.
Baylor, D. A., and M. E. Burns. 1998. Control of rhodopsin activity in vision. Eye 12:521–525.
Bowmaker, J. 1998. Evolution of colour vision in vertebrates. Eye 12:541–547.
Campbell, D. L., A. V. Z. Brower, and N. E. Pierce. 2000. Molecular evolution of the Wingless gene and its implications for the phylogenetic placement of the butterfly family Riodinidae (Lepidoptera: Papilionoidea). Mol. Biol. Evol. 17:684–696.
Cao, Y., A. Janke, P. J. Waddell, M. Westerman, O. Takenaka, S. Murata, N. Okada, S. Paabo, and M. Hasegawa. 1998. Conflict among individual mitochondrial proteins in resolving the phylogeny of eutherian orders. J. Mol. Evol. 47:307–322.
Carroll, R. L. 1997. Patterns and processes of vertebrate evolution. Cambridge University Press, Cambridge, England.
Chang, B. S. W., D. Ayers, W. C. Smith, and N. E. Pierce. 1996. Cloning of the gene encoding honeybee long-wavelength rhodopsin: a new class of insect visual pigments. Gene 173:215–219.
Chang, B. S. W., K. S. Crandall, J. P. Carulli, and D. L. Hartl. 1995. Opsin phylogeny and evolution: a model for blue shifts in wavelength regulation. Mol. Phylogenet. Evol. 4:31–43.
de Jong, W. W. 1998. Molecules remodel the mammalian tree. Trends Ecol. Evol. 13:270–275.
Felsenstein, J. 1978. Cases in which parsimony or compatibility methods will be positively misleading. Syst. Zool. 27:401–410.
Felsenstein, J. 1981. Evolutionary trees from DNA sequences: a maximum likelihood approach. J. Mol. Evol. 17:368–376.
Felsenstein, J. 1991. PHYLIP: phylogeny inference package. Version 3.4. University of Washington, Seattle.
Foster, P. G., and D. A. Hickey. 1999. Compositional bias may affect both DNA-based and protein-based phylogenetic reconstructions. J. Mol. Evol. 48:284–290.
Galtier, N., and M. Gouy. 1995. Inferring phylogenies from sequences of unequal base compositions. Proc. Natl. Acad. Sci. USA 92:11317–11321.
Galtier, N., and M. Gouy. 1998. Inferring pattern and process: maximum-likelihood implementation of a nonhomogeneous model of DNA sequence evolution for phylogenetic analysis. Mol. Biol. Evol. 15:871–879.
Gaut, B. S., and P. O. Lewis. 1995. Success of maximum likelihood phylogeny inference in the four-taxon case. Mol. Biol. Evol. 12:152–162.
Goldman, N., and Z. Yang. 1994. A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol. Biol. Evol. 11:725–736.
Hasegawa, M., H. Kishino, and T. Yano. 1985. Dating the human-ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol. 22:672–677.
Hillis, D. M. 1996. Inferring complex phylogenies. Nature 383:130–131.
Hillis, D. M. 1998. Taxonomic sampling, phylogenetic accuracy, and investigator bias. Syst. Biol. 47:3–8.
Hillis, D. M., and J. P. Huelsenbeck. 1992. Signal, noise, and reliability in molecular phylogenetic analyses. J. Hered. 83:189–195.
Hillis, D. M., J. P. Huelsenbeck, and C. W. Cunningham. 1994. Application and accuracy of molecular phylogenies. Science 264:671–677.
Huelsenbeck, J. P. 1997. Is the Felsenstein zone a fly trap? Syst. Biol. 46:69–74.
Huelsenbeck, J. P. 1998. Systematic bias in phylogenetic analysis: is the Strepsiptera problem solved? Syst. Biol. 47:519–537.
Huelsenbeck, J. P., D. M. Hillis, and R. Jones. 1996. Parametric bootstrapping in molecular phylogenetics: applications and performance. Pp. 19–45 in J. D. Ferraris and S. R. Palumbi, eds. Molecular zoology. Wiley and Sons, New York.
Jukes, T. H., and C. R. Cantor. 1969. Evolution of protein molecules. Pp. 21–132 in H. N. Munro, ed. Mammalian protein metabolism. Academic Press, New York.
Khorana, H. G. 1992. Rhodopsin, photoreceptor of the rod cell. J. Biol. Chem. 267:1–4.
Kimura, M. 1980. A simple method for estimating evolutionary rate of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol. 16:111–120.
Kishino, H., and M. Hasegawa. 1989. Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in Hominoidea. J. Mol. Evol. 29:170–179.
Lake, J. A. 1994. Reconstructing evolutionary trees from DNA and protein sequences: paralinear distances. Proc. Natl. Acad. Sci. USA 91:1455–1459.
Larhammar, D., and C. Risinger. 1994. Molecular genetic aspects of tetraploidy in the common carp, Cyprinus carpio. Mol. Phylogenet. Evol. 1:59–68.
Lee, M. S. Y. 1999. Molecular phylogenies become functional. Trends Ecol. Evol. 14:177–178.
Lockhart, P. J., M. A. Steel, M. D. Hendy, and D. Penny. 1994. Recovering evolutionary trees under a more realistic model of sequence evolution. Mol. Biol. Evol. 11:605–612.
Muse, S. V. 1996. Estimating synonymous and nonsynonymous substitution rates. Mol. Biol. Evol. 13:105–114.
Nathans, J. 1992. Rhodopsin: structure, function, and genetics. Biochemistry 31:4923–4931.
Naylor, G. J. P., and W. M. Brown. 1997. Structural biology and phylogenetic estimation. Nature 388:527–528.
Naylor, G. J. P., and W. M. Brown. 1998. Amphioxus mitochondrial DNA, chordate phylogeny, and the limits of inference based on comparisons of sequences. Syst. Biol. 47:61–76.
Poe, S. 1998. The effect of taxonomic sampling on accuracy of phylogeny estimation: test case of a known phylogeny. Mol. Biol. Evol. 15:1086–1090.
Prager, E. M., and A. C. Wilson. 1988. Ancient origin of lactalbumin from lysozyme: analysis of DNA and amino acid sequences. J. Mol. Evol. 27:326–335.
Rannala, B., J. P. Huelsenbeck, Z. Yang, and R. Nielsen. 1998. Taxon sampling and the accuracy of large phylogenies. Syst. Biol. 47:702–710.
Saccone, C., G. Pesole, and G. Preparata. 1989. DNA microenvironments and the molecular clock. J. Mol. Evol. 29:407–411.
Sakmar, T. P. 1998. Rhodopsin: a prototypical G protein-coupled receptor. Prog. Nucleic Acid Res. Mol. Biol. 59:1–34.
Shields, D. C., P. M. Sharp, D. G. Higgins, and F. Wright. 1988. "Silent" sites in Drosophila genes are not neutral: evidence of selection among synonymous codons. Mol. Biol. Evol. 5:704–716.
Sidow, A., and A. C. Wilson. 1990. Compositional statistics: an improvement of evolutionary parsimony and its deep branches in the tree of life. J. Mol. Evol. 31:51–68.
Sogin, M. L., G. Hinkle, and D. D. Leipe. 1993. Universal tree of life. Nature 362:795.
Steel, M. 1994. Recovering a tree from the Markov leaf colourations it generates under a Markov model. Appl. Math. Lett. 7:19–23.
Sullivan, J., and D. L. Swofford. 1997. Are guinea pigs rodents? The importance of adequate models in molecular phylogenetics. J. Mamm. Evol. 4:77–86.
Swofford, D. L. 1999. PAUP*, phylogenetic analysis using parsimony (*and other methods). Version 4.0. Sinauer, Sunderland, Mass.
Takezaki, N., and T. Gojobori. 1999. Correct and incorrect vertebrate phylogenies obtained by the entire mitochondrial DNA sequences. Mol. Biol. Evol. 16:590–601.
Templeton, A. R. 1983. Phylogenetic inference from restriction endonuclease cleavage site maps with particular reference to the humans and apes. Evolution 37:221–244.
Townson, S. M., B. S. W. Chang, E. Salcedo, L. Chadwell, N. E. Pierce, and S. G. Britt. 1998. Isolation and physiological characterization of the genes encoding the blue and ultraviolet sensitive opsins of the honeybee, Apis mellifera. J. Neurosci. 18:2412–2422.
Van Den Bussche, R. A., R. J. Baker, J. P. Huelsenbeck, and D. M. Hillis. 1998. Base compositional bias and phylogenetic analyses: a test of the "flying DNA" hypothesis. Mol. Phylogenet. Evol. 10:408–416.
Wright, F. 1990. The 'effective number of codons' used in a gene. Gene 87:23–29.
Yang, Z. 1993. Maximum-likelihood estimation of phylogeny from DNA sequences when substitution rates differ over sites. Mol. Biol. Evol. 10:1396–1401.
Yang, Z. 1994. Estimating the pattern of nucleotide substitution. J. Mol. Evol. 39:105–111.
Yang, Z. 1996. Phylogenetic analysis using parsimony and likelihood methods. J. Mol. Evol. 42:294–307.
Yang, Z. 1997. PAML: a program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci. 13:555–556.
Yang, Z., N. Goldman, and A. Friday. 1994. Comparison of models for nucleotide substitution used in maximum-likelihood phylogenetic estimation. Mol. Biol. Evol. 11:316–324.