Department of Ecology and Evolutionary Biology, University of California at Irvine
Misión Biológica de Galicia (CSIC), Pontevedra, Spain
Instituto de Investigaciones Agrobiológicas de Galicia (CSIC), Santiago de Compostela, Spain
Abstract |
---|
Introduction |
---|
The family Drosophilidae exhibits extensive nucleotide composition variation (Rodríguez-Trelles, Tarrío, and Ayala 1999, 2000a, 2000b; Tarrío, Rodríguez-Trelles, and Ayala 2000). Despite its relevance as a model for evolutionary studies, significant aspects of the phylogeny of this family remain unresolved. Two unsettled cases involve taxa with extremely low GC contents: (1) the position of the genus Chymomyza relative to the genera Scaptodrosophila and Drosophila, and (2) the monophyly of the Sophophora subgenus of the genus Drosophila. Morphological (Throckmorton 1975; Grimaldi 1990) and molecular (Kwiatowski et al. 1994; Powell and DeSalle 1995; Tatarenkov et al. 1999) surveys agree that Chymomyza and Scaptodrosophila are distantly related to the rest of drosophilids, but the question of which one derived earlier remains uncertain. Because they are well known and easily available, these two lineages are often used as outgroups to Drosophila (Powell 1997); therefore, knowing which of them originated first is important for correct assessment of the plesiomorphy in this genus. The monophyly of Sophophora is well established based on morphology (Throckmorton 1975) and on the evolution of structural features of several genes (Wojtas et al. 1992; Tatarenkov et al. 1999). However, determination of the monophyletic status of this subgenus from the substitution process of the sequences has proven elusive. Molecular studies characteristically achieve weak bootstrap support for the critical node (Kwiatowski et al. 1994; Russo, Takezaki, and Nei 1995; Remsen and DeSalle 1998; Kwiatowski and Ayala 1999; Tatarenkov et al. 1999), with some studies placing the willistoni species group (and its sister clade, the saltans group) outside the Drosophila genus (Pélandakis and Solignac 1993). The uncertainties remain despite an increasing number of nucleotide regions included in the analyses.
Current knowledge of the molecular systematics of the Drosophilidae is based on the strength of bootstrap support for nodes, but virtually no attention has been paid to the substitution models employed for the reconstruction of the trees (although the topic is discussed by Whitfield and Cameron [1998] and Steel, Huson, and Lockhart [2000] in connection with the evolution of mitochondrial rDNA genes in insects). The extensive nucleotide composition differences that occur among representatives of the family have been neglected in formulations of the substitution processes. The situation is aggravated because Ceratitis capitata, a member of the sister family Tephritidae, which is frequently used for rooting the tree of the Drosophilidae, exhibits a highly biased AT content. Additional potentially relevant parameters, such as the variation among nucleotide sites in their rates of substitution, have also been neglected.
In the present study, we address the systematics of the Drosophilidae with a focus on the substitution processes governing the evolution of the sequences. We adopted a maximum-likelihood (ML) framework of phylogenetic inference in order to investigate 4,650 nucleotide characters pertaining to five nuclear loci: alcohol dehydrogenase (Adh), dopa decarboxylase (Ddc), glycerophosphate dehydrogenase (Gpdh), superoxide dismutase (Sod), and xanthine dehydrogenase (Xdh). We demonstrate that accounting for the large nucleotide composition differences among sequences yields a phylogeny that differs significantly from the relationships obtained when the heterogeneous GC content is omitted from the substitution model. Yet the topologies obtained under both sets of assumptions received high statistical support in the bootstrap analyses. Our study (1) favors Chymomyza as the sister genus to Drosophila, with Scaptodrosophila derived earlier, and (2) confidently places the willistoni group within the Sophophora subgenus.
Materials and Methods |
---|
Sequences were aligned using the default option of CLUSTAL W, version 1.5 (Thompson, Higgins, and Gibson 1994). After the removal of gaps and incompletely determined columns, the alignment of the five gene coding regions spanned 4,650 nucleotide positions: 513 from Adh, 963 from Ddc, 747 from Gpdh, 342 from Sod, and 2,085 from Xdh. To our knowledge, this is the largest number of regions and nucleotide characters jointly employed to investigate the phylogeny of the Drosophilidae.
Statistical Analyses
In order to control possible errors due to imperfect knowledge of the phylogeny, we considered two working tree topologies for model fitting. The first topology (hereinafter referred to as the first working topology) was the strict consensus of the topologies that resulted after applying the computer programs DNADIST, DNAML, and DNAPARS from the PHYLIP package (Felsenstein 1993), using the default options with the five gene regions pooled together. This topology coincides with that shown in figure 3a. Drosophila mimica and Dorsilopha (busckii), not shown in the figure, are positioned according to the Adh + Sod + Xdh data as the sister clades to virilis-repleta and Zaprionus, respectively. The second topology represents the relationships proposed by Throckmorton (1975; see hypothesis 1 in table 4) on the basis of morphological data. In Throckmorton's (1975) scheme, D. mimica and Dorsilopha form a trichotomy together with Hirtodrosophila. These two topologies are substantially different; use of other reasonable tree topologies for model fitting is not expected to change the best-fit models identified in this study (see Yang 1994; Yang, Goldman, and Friday 1994).
We considered two sets of nested models. Models in one set were all special forms of the general time-reversible (GTR) Markov process model (Tavaré 1986; Yang 1994), which allows for unequal nucleotide frequencies at equilibrium ($\pi_A \neq \pi_C \neq \pi_G \neq \pi_T$), and six substitution classes (two transition and four transversion types). The GTR model assumes that (1) the substitution pattern has remained constant over the tree (i.e., the uniformity premise), and (2) all lineages exhibit the same nucleotide composition (i.e., the stationarity premise). Models in the second set are nested versions of the model of Galtier and Gouy (1998) (hereinafter denoted T92+GC). This model is based on Tamura's (1992) (T92) representation of the substitution process, which allows unequal transition and transversion rates, and $\mathrm{GC} \neq \mathrm{AT}$ (with G = C and A = T) at equilibrium. Galtier and Gouy's (1998) implementation of the T92 model allows the nucleotide composition to change from branch to branch by assigning a different equilibrium GC content parameter to each branch. The model is neither homogeneous nor stationary, since equilibrium GC content can vary among lineages. Because the model lacks reversibility, trees are rooted.
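As an illustration of the T92 parameterization described above, the following minimal sketch builds an unscaled rate matrix from an equilibrium GC content and a transition/transversion rate ratio. The function name `t92_rate_matrix`, the parameter names `gc` and `kappa`, and the numerical values are assumptions made for this example; it is not the NHML implementation. In a nonhomogeneous setting, one such matrix would be built per branch, each with its own `gc`.

```python
def t92_rate_matrix(gc, kappa):
    """Return (Q, freqs) for a T92-style model, nucleotide order A, C, G, T.

    gc    -- equilibrium G+C content (hypothetical parameter name)
    kappa -- transition/transversion rate ratio (hypothetical parameter name)
    Q is left unscaled (not normalized to one substitution per unit time).
    """
    order = "ACGT"
    # T92 equilibrium frequencies: G = C = gc/2 and A = T = (1 - gc)/2.
    freqs = {"A": (1 - gc) / 2, "C": gc / 2, "G": gc / 2, "T": (1 - gc) / 2}
    transitions = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}
    q = []
    for i in order:
        row = []
        for j in order:
            if i == j:
                row.append(0.0)  # placeholder, filled in below
            else:
                weight = kappa if (i, j) in transitions else 1.0
                row.append(weight * freqs[j])
        # Diagonal entry makes each row sum to zero.
        row[order.index(i)] = -sum(row)
        q.append(row)
    return q, [freqs[b] for b in order]


# Two branches with different equilibrium GC contents (illustrative values).
q_gc_poor, pi_gc_poor = t92_rate_matrix(gc=0.40, kappa=2.0)
q_gc_rich, pi_gc_rich = t92_rate_matrix(gc=0.65, kappa=2.0)
```

Within a single branch the matrix is reversible with respect to its own frequencies; the nonstationarity arises only because `gc` is allowed to differ from branch to branch.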
Among-sites rate variation was accommodated into the models by treating rate differences among sites as a random effect using the discrete gamma distribution (eight equal-probability categories of rates, represented by the mean) with shape parameter $\alpha$ (denoted as dG models). The value of $\alpha$ is inversely related to the extent of rate variation (Yang 1996). Analyses were conducted with the BASEML program of PAML, version 2.0g (Yang 1999), and the EVAL_NH and EVAL_NHG programs from the NHML package (Galtier and Gouy 1998; Galtier, Tourasse, and Gouy 1999).
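The discrete gamma approximation mentioned above can be sketched as follows: a mean-one gamma distribution is cut into equal-probability categories, and each category is represented by its mean rate. This is a standard-library-only sketch; the helper names and the series/bisection numerics are assumptions chosen for self-containment, not the routines used in PAML.

```python
import math


def reg_lower_gamma(a, x):
    """Regularized lower incomplete gamma function P(a, x), via its power series."""
    if x <= 0:
        return 0.0
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > 1e-16 * total and n < 100000:
        n += 1
        term *= x / (a + n)
        total += term
    return min(1.0, total * math.exp(a * math.log(x) - x - math.lgamma(a)))


def gamma_quantile(p, shape):
    """Quantile of a gamma distribution with mean 1 (rate = shape), by bisection."""
    lo, hi = 0.0, 1.0
    while reg_lower_gamma(shape, hi * shape) < p:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if reg_lower_gamma(shape, mid * shape) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def discrete_gamma_rates(shape, k=8):
    """Mean rate of each of k equal-probability categories of a mean-one gamma."""
    cuts = [0.0] + [gamma_quantile(i / k, shape) for i in range(1, k)]
    # The partial mean up to a cut point c equals P(shape + 1, shape * c).
    partial = [reg_lower_gamma(shape + 1.0, c * shape) for c in cuts] + [1.0]
    return [k * (partial[i + 1] - partial[i]) for i in range(k)]


# Illustrative shape value: a small alpha gives strongly unequal rates.
rates = discrete_gamma_rates(shape=0.5, k=8)
```

Because the category means are weighted equally, they average to one, so the overall branch-length scale is unchanged; lowering `shape` spreads the rates further apart, which is the inverse relationship noted in the text.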
The relevance of specific parameters for describing the evolution of the sequences was evaluated by means of the likelihood ratio test (Yang 1994; Huelsenbeck and Crandall 1997). For a given tree topology (e.g., fig. 3a), a model (H1) with p parameters and maximized likelihood $L_1$ fits the data significantly better than a nested submodel (H0) with q = p - n parameters (i.e., n restrictions) and maximized likelihood $L_0$ if the deviance $2\Delta\ell = 2\ln(L_1/L_0) = -2(\ln L_0 - \ln L_1)$ falls in the rejection region of a $\chi^2_n$ distribution (where n represents degrees of freedom). Specifically for the test of rate constancy among sites, where the H0 ($\alpha = \infty$) is equivalent to fixing $\alpha$ at the boundary of the parameter space of the H1 ($\alpha < \infty$), $2\Delta\ell$ follows a 50:50 mixture of $\chi^2_{n-1}$ and $\chi^2_n$ distributions (Whelan and Goldman 1999). For this test, we used the critical values for the rejection of the H0 provided by Goldman and Whelan (2000).
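The arithmetic of the test can be sketched for the simplest case, n = 1, in which the boundary test reduces to a 50:50 mixture of a point mass at zero ($\chi^2_0$) and a $\chi^2_1$ distribution. The function names and the log-likelihood values below are hypothetical and are not results from this study.

```python
import math


def lrt_statistic(lnl_alt, lnl_null):
    """Deviance 2*(ln L1 - ln L0) for nested models."""
    return 2.0 * (lnl_alt - lnl_null)


def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square distribution with 1 d.f."""
    return math.erfc(math.sqrt(x / 2.0)) if x > 0 else 1.0


def boundary_lrt_pvalue(stat):
    """P value under a 50:50 mixture of chi2_0 (point mass at 0) and chi2_1."""
    if stat <= 0:
        return 1.0
    return 0.5 * chi2_sf_df1(stat)


# Hypothetical log likelihoods: H1 with gamma rates, H0 with a single rate.
stat = lrt_statistic(lnl_alt=-24950.0, lnl_null=-25000.0)
p_standard = chi2_sf_df1(stat)          # ordinary chi-square, 1 d.f.
p_boundary = boundary_lrt_pvalue(stat)  # mixture used at the boundary
```

The mixture halves the P value relative to the ordinary $\chi^2_1$ test, so the 5% critical value drops from about 3.84 to about 2.71.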
Varying the parameter addition sequence can affect best-fit model selection (Cunningham, Zhu, and Hillis 1998). We took this potential source of bias into account by assaying different parameter addition sequences. The best-fit models identified remained the same (results not shown).
The model found to satisfactorily describe the substitution process was used for generating candidate tree topologies by the distance-based neighbor-joining (NJ) criterion. Estimates of the shape parameter used in distance computation were those obtained simultaneously by the joint likelihood comparison of all sequences in the first stage, which can be considered the most reliable (Yang 1996). NJ trees were generated using the best-fit model identified by the likelihood ratio test in the ML analysis. Statistical support for nodes of the NJ trees was assessed with the bootstrap method (retaining nodes representing >50% of 1,000 bootstrap replications; Felsenstein 1985). Galtier and Gouy's (1995) gamma distances and the NJ trees built from them were obtained with the GGG95 and SK programs, kindly provided by Dr. Nicolas Tourasse.
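The bootstrap step can be sketched as follows: alignment columns are resampled with replacement, a tree is rebuilt from each pseudoreplicate, and the support for a node is the fraction of replicate trees containing it. The tree-building step is omitted here; representing each replicate tree as a set of clades is an assumption of this sketch, and the function names and toy data are hypothetical.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement (same columns for all taxa)."""
    n_sites = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


def node_support(replicate_trees, clade):
    """Fraction of replicate trees (each a set of frozenset clades) with the clade."""
    hits = sum(1 for clades in replicate_trees if clade in clades)
    return hits / len(replicate_trees)


# Toy alignment with made-up sequences.
toy = {"taxon1": "ACGTACGTAC", "taxon2": "ACGTTCGTAA", "taxon3": "ACCTTCGAAA"}
replicate = bootstrap_alignment(toy, random.Random(0))

# Toy replicate trees, each summarized by the clades it contains.
trees = [
    {frozenset({"taxon1", "taxon2"})},
    {frozenset({"taxon1", "taxon2"})},
    {frozenset({"taxon2", "taxon3"})},
]
support = node_support(trees, frozenset({"taxon1", "taxon2"}))
```

With 1,000 replicates, as in the text, nodes would be retained when this fraction exceeds 0.5.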
Phylogenetic hypotheses derived from the analyses were compared by the resampling estimated log likelihood (RELL) method of Kishino, Miyata, and Hasegawa (1990) (as implemented in PAML 2.0g; Yang 1999). For a given model of evolution, this test provides an estimate of the significance of a difference between the log likelihood scores of several candidate tree topologies.
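The idea behind RELL resampling can be sketched as follows: the per-site log likelihoods computed once for each candidate topology are resampled with replacement, and one records how often each topology attains the highest resampled total. The function name, the input layout, and the toy values are assumptions of this sketch; it illustrates the resampling principle and is not the PAML routine.

```python
import random


def rell_proportions(site_lnl, n_reps, rng):
    """RELL bootstrap proportions.

    site_lnl -- dict mapping a topology label to its list of per-site
                log likelihoods (all lists of equal length)
    """
    names = list(site_lnl)
    n_sites = len(site_lnl[names[0]])
    wins = {name: 0 for name in names}
    for _ in range(n_reps):
        # One pseudoreplicate: the same resampled sites for every topology.
        sample = [rng.randrange(n_sites) for _ in range(n_sites)]
        totals = {name: sum(site_lnl[name][s] for s in sample) for name in names}
        best = max(names, key=lambda nm: totals[nm])
        wins[best] += 1
    return {name: wins[name] / n_reps for name in names}


# Toy per-site log likelihoods for two hypothetical topologies.
toy_lnl = {
    "tree_A": [-1.2, -0.8, -2.1, -1.5, -0.9, -1.1],
    "tree_B": [-1.3, -0.7, -2.4, -1.4, -1.0, -1.3],
}
proportions = rell_proportions(toy_lnl, n_reps=1000, rng=random.Random(0))
```

No likelihood is re-optimized on the pseudoreplicates, which is what makes the approach cheap compared with a full bootstrap.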
Results |
---|
We have shown that base composition in the Drosophilidae is nonstationary. Now we are interested in the pattern of compositional differences across taxa, because it can help to identify potential biases in reconstructed topologies elicited by the heterogeneous composition of the sequences. Figure 1 depicts the relationships inferred from the nucleotide composition of the Adh, Ddc, Gpdh, Sod, and Xdh sequences pooled together, taking into account the three codon positions. Similar cladograms were obtained for the five gene regions analyzed separately (results not shown). Because of its low GC content, Drosophila willistoni is repelled from its subgenus (i.e., Sophophora; herein represented by the GC-rich species Drosophila melanogaster and Drosophila pseudoobscura) and becomes associated with the cluster of GC-poor taxa Ceratitis, Chymomyza, and Hirtodrosophila. Scaptodrosophila, currently viewed as representing a different genus (Grimaldi 1990; Kwiatowski et al. 1994; Tatarenkov et al. 1999), clusters with species (including the Drosophila subgenus) that exhibit intermediate GC contents. The GC contents of D. busckii and D. mimica, not shown in the figure, are also intermediate (50.3% and 51.0%, and 61.8% and 61.9%, in first-plus-second and in third codon positions, respectively, for Ddc, Sod, and Xdh combined). The relationships in figure 1 are strongly supported statistically (bootstrap values above the nodes are all at or near 100), reflecting the extensive GC content differences among taxa. The topology remains the same after excluding third codon positions, but the bootstrap support (values below the nodes) decreases, most likely because fewer sites showing biased multiple substitution are included in the analysis.
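GC contents partitioned by codon position, as reported above, can be computed along the following lines. The sketch assumes an in-frame coding sequence that starts at the first codon position; the function name and the example sequence are hypothetical.

```python
def gc_by_codon_position(seq):
    """Return (GC of first+second codon positions, GC of third codon positions)."""
    seq = seq.upper()

    def gc_fraction(bases):
        # Ignore gaps and ambiguity codes; count only A, C, G, T.
        counted = [b for b in bases if b in "ACGT"]
        if not counted:
            return float("nan")
        return sum(b in "GC" for b in counted) / len(counted)

    first_second = [seq[i] for i in range(len(seq)) if i % 3 != 2]
    third = seq[2::3]
    return gc_fraction(first_second), gc_fraction(third)


# Made-up in-frame fragment: codons ATG, GCC, AAT.
gc12, gc3 = gc_by_codon_position("ATGGCCAAT")
```

Third codon positions are usually less constrained, so their GC content tends to track compositional bias more closely than first and second positions.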
Figure 2 shows the NJ trees derived from the GTR+dG and T92+dG+GC distance matrices using the Adh + Sod + Xdh data set (for which all species are available), and this data set combined separately with Ddc (D. busckii unavailable) and Gpdh (D. mimica unavailable). Combining the data sets results in increased resolution of the phylogeny and reveals conflicts between the GTR+dG and T92+dG+GC models in the resulting branching pattern of the topologies. The GTR+dG model always places Scaptodrosophila as more closely related to Drosophila than Chymomyza, and it places D. willistoni outside all other species of the Drosophila genus. In contrast, the T92+dG+GC model identifies Scaptodrosophila as the first derived lineage after Ceratitis (followed by Chymomyza) and places D. willistoni within the subgenus Sophophora. These two alternative branching patterns receive strong bootstrap support from their respective models. With regard to the remaining relationships, the two models are congruent across data sets in the well-resolved nodes. Both models support D. mimica as the sister lineage to the clade consisting of D. hydei + D. virilis and the association of D. melanogaster with D. pseudoobscura. In addition, the Adh + Ddc + Sod + Xdh data set supports inclusion within the subgenus Drosophila of Zaprionus, which derives first, followed successively by Hirtodrosophila, D. mimica, and the clade consisting of D. hydei + D. virilis.
Discussion |
---|
Several ML approaches have been devised to deal with the problem of varying compositional biases between lineages. Galtier and Gouy's (1998) implementation of the Tamura (1992) model is faster than other approximations (e.g., Yang and Roberts 1995) as a tool for describing the substitution process (Galtier and Gouy 1998). The method has proven useful for the study of GC content evolution in mammals (Galtier and Mouchiroud 1998), as well as Drosophila (Rodríguez-Trelles, Tarrío, and Ayala 2000c), and also for inferring nucleotide composition of ribosomal RNA in the "cenancestor" (i.e., the most recent common ancestor of all extant life forms) (Galtier, Tourasse, and Gouy 1999). However, the algorithm is computationally too time-demanding for tree reconstruction from data sets as large as ours (see Galtier and Gouy 1998). We circumvented this drawback by using the distance-based NJ implementation of the T92+dG+GC model (Galtier and Gouy 1995; Galtier, Tourasse, and Gouy 1999) to infer the tree. This method outperforms ML and distance-based tree-making methods that assume homogeneous and stationary conditions, as well as maximum-parsimony methods in cases of heterogeneous base composition (Galtier and Gouy 1995).
Probably the most popular distance correction for coping with the problem of heterogeneous base composition is the LogDet transformation (Lockhart et al. 1994). Compared with Galtier and Gouy's (1995) distance model, LogDet has the disadvantage that it generally does not yield the amount of change along branches and that it assumes that substitution rates are equal across sites (Lockhart et al. 1994). Unlike Galtier and Gouy's (1995) distance measure, LogDet distances cannot be directly modified to take account of a specific distribution of rates, such as the gamma distribution (see Swofford et al. 1996). Inclusion of invariant sites in the LogDet calculation tends to underestimate the amount of change, and sites that vary greatly are problematic because of saturation (Lockhart et al. 1994). It has been shown to be useful to exclude both these extremes by using only parsimony-informative sites (Lockhart et al. 1994). Substitution rate varies widely from site to site in our data set (see table 3). Therefore, we calculated LogDet distances using only parsimony-informative sites, considering first-plus-second (420 sites) or third (1,233 sites) codon positions. When parsimony sites from first-plus-second codon positions were used, 2 out of the 40 pairwise comparisons (i.e., D. pseudoobscura, and Hirtodrosophila vs. C. capitata) had negative determinants, for which the logarithm (and thus the distance) is undefined. In other words, there is such a large divergence between these two pairs of taxa that their sequences are effectively random with respect to each other (see Foster and Hickey 1999). In order to build the NJ tree from the distance matrix, the program PAUP*, version 4.0 (Swofford 1999), arbitrarily sets the values of these undefined distances at twice the largest defined distance in the distance matrix (i.e., 2 × 2.5332, the distance between D. hydei and C. capitata). To guard against the effects of this arbitrary choice on the topology, we additionally tried factors of 1.1× and 5×. When undefined distances were set to 1.1 times the largest defined distance in the matrix, the resulting NJ topology was identical to the T92+dG+GC topology except that it placed Chymomyza closer than Scaptodrosophila to C. capitata (likewise the GTR+dG model; see fig. 3a; note that Chymomyza is still compositionally more biased than D. willistoni toward C. capitata; see fig. 1). When the factor was set to 2× (i.e., PAUP* choice), Chymomyza remained closer than Scaptodrosophila to C. capitata, and D. willistoni appeared displaced to an external position to the Drosophila genus (likewise the GTR+dG model; see fig. 3a); when the factor was set to 5×, the resulting NJ topology exhibited disparate relationships. Similar analyses conducted using the parsimony sites of third codon positions (13 out of the 40 pairwise comparisons yielded undefined distances) also produced inconsistent configurations. Therefore, it seems that by limiting the analysis to parsimony sites from first-plus-second codon positions and arbitrarily adjusting undefined distances in the LogDet transformation, it is possible to cope with some (i.e., the compositional bias of D. willistoni), but not all (i.e., the even larger compositional bias of Chymomyza), of the nucleotide composition variation present in our data set. In this respect, our study corroborates the results of other authors who point out that the LogDet correction can fail when there are large nucleotide composition differences among sequences (Foster and Hickey 1999).
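Why a LogDet distance can be undefined, and how the arbitrary substitution described above works, can be sketched as follows. The sketch uses a commonly cited normalized form of the distance, $d = -\tfrac{1}{4}[\ln\det F - \tfrac{1}{2}\ln(\prod_i f_{i\cdot}\prod_j f_{\cdot j})]$, where F is the 4 × 4 matrix of joint nucleotide frequencies; that choice of formula, the function names, and the toy sequences are assumptions of this example, not the PAUP* implementation.

```python
import math


def divergence_matrix(seq_x, seq_y):
    """4x4 matrix of joint nucleotide frequencies for two aligned sequences."""
    order = "ACGT"
    counts = [[0.0] * 4 for _ in range(4)]
    n = 0
    for a, b in zip(seq_x.upper(), seq_y.upper()):
        if a in order and b in order:
            counts[order.index(a)][order.index(b)] += 1
            n += 1
    if n == 0:
        raise ValueError("no comparable sites")
    return [[c / n for c in row] for row in counts]


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    size = len(m)
    det = 1.0
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-15:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size):
                m[r][c] -= factor * m[col][c]
    return det


def logdet_distance(seq_x, seq_y):
    """Normalized LogDet distance, or None when det(F) <= 0 (undefined log)."""
    f = divergence_matrix(seq_x, seq_y)
    det = determinant(f)
    if det <= 0:
        return None
    row_sums = [sum(row) for row in f]
    col_sums = [sum(f[i][j] for i in range(4)) for j in range(4)]
    log_marginals = sum(math.log(v) for v in row_sums + col_sums)
    return -0.25 * (math.log(det) - 0.5 * log_marginals)


def fill_undefined(distances, factor):
    """Replace undefined distances by factor times the largest defined distance."""
    largest = max(d for d in distances if d is not None)
    return [factor * largest if d is None else d for d in distances]


# Toy values: one undefined distance replaced with factor 2.
filled = fill_undefined([1.0, None, 2.5], factor=2.0)
```

Rerunning the tree-building step with several values of `factor`, as done in the text with 1.1×, 2×, and 5×, is a simple way to check how sensitive the topology is to this arbitrary choice.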
Failure to account for nucleotide substitution differences among sites when they exist can dramatically affect phylogenetic inferences (Yang 1996). Phylogenetic studies of the Drosophilidae have faced this problem by arbitrarily dropping fast-changing third codon positions from the analysis, thus dismissing any phylogenetic signal they may contain (e.g., Kwiatowski et al. 1994; Tatarenkov et al. 1999). Paradoxically, because first and second codon positions are usually under stronger functional constraints and can greatly vary along the sequences, they generally exhibit more extensive among-sites rate variation than when they are analyzed in conjunction with third codon positions. Here we show that among-sites rate variation is a significant feature of the data (see table 3). The ML methods that we used to account for it made use of the full length of the sequences, such that sites were given a phylogenetic weight inversely related to their rate of change in an objective manner (see Yang 1996).
Our study shows that two different representations of the substitution process generate two different tree topologies, each attaining high nodal bootstrap support. Our study illustrates a well-known property of bootstrapping: high nodal bootstrap support indicates that the optimal tree would be unlikely to change as sequence length increases, but it gives absolutely no indication as to whether the results are converging to the right tree (see Swofford et al. 1996). Previous discussions about bootstrap support values for nodes of trees of the Drosophilidae from unrealistic models of substitution should therefore be taken cautiously. Similarly, caution should be exercised in adopting topological congruency among phylogenetic algorithms as a criterion to use in choosing among candidate trees: the GTR+dG and the LogDet (to an extent that can depend on arbitrary choices) distance methods both agree in supporting a wrong topology.
The Kishino, Miyata, and Hasegawa (1990) RELL test is a popular means to test competing evolutionary hypotheses in an ML framework. Strictly speaking, the RELL test is only valid for comparison of tree topologies that have been specified a priori (Kishino and Hasegawa 1989; Kishino, Miyata, and Hasegawa 1990; Swofford et al. 1996). Several authors have warned about the risks of including one or more a posteriori specified trees in the comparison, specifically the ML tree resulting from the data used to conduct the test (Goldman, Anderson, and Rodrigo 2000). Our application of the Kishino, Miyata, and Hasegawa (1990) RELL test is correct because the phylogenetic hypotheses generated by our analyses (see fig. 3a and b) were obtained using distance-based methods (i.e., we cannot assume that they are ML trees), while the other competing hypotheses were derived from other sources.
The monophyly of the Sophophora subgenus has been determined from anatomical and biogeographical evidence (Throckmorton 1975) and is in agreement with the evolution of structural properties of several coding regions: the absence of an intron in the Gpdh gene (Wojtas et al. 1992) and the deletion of three coding nucleotides in the Ddc gene (Tatarenkov et al. 1999) are features specific to the four major species groups of Sophophora (i.e., melanogaster, obscura, saltans, and willistoni). So far, however, attempts at confirming this positioning by tree-making methods based on conventional descriptions of the nucleotide substitution process have tended to place the saltans and willistoni groups outside the genus Drosophila (see fig. 3a). Our results strongly suggest that the GC-poor D. willistoni sequence is artifactually attracted by the relatively GC-poor C. capitata outgroup sequence when the heterogeneous base composition is not accounted for by the substitution model (see figs. 1 and 3). A similar effect impacts the GC-poor Chymomyza sequence. Its position as a closer outgroup to Drosophila than the Scaptodrosophila genus obtained in our study is consistent with the hypothesis of Throckmorton (1975) based on the evolution of morphological characters. Our results corroborate on a more solid basis previous conclusions about the branching order of Zaprionus and Hirtodrosophila and their position closer to the subgenus Drosophila than the Sophophora subgenus.
The fact that the phylogenetic hypothesis produced by our study is based on a more realistic approach than previous assessments by no means guarantees that we have arrived at the correct tree. Dealing with different causes of tree-building inconsistency at the same time can be problematic (see Whitfield and Cameron 1998; Steel, Huson, and Lockhart 2000). It has been shown that in situations like the one tackled in our study, in which the substitution process is nonstationary and substitution rates are unequal across sites, additive pairwise distance methods lose the ability to recognize the parametric topology (see Baake 1998). Despite these caveats, the results from the T92+dG+GC distance may be preferred, on the one hand, because no other pairwise distance measure exists, apart from the LogDet transformation, that could be applied to our data on a better-grounded theoretical basis. Moreover, it recovers a topology which is fully congruent with the topology achieved by explicit ML methods through the joint comparison of all sequences, although in this regard, it can be argued that we did not perform an exhaustive search (we just limited the ML analysis to a few hypotheses of interest; see table 4) or that the assumed gamma distribution does not appropriately accommodate the true among-sites rate variation present in the sequences (which, given the observed absence of stationarity, would lead ML to the problem of loss of identifiability, mentioned above for distances; see Steel, Székely, and Hendy 1994; Baake 1998). However, there is now better agreement between different classes of data, including morphological and molecular evidence.
Acknowledgements |
---|
Footnotes |
---|
1 Keywords: heterogeneous GC content, unequal base composition, maximum-likelihood phylogeny, nonhomogeneous phylogeny models, molecular phylogeny, Drosophilidae
2 Address for correspondence and reprints: Francisco Rodríguez-Trelles Francisco J. Ayala, Department of Ecology and Evolutionary Biology, 321 Steinhaus Hall, University of California, Irvine, California 92697-2525. ftrelles{at}iiag.cesga.es.
References |
---|
Baake H., 1998. What can and what cannot be inferred from pairwise sequence comparisons? Math. Biosci. 154:1-21.
Cunningham C. W., H. Zhu, D. M. Hillis, 1998. Best-fit maximum-likelihood models for phylogenetic inference: empirical tests with known phylogenies. Evolution 52:978-987.
DeSalle R., 1992. The phylogenetic relationships of the flies in the family Drosophilidae deduced from mtDNA sequences. Mol. Phylogenet. Evol. 1:31-40.
Felsenstein J., 1985. Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39:783-791.
Felsenstein J., 1988. Phylogenies from molecular sequences: inference and reliability. Annu. Rev. Genet. 22:521-565.
Felsenstein J., 1993. PHYLIP (phylogeny inference package). Version 3.5c. Distributed by the author, Department of Genetics, University of Washington, Seattle.
Foster P. G., D. A. Hickey, 1999. Compositional bias may affect both DNA-based and protein-based phylogenetic reconstructions. J. Mol. Evol. 48:284-290.
Galtier N., M. Gouy, 1995. Inferring phylogenies from DNA sequences of unequal base compositions. Proc. Natl. Acad. Sci. USA 92:11317-11321.
Galtier N., M. Gouy, 1998. Inferring the pattern and process: maximum-likelihood implementation of a nonhomogeneous model of DNA sequence evolution for phylogenetic analysis. Mol. Biol. Evol. 15:871-879.
Galtier N., D. Mouchiroud, 1998. Isochore evolution in mammals: a human-like ancestral structure. Genetics 150:1577-1584.
Galtier N., N. Tourasse, M. Gouy, 1999. A nonhyperthermophilic common ancestor to extant life forms. Science 283:220-221.
Goldman N., J. P. Anderson, A. G. Rodrigo, 2000. Likelihood-based tests of topologies in phylogenetics. Syst. Biol. 49:652-670.
Goldman N., S. Whelan, 2000. Statistical tests of gamma distributed rate heterogeneity in models of sequence evolution in phylogenetics. Mol. Biol. Evol. 17:975-978.
Grimaldi D. A., 1990. A phylogenetic revised classification of genera in the Drosophilidae (Diptera). Bull. Am. Mus. Nat. Hist. 197:1-139.
Hillis D. M., J. J. Bull, 1993. An empirical test of bootstrapping as a method for assessing confidence in phylogenetic analysis. Syst. Biol. 42:182-192.
Huelsenbeck J. P., 1995. The performance of phylogenetic methods in simulation. Syst. Biol. 44:17-48.
Huelsenbeck J. P., K. A. Crandall, 1997. Phylogeny estimation and hypothesis testing using maximum likelihood. Annu. Rev. Ecol. Syst. 28:437-466.
Jukes T. H., C. R. Cantor, 1969. Evolution of protein molecules. Pp. 21-132 in H. N. Munro, ed. Mammalian protein metabolism. Academic Press, New York.
Kimura M., 1980. A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol. 16:111-120.
Kishino H., M. Hasegawa, 1989. Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from the DNA sequence data, and the branching order in Hominoidea. J. Mol. Evol. 29:170-179.
Kishino H., T. Miyata, M. Hasegawa, 1990. Maximum likelihood inference of protein phylogeny and the origin of chloroplasts. J. Mol. Evol. 31:151-160.
Kwiatowski J., F. J. Ayala, 1999. Phylogeny of Drosophila and related genera: conflict between molecular and anatomical analyses. Mol. Phylogenet. Evol. 13:319-328.
Kwiatowski J., D. Skarecky, K. Bailey, F. J. Ayala, 1994. Phylogeny of Drosophila and related genera inferred from the nucleotide sequence of the Cu,Zn Sod gene. J. Mol. Evol. 38:443-454.
Lockhart P. J., M. A. Steel, M. D. Hendy, D. Penny, 1994. Recovering evolutionary trees under a more realistic model of sequence evolution. Mol. Biol. Evol. 11:605-612.
Pélandakis M., M. Solignac, 1993. Molecular phylogeny of Drosophila based on ribosomal RNA sequences. J. Mol. Evol. 37:525-543.
Powell J. R., 1997. Progress and prospects in evolutionary biology: the Drosophila model. Oxford University Press, New York.
Powell J. R., R. DeSalle, 1995. Drosophila molecular phylogenies and their uses. Evol. Biol. 28:87-138.
Remsen J., R. DeSalle, 1998. Character congruence of multiple data partitions and the origin of the Hawaiian Drosophilidae. Mol. Phylogenet. Evol. 9:225-235.
Rodríguez-Trelles F., R. Tarrío, F. J. Ayala, 1999. Switch in codon bias and increased rates of amino acid substitution in the Drosophila saltans species group. Genetics 153:339-350.
Rodríguez-Trelles F., R. Tarrío, F. J. Ayala, 2000a. Fluctuating mutation bias and the evolution of the base composition in Drosophila. J. Mol. Evol. 50:1-10.
Rodríguez-Trelles F., R. Tarrío, F. J. Ayala, 2000b. Disparate evolution of paralogous introns in the Xdh gene of Drosophila. J. Mol. Evol. 50:123-130.
Rodríguez-Trelles F., R. Tarrío, F. J. Ayala, 2000c. Evidence for a high ancestral GC content in Drosophila. Mol. Biol. Evol. 17:1710-1717.
Russo C. A., N. Takezaki, M. Nei, 1995. Molecular phylogeny and divergence times of Drosophilid species. Mol. Biol. Evol. 12:391-404.
Rzhetsky A., M. Nei, 1995. Tests of the applicability of several substitution models for DNA sequence data. Mol. Biol. Evol. 12:131-151.
Steel M. A., D. Huson, P. J. Lockhart, 2000. Invariable sites models and their use in phylogenetic reconstruction. Syst. Biol. 49:225-232.
Steel M. A., P. J. Lockhart, D. Penny, 1993. Confidence in evolutionary trees from biological sequence data. Nature 364:440-442.
Steel M. A., L. A. Székely, M. D. Hendy, 1994. Reconstructing trees when sequence sites evolve at variable rates. J. Comput. Biol. 1:153-163.
Swofford D. L., 1999. PAUP*: phylogenetic analysis using parsimony (*and other methods). Version 4.0b2. Sinauer, Sunderland, Mass.
Swofford D. L., G. J. Olsen, P. J. Waddell, D. M. Hillis, 1996. Phylogenetic inference. Pp. 407-514 in D. M. Hillis, C. Moritz, and B. K. Mable, eds. Molecular systematics. Sinauer, Sunderland, Mass.
Tamura K., 1992. Estimation of the number of nucleotide substitutions when there are strong transition-transversion and G+C content biases. Mol. Biol. Evol. 9:678-687.
Tarrío R., F. Rodríguez-Trelles, F. J. Ayala, 1998. New Drosophila introns originate by duplication. Proc. Natl. Acad. Sci. USA 95:1658-1662.
Tarrío R., F. Rodríguez-Trelles, F. J. Ayala, 2000. Tree rooting with outgroups when they differ in their nucleotide composition from the ingroup: the Drosophila saltans and willistoni groups, a case study. Mol. Phylogenet. Evol. 16:344-349.
Tatarenkov A., J. Kwiatowski, D. Skarecky, E. Barrio, F. J. Ayala, 1999. On the evolution of Dopa decarboxylase (Ddc) and Drosophila systematics. J. Mol. Evol. 48:445-462.
Tavaré S., 1986. Some probabilistic and statistical problems on the analysis of DNA sequences. Pp. 57-86 in R. M. Miura, ed. Some mathematical questions in biology: DNA sequence analysis. Lec. Math. Life Sci. Vol. 17, Providence, R.I.
Thompson J. D., D. G. Higgins, T. J. Gibson, 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, positions-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673-4680.
Throckmorton L. H., 1975. The phylogeny, ecology and geography of Drosophila. Pp. 421-436 in R. C. King, ed. Handbook of genetics. Vol. 3. Plenum Press, New York.
Tourasse N. J., W. H. Li, 1999. Performance of the relative-rate test under nonstationary models of nucleotide substitution. Mol. Biol. Evol. 16:1068-1078.
Wheeler M. R., 1981. The Drosophilidae: a taxonomic overview. Pp. 1-97 in M. Ashburner, H. L. Carson, and J. N. Thompson Jr., eds. The genetics and biology of Drosophila. Academic Press, New York.
Whelan S., N. Goldman, 1999. Distributions of statistics used for the comparison of models of sequence evolution in phylogenetics. Mol. Biol. Evol. 16:1292-1299.
Whitfield J. B., S. Cameron, 1998. Hierarchical analysis of variation in the mitochondrial 16S rRNA gene among Hymenoptera. Mol. Biol. Evol. 15:1728-1743.
Wojtas K. M., L. von Kalm, J. R. Weaver, D. T. Sullivan, 1992. The evolution of duplicate glyceraldehyde-3-phosphate dehydrogenase genes in Drosophila. Genetics 132:789-797.
Yang Z., 1994. Estimating the pattern of nucleotide substitution. J. Mol. Evol. 39:105-111.
Yang Z., 1996. The among-site rate variation and its impact on phylogenetic analyses. TREE 11:367-372.
Yang Z., 1999. PAML: phylogenetic analysis by maximum likelihood. Version 2.0g. Distributed by the author, Department of Biology, Galton Laboratory, University College London.
Yang Z., N. Goldman, N. E. Friday, 1994. Comparison of models for nucleotide substitution used in maximum likelihood phylogenetic estimation. Mol. Biol. Evol. 11:316-324.
Yang Z., D. Roberts, 1995. On the use of nucleic acid sequences to infer branchings in the tree of life. Mol. Biol. Evol. 12:451-458.