Was the ANITA Rooting of the Angiosperm Phylogeny Affected by Long-Branch Attraction?

Yin-Long Qiu, Jungho Lee, Barbara A. Whitlock, Fabiana Bernasconi-Quadroni and Olena Dombrovska

Department of Biology, University of Massachusetts at Amherst;
Institute of Systematic Botany, University of Zurich, Zurich, Switzerland


    Abstract
Five groups of basal angiosperms, Amborella, Nymphaeales, Illiciales, Trimeniaceae, and Austrobaileya (ANITA), were identified in several recent studies as representing a series of the earliest-diverging lineages of the angiosperm phylogeny. All of these studies except one employed a multigene analysis approach and used gymnosperms as the outgroup to determine the ingroup topology. The high level of divergence between gymnosperms and angiosperms, however, has long been implicated in the difficulty of reconstructing relationships at the base of angiosperm phylogeny using DNA sequences, for fear of long-branch attraction (LBA). In this study, we replaced the gymnosperm sequences from the five-gene matrix (mitochondrial atp1 and matR, plastid atpB and rbcL, and nuclear 18S rDNA) used in our earlier study with four categories of divergent sequences—random sequences with equal base frequencies or equally AT- and GC-rich contents, homopolymers and heteropolymers, misaligned gymnosperm sequences, and aligned lycopod and bryophyte sequences—to evaluate whether the gymnosperms were an appropriate outgroup to angiosperms in our earlier study that identified the ANITA rooting. All 24 analyses performed rooted the angiosperm phylogeny at either Acorus or Alisma (or Alisma-Triglochin-Potamogeton in one case due to use of a slightly different alignment) and placed the monocots as a basal grade, producing genuine LBA results. These analyses demonstrate that the identification of ANITA as the basalmost extant angiosperms was based on historical signals preserved in the gymnosperm sequences and that the gymnosperms were an appropriate outgroup with which to root the angiosperm phylogeny in the multigene sequence analysis. 
This strategy of evaluating the appropriateness of an outgroup using artificial sequences and a series of outgroups with increments of divergence levels can be applied to investigations of phylogenetic patterns at the bases of other major clades, such as land plants, animals, and eukaryotes.


    Introduction
Using an outgroup to polarize character states for identifying basal lineages within an ingroup is a virtually universal practice in phylogenetic analyses (Farris 1972; Stevens 1980; Maddison, Donoghue, and Maddison 1984; Nixon and Carpenter 1993). Choosing outgroups for assessing relationships at the bases of most major clades, however, has been difficult because of the great divergence between the potential outgroups and the ingroup at both morphological and molecular levels, which could confound interpretation of the homology of characters and character states. To reconstruct relationships at the base of the angiosperm phylogeny, extant and fossil gymnosperms or a hypothetical ancestor have been used as the outgroup in morphological cladistic analyses (Dahlgren and Bremer 1985; Donoghue and Doyle 1989; Loconte and Stevenson 1991; Taylor and Hickey 1992; Doyle, Donoghue, and Zimmer 1994). In molecular analyses, however, only living gymnosperms can be, and usually are, used as the outgroup (Martin and Dowd 1991; Hamby and Zimmer 1992; Chase et al. 1993; Qiu et al. 1993, 1999, 2000; Doyle, Donoghue, and Zimmer 1994; Goremykin et al. 1996; Chaw et al. 1997; Soltis et al. 1997, 2000; Parkinson, Adams, and Palmer 1999; Soltis, Soltis, and Chase 1999; Barkman et al. 2000; Graham and Olmstead 2000; Savolainen et al. 2000). Concerns have been expressed that gymnosperms may be too divergent to use as the outgroup for reconstructing relationships among basal angiosperms (Qiu et al. 1993; Donoghue and Mathews 1998). In DNA sequence data, due to the limited number of states for character evolution and the time elapsed since separation of the outgroup and the ingroup, a distant outgroup might no longer contain historical signals to polarize character states and thus would behave like a random sequence (Miyamoto and Boyle 1989; Wheeler 1990; Qiu and Palmer 1999). Thus, it might attract the longest ingroup branch, creating the so-called "long-branch attraction" (LBA) problem (Felsenstein 1978; Hendy and Penny 1989).

Several approaches have been explored to deal with the distant-outgroup problem. The first approach is to use genes that duplicated along the branch leading to the ingroup, to reciprocally root the two gene phylogenies with each other and thus to infer the organismal phylogeny (Gogarten et al. 1989; Iwabe et al. 1989; Donoghue and Mathews 1998). This strategy works when the duplication occurred close to the point at which the ingroup diversified and when the duplicated copies did not experience dramatic rate acceleration. Another way to deal with the distant-outgroup problem is to extend the length of sequence analyzed by combining data from multiple genes of all three genomes so that the signal/noise ratio can be increased to allow a reliable rooting of the ingroup topology (Hillis 1996; Soltis et al. 1998; Qiu and Palmer 1999; Graham and Olmstead 2000). A third way to circumvent the distant-outgroup problem is to use genomic structural features that are conserved in their evolution and have clearly understood evolutionary mechanisms (Manhart and Palmer 1990; Raubeson and Jansen 1992; Qiu et al. 1998). Finally, understanding the homology of morphological characters across the large gap between the outgroup and the ingroup at a deeper level by taking the molecular developmental biology approach represents a major direction for future investigation of diversification patterns at the bases of major clades (Kellogg and Shaffer 1993; Doyle 1994; Carroll 1995; Davidson, Peterson, and Cameron 1995; Raff 1996; Frohlich and Meyerowitz 1997; Shubin, Tabin, and Carroll 1997; Theissen et al. 2000).

In reconstructing relationships among basal angiosperms, the first two strategies have been used in several recent studies that identified the first branches of the angiosperm phylogeny (Mathews and Donoghue 1999, 2000; Parkinson, Adams, and Palmer 1999; Qiu et al. 1999, 2000; Soltis, Soltis, and Chase 1999; Barkman et al. 2000; Graham and Olmstead 2000; Soltis et al. 2000). Despite the mutual corroboration between the studies that employed the duplicated gene rooting strategy and those that adopted the multigene analysis approach in identifying the ANITA lineages as the basalmost extant angiosperms, it is essential to demonstrate that the multigene analysis approach can stand on a solid analytic ground on its own and that sampling multiple genes can indeed enhance the level of phylogenetic signal and thus can overcome the divergence gap problem between gymnosperms and angiosperms. This concern is especially justified by the fact that duplicated gene rooting has been shadowed by the difficulty of placing Ceratophyllum (Mathews and Donoghue 1999, 2000), which was identified as the first lineage of angiosperms in earlier rbcL analyses (Les, Garvin, and Wimpee 1991; Chase et al. 1993; Qiu et al. 1993).

The key argument used in suggesting that distant outgroups might no longer be appropriate outgroups in molecular phylogenetic analyses is that the outgroup sequences are so divergent that the variation they contain has been randomized due to back-mutations and parallel mutations during the long time span since separation of the ingroup and the outgroup (Miyamoto and Boyle 1989; Wheeler 1990; Qiu and Palmer 1999). Hence, one can test whether or not a particular outgroup still contains phylogenetic signal to root the ingroup by replacing it with a random sequence. If the subsequent analysis reproduces the ingroup topology obtained with the original outgroup, this may be an indication of LBA caused by the randomized outgroup. Alternatively, if the random sequence attracts the longest ingroup branch and yields a different topology, this would suggest that the use of the original outgroup might have been appropriate (Miyamoto and Boyle 1989; Wheeler 1990; Maddison, Ruvolo, and Swofford 1992; Donoghue 1994; Graham 1997, pp. 122–161; Sullivan and Swofford 1997).

In this study, we performed a series of analyses on the original data matrix used to identify ANITA as the earliest-diverging lineages of angiosperms (Qiu et al. 1999, 2000) using several types of artificial (random and nonrandom) sequences, as well as sequences that are more divergent than those of gymnosperms, namely, those of a lycopod and a bryophyte, to test whether our original use of gymnosperms as the outgroup was justified. Together with the ingroup taxon deletion analyses and constraint topology analyses presented earlier (Qiu et al. 2000), we hope that these analyses provide a rigorous analytic perspective for identifying the ANITA lineages as the earliest branches of the angiosperm phylogeny.


    Materials and Methods
Four categories of divergent sequences were used to replace the eight gymnosperms (Cycas, Zamia, Ginkgo, Podocarpus, Metasequoia, Pinus, Gnetum, and Welwitschia) in the original matrix (Qiu et al. 2000) as the outgroup in a series of 24 analyses (table 1). In the first category, three types of random sequences were generated using the RANUNI function of SAS 8.1 (SAS Institute 2000): the first type consisted of 10 random sequences with equal base frequencies (25% each for A, C, G, and T), the second type consisted of two random sequences with 37.5% each for A and T and 12.5% each for G and C, and the third type consisted of two random sequences with 12.5% each for A and T and 37.5% each for G and C. These sequences represent truly random sequences with equal base frequencies or AT- and GC-rich contents. For the second category, we manually generated five artificial, nonrandom sequences which were homopolymers (poly-A's, poly-C's, poly-G's, and poly-T's) and heteropolymers (poly-ACGT's). These sequences represent extreme forms in a sequence universe. Both of these categories of sequences are of the same length (8,741 nt) as that of the five genes used in our earlier study (Qiu et al. 2000). Because these two categories of sequences were of nonbiological origin, they likely lacked certain unique properties of biological sequences and might behave erratically in phylogenetic analyses. To address this concern, we generated the third category of divergent sequences by misaligning the original five-gene sequences of the eight gymnosperms through deletion of the first position in the alignment (that of atp1) and filling in the last position in the alignment (that of nu18S rDNA) with a question mark (missing data). In so doing, we destroyed all the nucleotide position homology between gymnosperms and angiosperms by disrupting the original alignment, thus creating artificially divergent sequences (relative to the angiosperm ingroup) but of the same biological origin as the original gymnosperm sequences. For the last category, we used aligned sequences of the five genes of a lycopod and a bryophyte. Both the lycopod and the bryophyte sequences were composite. For the former, atp1 was from Lycopodium digitatum (AF209113), matR and atpB were from Huperzia lucidula (AY033145, this study, and U93819), rbcL was from Lycopodium obscurum (Y07935), and 18S rDNA was from Lycopodium tristachyum (U18511). For the latter, atp1 (M68929), atpB (X04465), rbcL (X04465), and 18S rDNA (X75521) were all from Marchantia polymorpha, and matR (AF068932) was from Notothylas breutelii (a hornwort), since the gene is a group II intron-encoded open reading frame and is absent in liverworts and most mosses (Qiu et al. 1998; unpublished data). To use this last category of divergent sequences, a new alignment for all of the original angiosperm and gymnosperm sequences was needed, since inclusion of the lycopod and bryophyte involved addition or removal of gaps in the alignment, particularly for matR. The alignment was done using Clustal X (Thompson et al. 1997). The purpose of using this last category of divergent sequences was to determine at what point a well-aligned biological outgroup sequence behaved like a random sequence when a nonangiosperm was used as the outgroup.
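The first three categories of test outgroups can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the study generated its random sequences with the RANUNI function of SAS, not with Python, and the function names, the seed, and the dictionary keys below are hypothetical. The base frequencies, the 8,741-nt length, and the one-column shift follow the description above.

```python
import random

BASES = "ACGT"
LENGTH = 8741  # alignment length reported for the five-gene matrix


def random_sequence(length, freqs, rng):
    """Draw each site independently; freqs maps each base to its probability."""
    weights = [freqs[b] for b in BASES]
    return "".join(rng.choices(BASES, weights=weights, k=length))


def homopolymer(base, length):
    """A run of a single base, e.g. poly-A."""
    return base * length


def heteropolymer(unit, length):
    """Repeat a unit such as 'ACGT' and trim to the requested length."""
    repeats = length // len(unit) + 1
    return (unit * repeats)[:length]


def misalign(aligned_seq):
    """Shift a sequence by one column: drop the first position and pad the
    end with '?' (missing data), so no column stays homologous."""
    return aligned_seq[1:] + "?"


rng = random.Random(1)  # arbitrary seed, an assumption for reproducibility
equal = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
at_rich = {"A": 0.375, "C": 0.125, "G": 0.125, "T": 0.375}
gc_rich = {"A": 0.125, "C": 0.375, "G": 0.375, "T": 0.125}

# Hypothetical labels; each value could stand in for the outgroup row of a matrix.
outgroups = {
    "rand_equal": random_sequence(LENGTH, equal, rng),
    "rand_at_rich": random_sequence(LENGTH, at_rich, rng),
    "rand_gc_rich": random_sequence(LENGTH, gc_rich, rng),
    "poly_A": homopolymer("A", LENGTH),
    "poly_ACGT": heteropolymer("ACGT", LENGTH),
}
```

Applying misalign to an aligned gymnosperm row keeps its length unchanged, which is what allows it to be placed back into the same matrix.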


Table 1 The Results of 24 Analyses that Used Divergent Sequences as the Outgroup to Root the Angiosperm Phylogeny

 
In all but one of the 24 analyses, we used one divergent sequence to replace the eight gymnosperms in the original five-gene matrix as the outgroup (table 1). In the remaining analysis, we used eight misaligned gymnosperm sequences to replace the eight aligned ones. This analysis was designed to evaluate the effect of the number of divergent sequences on rooting of the angiosperm phylogeny, since we could compare its result with that of two other analyses in which either the misaligned sequence of Cycas (which lacks data for atpB; see Qiu et al. 2000) or that of Ginkgo was used as the outgroup. A heuristic parsimony (equal weighting) search was conducted using 1,000 random-taxon-addition replicates, one tree held at each step during stepwise addition, tree bisection-reconnection (TBR) branch swapping, the steepest-descent option, the MulTrees option, and no upper limit of MaxTrees. A bootstrap analysis was subsequently performed using 1,000 resampling replicates and the same tree search procedure as described above except with simple taxon addition. All the analyses were performed using PAUP* 4.0 (Swofford 1998).
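The resampling step of a bootstrap analysis can be illustrated as follows. This is a minimal sketch of column resampling only, assuming an alignment held as a dictionary of equal-length strings; the taxon labels and function names are hypothetical, and the parsimony search on each pseudoreplicate (done in PAUP* in the study) is not implemented here.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one pseudoreplicate: sample alignment columns with replacement,
    keeping the same number of sites and the same taxa."""
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {
        taxon: "".join(seq[c] for c in columns)
        for taxon, seq in alignment.items()
    }


def bootstrap_replicates(alignment, n_reps, seed=0):
    """Yield n_reps pseudoreplicates; a tree search would be run on each."""
    rng = random.Random(seed)
    for _ in range(n_reps):
        yield bootstrap_alignment(alignment, rng)


# Toy matrix with hypothetical taxon labels, for illustration only.
toy = {"taxon_a": "ACGTAC", "taxon_b": "ACGTTT", "outgroup": "GGGTAC"}
replicates = list(bootstrap_replicates(toy, n_reps=3))
```

The bootstrap value for a clade is then the percentage of pseudoreplicate trees in which that clade appears.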

To identify the longest ingroup branch and to examine distribution of branch lengths within angiosperms, we performed an unrooted ingroup (i.e., angiosperm only) analysis without using any outgroup. All of the angiosperms in the original matrix (Qiu et al. 2000) were kept after the eight gymnosperms were deleted. A heuristic search with 1,000 random-taxon-addition replicates and the same tree search procedure described above was conducted.


    Results
The details of search results of the 24 analyses using divergent sequences as the outgroup are presented in table 1. One of the most parsimonious trees found in the analysis where a random sequence (sequence 1) with equal base frequencies was used as the outgroup is presented in figure 1. The branch lengths, the bootstrap values, and the nodes that collapsed in the strict consensus are shown in the figure. All 24 analyses except one produced virtually the same topology, with the first branch of angiosperm phylogeny being identified as either Acorus (which consists of two species, A. calamus and A. gramineus) or Alisma and the monocots forming a grade at the base of the phylogeny (table 1 and fig. 1). Only in the analysis where the aligned bryophyte sequence was used as the outgroup was the first branch of angiosperm trees composed of Alisma, Triglochin, and Potamogeton, and this variance could be due to the slightly different alignment used in the analysis. In all analyses, the topology of the strict consensus of the most-parsimonious trees was essentially identical to that of our earlier studies (Qiu et al. 1999, 2000), with the conspicuous exceptions of the monocot rooting and the ANITA lineages forming a clade (which occurred in the earlier rbcL analyses when Ceratophyllum was placed as a sister to all other angiosperms; see below; Chase et al. 1993; Qiu et al. 1993). Magnoliales were sister to Laurales, and Winterales were sister to Piperales, and together these four clades formed the eumagnoliid clade. Relationships among the eumagnoliids, eudicots, Chloranthaceae, Ceratophyllum, and ANITA were unresolved or resolved with no or low bootstrap support.



Fig. 1. One of the nine most-parsimonious trees found in the search in which random sequence 1 was used to replace the eight gymnosperms as the outgroup in the five-gene matrix from Qiu et al. (2000) to root the angiosperm phylogeny. Numbers above branches are branch lengths (ACCTRAN optimization); those below in italics are bootstrap values (only those >50% are shown). The nodes labeled with asterisks are collapsed in the strict consensus of the nine shortest trees. Abbreviations: MON, monocots; CER, Ceratophyllum; CHL, Chloranthaceae; ITA, Illiciales, Trimeniaceae, and Austrobaileya; AMB, Amborella; NYM, Nymphaeales; EUD, eudicots; WIN, Winterales; PIP, Piperales; MAG, Magnoliales; LAU, Laurales; Acorus_c, Acorus calamus; Acorus_g, Acorus gramineus; Ceratophyllum_d, C. demersum; Ceratophyllum_s, C. submersum; Rand seq 1, random sequence 1.

 
In the unrooted ingroup analysis, we found two islands of 12 equally most parsimonious trees with a length of 10,520 steps, a consistency index of 0.411, and a retention index of 0.610. One of the trees, presented as an unrooted network showing branch lengths, is shown in figure 2. It is obvious that the longest ingroup branches are those leading to Alisma, Potamogeton, Triglochin, A. calamus, and A. gramineus, when the tree centers around the juncture of the ANITA lineages, Piperales, Winterales, Laurales, Magnoliales, eudicots, Chloranthaceae, monocots, and Ceratophyllum.



Fig. 2. One of the 12 most-parsimonious trees found in the ingroup (angiosperms) only analysis. Numbers along branches are branch lengths (ACCTRAN optimization). The tree is shown as an unrooted phylogram. The part of the tree covering Laurales exclusive of Calycanthaceae (abbreviated as "L") is shown at 2x magnification.

 

    Discussion
The mystique surrounding LBA (Felsenstein 1978; Hendy and Penny 1989) has created a situation in which the phenomenon is frequently invoked to explain a topology that seems to be in conflict with other evidence or simply to dismiss an unfavorable topology. However, cases in which LBA is explicitly demonstrated and carefully investigated are few (Miyamoto and Boyle 1989; Wheeler 1990; Maddison, Ruvolo, and Swofford 1992; Graham 1997, pp. 122–161; Huelsenbeck 1997; Sullivan and Swofford 1997; Siddall and Whiting 1999; Sanderson et al. 2000). In the case of reconstruction of basal angiosperm phylogeny, taxa with very different morphologies were placed at the base of the trees in earlier molecular studies that analyzed different data sets: Schisandraceae in nuclear rbcS (Martin and Dowd 1991), Nymphaeales in nuclear rDNAs as well as plastid ITS and rDNA (Hamby and Zimmer 1992; Goremykin et al. 1996; Chaw et al. 1997), Ceratophyllum in plastid rbcL (Les, Garvin, and Wimpee 1991; Chase et al. 1993; Qiu et al. 1993), and Austrobaileya-Illiciales and Amborella in nuclear 18S rDNA (Soltis et al. 1997). These seemingly unstable results heightened plant systematists' fear of the effect of LBA on rooting of the angiosperm phylogeny when gymnosperms, which always have the longest branch in any data set, were used as the outgroup (Qiu et al. 1993; Donoghue and Mathews 1998). One critical issue that has not been addressed in any of the molecular studies using gymnosperms as the outgroup to root the angiosperm phylogeny is whether they really are divergent enough to cause LBA (Martin and Dowd 1991; Hamby and Zimmer 1992; Chase et al. 1993; Qiu et al. 1993, 1999, 2000; Doyle, Donoghue, and Zimmer 1994; Goremykin et al. 1996; Chaw et al. 1997; Soltis et al. 1997, 2000; Parkinson, Adams, and Palmer 1999; Soltis, Soltis, and Chase 1999; Barkman et al. 2000; Graham and Olmstead 2000; Savolainen et al. 2000). Here, we demonstrate the conditions under which LBA really becomes a problem. With two categories of artificial sequences and misaligned gymnosperm sequences as the outgroups, we consistently rooted the angiosperm phylogeny at either Acorus or Alisma (table 1 and fig. 1), two of the longest branches (even longer than any ANITA members or Ceratophyllum) among all the angiosperms (fig. 2). These outgroup sequences plainly contain no historical information and have immensely long branches in comparison with all others in the trees (fig. 1 and data not shown). The branch length of the outgroup, random sequence 1, shown in figure 1 (6,065 steps!) is also in stark contrast to that of the gymnosperms in our earlier study (354 steps). The fact that virtually the same topology was reproduced in all of these analyses suggests that we have demonstrated the conditions under which genuine LBA can occur, and this is what the earlier authors had predicted (Miyamoto and Boyle 1989; Wheeler 1990). The variation of rooting between Acorus (45.5% GC) and Alisma (48.1% GC) appears to be correlated with the GC content of the outgroup sequence (table 1). A few exceptions (rand-seq-2, rand-seq-4, rand-seq-10, and the aligned bryophyte) may be due to the altered GC content of informative sites relative to the entire sequence. Therefore, these analyses demonstrate that while the gymnosperm sequences are highly divergent relative to those of angiosperms, they are not divergent enough to cause LBA and thus were an appropriate outgroup in our original studies (Qiu et al. 1999, 2000). Consequently, the placement of the ANITA lineages at the base of the angiosperm phylogeny was based on unique historical signals preserved in the gymnosperm sequences and was not caused by LBA.
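Why outgroup base composition might matter for which long branch is attracted can be illustrated with a small calculation. The sketch below is not part of the original analyses and does not reproduce table 1. It assumes sites are independent and, as a simplification, that A = T and G = C within each ingroup sequence. Only the GC percentages of Acorus and Alisma and the outgroup base frequencies are taken from the text.

```python
def composition_from_gc(gc):
    """Base frequencies for a given GC fraction, assuming A = T and G = C
    (a simplifying assumption, not a measured composition)."""
    at = 1.0 - gc
    return {"A": at / 2, "T": at / 2, "G": gc / 2, "C": gc / 2}


def expected_match(freqs_x, freqs_y):
    """Probability that two independently drawn sites show the same base."""
    return sum(freqs_x[b] * freqs_y[b] for b in "ACGT")


acorus = composition_from_gc(0.455)  # 45.5% GC, as reported above
alisma = composition_from_gc(0.481)  # 48.1% GC, as reported above

equal = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
at_rich = {"A": 0.375, "C": 0.125, "G": 0.125, "T": 0.375}
gc_rich = {"A": 0.125, "C": 0.375, "G": 0.375, "T": 0.125}

for label, outgroup in [("equal", equal), ("AT-rich", at_rich), ("GC-rich", gc_rich)]:
    print(label,
          round(expected_match(outgroup, acorus), 5),
          round(expected_match(outgroup, alisma), 5))
```

Under these assumptions an equal-frequency random outgroup matches any sequence at one site in four by chance, an AT-rich outgroup matches the lower-GC sequence slightly more often, and a GC-rich outgroup matches the higher-GC sequence slightly more often. Whether this accounts for the pattern in table 1 is not something the sketch can establish.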

The next question to ask is whether the ANITA rooting can still be an artifact caused by some mechanisms that generate similarities in unrelated lineages by chance but do not necessarily produce long branches. One molecular evolutionary phenomenon, RNA editing, so far known to occur only in organellar genomes (Yoshinaga et al. 1996; Steinhauser et al. 1999), may be such a mechanism (Bowe and dePamphilis 1996; Qiu and Palmer 1999). Nevertheless, individual analyses of three genes from two organellar genomes (mitochondrial atp1 and matR and plastid atpB) have all identified the ANITA clades as the earliest-branching angiosperm lineages (Qiu et al. 1999, 2000; Barkman et al. 2000; Savolainen et al. 2000). It is highly unlikely that the three genes in two genomes would experience extensive RNA editing in both gymnosperms and the ANITA members but not in any other lineages. Furthermore, an analysis of the nuclear 18S rDNA alone with extensive taxon sampling also placed Austrobaileya-Illiciales and Amborella at the base of angiosperm phylogeny (Soltis et al. 1997). No RNA editing has been reported at this locus to date. Finally, and most importantly, rooting of the angiosperm phylogeny using duplicated nuclear phytochrome genes has produced a similar result (Mathews and Donoghue 1999, 2000), reinforcing our belief that the ANITA rooting was not caused by RNA editing.

GC content bias is another mechanism that does not necessarily increase branch length dramatically but still can generate analytic artifacts in phylogenetic analysis of DNA sequences (Steel, Lockhart, and Penny 1993). A brief examination of the GC content in the five genes across all major lineages of basal angiosperms and gymnosperms shows that there is no significant difference among lineages. Thus, it is unlikely that the ANITA rooting was affected by this factor.
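A GC content check of this kind can be sketched as below. The function name is hypothetical, and the choice to count only unambiguous A, C, G, and T (skipping gaps, missing data, and ambiguity codes) is an assumption; the source does not state how such characters were handled.

```python
def gc_content(aligned_seq):
    """Fraction of G and C among unambiguous bases in an aligned sequence.
    Gaps ('-'), missing data ('?') and ambiguity codes are ignored."""
    bases = [b for b in aligned_seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    return sum(b in "GC" for b in bases) / len(bases)


# Toy example: gaps and missing data do not enter the denominator.
print(gc_content("AT-?GC"))
```

Computing this value per gene and per lineage gives the table of compositions that a comparison among lineages would start from.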

A final question to ask is whether the concern that distant outgroups could cause LBA was well placed (Miyamoto and Boyle 1989; Wheeler 1990; Qiu et al. 1993; Donoghue and Mathews 1998; Qiu and Palmer 1999). Our analyses using well-aligned lycopod and bryophyte sequences as the outgroup to root the angiosperm phylogeny indicate that exceedingly divergent outgroups can indeed generate a spurious rooting topology. Both analyses identified either Acorus or Alisma-Triglochin-Potamogeton as the first branch of the angiosperm phylogeny and placed the monocots as a basal grade (table 1). These results suggest that the lycopod and bryophyte sequences are so divergent that they behave like random sequences. The outgroup branch length in the bryophyte rooting analysis was 1,464 steps, and that in the lycopod rooting analysis was 1,181 steps, as opposed to the 354 steps of the gymnosperm branch in Qiu et al. (2000). (Note that the alignment used for the bryophyte and lycopod rooting analyses was a slightly different one.) On the other hand, placing aligned gymnosperm sequences back into the matrix produced the ANITA rooting again (data not shown; the gymnosperms formed a monophyletic group, and the Gnetum-Welwitschia clade was sister to Pinus), supporting the earlier suggestion that one can avoid LBA by judiciously increasing taxon sampling to break long branches (Chase et al. 1993; Hillis 1996; Graybeal 1998; Soltis et al. 1998; Qiu et al. 1999; Qiu and Palmer 1999).
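The increments of divergence can be put on a common scale by dividing each reported outgroup branch length by that of the gymnosperm branch. This is simple arithmetic on the step counts quoted in the text; because the lycopod and bryophyte analyses used a slightly different alignment, the ratios are rough comparisons only, and the dictionary keys are hypothetical labels.

```python
# Outgroup branch lengths in parsimony steps, as reported in the text.
branch_steps = {
    "gymnosperms": 354,
    "lycopod": 1181,
    "bryophyte": 1464,
    "random_sequence_1": 6065,
}

reference = branch_steps["gymnosperms"]
ratios = {name: steps / reference for name, steps in branch_steps.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x the gymnosperm branch")
```

On this scale the lycopod and bryophyte branches are roughly three to four times as long as the gymnosperm branch, and the random sequence about seventeen times as long.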

The analyses presented here demonstrate that the gymnosperms were an appropriate outgroup with which to root the angiosperm phylogeny in our earlier multigene analyses (Qiu et al. 1999, 2000) and that the ANITA rooting is likely free of the LBA effect. Several other multigene analyses reached similar conclusions on the identity of the earliest angiosperms (Parkinson, Adams, and Palmer 1999; Soltis, Soltis, and Chase 1999; Barkman et al. 2000; Graham and Olmstead 2000; Soltis et al. 2000). It can be extrapolated that their use of gymnosperms as the outgroup did not violate any fundamental rule of choosing an appropriate outgroup. In retrospect, gymnosperms were well-behaved outgroups even in most single-gene analyses. Various members of the ANITA grade were placed at the base of angiosperm trees: Schisandraceae in nuclear rbcS (Martin and Dowd 1991), Nymphaeales in nuclear rDNAs as well as plastid ITS and rDNA (Hamby and Zimmer 1992; Goremykin et al. 1996; Chaw et al. 1997), and Austrobaileya-Illiciales and Amborella in nuclear 18S rDNA (Soltis et al. 1997). Insufficient taxon sampling in all of these studies and the use of single genes (which obviously contain less signal than multigene data sets) naturally complicate the effort of building a well-resolved phylogeny and lead to the suspicion that these seemingly different rooting topologies were produced by LBA due to the great divergence between gymnosperms and angiosperms. Ironically, the only single-gene analyses that sampled basal angiosperms extensively produced a rooting that seems to be an analytical artifact, i.e., the Ceratophyllum rooting (Chase et al. 1993; Qiu et al. 1993). A reanalysis of the rbcL matrix used in our recent multigene analyses (Qiu et al. 1999, 2000) shows that even the placement of Ceratophyllum as the sister to all other angiosperms was largely due to the historical signal contained in the gymnosperm sequences. When the gymnosperm sequences were replaced with the artificial sequences and misaligned gymnosperm sequences used in this study, the angiosperm trees were rooted at various taxa that have branches longer than Ceratophyllum (data not shown). Ceratophyllum is likely an early-diverging lineage of angiosperms, even though its exact relationship to other major clades of basal angiosperms is not well resolved at present (Qiu et al. 1999, 2000; Soltis, Soltis, and Chase 1999; Mathews and Donoghue 2000; Savolainen et al. 2000; Soltis et al. 2000). Thus, its placement at the base of angiosperm trees in the rbcL analyses was probably caused by both phylogenetic signal and a few homoplasious changes that happened to be shared with gymnosperms (not necessarily by LBA).

Reconstruction of phylogenetic relationships at the bases of major clades using molecular sequence data routinely generates controversial results (Qiu and Palmer 1999; Adoutte et al. 2000; Philippe, Germot, and Moreira 2000), largely due to use of divergent outgroups and sparse taxon sampling. The LBA problem is frequently invoked to explain results that are otherwise inexplicable. Nevertheless, most claims of LBA have not been substantiated by explicit analyses. Several parsimony- or likelihood-based tests have been developed to examine whether long branches indeed attract each other and to reduce the LBA effect (Huelsenbeck 1997; Lyons-Weiler and Hoelzer 1997; Willson 1999; Sanderson et al. 2000). The strategy employed here follows the ideas of Miyamoto and Boyle (1989), Wheeler (1990), Maddison, Ruvolo, and Swofford (1992), Graham (1997), and Sullivan and Swofford (1997) in using random sequences to evaluate whether phylogenetic signal in the outgroup has been randomized. We further elaborated this approach by increasing the repertoire of test sequences by using homo- and heteropolymers, misaligned original outgroup sequences, and more distantly related aligned outgroup sequences. In particular, this last category of outgroup sequences showed several increments of divergence levels and helped to define the point beyond which the outgroup was no longer appropriate for rooting the ingroup. As it becomes clear that sampling multiple genes from all two or three genomes of a large number of organisms can lead to reliable reconstruction of complicated organismal phylogenies (Hillis 1996; Qiu et al. 1999, 2000; Soltis, Soltis, and Chase 1999; Savolainen et al. 2000; Soltis et al. 2000) and that the LBA problem is tractable thanks to the various strategies that are being developed, phylogenetic analyses of DNA sequences will undoubtedly, along with comparative genomics and evolutionary developmental biology, allow evolutionary biologists to tackle many of the issues in the tree of life.


    Acknowledgements
We thank Ronald Adkins, Albert Blarer, James A. Doyle, Eva Goldwater, Sean Graham, Margaret Hoey, Libo Li, and Peter F. Stevens for helpful suggestions, and Schweizerischer Nationalfonds and University of Massachusetts for financial support.


    Footnotes
 
Elizabeth Kellogg, Reviewing Editor

1 Abbreviations: ANITA, Amborella, Nymphaeales, and Illiciales-Trimeniaceae-Austrobaileya; LBA, long-branch attraction.

2 Keywords: Amborella, ANITA, basal angiosperms, long-branch attraction, outgroup, random sequences

3 Address for correspondence and reprints: Yin-Long Qiu, Department of Biology, University of Massachusetts, Amherst, Massachusetts 01003-5810. yqiu@bio.umass.edu.


    References
 

    Adoutte A., G. Balavoine, N. Lartillot, O. Lespinet, B. Prud'homme, R. de Rosa. 2000. The new animal phylogeny: reliability and implications. Proc. Natl. Acad. Sci. USA 97:4453-4456.

    Barkman T. J., G. Chenery, J. R. McNeal, J. Lyons-Weiler, W. J. Ellisens, G. Moore, A. D. Wolfe, C. W. dePamphilis. 2000. Independent and combined analyses of sequences from all three genomic compartments converge on the root of flowering plant phylogeny. Proc. Natl. Acad. Sci. USA 97:13166-13171.

    Bowe L. M., C. W. dePamphilis. 1996. Effects of RNA editing and gene processing on phylogenetic reconstruction. Mol. Biol. Evol. 13:1159-1166.

    Carroll S. B. 1995. Homeotic genes and the evolution of arthropods and chordates. Nature 376:479-485.

    Chase M. W., D. E. Soltis, R. G. Olmstead, et al. (39 co-authors). 1993. Phylogenetics of seed plants: an analysis of nucleotide sequences from the plastid gene rbcL. Ann. Mo. Bot. Gard. 80:528-580.

    Chaw S.-M., A. Zharkikh, H.-M. Sung, T.-C. Lau, W.-H. Li. 1997. Molecular phylogeny of extant gymnosperms and seed plant evolution: analysis of nuclear 18S rRNA sequences. Mol. Biol. Evol. 14:56-68.

    Dahlgren R., K. Bremer. 1985. Major clades of the angiosperms. Cladistics 1:349-368.

    Davidson E. H., K. J. Peterson, R. A. Cameron. 1995. Origin of bilaterian body plans: evolution of developmental regulatory mechanisms. Science 270:1319-1325.

    Donoghue M. J. 1994. Progress and prospects in reconstructing plant phylogeny. Ann. Mo. Bot. Gard. 81:405-418.

    Donoghue M. J., J. A. Doyle. 1989. Phylogenetic analysis of angiosperms and the relationships of Hamamelidae. Pp. 17–45 in P. R. Crane and S. Blackmore, eds. Evolution, systematics, and fossil history of the Hamamelidae. Vol. 1. Clarendon, Oxford, England.

    Donoghue M. J., S. Mathews. 1998. Duplicated genes and the root of angiosperms, with an example using phytochrome sequences. Mol. Phylogenet. Evol. 9:489-500.

    Doyle J. A., M. J. Donoghue, E. A. Zimmer. 1994. Integration of morphological and ribosomal RNA data on the origin of angiosperms. Ann. Mo. Bot. Gard. 81:419-450.

    Doyle J. J. 1994. Evolution of a plant homeotic multigene family: toward connecting molecular systematics and molecular developmental genetics. Syst. Biol. 43:307-328.

    Farris J. S. 1972. Estimating phylogenetic trees from distance matrices. Am. Nat. 106:645-668.

    Felsenstein J. 1978. Cases in which parsimony and compatibility methods will be positively misleading. Syst. Zool. 27:401-410.

    Frohlich M. W., E. M. Meyerowitz. 1997. The search for homeotic gene homologs in basal angiosperms and Gnetales: a potential new source of data on the evolutionary origin of flowers. Int. J. Plant Sci. 158:S131-S142.

    Gogarten J. P., H. Kibak, P. Dittrich, L. Taiz, E. J. Bowman, B. J. Bowman, M. F. Manolson, R. J. Poole, T. Date, T. Oshima. 1989. Evolution of vacuolar H+-ATPase: implications for the origin of eukaryotes. Proc. Natl. Acad. Sci. USA 86:6661-6665.

    Goremykin V., V. Bobrova, J. Pahnke, A. Troitsky, A. Antonov, W. Martin. 1996. Noncoding sequences from the slowly evolving chloroplast inverted repeat in addition to rbcL data do not support Gnetalean affinities of angiosperms. Mol. Biol. Evol. 13:383-396.

    Graham S. W. 1997. Phylogenetic analysis of breeding system evolution in heterostylous monocotyledons. Ph.D. dissertation, University of Toronto, Toronto, Canada.

    Graham S. W., R. G. Olmstead. 2000. Utility of 17 chloroplast genes for inferring the phylogeny of the basal angiosperms. Am. J. Bot. 87:1712-1730.

    Graybeal A. 1998. Is it better to add taxa or characters to a difficult phylogenetic problem? Syst. Biol. 47:9-17.

    Hamby R. K., E. A. Zimmer. 1992. Ribosomal RNA as a phylogenetic tool in plant systematics. Pp. 50–91 in P. S. Soltis, D. E. Soltis, and J. J. Doyle, eds. Molecular systematics of plants. Chapman and Hall, New York.

    Hendy M. D., D. Penny. 1989. A framework for the quantitative study of evolutionary trees. Syst. Zool. 38:297-309.

    Hillis D. M. 1996. Inferring complex phylogenies. Nature 383:130-131.

    Huelsenbeck J. P. 1997. Is the Felsenstein zone a fly trap? Syst. Biol. 46:69-74.

    Iwabe N., K.-I. Kuma, M. Hasegawa, S. Osawa, T. Miyata. 1989. Evolutionary relationship of archaebacteria, eubacteria, and eukaryotes inferred from phylogenetic trees of duplicated genes. Proc. Natl. Acad. Sci. USA 86:9355-9359.

    Kellogg E. A., H. B. Shaffer. 1993. Model organisms in evolutionary studies. Syst. Biol. 42:409-414.

    Les D. H., D. K. Garvin, C. F. Wimpee. 1991. Molecular evolutionary history of ancient aquatic angiosperms. Proc. Natl. Acad. Sci. USA 88:10119-10123.

    Loconte H., D. W. Stevenson. 1991. Cladistics of the Magnoliidae. Cladistics 7:267-296.

    Lyons-Weiler J., G. A. Hoelzer. 1997. Escaping from the Felsenstein zone by detecting long branches in phylogenetic data. Mol. Phylogenet. Evol. 8:375-384.

    Maddison D. R., M. Ruvolo, D. L. Swofford. 1992. Geographic origins of human mitochondrial DNA: phylogenetic evidence from control region sequences. Syst. Biol. 41:111-124.

    Maddison W. P., M. J. Donoghue, D. R. Maddison. 1984. Outgroup analysis and parsimony. Syst. Zool. 33:83-103.

    Manhart J. R., J. D. Palmer. 1990. The gain of two chloroplast tRNA introns marks the green algal ancestors of land plants. Nature 345:268-270.

    Martin P. G., J. M. Dowd. 1991. Studies of angiosperm phylogeny using protein sequences. Ann. Mo. Bot. Gard. 78:296-337.

    Mathews S., M. J. Donoghue. 1999. The root of angiosperm phylogeny inferred from duplicate phytochrome genes. Science 286:947-950.

    Mathews S., M. J. Donoghue. 2000. Basal angiosperm phylogeny inferred from duplicated phytochromes A and C. Int. J. Plant Sci. 161:S41-S55.

    Miyamoto M. M., S. M. Boyle. 1989. The potential importance of mitochondrial DNA sequence data to eutherian mammal phylogeny. Pp. 437–450 in B. Fernholm, K. Bremer, and H. Joernvall, eds. The hierarchy of life. Elsevier, Amsterdam, the Netherlands.

    Nixon K. C., J. M. Carpenter. 1993. On outgroups. Cladistics 9:413-426.

    Parkinson C. L., K. L. Adams, J. D. Palmer. 1999. Multigene analyses identify the three earliest lineages of extant flowering plants. Curr. Biol. 9:1485-1488.

    Philippe H., A. Germot, D. Moreira. 2000. The new phylogeny of eukaryotes. Curr. Opin. Genet. Dev. 10:596-601.

    Qiu Y.-L., M. W. Chase, D. H. Les, C. R. Parks. 1993. Molecular phylogenetics of the Magnoliidae: cladistic analyses of nucleotide sequences of the plastid gene rbcL. Ann. Mo. Bot. Gard. 80:587-606.

    Qiu Y.-L., Y. Cho, J. C. Cox, J. D. Palmer. 1998. The gain of three mitochondrial introns identifies liverworts as the earliest land plants. Nature 394:671-674.

    Qiu Y.-L., J. Lee, F. Bernasconi-Quadroni, D. E. Soltis, P. S. Soltis, M. Zanis, E. A. Zimmer, Z. Chen, V. Savolainen, M. W. Chase. 1999. The earliest angiosperms: evidence from mitochondrial, plastid and nuclear genomes. Nature 402:404-407.

    Qiu Y.-L., J. Lee, F. Bernasconi-Quadroni, D. E. Soltis, P. S. Soltis, M. Zanis, E. A. Zimmer, Z. Chen, V. Savolainen, M. W. Chase. 2000. Phylogeny of basal angiosperms: analyses of five genes from three genomes. Int. J. Plant Sci. 161:S3-S27.

    Qiu Y.-L., J. D. Palmer. 1999. Phylogeny of early land plants: insights from genes and genomes. Trends Plant Sci. 4:26-30.

    Raff R. A. 1996. The shape of life. University of Chicago Press, Chicago.

    Raubeson L. A., R. K. Jansen. 1992. Chloroplast DNA evidence on the ancient evolutionary split in vascular land plants. Science 255:1697-1699.

    Sanderson M. J., M. F. Wojciechowski, J.-M. Hu, T. Sher Khan, S. G. Brady. 2000. Error, bias, and long-branch attraction in data for two chloroplast photosystem genes in seed plants. Mol. Biol. Evol. 17:782-797.

    SAS Institute. 2000. SAS 8.1. SAS Institute, Cary, N.C.

    Savolainen V., M. W. Chase, S. B. Hoot, C. M. Morton, D. E. Soltis, C. Bayer, M. F. Fay, A. Y. de Bruijn, S. Sullivan, Y.-L. Qiu. 2000. Phylogenetics of flowering plants based upon a combined analysis of plastid atpB and rbcL gene sequences. Syst. Biol. 49:306-362.

    Shubin N., C. Tabin, S. Carroll. 1997. Fossils, genes, and the evolution of animal limbs. Nature 388:639-648.

    Siddall M. E., M. F. Whiting. 1999. Long-branch abstractions. Cladistics 15:9-24.

    Soltis D. E., P. S. Soltis, M. W. Chase, et al. (16 co-authors). 2000. Angiosperm phylogeny inferred from 18S rDNA, rbcL and atpB sequences. Bot. J. Linn. Soc. 133:381-461.

    Soltis D. E., P. S. Soltis, M. E. Mort, M. W. Chase, V. Savolainen, S. B. Hoot, C. M. Morton. 1998. Inferring complex phylogenies using parsimony: an empirical approach using three large DNA data sets for angiosperms. Syst. Biol. 47:32-42.

    Soltis D. E., P. S. Soltis, D. L. Nickrent, et al. (16 co-authors). 1997. Angiosperm phylogeny inferred from 18S ribosomal DNA sequences. Ann. Mo. Bot. Gard. 84:1-49.

    Soltis P. S., D. E. Soltis, M. W. Chase. 1999. Angiosperm phylogeny inferred from multiple genes as a tool for comparative biology. Nature 402:402-404.

    Steel M. A., P. J. Lockhart, D. Penny. 1993. Confidence in evolutionary trees from biological sequence data. Nature 364:440-442.

    Steinhauser S., S. Beckert, I. Capesius, O. Malek, V. Knoop. 1999. Plant mitochondrial RNA editing. J. Mol. Evol. 48:303-312.

    Stevens P. F. 1980. Evolutionary polarity of character states. Annu. Rev. Ecol. Syst. 11:333-358.

    Sullivan J., D. L. Swofford. 1997. Are guinea pigs rodents? The importance of adequate models in molecular phylogenetics. J. Mamm. Evol. 4:77-86.

    Swofford D. L. 1998. PAUP*: phylogenetic analysis using parsimony (*and other methods). Version 4.0b2. Sinauer, Sunderland, Mass.

    Taylor D. W., L. J. Hickey. 1992. Phylogenetic evidence for the herbaceous origin of angiosperms. Plant Syst. Evol. 180:137-156.

    Theissen G., A. Becker, A. Di Rosa, A. Kanno, J. T. Kim, T. Muenster, K.-U. Winter, H. Saedler. 2000. A short history of MADS-box genes in plants. Plant Mol. Biol. 42:115-149.

    Thompson J. D., T. J. Gibson, F. Plewniak, F. Jeanmougin, D. G. Higgins. 1997. The Clustal X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 25:4876-4882.

    Wheeler W. C. 1990. Nucleic acid sequence phylogeny and random outgroups. Cladistics 6:363-367.

    Willson S. J. 1999. A higher order parsimony method to reduce long-branch attraction. Mol. Biol. Evol. 16:694-705.

    Yoshinaga K., H. Iinuma, T. Masuzawa, K. Ueda. 1996. Extensive RNA editing of U to C in addition to C to U substitution in the rbcL transcripts of hornwort chloroplasts and the origin of RNA editing in green plants. Nucleic Acids Res. 24:1008-1014.

Accepted for publication May 18, 2001.