* Institut für Spezielle Botanik, Universität Jena, Jena, Germany
Zentrum Pharmakologie und Toxikologie, Universität Göttingen, Göttingen, Germany
Klinik für Innere Medizin, Universität Jena, Jena, Germany
Correspondence: vadim.goremykin@uni-jena.de.
Abstract
Key Words: Amborella, chloroplast genomes, angiosperms, gymnosperms, molecular evolution, substitution rates
Introduction
Recently, several studies employing large multigene data sets (Parkinson, Adams, and Palmer 1999; Qiu et al. 1999; Soltis, Soltis, and Chase 1999) suggested that the tropical monotypic family Amborellaceae, with its single species Amborella trichopoda Baill., might belong to the most archaic lineage of the angiosperms. Some later molecular investigations (Barkman et al. 2000; Graham and Olmstead 2000), however, yielded unstable topologies, either with Amborella as the sister group to the rest of the angiosperms or with Amborella as the first group to split off from the most basal angiosperm clade, which also includes Nymphaeales.
Here, we present the complete sequence of the Amborella trichopoda chloroplast genome, sequenced with an overall 10x coverage. The estimated accuracy is one error per 23,341 bases (6.97 expected errors in the whole plastome sequence), as determined from PHRED (Ewing et al. 1998) confidence values for the genomic consensus sequence.
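The relation between PHRED confidence values and an expected error count can be sketched as follows. This is a minimal illustration, assuming the standard PHRED definition Q = -10 log10(P); the function names and the example quality values are hypothetical and are not taken from the assembly described here.

```python
def phred_to_error_probability(q):
    """Convert a PHRED quality value Q to an error probability P = 10^(-Q/10)."""
    return 10.0 ** (-q / 10.0)


def expected_errors(qualities):
    """Expected number of miscalled bases: the sum of per-base error probabilities."""
    return sum(phred_to_error_probability(q) for q in qualities)


def bases_per_error(qualities):
    """Average number of bases per expected error for a consensus sequence."""
    return len(qualities) / expected_errors(qualities)


# Hypothetical consensus quality values, for illustration only.
example_qualities = [40, 40, 50, 30]
print(expected_errors(example_qualities))
print(bases_per_error(example_qualities))
```

The two figures quoted in the text are mutually consistent under this reading: 6.97 expected errors at one error per 23,341 bases corresponds to a sequence of roughly 162,687 bases.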
Methods
The Amborella trichopoda chloroplast genome sequence reported in this paper has been deposited in the EMBL database (accession number AJ506156). The alignments used in these analyses and primer sequences are available upon request.
Sequence Assembly
All reads were base-called with the PHRED program. Masking of the vector and primer sequences and assembly were performed using the STADEN package (Staden, Beal, and Bonfield 2000) on a Linux Pentium III PC. The sequencing data were accumulated with 10x coverage for every PCR product. The remaining gaps were closed by PCR.
Results
Phylogenetic Considerations and Analyses
Before phylogenetic analyses, divergence at synonymous and nonsynonymous sites was estimated in the codon-based nucleotide alignments of 61 chloroplast genes common to 13 completely sequenced chloroplast genomes of land plants (Ohyama et al. 1986; Shinozaki et al. 1986; Hiratsuka et al. 1989; Wakasugi et al. 1994; Maier et al. 1995; Sato et al. 1999; Hupfer et al. 2000; Kato et al. 2000; Schmitz-Linneweber et al. 2001; Ogihara et al. 2002) by the method of Yang (1997a). A comparison of synonymous and nonsynonymous distances between the genes encoded on the Pinus cpDNA and their angiosperm homologs (fig. 2) shows that most of the chloroplast coding sequences are very divergent at their synonymous sites over this taxonomic range. One can therefore conclude that using complete sequences of protein-coding genes from cpDNA to investigate affinities between representatives of angiosperms and gymnosperms could lead to excessively high variance in the phylogenetic reconstruction and to erroneous correction for multiple substitutions per site when substitution models based on total distance measurement are used.
We concatenated the 61 individual alignments of the first and second codon positions from the protein-coding genes common to the 13 known chloroplast genomes of land plants into one data set. A second data set was produced from the translated sequences of these genes. No complete gene sequences were missing from either data set. After manual editing, we obtained a nucleotide alignment of 30,017 positions and an amino acid alignment of 14,655 positions, both of good quality.
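The construction of the nucleotide data set can be illustrated with a short sketch. It assumes that each gene alignment is codon-aligned (its length is a multiple of three and it starts at codon position 1) and is stored as a mapping from taxon name to aligned sequence; the function names and this data layout are hypothetical and are not the procedure or software actually used.

```python
def first_and_second_positions(codon_aligned_seq):
    """Keep codon positions 1 and 2, dropping every third position."""
    return "".join(
        base for i, base in enumerate(codon_aligned_seq) if i % 3 != 2
    )


def concatenate_alignments(gene_alignments):
    """Concatenate per-gene alignments (list of dicts: taxon -> sequence).

    Third codon positions are removed from each gene before concatenation.
    Every gene alignment must contain exactly the same taxa.
    """
    taxa = sorted(gene_alignments[0])
    for alignment in gene_alignments:
        if sorted(alignment) != taxa:
            raise ValueError("every gene alignment must contain the same taxa")
    return {
        taxon: "".join(
            first_and_second_positions(alignment[taxon])
            for alignment in gene_alignments
        )
        for taxon in taxa
    }


# Toy example with two hypothetical genes and two taxa.
genes = [
    {"Amborella": "ATGGCC", "Pinus": "ATGGCA"},
    {"Amborella": "TTT", "Pinus": "TTC"},
]
print(concatenate_alignments(genes))
```

Requiring identical taxon sets in every gene mirrors the statement that no complete gene sequences were missing from the data sets.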
Analyses of Nucleotide Sequences
A Neighbor-Joining (NJ) tree from Kimura two-parameter distances (as implemented in the TREECON package [Van de Peer and De Wachter 1994]) based on the alignment of the first and the second codon positions from 61 chloroplast protein-coding genes is presented in figure 3. On this tree, Amborella trichopoda does not form the most basal angiosperm clade. Instead, it appears on a common branch with Calycanthus fertilis, a member of Laurales, at the base of the eudicot cluster. Both this branch and the monophyly of all dicotyledonous plants under study received the highest bootstrap support (100/100). Applying Jukes-Cantor; Felsenstein F81; Felsenstein F84; Kimura three-parameter; Hasegawa, Kishino, and Yano; Tajima-Nei; Tamura-Nei; and General time-reversible models as implemented in the PAUP package (Swofford 2002) instead of the Kimura two-parameter model did not lead to any changes in tree topology or in bootstrap support values for these two branches.
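For readers unfamiliar with the Kimura two-parameter distance, the following sketch computes it for one pair of aligned sequences using the standard formula d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitional and transversional differences. It is an illustration only, not the TREECON implementation; the function name and the choice to skip sites with gaps or ambiguous bases are assumptions.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned nucleotide sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped
    (an assumption made for this sketch).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # For saturated data the arguments below become non-positive and
    # math.log or math.sqrt raises ValueError: the distance is undefined.
    return -0.5 * math.log((1.0 - 2.0 * p - q) * math.sqrt(1.0 - 2.0 * q))


print(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"))
```

A matrix of such pairwise distances is the input that a Neighbor-Joining procedure operates on.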
Log determinant analysis (PAUP) produced the same topology with the same bootstrap support for these two branches, which suggests that the topology in figure 3 is unlikely to be an artifact of compositional biases in different lineages.
These results are supported by maximum parsimony. Maximum parsimony with a heuristic search (PAUP and PHYLIP [Felsenstein 1989] implementations) recovered the same topology as shown in figure 3. The branch uniting Amborella and Calycanthus and the branch uniting all the dicotyledonous plants under study were found in all bootstrap samples (100/100 BP). The same result was obtained using the branch-and-bound algorithm (PAUP). The regression analysis performed with the AUTODECAY program showed that the Calycanthus-Amborella branch was reproduced in the first 27 shortest trees, whereas the branch uniting all dicotyledonous plants under study disappeared starting from the 88th shortest tree. The constrained tree with Amborella as a sister group to other angiosperms was 87 steps longer than the most parsimonious tree, on which it formed a common clade with Calycanthus at the base of the dicotyledonous cluster.
Maximum-likelihood analyses were performed with the Tree-Puzzle program using the fast quartet-puzzling tree search algorithm (Strimmer and von Haeseler 1996). The program provides a method-specific measure of statistical support for the internal tree branches: the percentage of intermediate trees, built over all puzzling steps, in which a given bipartition occurs. Branches showing quartet puzzling support (QPS) above 90% are considered strongly supported. Employing Tamura-Nei and Hasegawa, Kishino, and Yano substitution models with default settings and the root set to Marchantia, we found the same topology as shown in figure 3 with, respectively, 93/94 QPS for the branch uniting all dicotyledonous plants and 99/99 QPS for the branch Calycanthus-Amborella. The competing topologies (eudicots basal or Calycanthus-Amborella branch basal) did not include one with Amborella basal to all angiosperms. Not a single bipartition out of 1,000 possible supported such a placement of Amborella in the above analyses.
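The support measure described above, the percentage of intermediate trees in which a bipartition occurs, can be sketched in a few lines. This is a toy illustration and not Tree-Puzzle code: each intermediate tree is assumed to be given simply as a collection of bipartitions, each written as the set of taxa on one side of an internal branch, and all names are hypothetical.

```python
from collections import Counter


def canonical_split(side, all_taxa):
    """Return one fixed representation of a bipartition.

    A bipartition can be written from either side of the branch; here it is
    always stored as the side that excludes the alphabetically first taxon.
    """
    side = frozenset(side)
    other = frozenset(all_taxa) - side
    anchor = min(all_taxa)
    return other if anchor in side else side


def split_support(intermediate_trees, all_taxa):
    """Percentage of intermediate trees containing each bipartition."""
    counts = Counter()
    for tree in intermediate_trees:
        # Use a set so that a bipartition is counted once per tree.
        counts.update({canonical_split(side, all_taxa) for side in tree})
    n_trees = len(intermediate_trees)
    return {split: 100.0 * n / n_trees for split, n in counts.items()}


taxa = ["Amborella", "Calycanthus", "Marchantia", "Nicotiana", "Pinus"]
trees = [
    [{"Amborella", "Calycanthus"}],
    [{"Amborella", "Calycanthus"}],
    [{"Marchantia", "Nicotiana", "Pinus"}],  # same branch, other side
    [{"Amborella", "Nicotiana"}],
]
support = split_support(trees, taxa)
print(support[canonical_split({"Amborella", "Calycanthus"}, taxa)])
```

With 1,000 puzzling steps, a bipartition that never appears in any intermediate tree receives no entry at all, which corresponds to the absence of any bipartition supporting Amborella as basal to all angiosperms.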
Applying the substitution model fitted with the ModelTest program (Posada and Crandall 1998) to our nucleotide data set with all positions including gaps deleted, we obtained the "standard" topology, with Amborella forming the most basal branch and Calycanthus branching second. Changing the alpha shape parameter from 0.72 to 0.28 (the value we previously estimated from the data) with the fitted model resulted in a drastic change of the topology: eudicots became paraphyletic, with Nicotiana separated from the rest of them by the branch bearing the grasses, whereas Calycanthus became the most basal and Amborella branched off second. A general drop of QPS values across the tree was observed.
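Why the alpha shape parameter matters can be illustrated with the simplest gamma-corrected distance. The sketch below uses the Jukes-Cantor model with gamma-distributed rates, d = (3/4) alpha [(1 - 4p/3)^(-1/alpha) - 1], purely as an illustration; the model fitted with ModelTest in this study is more complex, and the observed proportion of differences p = 0.2 is a hypothetical value.

```python
import math


def jc_distance(p):
    """Jukes-Cantor distance for an observed proportion of differences p."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def jc_gamma_distance(p, alpha):
    """Jukes-Cantor distance with gamma-distributed rates across sites.

    Smaller alpha means stronger rate heterogeneity and a larger correction
    for multiple substitutions at the same observed p.
    """
    return 0.75 * alpha * ((1.0 - 4.0 * p / 3.0) ** (-1.0 / alpha) - 1.0)


p_observed = 0.2  # hypothetical proportion of differing sites
for alpha in (0.72, 0.28):
    print(alpha, jc_gamma_distance(p_observed, alpha))
print("no rate heterogeneity:", jc_distance(p_observed))
```

Under these assumptions the same observed difference is corrected to a noticeably larger distance at alpha = 0.28 than at alpha = 0.72, and long branches are stretched more than short ones. The sketch shows only the sensitivity of distance correction to alpha; it does not by itself predict which topology a likelihood analysis will favor.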
Analyses of Protein Sequences
NJ trees built from Kimura, Tajima-Nei (TREECON implementation), and Dayhoff (PHYLIP implementation) protein distances on the basis of the 14,655-amino-acid-long alignment yielded topologies identical to the topology presented in figure 3. Bootstrap support for the monophyly of the dicotyledonous plants and the Calycanthus-Amborella branch remained at the 100/100 BP level in these analyses.
The topology presented in figure 3 was also recovered in maximum-parsimony (PAUP) analyses of the above data set, with 100/100 BP support for the branch uniting all the dicots under analysis and 96/100 BP support for the sister group relationship between Calycanthus and Amborella. We did not register an alternative topology with Amborella forming the most basal angiosperm clade in these analyses.
Maximum-likelihood analyses of this data set with Dayhoff, Schwartz, and Orcutt (1978), Henikoff and Henikoff (1992) (Blosum62), and Whelan and Goldman (2001) (WAG) amino acid substitution models as implemented in the Tree-Puzzle program yielded 99 QPS support for the Amborella-Calycanthus clade and 100 QPS for the sister group relationship between it and the clade uniting all eudicots (Rosopsida) under analysis. With Müller and Vingron (2000) and Jones, Taylor, and Thornton (1992) models, both these branches were recovered with 99 QPS.
Discussion
In the two cases in which this topology could not be confirmed, complex fitted models were used. Notably, a small change in the fitted substitution model substantially changed the resulting tree. One of the recovered fitted topologies is obviously wrong. Multiple cases have previously been reported in which such fitted models failed to yield the correct tree (Yang 1997b; Posada and Crandall 2001), whereas unfitted models recovered correct topologies.
One can also note that, in contrast to the placement of Amborella as a sister group to all other angiosperms, the position recovered with unfitted models is in better agreement with the taxonomic views that were maintained for more than a century. Amborella trichopoda was first described by Baillon (1869), who placed it in Monimiaceae (Laurales) because of the similarity of its male flowers to those of Hedycarya. It was further treated accordingly by Bentham and Hooker (1880), Pax (1889), Perkins and Gilg (1901), and Perkins (1925). The female flowers of Amborella remained unknown to science until 1948, when they were described by Bailey and Swamy (1948), who also found that Amborella had vesselless xylem. In accordance with their suggestions, Pichon (1948) proposed a separate family for the plant, Amborellaceae, within Laurales. The separation of Amborella into a separate family was later supported by Money, Bailey, and Swamy (1952) in their detailed investigation of the morphology of Monimiaceae. Following suggestions by Bailey and Swamy (1948), they narrowed the definition of this family, also separating Trimenia and Piptocalyx into another new family, Trimeniaceae. Yet Money, Bailey, and Swamy (1952) regarded Amborella as close to Monimiaceae because of their shared hippocrepiform sclereids and the similar forms and structures of the pollen.
Takhtajan (1966) viewed the vesselless condition of xylem as an archaic character in angiosperms. Writing on the origins of Laurales, he argued that they developed from certain ancient vesselless members of Magnoliales, the oldest order of angiosperms in his system. Magnoliales themselves, in his opinion, radiated from plants with the vesselless Winteraceae type of xylem. As a consequence, vesselless Amborella was placed by Takhtajan (1966) in a basal position within Laurales. This placement of Amborella was further supported by Cronquist (1981). Describing the taxonomic position of this plant, he wrote: "It is clearly a member of Laurales, in which its primitively vesselless wood, alternate leaves, essentially hypogynous flowers, several carpels, abundant endosperm and stamens dehiscent by longitudinal slits mark it as an archaic type."
On the other hand, association of Amborella with Nymphaeales is problematic from the morphological standpoint: in contrast to the water lilies, Amborella has small unisexual flowers and abundant endosperm versus endosperm almost lacking in Nymphaeales, which instead develop perisperm. There are significant differences in early endosperm development of Amborella and Nymphaeales (Floyd and Friedman 2001). Like Eudicots and Eumagnoliids, Amborella has a seven-celled female gametophyte and triploid endosperm, as opposed to the four-celled female gametophyte and diploid endosperm characteristic of Nymphaeales (Williams and Friedman 2002).
Amborella was placed at the root of all other angiosperms when molecular studies by Parkinson, Adams, and Palmer (1999), Qiu et al. (1999), and Soltis, Soltis, and Chase (1999) emerged. Yet, these studies were based on a limited number of characters derived from only a few genes. Furthermore, using unmasked sequences of chloroplast genes with high substitution rates at their synonymous sites (see fig. 2) in maximum-parsimony analyses aimed at finding affinity between angiosperms and spermatophyte outgroups can be misleading.
The uniformity and high resolution of the results provided by the data set of the coding genes from chloroplast DNA in the overwhelming majority of analyses give reasons to believe that further accumulation of genomic cpDNA data from basal angiosperms should provide the means to resolve the first stages of flowering plant evolution. However, additional studies are needed on the properties of these data and on the possible underlying causes influencing the position of Amborella obtained here with various models. Although not systematically investigated here, it is possible that the use of different closely related outgroups may influence the position of Amborella in the tree. This aspect also requires investigation using chloroplast genome sequences from additional gymnosperms.
Acknowledgements
Footnotes
Literature Cited
Arber, E. A. N., and J. Parkin. 1907. On the origin of angiosperms. J. Linnean Soc. 38:29-80.
Arber, E. A. N., and J. Parkin. 1908. Studies on the evolution of angiosperms: the relationship of the angiosperms to Gnetales. Ann. Bot. (London) 22:489-515.
Bailey, I. W., and B. G. L. Swamy. 1948. Amborella trichopoda Baill : a new morphological type of vesselless dicotyledons. J. Arnold Arb. 29:245-254.
Baillon, H. 1869. Histoire des plantes. Vol. I. L. Hachette & Cie. Paris, London, Leipzig.
Barkman, T. J., G. Chenery, J. R. McNeal, J. Lyons-Weiler, W. J. Ellisens, G. Moore, A. D. Wolfe, and C. W. dePamphilis. 2000. Independent and combined analyses of sequences from all three genomic compartments converge on the root of flowering plant phylogeny. Proc. Natl. Acad. Sci. USA 97:13166-13171.
Bentham, G., and J. D. Hooker. 1880. Genera plantarum, III(1). Reeve & Company, London.
Cronquist, A. 1981. An integrated system of classification of flowering plants. Columbia University Press, New York.
Dayhoff, M. O., R. M. Schwartz, and B. C. Orcutt. 1978. A model of evolutionary change in proteins. Pp. 345-352 in M. O. Dayhoff, ed. Atlas of protein sequence structure, Vol. 5, Suppl. 3. National Biomedical Research Foundation, Washington, D.C.
Ewing, B., L. Hillier, M. C. Wendl, and P. Green. 1998. Base-calling of automated sequencer traces using PHRED. I. Accuracy assessment. Genome Res. 8:175-185.
Felsenstein, J. 1989. PHYLIP: phylogeny inference package. Version 3.2. Cladistics 5:164-166.
Floyd, S. K., and W. E. Friedman. 2001. Developmental evolution of endosperm in basal angiosperms: evidence from Amborella (Amborellaceae), Nuphar (Nymphaeaceae), and Illicium (Illiciaceae). Plant Syst. Evol. 228:153-169.
Goremykin, V., K. I. Hirsch-Ernst, S. Wölfl, and F. H. Hellwig. 2003. The chloroplast genome of the "basal" angiosperm Calycanthus fertilis: structural and phylogenetic analyses. Plant Syst. Evol. (in press).
Graham, S. W., and R. G. Olmstead. 2000. Utility of 17 chloroplast genes for inferring the phylogeny of the basal angiosperms. Am. J. Bot. 87:1712-1730.
Henikoff, S., and J. G. Henikoff. 1992. Amino acid substitution matrices from protein blocks. Proc. Natl. Acad. Sci. USA 89:10915-10919.
Hiratsuka, J., H. Shimada, and R. Whittier, et al. (16 co-authors). 1989. The complete sequence of the rice (Oryza sativa) chloroplast genome: intermolecular recombination between distinct tRNA genes accounts for a major plastid DNA inversion during the evolution of the cereals. Mol. Gen. Genet. 217:185-194.
Hupfer, H., M. Swiatek, S. Hornung, R. G. Herrmann, R. M. Maier, W-L. Chiu, and B. Sears. 2000. Complete nucleotide sequence of the Oenothera elata plastid chromosome, representing plastome I of the five distinguishable Euoenothera plastomes. Mol. Gen. Genet. 263:581-585.
Jones, D. T., W. R. Taylor, and J. M. Thornton. 1992. The rapid generation of mutation data matrices from protein sequences. Comput. Appl. Biosci. 8:275-282.
Kato, T., T. Kaneko, S. Sato, Y. Nakamura, and S. Tabata. 2000. Complete structure of the chloroplast genome of a legume, Lotus japonicus. DNA Res. 7:323-330.
Lockhart, P. J., C. J. Howe, A. C. Barbrook, A. W. D. Larkum, and D. Penny. 1999. Spectral analysis, systematic bias, and the evolution of chloroplasts. Mol. Biol. Evol. 16:573-576.
Maier, R. M., K. Neckermann, G. L. Igloi, and H. Kossel. 1995. Complete sequence of the maize chloroplast genome: gene content, hotspots of divergence and fine tuning of genetic information by transcript editing. J. Mol. Biol. 251:614-628.
Martin, W., B. Stoebe, V. Goremykin, S. Hansmann, M. Hasegawa, and K. V. Kowallik. 1998. Gene transfer to the nucleus and the evolution of chloroplasts. Nature 393:162-165.
Meyen, S. V. 1986. Hypothesis of the origin of the angiosperms from Bennetitales by gamoheterotrophy: transition of characters from one sex to another. Zh. Obshzh. Biol. 47:291-309 [in Russian].
Money, L. L., I. W. Bailey, and B. G. L. Swamy. 1952. The morphology and relationships of the Monimiaceae. J. Arnold Arb. 31:372-404.
Müller, T., and M. Vingron. 2000. Modeling amino acid replacement. J. Comput. Biol. 7:761-776.
Murray, M. G., and W. F. Thompson. 1980. Rapid isolation of high molecular weight DNA. Nucleic Acids Res. 8:4321-4325.
Ogihara, Y., K. Isono, and T. Kojima, et al. (19 co-authors). 2002. Structural features of a wheat plastome as revealed by complete sequencing of chloroplast DNA. Mol. Genet. Genomics 266:740-746.
Ohtani, K., H. Yamamoto, and K. Akimitsu. 2002. Sensitivity to Alternaria alternata toxin in citrus because of altered mitochondrial RNA processing. Proc. Natl. Acad. Sci. USA 99:2439-2444.
Ohyama, K., H. Fukuzawa, and T. Kohchi, et al. (13 co-authors). 1986. Chloroplast gene organization deduced from complete sequence of liverwort Marchantia polymorpha chloroplast DNA. Nature 322:572-574.
Parkinson, C. L., K. L. Adams, and J. D. Palmer. 1999. Multigene analyses identify the three earliest lineages of extant flowering plants. Curr. Biol. 9:1485-1488.
Pax, F. 1889. Monimiaceae. Pp. 94-105 in A. Engler and K. Prantl, eds. Die Natürlichen Pflanzenfamilien, III(2). W. Engelmann, Leipzig.
Perkins, J., and E. Gilg. 1901. Monimiaceae. Pp. 1-122 in A. Engler, ed. Das Pflanzenreich, V(101). W. Engelmann, Leipzig.
Perkins, J. 1925. Übersicht über die Gattungen der Monimiaceae. W. Engelmann, Leipzig.
Pichon, P. 1948. Les Monimiacées, famille hétérogène. Bull. Mus. Nat. Hist. Nat. Paris 20:383-384.
Posada, D., and K. A. Crandall. 1998. Modeltest: testing the model of DNA substitution. Bioinformatics 14:817-818.
Posada, D., and K. A. Crandall. 2001. Simple (wrong) models for complex trees: a case from retroviridae. Mol. Biol. Evol. 18:271-275.
Qiu, Y-L., J. Lee, F. Bernasconi-Quadroni, D. E. Soltis, P. S. Soltis, M. Zanis, E. A. Zimmer, Z. Chen, V. Savolainen, and M. W. Chase. 1999. The earliest angiosperms: evidence from mitochondrial, plastid and nuclear genomes. Nature 402:404-407.
Sato, S., Y. Nakamura, T. Kaneko, E. Asamizu, and S. Tabata. 1999. Complete structure of the chloroplast genome of Arabidopsis thaliana. DNA Res. 6:283-290.
Schmitz-Linneweber, C., R. M. Maier, J. P. Alcaraz, A. Cottet, R. G. Herrmann, and R. Mache. 2001. The plastid chromosome of spinach (Spinacia oleracea): complete nucleotide sequence and gene organization. Plant Mol. Biol. 45:307-315.
Shinozaki, K., M. Ohme, and M. Tanaka, et al. (23 co-authors). 1986. The complete nucleotide sequence of the tobacco chloroplast genome: its gene organization and expression. EMBO J. 5:2043-2049.
Soltis, P. S., D. E. Soltis, and M. W. Chase. 1999. Angiosperm phylogeny inferred from multiple genes as a tool for comparative biology. Nature 402:402-403.
Staden, R., K. F. Beal, and J. K. Bonfield. 2000. The Staden package, 1998. Methods Mol. Biol. 132:115-130.
Strimmer, K., and A. von Haeseler. 1996. Quartet puzzling: a quartet maximum likelihood method for reconstructing tree topologies. Mol. Biol. Evol. 13:964-969.
Swofford, D. L. 2002. PAUP*: phylogenetic analysis using parsimony (*and other methods). Version 4. Sinauer Associates, Sunderland, Mass.
Takhtajan, A. 1966. Systema et phylogenia Magnoliophytorum. Nauka, Moscow, Leningrad.
Van de Peer, Y., and R. De Wachter. 1994. TREECON for Windows: a software package for the construction and drawing of evolutionary trees for the Microsoft Windows environment. Comput. Appl. Biosci. 10:569-570.
von Wettstein, R. 1924. Handbuch der systematischen Botanik. 3rd edition. Franz Deuticke, Leipzig.
Wakasugi, T., J. Tsudzuki, S. Ito, K. Nakashima, T. Tsudzuki, and M. Sugiura. 1994. Loss of all ndh genes as determined by sequencing the entire chloroplast genome of the black pine Pinus thunbergii. Proc. Natl. Acad. Sci. USA 91:9794-9798.
Whelan, S., and N. Goldman. 2001. A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach. Mol. Biol. Evol. 18:691-699.
Williams, J. H., and W. E. Friedman. 2002. Identification of diploid endosperm in an early angiosperm lineage. Nature 415:522-526.
Yang, Z. 1997a. PAML: a program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci. 13:555-556.
Yang, Z. 1997b. How often do wrong models produce better phylogenies? Mol. Biol. Evol. 14:105-108.