Nees Institute for Biodiversity of Plants, University of Bonn, Bonn, Germany
Correspondence: E-mail: c.loehne{at}uni-bonn.de.
Abstract
Key Words: chloroplast, noncoding DNA, group II intron, petD, phylogeny, microstructural changes, basal angiosperms
Introduction
To infer phylogenetic relationships at deeper levels, rather conserved genes such as rbcL have been sequenced, whereas rapidly evolving noncoding regions served to infer relationships among species and genera. Examples of such rapidly evolving parts in the chloroplast genome are the intergenic spacers between atpB and rbcL, trnT and trnL, trnL and trnF, the group I intron in trnL, and the group II introns in rpl16, rps16, rpoC1, trnV, and ndhA (see Soltis and Soltis [1998] and Kelchner [2000, 2002] for reviews), all of which are located in the large and small single-copy regions. The group II introns in rpl2, rps12, and ndhB are situated in the highly conserved inverted repeat and were sequenced for a set of basal angiosperms by Graham and Olmstead (2000a) and Graham et al. (2000). Introns in particular possess a mosaic of highly conserved core elements that are responsible for their function alternating with sequence stretches that might be more or less freely evolving. Nevertheless, the overall variability of spacers and introns is higher than in most coding DNA. Therefore, it has been assumed that high substitution rates are present, leading to saturation and homoplasy, and that frequent length mutations cause homology assessment (alignment) of noncoding sequences to be difficult or even impossible in data sets covering a broad taxonomic spectrum.
Recently, Borsch et al. (2003) employed the noncoding parts of the trnT-trnF region, consisting of two spacers and the trnL group I intron, for deep-level phylogenetic analysis of angiosperms. Their study revealed that high length variability in trnT-trnF was confined to mutational hotspots in the intron and that these corresponded to certain stem-loop elements of the proposed secondary structure. Considering length mutations as single evolutionary events involving one or more nucleotides allowed for reliable alignment of trnT-trnF sequences. This finding and others suggest that microstructural changes in chloroplast genomes follow certain patterns and that understanding the nature of these patterns is essential for phylogenetic interpretation of length variability of sequences (Graham et al. 2000; Kelchner 2000). Analysis of the trnT-trnF data revealed a tree for basal angiosperms largely congruent with multigene, multigenome studies (Qiu et al. 1999; Soltis et al. 2000; Zanis et al. 2002) with most nodes gaining high statistical support. Several questions arise concerning the general significance of these findings. Does extreme variability in other noncoding cpDNA regions also correspond to particular structural elements (resulting in mutational hotspots), and can they be confidently aligned similar to trnT-trnF? Are other noncoding regions effective for deep-level phylogenetic studies as well, or is trnT-trnF an exception? Are there differences in the phylogenetic utility of different types of noncoding regions, such as spacers and group I and group II introns?
Variation in intron sequences is to a large extent correlated with the secondary structure of their RNA, which is essential for the self-splicing function of the intron (Learn et al. 1992). Based on differing RNA folding patterns, organelle introns are classified into group I and II (Michel, Umesono, and Ozeki 1989; Michel and Ferat 1995). Because the trnL intron employed by Borsch et al. (2003) is a group I intron, we searched for an omnipresent and alignable group II intron in the chloroplast genome to compare information content of group I and II introns. We restricted our analyses to the chloroplast genome because it is inherited as a single linkage unit. Consequently, differences between group I and group II introns could be more clearly linked to evolutionary processes operating at structurally and functionally different loci, without having to worry about effects of recombination, hybridization, or lineage sorting (Doyle 1992).
Structure and function of group II introns have been studied in detail by several authors (Michel, Umesono, and Ozeki 1989; Knoop and Brennicke 1993; Michel and Ferat 1995; Bonen and Vogel 2001; Fedorova, Mitros, and Pyle 2003). The secondary structure model (Michel, Umesono, and Ozeki 1989; Michel and Ferat 1995) has largely been validated in numerous experiments (see Kelchner [2002] for review). When group II introns were used as phylogenetic tools, their presence or absence was found to provide valuable information among land plant lineages (e.g., Qiu et al. 1998; Pruchner et al. 2002). In angiosperms, intron losses in different plant lineages have been reported from several chloroplast genes, such as rps16, rpoC1, and rpl16 (see Kelchner [2002] for review). At the sequence level, sound knowledge of the secondary structure allows recognition of structure-linked mutation patterns and, subsequently, their evaluation in a phylogenetic context (Kelchner 2002). So far, chloroplast group II intron sequences have yielded well-resolved phylogenies at the genus level (e.g., Kelchner and Clark 1997; Asmussen and Chase 2001; Clausing and Renner 2001). The mitochondrial nad5 intron proved useful among ferns and allies (Vangerow, Teerkorn, and Knoop 1999). To examine their broader applicability, we generated a data set covering the range of seed plants.
We selected the intron in petD for study because (1) no losses of this locus have been reported for angiosperms and gymnosperms, (2) initial alignment using sequences of available chloroplast genomes was successful, and (3) highly conserved sequences in the flanking regions are suitable for designing universal primers for amplification. To our knowledge, neither the petD intron nor the petB-petD intergenic spacer had previously been used in phylogenetic studies. The petD gene is part of the psbB operon (Westhoff and Herrmann 1988). The pentacistronic primary transcript of this operon is processed into monocistronic and dicistronic mRNAs, with petB and petD normally staying connected as a dicistronic mRNA (Rock, Barkan, and Taylor 1987; Tanaka et al. 1987; Dixit et al. 1999; Monde, Greene, and Stern 2000).
The present study has involved characterization of patterns of variability and homoplasy in the petD intron of flowering plants. We discuss the impact of structural (and functional) constraints on substitutions and microstructural changes in petD and test hypotheses concerning the phylogenetic relationships of basal angiosperms. This second aspect of our work has allowed us to evaluate the potential of group II introns as molecular markers for deeper-level phylogenetic problems.
Materials and Methods
DNA Isolation
Total genomic DNA was isolated from fresh or silica gel-dried leaf tissue. To gain an optimal quantity of high-quality DNA, a modified CTAB method with triple extractions was used (Borsch et al. 2003). After chloroform extraction, DNA was precipitated with isopropanol, resuspended in TE, and further purified by ammonium acetate and sodium acetate washing steps followed by ethanol precipitation.
Primer Design
Universal primers to amplify the petD region (consisting of the petB-petD intergenic spacer, the petD 5' exon, and the petD intron) in seed plants were designed based on the completely sequenced chloroplast genomes of Arabidopsis thaliana (GenBank accession number NC001284), Lotus japonicus (GenBank accession number NC002694), Nicotiana tabacum (GenBank accession number NC001879), Pinus thunbergii (GenBank accession number NC001631), and Spinacia oleracea (GenBank accession number NC002202). The petD region was amplified in one fragment with the forward primers PIpetB1411F (5'-GCCGTMTTTATGTTAATGC-3') or PIpetB1365F (5'-TTGACYCGTTTTTATAGTTTAC-3') that anneal to the 3' exon of petB and the reverse primer PIpetD738R (5'-AATTTAGCYCTTAATACAGG-3') that anneals to the 3' exon of petD (fig. 1). PIpetD346R (5'-TCTTCCTYAGATCCC-3') was designed as an additional internal sequencing primer located in domain I of the petD intron because electropherograms were not readable after homonucleotide strings in the petB-petD spacer of Aristolochia and Ginkgo.
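The primers above contain the degenerate positions M and Y. As an illustration only, the following Python sketch expands each degenerate primer into the unambiguous oligonucleotides it represents, using the standard IUPAC nucleotide codes. The function name `expand_primer` is hypothetical and is not part of the original protocol.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def expand_primer(primer):
    """Return every unambiguous sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]

# Primer sequences as given in the text (5' to 3').
primers = {
    "PIpetB1411F": "GCCGTMTTTATGTTAATGC",
    "PIpetB1365F": "TTGACYCGTTTTTATAGTTTAC",
    "PIpetD738R": "AATTTAGCYCTTAATACAGG",
    "PIpetD346R": "TCTTCCTYAGATCCC",
}

for name, seq in primers.items():
    variants = expand_primer(seq)
    print(f"{name}: {len(seq)} nt, {len(variants)} variants")
    for variant in variants:
        print("   ", variant)
```

Each primer carries a single twofold-degenerate position, so each corresponds to a mixture of two oligonucleotides.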
Sequence Alignment
Noncoding regions are characterized by relatively high numbers of length mutations in addition to substitutions. Levinson and Gutman (1987) suggested slipped strand mispairing (SSM) as a mechanism that generates length mutations. SSM might be the underlying process for simple sequence repeats, whereas hairpin structures have been shown to favor inversions (Kelchner and Wendel 1996). For correct primary homology assessment (De Pinna 1991) the molecular processes leading to microstructural changes, first pointed out by Gu and Li (1995), have to be considered. Unfortunately, our understanding of the exact mechanisms leading to microstructural changes is still poor, but the resulting sequence motifs can be observed. Although there has been considerable progress in automated multiple-sequence alignment methods in recent years, ranging from early work on global Needleman-Wunsch scoring criteria to more recent local segment-based methods (e.g., Dialign2 [Morgenstern 1999]), alignment programs may still fail to correctly align sequences containing repeats or inversions. In the present study, such motifs were identified by eye and aligned manually using QuickAlign version 1.5.6 (Müller and Müller 2003). Rules for alignment were proposed on the basis of inferred mechanisms of sequence evolution and the similarity-based criteria for homology assessment suggested by Golenberg et al. (1993), Kelchner and Clark (1997), Hoot and Douglas (1998), Graham et al. (2000), Kelchner (2000), Simmons and Ochoterena (2000), and Borsch et al. (2003). We also suggest additional principles to those described by Borsch et al. (2003). The necessity for these became apparent after considering sequence variability patterns in the petD data set. The principles for alignment follow and are illustrated in figure 2. The petD alignment can be obtained from the corresponding author.
(2) For gap placement, positional homology of sequence motifs was given priority when alternative gap placements were possible (fig. 2a). Thus, simple sequence repeats were accounted for as internal paralogs. Based on the sequence variability found in this data set, we considered multiple repeats as multiple events (fig. 2b). A prerequisite, of course, is that primary sequences are sufficiently complex to allow unambiguous motif recognition.
(3) Homonucleotide strings strictly involving only one kind of nucleotide (microsatellites) occurred in different positions. Formally, homonucleotide strings can be considered as stepwise indels (overlapping indels) and can be aligned according to a parsimony principle of individual steps as outlined in principle 6. However, the probability of reversal or parallel length mutations may be quite high, depending on the number and size of repeat units. Borsch et al. (2003) preferred not to align such microsatellites, because prevalence of single nucleotides hinders motif recognition. Microsatellites were frequent in this petD data set, and they involved no substitutions that could lead to false phylogenetic signal. Thus, for practical reasons, we treated them as overlapping indels.
(4) Entire indels, that is, indels of the same positional extension occurring in several taxa, were aligned in the same column(s). In cases where sequence composition adjacent to an entire indel was not sufficient to restrict the placement of this indel to only a single position, we followed the suggestion of Simmons and Ochoterena (2000) and placed the gap in the same position (same column) in all sequences, because it would be most parsimonious to assume a single event in all taxa (fig. 2c).
(5) Substitutions in indels occurring only in one copy, template or repeat, were excluded from phylogenetic analysis by introduction of ambiguity codes (fig. 2d).
(6) In the case of overlapping indels, a parsimony principle was employed to arrange gaps in a way that globally requires the least number of length mutational events.
(7) Regions of uncertain homology were excluded from analysis (hotspots sensu Borsch et al. [2003]).
(8) Inversions in the petB-petD spacer were reverse complemented in the alignment whenever recognized (fig. 2e). Left unchanged, an inversion would appear as a series of substitutions giving false signal (Kelchner 2000; Quandt, Müller, and Huttunen 2003).
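The effect of principle 8 can be sketched in a few lines of Python. The sequences below are hypothetical toy data (not taken from the petD alignment), and the helper names are illustrative. An inverted segment flanked by short inverted repeats produces several apparent substitutions against an uninverted reference; these disappear once the segment is reverse complemented.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def restore_inversion(seq, start, end):
    """Reverse-complement seq[start:end] (0-based, end-exclusive)."""
    return seq[:start] + reverse_complement(seq[start:end]) + seq[end:]

def mismatches(a, b):
    """Count positions at which two equally long sequences differ."""
    return sum(x != y for x, y in zip(a, b))

# Toy example: a 6-nt segment between inverted-repeat flanks
# (GGCATC and its reverse complement GATGCC).
reference = "GGCATC" + "AAGTTC" + "GATGCC"
inverted = "GGCATC" + reverse_complement("AAGTTC") + "GATGCC"

print(mismatches(reference, inverted))                            # apparent substitutions
print(mismatches(reference, restore_inversion(inverted, 6, 12)))  # after correction
```

In this toy case, the single inversion event would otherwise be counted as four independent substitutions.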
Coding of Length Mutational Events
Several workers recently have developed methods to code information from length mutational events and to utilize them in phylogeny reconstruction (e.g., Graham and Olmstead 2000b; Graham et al. 2000; Kelchner 2000; Simmons and Ochoterena 2000). Simmons and Ochoterena (2000) proposed two kinds of formalized coding strategies, both recognizing gaps as characters. We basically used the simple indel-coding approach, thereby assigning character state 1 when the sequence was present in the respective taxon and character state 0 if there was a gap. However, we found patterns of microstructural changes present in our data set that apparently were not covered by the existing simple indel-coding principles. These limitations stem from strictly focusing on gaps as characters, rather than considering any microstructural change as a character. The following additional principles were, thus, employed:
(1) Inversions were coded as a single binary character (1 = present, 0 = lacking [fig. 5]).
(3) Multiple repeats within a given sequence were coded as separate indels. This procedure is a consequence of extended alignment principle 2.
(4) Length mutations within homonucleotide strings were not coded. For microsatellites, increased rates of length mutational events may lead to high levels of homoplasy. This condition becomes readily apparent when including sequences from a broad spectrum of plants. Lutzoni et al. (2000) suggested using multiple states for length-variable homonucleotide strands. We omitted these indels in a more conservative approach because current understanding of the evolutionary processes involved at these sites is still very limited.
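The basic simple indel-coding scheme described above can be illustrated with a minimal Python sketch. It is a simplified reading of Simmons and Ochoterena (2000), not the procedure actually used to build the matrix in this study. Every gap with distinct start and end columns becomes one binary character (1 = sequence present, 0 = gap), and a taxon whose longer gap completely spans that character is scored as inapplicable (?). The toy alignment and function names are hypothetical, and the additional principles listed above (inversions, repeats, homonucleotide strings) are not implemented.

```python
def gap_runs(seq):
    """Return the set of (start, end) column ranges occupied by gaps."""
    runs, start = set(), None
    for i, ch in enumerate(seq + "A"):  # sentinel closes a trailing gap run
        if ch == "-" and start is None:
            start = i
        elif ch != "-" and start is not None:
            runs.add((start, i))
            start = None
    return runs

def simple_indel_coding(alignment):
    """Code each distinct gap as a binary character; return (indels, matrix)."""
    per_taxon = {name: gap_runs(seq) for name, seq in alignment.items()}
    indels = sorted(set().union(*per_taxon.values()))
    matrix = {}
    for name, runs in per_taxon.items():
        row = []
        for start, end in indels:
            if (start, end) in runs:
                row.append("0")  # gap: sequence absent
            elif any(r_start <= start and end <= r_end for r_start, r_end in runs):
                row.append("?")  # hidden inside a longer gap: inapplicable
            else:
                row.append("1")  # sequence present
        matrix[name] = "".join(row)
    return indels, matrix

# Hypothetical toy alignment.
alignment = {
    "taxonA": "ACGTACGTAC",
    "taxonB": "ACG--CGTAC",
    "taxonC": "ACG--CGTAC",
    "taxonD": "AC-----TAC",
}
indels, matrix = simple_indel_coding(alignment)
print(indels)
for name, row in matrix.items():
    print(name, row)
```

Here the 2-nt gap shared by taxonB and taxonC and the 5-nt gap in taxonD are coded as two separate characters, with taxonD scored as inapplicable for the shorter one.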
Secondary Structure
The large-scale study on group II introns by Michel, Umesono, and Ozeki (1989) has provided a secondary structure model that is nowadays widely accepted. For the purpose of this study, the calculation of secondary structure appeared to be unnecessary, because group II intron core structures are highly conserved, and visual examination of sequences allowed the recognition of reverse-complement regions and the demarcation of domain boundaries and structural elements. This approach was facilitated by the petD intron sequences of maize, tobacco, spinach, and the liverwort Marchantia polymorpha already examined by Michel, Umesono, and Ozeki (1989). Classification of elements such as stems, loops, bulges, and interhelical sequences followed Vawter and Brown (1993) and the modifications by Kelchner (2002). Stems are helices formed by complementarily pairing nucleotides (including G-U wobble-pairs where they were not terminating a helix). Single-stranded nucleotide stretches terminating a helix are termed loops, whereas unpaired nucleotides within stems are bulges. Interhelical sequences are those single-stranded nucleotides connecting helices of adjacent domains and subdomains.
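The definition of a stem used here (complementary pairing, with G-U wobble pairs accepted except where they would terminate a helix) can be expressed as a small check. This is a sketch with hypothetical toy strands and function names. It only tests whether two given strands satisfy the definition and does not predict secondary structure.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}

def to_rna(seq):
    """Upper-case a sequence and replace T by U."""
    return seq.upper().replace("T", "U")

def classify_pairs(strand5, strand3):
    """Pair the 5' strand with the reversed 3' strand, both written 5' to 3'."""
    a, b = to_rna(strand5), to_rna(strand3)[::-1]
    if len(a) != len(b):
        raise ValueError("strands must have equal length")
    kinds = []
    for x, y in zip(a, b):
        if (x, y) in WATSON_CRICK:
            kinds.append("WC")
        elif (x, y) in WOBBLE:
            kinds.append("GU")
        else:
            kinds.append("--")  # no pairing possible
    return kinds

def is_stem(strand5, strand3):
    """True if all positions pair and neither terminal pair is a G-U wobble."""
    kinds = classify_pairs(strand5, strand3)
    return bool(kinds) and "--" not in kinds and kinds[0] == "WC" and kinds[-1] == "WC"

print(classify_pairs("GGCGA", "UUGCC"), is_stem("GGCGA", "UUGCC"))
print(classify_pairs("GGCG", "UGCC"), is_stem("GGCG", "UGCC"))
```

The first toy helix contains an internal G-U pair and qualifies as a stem. In the second, the G-U pair would terminate the helix, so under the definition above the stem would end one pair earlier.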
Phylogenetic Analysis
For phylogeny reconstruction, the following data partitions were analyzed: intron sequences alone (= intron matrix), intron and spacer sequences combined (= intron + spacer matrix), all indels alone (= indel matrix), intron sequences plus respective indels (= intron + indel matrix), and intron and spacer sequences plus all indels (= combined matrix). Furthermore, all characters of the intron matrix that were assigned to stems and nonpairing elements were analyzed as separate partitions. All characters were equally weighted, and gaps were treated as missing characters. Before combining individual matrices, incongruence-length difference tests were performed in 1,000 random addition replicates using PAUP* version 4.0b10 (Swofford 2002). Parsimony analysis (MP) with PAUP* 4.0b10 employed heuristic searches with 1,000 replicates of random addition and tree bisection and reconnection (TBR) branch swapping. For small matrices, the limit of trees saved was set to 10,000. Measures of support for each node were obtained through bootstrapping (BS) 500 replicates (each with 10 random addition replicates) using PAUP* 4.0b10 and Bremer support (= decay) analysis using PRAP (10 random addition replicates per constraint tree, parsimony ratchet not employed) (Müller 2004).
Bayesian inference (BI) of the substitution-based matrices (intron matrix and intron + spacer matrix) was performed using MrBayes version 2.01 (Huelsenbeck and Ronquist 2001). Following the Akaike Information Criterion in Modeltest version 3.06 (Posada and Crandall 1998), a GTR+I+G model of molecular evolution was implemented. We conducted four runs of Metropolis-coupled Markov Chain Monte Carlo analysis, each with four chains and saving one tree every 100 generations for 1,000,000 generations, starting with a random tree. The temperature for heating was set at 0.2. After 50,000 generations in the first two runs and 70,000 generations in the third and fourth runs, likelihood scores appeared to be stationary. Consequently, the burn-in was set at these generations, sampling only the trees obtained thereafter. GC content and transition:transversion (ti:tv) ratios were calculated using MEGA version 2.1 (Kumar et al. 2001). Indel and substitution characters were optimized on one of the shortest trees found in the combined data set using Winclada version 1.00.08 (Nixon 2002), assuming accelerated transformation (ACCTRAN).
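The number of trees retained after burn-in follows from the sampling settings given above. The short Python calculation below makes the arithmetic explicit. It assumes that the starting tree at generation 0 is not counted and that the tree saved at the burn-in generation itself is discarded, so the exact counts in the program output may differ by one per run.

```python
GENERATIONS = 1_000_000   # generations per run (from the text)
SAMPLE_EVERY = 100        # one tree saved every 100 generations
BURNIN_GENERATIONS = [50_000, 50_000, 70_000, 70_000]  # runs 1 to 4

def retained_trees(generations, sample_every, burnin):
    """Trees kept after discarding those sampled during burn-in."""
    sampled = generations // sample_every   # assumption: generation 0 not counted
    discarded = burnin // sample_every
    return sampled - discarded

per_run = [retained_trees(GENERATIONS, SAMPLE_EVERY, b) for b in BURNIN_GENERATIONS]
print(per_run)       # trees retained in each run
print(sum(per_run))  # trees retained over all four runs
```

Under these assumptions, each run saves 10,000 trees, of which 9,500 (runs 1 and 2) or 9,300 (runs 3 and 4) are retained.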
Results
Microstructural Changes
A total of 241 length mutations (Appendix 2 in Supplementary Material online) were coded in a binary matrix, 73 of which belong to the spacer and 168 of which belong to the intron. Most indels were found to be simple sequence repeats (SSRs; 58% in the spacer and 46% in the intron). All other indels are insertions of unknown origin or deletions. Within repeats, substitutional differences were rare; in 33% of the SSRs, repeat and template differ by one or, rarely, more substitutions. Indel length varies between 1 and 76 nt. Indels longer than 10 nt were relatively rare (only 7% of all indels). Among the shorter indels, most were SSRs, with single-base insertions or deletions representing the most frequent size class (25%), followed by 4-nt and 5-nt indels (16% and 17%, respectively). As mentioned above, the highest number of indels was observed in domains I and IV, whereas in domain V and domain VI, only one length mutational event could be detected (table 1). Concerning structural partitions, indels were most frequent in loops and less frequent in stems (table 2). Actually, only three indels were found in stem regions, all of them being single-base length mutations. Two inversions could be recognized in the spacer (fig. 5), a 4-bp to 6-bp inversion in Cabomba (accurate length cannot be detected because of a palindromic motif) and a 33-nt inversion in Impatiens. In both cases, the inversions were flanked by short (6 bp) inverted-repeat stretches. No inversions were detected in the intron.
Considering the intron secondary structure, differences in percentage of variable and informative characters as well as GC content and ti:tv ratio between the six domains (table 1) and between structural components (table 2) become evident. Domain IV has the highest percentage of variable and informative characters but lowest GC content. In contrast, domains V and VI, both containing relatively small loops, are characterized by low percentage of variable and informative characters but high GC content. Generally, stems have much higher GC contents (49%, SD = 1.0) than do loops (31%, SD = 1.8 [table 2]). Stems are also characterized by higher ti:tv ratios (4.1, SD = 1.0) than are loop stretches (2.5, SD = 0.9), but ratios are even higher in bulges (4.6, SD = 3.3) and interhelical sequences (4.7, SD = 6.4).
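For readers unfamiliar with the two statistics, the Python sketch below shows how GC content and a raw pairwise ti:tv count are obtained from aligned sequences. It uses hypothetical toy sequences and simple observed counts. It does not reproduce the estimators implemented in MEGA, which were used for the values reported here.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def gc_content(seq):
    """Fraction of G and C among unambiguous bases; gaps are ignored."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    return sum(b in "GC" for b in bases) / len(bases)

def ti_tv_counts(seq1, seq2):
    """Count observed transitions and transversions between aligned sequences."""
    ti = tv = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue  # identical site, gap, or ambiguity code
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            ti += 1   # purine <-> purine or pyrimidine <-> pyrimidine
        else:
            tv += 1   # purine <-> pyrimidine
    return ti, tv

# Hypothetical toy sequences.
seq1 = "ACGTACGTAC"
seq2 = "GCGTATGTCC"
ti, tv = ti_tv_counts(seq1, seq2)
print(gc_content(seq1), ti, tv, ti / tv)
```

In the toy pair, two transitions (A/G and C/T) and one transversion (A/C) are observed, giving a raw ti:tv of 2.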
Trees Obtained from Individual and Combined Data Partitions
Incongruence-length difference tests (substitutions versus indels and spacer versus intron) indicated that data partitions are not significantly incongruent (P values ranging from 0.09 to 1.0); the partitions were therefore combined for phylogenetic analysis. Table 3 gives an overview of the trees obtained from parsimony analyses. Analysis of the intron matrix (first column of table 3) revealed 114 shortest trees with a CI of 0.564 and an RC of 0.365. By combining intron and spacer sequences (second column) the number of trees was reduced to 32, but CI and RC increased only slightly. Maximum parsimony (MP) analysis of the indel matrix revealed lower homoplasy (CI = 0.898, RC = 0.796 [Appendix 3 in Supplementary Material online]). In the combined analyses of indels and substitutions, six or two MP trees were recovered (fig. 6; table 3, columns 4 and 5). CI and RC values were considerably higher than in analyses of substitutions alone (columns 1 and 2) but lower than in the indel matrix (column 3). The stem partition comprises only 192 characters (nonpairing = 800) and resolves only a few clades, such as Nymphaeaceae and Piperaceae (Appendix 5 in Supplementary Material online).
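The homoplasy measures compared above are related by the standard definitions CI = M/S, RI = (G - S)/(G - M), and RC = CI x RI, where M is the minimum possible number of steps, S the observed number of steps on the tree, and G the maximum possible number of steps. The Python sketch below applies these definitions to hypothetical step counts and, as a consistency check, derives the RI implied by the CI and RC values quoted in this paragraph. The step counts are invented for illustration, and the implied RI values are computed here rather than taken from table 3.

```python
def consistency_index(min_steps, obs_steps):
    """CI = M / S."""
    return min_steps / obs_steps

def retention_index(min_steps, obs_steps, max_steps):
    """RI = (G - S) / (G - M)."""
    return (max_steps - obs_steps) / (max_steps - min_steps)

def rescaled_consistency_index(min_steps, obs_steps, max_steps):
    """RC = CI * RI."""
    return (consistency_index(min_steps, obs_steps)
            * retention_index(min_steps, obs_steps, max_steps))

# Hypothetical step counts, for illustration only.
print(consistency_index(10, 16),
      retention_index(10, 16, 30),
      rescaled_consistency_index(10, 16, 30))

# RI implied by the CI and RC values quoted in the text (RI = RC / CI).
reported = {"intron matrix": (0.564, 0.365), "indel matrix": (0.898, 0.796)}
implied_ri = {name: round(rc / ci, 3) for name, (ci, rc) in reported.items()}
print(implied_ri)
```

The implied retention indices are about 0.65 for the intron matrix and about 0.89 for the indel matrix, which is in line with the lower homoplasy of the indel characters.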
Discussion
Kind and Distribution of Length Mutations
Analysis of length mutations in the petD data set allowed identification and coding of 241 indels. The resulting indel matrix is one of the largest ever compiled (Simmons, Ochoterena, and Carr 2001), allowing a thorough analysis of frequency, size distribution, and kind of length mutations. Most indels (57%) in the petD region are simple sequence repeats. This is congruent with the findings of Graham et al. (2000) in the chloroplast inverted repeat and Borsch et al. (2003) in the trnT-trnF region. Among the SSRs, most indels result from single-base events or are 4 to 5 bp long. Indels of 2 to 3 bp or more than 6 bp are considerably less frequent, as has also been observed in other noncoding cpDNA regions (Graham et al. 2000; Borsch et al. 2003).
Structural Effects on Sequence Evolution in the Group II Intron
Group II introns can be subdivided into six domains, with domain I being the largest and most complex, whereas domains II to VI are simple stem-loop structures (Michel, Umesono, and Ozeki 1989; Michel and Ferat 1995). Categorization of these components for group II introns makes it possible to compare sequence evolution of different structural partitions and allows analysis of the contribution of different structural elements to phylogeny reconstructions. Similar to the rpl16 intron in Myoporaceae (Kelchner 2002), almost 90% of the sequence could be unequivocally assigned to elements such as stems, loops, bulges, and interhelical stretches in the petD intron. The comparison of petD intron sequences across basal angiosperms shows stem regions to contain fewer variable sites and also fewer potentially informative characters than loops and other nonpairing stretches. The same pattern is found in the rpl16 intron in Myoporaceae (Kelchner 2002), although distances are generally smaller in this family-level data set. Thus, the group II intron core structure seems to be a governing factor for mutational dynamics, regardless of the gene in which the intron is inserted.
Within the petD intron, GC content and ti:tv ratios are higher in stems and lower in loops and, corresponding to that, GC content and ti:tv ratios are higher in domains that contain large proportions of stem regions (domains I, V, and VI) and lower in domains consisting mainly of loop stretches (table 1). GC-rich stems have a ti:tv of 4.12 (SD = 1.0), whereas AT-rich loops have a ti:tv of 2.52 (SD = 0.9 [table 2]). It appears that ti:tv ratios are a function of the respective GC content, in line with observations of Bakker et al. (2000) on trnL-F sequences. This finding is contrary to prevailing thinking that high ti:tv ratios depend on saturation through multiple transitions (Hillis, Allard, and Miyamoto 1993), thereby reflecting their evolutionary distance. Maintenance of secondary structure in GC-rich stem regions through compensatory mutations might further favor transitions (Rousset, Pélandakis, and Solignac 1991; Bakker et al. 2000). Obviously, stems evolve under high functional constraints because they are essential for the secondary structure and splicing function of the group II intron. Therefore, stems are less variable regarding nucleotide substitutions as well as length mutations, in line with findings in the mitochondrial rps3 intron (Laroche and Bousquet 1999). In fact, only three indels have been observed in stems, all of which are 1 bp. Indels longer than 1 bp only occur in nonpairing sequence stretches. In contrast, nonpairing DNA is more AT rich and bears more variable and potentially informative characters, as well as length mutations.
The six intron domains are characterized not only by a typical secondary structure but also by their specific function in the splicing reaction (Dib-Hajj et al. 1993; Costa, Michel, and Westhof 2000). Domain VI, for instance, is one of the most important structures in the group II intron. Functional constraints on the evolution of this particular domain are supposed to be high. In fact, the frequency of length mutations in domain VI is considerably lower than in domain I, although both have almost the same amount of loop stretches relative to their total sequence length. Differences in domain conservation imply differences in phylogenetic utility. Less-conserved regions (e.g., domains II to IV and loops in general) should provide information for the terminal nodes, whereas more-conserved regions (domains V and VI and stems in general) possibly provide information for basal nodes. Because findings on different molecular evolution in domains of the petD intron corroborate findings made in rpl16 (Kelchner 2002), these considerations might be valid for group II introns in general. Parts of the less-conserved domain Ic and d2 loops, as well as of the domain IV loop of the petD intron, were so variable that they could not be aligned unambiguously across seed plants. Similar to the trnL intron (Borsch et al. 2003), hotspots are strictly confined to loop stretches and do not comprise more than 18% of the whole-intron sequence length (20% in trnL). Thus, they do not impair the general utility of that kind of noncoding cpDNA for phylogenetic reconstruction on higher taxonomic levels.
Molecular Evolution of the Spacer
In contrast to the trnT-L and trnL-F spacers, the petB-petD intergenic spacer is less variable. Because the whole petB-petD region persists as a dicistronic mRNA after transcription, it can be assumed that RNA secondary structure is important for the evolution of the petB-petD spacer. Moreover, this spacer is considered to play an important role in translation of the petD gene, by containing a sequence motif that allows ribosomes to detect the petD initiation codon (Monde, Greene, and Stern 2000). However, the secondary structure of the petB-petD intergenic spacer has not been analyzed, and information on sites relevant for translation of the petD gene is not yet available. The two inversions observed in Cabomba and Impatiens are associated with short inverted repeats, indicating the presence of stem-loop structures. In other taxa, the regions enclosed by these inverted repeats show numerous length mutations as well as substitutions (see figure 5b).
Phylogenetic Signal of petD Sequence Data
Differential analysis of data partitions and the respective tree statistics provide evidence for the high potential of microstructural changes as phylogenetic characters. Overall, indel characters are considerably less homoplastic than substitutions in this petD data set (table 3, columns 2 and 3). This finding becomes evident when comparing CI and RC values of the indel tree (Appendix 3 in Supplementary Material online) with the trees inferred from substitutions only (indels: CI = 0.895, RC = 0.789; substitutions: CI = 0.566, RC = 0.368). The resolution of the indel tree is even more striking, given that only 59 out of 168 indels in the intron are parsimony informative (35%). The spacer yields 21 parsimony-informative indels out of 73. Most of the parsimony-informative indels (71%) are synapomorphies (CI, RI = 1), whereas 23 out of 79 informative indels were reconstructed to have originated two or more times independently (empty boxes in figure 6). Homoplastic indels in the intron are generally located in loop regions. In the spacer, only indel 59 is homoplastic, which is part of a possible loop between inverted-repeat stretches (fig. 5b). This confirms the theoretical expectation that structural constraints have effects on the frequency of length mutations.
Indels are relatively rare at the deepest nodes within angiosperms. Nevertheless, indel number 53, a simple sequence repeat of 4 nt, is synapomorphic for the magnoliid clade (comprising Magnoliales, Laurales, Canellales, and Piperales [fig. 5b]). The magnoliids are one of the major recently recovered angiosperm clades. This clade has also been revealed by substitution-based trees of this and of other noncoding (Borsch et al. 2003) and coding data sets (e.g., Qiu et al. 1999, Hilu et al. 2003) and now is clearly substantiated by indel information. Compared with substitutions, indels may be regarded as independent evidence because microstructural changes result from different mutational processes.
Although petD sequences reveal Amborella and Nymphaeales as first branching angiosperms, they provide no statistical support for the basal grade (figs. 6 and 7). Short branch lengths (phylogram in figure 7) indicate a possible lack of mutations that were fixed during early divergence of angiosperm lineages. Similarly, there is no support from the indels for these nodes (Appendix 3 in Supplementary Material online and figure 6). Plotting substitutions on the tree (Appendix 4 in Supplementary Material online) allowed assessment of homoplasy at particular sites and an assignment of individual characters to structural elements of the group II intron. In total, the backbone of angiosperms (nodes from Amborella to eudicots) is supported by only 28 substitutions among the intron characters, averaging four to five per node. All of these substitutions occur in nonpairing elements (21 in loops, five in bulges, and two in interhelical elements). Synapomorphic states supporting angiosperms above Amborella are in character 1509 and, for angiosperms above Nymphaeales, in character 1014 (exon binding site 1). A substitution in character 1564 is synapomorphic for magnoliids. All other characters variable at the basal nodes are homoplastic, often exhibiting repeated substitutions or reversals within eudicots and monocots. Thus, short branches at the base of the angiosperm tree are predominantly caused by a lack of mutations, which may be explained by either a rapid radiation of these lineages or substitutional rates increasing through time. In addition, the respective variable characters are likely to be homoplastic because they are mostly located in rather freely evolving loops of the intron. Microstructural changes also have accumulated in unpaired elements, whereas stems are structurally conserved and show almost no length variability across angiosperms. The rates of length mutations present in unpaired elements of petD obviously have not led to noticeable homoplasy. 
The above-mentioned structural conservation of stems extends to their conservation in primary sequence. Across the angiosperm tree, substitutions are very few in the stem partition of petD (Appendix 5b in Supplementary Material online). Contrary to the expectation that stem elements would provide no information for terminal nodes, some terminals are resolved in the stem partition tree, such as Nymphaeales, the genus Piper, or the Annonaceae. These lineages accumulate both indels and substitutions (see Appendix 5 in Supplementary Material online and figure 6), which points to a lineage-specific acceleration of mutational rates. Higher variability, even in stem elements, may be caused by relaxed constraints on helical parts of the petD intron in these lineages. Accumulation of length mutations may be a trend in the chloroplast genome of Piperaceae and the Nymphaeales because this accumulation was also observed in the trnT-trnF region (Borsch et al. 2003).
Substitutional patterns in stem regions are often biased by compensatory mutations because maintenance of secondary structure is essential for the self-splicing mechanism. Because a single evolutionary event can thus result in two substitutions, the signal from such substitutions might be overweighted in phylogenetic analysis. Therefore, several authors have argued for not including such characters in phylogenetic analyses. In the petD intron, 192 out of 1,055 characters belong to stem regions, but out of these characters, only 64 are variable (in contrast to 406 variable characters in the whole intron). Thus, a maximum of 16% of all substitutions (64 of 406 variable characters) might be compensatory mutations. This percentage is higher than in the trnL intron (7% [Borsch et al. 2003]) but still considerably lower than in 18S rDNA (73% [Soltis et al. 1999]). Compensatory mutations, therefore, probably play a minor role in sequence evolution of the petD intron.
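The upper bound quoted above follows from simple arithmetic on the character counts given in the text. The following is a minimal sketch of that calculation, not part of the original analysis; the function and variable names are hypothetical, and it assumes (as the text does) that every variable stem character is counted as potentially compensatory.

```python
def max_compensatory_share(variable_stem_chars: int, variable_chars_total: int) -> float:
    """Upper bound on the share of variable characters that could be compensatory.

    Assumption: every variable character in a stem is treated as potentially
    compensatory, so the true share can only be equal to or lower than this.
    """
    if variable_chars_total <= 0:
        raise ValueError("total number of variable characters must be positive")
    if not 0 <= variable_stem_chars <= variable_chars_total:
        raise ValueError("stem count must lie between 0 and the total")
    return variable_stem_chars / variable_chars_total


# Character counts as reported in the text for the petD intron.
STEM_CHARS = 192        # characters located in stem regions
INTRON_CHARS = 1055     # all characters of the intron
VARIABLE_STEM = 64      # variable characters within stems
VARIABLE_INTRON = 406   # variable characters in the whole intron

share = max_compensatory_share(VARIABLE_STEM, VARIABLE_INTRON)
stem_fraction = STEM_CHARS / INTRON_CHARS           # proportion of the intron in stems
variable_within_stems = VARIABLE_STEM / STEM_CHARS  # proportion of stem characters that vary

print(f"maximum compensatory share: {share:.1%}")
print(f"stem characters in intron:  {stem_fraction:.1%}")
print(f"variable stem characters:   {variable_within_stems:.1%}")
```

Under these assumptions the bound rounds to the 16% given in the text. The same counts also show that stems make up roughly one fifth of the intron but contribute a smaller share of its variable characters.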
This study has demonstrated that the partitioned analysis of structural elements (Appendix 5 in Supplementary Material online) is a valuable tool for unraveling effects of mutational dynamics on phylogenetic signal. However, the present petD data set only contains 192 stem characters, 45 of which are potentially parsimony informative across seed plants. Thus, stem partitions will not have the potential to fully resolve parts of the angiosperm tree, simply because there is insufficient information. Additional group II intron data sets may well provide further insight.
Phylogeny of Basal Angiosperms
The trees inferred from the petD region are consistent with those based on multigene, multigenome data sets (Qiu et al. 1999; Soltis et al. 2000; Zanis et al. 2002) and with trees obtained from other rapidly evolving genomic regions such as trnT-trnF (Borsch et al. 2003) and matK (Hilu et al. 2003). The petD data further substantiate Amborella as sister to all other angiosperms, although a contrary hypothesis on the root of angiosperms has recently been proposed (Goremykin et al. 2003).
Within Nymphaeales, three clades are clearly supported: (1) Cabombaceae, (2) the genus Nuphar, and (3) a clade consisting of Nymphaea + Victoria. However, the monophyly of Nymphaeaceae, comprising Nuphar, Nymphaea, and Victoria, is not substantiated by the petD data set. Nuphar is found either sister to Cabombaceae (MP [fig. 6]) or first branching in Nymphaeaceae (BI [fig. 7]). Further studies are needed to clarify these relationships because resolution and support among the three major Nymphaeales lineages are also low in other studies (e.g., Qiu et al. 1999; Soltis et al. 2000; Zanis et al. 2002; Borsch et al. 2003), or monophyly of Nymphaeaceae s.str. was assumed a priori (Les et al. 1999).
The magnoliids, consisting of Laurales + Magnoliales and Piperales + Canellales, are consistently resolved in petD analyses. This underscores the utility of rapidly evolving regions, including petD (Borsch et al. 2003; Hilu et al. 2003). Apart from these regions, magnoliids have been inferred only by multigene analyses combining all three genomic compartments (Qiu et al. 1999), not by trees based on single, slowly evolving genes. Within magnoliids, results of this petD data set are particularly relevant because both BI and MP infer Piperales and Canellales as sister groups. In all previous analyses, there has been medium or only low statistical support for this sister-group relationship, and BI of matK sequences even found Piperales first branching within magnoliids (Hilu et al. 2003). Thus, molecular evidence seems to converge upon a close relationship between Piperales and Canellales, contrary to phylogenetic analyses of morphological characters (Doyle and Endress 2000). Within Magnoliales, the petD data set provides high support and a topology that exactly mirrors the conclusions drawn in a recent analysis by Sauquet et al. (2003) using molecules and morphology. The respective positions of Ceratophyllum and Chloranthaceae, as well as monocots and eudicots, could not be clarified here because of the small number of characters in petD (half of the trnT-trnF data set [Borsch et al. 2003]); this clarification will have to await future combined analyses. Within eudicots, the topology is fully congruent with data sets containing large numbers of taxa (Soltis et al. 2000; Hilu et al. 2003), suggesting that petD will provide valuable information for further resolving eudicot relationships.
Conclusion |
---|
Supplementary Material |
---|
Acknowledgements |
---|
Footnotes |
---|
References |
---|
Asmussen, C. B., and M. W. Chase. 2001. Coding and noncoding plastid DNA in palm systematics. Am. J. Bot. 88:1103–1117.
Bakker, F., A. Culham, R. Gomez-Martinez, J. Carvalho, J. Compton, R. Dawtrey, and M. Gibby. 2000. Patterns of nucleotide substitution in angiosperm cpDNA trnL (UAA)-trnF (GAA) regions. Mol. Biol. Evol. 17:1146–1155.
Bonen, L., and J. Vogel. 2001. The ins and outs of group II introns. Trends Genet. 17:322–331.
Borsch, T., K. W. Hilu, D. Quandt, V. Wilde, C. Neinhuis, and W. Barthlott. 2003. Non-coding plastid trnT-trnF sequences reveal a well resolved phylogeny of basal angiosperms. J. Evol. Biol. 16:558–576.
Clausing, G., and S. S. Renner. 2001. Molecular phylogenetics of Melastomataceae and Memecylaceae: implications for character evolution. Am. J. Bot. 88:486–498.
Costa, M., F. Michel, and E. Westhof. 2000. A three-dimensional perspective on exon binding by a group II self-splicing intron. EMBO J. 19:5007–5018.
De Pinna, M. C. C. 1991. Concepts and tests of homology in the cladistic paradigm. Cladistics 7:367–394.
Dib-Hajj, S. D., S. C. Boulanger, S. K. Hebbar, C. L. Peebles, J. S. Franzen, and P. S. Perlman. 1993. Domain 5 interacts with domain 6 and influences the second transesterification reaction of group II intron self-splicing. Nucleic Acids Res. 21:1797–1804.
Dixit, R., P. K. Trivedi, P. Nath, and R. V. Sane. 1999. Organization and post-transcriptional processing of the psbB operon from chloroplasts of Populus deltoides. Curr. Genet. 36:165–172.
Doyle, J. A., and P. K. Endress. 2000. Morphological phylogenetic analysis of basal angiosperms: comparison and combination with molecular data. Int. J. Plant Sci. 161:S121–S153.
Doyle, J. J. 1992. Gene trees and species trees: molecular systematics as one-character taxonomy. Syst. Bot. 17:144–163.
Federova, O., T. Mitros, and A. M. Pyle. 2003. Domains 2 and 3 interact to form critical elements of the group II intron active site. J. Mol. Biol. 330:197–209.
Golenberg, E. M., M. T. Clegg, M. L. Durbin, J. Doebley, and D. P. Ma. 1993. Evolution of a non-coding region of the chloroplast genome. Mol. Phylogenet. Evol. 2:52–64.
Goremykin, V. V., K. I. Hirsch-Ernst, S. Wolfl, and F. H. Hellwig. 2003. Analysis of the Amborella trichopoda chloroplast genome sequence suggests that Amborella is not a basal angiosperm. Mol. Biol. Evol. 20:1499–1505.
Graham, S. W., and R. G. Olmstead. 2000a. Evolutionary significance of an unusual chloroplast DNA inversion found in two basal angiosperm lineages. Curr. Genet. 37:183–188.
Graham, S. W., and R. G. Olmstead. 2000b. Utility of 17 chloroplast genes for inferring the phylogeny of the basal angiosperms. Am. J. Bot. 87:1712–1730.
Graham, S. W., P. A. Reeves, A. C. E. Burns, and R. G. Olmstead. 2000. Microstructural changes in noncoding chloroplast DNA: interpretation, evolution, and utility of indels and inversions in basal angiosperm phylogenetic inference. Int. J. Plant Sci. 161:S83–S96.
Gu, X., and W. H. Li. 1995. The size distribution of insertions and deletions in human and rodent pseudogenes suggests the logarithmic gap penalty for sequence alignment. J. Mol. Evol. 40:464–473.
Hillis, D. M., M. W. Allard, and M. M. Miyamoto. 1993. Analysis of DNA sequence data: phylogenetic inference. Methods Enzymol. 224:456–490.
Hilu, K. W., T. Borsch, K. Müller et al. (16 co-authors). 2003. Angiosperm phylogeny based on matK sequence information. Am. J. Bot. 90:1758–1776.
Hoot, S. B., and A. W. Douglas. 1998. Phylogeny of the Proteaceae based on atpB and atpB-rbcL intergenic spacer region sequences. Aust. Syst. Bot. 11:301–320.
Huelsenbeck, J. P., and F. Ronquist. 2001. MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics 17:754–755.
Kelchner, S. A. 2000. The evolution of non-coding chloroplast DNA and its application in plant systematics. Ann. MO Bot. Gard. 87:482–498.
Kelchner, S. A. 2002. Group II introns as phylogenetic tools: structure, function, and evolutionary constraints. Am. J. Bot. 89:1651–1669.
Kelchner, S. A., and L. G. Clark. 1997. Molecular evolution and phylogenetic utility of the chloroplast rpl16 intron in Chusquea and the Bambusoideae (Poaceae). Mol. Phylogenet. Evol. 8:385–397.
Kelchner, S. A., and J. F. Wendel. 1996. Hairpins create minute inversions in non-coding regions of chloroplast DNA. Curr. Genet. 30:259–262.
Knoop, V., and A. Brennicke. 1993. Group II introns in plant mitochondria: Trans-splicing, RNA editing, evolution and promiscuity. Pp. 221–232 in A. Brennicke and U. Kück, eds. Plant mitochondria. VCH Verlag, Weinheim, Germany.
Kumar, S., K. Tamura, I. B. Jakobsen, and M. Nei. 2001. MEGA 2.1: molecular evolutionary genetics analysis software. Arizona State University, Tempe, Arizona.
Laroche, J., and J. Bousquet. 1999. Evolution of the mitochondrial rps3 intron in perennial and annual angiosperms and homology to nad5 intron 1. Mol. Biol. Evol. 16:441–452.
Learn, G. H. J., J. S. Shore, G. R. Furnier, G. Zurawski, and M. T. Clegg. 1992. Constraints on the evolution of plastid introns: the Group II intron in the gene encoding tRNA-Val (UAC). Mol. Biol. Evol. 9:856–871.
Les, D. H., E. L. Schneider, D. J. Padgett, P. S. Soltis, D. E. Soltis, and M. Zanis. 1999. Phylogeny, classification and floral evolution of water lilies (Nymphaeaceae; Nymphaeales): a synthesis of non-molecular, rbcL, matK, and 18S rDNA data. Syst. Bot. 24:28–46.
Levinson, G., and G. Gutman. 1987. Slipped-strand mispairing: a major mechanism for DNA sequence evolution. Mol. Biol. Evol. 4:203–221.
Lutzoni, F., P. Wagner, V. Reeb, and S. Zoller. 2000. Integrating ambiguously aligned regions of DNA sequences in phylogenetic analyses without violating positional homology. Syst. Biol. 49:628–651.
Mathews, S., and M. J. Donoghue. 2000. Basal angiosperm phylogeny inferred from duplicate phytochromes A and C. Int. J. Plant Sci. 161:S41–S55.
Michel, F., and J.-L. Ferat. 1995. Structure and activities of group II introns. Ann. Rev. Biochem. 64:435–461.
Michel, F., K. Umesono, and H. Ozeki. 1989. Comparative and functional anatomy of group II catalytic introns: a review. Gene 82:5–30.
Monde, R.-A., J. C. Greene, and D. B. Stern. 2000. Disruption of the petB-petD intergenic region on tobacco chloroplasts affects petD RNA accumulation and translation. Mol. Gen. Genet. 263:610–618.
Morgenstern, B. 1999. Dialign 2: improvement of the segment-to-segment approach to multiple sequence alignment. Bioinformatics 15:211–218.
Müller, J., and K. Müller. 2003. QuickAlign: A new alignment editor. Plant Mol. Biol. Reporter 21:5.
Müller, K. 2004. PRAP: computation of Bremer support for large data sets. Mol. Phylogenet. Evol. 31:780–782.
Nixon, K. C. 2002. WinClada. Version 1.00.08. Published by the author, Ithaca, NY.
Posada, D., and K. A. Crandall. 1998. Modeltest: testing the model of DNA substitution. Bioinformatics 14:817–818.
Pruchner, D., S. Beckert, H. Muhle, and V. Knoop. 2002. Divergent intron conservation in the mitochondrial nad2 gene: signatures for the three bryophyte classes (mosses, liverworts, and hornworts) and the lycophytes. J. Mol. Evol. 55:265–271.
Qiu, Y.-L., Y. Cho, J. C. Cox, and J. D. Palmer. 1998. The gain of three mitochondrial introns identifies liverworts as the earliest land plants. Nature 394:671–674.
Qiu, Y.-L., J. L. Lee, F. Bernasconi-Quadroni, D. E. Soltis, P. S. Soltis, M. Zanis, E. A. Zimmer, Z. Chen, V. Savolainen, and M. W. Chase. 1999. The earliest angiosperms: evidence from mitochondrial, plastid and nuclear genomes. Nature 402:404–407.
Quandt, D., K. Müller, and S. Huttunen. 2003. Characterisation of the chloroplast DNA psbT-H region and the influence of dyad symmetrical elements on phylogenetic reconstructions. Plant Biol. 5:400–410.
Rock, C. D., A. Barkan, and W. C. Taylor. 1987. The maize plastid psbB-psbF-petB-petD gene cluster: spliced and unspliced petB and petD RNAs encode alternative products. Curr. Genet. 12:69–77.
Rousset, F., M. Pélandakis, and M. Solignac. 1991. Evolution of compensatory substitutions through G-U intermediate state in Drosophila rRNA. Proc. Natl. Acad. Sci. USA 88:10032–10036.
Sauquet, H., J. A. Doyle, T. Scharaschkin, T. Borsch, K. W. Hilu, L. W. Chatrou, and A. Le Thomas. 2003. Phylogenetic analysis of Magnoliales and Myristicaceae based on multiple data sets: implications for character evolution. Bot. J. Linnean Soc. 142:125–186.
Schmitz-Linneweber, C., R. M. Maier, J. P. Alcaraz, A. Cottet, R. G. Herrmann, and R. Mache. 2001. The plastid chromosome of spinach (Spinacia oleracea): complete nucleotide sequence and gene organization. Plant Mol. Biol. 45:307–315.
Simmons, M. P., and H. Ochoterena. 2000. Gaps as characters in sequence-based phylogenetic analyses. Syst. Biol. 49:369–381.
Simmons, M. P., H. Ochoterena, and T. G. Carr. 2001. Incorporation, relative homoplasy, and effect of gap characters in sequence-based phylogenetic analyses. Syst. Biol. 50:454–462.
Soltis, D. E., and P. S. Soltis. 1998. Choosing an approach and an appropriate gene for phylogenetic analysis. Pp. 1–42 in D. E. Soltis, P. S. Soltis, and J. J. Doyle, eds. Molecular systematics of plants II: DNA sequencing. Kluwer Academic Publishers, Boston.
Soltis, D. E., P. S. Soltis, M. W. Chase et al. (16 co-authors). 2000. Angiosperm phylogeny inferred from 18S rDNA, rbcL, and atpB sequences. Bot. J. Linnean Soc. 133:381–461.
Soltis, P. S., D. E. Soltis, P. G. Wolf, D. L. Nickrent, S.-M. Chaw, and R. L. Chapman. 1999. The phylogeny of land plants inferred from 18S rDNA sequences: pushing the limits of rDNA signal? Mol. Biol. Evol. 16:1774–1784.
Swofford, D. L. 2002. PAUP*: phylogenetic analysis using parsimony (*and other methods). Version 4.0b10. Sinauer Associates, Sunderland, Mass.
Tanaka, M., J. Obokata, J. Chunwongse, J. Shinozaki, and M. Sugiura. 1987. Rapid splicing and stepwise processing of a transcript from the psbB operon in tobacco chloroplasts: determination of the intron sites in petB and petD. Mol. Gen. Genet. 209:427–431.
Tsudzuki, J., K. Nakashima, T. Tsudzuki, J. Hirasuka, M. Shibata, T. Wakasugi, and M. Sugiura. 1992. Chloroplast DNA of black pine retains a residual inverted repeat lacking rRNA genes: nucleotide sequences of trnQ, trnK, psbA, trnI and trnH and the absence of rps16. Mol. Gen. Genet. 232:206–214.
Vangerow, S., T. Teerkorn, and V. Knoop. 1999. Phylogenetic information in the mitochondrial nad5 gene of pteridophytes: RNA editing and intron sequences. Plant Biol. 1:235–243.
Vawter, L., and W. M. Brown. 1993. Rates and patterns of base change in the small subunit ribosomal RNA gene. Genetics 134:597–608.
Westhoff, P., and R. G. Herrmann. 1988. Complex RNA maturation in the chloroplast: the psbB operon from spinach. Eur. J. Biochem. 171:551–564.
Zanis, M., D. E. Soltis, P. S. Soltis, S. Mathews, and M. J. Donoghue. 2002. The root of the angiosperms revisited. Proc. Natl. Acad. Sci. USA 99:6848–6853.