Department of Biology, University of Virginia
Correspondence: E-mail: pelle{at}eg.umu.se.
Abstract |
---|
Key Words: insertions, deletions, chloroplast, Silene
Introduction |
---|
Several molecular processes are known to create indels. Polymerase slippage during DNA replication, so-called slipped-strand mispairing (Levinson and Gutman 1987), adds or removes short repeat units, usually one or a few base pairs in length. Repeat structures in chloroplast DNA are found primarily in AT-rich regions and often involve long runs of a single nucleotide (Kelchner 2000). Larger indels are often associated with the formation of hairpins (Kelchner and Wendel 1996) or stem-loop structures in DNA secondary structure (Kelchner 2000), and these indels may or may not show sequence similarity with the regions flanking the indel site. Different types of indels also show varying amounts of homoplasy. In between-species studies, repeat indels generally seem to be more prone to homoplasy, simply because they appear to arise at a higher rate than larger indels (Olsen 1999; Kelchner 2000). Another cause of homoplasy is multiple, overlapping indels within a single region of DNA (Kelchner 2000; Simmons and Ochoterena 2000).
Here, we present a study of the molecular evolution of indels in three intergenic spacers and one intron from the chloroplast genome of Silene latifolia and S. vulgaris. We show that indels in these four regions evolve at slightly higher rates than base pair substitutions. Repeat indels appear to evolve at higher rates than other types of indels and are thus more prone to homoplasy. We also show that the indel data have high information content for phylogenetic analysis, and coded indels can provide useful information to infer phylogenetic relationships at the intraspecific level.
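As a rough illustration of what coding indels as characters can look like, the Python sketch below turns each distinct gap in an alignment into a presence/absence character. It is a simplified scheme assumed here for illustration, not necessarily the coding used in this study. The function names and the toy alignment are hypothetical, and published gap-coding schemes (e.g., Simmons and Ochoterena 2000) treat overlapping gaps more carefully.

```python
import re


def gap_runs(seq):
    """Return (start, end) column spans of every run of '-' in one aligned sequence."""
    return [(m.start(), m.end()) for m in re.finditer(r"-+", seq)]


def code_indels(alignment):
    """Code each distinct gap as a binary character (1 = gap present, 0 = absent).

    `alignment` maps sequence names to aligned strings of equal length.
    Two gaps count as the same character only if they share both endpoints.
    """
    runs = {name: set(gap_runs(seq)) for name, seq in alignment.items()}
    gaps = sorted(set().union(*runs.values()))
    matrix = {name: [int(g in runs[name]) for g in gaps] for name in alignment}
    return gaps, matrix


# Hypothetical toy alignment, not data from the study.
toy = {"t1": "ACG--TAC", "t2": "ACGTTTAC", "t3": "A-G--TAC"}
gaps, matrix = code_indels(toy)
```

In the toy alignment, two gap characters are recovered (columns 1 to 2 and 3 to 5); `t1` carries only the second, `t3` carries both, and `t2` carries neither. The resulting binary columns could then be appended to the substitution matrix for a parsimony analysis.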
Material and Methods |
---|
Repeat indels arise through replication slippage and are often assumed to evolve according to a stepwise mutation model (Levinson and Gutman 1987). We therefore performed all analyses of the indel data with repeat indels assigned as either unordered or ordered characters in PAUP* (Swofford 1998). As a measure of the level of homoplasy in the data set, we scored the consistency index (CI) (Kluge and Farris 1969), the retention index (RI) and the rescaled consistency index (RC) (Farris 1989). Indels were then grouped by type (true indel or repeat indel) to determine whether the two types of indels provide the same phylogenetic resolution.
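The difference between the two character treatments, and the three homoplasy measures, can be made concrete with a short Python sketch. It assumes the standard definitions CI = m/s, RI = (g - s)/(g - m), and RC = CI x RI, where m, s, and g are the minimum, observed, and maximum number of steps for a character. The observed steps s must come from a tree search (for example in PAUP*) and are supplied by hand here. The function names and the repeat counts are hypothetical, and the ordered treatment is assumed to use the repeat number itself as the character state.

```python
from collections import Counter


def min_steps(states, ordered=False):
    """Fewest changes any tree could require for one character (m)."""
    if ordered:
        # Stepwise model: moving between states costs their numeric difference.
        return max(states) - min(states)
    # Unordered: every state beyond the first needs at least one change.
    return len(set(states)) - 1


def max_steps(states, ordered=False):
    """Most changes any tree could require (g), i.e. steps on an unresolved star tree."""
    if ordered:
        # Best single ancestral state minimizes the summed absolute differences.
        return min(sum(abs(x - a) for x in states) for a in set(states))
    # Unordered: every taxon outside the most common state changes once.
    return len(states) - max(Counter(states).values())


def homoplasy_indices(m, s, g):
    """Return (CI, RI, RC) from minimum, observed, and maximum steps."""
    ci = m / s
    # RI is undefined when g == m; returning 0.0 is a convention chosen for this sketch.
    ri = (g - s) / (g - m) if g > m else 0.0
    return ci, ri, ci * ri


# Hypothetical repeat counts for one mononucleotide repeat scored in six individuals.
repeats = [8, 8, 9, 11, 11, 11]
```

For these counts, the unordered treatment needs at least two changes, whereas the ordered treatment needs at least three, because going from 9 to 11 repeats is counted as two slippage events. Ensemble indices for a whole data set follow by summing m, s, and g over all characters before calling `homoplasy_indices`.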
Next, we appended the two different data sets (base pair substitutions and indels) for the four chloroplast regions for individuals where all four regions had been sequenced. This combined data set consisted of 25 S. latifolia and 29 S. vulgaris individuals. We performed an incongruence length difference (ILD) test to determine whether the two types of data (base pair substitutions or indels) produced phylogenetic trees that were congruent with each other (Farris et al. 1994). The data set containing base pair substitutions included only variable sites to avoid artificially inflating the results of the ILD test (Cunningham 1997; Lee 2001).
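Restricting the substitution data to variable sites amounts to a simple column filter. The Python sketch below shows one way to do it, under the assumption that gaps and ambiguous or missing bases ('-', '?', 'N') do not count as distinct states. That treatment, the function name, and the toy alignment are illustrative choices, not details taken from this study.

```python
MISSING = {"-", "?", "N"}


def variable_sites(alignment):
    """Keep only alignment columns that show more than one nucleotide state.

    `alignment` maps sequence names to aligned strings of equal length.
    Characters in MISSING are ignored when counting states.
    """
    columns = zip(*alignment.values())
    keep = [i for i, col in enumerate(columns) if len(set(col) - MISSING) > 1]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


# Hypothetical toy alignment, not data from the study.
toy = {"a": "ACGT-", "b": "ACGA-", "c": "ATGAA"}
filtered = variable_sites(toy)
```

Only the second and fourth columns of the toy alignment vary among the three sequences, so the filtered matrix has two sites. The last column is dropped because, with gaps ignored, it shows a single state.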
Results and Discussion |
---|
However, not all indels are equal. Mononucleotide and dinucleotide repeat indels appeared to evolve at a faster rate than either base pair substitutions or other types of indels, as evidenced by the greater degree of homoplasy for repeat indels in both S. latifolia and S. vulgaris (table 1; see also Supplementary Material online). This was true regardless of whether repeat indels were scored as unordered or ordered characters in the parsimony analyses (data not shown).
The phylogenetic utility of indels is corroborated by the ILD tests, which show no evidence of phylogenetic incongruence between parsimony trees based on base pair substitution data alone and those based on indel data alone (P = 0.98 for S. latifolia and P = 0.98 for S. vulgaris). Not surprisingly, based on strict consensus trees, a combined data set of base pair substitutions and indels produced phylogenetic trees with higher resolution than trees based on base pair substitutions alone (fig. 1). Chloroplast DNA has a lower substitution rate than nuclear DNA in plants (Wolfe, Li, and Sharp 1987; Muse 2000), and sequence diversity and phylogenetic resolution at the intraspecific level are generally low for moderate amounts of sequence data (1 to 2 kb). However, our results indicate that coded indels have levels of homoplasy comparable with base pair substitutions, and including coded indels may therefore increase the resolution of phylogenetic studies. Moreover, some forms of indels have levels of homoplasy virtually identical to base pair substitutions, whereas other types, primarily repeat indels, are far less reliable. For example, we constructed phylogenetic trees for both S. latifolia and S. vulgaris using combined data from all four chloroplast regions, including both base pair substitutions and indels. In S. latifolia, the rescaled consistency index (RC) increased from 0.526 to 0.620 when repeat indels were excluded, whereas it was 0.832 for base pair substitutions alone. For S. vulgaris, RC equals 0.441 with all characters included in the analysis, 0.681 without repeat indels, and 0.877 for base pair substitutions alone.
We have shown that phylogenetic studies performed at the intraspecific level may benefit from including coded indel data, although our data suggest that repeat indels are less reliable than other types of indels and should be avoided, if possible. At present, using coded indels has the drawback of restricting the analytical methods one can use. We have shown their utility in a parsimony-based analysis, but it would be more of a challenge to employ a maximum-likelihood-based approach without specific models of evolution, such as those available for base pair substitutions (see McGuire, Denham, and Balding [2001] for an example of such a model that includes indels). Nevertheless, we see the present study as a necessary first step in developing models of evolution for the different types of insertions and deletions in DNA sequence data.
Acknowledgements |
---|
Footnotes |
---|
Geoffrey McFadden, Associate Editor
Literature Cited |
---|
Cunningham, C. W. 1997. Can three incongruence tests predict when data should be combined? Mol. Biol. Evol. 14:733-740.
Farris, J. S. 1989. The retention index and the rescaled consistency index. Cladistics 5:417-419.
Farris, J. S., M. Källersjö, A. G. Kluge, and C. Bult. 1994. Testing significance of congruence. Cladistics 10:315-319.
Golenberg, E. M., M. T. Clegg, M. L. Durbin, J. Doebley, and D. P. Ma. 1993. Evolution of a non-coding region of the chloroplast genome. Mol. Phylogenet. Evol. 2:52-64.
Graham, S. W., P. A. Reeves, A. C. E. Burns, and R. G. Olmstead. 2000. Microstructural changes in non-coding DNA: interpretation, evolution and utility of indels and inversions in basal angiosperm phylogenetic inference. Int. J. Plant Sci. 161:S83-S96.
Hamilton, M. B. 1999. Four primer pairs for the amplification of chloroplast intergenic regions with intraspecific variation. Mol. Ecol. 8:521-523.
Ingvarsson, P. K., and D. R. Taylor. 2002. Genealogical evidence for epidemics of selfish genes. Proc. Natl. Acad. Sci. USA 99:11265-11269.
Kelchner, S. A. 2000. The evolution of non-coding chloroplast DNA and its application in plant systematics. Ann. MO Bot. Gard. 87:499-527.
Kelchner, S. A., and L. G. Clark. 1997. Molecular evolution and phylogenetic utility of the chloroplast rpl16 intron in Chusquea and the Bambusoideae (Poaceae). Mol. Phylogenet. Evol. 8:385-397.
Kelchner, S. A., and J. F. Wendel. 1996. Hairpins create minute inversions in non-coding regions of chloroplast DNA. Curr. Genet. 30:259-262.
Kluge, A. G., and J. S. Farris. 1969. Quantitative phyletics and the evolution of anurans. Syst. Zool. 18:1-32.
Lee, M. S. Y. 2001. Uninformative characters and apparent conflict between molecules and morphology. Mol. Biol. Evol. 18:676-680.
Levinson, G., and G. A. Gutman. 1987. Slipped-strand mispairing: a major mechanism for DNA sequence evolution. Mol. Biol. Evol. 4:203-221.
McCauley, D. E. 1995. The use of chloroplast DNA polymorphism in studies of gene flow in plants. Trends Ecol. Evol. 10:190-202.
McGuire, G., M. C. Denham, and D. J. Balding. 2001. Models of sequence evolution for DNA sequences containing gaps. Mol. Biol. Evol. 18:481-490.
Muse, S. V. 2000. Examining rates and patterns of nucleotide substitution in plants. Plant Mol. Biol. 42:25-43.
Olsen, K. M. 1999. Minisatellite variation in a single-copy nuclear gene: phylogenetic assessment of repeat length homoplasy and mutational mechanism. Mol. Biol. Evol. 16:1406-1409.
Simmons, M. P., and H. Ochoterena. 2000. Gaps and characters in sequence-based phylogenetic analysis. Syst. Biol. 42:369-381.
Swofford, D. L. 1998. PAUP*: phylogenetic analysis using parsimony (*and other methods). Version 4. Sinauer Associates, Sunderland, Mass.
Taberlet, P., L. Gielly, G. Pautou, and J. Bouvet. 1991. Universal primers for amplification of three non-coding regions of chloroplast DNA. Plant Mol. Biol. 17:1105-1109.
Wolfe, K. H., W-H. Li, and P. M. Sharp. 1987. Rates of nucleotide substitution vary greatly among plant mitochondrial, chloroplast, and nuclear DNAs. Proc. Natl. Acad. Sci. USA 84:9054-9058.