*Fukui Prefectural University, Fukui, Japan;
National Institute of Genetics, Mishima, Japan;
Kihara Institute for Biological Research, Yokohama City University, Yokohama, Japan
Abstract |
---|
Introduction |
---|
Grass evolution has mainly been studied by analyzing diversity seen in various morphological, cytological, and molecular traits (Kellogg 1998). Of those traits, variation of chloroplast DNA (cpDNA) has advantages in comparative studies. In general, cpDNA has a low rate of nucleotide substitution, which facilitates comparison of variation in a wide range of plant taxa. Furthermore, uniparental inheritance lowers the impact of intermolecular recombination and helps to simplify theories of chloroplast genome evolution in most plant taxa. Coupled with advances in clarifying entire chloroplast genome sequences (Hiratsuka et al. 1989 for rice; Maier et al. 1995 for maize), comparative analyses of grass cpDNA variation have complemented morphological studies and given us novel insights into grass evolution.
Although the analysis of cpDNA has contributed significantly to our understanding of grass evolution, most of the conclusions drawn so far are based on nucleotide sequence variation in a single chloroplast gene or gene intron such as rbcL (Doebley et al. 1990), ndhF (Clark, Zhang, and Wendel 1995), rpoC2 (Cummings, King, and Kellogg 1994), rps4 (Nadot, Bajon, and Lejeune 1994), matK (Hilu and Alice 1999), and the rpl16 intron (Zhang 2000). In contrast to the widespread use of the single-gene-based approach, only a few studies have addressed grass evolution or grass chloroplast gene evolution based on multiple chloroplast gene sequences (Wolfe, Li, and Sharp 1987; Wolfe et al. 1989; Gaut, Muse, and Clegg 1993). The entire chloroplast genome structure of wheat was reported recently (Ogihara et al. 2002), and the fully sequenced cpDNAs of three cereals, maize (Maier et al. 1995), rice (Hiratsuka et al. 1989), and wheat, now provide a unique opportunity to investigate grass evolution based on whole-genome comparison.
We report results of comparative analyses of nucleotide sequence variations in 106 chloroplast genes of maize, rice, and wheat. The goals were to (1) provide a broad picture of chloroplast gene diversification in cereals; (2) compare relative evolutionary rates of chloroplast genes; and (3) infer the chloroplast genome phylogeny of the three cereals using all the chloroplast gene sequences.
Data and Methods |
---|
Results |
---|
Overall Diversification of Chloroplast Genes
To provide a broad picture of chloroplast gene diversification in the three cereals, total gene divergence (D) from the tobacco homolog was calculated for each of the 106 cereal genes (see Supplementary Material at MBE web site: www.molbiolevol.org). Eight genes, all of them tRNA genes, were invariable in the four species. The 106 chloroplast genes were classified into eight functional gene groups, and the average D was calculated for each group (table 1). Average D values of the 106 genes were comparable for maize (0.112), rice (0.112), and wheat (0.113). A Friedman test showed that the differences in these average D values were not statistically significant (P = 0.35), indicating that most of the genic regions of the three cereal chloroplast genomes evolved at similar rates. Maturase, envelope membrane protein, and proteinase genes seem to have evolved rapidly. In contrast, RNA genes (rRNA and tRNA genes) are highly conserved. Average D values were not statistically uniform between the gene groups of each species (Kruskal-Wallis test, P < 0.0001). Scheffe's multiple comparison test indicated that the average D values of RNA genes always were significantly smaller than those of the other gene groups (P < 0.01 or P < 0.05) and that in each species the average D of photosynthetic genes was significantly smaller than that of ribosomal protein genes (P < 0.01 or P < 0.05).
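The Friedman comparison described above treats each gene as a block and each species as a treatment. The following Python sketch shows how such a statistic can be computed; the function names are illustrative assumptions, no values from the paper are reproduced, and the closed-form P value relies on the chi-square approximation with two degrees of freedom, which applies only because exactly three species are compared.

```python
import math

def average_ranks(values):
    """Rank values from 1 to k, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share one mean rank
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks

def friedman_test(rows):
    """Friedman chi-square for rows of (maize, rice, wheat) D values.

    Each row is one gene. Returns (statistic, p_value); the p-value uses the
    chi-square approximation with 2 degrees of freedom.
    """
    k = 3
    n = len(rows)
    if n == 0 or any(len(row) != k for row in rows):
        raise ValueError("expected a non-empty list of 3-value rows")
    rank_sums = [0.0] * k
    tie_term = 0.0
    for row in rows:
        ranks = average_ranks(row)
        for j in range(k):
            rank_sums[j] += ranks[j]
        for value in set(row):
            t = row.count(value)
            tie_term += t * (t * t - 1)
    correction = 1 - tie_term / (k * (k * k - 1) * n)
    if correction == 0:
        return 0.0, 1.0  # every gene is fully tied: no evidence of a difference
    squared = sum(r * r for r in rank_sums)
    stat = (12 / (n * k * (k + 1)) * squared - 3 * n * (k + 1)) / correction
    p_value = math.exp(-stat / 2)  # chi-square survival function for 2 df
    return stat, p_value
```

Called with one (maize, rice, wheat) tuple of D values per gene, the function returns a small statistic and a large P value when no species is consistently ranked above the others, which is the pattern reported here. The Kruskal-Wallis and Scheffe comparisons between gene groups are not sketched.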
Nonsynonymous Substitutions as a Driving Force for Gene Divergence
Relative rate tests showed that some cereal chloroplast genes evolved at heterogeneous rates. The rate differences seem to consist of multiple components (table 4). One component is the bias regarding transitional and transversional substitutions. For example, the transition-transversion ratio of the wheat psbF gene is 1.7-fold that of rice psbF and 2.1-fold that of maize psbF. This suggests that the high total gene divergence of the wheat psbF gene (table 3) may be the result of accelerated transitional substitutions in wheat, decelerated transversions in the others, or both. Similarly, the ratio of synonymous to nonsynonymous substitutions (S-N ratio) varied among the species (table 4). For example, the maize psbD gene had a smaller S-N ratio (15.4) than the rice and wheat genes (38.7 and 22.9) because of its high nonsynonymous substitution rate (0.024 in maize, 0.009 in rice, and 0.015 in wheat). The high nonsynonymous substitution rate may have contributed to the high total gene divergence of the maize psbD gene (table 3).
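To make the transition-transversion component concrete, the Python sketch below counts both kinds of difference between two aligned sequences, such as a cereal gene and its tobacco homolog. It is a minimal illustration that uses raw counts without any correction for multiple substitutions, which may differ from the estimates behind table 4; the function name `count_ts_tv` is an assumption introduced here.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def count_ts_tv(seq_a, seq_b):
    """Count transitions and transversions between two aligned sequences.

    Columns containing a gap or an ambiguous base are skipped.
    Returns (transitions, transversions).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = 0
    transversions = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x not in "ACGT" or y not in "ACGT" or x == y:
            continue
        # A<->G and C<->T stay within one base class: transitions.
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions
```

Dividing the first count by the second gives a ratio for one species; comparing such ratios across maize, rice, and wheat corresponds to the fold differences quoted for psbF.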
Some genes provided evidence that accelerated nonsynonymous substitution rates enhanced total gene divergence. For the ndhB and rpl2 genes, relative rate tests for the entire sequences indicated that the wheat homologs evolved significantly faster than the maize and rice homologs (P < 0.001) (table 3). Although in those genes wheat had larger synonymous and nonsynonymous substitution rates than maize and rice (table 4), significant rate heterogeneity (P < 0.001) was found exclusively at nondegenerate sites (table 5), indicating that nonsynonymous changes are the major driving force for fast nucleotide substitution in the wheat ndhB and rpl2 genes. In the psbD gene, the faster evolution of the maize homolog relative to the rice homolog (P < 0.05, table 3) is associated with a marked difference in the nonsynonymous substitution rate (table 4) and significant rate heterogeneity at nondegenerate sites (P < 0.001) (table 5). Regarding the rpoA gene, the rice homolog evolved significantly faster than the wheat homolog (P < 0.05) (table 3). Although rice had higher synonymous and nonsynonymous substitution rates than wheat (table 4), relative rate tests showed significant heterogeneity only at nondegenerate sites (P < 0.05, table 5), revealing that accelerated nonsynonymous changes are responsible for the enhanced total gene divergence of the rice rpoA gene.
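This section does not state which relative rate test was applied. Because the reference list cites Tajima (1993), the Python sketch below assumes that test in its simplest form, which counts the sites unique to each of two ingroup sequences relative to an outgroup (tobacco here). The function name and the skipping of gapped columns are assumptions made for illustration.

```python
import math

def tajima_relative_rate(seq1, seq2, outgroup):
    """Tajima-style relative rate test on three aligned sequences.

    m1 counts sites where seq1 differs while seq2 matches the outgroup;
    m2 counts the reverse. Under equal rates m1 and m2 are expected to be
    equal, and (m1 - m2)^2 / (m1 + m2) is compared with a chi-square
    distribution with 1 degree of freedom.
    Returns (m1, m2, chi2, p_value).
    """
    if not (len(seq1) == len(seq2) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")
    m1 = 0
    m2 = 0
    for a, b, o in zip(seq1.upper(), seq2.upper(), outgroup.upper()):
        if any(base not in "ACGT" for base in (a, b, o)):
            continue  # assumption: ignore gaps and ambiguous bases
        if a != b and b == o:
            m1 += 1
        elif a != b and a == o:
            m2 += 1
    if m1 + m2 == 0:
        return m1, m2, 0.0, 1.0
    chi2 = (m1 - m2) ** 2 / (m1 + m2)
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square survival, 1 df
    return m1, m2, chi2, p_value
```

To mirror the site-class analysis in table 5, the three sequences would first be reduced to the columns of one class (nondegenerate, twofold degenerate, or fourfold degenerate) and the function applied to each subset separately; that classification step is not shown.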
Chloroplast Genome Phylogeny Based on Genome-Wide Gene Comparisons
Chloroplast genes have been used extensively to reconstruct the phylogeny in the Poaceae. Most of the conclusions drawn have been based on nucleotide sequence variation in a single chloroplast gene. The amount of phylogenetic information that a chloroplast gene provides is not, however, always sufficient to make robust phylogenetic inferences (Doebley et al. 1990). One way to overcome this is to increase the amount of information available for phylogeny reconstruction by analyzing multiple chloroplast genes. The 106 chloroplast genes provided us the opportunity to infer chloroplast genome phylogeny of cereals on the basis of genome-wide gene comparisons. Because maize (Panicoideae), rice (Bambusoideae), and wheat (Pooideae) represent different subfamilies of the Poaceae, whether phylogeny based on the entire genic region agrees with that based on a single gene is of interest.
To infer a phylogeny based on a genome-wide gene comparison, we focused on nucleotide substitutions in the 106 genes. Nucleotide substitutions are reliable phylogenetic markers for the chloroplast genome (see Golenberg et al. 1993). We constructed a chloroplast genome type for each species by extracting and concatenating the variable sites from the aligned gene sequences of maize, rice, wheat, and tobacco. Of the 106 genes, 98 had more than one variable site. Because the impact of the rate heterogeneity of the nucleotide substitutions in each gene on phylogeny reconstruction was unknown, we used three gene groups for analysis: (1) all 98 genes; (2) 94 genes, excluding four genes (ndhB, psbD, psbH, and rrn23) that showed highly significant rate heterogeneity of nucleotide substitution (P < 0.001) in relative rate tests on the entire sequences; and (3) 84 genes, excluding all 14 genes that showed significant rate heterogeneity (P < 0.05). The total number of variable sites and the average number of variable sites per gene ranged from 8324 to 9675 bp and from 98.7 to 99.1 bp, respectively (table 6).
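The construction of a chloroplast genome type from variable sites can be sketched as follows. This Python example is a hedged illustration rather than the procedure actually used: the function name, the input layout (gene name mapped to per-taxon aligned sequences), and the decision to skip columns containing gaps or ambiguous bases are all assumptions.

```python
def variable_site_genome_types(alignments, taxa):
    """Concatenate the variable columns of several gene alignments.

    alignments: dict mapping gene name -> dict mapping taxon -> aligned sequence
    taxa: ordered list of taxon names present in every alignment
    Returns (genome_types, per_gene_counts), where genome_types maps each
    taxon to its concatenated variable sites and per_gene_counts maps each
    gene to the number of variable columns it contributed.
    """
    collected = {taxon: [] for taxon in taxa}
    per_gene_counts = {}
    for gene, alignment in alignments.items():
        seqs = [alignment[taxon].upper() for taxon in taxa]
        if len({len(s) for s in seqs}) != 1:
            raise ValueError(f"{gene}: sequences are not the same length")
        n_variable = 0
        for column in zip(*seqs):
            if any(base not in "ACGT" for base in column):
                continue  # assumption: drop gapped or ambiguous columns
            if len(set(column)) > 1:  # at least two different bases
                n_variable += 1
                for taxon, base in zip(taxa, column):
                    collected[taxon].append(base)
        per_gene_counts[gene] = n_variable
    genome_types = {taxon: "".join(bases) for taxon, bases in collected.items()}
    return genome_types, per_gene_counts
```

Restricting `alignments` to a subset of genes before the call corresponds to the three gene groups analyzed (98, 94, and 84 genes), and summing `per_gene_counts` gives the total number of variable sites for that group.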
Discussion |
---|
Although most genes evolved at homogeneous rates, our results showed that 14 genes (13.2%), representing about 28% of the genic region, evolved at heterogeneous rates in the chloroplast genomes of the three cereals. This raises the question of what generated the heterogeneity of nucleotide substitutions in those genes. The overall nucleotide substitution pattern in the genes suggests that rate heterogeneity was the product of multiple factors, including biased transitional and transversional substitution rates and variation in synonymous and nonsynonymous substitution rates between species (table 4). Relative rate tests done on the entire gene sequences showed that the rice homolog tends to have evolved more slowly than the homologs of maize and wheat at loci that showed rate heterogeneity (table 3). This may reflect the prolonged generation time of Oryza species, which include many perennials and annual-perennial intermediates, whereas the Zea and Triticum species are primarily annual (exceptions: Zea perennis and Zea diploperennis). If prolonged generation time is the primary factor, one would expect reduced total gene divergence (D) for all rice chloroplast genes because the generation-time effect should affect the entire genome. This expectation is not supported, because the average D values of the 106 genes differ only slightly among the three cereal species (0.112 for maize, 0.112 for rice, and 0.113 for wheat) and the differences are not significant (Friedman test, P = 0.35). The contribution of the generation-time effect to gene diversification therefore appears to be minor in chloroplast genomes of cereal species.
Unlike the genome-wide effect of generation time, selection may target individual genes, producing rate heterogeneity between evolutionary lineages. We dissected the rate heterogeneities found for 12 protein-coding genes by performing separate relative rate tests on the nondegenerate, twofold degenerate, and fourfold degenerate sites of these genes. Of the 12 genes, four (ndhB, psbD, rpl2, and rpoA) gave results that favor the selection hypothesis. In those genes, nonsynonymous substitutions seem to be accelerated in one of the three species, whereas synonymous substitution rates were homogeneous interspecifically, suggesting that in the different cereal lineages gene products were under different selective constraints. More importantly, the marked association of accelerated rates of nonsynonymous substitution with enhanced total gene divergence indicates that much of the interspecific variation in these genes is attributable to differences in nonsynonymous substitution rates. Selection therefore seems to have had a role in the diversification of these genes.
Our results also provide evidence that other factors contribute to chloroplast gene diversification. The psbC gene, the only one with significant rate heterogeneity at the fourfold degenerate sites (table 5), shows that accelerated synonymous substitution is a driving force for the diversification of this gene in wheat. Relative rate tests failed to detect significant heterogeneity at any site class in five genes (ndhK, psbE, psbF, psbH, and 5'-rps12) (table 5). The variation in these genes may be attributable to factors other than biased rates for synonymous and nonsynonymous substitutions. These findings indicate that the mechanism that underlies gene diversification in cereal chloroplast genomes is complex.
Implications for Phylogeny of Cereals
Morphological and molecular studies of grass phylogeny are not always congruent regarding relationships between the subfamilies that maize (Panicoideae), rice (Bambusoideae), and wheat (Pooideae) represent. A numerical taxonomy study indicated that the Panicoideae is an outgroup to the Bambusoideae and Pooideae (Watson, Clifford, and Dallwitz 1985), whereas Clayton (1981) proposed, on the basis of geographic distribution patterns, that rice is an outgroup. Analyses of a chloroplast gene (ndhF, Clark, Zhang, and Wendel 1995) and a nuclear gene (PHYB, Mathews, Tsai, and Kellogg 2000) recognized a monophyletic clade that includes the Bambusoideae and Pooideae and placed the Panicoideae as an outgroup. Monophyly of the clade that includes the Bambusoideae and Pooideae, however, is not supported by findings for other chloroplast genes (e.g., matK gene, Hilu and Alice 1999). Recently, Ogihara et al. (2002) suggested, based on the structural similarity of the chloroplast genomes, that rice and wheat are more closely related to each other than to maize.
One problem regarding such molecular studies is the limited amount of phylogenetic information a single gene provides. In this study, we constructed chloroplast genome trees for the three major cereals based on variable nucleotide sites in the genes. They all place maize basal to the rice-wheat clade; moreover, statistical support for the rice-wheat clade is high: 87% (bootstrap) and 99% (interior test) (fig. 1 and table 6). These findings support a close relationship between the Bambusoideae and Pooideae in the grass phylogeny and suggest that the Panicoideae split from the rice-wheat lineage during the early stage of grass evolution. Given the increasing number of fully sequenced chloroplast genomes in the databases, a variable-site-based approach should provide a simple method to reconstruct a reliable phylogeny based on whole-genome comparison, but the performance and application of this approach require further evaluation.
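Bootstrap support for a clade such as rice-wheat can be illustrated with a small Python sketch. The tree-building method is not described in this section; the sketch assumes a distance-based approach with uncorrected p-distances and relies on the fact that, with only four taxa, the neighbor-joining criterion (Saitou and Nei 1987) selects the split with the smallest sum of within-pair distances. All function names and parameters are illustrative assumptions, and the interior branch test is not covered.

```python
import random

def p_distance(a, b):
    """Proportion of aligned sites at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def four_taxon_split(seqs, ingroup, outgroup):
    """Return the ingroup pair grouped together in the best four-taxon split.

    Each candidate split is scored by the distance within the ingroup pair
    plus the distance between the remaining ingroup taxon and the outgroup.
    Exact ties are resolved in favor of the first candidate.
    """
    a, b, c = ingroup
    scores = {
        (a, b): p_distance(seqs[a], seqs[b]) + p_distance(seqs[c], seqs[outgroup]),
        (a, c): p_distance(seqs[a], seqs[c]) + p_distance(seqs[b], seqs[outgroup]),
        (b, c): p_distance(seqs[b], seqs[c]) + p_distance(seqs[a], seqs[outgroup]),
    }
    return min(scores, key=scores.get)

def bootstrap_support(seqs, ingroup, outgroup, clade, replicates=1000, seed=0):
    """Fraction of column-resampled data sets that recover the given clade."""
    rng = random.Random(seed)
    length = len(seqs[outgroup])
    hits = 0
    for _ in range(replicates):
        columns = [rng.randrange(length) for _ in range(length)]
        resampled = {
            name: "".join(seq[i] for i in columns) for name, seq in seqs.items()
        }
        if set(four_taxon_split(resampled, ingroup, outgroup)) == set(clade):
            hits += 1
    return hits / replicates
```

With the concatenated variable sites of maize, rice, wheat, and tobacco as `seqs`, a call such as `bootstrap_support(seqs, ("maize", "rice", "wheat"), "tobacco", ("rice", "wheat"))` returns the proportion of replicates in which rice and wheat group together.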
Acknowledgements |
---|
Footnotes |
---|
Abbreviations: cpDNA, chloroplast DNA.
Keywords: chloroplast genome, gene evolution, relative rates, substitution rate, Poaceae
Address for correspondence and reprints: Yoshihiro Matsuoka, Fukui Prefectural University, Matsuoka-cho, Yoshida-gun, Fukui 910-1195, Japan. E-mail: matsuoka{at}fpu.ac.jp
References |
---|
Bousquet J., S. H. Strauss, A. H. Doerksen, R. A. Price, 1992 Extensive variation in evolutionary rate of rbcL gene sequences among seed plants Proc. Natl. Acad. Sci. USA 89:7844-7848
Clark L. G., W. Zhang, J. F. Wendel, 1995 A phylogeny of the grass family (Poaceae) based on ndhF sequence data Syst. Bot 20:436-460
Clayton W. D., 1981 Evolution and distribution of grasses Ann. Missouri Bot. Gard 68:5-14
Clayton W. D., S. A. Renvoize, 1986 Genera graminum Her Majesty's Stationery Office, London
Cummings M. P., L. M. King, E. A. Kellogg, 1994 Slipped-strand mispairing in a plastid gene: rpoC2 in grasses (Poaceae) Mol. Biol. Evol 11:1-8
Corneille S., K. Lutz, P. Maliga, 2000 Conservation of RNA editing between rice and maize plastids: are most editing events dispensable? Mol. Gen. Genet 264:419-424
Doebley J., M. Durbin, E. M. Golenberg, M. T. Clegg, D. P. Ma, 1990 Evolutionary analysis of the large subunit of carboxylase (rbcL) nucleotide sequence among the grasses (Gramineae) Evolution 44:1097-1108
Gaut B. S., S. V. Muse, M. T. Clegg, 1993 Relative rates of nucleotide substitution in the chloroplast genome Mol. Phylogenet. Evol 2:89-96
Gaut B. S., S. V. Muse, W. Dennis Clark, M. T. Clegg, 1992 Relative rates of nucleotide substitution at the rbcL locus of monocotyledonous plants J. Mol. Evol 35:292-303
Golenberg E. M., M. T. Clegg, M. L. Durbin, J. Doebley, D. P. Ma, 1993 Evolution of a noncoding region of the chloroplast genome Mol. Phylogenet. Evol 2:52-64
Hilu K. W., L. A. Alice, 1999 Evolutionary implications of matK indels in Poaceae Am. J. Bot 86:1735-1741
Hiratsuka J., H. Shimada, R. Whittier, et al. (16 co-authors) 1989 The complete sequence of the rice (Oryza sativa) chloroplast genome: intermolecular recombination between distinct tRNA genes accounts for a major plastid DNA inversion during the evolution of the cereals Mol. Gen. Genet 217:185-194
Hirose T., T. Kusumegi, T. Tsudzuki, M. Sugiura, 1999 RNA editing sites in tobacco chloroplast transcripts: editing as a possible regulator of chloroplast RNA polymerase activity Mol. Gen. Genet 262:462-467
Kellogg E. A., 1998 Relationships of cereal crops and other grasses Proc. Natl. Acad. Sci. USA 95:2005-2010
Kumar S., K. Tamura, I. B. Jakobsen, M. Nei, 2001 MEGA2: molecular evolutionary genetics analysis software Arizona State University, Tempe
Maier R. M., K. Neckermann, G. L. Igloi, H. Kössel, 1995 Complete sequence of the maize chloroplast genome: gene content, hotspots of divergence and fine tuning of genetic information by transcript editing J. Mol. Biol 251:614-628
Mathews S., R. C. Tsai, E. A. Kellogg, 2000 Phylogenetic structure in the grass family (Poaceae): evidence from the nuclear gene phytochrome B Am. J. Bot 87:96-107
Nadot S., R. Bajon, B. Lejeune, 1994 The chloroplast gene rps4 as a tool for the study of Poaceae phylogeny Plant Syst. Evol 191:27-38
Nei M., T. Gojobori, 1986 Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions Mol. Biol. Evol 3:418-426
Ogihara Y., K. Isono, T. Kojima, et al. (19 co-authors) 2002 Structural features of a wheat plastome as revealed by complete sequencing of chloroplast DNA Mol. Gen. Genomics 266:740-746
Shinozaki K., M. Ohme, M. Tanaka, et al. (23 co-authors) 1986 The complete nucleotide sequence of tobacco chloroplast genome: its gene organization and expression EMBO J 5:2043-2049
Saitou N., M. Nei, 1987 The neighbor-joining method: a new method for reconstructing phylogenetic trees Mol. Biol. Evol 4:406-425
Soltis D. E., P. S. Soltis, M. T. Clegg, M. Durbin, 1990 rbcL sequence divergence and phylogenetic relationships in Saxifragaceae sensu lato Proc. Natl. Acad. Sci. USA 87:4640-4644
Tajima F., 1993 Simple methods for testing the molecular evolutionary clock hypothesis Genetics 135:599-607
Thompson J. D., D. G. Higgins, T. J. Gibson, 1994 CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, positions-specific gap penalties and weight matrix choice Nucleic Acids Res 22:4673-4680
Watson L., H. T. Clifford, M. J. Dallwitz, 1985 The classification of Poaceae: subfamilies and supertribes Aust. J. Bot 33:433-484
Wolfe K. H., M. Gouy, Y.-W. Yang, P. M. Sharp, W.-H. Li, 1989 Date of the monocot-dicot divergence estimated from chloroplast DNA sequence data Proc. Natl. Acad. Sci. USA 86:6201-6205
Wolfe K. H., W.-H. Li, P. M. Sharp, 1987 Rates of nucleotide substitution vary greatly among plant mitochondrial, chloroplast, and nuclear DNAs Proc. Natl. Acad. Sci. USA 84:9054-9058
Zhang W., 2000 Phylogeny of the grass family (Poaceae) from rpl16 intron sequence data Mol. Phylogenet. Evol 15:135-146