Molecular Evolution of Nuclear Genes in Cupressaceae, a Group of Conifer Trees

Junko Kusumi, Yoshihiko Tsumura, Hiroshi Yoshimaru and Hidenori Tachida

*Department of Biology, Faculty of Sciences, Kyushu University;
†Bio-resources Technology Division, Forestry and Forest Products Research Institute, Kukizaki, Ibaraki, Japan


    Abstract
 
We surveyed the molecular evolutionary characteristics of 11 nuclear genes from 10 conifer trees belonging to the Taxodioideae, the Cupressoideae, and the Sequoioideae. Comparisons of substitution rates among the lineages indicated that the synonymous substitution rates of the Cupressoideae lineage were higher than those of the Taxodioideae. This result parallels the pattern previously found in plastid genes. Likelihood-ratio tests showed that the nonsynonymous-synonymous rate ratio did not change significantly among lineages. In addition, after adjustments for lineage effects, the dispersion indices of synonymous and nonsynonymous substitutions were considerably reduced, and the latter was close to 1. These results indicated that the acceleration of evolutionary rates in the Cupressoideae lineage occurred in both the nuclear and plastid genomes, and that generally, this lineage effect affected synonymous and nonsynonymous substitutions similarly. We also investigated the relationship of synonymous substitution rates with the nonsynonymous substitution rate, base composition, and codon bias in each lineage. Synonymous substitution rates were positively correlated with nonsynonymous substitution rates and GC content at third codon positions, but synonymous substitution rates were not correlated with codon bias. Finally, we tested the possibility of positive selection at the protein level, using maximum likelihood models, assuming heterogeneous nonsynonymous-synonymous rate ratios among codon (amino acid) sites. Although we did not detect strong evidence of positively selected codon sites, the analysis suggested that significant variation in nonsynonymous-synonymous rate ratio exists among the sites. The most likely sites for action of positive selection were found in the ferredoxin gene, which is an important component of the apparatus for photosynthesis.


    Introduction
 
The extent to which mutation and selection contribute to nucleotide substitutions has been one of the outstanding problems in molecular evolution since the proposal of the neutral theory by Kimura (1968). To resolve this issue, numerous studies have compared sequences of homologous genes in related species (see Gillespie 1991, pp. 40–44; Li 1997, pp. 177–235). Several observations from these studies have suggested important roles of selection for nonsynonymous (Gillespie 1989) and synonymous (Ikemura 1985; Akashi 1995) substitutions, although evolution of noncoding regions seems to be mostly driven by mutation (but see Bernardi 2000). However, most species studied thus far have been confined to specific groups (e.g., bacteria, mammals, Drosophila, and Arabidopsis) that include one or more model organisms. Recent genome projects at various scales enable us to study wider groups of species and to expand our knowledge of the evolutionary patterns of DNA sequences.

One group of species that has not been well studied for molecular evolution is conifer trees. Conifers belong to gymnosperms, and there are currently about 550 species in the group. They have large genome sizes and long life expectancy and have undergone only a few chromosome duplications in their evolution. In addition, they are wind-pollinated and probably for this reason are less differentiated within species (Hamrick and Godt 1990Citation ). These characteristics make this group an interesting target of a molecular evolutionary study.

In this study, we surveyed the molecular evolutionary characteristics of nuclear genes in 10 species of Cupressaceae sensu lato (s.l.), one group of conifers. Their phylogenetic relationships have been inferred using plastid DNA (Brunsfeld et al. 1994; Tsumura et al. 1995; Gadek et al. 2000; Kusumi et al. 2000), rDNA (Stefanovic et al. 1998), and immunological analysis (Price and Lowenstein 1989). These analyses provided well-resolved phylogenetic relationships within Cupressaceae s.l. Gadek et al. (2000) combined nonmolecular and molecular data and proposed a new infrafamilial classification of Cupressaceae s.l., in which seven subfamilies were recognized.

Meanwhile, in collaboration with other researchers, we have started a genome project of Cryptomeria japonica, which belongs to the Taxodioideae of Cupressaceae s.l. (based on the new classification, Gadek et al. 2000). Cryptomeria japonica is one of the most important timber trees in Japan because of its excellent growth and wood quality. Tsumura et al. (1997) constructed sequence-tagged site (STS) markers, and Iwata et al. (2001) constructed cleaved amplified polymorphic sequence (CAPS) markers in C. japonica, using libraries from 3-day imbibed embryos and inner bark, adding more markers to a linkage map based on RFLP, RAPD, and isozyme and morphological loci (Mukai et al. 1995). Furthermore, an expressed sequence tag (EST) analysis was carried out (Ujino-Ihara et al. 2000), and more than 2,000 partial sequences of C. japonica were obtained from cDNA clones isolated from a library derived from inner bark tissues. This genetic information and the phylogenetic analyses provide us with an opportunity to study the molecular evolution of C. japonica and its related species. Here, we study the molecular evolution of 11 nuclear genes from cDNA clones, whose functions were inferred by homology.

We study the 11 genes from 10 species, including species from three subfamilies of Cupressaceae s.l., Taxodioideae, Cupressoideae, and Sequoioideae. The Taxodioideae include three genera, Cryptomeria, Taxodium, and Glyptostrobus. The inferred relationships among these genera are well supported, and this clade is sister to the traditional Cupressaceae sensu stricto (s.s.), which is subdivided into the Cupressoideae and the Callitroideae according to the new classification (Gadek et al. 2000). In contrast with the Taxodioideae, the Cupressoideae include 10 genera comprising more than 100 species. Many of the modern genera of these subfamilies originated before the end of the Mesozoic, and the Cupressoideae-Callitroideae clade and Taxodioideae likely diverged roughly 100 MYA (Miller 1977, 1988). Thus, there has potentially been enough time for these species to accumulate the sequence variation needed to study the evolutionary dynamics of genes. Because the Sequoioideae is sister to those two subfamilies, this subfamily was chosen as an outgroup. We characterize the molecular evolution of each of the 11 genes from species of these conifer subfamilies by estimating the synonymous (silent, dS) and nonsynonymous (amino acid replacing, dN) substitution rates in protein-coding regions and the nucleotide substitution rate (K) in noncoding regions.


    Materials and Methods
 
Plant Materials
We examined three species of Taxodioideae, five Cupressoideae species, and two Sequoioideae species. The phylogenetic relationship of these species is known, as shown in figure 1A (see Gadek et al. 2000; Kusumi et al. 2000). Either one of the Sequoioideae species was used as an outgroup for comparison of nucleotide substitution rates between Taxodioideae and Cupressoideae species.



 
Fig. 1. The phylogenetic relationship of the species and neighbor-joining trees of nuclear genes. A, The phylogenetic relationship based on plastid DNA sequences (made from Kusumi et al. 2000). B–L, Neighbor-joining trees based on Kimura's (1980) two-parameter distances with bootstrap values. Species names are abbreviated: Seq, Sequoia sempervirens; Met, Metasequoia glyptostroboides; Jun, Juniperus rigida; Chap, Chamaecyparis pisifera; Chao, Chamaecyparis obtusa; Thuj, Thuja standishii; Thujp, Thujopsis dolabrata; Tax, Taxodium distichum; Gly, Glyptostrobus pensilis; Cry, Cryptomeria japonica. Se, Cu, and Ta represent the subfamilies Sequoioideae, Cupressoideae, and Taxodioideae, respectively. Sequences used in the three-branch analyses are marked by an asterisk (*).

 
DNA Sequencing
Eighty new sequences at 11 nuclear loci were determined for the 10 species (details are shown in table 1). DNA materials were a part of those used in the previous study of Tsumura et al. (1995). Eleven pairs of primers were used for PCR amplification (listed in table 1). Four of them (for the Chi1, GapC, Pgi, and F3h genes) were designed from sequences of cDNA clones that were obtained from EST analysis (Ujino-Ihara et al. 2000). The four clones, designated CD0851, CC2241, CC2464, and CM0218, are homologous to class I chitinase (Chi1), glyceraldehyde-3-phosphate dehydrogenase (GapC), glucose-6-phosphate isomerase (Pgi), and flavanone-3-hydroxylase (F3h) genes, respectively. The seven other pairs of primers were originally designed from STS markers to amplify cDNA clones CD1514, CD1613, CD1706, CD1640, CC1195, CC0550, and CC1606, which are homologous to chalcone synthase (Chs), ferredoxin (Ferr), phosphoribosyltransferase (Pat), glutamyl-tRNA reductase (HemA), Rab geranylgeranyl transferase beta subunit (Rabggtb), chitinase class III–like (Chi2), and Myb transcription factor (Myb) genes, respectively (Tsumura et al. 1997; unpublished data). Genes were chosen randomly to obtain a broad range of putative functions and a general picture of molecular evolution in the species group. Chs and F3h are involved in the synthesis of secondary metabolites, Chi1 and Chi2 are enzymes that catalyze the hydrolysis of chitin, and Myb is a transcription factor. All other genes are involved in primary metabolism.


 
Table 1 Newly Sequenced Genes, Taxa, Accession Numbers, and Primer Sequences

 
PCR products of these genes were purified using Geneclean II (Bio101) or the QIAquick PCR Purification Kit (QIAGEN) and then directly sequenced for both strands on an ABI 377 automated sequencer, using the BigDye Terminator Cycle Sequencing Ready Reaction Kit (PE Biosystems). Some products (the GapC, F3h, and Rabggtb genes) included DNA fragments of various lengths, probably because of heterozygosity at the locus or a recent gene duplication, so these were cloned into a pGEM-T vector (Promega), and several independent clones were sequenced. All new sequences obtained in this study were submitted to the DNA Data Bank of Japan (DDBJ). The accession numbers for the sequences are listed in table 1.

Sequence Analyses
Sequence alignments were first performed by Clustal X (Thompson et al. 1997) and then refined manually. To determine the relationships among sequences, we first constructed neighbor-joining trees (Saitou and Nei 1987) of the 11 genes, using these alignments of DNA sequences. The numbers of synonymous and nonsynonymous substitutions per site (dS and dN) and numbers of nucleotide substitutions per site (K) in noncoding regions were estimated by maximum likelihood (ML), using the CODEML and BASEML programs of the PAML package (Yang 2000). For the estimation of dS and dN, we used models allowing transition-transversion rate bias and unequal codon frequencies, which were determined using the empirical nucleotide frequencies at the three positions of the codon (F3×4 model; Yang and Nielsen 1998). K was estimated under Kimura's two-parameter model (Kimura 1980), which also accounts for transition-transversion rate bias. The parameters dS, dN, and K were first estimated pairwise by ML for each gene, and the average numbers of substitutions between subfamilies, with their variances, were obtained using the method of Nei and Jin (1989). The one-degree-of-freedom (1D) relative rate test of Tajima (1993) was used to compare the accumulation of site differences between Taxodioideae and Cupressoideae species.
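To make the two-parameter model concrete, the following is a minimal sketch of the closed-form pairwise distance of Kimura (1980), K = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitional and transversional differences. It is not the ML estimation performed by BASEML; the function name `k2p_distance` and the handling of gaps (sites with any non-ACGT character are skipped) are assumptions made for illustration.

```python
import math

PURINES = {"A", "G"}

def k2p_distance(seq1: str, seq2: str) -> float:
    """Closed-form Kimura two-parameter distance between two aligned sequences.

    Illustrative sketch only: sites where either sequence has a character
    other than A, C, G, or T (gaps, ambiguity codes) are ignored.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    n = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        n += 1
        if a == b:
            continue
        # Purine<->purine or pyrimidine<->pyrimidine changes are transitions.
        if (a in PURINES) == (b in PURINES):
            transitions += 1
        else:
            transversions += 1
    if n == 0:
        raise ValueError("no comparable sites")
    p = transitions / n      # proportion of transitional differences (P)
    q = transversions / n    # proportion of transversional differences (Q)
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

For closely related sequences the distance is only slightly larger than the raw proportion of differing sites; the correction grows as the sequences diverge.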

Lineage-specific estimates of dS and dN were obtained by ML using the sequences from three species, Metasequoia glyptostroboides (or Sequoia sempervirens), C. japonica, and Thujopsis dolabrata. In the latter two species, sequences of all 11 nuclear genes could be determined, and so these species were used as representatives of the respective subfamilies. We used two models of the dN/dS (ω) ratio. The first model assumed the same ratio for all three branches (M. glyptostroboides, C. japonica, and T. dolabrata), whereas the second model allowed independent ω ratios for the three branches.

G+C content at synonymous third codon positions (GC3s) and codon usage bias, measured using the effective number of codons (ENC; Wright 1990), were computed for all sequences using the program CODON (Lloyd and Sharp 1992).
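As an illustration of what GC3s measures, the sketch below computes the fraction of G or C at the third position of codons that have synonymous alternatives. It assumes the common definition that excludes Met (ATG), Trp (TGG), and stop codons under the standard genetic code; this is not the CODON program itself, and the function name `gc3s` is hypothetical. The value is returned as a fraction, so multiply by 100 to compare with the percentages reported in the paper.

```python
# Codons without synonymous alternatives (Met, Trp) and stop codons,
# assuming the standard genetic code.
EXCLUDED_CODONS = {"ATG", "TGG", "TAA", "TAG", "TGA"}

def gc3s(cds: str) -> float:
    """Fraction of G+C at third positions of synonymous codons.

    Illustrative sketch: the input is assumed to be an in-frame coding
    sequence; trailing partial codons and codons containing characters
    other than A, C, G, T are skipped.
    """
    cds = cds.upper().replace("U", "T")
    gc = total = 0
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        if codon in EXCLUDED_CODONS:
            continue
        if any(base not in "ACGT" for base in codon):
            continue
        total += 1
        if codon[2] in "GC":
            gc += 1
    if total == 0:
        raise ValueError("no codons with synonymous alternatives found")
    return gc / total
```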

To test for the presence of codon (amino acid) sites with ω = dN/dS > 1, which can be considered candidate sites undergoing diversifying selection, and to identify them, ML models with variable ω ratios among sites were used to analyze each of the 11 nuclear gene data sets (Nielsen and Yang 1998). We used the following six models for the ω distribution (table 4), implemented in the CODEML program (Yang 2000). Model 0 (M0) assumes one ω ratio (ω0) for all codon sites. The neutral model (M1) assumes conserved sites with ω0 = 0, in proportion p0, and neutral sites with ω1 = 1, in proportion p1 = 1 - p0. The selection model (M2) adds an additional ω class with frequency p2 = 1 - p0 - p1, with ω2 estimated from the data. The discrete model (M3) uses a general discrete distribution with three site classes, with the proportions (p0, p1, and p2) and the ω ratios (ω0, ω1, and ω2) estimated from the data. The beta model (M7) assumes that the ω ratio varies according to a beta distribution B(p, q), whose domain is bounded within the interval (0, 1). Thus, this model does not allow for codon sites with ω > 1. The beta & ω model (M8) adds a discrete ω class to the beta (M7) model to account for codons with ω > 1. Sites with ω drawn from the beta distribution B(p, q) occur in a proportion p0, and the rest belong to a discrete ω class (ω1) and occur in proportion p1 = 1 - p0. We compared each of four pairs of models (M0 vs. M3, M1 vs. M2, M1 vs. M3, and M7 vs. M8) by likelihood-ratio tests (LRTs) to examine the statistical significance of the fit of the model (Yang et al. 2000).
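The LRT itself is a short calculation: twice the log-likelihood difference, 2Δl, is compared with a chi-square distribution whose degrees of freedom equal the difference in the number of free parameters. The sketch below is a minimal illustration and not part of CODEML. The degrees of freedom are obtained by counting the free parameters listed in the model descriptions above (only df = 4 for M1 vs. M3 is stated explicitly in the Results), and the names `chi2_sf_even_df`, `lrt`, and `LRT_DF` are hypothetical. Because all four comparisons have even df, the chi-square tail probability has a closed form and no statistics library is needed.

```python
import math

# Degrees of freedom by parameter counting (an assumption derived from the
# model descriptions): M0 has 1 free omega parameter, M1 has 1 (p0),
# M2 has 3, M3 has 5, M7 has 2 (p, q), and M8 has 4.
LRT_DF = {"M0 vs M3": 4, "M1 vs M2": 2, "M1 vs M3": 4, "M7 vs M8": 2}

def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with even df.

    Uses the closed form exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term = total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

def lrt(lnl_null: float, lnl_alt: float, df: int) -> tuple[float, float]:
    """Return (2*delta_l, P) for nested models given their log-likelihoods."""
    stat = 2.0 * (lnl_alt - lnl_null)
    # A slightly negative statistic can arise from incomplete optimization.
    return stat, chi2_sf_even_df(max(stat, 0.0), df)
```

As a consistency check, `chi2_sf_even_df(19.16, 4)` and `chi2_sf_even_df(11.92, 4)` fall below 0.001 and 0.05, respectively, matching the significance levels reported in the Results for the M1 versus M3 comparison in the Ferr and F3h genes.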


 
Table 4 Log-Likelihood Values and Parameter Estimates Under Models of Variable ω Ratios Among Codon Sites

 

    Results
 
Table 1 summarizes the sequence data of the 11 nuclear genes. The sequences of all 11 nuclear genes could be obtained from the three species of Taxodioideae and T. dolabrata, but for other species in Cupressoideae, several sequences could not be obtained because of poor amplification of PCR products. For each of the 11 genes, the sequence of both or either species of Sequoioideae could be determined. Although duplicated copies of the F3h and Rabggtb genes were found in several species, phylogenetic analyses indicated that these genes were duplicated after speciation or at least after the divergence between Taxodioideae and Cupressoideae (fig. 1). They were therefore included in the following analyses.

Because it is important to compare orthologous, not paralogous, genes for estimating substitution rates, we first examined this issue. The highest synonymous substitution rate is that of Chs, and the lowest is that of HemA (see table 2). The former is more than eight times higher than the latter, and this high estimate merits special concern because an overestimation of the substitution rate is expected if we compare paralogous genes. We examined the phylogenetic relationship of the eight Chs sequences using a pine Chs gene obtained from a public database as an outgroup. Topological relationships should not reflect species relationships if we compare paralogous genes, but we obtained exactly the same topological relationship as that obtained from plastid genes by Kusumi et al. (2000) (data not shown). In addition, Chs was directly sequenced from PCR products, and no sequence heterogeneity was observed. Therefore, we tentatively conclude that the eight Chs sequences are orthologous and that Chs has high synonymous substitution rates, although there is a small possibility that Chs genes are paralogs that retain phylogenetic relationships like those of the plastid genes. In the other genes, we examined the topological relationships without using outgroup genes (fig. 1). In all genes, the topological relationships are essentially the same as those of the plastid genes, though gene duplications after the divergence of Cupressoideae and Taxodioideae were found in a few genes. We therefore believe that we are measuring substitution rates of orthologous genes.


 
Table 2 Average Number of Nucleotide Substitutions Per Site Between the Lineages and GC3s and ENC Values

 
The average synonymous and nonsynonymous substitution rates and nucleotide substitution rates in noncoding regions between subfamilies in each of 11 nuclear genes and two plastid genes are shown in table 2. In this analysis, we used all sequences including those of duplicated copies. We previously documented an acceleration of both synonymous and nonsynonymous substitution rates in two plastid genes in the Cupressoideae (Cupressaceae s.s.) lineage (Kusumi et al. 2000). In the 11 nuclear genes, the synonymous substitution rate (dS) is higher in Cupressoideae than in Taxodioideae, except for the GapC gene (the difference is significant using a two-tailed sign test, P = 0.011). Averages of dS between Sequoioideae and Taxodioideae range from 0.107 to 0.854, with a mean value of 0.387, and those between Sequoioideae and Cupressoideae range from 0.131 to 1.212, with a mean value of 0.474. The rate increase in Cupressoideae is similar to that of the plastid genes. However, in nonsynonymous substitution rates (dN), no obvious acceleration could be observed in the Cupressoideae. In the Chi1, Chs, HemA, Rabggtb, and Myb genes, average dN is slightly higher in the Cupressoideae lineage, but in the other genes, dN is roughly the same in the two lineages or higher in the Taxodioideae lineage. Mean values of average dN in both lineages are similar, those between Sequoioideae and Taxodioideae being 0.051, and those between Sequoioideae and Cupressoideae being 0.053.
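The sign test quoted above can be reproduced from the counts alone: 10 of the 11 genes show a higher dS in Cupressoideae and one (GapC) does not. The sketch below is a minimal exact two-tailed sign test under the null hypothesis that either direction is equally likely; the function name `sign_test_two_tailed` is hypothetical, and ties are assumed to have been dropped beforehand.

```python
from math import comb

def sign_test_two_tailed(n_higher: int, n_lower: int) -> float:
    """Exact two-tailed sign test P value for paired comparisons.

    Under the null hypothesis each comparison is equally likely to go
    either way, so the smaller count follows Binomial(n, 1/2).
    """
    n = n_higher + n_lower
    k = min(n_higher, n_lower)
    one_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * one_tail)

# 10 genes with higher dS in Cupressoideae, 1 gene (GapC) without.
p_value = sign_test_two_tailed(10, 1)  # 24/2048, about 0.0117
```

The result, 24/2048 (about 0.0117), is close to the reported P = 0.011.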

For noncoding regions, only three data sets of introns and two of 3'-untranslated regions (UTRs) were obtained. In three out of five cases, because sequences were short or included extensive indels (or both), the aligned segments were shorter than 200 bp. Introns of the GapC and Rabggtb genes were relatively long (1,094 and 617 bp, respectively), and estimates from these regions were considered reliable. For GapC and Rabggtb, average nucleotide substitution rates (K) are higher in the Cupressoideae lineage.

In the previous study, we applied Tajima's 1D relative rate test to plastid genes (chlL, matK, and rbcL) and the 28S rRNA gene to test the homogeneity of substitution rates among the lineages. The results indicated that changes in the nucleotide substitution rate have occurred across multiple loci. We also applied the analysis to the 11 nuclear genes. Among 114 tests, only 10 comparisons were significant at the 5% level, and 9 out of 10 significant results indicated higher rates of accumulation of nucleotide substitutions in the Cupressoideae lineage. Although most comparisons of pairs were not significant, in 69% of the comparisons, we observed larger numbers of different sites (Tajima's m in the method) in the Cupressoideae lineage.
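For readers unfamiliar with the 1D relative rate test, the sketch below shows the calculation as it is usually described for Tajima (1993): among three aligned sequences, m1 counts the sites where only the first ingroup sequence differs from the other two, m2 counts the sites where only the second ingroup sequence differs, and (m1 - m2)^2 / (m1 + m2) is compared with a chi-square distribution with one degree of freedom. The function name `tajima_1d` and the treatment of gaps (sites with any non-ACGT character are skipped) are assumptions made for illustration.

```python
import math

def tajima_1d(seq1: str, seq2: str, outgroup: str) -> tuple[int, int, float, float]:
    """Tajima's one-degree-of-freedom relative rate test (illustrative sketch).

    Returns (m1, m2, chi_square, p_value). m1 and m2 are the numbers of
    sites at which seq1 alone, or seq2 alone, differs from the other two
    sequences. All other site patterns are ignored.
    """
    if not (len(seq1) == len(seq2) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")
    m1 = m2 = 0
    for a, b, o in zip(seq1.upper(), seq2.upper(), outgroup.upper()):
        if a not in "ACGT" or b not in "ACGT" or o not in "ACGT":
            continue
        if a != b and b == o:
            m1 += 1      # change unique to the seq1 lineage
        elif a != b and a == o:
            m2 += 1      # change unique to the seq2 lineage
    if m1 + m2 == 0:
        return m1, m2, 0.0, 1.0
    chi_square = (m1 - m2) ** 2 / (m1 + m2)
    # Upper tail of chi-square with 1 df, expressed with erfc.
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return m1, m2, chi_square, p_value
```

In the comparisons described above, seq1 and seq2 would be a Cupressoideae and a Taxodioideae sequence, with a Sequoioideae sequence as the outgroup; a larger m on the Cupressoideae side indicates more unique differences accumulated along that lineage.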

The values of the ω (= dN/dS) ratio estimated using the average synonymous and nonsynonymous substitution rates are also shown in table 2. As in the previous study, two plastid genes show higher values of the ω ratio between Sequoioideae and Cupressoideae. However, 8 out of 11 nuclear genes show a higher value of the ω ratio between Sequoioideae and Taxodioideae. The mean value of ω ratios among the nuclear genes between Sequoioideae and Taxodioideae is 0.153, and it is higher than that between Sequoioideae and Cupressoideae, 0.129.

To test whether the variation of the ω ratio between the two lineages is significant, we carried out the LRT under two models, one assuming the same ω ratio for all branches (lineages) of the tree and another assuming different ω ratios among branches. For these three-branch analyses, we used sequences from three species, M. glyptostroboides (or S. sempervirens), C. japonica, and T. dolabrata, as representatives of the respective subfamilies. The latter two species were chosen because all 11 genes could be sequenced from them. If genes were duplicated (e.g., F3h, Rabggtb), we used copy A (see fig. 1). The sequences used in the analyses are marked by an asterisk (*) in figure 1. The same choice was made for the following analyses using these three species. First, we assumed a star phylogeny for the estimation. By the LRT, only one gene, the Pat gene, shows a significant difference of the ω ratio among the three lineages (P < 0.05). The same analysis was performed using the rooted tree (the outgroup is M. glyptostroboides or S. sempervirens), and the result was similar to that using the star phylogeny. To sum up, heterogeneity of nucleotide substitution rates in the two lineages has occurred in both the nuclear and plastid genomes, but these lineage effects did not significantly change the ω ratios of the respective lineages.

From the above ML estimation, the numbers of synonymous and nonsynonymous substitutions for the three branches, M. glyptostroboides (or S. sempervirens), C. japonica, and T. dolabrata, were also obtained. To examine variability of the numbers of synonymous and nonsynonymous substitutions among lineages, we calculated the dispersion index (R) with and without the weighting factor of Gillespie (1989). The weighting factors are proportional to the total number of substitutions across all 11 genes along the lineage in the respective categories (synonymous and nonsynonymous). They are 1.453, 0.665, and 0.883 for the M. glyptostroboides, C. japonica, and T. dolabrata lineages, respectively, for nonsynonymous substitutions, and 1.319, 0.700, and 0.981, respectively, for synonymous substitutions. We used the dN and dS estimates from the ML method under the model of different ω ratios among branches. The results are shown in table 3. In both categories of substitutions, adjustments for lineage effects by the weighting factors lowered the average values of R. For the synonymous substitutions, without weights (equally weighted) R ranges from 0.281 to 18.612, with an average value of 4.106. When weights are used, R ranges from 0.092 to 5.734, with the average value being reduced to 2.104. For the nonsynonymous substitutions, equally weighted R ranges from 0.000 to 13.229, with an average value of 2.500, but weighted R ranges from 0.136 to 2.772, with a reduced average value of 0.968. This last estimate is surprisingly close to 1, which is expected under a simple Poisson process with the same rate among lineages (Kimura 1983, p. 69). Therefore, the lineage effects seem to be a significant factor of variation in synonymous and especially in nonsynonymous numbers of substitutions.


 
Table 3 Dispersion Index for Nonsynonymous and Synonymous Substitutions

 
In the 11 genes, we measured G+C content as the percentage of G+C at synonymous third codon positions (GC3s), and we measured codon bias by the effective number of codons (ENC; Wright 1990) (table 2). Average GC3s varied among the 11 nuclear genes, ranging from 31.3% to 70.3%. However, in each gene, the GC contents were similar across the species. Average ENC values among the nuclear genes ranged from 48.3 to 60.4, and differences of ENC among the species were not significant in any gene. Whereas the GC3s of the plastid genes were lower (~25%) than those of the nuclear genes, values of ENC were comparable to those of the nuclear genes.

We subsequently examined whether GC3s, ENC, and dN were related to dS variation among the genes, using the data from M. glyptostroboides (or S. sempervirens), C. japonica, and T. dolabrata. We estimated lineage-specific dS and dN under the model with different ω ratios among branches. Before examining the relationships of dS with GC3s, ENC, and dN, we checked whether estimates of dS (dN) in the C. japonica and T. dolabrata lineages are correlated across loci. The coefficient of determination (r²) of dS estimates is 0.708 (P = 0.005) and that of dN estimates is 0.696 (P = 0.0007) between these two lineages. Subsequently, the relationships of synonymous substitution rate with nonsynonymous substitution rate, GC content, and codon bias were evaluated by linear regression, using the lineage-specific dS and dN, GC3s, and ENC values.

In both lineages, the correlation between dS and dN did not differ significantly from zero (fig. 2a and b, dotted line). However, these plots have an outlier gene (Chs). This gene has a high dS and a low ω ratio. Because the number of nonsynonymous differences of the Chs genes between these two species was one, a large variance of the estimates of dS was expected. Thus, we also carried out the regression excluding the Chs gene. When the Chs gene was removed, estimates of dS were positively correlated with dN in both lineages (C. japonica, r² = 0.476, P = 0.0249; T. dolabrata, r² = 0.793, P = 0.0002) (fig. 2a and b, solid line). GC3s is also positively correlated with dS, and the correlation was significant in both lineages (C. japonica, r² = 0.786, P < 0.0001; T. dolabrata, r² = 0.403, P = 0.0341) (fig. 2c and d). On the other hand, dS was not correlated with ENC in either lineage (C. japonica, r² = 0.22, P = 0.1499; T. dolabrata, r² = 0.061, P = 0.4766) (fig. 2e and f).
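The effect of a single outlying gene on r² can be illustrated with a short calculation. The sketch below computes the coefficient of determination of a simple linear regression (the squared Pearson correlation) with and without one point. The numbers are invented purely for illustration and are not the estimates from this study; the function name `r_squared` is hypothetical, and no P value is computed.

```python
def r_squared(x: list[float], y: list[float]) -> float:
    """Coefficient of determination of a simple linear regression of y on x,
    which equals the squared Pearson correlation coefficient."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy ** 2 / (sxx * syy)

# Hypothetical per-gene values (NOT data from the paper). The last gene
# mimics an outlier with a very high dS and a very low dN.
dn_values = [0.01, 0.02, 0.03, 0.04, 0.05, 0.005]
ds_values = [0.10, 0.20, 0.30, 0.40, 0.50, 1.20]

r2_all = r_squared(dn_values, ds_values)                       # outlier included
r2_without_outlier = r_squared(dn_values[:-1], ds_values[:-1])  # outlier excluded
```

In this toy data set one extreme point is enough to mask an otherwise tight positive relationship, which is the kind of sensitivity that motivates reporting the regression both with and without Chs.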



 
Fig. 2. The relationships between dS and dN (a and b), dS and GC3s (c and d), and dS and ENC (e and f) in C. japonica and T. dolabrata genes.

 
We used six models that assume various classes of ω ratio among the codon sites to analyze the data, utilizing the CODEML program (Yang 2000), and carried out LRTs. Results of the analyses for the five genes in which at least one of the M1 versus M2, M1 versus M3, and M7 versus M8 comparisons gave significant results are shown in table 4. Estimates of the transition-transversion rate ratio (κ) and the tree length (sum of branch lengths) are homogeneous among models in each gene, and thus those based on M0 are tabulated. In the Chs, Chi2, and Myb genes, ω estimates are always below or close to 1. In contrast, in the Ferr and F3h genes, the M3 and M8 models gave ω estimates of more than 1 (ω1 = 2.042 in Ferr, and ω2 = 4.708 in F3h under M3), though the proportions of such sites were small. The M1 versus M3 comparison was significant in the Ferr (2Δl = 19.16, P < 0.001, df = 4) and F3h (2Δl = 11.92, P < 0.05, df = 4) genes, although the M7 versus M8 comparison was not significant in either gene. In the Ferr gene, three sites had ω > 1, and these sites were clustered. In the F3h gene, only one site had ω > 1. These four sites are considered candidate targets of positive selection (Yang et al. 2000).


    Discussion
 
Rate Variation Among Lineages
At synonymous sites, the Cupressoideae have evolved faster than the Taxodioideae except for the GapC gene (table 2). Although we could not clearly identify such a tendency at nonsynonymous sites, perhaps because of the small number of substitutions among the lineages, we consider that these results provide evidence for the acceleration of the evolutionary rate of nuclear genomes in the lineage. The rate heterogeneity between the two lineages observed in the nuclear genes parallels the finding that Cupressoideae evolved faster than Taxodioideae in plastid genes (Kusumi et al. 2000). In fact, the ratio of the average synonymous rate between Sequoioideae and Taxodioideae to that between Sequoioideae and Cupressoideae is 0.813 in plastid genes and 0.816 in nuclear genes (see table 2). Thus, the variation of evolutionary rates among the lineages is correlated between genes from the two different genomes. This suggests that the same evolutionary process is responsible for the acceleration of the evolutionary rate of the two genomes in Cupressoideae. Such a correlation between genomes has also been demonstrated in grasses and palms (Gaut et al. 1996; Eyre-Walker and Gaut 1997; Gaut 1998). At synonymous sites, the grasses evolve faster than the palms in the plastid gene rbcL, the mitochondrial gene atpA, and the nuclear gene Adh (Eyre-Walker and Gaut 1997). The parallel change of substitution rates in different genomes may be a general feature of plant evolution, although a few possible exceptions are known in some parasitic plants (Nickrent et al. 1998).

Rate Variation Among Genes
As in other organisms, synonymous substitution rates of nuclear genes vary among loci in conifers. We showed that the synonymous substitution rate was positively correlated with the nonsynonymous substitution rate and with GC3s but not with ENC (fig. 2). This pattern is similar to those reported in mammals (Bielawski, Dunn, and Yang 2000; Hurst and Williams 2000). It is well known that mammalian genomes are made up of large regions of distinct base composition, the so-called isochores (Bernardi et al. 1985), and that the GC content at third codon positions of a gene is correlated with the GC content of the region in which that gene resides (Ikemura 1985). Matassi, Sharp, and Gautier (1999) investigated synonymous substitution rates and GC content at silent sites among genes lying within 1 cM of each other in mouse and human. They found regional similarity in both the synonymous substitution rate and GC content but did not find a significant correlation between the two. However, other analyses using an ML method indicated a positive correlation between the synonymous substitution rate and GC3s (Bielawski, Dunn, and Yang 2000; Hurst and Williams 2000). In addition, Francino and Ochman (1999) reported that interspecific variation in two globin pseudogenes that reside in different isochores was consistent with the effects of differential GC mutation pressure. These results suggest that variation in the synonymous substitution rate among genes at least partially reflects region-specific synonymous mutation rates (Wolfe, Sharp, and Li 1989) and that such regional differences are related to GC content (Ticher and Graur 1989).
Because compositional analyses of genomes and genes have not been carried out in the conifers studied here, we cannot conclude whether the positive correlation of the synonymous substitution rate with GC3s observed in both mammals and conifers is caused by a similar evolutionary process. Nonetheless, it is interesting that organisms with different life cycles and habitats show similar patterns of nucleotide substitution.
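
To make the GC3s measure used above concrete, the following is a minimal Python sketch of one common way to compute it: the fraction of G or C at the third position of codons that have synonymous alternatives. It assumes the standard genetic code and an in-frame coding sequence; the function name gc3s and the handling of ambiguous bases are illustrative choices and are not taken from the CODONS program or from the analysis reported here.

```python
# Codons without synonymous alternatives (Met, Trp) and stop codons are
# excluded, assuming the standard genetic code.
EXCLUDED = {"ATG", "TGG", "TAA", "TAG", "TGA"}


def gc3s(cds: str) -> float:
    """Return the fraction of G or C at third positions of synonymous codons.

    `cds` is assumed to be an in-frame coding sequence; a trailing partial
    codon and codons containing ambiguous bases are ignored.
    """
    seq = cds.upper().replace("U", "T")
    gc = 0
    total = 0
    for i in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[i:i + 3]
        if codon in EXCLUDED or set(codon) - set("ACGT"):
            continue  # skip Met, Trp, stop, or ambiguous codons
        total += 1
        if codon[2] in "GC":
            gc += 1
    if total == 0:
        raise ValueError("sequence contains no usable codons")
    return gc / total
```

For the toy sequence ATGGCCGCTTGGAAATAA, only GCC, GCT, and AAA are counted, and one of the three ends in G or C, so the function returns 1/3.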

The nuclear genomes of angiosperms are characterized by compositional compartmentalization (Salinas et al. 1988), and differences in compositional patterns among genomes have been observed. The genomes of the Gramineae are GC rich, and their coding sequences cover a broad compositional range, whereas the genomes of dicots (Arabidopsis, soybean, pea, tobacco, tomato, and potato) are GC poor, and their coding sequences cover a narrow GC range (Salinas et al. 1988; Matassi et al. 1989; Carels et al. 1998). However, studies on the relationship between the synonymous substitution rate and base composition are limited. In the Gramineae, Alvarez-Valin et al. (1999) reported a negative correlation between the synonymous rate and GC3s, which is contrary to the pattern observed in conifers. Furthermore, we did not find a correlation between codon usage and synonymous substitution rates (fig. 2e and f), whereas Fennoy and Bailey-Serres (1993) suggested that codon usage in maize might reflect both regional bias in nucleotide composition and selection on the third position. What causes the opposite signs of the correlation between the synonymous rate and GC3s in conifers and the Gramineae is currently unknown.

A significant positive correlation between synonymous (dS) and nonsynonymous (dN) substitution rates was also found in Drosophila (Dunn, Bielawski, and Yang 2001), but it was weak, if it existed at all, in mammals (Bielawski, Dunn, and Yang 2000; Hurst and Williams 2000). A positive correlation is expected if both types of substitutions are mutation-driven. Mutation-driven models of molecular evolution include the neutral (Kimura 1968; see Ohta and Ina 1995 for a theoretical treatment of the correlation), nearly neutral (Ohta 1973, 1992), and advantageous mutation models. If environmental fluctuations drive amino acid substitutions, as in the SAS-CFF model of Gillespie (1978), a positive correlation between dS and dN is not expected unless synonymous substitutions are also driven by the same force. Because we found a positive correlation between synonymous and nonsynonymous substitution rates when the Chs gene was excluded, the substitution data in conifers are consistent with the hypothesis of mutation-driven evolution. The exclusion of Chs is justified because the number of nonsynonymous substitutions between C. japonica and T. dolabrata is very small.

The mutation-driven hypothesis of nonsynonymous substitutions is also consistent with the low dispersion indices, R, observed for nonsynonymous substitutions. Except in strong interaction models such as the house-of-cards model (see Iwasa 1993; Gillespie 1994), the dispersion index is expected to be close to 1 under various mutation-driven models unless the underlying changes of parameters (e.g., population size) are very slow (Araki and Tachida 1997; Cutler 2000). The dispersion index is large in mammals (Gillespie 1989; Ohta 1995) but not in Drosophila (Zeng et al. 1998). Therefore, we suggest that nonsynonymous substitutions in conifers are mainly driven by mutation, not by diversifying selection as suggested by Gillespie (1989) for mammalian genes. Because the number of genes sampled is small, more genes need to be examined to generalize these findings.
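
As a reminder of what the dispersion index measures, the sketch below computes the plain variance-to-mean ratio of substitution counts across lineages for a single gene. It is only an illustration: the function name and the example counts are hypothetical, the sample variance (n - 1 denominator) is an assumed choice, and the adjustment for lineage effects applied in this study is not reproduced.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Variance-to-mean ratio of substitution counts across lineages.

    `counts` holds the estimated number of substitutions on each lineage
    for one gene (at least two lineages). Under a Poisson process the
    expected ratio is about 1. No lineage-effect adjustment is applied.
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("mean number of substitutions is zero")
    return variance(counts) / m  # sample variance, n - 1 denominator


# Hypothetical counts for three lineages, not data from this study.
example_r = dispersion_index([10, 12, 8])
```

With the hypothetical counts 10, 12, and 8, the mean is 10 and the sample variance is 4, giving R = 0.4.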

Possibility of Adaptive Selection at the Protein Level
ML analyses based on the M3 model identified three sites in the Ferr gene and one site in the F3h gene with ω > 1, suggesting that they are candidate sites under positive selection. The M8 model detected the same candidate sites, but the test results were not significant. Because the number of sequences (s = 6) was small and the sequence length (n = 115) was short for the Ferr gene in our study, the failure to obtain a significant difference between M8 and M7 may simply reflect a lack of power in the LRT. Because the neutrality of synonymous substitutions has not yet been established in conifers, the fact that ω > 1 at some sites does not automatically mean that those sites are under positive selection, but they are at least good candidates for further research. Ferredoxin donates electrons to several proteins that are important components of the photosynthetic apparatus and of nitrate reduction in plants. It is notable that such a basic gene has candidate sites for positive selection.
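
The comparison of M7 with M8 referred to above is a likelihood-ratio test in which twice the log-likelihood difference is compared with a chi-square distribution with 2 degrees of freedom, because M8 has two more parameters than M7. The sketch below shows the arithmetic only. The function name and the log-likelihood values in the note are hypothetical and are not results from this study; the closed form exp(-x/2) is the chi-square upper-tail probability for 2 degrees of freedom.

```python
import math


def lrt_m7_vs_m8(lnl_m7: float, lnl_m8: float):
    """Likelihood-ratio test of M7 (null) against M8 (alternative).

    Returns the test statistic 2 * (lnL_M8 - lnL_M7) and its P value
    under a chi-square distribution with 2 degrees of freedom, for which
    the upper-tail probability is exactly exp(-x / 2).
    """
    stat = 2.0 * (lnl_m8 - lnl_m7)
    stat = max(stat, 0.0)  # guard against small negative values from optimization error
    p_value = math.exp(-stat / 2.0)
    return stat, p_value
```

For example, hypothetical log likelihoods of -1000.0 under M7 and -997.0 under M8 give a statistic of 6.0 and a P value of about 0.0498.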

Molecular Evolution in Conifers
In summary, synonymous substitution rates for conifer nuclear genes are higher in the Cupressoideae than in the Taxodioideae, are variable among loci, and correlate with GC content and nonsynonymous rates but not with ENC. The dispersion indices for nonsynonymous substitutions are close to 1. Some of these characteristics are similar to those of mammals, but others are not. These features of nucleotide substitutions in conifers are considered to reflect various factors that affected the evolution of those species and their ancestors. In addition to examining more genes to determine the extent to which the general features found in this study hold, it is necessary to identify those evolutionary factors and evaluate their effects by accumulating more information on conifers, which have so far attracted less attention than other taxa.


    Acknowledgements
We thank Dr. B. S. Gaut and two anonymous referees for many helpful comments on the manuscript. This work was partially supported by a grant from the Program for Promotion of Basic Research Activities for Innovative Biosciences (PROBRAIN). H.T. was also partially supported by a grant from the Uehara Memorial Foundation.


    Footnotes
 
Brandon Gaut, Reviewing Editor

Keywords: Cupressaceae, lineage effect, substitution rate, dispersion index, the neutral theory

Address for correspondence and reprints: Hidenori Tachida, Department of Biology, Faculty of Sciences, Kyushu University, Ropponmatsu, Fukuoka 810-8560, Japan. htachscb@mbox.nc.kyushu-u.ac.jp.


    References

    Akashi H. 1995. Inferring weak selection from patterns of polymorphism and divergence at silent sites in Drosophila DNA. Genetics 139:1067-1076.

    Alvarez-Valin F., K. Jabbari, N. Carels, G. Bernardi. 1999. Synonymous and nonsynonymous substitution in genes from Gramineae: intragenic correlations. J. Mol. Evol. 49:330-342.

    Araki H., H. Tachida. 1997. Bottleneck effect on evolutionary rate in the nearly neutral mutation model. Genetics 147:907-914.

    Bernardi G. 2000. The compositional evolution of vertebrate genomes. Gene 259 (Special issue):31-43.

    Bernardi G., B. Olofsson, J. Filipski, M. Zerial, J. Salinas, G. Cuny, M. Meunier-Rotival, F. Rodier. 1985. The mosaic genome of warm-blooded vertebrates. Science 228:953-958.

    Bielawski J. P., K. A. Dunn, Z. Yang. 2000. Rates of nucleotide substitution and mammalian nuclear gene evolution: approximate and maximum-likelihood methods lead to different conclusions. Genetics 156:1299-1308.

    Brunsfeld S. J., P. S. Soltis, D. E. Soltis, P. A. Gadek, C. J. Quinn, D. D. Strenge, T. A. Ranker. 1994. Phylogenetic relationships among the genera of Taxodiaceae and Cupressaceae: evidence from rbcL sequences. Syst. Bot. 19:253-262.

    Carels N., P. Hatey, K. Jabbari, G. Bernardi. 1998. Compositional properties of homologous coding sequences from plants. J. Mol. Evol. 46:45-53.

    Cutler D. J. 2000. Understanding the overdispersed molecular clock. Genetics 154:1403-1417.

    Dunn K. A., J. P. Bielawski, Z. Yang. 2001. Substitution rates in Drosophila nuclear genes: implications for translational selection. Genetics 157:295-305.

    Eyre-Walker A., B. S. Gaut. 1997. Correlated rates of synonymous site evolution among plant genomes. Mol. Biol. Evol. 14:455-460.

    Fennoy S. L., J. Bailey-Serres. 1993. Synonymous codon usage in Zea mays L. nuclear genes is varied by levels of C and G-ending codons. Nucleic Acids Res. 21:5294-5300.

    Francino M. P., H. Ochman. 1999. Isochores result from mutation not selection. Nature 400:30-31.

    Gadek P. A., D. L. Alpers, M. M. Heslewood, C. J. Quinn. 2000. Relationships within Cupressaceae sensu lato: a combined morphological and molecular approach. Am. J. Bot. 87:1044-1057.

    Gaut B. S. 1998. Molecular clocks and nucleotide substitution rates in higher plants. Evol. Biol. 30:93-120.

    Gaut B. S., B. R. Morton, B. M. McCaig, M. T. Clegg. 1996. Substitution rate comparisons between grasses and palms: synonymous rate differences at the nuclear gene Adh parallel rate differences at the plastid gene rbcL. Proc. Natl. Acad. Sci. USA 93:10274-10279.

    Gillespie J. H. 1978. A general model to account for enzyme variation in natural populations. V. The SAS-CFF model. Theor. Popul. Biol. 13:1-45.

    ———. 1989. Lineage effects and the index of dispersion of molecular evolution. Mol. Biol. Evol. 6:636-647.

    ———. 1991. The causes of molecular evolution. Oxford University Press, Oxford, U.K.

    ———. 1994. Substitution processes in molecular evolution. III. Deleterious alleles. Genetics 138:943-952.

    Hamrick J. L., M. J. Godt. 1990. Allozyme diversity in plant species. Pp. 43–63 in A. H. D. Brown, M. T. Clegg, A. L. Kahler, and B. S. Weir, eds. Plant population genetics, breeding, and genetic resources. Sinauer, Sunderland, Mass.

    Hurst L. D., E. J. B. Williams. 2000. Covariation of GC content and the silent site substitution rate in rodents: implications for methodology and for the evolution of isochores. Gene 261:107-114.

    Ikemura T. 1985. Codon usage and tRNA content in unicellular and multicellular organisms. Mol. Biol. Evol. 2:13-34.

    Iwasa Y. 1993. Overdispersed molecular evolution in constant environments. J. Theor. Biol. 164:373-393.

    Iwata H., T. Ihara-Ujino, K. Yoshimura, K. Nagasaka, Y. Tsumura. 2001. Cleaved amplified polymorphic sequence markers in Sugi, Cryptomeria japonica. Theor. Appl. Genet. 103:881-895.

    Kimura M. 1968. Evolutionary rate at the molecular level. Nature 217:624-626.

    ———. 1980. A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol. 16:111-120.

    ———. 1983. The neutral theory of molecular evolution. Cambridge University Press, Cambridge, U.K.

    Kusumi J., Y. Tsumura, H. Yoshimaru, H. Tachida. 2000. Phylogenetic relationships in Taxodiaceae and Cupressaceae sensu stricto based on matK gene, chlL gene, trnL-trnF IGS region, and trnL intron sequences. Am. J. Bot. 87:1480-1488.

    Li W.-H. 1997. Molecular evolution. Sinauer, Sunderland, Mass.

    Lloyd A. T., P. M. Sharp. 1992. CODONS: a microcomputer program for codon usage analysis. J. Hered. 83:239-240.

    Matassi G., L. M. Montero, J. Salinas, G. Bernardi. 1989. The isochore organization and compositional distribution of homologous coding sequences in nuclear genomes of plants. Nucleic Acids Res. 17:5273-5290.

    Matassi G., P. M. Sharp, C. Gautier. 1999. Chromosomal location effects on gene sequence evolution in mammals. Curr. Biol. 9:786-791.

    Miller C. N. 1977. Mesozoic conifers. Bot. Rev. 43:217-280.

    ———. 1988. The origin of modern conifer families. Pp. 448–486 in C. B. Beck, ed. Origin and evolution of gymnosperms. Columbia University Press, New York.

    Mukai Y., Y. Suyama, Y. Tsumura, T. Kawahara, H. Yoshimaru, T. Kondo, N. Tomaru, N. Kuramoto, M. Murai. 1995. A linkage map for sugi (Cryptomeria japonica) based on RFLP, RAPD, and isozyme loci. Theor. Appl. Genet. 90:835-840.

    Nei M., L. Jin. 1989. Variances of the average number of nucleotide substitutions within and between populations. Mol. Biol. Evol. 6:290-300.

    Nickrent D. L., R. J. Duff, A. E. Colwell, A. D. Wolfe, N. D. Young, K. E. Steiner, C. W. dePamphilis. 1998. Molecular phylogenetic and evolutionary studies of parasitic plants. Pp. 211–241 in D. E. Soltis, P. S. Soltis, and J. J. Doyle, eds. Molecular systematics of plants, 2nd edition. Chapman & Hall, New York.

    Nielsen R., Z. Yang. 1998. Likelihood models for detecting positively selected amino-acid sites and applications to the HIV-1 envelope gene. Genetics 148:929-936.

    Ohta T. 1973. Slightly deleterious mutant substitutions in evolution. Nature 246:96-98.

    ———. 1992. The nearly neutral theory of molecular evolution. Annu. Rev. Ecol. Syst. 23:263-286.

    ———. 1995. Synonymous and nonsynonymous substitutions in mammalian genes and the nearly neutral theory. J. Mol. Evol. 40:56-63.

    Ohta T., Y. Ina. 1995. Variation in synonymous substitution rates among mammalian genes and the correlation between synonymous and nonsynonymous divergences. J. Mol. Evol. 41:717-720.

    Price R. A., J. M. Lowenstein. 1989. An immunological comparison of the Sciadopityaceae, Taxodiaceae, and Cupressaceae. Syst. Bot. 14:141-149.

    Saitou N., M. Nei. 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406-425.

    Salinas J., G. Matassi, L. M. Montero, G. Bernardi. 1988. Compositional compartmentalization and compositional patterns in the nuclear genomes of plants. Nucleic Acids Res. 16:4269-4285.

    Stefanovic S., M. Jager, J. Deutsch, J. Broutin, M. Masselot. 1998. Phylogenetic relationships of conifers inferred from partial 28S rRNA gene sequences. Am. J. Bot. 85:688-697.

    Tajima F. 1993. Simple methods for testing the molecular evolutionary clock hypothesis. Genetics 135:599-607.

    Thompson J. D., T. J. Gibson, F. Plewniak, F. Jeanmougin, D. G. Higgins. 1997. The Clustal-X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 25:4876-4882.

    Ticher A., D. Graur. 1989. Nucleic acid composition, codon usage, and the rate of synonymous substitution in protein-coding genes. J. Mol. Evol. 28:286-298.

    Tsumura Y., Y. Suyama, K. Yoshimura, N. Shirato, Y. Mukai. 1997. Sequence-tagged-sites (STSs) of cDNA clones in Cryptomeria japonica and their evaluation as molecular markers in conifers. Theor. Appl. Genet. 94:764-772.

    Tsumura Y., K. Yoshimura, N. Tomaru, K. Ohba. 1995. Molecular phylogeny of conifers using RFLP analysis of PCR-amplified specific chloroplast genes. Theor. Appl. Genet. 91:1222-1236.

    Ujino-Ihara T., K. Yoshimura, Y. Ugawa, H. Yoshimaru, K. Nagasaka, Y. Tsumura. 2000. Expression analysis of ESTs derived from the inner bark of Cryptomeria japonica. Plant Mol. Biol. 43:451-457.

    Wolfe K. H., P. M. Sharp, W.-H. Li. 1989. Mutation rates differ among regions of the mammalian genome. Nature 337:283-285.

    Wright F. 1990. The ‘effective number of codons’ used in a gene. Gene 87:23-29.

    Yang Z. 2000. PAML: phylogenetic analysis by maximum likelihood. Version 3.0. University College London, U.K.

    Yang Z., R. Nielsen. 1998. Synonymous and nonsynonymous rate variation in nuclear genes of mammals. J. Mol. Evol. 46:409-418.

    Yang Z., R. Nielsen, N. Goldman, A.-M. K. Pedersen. 2000. Codon-substitution models for heterogeneous selection pressure at amino acid sites. Genetics 155:431-449.

    Zeng L.-W., J. M. Comeron, B. Chen, M. Kreitman. 1998. The molecular clock revisited: the rate of synonymous vs replacement change in Drosophila. Genetica 102/103:369–382.

Accepted for publication January 15, 2002.