* Arborea and Chaire de Recherche du Canada en Génomique Forestière et Environnementale, Centre de Recherche en Biologie Forestière, Université Laval, Sainte-Foy, Québec, Canada; and Natural Resources Canada, Canadian Forest Service, Laurentian Forestry Centre, Sainte-Foy, Québec, Canada
Correspondence: E-mail: bousquet{at}rsvs.ulaval.ca.
Abstract
Key Words: conifer; knox-I genes; evolutionary rates; functional divergence; gene duplication; genome mapping; phylogenetic analysis
Introduction
Knox genes encode a large group of transcription factors that belong to the homeobox gene family. KNOX proteins have a common structural organization consisting of six regions, including a conserved KNOX domain, a highly conserved ELK domain, and the homeodomain (HD) responsible for DNA binding (Ito, Sato, and Matsuoka 2002). All KNOX proteins have three conserved amino acids in the loop between helix I and helix II of HD, and, therefore, belong to the TALE (three amino acid loop extension) superfamily. Plant knox genes are defined by homology to the maize knotted1 (kn1) gene and are divided into two classes based on HD sequence similarity and expression patterns (Bharathan et al. 1999). Class II genes are expressed in all organs, and no specific function has been ascribed to them. In contrast, class I genes are mainly expressed in embryos and in the shoot apical meristem (SAM), which is essential for organ formation in higher plants. Class I knox genes are known as transcriptional regulators that play an important role in plant architecture (Hake and Ori 2002). Some knox-I genes have been reported to control the development of the SAM during embryogenesis (Ito, Sato, and Matsuoka 2002). Loss-of-function mutations in the SHOOTMERISTEMLESS (STM) gene of Arabidopsis have led to the loss of the embryogenic SAM, whereas other embryonic organs developed normally (reviewed in Takada and Tasaka [2002]). Thus, one of the functional attributes of knox-I genes is to maintain indeterminate cell fate and suppress determination processes within the SAM (Veit 2004). In agreement with these observations, the duplication and diversification of ancestral knox genes have resulted in gene families with related but unique functions that appear to represent essential steps in the evolution of higher plant SAMs (Reiser, Sanchez-Baracaldo, and Hake 2000).
In conifers and more specifically in Norway spruce (Picea abies) and black spruce (Picea mariana), a total of three distinct knox-I paralogs (KN1, KN2, and KN3) have been reported from various studies (Sundas-Larsson et al. 1998; Hjortswang et al. 2002; Robert K. Rutledge, Canadian Forest Service, GenBank numbers U90091 and U90092). Phylogenetic analysis has shown that two of the paralogs (PmKN1 and PmKN2) form a monophyletic group and, thus, have evolved independently from angiosperm knox-I genes (Champagne and Ashton 2001). Recently, a distinct fourth member, KN4, has been identified from a large screening of cDNAs from diverse tissues in Picea mariana (Robert K. Rutledge, Canadian Forest Service, personal communication). Picea abies knox-I genes are expressed in embryogenic cultures, stems, roots, and cone buds but not in needles (Hjortswang et al. 2002). In Welwitschia mirabilis, knox-I genes are good indicators of meristem activity and maintenance (Pham and Sinha 2003). In addition, one of Picea abies knox-I genes, HBK2 (named PaKN3 in this study), appears to be involved in the development of somatic embryos (Hjortswang et al. 2002). Hence, the function of knox-I genes does not seem to be entirely redundant in conifers: some genes appear to be regulators during embryogenesis, and others are involved in postembryonic SAM maintenance. Therefore, it is likely that part of the diversity of conifer knox-I genes is functional and that footprints of functional divergence can be detected at the molecular level.
In this study, we used phylogenetic analysis of protein-coding sequences in combination with chromosomal localization to investigate the expansion of the knox-I gene family in conifers. Relationships with evolutionary rates were also examined to identify the tempos of evolution and the driving forces leading to the diversification of conifer knox-I genes.
Materials and Methods
RT-PCR
The missing 5' and 3' ends of partial Pinus taeda cDNA clones (Center for Computational Genomics and Bioinformatics, University of Minnesota) and partial Populus trichocarpa x P. deltoides genomic contigs (International Populus Genome Consortium) were amplified by RACE using the SMART RACE cDNA Amplification Kit and the Advantage 2 PCR Kit according to the manufacturer's instructions (BD Biosciences, Palo Alto, Calif.). For RT-PCR analysis, total RNA was reverse-transcribed using the Superscript First Strand Synthesis System (Invitrogen, Carlsbad, Calif.). cDNAs were then PCR-amplified using primers specific to PtKN1 (forward, 5'-GGTTTTGAAACTAGTGTTTCATGAATATGG-3' and reverse, 5'-TGTGCTGTTTTCAGCGGCCGCTTCTTTCTA-3'), to PtKN2 (forward, 5'-GATGATGCAGTAAAGGCAGAC-3' and reverse, 5'-ACGGAAAATTACAGCTTCCCT-3'), to PtKN3 (forward, 5'-GAAATATAACTAGTTGGGATCATATG-3' and reverse, 5'-CAAATCTACAGCGGCCGCCTAAAACTG-3'), to PtdKN3 (forward, 5'-ATGGAGGACTACAATCAAATGAG-3' and reverse, 5'-TCATGGACCGAGCCGATAAT-3'), and to PtdKN2 (forward, 5'-ATGGAGGGTGGTGGTGGTGA-3' and reverse, 5'-TCAAAGCAGGGTGGGAGAGAT-3'). The complete cDNA sequence of KN4 was first determined in Picea mariana (PmKN4) by RT-PCR of midstage embryos and midstage female cones (a gift from Robert K. Rutledge, Canadian Forest Service).
Genomic PCR
Gene-specific primer pairs used to amplify spruce and pine knox-I paralogs KN1, KN2, KN3, and KN4 are presented in tables 1 and 2, respectively, in Supplementary Material online. Because of the large size of the third intron in knox-I genes (5.056 kb for the maize Kn1 gene [Vollbrecht et al. 1991], 5.5 kb for the rice OSH3 gene [Sato, Fukuda, and Hirano 2001], and 4.16 kb for the Arabidopsis KNAT2 gene [AC010796]), each gene was amplified in two fragments using gene-specific primers located in the more variable 5'UTR (untranslated region) and 3'UTR and designed from the alignment of complete cDNA sequences from pine and spruce (see above). For KN1, a region of 1.6 kb that encompassed the 5'UTR, exon1, intron1, exon2, intron2, and the beginning of exon3 (5' part) and a region of 0.6 kb that encompassed exon4, intron4, exon5, and the 3'UTR (3' part) were amplified. For the other three conifer knox-I genes, the two amplicons had the same exon and intron structures with respective lengths of 2.1 kb and 0.75 kb for KN2, 1.6 kb and 0.57 kb for KN3, and 1.3 kb and 0.65 kb for KN4. PCR reactions were performed in 30 µl containing 20 mM Tris-HCl (pH 8.4), 50 mM KCl, 1.5 to 2.0 mM MgCl2, 200 µM of each dNTP, 200 µM of both 5' and 3' primers, and 1.0 U of platinum Taq DNA polymerase (Invitrogen, Carlsbad, Calif.). About 5 to 20 ng of genomic DNA was used as template. A Peltier thermal cycler (DYAD™ DNA Engine, MJ Research, Waltham, Mass.) was used, with the following thermal cycling profile: 4 min at 94°C, followed by 35 cycles of 30 s at 94°C, 30 s at annealing temperature optimized between 54°C and 58°C for each pair of primers, and 1 min at 72°C, followed by 10 min at 72°C. Each PCR fragment was directly sequenced in both directions with a Perkin-Elmer ABI 3730 XL DNA sequencer (Applied Biosystems, Foster City, Calif.), using BigDye Terminator cycle sequencing kits version 3.1. Contigs were constructed using Windows 32 SeqMan version 5.05 (DNASTAR Inc., Madison, Wis.) and using the BioEdit sequence alignment editor version 5.0.9 (Tom Hall, Department of Microbiology, North Carolina State University). There was no evidence for artificial recombinants: all amplifications resulted in expected single sequencing products, alignment with known conifer cDNA sequences was congruent with expectations for each group of orthologs, and each gene was located at the same locus position on black spruce or white spruce linkage maps, whether the 5' or the 3' part of the gene was used. Sequence data from this study (table 1) have been deposited in the GenBank database under accession numbers AY680380 to AY680405 (Pinus and Picea) and AY684937 to AY684938 (Populus trichocarpa x P. deltoides).
To study the evolution of the conifer group, gene trees were estimated from conifer sequences (excluding the partial Cryptomeria sequence) using MP as implemented in PAUP* UNIX version 4.0b10 (Swofford 2002). Gene trees based on DNA sequences had to be estimated for subsequent substitution rate calculations for various codon positions (see below). The aligned amino acid sequences were used as a guide for the alignment of the corresponding cDNA or genomic DNA sequences in the TRANALIGN program (EMBOSS). Then, distinct MP analyses were applied to first and second codon positions only (corresponding mostly to nonsynonymous substitutions) and to third codon positions only (corresponding mostly to synonymous substitutions). Conifer trees were rooted with two angiosperm knox-I genes (AtSTM and MtKnox) extracted from the angiosperm sister group to conifer genes, which was identified from the large-scale analysis of 75 knox-I sequences (see Results). The various trees were visualized using TREEVIEW version 1.6.6 (Page 2001; http://taxonomy.zoology.gla.ac.uk/rod/treeview.html).
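The partition into codon positions described above can be illustrated with a minimal Python sketch. It assumes an in-frame, codon-aligned coding sequence (as would be obtained from a protein-guided alignment); the function name `split_codon_positions` and the toy sequences are hypothetical and are not part of the original analysis pipeline.

```python
def split_codon_positions(aligned_cds):
    """Split one in-frame alignment row into two partitions.

    Returns a tuple (pos12, pos3): the concatenated first and second codon
    positions, and the concatenated third codon positions. Gap characters
    are carried along unchanged so that columns stay aligned across rows.
    """
    if len(aligned_cds) % 3 != 0:
        raise ValueError("aligned coding sequence length must be a multiple of 3")
    pos12 = "".join(aligned_cds[i:i + 2] for i in range(0, len(aligned_cds), 3))
    pos3 = aligned_cds[2::3]
    return pos12, pos3


if __name__ == "__main__":
    # Toy rows for illustration only (not real knox-I sequences).
    toy_alignment = {"seqA": "ATGGCT---AAA", "seqB": "ATGGCCGAAAAG"}
    for name, row in toy_alignment.items():
        pos12, pos3 = split_codon_positions(row)
        print(name, pos12, pos3)
```

Each partition can then be written to its own alignment file and analyzed separately, which mirrors the distinction made in the text between mostly nonsynonymous (first and second positions) and mostly synonymous (third positions) changes.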
Mapping knox-I Genes in Picea
The four conifer knox-I genes were mapped on the genome of Picea mariana and also on the genome of Picea glauca (white spruce) as a replicate. These two sympatric species are reproductively isolated and represent highly divergent lineages in the genus. For each species, two crosses with one parent in common were used (Picea mariana, cross numbers 9920002 and S11991V; Picea glauca, cross numbers 2856 and 2872), each consisting of 118 progeny, except 85 progeny for S11991V (table 3 in Supplementary Material online). More than 300 amplified fragment length polymorphisms (AFLPs) segregating 1:1 and 50 codominant markers (expressed sequence tag polymorphisms (ESTPs) and microsatellites), as well as the four conifer knox-I genes, were used to build each linkage map. ESTP markers of orthologous loci for Picea species have been previously developed by Pelgas, Isabel, and Bousquet (2004). The linkage analysis was performed with JoinMap version 3.0 (Van Ooijen and Voorrips 2001) using the parameter set for progeny derived by cross-pollination (CP). A minimum LOD threshold of 4.5 was used for grouping markers into linkage groups. Then, for ordering markers, a minimum LOD threshold of 3.0 and a recombination frequency of 0.35 were used. Parents and progeny were genotyped for the four conifer knox-I genes using DGGE (denaturing gradient gel electrophoresis) or CAPS (cleaved amplified polymorphic sequence) methodologies (see Pelgas, Isabel, and Bousquet [2004]), using specific primers for each gene (table 3 in Supplementary Material online). For comparison between Picea mariana and Picea glauca, at least three codominant markers per linkage group were used.
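To give a sense of scale for the LOD thresholds quoted above (4.5 for grouping, 3.0 for ordering), the following Python sketch computes a classical two-point LOD score for markers segregating 1:1. It is a simplified, phase-known illustration and should not be read as the algorithm used by JoinMap for cross-pollinated populations; the function name `lod_score` and the recombinant counts are hypothetical.

```python
import math


def lod_score(recombinants, total):
    """Two-point LOD score for phase-known data with 1:1 segregation.

    theta is estimated as recombinants / total (capped at 0.5), and
    LOD = log10( L(theta) / L(0.5) ).
    Returns the tuple (theta, lod).
    """
    if total <= 0 or not 0 <= recombinants <= total:
        raise ValueError("need 0 <= recombinants <= total and total > 0")
    theta = min(recombinants / total, 0.5)
    nonrecombinants = total - recombinants
    log_likelihood = 0.0
    if recombinants:
        log_likelihood += recombinants * math.log10(theta)
    if nonrecombinants:
        log_likelihood += nonrecombinants * math.log10(1.0 - theta)
    return theta, log_likelihood - total * math.log10(0.5)


if __name__ == "__main__":
    # Hypothetical count: 12 recombinants among 118 progeny.
    theta, lod = lod_score(12, 118)
    print(f"theta = {theta:.3f}, LOD = {lod:.2f}, grouped at LOD 4.5: {lod >= 4.5}")
```

Under these assumptions, a pair of markers with few recombinants among 118 progeny clears the grouping threshold comfortably, whereas a pair with a recombination fraction near 0.5 yields a LOD near zero.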
Rates of substitution per year were estimated using the penalized-likelihood method (Sanderson 2002) as implemented in the program r8s version 1.50 (Sanderson 2003). Optimal values of smoothing were determined by a cross-validation procedure implemented in the r8s program. Two distinct r8s analyses were conducted: the first one by considering the conifer MP tree constructed with first and second codon positions only (corresponding mostly to nonsynonymous substitutions) and the second one by considering the conifer MP tree constructed with third codon positions only (corresponding mostly to synonymous substitutions).
Adaptive evolution between the conifer knox-I paralogs was investigated based on a relative rate ratio test (RRRT) developed by Creevey and McInerney (2002) as implemented in CRANN (Creevey and McInerney 2003). For each internal branch of the conifer tree, changes in the descendant clade described by that internal branch were counted. This procedure results in four types of substitutions: replacement (nonsynonymous) invariable (RI), replacement variable (RV), silent (synonymous) invariable (SI), and silent variable (SV). A significant difference between the RI/RV and SI/SV ratios is indicative of positive selection.
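The comparison of the RI/RV and SI/SV ratios amounts to a test of independence on a 2 x 2 table of counts. The Python sketch below uses a G-test with a chi-square approximation (1 degree of freedom) as one plausible way to carry out such a comparison; it is not the CRANN implementation, and the function name `rrrt_g_test` and the counts are hypothetical.

```python
import math


def rrrt_g_test(ri, rv, si, sv):
    """G-test of independence on the table [[RI, RV], [SI, SV]].

    Returns (g, p, excess_ri), where p comes from the chi-square
    approximation with 1 degree of freedom and excess_ri is True when
    RI/RV exceeds SI/SV (the direction expected under positive selection).
    """
    observed = [[ri, rv], [si, sv]]
    rows = [ri + rv, si + sv]
    cols = [ri + si, rv + sv]
    total = sum(rows)
    if min(rows + cols) == 0:
        raise ValueError("every row and column total must be positive")
    g = 0.0
    for i in range(2):
        for j in range(2):
            count = observed[i][j]
            if count > 0:
                expected = rows[i] * cols[j] / total
                g += 2.0 * count * math.log(count / expected)
    g = max(g, 0.0)  # guard against tiny negative rounding error
    p = math.erfc(math.sqrt(g / 2.0))  # chi-square survival function, 1 df
    return g, p, ri * sv > rv * si


if __name__ == "__main__":
    # Hypothetical counts for one internal branch.
    g, p, excess_ri = rrrt_g_test(40, 10, 20, 30)
    print(f"G = {g:.2f}, p = {p:.2g}, excess of RI: {excess_ri}")
```

When counts are small, an exact test would be a safer choice than the chi-square approximation used in this sketch.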
To further analyze functional divergence between conifer knox-I paralogs, sites with evolutionary rate shift involved in type-I or type-II divergence were identified using likelihood ratio tests, as implemented in the program LRT (Knudsen et al. 2003). Type I sites represent amino acid configurations that are conserved in one paralog but highly variable in another; type II sites are fixed for different amino acids between paralogs (Wang and Gu 2001). LRT constructs a multiple sequence alignment with two subfamilies at a time representing two clusters of duplicated genes and infers a tree using the NJ method. The branch lengths are estimated by maximum likelihood under the Jones, Taylor, and Thornton (JTT) model (Jones, Taylor, and Thornton 1992).
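The distinction between type I and type II sites can be made concrete with a small Python sketch that labels alignment columns by simple counting. This is only a descriptive heuristic under stated assumptions (any column with two or more residues is treated as variable) and involves no likelihood calculation, so it is not a substitute for the LRT program; the function names and toy sequences are hypothetical.

```python
def classify_site(column_a, column_b):
    """Label one alignment column given residues from two subfamilies.

    'type II'   : each subfamily is fixed, but for different residues
    'type I'    : one subfamily is fixed, the other is variable
    'conserved' : both subfamilies are fixed for the same residue
    'variable'  : both subfamilies are variable
    """
    states_a, states_b = set(column_a), set(column_b)
    if len(states_a) == 1 and len(states_b) == 1:
        return "conserved" if states_a == states_b else "type II"
    if len(states_a) == 1 or len(states_b) == 1:
        return "type I"
    return "variable"


def scan_alignment(subfamily_a, subfamily_b):
    """Return (1-based position, label) for candidate type I and type II sites."""
    columns_a = list(zip(*subfamily_a))
    columns_b = list(zip(*subfamily_b))
    if len(columns_a) != len(columns_b):
        raise ValueError("subfamilies must share the same alignment length")
    hits = []
    for position, (col_a, col_b) in enumerate(zip(columns_a, columns_b), start=1):
        label = classify_site(col_a, col_b)
        if label in ("type I", "type II"):
            hits.append((position, label))
    return hits


if __name__ == "__main__":
    # Toy protein fragments for illustration only.
    print(scan_alignment(["KLA", "KLS", "KLT"], ["RLA", "RLA", "RLA"]))
```

A likelihood-based method additionally weighs each site by the tree and the substitution model, which this counting scheme ignores.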
Results
The tree reported in figure 1 represents a majority-rule consensus of the 32 most parsimonious trees based on all positions of the protein sequence. The two moss sequences, PpMKN2 and PpMKN4, were sister group to the seed plants. The angiosperm sequences were divided into four large groups, each with medium to high bootstrap support from the various analytical procedures used (fig. 1). Bootstrap support for major nodes was generally consistent between MP and NJ (fig. 1). Considering sequences from conserved domains only instead of complete sequences resulted in higher or lower bootstraps, depending on the node (fig. 1). For MP analyses, considering gaps as an additional character did not result in a consistently higher or lower bootstrap support (fig. 1).
Within the conifer group, the four paralogous subgroups (KN1, KN2, KN3, and KN4) contained pine and spruce sequences (fig. 1). Hence, the divergence between the four conifer subgroups occurred before the split between Pinus and Picea. The partial cDNA from Cryptomeria japonica (CjKN) could not be considered in phylogenetic analyses limited to conserved domains because it consisted of only 118 amino acid residues in the C-terminal region. Nevertheless, in the other analyses, it clustered with the conifer group but did not cluster with any of the four KN subgroups, presumably because this sequence was partial (fig. 1).
KN1 and KN2 showed a high degree of amino acid identity (average of 85%), whereas KN4 and KN3 paralogs were less similar to KN1 and KN2 and to each other (from 58% to 68% identity). Altogether, the diversification pattern suggests three successive events of duplication (D1 to D3) occurring during a period not exceeding 160 Myr, from the split between angiosperms and gymnosperms (300 Myr [Savard et al. 1994]) to the divergence of conifer genera (140 Myr [Florin 1963; Miller 1988]).
Assignment of the Four Conifer knox-I Genes to Linkage Groups
The four conifer knox-I paralogs were mapped on the genome of Picea mariana and Picea glauca. From one individual linkage map to another within each species, and between Picea mariana and Picea glauca, the order of codominant markers remained consistent within linkage groups. Detailed results on the linkage maps developed from the four progeny arrays will be reported elsewhere (B. Pelgas, S. Beauseigle, J. Bousquet, and N. Isabel, unpublished data). The less divergent KN1 and KN2 were assigned to the same linkage group and localized close to each other, whereas KN3 and KN4 mapped on different linkage groups (fig. 2), paralleling their more remote phylogenetic position.
In contrast, after the split between Pinus and Picea (140 Myr, see asterisks in fig. 3), the rates per site per year slowed down dramatically for all four paralogs, as indicated by the very short branches on the ratograms (fig. 3), with mean values of 0.094 × 10⁻⁹ (12-fold difference) for first and second codon positions and 0.385 × 10⁻⁹ (sixfold difference) for third codon positions. A correlated response was observed between the two categories of sites because a rate decrease of approximately the same amplitude was observed for third codon positions defining mostly synonymous changes versus first and second codon positions defining mostly nonsynonymous changes (fig. 3).
Rates of Synonymous and Nonsynonymous Substitution in Conifer knox-I Genes
The above estimates of substitution rate per site per year indicated an early period of faster evolution, followed by a period of slower evolution for the four conifer knox-I genes. To investigate whether this temporal heterogeneity in evolutionary rates was caused by selection, synonymous (dS) and nonsynonymous (dN) rates of substitution per site and the ratio dN/dS (ω) were evaluated for all pairwise comparisons of conifer sequences (table 2). Between paralogs, the estimated dN values and the estimated dS values had a mean of 0.271 and 2.003, respectively. Between orthologs, the estimated dN values had a mean of 0.020, and the estimated dS values averaged 0.135. For all pairwise comparisons, the estimated dS values were seven times higher, on average, than dN values. Both dN and dS values were, on average, 14 times higher in pairwise comparisons of paralogs than between orthologous gene pairs within each conifer knox-I gene cluster, indicating a larger degree of sequence divergence between the four conifer paralogs. The coefficient of variation of rates among all conifer knox-I genes was 0.58 for dN and 0.63 for dS, indicating that synonymous and nonsynonymous rates had varied to about the same extent among conifer genes. The dN/dS ratios (ω) measured between conifer paralogs had a mean of 0.143 and were similar to those between orthologous gene pairs within each conifer knox-I gene cluster, which averaged 0.130. A nearly homogeneous ω among conifer knox-I genes also indicates that both synonymous and nonsynonymous sites evolved in a correlated fashion, as seen from r8s analyses. Overall, the ω ratios indicated strong functional constraints (purifying selection, ω < 1).
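The summary statistics in this paragraph can be related to one another with a few lines of Python. The sketch below uses only the mean values quoted in the text; the function names are hypothetical, and the use of the population standard deviation in the coefficient of variation is an assumption, because the text does not state which estimator was used.

```python
from statistics import mean, pstdev


def omega(dn, ds):
    """Ratio of nonsynonymous to synonymous substitution rates (dN/dS)."""
    return dn / ds


def fold_difference(paralog_mean, ortholog_mean):
    """How many times larger the paralog mean is than the ortholog mean."""
    return paralog_mean / ortholog_mean


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (assumed estimator)."""
    return pstdev(values) / mean(values)


if __name__ == "__main__":
    # Mean values quoted in the text (table 2 of the article).
    dn_paralogs, ds_paralogs = 0.271, 2.003
    dn_orthologs, ds_orthologs = 0.020, 0.135
    print("omega from paralog means:", round(omega(dn_paralogs, ds_paralogs), 3))
    print("omega from ortholog means:", round(omega(dn_orthologs, ds_orthologs), 3))
    print("dN fold difference:", round(fold_difference(dn_paralogs, dn_orthologs), 2))
    print("dS fold difference:", round(fold_difference(ds_paralogs, ds_orthologs), 2))
```

The ratios computed from the means need not equal the reported means of pairwise ratios (0.143 and 0.130), because a mean of ratios generally differs from a ratio of means. All of these values remain far below 1, the threshold below which purifying selection is inferred.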
To further characterize the patterns of sequence divergence between duplicated genes, we examined factors other than selection that could contribute to variation in rates of synonymous and nonsynonymous substitution among genes. As GC content and codon usage bias have been shown to affect rates of synonymous substitution (e.g., Zhang, Vision, and Gaut 2002), we estimated the effective number of codons (ENCs), the GC content, and the transition/transversion ratio for each gene. GC content was similar among paralogs. However, the ENCs were lower for KN4, the %GC at third codon positions was lower for KN3, and both genes had a less skewed transition/transversion ratio, as compared with that of the closely linked KN1 and KN2 (table 3).
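Two of the descriptors mentioned above, %GC at third codon positions and the transition/transversion balance, are simple enough to sketch in Python. The code assumes in-frame coding sequences and counts raw pairwise differences without any correction for multiple hits; the function names and toy sequences are hypothetical, and the effective number of codons is not computed here.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def gc3(cds):
    """Fraction of G or C among unambiguous third codon positions."""
    thirds = [base for base in cds.upper()[2::3] if base in "ACGT"]
    if not thirds:
        raise ValueError("no unambiguous third codon positions found")
    return sum(base in "GC" for base in thirds) / len(thirds)


def ts_tv_counts(seq1, seq2):
    """Count observed transitions and transversions between aligned sequences.

    Positions with gaps or ambiguous bases are skipped.
    Returns the tuple (transitions, transversions).
    """
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have the same length")
    transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions


if __name__ == "__main__":
    # Toy sequences for illustration only.
    print("GC3:", gc3("ATGGCCAAA"))
    print("ts, tv:", ts_tv_counts("AACC", "GTTA"))
```

For divergent paralogs, observed counts underestimate the true number of changes, which is one reason model-based estimates are preferred in practice.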
Adaptive Evolution
We used the relative rate ratio test (RRRT) developed by Creevey and McInerney (2002) to detect whether conifer knox-I genes underwent a period of adaptive evolution. We found far more replacement invariable (RI) substitutions, those nonsynonymous substitutions where the new character state is preserved in all subsequent lineages, than would be expected from neutrality alone for a number of internal branches of the conifer knox-I gene tree (table 4). These branches are all located before the divergence of Pinus from Picea, 140 MYA (fig. 3A). These branches include that from the conifer knox-I "root" node to node D1 (#17), from the node D1 to D2 (#16), from the node D2 to D3 (#15), from the node D1 to the KN4 clade (#3), and from the node D2 to the KN3 clade (#7). Although the estimation of the dN/dS ratio (ω) failed to detect positive selection, such an excess of RI over RV substitutions should be taken as evidence for positive directional selection and neofunctionalization. These branches were also characterized by an accelerated rate of change per site per year in the r8s analyses (fig. 3A). No significant differences between the RI/RV and SI/SV ratios were noted for the branches leading to the KN1 clade (#11) or the KN2 clade (#14) (table 4).
The likelihood ratio tests (LRTs) developed by Knudsen et al. (2003) were used to detect significant rate shift at specific amino acid sites (table 5). The first duplication, that between KN4 and the other three gene paralogs, was analyzed by comparing amino acid changes in the multiple sequence alignment constructed with KN4 genes (here considered as subfamily 1), and KN1, KN2, and KN3 genes (here considered as subfamily 2). The LRTs identified 15 sites at the U threshold of 2, those considered to have highly significant rate shift (Knudsen et al. 2003), and some of them were located in conserved functional domains (table 5). The second duplication was analyzed by comparing the amino acid changes between KN3 genes (here considered as subfamily 1) and KN1 and KN2 genes (here considered as subfamily 2). The LRTs identified 5 sites at the U threshold of 2, with some of them in conserved functional domains (table 5). The third duplication was analyzed by comparing the amino acid sequences of KN1 orthologs to those of KN2 orthologs. The LRTs identified only one site at the U threshold of 2, and it was located outside the conserved functional domains (table 5). These results are consistent with those from the RRRTs.
Discussion
For three of the four angiosperm groups delineated by phylogenetic analyses, there were no conifer sister groups. Given that such groups were expected, this observation suggests incomplete sampling of conifer genes or gene loss. However, given the numerous clones analyzed from cDNA libraries screened from diverse tissues in Picea mariana and given the various pine and spruce EST databases screened herein (see Results), it is possible that the absence of conifer genes in some of the angiosperm groups observed is not a sampling artifact and that gene loss might have happened in conifers after new paralogs were gained. The birth and death of genes is a common theme in gene family evolution (Nei, Rogozin, and Piontkivska 2000), and the majority of genes appear to be lost after duplication (e.g., Kellis, Birren, and Lander 2004). The absence of monocot sequences in some of the knox-I angiosperm groups (fig. 1) may represent another putative case of gene loss, as reported by Tioni, Gonzales, and Chan (2003). It is also possible that other knox-I genes exist in conifers and in monocots, which have not been sampled because of transient or low levels of expression (Robert G. Rutledge, personal communication). It could be argued that gene loss in certain taxa would be more likely if the various groups of spermatophyte knox-I genes had high functional redundancy. Recently, sequence comparisons, expression studies, and mutant analysis indicated that the knox-I genes AtKNAT1, OsH15, ZmRS1, and ZmKnox4, all present in the group A3 (fig. 1), may encode equivalent knox functions in grasses and dicots (Veit 2004). However, evidence for functional redundancy between phylogenetically distant knox-I genes in angiosperms remains contradictory. For instance, it has been recently shown that AtSTM, an Arabidopsis knox-I gene from the group A4 (fig. 1), regulates AtKNAT1 (group A3) and AtKNAT2 (group A2), thus suggesting divergent roles in regulating meristem function among these three angiosperm knox-I groups (Byrne, Simorowski, and Martienssen 2002). At the same time, mutant analyses revealed partial redundancy among Arabidopsis genes. AtKNAT1 was shown to assume a redundant role with AtSTM in the vegetative shoot apical meristem but could not compensate for AtSTM in the floral meristem (Byrne, Simorowski, and Martienssen 2002). More detailed functional studies will be necessary before any firm conclusion can be reached on the redundancy of function among knox-I paralogs.
Mechanisms for the Expansion of the knox-I Gene Family in Conifers
The phylogenetic structure of knox-I genes from conifers suggests that three duplication events (D1, D2, and D3 [fig. 1]) occurred between the split of the lineage leading to conifers from that leading to angiosperms, around 300 MYA (Savard et al. 1994), and the more recent divergence between Pinus and Picea, here indicated by the diversification of the Pinaceae, around 140 MYA (Florin 1963; Miller 1988). KN1 and KN2, which are the result of a more recent duplication, were assigned to the same linkage group and localized close to each other. On the other hand, KN3 and KN4 mapped on different linkage groups (fig. 2), paralleling their more remote locations on the phylogenetic tree and their increased divergence at the sequence level. The remote chromosomal location of KN3 and KN4 would involve local duplication followed by a secondary mechanism bringing about rearrangements and translocation of chromosomal segments to a distant location. Such a model of structural evolution has been proposed for the homeobox-leucine zipper genes in Arabidopsis (Schena and Davis 1994). Comparative genomics studies also indicated that gene duplication is often accompanied by genetic map changes in all genomes surveyed (e.g., Riechmann et al. 2000; Lynch et al. 2001). Our results are consistent with the analysis of the Arabidopsis homeobox gene family, which showed that gene duplicates located on different chromosomes are more frequent (71%) than those on the same ones (Riechmann et al. 2000). Perhaps this trend is reflective of the relative abundance of ancient versus more recent gene duplication events.
Functional Divergence Among Conifer knox-I Genes
It has been shown early on that after gene duplication, one gene duplicate maintains the original function, whereas the other duplicate is free to accumulate amino acid changes as a result of functional redundancy and positive directional selection (Ohno 1970; Li 1983). Recent studies have indicated a more diverse array of potential outcomes in terms of fates of duplicated genes: pseudogenization, functional redundancy, subfunctionalization in which each daughter gene adopts part of the functions of the parental gene, and neofunctionalization in which there is gain of novel function (for reviews, see Lawton-Rauh [2003] and Zhang [2003]). The differences between the last three categories are not so obvious in the absence of structural or biochemical data at the protein level. In our study, evidence for purifying selection was detected at the molecular level, with ratios of nonsynonymous (dN) to synonymous (dS) substitutions much smaller than 1 between sequences within each conifer knox-I gene as well as between conifer knox-I genes (dS values six times higher than dN values on average) (table 2). At face value, these estimates indicate little evidence for directional selection. A similar trend towards low dN/dS (ω) values (dS values five times higher than dN values) was observed between gene duplicates in Arabidopsis (Zhang, Vision, and Gaut 2002).
However, the dN/dS ratio (ω) is based on rate averages since the split between two gene sequences. It does not take into account possible temporal heterogeneity in rates of evolution since the split between two gene duplicates, which could be indicative of periods of relaxed selection, followed by periods of more stringent selection. Such a rate shift is suggested by recent theoretical studies showing that duplicate genes experience two phases of evolution: a rather short period of relaxed selection and, thereafter, a period of purifying selection (Lynch and Conery 2003). When estimating rates of substitution per year for the diverse branches of the conifer tree (fig. 3), signatures of positive selection and functional divergence were evident for internal branches leading to the four conifer knox-I gene clades. The r8s analyses showed that the calibrated rates of evolution per year were on average nine times larger before the split between Pinus and Picea, 140 MYA, than after. Using the RRRT of Creevey and McInerney (2002), significant positive directional selection was also detected for several internal branches after the duplication of conifer knox-I genes. Such nonneutral divergence, early in the history of the conifer knox-I genes, is likely the result of a reduction in selective constraints on at least one duplicate, together with a commensurate increase in the amino acid replacement rate (Zhang 2003). Conversely, short branches at the tips of the tree after the split between Pinus and Picea are indicative of a slowdown in rates of evolution for the four conifer knox-I genes (fig. 3), which would imply more intense purifying selection in the recent past.
Relation Between Functional Divergence and Change in Map Position
The different chromosomal locations observed for the conifer knox-I paralogs were related to increased functional divergence. In our study, three unlinked locations were disclosed, corresponding respectively to (1) the genes KN1 and KN2, (2) the gene KN3, and (3) the gene KN4. These distinct locations parallel the significant results from RRRTs for positive selection and the large number of sites with significant rate shift from LRTs among the three groups. On the basis of model-based predictions, Lynch et al. (2001) suggested that the probability of change in map position per newly arisen gene duplicate increases with the strength of selection on neofunctional alleles. Similarly, the analysis of the genomic organization of multigene families in Arabidopsis suggested that the movement of gene duplicates to unlinked chromosomal regions may play an important role in functional divergence because of less exposure to sequence homogenization (Leister 2004).
The conifer paralogs KN1 and KN2 were positioned close to each other on the same linkage group, and there was little indication of significant positive directional selection between them, which is suggestive of more or less complete redundancy of function. At the same time, they presented a higher degree of sequence identity (average of 85% at the amino acid level) and had similar effective numbers of codons, %GC at third codon positions, and transition/transversion ratios (table 3). It might be argued that this greater homogeneity is the result of a more recent gene duplication. However, the close physical relationship between KN1 and KN2 might have contributed to this greater homogeneity by exposing them to sequence homogenization effects. Indeed, the rates of substitution per site per year were lower for the branches leading to the KN1 or the KN2 clade (#11 and #14), as compared with those leading to the KN3 or KN4 clade (#7 and #3) and, more notably, to the branches representing the common ancestor of KN1 and KN2 (#15) and that of KN1, KN2, and KN3 (#16) (fig. 3).
Correlated Rates of Evolution Between Synonymous and Nonsynonymous Sites
A nearly homogeneous dN/dS ratio (ω) was observed among conifer knox-I genes, indicating correlated rates of synonymous and nonsynonymous substitutions. Correlated temporal heterogeneity in rates of evolution was also observed between rates of change per year for first and second codon positions (mostly nonsynonymous) and rates of change per year for third codon positions (mostly synonymous): longer internal branches were noted before the divergence between Pinus and Picea, and shorter branches were noted after it (fig. 3). These trends are suggestive of factors affecting both synonymous and nonsynonymous sites in a similar way. A similar correlated trend between dS and dN was reported from the study of 10 nuclear conifer genes (Kusumi et al. 2002) and from the study of 242 duplicated genes from chromosomes 2 and 4 of Arabidopsis thaliana (Zhang, Vision, and Gaut 2002). This pattern could be indicative of background selection, which implies the removal of deleterious mutations as well as linked neutral or nearly neutral substitutions (Charlesworth, Morgan, and Charlesworth 1993). It would result in a decrease in the proportion of neutral mutations and a consequent correlated decrease in the rates of synonymous and nonsynonymous substitutions (Ohta 1995). Such a correlated selective constraint on both synonymous and nonsynonymous sites has also been observed in mammalian genes (Mouchiroud, Gautier, and Bernardi 1995).
This study demonstrates that knox-I genes did not evolve in a linear fashion in conifers. Initial periods of relaxed evolution have been followed by a period of conservatism, at least for the past 140 Myr. Taxon sampling in additional orders of the Gymnosperms would help delineate the beginning of this conservative period. A number of hypotheses at the functional level also need to be addressed by gene expression studies and investigations of biochemical properties and protein structure. For instance, it is anticipated that KN1 and KN2 would show more redundancy or overlap in their physiological roles with each other than with KN3 and KN4, whose roles should be more distinct. These studies should assist in validating the putative functional importance of the changes observed at the molecular level and the extent of subfunctionalization or neofunctionalization among these genes.
![]() |
References |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
Bharathan, G., B. J. Janssen, E. A. Kellogg, and N. Sinha. 1999. Phylogenetic relationships and evolution of the KNOTTED class of plant homeodomain proteins. Mol. Biol. Evol. 16:553–563.
Bouillé, M., and J. Bousquet. 2004. Trans-species shared polymorphisms at orthologous nuclear gene loci among distant species in the conifer Picea: implications for the maintenance of genetic diversity in trees. Am. J. Bot. (in press).
Byrne, M. E., J. Simorowski, and R. A. Martienssen. 2002. ASYMMETRIC LEAVES1 reveals knox gene redundancy in Arabidopsis. Development 129:1957–1965.
Champagne, C. E. M., and N. W. Ashton. 2001. Ancestry of KNOX genes revealed by bryophyte (Physcomitrella patens) homologs. New Phytol. 150:23–36.
Chang, S., J. Puryear, and J. Cairney. 1993. A simple and efficient method for isolating RNA from pine trees. Plant Mol. Biol. Rep. 11:113–116.
Charlesworth, B., M. T. Morgan, and D. Charlesworth. 1993. The effect of deleterious mutations on neutral molecular variation. Genetics 134:1289–1303.
Creevey, C. J., and J. O. McInerney. 2002. An algorithm for detecting directional and non-directional positive selection, neutrality and negative selection in protein coding DNA sequences. Gene 300:43–51.
Creevey, C. J., and J. O. McInerney. 2003. CRANN: detecting adaptive evolution in protein-coding DNA sequences. Bioinformatics 19:1726.
Florin, R. 1963. The distribution of conifer and taxad genera in time and space. Acta Horti. Bergiani 20:121–312.
Gaucher, E. A., X. Gu, M. M. Miyamoto, and S. A. Benner. 2002. Predicting functional divergence in protein evolution by site-specific rate shifts. Trends Biochem. Sci. 27:315–321.
Gu, X., and K. Vander Velden. 2002. DIVERGE: phylogeny-based analysis for functional-structural divergence of a protein family. Bioinformatics 18:500–501.
Hake, S., and N. Ori. 2002. Plant morphogenesis and KNOX genes. Nat. Genet. 31:121–122.
Hall, T. A. 1999. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucl. Acids Symp. Ser. 41:95–98.
Hjortswang, H. I., A. Sundas-Larsson, G. Bharathan, P. V. Bozhkov, S. von Arnold, and T. Vahala. 2002. KNOTTED1-like homeobox genes of gymnosperm, Norway spruce, expressed during somatic embryogenesis. Plant Physiol. Biochem. 40:837–843.
Ingouff, M., I. Farbos, M. Wiweger, and S. von Arnold. 2003. The molecular characterization of PaHB2, a homeobox gene of the HD-GL2 family expressed during embryo development in Norway spruce. J. Exp. Bot. 54:1343–1350.
Ito, M., Y. Sato, and M. Matsuoka. 2002. Involvement of homeobox genes in early body plan of monocot. Int. Rev. Cytol. 218:1–35.
Jones, D. T., W. R. Taylor, and J. M. Thornton. 1992. The rapid generation of mutation data matrices from protein sequences. Comput. Appl. Biosci. 8:275–282.
Jukes, T. H., and C. R. Cantor. 1969. Evolution of protein molecules. Pp. 21–132 in H. N. Munro, ed. Mammalian protein metabolism. Academic Press, New York.
Kellis, M., B. W. Birren, and E. S. Lander. 2004. Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae. Nature 428:617–624.
Knudsen, B., M. M. Miyamoto, P. J. Laipis, and D. N. Silverman. 2003. Using evolutionary rates to investigate protein functional divergence and conservation: a case study of the carbonic anhydrases. Genetics 164:1261–1269.
Kumar, S., K. Tamura, I. B. Jakobsen, and M. Nei. 2001. MEGA2: molecular evolutionary genetics analysis software. Bioinformatics 17:1244–1245.
Kusumi, J., Y. Tsumura, H. Yoshimaru, and H. Tachida. 2002. Molecular evolution of nuclear genes in Cupressacea, a group of conifer trees. Mol. Biol. Evol. 19:736–747.
Lawton-Rauh, A. 2003. Evolutionary dynamics of duplicated genes in plants. Mol. Phylogenet. Evol. 29:396–409.
Leister, D. 2004. Tandem and segmental gene duplication and recombination in the evolution of plant disease resistance genes. Trends Genet. 20:116–122.
Li, W. H. 1983. Evolution of duplicated genes. Pp. 14–37 in M. Nei and R. K. Koehn, eds. Evolution of genes and proteins. Sinauer, Sunderland, Mass.
Lynch, M., and J. S. Conery. 2003. The evolutionary demography of duplicate genes. J. Struct. Funct. Genomics 3:35–44.
Lynch, M., M. O'Hely, B. Walsh, and A. Force. 2001. The probability of preservation of a newly arisen gene duplicate. Genetics 159:1789–1804.
Miller, C. N. 1988. The origin of modern conifer families. Pp. 448–486 in C. B. Beck, ed. Origin and evolution of Gymnosperms. Columbia University Press, New York.
Mouchiroud, D., C. Gautier, and G. Bernardi. 1995. Frequencies of synonymous substitutions in mammals are gene-specific and correlated with frequencies of nonsynonymous substitutions. J. Mol. Evol. 40:107–113.
Nam, J., C. W. dePamphilis, H. Ma, and M. Nei. 2003. Antiquity and evolution of the MADS-box gene family controlling flower development in plants. Mol. Biol. Evol. 20:1435–1447.
Nei, M., and T. Gojobori. 1986. Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol. 3:418–426.
Nei, M., I. B. Rogozin, and H. Piontkivska. 2000. Purifying selection and birth-and-death evolution in the ubiquitin gene family. Proc. Natl. Acad. Sci. USA 97:10866–10871.
Ohno, S. 1970. Evolution by gene duplication. Allen and Unwin, London.
Ohta, T. 1995. Synonymous and nonsynonymous substitutions in mammalian genes and the nearly neutral theory. J. Mol. Evol. 40:56–63.
Pelgas, B., N. Isabel, and J. Bousquet. 2004. Efficient screening for expressed sequence tag polymorphisms (ESTP) by DNA pool sequencing and denaturing gradient gel electrophoresis in spruces. Mol. Breed. 13:263–279.
Pham, T., and N. Sinha. 2003. Role of KNOX genes in shoot development of Welwitschia mirabilis. Int. J. Plant Sci. 164:333–343.
Prager, E. M., D. P. Fowler, and A. C. Wilson. 1976. Rates of evolution in conifers (Pinaceae). Evolution 30:637–649.
Reiser, L., P. Sanchez-Baracaldo, and S. Hake. 2000. Knots in the family tree: evolutionary relationships and functions of knox homeobox genes. Plant Mol. Biol. 42:151–166.
Riechmann, J. L., J. Heard, G. Martin et al. (17 co-authors). 2000. Arabidopsis transcription factors: genome-wide comparative analysis among eukaryotes. Science 290:2105–2110.
Rozas, J., J. C. Sanchez-DelBarrio, X. Messeguer, and R. Rozas. 2003. DnaSP, DNA polymorphism analyses by the coalescent and other methods. Bioinformatics 19:2496–2497.
Sanderson, M. J. 2002. Estimating absolute rates of molecular evolution and divergence times: a penalized likelihood approach. Mol. Biol. Evol. 19:101–109.
Sanderson, M. J. 2003. r8s: inferring absolute rates of molecular evolution and divergence times in the absence of a molecular clock. Bioinformatics 19:301–302.
Sato, Y., Y. Fukuda, and H. Y. Hirano. 2001. Mutations that cause amino acid substitutions at the invariant positions in homeodomain of OSH3.KNOX protein suggest artificial selection during rice domestication. Genes Genet. Syst. 76:381–392.
Savard, L., P. Li, S. H. Strauss, M. W. Chase, M. Michaud, and J. Bousquet. 1994. Chloroplast and nuclear gene sequences indicate late Pennsylvanian time for the last common ancestor of extant seed plants. Proc. Natl. Acad. Sci. USA 91:5163–5167.
Schena, M., and R. W. Davis. 1994. Structure of homeobox-leucine zipper genes suggests a model for the evolution of gene families. Proc. Natl. Acad. Sci. USA 91:8393–8397.
Semiarti, E., Y. Ueno, H. Tsukaya, H. Iwakawa, C. Machida, and Y. Machida. 2001. The ASYMMETRIC LEAVES2 gene of Arabidopsis thaliana regulates formation of a symmetric lamina, establishment of venation and repression of meristem-related homeobox genes in leaves. Development 128:1771–1783.
Sundas-Larsson, A., M. Svenson, H. Liao, and P. Engstrom. 1998. A homeobox gene with potential developmental control function in the meristem of the conifer Picea abies. Proc. Natl. Acad. Sci. USA 95:15118–15122.
Swofford, D. L. 2002. PAUP*: phylogenetic analysis using parsimony (* and other methods). Sinauer Associates, Sunderland, Mass.
Takada, S., and M. Tasaka. 2002. Embryonic shoot apical meristem formation in higher plants. J. Plant Res. 115:411–417.
Thompson, J. D., T. J. Gibson, F. Plewniak, F. Jeanmougin, and D. G. Higgins. 1997. The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 25:4876–4882.
Tioni, M. F., D. H. Gonzalez, and R. L. Chan. 2003. Knotted1-like genes are strongly expressed in differentiated cell types in sunflower. J. Exp. Bot. 54:681–690.
Ujino-Ihara, T., K. Yoshimura, Y. Ugawa, H. Yoshimaru, K. Nagasaka, and Y. Tsumura. 2000. Expression analysis of ESTs derived from the inner bark of Cryptomeria japonica. Plant Mol. Biol. 43:451–457.
Van Ooijen, J. W., and R. E. Voorrips. 2001. JoinMap: Software for the calculation of genetic linkage maps. Version 3.0. Plant Research International, Wageningen, The Netherlands.
Veit, B. 2004. Determination of cell fate in apical meristems. Curr. Opin. Plant Biol. 7:57–64.
Vollbrecht, E., B. Veit, N. Sinha, and S. Hake. 1991. The developmental gene Knotted-1 is a member of a maize homeobox gene family. Nature 350:241–243.
Wang, Y., and X. Gu. 2001. Functional divergence in the caspase gene family and altered functional constraints: statistical analysis and prediction. Genetics 158:1311–1320.
Ware, D., P. Jaiswal, J. Ni et al. (12 co-authors). 2002. Gramene: a resource for comparative grass genomics. Nucleic Acids Res. 30:103–105.
Wei, X. X., and X. Q. Wang. 2004. Evolution of 4-coumarate:coenzyme A ligase (4CL) gene and divergence of Larix (Pinaceae). Mol. Phylogenet. Evol. 31:542–553.
Wright, F. 1990. The effective number of codons used in a gene. Gene 87:23–29.
Xue, B., P. J. Charest, Y. Devantier, and R. G. Rutledge. 2003. Characterization of a MYBR2R3 gene from black spruce (Picea mariana) that shares functional conservation with maize C1. Mol. Genet. Genomics 270:78–86.
Yang, Z. 1997. PAML: a program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci. 13:555–556.
Yang, Z., and R. Nielsen. 2000. Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Mol. Biol. Evol. 17:32–43.
Zhang, J. 2003. Evolution by gene duplication: an update. Trends Ecol. Evol. 18:292–298.
Zhang, L., T. J. Vision, and B. S. Gaut. 2002. Patterns of nucleotide substitution among simultaneously duplicated gene pairs in Arabidopsis thaliana. Mol. Biol. Evol. 19:1464–1473.