Rapid Evolution of the Family of CONSTANS LIKE Genes in Plants

Ulf Lagercrantz and Tomas Axelsson

Department of Plant Biology, Swedish University of Agricultural Sciences, Uppsala, Sweden


    Abstract
 
A family of CONSTANS LIKE genes (COLs) has recently been identified in Arabidopsis thaliana and other plant species. CONSTANS, the first isolated member, is a putative zinc finger transcription factor that promotes the induction of flowering in A. thaliana in long photoperiods. Phylogenetic analysis of the COL family demonstrated that it is organized into a few distinct groups, some of which evolved before the divergence of gymnosperms and angiosperms. Molecular evolutionary analyses showed that COL genes within the Brassicaceae family evolve rapidly: both the number of nonsynonymous substitutions and the ratio of nonsynonymous to synonymous substitutions were higher than in a comparison set of other genes sequenced in A. thaliana and Brassica. The analysis also indicated that the rate of evolution is heterogeneous between different domains in the COL genes. The results support previous data indicating that plant regulatory genes evolve relatively fast and that the rate of evolution varies significantly between different regions of those genes. The rate of evolution of COL genes seems to have accelerated during later stages of evolution, possibly as an effect of frequent gene duplications.


    Introduction
 
The evolution of regulatory genes is an important aspect of the diversification of physiological and developmental processes. However, there are relatively few data on the molecular evolution of such genes. One important mechanism for the evolution of novel gene function, in regulatory as well as other genes, is gene duplication and subsequent functional divergence. After duplication, the two copies can retain the same function and remain similar, they may diverge so that one copy acquires a new function, or one copy may lose its function and become a pseudogene.

The relative importance of selection and genetic drift in the evolution of duplicated genes is under debate. It has been argued that there is a period of fast evolution following duplication during which natural selection fixes changes in one copy of the gene, thereby creating a new function (Goodman 1976). Others have argued that the relaxation of selection due to redundancy allows fixation of neutral mutations that eventually may result in a new function (Ohno 1970; Kimura 1983; Li 1985). Both models may result in an enhanced nonsynonymous substitution rate. Positive selection may result in nonsynonymous substitution rates that exceed those of synonymous substitution, whereas under relaxed selection alone the nonsynonymous rate will never exceed the synonymous rate. Thus, the relative role of genetic drift and natural selection in the differentiation of duplicated genes can be studied by comparing the synonymous and nonsynonymous substitution rates between such genes.

CONSTANS LIKE (COL) genes encode members of a recently identified family of plant zinc finger proteins. CONSTANS (CO), the first discovered member, was identified in Arabidopsis thaliana (Putterill et al. 1995). CO probably acts as a transcriptional activator, directly or indirectly activating the floral meristem identity gene LEAFY (Simon, Igeno, and Coupland 1996). When CO was isolated, no homologous sequence could be detected in public databases (Putterill et al. 1995). However, a striking arrangement of cysteine residues was present near the N-terminus of the protein. This arrangement was similar to the DNA-binding zinc finger domain of GATA1 transcription factors (Ramain et al. 1993). The C-terminal region of CO contains a stretch of positively charged amino acids that may serve as a nuclear localization signal (Robert et al. 1998), although the function of this conserved basic domain remains to be fully elucidated. These two features suggested that CO might function as a transcription factor. All mutations characterized to date affect one of the two regions.

Two CO-like genes (COL1 and COL2) have since been reported in A. thaliana (Ledger, Dare, and Putterill 1996; Putterill et al. 1997), and recent database searches have identified additional CO-homologous sequences in Arabidopsis and rice (Song et al. 1998).

To study the molecular evolution of the family of CO-related genes, we performed an extensive search of the databases using protein sequences of CO and CO-homologous sequences, and we isolated a number of homologous genes from Brassica nigra genomic libraries. We present the first phylogenetic analysis of the family of COL genes. The analyses show that (1) these genes evolved rapidly and (2) the rate of evolution is heterogeneous between different domains in the COL genes.


    Materials and Methods
 
Isolation of CO Family Members from B. nigra
A cDNA of the A. thaliana CO gene (Putterill et al. 1995) was used to screen a B. nigra genomic library in λEMBL3. Hybridizations were performed at moderate stringency. More than 80 clones were isolated and characterized. Restriction mapping and hybridization experiments identified four different genes. One representative of each gene was subcloned and sequenced using cycle sequencing and an automatic sequencer (ABI377). The four genes were positioned on the B. nigra linkage map using a previously described mapping population (Lagercrantz and Lydiate 1995).

Expression Analysis of B. nigra COL Genes
RNA was extracted from leaves according to Hernould et al. (1993), followed by two additional chloroform/isoamyl alcohol extractions and a final precipitation in EtOH. The RNA was resuspended in DEPC-treated water. RNA (0.5–1 µg) was DNase-treated and used for first-strand cDNA synthesis (Boehringer) with an oligo d(T)18 primer in a total volume of 20 µl. Ten microliters of the resulting cDNA was amplified by PCR using primers specific for Bni COa, Bni COb, Bni COL1, and Bni COL2, respectively. Control PCR reactions were also performed on reverse transcription reactions lacking reverse transcriptase to check for contamination by genomic DNA. The PCR products were separated on agarose gels.

Database Searches
The deduced amino acid sequence of the A. thaliana CO gene was used to search for homologous genes among translated sequences from EMBL and GenBank using BLAST (Altschul et al. 1990) and FASTA (Pearson 1990). After phylogenetic analysis (see below), genes from divergent parts of the tree were used to search for additional homologous genes. Genes identified in the database searches were defined as members of the COL family if they contained two adjacent zinc finger motifs, each fitting the consensus sequence CX2CX16CX2C (fig. 1).
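The membership criterion can be illustrated with a short Python sketch. It is not the procedure used in this study: the function names and the `max_gap` cutoff used to decide whether two fingers count as adjacent are assumptions made for illustration only, and the consensus is read literally as Cys, two residues, Cys, sixteen residues, Cys, two residues, Cys.

```python
import re

# One zinc finger read literally from the consensus C-X2-C-X16-C-X2-C.
ZINC_FINGER = re.compile(r"C.{2}C.{16}C.{2}C")
MOTIF_LENGTH = 24  # 4 Cys plus 2 + 16 + 2 spacer residues


def find_zinc_fingers(protein):
    """Return 0-based start positions of non-overlapping consensus matches."""
    return [m.start() for m in ZINC_FINGER.finditer(protein.upper())]


def has_two_adjacent_fingers(protein, max_gap=40):
    """True if two consecutive matches are separated by at most max_gap residues.

    max_gap is a hypothetical cutoff; the text does not define "adjacent"
    numerically.
    """
    starts = find_zinc_fingers(protein)
    for first, second in zip(starts, starts[1:]):
        if second - (first + MOTIF_LENGTH) <= max_gap:
            return True
    return False
```

A regular expression of this kind only screens for the cysteine spacing; it says nothing about similarity elsewhere in the protein, so it would complement rather than replace the BLAST and FASTA searches.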



 
Fig. 1.—Alignment of amino acid sequences from the CONSTANS LIKE (COL) genes used in the phylogenetic analysis. Gaps are indicated with dashes. Dark and light shaded boxes indicate identical and similar amino acid residues, respectively. Accession numbers of the different genes are given in table 1

 
Phylogenetic Analysis
DNA sequences were initially aligned using CLUSTAL W (Thompson, Higgins, and Gibson 1997). Alignments were further refined manually, considering both amino acid and nucleotide sequences. Phylogenetic relationships were inferred using maximum-parsimony (MP) and distance-based methods. In the broader-level phylogenetic analysis (fig. 2), only the first and second codon positions were used, while all three codon positions were used in the analysis presented in figure 3. The MP searches were performed in PAUP, version 3.1 (Swofford 1993), and used heuristic search strategies, consisting of ASIS addition sequences followed by tree bisection-reconnection (TBR) branch swapping.
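Restricting an alignment to the first and second codon positions amounts to dropping every third nucleotide from each in-frame sequence. The sketch below shows one way to do this in Python; the function name is hypothetical, and it assumes the sequence starts in frame and that any alignment gaps were inserted in whole-codon units.

```python
def first_and_second_positions(cds):
    """Drop every third codon position from an in-frame coding sequence."""
    if len(cds) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    # Keep the first two bases of each codon.
    return "".join(cds[i:i + 2] for i in range(0, len(cds), 3))
```

Applied to a 300-bp region, this leaves 200 positions per sequence.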



 
Fig. 2.—Phylogenetic relationships of 32 CONSTANS LIKE (COL) genes based on analyses of nucleotide sequences of the zinc finger region using weighted neighbor joining. The first and second codon positions of a 300-bp DNA sequence were used to produce this tree. Numbers at some nodes denote support from three different phylogenetic reconstruction methods. The first and second numbers indicate bootstrap support derived from weighted neighbor joining and maximum parsimony, respectively (500 replicates), and the third indicates quartet puzzling support values based on 10,000 puzzling steps from maximum-likelihood analysis

 


 
Fig. 3.—Phylogenetic relationships of 11 CONSTANS LIKE (COL) genes based on nucleotide sequences using nucleotides at all three codon positions. This is the single most-parsimonious tree. Malus domestica genes (Md) were used as an outgroup. Numbers at the nodes denote bootstrap support based on 500 replicates

 
Weighted neighbor-joining (WNJ) trees (Bruno, Socci, and Halpern 2000) were constructed based on the Kimura (1980) two-parameter model. This method takes into account that errors in distance estimates are exponentially larger for longer distances (Bruno, Socci, and Halpern 2000). WNJ appears to be relatively immune to the "long branches attract" and "long distance distracts" drawbacks observed with MP and neighbor joining (NJ) (Bruno, Socci, and Halpern 2000). The PUZZLE program (Strimmer and von Haeseler 1996) was also used for maximum-likelihood (ML) phylogenetic reconstruction based on the HKY model of sequence evolution (Hasegawa, Kishino, and Yano 1985). The ML analysis was performed assuming rate heterogeneity with eight categories of sites.
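For readers unfamiliar with the Kimura two-parameter model, the pairwise distance it yields is d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the observed proportions of transitional and transversional differences. The following Python sketch computes that distance for two aligned sequences. It illustrates only the distance calculation, not the weighted neighbor-joining step; the function name and the choice to skip gapped or ambiguous sites are assumptions.

```python
import math

# Purine-purine and pyrimidine-pyrimidine changes are transitions.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases (an assumption)
        sites += 1
        if a == b:
            continue
        if (a, b) in TRANSITIONS:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    # Raises ValueError (math domain error) for saturated sequence pairs.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

The correction grows faster than the raw proportion of differences as sequences diverge, which is one reason distance estimates become less reliable for long branches.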

Nodal support was estimated using bootstrap analyses based on 500 replicates in MP and WNJ analyses and using ML/quartet puzzling support values calculated on the basis of 10,000 puzzling steps (Strimmer and von Haeseler 1996) in ML analysis.

Analysis of Substitution Rates
Synonymous (Ks) and nonsynonymous (Ka) nucleotide substitutions were calculated with a simplified version of the codon-based model of Goldman and Yang (1994) that uses a single distance between any pair of amino acids. Codon frequencies were calculated using the nucleotide frequencies at the three codon positions, resulting in nine parameters for the codon frequencies. Calculations were performed with the program codonml in the PAML, version 2.0, package (Yang 1997).
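The nine codon-frequency parameters arise because each of the three codon positions has four nucleotide frequencies that sum to one, leaving three free values per position. A minimal Python sketch of this scheme (commonly called F3×4) is given below. It is an illustration rather than the PAML implementation; the function names are hypothetical, and the standard nuclear genetic code is assumed when excluding stop codons.

```python
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumed)


def position_frequencies(cds_list):
    """Nucleotide frequencies at codon positions 1, 2, and 3."""
    counts = [dict.fromkeys("ACGT", 0) for _ in range(3)]
    for cds in cds_list:
        for i, base in enumerate(cds.upper()):
            if base in "ACGT":
                counts[i % 3][base] += 1
    freqs = []
    for position_counts in counts:
        total = sum(position_counts.values())
        freqs.append({b: n / total for b, n in position_counts.items()})
    return freqs


def codon_frequencies(cds_list):
    """Expected frequencies of the 61 sense codons from positional frequencies."""
    f1, f2, f3 = position_frequencies(cds_list)
    raw = {}
    for a, b, c in product("ACGT", repeat=3):
        codon = a + b + c
        if codon not in STOP_CODONS:
            raw[codon] = f1[a] * f2[b] * f3[c]
    total = sum(raw.values())  # renormalize after removing stop codons
    return {codon: value / total for codon, value in raw.items()}
```

Estimating nine values instead of 60 free codon frequencies keeps the model tractable for short alignments.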


    Results
 
Isolation of CONSTANS Homologs from B. nigra
More than 80 clones, isolated after screening of B. nigra genomic libraries with a CO cDNA clone, were characterized by restriction mapping. These analyses identified four different sets of clones, representing four different genes. Representatives of these four genes were positioned on a B. nigra linkage map (Lagercrantz and Lydiate 1995) and sequenced. The two most similar genes mapped to two genomic regions originating from a genome duplication in B. nigra, which occurred after the divergence of lineages leading to Arabidopsis and Brassica (Lagercrantz 1998). RT-PCR experiments showed that all four B. nigra COL genes (Bni COa, Bni COb, Bni COL1, and Bni COL2) are expressed (fig. 4).



 
Fig. 4.—Detection of mRNA of Brassica nigra COL genes by RT-PCR. RNA was isolated from leaves and used to synthesize cDNA, which was amplified by PCR using primers specific for Bni COa, Bni COb, Bni COL1, and Bni COL2, respectively. To check for contamination of genomic DNA in the RNA preparations, amplifications were also conducted on cDNA synthesis reactions without reverse transcriptase (RT). Amplifications based on cDNA synthesis with RT are indicated with a plus sign, and those without RT are indicated with a minus sign

 
Database Searches Identified a Large Family of Zinc Finger Genes
The deduced amino acid sequence of the A. thaliana CONSTANS gene was used to search translated sequences from EMBL and GenBank. The searches identified four previously published A. thaliana genes, 56 Arabidopsis expressed sequence tags (ESTs), and 45 Arabidopsis genomic sequences. In addition, related sequences were identified from Brassica napus, Oryza sativa, Malus domestica, Raphanus sativus, and Pinus radiata. We defined members of the COL family based on the presence of two adjacent zinc fingers, each fitting the consensus sequence CX2CX16CX2C (fig. 1 ). Based on this criterion, and after removal of apparent duplicate Arabidopsis EST and genomic sequences, 32 sequences were retained. Accession numbers for these genes used in following analysis are shown in table 1 .


 
Table 1 Accession Numbers of Genes Used in this Study

 
Phylogenetic Relationships Among COL Genes
The phylogenetic relationships between different COL genes were analyzed using the first and second codon positions of a 300-bp DNA sequence of the zinc finger region (the alignment of the deduced amino acid sequences is shown in fig. 1). Due to a high rate of DNA sequence evolution (see below), other parts of the genes were difficult to align and were thus not included in the analysis. Figure 2 shows a WNJ tree depicting the relationships among the identified zinc finger genes. Support for the topology was estimated from 500 bootstrap replicates, and nodes occurring in less than 50% of the replicates were collapsed. Four major clades are indicated in the tree, with bootstrap support ranging from 66% to 92%. Phylogenetic trees produced with MP and ML methods displayed the same major topology with four groups (A–D; fig. 2). Support for the major nodes from the three different methods is indicated in figure 2.

Each clade contains several A. thaliana genes, suggesting that duplication and gene diversification have been frequent in the evolution of COL genes. The first clade (group A) includes three previously identified Arabidopsis genes, the CONSTANS (CO) gene and CONSTANS LIKE 1 and 2 (COL1 and COL2). In addition, this group includes two B. napus genes and the COL genes isolated from B. nigra in the present study. To obtain a better resolution of group A, a phylogenetic reconstruction was performed using the DNA sequence of the complete coding sequence of genes from this group (fig. 3 ). In this analysis, a fourth B. nigra gene (Bni COL2) was included. This gene could not be included in the previous analysis, as the zinc finger region was deleted from that gene. Two genes from the apple (Md1 and Md2) were also included as an outgroup.

Two B. napus genes and two B. nigra genes cluster with A. thaliana CO. Brassica napus is an amphidiploid, and the B. napus genes originate from each of the two progenitor genomes. The existence of at least two B. nigra CO homologs is consistent with previous mapping data indicating that the genome of the diploid B. nigra is extensively duplicated (Lagercrantz and Lydiate 1996; Lagercrantz 1998).

In addition, the presence of at least one B. nigra homolog of COL1 and one COL2 was indicated from the phylogenetic analyses (fig. 3 ). In A. thaliana, COL1 is located about 3 kb upstream of CO on chromosome 5. Similarly, the B. nigra COL1 homolog is located about 3 kb upstream of the B. nigra COa homolog (unpublished data). The lack of genes from species outside the Brassicaceae family in group A (fig. 2 ) indicates either that this group evolved recently or that CO orthologs are present in other species but have not yet been identified. The topology of the tree indicates that duplications that gave rise to CO, COL1, and COL2 occurred before the divergence of the lineages leading to Arabidopsis and Brassica.

Group B (fig. 2) comprises genes from Arabidopsis, R. sativus, M. domestica, and P. radiata, indicating that the ancestral gene to that group was present before the divergence of gymnosperms and angiosperms. Groups C and D contain several genes from Arabidopsis and rice. The preponderance of genes from Arabidopsis and rice is most likely due to the comparatively large numbers of genes sequenced from these species. Even though the bootstrap support for group C was relatively weak, the distinctness of group C was supported by two separate features of those genes: the lack of a basic region near the C-terminus and the presence of an intron between the two zinc fingers.

Visual inspection of amino acid sequence alignments of the complete genes revealed two domains that were conserved in proteins from most parts of the phylogenetic tree. In addition to the zinc finger domain present in all studied genes, a basic region was present near the C-terminus of most predicted proteins. This basic region was rich in Lys and Arg, and parts of it showed similarity to a predicted bipartite nuclear localization signal (Robert et al. 1998). This basic domain was present in genes from all parts of the phylogenetic tree except in genes from group C (fig. 2). Genes in group C also had a variable number of extra amino acids between the two zinc fingers as compared with the other genes. This spacer region coincided with the presence of an intron between the zinc fingers in the DNA sequence of the group C genes.

Group D comprises a number of highly divergent sequences from Arabidopsis and rice, suggesting that some genes might cluster in group D as an effect of long-branch attraction (LBA; Hendy and Penny 1989). We used the WNJ and ML methods to reduce the effects of LBA. However, these methods still rely on a reasonably correct substitution model, which is unknown, so the clustering in group D should be viewed with some caution.

Nucleotide Substitution Rates
The rate of evolution of COL genes was studied by comparing orthologous genes between A. thaliana and B. nigra or B. napus (within group A). No attempt was made to study the rate of evolution of the complete coding region using more distantly related genes. The high frequency of insertions/deletions made it difficult to unambiguously align the whole gene using these diverged homologs. Synonymous and nonsynonymous substitutions were calculated between orthologous COL genes in A. thaliana and B. nigra or B. napus (table 2 ). The divergence time of the Arabidopsis and Brassica lineages is poorly known. To enable a comparison of the substitution rates of the COLs with other genes, synonymous and nonsynonymous substitutions were also calculated for nine other genes sequenced in A. thaliana and in at least one species in the genus Brassica or the closely related genus Raphanus (Chs2).


 
Table 2 Synonymous and Nonsynonymous Substitutions Between Different Orthologous Genes from Arabidopsis thaliana and Brassica Species

 
The synonymous substitution rates were not significantly different between COLs and the other genes (F1,12 = 1.72, P = 0.2). However, comparing COLs with other genes revealed significantly higher nonsynonymous rates in COLs (F1,12 = 50.9, P < 0.0001). The only one of the non-COL genes that had a similar level of replacement substitutions was RPM1, a gene involved in resistance to bacterial pathogens (Grant et al. 1995). These data indicate that the COL genes (at least within group A) evolve more rapidly than other genes. The degree of sequence constraint can also be evaluated by determining the ratio of nonsynonymous to synonymous substitutions for different genes. This ratio corrects for possible rate alterations that result from variations in mutation rate, as synonymous substitutions are expected to be nearly neutral (Li, Wu, and Luo 1985). The Ka/Ks ratio was significantly higher in the COL group (F1,12 = 116.8, P < 0.0001), again indicating a low degree of sequence constraint in COL genes.

Rates of Evolution in Different Domains of COL Genes
It has been suggested that regulatory genes are often divided into domains that are characterized by slowly and rapidly evolving sequences. To check for heterogeneity in evolutionary rate along COL genes, the distribution of nucleotide diversity based on nonsynonymous substitutions along the genes was studied using a sliding window of 50 sites. This analysis included all COL genes in group A.
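The sliding-window calculation can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the input is taken to be a per-site list of 0/1 flags marking nonsynonymous differences for one pairwise comparison (how those flags are derived from codons is outside the sketch), the function names are hypothetical, and the Jukes-Cantor correction d = -3/4 ln(1 - 4p/3) is applied to the proportion of differences in each window.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor corrected distance for a proportion of differences p."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


def sliding_window_divergence(differences, window=50, step=1):
    """Return (window start, corrected divergence) for each window.

    differences: per-site flags, 1 where the two sequences differ.
    """
    result = []
    for start in range(0, len(differences) - window + 1, step):
        p = sum(differences[start:start + window]) / window
        result.append((start, jukes_cantor(p)))
    return result
```

Averaging the windowed values over all pairwise comparisons within a group would give a profile of the kind described here, with conserved domains appearing as troughs.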

The distribution shown in figure 5 suggests two relatively conserved domains, one close to the N-terminus and one close to the C-terminus of the protein. The N-terminal region comprises the two putative zinc fingers, and the C-terminal region comprises the basic domain rich in Lys and Arg. Loss-of-function mutations in either of these domains in the CO gene showed that both domains are important for functional activity of the CO gene product in A. thaliana (Putterill et al. 1995; Robert et al. 1998).



 
Fig. 5.—Distribution of nonsynonymous divergence along the COL genes in group A (figs. 2 and 3) using a sliding window of 50 sites. The divergence was estimated as the average proportion of nonsynonymous nucleotide differences between genes applying Jukes and Cantor's (1969) correction

 
To study the rate of evolution in different domains, synonymous and nonsynonymous substitutions were calculated separately for the zinc finger domain, the C-terminal basic domain, and the intervening, more variable domain (table 3 ). The zinc finger domain and the C-terminal basic domain showed comparable Ka/Ks ratios, but these ratios were significantly lower than that for the intervening domain in all comparisons.


 
Table 3 Synonymous and Nonsynonymous Substitutions in Different Domains of COL Genes from Arabidopsis thaliana and Brassica nigra or Brassica napus

 
Accelerated Evolution of COL Genes
Comparison of substitution rates between genes from A. thaliana and Brassica species indicated a comparatively high rate of nonsynonymous substitutions in CO and COL1 genes. To study whether this accelerated rate was specific to a particular evolutionary lineage, substitution rates were calculated for individual branches of the phylogenetic tree in figure 3. We used the ML approach of Goldman and Yang (1994) to test the constancy of nonsynonymous to synonymous substitution rates among the branches. Estimation was performed assuming either a single ratio (ω) for all branches in the tree or assuming different ratios for each branch. The log likelihood value for the one-ratio model was l1 = -6,714.14, while the model assuming a different ratio for each branch resulted in a log likelihood of l19 = -6,688.34. Twice the log likelihood difference (2Δl = 51.6) was compared with a χ2 with df = 19 to test whether the free-ratio model fitted the data significantly better than the one-ratio model. The free-ratio model resulted in a significantly better fit (P < 0.001). Visual inspection of the ratios for the different branches indicated that most of the difference in the ratios of nonsynonymous to synonymous substitution rates was due to a lower ratio in the two basal branches of the tree. A model with one ratio for the most basal branches (ω1, for branches a and b; fig. 3) and a second ratio (ω2) for all other branches yields a log likelihood of l2 = -6,697.09. The one-ratio and the two-ratio models are nested and can be compared with a χ2 with one degree of freedom. The basal branches had a significantly lower ratio than the other branches (table 4). No additional ratio heterogeneity was evident among the "nonbasal" branches (2Δl = 17.48; df = 17; P > 0.05).
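The likelihood ratio tests between nested models can be reproduced from the quoted log likelihoods with a few lines of Python. The sketch below is illustrative: the function names are hypothetical, and the chi-square tail probability is computed from the standard closed-form series for integer degrees of freedom so that only the standard library is needed. It covers the one-ratio versus two-ratio comparison (df = 1) and the two-ratio versus free-ratio comparison (df = 17).

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2-1} (x/2)^k / k!
        term = total = 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + 2*Z(chi) * sum_r chi^(2r-1) / (1*3*...*(2r-1))
    chi = math.sqrt(x)
    term = chi
    total = 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    density = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    return math.erfc(chi / math.sqrt(2)) + 2 * density * total


def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return (2 * delta lnL, P value) for two nested models."""
    stat = 2 * (lnl_alt - lnl_null)
    return stat, chi2_sf(stat, df)


# Log likelihoods quoted in the text.
LNL_ONE_RATIO = -6714.14
LNL_TWO_RATIO = -6697.09
LNL_FREE_RATIO = -6688.34

stat_basal, p_basal = likelihood_ratio_test(LNL_ONE_RATIO, LNL_TWO_RATIO, df=1)
stat_rest, p_rest = likelihood_ratio_test(LNL_TWO_RATIO, LNL_FREE_RATIO, df=17)
print(f"one-ratio vs two-ratio: 2*dlnL = {stat_basal:.2f}, P = {p_basal:.2g}")
print(f"two-ratio vs free-ratio: 2*dlnL = {stat_rest:.2f}, P = {p_rest:.2g}")
```

Because the log likelihoods are quoted to two decimals, the second statistic computed this way differs slightly from the 17.48 reported in the text; the conclusion (P > 0.05) is unaffected.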


 
Table 4 Parameter Estimates and Likelihood Ratio Statistics (2Δl, df = 1) Under a One-Ratio (ω) and a Two-Ratio (ω1, ω2) Model in Different Domains of COL Genes in Figure 3

 
A significantly lower ratio for the two basal branches was indicated for each of the three identified domains of the gene. The conserved zinc finger domain, the basic domain, and the intervening variable domain all showed a significantly better fit using the two-ratio model as compared with the one-ratio model (table 4 ).


    Discussion
 
The COL family of putative zinc finger transcription factors is relatively large. In A. thaliana, there are at least 17 different genes, and additional sequencing will most likely reveal additional members. The presence of several homologs in monocots (rice) indicates that the major diversification of the family took place before the separation of monocots and dicots more than 100 MYA. The isolated gymnosperm member (fig. 2 ) shows that at least part of the family was present before the divergence of gymnosperms and angiosperms.

There are four relatively distinct clades in the phylogenetic analysis. Groups B and C contain genes from dicots, gymnosperms, and monocots, showing that these groups have persisted for a considerable period. Group A includes only genes from the Brassicaceae family, indicating that true CO orthologs have not yet been isolated in species outside the Brassicaceae family. Additional sequencing will show if this group diverged recently in the evolution of dicots or if members are also present in species outside the Brassicaceae family.

At present, few data are available on the function of COL genes. The CO gene of A. thaliana is involved in the timing of flower induction in response to photoperiod. The CO gene probably acts as a transcriptional activator that promotes flowering (Putterill et al. 1995; Simon, Igeno, and Coupland 1996). The second gene for which a function is indicated is the STO gene of A. thaliana. This gene can functionally complement yeast calcineurin mutants and can also increase salt tolerance of wild-type yeast (Lippuner, Cyert, and Gasser 1996). It was suggested that STO might modulate a calcineurin-dependent pathway in yeast. Although the function of STO in Arabidopsis is still unknown, present data indicate that STO functions in a process completely different from the one in which CO is involved. COL genes in group A and STO-like genes in group C are also clearly differentiated at the amino acid sequence level. Group C genes all lack the C-terminal basic domain that is present in most other genes in the COL family. As mentioned previously, analysis of induced mutations has shown that this domain is important in CO function.

Rapid Evolution of COL Genes
Genes in the COL family seem to evolve relatively fast. Within group A in the phylogenetic reconstruction (figs. 2 and 3), the nonsynonymous substitution rate was significantly higher for the COL genes as compared with a random sample of other genes in Arabidopsis and Brassica. Based on mitochondrial nad4 genes, Yang et al. (1999) estimated a divergence time for the Arabidopsis and Brassica lineages of 22 MYA. This divergence time yields a nonsynonymous substitution rate of 3.32 × 10⁻⁹ substitutions per site per year for COL genes, which is higher than the average calculated for a large number of genes (0.87 × 10⁻⁹; Li, Wu, and Luo 1985).
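Converting a pairwise divergence into an absolute rate is usually done by dividing by twice the divergence time, because substitutions accumulate along both lineages since their split. The Python sketch below assumes that this conventional formula, r = K / (2T), underlies the rate given above; the function name and the back-calculated divergence of about 0.146 substitutions per nonsynonymous site are illustrative and are not values reported in the text.

```python
def substitution_rate(divergence, divergence_time_years):
    """Substitutions per site per year: K / (2T) for two diverging lineages."""
    return divergence / (2 * divergence_time_years)


# Back-calculated for illustration only: a rate of 3.32e-9 over 22 million
# years corresponds to a pairwise divergence of roughly 0.146 per site.
implied_divergence = 3.32e-9 * 2 * 22e6
rate = substitution_rate(implied_divergence, 22e6)
```

Any uncertainty in the divergence time carries over proportionally to the rate estimate, so comparisons of ratios such as Ka/Ks are less sensitive to it than comparisons of absolute rates.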

The COL genes in group A (i.e., CO, COL1, and COL2 and their Brassica homologs) also display marked variation in the rate of replacement substitutions along the sequences. The two domains that have been indicated to be important for the function of A. thaliana CO, the zinc finger region and the basic C-terminal region, show relatively low rates of replacement substitutions. These rates are similar to those of other genes in the Brassicaceae family (cf. tables 2 and 3) and a large number of other genes (Li, Wu, and Luo 1985). In contrast, the estimate for the variable region is about five times as high (4.70 × 10⁻⁹). The higher evolutionary rate of the variable domain in COL genes is also supported by estimates of the ratio of nonsynonymous to synonymous substitutions. The ratios for the two conserved domains were 0.13 and 0.09, which are close to 0.14, the average reported for a number of dicot and monocot genes (Martin, Gierl, and Saedler 1989; Huang, Stebbins, and Rodriguez 1992), while the ratio for the variable domain was 0.64.

The nonsynonymous substitution rates estimated for COL genes are more similar to those reported for two other sets of regulatory genes, the plant MADS-box family (1.4 × 10⁻⁹; Purugganan et al. 1995) and the R family (2.6 × 10⁻⁹; Purugganan and Wessler 1994). Similar to the COL genes, these two gene families exhibited three- and fourfold increased replacement rates in variable regions as compared with conserved regions.

Alternating segments of rapidly and slowly evolving sequences have also been noted for other regulatory genes (Tucker and Lundrigan 1993; Whitfield, Lovell-Badge, and Goodfellow 1993; Rausher, Miller, and Tiffin 1999).

This observation could be explained by relaxed constraint on variable regions and by a high proportion of substitutions in these regions being effectively neutral. Alternatively, the high nonsynonymous substitution rates in variable regions could be the result of positive selection for amino acid substitutions. From data for A. thaliana, it is clear that CO, COL1, and COL2 have diverged in function. Mutations in CO, resulting in delayed flowering under long days, cannot be compensated by COL1 or COL2. It remains to be determined which mutations have been important for the diverged function. Whether these mutations are mainly located in the conserved regions that are known to be important for CO function or in the intervening region that is evolving at a high rate is not known. Even though the rate of nonsynonymous substitutions was high in the variable region, our data did not indicate any positive selection (i.e., a nonsynonymous to synonymous ratio >1). However, the power of this test is low, as it assumes that all sites in the studied sequence are under the same selection pressure with the same underlying Ka/Ks ratio.

Analysis of the ratio of nonsynonymous to synonymous substitution among different branches of the phylogenetic tree indicated that the rate has increased during later stages of evolution. The two most basal branches of the tree had a significantly lower nonsynonymous to synonymous substitution ratio than did younger branches. An increased evolutionary rate was indicated not only for the rapidly evolving middle region of the protein, but also for the two more conserved flanking regions (the zinc finger region and the basic region). The structure of the evolutionary tree indicates that gene duplications have been frequent in later stages of evolution. In the limited sample of genes in figure 3, there are two closely related CO homologs in the apple and several homologs in A. thaliana and Brassica species. One possible reason for the accelerated evolution of COL genes indicated by the present data could be the frequent occurrence of duplicated copies that are allowed to diverge and acquire new functions. The high rate of substitutions in COL genes is probably not due to relaxed selection in pseudogenes, as all the COL genes included in the analysis (fig. 3) are still expressed. Expression has been observed for CO, COL1, and COL2 (Putterill et al. 1995, 1997; Ledger, Dare, and Putterill 1996), Bna COa and Bna COb (Robert et al. 1998), Bni COa, Bni COb, Bni COL1, and Bni COL2 (present study), and Md1 and Md2 (Jeong, Sung, and An 1999).

It remains to be determined whether selection is the main factor operating to diversify the COL genes or whether the rapid evolution (primarily of the variable domain) is mainly an effect of relaxed sequence constraint. We will extend the analysis to include comparison of polymorphism and divergence at nonsynonymous and synonymous sites, as such comparisons are more powerful with regard to detecting adaptive evolution in closely related species (McDonald and Kreitman 1991). These analyses might provide more insight into which domains have been important in the functional diversification of COL genes.
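The comparison of polymorphism and divergence can be sketched as a 2 x 2 contingency test. The code below is a minimal illustration, not the analysis planned for the COL genes: it assumes the four counts are already available and uses a two-sided Fisher's exact test written with the standard library (McDonald and Kreitman used a G-test). The function name `mk_test` and the returned field names are hypothetical. The counts in the usage example are the values commonly quoted from the Adh study, not COL data.

```python
from math import comb


def mk_test(dn, ds, pn, ps):
    """Compare fixed (dn, ds) with polymorphic (pn, ps) substitution counts.

    dn, ds: fixed nonsynonymous and synonymous differences between species.
    pn, ps: nonsynonymous and synonymous polymorphisms within species.
    Returns the two ratios and a two-sided Fisher's exact p-value.
    """
    fixed = dn + ds
    nonsyn = dn + pn
    total = dn + ds + pn + ps

    def prob(x):
        # Hypergeometric probability that x of the fixed differences
        # are nonsynonymous, given the table margins.
        return comb(nonsyn, x) * comb(total - nonsyn, fixed - x) / comb(total, fixed)

    lowest = max(0, fixed - (total - nonsyn))
    highest = min(fixed, nonsyn)
    observed = prob(dn)
    # Two-sided p-value: sum over all tables no more likely than the observed one.
    p_value = sum(
        prob(x) for x in range(lowest, highest + 1)
        if prob(x) <= observed * (1 + 1e-9)
    )
    return {
        "fixed_ratio": dn / ds if ds else None,
        "polymorphic_ratio": pn / ps if ps else None,
        "p_value": p_value,
    }


# Counts commonly quoted for Adh in Drosophila: 7 and 17 fixed,
# 2 and 42 polymorphic (nonsynonymous and synonymous, respectively).
result = mk_test(dn=7, ds=17, pn=2, ps=42)
print(result)
```

Under neutrality the two ratios are expected to be similar. A significantly higher ratio among fixed differences than among polymorphisms is the pattern interpreted as evidence of adaptive amino acid substitutions.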


    Acknowledgements
We thank Martin Lascoux for valuable comments on the manuscript. The Swedish Council for Forestry and Agricultural Research supported this work.


    Footnotes
 
Pekka Pamilo, Reviewing Editor

1 Keywords: gene duplication, gene family evolution, molecular evolution, phylogeny, zinc finger protein

2 Address for correspondence and reprints: Ulf Lagercrantz, Department of Plant Biology, Swedish University of Agricultural Sciences, Box 7080, S-750 07 Uppsala, Sweden. E-mail: ulf.lagercrantz@vbiol.slu.se


    literature cited

    Altschul, S. F., W. Gish, W. Miller, E. W. Myers, and D. J. Lipman. 1990. Basic local alignment search tool. J. Mol. Biol. 215:403–410

    Bruno, W. J., N. D. Socci, and A. L. Halpern. 2000. Weighted neighbor joining: a likelihood-based approach to distance-based phylogeny reconstruction. Mol. Biol. Evol. 17:189–197

    Goldman, N., and Z. Yang. 1994. A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol. Biol. Evol. 11:725–736

    Goodman, M. 1976. Protein sequences in phylogeny. Pp. 141–159 in F. J. Ayala, ed. Molecular evolution. Sinauer, Sunderland, Mass

    Grant, M. R., L. Godiard, E. Straube, T. Ashfield, J. Lewald, A. Sattler, R. W. Innes, and J. L. Dangl. 1995. Structure of the Arabidopsis RPM1 gene enabling dual specificity disease resistance. Science 269:843–846

    Hasegawa, M., H. Kishino, and T. Yano. 1985. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol. 22:160–174

    Hendy, M. D., and D. Penny. 1989. A framework for the quantitative study of evolutionary trees. Syst. Zool. 38:297–309

    Hernould, M., S. Suharsono, S. Litvak, A. Araya, and A. Mouras. 1993. Male-sterility induction in transgenic tobacco plants with an unedited atp9 mitochondrial gene from wheat. Proc. Natl. Acad. Sci. USA 90:2370–2374

    Huang, N., G. L. Stebbins, and R. Rodriguez. 1992. Classification of and evolution of the alpha-amylase genes in plants. Proc. Natl. Acad. Sci. USA 89:7526–7530

    Jeong, D. H., S. K. Sung, and G. An. 1999. Molecular cloning and characterization of CONSTANS-like cDNA clones of the Fuji apple. J. Plant Biol. 42:23–31

    Jukes, T. H., and C. R. Cantor. 1969. Evolution of protein molecules. Pp. 21–132 in H. N. Munro, ed. Mammalian protein metabolism. Academic Press, New York

    Kimura, M. 1980. A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol. 16:111–120

    ———. 1983. The neutral theory of molecular evolution. Cambridge University Press, London

    Lagercrantz, U. 1998. Comparative mapping between Arabidopsis thaliana and Brassica nigra indicates that Brassica genomes have evolved through extensive genome replication accompanied by chromosome fusions and frequent rearrangements. Genetics 150:1217–1228

    Lagercrantz, U., and D. Lydiate. 1995. RFLP mapping in Brassica nigra indicates differing recombination rates in male and female meiosis. Genome 38:255–264

    ———. 1996. Comparative genome mapping in Brassica. Genetics 144:1903–1910

    Ledger, S. E., A. P. Dare, and J. Putterill. 1996. COL2 is a homologue of the Arabidopsis flowering-time gene CONSTANS. Plant Physiol. 112:862

    Li, W.-H. 1985. Accelerated evolution following gene duplication and its implication for the neutralist-selectionist controversy. Pp. 333–352 in T. Ohta and K. Aoki, eds. Population genetics and molecular evolution. Japan Scientific Societies Press, Tokyo

    Li, W. H., M. Wu, and C. L. Luo. 1985. A new method for estimating synonymous and nonsynonymous rates of nucleotide substitutions considering the relative likelihood of nucleotide and codon changes. Mol. Biol. Evol. 2:150–174

    Lippuner, V., M. S. Cyert, and C. S. Gasser. 1996. Two classes of plant cDNA clones differentially complement yeast calcineurin mutants and increase salt tolerance of wild-type yeast. J. Biol. Chem. 271:12859–12866

    McDonald, J. H., and M. Kreitman. 1991. Adaptive protein evolution at the Adh locus in Drosophila. Nature 351:652–654

    Martin, W., A. Gierl, and H. Saedler. 1989. Molecular evidence for pre-Cretaceous angiosperm origins. Nature 339:46–48

    Ohno, S. 1970. Evolution by gene duplication. Springer Verlag, Berlin

    Pearson, W. R. 1990. Rapid and sensitive sequence comparison with FASTP and FASTA. Methods Enzymol. 183:63–98

    Purugganan, M. D., S. D. Rounsley, R. J. Schmidt, and M. F. Yanofsky. 1995. Molecular evolution of flower development: diversification of the plant MADS-box regulatory gene family. Genetics 140:345–356

    Purugganan, M. D., and S. R. Wessler. 1994. Molecular evolution of the plant R regulatory gene family. Genetics 138:849–854

    Putterill, J., S. Ledger, K. Lee, F. Robson, G. Murphy, and G. Coupland. 1997. The flowering-time gene CONSTANS and homologue CONSTANS LIKE 1 (accession no. Y10555 and Y10556) exists as a tandem repeat on chromosome 5 of Arabidopsis. Plant Physiol. 114:396

    Putterill, J., F. Robson, K. Lee, R. Simon, and G. Coupland. 1995. The CONSTANS gene of Arabidopsis promotes flowering and encodes a protein showing similarities to zinc finger transcription factors. Cell 80:847–857

    Ramain, P., P. Heitzler, M. Haenlin, and P. Simpson. 1993. pannier, a negative regulator of achaete and scute in Drosophila, encodes a zinc finger protein with homology to the vertebrate transcription factor GATA-1. Development 119:1277–1291

    Rausher, M. D., R. E. Miller, and P. Tiffin. 1999. Patterns of evolutionary rate variation among genes of the anthocyanin biosynthetic pathway. Mol. Biol. Evol. 16:266–274

    Robert, L. S., F. Robson, A. Sharpe, D. Lydiate, and G. Coupland. 1998. Conserved structure and function of the Arabidopsis flowering time gene CONSTANS in Brassica napus. Plant Mol. Biol. 37:763–772

    Simon, R., M. I. Igeno, and G. Coupland. 1996. Activation of floral meristem identity genes in Arabidopsis. Nature 384:59–62

    Song, J., K. Yamamoto, A. Shomura, H. Itadani, H. S. Zhong, M. Yano, and T. Sasaki. 1998. Isolation and mapping of a family of putative zinc-finger protein cDNA from rice. DNA Res. 5:95–101

    Strimmer, K., and A. von Haeseler. 1996. Quartet puzzling: a quartet maximum likelihood method for reconstructing tree topologies. Mol. Biol. Evol. 13:964–969

    Swofford, D. 1993. PAUP: phylogenetic analysis using parsimony. Illinois Natural History Survey, Champaign

    Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1997. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignments through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673–4680

    Tucker, P. K., and B. Lundrigan. 1993. Rapid evolution of the sex determining locus in Old World mice and rats. Nature 364:715–717

    Whitfield, L., R. Lovell-Badge, and P. Goodfellow. 1993. Rapid sequence evolution of the sex-determining gene SRY. Nature 364:713–715

    Yang, Z. 1997. PAML, a program package for phylogenetic analysis by maximum likelihood. CABIOS 13:555–556

    Yang, Y. W., K. N. Lai, P. T. Tai, and W. H. Li. 1999. Rates of nucleotide substitution in angiosperm mitochondrial DNA sequences and dates of divergence between Brassica and other angiosperm lineages. J. Mol. Evol. 48:597–604

Accepted for publication June 13, 2000.