Department of Plant Biology, Swedish University of Agricultural Sciences, Uppsala, Sweden
Introduction
The relative importance of selection and genetic drift in the evolution of duplicated genes is under debate. It has been argued that duplication is followed by a period of fast evolution during which natural selection fixes changes in one copy of the gene, thereby creating a new function (Goodman 1976). Others have argued that the relaxation of selection due to redundancy allows the fixation of neutral mutations that may eventually result in a new function (Ohno 1970; Kimura 1983; Li 1985). Both models may result in an enhanced nonsynonymous substitution rate. Under positive selection, the nonsynonymous substitution rate may exceed the synonymous rate, whereas under relaxed selection the nonsynonymous rate will never exceed the synonymous rate. Thus, the relative roles of genetic drift and natural selection in the differentiation of duplicated genes can be studied by comparing the synonymous and nonsynonymous substitution rates between such genes.
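The comparison described above can be sketched in a few lines of Python. This is only an illustration of the decision rule: the function name `classify_ka_ks` and the example values are hypothetical, and a ratio at or below 1 does not by itself distinguish purifying selection from relaxed constraint.

```python
def classify_ka_ks(ka: float, ks: float) -> str:
    """Label a pairwise comparison by its Ka/Ks ratio (hypothetical helper)."""
    if ks <= 0:
        raise ValueError("Ks must be positive to form a ratio")
    ratio = ka / ks
    if ratio > 1.0:
        # Nonsynonymous rate exceeds the synonymous rate.
        return "ratio > 1: consistent with positive selection"
    # Nonsynonymous rate does not exceed the synonymous rate.
    return "ratio <= 1: no evidence of positive selection from this test"


# Illustrative values only, not data from this study.
print(classify_ka_ks(0.30, 0.20))
print(classify_ka_ks(0.06, 0.40))
```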
CONSTANS LIKE (COL) genes are members of a recently identified family of plant zinc finger proteins. CONSTANS (CO), the first discovered member, was identified in Arabidopsis thaliana (Putterill et al. 1995). CO probably acts as a transcriptional activator, directly or indirectly activating the floral meristem identity gene LEAFY (Simon, Igeno, and Coupland 1996). When CO was isolated, no homologous sequence could be detected in public databases (Putterill et al. 1995). However, a striking arrangement of cysteine residues was present near the N-terminus of the protein. This arrangement was similar to the DNA-binding zinc finger domain of GATA1 transcription factors (Ramain et al. 1993). The C-terminal region of CO contains a stretch of positively charged amino acids that may serve as a nuclear localization signal (Robert et al. 1998), although the function of this conserved basic domain remains to be fully elucidated. These two features suggested that CO might function as a transcription factor. All mutations characterized to date affect one of the two regions.
Two CO-like genes (COL1 and COL2) have since been reported in A. thaliana (Ledger, Dare, and Putterill 1996; Putterill et al. 1997), and recent database searches have identified additional CO-homologous sequences in Arabidopsis and rice (Song et al. 1998).
To study the molecular evolution of the family of CO-related genes, we performed an extensive search of the databases using protein sequences of CO and CO-homologous sequences, and we isolated a number of homologous genes from Brassica nigra genomic libraries. We present the first phylogenetic analysis of the family of COL genes. The analyses show that (1) these genes evolved rapidly and (2) the rate of evolution is heterogeneous between different domains in the COL genes.
Materials and Methods
Expression Analysis of B. nigra COL Genes
RNA was extracted from leaves according to Hernould et al. (1993), followed by two additional chloroform/isoamyl alcohol extractions and a final precipitation in EtOH. The RNA was resuspended in DEPC-treated water. RNA (0.5–1 µg) was DNase-treated and used for first-strand cDNA synthesis (Boehringer) with an oligo d(T)18 primer in a total volume of 20 µl. Ten microliters of the resulting cDNA was amplified by PCR using primers specific for Bni COa, Bni COb, Bni COL1, and Bni COL2, respectively. Control PCRs were also performed on reverse transcription reactions lacking reverse transcriptase to check for contamination with genomic DNA. The PCR products were separated on agarose gels.
Database Searches
The deduced amino acid sequence of the A. thaliana CO gene was used to search for homologous genes among translated sequences from EMBL and GenBank using BLAST (Altschul et al. 1990) and FASTA (Pearson 1990). After phylogenetic analysis (see below), genes from divergent locations in the tree were used to search for additional homologous genes. Genes identified in the database searches were defined as members of the COL family if they contained two adjacent zinc finger motifs fitting the consensus sequence CX2CX16CX2C (fig. 1).
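As a rough illustration of the membership criterion above, the consensus CX2CX16CX2C can be written as a regular expression and a protein checked for two nearby matches. The function names and the `max_gap` threshold for "adjacent" are assumptions made for this sketch; the text does not state how adjacency was defined.

```python
import re

# One zinc finger motif following the consensus CX2CX16CX2C
# (C = cysteine, Xn = any n residues).
ZINC_FINGER = re.compile(r"C.{2}C.{16}C.{2}C")
MOTIF_LENGTH = 24  # 4 cysteines + 2 + 16 + 2 spacer residues


def find_zinc_fingers(protein: str) -> list[int]:
    """Return 0-based start positions of non-overlapping motif matches."""
    return [m.start() for m in ZINC_FINGER.finditer(protein.upper())]


def has_two_adjacent_fingers(protein: str, max_gap: int = 60) -> bool:
    """True if two motifs are separated by at most max_gap residues.

    max_gap is an illustrative threshold, not a value from the paper.
    """
    starts = find_zinc_fingers(protein)
    return any(
        later - (earlier + MOTIF_LENGTH) <= max_gap
        for earlier, later in zip(starts, starts[1:])
    )


# Synthetic test sequence, not a real COL protein.
finger = "C" + "AA" + "C" + "A" * 16 + "C" + "AA" + "C"
synthetic = "M" + finger + "G" * 19 + finger + "KKKKK"
print(find_zinc_fingers(synthetic), has_two_adjacent_fingers(synthetic))
```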
Nodal support was estimated using bootstrap analyses based on 500 replicates in MP and WNJ analyses and using ML/quartet puzzling support values calculated on the basis of 10,000 puzzling steps (Strimmer and von Haeseler 1996) in ML analysis.
Analysis of Substitution Rates
Synonymous (Ks) and nonsynonymous (Ka) nucleotide substitutions were calculated with a simplified version of the codon-based model of Goldman and Yang (1994) that uses a single distance between any pair of amino acids. Codon frequencies were calculated using the nucleotide frequencies at the three codon positions, resulting in nine parameters for the codon frequencies. Calculations were performed with the program codonml in the PAML, version 2.0, package (Yang 1997).
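The nine-parameter codon frequency scheme mentioned above (three free nucleotide frequencies at each of the three codon positions) can be sketched as follows. This is a minimal illustration, not the PAML implementation: it assumes the universal genetic code, and it assumes the products are renormalised over the 61 sense codons. The function names are hypothetical.

```python
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}  # universal genetic code (assumption)


def position_frequencies(codons):
    """Nucleotide frequencies at each of the three codon positions."""
    freqs = []
    for pos in range(3):
        counts = {nt: 0 for nt in "TCAG"}
        for codon in codons:
            counts[codon[pos]] += 1
        total = sum(counts.values())
        freqs.append({nt: n / total for nt, n in counts.items()})
    return freqs


def codon_frequencies(codons):
    """Expected sense-codon frequencies from position-specific frequencies.

    Each codon frequency is the product of the nucleotide frequencies at
    its three positions, renormalised so the 61 sense codons sum to 1.
    """
    f = position_frequencies(codons)
    raw = {}
    for triplet in product("TCAG", repeat=3):
        codon = "".join(triplet)
        if codon not in STOP_CODONS:
            raw[codon] = f[0][codon[0]] * f[1][codon[1]] * f[2][codon[2]]
    total = sum(raw.values())
    return {codon: p / total for codon, p in raw.items()}


# Toy input: four codons, not data from this study.
pi = codon_frequencies(["ATG", "GCT", "AAA", "TGC"])
print(len(pi), round(sum(pi.values()), 6))
```

Each position contributes four frequencies that sum to 1, hence three free parameters per position and nine in total.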
Results
Each clade contains several A. thaliana genes, suggesting that duplication and gene diversification have been frequent in the evolution of COL genes. The first clade (group A) includes three previously identified Arabidopsis genes: the CONSTANS (CO) gene and CONSTANS LIKE 1 and 2 (COL1 and COL2). In addition, this group includes two B. napus genes and the COL genes isolated from B. nigra in the present study. To obtain better resolution of group A, a phylogenetic reconstruction was performed using the complete coding sequences of genes from this group (fig. 3). In this analysis, a fourth B. nigra gene (Bni COL2) was included. This gene could not be included in the previous analysis because its zinc finger region is deleted. Two apple genes (Md1 and Md2) were also included as an outgroup.
Two B. napus genes and two B. nigra genes cluster with A. thaliana CO. Brassica napus is an amphidiploid, and the B. napus genes originate from each of the two progenitor genomes. The existence of at least two B. nigra CO homologs is consistent with previous mapping data indicating that the genome of the diploid B. nigra is extensively duplicated (Lagercrantz and Lydiate 1996; Lagercrantz 1998).
In addition, the phylogenetic analyses indicated the presence of at least one B. nigra homolog each of COL1 and COL2 (fig. 3). In A. thaliana, COL1 is located about 3 kb upstream of CO on chromosome 5. Similarly, the B. nigra COL1 homolog is located about 3 kb upstream of the B. nigra COa homolog (unpublished data). The lack of genes from species outside the Brassicaceae family in group A (fig. 2) indicates either that this group evolved recently or that CO orthologs are present in other species but have not yet been identified. The topology of the tree indicates that the duplications that gave rise to CO, COL1, and COL2 occurred before the divergence of the lineages leading to Arabidopsis and Brassica.
Group B (fig. 2) comprises genes from Arabidopsis, R. sativus, M. domestica, and P. radiata, indicating that the ancestral gene to that group was present before the divergence of gymnosperms and angiosperms. Groups C and D contain several genes from Arabidopsis and rice. The preponderance of genes from Arabidopsis and rice is most likely due to the comparatively large numbers of genes sequenced from these species. Even though the bootstrap support for group C was relatively weak, the distinctness of group C was supported by two separate features of those genes: the lack of a basic region near the C-terminus and the presence of an intron between the two zinc fingers.
Visual inspection of amino acid sequence alignments of the complete genes revealed two domains that were conserved in proteins from most parts of the phylogenetic tree. In addition to the zinc finger domain present in all studied genes, a basic region was present near the C-terminus of most predicted proteins. This basic region was rich in Lys and Arg, and parts of it showed similarity to a predicted bipartite nuclear localization signal (Robert et al. 1998). This basic domain was present in genes from all parts of the phylogenetic tree except in genes from group C (fig. 2). Genes in group C also had a variable number of extra amino acids between the two zinc fingers as compared with the other genes. This spacer region coincided with the presence of an intron between the zinc fingers in the DNA sequence of the group C genes.
Group D comprises a number of highly divergent sequences from Arabidopsis and rice, suggesting that some genes might cluster in group D as an effect of long-branch attraction (LBA; Hendy and Penny 1989). We used the WNJ and ML methods to reduce the effects of LBA. However, these methods still rely on a reasonably correct substitution model, which is unknown, so the clustering in group D should be viewed with some caution.
Nucleotide Substitution Rates
The rate of evolution of COL genes was studied by comparing orthologous genes between A. thaliana and B. nigra or B. napus (within group A). No attempt was made to study the rate of evolution of the complete coding region using more distantly related genes. The high frequency of insertions/deletions made it difficult to unambiguously align the whole gene using these diverged homologs. Synonymous and nonsynonymous substitutions were calculated between orthologous COL genes in A. thaliana and B. nigra or B. napus (table 2). The divergence time of the Arabidopsis and Brassica lineages is poorly known. To enable a comparison of the substitution rates of the COLs with other genes, synonymous and nonsynonymous substitutions were also calculated for nine other genes sequenced in A. thaliana and in at least one species in the genus Brassica or the closely related genus Raphanus (Chs2).
Rates of Evolution in Different Domains of COL Genes
It has been suggested that regulatory genes are often divided into domains that are characterized by slowly and rapidly evolving sequences. To check for heterogeneity in evolutionary rate along COL genes, the distribution of nucleotide diversity based on nonsynonymous substitutions was examined along the genes using a sliding window of 50 sites. This analysis included all COL genes in group A.
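The sliding-window idea can be sketched as below. The sketch takes a precomputed list of per-site values (for example, per-site nonsynonymous diversity) and averages them over windows of 50 sites; how the per-site values were obtained and the step size between windows are not given in the text, so `per_site` and `step` are assumptions here.

```python
def sliding_window_means(per_site, window=50, step=1):
    """Average per-site values over successive windows.

    Returns (window midpoint, mean value) pairs. The default step of one
    site is an assumption; the text only specifies the window size.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    results = []
    for start in range(0, len(per_site) - window + 1, step):
        chunk = per_site[start:start + window]
        results.append((start + window / 2, sum(chunk) / window))
    return results


# Toy profile: a conserved stretch followed by a variable stretch.
profile = [0.0] * 50 + [1.0] * 50
for midpoint, mean in sliding_window_means(profile, window=50, step=25):
    print(midpoint, mean)
```

Plotting the means against the window midpoints gives a profile in which conserved domains appear as troughs.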
The distribution shown in figure 5 suggests two relatively conserved domains, one close to the N-terminus and one close to the C-terminus of the protein. The N-terminal region comprises the two putative zinc fingers, and the C-terminal region comprises the basic domain rich in Lys and Arg. Loss-of-function mutations in either of these domains in the CO gene showed that both domains are important for functional activity of the CO gene product in A. thaliana (Putterill et al. 1995; Robert et al. 1998).
Discussion
There are four relatively distinct clades in the phylogenetic analysis. Groups B and C contain genes from dicots, gymnosperms, and monocots, showing that these groups have persisted for a considerable period. Group A includes only genes from the Brassicaceae family, indicating that true CO orthologs have not yet been isolated in species outside the Brassicaceae family. Additional sequencing will show if this group diverged recently in the evolution of dicots or if members are also present in species outside the Brassicaceae family.
At present, there are few data on the function of COL genes. The CO gene of A. thaliana is involved in the timing of flower induction in response to photoperiod. The CO gene probably acts as a transcriptional activator that promotes flowering (Putterill et al. 1995; Simon, Igeno, and Coupland 1996). The second gene for which a function is indicated is the STO gene of A. thaliana. This gene can functionally complement yeast calcineurin mutants and can also increase salt tolerance of wild-type yeast (Lippuner, Cyert, and Gasser 1996). It was suggested that STO might modulate a calcineurin-dependent pathway in yeast. Although the function of STO in Arabidopsis is still unknown, present data indicate that STO functions in a process completely different from the one in which CO is involved. COL genes in group A and STO-like genes in group C are also clearly differentiated at the amino acid sequence level. Group C genes all lack the C-terminal basic domain that is present in most other genes in the COL family. As mentioned previously, analysis of induced mutations has shown that this domain is important in CO function.
Rapid Evolution of COL Genes
Genes in the COL family seem to evolve relatively fast. Within group A in the phylogenetic reconstruction (figs. 2 and 3), the nonsynonymous substitution rate was significantly higher for the COL genes than for a random sample of other genes in Arabidopsis and Brassica. Based on mitochondrial nad4 genes, Yang et al. (1999) estimated a divergence time for the Arabidopsis and Brassica lineages of 22 MYA. This divergence time yields a nonsynonymous substitution rate of 3.32 × 10⁻⁹ for COL genes, which is higher than the average calculated for a large number of genes (0.87 × 10⁻⁹; Li, Wu, and Luo 1985).
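For orientation, the conversion from divergence to rate can be written out. This assumes the standard relation in which substitutions accumulate along both lineages since their split, and that the rate is expressed per site per year; the average pairwise divergence is not restated in the text here, so the value of K_a below is back-calculated from the reported rate and divergence time rather than taken from the tables.

```latex
r = \frac{K_a}{2T}
\qquad\Longrightarrow\qquad
K_a = 2\,T\,r \approx 2 \times (22 \times 10^{6}) \times (3.32 \times 10^{-9}) \approx 0.146
```

Under these assumptions, a rate of 3.32 × 10⁻⁹ corresponds to roughly 0.146 nonsynonymous substitutions per nonsynonymous site between the Arabidopsis and Brassica orthologs.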
The COL genes in group A (i.e., CO, COL1, and COL2 and their Brassica homologs) also display marked variation in the rate of replacement substitutions along the sequences. The two domains that have been indicated to be important for the function of A. thaliana CO, the zinc finger region and the basic C-terminal region, show relatively low rates of replacement substitutions. These rates are similar to those of other genes in the Brassicaceae family (cf. tables 1 and 2) and a large number of other genes (Li, Wu, and Luo 1985). In contrast, the estimate for the variable region is about five times as high (4.70 × 10⁻⁹). The higher evolutionary rate of the variable domain in COL genes is also supported by estimates of the ratio of nonsynonymous to synonymous substitutions. The ratios for the two conserved domains were 0.13 and 0.09, which are close to 0.14, the average reported for a number of dicot and monocot genes (Martin, Gierl, and Saedler 1989; Huang, Stebbins, and Rodriguez 1992), while the ratio for the variable domain was 0.64.
The nonsynonymous substitution rates estimated for COL genes are more similar to those reported for two other sets of regulatory genes, the plant MADS-box family (1.4 × 10⁻⁹; Purugganan et al. 1995) and the R family (2.6 × 10⁻⁹; Purugganan and Wessler 1994). Similar to the COL genes, these two gene families exhibited three- and fourfold increased replacement rates in variable regions as compared with conserved regions.
Alternating segments of rapidly and slowly evolving sequences have also been noted for other regulatory genes (Tucker and Lundrigan 1993; Whitfield, Lovell-Badge, and Goodfellow 1993; Rausher, Miller, and Tiffin 1999).
This observation could be explained by relaxed constraint on variable regions and by a high proportion of substitutions in these regions being effectively neutral. Alternatively, the high nonsynonymous substitution rates in variable regions could be the result of positive selection for amino acid substitutions. From data for A. thaliana, it is clear that CO, COL1, and COL2 have diverged in function. Mutations in CO, resulting in delayed flowering under long days, cannot be compensated by COL1 or COL2. It remains to be determined which mutations have been important for the diverged function. Whether these mutations are mainly located in the conserved regions that are known to be important for CO function or in the intervening region that is evolving at a high rate is not known. Even though the rate of nonsynonymous substitutions was high in the variable region, our data did not indicate any positive selection (i.e., a nonsynonymous to synonymous ratio >1). However, the power of this test is low, as it assumes that all sites in the studied sequence are under the same selection pressure with the same underlying Ka/Ks ratio.
Analysis of the ratio of nonsynonymous to synonymous substitutions among different branches of the phylogenetic tree indicated that the ratio has increased during later stages of evolution. The two most basal branches of the tree had a significantly lower nonsynonymous to synonymous substitution ratio than did younger branches. An increased evolutionary rate was indicated not only for the rapidly evolving middle region of the protein, but also for the two more conserved flanking regions (the zinc finger region and the basic region). The structure of the evolutionary tree indicates that gene duplications have been frequent in later stages of evolution. In the limited sample of genes in figure 3, there are two closely related CO homologs in the apple and several homologs in A. thaliana and Brassica species. One possible reason for the accelerated evolution of COL genes indicated by the present data could be the frequent occurrence of duplicated copies that are allowed to diverge and acquire new functions. The high rate of substitutions in COL genes is probably not due to relaxed selection in pseudogenes, as all the COL genes included in the analysis (fig. 3) are still expressed. Expression has been observed for CO, COL1, and COL2 (Putterill et al. 1995, 1997; Ledger, Dare, and Putterill 1996); Bna COa and Bna COb (Robert et al. 1998); Bni COa, Bni COb, Bni COL1, and Bni COL2 (present study); and Md1 and Md2 (Jeong, Sung, and An 1999).
It remains to be determined whether selection is the main factor operating to diversify the COL genes or whether the rapid evolution (primarily of the variable domain) is mainly an effect of relaxed sequence constraint. We will extend the analysis to include comparison of polymorphism and divergence at nonsynonymous and synonymous sites, as such comparisons are more powerful with regard to detecting adaptive evolution in closely related species (McDonald and Kreitman 1991). These analyses might provide more insight into which domains have been important in the functional diversification of COL genes.
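A comparison of polymorphism and divergence of the kind proposed above is commonly set up as a 2 × 2 table of fixed versus polymorphic differences at nonsynonymous and synonymous sites. The sketch below tests such a table with Fisher's exact test, implemented from the hypergeometric distribution so that it needs only the standard library. The function names and the counts are illustrative assumptions, not results for COL genes, and the original test statistic may differ from the one used here.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables at least as extreme (no more probable) than observed.
    return sum(
        prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9)
    )


def polymorphism_divergence_test(dn, ds, pn, ps):
    """Return (neutrality index, p-value) for fixed (dn, ds) and
    polymorphic (pn, ps) nonsynonymous and synonymous counts.

    Requires dn, ds, and ps to be nonzero for the index to be defined.
    """
    neutrality_index = (pn / ps) / (dn / ds)
    return neutrality_index, fisher_exact_two_sided(dn, ds, pn, ps)


# Illustrative counts only.
ni, p = polymorphism_divergence_test(dn=7, ds=17, pn=2, ps=42)
print(round(ni, 3), p < 0.05)
```

A neutrality index below 1 together with a small p-value indicates an excess of fixed nonsynonymous differences relative to polymorphism, the pattern expected under adaptive evolution.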
Footnotes
1 Keywords: gene duplication, gene family evolution, molecular evolution, phylogeny, zinc finger protein
2 Address for correspondence and reprints: Ulf Lagercrantz, Department of Plant Biology, Swedish University of Agricultural Sciences, Box 7080, S-750 07 Uppsala, Sweden. E-mail: ulf.lagercrantz{at}vbiol.slu.se
Literature Cited
Altschul, S. F., W. Gish, W. Miller, E. W. Myers, and D. J. Lipman. 1990. Basic local alignment search tool. J. Mol. Biol. 215:403–410
Bruno, W. J., N. D. Socci, and A. L. Halpern. 2000. Weighted neighbor joining: a likelihood-based approach to distance-based phylogeny reconstruction. Mol. Biol. Evol. 17:189–197
Goldman, N., and Z. Yang. 1994. A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol. Biol. Evol. 11:725–736
Goodman, M. 1976. Protein sequences in phylogeny. Pp. 141–159 in F. J. Ayala, ed. Molecular evolution. Sinauer, Sunderland, Mass
Grant, M. R., L. Godiard, E. Straube, T. Ashfield, J. Lewald, A. Sattler, R. W. Innes, and J. L. Dangl. 1995. Structure of the Arabidopsis RPM1 gene enabling dual specificity disease resistance. Science 269:843–846
Hasegawa, M., H. Kishino, and T. Yano. 1985. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol. 22:160–174
Hendy, M. D., and D. Penny. 1989. A framework for the quantitative study of evolutionary trees. Syst. Zool. 38:297–309
Hernould, M., S. Suharsono, S. Litvak, A. Araya, and A. Mouras. 1993. Male-sterility induction in transgenic tobacco plants with an unedited atp9 mitochondrial gene from wheat. Proc. Natl. Acad. Sci. USA 90:2370–2374
Huang, N., G. L. Stebbins, and R. Rodriguez. 1992. Classification of and evolution of the alpha-amylase genes in plants. Proc. Natl. Acad. Sci. USA 89:7526–7530
Jeong, D. H., S. K. Sung, and G. An. 1999. Molecular cloning and characterization of CONSTANS-like cDNA clones of the Fuji apple. J. Plant Biol. 42:23–31
Jukes, T. H., and C. R. Cantor. 1969. Evolution of protein molecules. Pp. 21–132 in H. N. Munro, ed. Mammalian protein metabolism. Academic Press, New York
Kimura, M. 1980. A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol. 16:111–120
Kimura, M. 1983. The neutral theory of molecular evolution. Cambridge University Press, London
Lagercrantz, U. 1998. Comparative mapping between Arabidopsis thaliana and Brassica nigra indicates that Brassica genomes have evolved through extensive genome replication accompanied by chromosome fusions and frequent rearrangements. Genetics 150:1217–1228
Lagercrantz, U., and D. Lydiate. 1995. RFLP mapping in Brassica nigra indicates differing recombination rates in male and female meiosis. Genome 38:255–264
Lagercrantz, U., and D. Lydiate. 1996. Comparative genome mapping in Brassica. Genetics 144:1903–1910
Ledger, S. E., A. P. Dare, and J. Putterill. 1996. COL2 is a homologue of the Arabidopsis flowering-time gene CONSTANS. Plant Physiol. 112:862
Li, W.-H. 1985. Accelerated evolution following gene duplication and its implication for the neutralist-selectionist controversy. Pp. 333–352 in T. Ohta and K. Aoki, eds. Population genetics and molecular evolution. Japan Scientific Societies Press, Tokyo
Li, W.-H., M. Wu, and C. L. Luo. 1985. A new method for estimating synonymous and nonsynonymous rates of nucleotide substitutions considering the relative likelihood of nucleotide and codon changes. Mol. Biol. Evol. 2:150–174
Lippuner, V., M. S. Cyert, and C. S. Gasser. 1996. Two classes of plant cDNA clones differentially complement yeast calcineurin mutants and increase salt tolerance of wild-type yeast. J. Biol. Chem. 271:12859–12866
McDonald, J. H., and M. Kreitman. 1991. Adaptive protein evolution at the Adh locus in Drosophila. Nature 351:652–654
Martin, W., A. Gierl, and H. Saedler. 1989. Molecular evidence for pre-Cretaceous angiosperm origins. Nature 339:46–48
Ohno, S. 1970. Evolution by gene duplication. Springer Verlag, Berlin
Pearson, W. R. 1990. Rapid and sensitive sequence comparison with FASTP and FASTA. Methods Enzymol. 183:63–98
Purugganan, M. D., S. D. Rounsley, R. J. Schmidt, and M. F. Yanofsky. 1995. Molecular evolution of flower development: diversification of the plant MADS-box regulatory gene family. Genetics 140:345–356
Purugganan, M. D., and S. R. Wessler. 1994. Molecular evolution of the plant R regulatory gene family. Genetics 138:849–854
Putterill, J., S. Ledger, K. Lee, F. Robson, G. Murphy, and G. Coupland. 1997. The flowering-time gene CONSTANS and homologue CONSTANS LIKE 1 (accession nos. Y10555 and Y10556) exist as a tandem repeat on chromosome 5 of Arabidopsis. Plant Physiol. 114:396
Putterill, J., F. Robson, K. Lee, R. Simon, and G. Coupland. 1995. The CONSTANS gene of Arabidopsis promotes flowering and encodes a protein showing similarities to zinc finger transcription factors. Cell 80:847–857
Ramain, P., P. Heitzler, M. Haenlin, and P. Simpson. 1993. pannier, a negative regulator of achaete and scute in Drosophila, encodes a zinc finger protein with homology to the vertebrate transcription factor GATA-1. Development 119:1277–1291
Rausher, M. D., R. E. Miller, and P. Tiffin. 1999. Patterns of evolutionary rate variation among genes of the anthocyanin biosynthetic pathway. Mol. Biol. Evol. 16:266–274
Robert, L. S., F. Robson, A. Sharpe, D. Lydiate, and G. Coupland. 1998. Conserved structure and function of the Arabidopsis flowering time gene CONSTANS in Brassica napus. Plant Mol. Biol. 37:763–772
Simon, R., M. I. Igeno, and G. Coupland. 1996. Activation of floral meristem identity genes in Arabidopsis. Nature 384:59–62
Song, J., K. Yamamoto, A. Shomura, H. Itadani, H. S. Zhong, M. Yano, and T. Sasaki. 1998. Isolation and mapping of a family of putative zinc-finger protein cDNA from rice. DNA Res. 5:95–101
Strimmer, K., and A. von Haeseler. 1996. Quartet puzzling: a quartet maximum likelihood method for reconstructing tree topologies. Mol. Biol. Evol. 13:964–969
Swofford, D. 1993. PAUP: phylogenetic analysis using parsimony. Illinois Natural History Survey, Champaign
Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignments through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673–4680
Tucker, P. K., and B. Lundrigan. 1993. Rapid evolution of the sex-determining loci in Old World mice and apes. Nature 364:715–717
Whitfield, L., R. Lovell-Badge, and P. Goodfellow. 1993. Rapid sequence evolution of the sex-determining gene SRY. Nature 364:713–715
Yang, Z. 1997. PAML, a program package for phylogenetic analysis by maximum likelihood. CABIOS 13:555–556
Yang, Y. W., K. N. Lai, P. T. Tai, and W. H. Li. 1999. Rates of nucleotide substitution in angiosperm mitochondrial DNA sequences and dates of divergence between Brassica and other angiosperm lineages. J. Mol. Evol. 48:597–604