DNA Polymorphism at the FRIGIDA Gene in Arabidopsis thaliana: Extensive Nonsynonymous Variation Is Consistent with Local Selection for Flowering Time

Valérie Le Corre, Fabrice Roux and Xavier Reboud

Laboratoire Malherbologie et Agronomie, INRA, Dijon Cedex, France


    Abstract
FRIGIDA (FRI) is a major gene involved in the regulation of flowering time in Arabidopsis thaliana. Nucleotide variation at this gene was investigated by sequencing 25 field ecotypes collected from western Europe. Genetic diversity at FRI was characterized by a high number of haplotypes and an excess of low-frequency polymorphisms. A large excess of intraspecific nonsynonymous variation associated with low synonymous variation was detected along the first exon in the FRI gene. In contrast, no excess of nonsynonymous divergence was detected between A. thaliana and A. lyrata. The Tajima and McDonald and Kreitman tests, however, suggested that this gene has evolved in a nonneutral fashion. Nonsynonymous variation included eight loss-of-function mutations that have probably arisen recently and independently in several locations. A phenotypic evaluation of the sequenced ecotypes confirmed that these loss-of-function mutations were associated with an early-flowering phenotype. Taken together, our results suggest that DNA polymorphism at the FRI gene in A. thaliana from western Europe has been shaped by recent positive selection for earliness in a set of isolated populations.


    Introduction
The extent and pattern of DNA sequence variation in natural populations can provide useful information about the evolutionary forces acting on a species. In this respect, an important goal is to understand how molecular variation at the single-gene level is affected by natural selection acting at the phenotypic level.

The model plant Arabidopsis thaliana is increasingly used in molecular evolutionary studies. The genes studied so far in A. thaliana can be classified according to their function as follows: genes involved in the recognition of plant pathogens or abiotic stimuli (Kawabe et al. 1997; Caicedo, Schaal, and Kunkel 1999; Kawabe and Miyashita 1999; Stahl et al. 1999; Kuittinen and Aguadé 2000; Aguadé 2001), floral homeotic genes (Purugganan and Suddith 1998, 1999), and genes that encode catalytic enzymes (Hanfstingl et al. 1994; Innan et al. 1996; Miyashita, Kawabe, and Innan 1998; Kawabe, Yamane, and Miyashita 2000).

Surprisingly, genes involved in the control of flowering time have almost never been studied for natural variation at the DNA level. Flowering time has nevertheless been shown to be highly variable in A. thaliana, is clearly related to fitness, and is thought to be a main determinant of adaptation to environmental variation (Pigliucci 1998). In parallel, recent progress has been made in understanding the genetic determinism of flowering time in A. thaliana. A large number of genes have been described, among which the single-copy gene FRIGIDA (FRI) seems to be of particular importance. The FRI locus acts synergistically with the Flowering Locus C (FLC) to cause late flowering (Sheldon et al. 1999). FLC is the key gene involved in the initiation of flowering; it is negatively regulated by vernalization but positively regulated by FRI (Sheldon et al. 1999). Ecotypes having functional alleles at FLC but nonfunctional alleles at FRI will have a shorter life cycle than ecotypes having functional alleles at both loci. Moreover, by sequencing four ecotypes, Johanson et al. (2000) identified two different deletions in FRI that disrupt the open reading frame and demonstrated that some of the early-flowering ecotypes carry one of these loss-of-function mutations. The exact function of the FRI protein is still unknown, although it contains two coiled-coil domains, the importance of which remains to be identified.

In this study, sequence variation at the FRI gene was analyzed for 24 field strains sampled in western Europe (France and the United Kingdom) and one seedbank ecotype (Ler, Poland). There is recent evidence that the populations we sampled originated from a single postglacial colonization "wave" from a Pleistocene refugium located in the Iberian Peninsula (Sharbel, Bernhard, and Mitchell-Olds 2001). Nevertheless, these populations display a range of variation for flowering time (under greenhouse conditions) similar to that observed in a collection of 249 worldwide ecotypes (unpublished data). It is likely that the flowering time has been subjected to diversifying selection to adapt to the various kinds of habitats occurring in western Europe. Because of their longer vegetative growth phase, late-flowering plants can accumulate and allocate more resources for seed production. Late flowering in the absence of vernalization also allows the seedlings that emerge in autumn to overwinter as rosette plants. In contrast, early flowering would be advantageous under climates with a short favorable season for growth or in disturbed habitats. Early flowering may also be advantageous under climates with a long growing season if it allows several reproductive cycles to be achieved during a year.

Our study shows that many replacements and indels have occurred during the evolution of the FRI gene in western European ecotypes and that some of these variations have led to a loss of function. Our objectives are, therefore, firstly, to compare the level and pattern of sequence variation in the FRI region with that of other previously studied genes, secondly, to understand which neutral and selective forces acted to create this pattern, and finally, to examine the relationship between the polymorphism in the FRI region and the natural variation for flowering time.


    Materials and Methods
Plant Material
Twenty-five ecotypes of A. thaliana were used in this study (table 1). The ecotype Landsberg erecta (Ler-0) was obtained from the Nottingham Stock Center. All other ecotypes were collected from the field at different locations in France and the United Kingdom (table 1). We used one ecotype per location. Three pairs of locations (ALL1 and ALL2, DAM1 and DAM2, and WHA1 and WHA2) were separated by less than 5 km, whereas all other sampling locations were separated from each other by at least 100 km. This two-level sampling scheme was designed to assess the relationship between geographical distance and DNA polymorphism.


Table 1 The Arabidopsis thaliana Field Strains Used in This Study

 
Phenotypic Evaluation
For each accession, four seeds, obtained from a first generation of multiplication in the greenhouse to remove potential maternal effects, were sown separately in 3-cm pots filled with a mixture of sand, soil, and peat. No vernalization treatment was applied. Plants were randomized and grown in the greenhouse from March 1 until mid-June, under natural light supplemented by artificial light with a 16-h photoperiod. Temperature was kept between 20 and 25°C. Earliness was measured by flowering time (the number of days from sowing to the opening of the first flower) and by the number of rosette leaves at flowering, a parameter less affected by photoperiod. Measures were averaged over the four plants for each ecotype (table 1).

DNA Sequencing
The total DNA was isolated from young leaves using a CTAB method (Doyle and Doyle 1987). The FRI gene was amplified as two overlapping segments. The two pairs of primers used were Fri1 (5'-GAAGACTAAAAAGAGCACACCATCACCCC-3') and Fri1R (5'-CATTCCCTTGATACTTGATTCAAC-3') to amplify the 5'-end of the gene, and Fri2 (5'-CGAAATTGTTGCTTGTCAGAACCAAATG-3') and Fri2R (5'-ATGAAGAGAATCCAGATGACCAAGAGCC-3') to amplify the 3'-end of the gene. PCR products were purified using the Qiaquick PCR purification kit (Qiagen). To eliminate sequence errors caused by the Taq DNA polymerase, PCR products from five independent reactions were pooled and used as a template for sequencing. DNA sequencing was carried out by MWG Biotech A.G. (Ebersberg, Germany). All sequence polymorphisms were visually rechecked from chromatograms, with special attention to singleton polymorphisms.

Data Analysis
The published FRI sequence of the ecotype H51, a derivative of the ecotype "Stockholm," was included in this study (GenBank accession number AF228499; Johanson et al. 2000). The analyzed region was located between nucleotide positions 126 and 2936 of the H51 ecotype and encompassed the entire coding region of the FRI gene. The A. lyrata FRI sequence corresponding to positions 616 to 2838 in the H51 sequence was provided by H. Kuittinen (University of Oulu, Finland). Sequences were aligned using the Multalin program (Corpet 1988). A neighbor-joining tree was constructed with Mega version 2.0 (Kumar et al. 2001), using the genetic distances estimated by the Jukes and Cantor (1969) method. Sequence polymorphisms in A. thaliana were analyzed using the DnaSP program version 3.52 (Rozas and Rozas 1999). Nucleotide variation was estimated as nucleotide diversity (π, Nei 1987) and 4Nµ (θ, Watterson 1975). The Tajima (1989) test that compares these two measures of nucleotide variation was used as a test for mutation-drift equilibrium. Observed values of Tajima's D statistic were compared with empirical distributions generated by coalescent simulations under a neutral infinite-site model and assuming a large constant population size, as implemented in DnaSP (Rozas and Rozas 1999). The distribution of observed pairwise nucleotide differences, or mismatch distribution, was calculated and graphically compared with the expected mismatch distribution under constant population size or population growth with no recombination (Rogers and Harpending 1992). The minimum number of intragenic recombination events was calculated using the four-gametes test (Hudson and Kaplan 1985), and the parameter C = 4Nc was estimated using the method of Hudson (1987).
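The two diversity estimators and Tajima's D described above can be illustrated with a minimal Python sketch. It is not the DnaSP implementation: the function name `tajimas_d` and the toy alignment are assumptions made for illustration, values are returned per sequence rather than per site, and alignment gaps are not handled.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Return (pi, theta_w, D) per sequence for equal-length aligned strings."""
    n = len(seqs)
    length = len(seqs[0])
    # Segregating sites: alignment columns showing more than one state.
    seg = [i for i in range(length) if len({s[i] for s in seqs}) > 1]
    S = len(seg)
    # pi: mean number of pairwise differences (Nei 1987).
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a[i] != b[i] for i in seg) for a, b in pairs) / len(pairs)
    # theta: Watterson's (1975) estimator based on S.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    theta_w = S / a1
    if S == 0:
        # D is undefined without segregating sites; 0.0 is a convention here.
        return pi, theta_w, 0.0
    # Variance constants from Tajima (1989).
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    d = (pi - theta_w) / sqrt(e1 * S + e2 * S * (S - 1))
    return pi, theta_w, d


# Hypothetical toy alignment in which every polymorphism is a singleton.
toy = ["AAAA", "AAAT", "AATA", "ATAA"]
pi, theta_w, d = tajimas_d(toy)
```

With only singleton polymorphisms, π falls below θ and D is negative, which is the direction of departure reported for FRI.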
Rates of synonymous and nonsynonymous substitution in the coding region of FRI were estimated using two different methods, the approximate method of Nei and Gojobori (1986) with the Jukes and Cantor correction for multiple hits and the maximum likelihood method developed by Goldman and Yang (1994). This latter method accounts for transition-transversion bias and codon usage bias. Standard errors for the rates of synonymous and nonsynonymous substitution estimated using the method of Nei and Gojobori (1986) were obtained by the bootstrap method implemented in Mega version 2.0 (Kumar et al. 2001).
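As a rough illustration of the Jukes and Cantor correction applied to the Nei and Gojobori proportions, the sketch below converts a proportion of differing sites into a corrected distance and forms the nonsynonymous to synonymous ratio. The function names and the counts in the usage lines are hypothetical; counting synonymous and nonsynonymous sites codon by codon, which the Nei and Gojobori method requires, is not shown.

```python
from math import log


def jukes_cantor(p):
    """Jukes-Cantor distance for a proportion p of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    return -0.75 * log(1 - 4 * p / 3)


def ka_ks(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """Corrected nonsynonymous (Ka) and synonymous (Ks) rates and their ratio."""
    ka = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ks = jukes_cantor(syn_diffs / syn_sites)  # ks must be > 0 for the ratio
    return ka, ks, ka / ks


# Hypothetical counts, chosen only to exercise the functions.
ka, ks, ratio = ka_ks(nonsyn_diffs=30, nonsyn_sites=1400,
                      syn_diffs=25, syn_sites=420)
```

A ratio below one, as in this made-up example, is the pattern usually read as constraint against amino acid change.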


    Results
DNA Polymorphisms at the FRI Locus
A total of 2,802 nt of the FRI gene were sequenced. This comprises 448 nt in the 5'-flanking region, three exons and two introns, and 45 nt in the 3'-flanking region. Twenty different haplotypes were found among the 26 ecotypes studied. Three haplotypes were present twice, and one haplotype was present four times. The 20 haplotypes differed by 44 mutations: 36 nucleotide polymorphisms and 8 indels (fig. 1). Most polymorphic sites were singletons: 21 of the 36 nucleotide polymorphisms and 5 of the 8 indels.



Fig. 1.—Summary of DNA polymorphisms in the FRI gene of A. thaliana. A dot indicates a nucleotide identical to H51, an — indicates the absence of the corresponding insertion or nucleotide. In, insertion; Del, deletion; Indel, deletion combined with insertion. In1 = GTA. Indel1 = deletion of 376 bp combined with an insertion of 31 bp. Indel2 = deletion of 99 bp combined with an insertion of 61 bp. Redundant haplotypes are: H51 = FET, ALL1 = CLA, CLE = DAM2, BUI = FER = RAN = WHA1. Names of sequences containing a stop codon are underlined

 
Polymorphic sites were found in all regions of the gene. There were similar proportions of polymorphic sites in the coding and noncoding regions (chi-square test of homogeneity), and six of the eight indels were found in the coding region. Each of these six indels caused a frameshift. One indel (indel1) overlapped with the end of the 5'-flanking region and the beginning of the first exon and removed the translation start codon. The other five indels were always followed by a stop codon in the sequence. Similarly, two nucleotide changes resulted in stop codons (fig. 1). The interrupted reading frames, which terminated either in exon 1, in exon 2, or at the beginning of exon 3, would encode peptides varying in length between 41 and 410 amino acids. The normal FRI protein, which has 609 amino acids, has an unknown function but contains two coiled-coil domains, one between amino acid positions 55 and 100 and the other between 405 and 450 (Johanson et al. 2000). Because at least one of these two coiled-coil domains is always lost in the interrupted FRI sequences, these were assumed to be nonfunctional. Among the 20 haplotypes observed, at least nine (representing 13 of the 26 ecotypes) would thus correspond to a nonfunctional FRI gene.

Genealogical Relationships Among Ecotypes
The neighbor-joining tree (fig. 2) showed weak support for allelic dimorphism at FRI, in contrast to several other genes previously studied in Arabidopsis (Kawabe, Yamane, and Miyashita 2000). A small group of sequences formed a separate cluster with a bootstrap value of 77%, whereas the remaining sequences were not strongly structured into groups. There was no relationship between the tree structure and the geographical origin of the ecotypes. Pairs of geographically related ecotypes were always separate on the tree, whereas some geographically distant ecotypes clustered together. Ecotypes with an interrupted, nonfunctional FRI sequence were scattered among all clusters on the tree.



Fig. 2.—Neighbor-joining tree based on nucleotide variation in the entire FRI region. Bootstrap probabilities >50% are shown above branches. The number of bootstrap replications was 1,000. Interrupted sequences (see text) are underlined

 
Mismatch Distribution Analysis
The distribution of the number of pairwise nucleotide differences is affected by demographic history. Under constant population size, the distribution generally shows several peaks that, on average, follow a gradually declining curve (Rogers and Harpending 1992). The observed mismatch distribution for the FRI gene region (fig. 3) departed from this expectation by showing a single main peak, as expected for a population that has experienced recent growth. However, the raggedness statistic r, a measure of the smoothness of the distribution (Harpending 1994), was 0.0372, a value not significantly different from the values obtained by coalescent simulations under the constant population size model (implemented in DnaSP version 3.52, Rozas and Rozas 1999).
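For readers unfamiliar with these quantities, the following minimal Python sketch computes a mismatch distribution and Harpending's raggedness statistic, taken here as the sum of squared differences between the frequencies of adjacent mismatch classes. The function names and the toy alignment are illustrative assumptions; this is not the DnaSP code.

```python
from collections import Counter
from itertools import combinations


def mismatch_distribution(seqs):
    """Relative frequency of pairs differing at 0, 1, ..., d_max sites."""
    counts = Counter(
        sum(a != b for a, b in zip(s, t)) for s, t in combinations(seqs, 2)
    )
    total = sum(counts.values())
    d_max = max(counts)
    return [counts.get(k, 0) / total for k in range(d_max + 1)]


def raggedness(freqs):
    """Sum of squared steps between adjacent classes, with a trailing zero class."""
    x = list(freqs) + [0.0]
    return sum((x[i] - x[i - 1]) ** 2 for i in range(1, len(x)))


# Hypothetical toy alignment.
toy = ["AAAA", "AAAT", "AATA", "ATAA"]
freqs = mismatch_distribution(toy)
r = raggedness(freqs)
```

Smooth, unimodal distributions give small values of r, whereas multimodal distributions give larger ones.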



Fig. 3.—The mismatch distribution for nucleotide variation at the FRI region in A. thaliana. Observed distribution is shown in columns. Expected distributions under constant population size and population growth are indicated as dotted and solid lines, respectively

 
Patterns of Nucleotide Diversity at the FRI Gene: Comparison of Complete and Interrupted Sequences
Nucleotide diversity at the FRI gene is shown in table 2. When considering all ecotypes (table 2A), the nucleotide diversity (π) for the entire region was 0.00214. This value is lower than any of those reported for the nine genes studied so far in A. thaliana, which range from 0.00310 for ferulate-5-hydroxylase to 0.01040 for chitinase A (reviewed in Aguadé 2001). In contrast to the other genes studied so far in A. thaliana, the coding region of FRI was at least as diverse (π = 0.00217) as its noncoding regions (π = 0.00154). Tajima's D statistic was negative in most regions and sets of sequences, as expected when singletons and low-frequency polymorphisms are in excess. A significant departure from the neutral expectation was observed for the total gene and for its coding region. Within the coding region, Tajima's test was significant for nonsynonymous polymorphisms but not for synonymous polymorphisms.


Table 2 Patterns of Nucleotide Variation in the FRI Region of Arabidopsis thaliana

 
Nucleotide diversity was estimated separately for ecotypes carrying complete and interrupted FRI sequences (table 2B and C). Nearly identical results were observed for the two sets of sequences, both of them showing the patterns described earlier. However, nonfunctional sequences showed slightly more diversity than did functional sequences at all regions of the FRI gene, except the introns. This result might indicate a recent relaxation of the selection pressures on nonfunctional variants of the gene. Within either set of sequences, Tajima's D values were nonsignificant but nevertheless negative for most regions of the gene. Lack of significance of the tests might be because of the small sample size for these two sets of sequences (Simonsen, Churchill, and Aquadro 1995).

Polymorphism and Divergence at the FRI Coding Region
A sliding-window analysis was conducted to better examine the distribution of silent (i.e., noncoding and synonymous) and nonsynonymous variation along the FRI gene region (fig. 4). Silent divergence between A. thaliana and A. lyrata was higher in the intronic region than in the exons, suggesting that selective constraints reduced the level of variation in the coding regions of FRI. Peaks of silent diversity within A. thaliana were present in two regions of the gene: first, in the 5'-flanking region, and second, in a large region that comprised the two introns and exons 2 and 3. In contrast, there was little or no silent variation along the first exon. Nonsynonymous variation was present mainly in the first exon, where two high peaks of nonsynonymous diversity were observed. The first peak overlapped with the first coiled-coil domain of the FRI protein. Only small peaks of nonsynonymous variation were found in exons 2 and 3.



Fig. 4.—Sliding-window analysis of silent (synonymous and noncoding) and nonsynonymous nucleotide diversity within A. thaliana and divergence between A. thaliana and A. lyrata for the FRI gene. Window size is 50 bp, and step size is 10 bp (in units of either silent or nonsynonymous sites). The positions of the exons are represented by boxes below the plot
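The sliding-window computation behind figure 4 can be sketched as follows, using the window of 50 sites and step of 10 sites given in the caption. This is a simplified illustration: the function name and the per-site input list are assumptions, and the split of positions into silent and nonsynonymous sites that the authors applied is not reproduced.

```python
def sliding_window(per_site_values, window=50, step=10):
    """Return (window midpoint, mean value) for each full window."""
    out = []
    for start in range(0, len(per_site_values) - window + 1, step):
        chunk = per_site_values[start:start + window]
        out.append((start + window / 2, sum(chunk) / window))
    return out


# Hypothetical per-site diversity values: one polymorphic site at index 55.
per_site = [0.0] * 100
per_site[55] = 0.5
profile = sliding_window(per_site)
```

Plotting the second element of each pair against the first gives a diversity profile of the kind shown in the figure.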

 
Heterogeneity in nucleotide variation patterns across a DNA sequence can result from nonneutral evolution (Goss and Lewontin 1996; McDonald 1996). Statistical tests for heterogeneity, based either on the ratio of diverged to nondiverged sites (Goss and Lewontin 1996) or on the ratio of polymorphisms to fixed (between species) differences (McDonald 1996, 1998), were applied to the FRI region using the software DNA Slider (McDonald 1998). These tests do not require a priori partition of the region studied. Goss and Lewontin's interval length variance (Goss and Lewontin 1996, equation 2) and modified interval length variance (Goss and Lewontin 1996, equation 3) were both highly significant, with probabilities less than 0.01, irrespective of the recombination value used (4Nc between 0 and 64). McDonald's maximum sliding G statistic was also significant (P ≤ 0.0227). The coding part of the FRI gene thus appeared to be made of differently evolving regions that, from figure 4, approximately corresponded first to exon 1 and second to exons 2 and 3. These two regions were of similar length, and each contained one of the two coiled-coil domains in FRI. To our knowledge, no data are available to assess whether these two regions overlap with functionally distinct domains in the FRI protein.

The ratio of nonsynonymous to synonymous substitution at the coding region of FRI between A. thaliana and A. lyrata was 0.36 (table 3). This ratio was higher than those previously found at other genes in A. thaliana. For CAL, AP3, PI (Purugganan and Suddith 1998), CHI (Kuittinen and Aguadé 2000), FAH1, and F3H (Aguadé 2001), the ratio of replacement to synonymous divergence varied between 0.038 (for F3H) and 0.25 (for CHI). However, the ratio found for FRI was still significantly less than one (table 3), indicating that the FRI protein is constrained against amino acid changes at the between-species level. The pattern was quite different, however, at the within-species level. For exons 2 and 3, the synonymous diversity was about 10-fold higher than the nonsynonymous diversity (table 3), suggesting that purifying selection was also acting at the within-species level on this part of the gene. In contrast, the first exon in FRI showed a very high ratio of nonsynonymous to synonymous diversity (πa/πs = 5 over 954 bp, table 3). For the whole gene, the ratio of nonsynonymous to synonymous diversity was 0.76, a value higher than those previously found at other genes in A. thaliana. For ChiB (Kawabe and Miyashita 1999), PgiC (Kawabe, Yamane, and Miyashita 2000), CHI (Kuittinen and Aguadé 2000), FAH1, and F3H (Aguadé 2001), the πa/πs ratio varied between 0.05 (for ChiB and F3H) and 0.38 (for CHI). The coding region of FRI had a G+C content of 45.8% for exon 1 and 43% for exons 2 and 3. The effective number of codons (ENC) used in a gene, as defined by Wright (1990), ranges from 20 when only one codon is used for each amino acid to 61 when all synonymous codons are equally used. For the FRI gene, ENC was 58.3 in the first exon and 51.8 in the second and third exons. No codon bias that could cause an elevated ratio of nonsynonymous to synonymous diversity was thus present.
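The effective number of codons mentioned above can be computed along the following lines. This sketch follows the general form of Wright's (1990) statistic with the standard genetic code; the function name is an assumption, and for simplicity the function raises an error when a degeneracy class cannot be estimated instead of applying a correction for missing amino acids.

```python
from collections import Counter, defaultdict

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def effective_number_of_codons(cds):
    """Effective number of codons for an in-frame coding sequence."""
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    by_aa = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            by_aa[aa].append(counts.get(codon, 0))
    # Codon homozygosity F per amino acid, grouped by degeneracy class.
    f_by_class = defaultdict(list)
    for usage in by_aa.values():
        n = sum(usage)
        if len(usage) > 1 and n > 1:
            f = (n * sum((x / n) ** 2 for x in usage) - 1) / (n - 1)
            f_by_class[len(usage)].append(f)
    enc = 2.0  # Met and Trp each contribute exactly one codon
    # Number of amino acids in the 2-, 3-, 4- and 6-fold classes.
    for degeneracy, n_aa in ((2, 9), (3, 1), (4, 5), (6, 3)):
        fs = f_by_class[degeneracy]
        if not fs or sum(fs) <= 0:
            raise ValueError(f"cannot estimate F for {degeneracy}-fold class")
        enc += n_aa / (sum(fs) / len(fs))
    return min(enc, 61.0)
```

A sequence that uses a single codon for every amino acid gives the lower bound of 20, whereas values near 61, such as those reported for FRI, indicate little codon bias.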


Table 3 Polymorphism and Divergence at the Coding Region of FRI

 
Among the 19 nonsynonymous single-nucleotide polymorphisms present at low frequency within the A. thaliana ecotypes, and localized mostly in exon 1, only one (at nucleotide position 1010) was also present in the A. lyrata sequence. Thus, most of these low-frequency nonsynonymous polymorphisms have occurred after the separation between the two species and are not the remnants of some ancestral variant of the gene. For the whole gene as well as for exon 1, the McDonald and Kreitman test (McDonald and Kreitman 1991) indicated that the ratio of nonsynonymous to synonymous variation was significantly higher within A. thaliana than between A. thaliana and A. lyrata (table 4), which contradicted the neutral expectation and suggested that positive selection has been acting at the within-species level. In contrast, similar ratios of nonsynonymous to synonymous variation at the within- and between-species levels were observed for exons 2 and 3, indicating no significant departure from neutral expectation.
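The logic of the McDonald and Kreitman test is a comparison of two ratios in a 2 x 2 table (nonsynonymous versus synonymous, polymorphic within species versus fixed between species). The sketch below evaluates such a table with a two-sided Fisher exact test, used here as a stand-in because the text does not state which statistic was applied. The function name and the counts are hypothetical and are not the values of table 4.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


# Hypothetical counts: rows are nonsynonymous and synonymous changes,
# columns are polymorphic within species and fixed between species.
nonsyn_poly, nonsyn_fixed = 20, 15
syn_poly, syn_fixed = 5, 30
p_value = fisher_exact_two_sided(nonsyn_poly, nonsyn_fixed, syn_poly, syn_fixed)
```

An excess of nonsynonymous polymorphism relative to nonsynonymous fixed differences, as in these made-up counts, yields a small P value.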


Table 4 Summary of McDonald and Kreitman's Test

 
Intragenic Recombination
The four-gametes test detected a minimum of two recombination events (RM) within the FRI region, between nucleotide positions 398 and 736 and between nucleotide positions 736 and 1017. Given that the number of informative sites (nonsingletons) available for the test was only 11, this value for RM may underestimate the real number of recombination events that have occurred. Indeed, after scaling with the number of informative sites (as in Kuittinen and Aguadé 2000), RM was found to be greater for FRI than for six other genes previously studied in A. thaliana (table 5). The estimated recombination parameter 4Nc was 0.1094 per site, a value much greater than the estimate of 4Nµ. The effect of recombination relative to mutation on nucleotide diversity was thus found to be stronger in FRI than in any other gene previously studied in A. thaliana (table 5). This does not mean, however, that the recombination rate is much greater in FRI than in the other genes. The parameter 4Nc was estimated assuming an equilibrium-neutral model (Hudson 1987) that might be inappropriate, and the different values of the ratio 4Nc/4Nµ found among different genes could instead reflect differences in selection pressures (Kuittinen and Aguadé 2000).
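The four-gametes test can be sketched in a few lines of Python. The version below flags every pair of biallelic sites at which all four gametes are present and then counts the smallest number of breakpoints needed to account for all flagged intervals, which is one common way to obtain a minimum number of recombination events in the spirit of Hudson and Kaplan (1985). The function name and the haplotypes are hypothetical, and sites are assumed to be biallelic and free of missing data.

```python
from itertools import combinations


def four_gamete_rm(haplotypes):
    """Minimum number of recombination events implied by the four-gametes test."""
    n_sites = len(haplotypes[0])
    # Site pairs (i, j) showing all four two-site gametes.
    intervals = [
        (i, j)
        for i, j in combinations(range(n_sites), 2)
        if len({(h[i], h[j]) for h in haplotypes}) == 4
    ]
    # Greedy interval stabbing: process intervals by right end and open a new
    # breakpoint only when an interval starts at or after the last one used.
    intervals.sort(key=lambda ij: ij[1])
    rm, last_right = 0, -1
    for i, j in intervals:
        if i >= last_right:
            rm += 1
            last_right = j
    return rm


# Hypothetical haplotypes coded 0/1 at three informative sites.
haps = ["000", "011", "101", "110"]
rm = four_gamete_rm(haps)
```

In this toy example the two adjacent site pairs each require their own event, which mirrors the situation reported above, where the two detected intervals share nucleotide position 736.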


Table 5 Summary of Recombination at Different Genes in Arabidopsis thaliana

 
Naturally Occurring Variation in Earliness Among FRI Genotypes
Ecotypes with an interrupted FRI sequence had between 10.7 and 38.7 rosette leaves on average at flowering, whereas ecotypes with a complete FRI sequence had between 20 and 82.5 leaves (table 1). This difference in earliness was statistically significant (Wilcoxon rank-sum test, P = 0.0011). Ecotypes with a complete FRI sequence showed a much greater range of variation for earliness than did ecotypes with an interrupted FRI sequence. This suggests that at least some of the observed amino acid changes could modify the properties of the FRI protein. However, because the distributions of the two sets of ecotypes overlapped, it could not be excluded that some of the amino acid changes within the full-length sequences might also induce a loss of function.
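A rank-sum comparison of this kind can be illustrated with the short sketch below. It uses midranks for ties and a normal approximation without continuity or tie correction, so it will not reproduce an exact P value; the function name and the leaf numbers are hypothetical and are not the values of table 1.

```python
from math import erfc, sqrt


def rank_sum_test(x, y):
    """Wilcoxon rank-sum statistic for x and a two-sided normal-approximation P."""
    values = sorted(list(x) + list(y))
    # Assign midranks so that tied values share the average of their ranks.
    ranks = {}
    i = 0
    while i < len(values):
        j = i
        while j < len(values) and values[j] == values[i]:
            j += 1
        ranks[values[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    w = sum(ranks[v] for v in x)
    n1, n2 = len(x), len(y)
    mean = n1 * (n1 + n2 + 1) / 2
    sd = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    return w, erfc(abs(z) / sqrt(2))


# Hypothetical rosette leaf numbers for the two groups of ecotypes.
interrupted = [10.7, 12.5, 15.0, 18.2, 22.0, 38.7]
complete = [20.0, 35.5, 48.0, 56.2, 70.1, 82.5]
w, p = rank_sum_test(interrupted, complete)
```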

To examine the relationships between amino acid changes within FRI sequences and variation in flowering time, we mapped the flowering phenotype of each ecotype on a haplotype tree based on nonsynonymous polymorphisms only (fig. 5). The ecotype H51 had the consensus amino acid sequence, which from a comparison with A. lyrata was also found to be the ancestral sequence, and was used as a reference. Two ecotypes with nonfunctional sequences, LAC and VOU, could not be attributed to a unique position on the tree because of either homoplasy or recombination. They were thus discarded from the tree construction. Figure 5 showed that interrupted FRI sequences descended from a variety of intact FRI proteins. Among the 13 ecotypes with an intact FRI sequence, nine different FRI proteins that differed from each other by up to four amino acid changes were observed. There was some variation in the degree of earliness between ecotypes having identical FRI amino acid sequences (e.g., for ecotypes ALL1, CLA, and PON). This variation could be the result of either environmental effects or the effects of genetic polymorphisms elsewhere (see Discussion). Because the functional structural requirements of the FRI protein are unknown, it was impossible to predict which amino acid changes would modify its activity. Instead, we classified the observed changes according to Grantham's (1974) physicochemical distance: changes corresponding to a distance higher than 100 (the mean distance) were classified as nonconservative. Figure 5 showed that most branches of the gene tree that connected a late-flowering ecotype to an early-flowering ecotype were associated with either a loss-of-function mutation or a nonconservative amino acid change.



Fig. 5.—Haplotype tree based on nonsynonymous polymorphisms at the FRI gene. Loss-of-function mutations and the position and nature of amino acid changes are indicated on each branch connecting two ecotypes. Nonconservative amino acid changes (from Grantham 1974) are in bold. The earliness phenotype of each ecotype is classified as follows: EE, very early (number of leaves at flowering ≤ 25); E, early (number of leaves ≤ 40); L, late (number of leaves ≤ 70); LL, very late (number of leaves > 70). Ecotypes having an interrupted FRI sequence are underlined.
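The earliness classes defined in the caption of figure 5 amount to a simple threshold rule, sketched here for clarity. The function name is an assumption; the thresholds are those given in the caption.

```python
def earliness_class(leaves_at_flowering):
    """Map a mean number of rosette leaves at flowering to an earliness class."""
    if leaves_at_flowering <= 25:
        return "EE"  # very early
    if leaves_at_flowering <= 40:
        return "E"   # early
    if leaves_at_flowering <= 70:
        return "L"   # late
    return "LL"      # very late


# The extreme leaf numbers reported in the text for the two groups of ecotypes.
classes = [earliness_class(v) for v in (10.7, 38.7, 20, 82.5)]
```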

 

    Discussion
Overall Level and Pattern of DNA Variation at the FRI Locus in A. thaliana
The level of nucleotide diversity found in this study for the FRI gene was low compared with the other genes previously studied in A. thaliana. This could be because of the reduced geographical range from which the studied ecotypes were sampled (mainly France). However, previous analyses of the genetic diversity of large, worldwide collections of A. thaliana ecotypes, based either on microsatellites (Innan, Terauchi, and Miyashita 1997) or on amplified fragment length polymorphism (Miyashita, Kawabe, and Innan 1999), have shown that western European accessions are genetically scattered among ecotypes from other geographical origins and have similar levels of diversity. These studies have found no clear, worldwide geographical structuring of the genetic diversity in A. thaliana. We did not find any correlation between the geographical distances and the genetic similarity at the reduced geographical scale considered in our study. Thus, it seems that the genetic variability in A. thaliana is shaped more by its worldwide expansion and the postglacial colonization events or by selection pressures (or both) than by present patterns of migration and isolation by distance.

The significantly negative Tajima's D values and the bell-shaped mismatch distribution indicated the presence of an excess of low-frequency polymorphisms. Another noticeable result was the large number of haplotypes. Using simulations of a coalescent process for a large constant population size under the neutral infinite-site model with no recombination, we could show that the observed number of haplotypes (20) was significantly larger than expected under demographic equilibrium and neutrality (P = 0.001, expected mean number of haplotypes is 13.1). A rapid demographic expansion or a past selective sweep could both account for these results (Tajima 1989; Depaulis and Veuille 1998).
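A haplotype-number test of this type can be sketched with a small coalescent simulation. The code below builds neutral genealogies for a constant-size population without recombination, places a fixed number of mutations on branches in proportion to their length (infinite-site model), and counts the resulting haplotypes. It is an illustrative sketch and not the procedure implemented in DnaSP: the function names are assumptions, and conditioning on 36 segregating sites is a guess about the authors' settings, so the output is not expected to match the reported values exactly.

```python
import random


def haplotype_count(n, s, rng):
    """Number of distinct haplotypes in one simulated sample (n >= 2)."""
    # Each active lineage: (sampled sequences below it, time at which it arose).
    active = [(frozenset([i]), 0.0) for i in range(n)]
    branches = []  # (branch length, set of descendant sequences)
    t = 0.0
    while len(active) > 1:
        k = len(active)
        t += rng.expovariate(k * (k - 1) / 2)  # time to next coalescence
        a, b = rng.sample(range(k), 2)
        merged = active[a][0] | active[b][0]
        for idx in (a, b):
            desc, born = active[idx]
            branches.append((t - born, desc))
        active = [x for i, x in enumerate(active) if i not in (a, b)]
        active.append((merged, t))
    lengths = [length for length, _ in branches]
    haplotypes = [[] for _ in range(n)]
    for site in range(s):
        # Each mutation hits a new site on a branch chosen by length.
        _, desc = rng.choices(branches, weights=lengths)[0]
        for i in desc:
            haplotypes[i].append(site)
    return len({tuple(h) for h in haplotypes})


def haplotype_test(n, s, observed, replicates=1000, seed=1):
    """Return (P of at least `observed` haplotypes, simulated mean number)."""
    rng = random.Random(seed)
    sims = [haplotype_count(n, s, rng) for _ in range(replicates)]
    return sum(h >= observed for h in sims) / replicates, sum(sims) / replicates


# Sample size and haplotype number from the text; s = 36 is an assumption.
p_value, mean_haplotypes = haplotype_test(n=26, s=36, observed=20)
```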

Heterogeneity of Nucleotide Variation Along the Sequence of the FRI Gene
Demographic factors, such as a recent expansion, affect all genes and all regions of a gene equally. In contrast, selection directly affects the genetic diversity at a target site and modifies the genetic diversity at linked sites via hitchhiking effects. Selection is thus expected to result in heterogeneous patterns of genetic diversity among different genes and across the sequence of a given gene. In FRI, both the amount of nucleotide diversity at nonsynonymous sites that are potential targets of selection pressures and the amount of nucleotide diversity at silent sites varied greatly along the gene sequence (fig. 4). In the second and third exons of FRI there was a significantly lower rate of nonsynonymous than synonymous variation, which indicated that variation in this part of the gene has been driven mainly by purifying selection. A bias toward low-frequency polymorphisms was also observed in exons 2 and 3, as in other regions of the gene. This could reflect a recent demographic expansion of A. thaliana in western Europe, but a recent relaxation of purifying selection on the nonfunctional variants of the gene may also have contributed to this pattern. In contrast to exons 2 and 3, a large coding region corresponding to the first exon in FRI showed a reduced level of synonymous variation associated with an excess of nonsynonymous polymorphisms, including changes between amino acids with quite different physicochemical properties and stop codons. As there is evidence that FRI is a single-copy gene (Johanson et al. 2000), it is unlikely that these polymorphisms are actually neutral.

Mechanisms of Maintenance of Excess Nonsynonymous Mutations
A high degree of intraspecific amino acid polymorphism has already been found in other genes in A. thaliana, such as floral homeotic genes (Purugganan and Suddith 1998, 1999) and ChiA (Kawabe et al. 1997). On the basis of the highly selfing nature of A. thaliana and its distribution as small scattered populations, it was proposed that most nonsynonymous mutations would be only slightly deleterious and could, therefore, be maintained by genetic drift in small populations (Ohta 1992; Whitlock 2000). Another explanation for the excess of nonsynonymous polymorphisms in the first exon of FRI could be a recent relaxation of purifying selection on this gene region. In conjunction with a recent demographic expansion in western Europe, this might have led to a large number of amino acid replacements occurring at low frequency. Such a process has previously been invoked to explain the elevated πa/πs ratio in the melanocortin 1 receptor in human populations from Europe as compared with Africa (Harding et al. 2000).

Under both of these explanations, however, the πa/πs ratio is expected to be elevated but still lower than one. In the first exon of FRI, the πa/πs ratio is significantly higher than one, which strongly suggests that the excess amino acid polymorphisms are mostly adaptive and have been maintained by positive selection.
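
As a rough illustration of the quantity discussed here, the sketch below computes nucleotide diversity as the mean number of pairwise differences per site, separately for nonsynonymous and synonymous sites, and then takes their ratio. It assumes that the variable alignment columns have already been classified and that the number of sites in each class (possibly fractional, as in Nei and Gojobori-type counting) is supplied by the caller; the function names are hypothetical and no multiple-hit correction is applied.

```python
from itertools import combinations


def nucleotide_diversity(seqs, n_sites):
    """Mean number of pairwise differences per site.

    seqs: equal-length strings holding the variable columns of one site class.
    n_sites: total number of sites of that class (assumed known; may be fractional).
    """
    if len(seqs) < 2 or n_sites <= 0:
        raise ValueError("need at least two sequences and a positive site count")
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return total / (len(pairs) * n_sites)


def diversity_ratio(nonsyn_cols, syn_cols, n_nonsyn_sites, n_syn_sites):
    """Return (pi_a, pi_s, pi_a / pi_s); the ratio is infinite when pi_s is zero."""
    pi_a = nucleotide_diversity(nonsyn_cols, n_nonsyn_sites)
    pi_s = nucleotide_diversity(syn_cols, n_syn_sites)
    ratio = pi_a / pi_s if pi_s > 0 else float("inf")
    return pi_a, pi_s, ratio
```

A ratio above one from such a calculation is only suggestive on its own; whether it differs significantly from one still has to be tested.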

Effect of Local Positive Selection in a Subdivided Population
Two kinds of selection pressures are known to increase the level of variation in a gene: local selection and balancing selection (Nordborg, Charlesworth, and Charlesworth 1996; Charlesworth, Nordborg, and Charlesworth 1997). According to Charlesworth, Nordborg, and Charlesworth (1997), local selection would be difficult to distinguish from balancing selection in the absence of within-deme data because both kinds of selection would result in an excess of intermediate-frequency polymorphisms. Their conclusion was based on a model of local selection at a single locus in a population subdivided into two demes. However, more complex circumstances would lead to other patterns of genetic diversity. First, several different mutations in a gene can potentially change its functional properties and, conversely, the same phenotypic change may be achieved via different mutations in a given gene. Thus, a single selection pressure can potentially affect several sites in a gene. Second, highly selfing species such as A. thaliana show a high degree of population subdivision. Therefore, we may hypothesize that positive selection in a set of isolated populations may lead to the maintenance of a large number of low-frequency nonsynonymous changes, as observed for the FRI gene. The lack of linkage disequilibrium among the observed nonsynonymous polymorphisms in FRI (data not shown) indeed suggests that they have appeared independently. The fact that the loss of function has arisen from at least eight different mutations also supports this scenario.
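
For readers unfamiliar with how linkage disequilibrium between two polymorphic sites is usually quantified, here is a minimal sketch of the squared allelic correlation r² for biallelic sites scored in haploid (or fully inbred) accessions. It is offered only as an illustration under that assumption; the section does not state which statistic the authors used, and the function name is hypothetical.

```python
def r_squared(site_a, site_b):
    """Squared allelic correlation between two biallelic sites.

    site_a, site_b: sequences of 0/1 alleles, one entry per accession,
    in the same accession order.
    """
    if len(site_a) != len(site_b) or not site_a:
        raise ValueError("sites must be non-empty and of equal length")
    n = len(site_a)
    p_a = sum(site_a) / n
    p_b = sum(site_b) / n
    p_ab = sum(1 for a, b in zip(site_a, site_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    return d * d / denom
```

Values near zero across pairs of sites are what independent origins of the mutations would tend to produce, although low-frequency variants carry little power to detect association.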

A point that remains unclear, however, is why so few synonymous polymorphisms were observed in exon 1. Local selection is indeed known to increase differentiation not only at selected sites but also at linked neutral sites via hitchhiking effects (Charlesworth, Nordborg, and Charlesworth 1997). A likely explanation would be that local positive selection was relatively recent and was preceded by episodes of purifying selection that swept out the nucleotide diversity in this region of the gene (purifying selection probably also acted on exons 2 and 3, as discussed previously). A recent episode of selection would also explain why no accumulation of further nonsynonymous mutations was observed in interrupted sequences, despite the fact that the gene sequence should no longer be under selective constraint once a loss-of-function mutation has arisen.

High Level of Recombination at the FRI and Other Genes in A. thaliana
As noted previously by Kuittinen and Aguadé (2000), a striking feature of nucleotide sequence studies in A. thaliana is the considerable discrepancy found among different genes for the estimated recombination parameter 4Nc and its ratio to the estimated mutation parameter 4Nµ. This ratio can reach extremely large values, even though A. thaliana is known to display a very low outcrossing rate (Abbott and Gomez 1989). One explanation for this, as proposed by Kuittinen and Aguadé (2000), lies in the possible heterosis conferred by recombination between individuals carrying slightly deleterious mutations. In the light of our results, a possible role of local selection may also be suggested. First, local positive selection may increase the apparent number of recombination events simply by enhancing the overall level of nucleotide variation. Second, in inbreeding species, even rare recombination events are probably an efficient way to create adaptive variation. For example, a loss of function for the FRI gene, conferring earliness, can result from a mutation as well as from a recombination event involving any already nonfunctional FRI allele. Thus, local positive selection may have maintained more recombination events (effective recombination) than expected under neutrality.

Identifying the Targets of Selection for Flowering Time
FRI is a regulatory gene that has been shown to increase the expression level of FLC, a gene encoding a MADS-box protein that acts to inhibit flowering (Sheldon et al. 1999; Michaels and Amasino 2000). Two loss-of-function mutations in FRI (the indels at sites 263 and 1511) were previously shown to induce early flowering in natural ecotypes (Johanson et al. 2000). Our study confirmed these results and extended them to six other loss-of-function mutations. These mutations were clearly associated with a life cycle of reduced length.

The functional effects of the observed amino acid changes in FRI and their consequences for flowering time are much more hypothetical. Some early-flowering ecotypes had an intact FRI open reading frame, which suggests the following three possibilities. First, some of the observed amino acid replacements in the FRI gene could modify or even suppress the function of the FRI protein. Indeed, some replacements seemed associated with a transition from a late-flowering to an early-flowering phenotype (fig. 5). Second, mutations in the promoter region of FRI, which were not investigated here, could also lead to the nonexpression of FRI. Third, early flowering may be caused by the activity of other flowering time genes. Because FRI interacts synergistically with FLC, a loss-of-function mutation in FLC results in earliness, whatever the FRI allele (Michaels and Amasino 2000), and amino acid changes in FLC may also affect flowering time. Moreover, FLC is controlled by at least two other pathways, the vernalization pathway and the autonomous pathway, and sequence variation at the underlying genes may also affect flowering time. Clearly, the joint analysis of amino acid changes at several genes involved in flowering time is needed to better characterize the evolutionary importance of the amino acid changes identified in natural FRI alleles. Nevertheless, the previous analyses by Johanson et al. (2000) and our results clearly demonstrate that FRI is a major target of natural selection for flowering time in A. thaliana.

Phylogenetic reconstruction and the comparison of the levels of diversity between complete and interrupted sequences suggest that the loss-of-function mutations are more recent than most of the amino acid changes observed in FRI. Mutations resulting in stop codons are less frequent than the mutations causing amino acid changes. One can hypothesize that selection for earliness first favored amino acid changes until a stop codon or indel knocked out the gene.

A Putative Scenario for the Recent Evolution of Flowering Time in A. thaliana
In their study of 40 Arabidopsis ecotypes, Johanson et al. (2000) found a latitudinal gradient of flowering time across Europe. The majority of late-flowering ecotypes were from northern latitudes, whereas most of the early ecotypes were from central and eastern Europe. Thus, early flowering may be generally advantageous at the latitudes from which the ecotypes studied in the present article originated. In contrast, climatic conditions that prevailed in the refugia during glaciation were probably close to the present conditions in northern Europe. One could, therefore, hypothesize that late flowering was advantageous during the last glacial period in Europe and that purifying selection was then acting to maintain the functionality of the FRI protein. According to this scenario, selection for greater precocity occurred after the postglacial recolonization of Europe, when new conditions were encountered.

The presence of late-flowering phenotypes among our studied ecotypes can be explained in different ways. First, they may reflect adaptation to locally varying environmental conditions. Second, if selection for earliness is only recent, populations with nonoptimal phenotypes may still be present. Third, as human-induced dispersal is known to be a major factor in the recent spread of Arabidopsis, some of our studied ecotypes may be long-distance migrants.

It also cannot be ruled out that some early-flowering ecotypes may predate the postglacial expansion. Related haplotypes may have ended up at different locations during the postglacial recolonization, producing the lack of geographical structure we observed. Whether phenotypic variation of flowering time in A. thaliana actually reflects the adaptation to present local conditions remains an open question that a better ecological characterization of Arabidopsis natural populations would help to answer.


    Acknowledgements
We are grateful to Helmi Kuittinen and Outi Savolainen for the A. lyrata sequence. Special thanks are given to Deborah Charlesworth for her helpful discussion and comments on an earlier version of this manuscript. We are indebted to John McDonald for performing the heterogeneity tests on his software DNA slider. We also thank Patrick Tranel from the University of Illinois Champaign for the sequencing facilities, Christophe Délye for sharing his expertise in DNA sequencing, and Marie-Laure Biard and Jean-Michel Rémond for their technical assistance. This study was supported in part by a grant from the program "Coopération UIC-INRA" to F.R. and from the Bureau des Ressources Génétiques as well as the Région Bourgogne.


    Footnotes
 
Diethard Tautz, Reviewing Editor

Keywords: nucleotide diversity, selection, flowering time, Arabidopsis thaliana

Address for correspondence and reprints: Valérie Le Corre, Laboratoire Malherbologie et Agronomie, INRA, BP 86510, 21065 Dijon Cedex, France. lecorre@dijon.inra.fr


    References

    Abbott R. J., M. F. Gomez. 1989. Population genetic structure and outcrossing rate in Arabidopsis thaliana (L.) Heynh. Heredity 62:411-418.

    Aguadé M. 2001. Nucleotide sequence variation at two genes of the phenylpropanoid pathway, the FAH1 and F3H genes, in Arabidopsis thaliana. Mol. Biol. Evol. 18:1-9.

    Caicedo A. L., B. A. Schaal, B. N. Kunkel. 1999. Diversity and molecular evolution of the RPS2 resistance gene in Arabidopsis thaliana. Proc. Natl. Acad. Sci. USA 96:302-306.

    Charlesworth B., M. Nordborg, D. Charlesworth. 1997. The effects of local selection, balanced polymorphism and background selection on equilibrium patterns of genetic diversity in subdivided populations. Genet. Res. 70:155-174.

    Corpet F. 1988. Multiple sequence alignment with hierarchical clustering. Nucleic Acids Res. 16:10881-10890.

    Depaulis F., M. Veuille. 1998. Neutrality tests based on the distribution of haplotypes under an infinite-site model. Mol. Biol. Evol. 15:1788-1790.

    Doyle J. J., J. L. Doyle. 1987. Isolation of DNA from fresh plant tissue. Focus 12:13-15.

    Goldman N., Z. Yang. 1994. A codon based model of nucleotide substitution for protein-coding DNA sequences. Mol. Biol. Evol. 11:725-736.

    Goss P. J. E., R. C. Lewontin. 1996. Detecting heterogeneity of substitution along DNA and protein sequences. Genetics 143:589-602.

    Grantham R. 1974. Amino acid difference formula to help explain protein evolution. Science 185:862-864.

    Hanfstingl U., A. Berry, E. A. Kellogg, J. T. Costa III, W. Rudige, R. M. Ausubel. 1994. Haplotypic divergence coupled with lack of diversity at the Arabidopsis thaliana alcohol dehydrogenase locus: roles for both balancing and directional selection? Genetics 138:811-828.

    Harding R. M., E. Healy, A. J. Ray, et al. (11 co-authors). 2000. Evidence for variable selection pressures at MC1R. Am. J. Hum. Genet. 66:1351-1361.

    Harpending H. 1994. Signature of ancient population growth in a low resolution mitochondrial DNA mismatch distribution. Hum. Biol. 66:591-600.

    Hudson R. R. 1987. Estimating the recombination parameter of a finite population model without selection. Genet. Res. 50:245-250.

    Hudson R. R., N. L. Kaplan. 1985. Statistical properties of the number of recombination events in the history of a sample of DNA sequences. Genetics 111:147-164.

    Innan H., F. Tajima, R. Terauchi, N. T. Miyashita. 1996. Intragenic recombination in the Adh locus of the wild plant Arabidopsis thaliana. Genetics 143:1761-1770.

    Innan H., R. Terauchi, N. T. Miyashita. 1997. Microsatellite polymorphism in natural populations of the wild plant Arabidopsis thaliana. Genetics 146:1441-1452.

    Johanson U., J. West, C. Lister, M. Scott, R. Amasino, C. Dean. 2000. Molecular analysis of FRI, a major determinant of natural variation in Arabidopsis flowering time. Science 290:344-347.

    Jukes T. H., C. R. Cantor. 1969. Evolution of protein molecules. Pp. 21–32 in H. Munro, ed. Mammalian protein metabolism. Academic Press, New York.

    Kawabe A., H. Innan, R. Terauchi, N. T. Miyashita. 1997. Nucleotide polymorphism in the acidic chitinase locus (ChiA) region of the wild plant Arabidopsis thaliana. Mol. Biol. Evol. 14:1303-1315.

    Kawabe A., N. T. Miyashita. 1999. DNA variation in the basic chitinase locus (ChiB) region of the wild plant Arabidopsis thaliana. Genetics 153:1445-1453.

    Kawabe A., K. Yamane, N. T. Miyashita. 2000. DNA polymorphism at the cytosolic phosphoglucose isomerase (PgiC) locus of the wild plant Arabidopsis thaliana. Genetics 156:1339-1347.

    Kuittinen H., M. Aguadé. 2000. Nucleotide variation at the Chalcone isomerase locus in Arabidopsis thaliana. Genetics 155:863-872.

    Kumar S., K. Tamura, I. B. Jakobsen, M. Nei. 2001. MEGA2: molecular evolutionary genetics analysis software. Bioinformatics 17:1244-1245.

    McDonald J. H. 1996. Detecting non-neutral heterogeneity across a region of DNA sequence in the ratio of polymorphism to divergence. Mol. Biol. Evol. 13:253-260.

    ———. 1998. Improved tests for heterogeneity across a region of DNA sequence in the ratio of polymorphism to divergence. Mol. Biol. Evol. 15:377-384.

    McDonald J. H., M. Kreitman. 1991. Adaptive protein evolution at the Adh locus in Drosophila. Nature 351:652-654.

    Michaels S. D., R. M. Amasino. 2000. Memories of winter: vernalization and the competence to flower. Plant Cell Environ. 23:1145-1153.

    Miyashita N. T., A. Kawabe, H. Innan. 1998. Intra- and interspecific DNA variation and codon bias of the alcohol dehydrogenase (Adh) locus region in Arabis and Arabidopsis species. Mol. Biol. Evol. 13:433-436.

    ———. 1999. DNA variation in the wild plant Arabidopsis thaliana revealed by amplified fragment length polymorphism analysis. Genetics 152:1723-1731.

    Nei M. 1987. Molecular evolutionary genetics. Columbia University Press, New York.

    Nei M., T. Gojobori. 1986. Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol. 3:418-426.

    Nordborg M., B. Charlesworth, D. Charlesworth. 1996. Increased levels of polymorphism surrounding selectively maintained sites in highly selfing species. Proc. R. Soc. Lond. B 263:1033-1039.

    Ohta T. 1992. The nearly neutral theory of molecular evolution. Annu. Rev. Ecol. Syst. 23:263-286.

    Pigliucci M. 1998. Ecological and evolutionary genetics of Arabidopsis. TREE 3:485-489.

    Purugganan M. D., J. I. Suddith. 1998. Molecular population genetics of the Arabidopsis CAULIFLOWER regulatory gene: nonneutral evolution and naturally occurring variation in floral homeotic function. Proc. Natl. Acad. Sci. USA 95:8130-8134.

    ———. 1999. Molecular population genetics of floral homeotic loci: departure from the equilibrium-neutral model at the APETALA3 and PISTILLATA genes of Arabidopsis thaliana. Genetics 151:839-848.

    Rogers A. R., H. Harpending. 1992. Population growth makes waves in the distribution of pairwise genetic differences. Mol. Biol. Evol. 9:552-569.

    Rozas J., R. Rozas. 1999. DnaSP version 3: an integrated program for molecular population genetics and molecular population analysis. Bioinformatics 15:174-175.

    Sharbel T., H. Bernhard, T. Mitchell-Olds. 2001. Genetic isolation by distance in Arabidopsis thaliana: biogeography and postglacial colonization of Europe. Mol. Ecol. 9:2109-2118.

    Sheldon C. C., J. E. Burn, P. P. Perez, J. Metzger, J. A. Edwards, W. J. Peacock, E. S. Dennis. 1999. The FLF MADS box gene. A repressor of flowering in Arabidopsis regulated by vernalization and methylation. Plant Cell 11:445-458.

    Simonsen K. L., G. A. Churchill, C. F. Aquadro. 1995. Properties of statistical tests of neutrality for DNA polymorphism data. Genetics 141:413-429.

    Stahl E. A., G. Dwyer, R. Mauricio, M. Kreitman, J. Bergelson. 1999. Dynamics of disease resistance polymorphism at the Rpm1 locus of Arabidopsis. Nature 400:667-671.

    Tajima F. 1989. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123:585-595.

    Watterson G. A. 1975. On the number of segregating sites in genetical models without recombination. Theor. Popul. Biol. 7:256-276.

    Whitlock M. C. 2000. Fixation of new alleles and the extinction of small populations: drift load, beneficial alleles, and sexual selection. Evolution 54:1855-1861.

    Wright F. 1990. The "effective number of codons" used in a gene. Gene 87:23-29.

Accepted for publication March 5, 2002.