Departament de Genètica, Facultat de Biologia, Universitat de Barcelona, Spain
Introduction
DNA sequences contain valuable information about the evolutionary history of the particular region of the genome and/or species studied. According to the neutral theory of molecular evolution (Kimura 1983), the levels of polymorphism within species and divergence between species at different genomic regions should be positively correlated if populations are in mutation-drift equilibrium. The neutral theory also makes definite predictions about the frequency spectrum of variants segregating in natural populations (Watterson 1974). Deviations from equilibrium due to demographic events such as population bottlenecks or expansions will affect the level and pattern of polymorphism in all regions of the genome. On the other hand, directional or balancing selection in a particular region leaves a characteristic footprint on that region's pattern and level of neutral variation. Therefore, unlike demographic events, selection affects only specific parts of the genome. Comparison of the level and distribution of nucleotide variation within and between species has been successfully used, for example in Drosophila, to establish the role of natural selection, as opposed to genetic drift, in shaping molecular variation within species and thus in determining molecular evolution in particular regions of the genome (Hudson, Kreitman, and Aguadé 1987; Tajima 1989; McDonald and Kreitman 1991; Fu and Li 1993; McDonald 1996, 1998).
Variation at one of the genes of the phenylpropanoid pathway, the CHI, or chalcone isomerase, gene, was previously studied in different ecotypes of the self-fertilizing species Arabidopsis thaliana and in its close relative, the outcrossing species Arabidopsis lyrata ssp. petraea (Kuittinen and Aguadé 2000). The CHI gene encodes the second enzyme in the pathway leading to the synthesis of flavonoids and anthocyanins. The frequency spectrum of nucleotide variants in the CHI gene region was skewed toward an excess of low-frequency polymorphisms, which was consistent with the suggested recent expansion of the species (Price, Palmer, and Al-Shehbaz 1994). Variation in this region did not suggest the action of natural selection. Different parts of a metabolic pathway may differ in their responsiveness to selection. Also, the relative importance of sinapoate esters and flavonoids as UV-protectants is not well established. Nucleotide variation at one gene in each of these pathways (the FAH1 and F3H genes) has been surveyed by sequencing these regions for a similar sample of 20 ecotypes of A. thaliana and for one individual of the closely related species A. lyrata ssp. petraea.
Materials and Methods
Sequence Analysis
Sequences were assembled and aligned with the SeqEd program (Perkin-Elmer), which was also used to check all variable sites. The sequences were edited for further analyses using the program MacClade, version 3.0.6 (Maddison and Maddison 1992). The program DnaSP, version 3.14 (Rozas and Rozas 1999), was used for most intraspecific and some interspecific analyses. Neighbor-joining trees (Saitou and Nei 1987) were built with the TreeCon program (Van de Peer and de Wachter 1994) using genetic distances corrected for multiple hits (Jukes and Cantor 1969).
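As an illustration of the multiple-hit correction mentioned above, the following minimal sketch computes the Jukes-Cantor distance from the observed proportion of differing sites. It is not TreeCon's code: the function names are hypothetical, and skipping positions that carry a gap or ambiguous base is an assumption made here for simplicity.

```python
import math

def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Positions where either sequence has a character other than A, C, G, T
    (gaps, N) are skipped; this handling is an assumption of the sketch.
    """
    valid = "ACGT"
    compared = 0
    differences = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x in valid and y in valid:
            compared += 1
            differences += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared

def jukes_cantor(p):
    """Jukes-Cantor corrected distance d = -(3/4) ln(1 - 4p/3)."""
    if p >= 0.75:
        raise ValueError("p >= 0.75: the corrected distance is undefined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

The corrected distance is always at least as large as the raw proportion, and the two are nearly identical at the low divergence levels typical of within-species comparisons.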
Nucleotide variation was estimated as nucleotide diversity (π; Nei 1987). The genetic distance between major haplotypes (see below) was estimated as the average number of pairwise differences between haplotypes (k) and as the average per site number of nucleotide differences between haplotypes (Dxy; Nei 1987). The four-gamete test (Hudson and Kaplan 1985) was used to infer the minimum number of recombination events in the history of each sample. The recombination parameter C = 4Nc was estimated by the method of Hudson (1987), which is based on the variance in the number of pairwise differences. Linkage disequilibrium or gametic association was measured as the correlation coefficient R² (Hill and Robertson 1968) using only informative sites (for which the rarest variant is present more than once); the statistical significance of the associations was established by the χ² test. The ZnS statistic (Kelly 1998) was also estimated, and its statistical significance was established by computer simulations based on the coalescent process without recombination (Hudson 1990) and with recombination.
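The diversity and distance measures defined above can be sketched in a few lines. This is an illustrative implementation, not the DnaSP code: the function names are hypothetical, sequences are assumed to be aligned strings of equal length, and every character mismatch is counted as a difference (no special treatment of gaps).

```python
from itertools import combinations

def pairwise_differences(seq_a, seq_b):
    """Number of aligned positions at which two sequences differ."""
    return sum(1 for x, y in zip(seq_a, seq_b) if x != y)

def nucleotide_diversity(seqs):
    """pi: average number of pairwise differences per site within a sample."""
    pairs = list(combinations(seqs, 2))
    mean_diff = sum(pairwise_differences(a, b) for a, b in pairs) / len(pairs)
    return mean_diff / len(seqs[0])

def between_group_distance(group_1, group_2):
    """Return (k, Dxy) between two groups of sequences.

    k is the average number of differences over all between-group pairs,
    and Dxy is the same quantity divided by the sequence length.
    """
    total = sum(pairwise_differences(a, b) for a in group_1 for b in group_2)
    k = total / (len(group_1) * len(group_2))
    return k, k / len(group_1[0])
```

When a sample consists of two strongly differentiated haplotype classes, most of π comes from between-class pairs, so Dxy between the classes will be much larger than π computed within either class.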
Coalescent Simulations
Computer simulations were used to test whether the presence of two subsets of highly differentiated sequences in a sample of size n is consistent with the equilibrium neutral model (J. Rozas, personal communication). The test statistic used is based on the number of mutations fixed between the two subsets given the total number of segregating sites (S) in the sample. Here, these subsets refer to a partition of the genealogical tree where the two lineages descending from the root have a and n - a sequences (a, n - a partition). Processes like balancing selection, population subdivision, or population decline should generate a larger number of fixed differences than those expected under neutrality. Population expansion should, on the other hand, generate a smaller number of fixed differences than expected under stationarity.
The empirical distribution of the number of fixed differences between two subsets of sequences was obtained by computer simulation (1,000 replicates) based on the coalescent process with no recombination (Kingman 1982a, 1982b; Hudson 1990). The genealogical samples were generated conditioned on n, S and a particular partition (a, n - a) of sequences. Random genealogies were generated using conventional procedures, but samples with partitions other than (a, n - a) were discarded.
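The procedure described above can be sketched as a rejection sampler. The code below is one possible reading, not the program actually used: it assumes that conditioning on S is done by dropping exactly S mutations on each accepted tree with probability proportional to branch length, and that fixed differences are the mutations falling on the two branches that descend from the root. All function and parameter names are hypothetical.

```python
import random

def simulate_genealogy(n, rng):
    """Simulate one neutral coalescent tree for n >= 2 sequences.

    Returns (root_partition, root_branch_length, total_tree_length), where
    root_partition is the sorted pair of sample sizes on each side of the root.
    """
    # Each lineage is (number of sampled sequences below it, time it arose).
    lineages = [(1, 0.0)] * n
    now = 0.0
    total_length = 0.0
    while len(lineages) > 2:
        k = len(lineages)
        wait = rng.expovariate(k * (k - 1) / 2.0)  # time while k lineages exist
        now += wait
        total_length += k * wait
        i, j = rng.sample(range(k), 2)  # the pair that coalesces
        merged = (lineages[i][0] + lineages[j][0], now)
        lineages = [x for idx, x in enumerate(lineages) if idx not in (i, j)]
        lineages.append(merged)
    wait = rng.expovariate(1.0)  # last two lineages coalesce at the root
    now += wait
    total_length += 2 * wait
    (size_1, born_1), (size_2, born_2) = lineages
    root_length = (now - born_1) + (now - born_2)
    return tuple(sorted((size_1, size_2))), root_length, total_length

def fixed_difference_pvalue(n, s, a, observed, replicates=1000, seed=None):
    """P(fixed differences >= observed) given n, S = s and partition (a, n - a)."""
    if not 1 <= a <= n - 1:
        raise ValueError("a must lie between 1 and n - 1")
    rng = random.Random(seed)
    target = tuple(sorted((a, n - a)))
    accepted = 0
    hits = 0
    while accepted < replicates:
        partition, root_length, total_length = simulate_genealogy(n, rng)
        if partition != target:
            continue  # discard genealogies with a different root partition
        accepted += 1
        share = root_length / total_length
        # Place s mutations on the tree; count those on the root branches.
        fixed = sum(rng.random() < share for _ in range(s))
        hits += fixed >= observed
    return hits / replicates
```

A call such as `fixed_difference_pvalue(20, 12, 6, observed=8)` mirrors the kind of question asked in the text, but the value returned by this sketch depends on the assumptions listed above and on the random seed, so it should not be expected to reproduce the published probabilities exactly.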
Results
Two recombination events were detected in the evolutionary history of FAH1 by the four-gamete test (Hudson and Kaplan 1985). Both events occurred in the rightmost part of the region studied (between polymorphic sites 947 and 1478 and between sites 1478 and 1946, respectively). Although this test detected only one recombination event in the history of the F3H sample (between sites 57 and 204), the partition of the ecotypes for the different variants in this region indicates that ecotypes ME-0, RSCH-0, CHA-0, and RUB-1 are recombinants. The recombination parameter, C, estimated from the sampled sequences was 3.2 for FAH1 and 1.1 for F3H.
The proportion of pairwise comparisons with a significant association between variants, i.e., sites in linkage disequilibrium, was high for both FAH1 and F3H (table 3). These percentages remained above 30% after Bonferroni correction for multiple comparisons. Values of the ZnS statistic (Kelly 1998) were high: 0.383 and 0.501 for FAH1 and F3H, respectively. The probabilities of obtaining values higher than those observed were low but were not significant assuming no recombination (0.154 and 0.053, respectively). When the C value estimated for each gene was used, the probabilities were 0.047 and 0.031, respectively. The overall level of significant disequilibrium in the two regions therefore probably cannot be explained by mutation-drift equilibrium.
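The quantities reported above can be illustrated with a short sketch. It computes the squared correlation between pairs of biallelic sites, averages it over all pairs to obtain ZnS, and counts the fraction of informative pairs that are significant by a one-degree-of-freedom χ² test (χ² = n·r²). The function names are hypothetical, no Bonferroni correction is applied, and the simulation-based significance test for ZnS is not reproduced here.

```python
import math
from itertools import combinations

def biallelic_columns(haplotypes, min_minor_count=1):
    """Columns of sites with exactly two alleles and a minimum minor-allele count."""
    columns = []
    for s in range(len(haplotypes[0])):
        column = [h[s] for h in haplotypes]
        counts = sorted(column.count(allele) for allele in set(column))
        if len(counts) == 2 and counts[0] >= min_minor_count:
            columns.append(column)
    return columns

def r_squared(col_i, col_j):
    """Squared correlation between alleles at two biallelic sites."""
    n = len(col_i)
    a, b = col_i[0], col_j[0]  # arbitrary reference alleles
    p_a = sum(x == a for x in col_i) / n
    p_b = sum(y == b for y in col_j) / n
    p_ab = sum(x == a and y == b for x, y in zip(col_i, col_j)) / n
    d = p_ab - p_a * p_b
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))

def zns(haplotypes):
    """Average r^2 over all pairs of segregating (biallelic) sites."""
    values = [r_squared(x, y) for x, y in combinations(biallelic_columns(haplotypes), 2)]
    if not values:
        raise ValueError("at least two biallelic sites are required")
    return sum(values) / len(values)

def fraction_significant(haplotypes, alpha=0.05):
    """Fraction of informative site pairs with a significant chi-square test."""
    n = len(haplotypes)
    columns = biallelic_columns(haplotypes, min_minor_count=2)
    values = [r_squared(x, y) for x, y in combinations(columns, 2)]
    if not values:
        raise ValueError("at least two informative sites are required")
    # Survival function of chi-square with 1 df: erfc(sqrt(x / 2)).
    significant = sum(math.erfc(math.sqrt(n * r2 / 2.0)) < alpha for r2 in values)
    return significant / len(values)
```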
As a result of recombination, different parts of a given sequence have different evolutionary histories. Nevertheless, the presence of two highly differentiated haplotypes in both genes studied was clearly detectable in the corresponding neighbor-joining trees of the 20 ecotypes when the sequence of A. lyrata ssp. petraea was used as the outgroup (results not shown). For the FAH1 region, where recombination was detected in the less variable region, both clusters of ecotypes were supported by high bootstrap values (86% and 100%). On the other hand, for the F3H region, only the clade corresponding to the F3H.1 haplotype showed a high bootstrap value (95%).
Amino Acid Replacement Polymorphism and Divergence
Three replacement polymorphisms, resulting in three amino acid haplotypes (fig. 1), were detected in the N-terminal half of the deduced F5H protein; none of these polymorphisms involved any charge change. At the corresponding residues, A. lyrata presented an alanine, a glycine, and a threonine, respectively, as in FAH1.2. Five amino acid polymorphisms were detected in the F3H protein (fig. 2); these polymorphisms resulted in five amino acid haplotypes, four of which differed in the total charge of the corresponding protein. At each of these residues, A. lyrata presented the most common amino acid in the A. thaliana ecotypes except at the second residue, where A. lyrata had an aspartic acid like ecotypes MR-0, Col-2, and Can-0.
Under neutrality, the ratio between nonsynonymous and synonymous changes should be the same within and between species. The McDonald and Kreitman (1991), or MK, test examines this prediction using a 2 × 2 contingency table where the observed differences are classified as synonymous or nonsynonymous and also as polymorphic within species or fixed between species. No significant deviation from neutrality was detected in either the FAH1 or the F3H coding regions (results not shown).
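The 2 × 2 table underlying the MK test can be evaluated with a two-sided Fisher's exact test, sketched below with the standard library only. The text does not say which test of independence was applied, so the choice of Fisher's exact test is an assumption, and the function names and the counts in the usage note are hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row_1, row_2, col_1 = a + b, c + d, a + c
    total = row_1 + row_2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row_1, x) * comb(row_2, col_1 - x) / comb(total, col_1)

    low = max(0, col_1 - row_2)
    high = min(row_1, col_1)
    p_observed = prob(a)
    # Sum the probabilities of all tables no more likely than the observed one.
    p_value = sum(prob(x) for x in range(low, high + 1)
                  if prob(x) <= p_observed * (1 + 1e-9))
    return min(1.0, p_value)

def mk_test(fixed_nonsyn, fixed_syn, poly_nonsyn, poly_syn):
    """MK test: are nonsynonymous/synonymous ratios equal within and between species?"""
    return fisher_exact_two_sided(fixed_nonsyn, fixed_syn, poly_nonsyn, poly_syn)
```

For example, `mk_test(4, 20, 3, 15)` (hypothetical counts with identical ratios in both rows) returns a p-value of 1, whereas a strong excess of fixed nonsynonymous changes would drive the p-value toward zero.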
Silent Nucleotide Polymorphism and Divergence
Silent divergence in the FAH1 and F3H genes was estimated both for the different functional regions (table 2) and by sliding a window of 100 silent sites across each of the regions studied (fig. 4). In both genes, silent divergence was more homogeneously distributed than silent polymorphism. In the FAH1 gene, there was a peak of silent nucleotide diversity that, like the peak of between-haplotype silent diversity (measured as Dxy; fig. 3), was centered at the beginning of the second exon and, more specifically, at the nonsynonymous polymorphism at site 645. In the F3H gene, the estimated silent nucleotide diversity was highest at the beginning of the region studied (fig. 4).
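A sliding-window profile of the kind described above can be sketched as follows. The window is defined by a fixed number of silent sites, so its physical span varies along the sequence. The step size of 25 sites is an assumption (the text gives only the window width), and the function and parameter names are hypothetical.

```python
def sliding_window(values, is_silent, width=100, step=25):
    """Average a per-site statistic over windows of `width` silent sites.

    values[i] is the statistic at alignment position i (for example, the
    per-site contribution to diversity or divergence) and is_silent[i] marks
    whether position i counts as a silent site. Returns a list of
    (window midpoint in alignment coordinates, window mean).
    """
    silent_positions = [i for i, flag in enumerate(is_silent) if flag]
    windows = []
    for start in range(0, len(silent_positions) - width + 1, step):
        chunk = silent_positions[start:start + width]
        mean = sum(values[i] for i in chunk) / width
        midpoint = (chunk[0] + chunk[-1]) / 2
        windows.append((midpoint, mean))
    return windows
```

Running the same function on per-site polymorphism and per-site divergence gives two profiles that can be compared directly, which is how a local peak of polymorphism against a flat divergence profile becomes visible.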
The different tests of heterogeneity in the ratio of silent polymorphism to divergence across a given DNA region developed by McDonald (1996, 1998), which do not require any a priori partition of the region studied, were also applied to the FAH1 and F3H regions. The mean sliding G test revealed some heterogeneity in the ratio of polymorphism to divergence both for FAH1 (with probabilities ranging between 0.057 and 0.085 for the different recombination values used) and for F3H (with probabilities ranging between 0.040 and 0.057). In the F3H gene, the Kolmogorov-Smirnov test also revealed some possible heterogeneity (with probabilities ranging between 0.045 and 0.067).
Discussion
The genomewide presence of dimorphism can, however, be questioned because two clear patterns of variation emerge from the genes thus far surveyed. In the Adh (Innan et al. 1996), ChiA (Kawabe et al. 1997), ChiB (Kawabe and Miyashita 1999), and Rpm1 (Stahl et al. 1999) regions, variation is structured into two highly differentiated haplotypes, and most nucleotide diversity can be attributed to the between-haplotype differences, with relatively little variation present within haplotypes. On the other hand, in the CAL, PI, AP3 (Purugganan and Suddith 1998, 1999), and CHI (Kuittinen and Aguadé 2000) gene regions, there is no clear evidence for two major haplotypes. Variation in the two genes studied here conforms to the first pattern.
The presence of recombinants between the two highly differentiated haplotypes in some genomic regions (Adh, ChiA, ChiB, Rpm1, FAH1, F3H) requires that the two haplotypes of each region segregated in the same population at some point in the past. Most probably, they were present in the ancestral A. thaliana populations of Central Asia prior to the suggested worldwide expansion of the species (Price, Palmer, and Al-Shehbaz 1994). The general lack of association between the distribution of the different haplotypes and their geographical origin would also favor the view that recombination had occurred before the species expanded its range. The contrasting patterns of variation in different gene regions seem to preclude admixture of two highly differentiated populations prior to the expansion. Also, a genomewide amplified fragment length polymorphism analysis of variation in A. thaliana using 38 ecotypes sampled worldwide (Miyashita, Kawabe, and Innan 1999) gives no support to the admixture hypothesis. However, neither of these observations constitutes clear evidence against population structure in the ancestral population.
Level of Nucleotide Variation in A. thaliana and Demography
Despite the presence of two highly differentiated haplotypes in both FAH1 and F3H, silent nucleotide diversity was lower in the former gene (tables 2 and 4). This is concordant with the observed shorter extent of the region showing two major haplotypes in FAH1 (figs. 1 and 2). Except for this gene, silent nucleotide diversity was generally higher in those regions with two major haplotypes (Adh, ChiA, ChiB, FAH1, and F3H) than in the other regions (table 4). In fact, the average silent nucleotide diversity in the former regions was 0.0126, while the average value for CAL, AP3, PI, and CHI was 0.0067.
Recent demographic events, such as a population expansion, should affect the pattern of nucleotide variation in all parts of the genome. In regions not subject to balancing selection, the frequency spectrum of variants would be expected to be skewed toward an excess of polymorphisms with rare variants. The four genes for which there was no clear evidence for dimorphism (CAL, AP3, PI, and CHI) presented an excess of singletons. This excess resulted in a negative value of Tajima's D statistic, which was significant only for the first three genes (Purugganan and Suddith 1998, 1999; Kuittinen and Aguadé 2000). This observation has been considered to support a recent increase in the population size of the species.
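The link between an excess of singletons and a negative Tajima's D can be made concrete with the standard formula from Tajima (1989). The sketch below takes the sample size n, the number of segregating sites S, and the average number of pairwise differences k; the function name is hypothetical, and significance testing is not included.

```python
import math

def tajimas_d(n, s, k):
    """Tajima's D from sample size n, segregating sites s, and mean pairwise differences k."""
    if n < 3 or s < 1:
        raise ValueError("need n >= 3 sequences and at least one segregating site")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)
    # Difference between the two estimators of theta, scaled by its standard deviation.
    return (k - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))
```

If every segregating site is a singleton, each site contributes only 2/n to k, so k = 2S/n falls well below S/a1 and D is negative; this is the signature described above for CAL, AP3, PI, and CHI.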
The two highly differentiated haplotypes present in some regions (Adh, ChiA, ChiB, Rpm1, FAH1, and F3H) clearly predate the worldwide expansion of the species (see above). Even if in each gene region the two haplotypes were maintained by balancing selection, variation within each haplotype should also reflect the expansion. In all of these regions, the level of within-haplotype diversity was low compared with between-haplotype diversity, and the low number of within-haplotype polymorphisms in most regions probably precludes the detection of any footprint of the expansion in those regions. This result might otherwise question the suggested expansion of the species.
Dimorphism in the FAH1 and F3H Genes
The two divergent haplotypes present in the Adh region of A. thaliana suggested the presence of a balanced polymorphism in the fourth exon of this gene, perhaps associated with allozyme variation (Hanfstingl et al. 1994; Innan et al. 1996). In the Rpm1 gene region, the divergent haplotypes are associated with a phenotypic difference: susceptibility or resistance to a pathogen (Stahl et al. 1999). The significant excess of silent polymorphism detected at the Rpm1 "junction" region was attributed to the action of balancing selection. The similar pattern of variation observed in the other regions with clear dimorphism also suggests balancing selection. However, a dimorphic pattern of variation could also conform to the expectations of a neutral process in an essentially selfing species like A. thaliana. In this species, the reported level of outcrossing is less than 1% (Abbott and Gomes 1989), and plants will be mostly homozygous. The scarcity of heterozygous individuals will cause recombination to be effectively very rare.
In a constant-size neutral coalescent process with no recombination (as reviewed in Hudson 1990), the time between two coalescence events is approximately exponentially distributed. Accordingly, the expected time required for the coalescence from n to two sequences is nearly equal to the time required for these two sequences to coalesce to a single sequence or the most recent common ancestor (MRCA) of all sequences. Thus, the branch separating the two sets of sequences on each side of the root might accumulate a high number of mutations. Computer simulation of the coalescent process with no recombination (see Materials and Methods) was used to test whether the number of silent differences fixed between the two divergent haplotypes in each of the FAH1 and F3H genes was consistent with the neutral process. For the FAH1 gene, only the 5' half of the region studied (where all fixed differences between haplotypes were located; see Results) was analyzed. For 12 polymorphisms and a partition of 14 and 6 sequences (fig. 1), the probability of having a number of fixed differences equal to or higher than the observed eight differences was 0.084. This probability was 0.061 when the 12 silent nucleotide polymorphisms and one indel were considered, and 0.11 when only 11 nucleotide polymorphisms were considered (those remaining after exclusion of the nucleotide polymorphism associated with the complex mutational event; see Results). In the F3H gene with 23 silent polymorphisms, the corresponding probability for a partition of 13 and 3 sequences with 16 silent fixed differences (fig. 2) was 0.065. Thus, the pattern of variation observed in both the FAH1 and the F3H genes would seem to be compatible with a constant-size neutral process with no recombination.
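The statement in the paragraph above that coalescence from n to two lineages takes about as long as the final coalescence follows from the standard expectations of the neutral coalescent. The short derivation below assumes time is measured in units in which a pair of lineages coalesces at rate 1 (2N generations for a diploid random-mating population; the scaling differs under selfing, but the ratio of the two terms does not).

```latex
% T_k: time during which the sample has exactly k ancestral lineages.
\begin{align}
E[T_k] &= \frac{2}{k(k-1)}, \\
\sum_{k=3}^{n} E[T_k]
  &= 2\sum_{k=3}^{n}\left(\frac{1}{k-1}-\frac{1}{k}\right)
   = 1-\frac{2}{n}, \\
E[T_2] &= 1 .
\end{align}
```

For a sample of n = 20 sequences, the expected time to go from 20 lineages to 2 is 0.9, against 1 for the last coalescence, so on average slightly more than half of the tree height is spanned by the two branches descending from the root.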
In A. thaliana, recombination might be very low but not entirely absent. Recombination would decrease the probabilities of observing the actual numbers of fixed differences between the two divergent haplotypes present in both the FAH1 and the F3H genes. This, together with the observed heterogeneous distribution of silent polymorphic and fixed changes across both genes, may indicate that processes other than genetic drift (e.g., selection) are contributing to the generation of the observed patterns of variation in these genes. Also, when considering all regions thus far studied in this species, it seems difficult to envisage that a neutral process with population expansion might be causing the contrasting patterns of variation detected, i.e., the presence of dimorphism in some genes and a starlike phylogeny in the rest. The number of regions surveyed is not yet very large, but it is rapidly increasing. The joint analysis of variation in a large number of regions might be the most promising way to establish the role played by demographic events and drift, as opposed to selection, in the evolutionary history of A. thaliana.
Footnotes
1 Keywords: polymorphism, divergence, Arabidopsis thaliana, Arabidopsis lyrata.
2 Address for correspondence and reprints: Montserrat Aguadé, Departament de Genètica, Facultat de Biologia, Universitat de Barcelona, Diagonal 645, 08071 Barcelona, Spain. E-mail: aguade{at}bio.ub.es
Literature Cited
Abbott, R. J., and M. F. Gomes. 1989. Population genetic structure and outcrossing rate of Arabidopsis thaliana (L.) Heynh. Heredity 62:411-418.
Chapple, C. C. S., B. W. Shirley, M. Zook, R. Hammerschmidt, and S. C. Somerville. 1994. Secondary metabolism in Arabidopsis. Pp. 989-1030 in E. M. Meyerowitz and C. Somerville, eds. Arabidopsis. Cold Spring Harbor Laboratory Press, New York.
Day, T. A. 1993. Relating UV-B radiation screening effectiveness of foliage to absorbing-compound concentration and anatomical characteristics in a diverse group of plants. Oecologia 95:542-550.
Feinbaum, R. L., and F. M. Ausubel. 1988. Transcriptional regulation of the Arabidopsis thaliana chalcone synthase gene. Mol. Cell. Biol. 8:1985-1992.
Fu, Y.-X., and W.-H. Li. 1993. Statistical tests of neutrality of mutations. Genetics 133:693-709.
Hanfstingl, U., A. Berry, E. A. Kellogg, J. T. Costa III, W. Rüdiger, and F. M. Ausubel. 1994. Haplotypic divergence coupled with lack of diversity at the Arabidopsis thaliana alcohol dehydrogenase locus: roles for both balancing and directional selection? Genetics 138:811-828.
Hill, W. G., and A. Robertson. 1968. Linkage disequilibrium in finite populations. Theor. Appl. Genet. 38:226-231.
Hudson, R. R. 1987. Estimating the recombination parameter of a finite population model without selection. Genet. Res. 50:245-250.
Hudson, R. R. 1990. Gene genealogies and the coalescent process. Oxf. Surv. Evol. Biol. 7:1-44.
Hudson, R. R., and N. L. Kaplan. 1985. Statistical properties of the number of recombination events in the history of a sample of DNA sequences. Genetics 111:147-164.
Hudson, R. R., M. Kreitman, and M. Aguadé. 1987. A test of neutral molecular evolution based on nucleotide data. Genetics 116:153-159.
Innan, H., F. Tajima, R. Terauchi, and N. T. Miyashita. 1996. Intragenic recombination in the Adh locus of the wild plant Arabidopsis thaliana. Genetics 143:1761-1770.
Jukes, T. H., and C. R. Cantor. 1969. Evolution of protein molecules. Pp. 21-120 in H. M. Munro, ed. Mammalian protein metabolism. Academic Press, New York.
Kawabe, A., H. Innan, R. Terauchi, and N. T. Miyashita. 1997. Nucleotide polymorphism in the acidic chitinase locus (ChiA) region of the wild plant Arabidopsis thaliana. Mol. Biol. Evol. 14:1303-1315.
Kawabe, A., and N. T. Miyashita. 1999. DNA variation in the basic chitinase locus (ChiB) region of the wild plant Arabidopsis thaliana. Genetics 153:1445-1453.
Kelly, J. 1998. A test of neutrality based on interlocus associations. Genetics 146:1197-1206.
Kimura, M. 1983. The neutral theory of molecular evolution. Cambridge University Press, Cambridge, England.
Kingman, J. F. C. 1982a. On the genealogy of large populations. J. Appl. Prob. 19A:27-43.
Kingman, J. F. C. 1982b. The coalescent. Stochast. Proc. Appl. 13:235-248.
Kuittinen, H., and M. Aguadé. 2000. Nucleotide variation at the Chalcone Isomerase locus in Arabidopsis thaliana. Genetics 155:863-872.
Landry, G., C. C. S. Chapple, and R. L. Last. 1995. Arabidopsis mutants lacking phenolic sunscreens exhibit enhanced ultraviolet-B injury and oxidative damage. Plant Physiol. 109:1159-1166.
Li, J., T.-M. Ou-Lee, R. Raba, R. G. Amundson, and R. L. Last. 1993. Arabidopsis flavonoid mutants are hypersensitive to UV-B irradiation. Plant Cell 5:171-179.
Lois, R. 1994. Accumulation of UV-absorbing flavonoids induced by UV-B radiation in Arabidopsis thaliana. Planta 194:498-503.
McDonald, J. H. 1996. Detecting nonneutral heterogeneity across a region of DNA sequence in the ratio of polymorphism to divergence. Mol. Biol. Evol. 13:253-260.
McDonald, J. H. 1998. Improved tests for heterogeneity across a region of DNA sequence in the ratio of polymorphism to divergence. Mol. Biol. Evol. 15:377-384.
McDonald, J. H., and M. Kreitman. 1991. Adaptive protein evolution at the Adh locus in Drosophila. Nature 351:652-654.
Maddison, W. P., and D. R. Maddison. 1992. MacClade: analysis of phylogeny and character evolution. Version 3.0. Sinauer, Sunderland, Mass.
Meyer, K., J. C. Cusumano, C. Somerville, and C. C. S. Chapple. 1996. Ferulate-5-hydroxylase from Arabidopsis thaliana defines a new family of cytochrome P450-dependent monooxygenases. Proc. Natl. Acad. Sci. USA 93:6869-6874.
Miyashita, N. T., A. Kawabe, and H. Innan. 1999. DNA variation in the wild plant Arabidopsis thaliana revealed by amplified fragment length polymorphism analysis. Genetics 152:1723-1731.
Nei, M. 1987. Molecular evolutionary genetics. Columbia University Press, New York.
Pelletier, M. K., and B. W. Shirley. 1996. Analysis of flavanone 3-hydroxylase in Arabidopsis seedlings. Plant Physiol. 111:339-345.
Price, R. A., J. D. Palmer, and I. A. Al-Shehbaz. 1994. Systematic relationships of Arabidopsis: a molecular and morphological perspective. Pp. 7-19 in E. M. Meyerowitz and C. Somerville, eds. Arabidopsis. Cold Spring Harbor Laboratory Press, New York.
Purugganan, M. D., and J. I. Suddith. 1998. Molecular population genetics of the Arabidopsis CAULIFLOWER regulatory gene: nonneutral evolution and naturally occurring variation in floral homeotic function. Proc. Natl. Acad. Sci. USA 95:8130-8134.
Purugganan, M. D., and J. I. Suddith. 1999. Molecular population genetics of floral homeotic loci: departures from the equilibrium-neutral model at the APETALA3 and PISTILLATA genes of Arabidopsis thaliana. Genetics 151:839-848.
Rogers, S. O., and A. J. Bendich. 1985. Extraction of DNA from milligram amounts of fresh, herbarium and mummified plant tissues. Plant Mol. Biol. 5:69-76.
Rozas, J., and R. Rozas. 1999. DnaSP version 3: an integrated program for molecular population genetics and molecular evolution analysis. Bioinformatics 15:174-175.
Saitou, N., and M. Nei. 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406-425.
Shirley, B., S. Hanley, and H. M. Goodman. 1992. Effects of ionizing radiation on a plant genome: analysis of two Arabidopsis transparent testa mutations. Plant Cell 4:333-347.
Stahl, M., G. Dwyer, R. Mauricio, M. Kreitman, and J. Bergelson. 1999. Dynamics of disease resistance polymorphism at the Rpm1 locus of Arabidopsis. Nature 400:667-671.
Tajima, F. 1989. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123:585-595.
Van de Peer, Y., and R. de Wachter. 1994. TREECON for Windows: a software package for the construction and drawing of evolutionary trees for the Microsoft Windows environment. Comput. Appl. Biosci. 10:569-570.
Watterson, G. A. 1974. The sampling theory of selectively neutral alleles. Adv. Appl. Prob. 6:463-488.
Wisman, E., U. Hartman, M. Sagasser, E. Baumann, K. Palme, K. Hahlbrock, H. Saedler, and B. Weisshaar. 1998. Knock-out mutants from an En-1 mutagenized Arabidopsis thaliana population generate phenylpropanoid biosynthesis phenotypes. Proc. Natl. Acad. Sci. USA 95:12432-12436.