Department of Genetics, University of Wisconsin-Madison
Correspondence: E-mail: kbomblies{at}yahoo.com.
Abstract
Key Words: FLORICAULA/LEAFY, Andropogoneae, zfl1, zfl2, maize domestication
Introduction
The genetic basis of the complex morphological changes that accompany the transition to reproductive development has been extensively studied in flowering plants, particularly in dicot species. One of the key regulatory genes in inflorescence and flower development is the Antirrhinum majus FLORICAULA gene (FLO; Coen et al. 1990) and its Arabidopsis thaliana ortholog, LEAFY (LFY; Weigel et al. 1992). FLO and LFY gene products are involved in promoting the reproductive transition, as well as in controlling the identity and patterning of flowers and their constituent organs (Coen et al. 1990; Weigel et al. 1992). Studies in additional species suggest that the role of FLO/LFY orthologs in reproductive development is largely conserved in diverse angiosperms, including maize (Hofer et al. 1997; Souer et al. 1998; Molinero-Rosales et al. 1999; Ahearn et al. 2001; Bomblies et al. 2003).
In some species the FLO/LFY genes appear to have evolved novel functions in addition to their normal roles in reproductive development. These include roles in shoot apical meristem development in tobacco (Ahearn et al. 2001), leaf compounding in pea and tomato (Souer et al. 1998; Molinero-Rosales et al. 1999), and a potential role in inflorescence branching in rice (Kyozuka et al. 1998). Furthermore, expression changes of FLO/LFY-like genes have been implicated in the evolution of inflorescence architecture in Brassicaceae species (Shu et al. 2000; Yoon and Baum 2004), while in maize we have previously proposed one of two duplicate FLO/LFY orthologs, zfl2, as a candidate gene for a quantitative trait locus (QTL) contributing to inflorescence structure differences between maize and its wild progenitor, teosinte (Zea mays ssp. parviglumis; hereafter parviglumis; Bomblies et al. 2003; Doebley 2004). The potential roles of FLO/LFY genes in inflorescence structure evolution, along with the finding that these genes appear to be involved in inflorescence branching in the grasses rice and maize (Kyozuka et al. 1998; Bomblies et al. 2003), make them attractive candidates for a role in mediating the evolution of inflorescence structure differences in the Poaceae.
To begin addressing whether FLO/LFY orthologs may play a role in grass morphological evolution, we undertook a study of the molecular evolution of FLO/LFY-like genes in the Andropogoneae, a morphologically diverse tribe of grasses that includes maize and sorghum (Kellogg 2000). We generated a phylogeny for Andropogoneae FLO/LFY-like genes and studied their molecular evolution. We also examined nucleotide diversity at the zfl2 locus in maize and parviglumis to address whether this gene has been selected for inflorescence architecture differences during the domestication of maize. Taken together, our results suggest that the FLO/LFY-like genes in the Andropogoneae are evolving with selective constraint for amino acid conservation. Relative-rate tests on zfl1- and zfl2-like sequences in maize and its close relatives suggest that in most species neither of the paralogs shows strong evidence for relaxed constraint following duplication. Finally, though we have previously presented zfl2 as a candidate gene for a maize domestication QTL based on its roles in development, there is no significant evidence for a selective sweep having acted on the zfl2-transcribed region during domestication.
Materials and Methods
Phylogenetic Analysis
We generated maximum parsimony phylogenetic trees by heuristic searches using PAUP 4.0b10 (Swofford 2003) starting from a random tree with simple stepwise addition, tree bisection reconnection branch swapping, and ACCTRAN (accelerated transformation) character optimization to estimate branch lengths. Alignment gaps were treated as missing data, and branches with zero length were collapsed. Bootstrap analysis (Felsenstein 1985) was carried out with 1,000 replicates using the same search options.
We used Modeltest 3.6 (Posada and Crandall 1998) to test which model of molecular evolution is most appropriate for our data set using the Akaike information criterion as implemented in Modeltest, which allows comparison of nonnested models and establishes a 95% confidence interval of appropriate models. We generated a phylogeny by Bayesian inference using MrBayes v.3.0b4 (Huelsenbeck and Ronquist 2001) with the GTR-invariant-Γ model (Tavaré 1986; Yang 1997), as recommended as most appropriate for our data by Modeltest. We used uniform prior probabilities and four rate categories to approximate the Γ distribution. We ran the Markov chain Monte Carlo analysis for 1,000,000 generations starting from a random tree. The first 25,000 generations were dropped as chain burn-in, and subsequently every 100th generation was sampled to generate a set of 9,750 trees from which the 50% majority rule consensus tree was calculated in MrBayes.
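The number of retained trees follows directly from the chain length, burn-in, and sampling interval reported above. A minimal sketch of that arithmetic (the function name is illustrative and is not part of MrBayes):

```python
def retained_samples(total_generations: int, burn_in: int, sample_every: int) -> int:
    """Count the trees kept after discarding burn-in and thinning the chain."""
    if burn_in > total_generations:
        raise ValueError("burn-in cannot exceed the chain length")
    return (total_generations - burn_in) // sample_every


# Values reported in the text: 1,000,000 generations, 25,000 burn-in, every 100th sampled.
n_trees = retained_samples(1_000_000, 25_000, 100)
print(n_trees)  # 9750
```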
We generated alternative tree topologies differing in the placement of the genera Elionurus and Coelorachis with respect to zfl1 and zfl2 in Treeview PPC (Page 1996). We tested for significant differences between these alternate hypotheses and the Bayesian 50% majority rule tree using the Shimodaira-Hasegawa (SH) test (Shimodaira and Hasegawa 1999) implemented in PAUP. We performed the SH test with settings corresponding to the GTR-invariant-Γ model as was used to generate the phylogeny in MrBayes. Nucleotide frequencies, transition/transversion ratio, the number of invariant sites, and the Γ shape parameter for nucleotide substitution rate variation were estimated from the data using maximum likelihood (ML) in PAUP. These were entered as fixed values for the SH test, for which we ran 1,000 bootstrap replicates with full optimization (all free parameters estimated by ML in each replicate) to set the 95% confidence interval for the test statistic.
Sequence Analysis
We visualized conservation of Andropogoneae FLO/LFY-like sequences by comparing maize zfl1 and zfl2 with the Sorghum bicolor sequence using VISTA (Bray, Dubchak, and Pachter 2003) with 8-bp windows to maximize detection of short conserved noncoding sequences in intron1. We scanned intron sequences from maize zfl1 and zfl2, Elionurus muticus, and S. bicolor for known transcription factor-binding sites using the PLACE (Higo et al. 1999) and TESS (Schug and Overton 1997) databases. To estimate codon bias, we calculated the effective number of codons (ENC; Wright 1990) and synonymous third position GC content (GC3s) in DnaSP v.3.99 (Rozas et al. 2003).
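As an illustration of the GC3s measure, the sketch below computes the fraction of G or C at the third position of codons that have synonymous alternatives, excluding Met, Trp, and stop codons. This is a simplified reading of the statistic, not DnaSP's implementation, and the toy coding sequence is hypothetical.

```python
# Codons with no synonymous alternative (Met, Trp) and stop codons are excluded.
EXCLUDED_CODONS = {"ATG", "TGG", "TAA", "TAG", "TGA"}


def gc3s(cds: str) -> float:
    """Fraction of G/C at the third position of synonymous codons in a coding sequence."""
    cds = cds.upper().replace("U", "T")
    usable_length = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable_length, 3)]
    usable = [c for c in codons if c not in EXCLUDED_CODONS and set(c) <= set("ACGT")]
    if not usable:
        raise ValueError("no synonymous codons found")
    return sum(c[2] in "GC" for c in usable) / len(usable)


# Hypothetical toy sequence: ATG GCC CTG AAG GCG TTA TGA
toy_cds = "ATGGCCCTGAAGGCGTTATGA"
print(gc3s(toy_cds))  # 0.8 (four of the five counted codons end in G or C)
```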
We used dN/dS (ω) ratios to examine whether Andropogoneae FLO/LFY-like sequences are evolving under purifying constraint for amino acid sequences (ω < 1) or positive selection for amino acid changes (ω > 1). ω was calculated for each sequence and each codon within the sequences using a ML approach in PAML v.3.14 (Yang 1997), which employs the method of Goldman and Yang (1994) to take into account codon and transition-transversion biases.
In order to test for variation of evolutionary rates among sequences or at specific codons within the sequences, we calculated likelihood scores for the Bayesian phylogeny (shown in fig. 1) in PAML using the following models: (1) "model 0," in which one average ω value is estimated from the data and applied to all branches in the phylogeny and all codons in each sequence; (2) "model 1," which similarly applies one ω value to all branches but places each codon within each sequence into one of two categories (ω = 0 [purifying selection] and ω = 1 [neutral]); (3) "model 2," which is similar to model 1 but allows codons in a third ω category (ω > 1 [positive or directional selection]); (4) "model β," which applies one ω value to all branches but places codons within each sequence into 1 of 10 ω value categories and fits a β distribution where the estimated parameters (p and q) define the shape of the distribution of ω values between zero and one; (5) "model β + ω," which is similar to model β but allows an additional category for ω > 1; and (6) "model F," in which a separate ω value is calculated for each sequence in the phylogeny but all codons within the sequence are assigned that ω value. These models have been previously described and used in sequence analyses (Goldman and Yang 1994; Nielsen and Yang 1998; Yang 1998; Yang et al. 2000).
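Nested codon models of this kind are commonly compared with a likelihood ratio test, in which twice the difference in log-likelihood is referred to a chi-square distribution. The sketch below assumes that approach and assumes two extra free parameters for comparisons such as model 1 versus model 2. The log-likelihood values are hypothetical and are not taken from the study's tables.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variate (df = 1 or any even df)."""
    if x <= 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df > 0 and df % 2 == 0:
        # Closed form for even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        half = x / 2.0
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    raise ValueError("this sketch only handles df = 1 or even df")


def likelihood_ratio_test(lnl_null: float, lnl_alt: float, df: int):
    """Return the LRT statistic 2 * (lnL_alt - lnL_null) and its chi-square P value."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf(stat, df)


# Hypothetical log-likelihoods for a simpler (null) and a richer (alternative) model.
stat, p = likelihood_ratio_test(lnl_null=-5000.0, lnl_alt=-4996.0, df=2)
print(stat, round(p, 4))  # 8.0 0.0183
```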
We used Tajima's relative-rate test (Tajima 1993), as implemented in MEGA2.1 (Kumar et al. 2001), to test for differences in evolutionary rates among zfl1-zfl2 duplicates. We used S. bicolor as an out-group to test for rate variation between the duplicate sequences obtained from the anciently tetraploid Zea and Tripsacum species and used RFL (GenBank accession number AB005620) as an out-group to test for rate variation between the full-length maize zfl1- and zfl2-coding regions.
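Tajima's relative-rate test compares the number of sites at which each of two sequences differs uniquely from both the other sequence and the out-group. A minimal sketch of the test statistic follows. It is not MEGA's implementation, it skips any site with a gap or ambiguous base, and the toy sequences are hypothetical.

```python
import math


def tajima_relative_rate(seq1: str, seq2: str, outgroup: str):
    """Return (m1, m2, chi-square, P) for Tajima's (1993) relative-rate test."""
    if not (len(seq1) == len(seq2) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")
    m1 = m2 = 0
    for a, b, o in zip(seq1.upper(), seq2.upper(), outgroup.upper()):
        if any(base not in "ACGT" for base in (a, b, o)):
            continue  # skip gaps and ambiguous bases
        if a != b and b == o:
            m1 += 1  # change unique to sequence 1
        elif a != b and a == o:
            m2 += 1  # change unique to sequence 2
    if m1 + m2 == 0:
        return m1, m2, 0.0, 1.0
    chi2 = (m1 - m2) ** 2 / (m1 + m2)
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square upper tail, 1 df
    return m1, m2, chi2, p_value


# Hypothetical toy alignment: six unique changes in one paralog, two in the other.
m1, m2, chi2, p_value = tajima_relative_rate("CCCCCCAAAA", "AAAAAAAAGG", "AAAAAAAAAA")
print(m1, m2, chi2)  # 6 2 2.0
```

Equal rates in the two lineages predict m1 ≈ m2, so a large imbalance such as the one reported for a single species in this study yields a large chi-square value.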
We calculated nucleotide diversity (π), linkage disequilibrium (LD; as r²), the minimum number of recombination events (Rm), and the number of segregating sites (S) in maize and parviglumis zfl2 sequences using DnaSP. For LD analysis, we included coded insertion-deletion sites, except those due to microsatellites.
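For readers unfamiliar with these summary statistics, the sketch below computes per-site nucleotide diversity (the mean number of pairwise differences divided by alignment length) and the number of segregating sites. It assumes a gap-free alignment, does not reproduce DnaSP's handling of gaps or missing data, and uses a hypothetical toy alignment.

```python
from itertools import combinations


def segregating_sites(alignment: list[str]) -> int:
    """Count alignment columns that contain more than one nucleotide state."""
    return sum(1 for column in zip(*alignment) if len(set(column)) > 1)


def nucleotide_diversity(alignment: list[str]) -> float:
    """Mean pairwise differences per site across all pairs of sequences."""
    if len(alignment) < 2:
        raise ValueError("need at least two sequences")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(alignment, 2))
    total_differences = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return total_differences / len(pairs) / length


# Hypothetical toy alignment of four sequences, eight sites each.
toy_alignment = ["ACGTACGT", "ACGTACGA", "ACGAACGT", "ACGTACGT"]
print(segregating_sites(toy_alignment))     # 2
print(nucleotide_diversity(toy_alignment))  # 0.125
```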
Selection Tests and Neutrality Statistics for Z. mays zfl2 Sequences
We performed HKA selection tests (Hudson, Kreitman, and Aguadé 1987) in DnaSP for the exon1-intron1-exon2 region of the maize zfl2 sequences using Tripsacum dactyloides as the out-group and for the intron2 region using Zea diploperennis as the out-group. For neutral control loci, we used previously published data sets for adh1 (Tenaillon et al. 2001; Tiffin and Gaut 2001), adh2 (Goloubinoff, Pääbo, and Wilson 1993), te1 (White and Doebley 1999), and bz2, an1, and csu1138 (Tenaillon et al. 2001). We calculated Fay and Wu's H statistic for genetic hitchhiking (Fay and Wu 2000) at http://crimp.lbl.gov/htest.html for the zfl2 exon1-intron1-exon2 region using Tripsacum floridanum as the out-group and for intron2 using Z. diploperennis as the out-group.
We used DnaSP to estimate the population recombination parameter (C = 4Nc) for the zfl2-transcribed region and intron2 sequences in maize and parviglumis by coalescent simulations. These employed the observed minimum number of recombination events (Rmobs) and the number of segregating sites (S) as estimated from the data in DnaSP. We performed simulations of 1,000 realizations each, with different input values for C, in ProSeq v2.7 (Filatov 2002) to determine at which value for C the simulated (Rmsim) exceeded Rmobs in fewer than 5% of the realizations (i.e., the value of C for which P[Rmsim ≤ Rmobs] = 0.95). This method of estimating C was previously described (Wall 1999). We used the estimated values of C in subsequent coalescent simulations (1,000 realizations each) to determine appropriate 95% confidence intervals for Tajima's D (Tajima 1989) and Wall's Q (Wall 1999). This method of estimating appropriate intervals that take recombination into account was previously described (Simonsen, Churchill, and Aquadro 1995; Wall 1999). We calculated the values of Tajima's D and Wall's Q for the sequence samples using DnaSP (for Tajima's D) and ProSeq (for Wall's Q) for zfl2 sequences from maize and parviglumis.
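The point estimate of Tajima's D depends only on the sample size, the number of segregating sites, and the mean number of pairwise differences. The sketch below implements the standard formula from Tajima (1989). It does not reproduce the coalescent simulations used to set confidence intervals, and the input values are hypothetical.

```python
import math


def tajimas_d(n: int, seg_sites: int, mean_pairwise_diff: float) -> float:
    """Tajima's D from sample size n, segregating sites S, and mean pairwise differences k."""
    if n < 3:
        raise ValueError("need at least three sequences")
    if seg_sites < 1:
        raise ValueError("D is undefined without segregating sites")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    # Numerator: difference between the two estimators of theta (k and S / a1).
    return (mean_pairwise_diff - seg_sites / a1) / math.sqrt(variance)


# Hypothetical sample: 4 sequences, 2 segregating sites, 1.0 mean pairwise differences.
d_value = tajimas_d(n=4, seg_sites=2, mean_pairwise_diff=1.0)
print(d_value < 0)  # True: k is smaller than S / a1, an excess of rare variants
```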
Results
Our Andropogoneae phylogeny for the FLO/LFY-like sequences agrees in several key aspects with previously published phylogenies for other genes (Spangler et al. 1999; Lukens and Doebley 2001; Mathews et al. 2002): (1) the "core Andropogoneae" form a monophyletic clade, (2) Sorghum FLO/LFY-like sequences show a close relationship with sequences from Cleistachne sorghoides and Saccharum officinarum, and (3) Sorghum-Cleistachne-Saccharum sequences group with the core Andropogoneae (fig. 1).
Sequences from the duplicate maize FLO/LFY genes, zfl1 and zfl2, fall into two separate and well-supported clades within the Tripsacum-Zea clade (fig. 1). This suggests that the duplication of these genes occurred prior to the divergence of Zea and Tripsacum but after the divergence of this clade from the remaining Andropogoneae. This corresponds with the proposed timing of a tetraploidy event estimated to have occurred approximately 11 MYA in an ancestor of the Zea-Tripsacum lineage (Gaut and Doebley 1997). Within the Zea-Tripsacum clade, we obtained single zfl1- and/or zfl2-like alleles from each species except Tripsacum andersonii, from which we obtained Zea-like alleles of zfl1 and zfl2, as well as a Tripsacum-like zfl2 allele from a single sample. This result is in agreement with a previous study that showed that T. andersonii is a polyploid hybrid species carrying Zea-like and Tripsacum-like genomes (Talbert et al. 1990).
Our phylogeny also agrees with several previously published Andropogoneae phylogenies (Spangler et al. 1999; Lukens and Doebley 2001; Mathews et al. 2002) in the placement of samples from the genera Coelorachis and Elionurus as close relatives of Zea and Tripsacum. Interestingly, sequences from these genera group more closely with zfl2 than with zfl1 (78% posterior probability; 60% parsimony bootstrap support). This branch topology is expected if Coelorachis and Elionurus are more closely related to a diploid progenitor that contributed a zfl2-like gene to the tetraploid Zea-Tripsacum ancestor than they are to the zfl1-contributing progenitor. To test this hypothesis against alternative possibilities, we generated alternate trees in which (1) Elionurus and Coelorachis are placed on a trifurcation with the zfl1 and zfl2 clades (H1) or (2) Elionurus and Coelorachis are grouped with zfl1 instead of zfl2 (H2). The relative consistency of these topologies with the data was tested using the SH test (Shimodaira and Hasegawa 1999). The SH test was not significant when comparing the original tree topology, which places Elionurus and Coelorachis with zfl2 (H0), with either H1 (P = 0.211) or H2 (P = 0.208), though H0 was labeled as the "best" tree in terms of likelihood scores (−ln L = 7,254 for H0 vs. −ln L = 7,256 for H1 and H2). These results suggest that the closer relationship of Elionurus and Coelorachis to zfl2 implied by the phylogenies is not robust, and thus the branch topology for these genera with respect to the duplicated maize zfl genes remains uncertain.
Andropogoneae Nucleotide Sequence Conservation
We used VISTA plots to visualize sequence conservation between the S. bicolor FLO/LFY-like sequence and zfl1 and zfl2. These plots reveal two highly conserved regions: one directly downstream of the start codon and a second spanning a series of leucine repeats (fig. 2). An acidic-basic domain in exon2 (1,060–1,220 bp) shows greater amino acid sequence variability, but many charged positions are more highly conserved than adjacent uncharged residues (data not shown). Overall charge in this region appears largely conserved among the Andropogoneae sequences sampled: the 1,060- to 1,160-bp region contains 11–16 basic, positively charged amino acids (arginine and lysine residues), with an average of 13.9 basic residues per sequence; the 1,160- to 1,220-bp region contains 4 to 9 acidic, negatively charged amino acid residues (aspartic acid and glutamic acid), with an average of 8 acidic residues per sequence. For both regions, T. floridanum and Tripsacum zopilotense zfl1 sequences deviate from the remaining sequences in that they have the lowest number of charged amino acids (11 basic, 4 acidic). A proline-rich domain is found in the 5' region in all of the Andropogoneae FLO/LFY-like sequences but is highly variable at the primary sequence level.
Codon Bias in FLO/LFY-like Genes
The maize FLO/LFY orthologs zfl1 and zfl2 are GC rich, suggesting codon bias. Thus, to examine codon bias in these and other Andropogoneae FLO/LFY-like genes, we calculated the ENC (Wright 1990), which is independent of sequence length and amino acid composition. ENC can range from 20 (maximum bias; one codon used per amino acid) to 61 (no bias; all codons used equally). For the FLO/LFY-like genes in the Andropogoneae, ENC varies from 31.1 in S. bicolor to 37.5 in Z. luxurians zfl2, with an average ENC across the Andropogoneae of 33.6. These values suggest that strong codon bias is conserved throughout the tribe, and this places these genes among the more strongly biased genes reported to date in maize and other grasses (Fennoy and Bailey-Serres 1993; Zhang, Kosakovsky Pond, and Gaut 2001). Nevertheless, 8 of the 293 codon positions in our alignment are conserved for "unpreferred" or "rare" codons (A or T ending) in all 36 sequences, while an additional 18 rare codons are highly conserved (in >25 sequences). These apparently nonrandom patterns of codon usage suggest that synonymous sites in these genes are also subject to selection pressure throughout the Andropogoneae.
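The bounds of 20 and 61 follow from Wright's (1990) formula, which combines the average codon homozygosity of the 2-, 3-, 4-, and 6-fold degenerate amino acid classes. The sketch below evaluates the formula at its two extremes. Estimating the homozygosities from real codon counts is omitted, and the function and argument names are illustrative.

```python
def effective_number_of_codons(f2: float, f3: float, f4: float, f6: float) -> float:
    """Wright's ENC from the mean codon homozygosity of each degeneracy class.

    The standard genetic code has 2 single-codon amino acids (Met, Trp),
    9 two-fold, 1 three-fold (Ile), 5 four-fold, and 3 six-fold classes.
    """
    for value in (f2, f3, f4, f6):
        if not 0.0 < value <= 1.0:
            raise ValueError("homozygosity values must lie in (0, 1]")
    return 2.0 + 9.0 / f2 + 1.0 / f3 + 5.0 / f4 + 3.0 / f6


# Maximum bias: one codon used per amino acid, so every homozygosity equals 1.
max_bias = effective_number_of_codons(1.0, 1.0, 1.0, 1.0)
# No bias: all synonymous codons used equally, so homozygosity is 1 / degeneracy.
no_bias = effective_number_of_codons(1 / 2, 1 / 3, 1 / 4, 1 / 6)
print(max_bias)           # 20.0
print(round(no_bias, 6))  # 61.0
```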
To estimate when codon bias arose in the FLO/LFY orthologs, we examined codon bias in full-length FLO/LFY-like cDNAs available in GenBank using ENC values and GC3s (table 3). We found that for dicot FLO/LFY orthologs ENC values are generally high (suggesting low bias) and GC3s values are correspondingly low (with the exception of Eucalyptus globulus; see table 3), while the grass orthologs (from rice, Lolium, and maize) have low ENC and high GC3s values suggestive of strong codon bias. Interestingly, ENC is also high and GC3s low in FLO/LFY orthologs reported from other monocots, including several orchids and a rush (Juncus effusus), though the latter is closely related to the grass family (Bremer 2000). This result extends a previous finding that high overall genomic GC content observed in grasses is not prevalent in other monocots (Salinas et al. 1988; Montero et al. 1990). The fern (Ceratopteris) and pine (Pinus) orthologs have ENC and GC3s values similar to the unbiased dicot species (table 3), suggesting that low bias may be the ancestral state for these genes. Furthermore, the high bias in grasses and eucalyptus suggests that codon bias in these genes may have arisen multiple times.
To test whether specific Andropogoneae FLO/LFY-like sequences have different ω values (which might suggest differing evolutionary constraints in different species), we calculated the likelihood of the phylogeny under a model allowing different ω values on each branch of the tree (model F in tables 4 and 5) and a model constraining all branches to a single ω value (model 0 in tables 4 and 5). Model F has a significantly less negative likelihood score relative to model 0 (table 5), but branch-specific ω values under model F are all lower than one (ranging from 0.0001 to 0.37). This suggests that despite variation in ω values among species, a model primarily involving purifying selection and constraint on amino acid sequence best explains the evolution of Andropogoneae FLO/LFY-like sequences.
Analyses performed using all of the models we tested for the Andropogoneae FLO/LFY orthologs assign the majority of codons in each sequence ω values close to zero. Model β, which estimates the distribution of ω values according to a β distribution (0 < ω < 1), yields shape parameter estimates (p and q) describing an L-shaped distribution with maximum density near zero (table 4). While model β + ω has a significantly better likelihood score than model β (table 5), only 2% of the codons are placed in the ω > 1 category. Model 2 places 0.5% of the codons in the ω > 1 category (table 4) and has a significantly lower likelihood score than model 0, which does not allow any sites with ω > 1 (table 5). Of the codons placed in the ω > 1 class under models 2 and β + ω, only one codon is statistically significant, but this codon lies in the variable proline-rich region and is absent from many of the sequences. Thus, this result is inconclusive, and overall there is no convincing evidence that individual codons in the exon1-exon2 region of the Andropogoneae FLO/LFY sequences are experiencing positive selection for amino acid changes.
zfl1 and zfl2 as Duplicate Genes
Because gene duplication may subsequently allow relaxed selective constraint on one or both paralogs (Force et al. 1999; Lynch and Conery 2000), we examined the evolution of the zfl1-zfl2 duplicates in the Zea-Tripsacum clade in more detail. Pairwise ω values are close to zero for the Zea and Tripsacum zfl1 and zfl2 sequences, suggesting that purifying selection is the dominant force acting on both paralogs (table 6). We tested whether the duplication of zfl-like genes resulted in a change in nucleotide substitution rate in either paralog by comparing the duplicate sequences from five species (T. floridanum, T. andersonii, T. zopilotense, Z. mays, and Z. luxurians) with S. bicolor using Tajima's relative-rate test (Tajima 1993). The test is statistically significant for only one species (T. floridanum; table 5), suggesting that the zfl1 and zfl2 clades are evolving at similar rates. The significant relative-rate test for T. floridanum suggests that the zfl1 gene from this species has accumulated an excess of unique mutations when compared with T. floridanum zfl2, implying a relaxation of evolutionary constraint. In support of this, we identified several potentially deleterious mutations in T. floridanum zfl1: (1) a leucine to valine mutation of a central leucine repeat residue that is highly conserved among angiosperm FLO/LFY orthologs examined (data not shown), (2) a second leucine to valine mutation within the leucine repeat region, (3) a mutation in the highly conserved RY repeat sequence in intron1 (table 2), and (4) at least two small deletions and three base changes that result in a lower number of charged residues in the acidic and basic regions than observed in other Andropogoneae sequences (see above).
We calculated nucleotide diversity (π) for the maize and parviglumis zfl2 sequences, as well as for each region within the gene (table 7). Overall nucleotide diversity for maize zfl2 sequences compared with parviglumis sequences is approximately 62% (table 7), which is similar to a loss of genetic diversity previously attributed to the maize domestication bottleneck (Eyre-Walker et al. 1998; White and Doebley 1999). We observed a drop in relative nucleotide diversity to 21% in the first intron (table 7). While such a drop in relative diversity can be indicative of selection, the most common parviglumis haplotype in this region (observed in 5 of the 13 sequences) is identical to the predominant maize haplotype (observed in 10 of the 16 maize samples). Furthermore, while only two segregating single-nucleotide polymorphisms are observed in the maize zfl2 intron1 sequences, insertion-deletion variation defines at least three different haplotypes in the maize sample, two of which are also found in the parviglumis sample (see alignment 2 in Supplementary Material). These results suggest that the observed drop in relative diversity in maize compared with parviglumis in intron1 is unlikely to result from a selective sweep having acted on this region during maize domestication.
Discussion
In addition to selective constraints at nonsynonymous sites, our data suggest that evolutionary constraints also exist for synonymous and some noncoding sites in these sequences. Constraint on synonymous sites within the coding region is suggested by the strong codon bias we observed in Andropogoneae FLO/LFY-like sequences. Strong codon bias has been previously observed for numerous other nuclear genes in grasses, suggesting that this pressure operates at the genomic level and is probably not specific to any unique properties of the FLO/LFY genes (Salinas et al. 1988; Montero et al. 1990; Fennoy and Bailey-Serres 1993; Zhang, Kosakovsky Pond, and Gaut 2001). However, while the majority of codons in the Andropogoneae FLO/LFY sequences have a G or C in synonymous third positions, several codons are conserved for rare (A or U ending) codons. Conservation of rare codons suggests that selection at some sites acts against the prevailing codon bias to maintain these synonymous sites. Rare codon use at specific positions has been implicated in basic aspects of messenger RNA function such as splicing (Schaal and Maniatis 1999) and translational pausing to allow protein domain folding (Purvis et al. 1987).
Recently, the rice FLO/LFY ortholog, RFL, was shown to harbor functional regulatory elements within both of its introns (Prasad, Kushalappa, and Vijayraghavan 2003). Because conservation of noncoding sequences is a useful criterion for identifying putative regulatory elements (Koch et al. 2001; Levy, Hannenhalli, and Workman 2001; Kaplinsky et al. 2002; Hong et al. 2003), we examined the first intron from the Andropogoneae FLO/LFY sequences for conservation of sequences that might indicate the presence of important regulatory elements. We found multiple regions of moderate to high sequence conservation, several of which contain sequences with similarity to known transcription factor-binding sites. The most highly conserved sequence is an RY repeat sequence previously proposed as a putative regulatory site in rice (Prasad, Kushalappa, and Vijayraghavan 2003). RY sequences are known to be binding sites for B3 transcription factors and are important for spatial regulation of target genes (Baumlein et al. 1992; Reidt et al. 2000). The near-perfect conservation of this eight-nucleotide sequence in the Andropogoneae and rice strongly suggests that this site is important for gene regulation in these species, and thus it would be extremely interesting to determine which, if any, protein binds to this site.
Interestingly, four species within the tetraploid Zea-Tripsacum clade harbor point mutations in the RY repeat sequence. For three of these individuals, we also isolated the paralog and found in each case that the duplicate has retained a wild-type RY sequence. While this sample size is not large enough to draw firm conclusions, this observation suggests that constraint on the RY repeat sequence may be relaxed when a duplicate locus is present but that maintenance of at least one paralog with a wild-type RY repeat is functionally important. In contrast to the highly conserved RY repeat sites, other potential transcription factor-binding sites in intron1 show greater variability. This suggests either that the similarity of the sequences to binding sites is incidental or that there is potential plasticity in gene regulation that may allow evolution of novel expression patterns or levels. To address this, it would be important to determine whether there are differences in FLO/LFY ortholog expression between these species and whether these differences correlate with sequence differences in putative regulatory elements.
One of our primary interests in generating a phylogeny for FLO/LFY-like genes from the Andropogoneae was to examine the relationship of the duplicate maize FLO/LFY orthologs, zfl1 and zfl2, with sequences from related species to better understand the origin of the paralogs. The phylogeny presented in this study is unique among previously published Andropogoneae gene phylogenies (Spangler et al. 1999; Lukens and Doebley 2001; Mathews et al. 2002) in that it includes both members of a maize duplicate gene set. Sequences from the zfl1 and zfl2 duplicate loci form two clades within the Zea-Tripsacum group, supporting our previous hypothesis (Bomblies et al. 2003) that the zfl genes were duplicated in the tetraploidy event preceding the Zea-Tripsacum divergence (Gaut and Doebley 1997). The relationships of duplicated Zea-Tripsacum genes with orthologous sequences from other Andropogoneae are of interest because the progenitor(s) of the tetraploidy event are as yet unknown. If tetraploidy results from genome doubling within a single species (autotetraploidy), duplicate genes should be more closely related to one another than either is to sequences from related diploids. If tetraploidy results from hybridization of two species (allotetraploidy), then the duplicate sequences should be more closely related to orthologs from species related to the diploid ancestors than the paralogs are to one another. In the FLO/LFY-like gene phylogenies we obtained, the genera Elionurus and Coelorachis are supported as close relatives of Zea and Tripsacum and show a closer relationship to zfl2 than to zfl1. However, because statistical tests fail to reject alternate hypotheses regarding the relationship of Elionurus and Coelorachis to zfl1 and zfl2, the phylogenetic relationships of the duplicates remain unclear, and we cannot conclude from these data whether the tetraploid ancestor of the Zea-Tripsacum clade arose by auto- or allotetraploidy. 
Detailed phylogenetic analyses of additional duplicate genes in the Zea-Tripsacum clade with orthologous sequences from closely related genera could shed light on this question in the future.
Duplicate genes are of broad interest because redundancy may release paralogs from evolutionary constraint and thus provide "raw material" for evolution (Ohno 1970). Theoretical studies suggest that relaxed constraint on duplicate genes may result in several fates, including loss of function of one paralog, evolution of novel function(s), or long-term maintenance of both paralogs (Force et al. 1999; Lynch and Conery 2000). We have previously shown that zfl1 and zfl2 are largely redundant in maize (Bomblies et al. 2003), but we do not know whether this is true for other Zea and Tripsacum species. ω ratios for the five Zea and Tripsacum species (including maize) from which we isolated both paralogs are close to zero for both genes, suggesting that both the zfl1 and zfl2 protein sequences are evolutionarily constrained throughout the clade. Relative-rate tests confirm that in most of the duplicate pairs, the paralogs from a given species are evolving at similar rates. However, the duplicate pair from T. floridanum shows a significant relative-rate test. We argue that this is not merely an artifact of performing multiple tests because T. floridanum zfl1 carries several potentially deleterious mutations, including two leucine to valine changes within a highly conserved series of leucine repeats. While leucine to valine is often considered a "conservative" change due to the structural similarity and neutral charge of these amino acids, in several cases leucine to valine mutations in the context of leucine repeats have been shown to destabilize coiled-coil structures in proteins (Zhu et al. 1993) or abolish protein-protein interactions necessary for transcription factor function (Wang et al. 2001). However, because the functional consequences of these and the other mutations observed in T. floridanum zfl1 are not known, we cannot conclude whether this gene is deteriorating into a pseudogene or acquiring a novel function.
Overall, our results for the zfl1 and zfl2 paralogs add to the expanding literature demonstrating that gene duplicates frequently show evidence of purifying selection pressure for amino acid conservation acting on both duplicates and long-term maintenance of paralogs (Van de Peer et al. 2001; Conant and Wagner 2003; Hileman and Baum 2003).
We have previously proposed zfl2 as a candidate gene for a major-effect maize domestication QTL for inflorescence architecture (Bomblies et al. 2003). Thus, we asked whether the maize zfl2 gene shows evidence of selection during domestication by examining zfl2 sequence diversity in maize and its wild ancestor parviglumis. We found that the drop in relative nucleotide diversity and elevated LD observed in maize zfl2 relative to parviglumis are similar to values previously reported for neutrally evolving loci and can thus be explained by the domestication bottleneck effect alone (Eyre-Walker et al. 1998; White and Doebley 1999). However, it is important to point out that in maize, recombination often occurs preferentially in or near coding regions (Fu, Zheng, and Dooner 2002). Thus, if selection has acted on nearby regulatory regions, its molecular signature may not be detected in the coding region. Such a result has been observed for the teosinte branched1 gene in maize, which shows strong evidence of selection in upstream regulatory regions, but nearly neutral nucleotide diversity patterns in the coding region (Wang et al. 1999; Clark et al. 2004). Thus, concluding whether or not selection has acted on zfl2 during maize domestication will require further study of surrounding genomic regions.
![]() |
Footnotes |
---|
Neelima Sinha, Associate Editor
![]() |
References |
---|
Ahearn, K. P., H. A. Johnson, D. Weigel, and D. R. Wagner. 2001. NFL1, a Nicotiana tabacum LEAFY-like gene, controls meristem initiation and floral structure. Plant Cell Physiol. 42:1130-1139.
Baumlein, H., I. Nagy, R. Villarroel, D. Inze, and U. Wobus. 1992. Cis-analysis of a seed protein gene promoter: the conservative RY repeat CATGCATG within the legumin box is essential for tissue-specific expression of a legumin gene. Plant J. 2:233-239.
Bomblies, K., R. L. Wang, B. A. Ambrose, R. J. Schmidt, R. B. Meeley, and J. Doebley. 2003. Duplicate FLORICAULA/LEAFY homologs zfl1 and zfl2 control inflorescence architecture and flower patterning in maize. Development 130:2385-2395.
Bray, N., I. Dubchak, and L. Pachter. 2003. AVID: a global alignment program. Genome Res. 13:97-102.
Bremer, K. 2000. Early Cretaceous lineages of monocot flowering plants. Proc. Natl. Acad. Sci. USA 97:4707-4711.
Clark, R. M., E. Linton, J. Messing, and J. F. Doebley. 2004. Pattern of diversity in the genomic region near the maize domestication gene tb1. Proc. Natl. Acad. Sci. USA 101:700-707.
Coen, E. S., J. M. Romero, S. Doyle, R. Elliott, G. Murphy, and R. Carpenter. 1990. floricaula: a homeotic gene required for flower development in Antirrhinum majus. Cell 63:1311-1322.
Conant, G. C., and A. Wagner. 2003. Asymmetric sequence divergence of duplicate genes. Genome Res. 13:2052-2058.
Doebley, J. 2004. The genetics of maize evolution. Annu. Rev. Genet. 38:37-59.
Eyre-Walker, A., R. L. Gaut, H. Hilton, D. L. Feldman, and B. S. Gaut. 1998. Investigation of the bottleneck leading to the domestication of maize. Proc. Natl. Acad. Sci. USA 95:4441-4446.
Fay, J. C., and C. I. Wu. 2000. Hitchhiking under positive Darwinian selection. Genetics 155:1405-1413.
Felsenstein, J. 1981. Evolutionary trees from DNA sequences: a maximum likelihood approach. J. Mol. Evol. 17:368-376.
Felsenstein, J. 1985. Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39:783-791.
Fennoy, S. L., and J. Bailey-Serres. 1993. Synonymous codon usage in Zea mays L. nuclear genes is varied by levels of C and G-ending codons. Nucleic Acids Res. 21:5294-5300.
Filatov, D. A. 2002. ProSeq: a software for preparation and evolutionary analysis of DNA sequence data sets. Mol. Ecol. Notes 2:621-624.
Force, A., M. Lynch, F. B. Pickett, A. Amores, Y. L. Yan, and J. Postlethwait. 1999. Preservation of duplicate genes by complementary, degenerative mutations. Genetics 151:1531-1545.
Fu, H., Z. Zheng, and H. K. Dooner. 2002. Recombination rates between adjacent genic and retrotransposon regions in maize vary by 2 orders of magnitude. Proc. Natl. Acad. Sci. USA 99:1082-1087.
Gaut, B. S., and J. F. Doebley. 1997. DNA sequence evidence for the segmental allotetraploid origin of maize. Proc. Natl. Acad. Sci. USA 94:6809-6814.
Goldman, N. 1993. Statistical tests of models of DNA substitution. J. Mol. Evol. 36:182-198.
Goldman, N., and Z. Yang. 1994. A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol. Biol. Evol. 11:725-736.
Goloubinoff, P., S. Pääbo, and A. C. Wilson. 1993. Evolution of maize inferred from sequence diversity of an Adh2 gene segment from archaeological specimens. Proc. Natl. Acad. Sci. USA 90:1997-2001.
Higo, K., Y. Ugawa, M. Iwamoto, and T. Korenaga. 1999. Plant cis-acting regulatory DNA elements (PLACE) database: 1999. Nucleic Acids Res. 27:297-300.
Hileman, L. C., and D. A. Baum. 2003. Why do paralogs persist? Molecular evolution of CYCLOIDEA and related floral symmetry genes in Antirrhineae (Veronicaceae). Mol. Biol. Evol. 20:591-600.
Hofer, J., L. Turner, R. Hellens, M. Ambrose, P. Matthews, A. Michael, and N. Ellis. 1997. UNIFOLIATA regulates leaf and flower morphogenesis in pea. Curr. Biol. 7:581-587.
Hong, R. L., L. Hamaguchi, M. A. Busch, and D. Weigel. 2003. Regulatory elements of the floral homeotic gene AGAMOUS identified by phylogenetic footprinting and shadowing. Plant Cell 15:1296-1309.
Hudson, R. R. M., M. Kreitman, and M. Aguadé. 1987. A test of neutral molecular evolution based on nucleotide data. Genetics 116:153-159.
Huelsenbeck, J. P., and F. Ronquist. 2001. MRBAYES: Bayesian inference of phylogeny. Bioinformatics 17:754-755.
Kaplinsky, N. J., D. M. Braun, J. Penterman, S. A. Goff, and M. Freeling. 2002. Utility and distribution of conserved noncoding sequences in the grasses. Proc. Natl. Acad. Sci. USA 99:6147-6151.
Kellogg, E. A. 2000. Molecular and morphological evolution in Andropogoneae. Pp. 149-158 in S. W. L. Jacobs and J. E. Everett, eds. Grasses: systematics and evolution. Commonwealth Scientific and Industrial Research Organization (CSIRO), Collingwood, Victoria, Australia.
Koch, M. A., B. Weisshaar, J. Kroymann, B. Haubold, and T. Mitchell-Olds. 2001. Comparative genomics and regulatory evolution: conservation and function of the Chs and Apetala3 promoters. Mol. Biol. Evol. 18:1882-1891.
Kumar, S., K. Tamura, I. B. Jakobsen, and M. Nei. 2001. MEGA2: molecular evolutionary genetics analysis software. Bioinformatics 17:1244-1245.
Kyozuka, J., S. Konishi, K. Nemoto, T. Izawa, and K. Shimamoto. 1998. Down-regulation of RFL, the FLO/LFY homolog of rice, accompanied with panicle branch initiation. Proc. Natl. Acad. Sci. USA 95:1979-1982.
Levy, S., S. Hannenhalli, and C. Workman. 2001. Enrichment of regulatory signals in conserved non-coding genomic sequence. Bioinformatics 17:871-877.
Lukens, L., and J. F. Doebley. 2001. Molecular evolution of the teosinte branched 1 gene among maize and related grasses. Mol. Biol. Evol. 18:627-638.
Lynch, M., and J. S. Conery. 2000. The evolutionary fate and consequences of duplicate genes. Science 290:1151-1155.
Mathews, S., R. E. Spangler, R. J. Mason-Gamer, and E. A. Kellogg. 2002. Phylogeny of Andropogoneae inferred from Phytochrome B, GBSSI, and ndhf. Int. J. Plant Sci. 163:441-450.
Molinero-Rosales, N., M. Jamilena, S. Zurita, P. Gomez, J. Capel, and R. Lozano. 1999. FALSIFLORA, the tomato orthologue of FLORICAULA and LEAFY, controls flowering time and floral meristem identity. Plant J. 20:685-693.
Montero, L. M., J. Salinas, G. Matassi, and G. Bernardi. 1990. Gene distribution and isochore organization in the nuclear genome of plants. Nucleic Acids Res. 18:1859-1867.
Nielsen, R., and Z. Yang. 1998. Likelihood models for detecting positively selected amino acid sites and applications to the HIV-1 envelope gene. Genetics 148:929-936.
Ohno, S. 1970. Evolution by gene duplication. Springer-Verlag, Heidelberg, Germany.
Page, R. D. M. 1996. TREEVIEW: an application to display phylogenetic trees on personal computers. Comput. Appl. Biosci. 12:357-358.
Posada, D., and K. A. Crandall. 1998. Modeltest: testing the model of DNA substitution. Bioinformatics 14:817-818.
Prasad, K., K. Kushalappa, and U. Vijayraghavan. 2003. Mechanism underlying regulated expression of RFL, a conserved transcription factor, in the developing rice inflorescence. Mech. Dev. 120:491-502.
Purvis, I. J., A. J. Bettany, T. C. Santiago, J. R. Coggins, K. Duncan, R. Eason, and A. J. Brown. 1987. The efficiency of folding of some proteins is increased by controlled rates of translation in vivo. A hypothesis. J. Mol. Biol. 193:413-417.
Rambaut, A. 1996. Se-Al: sequence alignment editor. (http://evolve.zoo.ox.ac.uk/).
Reidt, W., T. Wohlfarth, M. Ellerstrom, A. Czihal, A. Tewes, I. Ezcurra, L. Rask, and H. Baumlein. 2000. Gene regulation during late embryogenesis: the RY motif of maturation-specific gene promoters is a direct target of the FUS3 gene product. Plant J. 21:401-408.
Remington, D. L., J. M. Thornsberry, Y. Matsuoka, L. M. Wilson, S. R. Whitt, J. Doebley, S. Kresovich, M. M. Goodman, and E. S. Buckler IV. 2001. Structure of linkage disequilibrium and phenotypic associations in the maize genome. Proc. Natl. Acad. Sci. USA 98:11479-11484.
Rozas, J., J. C. Sanchez-DelBarrio, X. Messeguer, and R. Rozas. 2003. DnaSP, DNA polymorphism analyses by the coalescent and other methods. Bioinformatics 19:2496-2497.
Salinas, J., G. Matassi, L. M. Montero, and G. Bernardi. 1988. Compositional compartmentalization and compositional patterns in the nuclear genomes of plants. Nucleic Acids Res. 16:4269-4285.
Schaal, T. D., and T. Maniatis. 1999. Selection and characterization of pre-mRNA splicing enhancers: identification of novel SR protein-specific enhancer sequences. Mol. Cell. Biol. 19:1705-1719.
Schug, J., and G. C. Overton. 1997. TESS: transcription element search software on the WWW. Technical report CBIL-TR-1997-1001-v0.0. Computational Biology and Informatics Laboratory, School of Medicine, University of Pennsylvania, Philadelphia, Pa.
Shimodaira, H., and M. Hasegawa. 1999. Multiple comparisons of log-likelihoods with applications to phylogenetic inference. Mol. Biol. Evol. 16:1114-1116.
Shu, G., W. Amaral, L. C. Hileman, and D. A. Baum. 2000. LEAFY and the evolution of rosette flowering in violet cress (Jonopsidium acaule, Brassicaceae). Am. J. Bot. 87:634-641.
Simonsen, K. L., G. A. Churchill, and C. A. Aquadro. 1995. Properties of statistical tests of neutrality for DNA polymorphism data. Genetics 141:413-429.
Souer, E., A. van der Krol, D. Kloos, C. Spelt, M. Bliek, J. Mol, and R. Koes. 1998. Genetic control of branching pattern and floral identity during Petunia inflorescence development. Development 125:733-742.
Spangler, R., B. Zaitchik, E. Russo, and E. Kellogg. 1999. Andropogoneae evolution and generic limits in Sorghum (Poaceae) using ndhf sequences. Syst. Bot. 24:267-281.
Swofford, D. L. 2003. PAUP*. Phylogenetic analysis using parsimony (*and other methods). Version 4. Sinauer Associates, Sunderland, Mass.
Tajima, F. 1989. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123:585-595.
Tajima, F. 1993. Simple methods for testing the molecular evolutionary clock hypothesis. Genetics 135:599-607.
Talbert, L. E., J. F. Doebley, S. R. Larson, and V. L. Chandler. 1990. Tripsacum andersonii is a natural hybrid involving Zea and Tripsacum: molecular evidence. Am. J. Bot. 77:722-726.
Tavaré, S. 1986. Some probabilistic and statistical problems on the analysis of DNA sequences. Lect. Math. Life Sci. 17:57-86.
Tenaillon, M. I., M. C. Sawkins, A. D. Long, R. L. Gaut, J. F. Doebley, and B. S. Gaut. 2001. Patterns of DNA sequence polymorphism along chromosome 1 of maize (Zea mays ssp. mays L.). Proc. Natl. Acad. Sci. USA 98:9161-9166.
Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673-4680.
Tiffin, P., and B. S. Gaut. 2001. Sequence diversity in the tetraploid Zea perennis and the closely related diploid Z. diploperennis: insights from four nuclear loci. Genetics 158:401-412.
Van de Peer, Y., J. S. Taylor, I. Braasch, and A. Meyer. 2001. The ghost of selection past: rates of evolution and functional divergence of anciently duplicated genes. J. Mol. Evol. 53:436-446.
Wall, J. D. 1999. Recombination and the power of statistical tests of neutrality. Genet. Res. 74:65-79.
Wang, R. L., A. Stec, J. Hey, L. Lukens, and J. Doebley. 1999. The limits of selection during maize domestication. Nature 398:236-239.
Wang, Y., W. Devereux, T. M. Stewart, and R. A. Casero Jr. 2001. Characterization of the interaction between the transcription factors human polyamine modulated factor (PMF-1) and NF-E2-related factor 2 (Nrf-2) in the transcriptional regulation of the spermidine/spermine N1-acetyltransferase (SSAT) gene. Biochem. J. 355:45-49.
Weigel, D., J. Alvarez, D. R. Smyth, M. F. Yanofsky, and E. M. Meyerowitz. 1992. LEAFY controls floral meristem identity in Arabidopsis. Cell 69:843-859.
White, S. E., and J. F. Doebley. 1999. The molecular evolution of terminal ear1, a regulatory gene in the genus Zea. Genetics 153:1455-1462.
Wright, F. 1990. The effective number of codons used in a gene. Gene 87:23-29.
Yang, Z. 1997. PAML: a program package for phylogenetic analysis by maximum likelihood. CABIOS 13:555-556.
Yang, Z. 1998. Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods. J. Mol. Evol. 39:306-314.
Yang, Z., R. Nielsen, N. Goldman, and A. M. Pedersen. 2000. Codon-substitution models for heterogeneous selection pressure at amino acid sites. Genetics 155:431-449.
Yoon, H. S., and D. Baum. 2004. Transgenic study of parallelism in plant morphological evolution. Proc. Natl. Acad. Sci. USA 101:6524-6529.
Zhang, L., S. Kosakovsky Pond, and B. S. Gaut. 2001. A survey of the molecular evolutionary dynamics of twenty-five multigene families from four grass taxa. J. Mol. Evol. 52:144-156.
Zhu, B. Y., N. E. Zhou, C. M. Kay, and R. S. Hodges. 1993. Packing and hydrophobicity effects on protein folding and stability: effects of beta-branched amino acids, valine and isoleucine, on the formation and stability of two-stranded alpha-helical coiled coils/leucine zippers. Protein Sci. 2:383-394.