*Department of Ecology and Evolutionary Biology, University of California, Irvine;
Department of Biology, University of North Carolina at Chapel Hill
Abstract
Introduction
It is even more difficult to identify causes of synonymous rate variation among genes, but at least three molecular characteristics correlate with synonymous substitution rates. The first characteristic is GC content (Ticher and Graur 1989), which affects synonymous substitution rates because CpG dinucleotides are subject to high rates of mutation (Tsunoyama, Bellgard, and Gojobori 2001). The second characteristic is codon bias. Highly biased genes evolve more slowly, on average (Sharp and Li 1987; Akashi 1994a, 2001; Eyre-Walker and Bulmer 1995), presumably because selection for codon use limits the number of acceptable synonymous nucleotide substitutions in highly biased genes. Finally, mutation rates can differ among genomic regions, contributing to variation in synonymous substitution rates among genes (Wolfe, Sharp, and Li 1989a; Matassi, Sharp, and Gautier 1999). However, none of these three factors alone consistently explains variation in synonymous substitution rates (Ticher and Graur 1989; Wolfe, Sharp, and Li 1989a; Akashi 1994b, 1997; Matassi, Sharp, and Gautier 1999), and the interdependence between these factors is not always clear.
An important prerequisite for understanding causes of rate variation is to characterize the distribution of rates among genes. To date, most studies of the distribution of synonymous and nonsynonymous substitution rates have been restricted to animals. In plants, several studies have examined rate variation across evolutionary lineages of mitochondrial and chloroplast genes (e.g., Gaut et al. 1992; dePamphilis, Young, and Wolfe 1997; Laroche, Maggia, and Bousquet 1997), but few studies have either characterized rate variation among plant nuclear genes or discerned the factors contributing to rate variation (Wolfe, Sharp, and Li 1989b; Alvarez-Valin et al. 1999). One possible reason for the dearth of plant studies is that most plant nuclear genes are members of multigene families. Copy number within multigene families fluctuates (Clegg, Cummings, and Durbin 1997), and as a result it is often difficult to identify orthologs between species and hence to compare substitution rates among genes.
The recently sequenced Arabidopsis thaliana genome offers a unique opportunity to study substitution rate variation across plant nuclear genes. Many large arabidopsis chromosomal segments are duplicated (Mayer et al. 1999; AGI 2000), and these segments contain genes that were likely duplicated contemporaneously. Genetic distances can be compared among pairs of duplicated sequences (or gene pairs). For this special case in which the time of duplication is equivalent for all gene pairs, comparing genetic distances among gene pairs is equivalent to comparing nucleotide substitution rates among gene pairs.
In addition to providing insight into rate variation among genes, sequence data from duplicated chromosomal regions also facilitate characterization of patterns of sequence divergence between paralogs. Divergence patterns between paralogs have been of interest since Ohno (1970) suggested that most adaptive changes occur after gene duplication (see also Ohta and Kimura 1973). At the molecular level, adaptive changes can be detected by measuring the ratio of nonsynonymous (Ka) to synonymous (Ks) substitution. A Ka/Ks ratio > 1 is strong evidence for positive selection having acted during sequence divergence, and a Ka/Ks ratio < 1 is consistent with purifying selection, although it does not rule out the possibility that positive selection acted. Although the sensitivity of Ka/Ks to detect positive selection can be low (Hughes 1999), the distribution of Ka/Ks among genes can be useful for characterizing the relative strengths of evolutionary forces acting on individual genes (Charlesworth, Charlesworth, and McVean 2001).
Sequence divergence can have other measurable effects. For example, shifts in selective constraint after duplication can lead to changes in rates of nucleotide substitution in one or both of the duplicated sequences (Goodman, Moore, and Matsuda 1975; Li and Gojobori 1983; Gonzalez and Jordan 2000). Similarly, paralogs often diverge in synonymous codon usage (Gaut et al. 1999; Zhang, Kosakovsky-Pond, and Gaut 2001b), either as a function of changes in mutation biases or as a function of shifts in gene expression (Fennoy and Bailey-Serres 1993; Duret and Mouchiroud 1999).
The large number of duplicated genes in A. thaliana permits unprecedented examination of rate variation among plant nuclear genes and also facilitates study of patterns of evolutionary divergence between paralogs. Here we characterize patterns of molecular evolution in a duplicated region between arabidopsis chromosomes 2 and 4. With data from 242 gene pairs, we address the following questions: (1) What is the distribution of synonymous and nonsynonymous substitution rates among gene pairs? (2) Are synonymous and nonsynonymous substitution rates a function of physical location, GC content, or codon usage? (3) Is there evidence for positive selection acting during the divergence of paralogs? (4) Is sequence divergence between paralogs accompanied by divergence in codon usage or evolutionary rate?
Materials and Methods
The chromosome 2-4 duplication was further partitioned into five blocks (numbered 45, 49, 52, 54, and 56 by Vision, Brown, and Tanksley 2000) that differ in their order and orientation relative to one another on the two chromosomes but show only minor within-block rearrangement (fig. 1). A minimum of four inversions is necessary to convert the order and orientation of blocks on one chromosome into that of the other. It is thought that these blocks arose from a single duplication event, both because of their suggestive spatial arrangement and based on a genome-wide analysis of the distribution of amino acid distances between duplicated genes (Vision, Brown, and Tanksley 2000). It should be emphasized, however, that our analyses do not assume that all blocks were duplicated at the same time, but we do assume that gene pairs within a single block were duplicated contemporaneously when they were in collinear order.
The accession numbers of the 484 sequences used in the analyses are available at bgbox.bio.uci.edu/data/lq2acn.html.
Alignment
Amino acid alignments were obtained with ClustalW (Thompson, Higgins, and Gibson 1994), using default parameters. Visual inspection revealed that the alignments around gaps were sometimes ambiguous, and we therefore analyzed two alignment data sets. In the first data set, the residues around gaps were included. In the second data set, five amino acids were removed on either side of each gap. The two sets of amino acid alignments were then back translated using the known coding sequences to produce DNA sequence alignments. Gap treatment did not qualitatively affect our results, and we therefore report only the results for the data set with the original alignments. The DNA sequence alignments are available at bgbox.bio.uci.edu/.
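The gap trimming and back translation described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the `flank` parameter, and the toy sequences are assumptions, and the coding sequence is assumed to be in frame and to match the ungapped protein residue for residue.

```python
def columns_to_keep(aln_a, aln_b, flank=5):
    """Return alignment columns that are not gaps and not within `flank`
    residues of a gap in either aligned protein (hypothetical helper)."""
    assert len(aln_a) == len(aln_b)
    n = len(aln_a)
    drop = [False] * n
    for i in range(n):
        if aln_a[i] == "-" or aln_b[i] == "-":
            for j in range(max(0, i - flank), min(n, i + flank + 1)):
                drop[j] = True
    return [i for i in range(n) if not drop[i]]


def back_translate(aln_protein, cds, keep=None):
    """Thread a coding sequence onto an aligned protein, codon by codon.
    Gap columns become '---'. If `keep` is given, only those columns are kept."""
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    out, k = [], 0
    for aa in aln_protein:
        if aa == "-":
            out.append("---")
        else:
            out.append(codons[k])
            k += 1
    if keep is not None:
        out = [out[i] for i in keep]
    return "".join(out)


# Toy example (invented sequences); flank=1 keeps the example short.
keep = columns_to_keep("MA-KL", "MAGKL", flank=1)
full = back_translate("MA-KL", "ATGGCTAAACTT")
trimmed = back_translate("MA-KL", "ATGGCTAAACTT", keep)
```

With `flank=5` the same functions reproduce the second data set's rule of removing five amino acids on either side of each gap.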
Sequence Analyses
We calculated the percent identity of DNA sequences and protein sequences for all gene pairs. The expected relationship between DNA and amino acid identity under strictly neutral evolution was obtained by simulation, using Evolver (Yang 1997). We simulated DNA sequences with distances ranging from 0.1 to 2.0 along distance intervals of 0.05 units, corresponding to the range of distances observed between duplicated genes in our data set. For each distance, 20 pairs of DNA sequences of 500 codons were simulated. For all simulated data, percent identity was calculated from DNA sequences and translated protein sequences.
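Percent identity for an aligned pair can be computed as in the sketch below. How gapped columns were treated is not stated here, so skipping any column with a gap in either sequence is an assumption, as are the function name and the toy sequences.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent of aligned, ungapped positions at which two sequences match.
    Works for DNA or protein alignments of equal length."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not pairs:
        return float("nan")
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


# Toy example: two synonymous differences lower DNA identity
# while the (hypothetical) translated proteins stay identical.
dna_id = percent_identity("ATGGCTAAA", "ATGGCAAAG")
prot_id = percent_identity("MAK", "MAK")
```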
Ks and Ka between duplicated sequences were estimated by the maximum likelihood (ML) method implemented in PAML (Yang 1997), using the codon model of Goldman and Yang (1994). We used a likelihood ratio (LR) comparison to test for a Ka/Ks ratio different from 1.0. To do this, two models were applied to the data: model 0 constrains the Ka/Ks ratio to 1.0, and model 1 estimates the Ka/Ks ratio as a free parameter. The LR statistic for model 0 versus model 1 was compared with the χ² distribution with one degree of freedom, as detailed by Yang (1998).
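The likelihood ratio comparison can be illustrated with a short calculation. The sketch assumes the usual statistic, twice the difference in maximized log-likelihood between the free-ratio and constrained models, and uses the identity that the upper tail of a χ² distribution with one degree of freedom equals erfc(sqrt(x/2)). The log-likelihood values are invented for illustration; in practice they would come from the two PAML runs.

```python
import math


def lrt_one_df(lnl_null, lnl_alt):
    """Likelihood ratio test with one degree of freedom.
    lnl_null: log-likelihood with Ka/Ks fixed at 1.0 (model 0).
    lnl_alt:  log-likelihood with Ka/Ks estimated freely (model 1)."""
    stat = max(2.0 * (lnl_alt - lnl_null), 0.0)
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical log-likelihoods, not values from this study.
stat, p_value = lrt_one_df(lnl_null=-2510.4, lnl_alt=-2495.1)
```

A significant result only says that Ka/Ks differs from 1.0; the direction is read from the model 1 estimate.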
Relative rate tests were performed with HYPHY (http://peppercat.statgen.ncsu.edu/hyphy/), for both protein sequences and DNA sequences. In order to apply the relative rate test, we obtained outgroup sequences for a large number of the gene pairs. Ku et al. (2000) proposed that the duplication between chromosomes 2 and 4 occurred after the split of the tomato and A. thaliana lineages. If this is true, then plants as divergent, or more divergent, from A. thaliana than tomato can be used as the source of outgroup sequences. The Rosidae are the largest phylogenetic grouping in the NCBI taxonomy that contains A. thaliana and excludes tomato. Accordingly, we searched for candidate outgroup sequences among all GenBank records derived from angiosperms, excluding the Rosidae. TBLASTX (Altschul et al. 1997) was used to find matches between the 242 genes and the DNA sequences in the database. We required that both arabidopsis genes showed a strong match (expected value < 1 × 10⁻¹⁰) to at least one of the sequences in the database for which a predicted or experimentally determined protein translation was available. In cases where multiple GenBank records met these criteria, the outgroup was selected from among the following NCBI-derived taxonomic groupings, in descending order of preference (and increasing taxonomic breadth): Asteridae (which includes tomato), core eudicots, eudicotyledons, and Magnoliophyta.
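The outgroup selection rule can be written down as a small filter and ranking step. The record layout, the field names, and the tie-breaking by E-value within a taxonomic grouping are assumptions made for this sketch; only the E-value cutoff, the protein-translation requirement, and the order of preference come from the description above.

```python
PREFERENCE = ["Asteridae", "core eudicots", "eudicotyledons", "Magnoliophyta"]
E_CUTOFF = 1e-10


def pick_outgroup(hits):
    """Choose one outgroup record for a gene pair.
    Each hit is a dict with hypothetical keys: 'accession', 'group',
    'evalue_a', 'evalue_b' (match to each paralog), and 'has_protein'."""
    eligible = [
        h for h in hits
        if h["has_protein"]
        and h["evalue_a"] < E_CUTOFF
        and h["evalue_b"] < E_CUTOFF
        and h["group"] in PREFERENCE
    ]
    if not eligible:
        return None
    # Prefer the narrowest grouping; break ties by the weaker of the two
    # E-values (the tie-break is an assumption, not stated in the text).
    return min(
        eligible,
        key=lambda h: (PREFERENCE.index(h["group"]),
                       max(h["evalue_a"], h["evalue_b"])),
    )


# Invented records for illustration only.
hits = [
    {"accession": "X1", "group": "Magnoliophyta", "evalue_a": 1e-40,
     "evalue_b": 1e-35, "has_protein": True},
    {"accession": "X2", "group": "Asteridae", "evalue_a": 1e-20,
     "evalue_b": 1e-5, "has_protein": True},
    {"accession": "X3", "group": "core eudicots", "evalue_a": 1e-30,
     "evalue_b": 1e-25, "has_protein": True},
]
best = pick_outgroup(hits)
```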
Each relative rate test was based on one gene pair and its outgroup. For DNA sequences, synonymous and nonsynonymous substitution rates were tested separately using the codon substitution model of Muse and Gaut (1994). For amino acid sequences, we used the substitution model of Dayhoff, Schwartz, and Orcutt (1978). The relative rate test uses an LR statistic to test the null hypothesis that two sequences evolve at equal rates (Muse and Weir 1992).
The effective number of codons (ENC) and GC content at synonymous third codon positions (GCS) were calculated using CodonW (http://www.molbiol.ox.ac.uk/cu/). We used the ENC as a measure of codon usage because it is not biased by gene length, given a certain minimum length, or by amino acid composition (Wright 1990; Comeron and Aguade 1998). ENC values range from 20 to 61; a value of 20 represents extreme bias, and a value of 61 indicates random codon use. We tested for homogeneity of ENC and GCS between duplicated sequences by a permutation procedure described previously (Zhang, Kosakovsky-Pond, and Gaut 2001b), with the slight modification that the test statistic was the difference in ENC (or GCS) between sequences rather than the variance in ENC (or GCS) among sequences. Test statistics were based on 1000 permutations.
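A rough sketch of a GCS calculation and a permutation test on the GCS difference is given below. It is not the published procedure: excluding Met, Trp, and stop codons from GCS follows common practice and is assumed here, and the permutation scheme (swapping the two codons at each aligned position with probability 1/2) is an assumed stand-in for the method of Zhang, Kosakovsky-Pond, and Gaut (2001b). Sequences are assumed to be uppercase, in frame, and aligned codon by codon.

```python
import random

# Codons with no synonymous alternative, plus stop codons (assumed exclusions).
EXCLUDED = {"ATG", "TGG", "TAA", "TAG", "TGA"}


def split_codons(seq):
    """Split an in-frame sequence into codons, dropping any trailing partial codon."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def gc3s(codon_list):
    """Fraction of G or C at third positions of codons with synonymous alternatives."""
    thirds = [c[2] for c in codon_list if c not in EXCLUDED and "-" not in c]
    if not thirds:
        return float("nan")
    return sum(base in "GC" for base in thirds) / len(thirds)


def permutation_test(seq_a, seq_b, n_perm=1000, seed=0):
    """Permutation p-value for the absolute GCS difference between two
    codon-aligned sequences (assumed scheme, see lead-in)."""
    rng = random.Random(seed)
    a, b = split_codons(seq_a), split_codons(seq_b)
    observed = abs(gc3s(a) - gc3s(b))
    hits = 0
    for _ in range(n_perm):
        perm_a, perm_b = [], []
        for ca, cb in zip(a, b):
            if rng.random() < 0.5:
                ca, cb = cb, ca
            perm_a.append(ca)
            perm_b.append(cb)
        if abs(gc3s(perm_a) - gc3s(perm_b)) >= observed - 1e-12:
            hits += 1
    return observed, hits / n_perm
```

The same skeleton would apply to ENC by replacing `gc3s` with an ENC function; ENC itself is left out because its full definition is longer than this sketch allows.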
Spatial correlation of Ka and Ks
To examine the relationship between evolutionary distance and physical position, we performed a spatial correlation analysis. The following statistic (Chatfield 1999, p. 20) was calculated for a range of distances between gene pairs, where distance was defined as the number of intervening genes between gene pairs:
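The equation itself is not reproduced in this copy of the text. As a working assumption, the sketch below uses the standard sample autocorrelation coefficient at lag k from time-series analysis, r_k = Σ_{t=1}^{N-k} (x_t − x̄)(x_{t+k} − x̄) / Σ_{t=1}^{N} (x_t − x̄)², applied to Ka or Ks values ordered by chromosomal position. Whether this matches the authors' exact statistic, and how lag maps onto the count of intervening genes, cannot be confirmed from the text here.

```python
def autocorrelation(x, k):
    """Sample autocorrelation at lag k for values ordered along the chromosome.
    Assumed form: sum over t of (x[t]-mean)*(x[t+k]-mean), divided by the
    total sum of squared deviations."""
    n = len(x)
    if not 0 <= k < n:
        raise ValueError("lag must satisfy 0 <= k < len(x)")
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k))
    return num / denom


# Invented Ks-like values in gene order, for illustration only.
ks_values = [0.62, 0.71, 0.58, 0.90, 0.85, 0.77, 1.02, 0.66]
r_by_lag = {k: autocorrelation(ks_values, k) for k in range(1, 4)}
```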
Results
Over the whole region, 58% (141/242) of gene pairs had higher DNA than protein identity, and DNA sequence identity was always higher than protein sequence identity when amino acid identity was greater than 80% (fig. 3). To determine the relationship between DNA and protein identities relative to a model of strictly neutral evolution, we simulated pairs of gene sequences under neutrality. For simulations, Ka/Ks was set to 1.0; κ, the transition:transversion parameter, was set to 3.0; and the codon frequency matrix was based on tabulated frequencies from 14,647,315 A. thaliana codons (http://www.kazusa.or.jp/codon/). The simulations indicated that DNA sequence identity was always higher than protein sequence identity under the neutral model (fig. 3), and the result was qualitatively similar with different transition:transversion ratios (κ = 2, 3, or 4; data not shown). In our arabidopsis data set, amino acid sequence identity was always higher than that for simulated gene pairs with equivalent DNA sequence identity (fig. 3). Thus, selective constraint appears to be slowing the pace of nonsynonymous nucleotide substitution for all the gene pairs in our data set.
We tested the relationship between substitution rate and physical distance with two methods. First, we applied a spatial autocorrelation test to both Ka and Ks. There were no significant results with Ka, but autocorrelation for Ks was greater than expected at the 5% significance level for some distances; for example, genes separated by 10 genes are more highly correlated than expected at random (fig. 4). We should note, however, that Ks autocorrelation statistics do not demonstrate significantly high autocorrelation in neighboring genes, as expected under the hypothesis that physical location is a contributing factor to variation in synonymous substitution rates among genes. Thus, the Ks results are difficult to interpret but suggest only weakly that physical location contributes to synonymous rate variation among genes. Second, we applied the permutation method of Williams and Hurst (2000) to test the null hypothesis that rates are random with respect to physical location. This test also failed to reject the null hypothesis for Ka (P = 0.66), but the test was borderline significant for Ks (P = 0.067). Thus, both tests provide suggestive, but inconclusive, evidence that there could be a relationship between Ks and physical distance.
Divergence in Evolutionary Rate and Codon Usage Between Paralogs
We performed relative rate tests to examine whether sequences within a gene pair evolve at different evolutionary rates. Relative rate tests require an outgroup for each gene pair. Using the search strategy described in Materials and Methods, we identified putative outgroup sequences for 105 gene pairs. To examine the validity of these outgroup assignments, we examined the three pairwise genetic distances for each gene pair and its outgroup. In all 105 cases, paralogs had a smaller distance (as measured by both Ka and Ks) to one another than to the outgroup sequence (data not shown), suggesting that our outgroup assignments were reasonable.
Few paralogs demonstrated deviation from clock-like evolution after divergence. Relative rate tests using protein sequences resulted in 14 significant tests at P < 0.05; three remained significant after Bonferroni correction for an experiment-wide error of α = 0.05. The relative rate tests for nonsynonymous substitution rates provided similar results, with 24 significant gene pairs; three remained significant after Bonferroni correction. Ten of the 14 gene pairs that were significant for protein sequences were also significant for Ka-based tests, showing that results were reasonably consistent between methods. For Ks, six of 105 gene pairs were significant at P < 0.05, but only one remained significant after Bonferroni correction.
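To make the Bonferroni step concrete, the sketch below divides the experiment-wide error rate by the number of tests. Using 105 as the number of tests (one per gene pair with an outgroup) is an assumption, since the divisor is not stated explicitly, and the P values are invented.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Return the per-test threshold and a flag per test after Bonferroni correction."""
    threshold = alpha / len(p_values)
    return threshold, [p <= threshold for p in p_values]


# Hypothetical P values for 105 relative rate tests.
p_values = [0.0001] + [0.03] * 104
threshold, flags = bonferroni_significant(p_values)
```

Under this assumption the per-test threshold is 0.05/105, roughly 0.00048, so a test that is nominally significant at P = 0.03 does not survive correction.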
To further characterize patterns of sequence divergence between duplicated genes, we measured both GCS and ENC for all gene pairs. Table 3 provides average ENC and GCS for both chromosomes and all blocks. We applied paired t-tests to determine whether the two chromosomes differ significantly for either measure; no significant difference was detected between chromosomes for either ENC (t = 0.478, P = 0.633) or GCS (t = -1.591, P = 0.112). This result did not preclude the possibility that any two paralogous sequences had diverged significantly in codon usage. To test this possibility, we applied a permutation test of homogeneous ENC (or GCS) to each of the 242 gene pairs (Zhang, Kosakovsky-Pond, and Gaut 2001b). No paralogs exhibited significant difference in either measure after Bonferroni correction (data not shown), and thus there has been no detectable divergence in codon usage or GC content after duplication.
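The paired t-test between chromosomes works on within-pair differences, as in this minimal sketch. The function name and the ENC values are invented; converting t to a P value needs a t distribution, which is left out to keep the example dependency-free.

```python
import math
import statistics


def paired_t(x, y):
    """Paired t statistic and degrees of freedom for matched observations,
    e.g. ENC of the chromosome 2 copy versus the chromosome 4 copy."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Hypothetical ENC values for four gene pairs.
t_stat, df = paired_t([52.1, 49.8, 55.0, 50.3], [51.7, 50.2, 54.1, 50.9])
```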
Discussion
Our study differs from previous studies of rate variation among genes in that we used paralogous, rather than orthologous, sequences to estimate substitution rates. One potential complication is that gene conversion can occur between paralogs. If gene conversion has occurred in some gene pairs but not others, the net effect is to increase the range and variance of Ka and Ks among gene pairs. We tested for the presence of gene conversion in all gene pairs with Sawyer's (1989) method but found no evidence for conversion between paralogs (data not shown). We should note, however, that most tests for gene conversion detect events that affect a portion of a gene rather than a complete gene.
Despite the tendency of gene conversion to inflate the range of substitution rates, we found Ks ranges similar to previous studies. For example, Ks ranged 15-fold in a study of 24 Drosophila orthologs (Zeng et al. 1998) and 20-fold in a study of 363 rat-mouse orthologs (Wolfe and Sharp 1993). In our study, synonymous substitution rates varied up to 10.4-fold within an individual block and 13.8-fold over the entire data set (table 1 and fig. 2).
The distribution of Ka and Ks from arabidopsis gene pairs differs from previous studies of non-plant taxa in two noteworthy ways. First, the ratio of mean synonymous to mean nonsynonymous substitution rates is relatively low. The ratio is 5 for our arabidopsis data (table 1), but it is as high as 24 for bacterial genes (Sharp 1991) and approximately 7 for rat-mouse comparisons (Wolfe and Sharp 1993), based on comparisons between orthologous sequences. Differences in this ratio may reflect different population sizes and life histories (Sharp 1991), but may also reflect differences between ortholog and paralog comparisons. For example, Kondrashov et al. (2002) recently found that the ratio of synonymous to nonsynonymous substitution is approximately twofold lower between paralogs than orthologs. Thus, the forces affecting this ratio remain unclear at present. Second, the ratio of coefficients of variation (CVs) between nonsynonymous and synonymous rates is lower for this study than previous studies. In rat-mouse comparisons, for example, the CV for nonsynonymous rates was fourfold higher than the CV for synonymous rates (Wolfe and Sharp 1993). Among arabidopsis gene pairs, there is only a 1.5-fold difference between nonsynonymous and synonymous CVs (table 1), indicating that synonymous and nonsynonymous rates vary among genes to roughly the same extent.
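The CV comparison can be reproduced from per-gene-pair estimates along these lines. The Ka and Ks values are invented, and using the sample standard deviation (n - 1 denominator) is an assumption, since the convention is not stated here.

```python
import statistics


def coefficient_of_variation(values):
    """Standard deviation divided by the mean (sample SD assumed)."""
    return statistics.stdev(values) / statistics.mean(values)


# Hypothetical estimates for five gene pairs.
ka = [0.05, 0.12, 0.20, 0.31, 0.09]
ks = [0.55, 0.80, 0.72, 1.10, 0.64]
cv_ratio = coefficient_of_variation(ka) / coefficient_of_variation(ks)
```

Because the CV scales dispersion by the mean, it allows Ka and Ks to be compared even though their means differ severalfold.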
Our analyses represent the most extensive comparison of rates among plant nuclear genes to date. Some previous studies have been based on a much smaller number of sequence comparisons. For example, Wolfe, Sharp, and Li (1989b) and Gaut (1998) found that synonymous rates vary up to 2.5-fold among genes, but these studies were based on only 11 and nine nuclear genes, respectively. A more recent study made 212 orthologous sequence comparisons between arabidopsis and Brassica rapa (Tiffin and Hahn 2002), but relied on EST data, which could bias results depending on the accuracy of EST data and the gene regions sequenced. Nonetheless, Tiffin and Hahn (2002) described two rate characteristics that are similar to our results: (1) an approximately 24-fold range of synonymous rate variation among genes and (2) a twofold difference in CV between nonsynonymous and synonymous rates. Together these two plant studies indicate that synonymous rates vary at least an order of magnitude among plant nuclear genes and also that variation in nonsynonymous and synonymous rates is relatively similar among genes, based on the CV. It is clear, however, that a general picture of plant nuclear gene evolution requires additional studies, particularly given the fact that different plant lineages evolve with different mutational patterns (Tiffin and Hahn 2002) and different rates (e.g., Eyre-Walker and Gaut 1997).
With the increasing availability of genome sequences, several studies have used genetic distances (either Ka or Ks) to make genome-wide inferences about the timing and frequency of gene and genome duplication (Gaut and Doebley 1997; Lynch and Conery 2000; Vision, Brown, and Tanksley 2000; Friedman and Hughes 2001). For example, Lynch and Conery (2000) used Ks values between duplicated arabidopsis sequences as a proxy for divergence time; in essence they assumed that substitution rates were homogeneous among gene pairs. Their use of Ks has come under criticism (Long and Thornton 2001; Zhang, Gaut, and Vision 2001a), but this study provides quantitative insights about the degree to which the assumption can be misleading. In a worst-case scenario, Ks provides time estimates that differ up to 14-fold (table 1). However, the average effect of the homogeneous rate assumption is not nearly as dramatic because approximately 90% of the gene pairs in this study fall within a Ks range of 0.462-1.188 (fig. 2), and thus most genes (approximately 90%) vary less than 2.57-fold in rate. Overall, however, our study indicates that Ks variation among genes is generally higher than previously reported for plants (Wolfe, Sharp, and Li 1989b; Gaut 1998), and thus one should be cautious when equating Ks with time.
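The 2.57-fold figure follows directly from the bounds of the Ks range, 1.188/0.462. The sketch below also shows why the same factor carries over to divergence times when a single rate is assumed: for paralogs, Ks accumulates along two lineages, so time is taken as Ks/(2r). The substitution rate used is a placeholder chosen only for illustration, not an estimate from this study.

```python
def fold_range(low, high):
    """Ratio of the largest to the smallest value in a range."""
    return high / low


def implied_time(ks, rate_per_site_per_year):
    """Divergence time under a homogeneous-rate assumption: Ks = 2 * r * t."""
    return ks / (2.0 * rate_per_site_per_year)


fold = fold_range(0.462, 1.188)

# Placeholder rate (hypothetical); the ratio of times does not depend on it.
rate = 1.5e-8
time_ratio = implied_time(1.188, rate) / implied_time(0.462, rate)
```

Two gene pairs duplicated at the same moment but sitting at opposite ends of this range would therefore be dated about 2.57-fold apart under a single-rate clock.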
Factors Contributing to Rate Variation Among Gene Pairs
What are the forces that contribute to variation in evolutionary rates among gene pairs? The first possibility, discussed earlier, is gene conversion. Although we did not detect gene conversion in any of our gene pairs, the possibility of gene conversion should not be ignored. However, the fact that the range of variation found in our 242 gene pairs is similar to that of orthologous gene pairs from other studies (rat-mouse, drosophilids and Arabidopsis-Brassica; see earlier) suggests that the effects of gene conversion have been negligible. In addition, a similar study of yeast paralogs found no evidence of gene conversion, suggesting gene conversion may not be widespread (Pál, Papp, and Hurst 2001).
The origin of duplicated chromosomes can also affect the variance in rates across gene pairs. If chromosomal duplication originated via a polyploid event, the original gene pairs contained varied levels of residual standing variation (polymorphism) at the onset of duplication. Standing variation contributes to variation in genetic distances among gene pairs (discussed in Gaut and Doebley 1997). However, for highly diverged gene pairs like those we have studied here, the level of standing variation at the time of origin should be very low relative to the total divergence between paralogs. Hence the origin effect probably contributes little to the observed variance in substitution rates among the 242 arabidopsis gene pairs.
Physical location is a third potential contributor to rate variation among genes. For example, Williams and Hurst (2000) found that nonsynonymous substitution rates between mouse and rat vary as a function of genome location. Similarly, Matassi, Sharp, and Gautier (1999) documented location effects on synonymous substitution rates in human-rodent comparisons. Both of these effects likely reflect variable mutation rates along chromosomes (Casane et al. 1997; Lercher, Williams, and Hurst 2001). We examined the relationship between physical location and nucleotide substitution rates with two methods, and neither analysis provided evidence that physical location affects nonsynonymous substitution rates. In contrast, there is a hint that physical location and synonymous substitution rates are correlated, but the effect, if present, is weak. It should be noted that previous studies focused on whole genomes, with neighboring genes up to 5 cM apart (Matassi, Sharp, and Gautier 1999; Williams and Hurst 2000; Lercher, Williams, and Hurst 2001), which is roughly 10 Mb in humans (Williams and Hurst 2000). In contrast, our entire study focused on a region of 5 Mb, with neighboring genes separated by only a few kb. If the physical scale that affects substitution rates is large, our analyses could miss location effects.
Variation in selective constraint among proteins also contributes to rate variation among gene pairs. Deviation from a molecular clock is one potential measure of variation in selective constraint between paralogs, but on the whole our analyses uncovered little deviation from clock-like evolution. With some caveats, Ka/Ks can also be used for characterizing the relative strength of evolutionary forces acting on individual gene pairs. For the 242 gene pairs in our study, Ka/Ks ranged from 0.0 to 0.70, with a mean of 0.20, suggesting extensive variation in selective constraint among gene pairs. However, Ka/Ks never exceeded 1.0 (table 1 and fig. 3), and there is thus no evidence that positive selection has driven divergence between any of the paralogs on the duplicated regions of chromosomes 2 and 4. Altogether, variation in selective constraint, but not positive selection, likely contributes to variation in substitution rates among arabidopsis gene pairs.
Both GC content and codon bias have been shown to correlate with nucleotide substitution rates (Moriyama and Gojobori 1992; Alvarez-Valin et al. 1999), and hence GC content and codon bias may also contribute to nucleotide substitution rate variation among gene pairs. We detected no correlations between GCS and Ks in arabidopsis gene pairs, but did find a negative correlation between GCS and Ka. The phenomena underlying the negative correlation are unclear, but the lack of correlation between GCS and Ks is not surprising given the relatively homogeneous base composition of the A. thaliana genome (Barakat, Matassi, and Bernardi 1998; AGI 2000). We also found a correlation between codon bias and Ks that is similar to that documented in other species (Sharp and Li 1987; Moriyama and Hartl 1993, pp. 847-858; Akashi 1994b). This correlation is consistent with strong codon bias (low ENC) limiting acceptable synonymous changes and thus retarding substitution rates.
One interesting feature of codon bias is that it often also correlates with gene expression level; i.e., highly expressed genes are more biased (Gouy and Gautier 1982; Duret and Mouchiroud 1999; Coghlan and Wolfe 2000). Surprisingly, highly expressed genes also evolve with relatively low Ka (Duret and Mouchiroud 2000). Both of these observations suggest that yet another factor, gene expression, contributes to variation in Ka and perhaps Ks among genes. To briefly explore this possibility with the arabidopsis gene pairs, we employed the database procedures of Duret and Mouchiroud (2000) and counted the number of BLAST hits for each gene pair to an arabidopsis EST database (http://www.kazusa.or.jp/en/plant/arabi/EST/). We then regarded the number of hits as a rough approximation of the expression level of a gene pair (this approximation is particularly rough given the diverse origin and preparation methods of cDNA libraries used for EST sequencing), and compared this expression level with evolutionary rates. Based on pairwise correlations among Ka, Ks and the number of EST hits, there is a significantly negative partial correlation between expression and Ka (r = -0.172, P = 0.007) but no significant correlation between expression and Ks (r = -0.037, P = 0.57). More detailed analyses require better expression data, but this significant negative correlation is consistent with recent ideas that evolutionary rates may be either proximally or secondarily a function of gene expression (reviewed in Akashi 2001).
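A partial correlation can be derived from the three pairwise correlations, as in the sketch below, which uses the standard first-order formula r_xy.z = (r_xy - r_xz r_yz) / sqrt((1 - r_xz²)(1 - r_yz²)). Treating expression as x, Ka as y, and Ks as the controlled variable z is an assumption about how the analysis was set up, and the input values are invented rather than taken from this study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between x and y after controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical data: EST hit counts, Ka, and Ks for six gene pairs.
est_hits = [1, 4, 0, 9, 2, 6]
ka = [0.30, 0.15, 0.28, 0.08, 0.22, 0.12]
ks = [0.70, 0.85, 0.66, 0.90, 0.75, 0.60]

r_partial = partial_correlation(
    pearson(est_hits, ka), pearson(est_hits, ks), pearson(ka, ks)
)
```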
Acknowledgements
Footnotes
Address for correspondence and reprints: Brandon S. Gaut, Department of Ecology and Evolutionary Biology, 321 Steinhaus Hall, University of California, Irvine, California 92697-2525. E-mail: bgaut{at}uci.edu
Keywords: nucleotide substitution rates; positive selection; codon usage; regional mutation
References
AGI (Arabidopsis Genome Initiative). 2000. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408:796-815.
Akashi H. 1994a. Inferring weak selection from patterns of polymorphism and divergence at "silent" sites in Drosophila DNA. Genetics 139:1067-1076.
Akashi H. 1994b. Synonymous codon usage in Drosophila melanogaster: natural selection and translational accuracy. Genetics 136:927-935.
Akashi H. 1997. Codon bias evolution in Drosophila. Population genetics of mutation-selection drift. Gene 205:269-278.
Akashi H. 2001. Gene expression and molecular evolution. Curr. Opin. Genet. Dev. 11:660-666.
Altschul S. F., T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller, D. J. Lipman. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25:3389-3402.
Alvarez-Valin F., K. Jabbari, N. Carels, G. Bernardi. 1999. Synonymous and nonsynonymous substitutions in genes from Gramineae: intragenic correlations. J. Mol. Evol. 49:330-342.
Barakat A., G. Matassi, G. Bernardi. 1998. Distribution of genes in the genome of Arabidopsis thaliana and its implications for the genome organization of plants. Proc. Natl. Acad. Sci. USA 95:10044-10049.
Bird A. P. 1980. DNA methylation and the frequency of CpG in animal DNA. Nucleic Acids Res. 8:1499-1504.
Casane D., S. Boissinot, B. H. J. Chang, L. C. Shimmin, W. H. Li. 1997. Mutation pattern variation among regions of the primate genome. J. Mol. Evol. 45:216-226.
Charlesworth D., B. Charlesworth, G. McVean. 2001. Genome sequences and evolutionary biology, a two-way interaction. TREE 16:235-242.
Chatfield C. 1999. The analysis of time series: an introduction. Chapman and Hall, London.
Clegg M. T., M. P. Cummings, M. L. Durbin. 1997. The evolution of plant nuclear genes. Proc. Natl. Acad. Sci. USA 94:7791-7798.
Coghlan A., K. H. Wolfe. 2000. Relationship of codon bias to mRNA concentration and protein length in Saccharomyces cerevisiae. Yeast 16:1131-1145.
Comeron J. M., M. Aguade. 1998. An evaluation of measures of synonymous codon usage bias. J. Mol. Evol. 47:268-274.
Dayhoff M. O., R. M. Schwartz, B. C. Orcutt. 1978. A model for evolutionary change in proteins. Pp. 345-352 in M. O. Dayhoff, ed. Atlas of protein sequence and structure. National Biochemical Research Foundation, Washington, D.C.
dePamphilis C. W., N. D. Young, A. D. Wolfe. 1997. Evolution of plastid gene rps2 in a lineage of hemiparasitic and holoparasitic plants: many losses of photosynthesis and complex patterns of rate variation. Proc. Natl. Acad. Sci. USA 94:7367-7372.
Duret L., D. Mouchiroud. 1999. Expression pattern and, surprisingly, gene length shape codon usage in Caenorhabditis, Drosophila and Arabidopsis. Proc. Natl. Acad. Sci. USA 96:4482-4487.
Duret L., D. Mouchiroud. 2000. Determinants of substitution rates in mammalian genes: expression pattern affects selection intensity but not mutation rate. Mol. Biol. Evol. 17:68-85.
Eyre-Walker A., M. Bulmer, 1995 Synonymous substitution rates in enterobacteria Genetics 140:1407-1412
Eyre-Walker A., B. S. Gaut, 1997 Correlated rates of synonymous site evolution among plant genomes Mol. Biol. Evol 14:455-460[Abstract]
Fennoy S. L., J. Bailey-Serres, 1993 Synonymous codon usage in Zea mays L. nuclear genes is varied by levels of C and G-ending codons Nucleic Acids Res 21:5294-5300[Abstract]
Friedman R., A. L. Hughes, 2001 Gene duplication and the structure of eukaryotic genomes Genome Res 11:373-381
Gaut B. S., 1998 Molecular clocks and nucleotide substitution rates in higher plants Evol. Biol 30:93-120
Gaut B. S., J. F. Doebley, 1997 DNA sequence evidence for the segmental allotetraploid origin of maize Proc. Natl. Acad. Sci. USA 94:6809-6814
Gaut B. S., S. V. Muse, W. D. Clark, M. T. Clegg, 1992 Relative rates of nucleotide substitution at the rbcL locus of monocotyledonous plants J. Mol. Evol 35:292-303
Gaut B. S., A. S. Peek, B. R. Morton, M. T. Clegg, 1999 Patterns of genetic diversification within the Adh gene family in the grasses (Poaceae) Mol. Biol. Evol 16:1086-1097
Goldman N., Z. H. Yang, 1994 Codon-based model of nucleotide substitution for protein coding DNA sequences Mol. Biol. Evol 11:725-736
Gonzalez D. S., I. K. Jordan, 2000 The alpha-mannosidases: phylogeny and adaptive diversification Mol. Biol. Evol 17:292-300
Goodman M., G. W. Moore, G. Matsuda, 1975 Darwinian evolution in the genealogy of hemoglobin Nature 253:603-608
Gouy M., C. Gautier, 1982 Codon usage in bacteria: correlation with gene expressivity Nucleic Acids Res 10:7055-7074
Gusfield D., 1997 Algorithms on strings, trees, and sequences Cambridge University Press, Cambridge, Mass
Hughes A. L., 1999 Adaptive evolution of genes and genomes Oxford University Press, Oxford
Kondrashov F. A., I. B. Rogozin, Y. I. Wolf, E. V. Koonin, 2002 Selection in the evolution of gene duplications Genome Biol 3:1-9
Ku H.-M., T. Vision, J. Liu, S. D. Tanksley, 2000 Comparing sequenced segments of the tomato and Arabidopsis genomes: large-scale duplication followed by selective gene loss creates a network of synteny Proc. Natl. Acad. Sci. USA 97:9121-9126
Laroche J., P. Li, L. Maggia, J. Bousquet, 1997 Molecular evolution of angiosperm mitochondrial introns and exons Proc. Natl. Acad. Sci. USA 94:5722-5727
Lercher M. J., E. J. B. Williams, L. D. Hurst, 2001 Local similarity in evolutionary rates extends over whole chromosomes in human-rodent and mouse-rat comparisons: implications for understanding the mechanistic basis of the male mutation bias Mol. Biol. Evol 18:2032-2039
Li W. H., T. Gojobori, 1983 Rapid evolution of goat and sheep globin genes following gene duplication Mol. Biol. Evol 1:94-108
Lin X. Y., S. S. Kaul, S. Rounsley, et al. (37 co-authors) 1999 Sequence and analysis of chromosome 2 of the plant Arabidopsis thaliana Nature 402:761-768
Long M. Y., K. Thornton, 2001 Gene duplication and evolution Science 293:U1.
Lynch M., J. S. Conery, 2000 The evolutionary fate and consequences of duplicate genes Science 290:1151-1155
Matassi G., P. M. Sharp, C. Gautier, 1999 Chromosomal location effects on gene sequence evolution in mammals Curr. Biol 9:786-791
Mayer K., C. Schuller, R. Wambutt, et al. (240 co-authors) 1999 Sequence and analysis of chromosome 4 of the plant Arabidopsis thaliana Nature 402:769-777
Moriyama E. N., T. Gojobori, 1992 Rates of synonymous substitution and base composition of nuclear genes in Drosophila Genetics 130:855-864
Moriyama E. N., D. L. Hartl, 1993 Codon usage bias and base composition of nuclear genes in Drosophila Genetics 134.
Muse S. V., B. S. Gaut, 1994 A likelihood approach for comparing synonymous and nonsynonymous nucleotide substitution rates, with application to the chloroplast genome Mol. Biol. Evol 11:715-724
Muse S. V., B. S. Weir, 1992 Testing for equality of evolutionary rates Genetics 132:269-276
Nei M., 1987 Molecular evolutionary genetics Columbia University Press, New York
Ohno S., 1970 Evolution by gene duplication Springer-Verlag, Heidelberg
Ohta T., M. Kimura, 1973 Model of mutation appropriate to estimate the number of electrophoretically detectable alleles in a finite population Genet. Res 22:201-204
Pál C., B. Papp, L. D. Hurst, 2001 Highly expressed genes in yeast evolve slowly Genetics 158:927-931
Sawyer S., 1989 Statistical tests for detecting gene conversion Mol. Biol. Evol 6:526-538
Sharp P. M., 1991 Determinants of DNA sequence divergence between Escherichia coli and Salmonella typhimurium: codon usage, map position, and concerted evolution J. Mol. Evol 33:23-33
Sharp P. M., W.-H. Li, 1987 The rate of synonymous substitution in enterobacterial genes is inversely related to codon usage bias Mol. Biol. Evol 4:222-230
Terryn N., L. Heijnen, A. De Keyser, et al. (21 co-authors) 1999 Evidence for an ancient chromosomal duplication in Arabidopsis thaliana by sequencing and analyzing a 400-kb contig at the APETALA2 locus on chromosome 4 FEBS Lett 445:237-245
Thompson J. D., D. G. Higgins, T. J. Gibson, 1994 ClustalW: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice Nucleic Acids Res 22:4673-4680
Ticher A., D. Graur, 1989 Nucleic acid composition, codon usage, and the rate of synonymous substitution in protein-coding genes J. Mol. Evol 28:286-298
Tiffin P., W. M. Hahn, 2002 Coding sequence divergence between two closely related plant species: Arabidopsis thaliana and Brassica rapa ssp. pekinensis J. Mol. Evol 54:746-753
Tsunoyama K., M. I. Bellgard, T. Gojobori, 2001 Intragenic variation of synonymous substitution rates is caused by nonrandom mutations at methylated CpG J. Mol. Evol 53:456-464
Vision T. J., D. G. Brown, S. D. Tanksley, 2000 The origins of genomic duplication in the Arabidopsis genome Science 290:2114-2117
Williams E. J., L. D. Hurst, 2000 The proteins of linked genes evolve at similar rates Nature 407:900-903
Wolfe K. H., P. M. Sharp, 1993 Mammalian gene evolution: nucleotide sequence divergence between mouse and rat J. Mol. Evol 37:441-456
Wolfe K. H., P. M. Sharp, W.-H. Li, 1989a. Mutation rates differ among regions of the mammalian genome Nature 337:283-285
Wolfe K. H., P. M. Sharp, W.-H. Li, 1989b. Rates of synonymous substitution in plant nuclear genes J. Mol. Evol 29:208-211
Wright F., 1990 The 'effective number of codons' used in a gene Gene 87:23-29
Yang Z., 1997 PAML: a program package for phylogenetic analysis by maximum likelihood CABIOS 13:555-556
Yang Z., 1998 Likelihood ratio tests for detecting positive selection and application to primate lysozyme evolution Mol. Biol. Evol 15:568-573
Zeng L.-W., J. M. Comeron, B. Chen, M. Kreitman, 1998 The molecular clock revisited: the rate of synonymous versus replacement change in Drosophila Genetica 102/103:369-382
Zhang L. Q., B. S. Gaut, T. J. Vision, 2001a. Gene duplication and evolution Science 293:U1-U2
Zhang L. Q., S. Kosakovsky-Pond, B. S. Gaut, 2001b. A survey of the molecular evolutionary dynamics of twenty-five multigene families from four grass taxa J. Mol. Evol 52:144-156