National Center for Biotechnology Information, National Institutes of Health, Bethesda, Maryland
Correspondence: E-mail: koonin{at}ncbi.nlm.nih.gov.
Key Words: Gene expression; Human evolution; Natural selection; Network; Self-organization; Substitution rate
Introduction
The agent that has probably been invoked most often to explain the ordering of biological systems over time is natural selection (Darwin 1859; Li 1997). Genome-scale studies of natural selection have characterized many of the factors that modulate the effects of selection on the evolution of gene sequences. Such surveys rely on comparisons between evolutionary rates, which yield information about the action of natural selection, and various quantifiable functional genomic parameters. For instance, several studies have demonstrated a relationship between gene evolutionary rates and the fitness effects associated with gene knockouts. Genes with greater fitness effects (e.g., essential genes) seem to evolve more slowly, on average, than genes with smaller fitness effects (Hirsh and Fraser 2001; Jordan et al. 2002). This is taken to suggest that essential genes evolve under stronger functional constraints, and thus under a more severe purifying selection regime, than nonessential genes. Similarly, genes that encode proteins involved in numerous protein-protein interactions have been reported to be more evolutionarily conserved than genes encoding less prolific interactors (Fraser et al. 2002; Fraser, Wall, and Hirsh 2003). A recent study that examined several such relationships simultaneously demonstrated correlations between different measures of evolutionary conservation and various functional genomic parameters (Krylov et al. 2003).
However, the findings of some of these evolutionary genomics studies have been challenged. It has repeatedly been pointed out that the apparent effect of any one genomic parameter on evolutionary rates can be confounded by correlations among different genomic parameters. For example, some of the strongest correlations observed are between evolutionary rates and gene expression levels. Genes that are expressed at high levels and in numerous tissues tend to be more conserved than genes with lower and narrower expression patterns (Duret and Mouchiroud 2000; Pal, Papp, and Hurst 2001; Krylov et al. 2003; Zhang and Li 2004). When expression level is controlled for, the correlations between evolutionary rate and fitness effects, as well as between evolutionary rate and the number of protein-protein interactions, are weakened (Bloom and Adami 2003; Pal, Papp, and Hurst 2003). Furthermore, when duplicate genes were removed from consideration, the relationship between fitness effects and evolutionary rate disappeared (Yang, Gu, and Li 2003). These controversies remain unsettled, and the general question of how various functional genomic parameters interact to affect evolutionary rate remains open.
In addition to natural selection, an emphasis has recently been placed on the role of fundamental physical principles in imposing order on biological systems (Barabasi and Oltvai 2004). Various complex biological systems have been abstracted as networks where the nodes in the network represent the individual parts, such as proteins or metabolites, and the links in the network represent the interactions between the parts (Jeong et al. 2000, 2001; Luscombe et al. 2002; Ravasz et al. 2002). Studies of the statistical properties of the topologies of such networks suggest specific mechanisms that govern their evolution. In particular, many biological networks show scale-free topological properties that can be explained by a model of network growth via preferential attachment of new nodes to existing nodes that are already highly connected (Barabasi and Albert 1999). At the genomic level, gene duplication is thought to underlie the phenomenon of preferential attachment (Rzhetsky and Gomez 2001; Bhan, Galas, and Dewey 2002; Barabasi and Oltvai 2004). Existing highly connected nodes (i.e., genes or proteins) are more likely, simply by virtue of their large number of connections, to be linked to nodes that are duplicated. Because the duplicated nodes are expected to maintain the same links as the ancestral singleton, the connectivity of a highly connected node will increase with duplication (Barabasi and Oltvai 2004). This process alone can lead to network growth by preferential attachment. The ubiquity of scale-free network topological patterns suggests that network growth by preferential attachment, via mechanisms such as gene duplication, is a fundamental and conserved evolutionary process.
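The duplication argument can be made concrete with a toy simulation. The sketch below is illustrative only and assumes the simplest possible rule, which the text does not specify: a uniformly chosen node is copied together with all of its links, with no subsequent link loss or divergence. Under that rule, an existing node gains a link exactly when one of its neighbors is duplicated, so in a network of N nodes its chance of gaining a link per duplication step is its degree divided by N. All function names are hypothetical.

```python
import random


def duplicate_random_node(adj, rng):
    """Copy a uniformly chosen node; the copy inherits all of the parent's links.

    adj maps integer node ids to sets of neighbor ids and is modified in place.
    """
    parent = rng.choice(list(adj))
    child = max(adj) + 1
    adj[child] = set(adj[parent])
    for neighbor in adj[parent]:
        adj[neighbor].add(child)  # each neighbor of the parent gains one link
    return child


def grow_by_duplication(adj, steps, seed=0):
    """Return a copy of adj grown by `steps` rounds of node duplication."""
    rng = random.Random(seed)
    grown = {node: set(neighbors) for node, neighbors in adj.items()}
    for _ in range(steps):
        duplicate_random_node(grown, rng)
    return grown


def link_gain_rate(adj, trials, seed=0):
    """Estimate how often a single duplication step adds a link to each node."""
    rng = random.Random(seed)
    nodes = list(adj)
    gains = dict.fromkeys(nodes, 0)
    for _ in range(trials):
        parent = rng.choice(nodes)
        for neighbor in adj[parent]:
            gains[neighbor] += 1
    return {node: count / trials for node, count in gains.items()}


if __name__ == "__main__":
    star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
    print(link_gain_rate(star, trials=20000))  # hub near 0.8, leaves near 0.2
    grown = grow_by_duplication(star, steps=200)
    print(len(grown), len(grown[0]))           # node count, hub degree
```

In the star example, the hub (degree 4 among 5 nodes) gains a link in roughly 80% of single duplication steps and each leaf in roughly 20%, which is the degree-proportional (preferential) attachment described above.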
In this work, we attempted to integrate the two perspectives of natural selection and physical self-organization to analyze the evolution of gene sequence and expression patterns. Comparison of human and mouse genome sequence data was combined with the analysis of gene expression profiles that were derived from microarray experiments on a number of tissues in both species. Human gene expression profiles were used to reconstruct a network of coexpressed genes, and we demonstrate the effect of the network topology on the strength of natural selection as well as an unexpected relationship between human-mouse gene sequence and gene expression divergence. These results underscore the influence of expression network self-organization on gene evolution and suggest that distinct mechanisms are responsible for the evolution of expression patterns and gene sequences.
Materials and Methods
Pairwise correlations between gene expression patterns were used to derive the gene coexpression network. The node degree distribution of the network (that is, the frequency distribution of the number of genes, f(n), that have n coexpressed genes) was fitted to a theoretical distribution, here the generalized Pareto distribution, as previously described (Karev et al. 2002; Koonin, Wolf, and Karev 2002). Because the available node degree data span only approximately 2.5 orders of magnitude, it remains formally possible that the coexpression data are also compatible with models other than the scale-free model adopted here. The clustering coefficient (C) was calculated for each node as the ratio of the number of actual connections between the neighbors of the node to the number of possible connections between them (Barabasi and Oltvai 2004).
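As an illustration of fitting a node degree distribution to a generalized Pareto form, the sketch below assumes the commonly used parameterization f(n) = c(n + a)^(-γ), which reduces to a pure power law when a = 0. The fitting procedure (a grid search over the shift a combined with least-squares regression of log f(n) on log(n + a)) is a simple stand-in chosen for illustration and is not necessarily the procedure of Karev et al. (2002). The function name and the grid of a values are hypothetical.

```python
import numpy as np


def fit_generalized_pareto(n, f, a_grid=np.arange(0.0, 20.01, 0.5)):
    """Fit f(n) = c * (n + a)**(-gamma) by least squares in log-log space.

    For each trial shift a, log f is regressed on log(n + a); the a with the
    smallest residual sum of squares is kept. Requires n >= 1 and f > 0.
    """
    n = np.asarray(n, dtype=float)
    log_f = np.log(np.asarray(f, dtype=float))
    best = None
    for a in a_grid:
        x = np.log(n + a)
        slope, intercept = np.polyfit(x, log_f, 1)
        rss = float(np.sum((log_f - (slope * x + intercept)) ** 2))
        if best is None or rss < best[0]:
            best = (rss, a, -slope, np.exp(intercept))
    _, a, gamma, c = best
    return c, a, gamma
```

Only node degrees with f(n) > 0 can enter the fit, because it is carried out on logarithms; with real data, degrees that no gene has must be dropped first.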
Human and mouse gene sequences were extracted from the RefSeq database (NCBI, NIH, Bethesda). Human-mouse orthologs were identified as reciprocal best Blast hits between protein sequences as previously described (Jordan, Wolf, and Koonin 2003). Human-mouse orthologous protein sequences were aligned using ClustalW (Higgins, Thompson, and Gibson 1996), and the protein alignments were used to guide alignments of the corresponding nucleotide coding sequences to ensure that they were aligned in frame. The 5' and 3' UTR sequences were aligned using ClustalW, and only alignments in which the shorter sequence had more than 20 residues and the two aligned sequences differed in length by no more than 50% were used for further analysis. Synonymous (dS) and nonsynonymous (dN) substitution rates were calculated for alignments of protein-coding sequences using the Nei-Gojobori method (Nei and Gojobori 1986) implemented in the PAML package (Yang 1997). The 5' and 3' UTR substitution rates (d) were calculated using the Jukes-Cantor correction for multiple substitutions (Jukes and Cantor 1969). Paralogous genes were identified using an all-against-all BlastP search of the proteins in the coexpression (r ≥ 0.7) network with an e-value threshold of 10⁻⁵ and a coverage cutoff requiring that more than 50% of the shorter protein sequence be included in the high-scoring segment pair.
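The UTR filtering and distance steps can be sketched as follows. This is a minimal illustration that assumes the two UTR sequences are already aligned (for example, by ClustalW), that gapped columns are simply skipped, and that the 50% length criterion is measured relative to the longer ungapped sequence; the text does not specify these details, and the function names are hypothetical. The distance itself is the standard Jukes-Cantor correction, d = -3/4 ln(1 - 4p/3), where p is the observed proportion of differing sites.

```python
import math


def passes_length_filter(seq_a, seq_b, min_len=20, max_len_diff=0.5):
    """Keep UTR pairs whose shorter sequence has more than min_len residues and
    whose lengths differ by at most max_len_diff (assumed: relative to the longer)."""
    la, lb = len(seq_a.replace("-", "")), len(seq_b.replace("-", ""))
    short, long_ = min(la, lb), max(la, lb)
    return short > min_len and (long_ - short) / long_ <= max_len_diff


def jukes_cantor_distance(aln_a, aln_b):
    """Jukes-Cantor distance for two aligned nucleotide sequences.

    p is the proportion of differing sites among columns with no gap in either
    sequence (gap handling is an assumption of this sketch).
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    sites = [(x, y) for x, y in zip(aln_a.upper(), aln_b.upper())
             if x != "-" and y != "-"]
    if not sites:
        raise ValueError("no ungapped sites to compare")
    p = sum(x != y for x, y in sites) / len(sites)
    if p >= 0.75:
        return math.inf  # saturated: the correction is undefined
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

For example, one difference in four ungapped sites (p = 0.25) gives d ≈ 0.304, slightly more than the raw proportion because the correction accounts for multiple substitutions at the same site.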
Comparisons between substitution rates and various gene expression parameters (expression breadth, expression level, and correlations) were made by sorting the rates or rate differences in ascending order and then binning the sorted values into 10 equal-sized bins. Average and fractional values of gene expression parameters for each substitution rate bin were compared with respect to the order of the bins, and the Spearman rank correlation (R), along with its statistical significance (P-value at n = 10 for all comparisons), was computed for each comparison (table 1). Regular correlation coefficients (r_XY) and partial correlation coefficients (r_XY·Z) were Fisher Z-transformed, and the significance of the differences between them was assessed using a z-test.
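The binning procedure can be sketched as follows, assuming per-gene arrays of substitution rates and of one expression parameter (e.g., expression breadth). Genes are sorted by rate, split into 10 bins (numpy's `array_split` makes bin sizes differ by at most one gene when the count is not divisible by 10), and the Spearman correlation between bin order and bin means is computed with `scipy.stats.spearmanr`. The function name is hypothetical, and breaking ties in rate by input order is an assumption the text does not address.

```python
import numpy as np
from scipy.stats import spearmanr


def binned_trend(rates, values, n_bins=10):
    """Bin genes by ascending substitution rate and test for a trend in bin means.

    Returns the mean of `values` in each bin (lowest-rate bin first) and the
    Spearman correlation, with its P-value, between bin order and bin means.
    """
    rates = np.asarray(rates, dtype=float)
    values = np.asarray(values, dtype=float)
    order = np.argsort(rates, kind="stable")   # ascending rates; ties keep input order
    bins = np.array_split(order, n_bins)       # equal-sized bins (within one gene)
    bin_means = np.array([values[idx].mean() for idx in bins])
    rho, p_value = spearmanr(np.arange(n_bins), bin_means)
    return bin_means, rho, p_value
```

Because the correlation is computed on only 10 bin means, the P-value reflects n = 10 regardless of how many genes were binned, as noted above.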
Results and Discussion
Human Gene Coexpression Network
Tissue-specific expression patterns of human genes were further compared to identify coexpressed genes. For each differentially expressed human gene, the normalized expression levels in each of the 31 tissues were used to construct a vector, which was compared with similarly derived vectors of other human genes using the Pearson correlation coefficient (r) (Eisen et al. 1998). Pairs of genes with high r-values are considered to be coexpressed. Values of r were determined for all pairs of human genes. These data were used to infer a network of coexpressed genes, in which the genes are nodes connected by an edge if their r-value is greater than or equal to a specified threshold. This was done for a series of r-value thresholds (from 0.9 down to 0.4), and the topological properties of the resulting networks were investigated by analyzing their node degree distributions, that is, the frequency distributions of the number of genes, f(n), that have n coexpressed genes. As the r-value threshold increases, the number of edges in the network decreases, and the node degree distribution appears to approach a power-law distribution (see supplementary figure 3). Node degree distributions and graphic representations of the corresponding network topologies for r ≥ 0.9, 0.8, and 0.7 are shown in figure 2. Because the distribution for r ≥ 0.7 shows a good fit to a distribution with a power-law tail while still retaining enough data for meaningful statistical analysis (fig. 2, and see supplementary figure 3), the threshold of 0.7 was chosen for further analysis. A list of coexpressed gene pairs at r ≥ 0.7 is available at ftp://ftp.ncbi.nih.gov/pub/koonin/Jordan/MBE-040138.R1/SupplementaryTable1.tab.
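The network construction step can be sketched with numpy, assuming a matrix of normalized expression levels with one row per gene and one column per tissue (31 columns for the human data described here). The function names are hypothetical. Genes with constant expression across tissues have an undefined Pearson r and should be removed beforehand, which is consistent with restricting the analysis to differentially expressed genes.

```python
import numpy as np


def coexpression_network(expr, threshold=0.7):
    """Link genes whose expression vectors across tissues have Pearson r >= threshold.

    expr: array of shape (n_genes, n_tissues).
    Returns a dict mapping each gene index to the set of its coexpressed genes.
    """
    r = np.corrcoef(expr)           # gene-by-gene Pearson correlation matrix
    np.fill_diagonal(r, -np.inf)    # exclude self-correlations
    rows, cols = np.nonzero(r >= threshold)
    adj = {i: set() for i in range(expr.shape[0])}
    for i, j in zip(rows, cols):
        adj[int(i)].add(int(j))
    return adj


def degree_distribution(adj):
    """Return f(n): the number of genes that have exactly n coexpressed genes."""
    counts = {}
    for neighbors in adj.values():
        n = len(neighbors)
        counts[n] = counts.get(n, 0) + 1
    return dict(sorted(counts.items()))
```

Calling `coexpression_network` repeatedly with thresholds from 0.9 down to 0.4 reproduces the kind of threshold series described above; lowering the threshold can only add edges, never remove them.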
Coexpression, as measured by the correlation coefficients used to connect nodes in the network, is expected to be partly transitive: if the tissue-specific expression pattern of gene A is correlated with that of gene B, and the pattern of gene B is correlated with that of gene C, then the expression of gene A should also be correlated with that of gene C. However, the strength of the correlation between A and C is an open question, because some groups of genes can be tightly coregulated, whereas others are only loosely coexpressed. To further evaluate the topological properties of the human gene coexpression network, the clustering of network nodes was analyzed. The clustering coefficient (C) of a given node is the ratio of the number of actual connections between the neighbors of the node to the number of possible connections between them (Barabasi and Oltvai 2004). The average C for the gene coexpression network is 0.452. This indicates a high degree of clustering that is typical of many scale-free networks (Barabasi and Oltvai 2004); however, the fact that C is well below 1 indicates only limited transitivity in the gene coexpression network analyzed here. The shape of the dependence of C on the node degree is thought to be indicative of the mode of network growth (Barabasi and Oltvai 2004). A plot of the clustering coefficient, C(n), versus node degree (n) shows no clear trend between the two (fig. 3b). A roughly constant C(n), as seen here, is consistent with network growth by simple preferential attachment of nodes rather than growth by the hierarchical addition of modules, which is thought to be characterized by C(n) ∝ n⁻¹ (Barabasi and Oltvai 2004).
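The clustering coefficient defined above, and its average as a function of node degree, C(n), can be computed directly from an adjacency map. This is a minimal sketch with hypothetical function names. Nodes with fewer than two neighbors have an undefined C; the text does not say how such nodes entered the reported average, so here they are assigned C = 0 for single nodes and skipped when computing C(n).

```python
from collections import defaultdict
from itertools import combinations


def clustering_coefficient(adj, node):
    """C = (links among the node's neighbors) / (possible links among them)."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # undefined for degree < 2; reported as 0 in this sketch
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return links / (k * (k - 1) / 2)


def clustering_by_degree(adj):
    """Average C(n) over all nodes with degree n (nodes with n < 2 are skipped)."""
    by_degree = defaultdict(list)
    for node, neighbors in adj.items():
        if len(neighbors) >= 2:
            by_degree[len(neighbors)].append(clustering_coefficient(adj, node))
    return {n: sum(cs) / len(cs) for n, cs in sorted(by_degree.items())}
```

Plotting the resulting C(n) against n on log-log axes gives a rough way to distinguish the two growth modes: a slope near 0 corresponds to the flat C(n) reported here, whereas a slope near -1 would point to C(n) ∝ n⁻¹.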
The human gene coexpression network was examined to assess the functional relationships between coexpressed genes. Pairs of coexpressed genes were functionally classified using the eukaryotic KOG database (Tatusov et al. 2003) and the Gene Ontology database (Ashburner et al. 2000). Using both of these approaches, approximately 17.5% of coexpressed gene pairs were found to encode proteins with the same functional classification, a significant (P < 0.0001) but relatively small excess over the random expectation (13%). However, far more striking nonrandomness was observed when the coexpressed genes were examined for co-occurrence of different functions: certain combinations of seemingly related functions were strongly preferred (ftp://ftp.ncbi.nih.gov/pub/koonin/Jordan/MBE-040138.R1/SupplementaryTable2.tab). For example, signal transduction genes are coexpressed with those involved in secretion, transcription, and posttranslational protein modification much more often than expected by chance. This is likely to reflect coexpression of genes coding for proteins that function together in biochemical and signaling pathways.
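One way to estimate the random expectation for the fraction of coexpressed pairs that share a functional class is to shuffle class labels among genes, which preserves the class frequencies. The text does not state how the 13% expectation was derived, so this permutation test is an assumed approach; it also simplifies matters by giving each gene a single class, whereas KOG and Gene Ontology can assign several. The function names are hypothetical.

```python
import random


def same_class_fraction(pairs, gene_class):
    """Fraction of gene pairs whose two members have the same functional class."""
    return sum(gene_class[a] == gene_class[b] for a, b in pairs) / len(pairs)


def label_permutation_test(pairs, gene_class, n_perm=1000, seed=0):
    """Observed same-class fraction, its mean under label shuffling, and a P-value."""
    rng = random.Random(seed)
    observed = same_class_fraction(pairs, gene_class)
    genes = list(gene_class)
    labels = [gene_class[g] for g in genes]
    null_total, hits = 0.0, 0
    for _ in range(n_perm):
        rng.shuffle(labels)                  # class frequencies are preserved
        frac = same_class_fraction(pairs, dict(zip(genes, labels)))
        null_total += frac
        hits += frac >= observed
    p_value = (hits + 1) / (n_perm + 1)      # one-sided, with add-one correction
    return observed, null_total / n_perm, p_value
```

The same functions extend to the co-occurrence of different classes by counting, for each pair of classes, how often it appears among coexpressed pairs versus shuffled ones.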
A number of highly connected individual clusters from the human gene coexpression network were further examined with respect to the expression patterns and functions of their member genes. The majority of these clusters are made up of genes with narrow, if not exclusive, tissue-specific expression patterns (see supplementary figure 4). Examples of the topologies of two of these clusters, along with their tissue-specific expression patterns, are shown in figure 4. The experimentally determined and predicted functions for the member genes of these two clusters are generally consistent with their expression patterns and indicate their involvement in pancreatic and testis-related physiological functions, respectively (table 2).
Although the correlation between dN and the number of coexpressed genes was nearly as strong as that between dN and expression breadth and level, for dS the correlations with expression breadth and level were substantially stronger than that with the number of coexpressed genes (table 1). This is not surprising, because codon usage adaptation is particularly important for highly expressed genes (Carbone, Zinovyev, and Kepes 2003). Partial correlation, r_XY·Z, where X is the number of coexpressed genes, Y is the substitution rate, and Z is the expression level, was used to control for the effects of expression level on the relationship between the number of coexpressed genes and substitution rates. In all cases where the substitution rate was significantly correlated with the number of coexpressed genes (dN, dS, and d(3' UTR); table 1), controlling for expression level did not significantly decrease the correlation coefficient; that is, r_XY·Z is not significantly smaller than r_XY (0.38 < P < 0.56). Thus, the effect of the number of coexpressed genes on substitution rates cannot be attributed to a correlation between the number of coexpressed genes and the expression level.
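The partial correlation and the comparison of Fisher Z-transformed coefficients can be written out as follows. The partial correlation is the standard first-order formula. The z-test rests on an assumption the text does not spell out: r_XY and r_XY·Z are treated as independent estimates with standard errors 1/sqrt(n - 3) and 1/sqrt(n - 4), respectively, which is likely conservative because both coefficients are computed from the same genes. The function names are hypothetical.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation r_XY·Z (X and Y controlling for Z)."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def compare_correlations(r_xy, r_xy_z, n):
    """Two-sided z-test on the Fisher Z-transformed r_XY and r_XY·Z.

    Assumes independent estimates with standard errors 1/sqrt(n - 3) for r_XY
    and 1/sqrt(n - 4) for a first-order partial correlation.
    """
    z1, z2 = math.atanh(r_xy), math.atanh(r_xy_z)   # Fisher Z transform
    se = math.sqrt(1.0 / (n - 3) + 1.0 / (n - 4))
    z = (z1 - z2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))      # two-sided normal P-value
    return z, p_value
```

Here X, Y, and Z follow the definitions above (number of coexpressed genes, substitution rate, and expression level); a large P-value from `compare_correlations` means that controlling for expression level did not detectably weaken the correlation.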
The relationship between gene coexpression and evolutionary rate was further examined by comparing the r-values between pairs of human gene expression patterns with their pairwise gene substitution rate differences (fig. 6 and table 1). The results show that bins of human gene pairs with similar values of dN between human and mouse (i.e., genes that evolve at similar rates) have a greater fraction of pairwise r-values ≥ 0.7. This negative correlation was substantial and statistically significant (fig. 6 and table 1). In contrast, the pairwise dS difference between human genes did not correlate with the difference in expression patterns (fig. 6). Thus, genes with similar patterns of expression tend to evolve at similar rates, and the selection that leads to such coevolution appears to operate primarily at the protein sequence level. The differences in the evolutionary rates of 5' and 3' UTRs also correlated negatively with pairwise r-values between human gene expression patterns, although this effect, statistically significant because of the large number of analyzed gene pairs (table 1), was not nearly as pronounced as that seen for dN (fig. 6). As with other comparisons between evolutionary rate and expression patterns, the magnitude of this relationship was greater for 3' UTRs, suggesting that this region is more functionally relevant than the 5' UTR with respect to the pattern of gene expression. Qualitatively identical results were obtained when the mouse expression data were used to compare r-values between pairs of mouse genes with pairwise human-mouse gene substitution rate differences (see supplementary figure 5b).
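The pairwise analysis can be sketched as follows, assuming a vector of per-gene human-mouse substitution rates (e.g., dN) and the gene-by-tissue expression matrix. Rate differences are taken as absolute values, which the text implies but does not state, and the function name is hypothetical. Because every unordered gene pair is enumerated, memory grows with the square of the number of genes.

```python
import numpy as np


def coexpression_by_rate_difference(rates, expr, threshold=0.7, n_bins=10):
    """Fraction of gene pairs with Pearson r >= threshold in each bin of
    pairwise absolute substitution rate differences (ascending bins)."""
    rates = np.asarray(rates, dtype=float)
    r = np.corrcoef(np.asarray(expr, dtype=float))
    i, j = np.triu_indices(len(rates), k=1)     # every unordered gene pair once
    diffs = np.abs(rates[i] - rates[j])
    coexpressed = r[i, j] >= threshold
    order = np.argsort(diffs, kind="stable")
    return np.array([coexpressed[idx].mean()
                     for idx in np.array_split(order, n_bins)])
```

A fraction that declines from the first bin to the last is the pattern reported above for dN: pairs of genes with the most similar rates are the most likely to be coexpressed.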
Conclusion
Our results show that overall expression levels and more specific topological properties of the gene coexpression network are clearly related to the rates of sequence evolution: highly connected network hubs tend to evolve slowly, and genes that are coexpressed show a strong tendency to evolve at similar rates. These relationships most likely reflect purifying selection, which appears to act primarily at the level of the protein sequence. Generally, these results are compatible with the notion that highly connected nodes of biological networks are of greater biological importance (Jeong et al. 2001). However, two of the findings reported here were unexpected. First, we found that the connection between gene coexpression network topology and sequence evolution held for both nonsynonymous and synonymous sites in the coding sequence and for the 3' UTR, but not for the 5' UTR. Thus, the constraints on 5' UTR evolution appear to be unrelated to expression regulation as analyzed here. Second, we found that, although the expression profiles of orthologs tend to be strongly correlated, they diverged at roughly the same rate across a wide range of sequence evolution rates. Thus, unlike the properties of the gene coexpression network, the evolution of an individual gene's expression regulation after speciation seems to be uncoupled from the functional constraints on the gene's sequence. This is generally compatible with the pregenomic notion that regulatory evolution could be the decisive factor in biological diversification between species (Britten and Davidson 1969; King and Wilson 1975) and can be taken to suggest that sequence divergence and expression pattern divergence are governed by distinct forces.
Recent studies have reported rapid diversification of gene expression patterns and suggested that this might reflect neutral evolution of transcriptional regulation (Khaitovich et al. 2004; Yanai, Graur, and Ophir 2004). The lack of correlation between sequence divergence and expression profile divergence demonstrated here could be deemed compatible with a neutral model whereby the transcription profiles of orthologs diverge rapidly upon speciation until they reach a basal level of similarity that is then maintained by purifying selection. However, it is tempting to speculate about a distinct role for natural selection in driving expression pattern divergence. Perhaps, whereas sequence divergence levels are determined largely by purifying selection, the effects of adaptive (diversifying) selection are more prevalent at the level of gene expression. Under this hypothetical scenario, although the basal level of correlation between orthologs tends to persist, probably reflecting the conservation of general function, the aspects of gene expression patterns that reflect species-specific changes, governed in part by short, degenerate transcription factor-binding sites, would be expected to diverge rapidly. Indeed, our analysis of the relationship between gene coexpression and gene duplication in this work, as well as previous results (Gu et al. 2002), suggests rapid divergence of expression patterns after duplication. Once the expression patterns of homologous genes (paralogs or orthologs) diverge, convergent evolution, which is a hallmark of adaptation and is widely evident for phenotypic characters, could cause unrelated genes to achieve similar expression patterns. This convergence could be related to the rapid de novo generation of transcription factor-binding sites in promoters that lead to slight changes in expression patterns.
If the functionally relevant aspects of the ancestral expression pattern are maintained during this process, subtle expression pattern changes would be simultaneously invisible to purifying selection and serve as the raw material for repeated trials of adaptive selection. This hypothesis yields a specific prediction with regard to the evolution of promoter regions that is currently being investigated: expression profiles are predicted to be more correlated with the distributions of specific transcription factor-binding sites along promoters than with the overall sequence divergence (i.e., relatedness) between promoters.
Supplementary Material
Supplementary Figure 2. Frequency distributions of human-mouse substitution rate ratios, shown in log scale. (a) d(5' UTR)/d(3' UTR). In 43% of genes, d(5' UTR) < d(3' UTR), whereas in 57%, d(3' UTR) < d(5' UTR). (b) d(5' UTR)/dS. (c) d(3' UTR)/dS.
Supplementary Figure 3. Frequency distributions of the number of links per node in human gene coexpression networks. For each network, the correlation coefficient (r) threshold used to determine whether any two genes (nodes) are considered to be coexpressed (linked) is shown above the plot. The slope of the linear trend line fitted to the data and the r² value indicating the goodness of fit to the power law are shown for each network plot.
Supplementary Figure 4. Tissue-specific expression patterns for human gene coexpression network (r ≥ 0.9) clusters. Cluster expression patterns: 1, adult and fetal liver; 2, fetal liver; 3, testis; 4, pancreas; 5, testis and umbilical vein; 6, various brain tissues; 7, salivary gland; 8, adult liver; 9, thymus; 10, spleen; 11, lung; 12, cerebellum.
Supplementary Figure 5. Relationship between the mouse coexpression network topology and substitution rates. (a) The dependence between the node degree and substitution rate. Average (± standard error) numbers of coexpressed genes (node degree for r ≥ 0.7) per gene are shown for 10 ascending bins of human-mouse orthologous gene substitution rates. (b) Coexpressed mouse genes have similar substitution rates. Fractions of pairwise correlation coefficient values with r ≥ 0.7 between mouse genes are shown for 10 ascending bins of pairwise human-mouse orthologous gene substitution rate differences.
References
Agrawal, H. 2002. Extreme self-organization in networks constructed from gene expression data. Phys. Rev. Lett. 89:268702.
Ashburner, M., C. A. Ball, J. A. Blake et al. 2000. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 25:25-29.
Barabasi, A. L., and R. Albert. 1999. Emergence of scaling in random networks. Science 286:509-512.
Barabasi, A. L., and Z. N. Oltvai. 2004. Network biology: understanding the cell's functional organization. Nat. Rev. Genet. 5:101-113.
Bergmann, S., J. Ihmels, and N. Barkai. 2004. Similarities and differences in genome-wide expression data of six organisms. PLoS Biol. 2:0085-0093.
Bhan, A., D. J. Galas, and T. G. Dewey. 2002. A duplication growth model of gene expression networks. Bioinformatics 18:1486-1493.
Bloom, J. D., and C. Adami. 2003. Apparent dependence of protein evolutionary rate on number of interactions is linked to biases in protein-protein interactions data sets. BMC Evol. Biol. 3:21.
Britten, R. J., and E. H. Davidson. 1969. Gene regulation for higher cells: a theory. Science 165:349-357.
Carbone, A., A. Zinovyev, and F. Kepes. 2003. Codon adaptation index as a measure of dominating codon bias. Bioinformatics 19:2005-2015.
Darwin, C. 1859. On the origin of species. John Murray, London.
Duret, L., and D. Mouchiroud. 2000. Determinants of substitution rates in mammalian genes: expression pattern affects selection intensity but not mutation rate. Mol. Biol. Evol. 17:68-74.
Eisen, M. B., P. T. Spellman, P. O. Brown, and D. Botstein. 1998. Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. USA 95:14863-14868.
Farris, J. S. 1977. Phylogenetic analysis under Dollo's law. Syst. Zool. 26:77-88.
Fraser, H. B., A. E. Hirsh, L. M. Steinmetz, C. Scharfe, and M. W. Feldman. 2002. Evolutionary rate in the protein interaction network. Science 296:750-752.
Fraser, H. B., D. P. Wall, and A. E. Hirsh. 2003. A simple dependence between protein evolution rate and the number of protein-protein interactions. BMC Evol. Biol. 3:11.
Giot, L., J. S. Bader, C. Brouwer et al. 2003. A protein interaction map of Drosophila melanogaster. Science 302:1727-1736.
Gu, Z., D. Nicolae, H. H. Lu, and W. H. Li. 2002. Rapid divergence in expression between duplicate genes inferred from microarray data. Trends Genet. 18:609-613.
Hedges, S. B., H. Chen, S. Kumar, D. Y. Wang, A. S. Thompson, and H. Watanabe. 2001. A genomic timescale for the origin of eukaryotes. BMC Evol. Biol. 1:4.
Higgins, D. G., J. D. Thompson, and T. J. Gibson. 1996. Using CLUSTAL for multiple sequence alignments. Methods Enzymol. 266:383-402.
Hirsh, A. E., and H. B. Fraser. 2001. Protein dispensability and rate of evolution. Nature 411:1046-1049.
Ho, Y., A. Gruhler, A. Heilbut et al. 2002. Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature 415:180-183.
Hoyle, D. C., M. Rattray, R. Jupp, and A. Brass. 2002. Making sense of microarray data distributions. Bioinformatics 18:576-584.
Jeong, H., S. P. Mason, A. L. Barabasi, and Z. N. Oltvai. 2001. Lethality and centrality in protein networks. Nature 411:41-42.
Jeong, H., B. Tombor, R. Albert, Z. N. Oltvai, and A. L. Barabasi. 2000. The large-scale organization of metabolic networks. Nature 407:651-654.
Jordan, I. K., I. B. Rogozin, Y. I. Wolf, and E. V. Koonin. 2002. Essential genes are more evolutionarily conserved than are nonessential genes in bacteria. Genome Res. 12:962-968.
Jordan, I. K., Y. I. Wolf, and E. V. Koonin. 2003. No simple dependence between protein evolution rate and the number of protein-protein interactions: only the most prolific interactors tend to evolve slowly. BMC Evol. Biol. 3:1.
Jukes, T. H., and C. R. Cantor. 1969. Evolution of protein molecules. Pp. 21-132 in H. N. Munro, ed. Mammalian protein metabolism. Academic, New York.
Kamath, R. S., A. G. Fraser, Y. Dong et al. 2003. Systematic functional analysis of the Caenorhabditis elegans genome using RNAi. Nature 421:231-237.
Karev, G. P., Y. I. Wolf, A. Y. Rzhetsky, F. S. Berezovskaya, and E. V. Koonin. 2002. Birth and death of protein domains: a simple model of evolution explains power law behavior. BMC Evol. Biol. 2:18.
Khaitovich, P., G. Weiss, M. Lachmann, I. Hellman, W. Enard, B. Muetzel, U. Wiekner, W. Ansorge, and S. Paabo. 2004. A neutral model of transcriptome evolution. PLoS Biol. 2:0682-0689.
King, M. C., and A. C. Wilson. 1975. Evolution at two levels in humans and chimpanzees. Science 188:107-116.
Koonin, E. V., N. D. Fedorova, J. D. Jackson et al. 2004. A comprehensive evolutionary classification of proteins encoded in complete eukaryotic genomes. Genome Biol. 5:R7.
Koonin, E. V., Y. I. Wolf, and G. P. Karev. 2002. The structure of the protein universe and genome evolution. Nature 420:218-223.
Krylov, D. M., Y. I. Wolf, I. B. Rogozin, and E. V. Koonin. 2003. Gene loss, protein sequence divergence, gene dispensability, expression level, and interactivity are correlated in eukaryotic evolution. Genome Res. 13:2229-2235.
Kuznetsov, V. A., G. D. Knott, and R. F. Bonner. 2002. General statistics of stochastic process of gene expression in eukaryotic cells. Genetics 161:1321-1332.
Lander, E. S., L. M. Linton, B. Birren et al. 2001. Initial sequencing and analysis of the human genome. Nature 409:860-921.
Li, S., C. M. Armstrong, N. Bertin et al. 2004. A map of the interactome network of the metazoan C. elegans. Science 303:540-543.
Li, W. H. 1997. Molecular evolution. Sinauer Associates, Sunderland, Mass.
Luscombe, N. M., J. Qian, Z. Zhang, T. Johnson, and M. Gerstein. 2002. The dominance of the population by a selected few: power-law behaviour applies to a wide variety of genomic properties. Genome Biol. 3:RESEARCH0040.
Makova, K. D., and W. H. Li. 2003. Divergence in the spatial pattern of gene expression between human duplicate genes. Genome Res. 13:1638-1645.
Nei, M., and T. Gojobori. 1986. Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol. 3:418-426.
Pal, C., B. Papp, and L. D. Hurst. 2001. Highly expressed genes in yeast evolve slowly. Genetics 158:927-931.
Pal, C., B. Papp, and L. D. Hurst. 2003. Genomic function: rate of evolution and gene dispensability. Nature 421:496-497.
Pennisi, E. 2003. Systems biology: tracing life's circuitry. Science 302:1646-1649.
Ravasz, E., A. L. Somera, D. A. Mongru, Z. N. Oltvai, and A. L. Barabasi. 2002. Hierarchical organization of modularity in metabolic networks. Science 297:1551-1555.
Rzhetsky, A., and S. M. Gomez. 2001. Birth of scale-free molecular networks and the number of distinct DNA and protein domains per genome. Bioinformatics 17:988-996.
Su, A. I., M. P. Cooke, K. A. Ching et al. 2002. Large-scale analysis of the human and mouse transcriptomes. Proc. Natl. Acad. Sci. USA 99:4465-4470.
Tatusov, R. L., N. D. Fedorova, J. D. Jackson et al. 2003. The COG database: an updated version includes eukaryotes. BMC Bioinformatics 4:41.
Ueda, H. R., S. Hayshi, S. Matsuyama, T. Yomo, S. Hashimoto, S. A. Kay, J. B. Hogenesch, and M. Iino. 2004. Universality and flexibility in gene expression from bacteria to human. Proc. Natl. Acad. Sci. USA 101:3765-3769.
Wagner, A. 2000. Decoupled evolution of coding region and mRNA expression patterns after gene duplication: implications for the neutralist-selectionist debate. Proc. Natl. Acad. Sci. USA 97:6579-6584.
Wagner, A. 2003. How the global structure of protein interaction networks evolves. Proc. R. Soc. Lond. B Biol. Sci. 270:457-466.
Waterston, R. H., K. Lindblad-Toh, E. Birney et al. 2002. Initial sequencing and comparative analysis of the mouse genome. Nature 420:520-562.
Yanai, I., D. Graur, and R. Ophir. 2004. Incongruent expression profiles between human and mouse orthologous genes suggest widespread neutral evolution of transcription control. OMICS 8:15-24.
Yang, J., Z. Gu, and W. H. Li. 2003. Rate of protein evolution versus fitness effect of gene deletion. Mol. Biol. Evol. 20:772-774.
Yang, Z. 1997. PAML: a program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci. 13:555-556.
Zhang, L., and W. H. Li. 2004. Mammalian housekeeping genes evolve more slowly than tissue-specific genes. Mol. Biol. Evol. 21:236-239.