Department of Biology, University of New Mexico
Abstract |
---|
Introduction |
---|
Studying functional divergence among duplicate genes requires a definition of gene function, but no such universal definition is possible. The reason is that there are several complementary ways of classifying gene functions (Ashburner et al. 2000). First, gene products can be characterized biochemically, e.g., as enzymes or transcription factors. Second, they can be characterized through their time and locus of expression, e.g., expression during a cell cycle stage, in the cytoplasm, or during brain development. Third, they can be characterized genetically through mutations and through other genes that these mutations affect. This list is not necessarily complete.
Functional genomics has added much information to each of these categories, especially in model organisms like the yeast Saccharomyces cerevisiae. First, monitoring expression through microarrays (Chu et al. 1998; Eisen et al. 1998; Spellman et al. 1998; Gasch et al. 2000) provides spatiotemporal expression information for thousands of genes at once. This information is indicative of the biological process a gene is involved in. Second, genome-wide protein-protein interactions can characterize physical interactions among thousands of gene products (Bartel et al. 1996; Fromont-Racine, Rain, and Legrain 1997; Uetz et al. 2000; Ito et al. 2001). Third, large-scale gene knockout screens in combination with microarray experiments indicate which genes' expression level is affected by a mutated gene (Hughes et al. 2000). Thus, even in the absence of a detectable phenotype (all too frequent in knockout experiments), a putative function can sometimes be assigned using genetic interactions with known genes.
Attempts to identify gene functions according to any of the above criteria, whether they use genomic or pregenomic techniques, yield one key message: most genes have more than one, if not many functions. They are expressed at multiple times and in multiple places, they affect multiple biological processes when mutated, or they interact with proteins with diverse biochemical and biological roles (Bender et al. 1983; Li and Noll 1994; Jack and Delotto 1995; Slusarski, Motzny, and Holmgren 1995; Kirchhamer, Yuh, and Davidson 1996; Schwikowski, Uetz, and Fields 2000; Wagner 2001). This multifunctionality has important implications for the divergence of duplicate genes: duplicate genes often diverge through loss of complementary (sub)functions in each duplicate (Force, Lynch, and Postlethwait 1999; Lynch and Force 2000; Wagner 2000). Examples abound. To name but two, the ZAG1 and ZMM2 genes are paralogues in the maize genome. They are orthologues of the Arabidopsis AGAMOUS gene, which is involved in carpel and stamen development. Each of them appears to have largely lost one of its ancestral expression domains: ZAG1 is expressed at high levels in developing carpels and ZMM2 is expressed in developing stamens. A null mutation in ZAG1 affects only early carpel development (Coen and Meyerowitz 1991; Schmidt et al. 1993; Mena et al. 1996). Force, Lynch, and Postlethwait (1999) report on the zebrafish engrailed genes eng1 and eng1b, the likely results of a teleost-specific gene duplication of the tetrapod En1 gene. In mice and chickens, En1 is expressed in the developing pectoral appendage bud and in specific neurons of the developing hindbrain and spinal cord. In zebrafish, eng1 retained expression in the pectoral appendage bud, whereas eng1b is only expressed in the hindbrain and the spinal cord. Similar patterns of divergence may be quite common in zebrafish (Ekker et al. 1995; Lee, Xu, and Breitbart 1996; Ekker et al. 1997).
Studies focussing on individual gene pairs fall short of identifying general divergence patterns of many duplicate genes. At first sight, analyzing functional divergence of many duplicate genes may seem like a hopeless task. Because it is not even straightforward to classify one gene's function, how would one compare the functions of many divergent duplicates? Functional genomic experiments provide a crude remedy for this problem. Despite their disadvantage of providing largely qualitative information about genetic and molecular interactions of genes, their great advantage is that they do so for thousands of genes at once. They thus yield insight about one aspect, however minute, of gene function, such as the protein interaction partners of a gene, gene expression patterns affected through mutating a gene, or the response of gene expression to environmental challenges. It is this aspect of gene function I will focus on.
Methods |
---|
Protein Interaction Data and Analysis
Data for 899 pairwise interactions among 985 yeast proteins, as reported in Uetz et al. (2000), were obtained from http://depts.washington.edu/sfields/projects/YPLM/Nature-plain.html on February 15, 2000. There are 43 proteins that have been reported to interact with themselves. Before further analysis all such self-interactions were eliminated. (Self-interactions are interactions between two protein products of the same gene, such as interactions that might occur for homodimerizing proteins.) The resulting protein interaction network was then represented as a graph using the Library of Efficient Data types and Algorithms (LEDA) (Mehlhorn and Naher 1999). Within this graph representation, common and different protein interactions among gene family members are easily analyzed (Wagner 2001). To analyze protein interaction data not generated by two-hybrid experiments, I used information on physical interactions among yeast proteins obtained from the Munich Information Center for Protein Sequences (MIPS) database (Mewes et al. 1999, http://mips.gsf.de/proj/yeast/CYGD/db/index.html). I eliminated from these data all protein interactions generated only by two-hybrid experiments. The remaining 899 interactions involve 680 proteins. I did not distinguish between genes with only one paralogue and genes that occur in multigene families in the analysis of either data set.
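The bookkeeping described above (dropping self-interactions, then counting total and shared interaction partners for two paralogues) can be illustrated with a few lines of Python. This is only a minimal sketch using plain dictionaries of sets, not the LEDA-based implementation; the function names `build_network` and `compare_paralogues` and the toy protein labels are hypothetical.

```python
from collections import defaultdict


def build_network(pairs):
    """Return an undirected interaction graph as {protein: set of partners}.

    Self-interactions (a protein paired with itself) are skipped.
    """
    partners = defaultdict(set)
    for a, b in pairs:
        if a == b:  # self-interaction, e.g. a homodimerizing protein
            continue
        partners[a].add(b)
        partners[b].add(a)
    return dict(partners)


def compare_paralogues(network, p, p_star):
    """Return (d1, d2, b): partner counts of p and p_star, and shared partners."""
    n1 = network.get(p, set())
    n2 = network.get(p_star, set())
    return len(n1), len(n2), len(n1 & n2)


# Toy data (hypothetical labels, not real yeast proteins)
toy_pairs = [("P", "X"), ("P", "Y"), ("P", "Z"), ("Pstar", "Y"), ("Q", "Q")]
toy_network = build_network(toy_pairs)
d1, d2, b = compare_paralogues(toy_network, "P", "Pstar")
```

The triple (d1, d2, b) is the only information the divergence tests described next require for each paralogous pair.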
I used the following numerical approach to test (and reject) the null hypothesis that the number of interactions in products of paralogous genes has diverged symmetrically. (Notice that this hypothesis does not regard the mechanism of divergence, only its pattern.) The approach proceeds by (1) reconstructing the (identical) numbers of interactions of two proteins immediately after duplication of their encoding genes, and (2) emulating the process of symmetric divergence. Consider two proteins P and P* that have d1 and d2 protein interactions, respectively, and that share b of these interaction partners (fig. 1). It follows that P and P* have d1 - b and d2 - b nonshared interactions, respectively, adding to a total of d1 + d2 - 2b nonshared interactions. Each of these interactions might have arisen through the evolutionary loss of an interaction that was shared after duplication or through the evolutionary gain of an interaction since the duplication. To avoid restricting myself to only one of these possibilities, I assume that each nonshared interaction reflects a loss with some probability Pl and a gain with probability (1 - Pl). Because interactions are gained or lost probabilistically, one cannot unambiguously reconstruct the ancestral state of interactions, that is, the number of interactions P and P* had immediately after duplication. But it is possible to reconstruct a likely ancestral state simply by noting that the number of lost interactions after duplication follows a binomial distribution B(d1 + d2 - 2b, Pl). This ancestral state, the number of interactions of each protein immediately after duplication, is simply given by b + nl, where nl is a random number distributed as B(d1 + d2 - 2b, Pl). (The total number of interactions gained by the two duplicates then immediately follows as ng = [d1 + d2 - 2b] - nl.) Equipped with these two numbers, I then applied the null hypothesis of symmetric divergence to emulate each protein's divergence from this ancestral state.
According to the null hypothesis, the number of interactions lost and gained by protein P since duplication is given by random numbers nl1 with distribution B(nl, 0.5) and ng1 with distribution B(ng, 0.5), respectively. The factor 0.5 in these distributions reflects the assumption of symmetric divergence in the null hypothesis. Thus, according to the null hypothesis, protein P should have (b + nl) - nl1 + ng1 interactions. The number of interactions of protein P* immediately follows as (b + nl) - (nl - nl1) + (ng - ng1).
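The two-step procedure just described translates almost line by line into code. The following is a minimal sketch, assuming Python's standard `random` module as the random number source; the names `binomial` and `simulate_symmetric_pair` are hypothetical and the example numbers are arbitrary.

```python
import random


def binomial(n, p, rng):
    """Draw one sample from B(n, p) as a sum of n Bernoulli trials."""
    return sum(rng.random() < p for _ in range(n))


def simulate_symmetric_pair(d1, d2, b, p_loss, rng):
    """Emulate symmetric divergence for one paralogous pair.

    d1, d2 : observed numbers of interactions of P and P*
    b      : number of shared interaction partners
    p_loss : Pl, probability that a nonshared interaction reflects a loss
    Returns the simulated interaction counts of P and P*.
    """
    nonshared = d1 + d2 - 2 * b
    n_lost = binomial(nonshared, p_loss, rng)   # nl
    n_gained = nonshared - n_lost               # ng
    ancestral = b + n_lost                      # state right after duplication

    # Symmetric null hypothesis: each loss or gain hits P with probability 0.5
    lost_1 = binomial(n_lost, 0.5, rng)         # nl1
    gained_1 = binomial(n_gained, 0.5, rng)     # ng1

    sim_d1 = ancestral - lost_1 + gained_1
    sim_d2 = ancestral - (n_lost - lost_1) + (n_gained - gained_1)
    return sim_d1, sim_d2


rng = random.Random(1)
example = simulate_symmetric_pair(d1=6, d2=2, b=1, p_loss=0.5, rng=rng)
```

By construction the two simulated counts always add up to d1 + d2, so the simulation redistributes the observed nonshared interactions between the duplicates without changing their total.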
Environmental Stress and Gene Expression
To assay the differential expression response of yeast paralogues to environmental stresses, I used data provided by Gasch et al. (2000) for the following conditions: heat shock (25 to 37°C, 30 min), reverse heat shock (37 to 25°C, 30 min), H2O2 and menadione exposure, both of which generate reactive oxygen species (60 and 80 min, respectively), dithiothreitol, a reducing agent interfering with protein folding (90 min), diamide, an agent oxidizing sulfhydryl groups (40 min), hyperosmotic shock mediated by 1 M sorbitol (60 min), hypo-osmotic shock mediated by transfer of cells from 1 M sorbitol to medium lacking sorbitol (30 min), amino acid starvation (2 h), nitrogen depletion (1 day), and stationary phase (7 days). I considered genes whose expression level changed at least threefold in response to a stressor to be affected significantly. Because the expression response to most environmental stresses is transient, I chose a time point (indicated above in parentheses) approximately halfway through the measured response time series for each environmental stress to assess significant change. I then counted the number of stressors to which each member of a paralogous gene pair responded and did so for all 5,460 duplicate pairs with Ka < 0.75. For 40.4% (2,210) of these gene pairs, neither gene in the pair showed a response to any of the stressors applied. Such gene pairs are not suitable for this analysis, and I have thus eliminated them. I also excluded 162 further gene pairs (2.96%), where at least one stress condition induced the expression of one gene but repressed that of the other. Because of cross-hybridization, very closely related duplicates cannot be distinguished through microarray analysis, but the analysis of Gasch et al. (2000, fig. 5) suggests that gene pairs with Ks > 0.5 are readily distinguishable. I thus excluded an additional 4.5% (247) of the paralogues with Ks < 0.5 from the analysis. The null hypothesis of symmetric divergence was assessed in exactly the same way as that for protein-protein interactions, except that d1 and d2 now do not correspond to the number of protein interactions but instead to the number of expression responses that two duplicate genes show when exposed to the environmental stressors considered here (b is the number of environmental stressors to which both duplicate genes respond).
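The filtering and counting rules for one gene pair can be sketched as follows. This is an illustration only: it assumes that expression changes are supplied as linear fold-change ratios (stressed over reference), one value per condition, whereas published data sets may be log-transformed and would need converting first. The names `response_sign` and `summarize_pair` and the toy values are hypothetical.

```python
def response_sign(ratio, fold=3.0):
    """Return +1 for induction, -1 for repression, 0 for no significant change."""
    if ratio >= fold:
        return 1
    if ratio <= 1.0 / fold:
        return -1
    return 0


def summarize_pair(ratios_1, ratios_2, fold=3.0):
    """Return (d1, d2, b) for a paralogous pair, or None if the pair is excluded.

    ratios_1, ratios_2 : {condition: fold-change ratio} for each duplicate
    A pair is excluded if neither gene responds to any stressor, or if some
    stressor induces one gene but represses the other.
    """
    d1 = d2 = b = 0
    for condition in ratios_1.keys() & ratios_2.keys():
        s1 = response_sign(ratios_1[condition], fold)
        s2 = response_sign(ratios_2[condition], fold)
        if s1 * s2 < 0:          # opposite responses to the same stressor
            return None
        d1 += s1 != 0
        d2 += s2 != 0
        b += s1 != 0 and s2 != 0
    if d1 == 0 and d2 == 0:      # neither duplicate responds to anything
        return None
    return d1, d2, b


# Toy ratios (hypothetical values, not taken from the data set)
gene_1 = {"heat shock": 4.0, "H2O2": 1.0, "sorbitol": 0.2}
gene_2 = {"heat shock": 3.5, "H2O2": 1.2, "sorbitol": 1.0}
counts = summarize_pair(gene_1, gene_2)
```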
Gene Perturbations and Gene Expression
Data summarizing the effects of 271 gene deletions (and other treatments) on gene expression were made available as supplemental material to Hughes et al. (2000), file data_expts_1-300_ratios.txt. From this data set, which contains log10-transformed expression ratios of 6,312 genes for each mutation, I eliminated all data derived from haploid and aneuploid deletion strains, as well as data on nongenetic treatments. The remaining data contain information on null mutation effects for a total of 21 paralogous gene pairs, the most closely related 11 of which (Ka < 1) are discussed here. For each member gene of each paralogue, I determined what other genes were affected in their expression level by a synthetic-null mutation in the gene. I also determined the number of genes that were affected by a null mutation in each paralogue. I considered a gene as affected by a null mutation if its level of mRNA expression had changed by more than threefold in response to the mutation.
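Because the ratios are log10-transformed, a change of more than threefold in either direction corresponds to an absolute log10 ratio above log10(3), roughly 0.477. A minimal sketch of this thresholding, assuming each deletion experiment is available as a mapping from gene name to log10 ratio (the function names and toy values below are hypothetical):

```python
import math

LOG10_THREEFOLD = math.log10(3.0)  # about 0.477


def affected_genes(log10_ratios):
    """Return the set of genes whose expression changed more than threefold."""
    return {gene for gene, value in log10_ratios.items()
            if abs(value) > LOG10_THREEFOLD}


def knockout_overlap(ratios_deletion_1, ratios_deletion_2):
    """Return (d1, d2, b) for null mutations in the two members of a pair."""
    set_1 = affected_genes(ratios_deletion_1)
    set_2 = affected_genes(ratios_deletion_2)
    return len(set_1), len(set_2), len(set_1 & set_2)


# Toy log10 ratios (hypothetical values)
deletion_1 = {"G1": 0.60, "G2": -0.50, "G3": 0.10, "G4": 0.90}
deletion_2 = {"G1": 0.55, "G2": 0.05, "G3": 0.00, "G4": 0.20}
d1, d2, b = knockout_overlap(deletion_1, deletion_2)
```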
The total number of gene pairs to test for symmetric divergence is much smaller than that available for protein interactions and environmental stress response, but the number of affected genes per null mutation is much larger. This means not only that the above test must be modified but also that it is now possible to test each individual gene pair for symmetric divergence. I present an exact test only for the two extreme cases of loss and gain of function after duplication. Let d1 and d2 be the number of genes affected by a synthetic null mutation in genes 1 and 2, respectively, of a paralogous pair. Let b be the number of genes affected by both mutations. Under the null hypothesis of symmetric (equiprobable) loss-of-effects on other genes, a null mutation in either duplicate would have affected d1 + d2 - b other genes immediately after duplication. Of these effects, l1 = d2 - b and l2 = d1 - b were subsequently lost in genes 1 and 2, respectively, adding to a total of d1 + d2 - 2b lost effects. A disparity between l1 and l2 indicates asymmetry in divergence. The probability P of a disparity as big as or bigger than that actually observed, by chance alone, is calculated by summing over the tails of a binomial distribution B(d1 + d2 - 2b, 1/2), so

$$P = \sum_{k \,:\, |2k - n| \ge |l_1 - l_2|} \binom{n}{k} \left(\frac{1}{2}\right)^{n}, \qquad n = d_1 + d_2 - 2b = l_1 + l_2.$$
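For the loss-of-effects case, the tail probability can be computed exactly with integer arithmetic. The sketch below assumes a two-tailed sum over a binomial distribution with success probability 1/2, as described in the text; the function name `asymmetry_p_value` is hypothetical and the example counts are arbitrary.

```python
from math import comb


def asymmetry_p_value(d1, d2, b):
    """Two-tailed exact probability of a disparity at least as large as observed.

    d1, d2 : numbers of genes affected by a null mutation in genes 1 and 2
    b      : number of genes affected by both mutations
    Under symmetric loss, each of the n = d1 + d2 - 2b lost effects falls on
    gene 1 or gene 2 with probability 1/2.
    """
    l1 = d2 - b                  # effects lost in gene 1
    l2 = d1 - b                  # effects lost in gene 2
    n = l1 + l2
    observed = abs(l1 - l2)
    # Count outcomes k (losses in gene 1) whose disparity |k - (n - k)|
    # is at least as large as the observed one.
    tail = sum(comb(n, k) for k in range(n + 1) if abs(2 * k - n) >= observed)
    return tail / 2 ** n


p_example = asymmetry_p_value(d1=30, d2=8, b=3)
```

Summing binomial coefficients as integers and dividing once at the end avoids the rounding errors that accumulate when many small floating-point terms are added.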
The following are brief annotations (Mewes et al. 1999) of all genes listed in table 1 (in order of appearance), with the exception of genes with seven-letter names, which correspond to genes of completely uncharacterized functions. MBP1: subunit of the MBF transcription factor; SWI4: transcription factor; ERP2: p24 protein involved in membrane trafficking; ERP4: similarity to human COP-coated vesicle membrane protein; CLB6: B-type cyclin; CLB2: G2/M-specific cyclin; ISW1 and ISW2: strong similarities to Drosophila ISW1 gene; RAD27: ssDNA endonuclease and 5'-3' exonuclease; VPS21: GTP-binding protein; CAT8: transcription factor involved in gluconeogenesis; SIR2: silencing regulatory protein and DNA-repair protein; HST3: silencing protein; PAU2: strong similarity to members of the Srp1p/Tip1p family; ALD5: aldehyde dehydrogenase 2 (NAD+); DIG1 and DIG2: MAP kinase-associated proteins, down-regulator of invasive growth and mating. Further information on the genes affected by a particular perturbation is available at http://www.rosettainpharmatics.com/publications/cell_hughes.htm as well as at the Munich Information Center for Protein Sequences (http://mips.gsf.de/proj/yeast/).
Results |
---|
More than 30% of yeast genes whose products interact with proteins have one or more gene duplicates in the yeast genome (Wagner 2001). How do gene duplications influence the structure of the protein interaction network? Figure 1 shows a hypothetical protein P that interacts with four other proteins. Immediately after duplication of the gene encoding P, P and its duplicate P* share all four interactions. As the duplicates diverge in sequence, they also diverge in their protein interactions. Each protein may occasionally gain new interactions. But if mutations are more likely to cause loss of an interaction, as suggested by the prevalence of degenerative mutations in general (Li 1997), then most divergences will be due to loss of originally common protein interactions. Here, I use the number of interaction partners a protein has as a crude one-dimensional indicator of protein function. The number of common and different interactions between two duplicates then indicates their functional divergence.
Figure 2a shows the number of interaction partners for 1,734 pairs of paralogous genes in the network described by Uetz et al. (2000). These comprise all paralogous pairs with Ka < 1 nonsynonymous substitutions per nonsynonymous site, corresponding to genes with less than 60% amino acid divergence. The abscissa and ordinate axes show the number of protein interactions for the first and second protein member of each pair. The number of common interactions in these pairs is small: even among the most recent paralogues (synonymous substitutions per synonymous site Ks < 0.5) less than 60% share any interactions at all, and this number dwindles to less than 15% for more distant paralogues (Ks > 1) (Wagner 2001).
|
The second scenario assumes that all divergence is due to symmetric (equiprobable) gain of interactions (Pl = 0) in the two duplicates. It yields identical results (Spearman rs = -0.1, P << 10^-3; Pearson r = 0.46, P << 10^-3, df = 1,732). The third scenario assumes that divergence is due to a mix of both loss and gain of interactions (Pl = 0.5), where both duplicates lose or gain interactions symmetrically, that is, with equal probability. It also leads to a fundamentally different distribution of interactions compared with that observed in the data (Spearman rs = -0.08, P << 10^-3; Pearson r = 0.46, P << 10^-3, df = 1,732). Similar to the simulated data shown in figure 2b, the L-shape observed in the data also disappears under the latter two scenarios.
Independent genome-scale two-hybrid experiments using different experimental designs (Uetz et al. 2000; Ito et al. 2001) show limited overlap in the interactions they detect. It is thus advisable to ensure that the observed patterns of divergence are not artifacts of a particular experimental technique. I have repeated the above analysis with yeast protein interaction data taken from the MIPS database (Mewes et al. 1999), from which I eliminated all protein interaction information generated by two-hybrid experiments. The remaining 899 interactions among 680 yeast proteins have been experimentally confirmed using techniques ranging from Western blotting to coimmunoprecipitation. The global pattern of interactions among paralogues follows closely that of the two-hybrid data, an L-shaped distribution indicating asymmetry (fig. 2c) and a highly negative statistical association (Spearman rs = -0.52, P << 10^-3; Pearson r = -0.15, P << 10^-3, df = 1,357). This pattern is not explicable through symmetric loss of interactions (Spearman rs = 0.12, P << 10^-3; Pearson r = 0.54, P << 10^-3, df = 1,357), symmetric gain of interactions (Spearman rs = 0.10, P << 10^-3; Pearson r = 0.49, P << 10^-3, df = 1,357), or symmetric gain and loss of interactions (Spearman rs = 0.09, P << 10^-3; Pearson r = 0.53, P << 10^-3, df = 1,357).
In summary, protein interactions among products of duplicate genes diverge asymmetrically, i.e., one paralogue has more protein interactions than the other. This asymmetry is statistically highly significant and is not explicable through independent (equiprobable) loss or gain of function in the duplicates.
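To illustrate why an L-shaped distribution yields a negative association, the two statistics reported above can be computed on a small toy example. The sketch below uses only the Python standard library and assigns average ranks to ties for the Spearman coefficient; the function names and the toy counts are hypothetical, and significance testing is not included.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy L-shaped data: in each pair one member has many interactions, the other few
first = [0, 0, 0, 1, 3, 5]
second = [1, 4, 6, 0, 0, 0]
rs = spearman(first, second)
```

In such data, a large count for one duplicate goes with a small count for the other, so the ranks are negatively associated, whereas symmetric divergence from a common ancestral count tends to produce a positive association.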
Asymmetric Response to Environmental Stresses
Unicellular organisms like yeast have evolved elaborate cellular responses, allowing them to adapt to drastic environmental changes. They can not only withstand fluctuations in temperature, osmolarity, environmental acidity, and types and quantity of nutrients but also survive the influence of radiation and toxic chemicals. During environmental change, many genes alter their transcriptional activity. Such changes in mRNA expression profile provide valuable insights into gene functions (Chu et al. 1998; Eisen et al. 1998; Spellman et al. 1998; Gasch et al. 2000). A recent study examined the genomic mRNA expression response of most yeast genes to a variety of environmental stressors (Gasch et al. 2000). To assess the differential response of duplicate genes to these stressors, I analyzed data from 11 different stress responses, including heat shock, hyperosmotic shock, amino acid starvation, and nitrogen depletion (Gasch et al. 2000). I excluded the most closely related paralogues (Ks < 0.5) from the analysis because cross-hybridization does not allow them to be distinguished by microarray analysis. For the remaining 2,841 paralogous gene pairs, with Ka < 0.75 and Ks > 0.5, I identified the number of stressors to which each member of the pair responds.
There is again a pronounced asymmetry in the response of gene duplicates to these stresses, as indicated by a significantly negative statistical association between the number of stresses the first and second gene respond to (Spearman rs = -0.33, P << 10^-3; Pearson r = -0.1, P << 10^-3, df = 2,839). Completely analogous to the tests for symmetric divergence in protein interactions, I analyzed whether this association is consistent with the null hypothesis that the paralogues originally responded identically to these 11 stresses but that divergence occurred symmetrically for the two gene duplicates. This hypothesis must be rejected, regardless of whether divergence occurs through loss of responses (Spearman rs = -0.003, P > 0.5; Pearson r = 0.21, P << 10^-3, df = 2,839), gain of responses (Spearman rs = -0.0004, P > 0.5; Pearson r = 0.21, P << 10^-3, df = 2,839), or a mix of loss and gain of responses (Spearman rs = -0.002, P > 0.5; Pearson r = 0.21, P << 10^-3, df = 2,839). In summary, the distinct asymmetry in divergence observed for protein interactions also holds for another aspect of gene function, the response to environmental stress.
Asymmetric Response to Genetic Perturbations
The results of a large-scale gene perturbation experiment in yeast, involving several hundred gene-knockout mutations in combination with microarray measurements of changes in the expression of 6,312 yeast genes, have been reported (Hughes et al. 2000). Measuring the effect of a null mutation in a gene on the expression of all other genes does not distinguish between direct and indirect effects of the mutation. Its advantage, however, is that it is a very comprehensive means to assay genetic interactions.
For the purpose of this article it is relevant that the available data (Hughes et al. 2000) contain information on the knockout effect of 11 paralogous gene pairs with Ka < 1. For these 11 gene pairs, I compared the number of genes whose expression is affected by a null mutation in each member of the pair (table 1). Interpreting differences between paralogues in the number of affected genes is complicated because these differences are not only the result of divergence between the paralogues but also include effects from the divergence of genes interacting with each paralogue. But the advantage of a perturbation approach is that it provides a more comprehensive assessment of functional differences between paralogues than a mere analysis of direct physical protein interactions. It exposes how the effects of a mutation ripple through a transcriptional regulation network.
Similar to the analysis discussed above, one can ask whether the observed differences between paralogues can be attributed to independent and equiprobable loss or gain of genetic interactions. For seven out of 11 gene pairs in table 1, both these null hypotheses must be rejected, that is, these seven gene pairs show statistically significant asymmetries in divergence. Eliminating one of two paralogous genes affects a substantially greater number of other genes than eliminating the other.
Discussion |
---|
What causes asymmetric divergence? Here, I present a simple model of divergence through loss of common functions. Figure 3 explains the basic idea behind this model. It applies to the divergence of genes that have several suitably defined functions (represented by white boxes in fig. 3a), as indicated by observed molecular interactions or patterns of gene expression. Immediately after a duplication, two duplicates are identical in all these functions. The model makes only two assumptions about the process of divergence, both of them very simple. First, every function must be exercised by at least one of the two genes. Organisms in which this does not hold will suffer reduced fitness. Second, a loss-of-function mutation (1) affects each of the duplicates with equal probability (1/2), and (2) eliminates one of the affected gene's functions. In this context, what is the probability Pd of suffering a deleterious mutation if the two duplicates have diverged symmetrically versus asymmetrically? Asymmetric divergence means that one duplicate has lost more functions than the other. Assume that since the duplication, duplicates 1 and 2 have lost a fraction l1 and l2 of their functions, respectively (0 < l1, l2 ≤ 1). Let l = l1 + l2 be the total fraction of functions lost (0 ≤ l ≤ 1). If no function is allowed to have been lost in both genes, the probability that a mutational loss of one further function has a deleterious effect is equal to
|
|
|
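One way to make the preceding argument concrete is to derive the probability directly from the two stated assumptions; this is a reading of the model, not the original equation. A mutation hits duplicate 1 with probability 1/2 and removes one of its 1 - l1 remaining functions. It is deleterious only if duplicate 2 has already lost that function, which is true for a fraction l2 of all functions, all of them still present in duplicate 1. The same reasoning applies with the roles exchanged, so that Pd = (1/2) [l2 / (1 - l1) + l1 / (1 - l2)]. The sketch below, with the hypothetical function name `p_deleterious` and the added assumption that the lost function is chosen uniformly at random, compares a symmetric and an asymmetric split of the same total loss.

```python
def p_deleterious(l1, l2):
    """Probability that one further loss-of-function mutation is deleterious.

    l1, l2 : fractions of the ancestral functions lost by duplicates 1 and 2
             (no function lost in both, so l1 + l2 <= 1).
    Assumes the mutation hits either duplicate with probability 1/2 and
    removes one of its remaining functions uniformly at random.
    """
    if l1 < 0 or l2 < 0 or l1 + l2 > 1:
        raise ValueError("need l1, l2 >= 0 and l1 + l2 <= 1")

    def hit(lost_self, lost_other):
        remaining = 1.0 - lost_self
        if remaining == 0.0:       # nothing left to lose in this duplicate
            return 0.0
        # Functions already lost by the other copy are all still present here
        return lost_other / remaining

    return 0.5 * (hit(l1, l2) + hit(l2, l1))


total_lost = 0.5
symmetric = p_deleterious(total_lost / 2, total_lost / 2)   # equal losses
asymmetric = p_deleterious(total_lost, 0.0)                 # one copy lost all
```

Under these assumptions the asymmetric split gives the smaller probability for the same total loss, which is the sense in which asymmetrically diverged duplicates are more robust to further loss-of-function mutations.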
One might assume that the selective advantage of having asymmetrically diverged gene duplicates must be minute. After all, differences in fitness do not manifest themselves until new loss-of-function mutations arise. For any organism, the expected waiting time for such a new loss-of-function mutation is proportional to the inverse of the mutation rate µ (Hartl and Clark 1989, p. 98). During this time, symmetrically diverged gene duplicates are free to go to fixation via random drift. Formal population genetic analysis (Wagner 2000) shows that for sufficiently large population sizes (N > 1/µ) the lens of natural selection has sufficient resolving power to perceive differences in mutational robustness and to act on them. For microorganisms like yeast, attainable population sizes may well be in the required range. In addition, this minimally required population size is based on the evolution of only one diverging gene pair (Wagner 2000). It may be much smaller for multiple gene pairs and their cumulative effects on mutational robustness.
The requirement for large effective population sizes suggests a test for the model. In organisms with small effective population sizes, such as many higher vertebrates, we would not expect asymmetric divergence of gene duplicates. (The necessary data are not yet available.) A requirement for persistently large population sizes may also be one of the reasons why the asymmetry observed is not perfect and does not hold for all genes. Depending on a gene and its functions, a loss-of-function mutation may have very subtle fitness effects. In conjunction with fluctuating effective population sizes, the selection pressures for asymmetrical divergence may fluctuate as well. Some genes thus diverge symmetrically, whereas others do not.
The foundation of this speculative model is the assumption that gene duplicates diversify mostly through loss of common functions. The model is thus a neutral model in the sense that adaptive mutations providing fitness benefits play no role in it. Although neutral divergence of gene duplicates has received much attention in recent work (Nowak et al. 1997; Gibson and Spring 1998; Force, Lynch, and Postlethwait 1999; Wagner 1999; Lynch and Force 2000; Wagner 2000) and is probably an important mode of gene evolution, the importance of beneficial mutations must not be neglected (Hughes 1994; Kreitman and Akashi 1995; Walsh 1995; Ludwig, Patel, and Kreitman 1997; Cirera and Aguade 1998; Tsaur, Ting, and Wu 1998). Recent evidence using fully sequenced genomes further underscores the abundance of beneficial mutations and thus the importance of scenarios of sequence divergence that involve such mutations (Fay, Wyckoff, and Wu 2002). Although it is not clear how adaptive mutations might lead to asymmetric functional divergence of gene duplicates, the cause may be as simple as that one adaptive mutation leads to a cascade of further such mutations and consequent functional change. To distinguish between neutral models of asymmetric functional divergence and models involving adaptive mutations will be a major task for future work.
Acknowledgements |
---|
Footnotes |
---|
Keywords: protein interaction networks, microarrays, gene knockout, biochemical innovation
Address for correspondence and reprints: Andreas Wagner, Department of Biology, University of New Mexico, 167A Castetter Hall, Albuquerque, New Mexico 87131-1091. wagnera{at}unm.edu
References |
---|
Altschul S. F., T. L. Madden, A. A. Schaffer, J. H. Zhang, Z. Zhang, W. Miller, D. J. Lipman, 1997 Gapped BLAST and PSI-BLAST: a new generation of protein database search programs Nucleic Acids Res 25:3389-3402
Ashburner M., C. A. Ball, J. A. Blake, et al. (20 co-authors) 2000 Gene ontology: tool for the unification of biology Nat. Genet 25:25-29
Bartel P. L., J. A. Roecklein, D. SenGupta, S. Fields, 1996 A protein linkage map of Escherichia coli bacteriophage T7 Nat. Genet 12:72-77
Bender W., M. Akam, F. Karch, P. A. Beachy, M. Peifer, P. Spierer, E. B. Lewis, D. S. Hogness, 1983 Molecular genetics of the Bithorax complex in Drosophila melanogaster Science 221:23-29
Chu S., J. Derisi, M. Eisen, J. Mulholland, D. Botstein, P. O. Brown, I. Herskowitz, 1998 The transcriptional program of sporulation in budding yeast Science 282:699-705
Cirera S., M. Aguade, 1998 Molecular evolution of a duplication: the sex-peptide (Acp70a) gene region of Drosophila subobscura and Drosophila madeirensis Mol. Biol. Evol 15:988-996
Coen E. S., E. M. Meyerowitz, 1991 The war of the whorls: genetic interactions controlling flower development Nature 353:31-37
Eisen M. B., P. T. Spellman, P. O. Brown, D. Botstein, 1998 Cluster analysis and display of genome-wide expression patterns Proc. Natl. Acad. Sci. USA 95:14863-14868
Ekker M., M. A. Akimenko, M. L. Allende, R. Smith, G. Drouin, R. M. Langille, E. S. Weinberg, M. Westerfield, 1997 Relationships among msx gene structure and function in zebrafish and other vertebrates Mol. Biol. Evol 14:1008-1022
Ekker S. C., A. R. Ungar, P. Greenstein, D. P. Vonkessler, J. A. Porter, R. T. Moon, P. A. Beachy, 1995 Patterning activities of vertebrate hedgehog proteins in the developing eye and brain Curr. Biol 5:944-955
Fay J. C., G. J. Wyckoff, C. I. Wu, 2002 Testing the neutral theory of molecular evolution with genomic data from Drosophila Nature 415:1024-1026
Force A., M. Lynch, J. Postlethwait, 1999 Preservation of duplicate genes by subfunctionalization Am. Zool 39:460.
Fromont-Racine M., J. C. Rain, P. Legrain, 1997 Toward a functional analysis of the yeast genome through exhaustive two-hybrid screens Nat. Genet 16:277-282
Gasch A. P., P. T. Spellman, C. M. Kao, O. Carmel-Harel, M. B. Eisen, G. Storz, D. Botstein, P. O. Brown, 2000 Genomic expression programs in the response of yeast cells to environmental change Mol. Biol. Cell 11:4241-4257
Gibson T. J., J. Spring, 1998 Genetic redundancy in vertebrates: polyploidy and persistence of genes encoding multidomain proteins Trends Genet 14:46-49
Hartl D. L., A. G. Clark, 1989 Principles of population genetics Sinauer Associates, Sunderland, Mass
Hughes A. L., 1994 The evolution of functionally novel proteins after gene duplication Proc. R. Soc. Lond. Ser. B Biol. Sci 256:119-124
Hughes T. R., M. J. Marton, A. R. Jones, et al. (22 co-authors) 2000 Functional discovery via a compendium of expression profiles Cell 102:109-126
Ito T., T. Chiba, R. Ozawa, M. Yoshida, M. Hattori, Y. Sakaki, 2001 A comprehensive two-hybrid analysis to explore the yeast protein interactome Proc. Natl. Acad. Sci. USA 98:4569-4574
Ito T., K. Tashiro, S. Muta, R. Ozawa, T. Chiba, M. Nishizawa, K. Yamamoto, S. Kuhara, Y. Sakaki, 2000 Toward a protein-protein interaction map of the budding yeast: a comprehensive system to examine two-hybrid interactions in all possible combinations between the yeast proteins Proc. Natl. Acad. Sci. USA 97:1143-1147
Jack J., Y. Delotto, 1995 Structure and regulation of a complex locus: the cut gene of Drosophila Genetics 139:1689-1700
Kirchhamer C. V., C. H. Yuh, E. H. Davidson, 1996 Modular cis-regulatory organization of developmentally expressed genes: 2 genes transcribed territorially in the sea-urchin embryo and additional examples Proc. Natl. Acad. Sci. USA 93:9322-9328
Kreitman M., H. Akashi, 1995 Molecular evidence for natural selection Annu. Rev. Ecol. Syst 26:403-422
Lee K. H., Q. H. Xu, R. E. Breitbart, 1996 A new tinman-related gene, nkx2.7, anticipates the expression of nkx2.5 and nkx2.3 in zebrafish heart and pharyngeal endoderm Dev. Biol 180:722-731
Li W.-H., 1993 Unbiased estimation of the rates of synonymous and nonsynonymous substitution J. Mol. Evol 36:96-99
Li W.-H., 1997 Molecular evolution Sinauer Associates, Sunderland, Mass
Li X. L., M. Noll, 1994 Evolution of distinct developmental functions of 3 Drosophila genes by acquisition of different cis-regulatory regions Nature 367:83-87
Ludwig M. Z., N. H. Patel, M. Kreitman, 1997 Evolution of the even-skipped stripe-2 enhancer of Drosophila Dev. Biol 186:A27.
Lynch M., J. S. Conery, 2000 The evolutionary fate and consequences of duplicate genes Science 290:1151-1155
Lynch M., A. Force, 2000 The probability of duplicate gene preservation by subfunctionalization Genetics 154:459-473
Mehlhorn K., S. Naher, 1999 LEDA: a platform for combinatorial and geometric computing Cambridge University Press, Cambridge
Mena M., B. A. Ambrose, R. B. Meeley, S. P. Briggs, M. F. Yanofsky, R. J. Schmidt, 1996 Diversification of C-function activity in maize flower development Science 274:1537-1540
Mewes H. W., K. Heumann, A. Kaps, K. Mayer, F. Pfeiffer, S. Stocker, D. Frishman, 1999 MIPS: a database for genomes and protein sequences Nucleic Acids Res 27:44-48
Nowak M. A., M. C. Boerlijst, J. Cooke, J. Maynard-Smith, 1997 Evolution of genetic redundancy Nature 388:167-171
Rubin G. M., M. D. Yandell, J. R. Wortman, et al. (54 co-authors) 2000 Comparative genomics of the eukaryotes Science 287:2204-2215
Schmidt R. J., B. Veit, M. A. Mandel, M. Mena, S. Hake, M. F. Yanofsky, 1993 Identification and molecular characterization of zag1, the maize homolog of the Arabidopsis floral homeotic gene agamous Plant Cell 5:729-737
Schwikowski B., P. Uetz, S. Fields, 2000 A network of protein-protein interactions in yeast Nat. Biotechnol 18:1257-1261
Slusarski D. C., C. K. Motzny, R. Holmgren, 1995 Mutations that alter the timing and pattern of cubitus interruptus gene expression in Drosophila melanogaster Genetics 139:229-240
Spellman P. T., G. Sherlock, B. Futcher, P. O. Brown, D. Botstein, 1998 Identification of cell-cycle regulated genes in yeast by DNA microarray hybridization Mol. Biol. Cell 9:2155.
Tsaur S. C., C. T. Ting, C. I. Wu, 1998 Positive selection driving the evolution of a gene of male reproduction, Acp26aa, of Drosophila: divergence versus polymorphism Mol. Biol. Evol 15:1040-1046
Uetz P., L. Giot, G. Cagney, et al. (20 co-authors) 2000 A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae Nature 403:623-627
Wagner A., 1999 Redundant gene functions and natural selection J. Evol. Biol 12:1-16
Wagner A., 2000 The role of pleiotropy, population size fluctuations, and fitness effects of mutations in the evolution of redundant gene functions Genetics 154:1389-1401
Wagner A., 2001 The yeast protein interaction network evolves rapidly and contains few duplicate genes Mol. Biol. Evol 18:1283-1292
Walsh J. B., 1995 How often do duplicated genes evolve new functions? Genetics 139:421-428