Department of Biology, University of New Mexico, and Santa Fe Institute, Santa Fe, New Mexico
![]() |
Abstract |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
![]() |
Introduction |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
Only indirect evidence is available for rapid functional divergence after gene duplication. A number of case studies suggest that positive selection of advantageous mutations occurs after duplication (Long and Langley 1993
; Benton et al. 1997
; Cirera and Aguade 1998
; Tsaur, Ting, and Wu 1998
; Zhang, Rosenberg, and Nei 1998
). Unfortunately, it is usually unknown whether such mutations often cause a change in function. An exception is the primate genes for eosinophil cationic proteins (ECPs) and for eosinophil-derived neurotoxin (EDN). They were duplicated an estimated 31 MYA (Zhang et al. 1998
). EDN has high RNAse activity and may act as an antiretroviral agent. ECP is an antibacterial toxin exerting its effects independently of RNAse activity by making pores in bacterial cell membranes. A second line of evidence involves the divergence of expression patterns after duplication. If duplicates are expressed in different parts of an organism, then they are likely to have different biological functions, regardless of whether their biochemical activities are identical. Examples include the expression patterns of the dopa decarboxylase gene and the
-methyldopa hypersensitive (amd) gene from Drosophila melanogaster. In the closely related sibling species Drosophila simulans, the amd gene has evolved a new expression pattern within the last 25 Myr (Wang, Marsh, and Ayala 1996
).
The relatively few examples of rapidly diverging functions in duplicate genes contrast with the greater documented number of gene duplicates with similar functions. However, it is not valid to conclude that rapid divergence of function is rare. After all, our knowledge in this area relies only on case studies. The purpose of this paper is to address this question more systematically and on a genomewide scale. At what rate does functional divergence occur after gene duplication for a large sample of duplicate genes in a genome? Addressing this question will also shed light on the evolution of a large genetic network.
Before studying the evolution of function in many genes, one has to make a difficult choice. What aspect of function should one focus on? Genes exert their biological roles in many different ways. Some gene products are parts of subcellular structures, others engage in protein-protein interactions, protein-DNA interactions, or catalytic interactions with small molecules. Moreover, genes with the same biochemical activities may be expressed at different times or in different places. A detailed biochemical understanding of the function of all of an organism's genes is not within reach. However, genomic technologies permit the analysis of one aspect of gene function for all genes. For instance, microarray analysis (DeRisi, Iyer, and Brown 1997
) can be used to analyze transcriptional regulatory interactions on a genomewide scale. The validity of any functional genomics study rests on the implicit assumption that studying one aspect of gene function allows us to learn about the biology of an organism. The aspect of gene function studied here is protein-protein interaction, i.e., the number and identity of proteins with which the products of duplicate genes in the yeast Saccharomyces cerevisiae interact. The required information on these protein-protein interactions comes from a large-scale experiment (Uetz et al. 2000)
using the yeast two-hybrid assay (Fields and Song 1989
).
The yeast two-hybrid assay (Fields and Song 1989
) takes advantage of the fact that many eukaryotic transcription factors have two separate domains, one required for binding of the transcription factor to DNA, and another one required for interaction with RNA polymerase II and for the initiation of transcription. In a two-hybrid assay, a known protein fused to the DNA-binding domain of a transcription factor is transfected into a yeast cell bearing a reporter gene that is under the control of the DNA-binding domain. This fusion protein can then be used as "bait" to screen a library of cDNA clones fused to the activation domain of the transcription factor. If the bait protein interacts physically with a fusion protein expressed from the library, then the DNA-binding domain and the activation domain are held in physical proximity, and the reporter gene is expressed. In an effort to elucidate protein interactions on a genomewide basis, this method has been applied to the genomes of viruses (Bartel et al. 1996
; McGraith et al. 2000
), as well as to the genome of budding yeast, yielding a comprehensive map of protein-protein interactions in yeast (Uetz et al. 2000)
.
Large-scale two-hybrid assays are tools to identify candidates for interacting proteins. However, they may identify interactions erroneously. First, a protein itself may be able to activate transcription without interacting with an activation domain fusion. Second, because of the extensive use of chimeric proteins, misfolding might occur and yield spurious interactions. Third, some proteins might be toxic when expressed in yeast. Fourth, proteins that are coexpressed during the assay might not normally be expressed at the same time or in the same location. Other interactions might be missed, for example, because they are too transient or because a protein has a strong targeting signal directing it to some compartment other than the nucleus. In sum, the map of protein-protein interactions (Uetz et al. 2000)
used as the raw material for this study is best viewed as a statistical estimate of the protein interaction network, an estimate with a possibly considerable number of false-positive and false-negative interactions.
![]() |
Materials and Methods |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
|
Random Graph Comparison
Two types of random graphs are explored here: Erdõs-Rényi (ER) random graphs, and random graphs with a degree distribution that follows a power law (PL). An ER random graph consists of n nodes and k edges, where any pair of nodes is equally likely to be connected by one of the k edges (Bollobás 1985
). Finding a random graph with a best fit to the observed protein network is made difficult by the fact that a random graph of the same n and k as the yeast protein network has an expected number of n0 = 159 isolated nodes, i.e., nodes that are not connected to any other nodes (n0
n[1 - 2k/(n(n - 1)]n-1 for a sparse random graph). However, by its very nature, the protein interaction network is defined only for nodes that are not isolated. In order to find a random graph that is a best fit for the yeast protein interaction network but does not have any isolated nodes, a random graph with the same k but a larger number n of nodes than the protein network was chosen. With the aid of the above formula, this n was chosen such that when all isolated nodes were removed from the resulting random graph, the remaining graph had (1) an expected number of n = 999 nodes, (2) k = 899 edges, and (3) no isolated edges, as does the yeast protein interaction network. The required value of n was calculated numerically as n = 1,325. Random number generators used were based on Mehlhorn and Naher (1999, section 3.2.2) as described by Knuth (1981, p. 92).
PL random graphs are random graphs whose degree probability distribution P(d) is proportional to d- for some constant
. Such graphs, with a prespecified number of n = 6,279 nodes and a number of edges k approximately identical to that of the yeast protein interaction network, were generated following a prescription by M. Newman (personal communication). Briefly, a random graph with n = 6,279 isolated nodes was generated. A node was then chosen at random from this graph. A random integer d > 0 with the desired PL distribution was then assigned to this node in the following way. First, a random number d =
-
*log(1 - r)
was generated, where r is a random real number uniformly distributed in the interval (0, 1), and
> 0 is a constant (see below). The symbol
x
refers to the smallest integer greater than x. Second, this number d was accepted with probability d-
. If d was not accepted, it was discarded, a new d was generated according to the same prescription, and this process was repeated until a d was accepted. Strictly speaking, the resulting distribution of d is a PL with an exponential cutoff, P(d)
d-
exp(-d/
). However, a large value of
= 1,000 was used here, such that the distortion caused by the cutoff was negligible. Once a d was accepted, it was assigned to the randomly chosen node. Another node was chosen at random (without replacement of the previously chosen node), an integer d was assigned to it in the same way, and this process was repeated until the sum S of all the integers assigned to the chosen nodes first exceeded 2k. The integers assigned to each node correspond to the node's degree. They may also be thought of as the "stubs" of edges emerging from a node. Two such stubs were then chosen at random, and the respective nodes were connected via an edge until the reservoir of stubs was exhausted, i.e., until S/2 edges had been placed on the graph.
Gene Duplication Data
Data on yeast gene duplicates were obtained from John Conery (Department of Computer Science, University of Oregon) and were generated as described in Lynch and Conery (2000)
. Briefly, gapped BLAST (Altschul et al. 1997
) was used for pairwise amino acid sequence comparisons of all yeast open reading frames as obtained from GenBank. All protein pairs with BLAST alignment scores greater than 10-2 were retained for further analysis. Then, the following conservative approach was followed to retain only unambiguously aligned sequences. Using the protein alignment generated by BLAST as a guide, a sequence pair was scanned to the right of each alignment gap. All sequence from the end of the gap through the first "anchor" pair of matched amino acids was discarded. All subsequent sequence (exclusive the anchor pair of amino acids) was retained if a second pair of matching amino acids was found within less than six amino acids from the first. This procedure was then repeated to the left of each alignment gap (see Lynch and Conery [2000] for a more detailed description and justification). The retained portion of each amino acid sequence alignment was then used jointly with DNA sequence information to generate nucleotide sequence alignments of genes. For each gene pair in this data set, the fraction Ks of synonymous (silent) substitutions per silent site, as well as the fraction Ka of replacement substitutions per replacement site, was estimated using the method of Li (1993)
. Of all aligned gene pairs, only those with Ks < 5 were included in the analysis presented here. There were 9,059 such pairs (1,041 pairs with 0
Ks < 1, 2,378 pairs with 1
Ks < 2, 3,822 pairs with 2
Ks < 3, 1,403 pairs with 3
Ks < 4, and 415 pairs with 4
Ks < 5). The mean and maximum values of Ka for these 9,059 gene pairs were 0.68 (SE = 2.8 x 10-3) and 1.55, respectively. Both member genes were part of the protein interaction network for 387 among these 9,059 genes. Genes with only one paralog and genes that occurred in multigene families were not distinguished here. All results reported are based on estimates of Ks.
![]() |
Results |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
|
|
|
It is conceivable that the size of the subnet a gene is part of influences its propensity to undergo duplications. For genes found outside the largest component of the network, there were mean component sizes of 6.34 (s = 0.62) and 6.49 (s = 0.56) for single-copy genes (n = 191) and duplicated genes (n = 249), respectively, a difference that is not statistically significant (F = 0.03; P = 0.86). This did not change if only duplicates with Ks < 1 were considered (results not shown).
Evolution of Shared Interaction Partners Among Closely Related Proteins
Figure 3
illustrates the effect of a gene duplication on gene products involved in protein interactions. Shortly after the duplication, the products of both duplicated genes will have identical interaction partners. Over time, either gene may gain new interaction partners or, perhaps more likely, lose one or more of its interaction partners. The number of shared interactions might be taken as a crude measure of the overlap in the two genes' functions. Eventually, duplicate genes may not interact at all with any of the proteins that they interacted with before duplication, or they may even cease to engage in protein-protein interactions. At least two evolutionary questions can be posed in this context. First, on what timescale does this divergence take place? Second, do most gene duplicates eventually lose all common interactions, or do they retain some common interactions indefinitely? Because absolute duplication time estimates are unavailable for most yeast gene duplications, divergence estimates among duplicates here are based on the fraction Ks of synonymous substitutions per synonymous site (Li 1997
). Divergence among synonymous sites is a better indicator of relative times since duplication than many other distance measures between DNA or protein sequences, partly because these sites are under fewer evolutionary constraints than are nonsynonymous sites. The rate of spontaneous mutations per base pair and per round of DNA replication for yeast (2.2 x 10-10) is similar to those of Drosophila (3.4 x 10-10) and mice (1.8 x 10-10; Drake et al. 1998
). Based on this observation, and in the absence of measurements of the rate of synonymous substitutions for yeast, it is assumed here that the rate of synonymous substitutions is similar as well. Both mammals and drosophilids are taxa for which rates of synonymous substitutions are known for a variety of genes (Li 1997
). Based on this information, the yeast rate is assumed to be the average rate of mammals (3.5 substitutions per site per billion years) and Drosophilids (15.4 per billion years; Li 1997
). This yields Ks = 9.45 per site per billion years. A Ks = 1 between two genes would then indicate that approximately 100 Myr have passed since their duplication. This estimate is consistent with the evolutionary distance (mean Ks = 1.66;
= 0.94) between 390 genes duplicated in an ancient yeast genome duplication event estimated to have occurred less than 150 Myr ago (Wolfe and Shields 1997
).
|
How does the fraction of interaction partners shared by duplicate genes evolve? Figure 4a
shows the fraction of duplicate genes with shared interaction partners as a function of Ks. The line labeled "random expectation" indicates the probability that two proteins share an interaction partner if the two proteins are picked at random from the protein interaction network. Asterisks indicate whether the number of duplicates with shared interactions is significantly different from the random expectation as assessed by a 2 test. Strikingly, already for 0.5 < Ks < 1, only 20% of duplicate gene pairs share an interaction partner. That is, if one applies this criterion of functional overlap, 80% of genes have no functional overlap with their duplicates approximately 100 Myr after the duplication. For Ks > 2, the probability that two gene duplicates share an interaction partner approaches the value expected for randomly chosen gene pairs.
|
Distribution of Duplicate Gene Products Among Subnets
Proteins that are part of the same subnet may have related biological functions in the sense that they act in the same cellular process. If so, then the function of gene duplicates that are part of the same subnet could be called conserved in this crude sense. Figure 5
shows a histogram of the fraction of duplicates that are part of the same subnet as a function of Ks. For Ks > 1.5, the probability that two duplicates are part of the same subnet approaches the probability that two randomly chosen proteins are part of the same subnet. Thus, products of duplicate genes do not remain associated with the same group of interacting proteins.
|
The Rate of Interaction Loss
Because many characterized protein-protein interactions are responsible for crucial cellular functions, they might not change much over time. However, the rapid divergence of common interaction partners after gene duplication shows otherwise. Under the assumption that this change is caused predominantly by loss of interactions, one can also put a lower bound on the rate at which interactions are lost. There are 127 duplicate gene pairs with Ks < 2 where both duplicates engage in protein-protein interactions. Assuming that all of the diversification observed between these duplicates is due to lost interactions, one arrives at a total estimate of 920 interactions immediately after duplication, 429 of which have been lost since. This amounts to a lower bound of (429/920)(1/200) = 2.3 x 10-3/Myr for the probability that a protein interaction is lost. Notice that the actual rate of interaction loss may be much higher, because (1) interactions lost in both duplicate genes cannot be observed and are not accounted for, and (2) many duplicates in this set are younger than 200 Myr. Also, most interactions may be lost shortly after duplication. A very similar estimate is obtained if one corrects for multigene families, admitting only one gene pair per gene family (42 gene pairs, for which 140 of 318 interactions have been lost since duplication, leading to a lower bound of 2.2 x 10-3 lost interactions per million years).
The Evolution of Self-Interactions and New Interactions
Forty-three proteins are reported to interact with themselves ("self-interactors"), 16 of which have one or more paralogs in the yeast genome. There are also 20 paralogous gene pairs whose products interact with each other. Figure 6a
illustrates the two different routes by which self-interactions and cross-interactions among duplicate genes may evolve. First, a gene product may have been a self-interactor before duplication. In this case, observed self-interactions and interactions between paralogs are most likely a remnant of self-interaction before duplication. Second, the interactions may have evolved de novo after the duplication. Figure 6b
shows the number of paralogs in yeast with Ks < 5 and the indicated combination of self-interactions and cross-interactions. The data set is too small to allow rigorous statistical analysis, but there are at least two conspicuous features. First, there is only 1 out of 25 paralogous pairs for which both proteins show self-interaction. It is not obvious why two duplicates should independently lose the ability to interact with themselves so frequently. Second, for 13 out of 21 paralogous pairs with interactions between the duplicates, neither duplicate shows self-interactions. Thus, there appears to be an abundance of paralogous pairs whose features are more easily explained if one assumes that interactions between duplicates evolve de novo at an appreciable rate.
To obtain a crude estimate of how rapidly new protein interactions might evolve, assume that among the 20 observed interactions between duplicates, only those 13 interactions where neither paralog self-interacts have evolved de novo after the duplication (fig. 6b
). These 13 genes are among 9,059 duplicate gene pairs with Ks < 5. Thus, in the time it took to accumulate five synonymous substitutions per synonymous site, a fraction 13/9,059 = 1.44 x 10-3 of gene products evolved new interactions. This yields more than 2.88 x 10-6 new interactions per protein pair per million years, if a Ks = 1 corresponds to 100 Myr. There are approximately n = 6,280 open reading frames in the yeast genome, with n(n - 1)/2 = 1.97 x 107 possible pairwise interactions. Extrapolating the above estimate to the entire yeast proteome would thus yield (1.97 x 107)(2.88 x 10-6) = 56.7 newly evolved interactions per million years. The multiple caveats to this calculation include uncertainty in the precise number of newly evolved interactions, problems with estimating large Ks values due to biases in correcting for multiple substitutions (Li 1997
), and the assumption that all yeast proteins can evolve interactions. Also, many gene pairs in the data set have Ks < 5, suggesting that the actual rate of evolution of new interactions is higher. However, even with a possibly large margin of error, the above calculations illustrate that the number of interactions evolving de novo is not negligible.
![]() |
Discussion |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
Gene functions comprise much more than protein-protein interactions, as is evident from the fact that only 16% of all yeast proteins interact with other proteins. Thus, the value of this analysis would clearly be strengthened if similar results emerged from studies using complementary technologies such as microarray analysis of transcriptional regulation (DeRisi, Iyer, and Brown 1997
). The opportunity to use genomic technology to analyze the evolution of many duplicate gene pairs clearly comes at the price of focusing on merely one aspect of function. Consequently, interpretation of the results reported here would have been difficult if protein-protein interactions had been extensively conserved after gene duplication. This is because other aspects of gene function, such as catalytic activity and spatiotemporal expression pattern, might still have diverged. However, exactly the opposite is observed. Even just considering protein-protein interactions, one can conclude that diversification must be extensive after duplication. This might be seen as a contradiction to the observation that up to 40% of synthetic null mutations in yeast open reading frames show weak phenotypic effects (Smith et al. 1996
). Many of these mutations involve duplicate genes. However, although widely assumed, redundancy among duplicate genes may not be the cause of this phenomenon (Wagner 2000)
.
How Do Most Duplicate Gene Products Arrive at Different Subnets?
Most gene duplicates eventually reside in different subnets of the protein interaction graph. If the de novo evolution of new interactions is extremely rare, then only new gene duplications would replenish lost interactions. In this case, duplicate genes would reside in different subnets, because interactions are constantly lost. Originally connected groups of proteins would become disconnected, and the protein interaction network would become increasingly fragmented. However, the observed patterns of interaction between duplicate gene products (fig. 6
) suggest that de novo evolution of new interactions does occur. If its rate is sufficiently high, then duplicate gene products may reside within different subnets because new interactions evolved between one of them and gene products in another subnet. The role of interaction loss would then be primarily one of severing existing ties between duplicate gene products. Whether the rate at which new interactions evolve is sufficiently high for this second scenario is an open question.
Interaction Turnover in the Protein Network
Not all protein pairs may be able to engage in physical interactions, and some existing interactions might be absolutely indispensable for cellular function. However, a significant rate of interaction turnover is evident in the protein interaction network. Based on observed interactions between duplicate proteins without self-interaction (fig. 6
), it was crudely estimated above that new interactions evolved at a rate of 2.88 x 10-6 per protein pair per million years. This may seem small. However, if extrapolated to all 1.97 x 107 possible pairwise interactions in the yeast proteome, one arrives at an estimate of 57 newly evolving interactions per million years. Even when restricting oneself to the 985 proteins known to interact with other proteins, one arrives at an estimate of (2.88 x 10-6)(4.84 x 105) 1.4 newly evolved interactions per million years. The true value is likely to lie somewhere in between.
Based on the assumption that the divergence in protein interactions after gene duplication is largely due to interaction loss, one can put a lower bound on the rate at which interactions get lost at 2.2 x 10-3 per interaction per million years. If a comparable rate holds for interactions between single-copy genes, then 50% of all interactions get lost every 300 Myr. Replenishment of lost interactions would take place through gene duplication and de novo evolution of interactions. Even in the unlikely case that all loss of interaction is restricted to duplicated genes, the rate of interaction turnover would be substantial. This is because the majority (57% for Ks < 5) of all genes in the network have one or more paralogs.
Outlook
The observations made here stimulate a multitude of questions regarding their evolutionary significance. Is the change in network structure driven by neutral evolution or by natural selection for advantageous interaction patterns? Similarly, are there many alternative configurations of the network that perform the network's function equally well? Are many protein-protein interactions of little functional significance, or are most of them critical? Does the change in network configurations over time reflect different environments or different adaptations that the organism evolved at different times? Can one understand the network's global structure from a few key parameters, such as the rate of gene duplications and the rate of interaction loss? Perhaps genomic technologies, focusing on the big picture of gene functions, cannot determine the "blueprint" of organismal design. At the very least, however, they open new levels of inquiry into the evolution of this design.
![]() |
Acknowledgements |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
![]() |
Footnotes |
---|
1 Keywords: protein network
evolution
Saccharomyces cerevisiae,
yeast
gene duplication
2 Address for correspondence and reprints: Andreas Wagner, University of New Mexico, Department of Biology, 167A Castetter Hall, Albuquerque, New Mexico 87131-1091. E-mail: wagnera{at}unm.edu
![]() |
References |
---|
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
---|
Altschul S. F., T. L. Madden, A. A. Schaffer, J. H. Zhang, Z. Zhang, W. Miller, D. J. Lipman, 1997 Gapped Blast and Psi-Blast: a new generation of protein database search programs Nucleic Acids Res 25:3389-3402
Bailey W. J., J. Kim, G. P. Wagner, F. H. Ruddle, 1997 Phylogenetic reconstruction of vertebrate Hox cluster duplications Mol. Biol. Evol 14:843-853[Abstract]
Barabasi A.-L., R. Alberty, 1999 Emergence of scaling in random networks Science 286:509-512
Bartel P. L., J. A. Roecklein, D. SenGupta, S. Fields, 1996 A protein linkage map of Escherichia coli bacteriophage T7 Nat. Genet 12:72-77[ISI][Medline]
Bennetzen J. L., B. D. Hall, 1981 Codon selection in yeast J. Biol. Chem 257:3026-3031
Benton B. K., A. Tinkelenberg, I. Gonzalez, F. R. Cross, 1997 Cla4p, a Saccharomyces-Cerevisiae Cdc42p-activated kinase involved in cytokinesis is activated at mitosis Mol. Cell. Biol 17:5067-5076[Abstract]
Benton B., A. Tinkelenberg, D. Jean, S. Plump, F. Cross, 1993 Genetic analysis of Cln/Cdc28 regulation of cell morphogenesis in budding yeast EMBO J 12:5267-5275[Abstract]
Bollobás B., 1985 Random graphs Academic Press, London
Cadigan K. M., U. Grossniklaus, W. J. Gehring, 1994 Functional redundancy: the respective roles of the 2 sloppy paired genes in Drosophila segmentation Proc. Natl. Acad. Sci. USA 91:6324-6328[Abstract]
Cirera S., M. Aguade, 1998 Molecular evolution of a duplication: the sex-peptide (Acp70a) gene region of Drosophila subobscura and Drosophila madeirensis Mol. Biol. Evol 15:988-996[Abstract]
Costanzo M. C., J. D. Hogan, M. E. Cusick, et al. (14 co-authors) 2000 The Yeast Proteome Database (YPD) and Caenorhabditis elegans Proteome Database (WormPD): comprehensive resources for the organization and comparison of model organism protein information Nucleic Acids Res 28:73-76
DeRisi J. L., V. R. Iyer, P. O. Brown, 1997 Exploring the metabolic and genetic control of gene expression on a genomic scale Science 278:680-686
Drake J. W., B. Charlesworth, D. Charlesworth, J. F. Crow, 1998 Rates of spontaneous mutation Genetics 148:1667-1686
Fell D., A. Wagner, 2000 The small world of metabolism Nat. Biotechnol 18:1121-1122[ISI][Medline]
Fields S., O. K. Song, 1989 A novel genetic system to detect protein protein interactions Nature 340:245-246[ISI][Medline]
Fromental-Ramain C., X. Warot, S. Lakkaraju, B. Favier, H. Haack, C. Birling, A. Dierich, P. Dolle, P. Chambon, 1996 Specific and redundant functions of the paralogous Hoxa-9 and Hoxd-9 genes in forelimb and axial skeleton patterning Development 122:461-472
Gonzalez-Gaitan M., M. Rothe, E. A. Wimmer, H. Taubert, H. Jackle, 1994 Redundant functions of the genes knirps and knirps-related for the establishment of anterior Drosophila head structures Proc. Natl. Acad. Sci. USA 91:8567-8571[Abstract]
Jeong H., B. Tombor, R. Albert, Z. N. Oltvai, A. L. Barabasi, 2000 The large-scale organization of metabolic networks Nature 407:651-654[ISI][Medline]
Joyner A. L., K. Herrup, B. A. Auerbach, C. A. Davis, J. Rossant, 1991 Subtle cerebellar phenotype in mice homozygous for a targeted deletion of the En-2 homeobox Science 251:1239-1243[ISI][Medline]
Knuth D. E., 1981 The art of computer programming Vol. 2. Seminumerical algorithms. Addison-Wesley, New York
Li W.-H., 1993 Unbiased estimation of the rates of synonymous and nonsynonymous substitution J. Mol. Evol 36:96-99[ISI][Medline]
. 1997 Molecular evolution Sinauer, Sunderland, Mass
Long M. Y., C. H. Langley, 1993 Natural-selection and the origin of jingwei, a chimeric processed functional gene in Drosophila Science 260:91-95[ISI][Medline]
Lynch M., J. S. Conery, 2000 The evolutionary fate and consequences of duplicate genes Science 290:1151-1155
McGraith S., T. Holtzman, B. Moss, S. Fields, 2000 Genome-wide analysis of vaccinia virus protein-protein interactions Proc. Natl. Acad. Sci. USA 97:4879-4884
Mehlhorn K., S. Naher, 1999 LEDA: a platform for combinatorial and geometric computing Cambridge University Press, Cambridge, England
Nasmyth K., 1993 Control of the yeast cell cycle by the Cdc28 protein kinase Curr. Opin. Cell Biol 5:166-179[Medline]
Sharman A. C., P. W. H. Holland, 1996 Conservation, duplication, and divergence of developmental genes during chordate evolution Neth. J. Zool 46:47-67[ISI]
Smith V., K. N. Chou, D. Lashkari, D. Botstein, P. O. Brown, 1996 Functional analysis of the genes of yeast chromosome-V by genetic footprinting Science 274:2069-2074.
Tautz D., 1992 Redundancies, development and the flow of information Bioessays 14:263-266[ISI][Medline]
Thomas J. H., 1993 Thinking about genetic redundancy Trends Genet 9:395-399[ISI][Medline]
Toda T., S. Cameron, P. Sass, M. Zoller, M. Wigler, 1987 Three different genes in S. cerevisiae encode the catalytic subunits of the cAMP-dependent protein kinase Cell 50:277-287[ISI][Medline]
Tsaur S. C., C. T. Ting, C. I. Wu, 1998 Positive selection driving the evolution of a gene of male reproduction, Acp26aa, of Drosophila: divergence versus polymorphism Mol. Biol. Evol 15:1040-1046[Abstract]
Uetz P., L. Giot, G. Cagney, et al. (20 co-authors) 2000 A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae Nature 403:623-627[ISI][Medline]
Wagner A., 2000 Mutational robustness in genetic networks of yeast Nat. Genet 24:355-361[ISI][Medline]
Wagner A., D. Fell, 2001 The small world inside large metabolic networks (in press)
Wang D. G., J. L. Marsh, F. J. Ayala, 1996 Evolutionary changes in the expression pattern of a developmentally essential gene in 3 Drosophila species Proc. Natl. Acad. Sci. USA 93:7103-7107
Wang Y. K., P. N. J. Schnegelsberg, J. Dausman, R. Jaenisch, 1996 Functional redundancy of the muscle-specific transcription factors Myf5 and Myogenin Nature 379:823-825[ISI][Medline]
Watts D. J., S. H. Strogatz, 1998 Collective dynamics of small-world networks Nature 393:440-442[ISI][Medline]
Wilkins A., 1997 Canalization: a molecular genetic perspective Bioessays 19:257-262[ISI][Medline]
Wolfe K. H., D. C. Shields, 1997 Molecular evidence for an ancient duplication of the entire yeast genome Nature 387:708-713[ISI][Medline]
Zhang J. Z., H. F. Rosenberg, M. Nei, 1998 Positive Darwinian selection after gene duplication in primate ribonuclease genes Proc. Natl. Acad. Sci. USA 95:3708-3713