* EA Biodiversité 2202, Université de Provence, Marseille, France
INSERM U119, Marseille, France
Correspondence: E-mail: avienne{at}up.univ-mrs.fr.
Abstract |
---|
Key Words: vertebrate evolution, 2R, paralogous regions, polyploidisation, diploidisation, mapping, time divergence
Introduction |
---|
Polyploidization, the addition of one or more complete sets of chromosomes to the original set (Graur and Li 2000, pp. 480-482), is extensive in plants and has been shown to be an important process in plant speciation as well as in the evolution of vertebrates (Lundin 1993). To give two examples of plant polyploidization, 50% to 70% of the angiosperms (Wendel 2000) and approximately 95% of pteridophytes (ferns) (Masterson 1994) are thought to have experienced at least one polyploidization event during their evolution.
Recently, 103 duplicated chromosomal regions in the Arabidopsis thaliana genome have been identified and are hypothesized to be the product of a single polyploidization event (Vision, Brown, and Tanksley 2000). However, the overlap of some of these regions and their presence in different ways presupposes the occurrence of many polyploidization events at different times. Other examples are known in plants, as in Gossypium hirsutum, the genome of which resulted from a single allopolyploidization event between two diploid genomes (Cronn, Small, and Wendel 1999). Although polyploidization is most common in plants, it is nonetheless observed throughout the tree of life, and some examples of polyploid animals are known, including the tetraploid common carp Cyprinus carpio (bony fishes) and Hyla versicolor (amphibian) (as reviewed in Soltis and Soltis 1999). The stigmata of such events can also be observed in the genomes of Saccharomyces cerevisiae (Wolfe and Shields 1997) and Danio rerio (Taylor, Van De Peer, and Meyer 2001).
Thirty years ago, Susumu Ohno (Ohno 1970) suggested that vertebrate genomes evolved by two rounds of genome-wide duplications. This hypothesis of two tetraploidization events was based on the difference in the size of the genomes and the amount of DNA per haploid cell between many vertebrate and chordate taxa.
One of the first arguments in support of such duplications was the identification of numerous co-orthologs in vertebrates for a single gene in Cephalochordata (e.g., amphioxus). Many cases are known in which one amphioxus gene corresponds to two, three, or four co-orthologs in vertebrates (see, for example, Holland, Holland, and Kozmik 1995). This implies that the duplication events for these genes occurred independently after the Cephalochordata/Chordata separation and before the Gnathostomata radiation (fig. 1). However, it does not give any clue about a possible en bloc duplication; in an "ideal" case of en bloc duplication, all the vertebrate co-orthologs of the amphioxus genes would constitute a network of paralogous regions having the same duplication date.
McLysaght, Hokamp, and Wolfe (2002) concluded that at least one round of large-scale duplication occurred in the vertebrate genome, whereas Gu, Wang, and Gu (2002) concluded that at least two rounds of large-scale duplication occurred, together with continuous small-scale duplications. In the study published by Gu, Wang, and Gu (2002), one peak observed in the early stage of vertebrates likely corresponds to the one-round large-scale event reported by McLysaght, Hokamp, and Wolfe (2002), whereas another peak found after the mammalian radiation is more likely to be the result of recent segmental/tandem duplications.
The phylogenetic analysis of 31 genes located in these regions supported the hypothesis of en bloc duplications after the Cephalochordata/Chordata separation and before the emergence of Gnathostomata (Abi-Rached et al. 2002). Unfortunately, only six of the genes studied had more than two paralogs, making it impossible to rigorously test the 2R hypothesis or estimate the divergence time for the duplicated genes.
As in the study of the major histocompatibility complex (MHC) region and its paralogs (Abi-Rached et al. 2002), we chose to take into account the possibility of domain shuffling, and we revisited the formerly identified paralogons in the 8p11.21-8p21.3, 10q21.2-10q26.3, 5q31.1-5q35.2, and 4p16.1-4q35.1 regions. The analysis comprised four steps: reinvestigating the gene content of these regions; testing the phylogenetic relationships between the paralogous genes found in them, using three reconstruction methods; providing statistical evidence of their nonrandom distribution over the genome; and estimating the divergence time of these genes in order to test the 2R theory for these regions.
Materials and Methods |
---|
Domains Identification
For the remaining 65 genes, many paralogs were predicted and their constitutive domains were identified in the Pfam database (Bateman et al. 2002). Their phylogenetic relationships were determined on the basis of Pfam domain alignments and reconstructions made by the Neighbor-Joining method (Saitou and Nei 1987) using MEGA2 (Kumar et al. 2001) for each of their constitutive domains. Congruence between the different domains was assessed using the incongruence length difference (ILD) test (Farris et al. 1995; Thornton and DeSalle 2000). This step was done to avoid potential reconstruction biases from domain shuffling (as detected in Abi-Rached et al. 2002). This common mechanism is indeed thought to have played a major role in the evolution of new proteins (Doolittle and Bork 1993; Doolittle 1995), and we chose to take it into account in the analysis.
Sequence Retrieval and Phylogenetic Reconstruction
For those sequences whose evolutionary history agreed with the two rounds of duplications, more sequences were retrieved using BlastP (Altschul et al. 1997) on the nonredundant database (NR) and on the Danio rerio referenced proteins database at the National Center for Biotechnology Information (NCBI) (http://www.ncbi.nlm.nih.gov/genome/seq/DrBlast.html). TBlastN was used on the Takifugu rubripes predicted proteins database at the HGMP-Fugu Genomics Project webpage (http://fugu.hgmp.mrc.ac.uk/blast/). Sequences were aligned with ClustalX (Thompson et al. 1997), and the phylogenetic relationships were determined at the domain level using the Neighbor-Joining algorithm in MEGA2 (Kumar et al. 2001). Trees were rooted at the midpoint, and the outgroup was composed of protostomian sequences, except that three gene families with deuterostomian sequences were also found: BAG4 (Drosophila sister group to the chicken sequence BAA13589), MGC1136 (Drosophila sister group to the human 10q22.2 XP_061191 and Fugu JGI_11847 sequences), and Loc55893 (Drosophila sister group to the mouse ZFA_MOUSE and Fugu JGI_2181 sequences).
Two other reconstruction methods were used for the 25 genes yielding more than two paralogs: the Maximum Parsimony method in PAUP*4.0 (Swofford 2000, Appendix 1) and the Maximum Likelihood method using Tree-Puzzle 5.0 (Strimmer and von Haeseler 1996). The bootstrap proportion (Felsenstein 1985) was used to assess the strength of the topologies for each of the three methods.
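The bootstrap of Felsenstein (1985) resamples alignment columns with replacement and then rebuilds a tree from each pseudo-replicate. The following Python fragment is only a minimal sketch of the column-resampling step, not the procedure implemented in MEGA2, PAUP*, or Tree-Puzzle; the function name and the toy alignment are hypothetical and are not data from this study.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Return one bootstrap pseudo-replicate of an alignment.

    alignment: dict mapping sequence name -> aligned sequence (equal lengths).
    Columns are drawn with replacement, so the replicate keeps the
    original alignment length.
    """
    length = len(next(iter(alignment.values())))
    # Draw column indices with replacement.
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}

# Hypothetical toy alignment, for illustration only.
toy = {"human_8p": "MKVLAT", "human_10q": "MKVLSA", "fly": "MRILSA"}
replicate = bootstrap_alignment(toy, random.Random(0))
```

A tree would be reconstructed from each replicate, and the bootstrap proportion of a node is the percentage of replicates in which that grouping is recovered.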
Saturation Visualization
Tree reconstruction with DNA or protein sequences can be biased by mutational saturation (Van de Peer et al. 2002). To avoid such problems and to find those taxa that could be implicated in branch swapping, and thus in the wrong topologies obtained, the saturation at the amino acid level was visualized with MUST (Philippe 1993) by comparing two distance matrices, the first one calculated using the p distance and the second one using the Poisson correction. When saturation was observed, it was usually caused by the outgroup, and thus it had no implication for the reconstruction.
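The two distances compared here can be sketched as follows. This is an illustrative Python fragment, not the MUST implementation: the p distance is the observed proportion of differing sites, and the Poisson correction is the standard d = -ln(1 - p). The function names and the gap-handling choice (skipping sites with a gap in either sequence) are assumptions made for the example.

```python
import math

def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Sites with a gap ('-') in either sequence are skipped (an assumption).
    """
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)

def poisson_distance(p):
    """Poisson-corrected distance d = -ln(1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    return -math.log(1.0 - p)

# The corrected distance grows faster than p as sequences diverge.
for p in (0.1, 0.5, 0.8):
    print(f"p = {p:.1f}  Poisson d = {poisson_distance(p):.3f}")
```

Plotting one distance against the other for all sequence pairs gives the saturation picture: pairs whose observed p distance levels off while the corrected distance keeps increasing are the ones likely to carry multiple hidden substitutions.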
Four-Cluster Analysis and Templeton Test
The robustness of the topologies was tested by the Four-cluster analysis in Phyltest (Kumar 1996) for the 25 gene families with more than two paralogs (see Appendix 2 in online Supplementary Material); their congruence was tested using the Templeton test with PAUP*4.0 under the Maximum Parsimony method (Nei and Kumar 2000).
Relative Rate Test and Estimation of Duplication Time
The Relative Rate Test was performed with Phyltest (Kumar 1996) to determine if the substitution rate of the different paralogous genes was homogeneous. Sequences whose substitution pattern was significantly different from the mean pattern were not included in the estimation. The duplication times were then estimated using a linearized Neighbor-Joining tree of the conserved sequences in MEGA2 (Kumar et al. 2001) as described elsewhere (Balczarek, Lai, and Kumar 1997; Kumar and Hedges 1998). The calibration times we used were mammal-amphibian (360 MYA) and tetrapod-teleost (420 MYA) (Kumar and Hedges 1998; Hedges 2001; Wang and Gu 2000).
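The dating logic of a linearized (clock-like) tree can be sketched in a few lines of Python: a calibration node of known age gives a rate, and the age of a duplication node is its height divided by that rate. The calibration ages below are those given in the text; the node heights are hypothetical, and averaging the rates from the two calibration points is an assumption made for the example, since the text does not state how the two calibrations were combined.

```python
def calibrate_rate(calibrations):
    """Mean rate (node-height units per MY) from (node_height, age_mya) pairs."""
    rates = [height / age for height, age in calibrations]
    return sum(rates) / len(rates)

def node_age(height, rate):
    """Age (MYA) of a node of a linearized tree, given its height and the rate."""
    return height / rate

# Ages from the text: mammal-amphibian 360 MYA, tetrapod-teleost 420 MYA.
# The node heights (0.18, 0.21, 0.25) are hypothetical values.
calibrations = [(0.18, 360.0), (0.21, 420.0)]
rate = calibrate_rate(calibrations)
duplication_age = node_age(0.25, rate)
print(f"rate = {rate:.6f} per MY, duplication at about {duplication_age:.0f} MYA")
```

Because the estimate is a simple ratio, any departure from the clock in a family propagates directly into the date, which is why families failing the Relative Rate Test are excluded first.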
Statistical Significance of Gene Distribution
The size and the gene number of all the human genomic regions were defined with the University of California at Santa Cruz (UCSC) Human Genome Browser. The studied region of chromosome 8 is approximately 27.8 Mbp enclosed between the genes FGF17 (8p21.3) and VDAC3 (8p11.21), and contains 111 genes. Among these 111 genes, 25 are paralogs that have been created by duplications after the protostomian/deuterostomian separation and before the Osteichthyan split. The other regions were defined as follows: between ANK (10q21.2) and BAG4 (10q26.2) for chromosome 10 (enclosing a total of 293 genes); between LOXL2 (5q23.3) and FGFR (5q35.3) for chromosome 5 (enclosing 255 genes); and between PDL (4q35.1) and EPB49 (4p16) for chromosome 4 (a total of 2,271 genes). The total number of genes in the human genome was estimated to be 35,000 (Lander et al. 2001; Venter et al. 2001).
Taking into account the total number of genes (2,819) located in these regions, the probability that a randomly selected gene would be located in one of them is 2,819/35,000 = 0.08; similarly, the probability for a gene to be located somewhere else in the genome is 1 - 0.08 = 0.92.
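As a check on the arithmetic, the gene counts given for the chromosome 10, 5, and 4 regions can be summed and divided by the assumed genome total. The short Python sketch below uses only numbers stated in the text; the variable names are illustrative.

```python
# Gene counts of the three candidate paralogous regions, as given in the text.
region_genes = {"chr10": 293, "chr5": 255, "chr4": 2271}
genome_genes = 35_000  # estimated total number of human genes

total_in_regions = sum(region_genes.values())
p_in = total_in_regions / genome_genes   # gene falls in one of the regions
p_out = 1.0 - p_in                       # gene falls elsewhere in the genome

# Per-region probabilities under the same random-placement assumption.
p_region = {name: n / genome_genes for name, n in region_genes.items()}

print(total_in_regions, round(p_in, 2), round(p_out, 2))
```

The per-region values make clear that almost all of the 0.08 comes from the large chromosome 4 interval, so a random gene is far less likely to land in the chromosome 5 or chromosome 10 region than in the chromosome 4 one.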
The gene nomenclature was set according to the HUGO gene nomenclature committee for 26 gene families. The mapping of all human genes was done using the BLAT tool available with The Draft Human Genome Browser of UCSC (UCSC Human Genome Project Working Draft, April 2002 assembly).
Results |
---|
Phylogenetic Analysis
Of the 111 genes located in the 8p11.21-8p21.3 region, 46 were predicted to be unique by the BLAT tool (see Materials and Methods). As a first approximation for the remaining 65 genes, we used the Neighbor-Joining algorithm with Poisson correction to visualize the phylogenetic relationships of their domains (Bateman et al. 2002). Even though duplication is a continuous process in genomes (Lynch and Conery 2000; Gu, Wang, and Gu 2002), we focused our analysis on the window of duplication events between the protostomian-deuterostomian separation and the actinopterygian-sarcopterygian split. We selected 38 paralog groups for this time frame, within which there were 25 families yielding more than two paralogs and 13 families yielding two paralogs. We then carried out complete phylogenetic analyses on the 38 gene families.
All tree topologies obtained from the 25 gene families with more than two paralogs were supported by Bootstrap Proportions (BP) for the Neighbor-Joining and the Maximum Parsimony (see Supplementary Material online) reconstruction methods. For the Maximum Likelihood method, the values are quartet Puzzling support values and will be interpreted as Bootstrap Proportions. The Bootstrap Proportions at the nodes were most often high (BP > 69) for the different paralogous groups (fig. 2). We used an outgroup made up of at least one protostomian sequence (Drosophila melanogaster or Caenorhabditis elegans sequence) for all the reconstructions. Three of the studied gene families had, however, a mixed outgroup composed of paralogs from older duplication events, as well as from protostomians and deuterostomians: these gene families are BAG4, MGC1136, and Loc55893. This mixing indicated that the true orthologs in the Drosophila melanogaster or Caenorhabditis elegans genomes for these families have not been detected or have simply been lost in the protostomian lineage. Gene loss is indeed common and has been described in a comparative analysis of Drosophila melanogaster and Anopheles gambiae (Zdobnov et al. 2002). Nevertheless, we were able to identify orthologous sequences in Takifugu rubripes for all of the data sets analyzed (fig. 2).
Furthermore, we individually tested each of the 13 paralogous regions (see Materials and Methods). In this case, the Bonferroni correction was used to adjust the α-level of each individual test downward and thus to ensure that the overall risk across all tests remains 0.05. Thus the α-level was divided by the number of comparisons (0.05/13). Using the above formula and now calculating the probability of finding a paralogous gene located on chromosomes 1, 2, 4, 5, 6, 7, 8q, 10, 14, 15, 18, 20, or X for each gene located within the chromosome 8 region, we find that the distribution of the paralogous genes on chromosomes 5 and 10 is statistically different from a random distribution (with high significance, P < 0.001), strongly suggesting that these regions were created by en bloc duplications. This result suggests that the distribution of the paralogous genes in the other regions could be due to chance alone. With regard to chromosome 4, we can separate it into two parts: 4p16.1-p16.3 (five paralogous genes out of 36 predicted genes in this zone) and 4q24-q35.1 (five paralogous genes out of 617 predicted genes). In this case, the 4p16.1-p16.3 region is highly significantly different from a random distribution, whereas 4q24-q35.1 is not. This finding indicates that the two sections of chromosome 4 are dichotomous, one belonging to the two rounds of duplication and the other possibly resulting from shuffling events.
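The Bonferroni step described above amounts to comparing each individual P value with 0.05/13 rather than with 0.05. The Python sketch below shows that comparison only; the function names and the P values in the example dictionary are hypothetical placeholders, not results from this study, and the test that produces the P values is not reproduced here.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the overall risk at alpha."""
    return alpha / n_tests

def significant_regions(p_values, alpha=0.05, n_tests=None):
    """Return the names of regions whose P value passes the corrected level."""
    n = len(p_values) if n_tests is None else n_tests
    threshold = bonferroni_threshold(alpha, n)
    return [name for name, p in p_values.items() if p < threshold]

threshold = bonferroni_threshold(0.05, 13)
print(f"per-test alpha = {threshold:.5f}")

# Hypothetical P values for three of the 13 tested regions.
example = {"region_A": 0.0005, "region_B": 0.02, "region_C": 0.004}
print(significant_regions(example, n_tests=13))
```

Note that a result reported as P < 0.001 passes the corrected level of about 0.0038, whereas a value such as 0.02, nominally significant at 0.05, does not.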
Congruency of the Tree Topologies
A Templeton test (Templeton 1983) of the topologies obtained from the three different reconstruction methods (Neighbor-Joining, Maximum Parsimony, and Maximum Likelihood) indicated a high congruence between the three topologies gained by the various techniques.
To give statistical significance to the topologies obtained, we next employed a four-cluster analysis (Kumar 1996). Of the 25 gene families with more than two paralogs, our phylogenies indicated that 11 represent a (8,10), (4,5) topology (44% of the gene families analyzed by this method, with Confidence Proportion (CP) values ranging from 0.549 to 1); 5 corresponded to a (8,5), (4,10) topology (20%; CP values ranging from 0.376 to 0.858); 3 corresponded to a (8,4), (5,10) topology (12%; CP values ranging from 0.548 to 0.887); and 6 represented other topologies (24%; CP values ranging from 0.354 to 0.999). This predominance of observed (8,10), (4,5) topologies suggests the way the duplications occurred, and the other observed topologies can be explained by recombination (see Supplementary Material online). From the four-cluster analysis and the congruence of the topologies, we can conclude that, for this region of the genome, two rounds of duplications did indeed occur between the protostomian-deuterostomian separation and the actinopterygian-sarcopterygian split during vertebrate evolution.
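The percentages quoted for the four-cluster analysis follow directly from the counts of gene families per topology. A small Python tally, using only the counts stated in the text (the dictionary keys are just labels), makes the bookkeeping explicit.

```python
# Number of gene families showing each four-cluster topology (from the text).
topology_counts = {
    "(8,10),(4,5)": 11,
    "(8,5),(4,10)": 5,
    "(8,4),(5,10)": 3,
    "other": 6,
}

total_families = sum(topology_counts.values())
percentages = {topology: 100 * count / total_families
               for topology, count in topology_counts.items()}
most_frequent = max(topology_counts, key=topology_counts.get)

for topology, pct in percentages.items():
    print(f"{topology:>14}: {pct:.0f}%")
```

The tally confirms that the counts add up to the 25 families with more than two paralogs and that the (8,10), (4,5) arrangement is the most frequent single topology, although it does not reach an absolute majority.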
The determination of the phylogenetic relationships and the comparisons between four human chromosomal regions show that they duplicated en bloc.
Estimation of the Duplication Dates
To distinguish suitable gene families from which to estimate the timing of the duplication events, we selected families with constant substitution patterns using the Relative Rate test (see Materials and Methods) (Kumar 1996). We found that 22 of the 38 data sets evolved with a conserved molecular clock, but we used only 21 of them to estimate the duplication times (we removed the ANK gene family because of its extreme value, more than 1,100 MYA, in comparison with all the other divergence time estimates).
The following hypothesis could be considered: a first duplication of an ancestral region gave rise to the (8,10) and (4,5) ancestral regions, followed by a second duplication giving rise to the 8 and 10 paralogous regions, on the one hand, and the 4 and 5 paralogous regions, on the other hand. These events were then followed by recombination. In this case only the genes yielding a (8,10), (4,5) topology were used to estimate the timing of the first and second rounds of duplication. They were estimated to have occurred at T1 = 687 ± 155.70 MYA and T2 = 506.5 ± 103.76 MYA, respectively. We could also take into account a more general hypothesis: en bloc duplication occurred with a less clear history than above. In this case we used the complete data set (including the families with topologies other than (8,10), (4,5)) and we obtained dates of T1 = 738.95 ± 74.84 MYA and T2 = 532.54 ± 57.84 MYA, respectively (fig. 4) (see Supplementary Material online).
Discussion |
---|
Phylogenetic Reconstruction
By employing a robust methodology based on phylogenetic trees of protein domains constructed by three different algorithms, we observed a majority of (8,10), (4,5) topologies (44%), allowing us to discount the various other topologies arising from undetected new artifacts, crossing over between the different regions, rapid diploidization processes, ancestral polymorphism, or two rapid rounds of duplication (Furlong and Holland 2002).
Mapping of Paralogous Genes
The statistical analysis yielded strong support for the nonrandom distribution of the paralogous genes found on chromosomes 5 and 10. With regard to the paralogous sections of chromosome 4, the test rejected the entire chromosome as being paralogous but did not reject the 4p16.1-p16.3 subregion. Interestingly, all five genes present in this region displayed a (8,10), (4,5) topology.
Time Divergence
To estimate the lower bound of the duplication more precisely, we comprehensively searched the Takifugu rubripes genome for paralogous sequences; to better determine the upper bound of the first duplication, we included the amphioxus sequences for EGR and FGFR (the complete sequencing of the amphioxus genome would be invaluable for this purpose). Furthermore, McLysaght, Hokamp, and Wolfe (2002) and Gu, Wang, and Gu (2002) found an excess in the date ranges 397-695 MYA and 430-750 MYA, respectively, whereas our analysis estimated the mean ages of the first and second duplications to be 738 MYA and 531 MYA. Therefore, the divergence time estimates for the duplication events were similar (as for T1 and T2, Wang and Gu 2000), albeit within a large time window, whether or not the lineages of the protein domains were considered. Possible explanations for this result are that domain shuffling did not play a major role in locating the different genes to this region (in contrast to genes belonging to the MHC region; Abi-Rached et al. 2002), or that the information contained in the shuffled domain is diluted by the phylogenetic information of the other domains.
The Deciphering of the Ancestral Diploidization Process
This article represents a first step toward understanding the complex diploidization events that shaped the vertebrate genome. Given the potentially numerous gene deletions that occurred on chromosome 8 after the duplication events, we underestimate the number of vertebrate paralogous genes that could be conserved in these 4 regions. It would be interesting in future studies to take the same approach, and to use the region 5q35.2-q31.1 as reference to detect the paralogous genes shared by chromosomes 10, 4, and elsewhere on chromosome 8.
In the future, to demonstrate an entire genome duplication, it will be interesting to show that the other paralogous regions described in the human genome (McLysaght, Hokamp, and Wolfe 2002) display a pattern similar to the one described for the 8p11.21-8p21.3, 10q21.2-10q26.3, 4p16.1-4q35.1, and 5q31.1-5q35.2 paralogous regions: (1) a majority of (but not only) (A,B) (C,D) topologies, (2) a nonrandom distribution of the paralogous genes in the parasyntenic region, and (3) similar divergence times.
The Evolutionary Fate and Consequences of Paralogous Regions
As regards the fate of duplicated regions, the substitution rate between paralogous genes shows that they evolved at a similar pace. This result contrasts with what we observed for the MHC and its paralogous regions (Abi-Rached et al. 2002), where the analysis of the substitution pattern showed that genes located on chromosome 9 had a lower substitution rate than those in the other paralogous regions. The results described in this study indicate that the proposition "only one paralogous region retains the ancestral function" is not a rule. Therefore the evolution of paralogous regions after large-scale duplication is not a homogeneous phenomenon. To determine the fate of duplicated regions, other sets of paralogous regions should be investigated as described here.
Acknowledgements |
---|
Footnotes |
---|
Literature Cited |
---|
Abi-Rached, L., A. Gilles, T. Shiina, P. Pontarotti, and H. Inoko. 2002. Evidence of en bloc duplication in vertebrate genomes. Nat. Genet. 31:100-105.
Altschul, S. F., T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25:3389-3402.
Balczarek, K. A., Z.-C. Lai, and S. Kumar. 1997. Evolution and functional diversification of the paired box (Pax) DNA-binding domains. Mol. Biol. Evol. 14:829-842.
Bateman, A., E. Birney, L. Cerruti, R. Durbin, L. Etwiller, S. R. Eddy, S. Griffiths-Jones, K. L. Howe, M. Marshall, and E. L. Sonnhammer. 2002. The Pfam protein families database. Nucleic Acids Res. 30:276-280.
Cronn, R. C., R. L. Small, and J. F. Wendel. 1999. Duplicated genes evolve independently after polyploid formation in cotton. Proc. Natl. Acad. Sci. USA 96:14406-14411.
Doolittle, R. F. 1995. The multiplicity of domains in proteins. Annu. Rev. Biochem. 64:287-314.
Doolittle, R. F., and P. Bork. 1993. Evolutionarily mobile modules in proteins. Sci. Am. 269:50-56.
Farris, J. S., M. Kallersjo, A. G. Kluge, and C. Bult. 1995. Testing significance of incongruence. Cladistics 10:315-319.
Felsenstein, J. 1985. Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39:783-791.
Furlong, R. F., and P. W. H. Holland. 2002. Were vertebrates octoploid? Phil. Trans. R. Soc. Lond. Ser. B. 357:531-544.
Graur, D., and W.-H. Li. 2000. Fundamentals of molecular evolution. 2nd edition. Sinauer Associates, Sunderland, Mass.
Gu, X., Y. Wang, and J. Gu. 2002. Age distribution of human gene families shows significant roles of both large- and small-scale duplications in vertebrate evolution. Nat. Genet. 31:205-209.
Hedges, S. B. 2001. Molecular evidence for the early history of living vertebrates. Pp. 119-134 in P. E. Ahlberg, ed. Major events in early vertebrate evolution: paleontology, phylogeny, and development. Taylor and Francis, London.
Holland, N. D., L. Z. Holland, and Z. Kozmik. 1995. An amphioxus Pax gene, AmphiPax-1, expressed in embryonic endoderm, but not in mesoderm: implications for the evolution of class I paired box genes. Mol. Mar. Biol. Biotechnol. 4:206-214.
Holland, P. W., J. Garcia-Fernandez, N. A. Williams, and A. Sidow. 1994. Gene duplications and the origins of vertebrate development. Development Suppl.:125-133.
Kent, W. J. 2002. BLAT-the BLAST-like alignment tool. Genome Res. 12:656-664.
Kent, W. J., C. W. Sugnet, T. S. Furey, K. M. Roskin, T. H. Pringle, A. M. Zahler, and D. Haussler. 2002. The Human Genome Browser at UCSC. Genome Res. 12:996-1006.
Kumar, S. 1996. PHYLTEST: a program for testing phylogenetic hypotheses. Pennsylvania State University, University Park.
Kumar, S., and S. B. Hedges. 1998. A molecular timescale for vertebrate evolution. Nature 392:917-920.
Kumar, S., K. Tamura, I. B. Jakobsen, and M. Nei. 2001. MEGA2: molecular evolutionary genetics analysis software. Bioinformatics 17:1244-1245.
Lander, E. S., L. M. Linton, and B. Birren, et al. (253 co-authors). 2001. Initial sequencing and analysis of the human genome. Nature 409:860-921.
Lundin, L. G. 1993. Evolution of the vertebrate genome as reflected in paralogous chromosome regions in man and the house mouse. Genomics 16:1-19.
Lynch, M., and J. S. Conery. 2000. The evolutionary fate and consequences of duplicate genes. Science 290:1151-1155.
Masterson, J. 1994. Stomatal size in fossil plants: evidence for polyploidy in majority of angiosperms. Science 264:421-424.
McLysaght, A., K. Hokamp, and K. H. Wolfe. 2002. Extensive genomic duplication during early chordate evolution. Nat. Genet. 31:200-204.
Nei, M., and S. Kumar. 2000. Molecular evolution and phylogenetics. Oxford University Press, New York.
Ohno, S. 1970. Evolution by gene duplication. Springer-Verlag, Berlin.
Pebusque, M. J., F. Coulier, D. Birnbaum, and P. Pontarotti. 1998. Ancient large-scale genome duplications: phylogenetic and linkage analyses shed light on chordate genome evolution. Mol. Biol. Evol. 15:1145-1159.
Philippe, H. 1993. MUST, a computer package of management utilities for sequences and trees. Nucleic Acids Res. 21:5264-5272.
Saitou, N., and M. Nei. 1987. The Neighbor-Joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406-425.
Soltis, D. E., and P. S. Soltis. 1999. Polyploidy: recurrent formation and genome evolution. Trends Ecol. Evol. 14:348-352.
Stephens, S. G. 1951. Possible significance of duplication in evolution. Adv. Genet. 4:247-265.
Strimmer, K., and A. von Haeseler. 1996. Quartet puzzling: a quartet maximum likelihood method for reconstructing tree topologies. Mol. Biol. Evol. 13:964-969.
Swofford, D. L. 2000. PAUP*: phylogenetic analysis using parsimony (*and other methods). Version 4. Sinauer Associates, Sunderland, Mass.
Taylor, J. S., Y. Van De Peer, I. Braasch, and A. Meyer. 2001. Comparative genomics provides evidence for an ancient genome duplication event in fish. Phil. Trans. R. Soc. Lond. 356:1661-1679.
Taylor, J. S., Y. Van De Peer, and A. Meyer. 2001. Genome duplication, divergent resolution and speciation. Trends Genet. 17:299-301.
Templeton, A. R. 1983. Phylogenetic inference from restriction endonuclease cleavage site maps with particular reference to the evolution of humans and apes. Evolution 37:221-244.
Thompson, J. D., T. J. Gibson, F. Plewniak, F. Jeanmougin, and D. G. Higgins. 1997. The ClustalX Windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 25:4876-4882.
Thornton, J. W., and R. DeSalle. 2000. Gene family evolution and homology: genomics meets phylogenetics. Annu. Rev. Genomics Hum. Genet. 1:41-73.
Van De Peer, Y., T. Frickey, J. S. Taylor, and A. Meyer. 2002. Dealing with saturation at the amino acid level: a case study based on anciently duplicated zebrafish genes. Gene 295:205-211.
Venter, J. C., M. D. Adams, and E. W. Myers, et al. (272 co-authors). 2001. The sequence of the human genome. Science 291:1304-1351.
Vision, T. J., D. G. Brown, and S. D. Tanksley. 2000. The origins of genomic duplications in Arabidopsis. Science 290:2114-2117.
Wang, Y., and X. Gu. 2000. Evolutionary patterns of gene families generated in the early stage of vertebrates. J. Mol. Evol. 51:88-96.
Wendel, J. F. 2000. Genome evolution in polyploids. Plant Mol. Biol. 42:225-249.
Wolfe, K. H., and D. C. Shields. 1997. Molecular evidence for an ancient duplication of the entire yeast genome. Nature 387:708-713.
Zdobnov, E. M., C. von Mering, and I. Letunic, et al. (34 co-authors). 2002. Comparative genome and proteome analysis of Anopheles gambiae and Drosophila melanogaster. Science 298:149-159.