* Institute of Molecular and Cell Biology, Singapore
University of Cambridge, Department of Oncology, Hutchison-MRC Research Centre, Cambridge, UK
Correspondence: E-mail: mcbbv{at}imcb.nus.edu.sg.
Abstract |
---|
Key Words: whole-genome duplication; paralogons; molecular clock; Fugu
Introduction |
---|
Studies of duplicate genes in teleosts have so far relied on small data sets assembled before any teleost genome was completed and, thus, lack adequate statistical and phylogenetic support. Furthermore, because the data sets were small, it has not been possible to estimate the ages of fish-specific duplicate genes precisely. Thus, the extent and the timing of gene duplication events in the teleost lineage remain unclear.
Recently, the draft genome sequence of a teleost, the Fugu, was completed (Aparicio et al. 2002). At 385 Mb, Fugu has one of the smallest genomes among vertebrates. The compact size of the genome has been attributed to the paucity of repetitive sequences and other nonessential sequences. In this study, we made a systematic comparison of the Fugu and human genomes to estimate the extent of fish-specific paralogous chromosomal fragments ("paralogons") and fish-specific duplicate genes in the Fugu genome. We estimated the ages of the Fugu duplicate genes using the molecular clock. Our results provide strong evidence for a whole-genome duplication early during the evolution of ray-finned fishes. Interestingly, the whole-genome duplication appears to have occurred before the origin of teleosts, raising doubts about its role in the radiation of teleosts.
Methods |
---|
Fugu-Human Protein Families
BlastP searches of 30,749 Fugu proteins were carried out against a combined human and Fugu protein data set using the following parameters: (1) BLOSUM62 matrix with SEG filtering switched on; (2) expectation cutoff score of 1e-07; and (3) an alignment covering at least 50% of the longer sequence. This combination of parameters was arrived at after extensive exploration of various parameters, including those described by McLysaght, Hokamp, and Wolfe (2002) and Friedman and Hughes (2001). All Blast searches were carried out on a 75-node Compaq Alpha server, and the data were stored in a MySQL database. Of the 30,749 Fugu proteins, 3,257 did not meet the expectation score threshold (1e-07). Another 3,983 proteins were excluded because their alignments did not cover 50% of the longer sequence. The remaining 23,509 Fugu proteins and their matching human sequences were grouped into families that each contained 2 to 248 proteins.
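The filtering and grouping steps above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the hit dictionary keys (`query`, `subject`, `evalue`, `aln_len`), the `lengths` lookup, and the single-linkage grouping rule are all assumptions, since the text does not say how the hits were clustered into families.

```python
from collections import defaultdict

E_VALUE_CUTOFF = 1e-7  # expectation cutoff quoted in the text
MIN_COVERAGE = 0.5     # alignment must span >= 50% of the longer sequence


def passes_filters(hit, lengths):
    """Return True if a BlastP hit meets both thresholds."""
    longer = max(lengths[hit["query"]], lengths[hit["subject"]])
    return (hit["evalue"] <= E_VALUE_CUTOFF
            and hit["aln_len"] / longer >= MIN_COVERAGE)


def group_into_families(hits, lengths):
    """Single-linkage grouping (an assumption): proteins connected by
    any passing hit end up in the same family."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for hit in hits:
        if hit["query"] != hit["subject"] and passes_filters(hit, lengths):
            parent[find(hit["query"])] = find(hit["subject"])

    families = defaultdict(set)
    for protein in parent:
        families[find(protein)].add(protein)
    return [sorted(members) for members in families.values()]
```

Proteins whose hits all fail a threshold never enter a family, which mirrors the exclusion of the 3,257 and 3,983 Fugu proteins described above.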
Paralogon Detection
All scaffolds were compared with each other to identify scaffolds that shared protein families. Scaffolds that shared more than one duplicate protein pair were identified as paralogous chromosomal segments ("paralogons"), irrespective of the order and orientation of the genes. A maximum of 20 unrelated genes was allowed on a paralogon. The paralogon detection algorithm outlined above is an interscaffold comparison and does not take into account any intrascaffold events. We downloaded data for 1,437 human paralogons that were reported to have originated during the early evolution of chordates (McLysaght, Hokamp, and Wolfe 2002).
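A minimal sketch of the interscaffold comparison described above, under stated assumptions: the input is a hypothetical table of `(gene_id, scaffold_id, family_id)` tuples, "more than one duplicate protein pair" is approximated as two or more distinct shared families, and the limit of 20 unrelated genes is not modeled because the text does not define how it was applied.

```python
from collections import defaultdict
from itertools import combinations


def find_paralogons(gene_table, min_shared=2):
    """Return scaffold pairs that share at least `min_shared` families.

    gene_table: iterable of (gene_id, scaffold_id, family_id) tuples.
    Gene order and orientation are ignored, as in the text.
    """
    families_on = defaultdict(set)
    for _gene, scaffold, family in gene_table:
        families_on[scaffold].add(family)

    paralogons = {}
    for a, b in combinations(sorted(families_on), 2):
        shared = families_on[a] & families_on[b]
        if len(shared) >= min_shared:
            paralogons[(a, b)] = sorted(shared)
    return paralogons
```

Because each scaffold's families are stored as a set, several members of one family on the same scaffold count only once, so intrascaffold duplicates do not inflate the count.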
Statistical Test for Randomness
We identified paralogons on 1,000 shuffled gene maps to test the statistical significance of our results. The number and the size of paralogons in the shuffled data were compared with the actual data using the t-test.
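The shuffling test can be illustrated with the sketch below. It is an assumption-laden approximation rather than the published procedure: genes are shuffled by permuting scaffold labels (which preserves scaffold and family sizes), only the number of paralogons is compared (not their size), and the t statistic is a one-sample comparison of the shuffled counts against the observed count. The function names are hypothetical.

```python
import random
import statistics
from collections import defaultdict
from itertools import combinations


def count_paralogons(scaffolds, families, min_shared=2):
    """Count scaffold pairs sharing at least `min_shared` families."""
    families_on = defaultdict(set)
    for scaffold, family in zip(scaffolds, families):
        families_on[scaffold].add(family)
    return sum(
        1
        for a, b in combinations(families_on, 2)
        if len(families_on[a] & families_on[b]) >= min_shared
    )


def shuffle_test(scaffolds, families, n_shuffles=1000, seed=0):
    """Compare the observed paralogon count with shuffled gene maps."""
    rng = random.Random(seed)
    observed = count_paralogons(scaffolds, families)
    shuffled = list(scaffolds)
    null_counts = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # reassign genes to scaffolds at random
        null_counts.append(count_paralogons(shuffled, families))
    mean = statistics.mean(null_counts)
    sd = statistics.stdev(null_counts)
    # One-sample t statistic; undefined if the shuffled counts never vary.
    if sd > 0:
        t_stat = (observed - mean) / (sd / n_shuffles ** 0.5)
    else:
        t_stat = float("nan")
    return observed, mean, t_stat
```

A large positive `t_stat` would indicate more paralogons in the real gene map than expected from randomly placed genes.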
Fish-Specific Duplicate Genes and the Time of Duplications
To obtain outgroups for Fugu and human protein families, we identified Ciona, fly, or nematode orthologs for human proteins by reciprocal Blast searches. A total of 3,781 Ciona-human orthologs, 1,967 fly-human orthologs, and 2,182 nematode-human orthologs were derived from these searches. After adding these invertebrate outgroups to the respective Fugu-human protein families, we retrieved a total of 995 families that had one outgroup sequence (Ciona, fly, or nematode), at least one human sequence, and more than one Fugu sequence.
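Ortholog assignment by reciprocal Blast is commonly implemented as a reciprocal-best-hit test. The sketch below assumes that interpretation and a hypothetical input of `(query, subject, bit_score)` tuples; the text does not state the exact criterion used.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    hits: iterable of (query, subject, bit_score) tuples.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _score) in best.items()}


def reciprocal_best_hits(forward_hits, reverse_hits):
    """Return (a, b) pairs where b is a's best hit and a is b's best hit."""
    forward = best_hits(forward_hits)
    reverse = best_hits(reverse_hits)
    return sorted(
        (a, b) for a, b in forward.items() if reverse.get(b) == a
    )
```

Here `forward_hits` would hold human proteins searched against one invertebrate proteome and `reverse_hits` the opposite search; only pairs that agree in both directions are kept as candidate orthologs.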
The 995 families were analyzed for evidence of duplication events in the fish lineage by phylogenetic analysis. Alignments were generated using ClustalW with default parameters (Thompson, Higgins, and Gibson 1994), and tree topologies were generated by the PHYLIP programs PROTDIST and NEIGHBOR (Felsenstein 1989). The gamma-corrected substitution rates were calculated using the program GAMMA (Gu and Zhang 1997). For 142 families, the program failed because of an unexplained error in the data file. Neighbor-joining (NJ) trees were drawn for the remaining 853 protein families with 1,000 bootstraps. Because we were only interested in fish-specific duplications, we filtered out 506 families that did not show a duplication topology for the Fugu sequences. NJ trees were reconstructed for the remaining 347 families (425 duplication nodes). Phylogenetic trees were also reconstructed using the maximum-likelihood (ML) method. This method also identified 425 fish-specific duplication nodes, although 12 of the duplicate gene pairs differed from those predicted by the NJ method. Only the results of the NJ method are presented because the molecular clock analysis carried out for dating the duplication events (see below) also uses the distance-based NJ algorithm.
The two-cluster test for rate heterogeneity was applied to the 347 families to test for deviation from the molecular clock at 5% significance using TPCV, a program within the LinearTree package (Takezaki, Rzhetsky, and Nei 1995). A total of 236 families did not satisfy the molecular clock hypothesis. Estimates of divergence time of these genes were nevertheless calculated to get an idea of their distribution pattern and are shown as Supplementary Material online at www.mbe.oupjournals.org. An additional 15 families showed negative or zero branch lengths. Linearized trees were drawn for the remaining 96 families. Eight linearized trees showed a topology inconsistent with fish-specific duplications. The duplication dates of the 95 Fugu genes in the remaining 88 linearized trees were estimated relative to the divergence time of ray-finned fish and lobe-finned fish (450 Myr) (Kumar and Hedges 1998).
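In a linearized tree, node height is proportional to time, so a duplication can be dated by scaling its height against the calibration node. The sketch below assumes the calibration node is the Fugu-human split at 450 Myr; the function name and the example heights in the usage note are hypothetical, and the counts are simply the figures quoted above.

```python
CALIBRATION_MYR = 450.0  # ray-finned vs. lobe-finned fish divergence


def duplication_age(dup_height, calibration_height,
                    calibration_myr=CALIBRATION_MYR):
    """Scale a duplication node's height to millions of years.

    Both heights are measured from the tips of the same linearized
    tree; the calibration node is assumed to be the Fugu-human split.
    """
    if calibration_height <= 0:
        raise ValueError("calibration height must be positive")
    return calibration_myr * dup_height / calibration_height


# Bookkeeping of the family counts quoted in the text.
families_tested = 347
clock_like = families_tested - 236 - 15  # 96 families with linearized trees
dated_trees = clock_like - 8             # 88 trees used for dating
```

For example, a duplication node at a height of 0.35 in a tree whose calibration node sits at 0.45 would be dated to about 350 Myr.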
Results and Discussion |
---|
Evolutionary Implications
Based on the presence of orthologous duplicate genes in phylogenetically distant species of teleosts such as zebrafish and Fugu, it has been suggested that an ancient whole-genome duplication provided the additional genetic material that facilitated the radiation of teleosts (Amores et al. 1998; Postlethwait et al. 1998; Meyer and Schartl 1999; Taylor et al. 2003). Our estimation of the ages of duplicate genes in Fugu suggests that the whole-genome duplication occurred in the ray-finned fish lineage around 350 MYA. Interestingly, paleontological evidence suggests that teleosts first appeared around 220 MYA and underwent rapid diversification during the Jurassic and Cretaceous periods (205 to 135 MYA) (Maissey 1996). Thus, the whole-genome duplication appears to have occurred before the origin of teleosts. Alternatively, this could be a case of a dramatic discordance between the molecular data and the fossil evidence.
If the genome duplication in the ray-finned fish lineage did indeed occur before the origin of teleosts, then genome duplication may not have been the driving force behind the radiation of teleosts. The basal "nonteleost" ray-finned fishes are represented by only four living groups: the polypteriforms (e.g., bichirs), acipenseriforms (sturgeons and paddlefish), semionotiforms (gars), and amiiforms (bowfin). To establish a correlation between the whole-genome duplication and the radiation of teleosts, it would be important to determine the time of the whole-genome duplication in relation to the speciation events of these "nonteleost" ray-finned fishes. Characterization of duplicate genes and paralogons in the basal "nonteleost" ray-finned fishes, as well as in basal teleosts such as osteoglossomorphs (e.g., bonytongues) and elopomorphs (e.g., eels), should help to determine whether the whole-genome duplication event in the ray-finned fish lineage spurred the radiation of teleosts.
Supplementary Material |
---|
Web Site References
Acknowledgements |
---|
Footnotes |
---|
Literature Cited |
---|
Amores, A., A. Force, and Y. L. Yan, et al. (13 co-authors). 1998. Zebrafish hox clusters and vertebrate genome evolution. Science 282:1711-1714.
Aparicio, S., J. Chapman, and E. Stupka, et al. (41 co-authors). 2002. Whole-genome shotgun assembly and analysis of the genome of Fugu rubripes. Science 297:1301-1310.
Barbazuk, W. B., I. Korf, C. Kadavi, J. Heyen, S. Tate, E. Wun, J. A. Bedell, J. D. McPherson, and S. L. Johnson. 2000. The syntenic relationship of the zebrafish and human genomes. Genome Res. 10:1351-1358.
Felsenstein, J. 1989. PHYLIP (phylogeny inference package). Version 3.2. Cladistics 5:164-166.
Friedman, R., and A. L. Hughes. 2001. Pattern and timing of gene duplication in animal genomes. Genome Res. 11:1842-1847.
Gates, M. A., L. Kim, E. S. Egan, T. Cardozo, H. I. Sirotkin, S. T. Dougan, D. Lashkari, R. Abagyan, A. F. Schier, and W. S. Talbot. 1999. A genetic linkage map for zebrafish: comparative analysis and localization of genes and expressed sequences. Genome Res. 9:334-347.
Gu, X., Y. Wang, and J. Gu. 2002. Age distribution of human gene families shows significant roles of both large- and small-scale duplications in vertebrate evolution. Nat. Genet. 31:205-209.
Gu, X., and J. Zhang. 1997. A simple method for estimating the parameter of substitution rate variation among sites. Mol. Biol. Evol. 14:1106-1113.
Kumar, S., and S. B. Hedges. 1998. A molecular timescale for vertebrate evolution. Nature 392:917-920.
Maissey, J. G. 1996. Discovering fossil fishes. Henry Holt and Company, New York.
McLysaght, A., K. Hokamp, and K. H. Wolfe. 2002. Extensive genomic duplication during early chordate evolution. Nat. Genet. 31:200-204.
Meyer, A., and M. Schartl. 1999. Gene and genome duplications in vertebrates: the one-to-four (-to-eight in fish) rule and the evolution of novel gene functions. Curr. Opin. Cell. Biol. 11:699-704.
Naruse, K., S. Fukamachi, and H. Mitani, et al. (20 co-authors). 2000. A detailed linkage map of medaka, Oryzias latipes: comparative genomics and genome evolution. Genetics 154:1773-1784.
Panopoulou, G., S. Hennig, D. Groth, A. Krause, A. J. Poustka, R. Herwig, M. Vingron, and H. Lehrach. 2003. New evidence for genome-wide duplications at the origin of vertebrates using an amphioxus gene set and completed animal genomes. Genome Res. 13:1056-1066.
Postlethwait, J. H., I. G. Woods, P. Ngo-Hazelett, Y. L. Yan, P. D. Kelly, F. Chu, H. Huang, A. Hill-Force, and W. S. Talbot. 2000. Zebrafish comparative genomics and the origins of vertebrate chromosomes. Genome Res. 10:1890-1902.
Postlethwait, J. H., Y. L. Yan, and M. A. Gates, et al. (26 co-authors). 1998. Vertebrate genome evolution and the zebrafish gene map. Nat. Genet. 18:345-349.
Robinson-Rechavi, M., A. S. Carpentier, M. Duffraisse, and V. Laudet. 2001a. How many nuclear hormone receptors are there in the human genome? Trends Genet. 17:554-556.
Robinson-Rechavi, M., O. Marchand, H. Escriva, and V. Laudet. 2001b. An ancestral whole-genome duplication may not have been responsible for the abundance of duplicated fish genes. Curr. Biol. 11:R458-R459.
Smith, N. G., R. Knight, and L. D. Hurst. 1999. Vertebrate genome evolution: a slow shuffle or a big bang? Bioessays 21:697-703.
Smith, S. F., P. Snell, F. Gruetzner, A. J. Bench, T. Haaf, J. A. Metcalfe, A. R. Green, and G. Elgar. 2002. Analyses of the extent of shared synteny and conserved gene orders between the genome of Fugu rubripes and human 20q. Genome Res. 12:776-784.
Takezaki, N., A. Rzhetsky, and M. Nei. 1995. Phylogenetic test of the molecular clock and linearized trees. Mol. Biol. Evol. 12:823-833.
Taylor, J. S., I. Braasch, T. Frickey, A. Meyer, and Y. Van de Peer. 2003. Genome duplication, a trait shared by 22000 species of ray-finned fish. Genome Res. 13:382-390.
Taylor, J. S., Y. Van de Peer, I. Braasch, and A. Meyer. 2001. Comparative genomics provides evidence for an ancient genome duplication event in fish. Philos. Trans. R. Soc. Lond. B. Biol. Sci. 356:1661-1679.
Taylor, J. S., Y. Van de Peer, and A. Meyer. 2001. Genome duplication, divergent resolution and speciation. Trends Genet. 17:299-301.
Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673-4680.
Venkatesh, B. 2003. Evolution and diversity of fish genomes. Curr. Opin. Genet. Dev. 13:1-5.
Winkler, C., M. Schafer, J. Duschl, M. Schartl, and J. N. Volff. 2003. Functional divergence of two zebrafish midkine growth factors following fish-specific gene duplication. Genome Res. 13:1067-1081.
Wittbrodt, J., A. Meyer, and M. Schartl. 1998. More genes in fishes? Bioessays 20:511-515.
Woods, I. G., P. D. Kelly, F. Chu, P. Ngo-Hazelett, Y. Yan, H. Huang, J. H. Postlethwait, and W. S. Talbot. 2000. A comparative map of the zebrafish genome. Genome Res. 10:1903-1914.
Yu, W. P., S. Brenner, and B. Venkatesh. 2003. Duplication, degeneration and subfunctionalization of the nested synapsin-Timp genes in Fugu. Trends Genet. 19:180-183.