Department of Environmental, Population, and Organismic Biology, University of Colorado
Many hypotheses proposed as explanations of gene family diversity in vertebrates invoke two successive polyploidizations prior to the origin of fishes (see references in Skrabanek and Wolfe 1998; Martin 1999). This hypothesis has gained widespread support in the literature. Spring (1997) has advanced the term "tetralogy" as a definition of orthology in which the four paralogous gene copies in vertebrates are all homologous to single-copy genes in invertebrates. Nadeau and Sankoff (1997) assumed successive genome duplications early in vertebrates in a study examining the relative rates of acquisition of new gene function and gene loss. Similarly, recent reviews on genome evolution emphasize the "one-to-four rule" (Meyer and Schartl 1999), a rule that mirrors the intention of proposing tetralogy as an addition to the genomics vocabulary.
Recent papers have brought the issue of contemporary vertebrate genome organization into clearer focus and advocate adopting a hypothesis-testing framework for investigating genome evolution (Smith, Knight, and Hurst 1999). In this paper, I test some of the predictions of the tetralogy hypothesis by phylogenetic analysis of several multigene families that match the one-to-four rule. Alignments for gene families listed by Spring (2000) for which there are four "tetralogous" copies in vertebrates and a single invertebrate ortholog (see also http://stripe.colorado.edu/ am/multigene.html) were constructed using CLUSTAL W (Thompson, Higgins, and Gibson 1994). Minimum-length gene trees were found using the branch-and-bound algorithm in PAUP*, and support for nodes of the tree was evaluated by bootstrapping the data 500 times (implemented in PAUP*; Swofford 1999) (fig. 1). Gene trees that did not match the tetralogy hypothesis were rearranged to match the predictions of tetralogy, and the rearranged tree was tested against the minimum-length tree (MLT) using a Wilcoxon signed-ranks test (Templeton 1983) implemented in PAUP* (Swofford 1999) (rearranged trees are also presented in fig. 1).
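The logic of the Wilcoxon signed-ranks comparison of two trees can be illustrated with a minimal Python sketch. It is not the PAUP* implementation: the function name `templeton_test`, the made-up per-character step counts, and the use of a normal approximation with a tie correction for the P value are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def templeton_test(steps_constrained, steps_mlt):
    """Wilcoxon signed-ranks comparison of per-character step counts.

    Each list holds the number of parsimony steps one character requires
    on a tree. Returns (W+, W-, two-sided P from a normal approximation).
    """
    # Characters that fit both trees equally well carry no information.
    diffs = [c - m for c, m in zip(steps_constrained, steps_mlt) if c != m]
    n = len(diffs)
    if n == 0:
        return 0.0, 0.0, 1.0

    # Rank absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)

    # Normal approximation with the standard correction for tied ranks.
    mean = n * (n + 1) / 4
    tie_sizes = Counter(abs(d) for d in diffs).values()
    variance = n * (n + 1) * (2 * n + 1) / 24 - sum(t**3 - t for t in tie_sizes) / 48
    z = (min(w_plus, w_minus) - mean) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return w_plus, w_minus, p_value


# Hypothetical step counts for eight characters (illustrative values only).
constrained = [2, 1, 3, 2, 3, 2, 4, 1]  # tree rearranged to match tetralogy
mlt = [1, 1, 2, 1, 3, 1, 2, 1]          # minimum-length tree
tree_length_difference = sum(constrained) - sum(mlt)
w_plus, w_minus, p_value = templeton_test(constrained, mlt)
```

With so few informative characters the normal approximation is rough; the sketch is meant only to show how per-character differences in fit, rather than the tree length difference alone, drive the test.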
Bone morphogenetic proteins (BMPs) are important in determining cell fates during development (Dale and Jones 1999). ID genes encode helix-loop-helix transcription factors involved in cell growth and differentiation (Norton et al. 1998). Genes for the BMP 5–8 and ID gene families are distributed on chromosomes 1, 2, 6, and 20. The BMP 5 and BMP 6 genes are both on chromosome 6, although they are not closely linked. Trees of these two gene families that conform to the tetralogy hypothesis are not congruent, however. BMP depicts a sister relationship between chromosomes 1 and 20, whereas ID supports the grouping of chromosomes 2 and 20 (fig. 2). Furthermore, the two BMP genes on the same chromosome are sisters, supporting the hypothesis that they originated by intrachromosomal duplication. Evaluation of the congruence of the chromosomal relationship by rearranging gene trees (as described) yields significant results for all possible comparisons.
Src and src-related genes are part of a superfamily of genes encoding nuclear receptors (Leo and Chen 2000). Syndecans are transmembrane proteins involved in cell-cell communication (Zimmermann and David 1999). Both of these gene families have members on chromosomes 1, 8, and 20. Two of the src-related genes are on chromosome 8, although they are not close together on the chromosome. Concordance of the inferred relationships of chromosomes for the two gene families provides support for the existence of paralogous chromosomes (fig. 2). In both gene families, chromosomes 8 and 20 are sisters. However, the syndecan tree depicts a relationship between chromosomes 1 and 2, whereas src-related genes indicate that chromosomes 1 and 8 are related. Two explanations are plausible. First, there were whole-chromosome or genome duplications, and one of the src-related genes translocated from chromosome 2 to chromosome 8, or the syndecan gene translocated from chromosome 8 to chromosome 2. Alternatively, the two gene families may have independent histories, and the chromosomes appear paralogous because of factors other than descent, like translocation and selection.
Hox genes are well-known regulatory genes expressed during development (Sharkey, Graba, and Scott 1997). EGFRs are growth factor receptors involved in cell signaling cascades (Wells 1999). Bailey et al. (1997) inferred the phylogenetic relationships of Hox gene clusters based on Hox gene sequences and linked fibrillar-type collagen genes. Their results suggest that the Hox gene tree is asymmetric; namely, (D, [A, (B, C)]) (see Bailey et al. 1997; the uppercase-letter notation denotes the different Hox gene clusters). They note that if genome duplication is invoked to explain the existence of four Hox gene clusters (A–D), then there must have been three rounds of duplication and the subsequent loss of four clusters to explain the Hox gene tree. Bailey et al. mention, however, that the rooting of the Hox gene tree may be in error and that the tree may be ([A, D], [B, C]), although the high bootstrap values supporting the asymmetric tree make this alternative tree unlikely (Bailey et al. 1997, p. 850).
In this paper, I inferred the phylogeny of the EGFR family, which, like the collagen genes, is linked with the Hox genes. The EGFR phylogeny is symmetric, supporting the hypothesis that there were two rounds of genome duplication; however, even if we accept the symmetric Hox gene tree (i.e., [(A, D), (B, C)]), the two gene trees do not support identical histories for the chromosomes (fig. 2). If we rearrange the EGFR tree to match the published Hox gene tree (i.e., [D, (A, [B, C])]), the fit of the data is significantly less parsimonious than the minimum-length EGFR tree based on a Wilcoxon signed-ranks test (tree length difference [TLD] = 23; P = 0.015). Similarly, if we rearrange the EGFR tree to match the tetralogous Hox tree (i.e., [(A, D), (B, C)]), the topology is significantly longer than the MLT (TLD = 25; P < 0.01). This result is noteworthy because it suggests that physically linked genes may have independent histories.
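The distinction between the symmetric tree predicted by two rounds of genome duplication and the asymmetric tree reported for the Hox clusters can be made concrete with a small Python sketch. The nested-tuple encoding of rooted trees and the helper names `leaves`, `canonical`, and `is_symmetric` are assumptions introduced here for illustration; they are not part of the original analysis.

```python
def leaves(tree):
    """Return the set of leaf labels in a rooted tree of nested tuples."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def canonical(tree):
    """Order-independent form of a tree, so that (A, B) equals (B, A)."""
    if isinstance(tree, str):
        return tree
    return frozenset(canonical(child) for child in tree)


def is_symmetric(tree):
    """True if the root of a four-leaf binary tree splits the leaves 2 + 2.

    Two successive genome duplications predict this shape; a 1 + 3 split
    at the root is the asymmetric alternative.
    """
    left, right = tree
    return len(leaves(left)) == 2 and len(leaves(right)) == 2


# Topologies written out in the text for the four Hox clusters.
hox_published = ("D", ("A", ("B", "C")))    # (D, [A, (B, C)])
hox_tetralogy = (("A", "D"), ("B", "C"))    # ([A, D], [B, C])

published_is_symmetric = is_symmetric(hox_published)   # False
tetralogy_is_symmetric = is_symmetric(hox_tetralogy)   # True

# The two bracket styles used in the text describe the same topology.
same_topology = canonical(("D", ("A", ("B", "C")))) == canonical(((("C", "B"), "A"), "D"))
```

The same canonical form could be used to ask whether two gene families, once their genes are relabeled by chromosome, imply the same chromosome tree.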
The MyoD family of genes encodes transcription factors underlying myogenesis (Arnold and Winter 1998). Insulin-like genes play roles in cell division, differentiation, and metabolism (Stewart and Rotwein 1996). Genes for the MyoD family are on chromosomes 1, 11, and 12, with Myf5 and Myf6 clustered close together on chromosome 12 (Cupelli et al. 1996). The insulin-like genes are found on chromosomes 9, 11, 12, and 19. The insulin gene tree depicts INSL 3 and INSL 4 as closely related paralogs, suggesting that if the tree were rearranged to match the predictions of tetralogy, IGF 1 and IGF 2 would be grouped together (fig. 2). If the grouping of Myf5/6 resulted from gene duplication and translocation, then the comparison of the insulin and MyoD gene family trees suggests that one of these two genes may have occurred on chromosome 9. The grouping of chromosomes 9 and 19 evident in the insulin gene family is also mirrored by the JAK data.
Phylogenetic analysis of gene families identified as being tetralogous does not provide convincing support for the underlying assumption that precipitated the tetralogy hypothesis, namely, that gene family diversity in vertebrates can be explained in part by multiple, successive tetraploidizations at the base of vertebrates. Two lines of evidence favor refutation of the tetralogy hypothesis. First, the minimum-length trees of only 2 of 11 gene families surveyed displayed the shape predicted by two successive rounds of genome duplication. Of the 9 gene family trees that did not match the tetralogy predictions, 2 were significantly shorter than the tree predicted by the tetralogy hypothesis (BMP from this study and the Hox gene family; Bailey et al. 1997); conversely, in 7 of the 9 cases, the tetralogy hypothesis could not be refuted. Second, for several gene families for which paralogous members were distributed on the same chromosomes, little indication of chromosomal paralogy was evident, even when the trees matching the predictions of the tetralogy hypothesis were compared.
Although these results discount the likelihood of multiple wholesale genome duplications, they do not necessarily suggest independence of gene family evolution. Evidence of gene family coevolution (e.g., Sitnikova and Su 1998) implies that probabilities of gene duplication (meaning origin, fixation, and retention of new genes) may be dependent on prior gene duplication. Evidence for coevolution of functionally interacting gene families is likely to be difficult to uncover, however, because of the general complexity of gene families and the ubiquity of pleiotropic interactions among genes from different gene families (see Newfeld, Wisotzkey, and Kumar 1999).
Apparent synteny of putative paralogous chromosomes, in light of the lack of strong evidence of whole genome duplications, suggests that selection may play a role in maintaining syntenic arrangements of related genes on separate chromosomes (Hughes 1998). Clear evidence for selection on gene order is available for Hox genes (Mann 1997). In addition, several pairs of gene families with codistributed paralogous members are known to interact. Recent evidence suggests that JAK and Notch genes interact during eye development in Drosophila (Strutt and Strutt 1999). Similarly, src family kinases are thought to regulate syndecan phosphorylation (Ott and Rapraeger 1998). Other examples of interactions among genes on the same chromosome are known. These particular examples suggest that synteny of paralogous genes may be maintained by selection associated with gene regulation and interaction. My point is not that we can necessarily implicate selection, but that when attempting to explain contemporary characteristics of organisms, both the history of mutation and selection need to be considered. Most of the discussion has focused almost exclusively on mutation as an explanation for contemporary vertebrate genome organization; however, sufficient explanation requires consideration of both mutation and selection.
B. Franz Lang, Reviewing Editor
1 Keywords: tetralogy, gene duplication, vertebrate genome
2 Address for correspondence and reprints: Andrew Martin, Department of Environmental, Population, and Organismic Biology, University of Colorado, Boulder, Colorado 80309. E-mail: am@stripe.colorado.edu
literature cited
Arnold, H. H., and B. Winter. 1998. Muscle differentiation: more complexity to the network of myogenic regulators. Curr. Opin. Genet. Dev. 8:539–544.
Bailey, W. J., J. Kim, G. P. Wagner, and F. H. Ruddle. 1997. Phylogenetic reconstruction of vertebrate Hox cluster duplication. Mol. Biol. Evol. 14:843–853.
Cupelli, L., B. J. Renault, A. Leblanc-Straceski, D. Banks, R. Ward, R. Kucherlapati, and K. Krauter. 1996. Assignment of the human myogenic factors 5 and 6 (MYF5, MYF6) gene cluster to 12q21 by in situ hybridization and physical mapping of the locus between D12S350 and D12S106. Cytogenet. Cell Genet. 72:250–251.
Dale, L., and C. M. Jones. 1999. BMP signaling in early Xenopus development. Bioessays 21:751–760.
Hughes, A. L. 1998. Phylogenetic test of the hypothesis of block duplication of homologous genes on chromosomes 6, 9 and 1. Mol. Biol. Evol. 15:854–870.
Leo, C., and J. D. Chen. 2000. The SRC family of nuclear receptor coactivators. Gene 245:1–11.
Mann, R. S. 1997. Why are Hox genes clustered? Bioessays 66:14.
Martin, A. P. 1999. Increasing genomic complexity by gene duplication and the origin of vertebrates. Am. Nat. 154:111–128.
Meyer, A. M., and M. Schartl. 1999. Gene and genome duplications in vertebrates: the one-to-four (-to-eight in fish) rule and the evolution of novel gene functions. Curr. Opin. Cell Biol. 11:699–704.
Nadeau, J. H., and D. Sankoff. 1997. Comparable rates of gene loss and functional divergence after genome duplications early in vertebrate evolution. Genetics 147:1259–1266.
Newfeld, S. J., R. G. Wisotzkey, and S. Kumar. 1999. Molecular evolution of a developmental pathway: phylogenetic analysis of transforming growth factor-beta family ligands, receptors and Smad signal transducers. Genetics 152:783–795.
Norton, J. D., R. W. Reed, G. Craggs, and F. Sablitzky. 1998. Id helix-loop-helix proteins in cell growth and differentiation. Trends Cell Biol. 8:58–65.
Ott, V. L., and A. C. Rapraeger. 1998. Tyrosine phosphorylation of syndecan-1 and -4 cytoplasmic domains in adherent B82 fibroblasts. Biol. Chem. 273:35291–35298.
Sharkey, M., Y. Graba, and M. P. Scott. 1997. Hox genes in evolution: protein surfaces and paralog groups. Trends Genet. 13:145–151.
Sitnikova, T., and C. Su. 1998. Coevolution of immunoglobulin heavy- and light-chain variable-region gene families. Mol. Biol. Evol. 15:617–625.
Skrabanek, L., and K. H. Wolfe. 1998. Eukaryote genome duplication - where's the evidence? Curr. Opin. Genet. Dev. 8:694–700.
Smith, N. G. C., R. Knight, and L. D. Hurst. 1999. Vertebrate genome evolution: a slow shuffle or a big bang? Bioessays 21:697–703.
Spring, J. 1997. Vertebrate evolution by interspecific hybridization - are we polyploid? FEBS 400:2–8.
Spring, J. 2000. Tetrabase (http://www.unibas.ch/dib/zoologie/research/tetrabase2.html).
Stewart, C. E., and P. Rotwein. 1996. Growth, differentiation, and survival: multiple physiological functions for insulin-like growth factors. Physiol. Rev. 76:1005–1026.
Strutt, H., and D. Strutt. 1999. Polarity determination in the Drosophila eye. Curr. Opin. Genet. Dev. 9:442–446.
Swofford, D. S. 1999. PAUP*. Version 4.0. Sinauer, Sunderland, Mass.
Templeton, A. R. 1983. Phylogenetic inference from restriction endonuclease cleavage site maps with particular reference to the evolution of humans and the apes. Evolution 37:221–244.
Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673–4680.
Watanabe, S., and K.-I. Arai. 1996. Roles of the JAK-STAT system in signal transduction via cytokine receptors. Curr. Opin. Genet. Dev. 6:587–596.
Weinmaster, G. 1998. Notch signalling: direct or what? Curr. Opin. Genet. Dev. 8:436–442.
Wells, A. 1999. EGF receptor. Int. J. Biochem. Cell Biol. 31:637–643.
Zimmermann, P., and G. David. 1999. The syndecans, tuners of transmembrane signaling. FASEB J. 13:S91–S100.