Is Tetralogy True? Lack of Support for the "One-to-Four Rule"

Andrew Martin

Department of Environmental, Population, and Organismic Biology, University of Colorado

Many hypotheses proposed as explanations of gene family diversity in vertebrates invoke two successive polyploidizations prior to the origin of fishes (see references in Skrabanek and Wolfe 1998; Martin 1999). This hypothesis has gained widespread support in the literature. Spring (1997) has advanced the term "tetralogy" as a definition of orthology in which the four paralogous gene copies in vertebrates are all homologous to single-copy genes in invertebrates. Nadeau and Sankoff (1997) assumed successive genome duplications early in vertebrates in a study examining the relative rates of acquisition of new gene function and gene loss. Similarly, recent reviews on genome evolution emphasize the "one-to-four rule" (Meyer and Schartl 1999), a rule that mirrors the intention of proposing tetralogy as an addition to the genomics vocabulary.

Recent papers have brought the issue of contemporary vertebrate genome organization into clearer focus and advocate adopting a hypothesis-testing framework for investigating genome evolution (Smith, Knight, and Hurst 1999). In this paper, I test some of the predictions of the tetralogy hypothesis by phylogenetic analysis of several multigene families that match the one-to-four rule. Alignments for gene families listed by Spring (2000) for which there are four "tetralogous" copies in vertebrates and a single invertebrate ortholog (see also http://stripe.colorado.edu/~am/multigene.html) were constructed using CLUSTAL W (Thompson, Higgins, and Gibson 1994). Minimum-length gene trees were found using the branch-and-bound algorithm in PAUP*, and support for nodes of the tree was evaluated by bootstrapping the data 500 times (implemented in PAUP*; Swofford 1999) (fig. 1). Gene trees that did not match the tetralogy hypothesis were rearranged to match the predictions of tetralogy, and the rearranged tree was tested against the minimum-length tree (MLT) using a Wilcoxon signed-ranks test (Templeton 1983) implemented in PAUP* (Swofford 1999) (rearranged trees are also presented in fig. 1).
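As a rough illustration of the signed-ranks comparison described above, the following Python sketch applies a Wilcoxon signed-ranks test to the number of steps each character requires on two alternative trees. It is not the PAUP* implementation: the function name `templeton_test`, the use of a normal approximation with a tie correction, and the example step counts are all assumptions made for illustration.

```python
import math


def templeton_test(steps_a, steps_b):
    """Two-sided Wilcoxon signed-ranks test on per-character step counts.

    steps_a and steps_b hold the parsimony steps each character (alignment
    column) requires on tree A and tree B, in the same order.
    Returns (tree length difference A - B, z score, two-sided P value).
    The P value uses a normal approximation, an assumption of this sketch.
    """
    # Characters that fit both trees equally well carry no information.
    diffs = [a - b for a, b in zip(steps_a, steps_b) if a != b]
    n = len(diffs)
    if n == 0:
        return 0, 0.0, 1.0

    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        group = j - i + 1
        tie_term += group ** 3 - group
        i = j + 1

    # Sum of ranks for characters that need more steps on tree A.
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    variance = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))
    return sum(steps_a) - sum(steps_b), z, p


# Hypothetical step counts: the constrained tree needs one extra step at
# 10 of 15 characters and the same number of steps at the other 5.
constrained = [2] * 10 + [1] * 5
minimum_length = [1] * 10 + [1] * 5
print(templeton_test(constrained, minimum_length))
```

With few informative characters an exact test would be preferable to the normal approximation used here.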



Fig. 1.—Gene trees for multigene families examined in this study. For gene families with two trees, the gene tree on the left is the shortest tree, with numbers indicating bootstrap support for nodes. Nodes without values had bootstrap scores of less than 50%. In cases in which the minimum-length tree did not match the tetralogy hypothesis predictions, a second tree is shown that represents the minimum-length tree (of the three possible rearranged trees) that matches the predictions of the tetralogy hypothesis (tree on the right). In cases in which the minimum-length tree matched the tree shape prediction of the tetralogy hypothesis, only a single tree is presented. Table 1 provides results of statistical tests for the difference in length between the two trees (when appropriate).

 
Eight of the 10 trees did not match the predictions of the tetralogy hypothesis (fig. 1), although the hypothesis could not be refuted using a Wilcoxon signed-ranks test for 7 of the 8 trees (table 1). In addition, in the two cases in which the minimum-length tree matched the predictions of the tetralogy hypothesis, alternative trees could not be refuted (data not shown). These results suggest that (1) the hierarchical signal of the relationships among paralogous genes is weak, and (2) the signal that exists neither provides convincing support for nor strongly refutes the tetralogy hypothesis.


Table 1 Summary of Tests of Alternative Hypotheses

 
The tetralogy hypothesis predicts the existence of chromosome paralogy, assuming chromosomal translocations have been rare. Thus, another strategy for evaluating the tetralogy hypothesis is to ask, for the gene families that are distributed on the same chromosomes, do the gene trees support paralogy of chromosomes? In other words, are trees for gene families that exist on the same chromosomes identical in topology? Five pairs of gene families exhibited broadly similar chromosomal distributions. For these cases, gene trees were constructed that matched the predictions of the tetralogy hypothesis and required the fewest changes (of the three possible symmetric rooted trees of four taxa). In cases of incongruent chromosomal trees, gene trees were rearranged, and the rearranged trees were tested against the MLT using the signed-ranks test (Templeton 1983). Results of this analysis are presented for chromosomally codistributed pairs of gene family trees.
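The congruence question posed above can be sketched in Python. The block enumerates the three symmetric rooted topologies for four genes, relabels a gene tree by chromosome, and asks whether two gene families imply the same chromosomal relationships. The function names and the gene labels (for example `bmp_chr6a`) are hypothetical placeholders; only the chromosome groupings follow the BMP and ID results reported in this paper.

```python
def balanced_trees(genes):
    """Return the three symmetric rooted topologies ((w, x), (y, z))."""
    first, rest = genes[0], genes[1:]
    trees = []
    for partner in rest:
        others = tuple(g for g in rest if g != partner)
        trees.append(((first, partner), others))
    return trees


def chromosome_shape(tree, chrom_of):
    """Relabel each gene by its chromosome and sort into a canonical form."""
    return tuple(sorted(
        tuple(sorted(chrom_of[gene] for gene in cherry)) for cherry in tree
    ))


def congruent(tree_a, chrom_a, tree_b, chrom_b):
    """True if both gene trees imply the same chromosomal relationships."""
    return chromosome_shape(tree_a, chrom_a) == chromosome_shape(tree_b, chrom_b)


# Placeholder gene labels; chromosome assignments follow the text.
bmp_chrom = {"bmp_chr1": 1, "bmp_chr20": 20, "bmp_chr6a": 6, "bmp_chr6b": 6}
id_chrom = {"id_chr1": 1, "id_chr2": 2, "id_chr6": 6, "id_chr20": 20}

# BMP: chromosomes 1 and 20 are sisters, as are the two chromosome 6 genes.
bmp_tree = (("bmp_chr1", "bmp_chr20"), ("bmp_chr6a", "bmp_chr6b"))
# ID: chromosomes 2 and 20 are grouped, leaving 1 and 6 as the other pair.
id_tree = (("id_chr2", "id_chr20"), ("id_chr1", "id_chr6"))

print(len(balanced_trees(list(id_chrom))))               # 3
print(congruent(bmp_tree, bmp_chrom, id_tree, id_chrom))  # False
```

Sorting the chromosome labels within and between pairs makes the comparison independent of the order in which the genes happen to be written.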

Bone morphogenetic proteins (BMPs) are important in determining cell fates during development (Dale and Jones 1999). ID genes encode helix-loop-helix transcription factors involved in cell growth and differentiation (Norton et al. 1998). Genes for the BMP 5–8 and ID gene families are distributed on chromosomes 1, 2, 6, and 20. The BMP 5 and BMP 6 genes are both on chromosome 6, although they are not closely linked. Trees of these two gene families that conform to the tetralogy hypothesis are not congruent, however. The BMP tree depicts a sister relationship between chromosomes 1 and 20, whereas the ID tree supports the grouping of chromosomes 2 and 20 (fig. 2). Furthermore, the two BMP genes on the same chromosome are sisters, supporting the hypothesis that they originated by intrachromosomal duplication. Evaluation of the congruence of the chromosomal relationships by rearranging gene trees (as described) yields significant results for all possible comparisons.



Fig. 2.—Gene trees for putative tetralogous gene families with similar chromosomal distributions. Gene trees are drawn assuming two successive gene duplications, and the gene tree of the three possible symmetric rooted trees of four taxa requiring the fewest amino acid changes is shown. The gene trees depict the relationships among chromosomes.

 
The JAK gene family encodes kinases, proteins involved in signal transduction (Watanabe and Arai 1996). The Notch gene family encodes receptors involved in determination of cell identity during development and in cell-cell communication (Weinmaster 1998). Genes from each family are present on chromosomes 1, 9, and 19; however, the relationship among chromosomes differs between the gene families. The minimum-length JAK tree supports a chromosomal relationship of ((9, 19), (1, 19)), whereas the best Notch tree that matches the tetralogy hypothesis is ((1, 9), (2, 19)) (fig. 2). These two trees are incongruent, refuting hypotheses depicting paralogous chromosomes. Tests of the relationships among chromosomes by forcing the topology depicted by one gene family on the other all show significant incongruence. For instance, if the relationships among chromosomes suggested by the Notch gene family are forced onto the JAK gene family, the resulting topology requires 73 more changes than the MLT, a difference that is highly significant (Wilcoxon P << 0.01). Of particular interest is that two paralogous JAK genes are present on chromosome 19. If there were two rounds of genome duplication, then one of the genes must have been translocated to chromosome 19 adjacent to a related gene. It seems more likely that JAK 3 and TYK 2 originated by tandem gene duplication of a region of chromosome 19. Importantly, JAK 3 and TYK 2 are not the most closely related paralogous pair of the gene family, suggesting that the tandem duplication event preceded additional gene duplications that gave rise to the diversity of the gene family. If so, then the clustering of these paralogous genes has persisted for a long time, arguing for the role of selection in maintaining the observed linkage.

Src and src-related genes are part of a superfamily of genes encoding nuclear receptor coactivators (Leo and Chen 2000). Syndecans are transmembrane proteins involved in cell-cell communication (Zimmermann and David 1999). Both of these gene families have members on chromosomes 1, 8, and 20. Two of the src-related genes are on chromosome 8, although they are not close together on the chromosome. Concordance of the inferred relationships of chromosomes for the two gene families provides support for the existence of paralogous chromosomes (fig. 2). In both gene families, chromosomes 8 and 20 are sisters. However, the syndecan tree depicts a relationship between chromosomes 1 and 2, whereas src-related genes indicate that chromosomes 1 and 8 are related. Two explanations are plausible. First, there were whole-chromosome or genome duplications, and one of the src-related genes translocated from chromosome 2 to chromosome 8, or the syndecan gene translocated from chromosome 8 to chromosome 2. Alternatively, the two gene families may have independent histories, and the chromosomes appear paralogous because of factors other than descent, like translocation and selection.

Hox genes are well-known regulatory genes expressed during development (Sharkey, Graba, and Scott 1997). EGFRs are growth factor receptors involved in cell signaling cascades (Wells 1999). Bailey et al. (1997) inferred the phylogenetic relationships of Hox gene clusters based on Hox gene sequences and linked fibrillar-type collagen genes. Their results suggest that the Hox gene tree is asymmetric; namely, (D, [A, (B, C)]) (see Bailey et al. 1997; the uppercase-letter notation denotes the different Hox gene clusters). They note that if genome duplication is invoked to explain the existence of four Hox gene clusters (A–D), then there must have been three rounds of duplication and the subsequent loss of four clusters to explain the Hox gene tree. Bailey et al. mention, however, that the rooting of the Hox gene tree may be in error and that the tree may be ([A, D], [B, C]), although the high bootstrap values supporting the asymmetric tree make this alternative tree unlikely (Bailey et al. 1997, p. 850).
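The arithmetic behind the "three rounds and four losses" argument can be made explicit with a short Python sketch. It assumes, for illustration only, that every node in the cluster tree corresponds to a whole-genome duplication and that every lineage doubles in each round, so n rounds yield 2^n clusters. The function names are hypothetical, and trees are written as nested tuples.

```python
def depth(tree):
    """Number of duplication nodes on the longest root-to-tip path."""
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(depth(child) for child in tree)


def count_tips(tree):
    """Number of extant clusters at the tips of the tree."""
    if not isinstance(tree, tuple):
        return 1
    return sum(count_tips(child) for child in tree)


def rounds_and_losses(tree):
    """Rounds of genome duplication implied by the tree shape, and the
    number of clusters missing from the 2**rounds expected copies."""
    rounds = depth(tree)
    return rounds, 2 ** rounds - count_tips(tree)


asymmetric_hox = ("D", ("A", ("B", "C")))   # tree favored by Bailey et al.
symmetric_hox = (("A", "D"), ("B", "C"))    # tree predicted by tetralogy

print(rounds_and_losses(asymmetric_hox))  # (3, 4)
print(rounds_and_losses(symmetric_hox))   # (2, 0)
```

Under these assumptions only the symmetric tree is compatible with exactly two rounds of duplication and no cluster loss, which is why the shape of the gene tree serves as a test of the tetralogy hypothesis.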

In this paper, I inferred the phylogeny of the EGFR family, which, like the collagen genes, is linked with the Hox genes. The EGFR phylogeny is symmetric, supporting the hypothesis that there were two rounds of genome duplication; however, even if we accept the symmetric Hox gene tree (i.e., [(A, D), (B, C)]), the two gene trees do not support identical histories for the chromosomes (fig. 2). If we rearrange the EGFR tree to match the published Hox gene tree (i.e., [D, (A, [B, C])]), the fit of the data is significantly less parsimonious than the minimum-length EGFR tree based on a Wilcoxon signed-ranks test (tree length difference [TLD] = 23; P = 0.015). Similarly, if we rearrange the EGFR tree to match the tetralogous Hox tree (i.e., [(A, D), (B, C)]), the topology is significantly longer than the MLT (TLD = 25; P < 0.01). This result is noteworthy because it suggests that physically linked genes may have independent histories.

The MyoD family of genes encodes transcription factors underlying myogenesis (Arnold and Winter 1998). Insulin-like genes play roles in cell division, differentiation, and metabolism (Stewart and Rotwein 1996). Genes for the MyoD family are on chromosomes 1, 11, and 12, with Myf5 and Myf6 clustered close together on chromosome 12 (Cupelli et al. 1996). The insulin-like genes are found on chromosomes 9, 11, 12, and 19. The insulin gene tree depicts INSL 3 and INSL 4 as closely related paralogs, suggesting that if the tree were rearranged to match the predictions of tetralogy, IGF 1 and IGF 2 would be grouped together (fig. 2). If the grouping of Myf5/6 resulted from gene duplication and translocation, then the comparison of the insulin and MyoD gene family trees suggests that one of these two genes may have occurred on chromosome 9. The grouping of chromosomes 9 and 19 evident in the insulin gene family is also mirrored by the JAK data.

Phylogenetic analysis of gene families identified as being tetralogous does not provide convincing support for the underlying assumption that precipitated the tetralogy hypothesis, namely, that gene family diversity in vertebrates can be explained in part by multiple, successive tetraploidizations at the base of vertebrates. Two lines of evidence favor refutation of the tetralogy hypothesis. First, the minimum-length trees of only 2 of 11 gene families surveyed displayed the shape predicted by two successive rounds of genome duplication. Of the 9 gene family trees that did not match the tetralogy predictions, 2 were significantly shorter than the trees predicted by the tetralogy hypothesis (BMP from this study and the Hox gene family; Bailey et al. 1997); conversely, in 7 of the 9 cases, the tetralogy hypothesis could not be refuted. Second, for several gene families for which paralogous members were distributed on the same chromosomes, little indication of chromosomal paralogy was evident, even when the trees matching the predictions of the tetralogy hypothesis were compared.

Although these results discount the likelihood of multiple wholesale genome duplications, they do not necessarily suggest independence of gene family evolution. Evidence of gene family coevolution (e.g., Sitnikova and Su 1998) implies that probabilities of gene duplication (meaning origin, fixation, and retention of new genes) may be dependent on prior gene duplication. Evidence for coevolution of functionally interacting gene families is likely to be difficult to uncover, however, because of the general complexity of gene families and the ubiquity of pleiotropic interactions among genes from different gene families (see Newfeld, Wisotzkey, and Kumar 1999).

Apparent synteny of putative paralogous chromosomes, in light of the lack of strong evidence of whole genome duplications, suggests that selection may play a role in maintaining syntenic arrangements of related genes on separate chromosomes (Hughes 1998). Clear evidence for selection on gene order is available for Hox genes (Mann 1997). In addition, several pairs of gene families with codistributed paralogous members are known to interact. Recent evidence suggests that JAK and Notch genes interact during eye development in Drosophila (Strutt and Strutt 1999). Similarly, src family kinases are thought to regulate syndecan phosphorylation (Ott and Rapraeger 1998). Other examples of interactions among genes on the same chromosome are known. These particular examples suggest that synteny of paralogous genes may be maintained by selection associated with gene regulation and interaction. My point is not that we can necessarily implicate selection, but that when attempting to explain contemporary characteristics of organisms, both the history of mutation and selection need to be considered. Most of the discussion has focused almost exclusively on mutation as an explanation for contemporary vertebrate genome organization; however, sufficient explanation requires consideration of both mutation and selection.



Fig. 1 (Continued)

 
Footnotes

B. Franz Lang, Reviewing Editor

1 Keywords: tetralogy, gene duplication, vertebrate genome

2 Address for correspondence and reprints: Andrew Martin, Department of Environmental, Population, and Organismic Biology, University of Colorado, Boulder, Colorado 80309. E-mail: am@stripe.colorado.edu

literature cited

    Arnold, H. H., and B. Winter. 1998. Muscle differentiation: more complexity to the network of myogenic regulators. Curr. Opin. Genet. Dev. 8:539–544.

    Bailey, W. J., J. Kim, G. P. Wagner, and F. H. Ruddle. 1997. Phylogenetic reconstruction of vertebrate Hox cluster duplication. Mol. Biol. Evol. 14:843–853.

    Cupelli, L., B. J. Renault, A. Leblanc-Straceski, D. Banks, R. Ward, R. Kucherlapati, and K. Krauter. 1996. Assignment of the human myogenic factors 5 and 6 (MYF5, MYF6) gene cluster to 12q21 by in situ hybridization and physical mapping of the locus between D12S350 and D12S106. Cytogenet. Cell Genet. 72:250–251.

    Dale, L., and C. M. Jones. 1999. BMP signaling in early Xenopus development. Bioessays 21:751–760.

    Hughes, A. L. 1998. Phylogenetic test of the hypothesis of block duplication of homologous genes on chromosomes 6, 9 and 1. Mol. Biol. Evol. 15:854–870.

    Leo, C., and J. D. Chen. 2000. The SRC family of nuclear receptor coactivators. Gene 245:1–11.

    Mann, R. S. 1997. Why are Hox genes clustered? Bioessays 66:1–4.

    Martin, A. P. 1999. Increasing genomic complexity by gene duplication and the origin of vertebrates. Am. Nat. 154:111–128.

    Meyer, A. M., and M. Schartl. 1999. Gene and genome duplications in vertebrates: the one-to-four (-to-eight in fish) rule and the evolution of novel gene functions. Curr. Opin. Cell Biol. 11:699–704.

    Nadeau, J. H., and D. Sankoff. 1997. Comparable rates of gene loss and functional divergence after genome duplications early in vertebrate evolution. Genetics 147:1259–1266.

    Newfeld, S. J., R. G. Wisotzkey, and S. Kumar. 1999. Molecular evolution of a developmental pathway: phylogenetic analysis of transforming growth factor-beta family ligands, receptors and Smad signal transducers. Genetics 152:783–795.

    Norton, J. D., R. W. Reed, G. Craggs, and F. Sablitzky. 1998. Id helix-loop-helix proteins in cell growth and differentiation. Trends Cell Biol. 8:58–65.

    Ott, V. L., and A. C. Rapraeger. 1998. Tyrosine phosphorylation of syndecan-1 and -4 cytoplasmic domains in adherent B82 fibroblasts. J. Biol. Chem. 273:35291–35298.

    Sharkey, M., Y. Graba, and M. P. Scott. 1997. Hox genes in evolution: protein surfaces and paralog groups. Trends Genet. 13:145–151.

    Sitnikova, T., and C. Su. 1998. Coevolution of immunoglobulin heavy- and light-chain variable-region gene families. Mol. Biol. Evol. 15:617–625.

    Skrabanek, L., and K. H. Wolfe. 1998. Eukaryote genome duplication—where's the evidence? Curr. Opin. Genet. Dev. 8:694–700.

    Smith, N. G. C., R. Knight, and L. D. Hurst. 1999. Vertebrate genome evolution: a slow shuffle or a big bang? Bioessays 21:697–703.

    Spring, J. 1997. Vertebrate evolution by interspecific hybridization—are we polyploid? FEBS Lett. 400:2–8.

    ———. 2000. Tetrabase (http://www.unibas.ch/dib/zoologie/research/tetrabase2.html).

    Stewart, C. E., and P. Rotwein. 1996. Growth, differentiation, and survival: multiple physiological functions for insulin-like growth factors. Physiol. Rev. 76:1005–1026.

    Strutt, H., and D. Strutt. 1999. Polarity determination in the Drosophila eye. Curr. Opin. Genet. Dev. 9:442–446.

    Swofford, D. L. 1999. PAUP*. Version 4.0. Sinauer, Sunderland, Mass.

    Templeton, A. R. 1983. Phylogenetic inference from restriction endonuclease cleavage site maps with particular reference to the evolution of humans and the apes. Evolution 37:221–244.

    Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673–4680.

    Watanabe, S., and K.-I. Arai. 1996. Roles of the JAK-STAT system in signal transduction via cytokine receptors. Curr. Opin. Genet. Dev. 6:587–596.

    Weinmaster, G. 1998. Notch signalling: direct or what? Curr. Opin. Genet. Dev. 8:436–442.

    Wells, A. 1999. EGF receptor. Int. J. Biochem. Cell Biol. 31:637–643.

    Zimmermann, P., and G. David. 1999. The syndecans, tuners of transmembrane signaling. FASEB J. 13:S91–S100.

Accepted for publication September 24, 2000.