Analysis of Lamprey and Hagfish Genes Reveals a Complex History of Gene Duplications During Early Vertebrate Evolution

Hector Escriva*, Lori Manzon†, John Youson† and Vincent Laudet*

*CNRS UMR 5665, Laboratoire de Biologie Moléculaire et Cellulaire, Ecole Normale Supérieure de Lyon, Lyon Cedex, France,
†Division of Life Sciences, University of Toronto at Scarborough


    Abstract
 
It has been proposed that two events of duplication of the entire genome occurred early in vertebrate history (2R hypothesis). Several phylogenetic studies with a few gene families (mostly Hox genes and proteins from the MHC) have tried to confirm these polyploidization events. However, data from a single locus cannot explain the evolutionary history of a complete genome. To study this 2R hypothesis, we have taken advantage of the phylogenetic position of the lamprey to study the history of gene duplications in vertebrates. We selected most gene families that contain several paralogous genes in vertebrates and for which lamprey genes and an out-group are known in databases. In addition, we isolated members of the nuclear receptor superfamily in lamprey. Hagfish genes were also analyzed and found to confirm the lamprey gene analysis. Consistent with the 2R hypothesis, the phylogenetic analysis of 33 selected gene families, dispersed through the whole genome, revealed that one period of gene duplication arose before the lamprey-gnathostome split and this was followed by a second period of gene duplication after the lamprey-gnathostome split. Nevertheless, our analysis suggests that numerous gene losses and other gene-genome duplications occurred during the evolution of the vertebrate genomes. Thus, the complexity of all the paralogy groups present in vertebrates should be explained by the contribution of genome duplications (2R hypothesis), extra gene duplications, and gene losses.


    Introduction
 
On the basis of the observation of the genome size of chordates, it was suggested in 1970 that two rounds of tetraploidization occurred in the lineage leading to vertebrates (Ohno 1970, pp. 124–131). The haploid genome of amphioxus contains around 0.6 pg of DNA, whereas hagfish, lampreys, most teleost fish, and tetrapods have larger genomes. One duplication was proposed to have taken place between the divergence of tunicates and amphioxus, and the other later on in gnathostomes (fig. 1A; Atkin and Ohno 1967).



 
Fig. 1.—Schematic representation of the chordate phylogeny. The periods of gene or genome duplications proposed by Ohno (1970) (A) and Holland et al. (1994) (B) are indicated by arrows. According to Delarbre et al. (2000), the precise relationships between hagfish and lamprey are not resolved. Thus, on the basis of the available evidence, we cannot formally exclude the possibility that hagfish and lamprey form a natural group.

 
This hypothesis was modified following Schmidtke et al. (1977), who used isozyme electrophoresis to estimate the number of loci encoding eight categories of enzymes in amphioxus and the ascidian Ciona and suggested the same gene number for all enzymes studied. These data did not support polyploidization between the divergence of tunicates and amphioxus but did not contradict Ohno's hypothesis of a second phase of polyploidization in vertebrates. Holland et al. (1994) compared gene number between amphioxus, lamprey, hagfish, and mammals, using seven gene families (Msx, Hox, Cdx, MnSOD, En, Wnt, and Insulin-IGF), and proposed a reformulation of the Ohno (1970) model, in which two series of extensive gene duplication occurred early in vertebrate evolution. However, their data do not match the timings suggested in the earlier model. One duplication was proposed to have occurred on the vertebrate lineage after the divergence of cephalochordates and the second after the divergence of jawless vertebrates (hagfish and lampreys; Holland et al. 1994; fig. 1B). It has been noted that many single-copy Drosophila genes have four vertebrate orthologues, consistent with the notion of two rounds of genome duplication in vertebrates (Sidow 1996; Spring 1997). Different proposals for the timing of duplication events in the vertebrate lineage have been made by other authors and are summarized by Skrabanek and Wolfe (1998). This model has been refined by the study of gene families which have doubled in gene number between invertebrates and vertebrates, as well as by the analysis of extensive paralogy regions in the human genome. Sharman and Holland (1996) suggested that the first duplication event concerned only a subset of the genome, whereas the second event was a complete tetraploidization. Nevertheless, the same laboratory has more recently provided evidence in favor of 2R (Furlong and Holland 2002).

Numerous observations have suggested that more genes exist in vertebrates than in invertebrates (including cephalochordates), but the timing and mechanism of the duplication events remain unclear. Hughes (1999) proposed that, according to the hypothesis of two rounds of polyploidization (2R hypothesis), paralogous families with four vertebrate genes should exhibit a topology of the form (AB)(CD), rather than (A)(BCD). He tested for the existence of two tetraploidization events by the phylogenetic analysis of nine gene families. Only one of the nine gene families studied followed an (AB)(CD) pattern. The lack of (AB)(CD) topologies in other gene families has also been demonstrated by other authors (Martin 1999; Friedman and Hughes 2001; IHGSC 2001; Martin 2001). However, the assumption that the 2R evolution of vertebrates should produce (AB)(CD) topologies is too strict, and other evolutionary possibilities, such as hybridization followed by tetraploidization, can explain topologies like (A)(BCD).
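The distinction between the two topologies can be made concrete with a small sketch. It assumes a rooted tree of four paralogues written as nested pairs; the function names and the tuple encoding are illustrative choices, not part of the original analysis.

```python
# Illustrative sketch: rooted binary trees are nested 2-tuples, leaves are strings.
# The encoding and function names are assumptions made for this example.

def leaves(tree):
    """Return the set of leaf labels under a node."""
    if isinstance(tree, str):
        return {tree}
    left, right = tree
    return leaves(left) | leaves(right)


def topology_class(tree):
    """Classify a rooted four-paralogue tree by the sizes of its root split."""
    left, right = tree
    sizes = sorted((len(leaves(left)), len(leaves(right))))
    if sizes == [2, 2]:
        return "(AB)(CD)"   # two pairs of paralogues, the pattern expected by Hughes
    if sizes == [1, 3]:
        return "(A)(BCD)"   # one paralogue branching off before the other three
    raise ValueError("expected a rooted binary tree with exactly four leaves")


balanced = (("A", "B"), ("C", "D"))
ladder = ("A", ("B", ("C", "D")))
print(topology_class(balanced), topology_class(ladder))
```

Only the root split is inspected, which is sufficient to separate the two patterns discussed here; the order of the later branchings inside (BCD) is not examined.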

The analysis of Hox genes in the lamprey, Lampetra fluviatilis, has revealed that none of the paralogous groups contain four genes, which could be consistent with three or four Hox clusters (Sharman and Holland 1998). Given the phylogenetic position of lamprey at the base of the gnathostomes (Delarbre et al. 2000), a three-cluster state cannot be reconciled with a simple 2R hypothesis such as that proposed in figure 1, without implying a specific gain or loss of a lamprey Hox complex. However, it is also possible to explain these data by several incomplete genome duplication events. Because the Hox genes are difficult to use as phylogenetic markers, it is not possible to determine which of these hypotheses is correct.

The aforementioned example clearly shows that further study of lamprey genes is required to bring important new information on the evolution of genomes in early vertebrates. The cloning and phylogenetic analysis of lamprey genes could be an efficient approach to delineate the timing and mechanisms of gene duplication events. Chordates such as amphioxus are known to possess a single copy for each vertebrate paralogue group (Garcia-Fernandez and Holland 1994; Sidow 1996; Escriva et al. 1997), but very little is known about the lamprey and no systematic study of lamprey genes has been conducted. In the present article, we have focused our study on the cloning and characterization of available lamprey genes to analyze gene duplication events. The analysis of many genes dispersed throughout the genome allows us to discuss the evolution of the whole genome, not only the evolution of some particular loci. We isolated several members of the nuclear receptor superfamily in lampreys because we have previously shown that nuclear receptors are very good phylogenetic markers for gene duplication studies (Escriva et al. 1997; Marchand et al. 2000). We also selected gene families with a good phylogenetic signal for which duplicate vertebrate genes and lamprey and out-group sequences are known. The phylogenetic analyses of these lamprey sequences, together with some hagfish genes, are consistent with the hypothesis of two events of tetraploidization during the chordate-vertebrate transition. The first occurred before the divergence leading to lampreys and gnathostomes and the second after this divergence. In addition, extra gene duplications occurred both before and after the cyclostome-gnathostome split. Our data imply that a high rate of secondary gene loss occurred during vertebrate evolution.


    Materials and Methods
 
Isolation and Sequencing of Lamprey Nuclear Receptor cDNAs
Total RNA was extracted from adult Petromyzon marinus liver, brain, and muscle and was transcribed to cDNA using random hexamers (Promega) and reverse transcriptase (RT-MMLV, Boehringer). These cDNAs were pooled and used as templates for PCR amplification, as previously described (Escriva, Robinson, and Laudet 1999, pp. 1–28). The sense and antisense degenerate primers were designed from conserved regions of nuclear receptors (Escriva, Robinson, and Laudet 1999). RNAs of tissues from dogfish (Scyliorhinus canicula) were also used to search for thyroid hormone receptor cDNAs (E. Rull and V. Laudet, personal communication).

A touchdown PCR amplification was performed as follows: 10 min denaturation step at 94°C; then 4 cycles of 94°C (30 s), 55°C (30 s), 72°C (1 min); then 4 cycles of 94°C (30 s), 50°C (30 s), 72°C (1 min); then 4 cycles of 94°C (30 s), 45°C (30 s), 72°C (1 min); then 4 cycles of 94°C (30 s), 40°C (30 s), 72°C (1 min); then 30 cycles of 94°C (30 s), 37°C (30 s), 72°C (1 min) and 5 min at 72°C. PCR amplification was carried out using outer primers followed by a seminested PCR using the same conditions with inner primers (Escriva, Robinson, and Laudet 1999).
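The touchdown program described above can be written down as a small data structure, which makes the total cycle count and the programmed hold time easy to check. This is only a sketch: the variable and function names are illustrative, and ramping time between temperatures is ignored.

```python
# Sketch of the touchdown program from the text. Names are illustrative.
# Each stage: (number of cycles, annealing temperature in degrees C).
# Every cycle is 94 C for 30 s, annealing for 30 s, 72 C for 60 s.
STAGES = [(4, 55), (4, 50), (4, 45), (4, 40), (30, 37)]

INITIAL_DENATURATION_S = 10 * 60   # 10 min at 94 C
FINAL_EXTENSION_S = 5 * 60         # 5 min at 72 C
CYCLE_S = 30 + 30 + 60             # denaturation + annealing + extension


def total_cycles(stages):
    """Total number of amplification cycles across all stages."""
    return sum(n for n, _ in stages)


def hold_time_minutes(stages):
    """Programmed hold time only; ramping between temperatures is not included."""
    seconds = INITIAL_DENATURATION_S + total_cycles(stages) * CYCLE_S + FINAL_EXTENSION_S
    return seconds / 60


def annealing_schedule(stages):
    """Annealing temperature of each successive cycle."""
    return [temp for n, temp in stages for _ in range(n)]


print(total_cycles(STAGES), hold_time_minutes(STAGES))
```

The schedule steps the annealing temperature down by 5°C every four cycles before settling at 37°C, which is the defining feature of a touchdown protocol.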

The PCR products were visualized in a 1% agarose gel containing ethidium bromide, and products of the expected size were directly cloned into the TOPO cloning vector (Invitrogen). Three independent clones for each PCR amplification were sequenced by the dideoxy chain termination method with an ABI DNA sequencer 377A (Perkin-Elmer) using synthetic nucleotide primers.

Phylogenetic Analysis
The total number of gene families analyzed was 33, three of which were new nuclear receptor sequences isolated in the present work. The sequences used in the study, including alignments and the resulting phylogenetic trees, are available upon request.

Amino acid sequences were aligned using the CLUSTAL W program (Thompson, Higgins, and Gibson 1994) and manually corrected with SEAVIEW (Galtier, Gouy, and Gautier 1996). Phylogenetic trees were inferred by the Neighbor-Joining method (Saitou and Nei 1987) with Poisson-corrected distances on amino acids, implemented in PHYLO_WIN (Galtier, Gouy, and Gautier 1996); amino acid sites with gaps in any sequence were excluded from the calculations. The bootstrap analysis (1,000 repetitions) was carried out by the method of Felsenstein (1985). Divergent sequences for which the alignment was uncertain were excluded.
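As an illustration of the distance measure named above, the following sketch computes a Poisson-corrected distance, d = -ln(1 - p), where p is the proportion of differing amino acid sites, after removing every alignment column that contains a gap in any sequence. It is not the PHYLO_WIN implementation; the function names and the toy alignment are made up for the example.

```python
import math


def drop_gapped_columns(alignment, gap="-"):
    """Keep only the columns in which no sequence has a gap."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    keep = [i for i in range(length) if all(seq[i] != gap for seq in alignment)]
    return ["".join(seq[i] for i in keep) for seq in alignment]


def poisson_distance(seq_a, seq_b):
    """Poisson-corrected distance d = -ln(1 - p) between two gap-free sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    p = sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)
    if p >= 1.0:
        raise ValueError("distance is undefined when every site differs")
    return -math.log(1.0 - p)


# Toy alignment, invented for illustration only (not data from the study).
toy_alignment = ["MKT-LV", "MKSALV", "MRSAIV"]
cleaned = drop_gapped_columns(toy_alignment)
print(cleaned, poisson_distance(cleaned[0], cleaned[1]))
```

Removing gapped columns across the whole alignment before computing any pairwise distance means that every distance in the matrix is based on the same set of sites.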


    Results
 
Design of the Study
We selected gene families from GenBank that met the following criteria: (1) Several paralogous genes were known in vertebrates. (2) At least one member of the family existed in lampreys or hagfish (or both). (3) An out-group was available for phylogenetic analysis. The out-group was either an arthropod or an invertebrate chordate gene or a distantly related paralogous member of the superfamily. (4) A phylogenetic signal that clearly resolved each known vertebrate paralogue existed. In addition, by RT-PCR analysis with degenerate primers, we isolated in the lamprey (P. marinus) partial cDNA sequences for thyroid hormone receptors (TR), peroxisome proliferator-activated receptors (PPARs) (AF316877), and retinoid X receptors (RXRs) (AF316878).

The phylogenetic trees obtained for the different gene families analyzed could always be schematized by one of the topologies shown in figure 2. When the phylogeny of a gene family had been previously reported in the literature, we reanalyzed it and drew new trees for all the families. Thus, the results discussed below will refer to these topologies, and the actual phylogenetic trees will not be shown, except for the three examples depicted in figure 3. The results for protein families for which bootstrap values of the relevant branches were less than 50% have not been included in the analysis. Some of the gene families for which phylogenetic trees were constructed have thus been excluded from our conclusions because of low bootstrap values. These families are RXR (for which we isolated a new lamprey gene), synapsin (SYN), and CXC chemokines. The bootstrap values of important nodes (numbered at the respective nodes in fig. 2) are shown in table 1.



 
Fig. 2.—Schematic representations of the different phylogenetic patterns obtained from the 33 analyzed gene families. A, B, C, and D represent the jawed vertebrate paralogues and "Jawless" indicates the position where the lamprey or hagfish (or both) genes are located in the trees. OUT represents the out-group. The symbol ◀–1 indicates that a duplication arose before the jawless-gnathostome split, whereas the symbol 1–▶ indicates that the duplication occurred after the jawless-gnathostome split. Panels H and I represent examples of gene families in which only some branches were supported by high bootstrap values and which were used only partially for the conclusions of the present study. The symbol <-1 indicates that one duplication arose before the jawless-gnathostome split, while the evolution of the unresolved branches after the split remains undetermined. The symbol 1⇒ indicates that one duplication occurred after the jawless-gnathostome split, while the evolution of the gene family before the split remains undetermined. The relevant gene families are indicated below each tree. Key nodes are numbered, and the corresponding bootstrap values for each gene family and for each node are indicated in table 1.

 


 
Fig. 3.—Neighbor-joining trees of TR (A), PPAR (B), and FGFR (C) families, obtained as described in the Materials and Methods. Bootstrap values that are relevant to the discussion of the present work are indicated. Lamprey and hagfish sequences are in bold

 

 
Table 1 List of Analyzed Genes with GenBank Accession Numbers for the Lamprey and Hagfish (labeled with an asterisk) Genes

 
Hagfish sequences can be used in addition to those of the lamprey to localize gene duplication events in early vertebrate evolution. Patterns of gene duplication similar to those we found in lampreys can be inferred from previously reported studies of the phylogenetic relationships of hagfish genes. The results of the analysis carried out on both lamprey and hagfish genes are presented in table 1, and the two types of data are discussed together below.

Gene Families with Only Two Known Gnathostome Paralogues
We found seven gene families (EN, MHCIII, OTX, PAB, PTPN3, TR, Wnt5; see table 1 for accession numbers) whose phylogenetic analysis gives rise to a topology consistent with one specific duplication after the lamprey-gnathostome split (fig. 2A). One interesting example is the TR family, which contains two vertebrate paralogues, TRα (NR1A1) and TRβ (NR1A2) (Nuclear Receptors Nomenclature Committee 1999). A large set of TR sequences is available in GenBank, including an ascidian (Ciona) sequence that may be used as an out-group. Previous phylogenetic analysis of these sequences revealed that they give a good phylogenetic signal, giving rise to robust trees (Marchand et al. 2000). By RT-PCR analysis we found TRα and TRβ homologues in a cartilaginous fish (S. canicula), clearly showing that the TRα-TRβ duplication occurred before the divergence of cartilaginous fish from other gnathostomes. Surprisingly, in lamprey we found two TRs, TR1 and TR2, that cluster together in a phylogeny (fig. 3A) and join the TR tree before the TRα-TRβ duplication, as in figure 2A. This clearly shows that an ancestral TR gene was duplicated specifically in the gnathostome lineage to give rise to the TRα and TRβ genes. A hagfish TR homologue clusters with the lamprey TRs, confirming that the TRα-TRβ duplication arose specifically in gnathostomes (fig. 3A; E. Rull and V. Laudet, personal communication).

The OTX gene family contains two vertebrate paralogues, Otx1 and Otx2, and three lamprey sequences are available: two from L. japonica and one from P. marinus. A unique amphioxus gene was used as an out-group. Our results show that the three lamprey OTX sequences cluster together between the amphioxus and the vertebrate-specific duplication (Otx1–Otx2). As in the case of TR, this demonstrates that a duplication event took place after the lamprey-gnathostome divergence. Furthermore, an independent, lamprey-specific duplication occurred in L. japonica. This result is consistent with a previous analysis (Tomsa and Langeland 1999). Hagfish sequences also provide specific examples of this pattern.

The protein tyrosine phosphatase PTPN3 subfamily, one of the 17 subfamilies of PTP (Ono-Koyanagi et al. 2000), gave a figure 2A pattern with a unique hagfish homologue that split before the duplication that gave rise to the two paralogues.

The Wnt gene superfamily is composed of at least 12 subfamilies, and hagfish members are known in five of them (Wnt3, Wnt4, Wnt5, Wnt7, and Wnt10). The phylogeny of this superfamily has been previously analyzed, showing low bootstrap values in most of the subfamilies, except in Wnt5 (Sidow 1992; Schubert et al. 2000). The Wnt5 subfamily contains three hagfish sequences and two paralogues in gnathostomes (Wnt5a and Wnt5b). The hagfish sequences group before the gnathostome-specific split between Wnt5a and Wnt5b. For the present study, we took into account this duplication event after the hagfish-gnathostome divergence.

Two other gene families (LMP2 and LMP7) that also contain only two vertebrate paralogues showed a topology consistent with the opposite scenario, that is, a single gene duplication that occurred before the divergence of lamprey and gnathostomes (fig. 2B). For the LMP7 family, a lamprey and a hagfish gene exist in the database and cluster together with the gnathostome LMP7 genes after the divergence between LMP7 and LMPX. The lack of a second gene that clusters with LMPX in lampreys and hagfish may be due either to the low number of lamprey and hagfish genes that have been identified to date or to a secondary loss of LMPX in agnathans. Topologies consistent with this scenario have also been found by other authors (Nonaka et al. 1997) for the LMP2 family. In hagfish, three gene families of the PTP superfamily (PTPR4, PTPR5, and PTPN6) also illustrate that a gene duplication arose before the jawless-gnathostome split, as shown in figure 2B (Ono-Koyanagi et al. 2000).

Gene Families with Three Vertebrate Paralogues
Vertebrate gene families in which three paralogues are known are the most frequent in our data set. Our phylogenetic analysis of lamprey homologues of these genes reveals three possible scenarios.

Five families (COMP, FIBRI, LDH, MASP, and NF; table 1) exhibit a topology consistent with the occurrence of two gene duplications before the divergence between lampreys and gnathostomes (fig. 2C). Our phylogenetic results for the complement factors C3, C4, and C5 (COMP), with a lamprey C3-like sequence, confirm the previously reported data (Kuraku et al. 1999). The lamprey C3-like protein clusters with the vertebrate C3 complement factor after the two gene duplications that lead to C5, C4, and then C3. This scenario implies that homologues of the C5 and C4 genes should exist in lamprey or have been secondarily lost. The lamprey neurofilaments (NFs) are unique in being homopolymers of a single subunit (NF-180), in contrast with the mammalian neurofilaments, which contain three subunits coded by three different genes, NF-M, NF-H, and NF-L. The phylogenetic tree shows that the lamprey NF-180 clusters with the NF-M subunit of jawed vertebrates after the split (NF-H)(NF-L, NF-M). Again, this suggests that two gene duplications occurred before the lamprey-gnathostome split. Similar topologies and conclusions were found by others (Tsuji et al. 1994; Endo et al. 1998; Suga et al. 1999) and us (data not shown) for the other three families.

As in the case of vertebrate gene families with two paralogues, the opposite scenario (two duplications taking place after the lamprey-gnathostome split) (fig. 2D) was also found in three of the gene families analyzed (ENOL, HEMO, SPI). The published phylogenetic analyses of the transcription factors Spi-1, Spi-B, and Spi-C (Shintani et al. 2000), of enolases α, 2, and 3 (Kuraku et al. 1999), and of hemoglobins A and B and myoglobin (Lanfranchi et al. 1994) confirm our findings.

Finally, three gene families (ALDO, Insulin-IGF, PPAR) follow a third scenario in which one gene duplication occurred between amphioxus and lamprey and the other after the lamprey-gnathostome split, as schematized in figure 2E. The PPAR family contains three paralogous genes in vertebrates, PPARα (NR1C1), PPARβ (NR1C2), and PPARγ (NR1C3). In a phylogenetic tree, PPARγ is the most divergent, whereas PPARα and PPARβ appear to be more closely related (Laudet 1997). In lampreys, we isolated a unique sequence that clusters before the duplication that gave rise to PPARα and PPARβ (fig. 3B). This suggests that, as for TRα and TRβ, the PPARα-PPARβ duplication is specific to gnathostomes. We could not find a lamprey PPARγ homologue, although the phylogeny robustly indicates that one should exist. We explain this absence either by the known restricted expression pattern of PPARγ, which may render its isolation difficult, or by a secondary loss of PPARγ in lampreys. The aldolase enzyme gene family, including Ald-A, Ald-B, and Ald-C (Kuraku et al. 1999), also follows this gene duplication pattern. The Insulin-IGF family, which contains three vertebrate paralogues (insulin, IGF-1, and IGF-2), has two hagfish homologues (insulin and IGF). Its analysis gave rise to a pattern consistent with one early and one recent duplication like the one depicted in figure 2E (Patton, Luke, and Holland 1998).

Gene Families with Four Paralogous Members
According to the 2R hypothesis, the families for which four vertebrate paralogues are known represent examples in which gene loss has not occurred (except for gene families with clear tandem duplications, like the glucagon gene family, which were excluded from this study). We analyzed two gene families (Neurotrophins [NT] and FGFR) in which lamprey homologues and four gnathostome paralogues are known. The phylogeny of the NT family suggests that two gene duplications occurred before the lamprey-gnathostome split and one occurred after the split (fig. 2F). NTs include nerve growth factor (NGF), brain-derived neurotrophic factor (BDNF), NT3, and NT4/5 in vertebrates. Their phylogenetic relationships with a L. fluviatilis NT-like gene have been studied by Hallböök, Lundin, and Kullander (1998) but with no out-group. We have thus realigned and reanalyzed the NT phylogenetic relationships using the virus sequence AF198100 as an out-group. This analysis yielded a tree supported by high bootstrap values in all branches. This tree shows that NT3 and NGF diverged before the lamprey-gnathostome split, whereas the duplication leading to NT4/5 and BDNF occurred after it. Thus, the lamprey sequence is a nonduplicated homologue of both NT4/5 and BDNF. According to this analysis, homologues of NT3 and NGF should exist in lampreys.

For the FGFR gene family that contains four distinct members (FGFR1–FGFR4) (Suga et al. 1999), two gene duplications occurred after the lamprey-gnathostome split, producing the topology shown in figures 2G and 3C.

Incomplete Data
Data from some of the gene families studied were used only partially for the conclusions of the present work because of either low bootstrap values in some, but not all, of the relevant branches or the possible presence of other lamprey genes that have not yet been isolated (fig. 2H and I). Two of these "incomplete" families contain three paralogues (TRK and PTPR2A) and three of them contain four paralogues (PDGFR, and the FGR and HCK subfamilies from the SRC gene family).

The TRK family, which encodes tyrosine kinase receptors, contains two lamprey homologues that cluster together (Hallböök, Lundin, and Kullander 1998). Because of low bootstrap values, the lamprey genes and the TRKB and TRKC paralogues were collapsed into a polytomy that splits after the TRKA gene divergence (fig. 2H). In this case, we took into account only the first duplication leading to (TRKA)(TRKB, TRKC, Lamp), before the lamprey-gnathostome split. The PTPR2A family, which contains three vertebrate paralogues (PTPσ, PTPδ, LAR) and two hagfish sequences, could only be partially analyzed because of the incomplete data within the PTPσ-PTPδ branch. However, a duplication before the agnathan-gnathostome split can be assumed for the final conclusions of the present study, as in figure 2H.

The PDGFR family (composed of the Flt3, CSF-1R, PDGFRα, and PDGFRβ genes) contains two known hagfish and lamprey genes. The two lamprey sequences, one hagfish sequence, and the corresponding Flt3 and CSF-1R paralogues from vertebrates cluster together with low bootstrap values. The hagfish sequence (AB025554) diverges before the duplication between PDGFRα and PDGFRβ. From these results, we can only suggest one gene duplication after the agnathan-gnathostome divergence (Suga et al. 1999) (fig. 2I).

Uncertainty about the presence, number, and timing of gene duplications also arose from our analysis of the three hagfish and two lamprey members of the SRC (cytosolic tyrosine kinase) gene family. The SRC superfamily follows a complex evolutionary pattern of several gene duplications arising before the lamprey-gnathostome split that produced several subfamilies (Suga et al. 1999), such as the HCK (composed of hck-lyn-lck-blk genes) and FGR (composed of fgr-fyn-src-yes genes) subfamilies. The two subfamilies show low bootstrap values in the recent branches, which we collapsed into polytomies, but a high bootstrap value supports a gene duplication before the lamprey-gnathostome split. In this case, we took into account this gene duplication for the final conclusions of this work. Another three gene families were of no value for the conclusions of the present work because of the uncertainty of the existence of other lamprey genes (SYN) and low bootstrap values (CXC, RXR).


    Discussion
 
A Complex History of Gene Duplication During Early Vertebrate Evolution
In the present study, we have analyzed for the first time the phylogenetic histories of 33 gene families for which lamprey or hagfish (or both) genes exist and for which at least two paralogous genes exist in jawed vertebrates. Families for which a single gene is known or for which duplications obviously arose late, such as the oxytocin-vasotocin family where a gene duplication occurred specifically in mammals, have been omitted. For three of the 33 families analyzed, we isolated new lamprey nuclear receptor sequences, and the other 30 were obtained from databases. Three of the 33 families were of no value because of unresolved phylogenetic trees. Twenty-five of the 33 gene families show phylogenies supported by bootstrap values higher than 50% in all relevant branches and were analyzed further. Five other gene families were only used partially because of low bootstrap values in some of the relevant branches. The phylogeny of all the 30 protein families used for the final conclusions could always be reduced to one of the schematic patterns in figure 2.

In several cases in our analysis, for example in PPAR, the topology of the tree suggested the presence of a lamprey gene that is still unknown. This absence of lamprey sequences may be attributed either to the incomplete data that are presently available for lampreys or to secondary gene loss that occurred specifically in this lineage. The amount of gene loss arising in such a lineage is still unknown (see below). Through the increasing accumulation of sequence data in strategically important organisms such as lampreys, we will hopefully soon be able to discriminate between these two possibilities.

From the phylogenetic patterns obtained, we have observed that 18 of the 30 gene families (60%) duplicated at least once before the lamprey-gnathostome split (fig. 2, topologies B, C, E, F, and H). Six of these 18 (20% of total) duplicated a second time in this period (fig. 2, topologies C and F). Sixteen of the 30 gene families (53%) duplicated at least once after the lamprey-gnathostome split (fig. 2, topologies A, D, E, F, G, and I), and four of them (13% of the 30) duplicated a second time, specifically in jawed vertebrates (fig. 2, topologies D and G). These results suggest that the duplications took place during the chordate-craniate and the vertebrate-gnathostome transitions, two periods during which important morphological innovations occurred. Taken together, these data suggest that at least two series of gene duplications occurred during early vertebrate evolution: one before and one after the lamprey-gnathostome split. Interestingly, our data also clearly indicate that other gene duplications occurred during the chordate-vertebrate transition. Thus, 20% of the analyzed gene families duplicated twice before the agnathan-gnathostome split and 13% of them duplicated twice afterward. The proportion of gene duplication events that fall into these various groups changes slightly depending on the bootstrap value supporting the relevant branches. However, our conclusions are unchanged if only the strongly supported branches (value above 85%) are considered (fig. 4).
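The counts and percentages quoted above can be re-derived from the topology assignments given in the Results. The sketch below is a bookkeeping aid, not part of the original analysis: the dictionary layout is an assumption, the numbers of duplications per topology are read from the text describing figure 2, and the placement of the FGR and HCK subfamilies under topology H is inferred from the Incomplete Data section.

```python
# topology -> (duplications before the split, duplications after the split, families)
# For H and I only the resolved duplication is counted, as in the text.
TOPOLOGIES = {
    "A": (0, 1, ["EN", "MHCIII", "OTX", "PAB", "PTPN3", "TR", "Wnt5"]),
    "B": (1, 0, ["LMP2", "LMP7", "PTPR4", "PTPR5", "PTPN6"]),
    "C": (2, 0, ["COMP", "FIBRI", "LDH", "MASP", "NF"]),
    "D": (0, 2, ["ENOL", "HEMO", "SPI"]),
    "E": (1, 1, ["ALDO", "Insulin-IGF", "PPAR"]),
    "F": (2, 1, ["NT"]),            # two duplications before the split, one after
    "G": (0, 2, ["FGFR"]),
    "H": (1, 0, ["TRK", "PTPR2A", "FGR", "HCK"]),   # FGR/HCK placement inferred
    "I": (0, 1, ["PDGFR"]),
}


def count_families(min_before=0, min_after=0):
    """Number of families with at least the given duplications before/after the split."""
    return sum(
        len(families)
        for before, after, families in TOPOLOGIES.values()
        if before >= min_before and after >= min_after
    )


total = count_families()
summary = {
    "before, at least once": count_families(min_before=1),
    "before, twice": count_families(min_before=2),
    "after, at least once": count_families(min_after=1),
    "after, twice": count_families(min_after=2),
}
for label, n in summary.items():
    print(f"{label}: {n}/{total} ({round(100 * n / total)}%)")
```

A family with topology E or F is counted in both the "before" and the "after" tallies, which is why the two "at least once" percentages sum to more than 100%.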



 
Fig. 4.—Schematic representation of the chordate phylogeny. Arrows mark the periods where gene duplications have been detected in the present study. The percentages of genes duplicated have been calculated from the number of gene families that duplicated once or twice before or after the lamprey-gnathostome split compared with the 30 relevant phylogenies analyzed in the present study. The three unresolved gene families are also indicated (and the percentage corresponds to the total 33 analyzed families). To take into account the bootstrap value supporting the relevant branches, the number of each gene duplication found within a given bootstrap interval is indicated

 
The 2R Hypothesis Revisited
The fact that we observed one gene duplication arising before the lamprey-gnathostome split or one after in about 50% of all cases favors the scenario of two tetraploidization events taking place early in vertebrate evolution, as suggested by the 2R hypothesis. Nevertheless, Hughes (1999) proposed that if two rounds of complete genome duplication occurred as proposed in the classical 2R hypothesis, the phylogenetic trees should produce topologies like (AB)(CD) instead of (A)(BCD). He showed that of nine gene families only one followed an (AB)(CD) pattern and concluded that this was evidence against the 2R hypothesis. But this analysis lacked lamprey data, and the (AB)(CD) prediction is not inherent to the 2R model. For instance, 2x allotetraploidy could yield (A)(BCD). Furlong and Holland (2002) argue that 2x autotetraploidy can also yield (A)(BCD) for loci that pass through a transient octoploid state. In our analysis of 33 protein families, only six were composed of four paralogous genes, 15 had three paralogues, and 12 had two paralogues. Interestingly, our failure to find any gene family with five or more paralogues favors the 2R hypothesis, because such families would be expected if totally independent gene duplications had taken place. In our data set, the six families composed of four paralogous genes are the NTs (BDNF, NGF, NT3, and NT4), the FGF receptor subfamily (FGFR1–FGFR4), the CXC chemokines subfamily (IL-8–like A, IL-8–like B, IL-8, and IFN-γ-induced chemokines), the PDGFR family, and the two SRC subfamilies. If the (AB)(CD) theory is correct and the lamprey ancestor emerged before the second round of genome duplication, then according to the classical view its sequences should emerge between the (AB) and (CD) clusters. Our analyses of these six families and their lamprey homologues did not yield any family that supports the (AB)(CD) topology, as defined by Hughes (1999).

Some of the families composed of two or three paralogues fit the (AB)(CD) pattern as subsets of the expected tree (Lamp1(AB))(Lamp2(CD)), taking into account a possible high level of secondary gene loss (see subsequently). These families are the ones that are schematized in figure 2A, B, and E and include 15 of the 23 families composed of two or three paralogues, with bootstrap values higher than 50% in all the relevant branches. In contrast, families with the topology presented in figure 2C, D, and F are inconsistent with the (AB)(CD) pattern. These families represent eight of the 23 relevant families.
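
The distinction between the two topologies discussed above can be made concrete with a minimal sketch. It assumes a rooted, bifurcating tree of four paralogues written as nested tuples; the function names and the generic labels A to D are illustrative only and are not taken from the analysis pipeline used in this study.

```python
def leaves(tree):
    """Return the set of leaf labels found under a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    found = set()
    for child in tree:
        found |= leaves(child)
    return found


def classify_four_paralogue_tree(tree):
    """Classify a rooted four-leaf tree by the sizes of its two root clades.

    A 2 + 2 split is the symmetric (AB)(CD) pattern; a 1 + 3 split is the
    asymmetric (A)(BCD) pattern.
    """
    left, right = tree  # the root is assumed to be bifurcating
    sizes = sorted((len(leaves(left)), len(leaves(right))))
    if sizes == [2, 2]:
        return "(AB)(CD)"
    if sizes == [1, 3]:
        return "(A)(BCD)"
    raise ValueError("expected a rooted tree with exactly four leaves")


# Hypothetical example trees with generic paralogue labels.
symmetric = (("A", "B"), ("C", "D"))
asymmetric = ("A", ("B", ("C", "D")))
print(classify_four_paralogue_tree(symmetric))
print(classify_four_paralogue_tree(asymmetric))
```

The sketch only inspects the root split; it ignores branch support, so a real comparison would also need the bootstrap values of the relevant branches.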

Given the evolutionary patterns observed in the gene families that we studied and in those studied by Hughes (1999), and given that the (AB)(CD) prediction is not inherent to the 2R model (as explained earlier), the evolutionary pattern of a gene family cannot by itself validate or exclude the 2R hypothesis. Rather, the 2R hypothesis of tetraploidization implies that the various paralogues of vertebrate-linked genes should have the same duplication history (Kasahara et al. 1996). Indeed, large regions of synteny have been detected between divergent vertebrates such as zebrafish and humans, suggesting that the gene order in the vertebrate genome is much more conserved than was anticipated previously (Postlethwait et al. 1998). Thus, the best way to study the evolution of the vertebrate genome would be to make a comparative study of the syntenic regions between key organisms within the vertebrate lineage, like amphioxus, lampreys, and other vertebrates.

Importantly, we found that in addition to the main duplication events that arose before and after the lamprey-gnathostome divergence, a significant number of supplementary gene duplications also occurred (as mentioned previously). A simple 2R hypothesis cannot account for these events. This result suggests that the genomic evolution of early vertebrates was more complex than anticipated. Given the complex paleontological history that is known for early vertebrates (Janvier 1996, pp. 327–330), it is not surprising to observe that a complicated history also occurs at the level of the genome.

The 2R hypothesis has two major predictions: (1) the phylogeny of different gene families should give rise to similar topologies consistent with a limited number of periods in which gene duplication occurred; and (2) the gene duplications should occur simultaneously. Our work only allowed us to study the first of these predictions, and we provide clear evidence in favor of two major periods of gene duplications arising before and after the lamprey-gnathostome split. With the available data and given the difficulty of using rigorous molecular clock calibrations to estimate ancient divergence times, we cannot test the second prediction. It should be tested once gene mapping data on the amphioxus and lamprey genomes are available, to determine whether syntenic linkage between genes is conserved.

Taken together, our data clearly show similar numbers of gene duplication events before and after the lamprey-gnathostome split. These results can be explained by the existence of two series of gene duplication, in accordance with the 2R theory, or by the null hypothesis in which single-gene duplications have occurred at a regular frequency throughout chordate evolution. Looking at the fossil record, the first chordate fossils have been found in the Cambrian (~530 Ma), the first jawless vertebrates in the Ordovician (~470 Ma), and the first jawed vertebrates in the Silurian (~435 Ma; Janvier 1996). If the null hypothesis were correct, we should expect a higher number of duplications before the jawless vertebrates split than after because of a longer period of time. From our results, equivalent gene duplications have been detected in both periods, which favors the 2R hypothesis. However, the existence of extra gene duplications that occurred both before and after the lamprey-gnathostome split and the complex phylogenetic topologies of the four-paralogue gene families suggest that the 2R scenario should be revised to fully explain the complex evolution of the vertebrate genomes.
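
The expectation under the null hypothesis can be illustrated with a short calculation. This is a rough sketch only: it assumes that the fossil first-appearance dates quoted above can stand in for the boundaries of the two periods and that single-gene duplications accumulate at one constant rate. Both are simplifications introduced here for illustration, and the function name is hypothetical.

```python
# First fossil appearances quoted in the text (Ma = million years ago).
FIRST_CHORDATES_MA = 530
FIRST_JAWLESS_VERTEBRATES_MA = 470
FIRST_JAWED_VERTEBRATES_MA = 435


def expected_split(start_ma, split_ma, end_ma):
    """Return the expected fractions of duplications before and after a split,
    assuming one constant duplication rate over the whole window."""
    before = start_ma - split_ma  # length of the earlier period
    after = split_ma - end_ma     # length of the later period
    total = before + after
    return before / total, after / total


frac_before, frac_after = expected_split(
    FIRST_CHORDATES_MA,
    FIRST_JAWLESS_VERTEBRATES_MA,
    FIRST_JAWED_VERTEBRATES_MA,
)
print(f"expected before the split: {frac_before:.2f}")
print(f"expected after the split:  {frac_after:.2f}")
```

With these dates the earlier period spans 60 Myr and the later one 35 Myr, so a constant rate would place roughly 63% of duplications before the split, rather than the similar numbers observed in both periods.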

The Secondary Loss Rate of Duplicated Genes is Very High
We can propose three major alternative scenarios at this stage (fig. 5): (1) a total of four tetraploidization events took place early in vertebrate evolution followed by massive gene loss; (2) two tetraploidization events as well as independent duplications of genes or large genomic regions occurred; and (3) tetraploidization did not take place, and the pattern can be explained by the independent duplication of genes or large genomic regions. This last scenario implies the smallest amount of gene loss.



 
Fig. 5.—Schematic representation of the proposed models for gene-genome duplications during the chordate-vertebrate transition. In this figure, black arrows indicate a tetraploidization event, gray square arrows indicate a gene loss event, and dotted arrows represent an independent gene duplication. (i) Four tetraploidization events took place during vertebrate evolution, two before and two after the lamprey-gnathostome split. This model implies that many of the newly duplicated genes were lost after each tetraploidization event. (ii) Two tetraploidization events took place during vertebrate evolution, one before and one after the lamprey-gnathostome split. Extra partial gene duplications and gene loss events occurred during these periods. (iii) Multiple independent gene duplications and gene loss events took place before and after the lamprey-gnathostome split

 
It is difficult to estimate the secondary loss rate of duplicated genes. Until now, the strongest evidence for genome duplications comes from yeast. In yeast, duplicated genes have a conserved gene order and orientation, but they are outnumbered by unique (nonduplicated) intervening genes (Seoighe and Wolfe 1998). The unique genes must originally have been duplicated along with the rest of the genome, but one copy was subsequently lost. If this is true, only ~8% of the original gene set was retained as duplicated and the other 92% returned to a single-copy state. A well-known duplicated region in the vertebrate genome is the cluster of Hox genes. This cluster has four copies or more in gnathostomes, in contrast with only one copy in the cephalochordate, amphioxus (Garcia-Fernandez and Holland 1994). Closer inspection of the four Hox gene clusters shows that in most Hox gene paralogy groups, only three members are truly maintained in the vertebrate genome. Indeed, only two groups contain all four genes (15% of the 13 loci), eight groups have three, and three groups have only two in human and mouse. Conversely, the amphioxus cluster has no missing genes and has an additional 14th gene that has either been lost in other chordates or appeared in amphioxus through a specific duplication (Ferrier et al. 2000). Collectively, these data suggest that the amount of gene loss is high (92% in the yeast genome and 25% for the four human Hox loci). In the context of the 2R hypothesis, from our data, gene loss occurred in 40% and 47% of the cases, respectively. These gene loss estimations are confirmed by the number of genes found in paralogous groups in gnathostomes. In our data set 27 of the 33 (81%) gene families were composed of two or three members, not four as we could expect without gene loss (for the Hox clusters, 85% of the loci are composed of two or three paralogues in gnathostomes). If two tetraploidization events did in fact occur, this implies that at least one gene loss occurred in 81% of the gene families.
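
The proportions of incomplete paralogy groups quoted above follow directly from the family counts. The sketch below only recomputes those fractions from numbers given in the text; the variable and function names are illustrative, and the four-copy expectation is the one implied by two rounds of tetraploidization without loss.

```python
# Counts quoted in the text: {number of paralogues: number of families}.
families_by_paralogue_count = {4: 6, 3: 15, 2: 12}  # 33 gene families studied
hox_groups_by_copy_count = {4: 2, 3: 8, 2: 3}       # 13 Hox paralogy groups


def fraction_with_loss(counts, expected_copies=4):
    """Return the fraction of groups holding fewer copies than expected,
    i.e. groups in which at least one gene loss must be inferred."""
    total = sum(counts.values())
    incomplete = sum(n for copies, n in counts.items() if copies < expected_copies)
    return incomplete / total


family_loss = fraction_with_loss(families_by_paralogue_count)
hox_loss = fraction_with_loss(hox_groups_by_copy_count)
print(f"gene families with at least one loss: {family_loss:.3f}")
print(f"Hox groups with at least one loss:    {hox_loss:.3f}")
```

The two values are 27/33 (about 0.818) and 11/13 (about 0.846), in line with the 81% and 85% figures quoted above.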

The fact that we found extra duplications in 20% and 13% of the analyzed genes before and after the agnathan-gnathostome split, respectively, seems more difficult to reconcile with the tetraploidization events. This result would imply an 80% and 87% gene loss, respectively, and it seems at least as parsimonious to hypothesize an unrelated duplication of the individual genes or the genomic regions.

How Might Genome Duplication Occur?
How could polyploidization occur in ancestral chordates? It has been proposed that immediately after speciation, hybridization leading to allopolyploidy is not much different from autopolyploidization and probably has several advantages. Polyploidy has even been detected in a mammal, Tympanoctomys barrerae, recently discovered to be tetraploid (Gallardo et al. 1999). Hybridization followed by tetraploidization in highly adapted species probably had few advantages and thus became rare in animals. Nevertheless, there could have been a narrow hybridization window during evolution, when allopolyploidy allowed evolutionary jumps through the combination of advantageous traits that had evolved previously in separate lineages (Spring 1997). In this sense, the faster evolving genes would already be quite different at the time of hybridization and thus could serve only as a partially redundant pool for further divergent evolution of the gene families. If this hypothesis is true, we should expect that genes with different evolutionary rates in each of the species that hybridized would give rise to phylogenies biased by long-branch attraction artifacts and, as a result, the observed pattern would be (A)(BCD) instead of (AB)(CD). On the other hand, highly conserved genes are more likely to be reduced to a single copy than rapidly diverging genes. Mutations within the regulatory regions could accumulate faster than those in the coding regions and lead to at least a partial tissue or developmental stage specificity of expression of functionally redundant genes. This interpretation could explain why so many knockout mice have much milder phenotypes than expected from the expression patterns of the individually investigated genes.

The final question concerns the role of these gene duplications in the phenotypic evolution of vertebrates. Did the origin of new genes create new opportunities for the evolution of new structures? Did the increase in gene number improve the adaptation of the ancestral chordates to new environments? Only comparative studies and later inference of gene expression and protein function between species, such as amphioxus, lampreys, and vertebrates, will improve our knowledge in this field.


    Acknowledgements
 
We thank Marc Robinson-Rechavi for critical reading of the manuscript, Stephanie Bertrand for preliminary analysis, and two anonymous referees for their helpful suggestions. This work was supported by the CNRS, MENRT, ARC, FRM, and LNCC. H.E. held EMBO and Region Rhône Alpes postdoctoral fellowships. The study was also supported by a grant from the Natural Sciences and Engineering Research Council of Canada to J.Y. and an Ontario Graduate Scholarship to L.M.


    Footnotes
 
Manolo Gouy, Reviewing Editor

Keywords: duplication, lamprey, chordate, vertebrate, nuclear hormone receptor, tetraploidy

Address for correspondence and reprints: Vincent Laudet, CNRS UMR 5665, Laboratoire de Biologie Moléculaire et Cellulaire, Ecole Normale Supérieure de Lyon, 46 Allée d'Italie 69364, Lyon Cedex 07, France. E-mail: vincent.laudet@ens-lyon.fr


    References
 

    Atkin N. B., S. Ohno, 1967 DNA values of four primitive chordates Chromosoma 23:10-13

    Delarbre C., H. Escriva, C. Gallut, V. Barriel, P. Kourilsky, P. Janvier, V. Laudet, G. Gachelin, 2000 The complete nucleotide sequence of the mitochondrial DNA of the Agnathan Lampetra fluviatilis: bearings on the phylogeny of cyclostomes Mol. Biol. Evol 17:519-529

    Endo Y., M. Takahashi, M. Nakao, H. Saiga, H. Sekine, M. Matsushita, M. Nonaka, T. Fujita, 1998 Two lineages of mannose-binding lectin-associated serine protease (MASP) in vertebrates J. Immunol 161:4924-4930

    Escriva H., M. Robinson, V. Laudet, 1999 Evolutionary biology of the nuclear receptor superfamily Oxford University Press, Oxford, UK

    Escriva H., R. Safi, C. Hanni, M. C. Langlois, P. Saumitou-Laprade, D. Stehelin, A. Capron, R. Pierce, V. Laudet, 1997 Ligand binding was acquired during evolution of nuclear receptors Proc. Natl. Acad. Sci. USA 94:6803-6808

    Felsenstein J., 1985 Confidence limits on phylogenies: an approach using the bootstrap Evolution 39:783-791

    Ferrier D. E. K., C. Minguillon, P. W. H. Holland, J. Garcia-Fernandez, 2000 The amphioxus Hox cluster: deuterostome posterior flexibility and Hox 14 Evol. Dev 2:284-293

    Friedman R., A. L. Hughes, 2001 Gene duplication and the structure of eukaryotic genomes Genome Res 11:373-381

    Furlong B., P. W. Holland, 2002 Were vertebrates octoploid? Philos. Trans. R. Soc. Lond. B Biol. Sci 357:531-544

    Gallardo M. H., J. W. Bickham, R. L. Honeycutt, R. A. Ojeda, N. Kohler, 1999 Discovery of tetraploidy in a mammal Nature 401:341.

    Galtier N., M. Gouy, C. Gautier, 1996 SEAVIEW and PHYLO_WIN: two graphic tools for sequence alignment and molecular phylogeny Comput. Appl. Biosci 12:543-548

    Garcia-Fernandez J., P. W. H. Holland, 1994 Archetypal organization of the amphioxus Hox gene cluster Nature 370:563-566

    Hallböök F., L. G. Lundin, K. Kullander, 1998 Lampetra fluviatilis neurotrophin homolog, descendant of a neurotrophin ancestor, discloses the early molecular evolution of neurotrophins in the vertebrate subphylum J. Neurosci 18:8700-8711

    Holland P. W. H., J. Garcia-Fernandez, N. A. Williams, A. Sidow, 1994 Gene duplication and the origins of vertebrate development Dev. Suppl.:125–133

    Hughes A. L., 1999 Phylogenies of developmentally important proteins do not support the hypothesis of two rounds of genome duplication early in vertebrate history J. Mol. Evol 48:565-576

    IHGSC. 2001 Initial sequencing and analysis of the human genome Nature 409:860.

    Janvier P., 1996 Early vertebrates Oxford University Press, New York

    Kasahara M., M. Hayashi, K. Tanaka, H. Inoku, K. Sugaya, T. Ikemura, T. Ishibashi, 1996 Chromosomal localization of the proteasome Z subunit gene reveals an ancient chromosomal duplication involving the major histocompatibility complex Proc. Natl. Acad. Sci. USA 93:9096-9101

    Kuraku S., D. Hoshiyama, K. Katoh, H. Suga, T. Miyata, 1999 Monophyly of lampreys and hagfishes supported by nuclear DNA-coded genes J. Mol. Evol 49:729-735

    Lanfranchi G., A. Pallavicini, P. Laverder, G. Valle, 1994 Ancestral hemoglobin switching in lampreys Dev. Biol 164:402-408

    Laudet V., 1997 Evolution of the nuclear receptor superfamily: early diversification from an ancestral orphan receptor J. Mol. Endocrinol 19:207-226

    Marchand O., R. Safi, H. Escriva, E. Van Rompaey, P. Prunet, V. Laudet, 2000 Molecular cloning and characterization of thyroid hormone receptors in teleost fishes J. Mol. Endocrinol 26:51-65

    Martin A. P., 1999 Increasing genomic complexity by gene duplication and the origin of vertebrates Am. Nat 154:111-128

    ———. 2001 Is tetralogy true? Lack of support for the "one-to-four rule." Mol. Biol. Evol 18:89-93

    Nonaka M., C. Namikawa-Yamada, M. Sasaki, L. Salter-Cid, M. F. Flajnik, 1997 Evolution of proteasome subunits δ and LMP2 J. Immunol 159:734-740

    Nuclear Receptors Nomenclature Committee. 1999 A unified nomenclature system for the nuclear receptor superfamily Cell 97:161-163

    Ohno S., 1970 Evolution by gene duplication Springer-Verlag, Heidelberg

    Ono-Koyanagi K., H. Suga, K. Katoh, T. Miyata, 2000 Protein tyrosine phosphatases from amphioxus, hagfish, and ray: divergence of tissue-specific isoform genes in the early evolution of vertebrates J. Mol. Evol 50:302-311

    Patton S. J., G. N. Luke, P. W. Holland, 1998 Complex history of a chromosomal paralogy region: insights from amphioxus aromatic amino acid hydroxylase genes and insulin-related genes Mol. Biol. Evol 15:1373-1380

    Postlethwait J. H., Y. L. Yan, M. A. Gates, et al. (29 co-authors) 1998 Vertebrate genome evolution and the zebrafish gene map Nat. Genet 18:345-349

    Saitou N., M. Nei, 1987 The neighbor-joining method: a new method for reconstructing phylogenetic trees Mol. Biol. Evol 4:406-425

    Schmidtke J., C. Weiler, B. Kunz, W. Engel, 1977 Isozymes of a tunicate and a cephalochordate as a test of polyploidisation in chordate evolution Nature 266:532-533

    Schubert M., L. Z. Holland, N. Holland, D. K. Jacobs, 2000 A phylogenetic tree of the Wnt genes based on all available full-length sequences, including five from the cephalochordate amphioxus Mol. Biol. Evol 17:1896-1903

    Seoighe C., K. H. Wolfe, 1998 Extent of genomic rearrangement after genome duplication in yeast Proc. Natl. Acad. Sci. USA 95:4447-4452

    Sharman A. C., P. W. H. Holland, 1996 Conservation, duplication, and divergence of developmental genes during chordate evolution Neth. J. Zool 46:47-67

    ———. 1998 Estimation of Hox gene cluster number in lampreys Int. J. Dev. Biol 42:617-620

    Shintani S., J. Terzic, A. Sato, M. Saraga-Babic, C. O'hUigin, H. Tichy, J. Klein, 2000 Do lampreys have lymphocytes? The Spi evidence Proc. Natl. Acad. Sci. USA 97:7417-7422

    Sidow A., 1992 Diversification of the Wnt gene family on the ancestral lineage of vertebrates Proc. Natl. Acad. Sci. USA 89:5098-5102

    ———. 1996 Gen(om)e duplications in the evolution of early vertebrates Curr. Opin. Genet. Dev 6:715-722

    Skrabanek L., K. H. Wolfe, 1998 Eukaryote genome duplication—where's the evidence? Curr. Opin. Genet. Dev 8:694-700

    Spring J., 1997 Vertebrate evolution by interspecific hybridisation—are we polyploid? FEBS Lett 400:2-8

    Suga H., D. Hoshiyama, S. Kuraku, K. Katoh, K. Kubokawa, T. Miyata, 1999 Protein tyrosine kinase cDNAs from amphioxus, hagfish, and lamprey: isoform duplications around the divergence of cyclostomes and gnathostomes J. Mol. Evol 49:601-608

    Thomson J. D., D. G. Higgins, T. J. Gibson, 1994 Clustal W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice Nucleic Acids Res 22:4673-4680

    Tomsa J. M., J. A. Langeland, 1999 Otx expression during lamprey embryogenesis provides insights into the evolution of the vertebrate head and jaw Dev. Biol 207:26-37

    Tsuji S., M. A. Qureshi, E. W. Hou, W. M. Fitch, S. S. L. Li, 1994 Evolutionary relationships of lactate dehydrogenases (LDHs) from mammals, birds, an amphibian, fish, barley, and bacteria: LDH cDNA sequences from Xenopus, pig and rat Proc. Natl. Acad. Sci. USA 91:9392-9396

Accepted for publication April 15, 2002.