An Amphioxus Emx Homeobox Gene Reveals Duplication During Vertebrate Evolution

Nic A. Williams and Peter W. H. Holland

School of Animal and Microbial Sciences, University of Reading, Reading, England


    Abstract
Members of the Emx homeobox gene class are expressed during embryogenesis in the brain and/or other head structures of animals from phylogenetically diverse phyla. Here, we describe the sequence, genomic structure, and molecular phylogenetic analysis of a cephalochordate (amphioxus) Emx class gene termed AmphiEmxA. The genomic structure of AmphiEmxA is very similar to that of vertebrate Emx genes, with two conserved intron sites. The Drosophila homolog empty spiracles (ems) has just one intron, which may be shared with chordates; the other has been secondarily lost in this Drosophila gene and in a cnidarian Emx-related gene. We identify a highly conserved peptide motif close to the amino terminus of Emx proteins, demonstrate its similarity to a sequence found in a variety of transcription factors, and argue that it arose through convergent evolution in homeobox and forkhead genes. Finally, our molecular phylogenetic analysis strongly supports the presence of a single Emx gene in the ancestor of chordates and gene duplication along the vertebrate lineage.


    Introduction
The identification of evolutionarily homologous regions or structures by comparison of expression patterns of homologous genes is now common practice in molecular biology. In order to infer which gene expression sites are homologous and which may be derived in specific lineages, it is helpful to construct molecular phylogenies and identify the timing of gene duplication events during the evolution of particular taxa. We have embarked on a program to identify gene duplication events and acquisition of novel developmental roles during chordate evolution. A useful animal for this approach is the cephalochordate Branchiostoma floridae (amphioxus). Being a chordate, it has a body plan comparable to that of vertebrates, yet for many genes studied, it has a typically invertebrate gene complement (Garcia-Fernàndez and Holland 1994; Holland 1996, 1999; Williams and Holland 1998). It is therefore a useful animal for molecular comparison with diverse invertebrate taxa, including Drosophila, since it possesses genes directly orthologous to invertebrate genes. On the other hand, its genes may be compared with those of vertebrates to deduce the timing of gene duplications and shed light on functional recruitment of genes within a phylum.

A gene family of particular interest is the Emx homeobox gene class. The first members of this gene family to be cloned were two Drosophila genes, empty spiracles (ems) and E5 (Dalton, Chadwick, and McGinnis 1989; Walldorf and Gehring 1992). Drosophila ems is the better known of these genes and functions as a gap gene in the head in a way analogous to that of orthodenticle (otd). The ems gene is expressed in an anterior stripe during the syncytial blastoderm stage; loss-of-function mutations cause deletion of the anterior cephalic segments, as well as deletion of parts of the deuterocerebrum and tritocerebrum brain neuromeres (Hirth et al. 1995). Drosophila E5 is not expressed during the syncytial or cellular blastoderm stages. From stage 11 onward, E5 is expressed in segmentally reiterated blocks of lateral mesoderm and/or lateral epidermis. In stage 10–12 embryos, this pattern overlaps with the reiterated lateral epidermal expression pattern of the ems gene (W. McGinnis, personal communication).

Two members of the Emx gene family have been isolated from mice (Simeone et al. 1992); both are expressed in cephalic domains that include the presumptive cerebral cortex, the olfactory bulbs, and the olfactory epithelia, as well as in the developing urogenital system. Gene targeting of the mouse Emx1 gene results in a deletion or reduction of the hippocampus (Yoshida et al. 1997); in mice homozygous for mutations at the Emx2 locus, the cortex is reduced in size, the dentate gyrus is deleted, and the animals die postnatally owing to severe urogenital defects (Pellegrini et al. 1996; Yoshida et al. 1997). Similar cephalic expression domains, and putative functions, have been reported for zebrafish and Xenopus Emx genes (Morita et al. 1995; Pannese et al. 1998); in addition, Xenopus Emx genes are expressed in the visceral arches (Pannese et al. 1998).

The expression of Emx genes in anterior cephalic domains of both vertebrates and Drosophila is intriguing and (together with similar data for Otx genes) prompted the suggestion that the process of cephalization may be evolutionarily homologous between arthropods and chordates (Holland, Ingham, and Krauss 1992). Since this suggestion was made, Otx genes have been described from many invertebrate taxa, and the accumulating comparative data have added further support for ancient roles of this gene family in the development of the anterior body region. In contrast, Emx genes have been cloned from very few invertebrates besides Drosophila (Dalton, Chadwick, and McGinnis 1989; Simeone et al. 1992). A putative homolog has been found in the Caenorhabditis elegans genome (the ceh-2 gene on cosmid C27A12), and a cnidarian Emx gene, Cn-ems, has been cloned from the hydrozoan Hydractinia symbiolongicarpus. The latter gene is expressed in endoderm around the mouth-bearing end of the animal, apparently supporting an ancient anterior expression domain for Emx genes (Mokady et al. 1998). This conclusion should be treated with caution, however, since other authors have argued that the mouth-bearing end of Cnidaria may not be homologous to the anterior pole of triploblasts (Martindale and Henry 1998). Indeed, the large differences in body layout between diploblast and triploblast animals may preclude direct comparisons of body regions.

The lack of information on invertebrate Emx genes makes it difficult to draw safe conclusions about conservation of gene expression pattern or function. Without a sound molecular phylogeny of these genes, it is not even clear whether valid comparisons are being made. To begin to address these questions, we cloned an amphioxus member of the Emx homeobox class, termed AmphiEmxA. We describe the full sequence and intron-exon organization of AmphiEmxA and identify a highly conserved peptide domain outside of the homeodomain in Emx protein sequences. We use molecular phylogenetic analyses to examine the course of gene duplication in the Emx gene family during animal evolution.


    Materials and Methods
PCR Amplification
Amphioxi (B. floridae) were collected from Old Tampa Bay, Florida (Holland and Holland 1993), and high-molecular-weight genomic DNA was extracted from a pooled sample of adults using standard methods (Shimeld 1997a). Degenerate oligonucleotide primers were designed to anneal to two regions within the homeobox that are conserved at the amino acid level between vertebrate and Drosophila Emx genes. The primers corresponded to the peptides IRTAFSP (sense primer NWems1: 5'-GCGGATCCGAACNGCNTTYWSNCC-3') and AERKQLA (antisense primer NWems3: 5'-GGCGARYTGYTTCCKYTCNGC-3'). Both sequences lie 5' of the homeobox intron site in vertebrates. The PCR product amplified from genomic DNA was blunt-end-cloned into SmaI-cut pUC18. Of 10 recombinants sequenced, eight contained spurious amplification products, and two contained an identical DNA sequence with high sequence similarity to the homeobox of vertebrate and Drosophila Emx class genes (designated AmphiEmxA). The low frequency of positive recombinants may reflect sequence divergence in AmphiEmxA causing a mismatch at the 3' end of NWems3.
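The primer design can be checked mechanically. The Python sketch below is our illustration, not part of the original analysis: it expands IUPAC ambiguity codes to count how many distinct oligonucleotides each degenerate primer represents, reverse-complements the antisense primer, and tests whether each codon can encode the intended peptide under the standard genetic code. The NWems1 reading frame is an assumption inferred from the sequence: its 5' end (GCGGATCC) contains a GGATCC (BamHI) site, and we treat the first four nucleotides as a non-annealing tail so that the coding frame starts at the fifth.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

# Standard genetic code, codons enumerated in TCAG order ('*' = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def degeneracy(oligo: str) -> int:
    """Number of distinct sequences a degenerate oligo represents."""
    n = 1
    for base in oligo:
        n *= len(IUPAC[base])
    return n


def reverse_complement(oligo: str) -> str:
    """Reverse complement, keeping ambiguity codes (R <-> Y, K <-> M, ...)."""
    return oligo.translate(COMPLEMENT)[::-1]


def codon_options(codon: str) -> set:
    """All amino acids (or '*' for stop) that a degenerate codon can encode."""
    return {CODON_TABLE["".join(p)] for p in product(*(IUPAC[b] for b in codon))}


def can_encode(sense: str, peptide: str, frame: int = 0) -> bool:
    """True if every codon, read from `frame`, can encode the matching residue."""
    codons = [sense[i:i + 3] for i in range(frame, len(sense) - 2, 3)]
    if len(codons) < len(peptide):
        return False
    return all(aa in codon_options(c) for aa, c in zip(peptide, codons))


NWEMS1 = "GCGGATCCGAACNGCNTTYWSNCC"  # sense primer, IRTAFSP
NWEMS3 = "GGCGARYTGYTTCCKYTCNGC"     # antisense primer, AERKQLA

print(degeneracy(NWEMS1), degeneracy(NWEMS3))              # 512 128
print(can_encode(NWEMS1, "IRTAFS", frame=4))               # True (Pro codon is incomplete)
print(can_encode(reverse_complement(NWEMS3), "AERKQLA"))   # True
print(sorted(codon_options("WSN")))                        # ['*', 'C', 'R', 'S', 'T', 'W']
```

The last line shows the cost of covering both serine codon families (TCN and AGY) with a single WSN position: the same primer position also admits Thr, Arg, Cys, Trp, and stop codons.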

Genomic and cDNA Library Screening
The amphioxus Emx-related PCR product was used to screen approximately 50,000 clones of a B. floridae genomic library (Garcia-Fernàndez and Holland 1994) under the low-stringency conditions of Holland and Hogan (1986). Six strongly hybridizing, overlapping phage clones were isolated; two of these (Bfg 356-1 and Bfg 364-1) were restriction-mapped, subcloned, and partially sequenced. This revealed a region of sequence similarity to the second exon of vertebrate Emx genes, including part of the homeobox; this sequence was identical at the amino acid level to the AmphiEmxA PCR fragment. A region of sequence similarity to the third exon of vertebrate Emx genes, including the 3' part of the homeobox, was identified by hybridization with an end-labeled oligonucleotide, HB1 (CKNCKRTTYTGRAACCADATYTT), matching the sequence encoding recognition helix III of Hox and Emx class homeodomains. A probe derived from the genomic partial homeobox sequence was used to screen 40,000 clones of an amplified B. floridae cDNA library constructed from 5–24-h embryos (kindly provided by J. Langeland), under the conditions of Church and Gilbert (1984) at 65°C. A single, strongly hybridizing plaque was identified. The full cDNA sequence is available in the EMBL and GenBank databases (accession number AF261146). The 5' region of this AmphiEmxA cDNA was used as a probe to isolate the 5' exon from the Bfg 356-1 and Bfg 364-1 phage clones. Intron/exon boundaries of AmphiEmxA were confirmed by comparing genomic and cDNA sequences.
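HB1 is written as the antisense strand. The Python sketch below (our illustration, with its own IUPAC and standard genetic-code tables) reverse-complements it and lists, codon by codon, which residues each degenerate codon can encode. The result reads as KIWFQNR(R), the helix III stretch at homeodomain residues 46 to 53. That places HB1 just 3' of the intron between residues 44 and 45 reported in the Results, which fits its use to locate the third exon.

```python
from itertools import product

# IUPAC ambiguity codes and the standard genetic code ('*' = stop).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO)}


def reverse_complement(oligo: str) -> str:
    """Reverse complement, keeping ambiguity codes."""
    return oligo.translate(COMPLEMENT)[::-1]


def degeneracy(oligo: str) -> int:
    """Number of distinct sequences the degenerate oligo represents."""
    n = 1
    for base in oligo:
        n *= len(IUPAC[base])
    return n


def possible_residues(sense: str) -> list:
    """For each complete codon, the set of amino acids it can encode."""
    return [
        {CODON_TABLE["".join(p)] for p in product(*(IUPAC[b] for b in sense[i:i + 3]))}
        for i in range(0, len(sense) - 2, 3)
    ]


HB1 = "CKNCKRTTYTGRAACCADATYTT"
sense = reverse_complement(HB1)
options = possible_residues(sense)
print(sense)              # AARATHTGGTTYCARAAYMGNMG
print(degeneracy(HB1))    # 768
print(all(aa in s for aa, s in zip("KIWFQNR", options)))  # True
```

The seventh codon (MGN) also admits serine. The final two nucleotides (MG) begin the codon for the second arginine and are not translated here.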

Molecular Phylogenetic Analysis
The entire putative AmphiEmxA coding region was aligned with deduced amino acid sequences of Drosophila ems and human, mouse, Xenopus, and zebrafish Emx class genes using the CLUSTAL W program (Thompson, Higgins, and Gibson 1994) and adjusted by eye to maximize contiguous stretches of sequence similarity. The C. elegans ceh-2 gene (C27A12.5; accession number AF003137) and a cnidarian Emx gene, Cn-ems (Mokady et al. 1998), were not included in this alignment because of high divergence outside the homeodomain. The Drosophila E5 sequence was also not included. After regions that could not be aligned with confidence and all sites with gaps were excluded, 127 amino acid positions remained for phylogenetic analysis. The alignment is available as supplementary information. Phylogenies were constructed using neighbor joining (NJ), maximum parsimony (MP), and maximum likelihood (ML). NJ was implemented using the PROTDIST and NEIGHBOR programs of PHYLIP, version 3.573c (Felsenstein 1993), on a distance matrix calculated with the Dayhoff PAM option. MP used PROTPARS of PHYLIP, version 3.5c, from which a strict consensus tree was constructed. Confidence in each node was assessed by 100 bootstrap replicates for both NJ and MP. Both analyses were repeated using an alignment of just the homeodomain, so that Cn-ems and Drosophila E5 could be included. The ML analysis was performed using the quartet sampling and NJ parameter estimation procedure of TreePuzzle, version 4.0.2 (Strimmer and von Haeseler 1996), with 1,000 puzzling steps, the Dayhoff model of amino acid substitution, and a mixed model of between-site rate heterogeneity with four gamma-distributed rate categories and one invariant category.
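The site-exclusion and bootstrap steps can be sketched in a few lines of Python. This is a minimal illustration, not the PHYLIP code. The Poisson-corrected distance, -ln(1 - p), is a deliberately simpler stand-in for the Dayhoff PAM distances that PROTDIST computes. The toy alignment is hypothetical. Each pseudo-alignment would then be passed to tree building (NJ or MP), which this sketch does not implement.

```python
import math
import random


def strip_gap_columns(alignment: dict) -> dict:
    """Drop every alignment column in which any sequence has a gap ('-')."""
    names = list(alignment)
    length = len(alignment[names[0]])
    keep = [i for i in range(length) if all(alignment[n][i] != "-" for n in names)]
    return {n: "".join(alignment[n][i] for i in keep) for n in names}


def bootstrap_replicate(alignment: dict, rng: random.Random) -> dict:
    """One bootstrap pseudo-alignment: columns resampled with replacement."""
    names = list(alignment)
    length = len(alignment[names[0]])
    cols = [rng.randrange(length) for _ in range(length)]
    return {n: "".join(alignment[n][i] for i in cols) for n in names}


def p_distance(a: str, b: str) -> float:
    """Proportion of aligned sites at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def poisson_distance(a: str, b: str) -> float:
    """Poisson-corrected distance -ln(1 - p); undefined when p >= 1."""
    return -math.log(1.0 - p_distance(a, b))


# Toy alignment (hypothetical, not Emx data).
toy = {"seqA": "MK-LV", "seqB": "MKALI", "seqC": "MR-LV"}
core = strip_gap_columns(toy)  # the gapped third column is removed
print(core)                    # {'seqA': 'MKLV', 'seqB': 'MKLI', 'seqC': 'MRLV'}
print(round(poisson_distance(core["seqA"], core["seqB"]), 4))  # 0.2877

rng = random.Random(42)
replicates = [bootstrap_replicate(core, rng) for _ in range(100)]  # 100, as in the Methods
```

Resampling whole columns keeps the residues of each site together across taxa, which is what lets the bootstrap measure how consistently the sites support a given node.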


    Results
Genomic Organization of AmphiEmxA
Using PCR, cDNA library, and genomic library screening, we isolated an amphioxus member of the Emx homeobox gene family, designated AmphiEmxA. The cDNA is 2,884 bp long and has the potential to encode a protein of 289 amino acids (fig. 1). We suggest that this corresponds to the full-length protein, since there are in-frame stop codons 5' of the first methionine codon. The open reading frame includes a 180-bp homeobox sequence close to its 3' end and is followed by a long 3' untranslated region (UTR). The encoded homeodomain belongs to the Emx class, as indicated by its high sequence identity to the homeodomains of the human, mouse, frog, zebrafish, Drosophila, and cnidarian Emx genes (80%–83% to vertebrate Emx sequences, 80% to ems, 85% to E5; fig. 2).
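The argument that the 289-residue ORF is full length rests on an in-frame stop codon lying 5' of the first ATG. Below is a minimal Python sketch of that check, run on a hypothetical toy sequence rather than the AmphiEmxA cDNA. It finds the longest ATG-to-stop ORF on one strand and reports whether an in-frame stop precedes the start codon.

```python
STOPS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str):
    """Longest ATG-to-stop ORF on the given strand.

    Returns (length_nt, start, end, upstream_stop): `end` is exclusive and
    includes the stop codon; `upstream_stop` is True if an in-frame stop lies
    5' of the start codon, so the ORF cannot extend further upstream.
    ORFs that run off the 3' end without a stop are ignored.
    """
    best = None
    for frame in range(3):
        start, seen_stop, upstream = None, False, False
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start, upstream = i, seen_stop
                elif codon in STOPS:
                    seen_stop = True
            elif codon in STOPS:
                candidate = (i + 3 - start, start, i + 3, upstream)
                if best is None or candidate[0] > best[0]:
                    best = candidate
                start, seen_stop = None, True
    return best


toy = "TAAGCCATGAAACCCTGATT"  # hypothetical: stop, spacer, ATG, two codons, stop
print(longest_orf(toy))      # (12, 6, 18, True)

# Arithmetic from the text: 289 codons plus a stop codon occupy 870 nt,
# and a 180-bp homeobox encodes a 60-residue homeodomain.
assert 289 * 3 + 3 == 870 and 180 // 3 == 60
```

Only the given strand is scanned. A fuller analysis would also examine the reverse complement.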



Fig. 1.—Nucleotide and predicted amino acid sequence of AmphiEmxA. The homeodomain residues are shown in bold, and the conserved Emx peptide domain is underlined. Asterisks indicate the first 5' and 3' in-frame stop codons. Intron positions are indicated with triangles.

 


Fig. 2.—Alignment of the predicted homeodomain sequence of AmphiEmxA with vertebrate, fly, nematode, and cnidarian Emx homeodomain sequences. Numbers indicate percentage identity to the AmphiEmxA sequence. Abbreviations: Ce, Caenorhabditis elegans; Cn, Cnidaria (Hydractinia symbiolongicarpus); Dm, Drosophila melanogaster; H, human; M, mouse; X, Xenopus; Z, zebrafish.

 
AmphiEmxA contains two introns, as determined by comparison of genomic and cDNA sequences (figs. 1 and 3). The first is approximately 3 kb long and lies 5' of the homeobox, separating it from a region coding for a partially conserved hexapeptide sequence found in several classes of homeobox genes. This intron position is shared with mouse and human Emx genes (Simeone et al. 1992). The second intron, approximately 6 kb long, is located between residues 44 and 45 of the homeodomain. An intron is also found in this position in vertebrate Emx genes (at least in those for which intron-exon organization has been determined, i.e., human and mouse Emx1 and Emx2), in Drosophila E5, and in C. elegans ceh-2. The cnidarian Cn-ems gene also possesses an intron within the homeobox, but at a different position from that in amphioxus and vertebrates.
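Locating introns by comparing genomic and cDNA sequences can be sketched as a greedy exact-match walk. The Python below is a hypothetical illustration on a made-up three-exon gene, not the procedure or data used here. It assumes error-free sequences and ignores the boundary ambiguity that arises when the last bases of an exon also match the first bases of the intron, which real splice-site mapping must resolve (commonly by favoring GT...AG boundaries).

```python
def map_introns(cdna: str, genomic: str, seed: int = 8):
    """Map a cDNA onto genomic DNA by greedy exact matching.

    Returns (cdna_pos, intron_start, intron_end) tuples: the genomic slice
    genomic[intron_start:intron_end] is spliced out immediately before cDNA
    position cdna_pos.
    """
    g = genomic.find(cdna[:seed])
    if g < 0:
        raise ValueError("5' end of the cDNA not found in the genomic sequence")
    introns, c = [], 0
    while True:
        # Extend the current exon while the two sequences agree.
        while c < len(cdna) and g < len(genomic) and cdna[c] == genomic[g]:
            c += 1
            g += 1
        if c == len(cdna):
            return introns
        # The next exon starts where the next `seed` cDNA bases reappear downstream.
        nxt = genomic.find(cdna[c:c + seed], g)
        if nxt < 0:
            raise ValueError(f"no downstream match for cDNA position {c}")
        introns.append((c, g, nxt))
        g = nxt


# Hypothetical three-exon gene (not AmphiEmxA sequence).
exons = ["ATGCCCAAA", "TTCGGGCAT", "AAACTGTAA"]
introns = ["GTAAGTTTTTTTCAG", "GTGAGCCCCCCTAG"]
genomic = exons[0] + introns[0] + exons[1] + introns[1] + exons[2]
cdna = "".join(exons)

for pos, start, end in map_introns(cdna, genomic):
    # cDNA position, intron length, and the intron's terminal dinucleotides.
    print(pos, end - start, genomic[start:start + 2], genomic[end - 2:end])
# 9 15 GT AG
# 18 14 GT AG
```

Both toy introns end in the canonical GT...AG pair, the usual sanity check on a mapped boundary.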

Restriction mapping and sequence analysis revealed a repeated DNA sequence of approximately 180 bp within AmphiEmxA. This motif is imperfectly repeated (fully or partially) six times within the 3' half of the transcribed region of the gene. Approximately 66 bp of the most 5' repeat unit forms part of the AmphiEmxA coding region; the remainder and all subsequent repeats are part of the 3' UTR. No such region exists in Drosophila ems, and searches against EMBL and GenBank databases revealed no significant matches. The 3' UTR sequences of vertebrate Emx genes have not been published.

A Conserved Motif in Homeobox and Other Genes
Alignment of the AmphiEmxA deduced protein sequence to other members of the Emx homeobox class revealed a well-conserved 14-residue peptide motif close to the N-terminus. This sequence is located four to five residues downstream of the first methionine in vertebrate Emx class proteins, 21 residues downstream in Drosophila ems, and 11 residues downstream in AmphiEmxA (underlined in fig. 1). A weakly conserved version is present in a cnidarian Emx protein, Cn-ems.

The Emx peptide motif was used to search the EMBL and GenBank databases. This revealed similar sequences in a wide variety of homeobox genes and some other transcription factors. We find that the Emx peptide motif overlaps with the Hep motif present in the Drosophila H2.0 homeobox gene, engrailed homeobox genes, and homeobox genes with a paired box (Allen et al. 1991). Similar motifs have been noted in Msx, NK-1, NK-2, gsc, Not, Pax-3/7, Rx, ceh-10, and Anf class homeodomain proteins (Smith and Jaynes 1996; Stein, Niß, and Kessel 1996; Galliot, de Vargas, and Miller 1999). Our analyses extend the list to include Emx and Gbx class homeodomain proteins.

Figure 4 shows an alignment of this motif from homeodomain proteins and some other transcription factors from diverse taxa (see Discussion). Sequence identity is more striking within a gene class than between gene classes, suggesting the existence of functional constraints specific to each gene class. The most conserved sites are an invariant phenylalanine at position 5 and an almost invariant isoleucine at position 7.
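Statements such as "an invariant phenylalanine at position 5" amount to a per-column conservation profile of the aligned motif. A small Python sketch of that calculation follows. The three 14-residue strings are invented placeholders, not the sequences shown in fig. 4.

```python
from collections import Counter


def conservation_profile(aligned):
    """For each alignment column, the most common residue and its frequency."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to equal length")
    profile = []
    for column in zip(*aligned):
        residue, count = Counter(column).most_common(1)[0]
        profile.append((residue, count / len(column)))
    return profile


# Hypothetical 14-residue motifs, invented for illustration only.
toy_motifs = ["ARGSFAIQNLLSKD", "PKTAFSIDSIMGRE", "GRSPFTVEALLKQS"]
profile = conservation_profile(toy_motifs)
invariant = [pos for pos, (_, freq) in enumerate(profile, start=1) if freq == 1.0]
print(invariant)    # [5]
print(profile[6])   # ('I', 0.6666666666666666), i.e. position 7
```

With a real alignment, comparing profiles computed within a gene class and across classes would quantify the observation that identity is higher within classes.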



Fig. 4.—Alignment of the conserved peptide motif present in Emx genes. The motif is 14 amino acids in length; dashes indicate sequence identity. A selection of similar sequences encoded by other homeobox genes, forkhead domain genes, and zinc finger genes is also shown. Abbreviations: Amphi/am, amphioxus; C/c, chicken; Cn, cnidarian; D, Drosophila; Em, Ephydatia muelleri (sponge); h, human; m, mouse; X, Xenopus; zf, zebrafish.

 
Molecular Phylogenetic Analysis
In order to investigate whether AmphiEmxA is an ortholog of a particular vertebrate Emx gene or a homolog of multiple genes, we performed molecular phylogenetic analyses on the deduced protein sequences. Figure 5 shows phylogenetic trees inferred using NJ and ML. Where bootstrap or quartet puzzling reliability values were less than 60%, nodes were collapsed. The trees are rooted using Drosophila ems as the outgroup. These analyses strongly indicate that AmphiEmxA lies outside of a clade containing all of the vertebrate Emx genes (98% NJ bootstrap, 100% ML reliability value). This position is also supported by MP analysis (99% bootstrap). The implication is that a single Emx gene was present in an ancestral chordate and that this gene underwent at least one duplication event in the vertebrate lineage after it split from the lineage leading to the cephalochordates. We have not conclusively demonstrated whether single or multiple Emx genes exist in amphioxus; however, the phylogenetic analyses predict that if more Emx class genes were present in the amphioxus genome, they would have arisen from independent gene duplication events in the cephalochordate lineage.
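The support and node-collapsing rules above can be illustrated with a short Python sketch. It counts how often each clade appears among bootstrap trees, the tallying that PHYLIP's CONSENSE program performs, and merges nodes supported below 60% into polytomies. The replicate trees and taxon labels are invented for illustration. Quartet-puzzling support values from TreePuzzle are computed differently and are not modeled here.

```python
from collections import Counter


def leaves(tree):
    """Leaf names below a node; trees are leaf strings or tuples of subtrees."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree):
    """Leaf sets of every internal node of a rooted tree."""
    if isinstance(tree, str):
        return set()
    found = {leaves(tree)}
    for child in tree:
        found |= clades(child)
    return found


def clade_support(replicate_trees):
    """Percentage of replicate trees that contain each clade."""
    counts = Counter()
    for tree in replicate_trees:
        counts.update(clades(tree))
    return {clade: 100 * n / len(replicate_trees) for clade, n in counts.items()}


def collapse(tree, support, threshold=60):
    """Merge internal nodes supported below `threshold` into their parent."""
    if isinstance(tree, str):
        return tree
    children = []
    for child in tree:
        new_child = collapse(child, support, threshold)
        if isinstance(child, tuple) and support.get(leaves(child), 0) < threshold:
            children.extend(new_child)  # splice the weak node's children upward
        else:
            children.append(new_child)
    return tuple(children)


# Hypothetical rooted replicate trees (not the paper's data).
A = ("ems", ("AmphiEmxA", (("Emx1_h", "Emx1_z"), "Emx2_h")))
B = ("ems", ("AmphiEmxA", ("Emx1_h", ("Emx1_z", "Emx2_h"))))
support = clade_support([A, B, B, A, B])
print(support[frozenset({"Emx1_h", "Emx1_z", "Emx2_h"})])  # 100.0
print(collapse(A, support))
# ('ems', ('AmphiEmxA', ('Emx1_h', 'Emx1_z', 'Emx2_h')))
```

The vertebrate clade survives in every replicate, whereas the grouping of the two Emx1 labels (40%) is collapsed into a three-way polytomy.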



Fig. 5.—Neighbor-joining (NJ; top) and maximum-likelihood (ML; bottom) phylogenetic trees from an alignment of the putative protein sequences of AmphiEmxA, vertebrate Emx genes, and Drosophila ems. Numbers at nodes are scores from 100 bootstrap resamplings of the data (NJ) or quartet puzzling support values (ML). Nodes were collapsed where scores were below 60. The two methods gave the same overall topology except for the position of zebrafish Emx1 (see text) and minor branch swapping within the Emx2 clade.

 
The tree topology also helps to define some relationships between vertebrate Emx class genes. The Emx2 genes of zebrafish, Xenopus, mice, and humans clearly group into a single clade (100% NJ bootstrap, 96% MP bootstrap, 100% ML reliability), confirming that these genes are true orthologs. The situation regarding Emx1 genes is less clear. There is strong evidence that Xenopus, mouse, and human Emx1 genes are orthologs (NJ, 96%; ML, 72%), but the position of the zebrafish gene termed emx1 is not resolved with confidence. The node connecting this gene to the other vertebrate Emx1 genes has been collapsed in the NJ tree because of its low bootstrap score (51%). MP and ML, on the other hand, place zebrafish emx1 next to the Emx2 clade, but again with low bootstrap or reliability values (MP, 56%; ML, 57%). Together, these data indicate that there has been at least one Emx gene duplication within the vertebrate lineage (to give Emx1 and Emx2) and that it predates the divergence of actinopterygian and tetrapod lineages. The aberrant zebrafish emx1 is either descended from a separate duplication or is a highly divergent Emx1 gene that has confounded attempts to reconstruct ancestry from sequence data.

To include Drosophila E5 and the divergent cnidarian Cn-ems gene, it was necessary to restrict the alignment to just the homeodomain, thus compromising sequence length in favor of taxonomic sampling. Phylogenetic trees obtained from this alignment had very similar topologies to those in figure 5 (data not shown). All of the invertebrate Emx genes were again placed outside of a clade containing all vertebrate Emx1 and Emx2 genes (82% NJ bootstrap, 70% MP bootstrap). Although these scores are lower than those above, presumably due to reduced informative sequence variation in the homeodomain, they still support the existence of a single Emx gene in an ancestral chordate. The two Drosophila Emx class genes (ems and E5) group together in such analyses, suggesting that they may derive from an independent gene duplication, although bootstrap values are very low (59% NJ, 53% MP). Further sampling of Emx genes will be required to resolve the timing of this duplication relative to arthropod radiation.


    Discussion
Conservation, Gain, and Loss of Introns
The best-known class of homeobox genes, the Hox genes, has a simple and stereotyped genomic organization in vertebrates and amphioxus. With few exceptions, Hox genes have a single intron just 5' of the homeobox, separating it from a conserved hexapeptide motif (for amphioxus, see Garcia-Fernàndez and Holland 1994; Wada, Garcia-Fernàndez, and Holland 1999). Interestingly, this intron lies between DNA sequences coding for two functional domains of Hox proteins: the homeodomain, which mediates sequence-specific binding to DNA, and the hexapeptide, which is involved in heterodimer formation with Pbx/exd homeodomain proteins (Piper et al. 1999). A comparable intron position is found in many other classes of homeobox genes, including ParaHox genes (Cdx, Xlox, and Gsx classes), Mox, Otx, and the Emx class genes studied here. We find that this intron position, just upstream of the homeobox, is conserved between human and mouse Emx genes and the amphioxus homolog, AmphiEmxA. Drosophila ems and the cnidarian Cn-ems gene also possess an intron 5' of the homeobox, but sequence divergence precludes assigning it to a homologous position on the basis of sequence alone. Nonetheless, given that a comparable intron position exists in many homeobox classes and that these classes diverged early in metazoan evolution (Bürglin 1995), we suggest that conservation of an ancient intron position is the most likely explanation.

Introns within the homeobox can be compared more easily, since sequence conservation allows sites to be aligned with certainty. We mapped a large (6-kb) intron within the homeobox of AmphiEmxA between codons 44 and 45. This intron site is shared with human and mouse Emx1 and Emx2 genes, indicating an origin before the divergence of vertebrate and cephalochordate lineages. The presence of the same intron site in the C. elegans Emx class gene ceh-2 and the Drosophila Emx class gene E5 (although not ems) suggests an even earlier origin, before the divergence of the major bilaterian lineages. Bürglin (1994, 1995) noted that among homeobox genes with an intron in the homeobox, the most frequent site is between codons 44 and 45. These include Hox genes such as Drosophila lab, pb, and Abd-B and nematode lin-39; the non-Hox genes NK-1, H2.0, lbl, and Dll; chick CdxA and CNot1; several vertebrate Dlx and Hlx genes; nematode ceh-1, ceh-9, ceh-12, and ceh-20; flatworm Dth-2; and the chordate Emx genes discussed here.

Clearly, possession of an intron between codons 44 and 45 of the homeobox is a character shared by genes from several related classes of homeobox genes. We follow Bürglin (1995) in arguing that possession of this intron is an ancestral property of many metazoan homeobox gene classes, including Emx. We do not suggest, however, that this intron position was present in the first homeobox genes. Instead, we suggest that insertion of this intron corresponds to a major division of the metazoan homeobox gene superfamily into a PRD superclass and an ANTP superclass (as defined by the phylogeny of Galliot, de Vargas, and Miller 1999). The Emx, Hox, Cdx, Dlx, NK-1, NK-2, Lbx, Hlx, and NEC classes (among others) all belong to the ANTP superclass; we propose that the intron was inserted in the ancestor of this superclass.

It is interesting, therefore, that Drosophila ems and cnidarian Cn-ems lack this intron. Indeed, Cn-ems possesses an intron at a different site in the homeobox. We conclude that there has been both loss and gain of introns within Emx class homeoboxes.
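Inferences of intron gain and loss can be made explicit with a parsimony count. The Python sketch below applies Fitch's algorithm to presence or absence of the codon 44/45 intron, as reported in the Results for each gene. The tree is a hypothetical simplification, not the phylogeny inferred in this paper: Emx1 and Emx2 each stand for the mouse and human genes, the protostome genes are grouped together, and the cnidarian gene is placed outside.

```python
def fitch(tree, states):
    """Fitch parsimony for one character on a rooted binary tree.

    `tree` is a leaf name or a 2-tuple of subtrees; `states` maps leaf -> state.
    Returns (possible states at this node, minimum number of changes).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    if len(tree) != 2:
        raise ValueError("this version of Fitch's algorithm needs a binary tree")
    (left, left_cost), (right, right_cost) = (fitch(child, states) for child in tree)
    shared = left & right
    if shared:
        return shared, left_cost + right_cost
    return left | right, left_cost + right_cost + 1


# Hypothetical simplified topology (not the tree inferred in this paper).
tree = ("Cn-ems", (("ceh-2", ("ems", "E5")), ("AmphiEmxA", ("Emx1", "Emx2"))))
# 1 = intron between homeodomain codons 44 and 45 present, 0 = absent.
intron_44_45 = {"Cn-ems": 0, "ceh-2": 1, "ems": 0, "E5": 1,
                "AmphiEmxA": 1, "Emx1": 1, "Emx2": 1}
root_states, changes = fitch(tree, intron_44_45)
print(changes, root_states)  # 2 {0, 1}
```

The minimum is two changes, but the root state is ambiguous. Parsimony alone cannot distinguish a gain on the bilaterian branch followed by loss in ems from an earlier presence followed by losses in both ems and Cn-ems. That choice rests on the wider distribution of the site across ANTP-superclass genes argued above.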

Convergent Evolution of a Transcriptional Modification Domain?
Alignment of the full-length deduced protein encoded by AmphiEmxA with its homologs from vertebrates and Drosophila allowed identification of a conserved 14-residue motif close to the N-terminus. Database searches and comparisons by eye revealed similarity to, or overlap with, a number of independently identified peptide motifs in several (but not all) homeodomain proteins, as well as in some forkhead domain and zinc finger proteins. The motif was first noted by Allen et al. (1991), who named it the Hep motif, referring to its presence in the H2.0 (Hlx) homeobox gene, engrailed homeobox genes, and homeobox genes with a paired box (Hep = H2.0/engrailed/paired). The motif is the most N-terminal of five conserved protein stretches shared by mouse, human, and chicken engrailed (en) class genes, where it is designated eh1 (Logan et al. 1992). The eh1 motif is also present in en class protein sequences from invertebrates, including Drosophila and Artemia (e.g., Manzanares, Marco, and Garesse 1993). Smith and Jaynes (1996) extended the range of homeodomain proteins in which the eh1 motif could be recognized to the Msx, NK-1, NK-2, and gsc classes. The eh1 motif from Drosophila engrailed can strongly repress transcription when attached to a DNA-binding domain, providing a functional reason for wide conservation of the motif. Stein, Niß, and Kessel (1996) noted that proteins of the Not homeodomain class possess a Hep/eh1 motif, while Galliot, de Vargas, and Miller (1999) noted the presence of an eh1-like motif in a range of PRD superclass homeodomain proteins, including the Pax-3/7, Rx, ceh-10, and Anf classes.

Our finding that a similar motif exists in Emx class proteins from invertebrates and vertebrates extends the range of homeobox genes further. Using the Emx motif in database searches revealed that the Gbx homeodomain class could also be added to the growing list and allowed us to refine the extent of conservation of this motif (fig. 4). These comparisons clearly suggest that this motif has an ancient origin within the homeodomain superfamily, at least within Metazoa. They also suggest that Emx homeodomain proteins possess a separate domain that is likely to act as a modulator of transcriptional activity.

Shimeld (1997b) noted that the eh1 domain has remarkable similarity to a conserved domain (region II) shared between proteins of the HNF3 family of forkhead domain transcription factors from vertebrates, amphioxus, and arthropods. To this list of taxa we can now add the budhead gene from Hydra (Martinez et al. 1997). As with eh1, a function has been assigned to region II; in this case, the function is transcriptional activation rather than repression (Pani et al. 1992). Grimes et al. (1996) and Deschet et al. (1998) noted similarity to the SNAG repressor domain in vertebrate Gfi-1 proto-oncoproteins and to a sequence located at the N-terminus of the vertebrate Snail-Slug class of zinc finger proteins. In the case of the zinc finger proteins, however, biochemical function has not been demonstrated.

It is intriguing that a similar protein motif exists in at least three apparently unrelated transcription factor families (homeodomain proteins, forkhead domain proteins, and zinc finger proteins). This is a highly unusual distribution that demands explanation. It cannot be discounted as mere chance sequence similarity, because (at least for the homeodomain and forkhead examples) the motif has a defined biochemical function and evolutionary conservation across a wide taxonomic range. Indeed, in each case, conservation extends across almost the full range of Metazoa, from cnidarians to arthropods and chordates. There are two opposing explanations for the pattern described: conservation and convergence. Conservation would imply that similarity is a reflection of descent from a very ancient functional motif that existed in a "primordial" transcription factor. This would demand radical exon shuffling or gene fusion to copy a domain between precursors of proteins possessing different DNA-binding domains, plus extensive loss or divergence of the motif in some subsequent lineages of each gene family. On the basis of the unusual distribution of this motif, we favor the alternative explanation: convergent evolution. Two other factors also argue in favor of convergent evolution. First, the motif has distinct biochemical functions in the two gene families; it can act as a repressor in homeodomain proteins and as an activator in forkhead proteins. Second, the motif is located in a different part of the protein in each case: close to the N-terminus in homeodomain proteins, and C-terminal in forkhead proteins.

Gene Duplication
Molecular phylogenetic analysis using information from the entire coding sequences of chordate and Drosophila Emx class genes gives strong support for the existence of a single Emx gene in the ancestor of chordates. This Emx gene underwent at least one gene duplication event in the vertebrate lineage, after this lineage had diverged from its sister lineage leading to amphioxus and before the divergence of ray-finned fish and tetrapods. AmphiEmxA descends from the ancestral chordate gene, which had not yet undergone the vertebrate-specific duplication. Hence, neither vertebrate Emx1 nor Emx2 should strictly be considered an ortholog of AmphiEmxA; nor are Emx1 and Emx2 orthologs of Drosophila ems and E5. These gene duplication events suggest that some caution is necessary when comparing gene expression patterns and developmental roles between vertebrate and invertebrate Emx genes. Similar gene duplication events in early vertebrate evolution have been recorded for many genes, including several classes of transcription factors: homeobox genes of the Hox, Otx, Msx, Cdx, En, and Gsx classes, Pax genes, and myogenic bHLH genes (for review, see Holland 1999). Other examples of duplicated genes are also known, raising the possibility that gene duplication affected a large proportion of the genome in early vertebrate evolution (Holland et al. 1994; Holland 1999). This proposal gains additional support from total gene number estimates; Simmen et al. (1998) estimated that the tunicate Ciona intestinalis has approximately 15,500 genes (±3,700), as compared with 50,000–100,000 in higher vertebrates. Current data from individual gene families suggest that the amphioxus condition is comparable to that of tunicates.

Although there is now overwhelming evidence in favor of extensive gene duplication in early vertebrate evolution (with the Emx class adding to that evidence), the mechanism by which duplication occurred is contentious. A popular view, originally proposed by Ohno (1970), is that two or more polyploidy events, followed by gene divergence and gene loss, caused a stepwise increase in gene number during early vertebrate evolution. The existence of chromosomal paralogy regions (regions of similar gene content on different chromosomes) in mammalian genomes seems to support the polyploidy model (Lundin 1993). Paralogy regions may be the echoes of at least two whole-genome duplications, but they are not necessarily faithful copies because of subsequent gene loss and/or additional tandem gene duplication. The Emx1 and Emx2 genes of mice and humans do not map to currently identified paralogy regions; human Emx1 maps to 2p14–p13, while Emx2 maps to 10q26.1. It is therefore unclear whether the gene duplication reported in this paper occurred in concert with other genes or was an isolated event.

We have discussed the duplication of vertebrate Emx genes as if it were a single event, because we consider this the most parsimonious interpretation of our molecular phylogenetic analyses. The monophyly of vertebrate Emx2 genes is very well supported. However, we cannot decide conclusively between a monophyletic and a paraphyletic origin for Emx1 genes. This uncertainty stems from low confidence in the precise position of zebrafish emx1. That gene appears to be evolving at a relatively high rate, as judged by its long branch in the phylogenetic trees (fig. 5).

It is formally possible that zebrafish emx1 represents a third group of vertebrate Emx class genes that has been lost from tetrapods (or has yet to be cloned). If so, the ancestral Emx gene underwent at least two duplications in early vertebrate evolution. The simpler explanation is that zebrafish emx1 is a true Emx1 gene, and that its placement in the molecular phylogenetic tree is compromised by an unusually rapid rate of sequence evolution.

Patarnello et al. (1997) also contrasted these two alternatives. They found a very basal placement for zebrafish emx1, as an outgroup to both Emx1 and Emx2; our analyses do not agree with this placement. Our alignment differs from that of Patarnello et al. (1997). We were more conservative in identifying putatively homologous sites, and we excluded very variable regions between chordate Emx genes and Drosophila ems. We suggest that this has resulted in more reliable phylogenetic trees. We favor the parsimonious interpretation that zebrafish emx1 is a real, but divergent, Emx1 gene. Two further observations support this interpretation. First, no other Emx1-type gene has been reported from zebrafish to date. Second, the rapid divergence of particular duplicated zebrafish genes is not without precedent (Williams and Holland 1998).
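The parsimony argument can be expressed as an unweighted count of the evolutionary events each interpretation requires. The Python sketch below is hypothetical. Every duplication counts as one event, and so does every gene absence that must be explained, whether by loss or by a gene not yet cloned. The class and field names are illustrative, equal weighting of duplications and absences is an assumption, and no tree is actually inferred.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    """One interpretation of the zebrafish emx1 branch (illustrative only)."""
    name: str
    duplications: int  # minimum number of Emx gene duplications in early vertebrates
    absences: int      # missing genes to explain, by loss or by incomplete sampling

    @property
    def events(self) -> int:
        # Unweighted tally: each duplication and each absence counts once.
        return self.duplications + self.absences


hypotheses = [
    # zebrafish emx1 is a true, fast-evolving Emx1 gene: one duplication, nothing missing.
    Hypothesis("zebrafish emx1 is a divergent Emx1", duplications=1, absences=0),
    # zebrafish emx1 is a third paralog group: a second duplication, plus that group
    # missing from tetrapods and a conventional Emx1 missing from zebrafish.
    Hypothesis("zebrafish emx1 is a third Emx group", duplications=2, absences=2),
]

for hyp in sorted(hypotheses, key=lambda h: h.events):
    print(f"{hyp.name}: {hyp.events} event(s)")
```

The tally gives one event for the divergent-Emx1 interpretation and four for the third-group interpretation, matching the preference stated above. Weighting absences differently, or discounting them as undiscovered genes, would narrow the difference but not reverse it. The third-group interpretation still needs two duplications against one.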
In summary, we conclude that the duplication of an ancestral Emx class homeobox gene in the vertebrate lineage postdates divergence from cephalochordates and predates the divergence of ray-finned fish and tetrapods.



Fig. 3. Genomic and cDNA organization of AmphiEmxA, indicating the positions and sizes of introns. Black boxes represent the homeobox; other coding regions are dotted. White boxes represent 5' and 3' untranslated regions (UTRs). The repeat domain present in the 3' UTR is indicated below the genomic schematic. Restriction sites are shown above the genomic clone: Pv = PvuII; R = EcoRI.

 

    Acknowledgements
We thank Jim Langeland for the B. floridae cDNA library, Bill McGinnis for communicating unpublished data, and Jordi Garcia-Fernàndez, Hidetoshi Saiga, and members of the laboratory for helpful discussions. The constructive suggestions of Dr. Richard Thomas and a referee are also acknowledged. This work was supported by BBSRC grant G04203.


    Footnotes
 
Richard Thomas, Reviewing Editor

1 Present address: Department of Psychiatry, University of Oxford, Warneford Hospital, Oxford, England.

2 Keywords: amphioxus, homeobox, gene duplication, hep motif, intron

3 Address for correspondence and reprints: Peter W. H. Holland, School of Animal and Microbial Sciences, University of Reading, Whiteknights, Reading RG6 6AJ, United Kingdom. E-mail: p.w.h.holland@reading.ac.uk


    Literature Cited

    Allen, J. D., T. Lints, N. A. Jenkins, N. G. Copeland, A. Strasser, R. P. Harvey, and J. M. Adams. 1991. Novel murine homeobox gene on chromosome-1 expressed in specific hematopoietic lineages and during embryogenesis. Genes Dev. 5:509–520

    Bürglin, T. R. 1994. A comprehensive classification of homeobox genes. Pp. 25–71 in D. Duboule, ed. Guidebook to the homeobox genes. Oxford University Press, Oxford, England

    ———. 1995. The evolution of homeobox genes. Pp. 291–336 in R. Arai, M. Kato, and Y. Doi, eds. Biodiversity and evolution. National Science Museum Foundation, Tokyo

    Church, G. M., and W. Gilbert. 1984. Genomic sequencing. Proc. Natl. Acad. Sci. USA 81:1991–1995

    Dalton, D., R. Chadwick, and W. McGinnis. 1989. Expression and embryonic function of empty spiracles: a Drosophila homeobox gene with two patterning functions on the anterior-posterior axis of the embryo. Genes Dev. 3:1940–1956

    Deschet, K., F. Bourrat, D. Chourrout, and J.-S. Joly. 1998. Expression domains of the medaka (Oryzias latipes) Ol-Gsh 1 gene are reminiscent of those of clustered and orphan homeobox genes. Dev. Genes Evol. 208:235–244

    Felsenstein, J. 1993. PHYLIP (phylogeny inference package). Version 3.5c. Distributed by the author, Department of Genetics, University of Washington, Seattle

    Galliot, B., C. de Vargas, and D. Miller. 1999. Evolution of homeobox genes: Q50 Paired-like genes founded the Paired class. Dev. Genes Evol. 209:186–197

    Garcia-Fernàndez, J., and P. W. H. Holland. 1994. Archetypal organization of the amphioxus Hox gene cluster. Nature 370:563–566

    Grimes, H. L., T. O. Chan, P. A. Zweidler-McKay, B. Tong, and P. N. Tsichlis. 1996. The Gfi-1 proto-oncoprotein contains a novel transcriptional repressor domain, SNAG, and inhibits G1 arrest induced by interleukin-2 withdrawal. Mol. Cell. Biol. 16:6263–6272

    Hirth, F., S. Therianos, T. Loop, W. J. Gehring, H. Reichert, and K. Furukubo-Tokunaga. 1995. Developmental defects in brain segmentation caused by mutations of the homeobox genes orthodenticle and empty spiracles in Drosophila. Neuron 15:769–778

    Holland, N. D., and L. Z. Holland. 1993. Embryos and larvae of invertebrate deuterostomes. Pp. 21–32 in C. D. Stern and P. W. H. Holland, eds. Essential developmental biology: a practical approach. IRL Press at Oxford University Press, Oxford, England

    Holland, P. W. H. 1996. Molecular biology of lancelets: insights into development and evolution. Isr. J. Zool. 42:S247–S272

    ———. 1999. Gene duplication: past, present and future. Semin. Cell Dev. Biol. 10:541–547

    Holland, P. W. H., J. Garcia-Fernàndez, N. A. Williams, and A. Sidow. 1994. Gene duplications and the origins of vertebrate development. Development 1994(Suppl.):125–133

    Holland, P. W. H., and B. L. M. Hogan. 1986. Phylogenetic distribution of Antennapedia-like homeoboxes. Nature 321:251–253

    Holland, P. W. H., P. Ingham, and S. Krauss. 1992. Mice and flies head to head. Nature 358:627–628

    Logan, C., M. C. Hanks, S. Noble-Topham, D. Nallainathan, N. J. Provart, and A. L. Joyner. 1992. Cloning and sequence comparison of the mouse, human, and chicken engrailed genes reveal potential functional domains and regulatory regions. Dev. Genet. 13:345–358

    Lundin, L. G. 1993. Evolution of the vertebrate genome as reflected in paralogous chromosomal regions in man and the house mouse. Genomics 16:1–19

    Manzanares, M., R. Marco, and R. Garesse. 1993. Genomic organization and developmental pattern of expression of the engrailed gene from the brine shrimp Artemia. Development 118:1209–1219

    Martindale, M. Q., and J. Q. Henry. 1998. The development of radial and biradial symmetry: the evolution of bilaterality. Am. Zool. 38:672–684

    Martinez, D. E., M. L. Dirksen, P. M. Bode, M. Jamrich, R. E. Steele, and H. R. Bode. 1997. Budhead, a fork head HNF-3 homologue, is expressed during axis formation and head specification in hydra. Dev. Biol. 192:523–536

    Mokady, O., M. H. Dick, D. Lackschewitz, B. Schierwater, and L. W. Buss. 1998. Over one-half billion years of head conservation? Expression of an ems class gene in Hydractinia symbiolongicarpus (Cnidaria: Hydrozoa). Proc. Natl. Acad. Sci. USA 95:3673–3678

    Morita, T., H. Nitta, K. Yuji, H. Mori, and M. Mishina. 1995. Differential expression of two zebrafish emx homeodomain mRNAs in the developing brain. Neurosci. Lett. 198:131–134

    Ohno, S. 1970. Evolution by gene duplication. Springer-Verlag, New York

    Pani, L., D. G. Overdier, A. Porcella, X. Qian, E. Lai, and R. H. Costa. 1992. Hepatocyte nuclear factor-3-beta contains 2 transcriptional activation domains, one of which is novel and conserved with the Drosophila fork head protein. Mol. Cell. Biol. 12:3723–3732

    Pannese, M., G. Lupo, B. Kablar, E. Boncinelli, G. Barsacchi, and R. Vignali. 1998. The Xenopus Emx genes identify presumptive dorsal telencephalon and are induced by head organizer signals. Mech. Dev. 73:73–83

    Patarnello, T., L. Bargelloni, E. Boncinelli, F. Spada, M. Pannese, and V. Broccoli. 1997. Evolution of Emx genes and brain development in vertebrates. Proc. R. Soc. Lond. B Biol. Sci. 264:1763–1766

    Pellegrini, M., A. Mansouri, A. Simeone, E. Boncinelli, and P. Gruss. 1996. Dentate gyrus formation requires Emx2. Development 122:3893–3898

    Piper, D. E., A. H. Batchelor, C. P. Chang, M. L. Cleary, and C. Wolberger. 1999. Structure of a HoxB1-Pbx1 heterodimer bound to DNA: role of the hexapeptide and a fourth homeodomain helix in complex formation. Cell 96:587–597

    Shimeld, S. M. 1997a. Characterisation of amphioxus HNF-3 genes: conserved expression in the notochord and floor plate. Dev. Biol. 183:74–85

    ———. 1997b. A transcriptional modification motif encoded by homeobox and fork head genes. FEBS Lett. 410:124–125

    Simeone, A., M. Gulisano, D. Acampora, A. Stornaiuolo, M. Rambaldi, and E. Boncinelli. 1992. Two vertebrate homeobox genes related to the Drosophila empty spiracles gene are expressed in the embryonic cerebral cortex. EMBO J. 11:2541–2550

    Simmen, M. W., S. Leitgeb, V. H. Clark, S. J. M. Jones, and A. Bird. 1998. Gene number in an invertebrate chordate, Ciona intestinalis. Proc. Natl. Acad. Sci. USA 95:4437–4440

    Smith, S. T., and J. B. Jaynes. 1996. A conserved region of engrailed, shared among all en-, gsc-, NK1-, NK2- and msh-class homeoproteins, mediates active transcriptional repression in vivo. Development 122:3141–3150

    Stein, S., K. Niß, and M. Kessel. 1996. Differential activation of the clustered homeobox genes CNOT2 and CNOT1 during notogenesis in the chick. Dev. Biol. 180:519–533

    Strimmer, K., and A. von Haeseler. 1996. Quartet puzzling: a maximum likelihood method for reconstructing tree topologies. Mol. Biol. Evol. 13:964–969

    Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673–4680

    Wada, H., J. Garcia-Fernàndez, and P. W. H. Holland. 1999. Colinear and segmental expression of amphioxus Hox genes: differences from vertebrates and clues to ancestral roles. Dev. Biol. 213:131–141

    Walldorf, U., and W. J. Gehring. 1992. Empty spiracles, a gap gene containing a homeobox involved in Drosophila head development. EMBO J. 11:2247–2259

    Williams, N. A., and P. W. H. Holland. 1998. Gene and domain duplication in the chordate Otx gene family: insights from amphioxus Otx. Mol. Biol. Evol. 15:600–607

    Yoshida, M., Y. Suda, I. Matsuo, N. Miyamoto, N. Takeda, S. Kuratani, and S. Aizawa. 1997. Emx1 and Emx2 function in development of dorsal telencephalon. Development 124:101–111

Accepted for publication June 14, 2000.