Accelerated Evolution of the Surface Amino Acids in the WD-Repeat Domain Encoded by the hagoromo Gene in an Explosively Speciated Lineage of East African Cichlid Fishes

Yohey Terai, Naoko Morikawa, Koichi Kawakami and Norihiro Okada

*Graduate School of Bioscience and Biotechnology, Tokyo Institute of Technology, Yokohama, Japan;
†Department of Tumor Biology, Institute of Medical Science, University of Tokyo

Lakes Victoria, Malawi, and Tanganyika in the East African Rift Valley harbor approximately 200, 400, and 170 endemic species of cichlid fishes, respectively (Fryer and Iles 1972; Greenwood 1984). These fishes provide a spectacular example of the explosive adaptive radiation of living vertebrates (Fryer and Iles 1972; Greenwood 1984). They exploit almost all resources that are available to freshwater fishes in general (Fryer and Iles 1972; Greenwood 1984) and are extremely diverse, both ecologically and morphologically, despite having evolved during a very short evolutionary period (Meyer et al. 1990; Johnson et al. 1996). In cichlids, species are sexually isolated as a consequence of mate choice (Crapon De Caprona 1996; Seehausen, Van Alphen, and Witte 1997), which is based on coloration. Assortative mating among individuals with various colorations can rapidly lead to sexual isolation of color morphs (Seehausen, Van Alphen, and Witte 1997). Therefore, it is reasonable to postulate that the genes controlling the formation of the pigment patterns responsible for cichlid speciation have changed at an accelerated rate in parallel with the diversification of the pigment patterns of species. In an attempt to identify such genes, we have focused our attention on genes responsible for the formation of pigment patterns in cichlids. The mechanisms underlying pigment-pattern formation in cichlids remain, however, totally unknown. In zebrafish, various mutations affecting pigmentation have been described (Johnson et al. 1995; Haffer et al. 1996), and some relevant genes have been cloned and characterized (e.g., sparse, nacre, and hagoromo [hag]; Lister et al. 1999; Parichy et al. 1999; Kawakami et al. 2000). The genes that have been identified in zebrafish should help us to analyze pigmentation in other species of fishes, including cichlids.
As a first step toward an understanding of the molecular basis for the divergence of pigment patterns and speciation in cichlids, we cloned and characterized a cichlid homolog of the zebrafish hag gene.

We cloned a partial cDNA of cichlid hag by RT-PCR, using degenerate primers hagdF1 (5'-AGTGCAGGAGGAGAYGGKAARAT-3') and hagdR1 (5'-GTCTCTGGAGCCACTSACDAT-3') and, subsequently, we isolated full-length cDNAs from the RNA of Labidochromis caeruleus, a cichlid from Lake Malawi, by 3'RACE and 5'RACE using the nested primers hagFS1 (5'-CGTTGTCCACAGCAGGAGAAGTGAT-3'), hagFS2 (5'-GTGCCAGTGGAGTTCTCAGGTCATAAC-3'), hagRS1 (5'-AGTCCGTCTTTAGCATCCACACAGTTC-3'), and hagRS2 (5'-CCTGGTTATGACCTGAGAACTCCACTG-3').
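
The degenerate primers contain IUPAC ambiguity codes (Y, K, R, S, and D), so each primer name denotes a small pool of oligonucleotides rather than a single sequence. As a minimal illustration, which is not part of the original protocol, the following Python sketch expands hagdF1 and hagdR1 into the concrete sequences they encode. The function name expand_degenerate is hypothetical.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(primer):
    """Return every concrete sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


# Primer sequences as given in the text (5' to 3').
hagdF1 = "AGTGCAGGAGGAGAYGGKAARAT"
hagdR1 = "GTCTCTGGAGCCACTSACDAT"

forward_pool = expand_degenerate(hagdF1)
reverse_pool = expand_degenerate(hagdR1)
print(len(forward_pool), len(reverse_pool))  # 8 6
```

Under the standard IUPAC code, hagdF1 (Y, K, R) corresponds to 8 sequences and hagdR1 (S, D) to 6.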

Figure 1a shows an alignment of the products of the hag genes of L. caeruleus and zebrafish (Kawakami et al. 2000). The cichlid hag cDNA encoded a putative protein of 389 amino acids that was 64% homologous to the protein deduced from the zebrafish hag gene.



Fig. 1. (a) An alignment of the amino acid sequences deduced from the hag gene of a cichlid fish (L. caeruleus) and of zebrafish. Identical amino acids are highlighted in black. The hypothetical F-box and WD-repeats are enclosed in boxes. The a, b, and c regions of each hypothetical β-strand, each loop and turn, and each variable region are indicated by arrows, wavy lines, and shaded boxes, respectively, below the alignment. (b) Phylogeny of the cichlid species used in this study, showing average values of Dn/Ds within groups. Surface and nonsurface values indicate the average values of Dn/Ds for surface and nonsurface residues in the WD-repeat regions encoded by the hag gene in the Great Lakes. The consensus phylogenetic tree of cichlids is based on molecular data from several sources (Nishida 1991; Kocher et al. 1993; Sturmbauer and Meyer 1993; Mayer, Tichy, and Klein 1998). The numbers above the branches are the numbers of nonsynonymous substitutions (right) and the numbers of synonymous substitutions (left) at the branches, as estimated by the parsimony method. We used tribe names for the Lake Tanganyika species (Poll 1986). (c) Sliding-window analysis of estimates of average values of Dn and Ds from sequences of the WD-repeat regions encoded by the hag gene from the Great Lakes group. We divided the codon window into deduced surface residues and nonsurface residues. We calculated the average values of Dn and Ds for surface residues and nonsurface residues. A schematic representation of the WD-repeat structure encoded by hag is indicated above the sliding window, and the surface residues and nonsurface residues in each WD-repeat are indicated by shaded boxes and open boxes, respectively, below the sliding window. Open columns and filled columns indicate Ds and Dn values, respectively.

 
The product of the hag gene is a member of the family of F-box–WD-repeat proteins, in which the F-box and WD-repeats are fused (Kawakami et al. 2000). The proteins in the F-box–WD-repeat family function in a variety of regulatory pathways, being involved in the control of cell division (Feldman et al. 1997; Kominami and Toda 1997), the regulation of sulfur metabolism (Natorff, Piotrowska, and Paszewski 1998), and the regulation of differentiation (Jiang and Struhl 1998). Moreover, all of them are involved in protein degradation. The product of the hag gene appears to regulate an as yet unidentified developmental pathway, which is essential for the formation of the stripe pattern, by promoting the degradation of putative target proteins (Kawakami et al. 2000). The F-box domain binds to Skp1p, a protein that is thought to act in multiple protein-degradation pathways because it participates in the assembly of a variety of ubiquitination complexes that recruit specific substrates for degradation by the proteasome (Patton, Willems, and Tyers 1998). The WD-repeat domain binds to a variety of proteins (Smith et al. 1999). The WD-repeats form a β-propeller fold, a highly symmetrical structure made up of repeats that each consist of a small four-stranded antiparallel β sheet (Wall et al. 1995; Lambright et al. 1996; Sondek et al. 1996). However, each WD-repeat is not equivalent to a single propeller blade; each repeat contains the first three strands (fig. 1a; strands a, b, and c) of one blade and the last strand (fig. 1a; strand d, included in the variable region) of the next blade. The β-propeller structures create a stable platform that can form complexes reversibly and simultaneously with several proteins (Smith et al. 1999).
For example, the Gβ subunit of heterotrimeric G proteins interacts tightly with Gγ while interacting simultaneously with one of more than 15 different proteins (van der Voorn, Hengeveld, and Ploegh 1992). The product of the cichlid hag gene includes at least five WD-repeats, and we deduced the structure of the WD-repeat domains by referring to the work of Smith et al. (1999; fig. 1a).

To identify possible adaptive changes in the hag gene during cichlid evolution, we sequenced the WD-repeat domain of this gene (589 bp; fig. 1a), a putative regulatory domain associated with the formation of a striped pattern of coloration, from 10 species in the major cichlid lineages in the African Great Lakes and from three species of riverine cichlids (fig. 1b). We used a pair of primers, designated hage3F (5'-CTGCTGACATAAAGGTGTACCATATCCACA-3') and hage9R2 (5'-TCTGAAGTCCAGCGAATGCACAG-3'), to amplify the WD-repeat regions from cDNAs from each species. The nucleotide sequences are in GenBank under accession numbers AB075463–AB075475. We then calculated all the pairwise values for nonsynonymous substitutions per nonsynonymous site (Dn) and for synonymous substitutions per synonymous site (Ds), using the program package MEGA version 2.1 (Kumar et al. 2001). The ratio Dn/Ds provides an estimate of the evolutionary rate of amino acid substitutions (Miyata and Yasunaga 1980), and standard errors were estimated by bootstrap resampling (Felsenstein 1985). So that each tribe was represented by a single sequence, we used the consensus sequence of the tribe Lamprologini (fig. 1b; Neolamprologus leleupi, Neolamprologus brichardi, and Altolamprologus calvus).
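
To make the quantities Dn and Ds concrete, the following Python sketch computes them for one pair of aligned coding sequences with a simplified Nei-Gojobori-style count followed by a Jukes-Cantor correction. This is an illustrative assumption, not necessarily the option selected in MEGA for the analysis above. The function names and the toy sequences are hypothetical, nonsense changes are counted as nonsynonymous sites, and mutation paths through stop codons are ignored.

```python
from itertools import permutations, product
from math import log

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code; '*' marks stop codons.
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def syn_sites(codon):
    """Number of synonymous sites in one codon (between 0 and 3)."""
    s = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    s += 1 / 3
    return s


def codon_diffs(c1, c2):
    """Mean (synonymous, nonsynonymous) differences over all mutation orders."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    totals, n_paths = [0.0, 0.0], 0
    for order in permutations(positions):
        cur, sd, nd = c1, 0, 0
        for pos in order:
            nxt = cur[:pos] + c2[pos] + cur[pos + 1:]
            if CODE[nxt] == "*":  # assumption: skip paths through stop codons
                break
            if CODE[nxt] == CODE[cur]:
                sd += 1
            else:
                nd += 1
            cur = nxt
        else:  # path completed without hitting a stop codon
            totals[0] += sd
            totals[1] += nd
            n_paths += 1
    if n_paths == 0:
        return 0.0, 0.0
    return totals[0] / n_paths, totals[1] / n_paths


def jukes_cantor(p):
    """Correct a proportion of differences for multiple hits."""
    return 0.75 * log(1 / (1 - 4 * p / 3))


def dn_ds(seq1, seq2):
    """Return (Dn, Ds) for two aligned, gap-free coding sequences."""
    syn = syn_diff = non_diff = 0.0
    n_codons = len(seq1) // 3
    for i in range(0, 3 * n_codons, 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        syn += (syn_sites(c1) + syn_sites(c2)) / 2
        sd, nd = codon_diffs(c1, c2)
        syn_diff += sd
        non_diff += nd
    non = 3 * n_codons - syn
    return jukes_cantor(non_diff / non), jukes_cantor(syn_diff / syn)


# Hypothetical toy sequences: one synonymous (TTT/TTC) and one
# nonsynonymous (ATG/ATA) difference.
dn, ds = dn_ds("ATGTTTCTGGGA", "ATATTCCTGGGA")
print(dn, ds, dn / ds)
```

Averaging such pairwise Dn/Ds values within a group of species gives group-level figures of the kind reported in figure 1b.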

In general, to elucidate the evolution of a gene, it is necessary to know in advance the phylogeny of the species in question. In the present case, referring to the presently accepted phylogeny of cichlid species, we divided the African cichlid fishes that we used into two groups: a riverine group, which includes a small number of species and can serve as an outgroup for the other groups (Ribbink 1991), and the Great Lakes group, which includes the members of the tribes in Lake Tanganyika, as well as East African riverine haplochromines and members of the Lake Victoria and Lake Malawi flocks (Mayer, Tichy, and Klein 1998). Species in the Great Lakes group appear to have developed high morphological diversity. This group includes a vast number of species (>800) and demonstrates the results of explosive speciation (Fryer and Iles 1972; Greenwood 1984; Meyer et al. 1990; Sturmbauer and Meyer 1993; Johnson et al. 1996). We calculated the average values of Dn/Ds for the hag gene for the riverine and the Great Lakes groups and compared them. We postulated that if the hag gene has changed with speciation or morphological changes (or both), amino acid substitutions should have occurred at an accelerated rate and the average values of Dn/Ds for the Great Lakes group should be higher than those for the riverine group because speciation has occurred more frequently in the Great Lakes lineage. However, if the gene has not been involved in speciation or morphological changes (or both), the values should be similar.

Figure 1b shows the phylogeny of the cichlid species used in this study, as well as the average values of Dn/Ds within each group. The average values of Dn/Ds for the hag gene from the riverine group and the Great Lakes group were estimated to be 0.125 ± 0.042 and 0.267 ± 0.019, respectively (fig. 1b). The average value for the Great Lakes group is about twice that for the riverine group (fig. 1b). The confidence interval for the Great Lakes estimate does not overlap that for the riverine estimate, indicating the robust nature of the estimations of these values. The difference between the estimate for the Great Lakes group and that for the riverine group was statistically significant, as estimated by bootstrap resampling (Felsenstein 1985). The higher average value of Dn/Ds for the Great Lakes group suggests that the amino acids in the WD-repeat domain encoded by the hag gene changed at an accelerated rate in this group. Thus, there appears to be a correlation between the explosive speciation of the Great Lakes lineage and the high rate of change in the WD-repeat domain encoded by the hag gene. We also calculated the average values of Dn/Ds for the hag gene for the rapid speciation group, which includes the Tropheini tribe in Lake Tanganyika, as well as the Lake Victoria and Lake Malawi flocks, and which is designated as the TMV (Tropheini, Lake Malawi, and Lake Victoria flock) group (Takahashi et al. 2001). Species in this group appear to have been subject to rapid speciation very recently (Meyer et al. 1990; Sturmbauer and Meyer 1992; Johnson et al. 1996). Although the average value of Dn/Ds for the TMV group was only a little higher than that for the Great Lakes group (fig. 1b), the nonsynonymous substitutions were concentrated in the surface residues of the protein (see below), suggesting accelerated evolution of the regulation of protein interactions with the hag protein in this lineage (discussed below).
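
As a rough arithmetic check on the comparison between the riverine and Great Lakes groups, the sketch below uses the two reported estimates and assumes that the ± values are standard errors and that the two estimates are independent and approximately normal. These are illustrative assumptions only; the significance reported in the text was obtained by bootstrap resampling, not by this approximation.

```python
from math import erfc, sqrt

# (mean Dn/Ds, assumed standard error) as reported in the text.
riverine = (0.125, 0.042)
great_lakes = (0.267, 0.019)

# Fold difference between the two group averages.
ratio = great_lakes[0] / riverine[0]

# Intervals of one assumed standard error around each mean.
riverine_upper = riverine[0] + riverine[1]
great_lakes_lower = great_lakes[0] - great_lakes[1]
intervals_overlap = riverine_upper >= great_lakes_lower

# Normal-approximation z statistic for the difference of two means.
se_diff = sqrt(riverine[1] ** 2 + great_lakes[1] ** 2)
z = (great_lakes[0] - riverine[0]) / se_diff
p_two_sided = erfc(abs(z) / sqrt(2))

print(round(ratio, 2), intervals_overlap, round(z, 2), p_two_sided < 0.01)
```

Under these assumptions the ratio is a little above 2, the intervals do not overlap, and z is slightly above 3, which is consistent with the difference described in the text.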

In the WD-repeat domain, the β-propeller structure contains three potential interacting surfaces, namely, the top, the bottom, and the circumference, and these surfaces are composed of variable regions (fig. 1a) and interact with other proteins (Smith et al. 1999). Referring to the deduced structure of WD-repeats encoded by the cichlid hag gene (fig. 1a), we divided the amino acid sequence of the gene product into two regions: surface residues (variable regions in fig. 1a) and nonsurface residues (strands a, b, and c and loops and turns; fig. 1a). In order to identify the region that changed at an accelerated rate in the Great Lakes group, we calculated the average values of Dn/Ds for the surface residues and the nonsurface residues separately in the Great Lakes group and compared them (fig. 1b). We also performed a sliding-window analysis of the average estimates of Dn and Ds for each of several species from the Great Lakes, as shown in figure 1c.
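
The partitioned sliding-window idea can be sketched as follows. The window size, the step, the per-codon scores, and the surface mask are all hypothetical placeholders, because the settings used for figure 1c are not stated in the text; the sketch shows only how codons within each window are split into surface and nonsurface classes before averaging.

```python
def mean(xs):
    """Arithmetic mean, or None for an empty list."""
    return sum(xs) / len(xs) if xs else None


def sliding_window_means(values, is_surface, window, step):
    """Return (start, surface_mean, nonsurface_mean) for each codon window."""
    out = []
    for start in range(0, len(values) - window + 1, step):
        idx = range(start, start + window)
        surface = [values[i] for i in idx if is_surface[i]]
        core = [values[i] for i in idx if not is_surface[i]]
        out.append((start, mean(surface), mean(core)))
    return out


# Hypothetical per-codon substitution scores and surface assignments.
scores = [1, 0, 0, 0, 1, 1, 0, 0]
surface_mask = [True, False, False, False, True, True, False, False]

for row in sliding_window_means(scores, surface_mask, window=4, step=4):
    print(row)
```

In a real analysis the scores would be per-codon Dn or Ds estimates, and the mask would come from the deduced WD-repeat structure in figure 1a.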

We found that the average value for the surface residues (0.536 ± 0.063) was about three times higher than that for the nonsurface residues (0.195 ± 0.014; fig. 1b) in the Great Lakes group. In the case of the TMV group, nonsynonymous substitutions are concentrated in the surface residues, and there is no synonymous substitution in this region, making the calculation of Dn/Ds impossible. We also calculated the average values in the riverine group and found no difference between surface and nonsurface values, in contrast to the Great Lakes group (fig. 1b). Moreover, in the Great Lakes group, the sliding-window analysis clearly showed that the average estimate of Dn for the surface residues is always higher than that for the nonsurface residues in each unit of WD-repeats (second, third, and fourth repeats; fig. 1c). These results demonstrate that the accelerated rate of amino acid change in the Great Lakes group resulted from changes in the surface residues of the WD-repeat domains. Thus, it is likely that the observed accelerated changes in amino acids have not affected the structure of the WD-repeat domain itself but, rather, have affected the regulation of the interactions with binding proteins in the Great Lakes group. The amino acid sequences at the surface of the WD-repeat domain might regulate the formation of pigment patterns in a manner that is somehow related to cichlid speciation by sexual selection.

In cichlids, the hag gene might function in the regulation of pigment-pattern formation. An analysis of proteins that bind to WD-repeat domains of the product of the hag gene might provide interesting insight into this possibility.

The evolution of species was initially studied in terms of differences in morphology, and the correlations between differences in morphology and changes in genes remain to be clarified. The analysis of the genes that control pigment-pattern formation in East African cichlids provides an opportunity for studies of mechanisms of speciation and of correlations between morphological differences and changes in genes, in general.

Footnotes

Dan Graur, Reviewing Editor

Keywords: cichlid, adaptive radiation, color pattern formation gene, hagoromo, WD-repeat protein

Address for correspondence and reprints: Norihiro Okada, Graduate School of Bioscience and Biotechnology, Tokyo Institute of Technology, 4259 Nagatsuta-cho, Midori-ku, Yokohama 226-8501, Japan. nokada@bio.titech.ac.jp.

References

    Crapon De Caprona M. D., 1996 The use of fertile hybrids for the study of the accuracy of species recognition in cichlids Ann. Mus. R. Afr. Cent. Sci. Zool 251:117-120

    Feldman R. M., C. C. Correll, K. B. Kaplan, R. J. Deshaies, 1997 A complex of Cdc4p, Skp1p, and Cdc53p/cullin catalyzes ubiquitination of the phosphorylated CDK inhibitor Sic1p Cell 91:221-230

    Felsenstein J., 1985 Confidence limits on phylogenies: an approach using the bootstrap Evolution 39:783-791

    Fryer G., T. D. Iles, 1972 The cichlid fishes of the Great Lakes of Africa Oliver & Boyd, Edinburgh, U.K

    Greenwood P. H., 1984 African cichlids and evolutionary theories Pp. 141–154 in A. A. Echelle and I. Kornfield, eds. Evolution of fish species flock. University of Maine at Orono Press, Orono

    Haffer P., J. Odenthal, M. C. Mullins, et al. (17 co-authors) 1996 Mutations affecting pigmentation and shape of the adult zebrafish Dev. Genes Evol 206:260-276

    Jiang J., G. Struhl, 1998 Regulation of the Hedgehog and Wingless signalling pathways by the F-box/WD40-repeat protein Slimb Nature 391:493-496

    Johnson S. L., D. Africa, C. Walker, J. A. Weston, 1995 Genetic control of adult pigment stripe development in zebrafish Dev. Biol 167:27-33

    Johnson T. C., C. A. Scholz, M. R. Talbot, K. Kelts, R. D. Ricketts, G. Ngobi, K. Beuning, I. Ssemmanda, J. W. McGill, 1996 Late Pleistocene desiccation of Lake Victoria and rapid evolution of cichlid fishes Science 273:1091-1093

    Kawakami K., A. Amsterdam, N. Shimoda, T. Becker, J. Mugg, A. Shima, N. Hopkins, 2000 Proviral insertions in the zebrafish hagoromo gene, encoding an F-box/WD40-repeat protein, cause stripe pattern anomalies Curr. Biol 10:463-466

    Kocher T. D., J. A. Conroy, K. R. McKaye, J. R. Stauffer, 1993 Similar morphologies of cichlid fish in Lakes Tanganyika and Malawi are due to convergence Mol. Phylogenet. Evol 2:158-165

    Kominami K., T. Toda, 1997 Fission yeast WD-repeat protein pop1 regulates genome ploidy through ubiquitin-proteasome-mediated degradation of the CDK inhibitor Rum1 and the S-phase initiator Cdc18 Genes Dev 11:1548-1560

    Kumar S., K. Tamura, I. B. Jakobsen, M. Nei, 2001 MEGA2: molecular evolutionary genetics analysis software Bioinformatics 17:1244-1245

    Lambright D. G., J. Sondek, A. Bohm, N. P. Skiba, H. E. Hamm, P. B. Sigler, 1996 The 2.0 Å crystal structure of a heterotrimeric G protein Nature 379:311-319

    Lister J. A., C. P. Robertson, T. Lepage, S. L. Johnson, D. W. Raible, 1999 nacre encodes a zebrafish microphthalmia-related protein that regulates neural-crest-derived pigment cell fate Development 126:3757-3767

    Mayer W. E., H. Tichy, J. Klein, 1998 Phylogeny of African cichlid fishes as revealed by molecular markers Heredity 80:702-714

    Meyer A., T. D. Kocher, P. Basasibwaki, A. C. Wilson, 1990 Monophyletic origin of Lake Victoria cichlid fishes suggested by mitochondrial DNA sequences Nature 347:550-553

    Miyata T., T. J. Yasunaga, 1980 Molecular evolution of mRNA: a method for estimating evolutionary rates of synonymous and amino acid substitutions from homologous nucleotide sequences and its application J. Mol. Evol 16:23-36

    Natorff R., M. Piotrowska, A. Paszewski, 1998 The Aspergillus nidulans sulphur regulatory gene sconB encodes a protein with WD40 repeats and an F-box Mol. Gen. Genet 257:255-263

    Nishida M., 1991 Lake Tanganyika as an evolutionary reservoir of old lineages of East African cichlid fishes: inference from allozyme data Experientia 47:974-979

    Parichy D. M., J. F. Rawls, S. J. Pratt, T. T. Whitfield, S. L. Johnson, 1999 Zebrafish sparse corresponds to an orthologue of c-kit and is required for the morphogenesis of a subpopulation of melanocytes, but is not essential for hematopoiesis or primordial germ cell development Development 126:3425-3436

    Patton E. E., A. R. Willems, M. Tyers, 1998 Combinatorial control in ubiquitin-dependent proteolysis: don't Skp the F-box hypothesis Trends Genet 14:236-243

    Poll M., 1986 Classification des Cichlidae du lac Tanganyika: tribus, genres et especes Acad. R. Belg. (2E serie) XLV:1-163

    Ribbink A. J., 1991 Distribution and ecology of the cichlids of the African Great Lakes Pp. 36–59 in M. H. A. Keenleyside, ed. Cichlid fishes: behavior, ecology and evolution. Chapman & Hall, London

    Seehausen O., J. J. M. Van Alphen, F. Witte, 1997 Cichlid fish diversity threatened by eutrophication that curbs sexual selection Science 277:1808-1811

    Sondek J., A. Bohm, D. G. Lambright, H. E. Hamm, P. B. Sigler, 1996 Crystal structure of a G-protein beta gamma dimer at 2.1 Å resolution Nature 379:369-374

    Smith T. F., C. Gaitatzes, K. Saxena, E. J. Neer, 1999 The WD repeat: a common architecture for diverse functions Trends Biochem. Sci 24:181-185

    Sturmbauer C., A. Meyer, 1992 Genetic divergence, speciation and morphological status in a lineage of African cichlid fishes Nature 358:578-581

    Sturmbauer C., A. Meyer, 1993 Mitochondrial phylogeny of the endemic mouthbrooding lineages from Lake Tanganyika in East Africa Mol. Biol. Evol 10:751-768

    Takahashi K., Y. Terai, M. Nishida, N. Okada, 2001 Phylogenetic relationships and ancient incomplete lineage sorting among cichlid fishes in Lake Tanganyika as revealed by analysis of the insertion of retroposons Mol. Biol. Evol 18:2057-2066

    van der Voorn L., T. M. Hengeveld, H. L. Ploegh, 1992 Subunit interactions of the Go protein FEBS Lett 308:75-78

    Wall M. A., D. E. Coleman, E. Lee, J. A. Iniguez-Lluhi, B. A. Posner, A. G. Gilman, S. R. Sprang, 1995 The structure of the G protein heterotrimer Gi alpha 1 beta 1 gamma 2 Cell 83:1047-1058

Accepted for publication December 7, 2001.