Frequent Intron Loss in the White Gene: A Cautionary Tale for Phylogeneticists

Jaroslaw Krzywinski and Nora J. Besansky

Center for Tropical Disease Research and Training, Department of Biological Sciences, University of Notre Dame, Indiana

It has been postulated that because spliceosomal introns are incapable of self-splicing, intron indels should be very rare, if not unique, evolutionary events (Venkatesh, Ning, and Brenner 1999). This characteristic would make them powerful markers for phylogenetic studies, immune from problems of homoplasy that can affect primary sequence data. Counter to this argument, earlier studies of the white gene (Besansky and Fahey 1997; Gomulski et al. 2001) indicated a discouraging degree of promiscuity in the pattern of intron presence or absence in the white sequences then available.

Here we report a more extensive study of the intron-exon organization of the dipteran white gene, which is present as a single-copy gene in all insects examined to date (Levis, Bingham, and Rubin 1982; Besansky et al. 1995; Ke et al. 1997; Abraham et al. 2000; Gomulski et al. 2001; R. Beeman, personal communication). The white gene product, involved in the uptake of pigment precursors into the insect eye, belongs to a large superfamily of ABC transporters that translocate a wide range of substances across cellular membranes in both prokaryotes and eukaryotes (Saurin, Hofnung, and Dassa 1999). We analyzed an ~1-kb gene fragment in 52 insect species from the orders Diptera (48 species), Mecoptera (1 species), Lepidoptera (2 species), and Coleoptera (1 species), together with homologues from human and Caenorhabditis elegans that were identified with a parsimony-based phylogenetic analysis.

Sequences were retrieved from GenBank or were obtained after PCR amplification of genomic DNA using degenerate primers and conditions described earlier (Besansky and Fahey 1997; Krzywinski, Wilkerson, and Besansky 2001) (table 1). We aimed to include representatives of all main dipteran lineages and insect orders closely related to Diptera. However, attempts to extend the sampling by including representatives of the infraorder Bibionomorpha, additional mecopteran taxa, fleas (Siphonaptera), or Strepsiptera were unsuccessful because of long introns or mismatches (or both) between primers and templates. Apparently for the same reason, we were able to amplify and sequence only the 5' half of the gene fragment from Panorpa (Mecoptera), Vanessa (Lepidoptera), and Bombyx mori (Lepidoptera).


Table 1 The Taxonomic Positions of Insect Species Used in the Study, with the Corresponding white Gene Sequence Information

 
Prediction of the intron-exon boundaries was facilitated by the comparison of the deduced amino acid sequences with others of known gene structure. Confirmation of inferred splice sites was obtained from Musca domestica by comparison of genomic and cDNA sequences generated by RT-PCR.
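The genomic versus cDNA comparison can be illustrated with a minimal sketch. It assumes a single intron and the canonical GT...AG boundaries; the function name and the toy sequences are hypothetical and are not taken from the study. Because the bases flanking a splice site often repeat, several cut points can reproduce the cDNA, and the GT...AG rule is what singles out one of them.

```python
def find_single_intron(genomic, cdna):
    """Return (start, end) of one intron so that
    genomic[:start] + genomic[end:] == cdna, or None if there is none.

    Assumption: exactly one intron, with canonical GT...AG boundaries.
    """
    intron_length = len(genomic) - len(cdna)
    if intron_length < 4:  # too short to hold both GT and AG
        return None
    for start in range(len(cdna) + 1):
        end = start + intron_length
        intron = genomic[start:end]
        spliced = genomic[:start] + genomic[end:]
        # Several cut points may restore the cDNA; keep only the GT...AG one.
        if spliced == cdna and intron.startswith("GT") and intron.endswith("AG"):
            return start, end
    return None


# Toy sequences (invented for illustration only).
exon1, intron, exon2 = "ATGGCAG", "GTAAGTTTTCAG", "GTCTAA"
genomic = exon1 + intron + exon2
cdna = exon1 + exon2
boundaries = find_single_intron(genomic, cdna)
print(boundaries)
```

In this toy case five different cut points restore the cDNA, but only one yields an intron that begins with GT and ends with AG.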

The alignment (available on the MBE website) comprised 260 amino acid positions, unambiguously aligned, except for a highly variable region between codons 39–71, reported earlier (Besansky and Fahey 1997). Aligned amino acids guided the nucleotide alignment on which a total of 14 known and predicted intron insertion sites were mapped. These have been numbered according to their location within the alignment, with intron 1 nearest the 5' end (fig. 1). The unambiguously aligned regions harbor 10 introns, of which four are found in homologous (identical) positions in multiple species. Intron 3 is present exclusively in Diptera, whereas three other dipteran introns (10, 11, and 13) are shared by other insects and C. elegans. The number of introns in individual dipteran species varies from 0 to 4. None of the introns from the human white homologue are shared by other taxa.
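Deciding whether introns from different species occupy a homologous position requires converting each species' own coding-sequence coordinate into a shared alignment coordinate. The following minimal sketch shows one way to do this; the function name, the toy aligned sequences, and the offsets are hypothetical, and this is not a description of the procedure actually used. Two introns are treated as homologous only if they share both the alignment column and the codon phase.

```python
def intron_site(aligned_protein, cds_offset):
    """Map an intron to (alignment column, phase).

    aligned_protein: gapped amino acid sequence, '-' marks a gap.
    cds_offset: number of coding nucleotides upstream of the intron in
                this species' own ungapped coding sequence.
    The column is the 0-based alignment column of the residue whose
    codon the intron precedes (phase 0) or interrupts (phase 1 or 2).
    """
    codon_index, phase = divmod(cds_offset, 3)
    residues_seen = 0
    for column, symbol in enumerate(aligned_protein):
        if symbol == "-":
            continue  # gaps occupy a column but no residue
        if residues_seen == codon_index:
            return column, phase
        residues_seen += 1
    raise ValueError("intron lies beyond the aligned region")


# Toy alignment (invented): species A lacks one residue present in species B.
site_a = intron_site("MK-LV", 6)    # intron after 6 coding nucleotides
site_b = intron_site("MKTLV", 9)    # intron after 9 coding nucleotides
site_c = intron_site("MKTLV", 10)   # same codon, different phase
print(site_a == site_b, site_a == site_c)
```

The first two introns differ in raw coordinates yet fall on the same site once the gap is taken into account, whereas the third shares the column but not the phase.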



Fig. 1. Phylogenetic distribution of introns in a region of the white gene corresponding to D. melanogaster genomic clone X02974 positions 11990 through 12990. The cladogram of relationships is based on a distance tree constructed for ordinal insect relationships using human and C. elegans sequences as the outgroup and, to avoid potential bias in the inference of dipteran relationships caused by comparison of very distant taxa, on an ML tree for Diptera using Lepidoptera as the outgroup. NJ analyses using PHYLIP 3.57 (Felsenstein 1993) and distances corrected with the PAM 250 matrix were conducted using amino acid sequences. An ML analysis of nucleotide sequences from insects was performed with PAUP 4.0b7 (Swofford 2001), using the TrN+I+G model determined by MODELTEST 3.0 (Posada and Crandall 1998), 100 heuristic searches, and TBR branch swapping. The thick internodes have bootstrap values of 90%–100% in at least one of the parsimony, distance, or likelihood analyses, the intermediate internodes have values of 70%–89%, and the thin internodes have values below 69%. Nodes in the Diptera clade with support higher than 50% are denoted with capital letters, and corresponding bootstrap values from three selected analyses (with Lepidoptera as an outgroup) are given in a table below the cladogram. Each taxon represents a single species, except where the number of taxa is given in parentheses. Presence of an intron is represented by a plus sign, absence by a minus sign, and missing data by a blank space. Taxonomic positions of taxa are given on the right; names of insect orders are in uppercase and names of infraorders of Diptera in lowercase letters. A shaded box marks introns positioned in a highly variable region.

 
To evaluate the distribution of introns in a phylogenetic context we inferred relationships among taxa using unambiguously aligned amino acid and nucleotide sequences. The phylogenetic relationships inferred using neighbor-joining (NJ), maximum parsimony and maximum likelihood (ML) were to a large extent consistent, regardless of the optimality criterion applied (fig. 1). Class Insecta formed a strongly supported clade, and when insects only were analyzed, a lineage of Diptera + Panorpa (Mecoptera) was recovered with high bootstrap support. However, Diptera were not well separated from Panorpa, perhaps because of the short sequence analyzed in the latter taxon. Consistent with previous phylogenetic studies, early branching within Diptera remained unresolved. Indeed, relationships among basal lineages remain the subject of major disagreement (Yeates and Wiegmann 1999), possibly because of rapid diversification of the major subgroups of Diptera (Friedrich and Tautz 1997). The only well-supported deeper lineages were Muscomorpha and Culicidae + Chaoboridae. Relationships among clades within Muscomorpha were moderately to highly supported. Resolution within Diptera did not improve when Panorpa was excluded from the analysis or when only lepidopteran taxa were used as the outgroup.

Introns were mapped onto trees inferred in this and earlier studies of Diptera (Yeates and Wiegmann 1999). Despite the limitations imposed by incompletely resolved branching patterns, it is apparent that presence or absence of introns is not correlated with the phylogenetic relationships among taxa (fig. 1). The observed distribution suggests relatively frequent and random intron loss during white gene evolution in Diptera. Introns 10, 11, and 13, found in identical positions in insects and C. elegans, must have had a relatively ancient origin, no later than the most recent common ancestor of both lineages (>540 MYA). These introns have been subsequently lost in several dipteran taxa. A reliable inference of the sequence of events and the number of intron losses is hampered by the lack of well-supported phylogenetic relationships within Diptera.

We suggest that of the three introns shared by insects and a roundworm, introns 10 and 11 have been lost three times in distantly related dipteran lineages, whereas intron 13 has been lost four or five times. Loss of intron 10 in the lineage leading to the mosquito Anopheles albitarsis occurred relatively recently, as all the other 28 mosquito species possessed intron 10, including a very close relative, Anopheles albimanus. Of particular interest is an intronless white sequence in a crane fly (Tipulidae). Tipulids are generally regarded as one of the basal clades of Diptera (Yeates and Wiegmann 1999). Other lineages that originated early in dipteran evolution, including Trichoceridae, which are closely related to Tipulidae, contain two to four introns. It is probable that the most recent common ancestor of Tipulidae and Trichoceridae possessed at least two introns. If so, loss of these introns in a tipulid may have occurred simultaneously, as concerted intron loss appears to be a relatively common phenomenon during genome evolution (Frugoli et al. 1998).
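The logic of counting losses for an ancient intron can be made concrete with a small sketch. It assumes that the intron was present in the common ancestor and can be lost but never regained (a Dollo-style assumption), and it counts the minimum number of losses needed to explain presence or absence at the tips of a given tree. The tree, the taxon labels, and the presence data below are invented placeholders, missing data are not handled, and this is not the analysis behind the counts reported above.

```python
def _losses(tree, present):
    """Return (losses, all_absent) for a rooted tree of nested tuples."""
    if isinstance(tree, str):  # a leaf is a taxon label
        return 0, not present[tree]
    results = [_losses(child, present) for child in tree]
    if all(absent for _, absent in results):
        # The whole clade lacks the intron: one loss on its stem branch,
        # which is counted by the caller.
        return 0, True
    losses = sum(n for n, _ in results)
    losses += sum(1 for _, absent in results if absent)
    return losses, False


def count_losses(tree, present):
    """Minimum number of losses, assuming presence in the root ancestor."""
    losses, all_absent = _losses(tree, present)
    return losses + 1 if all_absent else losses


# Hypothetical five-taxon tree and intron presence data.
toy_tree = ((("taxon_A", "taxon_B"), "taxon_C"), ("taxon_D", "taxon_E"))
toy_present = {
    "taxon_A": True, "taxon_B": False, "taxon_C": True,
    "taxon_D": False, "taxon_E": False,
}
print(count_losses(toy_tree, toy_present))
```

Here taxon_B requires its own loss, while taxon_D and taxon_E are explained by a single loss in their common ancestor. Because the count depends on the tree, poorly supported branching order translates directly into uncertainty about the number of losses.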

The most likely mechanism of precise intron deletion is via a spliced or a partially spliced mRNA intermediate which, after reverse transcription, is incorporated into the genome by gene conversion (Lewin 1983). This mechanism has been demonstrated experimentally in yeast, where replacement of the genomic sequence through homologous recombination with a cDNA copy resulted in intron loss (Derr, Strathern, and Garfinkel 1991). If processed retrotranscripts are the predominant mediators of intron loss, an important prediction can be made: for an intron loss to be inherited in multicellular organisms, the genes subject to intron loss must be expressed in the germline, in cells developmentally ancestral to the germline, or expressed maternally and supplied to the embryo. Consistent with this prediction, white gene expression has been detected during embryonic development in several insects (Pirrota and Bröckl 1984; Abraham et al. 2000; Gomulski et al. 2001), suggesting that it plays an essential role during early development.

A few convincing examples of intron insertion into new positions have been reported (Logsdon, Stoltzfus, and Doolittle 1998; Venkatesh, Ning, and Brenner 1999), but the mechanisms underlying intron gain are poorly understood. Likely mechanisms include insertion by transposable elements, as found in maize (Giroux et al. 1994); reverse splicing of a spliceosomal intron into a new site, followed by retrotranscript-mediated recombination; and tandem duplication of exons, if the duplicated sequence contains or acquires the elements necessary for splicing (Logsdon, Stoltzfus, and Doolittle 1998; Venkatesh, Ning, and Brenner 1999).

The exclusive presence of intron 3 in Diptera suggests its recent gain. The intron is found in the lower dipteran Agathon elegantulus (Nematocera, Blephariceridae) and in most taxa of the higher flies from the infraorder Muscomorpha. Such a distribution is most parsimoniously explained by a single intron gain in a common ancestor of Blephariceridae and higher flies, followed by loss in at least two lineages. Uncertainty surrounding the presumed insertion event and its timing will remain problematic, pending improved resolution of basal lineages of Diptera and extended taxon sampling. In particular, more taxa representing basal dipteran lineages and lineages that diverged shortly before the emergence of Diptera are necessary to increase confidence in the timing of the intron acquisition. Note, however, that the assignment of introns as newly gained must be tentative (Bhattacharya and Weber 1997). On the basis of six sequences, Gomulski et al. (2001) inferred an increase in the number of white gene introns from the lower Diptera (Culicidae) to higher Diptera (Calliphoridae and Tephritidae), with an intermediate state in Drosophila melanogaster. However, extended sampling in the present study revealed that two introns, putatively acquired by higher flies (Gomulski et al. 2001), were actually present in the lower fly A. elegantulus, in nondipteran insects, and in a roundworm. This example emphasizes the difficulty of reconstructing the evolution of gene structure because of the labile nature of intron presence or absence.

Recent advances in automation of molecular techniques have resulted in the accumulation of tremendous amounts of sequence data. The postgenomic era opens the opportunity to use in phylogenetics not only sequences of numerous genes but also other types of information embedded in DNA sequences, including intron indels, retroposon integrations, or changes in organelle gene order. In two recent studies, intron indels have been used for tracing monophyletic lineages and establishing evolutionary relationships. Rokas, Kathirithamby, and Holland (1999) used information about intron insertion in the engrailed homeobox gene of Diptera and Lepidoptera and intron absence in corresponding gene positions of Strepsiptera and other taxa as evidence against a close affiliation between Diptera and Strepsiptera. Venkatesh, Ning, and Brenner (1999) used intron indels in six genes to define lineages of actinopterygian fishes and to resolve relationships among them. In common with the classical use of chromosomal inversions to reconstruct phylogeny, the use of intron indels is premised on the assumption that their occurrence at a given position is exceedingly rare, if not unique, thereby providing enormous potential for molecular systematics (Rokas and Holland 2000). Our study, however, demonstrates that this criterion may not be met, and that intron presence or absence may not be a reliable source of phylogenetic information. The white gene intron indels in Diptera show a high level of homoplasy caused by multiple independent intron losses in distantly related lineages. Other genes, not only in insects but also in other taxa, may show a similar extent of homoplasy. We believe that intron indels can offer valuable insights into evolutionary history, particularly in relatively recent taxa or taxa conservative with regard to intron differences in their genes, such as vertebrates (Logsdon, Stoltzfus, and Doolittle 1998). However, caution should be exercised in their use as phylogenetic characters at deeper taxonomic levels, particularly within insects.
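One common way to express how much homoplasy a presence/absence character carries is the consistency index: the minimum number of changes the character could require on any tree, divided by the number it requires on the tree at hand. The sketch below computes the latter with Fitch parsimony on a rooted, strictly bifurcating tree. It is offered only as an illustration of the concept; no such index was computed in this study, and the tree, taxon labels, and character states are invented.

```python
def fitch_steps(tree, states):
    """Return (possible_states, steps) for a rooted binary tree of nested tuples."""
    if isinstance(tree, str):  # a leaf is a taxon label
        return {states[tree]}, 0
    left, right = tree
    left_set, left_steps = fitch_steps(left, states)
    right_set, right_steps = fitch_steps(right, states)
    shared = left_set & right_set
    if shared:
        return shared, left_steps + right_steps
    # No state in common: one change is needed at this node.
    return left_set | right_set, left_steps + right_steps + 1


def consistency_index(tree, states):
    """Minimum possible steps divided by observed steps (None if invariant)."""
    _, steps = fitch_steps(tree, states)
    if steps == 0:
        return None
    minimum = len(set(states.values())) - 1
    return minimum / steps


# Hypothetical six-taxon tree; 1 = intron present, 0 = intron absent.
toy_tree = ((("A", "B"), ("C", "D")), ("E", "F"))
congruent = {"A": 1, "B": 1, "C": 1, "D": 1, "E": 0, "F": 0}
homoplastic = {"A": 1, "B": 0, "C": 1, "D": 0, "E": 1, "F": 1}
print(consistency_index(toy_tree, congruent))
print(consistency_index(toy_tree, homoplastic))
```

A value of 1 means the character changes once and marks a clade cleanly; lower values indicate repeated, independent changes of the kind that repeated intron loss produces.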

Acknowledgements

The authors thank J. Johnson, W. Turner, H. Pratt, U. Willhoeft, M. Goldsmith, K. Okano, C. Porter, and R. Vargas for generously supplying specimens and B. Wiegmann for helpful comments on an earlier draft. R. Beeman kindly shared data prior to publication. J. Bedell and T. Fahey provided skilled technical assistance. Support from the UNDP/World Bank/WHO Special Programme for Research and Training in Tropical Diseases to N.J.B. is gratefully acknowledged.

Footnotes

Dan Graur, Reviewing Editor

Keywords: spliceosomal introns, intron loss, white gene, phylogeny, Diptera

Address for correspondence and reprints: Jaroslaw Krzywinski, Center for Tropical Disease Research and Training, Department of Biological Sciences, University of Notre Dame, P.O. Box 369, Notre Dame, Indiana 46556-0369. jkrzywin@nd.edu

References

Abraham E. G., H. Sezutsu, T. Kanda, T. Kusakabe, T. Sugasaki, T. Shimada, T. Tamura, 2000. Identification and characterisation of a silkworm ABC transporter gene homologous to Drosophila white. Mol. Gen. Genet. 264:11-19.

Bennett C. L., M. Frommer, 1997. The white gene of the tephritid fruit fly Bactrocera tryoni is characterized by a long untranslated 5' leader and a 12 kb first intron. Insect Mol. Biol. 6:343-356.

Besansky N. J., J. A. Bedell, O. Mukabayire, D. Hilfiker, F. H. Collins, 1995. Cloning and characterization of the white gene from Anopheles gambiae. Insect Mol. Biol. 4:217-231.

Besansky N. J., G. T. Fahey, 1997. Utility of the white gene in estimating phylogenetic relationships among mosquitoes (Diptera: Culicidae). Mol. Biol. Evol. 14:442-454.

Bhattacharya D., K. Weber, 1997. The actin gene of the glaucocystophyte Cyanophora paradoxa: analysis of the coding region and introns, and an actin phylogeny of eukaryotes. Curr. Genet. 31:439-446.

Derr L. K., J. N. Strathern, D. J. Garfinkel, 1991. RNA-mediated recombination in S. cerevisiae. Cell 67:355-364.

Felsenstein J., 1993. PHYLIP (phylogeny inference package). Version 3.57. University of Washington, Seattle.

Friedrich M., D. Tautz, 1997. Evolution and phylogeny of the Diptera: a molecular phylogenetic analysis using 28S rDNA sequences. Syst. Biol. 46:674-698.

Frugoli J. A., M. A. McPeek, T. L. Thomas, C. R. McClung, 1998. Intron loss and gain during evolution of the catalase gene family in angiosperms. Genetics 149:355-365.

Garcia R. L., H. D. Perkins, A. J. Howells, 1996. The structure, sequence and developmental pattern of expression of the white gene in the blowfly Lucilia cuprina. Insect Mol. Biol. 5:251-260.

Giroux M. J., M. Clancy, J. Baier, L. Ingham, D. McCarty, L. C. Hannah, 1994. De novo synthesis of an intron by the maize transposable element Dissociation. Proc. Natl. Acad. Sci. USA 91:12150-12154.

Gomulski L. M., R. J. Pitts, S. Costa, G. Saccone, C. Torti, L. C. Polito, G. Gasperi, A. R. Malacrida, F. C. Kafatos, L. J. Zwiebel, 2001. Genomic organization and characterization of the white locus of the Mediterranean fruitfly, Ceratitis capitata. Genetics 157:1245-1255.

Ke Z., M. Q. Benedict, A. J. Cornel, N. J. Besansky, F. H. Collins, 1997. The Anopheles white gene: molecular characterization of the gene and a spontaneous white gene mutation. Genetica 101:87-96.

Krzywinski J., R. C. Wilkerson, N. J. Besansky, 2001. Toward understanding Anophelinae (Diptera, Culicidae) phylogeny: insights from nuclear single copy genes and the weight of evidence. Syst. Biol. 50:540-556.

Levis R., P. M. Bingham, G. M. Rubin, 1982. Physical map of the white locus of Drosophila melanogaster. Proc. Natl. Acad. Sci. USA 79:564-568.

Lewin R., 1983. How mammalian RNA returns to its genome. Science 219:1052-1054.

Logsdon J. M., A. Stoltzfus, W. F. Doolittle, 1998. Molecular evolution: recent cases of spliceosomal intron gain? Curr. Biol. 8:R560-R563.

O'Hare K., C. Murphy, R. Levis, G. M. Rubin, 1984. DNA sequence of the white locus of Drosophila melanogaster. J. Mol. Biol. 180:437-455.

Pirrota V., C. Bröckl, 1984. Transcription of the Drosophila white locus and some of its mutants. EMBO J. 3:563-568.

Posada D., K. A. Crandall, 1998. MODELTEST: testing the model of DNA substitution. Bioinformatics 14:817-818.

Rokas A., P. W. H. Holland, 2000. Rare genomic changes as a tool for phylogenetics. TREE 15:454-459.

Rokas A., J. Kathirithamby, P. W. H. Holland, 1999. Intron insertion as a phylogenetic character: the engrailed homeobox of Strepsiptera does not indicate affinity with Diptera. Insect Mol. Biol. 8:527-530.

Saurin W., M. Hofnung, E. Dassa, 1999. Getting in or out: early segregation between importers and exporters in the evolution of ATP-binding cassette (ABC) transporters. J. Mol. Evol. 48:22-41.

Swofford D. L., 2001. PAUP*: phylogenetic analysis using parsimony (* and other methods). Version 4. Sinauer, Sunderland, Mass.

Venkatesh B., Y. Ning, S. Brenner, 1999. Late changes in spliceosomal introns define clades in vertebrate evolution. Proc. Natl. Acad. Sci. USA 96:10267-10271.

Yeates D. K., B. M. Wiegmann, 1999. Congruence and controversy: toward a higher-level phylogeny of Diptera. Annu. Rev. Entomol. 44:397-428.

Accepted for publication November 13, 2001.