Molecular Evolution of the Wingless Gene and Its Implications for the Phylogenetic Placement of the Butterfly Family Riodinidae (Lepidoptera: Papilionoidea)

Dana L. Campbell1,*, Andrew V. Z. Brower{ddagger} and Naomi E. Pierce*

*Department of Organismic and Evolutionary Biology, Harvard University; and
{dagger}Department of Entomology, Oregon State University

Abstract

The sequence evolution of the nuclear gene wingless was investigated among 34 representatives of three lepidopteran families (Riodinidae, Lycaenidae, and Nymphalidae) and four outgroups, and its utility for inferring phylogenetic relationships among these taxa was assessed. Parsimony analysis yielded a well-resolved topology supporting the monophyly of the Riodinidae and Lycaenidae, respectively, and indicating that these two groups are sister lineages, with strong nodal support based on bootstrap and decay indices. Although wingless provides robust support for relationships within and between the riodinids and the lycaenids, it is less informative about nymphalid relationships. Wingless does not consistently recover nymphalid monophyly or traditional subfamilial relationships within the nymphalids, and nodal support for all but the most recent branches in this family is low. Much of the phylogenetic information in this data set is derived from first- and second-position substitutions. However, third positions, despite showing uncorrected pairwise divergences up to 78%, also contain consistent signal at deep nodes within the family Riodinidae and at the node defining the sister relationship between the riodinids and lycaenids. Several hypotheses about how third-position signal has been retained in deep nodes are discussed. These include among-site rate variation, identified as a significant factor by maximum likelihood analyses, and nucleotide bias, a prominent feature of third positions in this data set. Understanding the mechanisms which underlie third-position signal is a first step in applying appropriate models to accommodate the specific evolutionary processes involved in each lineage.

Introduction

The last 40 years have seen considerable activity in the higher systematics and classification of the "true" butterflies (Papilionoidea, including Hesperiidae). However, the multiple hypotheses that have arisen from these efforts do not provide a consistent interpretation of the evolution of the major butterfly lineages. In particular, assessing the monophyly and phylogenetic placement of the large family Riodinidae, which contains over 1,200 species, has been problematic, as can be seen by the conflicting phylogenetic hypotheses derived from morphological data. Although most morphological studies place the riodinids as the sister to the lycaenid butterflies and identify the nymphalids as the closest relatives to this riodinid + lycaenid clade (Ehrlich 1958Citation ; Ehrlich and Ehrlich 1967Citation ; Kristensen 1976Citation ; Scott and Wright 1990Citation ; DeVries 1991, 1997Citation ; Fiedler 1991Citation ; De Jong, Vane-Wright, and Ackery 1996Citation ), few characters support these relationships, and alternate relationships among these groups have been suggested (Robbins 1988a, 1988b;Citation Martin and Pashley 1992Citation ; for review, see Campbell and Pierce 2000Citation ).

To determine the placement of riodinid butterflies, we examined a new set of molecular characters from wingless (wg), a nuclear gene which has shown utility in reconstructing species level to subfamily level relationships in nymphalids (Brower and DeSalle 1998Citation ). Wingless is a member of the wnt gene family, whose paralogs are easily distinguishable (Sidow 1992Citation ). Primers specific to lepidopteran wingless have been developed from the 3' exon (Carroll et al. 1994Citation ; Brower and DeSalle 1998Citation ). Because this region of wingless showed a rapid rate of substitution in nymphalids, it holds promise for resolving family level relationships in the Lepidoptera (Brower and Egan 1997Citation ; Brower and DeSalle 1998Citation ). Previous molecular studies of the Papilionoidea have used few riodinid representatives (Martin and Pashley 1992Citation ; Weller, Pashley, and Martin 1996Citation ). Our work includes representative taxa from all the main lineages of riodinids, lycaenids, and nymphalids (sensu Harvey [1987Citation ], Eliot [1973Citation ], and Harvey [1991]Citation , respectively). The aims of this study are (1) to test the hypotheses that the Riodinidae and Lycaenidae are each monophyletic and determine the relationship between these two families and the family Nymphalidae, and (2) to describe patterns of the molecular evolution of wingless and demonstrate the utility of this gene for reconstructing family and subfamily level relationships among the Lepidoptera.

Materials and Methods

Specimens
Taxa were selected to represent each of the main lycaenid, riodinid, and nymphalid lineages (table 1 ). Previous studies of butterfly systematics agree on the Hesperiidae as the basal lineage of the Papilionoidea (but see Scoble 1986, 1988Citation ); for this reason, a hesperiid representative was included as an outgroup, as were two representatives of the Pieridae and one species of Papilionidae. Taxa were collected as fresh specimens, and the bodies were stored in 100% ethanol at -80°C. Wings were retained as vouchers in the Harvard Museum of Comparative Zoology (riodinids and lycaenids) and the American Museum of Natural History (nymphalids).


View this table:
[in this window]
[in a new window]
 
Table 1 Taxa Used in this Study and Their Classification According to Harvey (1987) (Riodinids), Eliot (1973) (Lycaenids), and Harvey (1991) (Nymphalids)

 
Molecular Methods
For the lycaenid, riodinid, and hesperiid species, genomic DNA was extracted using proteinase K digestion followed by phenol : chloroform (1:1) extraction (Maniatis, Fritsch, and Sambrook 1989Citation ). Genomic DNA for nymphalid, pierid, and papilionid species was prepared as described in Brower (1994)Citation .

The primers WG1, WG2, and WG2.1, described by Brower and DeSalle (1998)Citation , were used to amplify the 450-bp wingless fragment from all specimens. The amplification conditions consisted of 1 min denaturation at 94°C, 1 min annealing at 50°C, and 1 min extension at 72°C for 30 cycles. Final concentrations of reagents in a standard 100-µl reaction were 1x Taq buffer (50 mM KCl, 100 mM Tris-HCl, 0.1% Triton X-100), 3 mM MgCl2, 0.5 µM each primer, and 2.5 U Taq polymerase. PCR products were excised from a 1.2% low-melt agarose gel and phenol : chloroform extracted. The purified products were cycle sequenced in both directions using the ABI dye terminator core kit in a Perkin-Elmer 480. Excess nucleotides were removed using CentriSep spin columns (Princeton Separations), and reactions were run out on an ABI 370 automated sequencer. PCR and sequencing of the wingless fragment for the nymphalids was performed as described in Brower and DeSalle (1998)Citation . Sequence chromatograms were edited and aligned by hand in the program SeqEd (Applied Biosystems, Inc. 1992, SeqEd 1.0.3). Edited nucleotide sequences were translated using GeneJockey (Taylor 1991Citation ). GenBank accession numbers for sequences are listed in table 1 .

Data Analyses and Phylogenetic Methods
We used PAUP*, versions 4.0d54 and 4.0d59 (Swofford 1998Citation ), to calculate pairwise distance matrices, determine likelihood values, and conduct parsimony and LogDet distance analyses. Third-position Euclidean distances based on GC content were calculated as described by Ortí and Meyer (1996)Citation . Skewness (-g1) scores were calculated from 5,000 random trees generated in PAUP*, version 4.0d59. MacClade, version 3.06 (Maddison and Maddison 1996Citation ), was used to explore character evolution and patterns of nucleotide substitution along tree topologies. MEGA, version 1.01 (Kumar, Tamura, and Nei 1993Citation ) was used to determine proportions of synonymous and nonsynonymous differences. Codon preferences and measures of codon bias (ENC and scaled {chi}2) were calculated using the program Molecular Evolutionary Analysis (MEA; generously provided by the author, Etsuko Moriyama, at Yale University).

Parsimony analyses consisted of heuristic searches with 10–50 random-addition replicates using tree bisection-reconnection (TBR) branch swapping. Analysis of amino acid characters employed a transformed Blosum 80 step matrix (see appendix) to weight substitutions among amino acid residues relative to their frequencies of change among observed protein sequences (Henikoff and Henikoff 1992Citation ). In parsimony analyses, nodal support was estimated using bootstrap analyses (100 replications, with 10 random-addition replicates each) and branch support (decay indices; Bremer 1988, 1994Citation ; Donoghue et al. 1992;Citation Davis 1995Citation ). Distance analyses were carried out using the neighbor-joining algorithm (Saitou and Nei 1987Citation ); estimates of nodal support on distance trees were derived using bootstrap analyses (1,000 replications).

We compared the fit of 12 evolutionary models for this data set by estimating maximum-likelihood (ML) scores for the two trees obtained from the unweighted parsimony search. Likelihood scores for these trees were estimated under four models of evolution: the Jukes-Cantor (JC; Jukes and Cantor 1969Citation ), Kimura (1980)Citation two-parameter (K2P), HKY85 (Hasegawa, Kishino, and Yano 1985Citation ), and general time reversible (GTR [=REV of Yang 1994Citation ]) models, using ML to estimate nucleotide frequencies for the HKY85 and GTR models. For each of these models, we evaluated likelihood scores under assumptions of (1) equal rates at all sites; (2) a proportion of sites estimated by ML being invariable (I); (3) rates at all sites evolving with rate heterogeneity as approximated by four discrete rate classes of the gamma distribution ({Gamma}), estimated using ML; and (4) some sites being invariable, with variable sites evolving under a gamma distribution. Likelihood ratio tests were used to determine the model that best fit the data between pairs of nested models (Goldman 1993Citation ; Yang, Goldman, and Friday 1994Citation ; Felsenstein 1995Citation ).

Results and Discussion

Alignment and Substitution Patterns
Alignment of the 350-bp wingless fragment for all taxa required a single one-codon insertion in a pierid (Delias). Overall uncorrected sequence divergence percentages among taxa range from 10.3% to 41.6%, greater than those found by Brower and DeSalle (1998)Citation for comparison of divergences in wingless and the mitochondrial Cytochrome Oxidase I (COI) gene between lepidopteran families (mean uncorrected pairwise difference between Pieridae vs. Nymphalidae reported as 22.8% for wingless, 16.6% for COI). The data set described here yielded 214 parsimony-informative nucleotides when sequences were compared across all taxa (not including outgroup taxa). Of these parsimony-informative nucleotide substitutions, 56 occurred at first-codon-position sites, 43 at second positions, and 115 at third positions (table 2 ). When reconstructed on the (two) most-parsimonious trees recovered from unweighted parsimony (trees discussed below), most second positions required smaller numbers of changes across the tree (up to 9 changes per character, averaging 1.3 changes per character) than did first and third positions (first positions made up to 12 changes per character, averaging 2.48; third positions made up to 19 changes per character, averaging 11). This is reflected in higher consistency index (CI) and retention index (RI) values when second positions alone were reconstructed on these trees than when first and third positions were reconstructed (table 3 ). The estimated proportion of nonsynonymous differences (±SD) ranged from 1.04 ± 0.73% to 38.08 ± 3.48% over all taxa. When translated, 79% of the 116 amino acid residues in this fragment are variable, showing up to eight character states, and 52% of amino acid sites were phylogenetically informative.


View this table:
[in this window]
[in a new window]
 
Table 2 Distribution of Substitutions in wingless for 38 Butterfly Taxa

 

View this table:
[in this window]
[in a new window]
 
Table 3 Tree Statistics for Substitutions Reconstructed Onto the Two Trees Resulting from an Unweighted Parsimony Search (see fig. 3), Excluding Parsimony-Uninformative Characters

 
The ratio of transitions to transversions (ti/tv) estimated on the topologies recovered from an unweighted parsimony search was 1.3, reflecting a slightly faster overall accumulation of transitions than transversions across all positions. To assess saturation effects in this data set, numbers of transitional and transversional pairwise changes were plotted against GTR-corrected pairwise distances. The GTR model was used, since it was found by likelihood ratio tests to be the optimal model for this data set (see below). For both first and second positions, numbers of transitions and transversions steadily accumulate as corrected pairwise divergences increase, indicating that saturation has not been reached (fig. 1 ). On the other hand, numbers of third-position transitions and transversions reach a plateau across the larger pairwise divergences, indicating possible accumulation of multiple hits in third positions. This plateau effect is more pronounced for transitions than for transversions, which is not surprising since at third codon positions only 3% of transitions cause amino acid replacements, compared with 41% for transversions (Crozier and Crozier 1993Citation ; Wakely 1996Citation ), and accumulate multiple hits more rapidly. In addition, third-position distances exceeded the maximum calculable value for the GTR correction method in 130 of the pairwise comparisons, further evidence that multiple hits occur at third positions in distant taxa. There were no undefined distances in GTR-corrected pairwise divergences of first or second positions.



View larger version (24K):
[in this window]
[in a new window]
 
Fig. 1.—Abundances of transitions and transversions as a function of GTR-corrected sequence divergence for each pairwise comparison of taxa, broken down by codon position

 
Nucleotide and Amino Acid Composition
In the region of wingless examined, average nucleotide ratios were unbiased across all nucleotide positions, except for slightly lower proportions of thymidine (fig. 2a ; see also Brower and DeSalle 1998Citation ). However, when codon positions were examined individually, nucleotide content showed more variation. Chi-square tests (implemented in PAUP*, version 4.0d59) indicate that wingless shows no significant nucleotide compositional variation among taxa in first or second positions ({chi}2 = 34.49, df = 111; P = 1.00). First-position nucleotides across all taxa show reduced T contents relative to A, C, and G contents (mean first-position T content = 15.7%; this possibly explains overall reduced thymidine levels). A similarly reduced T content in first positions was found by Friedlander et al. (1996)Citation in the gene PEPCK in Lepidoptera, and by Ortí and Meyer (1996)Citation in the ependymin gene in fish.



View larger version (37K):
[in this window]
[in a new window]
 
Fig. 2.—Nucleotide composition of (a) all nucleotides, broken down by codon position for all taxa, and (b) third-position nucleotides, broken down for riodinid, lycaenid, and nymphalid butterflies individually. Asterisks indicate statistically heterogeneous nucleotide composition (P < 0.05) in chi-square tests among taxa

 
Mean base compositions among second-codon-position nucleotides show heightened A contents (mean = 30.3%) and slightly lowered T and C contents (mean TC content = 42.0%). Ortí and Meyer (1996)Citation describe a similar pattern in the ependymin gene, which, like wingless, codes for a secreted protein. This pattern may reflect compositional constraints imposed on second-position nucleotides. Specifically, hydrophobic amino acids (F, L, I, M, V, A, C; hydropathic indices as defined by Kyte and Doolittle [1982Citation ]) are never coded for by triplets with A in the second position, whereas hydrophilic amino acids do tend to have A in the second position. Thus, secreted proteins with a functional requirement for an overall hydrophilic nature (such as ependymin) have high A contents and low TC contents at second positions, whereas membrane-spanning proteins have the opposite condition (Naylor, Collins, and Brown 1995Citation ; Ortí and Meyer 1996Citation ). Similar requirements may be dictating high A contents in second-position nucleotides in wingless, which is a diffusible secreted glycoprotein with a high percentage of hydrophilic amino acids (Couso, Bishop, and Martinez Arias 1994Citation ; Perrimon 1996Citation ) (the butterflies examined here showed an average hydrophilic amino acid content of 68%).

Although base compositions across taxa are homogeneous at first and second positions, significant chi-square tests of third positions indicate among-taxa compositional heterogeneity ({chi}2 = 433.33; df = 111; P < 0.001). When third-position nucleotide content was broken down by family, we found riodinid base composition to be statistically homogeneous among all riodinid taxa examined ({chi}2 = 25.30; df = 33; P = 0.829), with all riodinids showing high AT contents (average AT = 54.7%; fig. 2b ). On the other hand, significantly heterogeneous nucleotide compositions were found among lycaenid species ({chi}2 = 134.17; df = 33; P < 0.001), with four taxa having high AT (%AT > 67%) contents and two having high GC contents (%GC > 73%; average %AT = 48.6%). Nucleotide content among nymphalid taxa was found to be statistically homogeneous ({chi}2 = 36.87; df = 27; P = 0.097). All nymphalids examined have high GC contents (average AT = 42.7%; fig. 2b ). Two different measures of codon bias, ENC and scaled {chi}2, indicate similar (and low) average levels of codon bias in each of the three families, ranging from 0.180 to 0.208 (scaled {chi}2) and from 55.553 to 56.802 (ENC; Shields et al. 1988Citation ; Wright 1990Citation ). However, codon preferences are notably different among the different families, and this may contribute to third-position nucleotide heterogeneity. The significance of nonstationary nucleotide heterogeneity and codon preferences among butterfly families for phylogenetic reconstruction is discussed below.

Maximum-Likelihood Analysis of Nucleotide Evolution
ML analyses were used to assess 16 models of sequence evolution using this data set and the two most-parsimonious trees resulting from the unweighted search. Both trees showed that the single parameter having the greatest effect on improving likelihood scores is among-sites rate heterogeneity ({Gamma}). Table 4 shows scores and parameter estimates for the best (highest likelihood score) of these trees; both trees showed very similar likelihood scores and parameters. Likelihood ratio tests performed on nested models found the GTR+{Gamma}+I model fit the data significantly better than all of the other models examined (all models shown in table 4 are nested within the GTR+{Gamma}+I model, Swofford et al. 1996). Estimates of reversible rate parameters among substitution types (A-C, A-G, A-T, C-G, C-T, G-T) are distinct, with high rates between pyrimidines (A-G) and between purines (C-T; R-matrix, see table 4 ). A similar increase in rate parameters in these substitution classes was reported by Mason-Gamer, Weil, and Kellogg (1998)Citation for granule-bound starch synthase in grasses and explains the improvement in likelihood scores with the use of the GTR model as compared with HKY85, which differs from GTR only in allowing different rates for all manners of transitions and transversions.


View this table:
[in this window]
[in a new window]
 
Table 4 Likelihood Scores and Parameter Estimates for Various Models of Evolution, Incorporating Among-Site Rate Heterogeneity ({{Gamma}}) and Invariant Sites (I), as Derived from the wingless Data and One of the Two Most-Parsimonious Topologies Shown in Figure 3

 
Phylogenetic Analyses
When all characters were weighted equally, maximum-parsimony analysis yielded two most-parsimonious trees (CI = 0.364; length = 1,686), differing only in the placement of the nymphalid clade Cercyonis + Morpho (consensus shown in fig. 3 ). Strong bootstrap and decay values support riodinid monophyly, lycaenid monophyly, and a sister relationship between the riodinids and the lycaenids. The sister of the riodinid + lycaenid lineage is less clear. Although all of the closest relatives to the riodinid + lycaenid clade are nymphalids, the nymphalids do not form a monophyletic group; rather, they are paraphyletic, both with respect to the riodinid + lycaenid lineage and with respect to the papilionids. However, basal relationships among the nymphalids are poorly resolved, and there is no bootstrap support >=50% or decay index >1 supporting a paraphyletic Nymphalidae.



View larger version (42K):
[in this window]
[in a new window]
 
Fig. 3.—Strict consensus of two trees resulting from parsimony analysis of unweighted data set (length = 1,686; consistency index = 0.259). Statistics above branches are bootstrap values (>=50%)/decay indices; numbers below branches are mean third-position pairwise divergences

 
The above analyses of nucleotide evolution in this data set suggest that third positions and transitions are susceptible to multiple hits, which could confound phylogenetic analysis. Parsimony analyses using step matrices to downweight transitions with respect to transversions (ti : tv = 2:1, 5:1) recovered one to six most-parsimonious trees, in which consensus topologies were identical to unweighted analyses for the riodinid relationships. Although lycaenid monophyly remained highly supported, downweighting transitions had the effect of placing Curetis as the most basal lycaenid, with 67% bootstrap support. This effect was also found when just third-position transitions were removed from analysis (see below). Although not supported with bootstrap values >=50%, the nymphalid taxa Diaethria, Limenitis, Heliconius, Siproeta, and Hypolimnas formed a lineage with the same topology under all ti/tv weightings, which was consistent with the results of unweighted parsimony. Basal nymphalid relationships, however, were not stable among different ti/tv weightings.

A parsimony search using first and second positions alone yielded 111 trees (length = 389; CI = 0.414), and analysis of amino acids recovered 44 most-parsimonious trees (length = 5,119). Although some riodinid and lycaenid relationships were slightly less resolved than in the unweighted analysis, all nodes with bootstrap support >50% in the unweighted search (fig. 3 ) were also recovered with similar support in both analyses. In both weightings, strict consensus resulted in the collapse of all nymphalid nodes. Thus, at least for this sampling of representatives, wingless may not be a reliable marker for basal nymphalid relationships.

On the other hand, relationships within the Riodinidae and the Lycaenidae were remarkably well supported and stable except for two nodes which showed differences in patterns of support in response to the weighting of third-position transitions. One of these is the interpretation of the basal branching patterns of the lycaenids, especially the placement of Curetis. Downweighting or removing third-position transitions placed Curetis as the most basal branch of the lycaenids, with 66% bootstrap support when third position transitions were excluded (fig. 4 ) and 53% support in the analysis of amino acid characters.



View larger version (39K):
[in this window]
[in a new window]
 
Fig. 4.—Bootstrap tree resulting from parsimony search excluding third-position transitions. A heuristic search found four most-parsimonious trees (length = 886; consistency index = 0.463). Bootstrap values are shown for nodes with support >=50%; other nodes are collapsed

 
An opposite trend is found in the node defining riodinid monophyly, which is well supported in the unweighted analysis (bootstrap value = 96%; fig. 3 ). As third-position transitions are downweighted with respect to other positions, support for this node decreases. For example, consensus of the four most-parsimonious trees recovered when third-position transitions were excluded from analysis shows 72% bootstrap support for riodinid monophyly (fig. 4 ). This suggests that signal from third-position transitions is consistent with support from first and second positions, and, when third-positions transitions are removed, bootstrap support for this node is reduced.

Third-codon-position variability above 20%–30% raw pairwise divergence has been considered to be at saturation level in analyses of other nuclear genes (Friedlander, Regier, and Mitter 1994Citation ). Such highly divergent third positions are often downweighted or removed from phylogenetic analyses because they contribute excessive noise and little to no signal (e.g., Friedlander et al. 1996Citation ; Fratí et al. 1997Citation ; but see Brower and DeSalle 1998Citation ). Although we observed divergence levels which, under these guidelines, implicate third-position saturation in wingless (27%–78%), a significant -g1 statistic (-0.250) calculated from third positions suggests that these substitutions are not all saturated, and at least some of them contain significant hierarchical signal at the P < 0.01 level (Hillis and Huelsenbeck 1992Citation ; but see Källersjö et al. 1992Citation ). Third positions also show increasing mean pairwise divergences at progressively deeper riodinid and lycaenid phylogenetic levels (fig. 3 ). Furthermore, a parsimony search on third positions alone resulted in one most-parsimonious tree (fig. 5 ) in which interpretations of riodinid relationships and monophyly were completely concordant with relationships recovered by first and second positions, with the exception of the placement of one riodinid, Eurybia. This search also recovered the sister relationship between the Riodinidae and the Lycaenidae. On the other hand, this tree did not recover lycaenid monophyly or basal lycaenid relationships as reconstructed by parsimony analyses employing first and second positions. Thus, while third positions appear to provide signal consistent for recovery of riodinid relationships and the riodinid + lycaenid node, third positions may contribute to contradictions or noise leading to reduced resolution and bootstrap support for lycaenids (such as that described above for Curetis).



View larger version (30K):
[in this window]
[in a new window]
 
Fig. 5.—Single most-parsimonious tree resulting from analysis of third positions only (length = 1,264; consistency index = 0.218). The pierid, papilionid, and hesperiid outgroups are underlined. Numbers represent bootstrap values obtained from parsimony analysis of third characters reweighted with respect to the rescaled consistency index

 
Potential Mechanisms for Storing Signal in Third Positions at Deep Nodes
In euteleost fishes, Ortí and Meyer (1996)Citation attribute third-position ependymin signal in deep nodes among orders and families to an accumulation of nonsynonymous substitutions (up to 35% of all third-position substitutions). This is in contrast to our findings: when we examined the characters supporting each node of the third-position most-parsimonious tree, we found that very few were nonsynonymous substitutions. The node supporting the riodinid lineage, for example, was supported by only 1 unambiguous third-position nonsynonymous substitution and 12 synonymous third-position substitutions. Likewise, the riodinid + lycaenid lineage is supported by only two nonsynonymous substitutions but nine synonymous substitutions. In addition, when a third-position-only search was conducted after removing these nonsynonymous characters, the same topology, with both of these nodes intact, was retrieved, indicating that these nonsynonymous changes were not sufficient to explain the signal underlying these nodes. Although wingless sequences show comparable (in fact, higher) levels of third-position divergence than do ependymin sequences, wingless shows less amino acid divergence than ependymin does. Ortí and Meyer (1996)Citation estimated that >=50% amino acid divergence is required before sufficient numbers of third-position nonsynonymous changes accumulate to provide reliable phylogenetic signal (their data show up to 67.5% pairwise divergence). Our data, which show a maximum of 47.7% amino acid divergence among taxa and very few nonsynonymous third positions, corroborate this estimate.

Another mechanism that might be involved in harboring phylogenetic signal in third positions is among-site rate heterogeneity. If third-position sites evolve at significantly heterogeneous rates, genetic distances between distantly related taxa include multiple hits at more quickly evolving sites which become saturated, as well as changes at more slowly evolving sites which retain phylogenetic signal. In other words, signal persists longer in the data if rate heterogeneity is a prominent feature, such that high third-position divergence creates useful variation in conservatively evolving sites while resulting in saturation of others (Yang 1998Citation ). Wide variation in number of steps per character when nucleotide substitutions were reconstructed onto the topology shown in figure 3 suggests that among-site rate variation exists in this data set (table 3 ). Likelihood ratio tests conducted using the unweighted tree topology and the GTR model of evolution to compare hypotheses of among-site rate variation in each nucleotide position individually further identify rate heterogeneity as a significant factor in third positions ({chi}2 = 67.18; P < 0.005; {Gamma} shape parameter ({alpha}) = 3.785), as well as in first and second positions ({chi}2 = 187.66; 38.96, {alpha} = 0.443 and 0.865 for first and second positions, respectively; table 5 ).


View this table:
[in this window]
[in a new window]
 
Table 5 Likelihood Ratio Test for Among-Site Rate Heterogeneity in Various Nucleotide Subsets of wingless under the General Time Reversible Model of Evolution

 
Methods of phylogenetic reconstruction based on an explicit model of evolution (i.e., likelihood and distance frameworks) offer sophisticated ways to compensate for rate heterogeneity; however, most available models do not yet accommodate nonstationary third-position base composition across taxa, which is a significant feature of this data set (fig. 2 ; discussed below). An alternative method allowing one to approach rate heterogeneity in parsimony is to reweight characters according to their consistencies (we used the rescaled consistency index) calculated with reference to a starting topology (Farris 1969Citation ; Carpenter 1988;Citation Yang 1996;Citation Sullivan, Markert, and Kilpatrick 1997). In this way, we enhanced the signal from the most conservatively changing third-position characters based on the topology shown in figure 5 . Analysis of reweighted third-position characters recovered a topology very similar to that generated by unweighted third positions, with increased bootstrap support for several nodes (see fig. 5 ), most notably (1) the node supporting riodinid monophyly (90% bootstrap support) and (2) the riodinid + lycaenid sister relationship (65% bootstrap support). This is consistent with the interpretation that in the riodinids the conservatively evolving third positions have retained useful phylogenetic information.

Nonstationary base composition across taxa is a third mechanism that may contribute to signal in third positions. At the third position, riodinid, lycaenid, and nymphalid lineages show significantly different nucleotide proportions, with high average AT content in riodinids, intermediate and heterogeneous in lycaenids, and low AT content in nymphalids (fig. 2b ). Unequal nucleotide frequencies among taxa has often been cited as a cause of error in phylogenetic reconstruction because taxa with similar base compositions tend to cluster despite lack of shared ancestry (e.g., Saccone, Pesole, and Preparata 1989Citation ; Sidow and Wilson 1990Citation ; Lockhart et al. 1992, 1994Citation ; Hasegawa and Hashimoto 1993Citation ; Steel, Lockhart, and Penny 1993Citation ; unpublished data). However, if base compositional profiles evolved in a nonhomoplasious manner, they may contain phylogenetic information that reflects historical relationships.

To examine the potential contribution of third-position base composition heterogeneity to phylogenetic signal, trees were reconstructed using third-position distances based on the LogDet correction, which compensates for the effects of biased across-taxa nucleotide compositions in phylogenetic reconstruction (Lockhart et al. 1994Citation ). The resulting tree showed no resemblance to the third-position-only parsimony tree. Specifically, the LogDet tree did not recover riodinid, lycaenid, or riodinid + lycaenid monophyly; rather, taxa from all three lineages were grouped together and randomly distributed across the tree. The complementary experiment, in which neighbor-joining methods were used to reconstruct distances based only on third-position GC content (="GC trees" calculated from Euclidean distances as described by Ortí and Meyer 1996Citation ) recovered a topology far less random than the LogDet tree based on third positions. The GC tree recovered all of the riodinids in one clade (with four lycaenid interlopers which had high GC contents). The Lycaenidae, on the other hand, which show heterogeneous GC contents, were not recovered as monophyletic, and neither were the Nymphalidae. The results of these two analyses implicate the contribution of third-position nucleotide composition in the resolution of riodinid relationships.

Conclusions

Wingless is a single-copy nuclear gene that shows a level of variation appropriate for phylogenetic reconstruction of butterfly tribes and families. Despite this degree of variation, wingless is easy to PCR-amplify using the primers subscribing the region used here. Because this region is bounded on the 5' end by an intron sequence, amplification of a longer segment has not been possible from genomic DNA to date. However, wingless sequences from other taxa (e.g., for Bombyx) are available from GenBank, and it may be possible to construct primers for PCR amplification of the exon 5' of the region used here. Although the 5' exon sequence is shorter than the fragment collected here (201 bp in Drosophila; Rijsewijk et al. 1987Citation ), the results from this study suggest that this second region of wingless may provide valuable characters for phylogenetic reconstruction.

Wingless has been cited as a useful source of phylogenetic characters for resolving relationships as deep as subfamily level in the Nymphalidae (Brower and DeSalle 1998Citation ). Our results here show that wingless provides an even greater degree of variation in the riodinids and lycaenids than in the nymphalids and resolves relationships in these two families at the tribal level to the among-families level with high nodal support.

In the riodinids, third-position substitutions continue to provide signal consistent with first- and second-position signal well past the 20%–30% divergence suggested as the saturation point by Friedlander, Regier, and Mitter (1994)Citation . In particular, third-position transitions harbor deep signal that supports riodinid monophyly, as well as resolving more recent riodinid relationships. Multiple factors probably contribute to this signal, including preservation of signal as similar nucleotide contents among related taxa and in more slowly evolving sites. However, third positions (especially transitions) also lend support for a paraphyletic Lycaenidae, an interpretation that conflicts with topologies derived from first and second positions. This contributes to unstable lycaenid relationships with decreased bootstrap values when third positions are included. The influence of third positions in defining basal nymphalid nodes appears to be minimal, although third positions do appear to contribute resolution to recent divergences in the nymphalids.

Different processes may be driving the evolution of wingless third positions in different butterfly families. Indications of this come in the form of different nucleotide compositions and codon preferences in different lineages. These phenomena have been documented in other data sets, both mitochondrial (e.g., Fratí et al. 1997Citation ) and nuclear (unpublished data), especially for higher-level phylogenetic problems. In cases such as these, close attention should be paid to any signals suggesting that third positions behave differently in the phylogenetic interpretation of different taxonomic groups. For this data set, excluding third-position transitions from the analysis (fig. 4 ) provides a conservative parsimony estimate for this overall analysis of relationships; however, this may exclude useful phylogenetic information, especially in the riodinids. New ML models are just beginning to accommodate nonstationary nucleotide and codon frequencies and may eventually help to solve these problems. To date, though, these models are few in number and rich in parameters (see Yang 1997; Galtier and Guoy 1998Citation ).


View this table:
[in this window]
[in a new window]
 
Blosum80 Amino Acid Step Matrix Transformeda from the Log Odds Matrix of Henikoff and Henikoff (1992) for Use in Phylogenetic Reconstruction

 
Acknowledgements

We are especially indebted to Phil DeVries for lending his vast expertise on riodinid biology, for his generous help in collecting and identifying butterflies, and for his improvements in the manuscript. We thank Kelvyn Dunn, Arthur Shapiro, Man-Wah Tan, and David Yeates for assistance in collecting and identifying specimens used in this study. André Mignault and Karen Nutt provided helpful laboratory assistance. Belinda Chang, Brian Farrell, Donald Harvey, Rodney Honeycutt, Jeff Jensen, Toby Kellogg, Carla Penz, Bob Reed, Chris Simon, and two anonymous reviewers provided helpful discussions and comments. Thanks to David Swofford for allowing us to publish results from test versions of PAUP*. This work was supported by a National Science Foundation Doctoral Dissertation Improvement Grant (DEB-9520824) and a National Institute of Health training grant to D.L.C. and by NSF DEB-9615760 to N.E.P. Grants from the Putnam Expedition Fund of the MCZ (to D.L.C. and N.E.P.), the Harvard Department of Organismic and Evolutionary Biology, and the Harvard Graduate Student Council, and support from the La Selva Lodge in Ecuador enabled us to collect many of these specimens. A.V.Z.B.'s contributions to this research were supported by NSF BSR-9220317 and DEB-9303251 and by a postdoctoral fellowship from the Smithsonian Institution and the American Museum of Natural History.

Footnotes

Rodney Honeycutt, Reviewing Editor

1 Present address: University Park, Maryland. Back

2 Keywords: Riodinidae Lycaenidae Nymphalidae molecular phylogenetics wingless, gene utility third codon positions maximum likelihood Back

3 Address for correspondence and reprints: Dana L. Campbell, Department of Biology, Biology/Psychology Building, University of Maryland, College Park, Maryland 20742. E-mail: dcampbell{at}oeb.harvard.edu Back

literature cited

    Bremer, K. 1988. The limits of amino acid sequence data in angiosperm phylogenetic reconstruction. Evolution 42:795–803.

    ———. 1994. Branch support and tree stability. Cladistics 1:295–304.

    Brower, A. V. Z. 1994. Phylogeny of Heliconius butterflies inferred from mitochondrial DNA sequences (Lepidoptera: Nymphalidae). Mol. Phylogenet. Evol. 3:159–174.[Medline]

    Brower, A. V. Z., and R. DeSalle. 1998. Patterns of mitochondrial versus nuclear DNA sequence divergence among nymphalid butterflies: the utility of wingless as a source of characters for phylogenetic inference. Insect Mol. Biol. 7:1–10.[ISI][Medline]

    Brower, A. V. Z., and M. Egan. 1997. Cladistic analysis of Heliconius butterflies and relatives (Nymphalidae: Heliconiiti): a revised phylogenetic position for Eueides based on sequences from mtDNA and a nuclear gene. Proc. R. Soc. Lond. B Biol. Sci. 264:969–977.[ISI]

    Campbell, D. L., and N. E. Pierce. 2000. Relationships of the Riodinidae (Lepidoptera) and phylogenetic placement with respect to other butterfly families: implications for the evolution of ant association? In C. Boggs, W. Watt, and P. Ehrlich, eds. Ecology and evolution taking flight: butterflies as model study systems. University of Chicago Press, Chicago (in press).

    Carpenter, J. M. 1988. Choosing among equally parsimonious cladograms. Cladistics 4:291–296.

    Carroll, S. B., J. Gates, D. N. Keys, S. W. Paddock, G. E. F. Panganiban, J. E. Selegue, and J. A. Williams. 1994. Pattern formation and eyespot determination in butterfly wings. Science 265:109–114.

    Couso, J. P., S. A. Bishop, and A. Martinez Arias. 1994. The wingless signaling pathway and the patterning of the wing margin in Drosophila. Development 120:621–636.

    Crozier, R. H., and Y. C. Crozier. 1993. The mitochondrial genome of the honeybee Apis mellifera: complete sequence and genome organization. Genetics 133:97–117.

    Davis, J. I. 1995. A phylogenetic structure for the monocotyledons, as inferred from chloroplast DNA restriction site variation, and a comparison of measures of clade support. Syst. Bot. 20:503–527.[ISI]

    De Jong, R., R. I. Vane-Wright, and P. R. Ackery. 1996. The higher classification of butterflies (Lepidoptera): problems and prospects. Entomol. Scand. 27:65–101.[ISI]

    DeVries, P. J. 1991. Evolutionary and ecological patterns in myrmecophilous riodinid butterflies. Pp. 143–156 in C. R. Huxley and D. F. Cutler, eds. Oxford surveys in evolutionary biology, Vol. 4. Ant-plant interactions. Oxford University Press, Oxford, England.

    ———. 1997. The butterflies of Costa Rica and their natural history, Vol. 2. Riodinidae. Princeton University Press, Princeton, N.J.

    Donoghue, M. J., R. G. Olmstead, J. F. Smith, and J. D. Palmer. 1992. Phylogenetic relationships of dipsacales based on rbcL sequences. Ann. Mo. Bot. Gard. 79:333–345.

    Ehrlich, P. R. 1958. The comparative morphology, phylogeny and higher classification of the butterflies (Lepidoptera: Papilionoidea). Univ. Kans. Sci. Bull. 8:305–370.

    Ehrlich, P. R., and A. Ehrlich. 1967. The phenetic relationships of the butterflies I. Adult taxonomy and the nonspecificity hypothesis. Syst. Zool. 16:301–317.

    Eliot, J. N. 1973. The higher classification of the Lycaenidae (Lepidoptera): a tentative arrangement. Bull. Br. Mus. Nat. Hist. 281–505.

    Farris, J. S. 1969. A successive approximations approach to character weighting. Syst. Zool. 18:374–385.[ISI]

    Felsenstein, J. 1995. PHYLIP (phylogeny inference package). Version 3.57. Distributed by the author, Department of Genetics, University of Washington, Seattle.

    Fiedler, K. 1991. Systematic, evolutionary and ecological implications of myrmecophily within the Lycaenidae (Insecta: Lepidoptera: Papilionoidea). Bonn. Zool. Monogr. 31:1–210.

    FratÍ, F., C. Simon, J. Sullivan, and D. L. Swofford. 1997. Evolution of the mitochondrial cytochrome oxidase II gene in Collembola. J. Mol. Evol. 44:145–158.[ISI][Medline]

    Friedlander, T. P., J. C. Regier, and C. Mitter. 1994. Phylogenetic information content of five nuclear gene sequences in animals: initial assessment of character sets from concordance and divergence studies. Syst. Biol. 43:511–525.[ISI]

    Friedlander, T. P., J. C. Regier, C. Mitter, and D. L. Wagner. 1996. A nuclear gene for higher level phylogenetics: phosphoenolpyruvate carboxykinase tracks Mesozoic-age divergences within Lepidoptera (Insecta). Mol. Biol. Evol. 13:594–604.[Abstract]

    Galtier, N., and M. Gouy. 1988. Inferring pattern and process: maximum-likelihood implementation of a nonhomogeneous model of DNA sequence evolution for phylogenetic analysis. Mol. Biol. Evol. 15:871–879.[Abstract]

    Goldman, N. 1993. Statistical tests of models of DNA substitution. J. Mol. Evol. 36:345–361.

    Harvey, D. J. 1987. The higher classification of the Riodinidae. Ph.D. thesis, University of Texas at Austin.

    ———. 1991. Higher classification of the Nymphalidae. Pp. 255–273 in H. F. Nijhout, ed. The development and evolution of butterfly wing patterns. Smithsonian Institution Press, Washington, D.C.

    Hasegawa, M., and T. Hashimoto. 1993. Ribosomal RNA trees misleading? Nature 361:23.

    Hasegawa, M., M. Kishino, and T. Yano. 1985. Dating the human-ape split by a molecular clock of mitochondrial DNA. J. Mol. Evol. 19:171–175.

    Henikoff, S., and J. G. Henikoff. 1992. Amino acid substitution matrices from protein blocks. Proc. Natl. Acad. Sci. USA 89:10915–10919.

    Hillis, D. M., and J. P. Huelsenbeck. 1992. Signal, noise, and reliability in molecular phylogenetic analyses. J. Hered. 83:189–195.[ISI][Medline]

    Jukes, T. H., and C. R. Cantor. 1969. Evolution of protein molecules. Pp. 21–132 in H. N. Munro, ed. Mammalian protein metabolism. Academic Press, New York.

    Källersjö, M., J. S. Farris, A. G. Kluge, and C. Bult. 1992. Skewness and permutation. Cladistics 8:275–287.

    Kimura, M. 1980. A simple method for estimating evolutionary rate of base substitution through comparative studies of nucleotide sequences. J. Mol. Evol. 16:111–120.[ISI][Medline]

    Kristensen, N. P. 1976. Remarks on the family-level phylogeny of butterflies (Insecta, Lepidoptera, Rhopalocera). Z. Zool. Syst. Evol. Forsch. 14:25–33.

    Kumar, S., K. Tamura, and M. Nei. 1993. MEGA: molecular evolutionary genetics analysis. Version 1.01. Pennsylvania State University, University Park.

    Kyte, J., and R. F. Doolittle. 1982. A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. 157:105–132.[ISI][Medline]

    Lockhart, P. J., C. J. Howe, D. A. Bryant, T. J. Beanland, and A. W. D. Larkum. 1992. Substitutional bias confounds inference of cyanelle origins from sequence data. J. Mol. Evol. 34:153–162.[ISI][Medline]

    Lockhart, P. J., M. A. Steel, M. D. Hendy, and D. Penny. 1994. Recovering evolutionary trees under a more realistic model of sequence evolution. Mol. Biol. Evol. 11:605–612.[Free Full Text]

    Maddison, W. P., and D. R. Maddison. 1996. MacClade: analysis of phylogeny and character evolution. Version 3.06. Sinauer, Sunderland, Mass.

    Maniatis T. E., F. Fritsch, and J. Sambrook. 1989. Molecular cloning: a laboratory manual. Cold Spring Harbor Laboratories, Cold Spring Harbor, N.Y.

    Martin, J. A., and D. P. Pashley. 1992. A molecular systematic analysis of butterfly family and some subfamily relationships (Lepidoptera: Papilionoidea). Ann. Entomol. Soc. Am. 85:127–139.[ISI]

    Mason-Gamer, R. J., C. F. Weil, and E. A. Kellogg. 1998. Granule-bound starch synthase: structure, function, and phylogenetic utility. Mol. Biol. Evol. 15:1658–1673.[Abstract/Free Full Text]

    Naylor, G. J. P., T. M. Collins, and W. M. Brown. 1995. Hydrophobicity and phylogeny. Nature 373:565–566.

    OrtÍ, G., and A. Meyer. 1996. Molecular evolution of Ependymin and the phylogenetic resolution of early divergences among euteleost fishes. Mol. Biol. Evol. 13:556–573.[Abstract]

    Perrimon, N. 1996. Serpentine proteins slither into the wingless and hedgehog fields. Cell 86:513–516.

    Rijsewijk, F., M. Schuermann, E. Wagenaar, P. Parren, D. Weigel, and R. Nusse. 1987. The Drosophila homolog of the mouse mammary oncogene int-1 is identical to the segment polarity gene wingless. Cell 50:649–657.

    Robbins, R. K. 1988a. Comparative morphology of the butterfly foreleg coxa and trochanter (Lepidoptera) and its systematic implications. Proc. Entomol. Soc. Wash. 90:133–154.

    ———. 1988b. Male foretarsal variation in Lycaenidae and Riodinidae and the systematic placement of Styx infernalis (Lepidoptera). Proc. Entomol. Soc. Wash. 90:356–368.

    Saccone, C., G. Pesole, and G. Preparata. 1989. DNA microenvironments and the molecular clock. J. Mol. Evol. 29:407–411.[ISI][Medline]

    Saitou, N., and M. Nei. 1987. The neighbor-joining method: a new method for reconstructing trees. Mol. Biol. Evol. 4:406–425.[Abstract]

    Scoble, M. J. 1986. The structure and affinities of the Hedyloidea: a new concept of the butterflies. Bull. Br. Mus. Nat. Hist. Entomol. Ser. 53:251–286.

    ———. 1988. Hedylidae: a response to Weintraub and Miller. Cladistics 4:93–96.

    Scott, J. A., and D. M. Wright. 1990. Butterfly phylogeny and fossils. Pp. 152–208 in O. Kudrna, ed. Butterflies of Europe. Vol. 2. Aula-Verlag, Wiesbaden, Germany.

    Shields, D. C., P. M. Sharp, D. G. Higgins, and F. Wright. 1988. "Silent" sites in Drosophila genes are not neutral: evidence of selection among synonymous codons. Mol. Biol. Evol. 5:704–716.[Abstract]

    Sidow, A. 1992. Diversification of the Wnt gene family on the ancestral lineage of vertebrates. Proc. Natl. Acad. Sci. 89:5098–5102.[Abstract]

    Steel, M. A., P. J. Lockhart, and D. Penny. 1993. Confidence in evolutionary trees from biological sequence data. Nature 364:440–442.

    Sullivan, J., J. A. Markert, and W. Kilpatrick. 1997. Phylogeography and molecular systematics of the Peromyscus aztecus species group (Rodentia: Muridae) inferred using parsimony and likelihood. Syst. Bio. 46(3):426–440.

    Swofford, D. L. 1998. PAUP*: phylogenetic analysis using parsimony and other methods. Test versions 4.0.0 d54, d59. Laboratory of Molecular Systematics, Smithsonian Institution, Washington, D.C.

    Swofford, D. L., G. J. Olsen, P. J. Waddell, and D. M. Hillis. 1996. Phylogenetic inference. Pp. 407–514 in D. Hillis, C. Moritz, and B. K. Mable, eds. Molecular systematics. 2nd edition. Sinauer, Sunderland, Mass.

    Taylor, P. L. 1991. Gene Jockey. Version 1.20. BIOSOFT, Cambridge, England.

    Wakeley, J. 1996. The excess of transitions among nucleotide substitutions: new methods of estimating transition bias underscore its significance. TREE 11:158–163.

    Weller, S. J., D. P. Pashley, and J. A. Martin. 1996. Reassessment of butterfly family relationships using independent genes and morphology. Ann. Entomol. Soc. Am. 89:184–192.[ISI]

    Wright, F. 1990. The ‘effective number of codons' used in a gene. Gene 87:23–29.

    Yang, Z. 1993. Maximum likelihood estimation of phylogeny from DNA sequences when substitution rates differ over sites. Mol. Biol. Evol. 10:1396–1401.[Abstract]

    ———. 1994. Estimating the pattern of nucleotide substitution. J. Mol. Evol. 39:105–111.[ISI][Medline]

    ———. 1996. Among-site rate variation and its impact on phylogenetic analyses. TREE 11:367–372.

    ——— 1997. Phylogenetic analysis by maximum likelihood (PAML). Version 1.3. Department of Integrative Biology, University of California at Berkeley.

    ———. 1998. On the best evolutionary rate for phylogenetic analysis. Syst. Biol. 47:125–133.[ISI][Medline]

    Yang, Z., N. Goldman, and A. Friday. 1994. Comparison of models for nucleotide substitution used in maximum-likelihood phylogenetic estimation. Mol. Biol. Evol. 11:316–324.[Abstract]

Accepted for publication January 6, 2000.