Cirripede Phylogeny Using a Novel Approach: Molecular Morphometrics

Bernard Billoud*, Marie-Anne Guerrucci†, Monique Masselot† and Jean S. Deutsch†‡

*Atelier de BioInformatique,
†Service Commun de Bio-Systématique, and
‡Equipe Développement et Évolution, Biologie Moléculaire et Cellulaire du Développement, Université Pierre et Marie Curie, Paris, France


    Abstract
We present a new method that uses nucleic acid secondary structure to assess phylogenetic relationships among species. In this method, which we term "molecular morphometrics," the measurable structural parameters of the molecules (geometrical features, bond energies, base composition, etc.) are used as characters to construct a phylogenetic tree. The method draws on both traditional morphological comparison and molecular sequence comparison. Applied to the phylogeny of Cirripedia, molecular morphometrics supports the most recent morphological analyses, which argue for the monophyly of Cirripedia sensu stricto (Thoracica + Rhizocephala + Acrothoracica). As a control, a classical multiple alignment was also performed, with and without using the structural information to realign the sequence segments considered in the molecular morphometrics analysis. These analyses yielded the same tree topology as the direct use of structural characters as a phylogenetic signal. By taking the secondary structure of nucleic acids into account, the new method makes it possible to use regions in which multiple alignments are poorly reliable because of numerous insertions and deletions. It thus appears to be complementary to classical primary sequence analysis in phylogenetic studies.


    Introduction
Many methods have been developed to assess phylogenetic relationships among species using their nucleic acid secondary structures. Most of these methods exploit known structural features and take advantage of compensatory changes (Engberg et al. 1990) to help build a reliable sequence alignment (Hendriks et al. 1991; Corpet and Michot 1994; Kjer 1995). This alignment then provides the characters used to infer a phylogeny (for examples, see Xiong and Kocher 1993; Fan et al. 1994; Ellis and Morrison 1995; Haase et al. 1995; Leblanc et al. 1995; Liu, Kato, and Sugane 1997; Morrison and Ellis 1997). The secondary structure is thus used only indirectly.

In contrast, the basic idea behind molecular morphometrics is to use the molecular structures as a direct source of measurable information. This method is based on the assumption that secondary structure can be phylogenetically as significant as primary sequence. In other words, one can consider the secondary-structure elements of RNA molecules, i.e., the helices, loops, bulges, and separating single-stranded portions, as phylogenetic characters.

For this study, we focused on rRNA molecules, which are widely used in comparisons between primary and secondary structure information. rRNA structure is well known to be highly conserved throughout evolution (Zwieb, Glotz, and Brimacombe 1981) despite primary sequence divergence (Michot, Qu, and Bachellerie 1990), presumably because most of the folding is functionally essential (Wheeler and Honeycutt 1988). Mutations are not evenly distributed throughout the rRNA molecule but are concentrated in a few highly variable regions termed "expansion segments" (Hassouna, Michot, and Bachellerie 1984). These regions differ from the more conserved "core" in evolving faster than other parts of the ribosome (Larson and Wilson 1989), which results in greater size variability of their structural elements.

As a case study, we applied the molecular morphometrics method to the phylogenetic analysis of Cirripedia. The choice of this subclass of Crustacea was prompted by (1) the availability of several recently published sequences of the nuclear gene encoding the small-subunit (SSU) rRNA (Spears, Abele, and Applegate 1994) and (2) the numerous phylogenetic problems posed by these organisms, which previous molecular analyses have left unsolved. Cirripedes are sessile and sometimes parasitic organisms. They were recognized as crustaceans only when Thomson discovered their typical nauplius larvae in 1828 (see Winsor 1969). They comprise three superorders: the Thoracica, the Acrothoracica, and the Rhizocephala. Thoracican and acrothoracican adults share a common body plan comprising six thoracic segments bearing limbs modified into cirri, hence the name. They lack any complete abdominal segment. The Thoracica are divided into two orders: Sessilia (e.g., Balanus) and Pedunculata (e.g., Lepas). All Rhizocephala (e.g., Loxothylacus) are parasites with an amorphous adult body lacking any trace of segmentation. Nevertheless, they are usually classified among the Cirripedia on the basis of their larval morphology, which closely resembles that of bona fide cirripedes. In particular, they share with the Thoracica and Acrothoracica a typical larval stage, the cypris. The Ascothoracica differ from the three superorders just mentioned in that (1) their nauplius larvae lack the typical frontal horns present in the other species, and (2) they possess an abdomen, basically composed of five segments, the number of which is often reduced in these parasitic species. As a result, the debate bears on the following points: (1) whether the Ascothoracica should be included within the Cirripedia (Schram and Høeg 1995) and (2) whether the Rhizocephala belong to the Cirripedia (Høeg 1992).
Minor points concern (1) the possible para- or polyphyly of the Thoracica as a whole, or of the Sessilia or Pedunculata, and (2) the order of emergence of the various superorders (Anderson 1994; Mizrahi et al. 1998).


    Materials and Methods
Sequences
The nucleotide sequences (about 1,800 nt in length) of the 18S ribosomal RNAs (rRNAs) of 11 cirripede species (Spears, Abele, and Applegate 1994) were extracted from the EMBL database. Two branchiopods, Artemia salina (Nelles et al. 1984) and Branchinecta packardi (Spears, Abele, and Applegate 1994), were added as outgroups. Accession numbers and references for all sequences are reported in table 1.


Table 1 List of the Sequences Used in the Phylogenetic Analysis

 
Molecular Morphometrics
Choice of the Zones of Interest
We selected the SSU RNA segments in which variation was due to insertion/deletion events and extracted them from the global alignment of the SSU RNA structure compilation (Van de Peer et al. 1999). As expected, all of the retained segments corresponded to regions known as "divergent regions" (Hassouna, Michot, and Bachellerie 1984). More precisely, we considered the regions named 10, E10, 12, 16, 17, E23, 28-29-30, 42-43-44, 45-46, and 49 (fig. 1) according to the general structural model (Van de Peer et al. 1999). The corresponding regions were then extracted from the cirripede sequences, leaving aside all alignment information.



Fig. 1.—Schematic representation of the consensus folding of eukaryotic 18S rRNA. Single strands appear as thin lines, and helices appear as bold lines. Gray lines represent regions without a consensus folding. Regions used for molecular morphometrics in this study are boxed.

 
Folding and Structural Parameters
The structural model of Drosophila melanogaster 18S rRNA (Neefs and De Wachter 1990) was used to describe the retained folding patterns, D. melanogaster being the closest relative of cirripedes in the Collection of Small Subunit Ribosomal RNA Structures (Gutell 1994). The Palingol searching language (Billoud, Kontic, and Viari 1996) was used to formally express structural descriptions of the divergent regions listed above (fig. 1) and to find their occurrences in the crustacean sequences (fig. 2A). The flexibility of the search parameters, i.e., the amount of variability admitted for each descriptive numerical value or consensus sequence pattern, was adjusted until a single occurrence of the structure was found in each sequence.
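The Palingol descriptors themselves are not reproduced here. As a rough illustration of how widening or narrowing the admitted variability changes the number of occurrences found, the following minimal Python sketch scans a sequence for hairpins (stem-loops) whose stem and loop lengths fall within given ranges. It is not Palingol and not the authors' descriptors: the function name `find_hairpins`, the accepted pairs (Watson-Crick plus G·U), and the default ranges are assumptions chosen for illustration.

```python
# Hypothetical sketch: a naive hairpin (stem-loop) scanner, NOT the Palingol language.
# Assumptions: Watson-Crick pairs plus G-U wobble; stem and loop lengths are
# searched within inclusive (min, max) ranges.
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def find_hairpins(seq, stem=(4, 6), loop=(3, 8), max_mismatch=0):
    """Return (start, stem_length, loop_length) for every hairpin matching the ranges.

    A hit is a stretch seq[start:end] whose first `stem_length` bases pair with its
    last `stem_length` bases (read in reverse), enclosing `loop_length` unpaired
    bases, with at most `max_mismatch` non-pairing positions in the stem.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA or RNA input
    hits = []
    for start in range(len(seq)):
        for s in range(stem[0], stem[1] + 1):
            for l in range(loop[0], loop[1] + 1):
                end = start + 2 * s + l  # exclusive end of the stem-loop
                if end > len(seq):
                    break  # longer loops would overrun the sequence as well
                mismatches = sum(
                    (seq[start + k], seq[end - 1 - k]) not in PAIRS for k in range(s)
                )
                if mismatches <= max_mismatch:
                    hits.append((start, s, l))
    return hits


if __name__ == "__main__":
    segment = "GGGGAAAACCCC"
    print(find_hairpins(segment))                        # strict stem range
    print(find_hairpins(segment, stem=(3, 6)))           # relaxed stem range
```

On this toy segment the strict search returns a single hit, whereas lowering the minimum stem length to 3 yields several overlapping candidates. Tightening the parameters until exactly one occurrence remains per sequence mirrors, in a much simplified form, the tuning procedure described above.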



Fig. 2.—The molecular morphometrics method illustrated on a reduced example (four sequences, four characters). A, Unaligned sequences are folded according to a structural model. B, Variable parameters are measured on the secondary structures. C, The characters for each sequence are stored in a table. D, Distance analysis involves the computation of a distance matrix. E, Neighbor joining leads to the distance tree. F, For parsimony analysis, the character matrix is encoded into a Nexus file. G, Tree length minimization leads to the parsimony tree.

 
As characters, we considered the number of nucleotides involved in each substructure (fig. 2B). This value, obtained after folding the sequences, is simply the length of either a double-stranded (helix) or a single-stranded (loop, bulge, separation segment) part. An output table indicates the size of all the structural elements considered for each species (fig. 2C).
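The measurement step can be sketched in Python, assuming (this is not stated in the text) that each folded segment is available as a pseudoknot-free dot-bracket string. The function name `measure_elements` is hypothetical. A helix is taken to be a maximal run of stacked base pairs, and its size counts the nucleotides of both strands.

```python
from itertools import groupby


def measure_elements(dot_bracket):
    """Sizes, in nucleotides, of the helices and single-stranded runs of one structure.

    Hypothetical helper. Returns (helix_sizes, single_strand_sizes), each listed in
    5'->3' order of their first nucleotide. A helix is a maximal run of stacked pairs
    (i, j), (i+1, j-1), ... and its size counts both strands (2 nt per base pair).
    """
    stack, partner = [], {}
    for i, symbol in enumerate(dot_bracket):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            j = stack.pop()
            partner[i], partner[j] = j, i
    if stack:
        raise ValueError("unbalanced structure")

    helices = []
    for i, symbol in enumerate(dot_bracket):
        if symbol != "(":
            continue
        stacked = i > 0 and dot_bracket[i - 1] == "(" and partner[i - 1] == partner[i] + 1
        if stacked:
            helices[-1] += 2  # extends the helix opened just before position i
        else:
            helices.append(2)  # first base pair of a new helix

    # Loops, bulges and separating segments: maximal runs of unpaired positions.
    singles = [len(list(run)) for symbol, run in groupby(dot_bracket) if symbol == "."]
    return helices, singles
```

For instance, `measure_elements("..((((...))))..")` returns `([8], [2, 3, 2])`. If the outermost pair is disrupted, `"...(((...)))..."`, the result becomes `([6], [3, 3, 3])`: the helix loses two nucleotides while both flanking single-stranded runs gain one. This is the kind of correlation between characters discussed under Character Independence.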

Standardization
Two standardization procedures were used, each with its own goal. Individual standardization, in which each character is centered and scaled by its own standard deviation, equalized the relative contributions of all characters regardless of their respective numerical values. Alternatively, we took into account the fact that single-stranded segments and helices may evolve at different rates (see Results and Discussion). In this standardization scheme, all characters of the same nature (single or double strand) were weighted identically, with the weight being proportional to the inverse of the evolutionary rate. As the weighting ratio, we used (1) the extreme values found in studies of related sequences, where computed evolutionary rate ratios range from 0.61 (Springer, Hollar, and Burk 1995) to 0.8 (Dixon and Hillis 1993), and (2) an estimate of this ratio from our own data: we averaged the standard deviations computed individually for each double-strand character and divided this mean by its single-strand counterpart, obtaining a ratio of 0.653. This differential weighting gave the two sets of characters the same overall contribution to the tree. Both distance and parsimony analyses were performed with and without standardization.
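The two schemes can be sketched as follows. This is a minimal illustration, not the authors' C programs. The function names are hypothetical, and the use of the population standard deviation (`pstdev`) rather than the sample one is an assumption, since the text does not specify which was used.

```python
from statistics import mean, pstdev


def standardize(column):
    """Individual standardization: center on the mean, scale by the standard deviation."""
    m, s = mean(column), pstdev(column)
    if s == 0:
        return [0.0 for _ in column]  # a constant character carries no signal
    return [(x - m) / s for x in column]


def class_weights(matrix, is_helix, ratio=None):
    """Class weighting of single- versus double-stranded characters (hypothetical helper).

    matrix   -- list of rows (one per taxon), one column per character
    is_helix -- one boolean per character (True = double-stranded)
    ratio    -- rate(double) / rate(single); if None, estimated as the mean standard
                deviation of helix characters divided by that of single-strand ones
    Returns (ratio, weights): single-stranded characters get weight 1 and
    double-stranded ones weight 1 / ratio, i.e. inversely proportional to rate.
    """
    sds = [pstdev(col) for col in zip(*matrix)]
    if ratio is None:
        ds = mean(sd for sd, h in zip(sds, is_helix) if h)
        ss = mean(sd for sd, h in zip(sds, is_helix) if not h)
        ratio = ds / ss
    return ratio, [1.0 / ratio if h else 1.0 for h in is_helix]
```

Passing `ratio=0.61` or `ratio=0.8` reproduces the literature extremes cited above, while leaving `ratio=None` estimates the ratio from the data, as was done to obtain 0.653.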

Distance Computation and Neighbor Joining
Character matrices were bootstrapped 1,000 times (Felsenstein 1985). Trees were then computed by the neighbor-joining method (Saitou and Nei 1987) in the MUST package (Philippe 1993), and bootstrap values were computed for each node of the best tree obtained from the whole set of characters (fig. 2E).
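The MUST implementation is not reproduced here. The following pure-Python sketch shows how bootstrap proportions can be attached to the nodes of a neighbor-joining tree. It assumes the Euclidean distance mentioned in the Results and a character matrix given as a dict mapping each taxon to its list of (standardized) character values. All function names are hypothetical. Each tree is reduced to its set of splits, which is enough to count how often each node of the full-data tree reappears among replicates.

```python
import random
from itertools import combinations
from math import dist  # Euclidean distance between two points (Python 3.8+)


def distance_matrix(matrix):
    """Euclidean distances between all pairs of taxa (rows of the character matrix)."""
    return {a: {b: dist(matrix[a], matrix[b]) for b in matrix} for a in matrix}


def nj_splits(d, ref):
    """Topology of the neighbor-joining tree built from the dict-of-dicts `d`.

    `d[a][b]` must be symmetric with `d[a][a] == 0`. Each nontrivial split of the
    unrooted tree is returned as the frozenset of taxa on the side without `ref`.
    """
    leaves = frozenset(d)
    D = {frozenset([a]): {frozenset([b]): v for b, v in row.items()} for a, row in d.items()}
    splits = set()
    while len(D) > 3:
        n = len(D)
        r = {i: sum(D[i].values()) for i in D}  # row sums (self-distance is 0)
        i, j = min(combinations(D, 2),
                   key=lambda p: (n - 2) * D[p[0]][p[1]] - r[p[0]] - r[p[1]])
        u = i | j  # new internal node grouping clusters i and j
        D[u] = {k: (D[i][k] + D[j][k] - D[i][j]) / 2 for k in D if k not in (i, j)}
        D[u][u] = 0.0
        for k in D:
            if k not in (i, j, u):
                D[k][u] = D[u][k]
                del D[k][i], D[k][j]
        del D[i], D[j]
        side = leaves - u if ref in u else u
        if 1 < len(side) < len(leaves) - 1:
            splits.add(side)
    return splits


def bootstrap_support(matrix, n_rep=1000, seed=0):
    """Bootstrap proportion of each split of the NJ tree built from all characters."""
    taxa = list(matrix)
    ref = taxa[0]
    n_char = len(matrix[ref])
    best = nj_splits(distance_matrix(matrix), ref)
    rng = random.Random(seed)
    counts = dict.fromkeys(best, 0)
    for _ in range(n_rep):
        # Resample characters (columns) with replacement.
        cols = [rng.randrange(n_char) for _ in range(n_char)]
        replicate = {t: [matrix[t][c] for c in cols] for t in taxa}
        for s in nj_splits(distance_matrix(replicate), ref) & best:
            counts[s] += 1
    return {s: c / n_rep for s, c in counts.items()}
```

Because only splits are compared, the sketch ignores branch lengths. That is sufficient for reporting bootstrap proportions on the nodes of the reference tree.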

Parsimony: Character Encoding and Weighting
The character states are quantitative and discrete. They were encoded in the NEXUS format (Swofford 1991) as ordered and unoriented characters (fig. 2F). The step count for each considered event is therefore independent of the number of nucleotides it involves. We also explored another encoding, which takes the actual quantitative nature of the characters into account: numerical values (ranging from 0 to 19 within the whole table) were encoded as character states represented by the letters A (0) to T (19), and the number of steps required for each pairwise change of state was computed as the absolute difference between the corresponding numerical values. Trees were then computed with the PAUP package (Swofford 1991) using the branch-and-bound method (fig. 2G).
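The second encoding can be illustrated with a short Python sketch. It maps sizes 0 to 19 onto the symbols A to T and scores one fixed tree with a step cost equal to the absolute size difference. The scoring uses Sankoff's dynamic-programming algorithm, which is our choice for illustration and not a description of PAUP's internals. The function names are hypothetical, and the branch-and-bound search over tree topologies is not shown. Because the cost is symmetric, rooting the tree arbitrarily does not change its length, which is consistent with unoriented characters.

```python
import string

N_STATES = 20  # sizes 0..19, encoded 'A'..'T'


def encode(value):
    """Map an integer size (0..19) to its state symbol 'A'..'T'."""
    if not 0 <= value < N_STATES:
        raise ValueError(f"value {value} outside 0..{N_STATES - 1}")
    return string.ascii_uppercase[value]


def step_cost(a, b):
    """Steps for a change between two states: absolute difference of the sizes."""
    return abs(a - b)


def character_length(tree, states):
    """Minimum number of steps for one character on a fixed tree (Sankoff algorithm).

    tree   -- nested tuples of taxon names, e.g. (("A", "B"), ("C", "D"))
    states -- dict mapping taxon name to its integer state
    """
    inf = float("inf")

    def cost_vector(node):
        if isinstance(node, str):  # leaf: only the observed state costs nothing
            return [0 if s == states[node] else inf for s in range(N_STATES)]
        children = [cost_vector(child) for child in node]
        return [
            sum(min(cv[t] + step_cost(s, t) for t in range(N_STATES)) for cv in children)
            for s in range(N_STATES)
        ]

    return min(cost_vector(tree))


def tree_length(tree, matrix):
    """Sum of character lengths; matrix maps taxon -> list of integer states."""
    n_char = len(next(iter(matrix.values())))
    return sum(character_length(tree, {t: row[c] for t, row in matrix.items()})
               for c in range(n_char))
```

For a character with sizes 2, 3, 7, and 7 on the tree ((A, B), (C, D)), the length is 5 steps. One step is needed within (A, B) and four on the path linking the two pairs, so a change of several nucleotides is counted as several steps under this encoding.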

Primary Sequence Analysis
Multiple Alignment
A multiple alignment of the 13 sequences was first performed using CLUSTAL W (Thompson, Higgins, and Gibson 1994) and then refined by eye with the editing facilities of the MUST package (Philippe 1993).

Sequence Realignment
The regions containing many gaps, which were difficult to align unambiguously in the first step, were reanalyzed using secondary-structure information: the same structural information as that used for the molecular morphometrics analysis (see above) served to refine the alignment of these sequence segments, thus modifying the initial multiple alignment (fig. 3).



Fig. 3.—Nucleotide sequences of the 18S rRNA alignment. Dashes indicate deletions, and dots indicate nucleotides identical to those of Artemia salina. Segments used in the molecular morphometrics analysis are indicated by double arrows.

 
Distance
Distances calculated from transitions only were compared with distances calculated from transversions only. No significant saturation was observed (data not shown). The distance matrix could therefore be computed from all 1,926 sites, including transitions, transversions, and deletions. The tree was constructed by the neighbor-joining method (Saitou and Nei 1987), and bootstrap values for its nodes were computed with the MUST package (Philippe 1993) using 1,000 resamplings.
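The text does not state which distance formula was used for the saturation check. The minimal sketch below therefore counts raw proportions of transitions and transversions (uncorrected p-distances) between two aligned sequences. Positions with a gap or an ambiguity code in either sequence are skipped. The function name is hypothetical.

```python
PURINES = frozenset("AG")
NUCLEOTIDES = frozenset("ACGT")


def ts_tv_proportions(seq1, seq2):
    """Proportions of transitions and transversions between two aligned sequences.

    Positions with a gap or an ambiguity code in either sequence are skipped.
    Returns (p_transitions, p_transversions, n_compared).
    """
    s1 = seq1.upper().replace("U", "T")
    s2 = seq2.upper().replace("U", "T")
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned (same length)")
    ts = tv = n = 0
    for a, b in zip(s1, s2):
        if a not in NUCLEOTIDES or b not in NUCLEOTIDES:
            continue
        n += 1
        if a == b:
            continue
        if (a in PURINES) == (b in PURINES):
            ts += 1  # purine<->purine or pyrimidine<->pyrimidine
        else:
            tv += 1  # purine<->pyrimidine
    if n == 0:
        return 0.0, 0.0, 0
    return ts / n, tv / n, n
```

Computing these two proportions for every pair of taxa and plotting them against overall divergence is a common way to look for saturation. If transitions level off while transversions keep increasing, multiple substitutions are being hidden.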

Parsimony
Parsimony analysis was performed using the PAUP package (Swofford 1991). The branch-and-bound algorithm led to a single shortest tree based on 590 informative sites. Bootstrapping was performed on 1,000 replicates, also using the branch-and-bound algorithm.

Software
Two computer programs were developed for the molecular morphometrics analysis: one performs bootstrapping and computes distance matrices for distance analysis, and the other encodes the characters in the NEXUS format for parsimony analysis. Options in these programs allow the user to choose among different distance and parsimony computation methods and to decide whether to standardize individual characters and sets of characters. The programs are available as C sources at the URL http://wwwabi.snv.jussieu.fr/~billoud/MoMo/momo.html.


    Results
Molecular Morphometrics
Thirteen crustacean species (table 1) were classified using the parameters describing their 18S RNA expansion segments (fig. 1) as characters (fig. 2). We obtained 119 variable characters (98 of which were informative under the parsimony criterion): 87 variable (70 informative) within single-stranded regions and 32 variable (28 informative) within double-stranded regions. These two data sets were first considered separately, and then all characters were analyzed together.
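The counts of variable and parsimony-informative characters follow the standard definitions. A character is variable if at least two states occur. It is parsimony-informative if at least two states are each shared by two or more taxa. The minimal Python sketch below uses hypothetical function names. States may be structural sizes or aligned nucleotides, and a gap counts as an ordinary state in the latter case.

```python
from collections import Counter


def is_variable(states):
    """A character is variable if at least two different states are observed."""
    return len(set(states)) > 1


def is_informative(states):
    """Parsimony-informative: at least two states each shared by two or more taxa."""
    shared = [state for state, count in Counter(states).items() if count >= 2]
    return len(shared) >= 2


def count_characters(matrix):
    """Return (n_variable, n_informative) for a dict taxon -> list of states."""
    columns = list(zip(*matrix.values()))
    return sum(map(is_variable, columns)), sum(map(is_informative, columns))
```

A character seen as sizes (5, 5, 6, 7) in four taxa is variable but not informative. A single taxon with its own state adds steps on every tree, so it cannot favor one topology over another.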

The tree presented in the left part of figure 4 was computed by neighbor joining (Saitou and Nei 1987) on a Euclidean distance matrix after standardization. The acrothoracican species (Berndtia and Trypetesa) were found to form a monophyletic taxon with the other cirripedes. This partition was supported by a high bootstrap proportion (BP) (89%); the existence of a clade (Cirripedia sensu stricto + Loxothylacus) was also strongly supported (100%). Within the Thoracica, the Sessilia were grouped together, while the Pedunculata were not; however, these internal nodes received weak (<60%) bootstrap support.



Fig. 4.—Phylogenetic trees obtained from the molecular morphometrics data. S = Sessilia; P = Pedunculata; R = Rhizocephala; Ac = Acrothoracica; As = Ascothoracica; O = outgroups (Branchiopoda). Horizontal lines are proportional to branch lengths. Left, Tree computed by the neighbor-joining method (Saitou and Nei 1987) from a distance matrix based on individually weighted characters. Numbers indicate the bootstrap proportion (1,000 resamplings) for each node; only values greater than 60% are reported. Right, One of the five maximum-parsimony trees (length = 320; consistency index = 0.663; retention index = 0.749) obtained by encoding all characters as ordered, using the branch-and-bound method in PAUP (Swofford 1991). The tree shown is the one with the highest bootstrap support. Bootstrap proportions higher than 50% are indicated (1,000 resamplings).

 
The parsimony tree presented in the right part of figure 4 was obtained after encoding the molecular morphometric parameters as ordered characters. Standardization was based on either the standard deviation of each character or the type of region in which the character was observed (single-stranded or helix, when all characters were used together). In the latter case, we computed trees with the extreme values found in the literature for the relative mutation rates, 0.61 (Springer, Hollar, and Burk 1995) and 0.80 (Dixon and Hillis 1993), and with the rate computed from our own data, 0.653. This weighting value was not a critical parameter, since the trees built from single-stranded or double-stranded characters gave congruent results; only some branch lengths varied moderately. All five best trees (retention index [RI] = 0.749) showed the Acrothoracica as the sister group to a Rhizocephala + Thoracica clade (94% BP). The RI of the shortest tree in which the ascothoracican Ulophysema and the two acrothoracican species formed a clade was 0.733. Loxothylacus was always found as a sister group to the Cirripedia sensu stricto (100% BP). Internal branching within the Thoracica showed a Pedunculata group, but it received weak support except for the (Calantica, Lepas) node (BP = 84%).
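To make the consistency and retention indices concrete, here is a minimal sketch of their standard definitions for ordered characters. CI = M/S and RI = (G − S)/(G − M). Here S is the observed tree length, M is the sum of the minimum possible steps per character (the range of the sizes), and G is the sum of the steps on an unresolved star tree (total distance to a median state). The function names are hypothetical. PAUP can also report indices that exclude uninformative characters, so these values need not match its output exactly.

```python
from statistics import median_low


def min_steps(states):
    """m: fewest possible steps for an ordered character on any tree."""
    return max(states) - min(states)


def star_steps(states):
    """g: steps on an unresolved (star) tree, whose best center is a median state."""
    center = median_low(states)
    return sum(abs(x - center) for x in states)


def ci_ri(observed_length, matrix):
    """Ensemble CI = M/S and RI = (G - S)/(G - M) for ordered characters.

    observed_length -- S, the length of the tree being evaluated
    matrix          -- dict taxon -> list of integer character states
    """
    columns = list(zip(*matrix.values()))
    m = sum(min_steps(c) for c in columns)
    g = sum(star_steps(c) for c in columns)
    ci = m / observed_length
    ri = (g - observed_length) / (g - m) if g != m else float("nan")
    return ci, ri
```

A lower RI, as for the tree grouping Ulophysema with the acrothoracicans (0.733 versus 0.749), means that this tree retains less of the possible synapomorphy in the data.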

Primary Sequence Analysis
The initial multiple alignment was 2,163 positions long. After refinement according to the secondary structures, the alignment was shortened by 237 sites, to 1,926 positions. Among these, 781 were variable and 590 were informative under the parsimony criterion. In further analyses, gaps were treated as a fifth character state. Distance and parsimony methods applied to the aligned sequences yielded very similar tree topologies (fig. 5). Both the neighbor-joining tree and the most parsimonious tree (RI = 0.836) supported the morphological views arguing for the monophyly of Cirripedia sensu stricto (Thoracica + Rhizocephala + Acrothoracica) (92% and 86% BP, respectively). Similarly, there was strong support (100% BP) for a sister relationship between the Rhizocephala, represented by Loxothylacus, and the Thoracica. The monophyly of the Thoracica (Pedunculata + Sessilia) and that of the Sessilia were also supported (100% and 95% BP, respectively). The monophyly of the Pedunculata was not supported (fig. 5).



Fig. 5.—Phylogenetic relationships between cirripedes inferred from the 18S rRNA primary sequences. S = Sessilia; P = Pedunculata; R = Rhizocephala; Ac = Acrothoracica; As = Ascothoracica; O = outgroups (Branchiopoda). A, Tree constructed using the neighbor-joining method (Saitou and Nei 1987). Pairwise distances are based on 1,926 retained sites. Branch lengths reflect distances between taxa. Numbers above internal branches indicate the bootstrap proportion of the corresponding node based on 1,000 resamplings. B, Tree generated using the maximum-parsimony method with exhaustive search and equal character weighting, using 590 informative sites in PAUP (Swofford 1991). Branch lengths reflect the number of steps (length = 1,298; consistency index = 0.790; retention index = 0.836). Numbers on the branches indicate the bootstrap proportions (%) based on 1,000 resamplings (branch-and-bound method).

 
It should be pointed out that the secondary-structure information used to improve the alignment was not solely responsible for the tree topology. A similar analysis performed on the initial alignment, with the hypervariable regions excluded (see Materials and Methods), led to a topology that agreed on the major points, which shows that the different character types carry a convergent phylogenetic signal.


    Discussion
Cirripede Phylogeny
Our results provide clear answers to the major phylogenetic issues raised above: the Ascothoracica are to be considered the sister taxon of a cirripedian clade composed of (Acrothoracica + (Thoracica + Rhizocephala)). This result is obtained with good statistical support by the molecular morphometrics analysis. It is in full agreement with the most recent phylogenetic analyses based on morphological data (Schram and Høeg 1995), as well as with molecular analyses performed on completely different data sets, namely the complement of developmental genes (Gibert, Mouchel-Vielh, and Deutsch 1997; Mouchel-Vielh et al. 1998). Analysis of the SSU RNA sequences by Spears, Abele, and Applegate (1994) led to an (Ascothoracica + Acrothoracica) clade, which appears to be an artifact: in our hands, primary sequence analysis of the same data set instead confirmed the results obtained with molecular morphometrics. Monophyly of the Thoracica also appears to be strongly supported, at least given the present species sampling. In contrast, monophyly of the Sessilia does not receive significant support from these data.

Molecular Morphometrics
In many studies that use secondary-structure analysis to infer phylogenies, RNA folding serves to refine the alignment. Although the "correct" alignment cannot be determined with certainty, this procedure has been shown to improve the multiple alignment of extensively folded molecules, and hence the phylogeny reconstruction (Titus and Frost 1996). In such cases, the secondary structure remains an intermediate step in the sequence alignment process; the phylogeny is based on nucleotide comparison or, as recently proposed, on base pair changes in stems (Otsuka, Terai, and Nakano 1999).

We present here a new method, called molecular morphometrics, in which the measurable structural parameters of the molecules are used directly as characters to construct a phylogenetic tree. These structures are inferred from the nucleotide sequence, often by energy minimization (Zuker 1994; Gaspin and Westhof 1995; Zuker and Jacobson 1998; Rivas and Eddy 1999), or fitted to a known model (Billoud, Kontic, and Viari 1996). Moreover, since the secondary structures are built on each sequence separately, no precise sequence alignment needs to be computed. Recognizing homologous characters on these structures then appears to be easier than finding the right counterpart for each nucleotide in every other sequence.

Character Independence
Once a structural model is established, several types of characters can provide phylogenetic information. Total-evidence studies use a nonnumerical description of the molecules together with morphological traits (Winnepenninckx, Reid, and Backeljau 1998). For molecular morphometrics analysis, numerical characters can be either continuous (atomic distances or angles, pairing energies, relative simplicity factor, etc.) or discrete (number of elements or monomers involved in a structural feature, etc.). These variables are clearly interdependent and cannot all be used simultaneously in a given study. In this work, we chose simply to count the number of nucleotides involved in each substructure. The unitary modifications required to change from one structure to another are basically insertions and deletions, as these events directly modify the length of structural elements. Substitutions, however, can also lead to such variations, for instance, by allowing a new base pairing between the two bases at the foot of a helix. Special care must therefore be taken to account for correlations between characters: including a nucleotide in (or excluding a nucleotide from) a helix can require subtracting it from (respectively, adding it to) the adjacent single-stranded segment(s). For instance, if a base substitution at the foot of a helix disrupts the last base pair, this single event increases the size of both adjacent single-stranded regions. In such cases, only one of the correlated characters can be used; this is what we did in the few cases we observed.

Other potential correlations may involve long-distance interactions within the secondary structure of ribosomal RNA. Expansion segments seem to be subject to some selective pressure that maintains their particular structure at both the secondary (Engberg et al. 1990; Ruiz Linares, Hancock, and Dover 1991) and the tertiary (Sweeney, Chen, and Yao 1994) levels. Compensatory mutations were suspected to be related to intramolecular interactions (Expert-Bezançon and Wollenzien 1985; Haselman, Camp, and Fox 1989; Woese and Gutell 1989; Gutell, Larson, and Woese 1994) and/or extramolecular interactions (Gerbi 1986; Stern, Weiser, and Noller 1988; Thanaraj 1994), especially with the ribosomal proteins and other RNAs (Hancock, Tautz, and Dover 1988). However, there is, to date, no evidence for such interactions, and expansion segments seem to behave like "extraneous elements" of the ribosome (Nunn et al. 1996). Therefore, the variable regions, which are expected to provide the most phylogenetic information, are those in which functional constraints are most likely to be relaxed (Gerbi 1986).

Weighting and Standardizing Characters
Like most phylogenetic characters, the structural parameters studied here raise the problem of character weighting: mixing single-stranded and double-stranded regions in the same distance evaluation is questionable, as they are not expected to follow the same mutation mechanisms (Douzery and Catzeflis 1995) and may therefore evolve at different rates. In particular, the constraint to maintain pairing in stems seems to lower the rate at which transversions accumulate (Springer and Douzery 1996; Tillier and Collins 1998), resulting in a different base composition (Vawter and Brown 1993; Springer, Hollar, and Burk 1995). Nucleotide mutations in helices are usually considered more phylogenetically informative than those in single-stranded regions (Ellis and Morrison 1995; Morrison and Ellis 1997). However, authors disagree on the relative amount of evolution in the two data sets. In eukaryotic 5S rRNA, nucleotide changes were even found to be more frequent in stems than in loops (Otsuka, Nakano, and Terai 1997). Studying the 12S rRNAs of mammals, Springer, Hollar, and Burk (1995) computed that base substitution rates in single strands versus double strands are in a ratio of 1 to 0.61; Dixon and Hillis (1993) stated that in the 28S rRNAs of some vertebrates, evolutionary rates for single- and double-stranded regions did not differ by more than 20%. For molecular morphometrics characters, we can compare the evolutionary rates of single-stranded and double-stranded regions by computing the ratio of their mean standard deviations. We obtained a ratio of 1 to 0.653, which lies between these two values. When characters evolving at different rates are combined, proportional weighting is often used; its effect is to give each set of characters the same overall contribution to the final result.
The parsimony tree presented here was computed with such a weighting. Note, however, that in our case single-stranded and double-stranded regions taken separately gave the same tree topology, with only small differences in branch lengths. Any relative weighting of these two congruent sets is therefore expected to yield the same topology, so the exact rate value is not a critical issue.

Molecular Morphometrics Versus Anatomical Morphometrics and Sequence Comparison
Molecular morphometrics inherits some features from both anatomical quantitative morphometrics and molecular primary sequence comparison. The new method can therefore be compared with both of these methods.

An important difference between anatomical and molecular characters lies in the number and nature of their determinant genetic "sites": anatomical variations are often due to more than one gene and, more importantly, the genetic sites responsible for the morphological characters of interest are usually unknown. Molecular structural variations, by contrast, are due to a small set of identifiable mutations that can be characterized at the single-nucleotide level. Two independent molecular variations are thus known to be of the same evolutionary nature and are therefore comparable. Moreover, observed anatomical characters result both from the genetic characters themselves and from possible epigenetic effects, such as environmental influences, or from features of gene expression, such as sexual dimorphism. In contrast, like any molecular phylogeny method (including primary sequence comparison), molecular morphometrics takes advantage of the fact that molecular characters are independent of their somatic expression (Smith, Lafay, and Christen 1992).

Compared with sequence comparison, molecular morphometrics involves fewer characters, but each of these characters can take many different values. The two approaches differ mainly in one methodological respect: in a nucleotide sequence, structural polymorphism appears as size variation, mostly in regions where insertion/deletion events take place (although point mutations can also lead to structural polymorphism). These are precisely the regions in which multiple sequence alignment programs produce poorly reliable alignments, because the signal/noise ratio they provide is too low (Wheeler 1994; Grundy and Naylor 1999). In such cases, different candidate alignments can produce very different trees (Xiong and Kocher 1993). Usually, such regions are realigned by hand, as we did in this study. Although calibration methods have been developed to take differences in mutation rate into account (Van de Peer et al. 1993; Wheeler, Gatesy, and DeSalle 1995; Otsuka, Nakano, and Terai 1997), there is at present no method able to automatically produce an alignment close to the one that can be obtained by hand with the help of a structural model (Morrison and Ellis 1997). Therefore, regions subject to insertion/deletion events are often excluded altogether from sequence studies (Okamoto, Sekito, and Yoshida 1996; Liu, Kato, and Sugane 1997). As shown in the present work, the secondary structure, although it depends on the primary nucleotide sequence, can in fact be considered a distinct set of informative characters providing its own phylogenetic signal. It therefore makes sense to examine whether phylogenetic interpretations of data of different natures lead to the same result. Every object involved in the evolutionary process has its own mode of variation, owing to specific mutation and selection mechanisms.
However, all of them retain, in different forms, traces of the same history. In our case, the phylogenetic tree produced by molecular morphometrics perfectly matches the one found after realigning the expansion segments. It is noteworthy that the same holds true for anatomical studies (see above). This congruence reinforces the hypothesized phylogeny, which appears to reflect a common history rather than an artifact of the analysis process. In that sense, the molecular morphometrics method, which allows an automated treatment of the variations in structured sequences, appears to be complementary to primary sequence comparison.
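To make the idea of structural size characters concrete, here is a minimal Python sketch. It assumes that a homologous segment is supplied as a dot-bracket string (an input format chosen here for illustration; the text does not prescribe one) and extracts a few size characters of the kind discussed above: segment length, number of base pairs, number of unpaired nucleotides, lengths of stacked helices, and hairpin-loop sizes. Each of these is a single character that can take many integer values, unlike the four states of an aligned nucleotide site. All function names are hypothetical.

```python
from typing import Dict, List


def pair_table(db: str) -> List[int]:
    """Partner index of each position in a dot-bracket string (-1 if unpaired)."""
    partner = [-1] * len(db)
    stack: List[int] = []
    for i, c in enumerate(db):
        if c == "(":
            stack.append(i)
        elif c == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            j = stack.pop()
            partner[i], partner[j] = j, i
        elif c != ".":
            raise ValueError(f"unexpected character {c!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return partner


def helix_lengths(partner: List[int]) -> List[int]:
    """Lengths of maximal stacked helices, in 5'->3' order of their opening strand."""
    lengths: List[int] = []
    for i, j in enumerate(partner):
        if j <= i:  # unpaired, or the closing side of a pair
            continue
        if i > 0 and partner[i - 1] == j + 1:
            lengths[-1] += 1  # (i-1, j+1) is a pair, so (i, j) stacks on it
        else:
            lengths.append(1)  # first pair of a new helix
    return lengths


def hairpin_loop_sizes(partner: List[int]) -> List[int]:
    """Size of every hairpin loop, i.e. a pair enclosing only unpaired bases."""
    return [
        j - i - 1
        for i, j in enumerate(partner)
        if j > i and all(p == -1 for p in partner[i + 1:j])
    ]


def size_characters(db: str) -> Dict[str, object]:
    """Collect the size characters of one folded segment."""
    partner = pair_table(db)
    return {
        "length": len(db),
        "base_pairs": sum(1 for i, j in enumerate(partner) if j > i),
        "unpaired": partner.count(-1),
        "helices": helix_lengths(partner),
        "hairpin_loops": hairpin_loop_sizes(partner),
    }


# Hypothetical segment: a 3-bp helix, a 1x1 interior loop, then a 2-bp helix
# closing a 4-nt hairpin loop.
print(size_characters("(((.((....)).)))"))
# -> {'length': 16, 'base_pairs': 5, 'unpaired': 6, 'helices': [3, 2], 'hairpin_loops': [4]}
```

The same dictionary computed for the homologous segment of each taxon would give one row of a character matrix; which parameters to retain is a modeling decision not fixed by this sketch.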

Field of Application
In its present version, molecular morphometrics considers only size variations of homologous structural segments. This choice requires the overall architecture of the molecule to remain the same among the taxa observed, so that the "continuity argument" can readily be applied to define homologous structures. As a consequence, the molecules have to be rather close to one another and must not have undergone any topological rearrangement during evolution. Molecular morphometrics is therefore best suited to the ordinal or supraordinal rank, depending on the degree of conservation of the molecule considered. With molecules like rRNA, in which different domains evolve at different rates, higher levels may be addressed.
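The precondition stated above (same overall architecture, only sizes varying) can be illustrated with a minimal Python sketch, again assuming dot-bracket input. The hypothetical helper `decompose` reduces a segment to a helix-level shape (one `[`/`]` pair per stacked helix, loops ignored), its helix lengths, and the number of unpaired nucleotides in each gap between consecutive shape symbols. Two variants are compared only if their shapes are identical; otherwise the comparison is refused, mirroring the requirement that no topological rearrangement has occurred. The shape abstraction, the helper names, and the unweighted sum of absolute differences are illustrative assumptions, not the authors' procedure. The abstraction is also deliberately strict: opening an interior loop inside a helix splits it in two and changes the shape.

```python
from typing import List, Tuple


def pair_table(db: str) -> List[int]:
    """Partner index of each position in a dot-bracket string (-1 if unpaired)."""
    partner: List[int] = [-1] * len(db)
    stack: List[int] = []
    for i, c in enumerate(db):
        if c == "(":
            stack.append(i)
        elif c == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            j = stack.pop()
            partner[i], partner[j] = j, i
    if stack:
        raise ValueError("unmatched '('")
    return partner


def decompose(db: str) -> Tuple[str, List[int], List[int]]:
    """Return (helix-level shape, helix lengths, unpaired counts per gap)."""
    partner = pair_table(db)
    shape: List[str] = []
    helices: List[int] = []
    gaps = [0]  # unpaired nucleotides before the first shape symbol
    for k, p in enumerate(partner):
        if p == -1:
            gaps[-1] += 1
        elif p > k:  # opening side of a pair
            if k > 0 and partner[k - 1] == p + 1:
                helices[-1] += 1  # stacks on the pair (k-1, p+1)
            else:
                shape.append("[")
                helices.append(1)
                gaps.append(0)
        elif not (p > 0 and partner[p - 1] == k + 1):
            # closing side of the outermost pair of a helix
            shape.append("]")
            gaps.append(0)
    return "".join(shape), helices, gaps


def size_distance(db_a: str, db_b: str) -> int:
    """Sum of absolute size differences between two same-architecture variants."""
    shape_a, hel_a, gap_a = decompose(db_a)
    shape_b, hel_b, gap_b = decompose(db_b)
    if shape_a != shape_b:
        raise ValueError(f"different architectures: {shape_a} vs {shape_b}")
    return sum(abs(x - y) for x, y in zip(hel_a + gap_a, hel_b + gap_b))


print(decompose("((((....))))"))  # ('[]', [4], [0, 4, 0])
print(size_distance("((((....))))", "((((((......))))))"))  # 4
```

Because identical shapes guarantee the same number of helices and gaps, the size characters line up one to one without any sequence alignment, which is the practical benefit of restricting the method to size variations.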

In the literature, secondary structures have usually been used to study, for instance, protists (Lenaers et al. 1988), apicomplexans (Gagnon, Bourbeau, and Levesque 1996), all eukaryotes (Hendriks et al. 1991), and even the three primary kingdoms (Clark 1987; Bachellerie and Michot 1989). In all of these cases, the comparisons involve large-scale variations. Descriptions of the molecular shape point to conserved regions, while the presence or absence of one or more substructures is shown to be a trait characteristic of a particular clade (Wolff and Kuck 1990; Spears, Abele, and Applegate 1994; Gagnon, Bourbeau, and Levesque 1996; Liu, Kato, and Sugane 1997; Aleshin et al. 1998). Such treatments, however, remain quite informal. Some attempts have been made to formalize secondary-structure comparisons, mainly in relation to the problem of identifying common (sub)structures in a set of sequences (Le, Nussinov, and Maizel 1989; Benedetti and Morosetti 1996). However, difficulties persist in defining a distance between two related structures with variable topologies (Shapiro 1988; Margalit et al. 1989; Shapiro and Zhang 1990; Magarshak and Benham 1992; Fontana et al. 1993; Nakaya, Yonezawa, and Yamamoto 1996).
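To illustrate why a distance between structures becomes hard to define once sizes and topologies vary, here is a minimal Python sketch of one of the simplest structure distances, the base-pair distance: the number of base pairs present in one structure but not in the other. It is shown only as an example of a position-based measure and is not necessarily the measure used in the works cited above. Because base pairs are identified by their positions, the measure is defined only for structures laid on a common coordinate system; a single insertion shifts every downstream index, so an alignment would be needed first, which is exactly the difficulty in length-variable regions. Function names are hypothetical.

```python
from typing import List, Set, Tuple


def base_pairs(db: str) -> Set[Tuple[int, int]]:
    """Set of (i, j) base pairs encoded by a dot-bracket string."""
    stack: List[int] = []
    pairs: Set[Tuple[int, int]] = set()
    for i, c in enumerate(db):
        if c == "(":
            stack.append(i)
        elif c == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.add((stack.pop(), i))
    if stack:
        raise ValueError("unmatched '('")
    return pairs


def bp_distance(db_a: str, db_b: str) -> int:
    """Number of base pairs found in exactly one of the two structures."""
    if len(db_a) != len(db_b):
        # Positions no longer correspond: an alignment would be needed first.
        raise ValueError("structures must share one coordinate system (equal length)")
    return len(base_pairs(db_a) ^ base_pairs(db_b))


print(bp_distance("((((....))))", "(((......)))"))  # 1: pair (3, 8) is lost
print(bp_distance("((((....))))", ".((((..))))."))  # 2: (0, 11) lost, (4, 7) gained
```

Note that the second pair of structures has the same helix length and the same loop size, yet a non-zero distance, whereas two variants differing only by an inserted nucleotide cannot be compared at all without an alignment.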

Insights into the processes of secondary-structure evolution can be expected to help in determining weightings for the different events (Hancock, Tautz, and Dover 1988; Hancock 1995; Muse 1995). In particular, special mechanisms, such as slippage, seem to play a major role in rRNA evolution (Tautz, Trick, and Dover 1986; Hancock and Dover 1988; Vogler, Welsh, and Hancock 1997; Crease and Taylor 1998; Tautz et al. 1998). A convincing model accounting for the evolution of structural variations may allow us to gather the different features of the variation of the molecule into a single data matrix, thus extending molecular morphometrics to the study of higher-level taxa.
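Once weightings are available, gathering different kinds of structural characters into a single data matrix amounts, in the simplest case, to a weighted distance. The sketch below assumes that each taxon is described by a vector of size characters for homologous segments (here, hypothetically, one helix length and one hairpin-loop size) and, purely for illustration, gives loop-size differences half the weight of helix-length differences. The taxa, values, weights, and function names are invented; they are not data or weights from this study.

```python
from typing import Dict, Sequence


def weighted_distance(x: Sequence[float], y: Sequence[float],
                      weights: Sequence[float]) -> float:
    """Weighted sum of absolute differences between two character vectors."""
    if not len(x) == len(y) == len(weights):
        raise ValueError("character vectors and weights must have equal length")
    return sum(w * abs(a - b) for w, a, b in zip(weights, x, y))


def distance_matrix(chars: Dict[str, Sequence[float]],
                    weights: Sequence[float]) -> Dict[str, Dict[str, float]]:
    """Symmetric taxon-by-taxon matrix built from one combined character matrix."""
    return {a: {b: weighted_distance(chars[a], chars[b], weights) for b in chars}
            for a in chars}


# Hypothetical taxa; columns: helix length, hairpin-loop size (nucleotides).
chars = {"A": [4, 4], "B": [6, 6], "C": [6, 10]}

for weights in ([1.0, 1.0], [1.0, 0.5]):  # uniform vs. loop size downweighted
    d = distance_matrix(chars, weights)
    print(weights, d["A"]["B"], d["A"]["C"], d["B"]["C"])
# [1.0, 1.0] 4.0 8.0 4.0
# [1.0, 0.5] 3.0 5.0 2.0
```

With uniform weights, B is equidistant from A and C; downweighting loop size makes B and C the closest pair. This shows how the choice of weights, which an evolutionary model of structural variation would have to justify, can change which taxa group together.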


    Conclusions
 
Molecular morphometrics is intended to analyze phylogenetic relationships on the basis of similarities between structural characteristics of folded nucleic acid molecules. In a test case, molecular morphometrics succeeded in resolving some of the major points of dispute about cirripede phylogeny. The method makes it possible to take into account regions where multiple alignments are barely reliable because of a large number of insertions and deletions. For these reasons, molecular morphometrics appears to be complementary to classical primary sequence analysis in phylogenetic studies.


    Acknowledgements
 
The authors wish to thank Alain Viari for initiating the work and for helpful discussions, and the referees for their constructive remarks, which greatly helped to improve the manuscript. This work was supported by the Ministère de l'Education et de la Recherche, ACC SV7, Réseau National de Systématique.


    Footnotes
 
Manolo Gouy, Reviewing Editor

1 Keywords: molecular phylogeny, secondary structure, small-subunit RNA, cirripede, Crustacea

2 Address for correspondence and reprints: Bernard Billoud, Atelier de BioInformatique, Université Pierre et Marie Curie, 75252 Paris cedex 05, France. E-mail: bernard.billoud@snv.jussieu.fr


    literature cited
 

    Aleshin, V. V., O. S. Kedrova, I. A. Milyutina, N. S. Vladychenskaya, and N. B. Petrov. 1998. Secondary structure of some elements of 18S rRNA suggests that strongylid and a part of rhabditid nematodes are monophyletic. FEBS Lett. 429:4–8

    Anderson, D. T. 1994. Barnacles. Structure, function, development and evolution. Chapman and Hall, London

    Bachellerie, J. P., and B. Michot. 1989. Evolution of large subunit rRNA structure. The 3' terminal domain contains elements of secondary structure specific to major phylogenetic groups. Biochimie 71:701–709

    Benedetti, G., and S. Morosetti. 1996. A graph-topological approach to recognition of pattern and similarity in RNA secondary structures. Biophys. Chem. 59:179–184

    Billoud, B., M. Kontic, and A. Viari. 1996. Palingol: a declarative programming language to describe nucleic acids' secondary structures and to scan sequence databases. Nucleic Acids Res. 24:1395–1403

    Clark, C. G. 1987. On the evolution of ribosomal RNA. J. Mol. Evol. 25:343–350

    Corpet, F., and B. Michot. 1994. RNAlign program: alignment of RNA sequences using both primary and secondary structures. Comput. Appl. Biosci. 10:389–399

    Crease, T. J., and D. J. Taylor. 1998. The origin and evolution of variable-region helices in V4 and V7 of the small-subunit ribosomal RNA of Branchiopod crustaceans. Mol. Biol. Evol. 15:1430–1446

    Dixon, M. T., and D. M. Hillis. 1993. Ribosomal RNA secondary structure: compensatory mutations and implications for phylogenetic analysis. Mol. Biol. Evol. 10:256–267

    Douzery, E., and F. M. Catzeflis. 1995. Molecular evolution of the mitochondrial 12S rRNA in Ungulata (Mammalia). J. Mol. Evol. 41:622–636

    Ellis, J., and D. Morrison. 1995. Effects of sequence alignment on the phylogeny of Sarcocystis deduced from 18S rDNA sequences. Parasitol. Res. 81:696–699

    Engberg, J., H. Nielsen, G. Lenaers, O. Murayama, H. Fujitani, and T. Higashinakagawa. 1990. Comparison of primary and secondary 26S rRNA structures in two Tetrahymena species: evidence for a strong evolutionary and structural constraint in expansion segments. J. Mol. Evol. 30:514–521

    Expert-Bezançon, A., and P. L. Wollenzien. 1985. Three-dimensional arrangement of the Escherichia coli 16 S ribosomal RNA. J. Mol. Biol. 184:53–66

    Fan, M., B. P. Currie, R. R. Gutell, M. A. Ragan, and A. Casadevall. 1994. The 16S-like, 5.8S and 23S-like rRNAs of the two varieties of Cryptococcus neoformans: sequence, secondary structure, phylogenetic analysis and restriction fragment polymorphisms. J. Med. Vet. Mycol. 32:163–180

    Felsenstein, J. 1985. Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39:783–791

    Fontana, W., D. A. Konings, P. F. Stadler, and P. Schuster. 1993. Statistics of RNA secondary structures. Biopolymers 33:1389–1404

    Gagnon, S., D. Bourbeau, and R. C. Levesque. 1996. Secondary structures and features of the 18S, 5.8S and 26S ribosomal RNAs from the Apicomplexan parasite Toxoplasma gondii. Gene 173:129–135

    Gaspin, C., and E. Westhof. 1995. An interactive framework for RNA secondary structure prediction with a dynamical treatment of constraints. J. Mol. Biol. 254:163–174

    Gerbi, S. A. 1986. The evolution of eukaryotic ribosomal DNA. Biosystems 19:247–258

    Gibert, J. M., E. Mouchel-Vielh, and J. S. Deutsch. 1997. engrailed duplication events during the evolution of barnacles. J. Mol. Evol. 44:585–594

    Grundy, W. N., and G. J. Naylor. 1999. Phylogenetic inference from conserved sites alignments. J. Exp. Zool. 285:128–139

    Gutell, R. R. 1994. Collection of small subunit (16S- and 16S-like) ribosomal RNA structures: 1994. Nucleic Acids Res. 22:3502–3507

    Gutell, R. R., N. Larsen, and C. R. Woese. 1994. Lessons from an evolving rRNA: 16S and 23S rRNA structures from a comparative perspective. Microbiol. Rev. 58:10–26

    Haase, G., L. Sonntag, Y. Van de Peer, J. M. Uijthof, A. Podbielski, and B. Melzer-Krick. 1995. Phylogenetic analysis of ten black yeast species using nuclear small subunit rRNA gene sequences. Antonie Van Leeuwenhoek 68:19–33

    Hancock, J. M. 1995. The contribution of DNA slippage to eukaryotic nuclear 18S rRNA evolution. J. Mol. Evol. 40:629–639

    Hancock, J. M., and G. A. Dover. 1988. Molecular coevolution among cryptically simple expansion segments of eukaryotic 26S/28S rRNAs. Mol. Biol. Evol. 5:377–391

    Hancock, J. M., D. Tautz, and G. A. Dover. 1988. Evolution of the secondary structures and compensatory mutations of the ribosomal RNAs of Drosophila melanogaster. Mol. Biol. Evol. 5:393–414

    Haselman, T., D. G. Camp, and G. E. Fox. 1989. Phylogenetic evidence for tertiary interactions in 16S-like ribosomal RNA. Nucleic Acids Res. 17:2215–2221

    Hassouna, N., B. Michot, and J. P. Bachellerie. 1984. The complete nucleotide sequence of mouse 28S rRNA gene. Implications for the process of size increase of the large subunit rRNA in higher eukaryotes. Nucleic Acids Res. 12:3563–3583

    Hendriks, L., R. De Baere, Y. Van de Peer, J. Neefs, A. Goris, and R. De Wachter. 1991. The evolutionary position of the rhodophyte Porphyra umbilicalis and the basidiomycete Leucosporidium scottii among other eukaryotes as deduced from complete sequences of small ribosomal subunit RNA. J. Mol. Evol. 32:167–177

    Høeg, J. T. 1992. The phylogenetic position of the Rhizocephala: are they truly barnacles? Acta Zool. 73:323–326

    Kjer, K. M. 1995. Use of rRNA secondary structure in phylogenetic studies to identify homologous positions: an example of alignment and data presentation from the frogs. Mol. Phylogenet. Evol. 4:314–330

    Larson, A., and A. C. Wilson. 1989. Patterns of ribosomal RNA evolution in salamanders. Mol. Biol. Evol. 6:131–154

    Le, S. Y., R. Nussinov, and J. V. Maizel. 1989. Tree graphs of RNA secondary structures and their comparisons. Comput. Biomed. Res. 22:461–473

    Leblanc, C., B. Kloareg, S. Loiseaux-deGoer, and C. Boyen. 1995. DNA sequence, structure, and phylogenetic relationship of the mitochondrial small-subunit rRNA from the red alga Chondrus crispus (Gigartinales rhodophytes). J. Mol. Evol. 41:196–202

    Lenaers, G., H. Nielsen, J. Engberg, and M. Herzog. 1988. The secondary structure of large-subunit rRNA divergent domains, a marker for protist evolution. Biosystems 21:215–222

    Liu, D. W., H. Kato, and K. Sugane. 1997. The nucleotide sequence and predicted secondary structure of small subunit (18S) ribosomal RNA from Spirometra erinaceieuropaei. Gene 184:221–227

    Magarshak, Y., and C. J. Benham. 1992. An algebraic representation of RNA secondary structures. J. Biomol. Struct. Dyn. 10:465–488

    Margalit, H., B. A. Shapiro, A. B. Oppenheim, and J. V. Maizel Jr. 1989. Detection of common motifs in RNA secondary structures. Nucleic Acids Res. 17:4829–4845

    Michot, B., L. H. Qu, and J. P. Bachellerie. 1990. Evolution of large-subunit rRNA structure. The diversification of divergent D3 domain among major phylogenetic groups. Eur. J. Biochem. 188:219–229

    Mizrahi, L., Y. Achituv, D. J. Katcoff, and R. Perl-Treves. 1998. Phylogenetic position of Ibla (Cirripedia: Thoracica) based on 18S rDNA sequence analysis. J. Crustac. Biol. 18:363–368

    Morrison, D. A., and J. T. Ellis. 1997. Effects of nucleotide sequence alignment on phylogeny estimation: a case study of 18S rDNAs of Apicomplexa. Mol. Biol. Evol. 14:428–441

    Mouchel-Vielh, E., C. Rigolot, J. M. Gibert, and J. S. Deutsch. 1998. Molecules and the body plan: the Hox genes of Cirripedes (Crustacea). Mol. Phylogenet. Evol. 9:382–389

    Muse, S. V. 1995. Evolutionary analyses of DNA sequences subject to constraints of secondary structure. Genetics 139:1429–1439

    Nakaya, A., A. Yonezawa, and K. Yamamoto. 1996. Classification of RNA secondary structures using the techniques of cluster analysis. J. Theor. Biol. 183:105–117

    Neefs, J. M., and R. De Wachter. 1990. A proposal for the secondary structure of a variable area of eukaryotic small ribosomal subunit RNA involving the existence of a pseudoknot. Nucleic Acids Res. 18:5695–5704

    Nelles, L., B. L. Fang, G. Volckaert, A. Vandenberghe, and R. De Wachter. 1984. Nucleotide sequence of a crustacean 18S ribosomal RNA gene and secondary structure of eukaryotic small subunit ribosomal RNAs. Nucleic Acids Res. 12:8749–8768

    Nunn, G. B., B. F. Theisen, B. Christensen, and P. Arctander. 1996. Simplicity-correlated size growth of the nuclear 28S ribosomal RNA D3 expansion segment in the crustacean order Isopoda. J. Mol. Evol. 42:211–223

    Okamoto, K., T. Sekito, and K. Yoshida. 1996. The secondary structure and phylogenetic relationship deduced from complete nucleotide sequence of mitochondrial small subunit rRNA in yeast Hansenula wingei. Genes Genet. Syst. 71:69–74

    Otsuka, J., T. Nakano, and G. Terai. 1997. A theoretical study on the nucleotide changes under a definite functional constraint of forming stable base-pairs in the stem regions of ribosomal RNAs; its application to the phylogeny of eukaryotes. J. Theor. Biol. 184:171–186

    Otsuka, J., G. Terai, and T. Nakano. 1999. Phylogeny of organisms investigated by the base-pair changes in the stem regions of small and large ribosomal subunit RNAs. J. Mol. Evol. 48:218–235

    Philippe, H. 1993. MUST, a computer package of management utilities for sequences and trees. Nucleic Acids Res. 21:5264–5272

    Rivas, E., and S. R. Eddy. 1999. A dynamic programming algorithm for RNA structure prediction including pseudoknots. J. Mol. Biol. 285:2053–2068

    Ruiz-Linares, A., J. M. Hancock, and G. A. Dover. 1991. Secondary structure constraints on the evolution of Drosophila 28S ribosomal RNA expansion segments. J. Mol. Biol. 219:381–390

    Saitou, N., and M. Nei. 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406–425

    Schram, F. R., and J. T. Høeg. 1995. New frontiers in barnacle evolution. Pp. 297–315 in F. R. Schram and J. T. Høeg, eds. Crustacean issues. A. A. Balkema, Rotterdam, The Netherlands

    Shapiro, B. A. 1988. An algorithm for comparing multiple RNA secondary structures. Comput. Appl. Biosci. 4:387–393

    Shapiro, B. A., and K. Z. Zhang. 1990. Comparing multiple RNA secondary structures using tree comparisons. Comput. Appl. Biosci. 6:309–318

    Smith, A. B., B. Lafay, and R. Christen. 1992. Comparative variation of morphological and molecular evolution through geologic time: 28S ribosomal RNA versus morphology in echinoids. Philos. Trans. R. Soc. Lond. B Biol. Sci. 338:365–382

    Spears, T., L. G. Abele, and M. A. Applegate. 1994. Phylogenetic study of cirripedes and selected relatives (Thecostraca) based on 18S rDNA sequence analysis. J. Crustac. Biol. 14:641–656

    Springer, M. S., and E. Douzery. 1996. Secondary structure and patterns of evolution among mammalian mitochondrial 12S rRNA molecules. J. Mol. Evol. 43:357–373

    Springer, M. S., L. J. Hollar, and A. Burk. 1995. Compensatory substitutions and the evolution of the mitochondrial 12S rRNA gene in mammals. Mol. Biol. Evol. 12:1138–1150

    Stern, S., B. Weiser, and H. F. Noller. 1988. Model for the three-dimensional folding of 16S ribosomal RNA. J. Mol. Biol. 204:447–481

    Sweeney, R., L. Chen, and M. C. Yao. 1994. An rRNA variable region has an evolutionarily conserved essential role despite sequence divergence. Mol. Cell. Biol. 14:4203–4215

    Swofford, D. L. 1991. PAUP: phylogenetic analysis using parsimony. Version 3.0s. Illinois Natural History Survey, Champaign

    Tautz, D., J. M. Hancock, D. A. Webb, C. Tautz, and G. A. Dover. 1998. Complete sequences of the rRNA genes of Drosophila melanogaster. Mol. Biol. Evol. 5:366–376

    Tautz, D., M. Trick, and G. A. Dover. 1986. Cryptic simplicity in DNA is a major source of genetic variation. Nature 322:652–656

    Thanaraj, T. A. 1994. Phylogenetically preserved inter-rRNA base pairs: involvement in ribosomal subunit association. Nucleic Acids Res. 22:3936–3942

    Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673–4680

    Tillier, E. R., and R. A. Collins. 1998. High apparent rate of simultaneous compensatory base-pair substitutions in ribosomal RNA. Genetics 148:1993–2002

    Titus, T. A., and D. R. Frost. 1996. Molecular homology assessment and phylogeny in the lizard family Opluridae (Squamata: Iguania). Mol. Phylogenet. Evol. 6:49–62

    Van de Peer, Y., J. M. Neefs, P. De Rijk, and R. De Wachter. 1993. Reconstructing evolution from eukaryotic small-ribosomal-subunit RNA sequences: calibration of the molecular clock. J. Mol. Evol. 37:221–232

    Van de Peer, Y., E. Robbrecht, S. De Hoog, A. Caers, P. De Rijk, and R. De Wachter. 1999. Database on the structure of small subunit ribosomal RNA. Nucleic Acids Res. 27:179–183

    Vawter, L., and W. M. Brown. 1993. Rates and patterns of base change in the small subunit ribosomal RNA gene. Genetics 134:597–608

    Vogler, A. P., A. Welsh, and J. M. Hancock. 1997. Phylogenetic analysis of slippage-like sequence variation in the V4 rRNA expansion segment in tiger beetles (Cicindelidae). Mol. Biol. Evol. 14:6–19

    Wheeler, W. C. 1994. Sources of ambiguity in nucleic acid sequence alignment. EXS 69:323–352

    Wheeler, W. C., J. Gatesy, and R. DeSalle. 1995. Elision: a method for accommodating multiple molecular sequence alignments with alignment-ambiguous sites. Mol. Phylogenet. Evol. 4:1–9

    Wheeler, W. C., and R. L. Honeycutt. 1988. Paired sequence difference in ribosomal RNAs: evolutionary and phylogenetic implications. Mol. Biol. Evol. 5:90–96

    Winnepenninckx, B. M., D. G. Reid, and T. Backeljau. 1998. Performance of 18S rRNA in littorinid phylogeny (Gastropoda: Caenogastropoda). J. Mol. Evol. 47:586–596

    Winsor, M. P. 1969. Barnacle larvae in the nineteenth century. A case study in taxonomic theory. J. Hist. Med. 1969:294–309

    Woese, C. R., and R. R. Gutell. 1989. Evidence for several higher order structural elements in ribosomal RNA. Proc. Natl. Acad. Sci. USA 86:3119–3122

    Wolff, G., and U. Kuck. 1990. The structural analysis of the mitochondrial SSUrRNA implies a close phylogenetic relationship between mitochondria from plants and from the heterotrophic alga Prototheca wickerhamii. Curr. Genet. 17:347–351

    Xiong, B., and T. D. Kocher. 1993. Phylogeny of sibling species of Simulium venustum and S. verecundum (Diptera: Simuliidae) based on sequences of the mitochondrial 16S rRNA gene. Mol. Phylogenet. Evol. 2:293–303

    Zuker, M. 1994. Prediction of RNA secondary structure by energy minimization. Methods Mol. Biol. 25:267–294

    Zuker, M., and A. B. Jacobson. 1998. Using reliability information to annotate RNA secondary structures. RNA 4:669–679

    Zwieb, C., C. Glotz, and R. Brimacombe. 1981. Secondary structure comparisons between small subunit ribosomal RNA molecules from six different species. Nucleic Acids Res. 9:3621–3640

Accepted for publication May 2, 2000.