The Complete Mitochondrial DNA Sequence of the Basal Hexapod Tetrodontophora bielanensis: Evidence for Heteroplasmy and tRNA Translocations

Francesco Nardi, Antonio Carapelli, Pietro Paolo Fanciulli, Romano Dallai and Francesco Frati

Department of Evolutionary Biology, University of Siena, Siena, Italy
Department of Environmental Science, Policy and Management, Division of Insect Biology, University of California at Berkeley


    Abstract
 TOP
 Abstract
 Introduction
 Materials and Methods
 Results and Discussion
 Acknowledgements
 References
 
We present the complete 15,455-nt mitochondrial DNA sequence of the springtail Tetrodontophora bielanensis (Arthropoda, Hexapoda, Collembola). The gene content is typical of most metazoans, with 13 protein-coding genes (PCGs), 2 genes encoding for ribosomal RNA subunits, and 22 tRNA genes. The nucleotide sequence shows the well-known A+T bias typical of insect mtDNA; its A+T content is lower (72.7%) than that observed in other insect species, but still higher than that in other arthropodan taxa. The bias appears to be uniform across the whole molecule, unlike other insect taxa, which show increased A+T content in the so-called A+T-rich region. However, the bias is slightly higher in the third codon positions of the PCGs (81.4%). Anomalous initiation codons have been observed in the nad2 and the cox1 genes. In the latter, the ATTTAA hexanucleotide is suggested to be involved in the initiation signaling. All tRNAs could be folded into the typical cloverleaf secondary structure, but the tRNA for cysteine appears to be missing the DHU arm. Long tandemly repeated regions (193 nt) were found in the A+T-rich region, which in turn was shown to have the possibility of forming a complex array of secondary structures. One of these structures encompassed the junction between the repeats. The A+T-rich region was also interesting in that it showed heteroplasmy in the number of repeats. Three haplotypes were found, possessing 2, 3, and 4 identical repeats, respectively. The order of protein coding and rRNA genes in the molecule was determined and was identical to that of all insects studied so far. However, two tRNA translocations were found which were unprecedented among Arthropoda. These involved the trnQ, which was found between the rrnS and the A+T-rich region, and the trnS(ucn), which was located between trnM and trnI. A preliminary phylogenetic analysis based on the amino acid sequence of the PCGs failed to find support for the monophyly of Hexapoda.


    Introduction
 TOP
 Abstract
 Introduction
 Materials and Methods
 Results and Discussion
 Acknowledgements
 References
 
Mitochondrial DNA (mtDNA) has been extensively studied with regard to both its structure and its function, and in recent years it has been found to contain relevant phylogenetic information at various taxonomic levels (Simon et al. 1994Citation ; Boore et al. 1995Citation ; Boore, Lavrov, and Brown 1998Citation ). This closed circular molecule has been completely sequenced in over 60 metazoan species, and it has been found to be remarkably conserved in terms of size and gene content (Boore 1999Citation ).

The relatively small size and the ease with which this molecule can be handled make it a good model for studying various aspects of genomic structure and function, such as the evolution of the genetic code, the use of different synonymous codons, transcription and RNA maturation, and disequilibrium in base frequencies. Several genes encoded in the mitochondrion (particularly cox1, cox2, and rrnS) are widely used in molecular phylogenetics (Simon et al. 1994Citation ; Caterino, Cho, and Sperling 2000Citation ). More recently, there has been growing attention paid to the order with which the genes appear along the molecule (gene order) as a very informative genetic marker with which to resolve phylogenetic relationships between distantly related taxa (Boore, Lavrov, and Brown 1998Citation ; Blanchette, Kunisawa, and Sankoff 1999Citation ; Boore and Brown 2000Citation ; Kurabayashi and Ueshima 2000Citation ; Scouras and Smith 2001Citation ).

So far, the mtDNA of eight hexapodan species has been sequenced completely. These species include the dipterans Cochliomiyia hominivorax (Lessinger et al. 2000)Citation , Drosophila yakuba (Clary and Wolstenholme 1985Citation ), Drosophila melanogaster (Lewis, Farr, and Kaguni 1995Citation ), Anopheles gambiae (Beard, Hamm, and Collins 1993Citation ), Anopheles quadrimaculatus (Mitchell, Cockburn, and Seawright 1993Citation ), and Ceratitis capitata (Spanos et al. 2000)Citation , the hymenopteran Apis mellifera (Crozier and Crozier 1993Citation ), and the orthopteran Locusta migratoria (Flook, Rowell, and Gellissen 1995Citation ). Despite the potential usefulness of mitochondrial gene order in inferring phylogenetic relationships among Arthropoda (Staton, Daehler, and Brown 1997Citation ; Boore, Lavrov, and Brown 1998Citation ), only seven nonhexapodan arthropods have been sequenced completely: Ixodes hexagonus and Rhipicephalus sanguineus (Black and Roehrdanz 1998Citation ), Artemia franciscana (Valverde et al. 1994Citation ), Daphnia pulex (Crease 1999Citation ), Penaeus monodon (Wilson et al. 2000)Citation , Pagurus longicarpus (Hickerson and Cunningham 2000)Citation , and Limulus polyphemus (Lavrov, Boore, and Brown 2000)Citation . Unfortunately, also in the Hexapoda, the sampling is not uniform and is highly biased toward the more derived holometabolous orders (six sequences from the Diptera, one from the Hymenoptera), with only one sequence available from the Hemimetabola (L. migratoria). No information on mtDNA gene arrangement is available for basal wingless taxa, in spite of the recognized need for a more homogeneous sampling among hexapods in order to use mtDNA information to interpret their evolution (Flook, Rowell, and Gellissen 1995Citation ).

Tetrodontophora bielanensis (Onychiuridae) is a representative of the apterygotan order Collembola. Members of this order, whose earliest fossil evidence dates back to the Devonian (Whalley and Jarzembowski 1981Citation ), are believed to be among the first hexapods that appeared on the earth, and hence they represent a crucial taxon for the understanding of the evolution of the whole hexapodan assemblage. However, the phylogeny of basal orders constitutes one of the most controversial issues in hexapodan evolution (Kristensen 1997Citation ; Bitsch and Bitsch 2000Citation ; Carapelli et al. 2000Citation ).

In this paper, we report the complete sequence of the mitochondrial genome of T. bielanensis and the consequent arrangement of its genes in the molecule.


    Materials and Methods
 TOP
 Abstract
 Introduction
 Materials and Methods
 Results and Discussion
 Acknowledgements
 References
 
Specimens of T. bielanensis were collected in October 1998 at Passo San Boldo, northeastern Italy, and kept as a laboratory colony. Total genomic DNA was extracted from single specimens using a modified CTAB method (Shahjahn et al. 1995Citation ), followed by phenol extractions and precipitation in 0.3 M sodium acetate and 50% ethanol. An aliquot of the resuspended DNA was used as a template for PCR reactions. All PCR reactions were performed on a Perkin Elmer 2400 thermal cycler using the Expand High Fidelity PCR System (Roche), which guarantees an extremely low error rate (8.5 x 10-6). Sequencing was performed at the core facility of ENEA, Rome, using dye terminator technology combined with a Perkin Elmer 373A Stretch automated sequencer. Initially, two fragments were amplified using the pairs of universal primers COIf/COIa (Simon, Franke, and Martin 1991Citation ) and N4a1/N4b1 (designed for this study) (table 1 and fig. 1 ). PCR products were gel-purified, and the DNA was extracted from agarose using the Concert Rapid Gel Extraction System (Life Technologies). Sequencing was performed with the same primers used for amplification. Four perfectly matching primers, named CIbL, CIaL, N4aL, and N4bL (table 1 and fig. 1 ), were designed on the sequenced segments of cox1 and nad4 and were used to amplify two long pieces of about 6.2 kb (CIaL-N4bL) and 9.0 kb (N4aL-CIbL) (boldface lines in fig. 1 ). PCR conditions were 35 cycles of 94°C for 1 min, 58°C for 1 min 10 s, and 68°C for 8 min, followed by an extension time of 10 min. PCR products were gel-purified and used as a template for the following reactions. Using these four primers, together with other universal ones and those specifically designed on stretches of sequences obtained during this study (table 1 and fig. 1 ), 10 partially overlapping pieces were obtained, ranging in size from 0.5 to 2.2 kb (fig. 1 ). A fragment encompassing the binding site of CB-J-10933 and H15149mod was PCR-amplified and direct-sequenced to determine the sequence of the 14-nt primer overlap. The fragment encompassing part of nad2 and cox1 was obtained as a XbaI digestion product of the longer PCR product CIbL/12SJL2. These products were cloned "blunt end" into pBluescript according to standard procedures (Sambrook, Fristch, and Maniatis 1989Citation ). DNA was prepared for digestion using the GFX micro Plasmid Prep Kit (Pharmacia Biotech), and the desired clones were identified by the length of the digestion products and confirmed by sequencing their extremities using T3 and T7 primers. An aliquot of the first preparation was then reinoculated in a larger culture and prepared for sequencing using the QUIAGEN plasmid Midi KIT. To cover the remaining segment, encompassing the A+T-rich region and part of the rrnS, the product of a long amplification (16SJL/CibL; table 1 and fig. 1 ) was used as a template for a nested amplification using the primers TM-N-193 (Simon et al. 1994Citation ) and 12SJL2 (this study, table 1 ). This second amplification always produced several bands under various reaction conditions. The product of the amplification was thus purified and cloned as described above without any attempt to purify the single bands. The first screening of the positive clones by digestion clearly showed three populations of clones (called long, medium, and small), whose inserts differed by approximately 200 bp. One clone for each population was prepared as above and sequenced.


View this table:
[in this window]
[in a new window]
 
Table 1 Names, Sequences, and Positions of the 5'-End Nucleotides (with respect to the deposited sequence of Tetrodontophora bielanensis) of the Primers used in this study (see fig. 1 for their orientation)

 


View larger version (18K):
[in this window]
[in a new window]
 
Fig. 1.—Summary of the sequencing strategy. Horizontal lines indicate the fragments which were amplified and cloned. Primers above each line were designed on the sequence of the strand for which the majority of genes were transcribed; primers below each line were designed on the opposite strand

 
Each clone was sequenced on both strands using a primer-walking strategy; electropherograms were checked by eye using the program CHROMAS, version 1.41 (written by Conor McCarthy), and the sequences were imported into ESEE (Cabot and Beckenback 1989Citation ) and assembled manually to obtain the whole sequence of the molecule. No conflict in the sequence was observed in the regions for which we had more than one independent sequence (approximately 15% of the entire molecule). This observation, together with the use of a mixture of enzymes with proofreading activity, should guarantee that the level of potential PCR-generated errors is negligible with respect to the aims of this work. The sequence of the entire mtDNA of T. bielanensis deposited in GenBank contains the sequence of the smallest clone of those obtained for the A+T-rich region; other clones for this segment were submitted to GenBank separately (see below).

Protein-coding genes (PCGs) were located using the Profile alignment mode of Clustal X (Thompson et al. 1997Citation ), aligning some representative arthropodan sequences plus Lumbricus terrestris for each gene first and then using this alignment as a "probe" on the sequence of T. bielanensis. This procedure was repeated for each PCG. The corresponding open reading frame (ORF) was identified, and the extremities of the sequence were checked by eye to locate the exact beginning and termination. Exact locations of tRNAs were found at the boundaries of the genes by searching for sequences capable of forming the cloverleaf structure typical of these molecules. Clustal X (multiple-alignment mode with Gonnet weight matrix series; gap separation distance = 8) was then used to align the amino acid sequences obtained by conceptual translation of the DNA sequences using the D. melanogaster (de Brujin 1983Citation ) mitochondrial code, as implemented in GeneDoc (written by K. B. Nicholas and H. B. Nicholas). Phylogenetic analyses were performed on the complete amino acid sequences of all PCGs. The Lumbricus sequence was set as an outgroup in all analyses. Positions experiencing gaps were excluded. Maximum-parsimony (MP) and maximum-likelihood (ML) approaches were used for phylogenetic analyses. Parsimony-based analyses were performed assigning equal weights to all sites as implemented in PAUP* (Swofford 1998Citation ; branch-and-bound search, simple addition sequence, TBR branch swapping); likelihood analyses were performed with the utility ProtML of the MOLPHY package, version 2.3b3 (Adachi and Hasegawa 1996Citation ), on amino acid sequences using the mtREV24 model (Adachi and Hasegawa 1996Citation ).


    Results and Discussion
 TOP
 Abstract
 Introduction
 Materials and Methods
 Results and Discussion
 Acknowledgements
 References
 
The entire sequence of the mitochondrial DNA of T. bielanensis was determined and has been deposited in GenBank under accession number AF272824. Two additional sequences for part of the A+T region (see below) were deposited in the same database under accession numbers AF272825 and AF272826. The length of the molecule (15,455 bp) is equal to that of A. quadrimaculatus and is the second shortest among hexapod complete mitochondrial sequences determined to date, after A. gambiae (15,363 bp). The length in T. bielanensis is 888 bp shorter than that in the longest hexapod sequenced, A. mellifera (16,343 bp).

Thirteen PCGs (cox1cox3, nad1nad6, 4L, cob, atp6, and atp8) and the two genes coding for the large (rrnL) and small (rrnS) ribosomal RNA subunits were unambiguously identified. A long (955) noncoding region is present between trnQ and trnI, which is presumed to be homologous to the A+T-rich region of other hexapods by positional homology and shared features. A closer analysis of this region evidenced the presence of tandemly repeated sequences, the features of which will be described below. The positions and orientations of PCGs, ribosomal RNAs, tRNAs, and the A+T-rich region differ from those of Drosophila and Daphnia, which have been proposed to be the ancestral arrangement for the crustacean-hexapod clade (Crease 1999Citation ), for the translocation of the trnQ and the trnS(ucn). Genes and tRNAs are closely assembled one after the other, leaving a total of only 105 nt (excluding the A+T-rich region) in unassigned intergenic spacers, ranging in size from 1 nt (between atp6 and atp8 and between nad6 and cob) to 33 nt (between trnS(ucn) and trnM). This value is similar to that of Locusta (100 bp) and well in the range of other insects, where Apis has the largest (620 bp) and the two Anopheles species have the smallest (43–47 bp) values. The extremities of some genes overlap by a few nucleotides. Six overlaps, ranging in size from 1 nt (between trnK and trnD and between trnE and trnF) to 5 nt (between trnY and cox1), were inferred, giving a total of 22 overlapping nucleotides. A detailed diagram of the genome organization of Tetrodontophora is shown in figure 2 .



View larger version (35K):
[in this window]
[in a new window]
 
Fig. 2.—The complete mitochondrial genome of Tetrodontophora bielanensis. Positions and orientations of genes are shown. Numbers at gene junctions indicate the lengths of intergenic spacers; negative numbers show overlaps between genes

 
Base Composition and Codon Usage
The base composition of the strand encoding most of the proteins in the mtDNA of T. bielanensis was 38% A, 34.7% T, 17.6% C, and 9.7% G (table 2 ), with a total A+T content of 72.7%. This value is the lowest among hexapodan mitochondrial genomes sequenced so far, which show 75.3% A+T in L. migratoria, 76.9%–78.6% in flies, and 84.9% in A. mellifera. Compared with crustaceans, the A+T content of T. bielanensis is much higher than that of D. pulex (62.3%) and A. franciscana (64.5%), but only slightly higher than that of P. monodon (70.6%) and P. longicarpus (71.2%). The A+T content of the isolated PCGs, calculated from the coding strand of each individual gene, was 71.3%, not dissimilar to the value calculated for the entire mtDNA. The same value, calculated for the three different codon positions (table 2 ), showed a clear disequilibrium: A+T content was lowest in first (66.3%) and second (66.1%) codon positions, while it was considerably higher (81.4%) in third codon positions. A possible explanation for this difference is that the constraints on A+T content in the first and second codon positions are less relaxed than those in the third codon position, due to the degenerated nature of the genetic code. Intermediate values could be calculated for rRNAs (76.9%), tRNAs (75.9%), and unassigned spacers (78%). The A+T content of the A+T-rich region (75.7%) was not dissimilar to that of the rest of the molecule (with a difference of 4.1%). This differs from the state of the other hexapods, for which the increase of the A+T content of the A+T-rich region with respect to the complete molecule is well above 10%. As in Locusta, the A+T content was lower in the repeated region than in the rest of the A+T-rich region.


View this table:
[in this window]
[in a new window]
 
Table 2 Base Compositions of Different Subsets of the Mitochondrial Genome of Tetrodontophora bielanensis

 
Relative synonymous codon usage (RSCU; Sharp, Tuohy, and Mosurski 1986Citation ) values were calculated for the PCGs and are given in table 3 . As expected, most values differ from 1 (frequency at equilibrium), and the chi-square test ({chi}2 = 1,626.488, df = 40, P < 0.0000) clearly shows that codon usage does not fulfill the implicit assumption that synonymous codons are equally used.


View this table:
[in this window]
[in a new window]
 
Table 3 Estimates of Relative Synonymous Codon Usage (RSCU) for Each Codon

 
Protein-Coding Genes
The initiation and stop codons of the 13 PCGs of Tetrodontophora mtDNA were identified by comparison with other known arthropodan sequences; particular attention was given to the amino acid alignment in order to find the most likely codon in ambiguous situations. Eleven genes begin with the ATA (6), ATT (1), or ATG (4) codons, which are the triplets that usually initiate metazoan mitochondrial genes; in nad2 and cox1, none of these triplets could be found in the vicinity of the supposed beginning of the gene. We suppose that the nad2 initiation codon is TTG. This particular initiation codon is unprecedented among hexapods but common in the roundworm Caenorhabditis elegans and was found to begin nad1 and nad5 in Limulus polyphemus (Lavrov, Boore, and Brown 2000)Citation . None of the triplets known to act as initiation codons could be found in the vicinity of the supposed cox1 initiation: the amino acid sequence at the beginning of cox1 is well conserved in all arthropods, and the possible initiation signal is likely to be found in an area of four to five triplets. In this location, no ATAA signal, proposed to initiate cox1 in Drosophila and Locusta, was present. The hexanucleotide ATTTAA, which flanks the beginning of cox1 in mosquitoes and has been proposed as an initiation signal, was present in T. bielanensis, and, interestingly, it was followed by an ATG triplet; however, these were not in frame with the rest of the coding sequence. This observation corroborates the hypothesis that the hexanucleotide ATTTAA is involved in initiation signaling, but in this case the first codon of the coding sequence was not exactly adjacent to this hexanucleotide. For phylogenetic analyses presented below, the cox1 initiation was considered to be TTA (L), 8 nt upstream, as suggested by the aligned amino acid sequences. Interestingly, cox1 in L. polyphemus is initiated by the same triplet (Staton, Daehler, and Brown 1997Citation ).

Complete termination codons TAG and TAA were found in two and six PCGs, respectively. The remaining five genes are supposed to end with a single T, which may be completed into a proper TAA stop codon by RNA polyadenylation (Ojala, Montoya, and Attardi 1981Citation ): four of them are exactly adjacent to tRNAs, and one (nad4L) is exactly adjacent to the beginning of another gene (nad4). Thus, it is highly probable that during mRNA processing the U is exposed at the end of an mRNA molecule and polyadenylated, forming the complete UAA termination signal.

Transfer RNAs
As described above, transfer RNAs were located by searching intergenic spacers for sequences capable of forming the typical cloverleaf structure. The whole set of 22 tRNAs typical of metazoan mitochondrial genomes was found, and secondary structures were drawn for each one (fig. 3 ). Anticodons were identified in the anticodon arms and were identical to those in Drosophila. The acceptor stem is always composed of seven paired nucleotides plus the discriminator nucleotide at its 3' end. In 10 tRNAs, this nucleotide was found in regions of sequence overlap, and in 5 of these cases, adjacent sequences overlapped for only this nucleotide. This suggests that, at least in some tRNAs, this discriminator nucleotide may be added after the splicing of the primary transcript (Yokobori and Pääbo 1997Citation ) in such a way that no alternative splicing would be needed in order to obtain the complete tRNA. The DHU arm is composed by a stem of 3–5 bp enclosing an unpaired loop of 3–8 nt. The anticodon arm consistently showed a 5-bp stem and seven unpaired nucleotides comprising the anticodon. The T{Psi}C stem was 3–5 bp long, and its apical loop was 2–8 nt. Between anticodon and T{Psi}C arms, an unpaired region of 2–5 nt was present. Mismatches were found in many tRNAs. Even considering AC and GU as possible pairings and setting the trnC aside, 12 mismatches were found in the acceptor stem, 1 in the DHU arm, 9 in the anticodon stem, and none in the T{Psi}C stem.



View larger version (33K):
[in this window]
[in a new window]
 
Fig. 3.—Inferred secondary structures for all tRNAs, including an alternative reconstruction for trnC.

 
The reconstruction of the secondary structure of trnC poses a major problem: while the DHU, T{Psi}C, and anticodon arms were fairly regular, the acceptor arm alone showed six mispairings (fig. 3 ). This number of unpaired nucleotides in tRNAs is remarkably high, but the regular structure of the other three arms seems to confirm unequivocally that this sequence does correspond to trnC. This tRNA, however, could also assume an alternative conformation (fig. 3 ) where the DHU arm is lost, like in R. sanguineus and Boophilus microplus (Black and Roherdanz 1998Citation ).

There is growing evidence that RNA editing could also be involved in the maturation of tRNAs, as demonstrated in land snails by Yokobori and Pääbo (1995)Citation and suggested also for opistobranchs (Kurabayashi and Ueshima 2000)Citation . However, the application of the mechanism outlined by Yokobori and Pääbo (1995)Citation to our sequences could correct only a few of the mismatches, namely those having U's or C's in the 5' end of the acceptor arm and no A's in the 3' end. This could be the case for trnC and trnR, and, for one base only, trnM and trnS(agy). In general, T. bielanensis tRNAs appear to be subject to structural constraints to a lesser extent on both sequence and size.

Ribosomal RNAs
As in all other mitochondrial genomes sequenced so far, two genes for ribosomal RNAs were present in T. bielanensis, one for the small and one for the large ribosomal subunit. These were located between trnQ and trnV and between trnV and trnL(cun), respectively (fig. 1 ). Their exact boundaries could not be identified by unambiguously analyzing the sequence alone, due to the high sequence variability at the extremities of rRNAs, but their lengths were confirmed by the reconstruction of their secondary structures. A detailed analysis of the structure and variability of rRNAs in T. bielanensis and other hexapods will be given elsewhere (unpublished data).

A+T-Rich Region: Structure and Heteroplasmy
The A+T-rich region in T. bielanensis mtDNA, comprising the region between trnI and trnQ, was 955 nt long. This length was well in the range of other arthropods, which show a remarkable variability, from 263 nt for Rhipicephalus to 4,601 nt for D. melanogaster (Zhang and Hewitt 1997Citation ). The structure of this sequence, outlined in figure 4a, was remarkable in that a close array of repeats could be found at its 5' end. At the beginning of the region, 1 nt apart from the beginning of trnQ, three tandem repeats extended for 550 nt, accounting for more than a half of the whole A+T region. The first two repeats were identical in sequence, 193 nt long, and were called R-repeats. The third repeat was slightly different in sequence, 164 nt long, and was called a D-repeat. The D-repeat was nearly identical to the R-repeats in 87 nt at the beginning and 46 nt at the end, while the middle part was completely different. An additional sequence resembling the repeats, called a pseudorepeat, could be found directly after the end of the D-repeat but was only 38 nt long and will not be considered further. We searched the entire A+T-rich region for sequences capable of forming stable secondary structures using the program RNAdraw (Matzura and Wennborg 1996Citation ). A close array of secondary structures was predicted by the algorithm, but due to the uncertainty of extending to DNA results that are based on RNA energetic values, only four structures were identified as possible candidates. These structures are presented in figure 5 : the first one (fig. 5a ) can be found in each repeat (both R and D), while the second (fig. 5b ) encompasses the junction between any repeats. These structures are thus repeated three and two times, respectively, in the shorter haplotype and more times as the number of repeats increases (see below). Two additional inferred secondary structures are present in single copy: one at the junction between the D-repeat and the nonrepeated sequence (fig. 5c ), and one in the unrepeated sequence (fig. 5d ). Neither of these, nor any of the other possible stem-forming sequences, is flanked by the conserved sequences identified in many hexapods by Zhang, Szymura, and Hewitt (1995)Citation ; thus, it is impossible to assess the homology between any of the structures of T. bielanensis and those identified by these authors.



View larger version (15K):
[in this window]
[in a new window]
 
Fig. 4.—Structure of the A+T-rich region and flanking tRNAs in the three haplotypes identified. a, Small. b, Medium. c, Long

 


View larger version (25K):
[in this window]
[in a new window]
 
Fig. 5.—Putative secondary structures in the A+T-rich region. Numbers indicate the positions of such structures in the deposited sequence of Tetrodontophora bielanensis.

 
Every PCR amplification encompassing the A+T-rich region consistently failed to give a single product, but, rather, revealed a close array of bands, which we cloned as described above. Three families of clones, whose structures are outlined in figure 4 , were identified and sequenced. The three clones differed in numbers of R-repeats, with two, three, or four repeats present. The sequences flanking the repeats were identical in the three clones. This is clearly a case of heteroplasmy; that is, mitochondrial haplotypes differing in the structure of the A+T-rich region are present in the same individual. The nine sequences of R-repeats obtained in the three haplotypes were separated and aligned. Five of them were identical, with three differing for a single substitution and one differing for a 1-bp indel. These four differences are autapomorphies of single sequences and thus do not carry any information about the possible "phylogenetic" relationships among single repeats. This may be an effect of the concerted evolution that these sequences have been hypothesized to undergo in mitochondrial control regions (Solignac, Monnerot, and Mounolou 1986Citation ). Tandem duplicated sequences in the A+T/control region have been observed in a variety of animal groups. This kind of heteroplasmy, that is, the difference in the number of tandemly repeated sequences, has been described for many taxa, including, among hexapods, Orthoptera, Diptera, Coleoptera, and Hymenoptera (reviewed in Zhang and Hewitt 1997Citation ). The presence of a slightly different repeat at the end of the array (here called the D-repeat) is unprecedented in hexapods (Zhang and Hewitt 1997Citation and references therein). Several hypotheses have been put forth on the possible mechanisms responsible for heteroplasmy arising in the number of repeats. Recombination, through an intermediate form of unicircular dimer and slippage during replication, seems to be the most appropriate in the case of directly repeated sequences (reviewed in Zhang and Hewitt 1997Citation ). Some relevance to the existence of heteroplasmy has been hypothesized for secondary-structure elements, including flanking tRNA sequences, almost always present in the A+T/control region (Stanton et al. 1994Citation ). These may act as signaling sites to some mechanism involved in recombination, or they may stabilize intermediate structures during replication.

Gene Order and its Evolutionary Significance
The order and orientation of genes and tRNAs along the mitochondrial genome of Tetrodontophora (fig. 2 ) differs from that of Drosophila in the translocation of two tRNAs. trnQ, which is located in Drosophila between trnI and trnM, is found in Tetrodontophora between rrnS and the A+T-rich region. trnS(ucn) (between ND1 and cob in Drosophila) is translocated between trnM and trnI in Tetrodontophora. In both cases, the orientation does not change.

Both translocations are unprecedented among arthropods and thus are to be considered autapomorphic features of some restricted taxa comprising Tetrodontophora. The remaining genes have the same order as in Drosophila, which has been proposed to carry the ancestral hexapodan arrangement (Crease 1999Citation ). Preliminary unpublished data in another collembolan species from the same suborder (Arthropleona), but a different family, show that trnS(ucn) is located, as in Drosophila, between ND1 and cob. Therefore, the autapomorphic feature represented by the translocation of this tRNA could be limited to a single family.

All gene translocations found so far in insect mitochondrial genomes are useless for reconstructing phylogenetic relationships among Hexapoda, either because they are autapomorphic features of restricted groups (Crozier and Crozier 1993Citation ; Mitchell, Cockburn, and Seawright 1993Citation ) or because they are clearly examples of homoplasy (Flook, Rowell, and Gellissen 1995Citation ). However, sampling is still very scarce and biased toward a few holometabolous orders, and additional taxa might provide further insights into the mechanisms of evolution of mitochondrial gene order. Nevertheless, the finding of different gene arrangements restricted to groups of lower taxonomic levels, such as within the order Diptera (Clary and Wolstenholme 1985Citation ; Beard, Hamm, and Collins 1993Citation ), the Collembolan suborder Arthropleona, the acarine family Ixodidae (Black and Roehrdanz 1998Citation ; Campbell and Barker 1999Citation ), and the crustacean order Decapoda (Hickerson and Cunningham 2000)Citation , suggests that gene order data may be more informative at lower taxonomic ranks than previously hypothesized (Boore, Lavrov, and Brown 1998Citation ; Boore and Brown 2000Citation ).

Phylogenetic Analysis
Phylogenetic analyses were performed on the concatenated amino acid sequences of all PCGs of T. bielanensis in conjunction with those from 15 other arthropodan sequences whose complete mtDNA sequences were available in GenBank. These sequences were as follows: D. yakuba (X03240; Clary and Wolstenholme 1985Citation ), D. melanogaster (U37541; Lewis, Farr, and Kaguni 1995Citation ), A. gambiae (L20934; Beard, Hamm, and Collins 1993Citation ), A. quadrimaculatus (L04272; Mitchell, Cockburn, and Seawright 1993Citation ), C. capitata (AJ242872; Spanos et al. 2000)Citation , C. hominivorax (AF260826; Lessinger et al. 2000)Citation , A. mellifera (L06178; Crozier and Crozier 1993Citation ), L. migratoria (X80245; Flook, Rowell, and Gellissen 1995Citation ), P. monodon (AF217843; Wilson et al. 2000)Citation , A. franciscana (X69067; Valverde et al. 1994Citation ), D. pulex (AF117817; Crease 1999Citation ), P. longicarpus (AF150756; Hickerson and Cunningham 2000)Citation , L. polyphemus (AF216203; Lavrov, Boore, and Brown 2000)Citation , and I. hexagonus and R. sanguineus (AF081828 and AF081829; Black and Roehrdanz 1998Citation ). The sequence of the anellid L. terrestris (U24570; Boore and Brown 1995Citation ) was used as an outgroup in all phylogenetic analyses. Preliminary parsimony and likelihood analyses showed that Apis always clustered with the two ticks. This abnormal result was correlated with the extreme nucleotide composition bias of the Apis sequence, which was thus removed from the subsequent analyses (as in Wilson et al. 2000)Citation .

The removal of Apis left a 16-taxon data set (3,329 characters after removal of gap-experiencing sites, of which 3,414 were variable and 1,919 were parsimony-informative), which was analyzed using MP and ML. The topologies of the ML and MP trees (shown in fig. 6 ) were very similar.



View larger version (18K):
[in this window]
[in a new window]
 
Fig. 6.—Phylogenetic tree based on amino acid sequences depicting relationships among the 16 taxa used in this study (Apis was removed from the analysis, as explained in the text). This topology resulted from parsimony and likelihood analyses of the total protein-coding gene data set. In the maximum-parsimony analysis, which recovered a single most-parsimonious tree (11,340 steps; consistency index = 0.702), nodal support was estimated by running 1,000 bootstrap replications (value below the node). In the maximum-likelihood analysis, the "Quick add OTUs search" strategy was first applied, and the best 200 topologies were retained and then subjected to "Local rearrangement search" in order to estimate the ML tree (ln L = -75,358.66). Nodal support was estimated as local bootstrap probabilities (LBPs) by the RELL method with 1,000 replications (value above the node). In both cases, only the nodes with >70% probability were retained (Hillis and Bull 1993Citation ). Branch lengths represent likelihood estimates

 
The most striking result of this analysis is that T. bielanensis does not cluster with the winged insects, and therefore the monophyly of Hexapoda is rejected. Although the position of T. bielanensis is not fully resolved, the sister group of the winged insects appears to be the Malacostraca, here represented by the two decapodans P. monodon and P. longicarpus, which leads to the rejection of hexapodan monophyly. This evidence confirms some recent molecular analyses (Giribet and Ribeira 2000Citation and references therein) but is in conflict with other molecular studies (e.g., Regier and Shultz 1997Citation ) and with morphological analyses (Kristensen 1997Citation ).

A second important conclusion is that crustaceans appear paraphyletic in our reconstruction. The two classes of Crustacea represented in this analysis, Malacostraca and Branchiopoda, do not form a monophyletic group, with the former being more closely related to the winged insects than to the Branchiopoda. Not surprisingly, this result confirms that obtained with a similar data set by Wilson et al. (2000)Citation , and the inclusion of P. longicarpus reinforces their conclusion. The paraphyly of the Crustacea has been suggested by other molecular and morphological studies (see Wilson et al. [2000]Citation for a thorough discussion), and our results support the view that insects may have originated from a group of crustaceans.

The remaining relationships resulting from our analysis are fairly congruent with the classical interpretation of arthropodan phylogeny, including the monophyly of the winged insects and that of the two ticks. Within the winged insects, the basal position of L. migratoria is confirmed, while the dipteran sequences are clustered according to the Brachycera/Nematocera split. However, our MP and ML analyses are in conflict with the neighbor-joining analysis of Lessinger et al. (2000)Citation regarding the relative position of C. hominivorax and C. capitata with respect to Drosophila. The tree generated by our analysis, in fact, places C. hominivorax as the sister taxon of Drosophila, therefore rejecting the Acalyptrate taxon (Drosophila + Ceratitis).

Low or no support was found for the deeper nodes, including the monophyly of the chelicerates. Two nodes, Artemia salina + D. pulex and the node uniting all crustacean and hexapodan sequences, were well supported in the likelihood analysis but poorly supported (65% and 54%, respectively) in the parsimony analysis, suggesting that at least some of the relationships are sensitive to the methodological approach of phylogenetic reconstruction.


    Acknowledgements
 TOP
 Abstract
 Introduction
 Materials and Methods
 Results and Discussion
 Acknowledgements
 References
 
This work was supported by grants from M.U.R.S.T. (Progetti di ricerca di interesse nazionale to F.F.) and from the University of Siena (P.A.R. 1999 to F.F and Progetto giovani ricercatori to F.N.). We thank D. Lavrov for providing the sequences of some universal primers; A. Vandergarst, two anonymous reviewers, and R. H. Thomas for useful comments on the manuscript; and E. Nebuloso (E.N.E.A.) for valuable help with sequencing.


    Footnotes
 
Richard Thomas, Reviewing Editor

1 Abbreviations: ML, maximum likelihood; MP, maximum parsimony; PCG, protein-coding gene. Back

2 Keywords: complete mitochondrial genome Tetrodontophora bielanensis, Collembola heteroplasmy tRNA translocations arthropod phylogeny Back

3 Address for correspondence and reprints: Francesco Nardi, Department of Evolutionary Biology, University of Siena, via P.A. Mattioli 4, 53100 Siena, Italy. E-mail: nardi{at}unisi.it Back


    References
 TOP
 Abstract
 Introduction
 Materials and Methods
 Results and Discussion
 Acknowledgements
 References
 

    Adachi J., M. Hasegawa, 1996 MOLPHY version 2.3: programs for molecular phylogenetics based on maximum likelihood Comput. Sci. Monogr 28:1-150

    Beard C. B., D. M. Hamm, F. H. Collins, 1993 The mitochondrial genome of the mosquito Anopheles gambiae, DNA sequence, genome organization and comparisons with mitochondrial sequences of other insects Insect Mol. Biol 2:103-104[Medline]

    Bitsch C., J. Bitsch, 2000 The phylogenetic interrelationships of the higher taxa of apterygote hexapods Zool. Scripta 29:131-156[ISI]

    Black W. C., R. L. Roehrdanz, 1998 Mitochondrial gene order is not conserved in arthropods: prostriate and metastriate tick mitochondrial genomes Mol. Biol. Evol 15:1772-1785[Abstract/Free Full Text]

    Blanchette M., T. Kunisawa, D. Sankoff, 1999 Gene order breakpoint evidence in animal mitochondrial phylogeny J. Mol. Evol 49:193-203[ISI][Medline]

    Boore J. L., 1999 Animal mitochondrial genomes Nucleic Acids Res 27:1767-1780[Abstract/Free Full Text]

    Boore J. L., W. M. Brown, 1995 Complete sequence of the mitochondrial DNA of the anellid worm Lumbricus terrestris. Genetics 141:305-319[Abstract/Free Full Text]

    ———. 2000 Mitochondrial genomes of Galathealinum, Helobdella, and Platynereis: sequence and gene arrangement comparison indicate that Pogonophora is not a phylum and Anellida and Arthropoda are not sister taxa Mol. Biol. Evol 17:87-106[Abstract/Free Full Text]

    Boore J. L., T. M. Collins, D. Stanton, L. L. Daehler, W. M. Brown, 1995 Deducing the pattern of arthropod phylogeny from mitochondrial DNA rearrangements Nature 376:163-165[ISI][Medline]

    Boore J. L., D. V. Lavrov, W. M. Brown, 1998 Gene translocation links insects and crustaceans Nature 392:667-668[ISI][Medline]

    Cabot E. L., A. T. Beckenback, 1989 Simultaneous editing of multiple nucleic acid and protein sequences with ESEE Comp. Appl. Biosci 5:233-234[Medline]

    Campbell N. J. H., S. C. Barker, 1999 The novel mitochondrial gene arrangement of the cattle tick, Boophilus microplus: fivefold tandem repetition of a coding region Mol. Biol. Evol 16:732-740[Abstract]

    Carapelli A., F. Frati, F. Nardi, R. Dallai, C. Simon, 2000 Molecular phylogeny of the Apterygotan insects based on nuclear and mitochondrial genes Pedobiologia 44:361-373[ISI]

    Caterino M. S., S. Cho, F. A. H. Sperling, 2000 The current state of insect molecular systematics: a thriving tower of Babel Annu. Rev. Entomol 45:1-54[ISI][Medline]

    Clary D. O., D. R. Wolstenholme, 1985 The mitochondrial DNA molecule of Drosophila yakuba: nucleotide sequence, gene organization, and genetic code J. Mol. Evol 22:252-271[ISI][Medline]

    Crease T. J., 1999 The complete sequence of the mitochondrial genome of Daphnia pulex (Cladocera: Crustacea) Gene 233:89-99[ISI][Medline]

    Crozier R. H., Y. C. Crozier, 1993 The mitochondrial genome of the honeybee Apis mellifera: complete sequence and genome organization Genetics 133:97-117[Abstract/Free Full Text]

    De Brujin M. H. L., 1983 Drosophila melanogaster mitochondrial DNA: a novel organization and genetic code Nature 304:234-241[ISI][Medline]

    Flook P. K., C. H. F. Rowell, G. Gellissen, 1995 The sequence, organization, and evolution of the Locusta migratoria mitochondrial genome J. Mol. Evol 41:928-941[ISI][Medline]

    Frati F., C. Simon, J. Sullivan, D. L. Swofford, 1997 Evolution of the mitochondrial cytochrome oxydase II gene in Collembola J. Mol. Evol 44:145-158[ISI][Medline]

    Giribet G., C. Ribeira, 2000 A review of arthropod phylogeny: new data based on ribosomal DNA sequences and direct character optimization Cladistics 16:204-231[ISI]

    Hickerson M. J., C. W. Cunningham, 2000 Dramatic mitochondrial gene rearrangements in the hermit crab Pagurus longicarpus (Crustacea, Anomura) Mol. Biol. Evol 17:639-644[Abstract/Free Full Text]

    Hillis D. M., J. J. Bull, 1993 An empirical test of bootstrapping as a method for assessing confidence in phylogenetic analysis Syst. Biol 42:182-192[ISI]

    Kocher T. D., W. K. Thomas, A. Meyer, S. V. Edwards, S. Pääbo, F. X. Villablanca, A. C. Wilson, 1989 Dynamics of mitochondrial DNA evolution in animals: amplification and sequencing with conserved primers Proc. Natl. Acad. Sci. USA 86:6196-6200[Abstract]

    Kristensen N. P., 1997 The groundplan and basal diversification of the hexapods Pp. 282–293 in R. A. Fortey and R. H. Thomas, eds. Arthropod relationships. Chapman and Hall, London

    Kurabayashi A., R. Ueshima, 2000 Complete sequence of the mitochondrial DNA of the primitive opistobranch gastropod Pupa strigosa: systematic implication of the genome organization Mol. Biol. Evol 17:266-277[Abstract/Free Full Text]

    Lavrov D. V., J. L. Boore, W. M. Brown, 2000 The complete mitochondrial DNA sequence of the horseshoe crab Limulus polyphemus. Mol. Biol. Evol 17:813-824[Abstract/Free Full Text]

    Lessinger A. C., A. C. Martins Junqueira, T. A. Lemos, E. L. Kemper, F. R. da Silva, A. L. Vettore, P. Arruda, A. M. L. Azeredo-Espin, 2000 The mitochondrial genome of the primary screwworm fly Cochliomyia hominivorax (Diptera: Calliphoridae) Insect Mol. Biol 9:521-529[ISI][Medline]

    Lewis D., C. Farr, L. Kaguni, 1995 Drosophila melanogaster mitochondrial DNA: completion of the nucleotide sequence and evolutionary comparisons Insect Mol. Biol 4:263-278[ISI][Medline]

    Matzura O., A. Wennborg, 1996 RNAdraw: an integrated program for RNA secondary structure calculation and analysis under 32-bit Microsoft Windows CABIOS 12:246-249

    Mitchell S. E., A. F. Cockburn, J. A. Seawright, 1993 The mitochondrial genome of Anopheles quadrimaculatus species A: complete nucleotide sequence and gene organization Genome 36:1058-1073[ISI][Medline]

    Ojala D., J. Montoya, G. Attardi, 1981 tRNA punctuation model of RNA processing in human mitochondria Nature 290:470-474[ISI][Medline]

    Regier J. C., J. W. Shultz, 1997 Molecular phylogeny of the major arthropod groups indicates polyphyly crustaceans and a new hypothesis for the origin of hexapods Mol. Biol. Evol 14:902-913[Abstract]

    Sambrook J., E. F. Fristch, T. Maniatis, 1989 Molecular cloning: a laboratory manual Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y

    Scouras A., M. J. Smith, 2001 A novel mitochondrial gene order in the crinoid echinoderm Florometra serratissima. Mol. Biol. Evol 18:61-73[Abstract/Free Full Text]

    Shahjahn R. M., K. J. Hughes, R. A. Leopold, J. D. de Vault, 1995 Lower incubation temperature increases yield of insect genomic DNA isolated by the CTAB method Biotechniques 19:333-334

    Sharp P. M., T. M. F. Tuohy, K. R. Mosurski, 1986 Codon usage in yeast: cluster analysis differentiates highly and lowly expressed genes Nucleic Acids Res 14:5125-5143[Abstract]

    Simon C., A. Franke, A. Martin, 1991 The polymerase chain reaction: DNA extraction and amplification Pp. 329–356 in G. M. Hewitt, A. W. B. Johnston, and J. P. W. Young, eds. Molecular techniques in taxonomy. NATO ASI series, Springer-Verlag, Berlin, Heidelberg

    Simon C., F. Frati, A. Beckenbach, B. Crespi, H. Liu, P. Flook, 1994 Evolution, weighting, and phylogenetic utility of mitochondrial gene sequences and a compilation of conserved polymerase chain reaction primers Ann. Entomol. Soc. Am 87:651-701[ISI]

    Solignac M., M. Monnerot, J. C. Mounolou, 1986 Concerted evolution of sequence repeats in Drosophila mitochondrial DNA J. Mol. Evol 24:53-60[ISI]

    Spanos L., G. Koutroumbas, M. Kostyfakis, C. Louis, 2000 The mitochondrial genome of the Mediterranean fruit fly, Ceratitis capitata. Insect Mol. Biol 9:139-144[ISI][Medline]

    Stanton D. J., L. L. Daehler, C. C. Moritz, W. M. Brown, 1994 Sequence with the potential to form stem-and-loop structures are associated with coding-region duplications in animal mitochondrial DNA Genetics 137:233-241[Abstract/Free Full Text]

    Staton J. L., L. L. Daehler, W. M. Brown, 1997 Mitochondrial gene arrangement of the horseshoe crab Limulus polyphemus L.: conservation of major features among arthropodan classes Mol. Biol. Evol 14:867-874[Abstract]

    Swofford D. L., 1998 PAUP* Phylogenetic analysis using parsimony (*and other methods). Version 4.0. Sinauer, Sunderland, Mass

    Thompson J. D., T. J. Gibson, F. Plewniak, F. Jeanmougin, D. G. Higgins, 1997 The ClustalX windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools Nucleic Acids Res 24:4876-4882

    Valverde J. R., B. Batuecas, C. Moratilla, R. Marco, R. Garesse, 1994 The complete mitochondrial DNA sequence of the crustacean Artemia franciscana. J. Mol. Evol 39:400-408[ISI][Medline]

    Whalley P., E. A. Jarzembowski, 1981 A new assessment of Rhyniella, the earliest known insect, from the Devonian of Rhynie, Scotland Nature 291:317.[ISI]

    Wilson K., V. Cahill, E. Ballment, J. Benzie, 2000 The complete sequence of the mitochondrial genome of the crustacean Penaeus monodon: are malacostracan crustaceans more closely related to insects than branchiopods Mol. Biol. Evol 17:863-874[Abstract/Free Full Text]

    Yokobori S., S. Pääbo, 1995 Transfer RNA editing in land snail mitochondria Proc. Natl. Acad. Sci. USA 92:10432-10435[Abstract]

    ———. 1997 Polyadenylation creates the discriminator nucleotide of chicken mitochondrial tRNATyr J. Mol. Biol 265:95-99[ISI][Medline]

    Zhang D., G. M. Hewitt, 1997 Insect mitochondrial control region: a review of its structure, evolution and usefulness in evolutionary studies Biochem. Syst. Ecol 25:99-120[ISI]

    Zhang D., J. M. Szymura, G. M. Hewitt, 1995 Evolution and structural conservation of the control region of insect mitochondrial DNA J. Mol. Evol 40:382-391[ISI][Medline]

Accepted for publication March 15, 2001.