The expressed myoglobin was identified in the radular muscle and
isolated. Oxygen equilibrium measurements on the protein reveal a high
oxygen affinity. Val-B10 and Gln-E7, important residues for the
determination of the oxygen affinity, are strikingly different from the standard molluscan pattern (Conti, E., Moser, C., Rizzi, M.,
Mattevi, A., Lionetti, C., Coda, A., Ascenzi, P., Brunori, M.,
Bolognesi, M. (1993) J. Mol. Biol. 233, 498-508).
The single gene encoding the globin chain is interrupted by three
introns at positions A3.2, B12.2, and G7.0. Comparison with other
nonvertebrate globin genes reveals on the one hand conservation (B12.2
and G7.0) and on the other hand variability of the insertion positions
(A3.2). The Biomphalaria myoglobin sequence was used together with all other molluscan globin sequences available to assess
the origin and phylogeny of the phylum. Our results confirm the doubts
about the monophyletic origin of the Mollusca that were first
raised using SSU rRNA as a molecular marker.
 |
INTRODUCTION |
Nonvertebrate Hbs occur episodically in most but not all
nonvertebrate phyla and exhibit a much greater variability in their primary and quaternary structures and in their functional properties than their vertebrate counterparts (1-3). Within this
structural/functional variability, two major groups can be
distinguished: (i) intracellular (myoglobin-like) hemoproteins that
occur in tissues and function as oxygen storage molecules and (ii)
intra- or extracellular (hemoglobin-like) hemoproteins that transport
oxygen between tissues. Intracellular globins mainly have low
Mr values, whereas the extracellular Hbs of
nonvertebrates show a wide variety of Mr values.
Their extracellular location makes a high Mr
advantageous in minimizing loss by excretion. A high
Mr has been achieved by different routes,
exemplified in Annelida by the aggregation of many low
Mr chains into a functional Hb or, as proposed
for some Mollusca and Arthropoda, by duplication of the low
Mr chains into polymeric globins. Despite this
variability, there is compelling evidence that all globins are derived
from a common ancestor that consists of a polypeptide chain of ~150 amino acids displaying the "globin fold" (4-6).
Within the phylum Mollusca Hbs may occur (i) intracellularly in
circulating erythrocytes as monomers, dimers, or tetramers composed of
single domain or two-domain globin chains (Bivalvia); (ii)
intracellularly in the cytoplasm of specific tissues
(Mbs)1 as monomers or dimers
of single domain globin chains (Gastropoda, symbiont-containing
bivalves, and Polyplacophora); or (iii) extracellularly dissolved in
the hemolymph as high Mr aggregates of
multidomain globin chains (Gastropoda) (for a review see Ref. 1 and
references therein). The physico-chemical characteristics of these
molecules are summarized in Table I.
The gene encoding the ancestral globin chain is assumed to be
interrupted by three introns inserted in the B (B12.2), E (E15.0), and
G (G7.0) helices, as in plant legHbs. However, the conservation of this
intron pattern and the exact insertion positions of the introns during
evolution are matters of ongoing debate. These features may reflect the
ancestral configuration (that may be masked by subsequent intron
removal and/or displacement exemplifying the introns early hypothesis),
new insertion events (introns late hypothesis), or a combination of all
these (7-11). Intron location and sequence may also shed light on the
origin of polymeric globin genes occurring in molluscs and arthropods
(12).
Globin and SSU rRNA sequences are commonly used to trace molecular
evolution and phylogeny (13-15). The origin of molluscs as well as the
phylogenetic relationships between the molluscan classes remain
controversial. Recent molecular studies, using SSU rRNA as a parameter,
failed to provide unambiguous results and suggest that Mollusca may not
be a monophyletic group (e.g. Refs. 16 and 17).
The gastropod mollusc, Biomphalaria glabrata, an
intermediate host of Schistosoma mansoni, contains an
extracellular Hb in its hemolymph. This Hb has a
Mr of 1.75 × 10⁶ and a pI of 4.6 (18) and is
composed of multidomain globin chains of Mr
~180,000 with a minimum Mr of 17,700 per iron
atom (19). It is a glycoprotein containing 2 mol of hexoses (mannose,
galactose, and fucose) and 1 mol of glucosamine per
Mr 17,700 (20). Mbs are well characterized in
several molluscs (1). However, the presence of a Mb-like molecule in
B. glabrata tissue has not been reported.
We here describe the structure of B. glabrata Mb and its
coding gene, use it to study intron evolution as well as the
origin and phylogeny of the Mollusca, and report its oxygen
binding characteristics.
 |
EXPERIMENTAL PROCEDURES |
Biological Material--
The snails, B. glabrata,
were kindly donated and identified by Dr. Cecília Pereira de
Souza from Instituto René Rachou (Fundação Oswaldo
Cruz) and were maintained in glass aquariums.
cDNA Sequencing--
A B. glabrata neural tissue
cDNA library in λZAP and a partial cDNA sequence of a
putative myoglobin clone (RBGDA16T981) were provided by BRI and TIGR,
respectively. A specific 5' forward 20-mer BIO1
(5'-TACTGTCACACAACCAGCCC-3') was designed upstream of the start codon
based on that sequence information. The library was screened for a
full-length myoglobin cDNA clone by PCR using the BIO1 primer and
either the M13/pUC universal left or universal right primer. One µl
of the cDNA library was added to a 50-µl PCR reaction mix (75 mM Tris-HCl, pH 9, 20 mM
(NH4)2SO4, 0.01% Tween 20, 1.5 mM MgCl2, 0.2 mM dNTP, and 0.5 unit
of Taq polymerase) containing 100 ng of each primer. The PCR
was carried out for 30 cycles of 94 °C for 1 min, 50 °C for 1 min, and 72 °C for 2 min. The amplified product was purified and
subcloned in pBluescript KS. Recombinants were confirmed by PCR and
sequenced by the dideoxy chain termination method. A 3' reverse 20-mer
BIO2 (5'-GAGTTGGACAGGATCCGTGG-3') was designed downstream of the stop
codon based on the obtained cDNA sequence.
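As a quick check on the two primers quoted above, the following minimal sketch computes their length, G+C count, and a rule-of-thumb melting temperature. The Wallace rule (2 °C per A/T, 4 °C per G/C) is an assumption introduced here for illustration only; it is a rough approximation for short oligonucleotides and is not a calculation reported in this study.

```python
# Primer sequences exactly as quoted in the text (5'->3').
PRIMERS = {
    "BIO1": "TACTGTCACACAACCAGCCC",
    "BIO2": "GAGTTGGACAGGATCCGTGG",
}


def gc_count(seq: str) -> int:
    """Number of G and C bases in an oligonucleotide."""
    return sum(base in "GC" for base in seq.upper())


def wallace_tm(seq: str) -> int:
    """Rule-of-thumb Tm in deg C: 2 per A/T plus 4 per G/C (approximation)."""
    gc = gc_count(seq)
    at = len(seq) - gc
    return 2 * at + 4 * gc


for name, seq in PRIMERS.items():
    # Prints primer name, length, G+C count, and approximate Tm.
    print(name, len(seq), gc_count(seq), wallace_tm(seq))
```

Under this approximation both 20-mers have estimated melting temperatures above the 50 °C and 55 °C annealing steps described in the procedures.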
Purification of Myoglobin and Protein Sequencing--
After
identification and sequencing of the B. glabrata Mb at the
cDNA level, the expressed protein was localized in the radular tissue and extracted. Freshly collected radulas from B. glabrata were homogenized in 0.1 M Tris-HCl buffer, pH
7.4, containing 0.2 M NaCl. The protease inhibitors
phenylmethanesulfonyl fluoride, tosylphenylalanyl chloromethyl ketone,
iodoacetamide, EDTA, and pepstatin A were used to the following final
concentrations: 1 mM, 100 µM, 50 µM, 5 mM, and 1 µM,
respectively. Sodium nitrite was added to 0.8% final concentration in
order to convert the Mb to metmyoglobin, and the mixture was
centrifuged at 10,000 rpm for 10 min to remove cell debris. After
centrifugation, this sample was applied to a Superose 12 HR gel
filtration column (0.5-cm diameter; 25-cm length) and eluted in the
same buffer used for homogenization. Fractions (1 ml) were collected
and monitored at 415 and 280 nm. Sample homogeneity was tested by 15%
SDS-polyacrylamide gel electrophoresis according to Laemmli (21).
The purified protein was digested with trypsin, and the resulting
peptides were separated by reverse phase high performance liquid
chromatography using a Vydac C4 column developed with 0.1% trifluoroacetic acid/CH3CN. Some of the peptides were
sequenced in an ABI 471-B sequencer operated as recommended by the
manufacturer.
Circular Dichroism--
Circular dichroism measurements were made in
a JASCO J20 spectropolarimeter with a constant flux of ultrapure
N2 (White Martins). The myoglobin sample at 0.117 mg/ml in
100 mM phosphate buffer, pH 7.0, was used in a 0.01-cm quartz
cuvette. The spectropolarimeter was calibrated with respect to
wavelength and signal amplitude using D-10 camphorsulfonic acid at
290.5 and 192.5 nm and D-pantolactone at 219 nm (22, 23).
The data were converted to a residual molecular ellipticity and
analyzed using the software Dicroprot version 2.3d using the Varselec
procedure to evaluate the secondary structure content of this myoglobin
(24, 25). All spectra from the data bank of the program were used as
reference for the calculations.
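The conversion of an observed signal to a per-residue ellipticity can be sketched as follows. This is an illustration only: the formula is the commonly used mean residue ellipticity expression, the residue count of 148 and the mean residue weight convention Mr/(N − 1) are assumptions, and the observed ellipticity value is hypothetical. The actual conversion performed inside Dicroprot was not inspected.

```python
def mean_residue_ellipticity(theta_mdeg, mean_residue_weight, path_cm, conc_mg_per_ml):
    """Convert an observed ellipticity (mdeg) to deg*cm^2/dmol per residue."""
    return theta_mdeg * mean_residue_weight / (10.0 * path_cm * conc_mg_per_ml)


MR = 16049.0        # calculated Mr of B. glabrata Mb, from the text
N_RESIDUES = 148    # assumption: taken from the 148-codon open reading frame
PATH_CM = 0.01      # cuvette path length, from the text
CONC = 0.117        # mg/ml, from the text

# Assumed convention: mean residue weight = Mr / number of peptide bonds.
mrw = MR / (N_RESIDUES - 1)

theta_obs = -1.5    # hypothetical reading in mdeg, for illustration only
print(round(mean_residue_ellipticity(theta_obs, mrw, PATH_CM, CONC)))
```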
The myoglobin concentration was estimated by absorbance at 278 nm using
the extinction coefficient of Aplysia kurodai myoglobin (33.6 mM⁻¹·cm⁻¹) and by the
Lowry method, giving similar results (26, 27).
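The absorbance-based estimate follows the Beer-Lambert relation, sketched below. The extinction coefficient and Mr are the values quoted in the text; the 1-cm path length and the absorbance reading of 0.245 are hypothetical, chosen only so that the result lands near the 0.117 mg/ml sample concentration mentioned above.

```python
EPSILON_278 = 33.6   # mM^-1 cm^-1, A. kurodai Mb value quoted in the text
MR = 16049.0         # calculated Mr of B. glabrata Mb, from the text


def conc_mM(absorbance, path_cm=1.0, epsilon=EPSILON_278):
    """Beer-Lambert: concentration (mM) = A / (epsilon * path length)."""
    return absorbance / (epsilon * path_cm)


def conc_mg_per_ml(absorbance, path_cm=1.0):
    """Convert the millimolar concentration to mg/ml using Mr (g/mol)."""
    return conc_mM(absorbance, path_cm) * MR / 1000.0


# Hypothetical absorbance reading in an assumed 1-cm cuvette.
print(round(conc_mg_per_ml(0.245), 3))
```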
Oxygen Equilibrium Measurements--
The Mb used for oxygen
equilibrium measurements was reduced by dialysis at 5 °C against
O2-free, N2- and CO-equilibrated 0.01 M HEPES buffer, pH 7.68, containing 0.5 mM EDTA
and 0.1% sodium dithionite and further dialyzed against the same
buffer in the absence of sodium dithionite. It was then concentrated by
centrifugation in Ultrafree-MC Millipore tubes with 10,000 NMWL filters
(Millipore Corp., Bedford, MA).
Oxygen equilibria were measured using a modified diffusion chamber,
where the absorption of ultrathin layers of the Mb solution is recorded
continuously during stepwise increases in the oxygen tension of
equilibration gases supplied by cascaded Woesthoff pumps for mixing
pure (>99.998%) nitrogen, oxygen, and air (3, 28). This procedure
showed high reproducibility (P50 = 4.73 ± 0.04, n = 6, for stripped human Hb at 25 °C and pH
7.4) (29). Values of P50 and
n50 were interpolated from Hill plots.
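The interpolation of P50 and n50 from a Hill plot can be sketched as a straight-line fit of log (S/(1 − S)) against log PO2. The data below are synthetic, generated from the Hill equation with assumed parameters; they are not measurements from this study, and the function names are introduced here for illustration.

```python
import math


def hill_saturation(po2, p50, n):
    """Fractional saturation predicted by the Hill equation."""
    return po2 ** n / (p50 ** n + po2 ** n)


def hill_fit(po2_values, saturations):
    """Least-squares line through log(S/(1-S)) versus log PO2.

    Returns (P50, n50): the PO2 at which the line crosses zero and its slope.
    """
    xs = [math.log10(p) for p in po2_values]
    ys = [math.log10(s / (1.0 - s)) for s in saturations]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return 10 ** (-intercept / slope), slope


# Synthetic equilibrium points (assumed P50 = 0.38 mm Hg, n = 0.9).
po2 = [0.1, 0.2, 0.4, 0.8, 1.6]
sat = [hill_saturation(p, 0.38, 0.9) for p in po2]
p50, n50 = hill_fit(po2, sat)
print(p50, n50)
```

With noise-free synthetic input the fit simply recovers the parameters used to generate the points; with real data the regression is usually restricted to the region around half-saturation.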
Genomic DNA Sequencing--
Genomic DNA was isolated from muscle
tissue of B. glabrata by the
N-cetyl-N,N,N-trimethylammoniumbromide
method (30) and used as template in an asymmetric PCR reaction using
Taq extender (Stratagene) with the primers BIO1 and BIO2.
The first 10 cycles were carried out at 94 °C for 30 s,
55 °C for 1 min, and 72 °C for 5 min in the presence of the BIO2
primer only, and then BIO1 primer was added and another 25 cycles were
carried out. Positively amplified products were purified and cloned in
the pCRII vector (Invitrogen) and sequenced.
Southern Blotting--
Genomic DNA was digested separately with
PstI and HindIII. Restriction fragments were
separated by agarose gel electrophoresis and denatured DNA was
transferred to a Hybond N membrane (31). After immobilizing by
ultraviolet irradiation and prehybridization (in 40% formamide, 50 mM phosphate buffer, pH 7.4, 5 mM EDTA, 0.1%
SDS, 5× Denhardt's, 0.9 M NaCl) at 42 °C for 3-4 h,
the filter was hybridized overnight at 42 °C in the same
prehybridization mixture with the denatured probe added. A genomic PCR
fragment from the Mb of B. glabrata 32P-labeled
by nick translation was used as a probe. The filter was washed
subsequently at 65 °C, 1 × 15 min in 2× SSC, 0.1% SDS; 1 × 15 min in 1× SSC, 0.1% SDS; 2 × 15 min in 0.1× SSC,
0.1% SDS and exposed to autoradiography for 4 h at
−70 °C.
Tree Construction--
SSU rRNA sequences of species related to
those for which globin sequences are available were taken from the Van
de Peer et al. (32) alignment. Globin sequences were aligned
according to the nonvertebrate globin template (6). On the basis of
globin and SSU rRNA sequences, neighbor-joining (33) and maximum
parsimony trees were constructed, using the computer programs TREECON
(34) and PAUP (35), respectively. For SSU rRNA data, neighbor-joining trees were derived on the basis of distance matrices calculated using
the formula of Jukes and Cantor (36). Gaps were not taken into account.
For globin amino acid sequences, neighbor-joining trees were derived on
the basis of distance matrices calculated using the formula of Poisson
as implemented in TREECON without taking gaps into account. Maximum
parsimony trees of both globin amino acid and SSU rRNA sequences were
constructed using the heuristic search option, with the
tree-bisection-reconnection branch swapping option invoked. Gaps and
phylogenetically uninformative sites were excluded from the maximum
parsimony analyses. The confidence of neighbor-joining and maximum
parsimony trees was tested by bootstrap analyses, running 1000 replicates. According to Hillis and Bull (37), nodes are considered to
be reliable if they have a bootstrap value of at least 70%.
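The two distance corrections named above can be illustrated with a short sketch. The Jukes-Cantor and Poisson formulas are the standard textbook forms; whether TREECON implements them in exactly this way was not checked, and the aligned toy sequences are hypothetical. Columns containing a gap are skipped, in line with the statement that gaps were not taken into account.

```python
import math


def p_distance(seq_a, seq_b, gap="-"):
    """Fraction of differing sites, ignoring columns where either sequence has a gap."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor(p):
    """Jukes-Cantor corrected distance for nucleotide sequences."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def poisson(p):
    """Poisson-corrected distance for amino acid sequences."""
    return -math.log(1.0 - p)


# Hypothetical aligned fragments, for illustration only.
rna_a, rna_b = "ACGUACGUAC", "ACGUACGAAC"
prot_a, prot_b = "MSL-AEQK", "MSLTADQK"

print(jukes_cantor(p_distance(rna_a, rna_b)))
print(poisson(p_distance(prot_a, prot_b)))
```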
 |
RESULTS AND DISCUSSION |
The cDNA and Derived Amino Acid Sequence of the Biomphalaria
glabrata Myoglobin--
A B. glabrata cDNA library was
used as template in a PCR with primer BIO1 and vector-derived primers.
A full-length cDNA fragment of ~1500 base pairs was amplified,
subcloned, and sequenced as described under "Experimental
Procedures" (Fig. 1).
The initiation codon is preceded by an incomplete untranslated region
of 45 base pairs. Nevertheless, the absence of a leader sequence is
obvious. This classifies the encoded protein as intracellular. The open
reading frame extends for 148 codons and is followed by an
exceptionally long (970-base pair) 3'-untranslated region. A normal
polyadenylation signal is present.
The cDNA translated amino acid sequence can be aligned
unambiguously with globins with known tertiary structure, including those from three molluscan species (Scapharca inaequivalvis
I, Aplysia limacina, and Lucina pectinata
I), using the nonvertebrate globin template (6, 38-42) (Fig.
2). The alignment is confirmed by (i) the
exclusion of polar residues from 33 invariant nonpolar sites listed in
Lesk and Chothia (43), (ii) the alignment of Pro-C2, which determines
the folding of the BC corner, (iii) the presence of the invariant
His-F8 and Phe-CD1, (iv) the presence of a Gly-B6 essential for the
near crossing of the B and E helices, and (v) the presence of a Trp-H8,
which is indicative of invertebrate globins. This results in a low
total penalty score for the sequence presented (Table
II) and confirms the globin nature of the
protein (6).

Fig. 2.
Alignment of B. glabrata
myoglobin with selected sequences with known tertiary structure.
The helical segments of each globin three-dimensional structure are
shown. Tryptic peptides from purified B. glabrata Mb (Fig.
4) that have been sequenced are indicated. Phys, P. catodon/Mb; Aplim, A. limacina/Mb;
Lucina I, L. pectinata/HbI; ScaI,
S. inaequivalvis/HbI; Bg, B. glabrata/Mb.
|
|
Since the molecule is intracellular, it must be interpreted as a tissue
Hb or Mb type. As shown by the low penalty score for each motif, the
presence of all the standard helical segments including a short D-helix
can be accepted. Deviations from the standard pattern can be localized in
the E-helix (penalty score: 1.7). This is mainly due to the presence of
the hydrophobic Ile at external position E5 and of Val residues at
positions E8 and E10, where normally a larger side chain is observed.
The alignment of Fig. 2 was extended with 29 other molluscan globin
sequences, and the penalty scores against the nonvertebrate template
were calculated.2 The penalty
scores of the other molluscan globin sequences are low, with the
exception of the Hbs of the primitive deep sea clam Calyptogena
soyoae (44), suggesting that they follow closely the standard
globin fold (43). When B. glabrata Mb is compared with
molluscan globin sequences, the highest similarity (30.2%) is observed
with Cerithidea rhizophorarum Mb.
The hydrophobic lining of the heme pocket of B. glabrata Mb
is normal but is less occupied by aromatic side chains than that of
A. limacina and L. pectinata Mb. In B. glabrata, Mb-specific side chain substitutions occur at several
positions. In the B-helix, Val-B10 and Trp-B12 are unprecedented.
Position B10 is usually occupied by a large hydrophobic residue
(nonvertebrate globins studied (3) have Leu (27.8%), Phe (27.8%), Tyr
(27.8%), Met (11.0%) and Trp (5.6%)). Its side chain can be turned
into the heme pocket and become involved in the control of
O2 affinity through stabilization of ligand binding, as
shown for nematode and trematode Hbs (45, 46). It is very unlikely that
the small Val can fulfill such a ligand stabilization role in B. glabrata Mb. However, the nature of the B10 side chain is
correlated to the nature of the distal (E7) residue. Indeed, when a
Gln-E7 is present, 80% of the nonvertebrate globins display a Tyr-B10.
The adjacent residue at position E11 is also important in determining oxygen affinity. However, no correlation could be found between residues at positions E7/E11 and B10/E11.
The Gln-E7 of B. glabrata is shared with C. soyoae and L. pectinata, whereas Aplysiidae (A. limacina, A. juliana, and A. kurodai) display Val-E7 and all other mollusc globins His-E7. The Val-E7 of the
monomeric Mbs of the Aplysiidae, Bursatella leachii and Dolabella auricularia, does not contact the heme directly
(39, 47). Solution 1H NMR indicates that the hydrogen
bonding of the bound ligand by the standard His-E7 is taken over by
Arg-E10. This hydrogen bonding is responsible for the relatively high
ligand affinity and the slow dissociation rate (48, 49). In contrast,
the sequence of the dimeric Mbs of Busycon canaliculatum,
C. rhizophorarum, Nassa mutabilis, and
Buccinum undatum shows the classic His at the distal
position and consequently no Arg-E10. Neither the monomeric nor the
dimeric Mbs of the gastropods have residues capable of hydrogen bonding
with the ligand at position B10 (45, 50). It is clear that the ligand
stabilization system in B. glabrata Mb differs from the
unique mechanism described in the Aplysiidae. The combination of Val-B10,
Gln-E7, and Ile-E11 is unprecedented.
It can be assumed that in B. glabrata Mb Ala-B5, Leu-B9, and
Met-B13 and Asn-B8, Trp-B12, and Asn-B16 form two ridges between which
a single ridge of the G-helix (Phe-G15, Asn-G11, and Pro-G7) is packed
to form the B/G-helical contact (43). As such, a strong hydrophobic
interaction can be expected among Leu-B9, Phe-G15, and Trp-B12 to
stabilize the helical contact.
The observation that Cys (EF8) (which is considered to be responsible
for dimerization and is conserved in all dimeric Mbs) is absent in the
B. glabrata sequence confirms its monomeric nature.
Since the B. glabrata Mb was identified starting from a
cDNA clone, its expression in the animal itself was verified. As
described under "Experimental Procedures," a heme-containing
protein with an apparent Mr ~17,000 could be
isolated from the radular muscle (Fig.
3). Sequencing of the intact protein
clearly proves that the amino terminus is inaccessible for Edman
degradation, suggesting that the mature protein is acetylated as in all
other mollusc Mbs observed so far (Table I). The amino acid sequences
of several tryptic peptides clearly indicate the sequence
identity of the isolated protein with the cDNA-derived amino acid
sequence of the B. glabrata Mb (Fig. 2).

Fig. 3.
Purification of B. glabrata
myoglobin. A, gel filtration of radula extract of B. glabrata on a Superose 12 HR column equilibrated with Tris-HCl 0.1 M buffer, pH 7.4, containing 0.2 M NaCl.
B, 15% SDS-polyacrylamide gel electrophoresis of gel
filtration fractions. Lane 1, molecular weight markers;
lane 2, crude radular extract; lanes 3-5,
respectively, fractions 15, 16, and 17 from panel A.
|
|
The circular dichroism analysis of the B. glabrata Mb shows
46% α-helices, 17% β-sheets, 14% turns, and 23%
coil (Fig. 4). This α-helix content is
low compared with that of the vertebrate globins but is quite consistent
with the circular dichroism data obtained for A. kurodai Mb
(26). However, the crystallographic structure of A. limacina
Mb (39) reveals 76.6% α-helices, indicating the high structural
similarity of these mollusc myoglobins with their vertebrate relatives.
Therefore, the circular dichroism data obtained for B. glabrata Mb were interpreted as confirming the standard
globin fold.

Fig. 4.
Circular dichroism of the myoglobin of
B. glabrata. Mb at 0.117 mg/ml in 100 mM
phosphate buffer, pH 7.0. Path length was 0.01 cm.
|
|
We thus conclude that the B. glabrata Mb is effectively
expressed in the radular muscle and occurs as a monomeric molecule with
a calculated Mr of 16,049.
Oxygen Binding by B. glabrata Mb--
B. glabrata Mb
exhibits P50 values of 0.09, 0.21, 0.38, and
0.66 mm Hg at 5, 15, 20, and 25 °C, respectively (Fig.
5). The n50 values
observed (0.85-0.91) show the absence of cooperativity, which is in
agreement with the monomeric structure of the pigment.
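The temperature dependence of these P50 values can be summarized by a van't Hoff analysis, sketched below as a linear regression of ln P50 on 1/T. This is an illustration under stated assumptions: the temperatures are those given in the text (the legend of Fig. 5 lists 24 °C for one curve), the apparent heat of oxygenation is taken as R times the slope, and no correction for the heat of solution of oxygen is applied. The resulting value is not a figure reported by the authors.

```python
import math

R = 8.314  # gas constant, J mol^-1 K^-1

# P50 values (mm Hg) and temperatures (deg C) as quoted in the text.
TEMPS_C = [5, 15, 20, 25]
P50_MMHG = [0.09, 0.21, 0.38, 0.66]


def vant_hoff_delta_h(temps_c, p50_values):
    """Apparent heat of oxygenation (kJ/mol) from the slope of ln P50 versus 1/T."""
    xs = [1.0 / (t + 273.15) for t in temps_c]
    ys = [math.log(p) for p in p50_values]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx            # units: K
    return R * slope / 1000.0    # kJ/mol; negative means exothermic oxygenation


delta_h = vant_hoff_delta_h(TEMPS_C, P50_MMHG)
print(round(delta_h, 1))
```

A negative value reflects the decrease in oxygen affinity with rising temperature that is visible in the P50 series.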

Fig. 5.
Oxygen binding of B. glabrata Mb.
A, oxygen equilibria measured at 5 °C
(triangles), 15 °C (squares), and 24 °C
(circles) in the presence of 0.01 M HEPES
buffer, pH 7.53 (at 15 °C), containing 0.5 mM
EDTA. B, Hill plots (log [S/(1 − S)] versus log PO2, where
S represents fractional oxygen saturation of the Mb and the
slopes of the linear regressions give the Hill cooperativity
coefficients). C, van't Hoff plot, showing relation between
P50 and absolute temperature (T).
[Hb] = 0.049 mM.
|
|
The O2 affinities are high compared with those reported for
Mbs of other invertebrates (Table III).
Although the O2 affinity of B. glabrata Mb is
similar to that of Arenicola marina Mb I (0.38 mm Hg), it is
higher than that of A. marina Mb II (0.72 mm
Hg),3 whose lower affinity
compared with Mb I correlates with an E6 Ser/Pro exchange that may
change the surroundings of His-E7 (91).
B. glabrata Mb has a Gln-E7 in combination with Val-B10.
Zhao et al. (51) found that an E7 His/Gln mutation lowers
O2 affinity in sperm whale Mb but that this effect is
reversed by B10 Leu/Phe substitution, whereas human Hb E7 His/Gln shows
a strong increase in affinity (52). Ascaris suum Hb has a
very high oxygen affinity (on the order of 300 times higher than that
of the mammalian counterparts), which has been assigned to the
combination of Tyr-B10, Gln-E7, and Ile-E11. Ascaris Mb has
the same side chains at B10, E7, and E11, yet its oxygen affinity is
60-fold lower. These discrepancies may be explained by assuming that
other residues are essential to correctly position the B- and E-helices
to enable hydrogen bonding between Tyr-B10 and the bound oxygen (53).
Trematode globins have tyrosine residues at both B10 and E7 and, like
Ascaris Hb, exhibit very high oxygen affinities (Ref. 46;
Table III). The Val-B10 residue in B. glabrata Mb projects
into the heme cavity but cannot make a hydrogen bond with the bound
oxygen as Tyr does in Ascaris and the trematode globins
(46, 50). B. glabrata, however, does have a high oxygen
affinity, but it is unclear how this is obtained in the given
configuration.
These observations therefore suggest that (i) O2 affinity
cannot be correlated uniquely to single substitutions in the heme pocket, since there are so many other exchanges that may have an
influence, (ii) the same substitutions may have different effects in
different globins (as indicated by opposite effects of E7 His/Gln substitution in globin chains), and (iii) there may be a different molecular mechanism determining high oxygen affinity in B. glabrata Mb.
The Structure of the Biomphalaria glabrata Mb Gene and Comparison
with Other Nonvertebrate Globin Genes--
Primers flanking the coding
region were made based on the cDNA sequence and used in a PCR on
genomic DNA to determine the presence, site, and size of introns in the
Mb gene. A fragment of 3.29 kilobase pairs was amplified, suggesting a
total intron sequence length of 2.7 kilobase pairs. The gene fragment
was subsequently cloned and sequenced as described under
"Experimental Procedures" (Fig.
6).

Fig. 6.
Complete genomic DNA sequence of B. glabrata myoglobin. The coding sequence is translated in
boldface type.
|
|
The coding sequence is interrupted by three introns that are 1116, 1008, and 582 base pairs long. All exon/intron boundaries have the
expected acceptor/donor splice sites. Possible branch point sequences
are found in 2 out of 3 introns: CTAACT and CCAAC, beginning 37 and 67 base pairs upstream from the 3' splice junctions of intron II and
intron III, respectively. No known branch point sequence is detected in
intron I. Immediately upstream from the 3' splice site, no large
polypyrimidine tract is found (54).
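The intron lengths given above can be checked against the size of the amplified genomic fragment with simple arithmetic. The sketch uses only numbers quoted in the text; the variable names are introduced here for illustration.

```python
# Intron lengths (base pairs) as quoted in the text.
INTRON_LENGTHS_BP = {"I": 1116, "II": 1008, "III": 582}

# Size of the amplified genomic fragment (3.29 kilobase pairs).
FRAGMENT_BP = 3290

total_intron_bp = sum(INTRON_LENGTHS_BP.values())
non_intron_bp = FRAGMENT_BP - total_intron_bp

print(total_intron_bp)   # total intron sequence in the fragment
print(non_intron_bp)     # remainder: exons plus flanking sequence between the primers
```

The three introns sum to 2706 base pairs, consistent with the total intron length of about 2.7 kilobase pairs inferred from the fragment size.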
Using the alignment of Fig. 2, the intron insertion positions can be
assigned as A3.2, B12.2, and G7.0.
A three-intron/four-exon pattern with introns inserted in the B-, E-,
and G-helices is considered to be ancient. According to Hardison (8) and
Lewin (4), all other gene structures evolved out of the ancestor by
intron loss. Most nematodes still have this ancient gene structure.
However, the globin genes of nematodes (55) as well as the midge larvae
(56) display central or E-helix introns at positions different from the
plant central intron (Fig. 7), suggesting
that the possibility of independent intron insertion events cannot be
excluded.

Fig. 7.
Comparison of the intron insertion positions
of relevant globin genes. The helical position and phase of intron
insertion are indicated. , leader sequence; , intron;
, precoding intron; , exon.
|
|
No central intron is observed in the
- and
'-globin genes of
Anadara trapezia (57, 58), and this also applies to the A
and B globin genes of S. inaequivalvis (59) and the c-gene of the annelid Lumbricus terrestris (60). Therefore, if
there was a central intron in the ancestral globin gene of annelids and
molluscs, it might have subsequently been lost before the divergence of
these phyla.
There are many eukaryotic genes encoding secretory proteins that
have an additional intron (precoding intron) near the junction of the
DNA encoding the signal sequence with that of the mature protein (61).
This intron is very likely of recent origin and has been captured together with
the upstream sequences, namely the leader sequence. The extracellular
two-domain globins of the nematodes Pseudoterranova and
Ascaris display an intron at position A4.1, which can be
considered as precoding (62, 63). This is not the case for the A3.2
intron in B. glabrata Mb, since there is no leader sequence.
It must therefore be considered as a newly inserted intron. It is
striking that precoding or A-helix introns are only observed in genes
encoding for didomain Hbs (with or without leader sequences), the
B. glabrata Mb gene being an exception. A similar precoding
intron is localized two bases before the start codon of the two-domain
globin of the mollusc Barbatia reeveana Hb (Fig. 7). The
derived bridge intron that separates the DNA sequences encoding the two
domains can be considered as "precoding" for the second domain
(64).
Several mechanisms have been proposed for the duplication of the gene.
In Pseudoterranova for example, it appears that the duplication has resulted in a direct head to tail arrangement with the
original genomic copy, which may be due to an unequal cross-over
involving the coding sequence of the last exon of the first repeat
(62). It can also be that the duplication of the gene occurred by
recombination events involving genomic DNA or that it occurred during a
mispaired gene conversion event, which is capable of producing genes
that are a fusion of two original genes. In all cases, it is not clear
whether the precoding and bridge intron were already present before the
duplication. The genes encoding the two nine-domain globin chains
of Artemia do not contain precoding or bridge introns. The
domains are ancestrally related and are presumed to be derived from
copies of an original single domain gene (12).
However, in Barbatia it has been accepted that a cross-over
has occurred between the precoding intron and the 3' noncoding region,
as indicated by sequence similarity in these regions (64).
The B. glabrata Mb gene, which contains an unprecedented
A-helix intron at position A3.2, encodes a monodomain globin.
However, B. glabrata not only expresses a Mb but also a
polymeric Hb (18, 65).
A comparison of the sequence of the A-helix intron, which can be
considered as precoding, with that of the 3'-noncoding region shows
regions of similarity (data not shown). Thus, the multidomain Hb of
B. glabrata may have evolved from the duplication of the ancestral Mb gene involving the A-helix intron. The determination of
the structure of the gene encoding for the polymeric Hb of B. glabrata and comparison with the B. glabrata Mb gene
may elucidate these polymerization events.
To test the similarity between B. glabrata Mb and Hb as well
as to preliminarily classify the number of Mb-encoding genes, a genomic
Southern blotting experiment was done as described under "Experimental Procedures." The hybridization was carried out at low
and high stringency, and in both genomic DNA digestions
(PstI and HindIII) only a single fragment ranging
in length from ~8000 to 12,000 base pairs was detected (data
not shown). This result indicates that the Mb must exist in the
B. glabrata genome as a single gene and that it lacks marked
similarity with the Hb gene(s).
Recently, it has been suggested that the DNA sequence coding for the
central exon of S. inaequivalvis tetrameric Hb is part of an
open reading frame displaying all features of a functional gene in the
flanking intron sequences (66). The existence of this putative minigene
is used as an argument for the "exon theory of genes" (67). If the
central exon and its flanking introns are considered as a putative
minigene, then it should be possible to trace it back in other
nonvertebrate globins as well. Inspection of B. glabrata Mb
as well as other available globin genes clearly demonstrates that
several start and stop codons, in frame with the functional heme
binding domain, could be found in the same gene. This suggests that
they occur at random and therefore cannot be used as a proof for the
existence of "minigenes." Moreover, there is little selective
pressure to keep the intron sequences conserved, and it is therefore
unlikely that the start and stop codons of putative ancestral minigenes
would be conserved until now (68).
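The argument that in-frame start and stop codons arise by chance can be illustrated with a simple open reading frame scan. The sketch below searches the three forward frames of a sequence for ATG...stop stretches; the scanning function and the randomly generated sequence are hypothetical illustrations, not the procedure or data used in this study.

```python
import random

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=1):
    """Return sorted (start, end) indices of ATG...stop stretches in the three forward frames.

    `end` is exclusive and includes the stop codon; `min_codons` counts
    codons from the ATG up to, but not including, the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


# A random 3000-base sequence with equal base frequencies (assumption),
# roughly the size of the intron sequence discussed in the text.
rng = random.Random(0)
random_dna = "".join(rng.choice("ACGT") for _ in range(3000))
print(len(find_orfs(random_dna, min_codons=10)))
```

Even a sequence with no biological origin typically yields a number of short open reading frames, which is the point made above about the occurrence of start and stop codons at random.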
Molecular Phylogeny of the Molluscan Phyla Based on Globin
Sequences--
A neighbor-joining tree constructed on the basis of 41 metazoan globin sequences is shown in Fig.
8. The platyhelminth Paramphistomum epiclitum is used as an outgroup. Considering for the
interpretation of the tree only branching points with bootstrap values
higher than 70%, seven molluscan groups can be distinguished: the
Polyplacophora, Opisthobranchia, Pulmonata, Prosobranchia,
Pteriomorphia, and two heterodont groups. Surprisingly, the monophyly
of some very well established groups, such as the Heterodonta and
Gastropoda, cannot be retrieved. Also, molluscan monophyly is not
advocated by the tree topology, but there is no bootstrap support for
refuting it. None of the deeper branching points in the tree shows high bootstrap values. The maximum parsimony topology (not shown) on the
basis of the same globin data set confirms all of these findings.

Fig. 8.
Neighbor-joining tree of 28 molluscan and
other metazoan globin sequences. The platyhelminth P. epiclitum was used as an outgroup. Bootstrap values are indicated
above the nodes.
|
|
To assess the value of globin sequences for phylogeny inference, the
results of Fig. 8 were compared with a neighbor-joining tree
constructed with 19 complete metazoan SSU rRNA sequences (Fig.
9), selected as representatives of the
same higher groups (phyla, classes, or subclasses) as those included in
the globin-based tree (Fig. 8). The SSU rRNA tree supports the
monophyletic origin of the Pteriomorphia, but consistent with the
globin-based tree, it indicates molluscan polyphyly. However, as the
deeper branches have only low bootstrap values, the possibility of
Mollusca being monophyletic cannot be refuted either. In contrast to
Fig. 8, the SSU rRNA-based tree, however, does support gastropod and
heterodont monophyly and confirms the existence of a eutrochozoan
clade, including all molluscan representatives, in accordance with
previous morphological and molecular findings (16, 69, 70).
Monophyletic gastropod and heterodont clusters are also present in the
globin-based tree but have only very low bootstrap values. The results
of Fig. 9 are confirmed by the maximum parsimony tree (not shown) found on the basis of the same SSU rRNA data set.

Fig. 9. Neighbor-joining tree on the basis of 19 complete metazoan SSU rRNA sequences. The platyhelminth Opisthorchis viverrini was used as an outgroup. Bootstrap values are indicated above the nodes.

Both the SSU rRNA and the globin data set fail to confidently resolve
deeper branching points. This may reflect the rapid way in which the
metazoan higher groups radiated at the Precambrian-Cambrian boundary
(71-74). Information on the branching pattern of taxa that radiated
explosively will be recorded only by very variable, fast-evolving
sites. Yet, if the explosive radiation happened a long time
ago (in the case of Metazoa, 700 million years or more), these
informative sites will have undergone many subsequent changes,
obscuring the original information. This phenomenon has
previously been described for SSU rRNA-based trees (16, 75-77). Globin
sequences seem to be even more subject to this phenomenon. Indeed, some
well established phylogenetic relationships, which can be retrieved by
SSU rRNA sequences, cannot be confidently retrieved by globin data.
Apparently, globin sequences are, as previously concluded for SSU rRNA
sequences (16), better suited to trace more recent metazoan
divergences, such as the molluscan intraclass or even intrasubclass
relationships. Indeed, many branching points within the molluscan
clades (e.g. the pteriomorph and prosobranch clades) are
robustly supported by bootstrap analysis.
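The loss of signal at fast-evolving sites can be illustrated with a simple equal-rates substitution model (a Jukes-Cantor-type model generalized to 20 amino acid states). The model, the function names `observed_fraction` and `corrected_distance`, and the example distances are assumptions chosen for illustration; they are not necessarily the distance measure behind the trees discussed here. Under this model, the expected fraction of differing sites levels off toward 19/20 as the true number of substitutions per site grows, so very different divergence times become nearly indistinguishable.

```python
import math


def observed_fraction(d, states=20):
    """Expected fraction of differing sites after d substitutions per site.

    Assumes all states are equally frequent and all changes equally likely.
    """
    b = 1.0 - 1.0 / states  # saturation level (19/20 for amino acids)
    return b * (1.0 - math.exp(-d / b))


def corrected_distance(p, states=20):
    """Invert observed_fraction: estimate substitutions per site from p."""
    b = 1.0 - 1.0 / states
    if not 0.0 <= p < b:
        raise ValueError("p must lie in [0, saturation level)")
    return -b * math.log(1.0 - p / b)


# Observed differences grow almost linearly for recent divergences but
# flatten out for old ones, where multiple hits overwrite earlier changes.
for d in (0.1, 0.5, 1.0, 2.0, 5.0):
    print(f"d = {d:4.1f}  expected observed fraction = {observed_fraction(d):.3f}")
```

Because the curve flattens, a small sampling error in the observed fraction translates into a large error in the corrected distance for old divergences, which is one way to picture why deep nodes receive low bootstrap support.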
A problem with the use of globin sequences in phylogenetic analyses is
the fact that they constitute a multigene family, in which case it may
be very difficult to recognize orthologous genes necessary for species
phylogeny inference. This problem holds especially for nonvertebrate
metazoan globins, since they show a broad array of structures.
Globin-based trees are in fact a mixture of gene and species trees and
include information on the evolution of the species as well as the
gene. A comparison of Fig. 8 with the corresponding SSU rRNA tree (Fig.
9) shows that the information on the monophyly of some molluscan groups
is apparently superimposed on the gene phylogeny. Yet more caution is
needed when considering relationships at more
restricted taxonomic levels. For example, within the pteriomorph
clade, species phylogeny is overshadowed by gene phylogeny, resulting in a clustering of gene types rather than of closely related congeneric species. It is not clear whether the extent to which orthologous and
paralogous genes are combined in our analysis can be responsible for
the discrepancies between the globin and the SSU rRNA tree.
In conclusion, at this time it seems that globin sequences, like SSU
rRNA sequences, contain insufficient information to resolve the
metazoan radiation pattern or to recover molluscan monophyly but that
they seem to yield confident results at more restricted taxonomic
levels. Apparently, in resolving deeper branching points, globin
sequences perform worse than SSU rRNA sequences. Moreover, in contrast
to SSU rRNA sequences, the use of globin sequences in unraveling
species phylogeny suffers from the facts that it is nearly impossible
to identify which globin genes are orthologous and that gene and
species trees are always superimposed on each other. All of these
conclusions are only preliminary, since they are based on a globin data
set in which several metazoan taxa are absent and others are still
poorly sampled. However, the globin tree presented in Fig. 8 does raise
serious doubts about the monophyly of the Mollusca.
We thank Prof. Roney Elias da Silva
(GIDE-UFMG-Brasil) for instruction on the dissection of the radula. The
anonymous referees are acknowledged for valuable suggestions.
The nucleotide sequence reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number U89283.