(Received for publication, November 20, 1995; and in revised form, February 21, 1996)
The architecturally complex secretory granules of Paramecium, known as trichocysts, have two unusual and seemingly contradictory features: their protein contents have crystalline organization (Sperling, L., Tardieu, A., and Gulik-Krzywicki, T. (1987) J. Cell Biol. 105, 1649-1662), yet these proteins are a heterogeneous set of molecules encoded by a large multigene family (Madeddu, L., Gautier, M.-C., Vayssié, L., Houari, A., and Sperling, L. (1995) Mol. Biol. Cell 6, 649-659). We present here the first complete sequences of three genes coding for three different precursors of the trichocyst crystalline matrix proteins. The deduced protein sequences indicate that each precursor gives rise to two of the mature polypeptides found in the crystalline trichocyst matrix. Analysis of putative processing sites suggests that a series of reactions, some of which may involve a novel endopeptidase, is involved in their proteolytic maturation. Each of the 6 mature polypeptides contains heptad segments. Characterization of the heptad segments leads us to propose that the mature polypeptides that compose the crystalline trichocyst matrix, despite their different primary structures, all share a unique protein fold, probably a 4-α-helical antiparallel bundle.
Regulated secretion provides a means of communication between cells by allowing storage of biologically active molecules in specialized organelles for subsequent release in response to extracellular stimuli. Regulated secretion is restricted to a few differentiated cell types in multicellular organisms (e.g. neurons, endocrine, and exocrine cells) and is absent from most microorganisms such as yeast. However, Paramecium, like other ciliates(1) , does possess regulated secretion, which seems to be involved in defense against certain predators(2) . Each Paramecium bears around 1000 secretory granules, known as trichocysts, docked at specialized exocytotic sites in the plasma membrane, ready for rapid secretion in response to an appropriate stimulus (see (3) , for review). Genetic studies of secretory mutants have identified more than 20 genes involved in different steps of this secretory pathway: organelle biogenesis, transport and docking at the plasma membrane, and exocytosis(3, 4, 5, 6) . Our interest lies in using the Paramecium model to study the biogenesis of secretory granules, which involves the coordination of membrane traffic events with cargo protein processing, sorting, and condensation (see (7) , for review). Given their architectural complexity, Paramecium trichocysts also present an intriguing problem of morphogenesis and molecular design.
Trichocysts are 3-4 µm in size and consist of a carrot-shaped body surmounted by a tip by which they attach to the cortical docking sites (Fig. 1a). This shape, which is genetically determined (6), is a property of the protein contents of the granules, which have crystalline organization (Fig. 1b). Indeed, structural studies have shown that the trichocyst contents are a true protein crystal with periodicities in all three dimensions at low resolution (30 Å) (8). Upon exocytosis, as soon as the crystalline trichocyst matrix comes into contact with the Ca²⁺ and H₂O of the external medium (i.e. within a few milliseconds), it elongates by a factor of 8, much like a spring suddenly released from a confined space (Fig. 1c). This dramatic and irreversible structural transition propels the trichocyst matrix out of the cell. The extracellular needle-shaped form, which is also an ordered array (Fig. 1d), remains insoluble (8).
Figure 1: Trichocyst shape and crystalline organization. The panels on the left show phase-contrast light microscope images of the carrot-shaped intracellular (a) and needle-shaped extracellular (c) trichocysts. Note that despite variation in size, all the trichocysts have the same shape. The bar represents 5 µm. The corresponding panels on the right show freeze-fracture electron microscope images (see (8) ) of unfixed trichocysts of either the compact intracellular (b) or the extended extracellular (d) form and illustrate their crystalline organization. The same bar represents 500 Å. The electron micrographs are courtesy of T. Gulik-Krzywicki.
The proteins that assemble into the crystalline trichocyst matrix (trichocyst matrix proteins, TMPs) are synthesized as 40-45 kDa precursor molecules that are converted to 15-20 kDa polypeptides by proteolytic maturation (9, 10). Only the mature polypeptides are able to crystallize within the maturing vesicles, and studies of secretory mutants unable to produce functional trichocysts strongly suggest that the protein processing controls the crystallization process (9, 10, 11).
Perhaps the most unusual feature of the system is that the mature TMPs which compose the crystalline matrix are a heterogeneous set of immunologically related, small acidic polypeptides: at least 30 major and as many as 100 different spots are revealed by high resolution two-dimensional gel electrophoresis of purified trichocysts (12, 13). Moreover, this heterogeneity appears to be situated entirely at the level of primary structure. Analysis of PCR-generated gene fragments, corresponding to N-terminal microsequences obtained for several mature TMP polypeptides, showed that TMP heterogeneity is the result of expression of a large multigene family (14). The greatest challenge in understanding trichocyst design is therefore to relate the disorder at the molecular level (Å), owing to the complex mixture of polypeptides, to the emergence of periodic order at the electron microscope level (nm) and of shape at the light microscope level (µm).
We set out to clone complete TMP genes in order to gain insight into trichocyst design as well as to obtain information necessary for further study of the post-translational processing that controls matrix assembly. We present the first complete TMP sequences, corresponding to three different precursor proteins. Analysis of the deduced amino acid sequences shows that each precursor gives rise to two mature matrix polypeptides, and that several different enzymatic reactions are likely to be involved in their processing. The parts of the precursors corresponding to the mature polypeptides contain heptad repeats. Characterization of the heptad segments allows us to identify a probable unique protein fold for the mature polypeptides and to propose a model of their arrangement in the precursor molecules.
The library was screened using T1, T2, and T4 subfamily-specific ³²P-labeled probes (290, 400, and 287 base pairs, respectively; Fig. 2), generated by inverse or direct PCR as described previously (14). Selected clones were isolated according to standard techniques (17). Inserts containing the entire coding regions of the T1-b, T2-c, and T4-a genes were identified by restriction digestion and Southern blot analysis using as ³²P-labeled probes the subfamily-specific, PCR-generated DNA fragments used for library screening as well as gene-specific oligonucleotides (given in Fig. 2, legend). For T1, a second, larger (477 base pairs) subfamily-specific probe was also used; the probe was generated with the same oligonucleotide primers by inverse PCR amplification as described (14), using as template RsaI-digested and religated genomic DNA (RsaI circles). HindIII (T1-b; 1.5 and 1.8 kb), EcoRI (T2-c; 5 kb), and XbaI (T4-a; 2.3 kb) phage restriction fragments were subcloned into the appropriate restriction sites of the pUC18 plasmid. DNA sequences were determined on both strands according to the dideoxy nucleotide chain termination method with sequence-specific oligonucleotide primers using a T7 sequencing kit (Pharmacia, Uppsala, Sweden).
Figure 2: TMP gene cloning. Restriction maps of the clones selected for subcloning and of the pUC18 subclones used for sequencing are shown. The thick dotted lines under the maps represent the different probes used to screen the bacteriophage library and/or to select regions for subcloning. The boxes on the pUC18 subclones pmgh4, pmgh9, pm2431, and pmgx5 represent the regions that were sequenced, with the coding sequences shaded. T1, probes 1 and 2, and T4, probe 1, are described under "Experimental Procedures." The sequences of oligonucleotides used as probes are as follows: T2, probe 1: 5'-GTCRGTCAAGGTTTAGAG-3' (antisense); T2, probe 2: 5'-TGGAAGACAGATATGTTG-3' (sense); T4, probe 2: 5'-GTATTTTCTATTTACTATTAATAACTAG-3' (sense).
Two-dimensional gel electrophoresis was carried out as described previously(14) . Isoelectric focusing was performed in 0.8% pH 3-10 and 1.2% pH 4-6.5 ampholines (Pharmacia) in the presence of a chemical spacer (50 mM MOPS) as described by Tindall(12) . The SDS-polyacrylamide gel electrophoresis second dimension analysis was performed according to Laemmli(19) , on 13% acrylamide gels.
Figure 3: TMP gene sequences. Complete nucleotide sequences of the genes (a) T1-b, (b) T2-c, and (c) T4-a are shown along with the deduced amino acid sequences. Coding nucleotides are in upper case and noncoding nucleotides, including the introns that interrupt each sequence, are in lower case. Note that in Paramecium, TAA and TAG code for glutamine. GenBank accession numbers for the T1-b, T2-c, and T4-a nucleotide sequences are U47115, U47116, and U47117, respectively.
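The codon usage noted in the legend matters when re-deriving the protein sequences from the deposited nucleotide sequences. The following is a minimal sketch, not the procedure used in this work: it builds the standard codon table, reassigns TAA and TAG to glutamine as stated in the legend, and keeps TGA as the only stop codon. The function names `coding_only` and `translate_paramecium`, and the short test sequences, are illustrative assumptions; `coding_only` relies on the upper case/lower case convention of Fig. 3.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order, with TAA and TAG reassigned to Gln (Q).
# TGA is the only remaining stop codon ('*').
AMINO_ACIDS = (
    "FFLLSSSSYYQQCC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def coding_only(annotated: str) -> str:
    """Keep upper-case (coding) nucleotides; drop lower-case introns and flanks."""
    return "".join(ch for ch in annotated if ch.isupper())


def translate_paramecium(cds: str) -> str:
    """Translate a coding sequence, stopping at the first TGA codon."""
    cds = cds.upper()
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)
```

With this table, an in-frame TAA or TAG is read as glutamine rather than as a stop, so a sequence that looks prematurely terminated under the standard code translates through to the TGA.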
Initial evaluation of heptad segments relied on the COILS program(27) ; further characterization was by visual inspection. The alignment of the three proteins was used to identify positions with apolar residues (Leu, Ile, Phe, Val, Met, Tyr, and Ala) in all sequences, which defined likely heptad segments (repetitions of heptads of the form abcdefg with apolar residues at a and d)(28) .
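The visual inspection described above can be approximated by a simple count. The sketch below is an illustration only and is not the COILS algorithm: for a candidate segment it reports the fraction of a and d positions occupied by the apolar residues listed in the text, and it tries all seven possible registers. The function names and the example peptide are hypothetical.

```python
APOLAR = set("LIFVMYA")  # Leu, Ile, Phe, Val, Met, Tyr, Ala, as listed in the text
HEPTAD = "abcdefg"


def apolar_fraction_at_a_d(segment: str, first_position: str = "a") -> float:
    """Fraction of a and d positions holding an apolar residue.

    first_position is the heptad position assigned to the first residue.
    """
    offset = HEPTAD.index(first_position)
    core = [
        residue
        for i, residue in enumerate(segment.upper())
        if HEPTAD[(i + offset) % 7] in "ad"
    ]
    if not core:
        return 0.0
    return sum(residue in APOLAR for residue in core) / len(core)


def best_register(segment: str) -> tuple:
    """Return (first_position, fraction) for the register with the highest score."""
    scores = {p: apolar_fraction_at_a_d(segment, p) for p in HEPTAD}
    best = max(scores, key=scores.get)
    return best, scores[best]
```

For an idealized two-heptad peptide such as `LKELEEKLKELEEK`, the best register places the first residue at position a, with every a and d position apolar. Real segments would be scored in the register fixed by the three-sequence alignment rather than chosen independently for each sequence.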
Homology searches of the protein sequence data bases (combined non-redundant GENBANK/EMBL/SWISSPROT) and of the SBASE protein domain data base (29) relied on the FASTA (30) and BLAST (31) algorithms; statistical significance of the results was evaluated using the RDF2 program(32) .
In order to clone one member of each of the three subfamilies under investigation (named T1, T2, and T4 after the original microsequences), we used subfamily-specific probes to screen a bacteriophage library of P. tetraurelia genomic DNA. Fourteen positive clones were isolated and characterized for T1, 6 for T2, and 20 for T4. Maps of clones selected for subcloning and pUC18 subclones, the probes used, and the regions sequenced are presented in Fig. 2. Restriction and Southern blot analysis of the T1 clones identified 4 clones with overlapping inserts corresponding to the same genomic region. The coding region of the gene was contained within 2 adjacent HindIII fragments of 1.5 and 1.8 kb, which were subcloned in the pUC18 plasmid vector for sequencing. Southern blot analysis of the T2 clones using oligonucleotide probes specific for the T2-c gene (14) allowed us to identify a single clone. A 5-kb EcoRI fragment containing the coding region was subcloned in the pUC18 vector for sequencing. Restriction and Southern blot analysis of the T4 clones revealed 7 overlapping clones. The T4-a coding region was contained within an XbaI fragment of 2.3 kb, which was subcloned in the pUC18 vector for sequencing.
The sequences of the three genes (which correspond to the T1-b, T2-c, and T4-a gene fragments reported in (14) ) are shown in Fig. 3. The DNA sequences of T1-b, T2-c, and T4-a reveal a single reading frame of 1224, 1192, and 1143 nucleotides, respectively, interrupted by 4, 1, and 2 introns of 23 to 29 base pairs (the standard size in Paramecium) whose existence has been confirmed by reverse transcriptase-PCR experiments ((14) , and results not shown). The encoded proteins have lengths of 369, 387, and 363 amino acids, compatible with the 40-45 kDa size of the trichocyst precursor proteins(9, 10) .
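The figures quoted above can be cross-checked with a little arithmetic. The sketch below assumes, as an inference not stated explicitly in the text, that each reading-frame length runs from the initiating ATG through the TGA stop codon and includes the introns. Under that assumption the nucleotides left over after subtracting the coding triplets should equal the total intron length, and the mean intron size should fall in the 23 to 29 base pair range. The dictionary layout and function name are illustrative.

```python
# Values taken from the text: frame length (nt), intron count, protein length (aa).
GENES = {
    "T1-b": {"frame_nt": 1224, "introns": 4, "protein_aa": 369},
    "T2-c": {"frame_nt": 1192, "introns": 1, "protein_aa": 387},
    "T4-a": {"frame_nt": 1143, "introns": 2, "protein_aa": 363},
}


def intron_total(frame_nt: int, protein_aa: int, count_stop: bool = True) -> int:
    """Nucleotides not accounted for by codons (assumed to be intron sequence)."""
    coding_nt = 3 * protein_aa + (3 if count_stop else 0)
    return frame_nt - coding_nt


def mean_intron_sizes(genes: dict) -> dict:
    """Mean intron length per gene under the assumption described above."""
    return {
        name: intron_total(g["frame_nt"], g["protein_aa"]) / g["introns"]
        for name, g in genes.items()
    }


if __name__ == "__main__":
    for name, mean in mean_intron_sizes(GENES).items():
        print(f"{name}: mean intron length {mean:.1f} bp")
```

With the stop codon counted, the mean intron lengths come out at 28.5, 28, and 25.5 base pairs for T1-b, T2-c, and T4-a, which is consistent with the stated range.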
For one of the genes, T4-a, the transcription start site was determined by primer extension (not shown). As for other characterized Paramecium genes(36, 37, 38, 39) , the untranslated leader sequence is very short: transcription begins only 13 nucleotides upstream of the ATG codon that initiates translation (ATTAATAAAAAAAATG). The T1-b and T4-a genes contain a consensus polyadenylation signal (AATAAA), found in some but not all ciliate genes(40) , located some 85 nucleotides downstream of the TGA stop codon (Fig. 2). The T2-c gene does not contain this sequence. The predicted sizes of the mRNA molecules are consistent with the 1.4 kb measured by Northern blot experiments(14) .
The minor discrepancies between the composition determined experimentally and that calculated for T1-b, T2-c, and T4-a may arise either from the fact that the complete protein sequences, not just the regions corresponding to the mature polypeptides (which account for approximately 75% of the amino acids; see below), were used for each calculation or from the fact that the three genes we have cloned constitute a random sampling of the 30 or more mature polypeptides present in the trichocyst matrix. Nonetheless the similarities outweigh the differences and speak for a close family resemblance among the proteins that constitute the trichocyst matrix.
Figure 4: Alignment of deduced TMP protein sequences. The T1-b, T2-c, and T4-a deduced protein sequences were aligned using the Clustal program. Identical amino acids are shown on a black background and similar amino acids on a gray background.
Figure 5: Organization of TMP precursor molecules. The same sequence alignment as in Fig. 4 is shown. Above the sequences, asterisks (*) denote positions with identical amino acids in all sequences and dots (.), positions with similar amino acids in all sequences. The N-terminal microsequences determined for mature polypeptides are shown in white letters on a black background (cf. (35) for a review of the microsequencing). Putative signal peptides are darkly shaded. The pro-regions separating the signal peptides from the first mature polypeptides are lightly shaded. The basic regions postulated to separate the two mature polypeptides of each precursor are also shaded, and basic residues within the region are in bold-face type. The structural motifs of the heptad segments are underlined and the heptads are given in italics below the sequences. Apolar residues in the a and d positions of the heptads are in bold-face type. Each segment has been labeled "helix A, B, C, or D" in accordance with the proposition that each group of heptad segments forms an α-helical bundle. We note that the D2 helices for T2 and T4 are less satisfactory than the other helices. They can be improved if we remove the constraint of perfect alignment of the heptads among the three sequences and situate them closer to the C terminus. The 2 cysteine residues in the T2-c sequence are in bold-face type.
To gain support for the organization of the precursors suggested by the sequence data, we performed pulse-chase experiments. We had previously investigated TMP proteolytic maturation by pulse-chase experiments, using polyclonal antibodies raised against the entire set of mature polypeptides, which allowed us to obtain a global picture of the processing. The immunoprecipitated polypeptides were analyzed on one-dimensional SDS-polyacrylamide gel electrophoresis gels which revealed the conversion of a family of 40-45-kDa precursors to a family of 15-20-kDa products(9) . Since antibodies that recognize individual TMPs are not yet available, we performed pulse-chase experiments using high resolution two-dimensional gels to analyze the proteins immunoprecipitated by the polyclonal antibodies in order to detect changes in isoelectric point as well as more subtle changes in molecular mass. A typical experiment is shown in Fig. 6.
Figure 6: Two-dimensional gel analysis of a pulse-chase experiment. Wild type cells were labeled then collected and lysed at the chase times indicated. TMP proteins were immunoprecipitated from the lysates with polyclonal antibodies raised against the complete set of mature TMP polypeptides and analyzed on two-dimensional gels. The basic end of the first dimension isoelectric focusing is on the left(-) and the acidic end (+) is on the right. The previously determined pH range for the isoelectric points of the mature (15-20 kDa) polypeptides is 5.5 to 4.7(12) . The different sets of spots discussed in the text are annotated on the 30 min gel as follows: 1, precursors; 2, weak intermediates; 3, basic mature polypeptides; 4, acidic mature polypeptides.
First of all, the precursor molecules are on average more basic than the mature polypeptides, and many of the mature polypeptides are more acidic than any of the precursors, consistent with removal of the basic region as postulated above. Second, at 20 min of chase the more basic mature polypeptides are present, and their pattern changes little between 20 and 40 min. In contrast, the more acidic spots are barely visible at 20 min, and the acidic half of the pattern of mature polypeptides evolves between 20 and 40 min of chase (but not thereafter; data not shown). Finally, the gels present a few weak spots of intermediate size, which progressively disappear as the more acidic low molecular mass polypeptides appear. These data support the idea that a temporally ordered series of reactions are involved in the conversion of the precursors to the mature polypeptides, and clearly demonstrate net loss of basic residues.
For each of the three proteins, the COILS program (27), an implementation of the algorithm of Parry (44), gave scores (>1.3 with a window of 28) consistent with ability to form coiled-coils over most or all of the regions that correspond to the mature polypeptides, i.e. the unshaded portions of Fig. 5; neither the pro nor the basic regions were predicted to form coiled-coils. The heptad, however, is a motif typical not only of long rod-shaped α-fibrous proteins, but also of globular proteins containing bundles of α-helices. In the case of the TMPs, the sequence data are more consistent with globular proteins than fibrous ones. The charged to apolar residue ratios for the complete T1-b, T2-c, and T4-a sequences are 0.55, 0.73, and 0.74, respectively; 2-stranded α-fibrous proteins, for example, have ratios greater than 1.0 (45).
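A charged to apolar ratio of this kind is straightforward to compute from a sequence. The sketch below is illustrative and rests on two assumptions that are not taken from the text: charged residues are Asp, Glu, Lys, and Arg (His excluded), and apolar residues are the same seven used to define the heptad segments. The definition used in (45) may differ, so the values quoted above should not be expected to reproduce exactly.

```python
CHARGED = set("DEKR")    # assumption: Asp, Glu, Lys, Arg; His is not counted
APOLAR = set("LIFVMYA")  # assumption: same apolar set as used for the heptads


def charged_to_apolar_ratio(sequence: str) -> float:
    """Number of charged residues divided by number of apolar residues."""
    sequence = sequence.upper()
    charged = sum(residue in CHARGED for residue in sequence)
    apolar = sum(residue in APOLAR for residue in sequence)
    if apolar == 0:
        raise ValueError("sequence contains no apolar residues")
    return charged / apolar


def looks_fibrous(sequence: str, threshold: float = 1.0) -> bool:
    """True if the ratio exceeds the threshold quoted for 2-stranded fibrous proteins."""
    return charged_to_apolar_ratio(sequence) > threshold
```

Applied to a complete precursor sequence, a ratio well below 1.0 would point to a globular rather than a fibrous protein, which is the argument made in the text.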
Within globular proteins, helices characterized by heptad repeats tend to pack in anti-parallel bundles rather than forming extended rod-like domains (46). These heptad segments are shorter than those found in 2- and 3-stranded coiled-coils, and are generally not well defined by the COILS program (47). We therefore took advantage of the homology of the three TMP sequences to look for heptad repeats whose positions are conserved with respect to the sequence alignment.
As shown in Fig. 5, each of the mature polypeptides contains 4 short segments of roughly 3-4 heptads, which have been labeled A, B, C, and D. In some of the sequences, the heptad segments are separated or flanked by proline residues, consistent with β-turns: before helix A1, between helices B1 and C1, and after helix D1 and helix D2. Charged residues in positions e and g, which stabilize parallel coiled-coils in α-fibrous proteins, are not notably present: this is consistent with a bundle of α-helices in a globular protein. Finally and most important, the percentage of apolar residues in positions a and d of the heptad segments is high. These features taken together argue strongly that the major portion of each mature polypeptide is a 4-α-helical bundle (Fig. 7a).
Figure 7: Proposed folding motif for TMPs. a, schematic drawing of a 4-α-helical antiparallel bundle (modified from (46)). The bundles have a left-handed tilt necessary to optimize coiled-coil packing of the helices. The chain connectivity has arbitrarily been drawn as right-handed. This is the basic protein fold proposed for each mature TMP polypeptide. Labeling of the helices corresponds to that in Fig. 5. b, possible arrangement of two helical bundles in the disulfide-bonded T2 precursor molecule. c, end-on view of the same arrangement showing that the connectivity is the same for each bundle and for the precursor molecule as a whole. Thick lines, top; thin and dotted lines, bottom.
We have presented the results of cloning and sequence analysis of three genes coding for three different Paramecium secretory granule precursor proteins. Alignment of the three protein sequences, which share only about 25% amino acid identity, indicates common organization of the precursor molecules, each of which gives rise to 2 mature polypeptides of the crystalline trichocyst matrix. The organization deduced from the sequence data is supported by two-dimensional pulse-chase experiments. Analysis of the aligned sequences, characterized by heptad repeats, provides a picture of TMP structure: the basic fold of all the mature polypeptides is very likely a 4-α-helical bundle motif.
Other cleavage sites in these molecules are clearly not the targets of subtilisin-like processing enzymes. The junction pro/first mature polypeptide has a consensus TG-G/D. The sequence N-terminal to the second mature polypeptide (VEAN-F for T1-b) is different but still not a target for a (di)basic processing enzyme. Given the dearth of knowledge of protozoan processing enzymes, we cannot exclude involvement of a novel endopeptidase in TMP maturation. The high α-helical content of TMPs may provide a clue. It has been shown that the magainin peptides of Xenopus skin, which are toxic to many microorganisms because of their pore-forming capacity, are processed by a metalloendopeptidase that recognizes α-helical secondary structure (50). Since it is probable that all TMP precursor proteins have similar three-dimensional structures despite different amino acid sequences, an endopeptidase designed for structural rather than sequence specificity might be an efficient and economical adaptation to the problem of trichocyst biogenesis.
The folding pattern we propose for TMPs is based on similar identification of the periodic disposition of apolar residues in the sequences, consistent with coiled-coil packing of α-helices (46). Two arguments add confidence to our assignment of a 4-α-helical bundle as the basic folding pattern for each of the mature TMP polypeptides. First, the apolar residue periodicity is manifest in the alignment of the three protein sequences and indeed accounts for much of the similarity shared by the three sequences; second, the six different mature polypeptides all contain similar arrangements of heptad segments. Although a particular example of such a bundle has been drawn in Fig. 7 for the sake of illustration, in the absence of structural data we are unable to specify details of the fold, for example, the handedness of the chain connectivity.
An example of a large multigene family coding for proteins that share little sequence identity but all fold into the same structure is provided by the variant surface antigens of the African trypanosome, the agent of sleeping sickness. A repertoire of some 1000 genes codes for the surface proteins of the parasite. Although only one gene is expressed at a time, expression can switch to a different antigen to escape the host's immune response. X-ray crystallographic structure determination has revealed that several antigens, with quite different primary structures, have nearly identical tertiary structures, suggesting that all of the variant surface antigens, representing some 1000 different sequences, correspond to a unique protein fold (54, 55, 56) .
In the trypanosome example, sequence variation of the surface antigen genes would have evolved to fool the host's immune system. In our example, in which as many as 30 different sequences may share a common protein fold, the selective pressure for sequence diversification may be related to the constrained trichocyst shape which, as genetic analysis has shown, is necessary for successful exocytosis(5, 6, 9, 11) . Crystallization of the trichocyst matrix from a mixture of proteins with the same structure but slightly different chemical properties might allow formation of a gradient of crystallization within the maturing granules (which would give the carrot shape), much as pH gradients can be formed from mixtures of carrier ampholytes with slightly different pI values(57) . We should be able to test this idea, using specific antibodies and/or epitope tagged transgenes.
We know that T1 and T4 belong to the group of heat soluble proteins since N-terminal sequences of mature T1 and T4 polypeptides were determined after purification from the pool of heat soluble proteins (34) . This is consistent with the absence of cysteine residues in the deduced T1 and T4 protein sequences.
T2 belongs to the class of disulfide-bonded dimers(13, 59) , and the precursor contains exactly 2 cysteine residues. One of the residues is situated between the helices A1 and B1 of the first mature polypeptide while the other is situated between the helices C2 and D2 of the second mature polypeptide. The unique disulfide bond that can be formed in the precursor molecule would join the mature polypeptides that emanate from T2, yielding, as expected(59) , a disulfide bonded heterodimer.
The presence of a disulfide bond in T2 imposes a constraint on the way in which the two α-helical bundle motifs can be arranged in the precursor molecule. The loops connecting helices A1 and B1 of the first mature polypeptide and helices C2 and D2 of the second mature polypeptide are pinned together by the disulfide bond. Another constraint comes from polypeptide chain continuity: the basic region connects helix D1 of the first bundle with helix A2 of the second bundle. An arrangement which accommodates both constraints is presented in Fig. 7, b and c. We propose that in the precursor molecules, the two helical bundles face each other, related by a pseudo 2-fold symmetry axis.
Although an experimental structure determination is of course necessary to test the model, we consider it likely that the same arrangement of the helical bundles found in T2 will also hold for the T1 and T4 precursors, despite the absence of the disulfide bond. It is tempting to suggest that this arrangement, determined by the initial folding of the precursor polypeptide chain, remains after protein processing and is a feature of TMP packing in the crystalline trichocyst matrix. However, the arrangement might be metastable, once the pro and basic regions of the precursor were removed. Upon exocytosis, the H₂O and Ca²⁺ of the external medium would trigger the rearrangement of the helical bundles into a thermodynamically more stable array, accounting for the irreversible transition to the needle-shaped extracellular form.
The nucleotide sequences reported in this paper have been submitted to the GenBank/EMBL Data Bank with accession numbers U47115, U47116, and U47117.