©1996 by The American Society for Biochemistry and Molecular Biology, Inc.
Cloning and Sequence Analysis of Genes Coding for Paramecium Secretory Granule (Trichocyst) Proteins
A UNIQUE PROTEIN FOLD FOR A FAMILY OF POLYPEPTIDES WITH DIFFERENT PRIMARY STRUCTURES (*)

(Received for publication, November 20, 1995; and in revised form, February 21, 1996)

Marie-Christine Gautier (§), Linda Sperling, and Luisa Madeddu (¶)

From the Centre de Génétique Moléculaire, Associated with the Université Pierre et Marie Curie, Centre National de la Recherche Scientifique, 91198 Gif-sur-Yvette, France

ABSTRACT

The architecturally complex secretory granules of Paramecium, known as trichocysts, have two unusual and seemingly contradictory features: their protein contents have crystalline organization (Sperling, L., Tardieu, A., and Gulik-Krzywicki, T. (1987) J. Cell Biol. 105, 1649-1662), yet these proteins are a heterogeneous set of molecules encoded by a large multigene family (Madeddu, L., Gautier, M.-C., Vayssié, L., Houari, A., and Sperling, L. (1995) Mol. Biol. Cell 6, 649-659). We present here the first complete sequences of three genes coding for three different precursors of the trichocyst crystalline matrix proteins. The deduced protein sequences indicate that each precursor gives rise to two of the mature polypeptides found in the crystalline trichocyst matrix. Analysis of putative processing sites suggests that their proteolytic maturation proceeds through a series of reactions, some of which may involve a novel endopeptidase. Each of the six mature polypeptides contains heptad segments. Characterization of the heptad segments leads us to propose that the mature polypeptides that compose the crystalline trichocyst matrix, despite their different primary structures, all share a unique protein fold, probably a 4 alpha-helical antiparallel bundle.


INTRODUCTION

Regulated secretion provides a means of communication between cells by allowing storage of biologically active molecules in specialized organelles for subsequent release in response to extracellular stimuli. Regulated secretion is restricted to a few differentiated cell types in multicellular organisms (e.g. neurons, endocrine, and exocrine cells) and is absent from most microorganisms such as yeast. However, Paramecium, like other ciliates(1) , does possess regulated secretion, which seems to be involved in defense against certain predators(2) . Each Paramecium bears around 1000 secretory granules, known as trichocysts, docked at specialized exocytotic sites in the plasma membrane, ready for rapid secretion in response to an appropriate stimulus (see (3) , for review). Genetic studies of secretory mutants have identified more than 20 genes involved in different steps of this secretory pathway: organelle biogenesis, transport and docking at the plasma membrane, and exocytosis(3, 4, 5, 6) . Our interest lies in using the Paramecium model to study the biogenesis of secretory granules, which involves the coordination of membrane traffic events with cargo protein processing, sorting, and condensation (see (7) , for review). Given their architectural complexity, Paramecium trichocysts also present an intriguing problem of morphogenesis and molecular design.

Trichocysts are 3-4 µm in size and consist of a carrot-shaped body surmounted by a tip by which they attach to the cortical docking sites (Fig. 1a). This shape, which is genetically determined (6), is a property of the protein contents of the granules, which have crystalline organization (Fig. 1b). Indeed, structural studies have shown that the trichocyst contents form a true protein crystal with periodicities in all three dimensions at low resolution (30 Å) (8). Upon exocytosis, as soon as the crystalline trichocyst matrix comes into contact with the Ca²⁺ and H₂O of the external medium (i.e. within a few milliseconds), it elongates by a factor of 8, much like a spring suddenly released from a confined space (Fig. 1c). This dramatic and irreversible structural transition propels the trichocyst matrix out of the cell. The extracellular needle-shaped form, which is also an ordered array (Fig. 1d), remains insoluble (8).


Figure 1: Trichocyst shape and crystalline organization. The panels on the left show phase-contrast light microscope images of the carrot-shaped intracellular (a) and needle-shaped extracellular (c) trichocysts. Note that despite variation in size, all the trichocysts have the same shape. The bar represents 5 µm. The corresponding panels on the right show freeze-fracture electron microscope images (see (8) ) of unfixed trichocysts of either the compact intracellular (b) or the extended extracellular (d) form and illustrate their crystalline organization. The same bar represents 500 Å. The electron micrographs are courtesy of T. Gulik-Krzywicki.



The proteins that assemble into the crystalline trichocyst matrix (trichocyst matrix proteins, TMPs) (^1) are synthesized as 40-45 kDa precursor molecules that are converted to 15-20 kDa polypeptides by proteolytic maturation (9, 10). Only the mature polypeptides are able to crystallize within the maturing vesicles, and studies of secretory mutants unable to produce functional trichocysts strongly suggest that protein processing controls the crystallization process (9, 10, 11).

Perhaps the most unusual feature of the system is that the mature TMPs which compose the crystalline matrix are a heterogeneous set of immunologically related, small acidic polypeptides: at least 30 major and as many as 100 different spots are revealed by high resolution two-dimensional gel electrophoresis of purified trichocysts(12, 13) . Moreover, this heterogeneity appears to be situated entirely at the level of primary structure. Analysis of PCR-generated gene fragments, corresponding to N-terminal microsequences obtained for several mature TMP polypeptides, showed that TMP heterogeneity is the result of expression of a large multigene family(14) . The greatest challenge in understanding trichocyst design is therefore to relate the disorder at the molecular level (Å), owing to the complex mixture of polypeptides, to the emergence of periodic order at the electron microscope level (nm) and of shape at the light microscope level (µm).

We set out to clone complete TMP genes in order to gain insight into trichocyst design as well as to obtain information necessary for further study of the post-translational processing that controls matrix assembly. We present the first complete TMP sequences, corresponding to three different precursor proteins. Analysis of the deduced amino acid sequences shows that each precursor gives rise to two mature matrix polypeptides, and that several different enzymatic reactions are likely to be involved in their processing. The parts of the precursors corresponding to the mature polypeptides contain heptad repeats. Characterization of the heptad segments allows us to identify a probable unique protein fold for the mature polypeptides and to propose a model of their arrangement in the precursor molecules.


EXPERIMENTAL PROCEDURES

Cells and Culture Conditions

The wild type Paramecium cells used in all experiments were Paramecium tetraurelia strain d4-2 (15). Cells were grown at 27 °C in wheat grass powder (Pines International Co., Lawrence, KS), inoculated with Enterobacter aerogenes and supplemented with 0.4 µg/ml beta-sitosterol (16).

Isolation and Characterization of Genomic Clones

A library constructed in the BamHI site of EMBL3 with Sau3A partially digested Paramecium tetraurelia (strain d4-2) genomic DNA was kindly provided by Eric Meyer (Laboratoire de Génétique Moléculaire, Ecole Normale Supérieure, Paris).

The library was screened using T1, T2, and T4 subfamily-specific ³²P-labeled probes (290, 400, and 287 base pairs, respectively; Fig. 2), generated by inverse or direct PCR as described previously (14). Selected clones were isolated according to standard techniques (17). Inserts containing the entire coding regions of the T1-b, T2-c, and T4-a genes were identified by restriction digestion and Southern blot analysis using as ³²P-labeled probes the subfamily-specific, PCR-generated DNA fragments used for library screening as well as gene-specific oligonucleotides (given in Fig. 2, legend). For T1, a second larger (477 base pairs) subfamily-specific probe was also used; the probe was generated with the same oligonucleotide primers by inverse PCR amplification as described (14), using as template RsaI-digested and religated genomic DNA (RsaI circles). HindIII (T1-b; 1.5 and 1.8 kb), EcoRI (T2-c; 5 kb), and XbaI (T4-a; 2.3 kb) phage restriction fragments were subcloned into the appropriate restriction sites of the pUC18 plasmid. DNA sequences were determined on both strands according to the dideoxy nucleotide chain termination method with sequence-specific oligonucleotide primers using a T7 sequencing kit (Pharmacia, Uppsala, Sweden).


Figure 2: TMP gene cloning. Restriction maps of the clones selected for subcloning and of the pUC18 subclones used for sequencing are shown. The thick dotted lines under the maps represent the different probes used to screen the bacteriophage library and/or to select regions for subcloning. The boxes on the pUC18 subclones pmgh4, pmgh9, pm2431, and pmgx5 represent the regions that were sequenced, with the coding sequences shaded. T1, probes 1 and 2, and T4, probe 1, are described under "Experimental Procedures." The sequences of oligonucleotides used as probes are as follows: T2, probe 1: 5'-GTCRGTCAAGGTTTAGAG-3' (antisense); T2, probe 2: 5'-TGGAAGACAGATATGTTG-3' (sense); T4, probe 2: 5'-GTATTTTCTATTTACTATTAATAACTAG-3' (sense).



Pulse-Chase Experiments, Immunoprecipitation, and Two-dimensional Gel Analysis

Pulse-chase experiments and immunoprecipitations were carried out as described previously (9). Briefly, log phase P. tetraurelia cells were fed with E. aerogenes bacteria metabolically labeled with ³⁵SO₄ (Amersham Int., United Kingdom) for 10 min at 27 °C (800 cells/ml; 5 × 10⁷ labeled bacteria/ml). To start the chase, cells were centrifuged and resuspended in medium supplemented with unlabeled bacteria. Aliquots of the cultures were removed at the times indicated and centrifuged. The cell pellets were lysed by injection into equal volumes of hot (95 °C) 0.4% SDS, followed by rapid reconstitution of immunoprecipitation buffer (150 mM NaCl, 1% Nonidet P-40, 0.5% deoxycholate, 0.1% SDS, 50 mM Tris-HCl, pH 8; final concentrations). Cell lysates were cleared by centrifugation and incubated with saturating amounts of a polyclonal antiserum that recognizes and efficiently immunoprecipitates most of the mature trichocyst matrix polypeptides as well as their precursor molecules (9). Immunocomplexes were collected using protein A-Sepharose beads (Pharmacia). After extensive washing, final pellets were resuspended directly in isoelectric focusing sample buffer (18).

Two-dimensional gel electrophoresis was carried out as described previously(14) . Isoelectric focusing was performed in 0.8% pH 3-10 and 1.2% pH 4-6.5 ampholines (Pharmacia) in the presence of a chemical spacer (50 mM MOPS) as described by Tindall(12) . The SDS-polyacrylamide gel electrophoresis second dimension analysis was performed according to Laemmli(19) , on 13% acrylamide gels.

Characterization of Introns

Reverse transcriptase-PCR was performed using total Paramecium RNA according to standard procedures, as described previously(14) . The absence of a putative intron sequence from the mRNA was evaluated by comparison of the size of PCR products obtained using either cDNA or genomic DNA as PCR template.

Primer Extension

Primer extension was carried out according to the method described by Di Rago and Colson (20). Briefly, 20 pmol of oligonucleotide primer complementary to bases +166 to +186 of the T4-a gene sequence (Fig. 3c) was 5' end-labeled with T4 kinase according to standard procedures (17) and hybridized to total Paramecium RNA. The primer was extended by avian myeloblastosis virus-reverse transcriptase (Pharmacia; 0.5 unit/µl) and analyzed on a 6% polyacrylamide-urea gel with products of a sequencing reaction carried out with the same primer.



Figure 3: TMP gene sequences. Complete nucleotide sequences of the genes (a) T1-b, (b) T2-c, and (c) T4-a are shown along with the deduced amino acid sequences. Coding nucleotides are in upper case and noncoding nucleotides, including the introns that interrupt each sequence, are in lower case. Note that in Paramecium, TAA and TAG code for glutamine. GenBank accession numbers for the T1-b, T2-c, and T4-a nucleotide sequences are U47115, U47116, and U47117, respectively.



Sequence Analysis

Sequence assembly and initial characterization of the DNA and protein sequences were performed using the UWGCG sequence analysis package(21) . Signal sequences were identified using the algorithm of von Heijne(22) . Protein sequences were aligned using the Clustal program (23, 24) run with default parameters. Protein secondary structure was evaluated with UWGCG software based on the method of Garnier et al.(25) , the Heidelberg Profile Network Prediction PHD(26) , and the Heidelberg Prediction of Secondary Structural Content of Proteins from Their Amino Acid Composition. (^2)

Initial evaluation of heptad segments relied on the COILS program(27) ; further characterization was by visual inspection. The alignment of the three proteins was used to identify positions with apolar residues (Leu, Ile, Phe, Val, Met, Tyr, and Ala) in all sequences, which defined likely heptad segments (repetitions of heptads of the form abcdefg with apolar residues at a and d)(28) .
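The heptad criterion described above (apolar residues at positions a and d) can be illustrated with a short sketch. This is not the procedure used in this work, which relied on COILS followed by visual inspection of the alignment; it only shows how a candidate segment might be scored. The function names, the `offset` parameter, and the toy sequence in the usage note are hypothetical.

```python
# Apolar residues as listed in the text: Leu, Ile, Phe, Val, Met, Tyr, Ala.
APOLAR = set("LIFVMYA")


def ad_apolar_fraction(segment, offset=0):
    """Fraction of heptad a and d positions occupied by apolar residues.

    `offset` (hypothetical parameter) is the index of the residue taken
    as the first 'a' position; residues before it are ignored.
    """
    hits = 0
    total = 0
    for i, residue in enumerate(segment[offset:]):
        if i % 7 in (0, 3):  # 0 = position a, 3 = position d
            total += 1
            if residue in APOLAR:
                hits += 1
    return hits / total if total else 0.0


def best_register(segment):
    """Try the seven possible registers; return (offset, fraction) of the best."""
    scores = [(ad_apolar_fraction(segment, k), k) for k in range(7)]
    best_fraction, best_offset = max(scores, key=lambda s: s[0])
    return best_offset, best_fraction
```

For an idealized toy segment such as `"LKKLEEK" * 3`, `best_register` returns offset 0 with a fraction of 1.0. Real segments would be expected to score lower and may need manual adjustment of the register.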

Homology searches of the protein sequence data bases (combined non-redundant GENBANK/EMBL/SWISSPROT) and of the SBASE protein domain data base (29) relied on the FASTA (30) and BLAST (31) algorithms; statistical significance of the results was evaluated using the RDF2 program(32) .


RESULTS

TMP Genes

The first two steps in cloning TMP genes have been previously described. First, N-terminal microsequences were obtained for several mature matrix polypeptides(13, 33, 34) . Three of the N-terminal sequences, chosen because they had been independently determined in different laboratories, using different protein purification procedures (reviewed in (35) ), were used to design partially degenerate PCR primers. Genomic DNA fragments, corresponding to the approximately 20 to 40 amino acids of the N-terminal microsequences, were amplified. The PCR products were cloned and sequenced and were also used as probes for genomic blot experiments, revealing that a large multigene family codes for TMPs(14) . The multigene family is organized in at least 10 subfamilies. Within each subfamily 4 to 8 genes, sharing 80-90% identity at the nucleotide level, code for nearly identical proteins, judging by analysis of gene fragments for 3 subfamilies. Genes belonging to different subfamilies share only about 25% identity at the amino acid level (see below).

In order to clone one member of each of the three subfamilies under investigation (named T1, T2, and T4 after the original microsequences), we used subfamily specific probes to screen a bacteriophage library of P. tetraurelia genomic DNA. 14 positive clones were isolated and characterized for T1, 6 for T2, and 20 for T4. Maps of clones selected for subcloning and pUC18 subclones, the probes used and the regions sequenced are presented in Fig. 2. Restriction and Southern blot analysis of the T1 clones identified 4 clones with overlapping inserts corresponding to the same genomic region. The coding region of the gene was contained within 2 adjacent HindIII fragments of 1.5 and 1.8 kb, which were subcloned in the pUC18 plasmid vector for sequencing. Southern blot analysis of the T2 clones using oligonucleotide probes specific for the T2-c gene (14) allowed us to identify a single clone. A 5-kb EcoRI fragment containing the coding region was subcloned in the pUC18 vector for sequencing. Restriction and Southern blot analysis of the T4 clones revealed 7 overlapping clones. The T4-a coding region was contained within an XbaI fragment of 2.3 kb, which was subcloned in the pUC18 vector for sequencing.

The sequences of the three genes (which correspond to the T1-b, T2-c, and T4-a gene fragments reported in (14) ) are shown in Fig. 3. The DNA sequences of T1-b, T2-c, and T4-a reveal a single reading frame of 1224, 1192, and 1143 nucleotides, respectively, interrupted by 4, 1, and 2 introns of 23 to 29 base pairs (the standard size in Paramecium) whose existence has been confirmed by reverse transcriptase-PCR experiments ((14) , and results not shown). The encoded proteins have lengths of 369, 387, and 363 amino acids, compatible with the 40-45 kDa size of the trichocyst precursor proteins(9, 10) .
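The figures above can be cross-checked with simple arithmetic. The sketch below assumes that each quoted reading-frame length spans the coding nucleotides, the introns, and the terminal stop codon; this is an assumption of the example and is not stated explicitly in the text. Under that assumption, the nucleotides not accounted for by codons should equal the total intron length.

```python
# Values quoted in the text: reading-frame length (nt), protein length (aa),
# and number of introns for each gene.
GENES = {
    "T1-b": {"frame_nt": 1224, "protein_aa": 369, "introns": 4},
    "T2-c": {"frame_nt": 1192, "protein_aa": 387, "introns": 1},
    "T4-a": {"frame_nt": 1143, "protein_aa": 363, "introns": 2},
}


def implied_intron_nt(frame_nt, protein_aa):
    """Nucleotides left over after the sense codons and one stop codon.

    Assumes frame_nt includes the introns and the stop codon.
    """
    return frame_nt - 3 * (protein_aa + 1)


def mean_intron_size(gene):
    """Average intron size implied for one entry of GENES."""
    total = implied_intron_nt(gene["frame_nt"], gene["protein_aa"])
    return total / gene["introns"]


for name, gene in GENES.items():
    print(name, implied_intron_nt(gene["frame_nt"], gene["protein_aa"]),
          mean_intron_size(gene))
# T1-b 114 28.5
# T2-c 28 28.0
# T4-a 51 25.5
```

The implied mean intron sizes (28.5, 28, and 25.5 nucleotides) all fall within the 23 to 29 base pair range given above, so the three sets of numbers are mutually consistent under the stated assumption.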

For one of the genes, T4-a, the transcription start site was determined by primer extension (not shown). As for other characterized Paramecium genes(36, 37, 38, 39) , the untranslated leader sequence is very short: transcription begins only 13 nucleotides upstream of the ATG codon that initiates translation (ATTAATAAAAAAAATG). The T1-b and T4-a genes contain a consensus polyadenylation signal (AATAAA), found in some but not all ciliate genes(40) , located some 85 nucleotides downstream of the TGA stop codon (Fig. 2). The T2-c gene does not contain this sequence. The predicted sizes of the mRNA molecules are consistent with the 1.4 kb measured by Northern blot experiments(14) .

Analysis of the Protein Primary and Secondary Structure

The amino acid composition of purified trichocyst matrices was determined experimentally by Steers et al. (41), at a time when the heterogeneity of TMPs was not suspected. Table 1 compares the amino acid composition calculated for the deduced T1-b, T2-c, and T4-a protein sequences with the earlier experimental data. Interestingly, each of the three proteins has essentially the same composition, which is very close to that determined for the purified trichocyst matrices. The mature polypeptides are acidic (pI 4.7-5.5) (12) and for each precursor, 16% of the residues are acidic (Glu + Asp). The most abundant amino acids in these proteins are, however, the apolar residues alanine and leucine (Ala + Leu approximately 22%), while the least abundant amino acid is cysteine. Glutamine is also quite abundant (between 7 and 10%). We note that in Paramecium, TAA and TAG, STOP codons in the universal genetic code, designate glutamine (42, 43). In the TMP genes, over 50% of the glutamine codons are TAA: use of the cloned genes to drive a bacterial expression system will require changing 18, 25, and 23 TAA codons in T1-b, T2-c, and T4-a, respectively.
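The codon changes needed for bacterial expression can be sketched as follows. The example assumes an intron-free coding sequence supplied in frame from the initiating ATG, and it replaces in-frame TAA and TAG with the universal glutamine codons CAA and CAG. The function name is hypothetical, and the sketch ignores practical matters such as host codon preference.

```python
def recode_glutamine_codons(cds):
    """Replace in-frame TAA/TAG (glutamine in Paramecium) with CAA/CAG.

    `cds` is assumed to be an intron-free coding sequence, in frame from
    the ATG. A terminal TGA stop codon is left untouched.
    Returns (recoded_sequence, number_of_codons_changed).
    """
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    swap = {"TAA": "CAA", "TAG": "CAG"}
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    changed = sum(1 for codon in codons if codon in swap)
    recoded = "".join(swap.get(codon, codon) for codon in codons)
    return recoded, changed
```

For example, `recode_glutamine_codons("ATGTAAGCTTAGTGA")` returns `("ATGCAAGCTCAGTGA", 2)`. Working codon by codon matters here: a TAA that straddles two codons, as in `ATG CTA AGC TGA`, is not a glutamine codon and is left unchanged.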



The minor discrepancies between the composition determined experimentally and that calculated for T1-b, T2-c, and T4-a may arise either from the fact that the complete protein sequences, not just the regions corresponding to the mature polypeptides (which account for approximately 75% of the amino acids; see below), were used for each calculation or from the fact that the three genes we have cloned constitute a random sampling of the 30 or more mature polypeptides present in the trichocyst matrix. Nonetheless the similarities outweigh the differences and speak for a close family resemblance among the proteins that constitute the trichocyst matrix.

TMP Organization

As mentioned in the Introduction, TMPs are synthesized as 40-45 kDa precursor molecules which are converted in the course of trichocyst biogenesis to 15-20 kDa mature polypeptides; only the mature polypeptides participate in the crystalline trichocyst matrix(9) . An alignment of the T1-b, T2-c, and T4-a deduced protein sequences, which are of the size expected to code for the precursors, is shown in Fig. 4. These sequences share 25% amino acid identity and 45% amino acid similarity. In Fig. 5, the experimentally determined N-terminal amino acid sequences of the mature polypeptides are shown in white letters on a black background. The positions of these N-terminal sequences indicate that all three precursor molecules have the same organization. Each precursor consists of a hydrophobic signal sequence (dark shading) separated from the first mature polypeptide by a pro-sequence (light shading). Note that the N-terminal sequence of the first mature polypeptide has been determined experimentally for each of the three molecules. We do not know where the first mature polypeptide ends, but for one of the three precursors, T1-b, we do know where the second mature polypeptide begins, thanks to the N-terminal microsequence (FADQGAL . . . ). It is thus likely that each precursor gives rise to 2 mature polypeptides; further evidence for this comes from analysis of the secondary structure (see below). The shaded region, the most basic part of each protein, separates the two mature polypeptides. Given the size and isoelectric points of the mature polypeptides(12) , this basic region is likely to be partially or totally removed by protein processing.
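For readers who wish to reproduce identity figures of this kind from an alignment, a minimal sketch is given below. It computes pairwise identity over columns in which neither sequence has a gap; this definition is an assumption of the example, and the figure quoted above may have been computed differently (for instance, across all three sequences). The function name and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two rows of an alignment.

    Only columns where neither sequence has a gap are counted
    (an assumption of this sketch).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared
```

With the toy rows `"MKT-LLA"` and `"MRTALLA"`, six columns are compared and five are identical, giving about 83% identity.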


Figure 4: Alignment of deduced TMP protein sequences. The T1-b, T2-c, and T4-a deduced protein sequences were aligned using the Clustal program. Identical amino acids are shown on a black background and similar amino acids on a gray background.




Figure 5: Organization of TMP precursor molecules. The same sequence alignment as in Fig. 4 is shown. Above the sequences, asterisks (*) denote positions with identical amino acids in all sequences and dots (•), positions with similar amino acids in all sequences. The N-terminal microsequences determined for mature polypeptides are shown in white letters on a black background (cf. (35) for a review of the microsequencing). Putative signal peptides are darkly shaded. The pro-regions separating the signal peptides from the first mature polypeptides are lightly shaded. The basic regions postulated to separate the two mature polypeptides of each precursor are also shaded, and basic residues within the region are in bold-face type. The structural motifs of the heptad segments are underlined and the heptads are given in italics below the sequences. Apolar residues in the a and d positions of the heptads are in bold-face type. Each segment has been labeled "helix A, B, C, or D" in accordance with the proposition that each group of heptad segments forms an alpha-helical bundle. We note that the D2 helices for T2 and T4 are less satisfactory than the other helices. They can be improved if we remove the constraint of perfect alignment of the heptads among the three sequences and situate them closer to the C terminus. The 2 cysteine residues in the T2-c sequence are in bold-face type.



Several Processing Reactions Involved in TMP Maturation

Examination of the putative cleavage sites on substrate molecules can in theory help identify the enzymes responsible for the cleavage. In the TMP sequences, the junction between the pro-region and the first mature polypeptide, whose position is absolutely certain for all three proteins, presents a short consensus with TG on the N-terminal side of the cleavage (i.e. in the P2-P1 positions) and either Gly or Asp on the C-terminal side (i.e. in the P1' position). As far as we are aware, this sequence does not correspond to a known endopeptidase cleavage site; moreover, cleavage at a glycine residue is quite unusual. The sequence preceding the second mature polypeptide is different again. No consensus of even two amino acids appears in this region, and there is no TG. The endopeptidase that removes the pro-sequence from the first mature polypeptide is thus not likely to be involved in liberating the second mature polypeptide, and it moreover seems probable that several distinct processing reactions are necessary to remove the pro and basic regions and liberate mature TMP polypeptides.
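The short consensus at the pro-region junction can be written as a pattern over the protein sequence: Thr-Gly in P2-P1 followed by Gly or Asp in P1'. The sketch below lists every position in a one-letter protein sequence that matches this pattern. A three-residue motif will occur by chance, so the output is only a list of candidates, not a prediction of actual cleavage. The function name and the toy sequence are hypothetical.

```python
import re

# T and G here are the amino acids Thr and Gly (one-letter code), not nucleotides.
# The lookahead keeps the P1' residue out of the match, so m.end() is its index.
PRO_JUNCTION = re.compile(r"TG(?=[GD])")


def candidate_pro_cleavage_sites(protein):
    """Return 0-based indices of P1' residues at TG-G/D matches.

    Cleavage would fall immediately before each returned index.
    """
    return [m.end() for m in PRO_JUNCTION.finditer(protein.upper())]
```

For the toy sequence `"AATGDLLTGGKKTGA"`, the function returns `[4, 9]`: the TG-D and TG-G occurrences match, while the final TG followed by Ala does not.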

To gain support for the organization of the precursors suggested by the sequence data, we performed pulse-chase experiments. We had previously investigated TMP proteolytic maturation by pulse-chase experiments, using polyclonal antibodies raised against the entire set of mature polypeptides, which allowed us to obtain a global picture of the processing. The immunoprecipitated polypeptides were analyzed on one-dimensional SDS-polyacrylamide gel electrophoresis gels which revealed the conversion of a family of 40-45-kDa precursors to a family of 15-20-kDa products(9) . Since antibodies that recognize individual TMPs are not yet available, we performed pulse-chase experiments using high resolution two-dimensional gels to analyze the proteins immunoprecipitated by the polyclonal antibodies in order to detect changes in isoelectric point as well as more subtle changes in molecular mass. A typical experiment is shown in Fig. 6.


Figure 6: Two-dimensional gel analysis of a pulse-chase experiment. Wild type cells were labeled, then collected and lysed at the chase times indicated. TMP proteins were immunoprecipitated from the lysates with polyclonal antibodies raised against the complete set of mature TMP polypeptides and analyzed on two-dimensional gels. The basic end of the first dimension isoelectric focusing is on the left (-) and the acidic end (+) is on the right. The previously determined pH range for the isoelectric points of the mature (15-20 kDa) polypeptides is 5.5 to 4.7 (12). The different sets of spots discussed in the text are annotated on the 30 min gel as follows: 1, precursors; 2, weak intermediates; 3, basic mature polypeptides; 4, acidic mature polypeptides.



First, the precursor molecules are on average more basic than the mature polypeptides, and many of the mature polypeptides are more acidic than any of the precursors, consistent with removal of the basic region as postulated above. Second, at 20 min of chase the more basic mature polypeptides are present, and their pattern changes little between 20 and 40 min. In contrast, the more acidic spots are barely visible at 20 min, and the acidic half of the pattern of mature polypeptides evolves between 20 and 40 min of chase (but not thereafter; data not shown). Finally, the gels present a few weak spots of intermediate size, which progressively disappear as the more acidic low molecular mass polypeptides appear. These data support the idea that a temporally ordered series of reactions is involved in the conversion of the precursors to the mature polypeptides, and clearly demonstrate net loss of basic residues.

Mature Polypeptides Contain Heptad Repeats

T1-b, T2-c, and T4-a protein sequences were used to search the protein data bases. No statistically significant similarity with known proteins was found. The highest scores for each of the proteins were with the rod portions of myosins, keratins, and intermediate filament proteins. Different methods for the prediction of protein secondary structure all indicate that the trichocyst proteins have very high alpha-helical content. We therefore looked for heptad repeats (repetitions of 7 amino acids, with apolar residues in the first and fourth positions(28) ), which are indicative of coiled-coil interactions between alpha-helices.

For each of the three proteins, the COILS program(27) , an implementation of the algorithm of Parry(44) , gave scores (>1.3 with a window of 28) consistent with ability to form coiled-coils over most or all of the regions that correspond to the mature polypeptides, i.e. the unshaded portions of Fig. 5; neither the pro nor the basic regions were predicted to form coiled-coils. The heptad, however, is a motif typical not only of long rod-shaped alpha-fibrous proteins, but also of globular proteins containing bundles of alpha-helices. In the case of the TMPs, the sequence data are more consistent with globular proteins than fibrous ones. The charged to apolar residue ratios for the complete T1-b, T2-c, and T4-a sequences are 0.55, 0.73, and 0.74, respectively; 2-stranded alpha-fibrous proteins, for example, have ratios greater than 1.0(45) .
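The charged to apolar residue ratio can be computed directly from a sequence once the two residue classes are fixed. The sketch below takes the apolar set listed under "Sequence Analysis" (Leu, Ile, Phe, Val, Met, Tyr, Ala) and assumes that the charged set is Asp, Glu, Lys, and Arg. The charged set is an assumption of this example. The definition used in (45) may differ (for instance, in its treatment of His), so the sketch is not guaranteed to reproduce the values quoted above.

```python
APOLAR = set("LIFVMYA")   # apolar set as listed under "Sequence Analysis"
CHARGED = set("DEKR")     # assumed charged set (His excluded); see lead-in


def charged_to_apolar_ratio(protein):
    """Ratio of charged to apolar residues in a one-letter protein sequence."""
    protein = protein.upper()
    n_charged = sum(1 for residue in protein if residue in CHARGED)
    n_apolar = sum(1 for residue in protein if residue in APOLAR)
    if n_apolar == 0:
        raise ValueError("sequence contains no apolar residues")
    return n_charged / n_apolar
```

For the toy sequence `"DEKRLLLLAAAA"` (4 charged, 8 apolar residues) the ratio is 0.5. Ratios below 1.0, like those reported for the three precursors, indicate more apolar than charged residues.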

Within globular proteins, helices characterized by heptad repeats tend to pack in anti-parallel bundles rather than forming extended rod-like domains (46). These heptad segments are shorter than those found in 2- and 3-stranded coiled-coils, and are generally not well defined by the COILS program (47). (^3) We therefore took advantage of the homology of the three TMP sequences to look for heptad repeats whose positions are conserved with respect to the sequence alignment.

As shown in Fig. 5, each of the mature polypeptides contains 4 short segments of roughly 3-4 heptads, which have been labeled A, B, C, and D. In some of the sequences, the heptad segments are separated or flanked by proline residues, consistent with beta-turns: before helix A1, between helices B1 and C1, and after helix D1 and helix D2. Charged residues in positions e and g, which stabilize parallel coiled-coils in alpha-fibrous proteins, are not notably present: this is consistent with a bundle of alpha-helices in a globular protein. Finally and most important, the percentage of apolar residues in positions a and d of the heptad segments is high. These features taken together argue strongly that the major portion of each mature polypeptide is a 4 alpha-helical bundle (Fig. 7a).


Figure 7: Proposed folding motif for TMPs. a, schematic drawing of a 4 alpha-helical antiparallel bundle (modified from (46) ). The bundles have a left-handed tilt necessary to optimize coiled-coil packing of the helices. The chain connectivity has arbitrarily been drawn as right-handed. This is the basic protein fold proposed for each mature TMP polypeptide. Labeling of the helices corresponds to that in Fig. 5. b, possible arrangement of two helical bundles in the disulfide-bonded T2 precursor molecule. c, end-on view of the same arrangement showing that the connectivity is the same for each bundle and for the precursor molecule as a whole. Thick lines, top; thin and dotted lines, bottom.




DISCUSSION

We have presented the results of cloning and sequence analysis of three genes coding for three different Paramecium secretory granule precursor proteins. Alignment of the three protein sequences, which share only about 25% amino acid identity, indicates common organization of the precursor molecules, each of which gives rise to 2 mature polypeptides of the crystalline trichocyst matrix. The organization deduced from the sequence data is supported by two-dimensional pulse-chase experiments. Analysis of the aligned sequences, characterized by heptad repeats, provides a picture of TMP structure: the basic fold of all the mature polypeptides is very likely a 4 alpha-helical bundle motif.

TMP Processing: Novel Enzymes?

By far the most commonly found processing enzymes in the regulated (and constitutive) secretory pathways of metazoa and fungi are serine proteinases belonging to the bacterial subtilisin superfamily known as kexins or prohormone convertases(48, 49) . These enzymes cleave their substrates at dibasic (or more rarely, monobasic, tribasic, or tetrabasic) sites. In the basic region separating the two TMP mature polypeptides, all three precursor sequences contain basic residues, and T1 and T2 contain pairs of basic residues. This region could potentially be cleaved by subtilisin-like processing enzymes, thus accounting for the conversion of the precursors to 15-20 kDa products characterized by pulse-chase experiments(9) .

Other cleavage sites in these molecules are clearly not the targets of subtilisin-like processing enzymes. The junction pro/first mature polypeptide has a consensus TG-G/D. The sequence N-terminal to the second mature polypeptide (VEAN-F for T1-b) is different but still not a target for a (di)basic processing enzyme. Given the dearth of knowledge of protozoan processing enzymes, we cannot exclude involvement of a novel endopeptidase in TMP maturation. The high alpha-helical content of TMPs may provide a clue. It has been shown that the magainin peptides of Xenopus skin, which are toxic to many microorganisms because of their pore-forming capacity, are processed by a metalloendopeptidase that recognizes alpha-helical secondary structure(50) . Since it is probable that all TMP precursor proteins have similar three-dimensional structures despite different amino acid sequences, an endopeptidase designed for structural rather than sequence specificity might be an efficient and economical adaptation to the problem of trichocyst biogenesis.

Many Sequences, One Structural Motif

Although much progress has been made in the prediction of protein secondary structure from primary sequence data, especially in cases where there is homology with proteins of known structure (51), it is not yet common to be able to infer the folding pattern of a protein simply from its sequence. alpha-Fibrous proteins that form coiled-coils are a notable exception (47). The "knobs into holes" coiled-coil packing (52) has long been recognized to impose regularities on the primary sequence (apolar and charged residue periodicities) that can be appreciated by Fourier transformation (53).
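
The principle of such a Fourier analysis can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used in this work: the apolar residue set `APOLAR`, the toy sequence `TOY_HELIX` (an idealized heptad with apolar residues at positions a and d, repeated four times), and the function names are all hypothetical. Apolar residues placed at positions a and d recur on average every 3.5 residues, so the spectrum of the apolar profile is expected to peak near a frequency of 2/7 cycles per residue.

```python
import cmath

# Assumed set of apolar residues; other definitions are possible.
APOLAR = set("AILMFVW")

# Idealized toy sequence, invented for illustration only (not a TMP sequence):
# heptad "LKELEDK" has apolar residues at positions a and d.
TOY_HELIX = "LKELEDK" * 4


def apolar_profile(seq):
    """Encode a sequence as 1.0 (apolar) or 0.0 (other) per residue."""
    return [1.0 if residue in APOLAR else 0.0 for residue in seq]


def spectral_intensity(profile, period):
    """Squared Fourier amplitude of the mean-subtracted profile at one period."""
    n = len(profile)
    mean = sum(profile) / n
    total = sum(
        (x - mean) * cmath.exp(-2j * cmath.pi * k / period)
        for k, x in enumerate(profile)
    )
    return abs(total) ** 2 / n


def strongest_period(seq, lo=2.0, hi=10.0, step=0.05):
    """Scan periods from lo to hi residues and return the most intense one."""
    profile = apolar_profile(seq)
    count = int(round((hi - lo) / step)) + 1
    periods = [lo + i * step for i in range(count)]
    return max(periods, key=lambda p: spectral_intensity(profile, p))


if __name__ == "__main__":
    # For the idealized heptad, the strongest period lies close to 3.5 residues.
    print(strongest_period(TOY_HELIX))
```

With real sequences the heptad segments are short and interrupted, so the peak is broader and weaker than in this idealized case, and its significance has to be judged against the background of the spectrum.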

The folding pattern we propose for TMPs is based on similar identification of the periodic disposition of apolar residues in the sequences, consistent with coiled-coil packing of alpha-helices (46). Two arguments add confidence to our assignment of a 4 alpha-helical bundle as the basic folding pattern for each of the mature TMP polypeptides. First, the apolar residue periodicity is manifest in the alignment of the three protein sequences and indeed accounts for much of the similarity shared by the three sequences; second, the six different mature polypeptides all contain similar arrangements of heptad segments. Although a particular example of such a bundle has been drawn in Fig. 7 for the sake of illustration, in the absence of structural data we are unable to specify details of the fold, for example, the handedness of the chain connectivity.

An example of a large multigene family coding for proteins that share little sequence identity but all fold into the same structure is provided by the variant surface antigens of the African trypanosome, the agent of sleeping sickness. A repertoire of some 1000 genes codes for the surface proteins of the parasite. Although only one gene is expressed at a time, expression can switch to a different antigen to escape the host's immune response. X-ray crystallographic structure determination has revealed that several antigens, with quite different primary structures, have nearly identical tertiary structures, suggesting that all of the variant surface antigens, representing some 1000 different sequences, correspond to a unique protein fold (54, 55, 56) .

In the trypanosome example, sequence variation of the surface antigen genes would have evolved to fool the host's immune system. In our example, in which as many as 30 different sequences may share a common protein fold, the selective pressure for sequence diversification may be related to the constrained trichocyst shape which, as genetic analysis has shown, is necessary for successful exocytosis(5, 6, 9, 11) . Crystallization of the trichocyst matrix from a mixture of proteins with the same structure but slightly different chemical properties might allow formation of a gradient of crystallization within the maturing granules (which would give the carrot shape), much as pH gradients can be formed from mixtures of carrier ampholytes with slightly different pI values(57) . We should be able to test this idea, using specific antibodies and/or epitope tagged transgenes.

Arrangement of the alpha-Helical Bundles in the Precursor Molecules

Steers et al.(41) , who first characterized TMPs, showed that these proteins are present in the trichocyst matrix as disulfide bonded dimers. Using monoclonal antibodies that recognize defined subsets of precursors and mature polypeptides, Shih and Nelson (58, 59) were able to demonstrate that most mature TMPs are disulfide-bonded heterodimers, and that the corresponding precursor molecules contain intramolecular disulfide bonds as judged by electrophoresis in the presence or absence of reducing agents. Some of the mature TMPs are present as monomers, and there are no disulfide bonds in the corresponding precursor molecules. This latter group corresponds to the proteins that can be solubilized from extended trichocyst matrices by heating(60) .

We know that T1 and T4 belong to the group of heat soluble proteins since N-terminal sequences of mature T1 and T4 polypeptides were determined after purification from the pool of heat soluble proteins (34) . This is consistent with the absence of cysteine residues in the deduced T1 and T4 protein sequences.

T2 belongs to the class of disulfide-bonded dimers(13, 59) , and the precursor contains exactly 2 cysteine residues. One of the residues is situated between the helices A1 and B1 of the first mature polypeptide while the other is situated between the helices C2 and D2 of the second mature polypeptide. The unique disulfide bond that can be formed in the precursor molecule would join the mature polypeptides that emanate from T2, yielding, as expected(59) , a disulfide bonded heterodimer.

The presence of a disulfide bond in T2 imposes a constraint on the way in which the two alpha-helical bundle motifs can be arranged in the precursor molecule. The loops connecting helices A1 and B1 of the first mature polypeptide and helices C2 and D2 of the second mature polypeptide are pinned together by the disulfide bond. Another constraint comes from polypeptide chain continuity: the basic region connects helix D1 of the first bundle with helix A2 of the second bundle. An arrangement which accommodates both constraints is presented in Fig. 7, b and c. We propose that in the precursor molecules, the two helical bundles face each other, related by a pseudo 2-fold symmetry axis.

Although an experimental structure determination is of course necessary to test the model, we consider it likely that the same arrangement of the helical bundles found in T2 will also hold for the T1 and T4 precursors, despite the absence of the disulfide bond. It is tempting to suggest that this arrangement, determined by the initial folding of the precursor polypeptide chain, remains after protein processing and is a feature of TMP packing in the crystalline trichocyst matrix. However, the arrangement might be metastable once the pro and basic regions of the precursor are removed. Upon exocytosis, the H2O and Ca2+ of the external medium would trigger the rearrangement of the helical bundles into a thermodynamically more stable array, accounting for the irreversible transition to the needle-shaped extracellular form.


FOOTNOTES

*
This work was supported in part by the Genome Program of the Ministère de l'Enseignement Supérieure et de la Recherche (GIP GREG) and the CNRS. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

The nucleotide sequences reported in this paper have been submitted to the GenBank/EMBL Data Bank with accession numbers U47115, U47116, and U47117.

§
Supported by a graduate fellowship from the Ministère de l'Enseignement Supérieure et de la Recherche (MESR).

Supported by a senior fellowship of the EEC Bridge Program and by a Poste Rouge from the CNRS. To whom correspondence should be addressed. Tel.: 33-1-69-82-43-92; Fax: 33-1-69-82-31-50.

(^1)
The abbreviations used are: TMP, trichocyst matrix protein; MOPS, morpholinopropanesulfonic acid; PCR, polymerase chain reaction; kb, kilobase(s); UWGCG, University of Wisconsin Genetics Computer Group.

(^2)
F. Eisenhaber, F. Imperiale, P. Argos, and C. Froemmel, submitted for publication.

(^3)
The COILS program (27) calculates probabilities based on the statistical occurrence of amino acids in the different positions of a heptad repeat using a reference data base of 2 stranded, parallel coiled-coils from myosin, keratin, and tropomyosin.


ACKNOWLEDGEMENTS

We thank Carl Creutz, Roger Karess, and Roberto Bruzzone for critical reading of the manuscript and Janine Beisson, Jean Cohen, and Vittorio Luzzati for many useful discussions. We are particularly indebted to David A. D. Parry for his kind advice on analysis of heptad repeats and for pointing out to us the likelihood that TMPs contain 4 alpha-helical bundle motifs.


REFERENCES

  1. Hausmann, K. (1978) in International Review of Cytology (Bourne, G. H., and Danielli, J. F., eds) pp. 197-276, Academic Press, New York
  2. Harumoto, T., and Miyake, A. (1991) J. Exp. Zool. 260, 84-92
  3. Adoutte, A. (1988) in Paramecium (Görtz, H.-D., ed) pp. 325-362, Springer-Verlag, Heidelberg
  4. Bonnemain, H., Gulik-Krzywicki, T., Grandchamp, C., and Cohen, J. (1992) Genetics 130, 461-470
  5. Cohen, J., and Beisson, J. (1980) Genetics 95, 797-818
  6. Pollack, S. (1974) J. Protozool. 21, 352-362
  7. Tooze, S. A., Chanat, E., Tooze, J., and Huttner, W. B. (1993) in Mechanisms of Intracellular Trafficking and Processing of Proproteins (Loh, P., ed) pp. 157-177, CRC Press, Boca Raton, FL
  8. Sperling, L., Tardieu, A., and Gulik-Krzywicki, T. (1987) J. Cell Biol. 105, 1649-1662
  9. Gautier, M. C., Garreau de Loubresse, N., Madeddu, L., and Sperling, L. (1994) J. Cell Biol. 124, 893-902
  10. Adoutte, A., Garreau de Loubresse, N., and Beisson, J. (1984) J. Mol. Biol. 180, 1065-1081
  11. Garreau de Loubresse, N., Gautier, M.-C., and Sperling, L. (1994) Biol. Cell 82, 139-147
  12. Tindall, S. H. (1986) Anal. Biochem. 159, 287-294
  13. Tindall, S. H., DeVito, L. D., and Nelson, D. L. (1989) J. Cell Sci. 92, 441-447
  14. Madeddu, L., Gautier, M.-C., Vayssié, L., Houari, A., and Sperling, L. (1995) Mol. Biol. Cell 6, 649-659
  15. Sonneborn, T. M. (1974) in Handbook of Genetics (King, R., ed) pp. 469-594, Plenum Publishing Corp., New York
  16. Sonneborn, T. M. (1970) Methods Cell Physiol. 4, 241-339
  17. Sambrook, J., Fritsch, E., and Maniatis, T. (1989) Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY
  18. Garrels, J. I. (1979) J. Biol. Chem. 254, 7961-7977
  19. Laemmli, U. K. (1970) Nature 227, 680-685
  20. Di Rago, J.-P., and Colson, A. M. (1988) J. Biol. Chem. 263, 12564-12570
  21. Devereux, J., Haeberli, P., and Smithies, O. (1984) Nucleic Acids Res. 12, 387-395
  22. von Heijne, G. (1986) Nucleic Acids Res. 14, 4683-4690
  23. Higgins, D. G., and Sharp, P. M. (1989) Comput. Appl. Biosci. 5, 151-153
  24. Thompson, J. D., Higgins, D. G., and Gibson, T. J. (1994) Nucleic Acids Res. 22, 4673-4680
  25. Garnier, J., Osguthorpe, D. J., and Robson, B. (1978) J. Mol. Biol. 120, 97-120
  26. Rost, B., and Sander, C. (1993) Proc. Natl. Acad. Sci. U. S. A. 90, 7558-7562
  27. Lupas, A., Van Dyke, M., and Stock, J. (1991) Science 252, 1162-1164
  28. Cohen, C., and Parry, D. (1986) Trends Biochem. Sci. 11, 245-248
  29. Pongor, S., Hatsagi, Z., Degtyarenko, K., Fabian, P., Skerl, V., Hegyi, H., Murvai, J., and Bevilacqua, V. (1994) Nucleic Acids Res. 22, 3610-3615
  30. Lipman, D. J., and Pearson, W. R. (1985) Science 227, 1435-1441
  31. Altschul, S. F., Gish, W., Miller, W., Myers, E. W., and Lipman, D. J. (1990) J. Mol. Biol. 215, 403-410
  32. Pearson, W. R., and Lipman, D. J. (1988) Proc. Natl. Acad. Sci. U. S. A. 85, 2444-2448
  33. Le Caer, J. P., Rossier, J., and Sperling, L. (1990) J. Prot. Chem. 9, 290-291
  34. Peterson, J. B., Nelson, D. L., and Angeletti, R. H. (1990) in Current Research in Protein Chemistry (Villafranca, J. J., ed) pp. 79-85, Academic Press, New York
  35. Madeddu, L., Gautier, M. C., Le Caer, J. P., Garreau de Loubresse, N., and Sperling, L. (1994) Biochimie 76, 329-335
  36. Godiska, R. (1987) Mol. & Gen. Genet. 208, 529-536
  37. Klumpp, S., Hanke, C., Donella-Deana, A., Beyer, A., Kellner, R., Pinna, L. A., and Schulz, J. E. (1994) J. Biol. Chem. 269, 32774-32780
  38. Prat, A., Katinka, M., Caron, F., and Meyer, E. (1986) J. Mol. Biol. 189, 47-60
  39. Scott, J., Leeck, C., and Forney, J. (1993) Genetics 134, 189-198
  40. Prescott, D. M. (1994) Microbiol. Rev. 58, 233-267
  41. Steers, E., Beisson, J., and Marchesi, V. T. (1969) Exp. Cell Res. 57, 392-396
  42. Caron, F., and Meyer, E. (1985) Nature 314, 185-188
  43. Preer, J. R., Jr., Preer, L. B., Rudman, B. M., and Barnett, A. J. (1985) Nature 314, 188-190
  44. Parry, D. A. (1982) Biosci. Rep. 2, 1017-1024
  45. Conway, J. F., and Parry, D. A. (1990) Int. J. Biol. Macromol. 12, 328-334
  46. Cohen, C., and Parry, D. A. (1990) Proteins 7, 1-15
  47. Cohen, C., and Parry, D. A. (1994) Science 263, 488-489
  48. Steiner, D. F., Smeekens, S. P., Ohagi, S., and Chan, S. J. (1992) J. Biol. Chem. 267, 23435-23438
  49. Seidah, N. G., and Chretien, M. (1994) Methods Enzymol. 244, 175-188
  50. Resnick, N. M., Maloy, W. L., Guy, H. R., and Zasloff, M. (1991) Cell 66, 541-554
  51. Rost, B., Sander, C., and Schneider, R. (1994) J. Mol. Biol. 235, 13-26
  52. Crick, F. H. C. (1953) Acta Crystallogr. 6, 689-697
  53. McLachlan, A. D., and Stewart, M. (1976) J. Mol. Biol. 103, 271-298
  54. Freymann, D. M., Metcalf, P., Turner, M., and Wiley, D. C. (1984) Nature 311, 167-169
  55. Blum, M. L., Down, J. A., Gurnett, A. M., Carrington, M., Turner, M. J., and Wiley, D. C. (1993) Nature 362, 603-609
  56. Metcalf, P., Blum, M., Freymann, D., Turner, M., and Wiley, D. C. (1987) Nature 325, 84-86
  57. Vesterberg, O. (1969) Acta Chem. Scand. 23, 2653-2666
  58. Shih, S. J., and Nelson, D. L. (1992) J. Cell Sci. 103, 349-361
  59. Shih, S. J., and Nelson, D. L. (1991) J. Cell Sci. 100, 85-97
  60. Peterson, J. B., Heuser, J. E., and Nelson, D. L. (1987) J. Cell Sci. 87, 3-25
