(Received for publication, September 10, 1996, and in revised form, February 24, 1997)
From the Biochemistry Department, Dalhousie University, Halifax, Nova Scotia B3H 4H7, Canada
The proteasome-like ClpP protease is widely distributed and structurally conserved among bacteria and eukaryotic cell organelles. In Chlamydomonas eugametos, however, the chloroplast clpP gene predicted a much larger ClpP protein containing large insertion sequences (ISs). One insertion sequence, IS2, is 456 amino acid residues long and not similar to known proteins. Here we show that IS2 is an unusual intein, and its protein splicing activity in Escherichia coli cells can be activated by a single amino acid substitution. Analysis of the IS2 sequence revealed short sequence motifs that are similar to known intein motifs, including putative LAGLI-DADG endonuclease motifs. However, a histidine residue conserved at the C terminus of known inteins is replaced in the IS2 sequence by a glycine residue (Gly455), rendering the IS2 sequence incapable of detectable protein splicing when tested in E. coli cells. Changing Gly455 to histidine activated the ability of IS2 to undergo protein splicing in E. coli cells. The IS2 sequence (intein) was precisely excised from a precursor protein, with the flanking sequences (exteins) joined together by a normal peptide bond.
An intein is defined as a protein sequence embedded in frame within a precursor protein sequence that is spliced out during maturation (1). This maturation process, termed protein splicing, involves the precise excision of the intein sequence and the formation of a normal peptide bond joining the external sequences (exteins). Protein splicing can therefore be considered the protein equivalent of RNA splicing and adds another layer of complexity to the central dogma of molecular biology. Inteins have been found in all three kingdoms of living organisms and in an increasing number of proteins. These include the 69-kDa catalytic subunit of the vacuolar H+-ATPase (VMA) from Saccharomyces cerevisiae (2) and Candida tropicalis (7), RecA protein from Mycobacterium tuberculosis (3) and Mycobacterium leprae (4), DNA polymerases from Thermococcus litoralis (6, 8) and Pyrococcus sp. strain GB-D (1), a gyrase protein (GyrA) from some Mycobacterium species (5), and an unidentified protein from M. leprae (9). Recent determination of the complete genome sequence of the methanogenic archaeon Methanococcus jannaschii revealed 18 putative intein sequences encoded in 14 genes (30). Most inteins have sequence motifs similar to homing endonucleases of self-splicing introns (10, 11), and the S. cerevisiae VMA intein was shown experimentally to "home" into an intein-less allele (12). The inheritance of inteins could therefore be either horizontal (intein homing) or vertical (normal chromosomal transmission).
Many intein sequences were sufficient for protein splicing when placed
within a foreign or target protein immediately before a nucleophilic
residue (3, 13, 14), suggesting that all cis information
required for protein splicing is contained within the intein. Protein
splicing occurred when intein-containing proteins were produced in a
wide range of heterologous contexts, including an in vitro
splicing system (14), suggesting that the reaction is autocatalytic.
Identified inteins range in size from 360 to 537 amino acid residues
and have little sequence identity, although a number of short sequence
motifs have been recognized that show a significant degree of
conservation (9). Several models for protein splicing mechanisms have
been proposed (8, 14-17). They generally involve an N→O acyl shift
(or N→S shift) at the splice sites (18), formation of a branched
intermediate (19), and cyclization of an invariant Asn residue at the C
terminus of intein to succinimide (20), leading to excision of the
intein and ligation of the exteins.
ClpP is the catalytic subunit of the ATP-dependent and proteasome-like Clp protease that is widespread, if not ubiquitous, among prokaryotes and eukaryotic cell organelles (21-23). Genes for the ClpP protease have been characterized in many chloroplast genomes, and these invariably predicted ClpP proteins that are similar to bacterial and mitochondrial ClpP proteins in size and amino acid sequence (21, 24, 25). In the green alga Chlamydomonas, however, sequence determination of the chloroplast clpP gene revealed the presence of large translated insertion sequences (26). The Chlamydomonas eugametos chloroplast clpP gene predicted a protein sequence 1010 residues long, which is five times the size of a typical ClpP protein. This unusually large size was entirely due to the presence of large translated insertion sequences. The largest insertion sequence, IS2, is 456 residues long, is not present in several other Chlamydomonas species examined, and is not similar to known proteins. Here we show that IS2 is a modified intein with many of the recognized intein motifs. IS2 apparently lost an important histidine residue that is conserved in known inteins, and restoring this histidine residue activated the protein-splicing activity of IS2 in Escherichia coli cells.
A 2-kilobase pair
BamHI-XhoI (blunt-ended) DNA fragment,
corresponding to the 3′ two-thirds of the ClpP coding sequence, was cloned into the expression plasmid vector pMAL-c2 (New England Biolabs)
at the BamHI-SalI (blunt-ended) site, so that the
ClpP coding sequence was in frame with the upstream vector-encoded maltose-binding protein (MBP) coding sequence. Site-directed
mutagenesis was carried out by the polymerase chain reaction-mediated
site-directed mutagenesis method, using a mutagenic oligonucleotide
primer containing the desired mutations. The mutagenized DNA fragment
was used to substitute the corresponding DNA fragment in the
recombinant pMAL-c2 plasmids containing the ClpP-coding sequence. The
nucleotide sequence of the mutagenized DNA fragment was determined to
confirm the presence of the specified mutations and the absence of
unwanted mutations. Insertion of the ctag sequence was
carried out by first cutting the target DNA with restriction enzyme
SpeI, filling in the resulting 3′-recessed ends with Klenow
DNA polymerase and dNTPs, and then performing religation of the
resulting blunt ends.
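The cut, fill-in, and religation steps described above can be sketched on the top strand of a DNA sequence. This is a minimal illustration, not the authors' protocol: it assumes the standard SpeI recognition site ACTAGT with cleavage after the first A, and the flanking bases in the usage note are hypothetical. The function name `fill_in_and_religate` is introduced here for illustration only.

```python
SPEI_SITE = "ACTAGT"  # SpeI recognition sequence; the enzyme cuts A^CTAGT


def fill_in_and_religate(seq: str) -> str:
    """Simulate SpeI digestion, Klenow fill-in, and blunt-end religation.

    Only the top strand is tracked. Cutting leaves a 4-nt 5' overhang
    (CTAG) on each fragment; filling in the recessed 3' ends makes both
    fragments blunt, so religation duplicates the CTAG overhang.
    """
    seq = seq.upper()
    pos = seq.find(SPEI_SITE)
    if pos == -1:
        raise ValueError("no SpeI site found")
    if seq.find(SPEI_SITE, pos + 1) != -1:
        raise ValueError("more than one SpeI site; sketch handles one only")

    left = seq[: pos + 1]    # top strand up to and including the A
    right = seq[pos + 1 :]   # top strand starting with CTAGT
    # Klenow extends the recessed 3' end of the left fragment across the
    # 5' overhang of the bottom strand, adding CTAG to the top strand.
    filled_left = left + "CTAG"
    # The right fragment's top strand already starts with CTAG; its bottom
    # strand is filled in, which does not change the top-strand sequence.
    return filled_left + right


if __name__ == "__main__":
    # Hypothetical flanking bases around a single SpeI site.
    print(fill_in_and_religate("GGACTAGTCC"))
```

With the hypothetical input above, the site ACTAGT becomes ACTAGCTAGT, a net insertion of four bases (CTAG) that contains a TAG triplet. Whether that triplet falls in the reading frame depends on the actual coding sequence, which this sketch does not model.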
E. coli cells containing the recombinant plasmid were grown in liquid Luria broth medium at 37 °C to late log phase (A600, 0.5). IPTG was added at this time to a final concentration of 1 mM to induce the production of IS2-containing fusion proteins. After 2 h of induction, cells were harvested, lysed by sonication, and separated into soluble and insoluble fractions when required. Cellular proteins were resolved by SDS-polyacrylamide gel electrophoresis and visualized by staining with Coomassie Blue R-250. When needed, a protein band of interest was excised from the stained gel, and the protein was electroeluted. Western blotting analysis was carried out using the anti-MBP antiserum from New England Biolabs. The amount of protein in individual protein bands was estimated by using a gel documentation system (Gel Doc 1000 coupled with Molecular Analyst software, Bio-Rad).
For treatment with protease Factor Xa, electroeluted protein was first dialyzed against a Factor Xa reaction buffer (20 mM Tris-HCl, pH 8.0, 100 mM NaCl, 2 mM CaCl2). Factor Xa was then added at 1 µg per 10 µg of protein substrate, and the mixture was incubated at 23 °C for a specified length of time. Resulting protein fragments were resolved by SDS-polyacrylamide gel electrophoresis and transferred onto a polyvinylidene difluoride membrane by electrotransfer. The membrane was then stained in a Ponceau S staining solution (0.2% Ponceau S, 1% acetic acid in water) to visualize protein bands. The area of the polyvinylidene difluoride membrane containing the protein of interest was cut out and used for peptide analysis and protein sequencing.
Peptide analysis and protein microsequencing were carried out at the Microchemistry Facility of Harvard University. Proteins deposited on the polyvinylidene difluoride membrane were subjected to in situ (on the membrane) digestion with trypsin, and the resulting tryptic peptides were fractionated using high-performance liquid chromatography. Fractions suspected of containing peptides of interest were analyzed by mass spectrometry to determine the precise mass of the peptides, using matrix-assisted laser desorption time-of-flight mass spectrometry performed on a Finnigan (Hemel, United Kingdom) Lasermat 2000. When needed, peptides identified by mass were subjected to further identification by protein microsequencing.
The insertion sequence IS2
of C. eugametos chloroplast ClpP (26) was analyzed for the
possible presence of intein-like features. This analysis revealed short
sequence motifs (Fig. 1) that showed significant
similarity to each of the seven conserved sequence motifs (A to G)
recognized previously in known inteins (9), including motifs C and E
that are putative endonuclease motifs (10, 12). These intein-like
sequence motifs of IS2 are similar to corresponding motifs of known
inteins both in primary sequences and in relative positions, although
the remaining part of the IS2 sequence does not have significant
sequence similarity to any of the known inteins. The IS2 sequence is
bounded by a pair of nucleophilic residues (Cys and Ser) and has an Asn
residue at its C terminus. These three residues have been shown in
known inteins to be critical for protein splicing (14, 18-20, 27). A
His residue located immediately before the C-terminal Asn residue is
also conserved in known inteins and thought to be important for protein
splicing (13, 15). Notably, this position in the IS2 sequence is
occupied by a Gly residue.
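The boundary features just described (nucleophilic residues at the two splice junctions, a C-terminal Asn, and a penultimate His) can be checked mechanically on any candidate sequence. The sketch below is illustrative only: the function name `splice_junction_report` and the toy sequences are hypothetical and are not the IS2 sequence, and the nucleophile set is limited to the Cys and Ser residues named in the text.

```python
NUCLEOPHILES = {"C", "S"}  # Cys and Ser, as named in the text


def splice_junction_report(intein: str, c_extein: str) -> dict:
    """Report the splice-junction residues of a candidate intein.

    `intein` is the candidate intein sequence (one-letter code) and
    `c_extein` is the sequence immediately following it.
    """
    if len(intein) < 2 or not c_extein:
        raise ValueError("need at least two intein residues and one C-extein residue")
    intein, c_extein = intein.upper(), c_extein.upper()
    return {
        "intein_first_is_nucleophile": intein[0] in NUCLEOPHILES,
        "c_extein_first_is_nucleophile": c_extein[0] in NUCLEOPHILES,
        "c_terminal_asn": intein[-1] == "N",
        "penultimate_his": intein[-2] == "H",
        "penultimate_residue": intein[-2],
    }


if __name__ == "__main__":
    # Toy sequences (hypothetical): one ends in Gly-Asn, one in His-Asn.
    for name, toy in (("Gly-Asn ending", "CAAAGN"), ("His-Asn ending", "CAAAHN")):
        print(name, splice_junction_report(toy, "SLEF"))
```

A sequence ending in Gly-Asn passes every check except `penultimate_his`, which is the situation described for IS2, where Gly455 precedes the C-terminal Asn.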
Protein Splicing of IS2-containing Proteins in E. coli Cells
To examine whether the IS2 sequence can support protein
splicing, we constructed recombinant plasmids that encode
IS2-containing fusion proteins (Fig. 2). In addition to
the unmodified (wild-type) IS2 coding sequence, mutant IS2 coding
sequences generated through site-directed mutagenesis were also
included. One mutation changed a ggt codon to a
cat codon in the DNA sequence, resulting in a substitution
of the corresponding Gly residue (Gly455) by a His residue
in the protein sequence. This substitution is located immediately
before the C-terminal Asn residue of IS2. Another mutation inserted a
4-base sequence in the DNA, creating a translation termination codon
(tag) at this position located 150 base pairs before the
3′ end of the IS2 coding sequence. Each IS2 coding sequence was flanked
on both sides by in-frame coding sequences of other polypeptides: the
42-kDa MBP linked to a 105-residue ClpP sequence (SD2a) on the
N-terminal side and a 107-residue ClpP sequence (SD2b) on the
C-terminal side.
Each of these recombinant plasmids was introduced into E. coli cells to produce the corresponding fusion protein and to
observe possible protein-splicing products (Fig. 3). In
cells containing the unmodified IS2 sequence, only an unprocessed
precursor protein (117 kDa) was observed after staining with Coomassie
Blue (Fig. 3A), indicating that the wild-type IS2 sequence
was incapable of efficient protein splicing under these conditions.
When the IS2 sequence was modified by substituting the
Gly455 residue near its C terminus by a His residue, a
putative spliced protein (66 kDa) was readily observed (Fig. 3A,
lane 3), indicating that the single Gly→His substitution
activated the ability of IS2 to undergo protein splicing. As controls,
mutant IS2 sequences containing a termination codon, with or without
the Gly→His substitution, resulted in only a truncated precursor
protein. An excised IS2 polypeptide (calculated size, 55 kDa) was not
observed, presumably failing to accumulate to a detectable level under
the conditions used.
Accumulation of the 117-kDa precursor protein in cells containing the modified IS2 construct indicated that the protein splicing did not proceed to completion. Most of the 117-kDa precursor protein was found in the insoluble fraction of the cell lysate (Fig. 3B, lanes 2-4), suggesting that its accumulation was due to misfolding or formation of inclusion bodies. The 66-kDa spliced protein also accumulated in the insoluble fraction of the cell lysate. The insolubility of both proteins was most likely caused by the presence of incomplete ClpP sequences in these proteins. Small amounts of the 117-kDa precursor protein and the 66-kDa spliced protein remained soluble and could be partially purified by using an amylose resin affinity column (Fig. 3B, lane 5). This is consistent with the expectation that both proteins contain MBP that binds to amylose resin. We were unable to induce the purified precursor protein to undergo protein splicing in vitro, probably because the precursor protein was trapped in a misfolded conformation.
Western blotting was carried out as a more sensitive method to detect and identify the spliced protein, using an antiserum against the MBP sequence of the fusion proteins. Both the 117-kDa precursor protein and the 66-kDa spliced protein were recognized by the antiserum (Fig. 3B, lane 8), which further supported the assignment of these two proteins. Additional protein bands located between the 117- and 66-kDa bands were also recognized by the anti-MBP antiserum, and they appeared more abundant in cells containing the wild-type IS2 sequence than in cells containing the modified IS2 sequence (Fig. 3B, lanes 7 and 8). These bands likely represent protein-splicing intermediates or breakdown products. In cells containing the wild-type IS2 sequence, a Western blot revealed a weak band at a position corresponding to the 66-kDa spliced protein (Fig. 3B, lane 7). However, it was not determined whether this weak band represents a small amount of the spliced protein or merely a protein breakdown product that migrates to that position.
Identification of the Spliced Protein Product
The 66-kDa
protein was tentatively identified as the spliced protein because of
its size, its binding to amylose resin, and its recognition by anti-MBP
antiserum (Fig. 3). To confirm its identity further, the 66-kDa protein
was electroeluted from SDS-polyacrylamide gels (see Fig. 3A, lane
3) and subjected to further analysis. First, it was cleaved by the
sequence-specific protease Factor Xa, which cuts at a known site
located immediately after the MBP sequence. After incubation with
Factor Xa for various lengths of time, the 66-kDa protein was cleaved
into a 42-kDa fragment corresponding to MBP and a smaller fragment of
24 kDa in size (Fig. 4A).
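The band sizes reported above can be cross-checked against the construct layout (42-kDa MBP, 105-residue SD2a, 456-residue IS2, 107-residue SD2b) with a rough calculation. The sketch below assumes an average residue mass of about 110 Da, a common rule of thumb that is not taken from this paper, and it ignores any vector-encoded linker residues. The function name `estimate_band_sizes` is hypothetical.

```python
AVG_RESIDUE_KDA = 0.110  # assumption: ~110 Da per amino acid residue


def estimate_band_sizes(mbp_kda: float = 42.0, sd2a: int = 105,
                        is2: int = 456, sd2b: int = 107,
                        avg_kda: float = AVG_RESIDUE_KDA) -> dict:
    """Estimate the sizes (kDa) of the precursor, the spliced protein,
    and the non-MBP fragment released by Factor Xa."""
    exteins_kda = (sd2a + sd2b) * avg_kda  # SD2a joined to SD2b
    intein_kda = is2 * avg_kda             # IS2 alone
    return {
        "precursor": mbp_kda + exteins_kda + intein_kda,
        "spliced": mbp_kda + exteins_kda,
        "factor_xa_fragment": exteins_kda,
    }


if __name__ == "__main__":
    for name, kda in estimate_band_sizes().items():
        print(f"{name}: {kda:.1f} kDa")
```

Under these assumptions the estimates come out near 115, 65, and 23 kDa, in reasonable agreement with the observed 117-, 66-, and 24-kDa bands given the crudeness of a fixed per-residue mass.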
The 24-kDa fragment was predicted to be a fusion product of the ClpP sequence domains SD2a and SD2b (see Fig. 2) as a result of protein splicing. To confirm this, the 24-kDa fragment was blotted onto a polyvinylidene difluoride membrane and subjected to peptide analysis and protein sequencing. After digestion of the 24-kDa fragment with trypsin, four of the resulting tryptic peptides were analyzed by mass spectrometry to determine their precise masses. For each of the four peptides, the measured mass agreed closely with the theoretical mass of a predicted tryptic peptide of a spliced protein (Fig. 4B). Among the four peptides, peptides p1 and p2 are within the SD2a sequence domain, and peptide p4 is within the SD2b sequence domain, whereas peptide p3 spans the spliced junction (Fig. 4B). Peptide p3 was then definitively identified by determining the first 30 residues of its sequence by protein microsequencing (Fig. 4B).
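The logic of this peptide analysis (predict the tryptic peptides of the expected spliced product, compute their theoretical masses, and pick out the peptide that spans the splice junction) can be sketched as follows. This is an illustration under stated assumptions: trypsin is modeled with the common rule of cleavage after Lys or Arg unless followed by Pro, masses are neutral monoisotopic values from standard residue masses, and the extein sequences are hypothetical toy strings rather than the SD2a and SD2b sequences. All function names are introduced here for illustration.

```python
# Standard monoisotopic residue masses (Da)
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free N and C termini


def tryptic_peptides(seq: str) -> list:
    """Cleave after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        nxt = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(seq[start : i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])
    return peptides


def peptide_mass(peptide: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER


def junction_peptide(n_extein: str, c_extein: str):
    """Return the tryptic peptide of the spliced product that contains
    residues from both exteins, or None if trypsin cuts at the junction."""
    junction = len(n_extein)
    offset = 0
    for pep in tryptic_peptides(n_extein + c_extein):
        if offset < junction < offset + len(pep):
            return pep
        offset += len(pep)
    return None


if __name__ == "__main__":
    n_ext, c_ext = "AGKLTVDG", "SLEFRNAK"  # hypothetical toy exteins
    for pep in tryptic_peptides(n_ext + c_ext):
        print(pep, round(peptide_mass(pep), 3))
    print("junction-spanning peptide:", junction_peptide(n_ext, c_ext))
```

A junction-spanning peptide exists only in the spliced product, so a measured mass matching its theoretical value is evidence that the exteins were joined, which is the role peptide p3 plays in the analysis above.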
These results firmly identified the 66-kDa protein as a spliced protein produced by protein splicing, in which the IS2 sequence (intein) was precisely excised from the 117-kDa precursor protein, with the flanking sequences (exteins) joined together by a normal peptide bond (Fig. 4C). No RNA splicing was detected by Northern blotting or by a more sensitive reverse transcription-polymerase chain reaction method, although an unspliced RNA transcript was detected by Northern blotting (data not shown). This is consistent with the absence of recognizable RNA intron features in the IS2 coding sequence (26). Our observation that complete translation of the IS2 sequence is necessary for producing the 66-kDa spliced protein (Fig. 3A) also argues against RNA splicing and translational bypassing (28).
The insertion sequence IS2 of C. eugametos chloroplast ClpP protease was identified as an intein, not only by the presence in its sequence of recognizable intein motifs, but also by the observation of protein splicing of IS2-containing proteins in E. coli cells. Identification of this chloroplast-encoded intein extended the distribution of intein-coding sequences to a cell organelle genome. Previously identified intein-coding sequences were found in a nuclear genome (2, 7), eubacterial genome (3-5), and archaebacterial genome (6). A 150-residue insertion sequence in the chloroplast DnaB protein of Porphyra purpurea was noted previously to also have some of the putative intein motifs (9, 29), which may prove to be another intein encoded by a cell organelle genome. It was noted recently that inteins were found predominantly in proteins involved in DNA metabolism (5), which included DNA polymerases, a DNA recombinase, a DNA gyrase, and possibly a DNA helicase. The intein-containing ClpP protein, being a protease (21-24) not known to interact with DNA, is therefore an exception.
The ClpP IS2 intein apparently lost an important His residue (replaced by Gly455) that is highly conserved in known inteins. It takes a minimum of two nucleotide substitutions to change a His codon into a Gly codon, and no RNA editing is known in this organism. This suggests that the IS2 intein may be somewhat degenerate and may have accumulated additional, although less critical, mutations in other parts of its sequence. Although the IS2 intein apparently retains many of the putative intein motifs, the degree of overall sequence similarity between IS2 and other inteins is extremely low and practically undetectable. When only the seven conserved intein sequence motifs (a total of 93 positions) were compared, sequence identities between IS2 and other inteins range from 30% for the Pps1 intein of M. leprae, 25% for the VMA1 intein of yeast, and 19% for the RecA intein of M. tuberculosis to much lower for others. Low sequence similarities between IS2 and other inteins may suggest independent origins, long periods of separate evolution, or low functional constraint. Although the origin of the ClpP IS2 intein is not known, it probably was acquired by C. eugametos relatively recently, because several other, closely related Chlamydomonas species lack this intein (26). The ClpP IS2 intein contains two putative endonuclease motifs (C and E) that could be associated with intein mobility (intein homing), although endonuclease activity has not been detected.
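The statement that at least two nucleotide substitutions separate a His codon from a Gly codon can be verified by comparing every codon pair. The sketch below uses the standard genetic code assignments for His (CAT, CAC) and Gly (GGT, GGC, GGA, GGG); the function names are hypothetical.

```python
from itertools import product

HIS_CODONS = ("CAT", "CAC")
GLY_CODONS = ("GGT", "GGC", "GGA", "GGG")


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length codons differ."""
    if len(a) != len(b):
        raise ValueError("codons must have the same length")
    return sum(x != y for x, y in zip(a, b))


def min_substitutions(src_codons, dst_codons) -> int:
    """Smallest number of single-base changes between any codon pair."""
    return min(hamming(a, b) for a, b in product(src_codons, dst_codons))


if __name__ == "__main__":
    print(min_substitutions(HIS_CODONS, GLY_CODONS))
    # The engineered mutation in this study, ggt -> cat, in the same terms:
    print(hamming("GGT", "CAT"))
```

Every His codon differs from every Gly codon at both the first and second positions, so the minimum is two changes, and the engineered ggt to cat mutation is one of the two-change pairs.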
The ClpP IS2 intein was incapable of detectable protein splicing in
E. coli cells under conditions used in this study, whereas replacing the single Gly455 residue with a His residue
activated the protein-splicing activity. This suggests that a His
residue at this position plays an important role in protein splicing,
consistent with the observation that this His residue is highly
conserved among known inteins. Although an explicit role for this
conserved His residue has not been demonstrated in recent studies of
protein-splicing mechanisms, it was proposed in an earlier model that
this conserved His residue may assist the N→O acyl shift at the
splice sites (15). In the yeast VMA intein, experimentally replacing
this His residue by a Gly residue completely abolished its
protein-splicing activity, whereas replacement by Lys, Glu, Val, and
Leu residues resulted in a lower rate of protein splicing (13).
Interestingly, among the 18 putative intein sequences revealed recently
in the M. jannaschii genome sequence, two of them also lack
this conserved His residue (30).
In the ClpP IS2 intein, the natural substitution of the conserved His residue by a Gly residue (Gly455) raises questions about the in vivo protein-splicing activity of this intein. The conserved His residue may not be required for efficient ClpP protein splicing in its native chloroplast environment, and the in vivo splicing might be assisted by unknown trans-acting factors. A slow splicing reaction may provide a means of regulating the ClpP protease activity, and an unexcised IS2 intein might also be tolerated under some conditions. Our efforts to detect the chloroplast ClpP protein (precursor or spliced protein) in C. eugametos cells by Western blotting have not been successful, presumably due to a combination of the weak anti-ClpP antibodies used in the study and a low level of the ClpP protein in the cell.
We thank Z. Hu for technical assistance in Western blot analysis. We also thank Drs. F. Doolittle, M. Gray, R. Singer, and C. Wallace for helpful discussions of this work.