(Received for publication, August 2, 1995; and in revised form, October 27, 1995)
The 43,000-Da glycoprotein (gp43) of Paracoccidioides
brasiliensis is an immunodominant antigen for antibody-dependent
and immune cellular responses in patients with paracoccidioidomycosis.
In order to identify the peptide epitopes involved in the immunological
reactivities of the gp43 and to obtain highly specific recombinant
molecules for diagnosis of the infection, genomic and cDNA clones
representing the entire coding region of the antigen were sequenced.
The gp43 open reading frame was found in a 1,329-base pair fragment
with 2 exons interrupted by an intron of 78 nucleotides. The gene is
present in very few copies per genome, as indicated by Southern
blotting and chromosomal megarestriction analysis. A single transcript
of 1.5 kilobase pairs was verified in the yeast phase. The gene encodes
a polypeptide of 416 amino acids (M(r) 45,947) with a
leader peptide of 35 residues; the mature protein has a single N-glycosylation site. The deduced amino acid sequence showed
similarities of 56-58% with exo-1,3-β-D-glucanases
from Saccharomyces cerevisiae and Candida albicans. However, the gp43 is devoid of hydrolase activity and does not
cross-react immunologically with the fungal glucanases. Internal and
COOH-terminal gene fragments of the gp43 were expressed as recombinant
fusion proteins, which reacted with antibodies elicited against the
native antigen.
Paracoccidioides brasiliensis is a dimorphic fungus that causes paracoccidioidomycosis, a deep-seated infection prevalent in rural workers in several Latin American countries. The yeast phase is the infective form of the fungus which synthesizes antigenic heteropolysaccharides, glycoproteins, and glycolipids that may have a role in pathogenicity and interact with the immune system (1).
The main diagnostic antigen of paracoccidioidomycosis is an exocellularly secreted glycoprotein of 43,000 Da (gp43) which reacts with 100% of sera from patients with this mycosis in double immunodiffusion and immunoprecipitation reactions (2, 3). This molecule has been purified by immunoaffinity chromatography with an anti-gp43 monoclonal antibody (4). It contains immunodominant peptide epitopes that (a) react with human antibodies and are not affected by N-deglycosylation (5) and (b) elicit T-cell dependent delayed hypersensitivity reactions (6). It also induced the proliferation of T-CD4+ lymphocytes in mice primed with the antigen and of human peripheral lymphoid cells from a sensitized individual (7). The role of the gp43 in the pathogenicity of P. brasiliensis was suggested based on some of its properties. Thus, the gp43 is the main secreted component of the fungus that binds to murine laminin. It has been shown that laminin-coated yeast forms of this fungus show a marked increase in their ability to invade and destroy the infected tissues (8). As a high-mannose glycoprotein, detectable in the serum of patients with acute and chronic paracoccidioidomycosis (9), the gp43 and other fungal components binding to concanavalin A may act as metabolic inhibitors or cause a negative regulation of natural killer lymphocyte cytotoxicity (10). Finally, the gp43 has been associated with a proteolytic activity (4), not necessarily a property of the glycoprotein itself but that of an aggregated protease of very high specific activity (11). Early attempts to clone and express the gene encoding the gp43 (12) aimed at isolating a recombinant molecule that could be used in the immunodiagnosis, be sequenced for peptide epitope identification, and be tested as a virulence factor. The isolated clone, however, was unstable and could not be used in subsequent studies.
In the present work we report on the characterization of the complete sequence of the gene encoding the gp43 antigen from P. brasiliensis. Determination of the sequence of peptide fragments derived from the native molecule by enzymatic proteolysis permitted PCR amplification of a genomic fragment which was used as a probe to isolate the entire gene from a genomic library of the fungus. A genomic fragment corresponding to the COOH-terminal portion of the protein was cloned into pGEX plasmid, and the expression product reacted in immunoblots with anti-gp43 polyclonal rabbit and human patient antibodies.
Figure 1: The gp43 gene cloning strategy. A, the gp43 sequenced peptides are shown in their assumed positions as inferred from amino acid identities with S. cerevisiae glucanase. Degenerate, inosine-containing primers were designed based on the three internal peptides: OLCK53 (forward primer), OLCK59 and OLCK33 (reverse primers). The oligonucleotide pairs OLCK53/OLCK33 and OLCK53/OLCK59 were used for PCR amplification of P. brasiliensis genomic DNA and for RT-PCR. The black box represents the intron. B, ethidium bromide-stained agarose gel showing the PCR-amplified products from the genomic DNA template of P. brasiliensis using as primers the OLCK53/33 and OLCK53/59 pairs: fragments of 987 and 570 bp (lanes 2 and 3); and the 492-bp RT-PCR-amplified product (lane 4), obtained using as primers the pair OLCK53/59. Lane 1, molecular weight markers (λ HindIII-digested DNA and φX174 HaeIII-digested replicative form DNA, BRL). C, autoradiogram of the 570- and 492-bp PCR products from the Southern blot (lanes 2 and 3 stained with ethidium bromide) hybridized with the [α-32P]dATP-labeled 987-bp genomic probe. The difference in the size of the hybridized bands (lanes 4 and 5) corresponds to splicing of the 78-nt intron. Lane 1, M(r) markers.
Each reaction (25 µl) was carried out in 25 cycles. Optimized
conditions included denaturing at 94 °C (1 min), annealing at 50
°C (1 min), and extension at 72 °C (1.5 min). The PCR fragments
of interest were recovered from agarose gel by using the
Sephaglass(TM) kit (Pharmacia Biotech Inc.). When used as probes,
they were labeled with [α-32P]dATP using a
random-primer labeling kit (Life Technologies, Inc./BRL). Cloning was
carried out using the Sureclone(TM) kit (Pharmacia) and the
fragments were ligated into dephosphorylated M13 mp10 replicative form
DNA at the SmaI site and sequenced as described (18).
Intact chromosome-sized DNA molecules from yeast and mycelial forms (strain B339) were prepared by lysis and enzymatic proteolysis of cells embedded in low-melting agarose. The chromosomal DNA was digested with 20 units of NotI and SfiI restriction enzymes. The resulting megarestriction fragments were separated using Saccharomyces cerevisiae pulsed-field gel electrophoresis conditions. MegaBase I (chromosomal DNA from S. cerevisiae, Life Technologies, Inc.) was used as the molecular mass standard. The gels were transferred onto nylon membranes and the chromosomal blots were hybridized with the above-mentioned probe.
Figure 3: The nucleotide sequence of the gp43 gene and the deduced amino acid sequence of the 43,000-Da glycoprotein of P. brasiliensis. The TATAAA element, the CAAG motifs, the T + C-rich block, the stop codon, and the putative tripartite consensus sequence for the polyadenylation signal are in bold type. The amino acid sequence derived from the open reading frame starts at position nt +1 and ends at position nt +1329. The intron is written in lowercase letters. The leader peptide (residues 1-35) is underlined. The NH2-terminal and the internal peptide sequences obtained from the purified native antigen are indicated in bold italic type and numbered: NH2-terminal sequence (0), internal peptides (1, 2, and 3). The N-glycosylation site (NRT) is shown underlined and bold.
To isolate the entire gp43 gene (GP43G), the 987-bp PCR fragment was used as a probe to screen a P. brasiliensis genomic DNA library. An EcoRI genomic fragment of 3.8 kb carrying the gp43 gene was isolated and characterized (Fig. 2). This was subcloned into pUC18, giving rise to pUCGPb16A. Further analysis of pUCGPb16A with restriction enzymes and Southern blotting probed with the 987-bp fragment localized the gene in a BglII-XbaI fragment of approximately 2.8 kb.
Figure 2: Restriction map and sequenced region of the genomic insert of clone pUCGPb16A. A, relative positions of the restriction endonuclease sites: E, EcoRI; Bg, BglII; B, BamHI; S, SmaI; H, HindIII; X, XbaI. B, the 1,981-bp sequenced fragment of the genomic insert containing the entire gp43 gene, showing the 5′-upstream region (nts -326 to -1), the gp43 coding region (nts +1 to +1329), the two exons (1 and 2) separated by the 78-nt intron (nts +464 to +541), and the 3′-untranslated region (+1330 to +1655).
The complete nucleotide sequence of the gp43 gene and its 5′- and 3′-flanking regions is shown in Fig. 3. The deduced amino acid sequence consisted of 416 residues with an estimated molecular mass of 45,947 Da. The choice of the initiation codon (ATG) was based on the following evidence: (a) the sequence upstream of the initiation site contains stop codons in all three reading frames. Furthermore, as discussed below, this region carries a putative promoter and other translational initiator consensus sequences. (b) The predicted protein size without the signal peptide (42,281 Da) is comparable to that of the mature gp43 protein.
The coding region of the gp43 gene is interrupted by an intron of 78 nt (Fig. 3), with 5′ and 3′ extremities at positions +464 and +541, respectively. The 5′ and 3′ extremities of the intron presented the GT/AG consensus (25). It is interesting to note the presence in the intron of an 11-nt motif (TAGAATATCTC) which was also found perfectly repeated in the 3′-untranslated region of the gp43 gene (positions 1601 to 1612). Several repetitions of the minimotif (TA) were also detected in the intron. The coding region of the gp43 gene showed an A + T content of 48.2%, whereas its 5′- and 3′-flanking regions gave higher A + T contents of 57.5 and 63.8%, respectively.
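The coordinates quoted above can be cross-checked with simple arithmetic. The short sketch below assumes that the intron boundaries are inclusive and that the stop codon is counted within nt +1 to +1329; both are assumptions made for this illustration, and the variable names are illustrative only.

```python
# Coordinates quoted in the text (nt, with the A of the ATG taken as +1).
ORF_START, ORF_END = 1, 1329
INTRON_START, INTRON_END = 464, 541

# Assumption: intron boundaries are inclusive.
intron_len = INTRON_END - INTRON_START + 1

# Length of the spliced open reading frame.
spliced_len = (ORF_END - ORF_START + 1) - intron_len

# Assumption: the final codon of the spliced frame is the stop codon.
codons, remainder = divmod(spliced_len, 3)
residues = codons - 1

# Sizes of the genomic and RT-PCR products obtained with OLCK53/59 (bp).
genomic_pcr, rt_pcr = 570, 492

print(intron_len, spliced_len, codons, remainder, residues)
print(genomic_pcr - rt_pcr)
```

Under these assumptions the spliced frame is a whole number of codons and gives 416 residues, and the difference between the genomic and RT-PCR products equals the intron length.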
Analysis of the 5′ 326-bp flanking sequence of the gp43 gene revealed structural features characteristic of the promoter regions of eukaryotic genes. A TATA element (TATAAATA) is at position -80 and a T + C-rich pyrimidine block is found immediately downstream (nt -71 to -40). The CAAG motif (26) is found once, 14 nt downstream of the TC block (nt -25), and twice upstream, at positions -195 and -259.
The 3′ region immediately downstream from the gp43 open reading frame contains motifs thought to be necessary for termination of transcription, processing, and addition of poly(A) at the 3′ terminus. Although a perfect match to the described eukaryotic polyadenylation consensus sequence (27), 5′-AATAAA-3′, is not found, a similar pentanucleotide (5′-AATAA-3′) is observed at position 1616, 276 nt downstream from the termination codon. In addition, a tripartite sequence (TAG . . . TATTT . . . TTT) was identified between nucleotides 1354 and 1388. This sequence presents a high degree of homology to a tripartite consensus sequence (TAG . . . TATGT . . . TTT) that has been postulated to be a signal for termination and/or polyadenylation in S. cerevisiae (28, 29). The size of the gp43 transcript deduced from the nucleotide sequence analysis is in agreement with the estimated size of transcript (approximately 1.5 kb) detected in the Northern blot hybridization (see Fig. 7).
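A motif search of this kind can be sketched with regular expressions. The example below is only an illustration: the toy sequence is invented, the function name is hypothetical, and the upper limit of 30 nt allowed between the three parts of the tripartite element is an assumption chosen for the sketch, not a value taken from the literature.

```python
import re

# AATAA-like pentanucleotide (note that it also matches inside AATAAA).
POLYA_LIKE = re.compile(r"AATAA")

# Tripartite element TAG ... TAT(G/T)T ... TTT.
# Assumption: at most 30 nt between consecutive parts.
TRIPARTITE = re.compile(r"TAG.{0,30}?TAT[GT]T.{0,30}?TTT")


def scan_3prime(utr):
    """Return 1-based positions of candidate 3' processing signals."""
    utr = utr.upper()
    return {
        "polyA_like": [m.start() + 1 for m in POLYA_LIKE.finditer(utr)],
        "tripartite": [(m.start() + 1, m.end()) for m in TRIPARTITE.finditer(utr)],
    }


# Invented toy sequence, not the gp43 3' region.
toy_utr = "CCTAGCCCTATTTCCGGTTTCCAATAACC"
hits = scan_3prime(toy_utr)
print(hits)
```

Positions are reported relative to the start of the supplied string, so an offset would have to be added to express them in the numbering of Fig. 3.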
Figure 7: Northern blot hybridization. P. brasiliensis total RNA (lane 2) and poly(A)+ RNA (lane 3) were isolated from 5-day-old yeast cells and probed with the radiolabeled 987-bp PCR fragment. Lane 1, Trypanosoma cruzi total RNA (specificity control and M(r) marker); lane 4, an RNA ladder. A single transcript of approximately 1.5 kb hybridized.
The gp43 protein has 41.8% nonpolar, 34.5% polar noncharged, 12.8% positively charged, and 9.7% negatively charged residues. The codon usage in the gp43 gene showed a relaxed bias toward preferential codons for the amino acids Ala, Ser, Thr, Val, Ile, Arg, Leu, and Pro. Of the 61 possible codon triplets, all are used in the gp43 gene. From the calculations described by Bennetzen and Hall (30) we obtained a codon bias index of 0.476 for the gp43 gene, which for yeasts would represent a gene of moderate level of expression.
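A codon bias index of the Bennetzen and Hall type can be sketched as the ratio (N_pfr - N_rand)/(N_tot - N_rand), where N_pfr counts occurrences of preferred codons, N_rand is the number expected if synonymous codons were used equally, and N_tot counts the codons of amino acids that have a preferred codon. The code below is a minimal illustration under that reading of the formula; the function name is hypothetical and the preferred-codon set in the usage lines is an invented placeholder, not the yeast set used for the gp43 calculation.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_bias_index(cds, preferred):
    """Sketch of a codon bias index for an in-frame coding sequence.

    Assumption: amino acids with no codon in `preferred` are excluded.
    """
    cds = cds.upper().replace("U", "T")
    n_pref = 0
    n_tot = 0
    n_rand = 0.0
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        aa = CODE.get(codon)
        if aa is None or aa == "*":
            continue
        synonymous = [c for c, a in CODE.items() if a == aa]
        n_preferred_syn = sum(c in preferred for c in synonymous)
        if n_preferred_syn == 0:
            continue
        n_tot += 1
        n_pref += codon in preferred
        # Expected preferred-codon count under equal synonymous usage.
        n_rand += n_preferred_syn / len(synonymous)
    if n_tot == n_rand:
        raise ValueError("index undefined for this sequence and codon set")
    return (n_pref - n_rand) / (n_tot - n_rand)


# Hypothetical preferred set: GCT taken as the only preferred Ala codon.
print(codon_bias_index("GCTGCTGCTGCT", {"GCT"}))
print(codon_bias_index("GCTGCCGCAGCG", {"GCT"}))
```

With this definition a sequence using only preferred codons scores 1 and a sequence using synonymous codons equally scores 0, so an intermediate value such as 0.476 falls between the two extremes.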
There were two potential N-glycosylation sites at residues 2 and 195 (residues 2-4, NFS, and 195-197, NRT, Fig. 3). Since amino acid 2 is within the signal sequence the mature gp43 should contain a single N-glycosylation signal. The net charge of the gp43 gives a pI of 7.43. The polypeptide is composed of alternating hydrophobic and hydrophilic regions, which is consistent with the water soluble character of the gp43; there is no hydrophobic sequence in the mature protein long enough to span the membrane.
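The search for potential N-glycosylation signals can be illustrated with a short scan for the sequon N-X-S/T, where X is any residue except proline. This is a sketch only: the function names and the toy sequence are invented for the example, and exclusion of proline at the X position is the usual convention rather than something stated in the text.

```python
import re

# Lookahead so that overlapping sequons are all reported.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(protein):
    """Return (1-based position, tripeptide) for each N-X-S/T sequon."""
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(protein.upper())]


def mature_sequons(protein, leader_len):
    """Keep only sequons that lie beyond the leader peptide."""
    return [(pos, tri) for pos, tri in find_sequons(protein) if pos > leader_len]


# Invented toy sequence with a 4-residue "leader"; not the gp43 sequence.
toy = "MNFSAAANRTAANPSA"
print(find_sequons(toy))
print(mature_sequons(toy, leader_len=4))
```

Applied to a full precursor sequence with a leader length of 35, the same filter would discard a site at residue 2 and retain one at residue 195.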
Figure 4: Immunological reactivity of recombinant proteins and anti-recombinant protein antiserum. The subfragments of the internal (residues 110-272) and COOH-terminal (residues 288-411) domains of the gp43 were fused with GST. In A, the reaction of an anti-GST rabbit antiserum with the recombinant proteins (lanes 1 and 2; GST alone, lane 3) confirms that the products are expressed as fusion proteins; a rabbit polyclonal monospecific anti-gp43 antiserum (anti-gp43R) and sera (pool) from patients (anti-gp43H) recognize the recombinant proteins in immunoblots (lanes 5, 6 and 9, 10, respectively). Lanes 4, 8, and 12, plasmid-less bacterial extracts. Lanes 7 and 11, GST. In B, antibodies generated in rabbits against the recombinant antigen carrying the gp43 COOH-terminal sequence recognized the native gp43 in immunoblots of P. brasiliensis culture filtrate (lane 3) as shown by 125I-protein A binding; lanes 1 and 2, reactions with rabbit preimmune serum and with the monospecific rabbit antiserum (anti-gp43R), respectively.
Figure 5: Alignment of the amino acid sequence of the gp43 with known sequences. Alignment of the gp43 deduced amino acid sequence with known sequences of cloned fungal exoglucanases: S. cerevisiae, vegetative (EXG1), and spore specific (SPR1) exoglucanases, and C. albicans exoglucanase (CAXOG). The alignment was performed by the J. Hein method. Asterisks and dots indicate identical and similar residues, respectively.
Figure 6: Genomic organization of the gp43 gene. A, autoradiogram of a Southern blot of P. brasiliensis genomic DNA digested with restriction enzymes and probed with the 987-bp fragment: 1, EcoRI, a single fragment of 3.8 kb, which corresponds to the size of the insert of the recombinant λgt11 clone; 2, BamHI; 3, HindIII; 4, SmaI; 5, EcoRI and BamHI, double digestion; 6, EcoRI and HindIII; 7, EcoRI and SmaI; 8, BamHI and HindIII; 9, BamHI and SmaI; 10, HindIII and SmaI; 11, EcoRI, BamHI, and HindIII, triple digestion; 12, EcoRI, BamHI, and SmaI. B, an ethidium bromide-stained pulsed-field gel of chromosomal megarestriction fragments from P. brasiliensis mycelium (M) (lanes 2 and 4) and yeast phase (Y) (lanes 3 and 5). Digestions employed the rare-cutting enzymes NotI (lanes 2 and 3) and SfiI (lanes 4 and 5). S. cerevisiae intact chromosomes were used as molecular weight markers (lane 1). The corresponding autoradiogram shows that the NotI-digested M and Y chromosomes hybridized with a single 210-kb fragment (lanes 7 and 8). The M and Y SfiI digests hybridized with two fragments of approximately 330 and 440 kb (lanes 9 and 10). Lane 6, S. cerevisiae DNA.
The Northern blot hybridization of the total and poly(A)+ RNA using the same probe showed, on the other hand, a single transcript, a well defined band of approximately 1.5 kb which is large enough to encode a protein of 45,947 Da (Fig. 7).
In this report we describe the cloning and characterization of genomic and cDNA recombinant clones representing the entire coding region of the exocellularly secreted glycoprotein of 43,000 Da, previously described as the main diagnostic antigen of the dimorphic pathogenic fungus P. brasiliensis (2).
So far, this is the first gene of the fungus to be cloned, entirely sequenced, and expressed in bacteria as recombinant fusion proteins. Several lines of evidence indicate that the sequenced clones encode the gp43 antigen. First, the deduced amino acid sequence contains all the previously characterized partial peptide sequences of the native protein. Furthermore, it is remarkably consistent with previous analyses of amino acid composition in different preparations of the native gp43 (7). Second, different regions of the gp43 protein, expressed as fusion proteins in E. coli, reacted strongly with rabbit antiserum against the gp43 native protein and with sera from patients with paracoccidioidomycosis. Conversely, the antiserum elicited against the gp43 recombinant fusion protein specifically recognized the native gp43 by immunoblotting of the P. brasiliensis culture filtrate. Third, Northern hybridization using the cloned DNA as a probe identified a transcript of 1.5 kb, which is large enough to encode a protein of the size predicted for native gp43.
The deduced gp43 open reading frame encodes a polypeptide of 416 amino acids (M(r) 45,947), with a leader peptide of 35 amino acids which was defined by peptide sequencing of the NH2 terminus of the mature protein. The leader peptide has a hydrophobic sequence and a putative signal peptide cleavage site after residue 23 (Ala-Ser) (33, 34). If the signal peptidase cleavage observes these rules there could be a second post-translational processing of residues 24-35, which are not present in the secreted antigen. The specificity of the second proteolysis step would differ from that of the Kex2-like proteinases which process the homologous fungal exoglucanases at a cleavage site consisting of basic amino acid residues, KR (24, 32, 35). This basic pair is absent in the gp43 leader peptide (Fig. 3).
Analysis of the sequences upstream and downstream from the gp43 open reading frame showed several motifs similar to the consensus elements important for the control of transcription in eukaryotic organisms. The CAAG motif, closely associated with the TC block, is correlated with a high level of expression of certain yeast genes (26, 36).
The presence of an intron was demonstrated in the GP43G, defining two exons, 1 and 2. The homologous fungal exo-1,3-glucanase genes of S. cerevisiae and C. albicans have no introns. The presence of the intron in the P. brasiliensis DNA may point to transcriptional regulation. In fact, P. brasiliensis morphological transition is dependent on temperature, and therefore adaptation to temperature shifts and environmental stress is essential for the pathogen to survive in mammalian tissues. It is well known that heat shock affects RNA metabolism, including RNA processing as well as mRNA degradation. In dimorphic fungi, those changes require responses at the gene level and DNA sequences containing introns could play a vital role in adaptation to the new environment (37).
In spite of the significant identity shared at the amino acid level with glucanases of the vegetative forms of S. cerevisiae and C. albicans, and with S. cerevisiae spore-specific glucanase, we could not demonstrate any glucanase activity in the native gp43 molecule using different substrates. It is also noteworthy that no immunological cross-reactivity could be detected between gp43 and the glucanases of the other fungi.
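Percent identity between two aligned sequences can be computed as in the sketch below. It assumes that the sequences have already been aligned, that gaps are written as "-", and that identity is expressed over the columns in which neither sequence has a gap; other denominators are in use, so values obtained this way need not match those of a particular alignment program. The function name and the toy alignment are invented for the example.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical residues over ungapped columns of an alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Assumption: columns with a gap in either sequence are ignored.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if x != gap and y != gap]
    if not columns:
        return 0.0
    identical = sum(x == y for x, y in columns)
    return 100.0 * identical / len(columns)


# Invented toy alignment; not taken from Fig. 5.
toy_a = "MKT-LLEP"
toy_b = "MRTALL-P"
print(round(percent_identity(toy_a, toy_b), 1))
```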
Although the homologies among blocks of amino acid residues of the aligned glucanases and the gp43 might suggest functional domains, identification of the amino acid sequences and conformation of the catalytic and binding domains of the fungal exo-1,3-glucanases is not possible at present. In many highly unrelated bacterial 1,4-β-endoglucanases and in two fungal endoglucanases, however, the amino acid sequence NEP flanked by hydrophobic amino acid sequences is a conserved structure. Another similar sequence, LEP, was noted in bacterial and fungal glucanases. The NEP sequence is essential for the catalytic activity as demonstrated by site-directed mutagenesis of this conserved motif in two highly unrelated bacterial endo-β-glucanases (38). It has been suggested that the E residue of the NEP sequence could be the proton donor in the hydrolysis process. In fact, the NEP sequence is similar to the MNEP sequence that exists in human lysosomal α-glucosidase and human and rabbit isomaltase and sucrase and reacts with the sugar unit of the substrate in the pyranose configuration (38, 39). The LEP and NEP sequences are conserved in C. albicans (residues 64-66 and 229-231, respectively) and S. cerevisiae EXG1 (residues 64-66 and 231-233) and SPR1 (residues 65-67 and 232-234) exo-1,3-glucanases (24, 31, 32). The gp43 has the LEP sequence (residues 51-53), but the NEP is altered to NKP (residues 207-209). Such a difference in the amino acid sequence at the catalytic site, with a basic amino acid replacing an acidic one, can itself account for the absence of glucanase activity in the gp43 protein. With the present evidence, one can suggest that the fungal glucosidases as well as the gp43 may have had a common ancestral gene and that a divergent evolutionary processing of these molecules has occurred. In the absence of a functional glucosidase activity, the gp43 may have been conserved mainly as an immunomodulating antigen and a virulence factor.
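The comparison of the LEP and NEP motifs can be illustrated with a small sketch that locates tripeptide motifs and reports charge changes between a reference motif and a variant. The function names, the simplified charge table (only D, E, K, and R are treated as charged), and the toy sequence are assumptions made for this example.

```python
# Simplified side-chain charges at neutral pH (assumption: His is ignored).
CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}


def motif_positions(seq, motif):
    """Return 1-based start positions of every occurrence of motif in seq."""
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start + 1)
        start = seq.find(motif, start + 1)
    return positions


def charge_changes(reference, variant):
    """List (position in motif, ref residue, variant residue, ref charge,
    variant charge) for each substituted position."""
    if len(reference) != len(variant):
        raise ValueError("motifs must have equal length")
    return [(i + 1, r, v, CHARGE.get(r, 0), CHARGE.get(v, 0))
            for i, (r, v) in enumerate(zip(reference, variant)) if r != v]


# Invented toy sequence; not the gp43 sequence.
toy = "AALEPAAANKPAA"
print(motif_positions(toy, "LEP"), motif_positions(toy, "NEP"),
      motif_positions(toy, "NKP"))
print(charge_changes("NEP", "NKP"))
```

For NEP versus NKP the only substitution is at the second position, where a negatively charged residue is replaced by a positively charged one.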
The two GST-gp43 fusion proteins represent different parts of the whole antigen and were both recognized by rabbit monospecific polyclonal antibodies and by human sera from patients with paracoccidioidomycosis, indicating that the recombinant proteins shared epitopes with native gp43 including those recognized by antibodies from human patients. Therefore, at least two different B cell epitopes must be present in the gp43 molecule. The highest peak of hydrophilicity, computed using an average group length of 6 amino acids, which might be considered as a third epitope, corresponds to the sequence GRDAKR (residues 78-83). This sequence is 1 amino acid shorter than the homologous sequences of C. albicans and S. cerevisiae vegetative glucanases (Fig. 5) which are in turn more hydrophobic. Such variations and others detected by comparing the hydropathic profiles of the three amino acid sequences justify the absence of immunological cross-reactivity of the glucanases with antibodies elicited against the gp43. Recombinant molecules containing peptide epitopes of the gp43 can, on the other hand, be helpful to increase the specificity of the diagnostic antigen since all reported cross-reactivities depended on the N-linked carbohydrate chain of the glycoprotein (3).
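A sliding-window hydrophilicity profile with a group length of 6 can be sketched as follows. The text does not name the scale that was used; the example assumes the Hopp and Woods hydrophilicity values, and the function names and the toy sequence (in which the hexapeptide GRDAKR is embedded in invented hydrophobic flanks) are illustrative only.

```python
# Hopp-Woods hydrophilicity values (assumed scale; not stated in the text).
HOPP_WOODS = {
    "R": 3.0, "D": 3.0, "E": 3.0, "K": 3.0, "S": 0.3, "N": 0.2, "Q": 0.2,
    "G": 0.0, "P": 0.0, "T": -0.4, "A": -0.5, "H": -0.5, "C": -1.0,
    "M": -1.3, "V": -1.5, "I": -1.8, "L": -1.8, "Y": -2.3, "F": -2.5,
    "W": -3.4,
}


def hydrophilicity_profile(seq, window=6):
    """Return (1-based start, segment, mean hydrophilicity) per window."""
    seq = seq.upper()
    profile = []
    for i in range(len(seq) - window + 1):
        segment = seq[i:i + window]
        mean = sum(HOPP_WOODS[aa] for aa in segment) / window
        profile.append((i + 1, segment, mean))
    return profile


def top_window(seq, window=6):
    """Return the window with the highest mean hydrophilicity."""
    return max(hydrophilicity_profile(seq, window), key=lambda w: w[2])


# Invented toy sequence; only GRDAKR is taken from the text.
toy = "LLVVGRDAKRLLVV"
start, segment, score = top_window(toy)
print(start, segment, round(score, 2))
```

With the assumed scale, the four charged residues of GRDAKR dominate the window average, which is why such a segment stands out as a hydrophilicity peak.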
Genomic Southern blot hybridization indicated that the gp43 is encoded by a gene with very few copies, similarly to the homologous fungal exo-1,3-glucanase genes (24, 31, 32). P. brasiliensis is a multinucleate organism both in the mycelial and yeast phases (40). The intensity of the hybridization signals obtained with the chromosome SfiI megarestriction fragments was very similar (Fig. 6B), indicating that they may represent two alleles of the GP43G and therefore suggesting that both the yeast and mycelial phases of P. brasiliensis could be at least diploid. These results are of interest as there is no information about the ploidy of the fungus.
We have previously described the gp43 from P. brasiliensis as the most specific antigen in the paracoccidioidomycosis-P. brasiliensis system (2). It is presently being used in a variety of serological tests for diagnostic purposes (41, 42). The present study opens the perspective of analyzing the individual peptides and epitopes that play a role in the interaction of the gp43 with cells of the immune system, antibodies, and elements of the extracellular matrix. Recognition of individual epitopes that elicit a favorable immunological response can contribute to the immunotherapy of this systemic mycosis. The recombinant molecule and/or the selected epitopes cloned and amplified will be extremely helpful to define both the specificities and their functional role in the biology of this pathogenic fungus. The primary sequence of the gp43 will also permit evolutionary studies involving related molecules, including the glucanases of bacteria and fungi.
The nucleotide sequence reported in this paper has been submitted to the GenBank(TM)/EMBL Data Bank with accession number U26160.