(Received for publication, May 31, 1995; and in revised form, August 3, 1995)
From the
Mucins are heavily O-glycosylated Thr/Ser/Pro-rich molecules. Given their relevant functions, mucins and their genes have been mainly studied in higher eukaryotes. In the protozoan parasite Trypanosoma cruzi, mucin-like glycoproteins were shown to play an important role in the interaction with the surface of the mammalian cell during the invasion process. We show now that this parasite has a family of putative mucin genes, whose organization resembles the one present in mammalian cells. Different parasite isolates have different sets of genes, as defined by their central domain. Central domains, rich in codons for Thr and/or Ser and Pro residues, are made up of either a variable number of repeat units in tandem or non-repetitive sequences. Conversely, 5′- and 3′-ends from different genes in different isolates have similar sequences, suggesting their common origin. Comparison of deduced amino acid sequences revealed that all members of the family have the same putative signal peptide on the N terminus and a putative sequence for glycophosphatidylinositol anchoring on the C terminus. The deduced molecular mass of the core proteins is small (from 17 to 21 kDa), in agreement with the 1-kilobase size of the mRNA detected. Putative mucin genes in T. cruzi are located on large chromosomal bands of about 1.6-2.2 megabase pairs.
Mucins are highly glycosylated proteins expressed by most
secretory epithelial tissues in vertebrates. They consist of a core
protein moiety where a number of carbohydrate chains are attached to
serines and threonines by -1-3 O-glycosidic
bonds (1). The complex structure of these glycoproteins made
the identification of genes encoding the protein moiety more difficult.
However, several MUC-like genes have been isolated recently due to the
fact that they have a defined basic structure and sequence, which
allows their inclusion in a gene family. MUC-like genes in vertebrates
are essentially composed of a central domain and 5′- and 3′-flanking
sequences (2, 3). The central domains, comprising up
to 70% of the coding sequences, are composed of tandemly repeated units
enriched in codons for Ser and Thr, which are the target sites for O-glycosylation in the protein product, as well as for Pro
residues (4). Sequences flanking the central domain, on the 5′-
and 3′-ends of mucin genes, lack repeated sequences.
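The tandem-repeat organization described above can be illustrated with a minimal Python sketch that finds the longest run of back-to-back exact copies of a repeat unit in a protein sequence. The function name `longest_tandem_run`, the repeat unit, and the toy sequence are hypothetical and chosen only for illustration; they are not sequences from this work, and real repeats are often degenerate, which exact matching would miss.

```python
def longest_tandem_run(sequence: str, unit: str) -> tuple:
    """Return (start_index, copy_count) for the longest run of
    back-to-back exact copies of `unit` in `sequence`.

    Returns (-1, 0) when the unit does not occur at all.
    """
    if not unit:
        raise ValueError("unit must be a non-empty string")
    best_start, best_count = -1, 0
    k = len(unit)
    i = 0
    while i <= len(sequence) - k:
        if sequence.startswith(unit, i):
            # Count consecutive copies starting at position i.
            count = 0
            j = i
            while sequence.startswith(unit, j):
                count += 1
                j += k
            if count > best_count:
                best_start, best_count = i, count
        # Advance by one residue so runs in another phase are not skipped.
        i += 1
    return best_start, best_count


# Toy example (hypothetical sequence): non-repetitive flanks around
# five copies of an invented Thr/Pro-rich unit.
toy_protein = "MAG" + "TTKP" * 5 + "SSA"
start, copies = longest_tandem_run(toy_protein, "TTKP")
print(start, copies)
```

Exact matching keeps the sketch short. Comparing such runs between alleles or isolates is one simple way to see a variable number of tandem repeats, under the assumption that the unit is known in advance.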
The percentage of amino acid identity among different mucin core proteins is low. No substantial identities were found among the repeats in different molecules. The repeats are unique in size and sequence for each member of the mucin family, even though they contain many Ser and Thr residues, suggesting that their only function is to serve as a scaffold for O-linked glycans (3). Furthermore, different individuals have a variable number of repeated units in homologous core proteins, making the loci coding for mucins highly polymorphic among individuals (5, 6). Partial sequence identities were found in defined regions of the N and C termini. For example, significant identities were observed between the deduced amino acid sequences from MUC2 and putative MUC5 human mucins (7), the porcine and bovine submaxillary mucins (8, 9), and the cysteine-rich C-terminal regions of rat intestinal mucin-like and human MUC2 peptides (10, 11, 12).
Genes encoding molecules that have mucin-like features in lower eukaryotes have been detected in Leishmania major (13) and Trypanosoma cruzi (14). Particularly in T. cruzi, the etiological agent of Chagas disease, much work has been done on the biochemical and functional characterization of mucin-like surface glycoconjugates (15). These heavily O-glycosylated molecules are Thr-, Ser-, and Pro-rich and are attached to the membrane by a glycophosphatidylinositol anchor (16). Mucins in T. cruzi are the major acceptors of sialic acid in a reaction catalyzed by trans-sialidase (15, 16). Recent evidence suggests that these molecules are involved in the cell invasion process, probably mediating adhesion of the parasite to the mammalian cell surface (17, 18). We have previously identified a putative mucin gene in T. cruzi (14) having a small size and encoding five repeat units with the consensus sequence TKP. In this work, we show that T. cruzi has, in fact, a putative mucin gene family resembling the one present in vertebrate cells. Its members have a Thr/Ser/Pro-rich central domain, which might or might not be organized in repetitive units, and highly conserved non-repetitive flanking domains.
Figure 5: Schematic scale representation of predicted regions for the different mucin-like deduced polypeptides. The deduced protein domains from genes MUC.M/76, MUC.CA-3, MUC.CA-2, MUC.RA-1, and MUC.RA-2 are represented. Percentages of similarity with the MUC.M/76 deduced amino acid sequence are indicated for regions between dashed lines. Calculations were done using the program DNASTAR (DNASTAR Inc.) with sequences aligned by the Jotun Hein method (34). To reduce the number of figures, the locations of oligonucleotides used for PCR and of probes are indicated above their corresponding polypeptide sites.
MUC.RA-1 and -2 clones were obtained from the T. cruzi RA
strain by PCR using oligonucleotides P1 and P2. PCR was
performed using Vent DNA polymerase (New England Biolabs) according to
the manufacturer's instructions, with an annealing temperature of 60 °C.
The PCR product was a single band of 800 base pairs that
was cloned into EcoRV-digested pBluescript KS(+) vector.
Figure 1: Genomic DNA blots. Genomic DNA of strains RA (lanes R) and CL (lanes L) and the cloned stocks SylvioX-10/7 (lanes S) and CA1/72 (lanes C) of T. cruzi was digested with AvaII, electrophoresed, and blotted onto nitrocellulose filters. Identical blots were analyzed using different probes. Panel 1, "repeats" probe from MUC.M/76 clone; panel 2, complete MUC.CA-2 probe; panel 3, central MUC.RA-2 probe lacking the 5′- and 3′-conserved flanking sequences. Probes are described in more detail under "Experimental Procedures."
Figure 2: Comparison of proteins deduced from representative genes cloned from different T. cruzi strains. Deduced amino acid sequences for clones MUC.M/76, MUC.CA-3, MUC.RA-1, and MUC.RA-2 are presented. Regions composed of perfect repeats, with the TKP consensus sequence, are indicated with a gray bar. Amino acid identities with the original MUC.M/76 gene are indicated with colons, and gaps introduced for best alignment are indicated with dashes. Total sizes, in amino acids, are indicated at the end of each sequence. Alignments and computer-aided analyses were done using the program DNASTAR (DNASTAR Inc.). The published sequence of MUC.M/76 (17) was found to be incorrect for the last 40 residues. The sequence given in this figure is correct. V.N.T.R., variable number of tandem repeats.
These results suggest that different isolates and clones of T. cruzi might have related sequences sharing 5′- and 3′-ends but differing in their central regions. To test this possibility, one strain of T. cruzi (RA), which did not hybridize with the repeated region of the MUC.CA genes, was analyzed. Eighteen recombinant clones were obtained by PCR as described under "Experimental Procedures." Two of them, named MUC.RA-1 and MUC.RA-2, were selected for sequencing since they cross-hybridized weakly in Southern blot experiments. Comparison of their deduced amino acid sequences revealed two highly homologous regions with 78 and 81% identity in the N and C termini, respectively, and two degenerate repeats similar to those present in MUC.CA sequences (Fig. 2). Between these two conserved domains, both MUC.RA-1 and -2 genes lack repetitive units and differ almost completely. These results indicate that a single parasite might have a family composed of highly divergent members. Furthermore, since MUC.RA and MUC.CA sequences differ greatly and are specific to each parasite stock, it might be proposed that different parasites have different putative mucin gene families.
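Percent identity figures such as those quoted for the N- and C-terminal regions can be computed from a pairwise alignment. The following Python sketch assumes two sequences that have already been aligned to equal length, with "-" marking gaps, and it excludes gapped columns from the denominator. Both choices are assumptions made for illustration; the program used in this work (DNASTAR) may apply a different convention, and the aligned strings below are invented, not data from this paper.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned sequences.

    Columns where either sequence has a gap are skipped, so the
    denominator is the number of ungapped aligned positions.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # gapped column: not counted
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Toy alignment (hypothetical): one mismatch and one gapped column.
identity = percent_identity("MTTC-PLL", "MTSCAPLL")
print(round(identity, 1))
```

Because the result depends on how gaps are treated and on the alignment itself, values from different programs are only comparable when the same convention is used.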
Figure 3: Genomic organization of putative mucin genes. The same five strains or cloned stocks were used for pulsed field gel electrophoresis experiments in panels 1 and 2. Filters were hybridized with probe "repeats" from MUC.M/76 (panel 1) or 3′-MUC.M/76 (panel 2). Size markers used were Saccharomyces cerevisiae chromosomes (Sigma).
Northern blots were carried out to determine the size of the transcripts. A complete MUC.CA-2 probe detected a broad band around 1 kilobase in the four parasite strains and clones tested (Fig. 4 and data not shown). The epimastigote and cell-derived trypomastigote parasite forms showed the same pattern of bands.
Figure 4: Northern blot analysis. Total RNA from RA and CL-Brener strains was hybridized with complete MUC.CA-2 probe. Size markers are from an RNA ladder (Life Technologies, Inc.). E and T indicate epimastigote and trypomastigote parasite stages, respectively.
After the putative signal peptide, MUC.CA and MUC.RA members have a short N-terminal extension in the mature protein, which, in MUC.CA members, can be defined by the 11 amino acids just before the first repeat unit. Then follow the central domains, which are rich in Thr (from 16 to 74%), Ser (from 0 to 12%), and Pro (from 10 to 17%) residues. Given the mucin-like structure of the deduced proteins, a computational method for prediction of O-glycosylation of mammalian proteins was used (30). The numbers of predicted O-glycosylated residues provided by this service (NetOglyc@cbs.dtu.dk) are 64 for MUC.CA-2, 87 for MUC.CA-3, 50 for MUC.M/76, 30 for MUC.RA-1, and 21 for MUC.RA-2. Further work will clarify the O-glycosylation status of these gene products, which is at present unknown. In the case of MUC.CA-deduced sequences, central domains are entirely made up of repeat units. In MUC.RA sequences, most of the central domains lack repetitive units, but two degenerate repeats are present close to the C termini.
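The Thr, Ser, and Pro percentages quoted for the central domains are simple residue compositions. As a minimal sketch, the Python function below computes the percentage of selected residues in a sequence written in one-letter code. The function name `residue_percentages` and the toy domain are hypothetical and are not taken from the sequences reported here.

```python
from collections import Counter


def residue_percentages(sequence: str, residues: str = "TSP") -> dict:
    """Return the percentage of each requested residue in `sequence`.

    `sequence` is a protein in one-letter code; `residues` lists the
    one-letter codes of interest (Thr, Ser, and Pro by default).
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must be non-empty")
    counts = Counter(seq)
    return {res: 100.0 * counts[res] / len(seq) for res in residues}


# Toy central domain (hypothetical): 16 residues with 8 Thr, 2 Ser, 4 Pro.
toy_domain = "TTTTSKPP" * 2
print(residue_percentages(toy_domain))  # {'T': 50.0, 'S': 12.5, 'P': 25.0}
```

Applied to a deduced central domain, the same calculation gives the composition values used to compare family members, provided the domain boundaries are defined consistently.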
Finally, MUC.CA and MUC.RA members have a C-terminal domain comprising the last 54 amino acids, which is almost identical in all sequences studied (Fig. 5). This region has only 1 or 2 Cys residues in the different members, at variance with the Cys-rich terminal domains present in mammalian secretory mucins (10, 12). The last 16 amino acids are 75% hydrophobic residues and lack a polar cytoplasmic tail, suggesting that this extension might be related to glycophosphatidylinositol anchor addition (31).
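The hydrophobic content of a C-terminal stretch can be estimated by counting hydrophobic residues in the last residues of the deduced protein. The sketch below is illustrative only: the set of residues treated as hydrophobic (`HYDROPHOBIC`) is an assumption, since the text does not state which classification was used, and the toy sequence is invented rather than taken from the genes described here.

```python
# Assumption: residues counted as hydrophobic. Other classifications exist,
# and the one used in the original analysis is not stated.
HYDROPHOBIC = frozenset("AVLIMFWC")


def tail_hydrophobic_percent(sequence: str, tail_length: int = 16,
                             hydrophobic: frozenset = HYDROPHOBIC) -> float:
    """Percentage of hydrophobic residues in the last `tail_length` residues."""
    if tail_length <= 0:
        raise ValueError("tail_length must be positive")
    tail = sequence.upper()[-tail_length:]
    if not tail:
        raise ValueError("sequence must be non-empty")
    n_hydrophobic = sum(1 for residue in tail if residue in hydrophobic)
    return 100.0 * n_hydrophobic / len(tail)


# Toy protein (hypothetical): the last 16 residues are "GSDSLLVAALLAVLAL",
# of which 12 belong to the assumed hydrophobic set.
toy_protein = "MTTTKPP" + "GSDS" + "LLVAALLAVLAL"
print(tail_hydrophobic_percent(toy_protein))  # 75.0
```

A high value over the final residues, together with the absence of a polar tail, is the kind of pattern that prompts a check for a glycophosphatidylinositol anchor signal. The calculation alone does not predict anchoring.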
The overall organization of these putative mucin genes and gene family in T. cruzi somewhat resembles that of mammalian cells. MUC.CA and MUC.RA members all share sequences in the N- and C-terminal domains, suggesting a common origin. Between these two regions, MUC.CA and MUC.RA members diverged almost completely. However, the two degenerate repeats remaining in MUC.RA genes raise the possibility that they in fact originated from genes made up of perfect repeat units. In this context, a human airway mucin contains a virtually perfect 87-base pair tandem repeat, but numerous deletions or insertions resulting in many frameshifts destroy the repetitive structure in the encoded peptide (32). A somewhat related precedent of conserved sequences flanking variable tandem repeats in a protozoan was described for two genes encoding S-antigens of Plasmodium falciparum, the causative agent of malaria. These genes are homologous over their N and C termini and even over the flanking non-coding regions, but their central regions are formed of repeats so different that they do not even cross-hybridize on Southern blots performed at low stringency (33).
The nucleotide sequences reported in this paper have been submitted to the GenBank™/EMBL Data Bank with accession numbers U32572, U32346, U32447, U32448, U32449, and L20809.