Bioinformatic identification of polymerizing and transmembrane mucins in the puffer fish Fugu rubripes

Tiange Lang2, Marina Alexandersson3, Gunnar C. Hansson1,2 and Tore Samuelsson2

2 Department of Medical Biochemistry, Göteborg University, Gothenburg, Sweden; and 3 FCC, Fraunhofer-Chalmers Research Centre for Industrial Mathematics, Chalmers Science Park, Gothenburg, Sweden

Received on November 24, 2003; revised on January 20, 2004; accepted on February 23, 2004


    Abstract
 
Mucins are large glycoproteins characterized by mucin domains that show little sequence conservation and are rich in the amino acids Ser, Thr, and Pro. To effectively predict mucins from genomic and protein sequences obtained from genome projects, we developed a strategy based on the amino acid compositional bias characteristic of the mucin domains. This strategy is combined with an analysis of other features commonly found in mucins. Our method has now been used to predict mucins in the puffer fish Fugu rubripes that were previously not identified or annotated. At least three gel-forming mucins were found with the same general domain structure as the human MUC2 mucin. In addition, one transmembrane mucin was identified with SEA and EGF domains as found in the mammalian transmembrane mucins. These results suggest that the number of gel-forming mucins has been conserved during evolution of the vertebrates, whereas the family of transmembrane mucins has been markedly expanded in the higher vertebrates.

Key words: bioinformatics / glycosylation / mucin / SEA / EGF


    Introduction
 
Mucins are large, abundant, filamentous glycoproteins that coat the surfaces of the cells lining the respiratory, digestive, and urogenital tracts and, in some amphibia, the skin. Several of these mucins are known to form mucus layers, whereas others form the glycocalyx on, for example, the intestinal enterocytes. They serve as a diffusion barrier and as a lubricant; they protect epithelial cells from infection, dehydration, and physical or chemical injury; and they aid the passage of materials through a tract. Their strategic position places the mucins at center stage in many disease processes in which the interaction between epithelial cells and their surroundings has gone astray, as in inflammatory and infectious diseases, cancer, and metastasis (Gendler and Spicer, 1995; Perez-Vilar and Hill, 1999).

Mucins are characterized by one or more domains that are rich in the amino acids Ser, Thr, and Pro and that are referred to as mucin or PTS domains. These domains often contain tandem repeats. Ser and Thr together often make up more than 40% of the residues in a mucin domain, sometimes up to 90%, and the Pro content is often more than 5%. The function of a mucin domain is to serve as a scaffold for O-linked glycans bound to the amino acids Ser and Thr; these glycans bind water and interact with lectins (carbohydrate-binding proteins) found on microorganisms or endogenously. The dense oligosaccharide clusters make this domain proteolytically resistant and give it an extended and stiff conformation, which can be described as being like that of a bottle brush. The biophysical properties of mucins are largely related to these extensive O-linked domains, as illustrated by the typical mucin having more than 80% of its mass as carbohydrates. These highly O-glycosylated mucin domains are also found in other proteins, where they make up minor parts. Mucin domains are often found in the stalk region of membrane proteins as, for example, in the low-density lipoprotein receptor, where the mucin domain is 48 residues long (Davis et al., 1986). The length and nature of the mucin domains can be important, as illustrated in the Ebola virus glycoprotein (Yang et al., 2000). In the classical mucins, the mucin domains are long. The longest known today is the porcine submaxillary mucin, made of 135 repeats of 81 amino acids, giving a mucin domain with more than 10,000 residues (Eckhardt et al., 1997). Among the smaller is MUC7, with six 23-amino-acid repeats, giving a length of 138 residues (Bobek et al., 1993). Another characteristic of mucin domains, important for their prediction, is that they are typically encoded by a single exon.

The amino acid sequence of mucin domains tends to be poorly conserved. As a consequence, for the identification of such domains, one cannot rely on methods like BLAST that take advantage of sequence similarity. Identification of mucin domains is therefore a bioinformatic challenge, and novel methods are needed. For this reason we developed approaches based on the amino acid compositional bias characteristic of mucin domains.

As a starting point, we applied our methods to identify as many mucins as possible in the genome of the puffer fish Fugu rubripes (Aparicio et al., 2002). The genome is only 390 Mb, which is about eight times smaller than the 3000-Mb human genome, yet it contains a similar repertoire of genes (Brenner et al., 1993). A number of mucins of higher mammals have previously been identified, and their functional roles have been studied. However, other vertebrates as well as lower metazoans have been less well characterized with respect to mucins. Therefore, the analysis of Fugu in the present work provides a better understanding of the evolution of mucins in the vertebrates. We report both gel-forming and transmembrane mucin genes that were not previously identified in the annotation of the Fugu genome.


    Results
 
We developed two different methods for the identification of mucin domains. One of the methods, implemented in the program PTSPRED, identifies regions in a protein sequence with a composition of Ser, Thr, and Pro that is consistent with mucin domains. An alternative approach takes advantage of dipeptide frequencies characteristic of mucin domains and is analogous to the hexamer method used to discriminate between coding and noncoding regions in a genomic sequence. In this method, implemented in the program MPRED, a hidden Markov model is used to decide whether an amino acid sequence region conforms to a mucin domain or not. These programs are described in more detail in Materials and methods. We used PTSPRED and MPRED to predict mucins in the Fugu genome. Table I compares the two methods for the Fugu proteins predicted. MPRED shows high sensitivity and predicted a mucin domain in all proteins. On the other hand, it has a low specificity with many false positives (data not shown). PTSPRED is more stringent, but when a lower threshold of S + T > 25% is used, the specificity is markedly decreased (data not shown).
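The dipeptide idea can be illustrated with a small log-odds scorer. This is only a sketch of the general principle and not the MPRED implementation: the function names, the pseudocount, and the toy training strings in the usage note are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def dipeptide_freqs(seqs, pseudocount=1.0):
    """Estimate frequencies of all 400 dipeptides from training sequences.

    A pseudocount (an assumption of this sketch) keeps unseen dipeptides
    from getting a zero frequency.
    """
    counts = Counter()
    for s in seqs:
        for a, b in zip(s, s[1:]):
            if a in AMINO_ACIDS and b in AMINO_ACIDS:
                counts[a + b] += 1
    total = sum(counts.values()) + pseudocount * 400
    return {
        a + b: (counts[a + b] + pseudocount) / total
        for a, b in product(AMINO_ACIDS, repeat=2)
    }


def log_odds_score(seq, mucin_freqs, background_freqs):
    """Sum log(mucin frequency / background frequency) over all dipeptides.

    Positive totals mean the region looks more like the mucin training set,
    negative totals more like the background set.
    """
    score = 0.0
    for a, b in zip(seq, seq[1:]):
        pair = a + b
        if pair in mucin_freqs:  # skips non-standard residues such as X
            score += math.log(mucin_freqs[pair] / background_freqs[pair])
    return score
```

With toy training data such as `dipeptide_freqs(["TTSTPSTTPS" * 10])` for the mucin set and `dipeptide_freqs(["AKLEDGVMNQ" * 10])` for the background, a Ser/Thr/Pro-rich peptide scores positive and a background-like peptide scores negative. Real training sets would of course be curated mucin domains and a representative set of nonmucin proteins.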


Table I. Prediction of mucin domain in Fugu and human mucins

 
For the prediction of Fugu mucins, we used two different versions (8.1 and 11.2) of the ENSEMBL Fugu genome data sets. Both versions were considered, because we found that with respect to the mucin genes analyzed here, the assembly of the newer version in some instances was less accurate than the older one.

The ENSEMBL Fugu project provides a set of proteins predicted by the standard gene prediction pipeline as well as a set of proteins predicted by the Genscan method. We searched both these categories for mucin domains using the PTSPRED and MPRED programs. In all cases where a mucin domain was predicted, the full-length protein sequence was retrieved and analyzed with SignalP (Nielsen et al., 1997) and TMHMM (Sonnhammer et al., 1998) for the prediction of signal sequence and transmembrane domains, respectively. The full-length sequences were also analyzed with respect to Pfam domains using hmmer (http://hmmer.wustl.edu). All proteins predicted to have at least one mucin domain, one signal peptide (when the predicted protein included the N-terminus), and at least one Pfam domain typical of human mucins (Table II) were further studied at the genomic level to accomplish a prediction as accurate as possible for the complete coding sequence.
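The selection rule in the preceding paragraph can be sketched as a simple filter. The `Candidate` record and its field names are hypothetical, and the set of mucin-typical domains is an assumption taken from the abbreviations in the Figure 1 legend, since the contents of Table II are not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Set

# Assumed stand-in for Table II: domain labels from the Figure 1 legend.
MUCIN_TYPICAL_DOMAINS = frozenset({"vWD", "TIL", "CK", "SEA", "EGF"})


@dataclass
class Candidate:
    """Hypothetical summary of the per-protein analysis results."""
    name: str
    has_mucin_domain: bool      # hit from PTSPRED or MPRED
    includes_n_terminus: bool   # False if the N-terminus lies in an assembly gap
    has_signal_peptide: bool    # SignalP result
    pfam_domains: Set[str] = field(default_factory=set)


def passes_mucin_filter(c: Candidate) -> bool:
    """Apply the three criteria; the signal-peptide test is waived
    when the predicted protein lacks its N-terminus."""
    if not c.has_mucin_domain:
        return False
    if c.includes_n_terminus and not c.has_signal_peptide:
        return False
    return bool(c.pfam_domains & MUCIN_TYPICAL_DOMAINS)
```

Under these assumptions, a PTS-rich nuclear protein without a signal peptide is rejected, whereas an N-terminally truncated prediction carrying a vWD domain is retained for further study at the genomic level.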


Table II. Pfam domains found in human mucins.

 
We predicted a total of six mucin-type molecules encoded by the Fugu genome. These proteins and their domain structures are shown in Figure 1. The human mucins MUC1 and MUC2 are also shown for comparison. An online database of the predicted proteins is available at www.medkem.gu.se/mucinbiology/databases. It should be noted that the majority of the predictions are clearly different from the proteins predicted within the ENSEMBL data sets. Furthermore, in the cases where our prediction is consistent with the ENSEMBL protein, the latter protein is not correctly annotated as a mucin. This suggests that our methods are critical to accurately predict and annotate this group of proteins.



Fig. 1. Domain structure of Fugu mucins in comparison with the human MUC1 and MUC2 mucins. Abbreviations: PTS, mucin domain (rich in Pro, Thr, and Ser); vWD, von Willebrand D domain; TIL, trypsin inhibitor–like cysteine-rich domain; CK, cysteine knot; SEA, domain found in sea urchin sperm protein, enterokinase, agrin; EGF, EGF domain; TM, transmembrane domain; I-domain, Cys-rich domain between two PTS domains in the central parts of gel-forming mucins. The thin line marks the corresponding proteins found in version 11.2 of the Ensembl Fugu database. Dotted boxes represent predicted protein parts not found in the present Fugu genome due to sequencing gaps.

 
The complete proteins had a typical N-terminal signal sequence, but only one had an additional transmembrane domain as for the family of transmembrane mucins. The individual proteins are now described in more detail.

A transmembrane, MUC1-type mucin
Fugu MUC1
Only one of the predicted mucins had a domain structure reminiscent of a MUC1-type mucin (Figure 1). It is identical to a peptide predicted by Genscan (v.11, accession number scaffold_368.124967.141885). The fMUC1 protein is predicted to have 2055 amino acids and one transmembrane domain as predicted by the TMHMM program. The mucin or PTS domain is located in the N-terminal end of the protein, which is predicted to be extracellular. The stalk region between the PTS and transmembrane domains contains two SEA as well as two EGF domains, both typical for transmembrane mucins (Table II). The SEA domain is found in several mucins like MUC1 and MUC13 (Wreschner et al., 2002). The function of this domain is not known, but in MUC1 it holds the two protein parts together despite a posttranslational cleavage in the middle of the domain. An EGF domain is sometimes found together with a SEA domain (as in MUC13) but sometimes without it (as in MUC4). The cytoplasmic tail of some of the mucins has been shown to be involved in signaling by phosphorylation (Gendler, 2001). The presence of a large number of Tyr, Ser, and Thr residues in the cytoplasmic tail of fMUC1 is consistent with this idea and suggests a similar function for fMUC1.

Gel-forming, MUC2-type mucins
All identified human gel-forming mucins contain several von Willebrand factor D domains (vwd). The function of these domains is presently not understood, but they are found in a number of nonmucin proteins, where they take part in the formation of large multimeric protein complexes. In addition to the vwd domains, the human gel-forming mucins typically have a C-terminal cysteine-knot (CK) domain, responsible for a disulfide bond–mediated dimerization. Here we have identified four different proteins, and each has one CK domain, four vwds, at least one mucin domain, and a domain structure typical for the human gel-forming mucins. All these mucins have been named MUC2-type mucins.

Fugu MUC2A
The predicted protein fMUC2A is almost identical to a peptide predicted by Genscan in the v. 8 data set (FuguGenscan 10920). However, in the v.11 data set, the same prediction is missing and instead another related but truncated protein was found (SINFRUP00000135941). The fMUC2A mucin (1993 amino acids) has one PTS domain with three vwd domains on its N-terminal side and one on its C-terminal side (Figure 1). Not only this domain organization but also domain lengths are similar to the human MUC2. An exception is that the central mucin domain is considerably smaller in Fugu, accounting for the difference in length (about 3000 amino acids) between these two proteins.

Fugu MUC2B
In the standard category of Fugu proteins (v. 11) we only found a set of small proteins (SINFRUP00000141455, SINFRUP00000141460, and SINFRUP00000141449), representing fragments of our predicted protein fMUC2B. On the other hand, the fMUC2B is more closely related to a peptide (18974 of v. 8) predicted by Genscan. However, this Genscan prediction is different from fMUC2B in that it has four transmembrane helices in its N-terminus. These helices are in a region of the genome with a gap in the assembly, and the promoter region for fMUC2B might be in this gap. Therefore, we consider the Genscan peptide 18974 as an incorrect prediction, where two unrelated genes have been combined into one. The fMUC2B (2634 amino acids) is similar to fMUC2A, except that its PTS domain is larger and it has a longer N-terminus.

Fugu MUC2C
The fMUC2C protein is based on the v. 11 protein SINFRUP00000151681 and a Genscan prediction (Scaffold_981.70259.92060), whereas it has no equivalent in the v. 8 data set. The SINFRUP00000151681 protein represents only a part of fMUC2C (from position 26 to a position within the PTS region). The Genscan prediction is identical to fMUC2C, except for the PTS domain and a C-terminal extension with three vWD domains. This extension is presumably incorrect, and it seems more likely that the fMUC2C has a CK at its C-terminal end (Figure 1). Within the central region, at least two PTS domains were predicted with an intervening Cys-rich domain. This type of intervening domain has been named CysD or Cys-subdomain and is found in the human MUC2 (two) or MUC5AC and 5B (several). Unfortunately, it is not possible to fully reconstruct the PTS domain because there is a gap in the genome assembly in this region.

Fugu MUC2D
The predicted fMUC2D (Figure 1) is encoded by a sequence in the v. 11 data set (SINFRUP00000160341, SINFRUP00000160337, and SINFRUP00000160338) where one part (SINFRUP00000160341) is identical to a part of our predicted fMUC2D. Due to a gap in the genome sequence, the predicted protein is incomplete at its N-terminus; therefore it is not known if it contains additional vwd domains as found for other proteins of this family.

Fugu zonadhesin
The human zonadhesin has not been classified as a mucin but bears many of the characteristic features for a mucin. For instance, it has a central mucin domain, although not as large as for the classical mucins (Figure 1). In the search for mucin domains, we found the Fugu zonadhesin, which is identical to SINFRUP00000149997 in the v. 11 data set except for minor differences in the N-terminus. Genscan predicts a protein (scaffold_981.70259.92060, v. 11) where our predicted Fugu zonadhesin is joined to a sequence containing a number of transmembrane domains. This prediction is probably not correct and seems to merge two different genes. The N-terminal part of the Fugu zonadhesin is still missing, but the part possible to predict has the same domain structure as the human protein (Figure 1).

Evolutionary relationship between Fugu mucins
To further analyze the relationship between the different Fugu mucins, we considered multiple alignments of regions that are well conserved. Therefore, we extracted all the vwd domains of these proteins and numbered them from the N-terminal end as shown in Figure 1. A multiple alignment with CLUSTALW was carried out, and the guide tree used by the program is shown in Figure 2. All of the zonadhesin vwd domains are found in one of the branches. However, for the other mucins each branch typically had vwd domains with the same relative position/number from the different mucins. The only exception is the vwd1 of fMUC2B, which is more related to the vwd2 family. These results suggest that the order of vwd domains has been conserved and that the four fMUC2 proteins are evolutionarily closely related. This relationship was not clearly recognized from a multiple alignment of the full-length mucins. In addition, these results give further support to our mucin predictions.



Fig. 2. Phylogenetic tree of vwds of the predicted Fugu mucin-type proteins. A–E show the clusters of the fourth vwd, Fugu zonadhesin vwd, the first vwd, the second vwd, and the third vwd, respectively, all counted from the N-terminus of MUC2. The first vwd of fMUC2D is named vwd2 because the vwd1 is predicted to be lacking in the current sequences.

 

    Discussion
 
The lack of sequence conservation in the mucin domains prompted us to develop novel approaches for their prediction. The PTSPRED program predicted a lower number of mucins than the MPRED program and did not detect the fMUC2D mucin even at a Ser + Thr frequency of less than 25% (Table I). The MPRED program, on the other hand, has a higher sensitivity but lower specificity. Interestingly, the MPRED program does not detect the typical PTS domain of the human MUC1 mucin. This is due to the high frequency of single Ser or Thr, not directly adjacent to other Ser or Thr, in the 20-amino-acid tandem repeats. We thus find that both programs have their specific advantages, and it is advantageous to use them in parallel.

Many mucin domains have short tandem repeats, and it should be possible to account for this in a mucin search strategy. However, some mucins do not have any apparent repeat nature of their mucin domains, probably due to a rapid loss of the repeats during evolution. For example, the human MUC1 has a mucin domain with nearly identical 20-amino-acid repeats, whereas the corresponding mouse sequence shows only a limited repetitive nature (Gendler et al., 1990; Spicer et al., 1991).

The mucin domains known so far are always contained within one exon. This is an important criterion that has been taken into account when analyzing the predicted mucins during the prediction and reconstruction of the full-length proteins. At the same time, a complicating issue is that sequencing and assembly of mucin domain genomic regions are difficult. This is because mucin domain sequences are G/C-rich, long, and of a repetitive nature. An illustration of this is that among the human gel-forming mucins, only MUC5B and MUC2 are completely sequenced (Desseyn et al., 1998; Gum et al., 1994). Other mucins, for example MUC5AC, MUC6, and MUC19, are only partially sequenced. The problem is also illustrated in the present study, where fMUC2C and fzonadhesin both are incomplete in their PTS domains as a result of gaps in the genome assembly.

A characteristic property of the mucin domains in the mature protein is their dense O-glycosylation. These glycans are added posttranslationally by enzymes present in the Golgi apparatus. This means that the predicted proteins have to be processed in the secretory pathway and therefore have to carry a signal sequence directing them for secretion. All the mucin-type molecules known today have a typical N-terminal signal sequence. Because signal sequence predictions by the SignalP program are relatively accurate, the presence of a signal sequence has been used as an additional requirement for the prediction of a mucin. An N-terminal signal sequence is also present in all the proteins predicted here (not known for fMUC2D and fzonadhesin because the N-termini of these are still missing). For this reason, we require that a candidate protein predicted by PTSPRED or MPRED should have a signal sequence to qualify as a mucin protein. This requirement removes false positives: for instance, PTSPRED and MPRED searches identified several typical nuclear proteins (like RNA polymerase, splice and transcriptional factors) because these have regions with the amino acid composition typical of mucins.

The four predicted gel-forming Fugu mucins as well as the fzonadhesin contain vwds. The name is derived from the von Willebrand factor, a protein of the human coagulation system (Sadler, 1998). This protein has been suggested to be the ancestor for the gel-forming mucins (Desseyn et al., 2000) and is also found in the Fugu genome (SINFRUP00000149997, also shown in the database). These domains are typical for gel-forming mucins and are found also in other extracellular proteins that are involved in the formation of polymeric complexes, such as vitellogenin, humoral lectin, apolipophorin, and luciferin 2-oxygenase. The vwd seems to be ubiquitous in multicellular eukaryotes. For instance, vitellogenin with this domain is found in higher mammals as well as in Caenorhabditis elegans. Our identification of the gel-forming mucins in Fugu shows that these mucins have a global domain structure comparable to other gel-forming mucins and that the mucins fMUC2A-C are homologous with respect to their vwd structure (Figure 2). This is also consistent with and indirectly supports our prediction of the protein sequence in these parts. Finally, the analysis of vwds illustrates that for an efficient prediction of mucins it is important to consider also other domains of the protein than mucin domains.

Gel-forming mucins have previously been found to be coexpressed with trefoil factors (TFF) (Taupin and Podolsky, 2003). Searching the Fugu genome for the TFF motif did not reveal any TFF proteins (data not shown).

In the present study we identified a number of Fugu mucins that were previously not annotated. Only one transmembrane mucin was found and contained domains typical of the human transmembrane mucins. Several of these proteins, like MUC1, MUC13, and MUC17, have SEA domains (Wreschner et al., 2002) and some have EGF domains (MUC3A, 3B, 4, 12, 13, and 17), all in the stalk region between the mucin domain and the membrane domain. About 10 transmembrane mucins and mucin-type molecules are known to be encoded by the human genome today (MUC1, 3A, 3B, 4, 12, 13, 15, 16, 17, and CD43) as compared to the five gel-forming mucins (MUC2, 5AC, 5B, 6, and MUC19). Therefore, the evolution of mucins from lower vertebrates to higher mammals seems to have maintained the number of gel-forming mucins at about the same level, whereas the family of transmembrane mucins has been markedly expanded. This probably reflects important and expanding roles for the transmembrane mucins in higher animals, where they are involved in mucosal surface protection and signaling.


    Materials and methods
 
The Fugu genomic and protein sequence databases (versions 8.1 and 11.2) were downloaded from Ensembl (available online at www.ensembl.org). We considered the standard set of proteins predicted by the genome project as well as peptides predicted by Genscan.

One method for the identification of mucin domains, implemented in PTSPRED, is to examine the frequency of the amino acids Ser, Thr, and Pro. The basic principle of the program is that a protein sequence is analyzed by moving a window, typically 100 amino acids long, along the sequence and determining the composition of Ser, Thr, and Pro in that window. The window is by default moved in steps of 10. If the composition of S + T and P, respectively, is above a certain threshold value, it is recorded as a potential PTS domain. If two or more such domains overlap, they are merged in the output from the program. Typical threshold values are 40% S + T and 5% P. The output from the program is a list of hits ordered by the length of the PTS-rich region. Finally, one version of the program allows us to analyze a genomic sequence by considering all six possible translation reading frames.
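The window scan described above can be sketched as follows. This is an illustrative reimplementation and not the PTSPRED source code: the function name, the half-open coordinate convention, and the choice to treat a composition equal to the threshold as passing are assumptions; the defaults (window 100, step 10, 40% S + T, 5% P) are the typical values stated in the text.

```python
from typing import List, Tuple


def find_pts_regions(seq: str, window: int = 100, step: int = 10,
                     min_st: float = 0.40, min_p: float = 0.05
                     ) -> List[Tuple[int, int]]:
    """Return candidate PTS regions as 0-based, half-open (start, end) pairs.

    A window is a hit when its Ser + Thr fraction and its Pro fraction both
    reach their thresholds. Overlapping hit windows are merged, and the
    merged regions are returned longest first.
    """
    seq = seq.upper()
    regions: List[Tuple[int, int]] = []
    for start in range(0, len(seq) - window + 1, step):
        win = seq[start:start + window]
        st_fraction = (win.count("S") + win.count("T")) / window
        p_fraction = win.count("P") / window
        if st_fraction >= min_st and p_fraction >= min_p:
            if regions and start < regions[-1][1]:
                # Overlaps the previous hit: extend that region.
                regions[-1] = (regions[-1][0], start + window)
            else:
                regions.append((start, start + window))
    return sorted(regions, key=lambda r: r[1] - r[0], reverse=True)
```

Because a window is reported in full as soon as it passes, the reported region extends somewhat beyond the Ser/Thr/Pro-rich stretch on both sides; this follows from the window-based design. A genomic variant would apply the same scan to each of the six translated reading frames.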

An alternative approach, the MPRED program, is built on a generalized hidden Markov model that is currently composed of two states, mucin or nonmucin, but can be extended to include additional features. The algorithm runs through the protein sequence and determines which amino acids belong to which state, resulting in a set of start and end coordinates for potential mucin domains along with a probability indicating a reliability of the prediction. The probability distributions incorporated in the model, such as state transitions, domain lengths, sequence composition, and so on, are based on empirical data. Currently the performance of the method is limited by the relatively small number of available training sequences. With more training sequences, the specificity of the predictions would be improved.
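As an illustration of two-state decoding, the sketch below runs the Viterbi algorithm over a plain hidden Markov model with a mucin and a nonmucin state. It is deliberately simpler than the generalized model described above: there is no explicit domain-length distribution and no reliability estimate, and every emission and transition value is a made-up placeholder rather than a parameter estimated from empirical data.

```python
import math
from typing import Dict, List, Tuple

STATES = ("nonmucin", "mucin")

# Placeholder emission probabilities for S, T and P (assumptions); the
# remaining probability mass is shared equally by the other 17 residues.
EMISSIONS: Dict[str, Dict[str, float]] = {
    "mucin": {"S": 0.25, "T": 0.30, "P": 0.15},
    "nonmucin": {"S": 0.07, "T": 0.05, "P": 0.05},
}


def emission_logp(state: str, residue: str) -> float:
    table = EMISSIONS[state]
    if residue in table:
        return math.log(table[residue])
    return math.log((1.0 - sum(table.values())) / 17)


def mucin_segments(seq: str, stay: float = 0.99) -> List[Tuple[int, int]]:
    """Viterbi-decode seq and return 0-based, half-open mucin segments."""
    if not seq:
        return []
    log_stay, log_switch = math.log(stay), math.log(1.0 - stay)
    score = {s: math.log(0.5) + emission_logp(s, seq[0]) for s in STATES}
    backpointers: List[Dict[str, str]] = []
    for residue in seq[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            # Best predecessor of state s at this position.
            prev = max(STATES, key=lambda r: score[r]
                       + (log_stay if r == s else log_switch))
            trans = log_stay if prev == s else log_switch
            new_score[s] = score[prev] + trans + emission_logp(s, residue)
            pointers[s] = prev
        backpointers.append(pointers)
        score = new_score
    # Trace back the most probable state path.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    # Collect runs of the mucin state.
    segments, start = [], None
    for i, s in enumerate(path):
        if s == "mucin" and start is None:
            start = i
        elif s != "mucin" and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(path)))
    return segments
```

The high self-transition probability makes short Ser/Thr/Pro runs too costly to call as separate domains, which is one simple way to suppress spurious hits. A generalized model would replace this implicit geometric length distribution with one derived from known mucin domains.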

PTSPRED and MPRED as well as further documentation of these programs are available on request.

Transmembrane domains were predicted by the TMHMM program (Sonnhammer et al., 1998) and signal sequences using SignalP (Nielsen et al., 1997). For the alignment of protein to DNA we used BLAST (Altschul et al., 1997) or programs of the GCG package (Wisconsin package version 10.2, Genetics Computer Group, Madison, WI). CLUSTALW (Thompson et al., 1994) was used for multiple sequence alignments. For Pfam searches, the domains previously found in mucins were downloaded (www.sanger.ac.uk/Software/Pfam) and searches were made with the hmmer package (http://hmmer.wustl.edu). In-house Perl scripts were used for additional tasks.


    Acknowledgements
 
We are indebted to Xiang Liu for writing the code for the PTSPRED program. We are indebted to the Ph.D. program in medical bioinformatics financed from the KK foundation, the Swedish Research Council (No. 7461), and Sahlgren's Hospital (grant to Dr. Nils Lycke).


    Footnotes
 
1 To whom correspondence should be addressed; e-mail: gunnar.hansson{at}medkem.gu.se


    Abbreviations
 
CK, cysteine knot; TFF, trefoil factor; vwd, von Willebrand factor D


    References
 
Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucl. Acids Res., 25, 3389–3402.

Aparicio, S., Chapman, J., and Brenner, S. (2002) Whole-genome shotgun assembly and analysis of the genome of Fugu rubripes. Science, 297, 1301–1310.

Bobek, L.A., Tsai, H., Biesbrock, A.R., and Levine, M.J. (1993) Molecular cloning, sequence, and specificity of expression of the gene encoding the low molecular weight human salivary mucin (MUC7). J. Biol. Chem., 268, 20563–20569.

Brenner, S., Elgar, G., Sandford, R., Macrae, A., Venkatesh, B., and Aparicio, S. (1993) Characterization of the pufferfish (Fugu) genome as a compact model vertebrate genome. Nature, 366, 265–268.

Davis, C.G., Elhammer, R.D.W., Schneider, W.J., Kornfeld, S., Brown, M.S., and Goldstein, J.L. (1986) Deletion of clustered O-linked carbohydrates does not impair function of low density lipoprotein receptor in transfected fibroblasts. J. Biol. Chem., 261, 2828–2838.

Desseyn, J.L., Buisine, M.P., Porchet, N., Aubert, J.P., and Laine, A. (1998) Genomic organization of the human mucin gene MUC5B-cDNA and genomic sequences upstream of the large central exon. J. Biol. Chem., 273, 30157–30164.

Desseyn, J.L., Aubert, J.P., Porchet, N., and Laine, A. (2000) Evolution of the large secreted gel-forming mucins. Mol. Biol. Evol., 17, 1175–1184.

Eckhardt, A.E., Timpte, C.S., DeLuca, A.W., and Hill, R.L. (1997) The complete cDNA sequence and structural polymorphism of the polypeptide chain of porcine submaxillary mucin. J. Biol. Chem., 272, 33204–33210.

Gendler, S.J. (2001) MUC1, the renaissance molecule. J. Mamm. Gland Biol. Neoplasia, 6, 339–353.

Gendler, S.J. and Spicer, A.P. (1995) Epithelial mucin genes. Ann. Rev. Physiol., 57, 607–634.

Gendler, S.J., Lancaster, C.A., Taylor-Papadimitriou, J., Duhig, T., Peat, N., Burchell, J., Pemberton, L., Lalani, E.N., and Wilson, D. (1990) Molecular cloning and expression of human tumor-associated polymorphic epithelial mucin. J. Biol. Chem., 265, 15286–15293.

Gum, J.R., Hicks, J.W., Toribara, N.W., Siddiki, B., and Kim, Y.S. (1994) Molecular cloning of human intestinal mucin (MUC2) cDNA. Identification of the amino terminus and overall sequence similarity to prepro-von Willebrand factor. J. Biol. Chem., 269, 2440–2446.

Nielsen, H., Engelbrecht, J., Brunak, S., and von Heijne, G. (1997) Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. Protein Eng., 10, 1–6.

Perez-Vilar, J. and Hill, R.L. (1999) The structure and assembly of secreted mucins. J. Biol. Chem., 274, 31751–31754.

Sadler, J.E. (1998) Biochemistry and genetics of von Willebrand factor. Ann. Rev. Biochem., 67, 395–424.

Sonnhammer, E.L., von Heijne, G., and Krogh, A. (1998) A hidden Markov model for predicting transmembrane helices in protein sequences. Proc. Int. Conf. Intell. Syst. Mol. Biol., 6, 175–182.

Spicer, A.P., Parry, G., Patton, S., and Gendler, S.J. (1991) Molecular cloning and analysis of the mouse homologue of the tumor-associated mucin, MUC1, reveals conservation of potential O-glycosylation sites, transmembrane, and cytoplasmic domains and a loss of minisatellite-like polymorphism. J. Biol. Chem., 266, 15099–15109.

Taupin, D. and Podolsky, D.K. (2003) Trefoil factors: Initiators of mucosal healing. Nat. Rev. Mol. Cell Biol., 4, 721–723.

Thompson, J.D., Higgins, D.G., and Gibson, T.J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucl. Acids Res., 22, 4673–4680.

Wreschner, D., McGuckin, M.A., Williams, S.J., Baruch, A., Yoell, M., Ziv, R., Okun, L., Zaretsky, J., Smorodinsky, N., Keydar, I., and others. (2002) Generation of ligand-receptor alliances by SEA module-mediated cleavage of membrane-associated mucin proteins. Protein Sci., 11, 698–706.

Yang, Z.Y., Duckers, H.J., Sullivan, N.J., Sanchez, A., Nabel, E.G., and Nabel, G.J. (2000) Identification of the Ebola virus glycoprotein as the main viral determinant of vascular cell cytotoxicity and injury. Nat. Med., 6, 886–889.




