(Received for publication, August 25, 1994; and in revised form, November 9, 1994)
The apical endosomal compartment is thought to be involved in
the sorting and selective transport of receptors and ligands across
polarized epithelia. To learn about the protein components of this
compartment, we have isolated and sequenced a cDNA that encodes a
glycoprotein that is located in the apical endosomal tubules of
developing rat intestinal epithelial cells. The deduced amino acid
sequence predicts a protein of 1216 amino acids with a molecular mass
of 133,769 Da. The deduced amino acid sequence together with
amino-terminal amino acid sequencing indicates that there is a cleaved
21-amino acid signal sequence at the NH2-terminal portion of
the molecule. There is a single hydrophobic region near the carboxyl
terminus that has the characteristics of a membrane-spanning domain and
a 36-amino acid cytoplasmic tail. We have found that the major form of
this protein in intestinal epithelial cells has a molecular mass of
55-60 kDa, which is significantly smaller than the size predicted
from the cDNA sequence, suggesting that the protein is synthesized as a
large precursor and processed to the smaller form. The smaller form
remains associated with the membrane, however, possibly through
noncovalent association with the transmembrane portion of the molecule
or with another membrane protein. The extracytoplasmic domain is
cysteine-rich, with three cysteine-rich repeats that are similar to
cysteine repeats present in several receptor proteins. However, there
is no other significant similarity to proteins in GenBank.
The cytoplasmic tail contains a possible internalization motif and
several consensus motifs for serine/threonine kinases. Northern blot
analysis suggests a single abundant message, and Southern blot analysis
is consistent with a single gene and the absence of pseudogenes for
this unique endosomal protein.
After endocytosis from the plasma membrane, internalized receptors and ligands are delivered to endosomes (for review, see Trowbridge et al. (1993)). The endosomal compartment performs a variety of functions, including the sorting of internalized receptors and ligands (Abrahamson and Rodewald, 1981; Geuze et al., 1984, 1987) and newly synthesized lysosomal membrane proteins and hydrolases (Griffiths et al., 1988; Kornfeld and Mellman, 1989). In polarized epithelial cells, the apical endosomal compartment has been implicated in both apical to basolateral and basolateral to apical transepithelial transport (Abrahamson and Rodewald, 1981; Apodaca et al., 1994).
We previously generated a monoclonal antibody against a glycoprotein that is enriched in the apical tubular endosomes of absorptive cells in developing rat ileum (Wilson et al., 1987). On Western blots, this antibody recognizes a major band with an apparent molecular mass of 55-60 kDa and a minor high molecular mass band of 130 kDa. This endosomal antigen is expressed in high amounts during the time when ileal epithelial cells are actively endocytic, and expression stops when the intestinal epithelium achieves the adult configuration (Wilson et al., 1991). However, unlike the neonatal jejunum, the ileum is not involved in the transfer of IgG across the intestine (Rodewald, 1980) and instead has been suggested to be active in the uptake and transepithelial transport of milk-borne peptides and growth factors (Siminoski et al., 1986; Gonnella et al., 1987). To further study the composition of this endosomal compartment, we have isolated the cDNA for this endosomal protein and have characterized this protein at the molecular level. Analysis of the cDNA sequence suggests that this protein is synthesized as a high molecular weight precursor and post-translationally processed to lower molecular weight forms. Searches of the DNA and protein databases indicate that it has no significant identity to other proteins in GenBank and that it is a unique apical endosomal protein.
Figure 2: Reverse transcription and polymerase chain reaction of total RNA from neonatal rat ileal epithelial cells. A, amino-terminal and internal amino acid sequences were used to design primers. Primers were designed with a 5'-GG clamp followed by an EcoRI restriction site (underlined) to facilitate subcloning into sequencing vectors. Deoxyinosine was used in positions of highest degeneracy and mixed oligonucleotides in positions of lower degeneracy. B, reverse transcription of total RNA using the internal primer followed by polymerase chain reaction using both primers resulted in a single band of 450 bp.
Figure 3: Nucleotide sequence and predicted amino acid sequence of the endosomal antigen cDNA. Nucleotides are numbered 5' to 3', with the first nucleotide of the putative signal sequence at position 1. Amino acids are numbered from the first amino acid obtained by amino-terminal sequencing. The positions of the primers used for polymerase chain reaction are denoted by diagonal underlining. The signal sequence is boxed, and the transmembrane domain is shown in a black box. The cysteine residues are indicated by shaded circles, and consensus signals for N-linked glycosylation are enclosed in ovals.
Figure 1: Purified endosomal antigen. The endosomal antigen immunopurified from neonatal rat ileal epithelial cells was separated on a 7.5% SDS-polyacrylamide gel and transferred to polyvinylidene difluoride membrane. The 55-60-kDa band (arrow) was excised from the membrane for amino-terminal sequencing or was electroeluted and digested with trypsin for internal amino acid sequencing.
A λ ZAP cDNA library derived from neonatal rat intestine was then
screened by colony hybridization with this 450-bp fragment, and larger
clones were identified. The largest clone was 3.9 kb in length.
Three independent clones of 3-3.7 kb were identified, and they
were shown to be included within the sequence of the largest clone,
indicating that the large clone was not due to artifactual joining of
smaller clones. The full-length coding sequence was obtained by a
combination of primer extension sequencing and nuclease deletion in
both directions, and the open reading frame was determined. Additional
peptide sequences obtained from internal amino acid sequencing were
identified in the larger clones. The largest clone identified was not
as large as the mRNA seen on Northern blots (see below) and did not
contain a poly(A) tail. However, several screenings of the λ ZAP library
with probes derived from the 3'-end of the clone did not result in
larger clones or the identification of a poly(A) stretch. 3'-Rapid
amplification of cDNA ends with primers derived from the 3'-end of the
coding sequence did result in the identification of a fragment of 750 bp
that terminated in a poly(A) tail (data not shown).
The cDNA sequence of this clone and the
predicted amino acid sequence are shown in Fig. 3. The cDNA
contains 57 bases of 5'-untranslated region, and the codon for the first
methionine in the amino acid sequence is preceded by a purine in
position -3, an important component of the consensus motif for an
initiation methionine (Kozak, 1989). The next 21 amino acids are quite
hydrophobic and probably constitute a signal peptide, followed by the 6
amino acids obtained from NH2-terminal peptide sequencing. The putative signal
cleavage site conforms to the -1,-3 rule, with a glycine in
position -1 and a serine in position -3 (von Heijne, 1986).
The open reading frame after the signal peptide encodes a protein of
1195 amino acids with a calculated molecular mass of 131,128 Da.
Because of the size of the predicted open reading frame and the
presence of a high molecular weight band on immunoblots, we postulated
that the protein is synthesized as a high molecular weight precursor
and proteolytically processed into the smaller form seen on Western
blots. To determine the size of the protein encoded by the cDNA, we
performed in vitro transcription and translation of the 3.9-kb
clone in a rabbit reticulocyte lysate. In vitro translation
resulted in a single specific band of Mr 130,000 (Fig. 4). Other forms
were absent, suggesting that the
processing event that results in the smaller in vivo form
occurs post-translationally in the endoplasmic reticulum, Golgi
apparatus, or on the cell surface. Even after cleavage, the smaller
form remains associated with the membrane (Wilson et al.,
1987) and may remain noncovalently associated with the transmembrane
portion of the molecule. This arrangement would be similar to the
brush-border glycosidases sucrase-isomaltase and lactase-phlorizin hydrolase
(Semenza, 1986). In the case of sucrase-isomaltase, pancreatic enzymes
are thought to have a role in the post-translational processing to the
mature form (Hauri et al., 1979). The mechanism and site of
cleavage of the endosomal antigen remain unknown.
Figure 4: In vitro transcription and translation of the endosomal antigen cDNA. The 3.9-kb cDNA for the endosomal antigen was transcribed using T3 RNA polymerase and translated in a rabbit reticulocyte lysate. Lane 1, translation in the absence of RNA; lane 2, translation in the presence of 1 µg of RNA. A single specific band of Mr 130,000 is present.
In addition to the
hydrophobic signal peptide at the amino-terminal portion of the
protein, Goldman-Engelman-Steitz analysis (Engelman et al.,
1986) with a window size of 20 indicates a single hydrophobic domain of
30 amino acids near the carboxyl terminus of the protein (Fig. 5). This hydrophobic domain is flanked by an arginine
(position 1128) and lysine (position 1160) and is proposed to be a
membrane-spanning domain. This predicted topology would indicate that
this protein is a type I membrane protein, with the majority of the
molecule on the extracytoplasmic face of the membrane. The
extracytoplasmic domain is mostly hydrophilic and contains several
cysteine-rich domains (Fig. 3 and Table 2). There are
three cysteine-rich repeats in the protein at positions 7-32,
210-246, and 435-470 (Table 2). These repeats have
similarity to cysteine repeats in an array of membrane proteins,
including the α2-macroglobulin receptor (low density
lipoprotein receptor-related protein) (Herz et al., 1988;
Kristensen et al., 1990; Strickland et al., 1990),
low density lipoprotein receptor (Goldstein et al., 1985), and
members of the complement family (DiScipio et al., 1984) (Table 2). The functions of these repeats in these proteins are
unclear. Overall, this endosomal antigen is relatively cysteine-rich,
with 39 cysteine residues in the mature molecule. At least some of
these cysteine residues are thought to be involved in disulfide bonds,
as SDS-polyacrylamide gels run under nonreducing conditions result in
an increased electrophoretic mobility of the protein.
These intramolecular disulfide bonds may protect the
protein from proteases present in the intestinal lumen or in the apical
giant lysosome, a structure that is a morphological hallmark of
neonatal rat ileal epithelial cells (Cornell and Padykula, 1969).
Figure 5: GES analysis of the predicted amino acid sequence. The deduced amino acid sequence was analyzed for the presence of nonpolar transbilayer helices employing the algorithm of Engelman et al. (1986). The window size is set for 20 residues. Two hydrophobic domains are identified, the amino-terminal signal sequence and a carboxyl-terminal domain that is proposed to be a transmembrane domain.
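The sliding-window calculation behind this kind of hydropathy profile can be sketched as follows. This is a minimal illustration only: the function names, the test sequence, and in particular the three-level scale are hypothetical stand-ins, not the transfer free energies of Engelman et al. (1986), which would have to be substituted to reproduce a GES analysis.

```python
# Illustrative residue classes (assumption: NOT the GES free-energy scale).
HYDROPHOBIC = set("ACFILMVW")
CHARGED = set("DEKR")


def toy_scale(aa):
    """Return a crude hydrophobicity value for one residue (hypothetical scale)."""
    if aa in HYDROPHOBIC:
        return 1.0
    if aa in CHARGED:
        return -1.0
    return 0.0


def window_scores(seq, scale=toy_scale, window=20):
    """Sum the per-residue scale values over every window of `window` residues."""
    values = [scale(aa) for aa in seq.upper()]
    if len(values) < window:
        return []
    total = sum(values[:window])
    scores = [total]
    # Slide the window one residue at a time: add the new residue, drop the old one.
    for i in range(window, len(values)):
        total += values[i] - values[i - window]
        scores.append(total)
    return scores


def best_window(scores):
    """Return (1-based start position, score) of the highest-scoring window."""
    i = max(range(len(scores)), key=scores.__getitem__)
    return i + 1, scores[i]


# Hypothetical test sequence: a 20-residue nonpolar stretch flanked by charged residues.
toy_seq = "DEKR" * 3 + "LIVAFLLIVAFLLIVAFLLI" + "KRDE" * 3
print(best_window(window_scores(toy_seq)))
```

With a real free-energy scale, peaks in the window scores would mark candidate transbilayer helices such as the signal sequence and the carboxyl-terminal domain described above.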
There are six consensus signals for asparagine-linked glycosylation
in the extracytoplasmic domain (Fig. 3). Four of these signals
are clustered toward the amino terminus of the protein with the
asparagine residues at positions 183, 269, 320, and 346, and the
remaining two are at positions 618 and 818. These consensus signals are
located outside the cysteine-rich domains. Biochemical evidence
indicates that at least three of these glycosylation sites are, in
fact, glycosylated.
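A consensus signal for asparagine-linked glycosylation is conventionally taken to be Asn-X-Ser/Thr, where X is any residue except proline. A minimal sketch of how such signals could be located in a deduced amino acid sequence is given below; the function name, the `offset` parameter, and the test sequence are hypothetical and are not taken from the analysis reported here.

```python
import re

# Asn-X-Ser/Thr with X != Pro. The lookahead makes the match zero-width,
# so overlapping sequons (e.g. "NNSS") are each reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(protein, offset=0):
    """Return 1-based positions of the Asn in each N-X-S/T sequon.

    `offset` is subtracted from every position, e.g. the length of a
    cleaved signal peptide when numbering from the mature amino terminus.
    """
    return [m.start() + 1 - offset for m in SEQUON.finditer(protein.upper())]


# Hypothetical test sequence (not the endosomal antigen).
toy = "MKNGSAANPTQNNSSLNVT"
print(find_sequons(toy))
```

A match identifies only a potential site; whether a given asparagine is actually glycosylated has to be established biochemically.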
As mentioned above, near the carboxyl terminus of the predicted amino acid sequence, there is a hydrophobic stretch of 30 amino acids, beginning at amino acid 1128, that is characteristic of a membrane-spanning domain (Fig. 5). The portion of the molecule predicted to lie on the cytoplasmic side of the membrane is 36 amino acids in length. On the cytoplasmic side, separated from the membrane by 15 amino acids, is the amino acid sequence FDNILF (1174-1179), a sequence similar to the internalization signal FDNPVY found in the low density lipoprotein receptor (Chen et al., 1990). Although the majority of this endosomal antigen is present in early endosomal membranes, some immunoreactivity is detected on the apical cell surface of neonatal rat enterocytes (Wilson et al., 1987). An internalization motif may be necessary to allow cycling of this protein between the two membrane domains.
Some internalization motifs have also been shown to be related to the determinants involved in the basolateral targeting of receptors and membrane proteins (Brewer and Roth, 1991), and the low density lipoprotein receptor is normally targeted to the basolateral plasma membrane of epithelial cells. Interestingly, one of the basolateral targeting signals in the low density lipoprotein receptor includes the tyrosine (position 807 of the human receptor (Goldstein et al., 1985)) of the internalization motif, followed by a cluster of negatively charged amino acids (Matter et al., 1992). In the case of the endosomal antigen, which is present in the apical endosomal compartment, the presence of phenylalanine (amino acid 1179) in the analogous position in place of tyrosine may explain the apical endosomal targeting of this protein.
In addition to their roles in internalization and polarized targeting of membrane proteins, cytoplasmic domains are important for the sorting of membrane proteins to specific organelles. The determinants for the sorting of mannose 6-phosphate receptors to the late endosomal compartment have been shown to reside in two portions of the cytoplasmic domain, a tyrosine-containing internalization signal and a carboxyl-terminal dileucine motif (Jadot et al., 1992; Johnson and Kornfeld, 1992). Also, dileucine or leucine-isoleucine pairs have been implicated in the intracellular targeting and/or endocytosis of other proteins (Letourneur and Klausner, 1992; Pieters et al., 1993; Verhey and Birnbaum, 1994). The endosomal antigen described here has neither of these motifs, but does contain an isoleucine-leucine pair in the cytoplasmic tail.
Phosphorylation of the cytoplasmic tails of membrane proteins can also affect their intracellular trafficking (Casanova et al., 1990). The cytoplasmic domain of the endosomal antigen contains several serine and threonine residues, and there is a consensus motif for phosphorylation by casein kinase I (SXX(S/T), positions 1190-1193) and two motifs for phosphorylation by casein kinase II (TXXEX and SXXDX, positions 1186-1190 and 1172-1176, respectively) (Pearson and Kemp, 1991). However, it is not yet known whether this protein is phosphorylated. Determination of the role of the cytoplasmic domain in the endosomal targeting, retention, and endocytosis of this molecule awaits mutagenesis experiments.
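The consensus motifs named above can be expressed as simple patterns and searched for in a cytoplasmic tail sequence. The sketch below is illustrative only: the motif labels, the function name, the `first_residue_number` parameter, and the test sequence are hypothetical, and the patterns encode just the three motifs as written in the text rather than the full kinase consensus definitions.

```python
import re

# Motifs as written in the text; "." stands for X (any residue).
MOTIFS = {
    "CKI SXX(S/T)": r"S..[ST]",
    "CKII TXXEX": r"T..E.",
    "CKII SXXDX": r"S..D.",
}


def scan_motifs(seq, first_residue_number=1):
    """Return (motif, start, end, matched residues) tuples, sorted by start.

    Positions are numbered so that seq[0] is `first_residue_number`.
    A capturing group inside a lookahead lets overlapping hits be found.
    """
    hits = []
    for name, pattern in MOTIFS.items():
        for m in re.finditer(f"(?=({pattern}))", seq.upper()):
            start = m.start() + first_residue_number
            hits.append((name, start, start + len(m.group(1)) - 1, m.group(1)))
    return sorted(hits, key=lambda h: h[1])


# Hypothetical test sequence (not the endosomal antigen tail).
for hit in scan_motifs("AASGGDKTPPELSAAT"):
    print(hit)
```

As with any consensus search, a hit indicates only that a residue lies in a favorable sequence context; it does not show that the site is phosphorylated.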
Figure 6: Southern and Northern blot analyses. A, Southern blot of rat genomic DNA. Genomic DNA was digested with enzymes as indicated, transferred to nitrocellulose, and hybridized with a 100-bp fragment spanning nucleotides 395-495 of the coding region. B, Northern blot of neonatal rat ileum total RNA. 3 µg of RNA was probed with a cDNA fragment derived from nucleotides 64-510. The message is 4.8 kb in length (arrow).
This endosomal protein is a unique marker for a specialized apical endosomal compartment in intestinal epithelial cells. It will be of interest to determine the role of its structure in its targeting to and retention in the endosomal compartment of polarized cells. Also, as expression of this protein is correlated with the assembly of the apical endosomal complex in developing ileal cells (Wilson et al., 1991), it will be interesting to determine whether this protein plays a role in the extent or morphology of the endosomal compartment in other cell types.
The nucleotide sequence reported in this paper has been submitted to the GenBank/EMBL Data Bank with accession number L37380.