Duck hepatitis B virus particles bearing the L and S envelope
proteins bind a cellular glycoprotein of 180 kDa (gp180) with high
affinity and specificity. Binding is mediated by the pre-S region of
the L protein and is blocked by neutralizing but not by
non-neutralizing monoclonal antibodies to the virus. These and other
properties have suggested that gp180 may be a component of the viral
entry machinery. Here we report the purification of gp180 from duck
liver and the isolation and characterization of cDNA encoding it. DNA
sequence analysis of this cDNA indicates that gp180 is a novel member
of the basic carboxypeptidase gene family.
Hepadnaviruses are small enveloped DNA viruses that produce
persistent infections of liver cells and cause acute and chronic
hepatitis (1). The prototype member of this virus family is
human hepatitis B virus, but related viruses are also found in a
variety of other vertebrate species (2). Although much is known
about hepadnaviral genomic replication and gene expression, little is
known about the mechanisms by which these viruses enter their host
cells. In particular, the cell surface receptor(s) for all of these
viruses remain unidentified. We have been attempting to identify
components of the hepadnaviral entry pathway, using the duck hepatitis
B virus (DHBV)
We
have previously reported that a 180-kDa duck cell glycoprotein (gp180)
binds DHBV particles with high affinity and specificity (7). The
species distribution of this protein mirrors the known host range of
the virus, and gp180 binding is blocked by neutralizing but not by
non-neutralizing monoclonal antibodies to the viral
envelope (7). Of the two viral envelope proteins (L and
S) (8, 9), binding occurs only to the L protein and can
be localized to a 65-amino acid region within the so-called pre-S
domain of L (10), a domain previously shown to harbor important
determinants for host cell binding (11). All of these properties
are consistent with a potential role for the protein in DHBV infection;
however, definitive assessment of the role of gp180 in entry will
require further characterization of the protein and its gene. Here we
report the purification of gp180 and the isolation of cDNA clones
encoding this protein. Nucleotide sequence analysis of this cDNA
indicates that the gp180 coding region encodes a novel member of the
basic carboxypeptidase gene family.
The supernatant generated from 500 g of
liver was then passed over a pre-S-glutathione S-transferase
affinity column. DHBV pre-S-glutathione S-transferase fusion
protein BE1 (see Ref. 10) was expressed in E. coli, and fusion
protein from 4 liters of culture was batch adsorbed onto 5 ml of
glutathione-Sepharose as described previously (12). Liver
extract was passed over the column, and the column was then washed with
4 bed volumes of homogenization buffer plus 1% Triton X-100; bound
proteins were eluted in 20 mM glutathione (in 60 mM
Tris, pH 8.0, 150 mM NaCl, 1% Triton X-100). Fractions were
examined by SDS-polyacrylamide gel electrophoresis and Coomassie
Brilliant Blue staining, and fractions positive for gp180 were pooled
and concentrated by Centricon 100 filtration. This material was
adjusted to 1 mM CaCl2
Colonies were screened by hybridization with degenerate synthetic
oligonucleotides labeled at their 5` ends with
Gel-purified gp180 prepared by each method
was used for NH2
Comparison of the amino acid sequence of
gp180 with known protein sequences revealed significant homology with
members of the basic carboxypeptidase gene family. Fig. 5 displays the sequence alignment of gp180 with human
carboxypeptidase H (CBPH). As can be seen, gp180 is about 3
times the size of carboxypeptidase H (and all other carboxypeptidases
in the GenBank data base), but homologies to the basic
carboxypeptidases can be found throughout the protein, in a fashion
that indicates that gp180 is actually a head-to-tail array of
carboxypeptidase homology domains. This suggests that the gene for
gp180 may have evolved by tandem duplication of an ancestral
carboxypeptidase coding sequence in the distant past. We designate each
of the carboxypeptidase homology units in gp180 as domains A, B and C,
with A the NH2
Our previous work has shown that gp180 is a membrane protein, found
on both internal and plasma membranes of the cell (7). Fig. 6 shows the hydrophobicity analysis of the sequence, plotted
according to the method of Kyte and Doolittle (20). The
hydrophobic region at the NH2
The results presented here indicate that gp180, a
membrane-associated protein initially identified in duck hepatocytes on
the basis of its ability to bind DHBV envelope proteins, is actually a
member of the basic carboxypeptidase gene family. Basic
carboxypeptidases are enzymes that remove basic amino acids (lysine or
arginine) from the COOH terminus of polypeptide chains. Several
membrane-bound enzymes of this family are known (see Ref. 16 for
review), and these play important and diverse roles in biology. CBPH,
the enzyme with the greatest homology to gp180, is found on the
membranes of secretory granules in many endocrine and neuroendocrine
cells and is involved in the post-translational maturation of insulin
and enkephalin from their precursor polypeptides. Carboxypeptidase N,
another family member, is a secreted enzyme produced by hepatocytes
that cleaves and inactivates the C3a anaphylatoxin in the circulation.
Carboxypeptidase M is a plasma membrane enzyme that cleaves the
vasoactive protein bradykinin (as well as other substrates) to generate
des-Arg bradykinin, a molecule with different receptor recognition
properties. As noted above, the sequence of gp180 indicates that the
molecule has preserved the residues involved in catalytic activity;
recent preliminary investigations
The
interaction of gp180 with DHBV envelope proteins displays many of the
properties expected for a component of the host machinery involved in
viral entry (see Introduction). Because of its affinity and specificity
for the pre-S region of the viral L protein, it is tempting to assign
to it a role in initial virion binding at the cell surface. However,
since gp180 is also found on internal membranes it could be involved in
post-internalization events (e.g. within endosomes). In
addition, there is reason to believe that gp180 is not the sole host
molecule required for viral entry. For example, gp180 is found in
several duck cells and tissues not known to be permissive for viral
infection (e.g. duck embryo fibroblasts). In addition, we have
recently noted that expression of recombinant gp180 in chicken LMH
hepatoma cells does not confer susceptibility to DHBV infection.
The nucleotide
sequence(s) reported in this paper has been submitted to the
GenBank
as a model system. The avian
virus was chosen because primary hepatocytes explanted directly from
duck liver efficiently support infection with DHBV
virions (3, 4); for the mammalian viruses, similar
primary hepatocytes are not readily available and are inefficiently
infectable (5, 6). These primary duck hepatocytes
provide both a convenient source of starting material for receptor
identification and a basis for the infectivity assays essential to
test the in vivo role of candidate receptor proteins.
Purification of gp180 from Duck Liver
Duck liver
cut into small pieces was added to homogenization buffer (50 mM Tris, pH 7.4, 150 mM NaCl) (5 ml/g of liver; typically
multiple aliquots of 8 g of tissue/aliquot were extracted) and then
homogenized in a Polytron homogenizer (three 30-s bursts on setting 4).
The homogenate was clarified by centrifugation for 10 min at 600 ×
g (in an HG-4L rotor at 1500 rpm). The supernatant was
collected and then centrifuged for 10 min at 12,000 ×
g (in an SS34 rotor at 10,000 rpm). The supernatant from this step
was then centrifuged for 90 min in a 60 Ti rotor at 37,500 rpm (100,000 ×
g) to generate a microsomal pellet. This pellet was
resuspended in homogenization buffer (with the aid of the Polytron
homogenizer), and the solution was adjusted to a protein concentration
of 1 mg/ml and a final concentration of 1.0% Triton X-100 and incubated
on ice for 60 min. Insoluble material was removed by centrifugation for
90 min at 100,000 ×
g; supernatants were then frozen at
-80 °C until used.
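The rotor speeds and relative centrifugal forces quoted in this protocol can be cross-checked with the standard relation RCF = 1.118 × 10^-5 × r × (rpm)^2, with r in centimeters. The Python sketch below back-calculates the effective rotor radius implied by each rpm/g pair given in the text. The function name is illustrative only, and the radii it returns are derived from the quoted numbers; they are not manufacturer specifications.

```python
# Standard relation: RCF (x g) = 1.118e-5 * radius_cm * rpm**2
RCF_CONSTANT = 1.118e-5

def implied_radius_cm(rcf: float, rpm: float) -> float:
    """Effective rotor radius (cm) implied by a quoted RCF and rotor speed."""
    return rcf / (RCF_CONSTANT * rpm ** 2)

# (rotor, rpm, quoted RCF in x g) as stated in the protocol
spins = [
    ("HG-4L", 1500, 600),
    ("SS34", 10_000, 12_000),
    ("60 Ti", 37_500, 100_000),
]

radii = {rotor: implied_radius_cm(rcf, rpm) for rotor, rpm, rcf in spins}
for rotor, radius in radii.items():
    print(f"{rotor}: implied effective radius {radius:.1f} cm")
```

A radius that came out far from the physical dimensions of the named rotor would point to a transcription error in either the rpm or the g value.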
and 1 mM MnCl2
and then applied to a 5-ml lentil lectin-Sepharose 4B column
(Sigma) equilibrated with 50 mM Tris, pH 7.4, 150 mM
NaCl, 1% Triton X-100, 1 mM CaCl2, and 1 mM MnCl2; bound material was eluted with homogenization
buffer containing 0.1% Triton X-100 and 200 mM α-methyl-D-mannopyranoside. This material was then subjected to
preparative electrophoresis through 6% SDS-polyacrylamide gels, and the
180-kDa band was identified by staining with Coomassie Brilliant Blue.
This preparation method yielded 0.4-0.8 µg of gp180/g of
extracted liver.
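For comparison with the initial whole-homogenate protocol, which is reported under Results to yield about 80 µg of gp180/kg of liver, the yields can be put on a common scale. The short Python sketch below does the unit conversion; the variable names are illustrative, and the projection to a 500-g preparation assumes the per-gram yield holds at that scale.

```python
# Yields as stated in the text
new_yield_ug_per_g = (0.4, 0.8)   # membrane-fraction protocol (range)
old_yield_ug_per_kg = 80.0        # initial whole-homogenate protocol
liver_mass_g = 500                # scale of one affinity preparation

# Convert the new yield to ug/kg and compare with the old protocol
new_yield_ug_per_kg = tuple(y * 1000 for y in new_yield_ug_per_g)
fold_improvement = tuple(y / old_yield_ug_per_kg for y in new_yield_ug_per_kg)

# Projected recovery from a 500-g preparation (assumes the yield scales linearly)
expected_ug = tuple(y * liver_mass_g for y in new_yield_ug_per_g)

print(f"new yield: {new_yield_ug_per_kg[0]:.0f}-{new_yield_ug_per_kg[1]:.0f} ug/kg")
print(f"improvement: {fold_improvement[0]:.0f}- to {fold_improvement[1]:.0f}-fold")
print(f"projected from {liver_mass_g} g: {expected_ug[0]:.0f}-{expected_ug[1]:.0f} ug")
```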
Proteolysis and Amino Acid Sequence
Determination
The 180-kDa protein was recovered from the
preparative SDS-polyacrylamide gels by electroelution, precipitated
with acetone, and redissolved in 70% formic acid for cleavage with CNBr
(12 h). Subsequently the protein fragments were further digested with
trypsin at 37 °C for 12 h. Tryptic cleavage fragments were
separated on a C18 reversed phase column (Vydac, Hesperia, CA), and
selected peaks were subjected to protein sequence analysis with an
Applied Biosystems Sequencer, Model 475A (Applied Biosystems, Foster
City, CA).
Preparation and Screening of cDNA Libraries
A
plasmid-based cDNA library from duck liver poly(A)+ RNA
was prepared by the method of Okayama and Berg (13), using a
plasmid, pSVH, bearing an SV40 origin and early promoter. The plasmid
was derived from pSV45H (14) by deletion of a 1.4-kb internal Bam
fragment. This plasmid was cleaved with PstI, and poly(dT)
tails were added to the resulting 3` termini by incubation with
terminal transferase and dTTP. The upstream poly(dT) tail was removed
by cleavage at a unique HindIII site 5` to the original PstI site. To this dT-tailed plasmid was annealed
poly(A)+ RNA from uninfected duck liver, and cDNA
synthesis was effected by addition of all four dNTPs and recombinant
Moloney murine leukemia virus reverse transcriptase (cDNA Synthesis
System Plus, Life Technologies, Inc.). Following repair of the termini
by incubation with T4 DNA polymerase, products with inserts of about 4
kb or greater in length were isolated by agarose gel electrophoresis,
self-ligated, and cloned into E. coli strain HB101.
32P by
incubation with T4 polynucleotide kinase and [γ-32P]ATP.
The oligonucleotides were derived from the sequences of the peptides
isolated from purified gp180; those that were successfully employed were
46MR
(5`-TT(A/G)TC(T/C)TG(A/G/T/C)AG(A/G)AA(A/G/T/C)CC(A/G)TG(A/G/T/C)AC-3`)
complementary to peptide 3 and 61MR
(5`-TA(A/G)TT(A/G/T/C)(G/C)(A/T)(A/G/T/C)GT(A/G)AA(A/G)TC(A/G/T/C)GT(A/G)TC-3`)
complementary to peptide 5. This screen yielded one clone, A23 (from
about 10,000 colonies screened); this clone contained an approximately
4.3-kb insert that annealed to both oligonucleotides. This clone
comprised 2.5 kb of 3`-noncoding sequences and 1.8 kb of coding
sequence from the COOH-terminal region of gp180 (Fig. 1). The DNA
sequence of A23 confirmed that it indeed encoded both peptides from
which the screening oligos were derived. An oligonucleotide
(5`-GTTCTCTATGATGGCTTTGGTCTCA-3`) complementary to sequence from the 5`
region of A23 was used to prime cDNA synthesis from duck liver
poly(A)+ RNA, and double-stranded cDNA products were
cloned into λgt11. This library was then screened with A23 probes
to identify the overlapping clones J2 and J13 (see Fig. 1). To
isolate more 5`-coding sequences, another cDNA library was prepared by
priming cDNA synthesis with oligo-primer 2
(5`-GGCAGCAGGTACAGGTCGGTGGTGT-3`) complementary to the 5` region of
clone J2. Screening of this library with J2 sequences yielded
additional clones J39 and J31. Sequencing of these cDNAs revealed that
the coding region was still open at the 5` extremity of the cDNA,
indicating missing 5` sequences. All attempts to clone the residual 5`
sequences (estimated by comparison with the mRNA size on Northern blots
to be about 200 nucleotides) by cDNA cloning and by the 5` rapid
amplification of cDNA ends method (15) failed. It was
anticipated that the missing sequences would contain both the initiator
methionine and the signal peptide, essential elements for correct
expression of the cDNA. To allow reconstruction of a functional cDNA,
the missing sequences were supplied from genomic clones. A Sau3AI partial digest of DNA from duck embryo fibroblasts was
cloned into λEMBL3 and was screened with sequences from J31 (see Fig. 1) to yield clones DG1, DG2, and DG7, the last of which
clearly spanned the 5` end of the cDNA. To reconstitute the 5` end of
the cDNA (see Fig. 2), the following fragments (listed in order
from the 5` end of the gene) were ligated together: NcoI-BglI from DG7; BglI-HindIII
from J31; HindIII-BglII from J2 (this was the product
of a BglII partial digest); and BglII-NcoI
from A23. To construct the full-length gp180 expression vectors, the NcoI fragment bearing the entire gp180 coding region was
cloned into (i) the unique EcoRI site of pSVH (after blunting
all termini with T4 DNA polymerase), to generate pSV180, and (ii) the SmaI site of pBKRSV (Stratagene), to generate pKRSV180. In the
case of pKRSV180 (Fig. 2) the lac promoter sequences
just downstream of the Rous sarcoma virus promoter (located between NheI and XbaI) were removed by digestion and
religation.
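The degeneracy of the two screening oligonucleotides, 46MR and 61MR, can be counted directly from the sequences given above: each parenthesized position multiplies the number of distinct sequences in the pool by the number of alternatives. The Python sketch below is an illustration of that count; the helper names are hypothetical and not part of the original methods.

```python
import re
from math import prod

# Sequences as written in the text (5' to 3'), with alternatives in parentheses
OLIGOS = {
    "46MR": "TT(A/G)TC(T/C)TG(A/G/T/C)AG(A/G)AA(A/G/T/C)CC(A/G)TG(A/G/T/C)AC",
    "61MR": "TA(A/G)TT(A/G/T/C)(G/C)(A/T)(A/G/T/C)GT(A/G)AA(A/G)TC(A/G/T/C)GT(A/G)TC",
}

# One token per oligonucleotide position: "(A/G)" style group or a single base
TOKEN = re.compile(r"\(([ACGT/]+)\)|([ACGT])")

def parse_positions(oligo: str) -> list[list[str]]:
    """Return, for each position, the list of bases allowed there."""
    return [group.split("/") if group else [single]
            for group, single in TOKEN.findall(oligo)]

def degeneracy(oligo: str) -> int:
    """Number of distinct sequences represented by the degenerate oligo."""
    return prod(len(bases) for bases in parse_positions(oligo))

for name, seq in OLIGOS.items():
    print(name, "length", len(parse_positions(seq)), "degeneracy", degeneracy(seq))
```

Both probes are 23-mers; by this count 46MR is a pool of 1024 sequences and 61MR a pool of 4096.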
Figure 1:
Cloning of gp180. Bottom,
black bars denote extent of cDNA clones encoding the indicated portions
of the gp180 cDNA. pA23 and pD6 are plasmid-based clones; the remaining
clones were isolated in phage λ. Primers 1 and 2 are described in
the text and were used to prime cDNA synthesis for the indicated
subjacent cDNA clones. probe denotes extent of sequences from
clone J31 used to screen genomic clones to reconstruct the 5` end of
the gp180 coding region. Center, the structure of the gp180
cDNA. Stippled box, coding region; open box, 3` noncoding region. Top, clones of the
genomic gp180 locus. Dark lines denote overlapping
clones bearing genomic DNA fragments annealing to gp180 cDNA
probes. A region of clone DG7 is highlighted; a fragment from this
region was used to reconstruct the 5` end of the gp180 coding region,
which was unclonable as cDNA.
Figure 2:
Construction of a full-length gp180
expression vector (pKRSV180). Center, the structure of the
gp180 expression vector. Black lines, vector
sequences including the Rous sarcoma virus promoter (RSV
promoter) and SV40 polyadenylation sequences as indicated. Stippled box, gp180 coding region, with indicated
restriction sites. Above and below the pKRSV180 diagram are depicted the genomic and cDNA clones
(respectively) used to reconstruct the vector. Dotted lines denote the contributing fragments from each clone.
Note that the entire coding region can be mobilized on a single NcoI fragment. This fact was used to construct the related
plasmid pSV180, in which the gp180 coding region is subcloned between
the SV40 early promoter and the hepatitis B virus poly(A)
signal.
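The order of fragments used to reconstruct the coding region (NcoI-BglI from DG7, BglI-HindIII from J31, HindIII-BglII from J2, and BglII-NcoI from A23) can be checked for internal consistency with a few lines of Python. This is only a bookkeeping sketch with hypothetical helper names: it compares restriction site names at each junction and does not model overhang sequences or actual ligation compatibility.

```python
# (source clone, 5' site, 3' site), listed from the 5' end of the gene
fragments = [
    ("DG7", "NcoI", "BglI"),
    ("J31", "BglI", "HindIII"),
    ("J2", "HindIII", "BglII"),
    ("A23", "BglII", "NcoI"),
]

def junctions_match(frags) -> bool:
    """True if each fragment's 3' site has the same name as the next 5' site."""
    return all(left[2] == right[1] for left, right in zip(frags, frags[1:]))

def outer_sites(frags) -> tuple[str, str]:
    """Sites flanking the assembled insert."""
    return frags[0][1], frags[-1][2]

print("junctions consistent:", junctions_match(fragments))
print("assembled insert flanked by:", outer_sites(fragments))
```

The flanking sites are both NcoI, in line with the statement that the entire coding region can be mobilized on a single NcoI fragment.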
DNA Sequence Analysis
Multiple fragments of gp180
cDNA and genomic DNA were cloned into pGEM3Zf(+), and their DNA
sequences were determined on an ABI 370A DNA sequencer. Nucleotide and
amino acid sequences were analyzed with the DNASIS software (Hitachi
Software Engineering, Yokohama, Japan).
Purification of gp180 from Duck Liver
Our
initial protocol for the purification of gp180 from duck liver was
based on our earlier analytical procedure for identification of the
protein in cultured hepatocytes (7). The liver was first
homogenized in nonionic detergent; after removal of cell and nuclear
debris by low speed centrifugation, the cell extract was passed over a
DHBV pre-S affinity column created by the binding of a pre-S-protein A
fusion protein to IgG-Sepharose. Following elution of the bound
material in 2 M MgCl2, the eluate was then applied
to a ConA-Sepharose column, and bound glycoproteins were eluted with
α-methyl mannose. The eluted gp180 was then further purified by
SDS-polyacrylamide gel electrophoresis, and the 180-kDa band
electroeluted. However, this procedure resulted in an extremely poor
yield of purified gp180 (about 80 µg/kg of liver), which we
suspected was due to poor recovery of gp180 from the initial
homogenate. Because of this, subsequent work employed a different
extraction procedure: homogenization in aqueous buffer and preparation
of a crude membrane fraction by differential centrifugation prior to
detergent extraction (see "Materials and Methods"). This
membrane fraction was then extracted with Nonidet P-40, and the extract
was further purified by pre-S affinity chromatography, lentil-lectin
affinity chromatography, and SDS-polyacrylamide gel electrophoresis.
(In this scheme, a pre-S-glutathione S-transferase fusion
protein immobilized on glutathione-Sepharose was used in the first
chromatographic step.)
-terminal Edman degradation; no products
were identified, suggesting a blocked amino terminus. Accordingly, the
same materials were first digested with CNBr, trypsin, or the
combination of CNBr plus trypsin, and peptides were purified by high
pressure liquid chromatography and sequenced. The sequences of the
isolated peptides are shown in Table I. With the exception of
peptide 2, all of these sequences were identified within the coding
region predicted by the isolated cDNA clones; the origin of peptide 2
remains unclear, but it most likely derives from a contaminant. Once these sequences
were known, screening of the protein data base immediately suggested
that gp180 might be homologous to members of the carboxypeptidase gene
family: homologs of peptide 4 were found in mouse carboxypeptidase A;
homologs of peptides 3, 5, 6, and 7 were found in rat and human
carboxypeptidase H.
Cloning of gp180 cDNA
The sequences of the
isolated peptides were used to design synthetic oligonucleotides
complementary to the predicted coding sequences; these oligonucleotides
displayed considerable degeneracy. Two oligonucleotides (46MR and 61MR;
see "Materials and Methods") were used to screen a
size-selected, oligo(dT)-primed, plasmid-based cDNA library prepared
from duck liver poly(A) RNA. A single clone, A23, was
isolated that annealed to both probes. This clone, which derived from
the 3` region of the cDNA, was used to isolate additional overlapping
cDNA clones as described in detail under "Materials and
Methods." Ultimately, nearly 7 kb of cDNA was isolated, but
sequence analysis revealed that the coding region at the 5` extremity
of this cDNA was still translationally open, implying that additional
sequences were missing from the 5` end of the clone. Multiple attempts
to clone this 5` region (estimated at about 200 nucleotides) from
several different cDNA libraries, including some prepared with primers
from the most 5` regions of the existing clones, failed. Similarly, all
attempts at recovering the 5` end of the cDNA by the PCR-based 5` rapid
amplification of cDNA ends method (15) also failed. We do not
know the reason for this; perhaps the 5` region contains an RNA
structure that retards elongation by reverse transcriptase. In any
case, our repeated failures to clone the 5` end as cDNA prompted us to
reconstruct this region from genomic sequences (see "Materials
and Methods" for details). Examination of the sequence of the
reconstructed gene (see Fig. 4 and Fig. 5) indicates that
the homology to known carboxypeptidases extends throughout the 5`
region and leaves little doubt that this procedure has accurately
reconstructed the authentic 5` end of the gene.
Figure 4:
The DNA sequence of the gp180 coding
region and predicted amino acid sequence of its product. Potential
sites for N-linked glycosylation are boxed, and peptide
fragments that were purified and sequenced from native gp180 (see Table
I) are underlined. Dashed line indicates
putative signal peptide; doubly underlined segment
indicates putative transmembrane domain.
Figure 5:
Alignment of gp180 coding domains A, B,
and C with the coding region for human carboxypeptidase H (CBPH). Identical amino acid sequences are boxed. Stars, carboxypeptidase residues involved in zinc binding; open circles, residues involved in substrate binding; closed circles, residues involved in catalysis, as
described in Ref. 19.
The overlapping
recombinant cDNA fragments derived from the cloning procedures were
assembled into full-length gp180 expression vectors as described under
"Materials and Methods" (see also Fig. 2). To verify
that this cDNA encodes a functional product we examined cells
transfected with gp180 expression vectors for the production of
functional pre-S binding activity. One of the gp180 expression plasmids
(pKRSV180) bearing a selectable marker encoding G418 resistance was
transfected into LMH chicken hepatoma cells; stable G418-resistant
colonies were isolated and analyzed for duck gp180 sequences with 32P-labeled cloned gp180 cDNA. As shown in Fig. 3A, chicken genomic DNA contains sequences that
anneal to duck gp180 coding sequences (lane LMH); in
six independent clones, an additional band corresponding to the
transfected duck gp180 sequences was present. To test for expression,
the cells were labeled with [35S]methionine for 3
h. Cytoplasmic extracts prepared from these cells were then incubated
with an immobilized pre-S-glutathione S-transferase fusion
protein, and bound proteins were displayed by SDS-polyacrylamide gel
electrophoresis. As shown in Fig. 3B, a 180-kDa protein
that comigrates with authentic gp180 (not shown) was precipitated from
several of these clones (shown are clones 15, 16, and 35) but not from
parental LMH cells, confirming that the assembled clone is active in
pre-S binding. Interestingly, Western blotting of untransfected LMH
extracts with an anti-gp180 monoclonal antibody raised against the
recombinant duck protein revealed that chicken
cells express a 180-kDa protein that is presumably the chicken homolog
of duck gp180 (Fig. 3C); however, this protein is
inactive in DHBV pre-S binding (Fig. 3B).
Figure 3:
A,
Southern blotting of G418-resistant LMH clones transfected with
pKRSV180. Total genomic DNA from parental LMH cells and from
G418-resistant clones 11, 12, 13, 15, 16, and 35 was digested with HindIII, electrophoresed through a 0.8% agarose gel,
transferred to nylon membranes, and annealed to 32P-labeled
gp180 probe corresponding to the 2.5-kb BamHI fragment of
pKRSV180. B, expression of gp180 cDNA in LMH-gp180
transformants. Parental LMH cells and clone 15, 16, or 35 were labeled
with [35S]methionine, and cytoplasmic extracts were prepared
as described (7). Extracts were incubated with the pre-S-glutathione S-transferase fusion protein BE1 (10) linked to
glutathione-Sepharose; the bound material was eluted by boiling in
Laemmli sample buffer, electrophoresed through an SDS-polyacrylamide
gel, and autoradiographed as described (10). The experiment with clones
15 and 16 was conducted separately from that involving clone 35. C, chicken cells contain a gp180 homolog. Total cytoplasmic
extracts of LMH-gp180 clone 35 or parental LMH cells were
electrophoresed through SDS-polyacrylamide gels, transferred to solid
supports, and incubated with anti-duck gp180 monoclonal antibody
1D11; bound antibody was reacted with rabbit anti-mouse IgG conjugated
to horseradish peroxidase, and complexes were detected by enhanced
chemiluminescence.
DNA Sequence of gp180
The DNA sequence of gp180 is
presented in Fig. 4. The sequence predicts a protein of 1389
amino acids, with a corresponding molecular weight of 153,498, in good
agreement with our earlier experimental estimate of 150 kDa for the
unglycosylated chains made in vivo in the presence of
tunicamycin (7). Thirteen candidate sites
(Asn-X-Ser/Thr) for N-linked glycosylation were
identified in the sequence.
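Candidate N-linked glycosylation sites of the form Asn-X-Ser/Thr can be located in any protein sequence with a simple pattern scan. The Python sketch below is illustrative only: the function name and the toy sequence are hypothetical, and the option to exclude proline at position X reflects a common convention that the text itself does not state.

```python
import re

def find_sequons(protein: str, exclude_proline: bool = True) -> list[int]:
    """Return 1-based positions of Asn residues in Asn-X-Ser/Thr motifs.

    exclude_proline=True skips Asn-Pro-Ser/Thr, a common convention that is
    an assumption here; the paper specifies only Asn-X-Ser/Thr.
    """
    x = "[^P]" if exclude_proline else "."
    # Lookahead lets overlapping motifs be reported independently
    pattern = re.compile(rf"N(?={x}[ST])")
    return [m.start() + 1 for m in pattern.finditer(protein)]

# Toy sequence (not gp180): sequons at N2 (NAS) and N9 (NGT); N5 is followed by Pro
toy = "MNASNPTQNGT"
print(find_sequons(toy))                         # proline excluded at X
print(find_sequons(toy, exclude_proline=False))  # literal Asn-X-Ser/Thr
```

Applied to the predicted gp180 sequence of Fig. 4, a scan of this kind would list the candidate sites; whether it reproduces the count of thirteen depends on which convention for X was used.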
-terminal unit and C the COOH-terminal unit.
Carboxypeptidase H displays amino acid sequence identities of 39, 43,
and 29% with gp180 domains A, B, and C, respectively. We note that
domain B has conserved the residues known to reside at the
carboxypeptidase catalytic center (19) as well as the residues
involved in zinc and substrate binding (19); this suggests that
gp180 has the potential to be an enzymatically active protease.
terminus is highly homologous
to the known signal sequence of carboxypeptidase H and almost certainly
mediates the corresponding ER-translocation function in gp180. At the
COOH terminus is a second, still more hydrophobic region (residues
1309-1329) that could serve as a transmembrane domain.
Figure 6:
Hydrophobicity plot of gp180 by the method
of Kyte and Doolittle (20). Window, 20 amino acids.
Hydrophobic regions project above the horizontal line.
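A hydropathy profile of the kind shown in Fig. 6 is a sliding-window average of per-residue Kyte-Doolittle values. The Python sketch below uses the published Kyte-Doolittle scale and the 20-residue window stated in the legend; the function name and the toy sequence are illustrative, and the sketch reports each window by its starting index rather than its midpoint.

```python
# Kyte-Doolittle hydropathy values per residue
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(protein: str, window: int = 20) -> list[float]:
    """Mean Kyte-Doolittle value for each window, indexed by window start."""
    values = [KD[residue] for residue in protein]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

# Toy sequence (not gp180): a 20-residue leucine stretch flanked by aspartates
toy = "D" * 10 + "L" * 20 + "D" * 10
profile = hydropathy_profile(toy)
peak_start = max(range(len(profile)), key=profile.__getitem__)
print(f"peak window starts at index {peak_start}, mean {profile[peak_start]:.2f}")
```

In a real profile, a candidate transmembrane segment such as residues 1309-1329 would appear as a sustained peak well above the baseline.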
suggest that
gp180, which is widely distributed on many tissues of the duck, is
indeed an enzymatically active protease. We do not know, however, the
normal biological function of gp180 in the uninfected host.
This suggests that multiple host components may be
required to fully reconstitute viral entry, a theme that is emerging in
several viral systems (e.g. adenovirus (17) or human
immunodeficiency virus (18)). These other host molecules could
be involved in additional virus-cell binding interactions or in the
envelope fusion reaction (or both). Clearly, further research will be
required to fully define the hepadnaviral entry mechanism and to
identify all of its components.
Table: gp180 peptide sequences
/EMBL Data Bank with accession number(s) U25126.
©1995 by The American Society for Biochemistry and Molecular Biology, Inc.