(Received for publication, October 20, 1994; and in revised form, December 12, 1994)
The nucleic acid and deduced amino acid sequence of the Drosophila insulin receptor homologue (dir) has been
determined. The coding sequence of dir is contained within 10
exons spanning less than 8 kilobase pairs of genomic DNA. The deduced
amino acid sequence of the dir encodes a protein of 2148 amino
acids, larger than the human insulin receptor due to amino- and
carboxyl-terminal extensions. The overall level of amino acid identity
between the DIR and human insulin and insulin-like growth factor-I
receptors is 32.5 and 33.3%, respectively. Higher levels of identity
are found in exon 2 (45 and 43%, respectively) and in the β subunit
(50 and 48%, respectively), and the positions of most cysteine residues
in the α subunit cysteine-rich domain are conserved. A novel,
400-amino acid, carboxyl-terminal extension contains 9 tyrosine
residues, four of which are present in YXXM or YXXL
motifs, suggesting that they function as binding sites for SH2
domain-containing signaling proteins. The presence of multiple putative
SH2 domain binding sites in the DIR represents a significant difference
from its mammalian homologues and suggests that, unlike the human
insulin and insulin-like growth factor-I receptors, the DIR forms
stable complexes with signaling molecules as part of its signal
transduction mechanism.
The insulin receptor of Drosophila (DIR), like its
mammalian homolog, is a tetrameric glycoprotein composed of two α
subunits and two β subunits joined together by disulfide
bonds (1, 2). The α subunits are extracellular and
contain the ligand-binding domains, which bind mammalian insulin with
fairly high affinity (Kd = 15 nM; (3)). Mammalian insulin-like growth factor I
(IGF-I) fails to displace bound insulin, while IGF-II
does, although at concentrations 10-20-fold higher than
insulin (3, 4). The β subunits contain the
transmembrane domain and a ligand-activated tyrosine kinase. The kinase
domain exhibits a very high degree of sequence homology with other
members of the insulin receptor family (1, 5). As in
the mammalian receptors, the α and β subunits are produced as a
single proreceptor precursor polypeptide which is proteolytically
cleaved to yield the mature subunits (2, 5).
The
structure of the β subunit kinase domain and the carboxyl terminus
of the α subunit have been determined by molecular cloning (1, 5). However, these data provide little insight
into the basis of DIR ligand binding specificity and fail to account
for a larger β subunit form observed in autophosphorylation
studies (2). Therefore, the complete structure of the DIR
protein and the organization of the dir gene have been
determined as a step toward understanding the function of this receptor
and its relationship to mammalian homologues. The results further
reinforce the evolutionary relatedness of the Drosophila and
mammalian proteins, although the DIR exhibits large extensions at both
the amino and carboxyl termini. The carboxyl-terminal extension is
particularly intriguing because it includes motifs known to be involved
in binding of SH2 domain-containing proteins(6) . This suggests
that, unlike mammalian insulin and IGF-I receptors(7) , signal
transduction from the DIR involves stable interactions with other
proteins to form multimeric signaling complexes.
Figure 1:
Map of the dir gene. A, the location of restriction sites in the dir genomic region as determined by Southern analysis. (B, BamHI; E, EcoRI; H, HindIII; K, KpnI; N, NotI; P, PstI; S, SmaI; X, XhoI.) The stippled area indicates the region of
genomic DNA that has been sequenced. Above the restriction map, the
extent of genomic clones 1C, 19-12, and S5-1 is indicated.
The SmaI site at the extreme left is >23 kb from the SmaI site in exon 2. B, the intron-exon structure of
the dir gene. C, cDNA clones of dir.
III, II
, I
, and IV
are fragments generated by PCR. D, schematic representation of the deduced DIR protein and
comparison with the HIR. The cysteine-rich regions (stippled
box), transmembrane domains (hatched box), and binding sites for
AbP2 (circle) and AbP5 (striped box) are
indicated.
cDNA and genomic clones encompassing the entire coding
sequence of the dir were isolated through a combination of
library screening and PCR amplification. A 5.6-kb cDNA clone (Fig. 1C, pY19) isolated from a cDNA library prepared
from 12 to 24-h Drosophila embryos (generously provided by the
laboratory of F. C. Kafatos; (8)) encodes 277 amino acids of
the α subunit, all of the β subunit, and a longer
3'-untranslated region than that found in the
2.9 cDNA (Fig. 1C; (11)). This and other cDNA clones
were used to isolate and characterize a 15-kb genomic clone which spans
the dir coding region and extends approximately 4 kb further
in both the 5' and 3' directions (Fig. 1A, clone 1C). The III
and II
cDNA fragments (Fig. 1C) encompass the α subunit and were obtained
by PCR amplification of the above cDNA library and reverse
transcriptase-PCR of embryo RNA, respectively. Southern analysis with
the 19-12 genomic clone and subfragments thereof, the III
and I
cDNAs and S5-1 (gift of Manfred Frasch, Mount Sinai
School of Medicine) was used to generate a restriction map of the dir locus (Fig. 1A). S5-1 extends into
the beginning of the S59 gene (12) which is located proximally
to dir on the third chromosome. The sequence of 8 kb of
genomic DNA has been determined and the intron-exon structure of the dir gene deduced from comparison of genomic and cDNA sequences (Fig. 1B, Table 1). The introns are small in
size, ranging from 56 to 102 nucleotides, making the dir gene
relatively compact as compared to the human insulin or IGF-I receptor
genes(13, 14) although similar in size to the insulin
receptor-related receptor gene (HIRR, (15) ). The dir coding region is contained within 10 exons (Fig. 1B). However, the transcription start site and
extent of the 5'-untranslated sequence have not yet been determined, and
another exon containing only 5'-untranslated sequence or an alternative
first exon may be found upstream.
Although the intron-exon structure
of the dir gene differs from that of the mammalian members of
the insulin receptor gene family (10 versus 21 or 22 exons,
respectively; (13, 14, 15) ), an evolutionary
relationship is suggested by the conservation of many intron-exon
boundaries (Table 2). Exon 2 is similar in size in all insulin
receptor family members. Exons 3 and 4 differ in size between the dir and mammalian receptors, although the size of both exons
together is nearly identical. Similarly, exons 5 and 6 of the mammalian
receptors are equivalent to exon 5 of dir; exons 7, 8, and 9
of the mammalian receptors are equivalent to exons 6 and 7 of dir; exons 10, 11, and 12 of the mammalian receptors
correspond to exon 8 of dir; exons 13 and 14 of the mammalian
receptors are equivalent to exon 9 of dir. A major difference
between dir and the mammalian receptor genes is that dir exon 10 encodes the entire β subunit beginning 7 residues
before the transmembrane domain, whereas in the human insulin receptor,
the β subunit extending from 9 residues before the transmembrane
domain to the carboxyl terminus is encoded by 8 exons (exons
15-22). Thus, the organization of the dir gene is
clearly related to, but less complex than, that of the mammalian genes.
Comparison of their structures suggests the existence of an even
simpler predecessor gene. dir exons 3 and 4, and likewise 6
and 7, may have comprised single exons in an earlier insulin receptor
gene, that were later divided differently into multiple exons during
divergent evolution.
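The exon correspondences described above can be tabulated and checked for completeness. The following is a minimal sketch, not part of the original analysis; the name EXON_GROUPS and the helper covered() are illustrative, exon 1 is omitted because the first exons share no significant homology, and the 22-exon human insulin receptor numbering is assumed for the mammalian side.

```python
# Each entry pairs a group of dir exons with the group of human insulin
# receptor (HIR) exons described in the text as equivalent to it.
# Exon 1 is omitted (no significant homology between the first exons).
EXON_GROUPS = [
    ((2,), (2,)),
    ((3, 4), (3, 4)),            # sizes differ individually, similar combined
    ((5,), (5, 6)),
    ((6, 7), (7, 8, 9)),
    ((8,), (10, 11, 12)),
    ((9,), (13, 14)),
    ((10,), tuple(range(15, 23))),  # HIR exons 15-22
]


def covered(side):
    """Return the sorted exon numbers on one side (0 = dir, 1 = HIR)."""
    return sorted(exon for group in EXON_GROUPS for exon in group[side])


dir_exons = covered(0)
hir_exons = covered(1)
print("dir exons accounted for:", dir_exons)
print("HIR exons accounted for:", hir_exons)
```

Under these assumptions every coding exon downstream of exon 1 is accounted for exactly once in both genes, which is what the grouping in the text implies.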
The longest open reading frame begins with a
translational start sequence that matches the Drosophila consensus (16) in six of seven positions and encodes a
protein of 2148 amino acids (Fig. 2), although 5 nearby
downstream in-frame methionine codons with similar or slightly weaker
translational start consensus sites are found. Assuming that dir resembles most other eukaryotic mRNAs in that the 5'-proximal ATG
is utilized (17), the deduced coding region would comprise
6,444 base pairs of the mature mRNAs, which are 8.6 and 11 kb in
length (11). Data suggest that the two mature mRNA species
differ only in their untranslated sequence, because probes encoding the
α subunit, β subunit kinase domain, or β subunit
carboxyl-terminal extension hybridize with both mRNAs (11).
Therefore, the 8.6- and 11.0-kb mature mRNAs are predicted to contain
approximately 2.1 and 4.5 kb of untranslated sequence, respectively.
cDNA clone pY19 extends to an apparent polyadenylation site and
contains approximately 1.6 kb of 3'-untranslated sequence. If this cDNA
is derived from the 8.6-kb mRNA, it would follow that the
5'-untranslated region is approximately 0.5 kb in length. This is in
the same size range as the 5'-untranslated sequence(s) of the human
insulin receptor (18), although smaller than that of the human
IGF-I receptor (19, 20).
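The size estimates in this paragraph follow from simple arithmetic, which can be sketched as below. The function name utr_estimates is illustrative; the inputs are the values quoted in the text, and because the mRNA sizes are themselves gel estimates, the results should be expected to agree with the quoted figures only to within about 0.1 kb.

```python
def utr_estimates(codons, mrna_kb, utr3_kb=None):
    """Estimate coding and untranslated lengths (kb) for one mRNA species.

    codons   -- number of amino acids in the open reading frame
    mrna_kb  -- total length of the mature mRNA in kb
    utr3_kb  -- length of 3'-untranslated sequence in kb, if known
    """
    coding_kb = codons * 3 / 1000          # stop codon not counted
    total_utr_kb = mrna_kb - coding_kb     # 5' plus 3' untranslated sequence
    utr5_kb = None if utr3_kb is None else total_utr_kb - utr3_kb
    return coding_kb, total_utr_kb, utr5_kb


# Values quoted in the text: 2148 codons, 8.6- and 11.0-kb mRNAs,
# and about 1.6 kb of 3'-untranslated sequence in cDNA clone pY19.
coding, utr_small, utr5 = utr_estimates(2148, 8.6, utr3_kb=1.6)
_, utr_large, _ = utr_estimates(2148, 11.0)

print(f"coding region: {coding * 1000:.0f} bp")
print(f"untranslated, 8.6-kb mRNA: {utr_small:.2f} kb")
print(f"untranslated, 11.0-kb mRNA: {utr_large:.2f} kb")
print(f"5'-untranslated, if pY19 derives from the 8.6-kb mRNA: {utr5:.2f} kb")
```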
Figure 2:
Nucleotide and deduced amino acid sequence
of the longest open reading frame of dir. Exon boundaries,
determined from comparison of genomic and cDNA sequences, are indicated above the nucleotide sequence. The putative signal or membrane
anchoring sequence (amino acids 270-297), and the transmembrane
domain in the β subunit (amino acids 1315-1338) are underlined. The endopeptidase cleavage site is indicated by bold lettering, and potential sites of N-linked
glycosylation are italicized. The beginning of the
carboxyl-terminal extension is indicated by the arrow at
position 1750.
The predicted DIR protein
is larger than the HIR (2148 versus 1355 amino acids) due to
extensions at both the amino- and carboxyl-terminal ends (Fig. 1D). Interestingly, the first complete codon in
exon 2 in both receptors encodes a conserved cysteine residue (Fig. 2 and Fig. 3). Likewise, other receptors in this
family, the IGF-I receptor and HIRR, also begin exon 2 at a homologous cysteine (Fig. 3). Most of the size difference in the α subunits is
accounted for by the first exon (341 versus 33 amino acids in
DIR and HIR, respectively), which has no significant homology with the
first exon of either the insulin or IGF-I receptors. Hydropathy
analysis does not reveal an amino-terminal signal sequence (Fig. 4A), but a stretch of 28 hydrophobic amino acids
located at positions 270-297 (Fig. 4A, open
arrow) suggests an internal signal or membrane anchoring sequence (Fig. 2, underlined). A similar internal hydrophobic
sequence is found in the Sevenless receptor(21) . The
carboxyl-terminal end of this hydrophobic stretch conforms reasonably
well with the (-3, -1) rule for identification of signal
sequence cleavage sites (22), suggesting that the mature α
subunit may begin at His-298. The size of dir α subunits
determined by gel electrophoresis after cross-linking to iodinated
insulin (2) is similar to that of the human insulin receptor
(110-120 versus 135 kDa, respectively), suggesting that
most of the leader sequence encoded by exon 1 is removed.
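A hydropathy scan of the kind used to locate the internal hydrophobic stretch can be sketched as a sliding-window average. The text does not state which scale or window was used, so the Kyte-Doolittle scale and a 19-residue window below are assumptions, and the test sequence is a made-up toy peptide, not DIR sequence.

```python
# Kyte-Doolittle hydropathy values (positive = more hydrophobic).
# The choice of this scale is an assumption; the paper does not name one.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Return the mean hydropathy of every full window along seq."""
    scores = [KYTE_DOOLITTLE[residue] for residue in seq.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# Hypothetical toy peptide: charged flanks around a 19-residue
# hydrophobic core that starts at index 20 (0-based).
toy = "DKRRE" * 4 + "LIVALLAVILGLVAFLLIV" + "DKRRE" * 4
profile = hydropathy_profile(toy)
peak = profile.index(max(profile))
print(f"most hydrophobic window starts at residue {peak + 1}")
```

Applied to the deduced DIR sequence, a scan of this kind would be expected to show maxima near residues 270-297 and 1315-1338, the two stretches identified in the text.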
Figure 3: Comparison of amino acid sequences of the DIR and other insulin receptor family receptor tyrosine kinases. The single letter amino acid code is used. The amino acid sequences are aligned from the beginning of exon 2 and the carboxyl-terminal extension of the DIR is excluded. Amino acid identities with the DIR are indicated by capitalized letters, and positions of identical amino acids in all four receptors are indicated by stippled boxes underlying the sequences. The positions of DIR cysteine residues which are conserved in at least three of the four receptors are indicated by asterisks. The endopeptidase processing sites and transmembrane domains are underlined. Receptor identity and amino acid numbers are indicated on the left. The sequences were aligned with the ALIGN program.
Figure 4:
Hydrophilicity analysis of the predicted
DIR amino acid sequence. Positive values indicate more hydrophobic
regions. A, the α subunit. The putative internal signal
sequence or membrane anchoring domain (open arrow) and two
smaller more amino-terminal hydrophobic domains (arrowheads) are
indicated. B, the β subunit. The transmembrane domain is
indicated (open arrow).
The DIR
amino-terminal region encoded by exon 1 contains an unusual arrangement
of sequence motifs. Two hydrophobic sequences of 17 amino acids each
are found amino-terminal to the putative signal sequence, at positions
156-173 and 206-223 (Fig. 2 and Fig. 4A,
arrowheads). Immediately following the most amino-terminal
17-amino acid hydrophobic stretch is the sequence KRRRR, which may
represent a proteolytic processing site similar to that which is used
to separate the α and β subunits (residues 1086-1089). A
Gln- and His-rich sequence follows this stretch of 5 basic residues,
consisting of 26 residues of which 23 are either Gln or His. This
Gln/His-rich domain terminates at residue 205, just prior to the second
17-amino acid hydrophobic domain. A search of the protein data bases
for proteins exhibiting homology to this region of the DIR primarily
yields a variety of transcription factors including homeobox proteins,
many of which contain domains rich in glutamine or histidine residues.
The functional significance of this homology is currently unknown.
The overall level of amino acid identity between the DIR and HIR is
32.5% (excluding exon 1 and the carboxyl-terminal extension), although
the level of homology is higher if conservative substitutions are
considered. The level of identity with the IGF-I receptor is similar
(33.3%) and slightly lower with the IRR (30.7%). A higher level of
identity is found in the insulin receptor β subunits (48%) than in
the α subunits (31%) because of the highly conserved tyrosine
kinase domain. However, within the α subunit, the identity in exon
2 rises to 45%. The homology is highest in the amino-terminal portion
of exon 2 (Fig. 3), which has been shown to contain determinants
responsible for high affinity insulin binding in the human insulin
receptor (23, 24, 25, 26) . dir exons 3-5 also exhibit greater than 30% amino acid identity
with all of the insulin receptor family members (Table 2).
Similar to the mammalian receptors, the cysteine-rich domain begins in
exon 2 and extends through exons 3 and 4 (Fig. 3, asterisks). The DIR has 25 cysteine residues in this region,
of which 18 occupy conserved positions in all four of the insulin
receptor family members compared (Fig. 3). The positions of 24
cysteines are conserved in the human insulin receptor, and 22 in the
IGF-I receptor (Fig. 3). Both the cysteine-rich domain (27, 28) and regions located carboxyl-terminal to the
cysteine-rich domain (29) appear to contribute to insulin
binding specificity. Thus, the ability of the DIR to bind mammalian
insulin with reasonably high affinity is consistent with the high
degree of sequence conservation in these regions of the α subunit.
Conversely, the carboxyl terminus of the DIR α
subunit encoded by dir exons 6-8 exhibits much lower levels of homology (Table 2 and Fig. 3), suggesting that the carboxyl-terminal
region of the α subunit contributes very little to the formation of
the ligand binding pocket.
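Identity values such as those quoted above are computed from a pairwise alignment. The sketch below shows one common convention, in which columns containing a gap in either sequence are excluded from the denominator; whether the ALIGN program output was scored this way is not stated in the text, so this convention is an assumption, and the example strings are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over the ungapped columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences have a residue.
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b)
        if a != gap and b != gap
    ]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


# Hypothetical aligned fragments: five ungapped columns, four identical.
print(percent_identity("ACDE-F", "ACDQGF"))
```

Restricting the input to the columns of a single exon or subunit gives region-specific values of the kind reported for exon 2 and for the α and β subunits.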
The cytoplasmic portion of the DIR β
subunit contains the kinase domain, which is the region of highest
homology common to all the insulin receptor family members. The portion
of the DIR β subunit which can be aligned with the HIR (Fig. 3) contains 6 cysteine residues, three of which occupy
conserved positions in other insulin receptor family members (Fig. 3, asterisks). One DIR cysteine,
Cys
, is displaced 10 residues toward the carboxyl
terminus relative to the position of a conserved cysteine in the
mammalian receptors (Cys
in the HIR, Fig. 3).
Interestingly, the c-ros protooncogene product resembles the
DIR in that it contains a cysteine residue in the identical position (30). There are 18 tyrosine residues in the portion of the DIR β
subunit which is colinear with the HIR. The positions of 9
tyrosines are conserved in the HIR and HIRR, and 10 in the IGF-I
receptor (Fig. 3). The conserved tyrosines include DIR
Tyr
which, like its counterpart Tyr
in the
juxtamembrane domain of the HIR, is found in an NPXY motif
(see below). This residue in the HIR is a major site of
autophosphorylation (31) and appears to play a critical role in
the interaction with and/or phosphorylation of substrates necessary for
signal transduction(32, 33) . Notably, the equivalent
tyrosine residue in all three mammalian receptors is preceded by an
acidic amino acid (Glu), a feature common to many tyrosine
phosphorylation sites(34) , whereas in the DIR, this tyrosine
is preceded by a hydrophobic residue (Phe).
The DIR β subunit
contains a carboxyl-terminal extension of approximately 400 amino acids (Fig. 2, arrow). It was previously reported that a
termination codon (TGA) followed Pro
(numbering
according to Fig. 2), yielding a DIR β subunit comparable in
size to that of the human insulin receptor (5). However, that
sequence included an additional cytosine residue (3287, numbering
according to (5) ) which was not found in the cDNA and genomic
clones described here. The absence of this residue leads to a
frameshift which extends the open reading frame for an additional 392
amino acids. The extension has no significant homology with the HIR
although its structure predicts an important role in DIR signal
transduction. The carboxyl-terminal extension contains 9 tyrosine
residues, some of which represent potential phosphorylation sites.
Residues Tyr
, Tyr
, Tyr
,
Tyr
, and Tyr
have nearby acidic residues.
Three tyrosines, Tyr
, Tyr
, and
Tyr
, are found in YXXM motifs which serve as
excellent substrates for the insulin receptor kinase (35) and
are involved in the binding and activation of phosphatidylinositol
3'-kinase by IRS-1 (36). However, none of the DIR YXXM
motifs exhibit amino-terminal acidic residues which have been shown to
significantly enhance the efficiency of tyrosine
phosphorylation(35) . Tyr
is part of the
sequence YRLL, which resembles the YXXL motif found in the
cytoplasmic domains of Fc, B cell, and T cell receptor subunits (37) which are involved in binding tyrosine kinases of the src and syk family (38, 39, 40) . Interestingly, these 4
tyrosines are located downstream from another apparent motif,
SXNPN, beginning at the position -5 relative to the
tyrosine residue. This suggests the consensus sequence
SXNPNYXXM/L as a functional domain in this part of
the DIR. The only other DIR β
subunit tyrosine found in a similar
context is Tyr
in the juxtamembrane domain, which is
found in the sequence VNPFY
ASM. The presence of
asparagine and proline at the -3 and -2 positions,
respectively, relative to a putative tyrosine phosphorylation site is
not a feature common to many receptor tyrosine kinases or known
substrates(6) . Although the HIR, IGF-I receptor, and IRR all
contain an NPXY sequence in their juxtamembrane domains, this
motif is not found elsewhere in the receptors, nor in their major
substrate, insulin receptor substrate-1 (IRS-1; 41). It is found in the
Trk and epidermal growth factor receptor tyrosine
kinases(42, 43) . In the Trk receptor, it precedes
Tyr
, phosphorylation of which is necessary for the
association of the activated receptor with SHC proteins(44) .
Thus, this motif may represent a site of interaction with a subset of
signaling proteins, and its prevalence in the carboxyl terminus of the
DIR suggests that this domain is likely to play an important role in
signaling via direct associations with particular SH2-domain proteins.
This would represent a departure from mammalian insulin and IGF-I
receptors, which seem not to form direct, stable interactions with
signaling molecules(7) . Instead, the mammalian receptors
primarily employ the substrate IRS-1 to recruit SH2 domain proteins
into signaling complexes(7) . Although direct comparison of the
amino acid sequences of IRS-1 (41) and the DIR
carboxyl-terminal extension reveals only a very weak homology over a
short region (18% in 132 amino acid overlap based primarily on the
alignment of two YXXM motifs), the presence of these motifs in
both molecules suggests a functional, and possibly an evolutionary,
relationship between them.
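The motifs discussed above can be located in a protein sequence with simple pattern matching. The following is a minimal sketch; the pattern names, the helper find_motifs, and the toy sequence are illustrative and are not taken from the DIR sequence. Lookahead groups are used so that overlapping occurrences are not missed.

```python
import re

# X in the motif notation means any residue, written "." in the patterns.
YXXM_OR_YXXL = re.compile(r"(?=(Y..[ML]))")
SXNPNYXX_ML = re.compile(r"(?=(S.NPNY..[ML]))")  # proposed extended consensus


def find_motifs(seq, pattern):
    """Return (1-based start position, matched text) for every occurrence."""
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(seq)]


# Hypothetical toy sequence containing two extended motifs and one
# tyrosine motif that lacks the upstream SXNPN context.
TOY = "MASGNPNYRLLTTSANPNYDEMKKYAAL"

for start, motif in find_motifs(TOY, YXXM_OR_YXXL):
    print(f"Y at {start}: {motif}")
for start, motif in find_motifs(TOY, SXNPNYXX_ML):
    # The serine sits at position -5 relative to the tyrosine.
    print(f"S at {start}, Y at {start + 5}: {motif}")
```

Comparing the two result lists separates tyrosines that sit in the extended SXNPNYXXM/L context from those that match only the shorter YXXM or YXXL pattern.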
The presence of a carboxyl-terminal
extension as determined by DNA sequencing suggests that a larger DIR β
subunit protein will be found. This is consistent with previous
observations of a 170-kDa putative receptor subunit which is directly
recognized by an antibody against the HIR tyrosine kinase
domain (2). Two additional antibodies raised against peptides
derived from the DIR β subunit also recognize the larger subunit,
as well as smaller 93-102 kDa subunits. In addition,
expression of a dir cDNA in human cells results in the
appearance of a β subunit protein of approximately 180 kDa.
Thus, consistent with the sequence analysis, the DIR β
subunit is substantially larger than its mammalian counterparts. The
relationship between this larger form and the smaller 93-102 kDa
subunits which are also recognized by the same three
receptor-specific antibodies remains to be determined. It seems likely
that post-translational processing, e.g. proteolysis, and not
alternative splicing will account for the different receptor forms
because the transmembrane and cytoplasmic domains of the β subunit
including the carboxyl-terminal extension are encoded by a single exon (Fig. 1).
In summary, the determination of the complete cDNA sequence and genomic organization of the dir gene reveals that it is highly related to its mammalian counterparts, and provides some insight into the basis for ligand binding specificity. While the overall structure of the DIR protein is conserved, the presence of large amino- and carboxyl-terminal extensions represents a departure from other members of the insulin receptor family. In particular, the carboxyl-terminal extension contains structural features suggesting an important role in signal transduction, possibly in the interaction with other signaling proteins. Such direct interactions would resemble the signal transduction paradigm utilized by other receptor tyrosine kinases, such as the receptors for platelet-derived growth factor and epidermal growth factor. Therefore, this paradigm, while retained by some classes of receptor tyrosine kinases, may be the evolutionary predecessor of that utilized by mammalian insulin and IGF-I receptors in which the functions of phosphorylation and signal complex formation appear to have been largely dissociated.
The nucleotide sequence(s) reported in this paper has been submitted to the GenBank(TM)/EMBL Data Bank with accession number(s) U18351.