1Institute of Biotechnology (Biocenter 3), University of Helsinki, PO Box 65, Viikinkaari 1, FIN-00014 Helsinki, Finland, 2Viikki Graduate School in BioSciences, PO Box 56, Viikinkaari 9, FIN-00014 Helsinki, Finland, 3Department of Biological Services and 4Department of Structural Biology, Weizmann Institute of Science, 76100 Rehovot, Israel
5 To whom correspondence should be addressed. E-mail: adrian.goldman{at}helsinki.fi
Abstract
Keywords: G protein-coupled receptors/intrinsically unstructured proteins/membrane proteins/sequence prediction
Introduction
The N, C, intracellular (1i–3i) and extracellular (1e–3e) domains of GPCRs are usually fairly large (>30 residues), especially in the rhodopsin-like class A GPCRs (Baldwin et al., 1997; Horn et al., 2001). Using the intrinsic protein disorder prediction program FoldIndex, we show here that, for human GPCRs, 55% of the N-termini, 69% of the C-termini and 56% of the third intracellular loops appear to contain intrinsically unstructured regions (IURs). We therefore compared in detail the predicted folding of 147 human class A GPCRs with that of integral membrane proteins of known structure, soluble proteins and previously identified intrinsically unstructured proteins (IUPs). Our analysis demonstrates that class A 3i domains usually contain long disordered stretches, unlike most membrane proteins of known structure. This suggests that intrinsic disorder may play a role in GPCR function in vivo. Consistent with this, the 3i domains have an unusual amino acid composition, suggesting that they form a hitherto unidentified class of IUPs.
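To make the domain nomenclature concrete, the following Python sketch splits a receptor sequence into its N, 1i, 1e, 2i, 2e, 3i, 3e and C segments given seven transmembrane (TM) helix boundaries. It assumes 1-based, inclusive (start, end) pairs of the kind found in SWISS-PROT feature tables and the standard GPCR topology with an extracellular N-terminus, so that 1i joins TM1 and TM2; `split_domains` is a hypothetical helper, not part of any tool used in this work.

```python
# Hypothetical helper: split a GPCR sequence into extramembranous domains.
# Assumes seven TM helices given as 1-based, inclusive (start, end) pairs and
# an extracellular N-terminus, so loops alternate 1i, 1e, 2i, 2e, 3i, 3e.

LOOP_NAMES = ["1i", "1e", "2i", "2e", "3i", "3e"]


def split_domains(seq: str, helices: list[tuple[int, int]]) -> dict[str, str]:
    if len(helices) != 7:
        raise ValueError("expected seven transmembrane helices")
    # N-terminus: residues 1 .. start(TM1) - 1
    domains = {"N": seq[: helices[0][0] - 1]}
    # Each loop: residues strictly between the end of one helix and the start of the next
    for name, (prev, nxt) in zip(LOOP_NAMES, zip(helices, helices[1:])):
        domains[name] = seq[prev[1] : nxt[0] - 1]
    # C-terminus: residues end(TM7) + 1 .. end of sequence
    domains["C"] = seq[helices[-1][1] :]
    return domains


toy = "ABCDEFGHIJKLMNOPQRSTUVWX"
tm = [(3, 4), (6, 7), (9, 10), (12, 13), (15, 16), (18, 19), (21, 22)]
print(split_domains(toy, tm)["3i"])  # Q
```

With real entries, the domain lengths obtained this way can be checked against the >30-residue sizes quoted above.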
Materials and methods
Data set I. We chose a set of 343 human GPCR sequences from the SWISS-PROT protein sequence data bank such that the entries were unique and cross-referenced according to the GPCRDB cross-reference database [http://www.gpcr.org/ (Horn et al., 2001); a list of selected GPCRs is given in the Supplementary data available at PEDS Online]. The boundaries for the helices were taken from the SWISS-PROT entries, which are based on the bovine rhodopsin crystal structure [http://www.gpcr.org/ (Horn et al., 2001)]. We recorded the ID number, accession number and overall sequence length for each protein. For each extramembranous domain (EMD), we also recorded the residue numbers of its start and end and calculated its overall average FoldIndex [http://bioportal.weizmann.ac.il/fldbin/findex (Zeev-Ben-Mordehai et al., 2003)]. The FoldIndex (I_F) was calculated as in Zeev-Ben-Mordehai et al. (2003) using Equation 1:
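$$I_F = 2.785\,\langle H\rangle - \lvert\langle R\rangle\rvert - 1.151 \qquad (1)$$

where ⟨H⟩ is the mean hydrophobicity (Kyte and Doolittle, 1982 scale, normalized to the range 0–1) and ⟨R⟩ is the mean net charge at neutral pH; negative values of I_F indicate probable disorder.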
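Equation 1 can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the FoldIndex server's code: Kyte–Doolittle values are rescaled linearly from [−4.5, 4.5] to [0, 1], net charge counts only Lys/Arg (+1) and Asp/Glu (−1), only the 20 standard residues are handled, and the 51-residue window used for the profile is an assumed default.

```python
# Minimal sketch of Equation 1 (FoldIndex); assumptions stated in the lead-in.

# Kyte-Doolittle hydropathy values for the 20 standard residues.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}


def mean_hydrophobicity(seq: str) -> float:
    """Mean Kyte-Doolittle hydropathy, rescaled from [-4.5, 4.5] to [0, 1]."""
    return sum((KD[aa] + 4.5) / 9.0 for aa in seq) / len(seq)


def mean_net_charge(seq: str) -> float:
    """Mean net charge at neutral pH: Lys/Arg +1, Asp/Glu -1 (His ignored)."""
    pos = sum(seq.count(aa) for aa in "KR")
    neg = sum(seq.count(aa) for aa in "DE")
    return (pos - neg) / len(seq)


def fold_index(seq: str) -> float:
    """I_F = 2.785 <H> - |<R>| - 1.151; negative values suggest disorder."""
    return 2.785 * mean_hydrophobicity(seq) - abs(mean_net_charge(seq)) - 1.151


def fold_index_profile(seq: str, window: int = 51) -> list[float]:
    """I_F for every full-length window along the sequence (window size assumed)."""
    return [fold_index(seq[i : i + window]) for i in range(len(seq) - window + 1)]


print(round(fold_index("I" * 20), 3))  # 1.634 (maximally hydrophobic, uncharged)
print(round(fold_index("K" * 20), 3))  # -1.965 (highly charged, hydrophilic)
```

Averaging `fold_index_profile` values over an EMD is one plausible way to obtain a per-domain score; the server may smooth or average differently.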
For comparison, full sequences from data set I and the control set of membrane proteins (MPs) were analyzed using the neural network program DisEMBL [http://dis.embl.de/ (Linding et al., 2003)], run with default parameters. Using the Remark-465 predictor, we recorded the maximum length of predicted disorder for each sequence in data set I and in the MP set. This neural network is trained to recognize disorder in sequences on the basis of coordinates missing from the PDB files in its training set (Linding et al., 2003).
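Recording the maximum length of predicted disorder reduces to finding the longest run of consecutive disordered residues. The Python sketch below assumes the prediction has already been parsed into 1-based, inclusive (start, end) regions; that input format and both function names are hypothetical, and parsing DisEMBL output is not shown.

```python
from itertools import groupby


def regions_to_flags(length: int, regions: list[tuple[int, int]]) -> list[bool]:
    """Per-residue disorder flags from 1-based, inclusive (start, end) regions."""
    flags = [False] * length
    for start, end in regions:
        for i in range(start - 1, end):
            flags[i] = True
    return flags


def longest_disordered_run(flags: list[bool]) -> int:
    """Length of the longest stretch of consecutive residues flagged as disordered."""
    return max(
        (sum(1 for _ in run) for disordered, run in groupby(flags) if disordered),
        default=0,
    )


flags = regions_to_flags(10, [(2, 4), (6, 10)])
print(longest_disordered_run(flags))  # 5
```

Working on per-residue flags rather than on the region list means that adjacent or overlapping regions are automatically merged into one stretch.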
In the charge–hydrophobicity plots (Figure 2), the boundary between IUPs and native proteins was calculated as in Uversky et al. (2000) using Equation 2:
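$$\langle H\rangle_b = \frac{\lvert\langle R\rangle\rvert + 1.151}{2.785} \qquad (2)$$

where ⟨H⟩_b is the mean hydrophobicity on the boundary at a given mean net charge ⟨R⟩; proteins with ⟨H⟩ below ⟨H⟩_b fall on the IUP side of the plot.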
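Setting I_F = 0 in Equation 1 and solving for ⟨H⟩ gives Equation 2, so the boundary line of the charge–hydrophobicity plot is exactly the line of zero FoldIndex. The Python sketch below checks this numerically; the function names are hypothetical, and the example point (mean net charge 0.12, mean hydrophobicity 0.39) only borrows values of the order reported for IUPs in this paper for illustration.

```python
def boundary_hydrophobicity(mean_charge: float) -> float:
    """Equation 2: mean hydrophobicity on the IUP/folded boundary."""
    return (abs(mean_charge) + 1.151) / 2.785


def fold_index_from_means(mean_charge: float, mean_hydro: float) -> float:
    """Equation 1 expressed directly in terms of the two means."""
    return 2.785 * mean_hydro - abs(mean_charge) - 1.151


def predicted_folded(mean_charge: float, mean_hydro: float) -> bool:
    """True if the point lies above the boundary line (the 'folded' side)."""
    return mean_hydro > boundary_hydrophobicity(mean_charge)


print(round(boundary_hydrophobicity(0.12), 3))  # 0.456
print(predicted_folded(0.12, 0.39))             # False

# Being above the boundary is equivalent to having a positive FoldIndex.
for r in (0.0, 0.1, 0.2):
    for h in (0.3, 0.5, 0.6):
        assert predicted_folded(r, h) == (fold_index_from_means(r, h) > 0)
```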
Results
The relationship of mean net hydrophobicity to mean net charge was essentially the same for data set I [all human GPCRs (n = 343)] and for the control set of membrane proteins of known structure (MPs; n = 91) (Figure 2). The overall FoldIndex value (± standard error) for full-length sequences was 0.366 (±0.005) for GPCRs and 0.254 (±0.016) for MPs. Both sets of proteins are thus predicted to be folded; if anything, the GPCRs appear more folded than the control set. We attributed this to the transmembrane helices of the GPCRs and, to examine it more closely, classified the MPs according to the listing on the web site (http://blanco.biomol.uci.edu/Membrane_Proteins_xtal.html; Table I). The FoldIndex server clearly does not predict some of the secondary structure elements in β-barrel structures, such as porins, very well (overall score of 0.078 ± 0.011, n = 25; Table I). Although porins as a whole were predicted to be folded by FoldIndex, some regions of regular secondary structure were predicted to be disordered with the default settings. We avoided this problem by applying a cut-off value (FoldIndex 0.3) to the data set. With these parameters, FoldIndex and DisEMBL gave similar length distributions of predicted unfolded regions (Figure 3).
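The comparison of predicted unfolded-region lengths can be illustrated with a short Python sketch. It assumes a per-residue FoldIndex profile is already available and treats the cut-off as a free parameter, since exactly how it was applied is not spelled out here; `unfolded_segment_lengths` and `length_distribution` are hypothetical names.

```python
from collections import Counter
from itertools import groupby


def unfolded_segment_lengths(profile: list[float], cutoff: float) -> list[int]:
    """Lengths of maximal runs of consecutive positions whose value is below `cutoff`."""
    return [
        len(list(run))
        for below, run in groupby(profile, key=lambda v: v < cutoff)
        if below
    ]


def length_distribution(profiles: list[list[float]], cutoff: float) -> Counter:
    """Pool segment lengths over many proteins into a length -> count histogram."""
    counts: Counter = Counter()
    for profile in profiles:
        counts.update(unfolded_segment_lengths(profile, cutoff))
    return counts


print(unfolded_segment_lengths([0.1, -0.2, -0.1, 0.3, -0.5], cutoff=0.0))  # [2, 1]
```

The same histogram function could be fed per-residue DisEMBL scores instead, which is one way to put the two predictors on a common footing.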
We also compared the rhodopsin-like class A GPCRs (data set II; n = 147) with the integral membrane proteins of known structure (above) and with data from Uversky et al. (2000) (Table I). We chose class A GPCRs because the only GPCR of known structure is bovine rhodopsin (Palczewski et al., 2000) and because our previous studies have focused on the rhodopsin-like α2-adrenergic receptor (Liitti et al., 1997; Bartus et al., 2003; Sen et al., 2003; Jaakola et al., 2005).
Class A GPCRs (n = 147), folded soluble proteins (n = 275) and membrane proteins of known structure (n = 91) all have similar overall mean net charge and mean net hydrophobicity (Table I and Figure 2). The class A intracellular domains nonetheless have IUR-like mean net charge (0.12 for natively unfolded proteins/IURs and 0.11 for the largest intracellular domain of class A GPCRs), unlike folded proteins, whose mean net charge is nearly neutral (0.04) (Table I). Of the GPCR loops, the 3i was by far the most unfolded and had the highest net charge (0.17) (Table I). Conversely, the N- and C-termini are marginally folded (FoldIndex scores of 0.07 and 0.003, respectively), with net charges similar to those of folded soluble proteins (Table I). There was no correlation between domain length and IUP-like character; domains as long as 100 residues all appeared to be unfolded (Figure 5). All membrane proteins have higher mean net hydrophobicity than the soluble IUPs, even in the apparently disordered loops (0.43–0.55 versus 0.39) (Table I). This suggested that there might be significant sequence differences between soluble IUPs and the IURs of GPCRs.
Discussion
Intrinsic disorder
Among the hundreds of IUPs and IURs found so far (Uversky and Fink, 2004), many act in important biological processes, including regulatory roles in transcription, translation, signal transduction and cell cycle control (Wright and Dyson, 1999; Dunker et al., 2001; Tompa, 2002; Ward et al., 2004). Long disordered regions are predicted far more often in eukaryotic proteins (33%) than in eubacterial (4.2%) or archaeal (2%) proteins (Ward et al., 2004). In vitro, IUPs and IURs lack tertiary structure, have no tightly packed protein core and are highly flexible (Uversky, 2002b); the lack of structure, like structure itself, is encoded in the sequence. The IURs identified so far contain relatively little Trp, Tyr, Phe, Cys, Ile, Leu and Met but are significantly enriched in Pro, Glu, Lys, Ser and Gln (Romero et al., 2001; Tompa, 2002). They also carry many uncompensated charged groups, chiefly Glu and Lys, which give them a large net charge at neutral pH and low mean net hydrophobicity (Uversky et al., 2000). For the α2b-AR, which we have studied extensively by biochemical methods (Liitti et al., 1997; Bartus et al., 2003; Sen et al., 2003; Jaakola et al., 2005), it is clear that the long 3i is disordered (Figure 7): it is the site of extensive proteolysis, and truncating the 3i yields a more stable protein (V.-P.Jaakola et al., unpublished work). In addition, long 3i sequences are frequently so unusual that their disorder is evident even without sequence analysis. We show here, however, that this predicted disorder is GPCR-wide and has unusual sequence properties.
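The compositional signature described above can be summarized per sequence as the fraction of residues in each of the two classes named in the text. This Python sketch uses exactly those residue lists (Trp, Tyr, Phe, Cys, Ile, Leu, Met versus Pro, Glu, Lys, Ser, Gln); the function name is hypothetical and the two fractions are a simple summary, not a published disorder score.

```python
# Residue classes as listed in the text (Romero et al., 2001; Tompa, 2002).
DEPLETED_IN_IURS = set("WYFCILM")
ENRICHED_IN_IURS = set("PEKSQ")


def iur_composition(seq: str) -> tuple[float, float]:
    """Fractions of residues in the depleted and enriched classes, respectively."""
    n = len(seq)
    depleted = sum(aa in DEPLETED_IN_IURS for aa in seq) / n
    enriched = sum(aa in ENRICHED_IN_IURS for aa in seq) / n
    return depleted, enriched


print(iur_composition("PEKSQWWL"))  # (0.375, 0.625)
```

Comparing these fractions between GPCR 3i domains and known soluble IUPs is one direct way to look for the sequence differences suggested by the hydrophobicity data.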
Overall, the decrease in negatively charged and increase in positively charged residues in the 3i and C-terminal domains presumably reflect their location inside the cell, close to the membrane. However, the ratio of positively to negatively charged residues is very large in the 3i domains, almost 4:1, whereas in the C-terminal domains it is <2:1. The intracellular 3i is predicted to be unfolded in the absence of interacting partners. Membrane proteins of known structure do not show an increase in Arg and Lys, but this may be because these proteins are integral to the membrane, whereas the 3i loop lies on the membrane surface. Charged residues such as Arg and Lys are frequently used to position the TM helices, acting as set-screws and stop-transfer signals (von Heijne and Gavel, 1988; von Heijne, 1989), but the very high percentage of Arg and Lys in the 3i indicates that they must also play other roles.
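The positive-to-negative ratio discussed above can be computed along the following lines. This Python sketch counts Lys and Arg as positive and Asp and Glu as negative, ignoring His; that counting rule is our assumption about how such ratios are obtained, and both function names are hypothetical.

```python
def charge_ratio(seq: str) -> float:
    """Ratio of positively (Lys, Arg) to negatively (Asp, Glu) charged residues."""
    positive = sum(seq.count(aa) for aa in "KR")
    negative = sum(seq.count(aa) for aa in "DE")
    return positive / negative if negative else float("inf")


def pooled_charge_ratio(domains: list[str]) -> float:
    """Ratio over a set of domains pooled together, so long domains weigh more."""
    return charge_ratio("".join(domains))


print(charge_ratio("KRKRKRKRDE"))                  # 4.0
print(pooled_charge_ratio(["KKKD", "RRE", "KD"]))  # 2.0
```

Pooling all 3i sequences before taking the ratio, as `pooled_charge_ratio` does, weights long domains more heavily than averaging per-domain ratios would, and it avoids undefined ratios for short loops that contain no acidic residues.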
Cellular signaling of GPCRs
Each G protein-coupled receptor has several partners in the cellular signaling cascade, such as G proteins (Preininger and Hamm, 2004), arrestins (Lefkowitz and Whalen, 2004) and GPCR kinases (Lefkowitz et al., 2002). Furthermore, the receptor environment changes during various re-localization processes in the living cell. Binding of specific partner proteins can induce structural changes in IUPs and IURs, and the mean net charge and hydrophobicity of the resulting complex will presumably be closer to those of typical folded proteins. There are many examples of such regulation in soluble proteins (Kriwacki et al., 1996; Wright and Dyson, 1999). Many of the recently identified GPCR partners interact via the C-terminal (Brakeman et al., 1997; Klein et al., 1997; Hall et al., 1998; Lezcano et al., 2000) or 3i domains (Wu et al., 1997; Prezeau et al., 1999; Heuss and Gerber, 2000), which we predict to be the most unfolded regions in GPCRs (Table II).
The main GPCR partner is Gα, which many biochemical and pharmacological studies have shown to interact with the 2i, 3i and C-terminal regions of GPCRs (reviewed in Hamm, 2001; Preininger and Hamm, 2004). The inactive structure of bovine rhodopsin reveals little about these interactions, as large structural movements occur upon receptor activation and signaling. Presumably some of these interactions involve pre-formed structures, whereas others are based on linear sequence recognition. For instance, NMR studies with a soluble rhodopsin receptor-mimetic peptide revealed significant ordering both at the receptor C-terminus and in the flexible C-terminal region of Gα (Brabazon et al., 2003). However, no ordering of the 3i of the receptor-mimetic peptide was seen on interaction with the G protein peptides (Brabazon et al., 2003). These findings suggest that the interaction between the C-terminus of rhodopsin and Gα might be based on linear sequence recognition, consistent with the analysis presented here.
Both zebra fish and human α2-adrenergic receptors have extremely long 3i domains (>100 residues), and even the most divergent regions of the zebra fish receptors show clear molecular fingerprints for each subtype (Ruuskanen et al., 2004). Each of these 3i domains is predicted to be unfolded, and the disorder must therefore be relevant for function, as it has been preserved over 400 million years of evolution. In particular, despite the general lack of negatively charged residues in 3i domains (Table III; Figure 6), the human α2a- and α2b-adrenergic receptors have long negatively charged regions: the former contains 301DLEES4DHAE and the latter 294EDEAE12CE and 245EKEEGETPED, some or all of which are conserved among most α2a- and α2b-adrenergic receptors, including those of zebra fish. The α2c-adrenergic receptor, however, has no equivalent negatively charged region. Such regions presumably have functional significance and interact with other proteins, but they are not required for coupling to G proteins or to other currently identified interacting partners. Other interactions and interacting partners may therefore remain to be discovered, as the recent identification of β-arrestin (Wu et al., 1997) and spinophilin (Richman et al., 2001) binding to the 3i domains of GPCRs illustrates. Similarly, the 14-3-3ζ protein, which can induce conformational changes in its target proteins (Bridges and Moorhead, 2004), binds to the 3i of α2-ARs (Prezeau et al., 1999).
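Long negatively charged regions like those noted above can be located by a simple sliding-window scan. The Python sketch below is an illustration under assumed parameters: a 10-residue window and a minimum Asp + Glu fraction of 0.6 are arbitrary choices, not values used in this work, and `acidic_stretches` is a hypothetical helper.

```python
def acidic_stretches(
    seq: str, window: int = 10, min_fraction: float = 0.6
) -> list[tuple[int, int]]:
    """1-based, inclusive spans covered by windows with Asp + Glu >= min_fraction.

    Overlapping or adjacent qualifying windows are merged, so a reported span
    can include a few non-acidic residues at its ends.
    """
    spans: list[tuple[int, int]] = []
    for i in range(len(seq) - window + 1):
        acidic = sum(aa in "DE" for aa in seq[i : i + window])
        if acidic / window >= min_fraction:
            start, end = i + 1, i + window  # convert to 1-based, inclusive
            if spans and start <= spans[-1][1] + 1:
                spans[-1] = (spans[-1][0], end)  # extend the current span
            else:
                spans.append((start, end))
    return spans


print(acidic_stretches("A" * 10 + "E" * 10 + "A" * 10))  # [(7, 24)]
```

Because whole windows are merged, the reported span is wider than the poly-Glu core itself; trimming non-acidic residues from the ends would be a straightforward refinement.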
Implications for structural studies
The loops and coils connecting the more regular elements of protein secondary structure are often more disordered than the core structure itself. However, the loops found in the PDB are usually fairly short, <10 residues long on average (Espadaler et al., 2004), although there are exceptions (Abdel-Meguid et al., 1984). Crystallizability correlates with the absence of disordered structure.
One approach to dealing with this problem, adopted by most if not all structural genomics projects, is to eliminate as crystallization targets all proteins with long disordered regions: the so-called low-hanging fruit approach (Linding et al., 2003). However, this would rule out structural studies of GPCRs; moreover, if IURs are common in eukaryotes and used in signaling (Dunker et al., 2002; Pe'er et al., 2004), it will not be possible to study key signaling molecules. Picking the low-hanging fruit may therefore lead to the proverbial problem of the drunk searching under the lamp-post: the keys are not there, but the light is bright.
Co-crystallization of GPCRs with cognate protein ligands may help. An obvious alternative (Linding et al., 2003) is to modify or remove the IUR sequences, but this requires identifying such regions, as we have done above, and then testing the modified protein to ensure that its function is unchanged. For instance, the structure of the Yersinia enterocolitica YadA head group could be solved (Nummelin et al., 2004), but only once the leucine-triple helix stalk had been removed (H.Nummelin, personal communication). This region (residues 225–380) is predicted to be disordered by FoldIndex (Figure 7).
Conclusions
This and other recent work suggest that it is probably incorrect to speak of a single type of IUP or IUR. All IUPs and IURs have low hydrophobicity and high net charge, but their amino acid compositions can differ (Vucetic et al., 2003). For instance, Lu and Hansen (2004) recently showed that the linker histone C-terminal domain (CTD) appears to form a sequence-dependent structure in the presence of DNA, rather than interacting in a completely unstructured, charge-dependent fashion. This implies that it, too, is an IUR, but the percentage of lysine in the CTD is 36–41%, even higher than in IUPs. There may therefore be as many different kinds of IURs and IUPs as there are IUR and IUP functions.
Supplementary data
The proteins studied are listed in the Supplementary data, available at PEDS Online. A more detailed comparison of membrane proteins of known structure is also given in the Supplementary data.
References
Baldwin,J.M., Schertler,G.F. and Unger,V.M. (1997) J. Mol. Biol., 272, 144–164.
Bartus,C.L., Jaakola,V.P., Reusch,R., Valentine,H.H., Heikinheimo,P., Levay,A., Potter,L.T., Heimo,H., Goldman,A. and Turner,G.J. (2003) Biochim. Biophys. Acta, 1610, 109–123.
Berman,H.M., Westbrook,J., Feng,Z., Gilliland,G., Bhat,T.N., Weissig,H., Shindyalov,I.N. and Bourne,P.E. (2000) Nucleic Acids Res., 28, 235–242.
Brabazon,D.M., Abdulaev,N.G., Marino,J.P. and Ridge,K.D. (2003) Biochemistry, 42, 302–311.
Brakeman,P.R., Lanahan,A.A., O'Brien,R., Roche,K., Barnes,C.A., Huganir,R.L. and Worley,P.F. (1997) Nature, 386, 284–288.
Bridges,D. and Moorhead,G.B. (2004) Sci. STKE, re10.
Dafforn,T.R. and Rodger,A. (2004) Curr. Opin. Struct. Biol., 14, 541–546.
Dunker,A.K. et al. (2001) J. Mol. Graph. Model., 19, 26–59.
Dunker,A.K., Brown,C.J. and Obradovic,Z. (2002) Adv. Protein Chem., 62, 25–49.
Espadaler,J., Fernandez-Fuentes,N., Hermoso,A., Querol,E., Aviles,F.X., Sternberg,M.J. and Oliva,B. (2004) Nucleic Acids Res., 32, Database issue, D185–D188.
Gether,U. and Kobilka,B.K. (1998) J. Biol. Chem., 273, 17979–17982.
Hall,R.A. et al. (1998) Nature, 392, 626–630.
Hamm,H.E. (2001) Proc. Natl Acad. Sci. USA, 98, 4819–4821.
Heuss,C. and Gerber,U. (2000) Trends Neurosci., 23, 469–475.
Horn,F., Vriend,G. and Cohen,F.E. (2001) Nucleic Acids Res., 29, 346–349.
Iakoucheva,L.M., Kimzey,A.L., Masselon,C.D., Bruce,J.E., Garner,E.C., Brown,C.J., Dunker,A.K., Smith,R.D. and Ackerman,E.J. (2001) Protein Sci., 10, 560–571.
Jaakola,V.-P., Rehn,M., Moeller,M., Alexiev,U., Goldman,A. and Turner,G.J. (2005) Proteins, in press.
Johnson,D.G. and Walker,C.L. (1999) Annu. Rev. Pharmacol. Toxicol., 39, 295–312.
Klabunde,T. and Hessler,G. (2002) Chembiochem, 3, 928–944.
Klein,U., Ramirez,M.T., Kobilka,B.K. and von Zastrow,M. (1997) J. Biol. Chem., 272, 19099–19102.
Kriwacki,R.W., Hengst,L., Tennant,L., Reed,S.I. and Wright,P.E. (1996) Proc. Natl Acad. Sci. USA, 93, 11504–11509.
Kyte,J. and Doolittle,R.F. (1982) J. Mol. Biol., 157, 105–132.
Lefkowitz,R.J. and Whalen,E.J. (2004) Curr. Opin. Cell Biol., 16, 162–168.
Lefkowitz,R.J., Pierce,K.L. and Luttrell,L.M. (2002) Mol. Pharmacol., 62, 971–974.
Lezcano,N., Mrzljak,L., Eubanks,S., Levenson,R., Goldman-Rakic,P. and Bergson,C. (2000) Science, 287, 1660–1664.
Liitti,S., Narva,H., Marjamäki,A., Hellman,J., Kallio,J., Jalkanen,M. and Matikainen,M.T. (1997) Biochem. Biophys. Res. Commun., 233, 166–172.
Linding,R., Jensen,L.J., Diella,F., Bork,P., Gibson,T.J. and Russell,R.B. (2003) Structure (Camb.), 11, 1453–1459.
Nummelin,H., Merckel,M.C., Leo,J.C., Lankinen,H., Skurnik,M. and Goldman,A. (2004) EMBO J., 23, 701–711.
Okada,T., Fujiyoshi,Y., Silow,M., Navarro,J., Landau,E.M. and Shichida,Y. (2002) Proc. Natl Acad. Sci. USA, 99, 5982–5987.
Palczewski,K. et al. (2000) Science, 289, 739–745.
Pavletich,N.P. (1999) J. Mol. Biol., 287, 821–828.
Pe'er,I., Felder,C.E., Man,O., Silman,I., Sussman,J.L. and Beckmann,J.S. (2004) Proteins, 54, 20–40.
Preininger,A.M. and Hamm,H.E. (2004) Sci. STKE, re3.
Prezeau,L., Richman,J.G., Edwards,S.W. and Limbird,L.E. (1999) J. Biol. Chem., 274, 13462–13469.
Richman,J.G., Brady,A.E., Wang,Q., Hensel,J.L., Colbran,R.J. and Limbird,L.E. (2001) J. Biol. Chem., 276, 15003–15008.
Romero,P., Obradovic,Z., Li,X., Garner,E.C., Brown,C.J. and Dunker,A.K. (2001) Proteins, 42, 38–48.
Ruuskanen,J.O., Xhaard,H., Marjamäki,A., Salaneck,E., Salminen,T., Yan,Y.L., Postlethwait,J.H., Johnson,M.S., Larhammar,D. and Scheinin,M. (2004) Mol. Biol. Evol., 21, 14–28.
Sen,S., Jaakola,V.P., Heimo,H., Engstrom,M., Larjomaa,P., Scheinin,M., Lundstrom,K. and Goldman,A. (2003) Protein Expr. Purif., 32, 265–275.
Spiegel,A.M. and Weinstein,L.S. (2004) Annu. Rev. Med., 55, 27–39.
Tompa,P. (2002) Trends Biochem. Sci., 27, 527–533.
Uversky,V.N. (2002a) Protein Sci., 11, 739–756.
Uversky,V.N. (2002b) Eur. J. Biochem., 269, 2–12.
Uversky,V.N. and Fink,A.L. (2004) Biochim. Biophys. Acta, 1698, 131–153.
Uversky,V.N., Gillespie,J.R. and Fink,A.L. (2000) Proteins, 41, 415–427.
Ward,J.J., Sodhi,J.S., McGuffin,L.J., Buxton,B.F. and Jones,D.T. (2004) J. Mol. Biol., 337, 635–645.
von Heijne,G. (1989) Nature, 341, 456–458.
von Heijne,G. and Gavel,Y. (1988) Eur. J. Biochem., 174, 671–678.
Wright,P.E. and Dyson,H.J. (1999) J. Mol. Biol., 293, 321–331.
Wu,G., Krupnick,J.G., Benovic,J.L. and Lanier,S.M. (1997) J. Biol. Chem., 272, 17836–17842.
Vucetic,S., Brown,C.J., Dunker,A.K. and Obradovic,Z. (2003) Proteins, 52, 573–584.
Zeev-Ben-Mordehai,T., Rydberg,E.H., Solomon,A., Toker,L., Auld,V.J., Silman,I., Botti,S. and Sussman,J.L. (2003) Proteins: Struct. Funct. Bioinf., 53, 758–767.
Received January 18, 2005; accepted January 28, 2005.
Edited by Mirek Cygler