A study on the correlation of G-protein-coupled receptor types with amino acid composition

David W. Elrod1 and Kuo-Chen Chou

Computer-Aided Drug Discovery, Pharmacia, MI 49007-4940, USA


    Abstract
G-protein-coupled receptors have become a target in utilizing bioinformatics and genomics technology to facilitate drug discovery for psychiatric diseases. In this study the covariant-discriminant algorithm [Chou and Elrod (1999) Protein Eng., 12, 107–118] has been used to analyze the correlation between the types of G-protein-coupled receptors and their amino acid composition. The different types of G-protein-coupled receptors were found to be closely correlated with amino acid composition, implying that receptor type can be predicted with considerable accuracy if a good training data set can be established for that purpose. The method derived here can also be used for preliminary classification of orphan G-protein-coupled receptors, which should significantly expedite the process of identifying suitable G-protein-coupled receptors for drug discovery.

Keywords: acetylcholine/adrenoceptor/amine/covariant-discriminant algorithm/dopamine/rhodopsin-like receptor/serotonin


    Introduction
It is known that a large number of hormones, neurotransmitters, chemokines and other chemical messengers interact with G-protein-coupled receptors, which comprise one of the major signal transduction systems in eukaryotic cells and thus are major targets for therapeutic intervention. G-protein-coupled receptors consist of a single polypeptide chain of variable length that traverses the lipid bilayer seven times, forming characteristic transmembrane helices and alternating extracellular and intracellular sequences (Figure 1). Several hundred different types of G-protein-coupled receptors have been identified to date, each of which shows specificity for different types of ligands. The chemical messengers that act as ligands for the G-protein-coupled receptors generally bind to the extracellular region; however, monoamines and other small messengers usually bind between transmembrane segments, i.e. within the seven-helical bundle embedded in the membrane (Figure 1). The specificity of the binding is determined by the particular type of G-protein-coupled receptor (Schwartz, 1996).



Fig. 1. Schematic representation of a G-protein-coupled receptor with seven putative transmembrane helices, depicted as cylinders and connected by alternating cytoplasmic and extracellular hydrophilic loops. The seven-helix bundle thus formed has a central pore on its extracellular surface. The dark-shaded entity located in the central pore represents a small messenger such as a monoamine ligand.

 
The G-protein-coupled receptors interact with the guanine-nucleotide-binding signal transducing proteins (G-proteins), which consist of three different subunits, α, β and γ. G-proteins mediate adenylate cyclase activation and inhibition. G-proteins may also act to stimulate the opening of K⁺ channels in heart cells and to participate in the phosphoinositide signaling system. It is through various G-protein-coupled receptors that many signaling cascades convert external and internal stimuli into intracellular responses. G-protein-coupled receptors characteristically activate one or more members of the G-protein family. The information received by the G-protein-coupled receptors is then carried by G-proteins to cellular effectors such as enzymes and ion channels. These effectors influence levels of second messengers that regulate a wide variety of cellular processes including cell growth and differentiation.

Although all known G-protein-coupled receptors are seven-helix transmembrane proteins (Voet and Voet, 1995), they form a large and functionally diverse superfamily. According to the types of ligand they bind, G-protein-coupled receptors are classified into at least six different families. In this short communication, we report that the amine-binding classes of the rhodopsin-like family of G-protein-coupled receptors are considerably correlated with their amino acid compositions.


    Materials and methods
According to the GPCRDB (Horn et al., 1998) (December 2000 release), the rhodopsin-like amine G-protein-coupled receptors can be classified as (i) acetylcholine, (ii) adrenoceptor, (iii) dopamine, (iv) histamine, (v) serotonin and (vi) octopamine. The sequences in GPCRDB are derived from the SWISS-PROT (Release 39.0, 2000) and TREMBL data banks (Bairoch and Apweiler, 2000). To collect the sequences used in this study, all of the sequences from GPCRDB that were described as belonging to one of the six classes listed above were selected. Then, all incomplete sequences that contained only fragments of the receptors were removed. Next, the NRDB program (Gish, 1999) was used to check that none of the sequences was identical to any other in the data set. After this screening procedure, the histamine and octopamine receptor types contained only 10 and six sequences, respectively, too few to have any statistical significance. Therefore, these two receptor types were set aside for future consideration and the statistical analysis was performed on the remaining four types. Listed in Table I are the accession numbers of the 167 G-protein-coupled receptors, of which 31 are of the acetylcholine, 44 of the adrenoceptor, 38 of the dopamine and 54 of the serotonin type. The accession number rather than the SWISS-PROT name is used because the accession number is more stable for representing a unique protein sequence.
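The screening steps described above can be illustrated with a short sketch. It is not the procedure actually run with GPCRDB and NRDB: the `Record` fields, the `is_fragment` flag and the `min_per_type` threshold are hypothetical names introduced here for illustration, and the text does not state a numerical cut-off for discarding a receptor type.

```python
from collections import Counter
from typing import List, NamedTuple


class Record(NamedTuple):
    # Hypothetical record layout; field names are illustrative only.
    accession: str
    receptor_type: str
    sequence: str
    is_fragment: bool


def screen(records: List[Record], min_per_type: int) -> List[Record]:
    """Drop fragments and exact duplicates, then drop sparsely populated types."""
    seen = set()
    kept = []
    for rec in records:
        # Skip incomplete sequences and sequences identical to one already kept.
        if rec.is_fragment or rec.sequence in seen:
            continue
        seen.add(rec.sequence)
        kept.append(rec)
    # Count what survives per receptor type and keep only well-populated types.
    sizes = Counter(rec.receptor_type for rec in kept)
    return [rec for rec in kept if sizes[rec.receptor_type] >= min_per_type]
```

The choice of `min_per_type` is left to the caller, since the only information given is that subsets of 10 and six sequences were judged too small and a subset of 31 was retained.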


Table I. List of the accession numbers for the 167 G-protein-coupled receptors classified into four types
 
It is instructive to analyze the sequence identity of the proteins within the same subset. The sequence identity percentage between two protein sequences is defined as follows. Suppose one sequence is N1 residues long and the other N2 residues long (N1 ≥ N2), and the maximum number of residues matched by sliding one sequence along the other is M. The sequence identity percentage between the two sequences is then defined as (M/N1) × 100%. Gaps are treated according to Thompson et al. (Thompson et al., 1994). The sequence matches performed between all members in each subset of Table I indicate that the average sequence identity percentages for the acetylcholine, adrenoceptor, dopamine and serotonin subsets are 45.55, 36.82, 37.01 and 31.24%, with SDs of 19.72, 19.66, 18.90 and 16.75%, respectively. The average sequence identity percentage for all the receptor sequences of the four subsets is 26.25% with an SD of 11.72%. These numbers indicate that the majority of pairs in each of the subsets concerned have relatively low sequence identity.
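The definition above can be made concrete with a minimal sketch. It covers only the ungapped case, in which one sequence is slid rigidly along the other; the gap treatment of Thompson et al. is not reproduced, so the values it returns should not be expected to match those reported here. The function name `sliding_identity` is an assumption made for illustration.

```python
def sliding_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity 100 * M / N1 for ungapped sliding (N1 = longer length)."""
    if not seq_a or not seq_b:
        raise ValueError("both sequences must be non-empty")
    longer, shorter = (seq_a, seq_b) if len(seq_a) >= len(seq_b) else (seq_b, seq_a)
    n1, n2 = len(longer), len(shorter)
    best = 0
    # `offset` is the position in `longer` aligned with shorter[0]; negative
    # offsets let the shorter sequence overhang the left end of the longer one.
    for offset in range(-(n2 - 1), n1):
        matches = 0
        for j in range(n2):
            i = offset + j
            if 0 <= i < n1 and longer[i] == shorter[j]:
                matches += 1
        best = max(best, matches)  # M is the best match count over all offsets
    return 100.0 * best / n1


print(sliding_identity("ACDEFG", "DEF"))  # 50.0
```

In the example, the three residues of the shorter sequence all match at one offset, so M = 3 and N1 = 6.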

The amino acid composition of each of the 167 receptors can be easily derived from its sequence. The covariant-discriminant algorithm (Chou and Elrod, 1999) was used to analyze the 167 G-protein-coupled receptors on the basis of their amino acid compositions. The statistical analysis was performed with both the re-substitution test and the jackknife test.
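As an illustration of how the amino acid composition is obtained from a sequence, the following sketch computes the 20 fractional occurrences of the standard residues. The ordering of the residues in `AMINO_ACIDS` and the decision to ignore non-standard symbols (such as X) are assumptions made here, not details given in the text.

```python
from collections import Counter
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def amino_acid_composition(sequence: str) -> List[float]:
    """Return the fraction of each standard residue, in AMINO_ACIDS order."""
    counts = Counter(sequence.upper())
    # Only standard residues contribute to the denominator (assumption).
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]
```

Each receptor is thereby represented by a 20-component vector whose components sum to 1, which is the input on which a composition-based classifier operates.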


    Results and discussion
Re-substitution test

The so-called re-substitution test is an examination of the self-consistency of an identification method. When the re-substitution test is performed for the current study, the type of each G-protein-coupled receptor in a data set is in turn identified using the rule parameters derived from the same data set, the so-called training data set. The success rate thus obtained for the 167 receptors in Table I is summarized in Table II, from which we can see that the overall success rate is 100%, indicating perfect self-consistency. However, during the re-substitution test, the rule parameters derived from the training data set include the information of the query receptor that is later plugged back in the test. This will certainly give a somewhat optimistic error estimate because the same receptors are used both to derive the rule parameters and to test them. Nevertheless, the re-substitution test is absolutely necessary because it reflects the self-consistency of an identification method, especially of its algorithm part. An identification algorithm certainly cannot be deemed a good one if its self-consistency is poor. In other words, the re-substitution test is necessary but not sufficient for evaluating an identification method. As a complement, a cross-validation test on an independent testing data set is needed because it can reflect the effectiveness of an identification method in practical application. This is especially important for checking the validity of a training database: whether it contains sufficient information to reflect all the important features concerned so as to yield a high success rate in application.


Table II. The success rates in identifying the types of G-protein-coupled receptors based on the amino acid composition
 
Jackknife test

As is well known, the independent data set test, the sub-sampling test and the jackknife test are the three methods most often used for cross-validation in statistical prediction. Among these three, the jackknife test is deemed the most effective and objective [see Chou and Zhang (Chou and Zhang, 1995) for a comprehensive discussion, and Mardia et al. (Mardia et al., 1979) for the mathematical principle]. During jackknifing, each receptor in the data set is in turn singled out as the test receptor and all the rule parameters are calculated from the remaining receptors. In other words, the type of each receptor is identified by the rule parameters derived from all the other receptors except the one being identified. During jackknifing both the training data set and the testing data set are actually open, and a receptor moves in turn from one to the other. The results of the jackknife test thus obtained for the 167 G-protein-coupled receptors are also given in Table II, from which the following can be observed. As expected, the rates of successful identification by the jackknife test are lower than those by the re-substitution test. The decrease is more marked for small subsets, such as the acetylcholine subset and the dopamine subset. This is because the cluster-tolerant capacity (Chou, 1999) of small subsets is usually low. Hence, the information loss resulting from jackknifing has a greater impact on the small subsets than on the large ones. Nevertheless, the overall jackknife rate for the data set of 167 G-protein-coupled receptors is still as high as 83.23%. It is expected that the success rate for identifying the types of G-protein-coupled receptors can be further enhanced by improving the training data of the small subsets, that is, by adding to them newly found proteins that belong to the types defined by these subsets.
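The difference between the two tests can be sketched in a few lines. The classifier used below, `centroid_trainer`, is a nearest-centroid stand-in chosen only to keep the example short; it is not the covariant-discriminant algorithm, and all function names and the toy data are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
Sample = Tuple[Vector, str]  # (feature vector, class label)
Predictor = Callable[[Vector], str]


def centroid_trainer(training: List[Sample]) -> Predictor:
    """Stand-in classifier (NOT the covariant-discriminant algorithm)."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for vec, label in training:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for k, value in enumerate(vec):
            acc[k] += value
        counts[label] = counts.get(label, 0) + 1
    centroids = {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

    def predict(vec: Vector) -> str:
        # Assign the class whose centroid is nearest in squared Euclidean distance.
        return min(
            centroids,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(vec, centroids[lab])),
        )

    return predict


def resubstitution_rate(data: List[Sample], trainer) -> float:
    """Train on all samples, then test on those same samples."""
    predict = trainer(list(data))
    hits = sum(1 for vec, label in data if predict(vec) == label)
    return hits / len(data)


def jackknife_rate(data: List[Sample], trainer) -> float:
    """Leave each sample out in turn, train on the rest, test on the one left out."""
    hits = 0
    for i, (vec, label) in enumerate(data):
        predict = trainer(data[:i] + data[i + 1:])
        if predict(vec) == label:
            hits += 1
    return hits / len(data)


# Toy one-dimensional data with two borderline samples.
toy = [((0.0,), "a"), ((0.45,), "a"), ((1.0,), "b"), ((0.55,), "b")]
print(resubstitution_rate(toy, centroid_trainer))  # 1.0
print(jackknife_rate(toy, centroid_trainer))       # 0.5
```

In the toy data the two borderline samples are identified correctly only when they contribute to their own class centroid, which is why the re-substitution rate is optimistic and why removing one member matters most for small subsets.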


    Conclusion
If the receptors were distributed completely at random among the four possible subsets, the rate of correct identification by random assignment would generally be 1/4 = 25%; if the assignment were weighted according to the sizes of the subsets, the rate of correct identification would be (31/167)² + (44/167)² + (38/167)² + (54/167)² ≈ 26.02%. Therefore, the rates of correct identification obtained from the amino acid composition in both the re-substitution and jackknife tests are much higher than the corresponding completely randomized and weighted randomized rates, implying that the type of a G-protein-coupled receptor is considerably correlated with its amino acid composition. This suggests that the types of G-protein-coupled receptors can be predicted with considerable accuracy if a complete or quasi-complete training data set can be established for that purpose. The establishment of such a fast and accurate prediction method will speed up the identification of suitable G-protein-coupled receptors to facilitate drug discovery for psychiatric and schizophrenic diseases.
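The two baseline rates quoted above can be checked with a few lines of arithmetic. The subset sizes are those given for the data set of 167 receptors; the variable names are illustrative.

```python
# Subset sizes of the four receptor types in the 167-receptor data set.
subset_sizes = {"acetylcholine": 31, "adrenoceptor": 44, "dopamine": 38, "serotonin": 54}
total = sum(subset_sizes.values())

# Completely random assignment: each of the four types is equally likely.
uniform_rate = 1 / len(subset_sizes)

# Weighted random assignment: a receptor of a given type (probability n/total)
# is assigned to that same type with probability n/total.
weighted_rate = sum((n / total) ** 2 for n in subset_sizes.values())

print(f"{uniform_rate:.2%}")   # 25.00%
print(f"{weighted_rate:.2%}")  # 26.02%
```

The weighted rate is the sum of squared class proportions, that is, the chance that a receptor and its randomly drawn label happen to agree.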


    Notes
 
1 To whom correspondence should be addressed.


    References
Bairoch,A. and Apweiler,R. (2000) Nucleic Acids Res., 28, 45–48.

Chou,K.C. (1999) Biochem. Biophys. Res. Commun., 264, 216–224.

Chou,K.C. and Elrod,D.W. (1999) Protein Eng., 12, 107–118.

Chou,K.C. and Zhang,C.T. (1995) Crit. Rev. Biochem. Mol. Biol., 30, 275–349.

Gish,W. (1999) http://blast.wustl.edu/pub/nrdb/

Horn,F., Weare,J., Beukers,M.W., Hörsch,S., Bairoch,A., Chen,W., Edvardsen,Ø., Campagne,F. and Vriend,G. (1998) Nucleic Acids Res., 26, 277–281.

Schwartz,T.W. (1996) In Forman,J.C. and Johansen,T. (eds), Textbook of Receptor Pharmacology. CRC Press, Boca Raton, FL, pp. 65–84.

Thompson,J.D., Higgins,D.G. and Gibson,T.J. (1994) Nucleic Acids Res., 22, 4673–4680.

Voet,D. and Voet,J.G. (1995) In Biochemistry, 2nd edn. John Wiley & Sons, New York, pp. 1276–1278.

Received March 5, 2002; accepted June 6, 2002.




