National Centre for Biological Sciences, Tata Institute of Fundamental Research, UAS-GKVK Campus, Bangalore 560065, India
1 To whom correspondence should be addressed. e-mail: mini{at}ncbs.res.in
Abstract |
---|
Keywords: cross genome comparison/domain architecture/human genome/orthologues/rate of evolution/tyrosine phosphatases
Introduction |
---|
Broadly, the tyrosine phosphatases are grouped into three main classes: classical, dual-specific and low molecular weight. The classical and the dual-specific tyrosine phosphatases (DuSPs) share a similar structure and are evolutionarily related. The catalytic mechanism is also similar between these classes (Denu and Dixon, 1998). In addition to dephosphorylating tyrosine, the dual-specific enzymes can remove the phosphate from serine/threonine residues. The low molecular weight tyrosine phosphatases, though similar in function, possess a different structural topology, are evolutionarily unrelated to the other two classes and are uncommon in vertebrates.
The human genome was predicted to encode more than 1000 kinases and 500 phosphatases (Hooft van Huijsduijnen, 1998). The tyrosine phosphatase domains alone were believed to be present in 100 different proteins (Hooft van Huijsduijnen, 1998). Application of several homology search tools suggested that the number of kinases had been overestimated and that 550 kinases could be detected in humans (Kostich et al., 2002; Krupa and Srinivasan, 2002). The current analysis classifies the tyrosine phosphatases into the DuSPs, the classical cytosolic tyrosine phosphatases (CyPTPs) and the membrane-bound receptor tyrosine phosphatases (rPTPs). Investigation of the domain arrangements of the proteins using several sequence search techniques (Eddy, 1998; Schaffer et al., 1999) has permitted us to explore the different domain architectures. Potential human tyrosine phosphatases have been compared with those in other model systems such as Schizosaccharomyces pombe, Caenorhabditis elegans, Drosophila melanogaster and Mus musculus to identify orthologous relationships between the proteins. Amino acid substitution rates were evaluated to understand the evolutionary trends in this superfamily of proteins.
Materials and methods |
---|
Sequences encoded by the complete human genome were obtained from the NCBI GenBank FTP site (ftp://ftp.ncbi.nih.gov/genomes). A search for members of the tyrosine phosphatase family was conducted using a four-fold approach.
(1) A preliminary search for tyrosine phosphatases was performed using PSI-BLAST (Altschul et al., 1997). Sequences belonging to the tyrosine phosphatase superfamily according to SCOP (Murzin et al., 1995) and having <30% identity among themselves were used as queries. An E-value threshold of 10⁻³ and an h-value of 0.1 for five iterations were used in the searches.
(2) The human proteome was scanned further using hidden Markov models of the tyrosine phosphatases and the DuSPs obtained from the PfamA database (Bateman et al., 2002), employing hmmsearch from the HMMER suite (Eddy, 1998). The E-value thresholds were set to 0.1 (N.Mhatre and N.Srinivasan, unpublished results).
(3) A complementary approach of matching sequences to a database of annotated profiles was also employed. Each human genome sequence (The International Human Genome Sequencing Consortium, 2001) was matched to sensitive protein family profiles obtained from the PfamA database (Bateman et al., 2002) using IMPALA (Schaffer et al., 1999). Any sequence aligning with the tyrosine phosphatase or dual-specific phosphatase domain with an E-value of <10⁻⁵ was considered a potential member of the superfamily.
(4) An interactive motif-based search using conserved regions as constraints in PHI-BLAST runs (Zhang et al., 1998) with a liberal E-value cut-off of 1 was used as the fourth approach. This method is sensitive in establishing homology between distantly related proteins (A.Bhaduri, R.Ravishankar and R.Sowdhamini, manuscript submitted for publication). Four sequential motifs that are signatures of the tyrosine phosphatase superfamily (Andersen et al., 2001) were used as constraints, together with the PSI-BLAST queries, in the PHI-BLAST runs (Zhang et al., 1998).
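The way the four searches feed a single candidate list can be sketched as follows. This is only an illustration, not the pipeline used in the study: the cut-offs are read as 10⁻³ (PSI-BLAST), 0.1 (hmmsearch), 10⁻⁵ (IMPALA) and 1 (PHI-BLAST), the `Hit` record and the protein identifiers are hypothetical, and pooling the hits as a union with an inclusive comparison is an assumption, since the text does not say how the four result sets were combined.

```python
from dataclasses import dataclass

# E-value cut-offs per search method, as read from the four steps above.
THRESHOLDS = {
    "psiblast": 1e-3,
    "hmmsearch": 0.1,
    "impala": 1e-5,
    "phiblast": 1.0,
}


@dataclass(frozen=True)
class Hit:
    protein_id: str  # hypothetical identifier of a human protein
    method: str      # one of the keys of THRESHOLDS
    evalue: float


def candidate_members(hits):
    """Return {protein_id: sorted list of methods supporting it}.

    A protein becomes a candidate if at least one method reports it
    at or below that method's cut-off (union of the four searches;
    this pooling rule is an assumption).
    """
    support = {}
    for hit in hits:
        if hit.evalue <= THRESHOLDS[hit.method]:
            support.setdefault(hit.protein_id, set()).add(hit.method)
    return {pid: sorted(methods) for pid, methods in support.items()}
```

Keeping the list of supporting methods per protein makes it easy to see which candidates rest on a single liberal search (for example PHI-BLAST alone) and therefore deserve closer scrutiny in the true-positive check.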
Check for true positives
Annotated tyrosine phosphatase homologues of the putative human tyrosine phosphatases were searched for in the Protein Data Bank (PDB) (Westbrook et al., 2002) and the non-redundant database using BLAST (Altschul et al., 1990). If a protein was related to any of the tyrosine phosphatases in these databases with a significant expectation value, it was considered to be a true member of the family. When no sequence relationship with known tyrosine phosphatases could be established, fold prediction was carried out with GENTHREADER (Jones, 1999) and 3D-PSSM (Kelley and Sternberg, 2000). A hit that could not be connected to any of the known members with significant scores, even by fold prediction, was regarded as a false positive.
Removal of redundant proteins and pseudogenes
Redundant proteins, as evidenced by 100% sequence identity with another hit, were not considered for further analysis. Proteins appearing as fragments of, or identical to, larger proteins were considered to be pseudogenes and were removed from our analysis. NCBI annotations were also consulted to identify pseudogenes.
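The filtering step can be illustrated with a minimal sketch. It assumes that "fragment" means an exact substring of a longer retained sequence, which is a simplification; the function name and the toy sequences are hypothetical, and the consultation of NCBI annotations is not modelled.

```python
def remove_redundant(seqs):
    """Drop exact duplicates and fragments from {id: sequence}.

    A sequence is discarded when it is identical to, or an exact
    substring of, a sequence that has already been retained.
    """
    # Process longest sequences first so that fragments are always
    # compared against full-length proteins; ties are broken by id.
    ordered = sorted(seqs.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    kept = {}
    for pid, seq in ordered:
        if any(seq in longer for longer in kept.values()):
            continue  # identical to, or a fragment of, a retained protein
        kept[pid] = seq
    return kept
```

In practice near-identical fragments would need an alignment-based identity measure rather than substring matching; the sketch only captures the 100% identity case described above.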
Assignment of domains, transmembrane regions and cellular localization of the family members
Co-existing domains of the human tyrosine phosphatases were predicted employing the sequence-to-profile matching methods IMPALA (Schaffer et al., 1999) and HMMPFAM (Eddy, 1998). Each of the tyrosine phosphatase sequences was matched to protein family profiles from the SMART (Letunic et al., 2002) and PfamA (Bateman et al., 2002) databases. The transmembrane regions for each of the hits were identified using HMMTOP (Tusnady and Simon, 2001) and SOSUI (Mitaku et al., 1999). The cellular localization of the proteins was predicted using SubLoc (Hua and Sun, 2001) and TargetP (Emanuelsson et al., 2000).
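As an illustration of how per-domain profile matches can be turned into a domain architecture, the sketch below keeps the most significant non-overlapping matches and reads them from N- to C-terminus. The E-value cut-off of 0.1, the overlap rule, the tuple layout and the domain names are assumptions made for the example; they are not taken from the study.

```python
def architecture(domain_hits, max_evalue=0.1):
    """Build an N-to-C domain string from profile matches.

    domain_hits: iterable of (name, start, end, evalue) tuples.
    Matches are accepted in order of increasing E-value; a match is
    skipped if it overlaps a region that has already been accepted.
    """
    accepted = []
    for name, start, end, evalue in sorted(domain_hits, key=lambda h: h[3]):
        if evalue > max_evalue:
            continue  # not significant under the assumed cut-off
        if all(end < s or start > e for _, s, e in accepted):
            accepted.append((name, start, end))
    accepted.sort(key=lambda d: d[1])  # order along the sequence
    return "-".join(name for name, _, _ in accepted)
```

Proteins that yield the same string can then be grouped into one architecture class, which is the unit counted when different domain combinations are compared.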
Finding orthologues across model systems
The complete proteomes of four model systems, D.melanogaster (The Drosophila melanogaster Sequencing Consortium, 2000), C.elegans (The C. elegans Sequencing Consortium, 1998), S.pombe (The Schizosaccharomyces pombe Sequencing Consortium, 2002) and M.musculus (The International Mouse Genome Sequencing Consortium, 2002), were downloaded from the NCBI genome server (ftp://ftp.ncbi.nih.gov/genome). Each of the human tyrosine phosphatase sequences was queried against the sequence database of each model genome, and the nearest homologue was then searched back against the human genome database. Orthologous relationships (Koonin et al., 1997) between the human tyrosine phosphatases and the model systems were examined using BLASTP (Altschul et al., 1990). Symmetrical best hits in these BLAST searches were considered as orthologues (Walker and Koonin, 1997).
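The symmetrical best hit criterion can be sketched as below. The input is assumed to be pre-parsed BLASTP results as (query, subject, bit score) tuples; this layout, the use of bit score to rank hits and the identifiers in the example are assumptions for illustration only.

```python
def best_hits(blast_rows):
    """Map each query to its highest-scoring subject.

    blast_rows: iterable of (query, subject, bit_score) tuples.
    """
    best = {}
    for query, subject, score in blast_rows:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(human_vs_model, model_vs_human):
    """Return {human_id: model_id} for symmetrical best hits.

    A pair is reported only if the model protein is the best hit of
    the human protein and the human protein is, in turn, the best
    hit of that model protein.
    """
    forward = best_hits(human_vs_model)
    backward = best_hits(model_vs_human)
    return {h: m for h, m in forward.items() if backward.get(m) == h}
```

A human protein whose nearest homologue points back to a different human protein is therefore treated as having a close homologue but no orthologue in that genome.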
Rate of evolution of the human tyrosine phosphatases with respect to mouse
The nearest homologue of each human tyrosine phosphatase was searched for in the mouse genome using BLASTP (Altschul et al., 1990) and aligned using MALIGN (Johnson et al., 1993). The evolutionary rates were calculated using the gamma distance correction (Ota and Nei, 1994).
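For orientation, the commonly used gamma distance for amino acid sequences is d_G = a[(1 - p)^(-1/a) - 1], where p is the proportion of differing aligned residues and a is the shape parameter of the gamma distribution of rates among sites. The sketch below computes it for an aligned human-mouse pair. The value a = 2.0 is only a placeholder, since the shape parameter used in the study is not stated here, and ignoring gapped columns is likewise an assumption.

```python
def p_distance(aligned_a, aligned_b, gap="-"):
    """Proportion of differing residues over ungapped aligned columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b)
             if x != gap and y != gap]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def gamma_distance(p, a=2.0):
    """Gamma-corrected distance d_G = a * ((1 - p) ** (-1 / a) - 1).

    p: proportion of differing residues (0 <= p < 1).
    a: gamma shape parameter (placeholder default, an assumption).
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    return a * ((1.0 - p) ** (-1.0 / a) - 1.0)
```

Smaller values of a imply stronger rate variation among sites and give a larger correction for the same observed p, so reported d_G values are comparable only when the same a is used throughout.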
Results |
---|
Reports of at least 20 different domain architectures for the tyrosine phosphatases in vertebrates have been reviewed earlier (Li and Dixon, 2000; Andersen et al., 2001). Five of the domain combinations identified in the present analysis have not been reported earlier in the human genome.
(1) The PTPRC protein (accession No. NP_002829; Figure 2, domain architecture no. 4) is specifically expressed in haematopoietic cells (Yamada et al., 2002) and its deficiency leads to severe combined immunodeficiency (SCID) (Harashima et al., 2002). This protein has been reported to be similar to CD45, containing an Fn3 domain along with two CyPTP domains. We predict the presence of a second Fn3 domain in the extracellular region, which may aid its role in erythropoiesis.
(2) The myotubularin-related protein-2 (accession Nos NP_000243, NP_057240.1; Figure 2, domain architecture no. 17) is associated with muscle cell differentiation (Sutton et al., 2001) and mutations in this gene have been identified as being responsible for X-linked myotubular myopathy (Biancalana et al., 2003). The C-terminal end is known to house the CyPTP domain (Taylor et al., 2000). We predict a GRAM domain that is common to glucosyl transferases and other putative membrane-associated proteins.
(3) NP_009171 (Figure 2, domain architecture no. 22) is a member of the mitogen-activated protein kinase phosphatase (MAPKP) family involved in cellular proliferation and differentiation (Groom et al., 1996). Unlike other members of the family, this sequence is associated with a zinc finger domain at its C-terminus. This zinc finger domain may be playing a role in interactions with various transcriptional factors while the DuSP domain may regulate its activity by different phosphorylation states.
(4) The cyclin G-associated kinase (GAK) family members have variable expression during cell division, with the highest level of expression at G1 phase (Kimura et al., 1997). In one of the GAK proteins (accession No. NP_005246; Figure 2, domain architecture no. 21) we find a strong structural compatibility (Figure 3) and a sequence relationship to the DuSP domain (residues 462–584). This domain appears as an insertion between the previously characterized N-terminal kinase (Ser/Thr) domain and the C-terminal DnaJ-like domain. The presence of a catalytic HCX3R motif suggests a possible catalytically active tyrosine phosphatase domain. Co-existence of a kinase and a phosphatase in a single polypeptide is being reported for the first time in the human genome. It is tempting to speculate that the phosphatase and kinase domains modulate the function of the substrate by critically changing its phosphorylation state.
Orthologues in the model systems
Orthologues are protein pairs that have the same function, allowing transfer of functional information from one genome to the other (Tatusov et al., 1997). Study of the orthologues of tyrosine phosphatases provides an approach for examining the conservation of protein function across the different model systems (yeast, Drosophila, C.elegans and mouse) (see Materials and methods). The number of orthologues for the human PTP-containing proteins varies between five (in yeast) and 56 (in mouse) in the model systems considered. An increase in the number of orthologues is expected as the evolutionary distance between the organism and humans decreases (Table III). A positive correlation between the percentage of human rPTP orthologues and the complexity of the model organism is observed (Figure 4), which suggests an increasing preference for the recruitment of rPTPs during the course of evolution. No such correlation, in either direction, could be established for DuSP and CyPTP (an average of 30–45% orthologues; Figure 4) across the four model systems studied.
(1) The human mRNA capping enzyme (NP_003791.1) is orthologous to a single-domain yeast DuSP protein involved in the phosphatase activity. In yeast, the RNA capping activity is carried out by two polypeptides (Itoh et al., 1987), which explains the non-conservation of domain architecture. The two adjacent genes in yeast have fused to form a single polypeptide in higher eukaryotes for higher efficiency in the RNA capping function.
(2) In NP_009171.1, the DuSP domain is conserved across the model systems. However, the insertion of the zinc finger domain is only found in human and mouse genomes. The zinc finger domain may aid the interaction of the protein with other transcriptional factors.
On the other hand, PTEN molecules have a conserved tyrosine phosphatase domain across all the model systems considered. The single-domain CyPTP PTEN protein (accession No. NP_000305.1) functions as a tumour suppressor (Koul et al., 2002). Mutation of this protein leads to carcinoma (Kurose et al., 2002). The protein is expressed during the G0–G1 phase of the cell cycle (Ginn-Pease and Eng, 2003) and its function in cell size regulation has been well documented in mammals (Backman et al., 2002). This function of the protein is well conserved across the different model organisms considered.
A number of orthologous relationships between model systems and human were established; however, in some cases the relationship was not maintained across the higher model organisms (data not shown). The myotubularin-related molecule-8 (accession No. NP_060147.2), for example, has orthologues in the S.pombe and D.melanogaster genomes. However, a similar orthologous relationship could not be established in the C.elegans and mouse genomes, although close homologues could be detected. Such relationships would suggest reverse evolution or possible convergent evolution in the family of tyrosine phosphatases.
Evolutionary rate of the tyrosine phosphatases
The substitution rates or the evolutionary rate of the 96 human tyrosine phosphatases with respect to mouse were evaluated employing the gamma distance correction (Ota and Nei, 1994) (data not shown). The presence of a limited number of homologues for the different human tyrosine phosphatases restricts similar analysis in other model systems. Among the three categories, the DuSP proteins have the highest rate of evolution, having a dG (rate of substitution) of 0.71. The rate of evolution is found to be least in rPTP (dG = 0.36). Though the rate of evolution is slower in CyPTP (dG = 0.44) compared with DuSP, a number of subfamilies, for example, SHP-2 (accession No. XP_069073.2), tend to have high evolutionary rates (dG = 1.13). A closer analysis shows that this high value in SHP-2 is due to an additional SH2 domain in the mouse sequence, not present in humans.
Discussion |
---|
We also found 40 sequential clusters (data not shown), implying involvement of the tyrosine phosphatase domain in at least 40 plausible different biochemical reactions (Joost and Methner, 2002). Splice variations and different specificity to substrates in the members belonging to a cluster could contribute to additional versatility in their participation in more reactions.
Cellular localization suggests a clear preference among the different tyrosine phosphatases. DuSPs prefer to reside in the nucleus, while the CyPTPs have a predilection for the cytoplasm (Figure 1). This is consistent with the characterized tyrosine phosphatases, suggesting a greater number of nuclear DuSPs compared with the CyPTPs. However, proteins like MAPKP-5 (accession No. NP_653329), which is weakly predicted to be cytoplasmic, function in both the cytoplasm and the nucleus (Tanoue et al., 1999). Localization of proteins could also vary owing to splice variation in the amino acid sequence, as seen in PTP. The two different splice variants of PTP (accession No. NP_006495.1) are either cytoplasmic or receptor-bound in localization (Wabakken et al., 2002).
It was interesting to note that only three out of the 96 proteins have orthologues in all four model systems considered. A majority of the proteins that maintain an orthologous relationship across multiple genomes (data not shown) are nuclear, suggesting a primitive and conserved role of the nuclear tyrosine phosphatases compared with their cytoplasmic and membrane counterparts. Comparison of the evolutionary rates across different architectures suggests that DuSP has the highest rate of evolution while rPTP is the slowest evolving group. An increase in the percentage of rPTP orthologues, their diverse domain architectures and their relatively slow rate of evolution would suggest that the primary mode by which diversity is introduced in rPTP-containing sequences is through a domain recruitment mechanism. A rise in the number of orthologous DuSPs with increasing genome complexity (Table III) and their high rate of evolution may compensate for the drop in Ser/Thr phosphatases (A.Bhaduri and R.Sowdhamini, unpublished results).
Fifty-four human receptor tyrosine kinases have been reported to exist in 18 domain combinations (Krupa and Srinivasan, 2002), associating with 22 different domains. The 29 receptor tyrosine phosphatases are present as nine different domain architectures (Figure 2) in combination with only five domains (Table II). Comparing the different domain architectures of kinases (Krupa and Srinivasan, 2002) and phosphatases (data not shown), we failed to identify similarity in the different domain architectures apart from SHP-1, which has two SH2 domains and a tyrosine phosphatase domain at the C-terminus (accession Nos NP_002825, NP_536858). A number of tyrosine kinase proteins have SH3 domains present between the SH2 domain and the kinase domain. Similar associations with SH3 domains are not found in the present analysis; however, six proteins containing tyrosine phosphatase domains also contain SH2 domains (Table II).
The current work presents a comprehensive study and an early bioinformatic overview of the PTP family in the human genome. Classification of the PTP-containing polypeptides on the basis of domain architecture and cellular localization helps us to associate the proteins with the various biochemical pathways and the different cellular niches. The functional diversification of tyrosine phosphatases in the human genome is quite apparent from the present and previous analyses (Li and Dixon, 2000; Andersen et al., 2001). Shuffling of domains or modules among the various PTPs seems to be one of the obvious means of adapting to diverse biological roles. Reports of several novel domain combinations indicate an increase in the functional repertoire. The presence of several unexpected domains provides insights into the unknown regions of several known protein families. Experimental verification of these observations could enhance our understanding of the specific biological roles of these novel PTPs.
References |
---|
Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Nucleic Acids Res., 25, 3389–3402.
Andersen,J.N. et al. (2001) Mol. Cell. Biol., 21, 7117–7136.
Angers-Loustau,A., Cote,J.F. and Tremblay,M.L. (1999) Biochem. Cell Biol., 77, 493–505.
Backman,S., Stambolic,V. and Mak,T. (2002) Curr. Opin. Neurobiol., 12, 516–522.
Bateman,A. et al. (2002) Nucleic Acids Res., 30, 276–280.
Biancalana,V. et al. (2003) Hum. Genet., 112, 135–142.
Cui,L., Yu,W.P., DeAizpurua,H.J., Schmidli,R.S. and Pallen,C.J. (1996) J. Biol. Chem., 271, 24817–24823.
Denu,J.M. and Dixon,J.E. (1998) Curr. Opin. Chem. Biol., 5, 633–641.
Eddy,S.R. (1998) Bioinformatics, 14, 755–763.
Emanuelsson,O., Nielsen,H., Brunak,S. and Heijne,G.V. (2000) J. Mol. Biol., 300, 1005–1016.
Ginn-Pease,M.E. and Eng,C. (2003) Cancer Res., 63, 282–286.
Groom,L.A., Sneddon,A.A., Alessi,D.R., Dowd,S. and Keyse,S.M. (1996) EMBO J., 15, 3621–3632.
Harashima,A., Suzuki,M., Okochi,A., Yamamoto,M., Matsuo,Y., Motoda,R., Yoshioka,T. and Orita,K. (2002) Blood, 100, 4440–4445.
Hirokawa,T., Boon-Chieng,S. and Mitaku,S. (1998) Bioinformatics, 14, 378–379.
Hooft van Huijsduijnen,R. (1998) Gene, 225, 1–8.
Hua,S. and Sun,Z. (2001) Bioinformatics, 17, 721–728.
Hunter,T. (1995) Cell, 80, 225–236.
Hunter,T. (2000) Cell, 100, 113–127.
Itoh,N., Yamada,H., Kaziro,Y. and Mizumoto,K. (1987) J. Biol. Chem., 262, 1989–1995.
Johnson,M.S., Overington,J.P. and Blundell,T.L. (1993) J. Mol. Biol., 233, 735–752.
Jones,D.T. (1999) J. Mol. Biol., 287, 797–815.
Joost,P. and Methner,A. (2002) Genome Biol., 3, research0063.
Kimura,S.H., Tsuruga,H., Yabuta,N., Endo,Y. and Nojima,H. (1997) Genomics, 44, 179–187.
Kishihara,K. et al. (1993) Cell, 74, 143–156.
Kelley,L.A. and Sternberg,M.J.E. (2000) J. Mol. Biol., 292, 507–522.
Koonin,E.V., Mushegian,A.R., Galperin,M.Y. and Walker,D.R. (1997) Mol. Microbiol., 25, 619–637.
Kostich,M., English,J., Madison,V., Gheyas,F., Wang,L., Qiu,P., Greene,J. and Laz,T.M. (2002) Genome Biol., 3, research0043.1–research0043.12.
Koul,D., Shen,R., Garyali,A., Ke,L.D., Liu,T.J. and Yung,W.K. (2002) Int. J. Oncol., 21, 469–475.
Krupa,A. and Srinivasan,N. (2002) Genome Biol., 3, research0066.1–research0066.14.
Kurose,K., Gilley,K., Matsumoto,S., Watson,P.H., Zhou,X.P. and Eng,C. (2002) Nat. Genet., 32, 355–357.
Letunic,I., Goodstadt,L., Dickens,N.J., Doerks,T., Schultz,J., Mott,R., Ciccarelli,F., Copley,R.R., Ponting,C.P. and Bork,P. (2002) Nucleic Acids Res., 30, 242–244.
Li,L. and Dixon,J.E. (2000) Semin. Immunol., 12, 75–84.
McGuffin,L.J., Bryson,K. and Jones,D.T. (2000) Bioinformatics, 16, 404–405.
Mitaku,S., Ono,M., Hirokawa,T., Boon-Chieng,S. and Sonoyama,M. (1999) Biophys. Chem., 82, 165–171.
Murzin,A.G., Brenner,S.E., Hubbard,T. and Chothia,C. (1995) J. Mol. Biol., 247, 536–540.
Ota,T. and Nei,M. (1994) Mol. Biol. Evol., 11, 613–619.
Schaffer,A.A., Wolf,Y.I., Ponting,C.P., Koonin,E.V., Aravind,L. and Altschul,S.F. (1999) Bioinformatics, 15, 1000–1011.
Shultz,L.D., Schweitzer,P.A., Rajan,T.V., Yi,T., Ihle,J.N., Matthews,R.J., Thomas,M.L. and Beier,D.R. (1993) Cell, 73, 1445–1454.
Sutton,I.J., Winer,J.B., Norman,A.N., Liechti-Gallati,S. and MacDonald,F. (2001) Neurology, 57, 900–902.
Tanoue,T., Moriguchi,T. and Nishida,E. (1999) J. Biol. Chem., 274, 19949–19956.
Tatusov,R.L., Koonin,E.V. and Lipman,D.J. (1997) Science, 278, 631–637.
Taylor,G.S., Maehama,T. and Dixon,J.E. (2000) Proc. Natl Acad. Sci. USA, 97, 8910–8915.
The C. elegans Sequencing Consortium (1998) Science, 282, 2011–2046.
The Drosophila melanogaster Sequencing Consortium (2000) Science, 287, 2185–2195.
The International Human Genome Sequencing Consortium (2001) Nature, 409, 860–921.
The International Mouse Genome Sequencing Consortium (2002) Nature, 420, 520–562.
The Schizosaccharomyces pombe Sequencing Consortium (2002) Nature, 415, 871–880.
Toyooka,S., Ouchida,M., Jitsumori,Y., Tsukuda,K., Sakai,A., Nakamura,A., Shimizu,N. and Shimizu,K. (2000) Biochem. Biophys. Res. Commun., 278, 671–678.
Tsukamoto,T., Shibagaki,Y., Murakoshi,T., Suzuki,M., Nakamura,A., Gotoh,H. and Mizumoto,K. (1998) Biochem. Biophys. Res. Commun., 243, 101–108.
Tusnady,G.E. and Simon,I. (2001) Bioinformatics, 17, 849–850.
Wabakken,T., Hauge,H., Funderud,S. and Aasheim,H.C. (2002) Scand. J. Immunol., 56, 276–285.
Walker,D.R. and Koonin,E.V. (1997) Proc. Int. Conf. Intell. Syst. Mol. Biol., 5, 333–339.
Walton,K.M. and Dixon,J.E. (1993) Annu. Rev. Biochem., 62, 101–120.
Westbrook,J. et al. (2002) Nucleic Acids Res., 30, 245–248.
Yamada,T., Zhu,D., Saxon,A. and Zhang,K. (2002) J. Biol. Chem., 277, 28830–28835.
Yoshida,H. et al. (2002) J. Immunol., 168, 3213–3220.
Zhang,Z., Schaffer,A.A., Miller,W., Madden,T.L., Lipman,D.J., Koonin,E.V. and Altschul,S.F. (1998) Nucleic Acids Res., 26, 3986–3990.
Received June 11, 2003; revised October 24, 2003; accepted October 30, 2003