A genome-wide survey of human tyrosine phosphatases

Anirban Bhaduri and R. Sowdhamini1

National Centre for Biological Sciences, Tata Institute of Fundamental Research, UAS-GKVK Campus, Bangalore 560065, India

1 To whom correspondence should be addressed. e-mail: mini{at}ncbs.res.in


    Abstract
 
Tyrosine phosphatases play an important role in cellular signalling and networking that is antagonistic to the kinases. The near completion of the human genome-sequencing project permits us to review the distribution of this family and study its involvement in different pathways. Ninety-six homologues of the classical and dual-specific tyrosine phosphatases (DuSPs) were identified in the human genome using sensitive sequence search techniques. Uncommon domain architectures were encountered, including an example where a kinase and a phosphatase domain are found to co-exist in a single polypeptide. The evolutionary rate is higher for the DuSPs than for the classical tyrosine phosphatases. Orthologues of the 96 putative human tyrosine phosphatases were identified in four model organisms to study the conservation of the family members. Three nuclear localized tyrosine phosphatases retain an orthologous relationship with all model systems considered but still differ in their domain architectures. The diversity in the multi-domain members of the superfamily occurs mainly through domain recruitment, especially in receptor tyrosine phosphatases. The curation of human tyrosine phosphatases provides a convenient framework for characterizing and analysing the functional and structural properties of this diverse family of proteins.

Keywords: cross genome comparison/domain architecture/human genome/orthologues/rate of evolution/tyrosine phosphatases


    Introduction
 
Phosphorylation and dephosphorylation have been shown to be key switches in signalling pathways (Hunter, 1995, 2000). Serine, threonine and tyrosine are the sites of phosphorylation in eukaryotic proteins. Tyrosine phosphatases regulate signalling and other biochemical processes through dephosphorylation of proteins at the tyrosine residue (Walton and Dixon, 1993; Angers-Loustau et al., 1999). Disruption of the functioning of tyrosine phosphatases may lead to serious diseases and disorders that have been reviewed earlier (Kishihara et al., 1993; Shultz et al., 1993; Li and Dixon, 2000).

Broadly, the tyrosine phosphatases are grouped into three main classes: classical, dual-specific and low molecular weight. The classical and the dual-specific tyrosine phosphatases (DuSPs) share a similar structure and an evolutionary relationship. The mechanism of catalysis is also similar between these classes (Denu and Dixon, 1998). In addition to dephosphorylating tyrosine, the dual-specific enzymes can remove the phosphate from serine/threonine residues. The low molecular weight tyrosine phosphatases, though similar in functionality, possess a different structural topology, are evolutionarily unrelated to these two classes and are uncommon in vertebrates.

The numbers of kinases and phosphatases in the human genome were predicted to be more than 1000 and 500, respectively (Hooft van Huijsduijnen, 1998). The tyrosine phosphatase domains alone were believed to be present in 100 different proteins (Hooft van Huijsduijnen, 1998). Application of several homology search tools suggested that the number of kinases had been overestimated and that 550 kinases could be detected in humans (Kostich et al., 2002; Krupa and Srinivasan, 2002). The current analysis classifies the tyrosine phosphatases into the DuSPs, the classical cytosolic tyrosine phosphatases (CyPTPs) and the membrane-bound receptor tyrosine phosphatases (rPTPs). Investigation of the domain arrangements of the proteins using several sequence search techniques (Eddy, 1998; Schaffer et al., 1999) has permitted us to explore the different domain architectures. Potential human tyrosine phosphatases have been compared with those in other model systems such as Schizosaccharomyces pombe, Caenorhabditis elegans, Drosophila melanogaster and Mus musculus to find orthologous relationships between the proteins. Amino acid substitution rates were evaluated to understand the evolutionary trends in this superfamily of proteins.


    Materials and methods
 
Search for tyrosine phosphatase domains in the human genome

Sequences encoded by the complete human genome were obtained from the NCBI’s GenBank FTP site (ftp://ftp.ncbi.nih.gov/genomes). A search for the members of the tyrosine phosphatase family was conducted using a 4-fold approach.

(1) A preliminary search for tyrosine phosphatases was performed using PSI-BLAST (Altschul et al., 1997). Sequences belonging to the tyrosine phosphatase superfamily according to SCOP (Murzin et al., 1995) and having <30% identity among themselves were used as queries. An E-value threshold of 10^-3 and an h-value of 0.1 for five iterations were used in the searches.

(2) The human proteome was scanned further using hidden Markov models of the tyrosine phosphatases and the DuSPs obtained from the PfamA database (Bateman et al., 2002), employing the Hmmsearch program of the HMMER suite (Eddy, 1998). The E-value thresholds were set to 0.1 (N.Mhatre and N.Srinivasan, unpublished results).

(3) A complementary approach of matching sequences to a database of annotated profiles was also employed. Each human genome sequence (The International Human Genome Sequencing Consortium, 2001) was matched to sensitive protein family profiles obtained from the PfamA database (Bateman et al., 2002) using IMPALA (Schaffer et al., 1999). Any sequence aligning with the tyrosine phosphatase or dual-specific tyrosine phosphatase domain with an E-value of <10^-5 was considered a potential member of the superfamily.

(4) An interactive motif-based search using conserved regions as constraints in PHI-BLAST runs (Zhang et al., 1998) with a liberal E-value cut-off of 1 was used as the fourth approach. This method is sensitive in establishing homology between distantly related proteins (A.Bhaduri, R.Ravishankar and R.Sowdhamini, manuscript submitted for publication). Four sequential motifs that are signatures of the tyrosine phosphatase superfamily (Andersen et al., 2001) were used as constraints, along with the queries used in PSI-BLAST, for the PHI-BLAST (Zhang et al., 1998) runs.
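The four searches above can be read as independent filters whose hits are pooled. The following is a minimal sketch of such pooling, not the authors' actual pipeline: it assumes that each tool's output has already been parsed into a mapping from sequence identifier to best E-value, that the pooled set is the union of the hits from each method, and that each cut-off is applied as a strict inequality. The function name `pool_hits` and the method labels are hypothetical.

```python
from typing import Dict, Set

# E-value cut-offs quoted in the text for each of the four searches.
THRESHOLDS: Dict[str, float] = {
    "psi_blast": 1e-3,
    "hmmsearch": 0.1,
    "impala": 1e-5,
    "phi_blast": 1.0,
}


def pool_hits(results: Dict[str, Dict[str, float]]) -> Set[str]:
    """Return the union of sequence IDs passing the cut-off of any method.

    `results` maps a method label to {sequence ID: best E-value}.
    Taking the union (rather than the intersection) is an assumption.
    """
    pooled: Set[str] = set()
    for method, hits in results.items():
        cutoff = THRESHOLDS[method]
        pooled.update(seq_id for seq_id, evalue in hits.items() if evalue < cutoff)
    return pooled
```

Pooling by union favours sensitivity, which is why a separate check for true positives is needed afterwards.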

Check for true positives

Annotated tyrosine phosphatase homologues for the putative human tyrosine phosphatases were searched in the Protein Data Bank (PDB) (Westbrook et al., 2002) and the non-redundant database using BLAST (Altschul et al., 1990). If a protein was related to any of the tyrosine phosphatases in the databases with a significant expectation value, it was considered to be a true member of the family. When no sequence relationship with known tyrosine phosphatases could be established, a fold prediction was conducted using GENTHREADER (Jones, 1999) and 3D-PSSM (Kelley and Sternberg, 2000). Failure to connect a hit to any of the known members even by fold prediction, within significant values, would suggest the hit to be a false positive.

Removal of redundant proteins and pseudogenes

Redundant proteins, as evident by 100% sequence identity with another hit, were not considered for further analysis. Proteins appearing as fragments or identical to larger proteins were considered as pseudogenes. These proteins were removed from our analysis. NCBI annotations were also consulted for identifying pseudogenes.
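The redundancy filter described above can be sketched as follows. This is an illustrative simplification under stated assumptions: redundancy and fragments are both detected as exact substring containment, sequences are held in memory as plain strings, and the NCBI pseudogene annotations mentioned in the text are not modelled. The function name `remove_redundant` is hypothetical.

```python
from typing import Dict


def remove_redundant(seqs: Dict[str, str]) -> Dict[str, str]:
    """Drop sequences identical to, or contained as a fragment in, another hit.

    `seqs` maps a sequence ID to its amino acid sequence.
    """
    # Process longest sequences first so that every fragment is compared
    # against the longer sequences already kept; ties are broken by ID.
    ordered = sorted(seqs.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    kept: Dict[str, str] = {}
    for seq_id, seq in ordered:
        # An identical sequence is also a substring of its duplicate.
        if any(seq in longer for longer in kept.values()):
            continue
        kept[seq_id] = seq
    return kept
```

Of two identical sequences, the one whose identifier sorts first is retained.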

Assignment of domains, transmembrane regions and cellular localization of the family members

Co-existing domains of the human tyrosine phosphatases were predicted employing the IMPALA (Schaffer et al., 1999) and HMMPFAM (Eddy, 1998) sequence-to-profile matching methods. Each of the tyrosine phosphatase sequences was matched to protein family profiles from the SMART (Letunic et al., 2002) and PfamA (Bateman et al., 2002) databases. The transmembrane regions for each of the hits were identified using HMMTOP (Tusnady and Simon, 2001) and SOSUI (Mitaku et al., 1999). The cellular localization of the proteins was predicted using SubLoc (Hua and Sun, 2001) and TargetP (Emanuelsson et al., 2000).

Finding orthologues across model systems

The complete proteomes of four model systems, D.melanogaster (The Drosophila melanogaster Sequencing Consortium, 2000), C.elegans (The C. elegans Sequencing Consortium, 1998), S.pombe (The Schizosaccharomyces pombe Sequencing Consortium, 2002) and M.musculus (The International Mouse Genome Sequencing Consortium, 2002) were downloaded from the NCBI genome server (ftp://ftp.ncbi.nih.gov/~genome). Each of the human tyrosine phosphatase sequences was queried against the genome sequence database of the model system and the nearest homologue was then searched back against the human genome database. Orthologous relationships (Koonin et al., 1997) between the obtained human tyrosine phosphatases and the model systems were examined using BLASTP (Altschul et al., 1990). Symmetrical best hits in these BLAST searches were considered as orthologues (Walker and Koonin, 1997).
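The symmetrical best hit criterion can be illustrated with a short sketch. It assumes that the BLASTP results in both directions have already been parsed into mappings from query to {subject: score}, and that a higher score (for example a bit score) indicates a better hit; both the data layout and the function names are hypothetical.

```python
from typing import Dict, Optional


def best_hit(scores: Dict[str, float]) -> Optional[str]:
    """Return the subject with the highest score, or None if there are no hits."""
    return max(scores, key=scores.get) if scores else None


def reciprocal_best_hits(
    forward: Dict[str, Dict[str, float]],
    backward: Dict[str, Dict[str, float]],
) -> Dict[str, str]:
    """Pair each human protein with a model-organism protein when each is
    the other's best hit.

    `forward` holds human-versus-model searches and `backward` holds the
    model-versus-human searches.
    """
    orthologues: Dict[str, str] = {}
    for human_id, hits in forward.items():
        candidate = best_hit(hits)
        if candidate is None:
            continue
        # Keep the pair only if the search back returns the original query.
        if best_hit(backward.get(candidate, {})) == human_id:
            orthologues[human_id] = candidate
    return orthologues
```

A human protein whose nearest homologue points back to a different human sequence is left without an orthologue, even though a close homologue exists.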

Rate of evolution of the human tyrosine phosphatases with respect to mouse

The nearest homologue of each human tyrosine phosphatase was searched in the mouse genome using BLASTP (Altschul et al., 1990) and aligned using MALIGN (Johnson et al., 1993). The evolutionary rates were calculated using the gamma distance correction (Ota and Nei, 1994).
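As an illustration of a gamma-corrected distance, the sketch below uses the commonly quoted form d = a[(1 - p)^(-1/a) - 1], where p is the proportion of differing aligned residues and a is the shape parameter of the gamma distribution. The text does not state the shape parameter or the treatment of gaps, so the default a = 2.0 and the exclusion of gapped columns are assumptions, and the function names are hypothetical.

```python
def p_distance(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Proportion of differing residues over ungapped aligned columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Columns with a gap in either sequence are ignored (an assumption).
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def gamma_distance(p: float, a: float = 2.0) -> float:
    """Gamma-corrected distance d = a * ((1 - p) ** (-1 / a) - 1).

    The shape parameter `a` defaults to 2.0 as a placeholder only.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    return a * ((1.0 - p) ** (-1.0 / a) - 1.0)
```

The correction grows faster than p itself, so highly diverged pairs receive distances above 1.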


    Results
 
The 4-fold search detected 144 putative tyrosine phosphatases in the human genome. None of the hits were found to be false positives as validated by the reverse PSI-BLAST and fold recognition (see Materials and methods for details). Forty-eight sequences were not considered in our analysis since they were redundant, appeared as fragments of larger sequences or were pseudogenes, leaving 96 for further analysis. Sixty-one human tyrosine phosphatases have been experimentally verified so far and our 4-fold approach identifies all of them (see Supplementary data for details; these data can be accessed from http://www.ncbs.res.in/%7Efaculty/mini/hum_tp/suppl.html). Out of these, 39 are cytoplasmic phosphatases and the rest are dual-specific phosphatases. The recent work by Andersen et al. (2001) on vertebrate classical tyrosine phosphatases considered 32 of them. Forty-one of the final 96 putative tyrosine phosphatases closely resembled the DuSP family profile. The classical tyrosine phosphatases were almost evenly distributed between cytosolic (26) and membrane-bound (29) forms. The predicted cellular localization shows that DuSPs preferentially reside in the nucleus, while the CyPTPs are mostly found in the cytoplasm (Figure 1).



 
Fig. 1. The predicted cellular localization of cytosolic (CyPTP), receptor-bound (rPTP) and dual-specific tyrosine phosphatases (dPTP). Transmembrane regions of the tyrosine phosphatases were predicted using HMMTOP (Tusnady and Simon, 2001) and SOSUI (Hirokawa et al., 1998). Subcellular localization has been determined using SubLoc (Hua and Sun, 2001) and TargetP (Emanuelsson et al., 2000). Twenty-nine rPTPs, 26 CyPTPs (seven nuclear and 19 cytoplasmic) and 41 dual-specific tyrosine phosphatases (dPTP/DuSP) (30 nuclear and 11 cytoplasmic) have been predicted.

 
The 96 plausible human tyrosine phosphatase sequences exhibit 26 different domain architectures (Figure 2; see Supplementary data). Fifty-two of the proteins appeared to have a single domain as assigned by our domain finding methods. DuSPs, CyPTPs and the receptor-bound tyrosine phosphatases (rPTPs) contain six, 10 and nine different multi-domain architectures, respectively. Three DuSP and two CyPTP domain architectures have not been reported earlier. The CyPTPs, similar to the DuSPs, have a simple domain architecture, often having three or fewer domains in a single polypeptide (Table I). The exceptions are the BAS molecule (accession No. NP_542416) and the His-binding domain associated tyrosine phosphatase protein (HDPTP) (accession No. NP_056281). Both proteins have several modular domains (Toyooka et al., 2000; Yoshida et al., 2002) involved in interactions with various proteins in different pathways, thus explaining their complex domain architecture. The rPTPs have complex domain architectures with a large number of domains including tandem repeats of fibronectin (III) domains (Fn3) (Figure 2).
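Counting distinct domain architectures amounts to grouping proteins by their ordered list of assigned domains. The sketch below illustrates that bookkeeping only; it assumes that an architecture is defined by the N- to C-terminal order of domain names, which the text does not state explicitly, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_by_architecture(
    domains: Dict[str, List[str]],
) -> Dict[Tuple[str, ...], List[str]]:
    """Group protein IDs by their ordered tuple of domain names.

    `domains` maps a protein ID to its domains listed from the N- to the
    C-terminus; proteins sharing the same ordered tuple share an architecture.
    """
    groups: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for protein_id, ordered_domains in domains.items():
        groups[tuple(ordered_domains)].append(protein_id)
    return dict(groups)
```

The number of keys in the returned mapping is the number of distinct architectures, and single-domain proteins are those whose key has length one.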



 
Fig. 2. Twenty-six different domain architectures observed in 96 proteins of the human genome that contain tyrosine phosphatases. Occurrence of adjacent domains in the polypeptides was recognized using sensitive methods such as IMPALA (Schaffer et al., 1999) and HMMPFAM (Eddy, 1998) (see Materials and methods for details). Novel domain architectures (discussed in detail in the text) are indicated by an asterisk. The observed domain architecture includes the positions of predicted transmembrane (TM) domains. Identified domains are colour coded and indicated schematically: PTP, classical tyrosine phosphatases; dPTP, dual-specific tyrosine phosphatases (DuSP); Fn3, fibronectin (III) domain; SH2, Src homology domain 2; Starch Binding Dom, starch-binding domain; Carb Anhyd, carbonic anhydrase domain; Ig, immunoglobulin domain; FERM, FERM domain; PDZ, PDZ domain; Cad like, cadherin-like domain; MAM, meprin/A5/µ domain; CH2, CH2 domain; Cell Retinaldehyde Associating dom, cellular retinaldehyde associating domain; Zn Finger, zinc finger; GRAM, GRAM domain; FYVE, FYVE domain; Guanylyl Trans, guanylyl transferase; PIB, phosphotyrosine interacting domain; BRO1, BRO1 homology domain; His Dom, His-binding domain; PEST, PEST segment; S/T Kin, Ser/Thr kinase; DNA J, DNA-J domain.

 

 
Table I. Number of domains present along with human tyrosine phosphatases
 
The different tyrosine phosphatases show preferences for particular co-existing domains when present in multi-domain polypeptides, where the co-existing domain is believed to aid their interactions with substrates. CyPTPs are commonly associated with immunoglobulin and FERM domains, while CH2 domains are frequently present in multi-domain DuSPs. There is a strong preference for Fn3 domains in the rPTPs (Table II). Tandem repeats of Fn3 domains are commonly associated with one or two tyrosine phosphatase domains in the rPTP polypeptides.


 
Table II. Different domains associated with CyPTPs, DuSPs and rPTPs
 
Uncommon domain architecture of tyrosine phosphatase

Reports of at least 20 different domain architectures for the tyrosine phosphatases in vertebrates have been reviewed earlier (Li and Dixon, 2000; Andersen et al., 2001). Five of the domain combinations identified in the present analysis have not been reported earlier in the human genome.

(1) The PTPRC protein (accession No. NP_002829; Figure 2, domain architecture no. 4) is specifically expressed in haematopoietic cells (Yamada et al., 2002) and its deficiency leads to severe combined immunodeficiency (SCID) (Harashima et al., 2002). This protein has been reported to be similar to CD45, containing an Fn3 domain along with two CyPTP domains. We predict the presence of a second Fn3 domain in the extracellular region, aiding its role in erythropoiesis.

(2) The myotubularin-related protein-2 (accession Nos NP_000243, NP_057240.1; Figure 2, domain architecture no. 17) is associated with muscle cell differentiation (Sutton et al., 2001) and mutations in this gene have been identified as being responsible for X-linked myotubular myopathy (Biancalana et al., 2003). The C-terminal end is known to house the CyPTP domain (Taylor et al., 2000). We predict a GRAM domain that is common to glucosyl transferases and other putative membrane-associated proteins.

(3) NP_009171 (Figure 2, domain architecture no. 22) is a member of the mitogen-activated protein kinase phosphatase (MAPKP) family involved in cellular proliferation and differentiation (Groom et al., 1996). Unlike other members of the family, this sequence is associated with a zinc finger domain at its C-terminus. This zinc finger domain may play a role in interactions with various transcriptional factors, while the DuSP domain may regulate its activity through different phosphorylation states.

(4) The cyclin G-associated kinase (GAK) family members have variable expression during cell division with the highest level of expression at G1 phase (Kimura et al., 1997). In one of the GAK proteins (accession No. NP_005246; Figure 2, domain architecture no. 21) we find a strong structural compatibility (Figure 3) and a sequential relationship to the DuSP domain (residues 462–584). This domain appears as an insertion between the previously characterized N-terminal kinase (Ser/Thr) domain and the C-terminal DNA-J-like domain. The presence of a catalytic HCX3R motif suggests a possible catalytically active tyrosine phosphatase domain. Co-existence of a kinase and a phosphatase in a single polypeptide is being reported for the first time in the human genome. It is tempting to speculate that the phosphatase and kinase domains modulate the function of the substrate, critically changing the phosphorylation state.




 
Fig. 3. (a) The Ser/Thr kinase and DuSP domains present in the GAK protein, aligned with classical members of each family. The observed helices and β-strands in the classical members are marked. The predicted positions of secondary structures in the GAK sequence have been projected in a similar manner. The N-terminal part of GAK that is aligned with PAK-α (PDB code 1f3m), a member of the Ser/Thr kinase superfamily, is shown. (b) C-terminal domain of GAK aligned with phosphoinositide phosphatase Pten (Pten tumour suppressor; PDB code 1d5r), a classical member of the dual-specific tyrosine phosphatase superfamily.

 
(5) NP_003791 (Figure 2, domain architecture no. 20) is an example of a novel domain architecture that involves a DuSP domain at the N-terminus and the C-terminal RNA guanylyl transferase catalytic domain, belonging to the mRNA capping enzyme family. These proteins are known to be bifunctional, having RNA 5'-triphosphatase and mRNA guanylyl transferase activities in a single polypeptide (Tsukamoto et al., 1998). We predict the involvement of the DuSP domain in RNA 5'-triphosphatase activity.

Orthologues in the model systems

Orthologues are protein pairs that have the same function, allowing transfer of functional information from one genome to the other (Tatusov et al., 1997). Study of the orthologues of tyrosine phosphatases provides an approach for studying the conservation of protein function across the different model systems (yeast, Drosophila, C.elegans and mouse) (see Materials and methods). The number of orthologues for the human PTP-containing proteins varies between five (in yeast) and 56 (in mouse) in the different model systems considered. An increase in the number of orthologues is expected as the evolutionary distance between the organism and humans decreases (Table III). A positive correlation between the percent of human rPTP orthologues and the complexity of the model organism is observed (Figure 4), which suggests an incremental preference for recruitment of rPTPs during the course of evolution. No such correlation, either increasing or decreasing, could be established for DuSPs and CyPTPs (an average of 30–45% orthologues; Figure 4) across the four model systems studied.


 
Table III. Number of orthologues present in different model systems
 


 
Fig. 4. Percent rPTP, CyPTP and DuSP (dPTP) orthologues in the different model systems. An orthologous relationship for the tyrosine phosphatases in the different model systems was searched for based on symmetrical best hits (Walker and Koonin, 1997). An incremental rise in the percent of rPTPs with complexity of organism is observed.

 
Three nuclear localized tyrosine phosphatases find orthologues in all the model systems whose genomes were examined. Interestingly, this is not always accompanied by preservation of domain architecture across organisms. Two out of these three proteins, mRNA capping enzyme and DuSP (accession Nos NP_003791.1, NP_009171.1), do not maintain a conservation of domain architecture.

(1) The human mRNA capping enzyme (NP_003791.1) is orthologous to a single-domain yeast DuSP protein involved in the phosphatase activity. In yeast, similar RNA capping activity is carried out by two polypeptides (Itoh et al., 1987), explaining the non-conservation of domain architecture. The two adjacent genes in yeast have fused to form a single polypeptide in higher eukaryotes for higher efficiency in the RNA capping function.

(2) In NP_009171.1, the DuSP domain is conserved across the model systems. However, the insertion of the zinc finger domain is only found in human and mouse genomes. The zinc finger domain may aid the interaction of the protein with other transcriptional factors.

On the other hand, PTEN molecules have a conserved tyrosine phosphatase domain across all the model systems considered. The single domain CyPTP PTEN protein (accession No. NP_000305.1) functions as a tumour suppressor (Koul et al., 2002). Mutation of this protein leads to carcinoma (Kurose et al., 2002). The protein is expressed during the G0–G1 phase of the cell cycle (Ginn-Pease and Eng, 2003) and its function in cell size regulation has been well documented in mammals (Backman et al., 2002). This function of the protein is well conserved across the different model organisms considered.

A number of orthologous relationships between model systems and human were established; however, a similar orthologous relationship was not maintained across the higher model organism (data not shown). The myotubularin-related molecule-8 (accession No. NP_060147.2), for example, has orthologues in S.pombe and D.melanogaster genomes. However, a similar orthologous relationship could not be established in the C.elegans and the mouse genome though close homologues could be detected. Such relationships would suggest reverse evolution or possible convergent evolution in the family of tyrosine phosphatases.

Evolutionary rate of the tyrosine phosphatases

The substitution rates or the evolutionary rate of the 96 human tyrosine phosphatases with respect to mouse were evaluated employing the gamma distance correction (Ota and Nei, 1994) (data not shown). The presence of a limited number of homologues for the different human tyrosine phosphatases restricts similar analysis in other model systems. Among the three categories, the DuSP proteins have the highest rate of evolution, having a dG (rate of substitution) of 0.71. The rate of evolution is found to be lowest in rPTPs (dG = 0.36). Though the rate of evolution is slower in CyPTPs (dG = 0.44) compared with DuSPs, a number of subfamilies, for example, SHP-2 (accession No. XP_069073.2), tend to have high evolutionary rates (dG = 1.13). A closer analysis shows that this high value in SHP-2 is due to an additional SH2 domain in the mouse sequence, not present in humans.


    Discussion
 
The 96 human tyrosine phosphatases (DuSP, CyPTP and rPTP) identified by the present analysis can be classified into 26 domain architectures. Since the low molecular weight tyrosine phosphatases are expected to be small in number (A.Bhaduri and R.Sowdhamini, unpublished results) and are structurally and evolutionarily unrelated to the dual-specific and classical tyrosine phosphatases, we have not considered their family in the present analysis. Owing to the three-dimensional similarity, we have clustered the single domain rPTPs, CyPTPs and the DuSPs together (Figure 2, domain architecture no. 1). The plausible existence of novel domains or limitations in the domain finding techniques led to the incomplete assignment of domains in nine proteins, the majority of these (five in number) being the rPTP sequences. For example, the receptor-bound N-type rPTP (accession No. NP_002837), which acts as an autoantigen and is reactive with insulin-dependent diabetes mellitus patient sera (Cui et al., 1996), appears to have a single tyrosine phosphatase domain at its C-terminus. A total of 553 N-terminal extracellular residues in this sequence failed to match any protein family. Secondary structure prediction of the extracellular domains using PSI-PRED (McGuffin et al., 2000) suggests an alpha-rich N-terminal domain with no definite fold prediction.

We also found 40 sequential clusters (data not shown), implying involvement of the tyrosine phosphatase domain in at least 40 plausible different biochemical reactions (Joost and Methner, 2002). Splice variations and different specificity to substrates in the members belonging to a cluster could contribute to additional versatility in their participation in more reactions.

Cellular localization suggests a clear preference of the different tyrosine phosphatases. DuSPs prefer to reside in the nucleus, while the CyPTPs have a predilection for the cytoplasm (Figure 1). This is consistent with the characterized tyrosine phosphatases, suggesting a greater number of nuclear DuSPs compared with the CyPTPs. However, proteins such as MAPKP-5 (accession No. NP_653329), which is weakly predicted to be cytoplasmic, function in both the cytoplasm and the nucleus (Tanoue et al., 1999). Localization of proteins could also vary due to splice variation in the amino acid sequence, as seen in PTPε. The two different splice variants of PTPε (accession No. NP_006495.1) are either cytoplasmic or receptor-bound in localization (Wabakken et al., 2002).

It was interesting to note that only three out of the 96 proteins have orthologues in all four model systems considered. A majority of the proteins that maintain an orthologous relationship across multiple genomes (data not shown) are nuclear, suggesting a primitive and conserved role of the nuclear tyrosine phosphatases compared with their cytoplasmic and membrane counterparts. Comparison of the different evolutionary rates across different architectures suggests that the highest rate of evolution is for DuSPs, while rPTPs are the slowest evolving group. An increase in the percent of rPTP orthologues, their diverse domain architectures and relatively slow rate of evolution would suggest that the primary mode by which diversity is introduced in rPTP-containing sequences is through a domain recruitment mechanism. A rise in the number of orthologous DuSPs with increasing genome complexity (Table III) and a high rate of evolution may compensate for the drop in Ser/Thr phosphatases (A.Bhaduri and R.Sowdhamini, unpublished results).

Fifty-four human receptor tyrosine kinases have been reported to exist in 18 domain combinations (Krupa and Srinivasan, 2002), associating with 22 different domains. The 29 receptor tyrosine phosphatases are present as nine different domain architectures (Figure 2) in combination with only five domains (Table II). Comparing the different domain architectures of kinases (Krupa and Srinivasan, 2002) and phosphatases (data not shown), we failed to identify similarity in the different domain architectures apart from SHP-1, which has two SH2 domains and a tyrosine phosphatase domain at the C-terminus (accession Nos NP_002825, NP_536858). A number of tyrosine kinase proteins have SH3 domains present between the SH2 domain and the kinase domain. Similar associations with SH3 domains are not reported in the present analysis; however, six proteins containing tyrosine phosphatase domains also contain SH2 domains (Table II).

The current work presents a comprehensive study and an early bioinformatic overview of the PTP family in the human genome. Classification of the PTP-containing polypeptides on the basis of domain architecture and cellular localization helps us to associate the proteins with the various biochemical pathways and the different cellular niches. The functional diversification of tyrosine phosphatases in the human genome is quite apparent from the present and previous analyses (Li and Dixon, 2000; Andersen et al., 2001). Shuffling of domains or modules among the various PTPs seems to be one of the obvious means of adapting to diverse biological roles. Reports of several novel domain combinations indicate an increase in the functional repertoire. The presence of several unexpected domains provides insights into the unknown regions of several known protein families. Experimental verification of these observations could enhance our understanding of the specific biological roles of these novel PTPs.


    References
 
Altschul,S.F., Gish,W., Miller,W., Myers,E.W. and Lipman,D.J. (1990) J. Mol. Biol., 215, 403–410.[CrossRef][ISI][Medline]

Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang, J, Zhang, Z, Miller,W. and Lipman,D.J. (1997) Nucleic Acids Res., 25, 3389–3402.[Abstract/Free Full Text]

Andersen,J.N. et al. (2001) Mol. Cell. Biol., 21, 7117–7136.[Free Full Text]

Angers-Loustau,A., Cote J.F. and Tremblay,M.L. .(1999) Biochem. Cell Biol., 77, 493–505.[CrossRef][ISI][Medline]

Backman,S., Stambolic,V. and Mak,T. (2002) Curr. Opin. Neurobiol., 12, 516–522.[CrossRef][ISI][Medline]

Bateman,A. et al. (2002) Nucleic Acids Res., 30, 276–280.[Abstract/Free Full Text]

Biancalana,V. et al. (2003) Hum. Genet., 112, 135–142.[ISI][Medline]

Cui,L., Yu,W.P., DeAizpurua,H.J., Schmidli,R.S. and Pallen,C.J. (1996) J. Biol. Chem., 271, 24817–24823.[Abstract/Free Full Text]

Denu,J.M. and Dixon,J.E. (1998) Curr. Opin. Chem. Biol., 5, 633–641.[CrossRef]

Eddy,S.R. (1998) Bioinformatics, 14, 755–763.[Abstract]

Emanuelsson,O., Nielsen,H., Brunak,S. and Heijne G.V. (2000) J. Mol. Biol., 300, 1005–1016.[CrossRef][ISI][Medline]

Ginn-Pease,M.E. and Eng,C. (2003) Cancer Res., 63, 282–286.[Abstract/Free Full Text]

Groom,L.A., Sneddon,A.A., Alessi,D.R., Dowd,S. and Keyse,S.M. (1996) EMBO J., 15, 3621–3632.[Abstract]

Harashima,A., Suzuki., M, Okochi., A, Yamamoto,M., Matsuo,Y., Motoda,R., Yoshioka,T. and Orita,K. (2002) Blood, 100, 4440–4445.[Abstract/Free Full Text]

Hirokawa,T., Boon-Chieng,S. and Mitaku,S. (1998) Bioinformatics, 14, 378–379.[Abstract]

Hooft van Huijsduijnen,R. (1998) Gene, 225, 1–8.[CrossRef][ISI][Medline]

Hua,S. and Sun,Z. (2001) Bioinformatics, 17, 721–728.[Abstract/Free Full Text]

Hunter,T. (1995) Cell, 80, 225–236.[ISI][Medline]

Hunter,T. (2000) Cell, 100, 113–127.[ISI][Medline]

Itoh,N., Yamada,H., Kaziro,Y. and Mizumoto,K. (1987) J. Biol. Chem., 262, 1989–1995.[Abstract/Free Full Text]

Johnson,M.S., Overington,J.P. and Blundell,T.L. (1993) J. Mol. Biol., 233, 735–752.

Jones,D.T. (1999) J. Mol. Biol., 287, 797–815.[CrossRef][ISI][Medline]

Joost,P. and Methner,A. (2002) Genome Biol., 3, research0063.

Kimura,S.H., Tsuruga,H., Yabuta,N., Endo,Y. and Nojima,H. (1997) Genomics, 44, 179–187.[CrossRef][ISI][Medline]

Kishihara,K. et al. (1993) Cell, 74, 143–156.[ISI][Medline]

Kelley,L.A. and Strenberg,M.J.E. (2000) J. Mol. Biol., 292, 507–522.[CrossRef]

Koonin,E.V., Mushegian,A.R., Galperin,M.Y. and Walker,D.R. (1997) Mol. Microbiol., 25, 619–637.[ISI][Medline]

Kostich,M., English,J., Madison,V., Gheyas,F., Wang,L., Qiu,P., Greene,J. and Laz,T.M. (2002) Genome Biol., 3, research0043.1–research0043.12.

Koul,D., Shen,R., Garyali,A., Ke,L.D., Liu,T.J. and Yung,W.K. (2002) Int. J. Oncol., 21, 469–475.[ISI][Medline]

Krupa,A. and Srinivasan,N. (2002) Genome Biol., 3, research0066.1–research0066.14.

Kurose,K., Gilley,K., Matsumoto,S., Watson,P.H., Zhou,X.P. and Eng,C. (2002) Nat. Genet., 32, 355–357.[CrossRef][ISI][Medline]

Letunic,I., Goodstadt,L., Dickens,N.J., Doerks,T., Schultz,J., Mott,R., Ciccarelli,F., Copley,R.R., Ponting,C.P. and Bork,P. (2002) Nucleic Acids Res., 30, 242–244.[Abstract/Free Full Text]

Li,L. and Dixon,J.E. (2000) Semin. Immunol., 12, 75–84.[CrossRef][ISI][Medline]

McGuffin,L.J., Bryson,K. and Jones,D.T. (2000) Bioinformatics, 16, 404–405.[Abstract]

Mitaku,S., Ono,M., Hirokawa,T., Boon-Chieng,S. and Sonoyama,M. (1999) Biophys. Chem., 82, 165–171.[CrossRef][ISI][Medline]

Murzin,A.G., Brenner,S.E., Hubbard,T. and Chothia,C. (1995) J. Mol. Biol., 247, 536–540.[CrossRef][ISI][Medline]

Ota,T. and Nei,M. (1994) Mol. Biol. Evol., 11, 613–619.[Abstract]

Schaffer,A.A., Wolf,Y.I., Ponting,C.P., Koonin,E.V., Aravind,L. and Altschul,S.F. (1999) Bioinformatics, 15, 1000–1011.[Abstract/Free Full Text]

Shultz,L.D., Schweitzer,P.A., Rajan,T.V., Yi,T., Ihle,J.N., Matthews,R.J., Thomas,M.L. and Beier,D.R. (1993) Cell, 73, 1445–1454.[ISI][Medline]

Sutton,I.J., Winer,J.B., Norman,A.N., Liechti-Gallati,S. and MacDonald,F. (2001) Neurology, 57, 900–902.[Abstract/Free Full Text]

Tanoue,T., Moriguchi,T. and Nishida,E. (1999) J. Biol. Chem., 274, 19949–19956.[Abstract/Free Full Text]

Tatusov,R.L., Koonin,E.V. and Lipman,D.J. (1997) Science, 278, 631–637.[Abstract/Free Full Text]

Taylor,G.S., Maehama,T. and Dixon,J.E. (2000) Proc. Natl Acad. Sci. USA, 97, 8910–8915.[Abstract/Free Full Text]

The C. elegans Sequencing Consortium (1998) Science, 282, 2011–2046.[CrossRef][ISI]

The Drosophila melanogaster Sequencing Consortium (2000) Science, 287, 2185–2195.[Abstract/Free Full Text]

The International Human Genome Sequencing Consortium (2001) Nature, 409, 860–921.[CrossRef][ISI][Medline]

The International Mouse Genome Sequencing Consortium (2002) Nature, 420, 520–562.[CrossRef][ISI][Medline]

The Schizosaccharomyces pombe Sequencing Consortium (2002) Nature, 415, 871–880.[CrossRef][ISI][Medline]

Toyooka,S., Ouchida,M., Jitsumori,Y., Tsukuda,K., Sakai,A., Nakamura,A., Shimizu,N. and Shimizu,K. (2000) Biochem. Biophys. Res. Commun., 278, 671–678.[CrossRef][ISI][Medline]

Tsukamoto,T., Shibagaki,Y., Murakoshi,T., Suzuki,M., Nakamura,A., Gotoh,H. and Mizumoto,K. (1998) Biochem. Biophys. Res. Commun., 243, 101–108.[CrossRef][ISI][Medline]

Tusnady,G.E. and Simon,I. (2001) Bioinformatics, 17, 849–850.[Abstract/Free Full Text]

Wabakken,T., Hauge,H., Funderud,S. and Aasheim,H.C. (2002) Scand. J. Immunol., 56, 276–285.[CrossRef][ISI][Medline]

Walker,D.R. and Koonin,E.V. (1997) Proc. Int. Conf. Intell. Syst. Mol. Biol., 5, 333–339.[Medline]

Walton,K.M. and Dixon,J.E. (1993) Annu. Rev. Biochem., 62, 101–120.[CrossRef][ISI][Medline]

Westbrook,J. et al. (2002) Nucleic Acids Res., 30, 245–248.[Abstract/Free Full Text]

Yamada,T., Zhu,D., Saxon,A. and Zhang,K. (2002) J. Biol. Chem., 277, 28830–28835.[Abstract/Free Full Text]

Yoshida,H. et al. (2002) J. Immunol., 168, 3213–3220.[Abstract/Free Full Text]

Zhang,Z., Schaffer,A.A., Miller,W., Madden,T.L., Lipman,D.J., Koonin,E.V. and Altschul,S.F. (1998) Nucleic Acids Res., 26, 3986–3990.[Abstract/Free Full Text]

Received June 11, 2003; revised October 24, 2003; accepted October 30, 2003




