Identification of the Linker-SH2 Domain of STAT as the Origin of the SH2 Domain Using Two-dimensional Structural Alignment*

Qian Gao‡,§, Jian Hua‡, Rich Kimura||, Jeffery J. Headd**, Xin-yuan Fu‡‡ and Y. Eugene Chin**,¶¶

From the ‡ Whitehead Institute for Biomedical Research, Cambridge, MA 02142; || Section of Molecular and Cellular Biology, University of California, Davis, CA 95616; ‡‡ Department of Pathology, Yale University School of Medicine, New Haven, CT 06510; and ** Departments of Surgery Science and Molecular Biology, Cell Biology and Biochemistry, Brown University School of Medicine, Providence, RI 02903


    ABSTRACT
 
The availability of large volumes of genomic sequences presents an unprecedented proteomic challenge to characterize the structure and function of various protein motifs. Primary structural alignment is often unable to accurately identify a given motif because of sequence divergence; however, with the aid of secondary structural prediction, it becomes feasible to explore protein motifs on a proteome-wide scale. Here we report the use of secondary structural alignment to characterize the Src homology 2 (SH2) domains of both conventional and divergent sequences and divide them into two groups, Src-type and STAT-type. In addition to the basic "αβββα" structure (βB), the Src-type SH2 domain contains an extra β-strand (βE or βE-βF motif). In contrast, the linker domain-conjugated SH2 domain in STAT contains the αB′ motif. Combining BLAST data from βB core motif sequences with predicted secondary structural alignment, we have screened for SH2 domains in various eukaryotic model systems including Arabidopsis, Dictyostelium, and Saccharomyces. Two novel genes carrying the linker-SH2 domain of STAT were discovered and subsequently cloned from Arabidopsis. These genes, designated as STAT-type linker-SH2 domain factors (STATL), are found in a wide array of vascular and nonvascular plants, suggesting that the linker-SH2 domain evolved prior to the divergence of plants and animals. Using this approach, we expanded the number of putative SH2 domain-bearing genes in Dictyostelium and comparatively studied the secondary structural profiles of both typical and atypical SH2 domains. Our results indicate that the linker-SH2 domain of the transcription factor STAT is one of the most ancient and fully developed functional domains, serving as a template for the continuing evolution of the SH2 domain essential for phosphotyrosine signal transduction.


The Src homology 2 (SH2) domain is an ~100-aa-long motif that recognizes and interacts with phosphotyrosine-containing motifs on the same or different protein molecules during signal transduction in animal cells. About 200 SH2 domain-containing genes have been identified in human cells, suggesting that this domain is one of the most rapidly expanded protein modules (1). In animal cells, SH2 domains are predominantly present in signaling molecules, i.e. signaling adapters and signaling-related enzymes including protein tyrosine kinases, protein tyrosine phosphatases, inositol phosphatase, and phospholipase. However, the SH2 domain has also been found in members of the transcription factor STAT family (2, 3). In a signaling molecule with catalytic activity, the SH2 domain is often conjugated immediately upstream with another functional motif such as the SH3 domain, whereas in STAT the linker domain lies immediately upstream of the SH2 domain. Recently, two STAT proteins have been discovered in Dictyostelium, a facultative slime mold capable of both growing as a single cell and differentiating into multicellular structures (4, 5). More recently, SHK, an SH2 domain-bearing protein kinase, has been identified in the same species (6). In both cases, the SH2 domains are conjugated to a linker domain. Because a typical SH2 domain has not been found in yeast, a single-cell eukaryote, or in microorganisms, the formation of the SH2 domain and its phospho-signaling were proposed to be coincident with animal evolution, perhaps critical during the transition from single-cell to multicellular animals or metazoans (7, 8).

The characteristic structure of the SH2 domain is three β-strands flanked by two α-helices (αβββα). The first β-strand (βB), conserved in its sequence GXF/YBBR (9), is the core motif critical for binding phosphotyrosine (pTyr) (10). This sequence is required for the normal function of the SH2 domain and conveniently serves as the fingerprint structure in SH2 domain recognition. While βB core motif and motif-like sequences exist widely in different genes, even a perfect βB sequence is not necessarily indicative of an SH2 domain (11). In these cases, secondary structural alignment resolves the ambiguity left by sequence alignment alone. Additionally, secondary structural analysis can provide reliable structural evidence for SH2 domains with ambiguous βB and βB-flanking sequences. A careful analysis of the amino acid sequence, motif orientation, and secondary structural features indicates that STAT SH2 domains differ from those involved in signal transduction. In a STAT protein, the SH2 domain is the immediate extension of five consecutive α-helices representing the linker domain. Moreover, all STAT-like SH2 domains carry an αB′ motif between the βD and αB sequences. We combined βB core motif sequence BLAST with secondary structural screening to identify SH2 domains in genome databases of various eukaryotic model systems. We identified and analyzed a novel gene family, which carries the linker-SH2 domain of STAT, in the genome database of Arabidopsis as well as other plants. Two of these linker-SH2 domain-carrying genes were cloned from the cDNA library of Arabidopsis and sequenced. Using secondary structural alignment, we comprehensively analyzed the typical and atypical SH2 domains found in plants, yeast, and Dictyostelium. According to our secondary structural analysis, the linker-SH2 domain of SHK in Dictyostelium, representing the modern SH2 domain in signal transduction, may share a common ancestor with, or may even have evolved directly from, the linker-SH2 domain of STAT. The discovery of the linker-SH2 domain of STAT in plants supports the notion that this domain had developed prior to the divergence of the plant and animal kingdoms.


    MATERIALS AND METHODS
 
Pat-match Search and Secondary Structural Analysis—
The consensus sequence used for Pat-match (www.arabidopsis.org/cgi-bin/patmatch/nph-patmatch.pl for TAIR Arabidopsis genome and seq.yeastgenome.org/cgi-bin/SGD/PATMATCH/nph-patmatch for yeast genome) screening is "GXF/YBBR" (X = any amino acid; B = hydrophobic amino acid). This consensus represents 98% of the βB core motif sequences based upon analysis of 775 SH2 domain sequences in the SMART database using the Simple Modular Architecture Research Tool (SMART; smart.embl-heidelberg.de). The sequences flanking the βB motif in these putative genes were analyzed with the secondary structure prediction program 3D-PSSM web tool version 2.5.6 (www.sbg.bio.ic.ac.uk/~3dpssm/), which predicts α-helices and β-strands. When the overall α-helix/β-strand arrangement around the motif sequences was evaluated, rather than individual amino acid residues, all predicted α-helices and β-strands agreed with the structural information obtained from crystallographic analysis.
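The consensus screen described above can be approximated with a regular expression. The following is a minimal sketch, not the Pat-match implementation: the set of residues treated as "hydrophobic" (B) is an assumption, because the text does not enumerate it, and the function name `find_beta_b` and the test fragment are illustrative only.

```python
import re

# Assumption: the residues counted as hydrophobic (B) are not listed in the
# text; this set is an illustrative choice and can be changed.
HYDROPHOBIC = "AVILMFWC"

# GXF/YBBR: G, any residue, F or Y, two hydrophobic residues, R.
BETA_B = re.compile(rf"G[A-Z][FY][{HYDROPHOBIC}]{{2}}R")


def find_beta_b(seq):
    """Return (0-based start, matched text) for every betaB-like hit in seq."""
    hits = []
    # A lookahead is used so that overlapping candidates are not skipped.
    for m in re.finditer(rf"(?=({BETA_B.pattern}))", seq.upper()):
        hits.append((m.start(), m.group(1)))
    return hits


if __name__ == "__main__":
    # Toy fragment for illustration, not a database entry.
    print(find_beta_b("MKNPRGTFLVRESETT"))
```

A hit from such a scan is only a candidate; as described above, the flanking sequence still has to be checked by secondary structure prediction.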

Cloning and Northern blot of at-STATLa and at-STATLb—
Both the at-statla and at-statlb genes were cloned from the Arabidopsis cDNA library. Primers from known flanking regions were used to clone the full-length sequences of at-statla and at-statlb, and various primer combinations were designed so that overlapping PCR products covered the full length of each gene. PCR products were subsequently sequenced, and the gene sequences of both at-statla and at-statlb have been deposited in GenBank. Total RNAs prepared from different parts of Arabidopsis were used for Northern blot analysis.

Western Blotting and Immuno-precipitation—
Arabidopsis thaliana, ecotype Columbia, was grown on Murashige and Skoog agar medium at 22 °C under constant light for 3 weeks. Sodium orthovanadate (100 µM) and hydrogen peroxide (1 mM) were added for the indicated times. Whole Arabidopsis plants were homogenized in ice-cold extraction buffer (30 mM Tris-HCl, pH 8.5, 150 mM NaCl, 1 mM EDTA, 20% glycerol, 1 mM dithiothreitol, and proteinase inhibitors). Cell debris was separated from soluble material by centrifugation at 18,000 × g for 5 min. Full-length GST-at-STATLa and the GST-at-STATLb-SH2 domain (596–692) were constructed, expressed, and purified as described previously (12). Purified glutathione S-transferase (GST) recombinant proteins were incubated with the above-prepared extracts. Extensively washed GST protein precipitates were then subjected to standard Western blotting procedures using horseradish peroxidase-conjugated anti-pTyr (pY20) and enhanced chemiluminescence (Amersham Biosciences, Piscataway, NJ) for detection.


    RESULTS AND DISCUSSION
 
STAT-type SH2 Domain Differs from Src-type SH2 Domain at Secondary Structural Level—
Using the secondary structural prediction program, we analyzed the SH2 domains available in the SMART protein database. As expected, all the SH2 domains examined contain the basic "αβββα" structure. The SH2 domains of signaling factors, including enzymes and adapters, exclusively contain the predicted βE or βE-βF motif, consistent with crystallographic findings (Fig. 1A) (13–16). The small β-strand, βE or βE-βF motif, has proven to be critical for protein/ligand recognition in pTyr signaling factors (10). However, the βF fragment is not always detectable or predictable, presumably because of the instability of this sequence in β-strand formation (15, 17). According to the distance between the βC and βD motifs, the SH2 domain-carrying enzymes can be further divided into long βC-βD loop and short βC-βD loop groups (Fig. 1A). Src family members all belong to the long βC-βD group. In most of the adapters, the SH2 domains carry a putative short βC-βD loop. For the transcription factor STAT, the length of the predicted βC-αB sequence varies among family members and is overall shorter than that of signaling factors (Fig. 1A). The most striking difference between Src-type and STAT-type SH2 domains, however, is that STAT or STAT-like SH2 domains do not contain the βE or βE-βF motif. Instead, they all contain the αB′ or a nonsplit αB motif (Fig. 1A), which is considered the critical region for STAT dimerization (16, 18). All these secondary structural features of STAT-type SH2 domains obtained from prediction are consistent with the findings from crystallographic studies (16, 18). In Drosophila STAT, the SH2 domain carries the putative αB′ motif despite lacking the βD motif (Fig. 1A). Two putative STAT-like sequences (ce-STATa and ce-STATb) were previously found by homologous alignment in Caenorhabditis elegans genome data (19). Our secondary structural analysis predicts that these SH2 domains also contain αB′ motifs (Fig. 1A). Therefore, the level of detail obtained here can readily distinguish subtle differences between the SH2 domains of STAT and those of signaling factors.
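The distinction drawn above can be written as a simple rule over the ordered list of predicted elements. This is an illustrative sketch only: the ASCII labels ("aA", "bB", "bE", "aB'", and so on) and the function name `classify_sh2` are assumptions made for the example, and the rule does not attempt to recognize a nonsplit αB, which cannot be told apart from an ordinary αB by label alone.

```python
# Assumption: elements are given as ASCII labels, e.g. "aA" for alphaA,
# "bE" for betaE, and "aB'" for alphaB-prime. These labels are hypothetical.
SRC_MARKERS = {"bE", "bF"}   # extra strand(s) seen in signaling-factor SH2 domains
STAT_MARKER = "aB'"          # extra helix seen in STAT-like SH2 domains


def classify_sh2(elements):
    """Classify an SH2 domain from its predicted secondary-structure elements.

    Returns "Src-type" if a betaE or betaF strand is present, "STAT-type" if
    an alphaB-prime helix is present, and "unclassified" otherwise.
    """
    found = set(elements)
    if found & SRC_MARKERS:
        return "Src-type"
    if STAT_MARKER in found:
        return "STAT-type"
    return "unclassified"


if __name__ == "__main__":
    print(classify_sh2(["aA", "bB", "bC", "bD", "bE", "bF", "aB"]))
    print(classify_sh2(["aA", "bB", "bC", "aB'", "aB"]))
```

The second call mimics a domain that lacks βD but carries αB′, the situation described above for Drosophila STAT.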




 
FIG. 1. Primary and secondary structural analysis of the SH2 domains. A, proteins that carry SH2 domains are grouped into (i) signaling enzymes: SRC(K03218), BLK(S76617), FGR(M19722), FYN(M14333), HCK(M16591), LCK(X13529), LYN(M16038), YES(BC08960), YRK, BTK(X58957), BMX(X83107), ITK(D13720), TEC(D29767), IPP(), SHP-1(M74903), SHP-2(L08807), BSK(Q13882), FRK(U00803), SYK(Z29630), ZAP-70(L05148), CSK(X74765), MATK(L18974), ABL(X16416), FES(X52192), PLCγ1(NM_182811), and PLCγ2(X14034); (ii) signaling adaptors: DAPP(NP_055210), GRB2(M96995), GRAP(U52518), GRP2(Y18051), NCK1(X17576), NCK2(AF043119), SH2B(AF227967), SHC(U73377), SCK(AL360254), PI3K-p85(P27986), VAV1(AF030227), VAV2(S76992), SOCS1(U88326), SOCS2(AB004903), GRB10(U34355), RIN1(L36463), NSP1(AAQ88772), and NSP3(AAQ89948); and (iii) self-signaling transcription factors (i.e. STAT): hs-STAT1 (M97935), hs-STAT2 (U18671), hs-STAT3 (L29277), hs-STAT4 (L78440), hs-STAT5a (L41142), hs-STAT5b (U48730), hs-STAT6 (U16031), dm-STAT (U40070), and two putative C. elegans STATs (ce-STATa, CEY51H4A; ce-STATb, AF163113) (19). All the SH2 domains indicated were submitted to secondary structure program analysis and aligned. The SH2 domains of the enzyme group are divided into long βC-αB and short βC-αB subgroups according to the distance between the putative βC and αB motifs. The predicted α-helices are printed in cyan, and the predicted β-sheets are printed in yellow. The βB motifs are shown in red. In the STAT SH2 domain, the βB motif is followed by a phenylalanine (F) residue, which is highlighted in blue. Hs-Src-Crys1 and Hs-Src-Crys2 represent the secondary structure obtained from two independent crystal studies of Src (16, 17). B, the divergent amino acid sequence of the CBL-SH2 domain (U26170) was analyzed with the secondary structure program and compared with the features obtained from a crystallographic study (20). C, the suspected SH2-like sequences of human Jak1 (M64174, M35203), Jak2 (AF058925), Jak3 (U09607), Tyk2 (X54637), and Drosophila JAK (L26975) were submitted to secondary structure program analysis and aligned with each other.

 
We also analyzed the SH2 domains with divergent sequences. We investigated CBL and JAK family members, all of which are known to contain a βB-like motif flanked by ambiguous sequences. For CBL, the typical "αβββα" topology obtained from secondary structural prediction (Fig. 1B) agreed with the conclusion drawn in a previous crystallography study (20). However, unlike Src-type or STAT-type SH2 domains, the CBL SH2 domain lacks the small βE or αB′ motif. The sequence immediately upstream of the kinase-like domain in JAK has long been suspected to be an SH2-like domain (21). Two-dimensional alignment clearly reveals that this sequence, though ambiguous, represents a typical Src-type SH2 domain (Fig. 1C). Among all the JAK members analyzed, Drosophila JAK contains the longest loop between the βB and βC motifs. Similar results were obtained with other prediction programs such as PHD (maple.bioc.columbia.edu/predictprotein) and JPRED (www.compbio.dundee.ac.uk) (data not shown). Thus, secondary structural prediction is particularly suited for, and reliable in, the detection of such fine structural differences between protein motifs with divergent sequences.
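A topology check of this kind can be sketched as follows, assuming the predictor returns one state per residue (H for helix, E for strand, anything else for coil). The three-state encoding, the minimum run length `min_len`, and the function names are assumptions for illustration and are not taken from 3D-PSSM, PHD, or JPRED output formats.

```python
from itertools import groupby


def segments(ss, min_len=3):
    """Collapse a per-residue string into ordered element letters.

    Runs of H or E shorter than min_len (an arbitrary, assumed cutoff)
    are ignored; all other states are treated as coil.
    """
    out = []
    for state, run in groupby(ss):
        if state in ("H", "E") and len(list(run)) >= min_len:
            out.append(state)
    return "".join(out)


def has_core_topology(ss):
    """True if helix, strand, strand, strand, helix occur in that order.

    Extra elements (for example an additional strand before the last
    helix) are allowed between the core elements.
    """
    remaining = iter(segments(ss))
    # "ch in remaining" advances the iterator, giving a subsequence test.
    return all(ch in remaining for ch in "HEEEH")


if __name__ == "__main__":
    toy = "CCHHHHHHCCEEEEECCEEEECCCEEEECCHHHHHHCC"  # toy prediction string
    print(segments(toy), has_core_topology(toy))
```

Because the test is a subsequence match, a domain with an added strand or helix between βD and αB still passes; distinguishing those variants is a separate step.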

STATLs Are Novel Genes Carrying the STAT-like Linker-SH2 Domain in Plants—
To locate putative SH2 domains in plants, we used Pat-match to screen the genomes of various eukaryotes for the βB core motif (see "Materials and Methods"). In the Arabidopsis Information Resource (TAIR; www.arabidopsis.org), we found 604 βB sequences in 583 putative genes, of which secondary structural analysis confirmed two putative genes containing the typical "αβββα" structure of an SH2 domain [AC007651 (protein locus: AAD50031); AC007260 (protein locus: AAD30582)]. Subsequent cDNA cloning and DNA sequencing analysis indicate that these two genes are closely related to each other (65% identical at the amino acid level) (Fig. 2A). In both genes, the C termini are longer than, and divergent from, those predicted from the published genomic sequences. The SH2 domains reside in the C-terminal regions, and the predicted secondary structures match those found in STAT (Fig. 2B). Moreover, a sequence of 90 amino acids immediately N-terminal to the SH2 domain is also well conserved and resembles STAT’s linker domain (Fig. 2B) (16, 18, 22). We therefore named the two novel genes STAT-type linker-SH2 domain factor a and b (STATLa and STATLb). The linker-SH2 domain is well conserved in putative STATL genes identified in both monocot and dicot plants including soybean, sorghum, potato, tomato, medicago, wheat, and rice (Fig. 2C). Surprisingly, STATL sequences were also retrieved from lower plants: the moss Physcomitrella patens (pp-STATL), from NCBI translated BLAST searches, and the green alga Chlamydomonas (cr-STATL), from the Chlamydomonas Resource Center (www.biology.duke.edu/chlamy_genome/crc.html) (Fig. 2C). The ubiquitous presence of STATL in plants suggests that the SH2 domain plays a fundamental role in plants and animals and rejects the possibility that this domain originated in plants through some accidental means (i.e. reverse horizontal gene transfer) or coevolved with tyrosine kinases or the SH3 domain (23).
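Identity figures such as the one quoted above depend on how the denominator is chosen, which the text does not specify. The sketch below assumes one common convention, identical residues divided by aligned columns in which neither sequence has a gap; the function name and the toy alignment are illustrative and do not reproduce the ClustalW calculation.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two pre-aligned sequences of equal length.

    Assumption: columns with a gap in either sequence are excluded from
    both the numerator and the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b)
             if x != gap and y != gap]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)


if __name__ == "__main__":
    # Toy alignment: 8 gap-free columns, 6 of them identical.
    print(percent_identity("GTFLVR-ES", "GSFLLRQES"))
```

Other conventions (for example dividing by the length of the shorter sequence) give different percentages for the same alignment, so values from different tools are not directly comparable.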



 
FIG. 2. Two novel genes in A. thaliana carry the linker-SH2 domain of STAT. A, two novel genes, at-statla and at-statlb, were cloned from an A. thaliana cDNA library by RT-PCR. Predicted protein sequences of at-STATLa and at-STATLb were aligned using the MacVector ClustalW (v1.4) program. The SH2 domains are shown in blue, and the βB motifs are shown in red. B, the C-terminal regions of at-STATLa and at-STATLb were analyzed with the secondary structure program and aligned with the linker-SH2 domains of human STAT3 and Dictyostelium STATc. The linker and the SH2 domains are indicated and are compared with the secondary structural characteristics of the linker-SH2 domain of STAT1 and STAT3 obtained from crystallography (16, 18). C, the C-terminal regions of putative STATL genes obtained from different plants, including gm-STATL (Glycine max, BE657344), pp-STATL (Physcomitrella patens subsp. patens, BJ172371), le-STATL (Lycopersicon esculentum, BG133386), os-STATL (Oryza sativa, AC087599), sb-STATL (Sorghum bicolor, BE356256), sp-STATL (Sorghum propinquum, BG158680), ta-STATL (Triticum aestivum, BJ255035), cr-STATL (Chlamydomonas reinhardtii, BI724230.1 and BE237845), mt-STATL (Medicago truncatula, BG583569), st-STATL (Solanum tuberosum, BE472025, and EST416878-TIGR Potato Gene Index), and zm-STATL (Zea mays, BH788331). D, regions from Ser139 to Pro232 of both at-STATLa and at-STATLb as well as the DNA-binding domains (DBD) of different STAT proteins of Dictyostelium were analyzed with the secondary structure program and aligned as indicated. The secondary structural features of the DBD of STAT1 and STAT3 based on the x-ray crystal structures (16, 18) are in purple. E, total RNA (20 µg) extracted from stem, leaf, flower, and root of Arabidopsis was analyzed by Northern blot; mRNAs of at-statla (1.0 kb) and at-statlb (1.4 kb) are indicated. The bottom panel shows 28S rRNA. F, whole extracts were prepared from Arabidopsis treated with Na3VO4 plus H2O2 for different times. The extracts were separated by SDS-PAGE and transferred to polyvinylidene difluoride membrane, followed by Western blotting analysis with anti-pTyr antibody (pY20). G, full-length GST-STATLa, the GST-STATLb-SH2 domain, and the GST control were incubated with protein lysates of Arabidopsis treated with or without Na3VO4/H2O2 for 3 h. The precipitates were washed extensively and subjected to Western blotting analysis with pY20 (left panel). The right panel shows the input of GST proteins used for the affinity precipitation.

 
Amino acid sequence alignment indicates that the linker-SH2 domain identity is 29% between at-STATLa and hs-STAT3 and 33% between at-STATLb and hs-STAT3. About the same range of amino acid identity was obtained when the same domain of dd-STATa (29%) or dd-STATc (30%) was aligned with that of hs-STAT3. In the SH2 domains of all STAT proteins, regardless of the species, the βB motif is exclusively followed by a phenylalanine. All STATL members follow this rule without exception (Fig. 2C), whereas in the SH2 domains of signaling factors the βB motif has never been found to be followed immediately by phenylalanine. Another protein sequence feature is that, like most STAT members, the first residue of the αA motif in STATL is also a lysine rather than an arginine, which coordinates with a phosphate group in metazoan STAT (16).
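The phenylalanine rule described above is easy to express as a check on the residue that immediately follows the first βB-like match. This is a hedged sketch: the hydrophobic residue set is an assumed choice, the function names are hypothetical, and the sequences in the example are toy fragments rather than database entries.

```python
import re

# Assumption: residues counted as hydrophobic (B in GXF/YBBR); not defined in the text.
HYDROPHOBIC = "AVILMFWC"

# betaB consensus followed by one captured residue.
BETA_B_NEXT = re.compile(rf"G[A-Z][FY][{HYDROPHOBIC}]{{2}}R([A-Z])")


def residue_after_beta_b(seq):
    """Return the residue right after the first betaB-like match, or None."""
    match = BETA_B_NEXT.search(seq.upper())
    return match.group(1) if match else None


def follows_stat_rule(seq):
    """True if the first betaB-like motif is immediately followed by Phe."""
    return residue_after_beta_b(seq) == "F"


if __name__ == "__main__":
    print(follows_stat_rule("AAGTFLLRFSD"))   # toy STAT-like fragment
    print(follows_stat_rule("PRGTFLVRESE"))   # toy Src-like fragment
```

Passing this check is a necessary feature of the STAT-type domains described here, not a sufficient one; the secondary structure between βD and αB still has to be examined.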

The secondary structure of STAT’s linker-SH2 domain (the five α-helices and the "αβββα" sandwich) (16, 18) is well conserved in at-STATL proteins according to our prediction (Fig. 2B). When aligned with STAT, sequence gaps in at-STATLa and at-STATLb occur at neutral positions and do not interrupt the arrangement of α-helices and β-strands. For instance, a 9-aa stretch (ENMAGKGFS), absent in the linker domain of at-STATLa or at-STATLb, forms an outer loop between α9 and α10 in the linker domain of STAT, which does not interrupt the helicity between α9 and α10 (Fig. 2B) (16). For all STATL sequences obtained from different plants, the SH2 domains do not carry the βE motif. Instead, they exclusively carry the αB′ motif or a nonsplit αB′αB (Fig. 2C), which matches the secondary structural characteristics of the STAT SH2 domain. The lack of a βD motif coupled with the presence of a large nonsplit αB in the SH2 domain of cr-STATL in Chlamydomonas may indicate a premature form of the SH2 domain that arose during its development. The region upstream of the linker-SH2 domain (Ser139-Pro232 in both at-STATLa and at-STATLb) is predicted to form a continuous β-sheet (Fig. 2D), which has some similarity to the DNA-binding domain of dd-STATa or dd-STATc but not that of human STAT (16, 18, 24). Therefore, the strong similarity between STAT and STATL within their linker-SH2 domains at both the amino acid sequence and secondary structure levels strongly indicates that these two domains share a common ancestor that evolved prior to the divergence of plants and animals.

Messenger RNAs of both genes were detected ubiquitously in different parts of Arabidopsis (Fig. 2E). Plants are not known to contain JAK-like nonreceptor tyrosine kinases (25). However, receptor protein kinases, serine/threonine plus tyrosine dual-function protein kinases, as well as protein phosphatases exist in plants (13, 26, 27). In Fig. 2F, tyrosine-phosphorylated proteins ranging in size from 60 to 120 kDa were detected in Arabidopsis treated with vanadate, a naturally occurring transition metal that can function as a nonspecific protein phosphatase inhibitor and trigger protein tyrosine phosphorylation in cells (28). Purified GST-STATLa-full length and GST-STATLb-SH2 domain proteins but not the GST control (Fig. 2G, right panel) were able to pull down the 120-kDa tyrosine phosphorylated protein in vanadate-treated samples but not in the samples without vanadate treatment (Fig. 2G, left panel). Therefore, the SH2 domain of STATL proteins might be involved in pTyr-dependent protein-protein interaction in plants.

SHK’s Linker-SH2 Domain Is Homologous to That of STAT in Dictyostelium—
How does the STAT-type linker-SH2 domain relate phylogenetically to the tyrosine signaling- or Src-type SH2 domain, which quickly expanded in number in animal cells? To answer this question, we studied the database of Dictyostelium, a slime mold considered more closely related to fungi and animals than to plants (29). SH2-bearing genes cloned from Dictyostelium include two transcription factors (i.e. STATa and STATc) and one signaling factor (i.e. SHK1) (4–6). From the Dictyostelium discoideum Genome Project (www.sanger.ac.uk/Projects/D_discoideum), two additional putative STAT sequences, designated as dd-STATb and dd-STATd, and four additional putative SHK genes, designated as dd-SHK2, dd-SHK3, dd-SHK4, and dd-SHK5, were identified using the βB core motif sequence as well as the whole SH2 domain for BLAST searches. Like SHK1, the protein kinases of these novel putative SHK members are most closely related to the protein kinases found in plants (6). However, these same kinases in plants are not conjugated to any SH2 or SH2-like sequences. Using the kinase domain (C-region) of the putative SHK2 for a BLAST search, a large number of homologous expressed sequence tags (ESTs) were identified in the genomes of both Arabidopsis and Dictyostelium, but not in the databases of other eukaryotes (not shown). This suggests a close evolutionary relationship between plants and Dictyostelium.

Primary and predicted secondary structure alignment indicates that the SHK SH2 domains carry some features of the STAT SH2 domains in Dictyostelium. Phenylalanine (F) has been found to follow the βB motif in the SH2 domains of SHK1, SHK2, and SHK3 (Fig. 3A). However, secondary structural modifications that contrast with STAT have been noted in SHK’s SH2 domain. In the region between the predicted βD and αB motifs, the αB′ motif has been replaced by the βF or βE-βF motif. The introduction of βE in SHK5 lengthened the distance between the αA and αB motifs (Fig. 3A). This seems to reflect the trend of SH2 maturation, because the distance between the αA and αB motifs has become even longer in the Src SH2 domain in metazoans (Fig. 1A). In this region, both sequence similarities and gaps were observed, suggesting that the accumulation of favorable mutations, insertions, and deletions might all contribute to the α-helix/β-sheet switch (Fig. 3A). Thus, this αB′/βE-βF-containing region in the SH2 domain serves as an evolutionarily active region (EAR) within an otherwise conserved domain essential for its function. When STATc’s linker domain was used for a BLAST search, the sequence between the protein kinase domain and the SH2 domain (the linker) of SHK was recovered, suggesting a close relationship among these molecules within this region. SHK’s linker domain is predicted to contain an α-helix repeat composed of α7 to α10 motifs, which is indeed homologous to that of STAT (Fig. 3B). The C-terminal three α-helices (α9, α10, and α11) formed a large α-helix immediately upstream of the SH2 domain. While the linker domains of most SHK members are relatively similar in size, SHK5’s linker domain is apparently much longer because of homopolymerism, a characteristic of many genes found in Dictyostelium (4, 7). Compared with STAT or STATL, the predicted helical character of the linker domain is degenerating in SHK (Fig. 3B), perhaps because of a functional regression of this domain in tyrosine signaling.



 
FIG. 3. The SH2 domain coevolves with the linker domain but not with the SH3 domain. A, the SH2 domains of SHK and STAT members from D. discoideum were analyzed with the secondary structure program and aligned. SHK family members include dd-SHK1, putative dd-SHK2 (Contig13319, Sanger Center), putative dd-SHK3 (JAX4a118b12.r1), putative dd-SHK4 (JC2d33h04.s1), and putative dd-SHK5 (Contig12179, Sanger Center). STAT family members include dd-STATa, dd-STATc, putative dd-STATb (Contig17339, Sanger Center), and putative dd-STATd (JC1b225h10.r1). In dd-STATd, an S/T-rich sequence (KDSLSKSSNDKLLQSPTTTTTTTTS) between βC and βD was omitted. B, the linker domains of dd-SHK and dd-STAT members were analyzed with the secondary structure program and aligned. C, the SH3 domains of human Src (hs-Src, P12931), yeast NAP1-binding protein (sc-NBP, YDR162C), a putative gene of Dictyostelium (dd-SH3a, C94356), and three putative genes of A. thaliana (at-SH3a, AAG5264; at-SH3b, AAL32440; and at-SH3c, AAL32439) were analyzed with the secondary structure program and compared with the structural features of the SH3 domain obtained from crystallography (15).

 
Although the linker domain in STAT may play a role in transcription (22), it was either lost or replaced by the SH3 domain in SH2-bearing signaling proteins of animal cells. For a long time, the linker domain of metazoan STAT was mistaken for an SH3 domain based upon amino acid sequence alignment (2). The SH3 domain consists of five β-strands arranged as two tightly packed anti-parallel β-sheets (14, 15). Our secondary structural analysis revealed the typical SH3 domain structure in the C-terminal regions of three putative Arabidopsis genes among 14 putative SH3-carrying genes listed under the plant and prokaryote categories of the SMART protein database (Fig. 3C). The SH3 domains of these three putative genes showed either an α-helix or an α/β-hybridized motif in the region of the βe motif (Fig. 3C), suggesting that this region is a putative EAR. Although a weak sequence homology exists between the STAT linker domain and the SH3 domain (2), the presence of well-developed SH3 domains as well as the fully developed linker-SH2 domain in independent genes in plants indicates that the SH3 domain is unlikely to have evolved from the linker domain. Although SH3 domain-carrying genes exist in plants and multiplied quickly in lower eukaryotes such as Dictyostelium and yeast, SH3-SH2 domain conjugation has not been discovered in these organisms.

SPT6 Gene Carries a Putative Immature Linker-SH2-like Domain—
We next studied the yeast genome, in which no typical SH2 domains were identified (30). Using the same approach, we identified 89 βB sequences from 86 putative genes in the Saccharomyces genome database (genome-www.stanford.edu/Saccharomyces). The suppressor of Ty 6 gene (SPT6) attracted our attention after secondary structural analysis of all these sequences. SPT6 was previously reported in yeast and animal cells and is involved in transcriptional initiation and DNA/RNA binding (31, 32). Using the yeast SPT6 protein sequence for a BLAST search, putative SPT6 genes were identified in plants, Dictyostelium, and other eukaryotes (NCBI BLAST searches), suggesting that SPT6 is also an ancient gene that existed prior to the divergence of plants and animals. The conserved third residue should be either Phe or Tyr in the standard βB sequence GXF/YBBR (Fig. 4); however, sc-SPT6 of yeast is the only one to follow this rule (Fig. 4). Nevertheless, the putative αβββα-like structure, albeit less typical, is well maintained in all SPT6 proteins according to our secondary structural prediction, supporting the suspicion of a degenerate SH2 domain in SPT6 genes (33, 34). The αA, βB, and βC motifs that compose the evolutionarily inactive region in the SH2 domains are conserved in SPT6s regardless of the origin. The putative αB′αB helical structure in SPT6 does not split as it does in STAT’s SH2 domain. In the suspected EARs of most SPT6s, a short β-strand and an α-helix are predicted, and the βD motif was not fully extended according to the prediction analysis. Such a poorly developed putative βD motif may not efficiently form an anti-parallel β-sheet with βB and βC and may eventually hamper its function in protein-protein interactions. The large Lys/Glu-rich helical structure in the linker domain of SPT6 resembles the ancient STATL-like α9-α10-α11 continuous α-helix that became three discrete helices in the linker domain of human STAT or was discontinued in SHK (Figs. 2A and 3B). Moreover, in the region upstream of the linker domain, SPT6 contains a putative 6-β-strand repeat resembling the DNA-binding domain in both STAT and at-STATL (not shown). Interestingly, our secondary structure prediction analysis indicates that SPT6 and cr-STATL of the unicellular plant Chlamydomonas share some critical features at the secondary structural level. Both are predicted to bear a large nonsplit αB′αB motif and a poorly developed βD motif (Figs. 2B and 4). Therefore, it is possible that SPT6 and STAT are evolutionarily related.



FIG. 4. SPT6 carries a putative atypical linker-SH2 domain within the C terminus. The C-terminal regions of SPT6 proteins were analyzed with a secondary structure prediction program and aligned with the SH2 domains of hs-STAT1 and at-STATLb. The SPT6 or putative SPT6 genes included in this study are: sc-SPT6 (S. cerevisiae, ACP23615), sp-SPT6 (Schizosaccharomyces pombe, ACQ09915), at-SPT6a (A. thaliana, AC010795), at-SPT6b (A. thaliana, AC022355), dd-SPT6 (D. discoideum, Contig17121, Sanger Center), hs-SPT (Homo sapiens, HSU46691), dm-SPT6 (Drosophila melanogaster, AF104400), ce-SPT6 (C. elegans, D14635), xl-SPT6 (Xenopus laevis, BJ098459), lm-SPT6 (Leishmania major, AL499622), tb-SPT6 (Trypanosoma brucei, AC008368), dr-SPT6 (Danio rerio, AF421378), pc-SPT6 (Pneumocystis carinii f. sp. carinii, AW334882), lj-SPT6 (Lotus japonicus, BG662187), gx-SPT6 (Glycine max, BI424708), mt-SPT6 (Medicago truncatula, BG645814), and sb-SPT6 (Sorghum bicolor, BI139864).

 
In terms of amino acid sequence, the SH2 domains of SPT6, STAT, and JAK are among the most divergent. Unlike JAK family members, both SPT6 and STAT are transcription factors, and both show distinctive secondary structural characteristics, especially in the EAR sequence, as seen above. Based on the phylogenetic alignment, SH2 domains can be grouped into two categories, i.e. STAT-type and Src-type (Fig. 5). SHK family members fall between the STAT-type and the Src-type but are closer to the STAT-type (Fig. 5). This strongly indicates a close relationship between the SH2 domains of the SHK and STAT families and further supports the notion that the linker-SH2 domain of SHK evolved from STAT or STATL. In SHK, STAT, and SPT6, the linker-SH2 domains all reside exclusively in the C-terminal regions. The appearance in Dictyostelium of the SHK gene, which bears both the linker-SH2 domain and the kinase domain, opened a new era leading toward pTyr signal transduction.



FIG. 5. Different types of SH2 domains are revealed by phylogenetic analysis. The amino acid sequences of the SH2 domains of different genes were aligned with the ClustalW (v1.4) program and grouped into two categories: Src-type and STAT-type. The alignment parameters were gap distance = 8; similarity matrix = blosum; open gap penalty = 10.0; and extend gap penalty = 0.1.
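To give a sense of what the open and extend gap penalties listed in the legend imply, the sketch below computes the cost of a single gap under a plain affine model (open penalty plus extend penalty times gap length). This is an illustrative simplification: ClustalW further adjusts gap penalties according to position and sequence context, which is not modeled here, and the function name `affine_gap_cost` is hypothetical.

```python
def affine_gap_cost(length: int, gap_open: float = 10.0, gap_extend: float = 0.1) -> float:
    """Cost of one gap of `length` residues under a plain affine model.

    ASSUMPTION: cost = gap_open + gap_extend * length. The defaults are the
    open and extend penalties quoted in the Fig. 5 legend.
    """
    if length < 0:
        raise ValueError("gap length must be non-negative")
    if length == 0:
        return 0.0
    return gap_open + gap_extend * length


if __name__ == "__main__":
    # One long gap versus many short gaps covering the same number of residues.
    print(affine_gap_cost(10))
    print(10 * affine_gap_cost(1))
```

Under this model a single 10-residue gap costs 11.0, whereas ten separate single-residue gaps cost 101.0, so the chosen parameters strongly favor a few long gaps over many short ones.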

 
The combination of amino acid sequence screening with secondary structural analysis improves the accuracy of protein motif prediction. The ßB core motif-like sequence and its surrounding sequences in the Rht-B1/D1 gene of wheat have been considered the first SH2 domain discovered in plants (35). Moreover, GAI, RGA, and SCR, all members of a putative transcription factor family termed GRAS, were suspected to be STAT-like factors in plants that carry SH2 domains because of the ßB-like core sequences found in their C-terminal regions (11, 35). However, according to our secondary structural prediction, a "ß{alpha}{alpha}" rather than an "{alpha}ßßß{alpha}" topology is present in all of those SH2-like sequences. Therefore, a perfect ßB sequence does not necessarily reveal an SH2 domain structure. In contrast, as long as the "{alpha}ßßß{alpha}" secondary structure is maintained, the amino acid sequence can be very divergent. SH2 domain development thus reflects divergence at the primary structural level but conservation at the secondary structural level.
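The topology comparison described above can be sketched as follows. The example collapses a per-residue secondary structure prediction into a string of elements and tests whether it contains an {alpha}ßßß{alpha}-like core. It is a minimal illustration under stated assumptions, not the prediction pipeline used here: the input is assumed to be a three-state string (H = helix, E = strand, C = coil), runs shorter than `min_len` residues are ignored, more than three consecutive strands are accepted to accommodate extra strands, and all names and test strings are hypothetical.

```python
import re
from itertools import groupby


def topology(ss: str, min_len: int = 3) -> str:
    """Collapse a per-residue prediction into elements: 'a' = helix, 'b' = strand.

    ASSUMPTION: `ss` uses H (helix), E (strand), and C (coil); runs shorter
    than `min_len` residues are discarded as unreliable.
    """
    symbols = {"H": "a", "E": "b"}
    elements = []
    for state, run in groupby(ss.upper()):
        if state in symbols and len(list(run)) >= min_len:
            elements.append(symbols[state])
    return "".join(elements)


def has_sh2_like_core(ss: str) -> bool:
    """True if a helix, at least three strands, and a helix occur in that order."""
    return re.search(r"ab{3,}a", topology(ss)) is not None


if __name__ == "__main__":
    core_like = "CCHHHHHHCCEEEECCEEEECCEEEECCHHHHHHCC"   # hypothetical prediction
    gras_like = "CCEEEECCHHHHHCCHHHHHCC"                 # hypothetical prediction
    print(topology(core_like), has_sh2_like_core(core_like))
    print(topology(gras_like), has_sh2_like_core(gras_like))
```

With these hypothetical inputs, the first string collapses to an "abbba" topology and passes the check, whereas the second collapses to "baa" and fails, mirroring the distinction drawn in the text between SH2 domains and sequences that merely carry a ßB-like core.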

The discovery of STAT linker-SH2 domain-bearing genes in plants underscores the proposal that SH2 domain development was an early step in the evolution of multicellularity. The SH2 domain most likely formed in a common eukaryotic ancestor prior to the divergence of any of the major eukaryotic taxa. Hence, the linker-SH2 domain of the transcription factor STAT occupied center stage in transcriptional activation before the SH2 domain was developed for pTyr signal transduction (36, 37).


    ACKNOWLEDGMENTS
 
We are grateful to J. M. Wang for critical suggestions on performing the secondary structural analysis and to D. Chatterjee for careful reading of the manuscript.


    FOOTNOTES
 
Received, December 7, 2003, and in revised form, March 28, 2004.

Published, MCP Papers in Press, April 7, 2004, DOI 10.1074/mcp.M300131-MCP200

1 The abbreviations used are: SH2, Src homology 2; SH3, Src homology 3; SHK, SH2 domain-bearing protein kinase; SPT6, suppressor of Ty 6; STAT, signal transducer and activator of transcription; STATL, STAT-like; pTyr, phosphotyrosine; GST, glutathione S-transferase; EAR, evolutionarily active region.

* This work was supported in part by National Institutes of Health Grant RO1 CA82549 (to E. Y. C.) and by Grant RR-15578 from the Center of Biomedical Research Excellence (COBRE) Program of the National Center for Research Resources to Brown University. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

§ Current address: Department of Gynecology, Yale University School of Medicine, New Haven, CT 06510.

Current address: Department of Plant Biology, Cornell University, Ithaca, NY 14850.

¶¶ To whom correspondence should be addressed: Departments of Surgery Science and Pathology & Laboratory Medicine, Brown University School of Medicine, Providence, RI 02903. Tel.: 401-444-0172; Fax: 401-444-3278; E-mail: y_eugene_chin{at}brown.edu


    REFERENCES

  1. Pawson, T., Gish, G. D., and Nash, P. (2001) SH2 domains, interaction modules and cellular wiring. Trends Cell Biol. 11, 504–511

  2. Fu, X. Y. (1992) A transcription factor with SH2 and SH3 domains is directly activated by an interferon {alpha}-induced cytoplasmic protein tyrosine kinase(s). Cell 70, 323–335

  3. Darnell, J. E., Jr. (1997) STATs and gene regulation. Science 277, 1630–1635

  4. Kawata, T., Shevchenko, A., Fukuzawa, M., Jermyn, K. A., Totty, N. F., Zhukovskaya, N. V., Sterling, A. E., Mann, M., and Williams, J. G. (1997) SH2 signaling in a lower eukaryote: A STAT protein that regulates stalk cell differentiation in Dictyostelium. Cell 89, 909–916

  5. Fukuzawa, M., Araki, T., Adrian, I., and Williams, J. G. (2001) Tyrosine phosphorylation-independent nuclear translocation of a Dictyostelium STAT in response to DIF signaling. Mol. Cell 7, 779–788

  6. Moniakis, J., Funamoto, S., Fukuzawa, M., Meisenhelder, J., Araki, T., Abe, T., Meili, R., Hunter, T., Williams, J., and Firtel, R. A. (2001) An SH2-domain-containing kinase negatively regulates the phosphatidylinositol-3 kinase pathway. Genes Dev. 15, 687–698

  7. Kimmel, A. R., and Firtel, R. A. (1985) Sequence organization and developmental expression of an interspersed, repetitive element and associated single-copy DNA sequences in Dictyostelium discoideum. Mol. Cell Biol. 5, 2123–2130

  8. Kay, R. R. (1997) Development at the edge of multi-cellularity: Dictyostelium discoideum. Curr. Biol. 7, R723–725

  9. Pawson, T., and Gish, G. D. (1992) SH2 and SH3 domains: From structure to function. Cell 71, 359–362

  10. Waksman, G., Kominos, D., Robertson, S. C., Pant, N., Baltimore, D., Birge, R. B., Cowburn, D., Hanafusa, H., Mayer, B. J., Overduin, M., et al. (1992) Crystal structure of the phosphotyrosine recognition domain SH2 of v-src complexed with tyrosine-phosphorylated peptides. Nature 358, 646–653

  11. Richards, D. E., Peng, J., and Harberd, N. P. (2000) Plant GRAS and metazoan STATs: one family? BioEssays 22, 575–577

  12. Wang, Y., Wu, T. R., Cai, S., Welte, T., and Chin, Y. E. (2000) Stat1 as a component of tumor necrosis factor {alpha} receptor 1-TRADD signaling complex to inhibit NF-{kappa}B activation. Mol. Cell Biol. 20, 4505–4512

  13. Hof, P., Pluskey, S., Dhe-Paganon, S., Eck, M. J., and Shoelson, S. E. (1998) Crystal structure of the tyrosine phosphatase SHP-2. Cell 92, 441–450

  14. Sicheri, F., Moarefi, I., and Kuriyan, J. (1997) Crystal structure of the Src family tyrosine kinase Hck. Nature 385, 602–609

  15. Xu, W., Harrison, S. C., and Eck, M. J. (1997) Three-dimensional structure of the tyrosine kinase c-Src. Nature 385, 595–602

  16. Chen, X., Vinkemeier, U., Zhao, Y., Jeruzalmi, D., Darnell, J. E., Jr., and Kuriyan, J. (1998) Crystal structure of a tyrosine phosphorylated STAT-1 dimer bound to DNA. Cell 93, 827–839

  17. Charifson, P. S., Shewchuk, L. M., Rocque, W., Hummel, C. W., Jordan, S. R., Mohr, C., Pacofsky, G. J., Peel, M. R., Rodriguez, M., Sternbach, D. D., and Consler, T. G. (1997) Peptide ligands of pp60(c-src) SH2 domains: A thermodynamic and structural study. Biochemistry 36, 6283–6293

  18. Becker, S., Groner, B., and Müller, C. W. (1998) Three-dimensional structure of the Stat3ß homodimer bound to DNA. Nature 394, 145–151

  19. Liu, X. D., Quinn, M., Chin, Y. E., and Fu, X. Y. (1999) STAT genes in C. elegans. Science 285, 167–168

  20. Meng, W., Sawasdikosol, S., Burakoff, S. J., and Eck, M. J. (1999) Structure of the amino-terminal domain of Cbl complexed to its binding site on ZAP-70 kinase. Nature 398, 84–90

  21. Zhou, Y. J., Chen, M., Cusack, N. A., Kimmel, L. H., Magnuson, K. S., Boyd, J. G., Lin, W., Roberts, J. L., Lengi, A., Buckley, R. H., Geahlen, R. L., Candotti, F., Gadina, M., Changelian, P. S., and O’Shea, J. J. (2001) Unexpected effects of FERM domain mutations on catalytic activity of Jak3: Structural implication for Janus kinases. Mol. Cell 8, 959–969

  22. Yang, E., Wen, Z., Haspel, R. L., Zhang, J. J., and Darnell, J. E., Jr. (1999) The linker domain of Stat1 is required for {gamma} interferon-driven transcription. Mol. Cell Biol. 19, 5106–5112

  23. Nars, M., and Vihinen, M. (2001) Coevolution of the domains of cytoplasmic tyrosine kinases. Mol. Biol. Evol. 18, 312–321

  24. Horvath, C. M., Wen, Z., and Darnell, J. E., Jr. (1995) A STAT protein domain that determines DNA sequence recognition suggests a novel DNA-binding domain. Genes Dev. 9, 984–994

  25. The Arabidopsis Genome Initiative. (2000) Nature 408, 796–815

  26. Ali, N., Halfter, U., and Chua, N. H. (1994) Cloning and biochemical characterization of a plant protein kinase that phosphorylates serine, threonine, and tyrosine. J. Biol. Chem. 269, 31626–31629

  27. Stone, J. M., Collinge, M. A., Smith, R. D., Horn, M. A., and Walker, J. C. (1994) Interaction of a protein phosphatase with an Arabidopsis serine-threonine receptor kinase. Science 266, 793–795

  28. Sweitzer, S. M., Calvo, S., Kraus, M. H., Finbloom, D. S., and Larner, A. C. (1995) Characterization of a Stat-like DNA binding activity in Drosophila melanogaster. J. Biol. Chem. 270, 16510–16513

  29. Baldauf, S., and Doolittle, W. F. (1997) Origin and evolution of the slime molds (Mycetozoa). Proc. Natl. Acad. Sci. U. S. A. 94, 12007–12012

  30. Hunter, T., and Plowman, G. D. (1997) The protein kinases of budding yeast: Six score and more. Trends Biochem. Sci. 22, 18–22

  31. Clark-Adams, C. D., and Winston, F. (1987) The SPT6 gene is essential for growth and is required for {delta}-mediated transcription in Saccharomyces cerevisiae. Mol. Cell Biol. 7, 679–686

  32. Swanson, M. S., Carlson, M., and Winston, F. (1990) SPT6, an essential gene that affects transcription in Saccharomyces cerevisiae, encodes a nuclear protein with an extremely acidic amino terminus. Mol. Cell Biol. 10, 4935–4941

  33. Maclennan, A. J., and Shaw, G. (1993) A yeast SH2 domain. Trends Biochem. Sci. 18, 464–465

  34. Chiang, P. W., Wang, S. Q., Smithivas, P., Song, W. J., Crombez, E., Akhtar, A., Im, R., Greenfield, J., Ramamoorthy, S., Van Keuren, M., Blackburn, C. C., Tsai, C. H., and Kurnit, D. M. (1996) Isolation and characterization of the human and mouse homologues (SUPT4H and Supt4h) of the yeast SPT4 gene. Genomics 34, 328–333

  35. Peng, J., Richards, D. E., Hartley, N. M., Murphy, G. P., Devos, K. M., Flintham, J. E., Beales, J., Fish, L. J., Worland, A. J., Pelica, F., Sudhakar, D., Christou, P., Snape, J. W., Gale, M. D., and Harberd, N. P. (1999) "Green revolution" genes encode mutant gibberellin response modulators. Nature 400, 256–261

  36. Darnell, J. E., Jr. (1997) Phosphotyrosine signaling and the single cell: Metazoan boundary. Proc. Natl. Acad. Sci. U. S. A. 94, 11767–11769

  37. Kuriyan, J., and Darnell, J. E., Jr. (1999) An SH2 domain in disguise. Nature 398, 22–23