Department of Chemical Engineering, The Pennsylvania State University, 112 Fenske Laboratory, University Park, PA 16802, USA
1 To whom correspondence should be addressed. e-mail: costas{at}psu.edu
Abstract
Keywords: bioinformatics/directed evolution/protein engineering/residue-residue clash
Introduction
A number of hypotheses have been proposed (Bogarad and Deem, 1999; Voigt et al., 2001) to explain why functional crossovers are not randomly distributed along the sequence but rather form distinct patterns. One of the most recent methods, the SCHEMA algorithm (Voigt et al., 2002), postulates that crossover patterns resulting in hybrids with a large number of contacting residue pairs originating from the same parental sequences are more likely to retain their functionality. The key idea here is that each contact is a representation of favorable interaction between the two residues. Thus, by retaining these contacting residues in the hybrids, one retains the favorable interactions that exist in the parental sequences. This interesting approach has led to a number of successful predictions (Hiraga and Arnold, 2003; Meyer et al., 2003). One potential shortcoming, however, is that it cannot differentiate between hybrids with different directionality (i.e. an AB versus a BA crossover), which often have substantially different functionalities (Lutz et al., 2001; Moore and Maranas, 2003).
Here, we rethink the effect of having contacting residue pairs with different parental origins. Instead of always counting them as unfavorable, we view such pairs as places where clashes may or may not occur between the contacting residues. This view allows us to re-establish context in the interaction between the residue pair and thus capture the effect of crossover directionality (e.g. an AB versus a BA crossover) on function. Specifically, motivated by the results of Moore and Maranas (2003), we explore three out of the many different mechanisms that may render a contacting residue pair detrimental to the ability of the hybrid to fold properly (i.e. stability) and thus retain its functionality: (i) introduction of repulsive residue pairs such as +/+ or -/-, (ii) disruption of hydrogen bonds due to the formation of donor/donor or acceptor/acceptor pairs and (iii) generation of steric clashes or cavities. It is quite straightforward to show that upon recombination residue clashes such as the repulsive residue pairs, disrupted hydrogen bonds and steric clashes can be introduced due to reversed orientation of charged, acceptor/donor or bulky residue pairs (Figure 1). Other forms of clashes, not considered here, include the disruption of important protein-specific interactions (Oldfield, 2002) such as metal binding motifs (Glusker, 1991), the catalytic triad (Fischer et al., 1994; Wallace et al., 1997) and a number of ligand binding sites (Chakrabarti, 1993; Copley and Barton, 1994).
Methods
Repulsive residue pairs
Residue pairs found in the contact map of the hybrids are screened for +/+ or -/- charge contacts that may be brought about by recombination (Figure 1a). A contacting pair that already has a repulsive residue pair (+/+ or -/-) at these positions in either of the parental sequences is not counted, since such a pair evidently does not disrupt functionality. Note that the crossover directionality is automatically accounted for, since charge repulsion may be generated between residue pairs in one hybrid but not necessarily in the hybrid that has the reverse directionality (Figure 1a). For example, parental contacting residue pairs with a single charged residue (n/+ and +/n) may form upon recombination either a neutral pair (n/n) or a repulsive residue pair (+/+), depending on the directionality of the crossover. Lysine and arginine are considered to be positively charged, and glutamate and aspartate negatively charged.
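The charge screen described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names, the one-letter residue codes and the "P1/P2" labels (position i taken from parent 1, position j from parent 2, with i before j in the sequence) are assumptions introduced here.

```python
# Charge assignments follow the text: Lys/Arg positive, Asp/Glu negative.
POSITIVE = {"K", "R"}
NEGATIVE = {"D", "E"}


def charge(residue):
    """Return +1, -1 or 0 for a one-letter residue code."""
    if residue in POSITIVE:
        return 1
    if residue in NEGATIVE:
        return -1
    return 0


def is_repulsive(res_i, res_j):
    """True for a +/+ or -/- contacting pair."""
    ci, cj = charge(res_i), charge(res_j)
    return ci != 0 and ci == cj


def charge_clashes(parent1_pair, parent2_pair):
    """Screen one contacting position pair (i, j) for both crossover directions.

    Each argument is (residue at i, residue at j) in one parent.
    Hypothetical labels: "P1/P2" takes i from parent 1 and j from parent 2.
    """
    p1_i, p1_j = parent1_pair
    p2_i, p2_j = parent2_pair
    # Pairs already repulsive in either parent are not counted.
    if is_repulsive(p1_i, p1_j) or is_repulsive(p2_i, p2_j):
        return {"P1/P2": False, "P2/P1": False}
    return {
        "P1/P2": is_repulsive(p1_i, p2_j),
        "P2/P1": is_repulsive(p2_i, p1_j),
    }
```

For the n/+ and +/n example in the text, `charge_clashes(("A", "K"), ("R", "S"))` reports a clash only for the "P2/P1" direction, where arginine and lysine are brought together; the reverse direction gives a neutral pair.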
Steric hindrance or cavity formation in the hybrids
A significant reduction in the total volume of a contacting residue pair is likely to give rise to cavity formation, whereas a corresponding increase may cause steric hindrance. Figure 1b illustrates the effect of such volume changes as a consequence of the reversed orientation of large (residues A, D) and small (residues B, C) side chains in the parental sequences. Cavity formation or steric hindrance is detected by observing whether the combined volume of the contacting residue pair in the resultant hybrid is much lower or higher than the mean combined volume (M) of the same contacting residue pairs in the parental sequences (A+B, C+D):
$$M = \frac{(V_A + V_B) + (V_C + V_D)}{2}$$
Here $V_k$ is the side chain volume of residue k (k = A, B, C, D) in Å³. Specifically, the scores $S_{AD}$ and $S_{CB}$ (for hybrids 1 and 2 shown in Figure 1b) are defined separately for hybrids with different crossover directionality as a measure of the deviation of the hybrid combined volumes ($V_A + V_D$ and $V_C + V_B$, respectively) from M.
A parameter $\delta$ [$\delta = |(V_A + V_B) - (V_C + V_D)|$], which quantifies the extent of difference between the combined volumes of the two parental contacting residue pairs, is introduced into these scores to account for the tolerance of such volume changes. If the contacting residue pairs in both parental sequences are of similar size, they could lead to a small (even zero) value of $\delta$, thus resulting in artificially inflated scores, particularly in cases where the large and small residues have reversed orientation. Therefore, a lower bound is set on $\delta$ equal to 10% of the mean (M):
$$\delta \ge 0.1\,M$$
In general, the core of most proteins has a higher packing fraction as compared with the surface (Munson et al., 1996). This suggests that steric clashes are less likely to be tolerated in the protein core (Dupraz et al., 1990) as they often lead to packing defects (Song et al., 1999; Ratnaparkhi and Varadarajan, 2000). To account for the difference in the tolerance level for steric clashes at the protein surface and in the core, we set different cut-off scores Sc for contacting pairs. Cavity formation and steric hindrance in the core of the protein (i.e. accessible surface area of side chain <8 Å²) are considered to be significant if they score above a cut-off value, Sc = 15 Å³, whereas only steric hindrance is considered with a cut-off value of 30 Å³ at the surface. The accessible surface area of a side chain is obtained by rolling a water probe of radius 1.4 Å over the exposed surface. These calculations are performed using the WHATIF software package (Vriend, 1990).
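The volume test can be sketched in Python as follows. The mean M, the tolerance parameter (called `delta` in the code) with its 10% lower bound, and the core and surface cut-offs follow the text. The functional form of the score, squared deviation from M divided by `delta`, is an assumption made here because it yields units of Å³ consistent with the stated cut-offs; it should not be read as the published formula. The `buried` flag stands in for the accessible-surface-area test (<8 Å²), and all function names are hypothetical.

```python
def steric_scores(v_a, v_b, v_c, v_d):
    """Return (M, delta, S_AD, S_CB) for parental pairs A+B and C+D.

    Volumes are side chain volumes in cubic angstroms.
    """
    m = ((v_a + v_b) + (v_c + v_d)) / 2.0
    # Tolerance parameter with the 10%-of-mean lower bound from the text.
    delta = max(abs((v_a + v_b) - (v_c + v_d)), 0.10 * m)
    # ASSUMED score form: squared deviation from M scaled by delta.
    s_ad = ((v_a + v_d) - m) ** 2 / delta
    s_cb = ((v_c + v_b) - m) ** 2 / delta
    return m, delta, s_ad, s_cb


def classify(hybrid_volume, m, score, buried):
    """Label a hybrid pair as 'steric hindrance', 'cavity' or None.

    Core pairs (buried=True) use a cut-off of 15; surface pairs use 30
    and only steric hindrance is considered there.
    """
    cutoff = 15.0 if buried else 30.0
    if score <= cutoff:
        return None
    if hybrid_volume > m:
        return "steric hindrance"
    return "cavity" if buried else None
```

With illustrative volumes (not measured values) of 120, 30, 40 and 110 Å³ for A, B, C and D, both parental pairs sum to 150 Å³, so `delta` falls back to its lower bound of 15 Å³. The A+D hybrid pair (230 Å³) is then flagged as steric hindrance and the C+B pair (70 Å³) as a cavity when buried, while the C+B pair is ignored at the surface.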
Hydrogen bond disruption
Protein family members share many common hydrogen bonds, particularly those that are essential for functionality (Agarwal et al., 2002; Loll et al., 2003). Swapping the positions of the donor and acceptor groups of a hydrogen bond within a sequence preserves the hydrogen bond. However, similarly to volume and charge clashes, orientation reversals of the donor and acceptor groups in parental sequences lead to hybrids with donor-donor or acceptor-acceptor contacting pairs, thus disrupting the hydrogen bond between the two residues (Figure 1c). Note that hydrogen bonds between two backbone atoms are not of interest here since both the acceptor (CO) and donor (NH) groups are retained upon recombination. Here, we consider all possible cases (i.e. side chain/backbone and side chain/side chain) to identify potentially disrupted hydrogen bonds. The WHATIF software package (Vriend, 1990) is used to detect common hydrogen bonds and identify the donor and acceptor groups of the parental sequences.
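The orientation-reversal argument can be illustrated with a short Python sketch. It assumes that the donor and acceptor roles at the two positions have already been assigned for each parent (the text obtains them from WHATIF; no WHATIF call is shown here), and that each group acts as either a donor or an acceptor. The function names and the "P1/P2" labels are hypothetical.

```python
def is_hbond(role_i, role_j):
    """A donor/acceptor pair (in either order) can form a hydrogen bond."""
    return {role_i, role_j} == {"donor", "acceptor"}


def hbond_clashes(parent1_roles, parent2_roles):
    """Check both crossover directions for a disrupted common hydrogen bond.

    Each argument is (role at position i, role at position j) in one parent.
    "P1/P2" takes position i from parent 1 and position j from parent 2.
    """
    # Only hydrogen bonds common to both parents are screened.
    if not (is_hbond(*parent1_roles) and is_hbond(*parent2_roles)):
        return {"P1/P2": False, "P2/P1": False}
    return {
        "P1/P2": not is_hbond(parent1_roles[0], parent2_roles[1]),
        "P2/P1": not is_hbond(parent2_roles[0], parent1_roles[1]),
    }
```

When the two parents have reversed orientations, for example `("donor", "acceptor")` and `("acceptor", "donor")`, one hybrid pairs two donors and the other pairs two acceptors, so both directions are flagged. When the parents share the same orientation, neither hybrid is affected.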
Contacting residue pairs identified for hybrids that violate at least one of the above three criteria (i.e. charge repulsion, steric hindrance and hydrogen bond disruption) are denoted as arcs (Figure 2) linking the two residue positions. A crossover occurring between these two positions results in differing parental origins for the two contacting residues, connected by the arc, in the resulting hybrid. This representation of clashes is generalized for hybrids with multiple crossovers by using bicolored arcs to encode the specific directionality of the parental combination leading to a clash. We next examine the effectiveness of the proposed residue clash maps at explaining known functional crossover combinations for a number of protein systems.
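One possible encoding of such arcs is sketched below in Python. It is an illustration under stated assumptions, not the representation used by the authors: each arc stores the two residue positions (1-based) together with the parental origin at each position that produces the clash, and a hybrid is written as a string with one parent label per residue position. The names `Arc` and `clash_present` are hypothetical.

```python
from typing import NamedTuple


class Arc(NamedTuple):
    """A predicted clash between positions i < j (1-based)."""
    i: int
    j: int
    origin_i: str  # parent label at position i that leads to the clash
    origin_j: str  # parent label at position j that leads to the clash


def clash_present(origins, arc):
    """True if the hybrid carries the parental combination encoded by the arc.

    `origins` is a string such as "AAABBB": one parent label per position.
    """
    return (origins[arc.i - 1] == arc.origin_i
            and origins[arc.j - 1] == arc.origin_j)
```

For an arc `Arc(2, 5, "A", "B")`, the single-crossover hybrid "AAABBB" contains the clash, whereas the reverse-directionality hybrid "BBBAAA" and the parental sequence "AAAAAA" do not. Storing the origin at each end of the arc is what lets the same pair of positions be a clash in one crossover direction only.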
Results
These systems vary considerably not only in terms of pairwise sequence identity and number of functional hybrids, but also in the directed evolution protocol used for generating crossovers. All possible residue pairs with different parental origin that are brought in contact in one (or more) of the resultant hybrids are screened for all three forms of clashes. These clashes are then shown as arcs composing the residue clash map (Figure 2). This representation is used for hybrids with a single crossover (GART) while a generalized representation (i.e. bicolored arcs) is used for hybrids with multiple crossovers (GST, β-lactamase, C23O, and dioxygenases). A detailed comparison of the available experimental data using the proposed (i) residue clash map, (ii) residue contact map, and (iii) randomly generated clashes is presented. A randomly generated clash map is constructed by randomly choosing an arbitrary number of pairs of non-conserved residue positions from the structural alignment. Note that conserved residue positions are not of interest here since they are also conserved in the hybrids and therefore will not form a clash. These results are examined in terms of %ACC (percent of avoided calculated clashes), defined as the percentage of the predicted clashes avoided by the functional hybrids present in the data set, and %CFC (percent of clash free crossovers), defined as the percentage of the observed functional crossovers that do not lead to any of the identified clashes. The %ACC of the randomly generated clash map is obtained by averaging these values over 100 000 such randomly generated samples. Alternatively, this value can be calculated directly as the ratio of the number of pairs of non-conserved residue positions whose residues in the functional hybrids are both retained from the same parental sequence to the total number of such residue pairs.
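The two measures can be sketched in Python as follows. %ACC follows the definition in the text directly. For %CFC the text does not spell out when a crossover "leads to" a clash, so the sketch assumes that a crossover between positions k and k+1 leads to a clash if that clash is present in the hybrid and the crossover lies between its two residue positions. Arcs are plain tuples `(i, j, origin_i, origin_j)` with 1-based positions and i < j, hybrids are strings of parent labels, and segments of undetermined origin are not handled; all names are hypothetical.

```python
def clash_present(origins, arc):
    """True if the hybrid has the parental origins encoded by the arc."""
    i, j, origin_i, origin_j = arc
    return origins[i - 1] == origin_i and origins[j - 1] == origin_j


def percent_acc(arcs, hybrids):
    """Percentage of predicted clashes found in none of the functional hybrids."""
    avoided = sum(
        1 for arc in arcs
        if not any(clash_present(h, arc) for h in hybrids)
    )
    return 100.0 * avoided / len(arcs)


def crossovers(origins):
    """Positions k (1-based) where the parent changes between k and k+1."""
    return [k for k in range(1, len(origins)) if origins[k - 1] != origins[k]]


def percent_cfc(arcs, hybrids):
    """Percentage of observed crossovers that separate no clashing pair.

    ASSUMPTION: a crossover at k leads to a clash (i, j) present in the
    hybrid when i <= k < j.
    """
    total = 0
    clash_free = 0
    for h in hybrids:
        present = [arc for arc in arcs if clash_present(h, arc)]
        for k in crossovers(h):
            total += 1
            if not any(arc[0] <= k < arc[1] for arc in present):
                clash_free += 1
    return 100.0 * clash_free / total
```

As a toy example, take the arcs `(2, 5, "A", "B")`, `(1, 3, "B", "A")` and `(4, 6, "A", "B")` and the functional hybrids "AAABBB" and "ABBBBB". Only the first arc appears in a hybrid, so two of the three predicted clashes are avoided (%ACC of about 66.7), and one of the two observed crossovers is clash free (%CFC of 50).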
Glycinamide ribonucleotide transformylase
In this case study we identify all clashing residue pairs for the two single-crossover incremental truncation libraries encoding purN/hGART and hGART/purN hybrids. These hybrids are constructed using purN (209 residues) and hGART (201 residues) sequences whose structures (PDB i.d.: 1GAR, 1MEO, respectively) are obtained from the PDB. Structural alignment of the two structures using the CE method results in a root mean square distance (r.m.s.d.) value of 1.30 Å and a sequence identity of 38.20%. The residue clash map is constructed after identifying all common contacting residues based on the structural alignment. The purN/hGART library includes eight steric clashes (shown as gray arcs in Figure 3a) and five repulsive residue pairs (shown as black arcs), while the hGART/purN library exhibits nine steric clashes, three cases of charge repulsion and one hydrogen bond disruption (shown as a broken arc in Figure 3b).
The two GST parental sequences (i.e. human Mu class glutathione S-transferases, GST M1-1 and M2-2) share a relatively high sequence identity of 84% and align well both at the sequence and structural level. Both sequences are 217 residues in length, and have available structures (PDB i.d.: 1GTU and 2GTU). Even though they share only a 16% difference in the sequence at the protein level, their specific activities with the substrate aminochrome and 2-cyano-1,3-dimethyl-1-nitrosoguanidine (cyanoDMNG) differ by more than 100-fold (Hansson et al., 1999a). The chimeric GSTs in the experimental study were modified so that the first 32 bp (approximately 10 amino acids) of each were from GST M1-1 (Figure 4). The two segments vary only at two positions (i.e. 3 and 8), implying that the modified DNA shuffled parental sequences have a slightly increased sequence identity of 85.25% at the protein level. The 20 functional hybrid sequences involving multiple crossovers (Hansson et al., 1999b) are shown in Figure 4 with gray denoting fragments retained from GST M1-1 and black denoting fragments from GST M2-2. All recombinant sequences have a number of identical stretches of undetermined parental origin, shown in white. The hybrids are listed in decreasing order of activities with respect to aminochrome and CDNB.
β-Lactamases
Surprisingly, even though the sequence identity between the two β-lactamase parental sequences [PDB i.d.: 1G68 (PSE-4) and 1BTL (TEM-1)] is 43.17%, slightly more than the GART system, the number of identified clashes is significantly higher. The total number of clashes in the TEM-1/PSE-4 directionality is found to be 27 while the reverse directionality involved 30 clashes (Figure 5). Hybrids for both directions contained 14 cases of charge repulsion while the remaining clashes resulted from steric clashes. Crossover sequence data for functional hybrids are taken from the in vitro recombination experiments conducted by Voigt et al. (2002) where 10 functional hybrids (Figure 5) are reported. These crossovers were generated between residue positions 26 and 290. Notably, by superimposing the residue clash map against the crossover distribution, we find that 80.70% of the predicted clashes share such directionalities so that they are not found in any of the functional members of the library. Figure 5 shows that most of the predicted clashes fall in the range between positions 25 and 125 and are present in only four out of the 19 functional crossovers. On the other hand, residue contact map and random clash distributions yielded much lower %ACC values of only 65.00 and 14.68%, respectively (Table I). Recently, Hiraga and Arnold (2003) published additional crossover results for functional β-lactamase hybrids constructed using SISDC. These new data were also compared with the predicted clash map shown in Figure 5 and the results of these comparisons are summarized in Table I.
Kikuchi et al. (2000) obtained seven thermally stable hybrids using single-stranded DNA shuffling on the parental sequences xylE (catechol-2,3-dioxygenase from Pseudomonas putida, PDB i.d.: 1MPY) and nahH (synthetic construct). Because no structure is currently available for nahH, we used an estimated structure obtained using Swiss-Model (Peitsch, 1996) with the structure of xylE (1MPY) as the template. This was subsequently used to obtain the structural alignment using the CE method (Shindyalov and Bourne, 1998). The two sequences share 84.7% sequence identity at the protein level. A total of six clashes are identified for both directions, all of which resulted from electrostatic repulsion (Figure 6). Five of these have xylE/nahH directionality [79-80 (+/+), 82-83 (-/-), 183-184 (-/-), 183-286 (-/-) and 285-286 (-/-)] and only one has nahH/xylE directionality [80-83 (+/+)]. The residue clash map identified three clashes located in the region around residue 80, a region that is retained from the same parental sequence in all of the hybrids, thus preventing the formation of these clashes. Interestingly, all the functional hybrids in the library have different parental origins for the contacting residue pair 183-286; however, none has xylE/nahH directionality, thus avoiding the charge clash that would form in hybrids with that directionality (Table I).
All four protein systems analyzed so far included hybrids constructed from two parental sequences. The dioxygenase hybrids involve three parental sequences and have a relatively higher number of crossovers per sequence. The active library was created (Joern et al., 2002) by recombining the α and β subunits of toluene dioxygenase (todC1C2), tetrachlorobenzene dioxygenase (tecA1A2) and biphenyl dioxygenase (bhpA1A2). tod and tec are 89.16% identical at the protein level. The bhp sequence is less similar, exhibiting 62.30 and 61.85% pairwise sequence identity with tec and tod, respectively. No structures are available for any of these protein sequences, thus an estimated structure for each one of them is used. The dimeric state of the dioxygenases requires the use of Swiss-Model in Optimize mode (Schwede et al., 2003) for structure prediction. Naphthalene dioxygenase (PDB i.d.: 1O7G), a distant homolog of the three dioxygenases, was found using the ExPDB database (Schwede et al., 2000) and was used as the template. Figure 7 shows the clash maps for the three different sequence combinations (i.e. tec-tod, tod-bhp and bhp-tec) contrasted against the eight active clones with one to eight crossovers per sequence. Comparisons of these results are summarized in Table II. A total of 94 clashes are identified of which 94.68% result from the tod-bhp and bhp-tec combinations alone, a consequence of low sequence identity between these sequences. Notably, out of the 94 identified clashes only one clash is present in the hybrids [arising from charge repulsion (+/+) between residues 13 and 385 with a tec-bhp directionality] resulting in a high %ACC of 98.9% and a %CFC of 96.8%. Alternatively, we calculated a total of 3685 non-conserved contacting residues with different parental origins using the estimated structures out of which 84.42% result from the tod-bhp and bhp-tec combinations. Of these contacts, 1063 are found to be present in the active hybrids, resulting in %ACC and %CFC values of 71.2 and 9.7%, respectively (Table I).
Discussion
We find that the residue clash maps are on average 1.55 times more specific (i.e. ratio of %ACCs) than residue contact maps and 5.03 times more specific than randomly generated clashes at explaining observed functional crossovers. While residue contact maps do capture some information on residue pairs that result in unfavorable interactions in the hybrids, not all disrupted contact pairs are detrimental to functionality. The proposed residue clash map improves prediction by filtering out many of the incorrectly predicted pairs. The clash map categorizes these clashes into three distinct types (i.e. electrostatic repulsion, steric clash and hydrogen bond disruption). By pinpointing the cause of these clashes, one can then perform site-directed mutagenesis to ameliorate them by replacing problematic residues with ones that do not form any clashes.
Admittedly, the residue clash map does not account for the possibility of relieving some of the identified clashes through side chain and/or backbone movement. Some of the residues that are in contact in the parental sequences may not remain in contact in the hybrid, thus relieving some of the predicted clashes. Conversely, new clashes may be introduced through newly formed contacts or altered side chain conformations. This simplification is reflected in the results: the accuracy of crossover classification declines as the sequence identity, and thus the similarity, between the parental sequences decreases (Table I). Nevertheless, the proposed approach enables the rapid prescreening of an entire protein family to reveal favorable recombination partners that can subsequently be analyzed by more detailed molecular modeling methods that capture side chain and backbone movement. So far, the clash map based method can only classify hybrids as functional or non-functional and cannot rank hybrids with respect to their activity. We are currently developing methods for overcoming this limitation by ranking the hybrids with respect to their activity based on the identified clashes.
Acknowledgements
References
Berman,H.M., Westbrook,J., Feng,Z., Gilliland,G., Bhat,T.N., Weissig,H., Shindyalov,I.N. and Bourne,P.E. (2000) Nucleic Acids Res., 28, 235-242.
Bogarad,L.D. and Deem,M.W. (1999) Proc. Natl Acad. Sci. USA, 96, 2591-2595.
Chakrabarti,P. (1993) J. Mol. Biol., 234, 463-482.
Copley,R.R. and Barton,G.J. (1994) J. Mol. Biol., 242, 321-329.
Dupraz,P., Oertle,S., Meric,C., Damay,P. and Spahr,P.F. (1990) J. Virol., 64, 4978-4987.
Fischer,D., Wolfson,H., Lin,S.L. and Nussinov,R. (1994) Protein Sci., 3, 769-778.
Glusker,J.P. (1991) Adv. Protein Chem., 42, 1-76.
Gobel,U., Sander,C., Schneider,R. and Valencia,A. (1994) Proteins, 18, 309-317.
Hansson,L.O., Bolton-Grob,R., Massoud,T. and Mannervik,B. (1999a) J. Mol. Biol., 287, 265-276.
Hansson,L.O., Bolton-Grob,R., Widersten,M. and Mannervik,B. (1999b) Protein Sci., 8, 2742-2750.
Hiraga,K. and Arnold,F.H. (2003) J. Mol. Biol., 330, 287-296.
Joern,J.M., Meinhold,P. and Arnold,F.H. (2002) J. Mol. Biol., 316, 643-656.
Kikuchi,M., Ohnishi,K. and Harayama,S. (2000) Gene, 243, 133-137.
Loll,B., Raszewski,G., Saenger,W. and Biesiadka,J. (2003) J. Mol. Biol., 328, 737-747.
Lutz,S., Ostermeier,M., Moore,G.L., Maranas,C.D. and Benkovic,S.J. (2001) Proc. Natl Acad. Sci. USA, 98, 11248-11253.
Meyer,M.M., Silberg,J.J., Voigt,C.A., Endelman,J.B., Mayo,S.L., Wang,Z.G. and Arnold,F.H. (2003) Protein Sci., 12, 1686-1693.
Moore,G.L. and Maranas,C.D. (2003) Proc. Natl Acad. Sci. USA, 100, 5091-5096.
Moore,J.C., Jin,H.M., Kuchner,O. and Arnold,F.H. (1997) J. Mol. Biol., 272, 336-347.
Munson,M., Balasubramanian,S., Fleming,K.G., Nagi,A.D., O'Brien,R., Sturtevant,J.M. and Regan,L. (1996) Protein Sci., 5, 1584-1593.
Oldfield,T.J. (2002) Proteins, 49, 510-528.
Ostermeier,M. (2003) Biotechnol. Bioeng., 82, 564-577.
Ostermeier,M., Nixon,A.E., Shim,J.H. and Benkovic,S.J. (1999) Proc. Natl Acad. Sci. USA, 96, 3562-3567.
Peitsch,M.C. (1996) Biochem. Soc. Trans., 24, 274-279.
Ratnaparkhi,G.S. and Varadarajan,R. (2000) Biochemistry, 39, 12365-12374.
Saraf,M.C., Moore,G.L. and Maranas,C.D. (2003) Protein Eng., 16, 397-406.
Schwede,T., Diemand,A., Guex,N. and Peitsch,M.C. (2000) Res. Microbiol., 151, 107-112.
Schwede,T., Kopp,J., Guex,N. and Peitsch,M.C. (2003) Nucleic Acids Res., 31, 3381-3385.
Shindyalov,I.N. and Bourne,P.E. (1998) Protein Eng., 11, 739-747.
Sieber,V., Martinez,C.A. and Arnold,F.H. (2001) Nat. Biotechnol., 19, 456-460.
Song,K.S., Park,Y.S., Choi,J.R., Kim,H.K. and Park,Q. (1999) Exp. Mol. Med., 31, 47-51.
van Gunsteren,W.F., Billeter,S.R., Eising,A.A., Hünenberger,P.H., Krüger,P.K., Mark,A.E., Scott,W.R.P. and Tironi,I.G. (1996) Biomolecular Simulations: The GROMOS96 Manual and User Guide. Verlag der Fachvereine, Zurich, pp. 1-1024.
Voigt,C.A., Mayo,S.L., Arnold,F.H. and Wang,Z.G. (2001) J. Cell. Biochem. Suppl., 37, 58-63.
Voigt,C.A., Martinez,C., Wang,Z.G., Mayo,S.L. and Arnold,F.H. (2002) Nat. Struct. Biol., 9, 553-558.
Vriend,G. (1990) J. Mol. Graph., 8, 52-56.
Wallace,A.C., Borkakoti,N. and Thornton,J.M. (1997) Protein Sci., 6, 2308-2323.
Wang,P.L. (2000) Dis. Markers, 16, 3-13.
Westbrook,J. et al. (2002) Nucleic Acids Res., 30, 245-248.
Received August 1, 2003; revised October 21, 2003; accepted October 23, 2003