Designability, aggregation propensity and duplication of disease-associated proteins

Philip Wong1, Andreas Fritz2 and Dmitrij Frishman1,3,4

1Institute for Bioinformatics, GSF – National Research Center for Environment and Health, Ingolstädter Landstrasse 1, D-85764 Neuherberg, 2Biomax Informatics AG, Lochhamer Strasse 9, D-82152 Martinsried and 3Department of Genome Oriented Bioinformatics, Technische Universität München, Wissenschaftzentrum Weihenstephan, D-85350 Freising, Germany

4 To whom correspondence should be addressed at the Technische Universität München. E-mail: d.frishman{at}wzw.tum.de


    Abstract
 
Over 2000 proteins in the Ensembl human genome database have been linked with disease information from OMIM. In comparison with all human proteins, we find that disease-associated proteins tend to have less designable folds in terms of their SCOP family counts, suggesting that they are intrinsically less robust to mutation and environmental stress. Disease proteins also tend to have isoelectric points closer to neutrality and more alternating hydrophilic–hydrophobic amino acid stretches compared with the average human protein. These results suggest that protein aggregation is a significant phenomenon associated with diseases. Another finding in this work is that many disease proteins are highly sequence similar to other disease proteins, suggesting that gene duplication has contributed to the expansion of disease-prone protein families.

Keywords: aggregation/designability/disease/duplication


    Introduction
 
A mystery in medicine that has largely remained unsolved is the question of how heritable diseases have evolved. The sequencing of the human and related genomes and the emergence of databases such as the Online Mendelian Inheritance in Man (OMIM) database (Hamosh et al., 2005), which contain mappings of genes to disease phenotypes, provide novel opportunities to explore this question. Although the definition of what phenotypes may be considered a disease can be highly debatable (Scully, 2004), the diseases currently associated with genes in databases are harmful to enough individuals to be clinically recognized. One may hypothesize that by comparing disease-associated genes in these databases, termed disease genes, against genes without such annotation, termed non-disease genes, one can derive properties which predispose genes to diseases that are currently clinically identifiable. Knowledge of such properties can be applied to help predict or verify disease–gene relations derived from linkage and association studies (Botstein and Risch, 2003; Carlson et al., 2004) and to select targets for drug screens. Such knowledge is also crucial for understanding the evolution of heritable diseases.

Previously, disease genes were found to be more highly expressed (Bortoluzzi et al., 2003), longer (Smith and Eyre-Walker, 2003), more tissue-specific (Smith and Eyre-Walker, 2003; Winter et al., 2004), to have more synonymous nucleotide substitutions (Huang et al., 2004) and to have fewer members amongst slowly evolving housekeeping genes (Winter et al., 2004). Comparisons conducted by López-Bigas and Ouzounis (2004) at the protein level revealed that disease proteins tend to be longer, more conserved, phylogenetically more extended and have fewer highly conserved paralogs than the average human protein. They subsequently exploited these differences to create a decision tree-based predictor of disease proteins.

A number of developments have allowed us to add to these studies. First, an intriguing hypothesis that proteins with more designable structures (i.e. proteins which have more sequences that encode their structures) were more robust to mutation and thermal stresses had been proposed (Li et al., 1996). In line with this hypothesis is the finding that proteins of a random sample of thermophiles exhibited a higher contact trace, a measure which correlates well with the designability, than a sample of mesophiles (England et al., 2003). We hypothesized that since proteins could be functionally impaired by mutations or environmental stresses, disease proteins would be less designable than non-disease proteins. A second development involved the discovery of sequence properties associated with protein aggregation (Chiti et al., 2003; DuBay et al., 2004). Many diseases have been associated with protein aggregation (Dobson, 2004; Ross and Poirier, 2004), but the extent of this phenomenon had not been assessed. One could test whether disease proteins are more aggregation prone than non-disease proteins in terms of these properties.

In this work, we compared disease and non-disease proteins from the Ensembl human database (Birney et al., 2004) in terms of designability and aggregation propensity. In addition, we assessed the likelihood that proteins highly sequence similar to disease proteins would also be associated with disease based on the current level of annotation. We validated our findings using a differently annotated database of human proteins provided by Biomax Informatics, containing roughly twice the number of proteins annotated with disease.


    Materials and methods
 
Proteomes and disease annotation

A total of 34 111 proteins predicted to be encoded in the human genome were obtained from the Ensembl human v23.34e.1 database (Birney et al., 2004). We term this dataset the Ensembl protein dataset. OMIM (Hamosh et al., 2005) based disease annotations for human genes were obtained using the Ensmart tool (Hammond and Birney, 2004) and mapped to 2113 proteins in the Ensembl protein dataset. OMIM is a database focused on heritable genetic diseases, most of which are of high penetrance. A larger set of 39 801 human proteins was obtained from the Biomax Human Genome Database (BHGDB), a product of Biomax Informatics (http://www.biomax.de). In all, 4352 proteins from the Biomax database were manually annotated with disease information.

High-quality disease proteins

We consider all proteins with any disease-related annotation as disease proteins for our analysis. This annotation varies from suggestions of disease susceptibility effects upon mutation to disease–gene associations identified by positional cloning or by multiple methods. Disease-causing mutations in OMIM, however, are marked with a number in parentheses indicating whether the mutation was positioned by mapping the wild-type gene (1), by mapping the disease phenotype itself (2) or by both approaches (3) (see http://www.ncbi.nlm.nih.gov/Omim/omimfaq.html). To check if our results were different using higher quality data, a subset of 1470 proteins was obtained from the 2113 Ensembl protein dataset by excluding proteins that were not associated with a disease marked with a ‘(3)’ in the OMIM annotation. We term this the ‘high-quality’ disease protein set.

Proteins associated with disease caused by amino acid substitution

Proteins associated with disease caused by amino acid substitution (DPAA) were first screened for by text scanning OMIM entries associated with proteins found in the Ensembl and Biomax datasets using a Perl script. Only OMIM entries associated with a single protein in each dataset were included in order to avoid mapping amino acid substitutions to the wrong protein. Potential DPAAs were identified by amino acid substitutions defined by the pattern ANA in the OMIM text, where A is a letter or group of letters representing an amino acid and N is a residue number, with no possibility of ANA representing a nucleotide substitution. For example, L234E would be considered an amino acid substitution, whereas A23C would be ignored for this study, as A and C could potentially represent nucleotides. The list of potential DPAAs was then refined manually.
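The ANA screen described above can be illustrated with a short sketch. This is not the Perl script used in the study, which is not reproduced here; it is a minimal Python approximation. The function name find_substitutions, the use of one-letter and three-letter amino acid codes, and the rule of discarding a match only when both letters are among A, C, G and T are assumptions made for illustration.

```python
import re

# One-letter and three-letter codes for the 20 standard amino acids.
ONE_LETTER = "ACDEFGHIKLMNPQRSTVWY"
THREE_LETTER = ["Ala", "Arg", "Asn", "Asp", "Cys", "Gln", "Glu", "Gly", "His", "Ile",
                "Leu", "Lys", "Met", "Phe", "Pro", "Ser", "Thr", "Trp", "Tyr", "Val"]
# Assumption: a match is ambiguous only if BOTH letters could be nucleotides.
NUCLEOTIDES = set("ACGT")

AA = "(?:" + "|".join(THREE_LETTER) + "|[" + ONE_LETTER + "])"
PATTERN = re.compile(r"\b(" + AA + r")(\d+)(" + AA + r")\b")


def find_substitutions(text):
    """Return (wild_type, position, mutant) tuples matching the ANA pattern."""
    hits = []
    for match in PATTERN.finditer(text):
        wild_type, position, mutant = match.group(1), match.group(2), match.group(3)
        # Skip candidates such as A23C that could be nucleotide substitutions.
        if wild_type in NUCLEOTIDES and mutant in NUCLEOTIDES:
            continue
        hits.append((wild_type, int(position), mutant))
    return hits


print(find_substitutions("The L234E variant and an A23C change were reported."))
```

With this rule, L234E is kept and A23C is dropped, matching the example in the text. Any real screen would still need the manual refinement step described above.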

Protein properties

In all cases in which disease and non-disease proteins were compared, only the largest protein encoded by each gene was included, as done by López-Bigas and Ouzounis (2004). Protein length, pI and SCOP (Andreeva et al., 2004) assignments were obtained from the PEDANT system (Riley et al., 2005). SCOP folds were assigned to proteins if the corresponding sequences were within a BlastP (Altschul et al., 1997) E-value of 10^-6. The residues A, C, F, G, I, L, M, P, V, W and Y were considered to be hydrophobic and H, Q, N, S, T, K, R, D and E were considered hydrophilic in this study.
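As an illustration of the first step above, the following sketch keeps only the longest protein for each gene. The record layout (gene identifier, protein identifier, sequence) and the function name longest_protein_per_gene are hypothetical; the handling of ties (the first protein seen is kept) is also an assumption, since the text does not say how ties were resolved.

```python
def longest_protein_per_gene(records):
    """Map each gene ID to the (protein ID, sequence) of its longest protein.

    records: iterable of (gene_id, protein_id, sequence) tuples.
    On ties, the protein encountered first is kept (an assumption).
    """
    longest = {}
    for gene_id, protein_id, sequence in records:
        current = longest.get(gene_id)
        if current is None or len(sequence) > len(current[1]):
            longest[gene_id] = (protein_id, sequence)
    return longest


# Made-up identifiers and sequences, for illustration only.
example = [
    ("geneA", "protA1", "MKT"),
    ("geneA", "protA2", "MKTLLV"),
    ("geneB", "protB1", "MAG"),
]
print(longest_protein_per_gene(example))
```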

Designability

Protein designability was measured by counting the number of families in each fold contained in a given protein and taking the minimum. For example, if protein A contains three domains with folds F1, F2 and F3 and these folds in turn contain eight, three and seven families, respectively, protein A's minimum family count would be three. By recording the minimum family count of the folds in proteins, we assessed their designability by computing the designability of their least designable fold.
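The worked example above can be written as a short sketch. The function name min_family_count and the dictionary of family counts per fold are hypothetical; in the study the counts come from SCOP, whereas here they are typed in by hand.

```python
def min_family_count(protein_folds, families_per_fold):
    """Return the family count of the least designable fold in a protein.

    protein_folds: fold identifiers assigned to the protein's domains.
    families_per_fold: mapping from fold identifier to its number of families.
    Returns None if the protein has no fold with a known family count.
    """
    counts = [families_per_fold[fold] for fold in protein_folds
              if fold in families_per_fold]
    return min(counts) if counts else None


# Protein A from the text: folds F1, F2 and F3 with 8, 3 and 7 families.
families_per_fold = {"F1": 8, "F2": 3, "F3": 7}
print(min_family_count(["F1", "F2", "F3"], families_per_fold))  # 3
```

Returning None for proteins without an assigned fold is a design choice of this sketch; such proteins would simply be left out of a designability comparison.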


    Results and discussion
 
The comparison of disease and non-disease proteins was carried out in three steps. First, the full sets of disease proteins and non-disease proteins were compared. Second, as the study here involves comparing the sequence properties of proteins in which the disease state is attributed to the presence of altered proteins, we reduced our disease protein datasets to include only such proteins. Because of computational ease, we selected those proteins annotated with disease-causing amino acid substitutions as our reduced disease protein set (see Materials and methods) and repeated our comparison. Third, we identified a high-quality disease protein set within the Ensembl dataset (see Materials and methods) and again repeated our comparison. The conclusions drawn from comparisons using these repeated experiments were not different from those found using the full disease datasets (Supplementary material S3–S7 available at PEDS Online). Counts of proteins used in this study are listed in Table I.


 
Table I. Counts of disease and non-disease proteins

 
Disease proteins tend to have folds with fewer families

Mutation or environmental change can disrupt and/or create aberrant function in proteins. Disease proteins were hypothesized to more often contain structures that are susceptible to perturbation by mutation or external stresses. Structures which are more designable tend to be more robust against mutation and thermal fluctuations (Zhang, 1997; Wingreen et al., 2004). Because a direct relationship exists between the number of sequences and the number of families in protein folds (Zhang et al., 1997), one can measure a protein fold's designability by counting the number of families in that fold. To provide an estimate of a protein's structural susceptibility to perturbing influences, we counted the number of families in each SCOP domain fold (Andreeva et al., 2004) contained in the protein and recorded the minimum count (see Materials and methods). In other words, we assessed the designability of proteins by computing the designability of their least designable fold. We reason that if a domain which occupies a portion of a protein is destabilized and becomes misfolded owing to stress or mutation (within or outside the domain), it is likely that the function of the entire protein would be affected. We therefore considered the least designable fold to be the natural basis for assessing a protein's designability.

Analysis of our human datasets revealed that disease proteins tend to have significantly smaller minimum family counts than the average human protein. Disease proteins have a noticeably larger proportion of folds containing only one family than non-disease proteins (Figure 1). A similar trend was observed when SCOP superfamilies were counted instead of families (data not shown). Overall, these results were independent of the length of proteins examined (see Supplementary material S1). Taken together, our results on designability and disease suggest that disease proteins tend to be intrinsically less robust to mutation or external stresses than the average human protein.



 
Fig. 1. Distribution of family counts associated with folds in disease and non-disease Ensembl proteins. The minimum family counts for disease and non-disease proteins are shown. The mean count for disease proteins was significantly lower than the mean counts for the non-disease and the combined non-disease + disease protein datasets (Mann–Whitney test, P < 0.01; Kolmogorov–Smirnov test, D > 5%, P < 0.01).

 
Interestingly, the largest proportion of disease-related mutations (~50%) in the Human Gene Mutation database (Stenson et al., 2003) are missense mutations. Evidence that a large proportion, if not the majority, of missense mutations have structural consequences (Wang and Moult, 2001; Ferrer-Costa et al., 2002; Ramensky et al., 2002; Terp et al., 2002; Steward et al., 2003; Reumers et al., 2005) suggests that structural robustness to mutation is a significant factor influencing disease propensity. It is likely that the structural vulnerability of disease proteins with low designability has contributed to their propensity to be associated with disease, but the extent remains to be fully assessed.

Disease proteins are more likely to aggregate

A possible consequence of structural perturbation by mutation or environmental change is the misfolding or unfolding of proteins, leading to aggregate formation. Aggregates or their precursors are cytotoxic and can cause cell death (Bucciantini et al., 2002, 2004). In comparing disease and non-disease proteins, we find that the former tend to have isoelectric points closer to neutrality and more stretches of alternating hydrophobic–hydrophilic residues (of length 5 or more) than the latter (Figure 2A and B). Such properties have been implicated in increasing the aggregation rates of unfolded proteins in in vitro experiments (Chiti et al., 2003; DuBay et al., 2004). These results suggest that disease proteins tend to be more aggregation prone than non-disease proteins and complement work by Dima and Thirumalai (2004), which suggested that low sequence correlation entropies and mixed charged-hydrophobic and charged-polar runs in proteins may be indicative of disease association and a tendency to aggregate.
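The two aggregation-related properties can be sketched as follows, using the hydrophobic and hydrophilic residue classes listed in Materials and methods. The exact counting rule is not spelled out in the text, so this sketch makes several assumptions: stretches are counted as maximal, non-overlapping runs in which consecutive residues alternate between the two classes; any other character (for example X) ends a run; and neutrality is taken as pH 7. The function names are hypothetical.

```python
# Residue classes as defined in Materials and methods.
HYDROPHOBIC = set("ACFGILMPVWY")
HYDROPHILIC = set("HQNSTKRDE")


def residue_class(residue):
    """Return 'o' for hydrophobic, 'i' for hydrophilic, None otherwise."""
    if residue in HYDROPHOBIC:
        return "o"
    if residue in HYDROPHILIC:
        return "i"
    return None


def count_alternating_stretches(sequence, min_len=5):
    """Count maximal runs of alternating residue classes of length >= min_len."""
    count = 0
    run = 0          # length of the current alternating run
    previous = None  # class of the previous residue in the run
    for residue in sequence.upper():
        current = residue_class(residue)
        if current is not None and previous is not None and current != previous:
            run += 1  # the alternation continues
        else:
            # The run ends here: close it, then restart at this residue if valid.
            if run >= min_len:
                count += 1
            run = 1 if current is not None else 0
        previous = current
    if run >= min_len:
        count += 1
    return count


def distance_from_neutrality(isoelectric_point, neutral_ph=7.0):
    """Absolute distance of a pI value from neutral pH (assumed to be 7)."""
    return abs(isoelectric_point - neutral_ph)


print(count_alternating_stretches("AKAKAAKAKAK"))  # 2
print(distance_from_neutrality(5.5))               # 1.5
```

Counting overlapping windows of exactly five residues instead of maximal runs would give larger counts for long alternating regions; which convention the study used is not stated.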



 
Fig. 2. Distributions of properties associated with protein aggregation in Ensembl disease and non-disease proteins. (A) Distance of isoelectric point from neutrality. (B) Hydrophilic–hydrophobic stretch count. Disease proteins tend to have isoelectric points closer to neutrality and more hydrophilic–hydrophobic stretches than non-disease proteins and than the combined disease + non-disease protein dataset (Mann–Whitney test, P < 0.01; Kolmogorov–Smirnov test, D > 5%, P < 0.01).

 
Furthermore, the number of alternating hydrophobic–hydrophilic stretches in proteins correlates well with the length of proteins (Pearson R^2 > 0.8) (Figure 3A). Longer proteins also tend to have isoelectric points closer to neutrality, although this attribute does not correlate well with length (Figure 3B). In addition, longer proteins, especially those containing multiple domains, must undergo many more folding processes to achieve their native states than short single-domain proteins. These properties are likely to predispose longer proteins to misfolding and subsequent aggregation. Indeed, in vitro folding of relatively large eukaryotic proteins frequently fails. The finding that disease proteins tend to be much longer (López-Bigas and Ouzounis, 2004) (~1.5x and 1.7x longer on average in the Ensembl and Biomax datasets, respectively) and less designable than non-disease proteins suggests that disease proteins are more likely to misfold upon mutation or stress. With isoelectric points closer to neutrality and a greater number of hydrophobic–hydrophilic stretches, disease proteins would likely have a greater potential for aggregation than non-disease proteins.



 
Fig. 3. Correlation of Ensembl protein properties against length. (A) Hydrophilic–hydrophobic stretch count. (B) Distance of isoelectric point from neutrality. Pearson R^2 correlations are (A) 0.83 and (B) 0.05. A line of best fit is drawn in grey for (A), illustrating the strong correlation between length and hydrophilic–hydrophobic stretch counts. For (B), a cumulative mean curve is shown in grey, indicating the tendency for proteins to have isoelectric points near neutrality as length increases.

 
Many disease proteins are duplicates of other disease proteins

There are reasons to believe that sequence similarity to known disease proteins may also be a significant factor contributing to disease propensity. Protein length, designability, isoelectric point and sequence stretch patterns have so far been implicated in disease propensity. All of these properties depend on the sequences of the corresponding proteins. In addition, highly sequence similar proteins are likely to share interacting partners (Yu et al., 2004), which may serve to link disease proteins functionally to non-disease proteins. Such functional linkage may indicate that both disease and non-disease proteins share a function that when disrupted would cause disease. Alternatively, disrupting the function of a non-disease protein functionally linked to a disease protein may subsequently disrupt the function of the disease protein and cause disease. Moreover, if the DNA sequences which encode proteins are sufficiently similar, non-allelic homologous recombination, a mechanism associated with disease (Bailey et al., 2002; Shaw and Lupski, 2004), may also occur. Hence one may hypothesize that annotating non-disease proteins that are highly sequence similar to disease proteins as disease proteins would be valid for many proteins. To test this, the proportions of disease proteins in the Ensembl and Biomax human databases with duplicates annotated with disease were assessed (Table II). Almost 40% of the disease proteins in the Biomax database (and 50% in the Ensembl database) have duplicates (paralogs) associated with disease. Over one-fifth of disease proteins have all duplicates associated with disease in both databases. Disease proteins represent 5 and 9% of proteins in the Ensembl and Biomax datasets, respectively (Table I). In contrast, the chance that a duplicate of a disease protein is also a disease protein is significantly higher than expected (χ² test: P < 0.01) at 20 and 29% in the Ensembl and Biomax datasets, respectively (Table II). These findings strongly suggest that assigning disease status to non-disease proteins based on high sequence similarity is valid for many proteins.
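The comparison of an observed proportion against a background rate can be sketched as a one-degree-of-freedom χ² goodness-of-fit test. The exact form of the test used in the study is not described, so this is an assumed formulation. The counts below are hypothetical and chosen only to mirror the 20% observed versus 5% expected proportions quoted for Ensembl; the real counts are those of Table II. The function name chi2_one_proportion is also hypothetical.

```python
import math


def chi2_one_proportion(n_total, n_observed, expected_rate):
    """Chi-squared goodness-of-fit test (1 degree of freedom).

    Compares n_observed positives out of n_total with the number expected
    under expected_rate. Returns (chi2 statistic, P-value).
    """
    expected_yes = n_total * expected_rate
    expected_no = n_total - expected_yes
    observed_no = n_total - n_observed
    chi2 = ((n_observed - expected_yes) ** 2 / expected_yes
            + (observed_no - expected_no) ** 2 / expected_no)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: 200 of 1000 duplicates are disease proteins (20%),
# against a background disease rate of 5%.
chi2, p_value = chi2_one_proportion(1000, 200, 0.05)
print(chi2, p_value < 0.01)
```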


 
Table II. A large proportion of disease proteins have duplicates associated with disease

 
Notably, many disease proteins have been associated with a different disease annotation than one of their duplicates. For example, the protein ATP2A2 (NP_001672) is annotated with Darier disease (OMIM: 124200), an autosomal dominant skin disorder, whereas its duplicate ATP2A1 is annotated as being associated with Brody myopathy (OMIM: 108730), a disorder characterized by painless muscle cramping and exercise-induced impairment of muscle relaxation. The high sequence similarity between the two proteins (within a BlastP E-value of 10^-70) suggests that they share a common evolutionary origin. The functional compromise of the two duplicates in different tissues results in two different disease phenotypes. A similar phenomenon has been reported for PYPAF1 and NOD2 (Albrecht et al., 2003).

The large proportion of disease proteins with duplicates associated with disease suggests that gene duplication is a significant phenomenon contributing to the expansion of disease-prone protein families. These families may contain proteins associated with different diseases. A clustering of OMIM entries based on sequence similarity of their associated proteins is shown in Figure 4. Examining such clusters allows the identification of disease families, which may yield new insight into how diseases may be related.



 
Fig. 4. Clustering of OMIM diseases. OMIM entries corresponding to 1132 Ensembl proteins annotated with disease were clustered with ClustalW (Chenna et al., 2003) based on protein sequences using default parameters and displayed using HyperTree (Bingham and Sudarsanam, 2000). The corresponding Phylip file is available in the Supplementary material.

 
Conclusion

By comparing disease and non-disease proteins, we have shown that disease proteins tend to have folds with fewer families than non-disease proteins, suggesting that they are intrinsically more structurally vulnerable to mutation or environmental stresses. Disease proteins also tend to be longer, have isoelectric points closer to neutrality and more aggregation-prone stretches than non-disease proteins, suggesting that the former are more likely to aggregate than the latter upon unfolding or misfolding. Many disease proteins are duplicates of other disease proteins, reinforcing the notion that sequence similarity to known disease proteins can contribute substantially to disease propensity. These results were apparent even when we defined our disease protein set to include only those proteins with known disease-causing amino acid substitutions and when we chose to use a high-quality disease protein set (Supplementary material S3, S5–7).

The reader should be aware that our results are based on incomplete data. We have defined the domains in human proteins using SCOP, which is biased towards domains which are soluble and amenable to structure determination. However, disease proteins tend to be relatively more conserved than non-disease proteins (López-Bigas and Ouzounis, 2004). Therefore, even without consideration of what domains are defined by SCOP, one finds that disease proteins are more sequence restricted, consistent with the hypothesis that they have less designable structures (if any) than non-disease proteins.

Our results are also dependent on the accuracy of the human gene models in the human genomes used, the proteins predicted to be expressed using these gene models and the annotation associating proteins with various diseases. Our results are robust to two different human databases with different levels of disease annotation. We have also verified them against a high-quality subset of disease–gene relations.

The probability that a protein becomes associated with disease depends on multiple factors, including the mutation type, the protein involved, the rest of the organism and the environment to which the organism is exposed. Our results are global trends gleaned from data currently stored in databases. The trends do not necessarily apply to specific populations or individuals. It would be of great interest to integrate information from epidemiological studies with the trends derived here to shed light on this matter.

The finding that families of sequence similar disease proteins exist suggests common origins and mechanisms to many of our modern diseases. Drugs targeting one member of a particular family may affect others in that family. Understanding the properties which predispose proteins for disease and how they may have evolved will perhaps aid the identification of novel gene-to-disease relations and the treatment of the associated diseases.


    Acknowledgements
 
We thank Philipp Pagel for extremely helpful advice, Louise Riley and Martin Münsterkötter for creating the PEDANT database tables used in this investigation and Hans-Werner Mewes and Pawel Smialowski for useful comments. This work was funded by a grant from the German Federal Ministry of Education and Research (BMBF) within the BFAM framework (031U112C).


    References
 
Albrecht,M., Lengauer,T. and Schreiber,S. (2003) Bioinformatics, 19, 2171–2175.

Altschul,S.F., Madden,T.L., Schäffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Nucleic Acids Res., 25, 3389–3402.

Andreeva,A., Howorth,D., Brenner,S.E., Hubbard,T.J., Chothia,C. and Murzin,A.G. (2004) Nucleic Acids Res., 32, D226–D229.

Bailey,J.A., Gu,Z., Clark,R.A., Reinert,K., Samonte,R.V., Schwartz,S., Adams,M.D., Myers,E.W., Li,P.W. and Eichler,E.E. (2002) Science, 297, 1003–1007.

Bingham,J. and Sudarsanam,S. (2000) Bioinformatics, 16, 660–661.

Birney,E. et al. (2004) Genome Res., 14, 925–928.

Bortoluzzi,S., Romualdi,C., Bisognin,A. and Danieli,G.A. (2003) Physiol. Genomics, 15, 223–227.

Botstein,D. and Risch,N. (2003) Nat. Genet., 33(Suppl.), 228–237.

Bucciantini,M., Giannoni,E., Chiti,F., Baroni,F., Formigli,L., Zurdo,J., Taddei,N., Ramponi,G., Dobson,C.M. and Stefani,M. (2002) Nature, 416, 507–511.

Bucciantini,M., Calloni,G., Chiti,F., Formigli,L., Nosi,D., Dobson,C.M. and Stefani,M. (2004) J. Biol. Chem., 279, 31374–31382.

Carlson,C.S., Eberle,M.A., Kruglyak,L. and Nickerson,D.A. (2004) Nature, 429, 446–452.

Chenna,R., Sugawara,H., Koike,T., Lopez,R., Gibson,T.J., Higgins,D.G. and Thompson,J.D. (2003) Nucleic Acids Res., 31, 3497–3500.

Chiti,F., Stefani,M., Taddei,N., Ramponi,G. and Dobson,C.M. (2003) Nature, 424, 805–808.

Dima,R.I. and Thirumalai,D. (2004) Bioinformatics, 20, 2345–2354.

Dobson,C.M. (2004) Semin. Cell Dev. Biol., 15, 3–16.

DuBay,K.F., Pawar,A.P., Chiti,F., Zurdo,J., Dobson,C.M. and Vendruscolo,M. (2004) J. Mol. Biol., 341, 1317–1326.

England,J.L., Shakhnovich,B.E. and Shakhnovich,E.I. (2003) Proc. Natl Acad. Sci. USA, 100, 8727–8731.

Ferrer-Costa,C., Orozco,M. and de la Cruz,X. (2002) J. Mol. Biol., 315, 771–786.

Hammond,M.P. and Birney,E.R. (2004) Trends Genet., 20, 268–272.

Hamosh,A., Scott,A.F., Amberger,J.S., Bocchini,C.A. and McKusick,V.A. (2005) Nucleic Acids Res., 33, D514–D517.

Huang,H. et al. (2004) Genome Biol., 5, R47.

Li,H., Helling,R., Tang,C. and Wingreen,N. (1996) Science, 273, 666–669.

López-Bigas,N. and Ouzounis,C.A. (2004) Nucleic Acids Res., 32, 3108–3114.

Ramensky,V., Bork,P. and Sunyaev,S. (2002) Nucleic Acids Res., 30, 3894–3900.

Reumers,J., Schymkowitz,J., Ferkinghoff-Borg,J., Stricher,F., Serrano,L. and Rousseau,F. (2005) Nucleic Acids Res., 33, D527–D532.

Riley,M.L., Schmidt,T., Wagner,C., Mewes,H.W. and Frishman,D. (2005) Nucleic Acids Res., 33, Database Issue, D308–D310.

Ross,C.A. and Poirier,M.A. (2004) Nat. Med., 10, S10–S17.

Scully,J.L. (2004) EMBO Rep., 5, 650–653.

Shaw,C.J. and Lupski,J.R. (2004) Hum. Mol. Genet., 13, R57–R64.

Smith,N.G. and Eyre-Walker,A. (2003) Gene, 318, 169–175.

Stenson,P.D., Ball,E.V., Mort,M., Phillips,A.D., Shiel,J.A., Thomas,N.S., Abeysinghe,S., Krawczak,M. and Cooper,D.N. (2003) Hum. Mutat., 21, 577–581.

Steward,R.E., MacArthur,M.W., Laskowski,R.A. and Thornton,J.M. (2003) Trends Genet., 19, 505–513.

Terp,B.N., Cooper,D.N., Christensen,I.T., Jorgensen,F.S., Bross,P., Gregersen,N. and Krawczak,M. (2002) Hum. Mutat., 20, 98–109.

Wang,Z. and Moult,J. (2001) Hum. Mutat., 17, 263–270.

Wingreen,N., Li,H. and Tang,C. (2004) Polymer, 45, 699–705.

Winter,E.E., Goodstadt,L. and Ponting,C.P. (2004) Genome Res., 14, 54–61.

Yu,H., Luscombe,N.M., Lu,H.X., Zhu,X., Xia,Y., Han,J.D., Bertin,N., Chung,S., Vidal,M. and Gerstein,M. (2004) Genome Res., 14, 1107–1118.

Zhang,C.T. (1997) Protein Eng., 10, 757–761.

Received February 15, 2005; revised May 25, 2005; accepted August 3, 2005.

Edited by Luis Serrano




