Faculty of Natural Science, Department of Mathematics and Computer Science, Ben Gurion University, Beer-Sheva 84015, Israel
Introduction
I have analyzed the 3D fold assignments for the genome of Mycoplasma genitalium (Fraser et al., 1995), which, owing to its small size, has served as a minimal model organism in various studies. Several publications have reported different fractions of the genome for which 3D folds can be assigned. The earliest works reported fractions as low as 9 and 12% (Casari et al., 1996; Frishman and Mewes, 1997; Gerstein, 1997). Later works using methods aimed at detecting more distant relationships increased this fraction to 25% (Fischer and Eisenberg, 1997) and, more recently, to around 40% (Huynen et al., 1998; Rychlewski et al., 1998; Teichmann et al., 1998; Jones, 1999; Wolf et al., 1999 and others; for recent reviews on this topic see Fischer and Eisenberg, 1999a; Teichmann et al., 1999). The differences among the reported fractions depend mainly on (i) the methods' sensitivities (the rate of true positives) and selectivities (the rate of false positives); (ii) whether assignments are counted only for full structural domain matches or also for small sequence-structure segments; and (iii) the date of the study, which determines the number of known sequences and structures and hence the number of sequences that can be assigned to known folds.
To evaluate how much the increase in the fraction of assignable ORFs depends on the number of available folds, I compared the fold assignments of M.genitalium proteins obtained by one particular method using three different sets of structures. The method used in this comparison (Fischer and Eisenberg, 1997) is aimed at detecting full structural domain matches and uses rather conservative thresholds (the choice of method is not critical here; qualitatively similar results are likely to be obtained with any other method). Using only the structures available before 1996, only 20% of the genome could be assigned a fold. With the structures in the PDB as of April 1997, 25% of the genome was assigned a fold (Fischer and Eisenberg, 1997). Using all the structures available in October 1998, the fraction of assigned proteins reached 32% (see http://www.doe-mbi.ucla.edu/people/frsvr/preds/MG/MG.html). This indicates that, because more structures have become available, the fraction of assignable ORFs has increased at an annual rate of roughly 18% (Fischer and Eisenberg, 1999a; see also Teichmann et al., 1999 and references therein).
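The ~18% annual figure is consistent with a compounded relative increase in the assigned fraction between the April 1997 (25%) and October 1998 (32%) snapshots. The text does not state how the rate was computed, so the sketch below assumes that reading; the function name `annual_growth` is hypothetical.

```python
def annual_growth(f_start: float, f_end: float, years: float) -> float:
    """Compound annual relative growth of the assigned fraction (assumed definition)."""
    return (f_end / f_start) ** (1.0 / years) - 1.0


# Fractions of M. genitalium ORFs assigned a fold, as reported in the text.
april_1997 = 0.25
october_1998 = 0.32
years_elapsed = 1.5  # April 1997 to October 1998

rate = annual_growth(april_1997, october_1998, years_elapsed)
print(f"relative increase per year: {rate:.1%}")  # prints 17.9%
```

Note that this is a relative rate: in absolute terms the assigned fraction grew by fewer than 5 percentage points per year over the same interval.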
Will this rate of increase in fold assignment be sustained over the next few years? To address this question, I analyzed the distribution of the fold assignments of M.genitalium among the functional categories described by Fraser et al. (1995). Table I shows that the three categories with the largest percentages of ORFs assigned a fold are purine metabolism, energy metabolism and translation-tRNA. For example, all but two ORFs in the first category have been assigned a fold. As expected, and mostly owing to the difficulty of determining the structures of membrane proteins, the three least covered categories are cell envelope, unknown and transport. The last column in Table I shows that the largest numbers of non-membrane proteins with no assigned fold are in the unknown and ribosomal categories (ORFs characterized as membrane proteins or as having putative transmembrane helices were excluded).
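The per-category tabulation behind Table I can be sketched as follows. This is a minimal illustration, not the author's code: the `Orf` record, the `tabulate` helper and the toy records are all hypothetical, and it assumes that the percentage assigned is computed over all ORFs in a category while the last column counts only unassigned non-membrane ORFs.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Orf:
    orf_id: str
    category: str    # functional category (Fraser et al., 1995 scheme)
    has_fold: bool   # True if a 3D fold was assigned
    membrane: bool   # True if membrane protein or putative TM helices


def tabulate(orfs):
    """Return {category: (percent_assigned, unassigned_non_membrane_count)}."""
    total = defaultdict(int)
    assigned = defaultdict(int)
    unassigned_soluble = defaultdict(int)
    for orf in orfs:
        total[orf.category] += 1
        if orf.has_fold:
            assigned[orf.category] += 1
        elif not orf.membrane:
            # Membrane ORFs are excluded from the "no fold" count.
            unassigned_soluble[orf.category] += 1
    return {
        cat: (100.0 * assigned[cat] / total[cat], unassigned_soluble[cat])
        for cat in total
    }


# Toy records for illustration only; NOT real M. genitalium assignments.
toy = [
    Orf("orfA", "purine metabolism", True, False),
    Orf("orfB", "purine metabolism", True, False),
    Orf("orfC", "transport", False, True),
    Orf("orfD", "transport", False, False),
    Orf("orfE", "unknown", False, False),
]

table = tabulate(toy)
# Sort categories from best to worst covered, as in the discussion above.
for cat, (pct, n_soluble) in sorted(table.items(), key=lambda kv: -kv[1][0]):
    print(f"{cat:20s} {pct:5.1f}% assigned, {n_soluble} non-membrane unassigned")
```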
References
Dujon,B. et al. (1994) Nature, 369, 371–377.
Fischer,D. and Eisenberg,D. (1997) Proc. Natl Acad. Sci. USA, 94, 11929–11934.
Fischer,D. and Eisenberg,D. (1999a) Curr. Opin. Struct. Biol., 9, 208–211.
Fischer,D. and Eisenberg,D. (1999b) Bioinformatics, 15, 759–762.
Fraser,C. et al. (1995) Science, 270, 397–403.
Frishman,D. and Mewes,H.-W. (1997) Nature Struct. Biol., 4, 626–628.
Gerstein,M. (1997) J. Mol. Biol., 274, 562–576.
Goffeau,A. et al. (1996) Science, 274, 546–547.
Huynen,M., Doerks,T., Eisenhaber,F., Orengo,C., Sunyaev,S., Yuan,Y. and Bork,P. (1998) J. Mol. Biol., 280, 323–326.
Jones,D. (1999) J. Mol. Biol., 287, 797–815.
Kim,S.H. (1997) Nature Struct. Biol., 5, 643–645.
Rost,B. (1998) Structure, 6, 259–263.
Rychlewski,L., Zhang,B. and Godzik,A. (1998) Folding Des., 3, 229–236.
Teichmann,S., Park,J. and Chothia,C. (1998) Proc. Natl Acad. Sci. USA, 95, ???-???.
Teichmann,S., Chothia,C. and Gerstein,M. (1999) Curr. Opin. Struct. Biol., 9, 390–399.
Wolf,Y., Brenner,S., Bash,P. and Koonin,E. (1999) Genome Res., 9, 17–26.
Zarembinski,T. et al. (1998) Proc. Natl Acad. Sci. USA, 95, 15189–15193.
Received June 28, 1999; revised September 9, 1999; accepted September 9, 1999.