When the human genome sequence was completed ahead of schedule in 2000, news reports declared the start of a new race to decode the human proteome, the full set of proteins encoded by those genes.
As impressive as biology's first Big Science milestone was, the challenges of proteomics make genomics look almost elementary. In fact, researchers say, the comparison itself is misleading, because no definitive proteome actually exists for humans or any other organism.
But while the concept may be slippery, proteomics, a term coined in 1994 by Mark Wilkins and colleagues at Macquarie University in Sydney, Australia, has undeniably become biology's favorite buzzword. Start-up companies and established pharmaceutical firms are lofting its banner to signal their position on the leading edge of biotech. Proteomics centers and institutes are proliferating at universities around the world. And the Human Proteome Organization will hold its first annual meeting in Versailles, France, in November.
So, what exactly is proteomics? The National Cancer Institute's Frederick Cancer Research Facility in Frederick, Md., offers up this definition: "the study of all the protein forms expressed within an organism as a function of time, age, state, external factors, etc." Each cell in an organism presents a constantly changing kaleidoscope of proteins that wax and wane depending on what job the cell is doing, and on whether its function is compromised by disease.
Although their genomes might be identical, the proteome of a neuron carrying visual information from the retina, for example, would be quite different from the proteome of a lymphocyte engaged in fending off a microbial invasion, or that of a breast cancer cell proliferating out of control.
Another index of proteomic complexity is the relatively paltry number of human genes. The current estimate, 30,000 to 40,000, is one-third to one-tenth of what most genome researchers predicted just a few years ago. Aside from the humbling fact that humans have not many more genes than worms, the major lesson from this surprisingly low count is a greater appreciation of the role played by processes that are not coded in the DNA. These include alternative splicing of RNA to produce different forms of a protein and post-translational modifications, such as phosphorylation and glycosylation, that act as on-off switches for proteins.
In one sense, the central questions asked in proteomics are the same ones that scientists in biochemistry, cell biology, and related fields have been wrestling with for decades: What proteins are present in which cells? How do they work together in signaling pathways? What are the changes in protein expression and activity that drive the development, repair, breakdown, and death of an organism?
What proteomics adds is the vision of these research areas transformed and empowered by "high-throughput" technological breakthroughs (such as the invention of the polymerase chain reaction and the exponential gains in sequencing speed) that drove the genomics revolution of the late 20th century. Advances in 2-dimensional gel electrophoresis, liquid chromatography, mass spectrometry, and bioinformatics have moved the field forward, but so far, no single technology has emerged to revolutionize proteomics the way PCR did for genomics.
"Proteomics is very much a jellyfish right now, very amorphous and hard to get a clear sense of," said Joshua LaBaer, M.D., Ph.D., director of the Harvard Institute of Proteomics at Harvard Medical School in Boston. But two major areas of activity can be distinguished, he said. In what he terms "abundance-based" proteomics, researchers look at which proteins are present in a given tissue or body fluid.
The potential power of abundance-based proteomics came to light with a February publication in the British medical journal The Lancet. A team of researchers led by Emanuel Petricoin, M.D., of the Division of Therapeutic Products at the U.S. Food and Drug Administration, and Lance Liotta, M.D., Ph.D., of NCI's Laboratory of Pathology, used protein patterns in blood serum to distinguish ovarian cancer patients from healthy women or those with noncancerous ovarian pathology. Their method correctly identified 50 of 50 ovarian cancer specimens and 63 of 66 noncancer specimens, making it far more accurate than any ovarian cancer detection test now in use, such as CA125. Most importantly, the method was just as good at identifying patients with early-stage cancer as those with advanced disease.
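To make those counts concrete, the short sketch below converts them into the two rates usually quoted for a diagnostic test. It uses only the figures reported above; the variable names are illustrative and are not taken from the Lancet paper.

```python
# Counts as reported in the article; variable names are illustrative only.
cancer_total = 50        # ovarian cancer specimens tested
cancer_detected = 50     # of those, correctly identified as cancer
noncancer_total = 66     # noncancer specimens tested
noncancer_cleared = 63   # of those, correctly identified as noncancer

# Sensitivity: share of cancer specimens the test catches.
sensitivity = cancer_detected / cancer_total

# Specificity: share of noncancer specimens the test correctly clears.
specificity = noncancer_cleared / noncancer_total

# False positives: noncancer specimens flagged as cancer.
false_positives = noncancer_total - noncancer_cleared

print(f"sensitivity: {sensitivity:.1%}")
print(f"specificity: {specificity:.1%}")
print(f"false positives: {false_positives}")
```

These rates describe this one sample of 116 specimens. They say nothing on their own about how the test would perform in a general screening population, where ovarian cancer is far rarer.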
The technique will have to be validated on much larger numbers of patients and in prospective studies before it can be used in clinical practice. But J. Carl Barrett, Ph.D., director of NCI's Center for Cancer Research, said the study has already broken new ground. "The idea that rather than a single biomarker, an entire pattern of proteins contains important diagnostic information, is an exciting new paradigm," he said.
The method employs a number of technologies commonly used in proteomics. But its particular power lies in two techniques: Liotta's laser-capture microdissection, which gives a snapshot of expressed proteins in a small number of cells (see News, Nov. 1, 2000, p. 1710), and the pattern-recognition algorithms developed by the bioinformatics company Correlogic of Bethesda, Md. The functions of the five proteins that distinguish between ovarian cancer patients and unaffected women are completely unknown. Although the Lancet authors suggest that exploring their functions may elucidate disease processes, it is the sheer number crunching, not any insight about what the proteins do, that holds the key to the test's utility.
The second major area of proteomics, and the one that will ultimately have a far greater impact, according to LaBaer, is functional proteomics: figuring out what each protein's role is and how proteins interact. "The variety of techniques involved is much broader, but also at a much earlier state of development," he said. These include the yeast two-hybrid system, immunoprecipitation-based methods, protein microarrays, and the Harvard Institute's work to refine high-throughput methods for protein purification.
Different proteins typically require different conditions for their expression and purification. A big challenge for proteomics is to find the best ways to express and purify thousands of proteins in parallel. In the March 5 Proceedings of the National Academy of Sciences, LaBaer and colleagues report on their efforts to do this. They took a test set of 32 human genes of varying size, expressed them in the bacterium E. coli, and attached each product to four different chemical "purification tags." Applying their methods to 336 randomly selected human genes, they successfully purified 60% of them, a figure they believe represents a reasonable lower estimate for the overall success rate of this strategy.
"The availability of such methods will alleviate a key bottleneck in the application of new proteomic techniques such as protein arrays to human biology," the authors wrote.
Among private-sector groups that have begun to stake out territory in proteomics, the most ambitious appears to be the alliance of Myriad Genetics, Inc., of Salt Lake City; Hitachi, Ltd., of Tokyo; and Oracle of Redwood Shores, Calif., which announced in April 2001 that it plans to commit $185 million to "map the human proteome in less than 3 years."
William S. Hancock, vice president and general manager of proteomics at Thermo Finnigan, San Jose, Calif., is the editor of the American Chemical Society's new Journal of Proteome Research.
He said the commercial claim to "map the proteome" contains a degree of hype, because the effort, comprehensive though it may be on its own terms, relies solely on one type of protein analysis, the yeast two-hybrid method, and will yield only a fraction of all potential information about the proteins.
"In this case, what you're mapping is simple protein interactions. But you're still not characterizing the full set of a proteome, not doing quantitation, not measuring interactions of complexity, not measuring activity, not measuring localization, not measuring post-translational modification. The list goes on and on," said Hancock. "We're all going to scratch our heads and try to understand the significance of it."
LaBaer agreed, calling the commercial claim to map the proteome "a clever turn of phrase to capitalize on the Human Genome Project press."
LaBaer and Hancock both emphasized that in the long run, the most substantial progress in proteomics is likely to come from publicly shared research efforts, not proprietary commercial programs.
One model that is already up and running, Hancock said, is Canada's Biomolecular Interaction Network Database, which stores full descriptions of interactions, molecular complexes, and pathways involving proteins, nucleic acids, and small molecules.
Another freely available resource will be the Harvard Institute of Proteomics' FLEX (Full Length Expression) repository, containing all known genes from humans and other organisms in a form that can be expressed as proteins in a variety of experimental systems.
The repository is intended to "create a gold standard" for proteomics researchers worldwide, LaBaer said.
Hancock suggested that one way to organize the workload would be for programs to take on the proteomes of different organs or tissues. Germany has announced that it plans to characterize the proteome of the human brain, he said.