From The Plasma Proteome Institute, Washington, D. C. 20009-3450
ABSTRACT |
---|
Having lived through the recent superlatives of the human genome effort(s) and the expectations these generated, it might be thought unwise to use such hyperbole in the more modest world of proteins and proteomics. In fact, the exceptional nature of plasma does not lead us to the sin of self-congratulation insofar as we are in no imminent danger of completing its analysis or even of making optimal use of its diagnostic possibilities. At this stage, the combination of extreme analytical difficulty with well founded hopes for radical improvements in disease diagnosis provides a strong case for increased research effort and in particular some systematic means of accelerating an exploration that has been in process for many decades while so far yielding only a handful of medically useful nuggets.
Molecular biology, including the genome and proteome projects, is revolutionizing the biological and medical sciences, holding out the promise of both fully understanding and effectively treating all human diseases (1). These projects epitomize the ultimate goal of reductionist biology, which is a complete analysis and description of living systems at the molecular level. In the one case quasi-completed so far (the human genome), billions of dollars were raised, tens if not hundreds of thousands of patents were filed, and new large integrated laboratories were constructed and operated on a crash basis. And yet this is now generally concluded to have been simply laying the foundation for proteomics, a field that requires completely different technologies, sympathy for a very different sort of molecules, and ultimately a very different scale. We are currently in the phase of seeking shortcuts through proteomics analogous to the path that shotgun sequencing blazed through the genome, but without any guarantee that one exists.
Against this backdrop it may be useful to take a somewhat broader view than might be expected in a review of one particular proteome. Hence we have attempted to survey the larger context of the plasma proteome as well as the history and status of efforts to explore and use it medically. Finally, we have indulged in some speculation as to the kinds of efforts needed to reach the next stage in the analysis of plasma and its diagnostic applications.
In what follows we use the term "plasma" to embrace all the protein components of the blood soluble phase (excluding cells) and not as a prescription for a specific sample processing technique. We could have referred instead to the "serum" proteome but chose plasma because it is in a sense the larger, parent collection from which other related samples are derived.
DEFINING THE PLASMA PROTEOME |
---|
Elaborating on Putnam's classification from a functional viewpoint, we can classify the protein content of plasma into the following design/function groups.
Proteins Secreted by Solid Tissues and That Act in Plasma
The classical plasma proteins are largely secreted by the liver and intestines. A key aspect of plasma proteins is a native molecular mass larger than the kidney filtration cutoff (45 kDa) and thus an extended residence time in plasma (albumin, which is just larger than the cutoff, has a lifetime of about 21 days).
Immunoglobulins
Although the antibodies typically function in plasma, they represent a unique class of proteins because of their complexity: there are thought to be on the order of 10 million different sequences of antibodies in circulation in a normal adult.
"Long Distance" Receptor Ligands
The classical peptide and protein hormones are included in this group. These proteins come in a range of sizes, which may indicate a range of time scales for their control actions (i.e. rapid adjustment with small hormones such as insulin and slower adjustments with larger hormones such as erythropoietin).
"Local" Receptor Ligands
These include cytokines and other short distance mediators of cellular responses. In general these proteins have native molecular weights under the kidney filtration cutoff (and hence relatively short residence times in plasma) and appear to be designed to mediate local interactions between cells followed by dilution into plasma at ineffective levels. High plasma levels may cause deleterious effects remote from the site of synthesis, e.g. sepsis.
Temporary Passengers
These include non-hormone proteins that traverse the plasma compartment temporarily on their way to their site of primary function, e.g. lysosomal proteins that are secreted and then taken up via a receptor for sequestration in the lysosomes.
Tissue Leakage Products
These are proteins that normally function within cells but can be released into plasma as a result of cell death or damage. These proteins include many of the most important diagnostic markers, e.g. cardiac troponins, creatine kinase, or myoglobin used in the diagnosis of myocardial infarction.
Aberrant Secretions
These proteins are released from tumors and other diseased tissues, presumably not as a result of a functional requirement of the organism. These include cancer markers, which may be normal, non-plasma-accessible proteins expressed, secreted, or released into plasma by tumor cells.
Foreign Proteins
These are proteins of infectious organisms or parasites that are released into, or exposed to, the circulation.
Given this variety of classes of protein components, how many "proteins" are likely to be present in plasma? A reasonable calculation could be proposed in three stages. First, assume as a base line that there is a modest number (say 500) of true "plasma proteins" (the first group indicated above) and that each of these is present in 20 variously glycosylated forms (since most plasma proteins are heavily glycosylated) and in five different sizes (including precursors, "mature" forms, degradation products, and splice variants), yielding a total of 50,000 molecular forms. A second large set of components is contributed by tissue leakage: this is effectively the entire human proteome (say 50,000 gene products), each of these gene products having (on average) 10 splice variants, post-translational modifications, or cleavage products, yielding a further 500,000 protein forms. Finally consider the immunoglobulin class as containing perhaps 10,000,000 different sequences. At least in principle, plasma is thus the most comprehensive and the largest version of the human proteome. In comparison with the genome, its degree of complexity is reminiscent of the real number line as compared with the integers: in other words its complexity is not simply n-fold that of the genome but exists on another level entirely. This immense complexity does not doom current efforts to failure, however, because the measurement methods in many cases automatically simplify the picture, collapsing most of the fine variation to yield measurements of all the forms of a protein as one value or at most as a few special classes.
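The three-stage estimate can be tallied in a few lines. The sketch below is only a restatement of the round numbers assumed in the text; the function and parameter names are illustrative and are not part of any published method.

```python
# Back-of-envelope count of distinct protein forms in plasma.
# Every default value is one of the round-number assumptions from the text.

def plasma_protein_forms(
    classical_proteins=500,        # true "plasma proteins"
    glycoforms_per_protein=20,     # variously glycosylated forms
    size_forms_per_protein=5,      # precursors, mature forms, fragments, splice variants
    tissue_gene_products=50_000,   # tissue leakage: effectively the whole human proteome
    variants_per_gene_product=10,  # splice variants, modifications, cleavage products
    immunoglobulin_sequences=10_000_000,
):
    """Return the number of molecular forms contributed by each class."""
    classical = classical_proteins * glycoforms_per_protein * size_forms_per_protein
    leakage = tissue_gene_products * variants_per_gene_product
    return {
        "classical": classical,
        "tissue_leakage": leakage,
        "immunoglobulins": immunoglobulin_sequences,
        "total": classical + leakage + immunoglobulin_sequences,
    }


forms = plasma_protein_forms()
for name, count in forms.items():
    print(f"{name:>16}: {count:>12,}")
```

Under these assumptions the immunoglobulins dominate the total, so the estimate is far more sensitive to the assumed antibody diversity than to the counts chosen for the other two classes.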
Serum, the protein solution remaining after plasma (or whole blood) is allowed to clot, is very similar to plasma: prothrombin is cleaved to thrombin, fibrinogen is removed (to form the clot), and a limited series of other protein changes (mainly proteolytic cleavages) take place. We use the term plasma preferentially to refer to the soluble proteome of the blood because it is the parent mixture and because there may be persuasive reasons to avoid an in vitro proteolysis process (which may unexpectedly alter some proteins) as part of the preferred sample acquisition protocol.
At present we do not know much about the detailed relationship between plasma (the routinely available sample), the much larger extracellular fluid compartment (17 liters in the average person but practically unsampleable), and the lymph derived from extracellular fluid. Roughly 2.5 liters of lymph flow through the thoracic duct into the blood each day plus another 500 ml through other channels, bringing into the blood much of the protein output of organs like muscles or the liver. Although the total protein concentration of thoracic duct lymph is only about half that of plasma (and must generally be so to support the Starling equilibrium governing fluid transport out of the capillaries), it transports a great deal of protein and in particular contains 5–10 times as much lipoprotein as plasma. A comprehensive examination of the relationship between lymph and plasma by proteomics methods remains to be done.
A series of other body fluids including cerebrospinal fluid, synovial fluid, and urine (the ultimate destination of most of the <60-kDa protein material in plasma) share some of the protein content of plasma with specific local additions that reveal interesting clinical information. Unfortunately, these samples are more difficult to obtain in a useful state than plasma: collection of cerebrospinal fluid and synovial fluid are invasive procedures involving pain and some risk, while urine is more difficult to process to a useful sample quickly in a clinical setting (centrifugation to remove cells that can lyse if left in suspension, prevention of microbial growth, and concentration).
Is there only one plasma proteome, or are there many: arterial, venous, capillary, capillary in different tissues, etc.? This question is, in many ways, one of timing. Pharmacokinetic studies indicate that there is a central volume of vascular blood that circulates (and is presumably mixed) fairly quickly: the almost immediate appearance in most organs of magnetic resonance imaging or computed tomography imaging contrast agents injected as bolus doses into the venous circulation attests to a very rapid (seconds to a minute) homogenization of this volume of blood. Exchange with the larger volume of blood that is not in the major vessels takes longer and depends on transport of blood through the whole path of arteries to arterioles to capillaries to venules to veins and back through the heart. This process generally has a time scale of minutes to a few hours depending on the molecular weight of the protein and on the flow of extracellular fluid from the site of manufacture either to a nearby capillary or to the lymphatics. We would thus expect the "immediacy" of a protein marker in plasma to depend on its site of origin with time scales of minutes to hours possible for different molecules and sites.
A further important feature of the plasma proteome is that it is the furthest removed, among tissue proteomes, from the mRNA level. While many of the major plasma proteins are synthesized in the liver (and comprise many of its most abundant mRNAs (3)), it is known that their plasma levels correlate only poorly with message abundance in liver (4) and presumably even more poorly for proteins synthesized in smaller organs (individually or collectively). For these reasons, plasma is a biological system that can only be approached with protein methods and thus remains beyond the scope of DNA- or RNA-based diagnostics.
HISTORY OF SYSTEMATIC EXPLORATION |
---|
Early History and Chemical Methods (Fractionation)
Blood was first emphasized diagnostically by Hippocrates, who proposed that disease was due to an imbalance of four humors: blood, phlegm, yellow bile, and black bile. The importance of this idea was to propose a physical cause, and not a divine one, for human disease, and it remained basic to medical practice for over a thousand years. With Wöhler's synthesis of urea in 1828, the distinction between living matter and chemicals began to disappear, and with the enunciation of the cell theory by Schleiden and Schwann, the question of the location of disease could be productively revisited: Virchow described the cellular (as opposed to humoral) basis of disease and finally put an end to phlebotomy as general therapy.
Despite not being a humor or "vital principle," plasma remained a subject of interest throughout this period: in the 1830s Liebig and Mulder analyzed a substance called "albumin," in 1862 Schmidt coined the term "globulin" for the proteins that were insoluble in pure water, and in 1894 Gürber crystallized horse serum albumin (5). Within the last 100 years, two groups revolutionized plasma protein chemistry. One was the group of Cohn (6) and Edsall working on the preparation and fractionation of plasma; during the Second World War, they generated large amounts of albumin and gamma globulin for therapeutic use. The methods they developed are still used in the plasma fractionation industry described below. The second was the Behring Institute, which, using an unusual rivanol precipitation technique, discovered and prepared numerous human plasma proteins, made antibodies against them, and distributed these worldwide (7). This latter work, and the development of simple immunological methods for analysis, meant that researchers around the world could readily discover new correlations between the amounts of specific proteins and disease.
Enzyme Activities
Enzyme activities were detectable in body fluids long before the enzyme proteins could be isolated and studied (8). Alkaline and acid phosphatase activities were related to bone disease and prostate cancer, respectively, in the decades before 1950, and in 1955 the enzyme now called aspartate aminotransferase was detected in serum following acute myocardial infarction. The attraction of enzymes as analytes is the sensitivity with which the products of an enzymatic reaction can be detected and the lack of any necessity to fractionate the sample. Serum "chemistry analyzers" represented some of the first fruits of automation in clinical medicine and made possible the concept of batteries of tests rather than one or a very few tests ordered by the astute diagnostician. These evolved from the autoanalyzer developed by Leonard Skeggs (9) in the 1950s and commercialized by Technicon, through computer-controlled instruments such as the centrifugal fast analyzer (10), to the very sophisticated integrated instrument/reagent systems of today. Up to the present time, enzyme assays persist for protein markers of liver toxicity because they are so inexpensive relative to immunoassays (whose development requires development of specific antibody reagents instead of simple chemical substrates).
Enzyme assays have the advantage over all the other assay methods that they measure level of function rather than amount of a molecule: unfortunately many of the enzyme activities measured in plasma probably do not have a physiological function there but rather represent leakage of protein from tissues. Additional drawbacks of enzymatic assays are the difficulty of obtaining an estimate of the mass of protein involved (since results are in activity units), the difficulty of associating some activities with a single protein and hence with a specific source, and the lack of isotype information unless some electrophoretic or other separation precedes the enzyme detection. In any case, since a large proportion of proteins in plasma and elsewhere are not enzymes, alternative means are required to discover and measure them.
Antibodies and Monoclonal Antibodies
All proteins have unique surface shapes, and antibodies are nature's answer to accurate shape recognition. It appears to be possible to make a specific antibody to any protein provided that pure protein is available to immunize an animal (the inverse of the limitation encountered with enzymes). Proteins purified by fractionation are useful as antigens for the preparation of classical rabbit and goat polyclonal antibodies, and these antibodies provide the basis for simple immunochemical tests for each protein (e.g. radial immunodiffusion, rocket electrophoresis, or more recently automated nephelometry) as well as more complex and sensitive sandwich assays with enzymatic or radiochemical detection. These technologies provide a general solution (as demonstrated by the Behring Institute) to the problem of measuring one or more proteins individually in large numbers of samples provided that relatively pure protein is available in significant quantities as antigen.
The requirement for an isolated antigen was circumvented, however, following the introduction of monoclonal mouse antibodies by Köhler and Milstein (11) in 1975. The general supposition, confirmed in many situations, is that a monoclonal "sees" a specific epitope that is likely to occur on one protein (or potentially on its very close relatives). The antibody thus serves as a ready-made detection reagent for constructing a specific assay, a potential drug (if the epitope is on a therapeutic target), and an immunoadsorbent useful for isolating the protein from plasma. Many different monoclonals can be produced after immunization with a complex antigen mixture (which can be a tissue or a body fluid), and one can test each to determine whether it sees an antigen that can distinguish diseased samples from normal ones for example. The approach can be likened to screening a library of chemicals against a protein target to discover most new drugs: it allows one to be lucky if not smart.
As a protein (or more properly, epitope) discovery process, the monoclonal approach was more useful in the discovery of new tissue antigens shed into plasma than in finding new plasma proteins. The reason is presumably that many immunodominant proteins are present at very high abundance in plasma, while tissue extracts are not so rich in a few already known antigens. Thus, for example, many of the newer monoclonal-based cancer detection tests are used clinically before the underlying protein is identified by sequence or gene. A striking example of this phenomenon is cancer antigen 125 (CA 125),¹ used in the diagnosis of ovarian and other cancers. First reported in 1984 (12), this protein marker is in widespread clinical use and has been the subject of more than 2,000 scientific publications, yet its sequence was not elucidated until recently (13) in part because the protein is huge: more than a million daltons. Such monoclonal-based discoveries exist initially outside the boundaries of the current genomic/proteomic space. When markers of this type are identified, it is sometimes the case that two epitopes (for which there may be competing commercial tests) are on the same molecule. The CA 27.29, CA 15-3, and CASA assays, for example, all recognize antigenic determinants on the MUC1 mucin protein (14, 15). Current attempts to invert the monoclonal antibody generation process (i.e. expecting to make good antibodies to a list of specific proteins rather than embracing whatever the immune system selects as immunogenic in a mixture) have revealed that antibodies in general are idiosyncratic and not analogous to oligonucleotide probes in their painless generality.
Profiling and Proteomics
The use of analytical separations to look at the plasma proteome parallels very closely the development of the separations themselves: plasma is always among the first samples to be examined. Shortly after Svedberg, using the analytical ultracentrifuge, found that proteins had unique molecular weights, Tiselius found that serum could be fractionated into multiple components on the basis of electrophoretic mobility. His method of electrophoresis, first in liquid and then later in anticonvective media such as paper, cellulose acetate, starch, agarose, and polyacrylamide, has dominated the separative side of plasma proteome work until very recently, evolving through a series of one- and two-dimensional systems and finally to combinations with chromatography and mass spectrometry that generalize to n-dimensions. This evolution has resulted in an almost constant exponential increase in resolved protein species for the past 70 years (Fig. 1) within which one can distinguish at least three separate phases arising as increasingly complex separations were required to continue forward progress. We do not review in detail the "one-dimensional" phase of this development but begin with the era of "proteomics" and two-dimensional gels, which for 20 years have been the core of proteomic technology and the source of most published work on the plasma proteome.
The practical utility of 2-DE for studies of the high abundance plasma proteome has been substantial. Because the first dimension of the procedure (isoelectric focusing) is exquisitely sensitive to molecular charge and the second dimension (SDS electrophoresis) is sensitive to polypeptide length, 2-DE is very effective at revealing genetic variants (about one-third of which differ in net charge from wild type), proteolytic cleavages, and variations in sialic acid content. Several genetic variants have been discovered by 2-DE (23–25), and a broad survey concluded that heterozygosity estimates obtained by this method were significantly higher than those obtained in cellular protein samples (26), possibly reflecting a lesser selection pressure against structural change in the highly soluble plasma proteins.
Many proteins in plasma show complex combinations of post-translational modifications (particularly involving glycosylation) that can be discriminated by 2-DE. In the haptoglobin β chain, for example, three heterogeneities (attachment of 0, 1, 2, or 3 glycan structures; a unimodal distribution of sialic acid content; and two size forms) superimpose to create a complex but interpretable pattern of at least 43 resolved spots (27). The mean number of sialic acids present on plasma protein charge trains like haptoglobin β can be summarized by computing a charge modification index (28). The results indicate that sialation remains relatively constant within a healthy person over time but differs significantly between individuals as expected if glycosylation (particularly sialation level) is under genetic control.⁴ The analysis of glycoforms was extended by Gravel and Golaz (29, 30) using lectins and Western blots to probe the glycosylation of plasma proteins revealed in the 2-DE pattern, a technique that clearly demonstrated the changes in plasma protein glycosylation associated with alcoholism and cirrhosis (31). In patients with the genetic disease carbohydrate-deficient glycoprotein syndrome this expectation is confirmed by a systematic loss of highly glycosylated forms of many proteins (32, 33), something so characteristic in the 2-DE pattern and so difficult to see by other means that the technique found its first unique clinical use in the diagnosis of this disease. A similar but less extensive loss of glycosylation (transferrin, haptoglobin β, and α1-antitrypsin) occurs with alcoholism (34), while the more serious liver pathology of cirrhosis (31) causes declines in the net amounts of haptoglobin and albumin, the most abundant product of the liver.
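To make the idea of summarizing a charge train by a single number concrete, the sketch below computes an abundance-weighted mean charge shift across the spots of a train. This formula is an assumption made for illustration: the definition actually used for the charge modification index is given in ref. 28 and may differ in detail. The function name and the spot abundances are hypothetical.

```python
# Illustrative summary of a 2-DE charge train as one number.
# ASSUMPTION: the index is taken as the abundance-weighted mean charge shift;
# consult ref. 28 for the published definition.

def charge_modification_index(spot_abundances):
    """spot_abundances maps charge shift (e.g. number of added sialic
    acids, 0, 1, 2, ...) to the measured abundance of that spot."""
    total = sum(spot_abundances.values())
    if total <= 0:
        raise ValueError("total spot abundance must be positive")
    weighted = sum(shift * amount for shift, amount in spot_abundances.items())
    return weighted / total


# Hypothetical train: four spots carrying 0 to 3 extra negative charges.
train = {0: 10.0, 1: 30.0, 2: 40.0, 3: 20.0}
cmi = charge_modification_index(train)
print(f"mean charge shift: {cmi:.2f}")
```

A single index of this kind lets trains from different individuals or time points be compared directly, which is the sort of comparison described above for sialic acid content.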
A series of 2-DE studies have examined aspects of the acute phase response in which many plasma proteins increase or decrease following a range of inflammatory insults. Changes in the plasma concentrations of acute phase proteins such as serum amyloid A have been found to be associated with severe head injury (35, 36), meningitis due to infection by Haemophilus influenzae type b (37), normal neonatal development (38), sepsis in newborns (39), and viral infections (40). The latter work by Bini et al. (40) indicates particularly interesting differences between the response of the body to bacterial and viral infection: 18 of 18 bacterial infections gave rise to elevated serum amyloid A, while only 6 of 16 viral infections did, and while the acute phase changes in other proteins were simultaneous in bacterial infection, they were staggered in viral infection. Other work showed that the plasma proteome changes associated with rheumatoid arthritis are very similar to those provoked by typhoid vaccination (a classic acute phase inducer in humans) and that reversal of these changes could be used to assess the efficacy of non-steroidal anti-inflammatory drugs (41). These results indicate both the generality of the acute phase response and the power of proteomics to subdivide its features in diagnostically useful ways.
Many other disease states and developmental processes have been examined. Tissot et al. demonstrated characteristic changes in the plasma 2-DE pattern indicative of monoclonal gammopathies, hypergammaglobulinemia, hepatic failure, chronic renal failure, and hemolytic anemia (42) as well as progressive changes during fetal development (43). In vivo covalent binding of ampicillin to multiple plasma proteins (not simply albumin) was demonstrated (44), providing a general method to probe covalent drug binding in plasma following drug treatment. Other changes have been reported associated with malnutrition (45), haptoglobin in Duchenne muscular dystrophy (46), haptoglobin in Down syndrome (47), apoA-I during parturition (48), return from extended space flight (49) (possibly an acute phase response to resuming 1 × g), apoA-I isoforms in heart disease (50), human chorionic gonadotropin isoforms in patients with trophoblastic tumors (51), apoE isoforms in chronic spinal pain (52), and oxidized plasma proteins in Alzheimer's disease (53). Important parallel work on the rat plasma proteome and its perturbation by experimental treatments (not reviewed in detail here) has been carried out by Gianazza and colleagues (54, 55).
The immunoglobulins represent a special class of plasma proteins because of the tremendous sequence heterogeneity involved in generating their functional specificities: an estimated 10⁷ different forms may be in circulation in a healthy human at one time. Individual Ig light chain sequences typically appear as single spots in 2-DE (56), thus allowing up to a few hundred "overabundant" sequences to be inspected as indicators of the activity of individual Ig-producing B-cell clones, while the heavy chains of all classes are glycosylated, adding an additional heterogeneity that prevents their being resolved and used as clonal indicators. Thus the position (pI and molecular weight) of a heavy chain is indicative of its class (18, 57–60) (i.e. falling in a band associated with µ, γ, α, δ, or ε chains representing IgM, IgG, IgA, IgD, or IgE, respectively) and sometimes its chain subclass (61), but each sequence appears as a charge train overlapping others. The work on class properties by 2-DE led to successful clinical applications in detecting and classifying myelomas (clonal B-cell tumors in which a single Ig molecule is vastly overproduced and where prognosis is related to heavy chain class) because a simple quantitative test for the Ig class is not sufficient and low resolution methods may not be able to determine whether the protein is truly clonal. However, to follow the overall clonal development of the immune system it has been necessary to characterize the light chains, whose clonal spots occur in two partially overlapping bands (representing the κ and λ classes). Tissot et al. observed transient "clonal imbalances" in the light chains of patients after immune system reconstitution by bone marrow transplantation (62) and in normal individuals in the 2nd to 4th months after birth (63), both of which are consistent with an early expansion of selected B-cell clones in response to a few new dominant antigenic stimuli, followed weeks or months later by the emergence of the fully mature, heterogeneous response to all the antigens of daily life. The same phenomenon occurs when mice are removed from the protection of a germ-free facility and exposed to the antigens and infectious agents of the normal world, and a similar result appears in the Ig patterns of some cancer patients.⁴ One effort (64) has been made to develop a quantitative measure of the "texture" of the complex 2-DE pattern of light chain spots (classifying patterns as monoclonal, oligoclonal, or normal), but the more difficult challenge of tracing the time course of multiple clones in an individual over time (likely to be particularly relevant in cancer) has not yet been successfully addressed.
Mass Spectrometry
MS has solved the problem of identifying proteins resolved by two-dimensional gel and other methods and appears poised to provide general solutions to the analysis of complex protein mixtures as well. In the latter category, two general classes of approach can be distinguished: first, the "unbiased" discovery of proteins and peptides achieved via their detection or identification in a sample, and second, the quantitative measurement of protein or peptides, usually requiring some type of additional standardization. In many cases these methods are just beginning to be applied to the plasma proteome but offer sufficient promise for future work that their review here is necessary.
MS Discovery
The power of mass spectrometry techniques to discover proteins in complex samples relies, with one notable exception described below, upon the existence of large protein sequence databases generally derived from DNA sequencing efforts. Since these databases are becoming comprehensive, the approach offers, at least in theory, a general solution to protein discovery. So far MS efforts have examined three basic windows into the proteome problem: whole proteins, peptide fragments obtained by digesting proteins in vitro (e.g. with trypsin), and naturally occurring peptides (the low molecular weight proteome, or peptidome).
Whole proteins can be analyzed by an approach termed SELDI-TOF (for surface-enhanced laser desorption ionization-time of flight) mass spectrometry, a variant of MALDI-TOF (for matrix-assisted laser desorption ionization-time of flight), in which chemical fractionation based on protein affinity for derivatized MS targets is used to reduce sample complexity to a level at which whole protein MS can resolve a series of fairly reproducible features. This approach has been implemented commercially (e.g. by Ciphergen and Lumicyte) and used to discover patterns of protein features associated with disease (e.g. the ovarian cancer markers discovered by Petricoin et al. (65)). A significant disadvantage of the approach is that MS analysis of whole proteins does not directly provide a sequence-based identification (there being many proteins with close to a given mass), and hence the protein peaks discovered as markers are not strictly speaking identified without significant additional effort. In particular, without a discrete identification, it is not generally possible to demonstrate that a peak is one protein analyte or to translate the measurement into a classical immunoassay format. However, as has been clearly demonstrated by the success of monoclonal-based assays, this does not pose a significant limitation to clinical use if the technology allows the analysis to be repeated in any interested laboratory (an effort that now appears to be underway).
A more general approach involves digesting proteins (e.g. with trypsin) into peptides that can be further fragmented (MS/MS) in a mass spectrometer to generate a sequence-based identification. The approach can be used with either electrospray ionization (ESI) or MALDI and is typically applied after one or more dimensions of chromatographic fractionation to reduce the complexity of peptides introduced into the MS at a given instant. Optimized systems of multidimensional chromatography, ionization, mass spectrometry, and data analysis (e.g. the multidimensional protein identification technology, or "MudPIT" approach of Yates, also referred to as shotgun proteomics) have been shown to be capable of detecting and identifying 1,500 yeast proteins in one analysis (66), while a single dimensional LC separation, combined with the extremely high resolution of a Fourier-transform ion cyclotron resonance MS identified more than 1,900 protein products of distinct open reading frames (i.e. predicted proteins) in a bacterium (67). In human urine, a sample much more like plasma than the microbial samples mentioned above, Patterson (68, 69) used a single LC separation ahead of ESI-MS/MS to detect 751 sequences derived from 124 different gene products. Very recently, Adkins et al. (70) have used two chromatographic separations with MS to identify a total of 490 different proteins in human serum, thus substantially expanding the proteome. Such methods should have the ability to deal with the numerous post-translational modifications characteristic of many proteins in plasma as demonstrated by their ability to characterize the very complex post-translational modifications occurring in aging human lens (71) (73 modified sites characterized in 11 crystallins). The sensitivity obtainable in such an analysis has been tested by Wu et al. (72) using human growth hormone spiked into human plasma at a concentration of 16 ng/ml. 
Using only a reverse phase separation to resolve a tryptic digest of whole spiked plasma, a single human growth hormone peptide was observed (among 200+ proteins apparently identified), and with additional fractionation processes, additional human growth hormone peptides were seen, confirming its detection.
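The digestion step that underlies these peptide-based methods can be illustrated in silico. The sketch below is a minimal illustration, not part of any cited workflow: it applies the commonly used rule that trypsin cleaves on the C-terminal side of lysine (K) or arginine (R) unless the next residue is proline (P). The function name and the example sequence are hypothetical.

```python
import re


def tryptic_digest(sequence: str) -> list[str]:
    """Split a one-letter protein sequence into tryptic peptides.

    Assumed rule: cleave after K or R, except when the next residue is P.
    Missed cleavages and modifications are ignored in this sketch.
    """
    # Zero-width split point: preceded by K or R and not followed by P.
    pieces = re.split(r"(?<=[KR])(?!P)", sequence)
    # A cleavage site at the very end of the sequence yields an empty piece.
    return [p for p in pieces if p]


# Hypothetical sequence, chosen only to exercise the rule.
peptides = tryptic_digest("AAKRPLLRGGK")
print(peptides)
```

Each resulting peptide is what would then be fragmented (MS/MS) to obtain a sequence-based identification.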
Naturally occurring peptides, typically below the kidney size cutoff and hence usually collected from urine or from blood hemodialysate, provide a complementary picture of many events at the low mass end of the plasma proteome. Thousands of liters of human hemodialysate can be collected from patients with end stage renal disease undergoing therapeutic dialysis (73), and although it contains only 50 µg/ml protein/peptide material, it provides a large scale source of proteins and peptides below 45 kDa. Such material has been analyzed by combined chromatography and MS approaches to resolve 5,000 different peptides, including fragments of 75 different proteins (74). Fifty-five percent of the fragments were derived from plasma proteins, and 7% of the entries represented peptide hormones, growth factors, and cytokines.
MS Quantitation
The well known idiosyncrasies of peptide ionization, which have been major impediments to accurate quantitation by mass spectrometry, can be overcome through the use of stable isotope-labeled internal standards. At least four suitable isotopes (2H, 13C, 15N, and 18O) are commercially available in suitable highly enriched (>98 atom %) forms. In principle, abundance data as accurate as that obtained in MS measurement of drug metabolites with internal standards (coefficients of variation <1%) should ultimately be obtainable. In the early 1980s 18O-labeled enkephalins were prepared and used to measure these peptides in tissues at ppb levels (75), and in the 1990s GC/MS methods were developed to precisely quantitate stable isotope-labeled amino acids, and hence protein turnover, in human muscle (76, 77) and plasma (78, 79) proteins labeled in vivo. The extreme sensitivity and precision of these methods suggested that stable isotope approaches could be applied in quantitative proteomics investigations given suitable protein or peptide labeling schemes.
Over the past several years, a variety of such labeling strategies have been developed. The most straightforward approach (incorporation of label to a high substitution level during biosynthesis) has been successfully applied to microorganisms (80, 81) and mammalian cells in culture (82) but is unlikely to be usable directly in humans for cost and ethical reasons. A related approach (which is applicable to human proteins) is the now-conventional chemical synthesis of monitor peptides containing heavy isotopes at specific positions. Postsynthetic methods have also been developed for labeling of peptides to distinguish those derived from an "internal control" sample from those derived from an experimental sample with a labeled/unlabeled pair subsequently being mixed and analyzed together by MS. These methods include Aebersold's isotope-coded affinity tag approach (83) as well as deuterated acrylamide (84) and N-alkylmaleimides (85) for labeling peptide sulfhydryls, deuterated acetate (86) to label primary amino groups, N-terminal-specific reagents (87), permethyl esterification of peptide carboxyl groups (88), and addition of twin 18O labels to the C terminus of tryptic peptides during cleavage (89, 90).
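The arithmetic behind internal standard quantitation is simple and can be sketched as follows. This is an illustration under stated assumptions, not a published protocol: it assumes the labeled and unlabeled forms of a peptide ionize identically, so that the ratio of their peak areas equals the ratio of their amounts. The function name, parameter names, and numbers are hypothetical.

```python
def analyte_amount(light_area: float, heavy_area: float, heavy_spiked: float) -> float:
    """Estimate the amount of unlabeled (light) peptide in a sample.

    light_area   -- MS peak area of the endogenous, unlabeled peptide
    heavy_area   -- MS peak area of the stable isotope-labeled standard
    heavy_spiked -- known amount of labeled standard added (any unit)

    Assumes equal ionization response for the light and heavy forms.
    The result is in the same unit as heavy_spiked.
    """
    if heavy_area <= 0:
        raise ValueError("labeled standard was not detected")
    return light_area / heavy_area * heavy_spiked


# Hypothetical numbers: the light peak is twice the heavy peak,
# and 100 fmol of labeled standard was added.
print(analyte_amount(3.0e5, 1.5e5, 100.0))
```

Because both forms are measured in the same spectrum, variation in ionization efficiency largely cancels in the ratio, which is why the labeled standard improves precision.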
In the realm of the plasma proteome, as noted above for MS discovery approaches, the principal applications have occurred in commercial companies and have not yet been published. Nevertheless we conclude that MS quantitation methods are mature enough to successfully measure tens to hundreds of proteins in plasma on an experimental basis once suitable standards are available.
Antibody Arrays
Antibody microarrays were developed by Ekins (91) in the mid-1980s (i.e. before DNA microarrays) for measuring proteins in plasma, thus miniaturizing as "microspots" the well established immunoassay concept. At the time of these developments, the number of proteins of clinical utility was relatively small. Now, however, with the expected proliferation of useful protein analytes, there is renewed interest in the concept in a variety of physical formats (for a recent review, see Ref. 92). Detection sensitivity can be very good as demonstrated by Schweitzer et al. (93) who recently showed that 75 different cytokines could be measured (using glass-arrayed monoclonal capture antibodies, polyclonal second antibodies, and rolling circle amplification) with sensitivities ranging from 1 to 1,000 pg/ml. This suggests that such systems could be used to measure all of the known plasma proteome components for which high affinity antibody pairs are available. The critical issue is the latter: the generation of antibodies suitable for high sensitivity immunoassays. Even given the resources of large clinical diagnostic organizations, the generation of such antibodies and assays using them is a painstaking process considerably more difficult than getting an antibody usable in most laboratory research. The extensive effort devoted in the past to making good cytokine antibodies accounts for the fact that most antibody array demonstrations focus on this application and also indicates why antibody arrays have not so far made a major contribution to protein discovery in plasma. Nevertheless, as the repertoire of suitable antibodies expands, microarrays provide the most likely path to low cost, routine measurement of large numbers of plasma markers required for an impact on medical practice.
Arrays of large numbers of expressed antigens have also been used (94) very effectively to define populations of autoantibodies present in human plasma. This approach improves on the classical gel Western blot by presenting all the antigens in a more native state and at approximately equal abundances, thus enabling a much more informative picture of the activity of the immune system.
The Plasma "Interactome"
As noted below, almost half the known proteins in plasma are too small to persist there as monomers, and many of the larger proteins are known to exist in complexes as well. Hence it is likely that a large majority of the plasma proteome exists in multiprotein complexes. In addition there are in plasma a significant number of proteins defined by their ability to bind others and thereby affect function (e.g. the tumor necrosis factor-binding proteins, insulin-like growth factor-binding proteins, etc.). Such interactions have been studied using a variety of classical and modern techniques. The best known particulate complexes of plasma are the plasma lipoprotein particles, whose protein components have been enumerated by 2-DE (95). An alternative approach, in which 2-DE-resolved plasma proteins are stained with a specific protein ligand, has been used to characterize and extend the set of insulin-like growth factor-binding proteins of rat serum (96). Recently elegant and systematic methods have been devised for enumerating protein complexes in yeast using a combination of molecular tags to capture individual complexes followed by LC/MS/MS (with or without a gel separation step) to analyze them (97). These methods are not directly applicable to plasma (since they require alteration of a gene to introduce the peptide tag during biosynthesis), but specific antibodies can substitute for such tags in many cases, used either for "pull-down" of the complex or binding it to a sensor surface. In one successful example of the latter approach, immobilized antibodies to transthyretin captured both transthyretin and its interaction partner retinol-binding protein as indicated by MALDI-MS of the bound material (98).
While the problem of distinguishing possible antibody cross-reactions from weak associations of the target proteins will always need to be addressed, it is nevertheless likely that a systematic plasma interactome will be built using antibodies to define the associations between the hundreds of proteins detected.
Genomics
It may be easier to search for and measure a predicted protein at very low abundance in plasma (e.g. via a sensitive immunoassay) than to detect it using an unbiased discovery approach. Hence there is value in predicting, if possible, all the secreted human proteins as candidates for such a quantitative approach. We expect the in silico analysis of human genome sequences to contribute a significant number of previously unknown candidate proteins to be sought in plasma. Several efforts have been made to identify secreted proteins via signal sequence features (either including or excluding the extracellular domains of appropriate transmembrane proteins), and these have variously generated 1,000 (99) to 9,000 (footnote 5) potential secreted proteins. To date, however, this information has been viewed primarily as a potential commercial source of therapeutic targets, and its relevance to expansion of the plasma proteome from a measurement perspective awaits public release of detailed data.
THE KNOWN PLASMA PROTEOME |
---|
One of the revealing features of this collection of proteins is its molecular mass distribution (Fig. 2), which shows that a substantial fraction of proteins in plasma (just over 50%) are smaller than the presumed size cutoff of the kidney filtration apparatus (45 kDa). This suggests that, to persist in plasma without rapid loss through the kidney, most of these proteins must be part of larger protein complexes or else subject to specific retention mechanisms.
DYNAMIC RANGE OF PROTEINS IN PLASMA |
---|
There can be a significant division of opinion over whether the low abundance plasma components are more "interesting" or clinically meaningful than those of high abundance (or vice versa). In fact we know that proteins at all abundance levels prove to be useful as evidenced by the existence of clinical assays for all those plotted in Fig. 3. More important questions are where the proteins in the various abundance classes come from and how many of them there are. Here the clinically measured proteins seem to be divided into three major classes: plasma proteins at the high abundance end, tissue leakage proteins in the middle, and cytokines at the low abundance end.
Tissue leakage proteins are important because a serious pathology can be detected in a small volume of tissue by measuring release into plasma of a high abundance tissue protein. Cardiac myoglobin (Mb) is present in plasma from normal subjects at 1-85 ng/ml but is increased to 200-1,100 ng/ml by a myocardial infarction (102) and up to 3,000 ng/ml by fibrinolytic therapy to treat the infarct (103). The last figure perhaps best represents a complete release of highly soluble (>99% cytosolic (104)) Mb as a result of clearing blocked vessels perfusing the damaged tissue volume. Given an average blood volume of 4.5 liters in a 70-kg male (105) and an average volume proportion of plasma in blood of 55%, there are 2.5 liters of plasma in the average person, containing roughly 250 g of plasma protein. A level of 3.0 µg/ml Mb is equivalent to the release and dilution of 7.5 mg of Mb into the whole plasma volume, which, since Mb accounts for 0.28% of total cardiac protein (106), would be equivalent to complete release of Mb from about 2.7 g of cardiac tissue. This is clearly an underestimate since moderate infarct masses are typically about 45 g (107), and thus it is probable that something like 5% of the releasable Mb is in circulation at the relevant time point (a marker recovery efficiency of about 5%). A protein such as troponin-T, of which only 8% is cytosolic (104), is released at a 300-fold lower level (about 10 ng/ml following reperfusion therapy for myocardial infarction (108)), indicating a much lower marker recovery efficiency. In the future, all tissue proteins will be candidate damage markers, and they will be sought at even lower abundance levels, raising important questions about the normal background levels of tissue destruction and remodeling in healthy individuals.
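The myoglobin estimate above can be retraced step by step. The sketch below simply restates the text's arithmetic with its own figures (3.0 µg/ml peak Mb, 2.5 liters of plasma, Mb at 0.28% of cardiac protein, a 45-g infarct); like the text, it treats grams of cardiac protein and grams of cardiac tissue interchangeably, which is a simplification. The function and parameter names are hypothetical.

```python
def mb_release_estimate(
    peak_mb_ug_per_ml: float = 3.0,
    plasma_volume_l: float = 2.5,
    mb_fraction_of_cardiac_protein: float = 0.0028,
    infarct_mass_g: float = 45.0,
):
    """Return (Mb released in mg, tissue equivalent in g, recovery fraction)."""
    plasma_volume_ml = plasma_volume_l * 1000.0
    # Total Mb diluted into the plasma volume, converted from ug to mg.
    released_mg = peak_mb_ug_per_ml * plasma_volume_ml / 1000.0
    # Mass of cardiac material that would contain that much Mb.
    tissue_equivalent_g = released_mg / 1000.0 / mb_fraction_of_cardiac_protein
    # Share of the infarct's releasable Mb that is in circulation.
    recovery = tissue_equivalent_g / infarct_mass_g
    return released_mg, tissue_equivalent_g, recovery


released_mg, tissue_g, recovery = mb_release_estimate()
print(f"{released_mg:.1f} mg Mb, {tissue_g:.1f} g tissue equivalent, "
      f"recovery {recovery:.1%}")
```

With these inputs the recovery comes out near 6%, consistent with the text's "something like 5%".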
Cytokines, which in general act locally (at the site of infection or inflammation), are probably not active at their normal plasma concentrations (or even at the higher levels after a major local release) because they are diluted from microliter or milliliter volumes of tissue into 17 liters of interstitial fluid. Hence they are in a sense leakage markers as well, although their presence in plasma does not indicate cell breakage.
The question of sensitivity ultimately required for plasma proteome measurement is not easy to answer. The theoretical limit is clearly calculable as a single protein molecule (of e.g. 60 kDa) in a practically analyzable plasma sample (e.g. 1 ml): this is 10⁻⁷ pg/ml (about 1 zeptomolar), or 7-8 orders of magnitude more sensitive than current immunoassay technologies. A technology to make such measurements would require a dynamic range of more than 10¹⁷ (between the analyte and albumin), exceeding even the capabilities of accelerator mass spectrometry (109) and approaching some hard physical limits. Nevertheless it is clear that sub-pg/ml sensitivities (low femtomolar concentrations and below) will be needed to allow measurement of many of the tissue proteins that might leak into the plasma. Whether this is compatible with a simultaneous capability to measure many proteins remains to be seen.
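The single-molecule limit quoted above follows directly from Avogadro's number. The sketch below reproduces it for a 60-kDa protein in 1 ml. The albumin concentration of 40 mg/ml used for the dynamic range is an assumed typical value, not a figure taken from the text, and the function name is hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def single_molecule_limit(mass_da: float = 60_000.0, volume_ml: float = 1.0):
    """Concentration of one protein molecule in the given sample volume.

    Returns (mass concentration in pg/ml, molar concentration in mol/l).
    """
    grams_per_molecule = mass_da / AVOGADRO
    pg_per_ml = grams_per_molecule * 1e12 / volume_ml
    moles = 1.0 / AVOGADRO
    molar = moles / (volume_ml / 1000.0)
    return pg_per_ml, molar


pg_per_ml, molar = single_molecule_limit()

# Assumption: albumin at roughly 40 mg/ml, i.e. 4e10 pg/ml.
albumin_pg_per_ml = 40.0 * 1e9
dynamic_range = albumin_pg_per_ml / pg_per_ml

print(f"{pg_per_ml:.2e} pg/ml, {molar:.2e} M, dynamic range {dynamic_range:.1e}")
```

The result is close to 10⁻⁷ pg/ml and on the order of a zeptomolar (10⁻²¹ M), with a dynamic range relative to albumin above 10¹⁷, matching the figures in the text.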
PLASMA PROTEIN THERAPEUTICS |
---|
Additional plasma products include α1 protease inhibitor (a treatment for emphysema caused by genetic deficiency; see footnote 3), factor VIII concentrate (for prophylaxis and treatment of hemophilia A bleeding episodes and von Willebrand disease), anti-inhibitor coagulant complex (for treatment of bleeding episodes in the presence of factor VIII inhibitor), anti-thrombin III (for prevention of clotting and thromboembolism associated with liver disease and anti-thrombin III deficiency), factor IX complex (for prophylaxis and treatment of hemophilia B bleeding episodes), factor XIII (for prophylaxis and treatment of bleeding episodes due to factor XIII deficiency), and fibrin sealant (which helps to heal wounds following surgery) as well as a series of specific immune globulin products (e.g. hepatitis B, chicken pox, measles, rabies, tetanus, vaccinia, hepatitis A, and cytomegalovirus). In principle, many other components of the plasma proteome could become useful therapeutic products produced by similar methods.
PLASMA DIAGNOSTICS |
---|
The use of plasma and serum for disease diagnosis is thus an obvious approach and one that has been undertaken with some success for many decades (110). Nevertheless, the potential of the "direct" approach to plasma proteome diagnostics (finding and measuring a single protein that is unambiguously associated with a disease) seems to be waning. This is both a challenge and an opportunity requiring us to inquire carefully into the successes and the failures of existing diagnostics as well as the biological foundation of the protein marker approach.
The Decline in New Protein Tests
Effective diagnostic tests for proteins in plasma, of which there are many, ultimately appear as commercial products sold to clinical laboratories worldwide. Since 1993, a total of just under 7,000 assays applicable to the determination of 117 proteins or peptides in plasma have been approved by the FDA under the Clinical Laboratory Improvement Amendments (CLIA) that govern medical use of clinical tests in the United States. Most of these assay products were in existence before the CLIA regulations came into effect, and since 1994 only 10 new protein analytes have been added (Fig. 4). Since 1998, only two new protein analytes have been the subject of such an FDA approval: B-type natriuretic peptide for use in diagnosing congestive heart failure and a cancer-related epitope (CA 19-9) recognized by a monoclonal antibody. It is true that not all new protein diagnostic tests are implemented as machine-based, FDA-approved assays: some are implemented through "analyte-specific reagents" (typically antibodies) sold to clinical laboratories for implementation of in-house tests. However, since a test based on analyte-specific reagents must be labeled on the laboratory result report as "not cleared or approved by the FDA," they represent a very modest proportion of clinical diagnostics, and the FDA does not provide a public database listing analyte-specific reagent analytes. Thus we believe it is correct to conclude that the pace of introduction of new protein markers through full FDA approval has substantially slowed in recent years (Fig. 5), reminiscent of the similar decline in introduction of new therapeutic drug classes. This picture is clearly at odds with our expectation that genomics and proteomics are transforming the clinical landscape through diagnostic application of knowledge on large numbers of new proteins. What accounts for this huge discrepancy?
Interpretation of Diagnostic Protein Measurements: What Is Considered Abnormal?
The utility of a protein measurement is currently determined by the definition of what is and what is not in the reference interval (a term adopted as a replacement for the sometimes ambiguous "normal range"). It is conventional practice in clinical chemistry to assign the reference interval based on an analysis of the distribution of measurement results obtained by analyzing a large number of samples from people that satisfy some established criteria making them normal with respect to the test in question. The reference interval is typically set as the range (about 2 standard deviations above and below the mean value) within which 95% of the results from the normal population lie: 2.5% of the reference population will be classified as above the normal range, and 2.5% will be classified as below.
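The relationship between "about 2 standard deviations" and the central 95% of a population can be checked numerically. The sketch below assumes the reference population's values are approximately normally distributed, which is an assumption of this illustration and does not hold for every analyte. The function name and sample values are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def reference_interval(values, coverage: float = 0.95):
    """Central interval expected to contain `coverage` of a normal population."""
    # For 95% coverage the multiplier is about 1.96, i.e. roughly 2 SD.
    z = NormalDist().inv_cdf(0.5 + coverage / 2.0)
    m, s = mean(values), stdev(values)
    return m - z * s, m + z * s


# Fraction of a normal population lying within exactly 2 SD of the mean.
within_two_sd = NormalDist().cdf(2.0) - NormalDist().cdf(-2.0)
print(f"within 2 SD: {within_two_sd:.3f}")

# Hypothetical measurements from a small reference group (arbitrary units).
low, high = reference_interval([1.0, 2.0, 3.0, 4.0, 5.0])
print(f"reference interval: {low:.2f} to {high:.2f}")
```

By construction, about 2.5% of healthy reference subjects fall above such an interval and about 2.5% below it.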
The concept that an individual's plasma level of a particular protein should be within ±2 standard deviations of the reference population's mean value is obviously of limited value, a fact that has been recognized by clinical chemists for more than 20 years (111, 112). Because of the differences among normal people (due to genetics, age, sex, and life history including disease) there are likely to be significant differences in the normal, healthy levels for many individuals. This has two effects: it broadens the population's distribution of values (and thus requires larger variations to be treated as significant), and it misclassifies outliers who are healthy. A simple reference interval cutoff will misclassify many of these as abnormal when they are normal for themselves and miss many abnormal values when disease shifts a normal outlier into the population normal range. An example of this kind of data is shown in Fig. 6 using published data for C-reactive protein measured every 3 weeks for 6 months in a number of individuals (113).
There is also a significant theoretical problem with the notion that there should be a single protein in plasma whose level changes in response to one specific disease. If the disease is caused by the protein abnormality, as in insulin-dependent diabetes, the reasoning is clear. However, for multifactorial diseases, such as atherosclerosis, or diseases whose cause is likely to be tissue-based, such as osteoporosis, it seems improbable that there should be a very specific single marker.
Contribution of Genetics to the Noise in Protein Measurements and Its Removal
Many influences affect the concentration of proteins in plasma besides disease, and if these are not taken into account, the detection of medically meaningful changes will be more difficult. Influences can be considered as 1) genetic, 2) non-genetic other than medical treatment, 3) related to medical treatment, such as drug therapy, and 4) related to sample handling. The last is typically a small effect since methods of venipuncture typically used in studies of plasma proteins are well standardized, and a recent report demonstrates that extended storage at -70 °C preserves the structure and activity of plasma proteins tested (115).
The effects of differing genetic constitutions are evident both at the population level (e.g. race (116, 117) differences in plasma protein abundances) and the individual level. Twin studies, in which quantitative protein measurements are compared within and between monozygotic twin pairs, have been carried out for a small number of individually assayed plasma proteins as well as by simple one-dimensional electrophoresis (118) (Table II). In aggregate these studies show that 12-95% of the quantitative variation in specific plasma protein levels is genetic in origin with an average of 62% for the proteins shown. Gel patterns of plasma proteins from monozygotic twins are quantitatively almost indistinguishable whether analyzed by one-dimensional (118) or two-dimensional approaches.
The effects of drugs on protein levels have also been examined extensively and tabulated periodically by Young (136, 137). The effects observed can be more or less direct as when an intramuscular drug injection causes an increase in muscle enzymes in plasma through leakage or localized tissue toxicity or when a diuretic alters total protein concentration by changing blood volume. Alternatively they can result from regulatory changes in target organs as when liver enzymes are elevated (a side effect associated with a substantial proportion of commonly prescribed drugs, albeit in only a subset of patients). Such effects of drugs should not be thought of as noise because they represent real biological effects and because these effects may (indeed should) be related (ideally inversely) to the disease effects they are meant to counteract.
Of these influences, genetic effects on protein abundance in plasma are usually the largest and in many ways the least useful from the individual's point of view. Effects of environment or drug administration are either small or else useful as signals of medically relevant events. Taken together these considerations suggest that the most productive approach to avoid the difficulties of the classical reference range problem is to use the individual as his or her own reference, comparing measurements taken over time to see changes associated with disease or treatment. Such an approach would eliminate the genetic effects on protein levels and go some distance toward minimizing many of the non-genetic influences as well since most lifestyle changes are made slowly. In particular, it appears clear that individual reference intervals could be decreased on average about 50% if a self-standardized approach were taken, thereby potentially improving the detection of many disease states at an earlier stage.
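The advantage of a self-standardized interval can be illustrated with a toy comparison. The data below are invented for illustration only (three hypothetical people with stable but different personal baselines) and are not taken from any cited study; the helper names are likewise hypothetical. The point is that a value can sit inside the pooled population interval while lying well outside the individual's own history.

```python
import statistics


def two_sd_interval(values, k: float = 2.0):
    """Mean plus or minus k sample standard deviations."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return m - k * s, m + k * s


def is_outside(value, interval) -> bool:
    low, high = interval
    return value < low or value > high


# Invented serial measurements (arbitrary units) for three people.
history = {
    "A": [10.0, 11.0, 9.0, 10.0],
    "B": [20.0, 21.0, 19.0, 20.0],
    "C": [30.0, 29.0, 31.0, 30.0],
}

pooled = [v for series in history.values() for v in series]
population_interval = two_sd_interval(pooled)
personal_interval = two_sd_interval(history["A"])

# A new reading for person A that is high for A but unremarkable overall.
new_value = 14.0
flag_population = is_outside(new_value, population_interval)
flag_personal = is_outside(new_value, personal_interval)
print(flag_population, flag_personal)
```

In this toy case the population interval does not flag the new reading, while the personal interval does, which is the behavior the self-standardized approach is meant to exploit.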
Multiparameter Indicators
An additional improvement, both in terms of statistical power and breadth of coverage, is to use many protein measurements instead of one or a few. Several streams of scientific effort have generated data supporting this conclusion. More than 20 years ago it became clear that different tumor cell types could be distinguished based on patterns of metabolites analyzed by GC/MS (138) and that a panel of biochemical markers (most of them simple low molecular weight clinical analytes) could "recognize" individuals within a group over periods of years (139) when analyzed by appropriate multivariate statistical methods. There were efforts to use the latter approach to detect disease signatures in then-standard 20-analyte serum chemistry panels, but these met with little success, probably due to the character and small number of the analytes.
Once specific protein measurements were available, a veritable industry was created exploring their correlation with various diseases and treatments. Thousands of scientific studies explore these correlations, but the number of protein analytes used is typically less than five (often one), and usually one disease or treatment is examined. Particularly valuable are the large epidemiological studies in which a number of proteins have been measured with high precision in large numbers of samples (140).
Proteomics, in the early stages of its use of two-dimensional electrophoresis, also focused on multivariate effect fingerprints. In the early 1980s insect developmental changes (141), mammalian cell type classification (142), and chemical effects on protein expression patterns (142) were examined using multivariate approaches with clustering of large panels of protein markers, methods derived from molecular taxonomy. Later similar methods were applied to analyze the effects of drugs and other toxic agents in rodent tissues, particularly liver. These studies examined the relationships between the effects of compounds within a mechanistic class (143, 144) and showed that a mechanism outside the known class could be recognized based on a multivariate protein pattern change (145). Quantitative serum protein proteomics was used to show that a panel of high abundance acute phase-related proteins could give a better statistical measure of inflammation than the classical marker serum amyloid A (41). Cancer tissue samples could be analyzed to distinguish tumor type and prognosis (146), and a panel of six cancer markers in plasma was found to be useful (147, 148). More recently, mass spectrometry-based proteomic approaches have been used as well to discover patterns of disease-related protein features related to a specific cancer (65).
The concept and utility of multivariate protein markers has thus been established for some time. What requires comment is why this approach has not penetrated significantly into clinical practice. To some extent the barrier has been the difficulty of demonstrating the clinical superiority of multianalyte measurements in diagnosing and treating disease, attributable in part to the fact that a workable clinical test (something much more robust than two-dimensional gels (149), for example) is usually required before this demonstration can occur. Each step in the required progression from discovery to routine implementation has been impeded by the technical complexity of multiprotein measurements. While proteomics can demonstrate and sometimes measure many proteins, the techniques required have been difficult to apply to a number of samples large enough to prove a clinical correlation at the research level. Once that hurdle is passed, the technology required to do routine clinical measurement has usually not been the same as that used in the marker discovery process, requiring an expensive additional effort, typically requiring the skills of another group or institution. To date, a simple commercial calculation has provided more than adequate discouragement: if to measure n proteins (for use as a single marker) one must invest n times the resources required to measure a single protein marker while the potential market size remains the same, then resources are better spent searching for single markers. The proteins shown in Fig. 3 can all be measured in a single sample of plasma, but the commercial cost using individual assays is $10,896.30. Thus, in the end, the success of multianalyte diagnostics is as much a matter of cost as science.
TRANSLATING THE PLASMA PROTEOME INTO USEFUL DIAGNOSTIC RESULTS |
---|
Lack of a Shared Knowledge Base
The plasma proteome occupies an important position at the intersection between three primary fields: proteomics (a technology-based research discipline), medicine (a profession serving the ill patient), and the diagnostic industry (a business delivering data to medicine under growing financial and regulatory constraints). Currently the appearance of the plasma proteome from these three directions is quite different, and thus communication among the participants required to achieve an advance in plasma-based diagnostics can be challenging when it comes to describing specific proteins and their significance (as was recognized some years ago by Merril and Lemkin (150)). This problem will become much more troublesome when multiple proteins are involved. A useful model that could be adapted in this context is the shared literature annotation database pioneered by Garrels in yeast (151, 152) and subsequently expanded to cover some aspects of the human proteome (e.g. G-protein-coupled receptors (153)). These databases collect, analyze, and abstract the published scientific literature for each protein in the collection and relate the proteins to external non-text resources such as genome sequences and genetic data. While the present scope of a Plasma Proteome Database is smaller in terms of number of known proteins, it is far larger in terms of the relevant literature to be summarized and will require a concerted ongoing effort.
Discovery of Low Abundance Proteins and Their Disease Relationships
Most of the proteins and disease markers in plasma remain to be discovered, and doing so will be very challenging because of the technical issues discussed above (especially dynamic range). A deep exploration effort is required coupled with one or more technologies that can measure newly found proteins in significant sets of carefully collected clinical samples. Access to these technologies and samples spans a wide landscape divided by many intellectual property fences, compounding the technical problems.
Acceptance of Multivariate Protein Panels as Medical Tests
As argued above, we feel that multiprotein panels will be required to achieve broad coverage of disease states where single markers do not exist, and that panels are likely to be more reliable even when there is a single marker. A series of convincing demonstrations of this assertion, undertaken in a valid clinical context, is required. These demonstrations require samples, identified proteins for testing, measurement platforms, and advanced data analysis, all of which are expensive and difficult to assemble. From a commercial viewpoint, the problem is magnified by the breadth of intellectual property that must be assembled: many protein markers are the subjects of existing patents or applications, and it is unclear how this will impact use of those proteins as components of multivariate tests. Acquiring samples and advanced measurement technologies exposes an additional series of patent concerns. Finally there is the issue of regulatory approval: regulatory agencies have been unenthusiastic about multivariate tests in the past, often for good reasons. In the future a serious effort will be required to demonstrate that these difficulties can be overcome.
Acceptance of the Individual as His/Her Own Reference with Concomitant Repeated Testing to Obtain Time Series Data
The concept itself is not unfamiliar: many people have annual measurements of blood prostate-specific antigen or lipoprotein cholesterol. However, in most cases these values are still compared with a simple reference interval and not examined as an explicit function of time to detect trends even when a trend is the purpose of the measurement. How many of us have seen a graph of our cholesterol over time, even those of us who take cholesterol-lowering drugs? When each diagnostic episode is considered in isolation, much of the patient context (the personalization) of the diagnostic is lost. We thus need to look at the balance between the added cost and inconvenience (periodic blood sampling) on the one hand and the increased ability to detect and manage disease on the other. We suspect that there are major benefits of routine analytical monitoring in the area of chronic disease and aging, areas in which large aggregate expense can be saved by an early intervention. If this is the case, then there may be a major economic incentive to the diagnostic industry in the patient's repeated, rather than occasional, use of a diagnostic system. This benefit comes, however, with an added challenge: because the individual's reference interval is narrower than the population's, the measurements may need to be more accurate to obtain the added value. The scientific challenge is to demonstrate beyond argument the quantitative benefit of using a person as his/her own control.
Lack of Systematic Knowledge of the Genetic Control of Diagnostic Marker Abundance
Unless we know the level of genetic control over the amount of a protein in plasma, we do not know whether an individual's value is a measure primarily of genetics (which affects risk but cannot be changed) or of phenotype (which reflects risk and can be changed). It can be argued that the latter markers are candidates for repeated measurement (for monitoring disease progression and therapy), while the former are not (once is enough).
Cost
As noted above, the cost of protein tests, averaged over all the proteins for which there are now clinical assays, is more than $100 per protein measurement. If we need another 150 proteins measured by approved clinical tests, will we need the same (huge) aggregate investment that was required to develop the 150 tests on the market today? Clearly this cannot be: one or more major advances are required that decrease the cost of multiprotein panel measurements by a factor of 10, and preferably 100, while generating data of very high quality and a reasonably profitable business. Major reagent and manufacturing innovations, particularly in the area of antibody arrays and their equivalents, will be required to achieve these goals.
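The arithmetic behind the required cost reduction can be made explicit with a short sketch. It rests on assumptions that are ours alone: the $100 figure is taken as a round lower bound on the per-protein cost quoted above, cost is assumed to scale linearly with the number of proteins in a panel, and the panel size of 150 is borrowed from the text purely as an illustrative number. The function name `panel_cost` is hypothetical.

```python
def panel_cost(n_proteins, cost_per_protein=100.0, reduction_factor=1.0):
    """Cost in dollars of measuring a panel, assuming linear scaling per protein.

    cost_per_protein: assumed current cost per measurement (a lower bound here).
    reduction_factor: factor by which per-protein cost is assumed to fall.
    """
    if n_proteins < 0 or cost_per_protein < 0 or reduction_factor <= 0:
        raise ValueError("counts and costs must be non-negative; factor must be positive")
    return n_proteins * cost_per_protein / reduction_factor


# Illustrative panel of 150 proteins at current cost and at 10x and 100x reductions.
for factor in (1, 10, 100):
    print(f"reduction x{factor}: ${panel_cost(150, reduction_factor=factor):,.2f} per panel")
```

Under these assumptions a 150-protein panel costs at least $15,000 per sample at current prices, $1,500 after a 10-fold reduction, and $150 after a 100-fold reduction. Only the last figure approaches a price at which repeated testing of an individual over time looks plausible.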
Regulatory Issues
The regulatory standards to be met by new protein analytes and new measurement technologies for the clinical laboratory are very high. The measurements must not only be accurate, but they must have a concrete medical implication. The cost of demonstrating medical meaning has to date been very high, requiring clinical trials for each analyte in each indication. This tradition would pose a grave problem if panels of new proteins were required to be the subject of trials one at a time. Instead it may be appropriate to explore a more general retrospective trial process (using banks of well characterized stored samples) that could allow a stream of new protein analytes to be evaluated and added to panels as they demonstrate added statistical value. Changes such as this would require at the outset that regulatory agencies become comfortable with multivariate protein diagnostics treated as panels and with the new technologies required to make them cost-effective. At present the data needed to produce this comfort are only beginning to appear.
There is an essential, non-commercial, but very focused, research component required to overcome these barriers, one that is addressed tangentially in many institutions but that we believe will require the commitment of a dedicated team.
CONCLUSIONS |
---|
Why, in fact, should we not be able to monitor some of the fundamental process parameters of human biology in the way that real engineers monitor complex systems? Why should we not be able to expand the range of monitored diseases to include a much wider range of chronic or slowly developing conditions? Why should we not have a long term patient "protein history" showing the slow changes associated with such diseases and tracking various therapeutic options to see which anti-inflammatory drug, for example, best down-regulates the inflammation of rheumatoid arthritis in a particular individual? Clearly we should, and we can, by taking a rational approach to exploration and measurement of the plasma proteome.
FOOTNOTES |
---|
Published, MCP Papers in Press, October 29, 2002, DOI 10.1074/mcp.R200007-MCP200
1 The abbreviations used are: CA, cancer antigen; 2-DE, two-dimensional electrophoresis; LC, liquid chromatography; Ig, immunoglobulin; MS, mass spectrometry; MALDI, matrix-assisted laser desorption ionization; TOF, time of flight; ESI, electrospray ionization; GC, gas chromatography; FDA, Food and Drug Administration; CLIA, Clinical Laboratory Improvement Amendments; CV, coefficient of variation.
2 Swiss Institute of Bioinformatics, SWISS 2D-PAGE, ExPASy Molecular Biology Server at us.expasy.org/ch2d/.
3 R. Pieper, T. Gatlin, S. Steiner, and L. Anderson, manuscript in preparation.
4 N. L. Anderson, unpublished.
5 Human Genome Sciences Inc., B Lymphocyte Stimulator (BLyS) at www.hgsi.com/products/BLyS.html.
6 Plasma Protein Therapeutics Association, Industry Facts at www.plasmatherapeutics.org/ppta_worldwide/wo_industry_facts.htm.
7 Georgetown Economic Services, Monthly Industry Aggregation Data, www.plasmatherapeutics.org/ppta_namerica/0302_data.pdf.
* The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
To whom correspondence should be addressed: The Plasma Proteome Inst., P.O. Box 53450, Washington, D. C. 20009-3450. Fax: 202-234-9175; E-mail: leighanderson{at}plasmaproteome.org
REFERENCES |
---|