MALDI-TOF Mass Spectrometry Analysis of Cerebrospinal Fluid Tryptic Peptide Profiles to Diagnose Leptomeningeal Metastases in Patients with Breast Cancer*

Lennard J. Dekker‡, Willem Boogerd§, Guenther Stockhammer, Johannes C. Dalebout‡, Ivar Siccama||, Pingpin Zheng‡, Johannes M. Bonfrer**, Jan J. Verschuuren‡‡, Guido Jenster§§, Marcel M. Verbeek¶¶, Theo M. Luider‡ and Peter A. Sillevis Smitt‡,||||

From the ‡ Laboratory of Neuro-oncology, Department of Neurology, Dr Molewaterplein 40, 3015 GD, and §§ Department of Urology, P.O. Box 1738, 3000 DR, Erasmus MC, Rotterdam, The Netherlands, Departments of § Neurology and ** Clinical Chemistry, The Netherlands Cancer Institute, Plesmanlaan 121, 1066CX, Amsterdam, The Netherlands, Department of Neurology, University of Innsbruck, Anichstrasse 35, 6020, Innsbruck, Austria, || Chordiant, De Lairessestraat 150, 1075 HL, Amsterdam, The Netherlands, ‡‡ Department of Neurology, Leiden University Medical Centre, P.O. Box 9600, 2300 RC, Leiden, The Netherlands, and ¶¶ Department of Neurology, University Medical Center Nijmegen, P.O. Box 9101, 6500 HB, Nijmegen, The Netherlands


    ABSTRACT
 
Leptomeningeal metastasis (LM) is a devastating complication that occurs in 5% of patients with breast cancer. Early diagnosis and initiation of treatment are essential to prevent neurological deterioration. However, early diagnosis of LM remains challenging because 25% of cerebrospinal fluid (CSF) samples produce false-negative results at first cytological examination. We developed a new, MS-based method to investigate the protein expression patterns present in the CSF from patients with breast cancer with and without LM. CSF samples from 106 patients with active breast cancer (54 with LM and 52 without LM) and 45 control subjects were digested with trypsin. The resulting peptides were measured by MALDI-TOF MS. Then, the mass spectra were analyzed and compared between patient groups using newly developed bioinformatics tools. A total of 895 possible peak positions was detected, and 164 of these peaks discriminated between the patient groups (Kruskal-Wallis, p < 0.01). The discriminatory masses were clustered, and a classifier was built to distinguish patients with breast cancer with and without LM. After bootstrap validation, the classifier had a maximum accuracy of 77% with a sensitivity of 79% and a specificity of 76%. Direct MALDI-TOF analysis of tryptic digests of CSF gives reproducible peptide profiles that can assist in diagnosing LM in patients with breast cancer. The same method can be used to develop diagnostic assays for other neurological disorders.


Leptomeningeal metastases (LM) arise when tumor cells metastasize to the cerebrospinal fluid (CSF). The flow of CSF results in widespread dissemination of the tumor cells along the surface of the central nervous system, causing symptoms by invading the brain, spinal cord, cranial nerves, and nerve roots (1).

One of the tumors most frequently associated with LM is breast cancer. During the course of the disease, ~5% of patients with metastatic breast cancer will develop symptoms caused by LM. This debilitating complication’s response to therapy depends upon early treatment. However, diagnosis of LM remains challenging because 25% of samples tested are false negative at the first cytological examination of the CSF, probably because of sampling error (1).

Protein expression profiling of body fluids from patients with cancer has recently become a valuable tool for obtaining information on the state of protein circuits inside tumor cells and outside the cells at the host-tumor interface (2, 3). In serum and CSF, low molecular weight proteins and peptides that are related to this altered microenvironmental "cancerous" state can be detected.

We studied the differential tryptic peptide profiles in the CSF from patients with breast cancer with and without LM and in CSF from control subjects. Studying CSF has several advantages over studying serum. First, tumor cells in LM patients are located in the CSF and in the leptomeninges that are surrounded by CSF. Before their transport into serum, tumor-related proteins will therefore first be shed into the CSF. Second, the normal protein concentration of CSF is 100- to 400-fold lower than in serum (4). This results in a significant over-representation of LM-related proteins in CSF compared with serum. The identification of protein profiles specific for LM may be helpful in diagnosing patients with clinical suspicion of LM but negative cytology. In addition, such proteins may reveal cellular mechanisms relevant to the biology of LM.

With the advent of mass spectrometry into the field of clinical proteomics, the comparison of large numbers of proteins in complex biological samples such as serum and CSF has become feasible (3, 5). Until now, the most commonly used instrument was the SELDI-TOF MS. Using SELDI-TOF MS analysis of various body fluids, discriminatory protein expression profiles have been identified in various diseases (3, 6). However, SELDI-TOF MS does not allow a direct identification of the discriminatory proteins and suffers from low reproducibility and accuracy (7–10). To improve the reproducibility and accuracy and find better ways to identify relevant discriminatory proteins (11), we first digested our samples with trypsin and analyzed the resulting peptide mixtures by MALDI-TOF MS (12). The reproducibility of this type of analysis has been described elsewhere (13).

We analyzed CSF samples from 106 patients with active breast cancer, 54 of whom had LM, and CSF from 45 control subjects. Tryptic peptide mixtures were measured by MALDI-TOF MS and analyzed using a newly designed bioinformatics tool. We could identify unique peptide patterns that discriminated the LM patients from the other patients with breast cancer and from control subjects.


    EXPERIMENTAL PROCEDURES
 
Patient Selection—
Using clinical databases and CSF banks, we retrospectively identified all patients with breast cancer with available CSF samples collected in the last 7 years in four participating institutions (Erasmus MC, Netherlands Cancer Institute, UMC Nijmegen, and Innsbruck Medical University). We included only patients with advanced breast cancer defined as metastatic or locally progressive disease. Patients with positive cytology or with a compatible neurological syndrome and diagnostic MRI were considered to have LM (group I, n = 54). When cytology was negative and when clinical follow-up was incompatible with LM, patients were classified as having advanced breast cancer without LM (group II, n = 52). Control subjects (group III, n = 45) were patients who were not known to have cancer and who did not suffer any known neurological disease (collected at Erasmus MC, Netherlands Cancer Institute, and LUMC). We noted the date of diagnosis of breast cancer, the date of lumbar puncture, and the date of last follow up or death. The symptoms at the time of lumbar puncture were scored as either central, cranial nerve involvement, radicular, compatible with raised intracranial pressure, or other. The following CSF parameters were noted: total protein concentration, white cell count, and cytology (positive or negative). MRI and CT scans were scored as either positive, suggestive, or negative for LM. Nodular or focal linear meningeal enhancement was considered diagnostic of LM, and communicating hydrocephalus was considered suggestive of LM. All samples had been routinely centrifuged to discard cellular elements before storage at –80 °C.

Sample Preparation and Measurement of Samples—
All samples were blinded and analyzed in random order. From each sample, 20 µl of CSF was put into a 96-well plate, and 20 µl of 0.2% Rapigest (Waters, Milford, MA) in 50 mM ammonium bicarbonate buffer was added to each well. The samples were incubated for 2 min at 37 °C. Next, 4 µl of 0.1 µg/µl gold grade trypsin (Promega, Madison, WI) in 3 mM Tris-HCl was added to each well, and the 96-well plates were incubated at 37 °C. After 2 h of incubation, 2 µl of 500 mM HCl was added to obtain a final concentration of 30–50 mM HCl, pH < 2. The 96-well plates were then incubated again for 45 min. A 96-well zip C18 microtiter plate (Millipore Corporation, Bedford, MA) was prewetted and washed twice with 200 µl of acetonitrile per well. Full vacuum was applied to the plate using a vacuum manifold (Millipore). Then, 3 µl of acetonitrile was put on the C18 resin without vacuum to prevent it from drying. Each sample was mixed with 200 µl of 0.1% TFA in HPLC-grade water. Subsequently, the samples were loaded on the washed and prewetted 96-well zip C18 plate (Millipore); a pressure differential of 5 inches of Hg vacuum was used. After the wells had been cleared, they were washed twice with 100 µl of 0.1% TFA. Full vacuum was applied until all wells were empty. The samples were eluted into a new 96-well plate with an elution volume of 15 µl of 50% acetonitrile in HPLC-grade water containing 0.1% TFA; a pressure differential of 5 inches of Hg vacuum was used. After elution, the samples were stored at 4 °C in the 96-well plates covered with aluminum seals. All samples were spotted on a MALDI target (600/384 anchor chip with transponder plate; Bruker Daltonik GmbH, Bremen, Germany) in triplicate. To do so, 2 µl of eluate was mixed with 10 µl of matrix solution (2 mg of α-cyano-4-hydroxycinnamic acid (Bruker Daltonik GmbH) in 1 ml of acetonitrile for 30 min using an ultrasonic bath). Afterward, samples were automatically measured on a MALDI-TOF MS (Biflex III; Bruker Daltonik GmbH).
Each sample was digested in duplicate, the purified peptides from each digest were spotted in triplicate, and every spot was measured in triplicate. This resulted in 18 spectra for each sample (2 × 3 × 3). The standard method for peptide measurements on the MALDI-TOF MS was used (default file Bruker "1–2kD positive" with the measurement range changed to 300–3000 Da). For the automated measurements, the settings of initial laser power of 20% and a maximum of 35% were used. The highest peak above 750 Da had to have a signal-to-noise ratio of at least 5 and a minimum resolution of 5000. After every 30 laser shots, the sum spectrum was checked for these criteria. If the sum spectrum did not meet these criteria, it was rejected. If 13 sum spectra from 30 shots met the criteria, these were combined and saved; when 50 sum spectra from 30 shots were rejected, the measurement of that spot was ended, and the next spot was measured.
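The acceptance rule for automated acquisition can be summarized in a short sketch. This is only an illustration of the stated criteria, not the instrument software: the function name, the input format (one signal-to-noise and resolution pair per 30-shot sum spectrum), and the return convention are assumptions made for the example.

```python
def acquire_spot(sum_spectra, min_snr=5.0, min_resolution=5000.0,
                 needed=13, max_rejected=50):
    """Apply the acceptance rule to a stream of 30-shot sum spectra.

    `sum_spectra` is an iterable of (snr, resolution) pairs describing the
    highest peak above 750 Da in each sum spectrum (hypothetical format).
    Returns the indices of the accepted sum spectra once `needed` have been
    accepted, or None if the spot is abandoned (either `max_rejected`
    rejections were reached or the stream ended first).
    """
    accepted = []
    rejected = 0
    for i, (snr, resolution) in enumerate(sum_spectra):
        if snr >= min_snr and resolution >= min_resolution:
            accepted.append(i)
            if len(accepted) == needed:
                return accepted  # these would be combined and saved
        else:
            rejected += 1
            if rejected == max_rejected:
                return None  # give up and move to the next spot
    return None
```

For example, a spot whose sum spectra alternate between failing and passing the criteria is saved after 26 sum spectra, with the 13 passing ones retained.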

Analysis of Spectra—
First, the raw binary data files were converted to ASCII files containing the measured intensities for all channel indices of the spectra. We then developed a peak detection algorithm in the statistical language R (www.r-project.org). The definition of a peak (or local maximum) in this algorithm states that the intensity of the peak position has to be above a predefined threshold and has to be the highest intensity value in a surrounding mass window. This peak-finding algorithm was tested on a small set of spectra with different settings for the threshold and the mass window. The settings for the peak finding were chosen such that the resulting peak list most resembled peaks that would be manually assigned, thus optimizing the trade-off between signal sensitivity and noise detection. We chose a percentile threshold of 98.5% (the intensity of the position must belong to the 1.5% highest intensity values of the spectrum) and a mass window of 0.5 Da. A quadratic fit with a number of internal calibrants was used to calibrate the channel numbers to masses. For this mass calibration, five omnipresent albumin peaks (960.5631, 1000.6043, 1149.6156, 1511.8433, and 2045.0959 m/z) were used. The accurate masses of these albumin peaks were obtained by performing a "tryptic digest" on the human albumin amino acid sequence with MS-digest (prospector.ucsf.edu/ucsfhtml4.0/msdigest.htm).
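The peak definition above (intensity within the top 1.5% of the spectrum and highest within a surrounding mass window) can be sketched as follows. The authors implemented their algorithm in R; this Python version is only an illustration, and several details are assumptions: the function names, the nearest-rank percentile rule, the reading of the 0.5 Da window as a half-width on either side of the candidate, and the tie-breaking rule for flat tops.

```python
def percentile_threshold(intensities, pct=98.5):
    """Smallest intensity that still belongs to the top (100 - pct)% of values."""
    ordered = sorted(intensities)
    k = min(int(len(ordered) * pct / 100.0), len(ordered) - 1)
    return ordered[k]


def find_peaks(masses, intensities, pct=98.5, window=0.5):
    """Return the masses of local maxima that reach the percentile threshold.

    `masses` must be sorted in ascending order. A position is reported when
    its intensity is at or above the threshold and no point within `window`
    Da on either side is higher (assumed interpretation of the mass window).
    On a flat top, only the right-most point is reported.
    """
    threshold = percentile_threshold(intensities, pct)
    n = len(masses)
    peaks = []
    for i in range(n):
        if intensities[i] < threshold:
            continue
        is_max = True
        j = i - 1
        while j >= 0 and masses[i] - masses[j] <= window:
            if intensities[j] > intensities[i]:
                is_max = False
                break
            j -= 1
        j = i + 1
        while is_max and j < n and masses[j] - masses[i] <= window:
            if intensities[j] >= intensities[i]:
                is_max = False
                break
            j += 1
        if is_max:
            peaks.append(masses[i])
    return peaks
```

Because the threshold is a percentile of each spectrum rather than a fixed intensity, the number of candidate positions scales with the number of channels, which is consistent with the authors tuning the settings against manually assigned peaks.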

During the process of alignment and conversion, the quality of the spectra was checked as follows. If two or more of the omnipresent albumin peaks were not detected, the spectrum was not used in the further analysis. The peak-finding algorithm was then used to create a list of peak positions for each individual spectrum. These peak lists were combined by comparing the lists one by one. If peak positions were present in a mass window of 0.5 Da in both spectra, these peak positions were combined. The combined peak list was then compared with a new spectrum until all peak lists had been combined. The final combined peak list was used to create a matrix displaying the frequency of each peak position for each sample. Peak positions that were present in fewer than 5% of the spectra were deleted from the matrix to reduce the number of noise peaks. The matrix created in this way was used for statistical analysis of the data. Using a univariate analysis in R, a p value was determined for every peak position. When comparing more than two groups, we used the Kruskal-Wallis test; when comparing two groups, we used the Wilcoxon-Mann-Whitney test.
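A minimal sketch of the list-merging and noise-filtering steps follows. It is not the authors' R code. The function names are hypothetical, and two details that the text does not specify are assumed: a merged position keeps the first mass at which it was seen, and two peaks are treated as the same position when they differ by at most 0.5 Da.

```python
def merge_peak_lists(peak_lists, window=0.5):
    """Fold per-spectrum peak lists, one by one, into a single reference list.

    A peak within `window` Da of an existing reference position is treated
    as that position; otherwise it becomes a new reference position.
    """
    reference = []
    for peaks in peak_lists:
        for m in peaks:
            if not any(abs(m - r) <= window for r in reference):
                reference.append(m)
    return sorted(reference)


def presence_counts(peak_lists, reference, window=0.5):
    """For each reference position, count the spectra with a peak near it."""
    return [
        sum(1 for peaks in peak_lists
            if any(abs(m - r) <= window for m in peaks))
        for r in reference
    ]


def drop_rare_positions(reference, counts, n_spectra, min_fraction=0.05):
    """Keep positions found in at least `min_fraction` of all spectra."""
    return [r for r, c in zip(reference, counts)
            if c >= min_fraction * n_spectra]
```

Applied per sample instead of over all spectra, `presence_counts` would give the 0 to 7 frequencies that fill the matrix used in the statistical analysis.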

To investigate whether differences in the total CSF protein concentration of the samples affected the performance of the MALDI-TOF, we first used the Bio-Rad detergent-compatible protein assay (Bio-Rad) to determine the protein concentration of all the CSF samples. We then calculated the sum of albumin peaks detected in all seven spectra of each sample (excluding the albumin peaks that had been used for calibration). Using Prism version 4.0 (GraphPad Software, San Diego, CA), we compared the total protein concentration and the sum of the albumin peaks of the three groups. To test for statistically significant differences between the three groups, we used one-way ANOVA followed by Bonferroni’s multiple comparison test. All tests were two-sided, and p < 0.05 was considered statistically significant. In addition, the correlation between peak frequency and protein concentration was calculated for each individual peak position. A histogram of all correlation coefficients was created, and the distribution was compared with a normal distribution using the Kolmogorov-Smirnov test (SPSS Inc., Chicago, IL).

All peak positions with a frequency that was two times higher in group I than in the control groups II and III were selected. These peak values were submitted to the Mascot search engine (Matrix Science, London, UK) to search the MSDB human database using a 100-ppm tolerance.

Building a Predictive Model—
A supervised multivariate analysis method was used to determine whether sample groups I and II could be separated on the basis of their peak positions. For each patient, seven mass spectra were used, and the peak positions of each of those seven spectra were combined. Therefore, the number of times that a peak was present varied between 0 and 7. To reduce noise, a peak had to be detected in at least two of the seven spectra to be scored as present in a sample (≥2, scored 1); otherwise it was scored as absent (<2, scored 0). This yielded a binary data matrix. The required frequency was kept low to minimize loss of signal. To reduce the number of variables, a clustering was performed that combined peaks of similar behavior. Peptide peaks that often occurred simultaneously were grouped into the same cluster using a hierarchical clustering algorithm. The distance between each possible pair of peptide peaks was determined with the Manhattan distance measure (i.e. the sum of the absolute differences for all patients). The number of clusters was set at 50. With 50 clusters, isotope peaks were generally grouped into the same cluster. The clusters generated represented groups of peptide peaks that might be derived, at least in part, from the same protein or proteins. The clustering of the masses made it possible to compose a new data matrix. Each matrix cell contained the number of peaks present for a certain patient relative to the total number of peaks in a particular cluster. In other words, each cell in the new matrix defined the proportion of peptides that was present in a cluster for a certain patient. To further reduce the complexity of the data, we set a threshold for the presence of a cluster to obtain a binary data matrix. Using the clustered, binary variables, we constructed a non-linear predictive model that separated group I from group II. In the model thus generated, a maximum of eight clusters was allowed, and only those clusters with an area under the receiver operating characteristic curve (AUC) greater than 0.62 were considered.
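The data transformations in this paragraph can be illustrated with a small sketch: scoring peaks as present or absent, computing the Manhattan distance between two peaks, and building the per-cluster proportion matrix. The function names and the toy layout (rows are patients, columns are peaks) are assumptions for illustration. The hierarchical clustering itself is not reproduced because the linkage rule is not stated in the text, so the cluster membership is passed in by the caller.

```python
def binarize(counts, min_count=2):
    """counts[patient][peak] is 0..7; score 1 if seen in >= min_count spectra."""
    return [[1 if c >= min_count else 0 for c in row] for row in counts]


def manhattan(binary, peak_a, peak_b):
    """Sum over all patients of |presence of peak_a - presence of peak_b|."""
    return sum(abs(row[peak_a] - row[peak_b]) for row in binary)


def cluster_proportions(binary, clusters):
    """Proportion of each cluster's peaks that are present in each patient.

    `clusters` is a list of lists of peak column indices (supplied by the
    caller, e.g. from a hierarchical clustering cut at 50 clusters).
    """
    return [
        [sum(row[p] for p in members) / len(members) for members in clusters]
        for row in binary
    ]
```

Two peaks that co-occur in every patient have a Manhattan distance of 0, which is why isotope peaks of the same peptide tend to fall into the same cluster.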
Genetic programming was used to search for the model with the highest AUC (14). To obtain an unbiased estimate of the predictive accuracy of the model, we used the bootstrapping method (15, 16). Bootstrap data sets were created by randomly selecting patients with replacement from the original data set. As an extra precaution, the clustering step was included in the bootstrapping process as well. 100 bootstrapped matrices were created from the original matrix by resampling with replacement. The clustering was repeated for each of these resampled matrices and a predictive model was constructed. The AUC of each model was measured on the bootstrap data set as well as on the original data set. The average difference between the performance on the bootstrap data set and the performance on the original data set provided a correction factor that gave an estimate of the bias of our model development process. Finally, we developed a model on the original data and corrected its AUC with the correction factor, producing a conservative estimate of the performance of the model.
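The bootstrap correction described above can be sketched as follows. This is a simplified illustration, not the authors' pipeline. The model builder `fit_best_feature` is a hypothetical stand-in that picks the single best feature, whereas the study used genetic programming and repeated the clustering inside every bootstrap round. All function names are assumptions.

```python
import random


def auc(scores, labels):
    """Probability that a random positive outscores a random negative.

    Ties count as one half. Labels are 1 (positive) or 0 (negative).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def fit_best_feature(X, y):
    """Stand-in model builder: the single feature (and sign) with highest AUC."""
    best = None
    for j in range(len(X[0])):
        for sign in (1, -1):
            a = auc([sign * row[j] for row in X], y)
            if best is None or a > best[0]:
                best = (a, j, sign)
    _, j, sign = best
    return lambda row: sign * row[j]


def optimism_corrected_auc(X, y, n_boot=100, seed=0):
    """Return (apparent AUC, bootstrap-corrected AUC)."""
    rng = random.Random(seed)
    n = len(y)
    model = fit_best_feature(X, y)
    apparent = auc([model(r) for r in X], y)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        yb = [y[i] for i in idx]
        if len(set(yb)) < 2:
            continue  # AUC is undefined when one class is missing
        Xb = [X[i] for i in idx]
        m = fit_best_feature(Xb, yb)  # the whole pipeline is refitted here
        auc_boot = auc([m(r) for r in Xb], yb)
        auc_orig = auc([m(r) for r in X], y)
        diffs.append(auc_boot - auc_orig)
    correction = sum(diffs) / len(diffs)
    return apparent, apparent - correction
```

The key design point is that every model-building step that looks at the labels or the data structure is repeated inside the loop, so the correction factor reflects the bias of the whole development process rather than of the final model alone.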


    RESULTS
 
Patients—
Clinical information, CSF data, and imaging results are summarized in Table I. Forty-six percent of patients with breast cancer with LM (group I) presented with more than one neurological symptom, whereas the majority of patients with breast cancer (73%) in group II presented with a single symptom at the time of lumbar puncture. All patients in group I had LM, whereas diagnoses in group II included bone metastases (n = 15), tension headache (n = 5), metabolic encephalopathy (n = 3), carpal tunnel syndrome (n = 3), brain metastases (n = 2), migraine (n = 2), disseminated intravascular coagulation (n = 2), dural metastasis (n = 2), lumbago (n = 2), and psychiatric problems (n = 2). All of the following conditions were diagnosed, each in one patient: jugular vein thrombosis, herniated disk, dementia, anaplastic astrocytoma, whiplash injury, polyneuropathy, and syncope. In two patients, no cause for the symptoms was found.


TABLE I Clinical information on advanced breast cancer patients with (group I) and without leptomeningeal metastasis (group II) and control subjects (group III)

NA, not applicable.

 
The cytological examination of the CSF revealed tumor cells in all group I patients and in none of the group II patients. The CSF white cell count was increased (>4 cells/µl) in 56% of group I and 0% of group II patients. The total protein concentration was higher than the institutional upper limit in 85% of group I and in 30% of group II patients. Imaging studies were obtained in 39 of 41 group I patients, 35 MRIs and 4 CT scans. Imaging results were positive for LM in 23 (two CT scans) patients, suggestive in 4 patients, and negative in 10 patients (two CT scans). MRI results were available in 26 patients in group II. The MRI was considered negative in 22 patients, suggestive in 3, and positive for LM in 1 patient. The patient with a false-positive MRI had a single non-symptomatic dural enhancing lesion at T9; the CSF cytology was three times negative.

Peak Detection—
We detected an average of 350 peaks per spectrum (95% CI, 250 to 450). Spectra with more than 450 detectable peaks were excluded from the analysis because these spectra consisted mainly of noise peaks (peaks without an isotopic distribution). After alignment and quality control, the number of good quality spectra per sample was counted, and samples with fewer than seven good quality spectra were discarded. The number of seven spectra is based on an earlier reproducibility study (13). At this threshold, the remaining number of cases per patient group was 41 in group I, 46 in group II, and 43 in group III (Table II). Of the samples with more than seven spectra, only the first seven spectra were used. The combined peak list of all these spectra contained 2006 possible peak positions. After noise reduction, the matrix created from this list contained 895 peak positions.


TABLE II Number of patients per group before and after quality control of spectra

 
Univariate Analysis—
For each peak, the significance of difference in distribution over groups I-III was tested with the Kruskal-Wallis test, and a p value was calculated. We detected 323 peak positions with p < 0.05 and 172 with p < 0.01, indicating that a considerable number of peaks correlated with diagnosis groups. In Fig. 1A, the number of peaks (frequency) is presented for each p value interval. On the same data, we performed a cross-validation by randomly assigning a group number to each CSF sample and then repeating the Kruskal-Wallis test. This scrambling procedure was repeated 10,000 times. The blue line in Fig. 1A presents the mean frequency of peaks per p value interval. The flat distribution of the p value histogram indicates that there is no correlation between peak positions and groups after scrambling. The p value histogram of the actual experiment is clearly skewed to lower p values and is significantly different from the histogram after scrambling (Fig. 1A).
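The scrambling procedure can be illustrated for a single peak position with the sketch below. It is not the authors' R code. It implements the Kruskal-Wallis test for exactly three groups with the usual tie correction and the chi-square approximation, for which the tail probability with 2 degrees of freedom is exp(-H/2). Function names and the input layout are assumptions.

```python
import math
import random


def kruskal_wallis_3(values, groups):
    """Kruskal-Wallis H and approximate p value for exactly three groups."""
    labels = sorted(set(groups))
    if len(labels) != 3:
        raise ValueError("this sketch handles exactly three groups")
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        midrank = (i + j) / 2.0 + 1.0  # average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = midrank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    correction = 1.0 - tie_term / float(n ** 3 - n)
    if correction == 0.0:
        return 0.0, 1.0  # all values identical: no evidence of a difference
    h = 0.0
    for g in labels:
        r = [ranks[i] for i in range(n) if groups[i] == g]
        h += sum(r) ** 2 / len(r)
    h = 12.0 / (n * (n + 1)) * h - 3.0 * (n + 1)
    h = max(h / correction, 0.0)  # guard against tiny negative rounding
    return h, math.exp(-h / 2.0)


def permutation_p_values(values, groups, n_perm=1000, seed=0):
    """p values obtained after repeatedly shuffling the group labels."""
    rng = random.Random(seed)
    shuffled = list(groups)
    out = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        out.append(kruskal_wallis_3(values, shuffled)[1])
    return out
```

Repeating this for every peak position and binning the p values in steps of 0.01 would give a null histogram of the kind drawn as the blue line in Fig. 1. Because peak frequencies take only the values 0 to 7, ties are common, which is why the tie correction matters here.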



FIG. 1. A significant number of peptides is differentially expressed between patients with breast cancer with LM (group I) and without LM (group II) and healthy control subjects (group III). The figure shows histograms of p values, where the height of each bar denotes the number of peptide peaks, whereas the horizontal base corresponds to the p value interval (interval size, 0.01). The blue line represents the histogram of p values after cross validation. The height of the blue line shows the average number of peptide peaks after 10,000 scrambling procedures (see "Results"). A, histogram of p values after comparing groups I, II, and III using the Kruskal-Wallis test. The distribution is clearly different from the random distribution (blue line) and skewed to the left, indicating a high number of peptides that discriminate between the three groups (low p value). B–D, histogram of p values for peak positions comparing two groups with the Wilcoxon-Mann-Whitney test. Groups I and III are compared in B, groups I and II in C, and groups II and III in D. The distribution of a resampled situation is displayed in blue. The resampling was repeated 10,000 times.

 
We then compared the groups pair-wise with a Wilcoxon-Mann-Whitney test. The highest number of peak positions with a significant p value was found when groups I and III were compared (p < 0.05, 329 peak positions; p < 0.01, 190 peak positions) (Fig. 1B). When groups I and II were compared, the p value histogram was significantly different from a random distribution (p < 0.05: 326 peak positions; p < 0.01: 164 peak positions) (Fig. 1C). When groups II and III were compared, fewer peak positions with a significant p value were detected (p < 0.05: 119 peak positions; p < 0.01: 46 peak positions) (Fig. 1D). As a cross-check, we tested whether peak occurrence correlated with the institute that supplied the sample. The histogram of the resulting p values did not differ significantly from a random distribution.

The CSF total protein concentration of the samples differed significantly among the three groups (ANOVA, p < 0.0001; Fig. 2A). However, the sum of albumin peaks detected per sample did not differ between the groups (ANOVA, p = 0.8; Fig. 2B). For all individual peak positions, the correlation between the peak frequency and the protein concentration was calculated and plotted in a histogram (Fig. 3). The distribution of the correlation coefficients did not significantly differ from a normal distribution (Kolmogorov-Smirnov test, p = 0.35). The constant number of albumin peaks and the lack of effect of protein concentration on the peak frequency indicated that differences in total protein concentration had not significantly affected MALDI-TOF performance.



FIG. 2. Differences in protein concentration do not affect MALDI-TOF analysis of CSF samples. A, the mean protein concentration differed significantly among the three groups (ANOVA, p < 0.0001). Differences were significant between groups I and II (Bonferroni multiple comparison test, p < 0.0001) and between groups I and III (p < 0.0001) but not between groups II and III (p > 0.05). B, the mean number of albumin peaks per group does not differ significantly among the three groups (ANOVA, p = 0.8). Bars denote S.E.

 


FIG. 3. Correlation coefficients of protein content and peak frequency. The histogram shows the distribution of correlation coefficients for protein content and peak frequency for each peak position. The histogram does not differ significantly from a normal distribution with a mean value of 0 (Kolmogorov Smirnov test, p = 0.35).

 
From the matrix file, we selected all peak positions whose frequency was two times higher in group I than in both control groups II and III. This list contained 52 peak positions that were used in a Mascot database search against the MSDB human database. Five of the 52 peptides matched to apolipoprotein A1 (28 kDa). These five matching peptides had an average mass error of 19 ppm compared with the MSDB database.

Clustering Analysis—
A clustering on the masses was performed on a matrix in which the samples had been sorted on group number. This clustering resulted in the detection of group-specific clusters (Fig. 4). In Fig. 4A, a zoom in of the dendrogram is displayed that shows peak positions that have higher frequencies in patients with breast cancer without LM (group II) and healthy control subjects (group III) than in patients with breast cancer with LM (group I). In B, peak positions with a higher frequency in patients with breast cancer with LM (group I) than in groups II and III are shown.



FIG. 4. Unsupervised clustering demonstrates peptide peaks that differentiate patients with breast cancer with and without LM. The figure illustrates close-ups of an unsupervised clustering dendrogram. The clustered masses are displayed on the y-axis, whereas the x-axis represents the samples ordered by group. Colors represent the frequency of a peak position in the seven spectra of a sample (up to two spectra, increasing intensity green; three spectra, black; four to seven spectra, increasing intensity red; see key on figure). A, peak positions that are less frequently expressed in CSF from patients with breast cancer with LM (group I) than in the CSF from patients with breast cancer without LM (group II) and in CSF from control subjects (group III). B, peak positions that are more frequently expressed in group I than in groups II and III.

 
Predictive Model—
Clustering of the masses resulted in a reduction of the variables from 895 peaks to 50 clusters. All albumin peaks were assigned to the same cluster (a large cluster containing 164 peaks). Table III lists 10 clusters ordered by their univariate area under the receiver operating characteristic curve (AUC), indicating the highest discriminatory value. As predicted, isotopic peaks are grouped together in the same cluster (Table III). The AUC can vary between 0.5 and 1 (0.5 for a random prediction and a value of 1 for an optimal prediction). The cluster with the highest AUC consisted of 10 peaks, all of which had p values <0.01. When adjoining isotope peaks were combined, four distinctive peptide peaks in this cluster remained. The model with the highest predictive value, as selected with genetic programming, used six clusters (Table III). The AUC achieved by this model on the original data set was 0.936; the bootstrap-corrected AUC was 0.852. The maximum accuracy that could be achieved after bootstrapping was 77%. At this cut-off point, the corrected sensitivity was 79% and the corrected specificity was 76%.


TABLE III Ten clusters with the highest predictive value

 

    DISCUSSION
 
We studied the value of proteomic profiling in the diagnosis of LM in patients with breast cancer. After digestion of CSF proteins with trypsin, we analyzed the resulting peptides with MALDI-TOF. In a univariate analysis, we detected that many peptides were differentially expressed among patients with breast cancer with and without LM and healthy control subjects. We then built a predictive model to diagnose LM in patients with breast cancer. After validation by bootstrapping, the model achieved a sensitivity of 79% and specificity of 76%. In current clinical practice, diagnostic tests for LM include CSF cytology and gadolinium-enhanced MRI (1, 17, 18). The sensitivity of CSF cytology (75%) and Gd MRI (76%) are comparable with our predictive model (17). The specificity of MRI (77%) is also similar, but the specificity of CSF cytology is much higher (100%) (17). We conclude that our test may be useful to support the diagnosis of LM in patients with breast cancer. It is noteworthy that the test requires only 20 µl of CSF and can therefore easily be combined with cytological examination of the CSF.

After the original highly intriguing report that the serum proteome profile can be used for the early detection of ovarian cancer (3), many researchers have applied the SELDI-TOF technology to detect proteome profiles specific for other forms of cancer and non-malignant disease (6, 19). However, criticism has focused on the low reproducibility of the SELDI-TOF analytical tool (7, 9, 10, 20–25). Models based on SELDI-TOF protein profiling data generally performed poorly upon external validation in time (26). This lack of reproducibility may be due to variation in chip batches, mass spectrometers, sample stability, the low reproducibility of peak height, and the low number of measurements per sample (7, 20, 27). We believe that our model is less affected by these variations for several reasons. First, the sample preparation is simple, fully automated, and does not require chips or fractionations. Second, we did not include the height of the peaks in the model because quantitative measurements of peak heights with both the MALDI and SELDI methods are poorly reproducible (28). In addition, we have carefully determined before analysis the number of replicates per sample that provided the optimal reproducibility (13). The number of replicates that we used (18) was much higher than in other studies. Third, the predictors that we used in our model were clusters of peaks and not single peaks, which improves the robustness of the model. A change in one peak position of a cluster that is used as a predictor will not have a dramatic effect on the predictive value of the entire cluster. In the future, the reliability of the method can be further improved by linking multiple peptide peaks to a single protein.

The direct identification of peptides from complex samples remains difficult because of the complexity of tryptic digests of body fluids. A direct MS/MS identification of the peptides using MALDI TOF/TOF is not possible as a result of the presence of multiple peptides, even in small mass windows. Although off-line nano LC-MALDI could solve this problem, we believe that the best method to identify peptides in complex mixtures is Fourier transform MS in which the exact mass of the peptide of interest is obtained. In most cases, the detection of multiple peptides derived from a single protein will allow identification of the protein. Our database search on the up-regulated peptides has demonstrated the feasibility of this approach for apolipoprotein A1. The up-regulation of different forms of apolipoprotein has been observed before in different SELDI-TOF studies (29–31). We are currently performing Fourier transform MS to identify the other up-regulated peptides as well.

Confounding factors in the present study could be differences in sample collection and storage between institutes, differences in total protein concentration and white cell count between the groups, reproducibility of the method, and patient selection bias. The number of peptides differentially expressed between the institutes was identical to a chance distribution, excluding potential biases introduced by differences in sample handling. The white cell count and protein concentration in CSF from patients with LM were increased compared with both patients with breast cancer without LM and healthy control subjects. All samples were routinely centrifuged after lumbar puncture, making contamination of the supernatant with cellular debris unlikely. The elevated protein concentration in the CSF from LM patients is well known and is caused by dysfunction of the blood-brain barrier (4), resulting in an increase of high abundance serum proteins in the CSF. Normalization to protein concentration could have been used to compensate for this difference. However, this would mean using less CSF from LM samples, which would result in a lower amount of CSF-specific proteins compared with the control samples. In our opinion, this would introduce a bias among the three sample groups. To investigate the potential confounding effect of differences in total protein concentration, we calculated the average number of tryptic peptide peaks derived from albumin in each group. The number of albumin-derived peptide peaks did not differ among the three groups. In addition, no significant negative or positive correlation between the number of peaks and the protein concentration could be detected. This provided strong evidence that the differences in protein concentration had not interfered with the analysis.
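The check for a correlation between the number of peaks and the protein concentration can be sketched as follows. The text does not state which correlation measure was used; the sketch assumes a Spearman rank correlation, implemented with the standard library only. The function names and all values are hypothetical and are not data from this study.

```python
def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up values for illustration only (arbitrary units); not study data.
protein_conc = [0.3, 0.5, 0.9, 1.4, 2.1, 3.0]
peak_counts = [212, 198, 220, 205, 215, 201]

rho = spearman(protein_conc, peak_counts)
print(f"Spearman rho = {rho:.3f}")
```

A coefficient close to zero for such a comparison would be consistent with the absence of a confounding effect of protein concentration, although a formal significance test would still be needed.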

All patients with breast cancer in the present study had signs or symptoms compatible with LM, which led to the performance of a lumbar puncture. All patients in group I who were diagnosed with LM had positive cytology. All patients in group II had negative cytology combined with clinical follow-up indicating an alternative diagnosis. At this stage, we did not include a group of patients with "false-negative" CSF cytology as indicated by MRI and/or clinical follow-up. It will be particularly interesting to investigate the performance of this proteome-based test in these patients as well, preferably in a prospective manner.

We conclude that MALDI-TOF analysis of tryptic peptide digests derived from the CSF of patients with breast cancer can support the diagnosis of LM. We expect that the use of more accurate and sensitive measurements by Fourier transform mass spectrometry will further improve the identification of disease-specific patterns and markers from body fluids in the near future.


   FOOTNOTES
 
Received, March 18, 2005, and in revised form, June 17, 2005.

Published, MCP Papers in Press, June 21, 2005, DOI 10.1074/mcp.M500081-MCP200

1 The abbreviations used are: LM, leptomeningeal metastases; CSF, cerebrospinal fluid; AUC, area under the curve; ANOVA, analysis of variance.

* This study was supported by the Netherlands Proteomics Centre, by a grant from the Erasmus MC Revolving Fund, and by the European Union grant P-mark.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

|||| To whom correspondence should be addressed. Tel.: 31-104633327; Fax: 31-104633208; E-mail: p.sillevissmitt@erasmusmc.nl


    REFERENCES
 

  1. DeAngelis, L. M. (1998) Current diagnosis and treatment of leptomeningeal metastasis. J. Neurooncol. 38, 245–252

  2. Zheng, P. P., Luider, T. M., Pieters, R., Avezaat, C. J., van den Bent, M. J., Sillevis Smitt, P. A., and Kros, J. M. (2003) Identification of tumor-related proteins by proteomic analysis of cerebrospinal fluid from patients with primary brain tumors. J. Neuropathol. Exp. Neurol. 62, 855–862

  3. Petricoin, E. F., Ardekani, A. M., Hitt, B. A., Levine, P. J., Fusaro, V. A., Steinberg, S. M., Mills, G. B., Simone, C., Fishman, D. A., Kohn, E. C., and Liotta, L. A. (2002) Use of proteomic patterns in serum to identify ovarian cancer. Lancet 359, 572–577

  4. Fishman, R. A. (1992) Cerebrospinal Fluid in Diseases of the Nervous System, W. B. Saunders Company, Philadelphia

  5. Pusch, W., Flocco, M. T., Leung, S.-M., and Thiele, H. (2003) Mass spectrometry-based clinical proteomics. Pharmacogenomics 4, 463–476

  6. Carrette, O., Demalte, I., Scherl, A., Yalkinoglu, O., Corthals, G., Burkhard, P., Hochstrasser, D. F., and Sanchez, J. C. (2003) A panel of cerebrospinal fluid potential biomarkers for the diagnosis of Alzheimer's disease. Proteomics 3, 1486–1494

  7. Diamandis, E. P. (2003) Point: Proteomic patterns in biological fluids: do they represent the future of cancer diagnostics? Clin. Chem. 49, 1272–1275

  8. Diamandis, E. P., and van der Merwe, D. E. (2005) Plasma protein profiling by mass spectrometry for cancer diagnosis: opportunities and limitations. Clin. Cancer Res. 11, 963–965

  9. Diamandis, E. P. (2004) Analysis of serum proteomic patterns for early cancer diagnosis: drawing attention to potential problems. J. Natl. Cancer Inst. 96, 353–356

  10. Diamandis, E. P. (2004) Mass spectrometry as a diagnostic and a cancer biomarker discovery tool: opportunities and potential limitations. Mol. Cell. Proteomics 3, 367–378

  11. Koomen, J. M., Zhao, H., Li, D., Abbruzzese, J., Baggerly, K., and Kobayashi, R. (2004) Diagnostic protein discovery using proteolytic peptide targeting and identification. Rapid Commun. Mass Spectrom. 18, 2537–2548

  12. Ramstrom, M., Palmblad, M., Markides, K. E., Hakansson, P., and Bergquist, J. (2003) Protein identification in cerebrospinal fluid using packed capillary liquid chromatography Fourier transform ion cyclotron resonance mass spectrometry. Proteomics 3, 184–190

  13. Dekker, L. J., Dalebout, J. C., Siccama, I., Jenster, G., Sillevis Smitt, P. A., and Luider, T. M. (2005) A new method to analyze matrix-assisted laser desorption/ionization time-of-flight peptide profiling mass spectra. Rapid Commun. Mass Spectrom. 19, 865–870

  14. Koza, J. R. (1992) Genetic Programming, MIT Press, Cambridge, MA

  15. Efron, B., and Tibshirani, R. (1986) Bootstrap methods for standard errors, confidence intervals, and other measures of accuracy. Stat. Sci. 1, 54–77

  16. Efron, B., and Tibshirani, R. (1993) An Introduction to the Bootstrap, Chapman and Hall, New York

  17. Straathof, C. S., de Bruin, H. G., Dippel, D. W., and Vecht, C. J. (1999) The diagnostic accuracy of magnetic resonance imaging and cerebrospinal fluid cytology in leptomeningeal metastasis. J. Neurol. 246, 810–814

  18. Freilich, R. J., Krol, G., and DeAngelis, L. M. (1995) Neuroimaging and cerebrospinal fluid cytology in the diagnosis of leptomeningeal metastasis. Ann. Neurol. 38, 51–57

  19. Soltys, S. G., Shi, G., Tibshirani, R., Giaccia, A. J., Koong, A. C., and Le, Q. (2003) The use of plasma SELDI-TOF MS proteomic patterns for detection of head and neck squamous cell cancers (HNSCC). Int. J. Radiat. Oncol. Biol. Phys. 57, S202

  20. Baggerly, K. A., Morris, J. S., and Coombes, K. R. (2004) Reproducibility of SELDI-TOF protein patterns in serum: comparing datasets from different experiments. Bioinformatics 20, 777–785

  21. Check, E. (2004) Proteomics and cancer: running before we can walk? Nature 429, 496–497

  22. White, C. N., Chan, D. W., and Zhang, Z. (2004) Bioinformatics strategies for proteomic profiling. Clin. Biochem. 37, 636–641

  23. Diamandis, E. P. (2004) How are we going to discover new cancer biomarkers? A proteomic approach for bladder cancer. Clin. Chem. 50, 793–795

  24. Diamandis, E. P. (2004) Proteomic patterns to identify ovarian cancer: 3 years on. Expert Rev. Mol. Diagn. 4, 575–577

  25. Diamandis, E. P. (2004) Identification of serum amyloid A protein as a potentially useful biomarker for nasopharyngeal carcinoma. Clin. Cancer Res. 10, 5293; author reply 5293–4

  26. Rogers, M. A., Clarke, P., Noble, J., Munro, N. P., Paul, A., Selby, P. J., and Banks, R. E. (2003) Proteomic profiling of urinary proteins in renal cancer by surface enhanced laser desorption ionization and neural-network analysis: identification of key issues affecting potential clinical utility. Cancer Res. 63, 6971–6983

  27. Qu, Y., Adam, B.-L., Yasui, Y., and Ward, M. D. (2002) Boosted decision tree analysis of surface-enhanced laser desorption/ionization mass spectral serum profiles discriminates prostate cancer from noncancer patients. Clin. Chem. 48, 1835–1843

  28. Villanueva, J., Philip, J., Entenberg, D., Chaparro, C. A., Tanwar, M. K., Holland, E. C., and Tempst, P. (2004) Serum peptide profiling by magnetic particle-assisted, automated sample processing and MALDI-TOF mass spectrometry. Anal. Chem. 76, 1560–1570

  29. Heike, Y., Hosokawa, M., Osumi, S., Fujii, D., Aogi, K., Takigawa, N., Ida, M., Tajiri, H., Eguchi, K., Shiwa, M., Wakatabe, R., Arikuni, H., Takaue, Y., and Takashima, S. (2005) Identification of serum proteins related to adverse effects induced by docetaxel infusion from protein expression profiles of serum using SELDI ProteinChip system. Anticancer Res. 25, 1197–1203

  30. Malik, G., Ward, M. D., Gupta, S. K., Trosset, M. W., Grizzle, W. E., Adam, B. L., Diaz, J. I., and Semmes, O. J. (2005) Serum levels of an isoform of apolipoprotein A-II as a potential marker for prostate cancer. Clin. Cancer Res. 11, 1073–1085

  31. Allard, L., Lescuyer, P., Burgess, J., Leung, K. Y., Ward, M., Walter, N., Burkhard, P. R., Corthals, G., Hochstrasser, D. F., and Sanchez, J. C. (2004) ApoC-I and ApoC-III as potential plasmatic markers to distinguish between ischemic and hemorrhagic stroke. Proteomics 4, 2242–2251




