Affiliations of authors: K. E. Warren, Neuro-Oncology Branch, National Cancer Institute (NCI) and National Institute of Neurological Disorders and Stroke, Bethesda, MD; A. A. Aikin, F. M. Balis, Pediatric Oncology Branch, NCI; N. Patronas, Department of Radiology, Clinical Center, Bethesda; P. S. Albert, Biometric Research Branch, Clinical Center.
Correspondence to: Katherine E. Warren, M.D., National Institutes of Health, Bldg. 10, Rm. 12S245, 10 Center Dr., MSC 1928, Bethesda, MD 20892-1928 (e-mail: warrenk@mail.nih.gov).
INTRODUCTION
The recently implemented RECIST (Response Evaluation Criteria in Solid Tumors) method of assessing tumor response uses a simplified one-dimensional (1D) measurement, the sum of the longest diameters of the tumors (1,2). If tumors are spheric, then a 30% decrease in the sum of the longest diameters (RECIST) is equivalent to a 50% decrease in the sum of the two-dimensional (2D) products (World Health Organization [WHO] criteria) and a 65% decrease in three-dimensional (3D) tumor volume (3). The RECIST method also assumes that the length, width, and height of a tumor undergo equivalent percent reductions or increases after treatment (2). The RECIST progressive disease (PD) category requires an increase of 20% or more in the sum of the longest diameters or the appearance of new lesions (1). This percentage is equivalent to a 44% increase in the 2D product, rather than the traditional 25% increase used in the WHO criteria, and a 73% increase in the 3D tumor volume. There is excellent concordance between the RECIST and WHO response criteria for assessing response in extracranial solid tumors, but RECIST has not been validated as a method of assessing response or time to progression in brain tumors.
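The equivalences quoted above follow from scaling a sphere uniformly: if every diameter changes by the same factor, the 2D product scales with the square of that factor and the volume with its cube. The short Python sketch below reproduces the arithmetic; the function name is illustrative only and is not part of the study.

```python
def equivalent_changes(diameter_change_pct):
    """Return (2D product change %, 3D volume change %) for a given
    percent change in diameter, assuming a spheric tumor whose
    dimensions all scale by the same factor."""
    scale = 1.0 + diameter_change_pct / 100.0
    area_change = (scale ** 2 - 1.0) * 100.0    # product of two diameters
    volume_change = (scale ** 3 - 1.0) * 100.0  # volume scales with the cube
    return area_change, volume_change


# RECIST thresholds quoted in the text: 30% decrease, 20% increase.
for label, pct in [("30% decrease", -30.0), ("20% increase", 20.0)]:
    area, volume = equivalent_changes(pct)
    print(f"1D {label}: 2D {area:+.1f}%, 3D {volume:+.1f}%")
```

A 30% decrease in diameter gives about a 51% decrease in the 2D product and a 65.7% decrease in volume; a 20% increase gives a 44% increase in the 2D product and a 72.8% increase in volume, in line with the rounded figures cited above.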
Software for the estimation of tumor volume by 3D measurements is now more widely available, can accurately measure the volume of irregularly shaped tumors (i.e., does not assume tumors are spheric), and does not require the assumption that the change in tumor size is equivalent in each dimension. We compared 3D measurement of tumor volume with 1D and 2D measurements as methods of quantifying changes in the size of brain tumors.
PATIENTS AND METHODS
Pediatric patients who were referred to our institution for investigational drug trials were eligible for this study if they had primary brain tumors, measurable lesions, and multiple magnetic resonance imaging (MRI) scans performed to assess response to the investigational therapy. The study was approved by the local Institutional Review Board. Written informed consent was obtained from all patients or their guardians.
Imaging
MRI of the brain was performed on a 1.5-tesla GE Signa scanner with a standard quadrature head coil (General Electric Medical Systems, Milwaukee, WI). Axial precontrast and postcontrast T1, fast-spin echo T2, and fluid-attenuated inversion recovery images were obtained. For 3D analysis, 124 postcontrast, sagittal images were obtained with a fast-spoiled gradient-recalled sequence with a slice thickness of 1.5 mm, a matrix of 256 × 256 pixels, and number of excitations = 1. Images were printed on standard x-ray film for 1D and 2D measurements. For 3D estimation of tumor volume, the digitized sagittal images were transferred to a GE Medical Systems Windows Advantage Workstation (General Electric Medical Systems).
Analysis
One neuroradiologist (N. Patronas), who was unaware of the 3D tumor volume estimate, performed all 1D and 2D measurements. 1D tumor size was the sum of the longest diameters (measured with calipers) from each measurable contrast-enhanced tumor on the postgadolinium T1-weighted axial or sagittal images. 2D measurements were the sum of the products of the largest diameters and their maximum perpendicular diameters in the same plane from each measurable contrast-enhanced tumor on the postgadolinium T1-weighted scans. 3D measurements were performed on a GE Medical Systems Windows Advantage Workstation by one investigator (K. E. Warren). The contrast-enhanced tumor on digitized postcontrast T1-weighted images was outlined in three planes by the user with a paintbrush technique (Fig. 1). The tumor volume was calculated by the workstation from these user-defined tumor images. The time to clinical progression was determined independently by chart review at the end of the study by an independent investigator (A. A. Aikin). Patients were considered to have clinical progression if they had worsening of existing neurologic symptoms or signs or the appearance of new neurologic symptoms or signs with no other defined etiology.
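The 1D and 2D size definitions above can be illustrated with a minimal Python sketch. The lesion measurements are hypothetical values chosen for illustration, not study data, and the class and function names are assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class Lesion:
    longest_cm: float        # longest diameter of the enhancing tumor
    perpendicular_cm: float  # maximum perpendicular diameter, same plane


def size_1d(lesions):
    """1D size: sum of the longest diameters of all measurable lesions."""
    return sum(lesion.longest_cm for lesion in lesions)


def size_2d(lesions):
    """2D size: sum of the products of the longest diameter and its
    maximum perpendicular diameter for all measurable lesions."""
    return sum(lesion.longest_cm * lesion.perpendicular_cm for lesion in lesions)


def percent_change(baseline, follow_up):
    """Percent change in size relative to baseline."""
    return (follow_up - baseline) / baseline * 100.0


# Hypothetical patient with two measurable lesions (illustrative only).
baseline = [Lesion(4.0, 3.0), Lesion(2.0, 1.5)]
follow_up = [Lesion(3.0, 2.0), Lesion(2.0, 1.0)]

print(f"1D change: {percent_change(size_1d(baseline), size_1d(follow_up)):+.1f}%")
print(f"2D change: {percent_change(size_2d(baseline), size_2d(follow_up)):+.1f}%")
```

In this made-up example the same pair of scans shows a much larger percent change by the 2D method than by the 1D method, because the lesions shrink in both measured dimensions.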
The response distribution was compared across measurement methods (1D, 2D, and 3D) by use of a proportional odds model (4). The model was $\operatorname{logit} P(Y \le k) = \alpha_k + \beta_1 G_2 + \beta_2 G_3$, where $Y$ is the response variable; $k$ = 1, 2, 3, 4, and 5 and reflects the ordinal categories (complete response, PR, minor response, SD, and PD); and $G_2$ and $G_3$ are indicator variables for whether the measurement method is 2D and 3D. A test of whether $\beta_1 = \beta_2 = 0$ is a test of whether the response distribution is the same for the three measurement methods. This was done by use of a $\chi^2$/Wald test performed with a jack-knife variance estimate (5) to account for the dependent response data. Percent agreement was estimated as the number of follow-up scans in which the corresponding techniques agreed divided by the total number of follow-up scans (98 scans). Standard errors (data not shown) were estimated by use of the jack-knife method to account for the correlated dependent-response data, and these values were used to estimate 95% confidence intervals (CIs). Comparisons between percent agreement (e.g., % agreement for 1D and 3D compared with that for 1D and 2D) were done with Z tests with jack-knife variance estimates to account for dependent data.
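The percent-agreement estimate and its jack-knife standard error can be sketched in Python as follows. This is a minimal illustration under one assumption not stated in the text: that the jack-knife deletes one patient (with all of that patient's scans) at a time, which is a common way to handle repeated scans from the same patient. The data and names are hypothetical.

```python
import math


def percent_agreement(scans):
    """scans: list of (category_by_method_a, category_by_method_b) pairs."""
    agree = sum(1 for a, b in scans if a == b)
    return 100.0 * agree / len(scans)


def jackknife_agreement(scans_by_patient):
    """Percent agreement with a delete-one-patient jack-knife standard
    error and a normal-approximation 95% confidence interval."""
    patients = list(scans_by_patient)
    all_scans = [s for p in patients for s in scans_by_patient[p]]
    estimate = percent_agreement(all_scans)

    # Recompute the statistic with each patient's scans removed in turn.
    leave_one_out = []
    for left_out in patients:
        kept = [s for p in patients if p != left_out for s in scans_by_patient[p]]
        leave_one_out.append(percent_agreement(kept))

    n = len(patients)
    mean_loo = sum(leave_one_out) / n
    variance = (n - 1) / n * sum((x - mean_loo) ** 2 for x in leave_one_out)
    se = math.sqrt(variance)
    return estimate, se, (estimate - 1.96 * se, estimate + 1.96 * se)


# Hypothetical response categories for two methods (illustrative only).
example = {
    "patient_1": [("SD", "SD"), ("PD", "PD"), ("SD", "MR")],
    "patient_2": [("PR", "PR"), ("PR", "PR")],
    "patient_3": [("SD", "PD"), ("PD", "PD"), ("SD", "SD"), ("MR", "SD")],
}
estimate, se, ci = jackknife_agreement(example)
print(f"agreement = {estimate:.1f}%, SE = {se:.1f}, 95% CI = {ci[0]:.1f}% to {ci[1]:.1f}%")
```

Deleting whole patients rather than single scans keeps correlated scans together, so the standard error reflects between-patient variability. With only a few patients, as in this toy example, the normal-approximation interval can extend beyond 0% to 100% and should be read only as an illustration of the mechanics.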
The progression-free survival (PFS) distributions were compared across the 1D, 2D, 3D, and clinical-assessment methods, with a proportional hazards model developed for dependent survival data (6). A global comparison between survival distributions was done with a $\chi^2$/Wald test, and individual comparisons were done with Z tests. All statistical tests were two-sided. A P value of less than .05 was chosen as the criterion for statistical significance, while a P value between .05 and .10 was considered to be a statistical trend.
RESULTS
Table 2 lists the number of follow-up scans (n = 98) that were classified into each response category according to the measurement method and based on the criteria listed in Table 1. A test of the difference between the response distributions across measurement methods was not statistically significant ($\chi^2$/Wald test = 2.2; 2 df; P = .33). The 1D and 2D methods were concordant (scans were classified into the same response category) for 83% (95% CI = 67% to 99%) of the follow-up scans, the 2D and 3D methods were concordant for 66% (95% CI = 52% to 80%) of the follow-up scans, and the 1D and 3D methods were concordant for 61% (95% CI = 47% to 75%) of the follow-up scans (Fig. 2). The percent agreement for the 1D and 3D methods was statistically significantly lower than that for the 1D and 2D methods (Z test; P<.001). Similarly, the percent agreement for the 2D and 3D methods was statistically significantly lower than that for the 1D and 2D methods (Z test; P = .003).
There was good agreement between the three methods in detecting PRs. The 1D and 2D methods both detected PRs on 11 scans, and on two additional scans, the 2D method detected PRs that were not classified as PRs by the 1D method. The 3D method was concordant with the 2D and 1D methods in detecting PRs for the same 11 scans, but the 2D and 3D methods were discordant for PRs on three scans (detected as PR by one method but not by the other method), and 1D and 3D methods were discordant for PRs on one scan. However, there was less concordance among the methods in classifying tumors in the minor response and PD categories. For minor responses, the 1D and 2D methods were concordant for seven scans and discordant for seven scans, and the 3D method was concordant with the 1D and 2D methods for two scans but was discordant for 16 and 13 scans, respectively. For PD, the 1D and 2D methods were concordant for 18 scans and discordant for nine scans. The 1D and 2D methods were concordant with the 3D method for 13 and 18 scans, respectively, and discordant on 24 and 21 scans, respectively.
DISCUSSION
In the comparison of different measurement methods, the percent agreement of the 3D method with both the 1D and 2D methods was statistically significantly lower than the percent agreement between the 1D and 2D methods. This result would be expected because the same measurement used by the 1D method is also used in the 2D method. There was substantial discordance among the three methods in detecting minor responses, especially when comparing the 1D and 2D methods with the 3D method. As shown in Fig. 2, for most discordant cases, a minor response by one method was usually categorized as SD by the second method. This lack of consistency in detecting minor response raises a concern about the ability to accurately measure this response category.
Many new agents that are currently in the early phases of clinical testing are cytostatic and are not anticipated to produce an immediate shrinkage of tumor. The primary end point used to define activity of these agents may be PFS rather than response. The definition of PD and the method of measuring changes in tumor size may, therefore, become critical determinants of drug activity.
With the use of the WHO criteria, which define PD as a 25% or more increase in the 2D area of one or more lesions, there is a 25% chance of misclassifying patients with SD into the PD category because of variability in measurements (7). This high rate of false-positive results led the Southwest Oncology Group to adopt a new definition of PD (a 50% increase in the sum of the products, evidence of new lesions, or an absolute increase of 10 cm² in the sum of the products). The initial intention of the WHO criteria was to avoid such heterogeneity in response definitions.
The computer-assisted method of estimating 3D tumor volume intuitively appears to be a more accurate method of measuring tumor size and monitoring changes in tumor size after treatment. The actual volume of irregularly shaped tumors can be measured without assuming a spheric shape, cystic components of tumors can be excluded from the volume analysis, and unequal shrinkage in one or more dimensions can be accommodated in the measurement. The coefficient of variation of the 3D method in the present study also suggests that there may be a smaller chance of misclassifying patients with SD compared with the 2D method by use of the WHO criteria, presumably because a larger percentage change in volume is required for PD with the 3D method.
We demonstrated considerable discordance in detecting PD with 1D, 2D, and 3D measurement methods. The ideal method of tumor measurement should most closely correlate with clinical outcome. In this small study of patients with recurrent tumors, we found some evidence that the 2D, 3D, and clinical assessments had shorter times to progression than the 1D method. This observation suggests that the method used to assess tumor progression could influence the estimate of PFS for childhood brain tumors in larger trials of less heavily pretreated patients. Thus, 1D measurement of childhood brain tumors must be validated before implementing the RECIST criteria in this group of tumors. Further studies of computer-assisted 3D volumetric measurements are also warranted.
REFERENCES
1 Therasse P, Arbuck SG, Eisenhauer EA, Wanders J, Kaplan RS, Rubinstein L, et al. New guidelines to evaluate the response to treatment in solid tumors. European Organization for Research and Treatment of Cancer, National Cancer Institute of the United States, National Cancer Institute of Canada. J Natl Cancer Inst 2000;92:205–16.
2 Gehan EA, Tefft MC. Will there be resistance to the RECIST (Response Evaluation Criteria in Solid Tumors)? J Natl Cancer Inst 2000;92:179–81.
3 James K, Eisenhauer E, Christian M, Terenziani M, Vena D, Muldal A, et al. Measuring response in solid tumors: unidimensional versus bidimensional measurement. J Natl Cancer Inst 1999;91:523–8.
4 Agresti A. Categorical data analysis. New York (NY): John Wiley & Sons; 1990. p. 322–5.
5 Efron B, Tibshirani RJ. An introduction to the bootstrap. New York (NY): Chapman & Hall; 1993.
6 Wei LJ, Lin DY, Weissfeld L. Regression analysis of multivariate incomplete failure time data by modeling marginal distributions. J Am Stat Assoc 1989;84:1065–73.
7 Lavin P, Flowerdew G. Studies in variation associated with the measurement of solid tumors. Cancer 1980;46:1286–90.
Manuscript received December 20, 2000; revised July 2, 2001; accepted July 12, 2001.