REPORT

Comparison of One-, Two-, and Three-Dimensional Measurements of Childhood Brain Tumors

Katherine E. Warren, Nicholas Patronas, Alberta A. Aikin, Paul S. Albert, Frank M. Balis

Affiliations of authors: K. E. Warren, Neuro-Oncology Branch, National Cancer Institute (NCI) and National Institute of Neurological Disorders and Stroke, Bethesda, MD; A. A. Aikin, F. M. Balis, Pediatric Oncology Branch, NCI; N. Patronas, Department of Radiology, Clinical Center, Bethesda; P. S. Albert, Biometric Research Branch, Clinical Center.

Correspondence to: Katherine E. Warren, M.D., National Institutes of Health, Bldg. 10, Rm. 12S245, 10 Center Dr., MSC 1928, Bethesda, MD 20892–1928 (e-mail: warrenk@mail.nih.gov).


    ABSTRACT
 
Background: End points for assessing drug activity in brain tumors are determined by measuring the change in tumor size by magnetic resonance imaging (MRI) relative to a pretreatment or best-response scan. Traditionally, two-dimensional (2D) tumor measurements have been used, but one-dimensional (1D) measurements have recently been proposed as an alternative. Because software to estimate three-dimensional (3D) tumor volume from digitized MRI images is available, we compared all three methods of measuring childhood brain tumors and related the results to clinical outcome.
Methods: Tumor size was measured by each method on 130 MRI scans from 32 patients (32 baseline and 98 follow-up scans; median, three scans per patient; range, two to 18 scans). Tumor-response category (partial response, minor response, stable disease, or progressive disease) was determined from the percentage change in tumor size between the baseline or best-response scan and follow-up scans. Time to clinical progression was independently determined by chart review. All statistical tests were two-sided.
Results: Concordances between 1D and 2D, 1D and 3D, and 2D and 3D were 83% (95% confidence interval [CI] = 67% to 99%), 61% (95% CI = 47% to 75%), and 66% (95% CI = 52% to 80%), respectively, on follow-up scans. Concordances for 1D and 3D and for 2D and 3D were statistically significantly lower than the concordance for 1D and 2D (P<.001 and P = .003, respectively). Concordance among 1D, 2D, and 3D methods in detecting partial response was high; there was less concordance in classifying tumors in the minor response and progressive-disease categories. Median times to progression measured by the 1D, 2D, and 3D methods were 154, 105, and 112 days, respectively, compared with 114 days based on neurologic symptoms and signs (P = .09 for overall comparison).
Conclusions: Detection of partial responses was not influenced by the measurement method, but estimating time to disease progression may be method dependent for childhood brain tumors.



    INTRODUCTION
 
Activity of anticancer drugs is established by measuring changes in tumor size in response to treatment. Tumor size has traditionally been estimated from two-dimensional (2D) tumor measurements (the product of the longest diameter and its longest perpendicular diameter for each tumor). The standard World Health Organization (WHO) response criteria define a complete response as the disappearance of all known disease for a minimum of 4 weeks, a partial response (PR) as a decrease of 50% or more in the sum of the products of perpendicular diameters of all measured tumors, and progressive disease (PD) as an increase of 25% or more in the product of perpendicular diameters of any measurable lesion or the appearance of new lesions. Tumor measurements that do not fulfill the criteria for an objective response or PD are considered to be stable disease (SD) (<50% decrease and <25% increase in size). A minor response, which is defined as a decrease of 25% or more but less than 50% in the sum of the products of perpendicular diameters of all measured tumors, is often split out of the SD category.
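The WHO thresholds described above can be written as a small classifier. This is a minimal sketch, not the authors' procedure: the function names `pct_change` and `who_category` are hypothetical, the 4-week confirmation required for a complete response is not modeled, and the minor-response band is split out of SD as the text describes.

```python
def pct_change(reference: float, current: float) -> float:
    """Percent change in the sum of 2D products relative to a reference scan."""
    return 100.0 * (current - reference) / reference


def who_category(change_pct: float, new_lesion: bool = False) -> str:
    """Map a percent change in the sum of 2D products to a WHO-style category.

    Negative values mean shrinkage. Thresholds follow the text:
    PR >= 50% decrease, minor response 25% to <50% decrease,
    PD >= 25% increase or any new lesion, SD otherwise.
    """
    if new_lesion or change_pct >= 25.0:
        return "PD"
    if change_pct <= -100.0:
        return "CR"  # disappearance of all disease; confirmation interval not modeled
    if change_pct <= -50.0:
        return "PR"
    if change_pct <= -25.0:
        return "MR"
    return "SD"


# Hypothetical example: a 2D sum that falls from 12.0 to 5.0 cm^2
print(who_category(pct_change(12.0, 5.0)))
```

Note that this sketch applies the PD threshold to the summed products, whereas the WHO definition quoted above applies it to any single measurable lesion.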

The recently implemented RECIST (Response Evaluation Criteria in Solid Tumors) method of assessing tumor response uses a simplified one-dimensional (1D) measurement (sum of the longest diameters of the tumors) (1,2). If tumors have a spheric shape, then a 30% decrease in the sum of the longest diameters (RECIST) is equivalent to a 50% decrease in the sum of the 2D products (WHO) and a 65% decrease in three-dimensional (3D) tumor volume (3). The RECIST method also assumes that the length, width, and height of a tumor undergo equivalent percent reductions or increases after treatment (2). The RECIST PD category requires an increase of 20% or more in the sum of the longest diameters or the appearance of new lesions (1). For a spheric tumor, this increase is equivalent to a 44% increase in the 2D product, rather than the traditional 25% increase used in the WHO criteria, and a 73% increase in the 3D tumor volume. There is excellent concordance between the RECIST and WHO response criteria for assessing response in extracranial solid tumors, but RECIST has not been validated as a method of assessing response or time to progression in brain tumors.
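The equivalences quoted above follow from scaling a sphere uniformly: if the diameter changes by a factor s, the 2D product changes by s² and the volume by s³. The sketch below illustrates that arithmetic only; the function name `equivalent_changes` is hypothetical, and the spherical, uniform-scaling assumption is exactly the one the text attributes to RECIST.

```python
def equivalent_changes(pct_1d: float) -> tuple:
    """Return the (2D, 3D) percent changes implied by a 1D percent change.

    Assumes a sphere whose diameter scales uniformly, so area scales with
    the square and volume with the cube of the diameter ratio.
    """
    scale = 1.0 + pct_1d / 100.0
    pct_2d = (scale ** 2 - 1.0) * 100.0
    pct_3d = (scale ** 3 - 1.0) * 100.0
    return pct_2d, pct_3d


for change in (-30.0, 20.0):
    area, volume = equivalent_changes(change)
    print(f"1D {change:+.0f}% -> 2D {area:+.1f}%, 3D {volume:+.1f}%")
```

A 20% increase gives about 44% and 73%, and a 30% decrease gives about 51% and 66%, which the criteria round to 50% and 65%.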

Software for the estimation of tumor volume by 3D measurements is now more widely available, can accurately measure the volume of irregularly shaped tumors (i.e., does not assume tumors are spheric), and does not require the assumption that the change in tumor size is equivalent in each dimension. We compared 3D measurements of tumor volume with 2D and 1D measurements as methods of quantifying changes in the size of brain tumors.


    PATIENTS AND METHODS
 
Patients

Pediatric patients who were referred to our institution for investigational drug trials were eligible for this study if they had primary brain tumors, measurable lesions, and multiple magnetic resonance imaging (MRI) scans performed to assess response to the investigational therapy. The study was approved by the local Institutional Review Board. Written informed consent was obtained from all patients or their guardians.

Imaging

MRI of the brain was performed on a 1.5-tesla GE Signa scanner with a standard quadrature head coil (General Electric Medical Systems, Milwaukee, WI). Axial precontrast and postcontrast T1, fast-spin echo T2, and fluid-attenuated inversion recovery images were obtained. For 3D analysis, 124 postcontrast, sagittal images were obtained with a fast-spoiled gradient-recalled sequence with a slice thickness of 1.5 mm, a matrix of 256 x 256 pixels, and number of excitations = 1. Images were printed on standard x-ray film for 1D and 2D measurements. For 3D estimation of tumor volume, the digitized sagittal images were transferred to a GE Medical Systems Windows Advantage Workstation (General Electric Medical Systems).

Analysis

One neuroradiologist (N. Patronas), who was unaware of the 3D tumor volume estimate, performed all 1D and 2D measurements. 1D tumor size was the sum of the longest diameters (measured with calipers) from each measurable contrast-enhanced tumor on the postgadolinium T1-weighted axial or sagittal images. 2D measurements were the sum of the products of the largest diameters and their maximum perpendicular diameters in the same plane from each measurable contrast-enhanced tumor on the postgadolinium T1-weighted scans. 3D measurements were performed on a GE Medical Systems Windows Advantage Workstation by one investigator (K. E. Warren). The contrast-enhanced tumor on digitized postcontrast T1-weighted images was outlined in three planes by the user with a paintbrush technique (Fig. 1). The tumor volume was calculated by the workstation from these user-defined tumor images. The time to clinical progression was determined by chart review at the end of the study by an independent investigator (A. A. Aikin). Patients were considered to have clinical progression if they had worsening of existing neurologic symptoms or signs or the appearance of new neurologic symptoms or signs with no other defined etiology.
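The 1D and 2D size definitions above reduce to two sums over the measurable lesions. The following is an illustrative sketch with made-up diameters; the `Lesion` type and function names are hypothetical. The 3D volume is left to the workstation, whose algorithm the text does not describe.

```python
from typing import List, NamedTuple


class Lesion(NamedTuple):
    longest_cm: float        # longest diameter of the enhancing tumor
    perpendicular_cm: float  # maximum perpendicular diameter in the same plane


def size_1d(lesions: List[Lesion]) -> float:
    """1D size: sum of the longest diameters (cm)."""
    return sum(lesion.longest_cm for lesion in lesions)


def size_2d(lesions: List[Lesion]) -> float:
    """2D size: sum of the products of perpendicular diameters (cm^2)."""
    return sum(lesion.longest_cm * lesion.perpendicular_cm for lesion in lesions)


# Hypothetical patient with two measurable lesions
lesions = [Lesion(4.0, 3.0), Lesion(2.0, 1.0)]
print(size_1d(lesions), size_2d(lesions))
```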



Fig. 1. Magnetic resonance imaging. Postcontrast, T1-weighted sagittal image of a tumor containing both cystic and solid components. Red hatch marks indicate areas included in the manual paintbrush technique for volume determination.

 
Tumor response or disease progression was determined from the percent change in tumor size from a baseline scan or from a best-response scan for 98 follow-up scans. The response categories (PR, minor response, SD, and PD) are defined in Table 1.


 
Table 1. Definition of response categories for the one-dimensional (1D), two-dimensional (2D), and three-dimensional (3D) measurement methods*
 
Statistical Methods

The response distribution was compared across measurement methods (1D, 2D, and 3D) by use of a proportional odds model (4). The model was logit P(Y ≤ k) = θₖ + β₁G₂ + β₂G₃, where Y is the response variable; k = 1, 2, 3, 4, and 5 and reflects the ordinal categories (complete response, PR, minor response, SD, and PD); and G₂ and G₃ are indicator variables for whether the measurement method is 2D and 3D, respectively. A test of whether β₁ = β₂ = 0 is a test of whether the response distribution is the same for the three measurement methods. This was done by use of a χ²/Wald test performed with a jack-knife variance estimate (5) to account for the dependent response data. Percent agreement was estimated as the number of follow-up scans in which the corresponding techniques agreed divided by the total number of follow-up scans (98 scans). Standard errors (data not shown) were estimated by use of the jack-knife method to account for the correlated dependent-response data, and these values were used to estimate 95% confidence intervals (CIs). Comparisons between percent agreement (e.g., % agreement for 1D and 3D compared with that for 1D and 2D) were done with Z tests with jack-knife variance estimates to account for dependent data.
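The percent-agreement calculation and its jack-knife standard error can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: it assumes a delete-one-patient jack-knife (the text says only that the jack-knife accounted for dependent data), a normal-approximation interval of ±1.96 standard errors, and hypothetical toy data and function names.

```python
import math


def percent_agreement(scans):
    """scans: list of (patient_id, category_a, category_b) for follow-up scans."""
    return sum(a == b for _, a, b in scans) / len(scans)


def jackknife_se(scans):
    """Delete-one-patient jack-knife standard error of percent agreement.

    All scans from one patient are dropped together, so within-patient
    correlation is respected. Needs at least two patients.
    """
    patients = sorted({pid for pid, _, _ in scans})
    n = len(patients)
    leave_one_out = [
        percent_agreement([s for s in scans if s[0] != pid]) for pid in patients
    ]
    mean_loo = sum(leave_one_out) / n
    return math.sqrt((n - 1) / n * sum((x - mean_loo) ** 2 for x in leave_one_out))


def ci95(scans):
    """Normal-approximation 95% interval (an assumption of this sketch)."""
    estimate, se = percent_agreement(scans), jackknife_se(scans)
    return estimate - 1.96 * se, estimate + 1.96 * se


# Hypothetical data: three patients, five follow-up scans
toy = [("A", "PR", "PR"), ("A", "SD", "SD"),
       ("B", "SD", "SD"), ("B", "MR", "SD"),
       ("C", "PD", "SD")]
print(percent_agreement(toy), jackknife_se(toy), ci95(toy))
```

Dropping whole patients rather than single scans is the design choice that matters here, because scans from the same patient are not independent.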

The progression-free survival (PFS) distributions were compared across the 1D, 2D, 3D, and clinical-assessment methods, with a proportional hazards model developed for dependent survival data (6). A global comparison between survival distributions was done with a χ²/Wald test, and individual comparisons were done with Z tests. All statistical tests were two-sided. A P value of less than .05 was chosen as the criterion for statistical significance, while a P value between .05 and .10 was considered to be a statistical trend.
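For readers who want to see how a median PFS can be read off censored follow-up data, here is a minimal product-limit (Kaplan-Meier) sketch. It is an assumption of this example that such an estimator is appropriate; the text does not name the estimator used for the medians, and the sketch does not implement the marginal proportional hazards model (6) used for the comparisons. Function names and data are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival curve.

    times: follow-up in days; events: 1 = progression observed, 0 = censored.
    Returns a list of (time, survival) pairs at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        progressed = censored = 0
        # Count everything that happens at this exact time
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                progressed += 1
            else:
                censored += 1
            i += 1
        if progressed:
            survival *= (at_risk - progressed) / at_risk
            curve.append((t, survival))
        at_risk -= progressed + censored
    return curve


def median_time(curve):
    """First event time at which estimated survival is 0.5 or lower, else None."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Hypothetical cohort of five patients, one censored at day 60
curve = kaplan_meier([30, 60, 90, 120, 150], [1, 0, 1, 1, 1])
print(curve, median_time(curve))
```

Because each measurement method can censor a patient at a different time, the same patients can yield different numbers at risk and different medians under each method.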


    RESULTS
 
Thirty-two pediatric patients (4–20 years old) with primary brain tumors (Table 2) had 130 MRI scans performed before and after treatment (median, three scans per patient; range, two to 18 scans). The diagnoses included high-grade glioma (n = 10), brainstem glioma (n = 9), ependymoma (n = 7), and medulloblastoma/primitive neuroectodermal tumor (n = 6). Seventeen tumors were primarily located in the cerebral cortex or thalamus, and the remainder were primarily located in the posterior fossa. Thirty-two baseline and 98 follow-up scans were performed on these patients. Twenty males and 12 females were enrolled in this study; however, because follow-up scans were compared with the patients' own baseline scan and the number of patients was small, subset analysis by sex was not performed.


 
Table 2. Response category determined by each measurement method for the 98 follow-up scans compared with tumor size at baseline or best response*
 
The longest tumor diameter ranged from 0.9 to 7.6 cm (median, 4.2 cm). 2D products ranged from 0.5 to 50.3 cm² (median, 10.6 cm²). Tumor volumes (3D) ranged from 0.1 to 79.5 cm³ (median, 10.6 cm³). The 1D or 2D measurements, after the image with the longest diameter was selected, took a median of 2 minutes, and 3D measurements, after the digitized images were loaded onto the workstation, took a median of 10.5 minutes. The interday and intraday coefficients of variation for the 3D technique were less than 10%.

Table 2 lists the number of follow-up scans (n = 98) that were classified into each response category according to the measurement method and based on the criteria listed in Table 1. A test of the difference between the response distributions across measurement methods was not statistically significant (χ²/Wald test = 2.2; 2 df; P = .33). The 1D and 2D methods were concordant (scans were classified into the same response category) for 83% (95% CI = 67% to 99%) of the follow-up scans, the 2D and 3D methods were concordant for 66% (95% CI = 52% to 80%) of the follow-up scans, and the 1D and 3D methods were concordant for 61% (95% CI = 47% to 75%) of the follow-up scans (Fig. 2). The percent agreement for the 1D and 3D methods was statistically significantly lower than that for the 1D and 2D methods (Z test; P<.001). Similarly, the percent agreement for the 2D and 3D methods was statistically significantly lower than that for the 1D and 2D methods (Z test; P = .003). There was good agreement among the three methods in detecting PRs. The 1D and 2D methods both detected PRs on 11 scans, and on two additional scans, the 2D method detected PRs that were not classified as PRs by the 1D method. The 3D method was concordant with the 2D and 1D methods in detecting PRs for the same 11 scans, but the 2D and 3D methods were discordant for PRs on three scans (detected as PR by one method but not by the other method), and 1D and 3D methods were discordant for PRs on one scan. However, there was less concordance among the methods in classifying tumors in the minor response and PD categories. For minor responses, the 1D and 2D methods were concordant for seven scans and discordant for seven scans, and the 3D method was concordant with the 1D and 2D methods for two scans but was discordant for 16 and 13 scans, respectively. For PD, the 1D and 2D methods were concordant for 18 scans and discordant for nine scans. The 1D and 2D methods were concordant with the 3D method for 13 and 18 scans, respectively, and discordant on 24 and 21 scans, respectively.



 
Fig. 2. Comparison of methods of measuring the percent change in tumor size from baseline or best response. A) One-dimensional (1D) method versus two-dimensional (2D) method. B) 1D method versus three-dimensional (3D) method. C) 2D method versus 3D method. Points in white areas are concordant. Yellow, green, and gray boxes represent increasing degrees of discordance between measurement methods (yellow = one degree of discordance, green = two degrees of discordance, and gray = three degrees of discordance). D) Progression-free survival (PFS) measured by clinical symptoms and signs and by 1D, 2D, and 3D methods. At 100 days, the numbers of patients at risk for 1D, 2D, 3D, and clinical assessments were 16, 12, 14, and 16, respectively. Different numbers of patients at risk are due to differential censoring between methods. The corresponding 95% confidence intervals (CIs) for PFS at 100 days are as follows: 95% CI = 0.56 to 0.93; 95% CI = 0.36 to 0.77; 95% CI = 0.38 to 0.77; and 95% CI = 0.41 to 0.79. PR = partial response; MR = minor response; SD = stable disease; PD = progressive disease.

 
The duration of PFS was measured clinically by neurologic symptoms and signs and by use of the 1D, 2D, and 3D methods based on the percent increase in tumor size (Table 1). Median PFSs for the 1D, 2D, and 3D methods were 154, 105, and 112 days, respectively. On the basis of clinical assessment, the median PFS was 114 days. An overall comparison of the PFS curves showed a statistical trend (χ²/Wald test, 6.5; 3 df; P = .09). There was some evidence that the 2D and 3D methods and clinical assessment had a shorter time to progression than the 1D method. A test of the 1D versus 2D methods was statistically significant (Z test; P = .015). Comparisons of the other methods were not statistically significant: 1D versus 3D methods (P = .06) and 1D method versus clinical assessment (P = .18) (Fig. 2, D).


    DISCUSSION
 
In childhood brain tumors, detecting a PR does not appear to be dependent on the measurement method used, but detecting a minor response or tumor progression may be method dependent. The median duration of PFS based on the 1D method was 38% longer than the PFS based on the 3D method and 47% longer than the PFS based on the 2D method. This result would be expected, because a 20% increase in the diameter of a sphere is equivalent to a 44% increase in the 2D area and a 73% increase in the 3D volume, but a 25% increase in 2D tumor size or a 65% increase in volume was used to define disease progression. Clinical trials that incorporate the RECIST method for measuring PFS should not be compared with a historical control population for which the WHO criteria were used to define PD.
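The arithmetic behind this paragraph can be checked directly. The sketch below uses only the medians and thresholds quoted in the text; the function names are hypothetical, and the conversion to an equivalent diameter change assumes a sphere that grows uniformly.

```python
def pct_longer(a_days: float, b_days: float) -> float:
    """How much longer a is than b, as a percentage of b."""
    return (a_days / b_days - 1.0) * 100.0


def diameter_equivalent(pct_increase: float, dims: int) -> float:
    """Percent increase in the diameter of a sphere that matches a given
    percent increase in its 2D product (dims=2) or volume (dims=3)."""
    return ((1.0 + pct_increase / 100.0) ** (1.0 / dims) - 1.0) * 100.0


# Median PFS: 1D = 154 days, 2D = 105 days, 3D = 112 days
print(f"1D vs 3D: {pct_longer(154, 112):.1f}% longer")
print(f"1D vs 2D: {pct_longer(154, 105):.1f}% longer")

# Diameter growth at which each PD threshold is crossed for a sphere
print(f"2D threshold (25%): {diameter_equivalent(25.0, 2):.1f}% diameter increase")
print(f"3D threshold (65%): {diameter_equivalent(65.0, 3):.1f}% diameter increase")
```

Under the spherical assumption, both the 2D and 3D progression thresholds are reached at a smaller diameter increase than the 20% required by the 1D criterion, which is consistent with the longer median PFS seen with the 1D method.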

In the comparison of different measurement methods, the percent agreement of the 3D method with both the 1D and 2D methods was statistically significantly lower than the percent agreement between the 1D and 2D methods. This result would be expected because the same measurement used by the 1D method is also used in the 2D method. There was substantial discordance among the three methods in detecting minor responses, especially when comparing the 1D and 2D methods with the 3D method. As shown in Fig. 2, in most discordant cases, a minor response by one method was categorized as SD by the second method. This lack of consistency in detecting minor response raises a concern about the ability to accurately measure this response category.

Many new agents that are currently in the early phases of clinical testing are cytostatic and are not anticipated to produce an immediate shrinkage of tumor. The primary end point used to define activity of these agents may be PFS rather than response. The definition of PD and the method of measuring changes in tumor size may, therefore, become critical determinants of drug activity.

With the use of the WHO criteria, which define PD as an increase of 25% or more in the 2D area of one or more lesions, there is a 25% chance of misclassifying patients with SD into the PD category because of variability in measurements (7). This high rate of false-positive results led the Southwest Oncology Group to adopt a new definition of PD (a 50% increase in the sum of the products, evidence of new lesions, or an absolute increase of 10 cm² in the sum of the products). The initial intention of the WHO criteria was to avoid such heterogeneity in response definitions.

The computer-assisted method of estimating 3D tumor volume intuitively appears to be a more accurate method of measuring tumor size and monitoring changes in tumor size after treatment. The actual volume of irregularly shaped tumors can be measured without assuming a spheric shape, cystic components of tumors can be excluded from the volume analysis, and unequal shrinkage in one or more dimensions can be accommodated in the measurement. The coefficient of variation of the 3D method in the present study also suggests that there may be a smaller chance of misclassifying patients with SD compared with the 2D method by use of the WHO criteria, presumably because a larger percentage change in volume is required for PD with the 3D method.

We demonstrated considerable discordance in detecting PD with 1D, 2D, and 3D measurement methods. The ideal method of tumor measurement should most closely correlate with clinical outcome. In this small study of patients with recurrent tumors, we found some evidence that the 2D, 3D, and clinical assessments had shorter times to progression than the 1D method. This observation suggests that the method used to assess tumor progression could influence the estimate of PFS for childhood brain tumors in larger trials of less heavily pretreated patients. Thus, 1D measurement of childhood brain tumors must be validated before implementing the RECIST criteria in this group of tumors. Further studies of computer-assisted 3D volumetric measurements are also warranted.


    REFERENCES
 

1 Therasse P, Arbuck SG, Eisenhauer EA, Wanders J, Kaplan RS, Rubinstein L, et al. New guidelines to evaluate the response to treatment in solid tumors. European Organization for Research and Treatment of Cancer, National Cancer Institute of the United States, National Cancer Institute of Canada. J Natl Cancer Inst 2000;92:205–16.

2 Gehan EA, Tefft MC. Will there be resistance to the RECIST (Response Evaluation Criteria in Solid Tumors)? J Natl Cancer Inst 2000;92:179–81.

3 James K, Eisenhauer E, Christian M, Terenziani M, Vena D, Muldal A, et al. Measuring response in solid tumors: unidimensional versus bidimensional measurement. J Natl Cancer Inst 1999;91:523–8.

4 Agresti A. Categorical data analysis. New York (NY): John Wiley & Sons; 1990. p. 322–5.

5 Efron B, Tibshirani RJ. An introduction to the bootstrap. New York (NY): Chapman & Hall; 1993.

6 Wei LJ, Lin DY, Weissfeld L. Regression analysis of multivariate incomplete failure time data by modeling marginal distributions. J Am Stat Assoc 1989;84:1065–73.

7 Lavin P, Flowerdew G. Studies in variation associated with the measurement of solid tumors. Cancer 1980;46:1286–90.

Manuscript received December 20, 2000; revised July 2, 2001; accepted July 12, 2001.


Copyright © 2001 Oxford University Press (unless otherwise stated)